Barus, Hanisa putri
Unknown Affiliation

Comparison of XGBoost and Naive Bayes Models in Type 2 Diabetes Prediction with RFE Feature Selection
Barus, Hanisa putri; Robet; Feriani Astuti Tarigan
Sinkron : jurnal dan penelitian teknik informatika Vol. 10 No. 1 (2026): Article Research January 2026
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v10i1.15509

Abstract

Type 2 diabetes mellitus is a chronic disease with increasing prevalence that can cause serious complications if not detected early. Machine learning algorithms can aid prediction, but the accuracy of the results depends heavily on selecting the right model and features. This study compares the performance of the Extreme Gradient Boosting (XGBoost) and Naive Bayes algorithms in predicting type 2 diabetes with and without Recursive Feature Elimination (RFE) feature selection. The data come from the UCI Machine Learning Repository and comprise 768 samples and eight clinical features. The research process included data preprocessing, splitting the data into 614 training samples and 154 test samples, applying RFE to select the most influential features, model training, and evaluation using accuracy, precision, recall, F1-score, and AUC. The results show that Naive Bayes without RFE achieves an accuracy of 70.77%, precision of 0.57377, recall of 0.648148, F1-score of 0.608696, and AUC of 0.772778, while Naive Bayes with RFE raises the accuracy to 74.02% and the AUC to 0.793333. XGBoost with RFE gives the best results, with an accuracy of 74.67%, precision of 0.653061, recall of 0.592593, F1-score of 0.621359, and the highest AUC of 0.804259. Applying RFE also improves computational efficiency. These findings indicate that RFE improves both classification performance and computation time. The practical implication is that this model could aid early detection of diabetes in clinical settings. Further research could optimize model parameters and use more diverse datasets.
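
The workflow described in the abstract (split, RFE, training, evaluation on five metrics) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic, generated with scikit-learn's `make_classification` to match only the 768 x 8 shape. `GradientBoostingClassifier` is used as a stand-in for XGBoost so the example depends on scikit-learn alone. The RFE ranking estimator, the number of retained features (5), and all random seeds are assumptions, since the abstract does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in with the same shape as the dataset in the abstract
# (768 samples, 8 features). This is NOT the real clinical data.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35],
                           random_state=0)

# Stratified split into 614 training and 154 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=154, stratify=y, random_state=0)

# RFE needs a ranking estimator that exposes feature_importances_ or coef_.
# Both the ranker and the choice of 5 retained features are assumptions.
selector = RFE(GradientBoostingClassifier(random_state=0),
               n_features_to_select=5)
selector.fit(X_train, y_train)
X_train_rfe = selector.transform(X_train)
X_test_rfe = selector.transform(X_test)


def evaluate(model, X_tr, X_te):
    """Fit on the training split and score the five metrics on the test split."""
    model.fit(X_tr, y_train)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "auc": roc_auc_score(y_test, proba),
    }


# Model factories; the gradient boosting entry is a stand-in for XGBoost.
models = {
    "Naive Bayes": lambda: GaussianNB(),
    "Gradient boosting (XGBoost stand-in)":
        lambda: GradientBoostingClassifier(random_state=0),
}

results = {}
for name, make_model in models.items():
    results[(name, "all features")] = evaluate(make_model(), X_train, X_test)
    results[(name, "RFE")] = evaluate(make_model(), X_train_rfe, X_test_rfe)

for (name, variant), scores in results.items():
    summary = ", ".join(f"{k}={v:.3f}" for k, v in scores.items())
    print(f"{name} [{variant}]: {summary}")
```

Because the data are synthetic, the printed scores will not reproduce the figures reported in the abstract. With the `xgboost` package installed, `xgboost.XGBClassifier` could replace the stand-in in both the `models` dictionary and the RFE ranker.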