Type 2 diabetes mellitus is a chronic disease with increasing prevalence that can cause serious complications if not detected early. Machine learning algorithms can aid prediction, but the choice of model and features strongly determines the accuracy of the results. This study compares the performance of the Extreme Gradient Boosting (XGBoost) and Naive Bayes algorithms in predicting type 2 diabetes with and without Recursive Feature Elimination (RFE) feature selection. The data, taken from the UCI Machine Learning Repository, comprise 768 samples and eight clinical features. The research process included data preprocessing, splitting the data into 614 training and 154 testing samples, applying RFE to select the most influential features, model training, and evaluation using accuracy, precision, recall, F1-score, and AUC. The results show that Naive Bayes without RFE achieves an accuracy of 70.77%, precision of 0.57377, recall of 0.648148, F1-score of 0.608696, and AUC of 0.772778, while Naive Bayes with RFE increases the accuracy to 74.02% and the AUC to 0.793333. XGBoost with RFE provides the best results, with an accuracy of 74.67%, precision of 0.653061, recall of 0.592593, F1-score of 0.621359, and the highest AUC of 0.804259. In addition, applying RFE improves computational efficiency. These findings indicate that RFE substantially improves both classification performance and computation time. The practical implication is that the model could aid early detection of diabetes in clinical settings. Further research could optimize the hyperparameters and use more diverse datasets.
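As a rough illustration of the pipeline described above, the sketch below uses Python with scikit-learn and xgboost. The dataset path (diabetes.csv), the 80/20 stratified split, the random seed, the choice of XGBoost as the RFE base estimator, and the number of retained features (five) are assumptions, since the abstract does not specify them.

# Minimal sketch of the described pipeline (assumptions noted above).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from xgboost import XGBClassifier

# Load the data: 768 samples, 8 clinical features, binary Outcome label.
df = pd.read_csv("diabetes.csv")  # hypothetical local copy of the dataset
X, y = df.drop(columns="Outcome"), df["Outcome"]

# 80/20 split, giving 614 training and 154 testing samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# RFE with an XGBoost base estimator (an assumption; the abstract does not
# name the estimator used to rank features). Keep the top five features.
selector = RFE(estimator=XGBClassifier(eval_metric="logloss"),
               n_features_to_select=5)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

def evaluate(model, Xtr, Xte):
    """Fit a model and report the five metrics used in the study."""
    model.fit(Xtr, y_train)
    pred = model.predict(Xte)
    proba = model.predict_proba(Xte)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }

print("NB  without RFE:", evaluate(GaussianNB(), X_train, X_test))
print("NB  with RFE:   ", evaluate(GaussianNB(), X_train_sel, X_test_sel))
print("XGB with RFE:   ", evaluate(XGBClassifier(eval_metric="logloss"),
                                   X_train_sel, X_test_sel))

Exact scores will differ from those reported above depending on the preprocessing, split, and hyperparameters actually used in the study.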