Diabetes is a non-communicable disease and a global health concern, affecting more than 422 million people worldwide, a number that is expected to keep rising. This study evaluates the performance of the Random Forest and Extreme Gradient Boosting (XGBoost) classification algorithms on a diabetes dataset obtained from Kaggle. To improve predictive performance, feature selection was performed with Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA), each intended to retain only the most relevant features. Without feature selection, Random Forest achieved an Area Under the Curve (AUC) of 0.8120 and XGBoost an AUC of 0.7666. With PSO-based feature selection, the AUC rose to 0.8582 for Random Forest and 0.8250 for XGBoost. GA-based feature selection gave better results still, with an AUC of 0.8612 for Random Forest and 0.8351 for XGBoost. Relative to the baseline, PSO therefore improved AUC by 5.7% (Random Forest) to 7.6% (XGBoost), while GA improved it by 6.1% (Random Forest) to 8.9% (XGBoost), making GA the more effective of the two methods. This study contributes to more accurate diabetes classification, which is expected to support faster and more accurate diagnosis.
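
The reported percentage ranges can be checked directly from the AUC values given above. The following is a minimal sketch, assuming the percentages are relative gains over each model's baseline AUC; the variable and function names are illustrative and do not come from the study.

```python
# AUC values as reported in the abstract.
baseline = {"Random Forest": 0.8120, "XGBoost": 0.7666}
with_pso = {"Random Forest": 0.8582, "XGBoost": 0.8250}
with_ga = {"Random Forest": 0.8612, "XGBoost": 0.8351}


def relative_gain_pct(before: float, after: float) -> float:
    """Relative AUC gain in percent, rounded to one decimal place.

    Assumption: the abstract's percentages are relative to the baseline AUC,
    not absolute differences in AUC points.
    """
    return round((after - before) / before * 100, 1)


# Gain per model for each feature selection method.
pso_gain = {m: relative_gain_pct(baseline[m], with_pso[m]) for m in baseline}
ga_gain = {m: relative_gain_pct(baseline[m], with_ga[m]) for m in baseline}

for model in baseline:
    print(f"{model}: PSO +{pso_gain[model]}%, GA +{ga_gain[model]}%")
```

Under this assumption the computed gains match the stated ranges: the lower bound of each range comes from Random Forest and the upper bound from XGBoost, which started from the lower baseline AUC.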
Copyrights © 2025