This study evaluates two feature selection methods, the statistics-based SelectKBest and the model-based Permutation Importance, for improving the performance of classification algorithms in diabetes prediction. A dataset of 17 clinical and demographic features was used to train 11 machine learning algorithms on the two subsets of selected features. Performance was evaluated using accuracy, precision, recall, F1-score, ROC AUC, and training time. SelectKBest improved the performance of Random Forest, which achieved an accuracy of 82.7%, a precision of 0.8, a recall of 0.5, and an F1-score of 0.615. Permutation Importance yielded more consistent performance: six models, including Random Forest, K-Nearest Neighbors, and Quadratic Discriminant Analysis (QDA), reached accuracies of up to 86.2%, and QDA stood out with the highest ROC AUC of 0.887, indicating better class discrimination. These findings underscore the advantage of Permutation Importance in selecting relevant and varied features, including demographic factors, thereby improving overall prediction accuracy. In practice, Random Forest with SelectKBest is recommended for applications requiring fast and interpretable models, while QDA and Gradient Boosting with Permutation Importance are recommended where high accuracy and sensitivity are needed. This study strengthens the foundation for developing more accurate and widely applicable diabetes prediction models.
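The two feature-selection strategies compared above can be sketched with scikit-learn. This is a minimal illustration, not the study's actual pipeline: the real work used a 17-feature clinical/demographic diabetes dataset and 11 algorithms, so a synthetic dataset and a single Random Forest stand in here, and the choice of k=8 retained features is arbitrary.

```python
# Hedged sketch of the two feature-selection approaches (assumptions:
# synthetic data in place of the study's clinical dataset; k=8 is illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.inspection import permutation_importance
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Stand-in for the 17-feature dataset used in the study.
X, y = make_classification(n_samples=1000, n_features=17,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Statistics-based: SelectKBest keeps the k features with the
#    highest ANOVA F-scores, computed independently of any model.
skb = SelectKBest(f_classif, k=8).fit(X_tr, y_tr)
rf_skb = RandomForestClassifier(random_state=0).fit(skb.transform(X_tr), y_tr)
acc_skb = accuracy_score(y_te, rf_skb.predict(skb.transform(X_te)))

# 2) Model-based: permutation importance ranks features by how much
#    shuffling each one degrades a fitted model's held-out score.
rf_full = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pi = permutation_importance(rf_full, X_te, y_te, n_repeats=10, random_state=0)
top = np.argsort(pi.importances_mean)[::-1][:8]
rf_pi = RandomForestClassifier(random_state=0).fit(X_tr[:, top], y_tr)
acc_pi = accuracy_score(y_te, rf_pi.predict(X_te[:, top]))

print(f"SelectKBest accuracy: {acc_skb:.3f}")
print(f"Permutation-importance accuracy: {acc_pi:.3f}")
```

The key contrast mirrors the abstract's finding: SelectKBest scores each feature in isolation, while permutation importance accounts for how a trained model actually uses features together, which can surface relevant demographic variables that univariate tests miss.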
Copyright © 2025