JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 9 No. 5 (2025): October 2025

Comparison of Logistic Regression, Random Forest, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) Algorithms in Diabetes Prediction

Kurniawan, M. Fadli
Megawaty, Dyah Ayu



Article Info

Publish Date
04 Oct 2025

Abstract

Diabetes mellitus is a prevalent chronic disease whose incidence continues to rise worldwide, placing significant strain on healthcare systems. Timely prediction of diabetes is crucial for early intervention and management. This study compares the effectiveness of four machine learning algorithms, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), in identifying diabetes cases using a large public dataset of 100,000 patient records obtained from Kaggle. The dataset contains nine variables, among them age, gender, body mass index (BMI), blood glucose level, and HbA1c level. Because the classes were imbalanced, with fewer than 10% of records labeled positive (diabetic), the Synthetic Minority Oversampling Technique (SMOTE) was applied only to the training data after an 80:20 stratified split. All models were evaluated with 5-fold stratified cross-validation, and performance was measured by accuracy, precision, recall, F1-score, area under the ROC curve (AUC), and training time. Random Forest achieved the highest accuracy (96.88%) and AUC (99.70%), indicating the strongest overall performance. McNemar's tests showed that the differences in performance between Random Forest and the other models were statistically significant. Feature importance analysis identified HbA1c, glucose level, and BMI as the most influential predictors. These results indicate that, among the models tested, Random Forest offers the most balanced combination of accuracy, interpretability, and robustness, making it well suited to real-world clinical screening where early detection of diabetes is critical.
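Two methodological points in the abstract are easy to get wrong in practice: SMOTE must see only the training split (otherwise synthetic rows leak test information), and McNemar's test compares two classifiers on the same test rows using only the cases where they disagree. The sketch below illustrates both in plain Python. It is not the authors' code: the function names (`stratified_split`, `smote`, `mcnemar`) and the toy data are hypothetical, SMOTE is reduced to its basic interpolation step, and the McNemar statistic uses the common continuity-corrected chi-square form. A real pipeline would more likely use a library implementation such as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train, test) index lists that keep each class's share roughly equal."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows by SMOTE-style interpolation.

    Each new row lies on the segment between a random minority row and one of
    its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test (continuity-corrected chi-square, 1 df) on paired predictions."""
    only_a = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a == y and b != y)
    only_b = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a != y and b == y)
    if only_a + only_b == 0:
        return 0.0, 1.0  # the two models never disagree on correctness
    # Clip at 0 so equal disagreement counts give a statistic of 0
    stat = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) survival function
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical toy data: 200 rows, 2 features, 10% positive class
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if i % 10 == 0 else 0 for i in range(200)]

    train_idx, test_idx = stratified_split(y, test_frac=0.2, seed=1)
    # Oversample using training rows only; the test rows keep their original imbalance
    minority = [X[i] for i in train_idx if y[i] == 1]
    n_major = sum(1 for i in train_idx if y[i] == 0)
    synthetic = smote(minority, n_new=n_major - len(minority), seed=2)
    print(len(train_idx), len(test_idx), len(minority) + len(synthetic), n_major)
    # prints: 160 40 144 144
```

Here the 40 test rows keep their natural class ratio, while the training minority class is raised from 16 to 144 rows to match the majority. Under 5-fold stratified cross-validation the usual practice is the same: oversample each training fold and score on the untouched validation fold.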

Copyrights © 2025






Journal Info

Abbrev

JAIC

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC) Volume 2, Number 1, July 2018. Contains papers drawn from research in the fields of Informatics Technology and Applied Computing, with e-ISSN: 2548-9828. It contains 3 articles that have been substantively reviewed by the editorial team and ...