Diabetes is a chronic metabolic disorder characterized by elevated blood glucose levels, caused by the body’s inability to produce insulin or to respond to it effectively. The increasing prevalence of diabetes in Indonesia calls for accurate, data-driven early detection systems to support the diagnostic process. This study compares the performance of three machine learning algorithms, Support Vector Machine (SVM), Random Forest, and Logistic Regression, in predicting diabetes from patient clinical data. The dataset, titled "100,000 Diabetes Clinical Dataset", was obtained from Kaggle. The analysis was carried out in the Orange Data Mining software in several stages: data preprocessing, one-hot encoding, model training, and evaluation with 10-fold cross-validation. Random Forest achieved the best performance with an accuracy of 97.1%, followed by Logistic Regression at 96.0% and SVM at 92.3%. These findings indicate that, on this dataset, an ensemble-based method such as Random Forest produces more stable and accurate predictions for diabetes diagnosis than the other two algorithms.
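The study ran its workflow in Orange Data Mining, so the following is only a minimal plain-Python sketch of two of the stages named above, one-hot encoding and 10-fold cross-validation. All function names, the toy records, and the majority-class placeholder model are assumptions made for illustration; they are not the study's data or its SVM, Random Forest, or Logistic Regression models.

```python
import random
from collections import Counter


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean accuracy over k folds; each fold is the test set exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Placeholder model (assumption): always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy, made-up records for a hypothetical categorical column and binary label.
genders = ["Female", "Male", "Female", "Male", "Female"] * 4
labels = [0, 0, 1, 0, 0] * 4

encoded, categories = one_hot(genders)
mean_accuracy = cross_val_accuracy(
    encoded, labels, fit_majority, predict_majority, k=10
)
```

Any classifier could be substituted by passing a different `fit` and `predict` pair. The majority-class baseline is used here only because it keeps the sketch free of external dependencies.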
Copyright © 2025