Diabetes Mellitus (DM) is a chronic metabolic disease that poses a global health threat, and its prevalence increases every year. Early detection using data mining techniques can help prevent severe complications and support medical practitioners in making faster clinical decisions. This study compares the performance of two popular machine learning algorithms, Support Vector Machine (SVM) and Random Forest, in predicting diabetes risk. Unlike previous studies, which often rely on complex feature optimization or oversampling methods such as SMOTE, this research evaluates baseline performance in order to observe each algorithm's unoptimized capability on a Pima Indians Diabetes dataset of 10,004 medical records with 22 clinical attributes. The experiments were conducted in RapidMiner using 10-fold cross-validation, so that every record is used for testing exactly once and the reported accuracy does not depend on a single train/test split. Random Forest achieved the higher accuracy at 82.19%, while SVM obtained 79.40%, a difference of 2.79 percentage points. These results suggest that the ensemble learning approach of Random Forest handles clinical data with high variability more robustly than a single-hyperplane method such as SVM under default parameters. This study is intended to serve as a baseline benchmark for the further development of diabetes prediction models.
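The 10-fold cross-validation procedure mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the RapidMiner process used in the study: the helper names (`k_fold_indices`, `cross_val_accuracy`), the `MajorityClassifier` stand-in model, and the toy data are all hypothetical. Any model exposing `fit` and `predict`, for example an SVM or Random Forest implementation, could be passed in place of the stand-in.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is the test set exactly once.

    Assumes len(X) >= k so that no fold is empty.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # A fresh model is trained for every fold, so no test record leaks into training.
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = model.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy data (not the study's dataset): 100 records, roughly one third positive.
X = [[i] for i in range(100)]
y = [1 if i % 3 == 0 else 0 for i in range(100)]

baseline_accuracy = cross_val_accuracy(MajorityClassifier, X, y, k=10)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2%}")
```

Running both candidate models through the same `cross_val_accuracy` call with the same `seed` would evaluate them on identical folds, which is what makes a side-by-side accuracy comparison fair. A majority-class baseline like the one above is also a useful reference point when interpreting accuracy on imbalanced clinical data.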
Copyright © 2025