Diabetes mellitus is a global health problem whose prevalence continues to rise. It places a substantial economic burden on healthcare systems because it often leads to severe complications such as cardiovascular disease and kidney failure. Early prediction and detection are therefore crucial to mitigating its adverse effects. Data mining and machine learning offer ways to process complex medical data, extract deeper insights, and support data-driven decision-making. This study develops a diabetes mellitus prediction model using the Stochastic Gradient Boosting (SGB) algorithm. The model uses a dataset of clinical variables, including glucose level, blood pressure, body mass index (BMI), and genetic history, to identify diabetes risk. The model performs well across three dataset split ratios: 70:30, 80:20, and 90:10. It achieved its highest accuracy, 95.50%, at the 70:30 ratio, with an area under the curve (AUC) of 0.9862, indicating that it separates the positive (diabetes) and negative (non-diabetes) classes effectively. At the 80:20 and 90:10 ratios, it achieved accuracies of 92.75% and 92.31%, with AUC values of 0.9767 and 0.9777, respectively, indicating consistent performance. The high accuracy is attributed to the iterative boosting approach of SGB, in which each iteration corrects the prediction errors left by the previous ones. In addition, regularization mechanisms such as the learning rate and subsampling help prevent overfitting, making the algorithm effective for datasets with complex patterns.
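The evaluation described above can be illustrated with a minimal sketch. It is not the study's implementation. It assumes scikit-learn's `GradientBoostingClassifier`, which behaves as stochastic gradient boosting when `subsample` is below 1.0, and it assumes the ratios denote training:testing splits. The data are synthetic stand-ins generated with `make_classification`, and the hyperparameter values and the helper name `evaluate_split` are illustrative choices, so the printed scores will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a clinical dataset (assumption): 8 numeric features
# and a binary label, where 1 plays the role of "diabetes".
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5, random_state=0
)


def evaluate_split(test_size, seed=0):
    """Train on one split and return (accuracy, AUC) on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = GradientBoostingClassifier(
        n_estimators=200,    # number of boosting iterations (illustrative)
        learning_rate=0.1,   # shrinks each tree's contribution
        subsample=0.8,       # < 1.0 makes the boosting stochastic
        max_depth=3,
        random_state=seed,
    )
    model.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    positive_proba = model.predict_proba(X_test)[:, 1]
    accuracy = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, positive_proba)
    return accuracy, auc


# The three split ratios named in the abstract, as test-set fractions.
splits = [("70:30", 0.3), ("80:20", 0.2), ("90:10", 0.1)]
results = {ratio: evaluate_split(test_size) for ratio, test_size in splits}

for ratio, (accuracy, auc) in results.items():
    print(f"{ratio}: accuracy={accuracy:.4f}, AUC={auc:.4f}")
```

A lower `learning_rate` combined with more estimators, or a smaller `subsample`, generally trades training speed for stronger regularization. Suitable values depend on the dataset and would need to be tuned.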
Copyrights © 2025