The global rise in diabetes mellitus cases, including in Indonesia, demands a more adaptive, lifestyle-based risk prediction strategy. This study evaluates and compares the performance of Support Vector Machine (SVM) and Naive Bayes in diabetes risk classification using a hybrid clustering-classification approach. The data were obtained from the Kaggle platform and comprise 8,500 diabetes patient records described by the attributes age, gender, and smoking history. Clustering was carried out using K-Means (k=3), resulting in three main groups with different lifestyle characteristics. The classification results showed that Naive Bayes provided stable performance with an F1-score of 0.975. SVM achieved a higher F1-score of 0.983 and a perfect AUC of 1.000, and classified all data in cluster C3 without error. However, SVM recorded a higher classification error in the majority cluster. The study concludes that SVM outperformed Naive Bayes by 0.8 percentage points in F1-score. Naive Bayes is more suitable for balanced data, while SVM is effective for detecting patterns in minority groups. These findings support the use of a hybrid approach in early detection systems for diabetes based on lifestyle data. Future research directions include integrating additional variables and ensemble techniques to improve model generalization.
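The per-cluster comparison and the reported 0.8-point gap can be illustrated with a minimal sketch. It assumes binary labels and uses a hypothetical toy data set; the helper names `f1_score` and `f1_per_cluster` are introduced here for illustration and are not taken from the study. Only the two overall F1-scores (0.975 and 0.983) come from the abstract.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_per_cluster(clusters, y_true, y_pred):
    """Group predictions by cluster label and score each group separately."""
    grouped = defaultdict(lambda: ([], []))
    for c, t, p in zip(clusters, y_true, y_pred):
        grouped[c][0].append(t)
        grouped[c][1].append(p)
    return {c: f1_score(t, p) for c, (t, p) in grouped.items()}


# Hypothetical toy data (NOT from the study): cluster id, true label, prediction.
clusters = ["C1", "C1", "C1", "C1", "C2", "C2", "C3", "C3"]
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
per_cluster = f1_per_cluster(clusters, y_true, y_pred)

# Overall F1-scores as reported in the abstract.
f1_naive_bayes, f1_svm = 0.975, 0.983
gap_points = round((f1_svm - f1_naive_bayes) * 100, 1)  # percentage points

print(per_cluster)
print(gap_points)
```

Scoring each cluster separately, rather than only the pooled data, is what makes it possible to see a model doing well on a small cluster such as C3 while making more errors on the majority cluster.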
Copyrights © 2025