Diabetes, also known as diabetes mellitus, is a chronic condition in which the pancreas does not produce enough insulin or the body cannot use it effectively, which leads to elevated levels of glucose in the blood. Diabetes is a dangerous disease. Its exact cause is not fully known, but lifestyle and genetic factors are thought to play a role. Bioinformatics researchers are working to combat this disease by building systems that help predict diabetes. According to existing research, many diabetes prediction systems use methods such as C4.5, KNN, Naive Bayes, and linear SVM. This study analyzes the accuracy of diabetes data classification using SVM with several choices of variables, on both the original and the balanced data. In the experiment on the original data (768 rows), the variable with the highest correlation is glucose, and using three variables (glucose, BMI, age) gives the highest accuracy with the RBF and polynomial SVM kernels (0.773). On the balanced data, using five variables (pregnancies, glucose, BMI, diabetes pedigree function, age) gives the highest classification accuracy with linear SVM (0.775). Conclusion: balancing the number of samples in the diabetes classes yields a slight increase in classification accuracy, from the initial 0.766 to 0.775.
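The abstract does not state how the classes were balanced. As one possible illustration only, the sketch below balances a two-class data set by random oversampling, that is, by duplicating randomly chosen rows of the smaller class until both classes have the same size. The function name `oversample_minority`, the `label_of` parameter, and the toy rows are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def oversample_minority(rows, label_of, seed=0):
    """Duplicate random rows of smaller classes until all classes match the largest.

    Assumption: random oversampling is used here purely as an example of
    class balancing; the study may have used a different technique.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra rows with replacement to close the gap to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (glucose, outcome) pairs, 10 negative and 4 positive.
toy_rows = [(g, 0) for g in range(80, 90)] + [(g, 1) for g in range(150, 154)]

balanced_rows = oversample_minority(toy_rows, label_of=lambda row: row[1])
print(Counter(row[1] for row in balanced_rows))
```

After balancing, both classes in the toy data contain 10 rows. In practice, oversampling is usually applied only to the training split, so that duplicated rows do not leak into the data used to measure accuracy.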
Copyrights © 2026