The growing number of people with diabetes is an international health problem. Early diagnosis and accurate classification are essential for preventing diabetic complications. This study examines how the train/test split ratio affects the performance of the machine learning algorithms Random Forest, Naive Bayes, and Support Vector Machine (SVM) in classifying diabetes. The data come from Bojonegoro Regency Hospital and consist of 128 samples with 10 main features. The data were first preprocessed and then divided into training and testing sets at ratios of 90:10, 80:20, 70:30, 60:40, and 50:50. Each algorithm was evaluated with a confusion matrix, from which accuracy, precision, recall, and F1 score were computed. The analysis focuses on accuracy, and the results show that the split ratio affects algorithm performance. Random Forest achieved 100% accuracy in some scenarios and proved to be the most effective algorithm for diabetes classification. In conclusion, both algorithm selection and data split composition are important for optimizing model performance. These results can inform the development of more accurate and efficient machine learning-based diagnosis systems. Further research could consider larger datasets and additional algorithms.
Copyrights © 2025