Garuda - Garba Rujukan Digital

Bulletin of Computer Science Research

Vol. 6 No. 1 (2025): December 2025

Astofa, Aniq
Rosyani, Perani
Rahmawati, Rahmawati
Apandi, Sopiyan

Publish Date
31 Dec 2025

Diabetes is a non-communicable disease that is often detected only at an advanced stage, which increases the risk of serious complications. Machine learning has the potential to support early diabetes detection; however, most previous studies have focused on large-scale datasets and high predictive accuracy, while methodological evaluations on small clinical datasets remain limited. This study evaluates and compares the performance of several machine learning algorithms for early diabetes prediction on a limited clinical dataset, with particular emphasis on how data characteristics affect model performance. The dataset consists of 22 samples with eight clinical features and one target variable, divided into 17 training samples and 5 testing samples. The research stages comprise data preprocessing, training–testing data splitting, model training, and performance evaluation using accuracy, precision, recall, F1-score, and ROC-AUC. The algorithms evaluated are Logistic Regression, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost. The experimental results indicate that none of the evaluated models detected the diabetes class: precision, recall, and F1-score were zero for all models. Although Random Forest and XGBoost achieved an accuracy of 0.6, this value was largely driven by the dominance of the non-diabetes class in the very small test set. Correlation analysis further indicates that Glucose, BMI, and Diabetes Pedigree Function are the features most strongly associated with diabetes status.
The main contribution of this study is a realistic methodological evaluation of machine learning models applied to small clinical datasets, showing that limited sample size and the training–testing partition substantially affect both model performance and the interpretation of evaluation metrics. These findings offer a methodological reference for future studies that aim to develop more reliable early diabetes prediction models under constrained clinical data conditions.
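
The reported combination of 0.6 accuracy with zero precision, recall, and F1-score can be illustrated with a small worked example. The sketch below is not the authors' code: the abstract does not state the class split of the 5 test samples or the individual predictions, so the labels used here (3 non-diabetes, 2 diabetes, with a model that predicts non-diabetes for every sample) are an assumed scenario chosen only because it reproduces the reported figures. The helper name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Convention used here: report 0.0 when a ratio is undefined (0 / 0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# ASSUMED test set: 3 non-diabetes (0) and 2 diabetes (1) samples.
# The real class split is not given in the abstract.
y_true = [0, 0, 0, 1, 1]
# ASSUMED model behaviour: always predicts the majority class.
y_pred = [0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.6, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

With only 5 test samples, each prediction moves accuracy by 0.2, so a model that never predicts the diabetes class can still score 0.6 purely from the majority class. This is why the positive-class metrics, rather than accuracy, carry the finding.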

Journal Info

Bulletin of Computer Science Research

Abbrev

bulletincsr

Publisher

Forum Kerjasama Pendidikan Tinggi

Subject

Computer Science & IT

Description

Bulletin of Computer Science Research covers the whole spectrum of Computer Science, which includes, but is not limited to: • Artificial Immune Systems, Ant Colonies, and Swarm Intelligence • Bayesian Networks and Probabilistic Reasoning • Biologically Inspired Intelligence • Brain-Computer ...

Comparative Evaluation of Machine Learning Algorithms for Early Diabetes Prediction
