Articles

Deteksi Diabetes Mellitus dengan Menggunakan Teknik Ensemble XGBoost dan LightGBM (Detecting Diabetes Mellitus Using XGBoost and LightGBM Ensemble Techniques)
Pratama, Naufal Adhi; Utomo, Danang Wahyu
JISKA (Jurnal Informatika Sunan Kalijaga) Vol. 11 No. 1 (2026): January 2026
Publisher : UIN Sunan Kalijaga Yogyakarta

DOI: 10.14421/jiska.4908

Abstract

Diabetes mellitus is a metabolic disease characterized by elevated blood sugar levels caused by impaired insulin secretion, impaired insulin action, or both. The disease has a major impact on public health and contributes to high morbidity and mortality in many countries, so prevention and early detection are essential to limit its adverse effects. This study analyzes and applies machine learning algorithms to detect diabetes mellitus, focusing on XGBoost and LightGBM. The dataset includes features related to diabetes risk factors: age, gender, body mass index (BMI), hypertension, smoking history, HbA1c level, and blood glucose level. Preprocessing cleaned the data and balanced the classes using the SMOTE-Tomek technique. The models were then built and evaluated with K-Fold cross-validation to measure their accuracy and stability. XGBoost achieved 97.31% accuracy and LightGBM 97.26%. Combining the two models through blending raised accuracy to 97.51%, indicating that model combination can improve predictive performance. These results show the potential of machine learning algorithms, particularly XGBoost and LightGBM, for detecting diabetes mellitus accurately and efficiently, and may contribute to decision support systems for earlier and more effective diagnosis of diabetes.
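The abstract does not say how the two models are blended. One common variant, assumed here, is a weighted average of the two models' predicted probabilities, with the weight chosen on a held-out split. The plain-Python sketch below uses hypothetical probability vectors as stand-ins for XGBoost and LightGBM outputs (in practice these would come from each classifier's `predict_proba`); the function names are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(p_a: Sequence[float], p_b: Sequence[float], w: float) -> List[float]:
    """Weighted average of two models' P(y = 1); w is the weight on model A."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert probabilities to 0/1 labels at a fixed threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def tune_blend_weight(
    p_a: Sequence[float], p_b: Sequence[float], y_holdout: Sequence[int], steps: int = 10
) -> Tuple[float, float]:
    """Grid-search w in {0, 1/steps, ..., 1}; keep the first weight with the best accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        acc = accuracy(y_holdout, to_labels(blend(p_a, p_b, w)))
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Hypothetical holdout probabilities (not results from the paper).
p_xgb = [0.9, 0.4, 0.2, 0.7]
p_lgbm = [0.8, 0.7, 0.1, 0.4]
y_holdout = [1, 1, 0, 1]

print(accuracy(y_holdout, to_labels(p_xgb)))   # each model alone misclassifies one case
print(accuracy(y_holdout, to_labels(p_lgbm)))
print(tune_blend_weight(p_xgb, p_lgbm, y_holdout))
```

Tuning the weight and reporting accuracy on the same holdout, as this toy example does, overstates performance; inside a K-fold evaluation the weight would be chosen on training folds only.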
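Perbandingan Kinerja Model Deep Learning Convolutional Neural Network (CNN) dan Multilayer Perceptron (MLP) untuk Klasifikasi Penyakit Diabetes Melitus (Performance Comparison of Convolutional Neural Network (CNN) and Multilayer Perceptron (MLP) Deep Learning Models for Diabetes Mellitus Classification)
Putri, Cindy Arlita; Utomo, Danang Wahyu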
Infotekmesin Vol 17 No 1 (2026): Infotekmesin: Januari 2026
Publisher : P3M Politeknik Negeri Cilacap

DOI: 10.35970/infotekmesin.v17i1.2984

Abstract

Diabetes mellitus is a chronic disease with a steadily increasing number of sufferers. Early detection remains difficult because conventional methods often recognize the disease only at an advanced stage. This study evaluates the performance of a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP) in classifying diabetes using the NHANES dataset (2,278 samples, of which 21 are positive for diabetes). The models were tested with k-fold cross-validation using accuracy, precision, recall, F1-score, and ROC-AUC. The results show high accuracy and precision (0.99), an average recall of 0.67, and an F1-score of 0.75. A paired t-test comparing CNN with MLP gives a p-value of 0.374; CNN scores higher on some metrics, but the ROC-AUC difference is not significant. CNNs can capture complex patterns in health features such as glucose, BMI, and age, whereas MLPs remain reliable as a baseline. In conclusion, both CNN and MLP have potential for diabetes classification on tabular data, with CNN tending to be more effective at detecting non-linear patterns in the imbalanced dataset.
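The gap between accuracy (0.99) and recall (0.67) above is typical of a heavily imbalanced dataset: with only 21 positives among 2,278 samples, a model that never predicts diabetes is already more than 99% accurate. The stdlib Python sketch below computes the reported metrics from confusion-matrix counts and the paired t statistic over per-fold scores. All counts and fold scores are hypothetical, and the function names are illustrative; a library routine such as SciPy's `scipy.stats.ttest_rel` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (diabetic) class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for paired per-fold scores a[i] vs b[i].

    Raises ZeroDivisionError if every fold difference is identical (zero spread).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical counts sized to the dataset above (2,278 samples, 21 positive).
always_negative = classification_metrics(tp=0, fp=0, fn=21, tn=2257)
some_detector = classification_metrics(tp=14, fp=2, fn=7, tn=2255)
print(always_negative)  # accuracy above 0.99 even though recall is 0
print(some_detector)

# Hypothetical per-fold F1 scores for two models (not the paper's numbers).
cnn_f1 = [0.80, 0.75, 0.70, 0.85, 0.72]
mlp_f1 = [0.78, 0.76, 0.65, 0.80, 0.74]
print(paired_t_statistic(cnn_f1, mlp_f1))
```

Because cross-validation folds share most of their training data, per-fold scores are not independent, so a plain paired t-test on them is only an approximation.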
Implementasi Stacking Ensemble Berbasis Cross Domain untuk Klasifikasi Diabetes (Implementation of a Cross-Domain Stacking Ensemble for Diabetes Classification)
Ijayanti, Selvi; Utomo, Danang Wahyu
Infotekmesin Vol 17 No 1 (2026): Infotekmesin: Januari 2026
Publisher : P3M Politeknik Negeri Cilacap

DOI: 10.35970/infotekmesin.v17i1.3000

Abstract

Diabetes mellitus is a chronic disease whose prevalence continues to rise, creating demand for accurate early-detection solutions that adapt to the diversity of patient data. This study implements a stacking ensemble for diabetes risk classification with a cross-domain approach that integrates two widely used datasets: the PIMA Indians Diabetes dataset and NHANES. The experimental pipeline includes feature and label harmonization, median imputation of missing values, standardization, and class balancing through oversampling. The base models are Random Forest, Support Vector Machine, Decision Tree, and Multi-Layer Perceptron, with Logistic Regression as the meta learner in the stacking scheme. Evaluation used stratified k-fold cross-validation and a held-out test split, as well as cross-domain scenarios that measure how well the model adapts across domains. In the domain-adaptive scenario, the stacking ensemble achieved an accuracy of approximately 0.987, a recall of 1.000, and an ROC-AUC of approximately 0.987, while the single base learner reached an accuracy of 0.976, a recall of 1.000, and an ROC-AUC of approximately 0.977, showing that domain-adaptive stacking performs consistently better than the base model. These findings confirm the advantage of domain-adaptive stacking in handling heterogeneous medical data and class imbalance, and reinforce its potential as a decision support system for early detection of diabetes in a wider population.
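In a stacking scheme like the one described above, the meta learner is usually trained on base-learner predictions made out of fold, so it never sees predictions for rows a base learner was trained on. The stdlib Python sketch below illustrates that structure under simplifying assumptions: two toy base learners (k-nearest neighbours and a nearest-centroid scorer) stand in for the Random Forest, SVM, Decision Tree, and MLP named in the abstract, a hand-rolled logistic regression plays the meta learner, and folds are stratified by class. All function names and data are hypothetical; with scikit-learn the same structure is available as `StackingClassifier` with `final_estimator=LogisticRegression()`.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], float]                  # returns P(y = 1 | x)
Learner = Callable[[List[Vector], List[int]], Predictor]


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin into k folds to keep class ratios."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 300) -> Predictor:
    """Meta learner: unregularised logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_knn(X: List[Vector], y: List[int], k: int = 3) -> Predictor:
    """Toy base learner: share of positive labels among the k nearest training rows."""
    def predict(x: Vector) -> float:
        nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Toy base learner: score rises as x moves closer to the positive-class centroid."""
    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(x: Vector) -> float:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return predict


def fit_stacking(X: List[Vector], y: List[int], learners: Sequence[Learner], k: int = 5) -> Predictor:
    """Fit the meta learner on out-of-fold base predictions, then refit bases on all rows."""
    meta_X = [[0.0] * len(learners) for _ in X]        # one column per base learner
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        for m, fit in enumerate(learners):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in test_idx:
                meta_X[i][m] = predict(X[i])
    meta_model = fit_logistic(meta_X, y)
    base_models = [fit(X, y) for fit in learners]
    return lambda x: meta_model([p(x) for p in base_models])


# Toy two-cluster data (hypothetical, not NHANES or PIMA rows).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2],
     [2.0, 2.0], [2.2, 1.9], [1.9, 2.1], [2.1, 2.2], [2.0, 2.3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = fit_stacking(X, y, [fit_knn, fit_nearest_centroid])
print(model([0.1, 0.1]), model([2.1, 2.0]))
```

The out-of-fold step is what separates stacking from simply averaging the base models: the meta learner learns how much to trust each base learner from predictions on rows that learner did not see. The imputation, standardization, and oversampling steps from the abstract are omitted here.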