Garuda - Garba Rujukan Digital

p-Index (2021 - 2026): 0.23

This author has published in the following journal:

Journal of Applied Informatics and Computing

Latuconsina, Muhammad Sidik

Unknown Affiliation

Author-ID : 9678701

Computer Science & IT

Published: 1 document

Articles

Comparison of LightGBM and CatBoost Algorithms for Diabetes Prediction Based on Clinical Data
Latuconsina, Muhammad Sidik; Rahardi, Majid
Journal of Applied Informatics and Computing Vol. 10 No. 1 (2026): February 2026
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v10i1.12179

Diabetes mellitus is a global health challenge that requires accurate early detection to prevent fatal complications. However, clinical data often have imbalanced class distributions, which makes it harder for standard prediction models to detect diabetic (positive) patients. This study compares the performance of two modern gradient boosting algorithms, LightGBM and CatBoost, in predicting diabetes risk. Random Forest and Logistic Regression were included as baseline models for benchmarking. To address the class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data during preprocessing. The dataset was sourced from the Kaggle public repository (Diabetes Prediction Dataset) and comprises 100,000 patient medical records with clinical attributes such as age, body mass index (BMI), and HbA1c levels. Performance was evaluated using Accuracy, Precision, Recall, F1-Score, and Area Under the Curve (AUC). The results were close: LightGBM achieved the highest Accuracy at 97.16%, while CatBoost showed superior sensitivity (Recall) at 69.71% and the highest F1-Score at 80.48%. This makes CatBoost the most reliable model for minimizing false negatives compared to LightGBM and Random Forest, whereas Logistic Regression showed the lowest performance. Furthermore, interpretability analysis using SHAP (SHapley Additive exPlanations) revealed that HbA1c and blood glucose levels were the most dominant features in detection, which is consistent with clinical diagnosis. The study concludes that the CatBoost algorithm combined with SMOTE offers more sensitive, transparent, and efficient diabetes prediction for medical screening.
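
The gap between the reported Accuracy (above 97%) and Recall (about 70%) is typical of imbalanced data, where the many correctly classified negatives dominate Accuracy. The sketch below illustrates this with the standard metric definitions. The confusion-matrix counts are hypothetical and are not taken from the paper. The helper `implied_precision` is also an illustration: it assumes the reported F1-Score and Recall refer to the same model and the positive class, and that F1 is the usual harmonic mean of Precision and Recall, so the value it returns is a rough derived estimate rather than a figure stated in the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical test set (NOT from the paper): 1,000 patients, 85 diabetic.
# 25 diabetic patients are missed, yet accuracy stays high because the
# 912 correctly classified negatives dominate the total.
example = classification_metrics(tp=60, fp=3, fn=25, tn=912)

# Precision implied by the CatBoost figures quoted in the abstract
# (F1-Score 80.48%, Recall 69.71%), under the assumptions stated above.
catboost_precision = implied_precision(f1=0.8048, recall=0.6971)

print({name: round(value, 4) for name, value in example.items()})
print(round(catboost_precision, 4))
```

In this hypothetical case Accuracy is 97.2% while Recall is only about 70.6%, which is why the abstract weighs Recall and F1-Score rather than Accuracy alone when judging suitability for screening.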

Co-Authors: Majid Rahardi
