JOURNAL OF APPLIED INFORMATICS AND COMPUTING
Vol. 9 No. 5 (2025): October 2025

Comparison of Light Gradient Boosting Machine, eXtreme Gradient Boosting, and CatBoost with Balancing and Hyperparameter Tuning for Hypertension Risk Prediction on Clinical Dataset

Murtiningsih, Dewi Ayu
Sari, Bety Wulan
Fajri, Ika Nur



Article Info

Publish Date
18 Oct 2025

Abstract

Hypertension is a highly prevalent chronic condition and a major contributor to cardiovascular disease, so early identification is an important preventive measure. This study evaluates three boosting algorithms for predicting hypertension risk: eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and CatBoost. A publicly available dataset of 4,363 samples was used. The pipeline comprised data preprocessing; feature selection by a voting scheme that combines Boruta, Recursive Feature Elimination (RFE), and SelectKBest; and handling of class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) and the Adaptive Synthetic Sampling Approach (ADASYN). The models were further tuned by hyperparameter optimization using GridSearchCV with Repeated Stratified K-Fold Cross-Validation. All three algorithms showed strong predictive performance, with CatBoost performing best: an accuracy of 0.992, precision of 0.992, recall of 0.992, F1-score of 0.992, and ROC-AUC of 0.9987. Confusion-matrix analysis further confirmed that CatBoost produced the fewest misclassifications of the three models. SHapley Additive exPlanations (SHAP) were used for model interpretability and identified blood pressure, body mass index (BMI), overall physical activity, waist circumference, triglyceride level, age, and LDL cholesterol level as the main factors driving the predicted hypertension risk, consistent with established medical knowledge. To support real-world use, the best-performing model was deployed in a user-friendly web interface that lets users estimate their hypertension risk interactively.
These findings show that boosting algorithms, particularly CatBoost, provide an accurate, reliable, and interpretable machine learning approach for building hypertension risk prediction systems.
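Two of the data-level steps named in the abstract, combining Boruta, RFE, and SelectKBest by vote and oversampling the minority class with SMOTE, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-of-3 voting threshold, the function names `vote_features` and `smote_oversample`, and the feature names and values in the demo are hypothetical, and each selector's output is assumed to be already computed. ADASYN, which assigns more synthetic samples to minority points that have more majority-class neighbours, is not shown.

```python
import math
import random
from collections import Counter


def vote_features(selections, min_votes=2):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` maps a selector name (e.g. "boruta", "rfe", "kbest") to the
    collection of feature names that selector kept. The default 2-of-3
    threshold is an ASSUMPTION; the abstract does not state the voting rule.
    """
    counts = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in counts.items() if n >= min_votes)


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic minority samples in the SMOTE style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours
    (Euclidean distance).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical selector outputs, for illustration only.
    selected = vote_features({
        "boruta": {"systolic_bp", "bmi", "age", "ldl"},
        "rfe": {"systolic_bp", "bmi", "waist", "triglycerides"},
        "kbest": {"systolic_bp", "age", "waist", "physical_activity"},
    })
    print(selected)  # ['age', 'bmi', 'systolic_bp', 'waist']

    # Hypothetical minority-class rows (e.g. blood pressure, BMI).
    minority = [[140.0, 31.2], [150.0, 29.8], [145.0, 33.0]]
    print(len(smote_oversample(minority, n_new=4, k=2)))  # 4
```

In practice, the `SMOTE` and `ADASYN` classes in imbalanced-learn (`imblearn.over_sampling`) provide tested implementations through `fit_resample`. Resampling is normally applied only to the training folds so that synthetic points do not leak into the validation data.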

Copyright © 2025






Journal Info

Abbrev

JAIC

Subject

Computer Science & IT

Description

Journal of Applied Informatics and Computing (JAIC) Volume 2, Number 1, July 2018. It contains papers based on research in the fields of Informatics Technology and Applied Computing, with e-ISSN: 2548-9828. It includes 3 articles that have been substantively reviewed by the editorial team and ...