Kardono, Dicka Yale
Unknown Affiliation

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

Coronary Heart Disease Risk Prediction under Class Imbalance Using XGBoost with SHAP-Based Interpretation Amiroch, Siti; Laili, Fitri Nur; Rohmah, Awawin Mustana; Kardono, Dicka Yale
CAUCHY: Jurnal Matematika Murni dan Aplikasi Vol 11, No 1 (2026): CAUCHY: JURNAL MATEMATIKA MURNI DAN APLIKASI
Publisher : Mathematics Department, Universitas Islam Negeri Maulana Malik Ibrahim Malang

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.18860/cauchy.v11i1.37938

Abstract

Coronary heart disease (CHD) risk prediction is challenging because clinical data are heterogeneous and the response variable is imbalanced. This study develops an interpretable predictive framework for CHD risk using Extreme Gradient Boosting (XGBoost), median imputation, IQR-based winsorization, standardization, the Synthetic Minority Over-sampling Technique (SMOTE), bootstrap-based uncertainty assessment, and Shapley Additive Explanations (SHAP). The learning problem is formulated within a regularized empirical risk minimization framework, so the model is viewed as a statistical estimator rather than merely an algorithmic classifier. To avoid information leakage, train–test splitting is performed before any resampling, and SMOTE is applied only to the training data. The primary analysis is fixed a priori at an 80:20 stratified split, whereas 60:40 and 70:30 splits are treated as sensitivity analyses rather than model-selection devices. In the primary analysis, the model attains accuracy of 79.36%, precision of 27.88%, recall of 22.48%, F1-score of 24.89%, and ROC–AUC of 0.6502. The 95% bootstrap confidence interval for ROC–AUC is [0.6017, 0.6981]. SHAP analysis in probability space identifies age, cigsPerDay, male, heartRate, and sysBP as the most influential predictors. These results show that the proposed framework is mathematically well-structured and interpretable, but that its out-of-sample discrimination on this dataset is moderate rather than high.