Rahim, Abd Mizwar A.
Unknown Affiliation

Stroke prediction using data balancing method and extreme gradient boosting
Rahim, Abd Mizwar A.; Baita, Anna; Asharudin, Firman; Ashari, Wahid Miftahul; Hakim, Walidy Rahman; Putra, Andriyan Dwi; Supriatin, Supriatin; Pramono, Eko
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 15, No 1: February 2026
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v15.i1.pp655-671

Abstract

Stroke is one of the leading causes of death worldwide, creating an urgent need for effective early detection systems, particularly because conventional methods often struggle with class imbalance and produce biased evaluations. Previous studies have primarily focused on accuracy while overlooking model consistency, data pre-processing quality, and probability-based evaluation. This study evaluates model performance under three conditions: original data using extreme gradient boosting (XGBoost) with scale_pos_weight, original data using the easy ensemble classifier, and class-balanced data generated using random oversampling (ROS), adaptive synthetic sampling (ADASYN), and synthetic minority over-sampling technique (SMOTE). In each condition, the pipeline included missing-value handling, normalization, feature preparation, and hyperparameter optimization using grid search. Performance was assessed using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), confidence intervals, calibration curves, Shapley additive explanations (SHAP), decision curve analysis (DCA), and external validation. The results demonstrate that data resampling significantly improves performance, with the XGBoost-SMOTE combination achieving the best results, including an accuracy of 0.99, AUROC of 0.998, and AUPRC of 0.986, outperforming the other approaches. This method provides more consistent and balanced predictions, supporting the application of artificial intelligence for early stroke risk identification.
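
As a rough illustration of the class-balancing ideas named in the abstract, the following standard-library Python sketch computes a negative-to-positive ratio of the kind often used as a starting value for XGBoost's scale_pos_weight, performs random oversampling (ROS), and generates SMOTE-style synthetic minority rows by interpolation. It is not the authors' implementation: the abstract does not say which libraries or settings were used, the function names and toy data are hypothetical, ADASYN is omitted, and the interpolation step is a simplified stand-in for a full SMOTE implementation.

```python
import math
import random
from collections import Counter


def scale_pos_weight(y):
    """Negative-to-positive count ratio, a common starting value for
    XGBoost's scale_pos_weight on imbalanced binary data (labels 0/1)."""
    counts = Counter(y)
    return counts[0] / counts[1]


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen minority rows until classes match."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_needed = max(counts.values()) - counts[minority]
    rows = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(rows) for _ in range(n_needed)]
    return X + extra, y + [minority] * n_needed


def smote_like(X, y, k=2, seed=0):
    """SMOTE-style sketch: each synthetic row lies on the segment between
    a minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_needed = max(counts.values()) - counts[minority]
    rows = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(rows)
        others = [r for r in rows if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * n_needed


# Toy data, made up for illustration: six negatives, two positives.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

weight = scale_pos_weight(y)            # 6 negatives / 2 positives
X_ros, y_ros = random_oversample(X, y)  # duplicates existing positives
X_syn, y_syn = smote_like(X, y, k=1)    # interpolates new positives
print(weight, Counter(y_ros), Counter(y_syn))
```

In practice, resampling of this kind is normally applied to the training split only, so that evaluation metrics such as AUROC and AUPRC are computed on data with the original class distribution.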