Stroke is a leading cause of death and disability worldwide. This study proposes a stroke risk classification model based on ensemble learning that combines the Random Forest and XGBoost algorithms. A Kaggle dataset of 5110 samples (249 stroke, 4861 non-stroke) exhibited severe class imbalance. To address this, a comprehensive preprocessing pipeline was implemented, comprising feature encoding, feature scaling, feature selection with the ANOVA F-test, outlier handling with the Z-score and IQR methods, and missing-value imputation with MICE. SMOTE-ENN was then applied to rebalance the class distribution. The dataset was split into 80% training and 20% testing data (hold-out test set) to ensure objective evaluation. Hyperparameters were tuned with Bayesian optimization, and models were evaluated with stratified K-fold cross-validation to guard against overfitting. On the hold-out test set, the ensemble achieved an AUC of 0.99 with 98% accuracy, 98% precision, and 98% recall. Feature importance analysis identified average glucose level and age as the strongest stroke risk predictors. The proposed approach substantially improved predictive accuracy over previous research, demonstrating the effectiveness of ensemble learning combined with careful preprocessing for building reliable, high-performing models for early stroke risk assessment.
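The pipeline described above can be sketched in scikit-learn. This is a minimal illustration under several assumptions: synthetic imbalanced data stands in for the Kaggle stroke dataset, sklearn's `GradientBoostingClassifier` stands in for XGBoost, and the SMOTE-ENN resampling, MICE imputation, outlier handling, and Bayesian hyperparameter search steps are omitted to keep the example dependency-light; it shows only the scaling, ANOVA F-test selection, soft-voting ensemble, stratified hold-out split, and stratified K-fold evaluation.

```python
# Sketch of the abstract's pipeline (assumptions: synthetic data replaces
# the Kaggle stroke dataset; GradientBoostingClassifier replaces XGBoost;
# SMOTE-ENN, MICE, outlier handling, and Bayesian tuning are omitted).
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced binary problem, roughly mimicking the 249 : 4861 stroke ratio.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=42)

# 80% training / 20% hold-out test split, stratified by class.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Preprocessing (scaling + ANOVA F-test feature selection) feeding a
# soft-voting ensemble of Random Forest and gradient boosting.
model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=5)),
    ("ensemble", VotingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=100,
                                                  random_state=42)),
                    ("gb", GradientBoostingClassifier(random_state=42))],
        voting="soft")),
])

# Stratified K-fold cross-validation on the training split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="roc_auc")

# Final fit and objective evaluation on the held-out 20%.
model.fit(X_tr, y_tr)
holdout_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"CV AUC: {cv_auc.mean():.3f}  hold-out AUC: {holdout_auc:.3f}")
```

In practice, `SMOTEENN` from the imbalanced-learn package would be applied to the training split only (never the hold-out set) before fitting, and the `k`, tree-count, and boosting parameters would come from the Bayesian search rather than defaults.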
Copyright © 2025