This Author published in this journals
All Journal ILKOM Jurnal Ilmiah
Silky Safira
Universitas Putra Indonesia YPTK

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

SMOTE-Based Comparative Analysis of Machine Learning Models for Stroke Risk Prediction Using Imbalanced Healthcare Data Ratu Mutiara Siregar; Budy Satria; Sandi Fadilah; Liga Mayola; Silky Safira
ILKOM Jurnal Ilmiah Vol 18, No 1 (2026)
Publisher : Prodi Teknik Informatika FIK Universitas Muslim Indonesia

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.33096/ilkom.v18i1.3161.180-194

Abstract

Stroke remains one of the leading causes of mortality and long-term disability worldwide, with a significant burden in Indonesia. Early detection is crucial, as up to 90% of stroke cases are potentially preventable through timely intervention. However, predictive modeling for stroke risk is often challenged by imbalanced datasets, where non-stroke cases significantly outnumber stroke cases, potentially biasing classification models. This study aims to perform a systematic comparative evaluation of six machine learning algorithms Logistic Regression, Decision Tree, Random Forest, Naïve Bayes, Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) for stroke risk prediction under imbalanced data conditions. The dataset consists of 5,110 patient records with 11 health-related features obtained from a publicly available healthcare dataset. Data preprocessing included anomaly removal, categorical encoding, feature scaling, and class balancing using the Synthetic Minority Oversampling Technique (SMOTE). Model evaluation was conducted using 5-fold cross-validation and assessed through accuracy, precision, recall, and F1-score metrics. The experimental results demonstrate that ensemble-based models outperform single classifiers. Random Forest achieved the highest mean accuracy of 97.12% (±0.42) with an F1-score of 0.96, followed closely by XGBoost with 96.85% (±0.51). Both models also exhibited superior recall performance, indicating improved minority class detection. The novelty of this study lies in the systematic evaluation of multiple machine learning models using SMOTE-based balancing and cross-validation on publicly available healthcare data, providing robust comparative insights for imbalanced medical classification problems.