ILKOM Jurnal Ilmiah
Vol 18, No 1 (2026)

SMOTE-Based Comparative Analysis of Machine Learning Models for Stroke Risk Prediction Using Imbalanced Healthcare Data

Ratu Mutiara Siregar (Institut Teknologi Sawit Indonesia)
Budy Satria (Universitas Andalas)
Sandi Fadilah (Universiti Muhammadiyah Malaysia)
Liga Mayola (Universitas Putra Indonesia YPTK)
Silky Safira (Universitas Putra Indonesia YPTK)



Article Info

Publish Date
20 Apr 2026

Abstract

Stroke remains one of the leading causes of mortality and long-term disability worldwide, with a significant burden in Indonesia. Early detection is crucial, as up to 90% of stroke cases are potentially preventable through timely intervention. However, predictive modeling of stroke risk is often hampered by imbalanced datasets, in which non-stroke cases far outnumber stroke cases and can bias classification models toward the majority class. This study systematically compares six machine learning algorithms for stroke risk prediction under imbalanced data conditions: Logistic Regression, Decision Tree, Random Forest, Naïve Bayes, Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). The dataset consists of 5,110 patient records with 11 health-related features, obtained from a publicly available healthcare dataset. Data preprocessing included anomaly removal, categorical encoding, feature scaling, and class balancing using the Synthetic Minority Oversampling Technique (SMOTE). The models were evaluated with 5-fold cross-validation using accuracy, precision, recall, and F1-score. The experimental results show that the ensemble-based models outperform the single classifiers. Random Forest achieved the highest mean accuracy of 97.12% (±0.42) with an F1-score of 0.96, followed closely by XGBoost with 96.85% (±0.51). Both models also exhibited superior recall, indicating improved detection of the minority class. The novelty of this study lies in the systematic evaluation of multiple machine learning models using SMOTE-based balancing and cross-validation on publicly available healthcare data, providing robust comparative insights for imbalanced medical classification problems.
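
To illustrate the class-balancing step and why accuracy alone can mislead on imbalanced data, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes only the core idea of SMOTE: a synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours. The toy data, the 95:5 class ratio, and the function names `smote_sketch` and `precision_recall_f1` are hypothetical. The abstract does not state whether oversampling was applied before or inside each cross-validation fold; a common precaution is to resample only the training portion of each fold so that synthetic points do not leak into the evaluation data.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative only: brute-force neighbour search, numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 95 majority (label 0) and 5 minority (label 1) points.
majority = [(float(i), float(i % 7)) for i in range(95)]
minority = [(100.0, 1.0), (101.0, 2.0), (102.0, 1.5), (103.0, 3.0), (104.0, 2.5)]

# Oversample the minority class until both classes have the same size.
synthetic = smote_sketch(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + synthetic

# A classifier that always predicts "no stroke" looks accurate but detects nothing.
y_true = [0] * len(majority) + [1] * len(minority)
y_pred = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(len(majority), len(balanced_minority))
print(accuracy, precision, recall, f1)
```

In this toy setting the always-negative baseline scores high accuracy while its recall and F1 for the stroke class are zero, which is why recall and F1-score are reported alongside accuracy for imbalanced problems. For real work, a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would normally be preferred over a hand-written sketch.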

Copyrights © 2026






Journal Info

Abbrev

ILKOM

Subject

Computer Science & IT

Description

ILKOM Jurnal Ilmiah is an Indonesian scientific journal published by the Department of Information Technology, Faculty of Computer Science, Universitas Muslim Indonesia. ILKOM Jurnal Ilmiah covers all aspects of the latest outstanding research and developments in the field of computer science, ...