Stroke is one of the leading causes of death and disability worldwide, and early detection of stroke risk is crucial for preventing severe complications. This study develops a machine learning model for stroke prediction using an open Kaggle dataset containing patients' medical and demographic information. Four boosting algorithms were trained and compared: AdaBoost, Gradient Boosting, LightGBM, and XGBoost. Data preprocessing comprised missing-value imputation, categorical variable encoding, numerical feature normalization, and class balancing with the SMOTEENN method (SMOTE oversampling followed by Edited Nearest Neighbours cleaning). Additionally, feature selection was performed using the Extra Trees algorithm to improve model performance. The XGBoost model delivered the best performance, achieving an accuracy of 97.16%, an F1-score of 97.49%, and an AUC of 99.75%. The model was effective at detecting stroke cases and holds potential for integration into clinical decision support systems. The study concludes that combining modern boosting algorithms with appropriate preprocessing techniques can yield a reliable stroke prediction system suitable for implementation in digital healthcare contexts.
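The three reported metrics measure different things, which matters for an imbalanced problem such as stroke detection. The following is a minimal sketch, not the study's evaluation code: it computes accuracy, F1-score, and AUC in plain Python on a small made-up set of ten patients. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions introduced for illustration.

```python
# Illustrative only: toy data and helper names are hypothetical, not from the study.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = stroke."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 10 patients, 2 of whom had a stroke.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.65, 0.25, 0.35, 0.40, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

# A baseline that never predicts stroke, for comparison.
always_negative = [0] * len(y_true)

print("model    accuracy:", accuracy(y_true, y_pred))
print("model    F1      :", f1(y_true, y_pred))
print("model    AUC     :", roc_auc(y_true, scores))
print("baseline accuracy:", accuracy(y_true, always_negative))
print("baseline F1      :", f1(y_true, always_negative))
```

On this toy data the never-predict-stroke baseline matches the model's accuracy while scoring an F1 of zero, which is why accuracy alone is a weak summary when positive cases are rare and why F1 and AUC are reported alongside it.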
Copyright © 2025