Stroke is a disease with high mortality and disability rates, especially in Indonesia, and early detection of stroke risk is important for preventing serious consequences. This study examines the distribution of stroke cases across age groups and evaluates the performance of the K-Nearest Neighbors (KNN) algorithm on imbalanced data, both before and after applying the Synthetic Minority Oversampling Technique (SMOTE). The analysis uses two train/test split scenarios: 80:20 and 70:30. The results show that stroke risk increases with age: no cases were found in the 20–30 age group, cases began to appear in the 30–40 age group, and the number of cases rose sharply above the age of 50. KNN without SMOTE reached an accuracy of 95% (80:20) and 94% (70:30), but recall and F1-score were very low: 0.04 and 0.07 for the 80:20 split, and 0.03 and 0.05 for the 70:30 split. After SMOTE, recall rose to 0.36 and F1-score to 0.21 (80:20), and to 0.28 and 0.17 respectively (70:30). Accuracy fell to 86% for both splits, so the model gave up some overall accuracy in exchange for greater sensitivity to stroke cases. Overall, SMOTE reduces the bias toward the majority class and helps the model recognize stroke patterns it previously overlooked. However, sensitivity remains low and still needs to be improved through parameter tuning, selection of relevant features, or alternative algorithms to make predictions more reliable.
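To illustrate how SMOTE produces additional minority-class samples, the following is a minimal sketch of its core interpolation step in plain Python. It is not the implementation used in the study: the function name `smote_sketch`, the parameter values, and the four (age, average glucose) points are all hypothetical and serve only to show the idea of placing a synthetic point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative only: a simplified version of the SMOTE idea, assuming
    purely numeric features and at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority-class (stroke) samples: (age, average glucose level).
minority = [(55.0, 110.0), (62.0, 180.0), (70.0, 150.0), (80.0, 210.0)]
synthetic = smote_sketch(minority, n_new=6, k=2, seed=1)
for point in synthetic:
    print(point)
```

In practice the oversampling should be applied to the training portion only, after the 80:20 or 70:30 split, so that the test set keeps the original class distribution and the reported recall reflects real, not synthetic, stroke cases.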
Copyrights © 2025