Heart attacks are among the leading causes of death worldwide, making early risk prediction essential for improving patient outcomes. However, many medical datasets suffer from class imbalance, where high-risk cases are far fewer than normal cases. This imbalance can bias machine learning models toward the majority class and reduce their ability to detect high-risk patients. This study analyzes the performance of the K-Nearest Neighbor (KNN) algorithm, tuned for F1-score and combined with the Synthetic Minority Over-sampling Technique (SMOTE), for heart attack risk classification. The dataset used is the Heart Attack Dataset, which consists of numerical and categorical features. The research takes an experimental approach, developing a machine learning pipeline that includes data preprocessing, missing value handling, feature standardization, oversampling with SMOTE, and hyperparameter optimization through GridSearchCV with F1-score as the selection metric. Models are evaluated using Stratified 5-Fold Cross-Validation with accuracy, precision, recall, F1-score, and ROC-AUC. The baseline KNN model achieves an accuracy of 98.50%, a precision of 95.27%, a recall of 81.47%, and a ROC-AUC of 0.9278. The KNN model integrated with SMOTE attains a recall of 87.27% and a ROC-AUC of 0.9484, indicating improved detection of heart attack cases and a 31% reduction in false negatives, although precision decreases from 95.27% to 72.15%. These findings indicate that combining SMOTE with hyperparameter optimization improves model sensitivity at the cost of precision, making the approach more suitable for medical applications that prioritize patient safety.
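The 31% reduction in false negatives follows from the two recall values reported above. The short sketch below reproduces that arithmetic. It assumes that both models are scored on the same set of actual positive cases, so the number of false negatives is proportional to 1 − recall. The function names `miss_rate` and `relative_fn_reduction` are illustrative and do not come from the study.

```python
# Recall values reported in the abstract, expressed as fractions.
RECALL_BASELINE = 0.8147  # baseline KNN
RECALL_SMOTE = 0.8727     # KNN combined with SMOTE


def miss_rate(recall: float) -> float:
    """False negative rate: FN / (TP + FN) = 1 - recall."""
    return 1.0 - recall


def relative_fn_reduction(recall_before: float, recall_after: float) -> float:
    """Relative drop in false negatives between two models.

    Assumption (not stated in the source): both models are evaluated on
    the same actual positives, so FN counts scale with the miss rate.
    """
    before = miss_rate(recall_before)
    after = miss_rate(recall_after)
    return (before - after) / before


reduction = relative_fn_reduction(RECALL_BASELINE, RECALL_SMOTE)

print(f"Baseline miss rate: {miss_rate(RECALL_BASELINE):.2%}")
print(f"SMOTE miss rate:    {miss_rate(RECALL_SMOTE):.2%}")
print(f"Relative reduction in false negatives: {reduction:.1%}")
```

Under this assumption, the miss rate falls from about 18.5% to about 12.7%, which corresponds to the roughly 31% relative reduction stated in the abstract.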
Copyrights © 2026