Heart failure remains one of the leading causes of mortality worldwide, posing significant challenges for early diagnosis and patient management. A major obstacle in developing predictive models for heart failure is class imbalance: the number of surviving patients far exceeds the number of patients who experience death events. This imbalance often biases machine learning algorithms toward the majority class, reducing sensitivity to critical minority cases. To address this issue, this study applies the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset and improve model performance. Three supervised learning algorithms, namely Logistic Regression (LR), Random Forest (RF), and K-Nearest Neighbors (KNN), were implemented and compared on the UCI Heart Failure Clinical Records dataset, which contains 299 patient samples with 13 clinical attributes. Experimental results show that the Random Forest model achieved the highest performance, reaching 90% accuracy, precision, recall, and F1-score, and outperforming both LR and KNN. The findings demonstrate that combining data balancing with ensemble learning effectively enhances prediction accuracy and sensitivity toward minority classes. The main contribution of this research lies in optimizing supervised models for medical data with skewed class distributions, providing a more reliable and interpretable approach for early heart failure detection. Future research may extend this work by integrating advanced ensemble or hybrid deep learning models and expanding the dataset for multi-institutional validation.
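Since the abstract summarizes the pipeline without implementation detail, the following is a minimal sketch of the SMOTE-plus-classifier workflow it describes, assuming Python with scikit-learn and imbalanced-learn; the CSV file name, the DEATH_EVENT target column, and all hyperparameters are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the SMOTE + classifier comparison pipeline described above.
# The file name, DEATH_EVENT column, and hyperparameters are assumptions based
# on the public UCI dataset release, not the paper's reported configuration.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Load the 299-sample dataset (assumed local path).
df = pd.read_csv("heart_failure_clinical_records_dataset.csv")
X = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]

# Stratified split so the minority (death event) class appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features: LR and KNN are sensitive to feature magnitudes.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Oversample only the training data with SMOTE so no synthetic samples
# leak into the evaluation set.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Train each model on the balanced data and report accuracy, precision,
# recall, and F1 on the untouched test set.
for name, model in models.items():
    model.fit(X_train_bal, y_train_bal)
    print(f"=== {name} ===")
    print(classification_report(y_test, model.predict(X_test), digits=2))
```

Applying SMOTE only after the train/test split keeps the synthetic minority samples out of the evaluation data, which preserves an unbiased estimate of sensitivity toward the minority (death event) class.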