Heart disease remains one of the leading causes of mortality, which makes data-driven predictive models important for risk analysis. However, medical datasets commonly suffer from class imbalance and weak predictive signals, both of which can limit model performance. This study evaluates a Logistic Regression model for heart attack prediction by comparing imbalanced and balanced versions of the dataset under two train–test split ratios, 80:20 and 90:10. Model performance was assessed using accuracy, precision, recall, F1-score, and the confusion matrix. The experimental results show that models trained on imbalanced data achieved higher accuracy but were biased toward the majority class, with particularly low recall for the minority class. After data balancing was applied, accuracy decreased, but performance across classes became more even, with improved recall and F1-score for the minority class. These findings indicate that accuracy alone does not adequately represent model performance on imbalanced medical datasets. Moreover, the results suggest that the relationship between the medical attributes and heart attack occurrence in this dataset is relatively weak, which limits the model’s ability to establish clear decision boundaries. Therefore, appropriate evaluation metrics and representative clinical datasets are essential for developing reliable heart disease risk prediction models.
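The point that accuracy can mislead on imbalanced data can be illustrated with a minimal sketch. It is not the study's code or data: the label counts, the two prediction vectors, and the helper names `confusion_counts` and `classification_metrics` are hypothetical, chosen only to show how the reported metrics follow from a confusion matrix.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 90 negatives (no heart attack), 10 positives.
y_true = [0] * 90 + [1] * 10

# Hypothetical model A: always predicts the majority class.
majority_pred = [0] * 100

# Hypothetical model B: finds 7 of the 10 positives, with 20 false alarms.
balanced_pred = [1] * 20 + [0] * 70 + [1] * 7 + [0] * 3

majority_metrics = classification_metrics(y_true, majority_pred)
balanced_metrics = classification_metrics(y_true, balanced_pred)

print("majority-class model:", majority_metrics)
print("recall-oriented model:", balanced_metrics)
```

In this made-up example the majority-class model has the higher accuracy while detecting none of the positive cases, which is the pattern the abstract describes for the model trained on imbalanced data.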
Copyrights © 2026