Heart failure remains a major global health challenge, and early prediction is essential for improving patient outcomes. This study evaluates three ensemble learning methods, namely Random Forest, AdaBoost, and XGBoost, on the Heart Failure Prediction dataset from Kaggle, which contains 918 patient records. A quantitative experimental design was applied, comprising preprocessing with KNN imputation, model development, and evaluation via 10-fold cross-validation. Performance was assessed through accuracy, precision, recall, F1-score, and AUC-ROC. Random Forest achieved the highest accuracy (0.868), recall (0.907), F1-score (0.884), and AUC-ROC (0.922), while AdaBoost produced the highest precision (0.874). Although the models showed broadly similar performance patterns, statistical tests revealed notable distinctions: Random Forest and XGBoost differed significantly in recall (p = 0.011) and F1-score (p = 0.016), and the Friedman test identified a significant difference in recall (p = 0.034) across the three models. Feature importance analysis showed that the models consistently emphasized clinically relevant variables, with ST-segment slope, Oldpeak, and exercise-induced angina among the most influential predictors. These features align with recent cardiovascular evidence identifying exercise ECG indicators and stress-response variables as strong predictors of cardiac risk. Overall, the results suggest that recall-related behavior is the main performance differentiator among the ensemble models, with Random Forest providing a modest advantage in identifying true heart failure cases. The study is limited by its reliance on a single dataset and a relatively small sample size, which may restrict the generalizability of the findings.
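The evaluation protocol summarized above (KNN imputation, three ensemble classifiers, 10-fold cross-validation on recall, and a Friedman test across the per-fold scores) can be sketched as follows. This is a minimal illustration, not the study's actual code: synthetic data stands in for the 918 Kaggle records, and all hyperparameters (e.g. `n_neighbors=5`, default model settings) are assumptions rather than the paper's reported configuration.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the 918-record Kaggle dataset (illustrative only).
X, y = make_classification(n_samples=918, n_features=11, random_state=42)
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan  # inject missing values so KNN imputation has work to do

# xgboost may not be installed; fall back to sklearn's gradient boosting as a proxy.
try:
    from xgboost import XGBClassifier
    xgb = XGBClassifier(eval_metric="logloss", random_state=42)
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier
    xgb = GradientBoostingClassifier(random_state=42)

models = {
    "RF": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "XGB": xgb,
}

# 10-fold stratified CV; imputation happens inside the pipeline on each training fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
recall_per_fold = {}
for name, model in models.items():
    pipe = make_pipeline(KNNImputer(n_neighbors=5), model)
    recall_per_fold[name] = cross_val_score(pipe, X, y, cv=cv, scoring="recall")
    print(f"{name}: mean recall = {recall_per_fold[name].mean():.3f}")

# Friedman test comparing the three models' per-fold recall scores.
stat, p = friedmanchisquare(*recall_per_fold.values())
print(f"Friedman test on recall: p = {p:.3f}")
```

Fitting the imputer inside the cross-validation pipeline, rather than on the full dataset beforehand, avoids leaking test-fold information into preprocessing; the same per-fold scores can also feed the pairwise comparisons (e.g. RF vs. XGB) reported in the abstract.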