Cardiovascular disease remains a leading cause of death worldwide, which makes accurate and early diagnostic tools an urgent need. This study develops a classification model for heart disease through a comparative analysis of six ensemble machine learning algorithms: three from the Bagging family (Random Forest, Bagged Decision Tree, Extra Trees) and three from the Boosting family (AdaBoost, Gradient Boosting, XGBoost). The study uses the publicly available UCI Cleveland Heart Disease dataset, which exhibits a mild class imbalance. To address this imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data. Each model was evaluated using accuracy, precision, recall, and F1-score. Extra Trees combined with SMOTE achieved the highest overall performance, with 90% accuracy, 96% precision, 82% recall, and an 88% F1-score. The primary contribution of this work is a comparative analysis indicating that, on this dataset, the randomization strategy of Extra Trees yields better and more reliable results for this classification task than the other ensemble techniques considered, particularly after data balancing. These findings suggest that combining ensemble learning with appropriate data balancing can improve the development of fair and effective diagnostic tools to support medical professionals.
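The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic; the helper name `f1_score` and the variable names are illustrative assumptions and are not taken from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for Extra Trees with SMOTE.
reported_precision = 0.96
reported_recall = 0.82

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.8845, i.e. about 88%
```

The result rounds to 88%, which agrees with the F1-score stated above. The gap between 96% precision and 82% recall also shows that the model's positive predictions are rarely wrong, while a larger share of true heart disease cases is missed.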
Copyrights © 2025