This study compared the performance of nine machine learning algorithms in predicting heart disease using a dataset that dates back to 1988 and combines four databases (Cleveland, Hungary, Switzerland, and Long Beach), totaling 1,025 records. The dataset includes medical features that reflect physiological state, clinical examination results, and cardiovascular risk factors: age, gender, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiography results, maximum heart rate, exercise-induced chest pain, ST segment depression, ST segment slope, number of major vessels visible by fluoroscopy, and thalassemia status. The study proceeded through data cleaning, data transformation, and evaluation. Evaluation used both a train/test split and K-fold cross-validation, with accuracy, precision, recall, F1 score, and AUC-ROC as metrics. The algorithms compared were Decision Tree, Random Forest, Support Vector Machine, MLP Classifier, Bagging Classifier, Gradient Boosting, CatBoost, XGBoost, and LightGBM. Ensemble-based models such as CatBoost, Random Forest, XGBoost, and LightGBM performed consistently across the evaluation metrics when compared with non-ensemble models. Among all models tested, CatBoost performed best, with an accuracy of 98%, an F1 score of 0.980, and a recall of 0.9875, followed by the other ensemble algorithms Random Forest, XGBoost, and LightGBM. These results indicate that ensemble models were more effective than non-ensemble models at predicting heart disease on this dataset.
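The evaluation protocol summarized above can be illustrated with a minimal, standard-library Python sketch. It is not the study's code: the function names, the toy labels, and the choice of k = 5 are assumptions made for illustration only (the abstract does not state the number of folds, the split ratio, or the library used). The sketch computes accuracy, precision, recall, and F1 from binary predictions, computes AUC-ROC by pairwise comparison of scores, and generates unshuffled K-fold index splits.

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of (train_idx, test_idx)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train_idx, test_idx))
        start = stop
    return folds


# Toy labels, invented for illustration (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# k = 5 is an assumed value; 1,025 is the record count given in the abstract.
folds = k_fold_indices(1025, 5)
```

In practice the records would be shuffled (and usually stratified by class) before splitting, and each metric would be averaged over the folds. A recall-oriented reading of such metrics matters in this setting because a false negative corresponds to a missed heart disease case.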
This study aims to present an in-depth comparison of ensemble algorithms and other modern machine learning methods for predicting heart disease. It also aims to enrich the literature on the application of Knowledge Discovery in the health sector and to provide a basis for selecting more reliable prediction algorithms, in support of clinical decision making and the development of machine learning-based heart disease diagnosis support systems.
Copyright © 2026