International Journal of Applied Mathematics, Sciences, and Technology for National Defense
Vol. 4 No. 1 (2026): International Journal of Applied Mathematics, Sciences, and Technology for Nati

Hybrid Random Forest–CatBoost ensemble for heart disease prediction on imbalanced datasets: Toward applications in military health systems

Ihsan, Mahyus (Unknown)
Zahnur (Unknown)
Fadlan, Iftahul (Unknown)
Ikhsan Maulidi (Unknown)



Article Info

Publish Date
30 Apr 2026

Abstract

Background: Heart disease is one of the leading causes of death worldwide, and the number of cases rises every year. This trend creates an urgent need for early detection systems that are fast, accurate, and reliable. In recent years, machine learning has emerged as a promising approach for analyzing medical data, particularly for disease classification and risk prediction.

Aims: This study develops a heart disease prediction model that integrates Random Forest and CatBoost in a hybrid ensemble framework and evaluates its performance on an imbalanced medical dataset.

Method: The study uses a quantitative, supervised learning approach on the Behavioral Risk Factor Surveillance System (BRFSS) 2021 dataset, which contains more than 300,000 observations. Preprocessing includes duplicate removal, BMI categorization, encoding of categorical variables, and exploratory analysis. To address class imbalance, Borderline-SMOTE was applied before the dataset was divided with an 80:20 train-test split. Random Forest and CatBoost models were then trained and combined in a soft voting ensemble.

Results: Random Forest achieved the highest accuracy, 0.94, with well-balanced precision and recall across all classes. CatBoost was relatively stable, with an accuracy of about 0.84. The ensemble achieved an accuracy of 0.91, with stable metrics and good sensitivity to positive cases.

Conclusion: Random Forest performs best on the dataset used in this study, while the ensemble offers a balanced compromise between predictive performance and robustness. The analysis also identifies Age Category, General Health, and BMI as the most influential predictors of heart disease risk. The model can support early cardiovascular risk detection in military personnel, helping to maintain operational readiness in defense systems. The proposed approach also provides a reliable decision-support tool for large-scale medical screening in resource-constrained healthcare environments.
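
The soft voting step named in the Method can be illustrated with a minimal sketch. It is not the authors' code: the function name soft_vote, the equal model weights, and the probability values are all assumptions made for illustration, and the abstract does not say how the two models were weighted. In practice the inputs would be the class probabilities produced by the fitted Random Forest and CatBoost models.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Average per-class probabilities across models, then pick the top class.

    model_probs[m][i][c] is model m's probability that sample i is class c.
    weights defaults to equal weighting (an assumption, not from the paper).
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")

    total = float(sum(weights))
    averaged: List[List[float]] = []
    labels: List[int] = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        # Weighted mean of each class probability over all models.
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        averaged.append(row)
        # Predicted label is the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=row.__getitem__))
    return averaged, labels


# Hypothetical probabilities [P(no heart disease), P(heart disease)]
# for three people; these numbers are invented for illustration.
rf_probs = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]
cb_probs = [[0.80, 0.20], [0.30, 0.70], [0.35, 0.65]]

averaged, labels = soft_vote([rf_probs, cb_probs])
print(labels)  # → [0, 1, 1]
```

For the third person the two models disagree: the first leans toward class 0 and the second toward class 1. Averaging the probabilities lets the more confident model decide, which is the practical difference between soft voting and a simple majority vote on predicted labels.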

Copyrights © 2026






Journal Info

Abbrev

JAS-ND

Publisher

Subject

Biochemistry, Genetics & Molecular Biology Chemistry Computer Science & IT Mathematics Physics

Description

International Journal of Applied Mathematics, Sciences, and Technology for National Defense (App.Sci.Def) [e-ISSN: 2985-9352, p-ISSN: 2986-0776] is a journal published by the Foundation of Advanced Education. International Journal of Applied Mathematics, Sciences and Technology for National Defense ...