Cardiovascular diseases, particularly heart failure, remain a leading cause of mortality in Indonesia, affecting an estimated 2.78 million individuals. This study develops a heart failure risk prediction model using the XGBoost algorithm and evaluates it through comparative validation on two datasets with distinct characteristics. The primary model was trained on a large-scale Indonesian population dataset (N = 158,355; 28 features) that reflects the complexity of real-world clinical data, while the UCI Heart Disease dataset (N = 918; 12 features) served as a benchmark under more controlled conditions. The Indonesian model achieved a testing accuracy of 73.50% with a training–testing performance gap of only 0.53 percentage points and an AUC-ROC of 0.814, indicating stable performance and good generalization. In contrast, the model trained on the UCI dataset reached a higher accuracy of 88.59% but exhibited moderate overfitting, reflected in a larger gap of 4.60 percentage points. Feature importance analysis identified a history of heart disease, hypertension, and smoking behavior as the most influential predictors in both datasets. These findings suggest that, when assessing the clinical deployment readiness of medical artificial intelligence systems in Indonesia, stability and generalization on real-world data matter more than raw accuracy obtained on small, idealized datasets.
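The two stability indicators used above, the training–testing gap and the AUC-ROC, can be illustrated with a minimal Python sketch. It is not the study's code: the function names (`accuracy`, `generalization_gap`, `auc_roc`) are hypothetical, the gap is assumed to be the difference between training and testing accuracy expressed in percentage points, and the AUC-ROC is computed by the standard pairwise-ranking definition rather than by any particular library.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_acc: float, test_acc: float) -> float:
    """Training accuracy minus testing accuracy, in percentage points.

    Assumption: this mirrors the 'performance gap' in the abstract;
    the study may define the gap differently.
    """
    return (train_acc - test_acc) * 100.0


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, made up for illustration only.
    y_test = [0, 0, 1, 1]
    risk_scores = [0.1, 0.4, 0.35, 0.8]
    y_pred = [int(s >= 0.5) for s in risk_scores]
    print("test accuracy:", accuracy(y_test, y_pred))
    print("AUC-ROC:", auc_roc(y_test, risk_scores))
```

A small gap alone does not establish a useful model (a model that underfits can also have a small gap), which is why the sketch reports the gap alongside a ranking metric such as AUC-ROC.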
Copyright © 2025