This study investigates heart disease classification by combining feature selection with machine learning, using three datasets comprising 4,728 participants and 11 features, with 4.27% missing data. XGBoost achieved 0.95 accuracy for one feature, while Random Forest (RF) achieved accuracies of 0.92 and 0.99 for the remaining two features. Among the 11 classification models compared, RF and XGBoost classified heart disease with 0.97 and 0.99 accuracy, respectively, using all available features. With feature elimination by Simultaneous Perturbation Feature Selection and Ranking (SpFSR), RF attained 0.99 accuracy using only four features (cholesterol level, age, resting electrocardiographic measurements, and maximum heart rate), while XGBoost dropped to 0.91. The four-feature RF model therefore improved interpretability without compromising accuracy. Explainable AI (XAI) techniques, including Permutation Importance and SHAP Summary Plot analyses, were used to quantify each feature's impact on heart disease prediction. The resting electrocardiographic measurements feature had the highest importance value (0.40 ± 0.01), followed by maximum heart rate (0.32 ± 0.01), cholesterol level (0.28 ± 0.01), and age (0.26 ± 0.005). These results underscore the significance of each of the four selected features in diagnosing heart disease via machine learning.
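The "mean ± spread" importance values reported above are the kind of quantity that Permutation Importance produces: the drop in accuracy when one feature's column is shuffled, averaged over several repeats. The following is a minimal sketch of that idea only, not the study's pipeline. The synthetic data, the rule-based `predict` function standing in for a trained RF model, and all function names are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Map each column index to (mean, std) of the accuracy drop when shuffled.

    `predict` is any callable taking one row (a list) and returning a label.
    Hypothetical helper, written for illustration.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    result = {}
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        result[j] = (mean(drops), stdev(drops))
    return result


# Synthetic data (assumption): column 0 determines the label, column 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def predict(row):
    """Toy stand-in for a trained classifier: thresholds column 0 only."""
    return int(row[0] > 0.5)


importances = permutation_importance(predict, X, y)
for j, (m, s) in importances.items():
    print(f"feature {j}: {m:.2f} ± {s:.2f}")
```

In this toy setup, shuffling the informative column sharply reduces accuracy, while shuffling the noise column leaves it unchanged because the stand-in model never reads that column. With a real model, `predict` would wrap the fitted classifier, and the evaluation would normally use held-out data.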
Copyrights © 2025