Feature selection plays a crucial role in improving the effectiveness of medical classification models. This study compares two feature selection approaches, filter and wrapper methods, for building a k-Nearest Neighbors (k-NN) model that classifies heart disease risk. The dataset comprises patients’ demographic data, lifestyle factors, and clinical indicators. The filter method was applied according to data type: Pearson correlation for numerical features and the Chi-Square test for categorical features. The features selected by the two tests were then combined, reducing the initial 20 features to four variables judged most relevant for heart disease risk classification: BMI, homocysteine level, blood pressure, and stress level. This approach was computationally efficient, but it yielded only a modest improvement in accuracy (76.8%) and a low recall for the minority class (0.07). In contrast, the wrapper method, using Sequential Forward Selection (SFS), produced a more informative subset of 11 features and achieved higher accuracy (80.00%) and a ROC-AUC of 0.657, indicating better discrimination of the minority class. These findings suggest that the filter method excels in simplicity and computational efficiency, whereas the wrapper method is more effective at improving classification performance. The study provides empirical guidance for choosing a feature selection strategy according to the analytical objective, particularly for clinical decision support systems.
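The two strategies compared above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the feature counts are arbitrary, and the filter step keeps a fixed top-k per test because the abstract does not state the selection thresholds. The choices of k = 5 neighbors, 5-fold cross-validation, and feature standardization are likewise assumptions. It uses NumPy, SciPy, and scikit-learn.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, NOT the study's dataset).
n = 300
X_num = rng.normal(size=(n, 4))          # 4 numerical features
X_cat = rng.integers(0, 3, size=(n, 3))  # 3 integer-coded categorical features
signal = X_num[:, 0] + 0.5 * (X_cat[:, 0] == 2) + rng.normal(scale=0.5, size=n)
y = (signal > 0.8).astype(int)           # imbalanced binary label

# Filter, numerical features: rank by |Pearson r| with the label, keep top 2.
pearson = np.array(
    [abs(np.corrcoef(X_num[:, j], y)[0, 1]) for j in range(X_num.shape[1])]
)
top_num = np.argsort(pearson)[::-1][:2]

# Filter, categorical features: chi-square test of independence on the
# category-by-class contingency table, keep the single strongest feature.
chi_stats = []
for j in range(X_cat.shape[1]):
    table = np.zeros((3, 2))
    np.add.at(table, (X_cat[:, j], y), 1)  # count (category, class) pairs
    stat, p_value, dof, expected = chi2_contingency(table)
    chi_stats.append(stat)
top_cat = np.argsort(chi_stats)[::-1][:1]

# Combine both filter results into column indices of the full matrix.
X_all = np.hstack([X_num, X_cat])
filter_idx = np.concatenate([top_num, X_num.shape[1] + top_cat])

# Wrapper: Sequential Forward Selection driven by k-NN cross-validated accuracy.
# Integer-coded categories are scaled like numbers here, a simplification.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=3, direction="forward", scoring="accuracy", cv=5
)
sfs.fit(X_all, y)
wrapper_idx = np.flatnonzero(sfs.get_support())

# Compare the two subsets with the same k-NN model and the same CV protocol.
acc_filter = cross_val_score(knn, X_all[:, filter_idx], y, cv=5).mean()
acc_wrapper = cross_val_score(knn, X_all[:, wrapper_idx], y, cv=5).mean()
print("filter subset :", sorted(filter_idx.tolist()), round(acc_filter, 3))
print("wrapper subset:", wrapper_idx.tolist(), round(acc_wrapper, 3))
```

The filter step scores each feature once, independently of the classifier, which is why it is cheap. The wrapper step refits k-NN for every candidate feature at every forward step, which costs more but lets the selection reflect how features behave together in the model. On imbalanced labels such as these, accuracy alone can hide poor minority-class recall, so recall or ROC-AUC could be passed as the `scoring` argument instead.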
Copyrights © 2026