The performance of machine learning models in disease classification depends heavily on effective feature selection. This study evaluates two feature selection methods, Boruta and Recursive Feature Elimination (RFE), in combination with tree-based classifiers (a single Decision Tree and the ensemble models Random Forest, Gradient Boosting, LightGBM, and XGBoost) on Electronic Health Records (EHR) data. Combining Boruta with LightGBM achieves the highest accuracy, 99%. Feature selection improves precision by retaining relevant variables and discarding unnecessary ones. Further analysis shows that features such as Red Blood Cells, Insulin, Heart Rate, and Cholesterol significantly influence the classification of specific diseases. These findings highlight the importance of feature selection for multi-disease classification and medical data analysis, where it improves the efficiency of machine learning systems. Future research should develop more flexible feature selection methods and evaluate models on more diverse disease datasets.
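To illustrate how a wrapper method such as RFE can be chained with a tree-based classifier, the following is a minimal sketch and not the authors' pipeline. It assumes scikit-learn is available and uses `RFE` ranked by `RandomForestClassifier` importances, followed by `GradientBoostingClassifier` as a stand-in for LightGBM (Boruta and LightGBM live in separate packages and are not used here). The dataset is synthetic (`make_classification`), and the names `build_rfe_pipeline`, `run_demo`, and the choice of 8 retained features are hypothetical.

```python
# Minimal sketch (assumptions): scikit-learn installed; synthetic data stands in
# for EHR records; GradientBoostingClassifier stands in for LightGBM.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def build_rfe_pipeline(n_features_to_select: int, random_state: int = 0) -> Pipeline:
    """Hypothetical helper: RFE (random forest importances) -> gradient boosting."""
    selector = RFE(
        estimator=RandomForestClassifier(n_estimators=50, random_state=random_state),
        n_features_to_select=n_features_to_select,
        step=1,  # eliminate the least important feature on each iteration
    )
    clf = GradientBoostingClassifier(random_state=random_state)
    # The classifier is trained only on the features RFE keeps.
    return Pipeline([("select", selector), ("clf", clf)])


def run_demo(random_state: int = 0):
    """Fit the pipeline on synthetic multi-class data; return accuracy and kept features."""
    # 20 candidate features, of which only 5 are informative and 2 redundant.
    X, y = make_classification(
        n_samples=600,
        n_features=20,
        n_informative=5,
        n_redundant=2,
        n_classes=3,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    pipe = build_rfe_pipeline(n_features_to_select=8, random_state=random_state)
    pipe.fit(X_train, y_train)
    acc = accuracy_score(y_test, pipe.predict(X_test))
    # support_ is a boolean mask over the original 20 columns.
    mask = pipe.named_steps["select"].support_
    selected = [i for i, keep in enumerate(mask) if keep]
    return acc, selected


if __name__ == "__main__":
    acc, selected = run_demo()
    print(f"accuracy={acc:.3f}, selected feature indices={selected}")
```

Swapping the selector for a Boruta implementation or the final estimator for LightGBM would follow the same pattern, because both stages only need to expose the scikit-learn `fit`/`predict` interface.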
Copyright © 2025