Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics
Vol. 8 No. 1 (2026): February

Comparative Study of Filter, Wrapper, and Hybrid Feature Selection Using Tree-Based Classifiers for Software Defect Prediction

Rahmayanti, Rahmayanti (Unknown)
Herteno, Rudy (Unknown)
Saputro, Setyo Wahyu (Unknown)
Saragih, Triando Hamonangan (Unknown)
Abadi, Friska (Unknown)



Article Info

Publish Date
27 Dec 2025

Abstract

Software defect prediction (SDP) improves software reliability by identifying modules that are likely to contain defects before release. SDP datasets commonly contain redundant or non-contributory metrics, which makes feature selection necessary to derive a more informative subset. This study compares three feature-selection strategies, SelectKBest (SKB), Recursive Feature Elimination (RFE), and the hybrid SKB+RFE, for improving the performance of tree-based classifiers on the NASA Metrics Data Program (MDP) datasets. Three classification algorithms are used, namely Random Forest (RF), Extra Trees (ET), and Bagging (Decision Tree), with Area Under the Curve (AUC) as the primary performance metric. The RFE+ET combination performs best, with an average AUC of 0.7855, followed by SKB+RFE+ET at 0.7809 and SKB+ET at 0.7776. These findings indicate that iterative wrapper-based approaches such as RFE can identify more relevant feature subsets than filter or hybrid strategies, and that wrapper-based methods are more stable across heterogeneous datasets. Because the experiments use no hyperparameter tuning and rely solely on class weighting rather than explicit resampling, the results offer empirical insight into the isolated influence of feature selection on predictive performance. Overall, the study finds that RFE combined with Extra Trees offers the strongest predictive performance on the NASA MDP datasets and forms a foundation for developing more adaptive and robust models.
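The three selection strategies named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: the ANOVA F-score (`f_classif`) as the SelectKBest scoring function, the subset sizes `k_filter` and `k_final`, the 5-fold stratified cross-validation, the reduced tree count, and the synthetic imbalanced data standing in for a NASA MDP dataset are all choices made here for illustration. Only the Extra Trees classifier is shown; Random Forest or Bagging could be substituted in the same pipelines.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def make_extra_trees():
    # Untuned Extra Trees with class weighting instead of resampling, in the
    # spirit of the abstract. n_estimators=50 is an assumption to keep it fast.
    return ExtraTreesClassifier(
        n_estimators=50, class_weight="balanced", random_state=0
    )


def build_pipelines(k_filter=12, k_final=8):
    """Build filter, wrapper, and hybrid pipelines (subset sizes are assumed)."""
    return {
        # Filter: keep the k_final features with the highest univariate score.
        "SKB+ET": Pipeline([
            ("skb", SelectKBest(score_func=f_classif, k=k_final)),
            ("clf", make_extra_trees()),
        ]),
        # Wrapper: repeatedly drop the least important feature until k_final remain.
        "RFE+ET": Pipeline([
            ("rfe", RFE(make_extra_trees(), n_features_to_select=k_final)),
            ("clf", make_extra_trees()),
        ]),
        # Hybrid: filter down to k_filter features first, then refine with RFE.
        "SKB+RFE+ET": Pipeline([
            ("skb", SelectKBest(score_func=f_classif, k=k_filter)),
            ("rfe", RFE(make_extra_trees(), n_features_to_select=k_final)),
            ("clf", make_extra_trees()),
        ]),
    }


def evaluate(X, y, n_splits=5):
    """Return the mean cross-validated AUC for each pipeline."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {
        name: cross_val_score(pipe, X, y, scoring="roc_auc", cv=cv).mean()
        for name, pipe in build_pipelines().items()
    }


if __name__ == "__main__":
    # Synthetic imbalanced data as a stand-in for a defect dataset (assumption).
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=5,
        weights=[0.85, 0.15], random_state=0,
    )
    for name, auc in evaluate(X, y).items():
        print(f"{name}: mean AUC = {auc:.4f}")
```

Placing each selector inside the `Pipeline` means it is refit on the training folds only, so the reported AUC is not inflated by selecting features on the held-out data. The scores printed for synthetic data say nothing about the values reported in the abstract.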

Copyrights © 2026






Journal Info

Abbrev

ijeeemi

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Health Professions; Materials Science & Nanotechnology

Description

Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics (IJEEEMI) publishes peer-reviewed, original research and review articles in an open-access format. Accepted articles span the full extent of Electronics, Biomedical, and Medical Informatics. IJEEEMI seeks to ...