Analysis of Important Features in Software Defect Prediction Using Synthetic Minority Oversampling Techniques (SMOTE), Recursive Feature Elimination (RFE) and Random Forest
Ghinaya, Helma; Herteno, Rudy; Faisal, Mohammad Reza; Farmadi, Andi; Indriani, Fatma
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 3 (2024): July
Publisher: Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i3.453

Abstract

Software Defect Prediction (SDP) is essential for improving software quality during testing. As software systems grow more complex, accurately predicting defects becomes increasingly challenging. One challenge is imbalanced class distribution, in which defective instances are far fewer than non-defective ones. To address this imbalance, the SMOTE technique is used. Random Forest is chosen as the classification algorithm because it handles non-linear data, resists overfitting, and reports the importance of each feature in classification. This research aims to evaluate important features and measure accuracy in SDP using the SMOTE+RFE+Random Forest technique. The dataset used in this study is NASA MDP D'', which comprises 12 datasets. The method combines SMOTE, RFE, and Random Forest in two stages: the first stage applies RFE+Random Forest, and the second adds SMOTE before RFE and Random Forest; accuracy on the NASA MDP datasets is measured in both stages. The results show that SMOTE improves accuracy on most datasets, with the best performance on the MC1 dataset (accuracy 0.9998). Feature importance analysis identifies "maintenance severity" and "cyclomatic density" as the most important features for SDP modeling. Therefore, the SMOTE+RFE+RF technique effectively improves prediction accuracy across various datasets and successfully addresses class imbalance issues.
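To make the oversampling step more concrete, here is a minimal pure-Python sketch of SMOTE-style interpolation, assuming numeric feature vectors and a binary label in which `1` marks the defective (minority) class. The function names `smote` and `balance` and their parameters are hypothetical, written only for illustration; this is not the authors' code, and the RFE and Random Forest stages are not shown. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its `k` nearest minority neighbours, until both classes have the same size.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical helper: create n_synthetic samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # For every minority sample, precompute the indices of its k nearest
    # minority neighbours by Euclidean distance (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))             # pick a random base sample
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]     # pick one of its neighbours
        gap = rng.random()                           # position along the segment, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def balance(features, labels, minority_label=1, k=5, seed=None):
    """Hypothetical helper: oversample the minority class until both classes are equal in size."""
    minority = [f for f, y in zip(features, labels) if y == minority_label]
    n_majority = len(labels) - len(minority)
    extra = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return features + extra, labels + [minority_label] * len(extra)


# Toy data: 3 "defective" rows (label 1) and 5 "non-defective" rows (label 0).
X = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1],
     [5.0, 5.0], [5.5, 4.8], [6.0, 5.1], [5.2, 5.3], [4.9, 5.6]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance(X, y, minority_label=1, k=2, seed=0)
print(y_bal.count(0), y_bal.count(1))  # 5 5
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region spanned by the minority class rather than duplicating existing rows. In practice a library implementation, such as `SMOTE` from the imbalanced-learn package, would typically be used, with scikit-learn's `RFE` and `RandomForestClassifier` for the later feature-selection and classification stages.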