Journal : Journal of Electronics, Electromedical Engineering, and Medical Informatics

Impact of a Synthetic Data Vault for Imbalanced Class in Cross-Project Defect Prediction
Putri Nabella; Rudy Herteno; Setyo Wahyu Saputro; Mohammad Reza Faisal; Friska Abadi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 2 (2024): April
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i2.409

Abstract

Software Defect Prediction (SDP) is crucial for ensuring software quality. However, class imbalance (CI) poses a significant challenge in predictive modeling. This study examines the effectiveness of the Synthetic Data Vault (SDV) in mitigating CI within Cross-Project Defect Prediction (CPDP). The study addresses CI across the ReLink, MDP, and PROMISE datasets by using SDV to augment the minority class. Classification is performed with Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF), and model performance is evaluated using AUC and a t-test. The results consistently show that SDV performs better than SMOTE and other techniques across projects, with statistically significant improvements. KNN achieves the highest average AUC, with values of 0.695, 0.704, and 0.750. On ReLink, KNN shows a 16.06% improvement over the imbalanced baseline and 12.84% over SMOTE. Similarly, on MDP, KNN shows a 20.71% improvement over the imbalanced baseline and 10.16% over SMOTE, and on PROMISE, a 13.55% improvement over the imbalanced baseline and 7.01% over SMOTE. RF displays moderate performance, closely followed by LR and DT, while NB lags behind. The statistical significance of these findings is confirmed by t-tests, with all p-values below the 0.05 threshold. These findings underscore SDV's potential for enhancing CPDP outcomes and tackling CI challenges in SDP, with KNN as the best-performing classification algorithm. SDV could therefore prove to be a promising tool for enhancing defect detection and mitigating CI.
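The abstract reports its results as AUC values and percentage improvements. As a minimal sketch of how such figures can be computed, the Python below calculates AUC as a pairwise ranking statistic and the relative improvement between two AUC values. The function names and the toy data are hypothetical, and treating the reported percentages as relative (rather than absolute) gains is an assumption; the abstract does not state how they were derived.

```python
def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    labels: 1 for a defective module, 0 for a clean one.
    scores: predicted defect scores, higher means more likely defective.
    Ties between a defective and a clean module count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(new_auc, base_auc):
    """Percentage gain of new_auc over base_auc (assumed to be relative)."""
    return 100.0 * (new_auc - base_auc) / base_auc


# Toy data (hypothetical, not taken from the paper).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
print(toy_auc)
print(relative_improvement(toy_auc, 0.6))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on full datasets.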
Optimization of Backward Elimination for Software Defect Prediction with Correlation Coefficient Filter Method
Muhammad Noor; Radityo Adi Nugroho; Setyo Wahyu Saputro; Rudy Herteno; Friska Abadi
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 6 No 4 (2024): October
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v6i4.466

Abstract

Detecting software defects is a crucial step in software development, not only to reduce cost and save time but also to mitigate more costly losses. Backward Elimination is one method used in detecting software defects. However, Backward Elimination may remove features that could later become significant to the outcome, which hurts its performance. The aim of this study is to improve Backward Elimination performance. Several features were selected based on their correlation coefficient, and the selected features were then used to improve the performance of the final Backward Elimination model. The final model was validated using cross-validation, with Naïve Bayes as the classification method, on the NASA MDP dataset to determine its accuracy and Area Under the Curve (AUC). Using the top 10 correlated features with Backward Elimination achieved an average of 86.6% accuracy and 0.797 AUC, while using the top 20 correlated features with Backward Elimination achieved an average of 84% accuracy and 0.812 AUC. Compared to Backward Elimination alone and Naïve Bayes alone, respectively, the top 10 correlated features improved AUC by 1.52% and 13.53% and accuracy by 13% and 12.4%, while the top 20 correlated features improved AUC by 3.43% and 15.66% and accuracy by 10.4% and 9.8%. The results show that selecting the top 10 or top 20 features by correlation before applying Backward Elimination gives better results than using Backward Elimination alone. Combining Backward Elimination with correlation-coefficient feature selection therefore improves the final model and yields good results for detecting software defects.
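The correlation-based filtering step described in the abstract can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: it ranks features by the absolute Pearson correlation with the defect label and keeps the top k, and the function names, feature names, and toy values are all hypothetical. Backward Elimination would then be run on the retained subset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_by_correlation(columns, target, k):
    """Return the k feature names most correlated (in absolute value) with target.

    columns: dict mapping feature name to a list of numeric values.
    target: list of 0/1 defect labels, aligned with the feature values.
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy software metrics (hypothetical, not from the NASA MDP dataset).
toy_columns = {
    "loc": [10, 20, 30, 40],
    "comments": [5, 5, 5, 5],
    "branches": [1, 3, 2, 4],
}
toy_target = [0, 0, 1, 1]
print(top_k_by_correlation(toy_columns, toy_target, 2))
```

Filtering first shrinks the candidate set that Backward Elimination has to search, which is the combination the abstract credits for the improved final model.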