Purpose: Early detection of software defects is essential for preventing later maintenance problems. Although machine learning has been widely applied to software defect prediction, most studies pay little attention to data imbalance and feature correlation. This research focuses on overcoming the imbalanced-dataset problem. It provides new insight into how an oversampling technique (SMOTE) and a feature selection technique (PSO) affect the accuracy of software defect prediction. Methods: This research compares three algorithms: Random Forest (RF), Logistic Regression (LR), and XGBoost (XGB), each combined with Particle Swarm Optimization (PSO) for feature selection and the Synthetic Minority Over-sampling Technique (SMOTE) for handling imbalanced data. Performance is evaluated using Accuracy, Precision, Recall, and F1-Score. Result: SMOTE and PSO show potential for enhancing software defect detection models, particularly the ensemble algorithms RF and XGB. With SMOTE and PSO applied, accuracy increased significantly to 87.63% for RF and 85.40% for XGB, but decreased to 72.98% for LR. F1-Score, Precision, and Recall likewise improved substantially for RF and XGB, but not for LR, consistent with its drop in accuracy. Novelty: The comparison results indicate that SMOTE and PSO can improve Random Forest and XGBoost models for predicting software defects.
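The oversampling step named in the Methods can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. This is not the implementation used in the study, which the abstract does not specify. The function name `smote_sketch`, the toy data, and the parameter values are hypothetical, and only the Python standard library is assumed. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be preferred.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (the core idea of SMOTE).

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two features per software module.
majority = [(float(x), float(y)) for x in range(6) for y in range(5)]  # non-defective
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0), (12.0, 11.0)]    # defective

# Generate enough synthetic defective samples to balance the two classes.
new_points = smote_sketch(minority, len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic sample lies on a line segment between two real minority samples, the oversampled class stays inside the region the original defective modules occupy. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics are computed on unmodified data.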
Copyrights © 2024