This study conducted experiments using ensemble methods, hyperparameter tuning, and voting to improve cross-project software defect prediction on the Kamei dataset. Five machine learning models (LightGBM, XGBoost, Random Forest, Extra Trees, and Gradient Boosting) were applied to six projects: Bugzilla, Columba, JDT, Mozilla, Platform, and Postgres.

The models performed well when tested on projects with similar characteristics or strong relationships, such as Mozilla, JDT, and Platform, achieving accuracy and F1 scores above 80%. This indicates that defect patterns learned from one project can transfer effectively to similar projects. However, performance dropped significantly when models trained on other projects were used to predict defects in Bugzilla, indicating notable differences in defect patterns or feature incompatibility.

Differences in data distribution across projects remain a major challenge in cross-project defect prediction (CPDP). Domain adaptation or feature transformation techniques are therefore needed to reduce inter-project differences, enabling models to better recognize defect patterns across projects. Despite some improvements, distribution differences and class imbalance still limit prediction performance, and future research should address these challenges.
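As a minimal sketch of the voting approach described above, the fragment below builds a soft-voting ensemble over three of the five learners using their scikit-learn implementations (LightGBM and XGBoost are external packages and are omitted here). The synthetic data, feature count, and all hyperparameters are illustrative assumptions, not the study's actual configuration; the Kamei dataset itself is not bundled.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic, imbalanced stand-in for commit-level defect data
# (class weights are an assumption, mimicking the minority defect class)
X, y = make_classification(n_samples=600, n_features=14,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Soft voting averages the predicted class probabilities of the base models
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
pred = ensemble.predict(X_test)
print(round(f1_score(y_test, pred), 3))
```

In a cross-project setting, `X_train`/`X_test` would instead come from different projects (e.g. train on JDT, test on Bugzilla), which is where the distribution-shift problem noted above arises.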
Copyright © 2025