Claim Missing Document
Check
Articles

Found 1 Documents
Search

An Empirical Study of Cross-Project and Within-Project Performance in Software Defect Prediction Models Using Tree-Based and Boosting Classifiers Raidra Zeniananto; Herteno, Rudy; Radityo Adi Nugroho; Andi Farmadi; Setyo Wahyu Saputro
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 7 No. 3 (2025): August
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.35882/ijeeemi.v7i3.95

Abstract

Software Defect Prediction (SDP) is a vital process in modern software engineering aimed at identifying faulty components in the early stages of development. In this study, we conducted a comprehensive evaluation of two widely employed SDP approaches, Within-Project Software Defect Prediction (WP-SDP) and Cross-Project Software Defect Prediction (CP-SDP), using identical preprocessing steps to ensure an objective comparison. We utilized the NASA MDP dataset, where each project was split into 70% training and 30% testing data, and applied three distinct resampling strategies—no sampling, oversampling, and undersampling—to address the challenge of class imbalance. Five classification algorithms were examined, including Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), and LightGBM (LGBM). Performance was measured primarily using Accuracy and Area Under the Curve (AUC) metrics, resulting in 360 experimental outcomes. Our findings revealed that WP-SDP, combined with oversampling and Random Forest, demonstrated superior predictive capability on most projects, achieving an Accuracy of 89.92% and an AUC of 0.931 on PC4. Nonetheless, CP-SDP excelled in certain small-scale projects (e.g., MW1), underscoring its potential when local historical data is scarce but inter-project characteristics remain sufficiently similar. This study’s results underscore the importance of selecting a prediction scheme tailored to specific project attributes, class imbalance levels, and available historical data. By establishing a standardized methodological framework, our work contributes to a clearer understanding of the strengths and limitations of WP-SDP and CP-SDP, paving the way for more effective defect detection strategies and improved software quality.