Cross-Project Defect Prediction (CPDP) addresses the scarcity of defect data in new software projects by transferring knowledge from existing ones. However, domain shift between source and target projects remains a major challenge. This study introduces a lightweight, practical CPDP pipeline built on traditional metric features that integrates domain adaptation (CORAL, TCA, TCA+), feature selection, and resampling. Across 120 configurations evaluated on multiple PROMISE datasets, combining TCA or TCA+ with SMOTEENN (Synthetic Minority Over-sampling Technique followed by Edited Nearest Neighbors cleaning) consistently improved F1-score and Recall on imbalanced datasets. LightGBM showed the most stable performance across projects, while Logistic Regression yielded the highest Matthews Correlation Coefficient (MCC) in specific cases. Principal Component Analysis (PCA) visualizations supported the effectiveness of domain alignment. The proposed pipeline offers a reproducible, cost-efficient alternative to deep learning models and provides actionable insights for defect prediction in resource-constrained environments.
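To make the reported metrics concrete, the following minimal sketch computes Recall, F1-score, and MCC from confusion-matrix counts, with the defective class treated as positive. The function name `defect_metrics` and all counts are hypothetical illustrations, not code or results from the study; the sketch only shows why these metrics, rather than accuracy, are informative on imbalanced defect data.

```python
import math


def defect_metrics(tp, fp, fn, tn):
    """Return recall, precision, F1 and MCC for the defective (positive) class.

    Hypothetical helper for illustration; ratios with a zero denominator
    are reported as 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}


# Hypothetical target project: 100 modules, 10 of them defective.
# A model that labels every module as clean is 90% accurate but finds no defects.
always_clean = defect_metrics(tp=0, fp=0, fn=10, tn=90)

# A model that finds 8 of the 10 defects at the cost of 12 false alarms.
finds_defects = defect_metrics(tp=8, fp=12, fn=2, tn=78)

print(always_clean)
print(finds_defects)
```

In this made-up example both models have similar accuracy (0.90 versus 0.86), yet only the second has non-zero Recall, F1-score, and MCC, which is the kind of difference that resampling methods such as SMOTEENN are meant to bring out.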
Copyrights © 2025