Journal : Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control

Integrating Adaptive Sampling with Ensembles Model for Software Defect Prediction
Yusuf, Muhammad; Haq, Arinal; Rochimah, Siti
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol. 10, No. 2, May 2025
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v10i2.2191

Abstract

Handling class imbalance is a challenge in software defect prediction. Imbalanced datasets can cause bias in machine learning models, hindering their ability to detect defects. This paper proposes an integration of Adaptive Synthetic Sampling (ADASYN) and ensemble learning methods to improve prediction accuracy. ADASYN enhances the handling of imbalanced data by generating synthetic samples for hard-to-classify instances. At the same time, the ensemble stacking technique leverages the strengths of multiple models to reduce bias and variance. The machine learning models used in this study are K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The results demonstrate that ADASYN, combined with ensemble stacking, outperforms the traditional SMOTE technique in most cases. For instance, in the Ant-1.7 dataset, ADASYN achieved a stacking accuracy of 90.60% compared to 89.32% with SMOTE. Similarly, in the Camel-1.6 dataset, ADASYN achieved 91.56%, slightly exceeding SMOTE’s 91.32%. However, SMOTE performed better in simpler models like Decision Tree for certain datasets, highlighting the importance of choosing the appropriate resampling method. Across all datasets, ensemble stacking consistently provided the highest accuracy, benefiting from ADASYN's adaptive resampling strategy. These results underscore the importance of combining advanced sampling methods with ensemble learning techniques to address class imbalance effectively. This approach improves prediction accuracy and provides a practical framework for reliable software defect prediction in real-world scenarios. Future work will explore hybrid techniques and broader evaluations across diverse datasets and classifiers.
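
To make the abstract's description of ADASYN concrete, the sampling step can be sketched roughly as follows. This is a minimal standard-library illustration of the general ADASYN idea (more synthetic samples for minority points that sit among majority-class neighbours), not the authors' implementation. The function names `k_nearest` and `adasyn`, the `(features, label)` tuple layout, and the defaults `k=5` and `beta=1.0` are assumptions made for this example. As a simplification, interpolation partners are drawn from the nearest minority-class neighbours. The ensemble stacking step is not covered here.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance).

    `point` itself is excluded by object identity. Hypothetical helper.
    """
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point[0], c[0]))[:k]


def adasyn(samples, minority_label, k=5, beta=1.0, seed=0):
    """Generate synthetic minority samples in the spirit of ADASYN.

    samples: list of (features, label) tuples, features being a tuple of floats.
    beta:    1.0 aims for a fully balanced set after resampling.
    Returns only the new synthetic samples.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority_count = len(samples) - len(minority)
    total_new = int((majority_count - len(minority)) * beta)
    if total_new <= 0 or len(minority) < 2:
        return []

    # Difficulty of each minority point: the share of majority-class
    # points among its k nearest neighbours in the full dataset.
    ratios = []
    for s in minority:
        neighbours = k_nearest(s, samples, k)
        foreign = sum(1 for n in neighbours if n[1] != minority_label)
        ratios.append(foreign / len(neighbours))

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        return []  # no minority point has a majority-class neighbour

    synthetic = []
    for s, r in zip(minority, ratios):
        # Harder points receive a proportionally larger share of new samples.
        n_new = round(r / ratio_sum * total_new)
        same_class = k_nearest(s, minority, k)
        for _ in range(n_new):
            partner = rng.choice(same_class)
            lam = rng.random()
            # Interpolate between the point and a minority-class neighbour.
            features = tuple(a + lam * (b - a) for a, b in zip(s[0], partner[0]))
            synthetic.append((features, minority_label))
    return synthetic
```

The synthetic samples returned by `adasyn` would be appended to the training split only, before fitting the base learners and the stacking model. Because the per-point counts are rounded, the number of generated samples can differ slightly from the target.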