Sinkron : Jurnal dan Penelitian Teknik Informatika
Vol. 9 No. 3 (2025): Article Research July 2025

Integrating SMOTE with XGBoost for Robust Classification on Imbalanced Datasets: A Dual-Domain Evaluation

Siagian, Novriadi Antonius
Sipayung, Sardo P
Alex Rikki
Marbun, Nasib



Article Info

Publish Date
15 Jul 2025

Abstract

Class imbalance is a central challenge in classification: it reduces a model's ability to identify minority-class instances and undermines the overall reliability of predictions. To address this problem, this study proposes an integrated approach that combines the Synthetic Minority Over-sampling Technique (SMOTE) with XGBoost to improve classification performance on imbalanced data. The study evaluates the impact of oversampling on prediction accuracy and on the model's sensitivity to class distribution. The evaluation uses two public datasets from different domains and of different sizes, Spambase and Diabetes, to assess the effectiveness and generalization of the approach. The experimental results show that the integrated model consistently outperforms the traditional comparison algorithms, achieving an F1 score of 0.94 and a ROC-AUC of 0.98 on the Spambase dataset and a ROC-AUC of 0.83 on the Diabetes dataset, with a good balance between precision and recall. Ten-fold cross-validation was applied so that the performance estimates do not depend on a single random data split. The study also highlights the importance of choosing appropriate evaluation metrics for imbalanced data, because accuracy alone often gives a misleading picture of performance. Its contribution is a benchmark of SMOTE-XGBoost integration on two different datasets, supported by rigorous cross-validation. These findings support combining data preprocessing strategies with ensemble learning as a competitive and adaptive solution to class imbalance in data-driven classification systems.
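The two ideas the abstract relies on, oversampling the minority class by interpolation and judging a model by F1 rather than accuracy, can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation; a study of this kind would normally use library versions of SMOTE and XGBoost. The function names `smote` and `accuracy_and_f1`, the default `k=5`, and the toy data are assumptions made only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper: a simplified version of the SMOTE idea, not a
    drop-in replacement for a library implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest to the base first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy data (assumed): 3 minority points, topped up with 6 synthetic ones.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
balanced_minority = minority + smote(minority, n_new=6, k=2, seed=1)
print(len(balanced_minority))

# A model that always predicts the majority class on a 95/5 split.
print(accuracy_and_f1([0] * 95 + [1] * 5, [0] * 100))
```

The always-majority model scores high on accuracy while its F1 for the minority class is zero, which is the kind of misleading picture the abstract warns about. When oversampling is combined with cross-validation, a common precaution is to apply it to the training folds only, so that synthetic points never leak into the held-out fold; the abstract does not state how the authors handled this.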

Copyright © 2025






Journal Info

Abbrev

sinkron

Subject

Computer Science & IT

Description

Scope of Sinkron's Scientific Discussion:
1. Machine Learning
2. Cryptography
3. Steganography
4. Digital Image Processing
5. Networking
6. Security
7. Algorithm and Programming
8. Computer Vision
9. Troubleshooting
10. Internet and E-Commerce
11. Artificial Intelligence
12. Data Mining
13. Artificial ...