Sipayung, Sardo P
Unknown Affiliation

Integrating SMOTE with XGBoost for Robust Classification on Imbalanced Datasets: A Dual-Domain Evaluation
Siagian, Novriadi Antonius; Sipayung, Sardo P; Alex Rikki; Marbun, Nasib
Sinkron : jurnal dan penelitian teknik informatika Vol. 9 No. 3 (2025): Article Research July 2025
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v9i3.15029

Abstract

Class imbalance is a central challenge in classification: it reduces a model's ability to identify minority classes accurately and undermines the overall reliability of predictions. To address this problem, this study proposes an integrated approach that combines SMOTE oversampling with XGBoost to improve classification performance on imbalanced data. The study evaluates how oversampling affects prediction accuracy and the model's sensitivity to class distribution. The evaluation uses two public datasets from different domains and of different sizes, Spambase and Diabetes, to assess the effectiveness and generalization of the approach. The experimental results show that the integrated model consistently outperforms the traditional comparison algorithms, achieving an F1 score of 0.94 and a ROC-AUC of 0.98 on the Spambase dataset and a ROC-AUC of 0.83 on the Diabetes dataset, with a good balance between precision and recall. Ten-fold cross-validation was applied so that the performance estimates do not depend on a single random data split. The study also highlights the importance of choosing appropriate evaluation metrics for imbalanced data, because accuracy alone often gives a misleading picture of performance. Its contribution is a benchmark of SMOTE-XGBoost integration on two different datasets, supported by rigorous cross-validation. These findings support combining data preprocessing strategies with ensemble learning as a competitive and adaptive solution to class imbalance in data-driven classification systems.
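
The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' code. The function name `smote_sketch`, its parameters, and the toy data are assumptions made up for illustration, and only the Python standard library is used. A real pipeline would more likely rely on an established SMOTE implementation and pass the balanced training data to an XGBoost classifier.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature lists (minority class only).
    Hypothetical helper for illustration; not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbor = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy data (invented for illustration): 3 minority points vs. 8 majority points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority_count = 8
new_points = smote_sketch(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))  # minority class now matches the majority count
```

When oversampling is combined with cross-validation, a common practice is to oversample only the training portion of each fold so that synthetic points never appear in the held-out data. The abstract does not state how the study handled this.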