A Comparative Analysis of SMOTE and ADASYN for Cervical Cancer Detection using XGBoost with MICE Imputation
Ramadhan, Mita Azzahra; Saragih, Triando Hamonangan; Kartini, Dwi; Muliadi, Muliadi; Mazdadi, Muhammad Itqan
Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol 8 No 1 (2026): January
Publisher : Department of Electromedical Engineering, POLTEKKES KEMENKES SURABAYA

DOI: 10.35882/jeeemi.v8i1.1415

Abstract

Cervical cancer remains a significant global health burden for women, with approximately 660,000 new cases and 350,000 associated deaths recorded worldwide in 2022. Machine learning methods have shown great promise in advancing timely detection and accurate diagnosis. This investigation compares two widely used oversampling strategies, Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), applied to cervical cancer identification with the XGBoost classifier, paired with Multiple Imputation by Chained Equations (MICE) to handle incomplete data. The dataset consists of cervical cancer risk factors with four diagnostic outcomes (Hinselmann, Schiller, Cytology, and Biopsy), which are treated as independent binary classification tasks rather than a single multilabel classification problem. The dataset was first prepared through MICE imputation, after which SMOTE and ADASYN were applied to address class imbalance. The XGBoost model was optimized using Random Search hyperparameter tuning and evaluated across five train-test split ratios (50:50, 60:40, 70:30, 80:20, and 90:10) using accuracy, precision, recall, and F1-score (each with macro, micro, and weighted averaging), and AUC. The results indicated that the XGBoost setup with MICE and SMOTE outperformed the others, achieving 97.1% accuracy, 97.1% micro-precision, 97.1% micro-recall, 97.1% micro-F1, and 97.1% AUC. Meanwhile, the ADASYN-integrated model showed marginally lower results, with 95.4% accuracy, 95.4% micro-precision, 95.4% micro-recall, 95.4% micro-F1, and 55.5% AUC. SMOTE proved more adept at creating evenly distributed synthetic data for the underrepresented group. Overall, this work underscores the value of integrating MICE imputation, SMOTE oversampling, and tuned XGBoost as a reliable approach for cervical cancer detection.
These insights pave the way for automated screening tools that can bolster clinical judgment and improve early diagnosis outcomes.
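
The abstract reports identical values for accuracy, micro-precision, micro-recall, and micro-F1 in both configurations. This is expected for single-label tasks such as the binary ones described here: every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the micro-averaged scores reduce to accuracy. The following is a minimal sketch of that relationship in plain Python. The function name `micro_macro_scores` and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
from collections import Counter


def micro_macro_scores(y_true, y_pred):
    """Compute accuracy plus micro- and macro-averaged precision/recall/F1.

    Hypothetical helper for illustration; assumes single-label classification.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class receives a false positive
            fn[t] += 1  # true class receives a false negative

    def ratio(num, den):
        # Return 0.0 when a class has no predictions or no true samples.
        return num / den if den else 0.0

    def f1(p, r):
        return ratio(2 * p * r, p + r)

    # Micro averaging pools the counts over all classes before dividing.
    total_tp = sum(tp.values())
    micro_p = ratio(total_tp, total_tp + sum(fp.values()))
    micro_r = ratio(total_tp, total_tp + sum(fn.values()))

    # Macro averaging scores each class separately, then takes the plain mean.
    per_p = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    per_r = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    per_f = [f1(p, r) for p, r in zip(per_p, per_r)]

    return {
        "accuracy": ratio(total_tp, len(y_true)),
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": f1(micro_p, micro_r),
        "macro_precision": sum(per_p) / len(labels),
        "macro_recall": sum(per_r) / len(labels),
        "macro_f1": sum(per_f) / len(labels),
    }


# Toy imbalanced example (made-up labels): 18 negatives, 2 positives,
# with one positive case missed by the classifier.
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 18 + [1, 0]
scores = micro_macro_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the micro-averaged scores all equal the accuracy, while macro-recall is lower because the missed positive case halves the recall of the minority class. On imbalanced data, macro-averaged scores and AUC are therefore the more informative indicators of minority-class performance.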