Cervical cancer remains one of the leading causes of cancer-related mortality among women worldwide, particularly in developing countries. Early prediction through machine learning can support clinical decision-making; however, cervical cancer datasets often suffer from severe class imbalance, which reduces the ability of conventional models to detect minority-class cases. This study aims to improve minority-class detection in cervical cancer prediction by evaluating several imbalance-aware ensemble learning approaches. Five models are compared: Random Forest (RF) as the baseline, SMOTE combined with Random Forest (SMOTE+RF), Balanced Random Forest (BRF), EasyEnsemble, and RUSBoost. The models were evaluated using 5-fold cross-validation with accuracy, recall, F1-score, and Area Under the Curve (AUC) as performance metrics. Statistical validation was conducted using the Friedman test, followed by the Wilcoxon signed-rank test and Kendall’s W effect size analysis, to assess the significance and magnitude of performance differences. Unlike prior studies that focus primarily on performance improvement, this study adds a statistical comparison that assesses both the significance and the practical effect of imbalance-aware ensemble methods. Experimental results show that imbalance-aware ensemble methods improve minority-class detection relative to the baseline RF model. In particular, BRF achieved the highest AUC of 0.9469 with more stable recall, while RUSBoost produced the highest F1-score of 0.7451. The Friedman test indicated no statistically significant difference among the models (p = 0.2037), so the observed gains should not be read as statistically significant; the Kendall’s W value of 0.297 nevertheless suggests a small-to-moderate practical effect. These findings indicate that imbalance-aware ensemble learning can enhance the robustness of cervical cancer prediction models, particularly for minority-class detection.
The results highlight the importance of incorporating imbalance-handling strategies in medical prediction systems and suggest potential directions for future research in improving diagnostic decision-support models.
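The relationship between the reported Friedman p-value and Kendall’s W can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes the five cross-validation folds act as the blocks (n = 5) and the five models as the treatments (k = 5), and it uses the standard identity W = chi2 / (n (k - 1)). The function names are hypothetical. Under these assumptions, W = 0.297 corresponds to a Friedman statistic of about 5.94 with 4 degrees of freedom, which gives a p-value of about 0.2037, consistent with the figures reported above.

```python
import math


def friedman_chi2_from_w(w: float, n_blocks: int, k_treatments: int) -> float:
    """Recover the Friedman chi-square statistic from Kendall's W.

    Uses the standard identity W = chi2 / (n * (k - 1)).
    """
    return w * n_blocks * (k_treatments - 1)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Assumed design (not stated explicitly in the abstract):
# 5 CV folds as blocks, 5 models as treatments.
n_folds = 5
n_models = 5
reported_w = 0.297

chi2 = friedman_chi2_from_w(reported_w, n_folds, n_models)
p_value = chi2_sf_even_df(chi2, n_models - 1)

print(f"Friedman chi-square: {chi2:.2f}")  # about 5.94
print(f"p-value: {p_value:.4f}")           # about 0.2037
```

With only five blocks the chi-square approximation to the Friedman statistic is coarse, which is one reason a small-to-moderate W can coexist with a non-significant p-value.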
Copyrights © 2026