Cervical cancer has a high mortality rate among women, which has driven the adoption of early detection systems based on machine learning. Their implementation, however, is hindered by class imbalance, as seen in the UCI Cervical Cancer Behavior Risk Dataset, where positive cases constitute only 5.8–7.3% of the data. This study evaluates four resampling techniques (SMOTE, ADASYN, Random Undersampling, and Borderline-SMOTE) in combination with five classification algorithms: Random Forest (RF), XGBoost, Logistic Regression (LR), Gaussian Naive Bayes (GNB), and k-Nearest Neighbors (k-NN). The evaluation uses Stratified K-Fold Cross-Validation to preserve the original class distribution in each fold, and resampling is applied only to the training data in each iteration so that the held-out fold is never resampled. The results demonstrate that Borderline-SMOTE significantly improved model performance. Specifically, the Random Forest model achieved a Recall of 0.87 and an AUC-ROC of 0.94. These findings are expected to provide a foundation for future research on optimizing adaptive sampling methods.
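The evaluation protocol described above (stratified folds, with resampling confined to the training portion of each fold) can be illustrated with a minimal sketch. This is not the study's implementation: it uses only the Python standard library, substitutes plain Random Undersampling for Borderline-SMOTE to stay self-contained, and runs on hypothetical toy labels (10 positives out of 150, about 6.7%). The function names `stratified_folds` and `random_undersample` are illustrative assumptions.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    # Deal each class's shuffled indices round-robin across the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def random_undersample(train_idx, labels, seed=0):
    """Balance the training indices by sampling every class down to the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for idx in train_idx:
        by_class.setdefault(labels[idx], []).append(idx)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(rng.sample(indices, n_min))
    return sorted(balanced)


# Hypothetical toy labels (not the UCI data): 10 positives, 140 negatives.
labels = [1] * 10 + [0] * 140
folds = stratified_folds(labels, k=5)

splits = []
for i, test_idx in enumerate(folds):
    # Training indices are all folds except the held-out one.
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # Resampling touches the training indices only; the test fold is left as-is.
    train_bal = random_undersample(train_idx, labels, seed=i)
    splits.append((train_bal, test_idx))
    train_counts = Counter(labels[idx] for idx in train_bal)
    test_counts = Counter(labels[idx] for idx in test_idx)
    print(f"fold {i}: train {dict(train_counts)}, test {dict(test_counts)}")
```

In this sketch every test fold keeps the original imbalance while the training indices are balanced, so metrics such as Recall and AUC-ROC would be computed on data with a realistic class distribution. Resampling before splitting would instead leak information from the held-out fold into training and inflate the scores.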
Copyright © 2025