Abdul Karim
Department of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea

Published: 2 Documents

Articles
Evaluation of Undersampling and Oversampling Techniques in Term Deposit Prediction: A Gradient Boosting Approach
Lasmedi Afuan; Abdul Karim; Ipung Permadi
International Journal of Machine Learning (IJOML) Vol. 1 No. 1, June 2026
Publisher: APJIKOM

DOI: 10.66472/ijoml.v1i1.2

Abstract

Time deposits play a pivotal role in maintaining banking liquidity, yet the telemarketing campaigns designed to secure them are often inefficient, suffering from low response rates and untargeted outreach. The central challenge in predictive marketing models is extreme class imbalance, which biases standard algorithms toward the majority class and causes them to miss potential customers. This study validates the effectiveness of gradient boosting models and empirically evaluates the impact of several resampling techniques in mitigating the class distribution disparity. The methodology applies the XGBoost, LightGBM, and CatBoost algorithms to the UCI Bank Marketing dataset, combined with Random Under-Sampling, Random Over-Sampling, SMOTENC, and Tomek Links strategies. Experimental results reveal a pronounced trade-off between sensitivity and precision: LightGBM paired with Random Under-Sampling achieved the highest detection capability with a recall of 88.28%, while CatBoost with Random Over-Sampling offered the best balance, attaining an F1-score of 0.6040, a recall of 81.95%, and an AUC-ROC of 0.9326. These findings give bank management a strategic basis for selecting an analytic approach aligned with business priorities, whether the focus is operational cost efficiency or aggressive market penetration to maximize customer acquisition.
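To make the pipeline concrete, here is a minimal sketch of the resampling-plus-boosting setup the abstract describes, assuming the standard imbalanced-learn and LightGBM APIs. It is not the authors' code: the synthetic features and labels stand in for the preprocessed UCI Bank Marketing data, and all hyperparameters are illustrative. The essential discipline it shows is resampling only the training split, then scoring recall, F1, and AUC-ROC on the untouched test distribution.

```python
# Minimal sketch: gradient boosting with resampling for class imbalance.
# Assumes imbalanced-learn, LightGBM, and scikit-learn; the data here is
# synthetic and stands in for the preprocessed UCI Bank Marketing features.
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))              # placeholder feature matrix
y = (rng.random(5000) < 0.1).astype(int)    # ~10% positives: imbalanced

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

for name, sampler in [
    ("RandomUnderSampler", RandomUnderSampler(random_state=0)),
    ("RandomOverSampler", RandomOverSampler(random_state=0)),
]:
    # Rebalance the training split only; the test split keeps the
    # original class distribution so the metrics stay honest.
    X_rs, y_rs = sampler.fit_resample(X_tr, y_tr)
    model = LGBMClassifier(n_estimators=200, random_state=0)
    model.fit(X_rs, y_rs)
    proba = model.predict_proba(X_te)[:, 1]
    pred = (proba >= 0.5).astype(int)
    print(f"{name}: recall={recall_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred):.3f} auc={roc_auc_score(y_te, proba):.3f}")
```

The paper's other two strategies live in the same library (imblearn.over_sampling.SMOTENC and imblearn.under_sampling.TomekLinks) and slot into the same loop, though SMOTENC additionally requires the indices of the categorical columns, which matters for the Bank Marketing data's mixed feature types. Swapping LGBMClassifier for XGBoost's or CatBoost's classifier completes the grid of combinations evaluated in the study.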
RoBERTa with Sample Reweighting and Temperature Scaling for Imbalanced Toxicity Detection: A Performance–Fairness–Calibration Study
Lasmedi Afuan; Nurul Hidayat; Abdul Karim
International Journal of Machine Learning (IJOML) Vol. 1 No. 1, June 2026
Publisher: APJIKOM

DOI: 10.66472/ijoml.v1i1.3

Abstract

Detecting toxic language at scale requires models that are not only accurate but also robust to demographic subgroup bias and reliable in their probability estimates; however, these objectives can conflict, especially under severe class imbalance. This study investigates the performance–fairness–calibration interplay in toxicity detection using the Jigsaw Unintended Bias dataset (124,858 comments; 5.99% toxic; identity annotations in 9.39% of samples). We aim to quantify how sample reweighting and imbalance-aware training affect global discrimination, worst-subgroup behavior, and probabilistic calibration, and to assess post-hoc temperature scaling of predicted probabilities. We compare a TF-IDF + logistic regression baseline against RoBERTa variants trained without mitigation, with sample reweighting, and with an imbalance-oriented loss, using multi-metric evaluation (AUC, Min/Worst-Subgroup AUC, ECE, and NLL). RoBERTa consistently improves global AUC over the baseline (≈0.96 vs 0.9155), while worst-subgroup AUC remains substantially lower and varies only modestly across RoBERTa variants (≈0.7726–0.7813). Calibration results reveal a marked gap between models: the baseline achieves the lowest ECE (0.0052), whereas RoBERTa exhibits higher ECE (≈0.0257) that increases further under reweighting and imbalance-oriented training (≈0.0490–0.0866), with NLL not improving consistently. These findings provide empirical evidence that fairness-oriented interventions can shift error and calibration profiles, motivating holistic evaluation and methods that jointly constrain subgroup fairness and probabilistic reliability.
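The two calibration ingredients named in the abstract can be sketched compactly: temperature scaling fits one scalar T on held-out logits by minimising NLL and divides all logits by T before the softmax, while ECE bins predictions by confidence and averages the accuracy-confidence gap. The PyTorch sketch below is illustrative only; the synthetic logits stand in for RoBERTa outputs, and the function names and hyperparameters are assumptions, not the paper's code.

```python
# Minimal sketch of post-hoc temperature scaling and binned ECE in PyTorch.
# The logits below are synthetic stand-ins for RoBERTa validation outputs.
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, lr=0.01, steps=200):
    """Learn a single scalar T > 0 that minimises NLL on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)  # parameterise T = exp(log_t)
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width-bin ECE: |accuracy - confidence| weighted by bin mass."""
    conf, pred = probs.max(dim=1)
    correct = pred.eq(labels).float()
    ece = torch.zeros(1)
    edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.float().mean() * (correct[mask].mean() - conf[mask].mean()).abs()
    return ece.item()

torch.manual_seed(0)
labels = (torch.rand(2000) < 0.06).long()   # ~6% positive, echoing the dataset's imbalance
logits = torch.randn(2000, 2) + 3.0 * F.one_hot(labels, 2).float()

T = fit_temperature(logits, labels)
probs_raw = logits.softmax(dim=1)
probs_cal = (logits / T).softmax(dim=1)
print(f"T={T:.3f}  ECE raw={expected_calibration_error(probs_raw, labels):.4f}  "
      f"ECE scaled={expected_calibration_error(probs_cal, labels):.4f}")
```

Because dividing logits by a positive scalar is monotonic, temperature scaling leaves AUC and the predicted ranking unchanged; it can only reshape the probabilities, which is why its effect shows up in ECE and NLL rather than in the discrimination metrics.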