Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Vol 9 No 6 (2025): December 2025 (in progress)

Benchmarking Machine Learning Paradigms for Resume Screening on Imbalanced Data

Fitri Noor Febriana
Ira Puspitasari



Article Info

Publish Date
06 Dec 2025

Abstract

Manual resume screening is inefficient and prone to bias, yet comprehensive benchmarks of machine learning models on imbalanced, real-world recruitment data remain scarce. This study addresses this gap by benchmarking seven models from the classical, ensemble, and deep learning paradigms for automated resume classification. Using a private dataset of 2,483 resumes across 24 job categories, the models are evaluated with two distinct feature pipelines (TF-IDF and BERT embeddings) and an adaptive strategy for handling class imbalance (class weights, SMOTE, and SMOTEENN). The results show that XGBoost achieved the highest performance (weighted F1-score of 0.779), followed by the competitive BERT (F1 0.728) and Random Forest (F1 0.711) models. Despite these imbalance-handling strategies, all models struggled with the extreme minority classes, confirming data scarcity as a primary limitation. This study provides a benchmark and an evidence-based framework for HR practitioners, highlighting the trade-off between predictive performance (XGBoost), interpretability (Random Forest), and semantic capability (BERT). The findings indicate that the primary challenge is data representation, pointing future work toward data augmentation and fairness audits.
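Two quantities named in the abstract, class weights and the weighted F1-score, can be made concrete with a short sketch. The Python below uses only the standard library and is an illustration, not the authors' code. It assumes the class weights follow the common "balanced" heuristic (n_samples / (n_classes * count_c)) popularized by scikit-learn, which the abstract does not confirm, and the job-category labels `dev`, `hr`, and `law` are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count), so rarer classes weigh more.

    Assumption: this mirrors scikit-learn's "balanced" heuristic; the study's
    exact weighting formula is not stated in the abstract.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_f1(y_true, y_pred):
    """Average per-class F1 scores, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_c / total
    return score


# Hypothetical toy labels: one majority, one mid-size, and one extreme minority class.
y_true = ["dev"] * 6 + ["hr"] * 3 + ["law"]
y_pred = ["dev"] * 6 + ["hr", "hr", "dev"] + ["dev"]  # the lone "law" resume is misclassified

weights = balanced_class_weights(y_true)
print({c: round(w, 3) for c, w in weights.items()})  # {'dev': 0.556, 'hr': 1.111, 'law': 3.333}
print(round(weighted_f1(y_true, y_pred), 3))  # 0.754
```

In this toy case the `law` class is never predicted correctly (F1 of 0), yet the weighted F1 stays near 0.75 because that class holds only 10% of the support. This is one way a weighted F1 can mask the minority-class failures the abstract reports, which is why per-class scores are worth inspecting alongside it.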

Copyright © 2025






Journal Info

Abbrev

RESTI

Subject

Computer Science & IT Engineering

Description

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is intended as a medium for the scholarly study of research results, ideas, and critical-analytical studies in Systems Engineering, Informatics Engineering/Information Technology, Informatics Management, and Information Systems. As part of the spirit ...