Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

This author has published in these journals:

Journal of Technology Informatics and Engineering

Binghua Zhou

Computer Science, USC, CA, USA

Author-ID : 10212079

Computer Science & IT

Published: 1 document

Articles

Calibrated Resume-Job Matching for Trustworthy LLM-Assisted Recruiter Screening: Pairwise Matching, Probability Calibration, and Selective Refusal on Two Public Recruitment Datasets
Binghua Zhou; Jiaying Jin; David Zhao
Journal of Technology Informatics and Engineering (JTIE), Vol. 4 No. 3 (2025): December
Publisher: University of Science and Computer Technology

DOI: 10.51903/jtie.v4i3.529

Recruiter screening increasingly relies on large language model (LLM)-assisted workflows, but high-stakes applications require reproducible matching, calibrated probabilities, and reliable handling of uncertain cases. This study evaluates a screening framework combining matching, calibration, and selective refusal using two public datasets: resume-job-description-fit for supervised pairwise learning and Resume-Screening-Dataset for benchmarking and external generalization. After deterministic preprocessing, we compared cosine similarity, alignment features, TF-IDF pairwise models, and hybrid models integrating text, alignment, and title information. The strongest probabilistic models were calibrated with Platt scaling and isotonic regression and evaluated under confidence-based refusal. On the resume-job-description-fit test set, the best three-class model achieved a macro-F1 of 0.450. For binary shortlist-versus-reject screening, the title-augmented hybrid model obtained 0.654 balanced accuracy, 0.647 F1, and 0.699 AUROC. Platt calibration improved probability estimates by reducing the Brier score from 0.232 to 0.226 and negative log-likelihood from 0.772 to 0.675. Selective refusal further improved in-domain accuracy, while cross-dataset transfer remained weak (AUROC 0.47–0.51). These results indicate that matching, calibration, and selective refusal enhance trustworthy within-domain screening, although human review remains essential under distribution shift.
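The abstract names three evaluation ingredients: Platt calibration, the Brier score, and confidence-based refusal. The following is a minimal sketch of how these pieces fit together, not the authors' implementation. It assumes a binary shortlist-versus-reject setting, fits the Platt sigmoid by plain gradient descent without Platt's original target smoothing, and uses made-up toy scores; the function names, the learning rate, and the 0.7 refusal threshold are all hypothetical. In practice the calibrator would be fit on a held-out calibration split rather than on the data it is evaluated on.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss.

    Simplified Platt scaling: no target smoothing (an assumption of this sketch).
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def selective_predict(probs, threshold=0.7):
    """Return 1 or 0 when confidence max(p, 1 - p) >= threshold, else None (refuse)."""
    decisions = []
    for p in probs:
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            decisions.append(1 if p >= 0.5 else 0)
        else:
            decisions.append(None)  # defer this case to human review
    return decisions


def selective_accuracy(decisions, labels):
    """Accuracy on the non-refused cases, plus coverage (fraction not refused)."""
    kept = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(kept) / len(labels)
    accuracy = sum(d == y for d, y in kept) / len(kept) if kept else float("nan")
    return accuracy, coverage


# Toy, made-up matcher scores and shortlist labels (1 = shortlist, 0 = reject).
toy_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -1.5, 1.5]
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]

a, b = fit_platt(toy_scores, toy_labels)
calibrated = [sigmoid(a * s + b) for s in toy_scores]
decisions = selective_predict(calibrated, threshold=0.7)
accuracy, coverage = selective_accuracy(decisions, toy_labels)
print("Brier score:", brier_score(calibrated, toy_labels))
print("selective accuracy:", accuracy, "coverage:", coverage)
```

Raising the threshold trades coverage for accuracy on the retained cases, which is the mechanism behind the in-domain gain the abstract reports; refused cases are the ones routed to human review.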

Co-Authors: David Zhao, Jiaying Jin
