Journal : Journal of Technology Informatics and Engineering

Off-Policy Evaluation and Conservative Policy Selection for Slot-Level Dynamic Bidding and Ranking on the Open Bandit Dataset (Small)
Ye, Tong; Mu, Jinyi; Hunter, James
Journal of Technology Informatics and Engineering (JTIE), Vol. 5 No. 1 (2026): April
Publisher : University of Science and Computer Technology

DOI: 10.51903/jtie.v5i1.503

Abstract

Dynamic bidding and ranking systems must improve revenue or engagement while avoiding harmful regressions during deployment. This paper presents an end-to-end offline workflow for off-policy evaluation (OPE) and conservative policy selection, applied to slot-level contextual bandit approximations of ranking decisions. Using the small Open Bandit Dataset (OBD-small) from ZOZOTOWN (ZOZO, Inc.), each logged row is treated as a context-dependent choice among discrete actions (items), with a binary click reward and a logged propensity score. This formulation is suitable at the slot level but does not capture full listwise ranking or multi-step offline reinforcement learning. Empirically, highly deterministic evaluation policies exhibit extreme variance under sparse clicks, while the logistic reward model remains weak (ROC-AUC ≈ 0.5), limiting the interpretability of the direct-method (DM) and doubly robust (DR) estimates. Clipped-DR mixing yields only limited certified improvements: in the women's campaign, gains appear only at moderate confidence (δ = 0.10) and for clipping caps up to M = 5, whereas stricter or looser settings revert to the baseline; in the men's campaign, certification is largely absent. These findings show that OPE diagnostics and conservative mixing enable reproducible offline selection under uncertainty, but they do not indicate deployment-ready improvements.
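The abstract names clipped doubly robust (DR) estimation and conservative mixing certified at a confidence level δ, but it gives no formulas. Below is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes the textbook clipped-DR form, with the importance weight capped at min(π_e(a)/p, M). It also assumes a linear mixture between the logging baseline and the evaluation policy, and a one-sided normal-approximation lower confidence bound on the paired per-row gain. The paper's actual bound and mixing scheme may differ. The row fields (`pscore`, `q_hat`, `pi_b`, `pi_e`) and the `make_toy_rows` generator are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def clipped_dr_terms(rows, policy_probs, cap):
    """Per-row clipped doubly robust terms for one evaluation policy.

    Assumption: each row holds the logged action, binary reward, logged
    propensity `pscore`, and reward-model predictions `q_hat` per action.
    `policy_probs(row)` returns the policy's action probabilities for that row.
    """
    terms = []
    for row in rows:
        pi = policy_probs(row)
        q = row["q_hat"]
        a = row["action"]
        # Importance weight, clipped at the cap M to bound variance.
        w = min(pi[a] / row["pscore"], cap)
        # Direct-method part: expected predicted reward under pi.
        dm = sum(p_k * q_k for p_k, q_k in zip(pi, q))
        # DR correction uses the residual on the logged action only.
        terms.append(dm + w * (row["reward"] - q[a]))
    return terms


def mixture(alpha):
    """Hypothetical mixing rule: (1 - alpha) * baseline + alpha * evaluation."""
    def probs(row):
        return [(1 - alpha) * b + alpha * e for b, e in zip(row["pi_b"], row["pi_e"])]
    return probs


def select_mixing_weight(rows, alphas, cap, delta):
    """Return (alpha, lcb) for the mixture with the best certified gain.

    Falls back to alpha = 0 (pure baseline) when no mixture has a positive
    one-sided lower confidence bound at level 1 - delta. Needs >= 2 rows.
    """
    z = NormalDist().inv_cdf(1.0 - delta)
    base = clipped_dr_terms(rows, lambda r: r["pi_b"], cap)
    best_alpha, best_lcb = 0.0, 0.0
    for alpha in alphas:
        terms = clipped_dr_terms(rows, mixture(alpha), cap)
        # Paired differences against the baseline on the same logged rows.
        diffs = [t - b for t, b in zip(terms, base)]
        lcb = fmean(diffs) - z * stdev(diffs) / len(diffs) ** 0.5
        if lcb > best_lcb:
            best_alpha, best_lcb = alpha, lcb
    return best_alpha, best_lcb


def make_toy_rows(pi_e, n=400):
    """Hypothetical two-action log: uniform logging, clicks only on action 1."""
    rows = []
    for i in range(n):
        rows.append({
            "action": i % 2,
            "reward": 1 if i % 4 == 1 else 0,
            "pscore": 0.5,
            "pi_b": [0.5, 0.5],
            "pi_e": list(pi_e),
            "q_hat": [0.0, 0.0],  # deliberately uninformative reward model
        })
    return rows


if __name__ == "__main__":
    rows = make_toy_rows([0.0, 1.0])
    for cap in (1.0, 5.0):
        print(cap, select_mixing_weight(rows, [0.25, 0.5, 0.75, 1.0], cap, delta=0.10))
```

The toy reward model is uninformative, so each DR term reduces to a clipped importance-weighted reward. With a cap of M = 1, the evaluation policy's advantage is clipped away on this log and the selector returns the baseline (α = 0). This is a small analogue of overly strict settings reverting to the baseline.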