Journal of Technology Informatics and Engineering
Vol. 5 No. 1 (2026): APRIL

Off-Policy Evaluation and Conservative Policy Selection for Slot-Level Dynamic Bidding and Ranking on the Open Bandit Dataset (Small)

Ye, Tong
Mu, Jinyi
Hunter, James



Article Info

Publish Date
20 Apr 2026

Abstract

Dynamic bidding and ranking systems must improve revenue or engagement while avoiding harmful regressions during deployment. This paper presents an end-to-end offline workflow for off-policy evaluation (OPE) and conservative policy selection, applied to slot-level contextual bandit approximations of ranking decisions. Using the small Open Bandit Dataset (OBD-small) from ZOZOTOWN (ZOZO, Inc.), we treat each logged row as a context-dependent choice among discrete actions (items), with a binary click reward and a logged propensity. This formulation is suitable at the slot level but does not capture full listwise ranking or multi-step offline reinforcement learning. Empirically, highly deterministic evaluation policies exhibit extreme variance under sparse clicks, and the logistic reward model remains weak (ROC-AUC ≈ 0.5), which limits the interpretability of the direct method (DM) and doubly robust (DR) estimates. Clipped-DR mixing yields only limited certified improvements: in the women’s campaign, gains appear only at moderate confidence (δ=0.10) and for caps up to M=5, whereas stricter or looser settings revert to baseline; in the men’s campaign, certification is largely absent. These findings show that OPE diagnostics and conservative mixing enable reproducible offline selection under uncertainty, but they do not indicate deployment-ready improvements.
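To make the abstract's terminology concrete, the following is a minimal sketch of clipped doubly robust estimation combined with a conservative mixing rule. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of a one-sided Hoeffding bound, the grid of mixing weights, the comparison against the mean logged reward, and the availability of the full logging distribution `pi_b` are all hypothetical choices. The cap `M` is assumed here to be a ceiling on importance weights.

```python
import numpy as np


def clipped_dr(reward, action, pscore, pi_e, q_hat, cap):
    """Per-row clipped doubly robust terms.

    reward: (n,) binary clicks; action: (n,) logged action indices;
    pscore: (n,) logged propensity of the logged action;
    pi_e:   (n, k) evaluation policy probabilities;
    q_hat:  (n, k) reward model predictions in [0, 1];
    cap:    assumed maximum importance weight M.
    """
    rows = np.arange(len(reward))
    # Importance weight of the logged action, clipped at the cap.
    weight = np.minimum(pi_e[rows, action] / pscore, cap)
    # Direct method term: model value of the evaluation policy.
    dm = (pi_e * q_hat).sum(axis=1)
    # Correction term: weighted residual of the reward model.
    return dm + weight * (reward - q_hat[rows, action])


def lower_bound(terms, delta, value_range):
    """One-sided Hoeffding lower bound on the mean (assumed bound type)."""
    n = len(terms)
    return terms.mean() - value_range * np.sqrt(np.log(1.0 / delta) / (2.0 * n))


def select_alpha(reward, action, pscore, pi_b, pi_e, q_hat, cap, delta,
                 alphas=(0.25, 0.5, 0.75, 1.0)):
    """Largest mixing weight whose lower bound beats the logged mean reward.

    Returns 0.0 (pure logging policy) when no candidate is certified.
    Treating the logged mean as a fixed baseline is a simplification.
    """
    baseline = reward.mean()
    best = 0.0
    for alpha in alphas:
        # Mix the logging policy with the candidate evaluation policy.
        pi_mix = (1.0 - alpha) * pi_b + alpha * pi_e
        terms = clipped_dr(reward, action, pscore, pi_mix, q_hat, cap)
        # With rewards and q_hat in [0, 1], terms lie in [-cap, 1 + cap].
        if lower_bound(terms, delta, 1.0 + 2.0 * cap) > baseline:
            best = max(best, alpha)
    return best
```

In this sketch a larger cap reduces clipping bias but widens the range used in the bound, so both very strict confidence levels and very loose caps can leave no certified candidate, in which case the rule falls back to the logging policy.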

Copyrights © 2026






Journal Info

Abbrev

jtie

Subject

Computer Science & IT

Description

Power Engineering, Telecommunication Engineering, Computer Engineering, Control and Computer Systems, Electronics, Information technology, Informatics, Data and Software engineering, Biomedical ...