Garuda - Garba Rujukan Digital

p-Index (2021-2026): 0.23

This author has published in the following journals: REINWARDTIA; BERITA BIOLOGI; Journal of Technology Informatics and Engineering

James Hunter

Unknown Affiliation

Author ID: 596333

Subject areas: Agriculture, Biological Sciences & Forestry; Biochemistry, Genetics & Molecular Biology; Computer Science & IT; Education

Published: 3 documents

Articles


Off-Policy Evaluation and Conservative Policy Selection for Slot-Level Dynamic Bidding and Ranking on the Open Bandit Dataset (Small)
Ye, Tong; Mu, Jinyi; Hunter, James
Journal of Technology Informatics and Engineering (JTIE), Vol. 5 No. 1 (2026): April
Publisher: University of Science and Computer Technology

DOI: 10.51903/jtie.v5i1.503

Dynamic bidding and ranking systems must improve revenue or engagement while avoiding harmful regressions during deployment. This paper presents an end-to-end offline workflow for off-policy evaluation (OPE) and conservative policy selection on slot-level contextual bandit approximations of ranking decisions. Using the small Open Bandit Dataset (OBD-small) from ZOZOTOWN (ZOZO, Inc.), each logged row is treated as a context-dependent choice among discrete actions (items), with a binary click reward and the logged propensity score. This formulation is suitable at the slot level but does not capture full listwise ranking or multi-step offline reinforcement learning. Empirically, OPE estimates for highly deterministic evaluation policies show extreme variance under sparse clicks, while the logistic reward model remains weak (ROC-AUC ≈ 0.5), which limits the interpretability of the direct method (DM) and doubly robust (DR) estimates. Clipped-DR mixing yields only limited certified improvements. In the women's campaign, gains appear only at moderate confidence (δ = 0.10) and for clipping caps up to M = 5, whereas stricter or looser settings revert to the baseline. In the men's campaign, certification is largely absent. These findings show that OPE diagnostics and conservative mixing enable reproducible offline selection under uncertainty, but they do not indicate deployment-ready improvements.
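The abstract names its estimators without showing them. The following minimal sketch, which is not the authors' code, illustrates how a clipped DR estimate and a conservative mixing rule could fit together for slot-level rows of the form (context, item, click, propensity). It rests on several assumptions. The logging policy π₀ must be known for every item, so the demo uses a uniform-random one. The reward model `q` must return values in [0, 1], and the demo uses a constant click rate as a stand-in for the paper's logistic model. Candidate policies are mixtures (1 - α)π₀ + απ_e. A mixture counts as "certified" when a Hoeffding lower bound, at confidence 1 - δ, on its per-row gain over the logged clicks is positive. The names `clipped_dr`, `hoeffding_lcb` and `select_mixture`, along with all data, are hypothetical. The bound also ignores the bias that clipping introduces.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical row layout: (context x, shown item a, click r in {0, 1},
# logging propensity p = pi_0(a | x)).
Row = Tuple[int, int, float, float]
Policy = Callable[[int, int], float]  # pi(b | x) for context x and item b


def clipped_dr(rows: Sequence[Row], target: Policy, q: Policy,
               n_actions: int, cap: float) -> List[float]:
    """Per-row clipped DR terms; their mean estimates the target's click rate."""
    terms = []
    for x, a, r, p in rows:
        # Direct-method (DM) part: modeled reward averaged over the target policy.
        dm = sum(target(x, b) * q(x, b) for b in range(n_actions))
        # Importance weight pi(a|x) / pi_0(a|x), clipped at `cap` (the M above).
        w = min(target(x, a) / p, cap)
        # Weighted correction of the reward model's error on the logged item.
        terms.append(dm + w * (r - q(x, a)))
    return terms


def hoeffding_lcb(values: Sequence[float], width: float, delta: float) -> float:
    """One-sided (1 - delta) Hoeffding lower bound on the mean of values
    that lie in an interval of length `width`."""
    n = len(values)
    mean = sum(values) / n
    return mean - width * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_mixture(rows: Sequence[Row], pi0: Policy, pi_e: Policy, q: Policy,
                   n_actions: int, cap: float, delta: float,
                   alphas: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, lower bound) for the mixture with the largest certified
    gain over pi_0, or (0.0, 0.0), i.e. keep the baseline, if none is certified."""
    best_alpha, best_lcb = 0.0, 0.0
    for alpha in alphas:
        def mix(x: int, b: int, al: float = alpha) -> float:
            return (1.0 - al) * pi0(x, b) + al * pi_e(x, b)

        terms = clipped_dr(rows, mix, q, n_actions, cap)
        # Paired per-row gain: DR term of the mixture minus the logged click.
        gains = [t - row[2] for t, row in zip(terms, rows)]
        # With q in [0, 1], each gain lies in [-(1 + cap), 1 + cap].
        lcb = hoeffding_lcb(gains, 2.0 + 2.0 * cap, delta)
        if lcb > best_lcb:
            best_alpha, best_lcb = alpha, lcb
    return best_alpha, best_lcb


def demo() -> None:
    rng = random.Random(0)
    n_items, n_ctx = 3, 4  # toy sizes, not those of OBD-small
    # Synthetic sparse click rates of 1% to 3%.
    ctr = [[0.01 + 0.01 * ((x + b) % n_items) for b in range(n_items)]
           for x in range(n_ctx)]

    def pi0(x: int, b: int) -> float:
        return 1.0 / n_items  # assumed uniform-random logging policy

    rows: List[Row] = []
    for _ in range(20_000):
        x, a = rng.randrange(n_ctx), rng.randrange(n_items)
        r = 1.0 if rng.random() < ctr[x][a] else 0.0
        rows.append((x, a, r, pi0(x, a)))

    # Deterministic evaluation policy: greedy on empirical per-context CTR.
    clicks = [[0.0] * n_items for _ in range(n_ctx)]
    shows = [[0] * n_items for _ in range(n_ctx)]
    for x, a, r, _ in rows:
        clicks[x][a] += r
        shows[x][a] += 1
    greedy = [max(range(n_items), key=lambda b: clicks[x][b] / max(shows[x][b], 1))
              for x in range(n_ctx)]

    def pi_e(x: int, b: int) -> float:
        return 1.0 if b == greedy[x] else 0.0

    base_ctr = sum(row[2] for row in rows) / len(rows)

    def q(x: int, b: int) -> float:
        return base_ctr  # deliberately weak reward model: one constant CTR

    for delta in (0.05, 0.10):
        for cap in (2.0, 5.0):
            alpha, lcb = select_mixture(rows, pi0, pi_e, q, n_items, cap,
                                        delta, (0.1, 0.25, 0.5, 1.0))
            print(f"delta={delta:.2f} M={cap:g}: alpha={alpha} lcb={lcb:.4f}")


if __name__ == "__main__":
    demo()
```

In this sketch, the Hoeffding width grows with the cap M. Raising M therefore loosens the bound even as it reduces clipping bias. This is one way a rule of this kind can fall back to the baseline at both strict and loose settings.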

Co-authors: Deden Girmansyah; Harry Wiriadinata; Scott Hoover; Kuswata Kartawinata; Jinyi Mu; W. Scoot Hoover; Tong Ye
