Offline or counterfactual evaluation is a critical capability for iterating on advertising and recommender ranking strategies when online A/B testing is slow, expensive, or risky. Off-policy evaluation (OPE) estimates the expected reward of a candidate policy using logged interaction data from a different behavior policy. However, OPE can suffer from high variance under poor overlap and can be misleading when the operational objective is choosing among candidate policies rather than minimizing point-estimation bias alone. This paper presents a fully reproducible empirical study of inverse propensity scoring (IPS), self-normalized IPS (SNIPS), doubly robust (DR), and Switch-DR estimators on the Open Bandit Dataset (OBD) small release. Using the Men and Women campaigns (10,000 logged item impressions per campaign and behavior policy) collected under uniform random and Bernoulli Thompson Sampling (BTS) policies, we construct a held-out oracle for stationary slot-wise policies from the random-traffic split and evaluate both value estimation and policy-ranking consistency on random-logged and BTS-logged test sets. Across 1,000 nonparametric bootstrap replications, IPS and SNIPS are accurate on randomly logged data, whereas BTS-logged data exhibit extreme importance weights and very small effective sample sizes (ESS), making IPS-based ranking unreliable under weak support. Switch-DR is most useful in moderate-overlap regimes, where it truncates high-variance corrections; however, it introduces a bias that depends on the switching threshold and must therefore be stress-tested rather than treated as a universally superior estimator. Finally, we provide a structured reporting template, based on oracle decomposition, overlap diagnostics, and estimator components, for explaining why a policy appears better and how reliable that conclusion is.
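For reference, the IPS and SNIPS estimators named above, together with the Kish effective-sample-size (ESS) overlap diagnostic, can be sketched as follows. This is a minimal illustration on synthetic logged data; the function name and the toy numbers are ours, not from the study.

```python
import numpy as np

def ips_snips_ess(rewards, pi_e, pi_b):
    """IPS and SNIPS value estimates plus the Kish effective sample size.

    rewards: observed rewards for the logged actions
    pi_e:    target-policy probabilities of the logged actions
    pi_b:    behavior-policy propensities of the logged actions
    """
    w = pi_e / pi_b                           # importance weights
    ips = np.mean(w * rewards)                # inverse propensity scoring
    snips = np.sum(w * rewards) / np.sum(w)   # self-normalized variant
    ess = np.sum(w) ** 2 / np.sum(w ** 2)     # Kish effective sample size
    return ips, snips, ess

# Tiny synthetic example: four logged rounds under a uniform behavior policy.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
pi_b = np.array([0.25, 0.25, 0.25, 0.25])   # uniform random logging
pi_e = np.array([0.5, 0.1, 0.5, 0.1])       # candidate policy
ips, snips, ess = ips_snips_ess(rewards, pi_e, pi_b)
```

A small ESS relative to the number of logged rounds signals the weak-support regime in which, as the abstract notes, IPS-based ranking becomes unreliable.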
Copyright © 2025