The IBM Retail Data Warehouse (RDW) correctly recognized the importance of integrated retail data, but it remained largely descriptive, did not formalize the underlying architecture, and lacked reproducible empirical validation. This paper reconstructs and substantially extends that early proposal. We first synthesize the historical IBM RDW, Retail Data Warehouse Model (RDWM), Retail Services Data Model (RSDM), and Retail Business Solution Template (RBST) concepts with the contemporary literature on data warehousing, data governance, and retail analytics. We then propose a governed, RDW-informed logical architecture that separates ingestion, quality control, conformed dimensional modeling, analytics marts, and decision-support services. To move beyond conceptual discussion, we instantiate the architecture with an open retail dataset from the UCI Machine Learning Repository containing 541,909 transaction records. After governance-oriented preprocessing, the final analytical mart contains 392,692 valid rows covering 18,532 orders, 4,338 customers, 3,665 products, and 37 countries. We formulate the transformation and forecasting workflow mathematically, define an end-to-end algorithmic pipeline, and evaluate a retail revenue forecasting task using naive, seasonal naive, linear regression, ridge regression, random forest, and gradient boosting baselines. On the hold-out test window, the best model (linear regression on warehouse-engineered features) achieves an RMSE of 4,302.61 GBP and R² = 0.9766, whereas a raw, ungoverned pipeline yields a much weaker RMSE of 10,068.59 GBP. This corresponds to a 57.27% reduction in RMSE, which we attribute to governance and dimensional integration. The results show that the practical value of an RDW-like architecture is not merely organizational: when implemented as a governed analytical platform, it measurably improves reproducibility, interpretability, and forecasting quality.
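The reported 57.27% figure can be checked directly from the two RMSE values given above. The following is a minimal sketch, not the authors' evaluation code: the helper names `rmse` and `relative_reduction` are hypothetical, and only the two RMSE constants are taken from the abstract.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# RMSE values as reported in the abstract (GBP).
RAW_RMSE = 10_068.59       # raw, ungoverned pipeline
GOVERNED_RMSE = 4_302.61   # linear regression on warehouse-engineered features

reduction = relative_reduction(RAW_RMSE, GOVERNED_RMSE)
print(f"RMSE reduction: {reduction:.2f}%")
```

The reduction is measured relative to the ungoverned baseline, so it expresses how much of the raw pipeline's error is removed rather than an absolute difference in GBP.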
Copyright © 2025