This study addresses the operational challenge of multi-horizon GPU demand forecasting in large-scale computing clusters, where GPUs are costly resources and demand fluctuates under constraint-driven scheduling. The objective is to evaluate whether integrating workload semantics improves forecasting performance at horizons of up to 72 hours. A reproducible empirical benchmark is developed using the Alibaba Clusterdata GPU trace (cluster-trace-gpu-v2023), which comprises 8,152 pods over approximately 149 days with a total capacity of 6,212 GPUs. The study compares two statistical baselines, ARIMA(48,0,0) and a seasonal-trend additive model, with three lightweight deep learning models: Temporal Convolutional Network (TCN), Informer-lite, and TFT-lite. Workload semantics are approximated by converting hourly job metadata into textual summaries, embedding them with TF-IDF followed by truncated SVD (8 dimensions), and incorporating the resulting vectors as exogenous covariates. Evaluation uses sMAPE and MASE at horizons from 1 to 72 hours, along with peak-aware metrics and operational risk curves. Results show that the seasonal-trend model achieves the best overall accuracy (15.34% sMAPE), while TCN is the strongest deep model (17.20% sMAPE). Semantic embeddings do not improve accuracy at short horizons (1–48 hours) but reduce 72-hour sMAPE by 11.1% and lower error within peak windows. These findings indicate that autoregressive signals dominate short-term forecasting, whereas semantic context becomes beneficial at longer horizons. The study emphasizes that combining point accuracy with risk-based evaluation is essential for effective GPU capacity planning under dynamic and uncertain demand.
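The two point-accuracy metrics named above can be illustrated with a minimal sketch. It assumes the common textbook definitions: sMAPE as 100/n times the sum of 2|F - A| / (|A| + |F|), and MASE as forecast MAE divided by the in-sample MAE of a seasonal-naive forecast with period m. The default m = 24 (daily seasonality for hourly data), the rule that a term with both actual and forecast equal to zero contributes 0, and the function names `smape` and `mase` are illustrative assumptions, not details confirmed by the study.

```python
"""Hypothetical sketch of sMAPE and MASE under common textbook definitions."""
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent: 100/n * sum(2|F - A| / (|A| + |F|)).

    Assumption: a term where actual and forecast are both zero contributes 0.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom > 0:
            total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


def mase(train: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], m: int = 24) -> float:
    """Mean absolute scaled error.

    The scale is the in-sample MAE of a seasonal-naive forecast y[t] = y[t - m];
    m = 24 (hourly data, daily cycle) is an assumed default.
    """
    if len(train) <= m:
        raise ValueError("training series must be longer than the seasonal period m")
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    # In-sample seasonal-naive errors over the training window.
    naive_errors = [abs(train[t] - train[t - m]) for t in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal-naive scale is zero; MASE is undefined")
    mae = sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    print(smape([100.0, 200.0], [110.0, 180.0]))
    print(mase([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [5.0, 8.0], m=1))
```

Because sMAPE is scale-free in percent while MASE is relative to a naive baseline, reporting both guards against a model that looks good on one only because of the series' level or seasonality.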
Copyright © 2025