Traditional predictive models such as linear regression often struggle to capture the nonlinear interactions among operational factors that cause delivery delays in multi-category courier services. This study addresses that gap by developing and comparing machine learning (ML) algorithms to predict delivery delays across service types at PT Pos Indonesia. The primary objective is to identify the most accurate predictive model and the dominant variables influencing delays across high-speed (Same Day, Next Day) and economical delivery services. A quantitative experimental design was employed using operational data from PT Pos Indonesia comprising 10,999 records and 12 variables. Three ML algorithms (Logistic Regression, Random Forest, and XGBoost) were trained and evaluated using standardized preprocessing, feature encoding, and stratified data splitting. Results show that Random Forest and XGBoost outperform Logistic Regression, each achieving approximately 65% accuracy with an AUC of 0.73, which indicates moderate yet consistent predictive capability. Feature importance analysis reveals that Discount_offered, Weight_in_gms, and Prior_purchases are the most influential predictors of delivery timeliness. This study makes theoretical and practical contributions by introducing the first comparative ML framework for delay prediction in a national logistics context. The findings offer actionable insights for optimizing scheduling, load balancing, and promotional strategies, and they advance the integration of AI-based predictive analytics into postal logistics operations.
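To make the evaluation setup concrete, the following is a minimal, illustrative sketch of stratified splitting and of the two reported metrics (accuracy and AUC), written in plain Python. It is not the study's implementation: the split ratio, random seed, 0.5 decision threshold, toy labels, and the function names `stratified_split`, `accuracy`, and `roc_auc` are all assumptions introduced here for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping each class's share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 1 = delayed, 0 = on time (not the study's data).
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

# Hypothetical model scores for five held-out shipments.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(len(train_idx), len(test_idx))
print(accuracy(y_true, y_pred), roc_auc(y_true, scores))
```

Read this way, an AUC of 0.73 means that a randomly chosen delayed shipment receives a higher predicted score than a randomly chosen on-time shipment about 73% of the time, which is independent of any single classification threshold, whereas accuracy depends on the threshold chosen.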
Copyrights © 2025