This study addresses the prediction of delivery status in supply chain data, a critical task for optimizing logistics and operations. The dataset includes features such as order details, product specifications, and customer information. Data cleaning involved handling missing values, removing irrelevant columns, and encoding categorical variables numerically, and oversampling was applied to address class imbalance so that the models could handle the rare cases of late or canceled deliveries. Five machine learning models were then trained on the prepared data: Logistic Regression, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost. Each model was evaluated using metrics such as accuracy, precision, recall, and F1-score. XGBoost outperformed the other models, achieving the highest accuracy and the most reliable delivery status predictions, which makes it the preferred choice for supply chain data analysis in this context. The study thus contributes to the growing application of machine learning in supply chain optimization by identifying XGBoost as a robust model for delivery status prediction in large datasets. Future research could explore hybrid models and advanced feature engineering techniques to further improve prediction accuracy and to address additional challenges, particularly real-time data processing and dynamic supply chain environments.
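The two steps named above that carry the most mechanics, oversampling and the evaluation metrics, can be illustrated with a minimal standard-library sketch. It is not the study's implementation: the abstract does not say which oversampling variant was used, so plain random duplication of minority rows is assumed here, and the function names `random_oversample` and `classification_metrics`, the class labels, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def classification_metrics(y_true, y_pred, positive):
    """Accuracy over all classes, plus precision, recall, and F1
    for the single class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data: four on-time orders, one late, one canceled.
    rows = [[1], [2], [3], [4], [5], [6]]
    labels = ["on_time"] * 4 + ["late", "canceled"]
    _, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))

    y_true = ["late", "late", "on_time", "on_time", "canceled"]
    y_pred = ["late", "on_time", "on_time", "late", "canceled"]
    print(classification_metrics(y_true, y_pred, positive="late"))
```

A common precaution with any oversampling scheme is to apply it only to the training split, so that duplicated rows do not also appear in the data used for evaluation. The abstract does not state whether the study did this.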
Copyrights © 2025