Big Data Analytics and Data Science
Vol. 1 No. 2 (2026): June: Big Data Analytics and Data Science

Development of an Early Warning System for Predicting Drug Shortages Using a Random Forest Algorithm on Hospital Pharmacy Logistics Data

Ahmad Asyhadi Asyhadi (Universitas Dinamika Bangsa)
Widyadhana Candraningtias (Universitas Dinamika Bangsa)



Article Info

Publish Date
22 Jun 2026

Abstract

Stock shortages are a significant problem in inventory management: they can disrupt operations and create a risk of loss. An early warning system that detects potential risks at an earlier stage is therefore needed. This study develops a machine learning-based prediction model that detects risk conditions using the Random Forest algorithm and compares it with several other classification models. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data. In addition, feature engineering produced a usage ratio variable that indicates the relationship between stock and usage. The dataset consists of 882 records with an imbalanced class distribution in which the risk class outnumbers the normal class. Models were evaluated with the F2-score, which weights recall more heavily than precision, because minimizing false negatives is critical in an early warning system. Model performance was further analyzed with ROC curves and Precision-Recall curves to assess discriminatory ability more comprehensively; a high AUC indicates that a model separates the normal and risk classes effectively, which matters particularly under class imbalance. To increase sensitivity to risk, the decision threshold on predicted probabilities was tuned to maximize the F2-score. This raises recall so that as many at-risk cases as possible are detected, at the cost of a potential increase in false positives. The results show that the developed model achieves very high performance, with an optimal F2-score and no classification errors on the test data. Feature importance analysis indicates that stock, usage, and usage ratio are the dominant factors in determining risk conditions.
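The resampling and threshold-tuning steps described above can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the function names `smote`, `f_beta`, and `tune_threshold` are hypothetical, the probabilities are made-up stand-ins for the positive-class output of a trained classifier such as a Random Forest, and the threshold is assumed to be tuned on a held-out validation split rather than on the test set. Library equivalents exist as `imblearn.over_sampling.SMOTE` and `sklearn.metrics.fbeta_score`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority-class neighbours.
    Assumed usage: call it on the training split only, after the train/test split."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # fraction of the way from base to nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP); beta=2 favours recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def tune_threshold(y_true, probs, beta=2.0):
    """Return the probability cut-off (and its score) that maximises F-beta."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(probs)):
        y_pred = [1 if p >= t else 0 for p in probs]
        f = f_beta(y_true, y_pred, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Hypothetical validation labels (1 = risk) and predicted risk probabilities.
y_val = [1, 1, 1, 0, 0, 1, 0, 1]
p_val = [0.9, 0.8, 0.35, 0.4, 0.1, 0.6, 0.2, 0.45]

default_f2 = f_beta(y_val, [int(p >= 0.5) for p in p_val])
best_t, best_f2 = tune_threshold(y_val, p_val)
print(round(default_f2, 3), best_t, round(best_f2, 3))  # 0.652 0.35 0.962

# Hypothetical minority-class training rows as [stock, usage].
normal_train = [[120.0, 30.0], [150.0, 20.0], [90.0, 25.0], [200.0, 40.0]]
extra = smote(normal_train, n_new=4, k=2)
```

In this toy example, lowering the cut-off from 0.5 to 0.35 recovers both missed risk cases (probabilities 0.35 and 0.45) at the cost of one false positive (0.4), which is the recall-for-precision trade the F2-score rewards.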
Nevertheless, these very high results should be interpreted with caution, because the features are closely tied to the process used to form the target labels. Overall, this study contributes to the development of a machine learning-based early warning system that emphasizes comprehensive risk detection rather than accuracy alone. The proposed approach can serve as a decision support tool for more proactive inventory management.
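The caution about features and label formation can be made concrete with a toy example. The labelling rule used in the study is not stated here, so this sketch assumes a hypothetical one (a record is "risk" when usage/stock >= 0.5) and a hypothetical `usage_ratio` definition. A single-split decision stump, the building block of the trees in a Random Forest, then recovers the rule exactly and scores perfectly on held-out rows, even though it has learned nothing beyond the labelling formula.

```python
RISK_CUTOFF = 0.5  # hypothetical labelling rule: usage/stock >= 0.5 means "risk"


def usage_ratio(stock, usage):
    """Assumed definition of the usage ratio: usage relative to stock on hand."""
    return usage / stock if stock > 0 else float("inf")


def label(stock, usage):
    """Target formed directly from the same inputs the model will see."""
    return 1 if usage_ratio(stock, usage) >= RISK_CUTOFF else 0


def fit_stump(rows, candidates):
    """Choose the ratio cut-off with the fewest training errors (a one-split tree)."""
    best_cut, best_err = None, None
    for cut in candidates:
        err = sum((ratio >= cut) != bool(y) for ratio, y in rows)
        if best_err is None or err < best_err:
            best_cut, best_err = cut, err
    return best_cut


def accuracy(rows, cut):
    """Share of rows whose stump prediction matches the label."""
    return sum((ratio >= cut) == bool(y) for ratio, y in rows) / len(rows)


# Exhaustive toy grid of (stock, usage) pairs, split deterministically.
pairs = [(s, u) for s in range(1, 41) for u in range(0, 41)]
rows = [((s + u) % 5 == 0, (usage_ratio(s, u), label(s, u))) for s, u in pairs]
train_rows = [r for is_test, r in rows if not is_test]
test_rows = [r for is_test, r in rows if is_test]

cut = fit_stump(train_rows, candidates=[i / 20 for i in range(1, 41)])
print(cut, accuracy(test_rows, cut))  # 0.5 1.0: the learned cut-off is the rule
```

A perfect test score in this setting mainly confirms that the label can be recomputed from the inputs; evaluating against shortages actually observed later would likely be a stricter test of early warning ability.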

Copyright © 2026






Journal Info

Abbrev

BDAS

Description

Aims: This journal aims to publish cutting-edge research in big data analytics and data science, emphasizing data-driven methods and intelligent analytics for decision support and innovation. Scope: Big data architectures and platforms; Data mining and predictive analytics; Machine learning for data ...