Garuda - Garba Rujukan Digital

Journal of Advanced Health Informatics Research

Vol. 3 No. 1 (2025)

Reina Melani (Unknown)
Dina Febrina (Unknown)

Publish Date
28 Aug 2025

The classification of drugs into Prescription (Rx) and Over-the-Counter (OTC) categories is an important aspect of pharmaceutical governance because it directly affects patient safety, drug access, and regulatory compliance. However, large-scale pharmaceutical data often consist of heterogeneous categorical variables and short texts, such as product names or indications, which pose challenges in the form of duplication, inconsistency, and potential class imbalance. These conditions demand a modeling approach that is not only accurate but also lightweight and explainable. This study proposes a hybrid ensemble model that combines three algorithms, namely CART, Random Forest, and LightGBM, through a weighted soft-voting mechanism. The approach pairs the transparency of decision trees with the reliability of modern boosting techniques. The main contribution of the study is to show that a low-complexity, domain-based pipeline can produce accurate, efficient, and easily auditable Rx/OTC classifications for both clinical and regulatory needs. The preprocessing pipeline applies TF-IDF to short text and One-Hot Encoding to categorical features, and adds simple dosage variables. All features were combined into a single dense matrix, and the ensemble was trained with voting weights [1, 1, 8]. The evaluation covers Accuracy, Precision, Recall, F1-score, ROC-AUC, Brier score, the confusion matrix, and the ROC curve. Test results on a balanced dataset of 50,000 samples showed consistent in-sample performance: Accuracy = 0.742; Precision = 0.742; Recall = 0.742; F1 = 0.742; ROC-AUC = 0.819; and Brier score = 0.214. The model separates the two classes stably, with False Positive and False Negative errors in balance. In conclusion, this lightweight ensemble offers competitive predictive performance together with interpretability, giving it potential for application in pharmacovigilance and drug classification. Further studies are suggested to add cross-validation, probability calibration, and robustness tests on out-of-distribution data to strengthen the reliability of the model.
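The weighted soft-voting step and the Brier score named in the abstract can be illustrated with a minimal sketch. It assumes the weights [1, 1, 8] map to CART, Random Forest, and LightGBM in the order the abstract lists them (the abstract does not state the mapping), and it uses made-up probabilities in place of real model outputs. The function names `weighted_soft_vote` and `brier_score` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model positive-class probabilities.

    probas[m][i] is model m's estimated P(Rx) for sample i.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical P(Rx) values for four drugs; NOT results from the study.
cart = [1.0, 0.0, 1.0, 0.0]      # a single tree often emits hard 0/1 values
forest = [0.7, 0.4, 0.6, 0.2]
lgbm = [0.9, 0.2, 0.3, 0.7]

weights = [1, 1, 8]              # assumed order: CART, Random Forest, LightGBM
ensemble = weighted_soft_vote([cart, forest, lgbm], weights)

# Threshold at 0.5: 1 = Rx, 0 = OTC.
labels = [1 if p >= 0.5 else 0 for p in ensemble]

y_true = [1, 0, 1, 1]            # hypothetical ground truth
score = brier_score(y_true, ensemble)

print(ensemble, labels, score)
```

With a weight of 8 out of 10, the boosted model dominates the vote: in the third and fourth samples the ensemble follows LightGBM even though both tree-based models disagree with it. Under this assumed mapping, the two lighter models act mainly as a small correction to the LightGBM probabilities.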

Journal Info

Journal of Advanced Health Informatics Research

Abbrev

jahir

Publisher

Peneliti Teknologi Teknik Indonesia

Subject

Computer Science & IT; Control & Systems Engineering; Engineering; Medicine & Pharmacology; Public Health

Description

Journal of Advanced Health Informatics Research (JAHIR) is a scientific journal that focuses on the application of computer science to the health field. JAHIR is a peer-reviewed open-access journal that is published three times a year (April, August and December). The scientific journal is published ...

Article Info

Abstract

Hybrid Ensemble Learning for Classifying Prescription vs. Over-the-Counter Medicines on Large-Scale Categorical and Textual Data
