Acute Respiratory Tract Infections (ARI) are the leading cause of childhood morbidity in Indonesia. Early detection is hampered by limited medical personnel and by imbalanced diagnostic data, in which lower respiratory tract infection (LRTI) cases are far fewer than upper respiratory tract infection (URTI) cases. This study developed and optimized a machine learning model that classifies ARI cases as URTI or LRTI, using resampling techniques to address the class imbalance. An explanatory quantitative design was used with secondary data from the Mijen Community Health Center, Semarang (2020–2025; 12,177 valid records). Preprocessing comprised outlier handling (Winsorizing, IQR), a stratified 70:30 train-test split, and a RobustScaler fitted on the training data. Three resampling techniques (SMOTE, ADASYN, SMOTE-ENN) were applied and evaluated with Decision Tree and Random Forest classifiers tuned by GridSearchCV with 5-fold cross-validation, with evaluation focused on Recall and AUC-PR for the minority class. Random Forest with SMOTE-ENN gave the best performance, increasing LRTI recall from 0.02 to 0.37 and F1-macro to 0.54, while Decision Tree with SMOTE-ENN produced the highest AUC-PR, 0.31. Despite this improvement, a recall of 0.37 is still low for clinical applications because the risk of false negatives remains high, which could delay patient treatment. Future implementation requires integrating clinical symptom data (e.g., respiratory rate) to achieve clinically acceptable sensitivity. These findings confirm that resampling can improve a model's ability to detect the minority class, but that additional feature exploration is needed to reach adequate diagnostic sensitivity in the context of healthcare analytics.
Copyright © 2025