Journal of Applied Data Sciences
Vol 6, No 4: December 2025

Optimization of Machine Learning Models for Risk Prediction of DHF Spread to Support Management Strategies in Urban Areas

Devis, Yesica
Muhamadiah, Muhamadiah
Yulanda, Yulanda
Irawan, Yuda
Wahyuni, Refni



Article Info

Publish Date
02 Sep 2025

Abstract

Dengue fever is an endemic disease that poses a serious threat to public health in tropical regions such as Indonesia. Controlling it requires a data-driven approach that can accurately predict risk levels so that interventions can be targeted. This study develops a predictive model of dengue hemorrhagic fever (DHF) risk using a stacking ensemble optimized with the Optuna hyperparameter-optimization framework, and integrates the model into an interactive Streamlit dashboard. The dataset comprises 1,440 records of environmental, climate, and socio-demographic indicators from 2015 to 2024. Preprocessing consisted of feature standardization with StandardScaler, feature selection with LASSO, and balancing of the risk-class labels with SMOTE. The model was validated with 10-fold cross-validation to estimate how well it generalizes to unseen data. The stacking model combines three base learners (SVM, KNN, and Random Forest) with Logistic Regression as the meta-learner. The model achieved an average accuracy of 97.57%, with high precision, recall, and F1-scores for all three risk classes (low, medium, and high), and per-class ROC-AUC values close to 1. Deployed in the Streamlit dashboard, the model allows non-technical users, such as staff at community health centers or health offices, to predict regional risk and automatically obtain data-driven intervention recommendations. This research contributes to the development of predictive technology and strengthens evidence-based health promotion practice in urban areas. Future work should integrate real-time IoT data and extend the model to additional areas.
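The abstract names SMOTE as the class-balancing step but does not give its settings. The following is a minimal, dependency-free sketch of the SMOTE idea for a multi-class risk label, not the authors' code: each synthetic record is placed on the line segment between a minority-class record and one of its nearest same-class neighbours. The function name `smote_balance`, the neighbour count `k`, the fixed `seed`, and the toy data are illustrative assumptions; in practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn library would normally be used.

```python
import random
from collections import Counter


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_balance(X, y, k=5, seed=0):
    """Oversample every minority class up to the majority-class count.

    Hypothetical helper for illustration. Each synthetic point lies on the
    segment between a minority sample and one of its k nearest neighbours
    from the same class (SMOTE-style interpolation). Returns new lists
    (X_res, y_res); the original rows come first, unchanged.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res = [list(row) for row in X]
    y_res = list(y)

    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # already balanced, or no neighbour to interpolate with
        k_eff = min(k, len(members) - 1)

        # Precompute the k nearest same-class neighbours of each member.
        neighbours = []
        for i, row in enumerate(members):
            others = sorted(
                (j for j in range(len(members)) if j != i),
                key=lambda j: _sq_dist(row, members[j]),
            )
            neighbours.append(others[:k_eff])

        # Generate just enough synthetic rows to reach the majority count.
        for _ in range(target - n):
            i = rng.randrange(len(members))
            j = rng.choice(neighbours[i])
            gap = rng.random()  # position along the segment, in [0, 1)
            base, nb = members[i], members[j]
            X_res.append([b + gap * (m - b) for b, m in zip(base, nb)])
            y_res.append(label)

    return X_res, y_res


if __name__ == "__main__":
    # Toy, made-up records: four "low" and two "high" samples.
    X = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]]
    y = ["low", "low", "low", "low", "high", "high"]
    X_bal, y_bal = smote_balance(X, y, k=1)
    print(Counter(y_bal))
```

Because neighbours are found by Euclidean distance, the standardization step described in the abstract matters: unscaled features with large ranges would dominate the neighbour search. The abstract also does not state whether SMOTE was applied before or inside the 10-fold cross-validation; oversampling only the training folds is the usual way to keep synthetic points derived from test records out of the evaluation.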

Copyright © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...