JITK (Jurnal Ilmu Pengetahuan dan Komputer)
Vol. 11 No. 4 (2026): JITK Issue May 2026

HYBRID RESAMPLING METHOD AND HYPERPARAMETER OPTIMIZATION FOR HIV/AIDS PREDICTION: EVIDENCE FROM EIGHT MACHINE-LEARNING MODELS

Lydia Nur Sa'adah (Universitas Muhammadiyah Semarang)
Fatkhurokhman Fauzi (Universitas Muhammadiyah Semarang)
Prizka Rismawati Arum (Universitas Muhammadiyah Semarang)
M Al Haris (Universitas Muhammadiyah Semarang)
Yan Nazala Bisoumi (Universitas Muhammadiyah Semarang)



Article Info

Publish Date
22 May 2026

Abstract

HIV/AIDS remains a global health challenge with continuously increasing infection rates, highlighting the need for accurate prediction models to support prevention and early detection. However, the development of such models is often constrained by class imbalance and irrelevant features. This study aims to improve HIV/AIDS infection prediction by integrating feature selection, data-balancing techniques, and eight machine learning algorithms. Feature selection was performed using Mutual Information and Chi-Square to identify the most relevant features. The dataset was the HIV/AIDS Infection Prediction Dataset from Kaggle, consisting of 2,139 instances and 23 features with an imbalanced class distribution of 1,618 non-infected and 521 infected cases. The dataset was split into 80% training data and 20% testing data, and resampling was applied only to the training set to prevent data leakage. Three resampling scenarios were evaluated: no sampling, SMOTE, and SMOTE-ENN. Hyperparameter tuning was conducted using Bayesian Optimization with 5-fold cross-validation to improve model robustness and reliability. Eight machine learning algorithms were evaluated: Decision Tree, Random Forest, AdaBoost, Gradient Boosting, XGBoost, LightGBM, K-Nearest Neighbors, and Logistic Regression. The results show that SMOTE-ENN combined with hyperparameter optimization substantially improved model performance. The best model, Gradient Boosting + SMOTE-ENN, achieved 96.1% accuracy, 94.8% precision, 98.4% recall, and 96.5% F1-score. These findings indicate that the proposed integrated framework is highly effective for predicting HIV/AIDS infection and has strong potential to support early diagnosis and data-driven decision-making in healthcare.
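The abstract names Mutual Information and Chi-Square as the feature-selection criteria but does not say how they were computed. The block below is a minimal, stdlib-only sketch assuming discrete (categorical or binned) features: it scores each feature by its mutual information with the label and by the Pearson chi-square statistic of its feature-by-label contingency table. The functions `mutual_information`, `chi_square`, and `rank_features` and the toy columns are hypothetical illustrations, not the authors' code; library estimators (for example in scikit-learn) follow their own conventions, which may differ.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in nats for a discrete feature X and a discrete label Y."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c_xy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c_xy / n) * math.log(c_xy * n / (count_x[x] * count_y[y]))
    return mi


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    stat = 0.0
    for x, c_x in count_x.items():
        for y, c_y in count_y.items():
            expected = c_x * c_y / n  # cell count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score=mutual_information):
    """Sort feature names by descending score against the labels."""
    scores = {name: score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: one informative column, one independent column.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(rank_features(columns, labels))              # ['informative', 'noise']
print(rank_features(columns, labels, chi_square))  # ['informative', 'noise']
```

A top-k selection would simply keep the first k names of the ranking; how many features the study retained is not stated in the abstract.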
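The abstract states an 80/20 train/test split with resampling applied only to the training portion. The sketch below assumes the split was stratified by class (the abstract does not say) and uses a hypothetical helper, `stratified_split`, to show the per-class arithmetic for the reported 1,618/521 class counts and where a resampler would be fitted.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_test = round(test_fraction * len(idxs))  # per-class test quota
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts reported in the abstract: 0 = non-infected, 1 = infected.
labels = [0] * 1618 + [1] * 521
train_idx, test_idx = stratified_split(labels)
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

print(len(train_idx), len(test_idx))                # 1711 428
print(train_labels.count(1), test_labels.count(1))  # 417 104

# Leakage-safe order: fit any resampler (SMOTE, SMOTE-ENN) on the rows in
# train_idx only, and evaluate on the untouched rows in test_idx.
```

Fitting the resampler after the split keeps synthetic samples derived from test rows out of training; the same reasoning applies within each cross-validation fold when resampling is combined with tuning.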
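SMOTE-ENN combines oversampling with cleaning: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors, and Edited Nearest Neighbors (ENN) then removes samples whose label disagrees with their neighbors. The stdlib-only sketch below illustrates the idea on hypothetical 2-D toy data, assuming Euclidean distance, k = 5 for SMOTE, and a majority-vote ENN rule with k = 3. The function names are hypothetical, and library implementations such as imbalanced-learn's `SMOTEENN` may use different defaults and selection rules.

```python
import random
from collections import Counter


def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean)."""
    order = sorted(range(len(candidates)), key=lambda j: _sq_dist(point, candidates[j]))
    return order[:k]


def smote(X, y, minority, k=5, seed=0):
    """Oversample `minority` up to the largest class size (binary case).

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = max(Counter(y).values()) - len(minority_pts)
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority_pts))
        base = minority_pts[i]
        others = minority_pts[:i] + minority_pts[i + 1:]
        neighbor = others[rng.choice(_nearest(base, others, k))]
        gap = rng.random()  # position on the segment from base to neighbor
        X_out.append([b + gap * (nb - b) for b, nb in zip(base, neighbor)])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop samples whose label differs from the majority of their k neighbors."""
    X_out, y_out = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = X[:i] + X[i + 1:]
        other_labels = y[:i] + y[i + 1:]
        votes = Counter(other_labels[j] for j in _nearest(x, others, k))
        if votes.most_common(1)[0][0] == label:
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced training set: 20 majority (0) vs 6 minority (1).
rng = random.Random(1)
X_train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
X_train += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(6)]
y_train = [0] * 20 + [1] * 6

X_smote, y_smote = smote(X_train, y_train, minority=1)
X_clean, y_clean = enn(X_smote, y_smote)
print(Counter(y_smote), Counter(y_clean))
```

Because ENN runs after SMOTE, the cleaned set is usually close to, but not exactly, balanced; ENN decides every removal against the oversampled set as a whole rather than sequentially.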
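The reported scores follow the usual confusion-matrix definitions, with F1 the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes all four metrics from hypothetical counts (the abstract does not give the study's confusion matrix). Plugging the rounded precision (0.948) and recall (0.984) into the F1 formula gives about 0.966, which matches the reported 96.5% F1-score to within the rounding of its inputs.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (infected) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical confusion-matrix counts, for illustration only.
print(classification_metrics(tp=8, fp=2, fn=1, tn=9))

# Rounded precision and recall reported in the abstract.
print(round(f1_from(0.948, 0.984), 3))  # 0.966
```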

Copyright © 2026






Journal Info

Abbrev

jitk

Subject

Computer Science & IT

Description

Watching films is one simple way to entertain oneself, lift a troubled mood, or unwind after daily activities. However, for various reasons, people sometimes have no time to watch films at the cinema. With the help of media ...