Yan Nazala Bisoumi
Universitas Muhammadiyah Semarang

HYBRID RESAMPLING METHOD AND HYPERPARAMETER OPTIMIZATION FOR HIV/AIDS PREDICTION: EVIDENCE FROM EIGHT MACHINE-LEARNING MODELS
Lydia Nur Sa'adah; Fatkhurokhman Fauzi; Prizka Rismawati Arum; M Al Haris; Yan Nazala Bisoumi
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 11 No. 4 (2026): JITK Issue May 2026
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v11i4.7533

Abstract

HIV/AIDS remains a global health challenge with continuously increasing infection rates, highlighting the importance of accurate prediction models to support prevention and early detection. However, the development of such models is often constrained by class imbalance and irrelevant features. This study aims to improve HIV/AIDS infection prediction by integrating feature selection, data balancing techniques, and eight machine learning algorithms. Feature selection was performed using Mutual Information and Chi-Square to identify the most relevant features. The dataset used was the HIV/AIDS Infection Prediction Dataset from Kaggle, consisting of 2,139 instances and 23 features, with an imbalanced distribution of 1,618 non-infected and 521 infected cases. The dataset was divided into 80% training data and 20% testing data, with resampling applied only to the training set to prevent data leakage. Three resampling scenarios were evaluated: no sampling, SMOTE, and SMOTE-ENN. Hyperparameter tuning was conducted using Bayesian Optimization integrated with 5-fold Cross-Validation to improve model robustness and reliability. The eight machine learning algorithms evaluated were Decision Tree, Random Forest, AdaBoost, Gradient Boosting, XGBoost, LightGBM, K-Nearest Neighbors, and Logistic Regression. The results show that SMOTE-ENN combined with hyperparameter optimization significantly improved model performance. The best model, Gradient Boosting + SMOTE-ENN, achieved 96.1% accuracy, 94.8% precision, 98.4% recall, and 96.5% F1-score. These findings indicate that the proposed integrated framework is highly effective for predicting HIV/AIDS infection and has strong potential to support early diagnosis and data-driven decision-making in healthcare.
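
The class distribution and evaluation metrics quoted in the abstract can be related with a few lines of arithmetic. The following is a minimal sketch in Python, not the authors' code: the function names are hypothetical, the metric definitions are the standard binary-classification ones (the abstract does not say whether its scores are per-class or averaged), and only the counts and percentages come from the abstract. It does not reproduce the resampling, tuning, or model-fitting steps.

```python
# Figures quoted in the abstract.
N_NON_INFECTED = 1618
N_INFECTED = 521
REPORTED_PRECISION = 0.948
REPORTED_RECALL = 0.984


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


# Class imbalance implied by the stated counts.
total_instances = N_NON_INFECTED + N_INFECTED
infected_share = N_INFECTED / total_instances
imbalance_ratio = N_NON_INFECTED / N_INFECTED

# F1 implied by the rounded precision and recall in the abstract.
implied_f1 = f1_from_precision_recall(REPORTED_PRECISION, REPORTED_RECALL)

print(f"instances: {total_instances}")
print(f"infected share: {infected_share:.4f}")
print(f"majority-to-minority ratio: {imbalance_ratio:.2f}")
print(f"F1 implied by reported precision and recall: {implied_f1:.4f}")
```

The stated counts sum to the 2,139 instances given in the abstract, with roughly one infected case for every three non-infected ones. The F1 value implied by the rounded precision and recall agrees with the reported 96.5% to within the rounding of those inputs.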