HIV/AIDS remains a significant global health challenge, and accurate predictive models are needed for early detection and better clinical decision-making. Developing such models is difficult, however, because class imbalance and irrelevant features can degrade model accuracy. This study aims to improve the performance of AIDS infection prediction models by combining feature selection, data balancing, and machine learning classification. Feature selection uses Pearson correlation, mutual information, and chi-square tests to retain only the most relevant features. Random oversampling, SMOTE, and ADASYN are applied to address class imbalance and improve model robustness. Nine machine learning algorithms, including Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, Gradient Boosting, Support Vector Machine, AdaBoost, and Logistic Regression, are tested for classification. Evaluation with the confusion matrix, precision, recall, F1-score, and AUC-ROC shows that tree-based models (Random Forest, Extra Trees, and XGBoost) achieve the best results, particularly on minority-class predictions. The study concludes that combining feature selection, data balancing, and machine learning classification significantly improves predictive performance, making the approach valuable for early detection and clinical decision support in HIV/AIDS diagnosis. Future research may explore hyperparameter tuning and the integration of real-world clinical data to enhance practical applicability.
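As a rough illustration of three of the steps named above (a Pearson-correlation feature filter, random oversampling, and precision/recall/F1 computed from confusion-matrix counts), the following minimal sketch uses only the Python standard library. The function names, the 0.5 correlation threshold, and the toy data are assumptions made for illustration; they are not taken from the study, which does not state its thresholds, dataset, or implementation here.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as uninformative.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(X, y, threshold):
    """Indices of columns whose absolute correlation with y is >= threshold."""
    n_cols = len(X[0])
    return [j for j in range(n_cols)
            if abs(pearson([row[j] for row in X], y)) >= threshold]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data invented for illustration: column 0 tracks the label, column 1 does not.
X = [[0.1, 5], [0.2, 3], [0.0, 5], [0.3, 3],
     [0.2, 4], [0.1, 4], [0.9, 5], [1.0, 3]]
y = [0, 0, 0, 0, 0, 0, 1, 1]  # imbalanced: six negatives, two positives

kept = select_features(X, y, threshold=0.5)      # hypothetical threshold
X_sel = [[row[j] for j in kept] for row in X]
X_bal, y_bal = random_oversample(X_sel, y)

# Hypothetical predictions, used only to exercise the metric function.
precision, recall, f1 = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a real experiment, oversampling would normally be applied to the training split only, so that duplicated minority samples do not leak into the test set. Library implementations such as scikit-learn's `SelectKBest` and imbalanced-learn's `RandomOverSampler`, `SMOTE`, and `ADASYN` would usually replace these hand-written helpers.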
Copyright © 2025