Yulianto, Pramudya Ridwan
Unknown Affiliation

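Performance Comparison of the CatBoost, XGBoost, LightGBM, and Random Forest Algorithms in Predicting AIDS Infection Risk in a Health Dataset (original title: Perbandingan Kinerja Algoritma CatBoost, XGBoost, LightGBM dan Random Forest Dalam Memprediksi Risiko Infeksi Aids Dalam Dataset Kesehatan)
Yulianto, Pramudya Ridwan; Astuti, Yani Parti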
Building of Informatics, Technology and Science (BITS) Vol 7 No 4 (2026): March 2026
Publisher: Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i4.8975

Abstract

This study investigates the prediction of AIDS infection risk using four tree-based algorithms (CatBoost, XGBoost, LightGBM, and Random Forest) applied to a medical and demographic dataset of 2,139 observations and 23 variables. The research process includes data exploration, cleaning, handling of extreme values with the interquartile range (IQR) method, normalization with RobustScaler, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). Because the dataset is imbalanced, model evaluation emphasizes not only accuracy but also recall, F1-score, and AUC-ROC, which better reflect how well the infected class is detected. Before SMOTE was applied, all models achieved high accuracy but relatively low recall for the positive class. After resampling, CatBoost showed the largest improvement: recall rose from 63% to 77% and F1-score from 72% to 79%, with an overall accuracy of 90%. In comparison, XGBoost reached an accuracy of 88.63% with a more moderate recall improvement, while LightGBM and Random Forest showed consistent but smaller gains. These results indicate that the combination of SMOTE and CatBoost is the most effective of the tested configurations at reducing false negatives in AIDS infection cases. The main contribution of this study is the integration of robust outlier handling, feature normalization, and class balancing within a structured experimental framework, with a specific emphasis on optimizing sensitivity to make early detection more reliable in clinical screening contexts.
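
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names (`iqr_clip`, `robust_scale`, `smote_like`, `binary_metrics`) are hypothetical, the data are made up, and capping outliers at the IQR fences (rather than dropping rows) is an assumption, since the abstract does not say which was done. `smote_like` only imitates the core interpolation idea of SMOTE and assumes at least two minority samples. Model training is omitted entirely.

```python
import math
import random
from statistics import median, quantiles


def iqr_clip(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] (capping is an assumed choice)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def robust_scale(values):
    """Center on the median and divide by the IQR, the idea behind robust scaling."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center, spread = median(values), (q3 - q1) or 1.0  # avoid division by zero
    return [(v - center) / spread for v in values]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Made-up feature column with one extreme value.
scaled = robust_scale(iqr_clip([1, 2, 3, 4, 5, 6, 7, 8, 100]))

# Made-up minority-class points in two dimensions.
synthetic = smote_like([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=5)

# Made-up labels: 16 negatives, 4 positives, only one positive detected.
metrics = binary_metrics([0] * 16 + [1] * 4, [0] * 16 + [1, 0, 0, 0])
print(scaled, synthetic, metrics)
```

On the made-up labels, accuracy is 0.85 while recall is only 0.25, which shows why the abstract does not rely on accuracy alone for an imbalanced positive class. In practice a library implementation of robust scaling and SMOTE would be preferable to these hand-written versions, and resampling would normally be applied to the training split only.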