Hypertension in the elderly poses a complex classification problem, in part because health survey datasets contain noisy categorical features. This study applies the XGBoost and CatBoost algorithms to classify hypertension in the elderly ( years) using IFLS 5 data. Unlike standard evaluations that focus on accuracy, this one emphasizes recall in order to reduce false negatives, which is crucial for safe medical screening. After hyperparameter tuning with GridSearchCV and 5-fold cross-validation on 2,774 participants, the models revealed a clear algorithmic trade-off. CatBoost showed more stable generalization and achieved the higher accuracy (66.49%), while XGBoost achieved markedly higher sensitivity (recall of 80.18%) by applying regularization that helps it detect minority-class signals. Feature importance analysis using the information gain and prediction values change metrics indicated that biological indicators, particularly diabetes and BMI, were stronger predictors than demographic variables. In summary, CatBoost is the more stable model, but XGBoost is better suited to clinical decision support systems in which sensitivity is the priority.
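The trade-off between accuracy and recall described above can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not results from this study, and the names `model_a` and `model_b` are placeholders rather than the study's models.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of true positive cases the model detects (sensitivity)."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical counts for two classifiers screening the same 200 people,
# 100 of whom truly have the condition (assumed values, not study data).
model_a = {"tp": 60, "fn": 40, "fp": 25, "tn": 75}
model_b = {"tp": 80, "fn": 20, "fp": 50, "tn": 50}

for name, m in (("model_a", model_a), ("model_b", model_b)):
    acc = accuracy(m["tp"], m["tn"], m["fp"], m["fn"])
    rec = recall(m["tp"], m["fn"])
    print(f"{name}: accuracy={acc:.3f} recall={rec:.3f} missed cases={m['fn']}")
```

In this illustration the second classifier is slightly less accurate overall but misses half as many true cases, which is the kind of outcome a recall-oriented screening evaluation is designed to favor.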
Copyrights © 2026