Anemia is a global health problem that significantly reduces quality of life and productivity, and early, accurate detection is essential to prevent more serious complications. This study develops a machine learning model for anemia classification using the XGBoost algorithm and compares its performance with Logistic Regression and Random Forest. The dataset, obtained from the Kaggle platform, consists of 1,421 samples and six attributes: Gender, Hemoglobin (HGB), Mean Corpuscular Hemoglobin (MCH), Mean Corpuscular Hemoglobin Concentration (MCHC), Mean Corpuscular Volume (MCV), and Result (the target label). During feature engineering, a derived feature, the hemoglobin-to-MCV ratio (Hb/MCV), was added because it is medically relevant for distinguishing types of anemia. In the evaluation, XGBoost and Random Forest both achieved 100% accuracy and F1-score, while Logistic Regression achieved 98.9%. XGBoost was selected as the primary model for its computational efficiency and its support for interpretation with SHAP (SHapley Additive exPlanations). SHAP visualizations showed that the Hb/MCV ratio and hemoglobin were the most influential features in the classification. The model has the potential to serve as a decision support system for automated anemia screening and could be integrated into clinical systems.
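The derived Hb/MCV feature and the two reported metrics (accuracy and F1-score) can be illustrated with a minimal, dependency-free sketch. This is not the study's code: the `Sample` class, the helper names, the 0/1 encoding of Gender and Result, and the example values are all assumptions made for illustration, and the study itself trains XGBoost, Random Forest, and Logistic Regression on these features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One dataset row. Field names and encodings are illustrative assumptions."""
    gender: int   # assumed binary encoding (0/1)
    hgb: float    # hemoglobin
    mch: float    # mean corpuscular hemoglobin
    mchc: float   # mean corpuscular hemoglobin concentration
    mcv: float    # mean corpuscular volume
    result: int   # assumed label: 1 = anemic, 0 = not anemic


def hb_mcv_ratio(sample: Sample) -> float:
    """Derived feature: hemoglobin divided by MCV."""
    if sample.mcv <= 0:
        raise ValueError("MCV must be positive to compute Hb/MCV")
    return sample.hgb / sample.mcv


def feature_vector(sample: Sample) -> List[float]:
    """Original five predictors plus the derived Hb/MCV ratio."""
    return [
        float(sample.gender),
        sample.hgb,
        sample.mch,
        sample.mchc,
        sample.mcv,
        hb_mcv_ratio(sample),
    ]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Binary F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom
```

In a full pipeline, the output of `feature_vector` would be passed to each classifier, and the two metric functions would be applied to predictions on a held-out test split.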
Copyright © 2025