Diabetes is one of the most prevalent chronic diseases worldwide, and accurate early detection is needed to prevent long-term complications. In medical data analysis, machine learning algorithms such as XGBoost have proven effective for classifying disease risk. This study compares the performance of XGBoost before and after applying Principal Component Analysis (PCA) for diabetes risk classification on the Early Stage Diabetes Risk Prediction Dataset. The research stages comprise data preprocessing (missing-value checking, label encoding, outlier removal, and normalization), followed by PCA with a 90% variance retention threshold. In the experiments, the XGBoost model without PCA achieved the higher accuracy, 99.04%, while the model with PCA achieved 98.08%. Although PCA reduced accuracy slightly (by 0.96 percentage points), it decreased the number of features and improved computational efficiency while retaining at least 90% of the variance in the data. PCA is therefore effective at reducing data complexity while keeping model performance close to that of the full feature set.
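The 90% variance retention rule mentioned above can be illustrated with a minimal sketch. It assumes the rule means "keep the smallest number of leading components whose cumulative explained variance reaches the threshold". The function name `components_for_variance` and the eigenvalues are hypothetical, chosen for illustration only, and are not taken from the study.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.90):
    """Return the smallest k such that the k largest eigenvalues
    account for at least `threshold` of the total variance."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)  # fallback if rounding keeps the ratio below threshold


# Hypothetical eigenvalues of a covariance matrix (illustrative, not study data).
example_eigenvalues = [4.0, 2.5, 1.5, 0.75, 0.75, 0.5]
k = components_for_variance(example_eigenvalues, threshold=0.90)
print(k)  # 5: the first four components explain 87.5%, the first five 95%
```

In practice a library routine would typically handle this step. For example, scikit-learn's `PCA` accepts a fractional `n_components` for a similar purpose, though whether the study used that option is not stated.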
Copyright © 2025