Scientific Journal of Informatics
Vol. 11 No. 3: August 2024

Principal Component Analysis for Prediabetes Prediction using Extreme Gradient Boosting (XGBoost)

Wardhani, Kartina Diah Kesuma
Novayani, Wenda



Article Info

Publish Date
06 Nov 2024

Abstract

Purpose: The purpose of this study is to improve the accuracy of prediabetes prediction models. The study combines Principal Component Analysis (PCA), used for dimensionality reduction, with Extreme Gradient Boosting (XGBoost). It offers a new alternative for predicting prediabetes in patients by reducing the complexity of the dataset in order to increase the accuracy of the resulting model. PCA and XGBoost are used to identify the features most strongly correlated with prediabetes, which is expected to yield a better predictive model.

Methods: This study uses a public dataset from the UCI Machine Learning Repository consisting of 520 records with 16 attributes and 1 class label. The data were collected through direct questionnaires administered to patients at the Sylhet Diabetes Hospital in Sylhet, Bangladesh. The research proceeds in five stages: data collection; data preprocessing; dimensionality reduction with PCA to reduce the complexity of the dataset; modeling with XGBoost to learn the patterns used to predict prediabetes; and model evaluation, which measures the performance of the resulting model with accuracy, recall, precision and F1-score.

Result: Applying PCA for dimensionality reduction before XGBoost yielded 12 features and a model accuracy of 97.44%.

Novelty: The originality of the study lies in applying PCA as a preprocessing step that improves the performance of machine learning models by reducing data dimensionality and concentrating on the most critical features. By showing how PCA can improve the efficiency and accuracy of prediabetes prediction models, this research offers insights for future studies and contributes to the development of more effective diagnostic tools for the early detection and prevention of prediabetes.
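The Methods paragraph chains three technical steps: standardize the attributes, project them onto principal components, and score the classifier with accuracy, precision, recall and F1. The sketch below illustrates the first and last of these in dependency-free Python, assuming numeric (for example 0/1-encoded) attributes. It is not the authors' code: all function names are hypothetical, PCA is computed here by power iteration with deflation on the correlation matrix rather than by a library routine, and the XGBoost training step is omitted. In practice one would more likely use scikit-learn's `PCA(n_components=12)` and pass the projected features to `xgboost.XGBClassifier`.

```python
import math
import random


def standardize(rows):
    """Center each column to mean 0 and scale it to unit population variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance of centered rows (divides by n); for standardized rows it is the correlation matrix."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def matvec(a, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_components(cov, k, iters=500, seed=0):
    """Top-k eigenvectors/eigenvalues of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]  # working copy, deflated after each component
    comps, eigvals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(a, v)))  # Rayleigh quotient v^T A v
        comps.append(v)
        eigvals.append(lam)
        # Deflation: subtract lam * v v^T so the next pass converges to the next eigenpair.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps, eigvals


def project(rows, comps):
    """Scores of each standardized row on each component: the new, lower-dimensional features."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in rows]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary label (positive class = `positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `top_components(covariance(standardize(rows)), 12)` would return 12 leading directions and their eigenvalues, matching the 12 features reported in the Result only if that is how the authors fixed the count. Dividing each eigenvalue by the trace of the matrix (the number of attributes, when none is constant) gives its explained-variance ratio, a common criterion for choosing how many components to keep.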

Copyright © 2024






Journal Info

Abbrev

sji

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering

Description

Scientific Journal of Informatics (p-ISSN 2407-7658 | e-ISSN 2460-0040) published by the Department of Computer Science, Universitas Negeri Semarang, a scientific journal of Information Systems and Information Technology which includes scholarly writings on pure research and applied research in the ...