Diabetes is a serious metabolic disease that can cause various health complications. With more than 537 million people worldwide living with diabetes in 2021, early detection is crucial to preventing further complications. This research aims to predict the risk of diabetes using machine learning algorithms, namely Random Forest (RF), K-Nearest Neighbor (KNN), and Logistic Regression (LR), with the diabetes dataset from UCI. Previous research has explored a variety of algorithms and techniques, with results varying in accuracy. This research uses a dataset from Kaggle which consists of 768 data with 8 parameters, which are processed through pre-processing and data normalization techniques. The model was evaluated using metrics such as accuracy, confusion matrix, and ROC-AUC. The results showed that Logistic Regression had the best performance with 77% accuracy and AUC 0.83, compared to KNN (75% accuracy, AUC 0.81) and Random Forest ( 74% accuracy, AUC 0.81). These findings emphasize the importance of appropriate algorithm selection and good data pre-processing in diabetes risk prediction. This study concludes that Logistic Regression is the most effective method for predicting diabetes risk in the dataset used.
Copyrights © 2024