Diabetes mellitus is a chronic disease with a rapidly increasing global prevalence, affecting around 422 million people, predominantly in low- and middle-income countries. Effective management requires early detection and timely intervention. This study aims to develop an accurate predictive model for diabetes mellitus using three machine learning algorithms: Random Forest, Logistic Regression, and Decision Tree. The Pima Indians Diabetes dataset, comprising 768 patient records with various health indicators, was used for model training and evaluation. Exploratory data analysis revealed significant correlations of glucose level, BMI, and age with diabetes risk. The dataset was split into training (80%) and testing (20%) sets. Models were validated using cross-validation and evaluated on accuracy, precision, recall, and F1-score. Logistic Regression achieved the highest accuracy (75%) and the most balanced performance in identifying both positive and negative cases. Decision Tree achieved the highest recall, while Random Forest showed a slightly less even balance between precision and recall. ROC curve analysis showed that Random Forest had the highest AUC (0.82), followed by Logistic Regression (0.81) and Decision Tree (0.73). These results indicate that machine learning algorithms can effectively predict diabetes, providing valuable tools for early detection and intervention and ultimately helping to reduce the global burden of diabetes mellitus.
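The evaluation metrics named above (accuracy, precision, recall, F1-score, and AUC) can be illustrated with a minimal plain-Python sketch. This is not the study's code: the function names `classification_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration and do not reproduce any result reported here.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only (NOT from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3, 0.7, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone. This is one reason a model can rank first on AUC while another ranks first on accuracy, as with Random Forest and Logistic Regression above.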
Copyright © 2024