Journal of Applied Data Sciences
Vol 5, No 4: DECEMBER 2024

A Framework for Diabetes Detection Using Machine Learning and Data Preprocessing

Abu-Shareha, Ahmad Adel
Qutaishat, Haneen
Al-Khayat, Asma



Article Info

Publish Date
15 Oct 2024

Abstract

People with diabetes are at increased risk of developing complications such as heart disease and nerve damage, so early prediction of diabetes is crucial for reducing the severe consequences of the disease. This study proposes a comprehensive framework for diabetes prediction that aims to make the most of the information in available diabetes datasets, which include historical records, laboratory tests, and demographic data. The framework applies a data imputation technique to fill in missing values and feature selection methods to remove less important features, improving diabetes classification. An oversampling technique is used to increase the number of samples, and a parameter tuning approach is used to fine-tune the parameters of the machine learning algorithms during training. Several machine learning algorithms were used for prediction, including Neural Networks, Logistic Regression, Support Vector Machines, and Random Forest, and were evaluated using both a train-test split and cross-validation. The experiments were conducted on the Pima Indian Diabetes dataset using accuracy, precision, recall, and F-measure as evaluation metrics. The results showed that the Random Forest algorithm, particularly when tuned with Grid Search Cross Validation, outperformed the other algorithms, achieving an accuracy of 0.99. This result indicates the effectiveness of the proposed framework, whose accuracy exceeded that of state-of-the-art approaches.
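The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which imputation or oversampling technique the authors used, so median imputation and random oversampling are stand-ins chosen here as assumptions. The function names (`impute_median`, `random_oversample`, `binary_metrics`) are hypothetical, and the code is not the authors' implementation. Only the Python standard library is used.

```python
import random
from statistics import median


def impute_median(rows, missing=None):
    """Fill missing entries with the median of the observed values in the same column.

    Assumption: median imputation is a stand-in; the source does not name the technique.
    Every column must contain at least one observed value.
    """
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not missing) for col in columns]
    return [
        [medians[j] if v is missing else v for j, v in enumerate(row)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size.

    Assumption: random oversampling is a stand-in; the source does not name the technique.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F-measure (F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy values invented for illustration; these are not rows from the Pima dataset.
toy_rows = [[100.0, 30.0], [120.0, None], [None, 25.0], [140.0, 35.0]]
toy_labels = [0, 0, 0, 1]

filled = impute_median(toy_rows)
balanced_rows, balanced_labels = random_oversample(filled, toy_labels)
print(filled)
print(balanced_labels)
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

In practice, oversampling is usually applied to the training portion only, after the train-test split or inside each cross-validation fold, so that duplicated samples do not appear in both the training and the evaluation data.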

Copyrights © 2024






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...