Scientific Journal of Computer Science
Vol. 2 No. 2 (2026): December Article in Process

Machine Learning-Based Diabetes Classification Using Vital Signs and Clinical Information from the MIMIC-IV Dataset

Huynh, Huy
Cao, Thanh
Tran, Hai



Article Info

Publish Date
29 Mar 2026

Abstract

Diagnosing diabetes from clinical data is increasingly important as the number of people with diabetes grows worldwide. This study uses machine learning models to classify diabetes from a variety of clinical data. It draws on the MIMIC-IV dataset, which contains both structured and unstructured data: the structured data include vital signs, demographics, and lab tests, and the unstructured data include clinical notes, chief complaints, and medication lists. The models evaluated included Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine, and XGBoost, and their performance was measured with Accuracy, Precision, Recall, F1-score, and AUC-ROC. Adding the unstructured text data to the experiments improved performance substantially: accuracy rose from approximately 68% to as much as 87% across models. The best-performing models achieved AUC-ROC values above 0.95, with Random Forest and XGBoost performing strongest. These results indicate that semantic mining of clinical notes plays a key part in making medical decision support systems more reliable.
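
The five evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the 0.5 decision threshold, the label encoding (1 = diabetic), and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from MIMIC-IV. AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Compute Accuracy, Precision, Recall, F1-score, and AUC-ROC.

    y_true holds 0/1 labels (1 = diabetic is an assumed encoding);
    y_score holds a model's predicted probability for class 1.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")

    # Split the scores by true class; both classes are needed for AUC-ROC.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Threshold the scores to obtain hard predictions (0.5 is an assumption).
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC-ROC: fraction of (positive, negative) pairs ranked correctly,
    # with tied scores counted as half a correct ranking.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc_roc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc_roc": auc_roc}


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    print(binary_metrics(labels, scores))
```

On the toy input, accuracy, precision, recall, and F1-score all equal 0.75, while AUC-ROC is 0.9375. The gap arises because AUC-ROC depends only on how the scores rank positives against negatives, not on the chosen threshold, which is one reason a study may report it alongside accuracy.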

Copyrights © 2026






Journal Info

Abbrev

sjcs

Subject

Computer Science & IT

Description

The Scientific Journal of Computer Science (SJCS) (e-ISSN: 3110-3170) is a peer-reviewed and open-access scientific journal, managed and published by PT. Teknologi Futuristik Indonesia in collaboration with Universitas Qamarul Huda Badaruddin Bagu and Peneliti Teknologi Teknik Indonesia. The SJCS ...