Cao, Thanh
Unknown Affiliation


Machine Learning-Based Diabetes Classification Using Vital Signs and Clinical Information from the MIMIC-IV Dataset
Huynh, Huy; Cao, Thanh; Tran, Hai
Scientific Journal of Computer Science Vol. 2 No. 2 (2026): December Article in Process
Publisher : PT. Teknologi Futuristik Indonesia

DOI: 10.64539/sjcs.v2i2.2026.439

Abstract

Diagnosing diabetes from clinical data is increasingly important as the number of people with diabetes grows worldwide. This study uses machine learning models to classify diabetes from a variety of clinical data. It draws on the MIMIC-IV dataset, which contains both structured and unstructured data. The structured data include vital signs, demographics, and laboratory tests; the unstructured data include clinical notes, chief complaints, and medication lists. The models tested included Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine, and XGBoost. Performance was measured with Accuracy, Precision, Recall, F1-score, and AUC-ROC. Adding the unstructured text data to the experiments improved performance substantially: accuracy rose from approximately 68% to as much as 87% across models. The best-performing models achieved AUC-ROC values above 0.95, with Random Forest and XGBoost performing best. These results indicate that semantic mining of clinical notes is a key part of making medical decision support systems more reliable.
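
To make the five reported metrics concrete, the following is a minimal sketch of how they can be computed for a binary classifier using only the Python standard library. It is not the authors' evaluation code. The function name `classification_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers are not taken from the study.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Return accuracy, precision, recall, F1 and AUC-ROC for binary labels.

    y_true: list of 0/1 labels; y_score: predicted probability of class 1.
    The threshold is an assumed default, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # AUC-ROC as the probability that a randomly chosen positive case
    # receives a higher score than a randomly chosen negative case
    # (ties count as one half). This is threshold-independent.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc_roc": auc}


# Toy data, invented for illustration only (not MIMIC-IV, not study results).
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
toy_metrics = classification_metrics(toy_labels, toy_scores)
print(toy_metrics)
```

Accuracy, Precision, Recall, and F1-score depend on the chosen decision threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, so a model can show a high AUC-ROC alongside a more modest accuracy.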