Diabetes mellitus is a non-communicable disease whose global prevalence continues to rise; it reduces quality of life and imposes a long-term economic burden, so data-driven early detection is crucial for preventing clinical complications. This study develops a diabetes prediction model based on the TabTransformer architecture, using a clinical dataset from Kaggle that contains 100,000 patient profiles with more than 35 relevant numerical and categorical attributes. The research stages comprise preprocessing to remove potential leakage features, separation of target and features, normalization of numerical features, and embedding of categorical features. The TabTransformer model is applied to binary classification of the target diagnosed_diabetes, using a self-attention mechanism to capture latent interactions between tabular features, and is evaluated with accuracy, precision, recall, F1-score, and ROC AUC. The model achieves competitive performance, with an accuracy of 82.55%, an F1-score of 0.8527 for the diabetes class, and a ROC AUC of 0.9009, indicating that it discriminates well between diabetic and non-diabetic patients. These results suggest that the TabTransformer architecture is effective for processing large-scale clinical tabular data and merits consideration for artificial intelligence-based medical decision support systems that predict chronic diseases, particularly diabetes mellitus.
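The evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the study's evaluation code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the eight-sample toy data are all assumptions made for illustration, and the values it produces are unrelated to the reported results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas ROC AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported side by side.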
Copyrights © 2024