Science in Information Technology Letters
Vol 6, No 1 (2025): May 2025

A comparative study on SMOTE, CTGAN, and hybrid SMOTE-CTGAN for medical data augmentation

Khoirunnisa, Ninda (Unknown)
Rosyda, Miftahurrahma (Unknown)



Article Info

Publish Date
30 May 2025

Abstract

Class imbalance in clinical datasets remains a challenge in medical data mining, often resulting in models that are biased toward majority outcomes and less sensitive to rare but clinically critical cases. This study presents a comparative evaluation of three augmentation strategies, namely the Synthetic Minority Oversampling Technique (SMOTE), Conditional Tabular GAN (CTGAN), and a hybrid SMOTE+CTGAN, on the Framingham Heart Study dataset for cardiovascular disease prediction. The augmented datasets were evaluated using Decision Tree, Random Forest, and XGBoost classifiers across multiple metrics, including accuracy, precision, recall, and F1-score. The results show that classifiers trained on the imbalanced data achieved high accuracy but poor minority recall (0.40), confirming the models’ bias toward the majority class. SMOTE yielded the strongest improvements in minority recall (up to 0.88 with XGBoost) and balanced F1 across classes, though at the cost of reduced majority recall. CTGAN and SMOTE+CTGAN delivered more moderate improvements in minority recall (0.66–0.77) while preserving higher majority recall (0.86), providing a gentler trade-off. These findings indicate that while SMOTE remains a robust baseline for addressing imbalance, hybrid and GAN-based approaches offer practical alternatives when majority-class performance must be preserved. The results highlight that the choice of augmentation method should be informed by the clinical context.
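
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it relies on: the interpolation step at the core of SMOTE, and the per-class recall that exposes majority bias even when accuracy looks high. The function names `smote_oversample` and `per_class_recall`, the parameter values, and all data are hypothetical toy choices, not the study's code or the Framingham data. The sketch also assumes purely numeric features, which may not hold for every clinical variable.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (numeric features only).
    Assumes the minority set holds at least two rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted members / all true members."""
    recall = {}
    for label in set(y_true):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recall[label] = hits / total
    return recall


# Toy, made-up minority rows with two numeric features.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_oversample(minority_rows, n_new=6, k=2)

# Toy labels: 8 majority (0) and 2 minority (1); one minority case is missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)

print(f"accuracy={accuracy:.2f}, recall by class={recalls}")
```

In the toy labels, accuracy is 0.90 while minority recall is only 0.50, the same pattern the abstract reports for the unaugmented data. A real comparison would more likely rely on an established oversampling library and a held-out test set that is never augmented.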

Copyrights © 2025






Journal Info

Abbrev

sitech

Subject

Computer Science & IT

Description

Science in Information Technology Letters (SITech) aims to keep abreast of the current development and innovation in the area of Science in Information Technology as well as providing an engaging platform for scientists and engineers throughout the world to share research results in related ...