Contact Name
Andri Pranolo
Contact Email
andri@ascee.org
Phone
+6281392554050
Journal Mail Official
andri@ascee.org
Editorial Address
Association for Scientific Computing Electronics and Engineering (ASCEE), Jl. Janti, Karangjambe 130B, Banguntapan, Bantul, Yogyakarta, Indonesia
Location
Kota Yogyakarta,
Daerah Istimewa Yogyakarta
INDONESIA
Science in Information Technology Letters
ISSN : -     EISSN : 2722-4139     DOI : https://doi.org/10.31763/SiTech
Core Subject : Science
Science in Information Technology Letters (SITech) aims to keep abreast of current developments and innovation in the area of Science in Information Technology and to provide an engaging platform for scientists and engineers throughout the world to share research results in related disciplines. SITech is a peer-reviewed open-access journal covering four (4) major areas of research: 1) Artificial Intelligence, 2) Communication and Information System, 3) Software Engineering, and 4) Business Intelligence. Submitted papers must be written in English for the initial review stage by editors and for further review by a minimum of two international reviewers. Accepted and published papers are freely accessible on this website.
Articles 51 Documents
A comparative study on SMOTE, CTGAN, and hybrid SMOTE-CTGAN for medical data augmentation
Khoirunnisa, Ninda; Rosyda, Miftahurrahma
Science in Information Technology Letters Vol 6, No 1 (2025): May 2025
Publisher : Association for Scientific Computing Electronics and Engineering (ASCEE)

DOI: 10.31763/sitech.v6i1.2203

Abstract

Class imbalance in clinical datasets remains a challenge in medical data mining, often resulting in models that are biased toward majority outcomes and less sensitive to rare but clinically critical cases. This study presents a comparative evaluation of three augmentation strategies, namely Synthetic Minority Oversampling Technique (SMOTE), Conditional Tabular GAN (CTGAN), and a hybrid SMOTE+CTGAN, on the Framingham Heart Study dataset for cardiovascular disease prediction. Augmented datasets were evaluated using Decision Tree, Random Forest, and XGBoost classifiers across multiple metrics, including accuracy, precision, recall, and F1-score. Results demonstrate that classifiers trained on the imbalanced data achieved high accuracy but poor minority recall (0.40), confirming the models' bias toward the majority class. SMOTE yielded the strongest improvements in minority recall (up to 0.88 with XGBoost) and balanced F1 across classes, though at the cost of reduced majority recall. CTGAN and SMOTE+CTGAN delivered more moderate improvements in minority recall (0.66–0.77) while preserving higher majority recall (0.86), providing a gentler trade-off. These findings indicate that while SMOTE remains a robust baseline for addressing imbalance, hybrid and GAN-based approaches offer practical alternatives for preserving majority performance. The results highlight that the choice of augmentation method should be informed by clinical context.
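
The gap between high accuracy and poor minority recall reported in the abstract can be illustrated with a minimal Python sketch. The class counts and predictions below are hypothetical and are not taken from the paper or from the Framingham data; they are chosen only so that minority recall comes out at 0.40. The helper name `per_class_report` is likewise an assumption made for this illustration.

```python
def per_class_report(y_true, y_pred, positive):
    """Accuracy plus precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 850 majority (0) and 150 minority (1) cases.
y_true = [0] * 850 + [1] * 150
# Hypothetical classifier output: 20 majority cases flagged as positive,
# and only 60 of the 150 minority cases detected.
y_pred = [0] * 830 + [1] * 20 + [1] * 60 + [0] * 90

minority = per_class_report(y_true, y_pred, positive=1)
majority = per_class_report(y_true, y_pred, positive=0)

print(f"accuracy:        {minority['accuracy']:.2f}")
print(f"minority recall: {minority['recall']:.2f}")
print(f"majority recall: {majority['recall']:.2f}")
```

In this made-up example overall accuracy stays high because the majority class dominates the count of correct predictions, while more than half of the minority cases are missed. This is why the abstract reports recall per class rather than accuracy alone.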