Muhammad 'Arif Faizin
Institut Teknologi Sepuluh Nopember


Evaluation of Synthetic Data Effectiveness using Generative Adversarial Networks (GAN) in Improving Javanese Script Recognition on Ancient Manuscript
Muhammad 'Arif Faizin; Nanik Suciati; Chastine Fatichah
JUTI: Jurnal Ilmiah Teknologi Informasi Vol. 23, No. 1, January 2025
Publisher : Department of Informatics, Institut Teknologi Sepuluh Nopember

DOI: 10.12962/j24068535.v23i1.a1256

Abstract

Recognizing Javanese script in ancient manuscripts is challenging because the available data are limited and imbalanced across classes. A potential approach to addressing this issue is the use of Generative Adversarial Networks (GAN). This study evaluates the effectiveness of synthetic data generated using Enhanced Balancing GAN (EBGAN) in mitigating data imbalance. Four evaluation scenarios are conducted: (i) assessing the impact of synthetic data as augmentation, (ii) evaluating the sufficiency of synthetic data for recognition models, (iii) analyzing minority class oversampling with different selection strategies, and (iv) evaluating model generalization through cross-validation. Quantitative analysis of the generated synthetic data, based on Fréchet Inception Distance (FID) and Structural Similarity Index (SSIM), together with visual assessment, indicates that the synthetic data closely resemble real data. Experimental results further show that combining real and synthetic data improves accuracy, precision, recall, and F1-score. Oversampling with synthetic data proves effective in meeting the data sufficiency requirements for training recognition models. In addition, selecting minority classes and setting oversampling thresholds based on percentage, distribution, and model performance can serve as guidelines for enhancing script recognition performance. Compared to previous methods, EBGAN produces more diverse synthetic data with better visual quality. However, further research is still needed to optimize GAN performance in supporting script recognition.
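The percentage-based selection of minority classes mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `select_minority_classes` and `oversampling_plan`, the rule that a class is "minority" when its count falls below a given percentage of the largest class, the choice to top each minority class up to the largest class size, and the toy label counts.

```python
from collections import Counter


def select_minority_classes(labels, threshold_pct=50.0):
    """Return {class: count} for classes below threshold_pct percent of the largest class.

    Hypothetical selection rule; the paper's actual criteria may differ.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    cutoff = largest * threshold_pct / 100.0
    return {cls: n for cls, n in counts.items() if n < cutoff}


def oversampling_plan(labels, threshold_pct=50.0):
    """Return {class: number of synthetic samples} needed to reach the largest class size."""
    counts = Counter(labels)
    largest = max(counts.values())
    minority = select_minority_classes(labels, threshold_pct)
    return {cls: largest - n for cls, n in minority.items()}


# Toy label distribution (illustrative character classes, not real dataset counts).
labels = ["ha"] * 100 + ["na"] * 60 + ["ca"] * 20 + ["ra"] * 5
plan = oversampling_plan(labels, threshold_pct=50.0)
print(plan)
```

With a 50% threshold, only the classes holding fewer than 50 samples are selected, and the plan lists how many generated images each would need. In practice the synthetic samples would come from a class-conditional generator such as the one studied in the paper; this sketch covers only the counting step.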