Journal: Journal of Intelligent Systems Technology and Informatics

Indonesian Sign Language Alphabet Image Classification using Vision Transformer
Agustiansyah, Yoga; Kurniadi, Dede
Journal of Intelligent Systems Technology and Informatics Vol 1 No 1 (2025): JISTICS, March 2025
Publisher : Aliansi Peneliti Informatika

DOI: 10.64878/jistics.v1i1.5

Abstract

Effective communication is fundamental for social interaction, yet individuals with hearing impairments often face significant barriers. Indonesian Sign Language (BISINDO) is a vital communication tool for the deaf community in Indonesia, but limited public understanding of BISINDO creates communication barriers, necessitating an accurate automatic recognition system. This research investigates the efficacy of the Vision Transformer (ViT), a state-of-the-art deep learning architecture, for classifying static BISINDO alphabet images, exploring its potential to overcome the limitations of previous approaches through robust feature extraction. The methodology used a dataset of 26 BISINDO alphabet classes, which underwent comprehensive preprocessing, including class balancing via augmentation and image normalization. The google/vit-base-patch16-224-in21k ViT model was adapted with a custom classification head and trained using a two-phase strategy: initial feature extraction with a frozen backbone, followed by full-network fine-tuning. The fine-tuned model demonstrated exceptional performance on the unseen test set, achieving an accuracy of 99.77% (95% CI: 99.55%–99.99%), precision of 99.77%, recall of 99.72%, and a weighted F1-score of 0.9977, surpassing many previously reported methods. These findings confirm that the ViT model is a highly effective and robust solution for BISINDO alphabet image classification, underscoring the potential of Transformer-based architectures for building accurate assistive communication technologies for the Indonesian deaf and hard-of-hearing community.
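
As a rough illustration (not taken from the paper), the two-phase strategy the abstract describes could be sketched with the Hugging Face transformers library as follows; the optimizer choice, learning rates, and training loop are illustrative assumptions, and the data pipeline is omitted.

    # Minimal sketch of two-phase ViT fine-tuning for 26 BISINDO classes.
    # Assumptions: Hugging Face transformers API; AdamW and the learning
    # rates below are illustrative, not the paper's reported settings.
    import torch
    from transformers import ViTForImageClassification

    # Backbone named in the abstract, with a fresh 26-way classification head.
    model = ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k",
        num_labels=26,
    )

    # Phase 1: feature extraction -- freeze the ViT backbone and train
    # only the classification head.
    for param in model.vit.parameters():
        param.requires_grad = False
    head_optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
    # ... train the head for a few epochs with head_optimizer ...

    # Phase 2: full fine-tuning -- unfreeze the whole network and continue
    # at a lower learning rate so the pretrained features are preserved.
    for param in model.parameters():
        param.requires_grad = True
    full_optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    # ... continue training end to end with full_optimizer ...
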
From Local Features to Global Context: Comparing CNN and Transformer for Sundanese Script Classification
Agustiansyah, Yoga; Fauzi, Dhika Restu
Journal of Intelligent Systems Technology and Informatics Vol 1 No 2 (2025): JISTICS, July 2025
Publisher : Aliansi Peneliti Informatika

DOI: 10.64878/jistics.v1i2.38

Abstract

The digital preservation of historical writing systems such as Aksara Sunda is critical for cultural heritage, yet automated recognition is hindered by high character similarity and handwriting variability. This study systematically compares two dominant deep learning paradigms, Convolutional Neural Networks (CNNs) and Transformers, to evaluate the trade-off between model accuracy and real-world robustness. Using a transfer learning approach, we trained five models (ResNet50, MobileNetV2, EfficientNetB0, ViT, and DeiT) on a balanced 30-class dataset of Sundanese script. Performance was assessed on a standard in-distribution test set and on a challenging, independently collected out-of-distribution (OOD) dataset designed to simulate varied real-world conditions. The results reveal a significant performance inversion: while EfficientNetB0 achieved the highest in-distribution accuracy of 96.9%, its performance plummeted on the OOD set; conversely, ResNet50, despite lower in-distribution accuracy, proved the most robust model, achieving the highest OOD accuracy of 92.5%. This study concludes that for practical applications requiring reliable performance, the generalization capability demonstrated by ResNet50 is more valuable than the specialized accuracy of EfficientNetB0, a crucial insight for developing robust digital preservation tools for historical scripts.
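
As a hedged sketch of the evaluation protocol the abstract describes (not the paper's code), the in-distribution versus OOD comparison could be run as below; the model dictionary, the two DataLoaders, and the assumption that each fine-tuned model returns raw logits (torchvision-style) are all illustrative.

    # Minimal sketch: score each fine-tuned classifier on both an
    # in-distribution and an out-of-distribution test set, so the
    # accuracy/robustness trade-off is visible side by side.
    import torch

    @torch.no_grad()
    def accuracy(model, loader, device="cuda"):
        """Top-1 accuracy of a classifier over a DataLoader of (image, label) batches."""
        model.eval().to(device)
        correct = total = 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)  # assumes the model returns raw logits
            correct += (preds == labels).sum().item()
            total += labels.size(0)
        return correct / total

    # Hypothetical usage: fine_tuned_models, id_test_loader, and
    # ood_test_loader stand in for the paper's five models and two test sets.
    for name, model in fine_tuned_models.items():  # e.g. {"ResNet50": ..., "ViT": ...}
        print(name,
              accuracy(model, id_test_loader),
              accuracy(model, ood_test_loader))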