Journal : Buletin Ilmiah Sarjana Teknik Elektro

Improving Indonesian Sign Alphabet Recognition for Assistive Learning Robots Using Gamma-Corrected MobileNetV2
Authors : Hayati, Lilis Nur; Handayani, Anik Nur; Irianto, Wahyu Sakti Gunawan; Asmara, Rosa Andrie; Indra, Dolly; Damanhuri, Nor Salwa
Buletin Ilmiah Sarjana Teknik Elektro Vol. 7 No. 3 (2025): September
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/biste.v7i3.13300

Abstract

Sign language recognition plays a critical role in promoting inclusive education, particularly for deaf children in Indonesia. However, many existing systems struggle with real-time performance and sensitivity to lighting variations, limiting their applicability in real-world settings. This study addresses these issues by optimizing a BISINDO (Bahasa Isyarat Indonesia) alphabet recognition system using the SSD MobileNetV2 architecture, enhanced with gamma correction as a luminance normalization technique. The research contribution is the integration of gamma correction preprocessing with SSD MobileNetV2, tailored for BISINDO and implemented on a low-cost assistive robot platform. This approach aims to improve robustness under diverse lighting conditions while maintaining real-time capability without the use of specialized sensors or wearables. The proposed method involves data collection, image augmentation, gamma correction (γ = 1.2, 1.5, and 2.0), and training using the SSD MobileNetV2 FPNLite 320x320 model. The dataset consists of 1,820 original images expanded to 5,096 via augmentation, with 26 BISINDO alphabet classes. The system was evaluated under indoor and outdoor conditions. Experimental results showed significant improvements with gamma correction. Indoor accuracy increased from 94.47% to 97.33%, precision from 91.30% to 95.23%, and recall from 97.87% to 99.57%. Outdoor accuracy improved from 93.80% to 97.30%, with precision rising from 90.33% to 94.73%, and recall reaching 100%. In conclusion, the proposed system offers a reliable, real-time solution for BISINDO recognition in low-resource educational environments. Future work includes the recognition of two-handed gestures and integration with natural language processing for enhanced contextual understanding.
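The gamma correction step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract gives only the γ values (1.2, 1.5, 2.0), not the exact formula or implementation, so the convention used here, out = 255 · (in / 255)^(1/γ), is an assumption; it is a common one in which γ > 1 brightens mid-tones. The function names `gamma_lut` and `gamma_correct` are hypothetical, and the image is represented as a plain nested list of 8-bit grayscale values to keep the example free of dependencies.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for 8-bit gamma correction.

    Assumed convention (not confirmed by the source):
        out = 255 * (in / 255) ** (1 / gamma)
    With this convention, gamma > 1 brightens mid-tones.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    inv = 1.0 / gamma
    return [round(255 * (v / 255) ** inv) for v in range(256)]


def gamma_correct(image, gamma):
    """Apply gamma correction to an image given as rows of 0..255 ints."""
    lut = gamma_lut(gamma)  # computed once, then reused for every pixel
    return [[lut[pixel] for pixel in row] for row in image]


# Hypothetical 2x3 grayscale patch, corrected with each gamma value
# listed in the abstract.
patch = [[0, 64, 128], [192, 230, 255]]
corrected = {g: gamma_correct(patch, g) for g in (1.2, 1.5, 2.0)}
```

Precomputing a lookup table keeps the per-pixel cost to a single index operation, which suits a real-time pipeline on low-cost hardware; whether the authors implemented it this way is not stated.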
Bi-LSTM and Attention-based Approach for Lip-To-Speech Synthesis in Low-Resource Languages: A Case Study on Bahasa Indonesia
Authors : Setyaningsih, Eka Rahayu; Handayani, Anik Nur; Irianto, Wahyu Sakti Gunawan; Kristian, Yosi
Buletin Ilmiah Sarjana Teknik Elektro Vol. 7 No. 4 (2025): December
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/biste.v7i4.14310

Abstract

Lip-to-speech synthesis enables the transformation of visual information, particularly lip movements, into intelligible speech. This technology has gained increasing attention due to its potential in assistive communication for individuals with speech impairments, audio restoration in cases of missing or corrupted speech signals, and enhancement of communication quality in noisy or bandwidth-limited environments. However, research on low-resource languages, such as Bahasa Indonesia, remains limited, primarily due to the absence of suitable corpora and the unique phonetic structures of the language. To address this challenge, this study employs the LUMINA dataset, a purpose-built Indonesian audio-visual corpus comprising 14 speakers with diverse syllabic coverage. The main contribution of this work is the design and evaluation of an Attention-Augmented Bi-LSTM Multimodal Autoencoder, implemented as a two-stage parallel pipeline: (1) an audio autoencoder trained to learn compact latent representations from Mel-spectrograms, and (2) a visual encoder based on EfficientNetV2-S integrated with Bi-LSTM and multi-head attention to predict these latent features from silent video sequences. The experimental evaluation yields promising yet constrained results. The objective metrics reach maximum scores of PESQ 1.465, STOI 0.7445, and ESTOI 0.5099, which are considerably lower than those of state-of-the-art English systems (PESQ > 2.5, STOI > 0.85), indicating that intelligibility remains a challenge. However, subjective evaluation using Mean Opinion Score (MOS) demonstrates consistent improvements: while baseline LSTM models achieve only 1.7–2.5, the Bi-LSTM with 8-head attention attains 3.3–4.0, with the highest ratings observed in female multi-speaker scenarios. These findings confirm that Bi-LSTM with attention improves over conventional baselines and generalizes better in multi-speaker contexts. The study establishes a first baseline for lip-to-speech synthesis in Bahasa Indonesia and underscores the importance of larger datasets and advanced modeling strategies to further enhance intelligibility and robustness in low-resource language settings.