Saddad Nabbil
Universitas Pamulang

Published: 1 document
Sistem Penerjemahan Ucapan Bahasa Sunda Berbasis Web dengan Augmentasi Visual Menggunakan Convolutional Neural Network (Web-Based Sundanese Speech Translation System with Visual Augmentation Using a Convolutional Neural Network)
Saddad Nabbil; Yono Cahyono
Journal of Artificial Intelligence and Innovative Applications (JOAIIA) Vol. 7 No. 1 (2026): February
Publisher : Teknik Informatika Universitas Pamulang

DOI: 10.32493/joaiia.v7i1.56485

Abstract

This research develops a web-based Sundanese speech translation system with visual augmentation through a Convolutional Neural Network (CNN). The primary challenge is the insufficient accuracy of audio-only Automatic Speech Recognition (ASR) for low-resource languages under noisy conditions. The proposed solution integrates a fine-tuned Whisper Medium model for transcription, CNN-based lip reading, and attention-weighted audio-visual fusion. Training used the OpenSLR36 Sundanese corpus, taking ~35,000 of the 175,324 available samples (a subset imposed by memory constraints). Optimization ran on RunPod with an NVIDIA RTX 4090 GPU (24 GB VRAM) for 5,000 iterations (~11 hours). The optimized model achieves a Word Error Rate (WER) of 2.45% at its best checkpoint (iteration 3,500), an improvement of 7.37 percentage points over the baseline (9.82% at iteration 500). This performance approaches the state of the art reported by Raharjo & Zahra (2025), who achieved 2.03% WER with Whisper Small. The visual module comprises a three-layer CNN producing 512-dimensional features, with MediaPipe used for facial detection. Black-box testing validates functional compliance, and a responsive interface ensures cross-device compatibility. This work advances Sundanese language preservation through an accessible translation system with competitive accuracy.
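The WER figures above are the standard ASR metric: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch of that computation, in pure Python; the Sundanese example sentences below are illustrative, not drawn from the OpenSLR36 corpus:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words gives WER = 0.25
print(wer("abdi bade ka pasar", "abdi bade di pasar"))  # 0.25
```

A reported WER of 2.45% therefore means roughly 2.45 word-level errors per 100 reference words over the evaluation set; libraries such as jiwer implement the same metric (typically with corpus-level aggregation) for production use.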