Journal of Artificial Intelligence and Innovative Applications (JOAIIA)
Vol. 7 No. 1 (2026): February

Web-Based Sundanese Speech Translation System with Visual Augmentation Using a Convolutional Neural Network

Saddad Nabbil
Yono Cahyono



Article Info

Publish Date
28 Feb 2026

Abstract

This research develops a web-based Sundanese speech translation system with visual augmentation through a Convolutional Neural Network (CNN). The primary challenge is the insufficient accuracy of audio-only Automatic Speech Recognition (ASR) for low-resource languages under noisy conditions. The proposed solution integrates a fine-tuned Whisper Medium model for transcription, CNN-based lip reading, and attention-weighted audio-visual fusion. Training used the OpenSLR36 Sundanese corpus, with ~35,000 samples drawn from the 175,324 available instances (a subset chosen due to memory constraints). Optimization was executed on RunPod using an NVIDIA RTX 4090 GPU (24 GB VRAM) for 5,000 iterations (~11 hours). Results show the optimized model achieves a Word Error Rate (WER) of 2.45% at the optimal checkpoint (iteration 3500), an improvement of 7.37 percentage points over the baseline (9.82% at iteration 500). This performance approaches the state of the art reported by Raharjo & Zahra (2025), who achieved 2.03% WER using Whisper Small. The visual module comprises a three-layer CNN producing 512-dimensional features, with MediaPipe used for facial detection. Black-box testing validates functional compliance, while a responsive interface ensures cross-device compatibility. This work advances Sundanese language preservation through an accessible translation system with competitive accuracy.
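The WER figures cited in the abstract (2.45% optimized vs. 9.82% baseline) follow the standard word-level edit-distance definition: substitutions, insertions, and deletions divided by the number of reference words. A minimal sketch of that metric (illustrative only, not the authors' evaluation script):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions to reach empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions from empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution in a four-word reference gives WER = 0.25
print(wer("abdi bade ka pasar", "abdi bade ka sakola"))  # → 0.25
```

In practice, evaluation pipelines typically normalize case and punctuation before computing WER, which can shift the reported figure noticeably for morphologically rich or low-resource languages.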

Copyrights © 2026