Journal : Elkom: Jurnal Elektronika dan Komputer

Optimizing Whisper Speech-to-Text for Indonesian with Hybrid Cloud and Multi-Engine
Ikhwan Alfath Nurul Fathony; Affix Mareta; Beta Estri Adiana; Olivia Wardhani; Dimas Ardiansyah Halim
Elkom: Jurnal Elektronika dan Komputer Vol. 18 No. 1 (2025): July: Jurnal Elektronika dan Komputer
Publisher : STEKOM PRESS

DOI: 10.51903/5n2d3s08

Abstract

Automatic Speech Recognition (ASR) for the Indonesian language suffers from a high Word Error Rate (WER), especially when pre-trained models are used without fine-tuning. This study develops an optimized ASR system on a hybrid cloud architecture that combines the Faster-Whisper large-v3 engine with advanced audio preprocessing. The system is distributed: Google Colab (Tesla T4, 15 GB VRAM) serves as the GPU server, and an Ubuntu 22.04 LTS machine (8 cores, 32 GB RAM) acts as the client. Evaluation used five Indonesian audio samples covering formal news, informal conversations, and long-duration recordings. The system successfully processed 80% of the samples, with WER ranging from 27.69% (formal news) to 645.16% (informal conversations). Resource utilization remained modest, at 21.3% GPU usage and 35.4% RAM usage. Processing time was stable for normal-sized files, but large files (>50 MB) timed out. The results indicate that a hybrid cloud architecture is feasible for distributed Indonesian ASR processing, although several areas still need optimization before production deployment.
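
The WER figures above may look surprising, since one of them exceeds 100%. A minimal sketch of the standard word-level definition, WER = (substitutions + deletions + insertions) / number of reference words, shows how this can happen: when the hypothesis contains many inserted words, the error count can exceed the length of the reference. The abstract does not describe the authors' scoring code or text normalization, so the function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercase and whitespace tokenization only; the paper's
    actual normalization is not stated in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical examples, not taken from the study's data.
one_substitution = word_error_rate("saya makan nasi goreng", "saya makan nasi putih")
many_insertions = word_error_rate("selamat pagi", "selamat pagi semua teman teman")
print(f"{one_substitution:.2%}")  # 25.00%
print(f"{many_insertions:.2%}")   # 150.00%
```

In the second example, three inserted words against a two-word reference give a WER of 150%, which illustrates how a hypothesis with many spurious words can push WER well above 100%.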