Fine-Tuning Whisper Model for Mandar Speech Recognition: Approach and Performance Evaluation
Jafar, Jafar; Tb, Mar Athul Wazithah; Aziz, Firman; Iriany, Rosary; Nasir, Norma
Journal of Applied Engineering and Technological Science (JAETS), Vol. 7 No. 1 (2025)
Publisher: Yayasan Riset dan Pengembangan Intelektual (YRPI)

DOI: 10.37385/jaets.v7i1.7170

Abstract

This research focuses on the development of speech recognition technology for the Mandar language, a regional language in Indonesia with limited digital resources. The main challenge lies in the lack of local datasets and the minimal representation of the Mandar language in existing multilingual speech recognition models. This study aims to enhance the performance of Automatic Speech Recognition (ASR) systems by fine-tuning the Whisper model using a Mandar-specific dataset. The dataset consists of 1,000 audio recordings with various dialects and recording qualities, which underwent preprocessing steps such as segmentation, normalization, and data augmentation. Fine-tuning was conducted using supervised learning methods with hyperparameter optimization, resulting in a reduction of Word Error Rate (WER) from 73.7% in the pretrained model to 37.4% after fine-tuning, and an increase in accuracy from 26.3% to 62.6%. The optimized model was also compared with other ASR models, such as DeepSpeech and wav2vec 2.0, demonstrating superior performance in terms of accuracy and time efficiency. Further analysis revealed that recording quality and dialect variations significantly impacted model performance, with high-quality recordings and standard dialects yielding the best results. The model was implemented as a web application prototype, enabling efficient and near real-time transcription of Mandar speech. This research not only contributes to the development of ASR technology for low-resource languages but also opens new opportunities for preserving and utilizing the Mandar language through digital technology. For future improvements, larger datasets, more advanced augmentation techniques, and the integration of additional language models are recommended.
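
The abstract does not name the fine-tuning toolkit or the code used to score Word Error Rate, so the sketch below is only an illustration of the evaluation step it describes. It assumes the Hugging Face Transformers Whisper checkpoints and the jiwer package; the "openai/whisper-small" checkpoint name and the placeholder transcripts are assumptions, not details from the paper.

```python
# Minimal sketch (assumed tooling, not the authors' code): transcribe audio with a
# pretrained Whisper checkpoint and score it with Word Error Rate (WER), where
# accuracy = 1 - WER as reported in the abstract.
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from jiwer import wer

# Checkpoint size is an assumption; the paper does not state which Whisper variant was used.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.eval()

def transcribe(waveform, sampling_rate=16_000):
    """Transcribe one mono 16 kHz waveform (1-D float array) to text."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

# Score reference/hypothesis pairs; the strings below are placeholders, not Mandar data.
references = ["ground-truth transcript for clip one"]
hypotheses = ["model transcript for clip one"]
error_rate = wer(references, hypotheses)
print(f"WER: {error_rate:.1%}  accuracy: {1 - error_rate:.1%}")
```

Fine-tuning itself would pair such a model with the 1,000 Mandar recordings and their transcripts in a standard supervised sequence-to-sequence training loop (for example, Hugging Face's Seq2SeqTrainer), and the reported drop in WER from 73.7% to 37.4% corresponds to evaluating the model this way before and after training.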