Journal : Knowbase : International Journal of Knowledge in Database

End-to-End Text-to-Speech for Minangkabau Pariaman Dialect Using Variational Autoencoder with Adversarial Learning (VITS)
Fakhrezi, Muhammad Dzaki; Yusra; Muhammad Fikry; Pizaini; Suwanto Sanjaya
Knowbase : International Journal of Knowledge in Database Vol. 5 No. 1 (2025): June 2025
Publisher : Universitas Islam Negeri Sjech M. Djamil Djambek Bukittinggi

DOI: 10.30983/knowbase.v5i1.9909

Abstract

Language is a medium of human communication for conveying ideas, emotions, and information, both orally and in writing. Each language has vocabulary and grammar shaped by its local culture. Minangkabau is one of the regional languages that enrich Indonesian, the national language. It has four main dialects: Tanah Datar, Lima Puluh Kota, Agam, and Pesisir. The Pesisir dialect in turn has several variants, including the Padang Kota, Padang Luar Kota, Painan, Tapan, and Pariaman dialects. The Pariaman dialect needs to be preserved to prevent its extinction, and technological development that broadens its use can support that goal. This study applies Text-to-Speech (TTS) technology to the Pariaman dialect of Minangkabau using the Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (VITS) method, which was chosen because it can produce natural, high-quality speech. The research stages are voice data collection and recording, VITS model training, and speech quality evaluation using the Mean Opinion Score (MOS). The model achieved a MOS of 4.72 out of 5, indicating that the generated speech closely resembles the natural utterances of native speakers. This TTS technology is expected to support the preservation and development of the Pariaman dialect of Minangkabau and to improve information accessibility for its speakers.
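
The MOS evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each listener rates a synthesized utterance on the usual 1 to 5 scale and that the score is the arithmetic mean of those ratings. The function name `mean_opinion_score`, the normal-approximation confidence interval, and the ratings below are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    mos = mean(ratings)
    # Normal-approximation 95% interval; a single rating gives no spread estimate.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical listener ratings, for illustration only (not the study's data).
ratings = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean shows how much the score depends on the number of listeners and on how closely they agree.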