Setyaningsih, Eka Rahayu
Unknown Affiliation

Published: 2 documents

Bi-LSTM and Attention-based Approach for Lip-To-Speech Synthesis in Low-Resource Languages: A Case Study on Bahasa Indonesia
Setyaningsih, Eka Rahayu; Handayani, Anik Nur; Irianto, Wahyu Sakti Gunawan; Kristian, Yosi
Buletin Ilmiah Sarjana Teknik Elektro Vol. 7 No. 4 (2025): December
Publisher: Universitas Ahmad Dahlan

DOI: 10.12928/biste.v7i4.14310

Abstract

Lip-to-speech synthesis transforms visual information, particularly lip movements, into intelligible speech. The technology has gained increasing attention for its potential in assistive communication for individuals with speech impairments, audio restoration where speech signals are missing or corrupted, and improved communication quality in noisy or bandwidth-limited environments. However, research on low-resource languages such as Bahasa Indonesia remains limited, primarily because suitable corpora are lacking and the language has its own phonetic structures. To address this challenge, this study employs the LUMINA dataset, a purpose-built Indonesian audio-visual corpus comprising 14 speakers with diverse syllabic coverage. The main contribution of this work is the design and evaluation of an Attention-Augmented Bi-LSTM Multimodal Autoencoder, implemented as a two-stage parallel pipeline: (1) an audio autoencoder trained to learn compact latent representations from Mel-spectrograms, and (2) a visual encoder based on EfficientNetV2-S integrated with Bi-LSTM and multi-head attention to predict these latent features from silent video sequences. The experimental evaluation yields promising yet constrained results. The best objective scores were PESQ 1.465, STOI 0.7445, and ESTOI 0.5099, considerably lower than those of state-of-the-art English systems (PESQ > 2.5, STOI > 0.85), indicating that intelligibility remains a challenge. However, subjective evaluation using Mean Opinion Score (MOS) shows consistent improvements: whereas baseline LSTM models achieve only 1.7–2.5, the Bi-LSTM with 8-head attention attains 3.3–4.0, with the highest ratings observed in female multi-speaker scenarios. These findings confirm that Bi-LSTM with attention improves over conventional baselines and generalizes better in multi-speaker contexts. The study establishes a first baseline for lip-to-speech synthesis in Bahasa Indonesia and underscores the importance of larger datasets and advanced modeling strategies to further enhance intelligibility and robustness in low-resource language settings.
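
To make the attention component of the abstract more concrete, the following is a minimal sketch of scaled dot-product self-attention over a sequence of per-frame feature vectors, the building block that multi-head attention repeats in parallel. It is an illustration only, not the authors' model: it uses a single head with identity projections (no learned weights), and the `frames` values and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention with identity projections.

    frames: list of T feature vectors (one per video frame), each of length d.
    Returns (outputs, weights), where weights[t] holds how strongly frame t
    attends to every frame in the sequence.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        attn = softmax(scores)
        # Output is the attention-weighted average of all frame vectors.
        mixed = [sum(a * value[i] for a, value in zip(attn, frames)) for i in range(d)]
        outputs.append(mixed)
        weights.append(attn)
    return outputs, weights


# Hypothetical 3-frame sequence of 2-dimensional features (not from the paper).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(frames)
```

In an 8-head setup such as the one the abstract reports, each head would apply its own learned query, key, and value projections before this step, and the head outputs would be concatenated; those learned parts are omitted here.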

Sentiment Classification untuk Opini Berita SepakBola [Sentiment Classification for Football News Opinions]
Setyaningsih, Eka Rahayu
Intelligent System and Computation Vol 3 No 2 (2021): INSYST: Journal of Intelligent System and Computation
Publisher: Institut Sains dan Teknologi Terpadu Surabaya (formerly Sekolah Tinggi Teknik Surabaya)

DOI: 10.52985/insyst.v3i2.193

Abstract

This study discusses an application built specifically to categorize public opinion on football news. The opinions processed come from two sources: crawls of sports news sites and opinions added by users within the application itself. The opinions are presented separately by group: positive, negative, or neutral sentiment. Classification consists of two stages. The first stage is preprocessing, which comprises tokenization, normalization, case folding, stop word removal, common word removal, and stemming. The second stage classifies the opinions using the Baseline and Naive Bayes algorithms. The opinions used for classification are English-language opinions from fifa.com and goal.com. From the macro-averaged calculation for each class, the positive sentiment class obtained 93.06% accuracy, 81.90% precision, and 92.67% recall. The negative sentiment class obtained 87.73% accuracy, 96.29% precision, and 83.63% recall. The neutral sentiment class obtained 92.26% accuracy, 64.44% precision, and 90.37% recall. The conclusion drawn over the course of this research is that the crawling process used to obtain news and news comments is very helpful for adding content to the website, but a great many of the comments obtained are poorly suited to classification.
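
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: the stop-word list, the training sentences, and the names `preprocess`, `NaiveBayes`, and `precision_recall` are all hypothetical; normalization, common word removal, and stemming are omitted; and the classifier is a standard multinomial Naive Bayes with add-one smoothing, which may differ from the variant used in the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the study's actual lists are not given.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "of", "and", "this", "that"}


def preprocess(text):
    """Case folding, tokenization, and stop-word removal (stemming omitted)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in preprocess(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall(gold, predicted, target):
    """One-vs-rest precision and recall for a single sentiment class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical training opinions (not taken from fifa.com or goal.com).
texts = [
    "What a great goal, brilliant play",
    "Terrible defending and an awful result",
    "The match starts at eight tonight",
    "Brilliant save, great keeper",
    "Awful refereeing, terrible decision",
    "Kickoff is at eight on Sunday",
]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
clf = NaiveBayes().fit(texts, labels)
```

Calling `clf.predict(...)` on a new opinion returns one of the three labels, and `precision_recall` can then be run once per class on a held-out set to obtain per-class figures of the kind the abstract reports.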