Nur Shabrina, Nariswari
Unknown Affiliation

Published: 1 Document
Articles

Found 1 Document

A BiLSTM-Based Approach For Speech Emotion Recognition In Conversational Indonesian Audio Using SMOTE
Nur Shabrina, Nariswari; Kasyidi, Fatan; Ilyas, Ridwan
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, October 2025
Publisher: Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.5183

Abstract

Speech Emotion Recognition (SER) identifies human emotions by analyzing voice signals, focusing on pitch, intonation, and tempo. This study adopts a sampling rate of 48,000 Hz, which, per the Nyquist-Shannon sampling theorem, is sufficient for accurate reconstruction of the audible signal. Audio features are extracted with Mel-Frequency Cepstral Coefficients (MFCC) to capture frequency and rhythm changes over time. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) generates synthetic samples for the minority classes, allowing the model to train on a more balanced dataset. A One-vs-All (OvA) scheme is used for emotion classification, constructing a separate binary model for each emotion to improve detection. Each model is trained as a Bidirectional Long Short-Term Memory (BiLSTM) network, which captures contextual information from both directions of the sequence and thus better represents complex speech patterns. Training uses the Nadam optimizer (Nesterov-accelerated Adaptive Moment Estimation) to accelerate convergence and stabilize weight updates, and Bagging (Bootstrap Aggregating) is applied to reduce overfitting and improve prediction accuracy. The results show that this combination of techniques achieves 78% accuracy in classifying spoken emotions, contributing to improved emotion detection systems, especially for under-resourced languages such as conversational Indonesian.
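
The paper itself is not reproduced here, so the following is only a minimal Python sketch of the pipeline the abstract describes (48 kHz audio loading, MFCC features, SMOTE balancing, a One-vs-All BiLSTM trained with Nadam, and a small bagged ensemble). The library choices (librosa, imbalanced-learn, TensorFlow/Keras) and every hyperparameter (N_MFCC, MAX_FRAMES, layer sizes, epochs, ensemble size) are illustrative assumptions, not the authors' reported configuration.

import numpy as np
import librosa
from imblearn.over_sampling import SMOTE
from tensorflow.keras import layers, models, optimizers

SAMPLE_RATE = 48_000   # sampling rate stated in the abstract (Nyquist-Shannon argument)
N_MFCC = 40            # assumed number of MFCC coefficients
MAX_FRAMES = 200       # assumed fixed sequence length (pad/truncate per clip)

def extract_mfcc(path: str) -> np.ndarray:
    """Load one clip at 48 kHz and return an (MAX_FRAMES, N_MFCC) MFCC matrix."""
    signal, sr = librosa.load(path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, n_mfcc)
    if mfcc.shape[0] < MAX_FRAMES:                                 # pad short clips with zeros
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def balance_with_smote(X: np.ndarray, y: np.ndarray):
    """SMOTE operates on 2-D data, so flatten the sequences, resample, reshape back."""
    n, t, f = X.shape
    X_flat, y_bal = SMOTE(random_state=42).fit_resample(X.reshape(n, t * f), y)
    return X_flat.reshape(-1, t, f), y_bal

def build_binary_bilstm() -> models.Model:
    """One binary BiLSTM per emotion (One-vs-All), optimized with Nadam."""
    model = models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),   # target emotion vs. all others
    ])
    model.compile(optimizer=optimizers.Nadam(learning_rate=1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

def train_ova_bagging(X, y, emotions, n_estimators=3, epochs=30):
    """Train a small bagged ensemble of binary BiLSTMs for each emotion."""
    ensembles = {}
    for emotion in emotions:
        y_bin = (y == emotion).astype(int)           # One-vs-All relabeling
        X_bal, y_bal = balance_with_smote(X, y_bin)  # oversample the minority side
        members = []
        for _ in range(n_estimators):                # bootstrap resampling (bagging)
            idx = np.random.choice(len(X_bal), len(X_bal), replace=True)
            m = build_binary_bilstm()
            m.fit(X_bal[idx], y_bal[idx], epochs=epochs, batch_size=32, verbose=0)
            members.append(m)
        ensembles[emotion] = members
    return ensembles

At inference time, a common way to combine these pieces is for each emotion's bagged members to average their sigmoid scores on a clip, with the emotion receiving the highest averaged score taken as the prediction; this combination rule is an assumption, as the abstract does not specify it.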