Dimensional Speech Emotion Recognition from Acoustic and Text Features using Recurrent Neural Networks Bagus Tris Atmaja; Masato Akagi; Reda Elbarougy
International Journal of Informatics, Information System and Computer Engineering (INJIISCOM), Vol 1 No 1 (2020)
Publisher : Universitas Komputer Indonesia

Full PDF (566.864 KB) | DOI: 10.34010/injiiscom.v1i1.4023

Abstract

Emotion can be inferred from tonal and verbal information, both of which can be extracted from speech. While most researchers have studied categorical emotion recognition from a single modality, this research presents dimensional emotion recognition combining acoustic and text features. A total of 31 acoustic features are extracted from speech, while word vectors are used as text features. The initial results on single-modality emotion recognition serve as a cue for combining both features to improve the recognition result. The latter result shows that combining acoustic and text features decreases the error of dimensional emotion score prediction by about 5% relative to the acoustic system and 1% relative to the text system. This smallest error is achieved by a text system built on Long Short-Term Memory (LSTM) networks, an acoustic system built on bidirectional LSTM networks, and a concatenation of both systems through dense networks.
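The late-fusion architecture described in the abstract (a text branch, an acoustic branch, and a concatenation followed by dense layers) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the `encoder_stub` function stands in for the actual LSTM/BiLSTM encoders, and all layer widths and the embedding dimension are hypothetical choices; only the 31 acoustic features and the concatenate-then-dense fusion come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input shapes (only n_acoustic = 31 is from the abstract):
n_frames, n_acoustic = 100, 31   # acoustic features per speech frame
n_words, emb_dim = 20, 300       # word vectors for the transcript

acoustic = rng.normal(size=(n_frames, n_acoustic))
text = rng.normal(size=(n_words, emb_dim))

def encoder_stub(x, w):
    # Stand-in for a recurrent encoder: mean-pool over time, then a
    # linear map with tanh. The paper uses BiLSTM (acoustic) and
    # LSTM (text) here instead.
    return np.tanh(x.mean(axis=0) @ w)

w_ac = rng.normal(size=(n_acoustic, 64))
w_tx = rng.normal(size=(emb_dim, 64))

h_acoustic = encoder_stub(acoustic, w_ac)   # BiLSTM branch in the paper
h_text = encoder_stub(text, w_tx)           # LSTM branch in the paper

# Late fusion: concatenate both branch outputs, then a dense layer
# predicts the dimensional emotion scores (e.g. valence, arousal,
# dominance -- the three-score output here is an assumption).
fused = np.concatenate([h_acoustic, h_text])   # shape (128,)
w_dense = rng.normal(size=(128, 3))
scores = fused @ w_dense                       # shape (3,)
print(scores.shape)
```

In a trainable version, the stub encoders would be replaced by recurrent layers and the whole network optimized end-to-end against a regression loss on the dimensional emotion annotations.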