Garuda - Garba Rujukan Digital

Bulletin of Electrical Engineering and Informatics

Vol 13, No 3: June 2024

Al Mukarram, Khasyi
Mukhlas, M. Anang
Zahra, Amalia

Publish Date
01 Jun 2024

This study evaluates the effectiveness of data augmentation for speech emotion recognition (SER) with a 1D convolutional neural network (CNN) and a transformer model on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The results show that data augmentation improves emotion classification accuracy. Noise addition, pitch shifting, time stretching, time shifting, and speed changes are applied to increase data variation and to mitigate class imbalance. With data augmentation, the 1D CNN model achieved 94.5% accuracy, and the transformer model performed better still at 97.5%. The study is expected to inform the development of accurate emotion recognition methods by showing that data augmentation combined with these models improves classification accuracy on the RAVDESS dataset. Further research can explore larger and more diverse datasets and alternative model approaches.
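
The page gives no implementation details for the augmentation techniques named in the abstract, so the following is only a minimal NumPy sketch of three of them (noising, shifting, and speeding) on a raw waveform. The function names, default parameters, and the linear-interpolation resampler are assumptions made for illustration, not the authors' method. Pitch shifting and time stretching that change one property while preserving the other usually rely on a phase vocoder (for example `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`) and are left out here.

```python
import numpy as np


def add_noise(signal, noise_factor=0.005, rng=None):
    """Noising: add Gaussian noise scaled by the signal's peak amplitude.

    noise_factor is a hypothetical default, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(signal.shape[0])
    return signal + noise_factor * np.max(np.abs(signal)) * noise


def shift(signal, max_shift, rng=None):
    """Shifting: circularly roll the waveform by a random number of samples."""
    rng = np.random.default_rng() if rng is None else rng
    offset = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, offset)


def change_speed(signal, rate):
    """Speeding: resample by linear interpolation.

    rate > 1 shortens the clip (faster), rate < 1 lengthens it (slower).
    This naive resampling changes pitch together with duration.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


def augment(signal, sample_rate, rng=None):
    """Return several augmented copies of one clip (assumed settings)."""
    rng = np.random.default_rng() if rng is None else rng
    return [
        add_noise(signal, rng=rng),
        shift(signal, max_shift=sample_rate // 10, rng=rng),
        change_speed(signal, 1.25),
        change_speed(signal, 0.8),
    ]
```

Applying `augment` more often to clips from under-represented emotion classes is one way such copies could help balance a training set. Whether the study did this is not stated on the page.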

Journal Info

Bulletin of Electrical Engineering and Informatics

Abbrev

EEI

Publisher

Universitas Ahmad Dahlan

Subject

Electrical & Electronics Engineering

Description

Bulletin of Electrical Engineering and Informatics (Buletin Teknik Elektro dan Informatika) ISSN: 2089-3191, e-ISSN: 2302-9285 is open to submission from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication and computer engineering from the ...

Enhancing speech emotion recognition with deep learning using multi-feature stacking and data augmentation
