Research on speech emotion recognition (SER) is growing rapidly. However, SER still faces the cross-corpus problem: performance degrades when a single SER model is tested on data from a different domain. This study examines the impact of using a generative adversarial network (GAN) model to adapt speech data across domains and performs emotion classification on the speech features with a 1D convolutional neural network (CNN) model. The results show that the GAN-based domain adaptation approach improved the accuracy of emotion classification on speech data from two different domains, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) corpus and the EMO-DB corpus, by 10.88% to 28.77%, with the highest average improvement across three class balancing methods reaching 18.433%.
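The abstract does not spell out the exact architecture, so the following is only a minimal sketch of the general idea it describes: a 1D CNN feature extractor trained adversarially against a domain discriminator (GAN-style) so that source and target corpora map to similar features, while an emotion classifier is trained on labeled source data. The feature dimensions (40 MFCCs), eight emotion classes, layer sizes, and the source/target roles of EMO-DB and RAVDESS are all illustrative assumptions, not the paper's stated configuration.

```python
# Hypothetical sketch of GAN-style domain adaptation for cross-corpus SER.
# Assumed: (batch, 40, time) MFCC inputs, 8 emotion classes; not the paper's model.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """1D CNN over (batch, n_mfcc, time) speech features."""
    def __init__(self, n_mfcc=40, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> fixed-size embedding
        )
    def forward(self, x):
        return self.net(x).squeeze(-1)  # (batch, hidden)

class EmotionClassifier(nn.Module):
    def __init__(self, hidden=64, n_classes=8):
        super().__init__()
        self.fc = nn.Linear(hidden, n_classes)
    def forward(self, z):
        return self.fc(z)

class DomainDiscriminator(nn.Module):
    """GAN-style discriminator: source domain (1) vs. target domain (0)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, 1))
    def forward(self, z):
        return self.fc(z)

F, C, D = FeatureExtractor(), EmotionClassifier(), DomainDiscriminator()
opt_fc = torch.optim.Adam(list(F.parameters()) + list(C.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

# Dummy batches standing in for real features (source/target roles assumed).
src_x = torch.randn(16, 40, 100)   # labeled source batch (e.g., EMO-DB)
src_y = torch.randint(0, 8, (16,))
tgt_x = torch.randn(16, 40, 100)   # unlabeled target batch (e.g., RAVDESS)

# 1) Discriminator step: learn to tell the two corpora apart.
with torch.no_grad():
    zs, zt = F(src_x), F(tgt_x)
d_loss = bce(D(zs), torch.ones(16, 1)) + bce(D(zt), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Extractor/classifier step: classify source emotions while
#    fooling the discriminator on target features (adversarial adaptation).
zs, zt = F(src_x), F(tgt_x)
g_loss = ce(C(zs), src_y) + bce(D(zt), torch.ones(16, 1))
opt_fc.zero_grad(); g_loss.backward(); opt_fc.step()
```

Alternating these two steps is the standard adversarial recipe; the class balancing methods the abstract mentions would act on the source batches before this loop.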