Enabling computers to imitate human abilities has long been an active area of development. Emotion recognition has been studied through facial images as well as verbal and non-verbal speech. This study explores several deep learning methods to find the best model for detecting emotions in the EmoDB dataset. Features are extracted using the Zero Crossing Rate, chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS) energy, and the Mel spectrogram. In the pre-processing stage, data augmentation is applied through noise injection, time shifting, and changes to the audio pitch and speed. The results show that, measured by accuracy, the best-performing deep learning method was CNN-BiLSTM.
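To make the feature-extraction and augmentation steps concrete, the following is a minimal NumPy sketch of two of the named features (Zero Crossing Rate and RMS) and two of the augmentations (noise injection and time shifting). The frame sizes, noise factor, and test signal are illustrative assumptions, not the paper's actual parameters; in practice, libraries such as librosa are commonly used to compute these features along with MFCC, chroma_stft, and the Mel spectrogram.

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of consecutive sample pairs whose sign differs
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def rms(frame):
    # Root Mean Square energy of the frame
    return np.sqrt(np.mean(frame ** 2))

def add_noise(audio, noise_factor=0.005, rng=None):
    # Noise injection: add scaled white Gaussian noise
    if rng is None:
        rng = np.random.default_rng(0)
    return audio + noise_factor * rng.standard_normal(len(audio))

def time_shift(audio, shift):
    # Time shifting: circularly shift the waveform by `shift` samples
    return np.roll(audio, shift)

# Toy signal: 1 second of a 440 Hz sine at 16 kHz (illustrative only)
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(audio)   # roughly 2 * 440 / 16000 ≈ 0.055
energy = rms(audio)               # roughly 0.5 / sqrt(2) ≈ 0.354
augmented = add_noise(audio)
shifted = time_shift(audio, 1600) # shift by 0.1 s
```

In a full pipeline, such features would be computed per frame, stacked into a feature matrix, and fed to the models under comparison; the augmented copies of each utterance enlarge the training set before feature extraction.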
Copyright © 2022