Lecturers play a crucial role in higher education, and their teaching behavior directly affects learning and teaching quality. Lecturer evaluation by students (LES) is a common method for assessing lecturer performance, but it relies heavily on subjective perceptions. As a more objective alternative, speech emotion recognition (SER) applies speech technology to analyze the emotions lecturers display in their classroom speech. This study proposes a deep learning-based SER approach, combining a convolutional neural network (CNN) with bidirectional long short-term memory (Bi-LSTM), to evaluate teaching quality from these displayed emotions. Removing silence from the audio signal is a crucial preprocessing step for the subsequent feature analysis, which uses energy, zero-crossing rate (ZCR), and mel-frequency cepstral coefficients (MFCC). By discarding inactive segments, this step emphasizes the significant, voiced portions of the signal and improves the accuracy of voice and emotion detection. Results show that a 1D CNN model with Bi-LSTM, using 13 MFCC coefficients together with energy and ZCR, detects emotions effectively, achieving a validation accuracy above 0.851 with a gap of only 0.002 between training and validation accuracy. This small gap indicates good generalization and a low risk of overfitting, making teaching evaluations more objective and more valuable for improving teaching practice.
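To illustrate the silence-removal and feature-extraction pipeline the abstract describes, the following is a minimal sketch assuming the librosa library; the sample rate, the 20 dB split threshold, and the function name extract_features are illustrative assumptions, not the authors' exact settings.

```python
# Hedged sketch: silence removal followed by MFCC (13 coefficients),
# energy (RMS), and ZCR extraction. Parameters are assumptions.
import numpy as np
import librosa

def extract_features(path, sr=16000, top_db=20):
    y, sr = librosa.load(path, sr=sr)

    # Drop inactive (silent) segments so only voiced portions are analyzed.
    intervals = librosa.effects.split(y, top_db=top_db)
    voiced = np.concatenate([y[start:end] for start, end in intervals])

    # 13 mel-frequency cepstral coefficients per frame.
    mfcc = librosa.feature.mfcc(y=voiced, sr=sr, n_mfcc=13)

    # Short-time energy (RMS) and zero-crossing rate per frame.
    energy = librosa.feature.rms(y=voiced)
    zcr = librosa.feature.zero_crossing_rate(voiced)

    # Stack into a (frames, 15) matrix: 13 MFCC + energy + ZCR.
    return np.vstack([mfcc, energy, zcr]).T
```

With the default frame and hop lengths, the three feature streams are frame-aligned, so stacking them yields one 15-dimensional vector per frame.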
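Likewise, a 1D CNN + Bi-LSTM classifier of the kind described above could be sketched as follows using tf.keras; the layer sizes, sequence length (NUM_FRAMES), and number of emotion classes are assumptions for illustration, since the abstract does not specify the architecture's hyperparameters.

```python
# Hedged sketch: 1D CNN layers learn local patterns in the frame-level
# features; a Bi-LSTM reads the pooled sequence in both directions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 300, 15, 6  # assumed shapes

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
    layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    # Bidirectional LSTM captures temporal context from both directions.
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```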