The ability of machines to recognize emotions from voice is known as Speech Emotion Recognition (SER). This study developed a voice emotion classification system using a Convolutional Neural Network (CNN) and implemented it as an Android mobile application. The main problem addressed is how to recognize human emotions from voice signals accurately, efficiently, and in real time on mobile devices. Training proceeded in two stages: pre-training on the RAVDESS dataset and fine-tuning on the IndoWaveSentiment dataset. Audio data was converted into 128×128×1 Mel-spectrograms as input to the CNN. The CNN model consists of three convolution-and-pooling blocks followed by dense and softmax layers. After training, the model was converted to TensorFlow Lite format and integrated into the Android application through a client-server architecture using Flask. The system recognized neutral, happy, disappointed, and surprised emotions with 96% accuracy on external test data but only 55% on live recorded audio, for an average accuracy of 75.5%. The application also provides an SQLite-based history feature. These results indicate that the model performs very well under structured conditions but still needs improvement for real-world input.
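The shape flow of the described architecture can be sketched as follows. This is a minimal illustration, not the authors' implementation: the filter counts (32/64/128), 'same' convolution padding, and 2×2 max pooling are assumptions; only the 128×128×1 input, the three convolution-and-pooling blocks, and the four output classes come from the abstract.

```python
# Hypothetical sketch of the abstract's CNN shape flow: three
# conv+pool blocks over a 128x128x1 Mel-spectrogram, then a dense
# layer and a softmax over the four emotion classes.
# Filter counts, padding, and pool size are assumed, not given.

def conv_block_shape(h, w, filters, pool=2):
    """'Same'-padded conv keeps HxW; 2x2 max pooling halves each dim."""
    return h // pool, w // pool, filters

h, w, c = 128, 128, 1          # Mel-spectrogram input (from the abstract)
for filters in (32, 64, 128):  # three conv+pool blocks (counts assumed)
    h, w, c = conv_block_shape(h, w, filters)
    print(f"after block with {filters} filters: {h}x{w}x{c}")

flat = h * w * c               # flattened features fed to the dense layer
num_classes = 4                # neutral, happy, disappointed, surprised
print(f"flattened: {flat} features -> dense -> softmax over {num_classes} classes")
```

With these assumptions the spatial resolution halves at each block (128 → 64 → 32 → 16), so the dense layer sees a 16×16×128 feature map; an actual implementation would express the same structure with e.g. Keras `Conv2D` and `MaxPooling2D` layers before TensorFlow Lite conversion.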
Copyright © 2026