Comparative Analysis of Stochastic Gradient Descent Optimization and Adaptive Moment Estimation in Emotion Classification from Audio Using Convolutional Neural Network
Aldelia Jocelyn Tutuhatunewa
Jurnal Aplikasi Sains Data Vol. 1 No. 1 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i1.5

Abstract

Emotion is a fundamental aspect of human life that profoundly shapes behavior, social interaction, and decision-making. Effective communication and mutual understanding between individuals depend heavily on the ability to recognize and express emotions accurately. Among the various channels of emotional expression, the voice stands out as a powerful and direct medium for conveying human emotional states, making audio-based emotion recognition a critical and rapidly evolving field of study. With rapid advances in information technology and artificial intelligence, research on recognizing emotions from sound signals has gained significant momentum. Machine learning algorithms, particularly deep learning models such as neural networks, have demonstrated remarkable capabilities in identifying and classifying emotions expressed through text, images, video, and especially audio. Within the family of neural networks, Convolutional Neural Networks (CNNs) have proven especially effective for audio emotion classification because of their strength in extracting hierarchical and spatial features directly from raw input data. This study investigates the comparative effectiveness of two popular optimization algorithms, Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), in training CNN models for emotion classification from audio recordings. In experiments on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, CNNs trained with the SGD optimizer achieve an overall accuracy of 53%, surpassing the 48% accuracy achieved with Adam. These results underscore the potential advantages of SGD for training deep learning models in audio-based emotion recognition. Consequently, researchers and practitioners are encouraged to consider SGD optimization to improve the performance and robustness of audio-based emotion classification systems.
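To make the comparison concrete, the two optimizers discussed in the abstract can be sketched as update rules. The code below is a minimal NumPy illustration of vanilla SGD and Adam on a toy quadratic loss, not the paper's CNN or its training setup; the learning rates, step counts, and the toy objective are illustrative assumptions.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Vanilla SGD: a fixed-size step against the gradient.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: exponential moving averages of the gradient (m) and its
    # square (v), bias-corrected, give a per-parameter adaptive step.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction for m
    v_hat = v / (1 - b2 ** t)          # bias correction for v
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy quadratic loss L(w) = ||w||^2 / 2, whose gradient is w itself,
# with the minimum at w = 0.
w_sgd = np.array([5.0, -3.0])
w_adam = w_sgd.copy()
m = np.zeros(2)
v = np.zeros(2)

for t in range(1, 501):
    w_sgd = sgd_step(w_sgd, w_sgd)
    w_adam, m, v = adam_step(w_adam, w_adam, m, v, t)

print("SGD  distance to optimum:", np.linalg.norm(w_sgd))
print("Adam distance to optimum:", np.linalg.norm(w_adam))
```

The contrast the paper examines is visible even here: SGD applies one global learning rate, while Adam rescales each parameter's step by running gradient statistics. Which behavior generalizes better is task-dependent, which is why the paper's empirical comparison on RAVDESS is informative.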