Music is a powerful art form for conveying and evoking emotions; however, the sheer volume of digital music data makes manual emotion categorization impractical. This study implements a Convolutional Neural Network (CNN) to classify emotions in instrumental songs based on audio features. The dataset used is the Database for Emotional Analysis of Music (DEAM), containing 1,802 songs annotated with valence and arousal, split into training, validation, and test sets at a 70:15:15 ratio. The feature extraction methods applied are Mel-Frequency Cepstral Coefficients (MFCC) with 13, 24, and 30 coefficients, and mel-spectrograms with 128, 256, and 512 mel bins. The audio is pre-emphasized and framed before being fed into a CNN architecture with four convolutional blocks. Evaluation covered a four-quadrant classification scenario and a simplified two-quadrant scenario. The results show that in the four-quadrant classification, the best model, using MFCC with 30 coefficients, achieved 66% accuracy, but performance was hindered by extreme imbalance in the minority classes. Conversely, simplifying the emotion space to two quadrants (along the valence or arousal dimension alone) improved accuracy to 77%. This study concludes that while increasing feature resolution has only a minor impact, simplifying the emotion dimensions is more effective at addressing complexity and data imbalance in music emotion classification.
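
As a rough illustration of the pipeline summarized above, the sketch below shows pre-emphasis, MFCC extraction, and a four-block CNN classifier. It is a minimal sketch, not the authors' implementation: the library choices (librosa, PyTorch) and all hyperparameters such as sample rate, frame and hop sizes, and channel widths are assumptions made for illustration.

    # Minimal sketch: pre-emphasis -> MFCC extraction -> four-block CNN over emotion quadrants.
    # Library choices and hyperparameters are illustrative assumptions, not the paper's exact setup.
    import librosa
    import numpy as np
    import torch
    import torch.nn as nn

    def extract_features(path, n_mfcc=30, sr=22050):
        """Load audio, apply pre-emphasis, and extract MFCCs (framing happens in the STFT)."""
        y, _ = librosa.load(path, sr=sr)
        y = librosa.effects.preemphasis(y, coef=0.97)            # pre-emphasis stage
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,    # 13, 24, or 30 coefficients
                                    n_fft=2048, hop_length=512)  # framing via STFT windows
        return mfcc[np.newaxis, ...]                              # shape: (1, n_mfcc, frames)

    def conv_block(c_in, c_out):
        """One convolutional block: Conv2d -> BatchNorm -> ReLU -> MaxPool."""
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    class EmotionCNN(nn.Module):
        """Four convolutional blocks followed by a linear head over the emotion classes."""
        def __init__(self, n_classes=4):                          # 4 quadrants, or 2 when simplified
            super().__init__()
            self.features = nn.Sequential(
                conv_block(1, 16), conv_block(16, 32),
                conv_block(32, 64), conv_block(64, 128),
            )
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, n_classes)
            )

        def forward(self, x):
            return self.head(self.features(x))

    # Example usage on a single clip (the file path is a placeholder):
    # x = torch.tensor(extract_features("deam_clip.wav"), dtype=torch.float32).unsqueeze(0)
    # logits = EmotionCNN(n_classes=4)(x)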