Music genre classification based on spectrogram images is an important task in music information retrieval. This study compares the performance of a custom Convolutional Neural Network (CNN) architecture and VGG-16 in classifying five music genres from the GTZAN dataset: blues, classical, hiphop, metal, and reggae. A total of 500 audio files were converted into spectrogram images for training and testing. The custom CNN was designed and trained from scratch, while VGG-16 used pretrained weights, with fine-tuning applied only to the fully connected layers. Experimental results show that the custom CNN achieved 75% test accuracy and a macro F1-score of 0.74, outperforming VGG-16, which achieved 68.75% test accuracy and a macro F1-score of 0.67. These findings suggest that, for spectrogram-based music genre classification, a tailored architecture can outperform a pretrained model whose fine-tuning is limited to the fully connected layers. They also point to directions for future research, including full fine-tuning of pretrained models, hybrid architectures, and the integration of temporal features.
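To make the reported metrics concrete, the following is a minimal sketch of how test accuracy and a macro F1-score can be computed from true and predicted genre labels. It is not the study's evaluation code: the function names `accuracy` and `macro_f1` and the constant `GENRES` are illustrative assumptions, and only the five genre names come from the text. The macro F1-score here is the unweighted mean of the per-genre F1-scores, so every genre counts equally regardless of how many test samples it has.

```python
from collections import Counter

# The five genres named in the abstract; the constant name is an assumption.
GENRES = ["blues", "classical", "hiphop", "metal", "reggae"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=GENRES):
    """Unweighted mean of per-class F1-scores over `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class
        # never appears in either the true or the predicted labels.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro averaging weights each genre equally, a model that performs poorly on one genre is penalized even when its overall accuracy is high, which is one reason the two metrics are reported together.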
Copyright © 2025