This study presents an implementation of transfer learning with the ResNet-18 architecture for classifying 10 musical instrument categories based on visual representations of audio signals. Audio waveforms are transformed into image-like inputs suitable for CNN processing, with data augmentation and ImageNet-standard normalization applied. ResNet-18 is chosen for its efficient feature extraction, enabled by residual blocks that mitigate the vanishing gradient problem. The model was trained for 10 epochs using the AdamW optimizer and cross-entropy loss. Experimental results show that the model achieved a maximum validation accuracy of 77.35%, with a steady downward trend in training loss, indicating effective feature learning. However, several misclassification cases were observed, particularly among instruments with similar spectral characteristics, such as drum–violin and tabla–sitar. These findings demonstrate that while ResNet-18 performs reliably for musical instrument classification, further improvements remain possible through deeper architectures such as ResNet-50, more comprehensive hyperparameter optimization, and richer audio representations such as Mel spectrograms. This research provides a foundation for developing automated music analysis systems powered by deep learning.
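The following is a minimal PyTorch/torchvision sketch of the pipeline described above: an ImageNet-pretrained ResNet-18 with its final layer replaced by a 10-class head, ImageNet-standard normalization with augmentation, the AdamW optimizer, cross-entropy loss, and 10 training epochs. The dataset path `data/train`, the specific augmentations, batch size, and learning rate are illustrative assumptions; the paper does not specify them.

```python
# Minimal sketch of the described setup, assuming PyTorch/torchvision.
# Dataset path, augmentations, batch size, and learning rate are
# illustrative assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 10  # musical instrument categories

# Data augmentation plus ImageNet-standard normalization, as described.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),  # assumed augmentation choice
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder of audio-derived images, one subfolder per class.
train_ds = datasets.ImageFolder("data/train", transform=train_tf)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

# Transfer learning: pretrained ResNet-18 with a new 10-way classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr assumed
criterion = nn.CrossEntropyLoss()

# Train for 10 epochs, as in the paper.
for epoch in range(10):
    model.train()
    running_loss = 0.0
    for images, labels in train_dl:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    print(f"epoch {epoch + 1}: loss = {running_loss / len(train_ds):.4f}")
```

In practice, a validation loop over a held-out split would be added after each epoch to track the validation accuracy reported in the study.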