Classifying traditional musical instrument audio remains challenging because of limited labeled data, strong acoustic variability, and spectral similarity across instruments. This paper proposes an attention-based Long Short-Term Memory (LSTM) model for traditional instrument sound classification that uses Mel-Frequency Cepstral Coefficients (MFCCs) as the feature representation. Three LSTM variants (Bidirectional LSTM, Residual LSTM, and Attention-based LSTM) are investigated to identify the most effective temporal architecture for this task. The attention mechanism is integrated so that the model can prioritize discriminative temporal segments, such as distinctive attack phases and harmonic decay, which are often obscured in traditional instruments. The dataset comprises 1,000 audio samples from 10 traditional instrument classes. All samples are normalized to a 3-second duration and augmented via pitch shifting, time stretching, and additive noise to improve generalization. Under 5-fold cross-validation, the Attention-based LSTM consistently achieves the highest performance, with an average accuracy of 96.73%. This advantage is attributed to the mechanism's ability to suppress irrelevant noise frames while focusing on key spectral-temporal features. In robustness experiments, accuracy remains above 90% under noisy conditions, suggesting that coupling MFCC features with attention-enhanced modeling provides a robust solution for cultural heritage preservation through digital audio recognition.
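The attention step described above, in which per-frame outputs are weighted so that uninformative frames contribute little, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the scoring function, so the dot-product scoring, the `query` vector, and the function names `softmax` and `attention_pool` are assumptions introduced purely for illustration, and plain Python lists stand in for LSTM hidden states.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Collapse a sequence of per-frame vectors into one context vector.

    hidden_states: list of T vectors (stand-ins for LSTM outputs), each of length D.
    query: hypothetical learned scoring vector of length D (an assumption).
    """
    # One relevance score per frame: dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum over frames; low-weight frames contribute little to the result.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights

# Toy example: three frames, the second aligns best with the query.
frames = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
context, weights = attention_pool(frames, query=[1.0, 1.0])
```

In this toy input the second frame receives the largest weight, so the context vector is dominated by it. In a trained model the scoring parameters would be learned jointly with the LSTM rather than fixed by hand.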
Copyrights © 2026