Infant cry classification is an important task for helping parents and healthcare professionals understand infants' needs, yet limited and imbalanced datasets often reduce model accuracy and generalization. This study applies diverse audio data augmentation strategies, including time stretching, time shifting, pitch scaling, and polarity inversion, combined with spectrogram representations to enhance Convolutional Neural Network (CNN) performance in classifying infant cries. The Donate-a-Cry Corpus dataset was expanded from 457 to 6,855 samples through augmentation, improving class balance and acoustic variability. Experimental results show that CNN accuracy increased from 85% before augmentation to 99.85% after augmentation, with precision, recall, and F1-score reaching near-perfect values across all categories. The confusion matrix further confirms robust classification with minimal misclassifications. These findings demonstrate that data augmentation is crucial for overcoming dataset limitations, enriching acoustic feature diversity, and reducing model bias, and they offer practical implications for developing accurate, reliable, and real-world-applicable infant cry detection systems.
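The waveform-level augmentation families named above can be illustrated with a minimal NumPy sketch. This is not the study's implementation: the function names, sample rate, and toy signal are assumptions for demonstration, and a production pipeline would typically use a phase-vocoder library (e.g., librosa's `time_stretch`/`pitch_shift`) to change duration and pitch independently before computing spectrograms.

```python
import numpy as np

def time_shift(y, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(y, shift)

def polarity_inversion(y):
    """Flip the sign of every sample; the magnitude spectrogram is
    unchanged, so this mainly adds waveform-level variety."""
    return -y

def resample_speed(y, rate):
    """Change playback speed by linear-interpolation resampling: rate > 1
    shortens the clip and raises pitch together. (True time stretching or
    pitch scaling, as in the paper, would use a phase vocoder to decouple
    duration from pitch.)"""
    n_out = max(1, int(round(len(y) / rate)))
    new_idx = np.linspace(0, len(y) - 1, n_out)
    return np.interp(new_idx, np.arange(len(y)), y)

# Toy one-second waveform standing in for a cry recording
# (sample rate and frequency here are illustrative assumptions).
sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# One augmented copy per technique; repeating with varied parameters is
# how a 457-sample corpus can be expanded many-fold.
augmented = [
    time_shift(y, sr // 10),   # shift by 0.1 s
    polarity_inversion(y),     # inverted-polarity copy
    resample_speed(y, 1.1),    # ~10% faster (and higher-pitched)
]
```

Each augmented waveform would then be converted to a spectrogram before being fed to the CNN, so the label is preserved while the acoustic realization varies.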
Copyright © 2025