Recognizing infant cries is essential for healthcare, yet conventional representations such as spectrograms and Mel-frequency cepstral coefficients (MFCCs) often fail to capture temporal dynamics, which limits classification performance. This study introduces Polarized Amplitude Time Spiral Encoding (PATSE), a novel transformation that encodes amplitude and time into a spiral-based polar representation, providing richer visual features for deep learning. To address data scarcity and class imbalance, four audio augmentation techniques (time stretching, time shifting, pitch scaling, and polarity inversion) were applied, expanding the dataset from 457 to 6,855 samples. A Convolutional Neural Network (CNN) trained on PATSE images improved in overall accuracy from 80% before augmentation to 93% after augmentation. The model performed well on the dominant Hungry class (F1-score = 0.96) and also improved recognition of minority classes such as belly pain, burping, discomfort, and tired. These results support the effectiveness of PATSE in improving generalization and reducing bias toward the dominant class, offering a distinctive advantage over linear representations. The proposed framework provides a foundation for intelligent infant cry monitoring and early detection systems in healthcare.
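The abstract does not give the PATSE formula, so the following is only a minimal sketch of one plausible reading of "encoding amplitude and time into a spiral-based polar representation". It assumes an Archimedean spiral in which time sets the angle and baseline radius and the normalized amplitude displaces each point radially. The function names `spiral_encode` and `rasterize` and the parameters `turns`, `base_radius`, `amp_scale`, and `size` are hypothetical and are not taken from the study.

```python
import numpy as np


def spiral_encode(signal, turns=8.0, base_radius=1.0, amp_scale=0.4):
    """Map a 1-D signal onto an Archimedean spiral (assumed mapping, not the paper's).

    Time sets the angle and the baseline radius; the peak-normalized
    amplitude displaces each point radially from that baseline.
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim != 1 or x.size < 2:
        raise ValueError("expected a 1-D signal with at least two samples")

    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak  # normalize amplitude to [-1, 1]

    t = np.linspace(0.0, 1.0, x.size)    # normalized time
    theta = 2.0 * np.pi * turns * t      # angle grows linearly with time
    spacing = base_radius                # radial gap between successive turns
    baseline = base_radius + spacing * turns * t
    # amp_scale < 0.5 keeps neighbouring turns from overlapping
    r = baseline + amp_scale * spacing * x
    return r * np.cos(theta), r * np.sin(theta)


def rasterize(xs, ys, size=128):
    """Turn spiral coordinates into a binary size-by-size image for a CNN."""
    extent = max(np.max(np.abs(xs)), np.max(np.abs(ys)))
    counts, _, _ = np.histogram2d(
        ys, xs, bins=size, range=[[-extent, extent], [-extent, extent]]
    )
    return (counts > 0).astype(np.float32)


if __name__ == "__main__":
    sr = 8000                                   # hypothetical sample rate
    time_axis = np.arange(sr) / sr
    tone = np.sin(2.0 * np.pi * 440.0 * time_axis)  # stand-in for a cry recording
    xs, ys = spiral_encode(tone)
    image = rasterize(xs, ys)
    print(image.shape)
```

Under this assumed mapping, inverting the polarity of a clip mirrors every radial displacement about the baseline spiral, so the inverted clip yields a different image. That is one way to see why polarity inversion could act as a useful augmentation for an image-based encoding of this kind.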
Copyright © 2025