Nuk Ghurroh Setyoningrum
Universitas Amikom Yogyakarta

Polarized Amplitude Time Spiral Encoding for Infant Cry Audio Augmentation and CNN Classification
Nuk Ghurroh Setyoningrum; Ema Utami; Kusrini; Ferry Wahyu Wibowo
Journal of Innovation Information Technology and Application (JINITA) Vol 7 No 2 (2025): JINITA, December 2025
Publisher : Politeknik Negeri Cilacap

DOI: 10.35970/jinita.v7i2.2960

Abstract

Recognizing infant cries is essential for healthcare, yet conventional representations such as spectrograms and MFCCs often fail to capture temporal dynamics, which limits classification performance. This study introduces Polarized Amplitude Time Spiral Encoding (PATSE), a novel transformation that encodes amplitude and time into spiral-based polar representations, enabling richer visual features for deep learning. To address data scarcity and class imbalance, four audio augmentation techniques (time stretching, time shifting, pitch scaling, and polarity inversion) were applied, expanding the dataset from 457 to 6,855 samples. A Convolutional Neural Network (CNN) trained on PATSE images improved in overall accuracy from 80% before augmentation to 93% after augmentation. The model attained high performance on the dominant Hungry class (F1-score = 0.96) while also improving recognition of minority classes such as belly pain, burping, discomfort, and tired. These results confirm the effectiveness of PATSE in improving generalization and reducing bias, offering a distinctive advantage over linear representations. The proposed framework provides a foundation for intelligent infant cry monitoring and early detection systems in healthcare.
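
The abstract does not give the PATSE formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that time sets the angle and the base radius of an Archimedean spiral, and that normalized amplitude displaces the radius. The function names and the parameters `turns`, `amp_scale`, and `size` are hypothetical. Two of the four augmentations (polarity inversion and time shifting) are shown with plain NumPy; time stretching and pitch scaling usually need a dedicated audio library and are left out here.

```python
import numpy as np


def polarity_inversion(signal):
    """Flip the sign of every sample (assumed form of polarity inversion)."""
    return -signal


def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)


def spiral_encode(signal, turns=8, amp_scale=0.05, size=64):
    """Map a 1-D waveform onto a spiral and rasterize it to a 2-D image.

    Hypothetical encoding, not the published PATSE definition:
    angle and base radius grow with time, amplitude perturbs the radius.
    """
    n = len(signal)
    t = np.arange(n) / n                      # normalized time in [0, 1)
    peak = np.max(np.abs(signal))
    amp = signal / peak if peak > 0 else signal
    theta = 2.0 * np.pi * turns * t           # angle advances with time
    radius = t + amp_scale * amp              # amplitude displaces the radius
    x = radius * np.cos(theta)
    y = radius * np.sin(theta)
    limit = 1.0 + amp_scale                   # bound that contains every point
    image, _, _ = np.histogram2d(
        x, y, bins=size, range=[[-limit, limit], [-limit, limit]]
    )
    return image


# Synthetic decaying tone standing in for a cry recording (assumption).
sample_rate = 8000
time_axis = np.arange(sample_rate) / sample_rate
cry = np.sin(2.0 * np.pi * 440.0 * time_axis) * np.exp(-3.0 * time_axis)

# One original plus two augmented variants, each encoded as an image.
variants = [cry, polarity_inversion(cry), time_shift(cry, 400)]
images = [spiral_encode(v) for v in variants]
```

Each entry in `images` is a `size` by `size` count grid that could be fed to a CNN after normalization. In this sketch the inverted and shifted signals trace visibly different spirals, which is one plausible reason such augmentations add variety in a polar representation.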