Cleft Lip and Palate (CLP) is a congenital condition that often results in atypical speech articulation, making automatic recognition of CLP speech a challenging task. This study proposes a deep learning-based classification system using Convolutional Neural Networks (CNN) and Wavelet-MFCC features to classify speech patterns produced by CLP individuals. Specifically, we investigate the use of two wavelet families, Reverse Biorthogonal (rbio1.1) and Biorthogonal (bior1.1), with three decomposition strategies: single-level (L1), two-level (L2), and combined (L1+2). Speech data were collected from 10 CLP patients, each pronouncing nine selected Indonesian words ten times, yielding 900 utterances in total. The audio signals were processed using wavelet-based decomposition followed by Mel-Frequency Cepstral Coefficient (MFCC) extraction to generate time-frequency representations of speech. The resulting features were fed into a CNN model and evaluated using 5-fold cross-validation. Experimental results show that the combined L1+2 decomposition yields the highest classification accuracy (92.73%), sensitivity (92.97%), and specificity (99.04%). Additionally, certain words such as “selam”, “kapak”, “baju”, “muka”, and “abu” consistently achieved recall scores above 0.94, while “lampu” and “lembab” proved more difficult to classify. These findings demonstrate that integrating multi-level wavelet decomposition with CNN significantly improves the recognition of pathological speech and offers promising potential for clinical diagnostic support.
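To make the Wavelet-MFCC pipeline concrete, the sketch below shows one plausible implementation in Python using the `pywt` and `librosa` libraries. The abstract does not specify implementation details, so the function name, MFCC count, effective sampling-rate handling, zero-padding scheme, and file path here are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the Wavelet-MFCC feature extraction described above.
# Assumes pywt for the discrete wavelet transform and librosa for MFCCs.
import numpy as np
import pywt
import librosa

def wavelet_mfcc(signal, sr, wavelet="rbio1.1", levels=(1, 2), n_mfcc=13):
    """Decompose a speech signal with a DWT and extract MFCCs from the
    approximation coefficients at each requested level. Concatenating the
    per-level features mimics the combined L1+2 strategy; pass levels=(1,)
    or (2,) for the single-level variants. n_mfcc=13 is an assumption."""
    features = []
    for level in levels:
        # pywt.wavedec returns [cA_n, cD_n, ..., cD_1]; coeffs[0] holds
        # the level-n approximation (the low-pass filtered signal).
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        approx = coeffs[0]
        # Each decomposition level roughly halves the effective rate.
        eff_sr = sr // (2 ** level)
        mfcc = librosa.feature.mfcc(y=approx.astype(np.float32),
                                    sr=eff_sr, n_mfcc=n_mfcc)
        features.append(mfcc)
    # Zero-pad to a common frame count, then stack level-wise features.
    width = max(f.shape[1] for f in features)
    padded = [np.pad(f, ((0, 0), (0, width - f.shape[1]))) for f in features]
    return np.concatenate(padded, axis=0)  # (len(levels) * n_mfcc, width)

# Example: one utterance (the path and 16 kHz rate are hypothetical).
y, sr = librosa.load("clp_utterance.wav", sr=16000)
X = wavelet_mfcc(y, sr)  # 2-D time-frequency input for the CNN
```

The resulting 2-D array plays the role of the time-frequency representation fed to the CNN; swapping `wavelet="bior1.1"` reproduces the second wavelet family investigated.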