Muhammad Hilmy Herdiansyah
Department of Computer Engineering, Universitas Negeri Semarang

Published: 1 document

Wavelet-Based MFCC and CNN Framework for Automatic Detection of Cleft Speech Disorders
Muhammad Hilmy Herdiansyah; Syahroni Hidayat; Nur Iksan
Jurnal Teknologi Informasi dan Multimedia Vol. 7 No. 3 (2025): August
Publisher: Sekawan Institut

DOI: 10.35746/jtim.v7i3.780

Abstract

Cleft Lip and Palate (CLP) is a congenital condition that often results in atypical speech articulation, making automatic recognition of CLP speech a challenging task. This study proposes a deep learning-based classification system using Convolutional Neural Networks (CNN) and Wavelet-MFCC features to distinguish speech patterns produced by CLP individuals. Specifically, we investigate two wavelet families, Reverse Biorthogonal (rbio1.1) and Biorthogonal (bior1.1), with three decomposition strategies: single-level (L1), two-level (L2), and a combined level (L1+2). Speech data were collected from 10 CLP patients, each pronouncing nine selected Indonesian words ten times, yielding 900 utterances. The audio signals were processed using wavelet-based decomposition followed by Mel-Frequency Cepstral Coefficient (MFCC) extraction to generate time-frequency representations of speech. The resulting features were fed into a CNN model and evaluated using 5-fold cross-validation. Experimental results show that the combined L1+2 decomposition yields the highest classification accuracy (92.73%), sensitivity (92.97%), and specificity (99.04%). Additionally, words such as “selam”, “kapak”, “baju”, “muka”, and “abu” consistently achieved recall scores above 0.94, while “lampu” and “lembab” proved more difficult to classify. The findings demonstrate that integrating multi-level wavelet decomposition with CNN significantly improves the recognition of pathological speech and offers promising potential for clinical diagnostic support.
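
The multi-level decomposition described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it implements a single-level Haar DWT in NumPy (the bior1.1 filter pair coincides with Haar), and the strategy names (`"L1"`, `"L2"`, `"L1+2"`) are my reading of the abstract, not an API from the paper. In the full pipeline, each resulting band would then go through MFCC extraction (e.g., with a library such as librosa) before being passed to the CNN.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT (the bior1.1 pair reduces to Haar),
    returning approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]              # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def decompose(signal, strategy="L1+2"):
    """Return the wavelet-domain band(s) for the three strategies
    named in the abstract (strategy labels are assumptions)."""
    a1, _ = haar_dwt(signal)              # level-1 approximation
    if strategy == "L1":
        return [a1]
    a2, _ = haar_dwt(a1)                  # level-2 approximation
    if strategy == "L2":
        return [a2]
    return [a1, a2]                       # combined L1+2

# Toy usage: a 1-second, 16 kHz sine stands in for one utterance.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
bands = decompose(x, "L1+2")
print([b.shape[0] for b in bands])        # [8000, 4000]
```

Each halving of the sample count corresponds to one decomposition level; the L1+2 strategy keeps both resolutions, which is consistent with the reported accuracy gain from combining them.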