Automatic Speech Recognition (ASR) for atypical speech, such as dysarthric speech, is a significant challenge because its high acoustic variability often causes standard models to fail. The challenge is compounded when the target is an edge device with limited computational resources, memory, and power. Realizing low-latency, on-device ASR therefore requires model architectures that are both accurate and lightweight. This research explores modern deep learning architectures to address these two primary challenges: accuracy on dysarthric speech and computational efficiency. The study implements and evaluates three efficient models (MobileNetV3Small, EfficientNetB0, and NASNetMobile) on the UASpeech and TORGO datasets. The methodology extracts Mel-Frequency Cepstral Coefficient (MFCC) features, which are visualized as spectrograms and subsequently classified using a transfer learning approach. Experimental results show that MobileNetV3Small achieved the highest performance on the UASpeech dataset, attaining an accuracy of 97.8%. This study concludes that lightweight CNN architectures such as MobileNetV3Small are highly effective for dysarthric speech classification and demonstrate the feasibility of developing robust and practical ASR systems for resource-constrained environments.
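The feature-extraction step described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not give the frame length, hop size, number of mel filters, number of coefficients, or how the MFCC matrix is converted to an image, so every parameter below is an assumption, as are the helper names (`mfcc`, `to_image`, and so on). The sketch uses only the Python standard library and a synthetic tone in place of a recording from UASpeech or TORGO; the transfer learning stage is not shown.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum of one windowed frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak at center
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all defaults are assumptions)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        # Log mel energies; the small constant avoids log(0).
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]
        # Unnormalized DCT-II decorrelates the log energies into cepstral coefficients.
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_mel))
                  for c in range(n_coeffs)]
        frames.append(coeffs)
    return frames

def to_image(features):
    """Min-max scale to 0..255 and repeat across 3 channels for an RGB-input CNN."""
    flat = [v for row in features for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[[int((v - lo) * scale)] * 3 for v in row] for row in features]

# Synthetic 440 Hz tone standing in for a speech recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
features = mfcc(tone, sample_rate)
image = to_image(features)
print(len(image), len(image[0]), len(image[0][0]))
```

The resulting frames-by-coefficients-by-channels array would still need resizing to the input resolution expected by a pretrained network before fine-tuning. In practice an audio library would replace the naive DFT used here, which is kept only so the sketch has no dependencies.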
Copyrights © 2025