Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that impairs motor and speech function. Conventional diagnostic methods, both invasive and non-invasive, are often time-consuming and offer limited sensitivity, delaying treatment while the disease progresses. This study proposes a multimodal deep learning framework that integrates invasive clinical records with non-invasive morphological features of patient speech, represented as Mel-spectrograms. Unlike previous studies that relied solely on speech or clinical features, this study introduces an integrated multimodal diagnostic framework that combines both data sources to achieve reliable diagnostic accuracy. The study comprised two experimental scenarios. In the first scenario, a Convolutional Neural Network (CNN) was trained on the audio data and systematically optimized by varying network depth, feature fusion techniques, and dropout probabilities to improve generalization and stability. Among all tested model variations, the best CNN achieved 80.33% classification accuracy using audio data alone. In the second scenario, when this best-performing model was retrained with clinical data incorporated, diagnostic performance improved to 100% accuracy. These findings highlight the importance of combining data modalities from different domains, both invasive and non-invasive, to achieve optimal model performance for early ALS detection.
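To make the two-branch fusion idea concrete, the following is a minimal sketch of a multimodal network that encodes a Mel-spectrogram with a small CNN and concatenates the resulting embedding with clinical features before classification. It assumes PyTorch; the layer sizes, the number of clinical variables, and fusion by concatenation are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a CNN + clinical-feature fusion model (assumed PyTorch;
# layer sizes and fusion-by-concatenation are illustrative, not the paper's exact design).
import torch
import torch.nn as nn


class MultimodalALSNet(nn.Module):
    def __init__(self, n_clinical_features: int = 10, n_classes: int = 2):
        super().__init__()
        # Audio branch: encodes a 1-channel Mel-spectrogram into a feature vector.
        self.audio_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fusion head: concatenates the audio embedding with clinical features,
        # then applies dropout (one of the hyperparameters tuned in scenario 1).
        self.classifier = nn.Sequential(
            nn.Linear(32 + n_clinical_features, 64),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(64, n_classes),
        )

    def forward(self, mel_spec: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        audio_feat = self.audio_branch(mel_spec)          # (batch, 32)
        fused = torch.cat([audio_feat, clinical], dim=1)  # feature-level fusion
        return self.classifier(fused)


# Usage with dummy shapes: a batch of 4 Mel-spectrograms (128 mel bins x 128 frames)
# and 10 clinical variables per patient.
model = MultimodalALSNet(n_clinical_features=10)
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 10))
print(logits.shape)  # torch.Size([4, 2])
```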