Moyo, Sibusiso
Unknown Affiliation

Published: 1 document
Articles

Evolutionary trends in automatic speech recognition with artificial intelligence: a systematic literature review
Oluwatobi Sobola, Gabriel; Adetiba, Emmanuel; Idowu-Bismark, Olabode; Abayomi, Abdultaofeek; Jules Kala, Raymond; Thakur, Surendra Colin; Moyo, Sibusiso
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 15, No 1: February 2026
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v15.i1.pp20-43

Abstract

Human beings depend greatly on communication and continually seek ways to overcome language barriers. Automatic speech recognition (ASR) has emerged as a vital tool for enhancing human interaction. Early ASR research produced Audrey at Bell Laboratories and later relied on probabilistic models, particularly the hidden Markov model (HMM) and the Gaussian mixture model (GMM), with mel-frequency cepstral coefficients (MFCCs) for feature extraction. Artificial intelligence (AI) approaches, especially deep learning, subsequently transformed ASR and produced systems such as Jasper, Whisper, Google Assistant, Microsoft Cortana, Apple Siri, and Amazon Alexa. This paper presents a systematic literature review that examines the evolution of ASR, the AI architectures employed, their features, strengths, and weaknesses, and the performance gains achieved since AI was integrated into probabilistic modelling. A snowballing approach was used to identify relevant studies from Google Scholar and Scopus to address five research questions, iterating through backward and forward searches until no new information was found. Findings reveal that ASR dates back to the 1920s with the Radio Rex toy and has since advanced through approaches including support vector machines (SVM), deep learning architectures such as recurrent neural networks (RNN), and transformers, all contributing to improved performance as measured by reduced word error rates (WER).
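The abstract measures progress as reductions in word error rate (WER). As an illustration that does not come from the paper, WER is conventionally computed as the word-level edit distance between a reference transcript and a recognizer's hypothesis (substitutions S, deletions D, and insertions I) divided by the number of reference words N, i.e. WER = (S + D + I) / N. The Python sketch below assumes plain whitespace tokenization with no case or punctuation normalization, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sketch: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 (one substitution and one deletion over six reference words). Because insertions are counted, WER can exceed 1.0 when a hypothesis is much longer than its reference.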