Hearing impairment can have a profound impact on the mental and emotional state of sufferers, hinder communication, and delay direct access to information that would otherwise rely on interpreters. Assistive technology has advanced considerably, particularly speech recognition systems that convert spoken language into written text (speech-to-text). However, implementation faces challenges related to the accuracy of each speech-to-text Application Programming Interface (API), making the choice of an appropriate deep learning model essential. This study analyzes and compares the performance of three speech-to-text API services (Deepgram API, Google API, and Whisper AI) based on Word Error Rate (WER) and Words Per Minute (WPM), to determine the most optimal API for a web-based real-time transcription system built with the JavaScript programming language and hosted on Glitch.com. The three API services were tested by measuring their error rates and transcription speeds, then evaluated for how low the error rate was and how high the transcription speed was. On average, Whisper AI achieved a WER of 0% across all word categories, but its speed was lower than that of the other two APIs. Deepgram API showed the best balance between accuracy and speed, with an average WER of 13.78% and 67 WPM. Google API performed stably, but its WER was slightly higher than Deepgram API's. Based on these results, Deepgram API was deemed the most optimal for live transcription, as it produces fast transcriptions with a low error rate, significantly increasing the accessibility of information for the deaf community.
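As a minimal sketch of how the two metrics above are typically computed (an illustration only, not the study's actual evaluation code; function names are ours), WER is the word-level edit distance between a reference transcript and the API's hypothesis, normalized by the reference length, while WPM is simply the word count divided by elapsed minutes:

```javascript
// Word Error Rate: (substitutions + deletions + insertions) / reference word count,
// computed as a Levenshtein distance over word tokens via dynamic programming.
function wer(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/);
  const hyp = hypothesis.trim().split(/\s+/);
  // d[i][j] = edit distance between first i reference words and first j hypothesis words
  const d = Array.from({ length: ref.length + 1 }, () =>
    new Array(hyp.length + 1).fill(0)
  );
  for (let i = 0; i <= ref.length; i++) d[i][0] = i;
  for (let j = 0; j <= hyp.length; j++) d[0][j] = j;
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match)
      );
    }
  }
  return d[ref.length][hyp.length] / ref.length;
}

// Words Per Minute: transcribed word count over the elapsed time in seconds.
function wpm(wordCount, elapsedSeconds) {
  return (wordCount * 60) / elapsedSeconds;
}
```

A perfect transcript yields `wer(...) === 0` (Whisper AI's reported result), while a service transcribing 67 words in one minute yields `wpm(67, 60) === 67` (Deepgram API's reported speed).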
Copyrights © 2025