Access to education, particularly at university level, is essential for deaf and hard-of-hearing students, more of whom are now pursuing higher education. At UIN Sunan Kalijaga, the current challenges are the limited number of sign language interpreters and the difficulty of translating technical terminology used in lectures. Many speech recognition methods are available, but research on how well they perform for Indonesian has not been published, particularly for recognizers intended for educational settings. This experimental study investigates whether Indonesian words can be recognized with a Convolutional Neural Network (CNN) and which ratio of training, validation, and test data gives the best performance. The dataset consisted of four Indonesian words, each with 50 voice samples recorded from young adults aged 19-23. The audio was preprocessed into spectrograms, which served as input to a CNN model built with TensorFlow. The CNN model achieved 90% accuracy with a 60:20:20 split between training, validation, and test data. The other ratios (70:15:15 and 80:10:10) yielded accuracies between 80% and 90%. These results indicate that CNNs are well suited to Indonesian word recognition and that, of the ratios tested, 60:20:20 performed best. The findings have practical value, for example in voice-to-text transcription of lectures to make learning more accessible in Indonesia. Further studies should evaluate other neural network approaches and apply denoising to improve accuracy.
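To make the three data ratios concrete, the following is a minimal sketch of how the samples could be divided into training, validation, and test sets. It is not the study's code: the function names `split_counts` and `split_indices`, the fixed seed, and the total of 200 samples (assuming 4 words with 50 samples each) are illustrative assumptions.

```python
import random


def split_counts(n_samples, ratios):
    """Return (train, val, test) sizes for integer percentage ratios."""
    if sum(ratios) != 100:
        raise ValueError("ratios must sum to 100")
    n_train = n_samples * ratios[0] // 100
    n_val = n_samples * ratios[1] // 100
    n_test = n_samples - n_train - n_val  # remainder goes to the test set
    return n_train, n_val, n_test


def split_indices(n_samples, ratios, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists."""
    n_train, n_val, _ = split_counts(n_samples, ratios)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


# Assumption: 4 words x 50 samples each; the total is not stated explicitly.
N_SAMPLES = 4 * 50

for ratios in [(60, 20, 20), (70, 15, 15), (80, 10, 10)]:
    print(ratios, split_counts(N_SAMPLES, ratios))
```

Under this assumed total, the 80:10:10 split leaves only 20 test samples, so each test sample accounts for 5 percentage points of accuracy, which is worth keeping in mind when comparing the ratios.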
Copyrights © 2025