Akbar, Ahmad Reinaldi
Unknown Affiliation

Designing Human-Robot Communication in the Indonesian Language Using the Deep Bidirectional Long Short-Term Memory Algorithm
Dwijayanti, Suci; Akbar, Ahmad Reinaldi; Suprapto, Bhakti Yudho
Jurnal Elektronika dan Telekomunikasi Vol 24, No 1 (2024)
Publisher : National Research and Innovation Agency

DOI: 10.55981/jet.595

Abstract

Humanoid robots closely resemble humans and engage in various human-like activities while responding to queries from their users, facilitating two-way communication between humans and robots. This bidirectional interaction is enabled by integrating speech-to-text and text-to-speech systems within the robot. However, research on two-way communication systems for humanoid robots using speech-to-text and text-to-speech technologies has predominantly focused on the English language. This study aims to develop a real-time two-way communication system between humans and a robot for the Indonesian language, with data collected from ten respondents (eight male and two female). The sentences used adhere to the standard rules of the Indonesian language. The speech-to-text system employs a deep bidirectional long short-term memory algorithm, coupled with feature extraction via Mel-frequency cepstral coefficients (MFCCs), to convert spoken language into text. Conversely, the text-to-speech system uses the Python pyttsx3 module to convert text into spoken responses delivered by the robot. Under quiet-room conditions, with noise levels ranging from 57.5 to 60 dB, the speech-to-text model achieves a high level of accuracy, with an average word error rate (WER) of 24.99% and 25.31% for speakers within and outside the dataset, respectively. In settings with engine noise and crowds, where noise levels range from 62.4 to 86 dB, the measured WER is 36.36% and 36.96% for speakers within and outside the dataset, respectively. This study demonstrates the feasibility of implementing a two-way communication system between humans and a robot, enabling the robot to respond effectively to various vocal inputs.
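
The abstract reports results as word error rate (WER) but does not say how it was computed. The sketch below uses the common definition: the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognized sentence, divided by the number of reference words. The function name `word_error_rate` and the Indonesian sample sentences are hypothetical illustrations, not taken from the paper or its dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly;
    the paper's own text normalization (if any) is not known.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))  # 0.25
```

Because insertions count as errors, WER under this definition can exceed 100% when the recognizer outputs many more words than the reference contains.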