Mohamad Nurkamal Fauzan
Telkom University

Published: 1 Document
Articles

Found 1 Document

CNN-LSTM for MFCC-based Speech Recognition on Smart Mirrors for Edge Computing Command
Aji Gautama Putrada; Ikke Dian Oktaviani; Mohamad Nurkamal Fauzan; Nur Alamsyah
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 4 No 2 (2024): August
Publisher : Research Group of Data Engineering, Faculty of Informatics

DOI: 10.20895/dinda.v4i2.1504

Abstract

Smart mirrors are conventional mirrors augmented with embedded-system capabilities to provide comfort and sophistication for users, including a speech-command function. However, existing research still applies the Google Speech API, which relies on the cloud and yields sub-optimal processing time. Our research aims to design speech recognition using Mel-frequency cepstral coefficients (MFCC) and a convolutional neural network–long short-term memory (CNN-LSTM) model to be applied to smart mirror edge devices for optimum processing time. Our first step was to download a synthetic speech recognition dataset of waveform audio files (WAVs) from Kaggle, covering the utterances "left," "right," "yes," "no," "on," and "off." We then designed the speech recognition pipeline, involving Fourier transformation and low-pass filtering. We benchmarked MFCC against linear predictive coding (LPC), because both are feature extraction methods for speech datasets. Then, we benchmarked CNN-LSTM against LSTM, a simple recurrent neural network (RNN), and a gated recurrent unit (GRU). Finally, we designed a complete smart mirror system with a GUI and its functions. The test results show that CNN-LSTM performs better than the other three methods, with accuracy, precision, recall, and F1-score all at 0.92. The speech command with the best precision is "no," with a value of 0.940, while the command with the best recall is "off," with a value of 0.963. The speech command with the worst precision and recall is "other," with a value of 0.839. The contribution of this research is a smart mirror whose speech commands are processed on the edge device with CNN-LSTM.
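
The pipeline described in the abstract (MFCC features feeding a CNN-LSTM classifier) can be sketched briefly. The snippet below is an illustrative sketch, not the authors' implementation: the 16 kHz sampling rate, 13 MFCC coefficients, layer sizes, and the seven-class label set ("left", "right", "yes", "no", "on", "off", "other") are assumptions based on the abstract, using librosa for feature extraction and Keras for the model.

```python
# Illustrative sketch of an MFCC + CNN-LSTM speech-command classifier.
# Parameters and layer sizes are assumed, not taken from the paper.
import numpy as np
import librosa
from tensorflow.keras import layers, models

NUM_CLASSES = 7      # six commands plus "other" (assumed)
SAMPLE_RATE = 16000  # assumed sampling rate of the WAV clips
N_MFCC = 13          # typical number of MFCC coefficients

def extract_mfcc(wav_path: str) -> np.ndarray:
    """Load a WAV clip and return its MFCC matrix (time frames x coefficients)."""
    signal, sr = librosa.load(wav_path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    return mfcc.T  # shape: (time_frames, N_MFCC)

def build_cnn_lstm(time_frames: int) -> models.Model:
    """1D CNN front end for local spectral patterns, LSTM for temporal context."""
    model = models.Sequential([
        layers.Input(shape=(time_frames, N_MFCC)),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage (hypothetical file name):
# features = extract_mfcc("left_0001.wav")
# model = build_cnn_lstm(time_frames=features.shape[0])
# model.summary()
```

The Conv1D layer stands in for the "CNN" half of the architecture, extracting local patterns across MFCC frames before the LSTM models their temporal order; the actual depth and hyperparameters used in the paper may differ.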