Garuda - Garba Rujukan Digital

p-Index (2021 - 2026): 0.23

This author has published in the following journal:

Jurnal Nasional Teknik Elektro dan Teknologi Informasi

Santoso

Department of Electrical Engineering, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Jawa Timur 60111, Indonesia

Author-ID : 9938064

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Energy Engineering

Published: 1 document

Articles

Voice Command Recognition Using CNN-LSTM Parallel Architecture
Santoso; Tri Arief Sardjono; Djoko Purwanto
Jurnal Nasional Teknik Elektro dan Teknologi Informasi, Vol 15 No 1: Februari 2026
Publisher: Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada

DOI: 10.22146/jnteti.v15i1.23855

A parallel convolutional neural network–long short-term memory (CNN–LSTM) architecture is introduced for voice command recognition, designed to simultaneously extract spatial and temporal features from speech signals. Conventional serial architectures process these components sequentially, which can lead to the loss of temporal information after CNN-based spatial compression. This study aimed to improve recognition performance by preserving complementary spectral and temporal representations through parallel feature modeling. In the proposed approach, the CNN branch extracted spectral features from Mel-frequency cepstral coefficients (MFCCs), while the LSTM branch independently modeled long-term temporal dependencies from the same input stream. The outputs from both branches were fused through concatenation to form a comprehensive acoustic representation, enhancing discrimination between phonetically similar commands. The model was trained and evaluated using a dataset containing eight classes of spoken commands. During training, the proposed model achieved a loss of 0.0186 and an accuracy of 99.87%, indicating effective learning. On the validation and test datasets, the model reached an accuracy of 89.16%, demonstrating stable convergence and consistent generalization performance. Evaluation using precision, recall, and F1 score metrics confirmed balanced recognition across classes, with particularly high accuracy for commands such as “stop,” “right,” and “yes,” while “go” and “no” showed lower accuracy due to acoustic similarity. In conclusion, the proposed parallel CNN–LSTM architecture effectively integrates convolutional and recurrent learning, resulting in improved recognition accuracy and robust performance, with strong potential for real-time voice control and embedded applications.
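
The parallel layout described in the abstract can be illustrated with a small forward-pass sketch in NumPy. This is not the authors' implementation: the abstract only states that a CNN branch and an LSTM branch read the same MFCC input, that their outputs are concatenated, and that there are eight command classes. Everything else here is an assumption made for illustration, including the frame count, the number of MFCC coefficients, the kernel and hidden sizes, the use of global max pooling, the use of the final LSTM hidden state, the random untrained weights, and all function and parameter names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes. Only N_CLASSES = 8 comes from the abstract.
T, F = 40, 13          # frames, MFCC coefficients per frame (assumed)
K, W = 16, 5           # conv kernels, kernel width in frames (assumed)
H = 32                 # LSTM hidden size (assumed)
N_CLASSES = 8


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cnn_branch(mfcc, kernels):
    """1-D convolution over time, ReLU, then global max pooling -> (K,)."""
    n_frames = mfcc.shape[0]
    width = kernels.shape[1]
    # Sliding windows over time: (n_frames - width + 1, width, F)
    windows = np.stack([mfcc[t:t + width] for t in range(n_frames - width + 1)])
    feature_maps = np.einsum("twf,kwf->tk", windows, kernels)
    return np.maximum(feature_maps, 0.0).max(axis=0)


def lstm_branch(mfcc, w_x, w_h, b):
    """Single-layer LSTM over the frames; returns the final hidden state (H,)."""
    hidden = w_h.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in mfcc:
        z = x_t @ w_x + h @ w_h + b          # all four gates at once, (4H,)
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def parallel_forward(mfcc, params):
    """Run both branches on the same input, concatenate, classify."""
    spectral = cnn_branch(mfcc, params["kernels"])
    temporal = lstm_branch(mfcc, params["w_x"], params["w_h"], params["b"])
    fused = np.concatenate([spectral, temporal])       # (K + H,)
    logits = fused @ params["w_out"] + params["b_out"]
    exp = np.exp(logits - logits.max())                # stable softmax
    return exp / exp.sum()


# Random, untrained weights: the output is only meaningful as a shape check.
params = {
    "kernels": rng.normal(scale=0.1, size=(K, W, F)),
    "w_x": rng.normal(scale=0.1, size=(F, 4 * H)),
    "w_h": rng.normal(scale=0.1, size=(H, 4 * H)),
    "b": np.zeros(4 * H),
    "w_out": rng.normal(scale=0.1, size=(K + H, N_CLASSES)),
    "b_out": np.zeros(N_CLASSES),
}
mfcc = rng.normal(size=(T, F))   # stand-in for one utterance's MFCC matrix
probs = parallel_forward(mfcc, params)
print(probs.shape)
```

The point of the sketch is the data flow: neither branch sees the other's output, so the LSTM operates on the uncompressed frame sequence, and the classifier receives both feature vectors side by side. A serial design would instead feed the convolution output into the LSTM.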

Co-Authors: Djoko Purwanto; Tri Arief Sardjono
