JURNAL NASIONAL TEKNIK ELEKTRO
Vol 14, No 3: November 2025

A Hybrid Wavelet Scattering and Mel Spectrogram Feature with Deep Convolution Neural Network for Robust Spoken Digit Recognition

Irmawan, Irmawan (Unknown)
Dwijayanti, Suci (Unknown)
Suprapto, Bhakti Yudho (Unknown)



Article Info

Publish Date
12 Dec 2025

Abstract

Spoken digit recognition (SDR) plays a critical role in biometric authentication and human–computer interaction, yet existing approaches often rely on small datasets, limited feature representations, or architectures prone to overfitting. To address these limitations, this study proposes a robust end-to-end pipeline that integrates Wavelet Time Scattering (WTS), Mel-Frequency Cepstral Coefficients (MFCC), and a two-dimensional deep Convolutional Neural Network (2D-CNN) to improve the accuracy and generalization of SDR systems in realistic environments. The Free Spoken Digit Dataset (FSDD), consisting of 3000 audio samples from speakers with diverse accents, was pre-processed with zero-padding to normalize clip length and then transformed into high-resolution time–frequency spectrograms via WTS. The proposed CNN architecture, tuned through systematic experiments on batch size and learning rate, showed stable convergence and strong discriminative capability. With a learning rate of 0.001 and a batch size of 50, the model achieved its best accuracy of 99.2%, outperforming established methods including SVM, MFCC-LSTM, and multiple RNN architectures. Comparative evaluations further showed that combining WTS and MFCC feature extraction significantly improves the quality of the spectral–temporal representation and raises classification precision across all digit classes. These findings indicate that the proposed WTS-MFCC-CNN framework not only advances SDR accuracy but also offers a scalable and computationally efficient approach suitable for real-world biometric, financial, and voice-controlled applications. The results highlight the potential of hybrid time–frequency representations combined with deep architectures to set a new benchmark for robust spoken digit recognition.
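The MFCC half of the front end described in the abstract, together with the zero-padding step, can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The article does not give its framing or filterbank settings, so `TARGET_LEN`, `FRAME_LEN`, `HOP_LEN`, `N_MELS`, `N_MFCC`, and all function names are hypothetical. The 8 kHz sample rate matches the FSDD recordings. The wavelet time scattering stage is omitted entirely, and the naive DFT stands in for the FFT a real pipeline would use.

```python
import math

# Hypothetical parameters (assumptions; the article does not state them).
SAMPLE_RATE = 8000   # FSDD clips are sampled at 8 kHz
TARGET_LEN = 8000    # zero-pad or truncate every clip to 1 s
FRAME_LEN = 256      # samples per analysis frame
HOP_LEN = 128        # 50% frame overlap
N_MELS = 26          # number of triangular mel filters
N_MFCC = 13          # cepstral coefficients kept per frame


def pad_to_length(signal, target_len=TARGET_LEN):
    """Zero-pad (or truncate) a clip so every input has the same length."""
    clipped = list(signal[:target_len])
    return clipped + [0.0] * (target_len - len(clipped))


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, frame_len, sr):
    """Triangular filters spaced evenly on the mel scale over 0..sr/2."""
    n_bins = frame_len // 2 + 1
    top = hz_to_mel(sr / 2.0)
    mel_points = [i * top / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int(math.floor((frame_len + 1) * mel_to_hz(m) / sr)) for m in mel_points]
    fbank = []
    for j in range(1, n_mels + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak 1.0 at center
            row[k] = (right - k) / (right - center)
        fbank.append(row)
    return fbank


def power_spectrum(frame):
    """|DFT|^2 / N for bins 0..N/2 (naive DFT with lookup tables)."""
    n = len(frame)
    cos_t = [math.cos(2 * math.pi * m / n) for m in range(n)]
    sin_t = [math.sin(2 * math.pi * m / n) for m in range(n)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * cos_t[(k * t) % n] for t, x in enumerate(frame))
        im = -sum(x * sin_t[(k * t) % n] for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mfcc(signal, sr=SAMPLE_RATE, target_len=TARGET_LEN, frame_len=FRAME_LEN,
         hop=HOP_LEN, n_mels=N_MELS, n_mfcc=N_MFCC):
    """Return an n_frames x n_mfcc matrix: a 2-D input a 2D-CNN can consume."""
    x = pad_to_length(signal, target_len)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hamming window
    fbank = mel_filterbank(n_mels, frame_len, sr)
    features = []
    for start in range(0, target_len - frame_len + 1, hop):
        frame = [x[start + t] * window[t] for t in range(frame_len)]
        spec = power_spectrum(frame)
        # Log mel energies; the small floor avoids log(0) on silent (padded) frames.
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in fbank]
        # Unnormalized DCT-II decorrelates the log energies into cepstral coefficients.
        coeffs = [sum(e * math.cos(math.pi * i * (j + 0.5) / n_mels)
                      for j, e in enumerate(log_e)) for i in range(n_mfcc)]
        features.append(coeffs)
    return features
```

With the assumed defaults, a padded 1 s clip yields 61 frames of 13 coefficients. Each clip therefore becomes a fixed-size 61 x 13 matrix, which is why equalizing clip lengths with zero-padding comes before a 2D-CNN.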

Copyright © 2025






Journal Info

Abbrev

JNTE

Publisher

Subject

Electrical & Electronics Engineering

Description

Jurnal Nasional Teknik Elektro (JNTE) is a peer-reviewed scientific journal published by the Department of Electrical Engineering (Jurusan Teknik Elektro), Universitas Andalas, in a print version (p-ISSN: 2302-2949) and an electronic version (e-ISSN: 2407-7267). JNTE is published twice a year for manuscripts reporting research results or parts of research that ...