IAES International Journal of Artificial Intelligence (IJ-AI)
Vol 13, No 3: September 2024

Enhanced multi-ethnic speech recognition using pitch shifting generative adversarial networks

Nugroho, Kristiawan
Hadiono, Kristophorus
Sutanto, Felix
Marutho, Dhendra
Farooq, Omar



Article Info

Publish Date
01 Sep 2024

Abstract

Speech recognition remains a challenging research area, and various approaches have been applied to build robust models. A common problem is overfitting, especially when there is insufficient data to train the model; a sufficiently large dataset allows the model to be trained well and to reach high accuracy. Data augmentation is often used to increase the size of a dataset. This research uses pitch shifting as a data augmentation technique to enlarge the speech dataset, which is then converted into spectrogram data and classified using a generative adversarial network (GAN). Using the pitch shifting-generative adversarial network (PS-GAN) model, this research achieves an accuracy of 98.43% in multi-ethnic speech recognition, higher than several similar studies.
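The augmentation and spectrogram steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the shift amounts, frame sizes, or tooling, so the semitone values, `frame_len`, `hop`, and the helper names below are all assumptions chosen for illustration. The pitch shift here is a naive resampling approach that also changes the clip duration (dedicated audio libraries usually preserve duration), and the GAN classifier is omitted entirely.

```python
import numpy as np


def pitch_shift_resample(signal, semitones):
    """Naive pitch shift by resampling (hypothetical helper).

    Reading the signal `factor` times faster raises the pitch by
    `semitones`, but it also shortens (or lengthens) the clip.
    """
    factor = 2.0 ** (semitones / 12.0)
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(signal)), signal)


def magnitude_spectrogram(signal, frame_len=512, hop=128):
    """Short-time Fourier magnitude, shape (freq_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest peak in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]


# A synthetic 440 Hz tone stands in for a speech recording (assumption).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

# Augment: keep the original and add copies shifted by -2 and +2 semitones.
augmented = [pitch_shift_resample(tone, s) for s in (-2, 2)]
dataset = [tone] + augmented

# Convert every clip to a spectrogram, the input a classifier would consume.
spectrograms = [magnitude_spectrogram(clip) for clip in dataset]
```

Each original clip yields several training examples with different pitch, which is how augmentation enlarges the dataset without new recordings. In practice, the resulting spectrograms would typically be cropped or padded to a common size before being passed to a classifier.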

Copyrights © 2024






Journal Info

Abbrev

IJAI

Publisher

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all areas of artificial intelligence and their applications in the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...