Speech recognition is transforming the way humans interact with technology, and automatic gender recognition is an essential part of this evolution. This study develops a multilingual deep learning (DL) model for gender detection using three audio datasets: RAVDESS (English), Berlin EmoDB (German), and IITKGP-SEHSC (Hindi). These datasets provide the linguistic diversity needed to build a multilingual gender identification model. Mel-frequency cepstral coefficients (MFCC), VGGish embeddings, and other audio features were extracted to convert the raw audio into meaningful representations. The findings show that the machine learning (ML) models, random forest (RF) and extreme gradient boosting, achieved good results, reaching 98.26% on Hindi in the monolingual setup and 96.90% in the cross-lingual setup. Among the DL models, the convolutional neural network (CNN) outperformed the others in both monolingual and cross-lingual scenarios, with 99.33% accuracy for Hindi and 98.11% accuracy in the cross-lingual setup. These findings show that DL is effective for gender detection in multilingual and emotionally complex settings and that it outperforms the traditional ML models. The experiments demonstrate the potential of DL in speech-based artificial intelligence (AI) applications, where it can enhance performance in real-life scenarios.
Copyrights © 2026