Sahana, P.
Unknown Affiliation


Speech Analysis Language Identification and Translation
Ramya, G.; lekha, N. Chandra; Pranathi, P.; Sahana, P.
International Journal of Advances in Artificial Intelligence and Machine Learning Vol. 2 No. 3 (2025)
Publisher : CV Media Inti Teknologi

DOI: 10.58723/ijaaiml.v2i3.461

Abstract

Background of study: The increasing globalization of communication has intensified the need for systems capable of automatically identifying spoken languages and providing accurate, real-time translation. With advancements in speech processing and machine learning, an integrated framework for speech analysis, language identification, and translation has become both feasible and necessary.

Aims: This paper aims to develop and evaluate a comprehensive system that performs automatic speech preprocessing, language identification, speech recognition, and machine translation. The study focuses on designing a multilingual pipeline capable of detecting multiple languages, converting speech to text, and translating the output into a target language with high accuracy and usability.

Methods: A multilingual speech corpus comprising recordings in English, Spanish, French, and Mandarin was used. Audio underwent preprocessing, feature extraction using MFCCs and spectrograms, and language identification using CNN-based MFCC classifiers as well as i-vector and x-vector models. Speech recognition was conducted using pre-trained ASR systems such as Whisper and DeepSpeech, followed by neural machine translation (NMT). System performance was evaluated through accuracy, precision, recall, BLEU scores, real-time factor (RTF), and user experience assessments.

Result: The proposed system demonstrated strong performance across the LID, ASR, and translation components. CNN-based language identification achieved high accuracy across multilingual inputs, while ASR models produced coherent transcriptions suitable for downstream translation. Translation evaluation using BLEU scores and qualitative human review confirmed that the pipeline maintained contextual accuracy. The system also showed robustness across varying speakers, accents, and noise conditions.

Conclusion: The integrated Speech Analysis, Language Identification, and Translation system provides an effective solution for overcoming language barriers in real-time communication. By combining noise-reduced audio preprocessing, reliable language detection, and accurate translation, the system offers a user-friendly platform suitable for multilingual applications. Future improvements include expanding the language set, enhancing robustness against dialectal variation, and deploying the model on lightweight edge devices for real-time applications.
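
The abstract names accuracy, precision, recall, and real-time factor (RTF) as evaluation measures without defining them. The sketch below shows one common way these quantities could be computed for a language-identification component. The function names, the language codes, and the sample labels and timings are all hypothetical illustrations, not the authors' code or data. BLEU is omitted because it is normally computed with a dedicated library.

```python
from collections import Counter


def lid_metrics(true_labels, predicted_labels):
    """Return overall accuracy and per-language precision and recall."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    true_positives = Counter(t for t, p in pairs if t == p)
    true_counts = Counter(true_labels)       # utterances actually in each language
    pred_counts = Counter(predicted_labels)  # utterances assigned to each language

    per_language = {}
    for lang in sorted(set(true_labels) | set(predicted_labels)):
        # Precision: of the utterances labeled `lang`, how many were correct.
        precision = true_positives[lang] / pred_counts[lang] if pred_counts[lang] else 0.0
        # Recall: of the utterances truly in `lang`, how many were found.
        recall = true_positives[lang] / true_counts[lang] if true_counts[lang] else 0.0
        per_language[lang] = {"precision": precision, "recall": recall}
    return accuracy, per_language


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical labels for six utterances (illustrative only, not results from the paper).
true_langs = ["en", "en", "es", "fr", "zh", "zh"]
pred_langs = ["en", "es", "es", "fr", "zh", "en"]

accuracy, per_language = lid_metrics(true_langs, pred_langs)
rtf = real_time_factor(processing_seconds=1.5, audio_seconds=6.0)
```

An RTF below 1.0 means the pipeline processes audio faster than it is spoken, which is the usual requirement for the real-time use the abstract describes.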