Aarti Bakshi
SNDT University

Published: 2 documents
Articles

Feature selection for improving Indian spoken language identification in utterance duration mismatch condition
Aarti Bakshi; Sunil Kumar Kopparapu
Bulletin of Electrical Engineering and Informatics Vol 10, No 5: October 2021
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/eei.v10i5.3173

Abstract

In spoken language identification (SLID) systems, the test data may be considerably shorter in duration than the training data, a situation known as the duration mismatch condition. Duration-normalized features are used to identify the spoken language among nine Indian languages in duration mismatch conditions. Random forest-based importance vectors of 1582 OpenSMILE features are calculated for each utterance in different duration datasets. The feature importance vectors are normalized across each dataset and then across the different duration datasets. The optimal number of duration-normalized features is selected to maximize SLID system accuracy. Three classifiers, artificial neural network (ANN), support vector machine (SVM), and random forest (RF), are used together with their fusion, whose weights are optimized using logistic regression. The speech material comprised utterances of 30 sec each, extracted from the All India Radio dataset with nine Indian languages. Seven new datasets of smaller utterance durations were generated by carefully splitting each utterance. Experimental results showed that the 150 most important duration-normalized features were optimal, giving a relative accuracy increase of 18-80% in mismatch conditions. The accuracy decreased with increased duration mismatch.
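The abstract does not give the exact normalization formula, so the following is only a minimal sketch of the selection step under stated assumptions: each duration dataset is summarized by one non-negative importance vector (for example, from a random forest), each vector is scaled to sum to 1, the scaled vectors are averaged across datasets, and the top-k features are kept. The function name, the input layout, and the sum-to-one scaling are hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np


def select_duration_normalized_features(importances_by_duration, k):
    """Return (indices of the k highest-scoring features, pooled scores).

    importances_by_duration: dict mapping a duration label (e.g. "30s") to a
    1-D sequence of non-negative per-feature importances. This layout is an
    assumption made for this sketch.
    """
    normalized = []
    for label, imp in importances_by_duration.items():
        imp = np.asarray(imp, dtype=float)
        total = imp.sum()
        if total <= 0:
            raise ValueError(f"importances for {label!r} sum to zero")
        # Step 1 (assumed): scale each dataset's vector so it sums to 1.
        normalized.append(imp / total)

    # Step 2 (assumed): average the scaled vectors across duration datasets.
    pooled = np.mean(normalized, axis=0)

    # Stable sort on negated scores gives descending order,
    # with ties broken by the lower feature index.
    order = np.argsort(-pooled, kind="stable")
    return order[:k], pooled
```

With 1582 OpenSMILE features per dataset, calling the function with `k=150` would return the indices of the 150 features that the abstract reports as optimal in number; which features those are depends entirely on the importance vectors supplied.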
A GMM supervector approach for spoken Indian language identification for mismatch utterance length
Aarti Bakshi; Sunil Kumar Kopparapu
Bulletin of Electrical Engineering and Informatics Vol 10, No 2: April 2021
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/eei.v10i2.2861

Abstract

Gaussian mixture model-universal background model (GMM-UBM) supervectors are used to identify spoken Indian languages. The supervectors are calculated from short-time MFCCs and their first and second derivatives. The UBM builds a generalized Indian language model, and mean adaptation transforms it into a duration-normalized, language-specific GMM. Multi-class support vector machine and artificial neural network classifiers are used to identify language labels from the supervectors. Experimental evaluations are performed using 30 sec speech utterances from nine Indian languages, comprising five Indo-Aryan and four Dravidian languages, extracted from the All India Radio broadcast news dataset. Eight smaller-duration datasets were manually derived to study the effect of training and test duration mismatch. In mismatch conditions, identification accuracy decreases with a decrease in test and train utterance duration. Investigations showed that the 32-mixture model with the ANN classifier has optimal performance.
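The abstract does not state which adaptation formula is used. As a rough illustration of how a supervector can be obtained by mean adaptation, the sketch below applies the widely used relevance-MAP mean update to a diagonal-covariance UBM and concatenates the adapted means. The function name, the diagonal-covariance assumption, and the relevance factor of 16 are assumptions for this sketch, not details taken from the paper.

```python
import numpy as np


def map_adapt_supervector(frames, weights, means, variances, relevance=16.0):
    """Adapt UBM means to one utterance and return the stacked means.

    frames:    (T, D) feature frames of a single utterance (T >= 1)
    weights:   (K,)   UBM mixture weights
    means:     (K, D) UBM component means
    variances: (K, D) UBM diagonal covariances (assumed diagonal)
    relevance: relevance factor; 16.0 is an assumed default
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Log-density of every frame under every diagonal Gaussian: shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )

    # Posterior responsibilities, computed stably in the log domain.
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order statistics per component.
    n_k = post.sum(axis=0)                       # (K,)
    data_means = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]

    # Relevance-MAP update: components with few frames stay near the UBM mean.
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted = alpha * data_means + (1.0 - alpha) * means

    # Supervector: adapted means concatenated into one vector of length K*D.
    return adapted.reshape(-1)
```

Because the output length is always K times D regardless of how many frames the utterance has, utterances of different durations map to vectors of the same size, which is what allows an SVM or ANN to classify them directly.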