Satyanand Singh
Fiji National University

Published: 7 Documents

Speaker specific feature based clustering and its applications in language independent forensic speaker recognition
Satyanand Singh; Pragya Singh
International Journal of Electrical and Computer Engineering (IJECE) Vol 10, No 4: August 2020
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v10i4.pp3508-3518

Abstract

Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). The role of the forensic expert is to testify, using if possible a quantitative measure of the value of the voice evidence. Whether to use this information as an aid in their judgments and decisions is up to the judge and/or the jury. Most existing methods measure inter-utterance similarities directly from spectrum-based characteristics, so the resulting clusters may relate not to speakers but to different acoustic classes. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. The resulting projection vectors naturally represent the language-independent voice-like relationships among all the utterances and are therefore more robust against non-speaker interference. A clustering approach based on peak approximation is then proposed in order to maximize the similarities between language-independent utterances within all clusters. This method uses the K-medoid, Fuzzy C-means, Gustafson and Kessel, and Gath-Geva algorithms to evaluate the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering that the ultimate outcome can only hit the optimum recognition efficiency. The recognition efficiencies of the K-medoid, Fuzzy C-means, Gustafson and Kessel, and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5%, and 99.7%, and the equal error rates (EER) are 3.62%, 2.91%, 2.82%, and 2.61%, respectively. The EER improvement of the Gath-Geva based FSR system compared with Gustafson and Kessel and Fuzzy C-means is 8.04% and 11.49%, respectively.
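
As a rough illustration of the soft assignment used by the Fuzzy C-means family named in this abstract, the sketch below computes the standard Fuzzy C-means membership of one point in each cluster. It is not the authors' implementation: the function name `fcm_memberships`, the Euclidean distance, and the fuzzifier value m = 2 are assumptions made for this example, and the Gustafson and Kessel and Gath-Geva variants use other distance measures that are not shown here.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard Fuzzy C-means membership of one point in each cluster.

    point   -- feature vector (sequence of floats)
    centers -- list of cluster centers, same dimension as point
    m       -- fuzzifier, must be greater than 1 (m = 2 is an assumed default)
    """
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with the point).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]

    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); the memberships sum to 1.
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


if __name__ == "__main__":
    # Hypothetical 2-D point at distances 1 and 2 from two centers.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

Unlike a hard assignment such as K-medoid, each utterance receives a graded membership in every cluster, and the closer center receives the larger share.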

High level speaker specific features modeling in automatic speaker recognition system
Satyanand Singh
International Journal of Electrical and Computer Engineering (IJECE) Vol 10, No 2: April 2020
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v10i2.pp1859-1867

Abstract

Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages; at the secondary level, it also reveals information about the speaker. This work applies statistical speaker modeling techniques to high-level speaker-specific features that express the characteristic sound of the human voice. Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models are used to build Automatic Speaker Recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what is said. The performance of the ASR system is evaluated from clear speech to a wide range of speech quality using the standard TIMIT speech corpus. The ASR efficiencies of the HMM, GMM, and LDA based modeling techniques are 98.8%, 99.1%, and 98.6%, and the Equal Error Rates (EER) are 4.5%, 4.4%, and 4.55%, respectively. The EER improvement of the GMM based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.
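
The Equal Error Rate quoted in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a value can be estimated from verification scores, not the evaluation code used in the paper. The function name `equal_error_rate`, the simple threshold sweep, and the sample scores are assumptions for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a threshold over all observed scores.

    genuine  -- scores from same-speaker trials (higher means more similar)
    impostor -- scores from different-speaker trials
    Returns (eer, threshold) at the threshold where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest to each other.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical scores, not taken from any corpus.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With a finite set of trials the two error rates rarely match exactly, so the sketch reports their average at the closest point; production toolkits typically interpolate instead.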

Discrete interferences optimum beamformer in correlated signal and interfering noise
Satyanand Singh; Sajai Vir Singh; Dinesh Yadav; Sanjay Kumar Suman; Bhagyalakshmi Lakshminarayanan; Ghanshyam Singh
International Journal of Electrical and Computer Engineering (IJECE) Vol 12, No 2: April 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v12i2.pp1732-1743

Abstract

This paper introduces a significant special situation where the noise is a collection of D-plane interference signals and the correlated noise of D+1 is less than the number of array components. An optimal beamforming processor based on the minimum variance distortionless response (MVDR) generates and combines appropriate statistics for the D+1 model. Instead of the original space of the N-dimensional problem, the interference signal subspace is reduced to D+1. Typical antenna arrays in many modern communication networks absorb waves generated from multiple point sources. An analytical formula was derived to improve the signal-to-interference-and-noise ratio (SINR) obtained from the steering errors of the two beamformers. The proposed MVDR processor-based beamforming does not enforce general constraints. Therefore, it can also be used in systems where the steering vector is compromised by gain. Simulation results show that the output of the proposed beamformer based on the MVDR processor is usually close to the ideal state within a wide range of signal-to-noise ratio and signal-to-interference ratio. The MVDR processor-based beamformer has been experimentally evaluated. The proposed MVDR processor-based system significantly improves performance for a large interference-to-white-noise ratio (INR) in the sidelobe region and provides an appropriate beam pattern.
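
For orientation, the textbook form of the MVDR weight vector is sketched below. This is the standard result, not the paper's specific derivation, and all symbols are assumptions introduced here: N is the number of array elements, a(θ) the steering vector, θ_s the look direction, A the N×D matrix of interferer steering vectors, S the D×D interferer covariance (non-diagonal when the interferers are correlated), and σ_w² the white-noise power.

```latex
\begin{aligned}
\mathbf{R} &= \sigma_w^{2}\,\mathbf{I}_N + \mathbf{A}\,\mathbf{S}\,\mathbf{A}^{H}
  && \text{(interference-plus-noise covariance)} \\
\mathbf{w}_{\mathrm{MVDR}} &= \frac{\mathbf{R}^{-1}\,\mathbf{a}(\theta_s)}
  {\mathbf{a}^{H}(\theta_s)\,\mathbf{R}^{-1}\,\mathbf{a}(\theta_s)}
  && \text{(minimizes } \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}\text{)} \\
\mathbf{w}_{\mathrm{MVDR}}^{H}\,\mathbf{a}(\theta_s) &= 1
  && \text{(distortionless constraint)}
\end{aligned}
```

The weights minimize the output interference-plus-noise power while passing a signal from the look direction with unit gain, which is why a mismatch in the assumed steering vector degrades the achieved SINR.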

Forensic and Automatic Speaker Recognition System
Satyanand Singh
International Journal of Electrical and Computer Engineering (IJECE) Vol 8, No 5: October 2018
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v8i5.pp2804-2811

Abstract

The Automatic Speaker Recognition (ASR) system has emerged as an important means of confirming identity in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. An algorithm-based system has been developed for the recognition of forensic speakers by physics scientists and forensic linguists to reduce the probability of a contextual bias or pre-centric understanding of a reference model with the validity of an unknown audio sample and any suspicious individual. Many researchers are continuing to develop automatic algorithms in signal processing and machine learning so that improved performance can effectively establish the speaker's identity, with the automatic system performing on par with human listeners. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical speaker patterns that have emerged for the automatic technology in the last decade. I focus on many aspects of ASR systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.

Bayesian distance metric learning and its application in automatic speaker recognition systems
Satyanand Singh
International Journal of Electrical and Computer Engineering (IJECE) Vol 9, No 4: August 2019
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v9i4.pp2960-2967

Abstract

This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on Bayesian distance metric learning as a feature extractor. In this modeling, I explore the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared to cosine scores, and it is very effective in the case of limited training data. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, an EER of 1.767%, is obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, is obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best result reported for the short2-short3 condition on NIST SRE 2008 data.
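
The pair selection described in this abstract (different-speaker pairs with the highest cosine score) can be sketched as follows. This is an illustrative reading under stated assumptions, not the paper's code: the function names `cosine_score` and `hardest_impostor_pairs`, the toy two-dimensional vectors standing in for i-vectors, and the speaker labels are all hypothetical.

```python
import math
from itertools import combinations


def cosine_score(w1, w2):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def hardest_impostor_pairs(vectors, labels, k=1):
    """Return the k different-speaker pairs with the highest cosine score.

    Each result is a tuple (score, i, j) with i < j indexing into vectors.
    """
    scored = [
        (cosine_score(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if labels[i] != labels[j]  # keep only pairs from different speakers
    ]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-D stand-ins for i-vectors; real i-vectors have hundreds of dimensions.
    vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    spk = ["a", "b", "b"]
    print(hardest_impostor_pairs(vecs, spk, k=1))
```

Pairs that score high despite coming from different speakers are the ones a cosine-based system is most likely to confuse, which is the motivation for using them as constraints.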

The role of speech technology in biometrics, forensics and man-machine interface
Satyanand Singh
International Journal of Electrical and Computer Engineering (IJECE) Vol 9, No 1: February 2019
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v9i1.pp281-288

Abstract

Optimism is growing day by day that in the near future our society will witness the Man-Machine Interface (MMI) using voice technology. Computer manufacturers are building voice recognition subsystems into their new product lines. Although speech technology based MMI techniques have been widely used before, they need to gather and apply deep knowledge of spoken language and performance during electronic machine-based interaction. Biometric recognition refers to a system that is able to identify individuals based on their own behavioral and biological characteristics. Following the success of fingerprints in forensic science and law enforcement applications, and with growing concerns relating to border control, banking access fraud, machine access control, and IT security, there has been great interest in the use of fingerprints and other biological traits for automatic recognition. It is not surprising to see that the application of biometric systems is playing an important role in all areas of our society. Biometric applications include smartphone security, mobile payment, international border control, national citizen registers, and reserved facilities. The use of MMI by speech technology, which includes automated speech/speaker recognition and natural language processing, has a significant impact on all existing businesses based on personal computer applications. With the help of powerful and affordable microprocessors and artificial intelligence algorithms, human beings can talk to the machine to drive and control all computer-based applications. Today's applications show a small preview of a rich future for MMI based on voice technology, which will ultimately replace the keyboard and mouse with the microphone for easy access and make the machine more intelligent.

Kernel based speaker specific feature extraction and its applications in iTaukei cross language speaker recognition
Satyanand Singh; Pragya Singh
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 18, No 5: October 2020
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v18i5.14655

Abstract

Extraction and classification algorithms based on kernel nonlinear features are a popular new direction of research in machine learning. This research paper considers their practical application in the iTaukei automatic speaker recognition (ASR) system for cross-language speech recognition. Nonlinear speaker-specific extraction methods such as kernel principal component analysis (KPCA), kernel independent component analysis (KICA), and kernel linear discriminant analysis (KLDA) are summarized. The conversion effects on subsequent classifications were tested in conjunction with Gaussian mixture modeling (GMM) learning algorithms; in most cases, the computations were found to have a beneficial effect on classification performance. The best results were achieved by the KLDA algorithm. The performance of the ASR system is evaluated from clear speech to a wide range of speech quality using the ATR Japanese C language corpus and a self-recorded iTaukei corpus. The ASR efficiencies of the KLDA, KICA, and KPCA techniques for 6 sec of the ATR Japanese C language corpus are 99.7%, 99.6%, and 99.1%, and the equal error rates (EER) are 1.95%, 2.31%, and 3.41%, respectively. The EER improvement of the KLDA technique-based ASR system compared with KICA and KPCA is 4.25% and 8.51%, respectively.