Palsapure, Pranita Niraj
Unknown Affiliation

Published : 3 Documents

Articles

Enhancing speaker verification accuracy with deep ensemble learning and inclusion of multifaceted demographic factors
Palsapure, Pranita Niraj; Rajeswari, Rajeswari; Kempegowda, Sandeep Kumar
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 6: December 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i6.pp6972-6983

Abstract

Effective speaker identification is essential for robust speaker recognition in real-world applications such as mobile devices, security, and entertainment, where high accuracy must be maintained. However, deep learning models trained on large datasets spanning diverse demographic and environmental conditions can suffer increased misclassification and longer processing times. This study proposes incorporating ethnicity and gender information as critical parameters in a deep learning model to enhance accuracy. Two convolutional neural network (CNN) models classify gender and ethnicity, followed by a Siamese deep learning model trained with these critical parameters and additional features for speaker verification. The proposed model was tested on the VoxCeleb 2 database, which includes over one million utterances from 6,112 celebrities. In an evaluation after 500 epochs, the model achieved a notable equal error rate (EER) of 1.68 and minimum decision cost function (minDCF) of 0.10. It outperforms existing deep learning models, with fewer misclassification errors and faster processing times.
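The abstract reports its results as an equal error rate (EER), the operating point where false-accept and false-reject rates coincide. As a rough, hypothetical illustration of how an EER is computed from genuine and impostor trial scores (not the paper's code; the scores below are synthetic), one can sweep candidate thresholds:

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Equal error rate: the point where the false-accept rate (FAR)
    equals the false-reject rate (FRR), found by sweeping every
    observed score as a candidate decision threshold."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostors accepted
        frr = np.mean(genuine_scores < t)     # genuine speakers rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# toy trial scores: genuine trials score high, impostor trials low
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 1000)
impostor = rng.normal(-2.0, 1.0, 1000)
print(f"EER = {compute_eer(genuine, impostor):.3f}")
```

The companion minDCF metric weights misses and false alarms by application-dependent costs before taking the minimum over thresholds; it is omitted here for brevity.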
Discriminative deep learning based hybrid spectro-temporal features for synthetic voice spoofing detection
Palsapure, Pranita Niraj; Rajeswari, Rajeswari; Kempegowda, Sandeep Kumar; Ravikumar, Kumbhar Trupti
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 1: February 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i1.pp130-141

Abstract

Voice-based systems such as speaker identification systems (SIS) and automatic speaker verification (ASV) systems are proliferating across industries such as finance and healthcare owing to their utility in identity verification through analysis of unique speech patterns. Despite these advancements, ASV systems are susceptible to various spoofing attacks, including logical and replay attacks, which are challenging to detect because of the subtle acoustic distinctions between authentic and spoofed voices. To counteract these attacks, this study proposes a robust yet computationally efficient countermeasure system built on a systematic data processing pipeline coupled with a hybrid spectral-temporal learning approach. The aim is to identify effective features that optimize the model's detection accuracy and computational efficiency. The model achieved superior performance, with an accuracy of 99.44% and an equal error rate (EER) of 0.014 in the logical access scenario of the ASVspoof 2019 challenge, demonstrating its accuracy and reliability in detecting spoofing attacks with a minimal error margin.
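The hybrid spectral-temporal features this paper relies on can be sketched in miniature: a spectral branch of per-frame log spectral magnitudes alongside a temporal branch of frame log energy and its delta. The frame length, hop size, and feature choices below are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def spectro_temporal_features(signal, frame_len=400, hop=160):
    """Toy hybrid features: log |FFT| per frame (spectral branch)
    plus log energy and its delta (temporal branch)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hanning(frame_len)               # reduce spectral leakage
    spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))  # spectral features
    energy = np.log1p(np.sum(frames ** 2, axis=1))        # temporal: log energy
    delta = np.gradient(energy)                           # temporal dynamics
    return spec, np.stack([energy, delta], axis=1)

# 1 s of a synthetic 440 Hz tone at 16 kHz as a stand-in utterance
t = np.arange(16000) / 16000.0
spec, temp = spectro_temporal_features(np.sin(2 * np.pi * 440 * t))
print(spec.shape, temp.shape)
```

In practice the two branches would feed a discriminative deep model; the point of the sketch is only that spectral and temporal views of the same frames are produced side by side.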
Deep feature synthesis approach using selective graph attention for replay attack voice spoofing detection
Palsapure, Pranita Niraj; Rajeswari, Rajeswari; Kempegowda, Sandeep Kumar
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 13, No 4: December 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v13.i4.pp4915-4926

Abstract

As voice-based authentication becomes increasingly integrated into security frameworks, establishing effective defenses against voice spoofing, particularly replay attacks, is more crucial than ever. This paper presents a comprehensive framework for replay attack detection that integrates advanced spectral-temporal feature extraction with graph-based feature processing. The system couples a waveform encoder with a novel temporal residual unit so that spectral and temporal features are extracted synchronously. Selective graph attention followed by multi-scale feature synthesis is then employed to retain precise, spoof-indicative feature vectors at the classification layer. The method addresses the significant challenge of distinguishing genuine speech from replayed recordings. Validation on the ASVspoof 2019 dataset demonstrates the efficacy of the approach: the system outperforms existing methods, achieving a lower equal error rate (EER) of 0.015 and a reduced tandem detection cost function (t-DCF) of 0.503. These comparative results exhibit the robustness of the method in identifying replay attacks.
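Selective graph attention, as named in the abstract, can be pictured as attention over a graph of frame-level feature nodes in which each node attends only to its top-k highest-scoring neighbours. The toy numpy version below (dot-product affinities, top-k masking, softmax aggregation) is an assumption-laden illustration, not the authors' implementation:

```python
import numpy as np

def selective_graph_attention(X, k=2):
    """Sketch of selective attention over a fully connected graph of
    feature nodes: score all pairs, keep only each node's top-k
    neighbours, and aggregate neighbour features with softmax weights."""
    scores = X @ X.T                        # pairwise affinities
    np.fill_diagonal(scores, -np.inf)       # no self-attention
    # mask out everything below each row's k-th largest score
    kth = np.sort(scores, axis=1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # row-wise softmax over survivors
    return w @ X                            # attended node features

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))                 # 6 nodes with 4-dim features
out = selective_graph_attention(X, k=2)
print(out.shape)
```

Restricting each node to its strongest neighbours is what makes the attention "selective": weak, potentially noisy edges are pruned before aggregation, which is the intuition the abstract's classification-layer feature retention appeals to.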