Articles


Enhancing speaker verification accuracy with deep ensemble learning and inclusion of multifaceted demographic factors
Palsapure, Pranita Niraj; Rajeswari, Rajeswari; Kempegowda, Sandeep Kumar
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 6: December 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i6.pp6972-6983

Abstract

Effective speaker identification is essential for robust, accurate speaker recognition in real-world applications such as mobile devices, security, and entertainment. However, deep learning models trained on large datasets with diverse demographic and environmental factors may suffer from increased misclassification and longer processing times. This study proposes incorporating ethnicity and gender information as critical parameters in a deep learning model to enhance accuracy. Two convolutional neural network (CNN) models classify gender and ethnicity, followed by a Siamese deep learning model trained with these critical parameters and additional features for speaker verification. The proposed model was tested using the VoxCeleb 2 database, which includes over one million utterances from 6,112 celebrities. In an evaluation after 500 epochs, the model achieved an equal error rate (EER) of 1.68 and a minimum decision cost function (minDCF) of 0.10. The proposed model outperforms existing deep learning models, with fewer misclassification errors and faster processing times.
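
The equal error rate quoted in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a value could be approximated from verification scores; the function name, the threshold sweep, and the toy scores are illustrative assumptions, not the authors' evaluation code.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    Assumption: higher scores mean "more likely the same speaker".
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```

With only a handful of trials the two error rates rarely cross exactly, so the sketch reports the midpoint at the closest threshold; evaluation toolkits typically interpolate along the error curve instead.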

Discriminative deep learning based hybrid spectro-temporal features for synthetic voice spoofing detection
Palsapure, Pranita Niraj; Rajeswari, Rajeswari; Kempegowda, Sandeep Kumar; Ravikumar, Kumbhar Trupti
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 1: February 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i1.pp130-141

Abstract

Voice-based systems such as speaker identification systems (SIS) and automatic speaker verification (ASV) systems are proliferating across industries such as finance and healthcare because they verify identity by analyzing unique speech patterns. Despite these advances, ASV systems are susceptible to various spoofing attacks, including logical access and replay attacks, which are challenging to detect because of the sophisticated acoustic distinctions between authentic and spoofed voices. To counter these attacks, this study proposes a robust yet computationally efficient countermeasure system that combines a systematic data processing pipeline with a hybrid spectral-temporal learning approach. The aim is to identify effective features that optimize the model's detection accuracy and computational efficiency. The model achieved an accuracy of 99.44% and an equal error rate (EER) of 0.014 in the logical access scenario of the ASVspoof 2019 challenge, demonstrating accurate and reliable detection of spoofing attacks with a minimal error margin.

Deep feature synthesis approach using selective graph attention for replay attack voice spoofing detection
Palsapure, Pranita Niraj; Rajeswari, Rajeswari; Kempegowda, Sandeep Kumar
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 13, No 4: December 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v13.i4.pp4915-4926

Abstract

As voice-based authentication becomes increasingly integrated into security frameworks, establishing effective defenses against voice spoofing, particularly replay attacks, is more crucial than ever. This paper presents a comprehensive framework for replay attack detection that integrates advanced spectral-temporal feature extraction with graph-based feature processing. The proposed system comprises a waveform encoder and a novel temporal residual unit that extract spectral and temporal features synchronously. A selective attention graph followed by multi-scale feature synthesis is then employed to retain precise, spoof-indicative feature vectors at the classification layer. The proposed method addresses the significant challenge of distinguishing genuine speech from replayed recordings. The model is validated on the ASVspoof 2019 dataset. The proposed system outperforms existing methods, achieving a lower equal error rate (EER) of 0.015 and a reduced tandem detection cost function (t-DCF) of 0.503. The comparative results show the robustness of the method in identifying replay attacks.

Serial parallel dataflow-pipelined processing architecture based accelerator for 2D transform-quantization in video coder and decoder
Shivarudraiah, Sumalatha; Rajeswari, Rajeswari
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 1: February 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i1.pp798-809

Abstract

The video coder and decoder (CODEC) standards, from MPEG-4 to the recent versatile video coding (VVC), adopt lossy compression methodologies, which involve transformation, quantization, and entropy coding. The growing use of video data in all means of communication demands more bandwidth and storage. In compression with redundancy removal by transform coefficient coding, the focal point is the crucial sequential data flow and the data processing structures. Keeping block-wise data close to the processing unit before and after computation reduces the time the processing unit spends waiting for data, and hence accelerates the targeted functionality. The proposed serial parallel data-flow pipelined processing architecture (SPDPA) accelerates the processing unit through on-chip data availability, parallel data access, and pipelined transformation, data transpose, and quantization operations. The post-implementation results of the architecture, targeted to 16 nm and 28 nm field programmable gate arrays (FPGAs), show a trade-off between power and operating frequency for various block sizes. The design targeted to 16 nm operates at higher frequencies with an average power consumption of 0.64 W, whereas the 28 nm FPGA consumes a lower average power of 0.15 W.
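
The sequence of transformation, data transpose, and quantization named in this abstract can be sketched in software as a separable 2D transform followed by scalar quantization. The sketch below is only a functional illustration under stated assumptions: the 4x4 integer matrix is the H.264-style core transform, chosen for brevity, and the quantizer is a plain uniform one. Neither is taken from the paper, and the hardware pipelining and on-chip buffering are not modelled.

```python
# Assumed 4x4 integer transform matrix (H.264-style core transform),
# used here only for illustration.
C = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def transform_rows(block):
    """Apply the 1D transform to every row of the block (computes X * C^T)."""
    return [[sum(C[k][n] * row[n] for n in range(4)) for k in range(4)]
            for row in block]


def transpose(block):
    """Swap rows and columns so the next row pass works on former columns."""
    return [list(col) for col in zip(*block)]


def quantize(block, step):
    """Uniform scalar quantization, truncating toward zero."""
    return [[int(v / step) for v in row] for row in block]


def transform_quantize(block, step):
    """Row pass, transpose, row pass, transpose back, then quantize.

    The two row passes around a transpose compute C * X * C^T.
    """
    stage1 = transform_rows(block)
    stage2 = transform_rows(transpose(stage1))
    coefficients = transpose(stage2)
    return quantize(coefficients, step)


# A flat block has only a DC coefficient after the transform.
flat = [[10] * 4 for _ in range(4)]
print(transform_quantize(flat, step=8))
```

Expressing the 2D transform as two identical row passes around a transpose is what lets a single 1D datapath be reused, which is the property a pipelined hardware design of this kind would typically exploit.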