Raut, Karishma
Unknown Affiliation

Published: 1 document
Multimodal perception for enhancing human computer interaction through real-world affect recognition
Raut, Karishma; Kulkarni, Sujata; Sawant, Ashwini
Indonesian Journal of Electrical Engineering and Computer Science Vol 38, No 1: April 2025
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v38.i1.pp428-438

Abstract

Human-Computer Interaction can benefit from real-world affect recognition in applications such as healthcare and assistive robotics. Humans express emotions through various modalities, with audio-visual cues being the most significant. A unimodal approach, relying on speech or visual input alone, is unreliable in natural, dynamic environments. The proposed methodology integrated a pretrained model with a convolutional neural network (CNN) to provide a robust initialization point and to address the limited availability of facial expression data. The multimodal framework enhances discriminative power by combining visual scores with speech scores. This work addresses the challenges at each stage of the real-world affect recognition framework: data preprocessing, feature extraction, feature fusion, and final classification. A 1D-CNN is trained on spectral and prosodic audio features, while deep visual features are processed by a 2D-CNN. The proposed system's performance was evaluated on the Extended Cohn-Kanade (CK+), Acted Facial Expressions in the Wild (AFEW), and Real-world Affective Faces (RAF) datasets, which are commonly used in facial expression recognition research. Experimental results indicate that 2% to 5% of visual data from natural settings went undetected, and that including the audio modality improved performance by providing relevant, supplementary information.
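The abstract describes combining the visual scores from a 2D-CNN with the speech scores from a 1D-CNN. A minimal sketch of such score-level fusion is shown below; the emotion label set, the fusion weight, and the weighted-average rule are illustrative assumptions, since the abstract does not publish the exact fusion formula.

```python
import numpy as np

# Illustrative label set; the paper's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad", "surprise"]

def fuse_scores(visual_probs, audio_probs, w_visual=0.6):
    """Weighted score-level fusion of per-class probabilities.

    The weight w_visual and the renormalization are assumptions for
    illustration, not the paper's published fusion rule.
    """
    visual_probs = np.asarray(visual_probs, dtype=float)
    audio_probs = np.asarray(audio_probs, dtype=float)
    fused = w_visual * visual_probs + (1.0 - w_visual) * audio_probs
    return fused / fused.sum()  # keep the result a valid distribution

# Hypothetical softmax outputs: visual from a 2D-CNN, audio from a 1D-CNN.
visual = [0.10, 0.55, 0.20, 0.10, 0.05]
audio = [0.05, 0.30, 0.15, 0.45, 0.05]

fused = fuse_scores(visual, audio)
predicted = EMOTIONS[int(np.argmax(fused))]
```

When the visual stream is degraded (the abstract notes 2% to 5% of in-the-wild visual data went undetected), the same rule can be applied with a lower `w_visual`, letting the audio scores dominate the decision.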