Emerging Science Journal
Vol. 10 No. 2 (2026): April

Multimodal Emotion Recognition Using Hybrid Large Language Models and Metaheuristic Algorithms

Andino Maseleno (1. Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand; 2. Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340)
M. Teduh Uliniansyah (Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340)
Agung Santosa (Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340)
Lyla Ruslana Aini (Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340)
Rini Wijayanti (Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340)
Ahmad Fudholi (Pusat Pengajian Citra Universiti, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600)
Chotirat Ann Ratanamahatana (Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok)



Article Info

Publish Date
01 Apr 2026

Abstract

Emotion recognition is a vital component of human–computer interaction and intelligent systems, yet robust multimodal emotion recognition remains challenging due to high-dimensional input spaces, noisy features, and the complexity of integrating heterogeneous modalities. This study proposes a novel hybrid multimodal framework that enhances both accuracy and computational efficiency by combining the semantic representation capability of Large Language Models (LLMs) with the optimization strengths of metaheuristic algorithms. In the proposed approach, an LLM extracts high-level contextual features from text and audio streams, while the Binary Artificial Hummingbird Algorithm (BAHA) performs feature selection to remove redundant attributes. Subsequently, the Goose Algorithm (GA) optimizes classifier hyperparameters, and the Komodo Mlipir Algorithm (KMA) conducts late fusion of the final multimodal outputs. Experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, evaluated on six emotion categories, demonstrate that this hybrid approach successfully captures subtle affective cues and surpasses state-of-the-art baselines, achieving an accuracy of 87.5%. Integrating LLMs with multiple specialized metaheuristics therefore yields a substantially more robust emotion recognition pipeline and represents a promising direction toward the development of more emotionally intelligent systems.
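To make the final fusion stage of the pipeline concrete, the sketch below shows a weighted late fusion of per-modality class probabilities, the kind of operation the abstract attributes to the Komodo Mlipir Algorithm. This is a minimal illustration, not the authors' implementation: the fusion weights here are fixed placeholders, whereas in the paper they would be searched by the metaheuristic; the probability vectors are invented example values over the six emotion categories.

```python
import numpy as np

def late_fusion(text_probs, audio_probs, weights):
    """Weighted late fusion of per-modality class probabilities.

    In the paper's framework these weights would be optimized by KMA;
    here they are hand-set for illustration only.
    """
    fused = weights[0] * text_probs + weights[1] * audio_probs
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalize to a distribution

# Hypothetical classifier outputs over six emotion classes (e.g., the
# six IEMOCAP categories evaluated in the study).
text_probs = np.array([0.60, 0.10, 0.10, 0.10, 0.05, 0.05])
audio_probs = np.array([0.30, 0.40, 0.10, 0.10, 0.05, 0.05])

fused = late_fusion(text_probs, audio_probs, weights=(0.7, 0.3))
print(int(fused.argmax()))  # index of the fused prediction → 0
```

A metaheuristic search would treat `weights` as the decision variables and the validation accuracy of the fused predictions as the fitness to maximize.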

Copyrights © 2026






Journal Info

Abbrev

ESJ

Publisher

Subject

Environmental Science

Description

Emerging Science Journal is not limited to a specific aspect of science and engineering but is instead devoted to a wide range of subfields in engineering and the sciences, and it encourages a broad spectrum of contributions across these fields. Articles of an interdisciplinary nature are ...