Buletin Ilmiah Sarjana Teknik Elektro
Vol. 7 No. 4 (2025): December

Enhancing Facial Emotion Recognition on FER2013 Using Attention-based CNN and Sparsemax-Driven Class-Balanced Architectures

Suwartono, Christiany
Bata, Julius Victor Manuel
Airlangga, Gregorius



Article Info

Publish Date
03 Dec 2025

Abstract

Facial emotion recognition plays a critical role in many human–computer interaction applications, yet it remains challenging because of class imbalance, label noise, and subtle visual similarities between classes. The FER2013 dataset, which contains seven emotion classes, is particularly difficult because of its low resolution and heavily skewed label distribution. This study compares advanced deep learning architectures against traditional machine-learning baselines on FER2013 to address these challenges and improve recognition performance. Two novel architectures are proposed. The first is an attention-based convolutional neural network (CNN) that integrates Mish activations and squeeze-and-excitation (SE) channel recalibration to enhance the discriminative capacity of intermediate features. The second, FastCNN-SE, is a refined extension designed for computational efficiency and minority-class robustness; it incorporates Sparsemax activation, Poly-Focal loss, class-balanced reweighting, and MixUp augmentation. The contribution of this research is to demonstrate how combining attention, sparse activations, and imbalance-aware learning improves FER performance under challenging real-world conditions. The two models were evaluated under different protocols: the attention-based CNN under 10-fold cross-validation, achieving 0.6170 accuracy and 0.555 macro-F1, and FastCNN-SE on the held-out test set, achieving 0.5960 accuracy and 0.5138 macro-F1. Both deep models substantially outperform the PCA-based Logistic Regression, Linear SVC, and Random Forest baselines (≤0.37 accuracy and ≤0.29 macro-F1). The differing evaluation protocols are justified by emphasizing cross-validation for architectural stability and held-out testing for generalization. FastCNN-SE contains ~3M parameters, enabling efficient inference. These findings demonstrate that architecture-level fusion of SE attention, Sparsemax, and Poly-Focal loss improves balanced emotion recognition, offering a strong foundation for future studies on efficient and robust affective-computing systems.
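For readers unfamiliar with the components named in the abstract, the following is a minimal, dependency-free sketch of three of them: the Mish activation, the Sparsemax projection, and the macro-F1 metric. It is not the authors' implementation. The function names (`mish`, `sparsemax`, `macro_f1`) and the toy inputs are assumptions made for illustration, and the formulas follow the standard published definitions of these techniques rather than anything stated in this abstract.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), with a numerically stable softplus."""
    softplus = math.log1p(math.exp(-abs(x))) + max(x, 0.0)
    return x * math.tanh(softplus)


def sparsemax(logits):
    """Project logits onto the probability simplex (standard Sparsemax definition).

    Unlike softmax, the result can contain exact zeros.
    """
    z = sorted(logits, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # k belongs to the support while 1 + k * z_(k) > sum of the top-k logits;
        # the last k that satisfies this fixes the threshold tau.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(v - tau, 0.0) for v in logits]


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Hypothetical toy data: a classifier that always predicts the majority class.
toy_true = [0, 0, 0, 1]
toy_pred = [0, 0, 0, 0]
toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_macro_f1 = macro_f1(toy_true, toy_pred, num_classes=2)
toy_probs = sparsemax([1.0, 0.8, 0.1])
```

On the toy labels, accuracy is 0.75 while macro-F1 is about 0.43, because the missed minority class contributes an F1 of zero. The same effect is one plausible reason the macro-F1 values reported above sit below the corresponding accuracies on a skewed dataset such as FER2013.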

Copyrights © 2025






Journal Info

Abbrev

biste

Publisher

Subject

Electrical & Electronics Engineering

Description

Buletin Ilmiah Sarjana Teknik Elektro (BISTE) is an open-access national journal managed by the Electrical Engineering Study Program, Faculty of Industrial Technology, Universitas Ahmad Dahlan. BISTE is a journal intended for undergraduate Electrical Engineering students. Scope ...