Facial expression recognition is a fundamental component of artificial intelligence systems, particularly in human–machine interaction. However, achieving robust detection accuracy remains challenging due to variations in lighting, facial orientation, and limited training data diversity. While recent lightweight YOLO architectures—YOLOv8n, YOLOv10n, and YOLO11n—have demonstrated strong performance in general object detection, comparative studies evaluating these models specifically for facial expression detection remain limited. This study addresses this gap by systematically comparing these three nano-variant models on a dataset of 2,000 labeled facial images across four expression categories: flat face, angry, sad, and smile. The dataset was divided into training (70%), validation (20%), and test (10%) subsets. Experiments were conducted under two scenarios—with and without data augmentation—using identical training configurations. Augmentation techniques included mosaic composition, HSV variation, geometric transformations, and flipping. Results show that augmentation improved the F1 score of YOLOv10n from 0.68 to 0.72 and YOLO11n from 0.65 to 0.72, with the latter achieving the highest overall precision of 0.82. YOLOv8n exhibited stable performance with an F1 score of 0.75 under both conditions. Confidence threshold optimization revealed distinct optimal operating points for each model, ranging from 0.1 to 0.6, confirming that per-model threshold tuning is necessary to maximize detection performance. These findings provide practical guidance for selecting and configuring lightweight YOLO models for facial expression detection in resource-constrained environments.
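The per-model confidence-threshold optimization described above can be sketched as a simple grid search: for each candidate threshold in the reported 0.1–0.6 range, discard detections below it, recompute precision and recall, and keep the threshold that maximizes F1. This is a minimal illustrative sketch, not the study's implementation; the function names and the toy detection data are assumptions.

```python
# Hypothetical sketch of per-model confidence-threshold tuning.
# A "detection" here is a (confidence, is_true_positive) pair, e.g. as
# obtained by matching model predictions against validation labels.

def f1_at_threshold(detections, threshold):
    """F1 score when detections below `threshold` are discarded.
    Discarded true positives are counted as false negatives."""
    tp = sum(1 for conf, correct in detections if conf >= threshold and correct)
    fp = sum(1 for conf, correct in detections if conf >= threshold and not correct)
    fn = sum(1 for conf, correct in detections if conf < threshold and correct)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(detections, candidates=None):
    """Return (threshold, f1) maximizing F1 over a candidate grid."""
    if candidates is None:
        candidates = [t / 10 for t in range(1, 7)]  # 0.1, 0.2, ..., 0.6
    return max(((t, f1_at_threshold(detections, t)) for t in candidates),
               key=lambda pair: pair[1])

# Illustrative synthetic detections for one model (not real study data):
toy = [(0.9, True), (0.8, True), (0.55, False),
       (0.4, True), (0.3, False), (0.2, False)]
threshold, f1 = best_threshold(toy)  # each model gets its own optimum
```

Because each model's score distribution differs, running this sweep separately per model yields the distinct optimal operating points the study reports.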
Copyright © 2026