Face detection is a critical foundation of video-based drowsiness monitoring systems because all downstream tasks, such as eye-closure estimation, yawning detection, and head movement analysis, depend entirely on correctly identifying the face region. Many previous studies rely on detector-generated outputs as ground truth, which can introduce bias and inflate model performance. To avoid this limitation, I manually constructed a ground-truth dataset of 1,229 frames extracted from 129 yawning and microsleep videos in the NITYMED dataset. Ten representative frames were sampled from each video using a face-guided extraction script, and all frames were manually annotated in Roboflow following the COCO format to ensure accurate bounding-box labeling under varying lighting, head poses, and facial deformation. Using this manually annotated dataset, I conducted a comprehensive benchmark of seven face-detection algorithms: YOLOv11n, SSD MobileNet, CenterFace, YuNet, FastMtCnn, HaarCascade, and LBP. The evaluation focused on localization quality using Intersection over Union (IoU ≥ 0.5) and Dice similarity, allowing each algorithm's predicted bounding box to be compared directly against human-defined ground truth. The results show that HaarCascade achieved the highest IoU and Dice scores, particularly in frontal and well-lit frames. FastMtCnn also produced strong alignment, with a high number of correctly matched frames. CenterFace and SSD MobileNet demonstrated smooth bounding-box fitting with competitive Dice scores, while YOLOv11n and YuNet delivered moderate but stable performance across most samples. LBP showed the weakest results, mainly due to its sensitivity to lighting variations and soft-texture regions. Overall, this benchmark provides an unbiased and comprehensive comparison of modern and classical face-detection algorithms for video-based driver-drowsiness applications.
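The two localization metrics used in the evaluation can be sketched as follows. This is a minimal illustration of box-level IoU and Dice similarity, assuming axis-aligned boxes in (x1, y1, x2, y2) format; the function name, box format, and threshold-matching helper are illustrative assumptions, not taken from the study's actual code.

```python
def box_iou_dice(a, b):
    """Return (IoU, Dice) for two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    iou = inter / union if union > 0 else 0.0
    dice = 2.0 * inter / (area_a + area_b) if (area_a + area_b) > 0 else 0.0
    return iou, dice


def is_correct_match(pred_box, gt_box, iou_threshold=0.5):
    """Count a prediction as correct when IoU meets the 0.5 threshold."""
    iou, _ = box_iou_dice(pred_box, gt_box)
    return iou >= iou_threshold
```

For example, a predicted box shifted halfway off a same-sized ground-truth box yields IoU = 1/3 and Dice = 0.5, so it would still count as a miss under the IoU ≥ 0.5 criterion while Dice sits exactly at the midpoint.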
Copyright © 2025