Abdussalam Abdussalam
Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

Interpretable and Statistically Validated Comparative Evaluation of EfficientNetB0, MobileNetV2, and ResNet50 for Bold and Natural Makeup Classification on CelebA
Aurelia Chiara Suryabangun; Abdussalam Abdussalam
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 3 (2026): JUTIF Volume 7, Number 3, June 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.3.5806

Abstract

Facial makeup classification plays a critical role in beauty technology, visual style analysis, and intelligent web-based image inference. Distinguishing bold makeup from natural makeup is challenging due to subtle visual overlap, borderline facial appearance, and inconsistent makeup intensity across images. While numerous prior studies have applied deep learning to facial analysis, most focus solely on conventional performance metrics without addressing statistical validation, probability calibration, or interpretability, a gap that limits reliable model selection in visually subtle classification tasks. This study presents an interpretable and statistically validated comparative evaluation of three transfer learning architectures (EfficientNetB0, MobileNetV2, and ResNet50) for binary makeup classification using a curated CelebA-based dataset. The final dataset comprises 12,000 facial images equally divided into natural_makeup and bold_makeup classes, with separate training, validation, and clean test subsets. Models were evaluated using holdout testing, 10-fold cross-validation, McNemar statistical testing, calibration analysis, confidence intervals, ROC and PR curves, and Grad-CAM visualization. Experimental results show that EfficientNetB0 achieved the best overall performance, with 0.7900 Accuracy, 0.7898 Macro-F1, 0.8829 ROC-AUC, and 0.8461 PR-AUC on the clean holdout test set. Across 10-fold cross-validation, EfficientNetB0 further achieved 0.7801 ± 0.0093 Accuracy and 0.8780 ± 0.0090 ROC-AUC. It also demonstrated the strongest calibration performance, with the lowest Expected Calibration Error (ECE = 0.0558) and Brier Score (0.1449) among all compared models. The selected model was further implemented in a FastAPI-based backend system for web-based prediction.
From a broader Informatics and Computer Science perspective, this study contributes a rigorous and reproducible evaluation framework that integrates statistical validation, calibration assessment, and interpretability, enabling more reliable model selection in visually subtle facial analysis tasks and supporting practical deployment in intelligent systems.
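
As a rough illustration of two of the evaluation steps named in the abstract, the sketch below computes a McNemar test on paired predictions and two calibration metrics (Brier score and ECE) for a binary classifier. It is not the authors' code. The function names are hypothetical, and several choices are assumptions because the abstract does not specify them: the exact (binomial) variant of McNemar's test, ten equal-width confidence bins for ECE, and a 0.5 decision threshold.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions of models A and B.

    Returns (b, c, p_value), where b counts samples only A got right and
    c counts samples only B got right. Assumption: exact binomial variant.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree in correctness
    k = min(b, c)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


def brier_score(y_true, prob_pos):
    """Mean squared difference between predicted P(class 1) and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, prob_pos)) / len(y_true)


def expected_calibration_error(y_true, prob_pos, n_bins=10):
    """ECE over equal-width bins of the predicted-class confidence.

    Assumptions: binary labels in {0, 1}, threshold 0.5, n_bins = 10.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, prob_pos):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == y))
    total = len(y_true)
    ece = 0.0
    for items in bins:
        if items:
            avg_conf = sum(conf for conf, _ in items) / len(items)
            accuracy = sum(ok for _, ok in items) / len(items)
            ece += len(items) / total * abs(accuracy - avg_conf)
    return ece
```

In this sketch, a small McNemar p-value would suggest that two models differ in which test images they classify correctly, while lower Brier and ECE values indicate predicted probabilities that track observed outcomes more closely. Results reported in the abstract may have been computed with different binning or test variants.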