Face-based biometric identification systems are significantly limited when a subject's face is covered, whether by masks, whose use became widespread with the COVID-19 pandemic, or by face veils worn for cultural and religious reasons. This creates real security gaps, as illustrated by the gender-disguise infiltration incident at Masjid Jannatul Firdaus in Makassar. In such situations, the eyes are the only biometric feature that remains consistently exposed. This study proposes applying a Vision Transformer (ViT-B/16) pretrained on ImageNet-21K, with a progressive fine-tuning strategy based on the discriminative learning rate principle, to classify gender from eye images. The Female and Male Eyes dataset from Kaggle consists of 11,525 eye images, divided into training (64%), validation (16%), and testing (20%) sets. Experiments were conducted in two series: Series B varied the number of unfrozen transformer blocks (0–6), and Series C varied the discriminative learning rate ratio between the classifier and the encoder (5:1, 10:1, 3:1). The optimal configuration, with 6 unfrozen blocks and a 3:1 ratio, achieved 95.70% accuracy, 97.67% precision, 92.69% recall, and a weighted F1-score of 0.9569, surpassing MobileNet (93.90%) and K-Nearest Neighbor (68.81%). These results indicate that ViT with discriminative fine-tuning is effective for gender classification from eye images and has potential for biometric security applications.
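As a rough illustration of the setup described above, the following plain-Python sketch computes the split sizes and lays out which encoder blocks would be trainable under the reported best configuration. Several details are assumptions not stated in the abstract: that the 64/16/20 split comes from an 80/20 split applied twice, that the unfrozen blocks are the last ones in the encoder, and the encoder learning rate of 1e-5. The function names `split_sizes` and `lr_plan` are hypothetical. The only fixed fact used here is that ViT-B/16 has 12 transformer blocks.

```python
TOTAL_IMAGES = 11_525
NUM_BLOCKS = 12  # ViT-B/16 encoder depth


def split_sizes(total, test_pct=20, val_pct_of_rest=20):
    """Return (train, val, test) counts.

    Assumption: hold out 20% for testing, then 20% of the remainder
    for validation, which gives 64% / 16% / 20% overall.
    """
    test = total * test_pct // 100
    rest = total - test
    val = rest * val_pct_of_rest // 100
    train = rest - val
    return train, val, test


def lr_plan(unfrozen_blocks, ratio, encoder_lr):
    """Map each component to its learning rate (None means frozen).

    Assumption: the *last* `unfrozen_blocks` blocks are trainable.
    The classifier head trains at `ratio` times the encoder rate.
    """
    first_trainable = NUM_BLOCKS - unfrozen_blocks
    plan = {}
    for i in range(NUM_BLOCKS):
        plan[f"block_{i}"] = encoder_lr if i >= first_trainable else None
    plan["classifier"] = encoder_lr * ratio
    return plan


if __name__ == "__main__":
    print(split_sizes(TOTAL_IMAGES))  # (7376, 1844, 2305)
    # Reported best configuration: 6 unfrozen blocks, 3:1 ratio.
    # The encoder learning rate 1e-5 is a placeholder, not from the study.
    for name, lr in lr_plan(6, 3, 1e-5).items():
        print(name, "frozen" if lr is None else lr)
```

In a deep learning framework, a plan like this would typically be turned into per-group optimizer settings, with frozen blocks excluded from gradient updates.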
Copyright © 2026