This study proposes a Landmark-to-Image Conversion (L2IC) approach integrated with the MobileViT-XXS architecture for Indonesian Sign Language (BISINDO) alphabet recognition. The method converts 42 hand keypoints, extracted with MediaPipe Hands, into normalized 224×224 grayscale images so that spatial hand patterns are captured more effectively. These L2IC representations are then used as input to the MobileViT-XXS model, which is trained for 30 epochs with a learning rate of 0.001. Experimental results show that the model achieves 97.98% on both accuracy and macro F1-score, outperforming baseline approaches that use raw RGB images and MLP-based classification of numerical keypoints. These findings indicate that the L2IC representation captures the essential spatial information needed for static BISINDO hand gesture classification. The model performs strongly in controlled offline experiments, but further evaluation is required to assess its robustness under real-world, dynamic BISINDO usage and its suitability for deployment on resource-limited devices.
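The conversion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how keypoints are normalized or drawn, so the sketch assumes min-max normalization with a preserved aspect ratio and renders each keypoint as a small filled square. The function name `landmarks_to_image` and the `margin` and `radius` parameters are hypothetical. The 42 keypoints are assumed to be 21 landmarks per hand for two hands, since MediaPipe Hands reports 21 landmarks per detected hand.

```python
import numpy as np


def landmarks_to_image(keypoints, size=224, margin=8, radius=2):
    """Rasterize (N, 2) keypoints into a size x size grayscale image.

    Assumed scheme (not taken from the paper): min-max normalize the
    coordinates with one shared scale, center them on the canvas, and
    draw each keypoint as a filled square of side 2 * radius + 1.
    """
    pts = np.asarray(keypoints, dtype=np.float64)
    if pts.ndim != 2 or pts.shape[1] != 2:
        raise ValueError("expected an array of shape (N, 2)")

    lo = pts.min(axis=0)
    span = pts.max(axis=0) - lo
    scale = span.max()
    if scale == 0:  # all keypoints coincide; avoid division by zero
        scale = 1.0

    # One scale for both axes keeps the hand's aspect ratio; the offset
    # centers the shorter axis inside the unit square.
    norm = (pts - lo) / scale + (1.0 - span / scale) / 2.0

    # Map [0, 1] to pixel indices, leaving `margin` pixels at each border.
    usable = size - 1 - 2 * margin
    px = np.rint(margin + norm * usable).astype(int)

    img = np.zeros((size, size), dtype=np.uint8)
    for x, y in px:
        img[max(y - radius, 0):y + radius + 1,
            max(x - radius, 0):x + radius + 1] = 255
    return img


# Usage with synthetic data standing in for 42 (x, y) hand keypoints.
rng = np.random.default_rng(0)
demo_keypoints = rng.random((42, 2))
image = landmarks_to_image(demo_keypoints)
print(image.shape, image.dtype)  # (224, 224) uint8
```

Because the normalization depends only on the bounding box of the keypoints, the resulting image is largely insensitive to where the hands appear in the camera frame and how large they are, which is one plausible reason an image of landmarks could generalize better than raw RGB input.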
Copyrights © 2025