Developing an automatic Indonesian Sign Language (BISINDO) translation system for mobile devices faces two major challenges: high computational cost and variability in signing style across individuals. This study proposes a lightweight approach that combines skeletal feature extraction with MediaPipe Holistic and a recurrent neural network (RNN). Specifically, it compares the performance of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures in recognizing 12 classes of dynamic sign words. Unlike most previous studies, which split the data randomly, this research applies a Leave-One-Subject-Out (LOSO) validation scheme, in which each signer is held out in turn as the test set, to assess how well the models generalize to unseen users. The experiments reveal a large performance gap between the two architectures: the LSTM model generalizes poorly, reaching an accuracy of only 40.34%, whereas the GRU model reaches 73.95%. GRU is also more resource-efficient, with a model size of 0.83 MB (22% smaller than LSTM), 24% fewer parameters, and a stable inference speed of 13–14 FPS. The study concludes that GRU is the more effective and efficient of the two architectures for building robust BISINDO recognition systems on resource-constrained devices.
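The LOSO scheme mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the function name `loso_splits`, the signer labels, and the six-clip toy dataset are hypothetical and serve only to show how each fold holds out every clip from one signer. scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the signer who produced sample i. In each fold, every
    sample from one signer forms the test set and all remaining samples form
    the training set, so the model is never tested on a signer it has seen.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = sorted(
            i
            for sid, indices in by_subject.items()
            if sid != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: six clips recorded by three signers.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(loso_splits(subjects))

for held_out, train_idx, test_idx in folds:
    print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

With a random split, clips from the same signer can land in both the training and test sets, which tends to inflate accuracy; the per-signer folds above avoid that overlap by construction.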
Copyrights © 2026