Transformer-based architectures and attention mechanisms have substantially advanced image recognition. This study addresses offline handwritten Malayalam word recognition and the lack of publicly available datasets for this low-resource language. A new Malayalam word dataset (MWD) comprising 20,850 samples across 139 classes (an average of 150 samples per class) was developed to support research in this domain. A vision transformer (ViT) was employed for feature extraction, and four recognition models were evaluated: a feed-forward neural network (FFNN), global average pooling (GAP), bidirectional long short-term memory (BiLSTM), and an attention-based feed-forward neural network (AFFNN). Among these, AFFNN achieved the highest accuracy, 98.56%, establishing the proposed vision transformer-based attention handwritten word recognition (ViTA-HWR) model as a robust framework for handwritten Malayalam word recognition and a valuable contribution to regional language processing.
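The abstract does not specify the internals of the AFFNN head, so the following is only a minimal sketch of one common way such a head could work: patch-token features from a ViT are combined by attention pooling and passed to a linear classifier over the 139 word classes. The function names (`attention_pool`, `gap_pool`, `classify`), the single learned query vector, the toy feature dimension, and the count of 196 patch tokens are all illustrative assumptions, not details taken from the study. Random numbers stand in for real ViT features and trained weights.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens, query):
    """Weight each token by softmax(token . query); return (weighted sum, weights)."""
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    pooled = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


def gap_pool(tokens):
    """Global average pooling baseline: unweighted mean of the tokens."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def classify(features, class_weights, class_bias):
    """Single linear layer followed by softmax over the classes."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(class_weights, class_bias)
    ]
    return softmax(logits)


# Toy stand-ins (assumptions): 196 patch tokens, feature dimension 8.
# The class count of 139 matches the MWD dataset described above.
random.seed(0)
NUM_TOKENS, DIM, NUM_CLASSES = 196, 8, 139
tokens = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
query = [random.gauss(0.0, 1.0) for _ in range(DIM)]  # would be learned in practice
class_weights = [[random.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
class_bias = [0.0] * NUM_CLASSES

pooled, attn = attention_pool(tokens, query)
probs = classify(pooled, class_weights, class_bias)
predicted_class = max(range(NUM_CLASSES), key=lambda c: probs[c])
```

In this sketch, attention pooling differs from the GAP baseline only in how tokens are weighted: when the query is all zeros, every token receives the same weight and the two pooling functions return the same vector. A trained query can instead emphasize the patches that carry the most information.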
Copyrights © 2026