Journal of Technology Informatics and Engineering

Vol. 4 No. 3 (2025): December

Orinos, Nasios (Unknown)
Onola, Quedevo (Unknown)
Chistoff, Ong Ben (Unknown)

Publish Date
05 Dec 2025

Document classification in low-resource languages remains a critical challenge because annotated datasets, language-specific resources, and linguistic tools are scarce. This study investigates the effectiveness of zero-shot learning (ZSL) for multilingual document classification, focusing on three low-resource Southeast Asian languages: Javanese, Sundanese, and Malay. We adopt a zero-shot cross-lingual transfer approach, using English-labeled data as the source domain and evaluating on unseen target-language documents without any supervised fine-tuning on the target languages. We employ two state-of-the-art multilingual transformer models, XLM-RoBERTa (XLM-R) and Multilingual T5 (mT5), and assess how well they generalize across linguistically distant languages. Experimental results show that XLM-R achieves higher average accuracy (≈78%) and F1 score (≈0.76) than mT5 (≈74% accuracy, 0.72 F1), demonstrating stronger transferability and stability. Both models exhibit efficient inference speed and manageable computational costs, indicating potential for deployment in resource-constrained environments. The findings provide an early benchmark for zero-shot multilingual document classification in Southeast Asian languages and highlight the feasibility of inclusive NLP systems that bridge the data gap for underrepresented linguistic communities.

Journal Info

Journal: Journal of Technology Informatics and Engineering
Abbrev: jtie
Publisher: Universitas Sains Dan Teknologi Komputer
Subject: Computer Science & IT
Description: Power Engineering, Telecommunication Engineering, Computer Engineering, Control and Computer Systems, Electronics, Information technology, Informatics, Data and Software engineering, Biomedical ...

Zero-Shot Learning For Multilingual Document Classification In Low-Resource Languages
