The Indonesian Journal of Computer Science
Vol. 14 No. 6 (2025): The Indonesian Journal of Computer Science

Text Prediction Standards for Modelling Under-Resourced Languages: A Shona Case Study

Chibaya, Colin (Unknown)



Article Info

Publish Date
07 Oct 2025

Abstract

Text prediction is a critical natural language processing task that serves as a writing aid. However, current text prediction models offer little support for under-resourced languages. We review research on text prediction standards with a view to modelling under-resourced languages. To this end, 806 studies were screened, of which 59 were retained as relevant. Key findings indicate the prevalence of N-gram, BERT, and LSTM models. We also identified a gap in the literature that helps explain why under-resourced languages lag behind: data scarcity and domain specificity are the main obstacles to progress in under-resourced language modelling. A potential solution is transfer learning, for example cross-lingual model pre-training, which can mitigate data scarcity. We observed that N-gram models stand out in this respect: they are the most widely used text prediction approach and yield outstanding perplexity scores. Our Shona text prediction prototype based on an RNN model achieved 83.69% accuracy with a perplexity score of 4.825. Notably, the N-gram-based prototype outperformed the RNN model in all measured categories. The development of text prediction standards is likely to influence text prediction accuracy in under-resourced languages. We hope that research on under-resourced languages can draw insights from these standards and explore the development of tailored solutions.
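
For readers unfamiliar with the two ideas the abstract relies on, N-gram prediction and perplexity, the following is a minimal sketch of a bigram model with add-one smoothing. It is not the authors' prototype. The placeholder tokens, the function names, and the choice of add-one smoothing are all assumptions made for illustration; a real Shona corpus and tokeniser would replace the toy data.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences (hypothetical helper)."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        for prev, nxt in zip(padded, padded[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams

def prob(prev, nxt, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(nxt | prev)."""
    vocab_size = len(unigrams)
    following = bigrams.get(prev, Counter())
    return (following[nxt] + 1) / (sum(following.values()) + vocab_size)

def predict_next(prev, bigrams):
    """Return the most frequent follower of prev, or None if prev was never seen."""
    following = bigrams.get(prev)
    if not following:
        return None
    return following.most_common(1)[0][0]

def perplexity(sentences, unigrams, bigrams):
    """exp of the average negative log-probability per predicted token."""
    log_sum, count = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            log_sum += math.log(prob(prev, nxt, unigrams, bigrams))
            count += 1
    return math.exp(-log_sum / count)

# Placeholder tokens stand in for real Shona words (assumption).
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
unigrams, bigrams = train_bigram(corpus)
print(predict_next("a", bigrams))
print(perplexity(corpus, unigrams, bigrams))
```

Lower perplexity means the model assigns higher probability to the observed text, which is why the abstract reports it alongside accuracy.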

Copyrights © 2025






Journal Info

Abbrev

ijcs

Publisher

Subject

Computer Science & IT; Electrical & Electronics Engineering; Engineering

Description

The Indonesian Journal of Computer Science (IJCS) is a bimonthly peer-reviewed journal published by AI Society and STMIK Indonesia. IJCS editions will be published at the end of February, April, June, August, October and December. The scope of IJCS includes general computer science, information ...