Journal of Information Systems Engineering and Business Intelligence
Vol. 10 No. 2 (2024): June

Text Stemming and Lemmatization of Regional Languages in Indonesia: A Systematic Literature Review

Abidin, Zaenal (Unknown)
Junaidi, Akmal (Unknown)
Wamiliana (Unknown)



Article Info

Publish Date
28 Jun 2024

Abstract

Background: Stemming is significantly essential in natural language processing (NLP) due to the ability to minimize word variations to fundamental forms. This procedure facilitates the analysis of textual data and enhances the precision of classification and information retrieval. Objective: Previous related systematic literature review has not been conducted on stemming and lemmatization in regional languages in Indonesia. Therefore, this study aims to conduct a systematic literature review to capture the latest developments in stemming and lemmatization in regional languages in Indonesia. Methods: This study was carried out using Kitchenham method, analyzing 35 studies extracted from 740, which were obtained from Scopus, IEEE Xplore, and Google Scholar, and published between 2014 and 2023. Results: The results showed that study trends in stemming possessed the potential to continue developing every year. Additionally, the main element in stemming and lemmatization studies was found to be the availability of digital dictionaries in regional languages. This was because greater number of basic vocabularies contributed more positively to stemming or lemmatization. The availability of word morphology information in regional languages would be constructive for making rule-based stemmers. Meanwhile, corpus-based stemming and lemmatization studies could only be conducted for languages with a large corpus to ensure there were various affixed words to process. Conclusion: Based on SLR study, stemming and lemmatization in regional languages in Indonesia developed significantly from 2014 to 2023. The two main strategies applied included using available digital dictionaries and language morphology information. However, the main challenges encountered were the limited number of vocabulary words in the dictionaries and testing various rule-based methods.   Keywords: Lemmatization, Morphology, Rule-based, Stemming, Systematic Literature Review.  

Copyrights © 2024






Journal Info

Abbrev

JISEBI

Publisher

Subject

Computer Science & IT

Description

Jurnal ini menerima makalah ilmiah dengan fokus pada Rekayasa Sistem Informasi ( Information System Engineering) dan Sistem Bisnis Cerdas (Business Intelligence) Rekayasa Sistem Informasi ( Information System Engineering) adalah Pendekatan multidisiplin terhadap aktifitas yang berkaitan dengan ...