TELKOMNIKA (Telecommunication Computing Electronics and Control)
Vol 19, No 4: August 2021

Enhancing text classification performance by preprocessing misspelled words in Indonesian language

Reza Setiabudi (Universitas Multimedia Nusantara)
Ni Made Satvika Iswari (Universitas Multimedia Nusantara)
Andre Rusli (Universitas Multimedia Nusantara)



Article Info

Publish Date
01 Aug 2021

Abstract

Supervised learning with shallow machine learning methods remains popular for text processing, despite rapid advances in unsupervised deep learning methodologies. Supervised classification of application user feedback sentiment in the Indonesian language is one application that is popular in both the research community and industry. However, due to the nature of shallow machine learning approaches, various text preprocessing techniques are required to clean the input data. This research implements and evaluates the role of the Levenshtein distance algorithm in detecting and preprocessing misspelled words in the Indonesian language before the text data is used to train a user feedback sentiment classification model with multinomial Naïve Bayes. Experiments across various evaluation scenarios found that preprocessing misspelled Indonesian words with the Levenshtein distance algorithm could be useful, showing a promising 8.2% increase in the accuracy with which the model classifies user feedback text by sentiment.
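
The misspelling preprocessing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The vocabulary, the `max_distance` threshold, and the function names `levenshtein`, `correct_word`, and `correct_text` are hypothetical choices made for illustration only. The sketch computes the Levenshtein distance with dynamic programming and replaces each out-of-vocabulary token with its nearest vocabulary word when that word is close enough.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the word itself if none is close enough."""
    if not vocabulary or word in vocabulary:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda entry: (levenshtein(word, entry), entry))
    return best if levenshtein(word, best) <= max_distance else word


def correct_text(text: str, vocabulary: set, max_distance: int = 2) -> str:
    """Lowercase, split on whitespace, and correct each token independently."""
    tokens = text.lower().split()
    return " ".join(correct_word(token, vocabulary, max_distance) for token in tokens)


# Hypothetical toy vocabulary; a real system would load a full Indonesian word list.
VOCABULARY = {"aplikasi", "bagus", "sangat", "lambat", "tidak"}
cleaned = correct_text("Aplikasii sangat bagu", VOCABULARY)
```

The corrected text could then be vectorized and passed to a multinomial Naïve Bayes classifier. The distance threshold trades recall of misspellings against the risk of rewriting valid words that are missing from the vocabulary, so it would need tuning on real data.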

Copyrights © 2021






Journal Info

Abbrev

TELKOMNIKA

Subject

Computer Science & IT
