ComTech: Computer, Mathematics and Engineering Applications
Vol. 17 No. 1 (2026): ComTech

Balinese Language Classification on Social Media using Multinomial Naive Bayes Method with TF-IDF

Putu Widyantara Artanta Wibawa (Informatics Study Program, Faculty of Mathematics and Natural Sciences, Udayana University, Bali, Indonesia 80361)
Cokorda Pramartha (Informatics Study Program, Faculty of Mathematics and Natural Sciences, Udayana University, Bali, Indonesia 80361)
I Gusti Ngurah Anom Cahyadi Putra (Informatics Study Program, Faculty of Mathematics and Natural Sciences, Udayana University, Bali, Indonesia 80361)
Luh Gede Astuti (Informatics Study Program, Faculty of Mathematics and Natural Sciences, Udayana University, Bali, Indonesia 80361)



Article Info

Publish Date
30 Jan 2026

Abstract

Balinese is a local language widely spoken by Balinese people, including on social media platforms. However, the nuances of its politeness levels are often lost in informal digital communication, and few computational models automatically classify such levels, particularly for low-resource languages such as Balinese. This study evaluates the performance of the Multinomial Naive Bayes method combined with Term Frequency–Inverse Document Frequency (TF-IDF) feature extraction, Chi-square feature selection, and the Synthetic Minority Oversampling Technique (SMOTE) in classifying Balinese language levels. The dataset consists of 1,314 annotated social media posts and comments, primarily sourced from Instagram. A Balinese language expert performs the annotation, categorizing the texts into six levels that represent varying degrees of politeness and formality: alus singgih (polite, used for respecting others), alus sor (polite, used for self-humbling), alus mider (polite, used for both respecting others and self-humbling), alus madia (an intermediate level of politeness), basa andap (casual, commonly used in everyday life), and basa kasar (impolite, often used during arguments or toward animals). The experimental results show that the model achieves 96.53% accuracy on the training data and 61.45% accuracy on the test data. In addition, hyperparameter tuning reveals that the Multinomial Naive Bayes model with 2,720 selected features and SMOTE oversampling achieves 91.78% accuracy, significantly outperforming the baseline model without feature selection or oversampling, which achieves only 64.93% accuracy.
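The core of the approach named in the abstract, TF-IDF weighting followed by a Multinomial Naive Bayes classifier, can be illustrated with a minimal sketch that uses only the Python standard library. This is not the authors' implementation, which the abstract does not describe. The function and class names, the smoothed IDF formula, the Laplace smoothing value alpha=1.0, and the placeholder tokens are all assumptions made for illustration. Chi-square feature selection and SMOTE oversampling are deliberately left out to keep the example short. Only the two class labels are taken from the abstract.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: ln((1 + n) / (1 + df)) + 1 keeps every weight positive.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def transform(doc, idf):
    """Map a tokenized document to {term: tf * idf}, ignoring unseen terms."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}


class TinyMultinomialNB:
    """Multinomial Naive Bayes over weighted term vectors (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = sorted({t for v in vectors for t in v})
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        # Sum the TF-IDF weight of each term within each class.
        totals = {c: Counter() for c in self.classes}
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                totals[label][term] += weight
        # Smoothed log-likelihood log P(term | class).
        self.log_like = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((totals[c].get(t, 0.0) + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][t] for t, w in vector.items() if t in self.log_like[c]
            )
        return max(self.classes, key=score)


# Placeholder tokens, NOT real Balinese words; only the labels come from the abstract.
train_docs = [
    ["polite_a", "polite_b", "shared"],
    ["polite_a", "shared"],
    ["casual_a", "casual_b", "shared"],
    ["casual_b", "shared"],
]
train_labels = ["alus singgih", "alus singgih", "basa andap", "basa andap"]

idf = fit_idf(train_docs)
model = TinyMultinomialNB().fit([transform(d, idf) for d in train_docs], train_labels)
prediction = model.predict(transform(["polite_b", "shared"], idf))
print(prediction)
```

In a fuller pipeline, a feature-selection step would prune the vocabulary before fitting and an oversampling step would balance the six classes in the training split only. Both are omitted here, so this sketch corresponds most closely to the baseline configuration mentioned in the abstract.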

Copyrights © 2026






Journal Info

Abbrev

comtech

Subject

Computer Science & IT Engineering Mathematics

Description

The journal invites professionals in the world of education, research, and entrepreneurship to participate in disseminating ideas, concepts, new theories, or science development in the field of Information Systems, Architecture, Civil Engineering, Computer Engineering, Industrial Engineering, Food ...