Journal of Computing and Informatics Research
Vol 5 No 1 (2025): November 2025

Comparative Analysis of Text Similarity among Regional Languages in Indonesia Using the Naive Bayes and K-Nearest Neighbor Algorithms

Alfarizi
Herry Sujaini
Niken Candraningrum



Article Info

Publish Date
30 Nov 2025

Abstract

Indonesia, an archipelagic country, is home to 718 regional languages. Many of them, however, face declining usage and even extinction. Advances in technology make it possible to analyze the patterns and distinctive characteristics of regional languages through n-gram analysis with the Naive Bayes and K-Nearest Neighbor algorithms. This study therefore analyzes the similarity of regional languages, specifically Central Javanese, Sundanese, and Pontianak Malay, as part of an effort to support the preservation of regional languages in Indonesia. The similarity between languages was calculated from the errors in the confusion matrix, and the performance of the algorithms was evaluated using accuracy and F1-score. The Naive Bayes algorithm with combined unigram and bigram features performed best, with an accuracy and F1-score of 0.921. The highest similarity was found for the ‘Javanese - Malay’ pair, although it was only 3.82%, and the lowest for the ‘Malay - Sundanese’ pair, at 1.66%. These similarity values reflect the dominant characters that appear in each language, such as ‘e’ in Malay and ‘a’ and ‘u’ in Sundanese. The study indicates that there is little similarity between Javanese, Sundanese, and Malay.
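
The abstract does not give the exact feature extraction or similarity formula, so the following is only a minimal sketch under stated assumptions. It assumes character-level n-grams (the abstract's reference to dominant characters suggests this, but does not confirm it) and assumes that the similarity of two languages is the number of samples misclassified between them, in either direction, divided by the number of test samples belonging to those two languages. The function names `char_ngrams` and `pairwise_confusion` and the labels `lang_a`, `lang_b`, `lang_c` are hypothetical, and the counts are toy values, not results from the study.

```python
from collections import Counter


def char_ngrams(text, sizes=(1, 2)):
    """Count character n-grams; the default combines unigrams and bigrams.

    Assumption: character-level n-grams. The abstract does not say whether
    the study used characters or words.
    """
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def pairwise_confusion(matrix, labels):
    """Estimate pairwise similarity from confusion-matrix errors.

    matrix[i][j] is the number of samples whose true label is labels[i]
    and whose predicted label is labels[j].

    Assumption: similarity(i, j) = (errors i->j + errors j->i) divided by
    the number of samples whose true label is i or j. This formula is a
    guess at what the abstract describes, not a confirmed definition.
    """
    result = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            errors = matrix[i][j] + matrix[j][i]
            samples = sum(matrix[i]) + sum(matrix[j])
            result[(labels[i], labels[j])] = errors / samples if samples else 0.0
    return result


if __name__ == "__main__":
    # Toy confusion matrix with made-up counts (not the study's data).
    labels = ["lang_a", "lang_b", "lang_c"]
    toy_matrix = [
        [90, 4, 6],
        [3, 95, 2],
        [5, 1, 94],
    ]
    for pair, value in pairwise_confusion(toy_matrix, labels).items():
        print(pair, f"{value:.2%}")
    print(char_ngrams("aba"))
```

Under this reading, a pair of languages that the classifier rarely confuses receives a low similarity value, which matches the abstract's interpretation of small percentages as little similarity.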

Copyrights © 2025






Journal Info

Abbrev

comforch

Publisher

Subject

Computer Science & IT

Description

The Journal of Computing and Informatics Research publishes research results in the field of informatics, including but not limited to other areas of computer science, such as: 1. Cryptography, 2. Artificial Intelligence, 3. Expert System, 4. Decision Support System, 5. Data Mining, and ...