Journal of Information Technology and Cyber Security
Vol. 2 No. 2 (2024): July

Dialect Classification of the Javanese Language Using the K-Nearest Neighbor

Filby, Brilliant
Pujianto, Utomo
Hammad, Jehad A. H.
Wibawa, Aji Prasetya



Article Info

Publish Date
31 Dec 2024

Abstract

Indonesia is rich in ethnic and cultural diversity, and each group is reflected in its own linguistic characteristics. One way to preserve the Javanese language is to study its dialects. This study aims to classify three main dialects on the island of Java (East Java, Central Java, and West Java) using text data from online sources. The classification process includes preprocessing (tokenizing, case folding, and word weighting), data balancing with the Synthetic Minority Oversampling Technique (SMOTE), and classification using the K-Nearest Neighbor (K-NN) algorithm. Testing with 10-fold cross-validation showed that the best-performing value of K achieved an accuracy of 94.05%, a precision of 95.83%, and a recall of 94.44%. The study highlights the importance of dialect recognition in supporting the preservation of the Javanese language and the development of linguistic technology applications. These findings support computational linguistics research and the preservation of regional languages.
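
The pipeline named in the abstract (case folding, tokenizing, word weighting, SMOTE balancing, K-NN, k-fold cross-validation) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the TF-IDF weighting, the Euclidean distance, the function names, and the placeholder `TEXTS` and `LABELS` are all assumptions made for illustration, and the toy strings are not real Javanese data.

```python
import math
import random
import re
from collections import Counter

# Placeholder corpus (hypothetical, NOT real dialect data): 3 / 2 / 1 samples per class.
TEXTS = ["alpha beta", "alpha gamma", "beta gamma alpha",
         "delta epsilon", "epsilon zeta", "eta theta"]
LABELS = ["east", "east", "east", "central", "central", "west"]

def tokenize(text):
    """Case folding (lowercasing) followed by word tokenizing."""
    return re.findall(r"\w+", text.lower())

def fit_idf(token_lists):
    """Smoothed inverse document frequency (an assumed weighting scheme)."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))

def smote(vectors, labels, k=2, seed=0):
    """Oversample each minority class up to the majority size by interpolating
    between a sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    by_class = {}
    for v, y in zip(vectors, labels):
        by_class.setdefault(y, []).append(v)
    target = max(len(vs) for vs in by_class.values())
    out_x, out_y = list(vectors), list(labels)
    for y, vs in by_class.items():
        for _ in range(target - len(vs)):
            base = rng.choice(vs)
            others = [v for v in vs if v is not base]
            if not others:  # a single-sample class can only be duplicated
                out_x.append(dict(base))
                out_y.append(y)
                continue
            nb = rng.choice(sorted(others, key=lambda v: sq_dist(base, v))[:k])
            gap = rng.random()
            out_x.append({t: base.get(t, 0.0) + gap * (nb.get(t, 0.0) - base.get(t, 0.0))
                          for t in set(base) | set(nb)})
            out_y.append(y)
    return out_x, out_y

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: sq_dist(p[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def cross_validate(texts, labels, k=3, folds=10, seed=0):
    """k-fold accuracy; IDF fitting and SMOTE use the training folds only."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        toks = [tokenize(texts[i]) for i in train]
        idf = fit_idf(toks)
        x, y = smote([tfidf(t, idf) for t in toks], [labels[i] for i in train])
        for i in test:
            correct += knn_predict(x, y, tfidf(tokenize(texts[i]), idf), k) == labels[i]
    return correct / len(texts)
```

Calling `cross_validate(TEXTS, LABELS, k=1, folds=3)` returns an accuracy between 0 and 1 for the placeholder corpus. Balancing inside each training fold, rather than before the split, keeps synthetic samples from leaking into the held-out data; whether the study applied SMOTE this way is not stated in the abstract.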

Copyrights © 2024






Journal Info

Abbrev

jitsc

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering

Description

Journal of Information Technology and Cyber Security (JITCS) is a refereed international journal whose focus is on exchanging information relating to Information Technology and Cyber Security in industry, government, and universities worldwide. The thrust of the journal is to publish papers dealing ...