With its rich diversity of ethnicities, cultures, races, and religions, Indonesia has one of the highest numbers of regional languages in the world. This linguistic diversity often creates communication challenges, particularly when conveying information or conversing in text. This study aims to identify and classify the Toraja, Batak, and Ambon languages using machine learning methods. Two algorithms, Decision Tree and Gradient Boost, are applied and compared on accuracy. The results show that both are effective for language identification, each achieving accuracy above 77%. Confusion matrix analysis, however, shows that Gradient Boost performed better, with an accuracy of 81.06% compared to 78.39% for Decision Tree. These findings suggest that Gradient Boost offers better performance for classifying these regional languages.
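As a minimal sketch of how an accuracy figure is read off a confusion matrix for the three language classes, the snippet below divides the correctly classified samples (the diagonal) by the total number of samples. The function name `accuracy_from_confusion` and the counts in `example` are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its reported results.

```python
from typing import Sequence

# Class order assumed for rows (true label) and columns (predicted label).
LABELS = ("Toraja", "Batak", "Ambon")


def accuracy_from_confusion(matrix: Sequence[Sequence[int]]) -> float:
    """Return overall accuracy: diagonal sum divided by the total count."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical counts for illustration only (NOT taken from the study).
example = [
    [8, 1, 1],  # true Toraja
    [1, 7, 2],  # true Batak
    [0, 1, 9],  # true Ambon
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy_from_confusion(example):.2%}")
```

The same calculation applies to each model's confusion matrix, so comparing two classifiers on a shared test set reduces to comparing these two ratios.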
Copyrights © 2025