Language plays an important role in human life: it allows people to communicate and exchange ideas with one another. However, Indonesia's ethnic diversity has produced a wide variety of regional languages, and this variety can make conveying information and communicating difficult. This study aims to identify the Toraja, Halmahera, and Kalimantan languages in text form, using computational techniques to determine which regional language a given text belongs to. Identification is treated as a classification task and performed with two methods: decision trees and gradient boosting. Each method predicts the language of the input text, and its accuracy is then calculated. The dataset consists of 195,315 sentences. The study also compares the accuracy of the two methods to determine which one is more effective for language identification. Both methods proved reasonably effective, each reaching an accuracy of about 0.65 (65%). However, judging from the confusion matrix, gradient boosting is more effective than the decision tree, with accuracies of 0.6525 (65.25%) and 0.6509 (65.09%), respectively.
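The abstract compares the two classifiers by accuracy and by their confusion matrices. The following minimal sketch shows how both quantities can be computed for a three-language task. The label names mirror the study, but the functions `confusion_matrix` and `accuracy` and the toy predictions are hypothetical illustrations. They are not the authors' code or data, and the sketch does not reproduce the decision tree or gradient boosting models themselves.

```python
# Hypothetical label set: the three regional languages named in the study.
LABELS = ["Toraja", "Halmahera", "Kalimantan"]


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list where row i, column j counts sentences whose
    true language is labels[i] and whose predicted language is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Accuracy = correctly classified sentences (the diagonal) / all sentences."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy, made-up predictions for five sentences (not data from the study).
y_true = ["Toraja", "Toraja", "Halmahera", "Kalimantan", "Kalimantan"]
y_pred = ["Toraja", "Halmahera", "Halmahera", "Kalimantan", "Toraja"]

cm = confusion_matrix(y_true, y_pred, LABELS)
print(cm)            # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
print(accuracy(cm))  # 0.6
```

Accuracy summarizes the matrix as a single number, while the off-diagonal cells show which language pairs a classifier confuses. This is why two models with nearly identical accuracy, such as 0.6525 and 0.6509, can still be distinguished by inspecting their confusion matrices.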
Copyright © 2022