This study applies the Multinomial Naive Bayes algorithm to detect hate speech in Indonesian tweets and evaluates its accuracy. According to The 2022 World Football Report, around 69% of Indonesia's population shows a high interest in football, creating a positive digital environment. The dataset consists of 2,210 tweets related to PSSI and politics, taken from Twitter and manually labeled into three classes: non-HS (where HS denotes hate speech), insults, and provocations. Before the dataset was split into training and test data, undersampling was applied to handle class imbalance and ensure a balanced distribution across the three categories. After undersampling, the training set consisted of 350 tweets and the test set of 88 tweets. Evaluation was carried out using the precision, recall, and F1-score metrics. The results indicate that the Multinomial Naive Bayes algorithm achieved an accuracy of 62%. This result is expected to be useful for developing an effective and accurate hate speech detection model for social media platforms, especially Twitter, so that it can help raise the awareness of the Indonesian people about the dangers of the spread of hate speech.
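The pipeline summarized above (undersampling, Multinomial Naive Bayes, then precision, recall, and F1-score) can be illustrated with a minimal, standard-library Python sketch. It is not the study's implementation: the abstract does not state the preprocessing, feature extraction, or tooling used, so the whitespace tokenization, the Laplace smoothing value `alpha=1.0`, the helper names, and the tiny English toy data below are all assumptions made purely for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def undersample(samples, seed=0):
    """Randomly drop examples so every class is as small as the rarest one."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    n_min = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n_min))
    rng.shuffle(balanced)
    return balanced


class MultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens (assumed tokenization)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, samples):
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Invented toy data, NOT taken from the study's dataset.
toy = [
    ("great match tonight", "non-HS"),
    ("good game well played", "non-HS"),
    ("nice goal great team", "non-HS"),
    ("fans enjoyed the match", "non-HS"),
    ("stupid useless coach", "insult"),
    ("useless stupid referee", "insult"),
    ("everyone storm the office", "provocation"),
    ("storm the office now", "provocation"),
]

balanced = undersample(toy)            # 2 examples per class
model = MultinomialNB().fit(balanced)
print(model.predict("stupid useless"))  # -> insult
```

In practice the balanced data would be split into training and test portions, and `per_class_scores` would be applied to the test-set predictions; a library such as scikit-learn offers equivalent, better-tested components.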
Copyright © 2025