Hate speech in the digital political space during election campaigns can fuel polarization and degrade the quality of public discussion. This study analyzes hate speech in YouTube comments on the five stages of the 2024 Indonesian presidential debate. We used IndoBERT, a Transformer-based language model pretrained specifically on Indonesian text, to classify comments as hate speech or non-hate speech. The dataset consists of 38,742 comments collected from the official debate videos. It was labeled using a combination of manual annotation (20%) and semi-supervised pseudo-labeling (80%). Experimental results show that IndoBERT achieved an average accuracy of 89.7% and a macro F1-score of 0.89 across all stages, outperforming baseline models such as mBERT, SVM, and Random Forest. These findings suggest that IndoBERT captures the linguistic nuances of Indonesian and its distinctive political rhetoric more effectively than multilingual or classical models. This study contributes an Indonesian-language political hate speech dataset and a comprehensive evaluation of hate speech detection models for further research.

Keywords: hate speech, IndoBERT, 2024 presidential debate, semi-supervised learning.
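As a hedged illustration of two techniques the abstract names, the following sketch shows confidence-threshold pseudo-labeling and the macro F1-score. It is not the authors' pipeline: the `predict_proba` callable is a hypothetical stand-in for a fine-tuned classifier such as IndoBERT, and the 0.9 threshold, the label names, and the toy data are assumptions chosen only for demonstration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def pseudo_label(
    unlabeled: Sequence[str],
    predict_proba: Callable[[str], Dict[str, float]],
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> List[Tuple[str, str]]:
    """Keep (text, label) pairs whose top predicted probability meets the threshold."""
    selected = []
    for text in unlabeled:
        probs = predict_proba(text)
        # Pick the most probable label for this comment.
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            selected.append((text, label))
    return selected


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical keyword-based scorer standing in for a trained model.
def toy_proba(text: str) -> Dict[str, float]:
    if "bodoh" in text:
        return {"hate": 0.95, "non_hate": 0.05}
    return {"hate": 0.3, "non_hate": 0.7}


if __name__ == "__main__":
    print(pseudo_label(["dasar bodoh", "debat yang bagus"], toy_proba))
    print(macro_f1(["hate", "non_hate", "hate", "non_hate"],
                   ["hate", "non_hate", "non_hate", "non_hate"]))
```

With these toy inputs, only "dasar bodoh" clears the 0.9 threshold and is pseudo-labeled `hate`; the second comment's top probability (0.7) is too low, so it stays unlabeled. Macro averaging weights each class equally, which matters when hate speech is the minority class.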
Copyright © 2026