This study develops a hate speech corpus that integrates Searle's speech act theory to identify the illocutionary intentions behind offensive utterances. It pursues two research objectives: 1) to identify the illocutionary points in corpus entries containing social identity content, using Searle's speech act approach, and 2) to evaluate the quality of the corpus from a natural language processing perspective. Achieving these objectives requires a methodology that combines qualitative linguistic description with quantitative machine learning measurement. The data were drawn from a readjusted corpus, with annotation focused on 3,315 data points containing social identity markers. The study first applied a qualitative linguistic framework for intention attribution and then carried out a quantitative evaluation using a hybrid BiLSTM-IndoBERT model to assess corpus consistency and model predictability. The findings indicate that hate speech in the Indonesian context is predominantly realized through negative expressive utterances, with religion as the most frequent target, followed by directive attacks based on ethnicity. The hybrid model achieved an F1-score of 87%, demonstrating the viability of the annotated corpus for automated detection. Incorporating intention attribution provides a more granular linguistic foundation for language models than purely semantic approaches. This study offers a framework that stakeholders can use to map hate speech patterns, although future work should incorporate more diverse sociopolitical contexts.
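The abstract reports an F1-score of 87% but does not state whether it is macro-, micro-, or weighted-averaged across classes. The following sketch shows how a macro-averaged F1 could be computed over illocutionary labels, assuming the label set is Searle's five illocutionary points (assertive, directive, commissive, expressive, declarative). The function names and the toy gold/predicted lists are hypothetical illustrations, not the study's data or evaluation code.

```python
from collections import Counter

# Hypothetical label names based on Searle's five illocutionary points.
SEARLE_POINTS = ["assertive", "directive", "commissive", "expressive", "declarative"]


def f1_per_class(gold, pred, labels=None):
    """Return {label: F1} from parallel lists of gold and predicted labels.

    If labels is None, only labels that occur in gold or pred are scored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but gold was something else
            fn[g] += 1  # missed the gold label g
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores = f1_per_class(gold, pred, labels)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy, hypothetical annotations for four utterances.
    gold = ["expressive", "expressive", "directive", "assertive"]
    pred = ["expressive", "directive", "directive", "assertive"]
    print(f1_per_class(gold, pred))
    print(round(macro_f1(gold, pred), 4))
```

Macro averaging gives each illocutionary class equal weight, so a model that performs well only on the dominant expressive class would score lower than under micro averaging; if the study's 87% is a weighted or micro average, the same counts would be aggregated differently.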
Copyright © 2026