JOIV : International Journal on Informatics Visualization
Vol 9, No 4 (2025)

Multi-label Aspect Dangerous Speech Classification Using Keyword-Driven Ensemble Classifier on Imbalanced Data

Findawati, Yulian
Budi Raharjo, Agus
Adni Navastara, Dini
Yonathan, Vincent
Yatestha, Anak Agung
Purwitasari, Diana



Article Info

Publish Date
31 Jul 2025

Abstract

This study aims to detect multiple aspects of dangerous speech on social media, particularly Twitter, where such speech can incite violence and increase prejudice against specific communities. The dataset consists of tweets containing dangerous speech related to the Indonesian government, posted from 2019 to 2022. Researchers manually labeled the data according to seven aspects of dangerous speech, including social and historical context, dehumanization, accusations in the mirror, threats against women/children, questioning in-group loyalty, and threats against groups. Because several aspects can appear simultaneously in a single text, the study uses a multi-label classification method. The main challenges are data imbalance, ambiguity, and the informal language common in tweets. The study introduces the Keyword-Driven Ensemble Classifier (KDEC), a new ensemble model that combines the strengths of SVC, Logistic Regression, and IndoBERTweet with a specific keyword list for each label. KDEC was designed around the best-performing machine learning and deep learning methods tested in the study. The model was evaluated on small and large datasets, in both seven-label and four-label classification settings. The results show that KDEC, with label reduction and keyword support, effectively addresses data imbalance, resolves label overlap, and achieves 92% accuracy for seven-label classification and 88% for four-label classification. These findings are relevant to hate speech analysis on other platforms and in other languages, particularly for understanding context and the messages conveyed. The study also offers insight into managing harmful content in online discussions about government. The method can identify dangerous speech at a larger scale and support data-driven decisions on social media content regulation.
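The abstract does not say how KDEC combines its base models with the keyword lists, so the following is only a minimal sketch of one plausible scheme, not the authors' implementation. It assumes each base model (standing in for SVC, Logistic Regression, and IndoBERTweet) already returns a per-label probability, averages those probabilities, and adds a fixed bonus when a label's keyword occurs in the text. The label names, keyword lists, `threshold`, and `boost` values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical label-specific keyword lists (the paper's real lists are not given).
KEYWORDS: Dict[str, List[str]] = {
    "dehumanization": ["vermin", "parasite"],
    "threat_to_group": ["wipe out", "destroy them"],
}


def average_probs(model_outputs: List[Dict[str, float]]) -> Dict[str, float]:
    """Soft voting: average each label's probability across the base models."""
    labels = model_outputs[0].keys()
    return {
        label: sum(output[label] for output in model_outputs) / len(model_outputs)
        for label in labels
    }


def keyword_hits(text: str, keywords: Dict[str, List[str]]) -> Dict[str, bool]:
    """Flag each label whose keyword list has at least one match in the text."""
    lowered = text.lower()
    return {label: any(kw in lowered for kw in kws) for label, kws in keywords.items()}


def kdec_sketch(
    text: str,
    model_outputs: List[Dict[str, float]],
    keywords: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed decision threshold
    boost: float = 0.2,      # assumed bonus for a keyword match
) -> List[str]:
    """Return every label whose keyword-adjusted score reaches the threshold."""
    probs = average_probs(model_outputs)
    hits = keyword_hits(text, keywords)
    predicted = []
    for label, prob in probs.items():
        score = min(1.0, prob + boost) if hits.get(label, False) else prob
        if score >= threshold:
            predicted.append(label)
    return sorted(predicted)


# Made-up probabilities from three base models for one tweet.
example_outputs = [
    {"dehumanization": 0.4, "threat_to_group": 0.1},
    {"dehumanization": 0.3, "threat_to_group": 0.2},
    {"dehumanization": 0.5, "threat_to_group": 0.3},
]

print(kdec_sketch("They are vermin and must go", example_outputs, KEYWORDS))
```

In this toy input the averaged probability for `dehumanization` is about 0.4, below the threshold, and the keyword match lifts it to about 0.6. This shows how keyword support could help an under-represented label that the base models alone would miss. Because each label is thresholded independently, a single text can receive several labels, or none.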

Copyright © 2025

Journal Info

Abbrev

joiv

Subject

Computer Science & IT

Description

JOIV : International Journal on Informatics Visualization is an international peer-reviewed journal dedicated to the exchange of high-quality research results in all aspects of Computer Science, Computer Engineering, Information Technology and Visualization. The journal publishes state-of-the-art ...