The spread of negative, engagement-driven content online causes significant societal harm, requiring advanced automated moderation tools. However, current classification systems often treat harmful content subtypes as independent, "flat" categories, which hinders their ability to handle content whose themes overlap multiple categories. This study designed and validated a novel integrated framework to classify such complex cases accurately and transparently. We proposed KG-DToT-HTC, a hybrid framework that synergistically combines three methodologies: a predefined Hierarchical Text Classification (HTC) taxonomy to structure the decision-making process; a domain-specific Knowledge Graph (KG) to provide factual, real-world context; and Decision Tree-of-Thought (DToT) prompting to guide a Large Language Model through an explicit, step-by-step reasoning process. On a real-world dataset of harmful Indonesian news, the proposed framework achieved a state-of-the-art Macro-F1 score of 0.934, a nearly 15-percentage-point improvement over a zero-shot baseline. Ablation studies confirmed that each component—hierarchy, knowledge, and reasoning—made a distinct and critical contribution to the final performance. The major conclusion of this study is that a synergistic architecture is essential for the accurate classification of complex harmful content. This work demonstrates a viable path toward "glass-box," interpretable AI moderation systems whose decisions are not only highly accurate but also fully auditable.
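The top-down decision process the abstract describes can be pictured as follows. This is a minimal illustrative sketch only: the taxonomy labels, the knowledge-graph lookup, and the LLM call are all hypothetical stand-ins, not the authors' actual implementation or data.

```python
# Hypothetical sketch of the KG-DToT-HTC pipeline: walk a label taxonomy
# top-down, making one knowledge-grounded LLM decision per level.
# All names below (TAXONOMY, kg_context, llm_choose) are illustrative
# assumptions introduced for this sketch.

TAXONOMY = {
    "harmful": ["hate_speech", "hoax"],
    "hate_speech": ["ethnicity", "religion"],
    "hoax": ["health", "politics"],
}

def kg_context(text):
    """Stand-in for knowledge-graph retrieval of factual context
    (e.g. entity triples matched against the article)."""
    return "entities: ..."

def llm_choose(text, context, options):
    """Stand-in for one Decision Tree-of-Thought prompt: the LLM reasons
    step by step over the text and KG context, then picks one child label.
    Here it trivially returns the first option as a placeholder."""
    return options[0]

def classify(text, node="harmful"):
    """Descend the taxonomy, recording the full decision path so the
    final label is auditable ("glass-box")."""
    path = [node]
    while node in TAXONOMY:
        node = llm_choose(text, kg_context(text), TAXONOMY[node])
        path.append(node)
    return path

print(classify("example news article"))
```

With the placeholder decision function, the sketch prints the leftmost root-to-leaf path; the point is the structure: the hierarchy constrains each step, the KG supplies context, and the per-level decisions form an inspectable reasoning trace.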
Copyright © 2026