Yudhis Tri Hardianza
Department of Informatics Engineering, Faculty of Engineering, Universitas PGRI Ronggolawe, Tuban

Analysis Comparison of T5, Indobert, and TF-IDF for Improving Accuracy in Detecting Cyberbullying on Indonesian-Language Social Media
Yudhis Tri Hardianza; Asfan Muqtadir; Andik Adi Suryanto
Academia Open Vol. 11 No. 1 (2026): June
Publisher : Universitas Muhammadiyah Sidoarjo

DOI: 10.21070/acopen.11.2026.13312

Abstract

General Background: Cyberbullying detection in Indonesian social media has become increasingly important due to rapid digital communication growth and complex informal language usage.

Specific Background: Automated identification remains challenging because Indonesian online discourse frequently contains slang, ambiguity, sarcasm, and class imbalance, limiting the capability of conventional statistical and earlier deep learning approaches.

Knowledge Gap: Prior studies have emphasized traditional classifiers and encoder-based Transformers such as IndoBERT, while generative text-to-text architectures like T5 and their comparison with hybrid feature fusion strategies remain underexplored in Indonesian-language corpora.

Aims: This study systematically compares three modeling scenarios, namely T5 Base, Hybrid (T5 + TF-IDF), and Enhanced (T5 + TF-IDF + sentiment), to evaluate their performance in detecting cyberbullying from 20,000 Indonesian social media comments with naturally imbalanced distribution.

Results: Experimental findings show that T5 Base achieves the highest test Accuracy (0.8325) and Macro F1-Score (0.8329), while Hybrid and Enhanced models yield slightly lower yet competitive performance. The results indicate that contextual semantic representations learned by T5 sufficiently capture explicit and implicit abusive expressions, and additional statistical and sentiment features do not yield superior classification outcomes.

Novelty: This research provides empirical evidence that a standalone text-to-text Transformer architecture can outperform hybrid feature fusion strategies in Indonesian cyberbullying detection under limited training data conditions.

Implications: The findings support the adoption of end-to-end Transformer-based models for scalable, robust, and linguistically adaptive monitoring systems in low-resource social media environments.

Highlights:
The standalone text-to-text architecture produced the strongest test-set metrics among all evaluated scenarios.
Integration of statistical weighting and sentiment signals did not surpass the semantic-only configuration.
Stable generalization was maintained despite limited training allocation and naturally imbalanced data distribution.

Keywords: Text Classification, Social Media Analysis, Transformer Models, Indonesian Language, Cyberbullying Detection.
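
The abstract reports both Accuracy and Macro F1-Score on an imbalanced dataset. As a rough illustration of why the two are worth reporting together, the sketch below computes both from scratch. The function names and the toy labels are hypothetical and are not taken from the study; the code only shows the standard definitions, under the assumption of a binary labeling where 1 marks cyberbullying.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data (1 = cyberbullying, 0 = not); NOT from the paper.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a trivial classifier that always predicts the majority class

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

On this toy input the majority-class predictor scores well on accuracy but poorly on Macro F1, because the minority class contributes an F1 of zero. When the two metrics are close, as in the reported T5 Base results, that is consistent with the model performing comparably across classes.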