This study develops a lightweight early-warning model to identify toxic utterances as practical indicators of cyberbullying in Indonesian-language conversations within the Roblox gaming community, to support digital character education and child online safety. A corpus of 2,798 publicly available comments was manually annotated into Safe and Toxic categories and divided into training and testing sets. Text preprocessing included case folding, noise removal, tokenization, Roblox-specific slang normalization, stemming, and stopword removal. Text features were represented using term frequency–inverse document frequency (TF-IDF) unigram–bigram vectors. A linear Support Vector Machine (SVM) was evaluated against Multinomial Naïve Bayes as a baseline model. Results from hold-out testing indicate that the SVM achieved 82.14% accuracy and a macro-F1 score of 0.82, outperforming the baseline. Cross-validation results show performance variability, highlighting the need for continuous updates of domain-specific slang resources and broader data coverage. From an educational perspective, the proposed prototype can function as a non-punitive screening tool to support digital literacy instruction, school counselling, and parental mediation within a human-in-the-loop framework.
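The preprocessing and evaluation steps named above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the slang map, the stopword list, and the function names `preprocess`, `unigrams_and_bigrams`, and `macro_f1` are hypothetical, stemming is left out because it would normally rely on an external Indonesian stemmer, and TF-IDF weighting and the SVM itself are not shown.

```python
import re

# Hypothetical resources for illustration only; the study's actual
# Roblox slang dictionary and stopword list are not given in the abstract.
SLANG = {"gk": "tidak", "ga": "tidak", "bgt": "banget", "lu": "kamu"}
STOPWORDS = {"yang", "di", "ke", "dan", "itu"}


def preprocess(text):
    """Case folding, noise removal, tokenization, slang normalization,
    and stopword removal. Stemming is intentionally omitted."""
    text = text.lower()                             # case folding
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # drop digits, punctuation, emoji
    tokens = text.split()                           # whitespace tokenization
    tokens = [SLANG.get(t, t) for t in tokens]      # slang normalization
    return [t for t in tokens if t not in STOPWORDS]


def unigrams_and_bigrams(tokens):
    """Return the unigram and bigram terms that a TF-IDF vectorizer would weight."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    tokens = preprocess("Lu GK jago di game itu!!! http://x.co")
    print(unigrams_and_bigrams(tokens))
    print(macro_f1(["Safe", "Toxic", "Toxic", "Safe"],
                   ["Safe", "Toxic", "Safe", "Safe"]))
```

Macro-F1 averages the per-class F1 scores without weighting by class size, so errors on the Toxic class count as much as errors on the Safe class even if the two classes are unevenly represented.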
Copyright © 2026