This study investigates automated cyberbullying detection on TikTok within the Indonesian digital context, where high social media usage among children and adolescents demands scalable and consistent content moderation. We propose an IndoBERT-based framework for detecting and classifying cyberbullying in Indonesian-language TikTok comments, incorporating algorithmic fairness considerations. A dataset of 2,122 TikTok comments was collected from a publicly available Kaggle repository and divided into training, validation, and test sets by stratified sampling in a 70:15:15 ratio. The IndoBERT-base-p1 model was fine-tuned using PyTorch and the HuggingFace Transformers library, with the AdamW optimizer and learning-rate scheduling. Experimental results show that the model achieved an accuracy of 70.66% and a ROC-AUC of 0.7969, indicating solid discriminative power. With a macro F1-score of 0.7066 and a recall of 0.7170 on the cyberbullying class, the model performs in a balanced way when identifying harmful content. A key contribution of this study is a fairness evaluation framework, which reveals an accuracy gap of 2.08% and an equal opportunity gap of 0.0208 between demographic groups, indicating broadly equitable performance; demographic parity, however, remains a concern. Deployed as a triage layer ahead of human review, the system can streamline moderation workflows by filtering out non-cyberbullying cases and flagging potentially harmful content for human oversight.
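The fairness gaps reported above can be computed from per-group classification statistics. The sketch below assumes the standard definitions (accuracy gap = absolute difference in per-group accuracy, equal opportunity gap = absolute difference in true-positive rate on the cyberbullying class, demographic parity gap = absolute difference in positive-prediction rate); the function and variable names are illustrative and not taken from the study's code.

```python
def fairness_gaps(y_true, y_pred, groups):
    """Absolute fairness gaps between two demographic groups.

    y_true, y_pred: 0/1 labels (1 = cyberbullying), same length.
    groups: a group identifier per comment (exactly two distinct values).
    """
    per_group = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        # Accuracy: fraction of comments labeled correctly in this group.
        acc = sum(ti == pi for ti, pi in zip(t, p)) / len(idx)
        # True-positive rate: of the true cyberbullying comments,
        # how many were flagged (basis of equal opportunity).
        pos = [i for i in range(len(t)) if t[i] == 1]
        tpr = sum(p[i] for i in pos) / len(pos) if pos else 0.0
        # Positive-prediction rate (basis of demographic parity).
        ppr = sum(p) / len(p)
        per_group[g] = (acc, tpr, ppr)
    (a1, t1, r1), (a2, t2, r2) = per_group.values()
    return {
        "accuracy_gap": abs(a1 - a2),
        "equal_opportunity_gap": abs(t1 - t2),
        "demographic_parity_gap": abs(r1 - r2),
    }
```

In practice these quantities are also available from fairness toolkits (e.g. Fairlearn's `demographic_parity_difference`); the hand-rolled version here only makes the reported gap definitions concrete.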
Copyright © 2026