Online gambling promotions on social media have become a serious concern in Indonesia, where perpetrators use ambiguous and disguised language to evade detection. This study compares two transformer-based models, DistilBERT and DeBERTa, for detecting such content in Indonesian YouTube comments. Using a balanced dataset of 6,350 comments, both models were fine-tuned with optimized hyperparameters (learning rate 1e-5, batch size 32, 5 epochs) and evaluated through five-fold cross-validation. Results show that DeBERTa achieves superior performance with 99.84% accuracy and perfect recall, while DistilBERT reaches 99.29% accuracy. Error and linguistic analyses indicate that DeBERTa’s disentangled attention and Byte-Pair Encoding tokenization handle non-standard and ambiguous language more effectively. Despite its higher computational cost, DeBERTa is ideal for high-accuracy applications, whereas DistilBERT remains suitable for real-time and resource-limited environments.
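The fine-tuning protocol summarized above (learning rate 1e-5, batch size 32, 5 epochs, five-fold cross-validation) corresponds to a standard sequence-classification loop. The sketch below assumes the HuggingFace Transformers and scikit-learn stacks; the checkpoint names, the toy comments, and the accuracy aggregation are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of the reported setup, assuming HuggingFace Transformers.
# Checkpoints and data below are placeholders, not the study's actual assets.
import numpy as np
from datasets import Dataset
from sklearn.model_selection import StratifiedKFold
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "microsoft/deberta-v3-base"  # or "distilbert-base-multilingual-cased"

# Placeholder data; in the study this would be the 6,350 labeled comments.
texts = ["daftar sekarang bonus gede di situs gacor",  # gambling promotion
         "video ini sangat membantu, terima kasih"] * 50  # benign comment
labels = [1, 0] * 50

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Dynamic padding is handled per-batch by the Trainer's default collator.
    return tokenizer(batch["text"], truncation=True, max_length=128)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_acc = []
for fold, (tr, va) in enumerate(skf.split(texts, labels)):
    train_ds = Dataset.from_dict({"text": [texts[i] for i in tr],
                                  "label": [labels[i] for i in tr]}).map(tokenize, batched=True)
    val_ds = Dataset.from_dict({"text": [texts[i] for i in va],
                                "label": [labels[i] for i in va]}).map(tokenize, batched=True)
    # Re-initialize the model each fold so folds stay independent.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=f"out/fold{fold}",
            learning_rate=1e-5,              # hyperparameters reported in the abstract
            per_device_train_batch_size=32,
            num_train_epochs=5,
            report_to="none",
        ),
        train_dataset=train_ds,
        eval_dataset=val_ds,
        tokenizer=tokenizer,
    )
    trainer.train()
    preds = np.argmax(trainer.predict(val_ds).predictions, axis=-1)
    fold_acc.append((preds == np.array([labels[i] for i in va])).mean())

print(f"mean 5-fold accuracy: {np.mean(fold_acc):.4f}")
```

Averaging accuracy over the five held-out folds, as done here, is one common way to obtain the single cross-validated figure of the kind the abstract reports.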