This author has published in the following journal:
bit-Tech
Kevin Iansyah
Universitas Pembangunan Nasional "Veteran" Jawa Timur

Published: 1 Document

Comparative Analysis of IndoBERT, IndoBERTweet, and XLM-RoBERTa for Detecting Online Gambling Comments on YouTube
Kevin Iansyah; Afina Lina Nurlaili; Muhammad Muharrom Al Haromainy
bit-Tech Vol. 8 No. 2 (2025): bit-Tech
Publisher : Komunitas Dosen Indonesia

DOI: 10.32877/bt.v8i2.3257

Abstract

The proliferation of online gambling promotions in YouTube comment sections poses significant challenges for content moderation on Indonesian digital platforms. Although transformer models have proven effective for various Indonesian-language NLP tasks, no systematic comparative evaluation exists for detecting online gambling promotions on YouTube, nor has prior research explored model sensitivity to hyperparameters in this context. This research identifies the optimal transformer model and configuration for detecting Indonesian-language online gambling promotion comments on YouTube. A total of 26,455 YouTube comments were collected from February to July 2025 and stratified into balanced training (18,926 comments) and validation (3,340 comments) sets, plus an imbalanced testing set (4,189 comments, 28.05% promotions and 71.95% non-promotions) reflecting realistic platform conditions. Nine fine-tuning experiments were conducted with three transformer models (IndoBERT, IndoBERTweet, XLM-RoBERTa) at three learning rates (1e-5, 2e-5, 3e-5). Evaluation employed accuracy, precision, recall, F1-score, and AUC-ROC. Results show that IndoBERT with a learning rate of 1e-5 achieved the best performance (F1-score 99.57%, recall 99.49%), outperforming IndoBERTweet (F1-score 98.58%) and XLM-RoBERTa (F1-score 99.28%). Interestingly, the formal-corpus model (IndoBERT) proved more effective than the social-media model (IndoBERTweet), indicating that gambling promotion language tends to follow structured patterns despite appearing in informal contexts. IndoBERT demonstrated the greatest stability across learning rate variations (standard deviation 0.0011), while XLM-RoBERTa offered the fastest inference time (2.48 ms) and the best performance-efficiency balance. These findings provide practical recommendations for automated content moderation systems on Indonesian social media platforms: IndoBERT for maximum-accuracy scenarios and XLM-RoBERTa for large-scale real-time deployment.
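The abstract reports accuracy, precision, recall, and F1-score on the imbalanced test set. As an illustration of how these metrics relate to confusion-matrix counts, the sketch below uses hypothetical counts (the paper does not publish its confusion matrix) chosen to be consistent with a 4,189-comment test set with roughly 28% positives:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts with the promotion class treated as positive.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)          # of comments flagged as promotion, fraction correct
    recall = tp / (tp + fn)             # of true promotions, fraction detected
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 1,175 promotion comments (~28.05% of 4,189)
# and 3,014 non-promotion comments; values are illustrative only.
metrics = classification_metrics(tp=1169, fp=4, fn=6, tn=3010)
print({k: round(v, 4) for k, v in metrics.items()})
```

On an imbalanced test set like this one, F1 and recall are more informative than accuracy, since a classifier that never flags a comment would still reach about 72% accuracy.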