Online gambling continues to grow, and its promotion increasingly spills into YouTube comment sections, where existing spam detection systems struggle to recognize the manipulative language patterns that gambling promoters use. To address this problem, this study proposes a model for detecting spam comments in Indonesian that combines Term Frequency–Inverse Document Frequency (TF-IDF) features with an Extreme Gradient Boosting (XGBoost) classifier. The dataset contains 10,220 manually labeled YouTube comments, which were preprocessed with Unicode normalization and removal of irrelevant characters. Evaluated on a held-out test set comprising 20% of the data, the model achieved an accuracy of 91%, a precision of 92%, a recall of 91%, and an F1-score of 91%. These results indicate that the combination of TF-IDF and XGBoost is effective for classifying short texts such as YouTube comments. The study thus contributes to Indonesian-language spam comment detection, which remains under-researched, and can serve as a reference for media platforms seeking to curb the spread of illegal content through social media comment sections.
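The pipeline summarized above (Unicode normalization, character cleaning, TF-IDF features, an XGBoost classifier, and a 20% hold-out evaluation) might be sketched roughly as follows. This is an illustrative sketch only, not the study's implementation: the function names, the cleaning rules, the choice of NFKC as the normalization form, the n-gram range, the hyperparameters, and the stratified split are all assumptions, since the abstract does not specify them. It also assumes scikit-learn and xgboost are installed and that labels are encoded as 0 (not spam) and 1 (spam).

```python
import re
import unicodedata

# Hypothetical cleaning rules; the abstract does not list the exact ones used.
_URL = re.compile(r"https?://\S+|www\.\S+")
_NON_WORD = re.compile(r"[^\w\s]")  # punctuation, emoji, and other symbols
_SPACES = re.compile(r"\s+")


def normalize_comment(text: str) -> str:
    """Fold stylized Unicode lettering to plain characters, then strip noise."""
    # NFKC (an assumed choice) maps compatibility characters such as
    # fullwidth or mathematical-bold letters to their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    text = _URL.sub(" ", text)
    text = _NON_WORD.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


def build_spam_classifier():
    """TF-IDF features feeding an XGBoost classifier (placeholder settings)."""
    # Third-party imports are local so normalize_comment works without them.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import Pipeline
    from xgboost import XGBClassifier

    return Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=normalize_comment,
                                  ngram_range=(1, 2))),
        ("xgb", XGBClassifier(n_estimators=300, max_depth=6,
                              learning_rate=0.1)),
    ])


def evaluate(comments, labels, seed=42):
    """Hold out 20% of the data and report precision, recall, F1, accuracy."""
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Labels are assumed to be integers: 0 = not spam, 1 = spam.
    x_train, x_test, y_train, y_test = train_test_split(
        comments, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    model = build_spam_classifier().fit(x_train, y_train)
    return classification_report(y_test, model.predict(x_test))
```

Normalizing before vectorization matters for this kind of spam because promotional comments often spell brand names in stylized Unicode lettering. Without folding those characters to plain letters, each stylized variant would become a separate, rare TF-IDF term.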
Copyright © 2025