Clickbait remains a common strategy on YouTube, where video titles are often crafted to maximize viewer engagement. Although transformer-based machine learning has advanced rapidly, studies that specifically investigate clickbait in YouTube video titles remain rare, even though such titles have distinctive linguistic characteristics: they are shorter, more informal, and more ambiguous than news headlines or other social media texts. This study compares three transformer models, namely BERT, RoBERTa, and XLNet, on the task of clickbait detection using two benchmark datasets. Each model was fine-tuned and evaluated with standard classification metrics, along with additional analyses of training and inference efficiency. The results show that all three models achieved accuracy above 95 percent. RoBERTa achieved the best performance on the Chaudhary dataset (99.84 percent), while the cased BERT model performed best on the Vierti dataset (96.91 percent). In contrast, XLNet lagged in both accuracy and computational efficiency, with inference times exceeding six seconds per batch. This study demonstrates a 1.31 percent improvement in accuracy over previous SVM-based methods and provides a comprehensive evaluation of three transformer architectures in the YouTube context, offering empirical guidance for more effective clickbait detection.
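The standard classification metrics mentioned above can be sketched as follows for a binary clickbait/not-clickbait labeling. This is an illustrative implementation, not the study's evaluation code, and the label vectors are hypothetical examples rather than data from either benchmark dataset:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for a binary task
    where label 1 denotes the positive (clickbait) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical gold labels and model predictions (1 = clickbait).
metrics = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
print(metrics)  # accuracy 0.8, precision 1.0, recall ~0.667, f1 0.8
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would typically be used; the hand-rolled version above only makes the metric definitions explicit.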
Copyright © 2025