The rapid growth of digital news platforms necessitates reliable and automated systems for maintaining content quality at scale. This study presents the engineering and evaluation of an IndoBERT-based Natural Language Processing (NLP) framework for automated clickbait detection in Indonesian news headlines. The proposed framework is designed as an end-to-end text classification pipeline, incorporating data preprocessing, tokenization, fine-tuning of a pretrained IndoBERT model, and systematic performance evaluation. Experiments were conducted using the CLICK-ID dataset comprising 15,000 Indonesian news headlines, with an 80:20 stratified train–test split. The fine-tuned model achieved an accuracy of 0.83, with a precision of 0.82, recall of 0.77, and an F1-score of 0.79 for the clickbait class. Further evaluation using threshold-independent metrics yielded a ROC-AUC value of 0.89 and an average precision of 0.88, indicating strong discriminative capability under moderate class imbalance. Comparative analysis shows that the proposed approach outperforms prior CNN, Bi-LSTM, and ensemble-based methods evaluated on the same dataset. These results demonstrate that IndoBERT provides a robust foundation for engineering automated clickbait detection systems tailored to Indonesian-language news streams.
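As an illustration of two steps named above, the following minimal sketch shows a stratified 80:20 split and the harmonic-mean definition of the F1-score. It is not the authors' implementation: the function names `stratified_split` and `f1_from_precision_recall` are hypothetical, and the 60:40 class ratio in the toy labels is an assumption made only so the example runs, since the abstract does not state the class distribution.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels: 15,000 items with an ASSUMED 60:40 class ratio (not from the paper).
toy_labels = [0] * 9000 + [1] * 6000
train_idx, test_idx = stratified_split(toy_labels)

# Precision and recall as reported for the clickbait class.
reported_f1 = f1_from_precision_recall(0.82, 0.77)
```

With 15,000 headlines, an 80:20 split gives 12,000 training and 3,000 test items. Plugging the reported precision (0.82) and recall (0.77) into the F1 formula gives approximately 0.794, which agrees with the reported F1-score of 0.79.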
Copyrights © 2026