The spread of hoaxes through social media poses a significant challenge to the accuracy of public information. Automated detection based on natural language processing (NLP) offers a potential solution. This study investigates how the choice of keyword extraction method affects hoax classification performance with a Bidirectional Long Short-Term Memory (Bi-LSTM) architecture. Two methods are evaluated: YAKE, which relies on statistical features, and KeyBERT, which uses semantic representations from the BERT transformer model. The IDNHoaxCorpus, an Indonesian-language dataset, serves as the experimental basis and passes through preprocessing, keyword extraction, and model training stages. Evaluation metrics are accuracy, precision, recall, F1-score, and processing time. Results show that KeyBERT achieves higher accuracy and F1-score (82.56% and 73.30%, respectively) than YAKE (80.07% and 71.11%), but at the cost of a much longer processing time (360 seconds vs. 13 seconds). These findings highlight a trade-off between accuracy and computational efficiency that should be weighed against application requirements, for example whether the system must respond in real time or can run as batch processing. This study underscores the importance of selecting an appropriate feature extraction strategy in text-based hoax detection systems.
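To make the reported trade-off concrete, the short sketch below computes the accuracy and F1-score gains (in percentage points) and the processing-time ratio from the figures quoted above. The dictionary layout and the `tradeoff` helper are illustrative assumptions, not part of the study's code; only the numbers come from the abstract.

```python
# Figures as reported in the abstract (percent for metrics, seconds for time).
# The structure and names below are illustrative, not the authors' code.
results = {
    "KeyBERT": {"accuracy": 82.56, "f1": 73.30, "seconds": 360.0},
    "YAKE": {"accuracy": 80.07, "f1": 71.11, "seconds": 13.0},
}


def tradeoff(slow, fast):
    """Return (accuracy gain, F1 gain, time ratio) of `slow` relative to `fast`.

    Gains are in percentage points; the time ratio is slow time / fast time.
    """
    acc_gain = round(slow["accuracy"] - fast["accuracy"], 2)
    f1_gain = round(slow["f1"] - fast["f1"], 2)
    time_ratio = round(slow["seconds"] / fast["seconds"], 1)
    return acc_gain, f1_gain, time_ratio


acc_gain, f1_gain, time_ratio = tradeoff(results["KeyBERT"], results["YAKE"])
print(f"accuracy gain: {acc_gain} points")
print(f"F1 gain: {f1_gain} points")
print(f"time ratio: {time_ratio}x")
```

Read this way, KeyBERT's gain of roughly two to two and a half percentage points comes with a processing time more than an order of magnitude longer, which is the comparison a real-time deployment would need to weigh.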
Copyrights © 2025