This study addresses the growing problem of hoax news in Indonesia, which has contributed to social conflict. It aims to develop an effective detection method based on the Multinomial Naive Bayes algorithm, integrating Indonesian-specific text preprocessing and feature engineering within the CRISP-DM framework to improve classification performance. A dataset of 5,226 news articles (2,612 non-hoax and 2,614 hoax) was collected from kompas.com and turnbackhoax.id. Preprocessing consisted of case folding, tokenization, stopword removal, and stemming, each tailored to the Indonesian language. Features were extracted with the TF-IDF weighting scheme, which converts text into numerical representations. The Multinomial Naive Bayes classifier achieved an average of 86% on each of accuracy, precision, recall, and F1 score, indicating stable and balanced performance across both classes. The trained model was serialized (pickle/joblib) and deployed with the Flask framework, demonstrating its practical applicability in real-world systems. The results indicate that combining Indonesian-specific preprocessing with TF-IDF feature representation supports the effectiveness of the Multinomial Naive Bayes algorithm in detecting hoax news. This study provides a scalable and implementable approach to combating the spread of false information in Indonesian digital media.
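The pipeline described above (preprocessing, TF-IDF weighting, Multinomial Naive Bayes) can be illustrated with a minimal pure-Python sketch. It is not the study's implementation: the four training sentences, their labels, and the short stopword list are invented for illustration, the function names are hypothetical, stemming is omitted (in practice an Indonesian stemmer would be applied after stopword removal), and the smoothed IDF formula and add-one smoothing are assumed choices.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list; the study's actual list is not given.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "ke", "dari"}

def preprocess(text):
    """Case folding, tokenization, and stopword removal (stemming omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; terms unseen during fitting are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_mnb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    class_counts = Counter(labels)
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(sums[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((sums[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like

def predict(vec, log_prior, log_like):
    """Return the class with the highest log posterior score."""
    scores = {
        c: log_prior[c] + sum(w * log_like[c][t] for t, w in vec.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Invented toy data, for illustration only.
train_texts = [
    "Pemerintah meresmikan jalan tol baru di Jawa Tengah",
    "Bank Indonesia mengumumkan suku bunga acuan",
    "Sebarkan! Air rebusan ajaib ini menyembuhkan semua penyakit",
    "Viral! Sebarkan pesan ini atau akun Anda diblokir",
]
train_labels = ["non-hoax", "non-hoax", "hoax", "hoax"]

docs = [preprocess(t) for t in train_texts]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
log_prior, log_like = train_mnb(vectors, train_labels, vocab=set(idf))

def classify(text):
    """Apply the same preprocessing and weighting used in training, then predict."""
    return predict(tfidf(preprocess(text), idf), log_prior, log_like)

print(classify("Sebarkan pesan ajaib ini"))
```

The same preprocessing and IDF table must be reused at prediction time, which is why a deployed service would typically serialize the fitted vectorizer together with the classifier rather than the classifier alone.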
Copyright © 2026