The rapid growth of digital technology and internet access has transformed how information is shared, allowing content to spread quickly across online platforms. These advances have also made it easier for misleading or entirely fabricated news to circulate, posing serious risks to social stability, political discourse, and public health. This study addresses the problem by applying several machine learning classification methods to textual data. Four algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), and Extreme Gradient Boosting (XGBoost), were used to detect linguistic patterns that distinguish genuine news from fake content. A major contribution of this research is a custom dataset of 4,909 entries gathered directly from Indonesian online news portals. On this dataset, all four models achieved very high accuracy: 99.69% for SVM, 99.39% for LR, 99.29% for NB, and 99.19% for XGBoost. To check the reliability of these results, each model was further evaluated using cross-validation, yielding average accuracy scores of 99.57% (SVM), 99.52% (LR), 99.44% (NB), and 99.49% (XGBoost). These findings indicate that all four classifiers are highly effective on this dataset and well suited to identifying fake news in text-based data.
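To make the evaluation procedure concrete, the following is a minimal sketch of cross-validated accuracy for one of the four classifier families (Naive Bayes). It is not the study's implementation: the toy English sentences, the whitespace tokenizer, the add-one smoothing, the interleaved fold assignment, and all function names (`train_nb`, `predict_nb`, `cross_val_accuracy`) are assumptions made for illustration, and the study's own features, preprocessing, and number of folds are not stated here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the study's dataset is Indonesian news text.
DOCS = [
    "government confirms new budget for public schools",
    "miracle fruit cures all diseases overnight doctors shocked",
    "central bank holds interest rate steady this quarter",
    "secret chip found in vaccine shocked doctors reveal",
    "aliens cures traffic overnight secret miracle revealed",
    "ministry publishes quarterly report on public health",
    "shocked celebrity reveals secret miracle diet overnight",
    "city council confirms new public transport budget",
]
LABELS = ["real", "fake", "real", "fake", "fake", "real", "fake", "real"]


def train_nb(docs, labels):
    """Count documents per class and words per class (multinomial NB)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words unseen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(docs, labels, k):
    """Return one accuracy score per fold, using interleaved folds."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


fold_scores = cross_val_accuracy(DOCS, LABELS, k=4)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"per-fold accuracy: {fold_scores}, mean: {mean_accuracy:.4f}")
```

Each fold trains on the remaining documents and tests on the held-out ones, and the reported figure is the mean of the per-fold accuracies. This is the sense in which a cross-validated average can differ slightly from a single train/test split result.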
Copyrights © 2026