Comparison of Machine Learning Algorithms for Indonesian-Language Hoax Classification on the Komdigi Dataset
Haris Setyo Pratomo; Panny Agustia Rahayuningsih; Muhammad Rezki
Jurnal Komputer Teknologi Informasi Sistem Komputer (JUKTISI) Vol. 5 No. 1 (2026): June 2026
Publisher: LKP KARYA PRIMA KURSUS

DOI: 10.62712/juktisi.v5i1.1255

Abstract

Indonesian-language hoaxes continue to spread as digital platforms grow, creating a need for an automatic classification system that can categorize hoax types accurately and efficiently. This study compares the performance of five machine learning algorithms, namely Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree, and Naive Bayes, in classifying Indonesian hoax categories on the Komdigi dataset, which consists of 16,308 articles across six categories. Features were represented using TF-IDF over unigrams and bigrams (n-gram range (1, 2)), enriched with statistical text features. The extreme class imbalance was handled with SMOTE, applied inside the Stratified K-Fold Cross-Validation pipeline to prevent data leakage. SVM (LinearSVC) achieved the highest accuracy (95.9%) and a cross-validation score of 0.960, while Logistic Regression led on macro AUC (0.952) and macro F1-score (0.460), indicating the most balanced recognition across all categories. Decision Tree performed worst, with a macro AUC of 0.635. These findings show that the best algorithm depends on which evaluation metric is prioritized for a given application. The study contributes a recommendation of effective algorithms for Indonesian hoax classification and a methodological framework designed to be free of data leakage.
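
The gap between accuracy and macro F1-score reported in the abstract is typical of heavily imbalanced data. The following is a minimal sketch of how the two metrics can diverge. The toy labels and the helper names `accuracy` and `macro_f1` are invented for illustration and are not taken from the study or the Komdigi dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy data: one dominant class "A" and two rare classes.
y_true = ["A"] * 95 + ["B"] * 3 + ["C"] * 2
# A classifier that almost always predicts the majority class.
y_pred = ["A"] * 95 + ["B", "A", "A"] + ["A", "A"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy data the accuracy is 0.96 while the macro F1 is roughly 0.49, because the two rare classes contribute F1 values of 0.5 and 0 to the unweighted average. This is one reason a model with the highest accuracy need not be the one that recognizes all categories most evenly.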