Comparison of Machine Learning Algorithms for Indonesian-Language Hoax Classification on the Komdigi Dataset
Haris Setyo Pratomo; Panny Agustia Rahayuningsih; Muhammad Rezki
Jurnal Komputer Teknologi Informasi Sistem Komputer (JUKTISI) Vol. 5 No. 1 (2026): June 2026
Publisher: LKP KARYA PRIMA KURSUS

DOI: 10.62712/juktisi.v5i1.1255

Abstract

The spread of Indonesian-language hoaxes continues to increase along with the growth of digital platforms, creating a need for an automatic classification system that can categorize hoax types accurately and efficiently. This study compares the performance of five machine learning algorithms, namely Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree, and Naive Bayes, in classifying Indonesian hoax categories using the Komdigi dataset, which consists of 16,308 articles across six categories. Features were represented using TF-IDF with an n-gram range of (1, 2), that is, unigrams and bigrams, enriched with text statistical features. The extreme class imbalance was handled with SMOTE applied inside the Stratified K-Fold Cross-Validation pipeline, so that oversampling touches only the training folds and data leakage is prevented. Evaluation results show that SVM (LinearSVC) achieved the highest accuracy of 95.9% and a cross-validation score of 0.960, while Logistic Regression outperformed the others with a macro AUC of 0.952 and a macro F1-score of 0.460, reflecting the best ability to recognize all categories in a balanced manner. Decision Tree showed the lowest performance, with a macro AUC of 0.635. These findings confirm that the choice of the best algorithm depends on which evaluation metric is prioritized for the application at hand. This study contributes a recommendation of effective algorithms for Indonesian hoax classification and a valid, leakage-free methodological framework.
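The gap between 95.9% accuracy and a macro F1-score of 0.460 is typical of heavily imbalanced data. The following is a minimal pure-Python sketch of why the two metrics can diverge; the toy labels and class counts are hypothetical and are not taken from the Komdigi dataset or the paper's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one dominant class and two rare classes.
y_true = ["a"] * 95 + ["b"] * 3 + ["c"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["a"] * 100

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {toy_accuracy:.3f}, macro F1 = {toy_macro_f1:.3f}")
```

In this toy case the classifier ignores both rare classes yet still scores high accuracy, while macro F1 drops because each missed class contributes a zero. This is why a model ranked first on accuracy need not be ranked first on macro-averaged metrics.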
Comparison of User Review Sentiment for the Brainly and Ruangguru Applications Using Naïve Bayes, KNN, and Decision Tree
Adrianus Windi; Panny Agustia Rahayuningsih; Muhammad Rezki
Jurnal Komputer Teknologi Informasi Sistem Komputer (JUKTISI) Vol. 5 No. 1 (2026): June 2026
Publisher: LKP KARYA PRIMA KURSUS

DOI: 10.62712/juktisi.v5i1.1294

Abstract

Sentiment analysis is an important way to understand user opinions about digital education apps, because the number of reviews on the Google Play Store is too large to analyze manually. This study compares three machine learning methods, namely Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree, for classifying the sentiment of user reviews of the Brainly and Ruang Guru apps. Data were collected by scraping 8,000 reviews from the Google Play Store (4,000 per app) from May to June 2026; after duplicate reviews were removed, 6,151 reviews remained, consisting of 2,836 for Brainly and 3,315 for Ruang Guru. Sentiment labels were assigned from the star rating (1–3 stars negative, 4–5 stars positive), resulting in an imbalanced distribution of 79.8% positive and 20.2% negative. The text was processed through nine preprocessing stages designed for informal Indonesian. Features were then extracted using TF-IDF, yielding 2,398 features and a matrix sparsity of 99.78%. The training data were balanced using SMOTE, and the models were tuned with GridSearchCV using StratifiedKFold with 5 folds. In the tuning-plus-SMOTE scenario, Naïve Bayes showed the best performance, with an accuracy of 82.78%, an F1-score of 83.79%, and an ROC-AUC of 88.44%, ahead of Decision Tree and KNN. Notably, Naïve Bayes without SMOTE achieved the highest overall accuracy of 88.95%, indicating that applying SMOTE to high-dimensional TF-IDF data does not always improve model performance. An analysis of distinguishing keywords identified terms associated with positive sentiment, such as 'helpful', 'easy', and 'best', and with negative sentiment, such as 'trash', 'ads', and 'error', which the developers of both applications can use as a reference for improving service quality.
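The labeling rule and the review counts stated in the abstract can be sketched as follows. This is a minimal illustration only: the function name `star_to_sentiment` and the variable names are hypothetical, and the code is not the authors' implementation.

```python
def star_to_sentiment(stars: int) -> str:
    """Map a Play Store star rating to a label: 1-3 negative, 4-5 positive."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating must be between 1 and 5, got {stars}")
    return "negative" if stars <= 3 else "positive"


# Review counts reported in the abstract.
scraped_total = 8000
after_dedup = {"Brainly": 2836, "Ruang Guru": 3315}

remaining_total = sum(after_dedup.values())         # 6151, matching the abstract
duplicates_removed = scraped_total - remaining_total  # 1849 (derived, not stated)

labels = [star_to_sentiment(s) for s in (1, 2, 3, 4, 5)]
print(labels)
print(remaining_total, duplicates_removed)
```

Note that under this rule a 3-star review is treated as negative, so neutral opinions are folded into the negative class rather than given their own label.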