

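ANALISIS PENANGANAN DATA TIDAK SEIMBANG TERHADAP KINERJA KLASIFIKASI SENTIMEN MULTIKELAS PADA ULASAN MARKETPLACE TOKOPEDIA [Analysis of Imbalanced Data Handling on the Performance of Multiclass Sentiment Classification of Tokopedia Marketplace Reviews]
Alfarizi, Nauval; Sinurat, Satria; Putra, Adi; Amin, Muhammad; Lydia, Prima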
JOURNAL OF SCIENCE AND SOCIAL RESEARCH Vol 9, No 1 (2026): February 2026
Publisher : Smart Education

DOI: 10.54314/jssr.v9i1.5804

Abstract

The development of digital marketplaces has led to a growing number of user reviews, which can be used to understand consumer perceptions of products and services. However, sentiment analysis of marketplace reviews faces a major challenge: class imbalance, in which the positive class often dominates to an extreme degree. This study analyzes the effects of various imbalanced-data handling techniques on the performance of machine-learning-based multiclass sentiment classification of Tokopedia marketplace reviews. The dataset consists of 56,981 reviews in three sentiment classes, more than 97% of which are positive. Feature extraction was performed using the TF-IDF method, resulting in 17,765 features. The handling of data imbalance was tested through four scenarios (class weighting, Random Oversampling, SMOTE, and ADASYN) with the Naive Bayes, Logistic Regression, and Random Forest algorithms. The experimental results show that Random Forest with SMOTE achieves the highest accuracy, 0.9749, but has limitations in recognizing minority classes, with a recall of 0.3786. In contrast, Logistic Regression with Random Oversampling provides the most balanced performance, with the highest F1-score (macro) of 0.4992 and a recall of 0.5866.
Keywords: Analysis, Sentiment, Imbalanced Data, Multi-Class Classification, F1-Score
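The gap between accuracy and minority-class recall reported in this abstract can be illustrated with a minimal sketch. The labels below are hypothetical toy data (97 positive, 2 neutral, 1 negative), not the study's dataset, and the helper names `accuracy`, `macro_recall`, and `random_oversample` are illustrative only. The sketch assumes that the reported recall is averaged over classes, and it shows one simple form of random oversampling rather than the authors' actual implementation.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical toy labels with a 97% positive majority (not the paper's data).
labels = ["positive"] * 97 + ["neutral"] * 2 + ["negative"] * 1
texts = [f"review {i}" for i in range(len(labels))]

# A baseline that always predicts the majority class.
baseline = ["positive"] * len(labels)
baseline_accuracy = accuracy(labels, baseline)          # high: 97 of 100 correct
baseline_macro_recall = macro_recall(labels, baseline)  # low: only 1 of 3 classes recalled

# After oversampling, each class has as many samples as the majority class.
balanced_texts, balanced_labels = random_oversample(texts, labels)
balanced_counts = Counter(balanced_labels)
```

In practice, oversampling is normally applied to the training split only, so that duplicated minority samples do not leak into the evaluation data.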
Comparative Machine Learning Analysis for Sentiment Classification of Sumatra Disaster 2025
Alfarizi, Nauval; Lydia, Prima; Novelan, Muhammad Syahputra; Putra, Adi; Sinurat, Satria
Journal of Technology and Computer Vol. 3 No. 1 (2026): February 2026
Publisher : PT. Technology Laboratories Indonesia (TechnoLabs)


Abstract

Indonesia is highly vulnerable to natural disasters due to its geological position, resulting in extensive disaster-related news coverage that shapes public sentiment. This study presents a comparative machine learning analysis for sentiment classification of online news related to natural disasters in Sumatra during December 2025. The dataset was collected through web scraping from two major Indonesian news portals, CNN Indonesia and Detik, and categorized into three sentiment classes: negative, neutral, and positive. Sentiment classification was conducted using the Naive Bayes, Support Vector Machine (SVM), and k-Nearest Neighbors (KNN) algorithms. Naive Bayes achieved accuracies of 0.57 on the CNN Indonesia dataset and 0.61 on the Detik dataset, but its performance was highly biased toward the dominant negative class, as indicated by low macro-average F1-scores of 0.24 and 0.39. In contrast, SVM showed the most balanced performance by reducing class bias, achieving accuracies of 0.68 and 0.67 with macro-average F1-scores of 0.51 and 0.59, respectively. KNN demonstrated moderate performance, with accuracies of 0.60 and 0.59, but remained less effective than SVM in handling minority sentiment classes.
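The contrast between accuracy and macro-average F1-score described in this abstract can be sketched as follows. The label lists are hypothetical toy data with a dominant negative class, not the study's predictions, and the function names `f1_for_class` and `macro_f1` are illustrative. The sketch only shows how a classifier biased toward the dominant class can keep a moderate accuracy while its macro-average F1-score stays low.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1-score of one class, treating it as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes weigh as much as the majority."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical toy test set: 6 negative, 2 neutral, 2 positive articles.
y_true = ["negative"] * 6 + ["neutral"] * 2 + ["positive"] * 2

# A classifier biased toward the dominant class: it predicts "negative" for everything.
y_biased = ["negative"] * 10

# A classifier that also recognizes some minority-class articles.
y_balanced = (
    ["negative"] * 5 + ["neutral"]   # true negatives: 5 correct, 1 misread as neutral
    + ["neutral", "negative"]        # true neutrals: 1 correct, 1 misread as negative
    + ["positive", "negative"]       # true positives: 1 correct, 1 misread as negative
)

biased_scores = (accuracy(y_true, y_biased), macro_f1(y_true, y_biased))
balanced_scores = (accuracy(y_true, y_balanced), macro_f1(y_true, y_balanced))
```

On this toy data the two classifiers differ only modestly in accuracy, while the macro-average F1-score separates them clearly, because the per-class F1 of the two minority classes is zero for the biased classifier.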