Journal : JOURNAL OF SCIENCE AND SOCIAL RESEARCH

ANALYSIS OF IMBALANCED DATA HANDLING ON THE PERFORMANCE OF MULTICLASS SENTIMENT CLASSIFICATION OF TOKOPEDIA MARKETPLACE REVIEWS
Alfarizi, Nauval; Sinurat, Satria; Putra, Adi; Amin, Muhammad; Lydia, Prima
JOURNAL OF SCIENCE AND SOCIAL RESEARCH Vol 9, No 1 (2026): February 2026
Publisher : Smart Education

DOI: 10.54314/jssr.v9i1.5804

Abstract

The growth of digital marketplaces has produced an increasing number of user reviews, which can be used to understand consumer perceptions of products and services. However, sentiment analysis of marketplace reviews faces a major challenge: class imbalance, where positive reviews often vastly outnumber the other classes. This study analyzes the effects of several imbalanced-data handling techniques on the performance of machine-learning-based multiclass sentiment classification of Tokopedia marketplace reviews. The dataset consists of 56,981 reviews in three sentiment classes, more than 97% of which are positive. Feature extraction was performed using TF-IDF, resulting in 17,765 features. Imbalance handling was tested in four scenarios (class weighting, Random Oversampling, SMOTE, and ADASYN), each with the Naive Bayes, Logistic Regression, and Random Forest algorithms. The experimental results show that Random Forest with SMOTE achieves the highest accuracy, 0.9749, but is limited in recognizing minority classes, with a recall of 0.3786. In contrast, Logistic Regression with Random Oversampling provides the most balanced performance, with the highest macro F1-score of 0.4992 and a recall of 0.5866.

Keywords: Analysis, Sentiment, Imbalanced Data, Multi-Class Classification, F1-Score
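The gap the abstract reports between high accuracy and low minority-class recall can be illustrated with a small, dependency-free sketch. It is not the authors' code: the class names other than "positive", the toy 97/2/1 label split, and the helper names `random_oversample` and `macro_scores` are assumptions made for illustration only. The sketch shows a plain random-oversampling step and macro-averaged metrics computed by hand.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    is as large as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for multiclass labels."""
    pairs = list(zip(y_true, y_pred))
    recalls, f1s = [], []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(recalls) / len(recalls), sum(f1s) / len(f1s)


# Toy labels (made up): 97 positive, 2 neutral, 1 negative.
y_true = ["positive"] * 97 + ["neutral"] * 2 + ["negative"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["positive"] * len(y_true)
accuracy, macro_recall, macro_f1 = macro_scores(y_true, y_pred)

# Balance the toy set by random oversampling.
texts = [f"review {i}" for i in range(len(y_true))]
balanced_texts, balanced_labels = random_oversample(texts, y_true)
class_counts = Counter(balanced_labels)
```

On this toy split, always predicting "positive" yields an accuracy of 0.97 but a macro recall of only one third, which is why macro-averaged scores are the more informative comparison under heavy imbalance. In practice, oversampling is usually applied to the training split only, so that duplicated samples do not leak into evaluation.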