Salvia Devi Muhshanah
Universitas Kristen Satya Wacana

Published: 1 document

Evaluation of the Impact of Labeling Quality and Class Imbalance on Sentiment Classification of the Palestine–Israel Conflict
Salvia Devi Muhshanah; Evi Maria
Sistemasi: Jurnal Sistem Informasi, Vol 15, No 5 (2026)
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v15i5.6304

Abstract

This study evaluates sentiment classification on social media data related to the Palestine–Israel conflict, with particular emphasis on the role of labeling quality and data distribution. The approach combines TF-IDF text representation with lexicon-based labeling using InSet and two classification algorithms: Support Vector Machine (SVM) and Random Forest. The dataset was collected from the social media platform X and consists of 2,831 preprocessed Indonesian-language tweets. The sentiment distribution was dominated by the negative class (39.35%), followed by the neutral (38.43%) and positive (22.21%) classes, which indicates class imbalance, with the positive class underrepresented. The labeling validity evaluation produced a Cohen's Kappa of 0.0175, meaning that agreement between the automatic labels and manual annotation was barely above chance. The SVM model achieved an accuracy of 0.69 and a weighted F1-score of 0.68. However, both models performed poorly on the positive class, which is the minority class. These findings suggest that the limits on model performance are not caused by the classification algorithms alone, but are also significantly influenced by labeling quality and the characteristics of the data distribution. The study contributes by emphasizing the importance of evaluating the entire sentiment analysis pipeline, particularly when working with complex and uncontrolled data sources such as social media.
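
The two evaluation measures named in the abstract, Cohen's Kappa for label agreement and the weighted F1-score for classification quality, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the eight toy labels below are hypothetical and made up for illustration only, and they do not reproduce the study's data or its reported values.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both sources.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each source's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    classes = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one shared label everywhere); this sketch
        # treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def per_class_f1(y_true, y_pred):
    """F1 for every class that occurs in y_true."""
    scores = {}
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[c] * support[c] for c in support) / len(y_true)


# Hypothetical toy labels (NOT from the study): lexicon-based labels
# versus manual annotation for eight tweets.
auto_labels = ["neg", "neg", "neu", "pos", "neu", "neg", "pos", "neu"]
manual_labels = ["neg", "neu", "neu", "neg", "pos", "neg", "pos", "neg"]

if __name__ == "__main__":
    print("kappa:", round(cohen_kappa(auto_labels, manual_labels), 4))
    print("per-class F1:", per_class_f1(manual_labels, auto_labels))
    print("weighted F1:", round(weighted_f1(manual_labels, auto_labels), 4))
```

Because the weighted F1-score weights each class by its support, a model can reach a reasonable overall score while still performing poorly on a minority class, which is why the per-class scores are worth inspecting alongside it.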