Journal : Journal of Information Systems and Informatics

Evaluation of Machine Learning Models for Sentiment Analysis in the South Sumatra Governor Election Using Data Balancing Techniques
Panjaitan, Febriyanti; Ce, Win; Oktafiandi, Hery; Kanugrahan, Ghanim; Ramdhani, Yudi; Putra, Vito Hafizh Cahaya
Journal of Information System and Informatics Vol 7 No 1 (2025): March
Publisher : Universitas Bina Darma

DOI: 10.51519/journalisi.v7i1.1019

Abstract

Sentiment analysis is crucial for understanding public opinion, especially in political contexts like the 2024 South Sumatra gubernatorial election. Social media platforms such as Twitter and YouTube provide key sources of public sentiment, which can be analyzed using machine learning to classify opinions as positive, neutral, or negative. However, challenges such as data imbalance and selecting the right model to improve classification accuracy remain significant. This study compares five machine learning algorithms (SVM, Naïve Bayes, KNN, Decision Tree, and Random Forest) and examines the impact of data balancing on their performance. Data was collected via Twitter crawling (140 entries) and YouTube scraping (384 entries), and text features were extracted using CountVectorizer. The models were then evaluated on imbalanced and balanced datasets using accuracy, precision, recall, and F1-score. The Decision Tree and Random Forest models achieved the highest accuracies of 79.22% and 75.32% on imbalanced data, respectively. However, they also exhibited overfitting, as indicated by their near-perfect training performance. Naïve Bayes, on the other hand, demonstrated the lowest accuracy at 54.55% despite achieving high precision, suggesting frequent misclassification, particularly for the minority class. SVM and KNN also struggled with imbalanced data, recording accuracies of 58.44% and 63.64%, respectively. Significant improvements were observed after applying data balancing techniques. The accuracy of SVM increased to 71.43%, and KNN improved to 66.23%, indicating that these models are more stable and effective when class distributions are even. These findings highlight the substantial impact of data balancing on model performance, particularly for methods sensitive to class distribution. While tree-based models achieved high accuracy on imbalanced data, their tendency to overfit underscores the importance of balancing techniques to enhance model generalization.
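The gap between accuracy and minority-class performance described in the abstract can be illustrated with a small sketch. It uses only the Python standard library; the function name `per_class_metrics` and the toy labels are hypothetical and are not taken from the study, which does not publish its evaluation code here.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return accuracy and a dict of per-class precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    # Count (true label, predicted label) pairs; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # items predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # items truly in c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report

# Hypothetical toy labels, NOT the study's data: 8 negative, 2 positive.
y_true = ["neg"] * 8 + ["pos"] * 2
# A classifier that always predicts the majority class.
y_pred = ["neg"] * 10

accuracy, report = per_class_metrics(y_true, y_pred)
macro_f1 = sum(m["f1"] for m in report.values()) / len(report)
print(accuracy, report["pos"]["recall"], round(macro_f1, 4))
```

On these toy labels the accuracy is 0.8 even though recall for the minority class is 0.0, which is why per-class precision, recall, and F1 are worth reporting next to accuracy when classes are imbalanced.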
Sentiment Analysis on Coretax Data Using SVM and Random Forest with SMOTE and Tomek-Link
Oktafiandi, Hery; Winarnie, Winarnie; Ramadhan, M. Fajar; Panjaitan, Febriyanti
Journal of Information System and Informatics Vol 7 No 3 (2025): September
Publisher : Universitas Bina Darma

DOI: 10.51519/journalisi.v7i3.1279

Abstract

This study is motivated by the increasing adoption of digital tax platforms in Indonesia, particularly Coretax, which enables online tax reporting and payment. Understanding user sentiment is crucial for evaluating system effectiveness and identifying areas for improvement. However, sentiment data is often imbalanced, making it challenging to detect minority-class sentiment. This research evaluates the performance of Support Vector Machine (SVM) and Random Forest (RF) in classifying sentiment from Coretax-related reviews collected between March and September 2025 from Twitter, YouTube, and the DJP application. Lexicon-based labeling and preprocessing were applied, followed by class balancing using Tomek-Link, SMOTE, and SMOTE-Tomek techniques. On the original data, SVM achieved an accuracy of 98.56%, while Random Forest reached 98.43%, both performing strongly on the majority class. However, SMOTE and SMOTE-Tomek improved minority-class detection, albeit with a slight decrease in overall accuracy due to the risk of overfitting. The novelty of this study lies in its focus on Coretax 2025 data and a comparative analysis of multiple resampling techniques, providing practical insights into improving sentiment analysis performance on imbalanced digital tax data.
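The two resampling ideas named in the abstract can be sketched in a few lines of standard-library Python. This is an illustrative approximation, not the study's implementation: the function names `smote_like` and `tomek_links`, the parameter `k`, and the two-dimensional toy points are all assumptions made for the example. SMOTE creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbours; a Tomek link is a pair of points from different classes that are each other's nearest neighbour.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours.
    Assumes at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        # Other minority points, sorted by distance to point i.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    n = len(points)
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(points[i], points[j]))
          for i in range(n)]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

# Hypothetical 2-D feature vectors, NOT the study's data.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (0.25, 0.1)]
labels = ["neg", "neg", "pos", "pos", "pos"]

links = tomek_links(points, labels)
minority = [p for p, y in zip(points, labels) if y == "pos"]
new_points = smote_like(minority, n_new=2)
print(links, new_points)
```

In this toy set, points 1 and 4 form the only Tomek link. Tomek-link undersampling typically drops the majority-class member of each link, while combined SMOTE-Tomek schemes commonly oversample first and then remove both members of each link; which variant the study used is not stated in the abstract.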