This study is motivated by the growing adoption of digital tax platforms in Indonesia, particularly Coretax, which supports online tax reporting and payment. Understanding user sentiment is essential for evaluating the system's effectiveness and identifying areas for improvement. However, sentiment data is often imbalanced, which makes minority-class sentiment difficult to detect. This research evaluates the performance of Support Vector Machine (SVM) and Random Forest (RF) classifiers on Coretax-related reviews collected between March and September 2025 from Twitter, YouTube, and the DJP application. The reviews were labeled with a lexicon-based approach and preprocessed, and the class distribution was then balanced using the Tomek-Link, SMOTE, and SMOTE-Tomek techniques. On the original (unbalanced) data, SVM achieved an accuracy of 98.56% and Random Forest 98.43%, with both performing strongly on the majority class. SMOTE and SMOTE-Tomek improved minority-class detection, albeit with a slight decrease in overall accuracy attributed to the risk of overfitting. The novelty of this study lies in its focus on 2025 Coretax data and its comparative analysis of multiple resampling techniques, providing practical insights into improving sentiment analysis on imbalanced digital tax data.
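The resampling step can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes dense numeric feature vectors (for example, TF-IDF rows converted to tuples), a binary label set, and Euclidean distance. The function names (`smote`, `tomek_links`, `tomek_undersample`, `smote_tomek`) are hypothetical. In practice, the `SMOTE`, `TomekLinks`, and `SMOTETomek` classes in imbalanced-learn provide these techniques.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # requires at least 2 minority points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = sorted((p for j, p in enumerate(minority) if j != i),
                        key=lambda p: euclidean(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def nearest_index(X, i):
    """Index of the nearest other point to X[i]."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: euclidean(X[i], X[j]))


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, whose points have different labels
    and are each other's nearest neighbour."""
    nn = [nearest_index(X, i) for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]}


def tomek_undersample(X, y, majority_label=None):
    """Drop Tomek-link members: only the majority-class member when
    majority_label is given, otherwise both members of each link."""
    drop = set()
    for pair in tomek_links(X, y):
        for idx in pair:
            if majority_label is None or y[idx] == majority_label:
                drop.add(idx)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class to parity with SMOTE, then clean the
    class boundary by removing both members of every Tomek link."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = len(X) - 2 * len(minority)  # binary case: majority minus minority
    new = smote(minority, n_new, k, seed) if n_new > 0 else []
    X_bal = list(X) + new
    y_bal = list(y) + [minority_label] * len(new)
    return tomek_undersample(X_bal, y_bal)


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.4, 0.2),
         (0.5, 0.5), (1.0, 1.0), (1.1, 0.9), (0.45, 0.45)]
    y = ["pos"] * 6 + ["neg"] * 3
    X_res, y_res = smote_tomek(X, y, minority_label="neg", k=2)
    print(y_res.count("pos"), y_res.count("neg"))
```

Resampling is normally applied to the training split only, so that synthetic points never leak into the data used to measure accuracy. The resampled training set would then be passed to the SVM and Random Forest classifiers.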
Copyright © 2025