Coretax is a tax information system developed by the Directorate General of Taxes (DJP) to support digital, integrated tax administration, from taxpayer registration through reporting and auditing. Although it was designed to improve efficiency, transparency, and accuracy in tax management, its implementation has drawn mixed public reactions because of technical problems and the complexity of the annual tax reporting process. This situation calls for sentiment analysis that can objectively capture public perceptions of the system’s performance. In this study, Natural Language Processing (NLP) and Machine Learning techniques were applied to analyze 3,000 tweets from Twitter (X) related to Coretax. One of the main issues identified in the dataset is class imbalance: positive sentiments significantly outnumber negative and neutral ones, which biases classification toward the majority class. To address this issue, the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset by generating synthetic samples for the minority classes. The BERT model was then employed for sentiment classification because its transformer-based architecture captures contextual meaning well. Experimental results show that the BERT model achieved an accuracy of 77% before SMOTE was applied and 80% after, along with improvements in precision, recall, and F1-score, particularly for the minority classes. These findings indicate that combining SMOTE with BERT improves sentiment analysis of public responses to Coretax. This approach can serve as a reference for evaluating and improving tax digitalization policies so that they are more effective, inclusive, and responsive to public needs.
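The balancing step described above can be illustrated with a minimal sketch of the SMOTE idea. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. This is not the study's implementation. It assumes the tweets have already been converted to numeric feature vectors (the abstract does not say which representation was oversampled), and the function names `smote` and `balance`, the parameter `k`, and the toy data in the usage note are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(features, labels, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = list(features), list(labels)
    for label, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y
```

Calling `balance(features, labels)` on a toy set with six "positive", two "negative", and three "neutral" vectors returns the original rows followed by synthetic ones, so that each class ends up with six samples. In an actual experiment, oversampling would normally be applied to the training split only, so that synthetic points do not leak into evaluation.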
Copyrights © 2025