Presidential elections are central to a country's political dynamics and are increasingly discussed on social media platforms such as Twitter. However, sentiment analysis of public opinion on these platforms faces significant challenges, including large data volumes, diverse formats, and the complexity of informal language. A key challenge is choosing the feature extraction technique and classification algorithm best suited to the characteristics of Indonesian-language tweets about presidential elections. This study compares the effectiveness of two feature extraction approaches, a semantic approach based on BERT (Bidirectional Encoder Representations from Transformers) and a statistical approach based on TF-IDF (Term Frequency-Inverse Document Frequency), for sentiment analysis of Indonesian-language tweets related to the presidential election, using four classification algorithms: Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors, and Decision Tree. The experimental results show that TF-IDF combined with SVM performs best, with an accuracy of 85.1% and a macro F1-score of 0.81, outperforming BERT used as a static feature extractor. These findings indicate that statistical approaches such as TF-IDF remain relevant and practical for short social media texts, and they underscore the importance of choosing a method that suits the characteristics of the data and the context of the analysis.
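The two reported metrics, accuracy and macro F1-score, can diverge when sentiment classes are unevenly represented, because macro F1 averages the per-class F1 scores with equal weight. The following is a minimal sketch of both metrics in plain Python. The label names and the toy predictions are hypothetical and are not taken from the study's data; the function names `accuracy` and `macro_f1` are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Per-class F1 is computed as 2*TP / (2*TP + FP + FN); a class with
    no true positives, false positives, or false negatives scores 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels (assumption: three sentiment classes).
y_true = ["pos"] * 6 + ["neg"] * 3 + ["neu"]
y_pred = ["pos"] * 6 + ["neg", "neg", "pos"] + ["pos"]

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro F1={mf1:.3f}")
```

In this toy case the rare class is never predicted, so its F1 of zero pulls the macro average well below the accuracy. This is why reporting both figures gives a fuller picture than accuracy alone.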
Copyright © 2025