Sentiment analysis of social media has become an important approach to understanding public opinion about events. Twitter, as a microblogging platform, generates a large volume of data that can be used for this kind of analysis. This study evaluates and compares the performance of three classification algorithms, Support Vector Machine (SVM), Random Forest, and Extreme Gradient Boosting (XGBoost), for sentiment analysis of tweets about the Clash of Champions event in Indonesia. Two feature extraction techniques are used to represent the text: Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW). In addition, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to handle class imbalance, and model hyperparameters are tuned with GridSearchCV. The dataset consists of 1,000 tweets collected through web scraping, then manually processed and labeled before model training and testing. The results show that TF-IDF outperformed BoW. The Random Forest model with TF-IDF achieved the highest accuracy, 91%, while XGBoost with TF-IDF achieved the highest Area Under the Curve (AUC), 0.91. These findings indicate that choosing an appropriate combination of feature extraction technique and algorithm can improve sentiment classification accuracy. The approach can be applied to public opinion monitoring and data-driven decision-making. Future research can explore word embedding techniques and transformer-based deep learning models to improve semantic understanding and the accuracy of sentiment analysis.
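To make the difference between the two feature representations named in the abstract concrete, the following is a minimal sketch using only the Python standard library. It is not the study's implementation: the whitespace tokenizer, the three example sentences, the function names, and the smoothed IDF formula (one common variant, without length normalization) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; the study's preprocessing is not specified here.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors): raw term counts per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, weighted vectors): term counts scaled by a smoothed IDF."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Assumed smoothed variant: idf = ln((1 + N) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted


# Hypothetical example texts, not taken from the study's dataset.
docs = [
    "the event was great",
    "the event was boring",
    "great great show",
]

vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)

for term, count, weight in zip(vocab, bow[1], weighted[1]):
    print(f"{term:>7}  count={count}  tfidf={weight:.3f}")
```

In the second example sentence, every word has a BoW count of 1, but TF-IDF gives "boring" (which appears in only one document) a higher weight than "the" (which appears in two). This down-weighting of widely shared terms is the property usually cited when TF-IDF is preferred over raw counts.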
Copyright © 2025