User reviews of the Roblox game are heavily imbalanced: positive reviews far outnumber negative ones, which lowers sentiment classification accuracy and makes negative sentiment especially hard to identify. This study compares the performance of the Naïve Bayes and Support Vector Machine (SVM) algorithms in classifying sentiment on such imbalanced data. The research proceeded through web scraping, pre-processing, automatic labeling using a CNN, data splitting, model training, and performance evaluation using a confusion matrix. The findings show that Naïve Bayes tends to classify most samples as positive, yielding very high recall for the positive class (0.995–0.997) but poor performance on the negative class, an imbalance that persists across all test ratios. In contrast, SVM achieves higher accuracy and more stable performance, with a Macro-F1 score of 0.740–0.769 and an AUC-PR of 0.936–0.942. The performance differences between the two models are statistically significant (p-values of 0.001 and 0.0004), indicating that SVM is more effective at identifying both the majority and minority classes. Naïve Bayes, however, is computationally more efficient, requiring only 0.003–0.016 seconds of training time. SVM is therefore the more reliable and robust choice for sentiment analysis on imbalanced data such as Roblox game reviews, whereas Naïve Bayes is more suitable when processing speed is the priority.
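To illustrate why very high positive-class recall can coexist with a weak Macro-F1, the following minimal sketch computes per-class recall, F1, and Macro-F1 from a binary confusion matrix. The counts are invented for illustration (1,000 reviews, 900 of them positive) and are not the study's data; the names `BinaryConfusion`, `summarize`, `MAJORITY_BIASED`, and `BALANCED` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """Confusion-matrix counts, with 'positive' as the majority class."""
    tp: int  # positive reviews predicted positive
    fn: int  # positive reviews predicted negative
    fp: int  # negative reviews predicted positive
    tn: int  # negative reviews predicted negative


def f1(correct: int, missed: int, false_alarms: int) -> float:
    """F1 for one class: 2*TP / (2*TP + FN + FP), or 0.0 if undefined."""
    denom = 2 * correct + missed + false_alarms
    return 2 * correct / denom if denom else 0.0


def recall(correct: int, missed: int) -> float:
    """Recall for one class, or 0.0 if the class has no samples."""
    total = correct + missed
    return correct / total if total else 0.0


def summarize(cm: BinaryConfusion) -> dict:
    """Accuracy, per-class recall and F1, and the unweighted Macro-F1."""
    total = cm.tp + cm.fn + cm.fp + cm.tn
    f1_pos = f1(cm.tp, cm.fn, cm.fp)
    # For the negative class, the roles of the off-diagonal cells swap.
    f1_neg = f1(cm.tn, cm.fp, cm.fn)
    return {
        "accuracy": (cm.tp + cm.tn) / total if total else 0.0,
        "recall_pos": recall(cm.tp, cm.fn),
        "recall_neg": recall(cm.tn, cm.fp),
        "f1_pos": f1_pos,
        "f1_neg": f1_neg,
        "macro_f1": (f1_pos + f1_neg) / 2,
    }


# Hypothetical counts (NOT from the study): 900 positive, 100 negative reviews.
# A classifier that labels almost everything as positive:
MAJORITY_BIASED = BinaryConfusion(tp=896, fn=4, fp=80, tn=20)
# A classifier that gives up a little positive recall for the minority class:
BALANCED = BinaryConfusion(tp=860, fn=40, fp=35, tn=65)

if __name__ == "__main__":
    for name, cm in [("majority-biased", MAJORITY_BIASED), ("balanced", BALANCED)]:
        stats = summarize(cm)
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

Under these made-up counts the two classifiers differ by less than one percentage point of accuracy, while Macro-F1 separates them clearly, because it weights the minority negative class equally with the majority class.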
Copyright © 2025