The “Sound Horeg” phenomenon has sparked diverse public sentiment on online platforms, making sentiment analysis an essential tool for understanding public opinion. This study classifies user sentiment (positive or negative) toward “Sound Horeg” using the Naïve Bayes algorithm. The dataset exhibits significant class imbalance, with negative sentiment predominating. The methodology comprises a series of text preprocessing stages: case folding, tokenizing, normalization, lexicon-based sentiment labeling, stopword removal, stemming, and duplicate removal. Sentiment labeling uses an Indonesian sentiment lexicon compiled from two files, lexicon_positif.csv and lexicon_negatif.csv, which contain predefined lists of words with positive and negative sentiment scores drawn from Indonesian public opinion lexicons. Text features are then extracted using the Term Frequency–Inverse Document Frequency (TF-IDF) method. To address the imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to the training data to balance the number of positive and negative samples. The Naïve Bayes model is then tuned with GridSearchCV to determine the best value of the smoothing parameter alpha. Experimental results show that the unoptimized Naïve Bayes model achieved an accuracy of 73% but struggled to classify the minority class (positive sentiment) because of the bias toward the majority class. After SMOTE and parameter tuning were applied, the model’s performance improved significantly, demonstrating the effectiveness of these techniques in producing a more balanced and robust model. The study concludes that the Naïve Bayes algorithm, when combined with SMOTE and hyperparameter tuning, is effective for Indonesian-language sentiment analysis, particularly on imbalanced datasets. Future work may explore other algorithms, broader sentiment lexicons, and more complex linguistic features to further improve model performance.
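The lexicon-based labeling step can be illustrated with a minimal Python sketch. It is not the study's implementation. The two dictionaries stand in for lexicon_positif.csv and lexicon_negatif.csv, and their entries and scores are hypothetical. The function names, the whitespace tokenizer, and the rule that a total score of zero or below counts as negative are likewise assumptions made for illustration.

```python
# Hypothetical miniature lexicons; the real lexicon_positif.csv and
# lexicon_negatif.csv files are far larger and their scores may differ.
POSITIVE_LEXICON = {"bagus": 3, "keren": 4, "mantap": 5}
NEGATIVE_LEXICON = {"berisik": -4, "ganggu": -3, "rusak": -5}


def sentiment_score(tokens):
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    score = 0
    for token in tokens:
        score += POSITIVE_LEXICON.get(token, 0)
        score += NEGATIVE_LEXICON.get(token, 0)
    return score


def label_sentiment(text):
    """Case-fold, whitespace-tokenize, and label text by its total score.

    Assumption: a total of zero or below is labeled negative, because the
    study uses two classes and the abstract does not state a tie rule.
    """
    tokens = text.lower().split()
    return "positive" if sentiment_score(tokens) > 0 else "negative"


if __name__ == "__main__":
    comments = [
        "Sound horeg keren dan mantap",
        "Sound horeg berisik ganggu tidur",
    ]
    for comment in comments:
        print(comment, "->", label_sentiment(comment))
```

In a full pipeline, labels produced this way would serve as training targets for the TF-IDF features, with SMOTE applied to the training split only so that synthetic samples do not leak into evaluation.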
Copyrights © 2025