Sentiment analysis has become a vital approach to understanding customer opinions expressed in textual reviews. One of the primary challenges in sentiment classification is class imbalance, where positive reviews often dominate the dataset. This imbalance biases machine learning models toward the majority class and degrades their ability to detect minority sentiments. To address this issue, this study applies the Synthetic Minority Oversampling Technique (SMOTE) and evaluates the performance of two Transformer-based models: Generative Pre-trained Transformer (GPT) as a baseline and IndoBERT as the primary model. The dataset consists of 12,704 product reviews from Lazada, obtained from the Kaggle platform, and is categorized into three sentiment classes (positive, neutral, negative). The data was split into 80% for training and 20% for testing. After preprocessing and applying SMOTE for data balancing, the fine-tuned IndoBERT model achieved the best performance with an accuracy of 88%, significantly outperforming GPT, which yielded only 47% accuracy in a zero-shot setting. These findings highlight the critical role of addressing data imbalance and of selecting context-aware models for improving sentiment classification accuracy in Indonesian-language texts.
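To illustrate the oversampling step described above: SMOTE synthesizes new minority-class samples by interpolating between a minority sample and one of its k nearest minority neighbors. The following minimal NumPy sketch is illustrative only; it is not the study's code, and the toy feature vectors are hypothetical (in practice one would oversample the neutral and negative review embeddings, e.g. with the imbalanced-learn library).

```python
import numpy as np

def smote(X, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating each chosen sample toward one of its k nearest
    minority-class neighbours. X holds only minority-class vectors."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    k = min(k, len(X) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest per sample
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))                # random minority sample
        j = neighbours[i, rng.integers(k)]      # one of its k neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synth.append(X[i] + gap * (X[j] - X[i]))
    return np.vstack(synth)

# toy example: 3 minority-class vectors, synthesize 4 more
X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
X_new = smote(X_min, n_new=4, k=2, rng=0)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point lies on a line segment between two existing minority samples, the new data stays within the minority class's feature region rather than duplicating samples outright, which is why SMOTE tends to reduce majority-class bias better than simple random oversampling.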
Copyright © 2025