The rapid growth of e-commerce in Indonesia has increased the number of consumer reviews in which buyers share their opinions of and experiences with products. In the cosmetics category, text reviews play an important role in purchasing decisions. However, the large volume of reviews and the imbalanced distribution of sentiment make manual sentiment analysis impractical and accurate classification difficult. An efficient, automated machine-learning approach that can handle large-scale, imbalanced data is therefore needed. This study analyzes the sentiment of reviews of Emina brand cosmetic products on the Tokopedia platform and evaluates how effectively the Multinomial Naïve Bayes algorithm, combined with TF-IDF feature extraction and SMOTE data balancing, classifies reviews as positive, neutral, or negative. The data were collected by web scraping Emina product reviews, yielding 446,325 reviews. The research stages comprise text preprocessing, rule-based sentiment labeling, feature extraction with TF-IDF, data balancing with SMOTE, and classification with Multinomial Naïve Bayes. Model performance was evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. After SMOTE was applied, the model achieved an accuracy of 94.72%, with consistent F1-scores across all sentiment classes, including the minority classes. These results show that combining Multinomial Naïve Bayes, TF-IDF, and SMOTE is effective for large-scale sentiment analysis of cosmetic product reviews and substantially reduces the effect of class imbalance.
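The TF-IDF, SMOTE, and Multinomial Naïve Bayes stages summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the toy reviews are invented, Indonesian preprocessing and the rule-based labeling step are omitted (labels are given directly), and applying SMOTE only to the training split is an assumption reflecting common practice. The IDF formula and L2 normalization follow the defaults of scikit-learn's `TfidfVectorizer`; in practice one would more likely use scikit-learn's `TfidfVectorizer` and `MultinomialNB` together with imbalanced-learn's `SMOTE` than these hand-written functions.

```python
import math
import random
from collections import Counter

# Dependency-free sketch of TF-IDF -> SMOTE -> Multinomial Naive Bayes.
# Assumption: this mirrors the stages named in the abstract, not the authors' code.

def tfidf_fit(docs):
    """Learn the vocabulary and smoothed IDF weights from tokenized documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vocab = sorted(df)
    # Smoothed IDF (scikit-learn's default formula): ln((1+n)/(1+df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf

def tfidf_transform(docs, vocab, idf):
    """Term count times IDF, L2-normalized per document; unseen terms are ignored."""
    rows = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        rows.append([x / norm for x in v])
    return rows

def smote(X, y, k=3, seed=0):
    """Oversample each smaller class to the majority count by interpolating
    between a random sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, c in counts.items():
        pts = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - c):
            a = rng.choice(pts)
            neigh = sorted((p for p in pts if p is not a),
                           key=lambda p: sum((u - v) ** 2 for u, v in zip(a, p)))[:k]
            b = rng.choice(neigh) if neigh else a
            gap = rng.random()  # random position on the segment from a to b
            X_out.append([u + gap * (v - u) for u, v in zip(a, b)])
            y_out.append(label)
    return X_out, y_out

def mnb_fit(X, y, alpha=1.0):
    """Multinomial NB on non-negative TF-IDF features with additive (Laplace) smoothing."""
    classes = sorted(set(y))
    prior, loglik = {}, {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        prior[c] = math.log(len(rows) / len(X))
        totals = [sum(col) + alpha for col in zip(*rows)]
        s = sum(totals)
        loglik[c] = [math.log(t / s) for t in totals]
    return classes, prior, loglik

def mnb_predict(model, X):
    """Pick the class maximizing log prior + sum(feature weight * log likelihood)."""
    classes, prior, loglik = model
    return [max(classes, key=lambda c: prior[c] + sum(v * w for v, w in zip(x, loglik[c])))
            for x in X]

def f1_per_class(y_true, y_pred):
    """Per-class F1-score computed from true/false positives and false negatives."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

# Hypothetical, already-labeled toy reviews (bagus = good, suka = like,
# biasa = ordinary, kecewa = disappointed, jelek = bad); positive dominates.
train = [("bagus banget suka", "positive"), ("produk bagus wangi", "positive"),
         ("suka sekali bagus", "positive"), ("mantap bagus murah", "positive"),
         ("wangi suka mantap", "positive"), ("biasa saja standar", "neutral"),
         ("standar biasa", "neutral"), ("jelek rusak kecewa", "negative"),
         ("kecewa jelek", "negative")]
test = [("bagus suka", "positive"), ("biasa standar", "neutral"),
        ("rusak kecewa", "negative")]

tok_train = [s.lower().split() for s, _ in train]
vocab, idf = tfidf_fit(tok_train)
X_train = tfidf_transform(tok_train, vocab, idf)
# SMOTE is applied to the training split only, so test rows stay untouched.
X_bal, y_bal = smote(X_train, [lab for _, lab in train])
model = mnb_fit(X_bal, y_bal)
X_test = tfidf_transform([s.lower().split() for s, _ in test], vocab, idf)
y_test = [lab for _, lab in test]
preds = mnb_predict(model, X_test)
acc = sum(t == p for t, p in zip(y_test, preds)) / len(y_test)
f1 = f1_per_class(y_test, preds)
print(preds, acc, f1)
```

Because SMOTE interpolates between non-negative TF-IDF vectors, the synthetic samples also stay non-negative, which Multinomial Naïve Bayes requires. The dense-list representation is used only for readability; at the scale of 446,325 reviews, sparse matrices would be needed.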
Copyright © 2025