Sentiment analysis is an important way to understand user opinions about digital education apps, because the number of reviews on the Google Play Store is too large to analyze manually. This study compares three machine learning methods, namely Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree, for classifying the sentiment of user reviews of the Brainly and Ruang Guru apps. Data were collected by scraping 8,000 reviews from the Google Play Store (4,000 per app) from May to June 2026; after duplicate reviews were removed, 6,151 reviews remained, consisting of 2,836 for Brainly and 3,315 for Ruang Guru. Sentiment labels were assigned from the star rating (1–3 stars negative, 4–5 stars positive), resulting in an imbalanced distribution of 79.8% positive and 20.2% negative. The text was processed through nine pre-processing stages designed for informal Indonesian. Features were then extracted using TF-IDF, yielding 2,398 features with a sparsity of 99.78%. The training data were balanced using the SMOTE technique, and the models were tuned with GridSearchCV using 5-fold StratifiedKFold cross-validation. In the scenario with tuning and SMOTE, Naïve Bayes showed the best performance, with an accuracy of 82.78%, an F1-Score of 83.79%, and an ROC-AUC of 88.44%, outperforming Decision Tree and KNN. Notably, Naïve Bayes without SMOTE achieved the highest overall accuracy of 88.95%, indicating that applying SMOTE to high-dimensional TF-IDF data does not always improve model performance. Discriminative keyword analysis identified terms associated with positive sentiment, such as 'helpful', 'easy', and 'best', and terms associated with negative sentiment, such as 'trash', 'ads', and 'error', which the developers of both applications can use as a reference for improving service quality.
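The labeling rule and the class distribution described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the toy reviews, and the duplicate criterion (same app and same normalized text) are assumptions made for illustration only, since the abstract does not state how duplicates were detected.

```python
from collections import Counter


def label_from_stars(stars: int) -> str:
    """Map a star rating to a sentiment label: 1-3 negative, 4-5 positive."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    return "positive" if stars >= 4 else "negative"


def deduplicate(reviews):
    """Keep the first occurrence of each (app, normalized text) pair.

    Assumption: the duplicate criterion here is hypothetical.
    """
    seen = set()
    unique = []
    for review in reviews:
        key = (review["app"], review["text"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


def class_distribution(reviews):
    """Return the percentage of reviews per sentiment label."""
    counts = Counter(label_from_stars(r["stars"]) for r in reviews)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical toy reviews, not taken from the study's dataset.
sample = [
    {"app": "Brainly", "text": "Sangat membantu", "stars": 5},
    {"app": "Brainly", "text": "sangat membantu ", "stars": 5},  # duplicate
    {"app": "Brainly", "text": "Banyak iklan", "stars": 2},
    {"app": "Ruang Guru", "text": "Mudah dipakai", "stars": 4},
    {"app": "Ruang Guru", "text": "Sering error", "stars": 3},
    {"app": "Ruang Guru", "text": "Terbaik", "stars": 5},
]

unique = deduplicate(sample)
distribution = class_distribution(unique)
print(len(unique), distribution)
```

Because 3-star reviews are counted as negative, the rule is deliberately strict toward the positive class; a different threshold would shift the reported 79.8% / 20.2% split.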
Copyrights © 2026