Online reviews on platforms such as Google Maps have become an important data source for analyzing public opinion and consumer behavior, including in the selection of religious educational institutions, specifically pesantren (Islamic boarding schools). This study applies sentiment analysis to measure public perception of pesantren across the island of Java. A total of 8,577 reviews were collected via web scraping and then preprocessed through cleansing, case folding, tokenization, stopword removal, and stemming. The prepared dataset was partitioned with a stratified train-test split into 70% for training and 30% for testing. Three pre-trained language models, IndoBERT Base, IndoRoBERTa Small, and XLM-RoBERTa, were fine-tuned using the Focal Loss function, and the training strategy saved the best model according to the F1-score on the neutral class. The final evaluation on the unseen test data showed that the IndoBERT Base model outperformed the others, achieving the highest overall accuracy of 0.92 (92%). This result suggests strong generalization ability, with no significant overfitting and successful mitigation of classification bias. The findings support IndoBERT Base as the optimal model for sentiment classification of pesantren reviews. Future research should focus on building a larger, more diverse dataset to further improve model generalizability.
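The stratified 70/30 split, the Focal Loss objective, and best-model selection by neutral-class F1 can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation. The function names, the label strings (`"pos"`, `"neg"`, `"neutral"`), the focusing parameter `gamma=2.0`, and the random seed are all hypothetical assumptions. The abstract also does not say which held-out split was used for checkpoint selection, so `select_best_epoch` simply takes per-epoch predictions on some evaluation set.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=42):
    """Split indices so each class keeps roughly the same share in train and test.

    Hypothetical helper; test_frac=0.3 mirrors the 70/30 split in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within each class only
        n_test = round(len(idxs) * test_frac)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Multi-class focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed value; with gamma=0 and no alpha it reduces to cross-entropy.
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    a_t = 1.0 if alpha is None else alpha[target]
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def class_f1(y_true, y_pred, cls):
    """F1-score of a single class (e.g. the neutral class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best_epoch(epoch_predictions, y_eval, neutral_label="neutral"):
    """Return (epoch, f1) for the epoch whose predictions give the highest neutral-class F1."""
    best_epoch, best_f1 = None, -1.0
    for epoch, preds in enumerate(epoch_predictions):
        f1 = class_f1(y_eval, preds, neutral_label)
        if f1 > best_f1:
            best_epoch, best_f1 = epoch, f1
    return best_epoch, best_f1
```

The down-weighting factor `(1 - p_t) ** gamma` shrinks the loss for examples the model already classifies confidently. This is why Focal Loss is often chosen when a minority class, such as neutral reviews, is easily overwhelmed. Selecting checkpoints by that class's F1 rather than by overall accuracy follows the same motivation.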
Copyright © 2026