The rapid growth of application-based transportation services in Indonesia has generated a large volume of user reviews containing information that is essential for service development. Processing and analyzing this data at scale, however, poses significant challenges. This study uses Hadoop and Apache Spark to conduct sentiment analysis on reviews of online transportation applications, focusing on Gojek user reviews. The dataset comprises 1,880,112 reviews obtained from Kaggle and the Google Play Store. The research method consists of data preprocessing with distributed computing on Hadoop and Spark, followed by sentiment labeling based on user ratings. The sentiment analysis model is Logistic Regression, with hyperparameters tuned through cross-validation. The evaluation shows a model accuracy of 80%, indicating that the model classifies sentiment effectively; the PySpark implementation keeps training and evaluation efficient despite the large-scale dataset. Word cloud visualizations reveal that negative sentiment is often associated with app performance and digital wallet issues, neutral sentiment focuses more on driver services, and positive sentiment highlights user satisfaction with the overall service. These findings demonstrate the effectiveness of Hadoop in large-scale sentiment analysis and provide valuable insights for improving online transportation services.
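The rating-based labeling step can be illustrated with a minimal sketch. The abstract does not state the rating thresholds, so the mapping below (1 to 2 stars negative, 3 stars neutral, 4 to 5 stars positive) is an assumption, and the function name `label_from_rating` and the sample reviews are hypothetical. The sketch uses plain Python for readability; it is not the study's PySpark implementation.

```python
from collections import Counter


def label_from_rating(rating: int) -> str:
    """Map a 1-5 star rating to a sentiment label.

    The thresholds are an assumption for illustration; the study
    does not state the exact cut-offs it used.
    """
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


# Hypothetical sample reviews as (text, star rating) pairs.
sample_reviews = [
    ("app keeps crashing when I top up my wallet", 1),
    ("driver arrived but took a long route", 3),
    ("fast pickup and friendly driver", 5),
    ("payment failed twice", 2),
]

# Attach a sentiment label to each review and count the labels.
labeled = [(text, label_from_rating(stars)) for text, stars in sample_reviews]
label_counts = Counter(label for _, label in labeled)
```

In a distributed setting the same mapping would typically be expressed as a column expression over the review table, so that labeling runs in parallel across partitions rather than in a Python loop.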
Copyrights © 2025