Sentiment analysis is an application of text mining that identifies the opinions expressed in a body of text about a particular event or topic. Its main function is to extract information and determine the meaning and opinion conveyed by a given user. Sentiment analysis relies on classification algorithms such as the Support Vector Machine (SVM), which is frequently used for text classification because it can handle high-dimensional data. SVM works by finding the best hyperplane to separate two classes in the input space. Text data with a large number of features causes data imbalance and affects the classification process, so feature selection is necessary. Feature selection is a technique for removing irrelevant attributes from a dataset. N-gram feature selection is a statistics-based approach to text classification in which the features are contiguous sequences of N tokens. N-grams can classify unseen text with high certainty, and in sentiment analysis they tolerate textual errors, run efficiently, need little storage, and process quickly. This research performs sentiment analysis on application reviews from the Google Play Store using SVM with unigram, bigram, and trigram feature selection. The methodology comprises a theoretical study, web scraping, text preprocessing, sentiment labeling with VADER, term weighting with TF-IDF, splitting the data into training (80%) and testing (20%) sets, training and evaluating the models, classifying the testing data, and interpreting the results. A total of 3151 testing samples were classified. SVM with unigram feature selection achieved the highest accuracy, 90%, with an AUC of 0.93 (excellent). SVM with bigram feature selection achieved an accuracy of 78% with an AUC of 0.81 (good). SVM with trigram feature selection had the lowest accuracy, 68%, with an AUC of 0.66 (poor).
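The unigram, bigram, and trigram features and the TF-IDF weighting named in the methodology can be illustrated with a minimal sketch. It uses only the Python standard library and rests on assumptions: the helper names `ngrams` and `tfidf`, the three toy reviews, whitespace tokenization, and the unsmoothed weight tf × ln(N/df) are illustrative choices, not the study's actual code, data, or library settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Weight each n-gram in each document by tf * ln(N / df).

    tf is the raw count of the n-gram in the document, N is the number
    of documents, and df is the number of documents containing the
    n-gram. This is the plain textbook form (an assumption); libraries
    often apply smoothing and normalization on top of it.
    """
    counts = [Counter(ngrams(d.lower().split(), n)) for d in docs]
    # Iterating a Counter yields each distinct n-gram once per document,
    # so this tallies document frequency rather than total occurrences.
    df = Counter(term for c in counts for term in c)
    total = len(docs)
    return [
        {term: tf * math.log(total / df[term]) for term, tf in c.items()}
        for c in counts
    ]


# Hypothetical toy reviews, not taken from the study's dataset.
reviews = [
    "great app works well",
    "app crashes not great",
    "works well after update",
]

for n, name in [(1, "unigram"), (2, "bigram"), (3, "trigram")]:
    vectors = tfidf(reviews, n)
    vocabulary = {term for vec in vectors for term in vec}
    print(name, "vocabulary size:", len(vocabulary))
```

In this toy data, several unigrams occur in more than one review, only one bigram does, and no trigram does, which shows how longer n-grams recur less often across documents. Whether the same effect explains the accuracy ordering reported above is not something this sketch can establish.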
Copyrights © 2025