The outbreak caused by the coronavirus (2019-nCoV), commonly called COVID-19, is an outbreak of a disease that infects the lungs. The COVID-19 vaccine has become a widely discussed topic among the majority of Indonesians. Twitter, one of the largest social media platforms, has become a place where Indonesians express their opinions about the COVID-19 vaccine. A sentiment analysis system is therefore needed to examine the polarity of the public response and to facilitate the data analysis process. The data analyzed consist of 1482 tweets expressing the opinions of Indonesians on Twitter, split into 1185 training tweets and 297 test tweets. The data are grouped into three sentiment classes: negative, neutral, and positive. Before sentiment analysis begins, the data set is preprocessed through case folding, cleaning, tokenizing, filtering, and stemming. Chi-square feature selection is then applied to eliminate unimportant features (terms), followed by TF-IDF weighting. After TF-IDF weighting, cosine similarity is calculated, and in the final stage the K-Nearest Neighbor (KNN) method is applied to obtain the classification results. Evaluation with a confusion matrix yields an accuracy of 88.5522%, a precision of 88.18%, a recall of 89.95%, and an F-measure of 89.05%.
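The last three stages described above (TF-IDF weighting, cosine similarity, and KNN voting) can be illustrated with a minimal sketch. It is not the study's implementation: the toy tokens and labels are hypothetical English stand-ins for preprocessed tweets, the IDF variant (log(N/df) + 1) and the value k = 3 are assumptions, and preprocessing and chi-square feature selection are omitted.

```python
import math
from collections import Counter


def weigh(tokens, idf):
    """TF-IDF weights for one tokenized document; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def tfidf_fit(docs):
    """Learn IDF values from tokenized training docs and return (vectors, idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so terms in every document keep weight 1.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [weigh(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-preprocessed toy data (not from the study's data set).
train_docs = [
    ["vaccine", "safe", "good"],
    ["vaccine", "effective", "good"],
    ["vaccine", "dangerous", "bad"],
    ["vaccine", "side", "effect", "bad"],
    ["vaccine", "schedule", "today"],
    ["vaccine", "location", "today"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

train_vecs, idf = tfidf_fit(train_docs)

positive_query = weigh(["vaccine", "good", "safe"], idf)
negative_query = weigh(["vaccine", "bad", "dangerous"], idf)
positive_pred = knn_predict(positive_query, train_vecs, train_labels, k=3)
negative_pred = knn_predict(negative_query, train_vecs, train_labels, k=3)
print(positive_pred, negative_pred)
```

Because cosine similarity normalizes by vector length, tweets of different lengths remain comparable, which is one common reason to pair it with TF-IDF weights in a KNN text classifier.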
Copyrights © 2022