International Journal of Advances in Intelligent Informatics
Vol 9, No 1 (2023): March 2023

Sentiment classification from reviews for tourism analytics

Nur Aliah Khairina Mohd Haris (School of Computing Sciences, College of Computing, Informatics and Media, Universiti Teknologi MARA)
Sofianita Mutalib (School of Computing Sciences, College of Computing, Informatics and Media, Universiti Teknologi MARA)
Ariff Md Ab Malik (Faculty of Business and Management, Universiti Teknologi MARA)
Shuzlina Abdul-Rahman (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA)
Siti Nur Kamaliah Kamarudin (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA)



Article Info

Publish Date
31 Mar 2023

Abstract

User-generated content is critical for tourism destination management because it helps organizations identify customer opinions and come up with solutions to improve their tourism organizations. Social media hosts a large number of reviews, which makes manual analysis difficult for these organizations. Sentiment classification can sort reviews into several classes and thereby ease decision-making. However, the reviews contain noisy content, such as typos and emoticons, which can reduce the accuracy of the classifiers. This study evaluates Support Vector Machine, Naive Bayes, and Random Forest models on these reviews to identify a suitable classifier. The main phases of the study are data collection, data preparation, data labelling, and modelling. The data were collected from TripAdvisor and Google reviews using web scraping tools, and each review was labelled with one of three sentiments: positive, neutral, or negative. Pre-processing steps include removing missing values, tokenization, case folding, stop-word removal, stemming, and applying n-grams. The models are compared by accuracy, and the model with the highest accuracy is selected as the solution. The findings show that the Support Vector Machine model with 5-fold cross-validation is the most suitable classifier, with an accuracy of 67.97%, compared with 61.33% for Naive Bayes and 63.55% for Random Forest. In conclusion, the results provide useful information for the tourism sector and identify a suitable algorithm for sentiment analysis in the tourism domain.
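The pre-processing and evaluation steps summarized in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the choice of unigrams plus bigrams, the unstratified round-robin folds, and the toy reviews (invented for illustration, not the study's TripAdvisor or Google data) are all assumptions. The classifier is a hand-written multinomial Naive Bayes, one of the three models compared; Support Vector Machine and Random Forest would normally come from a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption); real pipelines use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "and", "or", "to",
              "of", "it", "this", "in", "at", "for"}

def simple_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, n=2):
    """Case folding, tokenization, stop-word removal, stemming, then 1..n-grams."""
    # Lowercasing then keeping only letter runs also drops emoticons and digits.
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [simple_stem(t) for t in tokens if t not in STOP_WORDS]
    features = list(tokens)  # unigrams
    for size in range(2, n + 1):  # higher-order n-grams joined by spaces
        features += [" ".join(tokens[i:i + size])
                     for i in range(len(tokens) - size + 1)]
    return features

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.n = len(labels)
        return self

    def predict(self, doc):
        def log_score(c):
            score = math.log(self.prior[c] / self.n)
            for f in doc:
                if f in self.vocab:  # ignore features unseen in training
                    score += math.log((self.counts[c][f] + 1)
                                      / (self.totals[c] + len(self.vocab)))
            return score
        return max(self.classes, key=log_score)

def k_fold_accuracy(docs, labels, k=5):
    """Mean accuracy over k folds (round-robin assignment, no shuffling)."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        model = MultinomialNB().fit([docs[i] for i in train_idx],
                                    [labels[i] for i in train_idx])
        correct = sum(model.predict(docs[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Toy, hand-written reviews (hypothetical), including two missing values.
rows = [
    ("The beach was beautiful and the staff were friendly", "positive"),
    ("Amazing views, loved the island tour", "positive"),
    ("Great food and very clean rooms", "positive"),
    ("Lovely place, would visit again :)", "positive"),
    ("The museum was okay, nothing special", "neutral"),
    ("Average hotel, decent location", "neutral"),
    ("It was fine for a short stop", "neutral"),
    ("Dirty toilets and rude staff", "negative"),
    ("Terrible service, waited two hours", "negative"),
    ("Overpriced tickets and crowded park", "negative"),
    (None, "positive"),
    ("   ", "neutral"),
]

# Missing-value removal: drop rows whose text is absent or blank.
clean = [(text, y) for text, y in rows if text and text.strip()]
docs = [preprocess(text) for text, _ in clean]
labels = [y for _, y in clean]
print(f"5-fold accuracy: {k_fold_accuracy(docs, labels, k=5):.2%}")
```

Because the folds are assigned round-robin rather than stratified, a fold's training split may under-represent a class on small data; with a realistic dataset, shuffled stratified folds are the more common choice.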

Copyright © 2023






Journal Info

Abbrev

IJAIN

Subject

Computer Science & IT

Description

International Journal of Advances in Intelligent Informatics (IJAIN), e-ISSN: 2442-6571, is a peer-reviewed, open-access, English-language journal, published three times a year, for scientists and engineers throughout the world to exchange and disseminate theoretical and ...