This study investigates the efficacy of machine learning algorithms for sentiment classification in the analysis of culture and heritage tourism content. It adopts CRISP-DM, a methodology organized into distinct stages, including business understanding, data understanding, modeling, evaluation, and deployment. Four models are compared: k-Nearest Neighbors, Decision Tree, Naive Bayes Classifier, and Support Vector Machine (SVM). The performance of each model is evaluated with metrics derived from the confusion matrix, namely accuracy, precision, recall, and F-measure. The effect of applying the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance is also assessed. Using data from the National Geographic channel on YouTube, with a focus on Ma'nene content, the results show that SVM consistently outperforms the other models, particularly when combined with SMOTE, reaching an accuracy of 77.89%, precision of 72.60%, recall of 89.62%, and F-measure of 80.21%. These findings underscore the importance of algorithm selection and data preprocessing in improving sentiment classification for culture and heritage tourism content, and they contribute quantitative evidence to tourism research.
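As an illustration of how the four reported metrics relate to a binary confusion matrix, the following minimal sketch computes them from raw counts. The function names and the example counts are hypothetical and are not taken from the study; only the reported precision and recall values come from the abstract.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f_measure(precision, recall)


# Hypothetical counts, chosen only to show the call; not the study's data.
example = confusion_metrics(tp=8, fp=2, fn=2, tn=8)

# Precision and recall as reported for SVM with SMOTE.
reported_f = f_measure(0.7260, 0.8962)
```

Under these definitions, the reported precision and recall give an F-measure of roughly 80.2%, which agrees with the stated 80.21% up to rounding of the inputs.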
Copyrights © 2023