Building of Informatics, Technology and Science
Vol 5 No 3 (2023): December 2023

Sentiment Classification of S.E.A Aquarium Singapore Reviews through CRISP-DM using DT and SVM with SMOTE

Singgalen, Yerik Afrianto



Article Info

Publish Date
30 Dec 2023

Abstract

In recent years, sentiment analysis has become an important area of research because of its applications in understanding public opinion, customer feedback, and social media sentiment. A significant challenge in sentiment analysis is the handling of imbalanced datasets, in which the uneven distribution of sentiment classes leads to biased model performance. This study applies the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to investigate sentiment analysis algorithms, focusing mainly on the Support Vector Machine (SVM) algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE). Through systematic experimentation and evaluation, the research demonstrates the superior performance of the SVM-SMOTE model on imbalanced data: an accuracy of 98.46%, an AUC of 1.000, a precision of 100.00%, a recall of 96.91%, and an F-measure of 98.42%. The evaluation also reports two toxicity scores for each category: Toxicity at 0.11036 and 0.93915, Severe Toxicity at 0.00905 and 0.45895, Identity Attack at 0.02415 and 0.66373, Insult at 0.05149 and 0.85793, Profanity at 0.06392 and 0.93426, and Threat at 0.01562 and 0.51957. These scores quantify potential harm in the analyzed content. Together, the results support the efficacy of the SVM-SMOTE model in real-world applications and contribute to the advancement of sentiment analysis within the CRISP-DM framework.
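
To make the oversampling step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the study's implementation: the function name `smote_oversample`, its parameters, and the toy feature vectors are all assumptions introduced for illustration. Each synthetic point is placed at a random position on the line segment between a randomly chosen minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Simplified illustration of SMOTE; names and defaults are hypothetical.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate: gap in [0, 1) sets the position along the segment.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy 2-D feature vectors (hypothetical values, not data from the study).
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1)]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote_oversample(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

In a text-classification setting such as the one described, the points would typically be vectorized reviews (for example TF-IDF vectors), and oversampling would be applied to the training portion only so that synthetic samples do not leak into the evaluation data.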

Copyrights © 2023






Journal Info

Abbrev

bits

Publisher

Subject

Computer Science & IT

Description

Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles reporting research results in information technology and computing. Papers submitted to this journal are first checked for plagiarism and peer-reviewed to maintain quality. ...