In recent years, sentiment analysis has become a critical area of research because of its wide-ranging applications in understanding public opinion, customer feedback, and social media sentiment. A significant challenge in sentiment analysis, however, is the handling of imbalanced datasets, in which the sentiment classes are unevenly distributed and models consequently become biased toward the majority class. This study applies the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to investigate sentiment analysis algorithms, focusing mainly on the Support Vector Machine (SVM) algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE). Through systematic experimentation and evaluation, the research demonstrates the superior performance of the SVM-SMOTE model on imbalanced data, with an accuracy of 98.46%, an AUC of 1.000, a precision of 100.00%, a recall of 96.91%, and an F-measure of 98.42%. The evaluation also reports toxicity scores across several categories, each given as a pair of values: Toxicity (0.11036 and 0.93915), Severe Toxicity (0.00905 and 0.45895), Identity Attack (0.02415 and 0.66373), Insult (0.05149 and 0.85793), Profanity (0.06392 and 0.93426), and Threat (0.01562 and 0.51957). These scores quantify the potential harm in the analyzed content, underscoring the effectiveness of the SVM-SMOTE model in real-world applications and contributing to the advancement of sentiment analysis within the CRISP-DM framework.
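SMOTE rebalances a dataset by creating synthetic minority-class samples: each new point lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a simplified pure-Python variant written only for illustration, not the study's implementation; it picks base samples at random, uses Euclidean distance on numeric feature vectors, and assumes a binary task. The names `smote` and `balance` are hypothetical. In practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used, applied only to the training split, and the balanced output would then be used to train the SVM (for example scikit-learn's `SVC`), a step not shown here.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical helper: create n_synthetic points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority samples, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=None):
    """Hypothetical helper: oversample the minority class of a binary dataset
    until both classes have the same number of samples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)  # assumes exactly two classes
    needed = max(0, n_majority - len(minority))
    new_points = smote(minority, needed, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

For example, with three minority samples and five majority samples, `balance(X, y, minority_label=1, k=2, seed=0)` returns ten samples, five per class, and every synthetic point falls inside the region spanned by the original minority samples. Interpolating rather than duplicating samples is what distinguishes SMOTE from plain random oversampling.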
Copyright © 2023