This study evaluates five machine learning models for sentiment classification of cyberbullying data across six categories: not_cyberbullying, gender, religion, other_cyberbullying, age, and ethnicity. Text is represented with a Bag of Words approach and reduced to 1,000 features by Chi-Square feature selection. The models tested are Support Vector Machine (SVM), Logistic Regression, Naïve Bayes, K-Nearest Neighbors (KNN), and Random Forest. SVM and Logistic Regression achieve the highest accuracy, 83% each, which indicates that they are the most effective predictors on this data. Random Forest and KNN follow with 81% and 75% accuracy, respectively. Naïve Bayes performs worst, with 62% accuracy, which suggests either a mismatch between the model and the data or a need for further tuning. Comparing several algorithms shows how each model behaves on data with differing characteristics, which is essential for understanding the nuances of each cyberbullying category. Model selection should weigh accuracy, interpretability, computational cost, and suitability to the specific problem. This research aims to deepen understanding of cyberbullying in support of more effective mitigation strategies.
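
The feature pipeline described above (Bag of Words followed by Chi-Square selection) can be illustrated with a minimal, standard-library sketch. The abstract does not state which tokenizer or implementation was used, so everything here is an assumption made for illustration: the whitespace tokenizer, the count-based chi-square statistic, the function names `bag_of_words`, `chi_square_scores`, and `select_top_k`, and the four toy documents. The study keeps 1,000 features; the toy example keeps 4.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) using lowercase whitespace tokens."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for tok, n in Counter(toks).items():
            matrix[i][index[tok]] = n
    return vocab, matrix


def chi_square_scores(matrix, labels):
    """Chi-square statistic of each term's counts against the class labels.

    For each term, the observed count per class is compared with the count
    expected if the term were spread across classes in proportion to the
    number of documents in each class.
    """
    classes = sorted(set(labels))
    n_docs = len(labels)
    class_prob = {c: labels.count(c) / n_docs for c in classes}
    scores = []
    for j in range(len(matrix[0])):
        total = sum(row[j] for row in matrix)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, y in zip(matrix, labels) if y == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(zip(scores, vocab), key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in ranked[:k]]


# Hypothetical toy data; the labels reuse two of the six category names.
docs = [
    "you are too old grandpa",
    "old people are useless",
    "nice weather today",
    "lovely weather and nice people",
]
labels = ["age", "age", "not_cyberbullying", "not_cyberbullying"]

vocab, matrix = bag_of_words(docs)
scores = chi_square_scores(matrix, labels)
selected = select_top_k(vocab, scores, k=4)
print(selected)
```

Terms that occur equally often in both classes, such as "people" in the toy data, score zero and are dropped, while terms concentrated in one class are kept. The reduced count vectors would then be passed to each classifier under comparison.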
                        
Copyright © 2024