Hate speech is communication that expresses hostility or discontent towards particular individuals, groups, or ethnicities, with the intent to demean them. This research examines expressions posted on Twitter and classifies them as hate speech or not using machine learning. Tweets are represented with Term Frequency-Inverse Document Frequency (TF-IDF) features, and the Synthetic Minority Over-sampling Technique (SMOTE) is applied to mitigate class imbalance. The models evaluated are Logistic Regression (LR), Decision Tree (DT), Gradient Boosting (GB), and Random Forest (RF). Among these models, LR performed best, achieving an accuracy of 91.43%, precision of 88.83%, recall of 93.99%, and an F1 score of 97.10%.
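To illustrate how SMOTE addresses class imbalance, the following is a minimal pure-Python sketch of its core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not the implementation used in this study. The function name `smote`, its parameters, and the toy two-dimensional vectors (standing in for TF-IDF features) are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors (toy values, not study data).
minority = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7], [0.3, 0.95]]
synthetic = smote(minority, n_new=4, k=2, seed=0)
print(synthetic)
```

Because every synthetic point lies between two existing minority samples, the oversampled class stays within the region already occupied by the minority data. In practice, oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.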
Copyrights © 2025