Hate speech and abusive language have become increasingly common on social media. With advances in technology and the rapid growth of the internet, anyone can now post hate speech or offensive expressions on platforms such as Twitter, which often leads to conflict on those platforms. Automatic detection of offensive content and hate speech is therefore recommended, especially on the user application side, to filter tweets that can harm social life in the real world. The purpose of this research is to build a classification model using a Support Vector Machine (SVM) with FastText word embedding features to determine whether a tweet contains hate speech and/or offensive language. Our contribution is an improvement in performance over the baseline SVM, obtained by using FastText word embeddings as input features. The experimental results are also compared with several machine learning methods that have been reported on the same dataset of 13,167 tweets. The experiment with the best-performing SVM model yields an average accuracy of 82.65%, with accuracies of 84.92%, 86.60% and 76.43% for the hate speech class, the abusive language class and the hate speech level, respectively. These results are better than those of conventional machine learning methods, but do not exceed the results achieved by deep learning.
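As a quick check on the reported figures, the following minimal sketch recomputes the average accuracy from the three per-task accuracies. It assumes the average is the unweighted mean of the three values, which the abstract does not state explicitly but which matches the reported 82.65%. The dictionary keys and variable names are illustrative only and do not come from the paper.

```python
# Per-task accuracies (in percent) as reported in the abstract.
# The key names are hypothetical labels chosen for this sketch.
task_accuracies = {
    "hate_speech": 84.92,
    "abusive_language": 86.60,
    "hate_speech_level": 76.43,
}

# Assumption: the reported average is the unweighted mean over the three tasks.
average_accuracy = sum(task_accuracies.values()) / len(task_accuracies)

print(f"{average_accuracy:.2f}")  # prints 82.65
```

Under this assumption, each task contributes equally to the average regardless of how many labels or examples it covers.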
Copyrights © 2023