Hate speech is a linguistic phenomenon that departs from the norms of polite language and communication ethics. Today it is widespread on the internet, especially among social media users. This research aims to detect whether a word or sentence contains hate speech, using the Support Vector Machine (SVM) method for classification. The data were collected from Twitter using the Tweepy API, giving a total of 1681 labeled samples: 643 tweets labeled HS (hate speech) and 1038 tweets labeled Non_HS (not hate speech). For word weighting, the researchers used Term Frequency-Inverse Document Frequency (TF-IDF), which weights each word by how often it appears in a tweet relative to how common it is across the dataset. In the classification process, the researchers used two methods, Support Vector Machine and XGBoost. The best result came from SVM with 90% training data and 10% test data, which obtained a training score of 95.87% and a test score of 87.30%, a gap of 8.57 percentage points. The SVM model was then tuned using Randomized Search Cross Validation (RSCV), which raised the training score to 100% and the test score to 93.20%, a gap of 6.80 percentage points.
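As an illustration of the word-weighting step and of the reported train/test gaps, the following is a minimal sketch using only the Python standard library. It assumes the textbook TF-IDF definition (term frequency times log of inverse document frequency); the abstract does not state which variant the study used, and libraries such as scikit-learn apply smoothed versions. The function names `tf_idf` and `score_gap` and the toy sentences are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the unsmoothed textbook form: tf = count / doc length,
    idf = log(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def score_gap(train_score, test_score):
    """Gap between training and test scores, in percentage points."""
    return round(train_score - test_score, 2)


# Hypothetical toy corpus for illustration only.
docs = [
    "you are awful awful".split(),
    "you are kind".split(),
    "what a nice day".split(),
]
weights = tf_idf(docs)

# A term repeated in one tweet and absent elsewhere outweighs a common term.
print(weights[0]["awful"] > weights[0]["you"])

# Gaps recomputed from the scores reported in the abstract.
print(score_gap(95.87, 87.30))
print(score_gap(100.0, 93.20))
```

In this sketch, a term that appears in every document gets a weight of zero, which is why shared words contribute little to the classifier's features. The gap helper simply subtracts the test score from the training score, matching the 8.57 and 6.80 point figures above.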
Copyrights © 2023