G-Tech : Jurnal Teknologi Terapan
Vol 8 No 1 (2024): G-Tech, Vol. 8 No. 1 January 2024

Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications

Gregorius Airlangga (Atma Jaya Catholic University of Indonesia, Indonesia)



Article Info

Publish Date
22 Jan 2024

Abstract

This research compared the effectiveness of two Natural Language Processing (NLP) techniques, spaCy's word embeddings and scikit-learn's TF-IDF vectorization, in identifying hate speech within online comments. Using a balanced dataset, each model was assessed on its ability to classify comments as 'hateful' or 'non-hateful'. The evaluation metrics were precision, recall, F1-score, and overall accuracy. The model using spaCy's word embeddings achieved an accuracy of 65%, with equal precision and recall for both classes. The scikit-learn TF-IDF model performed better, with an overall accuracy of 75% and a 77% recall on the hateful class, meaning it correctly identified a larger share of the hateful comments. This suggests that the TF-IDF model is better at discerning nuanced expressions of hate speech. The findings highlight how strongly the choice of vectorization method affects automated content moderation, and they stress the need for continued innovation and model adaptation to keep pace with the evolving nature of online hate speech.
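The four metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The function names and the toy labels below are hypothetical and are not taken from the study. They only show how per-class precision, recall, F1-score, and overall accuracy are derived from true and predicted labels.

```python
from dataclasses import dataclass


@dataclass
class ClassReport:
    precision: float
    recall: float
    f1: float


def class_report(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return ClassReport(precision, recall, f1)


def accuracy(y_true, y_pred):
    """Share of comments whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (NOT the study's data): 4 hateful, 4 non-hateful.
y_true = ["hateful"] * 4 + ["non-hateful"] * 4
y_pred = [
    "hateful", "hateful", "hateful", "non-hateful",          # one hateful comment missed
    "hateful", "hateful", "non-hateful", "non-hateful",      # two false alarms
]

hateful = class_report(y_true, y_pred, positive="hateful")
print(f"precision={hateful.precision:.2f} recall={hateful.recall:.2f} "
      f"f1={hateful.f1:.2f} accuracy={accuracy(y_true, y_pred):.2f}")
```

In this toy case, 3 of the 4 hateful comments are caught (recall 3/4) while 3 of the 5 comments flagged as hateful are correct (precision 3/5). Recall on the hateful class therefore measures something different from overall accuracy, even on a balanced dataset.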

Copyright © 2024






Journal Info

Abbrev

g-tech

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Energy; Engineering

Description

Jurnal G-Tech aims to publish original research and reviews of research on technology and its applications within the scope of engineering, including mechanical engineering, electrical engineering, informatics engineering, information systems, agrotechnology, ...