Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer
Vol 3 No 5 (2019): May 2019

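Klasifikasi Hate Speech Berbahasa Indonesia di Twitter Menggunakan Naive Bayes dan Seleksi Fitur Information Gain dengan Normalisasi Kata [Classification of Indonesian-Language Hate Speech on Twitter Using Naive Bayes and Information Gain Feature Selection with Word Normalization]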

Ivan Ivan (Fakultas Ilmu Komputer, Universitas Brawijaya)
Yuita Arum Sari (Fakultas Ilmu Komputer, Universitas Brawijaya)
Putra Pandu Adikara (Fakultas Ilmu Komputer, Universitas Brawijaya)



Article Info

Publish Date
20 Jun 2019

Abstract

Hate speech is a form of expression intended to incite hatred, provoke acts of violence, and oppose a person or a group of people for various reasons. Hate speech is encountered very often on social media, including Twitter. The goal of this work is to build a system that classifies a tweet as hate speech (HS) or non hate speech (NONHS). The method used is Naive Bayes with Information Gain feature selection and word normalization. Word normalization addresses problems common on Twitter, such as heavy use of abbreviations, slang, misspellings, and language that does not follow existing standards. Word normalization is provided by the Indonesian Natural Language Processing REST API. The dataset comprises 250 Indonesian-language hate speech tweets, split 80% for training and 20% for testing. The thresholds tested are 20%, 40%, 60%, 80%, and 90%. The threshold is the limit that determines which collection of terms is kept, with the aim of selecting the terms that have high Information Gain values. The best accuracy is obtained by using word normalization in the pre-processing stage together with Information Gain feature selection at an 80% threshold. The best results are an accuracy of 98%, a precision of 100%, a recall of 96.15%, and an F-measure of 98.03%. Based on the analysis of the test results, it can be concluded that classifying Indonesian-language hate speech on Twitter using Naive Bayes and Information Gain feature selection with word normalization improves classification accuracy.
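
The pipeline described in the abstract (rank terms by Information Gain, keep a thresholded share of them, then classify with Naive Bayes) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes the threshold means the fraction of top-ranked terms that is kept, and it assumes a multinomial Naive Bayes with Laplace smoothing; the abstract specifies neither. The function names and the four toy documents are invented for illustration and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from knowing whether `term` occurs in a document."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def select_terms(docs, labels, keep_fraction):
    """Keep the top `keep_fraction` of terms ranked by Information Gain.

    Assumption: the paper's "threshold" is read here as this fraction.
    """
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    k = max(1, round(keep_fraction * len(ranked)))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Collect class priors and per-class term counts over selected terms."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    counts = {c: Counter() for c in priors}
    for d, y in zip(docs, labels):
        counts[y].update(t for t in d if t in vocab)
    return priors, counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    priors, counts, vocab = model
    scores = {}
    for c, prior in priors.items():
        total = sum(counts[c].values())
        score = math.log(prior)
        for t in doc:
            if t in vocab:  # terms dropped by feature selection are ignored
                score += math.log((counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Toy, already-tokenized documents (hypothetical; not from the paper).
docs = [
    ["you", "are", "awful", "trash"],
    ["awful", "people", "go", "away"],
    ["thanks", "for", "the", "help"],
    ["you", "are", "kind", "thanks"],
]
labels = ["HS", "HS", "NONHS", "NONHS"]

selected = select_terms(docs, labels, keep_fraction=0.2)
model = train_nb(docs, labels, selected)
```

On this toy data, a 20% threshold keeps the two terms that separate the classes perfectly ("awful" and "thanks"), and `predict_nb(model, ["awful", "day"])` returns `"HS"`. Under the same assumption, comparing the thresholds listed in the abstract would amount to repeating the selection and training steps with `keep_fraction` set to 0.2, 0.4, 0.6, 0.8, and 0.9.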

Copyrights © 2019






Journal Info

Abbrev

j-ptiik

Subject

Computer Science & IT; Control & Systems Engineering; Education; Electrical & Electronics Engineering; Engineering

Description

Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer (J-PTIIK) Universitas Brawijaya is a scholarly journal in the field of computing that publishes scientific papers resulting from the research of students of the Fakultas Ilmu Komputer, Universitas Brawijaya. The journal is expected to advance research ...