Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer
Vol 5 No 13 (2021): Special Publication 2021

News Document Classification Using Feature Hashing and Artificial Neural Networks

Guedho Augnifico Mahardika (Fakultas Ilmu Komputer, Universitas Brawijaya)
Indriati Indriati (Fakultas Ilmu Komputer, Universitas Brawijaya)
Putra Pandu Adikara (Fakultas Ilmu Komputer, Universitas Brawijaya)



Article Info

Publish Date
15 Jun 2021

Abstract

News is a report about events that are important enough for public consumption, or about something that is simply of interest to someone. News documents are usually grouped into channels in order to organize the large number of documents. As technology has developed, the volume of news has made it difficult to organize news documents manually. This study builds a classification model that organizes news documents automatically. The model is an Artificial Neural Network (ANN) that uses Feature Hashing for feature extraction. The dataset contains 50,926 documents, of which 80% are used as training data and 20% as test data. The best model in this study reaches an accuracy of 0.789 using 1.69% of the full bag-of-words feature set (2,500 of 159,154 features) and an ANN with 50 neurons in its hidden layer. The classes that the model classifies most easily are “Sport” (Olahraga), “Politic” (Politik), and “Techno” (Tekno), with F1 scores of 0.96, 0.87, and 0.84, respectively. The hardest classes are “Lifestyle” (Gaya Hidup), “Tourism” (Pariwisata), and “Education” (Pendidikan), with F1 scores of 0.65, 0.7, and 0.71, respectively. The feature length produced by feature extraction and the number of neurons in the hidden layer both affect the model's accuracy, following a logarithmic relationship. The n-gram setting also has an effect, with the best accuracy achieved using uni-grams, whereas the choice of hashing method has no significant effect on the model's accuracy.
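
To illustrate the feature extraction step named in the abstract, the following is a minimal sketch of feature hashing over word n-grams. It is not the authors' implementation: the tokenizer, the use of MD5 as the hash function, the helper names (`tokenize`, `ngrams`, `bucket`, `hash_features`), and the sample sentence are all assumptions made for illustration. Only the feature length of 2,500 and the uni-gram setting come from the abstract.

```python
import hashlib
import re

N_FEATURES = 2500  # feature length reported for the best model in the abstract


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n=1):
    """Return the word n-grams of a token list; n=1 gives uni-grams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bucket(term, n_features=N_FEATURES):
    """Map a term to a column index with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_features


def hash_features(text, n_features=N_FEATURES, n=1):
    """Build a fixed-length count vector without storing a vocabulary."""
    vector = [0] * n_features
    for term in ngrams(tokenize(text), n):
        vector[bucket(term, n_features)] += 1
    return vector


# Hypothetical sample document, not taken from the study's dataset.
doc = "Timnas Indonesia menang, timnas Indonesia juara"
vec = hash_features(doc)
print(len(vec), sum(vec))  # prints "2500 6": fixed length, six uni-grams counted
```

Because every term is mapped directly to one of a fixed number of columns, the vector length stays at 2,500 regardless of vocabulary size. Distinct terms may collide in the same column, which is the usual trade-off of this technique. A vector like `vec` would then serve as the input layer of a classifier such as the ANN described above.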

Copyrights © 2021

Journal Info

Abbrev

j-ptiik

Subject

Computer Science & IT; Control & Systems Engineering; Education; Electrical & Electronics Engineering; Engineering

Description

Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer (J-PTIIK) Universitas Brawijaya is a scholarly journal in the field of computing that publishes scientific papers resulting from research by students of the Fakultas Ilmu Komputer, Universitas Brawijaya. The journal is expected to foster research ...