News is a report about events that are important enough for public consumption, or about something that is simply of interest to someone. News documents are usually grouped into channels in order to organize them. With the development of technology, the sheer volume of news has made it difficult to organize news documents manually. This study builds a classification model to organize news documents automatically. The model is an Artificial Neural Network (ANN) that uses Feature Hashing for feature extraction. The dataset contains 50,926 news documents, of which 80% are used as training data and 20% as test data. The best model in this study achieves an accuracy of 0.789 using 1.69% of the full bag-of-words feature set (2,500 of 159,154 features) and an ANN with 50 neurons in its hidden layer. The classes that the model classifies most easily are “Sport” (Olahraga), “Politics” (Politik), and “Techno” (Tekno), with F1 scores of 0.96, 0.87, and 0.84, respectively. The hardest classes are “Lifestyle” (Gaya Hidup), “Tourism” (Pariwisata), and “Education” (Pendidikan), with F1 scores of 0.65, 0.70, and 0.71, respectively. Furthermore, both the length of the extracted feature vector and the number of neurons in the hidden layer affect the model's accuracy, following a logarithmic relationship. The n-gram setting also has an effect, with the best accuracy achieved using uni-grams, whereas the choice of hashing method has no significant effect on the model's accuracy.
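As an illustration of the feature extraction step described above, the following is a minimal sketch of feature hashing: each n-gram of a document is hashed into one of a fixed number of buckets, giving a vector of constant length regardless of vocabulary size. The function name `hash_features`, the regular-expression tokenizer, and the use of MD5 as the hash function are assumptions made for this example; the study does not state which tokenizer or hash function it used. The resulting vector is the kind of input that would then be fed to an ANN with a single hidden layer.

```python
import hashlib
import re


def hash_features(text, n_features=2500, ngram=1):
    """Map a document to a fixed-length count vector with the hashing trick.

    n_features=2500 mirrors the feature length of the best reported model;
    ngram=1 corresponds to uni-grams. MD5 is an illustrative choice only.
    """
    # Simple lowercase word tokenizer (an assumption, not the study's method).
    tokens = re.findall(r"\w+", text.lower())
    # Build contiguous n-grams; an empty list results if the text is too short.
    grams = [" ".join(tokens[i:i + ngram])
             for i in range(len(tokens) - ngram + 1)]

    vector = [0] * n_features
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as an integer hash value.
        index = int.from_bytes(digest[:8], "big") % n_features
        vector[index] += 1  # colliding n-grams share a bucket
    return vector


# Hypothetical example sentence, not taken from the study's dataset.
example = "Timnas menang di laga final"
unigram_vector = hash_features(example)
bigram_vector = hash_features(example, ngram=2)
```

Because the bucket index depends only on the hash of the n-gram, no vocabulary needs to be stored, and unseen words at test time still map to a valid position. The trade-off is that distinct n-grams can collide in the same bucket, which is one reason accuracy would be expected to depend on the chosen feature length.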
Copyright © 2021