Indonesian Journal of Electrical Engineering and Computer Science
Vol 8, No 3: December 2017

Indonesian News Classification Using Naïve Bayes and Two-Phase Feature Selection Model

M. Ali Fauzi (Universitas Brawijaya)
Agus Zainal Arifin (Institut Teknologi Sepuluh Nopember)
Sonny Christiano Gosaria (Institut Teknologi Sepuluh Nopember)



Article Info

Publish Date
01 Dec 2017

Abstract

Since the rise of the WWW, the amount of information available online has grown rapidly. One example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been shown to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we use Information Gain to reduce the number of features. In the second phase, MMR-FS selects the final features from this reduced set. The experimental results show that our new method reaches a best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
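
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes binary term-presence features, and it uses the mutual information between two term columns as the redundancy measure, which simplifies the redundancy term used by MMR-FS. The function names, the parameters `keep_phase1` and `keep_phase2`, the weight `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(Y) - H(Y | X). With ys as class labels this is Information Gain."""
    n = len(ys)
    gain = entropy(ys)
    for value in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def two_phase_select(docs, labels, vocab, keep_phase1, keep_phase2, lam=0.5):
    # Binary presence column for every term (assumption: docs are sets of tokens).
    columns = {t: [1 if t in d else 0 for d in docs] for t in vocab}
    relevance = {t: mutual_information(columns[t], labels) for t in vocab}

    # Phase 1: cheap filter, keep only the terms with the highest Information Gain.
    candidates = sorted(vocab, key=lambda t: relevance[t], reverse=True)[:keep_phase1]

    # Phase 2: greedy MMR-style selection on the reduced set. Each step trades
    # relevance to the class against redundancy with already selected terms.
    selected = []
    while candidates and len(selected) < keep_phase2:
        def score(t):
            redundancy = max(
                (mutual_information(columns[t], columns[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[t] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (hypothetical): "gol" and "bola" always co-occur, "dan" is noise.
docs = [
    {"gol", "bola", "dan"}, {"gol", "bola"},
    {"saham", "dan"}, {"saham"},
    {"pemilu", "dan"}, {"pemilu"},
]
labels = ["sport", "sport", "economy", "economy", "politics", "politics"]
vocab = ["gol", "bola", "saham", "pemilu", "dan"]

selected = two_phase_select(docs, labels, vocab, keep_phase1=4, keep_phase2=2)
print(selected)
```

In this sketch the pairwise redundancy computation, which is the expensive part of the greedy step, runs only over the `keep_phase1` terms that survive the Information Gain filter rather than over the full vocabulary. This is the complexity reduction the abstract refers to.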

Copyrights © 2017