The large volume of digital news documents in the Indonesian language has created a need for automatic topic-based document clustering, so that readers can more easily access news articles on the same topic. One of the major problems in document clustering is the low relevance of the clustering results, in which documents are not grouped according to their appropriate topics. This paper proposes a new term weighting method that combines a corpus-based thesaurus and a dictionary-based thesaurus to take the conceptual similarity between terms into account. The method is evaluated by applying the K-Means algorithm to 253 news documents in the Indonesian language. Experimental results show that the proposed term weighting method is able to achieve good performance.
Copyright © 2017
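
The general idea summarized in the abstract, thesaurus-aware term weighting followed by K-Means, can be illustrated with a minimal Python sketch. It is not the method proposed in the paper, whose formulas are not given here. The blending rule in `combine_thesauri`, the parameter `alpha`, the weight adjustment in `concept_weights`, the naive seeding in `k_means`, and the toy documents and similarity scores are all assumptions made only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF: one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def combine_thesauri(corpus_sim, dict_sim, alpha=0.5):
    """Blend two {(term_a, term_b): similarity} maps (hypothetical rule)."""
    combined = {}
    for source, weight in ((corpus_sim, alpha), (dict_sim, 1 - alpha)):
        for (a, b), score in source.items():
            key = frozenset((a, b))  # order-independent pair of distinct terms
            combined[key] = combined.get(key, 0.0) + weight * score
    return combined


def concept_weights(weights, thesaurus):
    """Give each term a share of the weight of its related terms (assumed scheme)."""
    adjusted = []
    for w in weights:
        new_w = dict(w)
        for pair, score in thesaurus.items():
            a, b = tuple(pair)
            new_w[a] = new_w.get(a, 0.0) + score * w.get(b, 0.0)
            new_w[b] = new_w.get(b, 0.0) + score * w.get(a, 0.0)
        adjusted.append(new_w)
    return adjusted


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def k_means(vectors, k, iterations=10):
    """Small K-Means using cosine similarity; seeds with the first k vectors."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue
            total = {}
            for m in members:
                for t, x in m.items():
                    total[t] = total.get(t, 0.0) + x
            centroids[c] = {t: x / len(members) for t, x in total.items()}
    return labels


# Toy tokenized documents (invented for illustration).
docs = [
    ["harga", "mobil", "naik"],
    ["tim", "menang", "pertandingan"],
    ["pajak", "kendaraan", "turun"],
    ["pelatih", "puji", "laga"],
]

# Hypothetical similarity scores from a corpus-based and a dictionary-based thesaurus.
corpus_sim = {("mobil", "kendaraan"): 0.6, ("pertandingan", "laga"): 0.4}
dict_sim = {("kendaraan", "mobil"): 0.8, ("laga", "pertandingan"): 0.9}

thesaurus = combine_thesauri(corpus_sim, dict_sim)
baseline_labels = k_means(tf_idf(docs), k=2)
concept_labels = k_means(concept_weights(tf_idf(docs), thesaurus), k=2)
print("TF-IDF only:    ", baseline_labels)
print("With thesaurus: ", concept_labels)
```

In this toy input no two documents share a surface term, so plain TF-IDF gives K-Means nothing to separate the last two documents with. Once related terms such as "mobil" and "kendaraan" contribute weight to each other, the two vehicle-related documents fall into one cluster and the two match-related documents into the other.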