Nusantara Journal of Computers and its Applications
Vol 2, No 1 (2017): June 2017

THE USE OF DICTIONARY-BASED AND CORPUS-BASED THESAURI FOR TERM WEIGHTING IN THE CLUSTERING OF INDONESIAN-LANGUAGE NEWS DOCUMENTS

Amelia Sahira Rahma (Departemen Teknik Informatika, Institut Teknologi Sepuluh Nopember, Jl. Raya ITS, Kampus Sukolilo, Surabaya, 60111, Indonesia)
Vit Zuraida (Departemen Teknik Informatika, Institut Teknologi Sepuluh Nopember, Jl. Raya ITS, Kampus Sukolilo, Surabaya, 60111, Indonesia)
Dimas Fanny Hebrasianto Permadi (Departemen Teknik Informatika, Institut Teknologi Sepuluh Nopember, Jl. Raya ITS, Kampus Sukolilo, Surabaya, 60111, Indonesia)



Article Info

Publish Date
01 Jul 2017

Abstract

The large number of digital news documents in the Indonesian language has created a need for automatic topic-based document clustering, so that readers can more easily access news articles on the same topic. One of the major problems in document clustering is the low relevance of the clustering results, which leaves documents grouped under topics that do not fit them. This paper proposes a new term weighting method that combines a corpus-based thesaurus and a dictionary-based thesaurus to take the conceptual similarity between terms into account. The method is evaluated by applying the K-Means algorithm to 253 news documents in the Indonesian language. Experimental results show that the proposed term weighting method achieves good performance.
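
The abstract does not spell out the weighting formula, so the following is only a minimal sketch of the general idea of thesaurus-aware term weighting, not the authors' method. It assumes a single merged thesaurus that maps each term to related terms with a similarity score in [0, 1]; the `THESAURUS` entries, the function names `concept_weights` and `cosine`, and the additive weighting rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical merged thesaurus: term -> {related term: similarity in [0, 1]}.
# A real system would build this from a dictionary-based and a corpus-based
# thesaurus; these entries are invented for illustration only.
THESAURUS = {
    "gempa": {"lindu": 0.9},
    "lindu": {"gempa": 0.9},
}


def concept_weights(tokens, thesaurus):
    """Term frequency, plus similarity-scaled credit for related terms.

    Assumed rule (not from the paper): each occurrence of a term adds
    `similarity` to the weight of every term the thesaurus relates it to.
    """
    tf = Counter(tokens)
    weights = {term: float(count) for term, count in tf.items()}
    for term, count in tf.items():
        for related, sim in thesaurus.get(term, {}).items():
            weights[related] = weights.get(related, 0.0) + sim * count
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two toy documents that use different words for the same concept.
    doc_a = ["gempa", "guncang"]
    doc_b = ["lindu", "guncang"]
    print("plain TF:", cosine(Counter(doc_a), Counter(doc_b)))
    print("thesaurus-aware:", cosine(concept_weights(doc_a, THESAURUS),
                                     concept_weights(doc_b, THESAURUS)))
```

Under these assumptions, the two toy documents become more similar once related terms share weight, which is the kind of effect that could help a clustering algorithm such as K-Means place them in the same topic group.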

Copyrights © 2017






Journal Info

Abbrev

njca

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Engineering; Other

Description

NJCA (Nusantara Journal of Computers and Its Applications) is a peer-reviewed, biannual journal on computer science and its applications. Articles should address theoretical or empirical research on computer science and its applications. The topics addressed within the journal ...