JIKSI (Jurnal Ilmu Komputer dan Sistem Informasi)
Vol 2, No 2 (2014): Jurnal Ilmu Komputer dan Sistem Informasi

PENGGUNAAN TESAURUS UNTUK MELAKUKAN CLUSTERING SINGLE-LINKAGE PADA MEDIA SOSIAL TWITTER (Using a Thesaurus to Perform Single-Linkage Clustering on the Social Media Platform Twitter)

Sylvia Wulandari (Unknown)



Article Info

Publish Date
31 Aug 2014

Abstract

Clustering is a text operation that groups documents by the similarity of their contents in order to find relationships between news items or topics. In this study, single-linkage clustering is used to categorize the content of tweets and to generate a topic for each cluster. We used Manhattan distance to calculate the distance between words. We also used a thesaurus in the clustering process: data are joined according to the tweets and the distance of the closest synonym. The experiments were performed on 4 datasets with different threshold values. The accuracy of the system is evaluated using purity, which compares the system's result with the reference clustering. It turns out that purity with a thesaurus is better than without one, because clusters are joined when words in the tweets have synonyms. The best clustering accuracy, 80.16%, was obtained on the first dataset with a threshold value of 0.0003.

Keywords: Clustering, Hierarchical Clustering, Manhattan Distance, Single-Linkage Clustering, Thesaurus, Tweets
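
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy thesaurus, the example tweets, the labels, and all function names are hypothetical, and raw term counts are assumed as features, so the threshold used here is not comparable to the 0.0003 reported above. Synonyms are folded onto a single term before the Manhattan distance is computed, clusters are merged by single linkage until the closest pair is farther apart than the threshold, and purity is computed against reference labels.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy thesaurus: maps a word to one canonical synonym.
THESAURUS = {"film": "movie", "cinema": "movie", "rain": "storm"}


def vectorize(tweet, thesaurus=None):
    """Bag-of-words counts; optionally fold synonyms onto one term."""
    words = tweet.lower().split()
    if thesaurus:
        words = [thesaurus.get(w, w) for w in words]
    return Counter(words)


def manhattan(a, b):
    """Manhattan (L1) distance between two sparse count vectors."""
    return sum(abs(a[t] - b[t]) for t in set(a) | set(b))


def single_linkage(vectors, threshold):
    """Agglomerative clustering with single linkage.

    The distance between two clusters is the minimum distance between
    any pair of their members. Merging stops once the closest pair of
    clusters is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = min(manhattan(vectors[p], vectors[q]) for p in ci for q in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    majority = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return majority / len(labels)


# Hypothetical example data, not taken from the paper.
tweets = [
    "great film tonight",
    "great movie tonight",
    "heavy rain today",
    "heavy storm today",
]
labels = ["movies", "movies", "weather", "weather"]

plain = single_linkage([vectorize(t) for t in tweets], threshold=1)
with_thesaurus = single_linkage(
    [vectorize(t, THESAURUS) for t in tweets], threshold=1
)
```

In this toy data, tweets that differ only by a synonym are at distance 2 without the thesaurus and at distance 0 with it, so only the thesaurus variant merges them at a threshold of 1. Purity rewards many small clusters (singleton clusters are always pure), so it is best read alongside the number of clusters produced.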

Copyrights © 2014






Journal Info

Abbrev

jiksi

Publisher

Subject

Computer Science & IT, Mathematics, Other

Description

Jurnal Ilmu Komputer dan Sistem Informasi (JIKSI) is published by the Faculty of Information Technology, Universitas Tarumanagara (FTI Untar), Jakarta, as a publication medium for the scholarly work of students in the Informatics Engineering and Information Systems study programs at FTI Untar. The scholarly works produced take the form of the results of ...