Xplore: Journal of Statistics
Vol. 8 No. 1 (2019): 30 April 2019

Application of the DBSCAN Method to Improve K-Means Performance for Clustering Tweet Data

Astri Fatimah (Department of Statistics, IPB)
Anang Kurnia (Department of Statistics, IPB)
Septian Rahardiantoro (Department of Statistics, IPB)
Yani Nurhadryani (Department of Computer Science, IPB)



Article Info

Publish Date
06 Apr 2019

Abstract

Text mining is the mining of text data by computer to obtain the information it contains. Text data is unstructured and therefore difficult to analyze directly. Pre-processing turns the unstructured data into structured data: after the pre-processing stages, each document is represented numerically using the vector space model with term frequency-inverse document frequency (TF-IDF) weighting, so that it can be used for analysis. K-Means cluster analysis is one method that can be applied to such data, but K-Means is not robust to noise. Outliers can be detected using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) cluster analysis, and the outliers obtained from the DBSCAN results can be omitted from the data. After the outliers were removed, cluster analysis was carried out again using K-Means with the same number of clusters k. The quality of the cluster results was evaluated with the Silhouette Coefficient (SC). For a small amount of data, the SC of the K-Means method increased markedly, by 0.21, after the outliers were removed. Increasing the amount of text data in the cluster analysis also affects the number of clusters. This is influenced by the number of words in a document that are given a weight: the fewer words that are given a weight, the more clusters are generated.
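
The procedure described in the abstract (detect outliers with DBSCAN, drop them, rerun K-Means with the same k, compare Silhouette Coefficients) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the 2-D points are hypothetical stand-ins for TF-IDF document vectors, Euclidean distance is assumed, and `eps`, `min_pts`, the seed, and all function names are illustrative choices. It does not reproduce the 0.21 improvement reported for the tweet data.

```python
import math
import random


def dbscan_noise(points, eps, min_pts):
    """Return indices of the points DBSCAN would label as noise."""
    n = len(points)
    # eps-neighbourhood of every point (the point itself is included).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    # Noise: not a core point and not within eps of any core point.
    return [i for i in range(n)
            if i not in core and not any(j in core for j in neighbors[i])]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers; returns labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster (or only one cluster present): score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups plus one far-away outlier (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
k = 2

sc_before = silhouette(points, kmeans(points, k))

noise = dbscan_noise(points, eps=1.5, min_pts=3)      # assumed parameters
clean = [p for i, p in enumerate(points) if i not in noise]
sc_after = silhouette(clean, kmeans(clean, k))        # same k as before

print("noise indices:", noise)
print("SC before:", round(sc_before, 3), "SC after:", round(sc_after, 3))
```

On real tweet data the points would be TF-IDF vectors, and a library implementation (for example scikit-learn's `DBSCAN`, `KMeans`, and `silhouette_score`) would normally replace these hand-written helpers; the choice of `eps` and `min_pts` would also need tuning for the data at hand.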

Copyrights © 2019






Journal Info

Abbrev

xplore

Publisher

Subject

Decision Sciences, Operations Research & Management; Engineering; Mathematics

Description

Xplore: Journal of Statistics is published periodically 3 (three) times a year and contains scientific articles related to the field of statistics. Published articles are research results or literature reviews in statistics and/or its applications. ISSN: 2302-5751 Starting December 2018, ...