Eksponensial
Vol 11 No 2 (2020)

Implementasi Text Mining Pengelompokkan Dokumen Skripsi Menggunakan Metode K-Means Clustering (Implementation of Text Mining for Grouping Thesis Documents Using the K-Means Clustering Method)

Rachman, Dezty Adhe Chajannah
Goejantoro, Rito
Amijaya, Fidia Deny Tisna

Article Info

Publish Date
19 Jan 2021

Abstract

Text mining is a form of text analysis that automatically discovers high-quality information from a collection of texts summarized in documents. The K-Means Clustering method is widely used because it can group large amounts of data with relatively fast and efficient computation. The purpose of this study is to determine the optimal number of groups formed from the thesis documents and to describe the resulting groups. The study uses the Nazief and Adriani algorithm for the stemming step, Euclidean distance to measure the distance between documents, and the Silhouette Coefficient to test cluster validity. The sample consists of 119 thesis documents from graduates of 2016-2018 of the Statistics Study Program, Mathematics Department, Faculty of Mathematics and Natural Sciences. Based on the analysis, the optimal number of groups is two clusters, with a silhouette coefficient of 0.12. The first cluster contains 85 documents and the second contains 34 documents. The first cluster is dominated by studies on data mining (especially classification), time series analysis, regression analysis, survival analysis, spatial analysis, and operations research; the second cluster is dominated by studies on multivariate analysis, quality control, and insurance mathematics.
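
The workflow described in the abstract (vectorize documents, cluster with K-Means under Euclidean distance, pick the number of clusters by the silhouette coefficient) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the six toy documents are hypothetical, stemming with the Nazief and Adriani algorithm is skipped, raw term counts stand in for whatever term weighting the study used, and the function names (tf_vectors, kmeans, silhouette, choose_k) as well as the deterministic farthest-point initialization are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Turn whitespace-tokenized documents into raw term-count vectors."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [Counter(d.split()) for d in docs]
    return [[c[w] for w in vocab] for c in counts]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    """K-Means with a deterministic farthest-point initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, candidates):
    """Return (k, labels, score) for the candidate k with the highest silhouette."""
    results = [(k, kmeans(points, k)) for k in candidates]
    scored = [(k, labels, silhouette(points, labels)) for k, labels in results]
    return max(scored, key=lambda r: r[2])

# Hypothetical toy corpus: three "time series / regression" titles,
# three "multivariate / quality control" titles.
docs = [
    "regression time series forecast",
    "regression survival time model",
    "time series forecast model",
    "multivariate cluster factor analysis",
    "multivariate factor quality control",
    "cluster factor quality control",
]
points = tf_vectors(docs)
best_k, best_labels, best_score = choose_k(points, [2, 3, 4])
print(best_k, best_labels, round(best_score, 3))
```

On this toy corpus the two topic groups are separated and two clusters give the highest silhouette value. A low silhouette coefficient on real data, such as the 0.12 reported in the abstract, indicates that the clusters overlap considerably even when that number of clusters is the best among the candidates tried.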

Copyrights © 2020

Journal Info

Abbrev

exponensial

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Economics, Econometrics & Finance; Mathematics; Other

Description

Jurnal Eksponensial is a scientific journal that publishes articles on statistics and its applications. This journal is intended for researchers and readers who are interested in statistics and its ...