The growth of technology makes information easy to obtain, and news media is one kind of information that is used often. As technology grows, news is spread through web-based news portals such as Kompas, Detik, Tempo, and many others. Users of information technology do not always have time to read the news and sometimes cannot find the news they need. One solution to this problem is to cluster news documents and then apply topic extraction to obtain the important topics of each news cluster. This research uses Clustering Large Applications (CLARA) as the clustering algorithm, because CLARA is an optimization of k-medoids, which improves on k-means in several respects. For topic extraction, term-cluster weighting is used to calculate the weight of each term within a cluster. The documents are first preprocessed into structured data, after which Singular Value Decomposition (SVD) is used to decompose the features. CLARA is then used to cluster the documents, and topics are extracted with term frequency-inverse cluster frequency (TF-ICF). The data in this research are secondary data obtained from the Kaggle website and consist of English-language news documents. Using 226 documents and 2 clusters, the resulting silhouette score is 0.005. The topic extraction accuracy is 1 when the number of extracted topics ranges from 1 to 10.
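The clustering step can be illustrated with a minimal CLARA sketch. CLARA draws several random samples, runs a k-medoids (PAM-style) search on each sample, and keeps the medoids whose total distance cost over the full dataset is lowest. This is an illustrative sketch, not the paper's implementation: the PAM routine is simplified (it starts from the first k sampled points instead of running the BUILD phase), distance is assumed to be Euclidean on SVD-reduced document vectors, the defaults of 5 samples of size 40 + 2k are assumptions mirroring common CLARA settings, and the toy points are hypothetical.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM (k-medoids): start from the first k points, then swap
    a medoid for a non-medoid whenever the swap lowers the total cost."""
    medoids = list(points[:k])
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, samples=5, sample_size=None, seed=0):
    """CLARA: run PAM on several random samples and keep the medoids whose
    total cost on the FULL dataset is lowest. The defaults (5 samples of
    size 40 + 2k) are assumed, not taken from the paper."""
    rng = random.Random(seed)
    size = min(len(points), sample_size or 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(samples):
        sample = rng.sample(points, size)
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)  # evaluated on all points, not the sample
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every point to its nearest medoid (label = medoid index).
    labels = [min(range(k), key=lambda j: math.dist(p, best_medoids[j])) for p in points]
    return best_medoids, labels


# Hypothetical 2-D vectors standing in for SVD-reduced document features.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
medoids, labels = clara(points, k=2)
print(medoids, labels)
```

Evaluating each candidate set of medoids on the whole dataset, while searching only within a sample, is what lets CLARA scale k-medoids to larger collections than plain PAM can handle.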
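Term-cluster weighting with TF-ICF can be sketched as follows. The exact formula used in the paper is not given here, so this sketch assumes a common variant, w(t, c) = tf(t, c) · log(N / cf(t)), where tf(t, c) is the raw count of term t in cluster c, N is the number of clusters, and cf(t) is the number of clusters that contain t. Terms that appear in every cluster receive weight 0, so the highest-weighted terms of a cluster serve as its topic terms. The function names and the toy clusters are hypothetical.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Weight terms per cluster with TF-ICF (assumed variant):
    w(t, c) = tf(t, c) * log(N / cf(t)), where tf(t, c) counts t in cluster c,
    N is the number of clusters and cf(t) is the number of clusters containing t."""
    term_freqs = [Counter(term for doc in cluster for term in doc) for cluster in clusters]
    n_clusters = len(clusters)
    cluster_freq = Counter(term for tf in term_freqs for term in tf)
    return [
        {term: count * math.log(n_clusters / cluster_freq[term]) for term, count in tf.items()}
        for tf in term_freqs
    ]


def top_terms(weights, n):
    """The n highest-weighted terms of one cluster (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical clusters of already-preprocessed (tokenized) news documents.
clusters = [
    [["election", "vote", "minister"], ["vote", "party", "news"]],
    [["match", "goal", "news"], ["goal", "league", "player"]],
]
weights = tf_icf(clusters)
print(top_terms(weights[0], 2))  # → ['vote', 'election']
print(top_terms(weights[1], 2))  # → ['goal', 'league']
```

Here "news" occurs in both clusters, so its weight is log(2 / 2) = 0 and it never ranks as a topic term, while "vote" and "goal" rank first because they are frequent and exclusive to one cluster.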
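The silhouette score used to evaluate the clustering ranges from -1 to 1: values near 1 mean documents sit well inside their own cluster, and values near 0, such as the reported 0.005, mean many documents lie close to the boundary between clusters. A minimal sketch of the mean silhouette coefficient follows, assuming Euclidean distance (the paper's distance measure is not stated here); the toy points and labelings are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b), with Euclidean
    distance. a: mean distance to the other members of i's cluster; b: lowest
    mean distance to the members of any other cluster. Singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
good = silhouette(points, [0, 0, 0, 1, 1, 1])   # clusters match the two groups
mixed = silhouette(points, [0, 1, 0, 1, 0, 1])  # clusters cut across the groups
print(round(good, 3), round(mixed, 3))
```

The well-separated labeling scores close to 1, while the labeling that mixes the two groups scores near or below 0.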
Copyright © 2019