Journal : ComEngApp : Computer Engineering and Applications Journal

Comparison Jaccard similarity, Cosine Similarity and Combined Both of the Data Clustering With Shared Nearest Neighbor Method
Author : Lisna Zahrotun
Computer Engineering and Applications Journal Vol 5 No 1 (2016)
Publisher : Universitas Sriwijaya

DOI: 10.18495/comengapp.v5i1.160

Abstract

Text mining is the automatic extraction, by computer, of new information from different text data sources. Clustering is a grouping technique that is widely used in data mining. The aim of this study was to find the similarity measure that yields the best similarity values. Three measures were compared: Jaccard similarity, cosine similarity, and a combination of the two. Combining the two measures was expected to increase the similarity value between two titles. The documents used were limited to the titles of practical work reports from the Department of Informatics Engineering, Ahmad Dahlan University. All titles were preprocessed beforehand. The documents were then clustered with the Shared Nearest Neighbor (SNN) method. The results show that cosine similarity gives the best proximity (similarity) values compared with Jaccard similarity and the combination of both.
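
The three measures compared in this abstract can be illustrated with a minimal Python sketch. It assumes each title has already been preprocessed into a list of tokens. The function names and the example tokens are hypothetical, and because the abstract does not say how the two measures were combined, the weighted mean in `combined` is only an assumption.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity: shared distinct tokens / all distinct tokens."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined(a, b, weight=0.5):
    """Assumed combination: weighted mean of Jaccard and cosine."""
    return weight * jaccard(a, b) + (1 - weight) * cosine(a, b)


# Hypothetical preprocessed titles (not taken from the study's data).
title_a = ["sistem", "informasi", "akademik"]
title_b = ["sistem", "informasi", "perpustakaan"]
print(jaccard(title_a, title_b), cosine(title_a, title_b), combined(title_a, title_b))
```

For these two titles, two of four distinct tokens are shared, so the Jaccard value is 0.5, while the cosine value is about 0.667. This shows how the measures can rank the same pair differently.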

Text Mining for Internship Titles Clustering Using Shared Nearest Neighbor
Author : Lisna Zahrotun
Computer Engineering and Applications Journal Vol 6 No 3 (2017)
Publisher : Universitas Sriwijaya

DOI: 10.18495/comengapp.v6i3.214

Abstract

The internship course is one of the compulsory subjects in the undergraduate program of Informatics Engineering at Ahmad Dahlan University, Yogyakarta. In the last few semesters, we found that some students failed this subject. Once identified, these students turned out to face obstacles such as determining the main theme for their job description. In this study, we propose an application that classifies internship titles using a text mining technique, Shared Nearest Neighbor, together with cosine similarity. The resulting parameter values were K = 7, epsilon = 0.5, and MinT = 0.3, producing 22 clusters and 0 outliers. These values indicate that all internship titles were assigned to a cluster. The 7 topics taken by the majority of students are: 1) Information Systems (7 titles); 2) Instructional Media (5 titles); 3) Archiving Applications (4 titles); 4) Web Profile Implementation (3 titles); 5) Instructional Media for University Courses (3 titles); 6) Multimedia (3 titles); and 7) Workshop & Training (3 titles).
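
A rough Python sketch of Shared Nearest Neighbor clustering follows, to make the roles of K, epsilon, and MinT concrete. It is not the application described in the abstract. It follows the commonly described SNN density-based procedure, and it assumes that `eps` and `min_t` are fractions of `k`, an interpretation the abstract does not confirm. The function name `snn_cluster` and the toy similarity matrix are hypothetical.

```python
def snn_cluster(sim, k, eps, min_t):
    """Cluster points from a square similarity matrix (list of lists).

    Assumption: eps and min_t are fractions of k.
    Returns one label per point; -1 marks an outlier.
    """
    n = len(sim)

    # 1. k-nearest-neighbor set of every point (the point itself excluded).
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        knn.append(set(order[:k]))

    # 2. SNN similarity: number of shared neighbors, counted only when
    #    the two points appear in each other's neighbor sets.
    def snn(i, j):
        if i in knn[j] and j in knn[i]:
            return len(knn[i] & knn[j])
        return 0

    # 3. Strong links: pairs whose SNN similarity reaches eps * k.
    links = [
        {j for j in range(n) if j != i and snn(i, j) >= eps * k}
        for i in range(n)
    ]

    # 4. Core points: points with at least min_t * k strong links.
    core = [len(links[i]) >= min_t * k for i in range(n)]

    # 5. Strongly linked core points end up in the same cluster.
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            p = stack.pop()
            for q in links[p]:
                if core[q] and labels[q] == -1:
                    labels[q] = next_label
                    stack.append(q)
        next_label += 1

    # 6. A non-core point joins its most strongly linked core point;
    #    without such a link it stays an outlier (-1).
    for i in range(n):
        if core[i]:
            continue
        linked_cores = [q for q in links[i] if core[q]]
        if linked_cores:
            labels[i] = labels[max(linked_cores, key=lambda q: snn(i, q))]
    return labels


# Toy similarity matrix with two obvious groups: points 0-2 and points 3-5.
toy = [[1.0 if i == j else (0.9 if i // 3 == j // 3 else 0.1) for j in range(6)]
       for i in range(6)]
print(snn_cluster(toy, k=2, eps=0.5, min_t=0.3))  # [0, 0, 0, 1, 1, 1]
```

In this sketch, a result with no `-1` labels corresponds to the "0 outliers" outcome reported above. The similarity matrix would typically be filled with pairwise cosine similarities of the preprocessed titles.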