INFOKUM
Vol. 9 No. 2, June (2021): Data Mining, Image Processing and Artificial Intelligence

IMPLEMENTATION OF TF-IDF AND COSINE SIMILARITY ALGORITHMS FOR CLASSIFICATION OF DOCUMENTS BASED ON ABSTRACT SCIENTIFIC JOURNALS

Paska Marto Hasugian (Software Engineering Study Program, STMIK Pelita Nusantara)
Jonson Manurung (Software Engineering Study Program, STMIK Pelita Nusantara)
Logaraz Logaraz (Student, Software Engineering Study Program, STMIK Pelita Nusantara)
Uzitha Ram (Student, Software Engineering Study Program, STMIK Pelita Nusantara)



Article Info

Publish Date
31 Aug 2021

Abstract

Research, one of the dharmas (core duties) of higher education, is carried out by every lecturer, who is challenged to produce new and useful findings. Research results are published in national and international journals, and one of the websites published by Ristekbrin is Sinta, which indexes all research works in Indonesia. The problem addressed in this research is the ever-growing accumulation of data, which needs to be analyzed with text mining by searching the resources contained in abstract documents and presenting part of the information. The purpose of this study is to determine how well a document matches other documents, so that knowledge can be discovered and documents can be placed in groups according to existing topics. This is done by classifying documents based on the abstracts of published scientific papers. The solution involves two mutually supporting algorithms with different tasks, namely TF-IDF and Cosine Similarity. TF-IDF assigns a weight to each document, and Cosine Similarity then uses those weights to measure the similarity between documents. This research uses text mining as part of the search for related patterns and documents that have been tested. For the calculation, 1 document was used as test data and 15 documents were used as training data. With the TF-IDF calculation, the weights of the documents from Q through D15 are 10,946; 28,050; 27,176; 39,043; 36,535; 30,696; 25,612; 12,581; 42,335; 29,661; 33,867; 31,706; 22,654; 15,450; 59,832; 42,127. The similarity of the data is tested with k = 4, which results in similarity to both Expert System and Cryptography, while k = 5 gives the highest similarity to Expert System.
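The pipeline described in the abstract (TF-IDF weighting, cosine similarity against the training documents, then a vote among the k most similar) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the IDF variant log10(N / df), the choice to include the query document when computing IDF, the majority vote, and the toy abstracts and labels are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; no stemming or stop-word removal.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumption: raw term frequency times idf = log10(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query, train_docs, labels, k):
    """Label the query by majority vote among its k most similar training documents."""
    vectors = tf_idf_vectors([query] + train_docs)
    q_vec, train_vecs = vectors[0], vectors[1:]
    ranked = sorted(
        ((cosine(q_vec, vec), label) for vec, label in zip(train_vecs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0], ranked


# Hypothetical toy data standing in for the abstracts Q and D1..Dn.
train_docs = [
    "expert system diagnosis rule forward chaining",
    "expert system knowledge base rule inference",
    "cryptography encryption key cipher",
    "cryptography rsa key encryption algorithm",
]
labels = ["Expert System", "Expert System", "Cryptography", "Cryptography"]
query = "rule based expert system for disease diagnosis"

predicted, ranked = classify(query, train_docs, labels, k=3)
print(predicted)
for score, label in ranked:
    print(f"{score:.3f}  {label}")
```

With an even k, two topics can receive the same number of votes, which may be why the abstract reports two topics for k = 4 and a single topic for k = 5; the sketch breaks such ties in favor of the topic whose documents rank higher.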

Copyrights © 2021






Journal Info

Abbrev

infokum

Subject

Computer Science & IT

Description

INFOKUM is a scientific journal of decision support systems, expert systems, and artificial intelligence, which includes scholarly writings on pure research and applied research in the field of information systems and information technology as well as a review-general review of the development of the ...