Computer Science and Information Technologies
Vol 6, No 2: July 2025

Classification and similarity detection of Indonesian scientific journal articles

Cahyani, Nyimas Sabilina
Stiawan, Deris
Abdiansah, Abdiansah
Afifah, Nurul
Permana, Dendi Renaldo



Article Info

Publish Date
01 Jul 2025

Abstract

Advances in technology are speeding up the search for scientific articles and journal references related to a research topic. One national aggregator service for finding references is Garba Rujukan Digital (GARUDA), developed by the Ministry of Education, Culture, Research, and Technology (Kemendikbudristek) of the Republic of Indonesia. In this study, the naïve Bayes method classifies articles into several categories based on their titles and abstracts. The system achieves an F1-score of 98%, which indicates high classification performance, and the classification process takes less than 60 minutes. Article similarity is detected with the cosine similarity method applied to the concatenated title and abstract: a score of 0.071 reflects the degree of similarity between the compared texts, while a score close to 1 indicates higher similarity. The system searches for similar scientific articles based on title and abstract, sorts them by similarity score so that the highest-scoring articles are listed as the most similar, and generates article categories. The results show that the proposed method significantly improves the classification and search processes in GARUDA and provides accurate and efficient similarity detection.
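
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the use of raw term frequencies (rather than, say, TF-IDF weights), the Laplace smoothing, the toy corpus, and every function and variable name below are assumptions made only for illustration. It trains a multinomial naïve Bayes classifier on concatenated title and abstract text, then ranks articles against a query by cosine similarity.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed preprocessing)."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two raw term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # skip words never seen in training
                    count = self.word_counts[label][t]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus of (title, abstract, category); not data from the paper.
ARTICLES = [
    ("Deep learning for image recognition", "Neural networks classify images.", "computer science"),
    ("Rice yield under drought stress", "Field trials measure rice crop yield.", "agriculture"),
    ("Text classification with naive Bayes", "A probabilistic model classifies text documents.", "computer science"),
    ("Soil nutrients and maize growth", "Fertilizer effects on maize crop growth.", "agriculture"),
]


def rank_similar(query, articles):
    """Return (score, title) pairs sorted from most to least similar."""
    q = tokenize(query)
    scored = [(cosine_similarity(q, tokenize(t + " " + a)), t) for t, a, _ in articles]
    return sorted(scored, reverse=True)


# Train on the concatenated title and abstract, as the abstract describes.
corpus_tokens = [tokenize(t + " " + a) for t, a, _ in ARTICLES]
model = NaiveBayes().fit(corpus_tokens, [c for _, _, c in ARTICLES])

query = "classification of text documents"
predicted = model.predict(tokenize(query))
ranking = rank_similar(query, ARTICLES)
print(predicted)
for score, title in ranking:
    print(f"{score:.3f}  {title}")
```

On this toy corpus the query is assigned to the "computer science" category, and the article that shares the most terms with it appears first in the ranking. Articles with no shared terms score 0, which matches the reading of the score in the abstract: values near 0 mean little overlap, and values near 1 mean near-identical term distributions.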

Copyrights © 2025






Journal Info

Abbrev

csit

Publisher

Subject

Computer Science & IT Engineering

Description

Computer Science and Information Technologies (ISSN 2722-323X, e-ISSN 2722-3221) is an open-access, peer-reviewed international journal that publishes original research articles, review papers, and short communications that will have an immediate impact on the ongoing research in all areas of Computer ...