JURNAL MEDIA INFORMATIKA BUDIDARMA
Vol 8, No 1 (2024): January 2024

Jaccard Similarity Algorithm for Detecting Dissertation Title Similarity Using a Stop Word Removal Variation Approach

Liga Mayola (Universitas Putra Indonesia YPTK, Padang)
M. Hafizh (Universitas Putra Indonesia YPTK, Padang)
Deri Marse Putra (Universitas Putra Indonesia YPTK, Padang)



Article Info

Publish Date
28 Jan 2024

Abstract

Choosing a unique dissertation title is a challenge. The number of dissertation titles grows as the number of students increases, yet each student's title must be distinct. One way to anticipate duplication is to adopt a similarity algorithm that detects similar dissertation titles. The algorithm chosen here is Jaccard Similarity, which can be used to detect document similarity. The analysis begins with text preprocessing, whose stages are case folding, tokenizing, stop word removal, and stemming. This study tests two variations of stop word removal and measures the accuracy obtained after analysis with Jaccard Similarity. The researchers call them Stop Word Removal Version One (SWR1) and Stop Word Removal Version Two (SWR2). SWR1 deletes only prepositions and conjunctions. SWR2 deletes the words removed in SWR1 plus words that are often used in titles but do not contribute significantly to the meaning of the title. The aim of this approach is to compare the accuracy Jaccard produces under the two stop word removal approaches. The results show that Jaccard achieves an accuracy of 97.8% with SWR2 and 57.7% with SWR1. Stop word removal is therefore a critical stage in determining similarity and has a significant influence on the results of the Jaccard algorithm.
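The effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard Jaccard definition, J(A, B) = |A ∩ B| / |A ∪ B| over sets of tokens, and it uses English example titles. The stop word lists `SWR1_WORDS` and `SWR2_WORDS`, the helper names, and the two titles are hypothetical, since the study's actual lists and data are not given here. Stemming is omitted to keep the sketch dependency-free.

```python
import re

# Hypothetical stop word lists for illustration only.
# SWR1: prepositions and conjunctions.
SWR1_WORDS = {"of", "in", "on", "for", "to", "with", "by", "at", "and", "or"}
# SWR2: SWR1 plus words common in titles that carry little meaning (assumed examples).
SWR2_WORDS = SWR1_WORDS | {"analysis", "study", "implementation", "application", "using", "based"}


def preprocess(title, stop_words):
    """Case folding, tokenizing, and stop word removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {token for token in tokens if token not in stop_words}


def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical titles that share only generic title words.
title_a = "Analysis of Rainfall Prediction using Neural Networks"
title_b = "Analysis of Student Performance using Decision Trees"

score_swr1 = jaccard(preprocess(title_a, SWR1_WORDS), preprocess(title_b, SWR1_WORDS))
score_swr2 = jaccard(preprocess(title_a, SWR2_WORDS), preprocess(title_b, SWR2_WORDS))
print(score_swr1, score_swr2)
```

With these assumed lists, the two unrelated titles score 0.2 under SWR1 because they share "analysis" and "using", and 0.0 under SWR2 once those generic words are removed. This is one plausible reading of why the choice of stop word list can change the measured accuracy so strongly.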

Copyrights © 2024






Journal Info

Abbrev

mib

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

Decision Support System, Expert System, Informatics Technique, Information System, Cryptography, Networking, Security, Computer Science, Image Processing, Artificial Intelligence, Steganography etc (related to informatics and computer ...