JuTISI (Jurnal Teknik Informatika dan Sistem Informasi)
Vol 9 No 2 (2023): JuTISI

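Analisis Komparatif Pengukuran Kemiripan Artikel Ilmiah menggunakan Jaccard dan Levenshtein serta Blocking (Comparative Analysis of Scientific Article Similarity Measurement Using Jaccard and Levenshtein with Blocking)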

Muhammad Rizqi Nur (Institut Teknologi Sepuluh Nopember)
Gandhi Surya Buana (Institut Teknologi Sepuluh Nopember)
Nur Aini Rakhmawati (Institut Teknologi Sepuluh Nopember)



Article Info

Publish Date
11 Aug 2023

Abstract

Paper search engines have made it easier for academics to conduct literature reviews. However, easy does not mean accurate: for certain niche topics, search results are often poor. Snowballing can help, but it is limited by the initial set of articles at hand, and in particular by what their authors had access to when those articles were written. As an alternative, paper databases recommend articles related to a given article, but the recommendations are limited to that database. A tool that searches for similar articles without relying on a specific database would be very helpful, but first an appropriate method for measuring article similarity needs to be determined. This research measures article similarity based on title, author, and keywords using the Weighted Jaccard Measure and Levenshtein distance, and evaluates the results. It also compares performance when overlap blocking and stop word removal are added. The evaluation results for Jaccard alone are quite poor, whereas those for Levenshtein combined with Jaccard are decent. In addition, placing the heaviest weight on the title produces the best results. Overlap blocking and stop word removal increase processing time rather than reduce it. Overlap blocking with a minimum overlap of 1 cuts the number of measurements by almost half, but a minimum overlap above 1 discards many pairs that should be considered similar. Removing stop words improves both Jaccard and Levenshtein performance but requires the threshold to be adjusted.
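The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the measures are combined, so the field assignment (Levenshtein on the title, Jaccard on authors and keywords), the weights, the dictionary layout, and all function names below are assumptions made for illustration only.

```python
from itertools import combinations


def tokens(text):
    """Lowercased whitespace tokens of a string, as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # assumption: two empty sets count as dissimilar
    return len(a & b) / len(a | b)


def levenshtein(s, t):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(s, t):
    """Distance normalized to [0, 1], where 1 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def article_similarity(a, b, weights):
    """Weighted average of per-field similarities (hypothetical weighting)."""
    scores = {
        "title": levenshtein_similarity(a["title"].lower(), b["title"].lower()),
        "authors": jaccard(set(a["authors"]), set(b["authors"])),
        "keywords": jaccard(set(a["keywords"]), set(b["keywords"])),
    }
    total = sum(weights.values())
    return sum(weights[f] * scores[f] for f in scores) / total


def overlap_blocking(articles, min_overlap=1):
    """Yield index pairs whose titles share at least min_overlap tokens.

    Naive version: it still visits every pair, so it only saves the
    cost of the similarity measurement, not the pair enumeration.
    """
    for (i, a), (j, b) in combinations(enumerate(articles), 2):
        if len(tokens(a["title"]) & tokens(b["title"])) >= min_overlap:
            yield i, j


# Made-up sample records, used only to exercise the functions.
articles = [
    {"title": "jaccard similarity for papers", "authors": ["A"], "keywords": ["jaccard"]},
    {"title": "levenshtein similarity for titles", "authors": ["A"], "keywords": ["levenshtein"]},
    {"title": "deep learning vision", "authors": ["B"], "keywords": ["vision"]},
]
weights = {"title": 3, "authors": 1, "keywords": 1}  # title-heavy, as an example

for i, j in overlap_blocking(articles, min_overlap=1):
    print(i, j, round(article_similarity(articles[i], articles[j], weights), 3))
```

In this sketch, raising `min_overlap` shrinks the candidate set further, which mirrors the trade-off reported in the abstract: fewer measurements, but a higher risk of discarding pairs that are in fact similar.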

Copyrights © 2023






Journal Info

Abbrev

jutisi

Publisher

Subject

Computer Science & IT

Description

Paper topics that can be included in JuTISI are as follows, but are not limited to: • Artificial Intelligence • Business Intelligence • Cloud & Grid Computing • Computer Networking & Security • Data Analytics • Datawarehouse & Datamining • Decision Support System • E-Systems (E-Gov, ...