Infotekmesin
Vol 16 No 2 (2025): Infotekmesin: Juli 2025

Combining the TF-IDF Algorithm and Weighted Dice Similarity to Measure the Similarity of Thesis Titles

Purwaningrum, Santi (Unknown)
Susanto, Agus (Unknown)
Setiawan Prabowo, Annas (Unknown)



Article Info

Publish Date
31 Jul 2025

Abstract

High similarity among undergraduate thesis titles has become a critical issue for maintaining the originality of academic work in higher education institutions. This study develops an automated system for detecting title similarity by combining the Term Frequency–Inverse Document Frequency (TF-IDF) algorithm with the Weighted Dice Similarity method. TF-IDF assigns weights to the important words in each title, while Weighted Dice Similarity measures the degree of similarity between two titles based on the distribution and weights of those words. The study uses a dataset of 200 manually annotated thesis titles as ground truth. The analysis process consists of preprocessing, word weighting, and similarity computation between titles. Experimental results show that the system achieves an accuracy of 94%, a precision of 66.67%, a recall of 81.3%, and an average Weighted Dice similarity score of 0.62. Although the precision is only moderate, the authors consider the combination effective because it captures both lexical structure and semantic similarity, which a single method alone does not fully achieve.
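The weighting and similarity steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption: the smoothed IDF, the simple lowercase tokenizer standing in for the preprocessing step, and the weighted Dice form 2·Σ min(wa, wb) / (Σ wa + Σ wb), which is one common generalization of the Dice coefficient to weighted terms. The function names `tokenize`, `tfidf_vectors`, and `weighted_dice` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles):
    """Return one {term: weight} dict per title, using a smoothed IDF (an assumption)."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({term: (count / total) * idf[term] for term, count in doc.items()})
    return vectors


def weighted_dice(a, b):
    """Weighted Dice: 2 * sum of min weights on shared terms / sum of all weights."""
    denom = sum(a.values()) + sum(b.values())
    if denom == 0:
        return 0.0
    shared = sum(min(a[t], b[t]) for t in a.keys() & b.keys())
    return 2.0 * shared / denom


if __name__ == "__main__":
    # Illustrative titles only; they are not taken from the study's dataset.
    titles = [
        "Sistem Deteksi Kemiripan Judul Tugas Akhir",
        "Deteksi Kemiripan Judul Skripsi Menggunakan TF-IDF",
        "Analisis Jaringan Komputer Kampus",
    ]
    vecs = tfidf_vectors(titles)
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            print(i, j, round(weighted_dice(vecs[i], vecs[j]), 3))
```

Under these assumptions the score lies between 0 (no shared terms) and 1 (identical weighted term sets). A real system would compare the score against a decision threshold, which the abstract does not specify.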

Copyrights © 2025






Journal Info

Abbrev

infotekmesin

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering Mechanical Engineering

Description

INFOTEKMESIN is a peer-reviewed open-access journal with e-ISSN 2685-9858 and p-ISSN 2087-1627, published by Pusat Penelitian dan Pengabdian Masyarakat (P3M) Politeknik Negeri Cilacap. The journal invites scientists and engineers to exchange and disseminate theoretical and practice-oriented in the ...