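Kombinasi Algoritma TF-IDF dan Weighted Dice Similarity untuk Pengukuran Kemiripan Judul Tugas Akhir (Combining the TF-IDF Algorithm and Weighted Dice Similarity to Measure the Similarity of Final-Project Titles)
Purwaningrum, Santi; Susanto, Agus; Setiawan Prabowo, Annas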
Infotekmesin Vol 16 No 2 (2025): Infotekmesin: Juli 2025
Publisher : P3M Politeknik Negeri Cilacap

DOI: 10.35970/infotekmesin.v16i2.2812

Abstract

The high rate of similarity among undergraduate thesis titles has become a critical issue for maintaining the originality of academic work in higher education institutions. This study aims to develop an automated system for detecting title similarity by combining the Term Frequency–Inverse Document Frequency (TF-IDF) algorithm with the Weighted Dice Similarity method. TF-IDF assigns weights to the important words in each title, while Weighted Dice Similarity measures the degree of similarity between titles based on the distribution and weights of those words. The study uses a dataset of 200 manually annotated thesis titles as ground truth. The analysis process consists of preprocessing, word weighting, and similarity computation between titles. Experimental results show that the system achieves an accuracy of 94%, a precision of 66.67%, a recall of 81.3%, and an average Weighted Dice similarity score of 0.62. Although the precision is only moderate, the combination of the two methods is considered effective because it captures both lexical structure and semantic similarity, capabilities that neither method fully achieves on its own.
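The pipeline described in the abstract (preprocessing, word weighting, similarity computation) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption: preprocessing is reduced to lowercasing and whitespace splitting, TF-IDF uses a common smoothed-IDF variant, and Weighted Dice is taken as twice the summed minimum weight of shared terms divided by the total weight of both titles. The sample titles are hypothetical and are not drawn from the study's dataset.

```python
import math
from collections import Counter


def tokenize(title):
    # Assumed minimal preprocessing: lowercase and split on whitespace.
    return title.lower().split()


def tfidf_vectors(titles):
    # Build one {term: weight} dict per title.
    docs = [tokenize(t) for t in titles]
    n = len(docs)
    # Document frequency: number of titles containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed variant).
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def weighted_dice(a, b):
    # Assumed weighted generalization of the Dice coefficient:
    # 2 * sum of min weights over shared terms / sum of all weights.
    shared = sum(min(a[t], b[t]) for t in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


# Hypothetical titles, for illustration only.
titles = [
    "sistem informasi akademik berbasis web",
    "sistem informasi perpustakaan berbasis web",
    "deteksi penyakit daun padi",
]
vecs = tfidf_vectors(titles)
sim_close = weighted_dice(vecs[0], vecs[1])  # titles sharing most terms
sim_far = weighted_dice(vecs[0], vecs[2])    # titles sharing no terms
```

Under these assumptions the score lies between 0 and 1: two titles with no terms in common score 0, identical titles score 1, and rare shared terms contribute more than common ones. A similarity threshold for flagging a pair as too similar would still have to be chosen; the abstract does not state one.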