Journal of Embedded Systems, Security and Intelligent Systems
Vol 6, No 1 (2025): March 2025

A Hybrid Framework for Plagiarism Detection: Integrating Token-Based Similarity with Density-Based Clustering

Fajar B, Muhammad
Lestary, Fitriyanty Dwi
Surianto, Dewi Fatmarani

Article Info

Publish Date
29 Mar 2025

Abstract

Plagiarism detection in student assignments remains a critical challenge for maintaining academic integrity in higher education. This study proposes an automated method for detecting content similarity between student assignment documents by combining Jaccard similarity with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. The process begins with the collection of student assignment files in digital format, followed by text extraction to form a set-based representation of each document. Jaccard similarity is then computed for every document pair, and the resulting similarity matrix is transformed into a distance matrix that serves as input to DBSCAN. Experiments on 23 documents yielded 253 unique document pairs. The method identified pairs with high similarity scores, such as 0.9114 and 0.7226, which were confirmed visually through a heatmap and grouped into clusters by DBSCAN. The parameter settings eps = 0.3 and min_samples = 1 proved optimal for distinguishing original documents from those exhibiting substantial content overlap. The approach is reported to be accurate and efficient, and it does not require the number of clusters to be specified in advance, making it suitable for deployment in automated plagiarism detection systems for academic texts.
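
The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not specify them: the word-level tokenizer, the conversion of similarity to distance as 1 - similarity, the small hand-written DBSCAN routine, and the three toy documents. Only eps = 0.3 and min_samples = 1 are taken from the abstract.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def distance_matrix(token_sets):
    """Symmetric matrix of distances, assumed here to be 1 - Jaccard similarity."""
    n = len(token_sets)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 1.0 - jaccard(token_sets[i], token_sets[j])
    return dist


def dbscan(dist, eps, min_samples):
    """Plain DBSCAN over a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    # A point's neighbourhood includes the point itself (distance 0).
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


# Toy documents (hypothetical): the first two differ by a single word.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",
    "density based clustering groups nearby points",
]
dist = distance_matrix([tokenize(d) for d in docs])
labels = dbscan(dist, eps=0.3, min_samples=1)
print(labels)  # [0, 0, 1]
```

With min_samples = 1 every document is a core point, so no document is labelled as noise: documents with substantial overlap share a cluster label, while an original document ends up alone in its own cluster. The pair count also follows directly from the corpus size, since 23 documents give 23 × 22 / 2 = 253 unordered pairs. Where scikit-learn is available, `sklearn.cluster.DBSCAN(eps=0.3, min_samples=1, metric="precomputed")` accepts the same kind of distance matrix.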

Copyright © 2025

Journal Info

Abbrev

JESSI

Publisher

Subject

Computer Science & IT

Description

The Journal of Embedded Systems, Security and Intelligent Systems (JESSI), ISSN/e-ISSN 2745-925X/2722-273X, covers all topics of technology in the fields of embedded systems, computer and network security, and intelligent systems, as well as innovative and productive ideas related to emerging technology ...