Journal: Journal of Embedded Systems, Security and Intelligent Systems

A Hybrid Framework for Plagiarism Detection: Integrating Token-Based Similarity with Density-Based Clustering
Authors: Fajar B, Muhammad; Lestary, Fitriyanty Dwi; Surianto, Dewi Fatmarani
Journal of Embedded Systems, Security and Intelligent Systems Vol 6, No 1 (2025): March 2025
Publisher: Program Studi Teknik Komputer

DOI: 10.59562/jessi.v6i1.7664

Abstract

Plagiarism detection in academic assignments remains a critical challenge for maintaining academic integrity in higher education. This study proposes an automated method for detecting content similarity between student assignment documents by combining Jaccard similarity with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. The process begins with the collection of student assignment files in digital format, followed by text extraction to form a set-based representation of each document. Jaccard similarity is then computed for every document pair, and the resulting similarity matrix is transformed into a distance matrix that serves as input to DBSCAN. Experiments conducted on 23 documents yielded 253 unique document pairs (23 × 22 / 2). The results show that the method identified pairs with high similarity scores, such as 0.9114 and 0.7226, which were confirmed visually through a heatmap and grouped into clusters by DBSCAN. Parameter settings of eps = 0.3 and min_samples = 1 gave the best separation between original documents and those exhibiting substantial content overlap. Beyond being accurate and efficient, the approach does not require the number of clusters to be specified in advance, making it suitable for deployment in automated plagiarism detection systems for academic texts.
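
The pipeline described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not specify them: the whitespace tokenizer (`tokenize`), the conversion distance = 1 − similarity, the three toy documents, and the hand-written `dbscan` helper, which stands in for whatever library the study used. Only eps = 0.3 and min_samples = 1 come from the abstract.

```python
from itertools import combinations


def tokenize(text: str) -> set:
    # Assumption: lowercase whitespace tokens; the paper's tokenizer is not specified.
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def distance_matrix(token_sets):
    # Assumption: distance = 1 - similarity (the abstract only says "transformed").
    n = len(token_sets)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 1.0 - jaccard(token_sets[i], token_sets[j])
    return dist


def dbscan(dist, eps=0.3, min_samples=1):
    """Plain DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    # Each point counts as its own neighbor, since dist[i][i] == 0 <= eps.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # not a core point; may be claimed as a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Hypothetical toy documents, for illustration only.
docs = {
    "a.txt": "the quick brown fox jumps over the lazy dog",
    "b.txt": "the quick brown fox jumps over the lazy cat",
    "c.txt": "density based clustering groups nearby points together",
}
token_sets = [tokenize(text) for text in docs.values()]
dist = distance_matrix(token_sets)
labels = dbscan(dist, eps=0.3, min_samples=1)

for name, label in zip(docs, labels):
    print(f"{name}: cluster {label}")
```

In this toy run, `a.txt` and `b.txt` share 7 of 9 distinct tokens (similarity of about 0.78, distance of about 0.22), so they land in the same cluster, while `c.txt` forms a cluster of its own. Under the assumed 1 − similarity conversion, eps = 0.3 links any two documents whose Jaccard similarity is at least 0.7, which is consistent with the reported 0.7226 pair being grouped. With min_samples = 1 every document is a core point, so none is labeled as noise and a document with no close neighbor simply appears as a singleton cluster.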