Detecting plagiarism in student assignments remains a critical challenge for maintaining academic integrity in higher education. This study proposes an automated method for detecting content similarity between student assignment documents by combining Jaccard similarity with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. The process begins with the collection of student assignment files in digital format, followed by text extraction to form a set-based representation of each document. Jaccard similarity is then computed for every document pair, and the resulting similarity matrix is converted into a distance matrix (distance = 1 - similarity) that serves as input to DBSCAN. Experiments on 23 documents yielded 253 unique document pairs (23 × 22 / 2). The results show that the method identified pairs with high similarity scores (for example, 0.9114 and 0.7226), which were confirmed visually in a heatmap and grouped into clusters by DBSCAN. Parameter settings of eps = 0.3 and min_samples = 1 proved optimal in these experiments for distinguishing original documents from those exhibiting substantial content overlap; with min_samples = 1, every document is assigned to a cluster, so an original document appears as a cluster of its own. The approach is accurate and efficient, and it does not require the number of clusters to be specified in advance, making it suitable for deployment in automated plagiarism detection systems for academic texts.
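The pipeline described above (set representation, pairwise Jaccard similarity, conversion to distances, DBSCAN with eps = 0.3 and min_samples = 1) can be illustrated with a minimal Python sketch. It is not the study's implementation. The whitespace tokenizer, the three sample texts, and the helper names (`tokenize`, `jaccard`, `distance_matrix`, `dbscan`) are assumptions made for illustration, and the small pure-Python DBSCAN stands in for a library routine such as scikit-learn's `DBSCAN` with `metric="precomputed"`, which accepts the same kind of distance matrix.

```python
from itertools import combinations


def tokenize(text):
    """Lowercased, whitespace-split word set (an assumed representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def distance_matrix(token_sets):
    """Symmetric matrix of Jaccard distances, 1 - similarity."""
    n = len(token_sets)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 1.0 - jaccard(token_sets[i], token_sets[j])
    return dist


def dbscan(dist, eps=0.3, min_samples=1):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    # Each point counts as its own neighbor, since dist[i][i] == 0.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


# Hypothetical sample documents, not data from the study.
docs = {
    "a.txt": "the quick brown fox jumps over the lazy dog near the river bank",
    "b.txt": "the quick brown fox jumps over the lazy dog near the river",
    "c.txt": "clustering groups similar items without labels",
}
token_sets = [tokenize(text) for text in docs.values()]
dist = distance_matrix(token_sets)
labels = dbscan(dist, eps=0.3, min_samples=1)

for name, label in zip(docs, labels):
    print(name, "-> cluster", label)
```

In this sketch the first two texts share 10 of 11 distinct words, so their Jaccard distance (about 0.09) falls below eps and they land in the same cluster, while the third text forms a cluster by itself. Because min_samples = 1 makes every document a core point, no document is ever labeled as noise; documents without close neighbors show up as singleton clusters instead.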
                        
                        
                        
                        
                            
Copyright © 2025