Scientific Journal of Informatics
Vol 11, No 1 (2024): February 2024

Damerau-Levenshtein Distance Algorithm Based on Abstract Syntax Tree to Detect Code Plagiarism

Nuraminah, Ahlijati (Unknown)
Ammar, Abdullah (Unknown)



Article Info

Publish Date
05 Dec 2023

Abstract

Purpose: This research aimed to detect source code plagiarism based on Abstract Syntax Tree using Damerau-Levenshtein Distance algorithm, which is expected to streamline the inaccuracies and time-consumption associated with the manual process.Methods: Damerau-Levenshtein Distance algorithm was used to determine the similarity between source code files and calculate F-Measure. The dataset, which consisted of 178 source code files from 20 coursework assignments, was obtained from GitHub by Lawton Nichols in 2019. Damerau-Levenshtein Distance algorithm was used to compute the minimum cost required to transform one line of code into another. Furthermore, ANTLR detected AST, which was processed through preprocessing, including node pruning, function and variable sorting, and log output removal. Result: The result showed that the two methods took 5.704 seconds and 0.996 seconds to complete. The lowest and highest values obtained using F-Measure were 0.16 and 0.8, respectively. Therefore, the system performed detection processes quickly and effectively detected common forms of code plagiarism with difficulty in the more complex forms. Novelty: In conclusion, this research used AST and Damerau-Levenshtein Distance algorithm to calculate the 5 levels of similarity in Java programming language source code. For further development, preprocessing steps were needed to prune unnecessary nodes and detect equivalent but differently syntaxed code. 

Copyrights © 2024






Journal Info

Abbrev

SJI

Publisher

Subject

Computer Science & IT

Description

Scientific Journal of Informatics published by the Department of Computer Science, Semarang State University, a scientific journal of Information Systems and Information Technology which includes scholarly writings on pure research and applied research in the field of information systems and ...