IAES International Journal of Artificial Intelligence (IJ-AI)
Vol 13, No 4: December 2024

Efficient cross-lingual plagiarism detection using bidirectional and auto-regressive transformers

Bouaine, Chaimaa
Benabbou, Faouzia



Article Info

Publish Date
01 Dec 2024

Abstract

The widespread availability of online information has fundamentally changed how we acquire knowledge. This wealth of data also poses significant challenges to academic integrity, notably cross-lingual plagiarism: the unauthorized copying or translation of ideas or works from one language into another without proper citation. This research introduces a method for detecting cross-lingual plagiarism that uses a pre-trained multilingual bidirectional and auto-regressive transformers (mBART) model for document feature extraction and a Siamese long short-term memory (SLSTM) model to classify pairs of documents as either "plagiarized" or "non-plagiarized". The approach performs well across several languages: English (En), Spanish (Es), German (De), and French (Fr). On the En-Fr language pair, the model achieved an accuracy of 98.83%, precision of 98.42%, recall of 99.32%, and F-score of 98.87%. For En-Es, it achieved an accuracy of 97.94%, precision of 98.57%, recall of 97.47%, and F-score of 98.01%. For En-De, it achieved an accuracy of 95.59%, precision of 95.21%, recall of 96.85%, and F-score of 96.02%. These results indicate that combining the mBART transformer with an SLSTM model is effective for cross-lingual plagiarism detection.
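As a quick consistency check on the reported metrics, the sketch below recomputes each F-score from the stated precision and recall. It assumes the F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly. The helper name `f_score` and the `reported` table layout are illustrative choices, not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the source does not say.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages copied from the abstract: (precision, recall, reported F-score).
reported = {
    "En-Fr": (98.42, 99.32, 98.87),
    "En-Es": (98.57, 97.47, 98.01),
    "En-De": (95.21, 96.85, 96.02),
}

# Recompute F1 for each language pair from the rounded precision and recall.
recomputed = {pair: f_score(p, r) for pair, (p, r, _) in reported.items()}

for pair, (_, _, f_reported) in reported.items():
    print(f"{pair}: recomputed {recomputed[pair]:.2f}, reported {f_reported:.2f}")
```

Under that assumption, the recomputed values agree with the reported F-scores to within about 0.01 percentage points. Differences that small are consistent with precision and recall having been rounded to two decimals before publication.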

Copyrights © 2024






Journal Info

Abbrev

IJAI

Publisher

Subject

Computer Science & IT Engineering

Description

IAES International Journal of Artificial Intelligence (IJ-AI) publishes articles in the field of artificial intelligence (AI). The scope covers all areas of artificial intelligence and their applications, including the following topics: neural networks; fuzzy logic; simulated biological evolution algorithms (like ...