JURIKOM (Jurnal Riset Komputer)
Vol. 12 No. 6 (2025): December 2025

Development of a Journal Document Plagiarism Detection System Based on Bidirectional Encoder Representations from Transformers and Cosine Similarity

Ariyanto, Cahya Yoga
Aji, Adam Sekti



Article Info

Publish Date
31 Dec 2025

Abstract

Digital technology has had a significant impact across many fields, including education and the management of scientific documents. Easy access to online journals has introduced a new challenge: an increased potential for plagiarism. Addressing it requires an automated system that can detect document similarity quickly and accurately. This study develops a plagiarism detection system based on Cosine Similarity and Bidirectional Encoder Representations from Transformers (BERT). The research stages are text preprocessing, term weighting using Term Frequency–Inverse Document Frequency (TF-IDF), Cosine Similarity computation, BERT model training, and model performance evaluation. The results show that integrating BERT with TF-IDF improves performance compared to using BERT alone. In the experiments, the BERT model with TF-IDF achieved its highest accuracy, 0.9621, in the 10:90 data split scenario, with a precision of 0.8141, a recall of 0.7302, and an F1-score of 0.8022. The BERT model without TF-IDF achieved an accuracy of only 0.8529. Applying Cosine Similarity with a threshold value of 0.6 also proved effective in separating plagiarized from non-plagiarized documents. These findings indicate that combining BERT and TF-IDF improves the accuracy of plagiarism detection by capturing semantic context and term weighting together.
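The TF-IDF weighting and thresholded Cosine Similarity steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the `>=` comparison, and all function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `is_plagiarized`) are assumptions made for illustration. Only the 0.6 threshold comes from the abstract, and the BERT component is omitted entirely.

```python
import math
import re
from collections import Counter

THRESHOLD = 0.6  # cut-off value reported in the abstract


def tokenize(text):
    """Simplified preprocessing (assumption): lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed variant).
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_plagiarized(doc, reference, corpus=(), threshold=THRESHOLD):
    """Flag `doc` when its similarity to `reference` reaches the threshold.

    `corpus` holds optional extra documents used only to estimate IDF.
    """
    vec_doc, vec_ref, *_ = tfidf_vectors([doc, reference, *corpus])
    score = cosine_similarity(vec_doc, vec_ref)
    return score >= threshold, score


if __name__ == "__main__":
    flagged, score = is_plagiarized(
        "Plagiarism detection with cosine similarity",
        "Detection of plagiarism using cosine similarity",
    )
    print(flagged, round(score, 4))
```

In a sketch like this, lexical overlap alone drives the score, so a paraphrase with different wording would score low. That gap is the kind the abstract attributes to BERT's semantic context.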

Copyrights © 2025






Journal Info

Abbrev

jurikom

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering

Description

JURIKOM (Jurnal Riset Komputer) covers the fields of Informatics, Information Systems, Informatics Management, DSS, AI, ES, and Networking, serving as a venue for publishing research results, both conceptual and technical, related to Informatics and Computer Technology. The main topics ...