Garuda - Garba Rujukan Digital

p-Index (2021 - 2026): 1.978

This author has published in these journals:

Jurnal Informatika dan Teknik Elektro Terapan IMAGE JOURNAL OF APPLIED INFORMATICS AND COMPUTING RISTEKDIK : Jurnal Bimbingan dan Konseling Instructional Development Journal Syntax Idea Jurnal Ilmu Komunikasi Network Media Prosiding Seminar Nasional Program Pengabdian Masyarakat Digital Transformation Technology (Digitech) Jurnal Teknik Industri

Akbar, Muhammad Ilham

Unknown Affiliation

Author-ID : 1609117

Published: 10 documents

Articles

Identification of Source Code Plagiarism Using a Natural Language Processing (NLP) Approach Based on Code Writing Style Analysis
Akbar, Muhammad Ilham; Ningrum, Novita Kurnia
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11206

Source code plagiarism identification requires a system that can detect semantic similarity rather than mere textual resemblance. This study used a dataset of 1,000 source code files collected from GitHub repositories, which after cleaning yielded 996 individual code samples. The dataset covered several programming languages (Python, Java, JavaScript, TypeScript, C++) and was split into 697 training, 149 validation, and 149 test samples. The model was CodeBERT, configured with a hidden size of 768, 12 layers, and 12 attention heads. CodeBERT generated a vector embedding for each code sample; a Siamese network then projected these embeddings, and cosine similarity was computed between code pairs. Testing used a similarity threshold of 0.80 to classify a pair as plagiarism. The reported results were an accuracy of 96.4%, precision of 95.2%, recall of 97.8%, F1-score of 96.4%, and an error rate of 4.6%. The system output a similarity score and a status label of “plagiarism detected” or “not detected,” demonstrating the effectiveness of the CodeBERT-based approach for adaptive and intelligent code similarity identification.
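
The decision step described in the abstract (cosine similarity between two embeddings, compared against the 0.80 threshold) can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the short vectors stand in for the 768-dimensional projected CodeBERT embeddings, and treating a score exactly equal to the threshold as plagiarism is an assumption. The helper `binary_metrics` shows how accuracy, precision, recall, and F1-score are conventionally derived from predicted and true labels.

```python
import math

THRESHOLD = 0.80  # similarity cutoff reported in the abstract


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def classify_pair(emb_a, emb_b, threshold=THRESHOLD):
    """Return (score, label) for a pair of embeddings.

    Assumption: a score equal to the threshold counts as plagiarism.
    """
    score = cosine_similarity(emb_a, emb_b)
    label = "plagiarism detected" if score >= threshold else "not detected"
    return score, label


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = plagiarism)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy vectors standing in for projected embeddings (hypothetical values).
    print(classify_pair([1.0, 0.0], [1.0, 0.0]))
    print(classify_pair([1.0, 0.0], [0.0, 1.0]))
```

In a full pipeline the two input vectors would come from the encoder and projection head; only the scoring and labeling logic is shown here.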

Co-Authors Al Asy'Ari, Musa Azzam Al Rizqi, Muhammad Rana Amelita, Arvira Andayani, Saraswati Surya Ning Asri Wulandari Dutho Suh Utomo Eri Sasmita Susanto Gunawan, Suwardi Hidayat, Sholeh Nur Idifitriani, Farida Irsad, Habib Muhammad Jaya, Heromadhan Ilham Syaputra Malik, Elanisa Maulana, Muhamad Riki Mulyanto, Yudi Ningrum, Novita Kurnia Prasetyo, Aries Heru Putri Rachmawati, Putri Raiz, Muhammad Faizal Rirmawati, Rirmawati Sari, Dewi Sekar Setiadi, Ikhbal Tria Sirajudin, Alviana Uula, zikroatul Nurul Wijaya, Tri Agung Nugraha Buana Zaenal Abidin
