Jurnal Teknologi dan Sistem Komputer
Volume 8, Issue 2, Year 2020 (April 2020)

Retrieval of reference source documents in a text reuse system

Nathaniel Clarence Haryanto (Department of Informatics, Universitas Kristen Duta Wacana)
Lucia Dwi Krisnawati (Department of Informatics, Universitas Kristen Duta Wacana)
Antonius Rachmat Chrismanto (Department of Informatics, Universitas Kristen Duta Wacana)



Article Info

Publish Date
30 Apr 2020

Abstract

The architecture of a text-reuse detection system consists of three main modules: source retrieval, text analysis, and knowledge-based postprocessing. Each module affects the accuracy of the detection output. This research focuses on developing the source retrieval module for cases in which the source documents have been obfuscated at different levels. Two steps of term weighting were applied to retrieve such documents. The first was local-word weighting, applied to the test (reused) documents to select queries per text segment. The second was tf-idf term weighting, applied to index all documents in the corpus and to serve as the basis for computing the cosine similarity between the per-segment queries and the corpus documents. A two-step filtering technique was then applied to obtain the source document candidates. On artificial text-reuse cases, the system achieves precision and recall of 0.967 each, while the recall for simulated text-reuse cases is 0.66.
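The tf-idf indexing and cosine-similarity ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, and the function names (`build_index`, `weigh`, `cosine`, `retrieve`) are assumptions made for illustration. The abstract does not specify the two filtering steps, so the score threshold followed by a top-k cut used here is only a hypothetical stand-in, and the local-word weighting used for query selection is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def weigh(tokens, idf):
    # tf-idf weight per term; terms unseen in the corpus are ignored.
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_index(corpus):
    # Compute idf over the corpus and a tf-idf vector for every document.
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: idf = log(N / df) + 1, one of several common variants.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [weigh(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, idf, vectors, min_score=0.1, top_k=2):
    # Hypothetical two-step filter: drop low scores, then keep the top_k best.
    query_vec = weigh(tokenize(query), idf)
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in enumerate(vectors)]
    kept = [(score, doc_id) for score, doc_id in scored if score >= min_score]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in kept[:top_k]]


corpus = [
    "text reuse detection system",
    "cooking rice with coconut milk",
    "source retrieval for text reuse",
]
idf, vectors = build_index(corpus)
candidates = retrieve("text reuse detection", idf, vectors)
```

In this toy corpus the query shares no terms with the second document, so that document scores zero and is removed by the threshold before the top-k cut is applied.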

Copyrights © 2020






Journal Info

Abbrev

JTSISKOM

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering

Description

Jurnal Teknologi dan Sistem Komputer (JTSiskom, e-ISSN: 2338-0403) is a national online periodical published by the Department of Computer Systems Engineering, Universitas Diponegoro, Indonesia. JTSiskom provides a medium for disseminating the results of research, development, and ...