Prosiding Seminar Nasional Teknoka
Vol 5 (2020): Prosiding Seminar Nasional Teknoka ke - 5

Analysis of K-Gram Use on Characters, Words, and Sentences for Detecting Document Similarity

Ida Widaningrum (Universitas Muhammadiyah Ponorogo)
Dyah Mustikasari (Universitas Muhammadiyah Ponorogo)
Rizal Arifin (Universitas Muhammadiyah Ponorogo)
Erika Diyah Cahyani (Universitas Muhammadiyah Ponorogo)



Article Info

Publication Date
01 Jan 2021

Abstract

The use of digital technology is now a necessity, and documents are one of its components. Document similarity can be detected with a variety of methods, including fingerprinting. Fingerprinting works by combining K-gram parsing with hashing. This research focuses on a detection model that uses K-grams with the winnowing algorithm, implemented in Python. The K-gram parsing test uses five values of k: k = 2, 3, 4, 5, and 6. Character-level parsing yields higher percentages than the manual percentage. Word-level parsing yields the percentages closest to the manual percentage. Sentence-level parsing yields the lowest percentages, falling below the manual percentage.
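The pipeline described in the abstract (parse into K-grams at character, word, or sentence level, hash each gram, select fingerprints with winnowing, then compare documents) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the hash function, the winnowing window size, the preprocessing, or the similarity formula. Here the hash is a truncated MD5 digest, the window size `w = 4` is arbitrary, sentences are split naively on `.`, `!`, and `?`, and similarity is the Jaccard coefficient of the two fingerprint sets, expressed as a percentage. All function names are hypothetical.

```python
import hashlib
import re


def tokenize(text: str, level: str) -> list[str]:
    """Split text into units for one parsing level (assumed preprocessing)."""
    text = text.lower()
    if level == "char":
        # Keep letters and digits only; spaces and punctuation are dropped.
        return [c for c in text if c.isalnum()]
    if level == "word":
        return re.findall(r"\w+", text)
    if level == "sentence":
        # Naive split on runs of '.', '!' or '?'; each sentence is normalised
        # to its words joined by single spaces. Empty fragments are skipped.
        parts = re.split(r"[.!?]+", text)
        return [" ".join(re.findall(r"\w+", p)) for p in parts if re.search(r"\w", p)]
    raise ValueError(f"unknown level: {level}")


def kgrams(units: list[str], k: int) -> list[str]:
    """Contiguous k-grams of units, each joined into a single string."""
    if not units:
        return []
    if len(units) < k:
        # Assumption: a text shorter than k units becomes one whole gram.
        return ["|".join(units)]
    return ["|".join(units[i:i + k]) for i in range(len(units) - k + 1)]


def gram_hash(gram: str) -> int:
    """Deterministic 64-bit hash (assumed; the paper's hash is not stated)."""
    return int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest()[:8], "big")


def winnow(hashes: list[int], w: int) -> set[int]:
    """Winnowing: keep the rightmost minimum hash of each window of size w."""
    if not hashes:
        return set()
    if len(hashes) <= w:
        return {min(hashes)}
    fingerprints: set[int] = set()
    prev_idx = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        m = min(window)
        # Position of the rightmost occurrence of the minimum in this window.
        idx = start + max(i for i, h in enumerate(window) if h == m)
        if idx != prev_idx:  # record a fingerprint only when the choice moves
            fingerprints.add(m)
            prev_idx = idx
    return fingerprints


def fingerprint(text: str, level: str, k: int, w: int = 4) -> set[int]:
    return winnow([gram_hash(g) for g in kgrams(tokenize(text, level), k)], w)


def similarity(a: str, b: str, level: str, k: int, w: int = 4) -> float:
    """Jaccard similarity of the two fingerprint sets, as a percentage."""
    fa, fb = fingerprint(a, level, k, w), fingerprint(b, level, k, w)
    union = fa | fb
    return 100.0 * len(fa & fb) / len(union) if union else 0.0


if __name__ == "__main__":
    doc1 = "Plagiarism checks compare documents. Fingerprints make this fast."
    doc2 = "Similarity checks compare documents. Fingerprints make it fast."
    for level in ("char", "word", "sentence"):
        for k in range(2, 7):  # k = 2..6, as in the abstract's test
            print(level, k, round(similarity(doc1, doc2, level, k), 2))
```

Running the loop over all three levels and k = 2 to 6 mirrors the shape of the experiment in the abstract; the resulting percentages depend entirely on the assumed window size, hash, and similarity formula, so they should not be read as reproducing the paper's figures.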

Copyright © 2020