Jurnal Teknologi Informasi, Komputer, dan Aplikasinya (JTIKA)
Vol 6 No 1 (2024): March 2024

SPELLING ERROR CORRECTION IN INDONESIAN USING DAMERAU-LEVENSHTEIN DISTANCE AND N-GRAM

Kokong, Diah Anggreni Ratna Sari
Irmawati, Budi
Dwiyansaputra, Ramaditia

Article Info

Publish Date
31 Mar 2024

Abstract

Spelling errors need attention because they can affect the computations performed in Natural Language Processing tasks that rely on valid input data. Several studies have addressed such errors. One of them, by Fahma, A. I., et al., used the n-gram method and Levenshtein distance and achieved a best precision of 0.97 for the insertion error type and a best recall of 1 for the substitution error type. Motivated by this high accuracy, this study proposes combining n-grams with Damerau-Levenshtein, an extension of the Levenshtein algorithm. Damerau-Levenshtein uses the same operations as Levenshtein (insertion, deletion, and substitution) and adds the transposition of two adjacent characters. Damerau not only distinguished these four edit operations but also stated that they account for about 80% of all human spelling errors. The n-gram types used are bigrams (n = 2) and trigrams (n = 3). In testing, detection precision and recall ranged from 80% to 100%. Correction accuracy was evaluated with the equations proposed by Dahlmeier and Ng; of the average precision and recall values across the three scenarios, scenario C with top-10 ranking achieved the highest accuracy, 96%.
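The abstract names two building blocks, Damerau-Levenshtein distance and character n-grams (bigrams and trigrams). The Python sketch below shows one common way to combine them; it is not the authors' implementation. It assumes (1) the restricted "optimal string alignment" variant of Damerau-Levenshtein, in which no substring is edited more than once, (2) character n-grams with `#` boundary padding, compared by Jaccard similarity to shortlist dictionary words, and (3) ranking the shortlist by edit distance and returning the top k, loosely echoing the top-10 scenario. The function names, the `min_sim` threshold, and the mini word list are hypothetical.

```python
from typing import List, Set, Tuple


def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Insertion, deletion, substitution, and transposition of two adjacent
    characters each cost 1; no substring is edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete the first i characters of a
    for j in range(cols):
        d[0][j] = j  # insert the first j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
            # Transposition of two adjacent characters, e.g. "ha" <-> "ah".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def char_ngrams(word: str, n: int) -> Set[str]:
    """Character n-grams (n=2 bigrams, n=3 trigrams) of word padded with '#'."""
    padded = f"#{word}#"  # padding marks word boundaries (an assumption)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int) -> float:
    """Jaccard similarity of the two words' n-gram sets (assumed measure)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def suggest(word: str, dictionary: List[str], n: int = 2,
            top_k: int = 10, min_sim: float = 0.2) -> List[str]:
    """Hypothetical pipeline: shortlist by n-gram overlap, rank by distance.

    Returns at most top_k dictionary words, closest first; ties on distance
    are broken by higher n-gram similarity, then alphabetically.
    """
    scored: List[Tuple[int, float, str]] = []
    for cand in dictionary:
        sim = ngram_similarity(word, cand, n)
        if sim >= min_sim:
            scored.append((damerau_levenshtein(word, cand), -sim, cand))
    scored.sort()
    return [cand for _, _, cand in scored[:top_k]]


if __name__ == "__main__":
    words = ["rumah", "ramah", "murah"]  # hypothetical mini dictionary
    print(suggest("rumha", words))  # prints ['rumah']
```

Here "rumah" shares three of six padded bigrams with the misspelling "rumha" and is one transposition away, while "ramah" and "murah" fall below the similarity threshold and are never scored. Because this is the restricted variant, `damerau_levenshtein("ca", "abc")` returns 3, whereas the unrestricted Damerau-Levenshtein distance is 2; which variant the study used is not stated in the abstract.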

Copyright © 2024

Journal Info

Abbrev

JTIKA

Publisher

Informatics Engineering Study Program, Faculty of Engineering, Universitas Mataram

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Engineering

Description

Jurnal Teknologi Informasi, Komputer dan Aplikasinya, abbreviated as JTIKA, is published by the Informatics Engineering Study Program, Faculty of Engineering, Universitas Mataram, as a venue for original research in information technology, computer science, and their applications. JTIKA is an open ...