Hardiyanti, Margareta
Unknown Affiliation

Published : 8 Documents

Found 2 Documents
Journal : Scientific Journal of Informatics

Identifying The Common Type of Spelling Error by Leveraging Levenshtein Distance and N-gram
Hardiyanti, Margareta
Scientific Journal of Informatics Vol 8, No 1 (2021): May 2021
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v8i1.29273

Abstract

Spelling errors commonly occur during document writing. They may stem from an author's limited vocabulary or from striking the wrong key on the keyboard. The most frequent error types are insertion of an extra letter, deletion of one letter, substitution of one letter, and transposition of two adjacent letters. This study aims to identify the most common type of spelling error, using the list of common misspellings submitted by Wikipedia contributors. A brief overview of the Levenshtein and N-gram distance techniques describes the technical approach used to achieve this aim. Both techniques are used to predict the correct word for each misspelling from an English dictionary. The results show that Levenshtein distance works well for correcting single-letter substitutions and transpositions of two adjacent letters, while N-gram distance is effective at fixing words with an omitted letter. The overall results are then evaluated with recall to determine which technique corrects misspellings more effectively. Since Levenshtein achieves higher recall than N-gram, the study concludes that misspellings which Levenshtein corrects successfully occur more frequently.
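The four single-letter error types and the Levenshtein-based dictionary lookup described above can be illustrated with a short Python sketch. It is not the author's code: the helper names (`levenshtein`, `correct`, `error_type`), the toy `DICTIONARY`, and the alphabetical tie-breaking are assumptions for illustration only. Note that plain Levenshtein distance counts a transposition of two adjacent letters as two edits (two substitutions); only the Damerau-Levenshtein variant counts it as one.

```python
from typing import Iterable, Optional

# Hypothetical toy dictionary, not the English dictionary used in the study.
DICTIONARY = ["receive", "separate", "their", "because", "which"]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when letters match)
            ))
        prev = curr
    return prev[-1]


def correct(word: str, dictionary: Iterable[str]) -> str:
    """Return the closest dictionary word; ties are broken alphabetically (an assumption)."""
    return min(dictionary, key=lambda w: (levenshtein(word, w), w))


def error_type(misspelt: str, correct_word: str) -> Optional[str]:
    """Classify a misspelling as one of the four single-error types, or None if it is none of them."""
    m, c = misspelt, correct_word
    # Insertion: the misspelling has one extra letter.
    if len(m) == len(c) + 1 and any(m[:i] + m[i + 1:] == c for i in range(len(m))):
        return "insertion"
    # Deletion: the misspelling is missing one letter.
    if len(m) + 1 == len(c) and any(c[:i] + c[i + 1:] == m for i in range(len(c))):
        return "deletion"
    if len(m) == len(c):
        diffs = [i for i in range(len(m)) if m[i] != c[i]]
        if len(diffs) == 1:
            return "substitution"
        # Transposition: two adjacent letters swapped.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and m[diffs[0]] == c[diffs[1]] and m[diffs[1]] == c[diffs[0]]):
            return "transposition"
    return None


if __name__ == "__main__":
    for typo in ["seperate", "becuse", "whitch", "thier", "recieve"]:
        fix = correct(typo, DICTIONARY)
        print(typo, "->", fix, error_type(typo, fix), levenshtein(typo, fix))
```

With this toy list, `correct` still maps "thier" to "their" at distance 2, because no other word in the list is closer even though the transposition costs two edits.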
Identifying The Common Type of Spelling Error by Leveraging Levenshtein Distance and N-gram
Hardiyanti, Margareta
Scientific Journal of Informatics Vol 8, No 1 (2021): May 2021
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v8i1.29273

Abstract

Purpose: This study aims to identify the most common type of spelling error, using the list of common misspellings submitted by Wikipedia contributors. Methods: Levenshtein and N-gram distance are used to predict the correct word for each misspelling from an English dictionary. The results of both algorithms are then examined and evaluated with recall to determine which technique works more effectively. Result: Levenshtein distance works well for correcting single-letter substitutions and transpositions of two adjacent letters, while N-gram distance is effective at fixing words with an omitted letter. Since Levenshtein achieves higher recall than N-gram, the study concludes that misspellings which Levenshtein corrects successfully occur more frequently. Novelty: This is the first study to compare two spelling correction algorithms for identifying the most common type of spelling error.
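The N-gram side of the comparison and the recall evaluation can be sketched in Python as well. The abstract does not say which N-gram distance was used, so this assumes one common definition over character bigrams, with `#` marking the word boundaries: |G(a)| + |G(b)| - 2|G(a) ∩ G(b)|, counted over multisets. Recall is assumed to be the fraction of misspellings whose correct word is among the top-ranked candidates (all ties kept). The function names and the toy data are hypothetical, not the Wikipedia list used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical toy data, not the study's dictionary or misspelling list.
DICTIONARY = ["because", "separate", "receive", "their", "which"]
GOLD = {"becuse": "because", "seperate": "separate", "thier": "their"}


def ngrams(word: str, n: int = 2) -> Counter:
    """Multiset of character n-grams, with '#' padding marking the word boundaries."""
    padded = "#" * (n - 1) + word + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_distance(a: str, b: str, n: int = 2) -> int:
    """|G(a)| + |G(b)| - 2|G(a) & G(b)|, where & is multiset intersection."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    shared = sum((ga & gb).values())  # Counter & keeps the minimum count of each n-gram
    return sum(ga.values()) + sum(gb.values()) - 2 * shared


def best_matches(word: str, dictionary: Iterable[str], n: int = 2) -> Set[str]:
    """All dictionary words tied for the smallest n-gram distance to word."""
    scored = [(ngram_distance(word, w, n), w) for w in dictionary]
    best = min(d for d, _ in scored)
    return {w for d, w in scored if d == best}


def recall(predictions: Dict[str, Set[str]], gold: Dict[str, str]) -> float:
    """Fraction of misspellings whose correct word appears among the predicted candidates."""
    hits = sum(1 for typo, answer in gold.items() if answer in predictions.get(typo, set()))
    return hits / len(gold)


if __name__ == "__main__":
    predictions = {typo: best_matches(typo, DICTIONARY) for typo in GOLD}
    print(predictions)
    print("recall:", recall(predictions, GOLD))
```

With this toy data, "becuse" (a missing letter) is at bigram distance 3 from "because" and "seperate" at distance 4 from "separate", so every gold word is recovered and recall is 1.0. The same `recall` function can score a Levenshtein-based predictor, which is how the two techniques would be compared.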