Text is one of the main media for communication and information in human life. A critical problem in writing text is the misspelling of words, known as typographical errors. These errors occur when typing on a computer keyboard or a smartphone. A typographical error in a text can have unpredictable consequences for some readers. For this reason, a system is needed that identifies typographical errors in a text and corrects the erroneous words. The N-gram and Levenshtein Distance methods can be used to correct typographical errors in text. Levenshtein Distance is used to find the candidate words for each typographical error. Because these candidates are unsorted, the N-gram method is used to rank them by their cosine similarity. In this research, the N-gram method uses N=2, which splits the identified typographical error and each of its candidate words into sequences of two characters. Once the N-gram step is done, the cosine similarity is calculated from tf-idf weights. In the test scenarios, the best precision was 0.97, obtained for insertion-type errors, and the best recall was 1, obtained for substitution-type errors.
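The pipeline described above can be illustrated with a minimal Python sketch. It is not the implementation used in this research: the function names, the `max_distance` threshold, the smoothed idf formula, and the toy dictionary are all assumptions made for illustration. Candidates are selected by Levenshtein Distance, then ranked by the cosine similarity of their tf-idf weighted bigram (N=2) vectors.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def bigrams(word):
    """Split a word into overlapping two-character sequences (N=2)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def tfidf_vectors(words):
    """Build one sparse tf-idf vector per word, treating each word as a document.

    Assumption: a smoothed idf, log((1 + n) / (1 + df)) + 1, is used here;
    the source does not state which tf-idf variant was applied.
    """
    docs = [Counter(bigrams(w)) for w in words]
    n = len(docs)
    df = Counter(g for d in docs for g in d)
    idf = {g: math.log((1 + n) / (1 + df[g])) + 1 for g in df}
    return [{g: tf * idf[g] for g, tf in d.items()} for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(g, 0.0) for g, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest(typo, dictionary, max_distance=1):
    """Select candidates by edit distance, then rank them by cosine similarity.

    `max_distance` is a hypothetical threshold chosen for this sketch.
    """
    candidates = [w for w in dictionary if levenshtein(typo, w) <= max_distance]
    vectors = tfidf_vectors([typo] + candidates)
    scored = [(w, cosine(vectors[0], v)) for w, v in zip(candidates, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy dictionary, for illustration only.
ranked = suggest("speling", ["spewing", "spelling", "sailing", "text"])
```

In this toy run, "spewing" and "spelling" both lie within one edit of "speling", so Levenshtein Distance alone cannot order them. The bigram comparison ranks "spelling" first because it shares more two-character sequences with the misspelled word.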
Copyrights © 2018