Hicham Gueddah
Mohammed V University

Published: 2 Documents
Articles

Improving SpellChecking: an effective Ad-Hoc probabilistic lexical measure for general typos
Hicham Gueddah; Mohamed Nejja; Said Iazzi; Abdellah Yousfi; Si Lhoussain Aouragh
Indonesian Journal of Electrical Engineering and Computer Science Vol 27, No 1: July 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v27.i1.pp521-527

Abstract

Ever since people began learning to write, mistakes made in writing words have held a prominent place in linguistic studies, and disciplines such as spelling and dictation have been integrated into school curricula. According to the extensive studies we have carried out on spelling errors made when typing Arabic texts, very few research works deal with typographical errors caused specifically by the insertion or omission of a blank space in words. Moreover, spelling correction software remains ineffective at handling this type of error. Failing to process errors due to an inserted or missing blank space between or within words leads to ambiguity and makes the typed text hard to understand. To remedy this limitation, we propose in this article an ad hoc probabilistic method based jointly on two approaches. The first handles errors due to a missing blank space between or inside words, while the second focuses on correcting space insertion errors within a word, in addition to the other elementary editing errors (addition, deletion, permutation of characters). Our new approach combines edit distance with n-gram language models to correct these errors. It achieved an accuracy of 98.14% for missing blank-space errors (MBSE) and 89.5% for insertion blank-space errors (IBSE), for an average correction rate of around 95.26%. These results are very encouraging and demonstrate the value of our approach.
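
The idea of pairing a lexicon lookup with an n-gram score for blank-space errors can be sketched as follows. This is only an illustration and not the authors' method: the toy English counts, the add-alpha smoothing, and the function names `bigram_logprob`, `split_candidates`, `fix_missing_space`, and `fix_inserted_space` are all assumptions made for the example, and the edit-distance component described in the abstract is left out.

```python
from math import log

# Hypothetical toy counts (English for readability); a real system would
# estimate them from a large Arabic corpus.
UNIGRAMS = {"no": 30, "where": 12, "now": 25, "here": 20}
BIGRAMS = {("now", "here"): 5}


def bigram_logprob(w1, w2, alpha=1.0):
    """Add-alpha smoothed log P(w2 | w1) computed from the toy counts."""
    vocab_size = len(UNIGRAMS)
    numerator = BIGRAMS.get((w1, w2), 0) + alpha
    denominator = UNIGRAMS.get(w1, 0) + alpha * vocab_size
    return log(numerator / denominator)


def split_candidates(token):
    """Every two-way split of `token` whose halves are both lexicon words."""
    return [
        (token[:i], token[i:])
        for i in range(1, len(token))
        if token[:i] in UNIGRAMS and token[i:] in UNIGRAMS
    ]


def fix_missing_space(token):
    """Missing-space case: split an unknown token at the most probable point."""
    if token in UNIGRAMS:
        return [token]
    candidates = split_candidates(token)
    if not candidates:
        return [token]  # nothing to propose; leave the token unchanged
    best = max(candidates, key=lambda pair: bigram_logprob(*pair))
    return list(best)


def fix_inserted_space(left, right):
    """Inserted-space case: merge two fragments that only form a word together."""
    both_known = left in UNIGRAMS and right in UNIGRAMS
    if not both_known and left + right in UNIGRAMS:
        return [left + right]
    return [left, right]
```

With these toy counts, `fix_missing_space("nowhere")` has two valid splits, "no where" and "now here", and the bigram score selects the second because only that pair was observed in the toy data.
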

Arabic spellchecking: a depth-filtered composition metric to achieve fully automatic correction
Hicham Gueddah; Youssef Lachibi
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 5: October 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i5.pp5366-5373

Abstract

Digital environments for human learning have evolved considerably in recent years thanks to advances in information technologies. Computer assistance for text creation and editing tools represents a future market in which natural language processing (NLP) concepts will be used. This is particularly true of the automatic correction of spelling mistakes, which data operators use daily. Unfortunately, these spellcheckers are considered writing aids: they cannot perform this task automatically without the user's assistance. In this paper, we propose a filtered composition metric based on the weighting of two lexical similarity distances in order to achieve auto-correction. The approach requires two phases. The first correction phase combines two well-known distances: the edit distance, weighted by the relative proximity of keys on the Arabic keyboard and by the calligraphic similarity between Arabic letters, is combined with the Jaro-Winkler distance to better weight and filter candidate solutions that have the same metric. The second phase acts as a booster for the first: it applies a probabilistic bigram language model after the candidate corrections have been identified, since several may have the same lexical similarity measure in the first phase. Evaluating our filtered composition measure on a dataset of errors yielded an auto-correction rate of 96%.
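
A minimal sketch of the first phase, under stated assumptions, might rank candidates by a weighted edit distance and break ties with Jaro-Winkler similarity. The `NEAR_PAIRS` table, the 0.5 substitution cost, the Latin-letter examples, and the function names are hypothetical stand-ins for the Arabic keyboard and letter-shape weights; the paper's actual weights and combination formula are not given in the abstract, and the bigram second phase is not shown.

```python
# Hypothetical "close" letter pairs (adjacent keys or similar shapes).
NEAR_PAIRS = {frozenset("as"), frozenset("nm")}


def sub_cost(a, b):
    """Substitution cost: free if equal, cheaper for 'close' letters (assumed 0.5)."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in NEAR_PAIRS else 1.0


def weighted_edit_distance(src, dst):
    """Levenshtein distance with unit insert/delete and weighted substitution."""
    prev = [float(j) for j in range(len(dst) + 1)]
    for i, a in enumerate(src, 1):
        cur = [float(i)]
        for j, b in enumerate(dst, 1):
            cur.append(min(prev[j] + 1.0,                   # delete a
                           cur[j - 1] + 1.0,                # insert b
                           prev[j - 1] + sub_cost(a, b)))   # substitute
        prev = cur
    return prev[-1]


def jaro(s, t):
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                break
    m = sum(s_hit)
    if m == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by the common prefix (up to 4 characters)."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def rank_candidates(typo, lexicon):
    """Sort by weighted edit distance; ties go to higher Jaro-Winkler similarity."""
    return sorted(lexicon, key=lambda w: (weighted_edit_distance(typo, w),
                                          -jaro_winkler(typo, w)))
```

For the typo "cst" and the candidates "cut", "cost", and "cat", the cheaper a/s substitution puts "cat" first, while "cost" and "cut" tie on distance and are separated only by the Jaro-Winkler score. Candidates that still tie after this step are the ones the abstract hands over to the bigram language model.
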