Optical character recognition (OCR) is the process of encoding an input text image into a machine-readable format. Because the characteristics of each language differ, it is difficult to develop a universal method that achieves high accuracy for all languages: a method that produces good results for one language may not produce the same results for another. OCR for printed characters is easier than for handwritten characters because printed characters are uniform. While conventional methods have struggled to improve on existing results, convolutional neural networks (CNNs) have shown drastic improvement in character classification and recognition for other languages. However, there is no CNN-based OCR model for Malayalam characters. Our proposed system uses a new CNN architecture for feature extraction and a softmax layer for character classification. This eliminates the manual feature design used in conventional methods. The P-ARTS Kayyezhuthu dataset is used to train the CNN; an accuracy of 99.75% is obtained on the testing dataset, while a collection of 40 real-time input images yielded an accuracy of 95%.
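The pipeline described above (convolutional feature extraction followed by a softmax classifier) can be illustrated with a minimal sketch. This is not the proposed architecture, whose layers are not specified here. The function names, the two hand-picked 2x2 filters, the toy 4x4 image, and the identity weight matrix are all assumptions chosen for illustration; in a real CNN the filters and weights are learned from training data rather than designed by hand.

```python
import math

def conv2d_valid(image, kernel):
    """Slide one kernel over a grayscale image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_global_max(feature_map):
    """Apply ReLU, then reduce the whole map to its single largest response."""
    return max(max(0.0, v) for row in feature_map for v in row)

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, kernels, weights, biases):
    """Extract one feature per kernel, then score each class with softmax."""
    features = [relu_global_max(conv2d_valid(image, k)) for k in kernels]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

# Hypothetical toy input: a 4x4 image containing one vertical stroke.
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

# Hand-picked filters (assumption): a trained CNN would learn these.
kernels = [
    [[-1, 1], [-1, 1]],   # responds to vertical edges
    [[-1, -1], [1, 1]],   # responds to horizontal edges
]

# Hypothetical classifier: class 0 = "vertical stroke", class 1 = "horizontal stroke".
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

label, probs = classify(image, kernels, weights, biases)
print(label, probs)
```

The point of the sketch is the division of labor: the convolution stage turns raw pixels into features, and the softmax stage turns feature scores into class probabilities that sum to one. Training replaces the hand-picked filters and weights with learned ones, which is what removes the need for manual feature design.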
Copyrights © 2023