Sinkron : Jurnal dan Penelitian Teknik Informatika
Vol. 9 No. 1 (2025): Research Article, January 2025

Improving Tesseract OCR Accuracy Using SymSpell Algorithm on Passport Data

Had, Iqbaluddin Syam (Unknown)
Maulana Baihaqi, Wiga (Unknown)
Putriana Nuramanah Kinding, Dwi (Unknown)



Article Info

Publish Date
27 Jan 2025

Abstract

Optical Character Recognition (OCR) is a technology used to recognize text from images or digital documents, such as passports. One popular OCR tool is Tesseract as it offers high accuracy. However, OCR accuracy is often affected by various factors, including image noise and/or non-text elements. This article discusses the application of the SymSpell algorithm for post processing to improve OCR accuracy on standard Indonesian passports. OCR will be focused on the Visual Inspection Zone, specifically the Place of Birth and Issuing Office values. Unlike the Machine Readable Zone which is composed of individual codes and a clear background, the Visual Inspection Zone often experiences OCR errors due to holograms blocking the text and spaced layouts. SymSpell is an edit distance based spelling correction algorithm designed to process data quickly and efficiently, even on very huge datasets. In this study, SymSpell is used to detect and correct errors in OCR results that are compared to a corpus word list. Experimental results with 10 tested scans and passport photos showed that the integration of SymSpell with the Research and Development methodology was able to improve the OCR accuracy rate by 21,43% for certain Place of Birth and Issuing Office data from the Visual Inspection Zone. With this approach, OCR systems can provide more reliable results for practical applications.

Copyrights © 2025






Journal Info

Abbrev

sinkron

Publisher

Subject

Computer Science & IT

Description

Scope of SinkrOns Scientific Discussion 1. Machine Learning 2. Cryptography 3. Steganography 4. Digital Image Processing 5. Networking 6. Security 7. Algorithm and Programming 8. Computer Vision 9. Troubleshooting 10. Internet and E-Commerce 11. Artificial Intelligence 12. Data Mining 13. Artificial ...