JUITA : Jurnal Informatika
JUITA Vol. 13 Issue 2, July 2025

Enhanced OCR Recognition for Madurese Text Documents: A Genetic Algorithm Approach with Tesseract 5.5

Muhammad Nazir Arifin (Universitas Madura)
Muhammad Umar Mansyur (Universitas Madura)
Ali Rahman (Universitas Madura)
Nindian Puspa Dewi (Universitas Madura)
Fauzan Prasetyo Eka Putra (Universitas Madura)



Article Info

Publish Date
04 Aug 2025

Abstract

This study optimizes Optical Character Recognition (OCR) for the Madurese language using Genetic Algorithms (GA). It addresses the challenges of processing Madurese text documents by implementing a nine-step image preprocessing workflow whose order is optimized by GA. The methodology combines rescaling, grayscale conversion, adaptive thresholding, deskewing, median blur, Otsu thresholding, border removal, contrast enhancement, and noise reduction, with the sequence of steps determined by GA optimization. The system uses the Tesseract 5.5 OCR engine configured with Vietnamese language model parameters to accommodate Madurese writing characteristics. Experiments on a dataset of 500 images showed improved recognition accuracy: the GA-optimized preprocessing sequence achieved a 24.32% Word Error Rate (WER) and a 7.47% Character Error Rate (CER), a substantial improvement over the baseline Tesseract implementation. Further optimization through language model selection, particularly using the Occitan (OCI) model, yielded 100% accuracy in specific test cases. The research also explored several fitness function configurations, with a 0.7:0.3 WER-to-CER ratio proving most effective. These results demonstrate the potential of GA optimization for enhancing OCR performance on regional languages with unique characteristics, contributing to the broader field of document digitization and language preservation.
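The abstract reports WER and CER and a 0.7:0.3 WER-to-CER fitness ratio but does not give the formulas. The following is a minimal Python sketch, assuming the standard edit-distance definitions of both metrics and assuming the fitness is a weighted sum to be minimized. The function names and the sample strings are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def fitness(reference: str, hypothesis: str,
            w_wer: float = 0.7, w_cer: float = 0.3) -> float:
    """Assumed fitness: weighted sum of WER and CER (lower is better)."""
    return w_wer * wer(reference, hypothesis) + w_cer * cer(reference, hypothesis)


if __name__ == "__main__":
    # Illustrative strings only; not drawn from the paper's dataset.
    truth = "sengko entar ka pasar"
    ocr_output = "sengko entar ka pasir"
    print(f"WER={wer(truth, ocr_output):.4f}")
    print(f"CER={cer(truth, ocr_output):.4f}")
    print(f"fitness={fitness(truth, ocr_output):.4f}")
```

In a GA of the kind described, each candidate ordering of the nine preprocessing steps would be applied to an image, the OCR output scored with a function like `fitness`, and orderings with lower scores favored during selection.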

Copyrights © 2025






Journal Info

Abbrev

JUITA

Subject

Computer Science & IT

Description

JUITA: Jurnal Informatika is a scientific journal covering informatics and its applications, presenting articles on ideas and research about the latest developments in the field. JUITA is a peer-reviewed, open-access journal. JUITA is published by the Informatics Engineering Study Program, Universitas Muhammadiyah ...