International Journal Software Engineering and Computer Science (IJSECS)
Vol. 5 No. 3 (2025): DECEMBER 2025

Optimization of Tesseract OCR for Automatic Text Extraction on Indonesian ID Cards (KTP) Through Image Quality Enhancement Using Preprocessing Techniques

Gilang Ramadhan (Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika)
Dadang Iskandar Mulyana (Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika)
Sopan Adrianto (Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika)



Article Info

Publish Date
01 Dec 2025

Abstract

Tesseract OCR ranks among the most widely adopted open-source tools for text extraction. Nevertheless, processing documents with degraded image quality, including blurry e-KTPs, low-contrast specimens, or cards affected by uneven lighting, presents substantial challenges. We conducted experimental research to generate empirical data supporting the development of text detection systems for e-KTPs operating under non-ideal conditions. Our methodology involved testing 10 e-KTP images, each containing 15 text attributes, yielding 150 evaluated data points. Image preprocessing proceeded sequentially through grayscale conversion, denoising, contrast enhancement with Contrast Limited Adaptive Histogram Equalization (CLAHE), and thresholding to improve image clarity prior to Tesseract OCR processing. We evaluated accuracy using confusion matrix analysis, emphasizing True Positive (TP), False Positive (FP), and False Negative (FN) counts. Results demonstrate that the preprocessing stages substantially improved text readability. From a baseline OCR accuracy of 39.55%, the reported per-stage improvements were +22.68% after grayscale conversion, +47.70% after denoising, +60.99% after CLAHE, and +19.62% after thresholding, with the complete preprocessing pipeline reaching 64.97% accuracy. The average TP count rose from 4 to 8 of the 15 attributes per image, while precision remained at 100% (FP = 0). Although CLAHE performance varied across samples, the preprocessing stages proved essential for OCR systems operating under degraded image conditions. Our work introduces a novel preprocessing pipeline tailored to the characteristics of Indonesian e-KTPs, providing quantitative benchmarks and systematic analysis that can inform the development of more adaptive digitalization and verification systems for population documents under real-world field conditions.
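The abstract reports results as TP, FP, and FN counts over 15 attributes per card but does not state the exact matching rule. The sketch below shows one way such per-card scoring could work, assuming (hypothetically) that an attribute is a TP only when the normalized OCR text exactly matches the ground truth, that a missing or misread attribute is an FN, and that an extracted field outside the attribute list is an FP. Precision is TP / (TP + FP) and recall is TP / (TP + FN). The function names `normalize` and `score_card`, the field names, and the sample values are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Upper-case and collapse whitespace so trivial OCR spacing differences are ignored.

    Hypothetical normalization rule; the paper's own rule is not given in the abstract.
    """
    return " ".join(text.upper().split())


@dataclass
class FieldCounts:
    tp: int = 0
    fp: int = 0
    fn: int = 0

    @property
    def precision(self) -> float:
        # TP / (TP + FP); defined as 0.0 when nothing was extracted.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        # TP / (TP + FN); defined as 0.0 when there is no ground truth.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0


def score_card(truth: dict[str, str], ocr: dict[str, str]) -> FieldCounts:
    """Score one card's OCR output against its ground-truth attributes (assumed rule)."""
    counts = FieldCounts()
    for field, expected in truth.items():
        got = ocr.get(field)
        if got is not None and normalize(got) == normalize(expected):
            counts.tp += 1  # attribute read correctly
        else:
            counts.fn += 1  # attribute missing or misread
    # Extracted keys that are not ground-truth attributes count as false positives.
    counts.fp = sum(1 for field in ocr if field not in truth)
    return counts


# Illustrative, made-up values (not from the study).
truth = {"NIK": "3171234567890001", "Nama": "BUDI SANTOSO", "Agama": "ISLAM"}
ocr = {"NIK": "3171234567890001", "Nama": "BUDI  SANTOS0"}
result = score_card(truth, ocr)
print(result.tp, result.fp, result.fn, result.precision)
```

Summing `tp`, `fp`, and `fn` over all 10 images before dividing gives pooled figures over the 150 evaluated attributes; averaging the per-card ratios instead gives a macro average, and the two can differ.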

Copyrights © 2025






Journal Info

Abbrev

ijsecs

Subject

Computer Science & IT

Description

IJSECS is committed to bridging the theory and practice of information technology and computer science. From innovative ideas to specific algorithms and full system implementations, IJSECS publishes original, peer-reviewed, high-quality articles in the areas of information technology and computer ...