Gilang Ramadhan
Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika

Published : 1 Documents

Journal : International Journal Software Engineering and Computer Science (IJSECS)

Optimization of Tesseract OCR for Automatic Text Extraction on Indonesian ID Cards (KTP) Through Image Quality Enhancement Using Preprocessing Techniques
Authors : Gilang Ramadhan; Dadang Iskandar Mulyana; Sopan Adrianto
International Journal Software Engineering and Computer Science (IJSECS) Vol. 5 No. 3 (2025): DECEMBER 2025
Publisher : Lembaga Komunitas Informasi Teknologi Aceh (KITA)

DOI: 10.35870/ijsecs.v5i3.5183

Abstract

Tesseract OCR ranks among the most widely adopted open-source tools for text extraction. Nevertheless, processing documents with degraded image quality (including blurry e-KTPs, low-contrast specimens, or those affected by uneven lighting) presents substantial challenges. We conducted experimental research to generate empirical data supporting the development of text detection systems for e-KTPs operating under non-ideal conditions. Our methodology involved testing 10 e-KTP images, each containing 15 text attributes, yielding 150 evaluated data points. Image preprocessing proceeded sequentially through grayscale conversion, denoising, contrast enhancement (CLAHE), and thresholding to improve image clarity prior to Tesseract OCR processing. We evaluated accuracy using confusion matrix analysis, emphasizing True Positive (TP), False Positive (FP), and False Negative (FN) counts. Results demonstrate that the preprocessing stages substantially improved text readability. From a baseline OCR accuracy of 39.55%, the reported improvements were +22.68% following grayscale conversion, +47.70% after denoising, +60.99% post-CLAHE application, and +19.62% after thresholding, culminating in 64.97% accuracy upon completing all preprocessing stages. Average TP values rose from 4 to 8 out of 15 attributes per image, while precision remained stable at 100% (FP = 0). Despite variable CLAHE performance across samples, the preprocessing stages proved essential for OCR systems operating under degraded image conditions. Our work introduces a novel preprocessing pipeline tailored specifically to Indonesian e-KTP characteristics, providing quantitative benchmarks and systematic analysis that can inform the development of more adaptive digitalization and verification systems for population documents under real-world field conditions.
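
The four preprocessing stages and the TP/FP/FN evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes OpenCV (`cv2`) and `pytesseract` are installed, and the function names, the choice of non-local means denoising, the CLAHE parameters, Otsu thresholding, and the `ind` language pack are all assumptions, since the abstract does not specify them. `FieldCounts` is a hypothetical helper for per-image attribute counts.

```python
from dataclasses import dataclass


def preprocess_for_ocr(image_bgr):
    """Grayscale -> denoise -> CLAHE -> threshold, in the order the abstract lists."""
    import cv2  # imported lazily so the metric helper below works without OpenCV

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Assumption: non-local means denoising; the abstract does not name the filter.
    denoised = cv2.fastNlMeansDenoising(gray, None, 10, 7, 21)
    # Assumption: common default CLAHE settings, not the paper's values.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)
    # Assumption: Otsu's global threshold; the abstract only says "thresholding".
    _, binary = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def extract_text(image_bgr, lang="ind"):
    """Run Tesseract on the preprocessed image (needs the Indonesian language data)."""
    import pytesseract

    return pytesseract.image_to_string(preprocess_for_ocr(image_bgr), lang=lang)


@dataclass(frozen=True)
class FieldCounts:
    """Hypothetical per-image counts of correctly read, wrongly read, and missed attributes."""
    tp: int
    fp: int
    fn: int

    def precision(self):
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self):
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0
```

As a rough check against the reported averages, `FieldCounts(tp=8, fp=0, fn=7)` gives a precision of 1.0 and a recall of about 0.53, under the assumption that every attribute not counted as a TP is an FN. The abstract does not state how its accuracy figure is derived from these counts, so this recall value should not be read as a reproduction of the reported 64.97%.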