INTEGRASI OCR TESSERACT DAN GEMINI UNTUK BUKU TAMU DIGITAL BERBASIS WEB (Integration of Tesseract OCR and Gemini for a Web-Based Digital Guest Book)
Ponggohong, Alfa Riegel Imanuel; Rorimpandey, Gladly Caren; Maswonggo, Vandi Vanda
JOINTER: Journal of Informatics Engineering, Vol 6 No 02 (2025)
Publisher: Program Studi Teknik Informatika

DOI: 10.53682/jointer.v6i02.431

Abstract

Digital guest book systems require efficient data extraction from identity documents. Optical Character Recognition (OCR) combined with Artificial Intelligence enables automated text extraction and entity classification. This research develops and evaluates a web-based digital guest book system that integrates Tesseract OCR with Google Gemini AI using the Django framework. The system is evaluated on 21 image samples covering several identity card types (ID card, driver's license, employee card). Multiple image preprocessing variants (grayscale conversion, CLAHE, binarization, deskewing, sharpening) are generated in parallel to improve OCR accuracy. Tesseract OCR, configured for Indonesian and English (psm=6, oem=1), extracts the text, and Gemini AI then classifies the extracted entities (name vs. institution). A rule-based fallback mechanism keeps the system working when the AI is unavailable. Experimental results demonstrate the fallback's reliability: when AI quota limits are reached, the traditional rule-based method achieves 100% accuracy for name detection and 47.6% for institution detection. Multi-variant preprocessing achieved an average OCR confidence of 70.48%, with the best-performing variants (enhanced CLAHE, binarization, original) yielding confidence values ranging from 51.12% to 94.11%. The system successfully processes all 21 identity card images, a 100% OCR success rate. The developed system is effective for automated guest data acquisition and can be adopted by institutions requiring fast and accurate digital registration.
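The control flow described in the abstract (OCR over several preprocessed variants, AI classification, rule-based fallback) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the keyword list, and the name heuristic are hypothetical, and the OCR and AI calls are injected as plain callables so the sketch runs without Tesseract or Gemini installed. The two constants mirror the settings reported in the abstract, written in the form Tesseract's command line accepts.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Settings reported in the abstract (Indonesian + English, psm=6, oem=1).
TESSERACT_LANG = "ind+eng"
TESSERACT_CONFIG = "--oem 1 --psm 6"

# Hypothetical keyword list; the paper's actual fallback rules are not given.
INSTITUTION_KEYWORDS = {"universitas", "university", "institut", "dinas", "kementerian"}

OcrResult = Tuple[str, float]  # (extracted text, mean confidence in percent)


def best_variant(
    variants: Dict[str, object], ocr: Callable[[object], OcrResult]
) -> Tuple[str, str, float]:
    """Run OCR on every preprocessed variant and keep the most confident one."""
    best: Optional[Tuple[str, str, float]] = None
    for variant_name, image in variants.items():
        text, confidence = ocr(image)
        if best is None or confidence > best[2]:
            best = (variant_name, text, confidence)
    if best is None:
        raise ValueError("no image variants supplied")
    return best


def classify_by_rules(text: str) -> Dict[str, Optional[str]]:
    """Assumed heuristic: keyword lines are institutions, letter-only lines are names."""
    name: Optional[str] = None
    institution: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        tokens = set(re.findall(r"[a-z]+", line.lower()))
        if tokens & INSTITUTION_KEYWORDS:
            if institution is None:
                institution = line
            continue
        looks_like_name = re.fullmatch(r"[A-Za-z.' ]+", line) and len(line.split()) >= 2
        if name is None and looks_like_name:
            name = line
    return {"name": name, "institution": institution}


def classify(
    text: str,
    ai_classifier: Optional[Callable[[str], Dict[str, Optional[str]]]] = None,
) -> Dict[str, Optional[str]]:
    """Try the AI classifier first; fall back to rules if it is missing or fails."""
    if ai_classifier is not None:
        try:
            result = dict(ai_classifier(text))
            result["source"] = "ai"
            return result
        except Exception:
            pass  # e.g. quota exhausted: continue with the rule-based path
    result = classify_by_rules(text)
    result["source"] = "rules"
    return result
```

In a real deployment, `ocr` would wrap a Tesseract call using the language and configuration above, and `ai_classifier` would wrap the Gemini request. Keeping both behind callables means the fallback path can be exercised by passing a classifier that raises.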