In line with Indonesia's digital transformation agenda, this research addresses the inefficient and error-prone manual data entry on the e-magang platform of the Foreign Policy Strategy Agency (BSKLN). The study introduces an end-to-end Optical Character Recognition (OCR) pipeline designed for structured identity documents and for integration with a real-world government platform. The workflow comprises image preprocessing with histogram matching, hierarchical segmentation using vertical projection, and postprocessing that structures the output. To mitigate the limitations of a small dataset, three specialized Convolutional Neural Network (CNN) models were trained and validated with stratified 5-fold cross-validation. The final system was integrated by connecting a Flask-based model engine to the existing Laravel and React platform. In end-to-end testing, the system achieved an average character-reading accuracy of 93.31% with a mean processing time of 14.48 seconds per image. The primary contribution of this research to the field of informatics is a complete, deployable system architecture that supports data interoperability and reliability and offers a practical blueprint for integrating intelligent automation into digital public services.
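As an illustration of the segmentation step, vertical projection can be sketched as follows. This is a minimal sketch and not the implementation used in the study: the function names `vertical_projection` and `segment_columns`, the `min_pixels` threshold, and the assumption that the input is already binarized (1 for ink, 0 for background) are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not binary:
        return []
    width = len(binary[0])
    return [sum(row[x] for row in binary) for x in range(width)]


def segment_columns(binary: List[List[int]], min_pixels: int = 1) -> List[Tuple[int, int]]:
    """Split an image into column ranges separated by empty columns.

    Returns half-open (start, end) column intervals. A column belongs to a
    segment when its projection value is at least `min_pixels`
    (a hypothetical threshold chosen for this sketch).
    """
    profile = vertical_projection(binary)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = x  # a run of inked columns begins
        elif count < min_pixels and start is not None:
            segments.append((start, x))  # the run ended at the previous column
            start = None
    if start is not None:
        segments.append((start, len(profile)))  # run reaches the right edge
    return segments


# Two "characters" separated by two blank columns.
image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]
print(segment_columns(image))
```

A hierarchical variant would presumably apply the same idea at more than one level, for example a row-wise projection to isolate text lines before the column-wise pass isolates characters; that ordering is an assumption here, since the abstract does not spell it out.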
Copyright © 2025