Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

This author has published in the following journal:

Jurnal Riset Informatika

Muhammad Raihan Izharul Haq

Unknown Affiliation

Author-ID : 8942807

Computer Science & IT

Published : 1 Document

Articles

Integration of OCR Technology with ETL Processes for Automating Data Pipeline of Financial Disbursement Documents at BPS Sukabumi Regency
Muhammad Raihan Izharul Haq; Gina Purnama Insany; Somantri
Jurnal Riset Informatika Vol. 7 No. 4 (2025): September 2025
Publisher : Kresnamedia Publisher

In the digital era, managing archival data poses challenges for many institutions, including Badan Pusat Statistik (BPS) of Sukabumi Regency, especially when dealing with unstructured PDF documents. This study develops a data pipeline that integrates Optical Character Recognition (OCR) technology with Extract, Transform, Load (ETL) processes. Unstructured data from financial disbursement documents, such as SPM and SP2D, were extracted automatically with high accuracy: an average of 98.52% for SPM using a combination of OCR and PDFPlumber, and 100% for SP2D using PDFPlumber alone. The extraction results were stored in a data warehouse, then transformed using Apache Spark and loaded into data marts. The ETL process was automated using Apache Airflow, which ran tasks reliably according to their dependencies. The processed data were presented in real time through an interactive Looker Studio dashboard, supporting efficient archive management and more informed decision-making. This study not only provides a solution to existing archival management problems but also opens opportunities for further development in the application of big data technologies and business process automation in the public sector.

Co-Authors: Gina Purnama Insany; Somantri
