Garuda - Garba Rujukan Digital

Sinkron : Jurnal dan Penelitian Teknik Informatika

Vol. 10 No. 1 (2026): Article Research January 2026

Rahman, Moh Syaiful (Unknown)
Andrianingsih, Andrianingsih (Unknown)

Publish Date
03 Jan 2026

Extracting data from PDF templates remains a substantial challenge because documents are semi-structured and their formats vary. Large pre-trained models achieve high accuracy, but they require extensive computational resources and labeled datasets, which makes them impractical in resource-constrained environments. Rule-based approaches, conversely, are efficient but rigid. This research addresses the gap by developing an adaptive learning system that integrates rule-based approaches with Conditional Random Fields (CRF) in a hybrid framework designed for data-scarce scenarios. The system runs extraction strategies in parallel, selects among their outputs by confidence, and uses Human-in-the-Loop (HITL) feedback for incremental learning: pattern learning updates the rule-based strategies, while the CRF models are retrained incrementally. Evaluated on synthetically generated documents across diverse template types, the system achieves 98.61% accuracy with minimal training data and a 7% user correction rate, demonstrating high learning efficiency (1.88 corrections per percentage point). The improvement is statistically significant (paired t-test, p < 0.001, Cohen’s d = 8.95). The system operates on CPU-only hardware with a 50-100 MB footprint and a processing time of 0.1-0.5 seconds. This work fills a practical gap in document extraction by providing a middle-ground solution that balances high accuracy, minimal data requirements, low resource consumption, and real-time adaptability, making it suitable for small organizations and rapid deployments where large models are impractical. The evaluation uses synthetic data to ensure reproducibility and controlled assessment, though real-world validation would strengthen practical applicability.
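The confidence-based selection step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `Extraction` record, the `rule_based_extract` and `model_extract` functions (the latter a crude stand-in for a CRF tagger), the fixed confidence values, and the `review_threshold` of 0.7 are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Extraction:
    value: Optional[str]   # extracted field value, or None if nothing was found
    confidence: float      # score in [0, 1] reported by the strategy
    strategy: str          # name of the strategy that produced the value


def rule_based_extract(text: str) -> Extraction:
    # Hypothetical rule: the token that follows the label "Invoice No:".
    match = re.search(r"Invoice No:\s*(\S+)", text)
    if match:
        return Extraction(match.group(1), 0.95, "rule")
    return Extraction(None, 0.0, "rule")


def model_extract(text: str) -> Extraction:
    # Stand-in for a CRF tagger: picks the first token containing a digit.
    # A real CRF would derive confidence from its marginal probabilities.
    candidates = [t for t in text.split() if any(c.isdigit() for c in t)]
    if candidates:
        return Extraction(candidates[0], 0.60, "model")
    return Extraction(None, 0.0, "model")


def select_best(
    text: str,
    strategies: List[Callable[[str], Extraction]],
    review_threshold: float = 0.7,  # assumed cut-off for asking a human
) -> Tuple[Extraction, bool]:
    # Run every strategy, keep the most confident result, and flag it for
    # human correction when nothing was found or confidence is low.
    results = [strategy(text) for strategy in strategies]
    best = max(results, key=lambda r: r.confidence)
    needs_review = best.value is None or best.confidence < review_threshold
    return best, needs_review
```

In this sketch, a result flagged by `needs_review` is where a human correction would enter the loop; how the corrections then update the rules or retrain the CRF is outside what the abstract specifies and is not modeled here.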

Journal Info

Sinkron : Jurnal dan Penelitian Teknik Informatika

Abbrev

sinkron

Publisher

Politeknik Ganesha Medan

Subject

Computer Science & IT

Description

Scope of SinkrOn's Scientific Discussion: 1. Machine Learning 2. Cryptography 3. Steganography 4. Digital Image Processing 5. Networking 6. Security 7. Algorithm and Programming 8. Computer Vision 9. Troubleshooting 10. Internet and E-Commerce 11. Artificial Intelligence 12. Data Mining 13. Artificial ...

Adaptive Learning System Based on Human-in-the-Loop for PDF Template Data Extraction
