The increasing reliance on digital financial documents has highlighted the need for automated methods to extract structured information from bank statements. Traditional optical character recognition (OCR) systems often fail to capture complex tabular structures, leading to incomplete or error-prone transaction records. To address this challenge, this research proposes a two-stage detection and recognition pipeline that combines YOLOv12 for table and structural element detection with PaddleOCR for text extraction, followed by automated Excel conversion. The objective of this study is to improve accuracy in localizing tables, detecting rows and columns, and generating structured financial data that can be used directly in downstream applications. The method trains a lightweight YOLOv12-n model in two stages: Stage 1 detects entire table regions, while Stage 2 identifies row and column structures within the detected tables. Training used the AdamW optimizer with conservative augmentation strategies to preserve the geometric integrity of document layouts. Results show that Stage 1 achieved a precision of 0.998, a recall of 1.0, and an mAP50-95 of 0.989, while Stage 2 achieved a precision of 0.992, a recall of 0.964, and an mAP50-95 of 0.899, demonstrating strong localization and structural recognition performance. These findings confirm that the proposed two-stage pipeline is effective for financial document processing, with potential applications in digital banking, auditing, and automated record management. Future research may focus on expanding datasets and addressing domain-specific variability.
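For concreteness, the inference flow of such a pipeline might look like the minimal Python sketch below. The weight file names, confidence thresholds, class mapping (row vs. column), and the cell-cropping heuristic are illustrative assumptions rather than details taken from the study; the sketch assumes the Ultralytics YOLO and PaddleOCR 2.x Python APIs and uses pandas for the Excel export.

```python
# Minimal sketch of a two-stage detect-then-recognize pipeline (assumptions noted above).
import cv2
import pandas as pd
from ultralytics import YOLO
from paddleocr import PaddleOCR

stage1 = YOLO("stage1_table.pt")      # hypothetical weights: table-region detector
stage2 = YOLO("stage2_structure.pt")  # hypothetical weights: row/column detector
ocr = PaddleOCR(lang="en")            # PaddleOCR text recognizer

def extract_statement(image_path: str, out_xlsx: str) -> None:
    image = cv2.imread(image_path)

    with pd.ExcelWriter(out_xlsx) as writer:
        # Stage 1: localize whole table regions on the statement page.
        tables = stage1.predict(image, conf=0.5, verbose=False)[0]
        for i, box in enumerate(tables.boxes.xyxy.tolist()):
            x1, y1, x2, y2 = map(int, box)
            crop = image[y1:y2, x1:x2]

            # Stage 2: detect row and column structure inside the cropped table.
            structure = stage2.predict(crop, conf=0.5, verbose=False)[0]
            rows, cols = [], []
            for b, c in zip(structure.boxes.xyxy.tolist(),
                            structure.boxes.cls.tolist()):
                # Assumed class mapping: 0 = row, 1 = column.
                (rows if int(c) == 0 else cols).append(b)
            rows.sort(key=lambda b: b[1])  # top-to-bottom
            cols.sort(key=lambda b: b[0])  # left-to-right

            # OCR each cell formed by a row/column intersection.
            grid = []
            for r in rows:
                line = []
                for col in cols:
                    cell = crop[int(r[1]):int(r[3]), int(col[0]):int(col[2])]
                    result = ocr.ocr(cell)
                    text = " ".join(w[1][0] for ln in result if ln for w in ln)
                    line.append(text.strip())
                grid.append(line)

            # Export the reconstructed table to one Excel sheet per detected table.
            pd.DataFrame(grid).to_excel(writer, index=False, header=False,
                                        sheet_name=f"table_{i + 1}")
```

A cell OCR pass per intersection, as above, is the simplest heuristic; in practice one might instead OCR the whole table crop and assign recognized text to cells by box overlap to reduce the number of recognizer calls.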