Journal: International Journal Software Engineering and Computer Science (IJSECS)

ETL Pipeline with DTO Normalization for IPOS Data Integration in Spring Boot
Nugroho, Adhi Septian; Susetyo, Yeremia Alfa
International Journal Software Engineering and Computer Science (IJSECS) Vol. 6 No. 1 (2026): APRIL 2026
Publisher : Lembaga Otonom Lembaga Informasi dan Riset Indonesia (KITA INFO dan RISET) - Lembaga KITA

DOI: 10.35870/ijsecs.v6i1.6850

Abstract

IPOS point-of-sale software, widely used by Indonesian small and medium retail enterprises (UMKM), exports transaction data as Excel files with no enforced schema. The exports contain format-variable, multi-row receipt blocks with heterogeneous date representations, locale-dependent numeric formats, and embedded unit strings, all of which resist conventional relational import. Transforming these unstructured exports into a relational database requires a structured architectural approach capable of handling format variability, type inconsistency, and record duplication. This study designs and implements a Spring Boot-based ETL (Extract, Transform, Load) service that applies the Data Transfer Object (DTO) pattern through ten purpose-specific DTO classes covering each pipeline phase, structured within a four-layer Model-View-Controller (MVC) architecture (Controller-Service-Repository-Entity). The Extractor employs a streaming Excel reader with dynamic column-layout detection based on header keywords, producing raw String-typed ExtractedReceipt and ExtractedItem DTOs. The Transformer applies six normalization steps via four utility classes (StringNormalizer, DateParser with seven date-format patterns, NumberParser for Indonesian and Western currency formats, and a HashSet-based duplicate detector), converting raw strings into typed ValidatedReceipt and ValidatedItem DTOs with explicit error logging. The Loader performs batch inserts of 1,000 records each, using pre-loaded duplicate sets for O(1) lookup. The pipeline operates asynchronously, returning a jobId immediately while processing continues on a background thread. Functional evaluation across ten scenarios yielded a 100% pass rate, covering valid files, invalid file types, date-format heterogeneity, embedded-unit quantity strings, Indonesian numeric formats, cross-file and intra-file duplicate detection, grand-total reconciliation tolerance, and product-variation tracking. Performance observations show that files of 200–500 receipts complete within 5–15 seconds. These results indicate that a DTO-centric, explicitly mapped ETL pipeline over Spring Boot MVC provides a maintainable, auditable, and production-ready solution for UMKM retail data integration.
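
The Transformer utilities named in the abstract (DateParser, NumberParser, and the HashSet-based duplicate detector) might look roughly like the sketch below. This is an illustration only, not the authors' implementation. The class name IposNormalizationSketch, all method names, the three date patterns (the paper uses seven, which the abstract does not list), the separator heuristic, and the choice of receipt number as the duplicate key are assumptions made for the example.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the normalization utilities; names are not from the paper.
final class IposNormalizationSketch {

    // Assumed patterns, tried in order; the paper's seven patterns are not listed in the abstract.
    private static final List<DateTimeFormatter> DATE_FORMATS = List.of(
            DateTimeFormatter.ofPattern("dd/MM/yyyy"),
            DateTimeFormatter.ofPattern("dd-MM-yyyy"),
            DateTimeFormatter.ofPattern("yyyy-MM-dd"));

    // Returns the first successful parse, or empty so the caller can log a validation error.
    static Optional<LocalDate> parseDate(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String text = raw.trim();
        for (DateTimeFormatter format : DATE_FORMATS) {
            try {
                return Optional.of(LocalDate.parse(text, format));
            } catch (DateTimeParseException ignored) {
                // Fall through and try the next pattern.
            }
        }
        return Optional.empty();
    }

    // Parses "1.234.567,89" (Indonesian) and "1,234,567.89" (Western) alike.
    // Currency prefixes and unit suffixes such as "Rp" or "KG" are dropped.
    // Assumed heuristic: when both separators occur, the last one is the decimal mark;
    // a lone separator followed by exactly three digits is read as a thousands mark.
    // Throws NumberFormatException when no parsable number remains.
    static BigDecimal parseAmount(String raw) {
        String s = raw.replaceAll("[^0-9.,-]", "");
        int lastDot = s.lastIndexOf('.');
        int lastComma = s.lastIndexOf(',');
        int decimalAt;
        if (lastDot >= 0 && lastComma >= 0) {
            decimalAt = Math.max(lastDot, lastComma);
        } else {
            int pos = Math.max(lastDot, lastComma);
            boolean single = pos >= 0 && s.indexOf(s.charAt(pos)) == pos;
            boolean groupSized = s.length() - pos - 1 == 3;
            decimalAt = (single && !groupSized) ? pos : -1;
        }
        String whole = (decimalAt < 0 ? s : s.substring(0, decimalAt)).replaceAll("[.,]", "");
        String fraction = decimalAt < 0 ? "" : s.substring(decimalAt + 1);
        return new BigDecimal(fraction.isEmpty() ? whole : whole + "." + fraction);
    }

    // HashSet-backed detector: seeded with keys already stored (cross-file duplicates)
    // and extended while a file is processed (intra-file duplicates).
    static final class DuplicateDetector {
        private final Set<String> seen = new HashSet<>();

        DuplicateDetector(Collection<String> preloadedKeys) {
            for (String key : preloadedKeys) {
                seen.add(normalize(key));
            }
        }

        // Set.add returns false when the key is already present, giving an O(1) average check.
        boolean isDuplicate(String receiptNo) {
            return !seen.add(normalize(receiptNo));
        }

        private static String normalize(String key) {
            return key.trim().toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDate("31/12/2025"));
        System.out.println(parseAmount("Rp 1.234.567,89"));
        DuplicateDetector detector = new DuplicateDetector(List.of("INV-001"));
        System.out.println(detector.isDuplicate("inv-001"));
    }
}
```

Returning Optional from parseDate keeps a failed parse explicit, so a caller can record the error instead of silently storing a wrong date. The three-digit rule in parseAmount is a deliberate simplification: an input such as "1.234" is inherently ambiguous between locales, and a production parser would likely need a per-file or per-column locale hint.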