Garuda - Garba Rujukan Digital

The Eastasouth Journal of Information System and Computer Science

Vol. 1 No. 02 (2023): The Eastasouth Journal of Information System and Computer Science (ESISCS)

Chirumamilla, Koteswara Rao (Unknown)

Publish Date
31 Dec 2023

Extract–Transform–Load (ETL) pipelines remain a critical component of enterprise data infrastructure, supporting analytics, reporting, and machine learning by preparing raw data for downstream consumption. As organizations scale, these pipelines must process increasingly diverse datasets while adapting to shifting workloads, irregular input patterns, and evolving business requirements. Conventional optimization approaches rely on static rules, hand-tuned configurations, or heuristic scheduling, all of which struggle to maintain efficiency when system behavior changes over time. Manual tuning becomes particularly difficult in large environments where hundreds of pipelines compete for shared compute resources and experience unpredictable variations in data volume and schema complexity. This paper presents a reinforcement learning (RL)–based framework designed to autonomously optimize ETL execution without human intervention. The system formulates ETL optimization as a sequential decision-making problem, where an RL agent learns to select transformation ordering, resource allocation strategies, caching policies, and execution priorities based on the current operational state. State representations incorporate metadata signals, historical performance trends, data quality indicators, and real-time workload statistics. Through iterative reward-driven learning, the agent gradually identifies strategies that improve throughput, reduce processing cost, and stabilize pipeline performance across heterogeneous environments. The framework was evaluated in production-like settings spanning financial services, retail analytics, and telecommunications data operations. Across these domains, the RL-driven system reduced end-to-end execution time by 33%, lowered compute utilization costs by 27%, and increased data quality throughput by 41%. 
These results highlight the promise of reinforcement learning as a foundation for building adaptive, self-optimizing ETL systems that respond to operational variability and reduce the need for manual intervention. The work demonstrates a viable pathway toward autonomous data engineering platforms capable of supporting large-scale enterprise workloads.
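
The sequential decision-making formulation described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's framework: it uses tabular Q-learning as a stand-in technique, and every name in it (`ACTIONS`, `STATES`, `simulate_run`, `reward`, `train`, `best_action`) is hypothetical. The state is reduced to a single workload-volume bucket, the action to a (transformation order, worker count) pair, and the cost model is a toy assumption chosen only so that the best configuration differs between states.

```python
import random
from collections import defaultdict

# Hypothetical action space: (transformation order, worker count).
ACTIONS = [
    ("filter_first", 2),
    ("filter_first", 4),
    ("join_first", 2),
    ("join_first", 4),
]
STATES = ["low", "high"]  # hypothetical workload-volume buckets


def simulate_run(state, action, rng):
    """Toy cost model (an assumption): return (runtime, compute_cost)."""
    order, workers = action
    volume = {"low": 1.0, "high": 4.0}[state]
    # Joining before filtering is assumed to double the amount of work.
    work = volume * (1.0 if order == "filter_first" else 2.0)
    runtime = 10.0 * work / workers + rng.uniform(0.0, 0.5)
    compute_cost = 3.0 * workers  # flat provisioning cost per worker
    return runtime, compute_cost


def reward(runtime, compute_cost):
    """Higher is better: penalize both execution time and compute cost."""
    return -(runtime + compute_cost)


def train(episodes=20000, alpha=0.02, gamma=0.5, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over the toy ETL environment."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action index) to estimated value
    state = "low"
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))  # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])  # exploit
        runtime, cost = simulate_run(state, ACTIONS[a], rng)
        next_state = rng.choice(STATES)  # workload shifts independently of the action
        best_next = max(q[(next_state, i)] for i in range(len(ACTIONS)))
        td_target = reward(runtime, cost) + gamma * best_next
        q[(state, a)] += alpha * (td_target - q[(state, a)])
        state = next_state
    return q


def best_action(q, state):
    """Greedy policy: the action with the highest learned value in this state."""
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[(state, i)])]


if __name__ == "__main__":
    q_table = train()
    for s in STATES:
        print(s, best_action(q_table, s))
```

Under this toy cost model, the learned policy should favor filtering first with two workers at low volume and filtering first with four workers at high volume, since extra workers only pay for themselves when there is enough work to parallelize. A realistic state would also need the metadata, historical performance, and data quality signals the abstract mentions, which a lookup table would not scale to.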

Journal Info

The Eastasouth Journal of Information System and Computer Science

Abbrev

esiscs

Publisher

Eastasouth Institute

Subject

Computer Science & IT

Description

ESISCS - The Eastasouth Journal of Information System and Computer Science is a peer-reviewed, open-access journal published three times a year (April, August, December) by Eastasouth Institute. ESISCS aims to publish articles in the field of Enterprise systems and applications, Database ...

Article Info

Abstract

Reinforcement Learning to Optimize ETL Pipelines