Chirumamilla, Koteswara Rao
Unknown Affiliation

Published : 2 Documents
Predicting Data Contract Failures Using Machine Learning
Chirumamilla, Koteswara Rao
The Eastasouth Journal of Information System and Computer Science Vol. 1 No. 01 (2023): The Eastasouth Journal of Information System and Computer Science (ESISCS)
Publisher : Eastasouth Institute

DOI: 10.58812/esiscs.v1i01.843

Abstract

Data contracts have emerged as a foundational mechanism for ensuring reliable communication between producers and consumers in modern distributed data ecosystems. They specify expected schemas, semantic intentions, and quality constraints, forming the basis for trustworthy data exchange across pipelines and organizational boundaries. Despite their growing adoption, contract violations remain a persistent operational challenge. These failures frequently stem from subtle schema shifts, unexpected type variations, incomplete records, or semantic inconsistencies introduced during upstream system changes. Traditional validation approaches, often built on static rules or manual inspection, struggle to keep pace with evolving datasets, diverse integration patterns, and continuous delivery cycles. As a result, contract breaches propagate downstream, causing pipeline interruptions, test instability, and avoidable production incidents. This paper presents a machine learning-driven framework designed to anticipate data contract failures before they manifest. The approach draws on both historical and real-time metadata, capturing patterns in schema evolution, anomaly trajectories, operational log signals, and field-level drift behavior. A hybrid modeling strategy is employed, combining gradient-boosted decision trees for structured anomaly detection, temporal drift modules for sequential pattern monitoring, and embedding-based schema representations for high-dimensional contract features. By integrating these components, the system provides early warning indicators that enable teams to intervene proactively rather than react after failures disrupt operations. The framework was evaluated using datasets from financial services, e-commerce platforms, and healthcare systems, domains characterized by highly heterogeneous data and high operational sensitivity. Across these environments, the model achieved up to 79% accuracy in predicting contract violations, reduced downstream pipeline failures by 42%, and shortened incident triage time by 37%. These results highlight the potential of ML-driven predictive validation as a practical path toward resilient, self-monitoring data infrastructures in enterprise settings.
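The gradient-boosted component of the hybrid strategy can be sketched in simplified form: a classifier trained on per-run drift signals that emits an early-warning violation risk score. The feature names, the synthetic data, and the label model below are all illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical field-level drift features, one row per pipeline run:
# schema_change_rate, null_fraction_delta, type_mismatch_count, days_since_change
X = np.column_stack([
    rng.uniform(0, 1, n),
    rng.normal(0, 0.1, n),
    rng.poisson(1.0, n),
    rng.exponential(30, n),
])
# Synthetic label: contract violations become likelier as drift signals rise
logits = 4 * X[:, 0] + 8 * np.abs(X[:, 1]) + 0.8 * X[:, 2] - 3.0
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
model.fit(X_tr, y_tr)
risk = model.predict_proba(X_te)[:, 1]  # early-warning score per run
print(f"test accuracy: {model.score(X_te, y_te):.2f}")
```

In a deployment along the lines the abstract describes, the `risk` score would be thresholded to alert data owners before a breach propagates downstream; the temporal drift and schema-embedding components would supply additional feature columns.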
Reinforcement Learning to Optimize ETL Pipelines
Chirumamilla, Koteswara Rao
The Eastasouth Journal of Information System and Computer Science Vol. 1 No. 02 (2023): The Eastasouth Journal of Information System and Computer Science (ESISCS)
Publisher : Eastasouth Institute

DOI: 10.58812/esiscs.v1i02.844

Abstract

Extract–Transform–Load (ETL) pipelines remain a critical component of enterprise data infrastructure, supporting analytics, reporting, and machine learning by preparing raw data for downstream consumption. As organizations scale, these pipelines must process increasingly diverse datasets while adapting to shifting workloads, irregular input patterns, and evolving business requirements. Conventional optimization approaches rely on static rules, hand-tuned configurations, or heuristic scheduling, all of which struggle to maintain efficiency when system behavior changes over time. Manual tuning becomes particularly difficult in large environments where hundreds of pipelines compete for shared compute resources and experience unpredictable variations in data volume and schema complexity. This paper presents a reinforcement learning (RL)-based framework designed to autonomously optimize ETL execution without human intervention. The system formulates ETL optimization as a sequential decision-making problem, where an RL agent learns to select transformation ordering, resource allocation strategies, caching policies, and execution priorities based on the current operational state. State representations incorporate metadata signals, historical performance trends, data quality indicators, and real-time workload statistics. Through iterative reward-driven learning, the agent gradually identifies strategies that improve throughput, reduce processing cost, and stabilize pipeline performance across heterogeneous environments. The framework was evaluated in production-like settings spanning financial services, retail analytics, and telecommunications data operations. Across these domains, the RL-driven system reduced end-to-end execution time by 33%, lowered compute utilization costs by 27%, and increased data quality throughput by 41%. These results highlight the promise of reinforcement learning as a foundation for building adaptive, self-optimizing ETL systems that respond to operational variability and reduce the need for manual intervention. The work demonstrates a viable pathway toward autonomous data engineering platforms capable of supporting large-scale enterprise workloads.
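The RL formulation described above can be illustrated with a minimal sketch: a Q-learning agent that picks a resource allocation (worker count) for each workload level, with a reward that trades processing time against compute cost. The states, actions, and reward model are toy assumptions for illustration, not the paper's actual environment.

```python
import numpy as np

# Toy Q-learning sketch for ETL resource allocation.
# State: workload level (low / medium / high); action: worker count.
rng = np.random.default_rng(1)
n_states, n_actions = 3, 4
workers = np.array([1, 2, 4, 8])

def reward(state, action):
    volume = (state + 1) * 10.0          # data volume grows with workload level
    time = volume / workers[action]      # more workers finish faster
    cost = workers[action] * 1.5         # but cost more compute
    return -(time + cost)                # agent minimizes time plus cost

Q = np.zeros((n_states, n_actions))
alpha, eps, episodes = 0.1, 0.2, 5000
for _ in range(episodes):
    s = int(rng.integers(n_states))
    # Epsilon-greedy exploration over allocation choices
    a = int(rng.integers(n_actions)) if rng.uniform() < eps else int(Q[s].argmax())
    # Bandit-style update: single-step episodes, no bootstrapping
    Q[s, a] += alpha * (reward(s, a) - Q[s, a])

policy = Q.argmax(axis=1)
print("workers chosen per workload level:", workers[policy])
```

Under this toy reward, the learned policy allocates fewer workers to light workloads and more to heavy ones, mirroring how the paper's agent is described as balancing throughput against compute cost; a full system would extend the state with metadata, performance history, and quality indicators, and use multi-step returns.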