Gollapudi, Raghu
Unknown Affiliation

Published: 1 document
Journal: International Journal of Electrical and Computer Engineering

Designing self-healing database fabrics for real-time payment rails
Gollapudi, Raghu
International Journal of Electrical and Computer Engineering (IJECE) Vol 16, No 3: June 2026
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v16i3.pp1360-1368

Abstract

Real-time payment platforms operating at scale face an unforgiving operational reality: even brief outages translate directly into failed transactions, regulatory exposure, and eroded customer trust. Database replication and failover automation have matured considerably over the past two decades, yet a troubling blind spot remains. Recovery frameworks built for general-purpose distributed systems were never designed with settlement finality in mind, and that design omission leaves payment operators exposed to split-brain scenarios that generic high-availability tooling cannot reliably prevent. This paper addresses that omission head-on through a self-healing database fabric purpose-built for payment rail environments. The proposed autonomous resilience fabric architecture (ARFA) operates across three coordinated layers: a continuous monitoring layer that harvests telemetry from compute, storage, and network subsystems; a decision layer that fuses rule-based heuristics with an ensemble of isolation forests, recurrent neural networks, and gradient boosting classifiers to separate genuine fault conditions from transient noise; and a deterministic action layer that executes recovery procedures anchored to explicit settlement finality constraints. In fault injection trials covering node crashes, network partitions, replication lag, and performance degradation, the architecture cut average recovery times by 88% against manual baselines, restoring service in roughly 8 seconds rather than the 180 seconds that human-driven remediation typically requires. False positive rates held below 2% across all failure categories, and the system achieved a 98% recovery success rate. Taken together, these results make a practical case that autonomous resilience and regulatory compliance reinforce rather than conflict with each other when the regulatory constraints are designed in from the start.
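The abstract describes the decision and action layers only at a high level. The following is a minimal Python sketch of how such a pipeline could gate a failover. It rests on assumptions that are not taken from the paper. The ensemble members (isolation forest, RNN, gradient boosting) appear only as their output fault probabilities. The heartbeat rule, the majority vote, the three-sample persistence window, and the log-position check (`applied_lsn`, `settled_lsn`) are hypothetical stand-ins. Only the node-crash failover path is modeled. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Telemetry:
    """One monitoring-layer sample for the primary (hypothetical fields)."""
    heartbeat_ok: bool
    model_scores: Sequence[float]  # fault probabilities from each ensemble member


@dataclass
class Replica:
    name: str
    applied_lsn: int  # highest log position this replica has durably applied


def is_fault(sample: Telemetry, vote_threshold: float = 0.5) -> bool:
    """Decision layer: a hard rule fires, or a strict majority of ensemble members vote 'fault'."""
    rule_hit = not sample.heartbeat_ok
    votes = sum(score >= vote_threshold for score in sample.model_scores)
    return rule_hit or 2 * votes > len(sample.model_scores)


def confirmed_fault(window: Sequence[Telemetry], required: int = 3) -> bool:
    """Filter transient noise: the last `required` samples must all be faulty."""
    if len(window) < required:
        return False
    return all(is_fault(s) for s in window[-required:])


def choose_promotion_target(replicas: Sequence[Replica], settled_lsn: int) -> Optional[Replica]:
    """Settlement-finality guard: only replicas holding every settled entry are eligible."""
    eligible = [r for r in replicas if r.applied_lsn >= settled_lsn]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.applied_lsn)


def plan_recovery(window: Sequence[Telemetry], replicas: Sequence[Replica], settled_lsn: int) -> List[str]:
    """Action layer: deterministic steps derived only from the inputs."""
    if not confirmed_fault(window):
        return ["no_action"]
    target = choose_promotion_target(replicas, settled_lsn)
    if target is None:
        # Promoting any replica would drop settled payments, so defer to a human.
        return ["page_operator"]
    # Fence the old primary first so two nodes can never accept writes at once.
    return ["fence_primary", f"promote:{target.name}", "repoint_clients"]


if __name__ == "__main__":
    healthy = Telemetry(heartbeat_ok=True, model_scores=[0.1, 0.2, 0.05])
    down = Telemetry(heartbeat_ok=False, model_scores=[0.9, 0.8, 0.4])
    replicas = [Replica("r1", applied_lsn=1040), Replica("r2", applied_lsn=1052)]

    print(plan_recovery([healthy, down, healthy], replicas, settled_lsn=1050))
    # ['no_action']  (one bad sample is treated as transient)
    print(plan_recovery([down] * 3, replicas, settled_lsn=1050))
    # ['fence_primary', 'promote:r2', 'repoint_clients']
    print(plan_recovery([down] * 3, replicas, settled_lsn=1060))
    # ['page_operator']  (no replica holds every settled entry)
```

In this sketch the settlement-finality check runs before any promotion. When no replica holds every settled entry, the plan stops and escalates instead of trading lost payments for availability. This is one possible way to read the abstract's statement that recovery procedures are anchored to explicit settlement finality constraints.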