Garuda - Garba Rujukan Digital


P-Index (2020 - 2025): 0.23

This author has published in these journals:

International Journal of Engineering, Science and Information Technology

Kunal Shah, Jyoti

Unknown Affiliation

Author-ID : 8940190

Astronomy; Biochemistry, Genetics & Molecular Biology; Chemical Engineering, Chemistry & Bioengineering; Chemistry; Civil Engineering, Building, Construction & Architecture; Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Earth & Planetary Sciences; Education; Electrical & Electronics Engineering; Energy; Engineering; Industrial & Manufacturing Engineering; Library & Information Science; Materials Science & Nanotechnology; Mathematics; Mechanical Engineering; Physics; Social Sciences; Transportation

Published: 1 document

Articles

Towards Self-Healing Cloud Infrastructures: Predictive Maintenance with Reinforcement Learning and Generative Models
Kunal Shah, Jyoti; Matam, Prashanthi
International Journal of Engineering, Science and Information Technology, Vol 5, No 3 (2025)
Publisher: Malikussaleh University, Aceh, Indonesia
DOI: 10.52088/ijesty.v5i3.1185

Reinforcement Learning (RL) is quickly becoming a powerful way to predict failures in large cloud environments before they happen and to improve the systems that run there. Unlike traditional reactive methods, RL lets agents learn the best actions by interacting with changing environments, using reward signals to improve system uptime, resource utilization, and reliability. As cloud-based big data systems grow larger and more complex, they also become more prone to faults that degrade performance or cause failures at unpredictable times. Dealing with these problems takes more than advanced failure prediction algorithms: it also takes adaptive, explainable systems that help operators understand what is happening and intervene when necessary. This paper examines how RL can help predict and manage failures in cloud-based big data systems. We propose a layered architecture that combines RL agents with generative explanation models to predict failures and act to prevent them. We focus on real-time feedback loops, autonomous learning, and interpretable outputs, which matter especially in anomaly detection pipelines, where explanations must be detailed yet concise. Using examples from real-world hyperscale data centers, we show how RL agents can identify risk patterns and take steps to avoid them. We also examine how generative models, such as transformer-based language generators, can turn complex telemetry data into information that people can understand. We conclude by suggesting areas for future research, including safe RL deployment, multi-agent coordination, and explainable policy design.
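
The reward-driven learning loop the abstract describes can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's architecture: the three wear levels, the `step` dynamics, the reward values, and all hyperparameters are hypothetical choices made only so the example runs end to end. The agent decides at each step whether to keep a node running or to perform preventive maintenance.

```python
import random

# Hypothetical toy problem (not from the paper): a node has a wear level
# 0, 1, or 2. Waiting at the highest wear level causes a failure.
WAIT, MAINTAIN = 0, 1
N_STATES = 3


def step(state, action):
    """Assumed node dynamics; returns (next_state, reward)."""
    if action == MAINTAIN:
        return 0, -1.0        # small maintenance cost, wear resets
    if state == N_STATES - 1:
        return 0, -10.0       # failure: large penalty, node is replaced
    return state + 1, 1.0     # uptime reward, wear increases


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice((WAIT, MAINTAIN))   # explore
        else:
            action = max((WAIT, MAINTAIN), key=lambda a: q[state][a])
        next_state, reward = step(state, action)
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Best action per wear level under the learned Q-table."""
    return [max((WAIT, MAINTAIN), key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Under these assumed rewards, the learned policy keeps the node running at low wear and schedules maintenance just before the failure point, which is the proactive behavior the abstract contrasts with reactive methods.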

Co-Authors: Matam, Prashanthi
