The COVID-19 pandemic challenged policymakers to develop effective decision-support systems. Reinforcement learning (RL), a branch of artificial intelligence, has emerged as a promising approach to designing such systems. This systematic review analyzes 20 studies published between 2020 and 2024 that apply RL as a decision-making tool for COVID-19 mitigation, focusing on environment models, algorithms, state representation, action design, reward functions, and challenges. Our findings show that Q-learning is the most frequently used algorithm and that most implementations rely on SEIR-based models and real-world COVID-19 epidemiological data. Policy interventions, particularly lockdowns, are commonly modeled as actions. Reward functions are health-oriented, economic, or hybrid, with a growing trend toward multi-objective designs. Despite these advances, key limitations persist, including data uncertainty, computational complexity, ethical concerns, and the gap between simulated performance and real-world feasibility. The review also identifies a research opportunity: integrating epidemic model formulations with explicit control inputs into RL frameworks, which could improve learning efficiency and help bridge the gap between simulation and practice in future pandemic response systems.
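To make these recurring ingredients concrete, the following is a minimal sketch of tabular Q-learning on a discrete-time SEIR model in which a lockdown intensity u enters as an explicit control input. It is not taken from any of the reviewed studies. All names (`seir_step`, `discretize`, `reward`, `train`), the parameter values, the five-bin state discretization, and the reward weights are illustrative assumptions.

```python
import random

# Illustrative assumptions only; none of these values come from the reviewed studies.
BETA, SIGMA, GAMMA = 0.5, 0.2, 0.1   # transmission, incubation, recovery rates (per day)
ACTIONS = [0.0, 0.5, 0.8]            # lockdown intensity u: fraction of contacts removed
HEALTH_W, ECON_W = 100.0, 1.0        # weights of the hybrid (health + economic) reward
THRESHOLDS = [0.001, 0.01, 0.05, 0.1]  # bin edges for the infectious fraction


def seir_step(state, u, days=7):
    """Advance the SEIR fractions by `days` Euler steps under control input u."""
    s, e, i, r = state
    new_infections = 0.0
    for _ in range(days):
        inf = (1.0 - u) * BETA * s * i   # u scales the force of infection
        s, e, i, r = (s - inf,
                      e + inf - SIGMA * e,
                      i + SIGMA * e - GAMMA * i,
                      r + GAMMA * i)
        new_infections += inf
    return (s, e, i, r), new_infections


def discretize(state):
    """Map the infectious fraction to one of 5 coarse bins (the tabular state)."""
    return sum(state[2] >= t for t in THRESHOLDS)


def reward(new_infections, u):
    """Hybrid reward: penalize new infections (health) and lockdown intensity (economy)."""
    return -(HEALTH_W * new_infections + ECON_W * u)


def train(episodes=300, weeks=30, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; one decision per week."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(len(THRESHOLDS) + 1)]
    for _ in range(episodes):
        state = (0.99, 0.0, 0.01, 0.0)   # assumed initial S, E, I, R fractions
        for _ in range(weeks):
            k = discretize(state)
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))                       # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda j: q[k][j])   # exploit
            state, new_inf = seir_step(state, ACTIONS[a])
            # Standard Q-learning update toward the bootstrapped target.
            target = reward(new_inf, ACTIONS[a]) + gamma * max(q[discretize(state)])
            q[k][a] += alpha * (target - q[k][a])
    return q


q_table = train()
# Greedy lockdown intensity for each infectious-fraction bin.
policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda j: row[j])] for row in q_table]
print(policy)
```

Because u appears directly in the transition function, the same `seir_step` could in principle also serve a model-based or optimal-control method. The learned policy depends strongly on the assumed weights `HEALTH_W` and `ECON_W`, which is one way the multi-objective trade-off shows up in practice.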
Copyrights © 2025