Our paper proposes a framework that automates penetration testing using reinforcement learning (RL). The framework identifies and prioritizes vulnerable paths within a network by dynamically learning and adapting vulnerability-assessment strategies from data acquired by a comprehensive network scanner. The study evaluates three RL algorithms, deep Q-network (DQN), deep deterministic policy gradient (DDPG), and asynchronous episodic deep deterministic policy gradient (AE-DDPG), to compare their effectiveness for this task. All three are model-free methods that learn directly from interaction with the network environment: DQN is a value-based algorithm suited to discrete action spaces, whereas DDPG and AE-DDPG are actor-critic methods that can also handle continuous action spaces. By adapting its strategies during training, the framework concentrates on the most critical vulnerabilities within the network infrastructure. Our work assesses how accurately the RL techniques identify security vulnerabilities: the vulnerable paths they flag are exercised with Metasploit, which confirmed the accuracy of the RL results. The tabulated findings show that RL is a promising approach for automating penetration testing tasks.
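To make the action-selection idea concrete, the following is a minimal, self-contained sketch, not the paper's implementation, of a DQN-style agent that scores candidate exploit actions for a scanned host. The state encoding, the number of actions, and the reward signal are hypothetical placeholders standing in for features extracted from the network scanner and feedback from Metasploit; a full DQN would also add a replay buffer and a target network.

```python
# Illustrative sketch only: a toy DQN-style agent that ranks candidate
# attack actions derived from scan output. The environment, state
# encoding, and reward shaping are assumed placeholders.
import random
import torch
import torch.nn as nn

N_FEATURES = 8   # e.g. encoded service/port/CVSS features per host (assumed)
N_ACTIONS = 4    # e.g. candidate exploit modules to attempt (assumed)

class QNetwork(nn.Module):
    """Small MLP mapping a scanned-host state vector to Q-values per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 32), nn.ReLU(),
            nn.Linear(32, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

def select_action(qnet, state, epsilon=0.1):
    """Epsilon-greedy choice of the next exploit action for a host state."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(torch.argmax(qnet(state)).item())

def td_update(qnet, optimizer, state, action, reward, next_state, gamma=0.99):
    """One temporal-difference step on the Bellman target (no replay buffer here)."""
    q_pred = qnet(state)[action]
    with torch.no_grad():
        q_target = reward + gamma * torch.max(qnet(next_state))
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: random "scan feature" vectors stand in for real scanner output.
qnet = QNetwork()
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
state = torch.rand(N_FEATURES)
action = select_action(qnet, state)
# A positive reward would correspond to a successful exploit step (e.g. validated via Metasploit).
loss = td_update(qnet, optimizer, state, action, reward=1.0, next_state=torch.rand(N_FEATURES))
print(f"chosen action: {action}, TD loss: {loss:.4f}")
```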