Advances in artificial intelligence, particularly reinforcement learning (RL), have driven innovation in automated decision-making for financial markets. Deep Reinforcement Learning (DRL) is widely applied, but it often requires significant computational resources and lacks transparency. This study proposes a lightweight, replicable trading bot based on value-based RL (tabular Q-Learning) that uses open data from Yahoo Finance. The system is developed end to end, covering data acquisition, preprocessing, RL agent design, and strategy evaluation. The Q-Learning agent is trained to take one action per day (buy, sell, or hold) with the aim of maximizing cumulative return while limiting risk. In the experiments, the Q-Learning bot achieved a cumulative return of 145.7%, outperforming Buy-and-Hold (120.5%) and Moving Average Crossover (85.3%), with a lower maximum drawdown (-18.7% vs -35.0%). A Sharpe Ratio of 1.35 and a win rate of 58.9% indicate superior risk-adjusted performance. These findings suggest that tabular Q-Learning has strong potential as an adaptive and effective trading approach with low computational cost.
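To make the approach concrete, the following is a minimal sketch of a tabular Q-Learning loop over the buy/sell/hold action set, together with the two risk metrics reported above. It is not the study's implementation. The state encoding (a three-bucket trend of the latest daily return combined with a flat/long position flag), the reward (next-day return while long), the hyperparameters, the zero risk-free rate, and all function names are assumptions chosen for illustration. Prices are passed in as a plain array, so any source of daily closing prices can be used.

```python
import numpy as np

ACTIONS = ("hold", "buy", "sell")  # daily action set named in the abstract


def make_states(prices, threshold=0.005):
    """Return daily returns and a 3-bucket trend label (0=down, 1=flat, 2=up).

    The bucketing and the threshold are illustrative assumptions.
    """
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]
    trend = np.digitize(returns, [-threshold, threshold])
    return returns, trend


def train_q_table(prices, episodes=50, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-Learning; hyperparameters are assumed values."""
    rng = np.random.default_rng(seed)
    returns, trend = make_states(prices)
    q = np.zeros((3, 2, len(ACTIONS)))  # trend bucket x position x action
    for _ in range(episodes):
        position = 0  # 0 = flat, 1 = long
        for t in range(len(returns) - 1):
            state = (int(trend[t]), position)
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))  # explore
            else:
                action = int(np.argmax(q[state]))  # exploit
            if ACTIONS[action] == "buy":
                next_position = 1
            elif ACTIONS[action] == "sell":
                next_position = 0
            else:
                next_position = position
            # Assumed reward: the next day's return, earned only while long.
            reward = next_position * returns[t + 1]
            next_state = (int(trend[t + 1]), next_position)
            # Standard Q-Learning update toward the one-step TD target.
            td_target = reward + gamma * np.max(q[next_state])
            q[state + (action,)] += alpha * (td_target - q[state + (action,)])
            position = next_position
    return q


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.min(equity / running_peak - 1.0))


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe Ratio, assuming a zero risk-free rate."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    std = np.std(daily_returns, ddof=1)
    if std == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * np.mean(daily_returns) / std)


if __name__ == "__main__":
    # Synthetic random-walk prices stand in for downloaded market data.
    demo_rng = np.random.default_rng(42)
    demo_prices = 100.0 * np.cumprod(1.0 + demo_rng.normal(0.0005, 0.01, 500))
    q_table = train_q_table(demo_prices, episodes=20)
    print("Greedy action per (trend, position):")
    print(np.argmax(q_table, axis=2))
    print("Buy-and-hold max drawdown:", max_drawdown(demo_prices))
```

A table of this size (3 x 2 x 3 entries) can be inspected directly, which illustrates the transparency and low computational cost that motivate a tabular method over DRL. Figures produced on synthetic data by this sketch are not comparable to the results reported above.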
Copyright © 2025