Jurnal Mandiri IT
Vol. 14 No. 2 (2025): Computer Science and Field

Reinforcement learning for Bitcoin trading: A comparative study of PPO and DQN

Prasetyo, Romadhan Edy
Sumanto, Sumanto
Chaidir, Indra
Supriyatna, Adi



Article Info

Publish Date
22 Aug 2025

Abstract

Bitcoin’s high volatility demands automated strategies that adapt to changing market regimes while managing risk. This study compares Proximal Policy Optimization (PPO) and Deep Q-Network (DQN) for Bitcoin trading using hourly BTC/USDT data from 2019 to early 2025. The models are trained to generate buy and sell signals from technical indicators including the Relative Strength Index (RSI), MA20, volatility, Moving Average Convergence Divergence (MACD), volume trend, SMA200, and a weekly trend filter. All features are computed on hourly bars. The evaluation shows that PPO tends to trade more aggressively and delivers higher performance during bullish phases, though with greater risk in unstable markets. By contrast, DQN trades more selectively and maintains better stability in sideways or choppy conditions. These findings support the effectiveness of reinforcement learning for adaptive cryptocurrency trading and highlight complementary strengths between PPO and DQN across market regimes.
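To illustrate the feature set named in the abstract, the following is a minimal sketch of how such indicators might be computed from hourly closing prices and volumes. The abstract does not give the parameters, so every window and definition here is an assumption rather than the authors' configuration: a 14-bar RSI based on simple averages, MACD with 12/26/9 spans, a 24-bar rolling standard deviation of returns as volatility, volume relative to its 20-bar average as the volume trend, and the sign of the 168-bar (one week of hourly bars) price change as the weekly trend filter. The function names are hypothetical.

```python
from statistics import pstdev


def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha, prev, out = 2.0 / (span + 1), None, []
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def rsi(closes, period=14):
    """RSI from simple sums of gains and losses over `period` bars.

    Assumption: simple averaging instead of Wilder smoothing; a flat
    window (no gains or losses) is mapped to 50 by convention.
    """
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        changes = [closes[j] - closes[j - 1] for j in range(i - period + 1, i + 1)]
        gain = sum(c for c in changes if c > 0)
        loss = -sum(c for c in changes if c < 0)
        out[i] = 50.0 if gain + loss == 0 else 100.0 * gain / (gain + loss)
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line."""
    line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return line, ema(line, signal)


def rolling_volatility(closes, window=24):
    """Population standard deviation of simple returns over `window` bars."""
    returns = [None] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    out = [None] * len(closes)
    for i in range(window, len(closes)):
        out[i] = pstdev(returns[i - window + 1 : i + 1])
    return out


def build_features(closes, volumes):
    """Return one feature dict per hourly bar (None while warming up)."""
    ma20, sma200 = sma(closes, 20), sma(closes, 200)
    vol_ma = sma(volumes, 20)
    rsi14 = rsi(closes)
    macd_line, macd_signal = macd(closes)
    vol = rolling_volatility(closes)
    rows = []
    for i, close in enumerate(closes):
        week_ago = closes[i - 168] if i >= 168 else None  # 168 hourly bars = 1 week
        rows.append({
            "rsi": rsi14[i],
            "ma20": ma20[i],
            "volatility": vol[i],
            "macd_hist": macd_line[i] - macd_signal[i],
            "volume_trend": None if not vol_ma[i] else volumes[i] / vol_ma[i] - 1.0,
            "sma200": sma200[i],
            "weekly_trend": None if week_ago is None else (1 if close > week_ago else -1),
        })
    return rows
```

Calling `build_features(closes, volumes)` on two equal-length lists of floats yields one observation per bar. In a reinforcement learning setup of the kind the abstract describes, rows with `None` values (the warm-up period, at least 200 bars here because of SMA200) would typically be dropped before they are passed to an agent.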

Copyrights © 2025






Journal Info

Abbrev

Mandiri

Subject

Computer Science & IT; Library & Information Science; Mathematics

Description

The Jurnal Mandiri IT is intended as a publication medium for articles reporting the results of Computer Science and related ...