Azrarsyah, Muhammad Rafli
Unknown Affiliation

Published: 1 document

From Risk-Neutral to Risk-Sensitive Reinforcement Learning: Actor–Critic vs REINFORCE with Tail-Based Risk Measures
Lestia, Aprida Siska; Effendie, Adhitya Ronnie; Tantrawan, Made; Azrarsyah, Muhammad Rafli
CAUCHY: Jurnal Matematika Murni dan Aplikasi, Vol. 11, No. 1 (2026)
Publisher: Mathematics Department, Universitas Islam Negeri Maulana Malik Ibrahim Malang

DOI: 10.18860/cauchy.v11i1.40309

Abstract

This study investigates the application of risk-sensitive reinforcement learning (RL) to heavy-tailed return series by comparing two algorithms: REINFORCE with baseline (REINFORCE-BL) and episodic batched actor–critic (A2C-B). Initial exploratory analysis reveals an asymmetric return distribution with many extreme outliers. This makes variance-based risk measures inadequate and motivates integrating tail-based risk measures, namely Value at Risk (VaR), Conditional Value at Risk (CVaR), and Entropic Value at Risk (EVaR), into the RL objective function. The study constructs a simple portfolio environment with discrete actions (market entry, market exit, and hold) and trains both algorithms under four scenarios: risk-neutral, VaR, CVaR, and EVaR. Experimental results show that A2C-B consistently outperforms REINFORCE-BL across all scenarios, with higher long-term average rewards, faster convergence, and more stable learning curves. The VaR and CVaR penalties significantly reduce rewards and increase learning volatility for REINFORCE-BL, whereas A2C-B experiences only moderate reward reductions and remains stable. In the EVaR scenario, both algorithms achieve high rewards, though A2C-B retains a slight advantage in stability. These findings indicate that, in environments with heavy-tailed returns, combining coherent risk measures (particularly CVaR and EVaR) with an actor–critic framework offers a more compelling trade-off between tail-risk control and average performance. This combination provides a viable baseline for developing risk-sensitive RL in finance and actuarial science.
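The abstract names three tail-based risk measures but does not give estimators for them. The following is a minimal sketch that makes two assumptions not taken from the paper: the loss is the negated return, and the confidence level is alpha = 0.95. Under those assumptions, empirical VaR, CVaR, and EVaR can be computed with NumPy. CVaR uses the Rockafellar–Uryasev formula. EVaR is approximated by a grid search over z in its definition inf over z > 0 of (1/z) log(E[exp(zL)] / (1 - alpha)). The function `risk_penalized_score` is a hypothetical illustration of the four scenarios. Its "mean minus lam times risk" form and its penalty weight `lam` are not the authors' objective.

```python
import numpy as np


def var_cvar(returns, alpha=0.95):
    """Empirical VaR and CVaR of the loss L = -return at confidence level alpha."""
    losses = np.sort(-np.asarray(returns, dtype=float))
    n = losses.size
    # Lower alpha-quantile of the losses (index of the ceil(alpha * n)-th order statistic).
    k = min(max(int(np.ceil(alpha * n)) - 1, 0), n - 1)
    var = losses[k]
    # Rockafellar-Uryasev form: CVaR = VaR + E[(L - VaR)^+] / (1 - alpha).
    cvar = var + np.mean(np.maximum(losses - var, 0.0)) / (1.0 - alpha)
    return var, cvar


def evar(returns, alpha=0.95, n_grid=400):
    """Grid-search approximation of inf_{z>0} (1/z) * log(E[exp(z L)] / (1 - alpha))."""
    losses = -np.asarray(returns, dtype=float)
    scale = np.std(losses) or 1.0  # rescale the z-grid to the spread of the data
    best = np.inf
    for z in np.logspace(-2, 3, n_grid) / scale:
        zl = z * losses
        m = zl.max()
        # Stable log E[exp(zL)] using the log-sum-exp shift by the maximum.
        log_mgf = m + np.log(np.mean(np.exp(zl - m)))
        best = min(best, (log_mgf - np.log(1.0 - alpha)) / z)
    return best


def risk_penalized_score(returns, measure="cvar", lam=0.5, alpha=0.95):
    """Hypothetical objective: mean return minus lam times a tail risk of the loss."""
    mean = float(np.mean(returns))
    if measure == "neutral":
        return mean
    if measure == "var":
        risk = var_cvar(returns, alpha)[0]
    elif measure == "cvar":
        risk = var_cvar(returns, alpha)[1]
    elif measure == "evar":
        risk = evar(returns, alpha)
    else:
        raise ValueError(f"unknown risk measure: {measure}")
    return mean - lam * risk


# Usage on synthetic heavy-tailed returns (Student-t with 3 degrees of freedom).
rng = np.random.default_rng(0)
daily = 0.01 * rng.standard_t(df=3, size=5000)
for name in ("neutral", "var", "cvar", "evar"):
    print(name, risk_penalized_score(daily, measure=name))
```

At the same confidence level, EVaR upper-bounds both VaR and CVaR, so the estimates should satisfy VaR ≤ CVaR ≤ EVaR. The grid search can only overestimate the EVaR infimum, so it preserves this ordering.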
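The abstract reports that A2C-B learns faster and more stably than REINFORCE-BL but does not state the update rules. One standard way to see the difference is the weight each algorithm places on the policy's log-probability gradient. This is general background, not a claim from the paper. REINFORCE with baseline uses the Monte Carlo return minus a baseline. A one-step actor–critic instead uses the temporal-difference advantage from a learned critic, which typically has lower variance at the cost of some bias. The sketch below assumes a discount factor `gamma` and per-step value estimates `values`. It does not model how the paper batches episodes or where it applies the risk penalty.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    g = 0.0
    out = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out


def reinforce_baseline_advantages(rewards, values, gamma=0.99):
    """REINFORCE-with-baseline weights: Monte Carlo return minus the baseline b(s_t)."""
    return discounted_returns(rewards, gamma) - np.asarray(values, dtype=float)


def td_advantages(rewards, values, gamma=0.99):
    """One-step actor-critic advantage r_t + gamma * V(s_{t+1}) - V(s_t), with V = 0 after the last step."""
    v = np.asarray(values, dtype=float)
    v_next = np.append(v[1:], 0.0)
    return np.asarray(rewards, dtype=float) + gamma * v_next - v


# With an untrained critic (all zeros), REINFORCE weights each step by the full
# remaining return, while the TD advantage only sees the immediate reward.
print(reinforce_baseline_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0))
print(td_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0))
```

When the critic is exact, both advantages are zero in expectation. The TD version, however, depends on a single reward rather than on the whole sum of future rewards, which is what reduces its variance.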
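The portfolio environment is described only by its action set: market entry, market exit, and hold. The following is a hypothetical sketch of such an environment. It assumes a single asset, a binary cash/invested position, a state of just (time step, position), and a reward equal to the position times that period's return. None of these details is stated in the abstract, and the class and constant names are illustrative.

```python
import numpy as np

ENTER, EXIT, HOLD = 0, 1, 2  # hypothetical encoding of the three discrete actions


class ToyPortfolioEnv:
    """Hypothetical single-asset environment: the agent is either in cash or invested."""

    def __init__(self, returns):
        self.returns = np.asarray(returns, dtype=float)
        self.reset()

    def reset(self):
        self.t = 0
        self.position = 0  # 0 = cash, 1 = invested
        return (self.t, self.position)

    def step(self, action):
        if action == ENTER:
            self.position = 1
        elif action == EXIT:
            self.position = 0
        # HOLD leaves the current position unchanged.
        reward = self.position * self.returns[self.t]
        self.t += 1
        done = self.t >= len(self.returns)
        return (self.t, self.position), reward, done


env = ToyPortfolioEnv([0.01, -0.03, 0.02])
env.reset()
for action in (ENTER, HOLD, EXIT):
    state, reward, done = env.step(action)
    print(state, reward, done)
```

A realistic state would also include market features such as recent returns. They are omitted here to keep the example minimal.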