This study designs and evaluates a Deep Recurrent Q-Network (DRQN) agent for automated trading decision-making on Islamic stocks, integrating a Long Short-Term Memory (LSTM) layer and training on daily historical price data from the Indonesian Islamic Stock Index (ISSI). Although the agent successfully learns a profitable strategy during training, on unseen test data it behaves passively, choosing only the 'hold' action and earning zero profit, a phenomenon known as policy stagnation. This finding indicates that the reward function used implicitly encourages excessive risk aversion. The study concludes that the success of the DRQN architecture depends heavily on careful reward engineering, underscoring the need for future research on dynamic and adaptive reward mechanisms to develop robust, generalizable trading agents in the complex Islamic finance domain.
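For readers unfamiliar with the architecture, the DRQN with an LSTM layer described above can be sketched as follows. This is a minimal illustrative sketch in PyTorch under stated assumptions: the layer sizes, the single price feature, the 30-day window, and the three-action space (buy, hold, sell) are assumptions for illustration, not the paper's actual hyperparameters.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Minimal DRQN sketch: an LSTM over a window of daily price
    features feeding a linear Q-value head over trading actions.
    Sizes are illustrative assumptions, not the paper's settings."""

    def __init__(self, n_features=1, hidden_size=64, n_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_actions)

    def forward(self, x, hidden=None):
        # x: (batch, window_length, n_features) of daily price features
        out, hidden = self.lstm(x, hidden)
        # Q-values are read from the last time step of the sequence
        q_values = self.head(out[:, -1, :])
        return q_values, hidden

# Usage: Q-values for a batch of four 30-day price windows
net = DRQN()
q, _ = net(torch.randn(4, 30, 1))
print(q.shape)  # torch.Size([4, 3]) — one Q-value per action
```

The recurrent LSTM layer is what lets the agent condition its Q-values on the recent price history rather than on a single observation, which is the key difference between a DRQN and a plain DQN.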