Training agents is an intriguing research topic because humans struggle to maintain consistent performance in games such as Flappy Bird. This study compares two action selection methods, Softmax and Upper Confidence Bound (UCB), to improve agent performance. Both methods were tested in a Flappy Bird environment built on Gymnasium and evaluated using average score, highest score, average steps, and Q-value. The final results indicate that Softmax tends to explore early in training but converges toward the end, whereas UCB tends to exploit early, leading to stagnant scores. A t-test found no significant difference between the performance of the two action selection methods. This study offers guidance for choosing action selection methods for agents in simple games.
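To make the two compared strategies concrete, below is a minimal sketch of Softmax and UCB action selection over a discrete action set. The function names, the temperature and exploration constants, and the use of NumPy are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_action(q_values, temperature=1.0):
    # Sample an action with probability proportional to exp(Q / tau);
    # high temperature -> near-uniform exploration, low -> greedy.
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()          # subtract max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def ucb_action(q_values, counts, t, c=2.0):
    # Pick the action maximizing Q + c * sqrt(ln t / n);
    # any action never tried yet is chosen first.
    counts = np.asarray(counts, dtype=float)
    if (counts == 0).any():
        return int(np.argmax(counts == 0))
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(np.asarray(q_values, dtype=float) + bonus))
```

For Flappy Bird the action set would be binary (flap or do nothing), so `q_values` would hold two entries per state; the UCB bonus shrinks as an action's count grows, which matches the early-exploitation behavior the abstract reports.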
Copyright © 2025