Imitation Learning to Accelerate Training Process of Multi-Agent Reinforcement Learning in 2v2 Pong Game
Marvin Yonathan Hadiyanto; Budi Harsono; Indra Karnadi; Ivan Tanra
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol. 11, No. 2, May 2026
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v11i2.2564

Abstract

Training multi-agent reinforcement learning (MARL) systems often takes a long time because of sample inefficiency, particularly when agents must explore complex environments extensively and coordinate with one another. This study proposes using imitation learning to accelerate MARL training in a 2v2 pong game: demonstrations from a 1v1 pong game are used to shape the agents' initial policy, avoiding an inefficient exploration phase. We employ a deep Q-network (DQN) framework with centralized training and decentralized execution (CTDE) to compare the performance of pretrained and untrained agents in the 2v2 pong environment. Experimental results show that learning from 1v1 demonstrations improves the reward accumulation and game scores of pretrained agents in the 2v2 game. The improvement peaks at 700 demonstration learning steps and diminishes with more learning steps, due to excessive memorization of the demonstrated gameplay. Furthermore, comparative experiments show that imitation learning with 700 learning steps improves learning efficiency by approximately 300% and 571% relative to the zonation method and to standard reinforcement learning pretraining, respectively. These results indicate that imitation learning from demonstrations can effectively shorten the prolonged training process in MARL, offering a viable solution particularly when data collection, computational resources, and training time are severely constrained.
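The abstract does not describe the implementation, so the following is only a rough sketch of the general idea under several stated assumptions. It shows (1) pretraining a Q-function on 1v1 demonstrations by behavior cloning (a cross-entropy loss that pushes the softmax of the Q-values toward the demonstrated action), (2) copying the pretrained policy into each 2v2 teammate, which then acts only on its own observation (the decentralized-execution half of CTDE), and (3) a DQN-style TD update for subsequent fine-tuning. Everything here is hypothetical and not the authors' code: the names `LinearQ`, `pretrain_from_demos`, `td_update`, and `demo_policy`; the linear Q-function standing in for a deep Q-network; the one-feature observation `dy` (ball height minus paddle height); the action encoding; and the reading of a "demonstration learning step" as one single-sample gradient step. Centralized training is not shown.

```python
import math
import random

# Hypothetical action encoding for one paddle: 0 = stay, 1 = move up, 2 = move down.
N_ACTIONS = 3


class LinearQ:
    """Linear Q-function Q(s, a) = w[a] . s, a stand-in for a deep Q-network."""

    def __init__(self, n_features, n_actions=N_ACTIONS):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, s):
        return [sum(wi * si for wi, si in zip(wa, s)) for wa in self.w]

    def act(self, s):
        # Greedy action from this agent's own observation (decentralized execution).
        q = self.q_values(s)
        return max(range(len(q)), key=q.__getitem__)

    def copy(self):
        clone = LinearQ(len(self.w[0]), len(self.w))
        clone.w = [row[:] for row in self.w]
        return clone


def pretrain_from_demos(qnet, demos, steps=700, lr=0.1):
    """Behavior cloning (assumed form of imitation): one SGD step per demo sample."""
    for step in range(steps):
        s, a_demo = demos[step % len(demos)]
        q = qnet.q_values(s)
        m = max(q)  # subtract the max for a numerically stable softmax
        exp_q = [math.exp(v - m) for v in q]
        z = sum(exp_q)
        for a in range(len(q)):
            # d(cross-entropy)/dQ(s, a) = softmax(Q)[a] - 1[a == a_demo]
            g = exp_q[a] / z - (1.0 if a == a_demo else 0.0)
            for i, si in enumerate(s):
                qnet.w[a][i] -= lr * g * si
    return qnet


def td_update(qnet, target, s, a, r, s_next, done, gamma=0.99, lr=0.01):
    """One DQN-style step moving Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    y = r if done else r + gamma * max(target.q_values(s_next))
    err = qnet.q_values(s)[a] - y
    for i, si in enumerate(s):
        qnet.w[a][i] -= lr * err * si


def demo_policy(dy):
    """Hypothetical 1v1 demonstrator: move the paddle toward the ball."""
    if dy > 0.1:
        return 1
    if dy < -0.1:
        return 2
    return 0


# Build a small, synthetic 1v1 demonstration set; features are [dy, bias].
rng = random.Random(0)
demos = []
for _ in range(200):
    dy = rng.uniform(-1.0, 1.0)
    demos.append(([dy, 1.0], demo_policy(dy)))

pretrained = pretrain_from_demos(LinearQ(n_features=2), demos, steps=700)

# 2v2: each teammate starts from its own copy of the pretrained 1v1 policy
# and chooses actions from its own observation only.
team = [pretrained.copy(), pretrained.copy()]
actions = [agent.act(obs) for agent, obs in zip(team, ([0.8, 1.0], [-0.8, 1.0]))]
```

In this sketch, giving each teammate an independent copy lets the two agents diverge during later fine-tuning with `td_update` (and a periodically refreshed target copy), while both begin from the behavior learned in the 1v1 setting instead of from random exploration.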