Training multi-agent reinforcement learning (MARL) systems is often time-consuming due to sample inefficiency, particularly when agents must both explore a complex environment extensively and coordinate with multiple other entities. This study proposes using imitation learning to accelerate MARL training in a 2v2 Pong game: demonstrations from a 1v1 Pong game shape the initial policy, bypassing an inefficient early exploration phase. We use a deep Q-network (DQN) under the centralized training with decentralized execution (CTDE) paradigm to compare the performance of pretrained and untrained agents in the 2v2 Pong game. Experimental results show that learning from 1v1 demonstrations significantly improves the pretrained agents' reward accumulation and game scores in the 2v2 game. The improvement peaks at 700 demonstration learning steps and diminishes at larger step counts due to excessive memorization of the demonstrated gameplay. This work shows that imitation learning from demonstrations can shorten prolonged MARL training, offering a viable solution especially when data collection, computational resources, and training time are severely constrained.
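The pretraining idea can be illustrated with a minimal sketch, under assumptions not taken from the paper: a discretized state space, a tabular Q-function in place of a DQN, and a simple margin-based imitation update that pushes the demonstrated action's Q-value above the alternatives before any environment interaction.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 16, 3  # hypothetical discretized paddle states and actions

# Hypothetical 1v1 demonstration data: (state, expert_action) pairs.
demo = [(s, s % N_ACTIONS) for s in range(N_STATES)]

def pretrain_from_demo(q, demo, margin=1.0, lr=0.5, steps=700):
    """Shape the initial policy by raising the demonstrated action's Q-value
    and suppressing competing actions that come within a margin of it."""
    for _ in range(steps):
        s, a = demo[rng.integers(len(demo))]
        q[s, a] += lr * margin  # reinforce the demonstrated action
        for b in range(N_ACTIONS):
            if b != a and q[s, b] > q[s, a] - margin:
                q[s, b] -= lr * margin  # push competitors below the margin
    return q

q_pretrained = pretrain_from_demo(np.zeros((N_STATES, N_ACTIONS)), demo)
# The greedy policy over the pretrained Q-table reproduces the demonstrated
# actions, so subsequent RL starts from shaped rather than random behavior.
policy = q_pretrained.argmax(axis=1)
```

In the full method the table would be a neural Q-network pretrained on 1v1 gameplay and then fine-tuned with TD updates in the 2v2 environment; the 700-step figure here merely mirrors the step count at which the paper reports the improvement peaking.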
Copyright © 2026