Created by W.Langdon from gp-bibliography.bib Revision:1.8276
First, the safe multi-agent coordination problem is investigated. Popular multi-agent benchmarks provide limited safety support for safe multi-agent reinforcement learning (MARL) research, where a negative reward for collisions cannot guarantee safety. Therefore, this research proposes a new safety-constrained multi-agent environment, MatrixWorld, based on the general pursuit-evasion game. In particular, the multi-agent safety constraints are implemented through three ways of classifying pursuit-evasion games: the multi-agent-environment interaction model, the collision-resolution mechanism in the multi-agent action-execution model, and the game termination condition. Besides, MatrixWorld is a lightweight co-evolution framework for learning pursuit tasks, evasion tasks, or both, in which further pursuit-evasion variants can be designed according to different practical meanings of safety.
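One such collision-resolution rule can be illustrated by a minimal sketch, assuming a grid world where conflicting simultaneous moves are simply cancelled; the function names here are illustrative, not the actual MatrixWorld API:

```python
from collections import Counter

def resolve_moves(positions, proposed):
    """Resolve one step of simultaneous grid moves: if several agents
    propose the same target cell, all of them stay in place instead
    (a simple 'safe' resolution rule that prevents collisions)."""
    targets = Counter(proposed)
    return [tgt if targets[tgt] == 1 else pos
            for pos, tgt in zip(positions, proposed)]
```

For example, two pursuers both heading for cell (0, 1) are both held at their current cells, so the unsafe state is never entered.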
Second, the NP-hard distributed coordination problem is investigated throughout this research. For example, for the fully observable pursuit of a single evader, this research proposes the cooperative co-evolutionary particle swarm optimization algorithm for robots (CCPSO-R). It introduces the concept of virtual agents and uses a cooperative co-evolutionary evaluation mechanism for the decentralized cooperation of online-planning pursuers. Experiments conducted on a scalable swarm of pursuers against four types of evaders show the reliability, generality, and scalability of the proposed CCPSO-R. A comparison with a representative dynamic-path-planning-based algorithm, Multi-Agent Real-Time Pursuit (MAPS), further shows the effectiveness of CCPSO-R.
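The cooperative co-evolutionary evaluation idea can be sketched as follows. This is an assumed, simplified hill-climbing stand-in for the full PSO machinery: each agent evaluates its own candidate moves jointly with the other agents' current best positions (the shared context vector), rather than in isolation.

```python
import random

def cc_evaluate(candidate, idx, context, fitness):
    """Evaluate one agent's candidate jointly with the other agents'
    current best positions (the shared 'context vector')."""
    joint = list(context)
    joint[idx] = candidate
    return fitness(joint)

def cc_step(context, fitness, n_candidates=10, step=1.0):
    """One decentralized round: each agent perturbs its own 2-D position
    and keeps the joint-best candidate; lower fitness is better."""
    for i in range(len(context)):
        best = context[i]
        best_f = cc_evaluate(best, i, context, fitness)
        for _ in range(n_candidates):
            cand = (best[0] + random.uniform(-step, step),
                    best[1] + random.uniform(-step, step))
            f = cc_evaluate(cand, i, context, fitness)
            if f < best_f:
                best, best_f = cand, f
        context[i] = best
    return context
```

With a fitness such as the pursuers' total squared distance to the evader, repeated calls to `cc_step` drive the group toward the target without any central planner.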
Third, the NP-complete multi-agent task allocation problem is investigated in pursuit-evasion variants with more than one evader. For example, for the fully observable pursuit of multiple evaders, this research proposes a two-stage approach, BiPCCR, which solves the problem in a dynamic optimization way. In particular, a multi-evader pursuit (MEP) fitness function is proposed for the underlying bi-quadratic assignment problem (BiQAP), which significantly reduces the search cost. Besides, based on domain knowledge, an existing BiQAP solver is improved to perform statistically better. In this work, the safety of the CCPSO-R algorithm is enhanced in the proposed PCCPSO-R algorithm for simultaneous multi-agent decision-making and action execution.
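For intuition, a toy pursuer-to-evader assignment with a distance-based cost can be written as below. This is a plain linear assignment sketch, far simpler than the bi-quadratic objective the thesis actually optimizes, and all names are illustrative:

```python
from itertools import permutations

def assignment_cost(pursuers, evaders, assign):
    """Total squared distance of each pursuer to its assigned evader
    (a simplified stand-in for the MEP fitness, not the BiQAP objective)."""
    return sum((px - evaders[a][0]) ** 2 + (py - evaders[a][1]) ** 2
               for (px, py), a in zip(pursuers, assign))

def best_assignment(pursuers, evaders):
    """Exhaustive search over all assignments; practical BiQAP solvers
    rely on heuristics because this search grows factorially."""
    return min(permutations(range(len(evaders))),
               key=lambda a: assignment_cost(pursuers, evaders, a))
```

Even this toy version shows why a cheap fitness function matters: the cost function is evaluated once per candidate assignment, and the number of assignments explodes with team size.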
Fourth, multi-agent observation uncertainty and interaction uncertainty are investigated in partially observable pursuit-evasion variants. Further, to avoid coordination performance degradation due to communication failures and to be free of communication costs, a more restricted self-organizing setup with only implicit coordination is considered. To address these challenges, this research proposes a distributed hierarchical framework, the fuzzy self-organizing cooperative coevolution (FSC2) algorithm. The experimental results demonstrate that, by decomposing the task with FSC2, superior performance is achieved compared with other implicit coordination policies fully trained by general MARL algorithms. The scalability of FSC2 is demonstrated by showing that up to 2048 FSC2 agents perform efficiently, with almost 100% capture rates. Empirical analyses and ablation studies verify the interpretability, rationality, and effectiveness of the component algorithms in FSC2.
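One ingredient of such self-organized task decomposition can be illustrated by a fuzzy, distance-based membership of each pursuer in the evader tasks. This is an assumed fuzzy-c-means-style rule for illustration only, not necessarily the exact rule used inside FSC2:

```python
import math

def fuzzy_memberships(pursuer, evaders, m=2.0):
    """Membership degree of one pursuer in each evader's task, computed
    from inverse distances: memberships sum to 1, and nearer evaders
    receive higher membership (fuzziness exponent m > 1)."""
    d = [max(math.dist(pursuer, e), 1e-9) for e in evaders]
    inv = [1.0 / (di ** (2.0 / (m - 1.0))) for di in d]
    s = sum(inv)
    return [x / s for x in inv]
```

Because each pursuer computes its memberships from local observations alone, grouping emerges without any explicit communication, which is the spirit of the implicit-coordination setup described above.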
Last, open problems and puzzles in auto-curriculum learning are explored in co-evolutionary pursuit-evasion variants. To better understand related research and use similar terminology more accurately, this research reviews and analyzes the co-evolution mechanism in the multi-agent setting, which clearly reveals its relationships with auto-curricula, self-play, arms races, and adversarial learning. Then, through adversarial learning, this research achieves various arms race outcomes under different co-evolution mechanisms. Experiments show that arms races with steady and converging improvement are more practical for obtaining increasingly complex behaviors, while policy cycles between two rival sides are useful for producing diverse policies. In particular, this research finds that passive (evasive) policy learning benefits more from co-evolution than active (pursuing) policy learning in an asymmetric adversarial game.
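The alternating co-evolution scheme underlying such arms races can be sketched as a toy loop. This is purely illustrative: the improvement functions stand in for full reinforcement-learning training of each side against the opponent's latest policy.

```python
def coevolve(pursuer, evader, improve_pursuer, improve_evader, rounds=3):
    """Alternate improving each side against the other's latest policy,
    recording the trajectory of (pursuer, evader) policy pairs."""
    history = []
    for _ in range(rounds):
        pursuer = improve_pursuer(pursuer, evader)  # evader held fixed
        evader = improve_evader(evader, pursuer)    # pursuer held fixed
        history.append((pursuer, evader))
    return history

# Toy numeric 'policies': each side slightly outdoes its opponent,
# giving a steadily escalating arms race rather than a policy cycle.
race = coevolve(0, 0, lambda p, e: e + 1, lambda e, p: p + 1)
```

Whether such a loop produces steady escalation or cycling depends on the game's structure and the improvement operators, which is exactly the distinction between the co-evolution outcomes compared above.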
Supervisors: Chin-Teng Lin and Yuhui Shi
Genetic Programming entries for Lijun Sun