In Chapter 3, we propose a method that acquires purposive behaviours based on the estimation of state vectors. In order to acquire cooperative behaviours in multiagent environments, each learning robot estimates a Local Prediction Model (hereafter LPM) between the learner and each of the other objects separately. The LPMs estimate the local interactions, while reinforcement learning copes with the global interaction between the multiple LPMs and the given tasks. Based on the LPMs, which satisfy the Markovian environment assumption as far as possible, the robots learn the desired behaviours using reinforcement learning. We also propose a learning schedule in order to make learning stable, especially in the early stage of learning in multiagent systems.
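A minimal sketch of this two-level structure, assuming a linear least-squares predictor for each LPM and a tabular Q-learner on top (all class names, the discretisation, and the learning constants are illustrative, not the thesis's actual implementation):

    import numpy as np

    class LocalPredictionModel:
        # Hypothetical LPM: a least-squares estimate of the local
        # dynamics x' = x A between the learner and one other object.
        def __init__(self, dim):
            self.A = np.eye(dim)
            self.samples = []

        def observe(self, x_prev, x_next):
            self.samples.append((x_prev, x_next))
            X = np.array([s[0] for s in self.samples])
            Y = np.array([s[1] for s in self.samples])
            self.A, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

        def predict(self, x):
            return x @ self.A

    def discretise(state, bins=4, lo=-1.0, hi=1.0):
        # Placeholder quantisation of the concatenated LPM
        # predictions into a tabular state.
        idx = np.clip(((state - lo) / (hi - lo) * bins).astype(int),
                      0, bins - 1)
        return tuple(int(i) for i in idx)

    class QLearner:
        # Tabular Q-learning handles the global interaction between
        # the separately estimated LPMs and the given task.
        def __init__(self, n_actions, alpha=0.1, gamma=0.9):
            self.q, self.n_actions = {}, n_actions
            self.alpha, self.gamma = alpha, gamma

        def update(self, s, a, r, s_next):
            qs = self.q.setdefault(s, np.zeros(self.n_actions))
            nxt = self.q.setdefault(s_next, np.zeros(self.n_actions))
            qs[a] += self.alpha * (r + self.gamma * nxt.max() - qs[a])

Each LPM is estimated separately, so the value table only has to capture how the local predictions combine with the task, which is the division of labour described above.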
Chapter 4 discusses how an agent can develop its behaviour according to the complexity of its interactions with the environment. A method for controlling this complexity is proposed for a vision-based mobile robot. Based on the LPM, the agent estimates the full set of state vector components, ordered by their significance. The environmental complexity is defined in terms of the speed of the agents, while the complexity of the state vector is its number of dimensions. As the speed of the agent itself or of the others increases, the dimension of the state vector is increased, trading off the size of the state space against the learning time.
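Under the assumption that the state-vector components are already ordered by significance, this schedule might be sketched as follows (the thresholds, step size, and function names are hypothetical):

    def state_dimension(speed, base_dim=2, step=2,
                        thresholds=(0.2, 0.5, 1.0)):
        # Each crossed speed threshold admits `step` more components:
        # a larger state space, but one matched to the faster dynamics.
        return base_dim + step * sum(speed > t for t in thresholds)

    def truncated_state(full_state, speed):
        # full_state is assumed ordered by component significance
        # (e.g. from the LPM estimation), so truncation keeps the
        # major components.
        return full_state[:state_dimension(speed)]

Keeping the dimension low while speeds are low keeps the learning time short; the dimension grows only when the faster dynamics demand it.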
Chapter 5 discusses a vector-valued reward function in order to cope with multiple tasks. Unlike the traditional weighted sum of several reward functions, we introduce a discount matrix to integrate them when estimating the value function, which evaluates the current action strategy. Owing to this extension of the value function, the learning agent can appropriately estimate the multiple future rewards from the environment.
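As a worked example, the scalar discount factor gamma of ordinary temporal-difference learning can be replaced by a discount matrix Gamma acting on a value vector with one component per task, V(s) <- V(s) + alpha (r + Gamma V(s') - V(s)); the tabular representation and the particular Gamma below are illustrative assumptions:

    import numpy as np

    n_tasks = 3
    # Off-diagonal entries let one task's future value feed into
    # another's, which a scalar weighted sum of rewards cannot express.
    Gamma = np.array([[0.90, 0.05, 0.00],
                      [0.00, 0.85, 0.05],
                      [0.05, 0.00, 0.80]])

    V, alpha = {}, 0.1  # state -> value vector, one entry per task

    def td_update(s, r, s_next):
        # r is the reward vector observed on the transition s -> s_next.
        v = V.setdefault(s, np.zeros(n_tasks))
        v_next = V.setdefault(s_next, np.zeros(n_tasks))
        v += alpha * (r + Gamma @ v_next - v)

    td_update("s0", np.array([1.0, 0.0, 0.0]), "s1")  # example call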
Chapter 6 discusses how cooperative behaviours can emerge among multiple robots through co-evolutionary processes. A genetic programming method is applied to a separate population for each robot so as to obtain cooperative and competitive behaviours. The complexity of the problem is twofold: co-evolution for cooperative behaviours needs exact synchronisation of the mutual evolutions, and three-robot co-evolution requires carefully designed environment setups that gradually change from simpler to more complicated situations. As example tasks, several simplified soccer games are selected to show the validity of the proposed methods. Finally, discussion and concluding remarks on our work are given.
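One way to realise per-robot populations is sketched below; real-valued genomes with one-point crossover stand in for the GP tree operators, and the toy fitness function replaces the soccer simulation (all of these are assumptions for illustration):

    import random

    def evaluate(team):
        # Placeholder for a simplified soccer game: one fitness
        # value per robot (here just the mean of each genome).
        return [sum(g) / len(g) for g in team]

    def evolve(pop, fits, mut_rate=0.1):
        # Binary tournament selection, one-point crossover, mutation.
        def pick():
            a, b = random.sample(range(len(pop)), 2)
            return pop[a] if fits[a] >= fits[b] else pop[b]
        nxt = []
        while len(nxt) < len(pop):
            p1, p2 = pick(), pick()
            cut = random.randrange(1, len(p1))
            nxt.append([g + random.gauss(0, 0.1)
                        if random.random() < mut_rate else g
                        for g in p1[:cut] + p2[cut:]])
        return nxt

    n_robots, pop_size, genome_len = 3, 20, 8
    pops = [[[random.random() for _ in range(genome_len)]
             for _ in range(pop_size)] for _ in range(n_robots)]

    for generation in range(50):
        # Co-evolution: each robot's individuals are scored while
        # teamed with a representative (here simply the first
        # individual) from every other robot's population.
        for r in range(n_robots):
            fits = [evaluate([ind if k == r else pops[k][0]
                              for k in range(n_robots)])[r]
                    for ind in pops[r]]
            pops[r] = evolve(pops[r], fits)

Because every population's fitness landscape shifts as the others evolve, the synchronisation problem mentioned above shows up directly: evolving one population too far between evaluations of the others destabilises the search.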