A variable size mechanism of distributed graph programs and its performance evaluation in agent control problems
Introduction
Genetic Algorithm (GA) (Holland, 1975) and Genetic Programming (GP) (Koza, 1992, Koza, 1994) are typical evolutionary algorithms that have been widely studied. A large number of real-world applications have also been studied, such as robot programming (Kamio & Iba, 2005), financial problems (Alfaro-Cid et al., 2008, Iba and Sasaki, 1999, Ruiz-Torrubiano and Suárez, 2010) and network security systems (Banković et al., 2007, Folino et al., 2005).
In order to create reliable systems using evolutionary algorithms, the program structures (phenotype and genotype representations) and how to evolve them efficiently are important issues. Therefore, as an extension of GA and GP, Genetic Network Programming (GNP) and its extension using reinforcement learning (GNP-RL) (Mabu et al., 2007, Hirasawa et al., 2001) have been proposed and applied to many applications (Mabu et al., 2011, Hirasawa et al., 2008). A GNP program is represented by a directed graph structure and evolved by crossover and mutation. GNP was originally proposed because graph structures may have better representation abilities than strings and trees. In addition, the human brain also has a graph (network) structure, so some inherent abilities may be involved in the graph structure. In fact, the graph structure has two main advantages: (1) reusability of nodes and (2) applicability to dynamic environments. Concretely, GNP has the following features. (1) The directed graph structure automatically generates repetitive processes like subroutines and reuses nodes repeatedly during the node transition, which contributes to creating programs with compact structures. (2) Once GNP starts its node transition from the start node, it executes judgment nodes (if–then functions) and processing nodes (action functions) according to the connections between nodes, without any terminal nodes. The node transition therefore implicitly memorizes the history of judgments and actions, which contributes to decision making in dynamic environments because GNP can make decisions based not only on the current information but also on past information.
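The node transition described above can be sketched in a few lines. The node layout, function names, and the toy environment below are illustrative assumptions rather than the authors' implementation; the sketch only shows how judgment nodes branch on an input, how processing nodes emit actions, and how nodes are reused without terminal nodes.

```python
# Minimal sketch of GNP node transition (hypothetical data layout).
# Judgment nodes branch on an environment input; processing nodes
# execute an action and follow their single outgoing connection.
# There are no terminal nodes: transition continues until a step
# budget is spent, so nodes are naturally reused like subroutines.

def run_gnp(nodes, start, env, max_steps=10):
    """Follow node connections, returning the executed actions."""
    actions, current = [], start
    for _ in range(max_steps):
        node = nodes[current]
        if node["type"] == "judgment":
            # if-then function: the judged input selects the branch
            result = env(node["judge"])
            current = node["next"][result]
        else:
            # processing node: do the action, follow its connection
            actions.append(node["action"])
            current = node["next"]
    return actions

# Toy graph: judge whether an obstacle is ahead, then turn or move.
nodes = {
    0: {"type": "judgment", "judge": "obstacle_ahead",
        "next": {True: 1, False: 2}},
    1: {"type": "processing", "action": "turn", "next": 0},  # reuses node 0
    2: {"type": "processing", "action": "move", "next": 0},
}
env = lambda q: False  # no obstacle in this toy environment
print(run_gnp(nodes, start=0, env=env, max_steps=6))  # ['move', 'move', 'move']
```

Because node 0 is visited repeatedly, the judge–act loop emerges from the connections themselves, without an explicit loop construct in the program.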
Evolutionary Programming (EP) (Fogel et al., 1966, Fogel, 1994) is a graph-based evolutionary algorithm that creates finite state machines (FSMs) automatically, but the characteristics of EP and GNP are different. Generally, an FSM must define state transition rules for all combinations of states and possible inputs, so an FSM program becomes large and complex when the number of states and inputs is large. The evolution of GNP, on the other hand, selects only the necessary nodes by changing the connections between nodes, which means that GNP judges only the inputs essential for making decisions in the current situation. As a result, GNP does not have to consider all combinations of inputs and actions, which yields compact program structures.
When GNP is applied to complicated real-world systems, large graph structures are needed to represent the various kinds of rules required for adapting to various situations. However, huge structures may decrease the efficiency of evolution and, as a result, the performance of the systems. Therefore, we have studied distributed GNP (Yang, He, Mabu, & Hirasawa, 2012) based on the idea of divide and conquer, which is used to create complicated systems from relatively small programs (Li, Tian, & Sclaroff, 2012). In Yang et al. (2012), a program is divided into several subprograms that cooperatively control the whole task, and this method is applied to a stock trading model, where it shows better performance than the method without distributed structures. However, because the previous work divided a program into subprograms of equal size, it cannot adjust the sizes of the subprograms depending on the problem. The best structure size differs depending on the difficulty of the task, so a fixed structure size limits the representation ability of the solutions. To solve this problem, an efficient evolutionary algorithm of variable size distributed GNP (VS-DGNP) is proposed, and its performance is evaluated on the Tileworld problem (Pollack & Ringuette, 1990), one of the benchmark problems for multiagent systems in dynamic environments. The features of the proposed method are as follows.
- The complicated structure can be divided into several substructures with different sizes.
- Genetic operations are executed both inside each substructure and between substructures, which enables efficient optimization.
- Migration of nodes from one substructure to another is executed to adjust the sizes of the substructures appropriately.
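The migration idea can be illustrated with a small sketch. Here an individual is represented as a list of subprograms (sets of node ids), and migration moves one node from a source subprogram to a destination subprogram, changing their sizes while keeping the total number of nodes fixed. The data layout, the uniform random node choice, and the omission of connection repair are simplifying assumptions, not the paper's exact operator.

```python
import random

def migrate_node(subprograms, src, dst, rng):
    """Move one randomly chosen node from subprogram src to dst.

    The total node count is preserved; only the subprogram sizes
    change. Every subprogram is kept non-empty.
    """
    if len(subprograms[src]) <= 1:
        return subprograms  # refuse to empty a subprogram
    node = rng.choice(sorted(subprograms[src]))
    subprograms[src].remove(node)
    subprograms[dst].add(node)
    return subprograms

rng = random.Random(0)
subs = [{0, 1, 2, 3}, {4, 5}]       # two subprograms of sizes 4 and 2
migrate_node(subs, src=0, dst=1, rng=rng)
print([len(s) for s in subs])       # [4, 2] becomes [3, 3]
```

Repeated applications of such an operator, guided by fitness, let evolution discover unequal subprogram sizes suited to the sub-tasks, which is the point of the variable size mechanism.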
The rest of this paper is organized as follows. In Section 2, the basic structure of GNP and GNP with reinforcement learning (GNP-RL) is reviewed. In Section 3, the structure of distributed GNP and how to realize variable size structure are explained. In Section 4, after simulation environments and conditions are explained, the results and analysis are described. Section 5 is devoted to conclusions.
Review of genetic network programming with reinforcement learning
Because the proposed Variable Size Distributed GNP (VS-DGNP) is based on GNP with Reinforcement Learning (GNP-RL), the structure of GNP-RL and its learning and evolution mechanisms are first reviewed. An individual of GNP-RL is represented by a directed graph structure and evolved by crossover and mutation, while the node transition rules are learned by reinforcement learning.
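A minimal sketch of how reinforcement learning can tune node transitions follows. It uses a Sarsa-style update over per-node Q-values for candidate subnodes; the flat dictionary layout, the epsilon-greedy selection, and the parameter values are illustrative simplifications and not the GNP-RL algorithm as specified in Mabu et al. (2007).

```python
import random

def select_subnode(q, node, eps, rng):
    """Epsilon-greedy choice among a node's candidate subnodes."""
    if rng.random() < eps:
        return rng.randrange(len(q[node]))
    return max(range(len(q[node])), key=lambda a: q[node][a])

def sarsa_update(q, node, a, reward, next_node, next_a,
                 alpha=0.1, gamma=0.9):
    """On-policy TD update toward reward + discounted next Q-value."""
    q[node][a] += alpha * (reward + gamma * q[next_node][next_a] - q[node][a])

# Two nodes, each with two candidate subnodes, Q-values zero-initialized.
q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
rng = random.Random(1)
a = select_subnode(q, 0, eps=0.1, rng=rng)       # choose at node 0
a_next = select_subnode(q, 1, eps=0.1, rng=rng)  # choose at next node
sarsa_update(q, 0, a, reward=1.0, next_node=1, next_a=a_next)
print(q[0][a])  # 0.1: the rewarded transition is reinforced
```

The key property this illustrates is that learning adjusts transition preferences during an individual's lifetime, while crossover and mutation restructure the graph between generations.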
Proposed method: variable size mechanism of distributed GNP-RL
In Section 2, the basic structure of GNP-RL was explained, where one individual consists of one graph structure. In this section, the distributed graph structure is first introduced, and then a variable size mechanism for the distributed structures is proposed.
Simulations
To confirm the effectiveness of the proposed method, the simulations for determining agents’ behavior using the Tileworld problem (Pollack & Ringuette, 1990) are described in this section.
Conclusions
In this paper, a variable size mechanism of distributed GNP is proposed. To realize this mechanism, new crossover and mutation operators that consider the distributed structures are introduced. In the simulations on the Tileworld problem, the proposed method shows better results than distributed GNP without the variable size mechanism in both training and testing environments. In future work, the contribution of the distributed structure and variable size mechanism is
References (19)
- Banković et al. (2007). Improving network security using genetic algorithm approach. Computers & Electrical Engineering.
- Alfaro-Cid, E., Castillo, P.A., Esparcia, A., Sharman, K., Merelo, J., Prieto, A., Mora, A., Laredo, J. (2008). ...
- Fogel (1994). An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Networks.
- Fogel et al. (1966). Artificial intelligence through simulated evolution.
- Folino et al. (2005). GP ensemble for distributed intrusion detection systems.
- Hirasawa et al. (2008). A double-deck elevator group supervisory control system using genetic network programming. IEEE Transactions on Systems, Man and Cybernetics, Part C.
- Hirasawa, K., Okubo, M., Katagiri, H., Hu, J., Murata, J. (2001). Comparison between genetic network programming (GNP)...
- Holland (1975). Adaptation in natural and artificial systems.
- Iba, H., Sasaki, T. (1999). Using genetic programming to predict financial data. In Proceedings of the congress on...