Genetic Network Programming (GNP) is a recently developed evolutionary computation (EC) method that adopts a directed graph structure as its chromosome. The high expressive ability of the graph structure, the reusability of nodes and the ability to handle partially observable processes enable GNP to deal with many complex problems in dynamic environments. One of the disadvantages of GNP is that its gene structure may become too complicated after generations of training. In the testing phase, it may not adapt to the new environment easily, and its generalisation ability is poor. This is because the implicit memory function of GNP cannot memorise enough information about the environment, which makes it inadequate for agent control problems in highly dynamic environments. Therefore, an explicit memory should be combined with GNP in order to exploit its full potential.
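For concreteness, a minimal sketch of such a directed-graph chromosome is given below in Python. The names (Node, Individual, step) and the field layout are illustrative assumptions, not definitions taken from the thesis: each node carries a function and its outgoing connections, and execution is a walk over the graph.

from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    kind: str               # "judgement" or "processing"
    function_id: int        # which sensor test or action this node performs
    connections: List[int]  # next-node indices; a judgement node has one per outcome

@dataclass
class Individual:
    nodes: List[Node]       # the directed-graph chromosome
    start: int = 0          # execution starts here and follows the connections

def step(ind: Individual, current: int, judgement_result: int) -> int:
    # Node transition: a judgement node branches on its result,
    # a processing node has a single outgoing connection.
    node = ind.nodes[current]
    branch = judgement_result if node.kind == "judgement" else 0
    return node.connections[branch]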
Various studies have revealed that memory schemes can enhance EC in dynamic environments, because storing useful historical information in a memory improves the search ability of EC. Inspired by this idea, a GNP-based explicit memory scheme named Genetic Network Programming with Rule Accumulation (GNP-RA) is proposed in this thesis. Focusing on this issue, the following chapters study how to create action rules and use them for agent control, how to improve performance in non-Markov environments, how to prune bad rules to improve the quality of the rule pool, how to build a rule pool that adapts to environment changes, and how to create more general rules for agent control in dynamic environments. The organisation of this thesis is as follows.
Chapter 1 describes the research background, the problems to be solved and the outline of the thesis. Some classical methods in the domains of evolutionary computation and reinforcement learning are reviewed.
Chapter 2 designs the general framework of GNP-RA, which consists of two stages: the training stage and the testing stage. In the training stage, the node transitions of GNP are recorded as rules and stored in the rule pool generation by generation. In the testing stage, all the rules in the rule pool are used to determine agents' actions through a unique matching-degree calculation. The essential difference between GNP-RA and the basic GNP is that GNP-RA uses a great number of rules to determine agents' actions, whereas GNP can use only the single rule corresponding to its current node transition. Therefore, the generalisation ability of GNP-RA is better than that of GNP. Moreover, GNP-RA can make use of previous experiences in the form of rules to determine the agent's current action, which means that GNP-RA can learn from agents' past behaviours. This also helps the current agent to take correct actions and improves its performance. Simulations on the Tile-world demonstrate that GNP-RA achieves higher fitness values and better generalisation ability.
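As a rough illustration of this rule-accumulation and matching process, the Python sketch below assumes a rule is the chain of judgement results recorded along a node transition, ending in the action taken; the names (Rule, matching_degree, decide_action) and the strength-weighted vote are simplifying assumptions rather than the thesis' exact definitions.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Rule:
    antecedent: Dict[int, int]  # judgement id -> recorded result
    action: int                 # processing executed at the end of the transition
    strength: float             # e.g. fitness of the individual that generated the rule

def matching_degree(rule: Rule, observation: Dict[int, int]) -> float:
    # Fraction of the rule's judgements whose recorded result matches
    # the agent's current observation of the environment.
    matched = sum(1 for j, r in rule.antecedent.items() if observation.get(j) == r)
    return matched / len(rule.antecedent)

def decide_action(rule_pool: List[Rule], observation: Dict[int, int]) -> int:
    # Pick the action whose accumulated rules match the current situation best,
    # weighting each rule by its strength.
    score: Dict[int, float] = {}
    for rule in rule_pool:
        score[rule.action] = score.get(rule.action, 0.0) + \
            rule.strength * matching_degree(rule, observation)
    return max(score, key=score.get)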
Chapter 4 focuses on how to improve the quality of the rule pool, and two improvements are made to realise this. First, during rule generation, reinforcement learning is combined with evolution in order to create more efficient rules: the knowledge obtained while running the program is used to select the proper processing for each judgement, i.e., to form better rules. Second, after rule generation, a unique rule-pruning method using bad individuals is proposed. The basic idea is that good individuals serve as rule generators, while bad individuals serve as monitors that filter out the bad rules generated. Four pruning methods are proposed, and their performances are discussed and compared. After the bad rules are pruned, the good rules stand out and contribute to better performance. Simulation results demonstrate the efficiency and effectiveness of the enhanced rule-based model.
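One way such monitor-based filtering could look is sketched below; the dict-based rule representation and the exact filtering criterion (removing any rule that a bad individual also produces) are illustrative assumptions, since the thesis compares four different pruning methods.

from typing import Dict, List, Tuple

# A rule is represented here as (antecedent, action),
# where antecedent maps a judgement id to its recorded result.
RuleKey = Tuple[Tuple[Tuple[int, int], ...], int]

def rule_key(antecedent: Dict[int, int], action: int) -> RuleKey:
    return (tuple(sorted(antecedent.items())), action)

def prune(good_rules: List[Tuple[Dict[int, int], int]],
          bad_rules: List[Tuple[Dict[int, int], int]]) -> List[Tuple[Dict[int, int], int]]:
    # Good individuals generate candidate rules; bad individuals act as monitors:
    # any rule a bad individual also produces is treated as a bad rule and removed.
    bad = {rule_key(a, act) for a, act in bad_rules}
    return [(a, act) for a, act in good_rules if rule_key(a, act) not in bad]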
Chapter 5 is devoted to improving the adaptability of GNP-RA to environment changes. GNP-RA adapts poorly to dynamic environments because it always uses the old rules in the rule pool for agent control. Generally speaking, the environment keeps changing all the time while the rules in the rule pool remain the same, so the old rules become incompetent to guide agents' actions in the new environments. To solve this problem, Sarsa-learning is used as a tool to update the old rules so that they can cope with previously unencountered situations in the new environments. The basic idea is that, after evolution ends, the elite individual of GNP-RA continues to execute Sarsa-learning to update the Q table. As the Q table changes, the node transitions change in accordance with the environment, producing new rules. These rules are used to update the rule pool so that it can adapt to the changing environments.
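A minimal sketch of the underlying Sarsa update is given below; the integer state/action encoding and the learning-rate and discount values are illustrative assumptions, not parameters taken from the thesis.

from typing import Dict, Tuple

def sarsa_update(q: Dict[Tuple[int, int], float],
                 state: int, action: int, reward: float,
                 next_state: int, next_action: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
    # Sarsa: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)).
    # As the Q table changes, the elite individual's node transitions change too,
    # yielding new rules that refresh the rule pool.
    td_target = reward + gamma * q.get((next_state, next_action), 0.0)
    td_error = td_target - q.get((state, action), 0.0)
    q[(state, action)] = q.get((state, action), 0.0) + alpha * td_error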
Chapter 6 tries to improve the generalisation ability [the remainder of the abstract is truncated in the source BibTeX entry].