Abstract:
|
Temporal difference (TD) methods are theoretically grounded and empirically effective approaches to reinforcement learning problems. In most real-world tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires making crucial representational decisions manually. This thesis investigates evolutionary function approximation, a novel approach that automatically selects function approximator representations enabling efficient individual learning; in effect, it evolves individuals that are better able to learn. I present an instantiation of evolutionary function approximation that combines NEAT, a neuroevolutionary optimization technique, with Q-learning. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. I also present on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing the selection mechanisms that TD methods use to choose individual actions and applying them to select policies for evaluation. I evaluate these contributions with empirical studies in two domains: 1) mountain car, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly, and 2) server job scheduling, a probabilistic domain from the field of autonomic computing. The results demonstrate that evolutionary function approximation can improve the performance of TD methods and that on-line evolutionary computation can improve the performance of evolutionary methods.
|