Continuous probabilistic model building genetic network programming using reinforcement learning
Introduction
Apart from the selection operator, which implements the concept of "survival of the fittest", classical Evolutionary Algorithms (EAs) generally evolve the population of candidate solutions through random variation operators derived from biological evolution, such as crossover and mutation. However, numerous studies report that the results of EAs strongly depend on the configuration of the parameters associated with these stochastic genetic operators, such as the crossover/mutation rate. Suitable parameter settings generally vary from problem to problem, so parameter tuning itself becomes an optimization problem. Meanwhile, the stochastic genetic operators sometimes fail to identify and recombine the building blocks (BBs, defined as high-quality partial solutions) correctly and efficiently, because the building block hypothesis (BBH) [1], [2] is adapted only implicitly; this causes premature convergence and poor evolution ability. These issues motivated a new class of EAs named estimation of distribution algorithms (EDAs) [3], which have received much attention in recent years [4], [5]. As the name implies, an EDA estimates the probability distribution of the population using statistical/machine learning techniques to construct a probabilistic model. The selection operator is still used to choose a set of promising samples for estimating the probability distribution, but EDA replaces the crossover and mutation operators by sampling the model to generate the new population. By explicitly identifying and recombining the BBs through probabilistic modeling, EDAs have outperformed conventional EAs with fixed, problem-independent genetic operators on various optimization problems.
Numerous EDAs have been proposed, and there are mainly three ways to classify them. (1) From the viewpoint of model complexity, EDAs fall into three groups [6]: univariate, pairwise and multivariate models, which identify BBs of different orders. A univariate model assumes there are no interactions between the elements, and therefore builds the probabilistic model from marginal probabilities to identify BBs of order one. Similarly, pairwise and multivariate models use more complex methods to model BBs of order two and higher. Estimating the distribution is not an easy task, and building a more accurate model generally requires higher computational cost [4], [7]. (2) From the perspective of individual structure, EDAs can mainly be classified into two groups: probabilistic model building genetic algorithms (PMBGA) [8] and PMB genetic programming (PMBGP) [9]. PMBGA studies probabilistic modeling over GA's bit-string individual structures, while PMBGP extends EDA to tree structures, which provide more expressive ways to represent solutions for program evolution. (3) With respect to problem domain, EDAs can be grouped into discrete EDAs and continuous EDAs, which solve optimization problems in the discrete domain [4], [8] and the continuous domain [10], [11], [12], [13], respectively.
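The univariate case can be made concrete with a UMDA-style generation loop: truncation selection, per-position marginal estimation, and sampling of the model in place of crossover and mutation. This is a minimal sketch for illustration; the function and parameter names are ours, not from the paper.

```python
import random

def umda_step(population, fitness, n_select, n_sample):
    """One generation of a univariate EDA (UMDA-style) on bit-strings.

    Select promising individuals, estimate independent per-bit marginal
    probabilities, then sample a new population from the model.
    """
    # Truncation selection: keep the n_select fittest individuals.
    selected = sorted(population, key=fitness, reverse=True)[:n_select]
    length = len(selected[0])
    # Univariate model: one marginal probability per position (BBs of order one).
    p = [sum(ind[k] for ind in selected) / n_select for k in range(length)]
    # Sampling the model replaces crossover and mutation.
    return [[1 if random.random() < p[k] else 0 for k in range(length)]
            for _ in range(n_sample)]

# Demo on OneMax, where fitness is simply the number of ones.
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(50)]
for _ in range(30):
    pop = umda_step(pop, sum, n_select=25, n_sample=50)
print(max(sum(ind) for ind in pop))
```

A pairwise or multivariate model would replace the marginal vector `p` with a dependency structure (e.g., a Bayesian network), which is exactly where the higher modeling cost mentioned above comes from.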
A novel EDA, called probabilistic model building genetic network programming (PMBGNP), was recently proposed [14], [15], [16]. PMBGNP is inspired by classical EDAs but uses a distinctive directed graph (network) structure [17], [18], [19], [20], [21] to represent its individuals. It can therefore be viewed as a graph EDA that extends conventional EDAs such as the bit-string-based PMBGA and the tree-structure-based PMBGP. The fundamental points of PMBGNP are:
- 1. PMBGNP allows higher expression ability by means of graph structures than conventional EDAs.
- 2. Due to the unique features of its graph structures, PMBGNP explores the applicability of EDAs to a wider range of problems, such as data mining [22], [14], [23] and problems of controlling agents' behavior (agent control problems) [16], [24], [25], [26].
Previous research has demonstrated that PMBGNP can successfully outperform classical EAs on the above problems.
However, PMBGNP is mainly designed for discrete optimization problems; it cannot directly handle the continuous variables that widely exist in many real-world control problems. The simplest remedy is to discretize the continuous variables into discrete ones, but this causes a loss of solution precision.
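The precision loss from discretization can be made concrete with a tiny quantizer. The interval bounds and level count below are illustrative values, not taken from the paper.

```python
def discretize(x, lo, hi, levels):
    """Map x in [lo, hi] to the nearest of `levels` evenly spaced values."""
    step = (hi - lo) / (levels - 1)
    return lo + round((x - lo) / step) * step

# A hypothetical wheel-speed command in [-10, 10] quantized to 5 levels:
# only {-10, -5, 0, 5, 10} remain representable.
print(discretize(3.7, -10.0, 10.0, 5))  # → 5.0, a quantization error of 1.3
```

Increasing `levels` shrinks the error but enlarges the discrete search space, and doing so for every continuous variable grows the space combinatorially; this trade-off is what motivates handling continuous variables directly.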
This paper is dedicated to extending PMBGNP to continuous optimization in agent control problems. Unlike most existing continuous EDAs, which are developed using incremental learning [10], maximum likelihood estimation [11], histograms [27] or other machine learning techniques [28], [29], [30], [31], [32], the proposed algorithm employs reinforcement learning (RL) [33] techniques, specifically actor-critic (AC), as the mechanism for estimating the probability density functions (PDFs) of the continuous variables. Like most classical continuous EDAs, the proposed algorithm formulates the PDFs of continuous variables as Gaussian distributions, but it applies AC to calculate the temporal-difference (TD) error, which evaluates whether the selection (sampling) of a continuous value is better or worse than expected. Based on the idea of trial-and-error, a scalar reinforcement signal, which decides whether the tendency to select the sampled continuous value should be strengthened or weakened, drives the gradient learning that evolves the Gaussian distribution (μ and σ).
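The flavor of such a TD-driven update of a Gaussian can be sketched as a log-likelihood gradient step scaled by the TD error. This is a generic sketch under our own assumptions (learning rate `alpha`, a lower bound on σ); the paper's exact update rules may differ.

```python
def gaussian_update(mu, sigma, x, td_error, alpha=0.1):
    """Update a 1-D Gaussian N(mu, sigma^2) from one sampled value x.

    The TD error acts as the scalar reinforcement signal: a positive
    value (sampling x turned out better than expected) strengthens the
    tendency to sample near x, a negative value weakens it. The step
    direction follows the log-likelihood gradient of the Gaussian.
    """
    grad_mu = (x - mu) / sigma ** 2
    grad_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    mu = mu + alpha * td_error * grad_mu
    # Keep sigma positive so the PDF stays well-defined.
    sigma = max(1e-3, sigma + alpha * td_error * grad_sigma)
    return mu, sigma

# Better than expected (td_error > 0): mu moves toward the sample, and
# sigma shrinks because x fell within one standard deviation of mu.
m, s = gaussian_update(0.0, 1.0, x=0.5, td_error=+1.0)
print(round(m, 3), round(s, 3))  # → 0.05 0.925
# Worse than expected (td_error < 0): mu moves away from the sample.
m, s = gaussian_update(0.0, 1.0, x=0.5, td_error=-1.0)
print(round(m, 3), round(s, 3))  # → -0.05 1.075
```

In a full actor-critic setup, `td_error` would come from the critic's value estimates over successive states rather than being supplied directly.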
Most importantly, as an extension of PMBGNP, the proposed algorithm can solve agent control problems, whereas conventional continuous EDAs are designed only for function optimization. Accordingly, the applicability of continuous EDAs is extended to a certain degree.
In this paper, the proposed algorithm is applied to control the behavior of a real autonomous robot, the Khepera robot [34], [35], whose wheel speeds and sensor values are continuous variables. To evaluate the performance of this work, various classical algorithms are selected from the literature on standard EAs, EDA and RL for comparison.
The rest of this paper is organized as follows. Section 2 briefly introduces the original framework of PMBGNP in the discrete domain. Section 3 explains the extension of PMBGNP to the continuous domain in detail. The experimental study is presented in Section 4. Finally, we conclude this paper in Section 5.
Directed graph (network) structure
In terms of representation, PMBGNP distinguishes itself from classical EDAs by using a unique directed graph (network) structure to represent its individuals, depicted in Fig. 1. The directed graph structure was originally proposed in a graph-based EA named Genetic Network Programming (GNP) [17], [18], [36]. Three types of nodes form the program (individual) of GNP:
- • Start node: it has no function and no conditional branch; it only determines the node to be executed first.
- • Judgment node: it has its own judgment function and multiple conditional branches; the branch corresponding to the judgment result determines the next node to execute.
- • Processing node: it has its own processing function, which determines the action to be taken, and a single connection to the next node.
Extending PMBGNP to continuous domain
PMBGNP is dedicated to solving discrete optimization problems, since its search space is formulated by the node connections Ci for all i ∈ Nnode, while the function of each node is fixed and not evolved. In other words, for problems whose functions include continuous variables taking any real value within given intervals, discretization must be carried out to transform them into discrete cases for PMBGNP. By discretization, the continuous variables are substituted by a set of discrete candidate values.
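The connection-based search space can be illustrated with a marginal model over node connections: each node's outgoing connection is drawn from a learned distribution while the node functions stay fixed. The data structures below (`conn_prob`, `sample_individual`) are hypothetical illustrations, not the paper's implementation.

```python
import random

def sample_individual(conn_prob):
    """Sample one graph individual from a marginal connection model.

    conn_prob[i] maps each candidate target node j to the probability
    of connecting node i to j. Only the connections are sampled; the
    function assigned to each node is fixed and not evolved.
    """
    individual = {}
    for i, probs in conn_prob.items():
        targets, weights = zip(*probs.items())
        individual[i] = random.choices(targets, weights=weights)[0]
    return individual

# Marginal connection model for a toy 3-node graph (node 0 is the start node).
conn_prob = {
    0: {1: 0.7, 2: 0.3},
    1: {0: 0.1, 2: 0.9},
    2: {0: 0.5, 1: 0.5},
}
random.seed(0)
print(sample_individual(conn_prob))
```

In PMBGNP these connection probabilities are what the probabilistic model estimates from the selected individuals; extending the algorithm to continuous problems additionally requires a PDF over the real-valued arguments each node function would carry.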
Experiments
Unlike most conventional EDAs, which address function optimization problems, PMBGNP-AC is applied to controlling the behavior of an autonomous robot, the Khepera robot [34], [35], a task that can be classified as a kind of Reinforcement Learning (RL) problem [33].
Conclusions
This paper extended a recent EDA named PMBGNP from the discrete domain to continuous cases. This study followed conventional research on continuous EDAs and formulated a novel method to learn Gaussian distributions with a reinforcement learning method, i.e., actor-critic (AC). The resulting PMBGNP-AC method can be thought of as an extension of PBILc, where AC implicitly updates the PDF of the Gaussian distribution by considering multivariate interactions.
References (45)
- et al., Efficient program generation by evolving graph structures with multi-start nodes, Appl. Soft Comput. (2011)
- et al., Compact particle swarm optimization, Inf. Sci. (2013)
- Adaptation in Natural and Artificial Systems, Ann Arbor (1975)
- Genetic Algorithm in Search, Optimization and Machine Learning (1989)
- et al., Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation (2002)
- et al., Linkage problem, distribution estimation, and Bayesian networks, Evol. Comput. (2002)
- et al., MP-EDA: a robust estimation of distribution algorithm with multiple probabilistic models for global continuous optimization
- et al., A survey of optimization by building and using probabilistic models, Comput. Optim. Appl. (2002)
- et al., A Bayesian network approach to program generation, IEEE Trans. Evol. Comput. (2008)
- et al., From recombination of genes to the estimation of distributions. I. Binary parameters
- A survey of probabilistic model building genetic programming
- Extending population-based incremental learning to continuous search spaces
- Optimization by learning and simulation of Bayesian and Gaussian networks, Tech. Report EHU-KZAA-IK-4-99, Intelligent Systems Group, Dept. of Comput. Sci. and Artif. Intell.
- On the convergence of a class of estimation of distribution algorithms, IEEE Trans. Evol. Comput.
- A restart univariate estimation of distribution algorithm: sampling under mixed Gaussian and Lévy probability distribution
- Genetic network programming with estimation of distribution algorithms for class association rule mining in traffic prediction
- Towards the maintenance of population diversity: a hybrid probabilistic model building genetic network programming, Trans. Japan. Soc. Evol. Comput.
- A novel graph-based estimation of distribution algorithm and its extension using reinforcement learning, IEEE Trans. Evol. Comput.
- Comparison between genetic network programming (GNP) and genetic programming (GP)
- A graph-based evolutionary algorithm: genetic network programming (GNP) and its extension using reinforcement learning, Evol. Comput.
- Genetic network programming with simplified genetic operators, Neural Information Processing, Vol. 8227 of Lecture Notes in Computer Science
- Adaptive genetic network programming