
Applied Soft Computing

Volume 27, February 2015, Pages 457-467

Continuous probabilistic model building genetic network programming using reinforcement learning

https://doi.org/10.1016/j.asoc.2014.10.023

Highlights

  • This paper proposes a novel continuous estimation of distribution algorithm (EDA).

  • A recent EDA named PMBGNP is extended from the discrete domain to the continuous domain.

  • Reinforcement Learning (RL) is applied to construct the probabilistic model.

  • Experiments on real mobile robot control show the superiority of the proposed algorithm.

  • It bridges the gap between EDA and RL.

Abstract

Recently, a novel probabilistic model-building evolutionary algorithm (so-called estimation of distribution algorithm, or EDA), named probabilistic model building genetic network programming (PMBGNP), has been proposed. PMBGNP uses graph structures for its individual representation, which provides higher expression ability than classical EDAs. Hence, it extends EDAs to solve a wider range of problems, such as data mining and agent control. This paper proposes a continuous version of PMBGNP for continuous optimization in agent control problems. Unlike other continuous EDAs, the proposed algorithm evolves the continuous variables by reinforcement learning (RL). We compare its performance with several state-of-the-art algorithms on a real mobile robot control problem. The results show that the proposed algorithm outperforms the others with statistically significant differences.

Introduction

Besides the selection operator based on the concept of “survival of the fittest”, classical Evolutionary Algorithms (EAs) generally evolve the population of candidate solutions through random variation inspired by biological evolution, such as crossover and mutation. However, numerous studies report that the results of EAs strongly depend on the settings of the parameters associated with these stochastic genetic operators, such as the crossover/mutation rate. Suitable parameter settings generally vary from problem to problem, so parameter tuning itself becomes an optimization problem. Meanwhile, the stochastic genetic operators sometimes fail to identify and recombine the building blocks (BBs, i.e., high-quality partial solutions) correctly and efficiently, owing to the implicit adaptation assumed by the building block hypothesis (BBH) [1], [2], which causes premature convergence and poor evolution ability. These issues motivated a new class of EAs named estimation of distribution algorithms (EDAs) [3], which have received much attention in recent years [4], [5]. As the name implies, an EDA estimates the probability distribution of the population, using statistical/machine learning to construct a probabilistic model. While an EDA retains the selection operator to choose a set of promising samples for estimating the distribution, it replaces the crossover and mutation operators by sampling the model to generate the new population. By explicitly identifying and recombining BBs through probabilistic modeling, EDAs have been shown to outperform conventional EAs with fixed, problem-independent genetic operators on various optimization problems.
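As a minimal illustration of this estimate-and-sample loop, the sketch below implements a univariate (UMDA-style) EDA on binary strings; the OneMax fitness function and all parameter values are placeholders for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_bits, pop_size, n_select = 20, 50, 10

def fitness(x):
    return x.sum()  # placeholder problem: OneMax

p = np.full(n_bits, 0.5)  # univariate model: one marginal probability per bit
for generation in range(30):
    # Sampling the model replaces crossover/mutation.
    pop = (rng.random((pop_size, n_bits)) < p).astype(int)
    # Selection keeps the promising samples for estimation.
    best = pop[np.argsort([fitness(x) for x in pop])[-n_select:]]
    # Estimation: refit the marginals to the selected samples,
    # clipping to retain a little exploration.
    p = np.clip(best.mean(axis=0), 0.05, 0.95)

print(p.round(2))  # the marginals should drift toward 1 on OneMax
```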

Numerous EDAs have been proposed, and there are mainly three ways to classify them. (1) From the model complexity viewpoint, EDAs can be classified into three groups [6]: univariate, pairwise and multivariate models, which identify BBs of different orders. A univariate model assumes there are no interactions between the elements,1 hence constructing the probabilistic model from marginal probabilities to identify BBs of order one. Similarly, pairwise and multivariate models use more complex methods to model BBs of order two and higher. Estimating the distribution is not an easy task, and building a more accurate model generally requires higher computational cost [4], [7]. (2) From the perspective of individual structures, EDAs can mainly be classified into two groups: probabilistic model building genetic algorithms (PMBGA) [8] and PMB genetic programming (PMBGP) [9]. PMBGA studies probabilistic modeling using GA's bit-string individual structures, while PMBGP extends EDA to tree structures, which provide more complex ways to represent solutions for program evolution. (3) By problem domain, EDAs can be grouped into discrete EDAs and continuous EDAs, which solve optimization problems in the discrete domain [4], [8] and the continuous domain [10], [11], [12], [13], respectively.
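In symbols, the three model classes differ in how they factorize the joint distribution of a candidate solution x = (x1, …, xn); the factorizations below are the standard forms from the EDA literature, where the chain and Bayesian-network structures are only representative examples of pairwise and multivariate models:

```latex
% Univariate model (order-one BBs): full independence
p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i)

% Pairwise model (order-two BBs): e.g., a chain of conditional pairs
p(\mathbf{x}) = p(x_1) \prod_{i=2}^{n} p(x_i \mid x_{i-1})

% Multivariate model (higher-order BBs): e.g., a Bayesian network
p(\mathbf{x}) = \prod_{i=1}^{n} p\big(x_i \mid \mathrm{Pa}(x_i)\big)
```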

A novel EDA, called probabilistic model building genetic network programming (PMBGNP), was recently proposed [14], [15], [16]. PMBGNP is inspired by classical EDAs; however, a distinctive directed graph (network) structure [17], [18], [19], [20], [21] is used to represent its individuals. Hence, it can be viewed as a graph EDA that extends conventional EDAs such as the bit-string-based PMBGA and the tree-structure-based PMBGP. The fundamental points of PMBGNP are:

  • 1. PMBGNP allows higher expression ability by means of its graph structures than conventional EDAs.

  • 2. Due to the unique features of its graph structures, PMBGNP extends the applicability of EDAs to a wider range of problems, such as data mining [22], [14], [23] and problems of controlling agents' behavior (agent control problems) [16], [24], [25], [26].

Previous research has demonstrated that PMBGNP can successfully outperform classical EAs on the above problems.

However, PMBGNP is mainly designed for discrete optimization problems. In other words, it cannot directly handle the continuous variables that widely exist in many real-world control problems. The simplest workaround is to apply a discretization process that transforms the continuous variables into discrete ones; however, this causes a loss of solution precision.
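To make the precision issue concrete, here is a minimal sketch of uniform discretization; the paper does not prescribe a particular scheme, so the function below is purely illustrative.

```python
import numpy as np

def discretize(value, low, high, n_bins):
    """Map a continuous value in [low, high] to the center of one of
    n_bins uniform bins; the rounding is where precision is lost."""
    value = np.clip(value, low, high)
    width = (high - low) / n_bins
    index = min(int((value - low) / width), n_bins - 1)
    return low + (index + 0.5) * width

# Example: a wheel speed in [-10, 10] reduced to 8 discrete levels.
print(discretize(3.7, -10.0, 10.0, 8))  # -> 3.75, not 3.7
```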

This paper is dedicated to extending PMBGNP to continuous optimization in agent control problems. Unlike most existing continuous EDAs, which are developed using incremental learning [10], maximum likelihood estimation [11], histograms [27] or other machine learning techniques [28], [29], [30], [31], [32], the proposed algorithm employs reinforcement learning (RL) [33] techniques, specifically the actor-critic (AC) method, as the mechanism to estimate the probability density functions (PDFs) of the continuous variables. Like most classical continuous EDAs, the proposed algorithm formulates the PDFs of the continuous variables as Gaussian distributions N(μ, σ²); however, it applies AC to calculate the temporal-difference (TD) error, which evaluates whether the selection (sampling) of a continuous value is better or worse than expected. Based on the idea of trial and error, a scalar reinforcement signal, which decides whether the tendency to select the sampled continuous value should be strengthened or weakened, drives the gradient learning that evolves the Gaussian distribution (μ and σ).
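To convey the general idea, the following is a minimal sketch of an actor-critic update of a Gaussian PDF on a toy one-dimensional task. The log-likelihood gradients are the standard ones for a Gaussian policy; the toy reward, the single-state critic and all learning rates are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0                 # parameters of the Gaussian PDF N(mu, sigma^2)
value = 0.0                          # critic's estimate of the expected return
alpha_mu, alpha_sigma = 0.05, 0.02   # actor learning rates (illustrative)
beta, gamma = 0.1, 0.9               # critic learning rate and discount factor

def reward_of(x):
    """Toy stand-in for the control task: reward peaks at x = 2."""
    return -(x - 2.0) ** 2

for step in range(2000):
    x = rng.normal(mu, sigma)        # sample a continuous value from the PDF
    r = reward_of(x)
    # TD error: was the sampled value better or worse than expected?
    delta = r + gamma * value - value
    value += beta * delta            # critic update
    # Actor update: gradient of log N(x; mu, sigma) scaled by the TD error,
    # strengthening (delta > 0) or weakening (delta < 0) the tendency
    # to sample values near x.
    mu += alpha_mu * delta * (x - mu) / sigma ** 2
    sigma += alpha_sigma * delta * ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    sigma = max(sigma, 1e-2)         # keep the PDF well defined

print(f"mu ~ {mu:.2f}, sigma ~ {sigma:.2f}")  # mu should drift toward 2
```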

Most importantly, as an extension of PMBGNP, the proposed algorithm is primarily designed to solve agent control problems, whereas conventional continuous EDAs address only function optimization problems. Accordingly, the applicability of continuous EDAs is broadened to a certain degree.

In this paper, the proposed algorithm is applied to control the behavior of a real autonomous robot, the Khepera robot [34], [35], in which the robot's wheel speeds and sensor values are continuous variables. To evaluate the performance of this work, various classical algorithms are selected for comparison from the literature on standard EAs, EDAs and RL.

The rest of this paper is organized as follows. Section 2 briefly introduces the original framework of PMBGNP in the discrete domain. Section 3 explains in detail how PMBGNP is extended to the continuous domain. The experimental study is presented in Section 4. Finally, we conclude this paper in Section 5.

Section snippets

Directed graph (network) structure

Explicitly, PMBGNP distinguishes itself from classical EDAs by using a unique directed graph (network) structure to represent its individuals, as depicted in Fig. 1. The directed graph structure was originally proposed in a graph-based EA named Genetic Network Programming (GNP) [17], [18], [36]. Three types of nodes form the program (individual) of GNP (a minimal code sketch follows the list):

  • Start node: it has no function or conditional branch.

  • Judgment node: it has its own judgment function and conditional branches that determine the next node to be executed.
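For readers who want a feel for this representation, below is a minimal sketch of a GNP-style directed-graph individual; the class names, the judge/act callbacks and the toy wiring are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str          # "start", "judgment" or "processing"
    function_id: int   # node function: fixed, not evolvable
    branches: list = field(default_factory=list)  # outgoing connections C_i

@dataclass
class GNPIndividual:
    start: int         # index of the start node
    nodes: list        # the directed graph (network)

    def run(self, judge, act, steps=10):
        """Execute the graph: a judgment node selects one of its branches
        via judge(), a processing node performs act() and moves on."""
        current = self.nodes[self.start].branches[0]
        for _ in range(steps):
            node = self.nodes[current]
            if node.kind == "judgment":
                current = node.branches[judge(node.function_id)]
            else:
                act(node.function_id)
                current = node.branches[0]

# Toy wiring: one judgment node routing control to two processing nodes.
ind = GNPIndividual(start=0, nodes=[
    Node("start", -1, branches=[1]),
    Node("judgment", 0, branches=[2, 3]),  # two conditional branches
    Node("processing", 0, branches=[1]),   # loop back to the judgment node
    Node("processing", 1, branches=[1]),
])
ind.run(judge=lambda f: 0, act=lambda f: print("action", f), steps=4)
```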

Extending PMBGNP to continuous domain

PMBGNP is dedicated to solving discrete optimization problems, since its search space is formulated by the node connections Ci for all i ∈ Nnode, while the function of each node is fixed and not evolvable. In other words, for problems that include functions with continuous variables taking any real value within given intervals, discretization must be carried out to transform the problem into a discrete one for PMBGNP. By discretization, the continuous variables are substituted by a set of discrete values.
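As a rough illustration of this discrete search space, the sketch below keeps a probability table over node connections and re-estimates it from selected individuals by frequency counting; this marginal-style estimator is a simplification for illustration, not PMBGNP's exact model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes = 5

# P[i, j]: probability that node i connects to node j.
P = np.full((n_nodes, n_nodes), 1.0 / n_nodes)

def sample_connections():
    """Sample one individual's connections C_i from the current model."""
    return [rng.choice(n_nodes, p=P[i]) for i in range(n_nodes)]

def estimate(selected, smoothing=0.1):
    """Re-estimate P from the promising (selected) individuals;
    smoothing keeps every connection reachable."""
    counts = np.full((n_nodes, n_nodes), smoothing)
    for connections in selected:
        for i, j in enumerate(connections):
            counts[i, j] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

# One model-building step over a toy set of "promising" individuals.
P = estimate([sample_connections() for _ in range(10)])
```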

Experiments

Unlike most conventional EDAs, which address function optimization problems, PMBGNP-AC is applied to controlling the behavior of an autonomous robot, the Khepera robot [34], [35], a task that can be classified as a kind of Reinforcement Learning (RL) problem [33].

Conclusions

This paper extended a recent EDA named PMBGNP from the discrete domain to continuous cases. Following conventional research on continuous EDAs, this study formulated a novel method to learn the Gaussian distribution N(μ, σ²) by a reinforcement learning method, i.e., actor-critic (AC). The resulting PMBGNP-AC method can be thought of as an extension of PBILc, where AC implicitly updates the Gaussian PDF by considering multivariate interactions.

References (45)

  • S. Mabu et al.

    Efficient program generation by evolving graph structures with multi-start nodes

    Appl. Soft Comput.

    (2011)
  • F. Neri et al.

    Compact particle swarm optimization

    Inf. Sci.

    (2013)
  • J.H. Holland

Adaptation in Natural and Artificial Systems, Ann Arbor

    (1975)
  • D.E. Goldberg

Genetic Algorithms in Search, Optimization and Machine Learning

    (1989)
  • P. Larrañaga et al.

Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation

    (2002)
  • M. Pelikan et al.

Linkage problem, distribution estimation, and Bayesian networks

    Evol. Comput.

    (2002)
  • J.-H. Zhong et al.

    MP-EDA: a robust estimation of distribution algorithm with multiple probabilistic models for global continuous optimization

  • M. Pelikan et al.

    A survey of optimization by building and using probabilistic models

    Comput. Optim. Appl.

    (2002)
  • Y. Hasegawa et al.

A Bayesian network approach to program generation

    IEEE Trans. Evol. Comput.

    (2008)
  • H. Mühlenbein et al.

    From recombination of genes to the estimation of distributions. I. Binary parameters

  • Y. Shan et al.

    A survey of probabilistic model building genetic programming

  • M. Sebag et al.

    Extending population-based incremental learning to continuous search spaces

  • P. Larrañaga et al.

    Optimization by learning and simulation of Bayesian and Gaussian networks, Tech. Report EHU-KZAA-IK-4-99, Intelligent Systems Group, Dept. of Comput. Sci. and Artif. Intell.

    (1999)
  • Q. Zhang et al.

    On the convergence of a class of estimation of distribution algorithms

    IEEE Trans. Evol. Comput.

    (2004)
  • Y. Wang et al.

A restart univariate estimation of distribution algorithm: sampling under mixed Gaussian and Lévy probability distribution

  • X. Li et al.

    Genetic network programming with estimation of distribution algorithms for class association rule mining in traffic prediction

  • X. Li et al.

    Towards the maintenance of population diversity: a hybrid probabilistic model building genetic network programming

    Trans. Japan. Soc. Evol. Comput.

    (2010)
  • X. Li et al.

    A novel graph-based estimation of distribution algorithm and its extension using reinforcement learning

    IEEE Trans. Evol. Comput.

    (2014)
  • K. Hirasawa et al.

    Comparison between genetic network programming (GNP) and genetic programming (GP)

  • S. Mabu et al.

    A graph-based evolutionary algorithm: genetic network programming (GNP) and its extension using reinforcement learning

    Evol. Comput.

    (2007)
  • X. Li et al.

    Genetic network programming with simplified genetic operators

    Neural Information Processing, Vol. 8227 of Lecture Notes in Computer Science

    (2013)
  • X. Li et al.

    Adaptive genetic network programming
