Model-driven regularization approach to straight line program genetic programming
Introduction
Inductive inference (Angluin, Smith, 1983, Gori, Maggini, Martinelli, Soda, 1998, Shaoning, Kasabov, 2004, Tenenbaum, Griffiths, Kemp, 2006) is one of the main fields in Machine Learning, and it can be defined as the process of hypothesizing a general rule from examples. The methods used in inductive inference span a variety of Machine Learning tools, including neural networks, regression and decision trees, support vector machines, Bayesian networks, probabilistic finite state machines and many other statistical techniques. Given a sample set from an unknown process, the problem of finding a model capable of predicting correct values for new examples has applications in fields as diverse as economics, electronic design, game playing and physical processes. When dealing with real-world problems, the sample data are generally obtained through measurements that are often corrupted by noise. In addition, the distribution according to which the examples are generated is usually unknown. This situation is very common when trying to solve inductive learning problems in the context of symbolic regression, and it must be considered and modeled in its full complexity to produce reasonable models. Symbolic regression can be understood as the problem of finding a mathematical formula that fits a set of data. For a long time symbolic regression was a human task, but the advent of computers made it feasible to explore the huge search space in which the regression process usually takes place. Genetic Programming (GP) can be seen as a symbolic regression strategy by means of evolutionary algorithms. This idea was proposed by Koza, to whom both the term and the concept are due (see Koza, 1992).
In recent years, GP has been applied to a wide range of situations to solve both unsupervised and supervised learning problems, and it has become a powerful tool in the Knowledge Discovery and Data Mining domain (e.g., Freitas, 2002), including the emergent field of Big Data analysis (see Castelli, Vanneschi, Manzoni, & Popovič, 2015). Central unsupervised learning tasks such as clustering have been approached using GP (see Bezdek, Boggavarapu, Hall, Bensaid, 1994, Falco, Tarantino, Cioppa, Fontanella, 2004, Folino, Pizzuti, Spezzano, 2008, Jie, Xinbo, Li-cheng, 2003). Supervised classification by evolving selection rules is another avenue in which GP has achieved remarkable success, as shown, for example, in Carreño, Leguizamón, and Wagner (2007), Cano, Herrera, and Lozano (2007), Chien, Yang, and Lin (2003), Freitas (1997), Hennessy, Madden, Conroy, and Ryder (2005) and Kuo, Hong, and Chen (2007). Further examples of GP applications include medicine and biology problems (Aslam, Zhu, Nandi, 2013, Bojarczuk, Lopes, Freitas, 2000, Bojarczuk, Lopes, Freitas, Michalkiewicz, 2004, Castelli, Vanneschi, Silva, 2014), feature extraction methods (Krawiec, 2002, Smith, Bull, 2005), database clustering and rule extraction (Wedashwara, Mabu, Obayashi, & Kuremoto, 2015), and the generation of hybrid multi-level predictors for function approximation and regression analysis (Tsakonas & Gabrys, 2012). Specific applications of GP to inductive learning problems can be found in relatively old papers (Oakley, 1994, Poli, Cagnoni, 1997).
The general procedure of GP consists in the evolution of a population of computer programs, each of them computing a certain function, with the aim of finding the one that best describes the target function. These computer programs are built from a set of functions and a set of terminals consisting of variables and constants. In the evolutionary process, the fitness function evaluates the goodness of each member of the population by measuring the empirical error over the sample set. In the presence of noise, and to prevent other causes of overfitting (such as the use of a very complex model), this fitness function must be regularized with some term that usually depends on the complexity of the model. Regularization is thus a central problem related to generalization. By generalization we mean that the empirical error must converge to the expected true error when the number of examples increases. This notion of generalization roughly agrees with the informal use of the term in GP (good performance over unseen examples) but captures the nature of the problem in the learning theory setting. The problem of generalization is addressed, for example, in Tackett and Carmi (1994) and in Cavaretta and Chellapilla (1999). In Keijzer and Babovic (2000), ensemble methods that can improve the generalization error in GP are discussed. The specific problem of regularization is extensively treated for polynomials in Nikolaev and Iba (2001) and Nikolaev, de Menezes, and Iba (2002). Recent work regarding regularization in GP can be found in Ni and Rockett (2015b), where a vicinal risk regularization technique is explored. A detailed description of other GP regularization strategies can be found in Ni and Rockett (2015a), where Tikhonov regularization, in conjunction with node count as a general complexity measure in multiobjective GP, is proposed.
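The regularized fitness described above can be sketched in a few lines. This is an illustrative minimal form (empirical error plus a complexity-dependent penalty); the function names, the complexity measure and the weight `lam` are assumptions for illustration, not the specific regularization term derived in the paper.

```python
# Minimal sketch of a regularized fitness: empirical error over the
# sample plus a penalty that grows with model complexity. `lam` and
# the complexity measure are illustrative assumptions.

def regularized_fitness(predict, model_complexity, sample, lam=0.1):
    """Mean squared error over the sample plus a complexity penalty."""
    mse = sum((predict(x) - y) ** 2 for x, y in sample) / len(sample)
    return mse + lam * model_complexity

# A simple model that fits the data beats a more complex one that
# fits almost as well, once the penalty is taken into account:
sample = [(x / 10.0, (x / 10.0) ** 2) for x in range(10)]
f_simple = lambda x: x ** 2           # complexity 1, fits perfectly
f_complex = lambda x: x ** 2 + 0.01   # complexity 5, slightly off
print(regularized_fitness(f_simple, 1, sample) <
      regularized_fitness(f_complex, 5, sample))  # True
```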
Most work describing GP strategies for solving symbolic regression problems employs trees as the data structure for representing programs. Other methodologies take an instruction-based approach, like the matrix representation given in Li, Wang, Lee, and Leung (2008) or the more classical Cartesian Genetic Programming (CGP) of Miller and Thomson (2000). In this paper, we propose the use of straight line programs (SLPs) as the data structure to evolve in GP. The SLP structure performs well when solving symbolic regression problem instances, as shown in Alonso, Montana, and Puente (2008). The main difference between SLP GP and CGP lies in the implementation of the crossover operator, which in the case of SLPs is designed to interchange subgraphs, as described in Section 3. While GP-trees are expression trees of formulas, SLPs correspond to expression dags (directed acyclic graphs) in which precomputed results can be reused. This makes the SLP data structure more compact and, usually, smaller than a tree, and consequently easier to evaluate. Moreover, there is a canonical transformation mapping GP-trees onto SLPs (see Alonso, Montana, and Puente, 2009 for a detailed explanation of the relationships between SLPs and GP-trees). This transformation preserves the size and other complexity measures. For this reason, conclusions about SLP performance can be extended to GP-tree performance, at least in the context of generalization and regularization.
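The reuse of precomputed results that distinguishes an SLP from a tree can be sketched as follows. The encoding (instruction triples that refer to earlier results by index) is an illustrative assumption, not the paper's implementation.

```python
# Illustrative sketch: a straight line program as a list of
# instructions over the terminals {x, 1}. Each instruction may refer
# to any earlier result, so common subexpressions are computed once,
# as in an expression dag.

def eval_slp(instructions, x):
    """Evaluate an SLP; `values` holds every intermediate result."""
    values = {"x": x, "1": 1.0}
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    for idx, (op, a, b) in enumerate(instructions):
        values[f"u{idx}"] = ops[op](values[a], values[b])
    return values[f"u{len(instructions) - 1}"]

# (x + 1)^2 reuses u0 = x + 1 instead of recomputing it, which a
# tree representation cannot do:
slp = [("+", "x", "1"),     # u0 = x + 1
       ("*", "u0", "u0")]   # u1 = u0 * u0
print(eval_slp(slp, 2.0))   # 9.0
```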
This paper is primarily concerned with statistical regularization methods that can be applied to a wide family of models. In particular, we study a complexity measure for families of data structures representing programs known as the Vapnik–Chervonenkis dimension (VCD). This measure, which belongs to statistical learning theory, usually appears in conjunction with the Probably Approximately Correct approach to supervised learning, commonly referred to in the literature as the PAC model (see Vapnik, 1998 and Vapnik & Chervonenkis, 1974). We consider families of SLPs constructed from a set of Pfaffian functions (solutions of triangular systems of first order partial differential equations with polynomial coefficients; the formal definition is given in Section 5). As important examples, polynomials, exponential functions, trigonometric functions on some particular intervals and, in general, analytic algebraic functions are all Pfaffian. The main result of this paper is summarized in Theorem 1, where an upper bound for the VCD of a family of SLPs using Pfaffian functions is found. This upper bound is polynomial in some parameters, in particular in the non-scalar length of the SLPs. This implies that the length of the SLPs can be arbitrary if the excess is caused by addition and/or subtraction operations. Based on this result, we propose a method for selecting models in SLP GP for the case in which the admitted operations are all Pfaffian. In general, analyticity of the operators is a very desirable property in numerical computations, leading to more robust computer programs and providing a higher classification capacity. There are even well-founded proposals to replace the traditional protected division in GP by analytic operators, providing much better experimental results, as shown in Ni, Drieberg, and Rockett (2013). This is another strong motivation for using Pfaffian operations.
The paper is organized as follows. Section 2 contains some basic concepts concerning statistical regression and the model selection criteria used in this paper, along with the general definition of the VCD of a family of sets. In Section 3 we describe the SLP data structure and introduce, via a technical lemma, the concept of a universal SLP. Section 4 presents the main traits of the SLP GP paradigm. In Section 5 we give basic definitions for Pfaffian operators and summarize previous technical results that are used in the proof of our main theorem, which can be found in Section 6. There, we give an upper bound for the VCD of families of SLPs that use only Pfaffian functions as operators. Section 7 shows the results of an extensive experimentation phase on a variety of symbolic regression problems. We also provide a comparative analysis between our method and two other well-known statistical regularization strategies, in which the complexity of the models is estimated by the number of free parameters. Finally, Section 8 contains some conclusions and future work.
Section snippets
Supervised learning and regression
Genetic Programming can be seen as a direct evolution method of computer programs for inductive learning. Inductive GP can be considered a specialization of GP, in that it uses the framework of the latter to solve inductive learning problems. These problems are, in general, search problems, where the aim is to find the best model from a finite set of observed data. Note that the best model might not be the one that perfectly fits the data, since this could lead to overfitting
Straight line programs
Straight line programs have a long history in the field of Computational Algebra. A particular class of SLPs, known in the literature as arithmetic circuits, constitutes the underlying computation model in Algebraic Complexity Theory (Bürgisser, Clausen, & Shokrollahi, 1997). Arithmetic circuits with the standard arithmetic operations are the natural model of computation for studying the computational complexity of algorithms solving problems with an algebraic flavor. They have been
Straight line program genetic programming
The general definition of GP as an evolution method makes it independent of the data structure used for the representation of the evolved programs. Usually these programs are represented by directed trees with ordered branches (see Koza, 1992). Linear Genetic Programming (LGP) is a GP variant that evolves sequences of instructions from an imperative programming language or from a machine language. The term linear refers to the data structure used for the program representation, constituted by
Pfaffian functions and VCD of formulas
In this section we introduce some basic notions, tools and known results concerning the geometry of sets defined by Boolean combinations of sign conditions over Pfaffian functions, also called semi-Pfaffian sets in the mathematical literature (see Gabrielov and Vorobjov, 2004 for a complete survey).
Definition 5. Let $U \subseteq \mathbb{R}^n$ be an open domain. A Pfaffian chain of order $q \ge 1$ and degree $D \ge 1$ in $U$ is a sequence of real analytic functions $f_1, \ldots, f_q$ on $U$ satisfying a system of differential equations
$$\frac{\partial f_i}{\partial x_j}(x) = P_{ij}\bigl(x, f_1(x), \ldots, f_i(x)\bigr), \qquad 1 \le i \le q,\ 1 \le j \le n,$$
where the $P_{ij}$ are polynomials of degree at most $D$.
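A standard illustration (not part of this excerpt): the exponential and reciprocal functions form Pfaffian chains of order one.

```latex
% f_1(x) = e^x is a Pfaffian chain of order q = 1 and degree D = 1
% on U = \mathbb{R}, since
\frac{d f_1}{d x}(x) = f_1(x) = P_{11}(x, f_1(x)),
  \qquad P_{11}(x, y) = y .
% Similarly, f_1(x) = 1/x is Pfaffian on U = (0, \infty), with
\frac{d f_1}{d x}(x) = -f_1(x)^2,
  \qquad P_{11}(x, y) = -y^2 \quad (q = 1,\ D = 2).
```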
VCD bounds of straight line programs with Pfaffian instructions
In this section we show how the above mentioned results can be extended to the family of SLPs that use Pfaffian operators. As we shall shortly see, our upper bound does not depend on the total length of the SLPs. In previous results about the VC dimension of programs or families of computation trees, a bound depending on the number of steps of the program execution or on the height of the computation tree was needed. In our case, we only need a bound for the number of non-scalar SLP instructions
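The distinction between scalar and non-scalar instructions can be made concrete with a small sketch. The instruction encoding and the classification of `exp` as a non-scalar Pfaffian operation are illustrative assumptions; only the principle (additions and subtractions are excluded from the count) comes from the text.

```python
# Sketch: the bound discussed in the text depends only on the number
# of non-scalar instructions, i.e. everything except additions and
# subtractions. Instructions are (operator, operand, operand) triples
# referring to earlier results by index (an assumed encoding).

SCALAR_OPS = {"+", "-"}  # scalar instructions: not counted

def non_scalar_length(instructions):
    """Count multiplications, divisions and Pfaffian operator calls."""
    return sum(1 for (op, *_) in instructions if op not in SCALAR_OPS)

slp = [("+", "x", "1"),      # scalar
       ("-", "u0", "x"),     # scalar
       ("*", "u0", "u1"),    # non-scalar
       ("exp", "u2", None)]  # non-scalar (Pfaffian operator)
print(non_scalar_length(slp))  # 2: only "*" and "exp" count
```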
Experimentation
In this section we present the results obtained after an extensive experimental phase, in which symbolic regression problem instances were solved using SLPs over a fixed set of Pfaffian functions as the underlying structure of the model. We implemented a GP algorithm with the genetic operators described in Section 3 and the fitness regularization function from Eq. (7) (the model known as structural risk minimization). Although Theorem 1 presents an upper bound for the VCD of our families of
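Eq. (7) is not reproduced in this excerpt. As a hedged illustration of a structural-risk-minimization fitness, the sketch below uses the VC-based penalization factor from Cherkassky's model selection work for regression (cited in the references), in which the empirical error is inflated by a factor that grows with the VCD estimate; whether this is the exact form of Eq. (7) is an assumption.

```python
import math

# Structural-risk-style fitness (assumed form, after Cherkassky):
# the empirical error is divided by 1 - sqrt(p - p*ln(p) + ln(n)/(2n))
# with p = h/n, where h is a VC dimension estimate of the model
# family and n the sample size. When the bound is vacuous (the factor
# is not positive), the model is rejected outright.

def srm_fitness(empirical_error, h, n):
    """Penalized empirical error; lower is better."""
    p = h / n
    arg = p - p * math.log(p) + math.log(n) / (2 * n)
    if arg < 0.0 or arg >= 1.0:   # bound vacuous for this h, n
        return float("inf")
    return empirical_error / (1.0 - math.sqrt(arg))

# With equal empirical error, the higher-VCD model scores worse:
print(srm_fitness(0.1, 5, 100) < srm_fitness(0.1, 20, 100))  # True
```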
Conclusions and future work
Straight line programs constitute a promising structure for representing models in the GP framework. In this paper we control the complexity of populations of SLPs while they evolve in order to find good models for solving symbolic regression problem instances. The evolving structure is constructed from a set of functions that contains Pfaffian operators. We have considered the VCD as a complexity measure. We have found a theoretical upper bound for the VCD of families of SLPs over Pfaffian
Acknowledgment
This article significantly extends Alonso, Montaña, and Borges (2013), a paper that was presented at the 5th International Joint Conference on Computational Intelligence (IJCCI 2013), where it received the Best Paper Award. This work was partially supported by project BASMATI (TIN2011-27479-C04-04) of Programa Nacional de Investigación and project PAC::LFO (MTM2014-55262-P) of Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia, Ministerio de Ciencia e Innovación.
References (66)
- Feature generation using genetic programming with comparative partner selection for diabetes classification. Expert Systems with Applications (2013)
- On computing the determinant in small parallel time using a small number of processors. Information Processing Letters (1984)
- A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artificial Intelligence in Medicine (2004)
- Evolutionary stratified training set selection for extracting classification rules with trade off precision-interpretability. Data & Knowledge Engineering (2007)
- Semantic genetic programming for fast and accurate data knowledge discovery. Swarm and Evolutionary Computation (2016)
- Prediction of the unified Parkinson's disease rating scale assessment using a genetic programming system with geometric semantic genetic operators. Expert Systems with Applications (2014)
- An improved genetic programming technique for the classification of Raman spectra. Knowledge-Based Systems (2005)
- Polynomial bounds for VC dimension of sigmoidal and general Pfaffian Neural Networks. Journal of Computer and System Sciences (1997)
- Gradient: Grammar-driven genetic programming framework for building multi-component, hierarchical predictive systems. Expert Systems with Applications (2012)
- Statistical prediction information. Annals of the Institute of Statistical Mathematics (1970)
- KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing
- Model complexity control in straight line program genetic programming
- Straight line programs: a new linear genetic programming approach. Proceedings of the International Conference on Tools with Artificial Intelligence
- A new linear genetic programming approach based on straight line programs: some theoretical and experimental aspects. International Journal on Artificial Intelligence Tools
- Inductive inference: theory and methods. ACM Computing Surveys
- Model selection and error estimation. Machine Learning
- Bayesian theory
- Genetic algorithm guided clustering. Proceedings of the 1st IEEE Conference on Evolutionary Computation
- Genetic programming for knowledge discovery in chest pain diagnosis. IEEE Engineering in Medicine and Biology Magazine
- Linear genetic programming
- Algebraic Complexity Theory
- Evolution of classification rules for comprehensible knowledge discovery
- Data mining using genetic programming: the implications of parsimony on generalization error
- Model complexity control for regression using VC generalization bounds. IEEE Transactions on Neural Networks
- Comparison of model selection for regression. Neural Computation
- Learning from data: concepts, theory, and methods
- Generating effective classifiers with supervised learning of genetic programming
- Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evolutionary Computation
- An innovative approach to genetic programming-based clustering
- Training distributed GP ensemble with a selective algorithm based on clustering and pruning for pattern classification. IEEE Transactions on Evolutionary Computation
- A genetic programming framework for two data mining tasks: classification and generalized rule induction. Proceedings of the 2nd Annual Conference on Genetic Programming
- Data mining and knowledge discovery with evolutionary algorithms
- Complexity of computations with Pfaffian and Noetherian functions