Empirical modeling using genetic programming: a survey of issues and approaches

Natural Computing

Abstract

Empirical modeling, the process of developing a mathematical model of a system from experimental data, has attracted many researchers because of its wide applicability. Finding both the structure of the model and appropriate numeric coefficients is a real challenge. Genetic programming (GP) has been applied by many practitioners to this problem. However, several issues require careful attention when applying GP to empirical modeling problems. We begin by highlighting the importance of these issues: the computational effort of evolving a model, premature convergence, the generalization ability of an evolved model, building hierarchical models, and constant creation techniques. We then survey and classify the approaches GP researchers have used to deal with these issues, and present performance measures that are useful for reporting the results of analyzing GP runs. We hope this work will help the reader understand the key concepts and practical issues of GP and guide the selection of an appropriate approach to each issue.

Author information

Corresponding author

Correspondence to Vipul K. Dabhi.

About this article

Cite this article

Dabhi, V.K., Chaudhary, S. Empirical modeling using genetic programming: a survey of issues and approaches. Nat Comput 14, 303–330 (2015). https://doi.org/10.1007/s11047-014-9416-y

  • DOI: https://doi.org/10.1007/s11047-014-9416-y
