Abstract
Empirical modeling, the process of developing a mathematical model of a system from experimental data, has attracted many researchers because of its wide applicability. Finding both the structure of the model and appropriate numeric coefficients is a real challenge. Genetic programming (GP) has been applied by many practitioners to solve this problem. However, a number of issues require careful attention when applying GP to empirical modeling problems. We begin by highlighting the importance of these issues: the computational effort of evolving a model, premature convergence, the generalization ability of an evolved model, building hierarchical models, and constant creation techniques. We survey and classify the approaches GP researchers have used to deal with these issues. We also present performance measures that are useful for reporting the results of analyzing GP runs. We hope this work helps the reader understand the key concepts and practical issues of GP and guides the selection of an appropriate approach to address a particular issue effectively.
Cite this article
Dabhi, V.K., Chaudhary, S. Empirical modeling using genetic programming: a survey of issues and approaches. Nat Comput 14, 303–330 (2015). https://doi.org/10.1007/s11047-014-9416-y
DOI: https://doi.org/10.1007/s11047-014-9416-y