An information-theoretic framework is presented for the development and analysis of the ensemble learning approach to genetic programming. This approach suggests that, as evolution proceeds, the mutual information between the target and the models should: (i) not decrease in the population; (ii) concentrate in fewer individuals; and (iii) be "distilled" from the inputs, eliminating excess entropy. Normalized information-theoretic indices are developed to measure the fitness and diversity of ensembles, without a priori knowledge of how the constituent models might be composed into a single model. When these indices are used for reproductive and survival selection, building blocks are less likely to be lost and more likely to be recombined. Price's Theorem is generalized to pair selection and rewritten to expose key factors related to heritability and evolvability. Heritability of information should be stronger than that of error, improving evolvability. These arguments are supported with simulations on a logic-function benchmark and a time-series application. On a chaotic time-series prediction problem, for instance, the proposed approach avoids difficulties familiar from standard GP symbolic regression systems (premature convergence, deception, poor scaling, and early loss of needed building blocks); information-based fitness functions showed the strong intergenerational correlations required by Price's Theorem.
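To make the idea of a normalized information-theoretic fitness index concrete, the following is a minimal sketch, not the chapter's actual indices: it estimates the mutual information between a discrete target and a model's outputs from sample frequencies and normalizes by the target's entropy, so a model carrying all of the target's information scores 1.0. The function names and the choice of H(T) as the normalizer are illustrative assumptions.

```python
# Illustrative sketch (assumed formulation, not the chapter's exact index):
# normalized mutual-information fitness I(T;M) / H(T) for discrete outputs.
from collections import Counter
from math import log2

def entropy(xs):
    """Plug-in entropy estimate (bits) from sample frequencies."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from the paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def nmi_fitness(target, outputs):
    """Normalize by H(target) so a fully informative model scores 1.0."""
    h = entropy(target)
    return mutual_information(target, outputs) / h if h > 0 else 0.0

target = [0, 1, 0, 1, 1, 0, 1, 0]
perfect = target[:]                  # identical outputs
inverted = [1 - t for t in target]   # relabeled outputs, same information
print(nmi_fitness(target, perfect))   # 1.0
print(nmi_fitness(target, inverted))  # 1.0 (MI is invariant to relabeling)
```

Note that the inverted model also scores 1.0: mutual information rewards any output that determines the target, without requiring knowledge of the (here trivial) composition that maps the model's output onto the target, which is exactly the property the abstract claims for ensemble members.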
© 2008 Springer Science+Business Media, LLC
Cite this chapter
Card, S.W., Mohan, C.K. (2008). Towards an Information Theoretic Framework for Genetic Programming. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice V. Genetic and Evolutionary Computation Series. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76308-8_6
Print ISBN: 978-0-387-76307-1
Online ISBN: 978-0-387-76308-8