DOI: 10.1145/2001858.2002060
tutorial

Overfitting detection and adaptive covariant parsimony pressure for symbolic regression

Published: 12 July 2011

ABSTRACT

Covariant parsimony pressure is a theoretically motivated method aimed primarily at controlling bloat. In this contribution we describe an adaptive method for controlling covariant parsimony pressure with the goal of reducing overfitting in symbolic regression. The method rests on the assumption that overfitting can be reduced by controlling the evolution of program length. Additionally, we propose an overfitting detection criterion based on the correlation between the fitness values on the training set and on a validation set, computed over all models in the population.
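A minimal sketch of how such a correlation-based criterion might look is given here. The abstract does not state which correlation coefficient or threshold is used, so the choice of Spearman's rank correlation, the `threshold` value of 0.5, and all function names are assumptions made for illustration only.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman's rank correlation, computed as Pearson on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def is_overfitting(train_fitness, validation_fitness, threshold=0.5):
    """Flag overfitting when training and validation fitness stop agreeing.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    return spearman(train_fitness, validation_fitness) < threshold
```

Both lists hold one fitness value per model in the population, in the same order. A rank-based coefficient is used in this sketch because it is insensitive to the scale of the fitness values; whether that matches the paper's choice is not stated in the abstract.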

The proposed method uses covariant parsimony pressure to decrease the average program length when overfitting occurs and allows the average program length to increase in the absence of overfitting. The approach is applied to two real-world datasets. The experimental results show that the correlation of training and validation fitness can be used as an indicator of overfitting and that the proposed adaptation of covariant parsimony pressure alleviates overfitting in symbolic regression experiments on the two datasets.
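The following sketch illustrates one way the length control described above could be realized. It assumes fitness is maximized, selection is fitness-proportionate, and fitness is adjusted as f' = f - c * length, so that by Price's equation the expected change in mean length is Cov(length, f') / mean(f'). Solving for a desired change `delta` gives the coefficient below. The `shrink` and `grow` step sizes and all function names are hypothetical; the abstract does not give the paper's actual adaptation rule.

```python
from statistics import mean

def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def parsimony_coefficient(lengths, fitness, delta=0.0):
    """Coefficient c for which the expected change in mean length is `delta`.

    With delta = 0 this reduces to c = Cov(length, f) / Var(length),
    which holds the expected mean length constant.
    """
    mu = mean(lengths)
    f_bar = mean(fitness)
    denom = covariance(lengths, lengths) - delta * mu
    if denom == 0:
        return 0.0  # degenerate case, e.g. all programs have the same length
    return (covariance(lengths, fitness) - delta * f_bar) / denom

def adjusted_fitness(lengths, fitness, overfitting, shrink=-1.0, grow=1.0):
    """Penalize length more strongly while overfitting is flagged.

    `shrink` and `grow` are hypothetical target changes in mean length.
    """
    delta = shrink if overfitting else grow
    c = parsimony_coefficient(lengths, fitness, delta)
    return [f - c * l for f, l in zip(fitness, lengths)]
```

The adjusted values can be negative, so an implementation using fitness-proportionate selection would need to handle that case; this sketch leaves it out.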


Published in

GECCO '11: Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
July 2011, 1548 pages
ISBN: 9781450306904
DOI: 10.1145/2001858

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
