Genetic Programming as an Innovation Engine for Automated Machine Learning: The Tree-Based Pipeline Optimization Tool (TPOT)

Handbook of Evolutionary Machine Learning

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

One of the central challenges of machine learning is choosing the feature selection, feature engineering, and classification or regression methods used to build an analytics pipeline. This is true for both novices and experts. Automated machine learning (AutoML) has emerged as a useful approach for generating machine learning pipelines without the need for manual construction and evaluation. Here we review some challenges of building pipelines and present several of the first and most widely used AutoML methods and open-source software packages. We present in detail the Tree-based Pipeline Optimization Tool (TPOT), which represents pipelines as expression trees and uses genetic programming (GP) for discovery and optimization. We also present some extensions of TPOT and its application to real-world big data. We end with some thoughts about the future of AutoML and evolutionary machine learning.
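
To make the expression-tree idea in the abstract concrete, the following is a minimal sketch in plain Python. It is not TPOT's implementation, which this page does not show. The `Node` class, the operator names, and the `random_pipeline` and `point_mutation` helpers are all hypothetical and chosen only for illustration. The sketch shows how a pipeline can be encoded as a tree whose root is a classifier and whose descendants are preprocessing steps, and how a GP-style variation operator can produce a new candidate from an existing one.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One pipeline operator; its children feed their output into it."""
    op: str
    children: list = field(default_factory=list)

    def render(self) -> str:
        """Print the tree as a nested expression, e.g. PCA(input_data)."""
        if not self.children:
            return self.op
        return f"{self.op}({', '.join(c.render() for c in self.children)})"


# Hypothetical operator names, used for illustration only.
PREPROCESSORS = ["StandardScaler", "PCA", "SelectKBest"]
CLASSIFIERS = ["LogisticRegression", "RandomForest", "KNeighbors"]


def random_pipeline(rng: random.Random, max_steps: int = 2) -> Node:
    """Build a linear tree: classifier(preprocessor(...(input_data)))."""
    tree = Node("input_data")
    for _ in range(rng.randint(0, max_steps)):
        tree = Node(rng.choice(PREPROCESSORS), [tree])
    return Node(rng.choice(CLASSIFIERS), [tree])


def point_mutation(tree: Node, rng: random.Random) -> Node:
    """Return a new root with a different classifier over the same subtree."""
    choices = [c for c in CLASSIFIERS if c != tree.op]
    return Node(rng.choice(choices), tree.children)


if __name__ == "__main__":
    rng = random.Random(0)
    parent = random_pipeline(rng)
    child = point_mutation(parent, rng)
    print("parent:", parent.render())
    print("child: ", child.render())
```

In a full GP loop, each candidate tree would be scored (for example by cross-validated accuracy) and the better ones selected for further variation; that evaluation step is omitted here to keep the sketch dependency-free.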



Author information

Corresponding author

Correspondence to Jason H. Moore.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Moore, J.H., Ribeiro, P.H., Matsumoto, N., Saini, A.K. (2024). Genetic Programming as an Innovation Engine for Automated Machine Learning: The Tree-Based Pipeline Optimization Tool (TPOT). In: Banzhaf, W., Machado, P., Zhang, M. (eds) Handbook of Evolutionary Machine Learning. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-99-3814-8_14

  • DOI: https://doi.org/10.1007/978-981-99-3814-8_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-3813-1

  • Online ISBN: 978-981-99-3814-8

  • eBook Packages: Computer Science (R0)
