Biological Strategies ParetoGP Enables Analysis of Wide and Ill-Conditioned Data from Nonlinear Systems

Genetic Programming Theory and Practice XIX

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Genetic, proteomic, and other biologically derived data sets are often ill-conditioned, with many more variables than data records. Furthermore, the variables are often highly correlated as well as coupled. These attributes make such data sets very difficult to analyze with conventional statistical and machine learning techniques. The ParetoGP approach implemented within DataModeler explores the trade-off between model complexity and accuracy. This makes it possible to attack such data sets with the dual benefits of identifying key variables, associations, and metavariables and of providing concise, explainable, human-interpretable predictive models. Transparency of key variables, model structures, and response behaviors provides a substantial benefit relative to conventional machine learning and its associated black-box models. In this chapter, we describe the analysis methodology and highlight its benefits using available biological data sets.
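
The complexity-versus-accuracy trade-off mentioned in the abstract can be illustrated with a minimal Python sketch. This is not DataModeler's API or the authors' implementation: the `Candidate` type, the `pareto_front` function, the complexity and error measures, and the example models are all hypothetical, chosen only to show how a set of candidate models might be reduced to those that are not dominated in both objectives.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate model (not a DataModeler type)."""
    expression: str
    complexity: int  # assumed measure, e.g. node count of the expression
    error: float     # assumed measure, e.g. 1 - R^2 on the data


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep models not dominated in (complexity, error), both minimized."""
    front = []
    best_error = float("inf")
    # Visit simpler models first; a model is kept only if it has strictly
    # lower error than every model of equal or lower complexity seen so far.
    for c in sorted(candidates, key=lambda c: (c.complexity, c.error)):
        if c.error < best_error:
            front.append(c)
            best_error = c.error
    return front


# Made-up candidates for illustration only.
models = [
    Candidate("x1", 1, 0.40),
    Candidate("x1 + x2", 3, 0.25),
    Candidate("x1 * x2", 3, 0.30),
    Candidate("x1 + x2 / x3", 5, 0.10),
    Candidate("sin(x1) + x2 * x4", 7, 0.12),
]

front = pareto_front(models)
for m in front:
    print(m.complexity, m.error, m.expression)
```

In this toy example the two discarded models are each matched or beaten on both objectives by a simpler alternative. The variables that appear in the surviving expressions give a rough indication of which inputs matter, which is the kind of variable identification the abstract refers to.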

Acknowledgements

We would like to thank the leadership and staff of the National Science Foundation Engineering Research Center for Cell Manufacturing Technologies, including Krishnendu Roy, Nathan J. Dwarshuis, Maxwell B. Colonna, Valerie Y. Odeh-Couvertier, Wandaliz Torres-Garcia, and Arthur S. Edison, for their contributions in preparing and providing the CAR-T cell datasets. We thank Facundo M. Fernandez, Alexandria R. Van Group, and members of the Marcus Center for Therapeutic Cell Characterization and Manufacturing (MC3M) staff, including Pallab Pradhan, Paramita Chatterjee, Carolyn Yeago, Andrew Marmon, and Annie Boules-Welch, for their contributions and support in providing the MSC datasets. We thank the University of Oregon’s Guldberg Musculoskeletal Research Lab, including Robert E. Guldberg, Albert Cheng, Casey Vantucci, and Kelly Leguineche, for their contributions and support in providing the bone regeneration dataset. We would also like to thank Kazuhiro Iwadoh for his insights into the issues facing other machine-learning algorithms.

Author information

Corresponding author

Correspondence to Mark Kotanchek.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Kotanchek, M., Kotanchek, T., Kotanchek, K. (2023). Biological Strategies ParetoGP Enables Analysis of Wide and Ill-Conditioned Data from Nonlinear Systems. In: Trujillo, L., Winkler, S.M., Silva, S., Banzhaf, W. (eds) Genetic Programming Theory and Practice XIX. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-19-8460-0_5

  • DOI: https://doi.org/10.1007/978-981-19-8460-0_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8459-4

  • Online ISBN: 978-981-19-8460-0

  • eBook Packages: Computer Science (R0)
