Evolutionary Computation for the Interpretation of Metabolomic Data

Abstract

Post-genomic science is producing bounteous data floods and, as the quotation below indicates, extracting the most meaningful parts of these data is key to generating useful new knowledge. A typical metabolic fingerprint or metabolomics experiment is expected to generate thousands of data points (samples times variables), of which only a handful might be needed to describe the problem adequately. Evolutionary algorithms are ideal strategies for mining such data to generate useful relationships, rules and predictions. This chapter describes these techniques and highlights their exploitation in metabolomics.

The fewer data needed, the better the information. And an overload of information, that is, anything much beyond what is truly needed, leads to information blackout. It does not enrich, but impoverishes.

Peter F. Drucker, Management: Tasks, Responsibilities, Practices




Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Goodacre, R., Kell, D.B. (2003). Evolutionary Computation for the Interpretation of Metabolomic Data. In: Harrigan, G.G., Goodacre, R. (eds) Metabolic Profiling: Its Role in Biomarker Discovery and Gene Function Analysis. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0333-0_13

  • DOI: https://doi.org/10.1007/978-1-4615-0333-0_13

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5025-5

  • Online ISBN: 978-1-4615-0333-0

  • eBook Packages: Springer Book Archive
