Classification of Gene Expression Data with Genetic Programming

Driscoll, Joseph A.; Worzel, Bill; MacLean, Duncan

doi:10.1007/978-1-4419-8983-3_3

Part of the book series: Genetic Programming Series (GPEM, volume 6)

Abstract

This paper summarizes the use of a genetic programming (GP) system to develop classification rules for gene expression data, rules that hold promise for the development of new molecular diagnostics. The work focuses on discovering simple, accurate rules that diagnose disease from changes in the gene expression profile of a diseased cell. GP is shown to be a useful technique for discovering classification rules in a supervised learning mode, where the biological genotype is paired with a biological phenotype such as a disease state. Because the number of independent variables is large and the number of samples comparatively small, developing these rules required new techniques for establishing fitness and for interpreting the results of evolutionary runs. These techniques are described, along with the issue of overfitting caused by small sample sizes and the behavior of the GP system when variables are missing from the samples.
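
As a rough illustration of the setting the abstract describes, the Python sketch below scores one candidate classification rule on a handful of labeled samples, then on samples held back from scoring. It is not the chapter's method: the rule, the gene names (`gene_a`, `gene_b`), the threshold, and every expression value are hypothetical and made up for illustration, and plain accuracy stands in for whatever fitness measure the authors actually use.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]          # gene name -> expression level
Rule = Callable[[Sample], bool]    # True means "predict diseased"


def example_rule(sample: Sample) -> bool:
    """Hypothetical rule of the simple arithmetic form a GP run might produce."""
    return sample["gene_a"] - sample["gene_b"] > 0.5


def accuracy(rule: Rule, data: List[Tuple[Sample, bool]]) -> float:
    """Fraction of samples whose predicted label matches the known label."""
    correct = sum(1 for sample, label in data if rule(sample) == label)
    return correct / len(data)


# Made-up toy values, for illustration only (not data from the chapter).
toy_data: List[Tuple[Sample, bool]] = [
    ({"gene_a": 2.0, "gene_b": 0.5}, True),
    ({"gene_a": 1.8, "gene_b": 1.0}, True),
    ({"gene_a": 0.4, "gene_b": 0.9}, False),
    ({"gene_a": 1.5, "gene_b": 0.8}, False),
]

# Score the rule on every other sample and hold the rest back.
train = toy_data[::2]
held_out = toy_data[1::2]

print("training accuracy:", accuracy(example_rule, train))
print("held-out accuracy:", accuracy(example_rule, held_out))
```

With so few samples, a rule can fit the scored samples perfectly and still misclassify held-out ones, which is the overfitting concern the abstract raises.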



Author information

Authors and Affiliations

Middle Tennessee State University, Murfreesboro, 37132, TN, USA
Joseph A. Driscoll
Genetics Squared, Inc., Milan, MI, 48160, USA
Bill Worzel & Duncan MacLean


Editor information

Editors and Affiliations

Center for the Study of Complex Systems, University of Michigan, USA
Rick Riolo
Genetics Squared, USA
Bill Worzel


About this chapter

Cite this chapter

Driscoll, J.A., Worzel, B., MacLean, D. (2003). Classification of Gene Expression Data with Genetic Programming. In: Riolo, R., Worzel, B. (eds) Genetic Programming Theory and Practice. Genetic Programming Series, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8983-3_3


DOI: https://doi.org/10.1007/978-1-4419-8983-3_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-4747-7
Online ISBN: 978-1-4419-8983-3
eBook Packages: Springer Book Archive
