Application of Machine-Learning Methods to Understand Gene Expression Regulation

Chapter in: Genetic Programming Theory and Practice XII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

With the development and application of high-throughput technologies, an enormous amount of biological data has been produced in the past few years. These large-scale datasets make it both possible and necessary to apply machine learning techniques to mine biological insights. In this chapter, we describe several examples that show how machine learning approaches are used to elucidate the mechanism of transcriptional regulation mediated by transcription factors and histone modifications. We demonstrate that machine learning provides powerful tools to quantitatively relate gene expression to transcription factor binding and histone modifications, to identify novel regulatory DNA elements in genomes, and to predict gene functions. We also discuss the advantages and limitations of genetic programming in analyzing and processing biological data.

Author information

Corresponding author

Correspondence to Chao Cheng.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Cheng, C., Worzel, W. (2015). Application of Machine-Learning Methods to Understand Gene Expression Regulation. In: Riolo, R., Worzel, W., Kotanchek, M. (eds) Genetic Programming Theory and Practice XII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-16030-6_1

  • DOI: https://doi.org/10.1007/978-3-319-16030-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16029-0

  • Online ISBN: 978-3-319-16030-6

  • eBook Packages: Computer Science (R0)
