Boosting Improves Stability and Accuracy of Genetic Programming in Biological Sequence Classification

Chapter in Genetic Programming Theory and Practice IV

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Biological sequence analysis presents interesting challenges for machine learning. Using an important problem, the recognition of functional target sites for microRNA molecules, as an example, we show how combining multiple genetic programming classifiers improves accuracy and stability. Moving from single classifiers to bagging and boosting with cross-validation and parameter optimization requires more computing power. A special-purpose search processor for fitness evaluation renders boosted genetic programming practical for our purposes.
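The central technique here is boosting: training a sequence of classifiers, each concentrating on the examples its predecessors misclassified, and combining them by weighted vote. What follows is a minimal sketch assuming discrete AdaBoost (the algorithm of Freund and Schapire). The chapter's exact boosting variant and GP representation are not given in this abstract. The names `motif_classifier`, `adaboost` and `predict`, and the toy sequences, are hypothetical. The motif-presence predicates only stand in for evolved GP programs. A boosted GP system would typically evolve a new program against the reweighted training set in each round rather than pick one from a fixed pool.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical weak-learner type: maps a sequence to +1 (site) or -1 (non-site).
Classifier = Callable[[str], int]


def motif_classifier(motif: str) -> Classifier:
    """Hypothetical stand-in for an evolved GP program: +1 if the motif occurs, else -1."""
    return lambda seq: 1 if motif in seq else -1


def adaboost(train: List[Tuple[str, int]],
             candidates: List[Classifier],
             rounds: int) -> List[Tuple[float, Classifier]]:
    """Discrete AdaBoost over a fixed pool of candidate classifiers (a sketch)."""
    n = len(train)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble: List[Tuple[float, Classifier]] = []
    for _ in range(rounds):
        # Choose the candidate with the lowest weighted training error.
        # (Assumption: a boosted GP system would evolve a new program here instead.)
        best, best_err = None, float("inf")
        for h in candidates:
            err = sum(w for w, (x, y) in zip(weights, train) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
        if best is None or best_err >= 0.5:
            break  # no candidate beats random guessing on the weighted data
        eps = max(best_err, 1e-10)  # guard against division by zero for a perfect learner
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        if best_err == 0:
            break  # training set already separated; later rounds would repeat this learner
        # Raise the weight of misclassified examples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * best(x)) for w, (x, y) in zip(weights, train)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Classifier]], seq: str) -> int:
    """Weighted vote of the ensemble members; a zero score is assigned to +1."""
    score = sum(alpha * h(seq) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy, made-up data: +1 = hypothetical target site, -1 = background.
    train = [("ACGTTT", 1), ("GGACGT", 1), ("TTTTTT", -1), ("GGGGGG", -1)]
    pool = [motif_classifier("ACGT"), motif_classifier("GGG")]
    model = adaboost(train, pool, rounds=10)
    print(len(model), predict(model, "AAACGTAA"))
```

In this sketch each round evaluates every candidate on the whole weighted training set, so cost grows roughly with rounds × candidates × training sequences. Cross-validation folds and parameter settings multiply that cost again, which is consistent with the abstract's point that boosted GP needs more computing power.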

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Saetrom, P., Birkeland, O.R., Snøve, O. (2007). Boosting Improves Stability and Accuracy of Genetic Programming in Biological Sequence Classification. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice IV. Genetic and Evolutionary Computation. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-49650-4_5

  • DOI: https://doi.org/10.1007/978-0-387-49650-4_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-33375-5

  • Online ISBN: 978-0-387-49650-4

  • eBook Packages: Computer Science, Computer Science (R0)
