Evolving Regular Expression-Based Sequence Classifiers for Protein Nuclear Localisation

  • Conference paper
  • In: Applications of Evolutionary Computing (EvoWorkshops 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3005)

Abstract

A number of bioinformatics tools use regular expression (RE) matching to locate protein or DNA sequence motifs that have been discovered by researchers in the laboratory. For example, patterns representing nuclear localisation signals (NLSs) are used to predict nuclear localisation. NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete. Here we use genetic programming (GP) to generate RE-based classifiers for nuclear localisation. While the approach is a supervised one (with respect to protein location), it is unsupervised with respect to already-known NLSs. It therefore has the potential to discover new NLS motifs. We apply both tree-based and linear GP to the problem. The inclusion of predicted secondary structure in the input does not improve performance. Benchmarking shows that our majority classifiers are competitive with existing tools. The evolved REs are usually “NLS-like” and work is underway to analyse these for novelty.
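
To make the idea of an RE-based majority classifier concrete, the following is a minimal sketch in Python. It assumes each classifier is a single regular expression applied to the amino acid sequence, and that a protein is predicted as nuclear when more than half of the classifiers match. The three patterns, the function names, and the voting rule are illustrative assumptions written by hand; they are not the evolved REs or the exact combination scheme used in the paper.

```python
import re

# Hypothetical, hand-written patterns for illustration only.
# They are NOT the regular expressions evolved in the paper.
ILLUSTRATIVE_PATTERNS = [
    r"K[KR].[KR]",              # loose monopartite-style basic cluster
    r"[KR]{4,}",                # run of four or more basic residues
    r"[KR]{2}.{10,12}[KR]{3}",  # loose bipartite-style: two basic clusters and a spacer
]


def make_re_classifier(pattern):
    """Return a function that reports whether the pattern occurs in a sequence."""
    compiled = re.compile(pattern)
    return lambda sequence: compiled.search(sequence) is not None


def majority_vote(classifiers, sequence):
    """Predict 'nuclear' (True) when more than half of the classifiers match."""
    votes = sum(1 for classify in classifiers if classify(sequence))
    return votes * 2 > len(classifiers)


classifiers = [make_re_classifier(p) for p in ILLUSTRATIVE_PATTERNS]

# Toy sequences: the first embeds the SV40 large T antigen NLS (PKKKRKV),
# the second contains no basic residues at all.
nuclear_like = "MAPKKKRKVED"
non_nuclear_like = "MAGSLTEDNAGSLTEDN"

print(majority_vote(classifiers, nuclear_like))
print(majority_vote(classifiers, non_nuclear_like))
```

In an evolutionary setting, each pattern would instead be produced by GP and scored against proteins of known location; the voting step simply combines several such evolved classifiers into one prediction.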

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Heddad, A., Brameier, M., MacCallum, R.M. (2004). Evolving Regular Expression-Based Sequence Classifiers for Protein Nuclear Localisation. In: Raidl, G.R., et al. Applications of Evolutionary Computing. EvoWorkshops 2004. Lecture Notes in Computer Science, vol 3005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24653-4_4

  • DOI: https://doi.org/10.1007/978-3-540-24653-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21378-9

  • Online ISBN: 978-3-540-24653-4

  • eBook Packages: Springer Book Archive
