Abstract
Several studies have investigated genetic programming as a way to discover motifs in proteins and other biological data. These studies have been small and have often used domain knowledge to improve the search. In this paper we present a genetic programming algorithm that does not use domain knowledge, with results on 44 different protein families. We demonstrate that our list-based representation, given a fixed amount of processing resources, is able to discover meaningful motifs with good classification performance, sometimes comparable to or even surpassing that of motifs found in a database of manually created motifs. We also investigate the introduction of gaps in our algorithm; this appears to give a small increase in classification accuracy and recall, but at the cost of reduced precision.
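To make the reported trade-off concrete, the following is a minimal sketch of how a motif with and without a gap might be scored as a classifier for one protein family. It is not the paper's implementation: the list encoding of a motif, the helper names `motif_to_regex` and `evaluate`, and the toy sequences are all assumptions chosen for illustration.

```python
import re

def motif_to_regex(elements):
    """Compile a list-encoded motif (hypothetical encoding) to a regex.

    Each element is either a string of allowed residues for one position
    or a (min_gap, max_gap) tuple standing for a wildcard gap.
    """
    parts = []
    for el in elements:
        if isinstance(el, tuple):
            lo, hi = el
            parts.append(".{%d,%d}" % (lo, hi))
        elif len(el) == 1:
            parts.append(re.escape(el))
        else:
            parts.append("[" + re.escape(el) + "]")
    return re.compile("".join(parts))

def evaluate(elements, positives, negatives):
    """Return (precision, recall, accuracy) of the motif as a classifier."""
    pattern = motif_to_regex(elements)
    tp = sum(1 for seq in positives if pattern.search(seq))
    fp = sum(1 for seq in negatives if pattern.search(seq))
    fn = len(positives) - tp
    tn = len(negatives) - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (len(positives) + len(negatives))
    return precision, recall, accuracy

# Made-up toy sequences, not data from the paper.
POSITIVES = ["MKCAHGT", "TTCLHGL", "GGCHKKA"]
NEGATIVES = ["MKLLGTT", "PCPPHQQ", "QQQWWEE", "DDEEFFG"]

UNGAPPED = ["C", "A", "H"]      # matches only the exact run C-A-H
GAPPED = ["C", (0, 2), "H"]     # C, then 0 to 2 arbitrary residues, then H

if __name__ == "__main__":
    for name, motif in (("ungapped", UNGAPPED), ("gapped", GAPPED)):
        p, r, a = evaluate(motif, POSITIVES, NEGATIVES)
        print(f"{name}: precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
```

On this toy data the gapped motif recovers more family members and classifies more sequences correctly, but it also matches one non-member, which lowers precision. This mirrors the direction of the effect described in the abstract; it says nothing about its size on real protein families.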
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Seehuus, R. (2005). Protein Motif Discovery with Linear Genetic Programming. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553939_109
DOI: https://doi.org/10.1007/11553939_109
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28896-1
Online ISBN: 978-3-540-31990-0
eBook Packages: Computer Science (R0)