A genetic programming method for protein motif discovery and protein classification

Tsunoda, Denise Fukumi; Freitas, Alex Alves; Lopes, Heitor Silvério

doi:10.1007/s00500-010-0624-9

Focus
Published: 16 June 2010

Soft Computing, Volume 15, pages 1897–1908 (2011)

Denise Fukumi Tsunoda¹,
Alex Alves Freitas² &
Heitor Silvério Lopes³

Abstract

Proteins can be grouped into families according to some features such as hydrophobicity, composition or structure, aiming to establish common biological functions. This paper presents MAHATMA—memetic algorithm-based highly adapted tool for motif ascertainment—a system that was conceived to discover features (particular sequences of amino acids, or motifs) that occur very often in proteins of a given family but rarely occur in proteins of other families. These features can be used for the classification of unknown proteins, that is, to predict their function by analyzing their primary structure. Experiments were done with a set of enzymes extracted from the Protein Data Bank. The heuristic method used was based on genetic programming using operators specially tailored for the target problem. The final performance was measured using sensitivity, specificity and hit rate. The best results obtained for the enzyme dataset suggest that the proposed evolutionary computation method is effective in finding predictive features (motifs) for protein classification.
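The evaluation idea in the abstract (a motif should match many proteins of the target family and few proteins of other families) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the motif is written as a regular expression, the sequences are invented toy strings rather than Protein Data Bank entries, and the function names are hypothetical. This is not MAHATMA's motif representation or fitness function, which the abstract does not specify. The abstract also does not define "hit rate", so only sensitivity and specificity are computed.

```python
import re
from typing import Sequence, Tuple


def motif_hits(motif: str, sequences: Sequence[str]) -> int:
    """Count how many sequences contain at least one match of the motif."""
    pattern = re.compile(motif)
    return sum(1 for seq in sequences if pattern.search(seq))


def sensitivity_specificity(
    motif: str, family: Sequence[str], others: Sequence[str]
) -> Tuple[float, float]:
    """Score a motif as a one-feature classifier for a protein family.

    family: sequences that belong to the target family (positives)
    others: sequences from all other families (negatives)
    """
    tp = motif_hits(motif, family)   # family members the motif matches
    fn = len(family) - tp            # family members the motif misses
    fp = motif_hits(motif, others)   # non-members matched by mistake
    tn = len(others) - fp            # non-members correctly not matched
    sensitivity = tp / (tp + fn) if family else 0.0
    specificity = tn / (tn + fp) if others else 0.0
    return sensitivity, specificity


# Toy data (invented for illustration, not real enzymes).
family = ["MKVLHGSCAA", "TTHGSCPLK", "GGAHGTCVV", "PLMNQRST"]
others = ["MKKLLPAA", "QHGSCWW", "AAAAGGGG", "LLVVIIPP"]

# Hypothetical motif: H, G, then S or T, then C.
sens, spec = sensitivity_specificity("HG[ST]C", family, others)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A motif that scores high on both measures is frequent inside the family and rare outside it, which is the property the abstract asks of a predictive feature.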

Notes

Available at http://www.ncbi.nlm.nih.gov/blast.
Available at http://www.pdb.org/pdb/home/home.do.

Author information

Authors and Affiliations

Federal University of Parana, Av. Prefeito Lothário Meissner, 632, Room 38, Curitiba, PR, Brazil
Denise Fukumi Tsunoda
School of Computing, University of Kent, Room S107, Canterbury, Kent, CT2 7NF, UK
Alex Alves Freitas
Federal University of Technology, Av. 7 de Setembro, 3165, Bloco D, 3° floor, Curitiba, PR, Brazil
Heitor Silvério Lopes

Corresponding author

Correspondence to Denise Fukumi Tsunoda.

About this article

Cite this article

Tsunoda, D.F., Freitas, A.A. & Lopes, H.S. A genetic programming method for protein motif discovery and protein classification. Soft Comput 15, 1897–1908 (2011). https://doi.org/10.1007/s00500-010-0624-9

Issue Date: October 2011
