Skip to main content
Log in

Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization

  • Original Article
  • Published:
Amino Acids Aims and scope Submit manuscript

Abstract

Given a protein that is localized in the mitochondria it is very important to know the submitochondria localization of that protein to understand its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on the Chou’s pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices, then from this huge set of features a small set of 15 “artificial” features is created. The feature creation is performed by genetic programming combining one or more “original” features by means of some mathematical operators. Finally, the set of combined features are used to train a radial basis function support vector machine. This method is named GP-Loc. Moreover, we also propose a very few parameterized method, named ALL-Loc, where all the “original” features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% when the jackknife cross-validation is used, this result outperforms the performance obtained in the literature (85.2%) using the same dataset. While the overall prediction accuracy obtained by ALL-Loc is 83.9%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Notes

  1. This algorithm has been implemented in the Matlab toolbox named “GPLAB: A Genetic Programming Toolbox for MATLAB”, http://www.gplab.sourceforge.net/.

  2. http://www.massey.ac.nz/∼mgwalker/gp/index.html.

  3. The GP is performed using a fivefold cross-validation for each pattern, so for each pattern the best features could be quite different, we report the best five features on average.

References

  • Cai YD, Liu XJ, Xu XB, Chou KC (2000) Support vector machines for prediction of protein subcellular location. Mol Cell Biol Res Commun 4:230–233

    Article  PubMed  CAS  Google Scholar 

  • Cai YD, Liu XJ, Xu XB, Chou KC (2002) Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. J Cell Biochem 84:343–348

    Article  PubMed  CAS  Google Scholar 

  • Cedano J, Aloy P, P’erez-Pons JA, Querol E (1997) Relation between amino acid composition and cellular location of proteins. J Mol Biol 266:594–600

    Article  PubMed  CAS  Google Scholar 

  • Chen C, Tian YX, Zou XY, Cai PX, Mo JY (2006a) Using pseudo-amino acid composition and support vector machine to predict protein structural class. J Theor Biol 243:444–448

    Article  PubMed  CAS  Google Scholar 

  • Chen C, Zhou X, Tian Y, Zou X, Cai P (2006b) Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. Anal Biochem 357:116–121

    Article  PubMed  CAS  Google Scholar 

  • Chen J, Liu H, Yang J, Chou KC (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33:423–428

    Article  PubMed  CAS  Google Scholar 

  • Chen YL, Li QZ (2007a) Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo amino acid composition. J Theor Biol 248:377–381

    Article  PubMed  CAS  Google Scholar 

  • Chen YL, Li QZ (2007b) Prediction of the subcellular location of apoptosis proteins. J Theor Biol 245:775–783

    Article  PubMed  CAS  Google Scholar 

  • Chou KC (2000) Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem Biophys Res Commun 278:477–483

    Article  PubMed  CAS  Google Scholar 

  • Chou KC (2001) Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct, Funct, Genet (Erratum: ibid., 2001, 44:60) 43:246–255

  • Chou KC (2005) Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21:10–19

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Cai YD (2002) Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem 277:45765–45769

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Cai YD (2003) A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. Biochem Biophys Res Commun 311:743–747

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Cai YD (2004a) Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. J Cell Biochem 91:1197–1203

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Cai YD (2004b) Prediction of protein subcellular locations by GO-FunD-PseAA predicor. Biochem Biophys Res Commun 320:1236–1239

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Cai YD (2005) Predicting protein localization in budding yeast. Bioinformatics 21:944–950

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Elrod DW (1998) Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochem Biophys Res Commun 252:63–68

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Elrod DW (1999) Protein subcellular location prediction. Protein Eng 12:107–118

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2006) Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem Biophys Res Commun 347:150–157

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2007a) Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J Proteome Res 6:1728–1734

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2007b) Large-scale plant protein subcellular location prediction. J Cell Biochem 100:665–678

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2007c) MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem Biophys Res Commun 360:339–345

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2007d) Review: recent progresses in protein subcellular location prediction. Anal Biochem 370:1–16

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2007e) Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. Biochem Biophys Res Commun 357:633–640

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Zhang CT (1995) Review: prediction of protein structural classes. Crit Rev Biochem Mol Biol 30:275–349

    Article  PubMed  CAS  Google Scholar 

  • Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

    Google Scholar 

  • Diao Y, Li M, Feng Z, Yin J, Pan Y (2007a) The community structure of human cellular signaling network. J Theor Biol 247:608–615

    Article  PubMed  Google Scholar 

  • Diao Y, Ma D, Wen Z, Yin J, Xiang J, Li M (2007b) Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity. Amino Acids. doi:10.1007/s00726-007-0550-z

  • Ding YS, Zhang TL, Chou KC (2007) Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Peptide Lett 14:811–815

    Article  CAS  Google Scholar 

  • Du P, Li Y (2006) Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics 7:518

    Article  PubMed  CAS  Google Scholar 

  • Fang Y, Guo Y, Feng Y, Li M (2007) Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features. Amino Acids. doi:10.1007/s00726-007-0568-2

  • Gao Y, Shao SH, Xiao X, Ding YS, Huang YS, Huang ZD, Chou KC (2005) Using pseudo amino acid composition to predict protein subcellular location: approached with Lyapunov index, Bessel function, and Chebyshev filter. Amino Acids 28:373–376

    Article  PubMed  CAS  Google Scholar 

  • Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J (2006) Classifying G protein-coupled receptors and nuclear receptors based on protein power spectrum from fast Fourier transform. Amino Acids 30:397–402

    Article  PubMed  CAS  Google Scholar 

  • Gottlieb RA (2000) Programmed cell death. Drug News Perspect 13:471–476

    PubMed  CAS  Google Scholar 

  • Jassem W, Fuggle SV, Rela M, Koo DD, Heaton ND (2000) The role of mitochondria in ischemia/reperfusion injury. Transplantation 73:493–499

    Article  Google Scholar 

  • Kawashima S, Kanehisa M (2000) AAindex: amino acid index database. Nucleic Acids Res. 28:374

    Article  PubMed  CAS  Google Scholar 

  • Kontijevskis A, Wikberg JES, Komorowski J (2007) Computational proteomics analysis of HIV-1 protease interactome. Proteins: Struct, Funct, Bioinformatics 68(1):305–312

    Article  CAS  Google Scholar 

  • Kurgan LA, Stach W, Ruan J (2007) Novel scales based on hydrophobicity indices for secondary protein structure. J Theor Biol 248:354–366

    Article  PubMed  CAS  Google Scholar 

  • Lee K, Kim DW, Na D, Lee KH, Lee D (2006) PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res 34:4655–4666

    Article  PubMed  CAS  Google Scholar 

  • Li FM, Li QZ (2007) Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach. Amino Acids. doi:10.1007/s00726-007-0545-9

  • Lin H, Li QZ (2007a) Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem Biophys Res Commun 354:548–551

    Article  PubMed  CAS  Google Scholar 

  • Lin H, Li QZ (2007b) Using pseudo amino acid composition to predict protein structural class: approached by incorporating 400 dipeptide components. J Comput Chem 28:1463–1466

    Article  PubMed  CAS  Google Scholar 

  • Liu DQ, Liu H, Shen HB, Yang J, Chou KC (2007) Predicting secretory protein signal sequence cleavage sites by fusing the marks of global alignments. Amino Acids 32:493–496

    Article  PubMed  CAS  Google Scholar 

  • Liu H, Wang M, Chou KC (2005) Low-frequency Fourier spectrum for predicting membrane protein types. Biochem Biophys Res Commun 336:737–739

    Article  PubMed  CAS  Google Scholar 

  • Lumini A, Nanni L (2007) Over-complete feature generation and feature selection for biometry. Expert Syst Appl. doi:10.1016/j.eswa.2007.08.097

  • Mondal S, Bhavna R, Mohan Babu R, Ramakumar S (2006) Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. J Theor Biol 243:252–60

    Article  PubMed  CAS  Google Scholar 

  • Mundra P, Kumar M, Kumar KK, Jayaraman VK, Kulkarni BD (2007) Using pseudo amino acid composition to predict protein subnuclear localization: approached with PSSM. Pattern Recognit Lett 28:1610–1615

    Article  Google Scholar 

  • Nakai K, Horton P (1999) PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24:34–36

    Article  PubMed  CAS  Google Scholar 

  • Nakai K, Kanehisa M (1992) A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14:897–911

    Article  PubMed  CAS  Google Scholar 

  • Nanni L, Lumini A (2006a) An ensemble of K-Local Hyperplane for predicting protein-protein interactions. BioInformatics 22(10):1207–1210

    Article  PubMed  CAS  Google Scholar 

  • Nanni L, Lumini A (2006b) MppS: an ensemble of support vector machine based on multiple physicochemical properties of amino-acids. NeuroComputing 69:1688–1690

    Article  Google Scholar 

  • Niu B, Cai YD, Lu WC, Zheng GY, Chou KC (2006) Predicting protein structural class with AdaBoost learner. Protein Peptide Lett 13:489–492

    Article  CAS  Google Scholar 

  • Paul TK, Iba H (2007) Prediction of cancer class with majority voting genetic programming classifier using gene expression data. IEEE Trans Comp Biol Bioinformatics. http://doi.ieeecomputersociety.org/10.1109/TCBB.2007.70245

  • Pu X, Guo J, Leung H, Lin Y (2007) Prediction of membrane protein types from sequences and position-specific scoring matrices. J Theor Biol 247:259–265

    Article  PubMed  CAS  Google Scholar 

  • Rögnvaldsson T, You L (2003) Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics 20(11):1702–1709

    Article  CAS  Google Scholar 

  • Shen HB, Chou KC (2007a) EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. Biochem Biophys Res Commun 364:53–59

    Article  PubMed  CAS  Google Scholar 

  • Shen HB, Chou KC (2007b) Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Commun 355:1006–1111

    Article  PubMed  CAS  Google Scholar 

  • Shen HB, Chou KC (2007c) Signal-3L: a 3-layer approach for predicting signal peptide. Biochem Biophys Res Commun 363:297–303

    Article  PubMed  CAS  Google Scholar 

  • Shen HB, Chou KC (2007d) Using ensemble classifier to identify membrane protein types. Amino Acids 32:483–488

    Article  PubMed  CAS  Google Scholar 

  • Shen HB, Yang J, Chou KC (2007) Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction. Amino Acids 33:57–67

    Article  PubMed  CAS  Google Scholar 

  • Shi JY, Zhang SW, Pan Q, Cheng Y-M, Xie J (2007) Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids 33:69–74

    Article  PubMed  CAS  Google Scholar 

  • Sun XD, Huang RB (2006) Prediction of protein structural classes using support vector machines. Amino Acids 30:469–475

    Article  PubMed  CAS  Google Scholar 

  • Tan F, Feng X, Fang Z, Li M, Guo Y, Jiang L (2007) Prediction of mitochondrial proteins based on genetic algorithm—partial least squares and support vector machine. Amino Acids. doi:10.1007/s00726-006-0465-0

  • Yu, Bhanu B (2006) Evolutionary feature synthesis for facial expression recognition. Pattern Recognit Lett 27:1289–1298

    Article  Google Scholar 

  • Wang M, Yang J, Chou KC (2005) Using string kernel to predict signal peptide cleavage site based on subsite coupling model. Amino Acids (Erratum: ibid., 2005, 29:301) 28:395–402

  • Wang M, Yang J, Liu GP, Xu ZJ, Chou KC (2004) Weighted-support vector machines for predicting membrane protein types based on pseudo amino acid composition. Protein Eng, Des, Sel 17:509–516

    Article  CAS  Google Scholar 

  • Wen Z, Li M, Li Y, Guo Y, Wang K (2006) Delaunay triangulation with partial least squares projection to latent structures: a model for G-protein coupled receptors classification and fast structure recognition. Amino Acids 32:277–283

    Article  PubMed  CAS  Google Scholar 

  • Xiao X, Chou KC (2007) Digital coding of amino acids based on hydrophobic index. Protein Peptide Lett 14:871–875

    Article  CAS  Google Scholar 

  • Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC (2005) Using complexity measure factor to predict protein subcellular location. Amino Acids 28:57–61

    Article  PubMed  CAS  Google Scholar 

  • Xiao X, Shao SH, Ding YS, Huang ZD, Chou KC (2006a) Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acids 30:49–54

    Article  PubMed  CAS  Google Scholar 

  • Xiao X, Shao SH, Huang ZD, Chou KC (2006b) Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor. J Comput Chem 27:478–482

    Article  PubMed  CAS  Google Scholar 

  • Yuan Z (1999) Prediction of protein subcellular locations using Markov chain models. FEBS Lett 451:23–26

    Article  PubMed  CAS  Google Scholar 

  • Zhang SW, Pan Q, Zhang HC, Shao ZC, Shi JY (2006) Prediction protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and naive Bayes feature fusion. Amino Acids 30:461–468

    Article  PubMed  CAS  Google Scholar 

  • Zhang TL, Ding YS (2007) Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes. Amino Acids. doi:10.1007/s00726-007-0496-1

  • Zhou XB, Chen C, Li ZC, Zou XY (2007) Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol 248:546–551

    Article  PubMed  CAS  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Loris Nanni.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Nanni, L., Lumini, A. Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 34, 653–660 (2008). https://doi.org/10.1007/s00726-007-0018-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00726-007-0018-1

Keywords

Navigation