Abstract
In this paper, several methods for the grammar induction problem are examined in the context of biological sequence analysis. In addition, a new method that generates noncircular context-free grammars is proposed. A computational experiment shows that the proposed evolutionary-inspired approach is statistically better, with respect to classification quality, than other grammatical inference algorithms on sequences from a real amyloidogenic dataset.
This research was supported by the National Science Center, grant 2016/21/B/ST6/02158.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wieczorek, W., Unold, O. (2019). GP-Based Grammatical Inference for Classification of Amyloidogenic Sequences. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_9
DOI: https://doi.org/10.1007/978-3-030-14160-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14159-2
Online ISBN: 978-3-030-14160-8
eBook Packages: Computer Science, Computer Science (R0)