Learning Heuristics for Mining RNA Sequence-Structure Motifs

Elyasaf, Achiya; Vaks, Pavel; Milo, Nimrod; Sipper, Moshe; Ziv-Ukelson, Michal

doi:10.1007/978-3-319-34223-8_2

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

The computational identification of conserved motifs in RNA molecules is a major, yet largely unsolved, problem. Structural conservation serves as strong evidence for important RNA functionality. Thus, comparative structure analysis is the gold standard for the discovery and interpretation of functional RNAs.

In this paper we focus on one type of functional RNA motif, sequence-structure motifs, which mark the molecule as a target to be recognized by other molecules.

We present a new approach for detecting RNA structure (including pseudoknots) that is conserved among a set of unaligned RNA sequences. Our method extends previous approaches to this problem, which first identified conserved stems and then assembled them into complex structural motifs. The novelty of our approach lies in performing the identification and the assembly of these stems simultaneously. We believe this unified approach offers a more informative model for deciphering the evolution of functional RNAs, where the sets of stems comprising a conserved motif co-evolve as a correlated functional unit.

Since the task of mining RNA sequence-structure motifs can be addressed by solving the maximum weighted clique problem in an n-partite graph, we translate the maximum weighted clique problem into a state graph. Then, we gather and define domain knowledge and low-level heuristics for this domain. Finally, we learn hyper-heuristics for this domain, which can be used with heuristic search algorithms (e.g., A*, IDA*) for the mining task.

The hyper-heuristics are evolved using HH-Evolver, a tool for domain-specific, hyper-heuristic evolution. Our approach is designed to overcome the computational limitations of current algorithms and to remove the need for the assumptions previously used to sparsify the graph.

This is still work in progress, and as yet we have no results to report. However, given the interest in the methodology and its previous success in other domains, we are hopeful that these will be forthcoming soon.
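
To make the clique-to-state-graph translation mentioned in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes each part of the n-partite graph holds the candidate vertices for one input sequence, that a solution picks exactly one vertex per part, and that a state is a partial selection covering the first k parts. The names `best_clique` and `pairwise_upper_bound` are hypothetical. The `heuristic` argument marks the place where a learned hyper-heuristic could be plugged in; the hand-written bound used here is only a stand-in.

```python
import heapq
from itertools import count


def pairwise_upper_bound(chosen, parts, weight):
    """Optimistic estimate of the weight still obtainable from a partial state.

    For every part not yet decided, add the best edge it could form with each
    earlier part (restricted to the chosen vertex when that part is decided).
    """
    k = len(chosen)
    total = 0.0
    for j in range(k, len(parts)):
        for i in range(j):
            left = [chosen[i]] if i < k else parts[i]
            total += max(
                (weight.get(frozenset((u, v)), 0.0) for u in left for v in parts[j]),
                default=0.0,
            )
    return total


def best_clique(parts, weight, heuristic):
    """Best-first search over partial selections (one vertex per part).

    parts:  list of lists of vertex ids, one list per part.
    weight: dict mapping frozenset({u, v}) to an edge weight; a missing key
            means the two vertices are not adjacent.
    Returns (total_weight, chosen_vertices) or None if no full clique exists.
    """
    n = len(parts)
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(-heuristic((), parts, weight), next(tie), 0.0, ())]
    while frontier:
        _, _, g, chosen = heapq.heappop(frontier)
        k = len(chosen)
        if k == n:
            return g, chosen  # first complete state popped is the best one
        for v in parts[k]:
            ws = [weight.get(frozenset((u, v))) for u in chosen]
            if any(w is None for w in ws):
                continue  # v is not adjacent to every vertex chosen so far
            g2 = g + sum(ws)
            state = chosen + (v,)
            f2 = g2 + heuristic(state, parts, weight)
            heapq.heappush(frontier, (-f2, next(tie), g2, state))
    return None


# Toy instance: three parts, edge weights chosen arbitrarily for illustration.
parts = [["a1", "a2"], ["b1", "b2"], ["c1"]]
edges = [("a1", "b1", 3), ("a1", "b2", 1), ("a2", "b1", 2), ("a2", "b2", 5),
         ("a1", "c1", 4), ("a2", "c1", 1), ("b1", "c1", 2), ("b2", "c1", 1)]
weight = {frozenset((u, v)): w for u, v, w in edges}
print(best_clique(parts, weight, pairwise_upper_bound))
```

Because this is a maximization problem, the estimate must never underestimate the remaining weight for the first complete state popped to be optimal; the bound above satisfies this by taking the best available edge for every undecided pair of parts. On the toy instance the search selects a1, b1, and c1 with total weight 9.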

Acknowledgements

This research was supported by the Israel Science Foundation (grant no. 123/11 and grant no. 179/14).

Author information

Authors and Affiliations

Department of Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel
Achiya Elyasaf, Pavel Vaks, Nimrod Milo, Moshe Sipper & Michal Ziv-Ukelson

Corresponding author

Correspondence to Achiya Elyasaf.

Editor information

Editors and Affiliations

Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan, USA
Rick Riolo
Evolution Enterprises, Ann Arbor, Michigan, USA
W.P. Worzel
Evolved Analytics, Midland, Michigan, USA
Mark Kotanchek
Evolved Analytics LLC, Midland, Michigan, USA
Arthur Kordon

About this chapter

Cite this chapter

Elyasaf, A., Vaks, P., Milo, N., Sipper, M., Ziv-Ukelson, M. (2016). Learning Heuristics for Mining RNA Sequence-Structure Motifs. In: Riolo, R., Worzel, W., Kotanchek, M., Kordon, A. (eds) Genetic Programming Theory and Practice XIII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-34223-8_2

DOI: https://doi.org/10.1007/978-3-319-34223-8_2
Published: 22 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34221-4
Online ISBN: 978-3-319-34223-8
eBook Packages: Computer Science; Computer Science (R0)
