Learning Heuristics for Mining RNA Sequence-Structure Motifs

  • Chapter
Genetic Programming Theory and Practice XIII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

The computational identification of conserved motifs in RNA molecules is a major, yet largely unsolved, problem. Structural conservation serves as strong evidence for important RNA functionality. Thus, comparative structure analysis is the gold standard for the discovery and interpretation of functional RNAs.

In this paper we focus on one type of functional RNA motif: sequence-structure motifs, which mark a molecule as a target to be recognized by other molecules.

We present a new approach for detecting RNA structure (including pseudoknots) that is conserved among a set of unaligned RNA sequences. Our method extends previous approaches to this problem, which first identified conserved stems and then assembled them into complex structural motifs. The novelty of our approach lies in performing the identification and the assembly of these stems simultaneously. We believe this unified approach offers a more informative model for deciphering the evolution of functional RNAs, where the sets of stems comprising a conserved motif co-evolve as a correlated functional unit.

Since the task of mining RNA sequence-structure motifs can be addressed by solving the maximum weighted clique problem in an n-partite graph, we translate the maximum weighted clique problem into a state graph. Then, we gather and define domain knowledge and low-level heuristics for this domain. Finally, we learn hyper-heuristics for this domain, which can be used with heuristic search algorithms (e.g., A*, IDA*) for the mining task.

The hyper-heuristics are evolved using HH-Evolver, a tool for domain-specific, hyper-heuristic evolution. Our approach is designed to overcome the computational limitations of current algorithms and to remove the need for the assumptions previously used to sparsify the graph.

This is still work in progress, and as yet we have no results to report. However, given the interest in the methodology and its previous success in other domains, we are hopeful that results will be forthcoming soon.
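
To make the state-graph formulation in the abstract concrete, the following is a minimal sketch, not the chapter's implementation. It assumes node weights (the chapter may weight edges instead), one partition per input sequence, and a hand-written optimistic bound standing in for the evolved hyper-heuristic. The names `max_weight_clique`, `parts`, `weight`, and `adjacent` are hypothetical. A state is the tuple of nodes chosen from the first k partitions; a successor either skips partition k or adds one node adjacent to everything chosen so far. Because the bound never underestimates the remaining weight, the first complete state popped from the queue is a maximum weighted clique.

```python
import heapq
from itertools import count


def max_weight_clique(parts, weight, adjacent):
    """A*-style best-first search for a maximum weighted clique.

    parts    : list of partitions, each a list of node ids (assumed layout)
    weight   : dict mapping node id -> non-negative weight (assumed)
    adjacent : set of frozenset({u, v}) edges between different partitions
    """
    n = len(parts)

    def compatible(node, chosen):
        # A node may join the clique only if it is adjacent to every member.
        return all(frozenset((node, c)) in adjacent for c in chosen)

    def upper_bound(k, chosen):
        # Optimistic estimate of the weight still obtainable: take the best
        # compatible node of each remaining partition, ignoring conflicts
        # among those nodes. This never underestimates the true remainder.
        return sum(
            max((weight[v] for v in part if compatible(v, chosen)), default=0)
            for part in parts[k:]
        )

    tie = count()  # tie-breaker so the heap never compares state tuples
    frontier = [(-upper_bound(0, ()), next(tie), 0, 0, ())]
    while frontier:
        _, _, g, k, chosen = heapq.heappop(frontier)
        if k == n:
            return g, chosen  # first complete state popped is optimal
        successors = [(g, chosen)]  # option 1: skip partition k
        for v in parts[k]:          # option 2: add one compatible node
            if compatible(v, chosen):
                successors.append((g + weight[v], chosen + (v,)))
        for g2, chosen2 in successors:
            f = g2 + upper_bound(k + 1, chosen2)
            heapq.heappush(frontier, (-f, next(tie), g2, k + 1, chosen2))
    return 0, ()


# Toy instance (invented data): three partitions, four edges.
parts = [["a1", "a2"], ["b1", "b2"], ["c1"]]
weight = {"a1": 3, "a2": 5, "b1": 4, "b2": 1, "c1": 2}
adjacent = {
    frozenset(p)
    for p in [("a1", "b1"), ("a1", "c1"), ("b1", "c1"), ("a2", "b2")]
}
best_weight, best_clique = max_weight_clique(parts, weight, adjacent)
```

On the toy instance the heaviest single node, `a2`, is not part of the best clique: the search returns `a1`, `b1`, `c1` with total weight 9. In the approach the abstract describes, the fixed `upper_bound` function is the piece that a learned hyper-heuristic, combining several low-level heuristics, would replace; a learned estimate need not be an upper bound, in which case the optimality guarantee above no longer holds.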

Acknowledgements

This research was supported by the Israel Science Foundation (grant no. 123/11 and grant no. 179/14).

Author information

Corresponding author

Correspondence to Achiya Elyasaf.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Elyasaf, A., Vaks, P., Milo, N., Sipper, M., Ziv-Ukelson, M. (2016). Learning Heuristics for Mining RNA Sequence-Structure Motifs. In: Riolo, R., Worzel, W., Kotanchek, M., Kordon, A. (eds) Genetic Programming Theory and Practice XIII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-34223-8_2

  • DOI: https://doi.org/10.1007/978-3-319-34223-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34221-4

  • Online ISBN: 978-3-319-34223-8

  • eBook Packages: Computer Science (R0)