Discovering Gated Recurrent Neural Network Architectures

Chapter in: Deep Neural Evolution

Part of the book series: Natural Computing Series (NCS)

Abstract

Gated recurrent networks, such as those composed of Long Short-Term Memory (LSTM) nodes, have recently been used to advance the state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement-learning mechanisms have been employed to create new variations of this structure. This chapter proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods can. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement on the standard language modeling benchmark task. The chapter also shows how the search process can be sped up by training an LSTM network to estimate the performance of candidate structures, and by encouraging exploration of novel solutions. Evolutionary design of complex neural network structures thus promises to improve the performance of deep learning architectures beyond what human design alone can achieve.
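To make the central idea concrete, the following is a minimal Python sketch of how a gated memory node might be represented as an expression tree for evolutionary search. The operation set, Node class, and evaluate() interface are illustrative assumptions, not the chapter's actual implementation; the real system evolves considerably richer trees with multiple recurrent paths and memory cells.

    # Illustrative sketch only: a tree-based encoding of a gated recurrent node,
    # in the spirit of the chapter. All names and structure are assumptions.
    import numpy as np

    # Elementwise primitives an internal tree node may apply.
    OPS = {
        "add":  lambda a, b: a + b,
        "mul":  lambda a, b: a * b,                  # elementwise gating
        "tanh": lambda a: np.tanh(a),
        "sig":  lambda a: 1.0 / (1.0 + np.exp(-a)),  # logistic sigmoid
    }

    class Node:
        """An expression-tree node: either an operation or a named input leaf."""
        def __init__(self, op=None, children=(), leaf=None):
            self.op, self.children, self.leaf = op, tuple(children), leaf

        def evaluate(self, env):
            if self.leaf is not None:                # leaf: look up an input signal
                return env[self.leaf]
            args = [child.evaluate(env) for child in self.children]
            return OPS[self.op](*args)

    def leaf(name):
        return Node(leaf=name)

    # A hand-built tree that roughly mirrors the LSTM cell update
    # c_t = forget_gate * c_prev + input_gate * candidate:
    cell_update = Node("add", [
        Node("mul", [Node("sig", [leaf("x")]), leaf("c_prev")]),             # forget path
        Node("mul", [Node("sig", [leaf("x")]), Node("tanh", [leaf("x")])]),  # input path
    ])

    env = {"x": np.random.randn(4), "c_prev": np.ones(4)}
    print("new cell state:", cell_update.evaluate(env))

In an evolutionary setup of this kind, mutation would replace subtrees and crossover would exchange them between candidates; each candidate tree would then be instantiated as a recurrent node (with learned weights on its inputs), trained, and scored on the benchmark task to obtain its fitness.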



Author information


Correspondence to Aditya Rawal or Risto Miikkulainen.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Rawal, A., Liang, J., Miikkulainen, R. (2020). Discovering Gated Recurrent Neural Network Architectures. In: Iba, H., Noman, N. (eds) Deep Neural Evolution. Natural Computing Series. Springer, Singapore. https://doi.org/10.1007/978-981-15-3685-4_9


  • DOI: https://doi.org/10.1007/978-981-15-3685-4_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3684-7

  • Online ISBN: 978-981-15-3685-4

  • eBook Packages: Computer Science (R0)
