Discovering Gated Recurrent Neural Network Architectures

Chapter in: Deep Neural Evolution

Part of the book series: Natural Computing Series (NCS)

Abstract

Gated recurrent networks, such as those composed of Long Short-Term Memory (LSTM) nodes, have recently been used to advance the state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement-learning mechanisms have been employed to create new variations of this structure. This chapter proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods can. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement on the standard language modeling benchmark task. The chapter also shows how the search process can be sped up by training an LSTM network to estimate the performance of candidate structures, and by encouraging exploration of novel solutions. Evolutionary design of complex neural network structures thus promises to improve the performance of deep learning architectures beyond what human design alone can achieve.
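To make the central idea concrete, the following is a minimal Python sketch of how a gated memory node might be represented as an expression tree for evolutionary search. The operation set, Node class, and evaluate() interface are illustrative assumptions, not the chapter's actual implementation; the real system evolves considerably richer trees with multiple recurrent paths and memory cells.

    # Illustrative sketch only: a tree-based encoding of a gated recurrent node,
    # in the spirit of the chapter. All names and structure are assumptions.
    import numpy as np

    # Elementwise primitives an internal tree node may apply.
    OPS = {
        "add":  lambda a, b: a + b,
        "mul":  lambda a, b: a * b,                  # elementwise gating
        "tanh": lambda a: np.tanh(a),
        "sig":  lambda a: 1.0 / (1.0 + np.exp(-a)),  # logistic sigmoid
    }

    class Node:
        """An expression-tree node: either an operation or a named input leaf."""
        def __init__(self, op=None, children=(), leaf=None):
            self.op, self.children, self.leaf = op, tuple(children), leaf

        def evaluate(self, env):
            if self.leaf is not None:                # leaf: look up an input signal
                return env[self.leaf]
            args = [child.evaluate(env) for child in self.children]
            return OPS[self.op](*args)

    def leaf(name):
        return Node(leaf=name)

    # A hand-built tree that roughly mirrors the LSTM cell update
    # c_t = forget_gate * c_prev + input_gate * candidate:
    cell_update = Node("add", [
        Node("mul", [Node("sig", [leaf("x")]), leaf("c_prev")]),             # forget path
        Node("mul", [Node("sig", [leaf("x")]), Node("tanh", [leaf("x")])]),  # input path
    ])

    env = {"x": np.random.randn(4), "c_prev": np.ones(4)}
    print("new cell state:", cell_update.evaluate(env))

In an evolutionary setup of this kind, mutation would replace subtrees and crossover would exchange them between candidates; each candidate tree would then be instantiated as a recurrent node (with learned weights on its inputs), trained, and scored on the benchmark task to obtain its fitness.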



Author information


Correspondence to Aditya Rawal or Risto Miikkulainen.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Rawal, A., Liang, J., Miikkulainen, R. (2020). Discovering Gated Recurrent Neural Network Architectures. In: Iba, H., Noman, N. (eds) Deep Neural Evolution. Natural Computing Series. Springer, Singapore. https://doi.org/10.1007/978-981-15-3685-4_9


  • DOI: https://doi.org/10.1007/978-981-15-3685-4_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3684-7

  • Online ISBN: 978-981-15-3685-4

  • eBook Packages: Computer Science (R0)
