Abstract
Gated recurrent networks, such as those composed of Long Short-Term Memory (LSTM) nodes, have recently been used to advance the state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement-learning mechanisms have been employed to create new variations of this structure. This chapter proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to a significant improvement in the standard language modeling benchmark task. The chapter also shows how the search can be sped up by training an LSTM network to estimate the performance of candidate structures, and by encouraging exploration of novel solutions. Evolutionary design of complex neural network structures thus promises to improve deep learning architectures beyond what human designers can achieve.
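To make the core idea concrete, the following is a minimal Python sketch of what a tree-based encoding of a gated memory node and an evolutionary loop over such trees might look like. The operator set, the input names (x_t, h_prev, c_prev), and the size-based placeholder fitness are illustrative assumptions, not the chapter's actual implementation; in the method described here, fitness comes from training the decoded network on a language modeling benchmark, optionally predicted by a surrogate LSTM to speed up search.

```python
# A minimal sketch of a tree-based encoding for a gated memory node,
# evolved by selection and subtree mutation. Operator set, inputs, and
# fitness are illustrative assumptions, not the authors' exact design.
import random

OPS = ["add", "mul", "tanh", "sigmoid"]   # internal (function) nodes
LEAVES = ["x_t", "h_prev", "c_prev"]      # inputs available to a memory node

def random_tree(depth=3):
    """Grow a random expression tree for one node computation."""
    if depth == 0 or random.random() < 0.3:
        return (random.choice(LEAVES),)
    op = random.choice(OPS)
    arity = 2 if op in ("add", "mul") else 1
    return (op,) + tuple(random_tree(depth - 1) for _ in range(arity))

def mutate(tree, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if len(tree) == 1 or random.random() < 0.2:
        return random_tree(depth)
    i = random.randrange(1, len(tree))
    return tree[:i] + (mutate(tree[i], depth),) + tree[i + 1:]

def size(tree):
    return 1 + sum(size(child) for child in tree[1:])

def fitness(tree):
    # Placeholder: the real system trains (or predicts the performance of)
    # the decoded network; here we simply prefer moderately sized trees.
    return -abs(size(tree) - 12)

population = [random_tree() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                       # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(10)]
print("best tree:", max(population, key=fitness))
```

Because each candidate is an explicit tree, variations such as multiple recurrent paths or multiple memory cells correspond to structural changes in the genotype, which is what lets the evolutionary search explore a much larger design space than hand-tuned LSTM variants.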
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Rawal, A., Liang, J., Miikkulainen, R. (2020). Discovering Gated Recurrent Neural Network Architectures. In: Iba, H., Noman, N. (eds) Deep Neural Evolution. Natural Computing Series. Springer, Singapore. https://doi.org/10.1007/978-981-15-3685-4_9
DOI: https://doi.org/10.1007/978-981-15-3685-4_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3684-7
Online ISBN: 978-981-15-3685-4
eBook Packages: Computer Science, Computer Science (R0)