#1. The complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result:

1. AutoLR: an evolutionary approach to learning rate policies
2. Evolving Learning Rate Optimizers for Deep Neural Networks

#2. The name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s):

Pedro Carvalho
Departamento de Engenharia Informática
Faculdade de Ciências e Tecnologia, Universidade de Coimbra
Pólo II - Pinhal de Marrocos
3030-290, Coimbra, Portugal
pfcarvalho@dei.uc.pt
+351 239790016

Nuno Lourenço
Departamento de Engenharia Informática
Faculdade de Ciências e Tecnologia, Universidade de Coimbra
Pólo II - Pinhal de Marrocos
3030-290, Coimbra, Portugal
naml@dei.uc.pt
+351 239790016

Filipe Assunção
Departamento de Engenharia Informática
Faculdade de Ciências e Tecnologia, Universidade de Coimbra
Pólo II - Pinhal de Marrocos
3030-290, Coimbra, Portugal
fga@dei.uc.pt
+351 239790016

Penousal Machado
Departamento de Engenharia Informática
Faculdade de Ciências e Tecnologia, Universidade de Coimbra
Pólo II - Pinhal de Marrocos
3030-290, Coimbra, Portugal
machado@dei.uc.pt
+351 239790052

#3. The name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition):

Nuno Lourenço (naml@dei.uc.pt)

#4. The abstract of the paper(s):

a) The choice of a proper learning rate is paramount for good Artificial Neural Network training and performance. In the past, one had to rely on experience and trial-and-error to find an adequate learning rate. Presently, a plethora of state-of-the-art automatic methods exist that make the search for a good learning rate easier. While these techniques are effective and have yielded good results over the years, they are general solutions. This means the optimization of the learning rate for specific network topologies remains largely unexplored.
This work presents AutoLR, a framework that evolves Learning Rate Schedulers for a specific Neural Network Architecture using Structured Grammatical Evolution. The system was used to evolve learning rate policies that were compared with a commonly used baseline learning rate value. Results show that training performed using certain evolved policies is more efficient than the established baseline, and suggest that this approach is a viable means of improving a neural network's performance.

b) Artificial Neural Networks (ANNs) became popular due to their successful application to difficult problems such as image and speech recognition. However, when practitioners want to design an ANN, they need to undergo a laborious process of selecting a set of parameters and a topology. Currently, there are several state-of-the-art methods that allow for the automatic selection of some of these aspects. Learning Rate optimizers are a set of such techniques that search for good learning rate values. Whilst these techniques are effective and have yielded good results over the years, they are general solutions, i.e., they do not consider the characteristics of a specific network. We propose a framework called AutoLR to automatically design Learning Rate Optimizers. Two versions of the system are detailed. The first one, Dynamic AutoLR, evolves static and dynamic learning rate optimizers based on the current epoch and the previous learning rate. The second version, Adaptive AutoLR, evolves adaptive optimizers that can fine-tune the learning rate for each network weight, which makes them generally more effective. The results are competitive with the best state-of-the-art methods, even outperforming them in some scenarios. Furthermore, the system evolved an optimizer, ADES, that appears to be novel and innovative since, to the best of our knowledge, it has a structure that differs from state-of-the-art methods.

#5.
A list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies:

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
(G) The result solves a problem of indisputable difficulty in its field.

#6. A statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission):

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

We evaluated AutoLR in two scenarios:

1. The evolution of Dynamic Learning Rate Schedulers

Deep Neural Networks (DNNs) are configured by a set of hyperparameters. One such parameter is the learning rate, which scales the changes made to the network's weights during training. This parameter has a profound effect on the effectiveness of training and on the network's subsequent performance. Traditionally, a single learning rate value is used for the entirety of training, without considering the other hyperparameters. However, since the hyperparameters of ANNs are interdependent, there is no guarantee that the learning rate remains adequate once other parameters are adjusted. To overcome these issues, researchers started to adopt dynamic learning rates, where the learning rate changes throughout the training process (e.g.
start with a high learning rate that decreases as training progresses). In our work we show that, using AutoLR, we can evolve Learning Rate Schedulers whose performance is competitive with the typical approach of using a fixed learning rate. We tested our approach in several scenarios; in the best one, the best evolved scheduler for a DNN on the MNIST dataset attained a classification accuracy of 88.68% on the test set, while the baseline approach, using a fixed learning rate, achieved an accuracy of 87.47%.

2. The evolution of Adaptive Optimizers for Deep Neural Networks

Dynamic learning rate schedulers are still limited because they have no knowledge of what is happening during the training process: they can change the learning rate based on the training epoch, but not based on changes in the gradient. This limitation led to the development of the most sophisticated approach to weight optimization: adaptive optimizers. Adaptive optimizers are able to tune the size of the change applied to each individual weight through auxiliary variables that are tracked per weight. The most well-known and successful adaptive optimizer is Adam [1], which resulted from a succession of increasingly better human-created solutions. Our approach was able to automatically discover an adaptive optimizer, called Adaptive Evolutionary Squared (ADES), which obtained a test accuracy of 79.72% on the CIFAR-10 dataset, whilst Adam obtained 78.76%.

(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

The optimization of neural network weights is a central topic in neural network design. Over the years, several approaches have been used to tackle this problem. Most notably, Stochastic Gradient Descent (SGD) [3] has established itself as the standard solution to neural network weight optimization.
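As a concrete reference point for the two points above, the sketch below shows the basic SGD weight update driven either by a fixed learning rate (the baseline) or by a simple epoch-based step-decay schedule. This is an illustration only: the `step_decay` policy and its constants are generic textbook choices, not an evolved AutoLR scheduler.

```python
# Illustrative only: SGD update w <- w - lr * grad, with the learning
# rate supplied either as a fixed value or by a dynamic, epoch-based
# step-decay schedule (a generic example, not an evolved AutoLR policy).

def step_decay(epoch, base_lr=0.01, drop=0.5, period=10):
    """Dynamic schedule: multiply the rate by `drop` every `period` epochs."""
    return base_lr * drop ** (epoch // period)

def sgd_step(weight, grad, lr):
    """One SGD update for a single weight."""
    return weight - lr * grad

if __name__ == "__main__":
    w = 1.0
    for epoch in range(30):
        grad = 2 * w  # gradient of the toy objective f(w) = w^2
        w = sgd_step(w, grad, step_decay(epoch))
    print(w)  # the weight has moved towards the minimum at 0
```

A fixed learning rate corresponds to calling `sgd_step` with the same `lr` every epoch; a dynamic scheduler replaces that constant with a function of the epoch, which is exactly the kind of policy the evolutionary search explores.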
The success of SGD motivated further research, resulting in a set of SGD-based neural network optimizers being invented over the years. Several seminal optimizers resulted from this research: momentum-based optimizers [2], RMSProp, and finally the standard for modern neural network optimization, Adam [1]. The focus of this field has been to develop optimizers that are generally applicable, since it is not feasible for humans to develop specific optimizers for every relevant problem. In our work, we create a system capable of designing these optimizers through evolution. We then leverage the automatic design abilities of the system to explore the benefits of creating specialized neural network optimizers. The best evolved optimizer produced, ADES, remains competitive with human-made solutions across all experiments, notably outperforming the state-of-the-art Adam in the classification of images from the established CIFAR-10 dataset, as stated in (B).

(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.

Deep Learning (DL) models, in particular CNNs, have shown remarkable performance in solving difficult problems from the fields of computer vision, medicine, and natural language processing. However, when using a DNN for a specific problem we need to train it, i.e., find the set of weights that gives the maximum performance for the task at hand. Training is paramount, and there has been extensive research into how DNNs should be trained and how the training should be regulated, namely through the development of several methodologies and hyperparameters. AutoLR is an evolutionary approach that is capable of automatically discovering novel and effective training methodologies and procedures.
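The per-weight auxiliary variables that distinguish adaptive optimizers can be illustrated with a minimal sketch of Adam's update rule [1], the culmination of the succession of human-created optimizers described above. This is a generic single-parameter illustration for context, not the evolved ADES optimizer, whose structure is given in the papers.

```python
# Illustrative only: one Adam update step for a single parameter,
# showing the two per-weight auxiliary variables (first- and
# second-moment estimates) that epoch-based schedulers lack.
import math

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

if __name__ == "__main__":
    # Minimise the toy objective f(x) = x^2 (gradient 2x) from x = 5.
    x, m, v = 5.0, 0.0, 0.0
    for t in range(1, 201):
        x, m, v = adam_step(x, 2 * x, m, v, t)
    print(x)  # approaches the minimum at 0
```

Because `m` and `v` are tracked for every weight, the effective step size adapts to each weight's gradient history, which is the property that dynamic schedulers alone cannot provide.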
One remarkable result of AutoLR is the discovery of a new adaptive optimizer, called Adaptive Evolutionary Squared (ADES), that performs on par with human-established techniques such as Adam [1], Nesterov Momentum [2], or RMSProp, while being structurally different from them.

(G) The result solves a problem of indisputable difficulty in its field.

In DL, the difficulty is not a lack of models able to solve a task, but rather how to train them effectively. Currently, there are several state-of-the-art methods that allow for the automatic selection of some of these aspects. Learning rate optimizers are a set of such techniques that optimize the size of the changes made to neural network weights during training. Whilst these techniques are effective and have yielded good results over the years, they are general solutions, i.e., they do not consider the characteristics of a specific network and the problem at hand. The AutoLR framework tackles these issues by automatically designing these optimizers. We have demonstrated the generality of our framework by using it to evolve both dynamic and adaptive optimizers. The fact that AutoLR was able to produce results competitive with the state of the art, and that it even discovered a completely novel and innovative optimizer, ADES, shows its effectiveness. Additionally, thanks to its grammar-based engine, AutoLR is highly generalisable and easy to adapt or to extend with additional domain knowledge.

#7.
A full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable):

a) @inproceedings{DBLP:conf/gecco/Carvalho0AM20,
  author    = {Pedro Carvalho and Nuno Louren{\c{c}}o and Filipe Assun{\c{c}}{\~{a}}o and Penousal Machado},
  editor    = {Carlos Artemio Coello Coello},
  title     = {AutoLR: an evolutionary approach to learning rate policies},
  booktitle = {{GECCO} '20: Genetic and Evolutionary Computation Conference, Canc{\'{u}}n, Mexico, July 8-12, 2020},
  pages     = {672--680},
  publisher = {{ACM}},
  year      = {2020},
  url       = {https://doi.org/10.1145/3377930.3390158},
  doi       = {10.1145/3377930.3390158},
}

b) @article{DBLP:journals/corr/abs-2103-12623,
  author  = {Pedro Carvalho and Nuno Louren{\c{c}}o and Penousal Machado},
  title   = {Evolving Learning Rate Optimizers for Deep Neural Networks},
  journal = {CoRR},
  volume  = {abs/2103.12623},
  year    = {2021},
  url     = {https://arxiv.org/abs/2103.12623},
  eprint  = {2103.12623},
}

#8. A statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors:

Any prize money, if any, is to be divided equally among the co-authors.

#9. A statement stating why the authors expect that their entry would be the "best":

AutoLR is an evolutionary framework for the design of neural network optimizers. This task is interesting and challenging from an evolutionary point of view, but also useful and relevant for the field of machine learning. The problem space of neural network optimizers is vast and difficult to navigate.
AutoLR produced notable results in this difficult scenario, not only rediscovering established human-made optimizers autonomously but also innovating on these solutions, creating novel optimizers that employ unique ideas while performing competitively with the state of the art. Furthermore, the experiments performed with AutoLR were conducted with limited computing power: a single server with only four 1080Ti GPUs. Additionally, it should be noted that AutoLR was applied in the computer vision domain, arguably one of the most researched Machine Learning areas. Nevertheless, it was still able to find novelty and outperform established solutions, showcasing the potential of AutoLR in unexplored domains.

#10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.:

Genetic Programming (GP) and Structured Grammatical Evolution (SGE)

#11. The date of publication of each paper. If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is "in press" by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement:

a) June 2020
b) March 2021

References:

[1] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980

[2] Y. Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2),” Doklady AN USSR, vol. 269, pp.
543–547, 1983. [Online]. Available: https://ci.nii.ac.jp/naid/20001173129/en/

[3] L. Bottou, “Online algorithms and stochastic approximations,” in Online Learning and Neural Networks, D. Saad, Ed. Cambridge, UK: Cambridge University Press, 1998; revised Oct. 2012. [Online]. Available: http://leon.bottou.org/papers/bottou-98x