ABSTRACT
In this paper, we build on loss function learning, an emergent meta-learning paradigm that aims to learn loss functions which significantly improve the performance of the models trained under them. Specifically, we propose a new meta-learning framework for task- and model-agnostic loss function learning via a hybrid search approach. The framework first uses genetic programming to discover a set of symbolic loss functions. The learned loss functions are then parameterized and optimized via unrolled differentiation. The versatility and performance of the proposed framework are empirically validated on a diverse set of supervised learning tasks. Results show that the learned loss functions improve convergence, sample efficiency, and inference performance on tabular, computer vision, and natural language processing problems, across a variety of task-specific neural network architectures.
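The second stage of the approach described above, optimizing the parameters of a loss function by differentiating through the base model's own training steps, can be illustrated with a minimal sketch. The toy setup below is an illustrative assumption, not the authors' actual method: a 1-D linear model, an inner loss with a single learned scale parameter `phi` (standing in for a GP-discovered symbolic loss given tunable parameters), and meta-gradients for `phi` accumulated analytically through `K` unrolled SGD steps.

```python
def unrolled_meta_step(w, phi, x_tr, y_tr, x_val, y_val,
                       alpha=0.1, beta=0.05, K=3):
    """One meta-update of the loss parameter phi (toy 1-D sketch).

    Inner loss (parameterized): l(w) = phi * (w*x_tr - y_tr)^2
    Inner update:               w <- w - alpha * dl/dw
    Outer (meta) loss:          L(w_K) = (w_K*x_val - y_val)^2
    dL/dphi is accumulated through the K unrolled inner steps.
    """
    dw_dphi = 0.0  # running derivative of w with respect to phi
    for _ in range(K):
        err = w * x_tr - y_tr
        grad_w = phi * 2.0 * err * x_tr  # dl/dw
        # d(grad_w)/dphi by product + chain rule, since err depends on phi
        dgrad_dphi = 2.0 * err * x_tr + phi * 2.0 * x_tr * x_tr * dw_dphi
        dw_dphi = dw_dphi - alpha * dgrad_dphi
        w = w - alpha * grad_w
    # Outer gradient: dL/dphi = dL/dw_K * dw_K/dphi
    val_err = w * x_val - y_val
    dL_dphi = 2.0 * val_err * x_val * dw_dphi
    return w, phi - beta * dL_dphi

# Toy usage: the true mapping is y = 2x. Meta-updates push phi above its
# initial value, which effectively enlarges the inner step size so the
# unrolled model reaches the validation target in fewer inner steps.
w, phi = 0.0, 1.0
for _ in range(20):
    w, phi = unrolled_meta_step(0.0, phi, x_tr=1.0, y_tr=2.0,
                                x_val=1.5, y_val=3.0)
```

In an autodiff framework the hand-derived `dw_dphi` bookkeeping would be replaced by retaining the inner-loop computation graph (e.g. higher-order gradients), which is what makes the parameterization stage scale beyond toy losses.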
Fast and Efficient Local-Search for Genetic Programming Based Loss Function Learning