ABSTRACT
In this paper, we build on loss function learning, an emergent meta-learning paradigm that aims to learn loss functions which significantly improve the performance of the models trained under them. Specifically, we propose a new meta-learning framework for task- and model-agnostic loss function learning via a hybrid search approach. The framework first uses genetic programming to discover a set of symbolic loss functions. The learned loss functions are then parameterized and optimized via unrolled differentiation. The versatility and performance of the proposed framework are empirically validated on a diverse set of supervised learning tasks. Results show that the learned loss functions improve convergence, sample efficiency, and inference performance on tabular, computer vision, and natural language processing problems, across a variety of task-specific neural network architectures.
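The second stage of the approach described above, optimizing the parameters of a loss function by differentiating through the base model's own training steps, can be illustrated with a minimal sketch. The toy setup below is an illustrative assumption, not the authors' actual method: a 1-D linear model, an inner loss with a single learned scale parameter `phi` (standing in for a GP-discovered symbolic loss given tunable parameters), and meta-gradients for `phi` accumulated analytically through `K` unrolled SGD steps.

```python
def unrolled_meta_step(w, phi, x_tr, y_tr, x_val, y_val,
                       alpha=0.1, beta=0.05, K=3):
    """One meta-update of the loss parameter phi (toy 1-D sketch).

    Inner loss (parameterized): l(w) = phi * (w*x_tr - y_tr)^2
    Inner update:               w <- w - alpha * dl/dw
    Outer (meta) loss:          L(w_K) = (w_K*x_val - y_val)^2
    dL/dphi is accumulated through the K unrolled inner steps.
    """
    dw_dphi = 0.0  # running derivative of w with respect to phi
    for _ in range(K):
        err = w * x_tr - y_tr
        grad_w = phi * 2.0 * err * x_tr  # dl/dw
        # d(grad_w)/dphi by product + chain rule, since err depends on phi
        dgrad_dphi = 2.0 * err * x_tr + phi * 2.0 * x_tr * x_tr * dw_dphi
        dw_dphi = dw_dphi - alpha * dgrad_dphi
        w = w - alpha * grad_w
    # Outer gradient: dL/dphi = dL/dw_K * dw_K/dphi
    val_err = w * x_val - y_val
    dL_dphi = 2.0 * val_err * x_val * dw_dphi
    return w, phi - beta * dL_dphi

# Toy usage: the true mapping is y = 2x. Meta-updates push phi above its
# initial value, which effectively enlarges the inner step size so the
# unrolled model reaches the validation target in fewer inner steps.
w, phi = 0.0, 1.0
for _ in range(20):
    w, phi = unrolled_meta_step(0.0, phi, x_tr=1.0, y_tr=2.0,
                                x_val=1.5, y_val=3.0)
```

In an autodiff framework the hand-derived `dw_dphi` bookkeeping would be replaced by retaining the inner-loop computation graph (e.g. higher-order gradients), which is what makes the parameterization stage scale beyond toy losses.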
Fast and Efficient Local-Search for Genetic Programming Based Loss Function Learning