DOI: 10.1145/3583131.3590361
Research article

Fast and Efficient Local-Search for Genetic Programming Based Loss Function Learning

Published: 12 July 2023

ABSTRACT

In this paper, we build upon the topic of loss function learning, an emergent meta-learning paradigm that aims to learn loss functions which significantly improve the performance of the models trained under them. Specifically, we propose a new meta-learning framework for task- and model-agnostic loss function learning via a hybrid search approach. The framework first uses genetic programming to discover a set of symbolic loss functions. The learned loss functions are then parameterized and further optimized via unrolled differentiation. The versatility and performance of the proposed framework are empirically validated on a diverse set of supervised learning tasks. Results show that the learned loss functions improve convergence, sample efficiency, and inference performance on tabular, computer vision, and natural language processing problems, across a variety of task-specific neural network architectures.
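To make the second stage more concrete, the sketch below illustrates, with assumed toy names and data, how a parameterized loss function can be tuned by unrolled differentiation in PyTorch: a small model is trained for a few inner gradient steps under the learned loss, the resulting model is scored with a standard task objective on the same data, and the gradient of that score is propagated back through the unrolled inner steps to the loss parameters. This is a minimal illustration of the general technique, not the authors' implementation; `learned_loss`, `phi`, and the synthetic data are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code) of unrolled differentiation
# for tuning a parameterized loss function with PyTorch.
import torch

torch.manual_seed(0)

# Toy binary-classification data standing in for a real task.
X, y = torch.randn(128, 10), torch.randint(0, 2, (128,)).float()

# Hypothetical parameterized loss: phi shapes a blend of error terms.
phi = torch.tensor([1.0, 0.0], requires_grad=True)  # meta-parameters

def learned_loss(pred, target, phi):
    err = torch.sigmoid(pred).squeeze(-1) - target
    return (phi[0] * err ** 2 + phi[1] * err.abs()).mean()

def unrolled_meta_loss(phi, inner_steps=5, lr=0.1):
    # Fresh base model whose weights are updated functionally, so that
    # gradients can flow back through the unrolled inner optimization.
    w = torch.zeros(10, 1, requires_grad=True)
    for _ in range(inner_steps):
        inner = learned_loss(X @ w, y, phi)
        (g,) = torch.autograd.grad(inner, w, create_graph=True)
        w = w - lr * g  # differentiable SGD step
    # Meta-objective: task performance (cross-entropy) of the trained model.
    return torch.nn.functional.binary_cross_entropy_with_logits(
        (X @ w).squeeze(-1), y)

meta_opt = torch.optim.Adam([phi], lr=0.01)
for step in range(20):
    meta_opt.zero_grad()
    meta_loss = unrolled_meta_loss(phi)
    meta_loss.backward()  # backpropagate through the unrolled inner steps
    meta_opt.step()
print("tuned loss parameters:", phi.detach())
```

In the full framework, the symbolic structure of each candidate loss would come from the genetic programming stage; here the parameterized form is simply assumed so that the unrolling step can be shown in isolation.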


Published in

GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference, July 2023, 1667 pages.
ISBN: 9798400701191
DOI: 10.1145/3583131
Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
