DOI: 10.1145/3583133.3596367
research-article

MLStar: A System for Synthesis of Machine-Learning Programs

Published: 24 July 2023

ABSTRACT

This paper introduces our auto-ML system, MLStar, which uses genetic programming to generate scikit-learn- and Keras-based Python programs for supervised learning. MLStar builds on our own genetic programming system (GPStar4) and provides a larger search space than traditional genetic programming frameworks.
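The abstract does not say how MLStar turns an evolved individual into a program. As a minimal, hypothetical sketch of the general idea of synthesizing program text rather than only configuring a model, the following Python renders a linear list of GP-chosen steps into scikit-learn source code. The `Gene` class, the `render_program` function, and the `stepN` naming are illustrative assumptions, and a linear pipeline is a simplification of MLStar's DAG individuals; only `sklearn.pipeline.Pipeline` and the estimator class names used in the example are real scikit-learn names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical pipeline step: a scikit-learn class plus its hyperparameters."""
    module: str  # e.g. "sklearn.preprocessing"
    cls: str     # e.g. "StandardScaler"
    params: dict = field(default_factory=dict)


def render_program(genes: list[Gene]) -> str:
    """Emit Python source text that builds a scikit-learn Pipeline from the genes."""
    # One import per distinct class, sorted so equal individuals render identically
    imports = sorted({f"from {g.module} import {g.cls}" for g in genes})
    steps = []
    for i, g in enumerate(genes):
        # repr() keeps hyperparameter values valid Python literals
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(g.params.items()))
        steps.append(f"        ('step{i}', {g.cls}({args})),")
    lines = [
        "from sklearn.pipeline import Pipeline",
        *imports,
        "",
        "",
        "def build_model():",
        "    return Pipeline([",
        *steps,
        "    ])",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_program([
        Gene("sklearn.preprocessing", "StandardScaler"),
        Gene("sklearn.ensemble", "RandomForestClassifier",
             {"n_estimators": 200, "max_depth": 8}),
    ]))
```

Emitting source text rather than live estimator objects means an evolved individual can be saved, read, and rerun as an ordinary Python program; the generated text can be syntax-checked with the built-in `compile()` without scikit-learn installed.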

Key elements behind MLStar's performance include representing individuals as directed acyclic graphs (DAGs), a rich type system that shapes the kinds of graphs generated, novel genetic operators that work on the DAG structure, and hyperparameter tuning via the Optuna hyperparameter optimization framework. MLStar also supports multiobjective fitness functions and a variety of complex population types.
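The abstract names three interacting ingredients (DAG individuals, a type system that constrains which graphs can be formed, and genetic operators that act on the graph structure) without specifying them. The following is a minimal sketch of how they can fit together, under stated assumptions: each node declares a type per input slot and one output type, and a hypothetical `mutate_edge` operator rewires one input to a different producer only if the producer's output type matches the slot and the change cannot introduce a cycle. The `Node` and `DAG` classes, the `"Matrix"` and `"Predictions"` type names, and `mutate_edge` are illustrative inventions, not the API of MLStar or GPStar4.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    in_types: tuple[str, ...]  # type each input slot accepts
    out_type: str              # type of the value this node produces


class DAG:
    """Hypothetical typed DAG individual; edges stored as (consumer, slot) -> producer."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.inputs: dict[tuple[str, int], str] = {}

    def add(self, node: Node, *sources: str) -> None:
        self.nodes[node.name] = node
        for slot, src in enumerate(sources):
            self.inputs[(node.name, slot)] = src

    def sources_of(self, name: str) -> list[str]:
        return [src for (dst, _), src in self.inputs.items() if dst == name]

    def depends_on(self, a: str, b: str) -> bool:
        """True if a is b or (transitively) consumes b's output."""
        stack, seen = [a], set()
        while stack:
            cur = stack.pop()
            if cur == b:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.sources_of(cur))
        return False


def mutate_edge(dag: DAG, rng: random.Random) -> bool:
    """Rewire one input slot to another type-compatible producer without
    creating a cycle. Returns False if the chosen slot has no alternative."""
    key = rng.choice(sorted(dag.inputs))
    dst, slot = key
    wanted = dag.nodes[dst].in_types[slot]
    candidates = sorted(
        n.name for n in dag.nodes.values()
        if n.out_type == wanted              # type system restricts wiring
        and n.name != dag.inputs[key]        # must actually change the edge
        and not dag.depends_on(n.name, dst)  # keeps the graph acyclic
    )
    if not candidates:
        return False
    dag.inputs[key] = rng.choice(candidates)
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    dag = DAG()
    dag.add(Node("X", (), "Matrix"))  # raw feature matrix
    dag.add(Node("scale", ("Matrix",), "Matrix"), "X")
    dag.add(Node("pca", ("Matrix",), "Matrix"), "scale")
    dag.add(Node("concat", ("Matrix", "Matrix"), "Matrix"), "scale", "pca")
    dag.add(Node("clf", ("Matrix",), "Predictions"), "concat")
    for _ in range(5):
        mutate_edge(dag, rng)
    print(dag.inputs)
```

Because `clf` is the only node producing `"Predictions"` and no slot accepts that type, the type system alone keeps the classifier at the sink of the graph. A DAG also lets one node's output feed several consumers (here `scale` feeds both `pca` and `concat`), which a tree cannot express without duplicating the subtree; that sharing is why a structural operator needs an explicit cycle check.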

We show that MLStar performs favorably compared with several other auto-ML frameworks on benchmark tests. We also demonstrate that MLStar can produce competitive solutions even when its computationally expensive features are disabled.


Published in

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation
July 2023
2519 pages
ISBN: 9798400701207
DOI: 10.1145/3583133

Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate (GECCO): 1,669 of 4,410 submissions (38%)
