ABSTRACT
This paper introduces MLStar, our AutoML system, which uses genetic programming to synthesize scikit-learn- and Keras-based Python programs for supervised learning. MLStar builds on our own genetic programming system, GPStar4, and explores a larger search space than traditional genetic programming frameworks.
Key elements behind MLStar's performance include representing individuals as directed acyclic graphs (DAGs), a rich type system that shapes the kinds of graphs generated, novel genetic operators that act directly on the DAG structure, and advanced hyperparameter tuning via the Optuna optimization framework. MLStar also supports multi-objective fitness functions and a variety of complex population types.
We show that MLStar compares favorably with several other AutoML frameworks on benchmark tests, and that it produces competitive solutions even when its computationally expensive features are disabled.
- ClimbsRocks. 2023. auto_ml. https://github.com/ClimbsRocks/auto_ml
- Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- James Bergstra, Daniel Yamins, and David Cox. 2013. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 28), Sanjoy Dasgupta and David McAllester (Eds.). PMLR, Atlanta, Georgia, USA, 115--123. https://proceedings.mlr.press/v28/bergstra13.html
- Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD '16). ACM, New York, NY, USA, 785--794.
- François Chollet et al. 2015. Keras. https://github.com/fchollet/keras
- Piali Das, Nikita Ivkin, Tanya Bansal, Laurence Rouesnel, Philip Gautier, Zohar Karnin, Leo Dirac, Lakshmi Ramakrishnan, Andre Perunicic, Iaroslav Shcherbatyi, Wilton Wu, Aida Zolic, Huibin Shen, Amr Ahmed, Fela Winkelmolen, Miroslav Miladinovic, Cédric Archambeau, Alex Tang, Bhaskar Dutt, Patricia Grao, and Kumar Venkateswar. 2020. Amazon SageMaker Autopilot: A White Box AutoML Solution at Scale. In Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning (Portland, OR, USA) (DEEM'20). Association for Computing Machinery, New York, NY, USA, Article 2, 7 pages.
- Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. Efficient and Robust Automated Machine Learning. In Advances in Neural Information Processing Systems 28 (2015). 2962--2970.
- Haifeng Jin, Qingquan Song, and Xia Hu. 2019. Auto-Keras: An Efficient Neural Architecture Search System. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD '19). Association for Computing Machinery, New York, NY, USA, 1946--1956.
- Julien Amblard, Robert Filman, and Gabriel Kopito. 2023. GPStar4: A Flexible Framework for Experimenting with Genetic Programming. Submitted to GECCO 2023.
- James Max Kanter and Kalyan Veeramachaneni. 2015. Deep feature synthesis: Towards automating data science endeavors. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (2015), 1--10.
- Donald E. Knuth. 1968. Semantics of Context-Free Languages. Math. Syst. Theory 2, 2 (1968), 127--145.
- Erin LeDell and Sebastien Poirier. 2020. H2O AutoML: Scalable Automatic Machine Learning. 7th ICML Workshop on Automated Machine Learning (AutoML) (July 2020). https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf
- Edgar Galván López and Katya Rodríguez-Vázquez. 2007. Multiple Interactive Outputs in a Single Tree: An Empirical Investigation. In Genetic Programming, 10th European Conference, EuroGP 2007, Valencia, Spain, April 11--13, 2007, Proceedings (Lecture Notes in Computer Science, Vol. 4445), Marc Ebner, Michael O'Neill, Anikó Ekárt, Leonardo Vanneschi, and Anna Esparcia-Alcázar (Eds.). Springer, 341--350.
- Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore. 2016. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science.
- Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, 36 (11 Dec 2017), 1--13.
- Riccardo Poli, William B. Langdon, and Nicholas F. McPhee. 2008. A Field Guide to Genetic Programming. Lulu.com. 250 pp. ISBN 978-1-4092-0073-4.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.
- Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf
- Léo Françoso D. P. Sotto, Paul Kaufmann, Timothy Atkinson, Roman Kalkreuth, and Márcio Porto Basgalupp. 2020. A Study on Graph Representations for Genetic Programming. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (Cancún, Mexico) (GECCO '20). Association for Computing Machinery, New York, NY, USA, 931--939.
- In-Kwon Yeo and Richard A. Johnson. 2000. A new family of power transformations to improve normality or symmetry. Biometrika 87, 4 (December 2000), 954--959.
Index Terms
- MLStar: A System for Synthesis of Machine-Learning Programs