ABSTRACT
Tree-based Pipeline Optimization Tool (TPOT) is an automated machine learning (AutoML) system that recommends an optimal pipeline for supervised learning problems by scanning data for novel features, selecting appropriate models, and optimizing their parameters. However, like other AutoML systems, TPOT may reach computational resource limits when working on big data such as whole-genome expression data. We develop two novel features for TPOT, Feature Set Selector and Template, which leverage domain knowledge, greatly reduce the computational expense, and flexibly extend TPOT's application to biomedical big data analysis.
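The two ideas named in the abstract can be illustrated with a minimal, standard-library sketch. It is not TPOT's implementation: the gene set names, the feature names, and the helper functions `select_feature_set` and `parse_template` are all hypothetical. The dash-separated template string is assumed to follow the form described in TPOT's documentation. Restricting the data to one predefined feature set shrinks the input, and fixing the order of pipeline steps shrinks the space of pipelines to search.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature sets encoding domain knowledge (e.g. gene modules).
# The set names and gene names are made up for illustration.
GENE_SETS: Dict[str, List[str]] = {
    "set_a": ["gene1", "gene3"],
    "set_b": ["gene2", "gene4", "gene5"],
}


def select_feature_set(
    rows: Sequence[Sequence[float]],
    columns: Sequence[str],
    set_name: str,
    gene_sets: Dict[str, List[str]] = GENE_SETS,
) -> Tuple[List[List[float]], List[str]]:
    """Keep only the columns that belong to the named feature set."""
    wanted = set(gene_sets[set_name])
    keep = [i for i, name in enumerate(columns) if name in wanted]
    subset = [[row[i] for i in keep] for row in rows]
    return subset, [columns[i] for i in keep]


def parse_template(template: str) -> List[str]:
    """Split a dash-separated template into its ordered pipeline steps."""
    return template.split("-")


# Toy expression matrix: two samples, five genes.
columns = ["gene1", "gene2", "gene3", "gene4", "gene5"]
rows = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.1, 1.2, 1.3, 1.4, 1.5],
]

# Restrict the data to one feature set before any model sees it.
subset, names = select_feature_set(rows, columns, "set_a")

# A fixed pipeline shape: selector first, then a transformer, then a classifier.
steps = parse_template("FeatureSetSelector-Transformer-Classifier")
```

In this sketch, which feature set to use would be one more choice for the optimizer to make, while the template limits the search to pipelines of a single fixed shape.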
Index Terms
Large scale biomedical data analysis with tree-based automated machine learning