abstract = "As data science becomes more mainstream, there will be
an ever-growing demand for data science tools that are
more accessible, flexible, and scalable. In response to
this demand, automated machine learning (AutoML)
researchers have begun building systems that automate
the process of designing and optimizing machine
learning pipelines. In this paper we present TPOT, an
open-source genetic programming-based AutoML system
that optimizes a series of feature preprocessors and
machine learning models with the goal of maximizing
classification accuracy on a supervised classification
task. We benchmark TPOT on a series of 150 supervised
classification tasks and find that it significantly
outperforms a basic machine learning analysis in 22 of
them, while experiencing minimal degradation in
accuracy on 5 of the benchmarks, all without any domain
knowledge or human input. As such, GP-based AutoML
systems show considerable promise in the AutoML
domain.",