abstract = "We describe FlexGP, the first Genetic Programming
system to perform symbolic regression on large-scale
datasets on the cloud via massive data-parallel
ensemble learning. Flex-GP provides a decentralized,
fault tolerant parallelization framework that runs many
copies of Multiple Regression Genetic Programming, a
sophisticated symbolic regression algorithm, on the
cloud. Each copy executes with a different sample of
the data and different parameters. The framework can
create a fused model or ensemble on demand as the
individual GP learners are evolving. We demonstrate our
framework by deploying 100 independent GP instances in
a massive data-parallel manner to learn from a dataset
composed of 515K exemplars and 90 features, and by
generating a competitive fused model in less than 10
minutes.",
notes = "DataModeler\cite{Friese:2012:dortmund} and Eureqa
\cite{Science09:Schmidt} Data Modeller, Eurequa,
embarrassingly parallel, LASSO, factor subsets,
NSGA-II,regularized linear regression, Vopal Wabbit,
million song dataset DynEq GP, producer
effect