Created by W.Langdon from gp-bibliography.bib Revision:1.8592
However, when the dataset is large, each individual configuration takes longer to execute, so overall AutoML running times become increasingly high.
To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size rather than the configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data subset that preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset, and finally, it refines the resulting pipeline by executing a restricted, much shorter, AutoML process on the large dataset. Our experiments on three popular AutoML frameworks (Auto-Sklearn, TPOT, and H2O) show that SubStrat reduces their running times by 76.3 percent on average, with only a 4.15 percent average decrease in the accuracy of the resulting ML pipeline.
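The subset-search step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which characteristic is preserved, so per-column means are used here as a hypothetical stand-in, and the names `characteristic`, `fitness`, `find_subset`, and `subset_then_refine` are all invented for illustration. The AutoML run and the refinement run are passed in as caller-supplied functions, since their interfaces are not given in the abstract.

```python
import random
from statistics import mean


def characteristic(rows):
    """Hypothetical stand-in measure: the per-column means of the rows."""
    return [mean(col) for col in zip(*rows)]


def fitness(subset_idx, data, target):
    """Negative gap between subset and full-data measure (higher is better)."""
    sub = characteristic([data[i] for i in subset_idx])
    return -sum(abs(a - b) for a, b in zip(sub, target))


def find_subset(data, size, generations=50, pop_size=20, seed=0):
    """Simple genetic search for `size` row indices that preserve the measure."""
    rng = random.Random(seed)
    n = len(data)
    target = characteristic(data)
    population = [rng.sample(range(n), size) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=lambda s: fitness(s, data, target), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: draw the child from the union of both parents.
            child = rng.sample(list(set(a) | set(b)), size)
            # Mutation: occasionally swap one index for an unused row.
            if rng.random() < 0.3:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(size)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return max(population, key=lambda s: fitness(s, data, target))


def subset_then_refine(data, size, run_automl, refine):
    """Three-step outline: find subset, run AutoML on it, refine on full data.

    `run_automl` and `refine` are caller-supplied placeholders, not real APIs.
    """
    idx = find_subset(data, size)
    pipeline = run_automl([data[i] for i in idx])
    return refine(pipeline, data)
```

Because the better half of each generation is carried over unchanged, the best subset found never gets worse from one generation to the next. A real measure, a real AutoML tool, and a restricted refinement search would replace the placeholders above.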
Genetic Programming entries for Teddy Lazebnik, Amit Somech, Abraham Itzhak Weinberg