ABSTRACT
We introduce Evolutionary Feature Synthesis (EFS), a regression method that generates readable, nonlinear models of small- to medium-sized datasets in seconds. EFS is, to the best of our knowledge, the fastest regression tool based on evolutionary computation reported to date. The feature search in the proposed method consists of two main steps: feature composition and feature subset selection. EFS adopts a bottom-up feature composition strategy that eliminates the need for a symbolic representation of the features, and it exploits the variable selection performed by pathwise regularized linear regression to carry out the feature subset selection step. The result is a regression method that is competitive with neural networks and outperforms both linear methods and Multiple Regression Genetic Programming, until now the best regression tool based on evolutionary computation.
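The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the operator set (`sq`, `log1pabs`, product), the stopping rule along the regularization path, and all parameter defaults are assumptions chosen for illustration. Composition builds each new feature directly from already-evaluated columns (so only a name string and the values are stored, not an expression tree), and selection keeps the features that receive nonzero coefficients along a lasso path fitted by coordinate descent.

```python
import math
import random

# Hypothetical operator set for this sketch (not taken from the paper).
UNARY = {"sq": lambda v: v * v, "log1pabs": lambda v: math.log1p(abs(v))}


def standardize(col):
    """Center a column to mean 0 and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
    return [(v - mean) / std for v in col]


def soft_threshold(z, lam):
    """Soft-thresholding operator of lasso coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, beta, sweeps=50, tol=1e-9):
    """Minimize (1/2n)||y - X beta||^2 + lam*||beta||_1 over standardized
    columns, updating beta in place (warm start from the previous lambda)."""
    n = len(y)
    resid = [y[i] - sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    for _ in range(sweeps):
        max_delta = 0.0
        for j, c in enumerate(cols):
            rho = sum(c[i] * (resid[i] + beta[j] * c[i]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * c[i]
                beta[j] += delta
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return beta


def select_features(cols, y, max_features, n_lambdas=15):
    """Walk a decreasing lambda path and return the indices of the largest
    active set that still fits within max_features."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    zs = [standardize(c) for c in cols]
    lam_max = max(abs(sum(z[i] * yc[i] for i in range(n)) / n) for z in zs)
    beta, active = [0.0] * len(zs), []
    for k in range(1, n_lambdas + 1):
        lasso_cd(zs, yc, lam_max * 0.001 ** (k / n_lambdas), beta)
        nonzero = [j for j, b in enumerate(beta) if b != 0.0]
        if len(nonzero) > max_features:
            break
        active = nonzero
    return active


def compose(features, rng):
    """Build one new feature from existing columns (bottom-up composition)."""
    names = sorted(features)
    if rng.random() < 0.5:
        a, op = rng.choice(names), rng.choice(sorted(UNARY))
        return f"{op}({a})", [UNARY[op](v) for v in features[a]]
    a, b = rng.choice(names), rng.choice(names)
    return f"({a}*{b})", [u * v for u, v in zip(features[a], features[b])]


def efs_sketch(X, y, generations=3, new_per_gen=6, max_features=3, seed=0):
    """Alternate composition and lasso-based subset selection.
    Keeping the original inputs in every generation is a choice of this sketch."""
    rng = random.Random(seed)
    originals = {f"x{j}": [row[j] for row in X] for j in range(len(X[0]))}
    features = dict(originals)
    for _ in range(generations):
        pool = dict(features)
        for _ in range(new_per_gen):
            name, col = compose(features, rng)
            pool[name] = col
        names = list(pool)
        keep = select_features([pool[n] for n in names], y, max_features)
        features = dict(originals)
        features.update({names[j]: pool[names[j]] for j in keep})
    return features
```

Calling `efs_sketch(X, y)` on a list of input rows `X` and targets `y` returns a dictionary mapping readable feature names such as `(x0*x1)` to their evaluated columns; a final linear model would then be fitted on those columns.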