Abstract
Feature generation is the problem of automatically constructing good features for a given target learning problem. While most feature generation algorithms follow either the filter or the wrapper approach, this paper focuses on embedded feature generation. We propose a general scheme to embed feature generation in a wide range of tree-based learning algorithms, including single decision trees, random forests, and tree boosting. The scheme formalizes feature construction as a sequential decision-making problem, which is addressed by a tractable Monte Carlo search algorithm coupled with node splitting. This leads to fast algorithms that are applicable to large-scale problems. We empirically analyze the performance of these tree-based learners, with and without the feature generation capability, on several standard datasets.
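The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of coupling a randomized feature search with node splitting. It assumes plain random sampling of expression trees rather than the paper's actual Monte Carlo search algorithm, binary labels, Gini impurity as the split score, and a hypothetical operator set; all function names (`sample_feature`, `evaluate`, `best_split_score`, `monte_carlo_split`) are illustrative and do not come from the paper.

```python
import random

# Hypothetical operator set (an assumption; the abstract does not list one).
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def sample_feature(n_inputs, max_depth, rng):
    """Build a random expression through a sequence of random decisions."""
    if max_depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_inputs))  # leaf: a raw input variable
    op = rng.choice(sorted(BINARY_OPS))
    return (op,
            sample_feature(n_inputs, max_depth - 1, rng),
            sample_feature(n_inputs, max_depth - 1, rng))

def evaluate(expr, row):
    """Compute the value of an expression on one example."""
    if expr[0] == "x":
        return row[expr[1]]
    return BINARY_OPS[expr[0]](evaluate(expr[1], row), evaluate(expr[2], row))

def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split_score(values, labels):
    """Return (impurity decrease, threshold) of the best threshold split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = parent - (i / n) * gini(left) - ((n - i) / n) * gini(right)
        if gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2.0)
    return best

def monte_carlo_split(X, y, budget=200, max_depth=2, seed=0):
    """Pick the split for one node among raw and randomly sampled features."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    # Always consider the raw inputs, then add `budget` sampled expressions.
    candidates = [("x", j) for j in range(n_inputs)]
    candidates += [sample_feature(n_inputs, max_depth, rng) for _ in range(budget)]
    best_gain, best_expr, best_thr = 0.0, None, None
    for expr in candidates:
        values = [evaluate(expr, row) for row in X]
        gain, thr = best_split_score(values, y)
        if thr is not None and gain > best_gain:
            best_gain, best_expr, best_thr = gain, expr, thr
    return best_gain, best_expr, best_thr
```

On an XOR-like problem, where the label depends on the sign of the product of two inputs, no threshold on a raw input reduces impurity, while a constructed feature such as `("mul", ("x", 0), ("x", 1))` separates the classes with a single threshold. In a full learner, a routine like `monte_carlo_split` would be called once per node, so the constructed features are chosen in the context of the examples that reach that node.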
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maes, F., Geurts, P., Wehenkel, L. (2012). Embedding Monte Carlo Search of Features in Tree-Based Ensemble Methods. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_18
DOI: https://doi.org/10.1007/978-3-642-33460-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33459-7
Online ISBN: 978-3-642-33460-3
eBook Packages: Computer Science, Computer Science (R0)