Abstract
The objective of this paper is to define an effective strategy for building an ensemble of Genetic Programming (GP) models. Ensemble methods are widely used in machine learning due to their features: they average out biases, they reduce the variance and they usually generalize better than single models. Despite these advantages, building ensemble of GP models is not a well-developed topic in the evolutionary computation community. To fill this gap, we propose a strategy that blends individuals produced by standard syntax-based GP and individuals produced by geometric semantic genetic programming, one of the newest semantics-based method developed in GP. In fact, recent literature showed that combining syntax and semantics could improve the generalization ability of a GP model. Additionally, to improve the diversity of the GP models used to build up the ensemble, we propose different pruning criteria that are based on correlation and entropy, a commonly used measure in information theory. Experimental results, obtained over different complex problems, suggest that the pruning criteria based on correlation and entropy could be effective in improving the generalization ability of the ensemble model and in reducing the computational burden required to build it.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Vanneschi, L., Castelli, M., Silva, S.: A survey of semantic methods in genetic programming. Genet. Program. Evolvable Mach. 15(2), 195–214 (2014)
Castelli, M., Vanneschi, L., Felice, M.D.: Forecasting short-term electricity consumption using a semantics-based genetic programming framework: the South Italy case. Energy Econ. 47, 37–41 (2015)
Castelli, M., Castaldi, D., Giordani, I., Silva, S., Vanneschi, L., Archetti, F., Maccagnola, D.: An efficient implementation of geometric semantic genetic programming for anticoagulation level prediction in pharmacogenetics. In: Correia, L., Reis, L.P., Cascalho, J. (eds.) EPIA 2013. LNCS (LNAI), vol. 8154, pp. 78–89. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40669-0_8
Yoo, S., Xie, X., Kuo, F.C., Chen, T.Y., Harman, M.: Human competitiveness of genetic programming in spectrum-based fault localisation: theoretical and empirical analysis. ACM Trans. Softw. Eng. Methodol. 26(1), 4:1–4:30 (2017)
Picek, S., Mariot, L., Leporati, A., Jakobovic, D.: Evolving s-boxes based on cellular automata with genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO 2017, pp. 251–252. ACM, New York (2017)
Keijzer, M., Babovic, V.: Genetic programming, ensemble methods and the bias/variance tradeoff – introductory investigations. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 76–90. Springer, Heidelberg (2000). https://doi.org/10.1007/978-3-540-46239-2_6
Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
Castelli, M., Silva, S., Vanneschi, L.: A C++ framework for geometric semantic genetic programming. Genet. Program. Evolvable Mach. 16(1), 73–81 (2015)
Gonçalves, I., Silva, S., Fonseca, C.M., Castelli, M.: Unsure when to stop? Ask your semantic neighbors. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 929–936. ACM (2017)
Polikar, R.: Ensemble learning. In: Zhang, C., Ma, Y. (eds.) Ensemble Machine Learning, pp. 1–34. Springer, Boston (2012). https://doi.org/10.1007/978-1-4419-9326-7_1
Gonçalves, I.: An exploration of generalization and overfitting in genetic programming: standard and geometric semantic approaches. Ph.D. thesis, Department of Informatics Engineering, University of Coimbra, Portugal (2017)
Chen, Q., Xue, B., Shang, L., Zhang, M.: Improving generalisation of genetic programming for symbolic regression with structural risk minimisation. In: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference, pp. 709–716. ACM (2016)
Gonçalves, I., Silva, S., Fonseca, C.M.: On the generalization ability of geometric semantic genetic programming. In: Machado, P., Heywood, M.I., McDermott, J., Castelli, M., García-Sánchez, P., Burelli, P., Risi, S., Sim, K. (eds.) EuroGP 2015. LNCS, vol. 9025, pp. 41–52. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16501-1_4
Kommenda, M., Affenzeller, M., Burlacu, B., Kronberger, G., Winkler, S.M.: Genetic programming with data migration for symbolic regression. In: Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 1361–1366. ACM (2014)
Gonçalves, I., Silva, S.: Balancing learning and overfitting in genetic programming with interleaved sampling of training data. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831, pp. 73–84. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37207-0_7
Gonçalves, I., Silva, S., Melo, J.B., Carreiras, J.M.B.: Random sampling technique for overfitting control in genetic programming. In: Moraglio, A., Silva, S., Krawiec, K., Machado, P., Cotta, C. (eds.) EuroGP 2012. LNCS, vol. 7244, pp. 218–229. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29139-5_19
Gonçalves, I., Silva, S.: Experiments on controlling overfitting in genetic programming. In: Proceedings of the 15th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence, EPIA 2011 (2011)
Castelli, M., Manzoni, L., Silva, S., Vanneschi, L.: A quantitative study of learning and generalization in genetic programming. In: Silva, S., Foster, J.A., Nicolau, M., Machado, P., Giacobini, M. (eds.) EuroGP 2011. LNCS, vol. 6621, pp. 25–36. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20407-4_3
Vanneschi, L., Bakurov, I., Castelli, M.: An initialization technique for geometric semantic GP based on demes evolution and despeciation. In: 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 113–120. IEEE (2017)
Vanneschi, L., Galvão, B.: A parallel and distributed semantic genetic programming system. In: 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 121–128. IEEE (2017)
Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12(10), 993–1001 (1990)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Freund, Y., Schapire, R.E., et al.: Experiments with a new boosting algorithm. In: Icml, vol. 96, pp. 148–156 (1996)
Iba, H.: Bagging, boosting, and bloating in genetic programming. In: Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, vol. 2, pp. 1053–1060. Morgan Kaufmann Publishers Inc. (1999)
Gagné, C., Sebag, M., Schoenauer, M., Tomassini, M.: Ensemble learning for free with evolutionary algorithms? In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1782–1789. ACM (2007)
Zhang, Y., Bhattacharyya, S.: Genetic programming in classifying large-scale data: an ensemble method. Inf. Sci. 163(1), 85–101 (2004)
Folino, G., Pizzuti, C., Spezzano, G.: GP ensembles for large-scale data classification. IEEE Trans. Evol. Comput. 10(5), 604–616 (2006)
Folino, G., Pizzuti, C., Spezzano, G.: GP ensemble for distributed intrusion detection systems. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds.) ICAPR 2005. LNCS, vol. 3686, pp. 54–62. Springer, Heidelberg (2005). https://doi.org/10.1007/11551188_6
Isele, R., Bizer, C.: Active learning of expressive linkage rules using genetic programming. Web Semant. Sci. Serv. Agents World Wide Web 23, 2–15 (2013)
Bartoli, A., De Lorenzo, A., Medvet, E., Tarlao, F.: Active learning of regular expressions for entity extraction. IEEE Trans. Cybern. 1–14 (2017)
Pappa, G.L., Freitas, A.A.: Evolving rule induction algorithms with multi-objective grammar-based genetic programming. Knowl. Inf. Syst. 19(3), 283–309 (2009)
Bartoli, A., De Lorenzo, A., Medvet, E., Tarlao, F.: Learning text patterns using separate-and-conquer genetic programming. In: Machado, P., Heywood, M.I., McDermott, J., Castelli, M., García-Sánchez, P., Burelli, P., Risi, S., Sim, K. (eds.) EuroGP 2015. LNCS, vol. 9025, pp. 16–27. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16501-1_2
Veeramachaneni, K., Derby, O., Sherry, D., O’Reilly, U.M.: Learning regression ensembles with genetic programming at scale. In: Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pp. 1117–1124. ACM (2013)
Moraglio, A., Krawiec, K., Johnson, C.G.: Geometric semantic genetic programming. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M. (eds.) PPSN 2012. LNCS, vol. 7491, pp. 21–31. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32937-1_3
Vanneschi, L., Castelli, M., Manzoni, L., Silva, S.: A new implementation of geometric semantic GP and its application to problems in pharmacokinetics. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831, pp. 205–216. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37207-0_18
Brooks, T., Pope, D., Marcolini, A.: Airfoil self-noise and prediction. Technical report, NASA RP-1218 (1989)
Castelli, M., Vanneschi, L., Silva, S.: Prediction of high performance concrete strength using genetic programming with geometric semantic genetic operators. Expert Syst. Appl. 40(17), 6856–6862 (2013)
Castelli, M., Vanneschi, L., Popovič, A.: Parameter evaluation of geometric semantic genetic programming in pharmacokinetics. Int. J. Bio-Inspired Comput. 8(1), 42–50 (2016)
Yeh, I.-C.: Simulation of concrete slump using neural networks. Constr. Mater. 162(1), 11–18 (2009)
Ortigosa, I., Lopez, R., Garcia, J.: A neural networks approach to residuary resistance of sailing yachts prediction. In: Proceedings of the International Conference on Marine Engineering MARINE, vol. 2007, p. 250 (2007)
Acknowledgements
This work was also financed through the Regional Operational Programme CENTRO2020 within the scope of the project CENTRO-01-0145-FEDER-000006.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Castelli, M., Gonçalves, I., Manzoni, L., Vanneschi, L. (2018). Pruning Techniques for Mixed Ensembles of Genetic Programming Models. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2018. Lecture Notes in Computer Science(), vol 10781. Springer, Cham. https://doi.org/10.1007/978-3-319-77553-1_4
Download citation
DOI: https://doi.org/10.1007/978-3-319-77553-1_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77552-4
Online ISBN: 978-3-319-77553-1
eBook Packages: Computer ScienceComputer Science (R0)