Abstract
This work starts from the empirical observation that k nearest neighbours (KNN) consistently outperforms state-of-the-art techniques for regression, including geometric semantic genetic programming (GSGP). However, KNN is a memorization method rather than a learning method: it evaluates unseen data on the basis of the training observations, not by running a learned model. This paper takes a first step towards defining a learning method able to match KNN by introducing a new semantic mutation, called random vectors-based mutation (RVM). GP using RVM, called RVMGP, obtains results comparable to those of KNN, but still needs the training data to evaluate unseen instances. A comparative analysis sheds some light on why RVMGP outperforms GSGP, revealing that RVMGP explores the semantic space more uniformly. This finding raises a question for future work: is it possible to define a new genetic operator that explores the semantic space as uniformly as RVM does, but that still allows unseen instances to be evaluated without using the training data?
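The abstract contrasts KNN, which predicts directly from stored training observations, with GP models that can be run on unseen data. The Python sketch below illustrates that distinction under stated assumptions. `knn_regress` is a plain unweighted KNN regressor using Euclidean distance. `geometric_semantic_mutation` implements the logistic-bounded form of GSGP's geometric semantic mutation, T_M(x) = T(x) + ms·(σ(R1(x)) − σ(R2(x))). All function names are hypothetical, `random_linear_model` is a hypothetical stand-in for a randomly generated GP tree, and the paper's RVM operator is not reproduced here because the abstract does not define it.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Model = Callable[[Point], float]


def knn_regress(train_X: List[Point], train_y: List[float], x: Point, k: int = 3) -> float:
    """Predict x's target as the mean target of its k nearest training points.

    Nothing is learned: the whole training set must be available at prediction time.
    """
    # Squared Euclidean distance preserves the neighbour ranking.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    nearest = ranked[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def sigmoid(v: float) -> float:
    # Numerically stable logistic function with outputs in [0, 1].
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def geometric_semantic_mutation(parent: Model, r1: Model, r2: Model, ms: float) -> Model:
    """Return T_M(x) = T(x) + ms * (sigmoid(R1(x)) - sigmoid(R2(x))).

    The offspring is a function of x, so it can be evaluated on unseen
    inputs without consulting the training data.
    """
    def offspring(x: Point) -> float:
        return parent(x) + ms * (sigmoid(r1(x)) - sigmoid(r2(x)))
    return offspring


def random_linear_model(n_features: int, rng: random.Random) -> Model:
    # Hypothetical stand-in for a randomly generated GP tree.
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 1.0, 4.0, 9.0]
    print(knn_regress(X, y, [1.2], k=2))  # neighbours x=1 and x=2 -> 2.5

    rng = random.Random(0)
    parent: Model = lambda x: x[0] ** 2
    child = geometric_semantic_mutation(
        parent, random_linear_model(1, rng), random_linear_model(1, rng), ms=0.1
    )
    print(child([10.0]))  # unseen input, evaluated without X or y
```

`knn_regress` needs `X` and `y` every time it predicts, whereas the mutated offspring only needs its parent and the two random functions. Because σ is bounded in [0, 1], each mutation moves the output by at most `ms`. Evaluability on unseen inputs is the property the abstract asks a future RVM-like operator to retain.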
Acknowledgments
This work was partially supported by FCT, Portugal, through funding of LASIGE Research Unit (UIDB/00408/2020) and projects BINDER (PTDC/CCI-INF/29168/2017), GADgET (DSAIPA/DS/0022/2018), AICE (DSAIPA/DS/0113/2019), INTERPHENO (PTDC/ASP-PLA/28726/2017), OPTOX (PTDC/CTA-AMB/30056/2017) and PREDICT (PTDC/CCI-CIF/29877/2017), and by the Slovenian Research Agency (research core funding No. P5-0410). We also thank Reviewer 2 for the interesting comments, and apologize for not having had enough time to follow all the helpful suggestions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vanneschi, L., Castelli, M., Manzoni, L., Silva, S., Trujillo, L. (2020). Is k Nearest Neighbours Regression Better Than GP? In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds) Genetic Programming. EuroGP 2020. Lecture Notes in Computer Science, vol 12101. Springer, Cham. https://doi.org/10.1007/978-3-030-44094-7_16
DOI: https://doi.org/10.1007/978-3-030-44094-7_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44093-0
Online ISBN: 978-3-030-44094-7
eBook Packages: Computer Science, Computer Science (R0)