Abstract
We propose and motivate the use of vicinal-risk minimization (VRM) for training genetic programming classifiers. We show that VRM has a number of attractive properties and that it correlates more strongly with generalization error than empirical risk minimization (ERM), making it more likely to lead to better generalization performance. From the results of statistical tests over a range of real and synthetic datasets, we further demonstrate that VRM yields consistently superior generalization errors compared to conventional ERM.
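To make the ERM/VRM distinction concrete, the following is a minimal sketch of the two risk functionals, assuming a hinge loss and a Gaussian vicinity approximated by Monte Carlo sampling; the function names, the vicinity width `sigma`, and the toy linear scorer standing in for an evolved GP expression are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge(margin):
    # Hinge loss max(0, 1 - y*f(x)), applied to precomputed margins y*f(x).
    return np.maximum(0.0, 1.0 - margin)

def empirical_risk(f, X, y):
    # ERM: average loss evaluated only at the training points themselves.
    return hinge(y * f(X)).mean()

def vicinal_risk(f, X, y, sigma=0.1, n_samples=25):
    # VRM: average loss over a Gaussian vicinity of each training point,
    # approximated by n_samples perturbed copies per point; each copy
    # inherits the label of its parent point.
    N, d = X.shape
    Xv = X[None, :, :] + sigma * rng.standard_normal((n_samples, N, d))
    margins = y[None, :] * f(Xv.reshape(-1, d)).reshape(n_samples, N)
    return hinge(margins).mean()

# Toy usage: a fixed linear score standing in for an evolved GP expression.
X = rng.standard_normal((50, 2))
y = np.sign(X[:, 0] + 0.3 * X[:, 1])
f = lambda Z: Z[:, 0] + 0.3 * Z[:, 1]
print(empirical_risk(f, X, y), vicinal_risk(f, X, y))
```

As the vicinity width shrinks to zero, the vicinal risk reduces to the empirical risk; widening it penalizes decision boundaries that pass close to training points.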
Notes
This is exactly what is done in the hinge loss used in soft-margin support-vector machines [4].
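For reference, this is the standard hinge loss for a discriminant \(f(\mathbf{x})\) and labels \(y \in \{-1, +1\}\):

\[
\ell_{\mathrm{hinge}}\bigl(y, f(\mathbf{x})\bigr) = \max\bigl(0,\, 1 - y\,f(\mathbf{x})\bigr),
\]

which is zero for examples classified with a margin of at least one and grows linearly with the degree of margin violation.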
We omit the normalization of the Parzen density estimate since this contributes only a multiplicative constant that does not affect the subsequent minimization stage.
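Concretely, using the usual Parzen notation (kernel \(K\), bandwidth \(h\), data dimension \(d\); this excerpt does not fix these symbols), the omitted factor \(1/(Nh^d)\) rescales the risk by a positive constant, and

\[
\hat{p}(\mathbf{x}) \propto \sum_{i=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right),
\qquad
\mathop{\arg\min}_{f}\, c\,R(f) = \mathop{\arg\min}_{f}\, R(f) \quad \text{for any } c > 0,
\]

so the minimizer is unaffected.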
Downloadable from http://www.stats.ox.ac.uk/pub/PRNN/.
The sum of ranks 1–11 = 66.
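That is, by the triangular-number identity, \(\sum_{k=1}^{11} k = 11 \times 12 / 2 = 66\).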
In this paper we make what may seem to be rather cautious statements about the outcome of hypothesis tests. Hypothesis tests are frequently misinterpreted—see Cohen [6] for a discussion of the technical arguments.
References
N.M. Amil, N. Bredeche, C. Gagné, S. Gelly, M. Schoenauer, O. Teytaud, A statistical learning perspective of genetic programming, 12th European Conference on Genetic Programming (EuroGP 2009) (Tübingen, Germany, 2009), pp. 327–338
C.E. Borges, C.L. Alonso, J.L. Montaña, Model selection in genetic programming, 12th Annual Conference on Genetic and Evolutionary Computation (GECCO 2010) (Portland, OR, 2010), pp. 985–986
O. Chapelle, J. Weston, L. Bottou, V. Vapnik, Vicinal risk minimization, Advances in Neural Information Processing Systems 13 (NIPS 2000) (Denver, CO, 2000), pp. 416–422
V. Cherkassky, F.M. Mulier, Learning from data: concepts, theory and methods, 2nd edn. (Wiley-IEEE Press, New Jersey, 2007)
C.A.C. Coello, G.B. Lamont, Applications of multi-objective evolutionary algorithms, vol. 1 (World Scientific, Singapore, 2004)
J. Cohen, The earth is round (\(p < .05\)). Am. Psychol. 49(12), 997–1003 (1994)
J. Demšar, Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
R.O. Duda, P.E. Hart, D.G. Stork, Pattern classification, 2nd edn. (John Wiley and Sons, New York, 2001)
A. Ekárt, S.Z. Németh, Selection based on the Pareto nondomination criterion for controlling code growth in genetic programming. Genet. Program. Evol. M. 2(1), 61–73 (2001)
A. Frank, A. Asuncion, UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. (2010) http://archive.ics.uci.edu/ml
T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd edn. (Springer, Berlin, 2009)
L. Holmström, P. Koistinen, Using additive noise in back-propagation training. IEEE Trans. Neural Netw. 3(1), 24–38 (1992)
H. Iba, H. de Garis, T. Sato, Genetic programming using a minimum description length principle, Advances in Genetic Programming (MIT Press, Cambridge, MA, 1994), pp. 265–284
G.N. Karystinos, D.A. Pados, On overfitting, generalization, and randomly expanded training sets. IEEE Trans. Neural Netw. 11(5), 1050–1057 (2000)
R. Kumar, P.I. Rockett, Improved sampling of the Pareto-front in multiobjective genetic optimizations by steady-state evolution: A Pareto converging genetic algorithm. Evol. Comput. 10(3), 283–314 (2002)
C. Nadeau, Y. Bengio, Inference for the generalization error. Mach. Learn. 52(3), 239–281 (2003)
E. Polak, Optimization: algorithms and consistent approximations (Springer, New York, 1997)
R. Poli, W.B. Langdon, N.F. McPhee, A Field Guide to Genetic Programming. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk (2008)
B.D. Ripley, Neural networks and related methods for classification. J. Roy. Stat. Soc. B Met. 56(3), 409–456 (1994)
J. Rissanen, Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
C.P. Robert, G. Casella, Monte Carlo statistical methods, 2nd edn. (Springer, New York, 2005)
S. Silva, S. Dignum, L. Vanneschi, Operator equalisation for bloat free genetic programming and a survey of bloat control methods. Genet. Program. Evol. M. 13(2), 197–238 (2012)
C. Soares, Is the UCI repository useful for data mining?, 11th Portuguese Conference on Artificial Intelligence (EPIA 2003) (Beja, Portugal, 2003), pp. 209–223
A.N. Tikhonov, V.Y. Arsenin, Solutions of ill posed problems (V.H. Winston, Washington, DC, 1977)
V.N. Vapnik, The nature of statistical learning theory, 2nd edn. (Springer, New York, 2000)
Y. Zhang, P. Rockett, A comparison of three evolutionary strategies for multiobjective genetic programming. Artif. Intell. Rev. 27(2–3), 149–163 (2007)
Y. Zhang, P.I. Rockett, A generic optimising feature extraction method using multiobjective genetic programming. Appl. Soft Comput. 11(1), 1087–1097 (2011)
Acknowledgments
The authors would like to thank Yilong Cao and Richard Everson for valuable discussions.
Cite this article
Ni, J., Rockett, P. Training genetic programming classifiers by vicinal-risk minimization. Genet Program Evolvable Mach 16, 3–25 (2015). https://doi.org/10.1007/s10710-014-9222-4