
Compositional kernel learning using tree-based genetic programming for Gaussian process regression

  • Research Paper
  • Published in Structural and Multidisciplinary Optimization

Abstract

Although Gaussian process regression (GPR) is a powerful Bayesian nonparametric regression model for engineering problems, its predictive performance depends strongly on the kernel used as the covariance function of GPR. However, choosing a proper kernel is still challenging even for experts. To choose a proper kernel automatically, this study proposes compositional kernel (CPK) learning using tree-based genetic programming (GEP). The optimal structure of the kernel is defined as a compositional representation based on sums and products of eight base-kernels. The CPK can be encoded as a tree structure, so tree-based GEP is employed to discover an optimal tree structure of the CPK. To avoid overly complex solutions in GEP, the proposed method introduces a dynamic maximum tree-depth technique. The novelty of the proposed method lies in its more flexible and efficient capability to learn the relationship between input and output than existing methods. To evaluate the learning capability of the proposed method, seven test functions were first investigated with various noise levels, and the predictive accuracy was compared with that of existing methods. Reliability problems in both parallel and series systems were introduced to evaluate the performance of the proposed method for efficient reliability assessment. The results show that the proposed method generally outperforms, or performs similarly to, the best of the existing methods. In addition, it is shown that a proper kernel function can significantly improve the performance of GPR as the training data increase. Stated differently, the proposed method can learn the function being fitted efficiently with fewer training samples than existing methods. In this context, the proposed method enables powerful and automatic predictive modeling based on GPR in engineering problems.





Replication of results

The codes are available from the following GitHub repository: https://github.com/seungsab/CPKL_using_Tree-GEP.

Funding

This research was supported by a grant from a Strategic Research Project (Smart Monitoring System for Concrete Structures Using FRP Nerve Sensor) funded by the Korea Institute of Civil Engineering and Building Technology.

Author information


Corresponding author

Correspondence to Seung-Seop Jin.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Responsible Editor: Mehmet Polat Saka

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1. Base kernel for covariance function

The distance-based similarity between function values is fundamental to nonparametric modeling in regression. The properties of this similarity are encoded by the kernel. Therefore, the kernel used as the covariance function is a crucial ingredient of GPR, since it encodes our prior assumptions about the function we wish to learn. This appendix briefly reviews the eight kernels used for richer representation and examines their properties. Hereafter, these kernels are referred to as base-kernels. Their properties and expressible structures are shown in Fig. 24.

Fig. 24

Base-kernels and their expressible structures (WN, white noise; CON, constant; LIN, linear; PER, periodic; SE, squared exponential; RQ, rational quadratic; ME3, Matérn-3/2; ME5, Matérn-5/2)

1.1 Squared exponential kernel (SE)

SE is the most popular kernel for the covariance function of GPR. Since the SE kernel is infinitely differentiable, it encodes a prior assumption of a strongly smooth function. It is also known as the “radial basis” or “Gaussian correlation” kernel, and it is given by

$$ {k}_{\mathrm{SE}}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}^2,\boldsymbol{l}\right)={\sigma}^2\exp \left(-\frac{{\left(\boldsymbol{x}-{\boldsymbol{x}}^{\prime}\right)}^2}{2{\boldsymbol{l}}^2}\right) $$
(23)

where σ2 and l are the scale factor and the length scale for each input, respectively; σ2 determines the average distance of the function away from its mean, while l defines the characteristic correlation length scale for each input (the length of the wiggles in the function).
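For illustration, a minimal NumPy sketch of the SE kernel in (23) for one-dimensional inputs is given below; the function name and default hyperparameters are illustrative assumptions and are not taken from the author's released code.

```python
import numpy as np

def k_se(x, x2, sigma2=1.0, ell=1.0):
    """Squared exponential kernel, Eq. (23), for 1-D input vectors x and x2."""
    r = x[:, None] - x2[None, :]                  # pairwise differences
    return sigma2 * np.exp(-r**2 / (2.0 * ell**2))
```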

1.2 Rational quadratic kernel (RQ)

RQ is equivalent to adding together many SE kernels with different length scales. The RQ kernel therefore encodes functions that vary smoothly across many length scales, representing smoothness with multiscale variation. This kernel is given by

$$ {k}_{\mathrm{RQ}}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}^2,\boldsymbol{l},\alpha \right)={\sigma}^2{\left(1+\frac{{\left(\boldsymbol{x}-{\boldsymbol{x}}^{\prime}\right)}^2}{2\alpha {\boldsymbol{l}}^2}\right)}^{-\alpha } $$
(24)

where σ2, l, and α are the scale factor, the correlation length scale for each input, and the shape parameter of the RQ kernel, respectively; σ2 and l play similar roles to those in the SE kernel, and α adjusts the multiscale variation (i.e., the relative weighting of large-scale and small-scale variations).
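A corresponding sketch of (24), under the same illustrative conventions as the SE example above (inputs are 1-D NumPy arrays):

```python
def k_rq(x, x2, sigma2=1.0, ell=1.0, alpha=1.0):
    """Rational quadratic kernel, Eq. (24); as alpha grows large it approaches the SE kernel."""
    r = x[:, None] - x2[None, :]
    return sigma2 * (1.0 + r**2 / (2.0 * alpha * ell**2))**(-alpha)
```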

1.3 Matérn kernel (ME)

ME is a generalization of the SE kernel with an additional hyperparameter (ν) controlling the smoothness (i.e., roughness) of the function. Stein (1999) recommended the ME kernel as an alternative to the SE kernel, since the smoothness assumption of the SE kernel is unrealistic for modeling physical processes. Matérn kernels with ν = 3/2 and ν = 5/2 are the most popular for GPR, and they are described as

$$ {k}_{\mathrm{ME}3}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}^2,\boldsymbol{l}\right)={\sigma}^2\left(1+\frac{\boldsymbol{r}\sqrt{3}}{\boldsymbol{l}}\right)\exp \left(-\frac{\boldsymbol{r}\sqrt{3}}{\boldsymbol{l}}\right) $$
(25)

and

$$ {k}_{\mathrm{ME}5}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}^2,\boldsymbol{l}\right)={\sigma}^2\left(1+\frac{\boldsymbol{r}\sqrt{5}}{\boldsymbol{l}}+\frac{5{\boldsymbol{r}}^{\mathbf{2}}}{3{\boldsymbol{l}}^{\mathbf{2}}}\right)\exp \left(-\frac{\boldsymbol{r}\sqrt{5}}{\boldsymbol{l}}\right) $$
(26)

where r = x − x′; σ2 and l play similar roles to those in the SE kernel.
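Minimal sketches of (25) and (26), using the same illustrative conventions as above:

```python
import numpy as np

def k_me3(x, x2, sigma2=1.0, ell=1.0):
    """Matérn-3/2 kernel, Eq. (25)."""
    a = np.sqrt(3.0) * np.abs(x[:, None] - x2[None, :]) / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

def k_me5(x, x2, sigma2=1.0, ell=1.0):
    """Matérn-5/2 kernel, Eq. (26)."""
    r = np.abs(x[:, None] - x2[None, :])
    a = np.sqrt(5.0) * r / ell
    return sigma2 * (1.0 + a + 5.0 * r**2 / (3.0 * ell**2)) * np.exp(-a)
```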

1.4 Constant kernel (CON)

CON can be used as part of a product-kernel to scale the magnitude of another kernel. The CON kernel is also used as part of a sum-kernel to represent the mean of GPR. It is given by

$$ {k}_{\mathrm{CON}}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}^2\right)={\sigma}^2 $$
(27)

where σ2 acts either as a scale factor when used within a product-kernel or as the mean of GPR when used within a sum-kernel.
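A one-line sketch of (27), again with illustrative naming:

```python
import numpy as np

def k_con(x, x2, sigma2=1.0):
    """Constant kernel, Eq. (27): the same value for every pair of inputs."""
    return sigma2 * np.ones((len(x), len(x2)))
```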

1.5 Linear kernel (LIN)

LIN is nonstationary, which means that it depends on the absolute locations of the inputs. The nonstationarity of the LIN kernel can model polynomial trends when used as part of a product-kernel. Therefore, the LIN kernel is used to encode a trend in the function. This kernel is defined by

$$ {k}_{\mathrm{LIN}}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}_b^2,{\sigma}_v^2\right)={\sigma}_b^2+{\sigma}_v^2\boldsymbol{x}\cdotp \boldsymbol{x}^{\prime } $$
(28)

where \( {\sigma}_b^2 \) and \( {\sigma}_v^2 \) denote the bias and slope coefficient for each input.
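A minimal sketch of (28) under the same conventions (1-D NumPy inputs):

```python
def k_lin(x, x2, sigma2_b=1.0, sigma2_v=1.0):
    """Linear (nonstationary) kernel, Eq. (28)."""
    return sigma2_b + sigma2_v * x[:, None] * x2[None, :]
```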

1.6 Periodic kernel (PER)

PER is a stationary periodic kernel that encodes repeating patterns in the function. This kernel is given by

$$ {k}_{\mathrm{PER}}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}^2,\boldsymbol{l},p\right)={\sigma}^2\exp \left(-\frac{2\sin^2\left(\pi \left|\boldsymbol{x}-\boldsymbol{x}^{\prime}\right|/p\right)}{{\boldsymbol{l}}^{\mathbf{2}}}\right) $$
(29)

where σ2 and l play similar roles to those in the SE kernel, and p is the period, which determines the distance between repetitions of the function.
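A minimal sketch of (29) under the same illustrative conventions:

```python
import numpy as np

def k_per(x, x2, sigma2=1.0, ell=1.0, p=1.0):
    """Periodic kernel, Eq. (29)."""
    d = np.abs(x[:, None] - x2[None, :])
    return sigma2 * np.exp(-2.0 * np.sin(np.pi * d / p)**2 / ell**2)
```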

1.7 White noise kernel (WN)

WN encodes uncorrelated noise in the function. The WN kernel can be obtained as the limit of the SE kernel as its length scale goes to zero. It is widely used to model additive noise in the function. This kernel is defined by

$$ {k}_{\mathrm{WN}}\left(\boldsymbol{x},{\boldsymbol{x}}^{\prime };{\sigma}_{\mathrm{WN}}^2\right)={\sigma}_{\mathrm{WN}}^2{\delta}_{\boldsymbol{x},{\boldsymbol{x}}^{\prime}} $$
(30)

where \( {\sigma}_{\mathrm{WN}}^2 \) and δx,x′ are the noise variance and the Kronecker delta function, respectively. The WN kernel has only one hyperparameter, \( {\sigma}_{\mathrm{WN}}^2 \), which determines the noise level.
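To close this appendix, the sketch below implements (30) and illustrates how the base-kernels above can be combined into a compositional kernel by evaluating a tree of sums and products into a covariance matrix. The tuple-based tree representation and the function eval_kernel are assumptions made here for exposition only; they are not the Tree-GEP encoding of the paper, whose implementation is available in the GitHub repository listed under “Replication of results”.

```python
import numpy as np

def k_wn(x, x2, sigma2_wn=1.0):
    """White noise kernel, Eq. (30): nonzero only where the two inputs coincide."""
    return sigma2_wn * (x[:, None] == x2[None, :]).astype(float)

# Illustrative base-kernel table, reusing the sketches defined earlier in this appendix.
BASE = {'SE': k_se, 'RQ': k_rq, 'ME3': k_me3, 'ME5': k_me5,
        'CON': k_con, 'LIN': k_lin, 'PER': k_per, 'WN': k_wn}

def eval_kernel(node, x, x2):
    """Evaluate a compositional kernel tree into a covariance matrix.

    A leaf is ('NAME', {hyperparameters}); an internal node is ('+', left, right)
    or ('*', left, right), mirroring the sum/product structure of the CPK.
    """
    if node[0] == '+':
        return eval_kernel(node[1], x, x2) + eval_kernel(node[2], x, x2)
    if node[0] == '*':
        return eval_kernel(node[1], x, x2) * eval_kernel(node[2], x, x2)
    name, params = node
    return BASE[name](x, x2, **params)

# Example: (LIN * PER) + SE, i.e., a periodic pattern with linearly growing
# amplitude plus a smooth residual component.
tree = ('+', ('*', ('LIN', {}), ('PER', {'p': 0.5})), ('SE', {'ell': 2.0}))
x = np.linspace(0.0, 1.0, 20)
K = eval_kernel(tree, x, x)          # 20 x 20 covariance matrix
```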

Appendix 2. Analytical test functions

1.1 Branin function (Forrester et al. 2008) (d = 2)

$$ f(X)={\left({x}_2-\frac{5.1}{4{\pi}^2}{x}_1^2+\frac{5}{\pi }{x}_1-6\right)}^2+10\left(1-\frac{1}{8\pi}\right)\cos \left({x}_1\right)+10 $$
(31)

where x1 ∈ [−5, 10] and x2 ∈ [0, 15].
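As a simple illustration of how these analytical test functions are evaluated, a NumPy sketch of the Branin function (31) is shown below; the function name is illustrative. The remaining test functions in this appendix can be coded in the same way.

```python
import numpy as np

def branin(x1, x2):
    """Branin test function, Eq. (31); x1 in [-5, 10], x2 in [0, 15]."""
    return ((x2 - 5.1 / (4.0 * np.pi**2) * x1**2 + 5.0 / np.pi * x1 - 6.0)**2
            + 10.0 * (1.0 - 1.0 / (8.0 * np.pi)) * np.cos(x1) + 10.0)
```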

1.2 Friedman function (Friedman 1991) (d = 5)

$$ f(X)=10\sin \left(\pi {x}_1{x}_2\right)+20{\left({x}_3-0.5\right)}^2+10{x}_{\mathbf{4}}+5{x}_5 $$
(32)

where xi ∈ [0, 1] for i = 1, …, 5.
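A corresponding sketch of (32), under the same illustrative conventions:

```python
import numpy as np

def friedman(x):
    """Friedman test function, Eq. (32); x is a length-5 vector with entries in [0, 1]."""
    x1, x2, x3, x4, x5 = x
    return 10.0 * np.sin(np.pi * x1 * x2) + 20.0 * (x3 - 0.5)**2 + 10.0 * x4 + 5.0 * x5
```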

1.3 Dette and Pepelyshev function (Dette and Pepelyshev 2010) (d = 8)

$$ f\left(\boldsymbol{X}\right)=4{\left({x}_1-2+8{x}_2-8{x}_2^2\right)}^2+{\left(3-4{x}_2\right)}^2+16\sqrt{x_3+1}{\left(2{x}_3-1\right)}^2+\sum \limits_{i=4}^8i\ln \left(1+\sum \limits_{j=3}^i{x}_j\right) $$
(33)

where xi ∈ [0, 1] for i = 1, …, 8.

1.4 Welch et al. function (Welch et al. 1992) (d = 20)

$$ f(X)=\frac{5{x}_{12}}{1+{x}_1}+5{\left({x}_4-{x}_{20}\right)}^2+{x}_5+40{x}_{19}^3-5{x}_{19}+0.05{x}_2+0.08{x}_3-0.03{x}_6+0.03{x}_7-0.09{x}_9-0.01{x}_{10}-0.07{x}_{11}+0.25{x}_{13}^3-0.04{x}_{14}+0.06{x}_{15}-0.01{x}_{17}-0.03{x}_{18} $$
(34)

where xi ∈ [−0.5, 0.5] for i = 1, …, 20.

1.5 Output-transformerless (OTL) circuit model (d = 6)

The OTL circuit model has six inputs for estimating the mid-point voltage of an output-transformerless push-pull circuit. This model has been used as a test function in surrogate modeling (Ben-Ari and Steinberg 2007). It provides an analytical expression for the mid-point voltage, defined as

$$ f(X)=\frac{\left(\frac{12{x}_2}{x_1+{x}_2}+0.74\right){x}_6\left({x}_5+9\right)}{x_6\left({x}_5+9\right)+{x}_3}+\frac{11.35{x}_3}{x_6\left({x}_5+9\right)+{x}_3}+\frac{0.74{x}_3{x}_6\left({x}_5+9\right)}{\left({x}_6\left({x}_5+9\right)+{x}_3\right){x}_4} $$
(35)

where x1 ∈ [50, 150], x2 ∈ [25, 70], x3 ∈ [0.5, 3], x4 ∈ [1.2, 2.5], x5 ∈ [0.25, 1.2], and x6 ∈ [50, 300].

1.6 Borehole model (d = 8)

The borehole model has eight inputs for estimating the water flow through a borehole drilled from the ground surface through two aquifers. This model has been used as a test function in surrogate modeling (Kersaudy et al. 2015; Morris et al. 1993) and sensitivity analysis (Harper and Gupta 1983). The water-flow rate can be computed by (36) from the properties of the aquifers and the borehole:

$$ f\left(\boldsymbol{X}\right)=\frac{2\pi {x}_3\left({x}_4-{x}_6\right)}{\ln \left({x}_2/{x}_1\right)\left(1+\frac{2{x}_7{x}_3}{\ln \left({x}_2/{x}_1\right){x}_1^2{x}_8}+\frac{x_3}{x_5}\right)} $$
(36)

where x1 ∈ [0.05, 0.15], x2 ∈ [100, 50,000], x3 ∈ [63,070, 115,600], x4 ∈ [990, 1110], x5 ∈ [63.1, 116], x6 ∈ [700, 820], x7 ∈ [1120, 1680], and x8 ∈ [9855, 12,045].
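A minimal sketch of the borehole model (36), with the inputs ordered as in the text; the function name is illustrative.

```python
import numpy as np

def borehole(x):
    """Borehole water-flow model, Eq. (36); x is a length-8 vector ordered as in the text."""
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    log_ratio = np.log(x2 / x1)
    return (2.0 * np.pi * x3 * (x4 - x6)
            / (log_ratio * (1.0 + 2.0 * x7 * x3 / (log_ratio * x1**2 * x8) + x3 / x5)))
```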

1.7 Wing weight model (d = 10)

The wing weight model has 10 inputs for estimating the weight of a light aircraft wing. This model has been used as a test function for input variable screening (Forrester et al. 2008). The wing weight is computed by (37)

$$ f\left(\boldsymbol{X}\right)=0.036{x}_1^{0.758}{x}_2^{0.0035}{\left(\frac{x_3}{\cos^2\left({x}_4\right)}\right)}^{0.6}{x}_5^{0.006}{x}_6^{0.04}{\left(\frac{100{x}_7}{\cos \left({x}_4\right)}\right)}^{-0.3}{\left({x}_8{x}_9\right)}^{0.49}+{x}_1{x}_{10} $$
(37)

where x1 ∈ [150, 200], x2 ∈ [220,  300], x3 ∈ [6, 10], x4 ∈ [−10, 10], x5 ∈ [16,  45], x6 ∈ [0.5, 1], x7 ∈ [0.08, 0.18], x8 ∈ [2.5, 6], x9 ∈ [1700, 2500], and x10 ∈ [0.025, 0.08].

Appendix 3. Predictive performance for global surrogate modeling

1.1 Numerical verification #1: interpolation for noiseless data (β = 0)

Table 4 Summary statistics of RMSE for mathematical test functions (β = 0)
Table 5 Summary statistics of MAE for mathematical test functions (β = 0)
Table 6 Summary statistics of RMSE for physical models (β = 0)
Table 7 Summary statistics of MAE for physical models (β = 0)

1.2 Numerical verification #2: regression for noisy data (β = 0.005)

Table 8 Summary statistics of RMSE for mathematical test functions (β = 0.005)
Table 9 Summary statistics of MAE for mathematical test functions (β = 0.005)
Table 10 Summary statistics of RMSE for physical models (β = 0.005)
Table 11 Summary statistics of MAE for physical models (β = 0.005)

1.3 Numerical verification #3: regression for noisy data (β = 0.05)

Table 12 Summary statistics of RMSE for mathematical test functions (β = 0.05)
Table 13 Summary statistics of MAE for mathematical test functions (β = 0.05)
Table 14 Summary statistics of RMSE for physical models (β = 0.05)
Table 15 Summary statistics of MAE for physical models (β = 0.05)

1.4 Computational experiment #1: system reliability in a parallel system

Table 16 Summary statistics of RMSE for multimodal system
Table 17 Summary statistics of MAE for multimodal system

1.5 Computational experiment #2: system reliability in a series system

Table 18 Summary statistics of RMSE and MAE for liquid hydrogen tank

Appendix 4. Computational cost with different sample sizes and dimensionalities

Fig. 25

Computational costs for different dimensionalities and numbers of training samples for the Dixon and Price function


About this article


Cite this article

Jin, SS. Compositional kernel learning using tree-based genetic programming for Gaussian process regression. Struct Multidisc Optim 62, 1313–1351 (2020). https://doi.org/10.1007/s00158-020-02559-7

