Conservative strategy-based ensemble surrogate model for optimal groundwater remediation design at DNAPLs-contaminated sites

https://doi.org/10.1016/j.jconhyd.2017.05.007

Highlights

  • Conservative strategy is a promising alternative for addressing surrogate-modeling uncertainties.

  • The ensemble surrogate model that combined MGGP with KRG has the most favorable performance.

  • Combining different surrogate models into ensembles does not guarantee improved performance.

Abstract

Surrogate-based simulation-optimization techniques are frequently used for optimal groundwater remediation design. When such techniques are used, surrogate errors caused by surrogate-modeling uncertainty may lead to infeasible designs. In this paper, a conservative strategy that pushes the optimal design into the feasible region was used to address surrogate-modeling uncertainty. In addition, chance-constrained programming (CCP) was adopted for comparison with the conservative strategy in addressing this uncertainty. Three methods, multi-gene genetic programming (MGGP), Kriging (KRG) and support vector regression (SVR), were used to construct surrogate models for a time-consuming multi-phase flow model. To improve surrogate performance, ensemble surrogates were constructed from combinations of the stand-alone surrogate models. The results show that: (1) the conservative strategy successfully addressed the surrogate-modeling uncertainty, which means that this method is promising for addressing such uncertainty; and (2) the ensemble surrogate model that combines MGGP with KRG showed the most favorable performance, which indicates that this ensemble can exploit both stand-alone surrogate models to improve surrogate performance.

Introduction

Dense nonaqueous-phase liquids (DNAPLs) are now frequently detected in groundwater throughout the world because of the widespread use, improper disposal, accidental spills and leaks of petrochemical products (Kueper and McWhorter, 1991). Because of their low solubility, low mobility and high density in water, DNAPLs may remain in aquifers for long periods of time, and thus ultimately become long-term continuous sources of groundwater contamination (Qin et al., 2007). Surfactant-enhanced aquifer remediation (SEAR), an enhancement to the conventional pump-and-treat technique, is a promising way to remove DNAPLs from aquifers (Schaerlaekens et al., 2005). By adding surfactants to the water, the solubility and mobility of DNAPLs in an aquifer can be increased (Delshad et al., 1996), which makes SEAR more efficient than the conventional pump-and-treat technique. Because the cost of the SEAR process is high, optimizing the design for cost-effectiveness is of great value.

Simulation-optimization (S/O) techniques have been used extensively to solve such problems (Jiang et al., 2015, Luo et al., 2013). When such techniques are employed, the numerical simulation model may be called thousands of times before the optimal design is obtained, which is computationally expensive and may be prohibitive (Hou et al., 2015). Using surrogates (also known as meta-models or proxy models) to replace the computationally expensive simulation models has therefore become commonplace.

However, no matter how well the surrogate model approximates the simulation model, errors caused by surrogate-modeling uncertainty exist. In constrained optimization in which the constraints are surrogate models, the obtained solution may be infeasible because of surrogate errors (Viana et al., 2010). He et al. (2010) regarded the error between the simulation model and the surrogate model as a stochastic variable and adopted the chance-constrained programming (CCP) method to incorporate it into the optimization model for groundwater remediation design. However, before this method is adopted, the hypotheses that the surrogate errors are normally distributed with zero mean should be tested, and these hypotheses do not always hold.
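As a minimal sketch of how a chance constraint becomes deterministic under the normality and zero-mean hypotheses, suppose the true response equals the surrogate prediction plus an error distributed as N(0, σ²). Requiring the true response to reach a threshold b with probability at least α is then equivalent to requiring the prediction to be at least b + σΦ⁻¹(α), where Φ⁻¹ is the standard normal quantile function. The function names, the threshold and the error standard deviation below are hypothetical illustrations; the exact formulation of He et al. (2010) is not reproduced in this text and may differ.

```python
from statistics import NormalDist


def ccp_required_prediction(threshold, error_std, confidence):
    """Surrogate value needed so that P(prediction + error >= threshold) >= confidence.

    Assumes the surrogate error is normal with zero mean and standard
    deviation error_std (the hypotheses that must be tested beforehand).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(confidence)  # standard normal quantile
    return threshold + z * error_std


def ccp_feasible(prediction, threshold, error_std, confidence):
    """True if the surrogate prediction satisfies the chance constraint."""
    return prediction >= ccp_required_prediction(threshold, error_std, confidence)


# Hypothetical numbers: minimum removal rate 0.80, error std 0.02, 95% confidence.
required = ccp_required_prediction(0.80, 0.02, 0.95)
print(f"prediction must be at least {required:.4f}")
```

At a confidence level of 0.5 the quantile is zero and the constraint reduces to the ordinary surrogate constraint; higher confidence levels tighten it in proportion to the error standard deviation.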

Recently, researchers have focused on conservative strategy-based surrogate models (also called conservative surrogates), which push the optimal solution into the feasible region (Pan et al., 2012). In many engineering problems, there is an incentive to obtain approximations that are as close as possible to the actual response while erring on the safe side (Picheny et al., 2008). In groundwater remediation design optimization, such a response may be the contaminant removal rate, which is subject to a minimum allowable value and therefore must not be overestimated if failure is to be avoided. In this paper, we call a surrogate model conservative when its estimates are lower than the true responses; that is, conservative surrogates tend to underestimate target values. In contrast, general surrogate models are unbiased: their estimates are equally likely to be lower or higher than the actual value (Pan et al., 2012). To date, conservative surrogate models have been used in structural analysis of vehicle engineering (Zhu et al., 2013) and aircraft engineering (Acar et al., 2007), but have not previously been applied in optimization of groundwater remediation design.
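One simple way to turn an unbiased surrogate into a conservative one is to subtract a safety margin estimated from validation errors. The sketch below illustrates that idea only; it is not necessarily the conservative strategy adopted in this paper, and the function names and error values are hypothetical.

```python
import math


def safety_margin(errors, conservativeness):
    """Margin m such that (prediction - m) does not overestimate the true
    response for at least the requested fraction of validation points.

    errors[i] = prediction[i] - truth[i]; a positive error is an overestimate.
    """
    if not errors:
        raise ValueError("need at least one validation error")
    if not 0.0 < conservativeness <= 1.0:
        raise ValueError("conservativeness must be in (0, 1]")
    ranked = sorted(errors)
    # Smallest rank k with k / n >= conservativeness (empirical quantile).
    k = math.ceil(conservativeness * len(ranked))
    return max(ranked[k - 1], 0.0)  # never shift predictions upward


def conservative_predict(prediction, margin):
    """Shift an unbiased prediction toward the safe (lower) side."""
    return prediction - margin


# Hypothetical validation errors of a removal-rate surrogate.
validation_errors = [0.02, -0.01, 0.04, 0.00, -0.03, 0.01]
margin = safety_margin(validation_errors, 0.9)
print(conservative_predict(0.85, margin))
```

A larger requested fraction yields a larger margin, which makes the surrogate safer but less accurate, so the margin trades constraint reliability against remediation cost.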

Many techniques for surrogate modeling have been proposed, such as artificial neural networks (ANNs) (Luo et al., 2013), Kriging (KRG) (Zhao et al., 2016), support vector regression (SVR) (Ouyang et al., 2017), and extreme learning machines (ELMs) (Jiang et al., 2015). More recently, multi-gene genetic programming (MGGP) (Hinchliffe et al., 1996, Searson et al., 2007) has been designed to develop the input–output relationship of a system, and has attracted the attention of many researchers across a broad range of fields (Pan et al., 2013, Pandey et al., 2015, Mohammadzadeh et al., 2016). The main advantage of MGGP is its ability to develop a compact and explicit prediction equation in terms of different model variables without assuming a prior form of the existing relationships (Muduli and Das, 2015). A previous study (Ouyang et al., 2017) demonstrated the superiority of MGGP over KRG and SVR. Recently, researchers have tended to combine multiple surrogate models in ensembles instead of selecting only the best model and discarding the rest (Acar and Rais-Rohani, 2008, Goel et al., 2006, Viana et al., 2009). However, the combination of MGGP surrogate modeling with other techniques has not previously been evaluated.
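An ensemble surrogate of the kind discussed above is commonly formed as a weighted sum of the stand-alone predictions. The sketch below assumes weights inversely proportional to each model's validation error, which is only one of several possible heuristics; the weighting scheme used in this study is not given in this text, and the error and prediction values are hypothetical.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalised to sum to 1."""
    if any(e <= 0.0 for e in errors.values()):
        raise ValueError("errors must be strictly positive")
    inverse = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def ensemble_predict(predictions, weights):
    """Weighted sum of stand-alone surrogate predictions at one design point."""
    return sum(weights[name] * predictions[name] for name in weights)


# Hypothetical validation errors and predictions for two stand-alone surrogates.
validation_rmse = {"MGGP": 0.01, "KRG": 0.02}
weights = inverse_error_weights(validation_rmse)  # MGGP gets twice the weight of KRG
print(ensemble_predict({"MGGP": 0.84, "KRG": 0.87}, weights))
```

Because the weights sum to one, the ensemble prediction always lies between the smallest and largest stand-alone predictions, so an ensemble can only improve on its best member when the members' errors partly cancel.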

The aim of the present study is to determine an optimal groundwater remediation design for DNAPL-contaminated sites at minimum cost under certain constraints. To address the abovementioned concerns, this study (1) combines an MGGP surrogate model with other surrogate models to form ensembles and compares them, and (2) adopts a conservative strategy to address surrogate-modeling uncertainty and thereby avoid design failure. The CCP method is used for comparison.

Section snippets

MGGP

MGGP is a robust variant of genetic programming (GP) and is designed to generate empirical mathematical models of the input–output relationship from the datasets. GP is based on the evaluation of a single gene, whereas MGGP is constructed from a number of genes (Gandomi and Alavi, 2012, Searson et al., 2007). Each gene evolved by MGGP is a structured tree composed of functions and terminals (Searson et al., 2007), as can be seen in Fig. 1. The function set can include elements such as

Site overview

The proposed approach was applied to a hypothetical perchloroethylene (PCE)-contaminated site. UTCHEM (Center for Petroleum and Geosystems Engineering, 2000) software developed by the University of Texas was used to simulate the SEAR processes. The studied site was a three-dimensional domain with a horizontal area of 100 × 70 m² and a depth of 20 m. The simulation domain consisted of 20 layers; each layer was discretized into 50 × 30 grid blocks. Each grid block had dimensions of 2

Analysis of surrogate models

First, three stand-alone surrogate models were constructed with MGGP, KRG and SVR. Two statistical metrics, the coefficient of determination (R²) (Eq. 18) and the root mean square error (RMSE) (Eq. 19), were used to assess model performance. The value of R² indicates how well the surrogate approximates the simulation model; higher values indicate better approximation. The RMSE is an indicator of the precision of the surrogate model; smaller values indicate greater accuracy. These metrics can be
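The two metrics can be sketched as follows, assuming the standard definitions R² = 1 − SS_res/SS_tot and RMSE = sqrt(mean squared error). Eq. 18 and Eq. 19 are not reproduced in this text, so the exact forms used by the authors may differ (for example, R² is sometimes defined as a squared correlation coefficient). The sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between simulation outputs and surrogate outputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical removal rates from the simulation model and from a surrogate.
simulated = [0.70, 0.75, 0.82, 0.90]
surrogate = [0.71, 0.74, 0.83, 0.88]
print(rmse(simulated, surrogate), r_squared(simulated, surrogate))
```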

Conclusions

In this study, a conservative strategy was proposed to address surrogate-modeling uncertainty when using the surrogate-based simulation-optimization technique to identify optimal groundwater remediation design at DNAPLs-contaminated sites. In addition, the CCP method was employed for comparison with this strategy. To construct a surrogate model that has favorable performance with both training and testing data, MGGP was combined with KRG, SVR and a combination of both methods to form ensemble

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41372237) and the National Key Research and Development Program of China (No. 2016YFC0402804). The authors thank the editor and anonymous reviewers for their insightful comments and suggestions.

References (38)

  • P. Zhu et al. Lightweight design of vehicle parameters under crashworthiness using conservative surrogates. Comput. Ind. (2013)

  • E. Acar et al. Comparing effectiveness of measures that improve aircraft structural safety. J. Aerosp. Eng. (2007)

  • E. Acar et al. Ensemble of metamodels with optimized weight factors. Struct. Multidiscip. Optim. (2008)

  • C. Bishop. Neural Networks for Pattern Recognition. (1995)

  • Center for Petroleum and Geosystems Engineering, University of Texas at Austin. UTCHEM 9.0 Volume I.

  • C.-C. Chang et al. LIBSVM: A Library for Support Vector Machines. (2011)

  • A. Charnes et al. Chance-constrained programming. Manag. Sci. (1959)

  • A.H. Gandomi et al. A new multi-gene genetic programming approach to nonlinear system modeling. Part I: Materials and structural engineering problems. Neural Comput. & Applic. (2012)

  • A. Garg et al. A modified multi-gene genetic programming approach for modelling true stress of dynamic strain aging regime of austenitic stainless steel 304. Meccanica (2014)