Journal of Hydrology

Volume 547, April 2017, Pages 544-556
Research papers
Regionalization of runoff models derived by genetic programming

https://doi.org/10.1016/j.jhydrol.2017.02.018

Highlights

  • We derived runoff models by genetic programming (GP models).

  • We transferred GP models between catchments.

  • The transfer of GP models is possible when physical similarity exists between donor catchment(s) and acceptor catchment.

Abstract

The aim of this study is to assess the potential of hydrological models derived by genetic programming (GP) to estimate runoff at ungauged catchments by regionalization. A set of 176 catchments from the MOPEX (Model Parameter Estimation Experiment) project was used for our analysis. Runoff models for each catchment were derived by genetic programming (hereafter GP models). A comparison of efficiency was made between GP models and three conceptual models (SAC-SMA, BTOPMC, GR4J). The efficiency of the GP models was in general comparable with that of the SAC-SMA and BTOPMC models but slightly lower (up to 10% for calibration and 15% in validation) than for the GR4J model. The relationship between the efficiency of the GP models and catchment descriptors (CDs) was investigated. Of the 13 available CDs, the aridity index and mean catchment elevation explained most of the variation in the efficiency of the GP models. The runoff for each catchment was then estimated considering GP models from single or multiple physically similar catchments (donors). Better results were obtained with multiple donor catchments. Increasing the number of CDs used for quantification of physical similarity improves the efficiency of the GP models in runoff simulation. The best regionalization results were obtained with 6 CDs together with 6 donors. Our results show that transfer of the GP models is possible and leads to satisfactory results when applied at physically similar catchments. The GP models can therefore be used as an alternative for runoff modelling at ungauged catchments if similar gauged catchments can be identified and successfully simulated.

Introduction

Runoff simulation on ungauged catchments continues to be a subject of interest among hydrologists (Hrachowitz et al., 2013). Various approaches for estimating hydrological characteristics on ungauged catchments are in use; they are collectively termed hydrological regionalization.

The term hydrological regionalization describes methods allowing transfer of information about hydrological behaviour between catchments (Oudin et al., 2008). However, this definition varies in the context of the studied problem (compare Gottschalk et al. (1979) with Blöschl and Sivapalan (1995)). He et al. (2011) provide a brief overview of how the definition of hydrological regionalization has developed.

Some of the first works in the field of hydrological regionalization were published by Jarboe and Haan (1974) and Magette et al. (1976). Using regression, they endeavoured to relate the Kentucky watershed model parameters to measurable catchment descriptors (CDs). Subsequently, three basic approaches to hydrological regionalization have been used frequently: the regression approach (Jarboe and Haan, 1974, Magette et al., 1976, Xu, 1999, Merz and Blöschl, 2004, Heuvelmans et al., 2006, Wagener and Wheater, 2006), the spatial proximity approach (Vandewiele and Elias, 1995, Merz and Blöschl, 2004, Parajka et al., 2005, Oudin et al., 2008) and the physical similarity approach (Acreman and Sinclair, 1986, Burn and Boorman, 1993, Parajka et al., 2005, Oudin et al., 2008, Zhang and Chiew, 2009). Detailed descriptions of these regionalization approaches can be found in He et al. (2011) or in Blöschl et al. (2013). In the physical similarity approach (which is used further in our study) it is assumed that catchments with similar values of important CDs have similar hydrological behaviour. Transfer of model parameters from a gauged catchment (donor) to a physically similar ungauged catchment (acceptor) should lead to meaningful results (McIntyre et al., 2005, Blöschl et al., 2013), due to the existing links between CDs and the hydrological behaviour of a catchment. These links were explored by Oudin et al. (2010), who showed that a relationship between physical and hydrological similarity exists for 60% of the catchments in their set.

The three basic regionalization approaches (regression, spatial proximity and physical similarity approach) have been compared in terms of model simulation efficiency on ungauged catchments (Merz and Blöschl, 2004, Parajka et al., 2005, Oudin et al., 2008, Zhang and Chiew, 2009). These approaches have also been combined in order to improve model efficiency (Burn and Boorman, 1993, Merz and Blöschl, 2005, Zhang and Chiew, 2009). Parajka et al. (2013) compared these regionalization approaches on the basis of 34 previously presented studies (encompassing 3,780 catchments from different climatic conditions). Their results show that physical similarity and spatial proximity provide similar model efficiencies and that these model efficiencies are better than those based upon regression. They also presented catchment conditions where regionalization performs well.

Genetic programming (GP) is an evolutionary machine learning technique that automatically solves a wide range of problems without requiring the user to specify the structure of the solution in advance (Poli et al., 2008). Since its introduction by Koza (1992), GP has been applied frequently (Poli et al., 2008). Fields of application include optimization, data mining, signal processing and control (for more, see Koza, 2010, Poli et al., 2008).

Genetic programming is a general optimization method, and its main objective is to discover the relationships between the input and output data. In runoff modelling, GP is used most frequently as a symbolic regression tool, which means that equations describing the relationships between inputs and outputs are derived by a simulated evolution process (a GP run). The derived equation represents a runoff model for a given data set. No prior information about the structure of the model (equation) is needed. In a GP run, the population of individuals (candidate solutions) is progressively improved from generation to generation by selection and variation of the best individuals, whose offspring proceed to the next generation (often together with the best parents). A GP run is described in more detail in Section 2.5.
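The evolutionary loop outlined above can be sketched in a few lines of Python. This is a minimal illustration of symbolic regression by GP, not the setup used in the study: the function set (add, sub, mul), the input names P and PE, the depth cap, the population size, the crossover and mutation rates, and the mean-squared-error fitness are all assumptions chosen for brevity.

```python
import math
import operator
import random

# Assumed function set and terminals; the study's actual GP setup is not reproduced here.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["P", "PE"]  # hypothetical input names
MAX_DEPTH = 4            # assumed cap that keeps syntax trees small


def random_tree(rng, max_depth):
    """Grow a random expression tree; leaves are inputs or constants."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(TERMINALS)
        return round(rng.uniform(-1.0, 1.0), 2)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def depth(tree):
    """Depth of a tree; a leaf has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0


def evaluate(tree, inputs):
    """Evaluate a tree for one record of inputs (dict of variable values)."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, inputs), evaluate(right, inputs))
    return inputs[tree] if isinstance(tree, str) else tree


def error(tree, data):
    """Mean squared error over (inputs, observed) pairs; lower is better."""
    total = 0.0
    for inputs, observed in data:
        diff = evaluate(tree, inputs) - observed
        total += diff * diff
    mse = total / len(data)
    return mse if math.isfinite(mse) else float("inf")


def random_subtree(rng, tree):
    """Walk down from the root and return a randomly chosen subtree."""
    while isinstance(tree, tuple) and rng.random() < 0.5:
        tree = rng.choice(tree[1:])
    return tree


def replace_random(rng, tree, new):
    """Return a copy of tree with one randomly chosen subtree replaced by new."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return new
    name, left, right = tree
    if rng.random() < 0.5:
        return (name, replace_random(rng, left, new), right)
    return (name, left, replace_random(rng, right, new))


def gp_run(data, pop_size=60, generations=30, seed=0):
    """Evolve expressions; return the best tree and the best error per generation."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda t: error(t, data))
        history.append(error(population[0], data))
        elite = population[: pop_size // 5]  # best parents survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = replace_random(rng, a, random_subtree(rng, b))  # crossover
            if rng.random() < 0.3:                                  # mutation
                child = replace_random(rng, child, random_tree(rng, 2))
            children.append(child if depth(child) <= MAX_DEPTH else a)
        population = elite + children
    best = min(population, key=lambda t: error(t, data))
    return best, history
```

Calling `gp_run` on a list of `({"P": ..., "PE": ...}, observed_runoff)` pairs returns a nested tuple such as `("sub", ("mul", 0.5, "P"), "PE")`, which reads directly as an equation. Because the best parents are carried over unchanged, the best error in the population cannot get worse from one generation to the next.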

Standard GP began to be used in rainfall-runoff modelling during the second half of the 1990s in the work of Babovic (1996) and Cousin and Savic (1997). Synthetic rainfall-runoff series were used for modelling in these studies. In the first decade of this century, more works were published which included real data. A combination of GP and conceptual models, in the sense of calibrating the conceptual model using GP and error correction, was presented by Babovic and Keijzer (2002). Studies testing GP for 1-day runoff forecasting were published by, e.g., Jayawardena et al. (2005, 2006) and Charhate et al. (2009). Rabuñal et al. (2007) applied a combined approach using GP and artificial neural networks to rainfall-runoff modelling on an urbanized catchment. In a study by Makkeasorn et al. (2008), GP and artificial neural networks were compared for their ability to predict runoff while using assorted variables (including radar data) as inputs. GP also appears in comparative studies of data-driven models (Elshorbagy et al., 2010a, Elshorbagy et al., 2010b, Londhe and Charhate, 2010). In these comparative studies, GP has generally been considered the most successful technique.

Models derived by GP (GP models) can be suitable alternatives to conceptual models. Optimization of model structure is the main advantage of GP models. Thus, a model’s structure may vary when runoff is simulated in different hydro-climatic conditions.

The main objective and novelty of this study lies in testing transferability of GP models between catchments. Such testing has not yet been carried out, in particular not for daily hydrograph prediction in ungauged catchments. The specific objectives are to:

  1. test GP models’ appropriateness for runoff simulation on a sample of MOPEX (Model Parameter Estimation Experiment) catchments and compare simulation efficiencies of GP models and conceptual models,

  2. investigate relationships between GP models’ efficiencies and CDs, and

  3. test transferability of GP models between catchments on the basis of their physical similarity.

To address objective 1, we test whether GP models can reasonably simulate runoff. Furthermore, simulation efficiencies of GP models are compared with those of conceptual models. The overriding objective here, then, is to test whether GP models are capable of capturing different aspects of catchment hydrological behaviour in a way similar to conceptual models.

The intentions for objective 2 are to examine which CDs significantly affect the quality of GP model simulations and to identify catchment conditions in which GP models perform well.

Objective 3 involves testing GP models’ transferability in implementing the physical similarity approach. Single-donor and multiple-donor techniques are used for estimating total runoff on ungauged catchments. Also considered are the effects of various combinations of CDs. The results of the physical similarity approach are compared with results from the naive method, which is the application of a general model of catchment behaviour.
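Donor selection under the physical similarity approach, with single or multiple donors, can be sketched as follows. The sketch rests on assumptions that are not stated in the text above: similarity is measured as Euclidean distance between standardized CDs, each donor's GP model is represented by a plain Python callable taking daily P and PE, and multiple donors are combined by the arithmetic mean of their simulated runoff. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(cd_table):
    """Scale each catchment descriptor to zero mean and unit spread across catchments."""
    names = list(next(iter(cd_table.values())))
    stats = {}
    for n in names:
        values = [cds[n] for cds in cd_table.values()]
        stats[n] = (mean(values), pstdev(values) or 1.0)  # guard against zero spread
    return {
        cid: {n: (cds[n] - stats[n][0]) / stats[n][1] for n in names}
        for cid, cds in cd_table.items()
    }


def rank_donors(cd_table, acceptor, n_donors):
    """Return the n_donors catchments closest to the acceptor in standardized CD space."""
    z = standardize(cd_table)

    def distance(cid):
        return math.sqrt(sum((z[cid][n] - z[acceptor][n]) ** 2 for n in z[acceptor]))

    candidates = [cid for cid in cd_table if cid != acceptor]
    return sorted(candidates, key=distance)[:n_donors]


def transfer(donor_models, donors, forcing):
    """Run each donor's model on the acceptor's (P, PE) series and average the results."""
    simulations = [[donor_models[d](p, pe) for p, pe in forcing] for d in donors]
    return [mean(day) for day in zip(*simulations)]
```

With `n_donors=1` this reduces to the single-donor case. The set of CDs passed in `cd_table` controls which catchments count as similar, so testing different CD combinations amounts to calling `rank_donors` with different descriptor subsets.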

The paper is organized as follows: Section 2 presents the catchments and data sets used in the tests and describes the methodology. The donor catchment searching method, GP setup, and combination of the GP and physical similarity approach in regionalization are described in this section. In Sections 3 (Results) and 4 (Discussion), the results are presented and analysed. The paper closes with conclusions.

Section snippets

Input data

Time series of daily precipitation (P in mm/day), potential evaporation (PE in mm/day) and runoff (R in mm/day) for 176 catchments in the US for period 1970–1989 were considered in our analysis. The data originate from the MOPEX project (Duan et al., 2006). The period from 1 January 1970 to 31 December 1979 was used for model calibration (i.e. identification of optimal model structure and its parameters). The period from 1 January 1980 to 31 December 1989 was then used for model validation. We

Calibration of catchment behaviour models using genetic programming

For every catchment, the $R_d = f(P_{d-0,\,d-l},\ PE_{d-0,\,d-l})$ model of catchment behaviour (GP model) was established by the GP optimization process. Results from calibration and validation are depicted in Fig. 1. The box plots present values of aggNS, NS, and logNS for GP models and also for the GR4J model. The GR4J’s results are discussed in Section 3.3. GP models perform worse in validation than in calibration, but the deterioration is small. Efficiency value declines by 0.072 for

Calibration of catchment behaviour models using genetic programming

One aim of this study was to test whether GP models can be used for runoff simulation and subsequent regionalization. Results from calibration and validation show that GP models perform reasonably well in simulating runoff. In addition, the difference in the efficiencies between calibration and validation was small, i.e. GP models were rather robust. This may be related to a small depth of the considered syntax trees preventing overfitting.
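The calibration and validation efficiencies discussed here are reported as NS and logNS values. Assuming NS denotes the Nash-Sutcliffe efficiency and logNS the same criterion on log-transformed flows, the scores can be computed as in the sketch below. The small offset `eps` that keeps the logarithm defined for zero runoff is an assumed value, and aggNS is left out because its definition is not given in the text available here.

```python
import math


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is as good as the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def log_nash_sutcliffe(observed, simulated, eps=0.01):
    """NS on log-transformed flows, which gives more weight to low flows.

    eps is an assumed offset (mm/day) that keeps log() defined for zero runoff.
    """
    log_obs = [math.log(o + eps) for o in observed]
    log_sim = [math.log(max(s, 0.0) + eps) for s in simulated]
    return nash_sutcliffe(log_obs, log_sim)
```

The loss of efficiency between the two periods is then simply the difference between the score on the calibration series and the score on the validation series for the same model.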

Generally, identified GP models could be influenced by

Conclusions

The goal of this paper was to introduce a combination of a physically based regionalization approach and data-driven models derived by genetic programming.

The results of our study indicate that GP models can simulate runoff reasonably in different hydro-climatic conditions. GP models therefore present a simple and user-friendly alternative to common conceptual models for examining runoff modelling issues. This study shows that GP models can also be used for regionalization tasks. The results

Acknowledgements

This work was supported by the university-wide internal grant agency of the Czech University of Life Sciences Prague, grant CIGA20142028.

The authors thank the three anonymous reviewers for their constructive suggestions, which helped to improve the text, and wish to acknowledge the MOPEX project staff in relation to data provision and management.

References (57)

  • C. Perrin et al. Improvement of a parsimonious model for streamflow simulation. J. Hydrol. (2003)
  • S. Sette et al. Genetic programming: principles and applications. Eng. Appl. Artif. Intell. (2001)
  • R. Singh et al. Identifying dominant controls on hydrologic parameter transfer from gauged to ungauged catchments – a comparative hydrology approach. J. Hydrol. (2014)
  • G.L. Vandewiele et al. Monthly water balance of ungauged catchments obtained by geographical regionalization. J. Hydrol. (1995)
  • T. Wagener et al. Parameter estimation and regionalization for continuous rainfall-runoff models including uncertainty. J. Hydrol. (2006)
  • T. Ao et al. Relating BTOPMC model parameters to physical features of MOPEX basins. J. Hydrol. (2006)
  • V. Babovic. Emergence, evolution, intelligence: Hydroinformatics
  • V. Babovic et al. Rainfall-runoff modelling based on genetic programming. Nord. Hydrol. (2002)
  • K. Beven et al. A physically based, variable contributing area model of basin hydrology. Hydrol. Sci. Bull. (1979)
  • G. Blöschl et al. Scale issues in hydrological modelling: a review. Hydrol. Process. (1995)
  • G. Blöschl et al. Runoff Prediction in Ungauged Basins, Synthesis across Processes, Places and Scales (2013)
  • R.J.C. Burnash. The NWS river forecast system – catchment modeling
  • S.B. Charhate et al. Genetic programming to forecast stream flow
  • Cousin, N., Savic, D.A., 1997. A rainfall-runoff model using genetic programming. Tech. rep., School of Engineering,...
  • C. Daly et al. A statistical-topographic model for mapping climatological precipitation over mountainous terrain. J. Appl. Meteorol. (1994)
  • A. Elshorbagy et al. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 1: concepts and methodology. Hydrol. Earth Syst. Sci. (2010)
  • A. Elshorbagy et al. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 2: application. Hydrol. Earth Syst. Sci. (2010)
  • Farnsworth, R.K., Thompson, E.S., Peck, E.L., Service, U.S.N.W., 1982. Evaporation Atlas for the Contiguous 48 United...