Statistical downscaling of watershed precipitation using Gene Expression Programming (GEP)

https://doi.org/10.1016/j.envsoft.2011.07.007

Abstract

Investigation of the hydrological impacts of climate change at the regional scale requires the use of a downscaling technique. Significant progress has already been made in the development of new statistical downscaling techniques. Statistical downscaling techniques involve the development of relationships between large-scale climatic parameters and local variables. When the local parameter is precipitation, these relationships are often very complex and may not be handled efficiently using linear regression. For this reason, a number of non-linear regression techniques, including the use of Artificial Neural Networks (ANNs), were introduced. However, owing to the complexity of ANN-based techniques and the difficulty of finding a global solution with them, Genetic Programming (GP) based techniques have surfaced as a potentially better alternative. Compared to ANNs, GP-based techniques can provide simpler and more efficient solutions, but they have rarely been used for precipitation downscaling. This paper presents the results of statistical downscaling of precipitation data from the Clutha Watershed in New Zealand using a non-linear regression model developed by the authors using Gene Expression Programming (GEP), a variant of GP. The results show that GEP-based downscaling models can offer very simple and efficient solutions in the case of precipitation downscaling.

Highlights

► We present a GEP model as a new tool for precipitation downscaling at the watershed scale.
► The new model was compared with an existing model of a similar nature, known as SDSM.
► Our new model proves to be simpler and more efficient than SDSM.

Introduction

Investigation of the hydrological impacts of climate change at the regional or local scale requires the use of a statistical downscaling technique rather than dynamical downscaling because such investigations often need point-specific information (Timbal et al., 2009). Downscaling is the process of transforming the coarse spatial resolution of Global Climate Model (GCM) information into a form suitable for direct use in many types of climate impact models. On the basis of its working principle, it can be broadly categorized as statistical or dynamical. A number of studies provide detailed discussions of both categories, such as Xu (1999), Wilby et al. (2004) and Christenson et al. (2007). Significant progress has already been made in the development of new statistical downscaling techniques. These techniques can be classified into three major classes: (1) weather generators, (2) weather typing and (3) multiple regression. The third class of statistical downscaling techniques involves the development of relationships between large-scale climatic parameters (predictors), such as geopotential height and mean sea level pressure, and local variables (predictands), such as temperature and precipitation. Traditionally, this has been done through multiple linear regression, with or without data pre-processing techniques aimed at reducing the dimensionality of the problem (Huth, 1999, Wilby et al., 2002). These techniques include principal component analysis (e.g. Schoof and Pryor, 2001) and canonical correlation analysis (e.g. Busuioc et al., 2008). When the variable of interest is precipitation, the predictor-predictand relationships are often very complex and linear regression based methods may not work very well. For this reason, a number of non-linear regression downscaling schemes, including the use of Artificial Neural Networks (ANNs), were introduced.
Recent studies have shown that statistical downscaling schemes based on ANN models can provide good non-linear regression models (e.g. Mpelasoka et al., 2001, Haylock et al., 2006) and can be considered as global correlation detectors. ANN-based models are often complex in nature, so their solutions are not easily interpretable in a physically meaningful way, and they can suffer significantly from conventional modelling problems such as being trapped in a local optimum during calibration (Tripathi et al., 2006).
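
The regression class of downscaling described above can be illustrated with a minimal sketch of the traditional multiple linear regression approach. It is not the authors' model: the predictor values, the coefficients used to generate the synthetic predictand, and the function names `fit_linear` and `predict` are all assumptions made for illustration.

```python
def fit_linear(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(coeffs, x):
    """Evaluate the fitted linear model for one vector of predictors."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Hypothetical large-scale predictors (e.g. scaled pressure and
# geopotential-height indices); the values are invented for this sketch.
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
# Synthetic local predictand generated from an exactly linear relationship.
precip = [0.5 + 2.0 * x1 - 1.0 * x2 for x1, x2 in predictors]

coeffs = fit_linear(predictors, precip)
```

Because the synthetic predictand is exactly linear in the predictors, the fit recovers the generating coefficients to within rounding error. Real precipitation series rarely behave this way, which is the motivation the text gives for non-linear schemes.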

Another soft computing technique known as Genetic Programming (GP) (Koza, 1994) has emerged as a popular tool in recent years. In the field of water resources, GP has a number of applications. It has been used to model groundwater related problems (Hong and Rosen, 2002). It has also been used as a rainfall-runoff modelling tool on a number of occasions, which are not fully reviewed here. Previous studies show that GP is more efficient than ANNs in the area of rainfall-runoff modelling (e.g. Savic et al., 1999, Whigham and Crapper, 2001, Babovic and Keijzer, 2002, Liong et al., 2002 and Aytek et al., 2008). In the context of downscaling for climate change studies, the use of GP (or its variants) is not widespread. The advantage of GP over ANNs for developing downscaling functions is that it provides efficient and transparent modelling solutions. As far as the authors are aware, the first attempt to use GP as a downscaling tool was made by Coulibaly (2004). The target parameter in that study was temperature, and testing GP on the more challenging task of precipitation downscaling was recommended. This study follows the recommendation in Coulibaly (2004). A precipitation downscaling case study for the Clutha watershed in New Zealand is presented here, using a non-linear multiple regression model developed by the authors using Gene Expression Programming (GEP), which is a variant of GP (Ferreira, 2001). While some of the results of the GEP model used in this study were presented previously (Hashmi et al., 2009), complete modelling details were not provided at that time. The main aims of the present study are to: i) fully explain the GEP modelling process in the context of watershed precipitation downscaling; and ii) evaluate and compare the performance of our GEP model with a popular downscaling model of a similar nature.

Study site and data

This study is focused on the Clutha River watershed, upstream of the Balclutha streamflow gauge, located in the South Island of New Zealand (Fig. 1). The Clutha is the largest river by volume and the second longest river in New Zealand, having a length of 322 km. As discussed by Murray (1975), the precipitation in this watershed is strongly influenced by the mountains of Fiordland and the Southern Alps (see Fig. 1). As far as the distribution of the monthly precipitation is concerned, recorded

Methodology

Symbolic regression is a type of nonparametric regression which relates the predictor and predictand variables in the form of a function. This function is not specified a priori, but is constrained to contain a number of mathematical or logical expressions chosen from a pre-selected set of mathematical expressions (symbols) and predictor variables. In symbolic regression, genetic programming, which is based on Darwin's theory of evolution, is used to obtain the optimum set of symbols and
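
As a rough illustration of how a set of symbols and predictor variables can encode a function, the sketch below decodes a linear string of symbols into an expression tree, level by level, following the Karva notation described by Ferreira (2001), and then evaluates it. The function set, the protected square root, the symbol names `a` to `d`, and the helper names `decode` and `evaluate` are assumptions for illustration; they are not the configuration used in this study, and the evolutionary search itself is not shown.

```python
import math
from collections import deque

# Illustrative function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "Q": (1, lambda a: math.sqrt(abs(a))),  # protected square root
}


def decode(gene):
    """Build a [symbol, children] tree from a symbol string, breadth-first."""
    symbols = iter(gene)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Terminals (predictor variables) have arity zero.
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root


def evaluate(node, inputs):
    """Recursively evaluate the tree for one set of predictor values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        _, fn = FUNCTIONS[symbol]
        return fn(*(evaluate(child, inputs) for child in children))
    return inputs[symbol]


# "Q*+-abcd" decodes to sqrt((a + b) * (c - d)).
tree = decode("Q*+-abcd")
value = evaluate(tree, {"a": 1.0, "b": 3.0, "c": 5.0, "d": 1.0})
```

In a full GEP run, a population of such symbol strings would be varied and selected according to how well the decoded functions reproduce the observed predictand; only the predictors that appear in the decoded part of the best string end up in the final model.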

Results and discussion

In GEP-based symbolic regression, the most important predictors are selected automatically from the set of all independent variables (predictors). The best evolved model in our case contained seven predictors, which were therefore considered the most important of the 26 predictors available. Further details about this model are given in Appendix A. Table 5 shows the full list of the predictors with their short names as used in the best model. The predictors shown in bold in Table 5 were

Conclusions

A comparative statistical downscaling analysis was undertaken to evaluate the GEP-based symbolic regression as a statistical downscaling tool. Daily precipitation time series of the Clutha Watershed in New Zealand were downscaled using GEP. The SDSM model results were used as a benchmark for analysing the downscaled results of the GEP model. The results were analysed visually by using scatter plots and also in terms of parameters such as the number of predictors used by the downscaling model,

Acknowledgements

The authors of this paper are thankful to the National Institute of Water and Atmospheric Research (NIWA), New Zealand, for providing the precipitation data used in this study and the Higher Education Commission (HEC) of Pakistan for funding this research work.

References (31)

  • M. Collins et al. (2001). The internal climate variability of HadCM3, a version of the Hadley Centre coupled model without flux adjustments. Climate Dynamics.
  • P. Coulibaly (2004). Downscaling daily extreme temperatures with genetic programming. Geophysical Research Letters.
  • C. Ferreira (2001). Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems.
  • C. Ferreira (2006). Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence.
  • C. Ferreira (2010). "What is GEP?" From GeneXproTools Tutorials - A Gepsoft Web Resource.