Elsevier

Journal of Hydrology

Volume 508, 16 January 2014, Pages 1-11
Generalizability of Gene Expression Programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran

https://doi.org/10.1016/j.jhydrol.2013.10.034

Highlights

  • We used Genetic Programming (GP) to model daily reference evapotranspiration.

  • A generalizability assessment of the GP model was performed.

  • Results confirmed the capability of the GP model through k-fold testing.

  • An externally trained GP model may be a good alternative to a locally trained one.

Summary

When dealing with climatic variables, the performance assessment of many Artificial Intelligence (AI) and/or data mining applications is based on a single assignment of the data to training and test sets. Further, this assignment is very commonly defined according to a local and temporal criterion, i.e. the models are trained and tested using data from the same station. Under this procedure, the performance of the models outside the training location cannot be inferred. The present work evaluates the performance of Gene Expression Programming (GEP) based models for estimating reference evapotranspiration (ET0) according to temporal and spatial criteria and data set scanning procedures in coastal environments of Iran. The accuracy differences between the local and the external performance depend on the specific climatic trends of the test stations, as well as on the input combination used to feed the models. When relying on a suitable input selection, externally trained models might be a valid alternative to locally trained ones, which would be a crucial advantage in places where only limited climatic variables are available. K-fold testing is a good way to avoid the partially valid conclusions that can result from model assessments based on a single data set assignment. Further, calibration of the GEP model may not be needed if enough climatic data are available at other stations for external model application. The performance of the GEP model fluctuates chronologically and spatially. A suitable assessment of the model should consider a complete local and/or external scan of the data set used.

Introduction

Evapotranspiration (ET) can be quantified directly by relatively high-cost aerodynamic and irradiative Bowen ratio methods, or by using lysimeters, which are based on a water balance in a controlled crop area (Allen et al., 1998). The term reference ET (ET0) was introduced because the interdependence of the factors affecting ET makes the study of the evaporative demand of the atmosphere difficult. In this way, the Penman–Monteith equation (FAO56-PM) has been adopted as a reference equation for estimating ET0 and calibrating other equations (Allen et al., 1998). However, the need for a large number of climatic variables (e.g. air temperature, relative humidity, solar radiation and wind speed) is a major disadvantage of the FAO56-PM model. Therefore, the development and validation of models relying on fewer climatic data is of critical importance for regions where measured climatic data are limited. In recent decades, the application of Artificial Intelligence (AI) techniques (e.g. Genetic Programming) for modeling agro-hydrologic parameters (e.g. ET) has become viable. Numerous studies have demonstrated that AI-based ET0 estimation models are superior to traditional empirical and semi-empirical ET0 estimation models (e.g. Kisi et al., 2012c, Pour Ali Baba et al., 2013, Rahimi Khoob, 2008, Shiri and Kisi, 2011b, Shiri et al., 2012a, Shiri et al., 2013a, Shiri et al., 2013b).
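To make the data demand of FAO56-PM concrete, the daily form of the equation given by Allen et al. (1998) can be sketched as a short function. The function and argument names below are illustrative choices, not taken from the paper, and the input values in the usage lines are placeholders rather than data from the studied stations.

```python
import math


def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def slope_vapour_pressure_curve(t_c):
    """Slope of the saturation vapour pressure curve (kPa per deg C)."""
    return 4098.0 * saturation_vapour_pressure(t_c) / (t_c + 237.3) ** 2


def fao56_pm_et0(t_mean, u2, rn, g, es, ea, gamma):
    """Daily reference evapotranspiration (mm/day), FAO56-PM form.

    t_mean : mean air temperature at 2 m (deg C)
    u2     : wind speed at 2 m (m/s)
    rn, g  : net radiation and soil heat flux (MJ m-2 day-1)
    es, ea : saturation and actual vapour pressure (kPa)
    gamma  : psychrometric constant (kPa per deg C)
    """
    delta = slope_vapour_pressure_curve(t_mean)
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Placeholder inputs for a single mild, humid day (assumed values).
et0 = fao56_pm_et0(t_mean=16.9, u2=2.078, rn=13.28, g=0.0,
                   es=1.997, ea=1.409, gamma=0.0666)
print(f"ET0 = {et0:.2f} mm/day")
```

Every argument except the constants has to be measured or derived from measurements of temperature, humidity, radiation and wind, which is why models relying on fewer inputs are attractive where records are incomplete.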

Genetic Programming (GP) was first proposed by Koza (1992) and is particularly suitable where: (a) the interrelationships among relevant variables are poorly understood; (b) finding the optimum solution is hard; (c) conventional mathematical analysis does not, or cannot, provide analytical solutions; (d) an approximate solution is acceptable; (e) small improvements in the performance are routinely measured (or easily measurable) and highly valued; and (f) there is a large amount of data, in computer readable form, that requires examination, classification, and integration (Banzhaf et al., 1998).

GEP (Gene Expression Programming) is comparable to GP but involves computer programs of different sizes and shapes encoded in linear chromosomes of fixed lengths. The most important advantages of GEP are (Ferreira, 2001): (i) the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, etc.); (ii) the expression trees are exclusively the expression of their respective chromosomes; they are entities upon which selection acts, and according to fitness, they are selected to reproduce with modification.
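The link between a linear chromosome and its expression tree can be illustrated with a small decoder. This is a minimal sketch of the breadth-first reading of a gene described by Ferreira (2001), assuming a toy function set (`+`, `-`, `*`, `/` and `Q` for square root) and single-letter terminals; the names `decode` and `evaluate` are hypothetical and do not come from any GEP software.

```python
import math
from collections import deque

# Assumed toy function set: symbol -> number of arguments.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # Q = square root


def decode(gene):
    """Build an expression tree from a linear gene, level by level.

    Each node is a list: [symbol, child, child, ...]. Symbols are consumed
    left to right; any symbols left over once the tree is complete form the
    non-coding tail of the gene and are ignored.
    """
    symbols = iter(gene)
    root = [next(symbols)]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols)]
            node.append(child)
            queue.append(child)
    return root


def evaluate(node, env):
    """Evaluate a decoded tree using terminal values from env."""
    symbol, *children = node
    if symbol not in ARITY:
        return env[symbol]
    args = [evaluate(child, env) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    return math.sqrt(args[0])  # symbol == "Q"


# Two genes of equal length that express trees of different sizes.
gene_a = "Q*+-abcd"   # sqrt((a + b) * (c - d))
gene_b = "+abQ*-cd"   # a + b; the remaining symbols are not expressed
values = {"a": 1.0, "b": 2.0, "c": 7.0, "d": 4.0}
print(evaluate(decode(gene_a), values), evaluate(decode(gene_b), values))
```

Because the chromosome length is fixed while the expressed tree can be any size that fits, genetic operators can act on the plain string without having to repair tree structure.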

Notable applications of GP (including GEP) in modeling water resources systems have been reported in the literature, including predicting velocity in compound channels (Harris et al., 2003); determination of the Chezy resistance factor (Giustolisi, 2004); determining the unit hydrograph of urban basins (Rabunal et al., 2007); modeling flow and water quality variables in watersheds (Preis and Ostfeld, 2008); predicting groundwater table fluctuations (Shiri and Kisi, 2011a, Shiri et al., 2013c); river flow prediction (Shiri et al., 2012b); modeling daily precipitation (Kisi and Shiri, 2011); modeling river suspended sediment load (Kisi and Shiri, 2012, Kisi et al., 2012a); modeling daily lake level fluctuations (Kisi et al., 2012b); estimating daily incoming solar radiation (Landeras et al., 2012); modeling daily dewpoint temperature (Shiri et al., 2013d); and modeling the rainfall-runoff process (e.g. Aytek and Alp, 2008, Kisi et al., 2013). Nonetheless, only a few studies in the literature have applied GP to modeling evaporation/evapotranspiration. Parasuraman et al. (2007) applied GP for modeling the dynamics of ET. Guven et al. (2008) used GEP for modeling ET0 in the USA. Guven and Kisi (2010) investigated linear genetic programming (LGP) and ANN applications to model daily pan evaporation. Izadifar and Elshorbagy (2010) compared ANN, GEP and statistical models for estimating hourly actual ET. Kisi and Guven (2010) used linear genetic programming for modeling ET. Shiri and Kisi (2011b) compared GEP, ANFIS and ANNs to estimate daily pan evaporation values using recorded and estimated weather variables. Shiri et al. (2012a) applied GEP for modeling daily reference evapotranspiration with both local (individual station) and pooled (whole region) approaches.

Commonly, AI and GP based applications consider only a single data set assignment and an exclusively temporal and local management of the data sets, i.e. models are trained and tested using data from the same station. Apart from not providing a suitable and complete performance assessment of the local patterns, another important limitation of this approach is that the generalizability of the developed models is not assessed outside the training station. This is decisive for evaluating the real usefulness of many published procedures, especially those reporting accurate performance of locally trained models relying on limited inputs. Although they require few inputs for their application, those models might only be useful in the training stations unless the external generalizability is also validated, which is not the case in most applications, as mentioned. If these models are only accurate in the training stations, their real applicability is limited to local emergency cases, like breakdowns in the data acquisition system. A new user would not be able to apply such a model in a different station, because the external performance was not evaluated, and would require a suitable set of patterns, including the targets, for training a new local model relying on that limited combination of inputs. In most cases, calculated FAO56-PM ET0 targets are used, due to the usual absence of experimental ones. So, a new user would first need enough inputs to calculate the required targets according to FAO56-PM. Hence, studies that promote the usefulness of models relying on limited inputs often fail to evaluate their performance adequately and might provide misleading conclusions about their real applicability. Only a few studies have tried to assess the external performance of ET0 models (Kisi, 2007, Kisi et al., 2012c, Martí et al., 2010, Martí et al., 2011, Rahimi Khoob, 2008, Shiri et al., 2011, Shiri et al., 2013a, Shiri et al., 2013b). Nevertheless, these studies considered only a single data set assignment. Shiri et al. (2013e) performed for the first time an external assessment of the generalizability of GEP based models for estimating pan evaporation based on k-fold testing. The current study aims at applying a similar approach to estimate ET0 in a different climatic scenario, namely several coastal locations of Iran.
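The difference between a single data set assignment and a complete scan can be sketched with a small fold generator. This is an illustration under stated assumptions, not the authors' procedure: it assumes one fold per station for the external (spatial) scan and one fold per year for the local (temporal) scan, and the record layout and the name `leave_one_group_out` are hypothetical.

```python
def leave_one_group_out(records, key):
    """Yield (held_out, train, test) so that every group is tested once.

    key maps a record to its group label (for example, station or year).
    """
    groups = sorted({key(r) for r in records})
    for held_out in groups:
        train = [r for r in records if key(r) != held_out]
        test = [r for r in records if key(r) == held_out]
        yield held_out, train, test


# Hypothetical records; real ones would also carry the inputs and ET0 target.
records = [
    {"station": station, "year": year}
    for station in ("Abadan", "Ahwaz", "Sari")
    for year in (2000, 2001, 2002)
]

# External scan: train on all other stations, test on the held-out station.
external_folds = list(leave_one_group_out(records, key=lambda r: r["station"]))

# Local scan: within one station, hold out each year in turn.
sari_only = [r for r in records if r["station"] == "Sari"]
local_folds = list(leave_one_group_out(sari_only, key=lambda r: r["year"]))

for held_out, train, test in external_folds:
    print(held_out, len(train), len(test))
```

Averaging an error statistic over all folds, rather than reporting it for one arbitrary split, is what guards against conclusions that hold only for a particular choice of test period or test station.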

Section snippets

Studied region and used data

Eight coastal weather stations from Iran were considered in this study. The geographical positions of the studied weather stations are shown in Fig. 1. The dataset used comprises daily values of maximum air temperature (Tmax), minimum air temperature (Tmin), mean air temperature (Tmean), wind speed (WS), relative humidity (RH) and solar radiation (RS) between the 1st of January 2000 and the 31st of December 2008. Table 1 sums up the average and standard deviation values of the used weather data in

Results and discussion

The local and external performance per station of the studied GEP models for the three input combinations (GEP1, GEP2, and GEP3) is presented in Fig. 4, Fig. 5, Fig. 6, respectively. A high variability in the RMSE, MAE, AARE and r2 statistics can be clearly seen in all stations. The global RMSE values of the local GEP1 and GEP2 models respectively range between 0.51 and 0.47 mm for the Bandar-e-Lengeh station and between 0.90 and 0.92 mm for the Abadan station, while the RMSE of the GEP3
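The four statistics named above can be computed as follows. The snippet shown here does not give their definitions, so this sketch assumes the usual ones: root mean square error, mean absolute error, average absolute relative error (expressed as a fraction of the target) and the squared Pearson correlation coefficient. The function name and the sample values are hypothetical.

```python
import math


def performance_statistics(observed, estimated):
    """Return RMSE, MAE, AARE and r2 for paired target and model values."""
    n = len(observed)
    errors = [o - e for o, e in zip(observed, estimated)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n
    # Assumed definition: mean of |error| relative to the observed target.
    aare = sum(abs(err) / abs(o) for err, o in zip(errors, observed)) / n
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    r2 = cov ** 2 / (var_o * var_e)
    return {"RMSE": rmse, "MAE": mae, "AARE": aare, "r2": r2}


# Made-up ET0 targets and model estimates (mm/day), for illustration only.
stats = performance_statistics([2.0, 4.0, 6.0], [2.0, 5.0, 5.0])
print(stats)
```

RMSE and MAE carry the units of ET0, whereas AARE and r2 are dimensionless, which makes the latter two easier to compare across stations with different evaporative demand.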

Conclusions

The generalizability of GEP based models for ET0 estimation was assessed in this paper through spatial and temporal k-fold testing in a coastal environment of Iran. The spatial assessment results indicated that the externally trained GEP models presented less accurate estimations in the Abadan, Ahwaz and Sari stations than in the other stations (Bandar Abbas, Bandar-e-Lengeh, Bushehr, Gorgan and Rasht). Locally trained GEP models performed better than the externally trained models in Abadan, Ahwaz,

References (47)

  • J. Shiri et al. Global cross station assessment of neuro-fuzzy models for estimating daily reference evapotranspiration. Journal of Hydrology (2013)
  • J. Shiri et al. Predicting groundwater level fluctuations with meteorological effect implications – a comparative study among soft computing techniques. Computers & Geosciences (2013)
  • Allen, R.G., Pereira, L.S., Raes, D., Smith, M., 1998. Crop evapotranspiration. Guidelines for computing crop...
  • A. Aytek et al. An application of artificial intelligence for rainfall runoff modeling. Journal of Earth System Science (2008)
  • W. Banzhaf et al. Genetic Programming (1998)
  • P. Droogers et al. Estimating reference evapotranspiration under inaccurate data conditions. Irrigation and Drainage Systems (2002)
  • Ferreira, C., 2001. Gene expression programming in problem solving. In: 6th Online World Conference on Soft Computing...
  • O. Giustolisi. Using GP to determine Chezzy resistance coefficient in corrugated channels. Journal of Hydroinformatics (2004)
  • A. Guven et al. Daily pan evaporation modeling using linear genetic programming technique. Irrigation Science (2010)
  • A. Guven et al. Genetic programming-based empirical model for daily reference evapotranspiration estimation. Clean Soil, Air, Water (2008)
  • G.H. Hargreaves et al. Reference crop evapotranspiration from temperature. American Society of Agricultural Engineers (1985)
  • E.L. Harris et al. Velocity predictions in compound channels with vegetated flood plains using genetic programming. International Journal of River Basin Management (2003)
  • Z. Izadifar et al. Prediction of hourly actual evapotranspiration using neural networks, genetic programming, and statistical models. Hydrological Processes (2010)