Generalizability of Gene Expression Programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran

doi:10.1016/j.jhydrol.2013.10.034

Journal of Hydrology

Volume 508, 16 January 2014, Pages 1-11

https://doi.org/10.1016/j.jhydrol.2013.10.034

Highlights

• We used Genetic Programming (GP) to model daily reference evapotranspiration.
• A generalizability assessment of the GP model was performed.
• Results confirmed the capability of the GP model through k-fold testing.
• An externally trained GP model may be a good alternative to a locally trained one.

Summary

When dealing with climatic variables, the performance assessment of many Artificial Intelligence (AI) and/or data mining applications is based on a single assignment of the data to training and test sets. Further, this assignment is very commonly defined according to a local and temporal criterion, i.e. the models are trained and tested using data from the same station. Under this procedure, the performance of the models outside the training location cannot be inferred. The present work evaluates the performance of Gene Expression Programming (GEP) based models for estimating reference evapotranspiration (ET₀) according to temporal and spatial criteria and data set scanning procedures in coastal environments of Iran. The accuracy differences between local and external performance depend on the specific climatic trends of the test stations, as well as on the input combination used to feed the models. When relying on a suitable input selection, externally trained models might be a valid alternative to locally trained ones, which would be a crucial advantage in places where only limited climatic variables are available. K-fold testing is a good choice for avoiding the only partially valid conclusions that can result from model assessments based on a single data set assignment. Further, local calibration of the GEP model may not be needed if enough climatic data are available at other stations for external model application. The performance of the GEP model fluctuates chronologically and spatially, so a suitable assessment of the model should consider a complete local and/or external scan of the data set used.

Introduction

Evapotranspiration (ET) can be quantified directly by relatively high-cost aerodynamic and irradiative Bowen ratio methods, or by using lysimeters, which rely on a water balance in a controlled crop area (Allen et al., 1998). The term reference ET (ET₀) was introduced because the interdependence of the factors affecting ET makes it difficult to study the evaporative demand of the atmosphere. The Penman–Monteith equation (FAO56-PM) has accordingly been adopted as the reference equation for estimating ET₀ and for calibrating other equations (Allen et al., 1998). However, the need for a large number of climatic variables (e.g. air temperature, relative humidity, solar radiation and wind speed) is a major disadvantage of the FAO56-PM model. Therefore, the development and validation of models relying on fewer climatic data is of critical importance for regions where measured climatic data are limited. In recent decades, the application of Artificial Intelligence (AI) techniques (e.g. Genetic Programming) to modeling agro-hydrologic parameters (e.g. ET) has become viable. Numerous studies have reported that AI-based ET₀ estimation models are superior to traditional empirical and semi-empirical ET₀ estimation models (e.g. Kisi et al., 2012c; Pour Ali Baba et al., 2013; Rahimi Khoob, 2008; Shiri and Kisi, 2011b; Shiri et al., 2012a; Shiri et al., 2013a; Shiri et al., 2013b).
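For orientation, the daily form of the FAO56-PM equation, as given by Allen et al. (1998) for the grass reference surface, is reproduced below. It is included here only to make the data requirement explicit, and the notation follows the FAO-56 guideline rather than this paper.

```latex
\[
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
\]
% ET_0   : reference evapotranspiration (mm day^{-1})
% R_n    : net radiation at the crop surface (MJ m^{-2} day^{-1})
% G      : soil heat flux density (MJ m^{-2} day^{-1})
% T      : mean daily air temperature at 2 m height (deg C)
% u_2    : wind speed at 2 m height (m s^{-1})
% e_s, e_a : saturation and actual vapour pressure (kPa)
% \Delta : slope of the vapour pressure curve (kPa degC^{-1})
% \gamma : psychrometric constant (kPa degC^{-1})
```

Every term on the right-hand side depends on temperature, humidity, radiation or wind speed, which is why a station lacking any one of these records cannot apply the equation directly.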

Genetic Programming (GP) was first proposed by Koza (1992) and is particularly suitable where: (a) the interrelationships among relevant variables are poorly understood; (b) finding the optimum solution is hard; (c) conventional mathematical analysis does not, or cannot, provide analytical solutions; (d) an approximate solution is acceptable; (e) small improvements in the performance are routinely measured (or easily measurable) and highly valued; and (f) there is a large amount of data, in computer readable form, that requires examination, classification, and integration (Banzhaf et al., 1998).

GEP (Gene Expression Programming) is comparable to GP but involves computer programs of different sizes and shapes encoded in linear chromosomes of fixed lengths. The most important advantages of GEP are (Ferreira, 2001): (i) the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, etc.); (ii) the expression trees are exclusively the expression of their respective chromosomes; they are entities upon which selection acts, and according to fitness, they are selected to reproduce with modification.
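The link between a fixed-length linear chromosome and a variable-shape expression tree can be illustrated with a short sketch. It follows the head/tail gene layout and breadth-first decoding described by Ferreira (2001) as commonly presented. The function set, the single-character terminal names `a`, `b`, `c` and the helper names are hypothetical, chosen for illustration, and are not taken from this paper.

```python
import operator
from collections import deque

# Illustrative function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}

def tail_length(head_length, max_arity):
    """Tail length that lets any head decode into a complete tree."""
    return head_length * (max_arity - 1) + 1

def decode(gene):
    """Build an expression tree from a linear gene, level by level."""
    symbols = iter(gene)
    root = [next(symbols), []]           # node = [symbol, children]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root                          # trailing unused symbols are ignored

def evaluate(node, variables):
    """Evaluate a decoded tree for given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, variables) for child in children))
    return variables[symbol]

# Head "+*a" (3 symbols) plus a tail of 4 terminals gives a 7-symbol gene.
gene = "+*a" + "bcab"
tree = decode(gene)                      # encodes (b * c) + a
value = evaluate(tree, {"a": 1.0, "b": 2.0, "c": 3.0})
```

Because the tail holds only terminals and is long enough for the worst case, replacing any head symbol (for example turning the gene into `**+bcab`) still decodes to a valid tree. This is what makes the genetic operators mentioned above simple to apply.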

Notable applications of GP (including GEP) in modeling water resources systems have been reported in the literature, e.g. predicting velocity in compound channels (Harris et al., 2003); determining the Chezy resistance factor (Giustolisi, 2004); determining the unit hydrograph of urban basins (Rabunal et al., 2007); modeling flow and water quality variables in watersheds (Preis and Ostfeld, 2008); predicting groundwater table fluctuations (Shiri and Kisi, 2011a; Shiri et al., 2013c); river flow prediction (Shiri et al., 2012b); modeling daily precipitation (Kisi and Shiri, 2011); modeling river suspended sediment load (Kisi and Shiri, 2012; Kisi et al., 2012a); modeling daily lake level fluctuations (Kisi et al., 2012b); estimating daily incoming solar radiation (Landeras et al., 2012); modeling daily dewpoint temperature (Shiri et al., 2013d); and modeling the rainfall-runoff process (e.g. Aytek and Alp, 2008; Kisi et al., 2013). Nonetheless, only a few studies in the literature have applied GP to modeling evaporation/evapotranspiration. Parasuraman et al. (2007) applied GP for modeling the dynamics of ET. Guven et al. (2008) used GEP for modeling ET₀ in the USA. Guven and Kisi (2010) investigated linear genetic programming (LGP) and ANN applications to model daily pan evaporation. Izadifar and Elshorbagy (2010) compared ANN, GEP and statistical models for estimating hourly actual ET. Kisi and Guven (2010) used linear genetic programming for modeling ET. Shiri and Kisi (2011b) compared GEP, ANFIS and ANNs for estimating daily pan evaporation values using recorded and estimated weather variables. Shiri et al. (2012a) applied GEP for modeling daily reference evapotranspiration with both local (individual station) and pooled (whole region) approaches.

Commonly, AI and GP based applications consider only a single data set assignment and manage the data sets exclusively in a temporal and local way, i.e. models are trained and tested using data from the same station. Apart from not providing a suitable and complete performance assessment of the local patterns, this approach has another important limitation: the generalizability of the developed models is not assessed outside the training station. This is decisive for evaluating the real usefulness of many published procedures, especially those reporting accurate performance of locally trained models relying on limited inputs. Although they require few inputs, those models might only be useful at the training stations unless their external generalizability is also validated, which, as mentioned, is not the case in most applications. If these models are only accurate at the training stations, their real applicability is limited to local emergency cases, such as breakdowns in the data acquisition system. A new user would not be able to apply such a model at a different station, because its external performance was not evaluated, and would instead need a suitable set of patterns, including the targets, to train a new local model relying on that limited combination of inputs. In most cases, calculated FAO56-PM ET₀ targets are used, because experimental ones are usually absent. A new user would therefore first need enough inputs to calculate the required targets according to FAO56-PM. Hence, studies promoting the usefulness of models relying on limited inputs often fail to evaluate their performance properly and might provide misleading conclusions about their real applicability. Only a few studies have tried to assess the external performance of ET₀ models (Kisi, 2007; Kisi et al., 2012c; Martí et al., 2010; Martí et al., 2011; Rahimi Khoob, 2008; Shiri et al., 2011; Shiri et al., 2013a; Shiri et al., 2013b). Nevertheless, these studies considered only a single data set assignment. Shiri et al. (2013e) performed for the first time an external assessment of the generalizability of GEP based models for estimating pan evaporation based on k-fold testing. The current study aims to apply a similar approach to estimating ET₀ in a different climatic scenario, namely several coastal locations of Iran.
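The difference between a single data set assignment and a complete scan can be sketched as follows. This is a minimal illustration, assuming that the spatial scan holds out each station in turn and the temporal scan holds out each year in turn. The fold definition actually used in the paper is not shown in this excerpt, and the function name, record layout and toy data are hypothetical.

```python
def leave_one_group_out(records, key):
    """Yield (held_out, train, test), holding out each distinct group once."""
    groups = sorted({key(r) for r in records})
    for held_out in groups:
        train = [r for r in records if key(r) != held_out]
        test = [r for r in records if key(r) == held_out]
        yield held_out, train, test

# Toy records of the form (station, year); real records would also carry
# the daily inputs and the ET0 target.
records = [(station, year)
           for station in ("Abadan", "Rasht", "Sari")
           for year in (2000, 2001)]

# Spatial (external) scan: train on all other stations, test on the held-out one.
spatial = list(leave_one_group_out(records, key=lambda r: r[0]))

# Temporal (local) scan at one station: train on the other years, test on the held-out one.
rasht = [r for r in records if r[0] == "Rasht"]
temporal = list(leave_one_group_out(rasht, key=lambda r: r[1]))
```

Each fold yields one trained model and one set of test statistics, so the spread of those statistics across folds, rather than a single number, describes how well the approach generalizes.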

Studied region and used data

Eight coastal weather stations in Iran were considered in this study. The geographical positions of the studied weather stations are shown in Fig. 1. The dataset comprises daily values of maximum air temperature (T_max), minimum air temperature (T_min), mean air temperature (T_mean), wind speed (W_S), relative humidity (R_H) and solar radiation (R_S) between 1 January 2000 and 31 December 2008. Table 1 summarizes the average and standard deviation values of the weather data used in...

Results and discussion

The local and external performance per station of the studied GEP models for the three input combinations (GEP1, GEP2 and GEP3) is presented in Figs. 4, 5 and 6, respectively. A high variability in the RMSE, MAE, AARE and r² statistics can be clearly seen across all stations. The global RMSE values of the local GEP1 and GEP2 models range, respectively, from 0.51 and 0.47 mm at the Bandar-e-Lengeh station to 0.90 and 0.92 mm at the Abadan station, while the RMSE of the GEP3...
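The four statistics named in this section can be computed as in the sketch below. The definitions are assumptions, since this excerpt does not show the paper's formulas: AARE is taken here as the mean of |estimate − target| / |target|, expressed as a fraction, and r² as the squared Pearson correlation coefficient. The function names are illustrative.

```python
import math

def rmse(obs, est):
    """Root mean square error, in the units of the target (e.g. mm)."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error, in the units of the target."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)

def aare(obs, est):
    """Average absolute relative error as a fraction; assumes no zero targets."""
    return sum(abs(e - o) / abs(o) for o, e in zip(obs, est)) / len(obs)

def r_squared(obs, est):
    """Squared Pearson correlation between targets and estimates."""
    n = len(obs)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov * cov / (var_o * var_e)

# Toy daily ET0 targets and estimates (mm), for illustration only.
targets = [2.0, 4.0, 6.0]
estimates = [2.5, 3.5, 6.5]
stats = {
    "RMSE": rmse(targets, estimates),
    "MAE": mae(targets, estimates),
    "AARE": aare(targets, estimates),
    "r2": r_squared(targets, estimates),
}
```

Note that r² defined this way is insensitive to systematic bias, so it is best read alongside RMSE and MAE rather than on its own.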

Conclusions

The generalizability of GEP based models for ET₀ estimation was assessed in this paper through spatial and temporal k-fold testing in a coastal environment of Iran. The spatial assessment indicated that the externally trained GEP models gave less accurate estimations at the Abadan, Ahwaz and Sari stations than at the other stations (Bandar Abbas, Bandar-e-Lengeh, Bushehr, Gorgan and Rasht). Locally trained GEP models performed better than the externally trained models in Abadan, Ahwaz, ...

References (47)

P. Gavilán et al. Regional calibration of Hargreaves equation for estimating reference ET in a semi arid environment. Agricultural Water Management (2006)

O. Kisi et al. River suspended sediment estimation by climatic variables implication: comparative study among soft computing techniques. Computers & Geosciences (2012)

O. Kisi et al. Suspended sediment modeling using genetic programming and soft computing techniques. Journal of Hydrology (2012)

O. Kisi et al. Forecasting daily lake levels using artificial intelligence approaches. Computers & Geosciences (2012)

O. Kisi et al. Modeling rainfall-runoff process using soft computing techniques. Computers & Geosciences (2013)

G. Landeras et al. Comparison of gene expression programming with neuro-fuzzy and neural network computing techniques in estimating daily incoming solar radiation in the Basque Country (Northern Spain). Energy Conversion and Management (2012)

P. Martí et al. An artificial neural network approach for the estimation of stem water potential from frequency domain reflectometry soil moisture measurements and meteorological data. Computers and Electronics in Agriculture (2013)

A. Preis et al. A coupled model tree-genetic algorithm scheme of flow and water quality predictions in watersheds. Journal of Hydrology (2008)

J. Shiri et al. Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations. Computers & Geosciences (2011)

J. Shiri et al. Daily reference evapotranspiration modeling by using genetic programming approach in the Basque Country (Northern Spain). Journal of Hydrology (2012)

J. Shiri et al. Global cross station assessment of neuro-fuzzy models for estimating daily reference evapotranspiration. Journal of Hydrology (2013)

J. Shiri et al. Predicting groundwater level fluctuations with meteorological effect implications – a comparative study among soft computing techniques. Computers & Geosciences (2013)

Allen, R.G., Pereira, L.S., Raes, D., Smith, M., 1998. Crop evapotranspiration. Guidelines for computing crop...

A. Aytek et al. An application of artificial intelligence for rainfall runoff modeling. Journal of Earth System Science (2008)

W. Banzhaf et al. Genetic Programming (1998)

P. Droogers et al. Estimating reference evapotranspiration under inaccurate data conditions. Irrigation and Drainage Systems (2002)

Ferreira, C., 2001. Gene expression programming in problem solving. In: 6th Online World Conference on Soft Computing...

O. Giustolisi. Using GP to determine Chezy resistance coefficient in corrugated channels. Journal of Hydroinformatics (2004)

A. Guven et al. Daily pan evaporation modeling using linear genetic programming technique. Irrigation Science (2010)

A. Guven et al. Genetic programming-based empirical model for daily reference evapotranspiration estimation. Clean Soil, Air, Water (2008)

G.H. Hargreaves et al. Reference crop evapotranspiration from temperature. American Society of Agricultural Engineers (1985)

E.L. Harris et al. Velocity predictions in compound channels with vegetated flood plains using genetic programming. International Journal of River Basin Management (2003)

Z. Izadifar et al. Prediction of hourly actual evapotranspiration using neural networks, genetic programming, and statistical models. Hydrological Processes (2010)

¹: Ph.D. Student.
