Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China

doi:10.1016/j.agwat.2019.03.027

Agricultural Water Management

Volume 221, 20 July 2019, Pages 220-230

https://doi.org/10.1016/j.agwat.2019.03.027

Abstract

Accurate estimation of reference evapotranspiration (ET₀) is very important in hydrological cycle research, and is essential in agricultural water management and allocation. The application of the standard model (FAO-56 Penman-Monteith) to estimate ET₀ is restricted by the absence of required meteorological data. Although many machine learning algorithms have been applied in modeling ET₀ with fewer meteorological variables, most of the models are trained and tested using data from the same station, and their performance outside the training station is not evaluated. This study investigates the generalization ability of the random forest (RF) algorithm in modeling ET₀ with different input combinations (corresponding to different missing-data circumstances), and compares this algorithm with the gene expression programming (GEP) method using data from 24 weather stations in a karst region of southwest China. The ET₀ estimated by the FAO-56 Penman-Monteith model was used as a reference to evaluate the derived RF-based and GEP-based models, and the coefficient of determination (R²), Nash-Sutcliffe coefficient of efficiency (NSCE), root mean squared error (RMSE), and percent bias (PBIAS) were used as evaluation criteria. The results revealed that the derived RF-based generalized ET₀ models can be successfully applied in modeling ET₀ with complete and incomplete meteorological variables (R², NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day⁻¹, and −2.916% to 1.571%, respectively), and seven RF-based models corresponding to different incomplete data circumstances are proposed. The GEP-based generalized ET₀ models are also proposed, and they produced promising results (R², NSCE, RMSE and PBIAS ranged from 0.639 to 0.944, 0.636 to 0.942, 0.222 to 0.555 mm day⁻¹, and −1.98% to 0.248%, respectively). Although the RF-based ET₀ models performed slightly better than the GEP-based models, the GEP approach can give explicit expressions relating the dependent and independent variables, which is more convenient for irrigators with minimal computer skills. Therefore, we recommend applying the RF-based models in water balance research, and the GEP-based models in agricultural irrigation practice. Moreover, the models' performance decreased over successive periods because of the impact of climate change on ET₀. Finally, both methods can assess the importance of predictors; the order of importance of the meteorological variables for ET₀ in Guangxi is: sunshine duration, air temperature, relative humidity, and wind speed.
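The four evaluation criteria named above can be computed from paired series of reference and estimated ET₀. The following is a minimal sketch under stated assumptions: R² is taken as the squared Pearson correlation, and PBIAS uses the sign convention in which positive values indicate overestimation (the opposite convention is also common, and this excerpt does not state which one the study used). The function name `evaluate` and the sample values are hypothetical.

```python
import math

def evaluate(reference, estimate):
    """Return R2, NSCE, RMSE and PBIAS for paired daily ET0 series (mm/day)."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimate)
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    ss_err = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    return {
        "R2": cov ** 2 / (ss_ref * ss_est),   # squared Pearson correlation
        "NSCE": 1.0 - ss_err / ss_ref,        # 1 means a perfect match
        "RMSE": math.sqrt(ss_err / n),        # same unit as the inputs
        # Assumed sign convention: positive PBIAS = overestimation.
        "PBIAS": 100.0 * sum(e - r for r, e in zip(reference, estimate)) / sum(reference),
    }

# Made-up values for illustration only (not data from the study).
fao_pm = [3.0, 4.0, 5.0]
model = [3.1, 3.9, 5.2]
scores = evaluate(fao_pm, model)
print({name: round(value, 3) for name, value in scores.items()})
```

NSCE and R² can differ noticeably when a model is biased, because R² ignores a constant offset whereas NSCE penalizes it; this is one reason to report them together with PBIAS.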

Introduction

Evapotranspiration (ET) is an important branch of the hydrologic cycle (Traore and Guven, 2013), as more than 60% of total global precipitation is dissipated by it (Falamarzi et al., 2014). Accurate observation of actual ET is important in the design of irrigation schedules, water resource management, and water allocation (Wang et al., 2015). Although ET can be monitored directly by using a lysimeter (Allen et al., 2011), or by energy balance and water vapor mass transfer methods (Shiri et al., 2014b), these measurements are laborious, time consuming, and expensive (Shiri et al., 2014b). Moreover, the measurements are limited in time and space (Falamarzi et al., 2014). Alternatively, ET can be estimated as a reference evapotranspiration (ET₀) multiplied by a crop coefficient, which is the most widely used approach and is recommended by the Food and Agriculture Organization (FAO) (Allen et al., 1998, Wang et al., 2015, Rahimikhoob, 2016). The empirical crop coefficient (K_c, defined as the ratio of actual crop ET to the reference crop ET) is determined predominantly by specific crop characteristics and only to a small extent by environmental conditions (Allen et al., 1998). For example, the K_c values of hops were 0.69, 1.02 and 0.85 in the initial, mid-season, and end-season periods, respectively (Fandio et al., 2015). K_c values of approximately 80 crops are available on the FAO's website. Therefore, precisely calculating ET₀ is essential for accurately estimating ET (Rahimikhoob, 2016). ET₀ is the rate of ET of a hypothesized grass (with adequate water supply, albedo = 0.23, height = 0.12 m, and surface resistance = 70 s/m), and represents the maximum atmospheric evaporative power at a given time and location, regardless of crop type and soil characteristics (Allen et al., 1998, Shiri et al., 2012, Feng et al., 2016). Allen et al. (1998) stressed that ET₀ is influenced only by meteorological factors. Therefore, many methods have been proposed to estimate ET₀ from climatic data.
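The crop coefficient relation described above, ET = K_c × ET₀, can be illustrated with the hop K_c values quoted in the text. This is a small sketch only: the ET₀ value of 4.0 mm day⁻¹ and the helper name `crop_et` are hypothetical and chosen for illustration.

```python
def crop_et(et0_mm_day, kc):
    """Crop evapotranspiration (mm/day) from reference ET0 and a crop coefficient."""
    return kc * et0_mm_day

# Kc values for hops quoted in the text (Fandio et al., 2015).
HOP_KC = {"initial": 0.69, "mid-season": 1.02, "end-season": 0.85}

et0 = 4.0  # hypothetical reference ET0 in mm/day, for illustration only
for stage, kc in HOP_KC.items():
    print(f"{stage}: ETc = {crop_et(et0, kc):.2f} mm/day")
```

Because K_c is fixed by the crop and growth stage, any error in ET₀ carries over proportionally to the estimated crop water use.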

Among these methods, the FAO-56 Penman-Monteith (FAO-PM) method is a physical approach based on the theories of aerodynamics and energy balance. This approach has been recommended as the standard method by the FAO, and is used to calibrate other ET₀ methods (Allen et al., 1998, Shiri et al., 2012). The method has two important advantages: (1) it can be applied to different geographic and climatic zones without local calibration because of its theoretical basis, and its results have been shown to be more consistent with observed data than those of other methods; and (2) it has been validated using lysimeters worldwide (Kumar et al., 2008, Shiri et al., 2012, Wang et al., 2015). The main drawback of the FAO-PM is that it requires a full set of high-quality meteorological data, including air temperature, relative humidity, solar radiation, and wind speed (Kim and Kim, 2008a, Kumar et al., 2009). Furthermore, the computation procedure is complicated for irrigation technicians, who typically are not sophisticated computer users (Traore and Guven, 2013). Moreover, weather stations that satisfy these observation requirements are limited, especially in developing countries (Wang et al., 2015, Shiri et al., 2014b, Shiri et al., 2012). Air temperature sensors are available at most weather stations worldwide, whereas sensors for other meteorological factors are found at fewer stations, and the quality of their data is not always reliable (Shiri et al., 2012, Droogers and Allen, 2002). Therefore, there is a need to develop simpler ET₀ models that use fewer meteorological variables and have adequate precision. Empirical ET₀ models using smaller amounts of climatic data have also been widely used as a substitute for FAO-PM. Among them, the Hargreaves–Samani equation stands out because it requires only the maximum and minimum air temperatures (Hargreaves, 1982), and provides the most accurate global average performance (Almorox et al., 2015). It was therefore employed in this study.
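To make the contrast in data requirements concrete, the sketch below implements the commonly cited daily forms of the two equations: the FAO-56 Penman-Monteith equation as given in Allen et al. (1998), and the Hargreaves–Samani equation with the usual 0.0023 coefficient. The exact variants and parameter choices used in the study are not shown in this excerpt, so the code is an assumption-laden illustration. Function names and input values are hypothetical, and net radiation `rn` and extraterrestrial radiation `ra` are passed in directly rather than derived from sunshine duration, latitude, and day of year.

```python
import math

def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def et0_fao_pm(t_max, t_min, rh_mean, u2, rn, pressure_kpa=101.3, g=0.0):
    """Daily FAO-56 Penman-Monteith ET0 (mm/day).

    rn, g: net radiation and soil heat flux (MJ m-2 day-1)
    u2: wind speed at 2 m (m/s); rh_mean: mean relative humidity (%)
    """
    t_mean = (t_max + t_min) / 2.0
    es = (saturation_vapour_pressure(t_max) + saturation_vapour_pressure(t_min)) / 2.0
    ea = es * rh_mean / 100.0                 # actual vapour pressure from mean RH
    delta = 4098.0 * saturation_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    gamma = 0.000665 * pressure_kpa           # psychrometric constant (kPa/deg C)
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    return numerator / (delta + gamma * (1.0 + 0.34 * u2))

def et0_hargreaves_samani(t_max, t_min, ra):
    """Daily Hargreaves-Samani ET0 (mm/day); ra in MJ m-2 day-1."""
    t_mean = (t_max + t_min) / 2.0
    # 0.408 converts radiation from MJ m-2 day-1 to mm/day of evaporation.
    return 0.0023 * 0.408 * ra * (t_mean + 17.8) * math.sqrt(t_max - t_min)

# Illustrative inputs only (not data from the study).
pm = et0_fao_pm(t_max=21.5, t_min=12.3, rh_mean=70.0, u2=2.078, rn=13.28,
                pressure_kpa=100.1)
hs = et0_hargreaves_samani(t_max=21.5, t_min=12.3, ra=41.0)
print(f"FAO-PM: {pm:.2f} mm/day, Hargreaves-Samani: {hs:.2f} mm/day")
```

The FAO-PM function needs temperature, humidity, wind speed, and radiation, whereas the Hargreaves–Samani function needs only the two temperatures plus `ra`, which depends on location and date rather than on measurements.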

ET₀ can be recognized as a function of several meteorological variables. With the advancement in computational resources and the emergence of big data, some machine learning techniques have been successfully applied to estimate ET₀, such as artificial neural networks (ANN) (Kumar et al., 2002, Kim and Kim, 2008b, Shiri et al., 2014a), genetic programming (GP) (Izadifar and Elshorbagy, 2010, Kisi and Guven, 2010), support vector machine (SVM) (Tabari et al., 2012), adaptive neuro-fuzzy inference system (ANFIS) (Shiri et al., 2012, Tabari et al., 2012), extreme learning machine (ELM) (Feng et al., 2016, Abdullah et al., 2015), and gene expression programming (GEP) (Shiri et al., 2012, Shiri et al., 2014c, Wang et al., 2015). Unlike other machine learning approaches that produce black-box models, the GEP has the ability to provide explicit expressions between dependent and independent variables, a powerful advantage for practical applications, and transferability (Traore and Guven, 2013, Shiri et al., 2014c). Thus, it was applied in this study.

At present, the major deficiency in research on machine learning-based ET₀ estimation models is that these models are trained and tested using climatic data at the same station, and their applicability is not validated beyond the training stations. Therefore, although the proposed models have adequate accuracy, as has been reported (Abdullah et al., 2015, Kim and Kim, 2008b, Kumar et al., 2008, Traore and Guven, 2011), they may be useful only at the training stations, and their effectiveness elsewhere is doubtful. Moreover, it is impossible to develop ET₀ models for every location. One effective way is to develop generalized ET₀ models using fewer meteorological variables, but few studies have considered this (Shiri et al., 2014c, Kisi, 2016). Beyond this, no research has tested whether machine learning-based ET₀ models are applicable in the context of climate change, which has an important impact on water resource management (Allen et al., 1998, Wang et al., 2015).
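The difference between same-station testing and a generalization test comes down to how records are split. The sketch below shows one possible station-wise hold-out scheme, in which every record from a test station is excluded from training. It is an illustration, not the split used in the study (which this excerpt does not describe); the record layout, station IDs, and function names are hypothetical.

```python
def split_by_station(records, test_stations):
    """Separate records so that test stations never contribute to training."""
    test_ids = set(test_stations)
    train = [r for r in records if r["station"] not in test_ids]
    test = [r for r in records if r["station"] in test_ids]
    return train, test

def leave_one_station_out(records):
    """Yield (held_out_station, train, test) for every station in turn."""
    for station in sorted({r["station"] for r in records}):
        train, test = split_by_station(records, [station])
        yield station, train, test

# Hypothetical records: station IDs and values are invented for illustration.
records = [
    {"station": "A", "t_max": 30.1, "t_min": 22.4, "et0": 4.2},
    {"station": "A", "t_max": 28.7, "t_min": 21.9, "et0": 3.8},
    {"station": "B", "t_max": 25.3, "t_min": 17.0, "et0": 3.1},
    {"station": "C", "t_max": 32.0, "t_min": 24.5, "et0": 4.9},
]

for station, train, test in leave_one_station_out(records):
    print(station, len(train), len(test))
```

A model scored this way is always evaluated on locations it has never seen, which is the property a generalized ET₀ model needs.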

The random forest (RF) method, which is an ensemble learning method for classification and regression, has become popular in recent years because of its robust performance across a wide range of datasets, high prediction accuracy, limited number of user-defined parameters, and ability to avoid overfitting (Jing et al., 2015). It can also estimate the relative importance of variables. Fernández-Delgado et al. (2014) conducted an exhaustive evaluation of 179 classifiers arising from 17 families (discriminant analysis, Bayesian, ANN, SVM, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests, generalized linear models, nearest neighbors, partial least squares, principal component regression, logistic and multinomial regression, and multiple adaptive regression splines) over 121 datasets, and concluded that the RF delivers the best performance overall. RF has been successfully applied to many areas.
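The core idea behind RF regression, bootstrap-aggregated regression trees in which each split considers only a random subset of features, can be sketched in a few dozen lines. This toy version is written for illustration with the standard library only; it is not the implementation used in the study, and the function names, default parameter values, and synthetic data are all assumptions.

```python
import random
from statistics import mean

def grow_tree(X, y, n_try, depth, rng):
    """Grow a regression tree; every split searches only n_try random features."""
    if depth == 0 or len(set(y)) == 1:
        return mean(y)                                   # leaf: mean target
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        for thr in sorted({row[f] for row in X})[:-1]:   # keeps both sides non-empty
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            m_l, m_r = mean(left), mean(right)
            sse = sum((t - m_l) ** 2 for t in left) + sum((t - m_r) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:                                     # sampled features were constant
        return mean(y)
    _, f, thr = best
    lo = [i for i, row in enumerate(X) if row[f] <= thr]
    hi = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow_tree([X[i] for i in lo], [y[i] for i in lo], n_try, depth - 1, rng),
            grow_tree([X[i] for i in hi], [y[i] for i in hi], n_try, depth - 1, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):                       # internal nodes are tuples
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=30, n_try=1, depth=4, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]         # bootstrap resample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_try, depth, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean([predict_tree(tree, row) for tree in forest])

# Synthetic data: two invented predictors and an invented linear response.
X = [[s, t] for s in range(6) for t in range(6)]
y = [0.5 + 0.3 * s + 0.1 * t for s, t in X]
forest = fit_forest(X, y)
print(round(predict_forest(forest, [0, 0]), 2), round(predict_forest(forest, [5, 5]), 2))
```

Because every leaf stores an average of training targets, the forest cannot predict outside the range of ET₀ values it was trained on, which is worth keeping in mind when a model is applied to a period with a different climate.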

Guangxi, located in southwest China, has one of the largest continuous karst landforms in the world. Although this region has a subtropical, mountainous, monsoon climate with a large amount of annual precipitation (more than 1200 mm), the karst habitat is deficient in water resources for vegetation growth because there are a large number of fissures, gaps, channels, and sinkholes. Thus, these karst systems have an ineffective water storage capacity (Chen et al., 2009, Chen et al., 2010). Furthermore, drought occurs more frequently. Liu et al. (2014) found that southwestern China has generally become drier in relation to global climate change, and that regional mean annual precipitation has decreased by 11.4 mm per decade. It is therefore acknowledged that accurate ET₀ estimation is important for water resource allocation and management for agriculture in this region.

The objectives of the study are: (1) to demonstrate the applicability of RF and GEP in estimating ET₀; (2) to develop and compare the performance of the generalized RF-based and GEP-based ET₀ estimation models in Guangxi with different meteorological variables used as input, and evaluate the applicability of the models in the context of climate change; and (3) to identify the contribution rank of each climatic factor in ET₀ estimation.

Study area and data collection

Guangxi is located in the Pearl River basin of southwest China between 20°54′–26°23′ N and 104°29′–112°04′ E, and covers approximately 236,700 km², accounting for 2.47% of China's total territory. The carbonate area takes up approximately 37.8% of the province. The territory tilts from northwest to southeast, and has a hilly mountain terrain. The region has a tropical and subtropical humid climate, with an average annual temperature of 17–23 °C and annual precipitation of 1080–2760 mm. Fig. 1

Performance of RF-based models during the testing period

The ET₀ values estimated from FAO-PM were considered the benchmark to evaluate the application of the proposed RF and GEP models during the testing periods. The statistical indicators, R², NSCE, RMSE and PBIAS, are shown in Table 3. It was observed that R², NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day⁻¹ and −2.916% to 1.571%, respectively. The presence or absence of critical meteorological factors in the input sets significantly impacted the performance

Conclusions

The RF algorithm has many merits and the ability to model complicated nonlinear systems; however, it is rarely applied in hydrological research. This study investigates the applicability and generalization of RF in modeling ET₀ in Guangxi with different input combinations (corresponding to different missing-data circumstances), and compares it with the GEP method. The following conclusions can be drawn:

(1) The derived RF-based generalization ET₀ models are successfully applied in modeling

Acknowledgements

This study was financially supported by the Guangxi Natural Science Foundation (2018GXNSFBA281136 and 2018GXNSFGA281003), and the National Natural Science Foundation of China (41807012). We would also like to thank the two anonymous reviewers for their thoughtful and constructive comments on the manuscript.

References (40)

S.S. Abdullah et al. Extreme learning machines: a new approach for prediction of reference evapotranspiration. J. Hydrol. (2015)

R.G. Allen et al. Evapotranspiration information reporting: I. Factors governing measurement accuracy. Agric. Water Manage. (2011)

J. Almorox et al. Global performance ranking of temperature-based approaches for evapotranspiration estimation considering Koppen climate classes. J. Hydrol. (2015)

Y. Falamarzi et al. Estimating evapotranspiration from temperature and wind speed data using artificial and wavelet neural networks (WNNs). Agric. Water Manage. (2014)

Y. Feng et al. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. (2016)

S. Kim et al. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. (2008)

S. Kim et al. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. (2008)

O. Kisi. Modeling reference evapotranspiration using three different heuristic regression approaches. Agric. Water Manage. (2016)

J.E. Nash et al. River flow forecasting through conceptual models part I - a discussion of principles. J. Hydrol. (1970)

J. Shiri et al. Daily reference evapotranspiration modeling by using genetic programming approach in the Basque Country (Northern Spain). J. Hydrol. (2012)

J. Shiri et al. Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric. (2014)

J. Shiri et al. Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol. (2014)

H. Tabari et al. SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J. Hydrol. (2012)

M.A. Yassin et al. Artificial neural networks versus gene expression programming for estimating reference evapotranspiration in arid climate. Agric. Water Manage. (2016)

R.G. Allen et al. Crop evapotranspiration: guidelines for computing crop water requirements (1998)

L. Breiman. Random Forests (2001)

X. Chen et al. The impact of land use and land cover changes on soil moisture and hydraulic conductivity along the karst hillslopes of southwest China. Environ. Earth Sci. (2009)

H. Chen et al. Soil moisture dynamics under different land uses on karst hillslope in northwest Guangxi, China. Environ. Earth Sci. (2010)

P. Droogers et al. Estimating reference evapotranspiration under inaccurate data conditions. Irrig. Drain. Syst. (2002)

M. Fandio et al. Assessing and modelling water use and the partition of evapotranspiration of irrigated hop (Humulus lupulus), and relations of transpiration with hops yield and alpha-acids. Ind. Crops Prod. (2015)

