Elsevier

Agricultural Water Management

Volume 221, 20 July 2019, Pages 220-230

Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China

https://doi.org/10.1016/j.agwat.2019.03.027

Abstract

Accurate estimation of reference evapotranspiration (ET0) is important in hydrological cycle research and essential in agricultural water management and allocation. Application of the standard model (FAO-56 Penman-Monteith) to estimate ET0 is often restricted by the absence of the required meteorological data. Although many machine learning algorithms have been applied to model ET0 with fewer meteorological variables, most of the models are trained and tested using data from the same station, and their performance outside the training station is not evaluated. This study investigates the generalization ability of the random forest (RF) algorithm in modeling ET0 with different input combinations (corresponding to different missing-data circumstances), and compares this algorithm with the gene expression programming (GEP) method using data from 24 weather stations in a karst region of southwest China. The ET0 estimated by the FAO-56 Penman-Monteith model was used as a reference to evaluate the derived RF-based and GEP-based models, and the coefficient of determination (R²), Nash-Sutcliffe coefficient of efficiency (NSCE), root mean squared error (RMSE), and percent bias (PBIAS) were used as evaluation criteria. The results revealed that the derived RF-based generalization ET0 models can be successfully applied in modeling ET0 with complete and incomplete meteorological variables (R², NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day⁻¹, and −2.916% to 1.571%, respectively), and seven RF-based models corresponding to different incomplete-data circumstances are proposed. GEP-based generalization ET0 models are also proposed, and they produced promising results (R², NSCE, RMSE and PBIAS ranged from 0.639 to 0.944, 0.636 to 0.942, 0.222 to 0.555 mm day⁻¹, and −1.98% to 0.248%, respectively). Although the RF-based ET0 models performed slightly better than the GEP-based models, the GEP approach can give explicit expressions relating the dependent and independent variables, which is more convenient for irrigators with minimal computer skills. Therefore, we recommend applying the RF-based models in water balance research, and the GEP-based models in agricultural irrigation practice. Moreover, model performance decreased across periods owing to the impact of climate change on ET0. Finally, both methods can assess the importance of predictors; the order of importance of the meteorological variables for ET0 in Guangxi is: sunshine duration, air temperature, relative humidity, and wind speed.
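The four evaluation criteria named in the abstract can be sketched as follows. This is a minimal illustration using commonly used definitions, not necessarily the exact formulas of the paper: R² is taken here as the squared Pearson correlation, and PBIAS is taken as positive when the model overestimates (sign conventions differ between authors). The function name `evaluate` is ours.

```python
import math


def evaluate(observed, predicted):
    """Return R2, NSCE, RMSE and PBIAS for paired ET0 series (mm/day).

    Assumed definitions (not taken from the paper):
      R2    = squared Pearson correlation
      NSCE  = 1 - sum((o - p)^2) / sum((o - mean(o))^2)
      RMSE  = sqrt(mean((o - p)^2))
      PBIAS = 100 * sum(p - o) / sum(o)   (positive = overestimation)
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    if ss_tot == 0 or var_p == 0:
        raise ValueError("series must not be constant")

    return {
        "R2": cov ** 2 / (ss_tot * var_p),
        "NSCE": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "PBIAS": 100.0 * sum(p - o for o, p in zip(observed, predicted)) / sum(observed),
    }


# Hypothetical values: reference (FAO-PM) ET0 versus a model that overestimates by 10%.
reference = [2.0, 3.0, 4.0, 5.0]
modelled = [2.2, 3.3, 4.4, 5.5]
scores = evaluate(reference, modelled)
```

Note that a model with a constant proportional bias can still reach R² close to 1, which is one reason NSCE, RMSE and PBIAS are reported alongside it.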

Introduction

Evapotranspiration (ET) is an important component of the hydrologic cycle (Traore and Guven, 2013), as more than 60% of total global precipitation is dissipated by it (Falamarzi et al., 2014). Accurate observation of actual ET is important in the design of irrigation schedules, water resource management, and water allocation (Wang et al., 2015). Although ET can be monitored directly by using a lysimeter (Allen et al., 2011), or by a method based on the energy balance and water vapor mass transfer (Shiri et al., 2014b), these measurements are laborious, time consuming, and expensive (Shiri et al., 2014b). Moreover, the measurements are limited in time and space (Falamarzi et al., 2014). Alternatively, ET can be estimated as a reference evapotranspiration (ET0) multiplied by a crop coefficient, which is the most widely used approach and is recommended by the Food and Agriculture Organization (FAO) (Allen et al., 1998, Wang et al., 2015, Rahimikhoob, 2016). The empirical crop coefficient (Kc, defined as the ratio of actual crop ET to the reference crop ET) is determined predominantly by specific crop characteristics and only to a small extent by environmental conditions (Allen et al., 1998). For example, the Kc values of hops were 0.69, 1.02 and 0.85 in the initial, mid-season, and end-season periods, respectively (Fandio et al., 2015). Kc values of approximately 80 crops are available on the FAO's website. Therefore, precisely calculating ET0 is essential for accurately estimating ET (Rahimikhoob, 2016). ET0 is the rate of ET of a hypothesized grass (with adequate water supply, albedo = 0.23, height = 0.12 m, and surface resistance = 70 s/m), and represents the maximum atmospheric evaporative power at a given time and location, regardless of crop type and soil characteristics (Allen et al., 1998, Shiri et al., 2012, Feng et al., 2016). Allen et al. (1998) stressed that ET0 is influenced only by meteorological factors. Therefore, many methods have been proposed to estimate ET0 from climatic data.
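As a small worked sketch of the crop-coefficient approach described above, crop ET is the product of Kc and ET0. The Kc values are the hop values quoted in the text; the ET0 value of 4.0 mm/day and the names `HOP_KC` and `crop_et` are hypothetical and chosen only for illustration.

```python
# Kc values for hops quoted in the text (Fandio et al., 2015).
HOP_KC = {"initial": 0.69, "mid-season": 1.02, "end-season": 0.85}


def crop_et(et0_mm_day, kc):
    """Crop evapotranspiration (mm/day) as Kc multiplied by reference ET0."""
    if et0_mm_day < 0 or kc < 0:
        raise ValueError("ET0 and Kc must be non-negative")
    return kc * et0_mm_day


# Hypothetical reference ET0 of 4.0 mm/day applied to each growth stage.
et0 = 4.0
et_by_stage = {stage: crop_et(et0, kc) for stage, kc in HOP_KC.items()}
```

Because the same ET0 is scaled by every crop's Kc, any error in ET0 propagates proportionally into the estimated crop water use.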

Among these methods, the FAO-56 Penman-Monteith (FAO-PM) is a physical approach based on the theories of aerodynamics and energy balance. This approach has been recommended as the standard method by the FAO, and is used to calibrate other ET0 methods (Allen et al., 1998, Shiri et al., 2012). The method has two important advantages: (1) because of its theoretical basis, it can be applied to different geographic and climatic zones without local calibration, and its results have been shown to be more consistent with observed data than those of other methods; (2) it has been validated using lysimeters worldwide (Kumar et al., 2008, Shiri et al., 2012, Wang et al., 2015). The main drawback of the FAO-PM is that it requires a full set of high-quality meteorological data, including air temperature, relative humidity, solar radiation, and wind speed (Kim and Kim, 2008a, Kumar et al., 2009). Furthermore, the computation procedure is complicated for irrigation technicians, who typically are not sophisticated computer users (Traore and Guven, 2013). Moreover, weather stations that satisfy these observation requirements are limited, especially in developing countries (Wang et al., 2015, Shiri et al., 2014b, Shiri et al., 2012). Air temperature sensors are generally available in most weather stations worldwide, whereas sensors for observing other meteorological factors are found in relatively fewer stations, and the quality of data is not always reliable (Shiri et al., 2012, Droogers and Allen, 2002). Therefore, there is a need to develop simpler ET0 models that use fewer meteorological variables and have adequate precision. Empirical ET0 models using smaller amounts of climatic data have also been widely used as a substitute for FAO-PM. Among them, the Hargreaves–Samani equation stands out because it requires only the maximum and minimum air temperatures (Hargreaves, 1982) and provides the most accurate global average performance (Almorox et al., 2015). It was therefore employed in this study.
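To make the temperature-only approach concrete, the sketch below implements the Hargreaves–Samani equation in the form commonly cited from FAO-56, with extraterrestrial radiation computed from latitude and day of year. The coefficients (0.0023, 17.8, 0.408) follow that commonly cited form; whether the study used exactly these coefficients or a locally calibrated version is not stated in this excerpt, and the function names are ours.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56)


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ m-2 day-1 (non-polar latitudes)."""
    phi = math.radians(lat_deg)
    # Inverse relative Earth-Sun distance and solar declination.
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle; acos is only defined when the sun rises and sets.
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def hargreaves_samani(tmax, tmin, lat_deg, day_of_year):
    """Reference ET0 (mm/day) from daily max/min air temperature (deg C)."""
    if tmax < tmin:
        raise ValueError("tmax must not be lower than tmin")
    # 0.408 converts MJ m-2 day-1 into equivalent evaporation in mm/day.
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)
    tmean = (tmax + tmin) / 2.0
    return 0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(tmax - tmin)


# Hypothetical mid-July day at about 23 deg N, roughly the latitude of Guangxi.
et0_hs = hargreaves_samani(tmax=33.0, tmin=25.0, lat_deg=23.0, day_of_year=196)
```

The only site-specific inputs are latitude, date, and two temperatures, which is why the equation remains usable at stations that lack humidity, radiation, or wind sensors.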

ET0 can be regarded as a function of several meteorological variables. With the advancement in computational resources and the emergence of big data, several machine learning techniques have been successfully applied to estimate ET0, such as artificial neural networks (ANN) (Kumar et al., 2002, Kim and Kim, 2008b, Shiri et al., 2014a), genetic programming (GP) (Izadifar and Elshorbagy, 2010, Kisi and Guven, 2010), support vector machine (SVM) (Tabari et al., 2012), adaptive neuro-fuzzy inference system (ANFIS) (Shiri et al., 2012, Tabari et al., 2012), extreme learning machine (ELM) (Feng et al., 2016, Abdullah et al., 2015), and gene expression programming (GEP) (Shiri et al., 2012, Shiri et al., 2014c, Wang et al., 2015). Unlike other machine learning approaches that produce black-box models, GEP can provide explicit expressions relating the dependent and independent variables, a powerful advantage for practical application and transferability (Traore and Guven, 2013, Shiri et al., 2014c). Thus, it was applied in this study.

At present, the major deficiency in research on machine learning-based ET0 estimation models is that these models are trained and tested using climatic data from the same station, and their applicability is not validated beyond the training stations. Therefore, although the proposed models have been reported to have adequate accuracy (Abdullah et al., 2015, Kim and Kim, 2008b, Kumar et al., 2008, Traore and Guven, 2011), they may be useful only at the training stations, and their effectiveness elsewhere is doubtful. Moreover, it is impossible to develop ET0 models for each location. One effective way is to develop generalized ET0 models using fewer meteorological variables, but few studies have considered this (Shiri et al., 2014c, Kisi, 2016). Beyond this, no research has tested whether machine learning-based ET0 models are applicable in the context of climate change, which has an important impact on water resource management (Allen et al., 1998, Wang et al., 2015).
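One way to test generalization beyond the training stations is to hold out whole stations, so that no station contributes data to both training and testing. The sketch below illustrates that idea only; the record layout, the 25% test fraction, and the function name `split_by_station` are hypothetical, and the actual partitioning scheme used in the study is not described in this excerpt.

```python
import random


def split_by_station(records, test_fraction=0.25, seed=0):
    """Split daily records so that training and test sets share no station.

    `records` is a list of dicts with at least a "station" key
    (a hypothetical layout used only for this illustration).
    """
    stations = sorted({r["station"] for r in records})
    if len(stations) < 2:
        raise ValueError("need at least two stations to hold one out")

    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(stations)

    n_test = max(1, round(len(stations) * test_fraction))
    n_test = min(n_test, len(stations) - 1)  # always keep a training station
    test_stations = set(stations[:n_test])

    train = [r for r in records if r["station"] not in test_stations]
    test = [r for r in records if r["station"] in test_stations]
    return train, test, test_stations


# Hypothetical data: 24 stations with three daily ET0 records each.
demo = [
    {"station": f"S{i:02d}", "day": d, "et0": 3.0 + 0.1 * d}
    for i in range(24)
    for d in range(3)
]
train_set, test_set, held_out = split_by_station(demo)
```

A random split of individual daily records would instead mix data from every station into both sets, which tests interpolation within known stations and not transfer to new ones.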

The random forest (RF) method, which is an ensemble learning method for classification and regression, has become popular in recent years because of its robust performance across a wide range of datasets, high prediction accuracy, limited number of user-defined parameters, and ability to avoid overfitting (Jing et al., 2015). It can also estimate the relative importance of variables. Fern et al. (2014) conducted an exhaustive evaluation of 179 classifiers arising from 17 families (discriminant analysis, Bayesian, ANN, SVM, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests, generalized linear models, nearest neighbors, partial least squares, principal component regression, logistic and multinomial regression, and multiple adaptive regression splines) over 121 datasets, and concluded that the RF delivers the best performance overall. RF has been successfully applied to many areas.

Guangxi, located in southwest China, has one of the largest continuous karst landforms in the world. Although this region has a subtropical, mountainous, monsoon climate with a large amount of annual precipitation (more than 1200 mm), the karst habitat is deficient in water resources for vegetation growth because there are a large number of fissures, gaps, channels, and sinkholes. Thus, these karst systems have an ineffective water storage capacity (Chen et al., 2009, Chen et al., 2010). Furthermore, drought occurs more frequently. Liu et al. (2014) found that southwestern China has generally become drier in relation to global climate change, and that regional mean annual precipitation has decreased by 11.4 mm per decade. It is therefore acknowledged that accurate ET0 estimation is important for water resource allocation and management for agriculture in this region.

The objectives of the study are: (1) to demonstrate the applicability of RF and GEP in estimating ET0; (2) to develop and compare the performance of the generalized RF-based and GEP-based ET0 estimation models in Guangxi with different meteorological variables used as input, and evaluate the applicability of the models in the context of climate change; and (3) to identify the contribution rank of each climatic factor in ET0 estimation.


Study area and data collection

Guangxi is located in the Pearl River basin of southwest China between 20°54′–26°23′ N and 104°29′–112°04′ E, and covers approximately 236,700 km², accounting for 2.47% of China's total territory. The carbonate area takes up approximately 37.8% of the province. The territory tilts from northwest to southeast, and has a hilly mountain terrain. The region has a tropical and subtropical humid climate, with an average annual temperature of 17–23 °C and annual precipitation of 1080–2760 mm. Fig. 1

Performance of RF-based models during the testing period

The ET0 values estimated from FAO-PM were considered the benchmark to evaluate the application of the proposed RF and GEP models during the testing periods. The statistical indicators, R², NSCE, RMSE and PBIAS, are shown in Table 3. R², NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day⁻¹ and −2.916% to 1.571%, respectively. The presence or absence of critical meteorological factors in the input sets significantly impacted the performance

Conclusions

The RF algorithm has many merits, including the ability to model complicated nonlinear systems; however, it is rarely applied in hydrological research. This study investigated the applicability and generalization of RF in modeling ET0 in Guangxi with different input combinations (corresponding to different missing-data circumstances), and compared it with the GEP method. The following conclusions can be drawn:

  • (1) The derived RF-based generalization ET0 models are successfully applied in modeling

Acknowledgements

This study was financially supported by the Guangxi Natural Science Foundation (2018GXNSFBA281136 and 2018GXNSFGA281003), and the National Natural Science Foundation of China (41807012). We would also like to thank the two anonymous reviewers for their thoughtful and constructive comments on the manuscript.

References (40)

  • J. Shiri et al. (2014). Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric.
  • J. Shiri et al. (2014). Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol.
  • H. Tabari et al. (2012). SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J. Hydrol.
  • M.A. Yassin et al. (2016). Artificial neural networks versus gene expression programming for estimating reference evapotranspiration in arid climate. Agric. Water Manage.
  • R.G. Allen et al. (1998). Crop evapotranspiration: guidelines for computing crop water requirements.
  • L. Breiman (2001). Random Forests.
  • X. Chen et al. (2009). The impact of land use and land cover changes on soil moisture and hydraulic conductivity along the karst hillslopes of southwest China. Environ. Earth Sci.
  • H. Chen et al. (2010). Soil moisture dynamics under different land uses on karst hillslope in northwest Guangxi, China. Environ. Earth Sci.
  • P. Droogers et al. (2002). Estimating reference evapotranspiration under inaccurate data conditions. Irrig. Drain. Syst.
  • M. Fandio et al. (2015). Assessing and modelling water use and the partition of evapotranspiration of irrigated hop (Humulus lupulus), and relations of transpiration with hops yield and alpha-acids. Ind. Crops Prod.