Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China
Introduction
Evapotranspiration (ET) is an important component of the hydrologic cycle (Traore and Guven, 2013), as more than 60% of total global precipitation is dissipated by it (Falamarzi et al., 2014). Accurate observation of actual ET is important in the design of irrigation schedules, water resource management, and water allocation (Wang et al., 2015). Although ET can be monitored directly by using a lysimeter (Allen et al., 2011), or by energy balance and water vapor mass transfer methods (Shiri et al., 2014b), these measurements are laborious, time consuming, and expensive (Shiri et al., 2014b). Moreover, the measurements are limited in time and space (Falamarzi et al., 2014). Alternatively, ET can be estimated as a reference evapotranspiration (ET0) multiplied by a crop coefficient, which is the most widely used approach and is recommended by the Food and Agriculture Organization (FAO) (Allen et al., 1998, Wang et al., 2015, Rahimikhoob, 2016). The empirical crop coefficient (Kc, defined as the ratio of actual crop ET to the reference crop ET) is determined predominantly by specific crop characteristics and only to a small extent by environmental conditions (Allen et al., 1998). For example, the Kc values of hops were 0.69, 1.02 and 0.85 in the initial, mid-season, and end-season periods, respectively (Fandiño et al., 2015). Kc values of approximately 80 crops are available on the FAO's website. Therefore, precisely calculating ET0 is essential for accurately estimating ET (Rahimikhoob, 2016). ET0 is the rate of ET of a hypothesized grass (with adequate water supply, albedo = 0.23, height = 0.12 m, and surface resistance = 70 s/m), and represents the maximum atmospheric evaporative power at a given time and location, regardless of crop type and soil characteristics (Allen et al., 1998, Shiri et al., 2012, Feng et al., 2016). Allen et al. (1998) stressed that ET0 is influenced only by meteorological factors. Therefore, many methods have been proposed to estimate ET0 from climatic data.
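The crop-coefficient relation described above (ET = Kc × ET0) can be illustrated with a minimal sketch. The function name `crop_et` and the ET0 value of 4.0 mm/day are assumptions made purely for illustration; only the hop Kc values come from the text.

```python
# Minimal sketch of the crop-coefficient approach: ET = Kc * ET0.
# `crop_et` is a hypothetical helper name, not taken from the paper.

def crop_et(et0_mm_day: float, kc: float) -> float:
    """Return crop ET (mm/day) from reference ET0 (mm/day) and crop coefficient Kc."""
    if et0_mm_day < 0 or kc < 0:
        raise ValueError("ET0 and Kc must be non-negative")
    return kc * et0_mm_day

# Kc values of hops quoted in the text (Fandiño et al., 2015).
HOPS_KC = {"initial": 0.69, "mid-season": 1.02, "end-season": 0.85}

if __name__ == "__main__":
    et0 = 4.0  # assumed reference ET0 in mm/day, for illustration only
    for stage, kc in HOPS_KC.items():
        print(f"{stage}: ET = {crop_et(et0, kc):.2f} mm/day")
```

Because Kc is treated as a known crop property, any error in ET0 passes through proportionally to the estimated crop ET.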
Among these methods, the FAO-56 Penman-Monteith (FAO-PM) is a physical approach based on the theories of aerodynamics and energy balance. This approach has been recommended as the standard method by the FAO, and is used to calibrate other ET0 methods (Allen et al., 1998, Shiri et al., 2012). The method has two important advantages: (1) It can be applied to different geographic and climatic zones without local calibration because of its theoretical basis, and the results have been proven to be more consistent with the observation data than other methods. (2) It has been validated using lysimeters worldwide (Kumar et al., 2008, Shiri et al., 2012, Wang et al., 2015). The main drawback of the FAO-PM is that it requires a full set of high-quality meteorological data, including air temperature, relative humidity, solar radiation, and wind speed (Kim and Kim, 2008a, Kumar et al., 2009). Furthermore, the computation procedure is complicated for irrigation technicians, who typically are not sophisticated computer users (Traore and Guven, 2013). However, weather stations that satisfy the requirements of observations are limited, especially in developing countries (Wang et al., 2015, Shiri et al., 2014b, Shiri et al., 2012). Air temperature sensors are generally available in most weather stations worldwide, whereas sensors for observing other meteorological factors are found in relatively fewer stations, and the quality of data is not always reliable (Shiri et al., 2012, Droogers and Allen, 2002). Therefore, there is a need to develop simpler ET0 models that use fewer meteorological variables and have adequate precision. Empirical ET0 models using smaller amounts of climatic data have also been widely used as a substitute for FAO-PM. The Hargreaves–Samani equation is superior to others as it requires only the maximum and minimum air temperatures (Hargreaves, 1982), and provides the most accurate global average performance (Almorox et al., 2015). It was therefore employed in this study.
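To make the temperature-only character of the Hargreaves–Samani approach concrete, the following is a minimal sketch based on the form given in FAO-56, ET0 = 0.0023 (Tmean + 17.8) (Tmax − Tmin)^0.5 Ra, with the extraterrestrial radiation Ra computed from latitude and day of year. The exact variant and any local calibration used in this study are not stated in this excerpt, so the coefficients, the function names, and the example inputs below are assumptions.

```python
import math

GSC = 0.0820  # solar constant in MJ m-2 min-1, as used in FAO-56

def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 formulation."""
    phi = math.radians(lat_deg)
    # Inverse relative Earth-Sun distance and solar declination.
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle; the argument is clamped to keep acos defined at high latitudes.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def hargreaves_samani_et0(tmax_c: float, tmin_c: float,
                          lat_deg: float, day_of_year: int) -> float:
    """Reference ET0 (mm/day) from daily max/min air temperature (deg C)."""
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must be >= tmin_c")
    # 0.408 converts MJ m-2 day-1 to equivalent evaporation in mm/day.
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)
    tmean = (tmax_c + tmin_c) / 2.0
    return 0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(tmax_c - tmin_c)

if __name__ == "__main__":
    # Assumed inputs: a site at 23 deg N (within Guangxi's latitude range), day 180.
    print(round(hargreaves_samani_et0(30.0, 20.0, 23.0, 180), 2), "mm/day")
```

Only the two temperatures are measured inputs; latitude and date are known for any station, which is why the method remains usable where humidity, radiation, and wind sensors are missing.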
ET0 can be recognized as a function of several meteorological variables. With the advancement in computational resources and the emergence of big data, some machine learning techniques have been successfully applied to estimate ET0, such as artificial neural networks (ANN) (Kumar et al., 2002, Kim and Kim, 2008b, Shiri et al., 2014a), genetic programming (GP) (Izadifar and Elshorbagy, 2010, Kisi and Guven, 2010), support vector machine (SVM) (Tabari et al., 2012), adaptive neuro-fuzzy inference system (ANFIS) (Shiri et al., 2012, Tabari et al., 2012), extreme learning machine (ELM) (Feng et al., 2016, Abdullah et al., 2015), and gene expression programming (GEP) (Shiri et al., 2012, Shiri et al., 2014c, Wang et al., 2015). Unlike other machine learning approaches that produce black-box models, the GEP has the ability to provide explicit expressions between dependent and independent variables, a powerful advantage for practical applications, and transferability (Traore and Guven, 2013, Shiri et al., 2014c). Thus, it was applied in this study.
At present, the major deficiency in research on machine learning-based ET0 estimation models is that these models are trained and tested using climatic data at the same station, and their applicability is not validated beyond the training stations. Therefore, although the proposed models have adequate accuracy, as has been reported (Abdullah et al., 2015, Kim and Kim, 2008b, Kumar et al., 2008, Traore and Guven, 2011), they may be useful only at the training stations, and their effectiveness elsewhere is doubtful. Moreover, it is impossible to develop ET0 models for each location. One effective way is to develop generalized ET0 models using fewer meteorological variables, but few studies have considered this (Shiri et al., 2014c, Kisi, 2016). Beyond this, no research has tested whether machine learning-based ET0 models are applicable in the context of climate change, which has an important impact on water resource management (Allen et al., 1998, Wang et al., 2015).
The random forest (RF) method, which is an ensemble learning method for classification and regression, has become popular in recent years because of its robust performance across a wide range of datasets, high prediction accuracy, limited number of user-defined parameters, and ability to avoid overfitting (Jing et al., 2015). It can also estimate the relative importance of variables. Fernández-Delgado et al. (2014) conducted an exhaustive evaluation of 179 classifiers arising from 17 families (discriminant analysis, Bayesian, ANN, SVM, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests, generalized linear models, nearest neighbors, partial least squares, principal component regression, logistic and multinomial regression, and multiple adaptive regression splines) over 121 datasets, and concluded that the RF delivers the best performance overall. RF has been successfully applied to many areas.
Guangxi, located in southwest China, has one of the largest continuous karst landforms in the world. Although this region has a subtropical, mountainous, monsoon climate with a large amount of annual precipitation (more than 1200 mm), the karst habitat is deficient in water resources for vegetation growth because there are a large number of fissures, gaps, channels, and sinkholes. Thus, these karst systems have an ineffective water storage capacity (Chen et al., 2009, Chen et al., 2010). Furthermore, drought occurs more frequently. Liu et al. (2014) found that southwestern China has generally become drier in relation to global climate change, and that regional mean annual precipitation has decreased by 11.4 mm per decade. It is therefore acknowledged that accurate ET0 estimation is important for water resource allocation and management for agriculture in this region.
The objectives of the study are: (1) to demonstrate the applicability of RF and GEP in estimating ET0; (2) to develop and compare the performance of the generalized RF-based and GEP-based ET0 estimation models in Guangxi with different meteorological variables used as input, and evaluate the applicability of the models in the context of climate change; and (3) to identify the contribution rank of each climatic factor in ET0 estimation.
Study area and data collection
Guangxi is located in the Pearl River basin of southwest China between 20°54′–26°23′ N and 104°29′–112°04′ E, and covers approximately 236,700 km2, accounting for 2.47% of China's total territory. The carbonate area takes up approximately 37.8% of the province. The territory tilts from northwest to southeast, and has a hilly mountain terrain. The region has a tropical and subtropical humid climate, with an average annual temperature of 17–23 °C and annual precipitation of 1080–2760 mm. Fig. 1
Performance of RF-based models during the testing period
The ET0 values estimated from FAO-PM were considered the benchmark to evaluate the application of the proposed RF and GEP models during the testing periods. The statistical indicators, R2, NSCE, RMSE and PBIAS, are shown in Table 3. It was observed that R2, NSCE, RMSE and PBIAS ranged from 0.637 to 0.987, 0.626 to 0.986, 0.107 to 0.563 mm day−1 and −2.916% to 1.571%, respectively. The presence or absence of critical meteorological factors in the input sets significantly impacted the performance
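The four indicators named above can be sketched with their commonly used definitions. This excerpt does not give the paper's own formulas, so the definitions below are assumptions: R2 is taken as the squared Pearson correlation, NSCE as the Nash-Sutcliffe coefficient of efficiency, and PBIAS as 100 × Σ(estimated − benchmark) / Σ(benchmark), whose sign convention varies between authors. All function names are hypothetical.

```python
import math
from typing import Sequence

def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g. mm/day)."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nsce(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the benchmark mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def r2(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)

def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias; positive means overestimation under the assumed sign convention."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

if __name__ == "__main__":
    # Made-up values for illustration: benchmark (FAO-PM) vs. model estimates, mm/day.
    benchmark = [1.0, 2.0, 3.0, 4.0]
    estimate = [1.1, 1.9, 3.2, 3.8]
    print(r2(benchmark, estimate), nsce(benchmark, estimate),
          rmse(benchmark, estimate), pbias(benchmark, estimate))
```

Here the FAO-PM values play the role of `obs`, since the text treats them as the benchmark rather than using lysimeter measurements.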
Conclusions
The RF algorithm has considerable merit and the ability to model complicated nonlinear systems; however, it is rarely applied in hydrological research. This study aims to investigate the applicability and the generalization of RF in modeling ET0 in Guangxi with different input combinations (corresponding to different circumstances of missing data), and to compare it with the GEP method. The following conclusions can be drawn:
- (1) The derived RF-based generalization ET0 models are successfully applied in modeling
Acknowledgements
This study was financially supported by the Guangxi Natural Science Foundation (2018GXNSFBA281136 and 2018GXNSFGA281003), and the National Natural Science Foundation of China (41807012). We would also like to thank the two anonymous reviewers for their thoughtful and constructive comments on the manuscript.
References (40)
- et al. Extreme learning machines: a new approach for prediction of reference evapotranspiration. J. Hydrol. (2015)
- et al. Evapotranspiration information reporting: I. Factors governing measurement accuracy. Agric. Water Manage. (2011)
- et al. Global performance ranking of temperature-based approaches for evapotranspiration estimation considering Koppen climate classes. J. Hydrol. (2015)
- et al. Estimating evapotranspiration from temperature and wind speed data using artificial and wavelet neural networks (WNNs). Agric. Water Manage. (2014)
- et al. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. (2016)
- et al. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. (2008)
- et al. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. (2008)
- Modeling reference evapotranspiration using three different heuristic regression approaches. Agric. Water Manage. (2016)
- et al. River flow forecasting through conceptual models part I - a discussion of principles. J. Hydrol. (1970)
- et al. Daily reference evapotranspiration modeling by using genetic programming approach in the Basque Country (Northern Spain). J. Hydrol. (2012)
- Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric.
- Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J. Hydrol.
- SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J. Hydrol.
- Artificial neural networks versus gene expression programming for estimating reference evapotranspiration in arid climate. Agric. Water Manage.
- Crop evapotranspiration: guidelines for computing crop water requirements.
- Random Forests.
- The impact of land use and land cover changes on soil moisture and hydraulic conductivity along the karst hillslopes of southwest China. Environ. Earth Sci.
- Soil moisture dynamics under different land uses on karst hillslope in northwest Guangxi, China. Environ. Earth Sci.
- Estimating reference evapotranspiration under inaccurate data conditions. Irrig. Drain. Syst.
- Assessing and modelling water use and the partition of evapotranspiration of irrigated hop (Humulus lupulus), and relations of transpiration with hops yield and alpha-acids. Ind. Crops Prod.