Short-term streamflow forecasting with global climate change implications – A comparative study between genetic programming and neural network models
Introduction
The availability of adequate fresh water is a fundamental requirement for the sustainability of human and terrestrial landscapes. The importance of understanding and improving predictive capacity for all aspects of the global and regional water cycle will therefore continue to increase. One fundamental component of the water cycle is streamflow. Streamflow is related to fresh water availability for human, animal, and plant populations, and to the incidence of natural hazards, such as floods and droughts, that may result in loss of human and animal life and damage to property. Flood alert systems offer the greatest potential for reducing flood damage. Drought analysis likewise depends on reliable streamflow forecasts. Streamflow prediction therefore provides crucial information for adaptive water resources management. Prospective users include farmers, fishermen, waterway navigators, reservoir operators, and managers of coastal ecosystems, recreational areas, and riparian zones. Yet fluctuations associated with global climate change challenge scientists and engineers to estimate and forecast the magnitude and timing of stream discharges with higher accuracy. Recent advances in remote sensing and artificial intelligence technologies support such streamflow forecasting efforts.
In this study, a genetic programming (GP) model is compared against a neural network (NN) model in developing a suite of streamflow forecasting models to meet various demands. Strengths of GP include its evolutionary approach, its natural selection process, and its white-box character. Since multiple input variables may be used in the prediction of streamflow, we must be able to identify the importance of each variable. The evolutionary process and natural selection techniques embedded in the GP model allow the multiple input variables to be screened inherently in the search for the best result. The white-box character of the GP model reveals the internal structure of every model created, which can be examined through its genetic algorithm-based binary tree structure.
In all modeling schemes, the lead-lag regression approach is preferred over the time-series approach for creating forecasting models. A lead-lag regression model is a statistical model that identifies differences in the timing of fluctuations through a system. Lead-lag regression uses existing data to predict discharge at future time steps. For a three-step lead, inputs from three steps back are paired with the current discharge, inputs from four steps back are paired with the discharge one step back, and so on. Thus, lead-lag regression requires only historical data to develop a predictive model. The time-series model, by contrast, predicts a one-step-ahead discharge, puts that discharge back into the input dataset to predict a two-step-ahead discharge, and repeats this process until the desired number of future time steps is reached. Other approaches to streamflow prediction are physically-based models and conceptual models. These two approaches usually need precipitation data as a driver to estimate future discharges. Normally, a precipitation forecast is obtained beforehand, and the estimated future precipitation data are fed into the model to calculate the discharges at future time steps. The accuracy of this approach depends mainly on the accuracy of the predicted precipitation and the estimated watershed characteristics. Consequently, creating future time-step drivers becomes much more difficult when multiple drivers are required. Thus, lead-lag regression is preferable in this study. The multiple input variables of interest include historic streamflows, NEXt generation RADar (NEXRAD) precipitation data, sea surface temperatures (SSTs) in the Pacific Ocean, the Atlantic Ocean, and the Gulf of Mexico, and local meteorological data collected from three weather stations in the watershed. Effective lead-time streamflow forecasting based on an enlarged hydrometeorological dataset is one of the key aspects of successful flood and drought management.
Hence, the aim of this study is to test the hypotheses that the inclusion of SSTs significantly affects the accuracy of streamflow forecasting and that the GP model can capture the underlying non-linear characteristics of a river system basin wide. The efforts in this paper to improve existing methods and develop new methods of streamflow prediction, at the nexus of artificial intelligence and high performance computing, may support adaptive water resources management at all spatial and temporal scales.
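The difference between the lead-lag pairing and the recursive time-series approach described in the Introduction can be illustrated with a minimal Python sketch. The function names, the toy discharge and rainfall values, and the placeholder one-step model are assumptions made for illustration only; they are not the models or data of this study.

```python
def make_lead_lag_pairs(inputs, discharge, lead):
    """Pair the inputs observed at step t - lead with the discharge at step t.

    Only historical records are needed: no future driver has to be forecast.
    """
    if len(inputs) != len(discharge):
        raise ValueError("inputs and discharge must cover the same time steps")
    if not 1 <= lead < len(discharge):
        raise ValueError("lead must be between 1 and len(discharge) - 1")
    # Drop the last `lead` inputs and the first `lead` discharges so that
    # each remaining input row lines up with the discharge `lead` steps later.
    return list(zip(inputs[:-lead], discharge[lead:]))


def recursive_forecast(one_step_model, history, n_steps):
    """Time-series style forecast: feed each prediction back in as an input."""
    series = list(history)
    forecasts = []
    for _ in range(n_steps):
        next_value = one_step_model(series)
        forecasts.append(next_value)
        series.append(next_value)  # the prediction becomes part of the input
    return forecasts


# Hypothetical daily records (illustrative values, not study data).
discharge = [5.0, 6.0, 9.0, 7.0, 6.5, 6.0]
rainfall = [0.0, 12.0, 3.0, 0.0, 0.0, 1.0]
inputs = list(zip(discharge, rainfall))

# Three-step lead: inputs at day 0 are paired with the discharge at day 3.
pairs = make_lead_lag_pairs(inputs, discharge, lead=3)

# Placeholder one-step model (persistence): tomorrow equals today.
persistence = recursive_forecast(lambda s: s[-1], discharge, n_steps=3)
```

In the lead-lag form, any regression method can be fitted to `pairs` directly, whereas the recursive form carries each step's prediction error forward into the next step.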
Section snippets
Background
Sea surface temperatures (SSTs) have been a primary expression of global climate anomalies for several decades. The El Niño-Southern Oscillation (ENSO) is the sea surface temperature oscillation in the Pacific Ocean. Numerous studies show influences of ENSO on climate changes in North, Central, and South America (Hansen et al., 1997; Cayan et al., 1998; Harrison and Larkin, 2000; Andrews et al., 2003; Tartaglione et al., 2003; Haylock et al., 2005). Pacific decadal oscillation (PDO) is
Study area
The Choke Canyon Reservoir Watershed (CCRW) is a portion of the Nueces River Basin in south Texas. It comprises several land use and land cover patterns covering an area of approximately 15,000 km². The major land uses are agriculture and livestock. Intensive use of groundwater for irrigation is concentrated in the middle and lower areas of the basin. The geography of the area strongly influences the hydrological cycle of the watershed. In the upper portion of the watershed the
Solution procedure: genetic programming versus neural networks
The GP method, a subset of genetic algorithms, generally approaches a solution using evolutionary processes including crossover, mutation, duplication, and deletion (Koza, 2004). It evolves regression models over a series of generations based on the Darwinian principle of natural selection (Koza, 1992). It starts solving a problem by creating a massive number of simple random functions in a population pool. These simple parent functions mate and reproduce a massive number of children
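The evolutionary operators named above can be sketched on a toy binary expression tree. This is a minimal illustration under stated assumptions, not the GP software used in the study: the tuple-based tree encoding, the three arithmetic operators, and all function names are hypothetical, and only crossover and mutation are shown (duplication, deletion, and fitness-based selection are omitted).

```python
import operator
import random

# Assumed function set for the sketch; a real GP run would use a richer one.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, n_vars, depth):
    """Grow a random binary tree; leaves are ("var", index) terminals."""
    if depth == 0 or rng.random() < 0.3:
        return ("var", rng.randrange(n_vars))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, n_vars, depth - 1),
            random_tree(rng, n_vars, depth - 1))


def evaluate(tree, x):
    """Evaluate the tree for one input vector x."""
    if tree[0] == "var":
        return x[tree[1]]
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def subtree_paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if tree[0] != "var":
        yield from subtree_paths(tree[1], path + (1,))
        yield from subtree_paths(tree[2], path + (2,))


def get_subtree(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(rng, parent_a, parent_b):
    """Child = parent_a with one random subtree swapped in from parent_b."""
    path_a = rng.choice(list(subtree_paths(parent_a)))
    path_b = rng.choice(list(subtree_paths(parent_b)))
    return replace_subtree(parent_a, path_a, get_subtree(parent_b, path_b))


def mutate(rng, tree, n_vars, depth=2):
    """Replace one random subtree with a freshly grown random subtree."""
    path = rng.choice(list(subtree_paths(tree)))
    return replace_subtree(tree, path, random_tree(rng, n_vars, depth))
```

Because every model is an explicit tree, the variables and operators it uses can be read off directly, which is the white-box property the comparison with neural networks relies on.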
Data analysis and synthesis
Two groups of data are used in this study: existing national data from USGS Water Data for the Nation, the National Weather Service (NWS), and the National Data Buoy Center (NDBC), and data collected by the authors from three weather stations deployed in the study area. The National Water Information System (NWIS) (http://waterdata.usgs.gov) provides surface stream discharges. The NWS provides precipitation data obtained from NEXRAD. NEXRAD offers spatio-temporal precipitation data which
Technical setting for scenario analyses
Impacts of climate change on streamflow have been on the rise, and accounting for them has great potential to improve adaptive water resources management. Models in this study were created under the assumption that the SSTs of the eastern Pacific Ocean, the western Atlantic Ocean, and the Gulf of Mexico influence the climate of the study area which, in turn, characterizes the streamflow of rivers in this watershed to some extent. The USGS stream gages therefore imply the capacity of streams to hold and transport the
Results and discussion
All of the models predict a well-defined lead-lag structure between the discharge and the NEXRAD precipitation, SST, USGS surface flow, and 15 meteorological data. First, the non-linear functional forms of three GP-derived models are summarized in the ‘Streamflow forecasting with GP-derived models’ section. To account for the relative importance of the input variables in these GP models, a frequency of use (FOU) is also introduced here to address the rules of how these input variables can be
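One plausible reading of a frequency of use is the share of evolved models in which each input variable appears. The sketch below follows that reading only as an assumption, since the definition used in the study is not shown in this excerpt; the function name and the example expressions are hypothetical and are not the GP-derived models reported here.

```python
import re
from collections import Counter


def frequency_of_use(expressions):
    """Return, for each variable, the fraction of expressions that use it.

    Variables are assumed to be written as V followed by digits (V1, V15, ...).
    """
    counts = Counter()
    for expression in expressions:
        # set() so a variable used twice in one model is counted once
        for variable in set(re.findall(r"V\d+", expression)):
            counts[variable] += 1
    total = len(expressions)
    return {variable: count / total for variable, count in counts.items()}


# Hypothetical model expressions, for illustration only.
toy_models = ["V1 + V4 * V4", "V4 - V15", "V4 * V12"]
fou = frequency_of_use(toy_models)
```

Under this reading, a variable that appears in every model has a frequency of use of 1.0, which would flag it as the input the evolutionary search retained most consistently.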
30-Day forecasting model
where V1 is the stream data at gage id# 8197500, V2 is the stream data at gage id# 8198500, V4 is , V8 is the soil temperature at Donnel site, V9 is the precipitation at Donnel site, V10 is the volumetric water content at Donnel site, V15 is the volumetric water content at Charlotte
7-Day forecasting model
where V4 is , V12 is the relative humidity at Charlotte site, and V15 is the volumetric water content at Charlotte site.
3-Day forecasting model
Conclusion
A new approach using GP as a means for streamflow forecasting was introduced in this paper by incorporating multidimensional datasets. Improvement of streamflow forecasting was accomplished by expanding the network of meteorological sensors to capture more of the missing characteristics basin wide. The datasets include the historical streamflow at USGS stations, rainfall data at NEXRAD stations, sea surface temperatures at three buoy stations, and the local meteorological data at three independent stations
References (69)
- et al. An introduction to the European Hydrological System – Système Hydrologique Européen, “SHE”, 1: History and philosophy of a physically-based distributed modeling system. Journal of Hydrology (1986)
- et al. An introduction to the European Hydrological System – Système Hydrologique Européen, “SHE”, 2: Structure of a physically-based distributed modeling system. Journal of Hydrology (1986)
- et al. A soil moisture index as an auxiliary ANN input for stream flow forecasting. Journal of Hydrology (2004)
- et al. Impact of the length of observed records on the performance of ANN and of conceptual parsimonious rainfall-runoff forecasting models. Environmental Modelling & Software (2004)
- et al. Scale stream flow predictions: the support vector machines approach. Journal of Hydrology (2006)
- et al. A grey fuzzy multiobjective programming approach for the optimal planning of a reservoir watershed. Part A: theoretical development. Water Research (1996)
- et al. A grey fuzzy multiobjective programming approach for the optimal planning of a reservoir watershed. Part B: application. Water Research (1996)
- et al. Prediction analysis of solid waste generation based on grey fuzzy dynamic modeling. Resources, Conservation and Recycling (2000)
- et al. Evaluation of streamflow predictions by the IHACRES rainfall-runoff model in two South African catchments. Environmental Modelling & Software (2003)
- On the modelling of the infiltration process in arid zones for irrigation project purposes with the aid of the “Système Hydrologique Européen” (SHE). Agricultural Water Management (1988)
- Improving the characteristics of streamflow modeled by regional climate models. Journal of Hydrology
- Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments. Journal of Hydrology
- Estimation of an ARMA model for river flow forecasting using goal programming. Journal of Hydrology
- Linking the Pacific decadal oscillation to seasonal stream discharge patterns in Southeast Alaska. Journal of Hydrology
- Application of neural approaches to one-step daily flow forecasting in Portuguese watersheds. Journal of Hydrology
- Precipitation and northern hemisphere regimes. Atmospheric Science Letters
- Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology
- Streamflow prediction for the Queanbeyan River at Tinderry, Australia. Environment International
- Assessing urban hydrologic prediction accuracy through event reconstruction. Journal of Hydrology
- Grey input–output analysis and its application for environmental cost allocation. European Journal of Operational Research
- Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments – comment. Journal of Hydrology
- An Introduction to the Method of Characteristic
- An exploration of artificial neural network rainfall-runoff forecasting combined with wavelet decomposition. Journal of Environmental Engineering and Science
- Influence of ENSO on flood frequency along the California coast. Journal of Climate
- ENSO, Pacific decadal variability, and US summertime precipitation, drought, and streamflow. Journal of Climate
- Time series modeling for long-term stream flow forecasting. Journal of Water Resources Planning and Management, ASCE
- Rainfall-Runoff Modelling: The Primer
- ENSO and hydrologic extremes in the western United States. Journal of Climate
- Real-time recurrent learning neural network for stream-flow forecasting. Hydrological Processes
- A two-step-ahead recurrent neural network for stream-flow forecasting. Hydrological Processes
- Prediction of PCDDs/PCDFs emissions from municipal incinerators by genetic programming and neural network modeling. Waste Management & Research