Short-term streamflow forecasting with global climate change implications – A comparative study between genetic programming and neural network models

https://doi.org/10.1016/j.jhydrol.2008.01.023

Summary

Sustainable water resources management is a critically important priority across the globe. Water scarcity limits the uses of water in many ways, and floods may also cause property damage and loss of life. Two needs have led us to seek advanced techniques for improving short-term streamflow forecasting: using a limited amount of water more efficiently in a changing world, and providing adequate lead time for flood warnings. This study emphasizes the inclusion of sea surface temperature (SST) in forecasting discharges in a semi-arid watershed in south Texas. SST is combined with three other data sources: the spatio-temporal rainfall distribution from the Next Generation Radar (NEXRAD), meteorological data from local weather stations, and historical stream data from USGS gage stations. Two types of artificial intelligence models, genetic programming (GP) and neural network (NN) models, were employed and compared. Four numerical evaluators were used to assess the validity of a suite of forecasting models. The research findings indicate that the GP-derived streamflow forecasting models were generally favored in the assessment. Both SST and meteorological data significantly improved forecasting accuracy. Among several scenarios, NEXRAD rainfall data proved most effective for the 3-day forecast. The SST Gulf-to-Atlantic index also showed a larger impact on the streamflow forecasts than the SST Gulf-to-Pacific index. The most forward-looking GP-derived models could even produce a 30-day-ahead streamflow forecast, with an r-square of 0.84 and an RMS error of 5.4 in our study.

Introduction

The availability of adequate fresh water is a fundamental requirement for the sustainability of human societies and terrestrial landscapes. The importance of understanding the global and regional water cycle, and of improving our capacity to predict all aspects of it, will therefore continue to increase. One fundamental component of the water cycle is streamflow. Streamflow governs the availability of fresh water for human, animal, and plant populations. It is also tied to natural hazards such as floods and droughts. Floods in particular can occur abruptly, and both hazards may cause loss of human and animal life and damage to property. Flood alert systems offer the greatest potential for reducing flood damage, while drought analysis likewise relies on accurate streamflow forecasts. Streamflow prediction therefore provides crucial information for adaptive water resources management. Prospective users include, for example, farmers, fishermen, waterway navigators, coastal ecosystem managers, reservoir operators, and recreation and riparian managers. Yet global climate change makes it more challenging for scientists and engineers to estimate and forecast the magnitude and timing of stream discharges with higher accuracy. Recent advances in remote sensing and artificial intelligence technologies support such streamflow forecasting efforts.

In this study, the genetic programming (GP) model is compared against the neural network (NN) model to develop a suite of streamflow forecasting models that meet various demands. The strengths of GP include its evolutionary approach, its natural selection process, and its white-box character. Multiple input variables may be used to predict streamflow, so the importance of each variable must be identified. The evolutionary process and natural selection embedded in the GP model screen the input variables inherently while searching for the best result. The white-box character of the GP model reveals the internal structure of every model it creates. Each model is expressed as a genetic algorithm-based binary tree, and that tree can be examined directly.
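The sketch below illustrates the white-box idea. It uses a hypothetical `Node` class with a small function set (add, sub, mul, sin) and input names borrowed from the V1, V2, ... convention of the models reported later. The tree can be rendered as a readable formula and queried for the inputs it actually uses. This is only an illustration: it is not the GP software or the function set used in the study, and the example tree is made up.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Node:
    """One node of a hypothetical GP binary expression tree."""
    op: str                           # "add", "sub", "mul", "sin", "var" or "const"
    left: Node | None = None          # first (or only) child
    right: Node | None = None         # second child, binary operators only
    value: float | str | None = None  # variable name for "var", number for "const"

    def evaluate(self, inputs: dict[str, float]) -> float:
        """Compute the tree's output for one set of input values."""
        if self.op == "var":
            return inputs[self.value]
        if self.op == "const":
            return float(self.value)
        if self.op == "sin":
            return math.sin(self.left.evaluate(inputs))
        a, b = self.left.evaluate(inputs), self.right.evaluate(inputs)
        return {"add": a + b, "sub": a - b, "mul": a * b}[self.op]

    def to_formula(self) -> str:
        """Render the tree as a readable formula (the 'white box' view)."""
        if self.op == "var":
            return str(self.value)
        if self.op == "const":
            return f"{self.value:g}"
        if self.op == "sin":
            return f"sin({self.left.to_formula()})"
        symbol = {"add": "+", "sub": "-", "mul": "*"}[self.op]
        return f"({self.left.to_formula()} {symbol} {self.right.to_formula()})"

    def variables(self) -> set[str]:
        """Input variables the tree actually uses; any other input was screened out."""
        if self.op == "var":
            return {self.value}
        used: set[str] = set()
        for child in (self.left, self.right):
            if child is not None:
                used |= child.variables()
        return used


# Hypothetical tree for sin(V2) + 0.5 * V4.
tree = Node("add",
            Node("sin", Node("var", value="V2")),
            Node("mul", Node("const", value=0.5), Node("var", value="V4")))
print(tree.to_formula())         # (sin(V2) + (0.5 * V4))
print(sorted(tree.variables()))  # ['V2', 'V4']
```

Suppose an input such as V9 is offered to the tree but never referenced. It then simply does not appear in `variables()`, which is the sense in which the model structure itself exposes variable screening.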

In all modeling schemes, a lead-lag regression approach is preferred over a time-series approach for creating forecasting models. A lead-lag regression model is a statistical model that identifies timing differences in fluctuations propagating through a system. It uses existing data to predict discharge at future time steps. For a three-step-ahead model, inputs from three steps behind are paired with the current discharge, inputs from four steps behind are paired with the discharge one step behind, and so on. Thus, lead-lag regression requires only historical data to develop a predictive model. A time-series model, by contrast, predicts a one-step-ahead discharge and then feeds that prediction back into the input dataset to predict a two-step-ahead discharge. It repeats this process until the desired number of future time steps is reached. Other approaches to streamflow prediction are physically based models and conceptual models. These two approaches usually need precipitation data as a driver to estimate future discharges, so a precipitation forecast must normally be obtained beforehand. The estimated future precipitation data are then fed into the model to calculate discharges at future time steps. The accuracy of this approach depends mainly on the accuracy of the predicted precipitation and of the estimated watershed characteristics. Creating future time-step drivers becomes much more difficult when multiple drivers are required. Thus, lead-lag regression is preferable in this study. The input variables of interest include historical streamflows and NEXt generation RADar (NEXRAD) precipitation data. They also include sea surface temperatures (SSTs) in the Pacific Ocean, the Atlantic Ocean, and the Gulf of Mexico, and local meteorological data collected from three weather stations in the watershed. Effective lead-time streamflow forecasting based on such an enlarged hydrometeorological dataset is one of the key aspects of successful flood and drought management.
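The two schemes can be contrasted in a small sketch. The function names `lead_lag_pairs` and `recursive_forecast` are hypothetical, and the toy series are illustrative numbers, not study data. The first function builds lead-lag training pairs as described in the paragraph above. The second shows the recursive time-series alternative, in which each one-step prediction is fed back as an input.

```python
from __future__ import annotations

from typing import Callable, Sequence


def lead_lag_pairs(inputs: Sequence[Sequence[float]],
                   discharge: Sequence[float],
                   lead: int) -> list[tuple[list[float], float]]:
    """Pair the inputs observed `lead` steps earlier with each discharge value.

    With lead=3, inputs at t-3 are matched with discharge at t, inputs at t-4
    with discharge at t-1, and so on, so a model fitted on these pairs
    forecasts `lead` steps ahead from observed data only.
    """
    if len(inputs) != len(discharge):
        raise ValueError("inputs and discharge must share the same time index")
    return [(list(inputs[t - lead]), discharge[t]) for t in range(lead, len(discharge))]


def recursive_forecast(step_model: Callable[[list[float]], float],
                       history: list[float], horizon: int, window: int) -> list[float]:
    """Time-series alternative: each one-step prediction is fed back as an input."""
    series = list(history)
    forecasts: list[float] = []
    for _ in range(horizon):
        next_value = step_model(series[-window:])
        forecasts.append(next_value)
        series.append(next_value)
    return forecasts


# Toy daily series (illustrative numbers only, not study data).
rain = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
flow = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(lead_lag_pairs(rain, flow, lead=3))
# [([0.0], 13.0), ([1.0], 14.0), ([2.0], 15.0)]
print(recursive_forecast(lambda w: w[-1] + 1.0, [1.0, 2.0, 3.0], horizon=3, window=2))
# [4.0, 5.0, 6.0]
```

In the recursive scheme, any error in an early step propagates into every later step. The lead-lag pairs avoid this because every input is an actual observation.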
Hence, this study aims to test two hypotheses. The first is that including SSTs significantly affects the accuracy of streamflow forecasting. The second is that the GP model can capture the underlying non-linear characteristics of the river system basin-wide. This paper works to improve existing methods of streamflow prediction and to develop new ones at the nexus of artificial intelligence and high-performance computing. These efforts may support adaptive water resources management at all spatial and temporal scales.


Background

Sea surface temperatures (SSTs) have been a primary expression of global climate anomalies for several decades. The El Niño-Southern Oscillation (ENSO) is a sea surface temperature oscillation in the tropical Pacific Ocean. Numerous studies have shown the influence of ENSO on climate in North, Central, and South America (Hansen et al., 1997, Cayan et al., 1998, Harrison and Larkin, 2000, Andrews et al., 2003, Tartaglione et al., 2003, Haylock et al., 2005). The Pacific decadal oscillation (PDO) is

Study area

The Choke Canyon Reservoir Watershed (CCRW) is part of the Nueces River Basin in south Texas. It covers approximately 15,000 km² and comprises several land use and land cover types. The major land uses are agriculture and livestock. Intensive groundwater use for irrigation is concentrated in the middle and lower areas of the basin. The geography of the area strongly influences the hydrological cycle of the watershed. In the upper portion of the watershed the

Solution procedure: genetic programming versus neural networks

The GP method, a branch of genetic algorithms, generally approaches a solution through evolutionary processes that include crossover, mutation, duplication, and deletion (Koza, 2004). It evolves regression models over a series of generations according to the Darwinian principle of natural selection (Koza, 1992). It begins by creating a large population of simple random functions. These simple parent functions mate and reproduce a large number of children
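The evolutionary loop can be sketched as a small symbolic regression. This sketch covers crossover, mutation, and elitist duplication only; deletion is omitted. Several choices here are assumptions for illustration, not the settings or software used in the study. These are the function set (add, sub, mul, sin), tournament selection, the population size, the mutation rate, the depth limit, and the input names V1 and V2. The target function is synthetic.

```python
import math
import random

BINARY = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
VARIABLES = ["V1", "V2"]  # hypothetical input names


def random_tree(rng, depth):
    """Grow a random expression: tuples are operators, str are inputs, float are constants."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARIABLES) if rng.random() < 0.7 else round(rng.uniform(-2, 2), 3)
    if rng.random() < 0.2:
        return ("sin", random_tree(rng, depth - 1))
    return (rng.choice(list(BINARY)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, row):
    if isinstance(tree, str):
        return row[tree]
    if isinstance(tree, float):
        return tree
    if tree[0] == "sin":
        return math.sin(evaluate(tree[1], row))
    return BINARY[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))


def fitness(tree, rows, targets):
    """Mean squared error; numerically invalid trees get an infinite penalty."""
    try:
        mse = sum((evaluate(tree, r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)
    except (ValueError, OverflowError):
        return math.inf
    return mse if math.isfinite(mse) else math.inf


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace(tree, path, new):
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(children)


def depth(tree):
    return 1 + max(depth(c) for c in tree[1:]) if isinstance(tree, tuple) else 0


def crossover(rng, mother, father):
    """Swap a random subtree of one parent for a random subtree of the other."""
    path, _ = rng.choice(list(subtrees(mother)))
    _, donor = rng.choice(list(subtrees(father)))
    return replace(mother, path, donor)


def mutate(rng, tree):
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, 2))


def evolve(rows, targets, pop_size=60, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 4) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(t, rows, targets), t) for t in population), key=lambda p: p[0])
        history.append(scored[0][0])
        next_gen = [scored[0][1]]  # duplication: the best parent survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: fitter parents are more likely to mate.
            mother = min(rng.sample(scored, 3), key=lambda p: p[0])[1]
            father = min(rng.sample(scored, 3), key=lambda p: p[0])[1]
            child = crossover(rng, mother, father)
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if depth(child) <= max_depth:  # discard bloated children
                next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda t: fitness(t, rows, targets))
    return best, history


rows = [{"V1": x / 10, "V2": (x % 5) / 5} for x in range(20)]
targets = [r["V1"] * r["V2"] + math.sin(r["V1"]) for r in rows]
best, history = evolve(rows, targets)
print(best, history[0], history[-1])
```

Because the best individual is always copied into the next generation, the best error recorded in `history` can never increase from one generation to the next.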

Data analysis and synthesis

Two groups of data were used in this study. The first is existing national data from the USGS Water Data for the Nation, the National Weather Service (NWS), and the National Data Buoy Center (NDBC). The second is data collected by the authors from three weather stations deployed in the study area. The National Water Information System (NWIS) (http://waterdata.usgs.gov) provides surface stream discharges. The NWS provides precipitation data obtained from NEXRAD. NEXRAD offers spatio-temporal precipitation data which
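Data from these different sources must share a common time index before lead-lag pairs can be formed. The following is a minimal sketch, assuming each source has already been reduced to one value per day. It keeps only the dates that every source covers (an inner join). The source names and values are placeholders, not study data. The study's actual gap-handling procedure is not described in this excerpt.

```python
from __future__ import annotations

from datetime import date


def align_by_date(sources: dict[str, dict[date, float]]) -> list[tuple[date, dict[str, float]]]:
    """Inner-join daily series from several sources on their common dates.

    Days missing from any source are dropped so that every retained record
    has a complete input vector for model training.
    """
    common = set.intersection(*(set(series) for series in sources.values()))
    return [(day, {name: series[day] for name, series in sources.items()})
            for day in sorted(common)]


# Placeholder values for illustration only; keys are hypothetical source names.
sources = {
    "usgs_flow": {date(2005, 1, 1): 120.0, date(2005, 1, 2): 118.0, date(2005, 1, 3): 130.0},
    "nexrad_rain": {date(2005, 1, 2): 0.0, date(2005, 1, 3): 12.5},
    "buoy_sst": {date(2005, 1, 1): 21.4, date(2005, 1, 2): 21.3, date(2005, 1, 3): 21.1},
}
for day, record in align_by_date(sources):
    print(day.isoformat(), record)
```

An inner join is the simplest choice. Gap filling, such as interpolation, would keep more records but could introduce artificial values into the inputs.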

Technical setting for scenario analyses

Research on the impacts of climate change on streamflow has been on the rise and has great potential to improve adaptive water resources management. The models in this study assume that the SSTs of the eastern Pacific Ocean, the western Atlantic Ocean, and the Gulf of Mexico influence the climate of the study area. That climate, in turn, partly characterizes the streamflow of rivers in this watershed. The USGS stream gages therefore imply the capacity of streams to hold and transport the
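The summary refers to an SST Gulf-to-Atlantic index and an SST Gulf-to-Pacific index. The following is a minimal sketch, assuming each index is the day-by-day ratio of the Gulf of Mexico buoy SST to the Atlantic or Pacific buoy SST. Both series are assumed to be aligned in time and expressed in the same unit. The function name `sst_ratio_index` and the readings are hypothetical.

```python
from __future__ import annotations


def sst_ratio_index(sst_gulf: list[float], sst_other: list[float]) -> list[float]:
    """Day-by-day ratio of Gulf of Mexico SST to Atlantic (or Pacific) SST.

    Assumes both series are aligned in time and in the same unit (e.g. degrees C).
    """
    if len(sst_gulf) != len(sst_other):
        raise ValueError("SST series must be aligned in time")
    if any(value == 0 for value in sst_other):
        raise ZeroDivisionError("a reference SST of zero makes the ratio undefined")
    return [gulf / other for gulf, other in zip(sst_gulf, sst_other)]


# Placeholder buoy readings (not study data).
gulf, atlantic, pacific = [28.0, 27.0], [25.0, 27.0], [20.0, 18.0]
gulf_to_atlantic = sst_ratio_index(gulf, atlantic)
gulf_to_pacific = sst_ratio_index(gulf, pacific)
print(gulf_to_atlantic, gulf_to_pacific)
```

A ratio depends on the temperature scale. The same readings in kelvin would give values much closer to 1, so the unit should be fixed before the index is used as a model input.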

Results and discussion

All of the models reveal a well-defined lead-lag structure between the discharge and four groups of inputs: the NEXRAD precipitation, SST, USGS surface flow, and 15 meteorological variables. First, the non-linear functional forms of the three GP-derived models are summarized in the 'Streamflow forecasting with GP-derived models' section. To account for the relative importance of the input variables in these GP models, a frequency of use (FOU) is also introduced here to address the rules of how these input variables can be
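The full definition of FOU is cut off in this excerpt. The following is a minimal sketch, assuming FOU is the fraction of a set of GP-derived formulas in which each input variable appears at least once. The formula strings are hypothetical stand-ins, not the models reported in this study.

```python
from __future__ import annotations

import re
from collections import Counter

VARIABLE = re.compile(r"\bV\d+\b")  # inputs are named V1, V2, ... as in the models


def frequency_of_use(formulas: list[str]) -> dict[str, float]:
    """Fraction of formulas in which each input variable appears at least once."""
    counts: Counter[str] = Counter()
    for formula in formulas:
        counts.update(set(VARIABLE.findall(formula)))  # count each variable once per model
    ordered = sorted(counts, key=lambda name: int(name[1:]))
    return {name: counts[name] / len(formulas) for name in ordered}


# Hypothetical GP outputs, not the models reported in this study.
models = ["sin(V2) + V4*V4", "V4 - 2*V15*sin(V15)", "V12 + V15"]
print(frequency_of_use(models))
```

Counting each variable once per formula measures how widely an input is used across models. Counting every occurrence would instead reward inputs that a single formula repeats many times.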

30-Day forecasting model

$$
\begin{aligned}
\text{30-day forecasting discharge} &= 2\left(2^{A_1}-1\right)V_4^{2}\\
A_1 &= \left|\sin(A_2)\right| + V_2 - A_3\\
A_2 &= \sin\left(2V_{10} + 3.075786\right) + V_2\\
A_3 &= -\sin\left((-V_1)V_8V_{16} - V_{15}\right)\left(2^{A_4} - 1 + V_2\right)\\
A_4 &= \left|\sin(A_5) + V_9\right|\\
A_5 &= \frac{\sin\left((-V_1)V_8V_{16} - V_{15}\right)}{1.29105} - 1.0309 + V_2 - V_4^{2} + V_2
\end{aligned}
$$

where V1 is the stream data at gage id# 8197500, V2 is the stream data at gage id# 8198500, V4 is SST_Gulf/SST_Atlantic, V8 is the soil temperature at the Donnel site, V9 is the precipitation at the Donnel site, V10 is the volumetric water content at the Donnel site, V15 is the volumetric water content at Charlotte

7-Day forecasting model

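$$
\begin{aligned}
\text{7-day discharge (cfs)} &= 2^{B_2} - 1 - 2V_4(-V_{15})\\
B_2 &= -0.6205\left\{2^{B_3} - 1 - \left[\sin(-B_4) - V_{12}\right]^2\right\}\\
B_3 &= \left[\sin(-B_4) - V_{12}\right]^2 \cdot 2 - V_{15}\\
B_4 &= \left(2^{0.69816 B_5} - 1\right)\left(2 - V_{15}\right)\\
B_5 &= -1.9659 B_6 + V_{12} - 0.9 - 2V_{15}\sin(V_{15})\\
B_6 &= \sin(V_{15})(-V_{15}) - V_4
\end{aligned}
$$

where V4 is SST_Gulf/SST_Atlantic, V12 is the relative humidity at the Charlotte site, and V15 is the volumetric water content at the Charlotte site.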

3-Day forecasting model

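$$
\begin{aligned}
\text{3-day forecasting discharge} &= 2^{C_1} - 1.3729\\
C_1 &= 2^{C_2} - 1\\
C_2 &= 2^{C_3} - 1\\
C_3 &= 2^{C_4} - 1.3729\\
C_4 &= 2^{C_5} - 1\\
C_5 &= \left|\left|C_6\right| - C_7 + V_4 - V_{19}\right| - C_7\\
C_6 &= (-1.2029)\left[-C_8^{4} - V_{15}\right]^2 V_6\\
C_7 &= \left[(C_{12} - C_{11})C_{11}\right]\left[-C_8^{4} - V_{15}\right]^2\\
C_8 &= \cos^2(C_9) - \sin^2\left(\left|C_{10} - V_{15}\right|\right)\\
C_9 &= C_{11} + 0.3363V_0\left[(C_{12} - C_{11})C_{11}\right]\\
C_{10} &= 0.1932\left(2C_{13} + V_{18}\right)\\
C_{11} &= \sin\left[(C_{14} - V_{15})V_{20}\right]\\
C_{12} &= (C_{13} - C_{15})C_{15}\\
C_{13} &= \left[C_{19} - \sin(C_{16})\right]\sin(C_{16})\\
C_{14} &= -0.0291\sin(2C_{17})\\
C_{15} &= \sin\left(\left|C_{10} - V_{15}\right|\right)\\
C_{16} &= C_{18} - V_{15} - 0.0290\\
C_{17} &= \sin^2\left(\left|C_{10} - V_{15}\right|\right)\\
C_{18} &= V_5\cos\left[\sin(C_{20})\right] - 0.7150V_5 + V_4\\
C_{19} &= \frac{-\sin(C_{20})\sin(C_{20})}{\cos\left[\sin(C_{20})\right]}\\
C_{20} &= 2^{(-V_{12})(V_{20} + 1.4686)} - 1
\end{aligned}
$$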
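Forecasting models like the ones above are ranked with numerical evaluators. The summary reports the r-square and RMS error of the 30-day model, and the other two evaluators are not named in this excerpt. The following is a minimal sketch of these two measures. It assumes r-square means the squared Pearson correlation between observed and predicted discharge, which is an assumption; a coefficient of determination would be computed differently. The series are toy values.

```python
import math


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root-mean-square error, in the units of the discharge data."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


obs, pred = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]
print(rmse(obs, pred), r_squared(obs, pred))  # 1.0 1.0
```

The toy case shows why both measures are reported. A forecast with a constant bias has a perfect squared correlation but a non-zero RMS error.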

Conclusion

This paper introduced a new GP-based approach to streamflow forecasting that incorporates multidimensional datasets. Streamflow forecasting was improved by expanding the network of meteorological sensors to capture characteristics that were previously missing across the basin. These data include the historical streamflow at USGS stations, rainfall data at NEXRAD stations, sea surface temperatures at three buoy stations, and local meteorological data at three independent stations

References

  • J.P. Evans

    Improving the characteristics of streamflow modeled by regional climate models

    Journal of Hydrology

    (2003)
  • A.J. Jakeman et al.

    Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments

    Journal of Hydrology

    (1990)
  • K. Mohammadi et al.

    Estimation of an ARMA model for river flow forecasting using goal programming

    Journal of Hydrology

    (2006)
  • E.G. Neal et al.

    Linking the Pacific decadal oscillation to seasonal stream discharge patterns in Southeast Alaska

    Journal of Hydrology

    (2002)
  • I. Pulido-Calvo et al.

    Application of neural approaches to one-step daily flow forecasting in Portuguese watersheds

    Journal of Hydrology

    (2007)
  • C.C. Raible et al.

    Precipitation and northern hemisphere regimes

    Atmospheric Science Letters

    (2003)
  • G.B. Sahoo et al.

    Flow forecasting for a Hawaii stream using rating curves and neural networks

    Journal of Hydrology

    (2006)
  • S.Y. Schreider et al.

    Streamflow prediction for the Queanbeyan River at Tinderry, Australia

    Environment International

    (1995)
  • B.E. Vieux et al.

    Assessing urban hydrologic prediction accuracy through event reconstruction

    Journal of Hydrology

    (2004)
  • C.C. Wu et al.

    Grey input–output analysis and its application for environmental cost allocation

    European Journal of Operational Research

    (2003)
  • P. Young et al.

    Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments – comment

    Journal of Hydrology

    (1991)
  • M.B. Abbott

    An Introduction to the Method of Characteristics

    (1966)
  • F. Anctil et al.

    An exploration of artificial neural network rainfall-runoff forecasting combined with wavelet decomposition

    Journal of Environmental Engineering and Science

    (2004)
  • E.D. Andrews et al.

    Influence of ENSO on flood frequency along the California coast

    Journal of Climate

    (2003)
  • A.D. Azouz et al.

    Evidence of an active ENSO, PDO, and AMO...

    (2006)
  • M. Barlow et al.

    ENSO, Pacific decadal variability, and US summertime precipitation, drought, and streamflow

    Journal of Climate

    (2000)
  • M. Bender et al.

    Time series modeling for long-term stream flow forecasting

    Journal of Water Resources Planning and Management, ASCE

    (1994)
  • K.J. Beven

    Rainfall-Runoff Modelling: The Primer

    (2000)
  • D.R. Cayan et al.

    ENSO and hydrologic extremes in the western United States

    Journal of Climate

    (1998)
  • Center for Cave and Karst Studies

    Karst

    Western Kentucky University, USA...

    (2005)
  • F.J. Chang et al.

    Real-time recurrent learning neural network for stream-flow forecasting

    Hydrological Processes

    (2002)
  • L.C. Chang et al.

    A two-step-ahead recurrent neural network for stream-flow forecasting

    Hydrological Processes

    (2004)
  • N.B. Chang, A. Makkeasorn

    Optimal site selection of hydrological monitoring stations...

    (submitted for publication)
  • N.B. Chang et al.

    Prediction of PCDDs/PCDFs emissions from municipal incinerators by genetic programming and neural network modeling

    Waste Management & Research

    (2000)