A Pareto-optimal moving average-multigene genetic programming model for rainfall-runoff modelling

https://doi.org/10.1016/j.envsoft.2017.03.004

Highlights

  • We developed a Pareto-optimal model integrating moving average with multigene genetic programming for runoff prediction.

  • The new model is explicit and less complex than stand-alone genetic programming models.

  • Lagged prediction is the main drawback of stand-alone genetic programming models.

Abstract

The effectiveness of genetic programming (GP) in rainfall-runoff modelling has been recognized in recent studies. However, GP may produce misleading estimations if the autoregressive relationship between runoff and its antecedent values is not carefully considered. In addition, GP evolves alternative models of differing accuracy and complexity, and selecting a parsimonious model from among them requires extra attention. To cope with these problems, this paper proposes a new hybrid model that integrates moving average filtering with multigene GP and uses a Pareto-front plot to optimize the evolved models through an interactive complexity-efficiency trade-off. The model was applied to develop single- and multi-day-ahead rainfall-runoff models and compared with stand-alone GP, multigene GP, and multilayer perceptron benchmarks. The results indicated that the new model provides substantial improvements relative to the benchmarks, with prediction errors 25–60% lower and timing accuracy 80–760% higher. Moreover, it is explicit and parsimonious, which makes it attractive for practical use.

Introduction

It is well documented that the transformation of rainfall into runoff is an extremely complex process that can be difficult to fully understand and represent (Hsu et al., 1995, Humphrey et al., 2016). This is mainly due to the non-stationary nature of the phenomenon (including possible trends, seasonality, and jumps) and the highly nonlinear relationship between streamflow and its driving variables (Cannas et al., 2006, Nourani et al., 2011). One common way to represent such a complex process is to use artificial intelligence (AI) methods such as artificial neural networks (ANN; Tokar and Johnson, 1999), fuzzy logic (Hundecha et al., 2001), support vector machines (Bray and Han, 2004), and genetic programming (GP; Savic et al., 1999). Because these methods are typically trained on historical rainfall and streamflow data, a sound knowledge of the underlying physical processes is not a prerequisite.

For the purpose of improving prediction accuracy in AI-based rainfall–runoff models, a number of data pre-processing approaches (with varying degrees of effectiveness) have been used in recent studies, including principal component analysis (PCA; e.g., Hu et al., 2007), wavelet transform (WA; e.g., Nourani et al., 2009, Nourani et al., 2011), moving average (MA; e.g., de Vos and Rientjes, 2005, Wu et al., 2009), and singular spectrum analysis (SSA; e.g., Wu and Chau, 2011), among others. For example, Hu et al. (2007) used PCA to improve the predictive accuracy of an ANN in the Darong River watershed, China. The authors showed that extracting the principal components from lagged hydro-meteorological input data can provide a generally better representation of the rainfall–runoff process in the watershed. Using rainfall-runoff data from two watersheds in China, Wu and Chau (2011) demonstrated that the SSA approach can be effectively used to eliminate the lagged prediction effect of stand-alone ANN models. Nourani et al. (2009) developed a hybrid wavelet–ANN (WANN) model to simulate the rainfall–runoff process in the Lighvanchai Watershed (Iran) and compared the results with those of a stand-alone ANN. The discrete wavelet transform was used to decompose and reconstruct the rainfall and runoff data, and the ANN was applied for prediction. The authors demonstrated that the hybrid WANN model produces more accurate results than the ANN, especially for the peaks of the streamflow time series. The same hybrid methodology was later applied by Pramanik et al. (2010) and Nourani et al. (2013a) to multi-day-ahead streamflow forecasting. Confirming the results of previous studies, the authors reported that wavelet decomposition in conjunction with ANN can overcome the drawbacks of the individual model.
In another study, de Vos and Rientjes (2005) pointed out that rainfall–runoff modelling in a watershed can be contaminated by a lagged prediction effect because of the dominant autoregressive relationship between runoff and its antecedent values used as ANN model inputs. The authors found a stand-alone ANN to be unsuitable for rainfall-runoff modelling with short lead times. To overcome this problem, they applied MA data pre-processing to the antecedent rainfall and runoff data and demonstrated that the resulting hybrid MA-ANN model is superior to the conventional neural network. Inspired by this study, Wu et al. (2009) developed a similar hybrid model for daily streamflow prediction and reported successful application of MA data pre-processing.
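The MA pre-processing step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of de Vos and Rientjes (2005) or Wu et al. (2009): it assumes a trailing (backward-looking) moving average so that no future information enters the inputs, applies the filter to the antecedent rainfall and runoff inputs only, and keeps the forecast target as the unfiltered flow. The function names `moving_average` and `build_dataset`, the window length, and the toy series are hypothetical.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average (assumed filter form).

    Element j of the result is the mean of series[j : j + window], so it is
    aligned with original time step t = j + window - 1 and uses only values
    up to and including t (no future information).
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[j:j + window]) / window
            for j in range(len(series) - window + 1)]


def build_dataset(q: Sequence[float], rain: Sequence[float], window: int,
                  max_lag: int, horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Build MA-filtered lagged inputs paired with raw-flow targets.

    Each row holds [Qma_t, Qma_{t-1}, ..., Qma_{t-max_lag},
                    Ima_t, Ima_{t-1}, ..., Ima_{t-max_lag}]
    and its target is the unfiltered flow Q_{t+horizon}.
    """
    if len(q) != len(rain):
        raise ValueError("q and rain must have the same length")
    q_ma = moving_average(q, window)
    r_ma = moving_average(rain, window)
    offset = window - 1  # q_ma[j] corresponds to original time step j + offset
    X, y = [], []
    for j in range(max_lag, len(q_ma) - horizon):
        row = [q_ma[j - k] for k in range(max_lag + 1)]   # filtered antecedent runoff
        row += [r_ma[j - k] for k in range(max_lag + 1)]  # filtered antecedent rainfall
        X.append(row)
        y.append(q[j + offset + horizon])                 # raw future runoff
    return X, y


if __name__ == "__main__":
    flow = [1, 2, 3, 4, 5, 6, 7, 8]      # toy daily runoff
    rainfall = [0, 1, 0, 2, 0, 3, 0, 4]  # toy daily rainfall
    X, y = build_dataset(flow, rainfall, window=3, max_lag=1, horizon=1)
    print(y)  # [5, 6, 7, 8]
```

A trailing window is used because a centred window would average in values after time t, which are not available at forecast time; whether the target should also be filtered is a separate modelling choice.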

Despite providing satisfactory prediction results, none of the above-mentioned hybrid models (i.e., either a model with a data pre-processing step or a combination of at least two models) provides an explicit formulation of the rainfall-runoff process. In addition, these models are too complex to be attractive in practice. To address this point, GP-based rainfall-runoff models have been suggested in recent hydrological studies. A pioneering study on the application of GP to rainfall-runoff modelling was carried out by Savic et al. (1999). Since then, a variety of studies have investigated the capability of GP and its advancements in rainfall-runoff modelling (e.g., Whigham and Crapper, 2001, Babovic and Keijzer, 2000, Dorado et al., 2003, Muttil and Liong, 2004, Jayawardena et al., 2006, Aytek and Alp, 2008, Nourani et al., 2013b, Danandeh Mehr and Demirel, 2016). It is worth mentioning that a GP model (like an ANN) may produce misleading estimations if non-stationary features and noise are not carefully taken into account (Nourani et al., 2012, Shoaib et al., 2015). Consequently, further studies are still required to develop models that are not only explicit but also precise. To this end, the effect of various data pre-processing approaches on GP can be investigated. Our review of the application of GP in environmental modelling indicated that only a few researchers have investigated the drawbacks of stand-alone GP, particularly when facing highly non-stationary rainfall-runoff time series. For instance, Nourani et al. (2012) combined the power of wavelet transform with GP and ANN techniques to model daily and monthly streamflow processes in two small watersheds in Iran. To cope with the non-stationary features of the process, historical rainfall-runoff time series were decomposed into different components using various mother wavelets, and hybrid GP and ANN models were then constructed and compared with stand-alone ANN and GP.
Overall, the hybrid models outperformed their counterparts and were able to capture both short- and long-term patterns owing to the multi-scale input time series. The authors also indicated that, while wavelet decomposition may improve the performance of stand-alone GP and ANN, GP can be used to reduce the number of input variables for hybrid WANN modelling. More recently, in order to develop an accurate and reliable GP model, Shoaib et al. (2015) developed a hybrid wavelet gene expression programming (WGEP) model to forecast runoff from rainfall data in four catchments located in different hydro-climatic regions. The authors demonstrated that the WGEP model is superior to stand-alone GEP in all the catchments (see Nourani et al., 2014 and Yaseen et al., 2015 for reviews of wavelet-AI (WAI) models in hydrology and of AI-based streamflow models, respectively).

The main goal of this paper is to propose a new hybrid rainfall-runoff model based on a recent advancement of GP known as multigene GP (MGGP) to produce more accurate and explicit forecasts of daily streamflow. This goal is motivated by the importance of accurate and reliable single- and multi-day-ahead streamflow forecasts for several water resources engineering applications, such as real-time flood warning and reservoir operation systems, as well as by the limitations (i.e., low accuracy and high complexity) of current AI-based models for multi-day-ahead streamflow forecasting. Furthermore, the present study explores two typical issues in the development of stand-alone GP-based rainfall-runoff models: the lagged prediction effect and the complexity of the evolved models. Consequently, particular attention is given to the data pre-processing procedure and to developing a parsimonious (accurate and simple) model, since such a model is more attractive for practical use. To develop a parsimonious model, the Pareto-front idea from the multi-objective optimization community (e.g., Creaco and Pezzinga, 2015; Taormina and Chau, 2015) is used in this study. Although MGGP has been applied to low-flow prediction in a recent study by Danandeh Mehr and Demirel (2016), to our knowledge the potential of stand-alone and hybrid MGGP for multi-day-ahead streamflow forecasting has never been explored.
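The Pareto-front idea mentioned above can be illustrated with a minimal Python sketch, assuming each evolved candidate model is summarized by two objectives to be minimized: expression complexity (for example, the total node count of its genes) and a validation error such as RMSE. The `Candidate` type, the `pick_parsimonious` tolerance rule, and the numbers in the example are hypothetical; in the proposed approach the final choice is made interactively from the Pareto-front plot rather than by a fixed rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    complexity: int  # e.g. total node count over all genes (lower = simpler)
    error: float     # e.g. validation RMSE (lower = more accurate)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return (a.complexity <= b.complexity and a.error <= b.error
            and (a.complexity < b.complexity or a.error < b.error))


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Non-dominated candidates, sorted from simplest to most complex."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: (c.complexity, c.error))


def pick_parsimonious(front: List[Candidate], rel_tol: float) -> Candidate:
    """Simplest front member whose error is within rel_tol of the best error.

    Hypothetical stand-in for the interactive choice made on a Pareto plot.
    """
    if not front:
        raise ValueError("front is empty")
    if rel_tol < 0:
        raise ValueError("rel_tol must be non-negative")
    best = min(c.error for c in front)
    eligible = [c for c in front if c.error <= best * (1 + rel_tol)]
    return min(eligible, key=lambda c: (c.complexity, c.error))


# Hypothetical population summary (not results from this study)
candidates = [
    Candidate("A", 10, 0.50),
    Candidate("B", 20, 0.40),
    Candidate("C", 25, 0.45),  # dominated by B
    Candidate("D", 40, 0.38),
    Candidate("E", 80, 0.37),
    Candidate("F", 90, 0.39),  # dominated by D and E
]

if __name__ == "__main__":
    front = pareto_front(candidates)
    print([c.name for c in front])              # ['A', 'B', 'D', 'E']
    print(pick_parsimonious(front, 0.05).name)  # D
```

Setting `rel_tol` to zero returns the most accurate (and here most complex) model, while larger values favour simpler expressions, mirroring the complexity-efficiency trade-off.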

Section snippets

Study area and observational data

The proposed hybrid rainfall-runoff model is trained and verified for the unregulated portion of the Haldizen Watershed in Trabzon Province, Turkey. According to the Köppen-Geiger climate classification, the province has a borderline oceanic/humid subtropical climate with warm summers and cool winters. Mean annual rainfall in the province is about 810 mm, with the maximum typically occurring in the spring months (March to May). The watershed area at the point of stream/rain gage (Şerah Station; 40°37′20″

Development of prediction scenarios and benchmark models

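Prior to developing the proposed Pareto-optimal MA-MGGP model, the ability of stand-alone GP, MGGP, and MLP techniques to model the rainfall-runoff process in the study area is investigated. These models, which serve as benchmarks in this study, comprise both single-day-ahead (Eq. (5)) and multi-day-ahead (Eqs. (6) and (7)) forecasting schemes:

$$Q_{t+1} = f\left(Q_t, Q_{t-1}, \ldots, Q_{t-i}, \ldots, Q_{t-6}, I_t, I_{t-1}, \ldots, I_{t-i}, \ldots, I_{t-6}\right) + \varepsilon(t) \tag{5}$$

$$Q_{t+2} = f\left(Q_t, Q_{t-1}, \ldots, Q_{t-i}, \ldots, Q_{t-6}, I_t, I_{t-1}, \ldots, I_{t-i}, \ldots, I_{t-6}\right) + \varepsilon(t) \tag{6}$$

$$Q_{t+3} = f\left(Q_t, Q_{t-1}, \ldots, Q_{t-i}, \ldots, Q_{t-6}, I_t, I_{t-1}, \ldots, I_{t-i}, \ldots, I_{t-6}\right) + \varepsilon(t) \tag{7}$$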

Concluding remarks

This paper proposed a new rainfall-runoff model, namely the Pareto-optimal MA-MGGP model, for short-range streamflow prediction. It is an explicit hybrid model that integrates the MA data pre-processing approach with a multigene GP engine to enhance the prediction precision of stand-alone GP-based models. In the proposed model, MA-filtered data are fed into the MGGP system, and a Pareto-front plot of the best population is then used to choose a parsimonious (accurate and simple) model. The model was

Acknowledgements

This research was carried out as part of an ongoing postdoctoral research project at the University of Tabriz, funded by Iran's National Elites Foundation (BMN). The authors gratefully acknowledge the Technology Affairs office of the University of Tabriz for their tremendous help during the research. The authors also thank the Turkish State Hydraulic Works and the Turkish State Meteorological Service for providing the data used in this study. The authors are also grateful to three anonymous reviewers for their

References (59)

  • V. Nourani et al.

    Applications of hybrid wavelet–Artificial Intelligence models in hydrology: a review

    J. Hydrol.

    (2014)
  • V. Nourani et al.

    Two hybrid Artificial Intelligence approaches for modeling rainfall–runoff process

    J. Hydrol.

    (2011)
  • M. Ravansalar et al.

    A wavelet-linear genetic programming model for sodium (Na+) concentration forecasting in rivers

    J. Hydrol.

    (2016)
  • A.M. Sattar et al.

    Gene expression models for prediction of longitudinal dispersion coefficient in streams

    J. Hydrol.

    (2015)
  • M. Shoaib et al.

    Runoff forecasting using hybrid wavelet gene expression programming (WGEP) approach

    J. Hydrol.

    (2015)
  • R. Taormina et al.

    An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models

    J. Hydrol.

    (2016)
  • P.A. Whigham et al.

    Modeling rainfall–runoff using genetic programming

    Math. Comput. Model.

    (2001)
  • C.L. Wu et al.

    Methods to improve neural network performance in daily flows prediction

    J. Hydrol.

    (2009)
  • C.L. Wu et al.

    Data-driven models for monthly streamflow time series prediction

    Eng. Appl. Artif. Intel.

    (2010)
  • C.L. Wu et al.

    Rainfall–runoff modeling using artificial neural network coupled with singular spectrum analysis

    J. Hydrol.

    (2011)
  • C.L. Wu et al.

    Prediction of rainfall time series using modular soft computing methods

    Eng. Appl. Artif. Intel.

    (2013)
  • Z.M. Yaseen et al.

    Artificial intelligence based models for stream-flow forecasting: 2000–2015

    J. Hydrol.

    (2015)
  • T. Zerenner et al.

    Downscaling near-surface atmospheric fields with multi-objective genetic programming

    Environ. Modell. Softw.

    (2016)
  • C.R. Zorn et al.

    Peak flood estimation using gene expression programming

    J. Hydrol.

    (2015)
  • ASCE Task Committee on Application of Artificial Neural Networks in Hydrology

    Artificial neural networks in hydrology 2: hydrologic applications

    J. Hydrol. Eng.

    (2000)
  • A. Aytek et al.

    An application of artificial intelligence for rainfall–runoff modeling

    J. Earth Syst. Sci.

    (2008)
  • V. Babovic et al.

    Genetic programming as a model induction engine

    J. Hydroinform.

    (2000)
  • M. Brameier et al.

    Linear Genetic Programming

    (2007)
  • M. Bray et al.

    Identification of support vector machines for runoff modelling

    J. Hydroinform.

    (2004)