An improved gene expression programming model for streamflow forecasting in intermittent streams
Introduction
Accurate streamflow forecasting is an important task for a variety of issues in basin hydrology including (but not limited to) reservoir operation, irrigation planning, food production, flood damage mitigation and environmental protection. A number of models have been suggested to simulate this complex process, either conceptually or through data-driven methods (Aksoy and Bayazit, 2000, Wang et al., 2009, Yaseen et al., 2017). Intermittent streams are those that may occasionally experience dry spells. This is often the case in arid and semi-arid regions (Salas, 1993), particularly in the tributaries of mountainous rivers or snow-fed streams. Because of the paucity of gauging stations in mountainous regions, the commonly used rainfall-runoff approaches may not be applicable for forecasting streamflow in intermittent streams. In such situations, data-driven techniques could be implemented to model the streamflow time series if a continuous set of streamflow measurements is available. The evolved model could then be applied to neighbouring tributaries using regionalization techniques. In the recent literature, owing to advances in data-driven techniques, a number of cross-station, single-station, and successive-station monthly streamflow forecasting models have been developed and their successful results have been reported (Danandeh Mehr et al., 2013).
Gene expression programming (GEP) is a relatively new data-driven method that uses a population of individuals (programs), improves them according to fitness, and obtains the best solution using one or more genetic operators (Ferreira, 2001). The foremost differences between genetic programming (GP) and GEP algorithms reside mainly in the nature of their programs. In both, the expressed programs are nonlinear entities of different sizes and shapes. While programs are encoded as parse trees in GP, in GEP they are encoded as linear strings of fixed length (the chromosomes), which are afterwards expressed as expression trees. Details about GP and GEP are provided in Section 2.
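To make the distinction between the linear chromosome and the expressed tree concrete, the following is a minimal sketch of how a fixed-length GEP gene can be decoded breadth-first into an expression tree. It is an illustration only, not the implementation used in this study: the function set, the symbol `Q` for square root, and the helper names `decode`, `evaluate` and `to_infix` are assumptions chosen for the example.

```python
import math
import operator

# Assumed function set: symbol -> (arity, implementation).
# Any symbol not listed here is treated as a terminal (input variable).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "Q": (1, math.sqrt),  # square root
}


def decode(gene):
    """Decode a linear gene into a tree, filling it level by level.

    Returns the root node and the number of symbols actually used;
    symbols beyond that length are non-coding.
    """
    nodes = [(symbol, []) for symbol in gene]
    next_free = 1  # index of the next symbol not yet attached to a parent
    i = 0
    while i < next_free:
        symbol, children = nodes[i]
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        children.extend(nodes[next_free:next_free + arity])
        next_free += arity
        i += 1
    return nodes[0], next_free


def evaluate(node, inputs):
    """Evaluate a decoded tree for a dictionary of terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, inputs) for child in children))
    return inputs[symbol]


def to_infix(node):
    """Render a decoded tree as a readable infix string."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    return f"({to_infix(children[0])} {symbol} {to_infix(children[1])})"


# The gene "Q*+-abcd" expresses sqrt((a + b) * (c - d)).
root, coding_length = decode("Q*+-abcd")
print(to_infix(root), coding_length)
print(evaluate(root, {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0}))  # 3.0

# Same length class of string, but only the first three symbols are expressed.
short_root, used = decode("+ab*cd")
print(to_infix(short_root), used)
```

The second gene shows why fixed-length strings can still yield trees of different sizes: the unused tail is simply ignored at expression time, so genetic operators can modify the string freely without producing an invalid program.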
In recent years, different variants of GP such as GEP, multigene GP (MGGP), and linear GP (LGP) have been used for streamflow prediction (Babovic and Keijzer, 2002, Meshgi et al., 2015, Ravansalar et al., 2017). For example, Guven (2009) compared LGP with two versions of artificial neural networks (ANNs) to predict the daily streamflow of the Schuylkill River in the USA. The author demonstrated that LGP outperformed the ANNs. Danandeh Mehr et al. (2013) used LGP for monthly streamflow prediction between successive stations on the Çoruh River, a perennial river in Turkey, and showed that LGP is superior to a neuro-wavelet model. Shoaib et al. (2015) integrated a GEP model with a discrete wavelet transform pre-processing approach to predict streamflow using rainfall data. The main contribution of that study was the introduction of a novel wavelet-GEP model applicable over four watersheds. The wavelet transform was applied to the streamflow time series in order to extract their temporal and spectral information. The authors used the sequential time series approach to determine the input vector matrix on which the predictive model was built. The proposed wavelet-GEP model outperformed the individual GEP model in all case study catchments during both the training and testing phases. Using rainfall, potential evapotranspiration and streamflow data from the Moselle River basin in France, Danandeh Mehr and Demirel (2016) showed that MGGP can be used satisfactorily for one-day-ahead low flow prediction. More recently, Danandeh Mehr and Kahya (2017) developed a Pareto-optimal moving MGGP model for daily streamflow prediction and demonstrated that their hybrid model can overcome the timing error in time series analysis of daily streamflow.
Focusing on the implementation of GP/GEP in a wider range of hydrological studies, the author's review showed that these methods have frequently been used to distil knowledge from natural or experimental observations (e.g., Khu et al., 2001, Kisi et al., 2012b, Meshgi et al., 2014, Johari and Nejad, 2015, Danandeh Mehr, 2018). These techniques generate symbolic expressions that can be interpreted and combined with domain knowledge (Babovic, 2005, Babovic, 2009), which motivates their use in practice. Until recently, only a few studies had focused on the application of GEP to monthly streamflow forecasting. For example, Karimi et al. (2016) forecasted river flow at both daily and monthly time scales using a GEP model integrated with a wavelet data pre-processing approach at the Filyos River, a perennial river in the Mediterranean region of Turkey. For comparison purposes, a traditional autoregressive moving average model together with two other soft computing methods, ANNs and an adaptive neuro-fuzzy inference system, were used in the study. The authors showed that wavelet-GEP was superior to its counterparts. Al-Juboori and Guven (2016) developed a GEP-based stepwise monthly streamflow prediction model and demonstrated that their model forecasts monthly flows precisely at the perennial Hurman River in Turkey as well as at the Diyalah and Lesser Zab Rivers in Iraq.
Table 1 lists some of the studies that implemented at least one GP variant for time series modelling of streamflow data. As shown in the table, the papers of Karimi et al. (2016) and Al-Juboori and Guven (2016) deal with generating GEP-based monthly streamflow forecasting models for perennial rivers, whereas the present study focuses on calibrating GEP for intermittent rivers. The main difference between the methodology of this study and those of Karimi et al. (2016) and Al-Juboori and Guven (2016) is the inclusion of the seasonality effect, which is the major pattern in intermittent streamflow series, in the selection of potential predictors. Moreover, the present study puts forward a new strategy to enhance the accuracy of GEP forecasts.
On the other hand, documented studies on streamflow forecasting in intermittent rivers are quite limited owing to the complexity of time series modelling of intermittent flows (Kisi et al., 2012b). Although one might find a few studies that suggest the implementation of soft computing methods for intermittent streamflow forecasting (e.g., Cigizoglu, 2005, Kişi, 2009, Kisi et al., 2012b), to the best of the author's knowledge, the present study is the first in the literature to apply GEP to monthly streamflow forecasting in an intermittent stream. In light of the abovementioned literature, a new hybridization procedure is suggested in order to augment GEP prediction accuracy. In this procedure, the coefficients of the best GEP-induced expression are optimized through a genetic algorithm (GA). The proposed hybrid GEP-GA methodology is applied to single-station monthly streamflow forecasting at Shavir Creek, an intermittent stream located in the northwest of Iran. The efficiency results of the new model are compared with those of classic GP, standalone GEP, multi-linear regression (MLR) and hybrid GEP-linear regression (GEP-LR) models, all developed in the present study as benchmarks.
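The idea of tuning the coefficients of an already-evolved expression with a GA can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the present study: the expression in `model` is a hypothetical stand-in for a GEP-induced formula using the flows one and twelve months earlier, the samples are synthetic toy values, and the GA settings (tournament selection, blend crossover, Gaussian mutation, elitism) are arbitrary choices.

```python
import math
import random


def model(weights, q1, q12):
    # Hypothetical fixed structure standing in for a GEP-evolved expression;
    # q1 and q12 are the flows one and twelve months before the forecast month.
    return weights[0] * math.sqrt(q1 * q12) + weights[1] * q12


def rmse(weights, samples):
    squared = [(model(weights, q1, q12) - target) ** 2 for q1, q12, target in samples]
    return math.sqrt(sum(squared) / len(squared))


def tune_weights(samples, start, pop_size=30, generations=60, seed=0):
    """Real-coded GA that adjusts only the weights, never the structure."""
    rng = random.Random(seed)
    # The starting weights are kept in the initial population.
    population = [list(start)] + [
        [w + rng.gauss(0.0, 0.5) for w in start] for _ in range(pop_size - 1)
    ]

    def fitness(weights):
        return rmse(weights, samples)

    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = [population[0][:]]  # elitism: the best individual survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = min(rng.sample(population, 3), key=fitness)
            b = min(rng.sample(population, 3), key=fitness)
            # Blend crossover followed by occasional Gaussian mutation.
            alpha = rng.random()
            child = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
            child = [w + rng.gauss(0.0, 0.05) if rng.random() < 0.2 else w for w in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)


# Synthetic toy samples (not observed data), including dry months with zero flow.
pairs = [(0.0, 0.0), (0.4, 0.5), (1.2, 1.0), (2.0, 2.4), (0.9, 1.1), (0.0, 0.2), (1.6, 1.5), (0.3, 0.0)]
samples = [(q1, q12, model([0.6, 0.3], q1, q12)) for q1, q12 in pairs]

start = [1.0, 1.0]  # weights as they might come out of the structural search
best = tune_weights(samples, start)
print(rmse(start, samples), rmse(best, samples))
```

Because the starting weights are part of the initial population and the best individual is always carried over, the tuned expression can never fit the calibration samples worse than the starting one; whether it generalizes better still has to be checked on a separate testing period.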
Study area and data
The task of intermittent streamflow forecasting in arid and semi-arid regions is more complicated than in moist tropical and subtropical climates. A first-order tributary of Shavir stream, an intermittent stream in the Sefidrood River Basin, located in a semi-arid region in the northwest of Iran, was selected as the case study (Fig. 1). The stream catchment covers an area of approximately 55.5 km², which is about 0.03% of the territory of Ardabil Province, Iran. The stream springs from
Prediction scenario
Fig. 6 shows the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the streamflow time series for a lag range of 0–60 months. The figure includes the corresponding 95% confidence limits and exhibits a pronounced annual oscillating pattern (almost 12-month periodicity) in the ACF diagram. This means that monthly streamflow at the gauging station is more strongly correlated with its value in the same month of the previous year than with that of the previous month. In addition, the PACF graph shows that the serial correlation is very weak after two years. Therefore, 1-,
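The kind of ACF screening described above can be sketched in a few lines. The example below is an assumption-laden illustration: the series `flows` is synthetic (a repeated wet season followed by dry months), not the Shavir record, the function name `acf` is hypothetical, and the confidence bound uses the common white-noise approximation of 1.96 divided by the square root of the sample size. The PACF is not computed here.

```python
import math


def acf(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Synthetic ten-year monthly series: flow during part of the year, zero otherwise.
flows = [max(0.0, math.sin(2.0 * math.pi * month / 12.0)) for month in range(120)]

# Approximate 95% confidence bound for the ACF of an uncorrelated series.
bound = 1.96 / math.sqrt(len(flows))

for lag in (1, 6, 12, 24):
    value = acf(flows, lag)
    flag = "significant" if abs(value) > bound else "not significant"
    print(f"lag {lag:2d}: ACF = {value:+.3f} ({flag})")
```

For a strongly seasonal, intermittent series of this type, the correlation at lag 12 exceeds that at lag 1, which is the pattern that motivates including the flow of the same month in previous years among the candidate predictors.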
Conclusions
Classic GP and GEP have difficulty creating an appropriate model for intermittent streamflow forecasting. Using more complicated functions or increasing the runtime, the number of expressions, or the depth of genes does not necessarily improve their performance. On the contrary, these measures may lead GP/GEP to over-trained models after only a few generations. This paper proposed a novel hybrid method, GEP-GA, which embeds GA into GEP to enhance GEP performance by creating new gene weights that meet the GEP
Acknowledgments
The streamflow data used in this research were provided by Iran Water Resource Management Company (www.wrm.ir). The author would also like to thank the reviewers for their constructive comments on the manuscript.
References (55)
A split-step particle swarm optimization algorithm in river stage forecasting. J. Hydrol. (2007)
A comparative study of population-based optimization algorithms for downstream river flow forecasting by a hybrid neural network model. Eng. Appl. Artif. Intell. (2015)
Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique. J. Hydrol. (2013)
A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction. J. Hydrol. (2017)
Chaos-based multigene genetic programming: a new hybrid strategy for river flow forecasting. J. Hydrol. (2018)
Suspended sediment modeling using genetic programming and soft computing techniques. J. Hydrol. (2012)
An empirical method for approximating stream baseflow time series using groundwater table fluctuations. J. Hydrol. (2014)
Development of a modular streamflow model to quantify runoff contributions from different land use types in tropical urban environments using genetic programming. J. Hydrol. (2015)
Wavelet-linear genetic programming: a new approach for modeling monthly streamflow. J. Hydrol. (2017)
Event-based stormwater management pond runoff temperature model. J. Hydrol. (2016)
Runoff forecasting using hybrid wavelet gene expression programming (WGEP) approach. J. Hydrol.
Selection of significant input variables for time series forecasting. Environ. Modell. Software
A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol.
Novel approach for streamflow forecasting using a hybrid ANFIS-FFA model. J. Hydrol.
A daily intermittent streamflow simulator. Turk. J. Eng. Environ. Sci.
A stepwise model to predict monthly streamflow. J. Hydrol.
Predicting longitudinal dispersion coefficient using ANN with metaheuristic training algorithms. Int. J. Environ. Sci. Technol.
Emergence, evolution, intelligence; hydroinformatics: a study of distributed and decentralised computing using intelligent agents. IHE Delft Inst. Water Educ.
Data mining in hydrology. Hydrol. Process.
Introducing knowledge into learning based on genetic programming. J. Hydroinf.
The evolution of equations from hydraulic data Part I: Theory. J. Hydraul. Res.
The evolution of equations from hydraulic data Part II: Applications. J. Hydraul. Res.
Genetic programming as a model induction engine. J. Hydroinf.
Rainfall runoff modelling based on genetic programming. Nord. Hydrol.
Use of meta-heuristic techniques in rainfall-runoff modelling. Water
Application of generalized regression neural networks to intermittent flow forecasting and estimation. J. Hydrol. Eng.