Regression model for sediment transport problems using multi-gene symbolic genetic programming

doi:10.1016/j.compag.2014.02.010

Computers and Electronics in Agriculture

Volume 103, April 2014, Pages 82-90

https://doi.org/10.1016/j.compag.2014.02.010

Highlights

• Develop regression models using multigene symbolic genetic programming.
• Models were evaluated on several efficiency criteria.
• Good model performance was observed irrespective of dataset size.
• Fitted distributions of observed and predicted values are very close to each other.

Abstract

Sediment transport modeling problems are complex because they are multi-dimensional and their variables are nonlinearly interdependent. Moreover, in river hydraulics, phenomena are stochastic and variables are measured with unavoidable uncertainties. Dimensional and regression analyses have been employed in the past but have associated limitations. Genetic programming, a robust modeling tool, was used to develop predictor models for three different but related sediment transport problems: vegetated flow, incipient motion and total bed load prediction. Multi-gene symbolic regression is a relatively new development over conventional genetic programming. It was used to model functional relationships that generalize highly nonlinear variations in the data and predict system behavior from independent input data in all three cases. The algorithmic parameters of the genetic programming technique were resolved iteratively and varied with the problem at hand. For all three models, model efficiency criteria were computed and reported. The performance of the present models was compared with that of several past models on the same data points. The models developed here generalized the underlying relationships in the presented data and predicted values for unseen data with high accuracy.

Introduction

Sediment transport problems form an essential part of civil engineering practice in addressing river hydraulics challenges faced in the field. Solving them is indispensable for planning and managing water resources. However, these problems involve many system parameters that exhibit complex, nonlinear interdependence. This complexity, combined with large spatiotemporal variations and inherent nonlinearity, makes the system difficult to analyze analytically. Besides, the variables tend to take values specific to particular geographies and climates. This forces assumptions that may no longer hold when the model is applied to disparate regions. It has also prevented the development of universal models that offer satisfactory predictions irrespective of the environment of application.

As an example of interdependence, sediment and vegetation complicate the flow and alter the velocity profile. This in turn affects the bed and wall shear stresses and the shapes of the vegetation, causing further changes in sediment loads and velocity profiles. Some or all of the model variables are subject to sources of uncertainty such as measurement errors, missing information, and poor or partial understanding of the driving forces and mechanisms. This limits our confidence in the response of the model. Models may also have to cope with the natural intrinsic variability of the system, such as the occurrence of stochastic events.

Owing to these limitations, almost all existing equations for sediment transport problems are empirical. Regression and dimensional analyses, in particular, have been used extensively in the past. However, they have limitations that keep them from being widely used in field applications. Regression requires the functional form to be determined beforehand and is affected by the clustering of influential points and groups of points. Dimensional analysis is also inadequate, because of the large number of variables and because the same equation can take multiple forms. In problems of river hydrology, the system often behaves stochastically and the variables cannot be measured without uncertainty. It has therefore been recognized that new, robust models are needed to overcome the restrictions posed by the conventional techniques.

Soft computing is an emerging paradigm built on artificial intelligence, evolutionary (bio-inspired) computing and probabilistic computing. These techniques allow statistical black-box models to be developed entirely from historical data. Soft computing has been employed extensively in hydrology and hydraulics for a variety of applications. Its suitability stems from its tolerance of uncertainty in measured values. This matters in river hydraulics, where data measured in the field and in experiments carry unavoidable uncertainties. The models developed are not expected to give 100% accurate results; rather, they should tolerate measurement errors and offer better overall predictability. These black-box models are purely statistical: their parameters are adjusted using training data so that they can predict outputs for new, independent inputs. The principal soft computing techniques include artificial neural networks (ANN), fuzzy logic, genetic algorithms (GA) and particle swarm optimization (PSO).

Several soft computing models have been developed in the past. Adib (2008) used ANN to determine water surface elevation in tidal rivers. Altunkaynak (2009) predicted sediment load using genetic algorithms. Goel and Pal (2009) used support vector machines for scour prediction. Tang et al. (2010) used GA for parameter identification in modeling a river network. Kumar et al. (2010) used a radial basis function model for the incipient motion design of sand-bed channels affected by bed suction. Kumar and Rao (2010) used a metamodeling approach to predict the friction factor in alluvial channels. Samani et al. (2011) applied neural network and fuzzy logic models to longshore sediment transport. Amirabdollahian et al. (2011) used a fuzzy genetic algorithm for the optimal design of water networks. Kumar (2011) used an ANN model to predict the friction factor in alluvial channels.
Krishna et al. (2012) used a wavelet neural network model for river flow time series. Kumar (2012) applied a soft computing technique to bed material load prediction. Ismail et al. (2013) applied a feed-forward neural network to predict bridge scour. Other recent work in river hydraulics employing soft computing techniques includes:

- Kisi and Hosseinzadeh (2012), for modeling the rainfall–runoff process.
- Kisi and Hosseinzadeh (2012), for suspended sediment modeling.
- Shiri et al. (2012), for forecasting daily streamflow.

Shiri and Kisi (2012) also estimated daily suspended sediment load using wavelet conjunction models. Kisi and Shiri (2012) carried out a comparative study of soft computing techniques for estimating river suspended sediment from climatic variables.

Genetic programming (GP), proposed by Koza (1992), views modeling as a problem of program discovery. GP is a relatively new domain in soft computing. It has gained popularity in a variety of applications, including river hydraulics and sediment dynamics in fluvial systems:

- Singh et al. (2007) applied neural network–genetic programming to sediment transport.
- Azamathulla et al. (2008) used genetic programming to predict ski-jump bucket spillway scour.
- Aytek and Kişi (2008) modeled suspended sediment using a genetic programming approach.
- Kisi and Guven (2010) estimated suspended sediment concentration using machine code-based genetic programming.
- Chang et al. (2012) used linear genetic programming for discharge prediction in compound channels.
- Kisi and Hosseinzadeh (2012) developed suspended sediment models using genetic programming.

GP searches a space of programs (mathematical expressions) for the best one by evolving successive generations of genetically bred and mutated populations. The modeling problem often calls for a model that is an explicit function of the independent variables. In this respect GP differs from artificial neural networks, which do not present an explicit expression. Instead, they use a number of network parameters to transform inputs into outputs. Nevertheless, both ANN and GP produce black-box models that are purely statistical rather than based on the underlying physics or phenomena of the system.

GP also differs from conventional regression. Regression finds numeric coefficients for a predetermined functional form. Symbolic regression instead searches for a symbolic expression containing functions as well as independent variables and numeric coefficients. The method is also referred to as symbolic function identification.
The major difference is that, unlike conventional regression, GP does not require a predetermined functional form. Instead, it accepts a library of functions and variables and evolves generations of expressions until it reaches the best expression. The term symbolic regression is used for any technique that fits measured data with a suitable mathematical formula. GP employs a search heuristic. The algorithm begins with a randomly generated set of expressions and, in each generation (iteration), creates new expressions intended to perform better than those of the previous generation. The expressions are therefore not calculated but generated from parent expressions by genetic operators (mutation, crossover, etc.). The only calculations performed are evaluations of the expressions to assess their performance. These use model performance indicators (correlation coefficient, etc.) on the training data. An indicator shows the degree to which the model has statistically generalized the training dataset; a high correlation coefficient, for example, would indicate good generalization. Because these river hydraulics models are highly complex, their underlying relationships may be poorly understood. In such cases the model can be viewed as a black box, i.e. the output is an opaque function of its inputs.
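
The generational loop described above can be illustrated with a minimal single-tree GP. This is a sketch under stated assumptions and not the authors' implementation. Everything here is hypothetical and chosen for brevity:

- the function set (add, sub, mul, protected div);
- tournament selection;
- the crossover, mutation and direct-copy rates;
- elitism.

The text names the correlation coefficient as a typical indicator. For simplicity, this sketch scores expressions by RMSE on the training data, which is an assumption. Expressions are nested tuples, so "calculation" only happens when a tree is evaluated for fitness.

```python
import random
import numpy as np

def protected_div(a, b):
    # Protected division (a common GP convention): returns 1.0 wherever |b| is tiny or NaN.
    return np.divide(a, b, out=np.ones_like(a), where=np.abs(b) > 1e-9)

# Hypothetical function library; terminals are variables ("x", i) and constants ("c", value).
FUNCS = {"add": np.add, "sub": np.subtract, "mul": np.multiply, "div": protected_div}

def random_tree(depth, n_vars, rng):
    """Random expression tree with at most `depth` function levels."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("c", round(rng.uniform(-5, 5), 2))
    name = rng.choice(list(FUNCS))
    return (name, random_tree(depth - 1, n_vars, rng), random_tree(depth - 1, n_vars, rng))

def evaluate(tree, X):
    """Evaluate a tree on every row of X (shape: n_samples x n_vars)."""
    if tree[0] == "x":
        return X[:, tree[1]].astype(float)
    if tree[0] == "c":
        return np.full(X.shape[0], tree[1], dtype=float)
    return FUNCS[tree[0]](evaluate(tree[1], X), evaluate(tree[2], X))

def depth(tree):
    return 0 if tree[0] not in FUNCS else 1 + max(depth(tree[1]), depth(tree[2]))

def paths(tree, path=()):
    """Yield the index path of every node (subtree) in the tree."""
    yield path
    if tree[0] in FUNCS:
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get_sub(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put_sub(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put_sub(tree[path[0]], path[1:], new)
    return tuple(parts)

def fitness(tree, X, y):
    """RMSE on the training data (lower is better); non-finite results score +inf."""
    with np.errstate(all="ignore"):
        err = np.sqrt(np.mean((evaluate(tree, X) - y) ** 2))
    return float(err) if np.isfinite(err) else float("inf")

def tournament(pop, fits, rng, k=3):
    winner = min(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[winner]

def evolve(X, y, pop_size=200, generations=30, max_depth=5,
           p_cross=0.8, p_mut=0.15, seed=0):
    rng = random.Random(seed)
    n_vars = X.shape[1]
    pop = [random_tree(rng.randint(1, max_depth), n_vars, rng) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(t, X, y) for t in pop]
        children = [pop[int(np.argmin(fits))]]  # elitism: best expression survives unchanged
        while len(children) < pop_size:
            parent = tournament(pop, fits, rng)
            r = rng.random()
            if r < p_cross:  # crossover: graft a random subtree from a second parent
                donor = tournament(pop, fits, rng)
                child = put_sub(parent, rng.choice(list(paths(parent))),
                                get_sub(donor, rng.choice(list(paths(donor)))))
            elif r < p_cross + p_mut:  # mutation: replace a random subtree
                child = put_sub(parent, rng.choice(list(paths(parent))),
                                random_tree(2, n_vars, rng))
            else:  # direct copy (reproduction)
                child = parent
            children.append(child if depth(child) <= max_depth else parent)
        pop = children
    fits = [fitness(t, X, y) for t in pop]
    best = int(np.argmin(fits))
    return pop[best], fits[best]
```

Calling `evolve(X, y)` returns the best tree found and its training RMSE. The tree is a nested tuple such as `("add", ("x", 0), ("c", 1.5))`. Because the best expression is always carried over, the best training error never gets worse from one generation to the next.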

The present study proposes a new and improved regression model for sediment transport problems: a multi-gene symbolic regression model. It is applied to three different but related sediment transport phenomena: vegetated flow, incipient motion and total bed load prediction. Multi-gene symbolic regression uses GP to find (not calculate) multiple sub-programs (individual genes). It then regresses the coefficients of these sub-programs to obtain the final expression. The models developed here for all three cases were found to be better than existing models in terms of model performance criteria.
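
The final regression step of a multi-gene model can be sketched as follows. The sketch assumes the genes have already been evolved by GP and evaluated on the training inputs. The overall model is then a weighted sum of the gene outputs plus a bias, with the weights obtained here by ordinary least squares (an assumption about the exact regression used). The two genes below (`x**2` and `sin(x)`) and the target are hypothetical and chosen only to show that the coefficients are recovered.

```python
import numpy as np

def multigene_fit(gene_outputs, y):
    """Least-squares weights for y ~ b0 + b1*g1 + ... + bk*gk.

    gene_outputs: array (n_samples, n_genes); column j holds gene j evaluated
    on the training inputs. Returns (bias b0, weights [b1..bk]).
    """
    G = np.column_stack([np.ones(len(y)), gene_outputs])  # prepend a bias column
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return coef[0], coef[1:]

def multigene_predict(gene_outputs, bias, weights):
    return bias + gene_outputs @ weights

# Hypothetical example: suppose GP has evolved two genes, g1 = x**2 and g2 = sin(x).
x = np.linspace(0.1, 3.0, 40)
genes = np.column_stack([x ** 2, np.sin(x)])
y = 1.5 + 2.0 * x ** 2 - 0.7 * np.sin(x)  # synthetic target, no noise

bias, weights = multigene_fit(genes, y)
print(round(float(bias), 3), np.round(weights, 3))
```

GP only has to discover the shape of each gene; the linear weights are solved directly. This is one reason multi-gene models can fit well with fairly shallow trees.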

Section snippets

Sediment transport problems

Three problems are taken up in this study: flow velocity prediction in vegetated channels, total bed load prediction and incipient motion prediction. The state of the art for each is discussed briefly in the subsections that follow.

Methodology

A similar methodology was adopted in all three cases to demonstrate the applicability of the genetic programming approach across sediment transport problems. The source of the data used for modeling in each case is stated. This is followed by a description of the adopted technique, multi-gene symbolic regression. Finally, model performance was assessed using criteria such as the correlation coefficient and the index of agreement, and by comparison with past
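
The performance criteria named above can be computed as in the sketch below. The index of agreement is written in Willmott's usual form. The snippet names only the correlation coefficient and the index of agreement. The Nash–Sutcliffe efficiency is added as an assumption, since Nash and Sutcliffe (1970) appears in the reference list. The exact set of criteria used in the paper is not confirmed here.

```python
import numpy as np

def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient r between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])

def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (1 = perfect agreement, 0 = none)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    o_bar = obs.mean()
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - o_bar) + np.abs(obs - o_bar)) ** 2)
    return float(1.0 - num / den)

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency (1 = perfect, 0 = no better than the observed mean)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [2.0, 3.0, 4.0, 5.0, 6.0]  # perfectly correlated but biased by +1
print(correlation_coefficient(obs, pred), index_of_agreement(obs, pred), nash_sutcliffe(obs, pred))
```

For the biased prediction above, r is 1. However, d = 1 - 5/45 ≈ 0.889 and NSE = 1 - 5/10 = 0.5. This illustrates why a high correlation coefficient alone does not guarantee agreement between observed and predicted values.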

Results and discussion

The parameters were varied over a range of values to find an appropriate combination. The build method determines how tree structures are initialized in the first generation. It was varied through three possible configurations: ‘full’, ‘grow’ and ‘ramped half-and-half’. The maximum depth of an individual gene was varied between 2 and 8. The number of genes was varied between 1 and 50. The fractions of mutation, crossover and direct copy were varied in steps of 0.1 within
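
The three build methods can be sketched as follows, following the usual Koza-style definitions:

- **‘full’** grows every branch to exactly the requested depth.
- **‘grow’** lets branches stop early at a terminal.
- **‘ramped half-and-half’** cycles through depths from a minimum up to the maximum gene depth, alternating between ‘full’ and ‘grow’.

The function and terminal names, the early-stop probability of 0.3 and the minimum depth of 2 are hypothetical.

```python
import random

FUNCTIONS = ["add", "sub", "mul", "div"]  # hypothetical binary function set
TERMINALS = ["x1", "x2", "const"]         # hypothetical terminal set

def full(depth, rng):
    """'full': every branch reaches exactly the requested depth."""
    if depth == 0:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS), full(depth - 1, rng), full(depth - 1, rng))

def grow(depth, rng, p_terminal=0.3):
    """'grow': a branch may stop early at a terminal, so depth <= requested depth."""
    if depth == 0 or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            grow(depth - 1, rng, p_terminal), grow(depth - 1, rng, p_terminal))

def tree_depth(tree):
    if isinstance(tree, str):  # a terminal has depth 0
        return 0
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))

def ramped_half_and_half(pop_size, max_depth, rng, min_depth=2):
    """Cycle depths min_depth..max_depth, alternating full and grow across cycles."""
    if max_depth < min_depth:
        raise ValueError("max_depth must be >= min_depth")
    depths = list(range(min_depth, max_depth + 1))
    pop = []
    for i in range(pop_size):
        d = depths[i % len(depths)]
        use_full = (i // len(depths)) % 2 == 0
        pop.append(full(d, rng) if use_full else grow(d, rng))
    return pop

rng = random.Random(0)
initial = ramped_half_and_half(pop_size=20, max_depth=6, rng=rng)
print(sorted(tree_depth(t) for t in initial))
```

Ramping the depth gives the first generation a mix of tree sizes and shapes. This is typically why it is a default choice when the best maximum gene depth is not known in advance.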

Conclusion

Sediment transport problems are difficult to model analytically because of their multidimensional and nonlinearly interdependent system variables. Moreover, graphical techniques and empirical methods proposed in the past for several sediment transport problems are inadequate, showing low predictability and poor agreement with actual data. Soft computing offers several computational techniques for developing efficient and robust models. Multi-gene symbolic regression was used to

Acknowledgements

The authors gratefully acknowledge the financial support received from the Department of Science and Technology, Govt. of India (SERC-DST: SR/S3/MERC/005/2010) to carry out the research work presented in this paper.

References (55)

A. Altunkaynak. Sediment load prediction by genetic algorithms. Adv. Eng. Softw. (2009)

A. Aytek et al. A genetic programming approach to suspended sediment modelling. J. Hydrol. (2008)

W. Erskine et al. Influence of riparian vegetation on channel widening and subsequent contraction on a sand-bed stream since European settlement: Widden Brook, Australia. Geomorphology (2012)

A. Goel et al. Application of support vector machines in scour prediction on grade-control structures. Eng. Appl. Artif. Intel. (2009)

A. Ismail et al. Predictions of bridge scour: application of a feed-forward neural network with an adaptive activation function. Eng. Appl. Artif. Intel. (2013)

Ö. Kisi et al. A machine code-based genetic programming for suspended sediment concentration estimation. Adv. Eng. Softw. (2010)

O. Kisi et al. Suspended sediment modeling using genetic programming and soft computing techniques. J. Hydrol. (2012)

O. Kisi et al. River suspended sediment estimation by climatic variables implication: Comparative study among soft computing techniques. Comp. Geosci. (2012)

B. Kumar et al. Metamodeling approach to predict friction factor in alluvial channel. Comput. Electron. Agr. (2010)

B. Kumar et al. Incipient motion design of sand bed channels affected by bed suction. Comput. Electron. Agr. (2010)

J.E. Nash et al. River flow forecasting through conceptual models. 1: Discussion of principles. J. Hydrol. (1970)

D. Paphitis. Sediment movement under unidirectional flows: an assessment of empirical threshold curves. Coast. Eng. (2001)

H. Tang et al. Parameter identification for modeling river network using a genetic algorithm. J. Hydrodyn. (2010)

Acaroglu, E.R., 1968. Sediment transport in conveyance system. PhD Thesis, Cornell University, Ithaca, New York,...

A. Adib. Determining water surface elevation in tidal rivers by ANN. P. I. Civil Eng. – Wat. M. (2008)

M. Amirabdollahian et al. Optimal design of water networks using fuzzy genetic algorithm. P. I. Civil Eng. – Wat. M. (2011)

H. Azamathulla et al. Genetic programming to predict ski-jump bucket spill-way scour. J. Hydrodyn. (2008)

Bagnold, R.A., 1966. An Approach to the Sediment Transport Problem From General Physics, Professional Paper 422-I, U.S....

M.J. Baptist et al. On inducing equations for vegetation resistance. J. Hydraul. Res. (2006)

Brownlie, W.R., 1981. Compilation of Alluvial Channel Data: Laboratory and field, Rep. No. KH-R-43B, California...

J. Buffington. The legend of A. F. Shields. J. Hydraul. Eng. (1999)

Z. Cao et al. Explicit formulation of the Shields diagram for incipient motion of sediment. J. Hydraul. Eng. (2006)

C.K. Chang et al. Appraisal of soft computing techniques in prediction of total bed material load in tropical rivers. J. Earth Syst. Sci. (2012)

N. Chien et al. Mechanics of Sediment Movement (1983)

Einstein, H.A., 1950. The Bed-load Function for Sediment Transportation in Open Channel Flows, Technical Bulletin 1026,...

Engelund, F., Hansen, E., 1967. A monograph for sediment transport in alluvial channel, report, Copenhagen,...

Galema, A., 2009. Evaluation of Vegetation Resistance Descriptors for Flood Management, Master Thesis, University of...

Cited by (20)

Multi-gen genetic programming based improved innovative model for extrapolation of wind data at high altitudes, case study: Turkey
2022, Computers and Electrical Engineering
Citation Excerpt :
The parameter values of the MGGP used in the creation of the functions for the first week of each month are shown in Table 3. In addition, Table 3 was organized according to Refs. [19,21]. All of the above steps were applied to all site points S2 S3, S4, S5 and S6 respectively.
Wind speed is the most important input of wind energy conversion systems and has higher values at high altitudes. Therefore, tall wind measurement masts are used in the wind power industry to determine the wind speed at high altitudes. However, this situation brings many engineering problems (cost escalation, de-erection and re-erection of the masts due to the failure of the anemometer and sensors, lightning strikes, mechanical failures etc.). In this study, it is aimed to estimate the data at the hub height levels of the proposed wind power generators by placing shorter wind masts as a suitable alternative for longer masts. Therefore, we proposed an innovative model that uses multigene genetic programming to estimate wind speed at high altitudes. According to the power and logarithmic law, analysis results show that root mean square error (RMSE) values were decreased with proposed method in the wind speed estimation, 58.62% and 58.77% respectively.
Multigene genetic programming and its various applications
2022, Handbook of HydroInformatics: Volume I: Classic Soft-Computing Techniques
This chapter reviews a modified version of genetic programming, so-called multigene genetic programming (MGGP). It is basically one of artificial intelligence models and capable of searching for a suitable relationship between any input and output sets of data, regardless of the physical background of the data. Since it has a flexible structure and powerful search engine, it has inevitable potential to be used for solving various problems in different field of research including water resources management. In this regard, this chapter aims to introduce this technique and its merits. First, it begins with genetic programming and its different versions. Afterward, multigene genetic programming and its main controlling parameters are introduced. A brief literature review on application of multigene genetic programming is presented, while some applications are suggested for using multigene genetic programming for future studies. Furthermore, the problem of developing stage-discharge rating curves is revisited, and a new MGGP-based model is proposed. Finally, the results indicate that MGGP-based models outperformed other methods considered in the comparative analysis.
The potential of hybrid evolutionary fuzzy intelligence model for suspended sediment concentration prediction
2019, Catena
Citation Excerpt :
In modeling sediment transport to be specific, various AI-based techniques have been employed. Among many approaches, couple models have shown noticeable progress in modeling sediment transport such as the traditional artificial neural network (Afan et al., 2014; Agarwal, 2009; Doğan et al., 2007; Huang et al., 2012; Nagy et al., 2002; Tfwala and Wang, 2016; Van Maanen et al., 2010), the theory of fuzzy logic (Bakhtyar et al., 2008; Dogan, 2005; Doğan et al., 2007; Kabiri-Samani et al., 2011; Mianaei and Keshavarzi, 2010), the application of support vector regression (Azamathulla et al., 2010; Buyukyildiz and Kumcu, 2017; Ebtehaj and Bonakdari, 2016; Batt, 2013; Kisi, 2012; Misra et al., 2009; Wei, 2009; Zounemat-Kermani et al., 2016), the employment of evolutionary computing (Altunkaynak, 2009; Aytek and Kişi, 2008; Jaiyeola, 2015; Kizhisseri et al., 2006; Kumar et al., 2014), and most recently the complementary model of wavelet-AI models (Ebtehaj et al., 2016; Goyal, 2014; Liu et al., 2013; Partal and Cigizoglu, 2008; Rajaee, 2011). Despite the extensive researches on the employment of soft computing models, scholars are still seeking for more robust, reliable and effective models that are able to mimic this complex stochastic problem of the suspended sediment transport.
Providing a robust and reliable prediction model for suspended sediment concentration (SSC) is an essential task for several environmental and geomorphology prospective including water quality, river bed engineering sustainability, and aquatic habitats. In this research, a novel hybrid intelligence approach based on evolutionary fuzzy (EF) approach is developed to predict river suspended sediment concentration. To demonstrate the modeling application, one of the highly affected rivers located in the north-western part of California is selected as a case study (i.e., Eel River). Eel River is considered as one of the most polluted river due to the streamside land sliding, owing to the highly stochastic water river discharge. Thus, the predictive model is constructed using discharge information as it is the main trigger for the SSC amount. The prediction conducted on different locations of the stream (i.e., up-stream and down-stream stations). Three different well-established integrative fuzzy models are developed for the validation purpose including adaptive neuro-fuzzy inference system coupled with subtractive clustering (ANFIS-SC), grid partition (ANFIS-GP), and fuzzy c-means (ANFIS-FCM) models. The predictive models evaluated based on several numerical indicators and two-dimension graphical diagram (i.e., Taylor diagram) that vividly exhibits the observed and predicted values. The attained results evidenced the predictability of the EF model for the SSC over the other models. The discharge information provided an excellent input attributes for the predictive models. In summary, the discovered model showed an outstanding data-intelligence model for the environmental perspective and particularly for Eel River. The methodology is highly qualified to be implemented as a real-time prediction model that can provide a brilliant approach for the river engineering sustainability.
Estimating incipient motion velocity of bed sediments using different data-driven methods
2018, Applied Soft Computing Journal
In the present research, the data-driven methods (DDMs), are used to estimate the threshold velocity of sediment motion. Results of the DDMs used in this research, including artificial neural networks (FFNN & RBNN), adaptive neuro-fuzzy inference system models (ANFIS, ANFIS-GA & ANFIS-IWO), and wavelet neural network (WaveNet), are compared with those of the mathematical models and experimental observations. The obtained results indicate that the WaveNet model with the Nash–Sutcliffe coefficient of 0.997 has better performance than the other methods. Moreover, in order to specify the relative importance of the input parameters for the uncertainty of the threshold velocity, sensitivity analysis is performed, the results of which indicate that the median diameter of the particles and relative density are the most important parameters affecting the threshold velocity, respectively. In addition, the Monte Carlo simulation is used to quantify the uncertainty of the threshold velocity of motion. The uncertainty is expressed using the coefficient of variation (CV). The highest amount of CV is related to the median diameter of grain size, therefore, this parameter has the maximum effect on variations of the incipient motion.
Suspended sediment concentration estimation by stacking the genetic programming and neuro-fuzzy predictions
2016, Applied Soft Computing Journal
Citation Excerpt :
Kisi and Shiri [43] trained gene expression programming by using rainfall data, streamflow, and sediment and then compared it with other methods such as neural network and neuro-fuzzy, showing that this method was more powerful than other methods. Kumar et al. [8] examined three basic issues (i.e., vegetative flow, incipient shear, and total bed load) in predicting the threshold of sediment by using multi-gene genetic programming. The results showed that the proposed method could clearly represent the nonlinear data.
In the new decade due to rich and dense water resources, it is vital to have an accurate and reliable sediment prediction and incorrect estimation of sediment rate has a huge negative effect on supplying drinking and agricultural water. For this reason, many studies have been conducted in order to improve the accuracy of prediction. In a wide range of these studies, various soft computing techniques have been used to predict the sediment. It is expected that combining the predictions obtained by these soft computing techniques can improve the prediction accuracy. Stacking method is a powerful machine learning technique to combine the prediction results of other methods intelligently through a meta-model based on cross validation. However, to the best of our knowledge, the stacking method has not been used to predict sediment or other hydrological parameters, so far. This study introduces stacking method to predict the suspended sediment. For this purpose, linear genetic programming and neuro-fuzzy methods are applied as two successful soft computing methods to predict the suspended sediment. Then, the accuracy of prediction is increased by combining their results with the meta-model of neural network based on cross validation. To evaluate the proposed method, two stations including Rio Valenciano and Quebrada Blanca, in the USA were selected as case studies and streamflow and suspended sediment concentration were defined as inputs to predict the daily suspended sediment. The obtained results demonstrated that the stacking method greatly improved RMSE and $R^{2}$ statistics for both stations compared to use of linear genetic programming or neuro-fuzzy solitarily.
Buckling Load Estimation Using Multiple Linear Regression Analysis and Multigene Genetic Programming Method in Cantilever Beams with Transverse Stiffeners
2023, Arabian Journal for Science and Engineering
