Regression model for sediment transport problems using multi-gene symbolic genetic programming

https://doi.org/10.1016/j.compag.2014.02.010

Highlights

  • Develop regression models using multigene symbolic genetic programming.

  • Models have been tested on different efficiency criteria.

  • Good model performance was noted irrespective of dataset sizes.

  • Distribution fits are very close to each other for observed and predicted values.

Abstract

Sediment transport modeling problems are complex because they are multi-dimensional and their variables are nonlinearly interdependent. Moreover, in river hydraulics, phenomena are stochastic and variables are measured with unavoidable uncertainties. Dimensional and regression analyses have been employed in the past but have associated limitations. As a robust modeling tool, genetic programming was used to develop predictor models for three different but related problems of sediment transport: vegetated flow, incipient motion and total bed load prediction. Multi-gene symbolic regression, a relatively new development over conventional genetic programming, was used to model functional relationships that were able to generalize highly nonlinear variations in data as well as predict system behavior from independent input data in all three cases. The algorithmic parameters for the genetic programming technique were resolved iteratively and varied with the problem at hand. For all three models developed, model efficiency criteria were computed and presented, and the performance of the present models was compared with several past models for the same data points. The models developed herein were able to generalize the underlying relationships in the presented data and to predict values for unknown data with high accuracy.

Introduction

Sediment transport problems form an essential part of civil engineering practice, particularly for the river hydraulics challenges faced in the field. Solving sediment transport problems is indispensable in planning and managing water resources. However, the system parameters in these problems are numerous and exhibit complex, nonlinear interdependence. This complexity, combined with large spatiotemporal variations, makes the system difficult to treat analytically. Besides, the variables tend to assume values specific to geographies and climates. This compels one to make assumptions in an analysis that no longer hold when the model is applied to disparate regions. It has also prevented the development of universal models that offer satisfactory prediction capabilities irrespective of the environment of application. As an example of interdependence, sediment and vegetation complicate the flow and affect the velocity profile. This in turn affects the bed and wall shear stresses and vegetation shapes, causing further changes in sediment loads and velocity profiles. Some or all of the model variables are subject to sources of uncertainty, such as errors of measurement, absence of information and poor or partial understanding of the driving forces and mechanisms. This limits our confidence in the response of the model. Also, models may have to cope with the natural intrinsic variability of the system, such as the occurrence of stochastic events.

Almost all of the existing equations for sediment transport problems are empirical in nature due to such limitations. Regression and dimensional analyses have also been used extensively in the past, but these approaches have certain limitations that keep them from being used widely for field applications. Regression requires the functional form to be determined beforehand and is sensitive to the clustering effect of influential points and groups of points. Dimensional analysis is also inadequate because of the high number of variables and the problem of multiple forms of the same equation. In problems of river hydrology, the system often reflects a stochastic nature and the variables cannot be measured without uncertainty. It has therefore been realized that there is a need for developing new and robust models that can overcome the restrictions posed by the conventional techniques.

Soft computing is an emerging paradigm built on artificial intelligence, evolutionary/bio-inspired computing and probabilistic computing. These techniques allow the development of statistical black-box models based entirely on historical data. Soft computing has been employed extensively in hydrology and hydraulics with varying applications. Its suitability comes from the fact that it allows for uncertainties in measured values. This is critical in river hydraulics because of the unavoidable uncertainties in measuring data in the field and in experiments. The models developed are not expected to give 100% accurate results but rather to be tolerant to errors in measurement and offer overall better predictability. These “black-box” models are purely statistical, and their parameters are adjusted using training data so that they can give predictions for new, independent inputs. Primarily, soft computing techniques include artificial neural networks (ANN), fuzzy logic, genetic algorithms (GA), particle swarm optimization (PSO), etc. Several soft computing models have been developed in the past. Adib (2008) used ANN for determining water surface elevation in tidal rivers. Sediment load prediction was carried out by Altunkaynak (2009) using genetic algorithms. Goel and Pal (2009) used a support vector machine in scour prediction. GA was also used for parameter identification for modeling a river network by Tang et al. (2010). Kumar et al. (2010) used a radial basis function model to design an incipient channel with bed suction. Kumar and Rao (2010) used a metamodel to predict the friction factor in an alluvial channel. Application of neural networks and fuzzy logic models to long-shore sediment transport was carried out by Samani et al. (2011). Amirabdollahian et al. (2011) used a fuzzy genetic algorithm for optimal design of water networks. Kumar (2011) used an ANN model for friction factor prediction in an alluvial channel.
Krishna et al. (2012) used a wavelet neural network model for river flow time series. Kumar (2012) applied a soft computing technique for bed material load prediction. Ismail et al. (2013) applied a feed-forward neural network to predict bridge scour. Other recent relevant work in river hydraulics employing soft computing techniques includes that of Kisi and Hosseinzadeh (2012) for modeling the rainfall–runoff process, Kisi and Hosseinzadeh (2012) for suspended sediment modeling, and Shiri et al. (2012) for forecasting daily stream flow. Shiri and Kisi (2012) also estimated daily suspended sediment load using wavelet conjunction models. A comparative study was completed by Kisi and Shiri (2012) in river suspended sediment estimation by climatic variables implication, where various soft computing techniques were compared.

Genetic programming (GP), proposed by Koza (1992), views the modeling problem as one of program discovery. Genetic programming is a relatively new domain in soft computing and has gained popularity in a variety of applications, including those in river hydraulics and sediment dynamics in fluvial systems. Singh et al. (2007) applied neural network–genetic programming for sediment transport. Azamathulla et al. (2008) used genetic programming to predict ski-jump bucket spill-way scour. Aytek and Kişi (2008) attempted sediment modeling using a genetic programming approach. Kisi and Guven (2010) carried out suspended sediment concentration estimation using a machine code-based genetic programming. Chang et al. (2012) used linear genetic programming for discharge prediction in compound channels. Kisi and Hosseinzadeh (2012) developed suspended sediment models using genetic programming. The paradigm of genetic programming searches for the best program in a search space of programs by evolving generations of genetically bred and mutated populations of programs (mathematical expressions). Indeed, the modeling problem requires one to develop models which may well be an explicit function of the independent variables. In this, the approach of genetic programming differs from that of artificial neural networks, which do not present an explicit expression and instead use a number of network parameters to transform inputs to outputs. However, both ANN and GP produce black-box models which are not based on the underlying physics or the phenomena of the system but are purely statistical. Genetic programming is also different from conventional regression. Rather than finding the numeric coefficients of a predetermined functional form, as regression does, symbolic regression attempts to find a symbolic expression containing functions as well as independent variables and numeric coefficients. The method is also referred to as symbolic function identification.
The major difference lies in the fact that, unlike conventional regression, GP does not require predetermined functional forms. Instead, it accepts a library of operators (functions and variables) and evolves generations of expressions to ultimately reach the best expression. The term symbolic regression is used for any technique which fits the measured data using a suitable mathematical formula. GP employs a search heuristic in which the algorithm begins with randomized sets of expressions and creates new expressions in each generation (iteration) that perform better than those of the previous generation. Hence, the expressions are not calculated but generated from parent expressions using the genetic operators (mutation, crossover, etc.). The only calculations that take place are evaluations of expressions to assess their performance. This is done using model performance indicators (correlation coefficient, etc.) on the training data. The indicator helps to assess to what degree the model has been able to generalize the training dataset statistically. A good correlation coefficient, for example, would indicate a good generalization. These river hydraulics models are highly complex, and therefore their underlying relationships may be poorly understood. In such cases, the model can be viewed as a black box, i.e. the output is an opaque function of its inputs.
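The search loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function set, the terminal set, the use of mean squared error as the performance indicator, tournament selection, and every name and default value (`evolve`, `pop_size`, `max_depth`, the 0.8 crossover rate) are assumptions chosen for the example.

```python
import math
import operator
import random

def pdiv(a, b):
    """Protected division: returns 1.0 when the divisor is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

FUNCS = [operator.add, operator.sub, operator.mul, pdiv]  # assumed operator library
TERMINALS = ["x", 1.0, 2.0]                               # assumed variable and constants

def random_tree(rng, depth):
    """Random expression as nested tuples (func, left, right) or a terminal."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        func, left, right = tree
        return func(evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def paths(tree, prefix=()):
    """All node positions, as tuples of child indices (1 = left, 2 = right)."""
    yield prefix
    if isinstance(tree, tuple):
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(rng, a, b):
    """Insert a random subtree of b at a random position of a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(rng, tree):
    """Replace a random subtree with a freshly generated one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def mse(tree, data):
    """Performance indicator: mean squared error (non-finite results count as inf)."""
    total = 0.0
    for x, y in data:
        d = evaluate(tree, x) - y
        total += d * d
    err = total / len(data)
    return err if math.isfinite(err) else float("inf")

def evolve(data, pop_size=60, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, data))
        history.append(mse(pop[0], data))
        nxt = pop[:2]  # direct copy of the two best expressions
        while len(nxt) < pop_size:
            # tournament selection of two parents
            a, b = (min(rng.sample(pop, 3), key=lambda t: mse(t, data)) for _ in range(2))
            child = crossover(rng, a, b) if rng.random() < 0.8 else mutate(rng, a)
            nxt.append(child if depth(child) <= max_depth else a)
        pop = nxt
    best = min(pop, key=lambda t: mse(t, data))
    return best, history + [mse(best, data)]
```

Calling `evolve` on pairs sampled from, say, y = x² + x returns the best expression tree and the best error per generation. Because the two best expressions are copied unchanged into each new generation, that error never increases, which mirrors the statement that expressions are generated from parents and only evaluated, never calculated directly.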

The present attempt is aimed at suggesting a new and improved regression model for sediment transport problems, namely, a multi-gene symbolic regression model for three different but related phenomena in sediment transport: vegetated flow, incipient motion and total bed load prediction. Multi-gene symbolic regression uses GP to find (and not calculate) multiple sub-programs (individual genes) and finally regresses the coefficients of these sub-programs to reach the final expression. The models developed herein for all three cases were found to be better than existing models in terms of model performance criteria.
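The final regression step can be illustrated as follows. This sketch assumes that the genes are combined linearly with a bias term and that the coefficients are found by ordinary least squares through the normal equations; the source does not state which solver is used, and the two genes and the names `fit_gene_weights` and `predict` are hypothetical.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def fit_gene_weights(genes, X, y):
    """Least-squares weights for y ~ w0 + sum_i w_i * gene_i(x)."""
    G = [[1.0] + [g(*x) for g in genes] for x in X]  # bias column + one column per gene
    k = len(genes) + 1
    GtG = [[sum(row[i] * row[j] for row in G) for j in range(k)] for i in range(k)]
    Gty = [sum(row[i] * yi for row, yi in zip(G, y)) for i in range(k)]
    return solve(GtG, Gty)

def predict(genes, w, x):
    return w[0] + sum(wi * g(*x) for wi, g in zip(w[1:], genes))

# Two hypothetical genes, as GP might have evolved them (illustrative only).
genes = [lambda a, b: a * b, lambda a, b: a * a]
X = [(float(a), float(b)) for a in range(1, 5) for b in range(1, 4)]
y = [2.0 + 3.0 * a * b - 0.5 * a * a for a, b in X]
weights = fit_gene_weights(genes, X, y)
```

In this arrangement GP only has to discover the structure of each gene, while the weights that scale and combine the genes are obtained in closed form. For badly conditioned gene outputs, a QR- or SVD-based solver would be a safer choice than the normal equations used here.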

Section snippets

Sediment transport problems

Flow velocity prediction in vegetated channel flow, total bed load prediction and incipient motion prediction have been taken up in this study. The state of the art for each is discussed briefly in the subsections that follow.

Methodology

A similar methodology has been adopted in all three cases taken up to demonstrate the applicability of the genetic programming approach across sediment transport problems. The source of data used for modeling in each of the three cases has been stated. This is followed by a description of the adopted technique, namely, multi-gene symbolic regression. Finally, model performance was assessed through criteria such as correlation coefficient and index of agreement and finally comparing with past
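The two criteria named in this snippet can be computed as in the sketch below. It assumes the Pearson form of the correlation coefficient and Willmott's original form of the index of agreement, d = 1 − Σ(O − P)² / Σ(|P − Ō| + |O − Ō|)²; the snippet does not say which variants the authors used, and the function names are hypothetical.

```python
import math

def correlation_coefficient(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (1 = perfect match), assumed original form."""
    mo = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den
```

The two measures respond differently to bias: predictions that are exactly twice the observations still give a correlation coefficient of 1, whereas the index of agreement drops below 1 because it penalizes the systematic offset.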

Results and discussion

The parameters were varied through a range of values to detect the appropriate combination of parameters. Build method, which represents ways of initializing tree structures in the first generation, was varied through three possible configurations: ‘full’, ‘grow’ and ‘ramped half-and-half’. The maximum depth of an individual gene was varied between 2 and 8. The number of genes was varied between 1 and 50. The fractions of mutations, crossover and direct copy were varied with steps of 0.1 within
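The three build methods mentioned in this snippet can be sketched as follows. The function and terminal sets, the convention that a single leaf has depth 0, and the rule for when ‘grow’ stops early are assumptions made for illustration; the toolbox used by the authors may define them differently.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]  # assumed binary operators
TERMINALS = ["x1", "x2", "c"]     # assumed inputs and a constant placeholder

def full(rng, depth):
    """'full': every branch extends to exactly the requested depth."""
    if depth == 0:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS), full(rng, depth - 1), full(rng, depth - 1))

def grow(rng, depth):
    """'grow': branches may stop early, giving irregular tree shapes."""
    p_terminal = len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS))
    if depth == 0 or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS), grow(rng, depth - 1), grow(rng, depth - 1))

def ramped_half_and_half(rng, pop_size, max_depth, min_depth=1):
    """Cycle through depths min_depth..max_depth, alternating full and grow."""
    depths = range(min_depth, max_depth + 1)
    population = []
    for i in range(pop_size):
        d = depths[(i // 2) % len(depths)]  # each depth gets one full and one grow tree
        build = full if i % 2 == 0 else grow
        population.append(build(rng, d))
    return population

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

rng = random.Random(0)
population = ramped_half_and_half(rng, pop_size=20, max_depth=6)
```

‘full’ produces bushy trees of uniform depth, ‘grow’ produces a mix of shallow and deep trees, and ‘ramped half-and-half’ combines both over a range of depths so that the first generation is structurally diverse.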

Conclusion

Sediment transport problems are difficult to model analytically due to the difficulties posed by multidimensional and nonlinearly interdependent system variables. Moreover, graphical techniques and empirical methods proposed in the past for several sediment transport problems show inadequacy in terms of low predictability and agreement with actual data. Soft computing offers several computational techniques to develop efficient and robust models. Multi-gene symbolic regression was used to

Acknowledgements

The authors gratefully acknowledge the financial support received from the Department of Science and Technology, Govt. of India (SERC-DST: SR/S3/MERC/005/2010) to carry out the research work presented in this paper.

References (55)

  • Nash, J.E., et al., 1970. River flow forecasting through conceptual models. 1: Discussion of principles. J. Hydrol.
  • Paphitis, D., 2001. Sediment movement under unidirectional flows: an assessment of empirical threshold curves. Coast. Eng.
  • Tang, H., et al., 2010. Parameter identification for modeling river network using a genetic algorithm. J. Hydrodyn.
  • Acaroglu, E.R., 1968. Sediment transport in conveyance system. PhD Thesis, Cornell University, Ithaca, New York,...
  • Adib, A., 2008. Determining water surface elevation in tidal rivers by ANN. P. I. Civil Eng. – Wat. M.
  • Amirabdollahian, M., et al., 2011. Optimal design of water networks using fuzzy genetic algorithm. P. I. Civil Eng. – Wat. M.
  • Azamathulla, H., et al., 2008. Genetic programming to predict ski-jump bucket spill-way scour. J. Hydrodyn.
  • Bagnold, R.A., 1966. An Approach to the Sediment Transport Problem From General Physics, Professional Paper 422-I, U.S....
  • Baptist, M.J., et al., 2006. On inducing equations for vegetation resistance. J. Hydraulic Res.
  • Brownlie, W.R., 1981. Compilation of Alluvial Channel Data: Laboratory and field, Rep. No. KH-R-43B, California...
  • Buffington, J., 1999. The legend of A. F. Shields. J. Hydraul. Eng.
  • Cao, Z., et al., 2006. Explicit formulation of the Shields diagram for incipient motion of sediment. J. Hydraul. Eng.
  • Chang, C.K., et al., 2012. Appraisal of soft computing techniques in prediction of total bed material load in tropical rivers. J. Earth Syst. Sci.
  • Chien, N., et al., 1983. Mechanics of Sediment Movement.
  • Einstein, H.A., 1950. The Bed-load Function for Sediment Transportation in Open Channel Flows, Technical Bulletin 1026,...
  • Engelund, F., Hansen, E., 1967. A monograph for sediment transport in alluvial channel, report, Copenhagen,...
  • Galema, A., 2009. Evaluation of Vegetation Resistance Descriptors for Flood Management, Master Thesis, University of...