Regression model for sediment transport problems using multi-gene symbolic genetic programming
Introduction
Sediment transport problems form an essential part of civil engineering practice, particularly for the river hydraulics challenges faced in the field. Solving them is indispensable in planning and managing water resources. However, the system parameters in these problems are numerous and exhibit complex, nonlinear interdependence. This complexity, combined with large spatiotemporal variations and inherent nonlinearity, makes the system difficult to treat analytically. Besides, the variables tend to assume values specific to geographies and climates. This compels the analyst to make assumptions that no longer hold when the model is applied to a different region, and it has prevented the development of universal models that predict satisfactorily irrespective of the environment of application. As an example of interdependence, sediment and vegetation complicate the flow and alter the velocity profile. This in turn affects the bed and wall shear stresses and the vegetation shapes, causing further changes in sediment loads and velocity profiles. Some or all of the model variables are subject to sources of uncertainty, such as measurement errors, missing information and poor or partial understanding of the driving forces and mechanisms. This limits the confidence that can be placed in the response of the model. Models may also have to cope with the natural intrinsic variability of the system, such as the occurrence of stochastic events.
Because of these limitations, almost all existing equations for sediment transport problems are empirical in nature. Regression and dimensional analyses have been used extensively in the past, but both have limitations that keep them from being used widely for field applications. Regression requires the functional form to be determined beforehand and is sensitive to the clustering effect of influential points and groups of points. Dimensional analysis is inadequate because of the high number of variables and the problem of multiple forms of the same equation. In problems of river hydrology, the system often reflects a stochastic nature and the variables cannot be measured without uncertainty. It has therefore been realized that new and robust models are needed that can overcome the restrictions posed by the conventional techniques.
Soft computing is an emerging paradigm built on artificial intelligence, evolutionary/bio-inspired computing and probabilistic computing. These techniques allow the development of statistical black-box models based entirely on historical data. Soft computing has been employed extensively in hydrology and hydraulics for a variety of applications. Its suitability comes from the fact that it allows for uncertainties in measured values. This is critical in river hydraulics because uncertainties are unavoidable when measuring data in the field and in experiments. The models developed are not expected to give 100% accurate results but rather to tolerate errors in measurement and offer better overall predictability. These “black-box” models are purely statistical: their parameters are adjusted on training data so that the model can give predictions for new, independent inputs. Soft computing techniques primarily include artificial neural networks (ANN), fuzzy logic, genetic algorithms (GA), particle swarm optimization (PSO), etc. Several soft computing models have been developed in the past. Adib (2008) used ANN for determining water surface elevation in tidal rivers. Sediment load prediction was carried out by Altunkaynak (2009) using genetic algorithms. Goel and Pal (2009) used support vector machines in scour prediction. GA was also used for parameter identification in river network modeling by Tang et al. (2010). Kumar et al. (2010) used a radial basis function model for the incipient motion design of channels affected by bed suction. Kumar and Rao (2010) used a metamodel to predict the friction factor in alluvial channels. Neural network and fuzzy logic models were applied to long-shore sediment transport by Samani et al. (2011). Amirabdollahian et al. (2011) used a fuzzy genetic algorithm for the optimal design of water networks. Kumar (2011) used an ANN model for friction factor prediction in alluvial channels.
Krishna et al. (2012) used a wavelet neural network model for river flow time series. Kumar (2012) applied soft computing techniques to bed material load prediction. Ismail et al. (2013) applied a feed-forward neural network to predict bridge scour. Other recent relevant work in river hydraulics employing soft computing techniques includes that of Kisi and Hosseinzadeh (2012) for modeling the rainfall–runoff process, Kisi and Hosseinzadeh (2012) for suspended sediment modeling, and Shiri et al. (2012) for forecasting daily stream flow. Shiri and Kisi (2012) also estimated daily suspended sediment load using wavelet conjunction models. Kisi and Shiri (2012) compared various soft computing techniques for estimating river suspended sediment from climatic variables.
Genetic programming (GP), proposed in Koza (1992), views the modeling problem as one of program discovery. It is a relatively new domain in soft computing and has gained popularity in a variety of applications, including river hydraulics and sediment dynamics in fluvial systems. Singh et al. (2007) applied neural network–genetic programming to sediment transport. Azamathulla et al. (2008) used genetic programming to predict ski-jump bucket spillway scour. Aytek and Kişi (2008) attempted sediment modeling using a genetic programming approach. Kisi and Guven (2010) estimated suspended sediment concentration using a machine code-based genetic programming. Chang et al. (2012) used linear genetic programming for discharge prediction in compound channels. Kisi and Hosseinzadeh (2012) developed suspended sediment models using genetic programming. Genetic programming searches a space of programs (mathematical expressions) for the best one by evolving generations of genetically bred and mutated populations. The modeling problem often calls for a model that is an explicit function of the independent variables. Here genetic programming differs from artificial neural models, which do not present an explicit expression and instead use a number of network parameters to transform inputs to outputs. However, both ANN and GP produce black-box models that are purely statistical and not based on the underlying physics of the system. Genetic programming also differs from conventional regression. Rather than finding the numeric coefficients of a predetermined functional form, as regression does, symbolic regression attempts to find a symbolic expression containing functions as well as independent variables and numeric coefficients. The method is also referred to as symbolic function identification.
The major difference is that, unlike conventional regression, GP does not require a predetermined functional form. Instead, it accepts a library of operators (functions and variables) and evolves generations of expressions to ultimately reach the best expression. The term symbolic regression is used for any technique that fits the measured data using a suitable mathematical formula. GP employs a search heuristic: the algorithm begins with randomized sets of expressions and, in each generation (iteration), creates new expressions that perform better than those of the previous generation. Hence, the expressions are not calculated but generated from parent expressions using the genetic operators (mutation, crossover, etc.). The only calculations that take place are evaluations of expressions to assess their performance. This is done using model performance indicators (correlation coefficient, etc.) on the training data. The indicator helps to assess to what degree the model has been able to generalize the training dataset statistically. A good correlation coefficient, for example, would indicate a good generalization. River hydraulic systems are highly complex, and their underlying relationships may therefore be poorly understood. In such cases, the model can be viewed as a black box, i.e. the output is an opaque function of its inputs.
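The search loop described above can be illustrated with a minimal sketch in Python. This is not the implementation used in the study: the function set (add, sub, mul), the single variable x, the population size, the operator probabilities, the elite fraction and all function names are assumptions chosen only to keep the example short, and mean squared error is swapped in for the correlation coefficient as the fitness measure.

```python
import random
import operator

# Assumed operator library; the study's actual function set is not given here.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def random_tree(rng, depth):
    """Grow a random expression over the variable 'x' and constants in [-2, 2]."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at the point x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def mse(tree, xs, ys):
    """Mean squared error on the training data (lower is fitter)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def paths(tree, path=()):
    """Yield the index path to every node of the tree."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Graft a random subtree of b onto a random node of a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(rng, tree):
    """Replace a random node with a freshly grown subtree."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def evolve(xs, ys, pop_size=60, generations=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, xs, ys))
        elite = pop[: pop_size // 4]              # direct copy of the fittest
        children = []
        while len(elite) + len(children) < pop_size:
            if rng.random() < 0.7:                # illustrative crossover share
                child = crossover(rng, rng.choice(elite), rng.choice(elite))
            else:
                child = mutate(rng, rng.choice(elite))
            if depth(child) <= max_depth:         # discard over-deep offspring
                children.append(child)
        pop = elite + children
    return min(pop, key=lambda t: mse(t, xs, ys))

xs = [i / 5 for i in range(-10, 11)]              # 21 points in [-2, 2]
ys = [x * x + x for x in xs]                      # hidden target: x^2 + x
best = evolve(xs, ys)
print(best, mse(best, xs, ys))
```

Note that the only arithmetic in the loop is the evaluation of candidate expressions; the expressions themselves arise purely from copying, grafting and regrowing parts of parent trees. Because the fittest quarter is copied unchanged, the best training error cannot get worse from one generation to the next.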
The present work suggests a new and improved regression model for sediment transport problems, namely a multi-gene symbolic regression model, for three different but related phenomena in sediment transport: vegetated flow, incipient motion and total bed load prediction. Multi-gene symbolic regression uses GP to find (and not calculate) multiple sub-programs (individual genes) and then regresses the coefficients of these sub-programs to reach the final expression. The models developed herein for all three cases were found to be better than existing models in terms of the model performance criteria.
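The final step of the multi-gene approach, regressing the coefficients of the individual genes, can be sketched as an ordinary least-squares fit of a bias term plus one weight per gene. The sketch below is an illustration under stated assumptions rather than the authors' code: the two genes are hand-written hypothetical expressions standing in for evolved ones, the data are synthetic, and the names fit_gene_weights and solve_linear are invented for this example.

```python
import math
import random

def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_gene_weights(genes, rows, y):
    """Least-squares weights [bias, w1, w2, ...] via the normal equations."""
    G = [[1.0] + [g(r) for g in genes] for r in rows]   # one column per gene
    k = len(G[0])
    GtG = [[sum(g[i] * g[j] for g in G) for j in range(k)] for i in range(k)]
    Gty = [sum(g[i] * yi for g, yi in zip(G, y)) for i in range(k)]
    return solve_linear(GtG, Gty)

# Hypothetical genes, standing in for sub-programs found by GP.
genes = [lambda r: r[0] * r[1], lambda r: math.sqrt(r[0])]

# Synthetic data generated from a known combination of the two genes.
rng = random.Random(0)
rows = [(rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)) for _ in range(50)]
y = [1.5 + 2.0 * genes[0](r) - 0.5 * genes[1](r) for r in rows]

weights = fit_gene_weights(genes, rows, y)
print(weights)   # bias and gene weights recovered from the data
```

The design choice this illustrates is the division of labour: the structure of each gene is found by evolution, while the linear weights that combine the genes are obtained by regression in a single step.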
Sediment transport problems
Three problems are taken up in this study: flow velocity prediction in vegetated channels, total bed load prediction and incipient motion prediction. The state of the art for each is discussed briefly in the subsections that follow.
Methodology
The same methodology has been adopted in all three cases taken up to demonstrate the applicability of the genetic programming approach across sediment transport problems. The source of the data used for modeling in each case is stated first. This is followed by a description of the adopted technique, namely multi-gene symbolic regression. Finally, model performance was assessed through criteria such as correlation coefficient and index of agreement and finally comparing with past
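The two performance criteria named above can be computed as in the following sketch. The snippet does not give the formulas, so two assumptions are made here: the correlation coefficient is taken to be Pearson's r, and the index of agreement is taken to be Willmott's d, defined as one minus the sum of squared errors divided by the sum of squared potential errors about the observed mean. The function names are illustrative.

```python
import math

def correlation_coefficient(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (assumed form), 1 for a perfect match."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]     # perfectly correlated but biased
print(correlation_coefficient(observed, predicted))
print(index_of_agreement(observed, predicted))
```

The example data show why the two criteria are reported together: predictions that are exactly twice the observations are perfectly correlated with them, yet the index of agreement is well below 1 because it penalises the systematic bias.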
Results and discussion
The parameters were varied through a range of values to identify the appropriate combination. The build method, which determines how tree structures are initialized in the first generation, was varied through three possible configurations: ‘full’, ‘grow’ and ‘ramped half-and-half’. The maximum depth of an individual gene was varied between 2 and 8. The number of genes was varied between 1 and 50. The fractions of mutation, crossover and direct copy were varied in steps of 0.1 within
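The parameter ranges listed above can be enumerated as in the sketch below. Several points are assumptions rather than facts from the text: the bounds on the operator fractions are cut off in the snippet, so they are left as arguments whose defaults span the whole interval from 0 to 1; the three fractions are assumed to sum to 1; the dictionary keys are invented names; and whether the study searched the grid exhaustively is not stated.

```python
from itertools import product

BUILD_METHODS = ("full", "grow", "ramped half-and-half")
MAX_DEPTHS = range(2, 9)     # maximum gene depth, 2 to 8 inclusive
NUM_GENES = range(1, 51)     # number of genes, 1 to 50 inclusive

def operator_fractions(lo_tenths=0, hi_tenths=10):
    """Yield (mutation, crossover, direct_copy) in steps of 0.1 summing to 1.

    The bounds are placeholders: the source text is truncated before
    stating the range actually used.
    """
    for m in range(lo_tenths, hi_tenths + 1):
        for c in range(lo_tenths, hi_tenths + 1):
            d = 10 - m - c                       # work in tenths to avoid float drift
            if lo_tenths <= d <= hi_tenths:
                yield (m / 10, c / 10, d / 10)

def parameter_grid():
    """Yield one configuration dict per combination of settings."""
    fractions = list(operator_fractions())
    for build, depth, genes, (mut, cross, copy) in product(
        BUILD_METHODS, MAX_DEPTHS, NUM_GENES, fractions
    ):
        yield {
            "build_method": build,
            "max_depth": depth,
            "num_genes": genes,
            "p_mutation": mut,
            "p_crossover": cross,
            "p_direct_copy": copy,
        }

print(sum(1 for _ in parameter_grid()), "candidate configurations")
```

Each configuration would then be passed to a GP run and scored with the chosen performance criteria; the size of the grid suggests why only a subset of combinations might be explored in practice.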
Conclusion
Sediment transport problems are difficult to model analytically because their system variables are multidimensional and nonlinearly interdependent. Moreover, graphical techniques and empirical methods proposed in the past for several sediment transport problems are inadequate, showing low predictability and poor agreement with actual data. Soft computing offers several computational techniques to develop efficient and robust models. Multi-gene symbolic regression was used to
Acknowledgements
The authors gratefully acknowledge the financial support received from the Department of Science and Technology, Govt. of India (SERC-DST: SR/S3/MERC/005/2010) to carry out the research work presented in this paper.
References (55)
Sediment load prediction by genetic algorithms. Adv. Eng. Softw. (2009)
A genetic programming approach to suspended sediment modelling. J. Hydrol. (2008)
Influence of riparian vegetation on channel widening and subsequent contraction on a sand-bed stream since European settlement: Widden Brook, Australia. Geomorphology (2012)
Application of support vector machines in scour prediction on grade-control structures. Eng. Appl. Artif. Intel. (2009)
Predictions of bridge scour: application of a feed-forward neural network with an adaptive activation function. Eng. Appl. Artif. Intel. (2013)
A machine code-based genetic programming for suspended sediment concentration estimation. Adv. Eng. Softw. (2010)
Suspended sediment modeling using genetic programming and soft computing techniques. J. Hydrol. (2012)
River suspended sediment estimation by climatic variables implication: Comparative study among soft computing techniques. Comp. Geosci. (2012)
Metamodeling approach to predict friction factor in alluvial channel. Comput. Electron. Agr. (2010)
Incipient motion design of sand bed channels affected by bed suction. Comput. Electron. Agr. (2010)
River flow forecasting through conceptual models. 1: Discussion of principles. J. Hydrol.
Sediment movement under unidirectional flows: an assessment of empirical threshold curves. Coast. Eng.
Parameter identification for modeling river network using a genetic algorithm. J. Hydrodyn.
Determining water surface elevation in tidal rivers by ANN. P. I. Civil Eng.-Wat. M.
Optimal design of water networks using fuzzy genetic algorithm. P. I. Civil Eng.-Wat. M.
Genetic programming to predict ski-jump bucket spill-way scour. J. Hydrodyn.
On inducing equations for vegetation resistance. J. Hydraulic Res.
The legend of A. F. Shields. J. Hydraul. Eng.
Explicit formulation of the Shields diagram for incipient motion of sediment. J. Hydraul. Eng.
Appraisal of soft computing techniques in prediction of total bed material load in tropical rivers. J. Earth Syst. Sci.
Mechanics of Sediment Movement