A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty

https://doi.org/10.1016/j.envsoft.2011.07.014

Abstract

This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean-square and unit errors during evolution of the fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published three-variable multiple linear regression equation, which links basin area with slopes greater than or equal to 30 percent, burn severity characterized as area burned at moderate plus high severity, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study.

Highlights

► We use genetic programming to predict post-fire debris-flow volumes in the western US.
► We use optimization to estimate the prediction uncertainty of debris-flow volumes.
► Reductions in uncertainty are possible when dimensional consistency is not a priority.

Introduction

The hazardous consequences of rainfall on basins motivate investigators to study its effects on various hydrologic responses (Friedel, 2008; Gartner et al., 2008; Biswajeet and Lee, 2010; Fotopoulos et al., 2010; Scott Wilkinson, 2009; Stenson et al., 2011). Of these responses, debris flows have the most hazardous consequences, which may be one reason why several modeling studies focus on them. A review of traditional debris-flow modeling is provided by Bulmer et al. (2002). To date, the types of modeling include physical, empirical, and numerical approaches. Early physical models consider debris flows as a single-phase Bingham (or Coulomb) continuum (Johnson, 1984). Takahashi (1980) considers particle–particle interactions but for homogeneous mixtures without internal pressure on the fluid-matrix mixture. Later, these modeling assumptions are generalized by Iverson (1997) to include viscous pore fluid in a fluid-solid momentum transport approach. Because debris-flow dynamics are nonlinear, time-dependent, and spatially varying, many researchers turned to computer-based investigations involving empirical and numerical approaches.

Empirically based models are developed by fitting equations to field data for predicting post-fire debris-flow generation at the outlets of burned basins. While not explicitly describing the physics of debris flows, these models can provide a first-order prediction of debris-flow behavior. Some post-fire examples include equations devised using multiple linear regression (MLR) to predict debris-flow peak discharge as a function of variation in basin landform, burn severity, and rainfall (Cannon et al., 2003; Gartner, 2005). Although peak discharge is successfully used to model extreme flooding (Friedel et al., 2008) and extreme rainfall events (Friedel, 2008), many researchers consider it too uncertain for predicting post-fire debris flows (Pierson, 2004). In a recent study by Friedel (2010), the average range of prediction uncertainty in Colorado debris-flow peak discharge measurements is determined to span a factor of about six. For these reasons, there is a shift away from peak discharge in favor of alternative response variables that include the percent chance for debris-flow production (Cannon et al., 2004) and total volume of debris flows (Gartner et al., 2008).

Numerically based models are also used to predict the timing and spatial movement of debris flows in response to rainfall on burned basins (Bunch et al., 2004; Elliott et al., 2005; Mikos et al., 2006; Rosso et al., 2007; Bathurst et al., 2007; Hsu et al., 2010). These models differ from standard basin hydrologic models in the addition of a friction slope. The friction slope term depends on which rheological model is chosen to represent the shear stress of a non-Newtonian fluid. One numerical modeling application is the creation of post-fire debris-flow inundation maps for basins burned during the 2002 Colorado wildfires (Elliott et al., 2005). In that study, the numerical problem is solved in two steps. First, the peak-discharge hydrographs associated with a 100-year storm event are estimated at the outlets of tributary basins burned as part of the Coal Seam, Hayman, and Missionary Ridge wildfires. Second, the hydrographs are bulked and then used as input to an unsteady, unconfined, two-dimensional flow and transport model for predicting the timing and spatial extent of debris flows. The challenges in that study illustrate those common to other post-fire numerical modeling efforts: (1) poor spatial rainfall resolution, (2) poor spatial resolution of physical properties, (3) assumed homogeneity of the debris-flow field, and (4) little or no streamflow and debris-flow information at basin outlets to calibrate and validate the model.

Despite the progress made in modeling post-fire debris flows, there remains a need for additional improvement (Han et al., 2007). This is particularly true with respect to the development of alternative nonlinear models and quantification of prediction uncertainty. Over the past decade, data-driven techniques have been introduced as alternative tools in hydrology (Dawson and Wilby, 2001; Han et al., 2007). One data-driven technique is the self-organizing map. The self-organizing map is a type of unsupervised neural network that maps nonlinear data vectors from a high- to low-dimensional model output space (Kohonen, 2001). Some applications include investigating the spatial and temporal trends in basin water quality data (Lischeid, 2003), estimating design hydrographs for ungauged basins (Lin and Wu, 2007), assessing the vulnerability of rainfall-induced debris flows (Lu et al., 2007), and developing post-fire landscape models at multi-state (regional) scales (Friedel, 2011). A more comprehensive review of applications in water resources is provided by Kalteh and Berndtsson (2008). A second data-driven technique is symbolic regression. Symbolic regression is one type of genetic programming (GP) that searches for empirical relations using a specific form of the evolutionary algorithm (Koza, 1992; Koza et al., 1999). These algorithms share the common property of applying selection, variation, and reproduction to a population of structures that undergo evolution. Recent applications include the evolution of equations to estimate soil hydraulic properties (Parasuraman et al., 2007), estimate suspended sediment concentration (Aytek et al., 2008), forecast short-term streamflow with global climate change implications (Makkeasorn et al., 2009), and project climate change impacts on landlocked salmon (Tung et al., 2009).

A common interest in the field of hydrology is the estimation of prediction uncertainty (Vecchia and Cooley, 1987; Christensen and Cooley, 1999a; Christensen and Cooley, 1999b; Cooley, 2004; Friedel, 2005; Friedel, 2006a; Friedel, 2006b; Gallagher and Doherty, 2007; Friedel et al., 2008; Yu et al., 2008; Sreekanth and Datta, 2011). Part of the motivation for this analysis is the recognition that empirical (and numerical) models are nonunique. That is, there are many alternate combinations of model coefficients (or parameter values) that can satisfy the same best-fit criteria. Because predictions made using a given model represent one set of many, there is a range over which they vary. In this study, the following objectives focus on burned basins in the western United States: (1) evolve a set of nonlinear multivariate debris-flow volume equations; and (2) quantify the model statistics and prediction uncertainty of these nonlinear equations and compare them with those of a published linear equation. This study extends the work of Gartner et al. (2008), who sought to devise debris-flow equations based on the traditional multiple linear regression approach. We demonstrate the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States (Friedel, 2011). In addition to providing new equations, this study illustrates the applicability of an inverse technique for estimating nonlinear post-fire debris-flow prediction uncertainty. The general nonlinear modeling approach can be applied to multivariate problems in all fields of study.
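To make the nonuniqueness argument concrete, the following minimal Python sketch uses made-up data and a hypothetical two-coefficient linear model (not a debris-flow equation from this study). It scans a coarse grid of coefficient pairs, keeps every pair whose sum of squared errors is within an assumed 10 percent of the smallest value found, and reports the spread of the resulting predictions. The brute-force grid is only a stand-in for illustration; it is not the optimization technique used in the study.

```python
# Illustrative sketch only: synthetic data and a hypothetical model y = a * x + b.

def sse(a, b, xs, ys):
    """Sum of squared errors for the model y = a * x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def prediction_range(xs, ys, x_new, tol=0.10, steps=201):
    """Return (best, low, high) predictions at x_new.

    Every (a, b) pair on the grid whose misfit is within the fraction `tol`
    of the smallest misfit found is treated as an acceptable model.
    """
    grid_a = [i * 4.0 / (steps - 1) for i in range(steps)]          # a in [0, 4]
    grid_b = [-5.0 + i * 10.0 / (steps - 1) for i in range(steps)]  # b in [-5, 5]
    scored = [(sse(a, b, xs, ys), a, b) for a in grid_a for b in grid_b]
    best_sse, best_a, best_b = min(scored)
    accepted = [a * x_new + b for s, a, b in scored
                if s <= (1.0 + tol) * best_sse]
    return best_a * x_new + best_b, min(accepted), max(accepted)


# Synthetic observations (invented for illustration, roughly y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

best, low, high = prediction_range(xs, ys, x_new=8.0)
print(f"best-fit prediction: {best:.2f}; acceptable range: [{low:.2f}, {high:.2f}]")
```

Many coefficient pairs fit the five observations almost equally well, yet they give different predictions at `x_new`; the width of that interval is one simple way to express prediction uncertainty.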

Conceptual models and data

The selection of variables for use in this study is based on a conceptual post-fire landscape model provided by Friedel (2011). In that study, conceptual models are delineated at the multi-state (regional) scale using data from six hundred burned basins in nine western states (Gartner et al., 2005), the self-organizing map technique (Kohonen, 2001), the partitive cluster technique (Vesanto and Alhoniemi, 2000), and the Davies-Bouldin criteria (Davies and Bouldin, 1979). Given that the

Genetic programming

Symbolic regression operates on a collection of debris-flow observations to evolve an equation in which the model structure and coefficients are part of the search process (Koza, 1992, Koza et al., 1999, Babovic and Keijzer, 2000). To carry out this process, the criterion of heredity in children from a parent population is introduced using a crossover operator (replace a randomly chosen subtree from a formula with a randomly chosen subtree from another formula) and mutation operator (replace a
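The crossover operator described above can be sketched as follows. This is a minimal illustration, assuming equations are stored as nested tuples; the function set, the variable names (`slope`, `burned`, `rain`), and all helper names are hypothetical and are not taken from the software used in the study.

```python
import math
import random

# Hypothetical function set; sqrt is "protected" so it never raises on negatives.
FUNCS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sqrt": lambda a: math.sqrt(abs(a)),
}


def evaluate(tree, env):
    """Evaluate a formula tree: tuples are operators, strings are variables."""
    if isinstance(tree, tuple):
        op, *args = tree
        return FUNCS[op](*(evaluate(arg, env) for arg in args))
    if isinstance(tree, str):
        return env[tree]
    return tree  # numeric constant


def paths(tree, prefix=()):
    """Yield the index path to every node (subtree root) in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` swapped for `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent_a, parent_b, rng):
    """Replace a random subtree of parent_a with a random subtree of parent_b."""
    target = rng.choice(list(paths(parent_a)))
    donor = get(parent_b, rng.choice(list(paths(parent_b))))
    return replace(parent_a, target, donor)


parent_a = ("mul", "slope", ("sqrt", "rain"))   # slope * sqrt(rain)
parent_b = ("add", "burned", 2.0)               # burned + 2
env = {"slope": 0.5, "burned": 3.0, "rain": 4.0}

rng = random.Random(0)  # seeded only so the example is repeatable
child = crossover(parent_a, parent_b, rng)
print(child, evaluate(child, env))
```

Because tuples are immutable, `replace` builds a new tree and leaves both parents intact, which is convenient when the same parent is selected more than once in a generation.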

Debris-flow volume equations

In the first experiment, the goal is to see if GP can evolve the Gartner equation for the western United States (Gartner et al., 2008). This requires the algorithm to choose among an expanded function set that includes arithmetic, square root, and logarithm operators and the same independent landscape variables: basin area with slopes greater than or equal to 30 percent, in m² (G30); burn severity characterized as area burned at moderate plus high severity, in m² (BMH); and total storm rainfall, in m (TSR). In contrast
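The idea of scoring a candidate equation on both misfit and dimensional consistency can be illustrated with a short Python sketch. It tracks the exponent of metres carried by an expression built from G30, BMH, and TSR (units as given above) and compares it with m³ for volume. The weighted-sum form of the score, the penalty value, and all function names are assumptions made for illustration; the source does not state how the objective-function components are combined.

```python
import math

# Length exponents of the inputs as stated in the text: areas in m^2, rainfall in m.
UNITS = {"G30": 2.0, "BMH": 2.0, "TSR": 1.0}
TARGET_EXPONENT = 3.0          # debris-flow volume in m^3
INCONSISTENT_PENALTY = 10.0    # assumed value for formulas with no valid units


def length_exponent(tree):
    """Return the exponent of metres of a formula tree, or None if inconsistent."""
    if isinstance(tree, str):
        return UNITS[tree]
    if isinstance(tree, (int, float)):
        return 0.0  # assumption: bare constants are dimensionless
    op, *args = tree
    dims = [length_exponent(arg) for arg in args]
    if any(d is None for d in dims):
        return None
    if op in ("add", "sub"):   # both sides must carry the same units
        return dims[0] if dims[0] == dims[1] else None
    if op == "mul":
        return dims[0] + dims[1]
    if op == "div":
        return dims[0] - dims[1]
    if op == "sqrt":
        return dims[0] / 2.0
    if op == "log":            # logarithm of a dimensioned quantity is rejected
        return 0.0 if dims[0] == 0.0 else None
    raise ValueError(f"unknown operator: {op}")


def unit_error(tree):
    """Distance between the formula's length exponent and that of a volume."""
    exponent = length_exponent(tree)
    if exponent is None:
        return INCONSISTENT_PENALTY
    return abs(exponent - TARGET_EXPONENT)


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def fitness(tree, predicted, observed, weight=1.0):
    """Assumed two-component score: misfit plus weighted unit error (lower is better)."""
    return rmse(predicted, observed) + weight * unit_error(tree)


print(unit_error(("mul", "G30", "TSR")))                             # area * depth
print(unit_error(("mul", "TSR", ("sqrt", ("mul", "G30", "BMH")))))   # depth * sqrt(area^2)
print(unit_error(("add", "G30", "TSR")))                             # area + depth
```

Setting `weight` to zero in this sketch corresponds to the case where dimensional consistency is not a priority, so that only the misfit term guides the search.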

Conceptual debris-flow model

The conceptual model used in this study provides a hypothesis of post-fire debris-flow volume discharge that is tested using the GP technique. The fact that equations can be evolved from the associated variables suggests that the data-driven approach is useful. The fact that the published linear and new nonlinear equations incorporate variables from only two of the eight possible conceptual models presents an opportunity for future evolutionary work. That is, the data-driven approach could also be

Conclusions

We find that symbolic regression and inverse-optimization techniques can be used to model post-fire debris-flow volume discharge and its uncertainty at basin outlets. In contrast to the traditional multiple linear regression technique, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. The application of this data-driven modeling approach led to the following specific conclusions:

  • 1.

    Application

Acknowledgments

The author thanks Sue Cannon of the USGS for providing the burned basin debris-flow volume data used in this study. In addition, the author is indebted to Sue Cannon, Raymond Johnson, and James Tindall of the USGS for their valuable comments and suggestions. The data used in this paper can be obtained by sending an email request to the corresponding author.

References (56)

  • J.C. Bathurst et al. Modelling the impact of forest loss on shallow landslide sediment yield, Ijuez river catchment, Spanish Pyrenees. Hydrology and Earth System Sciences (2007)
  • P. Biswajeet et al. Landslide susceptibility assessment and factor effect analysis: back propagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software (2010)
  • M.H. Bulmer et al. An empirical approach to studying debris flows: implications for planetary modeling studies. Journal of Geophysical Research Planets (2002)
  • M.A. Bunch et al. A model for simulating the deposition of water-lain sediments in dryland environments. Hydrology and Earth System Sciences (2004)
  • S.H. Cannon et al. Debris flow response of basins burned by the 2002 Coal Seam and Missionary Ridge Fires, Colorado
  • Cannon S.H., Gartner J.E., Rupert M.G., Michael J.A., 2004. Emergency assessment of debris-flow hazards from basins...
  • S. Christensen et al. Evaluation of prediction intervals for expressing uncertainties in groundwater flow model predictions. Water Resources Research (1999)
  • R.L. Cooley. A Theory for Modeling Ground-Water Flow in Heterogeneous Media. USGS Professional Paper 1679 (2004)
  • D.L. Davies et al. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence (1979)
  • C.W. Dawson et al. Hydrological modelling using artificial neural networks. Progress in Physical Geography (2001)
  • J. Doherty. PEST: Model-Independent Parameter Estimation, Version 5 of User Manual. Watermark Numerical Computing (2004)
  • J.G. Elliott et al. Analysis and mapping of post-fire hydrologic hazards for the 2002 Hayman, Coal Seam, and Missionary Ridge wildfires, Colorado. U.S. Geological Survey Scientific Investigations Report (2005)
  • Friedel, M.J., July 30, 2011. Modeling hydrologic and geomorphologic responses across post-fire landscapes using a...
  • M.J. Friedel. Predictive streamflow uncertainty in relation to calibration-constraint information, model complexity, and model bias. International Journal of River Basin Management (2006)
  • M.J. Friedel. Reliability in estimating urban groundwater recharge through the vadose zone: managing sustainable development in arid and semiarid regions
  • M.J. Friedel. Regularized Joint Inverse Estimation of extreme rainfall events in ungaged coastal basins of El Salvador. Natural Hazards Journal (2008)
  • Friedel, M.J., 2010, Post-fire debris flow prediction using a two-step hybrid approach, U.S. Geological Survey, 3rd...
  • M.J. Friedel et al. Probable flood predictions in ungaged coastal basins of El Salvador, Special issue: methodologies in Hydrologic Modeling. Journal of Hydrologic Engineering (2008)