A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty
Highlights
► We use genetic programming to predict post-fire debris-flow volumes in the western US.
► We use optimization to estimate the prediction uncertainty of debris-flow volumes.
► Reductions in uncertainty are possible when dimensional consistency is not a priority.
Introduction
The hazardous consequences of rainfall on basins motivate investigators to study its effects on various hydrologic responses (Friedel, 2008, Gartner et al., 2008, Biswajeet and Lee, 2010, Fotopoulos et al., 2010, Scott Wilkinson, 2009; Stenson et al., 2011). Of these responses, debris flows have the most hazardous consequences, which may be one reason why several modeling studies focus on this response. A review of traditional debris-flow modeling is provided by Bulmer et al. (2002). To date, the types of modeling include physical, empirical, and numerical approaches. Early physical models consider debris flows as a single-phase Bingham (or Coulomb) continuum (Johnson, 1984). Takahashi (1980) considers particle–particle interactions but only for homogeneous mixtures without internal pressure on the fluid-matrix mixture. Later, these modeling assumptions are generalized by Iverson (1997) to include viscous pore fluid in a fluid-solid momentum transport approach. Because debris-flow dynamics are nonlinear, time-dependent, and spatially varying, many researchers began digital investigations involving empirical and numerical approaches.
Empirically-based models are developed by fitting equations to field data for predicting post-fire debris-flow generation at the outlets of burned basins. While not explicitly describing the physics of debris flows, these models can provide a first-order prediction of debris-flow behavior. Some post-fire examples include equations devised using multiple linear regression (MLR) to predict debris-flow peak discharge as a function of variation in basin landform, burn severity, and rainfall (Cannon et al., 2003, Gartner, 2005). Although peak discharge is successfully used to model extreme flooding (Friedel et al., 2008) and extreme rainfall events (Friedel, 2008), many researchers consider it too uncertain for predicting post-fire debris flows (Pierson, 2004). In a recent study by Friedel (2010), the average range of prediction uncertainty in Colorado debris-flow peak discharge measurements is determined to span a factor of about six. For these reasons, there is a shift away from peak discharge in favor of alternative response variables that include the percent chance for debris-flow production (Cannon et al., 2004) and total volume of debris flows (Gartner et al., 2008).
Numerically-based models are also used to predict the timing and spatial movement of debris flows in response to rainfall on burned basins (Bunch et al., 2004, Elliott et al., 2005, Mikos et al., 2006, Rosso et al., 2007, Bathurst et al., 2007, Hsu et al., 2010). These models differ from standard basin hydrologic models by the addition of a friction slope. The friction slope term depends on which rheological model is chosen to represent the shear stress of a non-Newtonian fluid. One numerical modeling application is the creation of post-fire debris-flow inundation maps for basins burned during the 2002 Colorado wildfires (Elliott et al., 2005). In that study, the numerical problem is solved in two steps. First, the peak-discharge hydrographs associated with a 100-year storm event are estimated at the outlets of tributary basins burned as part of the Coal Seam, Hayman, and Missionary Ridge wildfires. Second, the hydrographs are bulked and then used as input to an unsteady, unconfined, two-dimensional flow and transport model for predicting the timing and spatial extent of debris flows. The challenges in that study illustrate those common to other post-fire numerical modeling efforts: (1) poor spatial rainfall resolution, (2) poor spatial resolution of physical properties, (3) assumed homogeneity of the debris-flow field, and (4) little or no streamflow and debris-flow information at basin outlets to calibrate and validate the model.
Despite the progress made in modeling post-fire debris flows, there remains a need for additional improvement (Han et al., 2007). This is particularly true with respect to the development of alternative nonlinear models and quantification of prediction uncertainty. Over the past decade, data-driven techniques have been introduced as alternative tools in hydrology (Dawson and Wilby, 2001, Han et al., 2007). One data-driven technique is the self-organizing map. The self-organizing map is a type of unsupervised neural network that maps nonlinear data vectors from a high- to low-dimensional model output space (Kohonen, 2001). Some applications include investigating the spatial and temporal trends in basin water quality data (Lischeid, 2003), estimating design hydrographs for ungauged basins (Lin and Wu, 2007), assessing the vulnerability of rainfall-induced debris flows (Lu et al., 2007), and developing post-fire landscape models at multi-state (regional) scales (Friedel, 2011). A more comprehensive review of applications in water-resources is provided by Kalteh and Berndtsson (2008). A second data-driven technique is symbolic regression. Symbolic regression is one type of genetic programming (GP) that searches for empirical relations using a specific form of the evolutionary algorithm (Koza, 1992, Koza et al., 1999). These algorithms share the common property of applying selection, variation, and reproduction to a population of structures that undergo evolution. Recent applications include the evolution of equations to estimate soil hydraulic properties (Parasuraman et al., 2007), estimate suspended sediment concentration (Aytek et al., 2008), forecast short-term streamflow with global climate change implications (Makkeasorn et al., 2009), and to project climate change impacts on landlocked salmon (Tung et al., 2009).
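The shared cycle of selection, variation, and reproduction described above can be sketched in a few lines. This is a minimal illustration, not the algorithm used in the study: it evolves a single real-valued coefficient rather than an equation structure, and the data, population size, mutation scale, and the names `evolve` and `misfit` are all assumptions made for the example.

```python
import random

def evolve(fitness, population, generations=50, mutation_scale=0.1, seed=0):
    """Evolve a list of real-valued candidates; lower fitness is better."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Selection: rank candidates and keep the better half as parents.
        ranked = sorted(population, key=fitness)
        parents = ranked[: len(ranked) // 2]
        # Reproduction and variation: each parent yields one mutated child.
        children = [p + rng.gauss(0.0, mutation_scale) for p in parents]
        # Parents survive unchanged, so the best candidate is never lost.
        population = parents + children
    return min(population, key=fitness)

# Hypothetical observations that roughly follow y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

def misfit(c):
    """Sum of squared errors for the one-coefficient model y = c * x."""
    return sum((y - c * x) ** 2 for x, y in zip(xs, ys))

start_rng = random.Random(1)
start = [start_rng.uniform(-5.0, 5.0) for _ in range(20)]
best = evolve(misfit, start)
```

Symbolic regression applies the same cycle to a population of expression trees, so that the equation structure evolves together with its coefficients.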
A common interest in the field of hydrology is the estimation of prediction uncertainty (Vecchia and Cooley, 1987, Christensen and Cooley, 1999a, Christensen and Cooley, 1999b, Cooley, 2004, Friedel, 2005, Friedel, 2006a, Friedel, 2006b, Gallagher and Doherty, 2007, Friedel et al., 2008, Yu et al., 2008, Sreekanth and Datta, 2011). Part of the motivation for this analysis is the recognition that empirical (and numerical) models are nonunique. That is, there are many alternate combinations of model coefficients (or parameter values) that can satisfy the same best-fit criteria. Because predictions made using a given model represent one set of many, there is a range over which they vary. In this study, the following objectives focus on burned basins in the western United States: (1) evolve a set of nonlinear multivariate debris-flow volume equations; and (2) quantify the model statistics and prediction uncertainty of these nonlinear equations and compare them with those of a published linear equation. This study extends the work of Gartner et al. (2008), who sought to devise debris-flow equations based on the traditional multiple linear regression approach. We demonstrate the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States (Friedel, 2011). In addition to providing new equations, this study illustrates the applicability of an inverse technique for estimating nonlinear post-fire debris-flow prediction uncertainty. The general nonlinear modeling approach can be applied to multivariate problems in all fields of study.
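The nonuniqueness argument can be made concrete with a small sketch. It assumes a straight-line model, hypothetical data, a 10 percent tolerance on the best-fit sum of squared errors, and a random search around the best-fit coefficients. None of these choices come from the study, and the random search is only a stand-in for the inverse technique the study applies.

```python
import random

def sse(a, b, xs, ys):
    """Sum of squared errors for the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def prediction_range(xs, ys, x_new, tol=0.10, spread=0.3, n_trials=20000, seed=0):
    """Range of predictions at x_new over coefficient pairs that fit nearly as well.

    A pair (a, b) is accepted when its SSE is within (1 + tol) of the minimum.
    tol and spread are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    a0, b0 = fit_line(xs, ys)
    limit = (1.0 + tol) * sse(a0, b0, xs, ys)
    best = a0 + b0 * x_new
    lo = hi = best
    for _ in range(n_trials):
        a = a0 + rng.uniform(-spread, spread)
        b = b0 + rng.uniform(-spread, spread)
        if sse(a, b, xs, ys) <= limit:
            p = a + b * x_new
            lo, hi = min(lo, p), max(hi, p)
    return lo, best, hi

# Hypothetical calibration data and a prediction point outside their range.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
lo, best, hi = prediction_range(xs, ys, x_new=8.0)
```

The interval from `lo` to `hi` brackets the best-fit prediction and shows how coefficient sets that calibrate almost equally well can still disagree at a prediction point.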
Conceptual models and data
The selection of variables for use in this study is based on a conceptual post-fire landscape model provided by Friedel (2011). In that study, conceptual models are delineated at the multi-state (regional) scale using data from six hundred burned basins in nine western states (Gartner et al., 2005), the self-organizing map technique (Kohonen, 2001), the partitive cluster technique (Vesanto and Alhoniemi, 2000), and the Davies-Bouldin criteria (Davies and Bouldin, 1979). Given that the
Genetic programming
Symbolic regression operates on a collection of debris-flow observations to evolve an equation in which the model structure and coefficients are part of the search process (Koza, 1992, Koza et al., 1999, Babovic and Keijzer, 2000). To carry out this process, the criterion of heredity in children from a parent population is introduced using a crossover operator (replace a randomly chosen subtree from a formula with a randomly chosen subtree from another formula) and mutation operator (replace a
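The crossover operator described above (replace a randomly chosen subtree of one formula with a randomly chosen subtree of another) can be sketched as follows. The nested-tuple representation, the helper names, and the two parent formulas are assumptions chosen for illustration and are not taken from the study.

```python
import random

def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        # Index 0 holds the operator name; children start at index 1.
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(receiver, donor, rng):
    """Swap a random subtree of receiver for a random subtree of donor."""
    target = rng.choice(list(subtree_paths(receiver)))
    source = rng.choice(list(subtree_paths(donor)))
    return replace_subtree(receiver, target, get_subtree(donor, source))

def size(tree):
    """Number of nodes (operators plus terminals) in the tree."""
    return sum(1 for _ in subtree_paths(tree))

# Hypothetical parent formulas: log(G30) + sqrt(TSR) and BMH * (TSR + 2.0).
parent_a = ('add', ('log', 'G30'), ('sqrt', 'TSR'))
parent_b = ('mul', 'BMH', ('add', 'TSR', 2.0))
child = crossover(parent_a, parent_b, random.Random(42))
```

Because tuples are immutable, both parents are left unchanged and the child is a new formula, which keeps the parent population intact for further selection.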
Debris-flow volume equations
In the first experiment, the goal is to see if GP can evolve the Gartner equation for the western United States (Gartner et al., 2008). This requires the algorithm to choose among an expanded function set that includes arithmetic, square root, and logarithm operators and the same independent landscape variables: basin area with slopes greater than or equal to 30 percent, in m² (G30); burn severity characterized as area burned at moderate plus high severity, in m² (BMH); and total storm rainfall, in m (TSR). In contrast
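A function set of this kind can be sketched as a small evaluator for candidate equations. The protected square root, logarithm, and division below are a common genetic-programming convention assumed here, not something the text confirms for this study. The candidate formula and the basin values are hypothetical.

```python
import math

# Function set: arithmetic, square root, and logarithm operators.
# The "protected" variants avoid domain errors during evolution (an assumption).
FUNCTIONS = {
    'add': lambda a, b: a + b,
    'sub': lambda a, b: a - b,
    'mul': lambda a, b: a * b,
    'div': lambda a, b: a / b if b != 0 else 1.0,
    'sqrt': lambda a: math.sqrt(abs(a)),
    'log': lambda a: math.log(abs(a)) if a != 0 else 0.0,
}

# Terminal set: the three independent landscape variables named in the text.
TERMINALS = ('G30', 'BMH', 'TSR')

def evaluate(tree, basin):
    """Evaluate a nested-tuple expression for one basin's variable values."""
    if isinstance(tree, tuple):
        op, *args = tree
        return FUNCTIONS[op](*(evaluate(arg, basin) for arg in args))
    if isinstance(tree, str):
        return basin[tree]  # look up a terminal variable
    return float(tree)      # numeric constant

# Hypothetical basin: areas in m^2, rainfall in m.
basin = {'G30': 1.0e6, 'BMH': 5.0e5, 'TSR': 0.04}

# Hypothetical candidate: log(G30) + sqrt(TSR).
candidate = ('add', ('log', 'G30'), ('sqrt', 'TSR'))
value = evaluate(candidate, basin)
```

A fitness measure would compare values like this against observed debris-flow volumes across all basins in the calibration set.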
Conceptual debris-flow model
The conceptual model used in this study provides a hypothesis of post-fire debris-flow volume discharge that is tested using the GP technique. The fact that equations can be evolved from the associated variables suggests that the data-driven approach is useful. That the published linear and new nonlinear equations incorporate variables from only two of the eight possible conceptual models presents an opportunity for future evolutionary work. That is, the data-driven approach could also be
Conclusions
We find that symbolic regression and inverse-optimization techniques can be used to model post-fire debris-flow volume discharge and its uncertainty at basin outlets. In contrast to the traditional multiple linear regression technique, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. The application of this data-driven modeling approach led to the following specific conclusions:
1. Application
Acknowledgments
The author thanks Sue Cannon of the USGS for providing the burned basin debris-flow volume data used in this study. In addition, the author is indebted to Sue Cannon, Raymond Johnson, and James Tindall of the USGS for their valuable comments and suggestions. The data used in this paper can be obtained by sending an email request to the corresponding author.
References (56)
- et al. A genetic programming approach to suspended sediment modeling. Journal of Hydrology (2008)
- et al. Evaluation of confidence intervals for a steady-state leaky aquifer model. Advances in Water Resources (1999)
- et al. Flood forecasting in transboundary catchments using the Open Modeling Interface. Environmental Modelling & Software (2010)
- Coupled inverse modeling of vadose zone water, heat, and solute transport: calibration constraints, parameter nonuniqueness, and predictive uncertainty. Journal of Hydrology (2005)
- et al. Review of the self-organizing map (SOM) approach in water resources: analysis, modeling and application. Environmental Modelling & Software (2008)
- et al. A SOM-based approach to estimating design hyetographs of ungauged sites. Journal of Hydrology (2007)
- et al. Sources of debris flow material in burned areas. Geomorphology (2008)
- et al. Modelling and testing spatially distributed sediment budgets to relate erosion processes to sediment yields. Environmental Modelling & Software (2009)
- et al. Estimation of water and salt generation from unregulated upland catchments. Environmental Modelling & Software (2011)
- et al. Genetic programming as a model induction engine. Journal of Hydroinformatics (2000)