Elsevier

Computers & Geosciences

Volume 106, September 2017, Pages 139-149
Research paper
Application of evolutionary computation on ensemble forecast of quantitative precipitation

https://doi.org/10.1016/j.cageo.2017.06.011

Highlights

  • Genetic programming (GP) is used for postprocessing ensemble forecasts.

  • The problem of ensemble forecast of rainfall amount was addressed.

  • GP performed better than traditional statistical techniques.

  • GP outperformed the best individual forecasts by a large margin.

  • Knowledge was extracted from the resulting symbolic forecast models.

Abstract

An evolutionary computation algorithm known as genetic programming (GP) has been explored as an alternative tool for improving the ensemble forecast of 24-h accumulated precipitation. Three GP versions and six ensembles’ languages were applied to several real-world datasets over southern, southeastern and central Brazil during the rainy period from October to February of 2008–2013. According to the results, the GP algorithms performed better than two traditional statistical techniques, with errors 27–57% lower than those of the simple ensemble mean and the MASTER super model ensemble system. In addition, the GP algorithms outperformed the best individual forecasts, reaching an improvement of 34–42%. On the other hand, the GP algorithms performed similarly to each other and to Bayesian model averaging, but the GP algorithms are far more versatile techniques. Although the results for the six ensembles’ languages are almost indistinguishable, our most complex linear language turned out to be the best overall proposal. Moreover, some meteorological attributes, including the weather patterns over Brazil, seem to play an important role in the prediction of daily rainfall amount.

Introduction

The main goal of this paper is to propose a new approach based on genetic programming algorithms to create more accurate deterministic ensemble forecasts (DEF) of 24-h accumulated precipitation. This goal is motivated by the importance of an accurate and reliable quantitative precipitation forecast (QPF) for the strategic planning of several socio-economic sectors (such as agricultural production, hydropower generation, water availability for public consumption, and flood and landslide control), as well as by the difficulty in forecasting quantitative precipitation and by the limitations of the current methods for postprocessing ensembles. The traditional statistical techniques (such as model output statistics (MOS; Glahn and Lowry, 1972), MASTER super model ensemble system (MSMES; Silva Dias et al., 2006), and Bayesian model averaging (BMA; Raftery et al., 2005)) have worked well for variables such as temperature and geopotential height. However, these approaches lead to unsatisfactory results for QPF, perhaps because the distribution of precipitation is far from normal (usually gamma distribution), or due to the complexity of the processes involved, or because of its high spatial, temporal and frequency variability.
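As a point of reference for the simplest of these baselines, the sketch below computes a simple ensemble mean and the relative error reduction of one forecast over another. It is a minimal illustration only: the rainfall values are made up, the function names are hypothetical, and root-mean-square error is an assumed error measure that is not taken from the paper.

```python
from math import sqrt


def ensemble_mean(member_forecasts):
    """Average the member forecasts for each day (one inner list per day)."""
    return [sum(day) / len(day) for day in member_forecasts]


def rmse(pred, obs):
    """Root-mean-square error; an assumed metric, not necessarily the paper's."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def error_reduction_pct(err_new, err_ref):
    """Percentage by which err_new is lower than the reference error err_ref."""
    return 100.0 * (err_ref - err_new) / err_ref


if __name__ == "__main__":
    # Made-up 24-h rainfall forecasts (mm) from three hypothetical members, two days.
    forecasts = [[0.0, 2.0, 4.0], [10.0, 14.0, 6.0]]
    observed = [1.0, 12.0]
    mean_forecast = ensemble_mean(forecasts)
    print(mean_forecast, rmse(mean_forecast, observed))
```

A statement such as "errors 27–57% lower than the simple ensemble mean" corresponds to `error_reduction_pct` evaluated with the ensemble-mean error as the reference.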

Genetic programming (GP) is an evolutionary algorithm inspired by genetics and Darwinian evolution. Introduced by Koza (1992) in the early 1990s, GP is valued for its ability to learn implicit relationships in observed data and to express them automatically in symbolic mathematical form. Furthermore, GP is a supervised machine learning technique that has been able to solve complex optimization problems which cannot feasibly be solved directly or rigorously in real-world applications. Gene-expression programming (GEP) (Ferreira, 2001), grammar-based GP (GGP) (Whigham, 1995) and grammatical evolution (GE) (Ryan et al., 1998) are specializations of the canonical GP, with the last two having the advantage of evolving syntactically correct solutions in an arbitrary language described by a grammar.
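To make the idea of evolving syntactically correct solutions from a grammar concrete, the sketch below shows the usual genotype-to-phenotype mapping of grammatical evolution: each integer codon selects, by modulo, one production for the leftmost non-terminal. The toy grammar, the member names `m1` to `m3` and the function `ge_map` are assumptions made for illustration; they are not one of the paper's six grammars.

```python
# Hypothetical toy grammar (an assumption): a forecast is a weighted sum
# of ensemble members m1..m3.
GRAMMAR = {
    "<expr>": [["<term>"], ["<term>", " + ", "<expr>"]],
    "<term>": [["<w>", " * ", "<model>"]],
    "<w>": [["0.25"], ["0.5"], ["1.0"]],
    "<model>": [["m1"], ["m2"], ["m3"]],
}


def ge_map(codons, start="<expr>", max_wraps=2):
    """Map an integer genome to an expression string, or None if it runs out."""
    symbols = [start]   # pending symbols, leftmost first
    out = []            # terminals emitted so far
    used = 0            # number of codons consumed
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:          # terminal: emit as-is
            out.append(sym)
            continue
        rules = GRAMMAR[sym]
        if len(rules) == 1:             # no choice, so no codon is consumed
            choice = rules[0]
        else:
            if used >= limit:           # genome exhausted even after wrapping
                return None
            codon = codons[used % len(codons)]
            choice = rules[codon % len(rules)]
            used += 1
        symbols = list(choice) + symbols
    return "".join(out)


if __name__ == "__main__":
    expr = ge_map([1, 2, 0, 0, 1, 1])
    print(expr)  # 1.0 * m1 + 0.5 * m2
    # Evaluate the phenotype on made-up member forecasts (mm).
    print(eval(expr, {"__builtins__": {}}, {"m1": 4.0, "m2": 10.0, "m3": 0.0}))  # 9.0
```

Because every derivation step applies a production of the grammar, any genome that maps successfully yields a well-formed expression. Genomes that cannot complete a derivation within the wrapping limit are simply treated as invalid here.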

In contrast to traditional statistical approaches, evolutionary algorithms do not require prior knowledge about the statistical distribution of the data, nor do they need to explicitly assume a model form. Moreover, evolutionary algorithms usually test many solutions instead of continually trying to improve a single one, and can also automatically capture complex interactions among input and output variables in a system. Additionally, the ability of traditional statistical techniques to deal with non-linear problems is limited, whereas for the evolutionary algorithms it is very satisfactory.
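The population-based search described above can be sketched as follows. This is a deliberately simplified stand-in, not GP itself: it evolves real-valued weights of a linear combination of ensemble members rather than symbolic programs, and the selection scheme, mutation size, parameter values and RMSE fitness are all assumptions chosen for brevity.

```python
import random


def rmse(weights, members, obs):
    """Error of the weighted combination of member forecasts against observations."""
    total = 0.0
    for row, y in zip(members, obs):
        pred = sum(w * m for w, m in zip(weights, row))
        total += (pred - y) ** 2
    return (total / len(obs)) ** 0.5


def evolve(members, obs, pop_size=30, generations=40, seed=0):
    """Evolve weight vectors; returns the best vector and the best error per generation."""
    rng = random.Random(seed)
    n = len(members[0])
    # Many candidate solutions are kept at once instead of refining a single one.
    pop = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(pop_size)]

    def fitness(w):
        return rmse(w, members, obs)

    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]          # truncation selection; best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by small Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0.0, 0.05) for pair in zip(a, b)])
        pop = parents + children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history


if __name__ == "__main__":
    # Made-up data: two hypothetical members, observations equal to their average.
    members = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 1.0]]
    obs = [2.0, 2.0, 2.0, 3.0]
    best, history = evolve(members, obs)
    print(best, history[0], history[-1])
```

Since the best half of each generation is carried over unchanged, the best error never increases from one generation to the next. No distributional assumption about the data enters the search; only the error measure does.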

Until recently, only a few papers focused on applying GP algorithms in Hydrology, Meteorology and Water Resources (Omolbani et al., 2010). For the ensemble forecast problem, Bakhshaii and Stull (2009) proposed the use of GEP to form linear or non-linear combinations of numerical weather predictions (NWP). The authors applied GEP to produce short-range DEFs of 24-h accumulated precipitation at 24 stations in mountainous southwestern Canada during the two fall–spring rainy seasons of October 2003–March 2005, using an eleven-member multimodel multigrid-size ensemble. The GEP DEFs outperformed simple ensemble means at about half of the mountain weather stations tested. Roebber (2010) focused on the production of consensus 24-h forecasts for minimum temperature at a site in Ohio derived from evolutionary programming (EP). The resulting deterministic forecasts improved on MOS by nearly 27%. Roebber (2015a, 2015b) extended this work to investigate probabilistic as well as deterministic forecasts of minimum temperature, which were superior to those obtained from operational ensembles and MOS.

Roebber's papers are concerned with generating ensembles of EP solutions, whereas here we are interested in optimizing a combination of NWP ensemble members, as in Bakhshaii and Stull (2009). Two important differences between this paper and Bakhshaii and Stull (2009) are: (i) the use of grammar-based GP instead of GEP, a non-grammatical approach, and (ii) the inclusion of other potential predictors, such as the major weather patterns over Brazil, in addition to NWP models. Although Roebber (2010, 2015a, 2015b) introduces specialist domain knowledge into the programs’ language, this is not achieved through a formal grammar as in our work. Furthermore, QPF for regions of Brazil is considered a harder problem than the minimum temperature forecasting addressed by Roebber (2010, 2015a, 2015b), due to the more complex processes associated with tropical and subtropical convection.

The current paper extends previous work (Dufek et al., 2013), in which the feasibility of the GE algorithm for the problem of ensemble forecast of rainfall amount was evaluated on three artificial datasets comprising known relationships between three hypothetical meteorological models and two weather patterns. Now, three GP versions are applied to postprocessing short-range ensemble forecasts of daily rainfall amount for several real-world datasets. Furthermore, other meteorological information is incorporated into the grammars in addition to weather patterns.

The main contributions of this paper consist of (i) creating deterministic ensemble 24- and 72-h forecasts of 24-h accumulated precipitation based on GGP and GE algorithms for 317 locations in southern, southeastern and central Brazil during the rainy period from October to February of 2008–2013; (ii) comparing in terms of accuracy the DEFs of quantitative precipitation via GP algorithms with those obtained from three traditional statistical techniques: simple ensemble mean, MSMES, and BMA, and also with the best forecast in the ensemble; (iii) the development and study of six different ensemble forecast grammars to represent the possible solutions to the ensemble forecast problem; (iv) an investigation into the non-linearity of the phenomenon; (v) providing some meteorological information as input attributes in order to enrich the GP forecasting model; (vi) an investigation into the influence of the four major weather patterns in Brazil on the precipitation skill of NWP models; (vii) extracting knowledge from the resulting best solutions, such as the relationships between the input attributes and the occurrence of rainfall, and the classification of the meteorological attributes in order of importance in the ensemble postprocessing.

The frequently used abbreviations are listed in Table 1 in order to facilitate the reading of the paper.

Section snippets

Genetic programming

GP is one of the main areas of evolutionary computation, first devised by Cramer (1985) and greatly developed by Koza (1992). GP is a stochastic optimization technique based on Darwin's theory of evolution by natural selection that evolves a population of computer programs, usually expressed as syntax trees. Whigham (1995) introduced the grammar-based GP (GGP) in order to evolve syntactically correct computer programs in an arbitrary language described by a grammar. Grammatical evolution (GE) (

Data

Daily rainfall amount predicted by several NWP models for 317 locations in the domain between 32.8°S–14.8°S and 57.8°W–39.8°W, during the period from October to February of 2008–2013, came from the Center for Weather Forecast and Climate Studies (CPTEC) of the Brazilian National Institute for Space Research (INPE). The corresponding observed values of daily rainfall amount were derived from a higher quality gridded dataset at a spatial resolution of 0.25° (Rozante et al., 2010). The location of

Conclusions and future work

GP algorithms were explored in order to provide more accurate and reliable deterministic ensemble forecasts of 24-h accumulated precipitation. Three GP versions and six ensemble forecast grammars were applied to 24- and 72-h forecasts at 317 locations in southern, southeastern and central Brazil during the rainy period from October to February of 2008–2013.

The GP deterministic ensemble forecasts provide substantial improvements in accuracy of rainfall amount forecasting relative to two

Acknowledgments

The authors gratefully acknowledge the support provided by CNPq (grants 140680/2010-1, 486103/2012-9, 310778/2013-1 and 502836/2014-8), FAPERJ (grant E26/100.388/2012) and the project IBM/LNCC (B1258534).

References (31)

  • Augusto, D.A., Barbosa, H.J.C., Barreto, A.M.S., Bernardino, H.S., 2011. Evolving numerical constants in grammatical...
  • Bakhshaii, A., et al., 2009. Deterministic ensemble forecasts using gene-expression programming. Weather Forecast.
  • Borda, J.C., 1784. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des...
  • Carvalho, L.M., et al., 2004. The South Atlantic convergence zone: persistence, intensity, form, extreme precipitation and relationships with intraseasonal activity. J. Clim.
  • Chambers, J., Cleveland, W., Kleiner, B., Tukey, P., 1983. Graphical Methods for Data Analysis. Chapman & Hall...
  • Chomsky, N., 1956. Three models for the description of language. IRE Trans. Inf. Theory.
  • Cramer, N.L., 1985. A representation for the adaptive generation of simple sequential programs. In: Proceedings of the...
  • Dufek, A.S., Augusto, D.A., Silva Dias, P.L., Barbosa, H.J.C., 2013. Evaluating the feasibility of grammar-based GP in...
  • Eckel, F.A., et al., 2005. Aspects of effective mesoscale, short-range ensemble forecasting. Weather Forecast.
  • Eiben, A.E., Smith, J.E., 2003. Introduction to Evolutionary Computing....
  • Espinosa, A.M., 2011. Ensemble forecast of rainfall amount in the Rio Grande basin in southeast Brazil during summer...
  • Ferreira, C., 2001. Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst.
  • Glahn, H.R., et al., 1972. The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteorol.
  • Greybush, S.J., et al., 2008. The regime dependence of optimally weighted ensemble model consensus forecasts of surface temperature. Weather Forecast.
  • Koza, J.R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection.