Spatially-explicit forecasting of cyanobacteria assemblages in freshwater lakes by multi-objective hybrid evolutionary algorithms
Introduction
Current economic development in Australia and worldwide goes hand in hand with the global problems of eutrophication and climate change. There is evidence that high nutrient loads, rising temperatures, enhanced stratification, increased residence time and salinisation of drinking water reservoirs and lakes favor the dominance of cyanobacteria (Paerl and Huisman, 2008). Therefore, water industries have to consider the coinciding effects of eutrophication and climate change in their strategies to manage cyanobacterial blooms. However, our ability to predict the occurrence and composition of cyanobacteria blooms has lagged well behind our ability to control total algal biomass. We urgently need advances in our ability to predict and prevent the growth of undesirable algae and other nuisance-forming organisms (Smith and Schindler, 2009). Developing comprehensive lake-based monitoring and early warning systems for water quality and cyanobacteria is therefore the right step forward (Schindler, 2009). Frequent population outbreaks of toxic cyanobacteria in drinking water reservoirs and lakes will have detrimental effects on raw water quality and aquatic biodiversity, and costly technology will be required to sustain safe human water supplies (e.g. Dodds et al., 2009). To assist water industries in making informed decisions and timely adaptations of measures for preventing and controlling the effects of cyanobacteria, more adequate computer models are required (Jackson et al., 2001).
Traditionally, process-based models, which simulate food web dynamics and nutrient cycles over time by means of ordinary differential equations (ODEs) (Pei and Ma, 2002, Arhonditsis and Brett, 2005, Chen et al., 2014), have been widely used. However, these process-based models have several shortcomings. Firstly, process-based models can hardly capture the causal complexity of the phytoplankton community well enough to make accurate daily forecasts of the population dynamics of algal species. Secondly, process-based models are calibrated with annual data for a limited number of years, which constrains their validity to those years. Thirdly, the data demand of process-based models by far exceeds the data operationally available for a lake or a lake site at a given point in time. Therefore, it is unlikely that process-based models will ever be applicable as operational forecasting tools for early warning.
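To make the notion of an ODE-based process model concrete, the following is a minimal sketch of a generic two-compartment nutrient-phytoplankton system integrated with the forward Euler method. It is not taken from any of the cited models; the function name `simulate_np`, the Monod uptake term and all parameter values are assumptions chosen purely for illustration.

```python
from typing import Tuple


def simulate_np(
    n0: float,
    p0: float,
    days: float,
    dt: float = 0.1,
    mu_max: float = 1.0,   # assumed maximum growth rate (1/day)
    k_n: float = 0.5,      # assumed half-saturation constant for nutrient uptake
    loss: float = 0.1,     # assumed loss rate recycled back to the nutrient pool (1/day)
) -> Tuple[float, float]:
    """Integrate a toy nutrient (n) / phytoplankton (p) model with forward Euler.

    dn/dt = -uptake + loss * p
    dp/dt =  uptake - loss * p
    with uptake = mu_max * n / (k_n + n) * p (Monod kinetics).
    """
    n, p = n0, p0
    steps = int(round(days / dt))
    for _ in range(steps):
        uptake = mu_max * n / (k_n + n) * p
        dn = -uptake + loss * p
        dp = uptake - loss * p
        n += dt * dn
        p += dt * dp
    return n, p


if __name__ == "__main__":
    n_end, p_end = simulate_np(n0=2.0, p0=0.1, days=30.0)
    print(f"nutrient={n_end:.3f}, phytoplankton={p_end:.3f}")
```

Even this toy system needs three rate parameters and two initial states. Realistic food web models need many more, which illustrates the calibration and data demands discussed above.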
With rapidly growing amounts of ecological data and progress in computing technology, powerful tools for inductive reasoning and forecasting from complex data have become available. Artificial neural networks (Hornik et al., 1989) approximate complex data with high accuracy by multivariate nonlinear models (Recknagel et al., 1997, Wei et al., 2001, Jeong et al., 2001), but lack an explicit representation of the models extracted from data. In recent years, evolutionary algorithms (EAs) (Holland, 1975) have gained wide popularity in domains such as machine learning, pattern recognition and economic prediction, owing to their self-adaptation, self-organization, self-learning and generality (Bäck et al., 1997). After EA applications for ecological modelling were pioneered by Bobbin and Recknagel (2001), Cao et al. (2006) developed the hybrid evolutionary algorithm (HEA), which is now applied worldwide for non-spatially-explicit modelling of cyanobacteria blooms in lakes and rivers (e.g. Kim et al., 2007, Chan et al., 2007; Recknagel et al., 2014a) as well as for knowledge discovery (Recknagel et al., 2014b, Recknagel et al., 2016). Because the HEA was designed to develop non-spatially-explicit models, the resulting rule models typically have a single output and do not represent spatial or multi-species relationships. However, plankton communities in lakes vary seasonally and spatially with abiotic factors such as advection, thermal stratification and nutrient loads, as well as with biotic factors such as competition, grazing and predation. There is therefore a demand for models that allow spatially-explicit forecasting and can identify local hotspots for seasonal outbreaks of cyanobacteria blooms.
Multi-objective optimization (MOO) techniques (Marler and Arora, 2004; Miettinen, 1999, Deb, 2001, Hanne, 2000) have been widely applied in many fields. The multi-objective hybrid evolutionary algorithm (MOHEA) proposed in this study allows the development of IF-THEN-ELSE rules with multiple outputs, whereby the fitting errors of all outputs are minimized by MOO. The resulting IF-THEN-ELSE rules with multiple outputs provide two benefits: (1) they reveal threshold conditions (IF-conditions) that trigger population outbreaks and are generic for all outputs, and (2) they forecast multiple species at a single site or a single species at multiple sites (see Fig. 1). The functionality of MOHEA is tested for 7-day-ahead forecasting of the cyanobacteria Anabaena and Cylindrospermopsis in Lakes Wivenhoe and Somerset, Queensland (Australia), based on physical and chemical water quality data monitored from 1999 to 2010. The paper validates the forecasting results of different types of multi-output models and discusses ecological relationships revealed by input sensitivity analyses of the models.
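The structure of a multi-output IF-THEN-ELSE rule and the idea of minimizing several fitting errors at once can be sketched as follows. This is an illustrative sketch only: the input names (`water_temp`, `nitrate`), the thresholds, the output equations and the use of Pareto dominance to compare candidate rules are all assumptions, since the excerpt does not specify how MOHEA represents rules or which MOO scheme it applies.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Rule = Callable[[Row], Tuple[float, ...]]


def example_rule(row: Row) -> Tuple[float, float]:
    """Hypothetical rule: one shared IF-condition, two outputs (e.g. two species)."""
    if row["water_temp"] > 24.0 and row["nitrate"] < 0.1:  # assumed thresholds
        return (2.0 * row["water_temp"], 0.5 * row["water_temp"])  # THEN branch
    return (1.0, 1.0)  # ELSE branch


def rmse_per_output(
    rule: Rule, rows: Sequence[Row], observed: Sequence[Tuple[float, ...]]
) -> Tuple[float, ...]:
    """Return one root-mean-square error per output; each error is one objective."""
    n_outputs = len(observed[0])
    squared = [0.0] * n_outputs
    for row, obs in zip(rows, observed):
        pred = rule(row)
        for k in range(n_outputs):
            squared[k] += (pred[k] - obs[k]) ** 2
    return tuple((s / len(rows)) ** 0.5 for s in squared)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if error vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(errors: List[Tuple[float, ...]]) -> List[int]:
    """Indices of candidate rules whose error vectors are not dominated."""
    return [
        i
        for i, e in enumerate(errors)
        if not any(dominates(o, e) for j, o in enumerate(errors) if j != i)
    ]
```

Under these assumptions, an evolutionary search would evaluate each candidate rule with `rmse_per_output` and retain the candidates returned by `pareto_front`. Because the IF-condition is shared by all outputs, any threshold that survives selection applies to every modelled species or site.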
Section snippets
Study sites and data
Different data were utilized for developing the three types of multi-output rule models. Eleven years of water quality data from 1999 to 2010 from Lake Wivenhoe in Queensland, Australia were used to develop single-site multi-species and multi-site single-species models. Measured data from Site30001 of Lake Wivenhoe (see Fig. 2) were used for developing single-site multi-species models and the measured data from sites 30015, 30016 and 30017 were used for developing multi-site single-species
Results for single-site multi-species model
Table 4 and Fig. 6 document the best-performing model developed by 100 runs of MOHEA for 7-day-ahead forecasting of Cylindrospermopsis and Anabaena at the same Site30001 of Lake Wivenhoe. As shown in Table 4, forecasts of Cylind achieved on average a higher R² value (0.43) than forecasts of Anabaena (0.29), which is also reflected by the R² values of 0.54 and 0.40 for the best models for Cylind and Anabaena, respectively. The best model selected all the input variables listed in Table 3 except Silica as
Conclusions and future work
This paper illustrates preliminary results of the multi-objective hybrid evolutionary algorithm (MOHEA) that show the potential for:
- (1) spatially-explicit forecasting of population outbreaks and dispersal at different sites between or within lakes by one model with good accuracy regarding timing and differing accuracy regarding magnitudes of such events,
- (2) revealing threshold conditions that trigger population outbreaks being generic for modelled sites and populations such as the water temperature of
Acknowledgements
This work was supported by the Australian Research Council (ARC Grant no. LP0990453) and the industry partners SA Water and Seqwater.
References (35)
- et al. Eutrophication model for Lake Washington (USA). Part I. Model description and sensitivity analysis. Ecol. Modell. (2005)
- et al. Knowledge discovery for prediction and explanation of blue-green algal dynamics in lakes by evolutionary algorithms. Ecol. Modell. (2001)
- et al. Elucidation and short-term forecasting of microcystin concentrations in Lake Suwa (Japan) by means of artificial neural networks and evolutionary algorithms. Water Res. (2007)
- et al. Adaptation and multiple parameter optimization of the simulation model SALMO as prerequisite for scenario analysis on a shallow eutrophic lake. Ecol. Modell. (2014)
- et al. Multilayer feedforward networks are universal approximators. Neural Netw. (1989)
- et al. Predictive function and rules for population dynamics of Microcystis aeruginosa in the regulated Nakdong River (South Korea), discovered by evolutionary algorithms. Ecol. Modell. (2007)
- et al. River flow forecasting through conceptual models, part I—a discussion of principles. J. Hydrol. (1970)
- et al. Evaluation of quantitative real-time PCR to characterise spatial and temporal variations in cyanobacteria, Cylindrospermopsis raciborskii (Woloszynska) Seenaya et Subba Raju and cylindrospermopsin concentrations in three subtropical Australian reservoirs. Harmful Algae (2010)
- et al. Artificial neural network approach for modelling and prediction of algal blooms. Ecol. Modell. (1997)
- et al. Inductive reasoning and forecasting of population dynamics of Cylindrospermopsis raciborskii in three sub-tropical reservoirs by evolutionary computation. Harmful Algae (2014)
- Model ensemble for the simulation of plankton community dynamics of Lake Kinneret (Israel) induced from in situ predictor variables by evolutionary computation. Environ. Modell. Softw.
- Eutrophication science: where do we go from here? Trends Ecol. Evol.
- Use of artificial neural network in the prediction of algal blooms. Water Res.
- Evolutionary computation: comments on the history and current state. IEEE Trans. Evol. Comput.
- Hybrid evolutionary algorithm for rule set discovery in time-series data to forecast and explain algal population dynamics in two lakes different in morphometry and eutrophication
- Parameter optimization algorithms for evolving rule models applied to freshwater ecosystems. IEEE Trans. Evol. Comput.
- Bootstrap Methods and Their Application
Cited by (17)
Automation of species-specific cyanobacteria phycocyanin fluorescence compensation using machine learning classification
2022, Ecological Informatics
Citation excerpt: Several studies have successfully developed data-driven predictive models for cyanoHABs using water quality, meteorological and/or physical variables (see Rousso et al., 2020 for a review). Most previous research on predicting cyanoHABs has focused on models to predict cell counts or biomass of a particular cyanobacteria species of interest (e.g., Cao et al., 2016; Li et al., 2007; Ndong et al., 2014; Welk et al., 2008) or of the entire cyanobacteria community (e.g., Almuhtaram et al., 2021; Xiao et al., 2017) rather than the dominant cyanobacterial taxon. Predicting the dominant species taxa could provide useful information to support automated species-specific compensation of in-situ f-PC sensors (Bertone et al., 2019; Rousso et al., 2022a).
Chlorophyll and phycocyanin in-situ fluorescence in mixed cyanobacterial species assemblages: Effects of morphology, cell size and growth phase
2022, Water Research
Citation excerpt: Examples of efficient use of data post-hoc include the development and validation of short-term forecasting (e.g., early warning systems) or long-term predictive (e.g., scenario analysis). Some machine learning and numerical models based on fluorescence estimates have recently been developed to forecast CyanoHAB occurrence (Elliott, 2012; Ndong et al., 2014; Xiao et al., 2017), predict dominant species (Cao et al., 2016; Fadel et al., 2017) and understand the main drivers for cyanobacteria succession (Shan et al., 2019; Wei et al., 2001), exemplifying the potential of high-frequency data in improving these models (Hamilton et al., 2015; Rousso et al., 2020). Species-specific calibration of fluorescence sensors must be part of an integrated management plan that considers account site-specific characteristics (e.g., dominant taxa, interferences) and supports informed decision making.
State of knowledge on early warning tools for cyanobacteria detection
2021, Ecological Indicators
Citation excerpt: Almuhtaram et al. (2021b) demonstrated that three algorithms, One-Class Support Vector Machine, elliptic envelope, and Isolation Forest, are able to accurately identify cyanobacterial blooms in four datasets when trained on standardized historical phycocyanin data and tested on more recent data. Similarly, Cao et al. (2016) applied a multi-objective hybrid evolutionary algorithm to successfully identify the onset of cyanobacterial blooms using water quality parameters, and Chen et al. (2015) developed an autoregressive integrated moving average model to predict chlorophyll a concentrations and provide early warning of algal blooms. Thus, these and other machine learning algorithms can potentially be implemented as part of a utility’s harmful algal bloom monitoring strategy.