Spatially-explicit forecasting of cyanobacteria assemblages in freshwater lakes by multi-objective hybrid evolutionary algorithms

doi:10.1016/j.ecolmodel.2016.09.024

Ecological Modelling

Volume 342, 24 December 2016, Pages 97-112

Highlights

• A new multi-objective hybrid evolutionary algorithm to forecast toxic cyanobacteria.
• Spatially-explicit forecasting of cyanobacteria assemblages in multiple sites/lakes.
• Highly understandable rules to indicate water conditions favoured by cyanobacteria.
• A decision tool for water managers to give early warning of cyanobacteria blooms.

Abstract

This paper proposes a novel multi-objective hybrid evolutionary algorithm (MOHEA) that allows spatially-explicit modelling of local outbreaks and dispersal of population density. The MOHEA was tested by simultaneously modelling two cyanobacteria populations at one lake site, the same population in two different lakes, and the same population at three different sites of one lake. All experiments with MOHEA used water quality time series and abundances of Anabaena and Cylindrospermopsis monitored in the sub-tropical Lakes Wivenhoe and Somerset in Queensland (Australia) from 1999 to 2010. The results demonstrate the capacity of MOHEA to determine generic rules that: (1) reveal crucial thresholds for outbreaks of cyanobacteria blooms, and (2) perform spatially-explicit 7-day-ahead forecasting of the timing and magnitude of bloom events.

Introduction

The current economic development in Australia and worldwide goes side by side with the global problems of eutrophication and climate change. There is evidence that high nutrient loads, rising temperatures, enhanced stratification, increased residence time and salinisation of drinking water reservoirs and lakes favor the dominance of cyanobacteria (Paerl and Huisman, 2008). Therefore, water industries have to consider the coinciding effects of eutrophication and climate change in their strategies to manage cyanobacterial blooms. However, our ability to predict the occurrence and composition of cyanobacteria blooms has lagged well behind our ability to control total algal biomass. We urgently need advances in our ability to predict and prevent the growth of undesirable algae and other nuisance-forming organisms (Smith and Schindler, 2009). Developing comprehensive lake-based monitoring and early warning systems for water quality and cyanobacteria is therefore the right step forward (Schindler, 2009). Frequent population outbreaks of toxic cyanobacteria in drinking water reservoirs and lakes will have detrimental effects on raw water quality and aquatic biodiversity, and costly technology will be required to sustain safe human water supplies (e.g. Dodds et al., 2009). To assist water industries in making informed decisions and timely adaptations of measures for preventing and controlling the effects of cyanobacteria, more adequate computer models are required (Jackson et al., 2001).

Traditionally, process-based models, which simulate food web dynamics and nutrient cycles over time by using ordinary differential equations (ODEs) (Pei and Ma, 2002, Arhonditsis and Brett, 2005, Chen et al., 2014), have been widely used. However, these process-based models have some shortcomings. Firstly, process-based models can hardly capture the causal complexity of the phytoplankton community well enough to make accurate daily forecasts of the population dynamics of algal species. Secondly, process-based models are calibrated for a limited number of years with annual data, which constrains their validity to those years. Thirdly, the data demand of process-based models by far exceeds the operationally available data of a lake or a lake site at a certain point in time. Therefore, it is unlikely that process-based models will ever be applicable as operational forecasting tools for early warning.

With rapidly growing amounts of ecological data and progress in computing technology, powerful tools for inductive reasoning and forecasting from complex data have become available. Artificial neural networks (Hornik et al., 1989) approximate complex data with high accuracy by multivariate nonlinear models (Recknagel et al., 1997, Wei et al., 2001, Jeong et al., 2001), but lack an explicit representation of the models extracted from data. In recent years, evolutionary algorithms (EAs) (Holland, 1975) have gained wide popularity in domains such as machine learning, pattern recognition and economic prediction, due to their characteristics of self-adaptation, self-organization, self-learning and generality (Bäck et al., 1997). After EA applications for ecological modelling were pioneered by Bobbin and Recknagel (2001), Cao et al. (2006) developed the hybrid evolutionary algorithm (HEA), which is now applied worldwide for non-spatially-explicit modelling of cyanobacteria blooms in lakes and rivers (e.g. Kim et al., 2007, Chan et al., 2007; Recknagel et al., 2014a) as well as for knowledge discovery (Recknagel et al., 2014b, Recknagel et al., 2016). Because the HEA was designed to develop non-spatially-explicit models, the resulting rule models typically had a single output and did not represent spatial or multi-species relationships. However, plankton communities in lakes vary seasonally and spatially with abiotic factors like advection, thermal stratification and nutrient loads, as well as with biotic factors like competition, grazing and predation. Therefore, there is a demand for models allowing spatially-explicit forecasting that can identify local hotspots for seasonal outbreaks of cyanobacteria blooms.

Multi-objective optimization (MOO) techniques (Marler and Arora, 2004; Miettinen, 1999, Deb, 2001, Hanne, 2000) have been widely applied in many fields. The multi-objective hybrid evolutionary algorithm (MOHEA) proposed in this study allows the development of IF-THEN-ELSE rules with multiple outputs, whereby the fitting errors of all outputs are minimized by MOO. The resulting IF-THEN-ELSE rules with multiple outputs provide the benefit of: (1) revealing threshold conditions (IF-conditions) that trigger population outbreaks and are generic for all outputs, and (2) forecasting multiple species at a single site and single species at multiple sites (see Fig. 1). The functionality of MOHEA is tested for 7-day-ahead forecasting of the cyanobacteria Anabaena and Cylindrospermopsis in the Lakes Wivenhoe and Somerset, Queensland (Australia), based on physical and chemical water quality data monitored from 1999 to 2010. The paper validates the forecasting results of different types of multi-output models and discusses ecological relationships revealed by input sensitivity analyses of the models.
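The idea of a multi-output IF-THEN-ELSE rule whose per-output fitting errors are compared jointly can be illustrated with a minimal Python sketch. Everything named here is a hypothetical illustration rather than the MOHEA implementation, whose rule grammar and optimization details are not given in this excerpt: the `Rule` class, the input names `WT` and `TP`, the threshold values, the constant THEN/ELSE expressions, the toy data, the choice of RMSE as fitting error and the use of Pareto dominance to compare candidate rules are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]  # one sampling date: input name -> measured value


@dataclass
class Rule:
    """IF condition THEN outputs ELSE outputs, one expression per output."""
    condition: Callable[[Row], bool]
    then_branch: Sequence[Callable[[Row], float]]
    else_branch: Sequence[Callable[[Row], float]]

    def predict(self, row: Row) -> List[float]:
        # A single shared IF-condition selects the branch for all outputs.
        branch = self.then_branch if self.condition(row) else self.else_branch
        return [expr(row) for expr in branch]


def rmse_per_output(rule: Rule, rows: Sequence[Row],
                    targets: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Return one root-mean-square error per output (one objective each)."""
    n_out = len(targets[0])
    squared = [0.0] * n_out
    for row, observed in zip(rows, targets):
        predicted = rule.predict(row)
        for k in range(n_out):
            squared[k] += (predicted[k] - observed[k]) ** 2
    return tuple(math.sqrt(s / len(rows)) for s in squared)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


# Toy data (hypothetical values, two outputs such as two species or two sites).
rows = [{"WT": 20.0, "TP": 0.02}, {"WT": 28.0, "TP": 0.05}]
targets = [(1.0, 2.0), (10.0, 20.0)]

low = [lambda r: 1.0, lambda r: 2.0]     # ELSE: background abundance
high = [lambda r: 10.0, lambda r: 20.0]  # THEN: outbreak abundance

rule_a = Rule(lambda r: r["WT"] > 25.0, high, low)  # threshold separates the rows
rule_b = Rule(lambda r: r["WT"] > 30.0, high, low)  # threshold never triggers

errors_a = rmse_per_output(rule_a, rows, targets)
errors_b = rmse_per_output(rule_b, rows, targets)
print(errors_a, errors_b, dominates(errors_a, errors_b))
```

In this sketch a candidate rule is kept only if no other candidate dominates it, so a rule cannot improve the fit of one output at the expense of all others without that trade-off being visible in the error vector.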

Section snippets

Study sites and data

Different data were utilized for developing the three types of multi-output rule models. Eleven years of water quality data from 1999 to 2010 from Lake Wivenhoe in Queensland, Australia were used to develop single-site multi-species and multi-site single-species models. Measured data from Site30001 of Lake Wivenhoe (see Fig. 2) were used for developing single-site multi-species models and the measured data from sites 30015, 30016 and 30017 were used for developing multi-site single-species

Results for single-site multi-species model

Table 4 and Fig. 6 document the best-performing model developed for 7-day-ahead forecasting of Cylindrospermopsis (Cylind) and Anabaena at Site30001 of Lake Wivenhoe by 100 runs of MOHEA. As shown in Table 4, forecasts of Cylind achieved a higher average R² value (0.43) than forecasts of Anabaena (0.29); the same pattern holds for the best models, with R² values of 0.54 for Cylind and 0.40 for Anabaena. The best model selected all the input variables listed in Table 3 except Silica as
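How a 7-day-ahead forecast can be scored per output is sketched below in Python. The excerpt does not state which definition of R² was used, so this sketch assumes the squared Pearson correlation between observed and forecast values; the helper names `lagged_pairs` and `r_squared`, the assumption of daily time steps and the toy series are hypothetical.

```python
from typing import List, Sequence, Tuple


def lagged_pairs(inputs: Sequence[float], target: Sequence[float],
                 lead: int = 7) -> List[Tuple[float, float]]:
    """Pair the input on day t with the target on day t + lead.

    Assumes both series are sampled (or interpolated) at daily steps.
    """
    if len(inputs) != len(target):
        raise ValueError("inputs and target must have the same length")
    return [(inputs[t], target[t + lead]) for t in range(len(inputs) - lead)]


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observations and forecasts."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0.0 or var_p == 0.0:
        return 0.0  # correlation is undefined for a constant series
    return (cov * cov) / (var_o * var_p)


# Toy series (hypothetical): ten daily values, forecast lead of seven days.
days = list(range(10))
pairs = lagged_pairs(days, days, lead=7)
print(pairs)

# One R² per output, e.g. one per species at a single site.
observed_by_output = {"Cylind": [1.0, 2.0, 3.0], "Anabaena": [1.0, 2.0, 3.0]}
forecast_by_output = {"Cylind": [2.0, 4.0, 6.0], "Anabaena": [3.0, 2.0, 1.0]}
scores = {name: r_squared(observed_by_output[name], forecast_by_output[name])
          for name in observed_by_output}
print(scores)
```

Note that the squared correlation rewards correct timing and shape but is insensitive to systematic over- or under-estimation of magnitude, which is one reason timing and magnitude accuracy are usually assessed separately.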

Conclusions and future work

This paper illustrates preliminary results of the multi-objective hybrid evolutionary algorithm (MOHEA) that show the potential for:

(1) spatially-explicit forecasting of population outbreaks and dispersal at different sites between or within lakes by one model with good accuracy regarding timing and differing accuracy regarding magnitudes of such events,
(2) revealing threshold conditions that trigger population outbreaks being generic for modelled sites and populations such as the water temperature of

Acknowledgements

This work was supported by the Australian Research Council (ARC Grant no: LP0990453) and the industry partners SA Water and Seqwater.

References (35)

G.B. Arhonditsis et al. Eutrophication model for Lake Washington (USA). Part I. Model description and sensitivity analysis. Ecol. Modell. (2005)
J. Bobbin et al. Knowledge discovery for prediction and explanation of blue-green algal dynamics in lakes by evolutionary algorithms. Ecol. Modell. (2001)
W.S. Chan et al. Elucidation and short-term forecasting of microcystin concentrations in Lake Suwa (Japan) by means of artificial neural networks and evolutionary algorithms. Water Res. (2007)
Q. Chen et al. Adaptation and multiple parameter optimization of the simulation model SALMO as prerequisite for scenario analysis on a shallow eutrophic lake. Ecol. Modell. (2014)
K. Hornik et al. Multilayer feedforward networks are universal approximators. Neural Netw. (1989)
D.-J. Kim et al. Predictive function and rules for population dynamics of Microcystis aeruginosa in the regulated Nakdong River (South Korea), discovered by evolutionary algorithms. Ecol. Modell. (2007)
J.E. Nash et al. River flow forecasting through conceptual models, part I - a discussion of principles. J. Hydrol. (1970)
P.T. Orr et al. Evaluation of quantitative real-time PCR to characterise spatial and temporal variations in cyanobacteria, Cylindrospermopsis raciborskii (Woloszynska) Seenaya et Subba Raju and cylindrospermopsin concentrations in three subtropical Australian reservoirs. Harmful Algae (2010)
F. Recknagel et al. Artificial neural network approach for modelling and prediction of algal blooms. Ecol. Modell. (1997)
F. Recknagel et al. Inductive reasoning and forecasting of population dynamics of Cylindrospermopsis raciborskii in three sub-tropical reservoirs by evolutionary computation. Harmful Algae (2014)
F. Recknagel et al. Model ensemble for the simulation of plankton community dynamics of Lake Kinneret (Israel) induced from in situ predictor variables by evolutionary computation. Environ. Modell. Softw. (2014)
V.H. Smith et al. Eutrophication science: where do we go from here? Trends Ecol. Evol. (2009)
B. Wei et al. Use of artificial neural network in the prediction of algal blooms. Water Res. (2001)
T. Bäck et al. Evolutionary computation: comments on the history and current state. IEEE Trans. Evol. Comput. (1997)
H. Cao et al. Hybrid evolutionary algorithm for rule set discovery in time-series data to forecast and explain algal population dynamics in two lakes different in morphometry and eutrophication
H. Cao et al. Parameter optimization algorithms for evolving rule models applied to freshwater ecosystems. IEEE Trans. Evol. Comput. (2014)
A.C. Davison et al. Bootstrap Methods and Their Application (1997)

