A Comparison Between Regression Models and Genetic Programming for Predictions of Chlorophyll-a Concentrations in Northern Lakes

Environmental Modeling & Assessment

Abstract

Chlorophyll-a (chl-a) concentrations are often used as a proxy for water quality problems and for phytoplankton blooms. Available chl-a models range from simple phosphorus loading models to complex regression and dynamic models. Here, multiple regression models were compared with genetic programming (GP) techniques for predicting chl-a concentrations across a wide range of 104 Swedish lakes. The independent variables were lake area, mean depth, iron, latitude, ammonium, nitrite + nitrate, pH, phosphate, Secchi depth, silicon, temperature, total phosphorus, total nitrogen and total organic carbon. GP is a method based on Darwinian evolution theory: a program tests different mathematical equations, iterating on and improving each equation using fundamental ideas from evolution theory to increase its predictive power. A good correspondence was found between the multiple regression and the GP modelling approaches. Results from GP were in some cases more accurate than those from multiple regression; however, the best model was produced by multiple regression and used total phosphorus and total nitrogen concentrations together with latitude as independent variables. Because GP gave no significant improvement in predictive power, multiple regression methods are recommended for predicting chl-a concentrations, as these models tend to be less complex and the modelling approach is easier to use. These findings are an important note for limnologists and modelling managers developing future models of chl-a concentrations in lakes.
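The evolutionary loop described in the abstract (generate candidate equations, score them, keep the better ones, recombine and mutate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single predictor `x`, the operator set, the population settings and the synthetic target curve are all assumptions chosen for illustration, and no lake data are involved.

```python
import math
import operator
import random

# Assumed function set: binary operators that evolved equations may use.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(depth, rng):
    """Grow a random expression tree over the variable 'x' and constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    """Evaluate an expression tree at the value x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant

def subtrees(tree):
    """Yield every subtree, including the tree itself."""
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])

def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in subtrees(tree))

def mse(tree, data):
    """Mean squared error of the tree on (x, y) pairs; non-finite errors count as inf."""
    total = 0.0
    for x, y in data:
        d = evaluate(tree, x) - y
        total += d * d
    err = total / len(data)
    return err if math.isfinite(err) else float("inf")

def graft(tree, donor, rng):
    """Return a copy of tree with one randomly chosen subtree replaced by donor."""
    if not isinstance(tree, tuple) or rng.random() < 0.2:
        return donor
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, graft(left, donor, rng), right)
    return (op, left, graft(right, donor, rng))

def evolve(data, generations=30, pop_size=100, max_size=31, seed=0):
    """Evolve expression trees toward low error on data."""
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: mse(t, data))
        survivors = population[: pop_size // 4]  # truncation selection keeps the best
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(survivors), rng.choice(survivors)
            if rng.random() < 0.5:
                child = graft(a, rng.choice(list(subtrees(b))), rng)  # crossover
            else:
                child = graft(a, random_tree(2, rng), rng)  # mutation
            # Reject oversized offspring so equations stay readable.
            children.append(child if size(child) <= max_size else a)
        population = survivors + children
    return min(population, key=lambda t: mse(t, data))

# Synthetic target curve (an assumption for illustration, not study data).
XS = [0.5 * i for i in range(1, 16)]
DATA = [(x, 0.5 * x + 0.1 * x * x) for x in XS]

best = evolve(DATA)
print(best, mse(best, DATA))
```

Because the best quarter of each generation survives unchanged, the error of the best equation never increases from one generation to the next. The size cap is one simple way to limit the equation complexity that the abstract raises as a drawback of GP models.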

Acknowledgments

The authors would like to thank one anonymous reviewer and the associate editor who greatly helped in improving this article. The authors would also like to thank Gesa Weyhenmeyer and Roger Herbert for valuable comments. The Swedish University of Agricultural Sciences and the Swedish Meteorological and Hydrological Institute are also acknowledged for making data available on their web pages.

Corresponding author

Correspondence to Peter H. Dimberg.

Appendix 1

The models produced using the regression and GP modelling techniques are given in Eqs. 1 to 5, where (a) denotes a regression model and (b) denotes a GP model. The p value is below 0.001 for Eqs. 1, 3 and 5 and below 0.05 for Eqs. 2 and 4. Cluster elimination was applied for Eqs. 4 and 5.

$$ \log(\mathrm{Chl}) = 0.58 + 1.23\cdot\log(\mathrm{TP}) \tag{1a} $$

$$ \mathrm{Chl} = 0.02996 + \left(5.728\cdot 10^{-7}\right)\cdot\mathrm{TP}^4 + 0.5124\cdot\mathrm{TP} \tag{1b} $$

$$ \log(\mathrm{Chl}) = 2.17 + 1.4\cdot\log(\mathrm{TP}) - 0.75\cdot\log(\mathrm{TN}) - 0.03\cdot\mathrm{Lat} + 0.3\cdot\log(\mathrm{TOC}) - 0.17\cdot\log(\mathrm{NH_4}) + 0.16\cdot\log(\mathrm{Dm}) \tag{2a} $$

$$ \mathrm{Chl} = 1.623 + 0.0006625\cdot\mathrm{TOC} - 12.54\cdot\mathrm{TP} - 0.02813\cdot\mathrm{Dm} - 0.02747\cdot\mathrm{TN} - 0.02813\cdot\mathrm{TP}\cdot\mathrm{NH_4} + 1.195\cdot\mathrm{TP}\cdot\mathrm{Lat} - 0.02743\cdot\mathrm{TP}\cdot\mathrm{Lat}^2 + 0.0001878\cdot\mathrm{TP}\cdot\mathrm{Lat}^3 + 0.0006625\cdot\mathrm{NH_4}^2 + 0.0001294\cdot\mathrm{Dm}\cdot\mathrm{TP}\cdot\mathrm{NH_4}^2 \tag{2b} $$

$$ \log(\mathrm{Chl}) = 3.53 + 1.4\cdot\log(\mathrm{TP}) - 0.72\cdot\log(\mathrm{TN}) - 0.04\cdot\mathrm{Lat} \tag{3a} $$

$$ \mathrm{Chl} = 1.258 + 0.006834\cdot\mathrm{TP}\cdot\left(\mathrm{TP}+\mathrm{TN}\right) - 0.05404\cdot\mathrm{Lat} + 0.02702\cdot\mathrm{TP}\cdot\mathrm{Lat} - 0.0001351\cdot\mathrm{TP}\cdot\mathrm{Lat}\cdot\left(\mathrm{TN}+\mathrm{Lat}\right) \tag{3b} $$

$$ \log(\mathrm{Chl}) = -1.28 + 1.07\cdot\log(\mathrm{TP}) + 0.04\cdot\mathrm{Temp} + 0.07\cdot\log(\mathrm{Fe}) - 0.09\cdot\log(\mathrm{NO_2NO_3}) + 0.21\cdot\log(\mathrm{Dm}) \tag{4a} $$

$$ \mathrm{Chl} = 3.794 - \left(1.94\cdot 10^{-5}\right)\cdot\mathrm{TP}^2\cdot\mathrm{NO_2NO_3}^2 + 0.0001868\cdot\mathrm{Dm}\cdot\mathrm{TP}^2\cdot\mathrm{NO_2NO_3} + 0.4688\cdot\mathrm{TP} + 0.2344\cdot\mathrm{Temp} \tag{4b} $$

$$ \log(\mathrm{Chl}) = -1.06 + 1.08\cdot\log(\mathrm{TP}) + 0.04\cdot\mathrm{Temp} \tag{5a} $$

$$ \mathrm{Chl} = 3.953 + \left(6.014\cdot 10^{-7}\right)\cdot\mathrm{TP}^4 + 0.4866\cdot\mathrm{TP} + 0.2433\cdot\mathrm{Temp} \tag{5b} $$
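To show how a regression model on the log scale and a GP model on the linear scale are evaluated side by side, the sketch below codes Eqs. 3a and 3b as printed above, with Eq. 3a back-transformed to a chl-a value. Two things are assumptions, since this excerpt does not state them: that `log` means the base-10 logarithm, and that the input values for the example lake (which are hypothetical) are in the units the models were fitted with. The function names are likewise illustrative.

```python
import math

def chl_eq3a(tp, tn, lat):
    """Eq. 3a (regression): predicts log(Chl), back-transformed here to Chl.

    Assumes log is base 10, which the excerpt does not state.
    """
    log_chl = 3.53 + 1.4 * math.log10(tp) - 0.72 * math.log10(tn) - 0.04 * lat
    return 10.0 ** log_chl

def chl_eq3b(tp, tn, lat):
    """Eq. 3b (GP): predicts Chl directly on the linear scale."""
    return (
        1.258
        + 0.006834 * tp * (tp + tn)
        - 0.05404 * lat
        + 0.02702 * tp * lat
        - 0.0001351 * tp * lat * (tn + lat)
    )

# Hypothetical lake (illustrative values only, not taken from the study).
tp, tn, lat = 20.0, 500.0, 60.0

pred_regression = chl_eq3a(tp, tn, lat)
pred_gp = chl_eq3b(tp, tn, lat)
print(f"Eq. 3a: {pred_regression:.2f}  Eq. 3b: {pred_gp:.2f}")
```

The regression form needs only three coefficients and an intercept, and its log terms require strictly positive TP and TN. The GP form mixes products of the same predictors, which illustrates the difference in model complexity noted in the abstract.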

Cite this article

Dimberg, P.H., Olofsson, C.J. A Comparison Between Regression Models and Genetic Programming for Predictions of Chlorophyll-a Concentrations in Northern Lakes. Environ Model Assess 21, 221–232 (2016). https://doi.org/10.1007/s10666-015-9480-4

  • DOI: https://doi.org/10.1007/s10666-015-9480-4
