Predicting water turbidity in a macro-tidal coastal bay using machine learning approaches

https://doi.org/10.1016/j.ecss.2021.107276

Highlights

  • Machine learning models are useful tools for estimating coastal water turbidity.

  • Data reduction is essential to select model inputs.

  • Memory effects are considered in responses of sea surface turbidity to tidal range.

  • The artificial neural network model shows the best performance.

  • The genetic programming model helps to provide physically meaningful predictors.

Abstract

Water turbidity is of particular importance for the diffusion and migration of nutrients and contaminants, biological production, and ecosystem health in turbid coastal areas. The estimation of water turbidity is therefore significant for studies of coastal dynamics. Many factors influence turbidity in complex and nonlinear ways, making accurate estimation a challenging task. In this study, three machine learning models, Artificial Neural Networks (ANN), Genetic Programming (GP), and Support Vector Machine (SVM), are developed for better estimation and prediction of the tidally-averaged sea surface turbidity. Observational data of tides and waves at a macro-tidal coastal bay on the Jiangsu coast, China, are used as model inputs. Through data reduction, it is found that the tidally-averaged sea surface turbidity is determined mainly by the average tidal range of two preceding tidal cycles (those 2 and 3 tidal periods before the present one) and by the tidally-averaged significant wave height of the present tidal cycle. All three machine learning models estimate turbidity successfully, and comparisons of the optimized models indicate that ANN shows the best performance while GP helps to provide physically meaningful predictors. This study provides an example of developing a predictive machine learning algorithm with a limited dataset (94 tidal cycles). The generality of the present predictors can be reinforced with much more data from a variety of coastal environments.

Introduction

In turbid coastal areas, water turbidity is mostly determined by suspended sediment concentration (SSC) (Wilson and Heath, 2019). Variability in water turbidity is of particular importance for the diffusion and migration of nutrients and contaminants, for biological production, and ultimately for the maintenance of coastal ecosystem health (Tian et al., 2009; Kennish, 2019; Díez-Minguito and de Swart, 2020; Ganju et al., 2020; Ge et al., 2020). Coastal water turbidity (and SSC) has been simulated by many process-based models (Lesser et al., 2004; Warner et al., 2008; Amoudry and Souza, 2011; Gao et al., 2018) as well as by statistical models (Amos and Tee, 1989; Mitchell et al., 2012; Wilson and Heath, 2019).

Machine learning is a useful and growing set of tools for extracting information from large datasets, information that can be translated into an understanding of the underlying physics (Afan et al., 2016; Bergen et al., 2019; Ebert-Uphoff et al., 2019; Brunton et al., 2020). Like statistical models, machine learning models are data-driven. However, the advantage of machine learning is that it can automatically search for rules and relationships in the data without any hypothesis on their structure (Goldstein et al., 2019). With increasingly abundant observational datasets and powerful machine learning tools, machine learning is gradually taking on an important role in the study of coastal processes (e.g. Goldstein et al., 2019; Beuzen and Splinter, 2020). Machine learning, process-based, and statistical models can complement each other, and their synergy has the potential to transform the study of coastal dynamics.

Among machine learning models, the Artificial Neural Network (ANN) is one of the most powerful and popular techniques in recent years. ANNs have been widely used in studies of coastal sediment dynamics, where they show outstanding performance in the prediction of near-bed reference SSC (Lin and Namin, 2005; Oehler et al., 2012), wave-induced sand ripple geometry (Yan et al., 2008), and longshore sediment transport rate (van Maanen et al., 2010; Kabiri-Samani et al., 2011).

Genetic Programming (GP) is a “natural selection”-based machine learning method (Koza, 1992) that has been used to study river suspended sediment load (Kisi et al., 2012), wave-induced sand ripple geometry (Goldstein et al., 2013), vegetation effects on current velocity (Tinoco et al., 2015), and sediment settling velocity (Goldstein and Coco, 2014). In contrast to other machine learning techniques, GP does not require the basis function to be specified prior to optimization. GP provides smooth functions that are easy to interpret for physical significance and are particularly suited to discovering physical laws (Goldstein et al., 2013; Tinoco et al., 2015).

Support Vector Machine (SVM) is an emerging machine learning technique for classification, regression, and other learning tasks (Chang and Lin, 2011). It has shown good skill in simulating estuarine algal blooms (Shen et al., 2019), forecasting wave conditions (James et al., 2018), and predicting coastal storm surge (Rajasekaran et al., 2008). However, as noted by Goldstein et al. (2019) and Beuzen and Splinter (2020), this method has seen minimal use in studies of coastal sediment dynamics.

Although machine learning approaches have been widely used in hydrology and oceanography, applications of machine learning models to the simulation of coastal water turbidity (and SSC) are relatively limited. Yoon et al. (2013) used an ANN to predict time-dependent SSC as a function of low-frequency motion, wave-induced motion, and turbulent kinetic energy in the surf zone, where a time lag of 2 s between SSC and the input parameters was set to account for the history of sediment suspension. On a wave-dominated coast, Bhattacharya et al. (2012) used an ANN model to simulate SSC variations in response to bed shear stress, wave height, and wind velocity. Also using an ANN, Seo et al. (2020) established the relationship between SSC and hydrodynamic factors in an estuary, including river discharge, flow velocity, and salinity at the surface and bottom layers.

The effect of memory, which is in essence a measure of the autocorrelation in the time series, is an important consideration in predicting time-dependent SSC or turbidity. In tidal environments, memory effects are likely to be more significant, i.e., turbidity or SSC responds to hydrodynamics with a relaxation time of up to a few days (Postma, 1967; Schoellhamer, 1996; Grabemann et al., 1997).
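As a minimal sketch of how such a memory effect can be examined, the Python snippet below correlates a response series with a driver series shifted by a number of tidal cycles, and then builds lagged predictor rows. It assumes tidally-averaged series with one value per tidal cycle. The function names, the synthetic numbers, and the treatment of the lag-2 and lag-3 tidal ranges as two separate inputs are illustrative assumptions, not the procedure used in this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(driver, response, lag):
    """Correlate response[t] with driver[t - lag]; lag is in tidal cycles."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


def lagged_features(tidal_range, wave_height, lags=(2, 3)):
    """One row per usable cycle t: [range at t - lag for each lag, wave height at t]."""
    start = max(lags)
    return [
        [tidal_range[t - k] for k in lags] + [wave_height[t]]
        for t in range(start, len(tidal_range))
    ]


# Synthetic illustration only (not observations): turbidity is constructed to
# follow the tidal range of two cycles earlier.
tidal_range = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0, 2.0, 4.8, 3.2, 4.1]
wave_height = [0.5] * len(tidal_range)
turbidity = [0.0, 0.0] + [10.0 * r for r in tidal_range[:-2]]

# Lag (in tidal cycles) at which the driver correlates best with the response.
best_lag = max(range(4), key=lambda k: lagged_correlation(tidal_range, turbidity, k))
rows = lagged_features(tidal_range, wave_height)
print(best_lag, rows[0])
```

For this synthetic series the strongest correlation occurs at a lag of two cycles, by construction. With real data, the correlation would be expected to decay gradually over the relaxation time rather than peak at a single lag.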

In light of the previous studies, it is important to further evaluate the performance of machine learning methods for coastal water turbidity (and SSC) estimation (Goldstein et al., 2019). The objective of this study is to apply machine learning models (ANN, GP, and SVM) to predict water turbidity in response to hydrodynamic factors in a macro-tidal coastal bay on the Jiangsu coast, China. The three methods are then compared to determine which performs best and which gives insight into the underlying processes. In addition, the role of memory effects in coastal water turbidity (and SSC) variation is examined.

Section snippets

Study area

Among the numerous tidal basins along the Jiangsu coast, China, the Xiaomiaohong channel, which is situated on the southern flank of the submarine radial sand ridges (Zhang, 2004), is chosen for this study (Fig. 1). The basin is ~38 km long and ~15 km wide. A blind channel with a maximum water depth of more than 10 m lies in the basin, stretching in the WNW-ESE direction and flanked by tidal flats. The channel is ~8 km wide at the mouth and narrows towards the WNW end (Yu and Lu, 1996; Chen et al., 2012).

Field observations

Long-term monitoring of the coastal environment was undertaken at the Xiaomiaohong channel (Fig. 1). A tidal gauge was installed in the middle reach of the blind channel, using a VEGAPULS 61 radar sensor to record the sea surface elevation relative to the 1985 national height datum of China. Waves were measured by an offshore buoy. An OBS3A mounted on the buoy measured the turbidity at 0.9 m below the sea surface. Continuous hourly datasets of water levels, significant wave heights, wave peak

Selection of the optimized models

For each machine learning method, a single model is finally chosen according to its performance, evaluated not only by the error metrics but also by its consistency with partially known physics. Note that the RMSE values for the training and test datasets are very close for all three types of methods; thus, unless otherwise specified, RMSE hereinafter refers to that for the whole dataset.
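The comparison of training, test, and whole-dataset RMSE described above can be sketched as follows. The observed and predicted values are hypothetical placeholders, chosen only to make the arithmetic easy to follow. They are not results from this study.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


# Hypothetical tidally-averaged turbidity values (placeholders, not observations).
obs_train = [40.0, 55.0, 62.0, 48.0]
pred_train = [43.0, 51.0, 62.0, 48.0]
obs_test = [50.0, 70.0]
pred_test = [53.0, 66.0]

rmse_train = rmse(obs_train, pred_train)
rmse_test = rmse(obs_test, pred_test)
# Whole-dataset RMSE pools the squared errors of both subsets.
rmse_all = rmse(obs_train + obs_test, pred_train + pred_test)

print(rmse_train, rmse_test, rmse_all)
```

Because the whole-dataset RMSE pools the squared errors of both subsets, it always lies between the training and test values. When those two are close, as reported here, reporting the pooled value alone loses little information.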

Firstly, the RMSE of the ANN models remains almost steady for HLS of 4, 6, 8, and 10, being equal to 10.68, 10.68,

Discussion

By extracting quantitative relationships from multidimensional observational data, the present machine learning approaches can substantially improve our understanding of seawater turbidity variations, which are an important part of marine sediment dynamics. As suggested by Brunton et al. (2020), the task of machine learning studies is to discover unknown physics and to improve models by incorporating known physics. The machine learning models can be further combined with process-based models,

Conclusions

Through data reduction, it is found that the tidally-averaged sea surface turbidity is determined mainly by the average tidal range of two preceding tidal cycles (those 2 and 3 tidal periods before the present one) and by the tidally-averaged significant wave height of the present tidal cycle. Three machine learning models (ANN, GP, and SVM) are successfully applied to simulate the turbidity in response to variations in tidal ranges and wave heights. Comparisons of the optimized models

CRediT authorship contribution statement

Yunwei Wang: Methodology, Writing - original draft, Writing - review & editing, Funding acquisition. Jun Chen: Investigation. Hui Cai: Investigation. Qian Yu: Conceptualization, Writing - original draft, Funding acquisition. Zeng Zhou: Writing - review & editing, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study is supported by the National Natural Science Foundation of China (41676077, 41676081, 41976156, and 51620105005). We gratefully acknowledge the two anonymous reviewers for their constructive suggestions. All rights to the raw observational data are reserved by the Experiment Center of Harbour and Waterway Engineering and Coastal Ocean Science, Hohai University. The tidal average data for the machine learning models in this paper are available at //data.mendeley.com/datasets/cjbw66w27k/1

References (73)

  • A.R. Kabiri-Samani et al. Application of neural networks and fuzzy logic models to long-shore sediment transport. Appl. Soft Comput. (2011)
  • O. Kisi et al. Suspended sediment modeling using genetic programming and soft computing techniques. J. Hydrol. (2012)
  • G.R. Lesser et al. Development and validation of a three-dimensional morphological model. Coast. Eng. (2004)
  • F. Oehler et al. A data-driven approach to predict suspended-sediment reference concentration under non-breaking waves. Continent. Shelf Res. (2012)
  • S. Rajasekaran et al. Support vector regression methodology for storm surge predictions. Ocean Eng. (2008)
  • J. Shen et al. A data-driven modeling approach for simulating algal blooms in the tidal freshwater of James River in response to riverine nutrient loading. Ecol. Model. (2019)
  • T. Tian et al. Importance of resuspended sediment dynamics for the phytoplankton spring bloom in a coastal marine ecosystem. J. Sea Res. (2009)
  • M. Valipour et al. Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir. J. Hydrol. (2013)
  • Z.B. Wang et al. Influence of the nodal tide on the morphological response of estuaries. Mar. Geol. (2012)
  • J.C. Warner et al. Development of a three-dimensional, regional, coupled wave, current, and sediment-transport model. Comput. Geosci. (2008)
  • M. Xie et al. A validation concept for cohesive sediment transport model and application on Lianyungang Harbor, China. Coast. Eng. (2010)
  • B. Yan et al. Prediction of sand ripple geometry under waves using an artificial neural network. Comput. Geosci. (2008)
  • Z.M. Yaseen et al. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. (2015)
  • H.-D. Yoon et al. Prediction of time-dependent sediment suspension in the surf zone using artificial neural network. Coast. Eng. (2013)
  • D. Zhang et al. Modeling and simulating of reservoir operation using the artificial neural network, support vector regression, deep learning algorithm. J. Hydrol. (2018)
  • C.L. Amos et al. Suspended sediment transport processes in Cumberland Basin, Bay of Fundy. J. Geophys. Res. (1989)
  • L.O. Amoudry et al. Deterministic coastal morphological and sediment transport modeling: a review and discussion. Rev. Geophys. (2011)
  • S. Araghinejad
  • K.J. Bergen et al. Machine learning for data-driven discovery in solid Earth geoscience. Science (2019)
  • T. Beuzen et al. Controls of variability in berm and dune storm erosion. J. Geophys. Res.: Earth Surf. (2019)
  • B. Bhattacharya et al. Spatio-temporal prediction of suspended sediment concentration in the coastal zone using an artificial neural network and a numerical model. J. Hydroinf. (2012)
  • G.J. Bowden et al. Optimal division of data for neural network models in water resources applications. Water Resour. Res. (2002)
  • S.L. Brunton et al. Machine learning for fluid mechanics. Annu. Rev. Fluid Mech. (2020)
  • C.-C. Chang et al. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (2011)
  • Y.-W. Chang et al. Training and testing low-degree polynomial data mappings via linear SVM. J. Mach. Learn. Res. (2010)
  • K. Chen et al. Hydrodynamic mechanism of morphology revolution of the Xiaomiaohong tidal channel in radial sand ridges, Jiangsu province. Acta Sci. Nat. Univ. Sunyatseni (2012)