Predicting water turbidity in a macro-tidal coastal bay using machine learning approaches
Introduction
In turbid coastal areas, water turbidity is mostly determined by suspended sediment concentration (SSC) (Wilson and Heath, 2019). Variability in water turbidity is of particular importance for the diffusion and migration of nutrients and contaminants, for biological production, and ultimately for the maintenance of coastal ecosystem health (Tian et al., 2009; Kennish, 2019; Díez-Minguito and de Swart, 2020; Ganju et al., 2020; Ge et al., 2020). Coastal water turbidity (and SSC) has been simulated by many process-based models (Lesser et al., 2004; Warner et al., 2008; Amoudry and Souza, 2011; Gao et al., 2018) and by statistical models (Amos and Tee, 1989; Mitchell et al., 2012; Wilson and Heath, 2019).
Machine learning is a useful and growing set of tools for extracting information from large datasets, information that can be translated into an understanding of the underlying physics (Afan et al., 2016; Bergen et al., 2019; Ebert-Uphoff et al., 2019; Brunton et al., 2020). Like statistical models, machine learning models are data-driven. However, the advantage of machine learning is that it can automatically search for rules and relationships in the data without any hypothesis on their structure (Goldstein et al., 2019). With the growing availability of observational datasets and powerful machine learning tools, machine learning is playing an increasingly important role in the study of coastal processes (e.g. Goldstein et al., 2019; Beuzen and Splinter, 2020). Machine learning, process-based, and statistical models can complement each other, and their synergy has the potential to transform the study of coastal dynamics.
Among machine learning models, the Artificial Neural Network (ANN) has been one of the most powerful and popular techniques in recent years. ANNs have been widely used in studies of coastal sediment dynamics, where they show outstanding performance in the prediction of near-bed reference SSC (Lin and Namin, 2005; Oehler et al., 2012), wave-induced sand ripple geometry (Yan et al., 2008), and longshore sediment transport rate (van Maanen et al., 2010; Kabiri-Samani et al., 2011).
Genetic Programming (GP) is a “natural selection”-based machine learning method (Koza, 1992) that has been used to study river suspended sediment load (Kisi et al., 2012), wave-induced sand ripple geometry (Goldstein et al., 2013), vegetation effects on current velocity (Tinoco et al., 2015), and sediment settling velocity (Goldstein and Coco, 2014). In contrast to other machine learning techniques, GP does not require the basis function to be specified prior to optimization. GP provides smooth functions whose physical significance is easy to interpret, and it is particularly suited to discovering physical laws (Goldstein et al., 2013; Tinoco et al., 2015).
Support Vector Machine (SVM) is an emerging machine learning technique for classification, regression, and other learning tasks (Chang and Lin, 2011). It has shown good skill in simulating estuarine algal blooms (Shen et al., 2019), forecasting wave conditions (James et al., 2018), and predicting coastal storm surge (Rajasekaran et al., 2008). However, as suggested by Goldstein et al. (2019) and Beuzen and Splinter (2020), this method has seen little use in studies of coastal sediment dynamics.
Although machine learning approaches have been widely used in hydrology and oceanography, applications of machine learning models to the simulation of coastal water turbidity (and SSC) are relatively limited. Yoon et al. (2013) used an ANN to predict time-dependent SSC in the surf zone as a function of low-frequency motion, wave-induced motion, and turbulent kinetic energy, with a 2 s time lag between the input parameters and SSC to account for the history of sediment suspension. On a wave-dominated coast, Bhattacharya et al. (2012) used an ANN model to simulate SSC variations in response to bed shear stress, wave height, and wind velocity. Also using an ANN, Seo et al. (2020) established the relationship between SSC and hydrodynamic factors in an estuary, including river discharge, flow velocity, and salinity in the surface and bottom layers.
It is noted that the effect of memory, which is in essence a measure of the autocorrelation in the time series, is an important consideration in predicting time-dependent SSC or turbidity. In tidal environments, memory effects are likely more significant, i.e., turbidity or SSC responds to hydrodynamics with a relaxation time of up to a few days (Grabemann et al., 1997; Postma, 1967; Schoelhamer, 1996).
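The link between memory and autocorrelation can be illustrated with a minimal sketch. The series below is synthetic: it is generated by a first-order autoregressive process, which is an assumption made here for illustration only and is not the model or the data used in this study. The function name `autocorrelation` and the coefficient `phi` are likewise hypothetical.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` at lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean before correlating
    denom = np.dot(x, x)                  # lag-0 sum of squares (normalization)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic stand-in for a turbidity record (assumption, not observed data):
# each value retains a fraction `phi` of the previous value plus random forcing.
rng = np.random.default_rng(0)
n_steps = 2000
phi = 0.9                                 # hypothetical memory coefficient
noise = rng.normal(size=n_steps)
turbidity = np.zeros(n_steps)
for t in range(1, n_steps):
    turbidity[t] = phi * turbidity[t - 1] + noise[t]

acf = autocorrelation(turbidity, max_lag=5)
print(np.round(acf, 2))
```

A slowly decaying autocorrelation with increasing lag indicates that past states still carry information about the present one, which is why lagged inputs may be worth including as predictors.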
In light of the previous studies, it is important to further evaluate the performance of machine learning methods for coastal water turbidity (and SSC) estimation (Goldstein et al., 2019). The objective of this study is to apply machine learning models (ANN, GP, and SVM) to predict water turbidity in response to hydrodynamic factors in a macro-tidal coastal bay on the Jiangsu coast, China. The three methods are then compared to determine which performs best and which gives insight into the underlying processes. In addition, the role of memory effects in coastal water turbidity (and SSC) variation is examined.
Study area
Among the numerous tidal basins along the Jiangsu coast, China, the “Xiaomiaohong channel”, which is situated on the southern flank of the submarine radial sand ridges (Zhang, 2004), is chosen for this study (Fig. 1). The basin is ~38 km long and ~15 km wide. A blind channel with a maximum water depth of more than 10 m lies in the basin, stretching in the WNW-ESE direction and flanked by tidal flats. The channel is ~8 km wide at the mouth and narrows toward the WNW end (Yu and Lu, 1996; Chen et al., 2012).
Field observations
Long-term monitoring of the coastal environment was undertaken at the Xiaomiaohong channel (Fig. 1). A tide gauge was installed in the middle reach of the blind channel, using a VEGAPULS 61 radar sensor to record the sea surface elevation relative to the 1985 national height datum of China. Waves were measured by an offshore buoy. An OBS3A mounted on the buoy measured the turbidity at 0.9 m below the sea surface. Continuous hourly datasets of water levels, significant wave heights, wave peak
Selection of the optimized models
For each machine learning method, a single model is finally chosen according to its performance, evaluated not only by the error metrics but also by its consistency with partially known physics. Note that the RMSE values for the training and test datasets are very close for all three types of methods; thus, unless otherwise specified, RMSE hereinafter refers to that of the whole dataset.
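As a minimal sketch of the error metric referred to above, the root-mean-square error can be computed separately for the training subset, the test subset, and the whole dataset. The observed and predicted values and the train/test split below are hypothetical placeholders chosen for illustration; they are not results from this study.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error between two equally sized sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Hypothetical observed and modelled turbidity values (illustration only).
obs = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
pred = np.array([12.0, 18.0, 33.0, 37.0, 52.0, 58.0])

# Hypothetical split: first four samples for training, last two for testing.
train = slice(0, 4)
test = slice(4, 6)

rmse_train = rmse(obs[train], pred[train])
rmse_test = rmse(obs[test], pred[test])
rmse_all = rmse(obs, pred)
print(rmse_train, rmse_test, rmse_all)
```

When the training and test values are close, as reported here, the whole-dataset RMSE is a reasonable single summary; a test RMSE much larger than the training RMSE would instead suggest overfitting.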
Firstly, the RMSE of the ANN models remains almost constant for HLS of 4, 6, 8, and 10, equal to 10.68, 10.68,
Discussion
By extracting quantitative relationships from multidimensional observational data, the present machine learning approaches can substantially improve our understanding of seawater turbidity variations, which are an important part of marine sediment dynamics. As suggested by Brunton et al. (2020), the task of machine learning studies is to discover unknown physics and to improve models by incorporating known physics. The machine learning models can be further combined with process-based models,
Conclusions
Through data reduction, it is found that the tidal average sea surface turbidity is mostly determined by two factors: the average tidal range of two preceding tidal cycles (those 2 and 3 tidal periods before the present one) and the tidal average significant wave height of the present tidal cycle. Three machine learning models (ANN, GP, and SVM) are successfully applied to simulate the turbidity in response to variations in tidal ranges and wave heights. Comparisons of the optimized models
CRediT authorship contribution statement
Yunwei Wang: Methodology, Writing - original draft, Writing - review & editing, Funding acquisition. Jun Chen: Investigation. Hui Cai: Investigation. Qian Yu: Conceptualization, Writing - original draft, Funding acquisition. Zeng Zhou: Writing - review & editing, Funding acquisition.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This study is supported by the National Natural Science Foundation of China (41676077, 41676081, 41976156, and 51620105005). We gratefully acknowledge the two anonymous reviewers for their constructive suggestions. All rights to the raw observational data are reserved by the Experiment Center of Harbour and Waterway Engineering and Coastal Ocean Science, Hohai University. The tidal average data for the machine learning models in this paper are available at //data.mendeley.com/datasets/cjbw66w27k/1
References (73)
- et al. Past, present and prospect of an Artificial Intelligence (AI) based model for sediment transport prediction. J. Hydrol. (2016)
- et al. Machine learning and coastal processes
- et al. Long-term model of planimetric and bathymetric evolution of a tidal lagoon. Continent. Shelf Res. (2010)
- Twenty-five years with OBS sensors: the good, the bad, and the ugly. Continent. Shelf Res. (2006)
- et al. Influence of suspended sediment front on nutrients and phytoplankton dynamics off the Changjiang Estuary: a FVCOM-ERSEM coupled model experiment. J. Mar. Syst. (2020)
- Suspended-sediment response to semidiurnal and fortnightly tidal variations in a mesotidal estuary: Columbia River, U.S.A. Mar. Geol. (1983)
- et al. Prediction of wave ripple characteristics using genetic programming. Continent. Shelf Res. (2013)
- et al. A review of machine learning applications to coastal sediment transport and morphodynamics. Earth Sci. Rev. (2019)
- et al. Behaviour of turbidity maxima in the Tamar (UK) and Weser (FRG) estuaries. Estuarine Coast. Shelf Sci. (1997)
- et al. A machine learning framework to forecast wave conditions. Coast. Eng. (2018)