Evolutionary computational intelligence algorithm coupled with self-tuning predictive model for water quality index determination

doi:10.1016/j.jhydrol.2020.124974

Journal of Hydrology

Volume 587, August 2020, 124974

https://doi.org/10.1016/j.jhydrol.2020.124974

Highlights

• New hybrid intelligent models are developed for the water quality index (WQI).
• Non-linear input selection with a non-tuned learning model is built for modeling WQI.
• The Kinta River, located in a tropical environment, is selected as the case study.
• The stand-alone modeling schema is validated against the proposed hybrid models.
• The proposed models showed a superior prediction capacity for WQI.

Abstract

Anthropogenic activities affect water bodies and result in a drastic reduction of river water quality (WQ). The development of a reliable intelligent model for evaluating the suitability of water remains a challenging task facing hydro-environmental engineers. The current study investigated the applicability of Extreme Gradient Boosting (XGB) and Genetic Programming (GP) for obtaining feature importance; the selected input variables were then fed into the predictive model, the Extreme Learning Machine (ELM), for the prediction of the water quality index (WQI). The stand-alone modeling schema is compared with the proposed hybrid models, where the optimum variables are supplied to the GP, XGB, linear regression (LR), stepwise linear regression (SWLR) and ELM models. The WQ data were obtained from the Department of Environment (DoE) (Malaysia), and the results are evaluated in terms of the determination coefficient (R²) and root mean square error (RMSE). The results demonstrated that the hybrid GPELM and XGBELM models outperformed the standalone GP, XGB, and ELM models for the prediction of WQI at the Kinta River basin. A comparison of the hybrid models showed that GPELM (RMSE = 3.441 training and RMSE = 3.484 testing) had better predictive skill than XGBELM (RMSE = 3.606 training and RMSE = 3.816 testing), decreasing the RMSE by 5% and 9% for training and testing, respectively. Although regressions (LR and SWLR) are often proposed as reference models, when combined with computational intelligence they still provided satisfactory results in this study. The proposed hybrid GPELM and XGBELM models improved the prediction accuracy with a minimum number of input variables and can therefore serve as reliable predictive tools for WQI at the Kinta River basin.
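As a worked check of the figures quoted in the abstract, the short sketch below shows how RMSE and R² can be computed and how the relative RMSE reduction of GPELM over XGBELM follows from the four reported values. The function names are hypothetical, and computing R² as the squared Pearson correlation is an assumption, since the snippet does not state which definition the authors used.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Determination coefficient as the squared Pearson correlation (assumed definition)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)

def rmse_reduction_pct(rmse_reference, rmse_candidate):
    """Percentage by which the candidate model lowers the reference RMSE."""
    return 100.0 * (rmse_reference - rmse_candidate) / rmse_reference

# RMSE values reported in the abstract (XGBELM as reference, GPELM as candidate).
train_gain = rmse_reduction_pct(3.606, 3.441)
test_gain = rmse_reduction_pct(3.816, 3.484)
print(f"training: {train_gain:.1f}%  testing: {test_gain:.1f}%")
```

The two reductions come out at roughly 4.6% and 8.7%, which round to the 5% and 9% stated in the abstract.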

Introduction

The continuous reduction in WQ remains a top global concern, especially in terms of industrial, agricultural and domestic utilization. Several factors, including geological, weather, hydrological and industrial sources (such as textile, laundry, and pharmaceutical) and natural phenomena (physical processes), have an adverse effect on the quality of water (Abba and Elkiran, 2017, Ke et al., 2015). In the literature, water pollution is described as the presence of detrimental substance(s) in water at a level that causes problems to living organisms. Therefore, monitoring water features to ensure its quality is of paramount importance (Mahmoodabadi and Rezaei Arshad, 2018). WQ can be considered as the set of chemical, physical and biological properties of water that can be utilized in forecasting the water quality (Sharma and Kansal, 2011). WQ helps in determining the concentration of chemicals in the water. Thus, assessment of WQ can be described as the analysis of the chemical, physical and biological properties of water. Several studies have been carried out in the area of water quality (Singh, 2017). Following this, the WQI provides a single parameter describing the water quality, reducing the huge number of parameters to a simpler expression and thereby making the monitoring and interpretation of WQ effortless (Gazzaz et al., 2012). However, WQ is dependent on the ecosystem as well as human usages, such as industrial pollution, sewage, and wastewater; and both international and national agencies are committed to pollution control and WQ analysis (Bharti, 2011). In fact, no single variable can express the WQ efficiently; it is normally determined by measuring multiple WQ parameters (Hameed et al., 2017). For this purpose, a large amount of data is collected by the monitoring team, and it must be easily interpretable for decision-makers and the general public. In this regard, numerous WQIs have been developed, which are defined based on WQ criteria and important parameters (Bharti, 2011).

WQI is a widely used measure in different parts of the world to solve problems of data management and to evaluate the successes as well as failures of management strategies for improving water quality. While numerous indices are used in summarizing WQ data into an understandable format, WQI is derived from numerous water characterization parameters to signify the water quality level (Abbasi, 2002). In essence, the general approach to computing WQI involves passing numerous water quality parameters into computable functions and logical expressions that rate the wellbeing of a water body with a single number (Castilla-herná, 2014). The computed value is then rated from very bad to excellent based on an existing rating scale, which makes WQ understandable to non-technical water managers, political decision-makers and the general public (Abbasi, 2002). Over the decades, various WQIs have been defined globally to represent the overall WQ of a particular area efficiently, including the United States National Sanitation Foundation WQI (NSFWQI) (Horton, 1965, Bharti, 2011), the Canadian Council of Ministers of the Environment WQI (CCMEWQI) (Sharma, 2002, Khan et al., 2004), the British Columbia Water Quality Index (BCWQI), and the Oregon WQI (OWQI) (Debels et al., 2005). These indices are based on a comparison of the WQ parameters to regulatory standards and give a single value to the water quality of a source (Abba et al., 2019).

Despite the several paradigms that can be used to assess the quality of water, the Malaysian Department of Environment (DoE) approved the use of a unified water quality index in 1974 for analysing and ranking the level of contamination and pollution of Malaysian rivers. This recommended adaptation of WQI is called the DoE-WQI and serves as the method of choice for calculating the WQ index of local Malaysian rivers (Gazzaz et al., 2012, Yaseen et al., 2018a, Yaseen et al., 2018b, Yaseen et al., 2018c). The DoE-WQI gives standard WQ guidelines used to classify the WQ of Malaysian local rivers into five categories depending on their appropriateness for different usages, including irrigation, domestic supplies and fish culture, water supply, recreational use, and livestock drinking. The procedure used for computing the water quality in Malaysia is almost the same as that employed in (Muhammad et al., 2015, Gazzaz et al., 2012, Yaseen et al., 2018a, Yaseen et al., 2018b, Yaseen et al., 2018c). However, the analysis and calculation of this method generally require significant amounts of time and effort, which could lead to accidental errors during the sub-index computations; nevertheless, the method has proved to be highly effective and successful in practice. For the detailed manual calculation of the WQ sub-index for Malaysia, refer to (Hameed et al., 2017, Gazzaz et al., 2012).
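To make the aggregation step concrete, the sketch below combines six sub-indices into a single index and maps it to one of five classes. The weights and class boundaries are the values commonly quoted for the DoE-WQI in the literature; they do not appear in the text above and should be checked against the cited manuals (Hameed et al., 2017, Gazzaz et al., 2012) before any real use. The function names, the treatment of values lying exactly on a boundary, and the sample sub-index values are hypothetical.

```python
# Sub-index weights commonly quoted for the DoE-WQI (assumption, not from the text):
# dissolved oxygen, biochemical and chemical oxygen demand, ammoniacal nitrogen,
# suspended solids and pH.
DOE_WEIGHTS = {"DO": 0.22, "BOD": 0.19, "COD": 0.16, "AN": 0.15, "SS": 0.16, "pH": 0.12}

# Lower bounds of classes I to IV as commonly quoted (assumption); below 31.0 is class V.
CLASS_LOWER_BOUNDS = [(92.7, "I"), (76.5, "II"), (51.9, "III"), (31.0, "IV")]

def doe_wqi(sub_indices):
    """Weighted sum of the six sub-indices, each expected on a 0 to 100 scale."""
    missing = set(DOE_WEIGHTS) - set(sub_indices)
    if missing:
        raise ValueError(f"missing sub-indices: {sorted(missing)}")
    return sum(weight * sub_indices[name] for name, weight in DOE_WEIGHTS.items())

def wqi_class(wqi):
    """Map an index value to a class label; boundary handling is an assumption."""
    for lower, label in CLASS_LOWER_BOUNDS:
        if wqi > lower:
            return label
    return "V"

# Hypothetical sub-index values for one sampling event.
sample = {"DO": 85.0, "BOD": 78.0, "COD": 74.0, "AN": 69.0, "SS": 90.0, "pH": 97.0}
index_value = doe_wqi(sample)
print(round(index_value, 2), wqi_class(index_value))
```

The sub-indices themselves are obtained from the measured concentrations through piecewise rating equations, which is the time-consuming part the paragraph above refers to and which is deliberately left out of this sketch.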

Recent research shows an exponential increase in the use of Artificial Intelligence (AI), which represents an alternative, attractive, quick and direct computing tool for water quality modeling (Tiyasha et al., 2020, Sahoo and Patra, 2020, Gaya et al., 2020, Yasin and Karim, 2020, Karim and Kamsani, 2020). AI has the ability to minimize error, effort and computation time (Bhagat et al., 2019). Artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector regression (SVR) are among the popular AI modeling methods developed for highly complicated processes and for handling data nonlinearity (Barzegar et al., 2016). Several researchers have employed different combinations of AI-based models to predict the WQI (Wagener et al., 2019, Yaseen et al., 2018b). For example, Nourani et al. (2013) proposed the application of an ANN to monitor treated water quality. The results indicated that ANN has the potential to perform better than the conventional WQ method. Emamgholizadeh et al. (2014) applied ANN and ANFIS models simultaneously to predict the WQ variables of the Karoon River water, and the results indicated that the ANN model had a better predictive ability than the ANFIS-based models. In this regard, Barzegar et al. (2016) investigated the performance of standalone ANN and ANFIS, and hybrid wavelet-ANN and wavelet-ANFIS, for the prediction of monthly averaged water salinity of the Aji-Chay River in northwest Iran, where the results showed that ANFIS performance was superior to ANN. Hameed et al. (2017) examined two different models, namely Radial Basis Function NN (RBFNN) and Back Propagation NN (BPNN). The results from both models showed reliable performance with improved accuracy.

Similarly, an ANN model was employed to estimate the WQI at the Langat River Basin, Malaysia, and the outcomes were compared with traditional multilinear regression analysis. The obtained results demonstrated the effectiveness of the ANN model for the prediction of WQI (Juahir et al., 2004). Gazzaz et al. (2012) studied the potential of an ANN model to predict the WQI in Malaysia; the results demonstrated that the ANN model offered a reliable alternative for WQI computation and forecasting. Mohammadpour et al. (2014) studied and compared the potential of Support Vector Machine (SVM), BPNN and RBFNN techniques to predict the WQI in a wetland. For this purpose, different WQ variables at 17 monitoring points were evaluated. The results showed that SVM and FFBP outperformed RBF and therefore emerged as successful and reliable models for the prediction of WQI. The outcomes also indicated that the methods can reliably reduce time and computational burdens. More recently, Yaseen et al. (2018a, 2018b, 2018c) employed different types of AI-based models to predict the WQI. Abba et al. (2019) presented a performance comparison of non-linear models based on RBFNN and Hammerstein-Weiner (HW) techniques and traditional linear modelling techniques based on Generalized Linear Regression (GLR) and Autoregressive Integrated Moving Average (ARIMA) for the Agra station of the Yamuna river, India, and the Kinta river in Malaysia. The simulation results established that the non-linear models outperformed the other models.

Based on the above literature, it is evident that studies in the field of WQ and WQI conducted around the world have shown the reliability of computational intelligence models (Tiyasha et al., 2020). Although there is a significant increase in the use of AI models in the field of WQI prediction, many standalone AI models produce unsatisfactory results due to limitations in the identification of the appropriate input parameters or the selection of the internal model parameters, particularly in applications involving very dynamic hydrological processes. Hence, to alleviate these shortcomings, it is necessary to develop an appropriate model structure and to apply a proper technique for selecting suitable variables. Such selection has been shown to strongly influence the prediction ability for complex non-linear processes and is needed to provide a reliable alternative to manual WQ computation. The objective of this study is therefore to investigate the potential of XGB and GP in obtaining the feature importance, after which the selected inputs are used in the predictive model (i.e. ELM). For comparison purposes, the proposed models are compared with the stand-alone schema, where the optimum combination of input variables is applied to the XGB, GP, ELM and Linear Regression (LR) models. The primary motivation behind the current study is to explore the abilities of the GP and XGB approaches for choosing the most dominant related variables. In this regard, hybrid models are established to increase the prediction accuracy of the non-tuned data intelligent models for mimicking the river WQ pattern. It is important to mention that, to the best of the authors’ knowledge, since the development of the XGB algorithm no research or technical literature has been published on its application in WQI prediction or as a method of input selection. Because of its distinctive and outstanding features, this study used the XGB algorithm in both input selection and WQI modelling. The remaining sections of the paper are arranged as follows: Section 2 presents the applied methodology for the single models and the proposed modeling schema adopted in the study, Section 3 describes the case study and the data, the results and their analysis are given in Section 4, and the paper ends with a conclusion in Section 5.
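The second stage of the hybrid scheme, training a non-tuned ELM on a reduced set of inputs, can be illustrated with a minimal sketch. It assumes the standard single-hidden-layer ELM formulation (random, fixed hidden weights and output weights obtained in one least-squares step); the class name TinyELM, the small ridge term added for numerical stability, the synthetic data and the chosen column indices are all hypothetical and are not taken from the study.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

class TinyELM:
    """Single-hidden-layer network whose hidden weights are random and never updated."""

    def __init__(self, n_inputs, n_hidden=12, ridge=1e-4, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        # Sigmoid activation of each hidden node for one input row.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, X, y):
        # Output weights from the regularised normal equations (H'H + ridge*I) beta = H'y.
        H = [self._hidden(x) for x in X]
        k = len(self.b)
        hth = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(k)]
        self.beta = solve_linear_system(hth, hty)
        return self

    def predict(self, X):
        return [sum(bw * h for bw, h in zip(self.beta, self._hidden(x))) for x in X]

# Synthetic data: three candidate inputs, of which only columns 0 and 2 drive the target.
rng = random.Random(42)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
target = [math.sin(r[0]) + 0.5 * r[2] for r in rows]

selected = [0, 2]  # hypothetical outcome of an importance ranking step
X_sel = [[r[c] for c in selected] for r in rows]
model = TinyELM(n_inputs=len(selected)).fit(X_sel, target)
pred = model.predict(X_sel)
train_rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target))
print(f"training RMSE on selected inputs: {train_rmse:.4f}")
```

Because only the output weights are solved for, there is no iterative tuning of the hidden layer, which is the sense in which the ELM is described as non-tuned.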

Section snippets

Extreme gradient boosting (XGB)

The XGB algorithm has lately been dominating supervised machine learning applications, such as regression and classification (Pradhan and Sameen, 2020). It is an improvement of the gradient boosting technique introduced by Friedman (2001). It possesses the salient features of gradient tree boosting methods such as computational efficiency, high processing ability, and learning speed. It uses a more precise approximation to find the best decision tree model to achieve a higher speed and better
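The idea of using a boosted tree ensemble to rank inputs can be sketched in a few lines. The example below is plain gradient boosting on squared error with one-split trees (stumps), not the XGB algorithm itself: it omits XGB's regularised objective, second-order approximation and system-level optimisations. The function names, the synthetic data and the gain-based importance measure are assumptions made for illustration only.

```python
import random

def best_stump(X, residuals):
    """Find the single-feature threshold split that most reduces squared error."""
    n = len(X)
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = X[order[pos - 1]][f], X[order[pos]][f]
            if lo == hi:
                continue  # no threshold can separate identical values
            right_sum = total - left_sum
            # Reduction in the sum of squared errors achieved by this split.
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2, left_sum / pos, right_sum / (n - pos))
    return best

def boost(X, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps to successive residuals and accumulate split gain per feature."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None:
            break
        gain, f, thr, left, right = stump
        importance[f] += gain
        stumps.append((f, thr, left, right))
        pred = [p + learning_rate * (left if x[f] <= thr else right)
                for p, x in zip(pred, X)]
    total_gain = sum(importance) or 1.0
    return base, stumps, [g / total_gain for g in importance]

# Synthetic data: feature 0 acts non-linearly, feature 1 weakly, feature 2 not at all.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [4 * x[0] ** 2 + 0.2 * x[1] for x in X]

base, stumps, importance = boost(X, y)
ranking = sorted(range(3), key=lambda f: importance[f], reverse=True)
print("normalised importance:", [round(v, 3) for v in importance])
print("ranking:", ranking)
```

In a selection step of the kind the paper describes, the top-ranked features would be kept and passed on to the downstream predictive model.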

Case study and data description

The Kinta River spans an area of about 2500 km² and has a length of about 100 km within Kinta district. It supports farming activities, residential areas and industries (Fig. 2) across its three subdivisions (i.e. upstream, undulating and downstream). Stations 2PK22 and 2PK24 are located upstream, stations 2PK34 and 2PK25 are at the center of the river, and stations 2PK55 and 2PK19 are downstream. The Kinta River is largely utilized for forestry and recreation along the upstream. The

Application results and analysis

The accuracy of intelligent models is affected by the use of numerous inputs. Several input selection methods have been reported in the hydro-environmental literature; popular among them are correlation, auto-correlation and principal component analysis. Nevertheless, these methods are mostly suited to linear input/output relationships (Hadi and Tombul, 2018). This study employed two novel nonlinear input variable selection tools (GP and XGB) for choosing the most dominant related
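The limitation of linear selection criteria noted above can be seen in a small numerical sketch. It compares the Pearson correlation with a simple nonlinear dependence measure, the correlation ratio computed from binned means, on a purely quadratic relationship. The correlation ratio is used here only as a stand-in for a nonlinear criterion; it is neither GP nor XGB, and the function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_ratio(x, y, n_bins=8):
    """Share of the variance of y explained by the mean of y within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for xi, yi in zip(x, y):
        idx = min(int((xi - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(yi)
    mean_y = sum(y) / len(y)
    between = sum(len(v) * (sum(v) / len(v) - mean_y) ** 2 for v in bins.values())
    total = sum((yi - mean_y) ** 2 for yi in y)
    return between / total

# A symmetric grid of inputs and a purely quadratic response.
x = [-1 + 2 * i / 200 for i in range(201)]
y = [xi ** 2 for xi in x]

linear_score = pearson(x, y)
nonlinear_score = correlation_ratio(x, y)
print(f"Pearson r: {linear_score:.3f}  correlation ratio: {nonlinear_score:.3f}")
```

Here the input fully determines the output, yet the linear coefficient is essentially zero while the binned measure is high, so a selection rule based on linear correlation alone would discard a dominant variable.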

Conclusion

The study explored the abilities of two evolutionary nonlinear input variable selection techniques (GP and XGB) coupled with a non-tuned data intelligent model (ELM) for modeling the WQI in the Kinta River basin, Malaysia. For this purpose, different trials were evaluated, and the best-performing one was chosen as the optimum input combination. The established modeling schema of the study was to use GP and XGB as nonlinear selection tools in addition to using them as the main prediction models.

CRediT authorship contribution statement

S.I. Abba: Conceptualization, Validation, Investigation, Visualization. Sinan Jasim Hadi: Software, Validation, Investigation, Visualization. Saad Sh. Sammen: Conceptualization, Validation, Investigation, Visualization. Sinan Q. Salih: Validation, Investigation, Visualization. R.A. Abdulkadir: . Quoc Bao Pham: . Zaher Mundher Yaseen: Conceptualization, Validation, Investigation, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors acknowledge the Department of Environment (DoE) (Malaysia) for making the historical water quality data available.

References (75)

S.I. Abba et al. Effluent prediction of chemical oxygen demand from the wastewater treatment plant using artificial neural network application. Proc. Comput. Sci. (2017)
Z.-Y. Chen et al. Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China. Atmos. Environ. (2019)
A. Danandeh Mehr et al. Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique. J. Hydrol. (2013)
G. Elkiran et al. Multi-step ahead modelling of river water quality parameters using ensemble artificial intelligence-based approach. J. Hydrol. (2019)
N.M. Gazzaz et al. Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors. Mar. Pollut. Bull. (2012)
G.-B. Huang et al. Extreme learning machine: theory and applications. Neurocomputing (2006)
Gao Huang et al. Trends in extreme learning machines: a review. Neural Networks (2015)
X. Ke et al. Assessing water quality by ratio of the number of dominant bacterium species between surface/subsurface sediments in Haihe River Basin. Mar. Pollut. Bull. (2015)
K. Khosravi et al. Meteorological data mining and hybrid data-intelligence models for reference evaporation simulation: a case study in Iraq. Comput. Electron. Agric. (2019)
M.S. Lachniet et al. Use of correlation and stepwise regression to evaluate physical controls on the stable isotope values of Panamanian rain and surface waters. J. Hydrol. (2006)
M. Mahmoodabadi et al. Long-term evaluation of water quality parameters of the Karoun River using a regression approach and the adaptive neuro-fuzzy inference system. Mar. Pollut. Bull. (2018)
E. Olyaie et al. A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware River. Geosci. Front. (2017)
R. Peirovi Minaee et al. Calibration of water quality model for distribution networks using genetic algorithm, particle swarm optimization, and hybrid methods. MethodsX (2019)
N. Qian et al. Predicting heat transfer of oscillating heat pipes for machining processes based on extreme gradient boosting algorithm. Appl. Therm. Eng. (2020)
M.S. Samsudin et al. Comparison of prediction model using spatial discriminant analysis for marine water quality index in mangrove estuarine zones. Mar. Pollut. Bull. (2019)
P. Shi et al. Prediction of dissolved oxygen content in aquaculture using clustering-based Softplus Extreme Learning Machine. Comput. Electron. Agric. (2019)
J.B. Shukla et al. Mathematical modeling and analysis of the depletion of dissolved oxygen in eutrophied water bodies affected by organic pollutants. Nonlinear Anal. Real World Appl. (2008)
Z.M. Yaseen et al. Predicting compressive strength of lightweight foamed concrete using extreme learning machine model. Adv. Eng. Software (2018)
X. Yu et al. Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting. J. Hydrol. (2020)
S.I. Abba et al. Modelling of Uncertain System: A comparison study of Linear and Non-Linear Approaches (2019)
Abbasi, S.A., 2002. Water Quality Indices. State of the Art Report, National Institute of Hydrology, Scientific...
H.A. Abdulwahab et al. An Enhanced Version of Black Hole Algorithm Via Levy Flight for Optimization and Data Clustering Problems (2019)
R. Barzegar et al. Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran. Stochast. Environ. Res. Risk Assess. (2016)
S.K. Bhagat et al. Development of artificial intelligence for modeling wastewater heavy metal removal: state of the art, application assessment and possible future research. J. Clean. Prod. (2019)
N. Bharti. Water quality indices used for surface water vulnerability assessment. Int. J. Environ. Sci. (2011)
Castilla-herná, P., 2014. Water Quality of a Reservoir and Its Major Tributary Located in East-Central Mexico 6,...
T. Chen et al. Xgboost: a scalable tree boosting system
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., 2015. Xgboost: extreme gradient boosting. R package version...
W.-B. Chen et al. Water quality modeling in reservoirs using multivariate linear regression and two neural network models. Adv. Artif. Neural Syst. (2015)
D.N. Moriasi et al. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE (2007)
P. Debels et al. Evaluation of water quality in the Chillán River (Central Chile) using physicochemical parameters and a modified Water Quality Index. Environ. Monit. Assess. (2005)
M. Ehteram et al. Efficiency evaluation of reverse osmosis desalination plant using hybridized multilayer perceptron with particle swarm optimization. Environ. Sci. Pollut. Res. (2020)
S. Emamgholizadeh et al. Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models. Int. J. Environ. Sci. Technol. (2014)
J.H. Friedman. Greedy function approximation: a gradient boosting machine. Ann. Stat. (2001)
M. Fu et al. Deep learning data-intelligence model based on adjusted forecasting window scale: application in daily streamflow. Simulation (2020)
Elkiran, G., V.N., Abba, S.I., Abdullahi, J., 2018. Artificial intelligence-based approaches for multi-station...
Gaya, M.S., Abba, S.I., Abdu, A.M., Tukur, A.I., 2020. Estimation of water quality index using artificial intelligence...

Cited by (99)

A combination of multivariate statistics and machine learning techniques in groundwater characterization and quality forecasting
2024, Geosystems and Geoenvironment
Globally, the quality of groundwater has proven to have been affected by some natural and human activities in recent years. To ensure there is good drinking water (Sustainable Development Goal 6.3, there is a need to elucidate the groundwater quality status of the area of interest. The groundwater in the northwestern parts of Ghana is not yet well characterized. Hence, this study employed a multi-method approach of hydrochemistry, water quality index (WQI), multivariate statistics, and machine models: multiple linear regression (MLR), decision tree regression (DTR), random forest regression (RFR), and artificial neural network (ANN), are combined in the characterization and prediction of the water quality in the area. They are robust in providing conclusions on groundwater assessment that can be relied upon for decision-making processes regarding groundwater usage and monitoring. Except for NO₃⁻ and TDS exceeding their standard levels in 22 and 2 locations, respectively, the other physicochemical parameters are within acceptable limits. The groundwater is generally good for domestic usage based on the WQI, with 79.2% of excellent to good waters. The groundwater evolved from Na-type, Cl-type, and Cl(SO₄)-Ca(Mg) facies. Agricultural activities are the main source of human impact on the groundwater. Silicate mineral dissolution and ion exchange processes are the natural processes that affect groundwater mineralization, with mineral dissolution being the dominant process. Based on the performance metrics: MAE, MSE and RMSE of the ML methods considered in the WQI forecasting, the order of performance of the models is ANN > RFR > DTR > MLR, with the following respective R² values 0.9974, 0.9193, 0.8966 and 0.8886.
Predicting geogenic groundwater arsenic contamination risk in floodplains using interpretable machine-learning model
2024, Environmental Pollution
Long-term exposure to geogenic arsenic (As)-contaminated groundwater poses a severe threat to public health problems. Generally, elevated As concentrations have been observed with high amounts of ammonium in groundwater of floodplains. An extreme gradient boosting algorithm was conducted to develop a probability model based on hydrogeochemical data, which predicted the occurrence rates of groundwater As on a regional scale. Results showed that concentrations of NH₄⁺, Eh, K, Cl⁻, SO₄²⁻, and NO₃⁻ were powerful predictive variables of As exposure. The model revealed the co-enrichment of As with NH₄⁺, suggesting that the mineralization of nitrogen-containing organic matter promoted the reduction of As-bearing iron-oxides. The predicted distribution of high-As groundwater showed high consistency with known spatial distribution of As contamination, and the model also accurately predicted As concentrations in Jiangbei Plain of China and typical As-affected floodplains of Southeast Asia. The model can serve as a low-cost and rapid virtual sensor for detecting As concentrations in private or newly drilled wells, thereby providing critical information for informed management decisions, environmental protection and public health safety.
Bootstrap approach for quantifying the uncertainty in modeling of the water quality index using principal component analysis and artificial intelligence
2024, Journal of the Saudi Society of Agricultural Sciences
Collecting and analyzing data on surface water across extensive areas is a challenging, time-consuming and expensive. Developing predictive models that offer high accuracy, reliability and require minimal parameters can potentially reduce the time and expense associated with water quality monitoring and management. While most existing studies have focused on estimating point prediction of water quality without approximating the predictive interval (PI) of the estimation, this study aimed to develop a prediction tool to estimate the PI of water quality indexes (WQIs) in the lower Mun river basin. This was achieved by employing principal component analysis (PCA), artificial neural networks (ANN), and bootstrap methods to enhance accuracy, robustness, and reliability with the minimum number of water quality parameters. PCA was initially used to select 4 parameters for the WQI. Subsequently, ANN regression was employed to develop a new WQI to enhance data evaluation efficiency. The testing results of the proposed model revealed its excellent performance compared to other models in terms of accuracy (root mean square error (RMSE) = 0.86, correlation coefficient (R) = 0.993, scatter index (SI) = 0.019, mean absolute error (MAE) = 0.709, and mean bias error (MBE) = −0.003). Additionally, the proposed model incorporated the bootstrap method to quantify uncertainty and create a PI, resulting in a high coverage rate exceeding 95%. By integrating statistical techniques with artificial intelligence and quantifying uncertainty, it is possible to effectively evaluate water quality, provide more accurate and reliable indexes. This study can be an effective tool for decision makers and planners seeking precise data on water quality to develop water resource management strategies.
Applying Machine Learning to investigate metal isotope variations at the watershed scale: A case study with lithium isotopes across the Yukon River Basin
2023, Science of the Total Environment
Constraining the multiple climatic, lithological, topographic, and geochemical variables controlling isotope variations in large rivers is often challenging with standard statistical methods. Machine learning (ML) is an efficient method for analyzing multidimensional datasets, resolving correlated processes, and exploring relationships between variables simultaneously. We tested four ML algorithms to elucidate the controls of riverine δ⁷Li variations across the Yukon River Basin (YRB). We compiled (n = 102) and analyzed new samples (n = 21), producing a dataset of 123 river water samples collected across the basin during the summer including δ⁷Li and extracted environmental, climatological, and geological characteristics of the drainage area for each sample from open-access geospatial databases. The ML models were trained, tuned, and tested under multiple scenarios to avoid issues such as overfitting. Random Forests (RF) performed best at predicting δ⁷Li across the basin, with the median model explaining 62 % of the variance. The most important variables controlling δ⁷Li across the basin are elevation, lithology, and past glacial coverage, which ultimately influence weathering congruence. Riverine δ⁷Li has a negative dependence on elevation. This reflects congruent weathering in kinetically-limited mountain zones with short residence times. The consistent ranking of lithology, specifically igneous and metamorphic rock cover, as a top feature controlling riverine δ⁷Li modeled by the RFs is unexpected. Further study is required to validate this finding. Rivers draining areas that were extensively covered during the last glacial maximum tend to have lower δ⁷Li due to immature weathering profiles resulting in short residence times, less secondary mineral formation and therefore more congruent weathering. We demonstrate that ML provides a fast, simple, visualizable, and interpretable approach for disentangling key controls of isotope variations in river water. 
We assert that ML should become a routine tool, and present a framework for applying ML to analyze spatial metal isotope data at the catchment scale.
A novel interval decomposition correlation particle swarm optimization-extreme learning machine model for short-term and long-term water quality prediction
2023, Journal of Hydrology
Water quality prediction plays a crucial role in pollution treatment. However, inaccurate long-term prediction resulting from complex information patterns and insufficient feature extraction may lead to unnecessary environmental costs. In this study, a novel Interval Decomposition Correlation Particle Swarm Optimization-Extreme Learning Machine (IDCPSO-ELM) model is proposed to improve long-term water quality prediction ability. We employ Multivariate Variational Mode Decomposition (MVMD), Variational Mode Decomposition (VMD), Sliding Correlation and Permutation Entropy to reconstruct features to reduce complexity. Seasonal and Trend decomposition using Loess-Empirical Wavelet Transform decomposition (STL-EWT) is applied to target variables to improve feature extraction ability. Particle Swarm Optimization-Extreme Learning Machine (PSO-ELM) is used to predict high-frequency components based on the reconstructive features, and Back Propagation Neural Network-Extreme Learning Machine (BPNN-ELM) is employed to predict the trend and lowest-frequency components. Grey Wolf Optimization Algorithm (GWO) is used to combine these results. Six stations with high potential pollution threats in the Haihe River basin of Beijing from 2021 to 2022 are taken into model competition. The results show that: (1) In short-term prediction, average MAPE, RMSE and NSE of IDCPSO-ELM model reach 0.0934, 0.0410 and 0.8061, respectively, which are better than Genetic Algorithm-Elman Network (GA-ELMAN), Cuckoo Search Algorithm-Back Propagation Neural Network (CS-BPNN), PSO-ELM and Sparrow Search Algorithm-Kernel Extreme Learning Machine (SSA-KELM) short-term prediction models. 
(2) In long-term prediction, average MAPE, RMSE and NSE of IDCPSO-ELM model reach 0.1114, 0.0420 and 0.8268, respectively, which are better than the Radial Basis Function Neural Network (RBFNN), Grey Wolf Optimization-Support Vector Machine Regression (GWO-SVR), Whale Optimization Algorithm-Deep Extreme Learning Machine (WOA-DELM), Wavelet Transform decomposition-GWO-SVR-BPNN (WT-GWO-SVR-BPNN) and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-PSO-ELM-Long Short-Term Memory (CEEMDAN-PSO-ELM-LSTM) long-term prediction models. IDCPSO-ELM model is considered competitive and promising for long-term water quality prediction, particularly in areas with high potential pollution threats.
Deep learning with PID residual elimination network for time-series prediction of water quality in aquaculture industry
2023, Computers and Electronics in Agriculture
Time-series prediction of water quality is the most critical component of water quality monitoring in the aquaculture industry. Accurate multi-step ahead prediction of water quality can provide reasonable support for production decisions and reduce breeding risks. However, because of the nonlinearity and strong coupling between water quality factors, it is difficult to obtain accurate prediction results and reduce prediction errors. To solve this problem, we propose a methodology in which deep neural networks (DNN) coupled with a proportional–integral–derivative residual elimination network (PID-RENet) are used for time-series prediction of water quality. PID-RENet is composed mainly of two parts: a conventional PID controller and a backpropagation (BP) neural network. The PID controller calculates the control amount from the historical prediction deviation and corrects the prediction result of the benchmark DNN model. The BP neural network dynamically adjusts the three parameters of the PID controller to improve its adaptive ability. To verify the effectiveness and practicability of the proposed method, a series of multi-step ahead prediction experiments were performed on a publicly available dataset and two self-constructed water quality datasets. For all datasets, the error indicators of the models corrected by PID-RENet were better than those of the evaluated benchmark DNN models, indicating that PID-RENet can provide more accurate predictions.
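The correction step described above can be illustrated with a small sketch. It is not the PID-RENet implementation: the gains here are fixed constants (the abstract says a BP neural network tunes them), the discrete PID form is the usual textbook one, and all names (`PIDResidualCorrector`, `correct_series`) are hypothetical. The control amount computed from past deviations of the base model is added to the next base prediction.

```python
class PIDResidualCorrector:
    """Discrete PID term computed from the base model's past prediction errors.

    Assumption: fixed gains kp, ki, kd; the cited method adapts them with a BP network.
    """

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0    # running sum of errors
        self.prev_error = 0.0  # error seen at the previous step

    def update(self, observed, base_prediction):
        """Consume one observed value and return the control amount for the next step."""
        error = observed - base_prediction
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def correct_series(observed, base_predictions, kp=1.0, ki=0.0, kd=0.0):
    """Add the PID control amount from earlier steps to each base prediction."""
    pid = PIDResidualCorrector(kp, ki, kd)
    corrected = []
    control = 0.0  # no history is available before the first step
    for obs, base in zip(observed, base_predictions):
        corrected.append(base + control)
        control = pid.update(obs, base)  # observation arrives after the forecast
    return corrected


# A base model that is consistently 1.0 too low is corrected from the second step on.
print(correct_series([5.0, 5.0, 5.0], [4.0, 4.0, 4.0]))
```

With a proportional gain of 1 and the other gains at zero, the correction simply replays the last observed error, which is enough to cancel a constant bias. Nonzero integral and derivative gains would react to accumulated and changing errors respectively.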



