Research papers
Evolutionary computational intelligence algorithm coupled with self-tuning predictive model for water quality index determination
Introduction
The continuous decline in water quality (WQ) remains a top global concern, especially with respect to industrial, agricultural and domestic use. Several factors, including geological, weather, hydrological and industrial sources (such as textile, laundry and pharmaceutical industries) as well as natural phenomena (physical processes), adversely affect the quality of water (Abba and Elkiran, 2017, Ke et al., 2015). In the literature, water pollution is described as the presence of detrimental substance(s) in water at a level that causes problems for living organisms. Therefore, monitoring water features to ensure its quality is of paramount importance (Mahmoodabadi and Rezaei Arshad, 2018). WQ can be considered as the set of chemical, physical and biological properties of water that can be used in forecasting the water quality (Sharma and Kansal, 2011). WQ helps in determining the concentration of chemicals in the water. Thus, assessment of WQ can be described as the analysis of the chemical, physical and biological properties of water. Several studies have been carried out in the area of water quality (Singh, 2017). Following this, the water quality index (WQI) provides a single parameter describing the water quality, reducing the large number of parameters to a simpler expression and thereby making the monitoring and interpretation of WQ easier (Gazzaz et al., 2012). However, WQ depends on the ecosystem as well as on human uses, such as industrial pollution, sewage and wastewater, and both international and national agencies are committed to pollution control and WQ analysis (Bharti, 2011). Because no single variable can express WQ efficiently, WQ is normally determined by measuring multiple WQ parameters (Hameed et al., 2017). For this purpose, a large amount of data is collected by the monitoring team, which must be easily interpretable for decision-makers and the general public. 
In this regard, numerous WQIs have been developed, which are defined based on WQ criteria and important parameters (Bharti, 2011).
WQI is a widely used measure in different parts of the world to solve problems of data management and to evaluate the successes as well as failures of management strategies for improving water quality. While numerous indices are used to summarize WQ data in an understandable format, a WQI is derived from numerous water characterization parameters to signify the water quality level (Abbasi, 2002). In essence, the general approach to computing a WQI involves passing numerous water quality parameters into computable functions and logical expressions that rate the wellbeing of a water body with a single number (Castilla-herná, 2014). The computed value is then rated from very bad to excellent based on an existing rating scale, which makes WQ understandable to non-technical water managers, political decision-makers and the general public (Abbasi, 2002). Over the decades, various WQIs have been defined globally to represent the overall WQ in a particular area efficiently, including the United States National Sanitation Foundation WQI (NSFWQI) (Horton, 1965, Bharti, 2011), the Canadian Council of Ministers of the Environment WQI (CCMEWQI) (Sharma, 2002, Khan et al., 2004), the British Columbia Water Quality Index (BCWQI), and the Oregon WQI (OWQI) (Debels et al., 2005). These indices are based on a comparison of the WQ parameters with regulatory standards and give a single value for the water quality of a source (Abba et al., 2019).
Despite the several paradigms that can be used to assess the quality of water, the Malaysian Department of Environment (DoE) approved the use of a unified water quality index in 1974 for analyzing and ranking the level of contamination and pollution of Malaysian rivers. This recommended adaptation of the WQI is called the DoE-WQI and serves as the method of choice for calculating the WQ index of local Malaysian rivers (Gazzaz et al., 2012, Yaseen et al., 2018a, Yaseen et al., 2018b, Yaseen et al., 2018c). The DoE-WQI gives standard WQ guidelines used to classify the WQ of local Malaysian rivers into five categories depending on their suitability for different uses, including irrigation, domestic supplies and fish culture, water supply, recreational use and livestock drinking. The approach used for computing the water quality in Malaysia is almost the same as that employed in (Muhammad et al., 2015, Gazzaz et al., 2012, Yaseen et al., 2018a, Yaseen et al., 2018b, Yaseen et al., 2018c). However, the analysis and calculation in this method generally require significant amounts of time and effort, which could lead to accidental errors during the sub-index computations; nevertheless, the method has proved to be highly effective and successful in practice based on scientific fact. For the detailed manual calculation of the WQ sub-indices for Malaysia, refer to (Hameed et al., 2017, Gazzaz et al., 2012).
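The aggregation step of such an index can be illustrated with a minimal Python sketch. It assumes the six sub-indices (each on a 0 to 100 scale) have already been computed by the manual procedure referred to above, and it uses the weights commonly reported for the DoE-WQI in the literature; those weights, the function name doe_wqi and the dictionary keys are assumptions for illustration and are not stated in this text.

```python
# Weights commonly reported for the DoE-WQI aggregation (assumption: they are
# not given in this text). Keys are hypothetical labels for the sub-indices of
# dissolved oxygen, biochemical oxygen demand, chemical oxygen demand,
# ammoniacal nitrogen, suspended solids and pH.
DOE_WEIGHTS = {
    "DO": 0.22,
    "BOD": 0.19,
    "COD": 0.16,
    "AN": 0.15,
    "SS": 0.16,
    "pH": 0.12,
}


def doe_wqi(sub_indices, weights=DOE_WEIGHTS):
    """Return the weighted sum of precomputed sub-indices (each in 0..100)."""
    missing = set(weights) - set(sub_indices)
    if missing:
        raise ValueError(f"missing sub-indices: {sorted(missing)}")
    for name in weights:
        value = sub_indices[name]
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"sub-index {name!r} out of range: {value}")
    # Single number summarizing the water body, as described in the text.
    return sum(weights[name] * sub_indices[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical sub-index values, for illustration only.
    sample = {"DO": 90, "BOD": 80, "COD": 70, "AN": 60, "SS": 50, "pH": 100}
    print(round(doe_wqi(sample), 2))
```

The sketch covers only the final weighted sum; the sub-index equations themselves, where the text notes manual errors tend to arise, are deliberately left out.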
Recent research shows an exponential increase in the use of Artificial Intelligence (AI), which represents an alternative, attractive, quick and direct computing tool for water quality modeling (Tiyasha et al., 2020, Sahoo and Patra, 2020, Gaya et al., 2020, Yasin and Karim, 2020, Karim and Kamsani, 2020). AI has the ability to minimize error, effort and computation time (Bhagat et al., 2019). Artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector regression (SVR) are among the popular AI modeling methods developed for highly complicated processes and for handling data nonlinearity (Barzegar et al., 2016). Several researchers have employed different combinations of AI-based models to predict the WQI (Wagener et al., 2019, Yaseen et al., 2018b). For example, Nourani et al. (2013) proposed the application of an ANN to monitor treated water quality. The results indicated that the ANN has the potential to perform better than the conventional WQ method. Emamgholizadeh et al. (2014) applied ANN and ANFIS models simultaneously to predict the WQ variables of the Karoon River, and the results indicated that the ANN model has a better predictive ability than the ANFIS-based models. In this regard, Barzegar et al. (2016) investigated the performance of standalone ANN and ANFIS, and of hybrid wavelet-ANN and wavelet-ANFIS, for the prediction of monthly averaged water salinity of the Aji-Chay River in northwest Iran, where the results showed that ANFIS performance was superior to ANN. Hameed et al. (2017) examined two different models, viz. Radial Basis Function NN (RBFNN) and Back Propagation NN (BPNN). The results from both models showed reliable performance with improved accuracy.
Similarly, an ANN model was employed to estimate the WQI in the Langat River Basin, Malaysia, and the outcomes were compared with traditional multilinear regression analysis. The obtained results demonstrated the effectiveness of the ANN model for the prediction of WQI (Juahir et al., 2004). Gazzaz et al. (2012) studied the potential of an ANN model to predict the WQI in Malaysia; the results demonstrated that the ANN model offered a reliable alternative for WQI computation and forecasting. Mohammadpour et al. (2014) studied and compared the potential of Support Vector Machine (SVM), BPNN and RBFNN techniques to predict the WQI in a wetland. For this purpose, different WQ variables at 17 monitoring points were evaluated. The results showed that SVM and FFBP outperformed RBF and therefore emerged as successful and reliable models for the prediction of WQI. The outcomes also indicated that the methods can reliably reduce time and computational burdens. More recently, Yaseen et al. (2018a, 2018b, 2018c) employed different types of AI-based models to predict the WQI. Abba et al. (2019) presented a performance comparison of non-linear models based on RBFNN and Hammerstein-Wiener (HW) techniques and traditional linear modelling techniques based on Generalized Linear Regression (GLR) and Autoregressive Integrated Moving Average (ARIMA) for the Agra station of the Yamuna River, India, and the Kinta River in Malaysia. The simulation results established that the non-linear models outperformed the other models.
Based on the above literature, it is evident that studies in the field of WQ and WQI conducted around the world have shown the reliability of computational intelligence models (Tiyasha et al., 2020). Although the use of AI models for WQI prediction has increased significantly, many standalone AI models produce unsatisfactory results because of limitations in identifying the appropriate input parameters or in selecting the internal model parameters, particularly in applications involving very dynamic hydrological processes. Hence, to alleviate these shortcomings, it is necessary to develop an appropriate model structure and a proper technique for selecting suitable variables. Such selection has been shown to significantly influence the prediction ability for complex non-linear processes and helps provide a reliable alternative to manual WQ computation. The objective of this study is therefore to investigate the potential of extreme gradient boosting (XGB) and genetic programming (GP) for obtaining the feature importance, after which the selected inputs are used in the predictive model, i.e. the extreme learning machine (ELM). For comparison, the proposed models are compared with the stand-alone schema, in which the optimum combination of input variables is applied to the XGB, GP, ELM and Linear Regression (LR) models. The primary motivation behind the current study is to explore the abilities of the GP and XGB approaches for choosing the most dominant related variables. In this regard, hybrid models are established to increase the prediction accuracy of the non-tuned data intelligent models for mimicking the river WQ pattern. It is important to mention that, to the best of the authors’ knowledge, no research or technical literature has been published on the application of XGB to WQI prediction or as a method of input selection since the development of this algorithm. 
Because of its distinctive and outstanding features, this study uses the XGB algorithm for both input selection and WQI modelling. The remaining sections of the paper are arranged as follows: Section 2 presents the methodology applied for the single models and the proposed modeling schema adopted in the study, Section 3 describes the case study and the data, Section 4 gives the application results and analysis, and the paper ends with a conclusion in Section 5.
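The two-stage idea outlined in the introduction (rank the inputs by importance, then feed the retained inputs to a non-tuned learner) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the importance scores are taken as given (in the study they would come from a fitted XGB or GP model), the helper select_top_k and the class ELM are hypothetical names, and the learner is a basic single-hidden-layer extreme learning machine with random, fixed hidden weights and least-squares output weights.

```python
import numpy as np


def select_top_k(importances, k):
    """Return the column indices of the k largest importance scores, sorted."""
    order = np.argsort(importances)[::-1]  # indices from most to least important
    return np.sort(order[:k])


class ELM:
    """Basic extreme learning machine for regression (illustrative sketch)."""

    def __init__(self, n_hidden=40, seed=0):
        self.n_hidden = n_hidden
        self.seed = seed

    def _hidden(self, X):
        # Random nonlinear feature map; W and b are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from the Moore-Penrose pseudo-inverse (least squares).
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 4))   # synthetic inputs, not river data
    y = X[:, 1] - 0.5 * X[:, 3]                 # synthetic target
    scores = np.array([0.10, 0.50, 0.05, 0.35])  # hypothetical importance scores
    cols = select_top_k(scores, k=2)
    model = ELM().fit(X[:, cols], y)
    rmse = np.sqrt(np.mean((model.predict(X[:, cols]) - y) ** 2))
    print(cols, rmse)
```

Only the output weights are estimated, which is why the learner is described as non-tuned; the number of hidden units and the activation function are choices made here for illustration.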
Section snippets
Extreme gradient boosting (XGB)
The XGB algorithm has lately been dominating supervised machine learning applications, such as regression and classification (Pradhan and Sameen, 2020). It is an improvement of the gradient boosting technique introduced by Friedman (2001). It possesses the salient features of gradient tree boosting methods such as computational efficiency, high processing ability, and learning speed. It uses a more precise approximation to find the best decision tree model to achieve a higher speed and better
Case study and data description
The Kinta River spans an area of about 2500 km² and a length of about 100 km within Kinta district. It supports farming activities, residential areas and industries (Fig. 2) across its three subdivisions (i.e. upstream, undulating and downstream). Stations 2PK22 and 2PK24 are located upstream, stations 2PK34 and 2PK25 are at the center of the river, and stations 2PK55 and 2PK19 are downstream. The Kinta River is largely utilized for forestry and recreation along the upstream. The
Application results and analysis
The accuracy of intelligent models is affected by the use of numerous inputs. Several input selection methods have been reported in the hydro-environmental literature; popular among them are correlation, auto-correlation and principal component analysis. Nevertheless, these methods are frequently used for linear input/output relationships (Hadi and Tombul, 2018). This study employed two novel nonlinear input variable selection tools (GP and XGB) for choosing the most dominant related
Conclusion
The study explored the abilities of two evolutionary nonlinear input variable selection techniques (GP and XGB) coupled with a non-tuned data intelligent model (ELM) for modeling the WQI in the Kinta River basin, Malaysia. For this purpose, different trials were evaluated, and the best-performing input combination was selected. The established modeling schema of the study was to use GP and XGB as nonlinear selection tools in addition to using them as the main prediction models.
CRediT authorship contribution statement
S.I. Abba: Conceptualization, Validation, Investigation, Visualization. Sinan Jasim Hadi: Software, Validation, Investigation, Visualization. Saad Sh. Sammen: Conceptualization, Validation, Investigation, Visualization. Sinan Q. Salih: Validation, Investigation, Visualization. R.A. Abdulkadir: . Quoc Bao Pham: . Zaher Mundher Yaseen: Conceptualization, Validation, Investigation, Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors acknowledge the Department of Environment (DoE), Malaysia, for making the historical water quality data available.
References (75)
- et al.
Effluent prediction of chemical oxygen demand from the wastewater treatment plant using artificial neural network application
Proc. Comput. Sci.
(2017)
- et al.
Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China
Atmos. Environ.
(2019)
- et al.
Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique
J. Hydrol.
(2013)
- et al.
Multi-step ahead modelling of river water quality parameters using ensemble artificial intelligence-based approach
J. Hydrol.
(2019)
- et al.
Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors
Mar. Pollut. Bull.
(2012)
- et al.
Extreme learning machine: theory and applications
Neurocomputing
(2006)
- et al.
Trends in extreme learning machines: a review
Neural Networks
(2015)
- et al.
Assessing water quality by ratio of the number of dominant bacterium species between surface/subsurface sediments in Haihe River Basin
Mar. Pollut. Bull.
(2015)
- et al.
Meteorological data mining and hybrid data-intelligence models for reference evaporation simulation: a case study in Iraq
Comput. Electron. Agric.
(2019)
- et al.
Use of correlation and stepwise regression to evaluate physical controls on the stable isotope values of Panamanian rain and surface waters
J. Hydrol.
(2006)
Long-term evaluation of water quality parameters of the Karoun River using a regression approach and the adaptive neuro-fuzzy inference system
Mar. Pollut. Bull.
A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware River
Geosci. Front.
Calibration of water quality model for distribution networks using genetic algorithm, particle swarm optimization, and hybrid methods
MethodsX
Predicting heat transfer of oscillating heat pipes for machining processes based on extreme gradient boosting algorithm
Appl. Therm. Eng.
Comparison of prediction model using spatial discriminant analysis for marine water quality index in mangrove estuarine zones
Mar. Pollut. Bull.
Prediction of dissolved oxygen content in aquaculture using clustering-based Softplus Extreme Learning Machine
Comput. Electron. Agric.
Mathematical modeling and analysis of the depletion of dissolved oxygen in eutrophied water bodies affected by organic pollutants
Nonlinear Anal. Real World Appl.
Predicting compressive strength of lightweight foamed concrete using extreme learning machine model
Adv. Eng. Software
Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting
J. Hydrol.
Modelling of Uncertain System: A comparison study of Linear and Non-Linear Approaches
An Enhanced Version of Black Hole Algorithm Via Levy Flight for Optimization and Data Clustering Problems
Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran
Stochast. Environ. Res. Risk Assess.
Development of artificial intelligence for modeling wastewater heavy metal removal: state of the art, application assessment and possible future research
J. Clean. Prod.
Water quality indices used for surface water vulnerability assessment
Int. J. Environ. Sci.
Xgboost: a scalable tree boosting system
Water quality modeling in reservoirs using multivariate linear regression and two neural network models
Adv. Artif. Neural Syst.
Model evaluation guidelines for systematic quantification of accuracy in watershed simulations
Trans. ASABE
Evaluation of water quality in the Chillán River (Central Chile) using physicochemical parameters and a modified Water Quality Index
Environ. Monit. Assess.
Efficiency evaluation of reverse osmosis desalination plant using hybridized multilayer perceptron with particle swarm optimization
Environ. Sci. Pollut. Res.
Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models
Int. J. Environ. Sci. Technol.
Greedy function approximation: a gradient boosting machine
Ann. Stat.
Deep learning data-intelligence model based on adjusted forecasting window scale: application in daily streamflow
Simulation