Elsevier

Journal of Hydrology

Volume 587, August 2020, 124974
Research papers
Evolutionary computational intelligence algorithm coupled with self-tuning predictive model for water quality index determination

https://doi.org/10.1016/j.jhydrol.2020.124974

Highlights

  • New hybrid intelligent models are developed for the water quality index (WQI).

  • Non-linear input selection is coupled with a non-tuned learning model for modeling WQI.

  • The Kinta River, located in a tropical environment, is selected as the case study.

  • The stand-alone modeling schema is validated against the proposed hybrid models.

  • The proposed models showed a superior prediction capacity for WQI.

Abstract

Anthropogenic activities affect water bodies and result in a drastic reduction of river water quality (WQ). Developing a reliable intelligent model for evaluating the suitability of water remains a challenging task for hydro-environmental engineers. The current study investigates the applicability of Extreme Gradient Boosting (XGB) and Genetic Programming (GP) for obtaining feature importance; the abstracted input variables are then fed into a predictive model, the Extreme Learning Machine (ELM), for the prediction of the water quality index (WQI). The stand-alone modeling schema, in which the optimum variables are supplied to the GP, XGB, linear regression (LR), stepwise linear regression (SWLR) and ELM models, is compared with the proposed hybrid models. The WQ data were obtained from the Department of Environment (DoE), Malaysia, and the results are evaluated in terms of the determination coefficient (R²) and the root mean square error (RMSE). The results demonstrated that the hybrid GPELM and XGBELM models outperformed the standalone GP, XGB, and ELM models for the prediction of WQI in the Kinta River basin. A comparison of the hybrid models showed that GPELM (RMSE = 3.441 for training and 3.484 for testing) had better predictive skill than XGBELM (RMSE = 3.606 for training and 3.816 for testing), lowering the RMSE by about 5% and 9% for training and testing, respectively. Although regression models (LR and SWLR) are often proposed only as reference models, they still provided satisfactory results in this study when combined with computational intelligence. The proposed hybrid GPELM and XGBELM models improved the prediction accuracy with a minimum number of input variables and can therefore serve as reliable predictive tools for WQI in the Kinta River basin.
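The two evaluation measures and the reported RMSE improvement can be illustrated with a short sketch. It assumes the common form R² = 1 − SS_res/SS_tot, since the exact formula used in the paper is not shown in this excerpt; the function names are illustrative, and only the four RMSE values come from the abstract.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def relative_reduction(reference, improved):
    """Percentage by which `improved` lowers `reference`."""
    return 100.0 * (reference - improved) / reference

# RMSE values quoted in the abstract: XGBELM is the reference, GPELM the improved model.
train_gain = relative_reduction(3.606, 3.441)
test_gain = relative_reduction(3.816, 3.484)
print(f"training: {train_gain:.1f}%  testing: {test_gain:.1f}%")
```

The computed reductions are roughly 4.6% and 8.7%, which round to the 5% and 9% quoted above.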

Introduction

The continuous decline in WQ remains a top global concern, especially for industrial, agricultural and domestic use. Several factors, including geological, weather-related and hydrological conditions, industrial sources (such as textile, laundry and pharmaceutical activities) and natural phenomena (physical processes), have adverse effects on the quality of water (Abba and Elkiran, 2017, Ke et al., 2015). In the literature, water pollution is described as the presence of detrimental substances in water at a level that causes problems for living organisms. Therefore, monitoring water features to ensure water quality is of paramount importance (Mahmoodabadi and Rezaei Arshad, 2018). WQ can be considered as the set of chemical, physical and biological properties of water that can be used in forecasting its quality (Sharma and Kansal, 2011). WQ helps in determining the concentration of chemicals in the water. Thus, the assessment of WQ can be described as the analysis of the chemical, physical and biological properties of water. In the literature, several studies have been carried out in the area of water quality (Singh, 2017). Following this, a WQI provides a single parameter describing the water quality, reducing the large number of parameters to a simpler expression and thereby making the monitoring and interpretation of WQ easier (Gazzaz et al., 2012). However, WQ depends on the ecosystem as well as on human uses, such as industrial pollution, sewage and wastewater, and both international and national agencies are committed to pollution control and WQ analysis (Bharti, 2011). Because no single variable can express WQ efficiently, WQ is normally determined by measuring multiple WQ parameters (Hameed et al., 2017). For this purpose, monitoring teams collect large amounts of data, which must be easily interpretable for decision-makers and the general public. In this regard, numerous WQIs have been developed, which are defined based on WQ criteria and important parameters (Bharti, 2011).

WQI is a widely used measure in different parts of the world for solving data management problems and for evaluating the successes and failures of management strategies for improving water quality. While numerous indices are used to summarize WQ data in an understandable format, a WQI is derived from numerous water characterization parameters to signify the water quality level (Abbasi, 2002). In essence, the general approach to computing a WQI involves passing numerous water quality parameters into computable functions and logical expressions that rate the health of a water body with a single number (Castilla-herná, 2014). The computed value is then rated from very bad to excellent on an existing rating scale, which makes WQ understandable to non-technical water managers, political decision-makers and the general public (Abbasi, 2002). Over the decades, various WQIs have been defined globally to represent the overall WQ of a particular area efficiently, including the United States National Sanitation Foundation WQI (NSFWQI) (Horton, 1965, Bharti, 2011), the Canadian Council of Ministers of the Environment WQI (CCMEWQI) (Sharma, 2002, Khan et al., 2004), the British Columbia Water Quality Index (BCWQI), and the Oregon WQI (OWQI) (Debels et al., 2005). These indices are based on a comparison of the WQ parameters with regulatory standards and give a single value for the water quality of a source (Abba et al., 2019).

Despite the several paradigms that can be used to assess the quality of water, the Malaysian Department of Environment (DoE) approved the use of a unified water quality index in 1974 for analyzing and ranking the level of contamination and pollution of Malaysian rivers. This recommended adaptation of the WQI is called the DoE-WQI and serves as the method of choice for calculating the WQ index of local Malaysian rivers (Gazzaz et al., 2012, Yaseen et al., 2018a, Yaseen et al., 2018b, Yaseen et al., 2018c). The DoE-WQI provides standard WQ guidelines used to classify the WQ of Malaysian rivers into five categories depending on their suitability for different uses, including irrigation, domestic supply, fish culture, water supply, recreational use and livestock drinking. The procedure used for computing the water quality in Malaysia is almost the same as that employed in (Muhammad et al., 2015, Gazzaz et al., 2012, Yaseen et al., 2018a, Yaseen et al., 2018b, Yaseen et al., 2018c). However, the analysis and calculation in this method generally require significant amounts of time and effort, which can lead to accidental errors during the sub-index computations; nevertheless, the method rests on a scientific basis and has proved to be highly effective and successful in practice. For the detailed manual calculation of the WQ sub-indices for Malaysia, refer to (Hameed et al., 2017, Gazzaz et al., 2012).
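As a rough illustration of how sub-indices collapse into a single index value, the sketch below aggregates six sub-indices with the weights commonly reported for the DoE-WQI in the wider literature. These weights are an assumption here, since they are not stated in this text, and the sample sub-index values are hypothetical. The conversion of raw measurements into sub-indices is omitted; the references cited above describe it.

```python
# Weights commonly reported for the DoE-WQI; assumed here, not given in this text.
DOE_WEIGHTS = {
    "DO": 0.22,   # dissolved oxygen
    "BOD": 0.19,  # biochemical oxygen demand
    "COD": 0.16,  # chemical oxygen demand
    "AN": 0.15,   # ammoniacal nitrogen
    "SS": 0.16,   # suspended solids
    "pH": 0.12,
}

def doe_wqi(sub_indices):
    """Weighted sum of six sub-indices, each already scaled to 0-100."""
    missing = set(DOE_WEIGHTS) - set(sub_indices)
    if missing:
        raise ValueError(f"missing sub-indices: {sorted(missing)}")
    return sum(DOE_WEIGHTS[name] * sub_indices[name] for name in DOE_WEIGHTS)

# Hypothetical sub-index values for one sampling event.
sample = {"DO": 90.0, "BOD": 80.0, "COD": 75.0, "AN": 70.0, "SS": 85.0, "pH": 95.0}
print(round(doe_wqi(sample), 2))
```

Because the weights sum to one, the index stays on the same 0-100 scale as the sub-indices, which is what allows a single rating scale to be applied to the result.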

Recent research shows an exponential increase in the use of Artificial Intelligence (AI), which represents an attractive, quick and direct alternative computing tool for water quality modeling (Tiyasha et al., 2020, Sahoo and Patra, 2020, Gaya et al., 2020, Yasin and Karim, 2020, Karim and Kamsani, 2020). AI can minimize error, effort and computation time (Bhagat et al., 2019). Artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS) and support vector regression (SVR) are among the popular AI modeling methods developed for highly complicated processes and for handling data nonlinearity (Barzegar et al., 2016). Moreover, several researchers have employed different combinations of AI-based models to predict the WQI (Wagener et al., 2019, Yaseen et al., 2018b). For example, Nourani et al. (2013) proposed the application of an ANN to monitor treated water quality; the results indicated that the ANN has the potential to perform better than the conventional WQ method. Emamgholizadeh et al. (2014) applied ANN and ANFIS models to predict the WQ variables of the Karoon River, and the results indicated that the ANN model had a better predictive ability than the ANFIS-based models. Barzegar et al. (2016) investigated the performance of standalone ANN and ANFIS models and of hybrid wavelet-ANN and wavelet-ANFIS models for the prediction of the monthly averaged water salinity of the Aji-Chay River in northwest Iran, where the results showed that ANFIS performance was superior to that of ANN. Hameed et al. (2017) examined two different models, namely the Radial Basis Function NN (RBFNN) and the Back Propagation NN (BPNN); both models showed reliable performance with improved accuracy.

Similarly, an ANN model was employed to estimate the WQI at the Langat River Basin, Malaysia, and the outcomes were compared with traditional multilinear regression analysis. The obtained results demonstrated the effectiveness of the ANN model for the prediction of WQI (Juahir et al., 2004). Gazzaz et al. (2012) studied the potential of an ANN model to predict the WQI in Malaysia; the results demonstrated that the ANN model offered a reliable alternative for WQI computation and forecasting. Mohammadpour et al. (2014) studied and compared the potential of Support Vector Machine (SVM), BPNN and RBFNN techniques to predict the WQI in a wetland. For this purpose, different WQ variables at 17 monitoring points were evaluated. The results showed that SVM and FFBP outperformed RBF and therefore emerged as successful and reliable models for the prediction of WQI. The outcomes also indicated that the methods can reliably reduce time and computational burdens. More recently, Yaseen et al. (2018a, 2018b, 2018c) employed different types of AI-based models to predict the WQI. Abba et al. (2019) presented a performance comparison of non-linear models based on RBFNN and Hammerstein-Wiener (HW) techniques and traditional linear modelling techniques based on Generalized Linear Regression (GLR) and Autoregressive Integrated Moving Average (ARIMA) for the Agra station of the Yamuna River, India, and the Kinta River in Malaysia. The simulation results established that the non-linear models outperformed the other models.

Based on the above literature, it is evident that studies of WQ and WQI conducted around the world have shown the reliability of computational intelligence models (Tiyasha et al., 2020). Although the use of AI models for WQI prediction has increased significantly, many standalone AI models produce unsatisfactory results because of limitations in identifying the appropriate input parameters or in selecting the internal model parameters, particularly in applications involving very dynamic hydrological processes. Hence, to alleviate these shortcomings, it is necessary to develop an appropriate model structure and a proper technique for selecting suitable variables. Such selection has been shown to strongly influence the prediction ability for complex non-linear processes and can provide a reliable alternative to manual WQ computation. The objective of this study is therefore to investigate the potential of XGB and GP for obtaining feature importance, with the abstracted inputs then used in the predictive model (i.e. ELM). For comparison, the proposed models are compared with the stand-alone schema, in which the optimum combination of input variables is applied to the XGB, GP, ELM and Linear Regression (LR) models. The primary motivation behind the current study is to explore the abilities of the GP and XGB approaches for choosing the most dominant related variables. In this regard, hybrid models are established to increase the prediction accuracy of the non-tuned data-intelligent models for mimicking the river WQ pattern. It is important to mention that, to the best of the authors’ knowledge, no research or technical literature has been published on the application of XGB to WQI prediction or as a method of input selection since the development of this algorithm. Because of its distinctive and outstanding features, this study uses the XGB algorithm for both input selection and WQI modelling. The remaining sections of the paper are arranged as follows: Section 2 presents the methodology for the single models and the proposed modeling schema adopted in the study, Section 3 describes the case study and the data, Section 4 gives the application results and analysis, and Section 5 concludes the paper.
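The two-stage idea described above (rank the candidate inputs, then feed the retained ones to a non-tuned learner) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the importance scores are hypothetical placeholders for what a ranker such as XGB or GP might return, and the ELM is a generic textbook form (random tanh hidden layer, least-squares output weights) written with NumPy.

```python
import numpy as np

def select_top_k(importances, k):
    """Indices of the k largest importance scores, most important first."""
    order = np.argsort(importances)[::-1]
    return order[:k]

class ELM:
    """Single-hidden-layer network with random, untrained hidden weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden-layer parameters are drawn once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights come from a least-squares solve, not iterative training.
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Synthetic stand-in data: 6 candidate inputs, only columns 0 and 3 matter.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(300, 6))
y = np.sin(2 * X[:, 0]) + X[:, 3] ** 2

# Hypothetical importance scores, standing in for the output of an XGB or GP ranker.
importances = np.array([0.55, 0.02, 0.01, 0.38, 0.03, 0.01])
selected = select_top_k(importances, k=2)

# Train on the first 200 rows and test on the rest, using only the selected inputs.
X_train, X_test = X[:200][:, selected], X[200:][:, selected]
model = ELM(n_hidden=40).fit(X_train, y[:200])
rmse_test = np.sqrt(np.mean((model.predict(X_test) - y[200:]) ** 2))
print(selected, rmse_test)
```

The "non-tuned" character of the ELM shows up in `fit`: the only fitted quantity is `beta`, obtained in one linear solve, so the quality of the inputs handed to the model carries much of the burden that iterative tuning would carry elsewhere.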

Section snippets

Extreme gradient boosting (XGB)

The XGB algorithm has lately been dominating supervised machine learning applications, such as regression and classification (Pradhan and Sameen, 2020). It is an improvement of the gradient boosting technique introduced by Friedman (2001). It possesses the salient features of gradient tree boosting methods such as computational efficiency, high processing ability, and learning speed. It uses a more precise approximation to find the best decision tree model to achieve a higher speed and better

Case study and data description

The Kinta River covers an area of about 2500 km² and has a length of about 100 km within Kinta district. It supports farming activities, residential areas and industries (Fig. 2) across its three subdivisions (i.e. upstream, undulating and downstream). Stations 2PK22 and 2PK24 are located upstream, stations 2PK34 and 2PK25 in the center of the river, and stations 2PK55 and 2PK19 downstream. The Kinta River is largely used for forestry and recreation along its upstream reach. The

Application results and analysis

The accuracy of intelligent models is affected by the use of numerous inputs. Several input selection methods have been reported in the hydro-environmental literature; popular among them are correlation, auto-correlation and principal component analysis. Nevertheless, these methods are frequently used for linear input/output relationships (Hadi and Tombul, 2018). This study employed two novel nonlinear input variable selection tools (GP and XGB) for choosing the most dominant related

Conclusion

The study explored the abilities of two evolutionary nonlinear input variable selection techniques (GP and XGB) coupled with a non-tuned data-intelligent model (ELM) for modeling the WQI in the Kinta River basin, Malaysia. For this purpose, different trials were evaluated, and the best-performing one was chosen as the best combination. The established modeling schema used GP and XGB as nonlinear selection tools in addition to using them as the main prediction models.

CRediT authorship contribution statement

S.I. Abba: Conceptualization, Validation, Investigation, Visualization. Sinan Jasim Hadi: Software, Validation, Investigation, Visualization. Saad Sh. Sammen: Conceptualization, Validation, Investigation, Visualization. Sinan Q. Salih: Validation, Investigation, Visualization. R.A. Abdulkadir: . Quoc Bao Pham: . Zaher Mundher Yaseen: Conceptualization, Validation, Investigation, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors acknowledge the Department of Environment (DoE), Malaysia, for making the historical water quality data available.

References (75)

  • M. Mahmoodabadi et al.

    Long-term evaluation of water quality parameters of the Karoun River using a regression approach and the adaptive neuro-fuzzy inference system

    Mar. Pollut. Bull.

    (2018)
  • E. Olyaie et al.

    A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware River

    Geosci. Front.

    (2017)
  • R. Peirovi Minaee et al.

    Calibration of water quality model for distribution networks using genetic algorithm, particle swarm optimization, and hybrid methods

    MethodsX

    (2019)
  • N. Qian et al.

    Predicting heat transfer of oscillating heat pipes for machining processes based on extreme gradient boosting algorithm

    Appl. Therm. Eng.

    (2020)
  • M.S. Samsudin et al.

    Comparison of prediction model using spatial discriminant analysis for marine water quality index in mangrove estuarine zones

    Mar. Pollut. Bull.

    (2019)
  • P. Shi et al.

    Prediction of dissolved oxygen content in aquaculture using clustering-based Softplus Extreme Learning Machine

    Comput. Electron. Agric.

    (2019)
  • J.B. Shukla et al.

    Mathematical modeling and analysis of the depletion of dissolved oxygen in eutrophied water bodies affected by organic pollutants

    Nonlinear Anal. Real World Appl.

    (2008)
  • Z.M. Yaseen et al.

    Predicting compressive strength of lightweight foamed concrete using extreme learning machine model

    Adv. Eng. Software

    (2018)
  • X. Yu et al.

    Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting

    J. Hydrol.

    (2020)
  • S.I. Abba et al.

    Modelling of Uncertain System: A comparison study of Linear and Non-Linear Approaches

    (2019)
  • Abbasi, S.A., 2002. Water Quality Indices. State of the Art Report, National Institute of Hydrology, Scientific...
  • H.A. Abdulwahab et al.

    An Enhanced Version of Black Hole Algorithm Via Levy Flight for Optimization and Data Clustering Problems

    (2019)
  • R. Barzegar et al.

    Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran

    Stochast. Environ. Res. Risk Assess.

    (2016)
  • S.K. Bhagat et al.

    Development of artificial intelligence for modeling wastewater heavy metal removal: state of the art, application assessment and possible future research

    J. Clean. Prod.

    (2019)
  • N. Bharti

    Water quality indices used for surface water vulnerability assessment

    Int. J. Environ. Sci.

    (2011)
  • Castilla-herná, P., 2014. Water Quality of a Reservoir and Its Major Tributary Located in East-Central Mexico 6,...
  • T. Chen et al.

    Xgboost: a scalable tree boosting system

  • Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., 2015. Xgboost: extreme gradient boosting. R package version...
  • W.-B. Chen et al.

    Water quality modeling in reservoirs using multivariate linear regression and two neural network models

    Adv. Artif. Neural Syst.

    (2015)
  • D.N. Moriasi et al.

    Model evaluation guidelines for systematic quantification of accuracy in watershed simulations

    Trans. ASABE

    (2007)
  • P. Debels et al.

    Evaluation of water quality in the Chillán River (Central Chile) using physicochemical parameters and a modified Water Quality Index

    Environ. Monit. Assess.

    (2005)
  • M. Ehteram et al.

    Efficiency evaluation of reverse osmosis desalination plant using hybridized multilayer perceptron with particle swarm optimization

    Environ. Sci. Pollut. Res.

    (2020)
  • S. Emamgholizadeh et al.

    Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models

    Int. J. Environ. Sci. Technol.

    (2014)
  • J.H. Friedman

    Greedy function approximation: a gradient boosting machine

    Ann. Stat.

    (2001)
  • M. Fu et al.

    Deep learning data-intelligence model based on adjusted forecasting window scale: application in daily streamflow

    Simulation

    (2020)
  • Elkiran, G., V.N., Abba, S.I., Abdullahi, J., 2018. Artificial intelligence-based approaches for multi-station...
  • Gaya, M.S., Abba, S.I., Abdu, A.M., Tukur, A.I., 2020. Estimation of water quality index using artificial intelligence...