Study of the effectiveness of wavelet genetic programming model for water quality analysis in the Uttar Pradesh region

  • Research
Environmental Monitoring and Assessment

Abstract

Water is an essential part of the earth: it keeps the environment green and supports life. However, water quality and availability are severely affected by rising water pollution and poor sanitation. Water is contaminated by the excessive use of chemicals by industry and of fertilizers and pesticides by farmers. Not only surface water but also groundwater and river water are becoming contaminated. Several published works in the Indian context have used different models to predict water quality. Some of them performed poorly because of irrelevant and missing data in the training samples. Moreover, these studies assessed water quality on the basis of biochemical oxygen demand (BOD), coliform, and chemical oxygen demand (COD), whereas dissolved oxygen (DO) is one of the most important parameters in water quality assessment, as it is considered a key determinant of pollution. Thus, there is a strong need to identify and visualize DO as one of the key components responsible for deteriorating water quality in the Indian context. The main objective of this work is to build a wavelet genetic programming (WGP)-based workflow model for assessing water quality in 13 rivers of the Uttar Pradesh region. The WGP model has the distinctive feature of discarding redundant and irrelevant values from the source data. The proposed WGP model gives promising results, which can be attributed to two factors: first, the novel use of the Morlet wavelet, in place of the widely used Db wavelet, as the mother wavelet; and second, the use of the MICE (multiple imputation by chained equations) technique for missing-value imputation in the pre-processing stage. The proposed model not only cleans the data but also demonstrates the feasibility of using DO values as one of the prime factors in assessing water quality.
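To make the role of the Morlet mother wavelet more concrete, the following is a minimal sketch of a wavelet transform with a real-valued Morlet wavelet. It is not the authors' implementation: the function names (morlet, cwt_morlet, scale_energy), the choice of scales, and the synthetic series standing in for a DO record are all assumptions made for illustration, and the transform is computed by direct summation rather than by any particular library routine.

```python
import math


def morlet(t):
    """Real-valued Morlet mother wavelet: psi(t) = exp(-t^2 / 2) * cos(5 t)."""
    return math.exp(-0.5 * t * t) * math.cos(5.0 * t)


def cwt_morlet(signal, scales):
    """Wavelet coefficients by direct summation, assuming unit sample spacing.

    coeffs[i][b] correlates the signal with the wavelet stretched by
    scales[i] and centred on sample b, normalised by 1 / sqrt(scale).
    """
    n = len(signal)
    coeffs = []
    for a in scales:
        norm = 1.0 / math.sqrt(a)
        row = [
            norm * sum(signal[k] * morlet((k - b) / a) for k in range(n))
            for b in range(n)
        ]
        coeffs.append(row)
    return coeffs


def scale_energy(coeffs):
    """Sum of squared coefficients at each scale."""
    return [sum(c * c for c in row) for row in coeffs]


# Hypothetical stand-in for a DO series: a pure oscillation with an
# assumed period of 8 samples (not data from the study).
period = 8.0
series = [math.sin(2.0 * math.pi * k / period) for k in range(64)]

# The wavelet oscillates at 5 rad per unit time, so the scale whose
# oscillation matches the series is 5 * period / (2 * pi).
matched_scale = 5.0 * period / (2.0 * math.pi)
scales = [2.0, matched_scale, 20.0]

coeffs = cwt_morlet(series, scales)
energy = scale_energy(coeffs)
dominant = energy.index(max(energy))
print("dominant scale:", scales[dominant])
```

In this sketch, the coefficient energy concentrates at the scale matched to the oscillation of the series, which illustrates how a wavelet decomposition separates a signal into scale-specific components before they are passed to a downstream model such as genetic programming.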

Data availability

River water data were taken from the UP Pollution Control Board web page titled “Water Quality Data of the Polluted River Stretches”, available at http://www.uppcb.com/water-quality-data-stretches.htm, for the years 2020 and 2021.

Author information

Contributions

Mansi Gaonkar: data curation, investigation and analysis, writing—original draft, and visualization. Bhawna Saxena: writing—review and editing. Sandeep Kumar Singh: writing—review and editing, and supervision.

Corresponding author

Correspondence to Sandeep Kumar Singh.

Ethics declarations

Ethics approval

All authors have read, understood, and have complied as applicable with the statement on “Ethical responsibilities of Authors” as found in the Instructions for Authors and are aware that with minor exceptions, no changes can be made to authorship once the paper is submitted.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

S. No | Abbreviation | Full form
1 | W-MGGP | Wavelet pre-processing with multi-gene genetic programming
2 | WGEP | Wavelet gene expression programming
3 | DWT | Discrete wavelet transform
4 | LPM | Linear probability model
5 | MGGP | Multi-gene genetic programming
6 | PSR | Predictive state representation
7 | FNN | False nearest neighbor
8 | LASSO | Least absolute shrinkage and selection operator
9 | GEP | Genetic evolutionary programming
10 | FLGGP | Fixed-length gene genetic programming
11 | LGP | Linear genetic programming
12 | GP | Genetic programming
13 | ARIMA | Autoregressive integrated moving average
14 | WGP | Wavelet genetic programming
15 | MSGP | Multi-stage genetic programming
16 | GBT | Gradient boosting
17 | ANN | Artificial neural networks
18 | WANN | Wavelet artificial neural networks
19 | ANFIS | Adaptive neuro-fuzzy inference system
20 | WANFIS | Wavelet adaptive neuro-fuzzy inference system
21 | FL | Fuzzy logic
22 | SVM | Support vector machine
23 | NF | Neuro-fuzzy
24 | GA-NN | Genetic algorithm neural networks
25 | WNN | Wavelet neural network
26 | WSVR | Wavelet support vector regression
27 | WLGP | Wavelet linear genetic programming
28 | PCA | Principal component analysis
29 | LSTM | Long short-term memory
30 | RNNs | Recurrent neural networks
31 | BP | Back propagation
32 | SVR | Support vector regression
33 | MSA | Multiple sequence alignments
34 | MLP | Multilayer perceptron
35 | MLR | Multiple linear regression
36 | KNN | K-nearest neighbor
37 | MODWT | Maximal overlap discrete wavelet transform
38 | FWNN | Wavelet-based fuzzy neural network
39 | CWT | Continuous wavelet transform
40 | DDM | Data-driven modeling
41 | NAR | Non-linear autoregressive neural network
42 | RF | Random forest
43 | ABM | Agent-based modeling
44 | GBU | Gradient boosting unit
45 | EMSP | Effective maliciousness score of permission
46 | GBT | Gradient boosting technique
47 | OAM | Open application model
48 | LEM | Learnable evolution model
49 | EP | Evolutionary programming
50 | AR | Augmented reality
51 | WNF | Wavelet neuro-fuzzy
52 | WTC | Wavelet task clustering
53 | SARIMA | Seasonal autoregressive integrated moving average
54 | FNN | Feedforward neural network
55 | RMSE | Root mean square error
56 | NSE | Nash–Sutcliffe efficiency
57 | MAE | Mean absolute error
58 | R2 | Coefficient of determination
59 | MAPE | Mean absolute percentage error
60 | MSE | Mean square error
61 | AICc | Akaike information criterion
62 | AARE | Average absolute relative error
63 | MARE | Mean absolute relative error
64 | AOI | Index of agreement
65 | CE | Coefficient of efficiency
66 | NSC | Nearest shrunken centroid
67 | SSE | Sum of squared errors
68 | MPE | Mean prediction error
69 | R | Coefficient of correlation
70 | RE | Relative error
71 | MRSE | Mean root squared error
72 | PARE | Pooled average relative underestimation and overestimation errors
73 | NDEI | Non-dimensional error index
74 | Ai | Accuracy factor
75 | P | Pearson correlation coefficient
76 | bias | Absolute value of the average forecast error
77 | d | Willmott index of agreement
78 | pMAPE | Mean absolute percentage errors of peaks
79 | EVS | Explained variance score
80 | RPD | Residual prediction deviation
81 | Dr | Discrepancy ratio
82 | MSRE | Mean square relative error
83 | MS4E | Mean higher order error
84 | SI | Scatter index
85 | MS | Mean squared error
86 | SBC | Schwarz Bayesian criterion
87 | DO | Dissolved oxygen
88 | BOD | Biochemical oxygen demand
89 | EC | Electrical conductivity
90 | COD | Chemical oxygen demand
91 | TDS | Total dissolved solids
92 | TP | Total phosphorus
93 | TKN | Total Kjeldahl nitrogen
94 | NH3-N | Ammoniacal nitrogen
95 | RD | River depth
96 | RW | River width
97 | CA | Cultivated area
98 | OA | Orchard area
99 | SSL | Suspended sediment load
100 | SSC | Suspended sediment concentration
101 | NO3- | Nitrate
102 | NH4+ | Ammonium
103 | PO4 | Phosphate
104 | TSS | Total suspended solids
105 | NH4+-N | Ammonium nitrogen
106 | WT | Water temperature
107 | TU | Turbidity
108 | db | Daubechies4
109 | dmey | Discrete Meyer
110 | bior | Biorthogonal
111 | coif | Coiflets
112 | sym | Symlets (symmetrical wavelets)
113 | MCAR | Missing completely at random
114 | MAR | Missing at random
115 | MNAR | Missing not at random
116 | MICE | Multiple imputation by chained equations

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saxena, B., Gaonkar, M. & Singh, S.K. Study of the effectiveness of wavelet genetic programming model for water quality analysis in the Uttar Pradesh region. Environ Monit Assess 195, 1010 (2023). https://doi.org/10.1007/s10661-023-11489-y
