# Modeling Daily and Monthly Water Quality Indicators in a Canal Using a Hybrid Wavelet-Based Support Vector Regression Structure


## Abstract


_{4} (COD_{Mn}), ammonia nitrogen (NH_{3}-N), and dissolved oxygen (DO), in water from the Grand Canal from Beijing to Hangzhou. Modeling was conducted independently at daily and monthly time scales. The results demonstrated that the hybrid WA-PSO-SVR model was able to effectively predict non-linear stationary and non-stationary time series and outperformed two other models (PSO-SVR and a standalone SVR), especially in the prediction of extreme values. Daily predictions were more accurate than monthly predictions, indicating that the hybrid model was more suitable for short-term predictions in this case. The results also demonstrated that the autocorrelation and partial autocorrelation of a time series can be used to construct appropriate models for water quality prediction. The results contribute to water quality monitoring and better management of water diversion.

## 1. Introduction

_{4} (COD_{Mn}), ammonia nitrogen (NH_{3}-N), and dissolved oxygen (DO). The performance of the models developed for both time scales was compared and assessed. In addition, to assess the potential advantages of the WA-PSO-SVR model, its results were compared with those of a simple support vector regression model with particle swarm optimization (PSO-SVR) and a standalone SVR model, both of which were developed for the datasets.

## 2. Materials and Methods

#### 2.1. Study Area and Data Used

_{4} (COD_{Mn}), ammonia nitrogen (NH_{3}-N), and dissolved oxygen (DO), were selected for analysis. These parameters were selected because (1) they provide a general overview of the degree to which organic pollutants and nutrients have contaminated the river, and (2) the measurement and control of these pollutants is one of the primary tasks in operating the water diversion [33,34]. Thus, COD_{Mn}, NH_{3}-N, and DO were selected as the target indicators used to develop and test the predictive models in this study. A statistical summary of these three water quality indicators for both daily and monthly time scales is presented in Table 1, which also indicates whether each time series is stationary. Water was not transferred daily or on a regular schedule. Rather, transfers, which usually took place in winter or spring, depended on the demand for water in northern areas. As a result, water flows tended to change significantly on transfer days. With regard to the monthly data series, significant efforts to control wastewater and non-point source pollution led to a slight decrease in the pollution level within the region.

#### 2.2. Wavelet Analysis (WA)

#### 2.3. Support Vector Regression (SVR)

#### 2.4. Particle Swarm Optimization (PSO) Algorithms

#### 2.5. Model Development

COD_{Mn}, NH_{3}-N, and DO at the selected monitoring site. The steps implemented to predict the water quality indicators are shown in Figure 3 and included the following:

- Data pre-processing. Due to occasional inefficiencies of the auto-monitoring systems, some auto-monitoring data were missing or erroneous. Thus, statistical outliers and structural zeros were removed from the dataset. Missing values were estimated and replaced using an exponential smoothing method.
- Wavelet analysis. Wavelet analysis was used to decompose each time series into wavelet sub-series. The choice of mother wavelet influences sub-series decomposition and construction. Three commonly employed mother wavelets are the Daubechies, Symlet, and Haar wavelets. The db3 wavelet is a Daubechies extremal phase wavelet with three vanishing moments; it has often been successfully applied in water quality predictions [29,32]. Thus, a db3 wavelet based on four layers was used herein to decompose the water quality data series. Each analyzed time series was decomposed into five sub-series: one approximation series, A_{4}, and four detail series, one from each layer, D_{1}, D_{2}, D_{3}, and D_{4}.
- Data standardization. In order to remove dimensional effects that may bias the predictive models, the data were standardized by scaling the input variables over their range of observation prior to modeling. The general formula for standardization is:

  $$x_{i}^{\prime}=\frac{x_{i}-x_{\min}}{x_{\max}-x_{\min}}$$

  where $x_{i}$ and $x_{i}^{\prime}$ refer to the original and standardized values of variable **x**, respectively; and $x_{\min}$ and $x_{\max}$ refer to the minimum and maximum values of variable **x**, respectively. Based on the characteristics of the different series and the performance of the predictive models, the original data series and approximation series were scaled to the range from 0 to 1, while the detail series were scaled to between −1 and 1.
- PSO-SVR modeling. The hybrid model has a multi-input, single-output structure. The relevant and important input variables were extracted using the autocorrelation function (ACF) and partial autocorrelation function (PACF) of each time series, with the significance criterion for the correlation coefficients set at the 95% confidence level. The PSO method was then applied to determine the optimal parameter values for the SVR models. For each data series, five predictive models (one model for A_{4} and four models for D_{1} to D_{4}) were run and calculated separately.
- Data reconstruction. After calculation, the predicted values of the five sub-series (A_{4}, D_{1}, D_{2}, D_{3}, and D_{4}) were summed algebraically to generate the final forecasting results for each data series.
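The pre-processing, standardization, and reconstruction steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fill_missing_exponential`, `min_max_scale`, `reconstruct`) and the smoothing factor `alpha = 0.5` are hypothetical, and the mapping of detail series to [−1, 1] is assumed to be a linear rescaling of the min-max result, since the text reports only the target ranges. The wavelet decomposition and PSO-SVR fitting are not shown.

```python
from typing import List, Optional, Sequence


def fill_missing_exponential(series: Sequence[Optional[float]],
                             alpha: float = 0.5) -> List[float]:
    """Replace None entries with the current simple-exponential-smoothing level.

    alpha is an assumed smoothing factor; the source does not report one.
    """
    filled: List[float] = []
    level: Optional[float] = None
    for value in series:
        if value is None:
            if level is None:
                raise ValueError("series must start with an observed value")
            value = level  # the smoothed level stands in for the gap
        # Update the smoothed level with the (observed or imputed) value.
        level = value if level is None else alpha * value + (1 - alpha) * level
        filled.append(value)
    return filled


def min_max_scale(values: Sequence[float],
                  low: float = 0.0, high: float = 1.0) -> List[float]:
    """Scale values to [low, high] using x' = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant series")
    span = x_max - x_min
    return [low + (x - x_min) / span * (high - low) for x in values]


def reconstruct(sub_series: Sequence[Sequence[float]]) -> List[float]:
    """Sum the predicted sub-series (A4, D1..D4) element by element.

    The predictions are assumed to be back in original units already.
    """
    lengths = {len(s) for s in sub_series}
    if len(lengths) != 1:
        raise ValueError("all sub-series must have the same length")
    return [sum(step) for step in zip(*sub_series)]
```

In this sketch, an original or approximation series would be scaled with the defaults, and a detail series with `min_max_scale(detail, low=-1.0, high=1.0)`.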

#### 2.6. Performance Assessment of the Models

R^{2}), and the Nash–Sutcliffe efficiency coefficient (NSE).

Higher R^{2} and lower RMSE and MAPE values indicate a more precise model. The Nash–Sutcliffe efficiency coefficient (NSE), which ranges from −∞ to 1, can be used to assess the forecasting power of hydrological models [47]. The closer the NSE is to 1, the more accurate the model. When NSE = 0, model predictions are as accurate as the mean of the observed data. In contrast, when NSE < 0, the residual variance is larger than the observed data variance and the model is unreliable. Equations (20)–(23) are the mathematical expressions used to calculate RMSE, MAPE, R^{2}, and NSE, respectively.
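As a minimal sketch, the four assessment metrics can be computed in plain Python from their standard textbook definitions. The function names are illustrative, and R^{2} is assumed here to be the squared Pearson correlation between observed and predicted values, which is a common convention when it is reported alongside NSE; the source may define it differently.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    _check(observed, predicted)
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    _check(observed, predicted)
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of R^2)."""
    _check(observed, predicted)
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def nse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    _check(observed, predicted)
    mean_o = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - residual / total
```

Predicting the observed mean at every step gives NSE = 0 with these definitions, and predictions whose squared errors exceed the observed variance give NSE < 0, matching the interpretation above.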

## 3. Results

#### 3.1. Models for Daily Prediction

COD_{Mn}, NH_{3}-N, and DO were decomposed using the db3 wavelet based on four layers (as described above). The sub-series after decomposition and reconstruction are shown in Figure 4. These three series were all non-linear. All three parameters exhibited considerable fluctuations during the summer wet season (from June to September) (Figure 4). This may have been caused by the heavy precipitation during this period, which accounted for more than 80% of the annual precipitation.

NH_{3}-N was stationary, while COD_{Mn} and DO were non-stationary. The decomposed and reconstructed sub-series included both stationary and non-stationary series. For the stationary series, the inputs for subsequent models were selected using their autocorrelation coefficients. For the non-stationary series, the inputs were selected using their partial autocorrelation coefficients to obtain a high level of model performance [7]. The results for the three water quality indicators produced by the hybrid WA-PSO-SVR model and the two comparison models (i.e., the PSO-SVR and the standalone SVR, which used cross-validation as its optimization method) are presented in Figure 5.
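The input-selection rule described above can be illustrated with a short, dependency-free Python sketch. It is not the authors' code: the function names are hypothetical, the 95% criterion is assumed to be the usual large-sample bound of ±1.96/√n, and the partial autocorrelations are computed with the Durbin–Levinson recursion, which is one standard way to obtain them.

```python
import math
from typing import List, Sequence


def acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series has zero variance")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series: Sequence[float], max_lag: int) -> List[float]:
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev: List[float] = []  # AR coefficients of order k-1
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the order-k coefficients from the order-(k-1) ones.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def select_input_lags(series: Sequence[float], max_lag: int,
                      stationary: bool) -> List[int]:
    """Return lags whose (partial) autocorrelation exceeds the assumed 95% bound.

    ACF is used for stationary series and PACF for non-stationary series,
    following the rule stated in the text.
    """
    coeffs = acf(series, max_lag) if stationary else pacf(series, max_lag)
    bound = 1.96 / math.sqrt(len(series))  # assumed large-sample 95% limit
    return [k for k in range(1, max_lag + 1) if abs(coeffs[k]) > bound]
```

The selected lags would then define the lagged values fed to the SVR model of each sub-series.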

COD_{Mn} was better than for the other two indicators (NH_{3}-N and DO). Each indicator was predicted more closely by the WA-PSO-SVR model than by either the PSO-SVR or the single SVR model, especially for extreme values. The predictions generated by the PSO-SVR and the single SVR models were similar; in fact, the results nearly overlap for COD_{Mn}. The prediction of DO differed significantly among the three models. The performance of the standalone SVR model was lower than that of the other two models when predicted and observed values were compared. In addition, both the PSO-SVR and single SVR models exhibited a one-day lag between observed and predicted values, which led to larger model errors that can be seen in the scatter plots (Figure 6). The coefficients of determination (R^{2}) for the WA-PSO-SVR models were about 0.9, while the values for the other models were much lower. Although the prediction of DO was the worst (Figure 5), the R^{2} values for NH_{3}-N were the lowest among the three indicators. The highest R^{2} value for NH_{3}-N was only 0.8837; it was obtained with the WA-PSO-SVR model. The PSO-SVR model possessed larger errors than the standalone SVR model.

COD_{Mn} by the three models. All three models were efficient, with NSE values close to 1. During the testing period, the NSE value of the WA-PSO-SVR model was 10.73% and 11.04% higher than those of the PSO-SVR and standalone SVR models, respectively. Its RMSE was 46.76% and 47.23% lower, and its MAPE was 40.77% and 42.86% lower, respectively.
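These percentages are consistent with relative differences computed from the testing-phase RMSE, MAPE, and NSE values tabulated for the three daily COD_{Mn} models. The short Python check below assumes that each percentage is the change relative to the comparison model's value; the function and variable names are illustrative.

```python
from typing import Dict

# Testing-phase values for daily COD_Mn, copied from the results table.
TESTING_VALUES: Dict[str, Dict[str, float]] = {
    "WA-PSO-SVR": {"RMSE": 0.1420, "MAPE": 2.333, "NSE": 0.9627},
    "PSO-SVR": {"RMSE": 0.2667, "MAPE": 3.939, "NSE": 0.8694},
    "SVR": {"RMSE": 0.2691, "MAPE": 4.083, "NSE": 0.8670},
}


def relative_change_percent(hybrid: float, baseline: float,
                            higher_is_better: bool) -> float:
    """Improvement of the hybrid model relative to a baseline, in percent."""
    if higher_is_better:
        return 100.0 * (hybrid - baseline) / baseline
    return 100.0 * (baseline - hybrid) / baseline


def compare_to_hybrid(table: Dict[str, Dict[str, float]],
                      hybrid_name: str = "WA-PSO-SVR") -> Dict[str, Dict[str, float]]:
    """Relative improvement of the hybrid model over every other model."""
    hybrid = table[hybrid_name]
    gains: Dict[str, Dict[str, float]] = {}
    for name, metrics in table.items():
        if name == hybrid_name:
            continue
        gains[name] = {
            # NSE improves upward; RMSE and MAPE improve downward.
            metric: round(relative_change_percent(hybrid[metric], value,
                                                  higher_is_better=(metric == "NSE")), 2)
            for metric, value in metrics.items()
        }
    return gains


if __name__ == "__main__":
    for model, gains in compare_to_hybrid(TESTING_VALUES).items():
        print(model, gains)
```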

NH_{3}-N (Table 3), the WA-PSO-SVR model performed well, while the other two models performed poorly and were unreliable, as they exhibited NSE values below 0 in the testing phase. These results illustrate that the hybrid model was the only one that could be used for daily NH_{3}-N prediction.

#### 3.2. Models for Monthly Prediction

COD_{Mn}, NH_{3}-N, and DO were initially decomposed (Figure 7). NH_{3}-N exhibited a declining trend, whereas COD_{Mn} and DO exhibited constant trends with generally consistent fluctuations.

NH_{3}-N and DO were found to be non-stationary series, whereas the COD_{Mn} series was stationary. Following the selection of inputs for predictive models of each sub-series, the estimated results of the three models were calculated (Figure 8).

NH_{3}-N exhibited the largest errors. The curves predicted by the PSO-SVR and standalone SVR models for COD_{Mn} overlapped (as they did for the daily predictions); the predictions of DO were also similar. The hybrid models were also better at predicting extreme values. This was especially true for the prediction of maximum DO concentrations; the hybrid model was the only one that closely described changes in DO. The other two models even predicted changes that were opposite to the observed values. In addition, the predictions of these two models exhibited a one-month lag. The scatter plots in Figure 9 show that the WA-PSO-SVR models significantly outperformed the others. The prediction of NH_{3}-N was the worst; the highest R^{2} value was 0.8252. The prediction of NH_{3}-N also exhibited the largest differences between the hybrid model and the others. The PSO-SVR and standalone SVR models both performed extremely poorly in terms of NH_{3}-N predictions.

COD_{Mn} data by the WA-PSO-SVR model during the testing phase produced RMSE, MAPE, and NSE values of 0.2506, 5.126%, and 0.8941, respectively (Table 5). These statistical values show that the model was able to make relatively accurate predictions of the monthly COD_{Mn} time series. In contrast, the PSO-SVR and SVR models had similar statistical assessment values, with NSE values below 0 in the testing phase, indicating that their predictions were unreliable.

NH_{3}-N data was similar to COD_{Mn} (Table 6). Only the WA-PSO-SVR model produced reliable results, although its MAPE value was much larger for NH_{3}-N than for COD_{Mn}. The RMSE, MAPE, and NSE values calculated for the other two models show that both produced large errors and had difficulty generating satisfactory and accurate results.

NH_{3}-N than the other parameters; the WA-PSO-SVR model outperformed the other two models (Table 7). In the testing phase, the WA-PSO-SVR model produced RMSE values that were 50.23% and 48.96% lower than those of the PSO-SVR and SVR models, respectively. Its MAPE values were 55.94% and 56.65% lower, respectively, while its NSE value was 99.93% and 87.69% higher, respectively.

## 4. Discussion

NH_{3}-N and monthly COD_{Mn} series were stationary. However, the accuracy of the WA-PSO-SVR modeling results was unrelated to whether the time series were stationary. For daily NH_{3}-N and monthly COD_{Mn} prediction, the hybrid models generated satisfactory results, whereas both the PSO-SVR and SVR models produced unreliable results, as indicated by negative NSE values. The PSO-SVR and SVR models have performed satisfactorily in some other studies [21]; however, these results suggest that the hybrid model was more suitable for stationary data than the PSO-SVR and SVR models in this situation. As with stationarity, when the WA-PSO-SVR model was compared with the other two models, model performance was also unrelated to the distribution of the data. Wavelet analysis could increase the accuracy of prediction regardless of the skewness and kurtosis values.

COD_{Mn} was extremely good, with the highest NSE values, followed closely by DO; NH_{3}-N was the worst. However, when comparing the RMSE and MAPE values, the results were different. NH_{3}-N had the lowest RMSE, but DO had the lowest MAPE. In general, the prediction of NH_{3}-N was more difficult. The NH_{3}-N models always had the largest MAPE and the lowest NSE among the three indicators. This is related to the distribution of the data series. Although none of the six original data series was normally distributed, the two NH_{3}-N series had the largest absolute skewness and kurtosis values, indicating that they deviated further from a normal distribution than the other series. Highly skewed and imbalanced data can lead to poor model performance [48].

## 5. Conclusions

COD_{Mn}, NH_{3}-N, and DO reached up to 0.9627, 0.8433, and 0.9190, respectively, indicating that the models were able to provide satisfactory predictions. Third, among the three indicators in this study, COD_{Mn} and DO were effectively predicted for both daily and monthly timeframes, but NH_{3}-N showed the worst performance, as its data series deviated considerably from a normal distribution. Finally, this study shows that the prediction of water quality indicators using only a single data series (i.e., without considering other indicators) is possible. The autocorrelation of series data can identify statistically significant lagged data and be used to construct appropriate predictive models for daily management purposes.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Figure A1.** Flow chart of the standalone SVR structure for the prediction of water quality indicators.

## References

- Gorgoglione, A.; Gioia, A.; Iacobellis, V. A framework for assessing modeling performance and effects of rainfall-catchment-drainage characteristics on nutrient urban runoff in poorly gauged watersheds. Sustainability **2019**, 11, 4933.
- Liu, A.; Egodawatta, P.; Guan, Y.; Goonetilleke, A. Influence of rainfall and catchment characteristics on urban stormwater quality. Sci. Total Environ. **2013**, 444, 255–262.
- Boyacioglu, H. Development of a water quality index based on a European classification scheme. Water SA **2007**, 33, 101–106.
- Khalil, B.; Ouarda, T.B.M.J.; St-Hilaire, A.; Chebana, F. A statistical approach for the rationalization of water quality indicators in surface water quality monitoring networks. J. Hydrol. **2010**, 386, 173–185.
- Katimon, A.; Shahid, S.; Mohsenipour, M. Modeling water quality and hydrological variables using ARIMA: A case study of Johor River, Malaysia. Sustain. Water Resour. Manag. **2018**, 4, 991–998.
- Rajaee, T.; Jafari, H. Utilization of WGEP and WDT models by wavelet denoising to predict water quality parameters in rivers. J. Hydrol. Eng. **2018**, 23, 04018054.
- Fijani, E.; Barzegar, R.; Deo, R.C.; Tziritis, E.; Skordas, K.; Konstantinos, S. Design and implementation of a hybrid model based on two-layer decomposition method coupled with extreme learning machines to support real-time environmental monitoring of water quality parameters. Sci. Total Environ. **2019**, 648, 839–853.
- Ahmed, A.N.; Othman, F.B.; Afan, H.A.; Ibrahim, R.K.; Fai, C.M.; Hossain, S.; Ehteram, M.; Elshafie, A. Machine learning methods for better water quality prediction. J. Hydrol. **2019**, 578, 124084.
- Li, S.; Guo, W.; Mitchell, B. Evaluation of water quality and management of Hongze Lake and Gaoyou Lake along the Grand Canal in Eastern China. Environ. Monit. Assess. **2011**, 176, 373–384.
- Xiaolong, W.; Jingyi, H.; Ligang, X.; Qi, Z. Spatial and seasonal variations of the contamination within water body of the Grand Canal, China. Environ. Pollut. **2010**, 158, 1513–1520.
- Gorai, A.K.; Hasni, S.A.; Iqbal, J. Prediction of ground water quality index to assess suitability for drinking purposes using fuzzy rule-based approach. Appl. Water Sci. **2016**, 6, 393–405.
- Sahu, M.; Mahapatra, S.S.; Sahu, H.; Patel, R.K. Prediction of Water Quality Index Using Neuro Fuzzy Inference System. Water Qual. Expo. Health **2011**, 3, 175–191.
- Barakat, A.; El Baghdadi, M.; Rais, J.; Aghezzaf, B.; Slassi, M. Assessment of spatial and seasonal water quality variation of Oum Er Rbia River (Morocco) using multivariate statistical techniques. Int. Soil Water Conserv. Res. **2016**, 4, 284–292.
- Saha, N.; Rahman, M.S. Multivariate statistical analysis of metal contamination in surface water around Dhaka export processing industrial zone, Bangladesh. Environ. Nanotechnol. Monit. Manag. **2018**, 10, 206–211.
- Dong, L.; Wang, L.; Khahro, S.F.; Gao, S.; Liao, X. Wind power day-ahead prediction with cluster analysis of NWP. Renew. Sustain. Energy Rev. **2016**, 60, 1206–1212.
- Çamdevýren, H.; Demýr, N.; Kanik, A.; Keskýn, S. Use of principal component scores in multiple linear regression models for prediction of Chlorophyll-a in reservoirs. Ecol. Model. **2005**, 181, 581–589.
- Liu, C.; Hu, Y.; Yu, T.; Xu, Q.; Liu, C.; Li, X.; Shen, C. Optimizing the Water Treatment Design and Management of the Artificial Lake with Water Quality Modeling and Surrogate-Based Approach. Water **2019**, 11, 391.
- Wang, Y.; Wu, L.; Engel, B. Prediction of sewage treatment cost in rural regions with multivariate adaptive regression splines. Water **2019**, 11, 195.
- Heddam, S.; Kisi, O. Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J. Hydrol. **2018**, 559, 499–509.
- Yoon, H.; Kim, Y.; Ha, K.; Lee, S.-H.; Kim, G.-P. Comparative evaluation of ANN-and SVM-time series models for predicting freshwater-saltwater interface fluctuations. Water **2017**, 9, 323.
- Mohammad, S.K.; Paulin, C. Application of Support Vector Machine in Lake Water Level Prediction. J. Hydrol. Eng. **2006**, 11, 199–205.
- Sapankevych, N.I.; Sankar, R. Time series prediction using support vector machines: A survey. IEEE Comput. Intell. Mag. **2009**, 4, 24–38.
- Ostadrahimi, L.; Mariño, M.A.; Afshar, A. Multi-reservoir operation rules: Multi-swarm PSO-based optimization approach. Water Resour. Manag. **2012**, 26, 407–427.
- Nieto, P.G.; Garcia-Gonzalo, E.; Alonso-Fernández, J.R.; Muñiz, C.D. Hybrid PSO–SVM-based method for long-term forecasting of turbidity in the Nalón river basin: A case study in Northern Spain. Ecol. Eng. **2014**, 73, 192–200.
- Zhang, F.; Dai, H.; Tang, D. A conjunction method of wavelet transform-particle swarm optimization-support vector machine for streamflow forecasting. J. Appl. Math. **2014**, 2014, 910196.
- Alizadeh, M.J.; Kavianpour, M.R. Development of wavelet-ANN models to predict water quality parameters in Hilo Bay, Pacific Ocean. Mar. Pollut. Bull. **2015**, 98, 171–178.
- Meng, E.; Huang, S.; Huang, Q.; Fang, W.; Wu, L.; Wang, L. A robust method for non-stationary streamflow prediction based on improved EMD-SVM model. J. Hydrol. **2019**, 568, 462–478.
- Najah, A.A.; El-Shafie, A.; Karim, O.A.; Jaafar, O. Water quality prediction model utilizing integrated wavelet-ANFIS model with cross-validation. Neural Comput. Appl. **2012**, 21, 833–841.
- Liu, S.; Xu, L.; Jiang, Y.; Li, D.; Chen, Y.; Li, Z. A hybrid WA–CPSO-LSSVR model for dissolved oxygen content prediction in crab culture. Eng. Appl. Artif. Intell. **2014**, 29, 114–124.
- Kisi, O.; Parmar, K.S. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J. Hydrol. **2016**, 534, 104–112.
- Barzegar, R.; Adamowski, J.; Moghaddam, A.A. Application of wavelet-artificial intelligence hybrid models for water quality prediction: A case study in Aji-Chay River, Iran. Stoch. Environ. Res. Risk Assess. **2016**, 30, 1797–1819.
- Barzegar, R.; Moghaddam, A.A.; Adamowski, J.; Ozga-Zielinski, B. Multi-step water quality forecasting using a boosting ensemble multi-wavelet extreme learning machine model. Stoch. Environ. Res. Risk Assess. **2018**, 32, 799–813.
- Guo, P.; Ren, J. Variation trend analysis of water quality along the eastern route of South-to-North Water Diversion Project. South North Water Transf. Water Sci. Technol. **2014**, 1, 59–64. (In Chinese)
- Hu, Y.; Han, B.; Du, J. Water quality of Xuzhou block of the south-to-north water transfer project and countermeasures. Soils **2007**, 3, 483–487. (In Chinese)
- Qian, T.; Vai, M.I.; Xu, Y. Wavelet Analysis and Applications; Birkhäuser: Basel, Switzerland, 2007.
- Xu, M.; Han, M.; Lin, H. Wavelet-denoising multiple echo state networks for multivariate time series prediction. Inf. Sci. **2018**, 465, 439–458.
- Adamowski, J.; Chan, H.F. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. **2011**, 407, 28–40.
- Partal, T.; Kişi, Ö. Wavelet and neuro-fuzzy conjunction model for precipitation forecasting. J. Hydrol. **2007**, 342, 199–212.
- Kisi, O.; Cimen, M. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. J. Hydrol. **2011**, 399, 132–140.
- Christopoulou, E.B.; Skodras, A.N.; Georgakilas, A.A. The “Trous” wavelet transform versus classical methods for the improvement of solar images. In Proceedings of the 14th International Conference on Digital Signal Processing, Santorini, Greece, 1–3 July 2002.
- Vapnik, V.N. The nature of statistical learning theory. IEEE Trans. Neural Netw. **1995**, 8, 988–999.
- Vapnik, V.N. Statistical Learning Theory (Adaptive and Learning Systems for Signal Processing, Communications, and Control); Wiley: New York, NY, USA, 1998.
- Haykin, S.S. Neural Networks and Learning Machines; Pearson: Upper Saddle River, NJ, USA, 2009; Volume 3.
- Ring, M.; Eskofier, B.M. An approximation of the Gaussian RBF kernel for efficient classification with SVMs. Pattern Recognit. Lett. **2016**, 84, 107–113.
- Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2009.
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995.
- Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. **1970**, 10, 282–290.
- Liu, Y.; An, A.; Huang, X. Boosting prediction accuracy on imbalanced datasets with SVM ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Berlin, Germany, 2006.
- Borzilov, V.A.; Novitsky, M.A.; Konoplev, A.V.; Voszhennikov, O.I.; Gerasimenko, A.C. A model for prediction and assessment of surface water contamination in emergency situations and methodology of determining its parameters. Radiat. Prot. Dosim. **1993**, 50, 349–351.

**Figure 1.** (**a**) Location of the east route of the South-to-North Water Diversion Project in China; (**b**) location of Xuzhou City in Jiangsu Province; (**c**) location of the Zhanglou sampling site.

**Figure 3.** Flow chart of the wavelet analysis-support vector regression approach with particle swarm optimization algorithm (WA-PSO-SVR) structure for the prediction of water quality indicators.

**Table 1.** Statistical summary of the three water quality indicators at daily and monthly time scales.

Indicators | Unit | Data Group | Max. | Min. | Median | Std. Dev. | Skewness | Kurtosis | Stationarity ^{1} |
---|---|---|---|---|---|---|---|---|---|
COD_{Mn} | mg/L | Daily | 6.63 | 1.94 | 3.51 | 0.92 | 0.60 | −0.01 | N |
COD_{Mn} | mg/L | Monthly | 8.20 | 0.80 | 3.60 | 1.05 | 0.47 | 1.87 | S |
NH_{3}-N | mg/L | Daily | 3.97 | 0.06 | 0.16 | 0.41 | 6.00 | 43.06 | S |
NH_{3}-N | mg/L | Monthly | 4.10 | 0.11 | 0.50 | 0.46 | 4.17 | 25.98 | N |
DO | mg/L | Daily | 20.81 | 2.24 | 9.18 | 3.36 | 0.29 | −0.53 | N |
DO | mg/L | Monthly | 13.80 | 2.90 | 8.40 | 1.85 | 0.06 | −0.20 | N |

^{1} The stationarity of each time series was assessed using the Augmented Dickey–Fuller (ADF) test; N refers to a non-stationary series, and S refers to a stationary series.

**Table 2.** Statistical values of daily COD_{Mn} prediction in the training and testing phases.

Model | Training RMSE | Training MAPE (%) | Training NSE | Testing RMSE | Testing MAPE (%) | Testing NSE |
---|---|---|---|---|---|---|
WA-PSO-SVR | 0.1103 | 2.085 | 0.9867 | 0.1420 | 2.333 | 0.9627 |
PSO-SVR | 0.2929 | 4.530 | 0.9058 | 0.2667 | 3.939 | 0.8694 |
SVR | 0.2879 | 4.501 | 0.9090 | 0.2691 | 4.083 | 0.8670 |

**Table 3.** Statistical values of daily NH_{3}-N prediction in the training and testing phases.

Model | Training RMSE | Training MAPE (%) | Training NSE | Testing RMSE | Testing MAPE (%) | Testing NSE |
---|---|---|---|---|---|---|
WA-PSO-SVR | 0.0571 | 13.24 | 0.9839 | 0.0089 | 6.791 | 0.8433 |
PSO-SVR | 0.0853 | 15.43 | 0.9641 | 0.0228 | 17.32 | −0.0351 |
SVR | 0.1476 | 18.46 | 0.8924 | 0.0259 | 21.27 | −0.3466 |

**Table 4.** Statistical values of daily dissolved oxygen (DO) prediction in the training and testing phases.

Model | Training RMSE | Training MAPE (%) | Training NSE | Testing RMSE | Testing MAPE (%) | Testing NSE |
---|---|---|---|---|---|---|
WA-PSO-SVR | 0.7980 | 6.222 | 0.9204 | 0.2329 | 1.106 | 0.9190 |
PSO-SVR | 1.4298 | 10.59 | 0.7444 | 0.5567 | 3.061 | 0.5371 |
SVR | 1.3052 | 9.623 | 0.7870 | 0.9412 | 5.271 | −0.3230 |

**Table 5.** Statistical values of monthly COD_{Mn} prediction in the training and testing phases.

Model | Training RMSE | Training MAPE (%) | Training NSE | Testing RMSE | Testing MAPE (%) | Testing NSE |
---|---|---|---|---|---|---|
WA-PSO-SVR | 0.3102 | 6.724 | 0.9071 | 0.2506 | 5.126 | 0.8941 |
PSO-SVR | 0.7425 | 19.14 | 0.4679 | 0.8032 | 18.00 | −0.0881 |
SVR | 0.7395 | 19.00 | 0.4722 | 0.8136 | 18.15 | −0.1166 |

**Table 6.** Statistical values of monthly NH_{3}-N prediction in the training and testing phases.

Model | Training RMSE | Training MAPE (%) | Training NSE | Testing RMSE | Testing MAPE (%) | Testing NSE |
---|---|---|---|---|---|---|
WA-PSO-SVR | 0.1333 | 20.10 | 0.8683 | 0.0730 | 14.68 | 0.8142 |
PSO-SVR | 0.3549 | 42.74 | 0.0670 | 0.2062 | 55.17 | −0.4802 |
SVR | 0.3270 | 38.82 | 0.2082 | 0.2298 | 57.56 | −0.8388 |

**Table 7.** Statistical values of monthly dissolved oxygen (DO) prediction in the training and testing phases.

Model | Training RMSE | Training MAPE (%) | Training NSE | Testing RMSE | Testing MAPE (%) | Testing NSE |
---|---|---|---|---|---|---|
WA-PSO-SVR | 0.4429 | 3.876 | 0.9422 | 0.5222 | 4.277 | 0.8587 |
PSO-SVR | 1.1230 | 10.95 | 0.6286 | 1.0492 | 9.707 | 0.4295 |
SVR | 1.0800 | 10.05 | 0.6565 | 1.0232 | 9.866 | 0.4575 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, Y.; Yuan, Y.; Pan, Y.; Fan, Z.
Modeling Daily and Monthly Water Quality Indicators in a Canal Using a Hybrid Wavelet-Based Support Vector Regression Structure. *Water* **2020**, *12*, 1476.
https://doi.org/10.3390/w12051476
