# A Structurally Simplified Hybrid Model of Genetic Algorithm and Support Vector Machine for Prediction of Chlorophyll a in Reservoirs


## Abstract


## 1. Introduction

## 2. Study Area and Data Description

^{2}, its maximum depth is 153.93 m, and the maximum volume of reservoir storage is 3.349 × 10^{10} m^{3}. Monitoring data show that in recent years the total phosphorus concentration in the reservoir has fluctuated between 0.010 and 0.025 mg/L, which places the nutrition status of the water between mesotrophic and oligotrophic. The total nitrogen concentration ranges between 0.62 and 1.43 mg/L, indicating mild to moderate eutrophication. Planktonic algae in the Miyun Reservoir are highly diverse, and the dominant population differs between periods. Cyanobacteria were the dominant algae from September to October during 2001 to 2003 [21], and from June to September during 2008 to 2010 [22,23]. Given the current water quality, effective measures should be taken to alleviate the adverse influences of climate change and human activities on the reservoir.

**Figure 2.** Water environmental situation in Miyun Reservoir: (**a**) concentration of chlorophyll a; (**b**) TP concentration; (**c**) TN concentration; (**d**) COD_{Mn} concentration; (**e**) reservoir storage; and (**f**) water level.

## 3. Methods

#### 3.1. The Flow Chart for Developing a Simplified Structural GA-SVM Hybrid Model

**Figure 3.** The flow chart for developing the simplified structural GA-SVM hybrid model for chlorophyll a prediction.

#### 3.2. Construction of Chlorophyll a Prediction Model Based on the SVM Algorithm

^{n}; w is the regression weight vector; $\phi$ is a non-linear function by which x is mapped into a high-dimensional feature space; b is a bias; and b and w can be obtained with Equation (3). In the mapping process, a kernel function $k(\cdot,\cdot)$ can be constructed by $k(x, x') = \phi(x) \cdot \phi(x')$. Therefore, we only need to replace the x or x_{i} of the original space with $\phi(x)$ or $\phi(x_{i})$, and it is not necessary to know the explicit expression of the nonlinear mapping $\phi$. In this study, we selected the radial basis function (RBF) as the kernel function:

$$k(x, x_{i}) = \exp\left(-\gamma \lVert x - x_{i} \rVert^{2}\right)$$

where x_{i} is the input vector, x $\in$ R^{n}; and γ is the parameter of the RBF kernel function.
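As a small numerical illustration of the RBF kernel, the sketch below evaluates the standard form $\exp(-\gamma \lVert x - x_{i} \rVert^{2})$, which is the form LIBSVM [25] uses and is assumed here to match the kernel of this study. The function name `rbf_kernel`, the input vectors, and the γ values are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, x_i, gamma):
    """Return exp(-gamma * ||x - x_i||^2) for two equal-length vectors."""
    if len(x) != len(x_i):
        raise ValueError("x and x_i must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * squared_distance)


# Hypothetical, already-normalised input vectors (not data from the study).
x_a = [0.2, 0.5, 0.1, 0.9]
x_b = [0.3, 0.4, 0.1, 0.7]

k_same = rbf_kernel(x_a, x_a, gamma=0.5)    # identical vectors give 1.0
k_near = rbf_kernel(x_a, x_b, gamma=0.5)    # small gamma: slow decay with distance
k_sharp = rbf_kernel(x_a, x_b, gamma=50.0)  # large gamma: fast decay with distance
```

The kernel value depends only on the distance between the two vectors, so γ controls how quickly the similarity between samples falls off. This is one reason γ is a natural candidate for tuning alongside feature selection.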

W_{I} is inflow, W_{O} is outflow, L is water level, TP is the concentration of total phosphorus in water, TN is total nitrogen in water, COD_{Mn} is the permanganate index in water, DO is the dissolved oxygen concentration in water, T_{W} is water temperature, pH is the hydrogen ion concentration of water, SD is water transparency, T_{A} is temperature, and P is precipitation.

#### 3.3. Feature Selection and Parameter Optimization Based on Genetic Algorithm Optimization

#### 3.4. Model Calibration

R^{2}) were selected to evaluate the fit and prediction performance of the model. AE represented the deviation between monitored and predicted values, and RE was the ratio of AE to the monitored value, reflecting the relative accuracy of the results. RMSE reflected the performance of the prediction model; generally, the smaller the RMSE, the better the performance. R^{2} represented the degree of linear relevance among the variables; the closer R^{2} was to 1, the higher the relevance. The expressions of these four indicators were as follows:

y_{i} is the real value of the data set, ${\widehat{y}}_{i}$ is the predicted value, $\overline{y}$ is the average of the original data, and n is the number of data points in the testing set.
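The four indicators can be sketched in code as follows. This uses the common textbook definitions, which are an assumption here because the original expressions did not survive extraction. In particular, R^{2} is computed as one minus the ratio of the residual to the total sum of squares, which is one of several definitions in use. All function names and sample values are hypothetical.

```python
import math


def absolute_errors(observed, predicted):
    """AE_i = |y_i - y_hat_i| for every sample."""
    return [abs(y - y_hat) for y, y_hat in zip(observed, predicted)]


def relative_errors(observed, predicted):
    """RE_i = AE_i / |y_i|; observed values must be non-zero."""
    return [abs(y - y_hat) / abs(y) for y, y_hat in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root mean square error over n samples."""
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Assumed form: 1 - SS_res / SS_tot, with SS_tot taken about the observed mean."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical chlorophyll a values in mg/L (not data from the study).
observed = [0.0010, 0.0020, 0.0030, 0.0040]
predicted = [0.0012, 0.0018, 0.0033, 0.0037]

mean_ae = sum(absolute_errors(observed, predicted)) / len(observed)
mean_re = sum(relative_errors(observed, predicted)) / len(observed)
model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
```

AE and RMSE carry the units of the monitored quantity, while RE and R^{2} are dimensionless, which is why they are reported as percentages.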

## 4. Results and Discussion

#### 4.1. Wavelet Denoising

#### 4.2. Results of Sensitivity Analysis and Feature Selection

R^{2} was 97.33%. After the SVM prediction model was trained, chlorophyll a in the Miyun Reservoir was predicted for the period between 2005 and 2010. Figure 6b shows that the simulation was acceptable from 2005 to 2009 but unsatisfactory in 2010. Calculations showed that the RMSE of the testing set was 0.000641 and the R^{2} was 81.97%. From 2005 to 2009, the RMSE was 0.0004 and the R^{2} was 85.96%; in 2010, however, the RMSE was 0.0013 and the R^{2} was only 79.00%. This was primarily related to the fluctuations and periodicity of the monitored data. In the training data set, from 2000 to 2004, the concentration of chlorophyll a generally peaked in the middle of the year, but the testing data set showed no obvious periodic trend. In addition, the concentration of chlorophyll a in the Miyun Reservoir was higher in 2010 than in previous years. In April and from August to November, the concentration was anomalously high, exceeding 0.004 mg/L, a level never reached in the training data set. The SVM model was therefore sensitive to the data. To explore which indicators were most relevant to chlorophyll a, a sensitivity analysis was conducted for each input vector of the model.

**Figure 6.** Training and prediction results of the SVM model: (**a**) training results and (**b**) prediction results.

#### 4.3. Relative Errors of the SVM Model

#### 4.4. Comparisons of Model with Feature Selection and Model without Feature Selection

R^{2} of the model with feature selection were slightly larger, and the RMSE was slightly smaller; in the testing process, the mean AE, mean RE, and RMSE of the model with feature selection were smaller, and the R^{2} was significantly higher than that of the model without feature selection.

| Description | | Model with Feature Selection | Model without Feature Selection |
|---|---|---|---|
| Number of input vectors | | 4 | 13 |
| Input vectors | | TP, TN, COD_{Mn}, S | TP, TN, COD_{Mn}, S, W_{I}, W_{O}, L, DO, T_{W}, pH, SD, T_{A}, P |
| Training process | Mean AE | 0.00014244 | 0.00013824 |
| | Mean RE | 12.35% | 10.64% |
| | RMSE | 0.00017 | 0.00018 |
| | R^{2} | 97.33% | 97.19% |
| Testing process | Mean AE | 0.00045199 | 0.00057325 |
| | Mean RE | 22.98% | 26.16% |
| | RMSE | 0.000641 | 0.000836 |
| | R^{2} | 81.97% | 69.36% |

W_{I} is inflow; W_{O} is outflow; L is water level; TP is the concentration of total phosphorus in water; TN is total nitrogen in water; COD_{Mn} is the permanganate index in water; DO is the dissolved oxygen concentration in water; T_{W} is water temperature; pH is the hydrogen ion concentration of water; SD is water transparency; T_{A} is the temperature; and P is precipitation.
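To make the size of the testing-process differences concrete, the sketch below computes the relative reduction in each error indicator and the gain in R^{2} from the values reported in the comparison table. The variable and function names are hypothetical, and the derived percentages are simple arithmetic on the tabulated values rather than results stated by the authors.

```python
# Testing-process values copied from the comparison table:
# (model with feature selection, model without feature selection).
testing_errors = {
    "mean_ae": (0.00045199, 0.00057325),
    "mean_re": (22.98, 26.16),      # percent
    "rmse": (0.000641, 0.000836),
}
testing_r2 = (81.97, 69.36)         # percent


def percent_reduction(with_fs, without_fs):
    """Relative reduction of an error indicator achieved by feature selection."""
    return 100.0 * (without_fs - with_fs) / without_fs


reductions = {
    name: percent_reduction(with_fs, without_fs)
    for name, (with_fs, without_fs) in testing_errors.items()
}

# Gain in R^2, expressed in percentage points.
r2_gain_points = testing_r2[0] - testing_r2[1]

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower with feature selection")
print(f"R^2: {r2_gain_points:.2f} percentage points higher with feature selection")
```

On these figures, the model with 4 input vectors lowers every testing error relative to the model with 13 input vectors, so the structural simplification does not come at the cost of prediction accuracy.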

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Scholz, M. Sustainable water systems. Water **2013**, 5, 239–242.
2. Cai, Y.P.; Huang, G.H.; Tan, Q.; Yang, Z.F. An integrated approach for climate-change impact analysis and adaptation planning under multi-level uncertainties. Part I: Methodology. Renew. Sustain. Energy Rev. **2011**, 15, 2779–2790.
3. Cai, Y.P.; Huang, G.H.; Yang, Z.F.; Tan, Q. Identification of optimal strategies for energy management systems planning under multiple uncertainties. Appl. Energy **2009**, 86, 480–495.
4. Tan, Q.; Huang, G.H.; Cai, Y.P. Radial interval chance-constrained programming for agricultural non-point source water pollution control under uncertainty. Agric. Water Manag. **2011**, 98, 1595–1606.
5. Mulia, I.E.; Tay, H.; Roopsekhar, K.; Tkalich, P. Hybrid ANN-GA model for predicting turbidity and chlorophyll-a concentrations. J. Hydro-Environ. Res. **2013**, 7, 279–299.
6. Liu, Y.; Guo, H.; Yang, P. Exploring the influence of lake water chemistry on chlorophyll a: A multivariate statistical model analysis. Ecol. Model. **2010**, 221, 681–688.
7. Cerco, C.F.; Noel, M.R. Twenty-one-year simulation of Chesapeake Bay water quality using the CE-QUAL-ICM eutrophication model. J. Am. Water Resour. Assoc. **2013**, 49, 1119–1133.
8. Blancher, E.C. Modeling nutrients and multiple algal groups using AQUATOX: Watershed management implications for the Braden River Reservoir, Bradenton Florida. Proc. Water Environ. Feder. **2010**, 10, 6393–6410.
9. Rinke, K.; Yeates, P.; Rothhaupt, K.O. A simulation study of the feedback of phytoplankton on thermal structure via light extinction. Freshw. Biol. **2010**, 55, 1674–1693.
10. Seo, D.G.; Ahn, J.H. Prediction of chlorophyll-a changes due to weir constructions in the Nakdong River using EFDC-WASP modelling. Environ. Eng. Res. **2012**, 17, 95–102.
11. Chen, Q.; Han, R.; Ye, F.; Li, W. Spatio-temporal ecological models. Ecol. Inform. **2011**, 6, 37–43.
12. Gandomi, A.H.; Yun, G.J.; Yang, X.S.; Talatahari, S. Chaos-enhanced accelerated particle swarm optimization. Commun. Nonlinear Sci. Numer. Simul. **2013**, 18, 327–340.
13. Maity, R.; Bhagwat, P.P.; Bhatnagar, A. Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrol. Process. **2010**, 24, 917–923.
14. Malekmohamadi, I.; Bazargan-Lari, M.R.; Kerachian, R.; Nikoo, M.R.; Fallahnia, M. Evaluating the efficacy of SVMs, BNs, ANNs and ANFIS in wave height prediction. Ocean Eng. **2011**, 38, 487–497.
15. Karamouz, M.; Ahmadi, A.; Moridi, A. Probabilistic Reservoir Operation Using Bayesian Stochastic Model and Support Vector Machine. Adv. Water Resour. **2009**, 32, 1588–1600.
16. Çimen, M.; Kisi, O. Comparison of Two Different Data-driven Techniques in Modelling Lake Level Fluctuations in Turkey. J. Hydrol. **2009**, 378, 253–262.
17. Zhang, Y.C.; Qian, X.; Qian, Y.; Liu, J.P.; Kong, F.X. Application of SVM on Chl-a concentration retrievals in Taihu Lake. China Environ. Sci. **2009**, 29, 78–83. (In Chinese)
18. Xiang, X.Q.; Tao, J.H. Eutrophication Model of Bohai Bay Based on GA-SVM. J. Tianjin Univ. **2011**, 44, 215–220. (In Chinese)
19. Liu, C.; Tang, D. Spatial and temporal variations in algal blooms in the coastal waters of the western South China Sea. J. Hydro-Environ. Res. **2012**, 6, 239–247.
20. Cho, K.H.; Kang, J.H.; Ki, S.J.; Park, Y.; Kim, J.H. Determination of the optimal parameters in regression models for the prediction of chlorophyll-a: A case study of the Yeongsan Reservoir, Korea. Sci. Total Environ. **2009**, 407, 2536–2545.
21. Wang, L.; Yang, M.; Guo, Z.H.; Zhang, Y.; Jiang, Y.; Fan, K.P. Study on water quality transformation in Miyun Reservoir. China Water Wastewater **2006**, 22, 45–48. (In Chinese)
22. Jia, D.M.; Wang, J.S.; Xue, X.J.; Qi, Z.Y. Research on phytoplankton characteristics of Miyun Reservoir. Beijing Water **2013**, 1, 12–15. (In Chinese)
23. Pan, K.M.; Wang, J.M. Control and management of eutrophication of the Miyun reservoir. Beijing Water **2010**, 6, 25–27. (In Chinese)
24. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. **1995**, 20, 273–297.
25. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. **2011**, 2, 1–39.
26. Maus, A.; Sprott, C. Neural network method for determining embedding dimension of a time series. Commun. Nonlinear Sci. Numer. Simul. **2011**, 16, 3294–3302.
27. Pourbasheer, E.; Riahi, S.; Ganjali, M.R.; Norouzi, P. Application of genetic algorithm-support vector machine (GA-SVM) for prediction of BK-channels activity. Eur. J. Med. Chem. **2009**, 44, 5023–5028.
28. Fernandez, M.; Caballero, J.; Fernandez, L.; Sarai, A. Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM). Mol. Divers. **2011**, 15, 269–289.
29. Canfield, D.E., Jr. Prediction of chlorophyll a concentrations in Florida lakes: The importance of phosphorus and nitrogen. J. Am. Water Resour. Assoc. **1983**, 19, 255–262.
30. Noori, R.; Karbassi, A.R.; Moghaddamnia, A.; Han, D.; Zokaei-Ashtiani, M.H.; Farokhnia, A.; Gousheh, M.G. Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. J. Hydrol. **2011**, 401, 177–189.
31. Besalatpour, A.A.; Ayoubi, S.; Hajabbasi, M.A. Feature selection using parallel genetic algorithm for the prediction of geometric mean diameter of soil aggregates by machine learning methods. Arid Land Res. Manag. **2014**, 28, 383–394.
32. Zandieh, M.; Karimi, N. An adaptive multi-population genetic algorithm to solve the multi-objective group scheduling problem in hybrid flexible flowshop with sequence-dependent setup times. J. Intell. Manuf. **2011**, 22, 979–989.
33. Halder, U.; Das, S.; Maity, D. A cluster-based differential evolution algorithm with external archive for optimization in dynamic environments. IEEE Trans. Cybern. **2013**, 43, 881–897.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Su, J.; Wang, X.; Zhao, S.; Chen, B.; Li, C.; Yang, Z.
A Structurally Simplified Hybrid Model of Genetic Algorithm and Support Vector Machine for Prediction of Chlorophyll *a* in Reservoirs. *Water* **2015**, *7*, 1610–1627.
https://doi.org/10.3390/w7041610
