Water Quality Prediction Method Based on IGRA and LSTM

Water quality prediction has great significance for water environment protection. A water quality prediction method based on the Improved Grey Relational Analysis (IGRA) algorithm and a Long-Short Term Memory (LSTM) neural network is proposed in this paper. Firstly, considering the multivariate correlation of water quality information, IGRA, in terms of similarity and proximity, is proposed to make feature selection for water quality information. Secondly, considering the time sequence of water quality information, the water quality prediction model based on LSTM, whose inputs are the features obtained by IGRA, is established. Finally, the proposed method is applied in two actual water quality datasets: Tai Lake and Victoria Bay. Experimental results demonstrate that the proposed method can take full advantage of the multivariate correlations and time sequence of water quality information to achieve better performance on water quality prediction compared with the single feature or non-sequential prediction methods.


Introduction
Accurate water quality prediction is the basis of water environment management and is of great significance for water environment protection. Water quality information exists in the form of multivariate time-series datasets. The accuracy of water quality prediction can be expected to improve if the multivariate correlation and time sequence of water quality data are fully used.
The common methods for water quality prediction include Artificial Neural Networks (ANN), Regression Analysis (RA), Grey Systems (GS), and Support Vector Regression (SVR). Li et al. [1] applied an optimized back-propagation neural network to predict the concentration of chlorophyll in a lake. Grbić et al. [2] proposed a method based on Gaussian process regression to predict daily average water temperature. Candelieri et al. [3] applied clustering and SVR to water demand forecasting and anomaly detection. Dai et al. [4] established the Grey Model (1,1) with GS theory to predict major pollutants in a particular water environment.
Most of the methods mentioned above adopted only a single feature for prediction without considering the multivariate correlation of water quality information. Some researchers have considered multiple indicators in prediction [5][6][7][8], but the correlations among these indicators have not been analyzed. The multivariate correlations of water quality information refer to the complex and variable correlations among various indicators; one example is the nonlinear correlation between dissolved oxygen content and multiple indicators such as microbial concentration, temperature, and salinity. To take advantage of the multivariate correlations of water quality information, it is essential to analyze the correlations among various indicators and select appropriate features from the water quality indicators. Common methods of correlation analysis include Granger Causality Analysis (GCA) [9], Copula Analysis (CA) [10], and Grey Relational Analysis (GRA) [11]. GCA can only analyze the information qualitatively and is unable to give a quantitative description. Therefore, it cannot be directly applied to a nonlinear system such as the water environment. CA cannot find a suitable marginal distribution when dealing with irregularly distributed water quality information. There are many factors affecting the water quality indicators, and the information about them is partial and grey in many cases. Therefore, GRA is well suited to such problems. Nevertheless, GRA has a problem with measuring negative correlations. Therefore, an Improved Grey Relational Analysis (IGRA) algorithm is proposed in this paper to measure the correlations among water quality indicators more accurately. It is then used to perform feature selection on the indicators.
Water quality information exists as time series, which means it changes periodically over time. For instance, water quality information changes significantly as the season changes. With the development of water quality prediction, neural networks with nonlinear and self-organizing learning characteristics have been widely adopted [12][13][14][15][16]. However, the neuron structure of traditional neural networks is not suitable for sequential data. A Long Short-Term Memory (LSTM) neural network, which is a kind of recurrent neural network (RNN) [17], establishes a long time lag between input and feedback while preventing gradient explosion. This neuron structure has a selective memory function, which is very suitable for dealing with sequential data such as water quality information. It has been applied successfully in the field of time series prediction, for example in stock prediction [18] and traffic flow prediction [19].
To take full advantage of the multivariate correlation and time sequence of water quality information, IGRA and LSTM are combined for water quality prediction in this paper. Firstly, IGRA is proposed to perform feature selection for water quality information. Secondly, LSTM is adopted to establish the water quality prediction model, whose inputs are the indicators obtained by IGRA. The proposed method is compared with other similar methods on two actual water quality datasets: Tai Lake and Victoria Bay. The experimental results demonstrate that the method proposed in this paper performs better for water quality prediction than the other similar methods.
The contributions of this paper are listed as follows: (1) IGRA is proposed to perform feature selection to take full advantage of the multivariate correlation of water quality information. (2) LSTM is employed to establish the water quality prediction model to make full use of the time sequence of water quality information.
The rest of the paper is structured as follows: The water quality prediction method based on IGRA and LSTM is described in Section 2. The experiments and comparison analysis with other methods are discussed in Section 3. This paper is summarized in Section 4.

Feature Selection Based on IGRA
GRA is a multi-factor statistical analysis method. In this paper, the grey correlation degree in GRA is regarded as the evaluation index for the relevance of water quality indicators. Liu et al. [20] proposed the correlation calculation in terms of similarity and proximity. However, when their method is used to calculate the correlation among the water quality indicators, the positive and negative areas counterbalance during the integration process. As a result, the results often cannot accurately reflect the relevance of the indicators. Therefore, IGRA is proposed in this paper.

Definition 1.
Set the water quality sequence as $X_i(n) = [x_i(1), \ldots, x_i(n)]$, where $X_i(n)$ represents the observations of the water quality indicator $X_i$ at the previous $n$ historical moments and the observation of $X_i$ at the $n$th moment is denoted as $x_i(n)$. Then, the origin annihilation image of $X_i(n)$ can be expressed as

$$X_i^0(n) = X_i(n)D = [x_i(1) - x_i(1), \ldots, x_i(n) - x_i(1)],$$

where $D$ is the origin annihilation operator.
Furthermore, on the interval $[k, k+1]$, the integration of the above-mentioned can be regarded as the area of a right triangle with a right-angled side of length 1, and the integration can be further expressed accordingly.

Definition 3.
There are two compared water quality sequences $X_i(n)$ and $X_j(n)$. The similarity and proximity coefficients between $X_i(n)$ and $X_j(n)$ are calculated by Equations (5) and (6), respectively, where sgn returns an integer indicating the sign of its argument. The similarity and proximity between $X_i(n)$ and $X_j(n)$ are then calculated, and the grey correlation degree between $X_i(n)$ and $X_j(n)$ is denoted as $w$ ($w$ is in the range of 0-1).
IGRA calculates the similarity and proximity by the relative area change ratio. Positive and negative areas never counterbalance during the calculation [21], which makes the calculation of the correlations among the water quality indicators more objective and accurate.
Set $X_i$ as the water quality indicator to predict; the grey correlation degree $w$ between $X_i$ and another indicator can be calculated by Equations (1)-(9). The $s$ water quality indicators $U = \{X_{i1}, X_{i2}, \ldots, X_{is}\}$ with the largest absolute values of grey correlation degree with $X_i$ are selected, where $X_{is}$ represents the $s$th indicator associated with $X_i$. The selected indicators $U$ and $X_i$ together are regarded as the features. The observations of the features at the previous $t-d$ historical moments are applied to predict $x_i(t)$, which is the value of $X_i$ at the $t$th moment. The size of the sliding window is denoted as $d$, which determines how many historical observations are adopted. After feature selection via IGRA, the input of the prediction model, denoted as $T$, is composed of the observations of $X_i$ and of the indicators in $U$ at the previous $t-d$ historical moments.
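The selection and windowing steps described above can be illustrated with a minimal sketch. It assumes the grey correlation degrees have already been computed by IGRA; the function names `select_features` and `build_samples`, the dictionary layout, and the example values are hypothetical and are not taken from the paper. The window is assumed to contain the $d$ observations immediately preceding the predicted moment.

```python
from typing import Dict, List, Sequence, Tuple


def select_features(w: Dict[str, float], s: int) -> List[str]:
    """Return the names of the s indicators with the largest |w|."""
    ranked = sorted(w, key=lambda name: abs(w[name]), reverse=True)
    return ranked[:s]


def build_samples(
    series: Dict[str, Sequence[float]],
    target: str,
    features: List[str],
    d: int,
) -> List[Tuple[List[List[float]], float]]:
    """Pair each window of d past observations with the next target value.

    Each window row holds the target indicator followed by the selected
    features at one historical moment (an assumed layout).
    """
    names = [target] + features
    n = len(series[target])
    samples = []
    for t in range(d, n):
        window = [[series[name][k] for name in names] for k in range(t - d, t)]
        samples.append((window, series[target][t]))
    return samples


# Hypothetical correlation degrees with DO (illustrative values only).
w_example = {"WT": -0.82, "pH": 0.61, "TN": 0.35, "SS": -0.12}
selected = select_features(w_example, s=2)
```

Ranking by absolute value keeps strongly negative correlations, such as the hypothetical one for WT above, among the selected features.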

Water Quality Prediction Based on LSTM
LSTM was proposed by Hochreiter and Schmidhuber in 1997 [22]. It is a kind of RNN, which converges to the optimal solution faster and more easily than other traditional neural networks when dealing with time sequence prediction problems. A water quality prediction model based on LSTM is shown in Figure 1. The inputs are the observations of $X_i$ and $U = \{X_{i1}, X_{i2}, \ldots, X_{is}\}$ at the previous $t-d$ historical moments, denoted as $T$. The output is the predicted value of $X_i$ at the $t$th moment, denoted as $x_i(t)$. The model consists of three layers: the input layer, the hidden layer, and the output layer. The weight between the input layer and the hidden layer is represented as $W_{ih}$. The neurons of the hidden layer are denoted as $H = (h_1, h_2, \ldots, h_j)$, where the $j$th neuron of the hidden layer is expressed as $h_j$. The weight within the hidden layer is denoted as $W_{hh}$. The weight between the hidden layer and the output layer is represented as $W_{ho}$.
The calculation of the model is shown as follows:

$$H = W_{ih} T + W_{hh} h + b_h$$

In the above formulas, the bias vector of the hidden layer is denoted as $b_h$ and the bias vector of the output layer is denoted as $b_y$.
Each neuron of the hidden layer in Figure 1 consists of three gates: the input gate, the output gate, and the forget gate. The structure of an LSTM neuron is shown in Figure 2.
In Figure 2, the forget gate determines which part of the information should be forgotten according to the current input $x_i(t-d)$, the last moment state of the neuron $c_{t-d+1}$, and the last moment output of the $j$th neuron $h_j(t-d+1)$ in the hidden layer. The input gate determines which part of the information should be the input of the current moment state $c_{t-d}$ according to $x_i(t-d)$, $c_{t-d+1}$, and $h_j(t-d+1)$. The output gate determines the output of the current moment state according to $c_{t-d}$, $h_j(t-d+1)$, and $x_i(t-d)$.
The forget gate, the input gate, the update of the state in the neuron, the output gate, and the hidden layer at the $t$th moment are calculated in turn. In these formulas, the sigmoid function is represented as $\sigma$; $g$ and $P$ are extensions of the standard sigmoid function with value ranges of $[-2, 2]$ and $[-1, 1]$, respectively. $W_{fi}$, $W_{fc}$, and $W_{fh}$ are the weights between the forget gate and the input layer, the state cell, and the hidden layer, respectively. $W_{Ii}$, $W_{Ic}$, and $W_{Ih}$ are the weights between the input gate and the input layer, the state cell, and the hidden layer, respectively. $W_{ci}$, $W_{ch}$, and $W_{cc}$ are the weights between the state cell and the input layer, the hidden layer, and the last moment state of the state cell, respectively. $W_{oi}$ and $W_{oc}$ are the weights between the output gate and the input layer and the state cell, respectively. The bias vectors of the forget gate, the input gate, the state cell, and the output gate are denoted as $b_f$, $b_I$, $b_c$, and $b_o$, respectively. $*$ stands for the scalar product.
The selective memory function of LSTM is implemented by the gating mechanism, which makes LSTM more suitable for dealing with time sequence prediction problems than other traditional neural networks. The water quality prediction model based on LSTM can take full advantage of the time sequence of the water quality information to improve the accuracy of prediction.
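To make the gating mechanism concrete, the following is a minimal single-neuron sketch written with scalar weights. It is not the paper's implementation: the gate formulas assume the common peephole LSTM form and use only the weights and biases named in the text, the exact definitions of $g$ and $P$ are assumed to be the logistic sigmoid rescaled to $[-2, 2]$ and $[-1, 1]$, and the function name `lstm_step` and the dictionary keys are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def g(x: float) -> float:
    """Sigmoid rescaled to the range [-2, 2] (assumed form)."""
    return 4.0 * sigmoid(x) - 2.0


def P(x: float) -> float:
    """Sigmoid rescaled to the range [-1, 1] (assumed form)."""
    return 2.0 * sigmoid(x) - 1.0


def lstm_step(
    x: float,
    h_prev: float,
    c_prev: float,
    W: Dict[str, float],
    b: Dict[str, float],
) -> Tuple[float, float]:
    """One step of a single LSTM neuron; returns (h, c)."""
    # Forget gate: how much of the previous state to keep.
    f = sigmoid(W["fi"] * x + W["fh"] * h_prev + W["fc"] * c_prev + b["f"])
    # Input gate: how much of the new candidate to write into the state.
    i = sigmoid(W["Ii"] * x + W["Ih"] * h_prev + W["Ic"] * c_prev + b["I"])
    # State update: keep part of the old state and add the gated candidate.
    candidate = g(W["ci"] * x + W["ch"] * h_prev + W["cc"] * c_prev + b["c"])
    c = f * c_prev + i * candidate
    # Output gate: uses only the input and state-cell weights named in the text.
    o = sigmoid(W["oi"] * x + W["oc"] * c + b["o"])
    # Hidden output: the gated, squashed state.
    h = o * P(c)
    return h, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the state is simply halved at each step; this gives a quick sanity check of the wiring.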

Water Quality Prediction Method Based on IGRA and LSTM
The procedure of the water quality prediction method based on IGRA and LSTM is shown in Figure 3. In order to take full advantage of the multivariate correlation and time sequence of water quality information, the method in Section 2.1 is applied to select features from water quality information and the method in Section 2.2 is adopted to establish a water quality prediction model.
The specific steps for the prediction of the water quality indicator $X_i$ are as follows:
Step 1. Exclude outliers based on the Pauta criterion and normalize the datasets.
Step 2. Calculate the correlation between $X_i$ and the other water quality indicators by IGRA.
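Step 1 above can be sketched as follows. The Pauta criterion is the three-sigma rule: an observation is treated as an outlier when it deviates from the mean by more than three standard deviations. The paper does not state which normalization is used, so min-max scaling to $[0, 1]$ is an assumption here, as are the use of the population standard deviation and the function names `pauta_filter` and `min_max_normalize`.

```python
from statistics import mean, pstdev
from typing import List


def pauta_filter(values: List[float]) -> List[float]:
    """Drop observations farther than 3 standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    return [v for v in values if abs(v - mu) <= 3 * sigma]


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical DO readings with one obvious sensor fault (100.0).
readings = [7.9, 8.1] * 10 + [100.0]
cleaned = pauta_filter(readings)
normalized = min_max_normalize(cleaned)
```

Note that the three-sigma rule only flags a point when the series is long enough; with very few observations a single extreme value inflates the standard deviation and is not removed.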

Results and Discussion
The experiments are implemented with the neural network toolkits Keras and TensorFlow. According to our previous work [23], the optimal numbers of neuron nodes for the three layers are 3, 8, and 1. The number of epochs is set to 50 and the ratio of the training set to the test set is 8:2. The proposed method is compared with other similar methods on two actual water quality datasets: Tai Lake and Victoria Bay.

Datasets
Tai Lake is the third largest freshwater lake in China, with a perimeter of about 400 kilometers. In recent decades, industry and agriculture in the areas around Tai Lake have developed rapidly and the water has been seriously polluted. In 2000, only 15% of the water bodies were not polluted, and the rest suffered varying degrees of pollution. The Tai Lake dataset is composed of 648 monthly historical monitoring records collected from 8 monitoring stations between 2000 and 2006. It includes 10 water quality indicators: Total Nitrogen (TN), Total Phosphorus (TP), Ammonia Nitrogen (NH3-N), Suspended Solids (SS), Water Temperature (WT), Dissolved Oxygen (DO), Hydrogen Ion Concentration (pH), Transparency, Chloride (CL), and Precipitation.
Victoria Bay is the harbour between the Kowloon Peninsula and Hong Kong Island in China. Its area is about 41.88 km². It was formed more than 7000 years ago when the sea level was lower than it is now. In recent years, the content of DO in Victoria Bay has been lower than the standard. The Victoria Bay dataset is composed of 4283 historical monitoring records collected from 8 monitoring stations every two weeks between 1986 and 2016. It includes 9 water quality indicators: Escherichia coli (E. coli), 5-day Biochemical Oxygen Demand (BOD5), NH3-N, Nitrite, Phosphate, pH, WT, Salinity, and DO.
It is important to make water quality predictions for Tai Lake and Victoria Bay. The water quality indicator predicted in this experiment is DO.

Results of Feature Selection
This paper applies different relational analysis methods to calculate the correlation between DO and other indicators. The results for Tai Lake and Victoria Bay are shown in Tables 1 and 2. It can be seen from Tables 1 and 2 that, compared with the grey relational analysis used in literature [20], IGRA can measure not only the positive but also the negative correlations between DO and other water quality indicators. Compared with the grey relational analysis algorithm in terms of similarity in literature [21], the results of IGRA in terms of similarity and proximity are more consistent with the results of qualitative analysis.
To further verify the effectiveness of IGRA, the 4 indicators in the above tables with the largest absolute correlation with DO are selected as input features for the prediction model based on LSTM. The prediction errors for Tai Lake and Victoria Bay are shown in Tables 3 and 4. From Tables 3 and 4, compared with literature [23], which adopts only the single feature DO for prediction, the results of the method with multiple features as inputs are better. Compared with the grey relational analysis algorithms in literature [20] and literature [21], the prediction error (root mean square error, RMSE) is smaller when the features are selected by IGRA. This suggests that IGRA can take full advantage of the multivariate correlation of water quality information, which is effective for improving the accuracy of prediction.
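For reference, the RMSE used as the error measure above, together with an 8:2 split of the samples, can be sketched as follows. The paper does not say how the split is drawn; keeping the samples in time order and using the last 20% for testing is an assumption, and the function names `split_ordered` and `rmse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_ordered(samples: Sequence, train_ratio: float = 0.8) -> Tuple[List, List]:
    """Split samples in their original order (assumed chronological split)."""
    cut = int(len(samples) * train_ratio)
    return list(samples[:cut]), list(samples[cut:])


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower RMSE on the test portion indicates predictions that are closer to the real observations, which is the sense in which the methods are compared in Tables 3 and 4.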

Results of Water Quality Prediction
The results of feature selection through IGRA for Tai Lake and Victoria Bay are shown in the fourth rows of Tables 3 and 4, respectively. The comparison among the DO prediction results of LSTM, the Back Propagation (BP) neural network, and the Auto Regressive Integrated Moving Average (ARIMA) model with the same inputs is shown in Figures 4 and 5. The RMSE of these methods is shown in Figure 6. The prediction results in a random window of sequential sampling points from the test data set are shown in Figures 4 and 5. According to these figures, the results of LSTM are closer to the real observations, which indicates that the prediction model based on LSTM is more accurate than the models based on BP or ARIMA. The RMSE for the entire test data set is shown in Figure 6. According to Figure 6, the RMSE of LSTM is lower than that of BP and ARIMA for both Tai Lake and Victoria Bay. This suggests that LSTM can take full advantage of the time sequence of water quality information, which is effective for improving the accuracy of prediction.

Conclusions
Water quality prediction has great significance for water environment protection. Considering the multivariate correlation and time sequence of water quality information, a water quality prediction method based on IGRA and LSTM is proposed in this paper. First, IGRA is proposed to select as features the indicators that have a larger absolute correlation with the indicator to predict. Second, a prediction model based on LSTM is established, whose inputs are the indicators obtained by IGRA.

Figure 1. Water quality prediction model based on LSTM.

Step 3. Select a set of water quality indicators $U$ with larger absolute values of correlation with $X_i$. After that, construct a training set $D$.
Step 4. Establish the water quality prediction model based on LSTM, and train the model with $D$ until the loss function of the model converges.
Step 5. Input the observations $T$ of $X_i$ and $U$ at the previous $t-d$ historical moments to the model to acquire the predicted value $x_i(t)$ of $X_i$ at the $t$th moment.

Figure 3. Flow chart of water quality prediction method based on IGRA and LSTM.

Water 2018, 10, 1148

Figure 4. Prediction results of Tai Lake.

Figure 5. Prediction results of Victoria Bay.

Table 1. The relational analysis results of Tai Lake.

Table 2. The relational analysis results of Victoria Bay.

Table 3. Feature selection and prediction error of Tai Lake.

Table 4. Feature selection and prediction error of Victoria Bay.