Water Quality Prediction Method Based on Multi-Source Transfer Learning for Water Environmental IoT System

A water environmental Internet of Things (IoT) system, composed of multiple monitoring points equipped with various water quality IoT devices, makes accurate water quality prediction possible. Within the same water area, water flows and is exchanged between monitoring points, producing an adjacency effect in the water quality information. However, traditional water quality prediction methods use only the water quality information of a single monitoring point and ignore the information of nearby monitoring points. In this paper, we propose a water quality prediction method based on multi-source transfer learning for a water environmental IoT system, which uses the water quality information of nearby monitoring points to improve the prediction accuracy. First, a water quality prediction framework based on multi-source transfer learning is constructed. Specifically, the common features in the water quality samples of multiple nearby monitoring points and the target monitoring point are extracted and then aligned. Using the aligned features, water quality prediction models based on an echo state network are established at the nearby monitoring points with distributed computing, and the prediction results of these distributed models are then integrated. Second, the prediction parameters of multi-source transfer learning are optimized. Specifically, the overall bias is back-propagated over multiple iterations to reduce the feature alignment bias and the model alignment bias, which improves the prediction accuracy. Finally, the proposed method is applied to an actual water quality dataset from Hong Kong. The experimental results demonstrate that the proposed method can make full use of the water quality information of multiple nearby monitoring points to train several water quality prediction models and reduce the prediction bias.


Introduction
As an important part of the natural environment, the water environment plays a vital role in human life. With the rapid development of industry, the discharge of industrial wastewater has increased steadily [1], which has degraded the water environment and poses severe challenges for water environment protection. Accurate water quality prediction is the basis for water environment protection. Monitoring points equipped with various water quality Internet of Things (IoT) devices form a water environmental IoT system [2], which collects water quality information in real time and thus makes accurate water quality prediction possible.
Traditional methods of water quality prediction can be classified into three types: regression analysis, grey systems, and neural networks [3]. The prediction method based on regression analysis is derived from mathematical statistics. It determines the relationship between the dependent variable and the independent variables by analyzing statistical data and calculates the correlation coefficients, thereby constructing a regression equation to predict water quality information. Ratko et al. [4] proposed a water temperature prediction method based on Gaussian process regression to predict the daily average water temperature of a river. Anja et al. [5] proposed a water quality prediction method based on partial least squares regression analysis to predict the water quality information of mining wastewater. Mohammad et al. [6] proposed a prediction method based on the M5 model tree and multiple adaptive regression to predict the daily river flow. The prediction method based on grey systems regards the water environment system as a grey system and generates a strongly regular series for water quality prediction by identifying the relationships between system factors. Zhang et al. [7] constructed a grey prediction model to predict the chemical oxygen of industrial wastewater. Yang et al. [8] constructed a GM (1,1) model to predict the water quality information of a lake. Xue et al. [9] constructed a grey prediction model to predict the mineralization of groundwater. Xiao et al. [10] applied grey theory to construct a model to predict the factors affecting water bloom. The prediction method based on neural networks forms an adaptive nonlinear system through the connection of neurons and uses it to adaptively learn the trend of water quality information.
With the emergence of cloud computing, edge computing and other technologies [11][12][13], neural networks requiring complex computation were gradually applied to water quality prediction. Dawood et al. [14] constructed an artificial neural network to predict water quality information. Zhou et al. [15] proposed a water quality prediction method based on improved grey relational analysis and a long short-term memory (LSTM) neural network to predict the dissolved oxygen. Dong et al. [16] proposed a water quality prediction method based on Savitzky-Golay and LSTM to predict water quality information. Hu et al. [17] constructed a deep LSTM to predict pH and water temperature. Given the temporality and nonlinearity of water quality information, neural networks offer better prediction performance than the other two types of methods, but they require a large number of training samples. If the target monitoring point has too few training samples, the accuracy of water quality prediction is reduced.
Water flows and is exchanged between multiple monitoring points [18] in the same water area, which results in an adjacency effect in their water quality information. The prediction accuracy of neural networks can be improved if this adjacency effect is exploited. Traditional transfer learning methods, such as transfer component analysis (TCA) [19], are usually used for single-source transfer: they transfer the features of water quality samples from a single nearby monitoring point to a target monitoring point [20]. However, TCA does not consider the bias between the features of water quality samples of multiple nearby monitoring points, which makes it unsuitable for transferring samples from multiple nearby monitoring points. In practice, a target monitoring point is often surrounded by multiple nearby monitoring points. Compared with traditional transfer learning methods, which can effectively use only one source domain, multi-source transfer learning (MSTL) [21,22] can make full use of multiple source domains. Therefore, we propose a water quality prediction framework based on MSTL, which uses the water quality information of multiple nearby monitoring points with distributed computing.
The water quality information changes periodically along with time, so it has the nature of temporality. By using the temporality, the accuracy of water quality prediction can be effectively improved. Echo state network (ESN) [23], as an improved model of recurrent neural network (RNN) [24], retains the information left at the last moment through the internal connections of reservoir, which can effectively use the temporality. Moreover, ESN only needs to use the linear regression algorithm to train the output weights, which can solve the problem of slow convergence speed of traditional RNN. Therefore, we establish the distributed water quality prediction models based on ESN at multiple nearby monitoring points in the framework, effectively using the temporality of water quality information.
Bias [25] exists not only in the feature alignment of nearby monitoring points and target monitoring points, but also in the model alignment of the water quality prediction models at multiple nearby monitoring points. Therefore, we optimize the prediction parameters of MSTL to improve the prediction accuracy of the models.
In this paper, we propose a water quality prediction method based on MSTL, for the purpose of making full use of the adjacency effect of water quality information. The contributions of this paper are listed as follows.
(1) We construct a water quality prediction framework based on MSTL. In particular, the common features of water quality samples of multiple nearby monitoring points and the target monitoring point are extracted and then aligned. Afterwards, using the aligned features, water quality prediction models based on ESN are established at the nearby monitoring points with distributed computing, and the prediction results of these distributed models are integrated. This framework addresses the problem of an insufficient number of training samples at the target monitoring point.
(2) We optimize the prediction parameters of MSTL. In particular, the overall bias is back-propagated over multiple iterations to reduce the feature alignment bias and the model alignment bias, which improves the prediction accuracy of the models.
(3) We perform experiments on an actual water quality dataset from Hong Kong. The experimental results demonstrate that the proposed method can train multiple water quality prediction models by using the adjacency effect, and thus reduces the prediction bias and improves the prediction accuracy compared with other similar methods.
The rest of this paper is organized as follows. Section 2 gives the details of the proposed method, including the water quality prediction framework based on MSTL, the prediction parameters optimization of MSTL, and the overall process. Section 3 gives the experimental results and analyses. Section 4 is the summary of this paper.

Water Quality Prediction Framework Based on MSTL
We construct a water quality prediction framework based on MSTL, as shown in Figure 1. First, we use the feature extraction network based on the residual network [26] to map the water quality features of the nearby monitoring points and the target monitoring point into the same feature space, which yields the common features of their water quality samples. Second, we use the feature alignment networks based on a bottleneck layer [27] to align these common features in the same feature space, which yields the aligned features. Third, we establish a water quality prediction model based on ESN at every nearby monitoring point with distributed computing and predict the water quality information at the next moment from the aligned features. Finally, we integrate the results of the distributed water quality prediction models to reduce the prediction bias.
Suppose there are v nearby monitoring points around the target monitoring point. For the j-th nearby monitoring point and the target monitoring point, we construct the feature sets of water quality samples at the previous n moments, whose elements represent the features of the water quality samples of the two points at the h-th moment. Each feature is built from the water quality information of the preceding moments within a sliding window of size d, where $x_{sj}(h-1)$ and $x_t(h-1)$ represent the water quality information of the j-th nearby monitoring point and the target monitoring point at the (h − 1)-th moment, respectively.
First, we construct the feature extraction network based on the residual network (F) to extract the common features of water quality samples of the v nearby monitoring points and the target monitoring point. The structure of this network is shown in Figure 2.
In Figure 2, $\mathrm{Conv}_F$ is the convolution kernel, BatchNorm is the normalization algorithm, ReLU is the activation function, and MaxPool is the max pooling layer. The common features of water quality samples extracted from the j-th nearby monitoring point and the target monitoring point are $C^*_{sj}$ and $C^*_t$, respectively; they are obtained by passing the features of the corresponding water quality samples through F.
Second, we construct the feature alignment networks based on the bottleneck layer ($H_1, H_2, \cdots, H_j, \cdots, H_v$) at the v nearby monitoring points to align the common features extracted from each nearby monitoring point with those extracted from the target monitoring point. In particular, $H_j$ is the feature alignment network at the j-th nearby monitoring point, and its structure is shown in Figure 3. In Figure 3, $\mathrm{Conv}_{H_j}$ is the convolution kernel, ReLU is the activation function, and AvgPool is the average pooling layer. The aligned features of water quality samples of the j-th nearby monitoring point and the target monitoring point are $C_{sj}$ and $C_{tj}$, respectively; they are obtained by passing $C^*_{sj}$ and $C^*_t$ through $H_j$.
After aligning the common features of water quality samples, we construct the water quality sample sets of the j-th nearby monitoring point and the target monitoring point, $U_{sj}$ and $U_{tj}$, whose elements $(c_{sj}(h), y_{sj}(h))$ and $(c_{tj}(h), y_t(h))$ are the water quality samples of the j-th nearby monitoring point and the target monitoring point at the h-th moment, respectively. Here, $c_{sj}(h)$ and $c_{tj}(h)$ are the aligned features of the water quality samples of the j-th nearby monitoring point and the target monitoring point at the h-th moment, and $y_{sj}(h)$ and $y_t(h)$ are the corresponding real water quality information at the h-th moment.
We combine $U_{sj}$ and $U_{tj}$ to obtain the water quality sample set $U^{train}_j$, whose element $(c^{total}_j(h), y^{total}_j(h))$ is the water quality sample at the h-th moment. Here, $c^{total}_j(h)$ is the feature of the water quality sample at the h-th moment, and $y^{total}_j(h)$ is the real water quality information at the h-th moment. $U^{train}_j$ is used to train the water quality prediction model described next.
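The sample construction described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the raw sliding window stands in for the aligned features (the networks F and H_j are left out), and the function names and the dissolved oxygen readings are hypothetical.

```python
from typing import List, Tuple

# One sample: (feature window of the d preceding values, real value at moment h).
Sample = Tuple[List[float], float]


def make_samples(series: List[float], d: int) -> List[Sample]:
    """Pair the d values before moment h with the real value at moment h."""
    if d < 1 or d >= len(series):
        raise ValueError("d must satisfy 1 <= d < len(series)")
    return [(series[h - d:h], series[h]) for h in range(d, len(series))]


def build_training_set(source: List[float], target: List[float], d: int) -> List[Sample]:
    """Combine the sample sets of one nearby point and the target point."""
    return make_samples(source, d) + make_samples(target, d)


# Hypothetical DO readings for one nearby point and the target point.
source_do = [6.1, 6.3, 6.0, 5.8, 6.2, 6.4]
target_do = [5.9, 6.0, 6.2, 6.1, 5.7]
train_j = build_training_set(source_do, target_do, d=3)
```

In the framework itself, each window would first pass through the feature extraction and alignment networks before being paired with its target value.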
Afterwards, we construct the water quality prediction models based on ESN at the v nearby monitoring points ($ESN_1, ESN_2, \cdots, ESN_j, \cdots, ESN_v$), where $ESN_j$ is the distributed water quality prediction model at the j-th nearby monitoring point; its structure is shown in Figure 4. The model consists of an input layer with d neurons, a reservoir with r neurons, and an output layer with one neuron. The input of the model is the feature of the water quality sample at the h-th moment, $c^{total}_j(h)$, and the output is the predicted water quality information at the h-th moment, $y^{pre}_j(h)$. In the model, Tanh is the activation function, $s_j(h)$ is the internal state vector of the reservoir, $W^{in}_j$ is the input layer weight, $W^{r}_j$ is the reservoir weight, and $W^{out}_j$ is the output weight. In particular, $W^{out}_j$ is trained by the ridge regression algorithm [28] according to $U^{train}_j$, and $W^{r}_j$ is obtained by scaling $W_0$, where α is the scaling range with 0 < α < 1, ρ is the spectral radius of $W^{r}_j$, and $W_0$ is a randomly generated sparse matrix.
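To make the structure of one such model concrete, the following is a minimal NumPy sketch of an echo state network with d inputs, r reservoir neurons, and one output. It is not the authors' implementation. The uniform weight range, the sparsity level, the ridge coefficient, the readout that uses the reservoir state only, and the rescaling of the random sparse matrix to a chosen spectral radius `rho` are all assumptions based on a common ESN construction.

```python
import numpy as np


class EchoStateNetwork:
    """Minimal ESN: fixed random input/reservoir weights, ridge-trained readout."""

    def __init__(self, d, r, rho=0.7, sparsity=0.1, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.uniform(-0.5, 0.5, size=(r, d))
        # Random sparse matrix W_0: keep roughly `sparsity` of the entries.
        w0 = rng.uniform(-0.5, 0.5, size=(r, r))
        w0[rng.random((r, r)) > sparsity] = 0.0
        # Rescale W_0 so that the reservoir matrix has spectral radius rho.
        radius = np.max(np.abs(np.linalg.eigvals(w0)))
        self.w_r = w0 * (rho / radius)
        self.ridge = ridge
        self.w_out = np.zeros(r)

    def _states(self, inputs):
        """Run the reservoir over a sequence of d-dimensional inputs."""
        state = np.zeros(self.w_r.shape[0])
        states = []
        for c in inputs:
            # New state depends on the current input and the previous state.
            state = np.tanh(self.w_in @ c + self.w_r @ state)
            states.append(state)
        return np.array(states)

    def fit(self, inputs, targets):
        """Train only the output weights with closed-form ridge regression."""
        s = self._states(np.asarray(inputs, dtype=float))
        y = np.asarray(targets, dtype=float)
        gram = s.T @ s + self.ridge * np.eye(s.shape[1])
        self.w_out = np.linalg.solve(gram, s.T @ y)
        return self

    def predict(self, inputs):
        return self._states(np.asarray(inputs, dtype=float)) @ self.w_out


# Toy usage on a hypothetical periodic signal with window size d = 3.
signal = np.sin(np.linspace(0, 20 * np.pi, 400))
d = 3
x = np.array([signal[h - d:h] for h in range(d, len(signal))])
y = signal[d:]
esn = EchoStateNetwork(d=d, r=50).fit(x[:300], y[:300])
test_mse = float(np.mean((esn.predict(x[300:]) - y[300:]) ** 2))
```

Because only the output weights are fitted, training reduces to solving one linear system, which is why an ESN trains faster than a recurrent network trained by back-propagation through time.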
Finally, we integrate the prediction results of the distributed water quality prediction models at the nearby monitoring points by arithmetic averaging to obtain the final prediction result, $y^{pre}(h)$:
$$y^{pre}(h) = \frac{1}{v}\sum_{j=1}^{v} y^{pre}_j(h)$$

Prediction Parameters Optimization of MSTL
We optimize the prediction parameters of MSTL to reduce the feature alignment bias between the nearby monitoring points and the target monitoring point, and to minimize the model alignment bias between the water quality prediction models. Specifically, to minimize the overall bias ($l_{total}$), $\mathrm{Conv}_F$, $\mathrm{Conv}_{H_j}$ and $W^{out}_j$ are updated by stochastic gradient descent (SGD), because $W^{out}_j$ affects the prediction results of the water quality prediction model, while $\mathrm{Conv}_F$ and $\mathrm{Conv}_{H_j}$ affect the aligned features obtained by the feature alignment networks. The smaller $l_{total}$ is, the better the prediction accuracy is. $l_{total}$ is calculated by
$$l_{total} = l_{mse} + \lambda (l_{mmd} + l_{disc}) \qquad (11)$$
where λ is the trade-off parameter that weights the importance of $l_{mmd}$ and $l_{disc}$. λ is determined by the current number of iterations, i, and the total number of iterations, iter.
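The combination in Equation (11) can be sketched as follows. The schedule for λ shown here is an assumption: it is a form often used in domain adaptation training, growing from 0 toward 1 as the iteration count approaches the total, and it may differ from the schedule used in this paper. The function names and the constant `gamma` are hypothetical.

```python
import math


def trade_off(i: int, total_iters: int, gamma: float = 10.0) -> float:
    """Assumed schedule: rises smoothly from 0 toward 1 as i approaches total_iters."""
    progress = i / total_iters
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def total_bias(l_mse: float, l_mmd: float, l_disc: float, lam: float) -> float:
    """Equation (11): l_total = l_mse + lambda * (l_mmd + l_disc)."""
    return l_mse + lam * (l_mmd + l_disc)


# Early iterations are dominated by the prediction term,
# later iterations weight the two alignment terms more strongly.
early = total_bias(0.01, 0.02, 0.03, trade_off(0, 300))
late = total_bias(0.01, 0.02, 0.03, trade_off(300, 300))
```

A schedule of this kind lets the prediction models settle before the alignment terms start to influence the shared feature networks.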
$l_{mse}$ is the model prediction bias of the water quality prediction models at the v nearby monitoring points. The smaller $l_{mse}$ is, the smaller the model prediction bias is. $l_{mse}$ is calculated by applying the mean square error function Mse to $y^{pre}_j(h)$, the predicted water quality information of the prediction model at the j-th nearby monitoring point, and $y^{true}$, the real water quality information.
$l_{mmd}$ is the feature alignment bias between the nearby monitoring points and the target monitoring point. The smaller $l_{mmd}$ is, the smaller the feature alignment bias is. $l_{mmd}$ is calculated with the maximum mean discrepancy function MMD [29], which measures the distance between the aligned features of water quality samples of the nearby monitoring points and the target monitoring point after mapping to the same feature space.
$l_{disc}$ is the model alignment bias between the water quality prediction models at the nearby monitoring points. The smaller $l_{disc}$ is, the smaller the model alignment bias is.
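The two alignment terms can be illustrated with a short NumPy sketch. Both definitions are assumptions made for illustration: the feature term uses the standard biased estimate of the squared MMD with a Gaussian kernel of hypothetical bandwidth `sigma`, and the model term is taken to be the mean absolute difference between the outputs of every pair of models on the same inputs. The paper's exact kernel and discrepancy measure may differ.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel between the rows of a and the rows of b."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


def model_discrepancy(predictions):
    """Assumed model alignment term: mean absolute difference over model pairs."""
    preds = [np.asarray(p, dtype=float) for p in predictions]
    if len(preds) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(preds) for b in preds[i + 1:]]
    return float(np.mean([np.mean(np.abs(a - b)) for a, b in pairs]))
```

Both quantities are zero when the two feature sets, or all model outputs, coincide, and they grow as the distributions or predictions drift apart.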

Process of Water Quality Prediction Method Based on MSTL
The overall process of the water quality prediction method based on MSTL is summarized in Figure 5. The specific steps are as follows:
Step 1: Use the feature extraction network based on the residual network to extract the common features of water quality samples of the j-th nearby monitoring point and the target monitoring point ($C^*_{sj}$ and $C^*_t$).
Step 2: Use the feature alignment network based on the bottleneck layer of the j-th nearby monitoring point to align the common features of water quality samples of the j-th nearby monitoring point and the target monitoring point ($C_{sj}$ and $C_{tj}$).
Step 3: Construct the training set $U^{train}_j$, and train the distributed water quality prediction model based on ESN at the j-th nearby monitoring point.
Step 4: Repeat Steps 1-3 to obtain the common features and the aligned features of water quality samples of all nearby monitoring points, and train the distributed water quality prediction models based on ESN at all nearby monitoring points.
Step 5: Calculate the overall bias ($l_{total}$) according to the aligned features of water quality samples of all nearby monitoring points and the prediction results of the distributed water quality prediction models based on ESN.
Step 6: Judge whether $l_{total}$ meets the accuracy requirements. If the requirements are met, go to Step 8. Otherwise, go to Step 7.
Step 7: For every nearby monitoring point, update $\mathrm{Conv}_F$, $\mathrm{Conv}_{H_j}$, and $W^{out}_j$ by back-propagating $l_{total}$ over multiple iterations. After that, go to Step 1.
Step 8: At the j-th nearby monitoring point, input the water quality information of the target monitoring point at the d moments preceding the current time into the optimized water quality prediction framework based on MSTL, and obtain the prediction result through distributed computing. In the same way, obtain the prediction results of the distributed water quality prediction models at all nearby monitoring points.
Step 9: Integrate the prediction results of the distributed water quality prediction models at all nearby monitoring points to obtain the final prediction result ($y^{pre}(h)$).
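Steps 8 and 9 amount to querying every distributed model with the same recent window and averaging the answers. The sketch below assumes each trained model is exposed as a plain callable that already includes its own feature extraction and alignment. The three stand-in models and the readings are hypothetical placeholders, not trained predictors.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# A trained model at one nearby monitoring point, seen as a function
# from the last d observations to the predicted next value.
Model = Callable[[Sequence[float]], float]


def predict_next(models: List[Model], recent: Sequence[float], d: int) -> float:
    """Query every distributed model with the last d values, then average."""
    if len(recent) < d:
        raise ValueError("need at least d recent observations")
    window = list(recent[-d:])
    return fmean(model(window) for model in models)


# Stand-in models for v = 3 nearby monitoring points.
models = [
    lambda w: w[-1],               # persistence
    lambda w: sum(w) / len(w),     # window mean
    lambda w: 2 * w[-1] - w[-2],   # linear extrapolation
]
y_pre = predict_next(models, [6.0, 6.2, 6.4, 6.6], d=3)
```

Since each model needs only the shared input window, the v predictions can be computed independently at the nearby monitoring points and combined in a final, inexpensive averaging step.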

Experimental Results and Analyses
The proposed method is implemented in Python with Torch. First, we describe the dataset used in the experiments. Second, we select the prediction parameters of MSTL. Afterwards, MSTL is compared with other transfer methods. Finally, we compare ESN with other prediction models.
We use 20% of the samples of the target monitoring point as the test sample set and another 20% as the validation sample set. The training sample set is thus composed of the remaining 60% of the samples of the target monitoring point and the samples transferred from the nearby monitoring points. The mean squared error (MSE) is chosen as the indicator of the prediction bias. The smaller the MSE is, the smaller the prediction bias is. Specifically, MSE is calculated by
$$\mathrm{MSE} = \frac{1}{q}\sum_{i=1}^{q}\left(y^{true}_i - y^{pre}_i\right)^2$$
where q is the number of samples, $y^{true}_i$ is the real water quality information of the i-th sample, and $y^{pre}_i$ is the corresponding predicted water quality information.
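The evaluation protocol can be sketched in plain Python. The split below keeps the samples in their original order, which is an assumption, since the text does not say whether the split is chronological or random. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def split_target_samples(samples: Sequence, train: float = 0.6,
                         val: float = 0.2) -> Tuple[list, list, list]:
    """Split target-point samples 60/20/20, keeping their original order."""
    n = len(samples)
    n_train = int(n * train)
    n_val = int(n * val)
    return (list(samples[:n_train]),
            list(samples[n_train:n_train + n_val]),
            list(samples[n_train + n_val:]))


def mse(y_true: Sequence[float], y_pre: Sequence[float]) -> float:
    """Mean squared error over q paired real and predicted values."""
    if len(y_true) != len(y_pre) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pre)) / len(y_true)


train_part, val_part, test_part = split_target_samples(list(range(10)))
```

Only the training portion of the target samples is merged with the samples transferred from the nearby monitoring points; the validation and test portions stay untouched for parameter selection and final scoring.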

Datasets
We performed two experiments. In the first experiment, we set Oxtail Sea as the target monitoring point. Oxtail Sea has only 3193 pieces of water quality information, which is slightly insufficient. The spatial location of the monitoring points in the first experiment is shown in Figure 6. Tolo Harbour, Mirs Bay and Southern District are close to Oxtail Sea in the water area and exhibit the adjacency effect, so we treat these three locations as nearby monitoring points. Tolo Harbour has 3192 pieces of water quality information, Mirs Bay has only 758, and Southern District has 4467. We use the framework based on MSTL to align the features of water quality samples of these three nearby monitoring points with those of Oxtail Sea, and then use the samples of the three nearby monitoring points to train three water quality prediction models. The water quality indicators of Oxtail Sea include dissolved oxygen (DO), phosphate, water temperature (WT), and nitrite. The data for these indicators are collected by sensors and transmitted back approximately every fifteen days. The purpose of the experiment is to predict the DO of Oxtail Sea at the next moment.
In the second experiment, we set the Western Buffer District as the target monitoring point. The Western Buffer District has only 1440 pieces of water quality information, which is also slightly insufficient. The spatial location of the monitoring points in the second experiment is shown in Figure 7. The Northwestern District, the Southern District and Victoria Harbour are close to the Western Buffer District in the water area and exhibit the adjacency effect, so we treat these three locations as nearby monitoring points. The Northwestern District has only 1630 pieces of water quality information, the Southern District has 4467, and Victoria Harbour has 4158. The experimental procedure, the water quality indicators, and the prediction purpose are the same as in the first experiment.

Parameters Selection
To improve the prediction accuracy, we select the parameters of the water quality prediction models based on ESN, namely the size of the sliding window (d), the size of the reservoir (r), and the spectral radius (ρ). In the experiment on Oxtail Sea, $l_{total}$ converges when the number of iterations reaches 300. Table 1 shows the prediction results of the distributed water quality prediction models based on ESN with different parameters for Oxtail Sea. When d = 3, r = 500, and ρ = 0.7, the prediction result for Oxtail Sea is the best ($l_{total}$ = 0.085, MSE = 0.0060). Similarly, in the experiment on the Western Buffer District, $l_{total}$ also converges when the number of iterations reaches 300. Table 2 shows the prediction results of the distributed water quality prediction models based on ESN with different parameters for the Western Buffer District. When d = 3, r = 500, and ρ = 0.6, the prediction result for the Western Buffer District is the best ($l_{total}$ = 0.0111, MSE = 0.0106). These optimal parameters are used in the subsequent experiments.
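A selection of this kind can be organized as an exhaustive search over candidate values of d, r, and ρ. The sketch below is only illustrative: the candidate grids are hypothetical, and `toy_objective` is a stand-in for a routine that would train the framework with the given parameters and return the validation MSE. It is not a measurement.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Params = Tuple[int, int, float]


def select_parameters(evaluate: Callable[[int, int, float], float],
                      windows: Iterable[int],
                      reservoirs: Iterable[int],
                      radii: Iterable[float]) -> Tuple[Params, float]:
    """Return the (d, r, rho) combination with the lowest score."""
    best_params, best_score = None, float("inf")
    for d, r, rho in product(windows, reservoirs, radii):
        score = evaluate(d, r, rho)
        if score < best_score:
            best_params, best_score = (d, r, rho), score
    return best_params, best_score


def toy_objective(d: int, r: int, rho: float) -> float:
    """Stand-in for 'train with (d, r, rho) and return validation MSE'."""
    return (d - 3) ** 2 + ((r - 500) / 100) ** 2 + (rho - 0.7) ** 2


best, score = select_parameters(toy_objective,
                                windows=[2, 3, 4],
                                reservoirs=[300, 500, 700],
                                radii=[0.6, 0.7, 0.8])
```

Scoring each combination on the validation set rather than the test set keeps the final reported error free of selection bias.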

Comparison of Transfer Methods
We compare MSTL with non-expansion, TCA [20] and the joint class proportion and optimal transport (JCPOT) [30]. Non-expansion uses only the water quality information of the target monitoring point. TCA transfers the features of water quality samples from a single nearby monitoring point to the target monitoring point and selects high-quality samples based on similarity and time sequence. JCPOT predicts the water quality information by using optimal transport to correct the feature alignment bias between multiple source domains and the target domain. MSTL extracts and aligns the features of water quality samples of multiple nearby monitoring points and the target monitoring point and trains the models on the aligned samples. The prediction results of the different transfer methods for Oxtail Sea and the Western Buffer District are shown in Table 3. Table 3 shows that the prediction bias of MSTL is lower than that of non-expansion for both Oxtail Sea and the Western Buffer District. The prediction bias of MSTL is also lower than that of TCA and JCPOT, because TCA can use the water quality information of only a single nearby monitoring point and JCPOT does not consider the effect of the feature alignment bias between different source domains. These results show that MSTL can effectively use the water quality information of nearby monitoring points to train multiple water quality prediction models, which reduces the model prediction bias and improves the prediction accuracy.

Comparison of Prediction Models
Within the water quality prediction framework based on MSTL, we compare the water quality prediction models based on ESN with models based on the back propagation (BP) network and models based on the gated recurrent unit (GRU) network. Like ESN, both BP and GRU have only one hidden layer. As a widely used basic neural network, BP has a simple structure and a low computational cost. As a variant of LSTM, GRU uses a gating mechanism that gives it memory ability. Compared with BP, the training of GRU is more complex. Partial prediction results of the different prediction models for Oxtail Sea and the Western Buffer District are shown in Figures 8 and 9, respectively. The comparisons of the different water quality prediction models in terms of prediction bias and training time are shown in Figures 10 and 11.
Figures 8 and 9 show that the accuracy of BP is poor and that its prediction results fluctuate greatly. The prediction results of GRU and ESN are close when the data fluctuate slightly. Overall, ESN predicts better than GRU at both the peaks and the valleys of the data. Figures 10 and 11 show that ESN has the smallest prediction bias and the shortest training time for both Oxtail Sea and the Western Buffer District, because ESN has a special reservoir structure and uses only a simple linear regression algorithm for training.
To further illustrate the prediction accuracy of the proposed method, Figure 12 gives a box-plot comparison of the predicted and the real water quality information for Oxtail Sea and the Western Buffer District. As seen in the figure, the predicted and real water quality information are nearly identical across the box-plot measures, including the upper and lower quartiles, the upper and lower bounds, the median, and the outliers.

Conclusions
A water environmental IoT system, which can collect water quality information in real time, makes accurate water quality prediction possible. In this paper, we propose a water quality prediction method based on MSTL for a water environmental IoT system, which uses the water quality information of nearby monitoring points to improve the prediction accuracy of water quality. First, a water quality prediction framework based on MSTL is constructed, which establishes multiple water quality prediction models based on ESN at multiple nearby monitoring points with distributed computing. Second, the water quality prediction parameters of MSTL are optimized. Specifically, the overall bias is back-propagated over multiple iterations to reduce the feature alignment bias and the model alignment bias. Finally, the proposed method is compared with other similar methods on an actual water quality dataset from Hong Kong. The experimental results demonstrate that the proposed method can effectively align the features of water quality samples of multiple nearby monitoring points through MSTL and use the aligned samples to train multiple water quality prediction models, which effectively reduces the prediction bias. It should be noted that the same type of sensor needs to be used at the different monitoring points to collect data on the same water quality indicators, so that the prediction models at the different monitoring points have the same input parameters. In future work, we will study how to overcome this limitation.