A Prediction Model Based on Deep Belief Network and Least Squares SVR Applied to Cross-Section Water Quality

Recently, the quality of fresh water resources has been threatened by numerous pollutants. Prediction of water quality is an important tool for controlling and reducing water pollution. By exploiting the superior big-data processing ability of deep learning, it is possible to improve the accuracy of prediction. This paper proposes a method for predicting water quality based on the deep belief network (DBN) model. First, the particle swarm optimization (PSO) algorithm is used to optimize the network parameters of the deep belief network, which extracts feature vectors of water quality time series data at multiple scales. Then, combined with a least squares support vector regression (LSSVR) machine, which is taken as the top prediction layer of the model, a new water quality prediction model referred to as PSO-DBN-LSSVR is put forward. The developed model is evaluated in terms of the mean absolute error (MAE), the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the coefficient of determination (R²). Results illustrate that the model proposed in this paper can accurately predict water quality parameters and shows better robustness than the traditional back propagation (BP) neural network, LSSVR, the DBN neural network, and the DBN-LSSVR combined model.


Introduction
Rapid population growth, industrialization, and the use of fertilizers and pesticides for agricultural purposes have made water environment problems increasingly serious [1,2]. Therefore, predicting water quality parameters is of great significance for the control of water pollution. At present, water quality prediction models mainly fall into mechanism-based and non-mechanism-based models. Mechanism-based models are relatively complicated: they use system structure data to simulate water quality and are constrained by the internal and external environment of the water body, but they are more versatile and applicable to almost any water body. The earliest water quality simulation model is the Streeter-Phelps (S-P) model, a one-dimensional steady-state oxygen balance model that is still widely used today, from which the BOD-DO bilinear coupling system model was later derived [3]. Subsequently, Western countries developed a variety of water quality models, such as the QUAL model [4] and the WASP model [5], which have been widely used in the simulation of river water quality. In 1992, Warren [6] proposed to take advantage of MIKE21 as a modeling system for estuaries, coastal waters, and seas. Hayes [7] developed a new hybrid method that uses an artificial neural network and a Markov chain to predict dissolved oxygen (DO), the main indicator of water quality [18]. Yan et al. [19] applied a genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm to optimize a back propagation (BP) neural network for predicting the biochemical oxygen demand of a lake, and the forecast accuracy was greatly improved. An affinity propagation clustering method based on a least squares support vector machine (AP-LSSVM) was put forward for water quality prediction; it is a supervised learning method but is sensitive to missing values [20]. Solanki et al. 
[21] applied the deep belief network model in 2015 to analyze and predict the chemical eigenvalues of water, especially DO and pH, their research indicates that deep learning techniques can provide more accurate results than supervised learning-based techniques. Deep learning methods have achieved great application results in the fields of image classification [22], speech recognition [23], fault diagnosis [24], and prediction estimation [25] due to the powerful big data processing ability and classification capability. For instance, a deep belief network (DBN), one of the commonly used neural networks in deep learning, is used to classify spectral images by extracting hyperspectral features [26], identify and classify different discharge patterns [27], and predict traffic flow [28,29] and weather [30]. It has been verified that deep learning methods are better than traditional methods. Besides, Marir et al. [31] developed a model to discover the abnormal behavior from large-scale network traffic data using a combination of deep feature extraction and multi-layer ensemble support vector machines (SVMs) in a distributed way. Fadlullah et al. [32] envisaged a reward-based deep learning structure, which jointly employs a deep convolutional neural network (CNN) and a deep belief network (DBN) to predict the traffic load value matrix and construct the final action matrix.
In this paper, we proposed a new deep learning method that combines DBN neural network optimized by PSO with least squares support vector regression (LSSVR) to predict water quality parameters. Then, the BP, DBN, LSSVR, and DBN-LSSVR methods were used as comparative experiments, and we evaluate the performance of each method. Experimental results demonstrate that the method proposed in this paper has a higher accuracy of prediction than the other four methods.

Study Area and Monitoring Data
In this paper, the water quality data of Sanhe East Bridge, a control section of the Juhe River in the Haihe River Basin, is selected as the research object. It is located in Sanhe City (39°48′ N-40°05′ N, 116°45′ E-117°15′ E), in Hebei Province of China, which belongs to a temperate continental monsoon climate with distinct seasons throughout the year. The landform types are complex, with low hills, plains, and depressions. The Juhe River is one of the important river systems of the Haihe River Basin; it flows through Hebei, Beijing, and Tianjin, with a total length of 206 km, a drainage area of 2276 km², and an annual runoff of 29.09 million cubic meters. Its important tributaries are the Jinji River, Zhouhe River, and Huanxiang River.
Human activities, mainly industrial wastewater pollution, domestic sewage pollution, livestock and poultry farming wastewater pollution, and garbage pollution, have placed critical environmental pressure on the water quality of the Juhe River. In addition, as a pilot zone of China's free trade, Hebei's rapid economic development has also led to serious pollution of the Juhe River. The experimental data in this paper are monitored by the Langfang Environmental Monitoring Station, which is affiliated with the Beijing Environmental Planning Institute of the Ministry of Environmental Protection. Figure 1 is the monitoring station map of the Juhe River. Figure 2 shows the annual runoff of the Juhe River during the analyzed period, which is real-time monitoring data transmitted from the monitoring system of the Sanhe Hydrological Station by wireless communication.
Summer and early autumn in the study area are seasons of relatively concentrated rainfall, as the area is easily dominated by the warm, humid southeast airflow. In view of this, and of the quality of the data provided by the organization, the data in this paper cover the period from 18 August 2018 to 18 March 2019, collected every four hours, for a total of 1086 records.

In Figure 1, there are 22 monitoring stations. The stations marked in green are the stations for the construction of sewage treatment in villages along the banks of the river, the yellow ones are the stations on more polluted reaches of the Juhe River, and the blue ones are the stations in the buffer zone of the provincial boundary, which require higher water quality and are greatly affected by upstream water quality.

Feature Extraction Based on DBN Model
DBN is an important model in deep learning [33]. It is a probabilistic generative model composed of a stack of Restricted Boltzmann Machine (RBM) units [34]. In an RBM, there are no connections between the neural units within a layer; instead, each neural unit in the visible layer is connected to every neural unit in the hidden layer, and the output of each RBM layer is used as the input of the next layer. Its structure is shown in Figure 3. The bottom of the DBN model adopts a multi-layer RBM structure. A greedy algorithm is used to train the sample data layer by layer: the parameters obtained by training the first RBM layer are used as the input of the second RBM layer, and the parameters of each subsequent layer are obtained by analogy. This training process is unsupervised. The joint configuration energy of the visible layer v and the hidden layer h in an RBM can be expressed as follows:

E(v, h|θ) = −∑_i a_i v_i − ∑_j b_j h_j − ∑_i ∑_j v_i w_ij h_j, (1)

where θ = {w, a, b}, w_ij is the connection weight between visible unit i and hidden unit j, a_i is the bias of visible-layer neuron i, and b_j is the bias of hidden-layer neuron j.


When the parameter θ is fixed, the joint probability distribution of the visible layer and the hidden layer can be obtained from the energy function as Equation (2), combined with the normalization factor of Equation (3):

P(v, h|θ) = exp(−E(v, h|θ)) / Z(θ), (2)

Z(θ) = ∑_{v,h} exp(−E(v, h|θ)), (3)

When the state of the visible layer v is known, the activation probability of the jth neural unit of the hidden layer h is:

P(h_j = 1|v) = σ(b_j + ∑_i v_i w_ij), (4)

When the hidden layer state h is known, the activation probability of the ith neural unit of the visible layer v is:

P(v_i = 1|h) = σ(a_i + ∑_j w_ij h_j), (5)

where σ(x) = 1/(1 + e^{−x}) is the sigmoid activation function. Each neuron determines its state value as 1 or 0 with probability P.
In the unsupervised learning process, the purpose of training the RBM is to obtain the model parameters, which can be found by maximizing the log-likelihood function:

θ* = arg max_θ ∑_v log P(v|θ), (6)

In the training process, owing to the complex calculation of the normalization factor Z(θ), the likelihood is generally approximated by sampling methods such as Gibbs sampling [33]. Hinton [35] proposed the contrastive divergence (CD) fast learning algorithm to train the network parameters, thereby improving training efficiency.
In this paper, in order to mine the essential features of the cross-section water quality, the DBN network is used to extract the features of the water quality. At the top of the model, an LSSVR layer is used to optimize the prediction results: the abstract features obtained by the training and learning of the bottom model are used as the input of the LSSVR layer, and the prediction results are output through LSSVR fitting. At the same time, the LSSVR layer is also required to fine-tune and optimize the obtained model parameters. This process is supervised learning.
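The layer-wise RBM pre-training described above can be sketched in a few lines. The following is a minimal illustration with one Gibbs step of the CD algorithm (CD-1); the data, layer sizes, and function names are hypothetical, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=100, seed=0):
    """Train one RBM with CD-1; returns weights W and biases a (visible), b (hidden)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)   # visible-layer biases a_i
    b = np.zeros(n_hidden)    # hidden-layer biases b_j
    for _ in range(epochs):
        v0 = data
        ph0 = sigmoid(v0 @ W + b)                      # P(h_j = 1 | v), Equation (4)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)                    # reconstruction, Equation (5)
        ph1 = sigmoid(pv1 @ W + b)
        # CD-1 gradient approximation: data statistics minus reconstruction statistics
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (v0 - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

def rbm_features(data, W, b):
    """Hidden-unit activation probabilities, used as the feature vector for the next layer."""
    return sigmoid(data @ W + b)
```

Stacking is then a matter of calling `train_rbm` on the output of `rbm_features` from the layer below.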

Optimizing DBN Model Using PSO
Particle Swarm Optimization (PSO) is an evolutionary computing technique developed by Kennedy and Eberhart in 1995, whose idea stems from the study of bird predation behavior [36]. The particle swarm algorithm first generates random solutions and then iteratively searches for the solution with the best fitness value. The algorithm has been widely applied in the field of optimization thanks to its simple implementation, few tuning parameters, and fast convergence speed. The basic form of the particle swarm optimization algorithm consists of a group of particles; each particle determines its flight direction through the value of the fitness function and its velocity, gradually moving toward better regions, and finally searching out the global optimal solution. The position of a particle represents a candidate solution to the problem, corresponding here to the weight values in the neural network.
To optimize the problem, the position, velocity, and fitness value of each particle are updated by Equations (8) and (9):

V_i^{k+1} = ω V_i^k + c_1 · rand · (pbest_i − P_i^k) + c_2 · rand · (gbest − P_i^k), (8)

P_i^{k+1} = P_i^k + V_i^{k+1}, (9)

where V_i is the velocity of the particle; P_i is the current position of the particle; i = 1, 2, ..., n, with n the total number of particles in the swarm; k is the iteration index; rand is a random number in (0, 1) that increases the randomness of the search; ω is the inertia weight, a non-negative number that adjusts the search range; c_1 and c_2 are acceleration constants that adjust the maximum learning step; pbest is the personal optimal value and gbest is the global optimal value. The particle swarm optimization algorithm proceeds as follows: Step 1: Initialization. Initialize the particle population of size n, including random positions and velocities.
Step 2: Evaluation. Evaluate the fitness of each particle according to the fitness function.
Step 3: Find the pbest. For each particle, compare its current fitness value with the fitness value of the individual's historical best position (pbest); if the current fitness value is better, update pbest with the current position.
Step 4: Find the gbest. For each particle, compare its current fitness value with the fitness value of the global best position (gbest); if the current fitness value is better, update gbest with the current particle position.
Step 5: Update. Update the velocity and position of each particle according to Equations (8) and (9).
Step 6: Termination. Check the end condition of the algorithm, which is the maximum number of iterations or the target fitness value. If the condition is not met, jump to Step 2; otherwise, output the global optimal value gbest.
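The six steps above can be sketched as follows. This is a minimal illustration for a generic fitness function to be minimized; the parameter values and function names are placeholders, not the paper's actual configuration.

```python
import numpy as np

def pso(fitness, dim, n_particles=50, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` using the velocity/position updates of Equations (8)-(9)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    P = rng.uniform(lo, hi, (n_particles, dim))       # Step 1: random positions
    V = np.zeros((n_particles, dim))                  # and velocities
    pbest = P.copy()                                  # personal bests
    pbest_val = np.array([fitness(p) for p in P])     # Step 2: evaluate fitness
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]  # global best
    for _ in range(iters):                            # Step 6: loop until max iterations
        r1 = rng.random((n_particles, dim))           # per-dimension `rand` terms
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (pbest - P) + c2 * r2 * (gbest - P)  # Equation (8)
        P = np.clip(P + V, lo, hi)                                 # Equation (9)
        vals = np.array([fitness(p) for p in P])
        improved = vals < pbest_val                   # Step 3: update pbest
        pbest[improved], pbest_val[improved] = P[improved], vals[improved]
        g = pbest_val.argmin()                        # Step 4: update gbest
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# e.g. minimizing the sphere function drives the swarm toward the origin
best, val = pso(lambda p: float((p ** 2).sum()), dim=3)
```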

Least Squares Support Vector Regression Machine
The least squares support vector regression machine is a machine learning method based on statistical theory and the structural risk minimization criterion. Different from the standard support vector regression machine, LSSVR replaces the inequality constraints by equality constraints and uses the squared error as the empirical loss of the training set, transforming the quadratic programming problem into the solution of a set of linear equations, which effectively improves the calculation speed and convergence accuracy and generalizes well. Training the LSSVR model means supervised training through input values and label values, updating the model parameters. Its specific form is as follows:

min_{ω,μ,ξ} J(ω, ξ) = (1/2)||ω||² + (C/2) ∑_{i=1}^{l} ξ_i², (10)

s.t. y_i = ω^T φ(x_i) + μ + ξ_i, i = 1, 2, ..., l, (11)

where x_i ∈ R^l is the input vector, y_i ∈ R^l is the output vector, ω is the weight vector, C ∈ R+ is a regularization parameter set empirically by hand, ξ_i ∈ R is the empirical error, and μ is the bias. φ(·) is the nonlinear mapping of the input space to the feature space; here, we choose the RBF radial basis kernel function. To solve the above constrained optimization problem, the Lagrange function of the dual problem has the following form:

L(ω, μ, ξ, η) = J(ω, ξ) − ∑_{i=1}^{l} η_i (ω^T φ(x_i) + μ + ξ_i − y_i), (12)

where η = [η_1, η_2, ..., η_l] denotes the Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) conditions, taking the partial derivatives with respect to ω, μ, ξ_i, η_i and eliminating ω and ξ_i yields the linear system of Equation (13):

[ 0    I^T      ] [ μ ]   [ 0 ]
[ I    Ψ + E/C  ] [ η ] = [ y ], (13)

where I = [1, 1, ..., 1]^T, Ψ_ij = K(x_i, x_j) is the kernel function satisfying the Mercer condition, E is the l × l identity matrix, and y = [y_1, y_2, ..., y_l]^T. The mathematical model of the least squares support vector regression machine is then obtained as follows:

f(x) = ∑_{i=1}^{l} η_i K(x, x_i) + μ. (14)
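The linear system of Equation (13) can be solved directly for η and μ, which is exactly what makes LSSVR faster than standard SVR. The sketch below assumes an RBF kernel and illustrative values of C and δ; it is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, delta=1.5):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 * delta^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * delta ** 2))

def lssvr_fit(X, y, C=96.0, delta=1.5):
    """Solve the bordered linear system of Equation (13); returns (eta, mu)."""
    n = len(X)
    K = rbf_kernel(X, X, delta)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                    # top row: [0, 1^T]
    A[1:, 0] = 1.0                    # left column: [1, ...]
    A[1:, 1:] = K + np.eye(n) / C     # Psi + E/C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    mu, eta = sol[0], sol[1:]
    return eta, mu

def lssvr_predict(Xq, X, eta, mu, delta=1.5):
    """Regression model: f(x) = sum_i eta_i K(x, x_i) + mu."""
    return rbf_kernel(Xq, X, delta) @ eta + mu
```

Because the system is a single dense linear solve, no iterative quadratic programming is needed.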

Prediction Model Based on PSO Optimized DBN Network and LSSVR
The prediction model constructed in this paper uses deep learning as the pre-processing system of the least squares support vector regression machine. Different from other water quality prediction methods, the deep belief network is used here to fully excavate the essential characteristics of the cross-section water quality rather than as a prediction network (as when, for instance, a single BP network or DBN network is adopted to predict water quality directly). Firstly, DBN is applied to perform feature learning on the original water quality data, extracting the essential feature information that reflects the water quality trend. Then, the PSO optimization algorithm is utilized to find the optimal initial weights of the DBN network model. After DBN feature extraction, the DBN-LSSVR model is used as the prediction model; that is, based on the feature vectors output from the DBN model, we optimize and train the least squares support vector regression machine, and then establish the water quality prediction model from the optimal combination of LSSVR model parameters and kernel function parameters. The specific algorithm steps of the model are as follows: Step 1: Determination of DBN model parameters. Initialize the learning rate and the number of iterations. The number of visible-layer neurons is determined by the number of input features, while the numbers of hidden-layer neurons and hidden layers, as well as the weights and thresholds of each layer, are determined when training the RBMs layer by layer. Next, use the CD algorithm to pre-train each RBM layer, regarding the output of each lower RBM layer as the input of the higher RBM layer, and then train the higher RBM layer. The data undergo feature extraction and dimension reduction, the feature vectors are output, and the appropriate initial weights of the model are obtained after each RBM layer is trained. This step mainly pre-trains each RBM layer of the DBN model.
Step 2: To overcome the shortcoming that the DBN network is easy to fall into local optimum during the learning and training process, utilize the PSO optimization algorithm to dynamically optimize and adjust all RBM model parameters, and find the optimal initial weight of the network model.
Step 3: Determination of LSSVR model parameters. The output of the top-level RBM is used as the input of the LSSVR regression layer to train the LSSVR regression model. When the maximum number of cycles is reached or the error is less than the specified threshold, the LSSVR model training ends, and the LSSVR prediction model is constructed with the optimal combination of parameters.
Step 4: After the LSSVR model training is completed, each layer of the RBM network can only ensure that the weights in its own layer are optimal for the feature vector mapping of this layer, not for the feature vector mapping of the entire DBN and LSSVR combined model. So it is necessary that the top-level LSSVR model propagates from top to bottom to each layer of RBM, and iteratively updates the weights and offsets of the fine-tuned DBN network until the model converges, and the training of the model is completed.
The following Figure 4 is the flow chart of the combined prediction model.
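As a toy illustration of the overall data flow (Steps 1 and 3 only: PSO weight optimization and top-down fine-tuning of Steps 2 and 4 are omitted, and all data and parameter values are placeholders), the combined model might be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1(data, n_hidden, lr=0.05, epochs=200):
    """One RBM layer trained with a mean-field variant of contrastive divergence."""
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    a, b = np.zeros(data.shape[1]), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)
        pv1 = sigmoid(ph0 @ W.T + a)
        ph1 = sigmoid(pv1 @ W + b)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (data - pv1).mean(0)
        b += lr * (ph0 - ph1).mean(0)
    return W, b

# toy stand-in for the 8 normalized input features and the TN target
X = rng.random((200, 8))
y = X @ rng.standard_normal(8)

# Step 1: stack two RBMs (8-12-4), the lower output feeding the upper layer
W1, b1 = cd1(X, 12)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = cd1(H1, 4)
F = sigmoid(H1 @ W2 + b2)          # 4-dimensional feature vectors

# Step 3: LSSVR on the extracted features (RBF kernel; C = 96, delta = 1.5 as in the paper)
K = np.exp(-((F[:, None] - F[None, :]) ** 2).sum(-1) / (2 * 1.5 ** 2))
A = np.block([[np.zeros((1, 1)), np.ones((1, len(F)))],
              [np.ones((len(F), 1)), K + np.eye(len(F)) / 96.0]])
sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
mu, eta = sol[0], sol[1:]
y_hat = K @ eta + mu               # in-sample prediction
```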


Evaluation of Performance
In this paper, the data set is divided into two subsets for training and testing the model: the first 80% of the data set is used for training and the remaining 20% for testing. To evaluate the performance of the PSO-DBN-LSSVR model and the other prediction models, the following evaluation indicators are applied, calculated as follows:

MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|, (15)

MAPE = (100%/n) ∑_{i=1}^{n} |(y_i − ŷ_i)/y_i|, (16)

RMSE = sqrt( (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)² ), (17)

R² = ( ∑_{i=1}^{n} (y_i − ȳ)(ŷ_i − ŷ̄) )² / ( ∑_{i=1}^{n} (y_i − ȳ)² · ∑_{i=1}^{n} (ŷ_i − ŷ̄)² ), (18)

where MAE is the mean absolute error, MAPE is the mean absolute percentage error, RMSE is the root mean square error, and R² is the coefficient of determination. In Equations (15)-(18), n is the number of prediction points, y_i is the true value of the water quality parameter at the ith prediction point, ȳ is the average of the true values, ŷ_i is the model's predicted value at the ith prediction point, and ŷ̄ is the average of the predicted values.
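A direct implementation of Equations (15)-(18) might look as follows; note that R² is computed here as the squared correlation between true and predicted values, which is the reading implied by the text's use of both the true-value and predicted-value averages.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (MAE, MAPE in %, RMSE, R^2) per Equations (15)-(18)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    mae = np.abs(y_true - y_pred).mean()                       # Equation (15)
    mape = 100.0 * np.abs((y_true - y_pred) / y_true).mean()   # Equation (16)
    rmse = np.sqrt(((y_true - y_pred) ** 2).mean())            # Equation (17)
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    r2 = (yt @ yp) ** 2 / ((yt @ yt) * (yp @ yp))              # Equation (18)
    return mae, mape, rmse, r2
```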

Data Selection and Preprocessing
The experimental data are collected by the environmental monitoring station every four hours by means of an IoT system, using a water quality data acquisition module for data acquisition, a transmission module for data transmission, and a cloud server module for cloud storage of the data. We selected nine chemical factors: water temperature (T, °C), pH, dissolved oxygen (DO, mg/L), conductivity (µS/cm), turbidity (NTU), permanganate index (COD, mg/L), total phosphorus (TP, mg/L), ammonia nitrogen (NH4-N, mg/L), and total nitrogen (TN, mg/L). These parameters are important for evaluating water quality and are directly related to water quality prediction. We used the first eight chemical factors as the input characteristic parameters and total nitrogen as the output parameter, which is also the water quality parameter to be predicted in this paper. We selected the first 870 records of the cross-section water quality as the training set and the last 216 records as the test set to predict the total nitrogen content of the water.
Moreover, because the tested water quality parameters come from different Internet of Things (IoT) collection devices, equipment errors or faults and human factors cause missing or abnormal values in the collected water quality data. Therefore, before constructing a deep learning network, it is essential to complete and correct the dataset [37]. The mean method and the k-nearest-neighbor (KNN) completion algorithm are the most commonly used data completion methods; the KNN completion algorithm is used to complete and correct the collected data in this paper. The main principle of KNN is to select the k nearest samples of the missing record, according to the characteristic that adjacent data have high similarity, and regard their average value as the missing value, where the Euclidean distance is used to judge the distance between sample points:

d(X_i, X_j) = sqrt( ∑_{r=1}^{m} (x_ir − x_jr)² ), (19)

where X_i = {x_i1, x_i2, ..., x_im} denotes the m-dimensional data of the ith sample and x_ir is the rth attribute of the ith sample.
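The KNN completion rule described above can be sketched as follows; the helper name and the toy data are illustrative, and distances are computed only over the attributes that the incomplete record has actually observed.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each incomplete row with the mean of the k nearest
    complete rows, using Euclidean distance over the observed columns."""
    X = np.asarray(X, float).copy()
    complete = X[~np.isnan(X).any(axis=1)]          # rows with no missing values
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])                       # observed attributes of row i
        d = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]       # k most similar complete rows
        X[i, ~obs] = nearest[:, ~obs].mean(axis=0)  # average fills the gap
    return X
```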
After completing the missing data, the Pauta criterion is adopted to eliminate abnormal data from the sample data. The formula is as follows, and the Pauta criterion is illustrated in Figure 5:

|Δx_i| = |x_i − x̄| > 3σ, (20)

where Δx_i = x_i − x̄ and σ = sqrt( ∑_{i=1}^{n} Δx_i² / (n − 1) ); when the above formula holds, x_i is determined to be abnormal data.
After outlier elimination, the data are normalized by the min-max method:

x̂(i) = (x(i) − x_min) / (x_max − x_min),

where x̂(i) is the normalized data value, x(i) is the input data, and x_max and x_min are the maximum and minimum values of the input data, respectively.
There are a total of 1086 experimental records; missing data account for about 3% and are complemented by the k-nearest-neighbor completion algorithm. Abnormal data account for about 5%; after the Pauta criterion is used to eliminate them, the k-nearest-neighbor completion algorithm is again used to correct the data.
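The Pauta criterion of Equation (20) and the min-max normalization step can be sketched together; the function names and the toy series are illustrative.

```python
import numpy as np

def pauta_outliers(x):
    """Flag points with |x_i - mean| > 3*sigma (the Pauta, or 3-sigma, criterion)."""
    x = np.asarray(x, float)
    resid = x - x.mean()                              # deltas from the sample mean
    sigma = np.sqrt((resid ** 2).sum() / (len(x) - 1))
    return np.abs(resid) > 3 * sigma                  # True marks abnormal data

def minmax_normalize(x):
    """Scale a series to [0, 1]: (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())
```

In practice the flagged points would be removed and re-filled by the KNN completion step before normalization.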

Results of Experiments
The algorithms are implemented on the MATLAB R2018a simulation platform, and the prediction model simulation programs are written to predict the total nitrogen content. In this paper, five prediction models, BP, LSSVR, DBN, DBN-LSSVR, and PSO-DBN-LSSVR, are established, and their prediction performance is compared.
Prediction models are established for the water quality time series. For the BP prediction model, the structure is 8-13-1: the number of input-layer neurons is 8, the number of hidden-layer neurons is 13, and the number of output-layer neurons is 1; the learning rate is 0.001, the learning target is 0.01, and the number of iterations is 3000. The learning parameters C and δ of the LSSVR model are selected by the grid search method with an iteration step of 1. We establish a double-hidden-layer DBN network composed of RBM1 and RBM2; after many iterative experiments, the structure 8-12-4-1 is found to be the best, where the number of visible-layer neurons is 8, the number of RBM1 hidden-layer neurons is 12, the number of RBM2 hidden-layer neurons is 4, and the LSSVR output layer is 1. The RBM learning rate is set to 0.05, the number of iterations is 3000, and the population size of the PSO algorithm is set to 50. After many experiments, the optimal parameter combination of LSSVR is found to be C = 96 and δ = 1.5; the optimal prediction model is then used to predict the total nitrogen (TN) content.
The prediction results are shown in Figure 6. It can be seen intuitively that the PSO-DBN-LSSVR prediction model proposed in this paper can be closer to the true value and has better prediction accuracy.
Table 1 demonstrates some of the total nitrogen values predicted by each model, that is, a comparison between the predicted results and the true values over the next seven days. Compared with the other models, the percentage error decreased by 20.22%, RMSE decreased by 2.4477, and R² increased by 0.2873. In terms of running time, the average training times of BP, LSSVR, and DBN are 12, 420, and 128 s, respectively. Although the combined model takes longer, it achieves the best prediction accuracy and meets the requirements of engineering applications. The analysis demonstrates that the PSO-DBN-LSSVR prediction model we developed is superior to the shallow BP and LSSVR models, as well as to the DBN and DBN-LSSVR models, and can better predict the total nitrogen content of the cross-section water quality over the next four hours. As the coefficient of determination shows, the prediction model also has better accuracy and robustness.

Conclusions
To solve the problem that traditional prediction methods struggle to fully excavate the essential characteristics of cross-section water quality, resulting in low prediction accuracy of the water quality parameters, in this paper we proposed a deep learning method based on DBN to predict the cross-section water quality. A deep belief network is used to learn and extract water quality features from historical water quality data at a given time, and LSSVR is then combined with it to predict future changes in water quality. To overcome the shortcoming that random initialization of the DBN network parameters affects model prediction performance, the particle swarm optimization algorithm is used to optimize the model weight parameters and enhance prediction performance. At the top layer of the model, LSSVR is applied to predict the water quality parameters. The experimental results indicate that the PSO-DBN-LSSVR cross-section water quality prediction model constructed in this paper organically integrates feature extraction and nonlinear regression, greatly reducing the error between the prediction results and the actual values and effectively improving the prediction accuracy of the water quality parameters. In particular, it provides technical support for precise control of cross-section water quality and also plays a certain role in the management of water pollution.
To make the water quality prediction model more practical, in future research we will collect more water quality data, such as heavy metal parameters and flow, and increase the input feature parameters of the neural network. We will also further optimize the existing deep belief network structure to improve the accuracy of the prediction model.