A Water Quality Prediction Method Based on the Deep LSTM Network Considering Correlation in Smart Mariculture

An accurate prediction of cage-cultured water quality is a hot topic in smart mariculture. Since the mariculturing environment is always open to its surroundings, the changes in water quality parameters are normally nonlinear, dynamic, and complex. However, traditional forecasting methods suffer from problems such as low accuracy, poor generalization, and high time complexity. To overcome these shortcomings, a novel water quality prediction method based on the deep LSTM (long short-term memory) learning network is proposed to predict pH and water temperature. Firstly, linear interpolation, smoothing, and moving average filtering techniques are used to repair, correct, and de-noise water quality data, respectively. Secondly, Pearson’s correlation coefficient is used to obtain the correlation priors between pH, water temperature, and other water quality parameters. Finally, a water quality prediction model based on LSTM is constructed using the preprocessed data and its correlation information. Experimental results show that, in the short-term prediction, the prediction accuracy of pH and water temperature can reach 98.56% and 98.97%, and the time cost of the predictions is 0.273 s and 0.257 s, respectively. In the long-term prediction, the prediction accuracy of pH and water temperature can reach 95.76% and 96.88%, respectively.


Introduction
In smart mariculture, it is an inevitable trend for aquaculture to become smarter, more accurate, and more ecological. However, due to the influence of climate, typhoons, rain, and changes in the culture density of seawater, the balance of algae and bacteria in an aquaculture environment can easily be destroyed. Consequently, this leads to a decrease in the anti-stress and disease-resistance abilities of farmed fish [1][2][3][4]. Furthermore, in traditional mariculture, water quality can only be judged by breeding workers from experience, although it is often impossible to grasp the changing trend of water quality in a timely and accurate manner based on empirical judgment alone. The precise prediction of water quality parameters can help aquaculture farmers to grasp the future trend of water quality, so as to adopt countermeasures in time. The main contributions of this paper are as follows:
1. The linear interpolation method and the smoothing method are used to fill and correct the data sampled by the sensors, respectively. The moving average filter is used to denoise the filled and corrected data.
2. The influencing factors of pH and water temperature are analyzed comprehensively. The correlation between water temperature, pH, and the other water quality parameters is obtained by Pearson's correlation coefficient method, and these correlations are used to select the input parameters for model training.
3. Based on the preprocessed data and the correlation analysis results, a water quality prediction model based on a deep LSTM learning network is trained. Compared with the RNN-based prediction model, the proposed method achieves higher prediction accuracy in less time.
The rest of this paper is organized as follows. Section 2 gives the methods of data acquisition and data analysis, and presents the water quality prediction model based on LSTM. In Section 3, we analyze and discuss the experimental results of pretreatment, and evaluate the accuracy and time complexity of proposed prediction methods for pH and water temperature. Finally, Section 4 concludes this paper.

Data Acquisition
The experimental data was collected at the mariculture base in Xincun Town, Lingshui County, Hainan Province, China. Data collection was achieved by deploying sensor devices in a cage. The collected data was stored in the data server, and real-time data information could be viewed on the mobile terminal. Figure 1a shows the culture cage. It can be seen that a small number of dead fish have floated to the surface because of the deterioration in the water quality. Figure 1b shows a data acquisition sensor device, a vertical-axis wind power generation device, a solar power generation panel, a storage battery, and a wireless transmitting device. Figure 1c shows the online data monitoring display page on a mobile phone.

Data Preprocessing
High-quality sample data sets are the basis of accurate analysis and prediction. In wireless sensor networks, however, due to instability, aging or erosion of the sensor equipment, and the susceptibility of the transmission network to distance and the surrounding environment, data loss, abnormality, and noise interference may occur during the measurement and transmission of water quality parameters [37].

Data Filling and Correction
In the application scenarios in this research, where water quality changed dramatically, linear interpolation was used, since it is more robust than nearest-neighbor interpolation and is also suitable for filling data sets with small sampling intervals. Linear interpolation treats the relationship between two known data and one unknown datum as linear, and uses the slope of the assumed line to calculate the data increment, thereby obtaining the required unknown datum. This method is shown in (1):

$$x_{k+i} = x_k + \frac{i}{j}\left(x_{k+j} - x_k\right) \qquad (1)$$

where $x_{k+i}$ denotes the lost datum, $x_k$ is the known datum before $x_{k+i}$, and $x_{k+j}$ is the $j$th known datum after $x_k$.

Water quality data are usually continuous, and normally change slowly and smoothly, with no sudden increase or decrease within a very short period of time. Therefore, if the received water quality data contains a sharp jump, it needs to be corrected. This paper applied the smoothing method to data error correction. Error correction involves two steps: error detection and correction. In error detection, it is checked whether the relative difference between the current datum and the previous or the following datum exceeds a certain range; if it does, the current datum is judged to be erroneous, otherwise it is correct. In data correction, the mean of the data before and after the erroneous datum is used as the correction value, which then replaces the erroneous datum. This method is shown in (2):

$$x_k = \frac{x_{k-1} + x_{k+1}}{2}, \quad \text{if } \frac{\left|x_k - x_{k-1}\right|}{x_{k-1}} > \beta_1 \ \text{ or } \ \frac{\left|x_k - x_{k+1}\right|}{x_{k+1}} > \beta_2 \qquad (2)$$

where $\beta_1$ and $\beta_2$ are the adjacent-data error thresholds, and $x_k$ is the datum currently under error detection. If the relative difference between $x_k$ and $x_{k-1}$ is larger than $\beta_1$, or that between $x_k$ and $x_{k+1}$ is larger than $\beta_2$, then $x_k$ is updated as the mean of the values before and after it.
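As a minimal sketch of how Equations (1) and (2) could be applied in code (the function names and example values here are illustrative, not from the paper):

```python
def fill_linear(x_k, x_kj, i, j):
    """Eq. (1): estimate the lost datum x_{k+i} from the known datum x_k
    and the jth known datum x_{k+j} after it, by linear interpolation."""
    return x_k + (i / j) * (x_kj - x_k)

def correct(prev, cur, nxt, beta1=0.10, beta2=0.10):
    """Eq. (2): if the relative difference between cur and either neighbour
    exceeds its threshold, replace cur with the mean of the neighbours."""
    if abs(cur - prev) / abs(prev) > beta1 or abs(cur - nxt) / abs(nxt) > beta2:
        return (prev + nxt) / 2
    return cur
```

For example, `fill_linear(28.0, 29.0, 1, 2)` fills a missing sample halfway between two known temperatures, and `correct(28.0, 35.0, 28.4)` replaces an implausible jump to 35.0 with the mean of its neighbours.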

Moving Average Filtering
In the complex aquaculture environment, there are noise interference signals in the water quality data, for example, noise caused by constantly oscillating water waves. Figure 2 shows that the noise frequency in the water quality data was relatively stable. Therefore, the moving average filter can be used to reduce the data noise. The moving average filter is a finite-length unit impulse response filter, shown as (3):

$$y(t) = \frac{1}{N}\sum_{k=0}^{N-1} x(t-k) \qquad (3)$$

In (3), N represents the size of window, t is a certain time, y(t) is the data in the new sequence corresponding to time t (the average value of N data), and x(t − k) is the data at time (t − k) in original data set. The moving average filtering can attenuate high frequency signals, so as to achieve data smoothing.
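The filter of Equation (3) can be sketched in a few lines (a pure-Python illustration; the function name is ours):

```python
def moving_average(x, N):
    """Eq. (3): y(t) is the mean of the N most recent samples
    x(t), x(t-1), ..., x(t-N+1); defined for t >= N-1."""
    return [sum(x[t - N + 1: t + 1]) / N for t in range(N - 1, len(x))]
```

With the window size N = 4 used later in the experiments, each output sample averages the current reading with the three preceding ones, attenuating high-frequency wave noise.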

Correlation Analysis
In this paper, Pearson's correlation coefficient method was used to analyze the correlation between water temperature, pH, and the other water quality parameters in an aquaculture environment. Pearson's correlation coefficient [38] is a method for analyzing whether there is a close correlation between two variables, and is defined as the quotient of the covariance of the two variables and the product of their standard deviations. After pretreatment of all measured water quality parameters, Pearson's correlation coefficient was used to analyze the correlation between the water quality factors to be predicted and the other factors. The calculation results are shown in Table 1. Table 1 shows that water temperature has a strong positive correlation with conductivity, a strong negative correlation with salinity, a moderate positive correlation with chlorophyll, a weak correlation with turbidity, an extremely strong negative correlation with pH, and a strong positive correlation with dissolved oxygen. On the other hand, pH has a strong negative correlation with conductivity, a strong positive correlation with salinity, almost no correlation with turbidity, and a weak correlation with dissolved oxygen.
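As a small illustration of the definition above (covariance divided by the product of the standard deviations), the coefficient between two parameter series can be computed as follows (the function name is ours):

```python
def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length series:
    covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

A value near +1 or −1 indicates a strong positive or negative correlation (as for water temperature and conductivity), while a value near 0 indicates almost no correlation (as for pH and turbidity).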

The Proposed Prediction Model Based on LSTM Deep Learning
Water quality prediction plays a significant role in smart aquaculture. The future trends of water quality are predicted using historical water quality data and the correlations with other water quality parameters. If the predicted water quality parameters are beyond the tolerance range of the fish, the water quality should be regulated in time to keep the fish living in the most suitable water environment for a long time.

LSTM Deep Learning Network
The LSTM neural network is an improvement on the recurrent neural network (RNN). The main difference is that a processor called the "cell state" is added to the hidden layer, which is used to determine whether information is useful. The advantage of the RNN is that it can continuously remember historical information while training along the sequence; however, its memory capacity is limited. As training progresses, the artificial neurons forget information considered irrelevant and remember only information considered important, which leads to the loss of some information needed at later stages. The LSTM neural network was designed to resolve this problem.
The hidden layer of the traditional RNN has only one state s, while the LSTM network added a state c, that is, cell state. As shown in Figure 3, at time t, the hidden layer has three inputs, namely, the input value x t at the current time, the output value s t−1 of the hidden layer neurons at the previous time, and the unit state c t−1 at the previous time. Meanwhile, there are two outputs in the hidden layer, namely, the output of the hidden layer s t and the cell state c t at the current time.
In Figure 4, the LSTM network sets up three gates to control c, namely, r 1 (called the forget gate, which is used to control whether to save the long-term state), r 2 (called the input gate, which is used to control whether the current state is inputted into the long-term state), and r 3 (called the output gate, which controls whether the current long-term state is the output of the hidden layer). The forget gate determines how much information of state c t−1 at the previous moment is retained to the current state c t . The input gate determines how much information of data x t inputted into the hidden layer at the current time is saved to the current state c t . Similarly, the output gate controls how much information in c t is inputted into s t . The internal structure of the hidden layer in the LSTM network is shown in Figure 5.
Figure 5. The internal structure of the hidden layers of the LSTM network.
In Figure 5, x t is the input data set, s t−1 is the output of previous hidden layer, f t is the forget gate, W f is the weight of the forget gate, i t is the input gate, W i is the weight of the input gate, c t is the input cell state at the current time, W c is the weight of c t , o t is the output gate, W o is the weight of the output gate, c t−1 is the previous unit state, c t is the current unit state, and s t is the output of the current hidden layer.
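Using the gates and weights named above, one forward step of the hidden layer can be sketched with the standard LSTM update equations (a sketch assuming the usual formulation, in which each gate applies its weight matrix to the concatenation of s_{t−1} and x_t; the function names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, s_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One forward step of the LSTM hidden layer of Figure 5."""
    z = np.concatenate([s_prev, x_t])   # [s_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    c_hat = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat    # current cell state
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    s_t = o_t * np.tanh(c_t)            # output of the hidden layer
    return s_t, c_t
```

The cell state c_t is the additive memory path that lets the LSTM retain information the plain RNN would forget.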
The LSTM network performs data back-propagation in the same way as the RNN: the error propagates along the time series in addition to spreading between layers. After the updated gradients of the horizontal and vertical weights and bias terms are obtained, the updated value of each weight and bias term can be computed from the structure of the hidden layers. The calculation method is the same as for the RNN, and the learning rate α should be set to control the error gradient update and the speed of error decline.
In the above training model, we introduced three evaluation metrics [39] to evaluate the prediction effect, which are defined as follows.
Definition 1: MAE (mean absolute error). MAE is the basic evaluation metric, against which the following metrics are generally compared:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \qquad (4)$$

Definition 2: RMSE (root mean squared error). RMSE denotes the mean error and is more sensitive to extreme values. If an extreme value appears at some time point in the training process, the RMSE will be greatly affected by the increased error. The change of this metric can be used as a benchmark for testing the robustness of the model:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \qquad (5)$$

Definition 3: MAPE (mean absolute percent error). MAPE considers not only the deviation between the predicted data and the real data, but also the ratio between the deviation and the real data:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \qquad (6)$$

In (4), (5), and (6), $y_i$ represents the real value, $\hat{y}_i$ represents the value predicted by the model at the same time, which is the output value of the deep learning model, and N is the number of samples in the data set. The closer the above three evaluation metrics are to 0, the better the prediction and fitting effect of the model.
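The three metrics can be computed directly from their definitions (a sketch; the function names are ours):

```python
def mae(y_true, y_pred):
    # Eq. (4): mean absolute error
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Eq. (5): root mean squared error
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def mape(y_true, y_pred):
    # Eq. (6): mean absolute percent error
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)
```

Because RMSE squares each deviation before averaging, a single extreme error inflates it far more than MAE, which is why its trajectory is used as the robustness benchmark.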

Construction of a Water Temperature and pH Prediction Model Based on the LSTM Deep Network
The whole prediction process is shown in Figure 6. The proposed prediction model integrates data preprocessing and correlation analysis into the training of the deep LSTM network. Firstly, after receiving water quality data from the wireless transmission network, a series of linear interpolation, smoothing, and moving average filtering techniques are used to repair, correct and de-noise water quality data, respectively. Then, Pearson's correlation coefficient is used to obtain correlation priors between pH, water temperature, and other water quality parameters. Finally, a water quality prediction model based on deep LSTM is constructed using the preprocessed data and its correlation information. When the prediction accuracy of the model reaches the expected requirements, the whole prediction model is considered to be established successfully; otherwise, it will be retrained to obtain better results. According to the system dynamics relationship between the water quality factors analyzed in Section 2.3, it is easy to know that water temperature has a strong correlation with conductivity, salinity, pH, and dissolved oxygen. The historical data of these parameters and water temperature were used as the input data of the water temperature prediction model constructed for training, and the output of the model was the predicted value of water temperature. The actual water temperature data measured locally was taken as the real data.
Figure 6. The complete flowchart of the water quality prediction model.
Using the TensorFlow platform and the Python language, prediction models based on the RNN and the deep LSTM network were built. The preprocessed data of 610 groups were input into the prediction model along the time series. In the water temperature prediction model, the input data dimension was 5, the output data dimension was 1, the number of hidden layers was 15, the time step was set to 20, the learning rate was set to 0.0005, and the number of training iterations was set to 10,000. In each training process, the three evaluation metrics (MAE, RMSE, and MAPE) between the output values of all output layers and the real values were recorded, as shown in Table 2. The comparison of the RMSE (water temperature) between the LSTM-based prediction model and the RNN-based prediction model is shown in Figure 7. For water temperature, the unit of the RMSE is Celsius (°C).
In the same way, according to the analysis in Section 2.3, pH has a strong correlation with conductivity, salinity, and water temperature. These water quality parameters and the historical data of pH were used as the input data for model construction, the output value of the model was taken as the predicted value of pH, and the actually measured pH data was taken as the real data. When building the prediction model, the dimension of the input data was 4, the dimension of the output data was 1, the hidden layer was set to 15 layers, the time step was set to 20, the learning rate was set to 0.0005, and the training was performed 10,000 times. The three evaluation metrics between the output values of all output layers and the real values in each training process were recorded, as shown in Table 2. The comparison of the RMSE (pH) between the LSTM-based prediction model and the RNN-based prediction model is shown in Figure 8.
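The way the 610 preprocessed groups are fed to the network with a time step of 20 can be sketched as a sliding-window slicing of the multivariate series (an illustrative sketch; the column layout and the function name are our assumptions, not from the paper):

```python
import numpy as np

def make_windows(data, time_step=20):
    """Slice a (num_samples, num_features) array into overlapping windows
    of length time_step; the target is the predicted parameter (assumed
    here to be column 0) at the step immediately after each window."""
    X, y = [], []
    for start in range(len(data) - time_step):
        X.append(data[start: start + time_step])
        y.append(data[start + time_step, 0])
    return np.array(X), np.array(y)
```

For the water temperature model the feature dimension would be 5 (temperature plus the four correlated parameters), and for the pH model 4, matching the input dimensions quoted above.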
From Figure 7a and Figure 8a, we can see that the error between the output data and the real data was gradually reduced, finally approaching 0. In addition, in the early stage of training the error dropped faster, while the dropping rate gradually slowed down in the middle and late stages. Figure 7b and Figure 8b are partial magnification comparisons of the error. It can be seen that the overall trend of the error was decreasing. Since the error of the entire training network was larger each time training restarted, a locally rising phenomenon occurred. However, as each single training went on, the error kept decreasing.

Results and Discussions
The experimental data was collected from the mariculture cages with the sensor devices, and then transmitted to the data server by means of a wireless bridge for storage. For the short-term predictions, the sampling frequency of the data was once every 5 min. Water quality data of 610 groups (about 51 h) including temperature, conductivity, chlorophyll, salinity, turbidity, pH, and dissolved oxygen parameters were used as experimental data for model training, and another 100 sets of water quality data (about 8.3 h) were used to verify the prediction effect. In addition, for the long-term predictions, the data sampling interval and data collection quantity are described in Section 3.4.
The experimental environment was: Intel(R) Core(TM) i7-8550 CPU @ 2 GHz processor, 8 GB memory, Windows 10 (64-bit) operating system, Anaconda3 experimental platform, and PyCharm 3.3 IDE (Integrated Development Environment); the neural network model was built with Python 3.6 and the TensorFlow 1.6.0 package. The accuracy and range ability of the sensors are shown in Table 3. F.S. is the abbreviation for "Full Scale", NTU is the abbreviation for "Nephelometric Turbidity Unit", and PSU is the abbreviation for "Practical Salinity Unit". Table 3. The accuracy and range ability of the sensors.


Experiments and Analysis of Data Preprocessing
Compared with spline interpolation, nearest-neighbor interpolation, and cubic interpolation, it has been found that linear interpolation has a similar interpolation effect to nearest-neighbor interpolation, and is superior to spline interpolation and cubic interpolation. Therefore, in this experiment, we used the improved method mentioned in Section 2.2.1 for data filling.
In the process of data mending, taking the water temperature data collected at a depth of 3.26 m as an example, in order to determine the optimal values of i and j in (1), the relative error between the original data and the filled data obtained by linear interpolation was calculated for all positive integers i and j in the interval [1,10]. The variation is shown in Figure 9.


As shown in Figure 9, in the deep orange area, i.e., i ∈ [1, 4] ∩ j ∈ [1, 10], i ∈ [4, 6] ∩ j ∈ [4, 10], or i ∈ [6, 10] ∩ j ∈ [6, 10], the relative errors between the original data and the filled data are nearly 0, while in the red area, the blue area, and the areas near them, i.e., i ∈ [4, 10] ∩ j ∈ [1, 4], the relative errors are close to 0.04. Furthermore, as can be seen from Figure 9, the relative errors can be minimized when (i ∈ [1, 4] ∩ j ∈ [7, 10]) ∪ (i ∈ [6, 10] ∩ j ∈ [7, 10]).
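The search over i and j behind Figure 9 can be reproduced in outline: treat each known sample as if it were lost, re-estimate it with Equation (1), and average the relative error for every (i, j) pair (an illustrative sketch; the function names are ours):

```python
def fill_linear(x_k, x_kj, i, j):
    # Eq. (1): estimate x_{k+i} from x_k and x_{k+j}
    return x_k + (i / j) * (x_kj - x_k)

def relative_error_grid(series, max_ij=10):
    """Average relative error of Eq. (1) over the series for each
    (i, j) pair with i, j in [1, max_ij]."""
    errors = {}
    for i in range(1, max_ij + 1):
        for j in range(1, max_ij + 1):
            span = max(i, j)
            errs = [abs(fill_linear(series[k], series[k + j], i, j) - series[k + i])
                    / abs(series[k + i])
                    for k in range(len(series) - span)]
            errors[(i, j)] = sum(errs) / len(errs)
    return errors
```

On a perfectly linear series the error is zero for every pair; on real water temperature data such a grid would reveal the low-error and high-error regions visualized in Figure 9.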
In terms of data correction, since water quality data are time-correlated, β 1 and β 2 in Equation (2) can be determined by using the relative difference between two adjacent historical water quality data points as a constraint on the current relative difference. Taking the average relative difference between two adjacent data points of the previous day as the values of β 1 and β 2 , i.e., β 1 = β 2 = 10%, the relative differences of pH and water temperature before and after data correction are shown in Figure 10.
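A minimal sketch of the correction rule, assuming that a sample whose relative difference from the previous accepted value exceeds β is flagged as erroneous and replaced by that previous value; the exact replacement rule of Equation (2) may differ.

```python
def correct(series, beta=0.10):
    """Replace samples whose relative difference from the previous
    accepted sample exceeds beta (here beta_1 = beta_2 = 10%).
    Sketch of the constraint behind Equation (2); the hold-last-good
    replacement is an assumption, not the paper's exact rule."""
    out = [series[0]]
    for x in series[1:]:
        prev = out[-1]
        if abs(x - prev) / abs(prev) > beta:
            out.append(prev)  # treat as erroneous, hold the last good value
        else:
            out.append(x)
    return out
```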
As shown in Figure 10, the red and blue dots overlap, and the relative differences of water temperature and pH before and after data correction differ little, which indicates that there are relatively few erroneous data in the collected data.
In the process of data denoising, Equation (3) was used in the experiment. The window size N was set to 4, and the water temperature and pH data were smoothed and denoised. Comparisons of the water quality data before and after denoising are shown in Figure 11.
Sensors 2019, 19, x FOR PEER REVIEW 12 of 20
Figure 10. Comparison of relative differences before and after data correction: (a) water temperature; (b) pH.
Figure 11. Comparison of water quality data before and after noise reduction: (a) water temperature; (b) pH.
From Figure 11, it can be seen that the moving average filter can effectively reduce the data noise, restore the original data distorted by waves and transmission, and smooth the water quality parameter curves.
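The moving average filter with window N = 4 can be sketched as follows; averaging the current sample with the preceding samples is the standard form and is assumed to match Equation (3), with the start-up behavior for the first N − 1 samples being an illustrative choice.

```python
def moving_average(series, n=4):
    """Moving average filter: each output is the mean of the current
    sample and the n - 1 preceding samples. For the first n - 1
    outputs, only the samples available so far are averaged
    (an illustrative boundary choice)."""
    out = []
    for k in range(len(series)):
        window = series[max(0, k - n + 1):k + 1]
        out.append(sum(window) / len(window))
    return out
```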

The Prediction of Water Temperature
Two kinds of prediction models were used to predict the future variation trend of the water quality parameters. One hundred values were predicted, and the comparison between the predicted and real values is shown in Figure 12.
The water temperature data predicted by the two models do not completely match the real values, but the values predicted by the LSTM-based model are closer to the real values. Obviously, the values predicted by the RNN-based prediction model fluctuate greatly, and the errors between the predicted and real values are also large. Table 4 shows the relative deviations between the predicted and real values of the two models. The unit of the deviation for both LSTM and RNN is degrees Celsius. To facilitate typesetting, the 100 groups of deviation values between the predicted and real data are divided into four columns from left to right, with each column showing 25 values. In Table 4, the relative deviations between the predicted and real values using the LSTM-based model are mostly less than 1 °C, with an average of 1.03 °C, while the deviations using the RNN-based model are mostly more than 1 °C, with an average of 1.37 °C. As a result, the LSTM-based model predicts water temperature more effectively and more accurately.
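The average deviations summarized for Table 4 can be computed as follows; this is the ordinary mean absolute deviation, which is assumed to be the statistic behind the reported 1.03 °C and 1.37 °C averages.

```python
def mean_abs_deviation(pred, real):
    """Average absolute deviation between predicted and real values.
    The unit follows the inputs (degrees Celsius for water
    temperature, as in Table 4)."""
    assert len(pred) == len(real)
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(pred)
```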

Time Complexity Analysis
The duration of each training iteration and the total time cost of 10,000 training iterations under the two neural networks were recorded in the experiment. Figure 13 shows a comparison of the time spent performing the 10,000 training iterations by the two methods.
From Figure 13, it can be seen that the training time of the LSTM-based prediction model is shorter and more stable, while the training time of the RNN-based model is longer and increases sharply between the 7000th and 9000th iterations. The average training time of the LSTM neural network is 0.257 s, and the total time cost of 10,000 training iterations is 2567.06 s; the average training time of the RNN is 0.259 s, and its total training time is 2591.95 s. Therefore, the training time of the LSTM-based prediction model is shorter than that of the RNN-based model. In other words, the construction efficiency of the LSTM-based prediction model is higher.
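The timing procedure can be sketched as follows; `step` is a placeholder standing in for one training update of either network, not the authors' actual training routine.

```python
import time

def timed_training(step, iterations=10_000):
    """Record the duration of each training iteration and the total
    cost, as done for Figure 13. Returns (average, total) in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()  # one training update (placeholder)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations), sum(durations)
```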

Prediction of pH Values
The future pH data were predicted using the two trained models. One hundred values were predicted, and the comparison between the predicted and real values is shown in Figure 14.
Figure 14. Comparison between predicted values and real values.
Figure 14 shows the prediction contrast after the scale of the vertical axis is enlarged. In fact, the relative errors are no more than 5%, and the future trend of pH can be judged from Figure 14. The predictions based on the deep LSTM network are closer to the real values. Table 5 shows the relative deviations between the predicted and real values under the two models. To facilitate typesetting, the 100 groups of deviation values between the predicted and real data are divided into four columns from left to right, with each column showing 25 values.
As shown in Table 5, the average relative deviation between the predicted and real values using the RNN-based prediction model is 1.579, while that using the LSTM-based model is 1.439. Therefore, the predictions of the LSTM-based model are closer to the real values.

Time Complexity Analysis
The experiments recorded the time cost of each training iteration and the total time consumed by 10,000 training iterations for the pH data using the two prediction models. Figure 15 shows the comparison of the 10,000 training iterations between the two methods.
As can be seen from Figure 15, the training time of the LSTM network is apparently shorter than that of the RNN, and its time variations are also smaller; thus, the LSTM network is more stable. The average training time of the RNN is 0.298 s, and the total time cost of 10,000 training iterations is 2968.568 s, while the average training time of the LSTM network is 0.273 s, and its total training time is 2734.118 s. Therefore, the LSTM network takes less time and is more efficient in constructing the pH prediction model.

Long-Term Prediction of Water Temperature and pH
In order to further verify the practicability and robustness of the prediction model, a longer training data set was collected for model training. Then, we used the trained model to predict the next 83 h (about 3.5 days) of water quality data. The data were sampled once per minute. A total of 30,000 groups (about 21 days) of data were collected for training, and an additional 5000 sets of data (83 h in total) were used for comparison.
The experiment was carried out under the same conditions as the short-term prediction, with 500 and 1000 training iterations, respectively. The results of the three evaluation indicators obtained during each training process are shown in Table 6. For the water temperature prediction, the comparison of the RMSE between the LSTM-based and RNN-based prediction models is shown in Figure 16. According to Equation (5), the closer the RMSE is to 0, the smaller the prediction error of the water quality parameters. For water temperature, the unit of the RMSE is degrees Celsius (°C).
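The RMSE criterion can be written out as follows; this is the standard root-mean-square error formula and is assumed to match Equation (5).

```python
import math

def rmse(pred, real):
    """Root-mean-square error: sqrt of the mean squared deviation
    between predicted and real values. Closer to 0 means a smaller
    prediction error; for water temperature the unit is °C."""
    n = len(pred)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / n)
```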
For the pH prediction, the comparison of the RMSE between the LSTM-based and RNN-based prediction models is shown in Figure 17.
The trained model was used to predict the water temperature and pH values; a total of 5000 sets of data were predicted. The comparison of the long-term prediction performance between the proposed scheme and the RNN is shown in Figures 18 and 19. It takes a total of 66 s to predict the 5000 data points using the trained model, and the average prediction time is 13.2 ms.
Different kinds of fish have different tolerances to water quality parameters. The saddle-spotted grouper (Epinephelus lanceolatus), for example, generally has a pH tolerance range of 7.5-9.2, with 7.9-8.4 being suitable for growth. The water temperature suitable for growth is 22 °C-30 °C, with a minimum tolerance of 15 °C and a maximum tolerance of 35 °C. Because cultured fish are sensitive to changes in key water quality parameters, countermeasures can be taken in advance through water quality prediction to keep the water quality parameters within the tolerance thresholds.
Figure 19. Comparison of the long-term prediction effect for water temperature.
From Figures 18 and 19, we can see some spikes. Since these spikes do not last very long, we treat them as predicted abnormal data and do not intervene. However, if such spikes last longer (i.e., more than 15 min) and are outside the tolerance threshold, the farmer needs to pay close attention and take countermeasures in advance.
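This intervention rule can be sketched as follows. The function name and structure are illustrative; the 15 min duration and the tolerance ranges (e.g., pH 7.5-9.2 for the saddle-spotted grouper) are taken from the text, and a 1 min sampling period matches the data set described above.

```python
def needs_intervention(values, lower, upper, sample_period_min=1,
                       max_minutes=15):
    """Return True only when consecutive out-of-tolerance predictions
    last longer than max_minutes; shorter spikes are treated as
    predicted abnormal data and ignored, as described in the text."""
    run = 0  # length of the current out-of-tolerance streak, in samples
    for v in values:
        run = run + 1 if not (lower <= v <= upper) else 0
        if run * sample_period_min > max_minutes:
            return True
    return False
```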

Discussion
From the above experimental analysis, the proposed scheme can achieve better results in long-term and short-term prediction. Using the proposed scheme, the short-term prediction accuracy can reach 98.56% and 98.97% for pH and water temperature, respectively, while the long-term prediction accuracy can reach 95.76% and 96.88% for pH and water temperature, respectively. In addition, the average prediction time for short-term predictions is 12.5 ms, and the average time for long-term predictions is 13.2 ms. Therefore, based on the trained model, the proposed scheme can realize fast and accurate predictions.
However, the proposed scheme still incurs a considerable computational cost in data set processing. Moreover, compared with the real data, the overall prediction results fluctuate strongly. In the future, we will focus on optimizing the deep learning network structure; on the premise of ensuring the prediction accuracy, we will reduce the computational complexity of model training through this optimization. Meanwhile, in order to make the water quality prediction model more robust and practical, the deep neural network structure will incorporate more relevant prior knowledge (such as precipitation and climate factors) for prediction. In addition, the proposed method also has some limitations in data preprocessing. According to SVD (singular value decomposition) theory [40,41], the original signal contributes little to the tail singular values: the signal energy is mainly concentrated in the first several singular values, while the tail singular values are mainly determined by noise. In future work, we will develop more effective noise reduction methods based on this conclusion. Meanwhile, for some of the meaningless spikes that appear in Figures 18 and 19, we will consider how to conduct reasonable and safe post-processing to make the prediction curves smoother.

Conclusions
Aiming at the problems of water quality prediction in smart mariculture, a relatively accurate water quality prediction method based on the deep LSTM network is proposed, which integrates the correlation coefficients between water quality parameters. In the proposed method, the integrity and accuracy of the data are effectively improved by the data pretreatment operations in the first stage. Then, the correlation knowledge between pH, water temperature, and the other water quality parameters is obtained using Pearson's correlation coefficient. Finally, the water quality prediction model based on the deep LSTM network is constructed. The prediction results for pH and water temperature using the constructed model show that the proposed method achieves higher prediction accuracy and lower time cost than the RNN-based prediction model. Specifically, for short-term predictions, the prediction accuracy of the proposed scheme reaches 98.56% and 98.97% for pH and water temperature, respectively, while for long-term predictions it reaches 95.76% and 96.88%.