1. Introduction
Water is one of the most essential natural resources on which all life depends. However, various economic activities inevitably affect the environment through different pathways [1]. Take China as an example. In recent years, along with rapid economic development and urbanization, China’s limited freshwater resources have been drastically reduced. At the same time, increasing water pollution poses a serious threat to human survival and security and has become a significant obstacle to human health and sustainable socio-economic development. Given China’s national conditions, water resources are relatively scarce. In addition, because China is undergoing a period of rapid socio-economic development, the demand for water resources is accelerating. Although China has 2.8 trillion cubic meters of water resources [2], which seems abundant, the per capita share is only 2400 cubic meters because of its large population [3]. This is less than one-quarter of the world’s per capita average. In addition, the discharge of untreated industrial wastewater and domestic sewage has severely polluted rivers, lakes, and other water bodies. This has seriously damaged the ecological environment, biodiversity, and the ecological and service functions of water bodies [4]. According to previous studies, only a small number of rivers worldwide are unaffected by water pollution [5]. At present, the pollution and eutrophication of rivers in China are severe. According to the 2019 statistics from China’s State Environmental Protection Administration, the seven major water systems in China, in descending order of pollution level, are the Liaohe River basin, the Haihe River basin, the Huaihe River basin, the Yellow River basin, the Songhua River basin, the Pearl River basin, and the Yangtze River basin. More than 70% of the Liaohe, Haihe, Huaihe, and Yellow River basins are polluted. Huang et al. [6] analyzed water quality data from 2424 observation stations in China from 2003 to 2018. They concluded that river water quality in China showed significant spatial differences. During 2016–2018, 17.2% of sampling sites in eastern China showed poor water quality, compared with 4.6% in the western region. Moreover, 24.4% of the sampling sites in coastal areas (within a 20 km buffer zone of the coastline) showed poor water quality. The Chinese government has invested heavily in the treatment and management of polluted water bodies. Even so, the proportion of polluted water remains high, which has imposed severe economic and social costs on water environment remediation in China [7]. Water quality prediction is a necessary tool for water environment planning, management, and control; an important element of water pollution research; and a fundamental part of water environmental protection and management. It is therefore vital to find a reasonable and effective water quality prediction method. At the same time, predicting future water quality is a prerequisite for anticipating rapid changes in water quality and proposing countermeasures. Accurate prediction of water quality changes can therefore help ensure the safety of drinking water, and it can also guide fishery production and support the protection of biodiversity.
Research into water quality prediction dates back to the 1920s. While studying pollution sources in the Ohio River, Streeter and Phelps developed a coupled model of biochemical oxygen demand and dissolved oxygen. They proposed a one-dimensional steady-state oxygen balance water quality model (the S-P model). Since then, many scholars have supplemented and revised their theory [8,9,10]. At present, water quality prediction methods fall into two main categories. The first is mechanistic prediction, which uses theoretical mathematical and physical models to predict the trend of water quality from its underlying mechanisms [11]. The second is non-mechanistic prediction, which builds mathematical statistical prediction models from historical data. The mechanistic approach analyzes the physical, chemical, and biological changes of each factor in the water resource cycle. It then establishes a mathematical model of the relationships between substances and solves the corresponding equations to predict the trend of water quality changes. For example, Zhang et al. incorporated the operation rules of dams and sluices into the reservoir regulation module of the SWAT model. They used this improved SWAT model to simulate water quantity and quality in the Huaihe River basin and compared the results with those of the original SWAT model. The improved model simulated water quantity and quality in the basin more accurately [12]. Peng et al. coupled the Environmental Fluid Dynamics Code (EFDC) model with a geographic information system (GIS) to simulate the water quality of the lower Charles River. The coupled model was more accurate than the original EFDC model [13]. Mechanistic river water quality models tend to describe water quality changes more comprehensively. This is because they consider how physical, chemical, and biological processes shape the spatial and temporal transport and transformation of pollutants in river water. However, most of these models are complex and require a great deal of basic information and data (numerical models use large amounts of water quality data as the basis for calculation). It is also difficult to obtain a continuous spatial and temporal distribution of water quality. These drawbacks have greatly limited the application of such models [14]. In addition, the mechanisms of many water environment systems are not fully understood, so it is difficult to describe them accurately with mechanistic modelling alone. In contrast, non-mechanistic water quality modelling treats a particular water quality system as a black box. The system is modelled with mathematical statistics or other mathematical methods to predict water quality. Commonly used non-mechanistic methods include regression models, probabilistic statistical models, grey prediction models, and time series models.
In recent years, many researchers have applied neural networks and other machine learning algorithms to water quality prediction and achieved good results. Table 1 summarizes the state of the art (SOTA) in water quality prediction research and distinguishes between mechanistic and non-mechanistic models.
Archana et al. used an unsupervised deep belief network to predict and analyze pH, dissolved oxygen, turbidity, and other water quality parameters of the Chaskaman reservoir [31]. The method outperformed classical prediction methods. Wang et al. added the Holt–Winters seasonal model to the ARIMA model and predicted total phosphorus and total nitrogen in a reservoir. The model achieved a prediction accuracy of 97.5% and offered advantages such as fast learning [32]. Mohamed et al. analyzed the irrigation water quality index (IWQI) in Egypt using an integrated evaluation method and an artificial neural network model. They also developed an ARIMA model to predict the IWQI in the Bahr El-Baqar drain, Egypt [33]. Shi et al. proposed a water quality anomaly detection and early warning method that combines a wavelet artificial neural network (WANN) model with high-frequency alternative measurements [34]. Li et al. proposed an EEMD-SVR model to predict the water quality of the Jialing River in China. The model first uses the EEMD algorithm to decompose each water quality indicator, such as DO, into intrinsic mode function (IMF) components. It then builds an SVR model for each IMF component. The hybrid model outperformed the standard SVR and BPNN models on a variety of evaluation indicators [35]. Ewaid et al. established a multiple linear regression model with specified weights to predict the water quality of the Euphrates River [36]. Xu combined the wavelet transform with a BPNN to build a short-term wavelet neural network water quality prediction model. Xu applied it to intensive freshwater pearl culture ponds in Duchang County, Jiangxi Province, China. The model’s RMSE for DO was 3.822, much lower than those of the BPNN and Elman models [37]. Qin et al. developed a PSO-WSVR model that uses a particle swarm algorithm to optimize the parameters of a weighted support vector regression machine, and applied it to predict water quality in Yixing, China. Compared with the standard SVR model, it reduced RMSE, MAE, MAPE, and MSE by 46.74%, 17.86%, 43.62%, and 67.84%, respectively [38]. Tizro et al. used the ARIMA model to study nine water quality parameters of the Hor Rood River [39]. Faruk established an ARIMA-ANN model with 108 months of water quality data (1996–2004) from the Büyük Menderes River in Turkey. The model consisted of two parts. The ARIMA model first captured the linear part of the dataset. Because ARIMA cannot handle the nonlinear part of a water quality series well, an artificial neural network then modelled that nonlinear part. The correlation coefficients between the hybrid model’s predictions and the observed boron, dissolved oxygen, and water temperature data were 0.902, 0.893, and 0.909, respectively [40]. Zhang et al. developed an ARIMA-RBFNN model to predict the total nitrogen (TN) and total phosphorus (TP) of Chagan Lake. The hybrid model’s RMSE values were 0.139 for TN and 0.036 for TP, an improvement on the ARIMA and RBFNN models [41]. Than et al. developed an LSTM-MA model, classified the water quality of the Dongnai River from 2012 to 2019, and predicted the water quality for the next two years. They showed that the LSTM-MA hybrid model trained faster and predicted more precisely than the ARIMA, NAR, NAR-MA, and LSTM models [42]. Jian et al. first used an improved grey relational analysis (IGRA) to extract water quality features. They then used LSTM to predict the water quality of Taihu Lake and Victoria Harbor. The model’s RMSE values were 0.07 and 0.067, respectively, which were lower than those of the BPNN and ARIMA models [43]. Hameed et al. used an RBF neural network (RBFNN) and a BPNN to forecast water quality in Malaysia and compared the two. The RMSE was 0.867 for the BPNN and 0.0194 for the RBFNN, so the RBF neural network was the more accurate model [44].
In summary, scholars have proposed many methods for water quality prediction. However, traditional statistical models give unsatisfactory results for time series with large fluctuations and long-term trends. The regression analysis model is relatively simple, but it demands large samples with a good distribution pattern. The time series model has a relatively sound theoretical basis, but its prediction accuracy is poor. The grey prediction model suits small and discontinuous historical datasets, but it is sensitive to unstable data, which can produce large prediction errors. The support vector machine suits small samples, but it is sensitive to the choice of parameters and kernel function. In addition, traditional single neural network models, such as the back-propagation neural network (BPNN) and RBFNN, have no memory of historical information. Moreover, most missing-data imputation methods cannot effectively exploit the time-series information in the dataset, which leads to large errors in the estimated missing values. This study therefore uses an artificial neural network to fill in missing water quality data and combines the wavelet transform with the LSTM model for water quality prediction. It then compares the prediction results with those of the ANN-LSTM, ARIMA, NARNN, CNN-LSTM, and DWT-CNN-LSTM models to demonstrate the effectiveness of the proposed model.
The remainder of this study is organized as follows:
Section 2 introduces the artificial neural network model, the wavelet transform, the long short-term memory network model, and the error evaluation indices;
Section 3 takes the Jinjiang River Basin as the research object, constructs the ANN-WT-LSTM model for water quality prediction, and compares the prediction results with those of the NAR neural network, ANN-LSTM, and ARIMA models; and the conclusions and research prospects are presented in Section 5.
2. Materials and Methods
2.1. Study Area Description and Dataset Analysis
The Jinjiang River is 182 km long, with a watershed area of 5629 square kilometers, an average slope of 0.19%, and an average annual runoff of 5.13 billion cubic meters. It is the largest river in Quanzhou and the third largest river in Fujian Province. Figure 1 shows the geographical location of the Jinjiang River.
The Jinjiang River has two tributaries, the east stream and the west stream. The west stream is the source of the Jinjiang River. It is 153 km long, with a watershed area of 3101 square kilometers and an average annual runoff of 3.65 billion cubic meters. The east stream originates at the southern foot of Xueshan Mountain in Jindou, Yongchun. It is 120 km long, with a watershed area of 1917 square kilometers and an average annual runoff of 1.4 billion cubic meters. Quanzhou City, through which the lower Jinjiang River flows, is one of the most economically developed regions in Fujian Province. Located in southeastern Fujian, Quanzhou is one of the province’s three central cities, and its total economic output has ranked first in the province for 22 consecutive years. In 2020, the city’s population exceeded 7 million, the largest in the province. Because the Jinjiang River basin covers 53.8% of Quanzhou’s land area, its water resources are vital to the city’s sustainable development. At the same time, the Jinjiang River basin suffers from serious pollution [45,46]. The traditional industrial development model has greatly damaged local sustainable development. Pressure on the water environment is increasing, and pollution from some enterprises is rebounding. The construction of environmental protection infrastructure is lagging behind, and the share of domestic pollution sources is rising steadily. Accurate prediction of water quality in the Jinjiang River basin will therefore provide crucial data support for decisions in future pollution control programs.
The dataset used in this study was taken from the weekly automatic water quality monitoring reports for the Shilong section of the Jinjiang River basin. From the many water quality evaluation indexes, we selected the four most representative for the research object: dissolved oxygen (DO), the permanganate index (CODMn), ammonia nitrogen (NH3-N), and total phosphorus (TP). The data were collected from 7 January 2013 to 21 June 2021 and updated once a week, giving a total of 443 groups of data. We used the first 421 groups as the training set and the last 22 groups as the test set. The four series are plotted in Figure 2.
Next, the dataset was screened for missing values. The results are shown in Table 2.
We then used Pearson’s correlation coefficient to analyze the correlations among the four series. The results are shown in Table 3. The DO series was negatively correlated with the CODMn, TP, and NH3-N series. The CODMn series showed a weak positive correlation with TP and a significant positive correlation with NH3-N. The TP series showed a significant positive correlation with NH3-N.
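The Pearson coefficient behind Table 3 is the covariance of two series divided by the product of their standard deviations. The plain-Python sketch below shows that computation. The toy values are hypothetical and are not the Shilong measurements.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # unnormalized covariance
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))          # unnormalized std of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y))          # unnormalized std of y
    return cov / (sx * sy)  # the 1/n factors cancel

# Hypothetical toy values: DO falls while NH3-N rises.
do = [9.1, 8.7, 8.2, 7.9, 7.5]
nh3n = [0.20, 0.28, 0.35, 0.41, 0.50]
print(round(pearson_r(do, nh3n), 3))
```

A constant series has zero standard deviation, so its coefficient is undefined, and this sketch would raise `ZeroDivisionError` in that case.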
2.2. The Framework of the Proposed Model
A single neural network model is susceptible to fluctuations in the water quality time series during training, which reduces prediction accuracy. This study therefore applies a time-frequency signal decomposition method to preprocess the water quality data. It also builds a hybrid prediction model based on “decomposition-prediction-reconstruction” to improve overall prediction accuracy. The hybrid model consists of five components:
Data preprocessing: First, perform a descriptive analysis of the collected water quality data, identify the missing values, and estimate them with an artificial neural network. Then normalize the data to eliminate the influence of dimension.
Discrete wavelet transform: The db5 wavelet is used to decompose the water quality time series.
Model training and testing: The high-frequency and low-frequency signals obtained from the db5 wavelet decomposition of each dataset are split into a training set and a test set at a fixed ratio. In this study, the first 421 samples of each dataset form the training set and the last 22 form the test set. LSTM is then trained on each training set, and its parameters, such as the learning rate and the maximum number of iterations, are tuned.
Reconstruction: The predictions for the test set of each decomposed sub-series are superimposed to obtain the final prediction results.
Model evaluation: Four indicators (MSE, RMSE, MAE, and MAPE) are used to evaluate the model’s performance.
The whole algorithm flow chart is shown in Figure 3.
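This section does not define the four evaluation indicators. The sketch below assumes their standard definitions, with MAPE expressed in percent. The function name `evaluate` is illustrative only.

```python
import math

def evaluate(y_true, y_pred):
    """Standard MSE, RMSE, MAE and MAPE (in percent) between observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so it is undefined if any observation is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape}

print(evaluate([2.0, 4.0], [3.0, 3.0]))
```

Computing all four from the same error vector keeps the metrics consistent with one another. For example, RMSE is always the square root of MSE.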
2.3. Data Normalization
Data normalization is a fundamental step in machine learning. In practice, different evaluation indicators often have different scales and units, which can distort data analysis results. The data must therefore be normalized to eliminate the influence of dimension between indicators and to make the indicators comparable. Normalization brings all indicators to the same order of magnitude, which is convenient for comprehensive comparison and evaluation. Commonly used normalization methods include min-max normalization [47] and Z-score normalization [48]. Min-max normalization, also known as range normalization, is a linear transformation that maps the original data to values between 0 and 1. Z-score standardization is an alternative, but it carries some risks. First, it requires the population mean and variance. These are difficult to obtain in real analysis and mining and are usually replaced by the sample mean and standard deviation. Second, Z-score standardization places requirements on the data distribution and works best for normally distributed data. We therefore chose min-max normalization, which is better suited to data with relatively concentrated values. The min-max normalization used in this study is:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x_{\max}$ is the maximum value of the sample data and $x_{\min}$ is the minimum value of the sample data.
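Below is a minimal NumPy sketch of the min-max transformation and its inverse. The inverse maps normalized predictions back to physical units. Fitting $x_{\min}$ and $x_{\max}$ on the training portion only is an assumption about good practice, not a detail stated here. Under that assumption, test values may fall slightly outside [0, 1].

```python
import numpy as np

def minmax_fit(x):
    """Return (x_min, x_max) estimated from the given (e.g. training) data."""
    x = np.asarray(x, dtype=float)
    return float(x.min()), float(x.max())

def minmax_transform(x, x_min, x_max):
    """x* = (x - x_min) / (x_max - x_min)."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def minmax_inverse(z, x_min, x_max):
    """Undo the normalization: x = x* (x_max - x_min) + x_min."""
    return np.asarray(z, dtype=float) * (x_max - x_min) + x_min

train = np.array([5.0, 7.0, 9.0])   # hypothetical training values
x_min, x_max = minmax_fit(train)
z = minmax_transform(train, x_min, x_max)
back = minmax_inverse(z, x_min, x_max)
```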
2.4. Artificial Neural Network (ANN)
During the collection of time-series data, acquisition, storage, and human errors can cause some records in the final dataset to lose one or more attributes, or cause whole records to be lost. Such data are called missing data. Missing or incomplete data make data mining difficult. They bias the analysis results and can mislead users’ decisions, with adverse consequences. Completely filling in missing data under appropriate conditions is therefore of great significance for data mining in big data scenarios. Current ways of dealing with missing data include the deletion method [49,50], imputation based on a statistical model [51], and methods based on parameter estimation. Parameter-estimation methods first determine the missing-data mechanism and then build a specific model to estimate the missing values. They are widely used because they are flexible and can be applied to datasets with a large number of missing values [52]. Common methods include expectation maximization, multiple imputation [53,54,55,56,57], and maximum likelihood estimation. Austin et al. used multiple imputation to estimate missing values in clinical medicine [58]. Chang et al. developed a communication-efficient distributed multiple imputation method to estimate missing data in distributed health data networks (DHDNs) [59].
In summary, imputation methods for missing values in time series have received increasing attention from scholars in various fields. Some scholars have considered the correlation characteristics of time series. However, most of these studies have not quantified the correlation between the observed quantities and still rely on traditional interpolation or regression analysis. Moreover, some traditional methods, such as piecewise linear interpolation [60], cannot estimate missing values well [61,62]. With the development of machine learning, researchers have increasingly applied machine learning algorithms to missing-value imputation. To some extent, these algorithms can handle the nonlinearity that traditional methods cannot. Machine learning methods for missing-value estimation include the KNN method [63] and artificial neural networks.
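Piecewise linear interpolation fills each gap from its two nearest observed neighbours in the same series and ignores the other water quality indexes. The sketch below shows this, using NumPy's `np.interp`. Outside the observed range, `np.interp` repeats the first or last observed value, so gaps at the ends of a series are filled with flat values.

```python
import numpy as np

def fill_linear(series):
    """Fill NaN gaps in a 1-D series by piecewise linear interpolation over the time index."""
    y = np.asarray(series, dtype=float).copy()
    t = np.arange(len(y))
    missing = np.isnan(y)
    if missing.all():
        raise ValueError("no observed values to interpolate from")
    # Interpolate missing positions from the observed (t, y) pairs; ends are held constant.
    y[missing] = np.interp(t[missing], t[~missing], y[~missing])
    return y

print(fill_linear([1.0, np.nan, 3.0, np.nan, np.nan, 6.0]))
```

A straight line between neighbours cannot reflect nonlinear dynamics or information carried by correlated indexes. This is the gap that the machine learning approaches aim to close.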
The artificial neural network (ANN) is a classical, fundamental technique in machine learning. Compared with general multi-factor prediction methods, it offers high fault tolerance, high reliability, and fast prediction. In addition, the ANN is a powerful interpolation tool [64,65,66]. An ANN is generally a multilayer network with at least three layers: an input layer, one or more hidden layers, and an output layer, as shown in Figure 4.
The relationship between the input and output of a neuron is $y = f(net)$, with $net = \mathbf{w}^{\mathrm{T}}\mathbf{x}$. Here, $net$ is the net activation, $\mathbf{x}$ is the input vector, $\mathbf{w}$ is the weight vector, and $f$ is the activation function, which maps the net activation to the output. Commonly used activation functions include the sigmoid, tanh, and ReLU functions.
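The neuron relation $y = f(\mathbf{w}^{\mathrm{T}}\mathbf{x})$ can be sketched directly in NumPy. The input and weight values are hypothetical. As in the text, there is no separate bias term; one can be added by appending a constant 1 to the input vector.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, mapping any real net activation into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, f=sigmoid):
    """Output y = f(net) of one neuron, with net = w^T x."""
    net = np.dot(w, x)  # net activation: weighted sum of the inputs
    return f(net)

x = np.array([0.8, 0.2, 0.5])   # hypothetical normalized inputs
w = np.array([0.4, -0.3, 0.1])  # hypothetical weights
print(neuron_output(x, w), neuron_output(x, w, np.tanh))
```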
A neural network operates in two states: a learning state and a working state. In the learning state, the weights of the network are adjusted to bring its output close to the actual values. In the working state, the trained network is used for classification and prediction without changing its weights. Neural network learning is supervised. The weights are adjusted according to the difference between the actual and expected outputs of the network so that the model fits the data as accurately as possible.
In this study, an MLP neural network was used to estimate the missing values in the Jinjiang River water quality data. The output layer uses an identity (linear) activation function. The single-layer perceptron is the simplest neural network: it consists only of input and output layers, which are directly connected. The MLP neural network contains an input layer, an output layer, and several hidden layers, and it is trained with the BP algorithm. The input signal passes forward from the input layer to the hidden layer. The hidden-layer neurons process it and pass the result forward to the output layer. In this forward transmission, the output of the MLP depends only on the current input and not on past or future inputs, which is why the MLP is called a multi-layer feed-forward neural network. Among neural network architectures, MLP neural networks are simple in structure and easy to implement. They also offer good fault tolerance, robustness, and excellent nonlinear mapping capability (Figure 5).
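The idea of MLP-based imputation can be sketched in NumPy. The network is trained on rows where all indexes are observed, learning to predict one index (for example, DO) from the others, and is then applied to rows where that index is missing. Several choices here are hypothetical and are not the configuration used in this study: one tanh hidden layer of 8 units, an identity output layer, full-batch gradient descent on MSE, the learning rate, the number of epochs, and the synthetic data. The inputs are assumed to be normalized already.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer MLP (tanh hidden, identity output) fitted by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.5, size=(d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.5, size=(hidden, 1)), np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # forward pass: hidden layer
        out = H @ W2 + b2                       # identity (linear) output layer
        g_out = 2.0 * (out - y) / n             # gradient of MSE w.r.t. the output
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)   # back-propagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)
    return W1, b1, W2, b2

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

def impute_column(data, target):
    """Fill NaNs in column `target` of an (n, k) array using the other k-1 columns as inputs."""
    data = np.asarray(data, dtype=float).copy()
    others = [c for c in range(data.shape[1]) if c != target]
    inputs_ok = ~np.isnan(data[:, others]).any(axis=1)
    missing = np.isnan(data[:, target])
    train_rows = inputs_ok & ~missing
    if not train_rows.any():
        raise ValueError("no complete rows to train on")
    params = train_mlp(data[train_rows][:, others], data[train_rows, target])
    fill_rows = inputs_ok & missing
    data[fill_rows, target] = predict_mlp(params, data[fill_rows][:, others])
    return data

# Hypothetical synthetic example: column 0 plays the role of DO, columns 1-3 of CODMn, TP, NH3-N.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
do = 0.8 - 0.5 * X[:, 0] - 0.2 * X[:, 2]
data = np.column_stack([do, X])
data[[5, 50, 120], 0] = np.nan          # simulate missing DO values
filled = impute_column(data, target=0)
```

Unlike linear interpolation along time, this approach uses the correlations between indexes reported in Table 3. A practical version would also need validation on held-out rows.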
2.5. Basic Principle of Wavelet Transform
During time-series data acquisition, observation errors, systematic errors, and other factors introduce noise into the data, and this noise can seriously affect data processing results. In the preprocessing stage, a denoising method should therefore be chosen according to the type of noise. Common denoising methods include the Fourier transform [67] and the wavelet transform [68].
The Fourier transform is a widely used analysis method in signal processing. It converts a time-domain signal into a frequency-domain signal. Its basic idea is to decompose the signal into a superposition of sine waves of different frequencies. However, the Fourier transform has several disadvantages. The traditional Fourier transform only performs a global transformation between the time and frequency domains and cannot localize information in time. Moreover, it is only suitable for stationary signals. Most real signals are non-stationary, which significantly limits its application.
The basic idea of the wavelet transform is to adapt the time-frequency window to the signal. Through dilation and translation, the original signal is decomposed into a series of sub-band signals with different spatial resolutions, frequency characteristics, and directional characteristics. These sub-bands are well localized in both the time and frequency domains. They can therefore represent the local characteristics of the original signal and localize it in time and frequency. This overcomes the limitations of Fourier analysis in dealing with non-stationary signals and complex images.
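The mathematical definition of a wavelet is as follows. Let $\psi(t) \in L^{2}(\mathbb{R})$ satisfy the admissibility condition
$$C_{\psi} = \int_{\mathbb{R}} \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty,$$
where $\hat{\psi}(\omega)$ is the Fourier transform of $\psi(t)$. Then $\psi(t)$ is called a wavelet (mother wavelet). For an integrable $\psi(t)$, this condition requires $\hat{\psi}(0) = \int_{\mathbb{R}} \psi(t)\, dt = 0$; that is, the wavelet has zero mean. The fast wavelet transform is also computationally cheaper than the fast Fourier transform. For a signal of length $M$, the complexity of the FFT is $O(M \log_2 M)$, whereas that of the fast wavelet transform is $O(M)$.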
Wavelet transform can be divided into continuous wavelet transform (CWT) and discrete wavelet transform (DWT).
The continuous wavelet transform is defined as:
$$W_f(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt,$$
where $W_f(a,b)$ is the continuous wavelet coefficient, $a$ is the scaling factor, $b$ is the translation factor, $\psi^{*}(t)$ is the complex conjugate of $\psi(t)$, and $f(t)$ represents the original data. The scale and position of the wavelet are controlled by adjusting $a$ and $b$, which realizes adaptive time-frequency analysis of the signal.
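The discrete wavelet transform discretizes the scale and translation dyadically ($a = 2^{j}$, $b = k\,2^{j}$):
$$W_f(j,k) = 2^{-j/2} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(2^{-j}t - k\right) dt,$$
where $W_f(j,k)$ is the discrete wavelet coefficient and $f(t)$ is the original data.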
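In practice, one level of the dyadic DWT is computed as a low-pass/high-pass filter pair followed by downsampling by 2. The sketch below uses the Haar wavelet, which is db1, the shortest Daubechies wavelet, as a simple stand-in. The 10 filter taps of db5 are not reproduced here. The structure of analysis, downsampling, and synthesis is the same for db5. The signal values are hypothetical.

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)  # normalization of the orthonormal Haar filters

def haar_dwt(x):
    """One Haar DWT level: low-pass and high-pass filtering, then downsampling by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length signal")
    even, odd = x[0::2], x[1::2]
    cA = S * (even + odd)  # approximation (low-frequency) coefficients
    cD = S * (even - odd)  # detail (high-frequency) coefficients
    return cA, cD

def haar_idwt(cA, cD):
    """Inverse of haar_dwt: upsample and apply the synthesis filters."""
    x = np.empty(2 * len(cA))
    x[0::2] = S * (cA + cD)
    x[1::2] = S * (cA - cD)
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])  # hypothetical values
cA, cD = haar_dwt(signal)
```

Because the Haar filters are orthonormal, the transform preserves the signal's energy and is perfectly invertible.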
The Daubechies (dbN) wavelets are the most commonly used wavelets and are mainly applied in the discrete wavelet transform. A finite-length wavelet used in the fast wavelet transform is characterized by two sequences of real numbers. The first is the set of high-pass filter coefficients, called the wavelet filter. The second is the set of low-pass filter coefficients, called the scaling filter. The wavelet transform first uses the low-pass and high-pass filters to decompose the original data into the low-frequency wavelet coefficients $cA_n$ and the high-frequency wavelet coefficients $cD_1, cD_2, \ldots, cD_n$. At each level, only the low-frequency coefficients are decomposed further, and this is repeated until the maximum decomposition level is reached. Finally, the decomposed low-frequency and high-frequency signals are added together to reconstruct the signal:
$$f(t) = H^{n}\, cA_{n} + \sum_{j=1}^{n} H^{j-1} G\, cD_{j},$$
where $f(t)$ is the restored signal; $H$ and $G$ are the (reconstruction) low-pass and high-pass filters, respectively; $cA_n$ is the low-frequency wavelet coefficient at level $n$; and $cD_j$ is the high-frequency wavelet coefficient at level $j$.
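A multilevel decomposition can be sketched in which each sub-band is reconstructed on its own and the sub-bands are then added back together. This mirrors the "decomposition-prediction-reconstruction" scheme, where each band would be forecast separately and the forecasts summed. The sketch again uses Haar filters as a stand-in for db5. The helper names `wavedec` and `subband_signals` are hypothetical. The sketch requires the length to be divisible by $2^n$. The 443-sample series does not meet this condition, and real libraries handle it with signal-extension modes. With PyWavelets, the corresponding calls would be `pywt.wavedec(x, 'db5', level=3)` and `pywt.waverec`.

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)

def haar_step(x):
    """One analysis level: returns (approximation, detail)."""
    return S * (x[0::2] + x[1::2]), S * (x[0::2] - x[1::2])

def haar_inv(cA, cD):
    """One synthesis level: inverse of haar_step."""
    x = np.empty(2 * len(cA))
    x[0::2] = S * (cA + cD)
    x[1::2] = S * (cA - cD)
    return x

def wavedec(x, n):
    """n-level decomposition returning [cA_n, cD_n, ..., cD_1]."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** n):
        raise ValueError("length must be divisible by 2**n in this sketch")
    details = []
    for _ in range(n):
        x, d = haar_step(x)   # only the approximation is decomposed further
        details.append(d)     # collected as cD_1, cD_2, ..., cD_n
    return [x] + details[::-1]

def subband_signals(coeffs):
    """Full-length signals A_n, D_n, ..., D_1, each rebuilt from one coefficient set alone."""
    bands = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        a = kept[0]
        for d in kept[1:]:
            a = haar_inv(a, d)
        bands.append(a)
    return bands

x = np.sin(np.linspace(0.0, 6.0, 16)) + 0.1 * np.arange(16)  # hypothetical series
coeffs = wavedec(x, 3)
bands = subband_signals(coeffs)  # [A3, D3, D2, D1]
```

Because reconstruction is linear, the sub-band signals sum exactly to the original series. That property is what lets separately predicted sub-series be superimposed into one forecast.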
The calculation steps of the wavelet transform are as follows:
Step 1. Select a wavelet function and align it with the starting point of the signal to be analyzed.
Step 2. Calculate the degree of similarity between the signal to be analyzed and the wavelet function at this position; that is, the wavelet transform coefficient C. The larger the coefficient C, the more similar the current signal is to the waveform of the selected wavelet function.
Step 3. Shift the wavelet function one unit of time to the right along the time axis, and repeat Step 2 to calculate the coefficient C, until the whole length of the signal has been covered.
Step 4. Scale the selected wavelet function by one unit, and then repeat Steps 1–3.
Step 5. Repeat Steps 1–4 for all scales.
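Steps 1 to 5 map onto two nested loops: an outer loop over scales and an inner loop that slides the wavelet along the signal and takes an inner product. The sketch below is a naive discretization of the continuous transform. It uses the Mexican-hat (Ricker) wavelet as an assumed mother wavelet. Because this wavelet is real-valued, its conjugate equals itself, and the scales are taken as positive, so $|a| = a$. Its cost is quadratic in the signal length per scale, whereas practical libraries use convolution or FFT-based methods.

```python
import numpy as np

def ricker(t):
    """Mexican-hat (Ricker) mother wavelet: real-valued and zero-mean."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def naive_cwt(f, scales, dt=1.0):
    """Discretized W_f(a, b) = a^(-1/2) * sum_t f(t) psi((t - b) / a) * dt."""
    f = np.asarray(f, dtype=float)
    t = np.arange(len(f)) * dt
    coeffs = np.zeros((len(scales), len(f)))
    for i, a in enumerate(scales):                 # Steps 4-5: loop over the scales
        for k, b in enumerate(t):                  # Steps 1 and 3: align, then shift along the signal
            psi = ricker((t - b) / a) / np.sqrt(a)
            coeffs[i, k] = np.sum(f * psi) * dt    # Step 2: similarity coefficient C
    return coeffs

# Hypothetical test signal: a Ricker pulse centred at t = 50 with width 5.
t = np.arange(100.0)
f = ricker((t - 50.0) / 5.0)
C = naive_cwt(f, scales=[2.0, 5.0, 10.0])
```

At the matching scale (a = 5), the coefficient C peaks where the wavelet is aligned with the pulse. This illustrates the "larger C means more similar" reading in Step 2.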
The choice of mother wavelet type and the choice of decomposition level are the two most important problems in wavelet analysis. In this study, the db5 wavelet was used to decompose the experimental series for the following two reasons:
(1) db wavelets are well suited to relatively stable sequences;
(2) db5 is one of the most commonly used wavelets in the db family and is suitable for smooth datasets.
Because the Jinjiang water quality data are relatively smooth, the db5 wavelet was the most suitable choice for this study.
The maximum wavelet decomposition level $L$ can be calculated by Equation (5):
$$L = \left\lfloor \log_{2} \frac{n_d}{l_w - 1} \right\rfloor \tag{5}$$
where $l_w$ is the length of the low-pass decomposition filter of the wavelet and $n_d$ is the data length.
In this study, $l_w = 23$ and $n_d = 443$ were used. $L$ was calculated from these values, and the number of wavelet decomposition layers was set to 3.
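The sketch below assumes the widely used maximum-level rule $L = \lfloor \log_2(n_d/(l_w - 1)) \rfloor$. This is the rule behind MATLAB's `wmaxlev` and PyWavelets' `pywt.dwt_max_level`. Under that assumption, the rule can be evaluated directly. With the values quoted above, it gives an upper bound of 4, and the 3 decomposition levels used in this study stay within that bound.

```python
import math

def max_dwt_level(n_d, l_w):
    """Maximum DWT level floor(log2(n_d / (l_w - 1))), clipped at 0."""
    if l_w < 2:
        raise ValueError("filter length must be at least 2")
    ratio = n_d / (l_w - 1)
    if ratio < 1:
        return 0  # the signal is shorter than the filter support: no useful level
    return math.floor(math.log2(ratio))

print(max_dwt_level(443, 23))  # -> 4, since 2**4 = 16 <= 443 / 22 < 32
```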
2.6. Basic Principle of LSTM
The recurrent neural network (RNN) was first proposed in the 1980s and is a popular deep learning algorithm. Compared with a deep neural network (DNN), its recurrent structure allows it to make full use of the sequential information contained in the data, which gives it many advantages in processing time series. Its weights are corrected through back-propagation combined with a gradient descent algorithm. However, RNNs also have several problems. Researchers have found that RNNs handle long time series poorly; that is, their long-term memory is weak. At the same time, as the sequence length increases, the depth of the unrolled model also increases, so vanishing and exploding gradients cannot be avoided when the gradient is calculated. To address this, Hochreiter et al. [69] proposed LSTM. The structure of LSTM is shown in Figure 6 [70].
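Why long sequences cause trouble can be shown with a minimal numeric sketch. It assumes a scalar RNN with a single recurrent weight w, which is a simplification: real RNNs use weight matrices, and the gradient is then a product of Jacobian matrices. The gradient of the last hidden state with respect to the first is a product of one factor per time step. It therefore shrinks toward zero when the factors are below 1 and grows without bound when they exceed 1. The function names are hypothetical.

```python
import math
from typing import Sequence


def linear_rnn_gradient(w: float, steps: int) -> float:
    """dh_T/dh_0 for the scalar recurrence h_t = w * h_{t-1} + x_t."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: each step contributes dh_t/dh_{t-1} = w
    return grad


def tanh_rnn_gradient(w: float, inputs: Sequence[float], h0: float = 0.0) -> float:
    """dh_T/dh_0 for the scalar recurrence h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # dh_t/dh_{t-1} = w * (1 - h_t^2)
    return grad


vanishing = linear_rnn_gradient(0.5, 50)    # about 8.9e-16
exploding = linear_rnn_gradient(1.5, 50)    # about 6.4e8
saturated = tanh_rnn_gradient(1.5, [0.5] * 50)  # tanh saturation drives it toward 0
```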
Unlike the traditional recurrent neural network, which rewrites its memory at every time step, the long short-term memory network saves the important features it has learned as long-term memory and, as learning proceeds, selectively retains, updates, or forgets that memory. Features that receive small weights over many iterations are treated as short-term memory and are eventually forgotten by the network. This mechanism allows important feature information to be carried forward through the iterations, so the network performs better on tasks in which the samples have long-term dependencies. LSTM has been widely applied in flood susceptibility prediction [71], the prediction of key parameters of nuclear power plants [72], wind speed prediction [73,74], financial price trend forecasting [75], language processing [76], etc. Compared with the RNN neuron, the LSTM unit adds a cell state to the hidden layer, controlled by three gating units: the forget gate, the input gate, and the output gate. The forget gate controls how much of the previous cell state is forgotten and how much is retained. The calculation formula is:
$$F_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right)$$
where $X_t$ is the current input, $h_{t-1}$ is the hidden state at the previous time step, $W_f$ and $b_f$ are the weight matrix and offset term of the forget gate, and $\sigma$ is the sigmoid function, so $F_t$ ranges from 0 to 1. $F_t = 1$ means that the information is completely retained, and $F_t = 0$ means that it is completely discarded.
The input gate controls how much of the current input information is saved to the cell state. The expression is:
$$I_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right)$$
where $W_i$ is the weight matrix, $b_i$ is the offset term, and $I_t$ is the input gate vector.
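The cell state $C_t$ is updated from a candidate state $\tilde{C}_t$ as:
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
where $W_c$ is the weight matrix, $b_c$ is the offset term, and $\odot$ is the element-wise (Hadamard) product.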
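The output gate $O_t$ and the hidden state $h_t$ are calculated as:
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right)$$
$$h_t = O_t \odot \tanh\left(C_t\right) \qquad (11)$$
where $W_o$ is the weight matrix, $b_o$ is the offset term, and $h_{t-1}$ is the hidden state at time $t-1$.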
In Equation (11), $\odot$ denotes the Hadamard (element-wise) product and $h_t$ is the hidden state at time $t$.
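The gate equations can be traced in a single LSTM time step. This is a minimal pure-Python sketch assuming the standard formulation in which every gate acts on the concatenation [h(t−1), X(t)]. It is not the study's trained network, and the names `lstm_step`, `affine`, and `zero_params` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Return W . v + b (weight matrix times vector plus offset term)."""
    return [sum(w * x for w, x in zip(row, v)) + bj for row, bj in zip(W, b)]


def lstm_step(
    x_t: Vector,
    h_prev: Vector,
    c_prev: Vector,
    params: Dict[str, Tuple[Matrix, Vector]],
) -> Tuple[Vector, Vector]:
    """One LSTM time step; every gate acts on the concatenation [h_{t-1}, X_t]."""
    z = h_prev + x_t  # list concatenation gives [h_{t-1}, X_t]
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    f_t = [sigmoid(v) for v in affine(W_f, z, b_f)]      # forget gate F_t
    i_t = [sigmoid(v) for v in affine(W_i, z, b_i)]      # input gate I_t
    c_hat = [math.tanh(v) for v in affine(W_c, z, b_c)]  # candidate state
    o_t = [sigmoid(v) for v in affine(W_o, z, b_o)]      # output gate O_t
    # Cell state: keep F_t of the old state and add I_t of the candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Hidden state: h_t = O_t (Hadamard) tanh(C_t)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(hidden: int, inputs: int) -> Dict[str, Tuple[Matrix, Vector]]:
    """Hypothetical all-zero weights and offsets, useful for checking the algebra."""
    return {
        gate: ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
        for gate in ("f", "i", "c", "o")
    }


# With zero weights every gate outputs sigmoid(0) = 0.5 and the candidate is
# tanh(0) = 0, so C_t = 0.5 * C_{t-1} and h_t = 0.5 * tanh(C_t).
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], zero_params(2, 1))
```

In practice the weights are learned by back-propagation through time, and frameworks vectorize these loops.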
2.7. Evaluation Index
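In this study, the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used to evaluate the prediction performance of the models. They are calculated as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
where $N$ is the total number of data points, $y_i$ is the observed value, and $\hat{y}_i$ is the predicted value.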
MAE measures the average absolute deviation between the predicted and observed values. MSE and RMSE measure the squared deviation, with RMSE expressed in the units of the data, and both are sensitive to outliers. MAPE measures the average relative error between the predicted and observed values.
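The four indices can be computed directly from paired observations and predictions. This is a minimal sketch. It assumes MAPE is reported as a fraction rather than a percentage, which is consistent with values such as 0.021 in the Discussion but is still an assumption. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE and MAPE (as a fraction) for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so observations must be non-zero
    mape = sum(abs(e / y) for e, y in zip(errors, y_true)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


scores = evaluate([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
# errors are 0, -1, 2: MSE = 5/3, MAE = 1, MAPE = (0 + 0.5 + 0.5) / 3 = 1/3
```

MAPE is undefined when an observation is zero, which matters for concentrations near the detection limit.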
4. Discussion
Water pollution is one of the most serious environmental problems facing humanity, and much of the harm it causes stems from inadequate prediction, early warning, and emergency response capabilities. Therefore, building an effective monitoring and early warning system that enables intelligent decision making and water quality management is a key scientific and technological issue that needs to be addressed urgently. However, because water quality indicators are usually nonlinear and non-stationary, conventional statistical models often have difficulty making accurate predictions [77]. In recent years, deep learning and wireless sensing technologies have developed rapidly. The model proposed in this study can be applied in the following ways:
- (1)
Existing monitoring systems cannot monitor all important pollutants online at high frequency, so the model proposed in this study can be used for soft computing to improve the timeliness, coverage, and frequency of online monitoring and to form an effective early warning system for water quality management.
- (2)
The model can use real-time monitoring data to predict water quality trends and assess water quality risk. When the predictions show that water quality is deteriorating, the relevant management departments can take pollution prevention and control measures immediately, minimizing the losses caused by pollution incidents.
From the above results and error plots, it can be seen that the prediction accuracy of the ANN-WT-LSTM model on the DO dataset was substantially improved compared with the MLPNN, ANN-LSTM, NAR neural network, CNN-LSTM, SSA-LSTM, SSA-BPNN, and ISSA-BPNN models. For the CODMn dataset, the MAPE of the ANN-WT-LSTM model was 0.021, which was 0.157, 0.2383, 0.1749, 0.677, 0.129, 1.569, 0.139, 1.279, 0.749, 0.029, and 0.039 lower than that of the ANN-LSTM model, NAR neural network model, ARIMA model, MLPNN model, CNN-LSTM model, BPNN model, SSA-LSTM model, ISSA-BPNN model, SSA-BPNN model, DWT-CNN-LSTM model, and EMD-LSTM model, respectively. For the TP dataset, the RMSE of the ANN-WT-LSTM model was 0.026, which was 0.004, 0.0085, 0.0099, 0.007, 0.108, 0.424, 0.114, 0.324, 0.334, 0.019, and 0.164 lower, respectively, than that of the other models. For the NH3-N dataset, the MSE of the ANN-WT-LSTM model was 0.006, which was 0.024, 1.939, 0.0237, 0.009, 0.002, 0.064, 0.003, 0.044, 0.094, 0.024, and 0.016 lower, respectively, than that of the other models.
Water quality prediction methods fall into two main categories: mechanistic and non-mechanistic. Mechanistic water quality models are built from system structure data and are constrained by the underlying physical, biological, and chemical processes of the water environment system. A variety of such models have been developed and widely used, such as QUAL, WASP, MIKE, EFDC, SWAT, SMS, and BASINS. However, these mechanistic models are very complex and require a large amount of basic data (such as simulation parameters and source and sink terms) to establish and solve the water quality governing equations. As a result, they are complex to build and their parameters are difficult to determine, which limits their application in many water bodies [78,79]. Moreover, for many aquatic environmental systems the detailed mechanisms are not fully understood, and the evolution of water quality is influenced and disturbed by many physical, chemical, biological, meteorological, and hydraulic factors, giving it strong non-linear characteristics. Existing water quality prediction models based on mathematical expressions cannot take all of these factors into account, and it is difficult to describe the migration and dispersion of pollutants in the water environment accurately with mechanistic modelling; hence, predictions made on this basis carry a “natural” bias. Furthermore, typical basin hydrological models, such as SWAT, HSPF, and MIKE, can simulate hydrological processes and the evolution of point and non-point pollution sources in large basins over long periods under different scenarios; however, they are not suitable for predicting water quality in large water bodies such as lakes and reservoirs. Water quality models such as CE-QUAL-W2, WASP, and EPD-RIV1 address the hydrodynamics and water quality of large water bodies, but not the hydrological problems that occur in the basin.
In contrast, the ANN-WT-LSTM model proposed in this paper uses neural networks to analyze historical water quality data and predict future water quality changes, and is therefore a non-mechanistic water quality prediction method. Non-mechanistic methods apply statistical ideas: they mine historical time series of water quality data to uncover the underlying patterns of change and then infer the trend of water quality. Compared with mechanistic methods, they have clear advantages. First, modelling costs are lower because the data requirements are modest, so the method can be applied to water quality prediction in areas where much hydrological data is missing. Second, the predictions are reliable: the ANN-WT-LSTM model is well suited to analyzing and predicting non-linear problems in uncertain environments, and its water quality prediction accuracy is considerably higher than that of previous models (Table 6). Third, the ANN-WT-LSTM model is broadly applicable. As a “black box” model, it does not require hydrological data on pollution sources, so it can be applied to river basins as well as to lakes, reservoirs, and other large water bodies. In summary, our view is that the ANN-WT-LSTM model proposed in this paper is not the only choice among water quality prediction models, but it has great potential for application compared with competing methods (including 1D, 2D, and 3D numerical models) owing to its reliability, efficiency, and accuracy.
The ANN-WT-LSTM model proposed in this study still has several aspects that could be improved:
- (1)
The model proposed in this paper considers only the historical data of water quality indicators in the Jinjiang River basin, whereas changes in the external environment can strongly affect river water quality. Such changes can interfere with the neural network training process and thus reduce the accuracy of the model. Further research is needed on how to reduce the interference of external factors or to account for their influence in the model.
- (2)
In this study, LSTM was used to predict water quality; however, there are numerous improved versions of the LSTM model, such as the bidirectional long short-term memory network (Bi-LSTM), as well as other approaches, such as the adaptive neuro-fuzzy inference system (ANFIS). These methods could be compared with the model proposed in this study.
Given the powerful parallel data processing and non-linear modelling capabilities of neural networks, we believe that the model proposed in this study can be combined with big data technologies such as the Internet of Things (IoT), which can process large-scale data quickly and accurately and meet the requirements of multi-sensor data fusion.
5. Conclusions
To improve the accuracy of water quality prediction, this study proposed the ANN-WT-LSTM model, based on an artificial neural network, wavelet transform, and long short-term memory network, and applied it to water quality data from the Jinjiang River basin in China. Missing water quality data caused by instrument failure were filled in with an artificial neural network based on the time-series information in the water quality data. A wavelet transform was then used to decompose and reconstruct the water quality time series in order to remove short-term random disturbance noise, improve the model's prediction accuracy on out-of-sample data, and strengthen its ability to predict both short-term and long-term dynamic trends in water quality time-series data. Compared with the ANN-LSTM model and the NAR neural network model, the proposed ANN-WT-LSTM model performed better on all evaluation indices, effectively improving the accuracy of water quality prediction, which is significant for water environment protection. The study not only provides vital data support for water quality safety management decisions, but also has important theoretical and practical significance for safeguarding the sustainable development of riverine areas and for water environmental protection in the reservoir area.
This study predicts the possible future state of reservoir water quality from time series. However, owing to limited monitoring conditions, it can predict water quality at only one point in the reservoir, which cannot reflect the overall spatial variation in water quality. Therefore, to establish a more complete reservoir early warning system, we suggest setting up water quality monitoring points at many locations throughout the reservoir and combining water quality prediction with GIS technology. In this way, both the temporal trend and the spatial variation of water quality can be studied, combining temporal and spatial prediction and laying a solid foundation for a complete water quality early warning system.