Using a Backpropagation Artificial Neural Network to Predict Nutrient Removal in Tidal Flow Constructed Wetlands

Nutrient removal in tidal flow constructed wetlands (TF-CWs) involves a complex series of nonlinear, multi-parameter interactions. We simulated three tidal flow systems and a continuous vertical flow system fed with synthetic wastewater. We then compared influent and effluent concentrations to examine (1) nutrient removal in artificial TF-CWs and (2) the ability of a backpropagation (BP) artificial neural network to predict nutrient removal. Nutrient removal rates were highest under tidal flow when the idle time/reaction time ratio was 2:1. Under these conditions, removal reached 90 ± 3%, 99 ± 1%, and 58 ± 13% for total nitrogen (TN), ammonium nitrogen (NH4-N), and total phosphorus (TP), respectively. Redundancy analysis identified the main factors influencing nutrient removal in each scenario. These factors were used as model inputs to train the network and to verify its predicted effluent pollutant concentrations. The predicted effluent concentrations agreed well with the measured values: the two were correlated and the errors were small. The BP neural network gave the best fit for TP, with an R² of 0.90. The R² values for TN, NH4-N, and nitrate nitrogen (NO3-N) were 0.67, 0.73, and 0.69, respectively.


Introduction
In recent decades, constructed wetlands have been widely used to treat industrial and agricultural wastewater because they can remove nutrients, organic matter, heavy metals, and pathogens [1][2][3]. In addition to purifying wastewater, constructed wetlands have low running costs and low energy consumption, and they are simple to manage and maintain [4][5][6]. However, constructed wetlands are also criticized for their low re-oxygenation capacity and their vulnerability to low temperatures. Methods and technologies are therefore needed to improve existing constructed wetlands.
A newer type of constructed wetland, the tidal flow constructed wetland (TF-CW), has significantly higher dissolved oxygen (DO) levels than conventional constructed wetlands [7]. During tidal operation, the rise and fall of the water surface in the bed creates suction in the pore spaces. This suction draws atmospheric oxygen into the voids of the wetland matrix or soil [8,9]. Through this mechanism, the oxygen supply within the bed can be improved considerably, so that plant roots receive sufficient oxygen. TF-CWs also remove ammonium nitrogen (NH₄⁺-N) and organic matter more efficiently than traditional constructed wetlands. Nutrient removal in constructed wetlands depends on many factors. For example, NH₄⁺-N removal is influenced by, among other factors, the influent concentration, temperature, pH, and DO. The removal of nitrate nitrogen (NO₃⁻-N) is influenced by total organic carbon (TOC), the influent mode, and time [10,11].
The removal of NH₄⁺-N and that of NO₃⁻-N are linked. Each is influenced, promoted, or restricted by these factors, which makes the constructed wetland a complex ecosystem. Pollutant removal in TF-CWs is therefore a complex process involving nonlinear multi-parameter interactions [12].
In 1986, Rumelhart and McClelland established a multi-layer feed-forward neural network trained by an error backpropagation algorithm. This backpropagation (BP) network later became one of the most widely used neural network models. BP networks can learn and store many input-output pattern mappings and can model complex nonlinear dynamic systems [13]. BP neural networks have been applied in many disciplines. For example, they have been used to predict disasters such as typhoon paths [14] and earthquakes [15], to classify remote sensing images [16], to identify wireless channels [17], to predict the stock market [18], and to predict groundwater quality [19]. In ecology, BP neural networks are widely used to support ecological and environmental health assessments [20,21], ecological planning, ecological security assessment, eco-environmental quality assessment, ecological restoration, and pest prediction [22][23][24]. However, they have rarely been applied to predict the performance of TF-CWs. In this study, we set up a BP network model to predict pollutant removal in TF-CWs operated under four different influent modes. We used this model to construct a neural network topology for constructed wetlands, to simulate the TF-CW treatment process, and to validate the predicted nitrogen and phosphorus concentrations in the effluent. The aim of this study was to predict the quality of the effluent discharged from TF-CWs and to optimize nitrogen and phosphorus removal efficiencies.

Water 2017, 9, 83

Materials and Methods
The TF-CW simulator (Beijing, China) used in this study was a cylindrical steel drum 110 cm high, 40 cm in diameter, and 138 L in volume (Figure 1). The device had 21 sampling ports, each 1 cm in diameter, spaced equally around the drum in seven rows at 15-cm intervals from top to bottom. Inflow was delivered by a water pump governed by a time controller (TB-125, LUEABB, Wenzhou, China). The top of the device was fitted with a sprinkler cover with small holes in its base to distribute the inflow uniformly. Outflow was controlled by an electromagnetic valve (2 W, normally closed, 220 V; LUEABB, Wenzhou, China), also governed by a time controller. The device was filled to a height of 105 cm with zeolite (particle size 4 to 8 mm; porosity 42%), which was planted with Typha orientalis. We prepared synthetic wastewater representative of Beijing sewage. Ammonium chloride, glucose, and monopotassium phosphate served as the sources of total nitrogen (TN), TOC, and total phosphorus (TP), respectively, and 4 m³ of the synthetic wastewater was stored. The concentrations of the main water quality indicators in the synthetic wastewater are shown in Table 1. We examined nutrient removal, and the factors that influenced it, under different influent modes in four test devices (A–D). These comprised three tidal flow modes (B, C, and D) and a continuous influent mode as a control (A). All four modes ran on a 24-h cycle, and each device received the same flow (0.138 m³/day). The operating schedules are given in Table 2.
A preliminary experiment was conducted from January to March 2014. After a biofilm had formed on the substrate, the formal experiment began in April 2014. From April to November 2014, water samples were collected weekly and sent to the laboratory for analysis. Samples were taken from the sewage tank and from the sampling ports at seven treatment depths (15, 30, 45, 60, 75, 90, and 105 cm) in Devices A, B, C, and D. Temperature, DO, oxidation reduction potential (ORP), pH, and conductivity (Cond) were measured with a portable multi-parameter water quality analyzer (YSI-EXO, YSI, Yellow Springs, OH, USA). The concentrations of TN, NH₄⁺-N, nitrite nitrogen (NO₂⁻-N), NO₃⁻-N, and TP were determined with an automatic discrete analyzer (SMARTCHEM200, WestCo, Frepillon, France). We used Origin 8.5 (OriginLab, Northampton, MA, USA) for data fitting and SPSS version 19.0 (SPSS, Chicago, IL, USA) for statistical analysis. Differences were considered statistically significant at α = 0.05. Redundancy analysis (RDA) was carried out in Canoco 4.5 (Microcomputer Power, Ithaca, NY, USA), and the ordination plot was drawn in CanoDraw 4.5 (Microcomputer Power, Ithaca, NY, USA). The BP neural network was simulated in MATLAB R2011b (MathWorks, Natick, MA, USA).
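The reported dimensions can be checked with a little arithmetic. The sketch below makes two simplifying assumptions: it treats the drum as an ideal cylinder, and it assumes that the 42% porosity applies uniformly to the 105-cm zeolite bed. The pore volume and hydraulic loading it prints are derived estimates only; the study does not report them.

```python
import math


def reactor_geometry(diameter_m, height_m, media_height_m, porosity, daily_flow_m3):
    """Nominal volumes and loading for an idealized cylindrical column."""
    area_m2 = math.pi * (diameter_m / 2) ** 2   # cross-sectional (plan) area
    vessel_m3 = area_m2 * height_m              # total drum volume
    media_m3 = area_m2 * media_height_m         # zeolite bed volume
    pore_m3 = media_m3 * porosity               # assumes uniform porosity over the bed
    return {
        "vessel_L": vessel_m3 * 1000,
        "pore_L": pore_m3 * 1000,
        "hydraulic_loading_m_per_day": daily_flow_m3 / area_m2,
        "flow_over_vessel_volume": daily_flow_m3 / vessel_m3,
    }


# Values taken from the text: 40 cm diameter, 110 cm height, 105 cm of zeolite,
# 42% porosity, and 0.138 m3/day of inflow per device.
geom = reactor_geometry(0.40, 1.10, 1.05, 0.42, 0.138)
for key, value in geom.items():
    print(f"{key}: {value:.3f}")
```

The computed drum volume (about 138.2 L) matches the stated 138 L. The daily flow of 0.138 m³ is therefore roughly one drum volume per day.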

Nutrient Removal Effect
As shown in Figure 2, TN removal rates in the TF-CW simulators exceeded 65% under all four influent conditions. The mean TN removal rate of the continuous flow simulator (A) was 82 ± 5%. When the idle time/reaction time ratios were 1:1 (B), 1:2 (C), and 2:1 (D), the mean TN removal rates were 85 ± 5%, 86 ± 4%, and 90 ± 3%, respectively (Table 3). TN removal was therefore highest under tidal flow with an idle time/reaction time ratio of 2:1 (D).
NH₄⁺-N concentrations in the effluent were significantly lower than those in the influent (Figure 3). NH₄⁺-N removal rates exceeded 90%, which was higher than the removal rates for TN. In the tidal flow devices, NH₄⁺-N removal exceeded 98%. NH₄⁺-N removal rates differed significantly among the four influent conditions (p < 0.05) (Table 3).
Table 3 note: different letters (a, b, c) indicate significant differences at the 0.05 level; values are mean ± standard deviation.
In all four devices, effluent NO₃⁻-N concentrations were higher than influent concentrations, mainly because nitrification produced NO₃⁻-N during treatment. The removal rates were therefore negative. The difference between influent and effluent concentrations was greater in the continuous flow device than in the tidal flow devices (Figure 4).
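The removal rates above are percentages of the influent concentration. The text does not state its formula, so the sketch below assumes the conventional definition. It also shows why a concentration that rises through the wetland, as NO₃⁻-N did, gives a negative removal rate. The input concentrations are hypothetical and are not data from the study.

```python
def removal_rate(c_in, c_out):
    """Removal rate (%) from influent and effluent concentrations (mg/L).

    Assumed conventional definition: (C_in - C_out) / C_in * 100.
    The result is negative when the effluent concentration exceeds the
    influent concentration, e.g. NO3-N produced by nitrification.
    """
    if c_in <= 0:
        raise ValueError("influent concentration must be positive")
    return 100.0 * (c_in - c_out) / c_in


# Hypothetical concentrations for illustration only.
print(removal_rate(40.0, 4.0))   # 90.0   (TN-like case: positive removal)
print(removal_rate(0.5, 5.0))    # -900.0 (NO3-N-like case: effluent > influent)
```

When the influent concentration is close to zero, as it was for NO₃⁻-N, even a modest effluent concentration produces a very large negative percentage. Such a percentage is best read alongside the concentration difference itself.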
In TF-CWs, TP is retained mainly by the matrix. Table 3 shows the TP removal efficiencies of the TF-CW simulator under influent conditions A, B, C, and D: 55 ± 14%, 57 ± 16%, 56 ± 19%, and 58 ± 13%, respectively (Figure 5). TP removal did not improve significantly under tidal flow conditions, and the natural zeolite did not enhance TP removal from the sewage.

Building the BP Artificial Neural Network
Developing a nonlinear function-fitting algorithm based on a BP neural network involves three steps: construction, training, and prediction [21]. The algorithm flow is shown in Figure 6. The approach uses the network's data recognition and simulation capabilities to solve nonlinear system problems. The network is first trained on input and output data so that it can represent the unknown ("black box") function. The trained network is then used to predict the system outputs.
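The three steps (construction, training, and prediction) can be illustrated with a minimal single-hidden-layer BP network in plain Python. This is a sketch, not the study's MATLAB R2011b implementation. The sigmoid hidden units, linear output unit, per-sample gradient descent, learning rate, epoch count, and weight initialization are all assumptions chosen for readability. The class name `BPNetwork` is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """n_in -> n_hidden (sigmoid) -> 1 (linear) network trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        # Construction: small random weights and zero biases.
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x):
        # Prediction: a single forward pass.
        return self._forward(x)[1]

    def train(self, X, Y, epochs=500, lr=0.1):
        # Training: per-sample gradient descent on E = 0.5 * (y - t)^2.
        for _ in range(epochs):
            for x, t in zip(X, Y):
                hidden, y = self._forward(x)
                err = y - t  # dE/dy
                # Hidden deltas use the output weights *before* they are updated.
                deltas = [err * w * h * (1.0 - h) for w, h in zip(self.w2, hidden)]
                self.w2 = [w - lr * err * h for w, h in zip(self.w2, hidden)]
                self.b2 -= lr * err
                for j, d in enumerate(deltas):
                    self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * d

    def mse(self, X, Y):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, Y)) / len(X)


# Usage with synthetic, already-normalized data (4 inputs -> 1 output); NOT study data.
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(40)]
Y = [0.2 + 0.5 * x[0] - 0.3 * x[1] * x[2] + 0.1 * x[3] for x in X]
net = BPNetwork(n_in=4, n_hidden=9)
mse_before = net.mse(X, Y)
net.train(X, Y, epochs=500, lr=0.1)
mse_after = net.mse(X, Y)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

The hand-written updates show the forward pass and the error backpropagation step explicitly. In the study, these mechanics were handled by MATLAB.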
Basic chemical transformations are difficult to control in ecological engineering applications, and pollutant transformation in wastewater is a highly complicated process. Many factors influence the removal efficiency of different pollutants, and these factors vary considerably. It is therefore impractical to include every potential influence as a model input. In this study, we used redundancy analysis (RDA), based on the measured effluent concentrations, to screen the indicators for their influence on nutrient removal efficiency in the TF-CW simulator (Figure 7). RDA extracts and summarizes the variation in a set of response variables that can be explained by a set of explanatory variables. It can be regarded as a constrained version of principal component analysis (PCA). In RDA, the canonical axes are built from linear combinations of the response variables and must also be linear combinations of the explanatory variables [25].
Figure 7 shows that operating time did not affect the TN and NH₄⁺-N removal rates, whereas the NO₃⁻-N removal rate was negatively correlated with operating time. The removal efficiencies of TN and NH₄⁺-N remained stable, but the NO₃⁻-N removal rate decreased gradually. The tidal flow device can effectively purify wastewater in which NH₄⁺-N is the main pollutant [26]. The system aged slowly, and removal remained consistent over time. In the RDA, the idle time/emptying time ratio (RAT) of the continuous flow device was set to 0. The TN and NH₄⁺-N removal rates were positively correlated with RAT. The NO₃⁻-N influent concentration was close to 0, but nitrification of NH₄⁺-N made the NO₃⁻-N concentration in the outflow higher than in the inflow. The NO₃⁻-N removal rates were therefore negative, although they were positively correlated with the NO₃⁻-N influent concentrations. NO₃⁻-N removal should be studied in more detail where NO₃⁻-N is present at higher concentrations and is the main pollutant. The RDA also showed that RAT was positively correlated with DO. This indicates that TF-CWs have a higher re-oxygenation capacity than continuous flow constructed wetlands.
DO concentrations were positively correlated with TN and NH₄⁺-N removal rates. This shows that DO strongly influenced nutrient removal when NH₄⁺-N was the main pollutant [27,28]. Many factors are known to affect NO₃⁻-N removal, including pH, conductivity, water temperature, and salinity. A pH that is either too high or too low strongly affects the activity of wetland microorganisms. Temperature affects nitrogen removal by denitrification in constructed wetlands, and the denitrification rate decreases when the temperature falls below 5 °C. In contrast to other similar studies [29], the NO₃⁻-N removal rate in this study was not significantly correlated with TOC concentration. TOC concentrations in the four devices were low, so the data could not accurately reflect the correlation between these two parameters. The RDA results showed that NH₄⁺-N removal rates were negatively correlated with TOC concentrations. In this study, TOC concentrations (0 to 20 mg/L) were too low to supply the organic carbon needed to denitrify NO₃⁻-N. Further studies are therefore needed to examine nitrogen removal at high TOC concentrations and to find ways of maintaining a C/N ratio that favors nitrogen removal. TN removal rates were positively correlated with pH, conductivity, water temperature, and salinity. NH₄⁺-N removal rates were positively correlated with depth. Removal of both TN and NH₄⁺-N was negatively correlated with ORP and TOC (Table 4).
Treatment depth had a major influence on TP removal [30], with a correlation coefficient of 0.65. The other factors played only minor roles, which indicates that substrate adsorption and sedimentation dominated TP removal.
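The description of RDA as a constrained PCA corresponds to the standard two-step formulation below. This summary assumes that the response matrix $\mathbf{Y}$ ($n$ samples × $p$ response variables) and the explanatory matrix $\mathbf{X}$ ($n$ × $q$ environmental factors) are column-centred. The scaling and standardization options actually chosen in Canoco 4.5 are not reported in the text.

$$
\begin{aligned}
\hat{\mathbf{B}} &= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y},\\
\hat{\mathbf{Y}} &= \mathbf{X}\hat{\mathbf{B}},\\
\mathbf{S}_{\hat{Y}} &= \tfrac{1}{n-1}\,\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top},\\
\mathbf{Z} &= \hat{\mathbf{Y}}\mathbf{U} = \mathbf{X}\,(\hat{\mathbf{B}}\mathbf{U}),\\
R^{2}_{\mathrm{RDA}} &= \frac{\operatorname{tr}(\mathbf{S}_{\hat{Y}})}{\operatorname{tr}(\mathbf{S}_{Y})}, \qquad \mathbf{S}_{Y} = \tfrac{1}{n-1}\,\mathbf{Y}^{\top}\mathbf{Y}.
\end{aligned}
$$

The first two lines regress every response variable on all explanatory variables by least squares. The eigen-decomposition in the third line is a PCA of the fitted values. The fourth line shows why each canonical axis (a column of $\mathbf{Z}$) is a linear combination of the explanatory variables. The final ratio is the proportion of total response variance explained by the explanatory variables.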
The RDA results indicated that different factors influenced the removal of TN, NH₄⁺-N, and NO₃⁻-N. We therefore selected different training samples for the different models according to three criteria. First, data collection times were kept as close together as possible so that the data were valid. Second, the samples were made as diverse as possible. Third, samples with large deviations were removed from the measured data because they would degrade the training of the neural network. The data set was divided into training data and validation data [31], with 112 groups randomly assigned to training and 56 to validation. The indicators in the sample data may differ by orders of magnitude, which can prevent the network from converging or lengthen the training time. To obtain better training results, all data were mapped to the range 0 to 1 before the model was run. This unified the ranges of the indicators, improved training efficiency, and prevented differences in magnitude from reducing the accuracy of the network. MATLAB provides several normalization-related functions, including zscore, premnmx, postmnmx, and mapminmax; we used mapminmax for normalization. Because the network outputs were on the normalized scale, they had to be anti-normalized (mapped back to the original units).
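The normalization and anti-normalization steps can be sketched as follows. The formula is the min-max mapping documented for MATLAB's mapminmax. Its default target range is [−1, 1], but here the target is set to [0, 1] as in the study. The names `minmax_fit`, `minmax_apply`, and `minmax_reverse` are hypothetical Python stand-ins for mapminmax and its 'apply'/'reverse' modes; they are not a MATLAB API.

```python
def minmax_fit(values, ymin=0.0, ymax=1.0):
    """Record the settings needed to scale one variable into [ymin, ymax]."""
    xmin, xmax = min(values), max(values)
    if xmax == xmin:
        raise ValueError("a constant variable cannot be min-max scaled")
    return {"xmin": xmin, "xmax": xmax, "ymin": ymin, "ymax": ymax}


def minmax_apply(values, s):
    # y = (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin
    return [(s["ymax"] - s["ymin"]) * (x - s["xmin"]) / (s["xmax"] - s["xmin"]) + s["ymin"]
            for x in values]


def minmax_reverse(values, s):
    # Inverse mapping ("anti-normalization") back to the original units.
    return [(y - s["ymin"]) * (s["xmax"] - s["xmin"]) / (s["ymax"] - s["ymin"]) + s["xmin"]
            for y in values]


# Hypothetical TN effluent concentrations (mg/L), for illustration only.
tn = [10.0, 25.0, 40.0]
settings = minmax_fit(tn)
scaled = minmax_apply(tn, settings)
print(scaled)                              # [0.0, 0.5, 1.0]
print(minmax_reverse(scaled, settings))    # [10.0, 25.0, 40.0]
```

Storing the scaling settings lets the same mapping be applied to new inputs and reversed on network outputs. A common precaution is to fit the settings on the training data only and reuse them for the validation data. The study does not state which data were used to fit them.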
When the model output was the TN effluent concentration, the inputs were DO, RAT, ORP, and TOC. When the output was the NH₄⁺-N concentration, the inputs were DO, RAT, ORP, and Depth. When the output was NO₃⁻-N, the inputs were Cond, temperature (Temp), salinity (Sal), and pH. When the output was the TP effluent concentration, the inputs were DO, RAT, Time, and Depth.
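Each output uses a different set of four inputs, so it can help to keep the selection in one place. The mapping below simply restates the text. The dictionary layout, the key spellings, and the helper `input_vector` are illustrative assumptions.

```python
# Input variables selected by RDA for each output model (restated from the text).
MODEL_INPUTS = {
    "TN": ("DO", "RAT", "ORP", "TOC"),
    "NH4-N": ("DO", "RAT", "ORP", "Depth"),
    "NO3-N": ("Cond", "Temp", "Sal", "pH"),
    "TP": ("DO", "RAT", "Time", "Depth"),
}


def input_vector(sample, output):
    """Return the inputs for the `output` model, in a fixed order, from one sample record."""
    return [sample[name] for name in MODEL_INPUTS[output]]


# Hypothetical, already-normalized sample record (values are illustrative only).
sample = {"DO": 0.8, "RAT": 0.5, "ORP": 0.6, "TOC": 0.2, "Depth": 0.4,
          "Cond": 0.3, "Temp": 0.7, "Sal": 0.1, "pH": 0.5, "Time": 0.9}
print(input_vector(sample, "TP"))   # [0.8, 0.5, 0.9, 0.4]
```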
The number of nodes in the hidden layer is very important. The fitting ability of the network improves as the number of hidden nodes increases. With too few hidden nodes, the network cannot capture the information in the data. With a sufficiently large number of hidden nodes, in principle any nonlinear mapping from input to output can be approximated. However, training time increases with the number of hidden nodes. Too many nodes can also cause overfitting, in which the test error increases and the generalization capacity decreases [32]. Choosing the number of hidden nodes is therefore important but complex. The number depends on the sizes of the input and output layers, but no exact formula exists. In practice, an empirical formula can be used to estimate a range of hidden node numbers, and trial and error is then used to determine the actual number. The most commonly used empirical formula is [32]:
$$N = \sqrt{n + m} + a$$
where N is the number of hidden layer nodes, n is the number of input nodes, m is the number of output nodes, and a is a constant between 1 and 10. With n = 4 and m = 1, the formula gave a range of 3 to 12 hidden nodes. Each number of nodes from 3 to 12 was therefore trained and validated by trial and error. Each number of nodes was tested 3 to 5 times, and the mean of the resulting coefficients of determination (R²) was taken as the R² for that number of nodes. The optimal number of hidden nodes was selected on this basis. The relationship between the number of hidden nodes and R² is shown in Figure 8. R² is a dimensionless statistic that directly indicates goodness of fit. In general, the higher the R², the greater the proportion of variation in the dependent variable explained by the independent variables [33].
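The empirical range and the trial-and-error selection can be sketched as below. This sketch makes three assumptions. First, R² is computed as 1 − SS_res/SS_tot (the text does not give its definition). Second, candidate counts are rounded to the nearest integer. Third, `train_and_score` is a hypothetical callable that trains a fresh network with a given number of hidden nodes and returns its validation R².

```python
import math
import statistics


def hidden_node_range(n_inputs, n_outputs, a_values=range(1, 11)):
    """Candidate hidden-node counts from N = sqrt(n + m) + a, rounded to integers."""
    return sorted({round(math.sqrt(n_inputs + n_outputs) + a) for a in a_values})


def r_squared(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def select_hidden_nodes(candidates, train_and_score, repeats=3):
    """Trial and error: average R^2 over repeated trainings and keep the best count.

    `train_and_score(n_hidden)` is a hypothetical callable returning a validation R^2.
    """
    mean_r2 = {n: statistics.mean(train_and_score(n) for _ in range(repeats))
               for n in candidates}
    best = max(mean_r2, key=mean_r2.get)
    return best, mean_r2


print(hidden_node_range(4, 1))   # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Averaging over repeats smooths out the effect of random weight initialization, which can otherwise make a single training run look better or worse than the topology deserves.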
Figure 8 shows that, for TN, R² increased with the number of hidden nodes and rose markedly at 9 nodes. Training performance was best with 9 hidden nodes, where R² reached 0.73 ± 0.02; beyond this point, R² stabilized and then decreased slightly. For NH₄⁺-N, training performance improved noticeably when the number of hidden nodes increased from 3 to 4. After that, R² fluctuated as nodes were added. R² was highest at 11 hidden nodes (0.63 ± 0.04), which was lower than the value for TN. For NO₃⁻-N, training performance improved as the number of hidden nodes increased. We therefore used 12 hidden nodes in the NO₃⁻-N network, which gave an R² of 0.71 ± 0.06. For TP, R² changed little with the number of hidden nodes. It was high overall and reached 0.93 ± 0.02 at 9 hidden nodes, so we used 9 hidden nodes to predict TP effluent concentrations.
With the input and output layers determined and the number of hidden nodes selected, the corresponding neural network topologies for the tidal flow system were established. For each of TN, NH₄⁺-N, NO₃⁻-N, and TP, the main influencing factors identified by RDA formed the input layer, which had four input nodes. The output layer had a single neuron. The hidden layers of the four models had 9, 11, 12, and 9 nodes, respectively.
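To make the four topologies concrete, the snippet below counts the adjustable weights and biases in each 4-N-1 network. It assumes fully connected layers with one bias per hidden and output neuron, as in a standard BP network. The function name `bp_parameter_count` is hypothetical.

```python
def bp_parameter_count(n_in, n_hidden, n_out=1):
    """Number of weights and biases in a fully connected n_in-n_hidden-n_out network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


HIDDEN_NODES = {"TN": 9, "NH4-N": 11, "NO3-N": 12, "TP": 9}  # restated from the text
for nutrient, n_hidden in HIDDEN_NODES.items():
    print(f"{nutrient}: 4-{n_hidden}-1 network, {bp_parameter_count(4, n_hidden)} parameters")
```

Each network therefore has between 55 and 73 adjustable parameters to be fitted from the 112 training samples.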

Comparison of the Predicted and Measured Values
We collected 168 groups of inflow and outflow data. Of these, 112 groups were randomly selected as training samples and the remaining 56 groups as test samples. The network was trained using the model parameters described in Section 2. The predicted and measured values are shown in Figure 9.
The results show that the BP neural network model gave reasonable predictions of effluent nutrient concentrations. Although some errors occurred, some of the predicted and measured values were significantly correlated, and the deviation between predicted and measured effluent concentrations was small. For all four indicators, both the predicted and the measured effluent concentrations varied irregularly from sample to sample. This was mainly because of how the data were arranged. Data from different treatment depths and influent modes were shuffled randomly; the first 112 groups were then used for training and the last 56 groups for validation. Consecutive samples were therefore unrelated to one another, so the predicted effluent concentrations showed no regular pattern. When continuous (time-series) data are analyzed, they should not be shuffled randomly with the rand function.
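The random 112/56 division, and the alternative that suits continuous data, can be sketched in Python. Here `random.Random(seed).shuffle` stands in for MATLAB's rand-based reordering. The seed, the integer stand-ins for the 168 data groups, and the function names are illustrative assumptions.

```python
import random


def random_split(samples, n_train, seed=None):
    """Shuffle sample indices, then take the first n_train as training data."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    train = [samples[i] for i in order[:n_train]]
    test = [samples[i] for i in order[n_train:]]
    return train, test


def chronological_split(samples, n_train):
    """Preserve time order: earlier samples train, later samples validate."""
    return samples[:n_train], samples[n_train:]


records = list(range(168))  # stand-ins for the 168 inflow/outflow data groups
train, test = random_split(records, 112, seed=0)
print(len(train), len(test))   # 112 56
```

The chronological split keeps neighbouring samples together, which preserves any temporal pattern in the predictions. The random split, by design, breaks such patterns.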

Training Errors in the BP Neural Network
The prediction error of the BP network and the percentage prediction error are shown in Figure 10.
There were some errors in the effluent concentrations of each indicator predicted by the BP neural network. Both the prediction error and the percentage prediction error fluctuated considerably, although the fluctuations were not statistically significant (p > 0.05). Comparing Figures 9 and 10 shows that when the measured value was high, the prediction error and the percentage prediction error were both large, whereas when the measured value was low, both were small. When there was no functional relationship between the training data and the verification data, the BP neural network predicted poorly; the network performed well only when the data followed a regular pattern. The prediction errors for the TN effluent concentrations ranged from −15 to 15 mg/L, and 80% of them were between −5 and 5 mg/L, so the overall network error was small. The prediction errors for the NH₄⁺-N effluent concentrations ranged from −8 to 8 mg/L, and 90% of them were between −4 and 4 mg/L; the pronounced fluctuations and high relative errors arose because the NH₄⁺-N effluent concentrations were low. The prediction errors for the NO₃⁻-N effluent concentrations were mostly between −6 and 6 mg/L. The BP neural network predicted the TP effluent concentrations better than those of the other nutrients: the TP errors were small, and 90% of them were between −2 and 1 mg/L.
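The two error measures and the "share of errors within a band" statistics quoted above can be computed as in the following sketch. It assumes that the prediction error is defined as predicted minus measured and that the percentage error is taken relative to the measured value; the source does not state either convention, and the function names and sample concentrations are hypothetical.

```python
import numpy as np

def prediction_errors(measured, predicted):
    """Return the prediction error (predicted - measured, assumed sign
    convention) and the percentage error relative to the measured value."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - measured
    # Relative error grows when the measured concentration is small
    pct_error = 100.0 * error / measured
    return error, pct_error

def share_within(error, low, high):
    """Fraction of errors lying in the closed interval [low, high]."""
    error = np.asarray(error, dtype=float)
    return np.mean((error >= low) & (error <= high))

# Hypothetical effluent concentrations in mg/L
err, pct = prediction_errors([2.0, 10.0, 40.0], [3.0, 11.0, 46.0])
print(err, pct, share_within(err, -5, 5))
```

The same 1 mg/L error is 50% of a 2 mg/L measurement but only 10% of a 10 mg/L measurement, which illustrates why low effluent concentrations such as those of NH₄⁺-N can produce high relative errors.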

Network Fitting Ability
The fitting ability indicates how well the BP neural network, with the topology constructed above, reproduces the measured values. The data were automatically divided into training, validation, and test data during training and prediction. When the measured and output values of each data subset were compared, R² represented the goodness of fit, the fit line represented the fitted regression line, and Y = T represented the ideal line on which all points would lie when R² = 1; the quality of the fit decreased as the points deviated further from this line.
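The fit line and R² described above can be illustrated with a short NumPy sketch. It assumes that R² is the squared Pearson correlation between measured (target) values T and network outputs Y, which equals the coefficient of determination of the least-squares fit line Y = aT + b; the source does not specify how its R² was computed, and `regression_fit` is a hypothetical helper.

```python
import numpy as np

def regression_fit(targets, outputs):
    """Least-squares fit line Y = a*T + b of network outputs on targets,
    plus R^2 as the squared Pearson correlation (assumed definition)."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    a, b = np.polyfit(t, y, 1)       # slope and intercept of the fit line
    r = np.corrcoef(t, y)[0, 1]      # Pearson correlation coefficient
    return a, b, r ** 2

# A perfect prediction falls on Y = T: slope 1, intercept 0, R^2 = 1
print(regression_fit([1, 2, 3, 4], [1, 2, 3, 4]))
# Hypothetical noisy outputs give a fit line off Y = T and R^2 < 1
print(regression_fit([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

Comparing the fitted slope and intercept with 1 and 0 shows how far the fit line departs from the ideal Y = T line, while R² summarizes the scatter around the fit line.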
The fitting of the BP neural network to the TN effluent concentrations is shown in Figure 11. The R² values for the training and validation data were 0.70 and 0.61, respectively, the R² for the test data was 0.67, and the overall R² was 0.67.

The fitting of the BP neural network to the NO₃⁻-N effluent concentrations is shown in Figure 13. The R² of the training data was 0.73, and the validation data fitted well, with an R² of 0.74. The test data fitted poorly, with an R² of 0.53, and the overall R² was 0.69. Figures 11–13 show that the quality of the BP neural network fit differed among the three nitrogen forms. Although the NH₄⁺-N predictions had a large relative error, NH₄⁺-N was fitted better by the network than the other two nitrogen forms. Therefore, when BP neural networks are used, both the network error (absolute and percentage) and the network fitting ability should be considered.
The BP neural network fitted the TP effluent concentrations well (Figure 14). The overall R² reached 0.90, the R² of the training data was 0.91, the validation data had the best fit with an R² of 0.92, and the R² of the test data was 0.81. The TP concentrations in the effluent were mainly influenced by depth, while the effects of the other factors were minimal. Depth was therefore the only input variable of the BP neural network for TP. That the TP effluent concentrations were fitted better than those of the other nutrients suggests that, in this study, a network driven by a single dominant factor achieved a better fit and lower error than the networks with several input factors.
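To illustrate what a single-input BP network looks like, the following NumPy sketch trains a network with one input node (normalized depth), one hidden layer, and one output node by plain gradient-descent backpropagation on the mean squared error. It is a hypothetical sketch, not the authors' network: the hidden-layer size, learning rate, tanh/linear activations, and the synthetic depth–TP data are all assumptions, and the data are not measurements from this study.

```python
import numpy as np

def train_bp(x, y, n_hidden=5, lr=0.1, epochs=2000, seed=0):
    """Train a 1-input, 1-hidden-layer BP network (tanh hidden units,
    linear output) by full-batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    x = x.reshape(-1, 1)
    y = y.reshape(-1, 1)
    n = len(x)
    W1 = rng.normal(0.0, 0.5, (1, n_hidden))
    b1 = np.zeros((1, n_hidden))
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros((1, 1))
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(x @ W1 + b1)
        out = h @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the MSE gradient from output to hidden layer
        d_out = 2.0 * err / n
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0, keepdims=True)
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = x.T @ d_h
        db1 = d_h.sum(axis=0, keepdims=True)
        # Gradient-descent update of all weights and biases
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, x):
    """Forward pass of the trained network for a 1-D array of inputs."""
    W1, b1, W2, b2 = params
    return (np.tanh(x.reshape(-1, 1) @ W1 + b1) @ W2 + b2).ravel()

# Synthetic, normalized data (hypothetical depth-TP relationship)
depth = np.linspace(0.0, 1.0, 40)
tp = 0.8 - 0.5 * depth + 0.1 * np.sin(3.0 * depth)
params, losses = train_bp(depth, tp)
print(f"MSE before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

With only one input, the network has few weights to fit, which is consistent with, though not proof of, the observation that the single-factor TP network fitted better than the multi-input nitrogen networks.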
Water 2018, 10, x FOR PEER REVIEW 14 of 17

Conclusions
In this study, we examined nutrient removal in TF-CWs. We used three tidal-flow systems and one continuous-flow system to examine the removal rate and the variation of each pollutant indicator at different depths. We used redundancy analysis (RDA) to select the main controls on nutrient removal and used them as the input factors of the BP neural network for model simulations.
In constructed wetlands, tidal operation can improve nutrient removal. Tidal flow draws fresh air into the wetland and increases the dissolved oxygen (DO) in the system. Such intermittent aeration is more effective than continuous aeration, as it facilitates the establishment of the alternating aerobic and anaerobic conditions suitable for nitrification and denitrification [34,35]. Each nutrient responded differently to the four operating modes. The tidal flow constructed wetlands achieved the highest TN removal efficiency with an idle time/reaction time ratio of 2:1 (D), which differed significantly from the other three influent modes (p < 0.05). This treatment also achieved the highest NH₄⁺-N removal, 99% ± 1%. In all four simulations, the NO₃⁻-N concentrations in the effluent were higher than those in the influent, so the removal rates were negative; overall, NO₃⁻-N removal was better in the continuous-flow system than in the tidal-flow systems. The TP removal rates did not differ significantly among the four devices (p > 0.05), so tidal flow did not significantly improve TP removal.
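Negative NO₃⁻-N removal rates follow directly from the usual concentration-based definition of removal rate, sketched below. The formula (C_in − C_out)/C_in × 100 is assumed here as the conventional definition; the function name `removal_rate` and the example concentrations are hypothetical and are not values from this study.

```python
def removal_rate(c_in: float, c_out: float) -> float:
    """Concentration-based removal rate in percent (assumed definition):
    (C_in - C_out) / C_in * 100. The result is negative when the effluent
    concentration exceeds the influent concentration."""
    if c_in <= 0:
        raise ValueError("influent concentration must be positive")
    return (c_in - c_out) / c_in * 100.0

# Hypothetical concentrations in mg/L
print(removal_rate(20.0, 2.0))  # effluent lower than influent: positive rate
print(removal_rate(2.0, 5.0))   # effluent higher than influent: negative rate
```

A negative value therefore signals net production of the species within the wetland rather than a computational error.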
Our results show that the BP artificial neural network model predicted the nutrient concentrations in the effluent well: the predicted and measured values were correlated, and the errors were small. Of all the indicators, TP was fitted best by the BP neural network, with an overall R² of 0.90. The overall R² values for TN, NH₄⁺-N, and NO₃⁻-N were 0.67, 0.73, and 0.69, respectively. Despite the errors, the BP artificial neural network provides a new method for modeling the performance of TF-CWs. Future studies should explore BP neural networks with multiple hidden layers that offer better generalization and prediction accuracy. For simple mappings, a single hidden layer can be chosen for greater speed as long as the network meets the precision requirements; for complex mappings, multiple hidden layers can improve the prediction accuracy of the network. Further studies should also apply BP neural networks optimized with genetic algorithms to TF-CWs.

Figure 2. Removal rates of total nitrogen (TN) in the four different devices (A–D).

Figure 3. Removal rates of NH₄⁺-N in the four different devices (A–D).

Figure 4. NO₃⁻-N removal rates in the four different devices (A–D).

Figure 5. Total phosphorus (TP) removal rates in the four different devices (A–D).

Figure 8. Relationship between the coefficient of determination (R²) and the number of hidden layers.

Figure 9. Fitting curve values and model output values.

Figure 10. Prediction error and percentage prediction error of the BP network.

Figure 11. Fitting of the BP neural network to the TN effluent concentrations.

The fitting of the BP neural network to the NH₄⁺-N effluent concentrations is shown in Figure 12. The R² values for the training and validation data were 0.80 and 0.56, respectively. The R² of the validation data was relatively low because of the small amount of validation data. The R² for the test data was 0.68, and the overall R² was 0.73.

Figure 12. Fitting of the BP neural network to the NH₄⁺-N effluent concentrations.

Figure 13. Fitting of the BP neural network to the NO₃⁻-N effluent concentrations.

Figure 14. Fitting of the BP neural network to the TP effluent concentrations.

Table 1. Water quality of influent wastewater.

Table 2. Time settings of the different water inputs to the TF-CW simulator.

Table 3. Removal efficiency of nutrients in the different TF-CW simulators.