Prediction Study of Pollutants in Artificial Wetlands Enhanced by Electromagnetic Fields

Yin, Fajin; Ma, Rong; Liu, Yungen; Xiong, Liechao; Luo, Hu

doi:10.3390/su162310327


by Fajin Yin ¹,², Rong Ma ¹,²,*, Yungen Liu ², Liechao Xiong ¹,² and Hu Luo ¹,²

¹ School of Mechanical Engineering and Transportation, Southwest Forestry University, Kunming 650233, China

² Key Laboratory of Ecological Environment Evolution and Pollution Control in Mountainous and Rural Areas of Yunnan Province, Kunming 650224, China

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(23), 10327; https://doi.org/10.3390/su162310327

Submission received: 18 September 2024 / Revised: 20 November 2024 / Accepted: 22 November 2024 / Published: 26 November 2024

Abstract

Predictive modelling is important for water pollution management. This study is based on an electromagnetic field-enhanced vertical flow artificial wetland and uses measured data as model inputs. Pearson correlation analysis showed that the ammonia nitrogen (NH₄⁺-N) effluent concentration of this wetland system is related to six key factors: the NH₄⁺-N raw water concentration, the chemical oxygen demand (COD) raw water concentration, the treatment time, the magnetic field strength, the aeration time, and the electric field strength. Several prediction models were then constructed and compared on the basis of statistical evaluation parameters. The results show that the PSO algorithm improves the prediction performance of the BP neural network, and that the CNN model is more accurate than both the BP and PSO-BP models. The RF model has the highest prediction accuracy of the four models, with test set R², RMSE, and MAE values of 0.9446, 2.4328, and 3.0943, respectively. Its prediction error is the smallest, so it predicts the NH₄⁺-N effluent concentration of the electromagnetic field-enhanced wetland system more accurately than the other models, which can provide a basis for research on water pollution management.

Keywords: artificial neural networks; machine learning; artificial wetlands; electric field; magnetic field

1. Introduction

Water pollution treatment currently receives great attention worldwide, and improving wastewater treatment efficiency has become a research hotspot. Artificial wetlands are an ecological wastewater treatment technology that is widely used to remove pollutants from wastewater. The technology is simple and low in cost [1] and can prevent the spread of bacteria in the water [2]. However, artificial wetlands are affected by conditions such as weather and temperature during operation, and problems such as poor removal performance arise after long periods of operation. Introducing an electromagnetic field into traditional vertical flow artificial wetlands has been a hot research topic in recent years, and our group has already achieved some results [3]. To make electromagnetic field technology strengthen the removal performance of artificial wetlands more efficiently, and to reduce the waste of human and material resources, it is necessary to construct a prediction model that relates the electric and magnetic fields to the water quality indexes.

Water quality prediction modeling is a mathematical method to explore the changing law of water quality in the future by describing the relationship between the interaction of water quality components in a water body based on a large amount of historical monitoring data [4]. The problems of non-linearity and hysteresis inherent in the water pollution treatment process [5] make it difficult to obtain the effluent water quality by direct calculation through the white box model. The artificial neural network model is a more widely used black box model. In the process of practical application, it is necessary to collect and analyze a large number of data samples, construct a mathematical model and train the model, and then use the model to achieve the function of pattern recognition and prediction [6]. Machine learning algorithms can reduce the overall error and obtain the best-fit model through iterative learning [7], which greatly enhances the ability to monitor the pollution of the water environment. The Hopfield neural network (HNN) was first introduced in 1984, starting the period of development of artificial neural networks [8]. The backpropagation neural network (BPNN) was proposed to solve the multilayer problems reflected by neural networks [9]. Li et al. used the BP neural network combined with the principles of landscape ecology to make a more comprehensive evaluation and prediction of urban green space landscape planning [10]. Chen Wei et al. used the BP neural network to establish an NH₄⁺-N prediction model and verified the accuracy of the model [11]. In 1998, Lecun et al. used the BP algorithm based on convolution and pooling layers and created the prototype of the convolutional neural network (CNN) [12]. A convolutional neural network (CNN) is a common type of deep learning model which has better self-learning and self-adaptive ability than other traditional prediction models and which can better handle nonlinear problems [13]. 
Meanwhile, CNNs can provide better prediction accuracy even without spectral preprocessing because of their powerful feature extraction and their mapping of local sample features [14]. Ye et al. used a CNN model to construct a chemical oxygen demand prediction model with good accuracy [15]. In a study of water resources management, the generalized regression neural network (GRNN) and random forest (RF) models were compared, and the prediction accuracy of the RF model was found to be higher than that of the GRNN model [16]. Cyanobacteria are the main algae causing water blooms; to control the concentration of algae, Derot et al. predicted cyanobacteria abundance using an RF model [17]. Lu et al. proposed two machine learning models based on hybrid decision trees to obtain more accurate short-term water quality predictions [18]. Although machine learning algorithms are widely used to solve environmental problems, different models give different results on different problems. Some scholars have pointed out that, when machine learning methods are applied in the environmental sciences, the conditions affecting model construction should be fully considered along with the performance and interpretability of the different models, which will in turn broaden the scope of application of machine learning in environmental studies [19]. It is therefore of great significance to construct a reasonable water quality prediction model to monitor and predict the effluent water quality after sewage treatment.

In this study, the NH₄⁺-N concentration in the effluent of this wetland system was predicted using several machine learning methods. Prediction models for the water quality indicator NH₄⁺-N were constructed using a backpropagation neural network, a convolutional neural network, and a random forest, and the particle swarm optimization algorithm was used to optimize the traditional backpropagation neural network. The prediction performance of the four models was then compared and evaluated to provide scientific support for water pollution control.

2. Materials and Methods

2.1. Sources of Data

This study constructed an electromagnetic field-enhanced vertical flow artificial wetland system by introducing electromagnetic field enhancement into a traditional vertical flow artificial wetland system to improve pollutant removal. As shown in Figure 1, the device measures 2000 mm × 1000 mm × 1000 mm. Magnetic field reinforcement is introduced in the 650 mm section, where the substrate fillers are, from bottom to top, 500 mm-high biofilm balls (with polyester–ammonia sponges inside the balls) and 20–30 mm-high volcanic rock. Electric field reinforcement is introduced in the 1350 mm section, where the substrates are, from bottom to top, 300 mm-high volcanic stone, 200 mm-high ceramic granules, and 10–20 mm-high zeolite. The wetland system can hold 1.5 m³ of wastewater after filling. The water flows through the system as follows: the wastewater is first lifted to the water inlet by the lifting pump, then passes through the magnetic field reinforcement section and the electric field reinforcement section in turn, and finally returns to the water storage barrel through the water outlet to complete a cycle. The water quality of the system was monitored from June 2022 to November 2023, during which time the inlet water of the system was replaced every 6 days. Samples were measured at residence times of 1, 2, 3, and 6 days in the wetland system. The measured items included water temperature, pH, dissolved oxygen, conductivity, NH₄⁺-N, COD, total phosphorus, orthophosphorus, and the strength of the electromagnetic fields. In total, 499 sets of measured data were used to construct the predictive model. The wastewater used in the test was the tail water of a municipal domestic wastewater treatment plant. 
Table 1 shows a part of the data on the NH₄⁺-N effluent concentration in a traditional vertical flow artificial wetland (VFCW) and electromagnetic field-enhanced vertical flow artificial wetland (EM-VFCW), from which it can be seen that the removal of NH₄⁺-N by VFCW improved after the enhancement of the electric and magnetic fields.

After determining that the electromagnetic fields had enhanced NH₄⁺-N removal, the EM-VFCW was monitored for various water quality indicators under different electromagnetic field strengths. Table 2 shows a part of the monitoring data. The table shows that changing the electric or magnetic field strength, or factors such as the treatment time, affects the NH₄⁺-N effluent concentration, so a suitable prediction model is needed to better monitor and control the NH₄⁺-N effluent. Not all of these variables correlate strongly with the NH₄⁺-N effluent concentration, and the intrinsic relationship between these parameters and the NH₄⁺-N effluent concentration should be analyzed in more detail. Using only the strongly correlated parameters as inputs is expected to improve the accuracy of the model in predicting the NH₄⁺-N concentration, so the parameters are screened using Pearson correlation analysis, which is presented in Section 2.3.

2.2. Data Normalization

In this paper, all data are normalized to remove differences in magnitude, which reduces the impact of numerical gaps between different monitoring items on the model weights and improves the accuracy and applicability of the network. Normalization limits the range of the data to between 0 and 1 before the data are input into the network. The formula is as follows:

y = \frac{(y_{\max} - y_{\min}) (x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}

(1)

where x is the data before normalization, y is the output after normalization, x_max and x_min are the maximum and minimum values of the data set, respectively, and y_max and y_min are set to 1 and 0, respectively. In this way, the whole data set is normalized to the range from 0 to 1.
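As a minimal sketch of Equation (1), the following Python function maps a list of values onto a target range. The function name, the handling of a constant column, and the sample readings are illustrative assumptions, not part of the study, which used MATLAB.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """Map values linearly onto [y_min, y_max], term by term as in Equation (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant column carries no information, so map it to y_min.
        return [y_min for _ in values]
    return [
        (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
        for x in values
    ]


# Hypothetical NH4+-N readings in mg/L, for illustration only.
sample = [12.0, 30.0, 21.0, 48.0]
normalized = min_max_normalize(sample)
```

With the default arguments, the smallest value maps to 0 and the largest to 1, which is the range described in the text.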

2.3. Input Feature Parameter Selection

To improve the accuracy of the prediction model, the Pearson correlation coefficient method was used to screen out data that were poorly correlated with NH₄⁺-N concentrations. The Pearson correlation coefficients were interpreted as follows: 0.8–1.0 for a very strong correlation, 0.6–0.8 for a strong correlation, 0.4–0.6 for a moderate correlation, 0.2–0.4 for a weak correlation, and 0.0–0.2 for a very weak or no correlation. *, **, and *** indicate a significant correlation at p < 0.05, p < 0.01, and p < 0.001, respectively. Parameters with a higher correlation with the output values should be selected as inputs. Experimental data were processed using Excel, and experimental graphs were plotted using Origin 2021. Correlation analysis was performed using SPSS 27.
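The screening step can be sketched in Python as follows. This is an illustration under stated assumptions: the study used SPSS, the function names and sample values are hypothetical, the absolute value of the coefficient is assumed to be what is compared with the bands, and each boundary value is assigned to the stronger band because the text leaves that open.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strength_label(r):
    """Classify |r| with the bands listed in the text (boundary handling assumed)."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak or none"


# Hypothetical influent and effluent NH4+-N concentrations (mg/L).
influent = [40.0, 55.0, 62.0, 48.0, 70.0]
effluent = [12.0, 18.0, 21.0, 15.0, 25.0]
r = pearson_r(influent, effluent)
label = strength_label(r)
```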

2.4. Construction of the Model

In this experiment, four methods, namely a BP neural network, a PSO-BP neural network, a convolutional neural network (CNN), and a random forest (RF), were used to construct the water quality index prediction models in MATLAB 2020. After the models were constructed, 70% of the samples were used as the training set and 30% as the test set. The parameters of each model were then tuned, the models were trained on the training set and evaluated on the test set, and their accuracy was compared. The main principles of the four prediction models are as follows.
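A 70/30 split of the 499 samples can be sketched in Python as below. Whether the study shuffled the samples, and with what seed, is not stated; the random shuffle, the seed, and the function name are assumptions made for illustration.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices reproducibly and split them into two subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = int(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# Placeholder records standing in for the 499 measured data sets.
records = list(range(499))
train_set, test_set = split_train_test(records)
```

Under these assumptions the split yields 349 training samples and 150 test samples, which agrees with the 150 test groups reported in Section 3.2.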

2.4.1. BP Neural Network

The backpropagation neural network (BPNN) is often used to solve problems with complex internal mechanisms and more uncertainties. Figure 2 shows the schematic structure of the BPNN. Its structure consists of a three-layer network with an input layer for the independent variable, an output layer for the dependent variable, and a hidden layer, where each node in the network is connected to all the nodes in the lower layer. The nodes between the same layers are independent of each other, and the hidden layer can have one or more layers [20]. The number of hidden layers used in this study is one, and the number of hidden layer nodes is eight. BP neural networks have the advantages of network simplicity, rapid arithmetic, accurate prediction, and the ability to simulate the nonlinear relationship between data [21].
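The structure described above, with six inputs, one hidden layer of eight nodes, and one output, can be illustrated by a forward pass in Python. This is only a sketch: the tanh hidden activation, the linear output, and all weight values are assumptions, since the text does not specify them, and training by backpropagation is omitted.

```python
import math


def bp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network with a single output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Linear output node: weighted sum of hidden activations plus a threshold.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


N_INPUTS, N_HIDDEN = 6, 8  # six selected inputs, eight hidden nodes

# Illustrative weights only; a trained network would learn these values.
w_hidden = [[0.1 * (j + 1) / N_INPUTS] * N_INPUTS for j in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [1.0 / N_HIDDEN] * N_HIDDEN
b_out = 0.0

# One normalized sample with hypothetical values in [0, 1].
sample = [0.2, 0.5, 0.1, 0.8, 0.4, 0.6]
prediction = bp_forward(sample, w_hidden, b_hidden, w_out, b_out)
```

Every hidden node receives all six inputs, which corresponds to the full connection between adjacent layers described in the text.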

2.4.2. Particle Swarm Optimization Algorithm to Improve the BP Model

Particle swarm optimization (PSO) is a global optimization technique based on the behavior of flocks of birds and schools of fish. It searches for the global optimum by mimicking the collaboration and competition between members of a flock or school [22]. Each individual is treated as a particle in the PSO algorithm, and each particle has a position vector and a velocity vector, where the position vector represents the current solution of the particle and the velocity vector represents its direction of motion [23]. By iteratively updating the positions and velocities of the particles, the algorithm can converge quickly toward the global optimal solution in the search space. Since BP neural networks have defects such as slow convergence, using PSO to optimize BP neural networks can improve the convergence speed, reduce the possibility of falling into local extremes [24,25], and improve the accuracy of prediction.
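The position and velocity updates can be sketched in Python as follows. The inertia weight and the two acceleration coefficients are common textbook choices and are assumptions here, as is the test objective; the defaults of 5 particles, 50 updates, and a search range of −1 to 1 follow one reading of the settings reported in Section 3.3. In the PSO-BP model the objective would be the network's training error as a function of its weights and thresholds.

```python
import random


def pso_minimize(objective, dim, n_particles=5, n_iter=50,
                 lower=-1.0, upper=1.0, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over [lower, upper]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    positions = [[rng.uniform(lower, upper) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]
    initial_score = global_score

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward own best + pull toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Move the particle and clamp it to the search range.
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(upper, max(lower, moved))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_score[i] = score
                personal_best[i] = positions[i][:]
                if score < global_score:
                    global_score = score
                    global_best = positions[i][:]
    return global_best, global_score, initial_score


def sphere(position):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in position)


best, best_score, start_score = pso_minimize(sphere, dim=3)
```

Because the swarm best is replaced only when a particle finds a lower score, the returned score can never be worse than the best score of the initial swarm.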

2.4.3. Convolutional Neural Networks (CNN)

A CNN consists of three main parts, namely a convolutional layer, a pooling layer, and a fully connected layer, where neurons in the convolutional layer are locally connected to their feature surfaces in the input layer [26]. This locally weighted sum is passed to the activation function to obtain the output value of each neuron in the convolutional layer. Convolutional neural networks use local connectivity and weight sharing, i.e., each neuron is connected to only a small region of the input, and they share the same weights. This structure reduces the number of network parameters, improves the stability and generalization of the network, and enables high-level abstraction and classification tasks by stacking multiple convolutional and pooling layers to learn and extract feature representations of the input data layer by layer. The basic principle diagram of the convolutional neural network is shown in Figure 3.
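Local connectivity, weight sharing, and pooling can be illustrated with a one-dimensional example in Python. This is a sketch under assumptions: the text does not state the input layout, kernel size, or activation function of the CNN used, so the 1D layout, the two-element kernel, the ReLU activation, and the input values are hypothetical.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide one shared kernel over the signal and apply a ReLU activation."""
    k = len(kernel)
    return [
        max(0.0, sum(kernel[j] * signal[i + j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(features, size=2):
    """Keep the maximum of each non-overlapping window of the feature map."""
    return [
        max(features[i:i + size])
        for i in range(0, len(features) - size + 1, size)
    ]


# Hypothetical input of six features and a two-element averaging kernel.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, 0.5]

feature_map = conv1d(signal, kernel)  # [1.5, 2.5, 3.5, 4.5, 5.5]
pooled = max_pool1d(feature_map)      # [2.5, 4.5]
```

Each output neuron sees only two neighbouring inputs, and all of them reuse the same two weights, so the layer has two weights instead of one per connection.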

2.4.4. Random Forest (RF)

The schematic structure of an RF is shown in Figure 4. An RF is an ensemble ML method made up of weak, mutually independent decision trees, each of which uses multiple variables randomly selected from the initial set of input variables to make split decisions at each node [27]. The RF method is used for classification and regression prediction. In classification, each tree casts one vote to make the final category decision, while in regression, the final output value is the average of the outputs of the weak trees. In this study, the number of weak trees was equal to 100, the number of randomly selected variables was equal to the square of the total number of input variables, and the minimum number of leaves for each tree was equal to 5.
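The two aggregation rules can be sketched in Python as follows. Only the final combination step is shown; growing the trees is omitted, and the function names and sample outputs are hypothetical.

```python
from collections import Counter


def aggregate_regression(tree_outputs):
    """Regression: the forest output is the mean of the individual tree outputs."""
    return sum(tree_outputs) / len(tree_outputs)


def aggregate_classification(tree_votes):
    """Classification: the forest output is the category with the most votes."""
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical outputs of three trees for one sample.
forest_value = aggregate_regression([10.0, 12.0, 14.0])
forest_class = aggregate_classification(["pass", "fail", "pass"])
```

In this study the regression rule applies, since the target is a continuous NH₄⁺-N concentration.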

2.5. Indicators for Model Evaluation

After the model is constructed, its performance needs to be evaluated using suitable metrics. The metrics used in this evaluation are R² (coefficient of determination), RMSE (root mean square error), and MAE (mean absolute error).

R² measures the degree of fit between the expected and predicted values, and its value ranges over [0, 1]. The closer the value is to 1, the better the fit between the expected and predicted values; the closer it is to 0, the worse the fit. It is calculated as follows, where y_i is the measured value, ŷ_i is the predicted value, ȳ is the mean of the measured values, and n is the number of samples:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(2)

A smaller RMSE indicates a better model, calculated as follows:

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}}

(3)

The value of MAE lies in the range [0, +∞). The closer the result is to 0, the better the model and the higher the prediction accuracy; when the predicted values are identical to the actual values, the MAE is 0. It is calculated as follows:

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y}_{i} |

(4)
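The three metrics can be computed with a short Python sketch. It uses the standard definition of R², in which the denominator is the total sum of squares about the mean of the measured values. The function names and the sample values are illustrative assumptions.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error."""
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(ss_res / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured and predicted NH4+-N concentrations (mg/L).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

r2_value = r_squared(actual, predicted)  # 1 - 18/500 = 0.964
rmse_value = rmse(actual, predicted)     # sqrt(18/4), about 2.12
mae_value = mae(actual, predicted)       # 8/4 = 2.0
```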

3. Results and Analyses

3.1. Input Feature Selection

To ensure the accuracy of the prediction model, it is necessary to carry out correlation analysis for the data of the NH₄⁺-N effluent concentration index. This study uses nine indicators as input characteristics and NH₄⁺-N effluent concentration to carry out Pearson’s correlation analysis. The data used in this study are real-time monitoring data, and the correlation between the NH₄⁺-N effluent concentration and the input characteristics is shown in Figure 5. The indicators with a high Pearson correlation with NH₄⁺-N effluent concentration were selected as the input features of the neural network (correlation coefficient > 0.2), which aimed to improve the accuracy of the prediction model, and according to the information in Figure 5, it can be concluded that the six indicators with high correlation with the NH₄⁺-N effluent concentration were NH₄⁺-N raw water concentration, COD raw water concentration, treatment time, magnetic field strength, aeration time, and electric field strength. The prediction performance is highly dependent on the selection of input parameters, and different parameter selections may lead to completely different prediction results, increasing the difficulty of model selection and tuning.

3.2. Analysis of Different Model Prediction Results

Figure 6 and Figure 7 compare the correlations between the predicted and actual values for the training and test sets, respectively. The correlation of the RF model (training set: R = 0.9845, test set: R = 0.9766) is higher than those of the three neural network models, which rank as follows: CNN model (training set: R = 0.9715, test set: R = 0.9650) > PSO-BP model (training set: R = 0.9668, test set: R = 0.9625) > BP model (training set: R = 0.9551, test set: R = 0.9609). The figures show that the RF model is the most accurate on both the training set and the test set, and its simulated data are closest to the real values. The CNN model ranks second. Its correlation is higher than those of the PSO-BP and BP models, but its accuracy is still not high enough, as the difference between some of the real data points and the predicted values is too large. The PSO-BP model performs better than the BP model, indicating that the PSO algorithm can effectively enhance the performance of the BP neural network and improve its prediction accuracy. The analysis of the prediction results shows that all four models can predict the water quality indicator. The neural network models can be fitted closely to the data in the training phase because they establish a mapping between the input data and the output value, which allows a rapid response. However, models built with different algorithms have different prediction performance for the same number of samples, so it is necessary to screen for the appropriate algorithm to achieve better prediction results.

The accuracy of a prediction model is primarily assessed on the test set data. Figure 8 compares the real and predicted values of the NH₄⁺-N concentration in the test set for the four prediction models. Figure 8a–d corresponds to the BP, PSO-BP, CNN, and RF models, respectively. The figures show that the trends of the real and predicted values are consistent for all four models. However, the accuracy of the models varies, and the prediction results differ in the details of the error fluctuations. All four models can therefore be considered able to predict changes in water quality, with the predicted values of the RF model being closest to the real values. Figure 9 displays the prediction error of the NH₄⁺-N concentration in the test set for the four models. The prediction error of the RF model is the smallest among the four models. Of the 150 groups of data in the test set, 133 groups have a prediction error between −5 and +5, accounting for 88.7% of the total number of samples, and only 17 groups have an error outside this range. Approximately 52.6% of the total samples have an error between −2 and +2. The errors of the RF model range from −8.4 to +8.5. For the CNN, PSO-BP, and BP models, the test set errors between −5 and +5 accounted for 72%, 77.3%, and 70% of the total samples, respectively, which are smaller shares than that of the RF model. The RF model thus exhibited smaller errors than the other three models, and its predicted values were closer to the real values. It demonstrated better fitting and prediction accuracy than the BP and CNN models. Additionally, the RF model developed in this study can be used to predict the effluent concentration of the device for wastewater with different concentration characteristics. The parameters of the device can then be adjusted according to whether the predicted result meets the standard.
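The share of test samples whose error falls inside a given band can be computed as in the following Python sketch. The function name and the error values are hypothetical; only the counting rule reflects the comparison made above.

```python
def share_within(errors, bound):
    """Return the count and percentage of errors inside [-bound, +bound]."""
    inside = sum(1 for e in errors if -bound <= e <= bound)
    return inside, 100.0 * inside / len(errors)


# Hypothetical prediction errors (predicted minus measured) for five samples.
errors = [-6.0, -1.0, 0.0, 4.0, 7.0]
count, percent = share_within(errors, 5.0)  # 3 of 5 samples, 60.0 %
```

Applying the same function with a bound of 2 gives the narrower band used in the comparison.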

3.3. Analysis of Evaluation Results of Different Models

The evaluation parameters of the four models for predicting the NH₄⁺-N effluent concentration are shown in Table 3. The BP, PSO-BP, CNN, and RF models were evaluated to verify their accuracy. The fitting performance of the BP model is relatively poor compared to the other three models. The R² of its training set is only 0.9119, and the RMSE and MAE are 3.2657 mg/L and 3.4863 mg/L, respectively. On the test set, the RMSE and MAE increase to 4.2922 mg/L and 4.1478 mg/L, respectively, and the R² decreases to 0.893, which is a large gap compared with the evaluation parameters of the training set. This is because the weights and thresholds of the BP neural network are updated by the gradient descent method [28], which is easily affected by the initial parameters. When a BP neural network is constructed, the initial weights and thresholds are usually set arbitrarily, and this strong randomness often causes the model to converge slowly and to fall easily into local minima during training. The simulated results then fail to fit the real situation: the model fits the training set at the expense of accuracy on the test set, which in turn reduces the overall predictive accuracy of the BP model [29]. The PSO algorithm is a swarm intelligence algorithm [30]. After the BP neural network model was improved by the PSO algorithm, the RMSE and MAE in the training set were 3.4021 mg/L and 3.2911 mg/L, respectively, and the R² was 0.933. Compared with the traditional BP model, the training set RMSE of the improved model was slightly higher, but the overall evaluation results were better than those of the BP model. For the test set, the RMSE and MAE of the PSO-BP model were 3.4327 mg/L and 3.7068 mg/L, respectively. The R² of the test set was 0.9263, which is more consistent with the R² of the training set and indicates that the traditional BP model has better predictive power after being improved by the PSO algorithm. In the BPNN, within the fixed model framework, the initial parameters are adjusted according to the set learning rate, and training stops when the set maximum number of training iterations or the target model error is reached. The addition of the PSO algorithm replaces the random selection of each parameter of the BP model with a global search over a set parameter range of −1 to 1, with 5 populations, each updated 50 times. This improves the search efficiency of the neural network and helps to prevent it from falling into local extremes. It raises the R² of the test set from 0.893 to 0.926, which increases both the fitting accuracy and the prediction accuracy.

From the parameters in Table 3, it can be seen that the R² of the CNN model is higher than that of the BP and PSO-BP models in both the training set and the test set, and its RMSE is not much different from that of the PSO-BP model. Its MAE in the training set, test set, and total samples is lower than that of the BP and PSO-BP models, which reflects a smaller error between the predicted and real values and indicates that the CNN model fits better and is more suitable for predicting the NH₄⁺-N effluent of this system. Here, the neurons in the convolutional layer are locally connected to their feature surfaces in the input layer [26]. The locally weighted sum is passed to the activation function to obtain the output value of each neuron in the convolutional layer. The pooling layer reduces the number of connections between the convolutional layers, decreasing the dimension of the feature map and the computational complexity of the model. Fully connected layers can combine local information with category differentiation from the convolutional and sampling layers [31], and the output value of the last fully connected layer is passed to the output layer to produce the output. CNNs can fit multidimensional mapping problems, and the neurons in multilayered feature extractors provide enough complexity to simulate the nonlinear nature of the task. With their local connectivity, weight sharing, and pooling operations, CNNs reduce the number of training parameters and the complexity of the network while making the model invariant, to a certain extent, to translation, distortion, and scaling, which improves robustness and fault tolerance. Based on these characteristics, the CNN outperforms traditional neural networks in a variety of signal and information processing tasks and has achieved good results in areas such as environmental protection [32].

The RMSE and MAE of the RF model are the smallest among the four models in both the training and test sets, and the evaluation parameters in Figure 6, Figure 7, and Table 3 together show that RF is more suitable for predicting the effluent concentration of NH₄⁺-N in this system. The R² of RF is the closest to 1 among the four models, and the real and predicted values of the test set in Figure 8 agree closely in both trend and magnitude, which indicates that the RF model can better transform the data into suitable mathematical relationships and therefore predicts better in subsequent use. Each node in the BP model needs its own weights and thresholds, so the number of parameters is relatively large. The CNN model uses a two-layer convolution structure, which reduces both the number of parameters and the risk of overfitting, so its prediction accuracy is further improved. Meanwhile, the RF model has the advantages of strong generalization ability, adaptability, and high prediction accuracy. After the number of decision trees and the minimum number of leaves are adjusted, it can ensure accurate prediction of the NH₄⁺-N concentration in the effluent, and it learns from the real situation while filtering the data. The model also applies the variance of the maximally selected rank statistic for regression, which is faster and more computationally efficient than the general RF model. It allows parallel processing and applies splitting criteria that reduce node impurity. Variable importance can be assessed from the reduction in impurity, which helps to screen out appropriate variables and can greatly improve the effectiveness of regression prediction [33].

The NH₄⁺-N concentration of the effluent of the system can be predicted more accurately by establishing a suitable prediction model. Based on the prediction results, it can be judged whether the wastewater treated by the electric field- and magnetic field-enhanced artificial wetland system can be discharged in compliance with the standard, and therefore how well the device treats the effluent and whether it can be adapted to the characteristics of the local wastewater. This reduces the number of manual experiments required and avoids wasting resources. Four different prediction models were used to predict the NH₄⁺-N effluent concentration of the electromagnetic field-enhanced vertical flow artificial wetland. The results showed that the CNN model and the RF model had a better fitting ability than the BP model and the PSO-BP model, and that the prediction accuracy of the RF model was the highest. The results of this study can provide a theoretical basis for research into water quality pollutant prediction models. In subsequent research, the advantages of the various models should be combined to construct a new type of high-precision prediction model.

4. Conclusions

This paper describes a method for predicting NH₄⁺-N concentrations in electromagnetic field-enhanced artificial wetland systems using machine learning models. The Pearson correlation analysis concluded that the six key factors related to the NH₄⁺-N effluent concentration in the electromagnetic field-enhanced artificial wetland system are the NH₄⁺-N raw water concentration, the COD raw water concentration, the treatment time, the magnetic field strength, the aeration time, and the electric field strength. The four models constructed in this study, namely the BP model, the PSO-BP model, the CNN model, and the RF model, can all predict the NH₄⁺-N concentration, but their prediction accuracy differs. The PSO algorithm improves the traditional BP model by reducing its error and raising its prediction accuracy, but the accuracy is still not high enough for this system. Comparing the results of the four prediction models gives the following order of prediction accuracy: RF model > CNN model > PSO-BP model > BP model. This indicates that the RF model is the most applicable to the electromagnetic field-enhanced artificial wetland system and that its predicted values are the closest to the actual values. The study also provides a new idea for predicting water quality parameters, which has good application prospects. Advanced machine learning techniques should be explored in subsequent studies to address issues such as water pollution modelling more effectively.

Author Contributions

Software, F.Y.; Formal analysis, F.Y.; Resources, L.X.; Data curation, F.Y. and H.L.; Writing—original draft, F.Y.; Supervision, R.M.; Project administration, Y.L.; Funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

Applied basic research project of Yunnan Province (202201AT070048). Science and Technology Programme Key Project of Yunnan Province (202301AS070042). Scientific Research Fund Project of Yunnan Provincial Department of Education (2024Y606).

Data Availability Statement

The data and materials supporting the results and analyses presented in this study can be made freely available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Li, X.; Yang, L.; Xu, K.; Bei, K.; Zheng, X.; Lu, S.; An, N.; Zhao, J.; Jin, Z. Application of constructed wetlands in treating rural sewage from source separation with high-influent nitrogen load: A review. World J. Microbiol. Biotechnol. 2021, 37, 138.
2. Liu, J.Y.; Sun, Y.S.; Yao, Q.; Chen, F.; Liang, C.; Yang, T.Y. Review on the application of constructed wetland technology in industrial wastewater treatment. Water Wastewater Eng. 2021, 57, 509–516.
3. Xiong, L.; Ma, R.; Yin, F.; Fu, C.; Peng, L.; Liu, Y.; Lu, X.; Li, C. Simulation and optimisation of magnetic and experimental study of magnetic field coupling constructed wetland. Environ. Technol. 2024, 45, 5083–5103.
4. Li, Y.; Shi, Y.; Jiang, L.; Zhu, X.; Gong, R. Advances in surface water environment numerical models. Water Resour. Prot. 2019, 35, 1–8.
5. Chen, L.; Liu, Q.; Wang, L.; Zhao, J.; Wang, W. Data-driven Prediction on Performance Indicators in Process Industry: A Survey. Acta Autom. Sin. 2017, 43, 944–954.
6. Feng, Y.; Cui, N.; Gong, D.; Zhang, Q.; Zhao, L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 2017, 193, 163–173.
7. Hassan, W.H.; Hussein, H.; Alshammari, M.H.; Jalal, H.K.; Rasheed, S.E. Evaluation of gene expression programming and artificial neural networks in PyTorch for the prediction of local scour depth around a bridge pier. Results Eng. 2022, 13, 100353.
8. Hopfield, J.J. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 1984, 81, 3088–3092.
9. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
10. Li, J.; Dong, X.; Ruan, S.; Shi, L. A parallel integrated learning technique of improved particle swarm optimization and BP neural network and its application. Sci. Rep. 2022, 12, 19325.
11. Chen, W.; Chen, H.; Dai, F. Effluent water quality prediction model based on artificial neural network for wastewater treatment. Water Wastewater Eng. 2020, 56, 990–994.
12. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
13. Croce, D.; Rossini, D.; Basili, R. Explaining non-linear classifier decisions within kernel-based deep architectures. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 16–24.
14. Zhao, H.; Liu, F.; Li, L.; Luo, C. A novel softplus linear unit for deep convolutional neural networks. Appl. Intell. 2018, 48, 1707–1720.
15. Ye, B.; Cao, X.; Liu, H.; Wang, Y.; Tang, B.; Chen, C.; Chen, Q. Water chemical oxygen demand prediction model based on the CNN and ultraviolet-visible spectroscopy. Front. Environ. Sci. 2022, 10, 1027693.
16. Topp, S.N.; Pavelsky, T.M.; Jensen, D.; Simard, M.; Ross, M.R.V. Research trends in the use of remote sensing for inland water quality science: Moving towards multidisciplinary applications. Water 2020, 12, 169.
17. Derot, J.; Yajima, H.; Jacquet, S. Advances in forecasting harmful algal blooms using machine learning models: A case study with Planktothrix rubescens in Lake Geneva. Harmful Algae 2020, 99, 101906.
18. Lu, H.; Ma, X. Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 2020, 249, 126169.
19. Liu, X.; Lu, D.; Zhang, A.; Liu, Q.; Jiang, G. Data-driven machine learning in environmental pollution: Gains and problems. Environ. Sci. Technol. 2022, 56, 2124–2133.
20. He, J.; Liu, N.; Han, M.; Chen, Y. Research on danjiang water quality prediction based on improved artificial bee colony algorithm and optimized BP neural network. Sci. Program. 2021, 2021, 3688300.
21. He, Y.; Gong, Z.; Zheng, Y.; Zhang, Y. Inland reservoir water quality inversion and eutrophication evaluation using BP neural network and remote sensing imagery: A case study of Dashahe reservoir. Water 2021, 13, 2844.
22. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
23. Lu, G.; Xu, D.; Meng, Y. Dynamic evolution analysis of desertification images based on BP neural network. Comput. Intell. Neurosci. 2022, 2022, 5645535.
24. Yan, J.; Xu, Z.; Yu, Y.; Xu, H.; Gao, K. Application of a hybrid optimized BP network model to estimate water quality parameters of Beihai Lake in Beijing. Appl. Sci. 2019, 9, 1863.
25. Jahandideh-Tehrani, M.; Bozorg-Haddad, O.; Loáiciga, H.A. Application of particle swarm optimization to water management: An introduction and overview. Environ. Monit. Assess. 2020, 192, 281.
26. Pu, F.; Ding, C.; Chao, Z.; Yu, Y.; Xu, X. Water-quality classification of inland lakes using Landsat8 images by convolutional neural networks. Remote Sens. 2019, 11, 1674.
27. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
28. Meng, Y.; Xiong, Y.; Guo, H.; Zhang, Y.; Zhao, Z.; Jiang, X. A Sing Building Seismic Damage Assessment Method Based on Improved Genetic Algorithm Optimized BP Neural Network. Earthq. Res. China 2023, 39, 785–794.
29. Zhang, B.; Li, J.; Shen, Q.; Wu, Y.; Zhang, F.; Wang, S. Key Technologies and Systems of Surface Water Environment Monitoring by Remote Sensing. Environ. Monit. China 2019, 35, 1–9.
30. Basha, S.H.S.; Dubey, S.R.; Pulabaigari, V.; Mukherjee, S. Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 2019, 378, 112–119.
31. Baltrušaitis, T.; Ahuja, C.; Morency, L.P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 423–443.
32. Olmedilla, M.; Martínez-Torres, M.R.; Toral, S. Prediction and modelling online reviews helpfulness using 1D Convolutional Neural Networks. Expert Syst. Appl. 2022, 198, 116787.
33. Nicodemus, K.K.; Malley, J.D.; Strobl, C.; Ziegler, A. The behaviour of random forest permutation-based variable importance measures under predictor correlation. BMC Bioinform. 2010, 11, 110.

Figure 1. Electromagnetic field-enhanced vertical flow artificial wetland system.

Figure 2. Structure of the BP neural network.

Figure 3. Convolutional neural network schematic.

Figure 4. Random forest network structure.

Figure 5. Heat map of the Pearson correlation of input features.

Figure 6. Plots of the true values of the training set fitted to the predicted values.

Figure 7. Plots of the true values of the test set fitted to the predicted values.

Figure 8. Plot of true vs. predicted concentrations for different predictive model test sets. (a) BP (b) PSO-BP (c) CNN (d) RF.

Figure 9. Plot of error values for different model test sets.

Table 1. VFCW and EM-VFCW ammonia effluent concentrations.

Sampling Number	NH₄⁺-N Raw Water Concentration (mg/L)	VFCW NH₄⁺-N Effluent Concentration (mg/L)	EM-VFCW NH₄⁺-N Effluent Concentration (mg/L)
1	58.72	45.56	32.09
2	53.03	40.32	30.85
3	57.33	49.17	32.97
4	59.50	43.42	31.38
5	67.19	47.81	34.01
6	55.42	42.57	31.71
7	44.92	39.31	27.72
8	56.40	41.72	32.68
9	35.35	30.85	18.95
10	47.19	33.51	24.11
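The comparison in Table 1 can be summarised as a removal rate, (raw − effluent) / raw × 100%. The following minimal sketch applies that standard formula to the ten samples above; the helper name `removal_rate` is hypothetical, and the calculation is an illustration rather than a result reported by the authors.

```python
# Rows of Table 1: (raw water, VFCW effluent, EM-VFCW effluent), NH4+-N concentrations.
table1 = [
    (58.72, 45.56, 32.09), (53.03, 40.32, 30.85), (57.33, 49.17, 32.97),
    (59.50, 43.42, 31.38), (67.19, 47.81, 34.01), (55.42, 42.57, 31.71),
    (44.92, 39.31, 27.72), (56.40, 41.72, 32.68), (35.35, 30.85, 18.95),
    (47.19, 33.51, 24.11),
]


def removal_rate(raw, effluent):
    """Percentage of the raw-water NH4+-N removed by the wetland."""
    return (raw - effluent) / raw * 100.0


vfcw_rates = [removal_rate(raw, vfcw) for raw, vfcw, _ in table1]
em_rates = [removal_rate(raw, em) for raw, _, em in table1]

mean_vfcw = sum(vfcw_rates) / len(vfcw_rates)
mean_em = sum(em_rates) / len(em_rates)
print(f"mean removal, VFCW: {mean_vfcw:.1f}%  EM-VFCW: {mean_em:.1f}%")
```

For every sample in Table 1 the EM-VFCW effluent concentration is lower than the VFCW one, so the EM-VFCW removal rate is higher in each row.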

Table 2. Indicators affecting NH₄⁺-N effluent concentration of EM-VFCW (partial data).

NH₄⁺-N Raw Water Concentration	COD Raw Water Concentration	Treatment Time	Magnetic Field Strength	Oxygen Supply Time	Electric Field Strength	DO	pH	Temp	NH₄⁺-N Effluent Concentration
33.13	281.92	24	3	0	5	2.88	7.53	25.26	20.54
49.87	442.75	24	3	0	5	3.33	7.51	20.01	25.45
13.66	295.00	24	3	0	5	3.13	7.53	19.16	26.81
38.39	201.08	48	3	0	5	3.19	7.94	20.05	25.69
39.74	163.58	48	3	0	5	3.06	8.06	19.88	10.54
26.75	150.75	48	3	0	5	2.73	7.57	18.96	18.73
25.87	169.17	72	3	0	10	2.73	7.79	19.92	9.19
23.12	193.39	72	3	0	10	2.54	7.55	22.27	6.73
30.23	168.39	72	3	0	10	2.47	7.74	17.65	7.07
62.27	233.16	144	8	0	10	2.95	7.76	21.83	25.72
82.95	424	144	8	0	10	2.79	7.91	21.57	27.8
58.72	288.166	144	8	0	10	2.78	7.81	21.35	32.0
47.16	237.33	24	8	24	10	2.76	7.71	22.2	10.9
56.88	295.50	24	8	24	10	2.81	7.63	20.81	23.17
58.92	333.00	24	8	24	10	3.11	7.92	21.97	22.82
31.26	236.67	48	8	48	10	2.96	8	23.21	0.27
28.79	225.00	48	8	48	10	2.61	7.2	21.65	0.37
21.79	42.43	48	8	48	10	2.85	7.18	21.74	0.04
19.54	91.00	72	8	72	10	3.15	7.19	23.57	3.87
10.92	48.78	72	8	72	10	3.91	7.22	23.66	0.05
31.26	236.67	72	8	72	10	2.94	6.42	24.58	0.21
49.87	442.75	144	3	0	15	3.17	6.73	24.76	11.2
21.79	42.43	144	3	0	15	3.07	7.12	21.18	7.42
25.99	127.75	144	3	0	15	3.11	7.13	21.38	7.95
58.92	333.00	72	0	0	15	3.58	7.13	21.82	43.37
63.81	413.00	72	0	0	15	3.72	7.16	22.17	49.12
73.51	382.75	72	0	0	15	3.82	7.37	21.16	55.93
57.33	319.58	72	10	0	0	3.78	7.34	21.02	38.65
59.50	246.91	72	10	0	0	3.71	7.55	18.42	42.78
55.42	151.91	72	10	0	0	3.13	7.31	17.96	33.64
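The Pearson correlation used for input feature selection can be sketched directly on the rows listed in Table 2. The example below is a hedged illustration: the function name `pearson` is hypothetical, only two of the candidate inputs are included (NH₄⁺-N raw water concentration and oxygen supply time), and because Table 2 is only part of the data set, the coefficients it prints should not be expected to match Figure 5.

```python
import math

# Columns transcribed from the 30 rows of Table 2.
raw_nh4 = [33.13, 49.87, 13.66, 38.39, 39.74, 26.75, 25.87, 23.12, 30.23, 62.27,
           82.95, 58.72, 47.16, 56.88, 58.92, 31.26, 28.79, 21.79, 19.54, 10.92,
           31.26, 49.87, 21.79, 25.99, 58.92, 63.81, 73.51, 57.33, 59.50, 55.42]
aeration = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 24, 24, 24, 48, 48, 48, 72, 72,
            72, 0, 0, 0, 0, 0, 0, 0, 0, 0]
effluent = [20.54, 25.45, 26.81, 25.69, 10.54, 18.73, 9.19, 6.73, 7.07, 25.72,
            27.8, 32.0, 10.9, 23.17, 22.82, 0.27, 0.37, 0.04, 3.87, 0.05,
            0.21, 11.2, 7.42, 7.95, 43.37, 49.12, 55.93, 38.65, 42.78, 33.64]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


r_raw = pearson(raw_nh4, effluent)
r_aeration = pearson(aeration, effluent)
print(f"raw NH4+-N vs effluent: r = {r_raw:.3f}")
print(f"aeration time vs effluent: r = {r_aeration:.3f}")
```

On this excerpt the coefficient for oxygen supply time is negative, since the aerated runs have the lowest effluent concentrations in the table.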

Table 3. Evaluation parameters of the 4 models.

Model	R² (Training Set)	R² (Test Set)	RMSE (Training Set)	RMSE (Test Set)	MAE (Training Set)	MAE (Test Set)
BP	0.9119	0.8930	3.2657	4.2922	3.4863	4.1478
PSO-BP	0.9330	0.9263	3.4021	3.4327	3.2911	3.7068
CNN	0.9437	0.9306	3.4113	3.4992	2.9175	3.6347
RF	0.9649	0.9446	2.3486	2.4328	2.3676	3.0943
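For reference, the three evaluation parameters in Table 3 can be computed as in the minimal sketch below. It uses the standard textbook definitions of R², RMSE, and MAE, which are assumed here to correspond to the indicators used in the study; the function names and the `table3_test` dictionary are hypothetical, and the numbers in the dictionary are the test-set values copied from Table 3.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Test-set values from Table 3: model -> (R2, RMSE, MAE).
table3_test = {
    "BP": (0.8930, 4.2922, 4.1478),
    "PSO-BP": (0.9263, 3.4327, 3.7068),
    "CNN": (0.9306, 3.4992, 3.6347),
    "RF": (0.9446, 2.4328, 3.0943),
}

rank_by_r2 = sorted(table3_test, key=lambda m: table3_test[m][0], reverse=True)
rank_by_rmse = sorted(table3_test, key=lambda m: table3_test[m][1])
rank_by_mae = sorted(table3_test, key=lambda m: table3_test[m][2])
print("by R2:  ", rank_by_r2)
print("by RMSE:", rank_by_rmse)
print("by MAE: ", rank_by_mae)
```

Ranking the test-set values by R² or by MAE gives RF, CNN, PSO-BP, BP, while ranking by RMSE places PSO-BP slightly ahead of CNN; RF is first and BP last under all three parameters.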

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
