Prediction Model of Wastewater Pollutant Indicators Based on Combined Normalized Codec

Abstract: Effective prediction of wastewater treatment is beneficial for precise control of wastewater treatment processes. The nonlinearity of pollutant indicators such as chemical oxygen demand (COD) and total phosphorus (TP) makes models difficult to fit and lowers prediction accuracy. Classical deep learning methods have been shown to perform nonlinear modeling. However, there are enormous numerical differences between multi-dimensional data in the prediction problem of wastewater treatment, such as COD above 3000 mg/L and TP around 30 mg/L. Current normalization methods struggle to handle such differences effectively, which can cause training to fail to converge and gradients to vanish or explode. This paper proposes a multi-factor prediction model based on deep learning. The model consists of a combined normalization layer and a codec. The combined normalization layer combines the advantages of three normalization calculation methods: z-score, Interval, and Max. It can adaptively process multi-factor data, fully retain the characteristics of the data, and cooperate with the codec to learn the data characteristics and output the prediction results. Experiments show that the proposed model can overcome data differences and complex nonlinearity in predicting industrial wastewater pollutant indicators and achieve better prediction accuracy than classical models.


Introduction
In order to protect water resources and reduce the pollution of production and domestic wastewater to the environment, it is necessary to reduce the discharge of pollutants through the harmless treatment of wastewater [1]. Therefore, the effect of wastewater treatment has received extensive attention, and innovative technologies and management methods have become a current research focus.
Anaerobic biological treatment technology, also known as anaerobic digestion (AD), is widely used in the sewage treatment link of wastewater treatment plants (WWTPs) [2]. Its processing cost includes anaerobic granular sludge (AnGS) bed reactors, e.g., the upflow anaerobic sludge blanket (UASB) reactor, the expanded granular sludge bed (EGSB) reactor, and the internal circulation (IC) reactor [3]. Due to the complexity of sludge composition, its application has limitations, mainly the inability to fully use functional anaerobic microorganisms, resulting in a slow hydrolysis rate and poor biodegradability [4]. Although ultrasonic irradiation and other methods can improve the efficiency of anaerobic treatment, improper use of parameters will inhibit sludge metabolism and affect the economy of wastewater treatment [5]. Moreover, the anaerobic biological action in the [...] the nonlinearity and strong randomness of the data, which seriously affects the model's prediction accuracy.
The solution to this problem is to modify the normalization processing part of the model so that the data can be reasonably limited to a specific range, reducing the complexity of the data and speeding up the convergence of the model. Current research considers adaptive normalization layers and automatic selection of normalization layers, adopting a data-driven way to select a suitable normalization method adaptively. However, these improvements are primarily for univariate forecasting, and they still rely on a single normalization calculation in the end. In a prediction task with multiple factors and significant data differences, it is practical to consider multiple normalization processing methods.
In summary, this paper considers a combined normalization codec (CNC) model for predicting water quality indicators in wastewater treatment. The model consists of a combined normalization layer, a renormalization layer, and a codec. By combining the advantages of several normalization methods, the model aims to improve prediction accuracy.
The main contributions of this paper are summarized as follows: (1) A combined normalized encoder structure is proposed for the multi-factor prediction problem of wastewater pollutant indicators. This structure combines the advantages of three normalization methods, which can adaptively normalize and encode pollutant index data of different magnitudes, simplify complex index data processing processes, and improve the data processing capability in multi-factor prediction. (2) A combined renormalized decoder structure is proposed for the prediction task. The structure uses three renormalization methods to adaptively renormalize the output value of the decoder and map to obtain the actual prediction result. Its feature of adaptively adjusting parameters in model optimization can improve model prediction accuracy.

Related Work
Currently, some studies use machine learning methods to predict the quality of wastewater treatment. Arismendy et al. [16] developed an intelligent system based on multilayer perceptrons. The system can predict the COD index to support the relevant decision-making of the sewage treatment plant. Hilal et al. [17] used a model combining KNN and extreme learning machine (ELM) to predict the SS index, and the prediction accuracy reached 93.56%. Liu et al. [18] used the least squares support vector machine (LS-SVM) to build a prediction model, which was validated in the COD prediction of an anaerobic wastewater treatment system. These models based on machine learning can complete the prediction of water quality indicators in practice but generally target a single factor. Because the models are relatively simple, the prediction accuracy still needs to be improved.
Therefore, some studies consider prediction models based on deep learning. Han et al. [19] used an adaptive fuzzy neural network to achieve multi-objective predictive control. They dealt with conflicting control objectives by capturing the nonlinear behavior of the sewage treatment plant to improve its operational performance. Farhi et al. [20] used LSTM to build a wastewater prediction model, which showed better results than machine learning in predicting ammonia and nitrate concentrations in wastewater. Wan et al. [21] comprehensively considered spatial, temporal, and probabilistic reliability, and used a convolutional neural network (CNN), shared-weight long short-term memory (SWLSTM), and Gaussian process regression (GPR) to jointly build a model to predict water quality. The model was applied to high-precision point prediction and interval prediction monitoring of papermaking wastewater treatment systems.
These applications demonstrate the superiority of deep learning methods in wastewater treatment quality prediction. However, with the increase in pollutant index modeling needs and training data, deep learning methods also expose some problems. When faced with multiple factors and numerical differences, due to the enormous amount of training data, the existing data processing methods are complicated to operate and difficult to meet the processing requirements. Studies have shown improper normalization can significantly affect model performance, reducing model generalization and prediction accuracy [22]. Therefore, more efficient data processing methods must be adopted to cope with the growing demand for forecasting [23].
Passalis et al. [24] combined the z-score normalization method with a neural layer to design an adaptive normalization layer and applied it to the field of time series forecasting. The model adaptive optimization method can achieve better processing results than a fixed normalization scheme. Since this study only considers one basic normalization method, it is challenging to adapt widely to multiple forecasting scenarios. Jin et al. [25] combined z-score, Interval, decimal, and Min-Max normalization methods to design the normalization layer and renormalization layer and obtained the best predictions for a greenhouse weather dataset.
Based on the above analysis, this paper proposes the CNC model in combination with the actual characteristics of the deep learning state estimation method. In this paper, the combined normalization method is adopted, the advantages of various normalization methods are integrated, the data processing effect is improved, and the normalization layer and renormalization layer for the prediction task of wastewater treatment indicators are designed.

Combined Normalized Codec Prediction Model
The structure of the proposed combined normalized codec prediction model is shown in Figure 1. The model contains a variety of data normalization methods, which can adaptively integrate the advantages of multiple data processing methods through the end-to-end model optimization process. Thereby, the learning effect of the model on multidimensional data is improved, and the purpose of improving the prediction accuracy is finally achieved. The CNC model comprises three parts: combined normalization encoder, attention mechanism [26], and combined renormalization decoder. The combined normalization encoder integrates an adaptive combined normalization layer containing three normalization calculation methods: z-score [27], Interval [25], and Max [28] normalization. During the model training process, the unprocessed pollutant indicator data are directly input into the adaptive combined normalization layer in batches. The three normalization calculations are performed using the batch data's mean, variance, and other statistics. In order to synthesize the advantages of the three calculation methods and get the optimal processing effect, the results of the three normalization calculations are weighted and selected based on the Softmax function [29]. The weights are obtained from the model training to finally generate the weighted normalized processing results. These results are scaled and shifted by the learnable parameters α and β, which can be dynamically adjusted according to the current model training effect.
The exponential weighted average method is used to fit the global distribution of the data, and the iterative estimation is performed according to the statistics of each batch of data. The optimal global statistics are retained, and the prediction accuracy of the data by the final training model is improved. The normalized data are encoded by a multilayer LSTM [30].
The attention mechanism [26] focuses on the encoded features, selecting the most favorable traits for the model output values and ignoring the unimportant ones, thus reducing the model's internal parameters and learning more distant historical information. The features filtered by the attention mechanism are fed into the combined renormalization decoder. The combined renormalization decoder decodes the data features. The decoding of features is mainly achieved by multilayer LSTMs containing sophisticated gating mechanisms that preserve and learn long-term information about the sequence. After decoding, the final prediction values are output through the adaptive combined renormalization layer. Corresponding to the adaptive combined normalization layer, this layer contains three renormalization algorithms, which respectively perform renormalization calculation on the output features of the LSTM according to the statistics during data normalization. This layer also uses the Softmax function [29] to weight the three sets of renormalized results and comprehensively considers the three sets of results through the trainable combined weights to obtain the best estimation results. Moreover, this layer adds similar trainable parameters λ and ν to correct the results, and the values of λ and ν can also be trained by backpropagation. The structure of the combined normalization encoder and the combined renormalization decoder is described below.

Combined Normalized Encoder
The schematic structure of the combined normalized encoder is shown in Figure 2. The combined normalization encoder integrates the combined normalization layer on top of the conventional encoder. By combining the computational results of multiple normalizations, it improves the effect of normalization processing and ultimately the feature encoding capability of the encoder. Three normalization methods are used in the combined normalization layer, namely z-score [27], Interval [25], and Max [28], which are calculated as:

$$\hat{x} = \frac{x - \mathrm{mean}}{\sqrt{\sigma^{2} + \Delta}} \tag{1}$$

$$\hat{x} = a + \frac{(x - \min)(b - a)}{\max - \min} \tag{2}$$

$$\hat{x} = \frac{x}{|x|_{\max}} \tag{3}$$

where $x$ represents the source data and $\hat{x}$ represents the calculation result; min, max, mean, and $\sigma^{2}$ represent the minimum, maximum, mean, and variance of the source data, respectively; $|x|_{\max}$ is the maximum absolute value of the source data; $a$ and $b$ represent the bounds of the normalized interval; and $\Delta$ represents a fixed, small positive number.
Each of the three normalizations has its strengths: they map the input data to the standard normal distribution, to a specific interval (a, b), and to the range (−1, 1), respectively, and thus exert different effects on the data. Among them, z-score [27] processing yields data conforming to the standard normal distribution and reduces data distribution differences [31]; Interval [25] processing fixes the results in a specific interval to prevent gradient disappearance and gradient explosion problems; Max [28] normalization scales down the input data without changing its scale characteristics.
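The three normalization calculations described above can be sketched in plain Python. This is a minimal illustration rather than the paper's TensorFlow layer: the function names, the value of `DELTA`, the placement of the small constant inside the square root, and the sample readings are all assumptions made for the example.

```python
import math

DELTA = 1e-5  # assumed value; the text only says "a fixed, smaller positive number"


def z_score(xs, delta=DELTA):
    """Centre on the batch mean and divide by the stabilised standard deviation."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + delta) for x in xs]


def interval(xs, a=0.0, b=1.0):
    """Map the batch linearly onto [a, b]; a constant batch is not handled here."""
    lo, hi = min(xs), max(xs)
    return [a + (x - lo) * (b - a) / (hi - lo) for x in xs]


def max_abs(xs):
    """Divide by the largest absolute value, keeping the relative scale of the data."""
    peak = max(abs(x) for x in xs)
    return [x / peak for x in xs]


# Illustrative (made-up) readings in mg/L with very different magnitudes.
cod = [3100.0, 2950.0, 3320.0, 3005.0]
tp = [29.0, 31.5, 30.2, 28.8]

for name, series in (("COD", cod), ("TP", tp)):
    print(name, [round(v, 3) for v in interval(series)])
```

After any of the three calculations, the COD and TP series fall on comparable scales even though the raw values differ by two orders of magnitude.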
In order to use the effect of the three normalization methods on the input data, this paper uses the adaptive combined normalization method to weight the calculation results of normalization and determine the most suitable normalization calculation method. In the combined normalization layer, the Softmax function [29] acts as a combining function and is calculated as follows:

$$\mathrm{Softmax}(t_{i}) = \frac{e^{t_{i}}}{\sum_{j} e^{t_{j}}} \tag{4}$$

where $t$ is the trainable parameter. It can be optimized end-to-end by error backpropagation and is dynamically adjusted according to the model training effect. In this paper, three trainable parameters are set to output the combined weights for the results of the three normalization calculations to enhance the effectiveness of the combined normalization method. The calculation formula for combining using the Softmax function [29] is:

$$X = \mathrm{Softmax}(t_{1}, t_{2}, t_{3}) \otimes (x_{1}, x_{2}, x_{3})^{\mathsf{T}} \tag{5}$$

where $t_{1}$, $t_{2}$, and $t_{3}$ denote the three selected trainable parameters, $x_{1}$, $x_{2}$, and $x_{3}$ denote the results obtained from the three normalization calculations, Softmax means the Softmax function [29], $X$ represents the final output, and $\otimes$ means matrix multiplication. In order to make the output of combined normalization better adaptable to complex data, the trainable parameters α and β are used as scaling and translation factors, respectively. These two parameters are updated during the training process of the model to better correct the calculation results, so that the output of the combined normalization method is adjusted according to the training effect. The corrected output is calculated as:

$$Y = \alpha X + \beta \tag{6}$$

where $Y$ represents the final output of the normalized layer of the batch, $X$ denotes the value of the batch after normalization calculation, and α and β are correction parameters. Finally, the combined normalized output adjusted by trainable parameters is encoded by an encoding structure composed of LSTMs to obtain the encoded features.
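The Softmax-weighted combination and the subsequent scaling and translation can be illustrated with a short sketch. The helper names `softmax` and `combine` are hypothetical, and the parameters `ts`, `alpha`, and `beta` are set by hand here, whereas in the model they would be learned by backpropagation.

```python
import math


def softmax(ts):
    """Turn raw trainable scores into positive weights that sum to 1."""
    m = max(ts)  # subtract the maximum for numerical stability
    exps = [math.exp(t - m) for t in ts]
    total = sum(exps)
    return [e / total for e in exps]


def combine(candidates, ts, alpha=1.0, beta=0.0):
    """Weight the candidate normalizations elementwise, then scale and shift."""
    ws = softmax(ts)
    n = len(candidates[0])
    mixed = [sum(w * c[i] for w, c in zip(ws, candidates)) for i in range(n)]
    return [alpha * v + beta for v in mixed]


# Three candidate results for the same two samples (placeholder numbers).
x1, x2, x3 = [1.0, 4.0], [2.0, 5.0], [3.0, 6.0]
print(combine([x1, x2, x3], ts=[0.0, 0.0, 0.0]))
print(combine([x1, x2, x3], ts=[5.0, 0.0, 0.0], alpha=2.0, beta=1.0))
```

With equal scores the three results are simply averaged; raising one score moves the output toward the corresponding normalization, which is how training can come to favor one method.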
In the model training, in order to grasp the global distribution of the data according to the batch data and ensure the fitting effect of the model to the input data at the end of the training, this paper uses the exponential weighted moving average (EWMA) method [32] to iteratively estimate the statistics of each batch and record the optimal statistical distribution. It is calculated as:

$$\begin{aligned}
\mathrm{running\_min}_{t} &= k \cdot \mathrm{running\_min}_{t-1} + (1 - k) \cdot \min\nolimits_{t} \\
\mathrm{running\_max}_{t} &= k \cdot \mathrm{running\_max}_{t-1} + (1 - k) \cdot \max\nolimits_{t} \\
\mathrm{running\_mean}_{t} &= k \cdot \mathrm{running\_mean}_{t-1} + (1 - k) \cdot \mathrm{mean}_{t} \\
\mathrm{running\_}\sigma^{2}_{t} &= k \cdot \mathrm{running\_}\sigma^{2}_{t-1} + (1 - k) \cdot \sigma^{2}_{t}
\end{aligned} \tag{7}$$

where $\min_{t}$, $\max_{t}$, $\mathrm{mean}_{t}$, and $\sigma^{2}_{t}$ denote the minimum, maximum, mean, and variance statistics of the batch data at moment $t$. The quantities running_min, running_max, running_mean, and running_σ² with subscripts $t$ and $t-1$ denote the estimates of the minimum, maximum, mean, and variance at moments $t$ and $t-1$, respectively, and $k$ denotes the weight given to the information retained from the previous moment. In this paper, the value of $k$ is set to 0.6. The flow of the algorithm for the combined normalization layer is shown in Algorithm 1.
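A minimal sketch of the running-statistics update, with k = 0.6 as stated in the text. The dictionary layout, the helper names, the sample batches, and the choice to initialise the running estimates from the first batch are assumptions; the source does not describe the initialisation.

```python
def batch_stats(xs):
    """Minimum, maximum, mean and (population) variance of one batch."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return {"min": min(xs), "max": max(xs), "mean": mean, "var": var}


def ewma_update(running, batch, k=0.6):
    """Keep a fraction k of the previous estimate and blend in the new batch."""
    return {key: k * running[key] + (1.0 - k) * batch[key] for key in running}


# Made-up COD batches (mg/L), processed in order.
batches = [[3100.0, 2950.0], [3320.0, 3005.0], [2890.0, 3150.0]]

running = batch_stats(batches[0])  # assumed initialisation
for xs in batches[1:]:
    running = ewma_update(running, batch_stats(xs))

print({key: round(value, 2) for key, value in running.items()})
```

A larger k makes the running estimates change more slowly, so a single unusual batch has less influence on the statistics used at prediction time.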

Attention Mechanism
In this paper, the scaled dot product attention mechanism [33,34] is used to attend to the features produced by the combined normalization encoder. By adaptively selecting relevant feature information, highly relevant features are retained, and irrelevant features are ignored, thereby improving the renormalization encoding. The structure of the scaled dot product attention mechanism is shown in Figure 3.
It can be seen that the feature vectors from the combined normalized encoder are passed through three different linear layers to obtain the query vector Q, the key vector K, and the value vector V. First, the dot product of Q and K is computed to obtain their similarity matrix. Next, the similarity matrix is scaled. Then, the attention weights are obtained by normalizing the values of the similarity matrix using the Softmax function [29], which ensures that the weights sum to 1. Finally, the attention weights are multiplied with V to obtain the final result. The calculation process is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{\mathsf{T}}}{\sqrt{d}}\right)V \tag{8}$$

where $d$ denotes the scaling multiplier, Q, K, and V denote the query vector, key vector, and value vector, respectively, Softmax denotes the Softmax function [29], and Attention(Q, K, V) denotes the final result.
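The attention computation just described can be sketched with nested lists. The three linear layers that produce Q, K, and V are omitted, so the matrices are passed in directly, and d is taken to be the key dimension, which is the usual convention but an assumption here.

```python
import math


def softmax(row):
    """Normalise one row of similarity scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for row-major lists of vectors."""
    d = len(K[0])  # assumed: scale by the square root of the key dimension
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


Q = [[0.0, 0.0]]              # one query that is equally similar to both keys
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 4.0]]
print(attention(Q, K, V))
```

Because the query scores both keys equally, the two value rows are averaged; a query aligned with one key would shift the output toward that key's value row.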

Combined Renormalized Decoder
The combined renormalization decoder consists of an LSTM model and an adaptive combined renormalization layer. Figure 4 shows the schematic structure of the combined renormalization decoder layer. The output features of the attention mechanism first go through a decoder consisting of multiple layers of LSTMs, which decode the features into normalized predicted values. In order to get the actual predicted value, this value needs to be processed using a combined renormalization layer. Corresponding to the normalization calculation, the adaptive combined renormalization layer includes three renormalization calculations, which are calculated as follows:

$$x = \hat{x} \cdot \sqrt{\sigma^{2} + \Delta} + \mathrm{mean} \tag{9}$$

$$x = \frac{(\hat{x} - a)(\max - \min)}{b - a} + \min \tag{10}$$

$$x = \hat{x} \cdot |x|_{\max} \tag{11}$$

where $x$ represents the data after renormalization, $\hat{x}$ represents the data without renormalization, and min, max, mean, and $\sigma^{2}$ represent the minimum, maximum, mean, and variance of the input data, respectively, with $|x|_{\max}$ the maximum absolute value. These statistics are shared with the normalization calculation and are updated with different batches of values. $a$ and $b$ represent the interval set by the renormalization method, and $\Delta$ represents a fixed, small positive number.
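To illustrate how renormalization maps normalized values back to physical units using the shared statistics, the following sketch inverts each of the three normalizations. The function names, the `DELTA` value and its placement, and the sample TP readings are assumptions for the example.

```python
import math

DELTA = 1e-5  # assumed small constant, matching the one used when normalizing


def batch_stats(xs):
    """Statistics that the normalization and renormalization steps share."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return {"min": min(xs), "max": max(xs), "mean": mean, "var": var,
            "abs_max": max(abs(x) for x in xs)}


def renorm_z_score(x_hat, s, delta=DELTA):
    scale = math.sqrt(s["var"] + delta)
    return [v * scale + s["mean"] for v in x_hat]


def renorm_interval(x_hat, s, a=0.0, b=1.0):
    return [(v - a) * (s["max"] - s["min"]) / (b - a) + s["min"] for v in x_hat]


def renorm_max(x_hat, s):
    return [v * s["abs_max"] for v in x_hat]


# Round trip on made-up TP readings (mg/L): normalize, then renormalize.
tp = [29.0, 31.5, 30.2, 28.8]
s = batch_stats(tp)
z = [(x - s["mean"]) / math.sqrt(s["var"] + DELTA) for x in tp]
restored = renorm_z_score(z, s)
print([round(v, 6) for v in restored])
```

The round trip recovers the original readings up to floating-point error, which is the property that lets the decoder output be read as a concentration.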
To combine the results of the three renormalization calculations and improve the overall data processing, the Softmax [29] combining function is also added to the combined renormalization layer to select the results. This function is applied to three trainable parameters and outputs the combined weights for the results of the three renormalization calculations. The three trainable parameters can be optimized by error backpropagation to improve the effectiveness of the renormalization combination. The Softmax function [29] for combining is calculated as follows:

$$\mathrm{Softmax}(c_{i}) = \frac{e^{c_{i}}}{\sum_{j} e^{c_{j}}} \tag{12}$$

$$H = \mathrm{Softmax}(c_{1}, c_{2}, c_{3}) \otimes (h_{1}, h_{2}, h_{3})^{\mathsf{T}} \tag{13}$$

where $c_{1}$, $c_{2}$, and $c_{3}$ denote the three selected trainable parameters, $h_{1}$, $h_{2}$, and $h_{3}$ denote the results obtained from the three renormalization calculations, Softmax denotes the Softmax function [29], $H$ denotes the final output, and $\otimes$ denotes matrix multiplication. Similarly, the combined renormalization layer incorporates the learnable correction parameters λ and ν as the scaling and translation factors, respectively. The output of the renormalization layer modified by the correction parameters can be expressed as:

$$O = \lambda H + \nu \tag{14}$$

where $O$ represents the predicted output of the renormalization layer, $H$ represents the value after the renormalization calculation, λ is the scaling factor, and ν is the translation factor. Finally, the output $O$ is used as the predicted value of the model. The flow of the algorithm for the combined renormalization layer is shown in Algorithm 2.

Experiment
In this experiment, data on the change of pollutant indicators at the water inlet and outlet during the treatment of brewery wastewater were used. Beer is an alcoholic beverage brewed with malt grain, hops, and water as the primary raw materials, through liquid gelatinization and saccharification and then through liquid fermentation [35]. Beer is the fifth largest consumer beverage globally, after tea, carbonated beverages, milk, and coffee, with an average consumption of 23 L per person per year [36]. Beer production requires a lot of water; for each cubic meter of beer produced, the water consumed is generally 10-20 m³, of which more than 90% will be discharged into a sewer system, and wastewater is produced at all stages of production [37]. Moreover, beer wastewater has a high concentration of soluble organic pollutants and SS [38], and the COD of the wastewater produced in the production process is high because most of the organic matter in the water is made up of sugars, starches, and proteins [39]. The biological methods commonly used for beer wastewater treatment include the aerobic sequential batch reactor, the cross-flow ultrafiltration membrane anaerobic reactor, and UASB [40]. Beer wastewater produces methane [39], and better wastewater treatment strategies could lead to better economic benefits while protecting the environment.
The concentration of pollutants such as COD, SS, TN, and TP detected in the wastewater treatment process is an essential indicator of wastewater treatment, and whether it meets the national discharge standards is the determining factor for judging the effect of wastewater treatment. Predicting the future treatment effect according to the pollutant concentration index of the input wastewater at a historical time to assist in decision-making is a hot issue in current research. However, due to the multi-factor, complex, and nonlinear characteristics of forecasting tasks, higher requirements are placed on forecasting models' data processing and modeling capabilities. Therefore, this study uses COD, SS, TN, and TP data before and after brewery wastewater treatment to verify the model's prediction accuracy.
The predictive model is built on the open-source TensorFlow deep learning framework. In comparative experiments, the hyperparameters of the model need to be set. Specifically, all prediction models were optimized using the Adam optimization algorithm with a learning rate of 0.0001; the batch size of the data input to the network was set to 10, and the number of iterations per training was 300. To avoid the influence of random errors of the model on the prediction results, all comparative experiments were repeated ten times independently, and the average value was taken as the final result.
In this paper, four evaluation indicators are used to evaluate the experimental results: root mean square error (RMSE) [48], mean absolute error (MAE) [49], mean absolute percentage error (MAPE) [50], and Pearson correlation coefficient (R) [51]. All four evaluation indicators measure the difference between the prediction value given by the model and the actual value and thus evaluate the model's performance. Smaller RMSE [48], MAE [49], and MAPE [50] values indicate a smaller difference between the prediction value given by the model and the actual value, whereas larger R [51] values indicate a better fitting ability.
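For reference, the four indicators can be computed as in the sketch below, which uses their standard textbook definitions. Expressing MAPE as a percentage is an assumption about the convention used, and the sample values are made up.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    ma = sum(actual) / len(actual)
    mp = sum(predicted) / len(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)


actual = [2.0, 4.0]      # made-up outlet concentrations
predicted = [1.0, 5.0]   # made-up model outputs
print(rmse(actual, predicted), mae(actual, predicted),
      mape(actual, predicted), pearson_r(actual, predicted))
```

RMSE and MAE are in the units of the indicator (for example mg/L), while MAPE and R are unitless, which is why the four values are reported side by side.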

Validation Results
The dataset consists of four pollutant concentration indicators of COD, SS, TN, and TP detected during the brewery wastewater treatment. The data set was collected from a wastewater treatment station. About 720 sets were collected from 11 June to 11 July 2022. The data sampling interval was 1 h. Each data set includes four pollutant concentration indicators at the inlet and outlet. The structure of the dataset used is shown in Figure 5. In the experiment, the CNC model proposed in this paper is compared with other classical prediction models, and the superiority of the CNC model in the prediction of the actual wastewater treatment effect is verified by comparing the experimental results. The comparison models include: ANN [41], DNN [42], LSTM [43], GRU [44], Attention_LSTM [45], Attention_GRU [46], and Codec model [47]. The pollutant concentration index of the water inlet from time t−30 to t was used to predict the pollutant concentration index of the water outlet at time t + 1. The dataset is divided into 90% training set and 10% test set.
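The windowing and split described above can be sketched as follows. Treating "t−30 to t" as 31 inclusive time steps and splitting chronologically without shuffling are both assumptions, since the source does not state them, and the data here are synthetic placeholders rather than the plant measurements.

```python
def make_windows(inlet, outlet, lookback=31):
    """Pair inlet rows t-30..t (31 steps) with the outlet row at t+1."""
    samples = []
    for t in range(lookback - 1, len(inlet) - 1):
        history = inlet[t - lookback + 1 : t + 1]
        samples.append((history, outlet[t + 1]))
    return samples


def split(samples, train_frac=0.9):
    """Chronological split (assumed): the first 90% trains, the rest tests."""
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]


# Synthetic stand-in: 720 hourly rows of [COD, SS, TN, TP] for inlet and outlet.
inlet = [[float(i)] * 4 for i in range(720)]
outlet = [[float(i)] * 4 for i in range(720)]

samples = make_windows(inlet, outlet)
train, test = split(samples)
print(len(samples), len(train), len(test))
```

Under these assumptions the 720 hourly records give 689 input-target pairs, because the first 30 hours have no complete history and the last hour has no following target.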
The prediction accuracy evaluation indexes of each comparative model are shown in Table 1. Figure 6 compares the predicted and actual values of each model. We can see that the RMSE [48], MAE [49], and MAPE [50] of the CNC model proposed in this paper are reduced by 1.5%, 3.2%, and 0.5%, respectively, and the R [51] indicator is increased by 0.1% compared with the suboptimal Codec model. The comparison results show that the model proposed in this paper has better performance indicators, and the prediction results are closer to the actual situation.
In Figure 6, the last orange-red band is the actual ground-truth value, and the prediction results of all methods are drawn as dashed lines for comparison. It can be seen that the red band (the method proposed in this paper) is the closest to the actual value.
Consistent with how other deep learning models used in engineering are deployed [52,53], the models first need to be pre-trained using historical data, which can take hours or even days. The training effect of the model is optimized by continuously adjusting the model's hyperparameters until the gap between the model's predicted output during training and the reference value meets the requirements. The trained model parameters are then saved for practical application. With this deployment method, new data are input into the trained model in practical application, and inference no longer takes a lot of time. Therefore, the predicted value can be given within 100 ms, meeting the real-time requirement.

Conclusions
The organic and inorganic pollutants in the wastewater produced by factories will not only pollute the soil and water bodies but also endanger human health through the enrichment effect of the food chain. However, due to wastewater treatment's volatility and nonlinear characteristics, it is not easy to carry out predictive modeling and guide early regulation, which seriously affects treatment efficiency [54].
Considering the prediction of pollutant indicators in brewery wastewater treatment to assist management, a combined normalized codec (CNC) prediction model was proposed. The model is designed for multi-factor and strongly nonlinear prediction tasks. In this model, the multi-factor pollutant index data such as COD and SS are first input into the combined normalization encoder. The data are adaptively processed by combining the advantages of the three normalization methods. The encoder extracts the features of the data. Then, the attention mechanism attends to the features, and the decoder performs feature decoding. Finally, a combined renormalization layer adaptively renormalizes the data and outputs the prediction results. The constructed CNC model was used to predict the four pollutant indicators of COD, SS, TN, and TP in brewery wastewater treatment and compared with classical prediction models. The proposed model's RMSE [48], MAE [49], and MAPE [50] indicators were 4.355, 3.113, and 1.007, and the R [51] index reached 0.975, which is better than the comparison models. The experimental results show that the model is more suitable for managing and applying wastewater treatment.
In future work, the model should continue to be improved to ensure prediction accuracy. Meanwhile, the method's applicability should be verified by applying the model to more scenarios.