Enhancing Real-Time Prediction of Effluent Water Quality of Wastewater Treatment Plant Based on Improved Feedforward Neural Network Coupled with Optimization Algorithm

Abstract: To provide real-time prediction of wastewater treatment plant (WWTP) effluent water quality, a machine learning (ML) model was developed by combining an improved feedforward neural network (IFFNN) with an optimization algorithm. The input variables of the IFFNN included hourly influent water quality parameters, influent flow rate, and WWTP process monitoring and operational parameters, together with historical effluent water quality parameters. The model was demonstrated at a WWTP in Jiangsu Province, China, where prediction of effluent chemical oxygen demand (COD) and total nitrogen (TN) with large variations was tested. Relative to the traditional feedforward neural network (FFNN) model, which does not take historical effluent water quality parameters as input, the IFFNN enhanced prediction performance by 52.3% (COD) and 72.6% (TN) based on the mean absolute percentage errors of the test datasets, after its model structure was optimized with a genetic algorithm (GA). The IFFNN also mitigated over-fitting, with the coefficient of determination increasing from 0.20 to 0.76 for the test datasets of effluent COD. The GA-IFFNN model, which was efficient in capturing complex non-linear relationships and in extrapolation, could be a useful tool for real-time direction of regulatory changes in WWTP operations.


Introduction
Wastewater treatment plants (WWTPs) are an important part of urban water infrastructure in pollutant reduction and public health protection. Currently WWTPs are facing increasingly stringent requirements on effluent quality, energy consumption, and resource recycling [1][2][3][4][5]. To meet these requirements, mathematical models have increasingly been employed to predict the efficiency of the WWTP and then provide suitable operation strategies, by establishing a quantitative response of WWTP influent parameters to effluent water quality.
Over the past decades, mechanistic models established with a variety of configurations for carbon, nitrogen and phosphorous removal have been the mainstream. They target either the biological processes (to achieve the treatment quality target) or various aspects of engineering (to achieve cost effective design and operation) [6,7]. Activated sludge modeling (ASM) represents an important milestone in the modeling of biological treatment processes [8,9]. Since the introduction of the first version, i.e., ASM1 [10], various versions of such dynamic models have been developed by assimilating the developments in the understanding of wastewater treatment processes [11][12][13]. However, mechanistic models usually focus on the biological treatment processes instead of the whole treatment processes, and determination of the stoichiometric and kinetic parameters of treatment processes is also complex and differs among different WWTPs [14][15][16][17]. In addition, the same WWTP may comprise different treatment technologies corresponding to different construction stages, which further increases the complexity of using mechanistic models. Alternatively, researchers have tried to apply machine learning (ML) for wastewater treatment to take advantage of ML's ability to assimilate and analyze large amount of data for prediction and decision-making.
With the development of the ML approach, the number of studies applying ML to water systems and wastewater treatment processes has increased in recent years [18][19][20]. ML models can quickly generate simulation results by using the correlations among data. Regression analysis, such as multiple linear or non-linear regression and multivariate adaptive regression splines, has been used in many cases for predicting continuous values, as it can find the function parameters that minimize the sum of squared errors between the predicted and actual values [21]. There are also other algorithms for both regression and classification, which include support vector machine, decision tree, random forest and new hybrid models [22,23]. Aside from the above algorithms, the artificial neural network (ANN), especially the feedforward back-propagation neural network (FFNN in short), is the most widely used method in the prediction of effluent water quality due to its fast operation speed and good fitting effect for non-linear problems [18,19][24][25][26][27][28][29]. These ANN models have also been applied for predicting the effluent quality of processes including a membrane bioreactor [30], a sequential batch reactor [31], an anoxic/oxic system [32][33][34] and aerobic granular sludge reactors [35]. There was also a newly developed FFNN learning algorithm, i.e., the extreme learning machine, which demonstrated good real-time performance and high prediction accuracy for the measurement of the biological oxygen demand of the effluent [36]. In addition to the typical single ML approaches mentioned above, the use of hybrid models is also of increasing interest. For example, the genetic algorithm (GA), an evolutionary algorithm, uses Darwin's theory to model the natural evolutionary process to achieve the minimum or maximum of an objective function [37,38]. It was reported that, through combined methods, the ANN-GA could provide higher accuracy and lower error [39][40][41][42][43].
Although ML has shown great promise for minimizing complexities and complications in wastewater treatment modeling and data analysis, there are still limitations in current research on the prediction of WWTP effluent water quality. Firstly, many ML technologies were employed using analytical data from controllable scenarios (e.g., laboratory-produced data) to simulate, predict and optimize pollutant removal in wastewater treatment processes. In real scenarios, the operation of the WWTP faces greater uncertainty, e.g., stochastic WWTP influent, including sudden shock loading from untreated wastewater discharged in large quantities during short periods. Recently there were studies that employed feedforward artificial neural networks to predict the effluent quality of actual WWTPs based on monthly mean values or daily values of water quality parameters [31,33,34,44]. However, the composite nature of these data would still limit the practical application of ML models from the perspective of on-line WWTP prediction and control. The lack of access to data at smaller time intervals (e.g., hourly data), which are needed to capture the dynamic variations of the process variables, is a limitation of these studies. It is anticipated that ML could make use of a large amount of online data to become more user-friendly and perform more accurately in practical non-steady WWTP operation with potentially large variations. Secondly, the traditional feedforward neural network has been used as a model that predicts WWTP effluent as a response to WWTP influent. An alternative forecasting approach that incorporates historical WWTP effluent as input may provide a representation of dynamic memory to enhance the efficiency of learning for future direction. Thirdly, optimization algorithms can be further employed to better simulate complex process correlations by evolving adaptive neural network structures. In this way, the hybridization of single methods is anticipated to enhance the predictive performance of the neural network.
To tackle these challenges, a hybrid model based on a dynamic feedback ML model coupled with an optimization algorithm was developed to simulate and predict wastewater treatment operations. The hybrid model was demonstrated in a WWTP in Jiangsu Province. First, a series of WWTP on-line influent and process controlling variables, as well as on-line effluent variables, was incorporated to form an ANN structure composed of processing elements and artificial neurons connected by links of variable weight to form black box representations of the treatment systems. The optimization algorithm was then introduced to optimize the neural network structure, and a robust ML model featuring the best prediction accuracy was provided. Finally, the modeling performance for real-time prediction of WWTP effluent water quality was probed.
The paper is organized as follows: In Section 2, the study site and on-line database are described, together with the structure and methodology of the GA-IFFNN presented herein. The results and discussion are given in Section 3, where the GA-IFFNN is compared with the traditional ANN approach to demonstrate the ability of the developed GA-IFFNN to predict real-time effluent water quality with large variations. A final summary and a perspective on future work are given in Section 4.

Study Site Description
The study site was a WWTP in Suzhou City, Jiangsu Province, China. The selected WWTP serves a population of 1,200,000 and treats up to 180,000 m³/day of municipal wastewater. The plant was constructed in two phases with design treatment capacities of 100,000 m³/day and 80,000 m³/day, respectively.
As shown in Figure 1, the wastewater treatment process consisted of pre-treatment (screens), a grit chamber and an activated sludge system, which included anaerobic, anoxic and aerobic tanks, with a hydraulic retention time of 18.2 h. The biological treatment system was followed by a secondary clarifier and then treated with flocculation, sedimentation, sand filtration and chlorination before discharge as the final effluent.
Water 2022, 14, x FOR PEER REVIEW

On-Line Data Source
Data used to develop the machine learning method for predicting effluent water quality were obtained from on-line instruments at the WWTP. Generally, the on-line data can be divided into three categories: WWTP influent flow and water quality data, WWTP effluent water quality data, and treatment process monitoring and equipment operational data.
For the WWTP influent, there were eight on-line parameters: stage I inflow, stage II inflow, total outflow and influent water quality data for chemical oxygen demand (COD), ammonia (NH3-N), total nitrogen (TN), total phosphorus (TP) and pH. For the WWTP effluent, there were four on-line water quality parameters: COD, NH3-N, TN and TP. The on-line data collection interval for both WWTP influent and effluent was 1 h. Statistics of the WWTP influent and effluent water quality data for the year 2019 are shown in Table 1. For the monitoring of the WWTP treatment processes, there were 36 on-line monitoring devices: 6 dissolved oxygen (DO) meters, 4 oxidation-reduction potential (ORP) meters and 3 mixed liquor suspended solids (MLSS) meters in the biological reaction tanks of phase I, and 12 DO meters, 7 ORP meters and 4 MLSS meters in the biological reaction tanks of phase II. There were also 42 on-line operational control devices: 8 internal reflux controlling valves in the biological reaction tanks, 5 air blowers, 1 total air blower controlling valve, 14 flocculation mixers and 14 sand filter effluent valves. The operational data of all of the above devices were collected in real time at an interval of 1 h.

Motive of Model Structure Design
Commonly, the modeling of pollutant removal through a WWTP is based on the completely mixed reactor model with first-order kinetics:

$$\frac{dc}{dt} + \lambda c = \frac{W(t)}{V} \qquad (1)$$

where $c$ is the effluent concentration; $V$ is the volume of the system; $W(t)$ is the real-time influent mass loading, taken as the product:

$$W(t) = Q_{in}(t)\, C_{in}(t) \qquad (2)$$

where $Q_{in}(t)$ is the volumetric flow rate entering the simulated system and $C_{in}(t)$ is the influent concentration as a function of time; $\lambda$ is the eigenvalue or characteristic value that represents mass loss related to WWTP outflow and biological treatment processes. During simulation, the first derivative of $c$ with respect to $t$ can be approximated using a forward difference by

$$\frac{dc}{dt} \approx \frac{c_{i+1} - c_i}{t_{i+1} - t_i} \qquad (3)$$

where $c_i$ and $c_{i+1}$ are the simulated effluent concentrations at present and at a future time, $t_i$ and $t_{i+1}$, respectively. Substituting Equation (3) into Equation (1) finally yields

$$c_{i+1} = c_i + \left[\frac{W(t_i)}{V} - \lambda c_i\right]\left(t_{i+1} - t_i\right) \qquad (4)$$

Equation (4) indicates that effluent water quality at a future time is related to both the influent mass loading and effluent water quality at the present time. Based on traditional FFNN design, the influent mass loading ($W(t)$) and treatment process parameters (generalized as $\lambda$) are treated as the input layer and effluent water quality is regarded as the output layer. However, the influence of current and historical effluent water quality on future water quality cannot be incorporated into the FFNN. To overcome this problem, we designed an improved feed-forward neural network (IFFNN) that further introduced historical WWTP effluent parameters into the ANN input variables, as shown in Figure 2. In this way, the multi-layer perceptron ANN enabled the model to capture the memory of the WWTP operation for future directions in real-time prediction, and it would be efficient to model highly dynamic systems with time series [45]. Depending on the data collection frequency of on-line sensors (1 h data sampling frequency in this case), the model could perform the prediction in a flexible way.

Compared to the recurrent neural network model featuring a recurrent hidden state whose activation at each time is dependent on that of the previous time [46,47], the IFFNN developed herein permits the detection of temporal dependences more easily by directly incorporating the previous on-line effluent measurements as inputs.

Generally, the ANN model comprises one or two hidden layers. An increase of hidden layers could reduce predictive accuracy due to overfitting [48]. Therefore, an ANN model with one input layer with multiple input parameters, two hidden layers and one output layer with one neuron (i.e., one designated effluent water quality indicator) was adopted. The number of neurons and the most appropriate activation functions in the hidden layers were first specified by the user and then optimized by the optimization algorithm to achieve the best fit between modeled and measured output data.
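The forward-difference update behind Equation (4) can be illustrated with a minimal Python sketch. It assumes the completely mixed reactor balance takes the standard form dc/dt = W(t)/V − λc; the function name `euler_effluent` and all numerical values (flow, volume, loss rate, influent concentrations) are hypothetical and are not data from the study plant.

```python
def euler_effluent(c0, q_in, c_in, volume, lam, dt):
    """Apply c[i+1] = c[i] + (W[i] / V - lam * c[i]) * dt step by step.

    W[i] = q_in[i] * c_in[i] is the influent mass loading at step i.
    """
    c = [c0]
    for q, cin in zip(q_in, c_in):
        w = q * cin  # influent mass loading W(t_i)
        c.append(c[-1] + (w / volume - lam * c[-1]) * dt)
    return c


# Hypothetical hourly series: the influent concentration doubles after 24 h
# to mimic a shock load (illustrative numbers only).
q_in = [7000.0] * 48                  # m3/h
c_in = [300.0] * 24 + [600.0] * 24    # mg/L
series = euler_effluent(c0=30.0, q_in=q_in, c_in=c_in,
                        volume=130000.0, lam=0.5, dt=1.0)
```

Each new value depends on both the current loading and the current effluent concentration, which is the dependence that motivates feeding effluent measurements back into the network as inputs.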

Modeling Methodology

IFFNN Module
Let $A$ denote the set of indices $i$ for which $a_i(t)$ is an external input at moment $t$, and $B$ denote the set of indices $i$ for which $b_i(t-k)$, $k = 0, 1, \ldots, n$, is the ANN output of the designated parameter at moment $t, t-1, \ldots, t-n$, respectively, in the network. We thus have

$$\mu_i(t) = \begin{cases} a_i(t), & i \in A \\ b_i(t-k), & i \in B \end{cases} \qquad (5)$$

where $\mu_i(t)$ denotes the $i$-th input element. The activity of neuron $j$ at moment $t$ is computed by

$$net_j(t) = \sum_{i \in A \cup B} w_{ji}\, \mu_i(t) \qquad (6)$$

where $w_{ji}$ is the weight of the link between the $i$-th input element and neuron $j$. The output of neuron $j$ is given by passing $net_j(t)$ through the activation function $f(\cdot)$, yielding

$$o_j(t) = f\left(net_j(t) - \theta\right) \qquad (7)$$

where $\theta$ is the threshold that controls the output of the current neuron. In Equation (7), a variety of candidate activation functions were used to complete the non-linear transfer from input parameters to output data. Specifically, seven activation functions, namely the Relu, Softmax, Softplus, Softsign, Relu6, Tanh and Sigmoid functions, were employed in our study.
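As a hedged illustration of the neuron computation described above, the following Python sketch forms the weighted sum of the input elements, subtracts the threshold and applies one of the candidate activation functions. The dictionary `ACTIVATIONS`, the helper `neuron_output` and all numbers are assumptions made for this example; subtracting the threshold inside the activation is also an assumed convention.

```python
import math

# Scalar activations from the candidate list. Softmax is omitted because it
# acts on a whole layer of values rather than on a single neuron.
ACTIVATIONS = {
    "Relu": lambda x: max(x, 0.0),
    "Relu6": lambda x: min(max(x, 0.0), 6.0),
    "Softplus": lambda x: math.log1p(math.exp(x)),
    "Softsign": lambda x: x / (1.0 + abs(x)),
    "Tanh": math.tanh,
    "Sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def neuron_output(mu, weights, theta, activation):
    """Weighted sum of the inputs, shifted by the threshold, then activated."""
    net = sum(w * u for w, u in zip(weights, mu))
    return ACTIVATIONS[activation](net - theta)


# Hypothetical inputs: two external inputs (set A) and one past effluent
# value (set B), concatenated into a single input vector.
a_t = [1.0, 2.0]
b_t = [4.0]
mu_t = a_t + b_t
out = neuron_output(mu_t, weights=[0.5, 0.25, 0.5], theta=0.5,
                    activation="Relu6")
```

Here the weighted sum is 3.0, so the Relu6 output after the threshold shift is 2.5; a much larger weighted sum would be clipped at 6.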
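The performance of the established IFFNN was assessed using the mean absolute percentage error (MAPE) and the coefficient of determination (R²), as described below [49,50]:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (8)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \qquad (9)$$

where $\bar{y}$ is the average of $y$ over the $n$ data, $\hat{y}_i$ is the time series of the WWTP effluent water quality data predicted by the IFFNN model and $y_i$ is the measured time series of the WWTP effluent data.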
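The two performance indicators can be sketched in plain Python as follows. The sketch assumes the conventional definitions of MAPE and R²; the function names and the four sample values are illustrative only and are not measurements from the plant.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    n = len(y_true)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted effluent COD values (mg/L).
measured = [20.0, 25.0, 40.0, 50.0]
predicted = [22.0, 25.0, 36.0, 45.0]
err = mape(measured, predicted)       # 7.5 (%)
fit = r_squared(measured, predicted)  # about 0.92
```

Note that MAPE is undefined when a measured value is zero, so such records would need to be excluded or handled separately.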

IFFNN Structure Optimization
The search for the best IFFNN structure was based on GA. As an adaptive optimization algorithm based on "survival of the fittest", GA starts to optimize the problem with an initial population consisting of N chromosomes or individuals. Through the evolution of chromosome groups from generation to generation, including reproduction, crossover and mutation, the algorithm converges to the individual with the best performance for model prediction. Specifically, the optimization steps of GA are performed as follows:
Step 1: Generation of the initial population. Figure 3a gives an initial population with N designated individuals or chromosomes, and each individual is a combination of four optimization parameters: the number of nodes in the first hidden layer, the number of nodes in the second hidden layer and the activation function in each of the two hidden layers. For the activation function in the first and second layer of each individual, the numbers 0~6 represent the Relu, Softmax, Softplus, Softsign, Relu6, Tanh and Sigmoid functions, respectively.
Step 2: Population evolution. The tournament method runs a tournament between two individuals chosen at random from the population and selects the winner in accordance with their fitness values, which in this case were based on the MAPE as shown in Equation (8). The winner with better fitness is left intact within the genotype array, enabling the presumptive good genes to be transferred to the next generation (see Figure 3b). The loser with poorer fitness is subsequently modified by crossover and mutation.
Step 3: Crossover and mutation. Crossover is applied to two parent individuals and produces a new individual that contains the combined traits of the parents (see Figure 3b). In the mutation stage, new individuals are produced by changing all or some genes of the selected individuals within the population. Mutation is applied to the offspring generated by crossover, with a mutation probability that is usually set to a low value.
Steps 2 and 3 were repeated until one individual's fitness was equal to 1 or until a predefined maximum number of generations was attained.
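Steps 1 to 3 can be sketched in Python as below. This is only an illustrative sketch: in the study the fitness of an individual comes from training an IFFNN and computing its MAPE, whereas the function `surrogate_mape` here is a hypothetical stand-in so that the example runs quickly. The neuron range of 20 to 200 in steps of 10 follows the settings reported later in the paper, while the population size, number of generations, mutation probability and uniform crossover are assumptions chosen for the toy problem, and the loop stops only on the generation limit.

```python
import random

ACTIVATIONS = ["Relu", "Softmax", "Softplus", "Softsign",
               "Relu6", "Tanh", "Sigmoid"]   # encoded as genes 0..6
NEURON_CHOICES = list(range(20, 201, 10))    # 20, 30, ..., 200


def random_individual(rng):
    """Chromosome: [nodes layer 1, nodes layer 2, activation 1, activation 2]."""
    return [rng.choice(NEURON_CHOICES), rng.choice(NEURON_CHOICES),
            rng.randrange(len(ACTIVATIONS)), rng.randrange(len(ACTIVATIONS))]


def surrogate_mape(ind):
    """Hypothetical stand-in for 'train an IFFNN, return its MAPE (%)'."""
    n1, n2, a1, a2 = ind
    return 5.0 + abs(n1 - 120) / 20 + abs(n2 - 60) / 20 + (a1 != 5) + (a2 != 4)


def mutate_gene(position, rng):
    """Resample one gene from its own admissible range."""
    if position < 2:
        return rng.choice(NEURON_CHOICES)
    return rng.randrange(len(ACTIVATIONS))


def evolve(pop_size=20, generations=15, p_mut=0.05, seed=0,
           fitness=surrogate_mape):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]        # Step 1
    for _ in range(generations):
        for _ in range(pop_size):
            i, j = rng.sample(range(pop_size), 2)                  # Step 2
            if fitness(pop[i]) > fitness(pop[j]):
                i, j = j, i            # i is the winner (lower MAPE)
            winner, loser = pop[i], pop[j]
            # Step 3: uniform crossover of the two parents, then mutation.
            child = [w if rng.random() < 0.5 else l
                     for w, l in zip(winner, loser)]
            child = [mutate_gene(k, rng) if rng.random() < p_mut else g
                     for k, g in enumerate(child)]
            pop[j] = child             # the loser is replaced, the winner kept
    return min(pop, key=fitness)


best = evolve()
```

Because the winner of each tournament is never modified, the best fitness in the population cannot get worse from one tournament to the next.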

Datasets for ANN Modeling
On-line data in 2019 were used to perform ML prediction for WWTP effluent water quality. Specifically, 1700 groups of data were extracted, where cases of non-steady large variations were included. In this way, the modeling robustness to predict effluent water quality in response to WWTP influent with significant variations could be probed.
These datasets were divided into two subsets, namely training and testing data. Previous studies showed that a 60% training share might not be representative enough of the whole dataset [51,52]; however, the performance of the ANN model increased markedly when more training data were fed, with stronger capability and robustness in prediction and extrapolation when 70~89% of the data were used for training [48]. Apart from a simple percentage cut, there is also the method of k-fold cross validation [53]. Taking 10-fold cross validation as an example, the dataset is divided into 10 parts and each 10% of the data is used for testing in turn, so that the program runs 10 times. The modeling performance is then evaluated based on the averaged outcomes of the 10 simulations, measured by indicators such as MAPE and R². The k-fold cross validation method is especially efficient for smaller samples, e.g., daily averaged WWTP influent and effluent data. However, considering that hourly WWTP influent and effluent data were available in our case, 80% of the datasets were directly used for training, and the remaining 20% of data points were employed for testing purposes. Specifically, the 20% test data points (i.e., 340 data points) were chosen pseudorandomly based on the linear congruential method [54].
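A pseudorandom 80/20 split driven by a linear congruential generator can be sketched as follows. The multiplier, increment and modulus are the widely published "Numerical Recipes" constants and are an assumption here, as are the seed and the function names; the paper does not state which generator parameters were used.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x[k+1] = (a * x[k] + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def split_indices(n_samples, test_fraction, seed=2019):
    """Shuffle sample indices with the LCG and cut off the test share."""
    gen = lcg(seed)
    idx = list(range(n_samples))
    # Fisher-Yates shuffle; the higher-order bits are used because the
    # low-order bits of a power-of-two LCG have short periods.
    for i in range(n_samples - 1, 0, -1):
        j = (next(gen) >> 16) % (i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    n_test = round(n_samples * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


train_idx, test_idx = split_indices(1700, 0.2)
```

With 1700 records and a test fraction of 0.2, this yields 1360 training indices and 340 test indices with no overlap.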

Modeling Prediction Performance between FFNN and IFFNN
Two separate ANN models for effluent COD and TN were established and trained, considering that these two effluent parameters exhibited large real-time variations and that effluent COD even had the potential for over-limit discharge. The two models were developed using the FFNN and IFFNN structures separately. For both models, the activation function initially employed for the two hidden layers was the Relu6 function.
The neural network was initially designed based on an "n-n-n-1" structure, indicating n input variables, n neurons in each of the first and second hidden layers, and one output variable for the designated effluent parameter. For the IFFNN model, 90 parameters were employed as input variables: three inflow rates (i.e., total inflow and the inflows of the phase I and phase II WWTP), 78 on-line treatment process monitoring and equipment operational parameters, 5 influent water quality parameters (COD, NH3-N, TN, TP and pH) at the current time step t and 4 effluent water quality parameters (COD, NH3-N, TN and TP) at time step t. For the traditional FFNN model, the 4 effluent water quality parameters at time step t were not included, leaving 86 input variables. Thus, the adopted neural network structures for the FFNN and IFFNN were "86-86-86-1" and "90-90-90-1", respectively.
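The difference between the FFNN and IFFNN input vectors can be illustrated with a short Python sketch. It assumes a one-step-ahead target (the designated effluent parameter at time t+1), which the text implies but does not state explicitly; the function `build_samples` and the zero-filled placeholder arrays are hypothetical.

```python
def build_samples(inflow, process, influent, effluent, target_col,
                  use_history=True):
    """Assemble (input, target) pairs from hourly records.

    Inputs at time t: 3 inflow rates + 78 process/operational values
    + 5 influent quality values, plus 4 effluent values if use_history.
    Target: the designated effluent parameter at time t + 1 (assumed).
    """
    X, y = [], []
    for t in range(len(effluent) - 1):
        row = list(inflow[t]) + list(process[t]) + list(influent[t])
        if use_history:                 # IFFNN: feed back effluent at time t
            row += list(effluent[t])
        X.append(row)
        y.append(effluent[t + 1][target_col])
    return X, y


# Placeholder data for 5 hourly records (zeros, for shape checking only).
hours = 5
inflow = [[0.0] * 3 for _ in range(hours)]
process = [[0.0] * 78 for _ in range(hours)]
influent = [[0.0] * 5 for _ in range(hours)]
effluent = [[0.0] * 4 for _ in range(hours)]

X_iffnn, y_iffnn = build_samples(inflow, process, influent, effluent, 0)
X_ffnn, y_ffnn = build_samples(inflow, process, influent, effluent, 0,
                               use_history=False)
```

The resulting rows have 90 entries for the IFFNN and 86 for the FFNN, matching the "90-90-90-1" and "86-86-86-1" input widths.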
A comparison of real-time modeling performance based on the FFNN and IFFNN is summarized in Figures 4 and 5 and Table 2. Generally, the lower relative error and higher R² throughout the data groups, for both modeled parameters (COD and TN), indicate the IFFNN's robustness. Taking the test datasets as the measure, the MAPE of the FFNN was 22.0% and 8.4% for effluent COD and TN, respectively, while the MAPE of the COD and TN predicted by the IFFNN was 16.8% and 4.8%, respectively. The performance of IFFNN modeling thus improved by 23.6% and 42.9% for COD and TN, respectively, compared to that of the basic FFNN. Based on the test datasets, R² increased from 0.20 to 0.47 for effluent COD and from 0.70 to 0.89 for effluent TN after the IFFNN was employed.
The IFFNN models also demonstrated their robustness in predicting effluent under large variations, such as shock loading from illicit industrial sewage discharge into sewers over a short period, which would be helpful for enhancing the preparedness of the WWTP operation or adjusting the urban sewer network in advance to level off influent pollutant loading. The IFFNN seems to predict the high values better than the FFNN. For example, if an effluent COD concentration of 40 mg/L is set as the risk value for potentially abnormal cases, the performance of the IFFNN model for predicting effluent COD under abnormal cases improved by 43.3% compared to that of the basic FFNN when measured using MAPE. Under this scenario, R² increased from 0.65 to 0.85 due to the introduction of the IFFNN. As demonstrated in Figure 5a, for the effluent over-limit discharges, the COD concentration predicted by the IFFNN was more likely to approximate the truth; by comparison, the output predicted by the FFNN tended to be under-estimated, which is associated with the lack of a historical memory of the WWTP effluent, especially under abnormal influent conditions. Therefore, potential future over-limit discharge could be missed by the FFNN. By contrast, the IFFNN could present a reasonable prediction. In this way, the IFFNN is superior to the FFNN for determining the WWTP's proactive actions to tackle potentially abnormal cases.
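The improvement percentages quoted for the test datasets can be checked with a short calculation. The sketch assumes that "improvement" means the relative reduction in MAPE, which reproduces the reported 23.6% and 42.9%; the function name is hypothetical.

```python
def relative_improvement(mape_base, mape_new):
    """Relative reduction of MAPE, in percent of the baseline MAPE."""
    return 100.0 * (mape_base - mape_new) / mape_base


cod_gain = relative_improvement(22.0, 16.8)  # FFNN -> IFFNN, effluent COD
tn_gain = relative_improvement(8.4, 4.8)     # FFNN -> IFFNN, effluent TN
```

Rounded to one decimal place, `cod_gain` is 23.6 and `tn_gain` is 42.9.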

Modeling Prediction Performance Based on GA-IFFNN
The developed IFFNN model was then optimized with GA (i.e., GA-IFFNN), with the objective of further improving the predicting efficiency for the WWTP effluent water quality. In this case, the minimum and maximum number of artificial neurons in the two hidden layers of the IFFNN were set to 20 and 200, respectively. A random pick was made between the minimum and maximum neurons with a designated interval of 10. The seven above-mentioned candidate activation functions were used. The population size, the number of generations and the probability of mutation of GA were set to 200, 20 and 0.001, respectively. In this way, the GA solved the optimization problem by starting from a random solution and looking for the optimal solution (i.e., optimized FFNN structure) through iteration.
During the self-optimizing process, 20 generations and a corresponding 4000 runs were performed to reach an acceptable goodness of fit between the predicted time series data and the measured data at the WWTP outlet. For each generation, the optimized IFFNN structures were determined using the training datasets, and the prediction performance of the GA-IFFNN was then verified using the separate test datasets. Figure 6 presents the modeling performance under different generations and associated populations using MAPE as the performance indicator. It shows that the MAPE drops rapidly as the number of generations increases. Generally, for both predicted effluent indicators, once the number of generations reached 15, the MAPE fell into a small range, and a further increase in generations could not reduce the MAPE, suggesting that the model optimization was achieved.
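The size of the structure search space and the number of GA evaluations follow from the settings above, as the short calculation below shows. It assumes that "a designated interval of 10" means the neuron counts are drawn from 20, 30, ..., 200; the variable names are illustrative.

```python
neuron_choices = list(range(20, 201, 10))  # assumed grid: 20, 30, ..., 200
n_activations = 7                          # candidate activation functions

# Two hidden layers, each with its own neuron count and activation function.
n_structures = (len(neuron_choices) * n_activations) ** 2

population_size = 200
n_generations = 20
n_runs = population_size * n_generations   # model trainings performed by GA
```

Under this assumption there are 17,689 candidate structures, so the 4000 runs evaluate at most about 23% of them, which is why a guided search is used rather than exhaustive enumeration.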
During the self-optimizing process, 20 generations and a corresponding 4000 runs were performed to reach an acceptable goodness of fit between the predicted time series data and the data measured at the WWTP outlet. For each generation, the optimized IFFNN structures were determined using the training datasets, and the prediction performance of GA-IFFNN was then verified using separate test datasets. Figure 6 presents the modeling performance under different generations and associated populations using MAPE as the performance indicator. It shows that the MAPE drops rapidly as the number of generations increases. Generally, for both predicted effluent indicators, once the number of generations reached 15, the MAPE fell into a narrow range, and a further increase in generations could not reduce the MAPE, suggesting that the model optimization had been achieved.
The optimized IFFNN structures, including the optimized numbers of neurons in the first and second hidden layers and the associated activation functions, are presented in Table 2. Different water quality parameters correspond to different optimized IFFNN structures, which may be related to the different treatment processes for pollutant removal. For example, removal of TN involves anaerobic-anoxic-aerobic reaction tanks, while the treatment of COD is further enhanced by the subsequent flocculation-sedimentation processes. Table 2 demonstrates that, compared with the IFFNN, the GA-IFFNN model performed better, with a lower prediction error.
Measured by MAPE, the prediction performance of GA-IFFNN improved by a further 22.6% and 45.8% over that of IFFNN for the effluent COD and TN, respectively. The R² between the predicted and observed values of effluent COD and TN improved from 0.47 to 0.63 and from 0.89 to 0.97, respectively, using GA-IFFNN as compared to IFFNN. As a result, the possibility of over-fitting further decreased for effluent COD prediction. In particular, the predicted and measured TN concentrations in the WWTP effluent discharge were almost identical, showing that the GA-IFFNN model developed here would be a robust ML tool for predicting the real-time WWTP effluent water quality of conventional pollutants.
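For readers who wish to reproduce such comparisons, the two indicators can be computed as in the following sketch. It assumes that R² denotes the coefficient of determination, 1 − SS_res/SS_tot, and that the reported percentage improvements are relative reductions in MAPE. Both are assumptions, since the defining equations lie outside this section. The function names and sample values are illustrative and are not taken from the study.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_improvement(mape_old, mape_new):
    """Percentage reduction in MAPE from the old model to the new one."""
    return 100.0 * (mape_old - mape_new) / mape_old


# Made-up hourly effluent COD values (mg/L), for illustration only.
observed = [20.0, 25.0, 40.0, 50.0]
predicted = [22.0, 25.0, 36.0, 45.0]

print(mape(observed, predicted))
print(r_squared(observed, predicted))
print(relative_improvement(20.0, 15.0))
```

Because MAPE divides by the observed value, it is undefined when an observation is zero. Such records would need to be excluded or handled separately.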
Furthermore, to test the robustness of the optimized IFFNN model, the prediction performance was examined as the number of artificial neurons in the hidden layers and the number of backward hours of effluent data input were increased. The activation functions in the first and second hidden layers remained the same as those of GA-IFFNN in Table 2. The number of artificial neurons in the hidden layers ranged from 200 to 800, and the backward hours for effluent data input ranged from 1 to 25 h (i.e., at moments t, t − 1, …, t − 24 in Equation (5)). The MAPE and R² of the test datasets for predicted COD and TN under different neural network complexities are presented in Figures 7 and 8, respectively. These two figures exhibit a fluctuating trend in modeling performance, which is associated with the high non-linearity of the ANN structure.
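The role of the backward hours can be made concrete with a short sketch of how the input rows might be assembled. It assumes a one-hour-ahead target and hourly records. The exact form of Equation (5) is not reproduced in this section, so the horizon, the function name `build_lagged_inputs`, and the sample data are all hypothetical.

```python
def build_lagged_inputs(process_inputs, effluent, backward_hours):
    """Pair each target effluent[t + 1] with the process inputs at hour t
    and the last `backward_hours` effluent readings (t, t-1, ...)."""
    rows, targets = [], []
    # Start at the first hour that has a full effluent history behind it.
    for t in range(backward_hours - 1, len(effluent) - 1):
        history = [effluent[t - k] for k in range(backward_hours)]
        rows.append(list(process_inputs[t]) + history)
        targets.append(effluent[t + 1])  # assumed one-hour-ahead target
    return rows, targets


# Made-up data: two process variables per hour and six hourly effluent values.
process_inputs = [[i, 10 * i] for i in range(6)]
effluent = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

rows, targets = build_lagged_inputs(process_inputs, effluent, backward_hours=3)
print(rows[0], targets[0])
```

Each additional backward hour adds one input column and removes one usable training row, so longer histories enlarge the network input while slightly shrinking the dataset.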
For all the combination scenarios in Figure 7, the lowest MAPE and highest R² of the test datasets were 10.5% and 0.76, respectively, with an improvement of 2.5% and 0.13, respectively, compared to the measurements of the originally optimized IFFNN. The best modeling structure for effluent COD prediction is presented in Table 2, which corresponds to 800 artificial neurons in the two hidden layers and backward 25 h for effluent data input. For all the combination scenarios in Figure 8, the best IFFNN structure for effluent TN is also demonstrated in Table 2, with the effluent data input of 25 h and 200 neurons in the two hidden layers. However, very slight improvement was found considering that the originally optimized IFFNN structure coincided well with the truth. Generally, for the real-time prediction of WWTP effluent water quality with potentially large variations, such as effluent COD prediction herein, an MAPE on the order of 10% could be the best modeling performance to approximate the real outputs. Under this case, relative to the traditional FFNN model, the GA-IFFNN enhanced the prediction performance of effluent COD by 52.3% when measured based on MAPE. The problem of over-fitting could also be overcome significantly through the use of the GA-IFFNN, with the R² increasing from 0.20 to 0.76 for test datasets of effluent COD.
Additionally, the computation time for effluent COD prediction using the originally optimized IFFNN and best IFFNN was 43 and 118 s, respectively (Intel i5-8250U CPU, 1.80 GHz), which did not indicate an obviously increased computation burden in the latter case.

Conclusions
To provide highly effective real-time prediction of WWTP effluent water quality, an ML model based on IFFNN coupled with GA was developed. Compared to the traditional FFNN, which processes the input assuming no dependency on historical output, the IFFNN incorporated not only current inputs but also recent past outputs from on-line measurements to produce new outputs. Although GA-IFFNN did not depend on the use of an internal recurrent hidden state (e.g., as in a recurrent neural network), it was able to model the high non-linearity of actual WWTPs and capture the dynamics of the WWTP in response to widely varying influent properties.
An application in one WWTP with a treatment capacity of 180,000 m³/day in Jiangsu Province, China, demonstrated the effectiveness of the model developed in this study. Measured by MAPE, the GA-IFFNN model improved the prediction performance by 52.3% and 72.6% for the test datasets of COD and TN, respectively, relative to the traditional FFNN model. In particular, the problem of over-fitting could be overcome significantly through the use of IFFNN (R² increased from 0.20 to 0.76 for effluent COD prediction), and the GA-IFFNN modeling output for TN was almost identical to the monitored data. Compared with the optimized IFFNN model, further increasing model complexity by adding artificial neurons in the hidden layers and extending the historical WWTP effluent data input did not necessarily improve prediction performance.
Furthermore, under the scenario of sudden WWTP overloading, the application of the presented ML technologies could be strengthened by regulating the inflow rate and concentration in a timely manner (e.g., regulation of the urban sewer network through distributed operation of mid-way sewage pumping stations to level off influent shock loading), so as to give operators the opportunity to ensure that wastewater discharge standards are met. In this way, the developed model would help address pollutant removal adequately and support safe operation of the WWTP as well.
The limitation of this study is that the current IFFNN model lacks an optimal control module to minimize WWTP operational costs in real time, while satisfying the effluent requirements. Thus, future work involves developing an ML model that further combines the current IFFNN with an optimal control algorithm, which will enable its application to both real-time prediction of effluent water quality and WWTP advanced control. Such a model would help adequately address smart WWTP operation and achieve economic benefits in a coordinated way.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the authors, though some of the data may be held by third parties and permission would need to be sought to obtain those data.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.