Inlet Water Quality Forecasting of Wastewater Treatment Based on Kernel Principal Component Analysis and an Extreme Learning Machine

The stable operation of sewage treatment is of great significance to controlling regional water environment pollution. It is also important to forecast the inlet water quality accurately, which may ensure the purification efficiency of sewage treatment at a low cost. In this paper, a combined kernel principal component analysis (KPCA) and extreme learning machine (ELM) model is established to forecast the inlet water quality of sewage treatment. Specifically, KPCA is employed for feature extraction and dimensionality reduction of the inlet wastewater quality and ELM is utilized for the future inlet water quality forecasting. The experimental results indicated that the KPCA-ELM model has a higher accuracy than the other comparison PCA-ELM model, ELM model, and back propagation neural network (BPNN) model for forecasting COD and BOD concentration of the inlet wastewater, with mean absolute error (MAE) values of 2.322 mg/L and 1.125 mg/L, mean absolute percentage error (MAPE) values of 1.223% and 1.321%, and root mean square error (RMSE) values of 3.108 and 1.340, respectively. It is recommended from this research that the method may provide a reliable and effective reference for forecasting the water quality of sewage treatment.


Introduction
The accumulation of high levels of pollutants in water may cause adverse effects on humans and wildlife [1,2]. Polluted water must therefore be purified by sewage treatment in a timely manner to meet emission standards. However, the operating conditions of the sewage treatment process are subject to random disturbances. Once a problem occurs, it is difficult to restore the water quality to normal in a short time, which greatly affects the water quality of the next treatment phase and may result in serious energy waste. Forecasting the inlet water quality from past observation data helps to adjust the performance parameters and keeps the wastewater treatment plant (WWTP) operating economically and stably. Inlet water quality forecasting is therefore vital for wastewater treatment [3], because it provides advance information for guiding operations efficiently.
Recently, various models for dealing with this issue have been proposed, e.g., artificial neural networks [4,5], auto-regressive integrated moving average [6], data mining [3], multiple regression [7], adaptive recursive least squares [8], support vector machines [9], partial least squares [10], and measured hydraulic dynamics [11]. Among these models, machine learning has attracted much attention because of its advantages [12], e.g., high generalization performance, low computational complexity, fast learning speed, and the universal approximation property [13]. Many machine learning experts have turned to high-level abstractions, which dramatically simplify the design and implementation of a restricted class of parallel algorithms [14]. Machine learning is programming computers to optimize a performance criterion using example data or past experience [15], and the extreme learning machine (ELM) is a successful representative [16]. The ELM is a fast learning algorithm with additive hidden nodes and radial basis function kernels that has been developed for single-hidden layer feed forward networks (SLFNs) [17]. It has shown an excellent predictive performance in various fields because of several salient features: (1) Simple structure: No parameters need to be manually tuned except for the predefined network architecture [18]; (2) Fast learning speed: It can produce a good generalization performance in most cases and learn thousands of times faster than conventional popular learning algorithms for feed forward neural networks [19]; and (3) Wide applicability: Almost all piecewise continuous functions, as well as fully complex functions, can be used as activation functions in the ELM. Therefore, it is an active research topic with multiple extensions and improvements proposed over the last decade [20], with applications to, e.g., horizontal global solar radiation [21], landslide hazard [22], electricity price [23], short-term load forecasting [24], and water quality [25]. In this paper, the ELM is applied to forecast the inlet wastewater quality, and the experimental results are favorable.
To enhance the forecasting accuracy, many data preprocessing methods have been proposed for forecasting models, e.g., principal component analysis (PCA) [26], kernel principal component analysis (KPCA) [27], wavelet transform [28], cluster analysis [29], mode decomposition [30], linear discriminant analysis [31], independent component correlation algorithm [32], and factor analysis [33]. Among these techniques, the PCA and the KPCA are widely used in classification, feature extraction, and de-noising applications [34], and can help reduce the dimensionality of the data and determine the key variables in a multidimensional data set [35,36]. Furthermore, compared with the PCA [37], the KPCA can effectively capture nonlinear characteristics of the data without imposing requirements on the spatial distribution of the original data. The method has been successfully applied in many fields, e.g., process monitoring and fault diagnosis [38], intrusion detection [39], formation drillability prediction [40], and displacement prediction in colluvial landslides [41]. However, the KPCA has rarely been applied to inlet wastewater quality forecasting, so it is introduced for this purpose in this paper.
Therefore, an ELM model combined with the KPCA is proposed for inlet wastewater quality forecasting. The framework of the approach can be divided into two parts: (1) Principal components extraction: The KPCA is introduced as a tool for eliminating correlation among the data and for extracting the principal components; and (2) Forecasting: The ELM is trained to learn and forecast the inlet wastewater quality factors. In this paper, chemical oxygen demand (COD) and biochemical oxygen demand (BOD) are taken as examples, because they are representative parameters for sewer water quality [42][43][44], and the oxygen consumption from the degradation of organic material is normally measured as BOD and COD. The BOD and the COD are therefore selected as the forecasting quality factors in this paper. Furthermore, to validate the performance, the results are compared with those of the PCA-ELM, the ELM, and the back propagation neural network (BPNN).
The remainder of this paper is structured as follows. Section 2 describes the modeling methods of the KPCA, ELM, and proposed KPCA-ELM models. Section 3 illustrates the datasets, the process of sewage treatment, the experimental design, and the performance criteria of the forecasting models. Section 4 presents the results and discussion. Section 5 provides conclusions.

Extracting Principal Components Based on KPCA
The KPCA has successfully extended the PCA to nonlinear cases by mapping the data in the original space into a higher or even infinite dimensional feature space [45]. This mapping technique can increase the amount of information in the data set, particularly if the number of samples is small [46]. The KPCA has already proven to be powerful as a preprocessing step for identification algorithms [47][48][49]. In this section, a brief description of KPCA for feature extraction is provided.
A sample set composed of n observations is represented as $x_k$ ($k = 1, 2, 3, \ldots, n$). Assuming $\varphi$ is a nonlinear mapping into the feature space $F$, the sample covariance matrix $C$ in $F$ should fit the formula [50]:

$$C = \frac{1}{n}\sum_{k=1}^{n} \varphi(x_k)\,\varphi(x_k)^{T} \tag{1}$$

where $\varphi(x_k)$ is the kth sample in the feature space with zero-mean and unit-variance. Let $\Phi = [\varphi(x_1), \cdots, \varphi(x_n)]$ be the data matrix in the feature space, where $\varphi$ is usually hard to obtain. To avoid eigenvalue-decomposing $C$ directly, a Gram kernel matrix $K$ is determined as follows:

$$K = \Phi^{T}\Phi, \qquad K_{ij} = \langle \varphi(x_i), \varphi(x_j) \rangle = k(x_i, x_j) \tag{2}$$

The mean centered kernel matrix can be calculated from

$$\tilde{K} = K - 1_n K - K 1_n + 1_n K 1_n \tag{3}$$

where $1_n$ is the $n \times n$ matrix whose entries all equal $1/n$. By applying eigenvalue decomposition to the centered kernel matrix, as shown,

$$\lambda \alpha = \tilde{K} \alpha \tag{4}$$

one can obtain the orthonormal eigenvectors $\alpha_1, \alpha_2, \cdots, \alpha_n$ and the associated eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The dimension reduction can be achieved by retaining the first p eigenvectors. The score vector of the kth observation in the testing sample data set can be obtained by projecting $\varphi(x)$ onto the eigenvectors $V_k$ in $F$, where $k = 1, \ldots, p$. In the feature space, the nonlinear principal components of the testing sample $x$ can be extracted by [51]:

$$t_k = \langle V_k, \varphi(x) \rangle = \sum_{i=1}^{n} \alpha_i^{k}\, k(x_i, x) \tag{5}$$

The general rule for selecting the principal components is

$$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge E \tag{6}$$

where p is the number of principal components and E is the threshold of principal components. Then, the principal component vector in feature space can be calculated by Equation (5), and the feature information can be obtained and analyzed.
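The KPCA procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the Gaussian (RBF) kernel, its width `gamma`, and the function names `rbf_kernel`, `kpca_fit`, and `kpca_transform` are assumptions introduced here, since the section does not state which kernel was used.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Assumed kernel choice: k(a, b) = exp(-gamma * ||a - b||^2).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit(X, gamma=1.0, threshold=0.95):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)                      # Gram matrix
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n  # mean centering
    eigvals, eigvecs = np.linalg.eigh(K_c)           # ascending order
    eigvals = np.clip(eigvals[::-1], 0.0, None)      # descending, no tiny negatives
    eigvecs = eigvecs[:, ::-1]
    # Smallest p whose cumulative eigenvalue share reaches the threshold E.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    p = min(int(np.searchsorted(ratio, threshold)) + 1, n)
    # Rescale so that the feature-space eigenvectors V_k have unit norm.
    alphas = eigvecs[:, :p] / np.sqrt(eigvals[:p])
    return {"X": X, "K": K, "gamma": gamma,
            "alphas": alphas, "eigvals": eigvals, "p": p}

def kpca_transform(model, X_new):
    # Project new samples onto the retained components (kernel is centered
    # with the training statistics).
    X, K = model["X"], model["K"]
    n, m = X.shape[0], X_new.shape[0]
    K_new = rbf_kernel(X_new, X, model["gamma"])
    one_n = np.full((n, n), 1.0 / n)
    one_m = np.full((m, n), 1.0 / n)
    K_new_c = K_new - one_m @ K - K_new @ one_n + one_m @ K @ one_n
    return K_new_c @ model["alphas"]
```

The threshold argument plays the role of E in Equation (6); with `threshold=0.95` the sketch keeps the smallest number of components whose eigenvalues sum to at least 95% of the total.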

Forecasting Water Quality Based on Extreme Learning Machine
The ELM differs from conventional learning algorithms for feed forward neural networks and overcomes the problems caused by gradient descent-based algorithms such as back propagation (BP) applied in artificial neural networks (ANNs) [52]. Its learning speed and generalization ability are outstanding advantages, both on benchmark data sets and in practical applications [53].
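The ELM is a very simple and fast neural network learning algorithm. For N training samples $(x_j, t_j)$, a single hidden layer feed forward network with L hidden layer nodes using activation function $g(\cdot)$ is given by:

$$\sum_{i=1}^{L} \beta_i\, G(w_i, b_i, x_j) = \sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N \tag{7}$$

where $w_i \in R^{q}$ is the input weight vector connecting the input layer nodes to the ith hidden node, $b_i \in R$ is the bias of the ith hidden node, $\beta_i \in R^{h}$ is the output weight vector connecting the ith hidden node to the output nodes, $G(w_i, b_i, x_j)$ is the output function of the ith hidden node with respect to the input sample $x_j$, $w_i \cdot x_j$ denotes the inner product of column vectors $w_i$ and $x_j$, and $o_j$ is the network output for $x_j$. The standard SLFNs can approximate these N samples with zero error, as follows:

$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0 \tag{8}$$

and there exist $\beta_i$, $w_i$, and $b_i$ such that:

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \tag{9}$$

The above equation can be expressed in matrix form:

$$H\beta = T \tag{10}$$

where H is called the hidden layer output matrix of the neural network [54], and T is the desired output matrix. The network is trained by solving the minimization problem as follows:

$$\min_{\beta} \lVert H\beta - T \rVert \tag{11}$$

For fixed weights $w_i$ and biases $b_i$, one can seek $\beta$ to train SLFNs by solving a least squares linear system. Conventional SLFNs need to find a set of optimal $\hat{w}_i$, $\hat{b}_i$, $\hat{\beta}$ ($i = 1, \ldots, L$) such that

$$\lVert H(\hat{w}_1, \ldots, \hat{w}_L, \hat{b}_1, \ldots, \hat{b}_L)\hat{\beta} - T \rVert = \min_{w_i, b_i, \beta} \lVert H(w_1, \ldots, w_L, b_1, \ldots, b_L)\beta - T \rVert \tag{12}$$

In the ELM, with the input weights and biases fixed, the solution can be expressed as:

$$\hat{\beta} = H^{+} T \tag{13}$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix H.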
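The ELM training rule above (random hidden layer, output weights from the Moore-Penrose generalized inverse) is short enough to write out directly. The following is a minimal sketch, not the code used in the study: the class name `ELMRegressor`, the uniform sampling range for the input weights and biases, and the fixed random seed are assumptions made for illustration.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer network trained with the ELM rule (illustrative)."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output matrix H with a sigmoid activation g.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        q = X.shape[1]
        # Input weights and biases are drawn once at random and never tuned
        # (the range [-1, 1] is an assumption of this sketch).
        self.W = self.rng.uniform(-1.0, 1.0, size=(q, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = pinv(H) @ T, the least squares solution.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because only `beta` is solved for, training reduces to one pseudo-inverse computation, which is where the speed advantage over gradient-based training comes from.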

Overview of the Proposed KPCA-ELM Model
The proposed KPCA-ELM modeling procedure, which is illustrated in Figure 1, can be summarized as follows:
Step 1. Collect the modeling data, i.e., historical inlet wastewater quality factors.
Step 2. Normalize the collected data to (0, 1).
Step 3. Perform the KPCA for feature extraction.
Step 4. Employ the ELM to forecast the inlet wastewater quality.
Step 5. Output the forecasting result using the inverse normalization.
End.
Water 2018, 10, x FOR PEER REVIEW 5 of 17

Data Sets
The original data for sewage treatment monitoring during 1/1/2015-31/12/2015 are used in this study, and include COD (mg/L), BOD (mg/L), NH3-N (mg/L), SS (mg/L), TP (mg/L), and TN (mg/L). There are a total of 365 × 6 samples, which are divided into two sets, i.e., the first 300 × 6 samples for model training, and the rest (65 × 6 samples) for model testing. The nonlinear trend of the daily mean value of each index is shown in Figure 2.
Next, the statistical properties, including the maximum, minimum, mean, and standard deviation (SD), are calculated to gain a deeper understanding of the data. Table 1 illustrates the statistical properties of COD and BOD for the training and testing sets, and Table 2 details the statistical properties of the variables. According to Table 1, the training and testing sets have different statistical properties, which allows the forecasting performance to be assessed more convincingly. From Table 2 and Figure 2, it can be seen that the data fluctuate violently and that the magnitudes of the variables differ considerably. Variables with a large magnitude affect the modeling more strongly than those with a small magnitude, and thus it is not appropriate to use the raw data directly to establish the model [55]. Thus, all the data are normalized to (0, 1) to eliminate the influence of the differing magnitudes among variables before they are used in the experiments.
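The normalization and the 300/65 split described above can be sketched as follows. This assumes min-max scaling, which the text does not name explicitly, and it fits the scaling bounds on the training rows only; both choices, the helper names, and the random placeholder data are assumptions of this sketch, not details taken from the study.

```python
import numpy as np

def minmax_fit(train):
    # Column-wise minimum and maximum of the training data.
    return train.min(axis=0), train.max(axis=0)

def minmax_apply(x, lo, hi):
    # Map each column to [0, 1] using the training bounds.
    return (x - lo) / (hi - lo)

def minmax_invert(z, lo, hi):
    # Inverse normalization: recover values in the original units.
    return z * (hi - lo) + lo

# Placeholder data standing in for 365 daily records of six indices.
rng = np.random.default_rng(0)
data = rng.uniform(1.0, 300.0, size=(365, 6))

train, test = data[:300], data[300:]      # first 300 days / last 65 days
lo, hi = minmax_fit(train)
train_n = minmax_apply(train, lo, hi)
test_n = minmax_apply(test, lo, hi)       # may fall slightly outside [0, 1]
```

Fitting the bounds on the training rows avoids leaking information from the testing period; if the bounds were instead computed over all 365 days, every value would lie within the unit interval.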

Figure 2 and Tables 1-3 indicate that each index value of inlet water quality is outside the standard range. Hence, it is important for the sewage treatment plant to build a reasonable process plan for treating the inlet wastewater so that it meets the national sewage discharge standard. Establishing a reliable forecasting model helps not only to adjust the performance parameters, such as the balance of carbon source, aeration rates, and reflux ratio, but also to minimize the operating costs and energy consumption.

Process of Sewage Treatment
A traditional Anaerobic/Anoxic/Oxic (A/A/O) process is applied to treat domestic sewage in the WWTP, which exhibits a good performance for nutrient removal. However, the performance parameters of any WWTP must be modified according to the actual condition of the A/A/O process. Otherwise, the efficiency of the WWTP cannot meet the initial design properties and it may result in serious energy waste [56]. Better control of a WWTP can be achieved by developing robust models for forecasting the plant performance based on the past observation of certain water quality factors. This study uses a combination of the KPCA and ELM to extract the principal components from past observation data for forecasting inlet COD and BOD concentration, which helps to adjust the performance parameters, such as the balance of carbon source, aeration rates, and reflux ratio. The A/A/O process flow is shown in Figure 3.


Experimental Design
The experimental design processing of the KPCA-ELM model is shown in Figure 4.

According to Equation (6), if the first p eigenvalues account for over 95% of the sum of all eigenvalues, then the information can be represented by p principal components in practical applications.
The principal components extracted by the KPCA algorithm are used as the input of the ELM. In the ELM experiments, there are two important parameters: the number of hidden layer nodes and the activation function. The selection steps are as follows: (a) The trial and error method is used to select the optimal activation function with root mean square error (RMSE) as the criterion. (b) The sigmoid function is selected as the activation function [57], and the sigmoid function is expressed as follows:

$$g(x) = \frac{1}{1 + e^{-x}}$$

Assessing the Performance of the Forecasting Model
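To assess the performance of the proposed model, three criteria, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE), are applied in this paper.
Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert \tag{16}$$

Mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \times 100\% \tag{17}$$

Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{18}$$

where $y_i$ represents the observed values, $\hat{y}_i$ represents the forecasting values, and N is the length of the output data series.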
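The three criteria translate directly into code. The sketch below uses the standard definitions of MAE, MAPE, and RMSE; the function names and the three-point example series are illustrative and are not data from the study.

```python
import numpy as np

def mae(y, y_hat):
    # Mean absolute error, in the units of y (here mg/L).
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    # Mean absolute percentage error; undefined if any observed value is zero.
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def rmse(y, y_hat):
    # Root mean square error; penalizes large deviations more than MAE.
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Illustrative values only.
y = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 190.0, 400.0])
print(mae(y, y_hat), mape(y, y_hat), rmse(y, y_hat))
```

For this toy series the absolute errors are 10, 10, and 0, so the MAE is 20/3 mg/L, the MAPE is 5%, and the RMSE is the square root of 200/3.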

Assessing the Performance of the Forecasting Model
The COD, BOD, NH3-N, SS, TP, and TN provided by the sewage treatment plant are used as the input parameters of the water quality forecasting. After the KPCA processing, the principal components are extracted. As shown in Figure 5, the cumulative contribution rate of the first three principal components is up to 98.20%. Therefore, these three principal components are employed as the input in the subsequent forecasting process.
The parameters in the algorithms are determined by trial and error. In this study, the number of hidden layer nodes is gradually increased from 5 to 250 in steps of 5. In addition, the model forecasts the values of the COD and BOD concentrations at time t using the three principal components in the input structure with different time lags (up to seven prior days) [58].
In this paper, the RMSE, defined in Equation (18), is used to evaluate the regression accuracy for forecasting inlet COD and BOD under different numbers of hidden layer nodes and different time lags, as shown in Figure 6a,b, respectively. Figure 6a demonstrates that, as the number of hidden layer nodes increases, the RMSE values of the COD forecasting first decrease and then increase for the different lags. The best performance of the COD forecasting is achieved when the number of hidden layer nodes is 100 with three days ahead, and the lowest RMSE value is 3.108. As shown in Figure 6b, when the proposed model forecasts BOD with an increasing number of hidden layer nodes, the RMSE values decrease and then gradually stabilize. The lowest RMSE value of the BOD is 1.340 with 90 hidden layer nodes and three days ahead.
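The search over hidden layer sizes and time lags can be pictured with a small helper that builds the lagged input matrix. This is a sketch under an assumed reading of the input structure: for a lag of k days, the inputs at time t are taken to be the principal components of days t-1 through t-k. The function name `make_lagged` and the toy series are hypothetical.

```python
import numpy as np

def make_lagged(features, target, n_lags):
    """Stack the features of the previous n_lags days as inputs for day t."""
    n = len(target)
    # Block k holds the features of day t-k for every usable day t.
    blocks = [features[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    X = np.hstack(blocks)
    y = target[n_lags:]
    return X, y

# Candidate settings described in the text.
hidden_grid = list(range(5, 251, 5))   # 5, 10, ..., 250 hidden nodes
lag_grid = list(range(1, 8))           # one to seven prior days

# Toy series: one feature per day, target ten times the day index.
features = np.arange(10, dtype=float).reshape(-1, 1)
target = np.arange(10, dtype=float) * 10.0
X, y = make_lagged(features, target, n_lags=3)
```

With three lags, the first usable row pairs the features of days 2, 1, and 0 with the target of day 3. Each (hidden nodes, lag) pair from the two grids would then be trained and scored by RMSE to locate the best structure.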
Under the best performance structure, the comparison between the forecasting value and true value of inlet COD and BOD concentration is shown in Figure 7.
From Figure 7, one can find that the forecasting results of the KPCA-ELM follow the changes in the testing data successfully, and the forecasting curve is consistent with the testing curve, both for the COD and the BOD. The model is also able to forecast the peak data for both the inlet COD and BOD concentrations. The experimental results show that the proposed approach has good attributes, e.g., a superior accuracy and higher stability, which can meet the requirements of the water quality forecasting of wastewater treatment.

Comparisons
To validate the prediction capacity of the proposed model, three methods are compared with the KPCA-ELM model using the same dataset: PCA-ELM, ELM, and BPNN. A comparison of the dimension reduction ability of the PCA method and KPCA method can be seen in Table 4. It shows that the cumulative contribution of the PCA is only 89.923%, whereas that of the KPCA is up to 98.200% for three principal components (i = 3). This illustrates that the KPCA retains much more information than the PCA with the same number of principal components. The parameters of the PCA-ELM model, ELM model, and BPNN model for the inlet COD and BOD forecasting are determined by trial and error. For the BPNN model, the number of hidden layer nodes is chosen by the empirical formula $n_l = \sqrt{q + s} + a$, where $n_l$ represents the number of hidden nodes, q represents the number of input layer nodes, s represents the number of output layer nodes, and a is a constant in [0, 10]. The comparison model settings with the optimal structures are detailed in Table 5.
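As a worked illustration of the empirical formula for the BPNN hidden layer, the candidate sizes for a given input and output dimension can be listed as follows. The function name, the rounding of the square root to the nearest integer, and the example values q = 3 and s = 1 are assumptions made here for illustration; the optimal structures actually used are those reported in Table 5.

```python
import math

def bpnn_hidden_candidates(q, s, a_values=range(0, 11)):
    # n_l = sqrt(q + s) + a, with a taken over the integers 0..10.
    base = int(round(math.sqrt(q + s)))
    return [base + a for a in a_values]

# Example: three inputs and one output give sqrt(4) = 2, so 2..12 nodes.
candidates = bpnn_hidden_candidates(q=3, s=1)
print(candidates)
```

Each candidate size would then be trained and compared by trial and error, as the text describes for the comparison models.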
Table 5. Optimal structures of the comparison models.
From Figure 8, one can find that the forecasting results of the PCA-ELM can follow the trends of the testing data, but fail to capture the peak data (the 55th day) for both the inlet COD and BOD concentrations. To explore the uncertainty from different numbers of nodes of the KPCA-ELM and PCA-ELM, the RMSE, MAE, and MAPE variations are analyzed in each trial case. As shown in Figure 9, the forecasting results of the ELM can follow the fluctuation of the testing data, but fail to reproduce the detailed values of both the COD and BOD concentrations. The results of the BPNN model are illustrated in Figure 10, which shows a large error between the forecasted value and true value; the BPNN fails to follow the fluctuation of the testing data for both the COD and BOD concentrations.


Statistical Analysis
To further compare the performance and effectiveness of the models, the correlation between the predicted values of the different approaches and observed values is demonstrated in Figure 11.

As observed from Figure 11, most of the predicted values of the KPCA-ELM and PCA-ELM are closer to y = x than those of the ELM, but a few predicted values of the PCA-ELM are far away from y = x for both the COD prediction and BOD prediction. The correlation coefficients of COD forecasting for the BPNN, ELM, PCA-ELM, and KPCA-ELM are 0.4527, 0.7164, 0.9544, and 0.9844, respectively, and the correlation coefficients of BOD forecasting for the BPNN, ELM, PCA-ELM, and KPCA-ELM are 0.5494, 0.7928, 0.9593, and 0.9864, respectively. This further illustrates the superior performance of the proposed approach.
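The correlation coefficient between predicted and observed values can be computed as below. The sketch assumes the Pearson correlation coefficient, which the text does not name explicitly, and the function name `pearson_r` is introduced here for illustration.

```python
import numpy as np

def pearson_r(y, y_hat):
    # Pearson correlation between observed and predicted values.
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    yc = y - y.mean()
    pc = y_hat - y_hat.mean()
    return float((yc @ pc) / np.sqrt((yc @ yc) * (pc @ pc)))
```

A value close to 1 means the points in a predicted-versus-observed scatter plot lie close to a straight line. Note that a high correlation alone does not guarantee that the line is y = x, which is why the scatter plots and the error criteria are read together.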
In addition to the qualitative comparison using the forecasting results and the residual error analysis, the RMSE, MAE, and MAPE are used to quantitatively evaluate the forecasting performance among the KPCA-ELM, the PCA-ELM, and the ELM. The results are summarized in Table 6. The comparative analyses demonstrate that the proposed model has a better forecasting performance according to each of the three criteria.
Additionally, to further validate the performance of the proposed model, an error boxplot is drawn in Figure 12. A boxplot helps to indicate the degree of dispersion and skewness in the data, and identifies outliers. As indicated in Figure 12, the boxes of the KPCA-ELM and the PCA-ELM are shorter than those of the ELM and the BPNN, indicating that the distributions of the absolute error are relatively concentrated when the KPCA or the PCA is used. Moreover, the box of the KPCA-ELM lies lower than that of the PCA-ELM, and the mean absolute error of the KPCA-ELM is the lowest. Counting the outliers of every model, the ELM has the most, followed by the PCA-ELM, and the KPCA-ELM has the fewest. In addition, when comparing the distances between the median and the quartiles, the error distribution of the KPCA-ELM is relatively symmetrical and approximately normal. Therefore, the proposed model also outperforms the comparison models in this respect.
In terms of the comparison analysis above, all the results illustrate that the ELM model improved by the KPCA for feature extraction and dimension reduction (KPCA-ELM) exhibits the best forecasting performance when compared to the PCA-ELM and the ELM.
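The boxplot reading above rests on a few summary statistics of the absolute errors: the quartiles, the interquartile range (the box length), and the points lying beyond the whiskers. The following is a minimal sketch of how these could be computed, assuming the conventional 1.5 × IQR whisker rule (the rule used for Figure 12 is not stated here). The function name `boxplot_summary` and the sample errors are hypothetical and are not the data of this study.

```python
import numpy as np

def boxplot_summary(abs_errors, whisker=1.5):
    """Summary statistics behind a boxplot of absolute forecasting errors.

    `whisker=1.5` is the conventional rule and is an assumption here.
    """
    e = np.asarray(abs_errors, dtype=float)
    q1, median, q3 = np.percentile(e, [25, 50, 75])
    iqr = q3 - q1                      # box length: spread of the middle 50%
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    outliers = e[(e < lower) | (e > upper)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "mean": e.mean(),
        "n_outliers": int(outliers.size),
        # Zero when the median sits exactly midway between the quartiles.
        "quartile_asymmetry": (q3 - median) - (median - q1),
    }

# Hypothetical absolute errors (mg/L), for illustration only.
demo_errors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 40.0]
print(boxplot_summary(demo_errors))
```

A smaller `iqr` corresponds to a more concentrated error distribution, and a `quartile_asymmetry` close to zero corresponds to the kind of symmetry noted for the KPCA-ELM.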
The KPCA-ELM model has been constructed for forecasting the inlet water quality of sewage treatment. Combining the fast learning capacity of the ELM with the nonlinear feature extraction ability of the KPCA, the proposed model exhibits the best forecasting performance among all the peer methods. In addition, the KPCA-ELM performs comparably well for both BOD and COD, suggesting that it has good generalization ability.
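The fast learning capacity mentioned above comes from the way a basic ELM is trained: the hidden layer weights are drawn at random and kept fixed, and only the output weights are solved in closed form by least squares. A minimal NumPy sketch of this idea might look as follows. It assumes a sigmoid activation, 20 hidden nodes, and synthetic data, all of which are illustrative choices rather than the settings selected in this study; the class name `ELMRegressor` is hypothetical.

```python
import numpy as np

class ELMRegressor:
    """Basic single-hidden-layer ELM for regression (illustrative sketch)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden          # assumed value, not the paper's
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer (assumed choice).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Input weights and biases are random and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose
        # pseudoinverse of the hidden layer output matrix.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

# Synthetic one-dimensional example, for illustration only.
X_demo = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y_demo = np.sin(3.0 * X_demo[:, 0])
model = ELMRegressor(n_hidden=20, seed=0).fit(X_demo, y_demo)
train_rmse = np.sqrt(np.mean((model.predict(X_demo) - y_demo) ** 2))
print(f"training RMSE: {train_rmse:.4f}")
```

Because no iterative weight updates are involved, training reduces to one matrix pseudoinverse, which is why the number of hidden nodes and the activation function are the main settings to tune.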

Conclusions
The inlet COD and BOD concentration forecasting of wastewater treatment based on KPCA and ELM is proposed in this study. The KPCA-ELM model can be used to guide the control parameter adjustment of the sewage treatment system by providing a data reference, which offers a convenient and economical approach to achieving better control of the WWTP. The KPCA is employed for feature extraction and dimensionality reduction of the inlet wastewater quality data from the sewage treatment in 2015. In each model, the best outputs of the ELM are determined by selecting the optimal activation function and the number of hidden layer nodes. In addition, the PCA-ELM, the ELM, and the BPNN are introduced as contrast approaches. The experimental results indicate that the KPCA-ELM method has a better forecasting capacity than the peer methods in terms of MAE, MAPE, and RMSE. Simulation results from a wastewater treatment plant show that the KPCA-ELM model outperforms the PCA-ELM model, the ELM model, and the BPNN model in both reliability and accuracy.
In this work, it is shown that the KPCA can extract higher-order information from the original inputs than the PCA, and that the ELM provides a better generalization performance and a faster learning speed than other popular learning algorithms. Thus, the presented model is well suited to water quality forecasting in wastewater treatment, a process that is complex, nonlinear, and uncertain.
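To illustrate how the KPCA can capture nonlinear (higher-order) structure that the linear PCA cannot, the sketch below performs the eigendecomposition on a centered kernel matrix rather than on the covariance matrix. It assumes a Gaussian (RBF) kernel with a hypothetical width parameter `gamma` and random synthetic inputs; the kernel and parameters used in this study are not restated here, and the function names are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * (A @ B.T)
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kpca_fit_transform(X, n_components=2, gamma=0.5):
    """Project training samples onto the leading kernel principal components."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix, i.e., center the data in feature space.
    ones = np.full((n, n), 1.0 / n)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    lam = eigvals[:n_components]
    # Scores of the training samples: sqrt(eigenvalue) * unit eigenvector.
    scores = eigvecs[:, :n_components] * np.sqrt(np.maximum(lam, 0.0))
    # Contribution of each retained component to the total eigenvalue mass.
    contribution = lam / np.sum(np.maximum(eigvals, 0.0))
    return scores, contribution

# Synthetic inputs (30 samples, 4 variables), for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
scores, contribution = kpca_fit_transform(X_demo, n_components=2, gamma=0.5)
print(scores.shape, contribution)
```

The resulting `scores` would play the role of the reduced inputs passed to the forecasting model, and `contribution` is analogous to the contribution rate used when deciding how many components to retain.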

Water 2018, 10

3.4. Assessing the Performance of the Forecasting Model
To assess the performance of the proposed model, three criteria, namely the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean square error (RMSE), are applied in this paper.
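As a concrete illustration of the three criteria, the sketch below computes them using their standard definitions, which are assumed here to match those applied in this paper. The function names and the observed and predicted values are hypothetical and are not the data of this study; the MAPE function additionally assumes that no observed value is zero.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the variable (e.g., mg/L)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observed values nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical observed and predicted concentrations (mg/L).
observed = [200.0, 180.0, 220.0]
predicted = [198.0, 183.0, 219.0]

print(f"MAE  = {mae(observed, predicted):.3f} mg/L")   # 2.000
print(f"MAPE = {mape(observed, predicted):.3f} %")     # 1.040
print(f"RMSE = {rmse(observed, predicted):.3f} mg/L")  # 2.160
```

The RMSE penalizes large individual errors more strongly than the MAE, while the MAPE expresses the error relative to the observed magnitude, so the three criteria are complementary.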

Figure 6. The RMSE forecasting values under different hidden layer nodes and different time lags: (a) COD and (b) BOD.

Figure 8. Forecasting results of the PCA-ELM model: (a) COD and (b) BOD.

Figure 11. Correlogram analysis of the predicted and observed values of comparison models: (a) COD and (b) BOD.

The experimental results indicated that the KPCA-ELM model has a higher accuracy than the others for forecasting the COD and BOD concentrations of the inlet wastewater, with MAE values of 2.322 mg/L and 1.125 mg/L, MAPE values of 1.223% and 1.321%, and RMSE values of 3.108 and 1.340, respectively. The PCA-ELM model displayed MAE values of 3.542 mg/L and 1.125 mg/L, MAPE values of 1.900% and 1.777%, and RMSE values of 4.270 and 1.710, respectively. The ELM model exhibited MAE values of 9.125 mg/L and 4.399 mg/L, MAPE values of 6.234% and 6.057%, and RMSE values of 14.267 and 5.585, respectively. The BPNN model had MAE values of 15.826 mg/L and 6.950 mg/L, MAPE values of 8.061% and 8.783%, and RMSE values of 20.126 and 8.817, respectively. Quantitative analysis was employed and the results are summarized in Table 6.

Figure 12. Comparisons of the boxplot using different models: (a) COD and (b) BOD.

Table 1. Statistical properties of COD and BOD in terms of divided training and testing sets.

Table 2. Statistical properties of the variables.

Table 4. Comparison of the PCA and the KPCA for principal component extraction.

Table 6. Comparison results using different models.