Soft Measurement Modeling Based on Chaos Theory for Biochemical Oxygen Demand ( BOD )

Abstract: The precision of soft measurement for biochemical oxygen demand (BOD) is always restricted due to various factors in the wastewater treatment plant (WWTP). To solve this problem, a new soft measurement modeling method based on chaos theory is proposed and is applied to BOD measurement in this paper. Phase space reconstruction (PSR) based on Takens embedding theorem is used to extract more information from the limited datasets of the chaotic system. The WWTP is first testified as a chaotic system by the correlation dimension (D), the largest Lyapunov exponents (λ1), the Kolmogorov entropy (K) of the BOD and other water quality parameters time series. Multivariate chaotic time series modeling method with principal component analysis (PCA) and artificial neural network (ANN) is then adopted to estimate the value of the effluent BOD. Simulation results show that the proposed approach has higher accuracy and better prediction ability than the corresponding modeling approaches not based on chaos theory.


Introduction
As one of the key effluent quality parameters for evaluating the performance of the wastewater treatment plant (WWTP), biochemical oxygen demand (BOD) reflects the content of biodegradable organic matter in the water and needs to be measured accurately [1,2].Based on the national standard, BOD should remain at a regulatory value or below (e.g., 10 mg/L according to the standard of "Level I class A" in Discharge standard of pollutants for municipal wastewater treatment plant (GB-18918-2002) in China).Generally, BOD is measured by the chemical experiment for days [3].Furthermore, the existing BOD monitoring instruments are used for the measurement of BOD but need high economic costs and have poor stability [4,5].Hence, it is crucial to find a way to improve the convenience, economy and accuracy of BOD measurements.
In order to solve the above problem, the soft measurement method is widely used in complex system modeling which can estimate hard-to-measure process variables from other easy-to-measure variables [6].Artificial neural network (ANN) has some particular properties such as large scale parallel distributed processing, fault-tolerance, self-organized learning, classification, self-adaptation and strong capability of the nonlinear approximation with high reconstructing accuracy and fast training rate for the nonlinear dynamic system [7,8].Therefore, models based on ANN which can mirror the law hidden in the data are the most popular ones for the soft measurement modeling and prediction [9].This technique has been adopted to solve many practical engineering problems in WWTP.The modeling variables for WWTP contain the common measurable parameters, such as Q, T, pH, ORP, DO, MLSS, NH 4 -N, NO X -N, BOD, COD, SS, TSS, TP, TN, SVI and EC.The units and  1.Among them, Q, T, pH, ORP, DO, COD and SS have great correlation with BOD in the process of WWTP.Some of them (i.e., T, pH, ORP, DO) can be measured online.The measurement of COD and SS need to take several hours to complete.An improved TakagiSugeno fuzzy neural network (TSFNN) was proposed to predict effluent BOD values with the influent COD, pH, SS and DO of aeration tank as the input variables by soft measurement method [1].A self-organizing radial basis function (SORBF) neural network was introduced to estimate the effluent BOD concentration with the influent COD, pH and SS as the inputs in WWTP [4].K-nearest neighbors (KNN), support vector machine (SVM) and self-organizing map (SOM) were adopted to estimate five-day at 20 • C N-Allylthiourea BOD and suspended solids (SS) [10].A soft computing method based on Wavelet Neural Network (WNN) with Principal Component Analysis (PCA) and on-line measuring instruments were applied to accomplish real-time detection and control for ORP, DO, pH, COD, etc. in sewage treatment [11].An adaptive network-based fuzzy inference system (ANFIS) with PCA was introduced for the estimation of effluent SS and COD with influent COD, SS, Q and pH, DO as the inputs in WWTP [12].A method based on a generalized regression neural network (GRNN) was proposed to estimate the concentration of effluent BOD with effluent COD, SS, pH, T and EC in WWTP [13].A three-layered feed forward ANN with a back propagation learning algorithm was applied to forecast effluent BOD with BOD in other seven sampling sites as the inputs in WWTP [14].These articles adopted different modeling method to accomplish soft measurement for effluent BOD and other effluent water quality parameters that are difficult to measure in WWTP.However, the precision of soft measurement modeling for BOD in WWTP is affected by various factors, such as the quality and quantity of the observed data, the selection of input variables, modeling method and ANN's parameters [1,3,4].In addition, WWTP is a complex dynamic system with characteristics of uncertainty, large time-lags and high nonlinearity and strong coupling due to the change of the environmental and operational conditions, which makes the modeling, optimizing and control difficult [15].Therefore, the prediction for effluent BOD in WWTP is a challenging problem and how to further enhance the accuracy of the soft measurement for BOD is a question worth thinking of and examining further [16].
Because BOD prediction is affected by characteristics of WWTP, it is important to take its characteristics into consideration to try to obtain more useful information for soft measurement modeling.Chaos is a widely existing phenomenon in nature and is a specific behavior for the nonlinear dynamic system.With the rapid development of chaos theory, the chaotic time series analysis and prediction are widely studied and have been widespread concerned in the research fields such as electric short-term load [17], economics [18], industrial manufacture [19], signal processing [20] and medical diagnosis [21].A multivariate prediction method was presented for electric short-term load using chaos theory and radial basis function (RBF) neural networks.The proposed method improves the precision of forecasting significantly comparing with the univariate methods [17].Multilayer perceptron (MLP) neural network model based on phase space reconstruction (PSR) in chaos theory was proposed to predict the carbon price.Results demonstrate the model has higher prediction Water 2016, 8, 581 3 of 21 accuracy and fitting effect than other related models [18].A novel RBF prediction model for melt index (RBF-chaos) are set up to characterized its strong nonlinear and correlated relationships under chaos theory.Results indicate that the proposed neural network model with chaos is superior to the previous models without considering chaotic characteristics [19].A new method realized a long-term prediction of sensor baseline and drift based on PSR and RBF neural network.Results show that the proposed model can make long-term and accurate forecasting of chemical sensor baseline and drift time series [20].
Based on the above researches, the chaotic characteristics of time series are identified and demonstrated first before modeling with chaos theory.PSR in chaos theory can memorize all of the properties of a chaotic attractor and clearly recover the motion trace of a time series, thus PSR provides more information for modeling and makes more accurate forecasting possible [17][18][19][20].Therefore, we suppose that the WWTP is a chaotic system, but this has not been demonstrated yet.If the WWTP is a chaotic system, then chaos theory can be taken into consideration for the soft measurement modeling for BOD in WWTP.
In chaos theory, all the possible states of a nonlinear chaotic system can be described by the phase space.Each point in phase space, called phase points, expresses the whole physical state [22].PSR technique based on the Takens embedding theorem can recover the m-dimensional phase space or the structure of the attractor by single time series [23].Thus, we can give a recurrence of the chaotic attractor in the original dynamical system and extract more quantity of the information from the limited dataset.This feature may increase the accuracy of modeling.
In this paper, the chaotic characteristics of WWTP are first analyzed and a new soft measurement method with PCA and ANN based on chaos theory is proposed for the prediction of BOD.Numerical experiments are designed to verify its effectiveness and feasibility by comparison between the ANN model with chaos theory and that without it.
The rest of paper is organized as follows.In Section 2, the methods for chaotic characteristics analysis are introduced.PSR based on the Takens embedding theorem is presented for the univariate and multivariate time series modeling.The WWTP and the structure of the soft measurement prediction model for BOD based on PCA-ANN with chaos theory are described.In Section 3, the numerical experiments and the comparative results between the proposed method and other methods not based on chaos theory are presented.In Section 4, the several important factors which have effect on the prediction accuracy of the soft measurement modeling for effluent BOD based on chaos theory are analyzed and discussed.Finally, the conclusions are given in Section 5.

Chaotic Characteristic Analysis Methods
If the chaos theory is applied to soft measurement modeling in the WWTP, chaotic characteristics need to first be confirmed for WWTP.Correlation dimension (D), largest Lyapunov exponents (λ 1 ) and Kolmogorov entropy (K) are three characteristics to determine whether a dynamic system is chaotic or not [18].Analysis methods for these characteristics are introduced in this section accordingly.

Phase Space Reconstruction (PSR)
Chaotic system cannot be predicted easily due to its features (e.g., initial value sensitivity, parameter sensitivity, ergodicity and random similarity) until the Takens embedding theorem was proposed by Takens in 1981.For the analysis of the characteristics of chaos, the key and first step is PSR based on the Takens embedding theorem which can recover the properties of the original dynamic system [23].Then, an m-dimensional vectors can be obtained by reconstruction with a single time series observed from a chaotic system with two parameters which are delay time τ and embedding dimension m [24].
Water 2016, 8, 581 4 of 21 Suppose an observed univariate chaotic time series x(t) is generated by a nonlinear dynamic system.Based on the Takens embedding theorem, an m-dimensional embedding phase space point as input for the ANN model can be described as follows where m is embedding dimension, τ is delay time, M is the number of phase points, M = N − (m − 1), τ and N is the length of the observed time series.If m is large enough (m ≥ 2D + 1, D is the fractal dimension of the attractor), the reconstructed phase space is equivalent to the original dynamic system in the meaning of topology homeomorphism.There exists a smooth diffeomorphism f : R m →R 1 as follows where h is the step of prediction.The reconstructed phase space X is composed of the M phase space points, then the formula is expanded as The output prediction vector Y can be descried by The above method for univariate chaotic time series can realize the single variable short-term prediction based on the history value [17,[24][25][26].It is worth noting that the soft measurement method is a multivariate modeling process.Moreover, multivariate chaotic time series modeling method contains more information associated with the original dynamic system which can improve the accuracy of prediction to some extent compared with the univariate chaotic time series modeling method [27][28][29].
For the multivariate chaotic time series prediction, suppose a given L-dimensional multivariate time series {x i (t As in the case of univariate time series (when L = 1), the phase space points by PSR can be expressed as where τ i and m i (i = 1, 2, . . ., L) are the ith variable's delay time and the embedding dimensions, respectively.Based on Takens embedding theorem, the total embedding dimensions m = m 1 + m 2 + . . .+ m L .If m ≥ 2D + 1(D is the fractal dimension of the attractor) or each m i is large enough, there exists an L-dimensional continued vector mapping F: R m →R m as follows Equation ( 6) can be redefined by an L-dimensional continued vector mapping F i : R m →R, shown as Then, the transformation from V n to x i,n+1 or V n+1 reflects the evolutionary process from the current state to the future state for the original dynamic system, which signifies that the geometrical characteristics of the chaotic attractor in the reconstructed phase space are identical to the original state space.Thus, any differential or topological invariant quantities in the original dynamic system can be computed in the reconstructed phase space or reconstructed strange attractor.The value of the delay time τ i and the embedding dimensions m i can be determined by the following method.
(1) Delay Time The delay time τ decides the quantity of information contained in the reconstructed phase space.In order to select an appropriate delay time τ, there are several methods to choose from, such as mutual information (MI) [17], autocorrelation function [25] and C-C method [30].Because it can reflect the nonlinear correlation between the two time series, MI method is adopted broadly.
MI is a concept from the information theory.MI between x(t) and x(t + τ) is described by where N is the length of the observed time series, τ is delay time, p(x(t)) is the marginal probability density function (PDF) of x(t), p(x(t + τ)) is the marginal PDF of x(t + τ), and p(x(t),x(t + τ)) is the joint PDF of x(t) and x(t + τ).The optimal delay time τ is determined by the first minimum value of I(τ).
As one of the methods to compute I(τ), the histogram-based MI estimation divides the x(t) − x(t + τ) coordinate space into (k x(t) × k x(t+τ) ) equally sized (∆ x(t) × ∆ x(t+τ) ) cells [31].k x is the number of cells, and it is selected by where ζ is a constant value, and round{} means the closest integer of a real variable.Equations ( 9) and (10) are defined in reference [31] which has demonstrated that the method to choose k x can obtain more accurate MI. (

2) Embedding Dimension
The embedding dimension m is the other key parameter for PSR.Many methods, such as false nearest neighbor method [20], G-P algorithm [32,33] and Cao's method [34], are proposed to estimate the value of embedding dimension m.In this paper, we only need to obtain the minimum embedding dimension m min based on the Takens embedding theorem [24][25][26][27][28][29].The G-P algorithm proposed by Grassberger and Procaccia is adopted to determine it due to its easy calculation and good performance.The G-P algorithm is in detail as follows: Step 1: The m is set to 2 as the initialization after the delay time τ has been obtained based on Section 2.1.1 (1).Further, M phase points can be generated by Equation (1).The correlation integral C m (r) is the percentage of the number of the distance between each two phase points less than r in all phase points [35].C m (r) can be calculated by where r is the given radius (r ∈ {r min , r min + δ, r min + 2δ, . . ., r max }), X i − X j is the Euclidean distance between phase space points X i and X j .H(x) is the Heaviside function defined as Water 2016, 8, 581 6 of 21 In practical computation, r is computed by where r max = max X i − X j , r min = min X i − X j is the maximum and minimum of the X i − X j , respectively.k is a variable (k = 0, 1, 2, . . .). δ is the step length of the r.s is the number of the points need to be inserted (s = 50~200).With the increasing of r, a curve of lnC m (r)~lnr corresponding to one m is plotted by Equations ( 11)- (13).
Step 2: Multiple curves of lnC m (r)~lnr can be plotted by different value of m.If the system is chaotic, C m (r) is proportional to r D2(m) which is described by where D 2 (m) is correlation dimension with different embedding dimension m (m = 1, 2, . . .).D 2 (m) is one of the attractor's fractal dimension D which can be determined from the slope of the curve between lnC m (r) and and lnr in the section of linearity.
Step 3: When D 2 (m) tends to saturate with the increase of the embedding dimension m, the system is chaotic, and the value of the saturate gives D 2 (m).The minimum embedding dimension m min can be decided by the formula m ≥ 2D + 1 referred before.If not, the system is random.

Lyapunov Exponent
The Lyapunov exponents describe the phenomenon of dispersion for two nearby initial values due to the butterfly effect, and quantify the average exponential rates of divergence or convergence of adjacent trajectories in phase space [18,19,22].
For one-dimensional dynamic system x n+1 = F(x n ), after the N steps' iteration or evolution, the Lyapunov exponent λ is defined as If the system contains one positive Lyapunov exponent, we can demonstrate the dynamic system is chaotic [19]: (a) when λ ≤ 0, the system is stable; (b) when 0 < λ < ∞, the system is chaotic; and (c) when λ = ∞, the system is random.Therefore, only the Largest Lyapunov Exponent (LLE), denoted as λ 1 , needs to be calculated.In our study, the Wolf algorithm [19,36] is used to obtain the LLE.

Kolmogorov Entropy
Kolmogorov entropy (K) measures the degree of chaos for a nonlinear system.The more the loss ratio of the information for the system, the larger the value of K.And the loss ratio of the information is proportional to the level of the chaos for the dynamical system [18,21].
For an n-dimensional dynamic system, assume that the n-dimensional phase space is divided into boxes of size r n .We make an assumption that the system has a strange attractor and x(t) is its pathway.If p(i 0 , i 1 , . . ., i d ) is the joint probability that t = τ in box i 1 , t = 2τ in box i 2 , ..., and t = dτ in box i d , K is defined by where τ is delay time, and r is the edge length of the boxes.But, the process of the computation for K based on the above formula is too cumbersome.For this reason, the G-P algorithm [37] is applied to reduce the computation complexity.In this method, K can be estimated by the value of K 2 which is stable with the growth of m.The K 2 is described by where m is the embedding dimension, and C(r) is the correlation integral.Different K represent different states of the motion [18]: (a) K = 0 means regular motion; (b) 0 < K < ∞ means chaotic motion; (c) K = ∞ means random motion.

Wastewater Treatment Plant (WWTP)
WWTP plays an important role in the reclamation and recycling of sewage.The activated sludge process as shown in Figure 1 is a common and effective technique to realize the WWTP.The activated sludge contains various micropopulation which can eliminate and defuse the harmful substances for the human and ecological environment.Generally, WWTP consists of four parts which are primary stage (physical treatment), secondary stage (biological treatment and secondary sedimentation tank), tertiary stage (advanced treatment) and sludge treatment, which are described in details as follows [9,38]: (a) Primary treatment.The preliminary step is used to remove large objects that have negatively influence on downstream processing equipment, the sand and other solid waste from the influent wastewater.Dense organic material is removed through primary sedimentation tank.The primary treated wastewater is transported to the biochemical reaction basin.(b) Secondary treatment.The biological treatment takes place in biochemical reaction basin in which the organic carbon (C), biological nitrogen (N), biological phosphorus (P) and ammonium are removed from the liquid portion of the wastewater by microorganism in the activated sludge and transferred to the solids portion.There is the secondary sedimentation tank, in which the treated water and the sludge are separated by physical subsiding.(c) Tertiary treatment.The advanced treatment further removes the refractory organic matter and soluble inorganic matter in order to make potable water when it is needed.(d) Sludge treatment.A fraction of the separated sludge in secondary sedimentation tank is returned to the biochemical reaction basin to sustain the ability of the wastewater treatment.The other redundant sludge is carried away after sludge treatment process.WWTP plays an important role in the reclamation and recycling of sewage.The activated sludge process as shown in Figure 1 is a common and effective technique to realize the WWTP.The activated sludge contains various micropopulation which can eliminate and defuse the harmful substances for the human and ecological environment.Generally, WWTP consists of four parts which are primary stage (physical treatment), secondary stage (biological treatment and secondary sedimentation tank), tertiary stage (advanced treatment) and sludge treatment, which are described in details as follows [9,38]: (a) Primary treatment.The preliminary step is used to remove large objects that have negatively influence on downstream processing equipment, the sand and other solid waste from the influent wastewater.Dense organic material is removed through primary sedimentation tank.The primary treated wastewater is transported to the biochemical reaction basin.(b) Secondary treatment.The biological treatment takes place in biochemical reaction basin in which the organic carbon (C), biological nitrogen (N), biological phosphorus (P) and ammonium are removed from the liquid portion of the wastewater by microorganism in the activated sludge and transferred to the solids portion.There is the secondary sedimentation tank, in which the treated water and the sludge are separated by physical subsiding.(c) Tertiary treatment.The advanced treatment further removes the refractory organic matter and soluble inorganic matter in order to make potable water when it is needed.(d) Sludge treatment.A fraction of the separated sludge in secondary sedimentation tank is returned to the biochemical reaction basin to sustain the ability of the wastewater treatment.
The other redundant sludge is carried away after sludge treatment process.
Grilling Room  WWTP is a complex nonlinear dynamic process due to its multivariable coupling, unstable, large-scale disturbances and other uncertain human and environmental factors [1,2].These characters bring great challenges to the system modeling, controlling and parameter prediction.For the modeling of WWTP, there are two main segments that are (1) the hydraulic model and (2) the  WWTP is a complex nonlinear dynamic process due to its multivariable coupling, unstable, large-scale disturbances and other uncertain human and environmental factors [1,2].These characters bring great challenges to the system modeling, controlling and parameter prediction.For the modeling of WWTP, there are two main segments that are (1) the hydraulic model and (2) the biological model.The hydraulic model is used to describe the flow and technological processes.The biological model, which is the main component of the WWTP, shows the behavior and effects of the activated sludge.The biological process is mainly related to the activated sludge and it represents microbial growth, death, and nutrient consumption.These activated sludge models are used to approach enormous amount of biological processes which occurred in each bioreactor [38].Overall, the modeling of the WWTP is still a complicated problem for the complex and changing process [16].Due to the limitations of technology and chemical factors, some key water quality and environmental parameters cannot be measured easily and accurately [38].The practical WWTP also is unstable and time-varying with the interaction of the variables with each other and disturbance [15].Hence, the modeling is a challenging task so that accurate models are researched for describing WWTP.

Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is an important feature extraction method for multivariate statistical analysis [11,12,27].It can express datasets by effective characteristics with fewer dimensions while maintaining the quantity of the information in time series.
In fact, PSR increases the dimensions of the input assistant variables to obtain more information hidden in the chaotic dynamic system [24][25][26][27][28][29].Using PCA, we can reduce the dimensions of the input assistant variables without information loss to enhance the ability of generalization performance and to avoid over-fitting on account of overmuch input dimensions on the basis of PSR [27].In this paper, PCA is used to reduce the dimensions after the PSR and before the modeling task.
The basic steps of PCA are as follows: Suppose the input sample matrix X M×m , where m = m 1 + m 2 + . . .+ m L , and M = N − max[(m i − 1) × τ i ]. m i and τ i (i = 1, 2, . . ., L), are the ith input variable's embedding dimension and delay time, respectively.
Step 1: Standardize the X M×m into X M×m : where i is the number of phase points, i = 1, 2, . . ., M; j is the dimension of phase points, j = 1, 2, . . ., m; x ij is the jth dimension of ith phase points; x j is the mean of jth dimension of phase points; σ j is the standard deviation of jth dimension of phase points.
Step 4: Compute the principal component (or score) t i : Step 5: The covariance matrix ψ x is made up of m eigenvectors, and only the first a eigenvectors corresponding to larger eigenvalues are reserved.Thus, the number of the remaining eigenvectors is (m − a) with related smaller eigenvalues, and they are filtered out, due to being regarded as noise.Because the existing diversified methods lead to different results, how many smaller eigenvalues Water 2016, 8, 581 9 of 21 should be abandoned to obtain the best results, is a difficult problem.The appropriate number of principal components can filter out most of the noise and acquire the best structure of data.In this paper, a simple and classical method is adopted.
The contribution rate of variance δ i for t i is defined as The contribution rate of the accumulated variance η a for the first a principal component is then described by The number of principal components can be determined based on the pre-set contribution rate of accumulated variance η 0 (such as 85%).The minimum a satisfying η a > η 0 is then the acquired value.

Multivariate Chaotic Time Series Model for BOD
In this paper, the structure of the soft measurement model based on chaos theory with PCA-ANN for effluent BOD is shown in Figure 2. The data which are measured in WWTP are sent to the data pre-processing (DP) module.The input assistant variables are determined by the mechanism and data analysis, which are easily measured, economical and have greater chemical correlation with effluent BOD.According to the analysis and existing papers [1,2,4,5,13], influent COD, SS, pH and DO are selected as the input assistant variables finally.
Water 2016, 8, 581 9 of 21 The number of principal components can be determined based on the pre-set contribution rate of accumulated variance η0 (such as 85%).The minimum a satisfying ηa > η0 is then the acquired value.

x3(t) x3(t-τ3) x3(t-(m3-1)τ3)
where x is the original observed value, ' x is the normalized value, xmax and xmin are the maximum and minimum of the original observed value, respectively; and vmax and vmin are the maximum and where x is the original observed value, x is the normalized value, x max and x min are the maximum and minimum of the original observed value, respectively; and v max and v min are the maximum and minimum of the normalized value, respectively.In this paper, we set v max = 1, v min = −1, and the experiment data are all normalized into [−1, 1].The existing soft measurement models all adopt the easy-measured variables to be input directly without taking the chaotic characteristic of WWTP into consideration.In addition, the PSR technique can recover the structure of the chaotic attractor under the condition that the WWTP is demonstrated to be a chaotic system.This will increase the information of the nonlinear dynamic system based on the analysis of Section 2.1.
The soft measurement model with multivariate chaotic time series is established based on the Section 2.1.1,where the step of prediction h is set to zero.The variable COD, pH, SS, DO are taken as the inputs of the ANN.The dimension of input assistant variables is extended by PSR based on calculated delay time τ and embedding dimension m.The corresponding BOD is regarded as the output.Before entering into the ANN, the dimension of reconstructed input variable are reduced by PCA.The mapping relation can be descried as Based on the multivariate chaotic time series prediction method in Section 2.1.1,the input variables are reconstructed with m i and τ i (i = 1, 2, . . ., L) by PSR.Then, each input variable generates M m i -dimensional phase points.The input dimensions have been changed from L to m (m = m 1 + m 2 + . . .+ m L ).In order to enhance the ability of generalization performance and avoid over-fitting on account of overmuch input dimensions, the PCA technique in Section 2.2.2 is adopted to reduce the input dimensions to m without information loss.Afterwards, the obtained inputs are taken as the inputs of ANN.And the number of neurons in the hidden layers can be estimated by [18] where m is the input dimensions, and α is an integer (α ∈ [1,10]).Hence, the structure of the ANN is m − n h − 1.The multivariate chaotic time series prediction method is only suitable for the chaotic time series.If the method is applied for the soft measurement modeling, the chaotic characteristic of input and output variables time series need to be demonstrated and analyzed first.
The ANN can map the function f (•) by training with the input and output dataset.In practical engineering applications, the processed data are taken as the input of ANN for model predictive control (MPC) [39].As the output of ANN, the estimated BOD values are compared with the set values.The difference value is used as the control signal for controller.The method not only accomplishes the soft measurement, but also takes the chaotic information into consideration.

Evaluation of the Model
The sum of squared error (SSE) is chosen as the training target function which can be defined by where y i , ŷi , and N are the measured values, the predicted values, and the length of the measured values, respectively.And root mean squared error (RMSE) and mean absolute percentage error (MAPE) are used to evaluate the prediction accuracy and effectiveness of the proposed prediction model and other models [8].They are calculated as follows: The RMSE reflects the degree of the discrepancy between the predicted and the measured values.The MAPE reflects the mean of the absolute percentage error of prediction.

Data Source
The 609 data are derived from a real wastewater treatment plant every day from 1 January 2011 to 31 August 2012 in Beijing as shown in Figure 3.

Data Source
The 609 data are derived from a real wastewater treatment plant every day from 1 January 2011 to 31 August 2012 in Beijing as shown in Figure 3.In Figure 3, the original data contains some outlier data or noise which have impact on the chaotic characteristic analysis.The noise points are usually generated by instruments, manual operation, other unforeseen circumstances, etc.The values of noise are also random and abnormal.As a phenomenon found in non-linear dynamic system, chaos is deterministic and random-like.The denoised data changes within a definite range and doesn't contain the outlier data or missing data.Experiment data are 598 points in total after noise elimination with 3σ-rule.Influent COD, SS, pH and DO, which are measured in aeration tank, are taken as the inputs.Effluent BOD is used as the output for the soft measurement model.In Figure 3, the original data contains some outlier data or noise which have impact on the chaotic characteristic analysis.The noise points are usually generated by instruments, manual operation, other unforeseen circumstances, etc.The values of noise are also random and abnormal.As a phenomenon found in non-linear dynamic system, chaos is deterministic and random-like.The denoised data changes within a definite range and doesn't contain the outlier data or missing data.

The Chaotic Characteristic of WWTP
Experiment data are 598 points in total after noise elimination with 3σ-rule.Influent COD, SS, pH and DO, which are measured in aeration tank, are taken as the inputs.Effluent BOD is used as the output for the soft measurement model.

The Chaotic Characteristic of WWTP
The chaotic characteristics of WWTP are first analyzed to further decide whether the soft measurement model can be designed based on chaos theory.The G-P algorithm is adopted to calculate the minimum embedding dimension m min based on Takens embedding theorem for PSR.Before application of the PSR, the optimal delay time τ of BOD is computed by MI (Section 2.1.1).The results are shown in Figure 4.  From Figure 4, we can see the optimal delay time τ is 2 for the time series of BOD and corresponding I(τ) is 1.8377, which means that MI reaches the first minimum in the second day.The curves of lnr-lnC(r) for different embedding dimensions m from 1 to 21 are drawn as shown in Figure 5 after the delay time τ has been determined.In Figure 5, the slope of each curve which is calculated by linear fitting in the non-scaling interval is the correlation dimension D for different embedding dimensions m.Based on the results in Figure 5, the curves of D over m for BOD can be plotted in Figure 6.From Figure 4, we can see the optimal delay time τ is 2 for the time series of BOD and corresponding I(τ) is 1.8377, which means that MI reaches the first minimum in the second day.The curves of lnr-lnC(r) for different embedding dimensions m from 1 to 21 are drawn as shown in Figure 5 after the delay time τ has been determined.From Figure 4, we can see the optimal delay time τ is 2 for the time series of BOD and corresponding I(τ) is 1.8377, which means that MI reaches the first minimum in the second day.The curves of lnr-lnC(r) for different embedding dimensions m from 1 to 21 are drawn as shown in Figure 5 after the delay time τ has been determined.In Figure 5, the slope of each curve which is calculated by linear fitting in the non-scaling interval is the correlation dimension D for different embedding dimensions m.Based on the results in Figure 5, the curves of D over m for BOD can be plotted in Figure 6.In Figure 5, the slope of each curve which is calculated by linear fitting in the non-scaling interval is the correlation dimension D for different embedding dimensions m.Based on the results in Figure 5, the curves of D over m for BOD can be plotted in Figure 6.In Figure 5, the slope of each curve which is calculated by linear fitting in the non-scaling interval is the correlation dimension D for different embedding dimensions m.Based on the results in Figure 5, the curves of D over m for BOD can be plotted in Figure 6.As one can see from Figure 6, even though the correlation dimension D increases with the increasing embedding dimension m, it approaches to a saturation value (namely saturated As one can see from Figure 6, even though the correlation dimension D increases with the increasing embedding dimension m, it approaches to a saturation value (namely saturated correlation dimension) 1.2114 when m ≥ 8. Then the minimum embedding dimension m min for the time series of BOD is 8 based on the Figure 6 and the principle of the m ≥ 2D + 1 (Section 2.1.1).The fractional correlation dimension D reveals that the BOD time series may be chaotic.
After the determination of the delay time τ and embedding dimension m, the phase space can be reconstructed based on the PSR approach.The two-dimensional phase space portrait is reconstructed by the BOD time series as shown in Figure 7.According to Figure 7, the two-dimensional phase space portrait shows an obvious attractor in the reconstructed phase space.This phenomenon or motion feature, which seems to have a certain regularity, is different from a random motion.
Water 2016, 8, 581 13 of 21 correlation dimension) 1.2114 when m ≥ 8. Then the minimum embedding dimension mmin for the time series of BOD is 8 based on the Figure 6 and the principle of the m ≥ 2D + 1 (Section 2.1.1).The fractional correlation dimension D reveals that the BOD time series may be chaotic.
After the determination of the delay time τ and embedding dimension m, the phase space can be reconstructed based on the PSR approach.The two-dimensional phase space portrait is reconstructed by the BOD time series as shown in Figure 7.According to Figure 7, the two-dimensional phase space portrait shows an obvious attractor in the reconstructed phase space.This phenomenon or motion feature, which seems to have a certain regularity, is different from a random motion.However, we cannot yet conclude that the evolution process of BOD must be chaotic.To further judge the characteristics of the BOD time series, the Kolmogorov entropy (K), LLE (λ1) are calculated by the G-P algorithm (Section 2.1.3)and Wolf algorithm (Section 2.1.2),respectively.The value of K is 0.067637 (K > 0) in Figure 8. Besides, the LLE λ1 is 0.31441 (λ1 > 0).K and λ1 are all positive which demonstrates that the BOD time series in WWTP is chaotic.However, we cannot yet conclude that the evolution process of BOD must be chaotic.To further judge the characteristics of the BOD time series, the Kolmogorov entropy (K), LLE (λ 1 ) are calculated by the G-P algorithm (Section 2.1.3)and Wolf algorithm (Section 2.1.2),respectively.The value of K is 0.067637 (K > 0) in Figure 8. Besides, the LLE λ 1 is 0.31441 (λ 1 > 0).K and λ 1 are all positive which demonstrates that the BOD time series in WWTP is chaotic.
From the previous analysis, we can demonstrate that the effluent BOD time series is chaotic, that is, the wastewater treatment system is chaotic.In order to further verify the result, more variables need to be tested.Therefore, the COD, pH, SS and DO time series are analyzed by the same processes and method in Section 2.1.The results are listed in Table 2.
The results in Table 2 have two functions as follows: (1) whether the dynamic system is chaotic or not can be judged based on the value of m, K, λ 1 ; (2) the necessary parameters (i.e., m and τ) for multivariate chaotic time series modeling are provided.
However, we cannot yet conclude that the evolution process of BOD must be chaotic.To further judge the characteristics of the BOD time series, the Kolmogorov entropy (K), LLE (λ1) are calculated by the G-P algorithm (Section 2.1.3)and Wolf algorithm (Section 2.1.2),respectively.The value of K is 0.067637 (K > 0) in Figure 8. Besides, the LLE λ1 is 0.31441 (λ1 > 0).K and λ1 are all positive which demonstrates that the BOD time series in WWTP is chaotic.From the previous analysis, we can demonstrate that the effluent BOD time series is chaotic, that is, the wastewater treatment system is chaotic.In order to further verify the result, more variables need to be tested.Therefore, the COD, pH, SS and DO time series are analyzed by the same processes and method in Section 2.1.The results are listed in Table 2.
The results in Table 2 have two functions as follows: (1) whether the dynamic system is chaotic or not can be judged based on the value of m, K, λ1; (2) the necessary parameters (i.e., m and τ) for multivariate chaotic time series modeling are provided.The analysis experiments show that the BOD time series and other variables in WWTP all have fractional correlation dimension, positive K and λ 1 .It is testified that the random-like motion of the various WWTP time series can be explained as a chaotic phenomenon.After the chaotic characteristics analysis, the chaotic time series prediction methods can be applied to the modeling of the soft measurement of BOD.

Soft Measurement for BOD Based on Chaos Theory
Based on Table 2 and Section 2.2.2, the variance explained by the sorted principal components are shown in Figure 9.
Water 2016, 8, 581 14 of 21 The analysis experiments show that the BOD time series and other variables in WWTP all have fractional correlation dimension, positive K and λ1.It is testified that the random-like motion of the various WWTP time series can be explained as a chaotic phenomenon.After the chaotic characteristics analysis, the chaotic time series prediction methods can be applied to the modeling of the soft measurement of BOD.

Soft Measurement for BOD Based on Chaos Theory
Based on Table 2 and Section 2.2.2, the variance explained by the sorted principal components are shown in Figure 9.The histogram represents the variance explained (or the contribution rate of variance) by each principal component.The blue curve represents the total variance explained.One can see that the total variance explained is 86.10% (>85%) by adding the first 14 principal components.
From Table 2, the number of neurons in the input layer is determined by the embedding dimensions m, which is 24.Then the input dimensions are reduced to 14 based on the results of the PCA technique in Figure 9.To compare different models, we set the number of the input and  The histogram represents the variance explained (or the contribution rate of variance) by each principal component.The blue curve represents the total variance explained.One can see that the total variance explained is 86.10% (>85%) by adding the first 14 principal components.
From Table 2, the number of neurons in the input layer is determined by the embedding dimensions m, which is 24.Then the input dimensions are reduced to 14 based on the results of the PCA technique in Figure 9.To compare different models, we set the number of the input and hidden neurons for all models to 14 and 10 or 12 based on Figure 9 and Equation (26), respectively.And the learning rate is set to 0.0045.The maximum training error is set to 0.001, and the maximum iterations to 5000.
Then, 568 phase points are generated from the 598 measured data based on the Equations ( 1) and ( 5) by PSR.Thus, the beginning of the prediction is the 31st time series point according to Equation (5).The first 368 phase points are taken as training data, and the remaining 200 phase points are used as testing data.Fuzzy neural network (FNN) is adopted for soft measurement modeling for BOD.In order to verify the performance of the model with the chaos theory, the FNN with chaos theory (C-FNN) are applied for the prediction of BOD as well.The training and testing output of the FNN, C-FNN for BOD are shown in Figures 10 and 11, respectively.The scatter plots of the training and testing errors against the measured values are shown in Figures 12 and 13, respectively.3, the calculated training and testing RMSE, the minimum and maximum MAPE of FNN are 0.3665 and 0.6528, 5.9424% and 7.6864%, respectively.As a comparison, the training and testing RMSE, the minimum and maximum MAPE of C-FNN are 0.1356 and 0.3430, 2.9023% and 5.4033%, respectively.We can see that the C-FNN has better accuracy than FNN based on the value of the training or testing RMSE and MAPE.Besides, C-FNN has the smaller errors and more stable performance than FNN from the scatter plots in Figures 12  and 13, especially in extreme points.3, the calculated training and testing RMSE, the minimum and maximum MAPE of FNN are 0.3665 and 0.6528, 5.9424% and 7.6864%, respectively.As a comparison, the training and testing RMSE, the minimum and maximum MAPE of C-FNN are 0.1356 and 0.3430, 2.9023% and 5.4033%, respectively.We can see that the C-FNN has better accuracy than FNN based on the value of the training or testing RMSE and MAPE.Besides, C-FNN has the smaller errors and more stable performance than FNN from the scatter plots in Figures 12  and 13, especially in extreme points.In order to further verify the effects of the models which take chaos theory into consideration, the corresponding results and performance comparison of different models are also listed in Table 3.When m = [1 1 1 1 1] with each element representing BOD, COD, pH, SS and DO, respectively, it means that the input variables cannot be reconstructed based on Equations ( 3) and (4).When m = [8 6 6 5 7], it means that the input variables are reconstructed as shown in Table 3.The neural network model with chaos theory is written as C-ANN.The ANNs which are used for comparison include multilayer perceptron (MLP) [14], radial basis function neural network (RBF) [7], Elman neural network (Elman), fuzzy neural network (FNN) [1], MLP with chaos theory (C-MLP), RBF with chaos theory (C-RBF), Elman with chaos theory (C-Elman), and FNN with chaos theory (C-FNN).All ANNs adopt the traditional gradient descent method for training all connection weights.Notes: nh, the number of the hidden neurons; --, without PSR.
From Table 3, the C-ANNs all have smaller testing RMSE and MAPE than the corresponding ANN without chaos theory.In terms of the MAPE, the accuracy of the models with chaos theory increases about 2%-4% compared with the corresponding ANN without chaos theory.

Discussion
The WWTP has been proven to be a chaotic system based on Section 3.2.The PSR technique in Based on Figures 10-13 and Table 3, the calculated training and testing RMSE, the minimum and maximum MAPE of FNN are 0.3665 and 0.6528, 5.9424% and 7.6864%, respectively.As a comparison, the training and testing RMSE, the minimum and maximum MAPE of C-FNN are 0.1356 and 0.3430, 2.9023% and 5.4033%, respectively.We can see that the C-FNN has better accuracy than FNN based on the value of the training or testing RMSE and MAPE.Besides, C-FNN has the smaller errors and more stable performance than FNN from the scatter plots in Figures 12 and 13, especially in extreme points.In order to further verify the effects of the models which take chaos theory into consideration, the corresponding results and performance comparison of different models are also listed in Table 3.When m = [1 1 1 1 1] with each element representing BOD, COD, pH, SS and DO, respectively, it means that the input variables cannot be reconstructed based on Equations ( 3) and (4).When m = [8 6 6 5 7], it means that the input variables are reconstructed as shown in Table 3.The neural network model with chaos theory is written as C-ANN.The ANNs which are used for comparison include multilayer perceptron (MLP) [14], radial basis function neural network (RBF) [7], Elman neural network (Elman), fuzzy neural network (FNN) [1], MLP with chaos theory (C-MLP), RBF with chaos theory (C-RBF), Elman with chaos theory (C-Elman), and FNN with chaos theory (C-FNN).All ANNs adopt the traditional gradient descent method for training all connection weights.
From Table 3, the C-ANNs all have smaller testing RMSE and MAPE than the corresponding ANN without chaos theory.In terms of the MAPE, the accuracy of the models with chaos theory increases about 2%-4% compared with the corresponding ANN without chaos theory.

Discussion
The WWTP has been proven to be a chaotic system based on Section 3.2.The PSR technique in chaos theory can improve the accuracy of soft measurement modeling for BOD based on the results of the Table 3 in Section 3.3.That is because we can obtain a good representation of the attractor of the dynamical system by PSR [19].Therefore, it provided more information than the corresponding model without chaos theory.Then, ANN can learn more accurate laws from the richer information through training.In the practical application, the input and output data are obtained from measuring instruments or laboratory, which are memorized as the history data.Then, the datasets are analyzed and computed for chaotic characteristic and modeling for BOD.After it, the value of BOD can be obtained with the current input datasets.Actually, several important factors have great effects on the prediction accuracy of the soft measurement modeling for BOD based on chaos theory: (a) The quantity of the original dataset.On the one hand, the more the amount of data, the more dynamic information can be contained and the better precision the chaotic characteristic analysis has.
On the other hand, the ANN can learn more relationship and disciplinarian between input and output from it.(b) Effects of noise in the data.Generally, the original data also contain some noise, which have a negative effect on chaotic characteristic analysis and modeling in some degree.The noise can be defined as the unexplainable or random data that is found within the given data.In order to compare the difference between the de-noised data (Table 2) and noisy data for the chaotic characteristic analysis, the experiment are designed for noisy data.The results are listed in Table 4.
Table 4 shows that the data without noise-removal processing have great influence on the chaotic characteristics analysis.The existence of the noisy data can cause the increasing of the K, as well as λ 1 .This phenomenon is consistent with the significance of the K and λ 1 in Sections 2.1.2and 2.1.3.Moreover, it further misguides the chaotic characteristics analysis (i.e., some nonlinear dynamic system which is not chaos may be regarded as a chaotic system).Some of τ, D, m, K, λ 1 cannot be obtained under the influence of noise by the method mentioned in Section 2.1.The noisy data made it harder to obtain reliable parameters for modeling.According to the change of τ, D, m from Tables 2-4, the noise has different influence on various water quality parameters.Therefore, noise-removal processing is a necessary step.
(c) The selection of the input variables.Different input variables will lead to different results.
With mechanism analysis, simulation study and existing papers [1,2,4,5], influent COD, SS, pH and DO are selected as the input assistant variables finally.For more comprehensive analysis, the other models with different input variables have been examined for comparison and selection.The testing RMSE of soft measurement modeling for BOD with different input variables are shown in Table 5.From Table 5, the model with only pH and DO as the input has the lowest accuracy.Though the pH and DO can be measured online, the information that is provided by only pH and DO is insufficient for prediction.The models with three input variables (i.e., pH, DO, T and pH, SS, DO) have better performance than the model with only pH and DO since the more information come from the T or SS.The model with COD, pH, SS and DO as the inputs has the best accuracy among the given models.It is worth mentioning that the model with COD, pH, SS, DO and T has similar performance with COD, pH, SS and DO.However, the model with COD, pH, SS and DO is selected in this paper due to the fewer inputs and better accuracy.In addition, the model with five input variables (i.e., COD, pH, SS, DO, T, and ORP) does not have better precision by reason of the increasing dimensions of inputs, which have a negative effect on ANN's generalization.Therefore, the multiple factors such as biochemical mechanism, the number of inputs, accuracy, etc. need to be considered for the choice of input variables.
(d) The accuracy and rationality of the chaotic characteristic parameters.The chaotic characteristic parameters include delay time τ, embedding dimension m, Kolmogorov entropy K, and largest Lyapunov exponent λ 1 .The m, K and λ 1 are used to judge whether the nonlinear system is chaotic or not and indicate the degree of chaotic motion.The K and λ 1 , which just are the characterization of chaos, can provide key information for judging chaotic system and have no impact on the modeling or prediction.Especially, τ and m, which directly decide the reconstructed phase space by PSR, need to be appropriately selected.The performance comparison with different τ and m for C-FNN model are listed in Table 6.The choice of this paper for τ and m are marked in bold.
The number of the experimental phase points is the minimum value among them.
The accuracy of the modeling or prediction appeared downward trend with the increase or decrease of τ and m on the basic of the results in Table 2.This not only proved that the calculated value of the τ and m were accurate and reasonable, but also indicated that too large or too small values of τ and m can lead to poor prediction performance.Therefore, the estimation and calculation of the chaotic characteristic parameters is a very important part of the identification and modeling.Several experiments are conducted for the number of hidden neurons based on the errors and the range in Equation ( 26).The larger or smaller learning rate can cause the oscillation or slower convergence speed for ANN, respectively.(f) Normalization and dimensionality reduction.Generally, the scope of the normalization is [0, 1] or [−1, 1].The input and output dataset all need to be normalized for better training performance and generalization ability.The dimensions of input variables should be reduced for higher data quality.This needs to be further analyzed and tested for reasonable choice.

Conclusions
A novel soft measurement modeling method based on chaos theory and ANN for effluent BOD in WWTP is proposed in this paper.The chaotic characteristic of the WWTP has been first discovered by the fractional correlation dimension D, the positive Largest Lyapunov Exponent λ 1 and Kolmogorov entropy K of the BOD, COD, pH, SS, DO time series, which is different from the conventional research points about WWTP as a pure random and irregular system.Based on the above-mentioned chaotic characteristic, the chaos-ANN model, which combines chaos theory (i.e., PSR) with ANN, is further represented for the prediction of BOD time series.
The numerical experiments demonstrated that the proposed soft measurement modeling method based on chaos theory with the suitable m and τ has higher accuracy than the corresponding modeling method not based on chaos theory.Meanwhile, de-noised data, appropriate inputs and modeling parameters can contribute to the prediction precision.If one system has been proved to be a chaotic, the chaos theory can be added into the soft measurement model to improve the accuracy of prediction.The method can be expanded to other nonlinear modeling approaches for soft measurement and other similar practical engineering applications.Beside, further work will be performed to improve its convenience and integration for better application.

Figure 1 .
Figure 1.Activated sludge process in a wastewater treatment plant (WWTP).

Figure 1 .
Figure 1.Activated sludge process in a wastewater treatment plant (WWTP).
Model for BODIn this paper, the structure of the soft measurement model based on chaos theory with PCA-ANN for effluent BOD is shown in Figure2.The data which are measured in WWTP are sent to the data pre-processing (DP) module.The input assistant variables are determined by the mechanism and data analysis, which are easily measured, economical and have greater chemical correlation with effluent BOD.According to the analysis and existing papers[1,2,4,5,13], influent COD, SS, pH and DO are selected as the input assistant variables finally.

Figure 2 .
Figure 2. The structure of the prediction model based on chaos theory for BOD with PCA-ANN.

Figure 2 .
Figure 2. The structure of the prediction model based on chaos theory for BOD with PCA-ANN.

Figure 3 .
Figure 3.The original measured data used for effluent BOD modeling.

Figure 3 .
Figure 3.The original measured data used for effluent BOD modeling.

Figure 6 .
Figure 6.The curves of D over m for the BOD dataset.

Figure 6 .
Figure 6.The curves of D over m for the BOD dataset.

Figure 7 .
Figure 7. Two-dimension phase space portrait of BOD dataset.

Figure 8 .
Figure 8.The curves of m-K for BOD dataset.

Figure 8 .
Figure 8.The curves of m-K for BOD dataset.

Figure 9 .
Figure 9.The variance explained by the principal components.

Figure 9 .
Figure 9.The variance explained by the principal components.

Figure 10 .
Figure 10.The training output of the FNN, C-FNN for BOD.

Figure 11 .Figure 10 .
Figure 11.The testing output of the FNN, C-FNN for BOD.

Figure 10 .
Figure 10.The training output of the FNN, C-FNN for BOD.

Figure 11 .Figure 11 .
Figure 11.The testing output of the FNN, C-FNN for BOD.

Figure 11 .
Figure 11.The testing output of the FNN, C-FNN for BOD.

Figure 12 .
Figure 12.The training error of the FNN, C-FNN for BOD.

Figure 13 .
Figure 13.The testing error of the FNN, C-FNN for BOD.

Table 1 .
Measurable variables and corresponding connotations in WWTP.

Table 2 .
Delay time τ, correlation dimension D, embedding dimension m, Kolmogorov entropy K, largest Lyapunov exponent λ1 of the BOD, COD, pH, SS, DO.

Table 3 .
The performance comparison of different models.

Table 3 .
The performance comparison of different models.

Table 4 .
Delay time τ, correlation dimension D, embedding dimension m, Kolmogorov entropy K, largest Lyapunov exponent λ 1 of the BOD, COD, pH, SS, DO for noisy data.

Table 6 .
The performance comparison with different τ and m for C-FNN model.: M, the number of phase points; M train , the number of training data; M test , the number of testing data; * the number of experimental phase points.(e) The selection of the ANN modeling parameters.The ANN modeling parameters include the number of input and hidden neurons, learning rate, maximum iterations, and maximum training error.The number of inputs is determined by the embedding dimension m and PCA. Notes