1. Introduction
As one of the key effluent quality parameters for evaluating the performance of a wastewater treatment plant (WWTP), biochemical oxygen demand (BOD) reflects the content of biodegradable organic matter in the water and needs to be measured accurately [1,2]. Under the national standard, BOD must remain at or below a regulatory value (e.g., 10 mg/L under the “Level I class A” standard of the Discharge standard of pollutants for municipal wastewater treatment plant (GB-18918-2002) in China). Conventionally, BOD is determined by a laboratory test that takes several days [3]. Existing BOD monitoring instruments can measure BOD, but they are expensive and have poor stability [4,5]. Hence, it is crucial to find a way to make BOD measurement more convenient, economical and accurate.
To address this problem, soft measurement, which estimates hard-to-measure process variables from other easy-to-measure variables, is widely used in complex system modeling [6]. Artificial neural networks (ANNs) offer large-scale parallel distributed processing, fault tolerance, self-organized learning, classification, self-adaptation and a strong capability for nonlinear approximation, with high reconstruction accuracy and fast training for nonlinear dynamic systems [7,8]. ANN-based models, which can capture the regularities hidden in data, are therefore the most popular choice for soft measurement modeling and prediction [9], and the technique has been adopted to solve many practical engineering problems in WWTPs. The modeling variables for a WWTP include commonly measured parameters such as Q, T, pH, ORP, DO, MLSS, NH4-N, NOX-N, BOD, COD, SS, TSS, TP, TN, SVI and EC; their units and meanings are given in Table 1. Among them, Q, T, pH, ORP, DO, COD and SS are strongly correlated with BOD in the WWTP process. Some of them (i.e., T, pH, ORP and DO) can be measured online, whereas measuring COD and SS takes several hours.
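One simple way to screen the candidate variables listed above is to rank them by their absolute Pearson correlation with BOD. The sketch below is an illustration under stated assumptions, not the selection procedure used in this paper: the function name `rank_by_correlation` is hypothetical, and the COD, SS, ORP and BOD series are synthetic values generated inside the snippet rather than plant data.

```python
import numpy as np

def rank_by_correlation(candidates, target):
    """Rank candidate inputs by |Pearson r| with the target; returns [(name, r), ...]."""
    scores = [(name, float(np.corrcoef(values, target)[0, 1]))
              for name, values in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Synthetic illustration only: these series are generated here and are not plant data.
rng = np.random.default_rng(0)
n = 500
cod = rng.uniform(20.0, 60.0, n)
ss = rng.uniform(5.0, 30.0, n)
orp = rng.uniform(-100.0, 100.0, n)      # unrelated to BOD in this toy model
bod = 0.2 * cod + 0.15 * ss + rng.normal(0.0, 0.5, n)

for name, r in rank_by_correlation({"COD": cod, "SS": ss, "ORP": orp}, bod):
    print(f"{name}: r = {r:+.2f}")
```

Pearson correlation captures only linear association, so a low score does not rule out a nonlinear contribution; such a ranking would complement, not replace, mechanism analysis.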
An improved Takagi-Sugeno fuzzy neural network (TSFNN) was proposed to predict effluent BOD by soft measurement, using the influent COD, pH and SS and the DO of the aeration tank as input variables [1]. A self-organizing radial basis function (SORBF) neural network was introduced to estimate the effluent BOD concentration in a WWTP with the influent COD, pH and SS as inputs [4]. K-nearest neighbors (KNN), support vector machine (SVM) and self-organizing map (SOM) models were adopted to estimate the five-day N-allylthiourea BOD at 20 °C and suspended solids (SS) [10]. A soft computing method based on a wavelet neural network (WNN) with principal component analysis (PCA), combined with online measuring instruments, was applied for real-time detection and control of ORP, DO, pH, COD, etc. in sewage treatment [11]. An adaptive network-based fuzzy inference system (ANFIS) with PCA was introduced to estimate effluent SS and COD in a WWTP with influent COD, SS, Q, pH and DO as inputs [12]. A method based on a generalized regression neural network (GRNN) was proposed to estimate the effluent BOD concentration in a WWTP from effluent COD, SS, pH, T and EC [13]. A three-layer feed-forward ANN with a back-propagation learning algorithm was applied to forecast effluent BOD in a WWTP using the BOD at seven other sampling sites as inputs [14]. These studies adopted different modeling methods to perform soft measurement of effluent BOD and other effluent water quality parameters that are difficult to measure in WWTPs. However, the precision of soft measurement models for BOD is affected by various factors, such as the quality and quantity of the observed data, the selection of input variables, the modeling method and the ANN's parameters [1,3,4]. In addition, a WWTP is a complex dynamic system characterized by uncertainty, large time lags, high nonlinearity and strong coupling owing to changing environmental and operational conditions, which makes modeling, optimization and control difficult [15]. Predicting effluent BOD in a WWTP is therefore a challenging problem, and how to further improve the accuracy of soft measurement of BOD remains a question worth further study [16].
Because BOD prediction is affected by these characteristics of the WWTP, it is important to take them into account so as to obtain more useful information for soft measurement modeling. Chaos is a phenomenon that exists widely in nature and is a specific behavior of nonlinear dynamic systems. With the rapid development of chaos theory, chaotic time series analysis and prediction have been widely studied and have attracted attention in fields such as short-term electric load forecasting [17], economics [18], industrial manufacturing [19], signal processing [20] and medical diagnosis [21]. A multivariate prediction method for short-term electric load using chaos theory and radial basis function (RBF) neural networks significantly improved forecasting precision compared with univariate methods [17]. A multilayer perceptron (MLP) neural network model based on phase space reconstruction (PSR) from chaos theory was proposed to predict the carbon price, and it achieved higher prediction accuracy and better fitting than other related models [18]. A novel RBF prediction model for melt index (RBF-chaos) was set up to characterize its strongly nonlinear and correlated relationships under chaos theory; the results indicate that this model is superior to previous models that do not consider chaotic characteristics [19]. Another method achieved long-term prediction of chemical sensor baseline and drift time series based on PSR and an RBF neural network, and the results show that it forecasts them accurately over the long term [20].
In the above studies, the chaotic characteristics of the time series are identified and demonstrated before chaos theory is used for modeling. PSR preserves the properties of a chaotic attractor and clearly recovers the trajectory of a time series, so it provides more information for modeling and makes more accurate forecasting possible [17,18,19,20]. We therefore hypothesize that the WWTP is a chaotic system, although this has not yet been demonstrated. If it is, chaos theory can be incorporated into the soft measurement modeling of BOD in the WWTP.
In chaos theory, all possible states of a nonlinear chaotic system can be described by its phase space, in which each point, called a phase point, represents a complete physical state of the system [22]. The PSR technique, based on the Takens embedding theorem, can recover an m-dimensional phase space, and hence the structure of the attractor, from a single time series [23]. The chaotic attractor of the original dynamical system can thus be reproduced and more information extracted from a limited dataset, which may increase modeling accuracy.
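As a minimal sketch of PSR, the snippet below builds the Takens delay vectors X_i = (x_i, x_{i+τ}, ..., x_{i+(m-1)τ}) from a scalar series and checks the claim that a reconstructed phase point carries more information than a single observation. Assumptions: the Hénon map stands in for a chaotic series (it is not WWTP data), a least-squares fit with quadratic terms stands in for the ANN, and `delay_embed`, `henon_x` and `fit_and_score` are hypothetical helper names.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Hypothetical helper: rows are X_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return np.column_stack([x[j * tau : j * tau + n_points] for j in range(m)])

def henon_x(n, a=1.4, b=0.3):
    """x-component of the Henon map, a standard chaotic test series (not WWTP data)."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        out[i] = x
    return out

def fit_and_score(features, target):
    """Least-squares fit on [1, features, features**2]; a stand-in for the ANN. Returns RMSE."""
    design = np.column_stack([np.ones(len(features)), features, features ** 2])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return float(np.sqrt(np.mean(residual ** 2)))

x = henon_x(1100)[100:]          # discard the initial transient
X = delay_embed(x, m=2, tau=1)   # phase points (x_i, x_{i+1})
target = x[2:]                   # value to predict: x_{i+2}
X = X[:-1]                       # align phase points with their targets

rmse_without = fit_and_score(X[:, 1:], target)  # current value x_{i+1} only
rmse_with = fit_and_score(X, target)            # full reconstructed phase point
print(f"RMSE without PSR: {rmse_without:.4f}; with PSR: {rmse_with:.2e}")
```

Because the Hénon x-series satisfies x_{i+2} = 1 - 1.4 x_{i+1}^2 + 0.3 x_i, the fit on the two-dimensional phase point is essentially exact, while the fit on x_{i+1} alone leaves a clear residual.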
In this paper, the chaotic characteristics of the WWTP are first analyzed, and a new soft measurement method combining PCA and ANN based on chaos theory is proposed for predicting BOD. Numerical experiments are designed to verify its effectiveness and feasibility by comparing ANN models with and without chaos theory.
The rest of the paper is organized as follows. In Section 2, the methods for chaotic characteristic analysis are introduced, PSR based on the Takens embedding theorem is presented for univariate and multivariate time series modeling, and the WWTP and the structure of the PCA-ANN soft measurement model for BOD based on chaos theory are described. In Section 3, the numerical experiments and the comparisons between the proposed method and methods not based on chaos theory are presented. In Section 4, several important factors that affect the prediction accuracy of chaos-based soft measurement modeling for effluent BOD are analyzed and discussed. Finally, conclusions are given in Section 5.
4. Discussion
Section 3.2 demonstrated that the WWTP is a chaotic system, and the results in Table 3 (Section 3.3) show that the PSR technique from chaos theory improves the accuracy of soft measurement modeling for BOD. This is because PSR yields a good representation of the attractor of the dynamical system [19], which provides more information than the corresponding model without chaos theory; through training, the ANN can then learn more accurate relationships from this richer information. In practical applications, the input and output data obtained from measuring instruments or the laboratory are stored as historical data. These datasets are then analyzed for chaotic characteristics and used to build the BOD model, after which BOD can be estimated from the current input data. Several important factors strongly affect the prediction accuracy of chaos-based soft measurement modeling for BOD:
- (a)
The quantity of the original dataset. On the one hand, the more data there are, the more dynamic information they contain and the more precise the chaotic characteristic analysis becomes. On the other hand, the ANN can learn more of the relationships and regularities between input and output from a larger dataset.
- (b)
Effects of noise in the data. The original data generally contain some noise, which degrades chaotic characteristic analysis and modeling to some degree. Noise can be defined as the unexplainable or random component of the given data. To compare the chaotic characteristic analysis of de-noised data (Table 2) with that of noisy data, experiments were also run on the noisy data. The results are listed in Table 4.
Table 4 shows that data without noise-removal processing strongly influence the chaotic characteristic analysis. Noisy data increase both K and λ1, which is consistent with the meanings of K and λ1 given in Section 2.1.2 and Section 2.1.3. As a result, noise can mislead the chaotic characteristic analysis (i.e., a nonlinear dynamic system that is not chaotic may be regarded as chaotic).
With noisy data, some of τ, D, m, K and λ1 cannot be obtained by the methods described in Section 2.1, which makes it harder to obtain reliable parameters for modeling. Comparing τ, D and m across Table 2, Table 3 and Table 4 shows that noise affects different water quality parameters to different degrees. Noise-removal processing is therefore a necessary step.
- (c)
The selection of the input variables. Different input variables lead to different results. Based on mechanism analysis, simulation studies and existing papers [1,2,4,5], influent COD, SS, pH and DO were finally selected as the auxiliary input variables. For a more comprehensive analysis, models with other input variables were also examined for comparison and selection. The testing RMSE of the soft measurement models for BOD with different input variables is shown in Table 5.
Table 5 shows that the model with only pH and DO as inputs has the lowest accuracy: although pH and DO can be measured online, the information they provide is insufficient for prediction. The models with three input variables (i.e., pH, DO and T; pH, SS and DO) perform better than the pH-DO model because T or SS supplies additional information. The model with COD, pH, SS and DO as inputs has the best accuracy among the given models. The model with COD, pH, SS, DO and T performs similarly, but the model with COD, pH, SS and DO is selected in this paper because it has fewer inputs and better accuracy. In addition, the model with six input variables (i.e., COD, pH, SS, DO, T and ORP) does not achieve better precision, because the increased input dimension harms the ANN's generalization. Multiple factors, such as the biochemical mechanism, the number of inputs and accuracy, therefore need to be considered when choosing input variables.
- (d)
The accuracy and rationality of the chaotic characteristic parameters. The chaotic characteristic parameters include the delay time τ, embedding dimension m, Kolmogorov entropy K and largest Lyapunov exponent λ1. m, K and λ1 are used to judge whether a nonlinear system is chaotic and to indicate the degree of chaotic motion. K and λ1 only characterize chaos: they provide key information for identifying a chaotic system but have no direct effect on modeling or prediction. In contrast, τ and m directly determine the phase space reconstructed by PSR and therefore need to be selected appropriately. The performance of the C-FNN model with different τ and m is compared in Table 6, where the values of τ and m chosen in this paper are marked in bold. For a fair comparison, every experiment uses the same number of phase points, namely the smallest number available among the tested settings.
Relative to the values of τ and m in Table 2, increasing or decreasing either parameter reduces the modeling and prediction accuracy. This not only confirms that the calculated values of τ and m are accurate and reasonable, but also indicates that values of τ and m that are too large or too small lead to poor prediction performance. Estimating and calculating the chaotic characteristic parameters is therefore a very important part of identification and modeling.
- (e)
The selection of the ANN modeling parameters. The ANN modeling parameters include the numbers of input and hidden neurons, the learning rate, the maximum number of iterations and the maximum training error. The number of inputs is determined by the embedding dimension m and PCA. The number of hidden neurons was chosen through several experiments based on the errors and the range given in Equation (26). A learning rate that is too large causes the ANN training to oscillate, whereas one that is too small slows convergence.
- (f)
Normalization and dimensionality reduction. The normalization range is generally [0, 1] or [−1, 1]. Both the input and output datasets need to be normalized for better training performance and generalization. The dimension of the input variables should be reduced to improve data quality; the appropriate choice needs further analysis and testing.
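Items (d) to (f) together determine the ANN input layer: τ and m fix the reconstructed phase space, the phase points are normalized, and PCA reduces their dimension, so the number of ANN inputs equals the number of retained components. The sketch below chains these steps under several assumptions that are not taken from this paper: τ is chosen as the first lag at which the autocorrelation falls below 1/e (a common heuristic; the method of Section 2.1 may differ), m is supplied by the caller, each input series is embedded separately and the delay vectors are concatenated, scaling is to [−1, 1], and PCA keeps enough components to explain 95% of the variance. All function names are hypothetical.

```python
import numpy as np

def delay_by_autocorrelation(x, max_lag=50):
    """Assumed heuristic: first lag at which the autocorrelation drops below 1/e."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    energy = np.dot(x, x)
    if energy == 0:
        raise ValueError("constant series has no usable autocorrelation")
    for lag in range(1, max_lag + 1):
        if np.dot(x[:-lag], x[lag:]) / energy < 1.0 / np.e:
            return lag
    return max_lag  # fall back to the largest lag searched

def delay_embed(x, m, tau):
    """Delay vectors (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}) as rows."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return np.column_stack([x[j * tau : j * tau + n_points] for j in range(m)])

def minmax_scale(a, low=-1.0, high=1.0):
    """Column-wise scaling to [low, high]; constant columns are mapped to low."""
    a_min, a_max = a.min(axis=0), a.max(axis=0)
    span = np.where(a_max > a_min, a_max - a_min, 1.0)  # avoid division by zero
    return low + (a - a_min) * (high - low) / span, (a_min, a_max)

def pca_reduce(a, var_kept=0.95):
    """Project centred data onto the fewest principal components explaining var_kept."""
    centred = a - a.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    return centred @ vt[:k].T, k

def build_inputs(series, m):
    """Embed each equal-length input series with its own tau, then scale and apply PCA."""
    blocks = [delay_embed(x, m, delay_by_autocorrelation(x)) for x in series]
    n = min(len(b) for b in blocks)
    # Keep the last n rows of every block so that row r of each block ends at the
    # same time step (assumes all series have the same length).
    stacked = np.hstack([b[-n:] for b in blocks])
    scaled, _ = minmax_scale(stacked)
    return pca_reduce(scaled)

t = np.arange(1000)
inputs, n_components = build_inputs(
    [np.sin(2 * np.pi * t / 40), np.cos(2 * np.pi * t / 20)], m=3)
print(inputs.shape, n_components)
```

In practice the scaling bounds returned by `minmax_scale` would be computed on the training data and reused for the testing data, and the BOD output would be scaled in the same way.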
5. Conclusions
A novel soft measurement modeling method based on chaos theory and ANN is proposed in this paper for effluent BOD in WWTPs. The chaotic characteristics of the WWTP were discovered for the first time through the fractional correlation dimension D, the positive largest Lyapunov exponent λ1 and the Kolmogorov entropy K of the BOD, COD, pH, SS and DO time series, in contrast to the conventional view of the WWTP as a purely random and irregular system. Based on these chaotic characteristics, the chaos-ANN model, which combines chaos theory (i.e., PSR) with ANN, is further presented for predicting the BOD time series.
The numerical experiments demonstrate that the proposed chaos-based soft measurement method with suitable m and τ achieves higher accuracy than the corresponding method not based on chaos theory. De-noised data and appropriate inputs and modeling parameters also contribute to prediction precision. Once a system has been proven to be chaotic, chaos theory can be incorporated into its soft measurement model to improve prediction accuracy. The method can be extended to other nonlinear modeling approaches for soft measurement and to similar practical engineering applications. Future work will focus on improving its convenience and integration for practical application.