A Real-Time BOD Estimation Method in Wastewater Treatment Process Based on an Optimized Extreme Learning Machine

It is difficult to capture real-time online measurement data for biochemical oxygen demand (BOD) in wastewater treatment processes. An optimized extreme learning machine (ELM) based on an improved cuckoo search (ICS) algorithm is proposed in this paper for the design of a soft BOD measurement model. In ICS-ELM, the input weight matrices of the extreme learning machine and the thresholds of the hidden layer are encoded as the cuckoo's nest locations. The best input weight matrices and thresholds are obtained by using the strong global search ability of the improved cuckoo search algorithm. The optimal results can be used to improve the forecasting precision with fewer hidden layer neurons in the ELM. Simulation results show that the soft sensor model has good real-time performance, high prediction accuracy, and stronger generalization performance for BOD measurement of the effluent quality compared to other modeling methods, such as the back propagation (BP) network, in most cases.


Introduction
Awareness of environmental protection in society has been gradually improving due to better education on sustainability, and wastewater treatment has become one of the important research topics in the field of environmental protection. Biochemical oxygen demand (BOD) is one of the major parameters among the effluent quality indices of wastewater treatment processes. Real-time and accurate monitoring of BOD is the key factor in improving the automatic control performance of wastewater treatment processes. However, due to the strong nonlinear and time-varying characteristics of wastewater treatment processes, as well as the limited capacity of current detection methods [1] and the measurement accuracy of the sensors and instruments, it is difficult to achieve accurate real-time, on-line measurement of BOD data; this limits the application of closed-loop control in wastewater treatment processes [2]. Hence, there is a great need for methods that can measure BOD rapidly and accurately so that wastewater treatment processes can be controlled effectively.
Recently, the development of soft sensing technology provides a new way of measuring variables which are not measurable on-line in real-time. Especially, researchers have applied soft sensing technology to model the wastewater treatment processes and have achieved good results.
In this paper, the root mean square error (RMSE) between the actual value and the predicted value is used as the fitness value to obtain the optimal parameters of the ELM by using an improved cuckoo search (ICS) algorithm [15]. The fuzzy rough monotone dependence (FRMD) algorithm proposed by Liang et al. [16] is used in the Matlab simulation to reduce the dimensionality of the data from the BSM1 simulation model [17,18]. The reduced data are used as the input of the soft sensor, and the effluent BOD is used as its output.

Materials and Methods
The following steps should be taken to build the neural network model: (i) Obtain the data and normalize them. (ii) Carry out the data attribute reduction process using the fuzzy rough monotone dependence (FRMD) algorithm (described in Section 2.3.2). This reduces the number of input layer nodes, which saves computational cost during model training and makes the process more efficient. (iii) Train the model using the final data until it reaches the fitness function f (Equation (13)). (iv) Verify the model using the test data sets. (v) Employ the model in field trials. The most important aspect of building a neural network model of a process is to obtain valuable data for training and verification; this guarantees the correctness and effectiveness of the model. The modeling process can easily be implemented in MATLAB.
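The normalization in step (i) is not specified further in the text; a minimal sketch in Python with NumPy (assuming simple min-max scaling of each attribute to [0, 1], a common choice for neural network inputs) might look like this:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (attribute) of X to the range [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Three samples of two attributes, as an example.
X = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
Xn = min_max_normalize(X)
```

The same minimum/maximum values obtained from the training data would be reused when scaling the verification and test sets, so that all sets share one scale.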
Our paper focuses on the following three points. The first is the fuzzy rough monotone dependence (FRMD) algorithm for the data attribute reduction process, which reduces the data attributes and, simultaneously, the number of input layer nodes of the neural network (NN) model. The second is using an improved cuckoo search (ICS) algorithm to adjust and optimize the input weights and the hidden layer biases during the training process, which helps to improve the prediction accuracy and stability of the extreme learning machine (ELM) NN model. The third is using the proposed model to carry out a real-time BOD estimation study in the wastewater treatment process.

Extreme Learning Machine
For N arbitrary distinct samples (x_j, t_j), where x_j = [x_j1, x_j2, …, x_jn]^T ∈ R^n and t_j = [t_j1, t_j2, …, t_jm]^T ∈ R^m, and an activation function g: R → R that is infinitely differentiable in any finite interval, standard single hidden layer feed-forward networks (SLFNs) with L hidden nodes are mathematically modeled as

∑_{i=1}^{L} β_i g(w_i · x_j + b_i) = o_j, j = 1, …, N, (1)

where w_i = [w_i1, w_i2, …, w_in]^T is the weight vector connecting the i-th hidden node and the input nodes; β_i = [β_i1, β_i2, …, β_im]^T is the weight vector connecting the i-th hidden node and the output nodes; b_i is the threshold of the i-th hidden node; and w_i · x_j is the inner product of w_i and x_j. The network topology with linear output nodes is shown in Figure 1.

Using the randomly assigned input weights w_i ∈ R^n and thresholds b_i ∈ R, the statement that standard SLFNs can approximate these N samples with zero error means that there exist β_i, w_i, and b_i such that

∑_{i=1}^{L} β_i g(w_i · x_j + b_i) = t_j, j = 1, …, N. (2)

Equation (2) can be expressed compactly in matrix form as

Hβ = T, (3)

where H is the N × L matrix with entries H_ji = g(w_i · x_j + b_i), β = [β_1^T, …, β_L^T]^T is L × m, and T = [t_1^T, …, t_N^T]^T is N × m. As named in Huang [19], H is called the hidden layer output matrix of the neural network; the i-th column of H is the output of the i-th hidden node with respect to the inputs x_1, x_2, …, x_N, and the j-th row of H is the hidden layer feature mapping h(x_j) with respect to the input x_j.
If the activation function g is infinitely differentiable and the parameters of the hidden layer nodes can be randomly generated, then we have the following result [6]. Theorem 1. Given a small positive value ε (ε > 0), an activation function g: R → R which is infinitely differentiable in any interval, and N arbitrary distinct samples (x_i, t_i) ∈ R^n × R^m, there exists L ≤ N such that, for parameters of the network {(w_i, b_i)}_{i=1}^{L} generated randomly according to any continuous probability distribution, with probability one,

‖H_{N×L} β_{L×m} − T_{N×m}‖ < ε. (7)

From the point of view of interpolation, the largest number of hidden layer nodes L should not exceed the number of training samples N. In fact, when L is equal to N, the training error will be zero. According to Theorem 1, when L is less than N, SLFNs will approach the training samples with a very small training error; the matrix H is then a non-square matrix, and there exist ŵ_i, b̂_i, and β̂ such that Equation (7) is established.
Unlike traditional function approximation theories, the input weights w_i and the hidden layer biases b_i do not necessarily need to be tuned: once random values have been assigned to these parameters at the beginning of learning, the hidden layer output matrix H remains unchanged, which makes Equation (7) a linear system. Training SLFNs is then simply equivalent to finding the least squares solution β̂ of the linear equations Hβ = T, that is,

‖Hβ̂ − T‖ = min_β ‖Hβ − T‖. (8)

The smallest norm least squares solution of the weights of the above linear system is unique, and is given by

β̂ = H⁺T, (9)

where H⁺ is the Moore–Penrose generalized inverse of the matrix H [20,21]. Thus, the main steps of the ELM learning algorithm can be summarized as follows. With the given training set (x_i, t_i) ∈ R^n × R^m, i = 1, …, N, the activation function g(x), and the number of hidden nodes L: Step 1: Assign the input weights w_i and the hidden layer biases b_i randomly (where i = 1, …, L); Step 2: Calculate the hidden layer output matrix H; Step 3: Calculate the output weights β̂.
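The three ELM learning steps above can be sketched in a few lines of Python with NumPy; the sigmoid activation and the uniform [−1, 1] initialization range are illustrative choices of this sketch, not prescribed by the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, L, seed=None):
    """Basic ELM: random w_i, b_i (Step 1); hidden matrix H (Step 2); beta (Step 3)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, (L, n))   # Step 1: random input weights w_i
    b = rng.uniform(-1.0, 1.0, L)        # Step 1: random hidden biases b_i
    H = sigmoid(X @ W.T + b)             # Step 2: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T         # Step 3: beta = H^+ T (Equation (9))
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W.T + b) @ beta

# Smoke test: fit a simple 1-D function with L = 40 hidden nodes.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 1))
T = np.sin(3.0 * X)
W, b, beta = elm_train(X, T, L=40, seed=1)
err = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```

Because β is obtained in closed form via the pseudoinverse, there is no iterative weight tuning; this is what makes the basic ELM fast compared with back-propagation training.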

Improved Cuckoo Search (ICS) Algorithm
The cuckoo search (CS) algorithm is recognized as an efficient optimization method [22,23]. The principle of the CS algorithm is how a cuckoo can find an optimal nest to hatch its eggs by free search, based on the obligate brood parasitism and Lévy flight mechanisms which are unique in nature. An important advantage of this algorithm is its simplicity. In fact, compared with other population- or agent-based metaheuristic algorithms such as particle swarm optimization (PSO) and harmony search, there is essentially only a single parameter p_a in CS (apart from the population size n), which represents the probability of an egg being found by the host. Therefore, it is very easy to implement.
Its few parameters, simple operation, ease of implementation, and good ability to search random paths are the main advantages of the CS algorithm. Therefore, once the CS algorithm was proposed, it was rapidly developed and applied to solve a variety of optimization problems. At the same time, however, CS has some disadvantages, such as slow convergence speed and a lack of adaptability. We proposed an improved cuckoo search (ICS) algorithm [15], which introduces a cosine cyclic operator into the CS algorithm to realize a periodic change of p_a (Equation (10)) and an adaptive dynamic adjustment strategy for the search step size S (Equation (11)). In Equation (10), T is the cycle of the periodic operator; t is the evolution generation of the current iteration; and p_a,max and p_a,min are the dynamic control parameters of p_a, equal to 0.75 and 0.1, respectively. In Equation (11), m ∈ (0, 1) is a regulatory factor; bestX_(i−1) is the optimal nest position of the last generation; k is the limiting factor; t and t_max are the current and the maximum iteration numbers; S_min is the minimum search step; and p is an integer from 1 to 30. Based on the above analysis and improvements, the ICS algorithm steps can be described as follows: Step 1: Set the objective function and initialization function; generate an initial population of n host nests x_i (i = 1, 2, …, n); set the size of the population, the dimension of the independent variables, the maximum iteration number, and the maximum and minimum probabilities of being detected.
Step 2: Calculate the current optimal nest position by putting x i into the objective function.
Step 3: Record the location of the nest, and use Equation (11) to calculate the current step size S, then use Equation (12) to update the location of the nest.
x_i^(t+1) = x_i^(t) + α ⊕ Lévy(λ), (12)

where the product ⊕ denotes entrywise multiplication, and α > 0 is a step size control factor.
Step 4: Compare the current value of the objective function with the last value; Update the value if the function value is better than the previous one; otherwise, keep it unchanged.
Step 5: After updating the nest location, choose a random number ε ∈ [0, 1], which obeys a uniform distribution; if ε > p_a, randomly change the value of x_i^(t+1); otherwise, leave it as it is. Keep the optimal nest position at the end.
Step 6: Return to Step 2 if the iteration number has not reached the maximum iteration number; otherwise, continue to the next step.
Step 7: Output the optimal nest location. More details of the ICS algorithm can be found in Du et al. [15].
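For illustration, the core CS loop (Steps 2 to 6) can be sketched as follows; this simplified version uses a fixed p_a and a fixed step size control factor α rather than the cosine cycling and adaptive step of Equations (10) and (11), whose exact forms are given in Du et al. [15]:

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(rng, dim, lam=1.5):
    """Draw a Levy-distributed step via Mantegna's algorithm."""
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)

def cuckoo_search(f, dim, n=25, p_a=0.25, alpha=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    nests = rng.uniform(-1.0, 1.0, (n, dim))      # Step 1: initial nests
    fit = np.array([f(x) for x in nests])
    for _ in range(iters):                        # Step 6: iterate to the limit
        best = nests[fit.argmin()].copy()         # Step 2: current best nest
        for i in range(n):                        # Step 3: Levy-flight update (Equation (12))
            cand = nests[i] + alpha * levy_step(rng, dim) * (nests[i] - best)
            fc = f(cand)
            if fc < fit[i]:                       # Step 4: keep the better nest
                nests[i], fit[i] = cand, fc
        for i in range(n):                        # Step 5: abandon discovered nests
            if rng.random() < p_a:
                j, k = rng.integers(n), rng.integers(n)
                cand = nests[i] + rng.random() * (nests[j] - nests[k])
                fc = f(cand)
                if fc < fit[i]:
                    nests[i], fit[i] = cand, fc
    return nests[fit.argmin()], float(fit.min())  # Step 7: optimal nest

best_x, best_f = cuckoo_search(lambda x: float(np.sum(x ** 2)), dim=3)
```

Here the sphere function ∑x² stands in for the objective; in the soft sensor, the objective is the training RMSE of Equation (13) and each nest encodes the ELM input weights and biases.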

ICS-Based ELM
When the structure of ELM has been fixed, the network needs to be trained offline first before using the soft sensor online. To improve the prediction accuracy, the input weights w i and the hidden layer biases b i should be optimized through the training process by ICS algorithm.
The root mean square error (RMSE) between the actual values and the predicted values is taken as the fitness value of each nest in the group, as shown in Equation (13):

f = sqrt( ( ∑_{j=1}^{N_train} ‖ ∑_{i=1}^{m} β_i g(w_i · x_j + b_i) − t_j ‖² ) / N_train ), (13)

where N_train is the number of training samples, m is the number of hidden nodes, and f is the fitness function. The experimental data used in our research are obtained from the outputs of a 14-day simulation of the benchmark simulation model No. 1 (BSM1) [17] developed by the International Water Association (IWA). A data set provided by the European Co-operation in the field of Scientific and Technical Research (COST), based on the actual influent water quality of a wastewater treatment plant, was used as the simulation input for the experimental process. The data represent four conditions: steady state as well as dry, rainy, and stormy weather. The plant consists of 5 bioreactors and a ten-layer secondary settler, as shown in Figure 2. The initial conditions of the operating parameters of the main executor in BSM1 are set as in Table 1. Steps to obtain the experimental data are as follows:
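For the soft sensor, each nest encodes a candidate set of ELM input weights and hidden biases, and its fitness is the training RMSE of Equation (13). A sketch of such a fitness function follows; the flat encoding layout and the sigmoid activation are assumptions of this sketch, and the target here is a synthetic stand-in rather than real effluent data:

```python
import numpy as np

def decode_nest(nest, n_inputs, L):
    """Split a flat nest vector into input weights W (L x n) and biases b (L,)."""
    W = nest[: L * n_inputs].reshape(L, n_inputs)
    b = nest[L * n_inputs:]
    return W, b

def fitness_rmse(nest, X, T, L):
    """Equation (13): training RMSE of the ELM for a candidate nest."""
    W, b = decode_nest(nest, X.shape[1], L)
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))   # hidden layer outputs
    beta = np.linalg.pinv(H) @ T               # least-squares output weights (Equation (9))
    resid = H @ beta - T
    return float(np.sqrt(np.mean(np.sum(resid ** 2, axis=1))))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (100, 7))           # 7 reduced influent attributes
T = np.sum(X, axis=1, keepdims=True)           # synthetic stand-in target
L = 40
nest = rng.uniform(-1.0, 1.0, L * 7 + L)       # one candidate nest
f = fitness_rmse(nest, X, T, L)
```

The ICS search then minimizes this fitness over nest vectors, so only w_i and b_i are evolved while β is always recomputed in closed form.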

Table 1. Initial conditions of the operating parameters of the main executor in BSM1 (parameter, value, unit, description).
Step 1: Run a 100-day steady state simulation and repeat until the system achieves steady stability; Step 2: Run a 14-day simulation with input data representing dry weather conditions and repeat until the system achieves dynamic stability; Step 3: Simulate BSM1 to obtain the experimental data using the influent data mentioned above. Two weeks of data on the wastewater treatment process were finally obtained (1344 groups in total, with a sampling time of 15 minutes: 4 groups × 24 hours × 14 days = 1344 groups). The compositions of the wastewater are shown in Table 2. Obviously, the reaction mechanisms of the activated sludge process are very complex and the process parameters involved are numerous; therefore, the dimension of the obtained wastewater data is too high. This will not only lead to overfitting, but can also cause the curse of dimensionality. In order to avoid these problems without affecting the accuracy of the model prediction, it is necessary to find the parameters which have a great influence on the BOD of the effluent. Therefore, the fuzzy rough monotone dependence (FRMD) algorithm [16,24] is used to process the data and reduce the attributes.
It can be shown that there is a monotonic dependence between the conditional attributes and the decision attributes. Based on the FRMD algorithm, the steps of attribute reduction for the wastewater data are as follows: Step 1: Define and initialize a two-dimensional array D[n, m] for the decision table, where the m-th column is the decision attribute (that is, the effluent water quality data) and the 1st to the (m − 1)-th columns are the conditional attributes (that is, the wastewater influent data); Step 2: Arrange the decision attribute values in ascending order, and exchange the rows of the conditional attributes to match the ordered decision attribute; Step 3: Study, in a loop, the fuzzy rough monotone dependence relation between each conditional attribute value and the decision attribute value, and obtain the membership function values; Step 4: If the membership degree function in the set is monotonically increasing, output the maximum of the membership values; otherwise, output 0.
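The exact membership functions of FRMD are defined in [16,24]; purely as an illustration of Steps 1 to 4, the following simplified sketch sorts the decision table by the decision attribute and scores how monotonically each conditional attribute follows it, returning 0 for attributes to be discarded:

```python
import numpy as np

def monotone_dependence(D):
    """Simplified illustration of the reduction steps: sort rows by the decision
    attribute (last column, Step 2), then score each conditional attribute by the
    fraction of non-decreasing consecutive differences (Steps 3-4). This is a
    crude proxy for the FRMD membership degrees, not the algorithm of [16,24]."""
    order = np.argsort(D[:, -1])            # Step 2: ascending decision attribute
    D = D[order]
    degrees = []
    for j in range(D.shape[1] - 1):         # Step 3: each conditional attribute
        diffs = np.diff(D[:, j])
        frac_increasing = float(np.mean(diffs >= 0))
        # Step 4: keep only attributes that track the decision monotonically.
        degrees.append(frac_increasing if frac_increasing > 0.5 else 0.0)
    return np.array(degrees)

# Decision attribute x^2 depends monotonically on x, anti-monotonically on -x,
# and not at all on noise; only the first attribute should score highly.
x = np.linspace(0.0, 1.0, 50)
noise = np.random.default_rng(0).random(50)
D = np.column_stack([x, -x, noise, x ** 2])
deg = monotone_dependence(D)
```

Attributes with degree 0 would be discarded before training, which is how the nine candidate attributes of the next section were obtained.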
After the attribute reduction, if the membership degree is 0, the conditional attribute will be discarded. The remaining conditional attributes are considered as influential to the BOD, and will be used as the input of the soft sensor to predict BOD.

Data Attribute Reduction
Using the fuzzy rough monotone dependence (FRMD) algorithm for the data attribute reduction process, BOD is taken as the decision attribute and the components in Table 2 are taken as the conditional attributes. Figure 3 shows the membership degrees between BOD and the component parameters; the conditional attributes whose membership degrees are equal to 0 have been discarded and are not shown. Nine attributes are shown in Figure 3, but it is clear that the degrees of Xnd and TSS are much closer to zero than the others. Therefore, to reduce the complexity of the system, these two attributes were not taken into account in the subsequent simulation. The remaining seven attributes were taken as the inputs of the ELM soft sensor.

Comparison and Discussion of Simulation Results
The effectiveness of the ICS-based ELM BOD soft sensing model was verified based on the data attribute reduction results. S_s, X_i, X_s, X_bh, S_nh, and S_nd of the influent and the flow rate Q are used as the auxiliary variables for the network input, so the ICS-based ELM network structure had 7 input nodes, 40 hidden nodes, and one output node (7-40-1). The 1344 groups of data were randomly divided into training data sets (500 groups), verification data sets (460 groups), and testing data sets (384 groups) for the simulation. The training accuracy and prediction accuracy of the soft sensor model are represented by the mean square error (MSE).

MSE = (1/N) ∑_{n=1}^{N} (Y(n) − Y*(n))²,

where Y(n) is the predicted output of the model, Y*(n) is the actual measured value, and N is the sample size.
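The MSE criterion can be computed directly; a small Python helper matching the definition above:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean square error between predicted and measured effluent BOD."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Example with illustrative numbers: errors of 0, 0, and 2 give MSE = 4/3.
val = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```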
The parameters are set as follows: for the ICS algorithm, the population size n is 25, and the maximum and minimum probabilities of being discovered by the host bird are 0.75 and 0.1, respectively; for the adaptive step length control parameters, m = 0.8, k = 0.2, p = 25, and S_min = 0.01.
When the number of iterations reaches t = 100, the iteration terminates; the curve of the optimization process is shown in Figure 4. The prediction results and errors of the ICS-based ELM soft sensor for effluent BOD are shown in Figure 5. It can be seen from the results that the ICS-based ELM has better prediction accuracy than the basic ELM. To verify the advantage of the ICS-based ELM, comparison studies were simulated with five other models, namely the basic extreme learning machine (ELM), cuckoo search (CS)-based ELM, relevance vector machine (RVM), back-propagation (BP) neural network, and least squares support vector machine (LS-SVM), using the same influent data under the dry weather condition. The MSE results are shown in Table 3, and the prediction results are shown in Figure 6. Clearly, the MSE of the ICS-based ELM is much smaller than that of the other five models, although the training process takes slightly longer than for the basic ELM because of the optimization of the input weights w_i and the hidden layer biases b_i. To further verify the anti-interference ability of the ICS-based ELM prediction model, rainy and stormy weather conditions were considered as disturbances of the system, and the simulated BOD results are shown in Figures 7 and 8. As can be seen from Figures 7 and 8, no matter how the weather conditions change, a well-trained ICS-based ELM can still predict the effluent BOD with smaller errors than the other five models considered in this paper.

Conclusions
In this paper, an ICS-based ELM is applied to BOD soft sensing modeling to predict the effluent water quality. It overcomes the low prediction accuracy and poor stability of the basic ELM algorithm by means of an improved cuckoo search algorithm. The input weights and the hidden layer biases of the ELM are optimized in an offline training process. The results show that the ICS-based ELM BOD soft sensing model can improve the prediction accuracy, with better anti-interference and generalization abilities than the basic ELM algorithm. Because of the accurate prediction of the BOD in the process, it may also be helpful for energy saving in the aeration operation in future research. In summary: (1) The fuzzy rough monotone dependence (FRMD) algorithm was used for the data attribute reduction process. This allowed the input parameters closely related to BOD to be found based on the membership function degrees of each conditional attribute, reducing the number of input layer nodes, which saves computational cost when training the model and makes the process more efficient.
(2) A periodic change of p_a and an adaptive dynamic adjustment strategy were designed to improve the cuckoo search algorithm. The input weights and the hidden layer biases of the proposed ICS-based ELM are optimized during the offline training process. The verification results show that the proposed method can effectively improve the prediction accuracy, with better anti-interference and generalization abilities.
(3) Comparing the simulation results of the ICS-based ELM, CS-based ELM, basic ELM, RVM, LS-SVM, and BP models showed that, no matter how the weather conditions change, a well-trained ICS-based ELM can still predict the effluent BOD with smaller errors than the other five models considered in this paper.
