Cholesky Factorization Based Online Sequential Multiple Kernel Extreme Learning Machine Algorithm for a Cement Clinker Free Lime Content Prediction Model

Abstract: Because the free lime (fCaO) content of cement clinker is difficult to measure in real time and the offline measurement cycle is long, an online prediction model for fCaO content is of considerable practical value. In this work, an online sequential multiple kernel extreme learning machine algorithm based on Cholesky factorization (COS-MKELM) is proposed. The $LDL^{T}$ form of the Cholesky factorization is introduced to avoid the heavy computation of matrix inversion, and the stored initial information is reused to realize online model identification. Three regression datasets are then used to test the performance of the COS-MKELM algorithm. Finally, an online prediction model for fCaO content is built based on COS-MKELM. Experimental results demonstrate that the fCaO content model improves learning efficiency, regression accuracy, and generalization ability. In addition, the online prediction model can be corrected in real time when the production conditions of cement clinker change.


Introduction
Cement is an important building material. Its quality will affect the building safety directly [1,2]. In addition, the cement clinker is a key product during cement production. At the same time, the content of cement clinker free lime (fCaO) is an important index to evaluate the clinker quality [3]. It is the main factor that impacts the cement stability and clinker strength. Thus, operators and engineers should adjust the variables of cement production according to the fCaO content in real time. In the process of cement production, the cement clinker fCaO content should be controlled within a certain range [4]. When the fCaO content is higher than the standard range, the cement clinker is unqualified. On the other hand, when it is lower than the standard range, the clinker is over-burning and the energy consumption increases [5].
At present, fCaO content is commonly measured by chemical analysis or fluorescence analyzers. Cement clinker is sampled manually and intermittently, and the samples are then sent to the laboratory for manual analysis [6]. Depending on the requirements of the cement production line, the interval between laboratory tests of fCaO content varies from 1 to 4 h. In addition, there is a time difference between the individual stages of the clinker firing process, so the measured fCaO content lags considerably behind the production state it is meant to guide. The measurement results therefore cannot meet the requirements of real-time optimization control in the cement clinker firing process [7]. Consequently, the essential task in estimating fCaO content is to set up a high-precision online prediction model, which is very important for monitoring and optimizing the operation of cement plants.
Considering the nonlinearity and uncertainty between the fCaO content and the related production process variables, some scholars have established regression prediction models using neural network modeling. Fuzzy entropy and a neural-net ensemble with random weights were used to build a soft sensor of fCaO content [8]. Szatvanyi et al. [9] and Li et al. [10] combined flame image features and process variables to build prediction models for fCaO content. Liu et al. [11] proposed a support vector machine ensemble model to estimate fCaO content. A time series analysis method [12] and an improved combination modeling method [13] were used for soft measurement of cement clinker fCaO content. Several neural network methods for establishing prediction models in the cement calcination process were also proposed, such as the time-varying delay deep belief network [14], the multi-channel CNN with moving window [15], the deep belief network with sliding window [16], and the two-dimensional convolutional neural network [17]. In addition, other work on online prediction models for fCaO content has been published. Pani et al. [18,19] used a feed-forward artificial neural network and fuzzy inference to build a soft sensor model for online prediction of fCaO content. A multivariate time series analysis and convolutional neural network method [20] and the kernel extreme learning machine algorithm [21] were proposed for online cement clinker quality monitoring. Some of the above scholars established prediction models of cement clinker fCaO content based on offline neural networks. However, when the training sample set is too large or the working conditions of cement clinker firing change, the modeling process needs to be repeated, which greatly increases the computational cost of modeling the cement clinker fCaO content.
In addition, some of the above scholars established the prediction models of cement clinker fCaO content based on the online neural network. However, due to the complexity of the cement clinker firing process, the performance of the online prediction model of the clinker fCaO content is poor, and the learning speed of parameters training is slow, which leads to the poor prediction effect of the cement clinker fCaO content. Therefore, it is necessary to deeply study the new online sequential learning neural network algorithm applicable to the cement clinker fCaO content prediction model. This can not only solve the problem of establishing and predicting the cement clinker fCaO content prediction model but also overcome the disadvantages of the existing conventional neural network algorithm, such as poor network performance and slow learning speed.
In recent years, the extreme learning machine (ELM) [22,23] and the kernel extreme learning machine (KELM) [24] have been proposed and successfully applied to model identification tasks such as saliency detection [25], gesture recognition [26], image classification [27], nonlinear fault detection [28], a seepage time soft sensor model of nonwoven fabric [29], and rolling bearing sub-health recognition [30]. However, different kernel functions have different characteristics, and their performance varies across applications. Thus, the multiple kernel extreme learning machine (MKELM) was proposed in [31] as a combination of multiple kernel learning and ELM, and it has also been successfully applied to many applications, such as network intrusion detection [32,33] and human activity recognition [34]. Considering the strong nonlinear relationship between the fCaO content and the related variables during cement production, MKELM is adopted in this paper to build an online prediction model for fCaO content. However, it is necessary to study how to select appropriate kernel functions for the cement clinker fCaO content online prediction model.
In addition, Liang et al. [35] developed the online sequential extreme learning machine (OS-ELM) algorithm, and the online sequential kernel extreme learning machine (OS-KELM) algorithm was proposed by Wang and Han [36]. These online learning algorithms have received widespread attention in related research, such as time series modeling [37], dynamic modeling [38], batch processes [39], and discharge forecasting [40]. Because of the heavy computation of matrix inversion and the computational complexity of online sequential algorithms, the Cholesky factorization method has been introduced into the extreme learning process [41,42]. However, the OS-ELM and OS-KELM algorithms need to compute the inverse of the corresponding matrices in each online sequential learning step. When a large amount of data is added, the matrix dimension in the learning process of the OS-ELM and OS-KELM algorithms becomes very large, which increases the computation of the online sequential learning process and slows online training [43]. Therefore, in view of the limitations of the existing ELM algorithms, it is necessary to further study neural network algorithms for online sequential learning and to improve the structure of existing online learning networks.
In this research, considering the characteristics of different kernel functions, three typical kernel functions are selected to construct an equivalent kernel function, which is used in the MKELM algorithm. To avoid the heavy computation of matrix inversion and reduce the computational complexity of the online sequential learning process, the $LDL^{T}$ form Cholesky factorization based online sequential multiple kernel extreme learning machine (COS-MKELM) algorithm is proposed. The COS-MKELM algorithm differs in two major respects: the $LDL^{T}$ form Cholesky factorization of the matrix is introduced, and the new matrix elements are calculated recursively from the stored information of the original matrix. Then, three classical UCI regression datasets (available at https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html, accessed on 5 December 2020) are used to test the validity of COS-MKELM. In addition, the cement clinker fCaO content online prediction model is built with the COS-MKELM algorithm. Simulation results show that the online prediction model has good learning efficiency, regression accuracy, and generalization ability.
The rest of this paper is arranged as follows. In Section 2, the MKELM algorithm with three typical kernel functions is derived. The COS-MKELM algorithm is proposed and tested in Section 3. In Section 4, the cement clinker fCaO online prediction model is built and verified by simulation. Finally, Section 5 concludes the paper.

The MKELM Algorithm
In this section, the MKELM algorithm is given to handle different heterogeneous data integrations. It can improve the model precision compared with single kernel function. The optimal kernel of MKELM is expressed as a linear combination of multiple different basic kernel functions.
Assume the set of training data is $\{x_i, y_i\}$, where $i = 1, 2, \ldots, N$ and $N$ is the number of samples. $x_i = [x_{i1}, x_{i2}, \cdots, x_{in}]^{T} \in \mathbb{R}^{n}$ denotes the input vector and $y_i = [y_{i1}, y_{i2}, \cdots, y_{im}]^{T} \in \mathbb{R}^{m}$ is the output vector; $n$ is the number of input layer nodes and $m$ is the number of output layer nodes. Then, the output function of MKELM is given by Equation (1), where $\lambda_p \geq 0$ is the coefficient of the $p$th kernel function, $p = 1, 2, \ldots, G$, and $G$ is the number of base kernel functions; $Q$ is the number of hidden layer nodes; $\omega_k = [\omega_{k1}, \omega_{k2}, \cdots, \omega_{kl}, \cdots, \omega_{kn}]^{T}$ is the weight vector connecting the $k$th hidden node with the input nodes, $l = 1, 2, \ldots, n$; $b_k$ is the threshold of the $k$th hidden node; $\beta_k = [\beta_{k1}, \beta_{k2}, \cdots, \beta_{kq}, \cdots, \beta_{km}]^{T}$ is the weight vector between the hidden nodes and the output nodes, $q = 1, 2, \ldots, m$; and $g(x)$ is the activation function.
According to the structural risk minimization principle, Equation (1) leads to the minimization problem (2), where $\gamma$ is the regularization coefficient and $\xi_i$ is the error between the $i$th actual output and the expected output. It is not difficult to verify that (2) is a jointly convex optimization problem, and its Lagrangian can be written as (3), where $\alpha$ and $\tau$ are the Lagrange multipliers.
According to the Karush-Kuhn-Tucker optimality conditions, the partial derivatives of the Lagrangian function in (3) with respect to the variables $\beta$, $\xi$, $\alpha$, and $\tau$ are calculated as in (4), from which (5) is obtained. The Lagrange multiplier $\tau$ therefore vanishes during the derivation. Eliminating $\beta$ and $\xi$ according to (5) then gives (6). According to (4) and (5), (6) can be rewritten in the matrix form (7), where $\Omega$ is the multiple kernel function matrix of MKELM, whose expression is given in (8). From (7), the parameter $\alpha$, which corresponds to the structural parameter of MKELM, is obtained by (9).
Considering the general characteristics of kernel functions and the model complexity, a new equivalent kernel is constructed as a weighted combination of the polynomial kernel (10), the exponential radial basis kernel (11), and the Gaussian radial basis kernel (12), where $C$, $\sigma'$, and $\sigma$ are parameters to be determined. The equivalent kernel function of MKELM is written as (13), where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the coefficients of the corresponding kernel functions. According to the constraint conditions of (2), and letting the coefficient $\lambda_3$ in (13) vanish, Equation (13) simplifies to (14). The expression of the MKELM used in this research can therefore be written as (15).
In summary, the parameters $\gamma$, $C$, $\sigma'$, $\sigma$, and $\lambda$ in (15) can be determined empirically, and the parameter $\alpha$ is obtained by the matrix inversion in (9). The main computational cost of the MKELM algorithm is therefore the solution for $\alpha$.
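The construction above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel expressions (10)-(12) are not reproduced in this text, so common textbook forms are assumed, the polynomial `degree` is a hypothetical parameter, and the Gaussian kernel is assumed to enter with unit weight after $\lambda_3$ is eliminated. The names `sigma_e` and `sigma_g` (widths of the exponential and Gaussian radial basis kernels) and the mapping of numeric defaults to them are likewise assumptions. The direct solve stands in for the matrix inversion that dominates the cost.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between the rows of A (n_a, n) and B (n_b, n)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from round-off

def equivalent_kernel(A, B, C=60.0, sigma_e=2.5, sigma_g=8.0,
                      lam=(10.0, 2.5, 1.0), degree=2):
    """Weighted sum of three base kernels (assumed textbook forms)."""
    dist = pairwise_dist(A, B)
    k_poly = (A @ B.T + C) ** degree                  # polynomial kernel
    k_erbf = np.exp(-dist / (2.0 * sigma_e**2))       # exponential RBF kernel
    k_gauss = np.exp(-dist**2 / (2.0 * sigma_g**2))   # Gaussian RBF kernel
    return lam[0] * k_poly + lam[1] * k_erbf + lam[2] * k_gauss

def fit_alpha(X, Y, gamma=1e4, **kernel_args):
    """Solve (I / gamma + Omega) alpha = Y for the structural parameter alpha."""
    omega = equivalent_kernel(X, X, **kernel_args)
    E = np.eye(len(X)) / gamma + omega
    return np.linalg.solve(E, Y)  # O(N^3) step that the factorization targets

def predict(X_new, X_train, alpha, **kernel_args):
    """Model output: kernel row vector of each new input times alpha."""
    return equivalent_kernel(X_new, X_train, **kernel_args) @ alpha
```

Every time samples are added, `fit_alpha` has to be rerun on the enlarged matrix, which is the cost the online sequential scheme is designed to avoid.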

The Proposed COS-MKELM Algorithm and Performance Verification
It can be seen from the MKELM algorithm described in Section 2 that, if the modeling process is repeated with new input training data, the parameter $\alpha$ of MKELM has to be obtained by recomputing the inverse matrix. In addition, when a large amount of new training data is added to MKELM, inverting a matrix of very large dimension greatly increases the computation of MKELM online modeling. Thus, the $LDL^{T}$ form Cholesky factorization method is used in this research to avoid the large matrix inversion and simplify the online sequential calculation of MKELM. In this section, the COS-MKELM algorithm is proposed.

The Solution of COS-MKELM Parameter by Cholesky Factorization
According to (8) and (9), the expression of the parameter $\alpha$ can be simplified as (16), where $N$ is the number of initial training samples. Denote by $E_N$ the matrix whose inverse appears in (16); it is symmetric, i.e., $E_N^{T} = E_N$. Left-multiplying (16) by $E_N$ transforms it into the linear system (17). The Cholesky factorization method can be used to solve the linear system in (17). However, the conventional Cholesky factorization requires square-root operations, which tend to reduce computational precision and increase computational complexity. To overcome this shortcoming, the matrix $E_N$ is factorized in the $LDL^{T}$ form, and the elements of the factors are calculated recursively. Before the factorization is applied, it is necessary to prove that $E_N$ is a symmetric positive definite matrix. For any nonzero column vector $W_N = [W_1, W_2, \cdots, W_N]^{T} \neq 0$ with the same dimension as $E_N$, the quadratic form in (18) is positive, which proves that $E_N$ is a symmetric positive definite matrix. Then, $E_N$ can be factorized in the $LDL^{T}$ form $E_N = L_N D_N L_N^{T}$ as in (19), where $L_N$ is a unit lower triangular matrix and $D_N$ is a diagonal matrix with positive diagonal elements.
Substituting (19) into (17) and left-multiplying both sides by $L_N^{-1}$ gives (22), where $M_N$ is a column vector. The elements $m_i$ of $M_N$ are obtained from (23), where $i = 1, 2, \ldots, N$.
According to (22), the elements $\alpha_i$ of the parameter $\alpha$ can be calculated recursively as in (24). Therefore, in the calculation of the COS-MKELM parameter, the $LDL^{T}$ form Cholesky factorization method avoids both the large matrix inversion and the square-root operations.
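The positive-definiteness argument can be written out as a short worked step. This is a sketch under two assumptions that are not spelled out in the text here: that $E_N$ has the usual KELM form $I_N/\gamma + \Omega_N$ with $\gamma > 0$, and that $\Omega_N$ is positive semidefinite, as holds for a Gram matrix built from valid kernels with non-negative weights.

```latex
W_N^{T} E_N W_N
  = W_N^{T}\left(\frac{I_N}{\gamma} + \Omega_N\right) W_N
  = \frac{1}{\gamma}\,\lVert W_N \rVert^{2} + W_N^{T}\,\Omega_N\,W_N
  \;\ge\; \frac{1}{\gamma}\,\lVert W_N \rVert^{2} \;>\; 0,
  \qquad W_N \neq 0,\ \gamma > 0 .
```

The regularization term alone is enough to make the quadratic form strictly positive, so the $LDL^{T}$ factors exist and every diagonal entry of $D_N$ is positive.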
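The recursive calculation can be sketched as follows. The recursions used are the standard square-root-free $LDL^{T}$ formulas with forward and backward substitution; they are assumed to match the paper's (21), (23), and (24), which are not reproduced in this text. The function names `ldl_factor` and `ldl_solve` are illustrative.

```python
import numpy as np

def ldl_factor(E):
    """Return unit lower-triangular L and vector d with E = L diag(d) L^T.

    E must be symmetric positive definite; no square roots are taken.
    """
    n = E.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Diagonal entry: d_j = e_jj - sum_{k<j} l_jk^2 d_k
        d[j] = E[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            # Below-diagonal entry: l_ij = (e_ij - sum_{k<j} l_ik l_jk d_k) / d_j
            L[i, j] = (E[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

def ldl_solve(L, d, Y):
    """Solve L diag(d) L^T alpha = Y by forward and backward substitution."""
    n = L.shape[0]
    Y = np.asarray(Y, dtype=float)
    M = np.zeros_like(Y)
    for i in range(n):                    # forward: L M = Y
        M[i] = Y[i] - L[i, :i] @ M[:i]
    alpha = np.zeros_like(Y)
    for i in range(n - 1, -1, -1):        # backward: diag(d) L^T alpha = M
        alpha[i] = M[i] / d[i] - L[i + 1:, i] @ alpha[i + 1:]
    return alpha
```

Only additions, multiplications, and divisions by the positive entries of `d` occur, which is the property the text relies on when it prefers this form to the conventional Cholesky factor.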

Online Sequential Learning COS-MKELM Parameter
During the online sequential learning process of the COS-MKELM algorithm, newly added training data are learned immediately. Assuming that the number of new training data is $\hat{N}$, the matrix $E_{N+\hat{N}}$ can be written as (25). According to $E_N$ described in Section 3.1, (25) can be expanded as (26).
Processes 2021, 9, 1540
The matrix $E_{N+\hat{N}}$ can be factorized in the $LDL^{T}$ form as (27), and the element-wise relations (28) follow from (19). Here, $\hat{e}$, $\hat{l}$, and $\hat{d}$ are the elements of the matrices $E_{N+\hat{N}}$, $L_{N+\hat{N}}$, and $D_{N+\hat{N}}$, respectively, with $1 \le j \le i \le N+\hat{N}$.
From (26) and (27), it can be seen that, for $1 \le j \le i \le N$, the elements of $L_{N+\hat{N}}$ and $D_{N+\hat{N}}$ are the same as those of $L_N$ and $D_N$. Only the elements associated with the $\hat{N}$ new training data need to be calculated recursively according to (21). Thus, the elements of the matrices $L_{N+\hat{N}}$ and $D_{N+\hat{N}}$ can be calculated recursively from the matrices $L_N$ and $D_N$.
Similarly, the matrix $M_{N+\hat{N}}$ can be obtained according to (23), as shown in (29). The elements $\alpha_i$ of the new parameter $\alpha$ of COS-MKELM can then be obtained according to (24). Substituting the parameter $\alpha$ into (15) gives the output of the COS-MKELM algorithm. In summary, when new training data are added to COS-MKELM, the new matrix elements are calculated recursively from the original matrix information. Thus, the COS-MKELM algorithm reduces the computational complexity and simplifies the online sequential learning process.
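A minimal sketch of this update is given below, assuming the standard $LDL^{T}$ recursions and a caller that supplies the new rows of the enlarged symmetric matrix. The function names and the argument layout (`E_new_rows` holding full rows of the enlarged matrix) are illustrative choices, not taken from the paper.

```python
import numpy as np

def ldl_extend(L, d, M, E_new_rows, Y_new):
    """Extend stored factors of the old matrix to cover newly added samples.

    L, d, M    : factors and forward-substitution vector of the old system
    E_new_rows : (n_add, n_old + n_add) new rows of the enlarged matrix
    Y_new      : targets of the new samples
    The old entries of L, d, and M are copied, never recomputed.
    """
    n_old, n_add = L.shape[0], E_new_rows.shape[0]
    n = n_old + n_add
    L_ext = np.eye(n)
    L_ext[:n_old, :n_old] = L
    d_ext = np.concatenate([d, np.zeros(n_add)])
    M_ext = np.concatenate([M, np.zeros((n_add,) + M.shape[1:])])
    for r in range(n_add):
        i = n_old + r
        row = E_new_rows[r]
        for j in range(i):
            # New off-diagonal entries of row i, left to right
            s = np.sum(L_ext[i, :j] * L_ext[j, :j] * d_ext[:j])
            L_ext[i, j] = (row[j] - s) / d_ext[j]
        d_ext[i] = row[i] - np.sum(L_ext[i, :i] ** 2 * d_ext[:i])
        M_ext[i] = Y_new[r] - L_ext[i, :i] @ M_ext[:i]
    return L_ext, d_ext, M_ext

def back_substitute(L, d, M):
    """Recover alpha from diag(d) L^T alpha = M (all entries are refreshed)."""
    n = L.shape[0]
    alpha = np.zeros_like(M, dtype=float)
    for i in range(n - 1, -1, -1):
        alpha[i] = M[i] / d[i] - L[i + 1:, i] @ alpha[i + 1:]
    return alpha
```

In this sketch only the rows belonging to the new samples are computed during the update, while the final back substitution still runs over the whole enlarged system because every entry of $\alpha$ changes.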

COS-MKELM Performance Verification
In this subsection, the regression datasets Auto MPG ($d_1$), Machine CPU ($d_2$), and Boston Housing ($d_3$) are used to test the performance of COS-MKELM. These regression datasets have been widely used to test the performance of modeling methods. All of the experiments are carried out under Windows 7 and Matlab R2010a, with 4 GB RAM and an Intel Pentium G2020 CPU at 2.90 GHz.
The COS-MKELM algorithm is compared with the OS-ELM algorithm and the OS-KELM algorithm. The activation function of the OS-ELM algorithm is the sigmoid function, and its number of hidden layer nodes is 40. The kernel function of OS-KELM is the Gaussian radial basis kernel, and the kernel function of the proposed COS-MKELM algorithm is the equivalent kernel presented in Section 2. The parameters of the COS-MKELM algorithm are set as follows: $C = 60$, $\sigma' = 2.5$, $\sigma = 8$, $\lambda_1 = 10$, $\lambda_2 = 2.5$, $\gamma = 10{,}000$. Three quarters of each regression dataset are used as training data, and the remaining quarter is used as testing data. The number of initial training data of the OS-ELM algorithm is set to 50, and the rest of the training data are added in increments of 10%. Likewise, 10% of the training data are added to OS-KELM and COS-MKELM each time.
In this research, the root mean square error (RMSE) is used as the precision performance index. It is expressed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_T}\sum_{i=1}^{N_T}\left(y_i - \hat{y}_i\right)^{2}} \qquad (30)$$
where $y_i$ and $\hat{y}_i$ stand for the actual value and the estimated value, and $N_T$ is the number of testing data. The training time and RMSE for training data are listed in Table 1. In addition, the output values of each regression dataset for the testing data are shown in Figures 1-3.
From Table 1, it can be seen that the training times of the COS-MKELM algorithm are the smallest on all three regression datasets. This demonstrates that the COS-MKELM algorithm is better than the comparison algorithms in terms of learning efficiency. As seen from the RMSE values for testing data listed in Table 1 and the output results shown in Figures 1-3, the testing values predicted by COS-MKELM are closer to the actual values than those of the comparison algorithms. It is concluded that the COS-MKELM algorithm proposed in this research shows much better regression accuracy and generalization ability than OS-ELM and OS-KELM.
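The RMSE index is simple to compute; a short sketch using the usual definition (root of the mean squared difference between actual and estimated values) is shown below. The function name `rmse` is illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and estimated values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones.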
Therefore, it can be seen that it is feasible to build a cement clinker fCaO online prediction model based on the COS-MKELM algorithm proposed in this paper.

fCaO Content Online Prediction Model
In this subsection, the proposed COS-MKELM algorithm is adopted to build the cement clinker fCaO content online prediction model. The flowchart for building the cement clinker fCaO content online prediction model is shown in Figure 4.
Combined with the COS-MKELM algorithm, the fCaO online prediction modeling steps are described in detail below:
Step 1: Collect the dataset of variables related to the fCaO content from the cement production line. Given the initial training data $\{x_i, y_i\}_{i=1}^{N}$, initialize the values of $\gamma$, $C$, $\sigma'$, $\sigma$, $\lambda_1$, and $\lambda_2$.
Step 2: Factorize the matrix $E_N$ in the $LDL^{T}$ form. Calculate the elements of $L_N$, $D_N$, and $M_N$ recursively according to (21) and (23).
Step 3: Calculate the parameter $\alpha$ of COS-MKELM according to (24), and substitute it into (15) to obtain the online prediction output value of fCaO content.
Step 4: If new training data are added to COS-MKELM, update the number of online learning steps $k = k + 1$ and run Step 5 to learn the new data; otherwise, output the parameter values ($\alpha$, $\gamma$, $C$, $\sigma'$, $\sigma$, $\lambda_1$, and $\lambda_2$) and obtain the fCaO content online prediction model.
Step 5: Assume the new training data added to COS-MKELM are $\{x_i, y_i\}_{i=1}^{\hat{N}}$. Based on the matrices $L_k$, $D_k$, and $M_k$ obtained from the $k$th step of online sequential learning, calculate the elements of the matrices $L_{k+1}$, $D_{k+1}$, and $M_{k+1}$ recursively according to (21) and (23). Return to Step 3.
After the cement clinker fCaO content online prediction model is obtained, the parameters of the fCaO model are updated constantly during online prediction. When new sample data are obtained or the production conditions of cement clinker change, modeling Step 4 above is repeated on the basis of the existing online prediction model. The cement clinker fCaO content online prediction model is thus trained continuously, and the relevant parameters of the model are updated. This online updating of parameters gives the model strong applicability and anti-interference ability.

Dataset and Ascertaining Model Parameters
In this research, the raw dataset for fCaO online prediction modeling includes simultaneous process variables and measured fCaO content values under various conditions. It is obtained from the No. 1 rotary kiln at the Tangshan Jidong cement plant, which has a clinker production capacity of 5000 tons per day. Through object linking and embedding for process control (OPC) communication technology, the input variable data of the cement clinker fCaO content prediction model were collected from the distributed control system of the cement clinker production line. The laboratory samples the clinker on site every hour for chemical testing to obtain the actual value of the cement clinker fCaO content, which is the output variable of the prediction model. Thus, the data of the cement clinker fCaO content prediction model are sampled once every hour, giving 24 samples per day.
Because the collected data are affected by uncertain factors such as noise, the continuous mean filtering method is used to pre-process the collected initial data. After filtering, a total of 215 pairs of data are taken as the training and testing datasets of the cement clinker fCaO content prediction model. From these data, 150 samples are selected randomly as the training dataset (used for developing the fCaO content online prediction model), and the other 65 samples form the testing dataset (used for validating the model). Some of the collected process variables are shown in Table 2.
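The pre-processing and split can be sketched as follows. The text does not specify the filter, so a centered moving average with a hypothetical odd `window` and edge padding is assumed here as one plausible reading of "continuous mean filtering"; the random seed and function names are also illustrative.

```python
import numpy as np

def moving_mean(x, window=3):
    """Centered moving-average filter applied to each column of x.

    x has shape (samples, variables); window must be odd (assumed choice).
    Edges are padded by repeating the first and last rows.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, c], kernel, mode="valid")
            for c in range(x.shape[1])]
    return np.stack(cols, axis=1)

def random_split(X, y, n_train=150, seed=0):
    """Randomly select n_train samples for training; the rest are for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]
```

With 215 filtered samples and `n_train=150`, the split returns 150 training and 65 testing pairs, matching the sizes stated above.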

Results and Discussion
In this subsection, the RMSE, mean absolute percentage error (MAPE), mean absolute error (MAE), maximum absolute error (MAXE), and coefficient of determination ($R^2$) are used as performance assessment indexes for the prediction of the fCaO models. The expression of RMSE is the same as (30), and the other criteria expressions are given as follows:
The four performance criteria RMSE, MAPE, MAE, and MAXE are applied to estimate the accuracy performance of the models. $R^2$ denotes the degree of matching between the estimated and actual values. The smaller the values of RMSE, MAPE, MAE, and MAXE are and the closer $R^2$ is to 1, the better the accuracy performance of a model. The performance comparison results are shown in Table 3. The simulation results of the three fCaO online prediction models on the testing data are compared with the actual data in Figures 5-7.
As seen from the time data in Table 3, the training time and testing time of the fCaO model based on COS-MKELM are the smallest. The RMSE, MAPE, MAE, and MAXE values of the fCaO model based on COS-MKELM are the lowest, and its $R^2$ value is the closest to 1 among the three comparison models. As seen from the precision index data in Table 3, the predicted values of the cement clinker fCaO content online prediction model based on the COS-MKELM algorithm are closer to the actual values of cement clinker fCaO content obtained by the laboratory test. In addition, the established cement clinker fCaO content online prediction model has strong nonlinear identification ability, which meets the prediction accuracy requirements for the fCaO content in the cement clinker firing process.
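The four remaining indexes can be sketched with their commonly used definitions. The paper's exact expressions are not reproduced in this text, so details such as reporting MAPE in percent are assumptions, and the function names are illustrative.

```python
import numpy as np

def _as_arrays(y_true, y_pred):
    """Convert both inputs to float arrays."""
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y, yhat = _as_arrays(y_true, y_pred)
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def mae(y_true, y_pred):
    """Mean absolute error."""
    y, yhat = _as_arrays(y_true, y_pred)
    return float(np.mean(np.abs(y - yhat)))

def maxe(y_true, y_pred):
    """Maximum absolute error."""
    y, yhat = _as_arrays(y_true, y_pred)
    return float(np.max(np.abs(y - yhat)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y, yhat = _as_arrays(y_true, y_pred)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

MAE and MAXE are in the units of the fCaO content itself, whereas MAPE and $R^2$ are scale-free, which is why the text reads them together rather than relying on a single index.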
The simulation results in Figures 5-7 show that the predicted fCaO content values based on the COS-MKELM algorithm are closer to the actual values than the other comparison algorithms. In conclusion, although the COS-MKELM algorithm has fewer initial samples in the simulation modeling process, the learning performance of COS-MKELM algorithm is improved greatly through the continuous arrival of sample data. The online prediction model of cement clinker fCaO content based on the proposed COS-MKELM algorithm shows better performance in both test accuracy and generalization ability, indicating that the cement clinker fCaO content online prediction model based on the proposed COS-MKELM algorithm is effective.
In short, the simulation results of the cement clinker fCaO content online prediction model using actual data show that the proposed COS-MKELM algorithm, which fuses three kernel functions with online sequential learning, learns efficiently. In addition, the prediction model of cement clinker fCaO content has better nonlinear identification ability, regression accuracy, and generalization ability. Therefore, the fCaO content value can be predicted by the fCaO content online prediction model based on the proposed COS-MKELM algorithm. When new training data are added or the production conditions of cement clinker change during the online prediction of fCaO content, the parameters of the fCaO model can be updated sequentially online to adapt to the current conditions. With the continuous input of actual data, the characteristics of the cement clinker fCaO content online prediction model are improved, so that the online prediction model can adapt to different production conditions and operate stably for a long time. Therefore, this paper provides a new idea for realizing the prediction of cement clinker fCaO content. Furthermore, according to the predicted fCaO content, operators and engineers can adjust the cement production process variables. In this way, the qualified rate of cement clinker can be guaranteed, and the energy consumption and exhaust emissions of cement clinker production can be reduced.

Conclusions
In this paper, the COS-MKELM algorithm is proposed to avoid the large matrix inverse calculation and reduce the computational complexity of online sequential learning process. Three regression datasets were used to test the performance of the COS-MKELM algorithm. Experiment results show that the proposed COS-MKELM algorithm, which is based on the fusion of the three kernel functions method and the online sequential learning method, shows fast learning efficiency. Although the COS-MKELM algorithm has fewer initial samples in the simulation modeling process, the learning performance of COS-MKELM algorithm is improved greatly through the continuous arrival of sample data.
In addition, online prediction of fCaO content is crucial and useful to the cement process industry. In this paper, the proposed COS-MKELM algorithm is adopted to build the cement clinker fCaO content online prediction model for a rotary kiln with a clinker production capacity of 5000 tons per day. The experimental results show that the online prediction model of cement clinker fCaO content based on the proposed COS-MKELM algorithm performs better in nonlinear identification ability, regression accuracy, and generalization ability. The characteristics of the cement clinker fCaO content online prediction model can be improved through iterative updates with cement calcination process data. Therefore, this research provides a new idea for realizing the prediction of cement clinker fCaO content.
Based on the online prediction model proposed in this research, the cement clinker fCaO content can be mastered by operators and engineers in real time. This is beneficial to the clinker quality and the energy consumption of the cement calcination process. Furthermore, the fCaO online sequential model built in this paper provides the necessary prerequisite for the cement process industry to build the intelligent control system and minimize energy costs.
Author Contributions: Conceptualization, P.Z. and Y.C.; methodology, Z.Z. and P.Z.; software, P.Z.; validation, Y.C. and Z.Z.; data curation, P.Z. and Y.C.; writing-original draft preparation, P.Z.; writing-review and editing, Y.C. and Z.Z.; funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.