A Data-Driven Kernel Principal Component Analysis–Bagging–Gaussian Mixture Regression Framework for Pulverizer Soft Sensors Using Reduced Dimensions and Ensemble Learning

In light of the nonlinearity, high dimensionality, and time-varying operating conditions of pulverizers in power plants, as well as the challenge of monitoring quality variables in the process in real time, a data-driven KPCA–Bagging–GMR soft sensor framework using dimensionality reduction and ensemble learning is proposed. First, Kernel Principal Component Analysis is employed to reduce the dimensionality of the collected process data in a nonlinear manner. Second, the retained principal components are used to reconstruct a refined set of input samples, after which the Bagging algorithm draws multiple sample subsets and a Gaussian Mixture Regression model is developed on each. Finally, the fused output is obtained by weighting each local model according to its Bayesian posterior probability. Simulation experiments on a coal mill validate that the proposed approach delivers superior predictive accuracy and excellent generalization capability.


Introduction
With the rapid development of the power industry, the pulverizers in power plants play a crucial role in ensuring a stable power supply and efficient operation [1,2]. To guarantee the high performance and safe operation of pulverizers, researchers have begun to explore accurate predictions of their performance and fault conditions [3][4][5]. However, due to the complexity and nonlinearity of the pulverizer, traditional modeling methods often struggle to capture its underlying patterns and characteristics, resulting in low prediction accuracy. The introduction of ensemble learning methods [6][7][8] provides an effective solution for modeling pulverizers in power plants. By integrating the predictions of multiple individual models, ensemble learning can enhance the prediction capability for the running conditions and fault situations of pulverizers. As a result, ensemble learning methods have gradually become a research hot spot for modeling pulverizers in the power industry.
The basic idea behind ensemble learning is to combine the predictions of multiple individual models to improve overall prediction performance. In the modeling of pulverizers, various methods can be employed to construct the individual models, such as Support Vector Machines (SVM) [9,10], Artificial Neural Networks (ANN) [11,12], Decision Trees (DT) [13], and others [14][15][16]. Each individual model can capture different features and patterns of pulverizers, thereby providing more comprehensive and accurate predictions. The key to ensemble learning lies not only in appropriate combination methods but also in the diversity and accuracy of the individual models [17,18]. Diversity can be achieved by applying different feature selection methods, model structures, or training data, which provides more information for the modeling problem. On the other hand, the accuracy of individual models relies on their own performance, including prediction accuracy, generalization ability, etc. Thus, when applying ensemble learning methods, the relationship between diversity and accuracy needs to be balanced.
To achieve diversity, several aspects can be considered. Firstly, different feature selection methods can be used to construct the individual models, allowing each model to focus on different features and thereby enhancing the overall diversity [19]. Secondly, various model structures, such as SVM, ANN, and DT, can be chosen to introduce differences in the learning and expressive capabilities of the individual models, thereby enhancing the diversity of the ensemble model [20]. Additionally, diverse information can be obtained by using different training data or training algorithms. On the other hand, the accuracy of the individual models is crucial for the performance of ensemble learning. When selecting individual models, an evaluation should be conducted based on metrics such as accuracy, generalization ability, and stability; selecting high-performance individual models contributes to the overall prediction capability of the ensemble. The correlations among the individual models should also be taken into account: if the individual models are highly correlated, they may provide redundant information during the ensemble process, leading to a decrease in performance. Therefore, selecting individual models with lower correlations is essential for constructing an efficient ensemble model.
In practical applications, ensemble learning methods have achieved considerable success in the modeling of pulverizers in power plants. By integrating multiple different individual models, ensemble learning can improve the prediction accuracy for pulverizer performance and fault conditions. Research has shown that in real plant operations, ensemble learning methods can capture the dynamic characteristics and fault conditions of pulverizers more accurately than individual models [21], thus enhancing the operational stability and efficiency of power plants. However, despite these promising results, challenges and room for improvement remain. Firstly, the selection and design of ensemble learning methods depend on prior knowledge and experience [22,23], requiring a certain understanding of the performance and characteristics of different models; further research is needed on the introduction and application of novel models and algorithms to investigate their applicability and effectiveness. Secondly, ensemble learning methods have a higher computational complexity, particularly when dealing with large-scale data and high-dimensional features [24]. Therefore, improving the computational efficiency and scalability of ensemble learning methods is also an important research direction.
However, the existing individual models in ensemble learning extract the relationship between input and output data from the original data, which inherently contain strong noise. Eliminating such noise reduces the difficulty of model construction and improves model accuracy. Considering the large-scale and high-dimensional characteristics of the operational data from a coal mill in power plants, this study proposes a Gaussian Mixture Regression (GMR) modeling approach based on Kernel Principal Component Analysis (KPCA) and the Bagging algorithm. KPCA is utilized to map the original data to a high-dimensional space for linearization and dimensionality reduction, eliminating the inherent noise of the input data. Then, Gaussian Mixture Regression is employed to establish a soft sensing model. Furthermore, the Bagging ensemble approach is applied to improve the predictive accuracy and robustness of the model. In Section 2, an introduction to KPCA, the Bagging algorithm, GMR, and Bayesian information fusion methods is provided. Section 3 gives a detailed description of the KPCA-Bagging-GMR algorithm and its Bayesian information fusion steps. In Section 4, the application of the KPCA-Bagging-GMR algorithm and its Bayesian information fusion method to coal mill data modeling in power plants is presented, and a comparison with similar algorithms is conducted to validate the superiority of the approach.

KPCA Dimensionality Reduction
KPCA is a non-linear dimensionality reduction technique and a variant of Principal Component Analysis (PCA) [25,26]. PCA is a commonly used linear dimensionality reduction method that reduces the dimensionality of data by finding projections along the directions of maximum variance, and it has been widely used in many fields [27][28][29]. In contrast, KPCA employs the kernel trick to map the data into a higher-dimensional feature space, where PCA is then performed. The underlying principle of KPCA is to capture the non-linear structure of data by conducting PCA in a high-dimensional feature space. Initially, KPCA calculates the similarity among samples using a kernel function, thereby implicitly mapping the original data into a higher-dimensional feature space. In this feature space, PCA can be applied to compute the principal components. Finally, by projecting the data onto the leading principal components, the dimensionality-reduced representation is obtained. The advantages of KPCA lie in its ability to handle non-linear data and preserve the essential structure of the data; it finds applications in tasks such as data visualization, feature extraction, and pattern recognition. However, it is computationally expensive, as it involves the calculation of the kernel function and the covariance matrix in the feature space.
Assume the sample data set is X = {x_1, x_2, ..., x_n}, where n is the number of samples and m is the sample dimensionality. Let φ be a non-linear transformation mapping the original data to the feature space F. When the centering condition ∑_{i=1}^{n} φ(x_i) = 0 is satisfied, the covariance matrix of the transformed data is given as follows:

C = (1/n) ∑_{i=1}^{n} φ(x_i) φ(x_i)^T.  (1)
The eigendecomposition of the covariance matrix C is as follows:

C V = λ V,  (2)
where λ is an eigenvalue of the covariance matrix (λ ≥ 0) and V is the corresponding eigenvector. Left-multiplying Equation (2) by φ(x_i)^T gives

φ(x_i)^T C V = λ φ(x_i)^T V.  (3)
The eigenvector V corresponding to a non-zero eigenvalue can be expanded in terms of the mapped samples, as follows:

V = ∑_{i=1}^{n} α_i φ(x_i).  (4)
The kernel function K is introduced:

K_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j),  (5)
where K_ij is the kernel matrix generated by the kernel function and α is the eigenvector corresponding to K_ij. Substituting Equations (4) and (5) into Equation (3) yields

K α = n λ α.  (6)
At this point, the problem of finding the eigenvalues and eigenvectors of the covariance matrix C is transformed into finding the eigenvalues and eigenvectors of the kernel matrix K_ij. The Gaussian kernel function is used:

K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)),  (7)
where σ is the kernel width parameter. The obtained eigenvalues are sorted in descending order, and the corresponding eigenvectors are orthonormalized. The projection of the kernel matrix K_ij onto the leading eigenvectors is obtained by extracting the first t principal components according to their cumulative contribution rate.
In practice, the condition ∑_{i=1}^{n} φ(x_i) = 0 may not hold, so it is necessary to center the kernel matrix K_ij in Equation (6), as follows:

K̃_ij = K_ij − I K_ij − K_ij I + I K_ij I,  (8)
where I is an n × n matrix in which every element equals 1/n.
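As an illustration, the KPCA procedure above (Gaussian kernel, centering as in Equation (8), and projection onto the leading components) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation; the function name, toy data, and default parameter values are illustrative.

```python
import numpy as np

def kpca_reduce(X, sigma=1.0, contrib=0.90):
    """Sketch of KPCA dimensionality reduction with a Gaussian kernel.

    X: (n, m) standardized sample matrix. Returns the projection of the
    centered kernel matrix on the leading eigenvectors whose cumulative
    eigenvalue contribution reaches `contrib`.
    """
    n = X.shape[0]
    # Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), Eq. (7)
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-D2 / (2.0 * sigma**2))
    # Center the kernel matrix: K~ = K - I K - K I + I K I, Eq. (8)
    I = np.full((n, n), 1.0 / n)
    Kc = K - I @ K - K @ I + I @ K @ I
    # Eigendecomposition; sort eigenvalues in descending order
    lam, alpha = np.linalg.eigh(Kc)
    order = np.argsort(lam)[::-1]
    lam, alpha = lam[order], alpha[:, order]
    lam = np.clip(lam, 0.0, None)
    # Keep the first t components by cumulative contribution rate
    ratio = np.cumsum(lam) / np.sum(lam)
    t = int(np.searchsorted(ratio, contrib) + 1)
    # Normalize eigenvectors so that lam_k * ||alpha_k||^2 = 1
    alpha = alpha[:, :t] / np.sqrt(lam[:t])
    # Projection of the (centered) kernel matrix on the principal directions
    return Kc @ alpha

X = np.random.RandomState(0).randn(50, 11)
Z = kpca_reduce(X, sigma=80.0, contrib=0.90)
print(Z.shape)  # (50, t) for some t <= 50
```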

Bagging Algorithm
Bagging, i.e., bootstrap aggregating, is a machine learning technique that combines weak learners to reduce prediction error; it was first introduced by Breiman [30]. The fundamental principle of the Bagging algorithm is to train a weak learner multiple times, each time on a training set generated by random sampling with replacement from the original dataset, producing several local outputs. These individual predictions are then combined by a prescribed rule to derive the final outcome. Because each training set is obtained by random sampling, certain samples from the original dataset may be selected multiple times and appear in several of the new training sets, while other samples may not be selected at all. Through repeated random sampling, the Bagging algorithm amplifies the heterogeneity among the learning machines, thereby enhancing the model's generalization capability and precision.
The Bagging algorithm repeatedly draws random samples from the original sample set, B times in total, to construct the training sets T = {T_1, T_2, ..., T_B}. Each new training set is used to train a learning machine, resulting in B distinct local models. These local models are subsequently consolidated to yield the final predictive outcome.
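The resampling step above can be sketched as follows; the function name and the seed are illustrative, and the values B = 5 and b = 60% of 350 anticipate the parameter choices used later in the paper.

```python
import random

def bagging_subsets(n_samples, B, b, seed=0):
    """Sketch of the Bagging resampling step: draw B bootstrap index sets,
    each of size b, sampling with replacement from range(n_samples)."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(b)] for _ in range(B)]

subsets = bagging_subsets(n_samples=350, B=5, b=210)  # b = 60% of 350
print(len(subsets), len(subsets[0]))  # 5 210
```

Because sampling is with replacement, some indices repeat within a subset while others are absent, which is exactly the source of diversity among the local models.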

Gaussian Mixture Regression Model
GMR is a regression analysis approach based on a Gaussian mixture model (GMM) [31], which represents a dataset by superposing multiple Gaussian distributions. Let x_i ∈ R^M and y_i ∈ R denote the input and output variables of the i-th sample, where M and N are the dimension of the input variables and the number of training samples, respectively; the input and output data can then be collected in matrix form as X = [x_1, x_2, ..., x_N] and y = [y_1, y_2, ..., y_N]. For the k-th Gaussian component, μ_k and Σ_k represent the mean and covariance matrix of the Gaussian distribution, θ_k and σ_k² are the regression coefficient vector and the measurement noise variance, and K represents the number of Gaussian distributions in the dataset. z_i = {z_i1, z_i2, ..., z_iK} represents the latent variables of the i-th sample, where z_ik ∈ {0, 1}. When z_ik = 1, the i-th sample belongs to the k-th Gaussian distribution, and the latent variables satisfy ∑_{k=1}^{K} z_ik = 1. The joint probability distribution of the data pair (x_i, y_i) and the marginal distribution of the input variable x_i in the k-th Gaussian distribution are given as follows:

p(x_i, y_i | z_ik = 1) = N(x_i; μ_k, Σ_k) N(y_i; θ_k^T x_i, σ_k²),  (9)

p(x_i | z_ik = 1) = N(x_i; μ_k, Σ_k).  (10)

 
In Equations (9) and (10), y_i = θ_k^T x_i + ε_k describes the functional mapping between the output variable and the input variable, θ_k signifies the regression coefficients, and ε_k is Gaussian measurement noise with a mean of 0 and a variance of σ_k². In the GMR model, the parameters to be estimated are Θ = {π_k, μ_k, Σ_k, θ_k, σ_k²}, k = 1, ..., K. Typically, the expectation-maximization (EM) algorithm, which maximizes the logarithm of the likelihood function, is employed together with the training data to obtain the parameter estimates. The EM algorithm involves two steps. In the expectation step (E-step), the posterior probability distribution p(Z|D) of the latent variables, which represents the membership of the samples to the different modes, is derived from the given training sample set D. In the maximization step (M-step), the updated value of each parameter is obtained by taking the derivative of the logarithm of the likelihood function with respect to that parameter. After iterating the EM algorithm, parameter estimates based on the training sample set D are obtained.
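As a hedged illustration of the EM fitting step, scikit-learn's GaussianMixture (which runs the E-step/M-step iterations internally) can stand in for the procedure described above, fitted to the joint data D = [X | y]; the toy data and component count are illustrative, and this is not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy joint data set D = [X | y] with two well-separated operating modes
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5.0])
y = X.sum(axis=1, keepdims=True) + 0.1 * rng.randn(200, 1)
D = np.hstack([X, y])

# EM parameter estimation of the Gaussian mixture over the joint data
gmm = GaussianMixture(n_components=2, random_state=0).fit(D)
print(gmm.means_.shape)  # (2, 3): one mean per component over [x1, x2, y]
```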
For a given predictive sample with input variable x_q, the posterior probability of the sample belonging to each mode is first computed, as follows:

γ_qk = π_k N(x_q; μ_k, Σ_k) / ∑_{j=1}^{K} π_j N(x_q; μ_j, Σ_j).  (12)
By taking the expected value of the probability distribution of the output variable y_q, the estimate of y_q is obtained as follows:

ŷ_q = ∑_{k=1}^{K} γ_qk (θ_k^T x_q).  (13)
As indicated by Equation (13), the GMR model no longer relies solely on the prediction of a single global model. Instead, it assigns appropriate weights to multiple local models, using the predictive information provided by each local model to produce the final output. A larger value of γ_qk indicates a higher probability that the predictive sample originates from the k-th Gaussian distribution, giving greater weight to the corresponding local model's prediction.
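The prediction step just described — posterior weighting of local linear models as in Equations (12) and (13) — can be sketched as follows. All parameter names are illustrative, and the local models are assumed to be linear in x_q as in the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr_predict(x_q, pi, mu, Sigma, theta, bias):
    """Sketch of the GMR prediction step (Equations (12)-(13)).

    pi[k], mu[k], Sigma[k] parameterize the k-th Gaussian over the inputs;
    theta[k], bias[k] are hypothetical local linear regression coefficients
    of the k-th component (names illustrative)."""
    K = len(pi)
    # Posterior probability gamma_qk that x_q belongs to component k
    lik = np.array([pi[k] * multivariate_normal.pdf(x_q, mu[k], Sigma[k])
                    for k in range(K)])
    gamma = lik / lik.sum()
    # Weighted sum of the local linear predictions
    y_local = np.array([theta[k] @ x_q + bias[k] for k in range(K)])
    return float(gamma @ y_local)

# Two-component toy model in 2-D
pi = [0.5, 0.5]
mu = [np.zeros(2), np.full(2, 5.0)]
Sigma = [np.eye(2), np.eye(2)]
theta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
bias = [0.0, 10.0]
print(gmr_predict(np.array([5.0, 5.0]), pi, mu, Sigma, theta, bias))
```

For a query near the second component's mean, γ_q2 dominates and the output is essentially the second local model's prediction.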

Bayesian Network Information Fusion
The Bagging algorithm randomly selects b samples from the training set to create a sub-training set. This process is repeated B times to generate B training sets T = {T_1, T_2, ..., T_B}, each of which is used to train a GMR sub-model. To improve the accuracy of the Bagging Gaussian Mixture Regression modeling method, a posterior-probability-weighted fusion method based on Bayesian inference [32] is used to merge the B sub-models, as follows.
For a new test sample point x_q, the weight coefficient of each sub-model is given as follows:

p(M_i,q | x_q) = p(x_q | M_i,q) p(M_i,q) / ∑_{j=1}^{B} p(x_q | M_j,q) p(M_j,q),  (14)
where M_i,q is the output of the i-th GMR sub-model for the test point x_q, p(M_i,q) is the prior probability of each sub-model, p(x_q | M_i,q) is the conditional probability of the test sample under each sub-model, and p(M_i,q | x_q) is the posterior probability of the test sample x_q for each sub-model. The final fused output is the posterior-weighted sum of the sub-model predictions.
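A minimal sketch of this posterior-probability-weighted fusion, assuming uniform priors by default and working in log space for numerical stability; the function and argument names are illustrative.

```python
import numpy as np

def bayesian_fusion(outputs, log_liks, priors=None):
    """Sketch of posterior-probability-weighted fusion (Equation (14)).

    outputs[i]: prediction of the i-th GMR sub-model for test point x_q.
    log_liks[i]: log of p(x_q | M_i), the likelihood of x_q under the
    i-th sub-model (names illustrative)."""
    B = len(outputs)
    priors = np.full(B, 1.0 / B) if priors is None else np.asarray(priors)
    # Posterior p(M_i | x_q) ∝ p(x_q | M_i) p(M_i); shift by the max
    # log-posterior before exponentiating for numerical stability
    log_post = np.log(priors) + np.asarray(log_liks)
    log_post -= log_post.max()
    w = np.exp(log_post)
    w /= w.sum()
    # Fused prediction: posterior-weighted sum of sub-model outputs
    return float(w @ np.asarray(outputs))

print(bayesian_fusion([1.0, 2.0, 3.0], log_liks=[0.0, 0.0, 0.0]))  # 2.0
```

With equal likelihoods the fusion reduces to the mean; a sub-model whose density fits x_q much better receives nearly all the weight, which is the intended advantage over mean fusion.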

Steps Based on KPCA-Bagging-GMR
The power production process generally features multi-variable coupling and strong nonlinearity. PCA dimensionality reduction of such process data is therefore not ideal, since the nonlinear characteristics of the data cannot be fully preserved. A Gaussian Mixture Regression modeling method based on KPCA-Bagging is accordingly proposed. KPCA is used to reduce the dimensionality of the training sample set, and a GMR model is built on each sub-sample data set drawn according to the repeated random sampling idea of the Bagging algorithm. Finally, the local models are fused according to the Bayesian posterior probabilities to obtain the final global prediction output.
The flowchart of the proposed method is shown in Figure 2. The specific modeling steps are as follows:
Step 1: Pre-process the training set data via standardization;
Step 2: Set appropriate kernel function parameters, calculate the kernel matrix K_ij in Equation (7), and modify it to obtain the centered matrix K̃_ij in Equation (8);
Step 3: Arrange the eigenvalues of the kernel matrix K̃_ij in descending order;
Step 4: Orthonormalize the eigenvectors as α = {α_1, α_2, ..., α_n};
Step 5: Set an appropriate cumulative contribution threshold and extract the corresponding principal components α′ = {α_1, α_2, ..., α_t};
Step 6: Calculate the projection X′ of the kernel matrix K̃_ij on the principal components, i.e., X′ = K̃_ij α′, which gives the dimensionality-reduced data;
Step 7: Randomly sample X′ B times, selecting b samples each time, and train a GMR sub-model on each subset;
Step 8: For a new test sample x_q, calculate its output value M_i,q for the i-th GMR sub-model;
Step 9: Obtain the predicted value of the soft sensor model for x_q from Equation (14).
To evaluate the prediction performance of the model, the root mean square error (RMSE) and the correlation coefficient (COR) are selected as evaluation indexes, defined as

RMSE = sqrt( (1/N) ∑_{i=1}^{N} (y_i − y_pre,i)² ),

COR = ∑_{i=1}^{N} (y_i − ȳ)(y_pre,i − ȳ_pre) / sqrt( ∑_{i=1}^{N} (y_i − ȳ)² · ∑_{i=1}^{N} (y_pre,i − ȳ_pre)² ),

where N is the number of test samples, y_i is the true value of the i-th test sample, ȳ is the average of the true values, y_pre,i is the predicted value of the test sample obtained using the model, and ȳ_pre is the average of the predicted values. The smaller the RMSE, the higher the accuracy of the established soft sensor model, and the closer the predictions for new test points are to the true values. The closer COR is to 1, the closer the soft sensor model's predictions for new test points are to the true values.
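The two evaluation indexes can be computed as follows — a minimal sketch matching the definitions above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error of the soft sensor predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cor(y_true, y_pred):
    """Correlation coefficient between true and predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    num = np.sum((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    den = np.sqrt(np.sum((y_true - y_true.mean()) ** 2)
                  * np.sum((y_pred - y_pred.mean()) ** 2))
    return float(num / den)

y = [1.0, 2.0, 3.0, 4.0]
print(rmse(y, y), cor(y, [2.0, 4.0, 6.0, 8.0]))  # 0.0 1.0
```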

Research Object
The ZGM-type medium-speed pulverizer is a commonly used coal grinding device in coal-fired power plants. Its working principle involves the following steps: feeding, grinding, classification, and discharge. Coal is fed into the mill through the feeding device. Inside the mill, the coal is ground and pulverized through the squeezing and grinding action of the roller discs, which consist of multiple rollers with gaps in between that exert pressure on the coal and pulverize it into fine powder. During grinding, the coal is mixed with a certain amount of hot air, which enhances the grinding efficiency and helps remove internal moisture. The ground coal powder, along with the air mixture, enters the classifier, where it is classified by particle size. The classifier returns a portion of the powder to the mill for secondary grinding, ensuring a consistent particle size, while the air containing ground coal particles is separated and discharged through the exhaust system. The classified fine powder leaves the classifier outlet as the final product, which is collected and stored by a discharge device for combustion in the boilers. Additionally, the dust generated during grinding is collected and treated by the dust collection system to minimize environmental pollution. In summary, the ZGM-type medium-speed mill grinds coal into fine powder via the roller discs, classifies the powder and achieves the desired particle size through recirculation and separation, and delivers the collected coal powder for combustion while treating the dust to ensure environmental compliance.
To validate the efficacy of the proposed methodology, pulverization data from a 60 MW power plant are chosen for analysis, with the 11 auxiliary variables listed in Table 1. A total of 400 data sets are selected, with the main quality variable being the pulverizer motor current. Of these, 350 sets are designated as training samples and the remaining 50 sets as testing samples. The training samples are first standardized and then reduced in dimensionality through KPCA. The Bagging algorithm is employed to select the sub-sample sets, and GMR is used to establish the individual sub-models. The choice of parameters in the KPCA method significantly impacts the prediction performance. In this study, the KPCA kernel width σ is set to 80, with a cumulative contribution rate threshold of 90%. In the Bagging and fusion algorithm, B and b are determined through extensive trial and error to ensure the best accuracy: B is set to 5, and b to 60% of the training samples. The modeling process proceeds according to the flowchart outlined in Figure 3, ultimately yielding the final global prediction value.

Location | Auxiliary Variables
Pulverizer inlet | inlet primary air flow; inlet primary air pressure; inlet primary air temperature; coal feed
Pulverizer outlet | separator air–powder mixture temperature; separator outlet pressure
Pulverizer classifying | seal air and primary air pressure difference
Hydraulic power unit | loaded oil pressure; hydraulic oil temperature
Pulverizer motor | motor bearing temperature; thrust bearing oil groove oil temperature

Simulation Experiment of Pulverizer
To further analyze the impact of the data preprocessing method, the Bagging algorithm, and sub-model fusion on the modeling performance, five different methods were simulated and evaluated: KPCA-GMR, PCA-Bagging-GMR, MDS-Bagging-GMR, the proposed KPCA-Bagging-GMR, and the conventional mean fusion [33] approach. MDS, short for Multidimensional Scaling [29], is an alternative dimensionality reduction approach. To ensure consistent comparison conditions, the cumulative contribution thresholds of PCA and MDS match that of KPCA, at approximately 90%. These experiments, run in a Python 3.10 environment, assess the influence of the preprocessing methods and fusion techniques on algorithm performance. Table 2 presents the simulation results for the relevant performance metrics.

The results in Table 2 indicate that the proposed KPCA-Bagging-GMR model demonstrates the highest predictive accuracy. Compared to the PCA-Bagging-GMR and MDS-Bagging-GMR methods, KPCA proves more effective in capturing the nonlinear characteristics of the process data. Moreover, in contrast to the KPCA-GMR model, the ensemble of multiple models established through Bagging effectively enhances the predictive accuracy. Additionally, in comparison with the mean fusion approach, the Bayesian fusion technique better harnesses the performance of the local models: local models with higher predictive performance carry higher weights, maximizing the advantages of ensemble learning.
Figure 4 shows the predictive results obtained using the different preprocessing methods, namely PCA, MDS, and KPCA, for constructing the soft sensor models. Comparing the predictive results of the three methods, the overall accuracy of the soft sensor models constructed with nonlinear preprocessing is higher, and the model established with KPCA preprocessing outperforms the one constructed with MDS. This indicates that the proposed KPCA-Bagging-GMR modeling method exhibits superior predictive accuracy: for processes with strong nonlinearity, KPCA handles nonlinear features better than PCA and MDS. Figure 5 illustrates the relative errors of the soft sensor models constructed with the different preprocessing methods, again favoring the modeling approach based on KPCA preprocessing.
In Figure 6, the relative errors corresponding to the different fusion methods are presented. The proposed method exhibits a lower overall relative error than the mean fusion approach; in other words, the predictive accuracy of the Bayesian posterior probability fusion method is higher than that of the conventional mean fusion method. By using the Bayesian posterior probabilities to compute the weight coefficients of the sub-models, the proposed method effectively increases the weights of locally superior models, thereby improving the overall performance of the model.

Conclusions
The nonlinearity, high dimensionality, and time-varying operating conditions of pulverizers in power plants create difficulties for data-driven soft sensing modeling. Existing individual models in ensemble learning extract the correlation between input and output data from the original dataset, which inherently contains substantial noise. To address this, the present study introduces the KPCA dimensionality reduction technique to extract distinctive features of the pulverizer input information, and subsequently establishes an ensemble learning model (Bagging algorithm + Bayesian fusion) that correlates the feature values with the output quantities. Experimental simulations on the pulverizer process demonstrate that the KPCA-based ensemble soft sensing modeling approach effectively predicts the electric current of the pulverizer, showing superior predictive accuracy and excellent generalization capability.
Author Contributions: Conceptual model and simulation, S.J. and Y.D.; funding acquisition and


Figure 1 shows the probability graph of the Gaussian Mixture Regression model, where π = {π_1, π_2, ..., π_K} represents the weights of the different Gaussian distributions, satisfying ∑_{k=1}^{K} π_k = 1, and K represents the number of Gaussian distributions in the dataset.

Figure 6. Relative error of different fusion methods.


Table 1. Auxiliary variable scale for pulverizing.


Table 2. Performances of different modeling methods.