Soft Sensing of Silicon Content via Bagging Local Semi-Supervised Models

The silicon content in industrial blast furnaces is difficult to measure directly online. Traditional soft sensors do not efficiently utilize the useful information hidden in process variables. In this work, bagging local semi-supervised models (BLSM) for online silicon content prediction are proposed. They integrate the bagging strategy, the just-in-time-learning manner, and the semi-supervised extreme learning machine into a unified soft sensing framework. With the online semi-supervised learning method, the valuable information hidden in unlabeled data can be explored and absorbed into the prediction model. Application results on an industrial blast furnace show that BLSM has better prediction performance than other supervised soft sensors.

In industrial processes, a large number of sensor variables are available and can be used as inputs to a soft sensor model. The quality-relevant variable to be predicted by the soft sensor can be regarded as "labeled" data. However, the amount of labeled data is often limited, mainly because the quality-relevant variable is difficult to measure online. Until now, most soft sensors in industrial ironmaking processes have acted in a supervised manner. That is, for the construction of a soft sensor, both inputs (sensor variables) and outputs (quality-relevant variables) are required for supervised modeling. The labeled dataset contains both input and output data, while the unlabeled one consists of only input data (i.e., a large number of sensor variables). Actually, the labeled data are much fewer than the unlabeled data, mainly because the assaying process for silicon content is infrequent and time-consuming. In contrast, the process input variables are measured frequently. Using a limited set of labeled data, soft sensors are often inaccurate. To enhance prediction performance when large amounts of unlabeled data are available, some semi-supervised soft sensors have been applied to chemical processes [33][34][35]. Therefore, the information hidden in unlabeled data is explored here to develop a semi-supervised soft sensor for silicon content prediction.
Most soft sensors have fixed prediction domains. The predictive accuracy of soft sensors gradually decreases due to changes in the state of chemical plants [36]. Consequently, flexible models with an adaptive structure, e.g., just-in-time-learning (JITL) soft sensors [23,24,37], are more attractive in practical use than a single fixed model. Unfortunately, most conventional JITL-based soft sensors are constructed only with labeled data; only the labeled data are considered in the selection and modeling of similar samples. Consequently, without integrating the useful information in unlabeled data, the prediction performance of JITL-based models may still be insufficient for some applications.
In this work, bagging local semi-supervised models (BLSM) for online silicon content prediction are proposed. The framework integrates the bagging strategy, the JITL modeling manner [37], and the semi-supervised extreme learning machine (SELM) [34,38,39] into a unified soft sensing scheme. For the online prediction of a test sample, the useful information in both similar labeled and similar unlabeled samples is taken into its dedicated JITL model. Additionally, a simple bagging strategy is adopted to construct the model online. Compared with conventional JITL models that use only labeled data, the prediction performance of BLSM is improved by utilizing the useful information in unlabeled data.
This work is organized in the following way. The extreme learning machine (ELM) and SELM soft sensors are described in Section 2. Additionally, the BLSM online modeling method and its detailed implementation are proposed in this section. In Section 3, BLSM is applied to online silicon content prediction and compared with other approaches. Finally, a conclusion is given in Section 4.

Soft Sensor Modeling Methods
In this section, three soft sensing methods for the silicon content prediction are presented. First, the ELM-based supervised regression algorithm is briefly described. Second, the SELM-based semi-supervised regression algorithm is presented. Finally, the BLSM online local modeling method is proposed.

Extreme Learning Machine (ELM) Regression Method
The labeled dataset is denoted as {(x_i^l, y_i^l)}_{i=1}^L, where x_i^l ∈ R^n and y_i^l are the ith input and output data, respectively. ELM works for generalized single-hidden layer feedforward networks (SLFNs) [38]. The ELM model has an input layer, a single hidden layer, and an output layer. With N hidden nodes, ELM can approximate the training data with zero error, i.e., Σ_{i=1}^L |y_i^l − ŷ_i^l| = 0, where y_i^l and ŷ_i^l denote the actual output and the predicted one, respectively. Compactly, the ELM-based regression formulation [38] is described as

Pα = Y^l, (1)

where the hidden-layer output matrix P = [p_1, p_2, ..., p_L]^T with p_j = [g(a_1 · x_j^l + b_1), ..., g(a_N · x_j^l + b_N)]^T, and g(a_i · x_j^l + b_i) is the activation function output of the ith hidden node for the jth input x_j^l. For the ith hidden node, a_i and b_i are its input weight vector and bias, respectively, and a_i · x_j^l denotes their inner product. Different from gradient-descent-based training algorithms (e.g., the backpropagation method) for many NNs and the optimization-based method for support vector machines, the essence of ELM is that the hidden layer of SLFNs need not be tuned. Without resorting to complex training algorithms, the output weights of ELM can be computed efficiently [38]. For many regression cases, the number of hidden nodes is much less than the number of training samples, i.e., N << L. In such a situation, the output weights α [38] are determined as

α = P^+ Y^l, (2)

where P^+ is the Moore-Penrose generalized inverse of P. Additionally, to avoid the problem of P^T P being noninvertible, a regularized ELM (RELM) model was formulated [34]:

α = (P^T P + γI)^{-1} P^T Y^l, (3)

where γ > 0 is the ridge parameter for the identity matrix I.
Finally, for a test sample x_t = [x_t1, x_t2, ..., x_tn]^T ∈ R^n, its prediction ŷ_t is obtained as

ŷ_t = p_t^T α, (4)

where p_t is the hidden-layer output vector associated with x_t.
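As a minimal sketch, the RELM training and prediction steps described above can be written in NumPy as follows; the tanh activation, the uniform random initialization, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def relm_fit(X, Y, n_hidden=30, gamma=1e-4, seed=0):
    """Regularized ELM: random (untuned) hidden layer, then a ridge
    solve for the output weights alpha."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights a_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # biases b_i
    P = np.tanh(X @ A + b)                                   # hidden-layer output matrix
    # RELM solution: alpha = (P^T P + gamma I)^{-1} P^T Y
    alpha = np.linalg.solve(P.T @ P + gamma * np.eye(n_hidden), P.T @ Y)
    return A, b, alpha

def relm_predict(X, A, b, alpha):
    """Prediction y_hat = p_t^T alpha for each row of X."""
    return np.tanh(X @ A + b) @ alpha
```

Note that, as the text emphasizes, the hidden weights A and b are never trained; only the linear output layer is solved for.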

Semi-supervised Extreme Learning Machine (SELM) Regression Method
For the semi-supervised learning methods, the labeled and unlabeled input samples are represented as {(x_i^l, y_i^l)}_{i=1}^L and {x_i^u}_{i=1}^U, respectively. Additionally, the hidden-layer output matrix P can be defined as P = [p_1, p_2, ..., p_{L+U}]^T ∈ R^{(L+U)×N}, as aforementioned. The manifold regularization framework is utilized to learn the output weight matrix W of an SELM model [39].
min_W ||J(PW − Y)||^2 + λ(PW)^T L PW, (5)

where ||J(PW − Y)||^2 measures the approximation errors on the labeled training data (i.e., the empirical risk), while λ(PW)^T L PW is the penalty term utilizing the graph Laplacian L with a parameter λ ≥ 0 (i.e., penalizing the complexity of the learnt function). All the unlabeled data are integrated into the matrix P. The graph Laplacian L can be constructed using a basic identity in spectral graph theory [39]. Additionally, for convenience of calculation, the selection matrix J = [[I_L, 0], [0, 0]] ∈ R^{(L+U)×(L+U)} is defined [39], and Y stacks Y^l with zeros for the unlabeled samples.
By solving Equation (5), the coefficient matrix W [39] is obtained as

W = (P^T J P + λ P^T L P + γI)^{-1} P^T J Y. (6)

Generally, for semi-supervised learning methods, there is an assumption that the input patterns of both labeled and unlabeled data come from the same distribution. In such a situation, data samples in a local region should have similar labels [33,34,39]. Useful information hidden in the unlabeled data can thus be explored within the above modeling framework. The graph Laplacian L of SELM contains information from both labeled and unlabeled data. Once the unlabeled data are ignored (i.e., λ = 0), W reduces to α in Equation (3). A prediction performance improvement can be obtained by suitably choosing λ as the penalty on model complexity. Finally, for a query sample x_t, the prediction is ŷ_t = p_t^T W, where p_t is the hidden-layer output vector associated with x_t.
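As an illustrative sketch, the SELM solution of Equation (6) can be implemented with NumPy as follows; the kNN RBF graph construction, the assumption that the labeled samples occupy the first rows of P, and all function names are illustrative choices, not the paper's implementation:

```python
import numpy as np

def knn_rbf_laplacian(X, k=5, sigma=1.0):
    """Graph Laplacian L = D - S from a kNN RBF similarity graph
    (one common construction; details are an assumption here)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    far = np.argsort(d2, axis=1)[:, k + 1:]   # drop all but self + k nearest neighbours
    for i in range(X.shape[0]):
        S[i, far[i]] = 0.0
    S = np.maximum(S, S.T)                    # symmetrize the adjacency
    np.fill_diagonal(S, 0.0)
    return np.diag(S.sum(axis=1)) - S

def selm_solve(P, Y_l, L, lam=0.1, gamma=1e-3):
    """W = (P^T J P + lam P^T L P + gamma I)^{-1} P^T J Y.
    The first Y_l.shape[0] rows of P correspond to the labeled samples."""
    n, N = P.shape
    n_l, m = Y_l.shape
    J = np.zeros((n, n)); J[:n_l, :n_l] = np.eye(n_l)  # selection matrix for labeled rows
    Y = np.zeros((n, m)); Y[:n_l] = Y_l                # zero-padded output matrix
    A = P.T @ J @ P + lam * (P.T @ L @ P) + gamma * np.eye(N)
    return np.linalg.solve(A, P.T @ J @ Y)
```

With lam = 0 the Laplacian term vanishes and the solution coincides with the RELM solution computed on the labeled rows alone, mirroring the reduction to Equation (3) noted above.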

Bagging Local Semi-supervised Models (BLSM) Online Modeling Method
In industrial processes, JITL-based local soft sensors are more flexible than a single fixed model for relatively long-term utilization [23,24]. Nevertheless, most conventional JITL approaches use only the limited labeled data, disregarding the useful information in the large number of unlabeled samples. As can be expected, the prediction accuracy of JITL models can be improved by using the unlabeled data.
The online query for a test sample x_t involves three main steps. First, select a similar set {S_t} = S_t^l ∪ S_t^u, including both L_t labeled samples and U_t unlabeled samples (i.e., S_t^l = {X_t^l, Y_t^l} and S_t^u = X_t^u), from the historical database {S} via some defined similarity criterion [37]. The common Euclidean distance-based similarity is adopted here; other available similarity criteria [23,24,37] can also be combined with local SELM models. Second, construct a local SELM model f(x_t) using the selected similar dataset {S_t}. Third, predict online and then repeat the same procedure for the next query sample.
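The first step, Euclidean distance-based selection of similar samples, can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def select_similar(x_t, X_hist, n_select):
    """Return the indices of the n_select historical samples closest to the
    query x_t under the Euclidean distance (smaller distance = more similar)."""
    d = np.linalg.norm(X_hist - x_t, axis=1)
    return np.argsort(d)[:n_select]
```

The same routine can be applied separately to the labeled and unlabeled historical inputs to form S_t^l and S_t^u.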
For a selected {S_t}, two parameters, i.e., the number of hidden nodes N and the balance parameter λ ≥ 0, are needed to train a local SELM model. To avoid overfitting, a simple bagging strategy is adopted to generate multiple local candidate models with diversity and then aggregate them into a new predictor. With the bootstrap re-sampling strategy, several candidate regression models are ensembled to achieve an improved prediction [40].
For the similar labeled dataset S_t^l = {X_t^l, Y_t^l}, L_t pairs of samples are randomly drawn with replacement from S_t^l, where the probability of each pair being chosen is 1/L_t [40]. These L_t pairs of data form a re-sampled training set. The procedure is repeated K times to obtain K re-sampled datasets, i.e., S_t1^l, ..., S_tK^l. Similarly, the bagging strategy is applied to the unlabeled dataset S_t^u = X_t^u to get K re-sampled datasets S_t1^u, ..., S_tK^u. For the kth dataset {S_tk} = S_tk^l ∪ S_tk^u, the weight matrix W_k of the kth local SELM model is obtained (similar to Equations (5) and (6)). Consequently, for a test input x_t = [x_t1, x_t2, ..., x_tn]^T ∈ R^n, the prediction of the kth local SELM model, ŷ_{k,t}, is formulated as

ŷ_{k,t} = p_t^T W_k, (7)

where p_t is the hidden-layer output vector associated with x_t. Finally, using a simple ensemble strategy, the K candidate SELM models are equally weighted to generate the final prediction:

ŷ_t = (1/K) Σ_{k=1}^K ŷ_{k,t}. (8)

The main modeling flowchart of BLSM is given in Figure 1. In summary, BLSM has two main characteristics. First, the useful information hidden in unlabeled data is explored and absorbed. Second, using the bagging strategy [40], the BLSM model is aggregated from multiple local candidates with diversities.

Figure 1. Bagging local semi-supervised models (BLSM)-based online soft sensing flowchart for the silicon content prediction.
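The bagging-and-averaging procedure above can be sketched generically; fit and predict are placeholder hooks for any local model (e.g., a local SELM), and the function names are illustrative:

```python
import numpy as np

def bagging_predict(x_t, X_l, Y_l, X_u, fit, predict, K=15, seed=0):
    """Draw K bootstrap resamples of the similar labeled pairs and unlabeled
    inputs, fit one local model per resample, and average the K predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(K):
        il = rng.integers(0, len(X_l), size=len(X_l))  # resample labeled pairs with replacement
        iu = rng.integers(0, len(X_u), size=len(X_u))  # resample unlabeled inputs
        model = fit(X_l[il], Y_l[il], X_u[iu])
        preds.append(predict(model, x_t))
    return np.mean(preds, axis=0)                      # equally weighted ensemble
```

Sampling each index uniformly with replacement realizes the stated selection probability of 1/L_t per labeled pair.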

Data Sets and Pretreatment
The BLSM method is applied to silicon content prediction in an industrial blast furnace in China. For the construction of soft sensors, the related input variables include the blast volume, the blast temperature, the top pressure, the gas permeability, the top temperature, the ore/coke ratio, and the pulverized coal injection rate [22][23][24]. After preprocessing the dataset with the 3-sigma criterion, most obvious outliers were removed. A set of about 260 labeled samples was investigated; half of the labeled samples are treated as historical samples, and the remaining half is used for testing the models. Additionally, 500 unlabeled data points from the same furnace were collected as historical samples. The labeled and unlabeled data come from the same industrial blast furnace, indicating that they share similar characteristics of the production process. Consequently, the semi-supervised learning methods can be applied.
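The 3-sigma screening step can be sketched as follows (a per-variable criterion and the function name are assumptions for illustration):

```python
import numpy as np

def three_sigma_filter(X):
    """Keep only the rows whose every variable lies within mean +/- 3*std
    of that variable (the 3-sigma outlier criterion)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    mask = (np.abs(X - mu) <= 3.0 * sd).all(axis=1)
    return X[mask], mask
```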

As a recent supervised method with good nonlinear regression performance, the just-in-time least squares SVR (JLSSVR) soft sensor [23] is adopted for comparison. Additionally, as a semi-supervised model, the SELM model [39] is also combined with JITL to construct a local SELM soft sensor here.
Three common performance indices, i.e., the root-mean-square error (RMSE), the relative RMSE (simply denoted as RE), and the hit rate (HR), are adopted. The RMSE is defined as

RMSE = sqrt((1/N_tst) Σ_{t=1}^{N_tst} (y_t − ŷ_t)^2),

where N_tst is the number of test samples, and RE normalizes the RMSE by the magnitude of the measured values. The hit rate is HR = ((1/N_tst) Σ_{t=1}^{N_tst} H_t) × 100%, where H_t = 1 if the prediction error of the tth test sample falls within a specified tolerance and H_t = 0 otherwise.
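The three indices can be sketched as follows; the RE normalization and the tolerance used inside H_t are assumptions for illustration, since their exact definitions are not reproduced in this excerpt:

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error over the test samples."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def rel_rmse(y, y_hat):
    """Relative RMSE (assumed normalization by the RMS magnitude of y)."""
    return rmse(y, y_hat) / float(np.sqrt(np.mean(y ** 2)))

def hit_rate(y, y_hat, tol=0.1):
    """Percentage of samples whose absolute error is within tol (tol is assumed)."""
    return float(np.mean(np.abs(y - y_hat) <= tol)) * 100.0
```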

Results and Discussion
First, with different sizes of unlabeled data, the comparison results of the three performance indices for the two semi-supervised models, i.e., BLSM and local SELM, are shown in Figures 2-4, respectively. For both the BLSM and local SELM models, the prediction performance is enhanced gradually as the size of the unlabeled dataset increases. Due to its ensemble local modeling ability, BLSM exhibits superior prediction performance to a single local SELM model. In this case, the prediction performance is not further enhanced once the number of unlabeled samples exceeds about 400, mainly because most of the useful information in the unlabeled dataset is already captured by the first 400 samples.

With 400 unlabeled data, taking the HR index as an example, the effect of the number K of candidate local SELM models on the construction of a BLSM model is shown in Figure 5. With the ensemble learning strategy, the effort of parameter selection for BLSM can be reduced. The HR index indicates that ensemble learning can enhance the prediction performance to some extent (the HR value increases from 77.2% to 80.3%), and BLSM achieves its best prediction performance at K = 15 for this application.
For the three soft sensors (i.e., BLSM, local SELM, and JLSSVR [23]), the silicon content prediction results are shown in Figure 6. This parity plot shows that BLSM outperforms the local SELM and JLSSVR methods. The prediction performance comparison of the three modeling methods is listed in Table 1, together with brief descriptions of their main characteristics. Generally, BLSM is a local semi-supervised learning model and can therefore better capture nonlinear characteristics in local regions, especially with the help of unlabeled data. For JLSSVR [23], which uses only a few labeled data, the prediction domain may be limited. Different from JLSSVR [23], BLSM explores and utilizes the hidden information in large amounts of unlabeled data to improve the local modeling ability. Moreover, using the simple bagging ensemble strategy, the prediction performance of a semi-supervised local model (e.g., a local SELM) can be enhanced.
The computational complexity of BLSM is about K times that of a local SELM model. Based on experience, K is often much less than 100. The online prediction time of BLSM for a test sample is about 1 s (with a 2.3 GHz CPU and 4 GB memory). Compared with the interval time of the lab assay, this computational load is acceptable. With more historical data (especially unlabeled data), the computational load of online modeling becomes larger. To alleviate this problem, it is suggested that the online and offline models be integrated using Bayesian analysis [37]. Alternatively, development of a recursive version of BLSM may be a choice. In summary, all the obtained results show that BLSM is a promising method for predicting the silicon content of hot metal produced in blast furnaces.

Conclusions
This work has presented an online semi-supervised soft sensor model, i.e., BLSM, for blast furnace hot metal silicon content prediction. Two main advantages distinguish BLSM from most current hot metal silicon prediction soft sensors. First, the useful information in unlabeled data is absorbed into the online modeling and prediction framework efficiently. Second, a bagging-based ensemble strategy is integrated into the online semi-supervised model to improve its prediction reliability. The application results show that BLSM has better prediction performance than traditional soft sensors. This is the first application of semi-supervised learning methods to industrial blast furnaces. How to efficiently select the more informative unlabeled data in an error-in-variables environment for construction of a more robust semi-supervised model will be tackled in our future work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
BLSM    bagging local semi-supervised model
ELM     extreme learning machine
JITL    just-in-time-learning
JLSSVR  just-in-time least squares support vector regression
NNs     neural networks
RE      relative root-mean-square error
RMSE    root-mean-square error
RELM    regularized extreme learning machine
SELM    semi-supervised extreme learning machine
SLFNs   single-hidden layer feedforward networks
SVR     support vector regression