Multi-Output Soft Sensor with a Multivariate Filter That Predicts Errors Applied to an Industrial Reactive Distillation Process

The paper deals with the problem of developing a multi-output soft sensor for the industrial reactive distillation process of methyl tert-butyl ether production. Unlike existing soft sensor approaches, this paper proposes a soft sensor with filters that predict the model errors, which are then taken into account as corrections in the final predictions of the outputs. A decomposition of the problem of optimally estimating the time delay for each input of the soft sensor is also proposed. Using the proposed approach to predict the concentrations of methyl sec-butyl ether, methanol, and the sum of dimers and trimers of isobutylene in the output product of a reactive distillation column was shown to improve the results by 32%, 67%, and 9.5%, respectively.


Introduction
As the size and complexity of industrial systems increase, there is a need to accurately measure most process variables. Unfortunately, not all variables can be accurately measured using online hard sensors. For certain variables, such as concentration or density, accurate measurements can only be obtained by manually taking samples and analyzing them in a laboratory. One solution to this problem is the development of soft sensors, which take the easy-to-measure variables and create models to predict the hard-to-measure variables [1].
All soft sensor systems consist of a process model that takes the easy-to-measure variables and provides an estimate of the hard-to-measure variables. These models can be constructed using methods ranging from linear regression to principal component analysis and support vector machines. Although the main focus has been on the development of the soft sensor models themselves [2][3][4][5], advanced soft sensor systems also have a bias update term that can use any slowly sampled information to update the soft sensor prediction [1]. This bias update term is normally designed as some function of the difference between the predicted and measured values [6]. It should be noted that the measured values are often sampled very slowly and with considerable time delay, which means that at the points where no update is available, the previously computed bias value is used. When such a system is properly designed, it can provide good tracking of the process, i.e., the predicted and measured values are close to each other.
Recently, it has been suggested that instead of only using the available slowly sampled data for updating the bias term, it should be possible to also model the historical errors and use them to predict the future errors [7]. It has been shown that such an approach can improve the overall performance of the soft sensor system. However, there still remain issues with how best to model and implement this predictive bias update term.
Furthermore, there are issues with incorporating time delays into this approach since they will greatly increase the size of the required search space. Therefore, this paper will examine the development of a predictive bias update term for a nonlinear system using dimension reduction. The proposed approach will be tested using data from an industrial reactive distillation column that produces methyl tert-butyl ether (MTBE).

Background
Consider the soft sensor system shown in Figure 1, where u_t is the input, y_t is the measured (true) output, ŷ_{m,t} is the predicted soft sensor value, ŷ_{α,t} and ŷ_{β,t} are intermediate soft sensor values, G_p is the true process, Ĝ_p is the soft sensor process model, and G_B is the bias update term. It can be noted that the purpose of the bias update term is to take the information from the measured values and correct the output of the soft sensor system. The correction compensates primarily for the unknown disturbances and the inherent plant-model mismatch.
Another approach to this problem is to re-arrange the bias update term so that it contains a predictive model of the errors between the measured and predicted values. This re-arrangement is shown in Figure 2, where the predicted value from the soft sensor is corrected based on the modeled errors of the system. The question then becomes how to design this model so that the best predictions can be obtained.
For the prediction of time series, the Box-Jenkins methodology is traditionally used, according to which the time series model is sought in the class of autoregressive-moving-average (ARMA) models, i.e., it is considered a rational algebraic function of the backward shift operator. The flexibility of the ARMA class makes it possible to find parsimonious models, i.e., the adequacy of the evaluated model is achieved with a small number of estimated parameters. Since this property is especially important for empirical models, the Box-Jenkins methodology is widely used to solve various practical problems and is adopted in this paper.
Mathematics 2021, 9, 1947
In industrial processes, where it is desired to implement the model on programmable logic controller (PLC) units, the complexity of the model Ĝ_p can be an issue. Therefore, this paper considers a simple model for Ĝ_p of the form

ŷ_t = b x_t, (1)

where b are the parameters to be estimated and x_t is the vector of inputs. Model (1) can be improved by taking into account possible delays of the output variables relative to the inputs. Consider the following model for a multi-output soft sensor

y_{t,m} = b_m u_m(t, τ_m) + e_{t,m}, (2)

where t = 1, 2, ..., n; m = 1, 2, 3 (the number of outputs m is given by the industrial production team and reflects the key quality indices of the MTBE product); b_m = (b_{m,1}, b_{m,2}, ..., b_{m,10}) is a row vector of unknown coefficients; τ_m = (τ_{m,1}, τ_{m,2}, ..., τ_{m,10}) is a row vector of unknown time delays; u_m(t, τ_m) = (u_{t,m,1}, u_{t,m,2}, ..., u_{t,m,10})^T; and u_{t,m,k} is the measurement of the x_k value at time t − τ_{m,k} with k = 1, 2, ..., 10. Please note that the maximal time delay has been assumed here to be 10 samples, which is justified from the point of view of the industrial process dynamics; however, the approach can easily be extended to arbitrary values. Solving model (2) by minimizing the mean squared error (MSE) gives estimates of the unknown parameters b̂_m and τ̂_m. The MSE depends not only on the coefficients b_m, but also on the delays τ_m, i.e.,

D_em(b_m, τ_m) = (1/n) Σ_{t=1}^{n} (y_{t,m} − b_m u_m(t, τ_m))², (3)

Furthermore, for fixed delays, the estimates b̂_m are found using standard regression analysis, which gives

b̂_m^T = (U_m^T U_m)^{−1} U_m^T Y_m, (4)

where Y_m is the m-th column of the matrix Y, and U_m is a matrix of dimension n × 10 whose t-th row is u_m(t, τ_m)^T. Since all variables are measured at discrete moments in time, gradient descent methods cannot be directly applied to minimize the objective function D_em(b_m, τ_m) with respect to the argument τ_m. However, this difficulty can be avoided by calculating D_em for any values of the elements of the vector τ_m by interpolating between the nearby nodes of the discrete grid.
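As an illustration, the delay-aware estimation of Equations (2) and (4) can be sketched as follows. The coordinate-wise search below, which optimizes one delay at a time over the integer grid, is a simplification standing in for the decomposition described in the paper; the function names and the synthetic setup are our own.

```python
import numpy as np

def ols_mse(U, y):
    """OLS fit b = (U^T U)^{-1} U^T y, plus the resulting mean squared error."""
    b, *_ = np.linalg.lstsq(U, y, rcond=None)
    r = y - U @ b
    return b, float(np.mean(r ** 2))

def build_U(x, tau, t0):
    """Row t of U holds x[t - tau_k, k]; starting at t0 >= max(tau) keeps
    every index valid."""
    tau = np.asarray(tau, dtype=int)
    n = x.shape[0]
    return np.column_stack([x[t0 - tau[k]:n - tau[k], k]
                            for k in range(x.shape[1])])

def fit_delays(x, y, max_delay=10, sweeps=3):
    """Coordinate-wise search over the integer delay grid: optimize one
    tau_k at a time while holding the other delays fixed."""
    K = x.shape[1]
    tau = np.zeros(K, dtype=int)
    y_trim = y[max_delay:]
    for _ in range(sweeps):
        for k in range(K):
            tau[k] = min(
                range(max_delay + 1),
                key=lambda d: ols_mse(
                    build_U(x, np.r_[tau[:k], d, tau[k + 1:]], max_delay),
                    y_trim)[1])
    b, mse = ols_mse(build_U(x, tau, max_delay), y_trim)
    return b, tau, mse
```

On synthetic data with known delays, the search recovers both the delays and the coefficients exactly, since the objective attains its minimum at the true delay vector.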
Interpolation in a search space of large dimension is a difficult problem. Among the various characteristics of the candidate algorithms, such properties as clarity and relative simplicity come to the fore. Therefore, in this situation, polynomial interpolation is the most suitable choice.
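A minimal sketch of the interpolation idea: given the objective evaluated at the integer delay nodes, a local polynomial lets us evaluate, and refine, it at non-integer delays. The helper names below are illustrative, not from the paper.

```python
import numpy as np

def interp_objective(d_grid, D_grid, tau, degree=2):
    """Evaluate the MSE objective at a non-integer delay tau by fitting a
    local polynomial through the degree+1 grid nodes nearest to tau."""
    idx = np.argsort(np.abs(d_grid - tau))[:degree + 1]
    coeffs = np.polyfit(d_grid[idx], D_grid[idx], degree)
    return float(np.polyval(coeffs, tau))

def refine_delay(d_grid, D_grid):
    """Refine the best integer delay by taking the vertex of a parabola
    fitted through that node and its two neighbours."""
    i = int(np.argmin(D_grid))
    i = min(max(i, 1), len(d_grid) - 2)       # keep both neighbours in range
    a, b, _ = np.polyfit(d_grid[i - 1:i + 2], D_grid[i - 1:i + 2], 2)
    return float(-b / (2.0 * a))
```

When the objective is locally close to a parabola near its minimum, the vertex of the fitted parabola gives the refined fractional delay directly.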

Error Modeling
If the e_{t,m} error were known at time t − 1, then using Equation (2), it would be possible to predict the y_{t,m} variable with absolute accuracy. Unfortunately, the e_{t,m} error is not known in advance, but it can be predicted using any statistical patterns found in the sequence e_{1,m}, e_{2,m}, .... This error prediction can be used as a correction to model (2), as shown in Figure 2, thereby improving the prediction accuracy of the y_{t,m} output variable.
To evaluate a predictive model for the sequence e_{1,m}, e_{2,m}, ..., let us consider the class of ARMA models. Let us introduce the predicted process as the output of an invertible linear filter, called a shaping filter, driven by white noise, i.e., a process with a constant spectral density. In this case, the transfer function of the shaping filter is considered a rational algebraic function of the backward shift operator, i.e.,

e_t = [Π_{l=1}^{N_n} (1 − H_l q^{−1}) / Π_{k=1}^{N_d} (1 − G_k q^{−1})] ε_t, (7)

where ε_t and e_t are the values of the input and output processes of the shaping filter at time t; N_n is the order of the moving average; N_d is the order of the autoregressive component; H_l and G_k are constants (generally speaking, complex-valued); and q^{−1} is the backshift operator. The stationarity and invertibility conditions, which are necessary to predict the e_t process, are [8]

|G_k| < 1, k = 1, ..., N_d; |H_l| < 1, l = 1, ..., N_n.

The flexibility of the ARMA class provides the possibility of finding parsimonious models, i.e., the adequacy of the constructed model is achieved with a relatively small number of estimated parameters. Since this property is especially important for empirical models, the models with the structure given in Equation (7) and their variants are widely used for solving practical problems.
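The stationarity and invertibility conditions are easy to check numerically: factoring the expanded numerator and denominator polynomials recovers the constants H_l and G_k, whose moduli must all be below one. A sketch, with illustrative names:

```python
import numpy as np

def arma_factors(theta, eta):
    """Given the expanded ARMA form
        (1 - eta_1 q^-1 - ... - eta_Nd q^-Nd) e_t
            = (1 - theta_1 q^-1 - ... - theta_Nn q^-Nn) eps_t,
    return the constants H_l (numerator) and G_k (denominator) of the
    factored transfer function (7): they are the roots of the polynomials
    z^N - c_1 z^{N-1} - ... - c_N."""
    H = np.roots(np.concatenate(([1.0], -np.asarray(theta, dtype=float))))
    G = np.roots(np.concatenate(([1.0], -np.asarray(eta, dtype=float))))
    return H, G

def is_invertible_and_stationary(theta, eta):
    """True when all |H_l| < 1 (invertible) and all |G_k| < 1 (stationary)."""
    H, G = arma_factors(theta, eta)
    return bool(np.all(np.abs(H) < 1.0) and np.all(np.abs(G) < 1.0))
```

For first-order polynomials the factors coincide with the coefficients themselves, which makes the check easy to verify by hand.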
The filter for predicting the e_t process can be found using the prediction error method (PEM) [9]. Expanding the brackets in Equation (7) gives

(1 − η_1 q^{−1} − ... − η_{N_d} q^{−N_d}) e_t = (1 − θ_1 q^{−1} − ... − θ_{N_n} q^{−N_n}) ε_t,

where θ_l and η_k are the model parameters. It is assumed that the polynomials in the numerator and denominator have no common roots, since otherwise it would be possible to cancel the common factors in the numerator and denominator of Equation (7). The PEM function finds the parameter values that minimize the predictive MSE of the e_t process for the given polynomial orders (N_n, N_d) and the initial estimates of the parameters θ_l and η_k. Suitable orders of the polynomials can be chosen based on sample estimates of the spectral density of the considered process. Recall that the frequency response of the shaping filter is the value of Equation (7) on a circle of unit radius centered at the origin, and the spectral density S(ω) of the output process e_t is equal to the product of the variance of the input process and the squared modulus of the frequency response, i.e., [10]

S(ω) = σ_ε² Π_{l=1}^{N_n} (1 − H_l e^{−jω})(1 − H̄_l e^{jω}) / Π_{k=1}^{N_d} (1 − G_k e^{−jω})(1 − Ḡ_k e^{jω}), (9)

where σ_ε² is the variance of the random process ε_t, and H̄_l and Ḡ_k are the complex conjugates of the constants H_l and G_k. Furthermore, since we desire that our filter be invertible, it follows from model (7) that the e_t process is invertible if the absolute values of all the H_l constants are less than one. Similarly, if the absolute values of all the G_k constants are less than one, then the e_t process is stationary [8]. Thus, although multiple processes can have the same spectral density, there is only one that is both stationary and invertible. Once the general model has been obtained, we can rewrite it as an infinite impulse response model, i.e.,

e_t = Σ_{i=0}^{∞} ψ_i ε_{t−i}, (12)

where ψ_i are the impulse response coefficients. Since we know that the general model converges [8], it follows that only a finite number of terms is needed in Equation (12).
Furthermore, we note that E[ε_t e_{t−i}] = 0 for any positive i, i.e., the random variables ε_t and e_{t−i} are uncorrelated (since the process ε_t is white noise). Therefore, successively multiplying both sides of Equation (12) by the values of the corresponding process at delays i and taking expectations, we obtain equations for finding the initial estimates of the parameters that involve the covariances of the errors at different lags [10]. Obviously, since the true covariances are not known, they need to be replaced by their sample estimates. This method of estimating the coefficients does not lead to too large an error as long as the absolute values of the parameters of model (7) are not too close to the boundary of the unit circle centered at the origin. Thus, it is possible to design the required filter.
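For a pure autoregressive model, the moment equations described above reduce to the classical Yule-Walker system in the sample autocovariances. A sketch of these initial estimates, which would then seed the PEM refinement; function names are illustrative:

```python
import numpy as np

def sample_autocov(e, max_lag):
    """Biased sample autocovariances c(0), ..., c(max_lag) of a centered series."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    n = len(e)
    return np.array([np.dot(e[:n - k], e[k:]) / n for k in range(max_lag + 1)])

def yule_walker_init(e, p):
    """Initial AR(p) estimates: multiply the model by delayed errors, take
    expectations, and replace the true covariances by the sample ones."""
    c = sample_autocov(e, p)
    R = np.array([[c[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, c[1:])
```

On a long simulated AR(1) series the estimate lands close to the true coefficient, consistent with the remark that the method works well away from the unit circle.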

Filter Design
Let e_t = (e_{t,1}, e_{t,2}, ..., e_{t,N})^T be an N-dimensional stationary process of the soft sensor's errors whose shaping filter transfer matrix is F_0(q^{−1}), i.e.,

e_t = F_0(q^{−1}) ε_t,

where q^{−1} is the backshift operator; ε_t = (ε_{t,1}, ε_{t,2}, ..., ε_{t,N})^T is an N-dimensional vector of white noise; and F_0(q^{−1}) = [f_{km}(q^{−1})] is an N × N matrix function whose entries f_{km}(q^{−1}) are the rational transfer functions from ε_{t,m} to e_{t,k}. Thus, it is desired to construct the filter that will predict e_{t+1} given the past values. Let P(q^{−1}) be the desired one-step-ahead predictor transfer matrix, ê_{t+1} = P(q^{−1}) e_t the prediction of the vector e_{t+1} at time t, and ε_{t+1} = e_{t+1} − ê_{t+1} the error of the prediction obtained with the aid of the filter P(q^{−1}). Then

ε_{t+1} = [I_N − q^{−1} P(q^{−1})] e_{t+1},

where I_N is the identity matrix of order N. Consequently, the filter in the square brackets transforms the initial series into the prediction error series. If the random vector ε_t includes components correlated with those of the vector ε_{t−j} at some j > 0, we can predict the errors ε_t using the known previous errors. Using those predictions as corrections to the predictions of e_t, we could improve the accuracy of the predictions. Hence, in order to maximize the predictor accuracy, we must find a P(q^{−1}) such that the errors ε_t are uncorrelated with the errors ε_{t−j} at any j > 0, with some nonzero correlation between the components of ε_t (i.e., at j = 0) being admissible. In other words, the time series ε_t must be N-dimensional white noise. Consequently, I_N − q^{−1} P(q^{−1}) = F_0^{−1}(q^{−1}), from which it follows that P(q^{−1}) = q[I_N − F_0^{−1}(q^{−1})]. Thus, the predictor transfer matrix P(q^{−1}) can be expressed through the transfer matrix of the shaping filter F_0(q^{−1}).
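As a scalar illustration of P(q^{−1}) = q[I_N − F_0^{−1}(q^{−1})], consider the AR(1) shaping filter F_0 = 1/(1 − G q^{−1}): its inverse is 1 − G q^{−1}, so the predictor collapses to ê_{t+1} = G e_t, and the prediction errors recover the driving white noise exactly. A hypothetical numerical check:

```python
import numpy as np

def one_step_predict_ar1(e, G):
    """Predictions of e[1:], given e[:-1]: for F0 = 1/(1 - G q^-1) the
    predictor transfer function P(q^-1) reduces to the constant G."""
    return G * np.asarray(e, dtype=float)[:-1]

def prediction_errors_ar1(e, G):
    """e[t+1] - e_hat[t+1]; for the true G these are the white-noise inputs."""
    e = np.asarray(e, dtype=float)
    return e[1:] - one_step_predict_ar1(e, G)
```

Applying the predictor with the true G to a simulated AR(1) series returns exactly the noise sequence that generated it, which is the defining property of the optimal filter.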
The matrix F_0(q^{−1}) can be found from the spectral factorization

G(q^{−1}) = F_0(q^{−1}) F_0^T(q),

where G(q^{−1}) = [g_{km}(q^{−1})] and g_{km}(q^{−1}) is the q-transform of the statistical estimate of the cross-covariance function of the time series e_{t,k} and e_{t,m} (in particular, when m = k, g_{mm} is the q-transform of the sample covariance function, i.e., the autocovariance generating function (AGF) of the time series e_{t,m}). The algorithm for finding F_0(q^{−1}) is simplified by decomposing it into N stages. At the k-th stage, a shaping filter F_k(q^{−1}) of the k-dimensional process (e_{t,1}, e_{t,2}, ..., e_{t,k})^T is found. At this stage, the filter F_{k−1}(q^{−1}), found at the (k−1)-th stage, is used to transform the matrix G_k(q^{−1}) = F_k(q^{−1}) F_k^T(q) so that its transform contains nonzero elements in only one row, one column, and on the main diagonal. This technique substantially simplifies the procedure of spectral factorization (finding the matrix function F_k(q^{−1})) [11].
The proposed approach allows us to identify the transfer matrix of the vector time series without resorting to a complicated state-space representation. This advantage is used to obtain an adequate model with relatively few estimated parameters for the shaping filter F_0(q^{−1}) of the initial time series. Simultaneously, the model for the transfer matrix of the inverse filter F_0^{−1}(q^{−1}), which transforms the initial time series into white noise, is also found. The algorithm for constructing both the shaping filter F_0(q^{−1}) and its inverse F_0^{−1}(q^{−1}) is described in [11]. Based on this algorithm, the sequence of prediction errors ε_t should be N-dimensional white noise. In practice, however, the true characteristics of the original process are not known; only their estimates, containing inevitable statistical errors, are available, so the properties of the sequence ε_t can differ significantly from those of white noise. Thus, to verify the optimality of the resulting model P(q^{−1}) of the predictive filter, a criterion is needed to test the hypothesis that the process ε_t is N-dimensional white noise. To construct such a criterion, we can transform the process ε_t in such a way that its spectral density matrix is diagonal. Such a transformation is achieved by means of a rotation of axes in the N-dimensional variable space ε_1, ε_2, ..., ε_N [12]. Since the variances of these variables can be made equal to each other by normalization, without loss of generality we suppose that the spectral density matrix of the noise ε_t is the N × N identity matrix I_N.
Consider a univariate sequence ξ_k = ε_{t−j,m}, where k = jN + m. Please note that each pair (j, m) determines one k and each k determines one pair (j, m). Consequently, ε_t is multivariate white noise if and only if ξ_k is univariate white noise. It is known that the spectral density of univariate white noise is constant [8,13]. Thus, testing the hypothesis that ε_t is multivariate white noise is reduced to testing the hypothesis of the constancy of the spectral density of a univariate sequence. This hypothesis can be tested using Kolmogorov's criterion [14].
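One common way to test the constancy of a spectral density, in the spirit of Kolmogorov's criterion, is the cumulative periodogram: under white noise it should follow a straight line, and the maximal deviation can be compared with a Kolmogorov-type bound. The sketch below uses the approximate 5% bound 1.36/sqrt(m); the implementation details are our own, not the paper's.

```python
import numpy as np

def cumulative_periodogram_stat(xi):
    """Maximal deviation of the normalised cumulative periodogram from the
    straight line expected under a constant spectral density."""
    xi = np.asarray(xi, dtype=float) - np.mean(xi)
    n = len(xi)
    I = np.abs(np.fft.rfft(xi)) ** 2          # periodogram ordinates
    I = I[1:(n + 1) // 2]                     # drop the mean and Nyquist terms
    C = np.cumsum(I) / np.sum(I)
    m = len(C)
    return float(np.max(np.abs(C - np.arange(1, m + 1) / m)))

def looks_white(xi, bound=1.36):
    """Approximate 5% Kolmogorov-type test: accept whiteness when the
    statistic stays below bound / sqrt(m)."""
    m = (len(xi) + 1) // 2 - 1
    return cumulative_periodogram_stat(xi) < bound / np.sqrt(m)
```

A strongly autocorrelated series concentrates its periodogram at low frequencies, so the cumulative curve bends far away from the straight line and the test rejects whiteness.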
Please note that only a time series containing prediction errors is used as the initial information for constructing a predictor with the proposed approach. Information about the model with which the predictions were obtained is not used. Therefore, this approach is applicable to any predictive model that involves errors, regardless of the specific properties of the model used.

Summary of the Proposed Approach
Thus, the proposed procedure for developing the model can be summarized as follows:
Step 1: Create an initial sample u_t, y_t, t = 1, 2, ..., K. If the plant is already functioning, then the initial sample consists of the historical values of u_t, y_t. Otherwise, the initial sample is formed during the trial period of the plant. The initial sample is divided into training and testing datasets.
Step 2: Based on the data included in the training sample, the coefficients and delays of the model given by Equation (2) are estimated via solving optimization problem (4).
Step 3: Based on the data included in the training sample, the errors for the model and the corresponding sample spectrum of errors are calculated.
Step 4: Based on the sample spectrum, the order of the ARMA model is selected in order to predict the unknown future error given the known current and past errors.
Step 5: The least squares method is used to find the values of the ARMA model parameters.
Step 6: The ARMA model obtained is used as the predictive filter F(q −1 ) in the feedback loop of the compensator (bias update term) as shown in Figure 2.
Step 7: If the resulting soft sensor improves the accuracy of the prediction for the test sample then it can be recommended for practical use.
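Steps 1-7 can be sketched end-to-end as follows. For brevity, this sketch omits the delay search of Step 2 and uses moment-based (Yule-Walker) estimates in place of the PEM refinement of Step 5; all names and the synthetic check are illustrative.

```python
import numpy as np

def fit_soft_sensor(X_tr, y_tr, p=1):
    """Steps 2-5 in miniature: fit the linear process model by OLS, then an
    AR(p) filter on its residuals via the sample covariances."""
    b, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    e = y_tr - X_tr @ b
    n = len(e)
    c = np.array([np.dot(e[:n - k], e[k:]) / n for k in range(p + 1)])
    R = np.array([[c[abs(i - j)] for j in range(p)] for i in range(p)])
    eta = np.linalg.solve(R, c[1:])
    return b, eta

def predict_with_filter(X, y, b, eta):
    """Step 6: correct each model prediction with the filtered past errors.
    The measured y is assumed to become available one step later, as in the
    bias-update loop, so the correction at t uses errors up to t-1 only."""
    y_hat_model = X @ b
    e_past = y - y_hat_model
    corr = np.zeros_like(y_hat_model)
    p = len(eta)
    for t in range(p, len(y)):
        corr[t] = np.dot(eta, e_past[t - 1::-1][:p])
    return y_hat_model + corr
```

On synthetic data with AR(1) errors, the filtered soft sensor reduces the test MSE relative to the uncorrected linear model, mirroring Step 7's acceptance check.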
Please note that the obtained predictive filter model can be recommended for further use only for the same plant on whose data it was built. As for the approach itself, it will certainly be successful if the sequence of errors of the plant is a stationary (or close to stationary) process. In addition, the class of plants to which this approach is successfully applicable can be extended to those for whose errors it is possible to find an invertible transformation that brings the sequence of errors to a stationary process. The quality of the developed model should be checked on a test sample that was not used at the model training stage.

Industrial Application of the Proposed Method
Industrial methyl tert-butyl ether (MTBE) production occurs in a reactive distillation unit, as shown in Figure 3. The feed containing isobutylene and methanol (MeOH) enters the column. The distillate (D) is a lean butane-butylene fraction with a certain amount of MeOH. The raffinate is the heavy product MTBE that is withdrawn from the bottom part of the column. Table 1 shows the main process variables for the industrial unit. The goal is to develop a soft sensor for the prediction of the concentrations of methyl sec-butyl ether (MSBE), MeOH, and the sum of dimers and trimers of isobutylene (DIME) in the bottom product MTBE.
The measured values of the output y_m and input x_k variables at the time moment t are denoted as y_{t,m}, x_{t,k}; m = 1, 2, 3; k = 1, 2, ..., 10; and t = 1, 2, ..., n. The existing measurements may be used for the development of a predictive model of the form

y_t = b x_t + b_0 + e_t, (17)

where y_t = (y_{t,1}, y_{t,2}, y_{t,3})^T; x_t = (x_{t,1}, x_{t,2}, ..., x_{t,10})^T; b is a matrix of the model parameters [b_{mk}] of dimension 3 × 10; b_0 = (b_1, b_2, b_3)^T is a vector of the constant biases; e_t = (e_{t,1}, e_{t,2}, e_{t,3})^T is a vector of the residuals; and the superscript T denotes the transpose. Since Equation (17) can be rewritten in terms of the centered variables, the bias vector b_0 may be considered equal to zero without loss of generality. Although the elements of the matrix b are unknown, they are easily estimated using the ordinary least squares (OLS) method, which gives [10]

b̂^T = (X^T X)^{−1} X^T Y,

where X = [x_{tk}]; Y = [y_{tm}]; m = 1, 2, 3; k = 1, 2, ..., 10; and t = 1, 2, ..., n.
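The OLS estimate b̂ = ((XᵀX)⁻¹XᵀY)ᵀ takes only a couple of lines; this illustrative sketch assumes X of dimension n × 10 and Y of dimension n × 3, as in the text.

```python
import numpy as np

def ols_multi(X, Y):
    """Multi-output OLS: solve (X^T X) B = X^T Y, giving one column of
    coefficients per output (so the 3 x 10 matrix b is B.T)."""
    return np.linalg.solve(X.T @ X, X.T @ Y)
```

With noise-free synthetic data the estimate recovers the true coefficient matrix exactly, which is a convenient sanity check of the implementation.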
For the training sample containing n = 400 measurements, the parameter estimates were obtained. The sample estimate of the coefficient of determination for predicting the output variable y_m, denoted by R²_{Lm}, is R²_{L1} = 0.7160, R²_{L2} = 0.5200, and R²_{L3} = 0.5726. The effect of accounting for the delays was evaluated on a test sample containing 167 measurements. As a result, the MSE of the predictions of the output variables y_1, y_2, and y_3 decreased by 23%, 10%, and 3%, respectively. Now, let us consider modeling the error term. From the spectral densities of the errors e_{t,1} and e_{t,3} shown in Figures 4 and 5, it can be seen that the maximum within the interval [0, 0.5] Hz indicates the presence in the denominator of the spectral density function S(ω) of a factor (1 − G e^{−jω}) with a complex-valued constant G. Since the sampling time is equal to 12 h, the frequency unit 1/(12 h) is used instead of Hz. However, for the practical application of the filter given by Equation (9), it is necessary that all the coefficients be real [8]. Therefore, the denominator of the density S(ω) must contain the complex-conjugate factor (1 − Ḡ e^{−jω}) along with the factor (1 − G e^{−jω}). If the frequency response models for the e_{t,1} and e_{t,3} processes are limited to these two factors (assuming the numerator is equal to one), then the corresponding spectral density of the second-order autoregressive process approximates well the sample estimates of the spectra of the e_{t,1} and e_{t,3} processes at different values of G. However, the insufficiently rapid decrease of the spectral density in the high-frequency region justifies the inclusion in the denominator of the model of another factor with a real value of the constant G.
In Figure 6, which shows the spectral density for the e_{t,2} errors, the sample spectrum of this time series resembles the spectrum of a first-order autoregressive process [15][16][17]. However, we note that a stochastic process is not uniquely determined by its spectral density [8]. Therefore, as previously mentioned, we need to include the two additional constraints that the resulting model be invertible and realizable. This will ensure that we have a unique model. The error processes are thus modeled as autoregressions

e_{t,m} = η_{1m} e_{t−1,m} + η_{2m} e_{t−2,m} + ... + ε_{t,m},

where the η are the parameters to be determined. These parameters can be found using the approach presented in Section 2.2 by multiplying the finite impulse response model by the delayed errors and taking the expectations, which yields equations in the sample covariances of the errors. For the process e_{t,1}, the resulting estimates of the coefficients η_{11}, η_{21}, and η_{31} are, respectively, equal to 0.4131, −0.0093, and −0.0528. These values were used as the initial guesses passed to the PEM function. As a result of the calculations, the model parameters were found to be η_{11} = 0.4175, η_{21} = 0.03234, and η_{31} = −0.07026. The initial value of the coefficient η_{12} is 0.3748, and its final value is η_{12} = 0.3758.
The performance of the predictive filter models obtained from the analysis of the training dataset is validated using the testing sample. Figures 10-12 compare the performance of the soft sensors with the proposed filter for error prediction and a traditional method, in which the adaptive bias term is calculated based on the moving window (MW) approach [18]. It can be seen that the filter provides better tracking of the process values, thereby improving the accuracy of the overall soft sensor system and reducing the MSE of the output variables y_1, y_2, and y_3 by 32%, 67%, and 9.5%, respectively.
Figure 12. Estimation of y_{m,3}.
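The comparison can be reproduced qualitatively on synthetic data: for autocorrelated errors, an AR-type predictive filter should beat a moving-window bias update, whose correction is an average of stale errors. This is a hypothetical demonstration with our own names, not the industrial data.

```python
import numpy as np

def mw_bias(e_past, window=10):
    """Traditional bias update: the correction at time t is the mean of the
    last `window` previously observed errors."""
    corr = np.zeros(len(e_past))
    for t in range(1, len(e_past)):
        corr[t] = np.mean(e_past[max(0, t - window):t])
    return corr

def ar1_bias(e_past, eta):
    """Predictive-filter bias update for an AR(1) error model: the
    correction at time t is eta times the previous observed error."""
    corr = np.zeros(len(e_past))
    corr[1:] = eta * e_past[:-1]
    return corr
```

For an AR(1) error sequence the filtered correction tracks the error one step ahead, while the moving-window mean lags behind, so the residual MSE of the filter is smaller.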

Conclusions
This paper proposed a new approach to handling the bias update term in a soft sensor system. Rather than purely using available samples, the new bias update term seeks to predict what the errors will be in the future. Tests of this approach on a reactive distillation column show that the approach can handle the errors well. However, the predictive filters used only work for areas without serious disturbances or outliers.

Therefore, it makes sense to consider more complex models for the predictive filters, including models with an additional component in the form of a flow of events (outliers), for example, a Poisson flow. If a flow of outliers is added to the process model, then the intensity of this flow needs to be estimated. In this case, the number of outliers in the training dataset should be sufficient to estimate the intensity of the outlier flow with acceptable accuracy.