Multi-Sensor Recursive EM Algorithm for Robust Identification of ARX Models

Chen, Xin; Li, Jiale

doi:10.3390/s25227060


by Xin Chen * and Jiale Li

School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(22), 7060; https://doi.org/10.3390/s25227060

Submission received: 8 October 2025 / Revised: 14 November 2025 / Accepted: 17 November 2025 / Published: 19 November 2025

(This article belongs to the Section Industrial Sensors)


Abstract

A robust multi-sensor recursive Expectation-Maximization (RMSREM) algorithm is proposed in this paper for autoregressive eXogenous (ARX) models, addressing the challenges of heavy-tailed noise, as well as the difficulty in simultaneously processing multi-sensor information. First, for the potential outliers in industrial processes, the Student’s t-distribution is introduced to model the statistical characteristics of measurement noise, whose heavy-tailed property enhances the algorithm’s robustness. Second, a recursive framework is integrated into the Expectation-Maximization (EM) algorithm to satisfy the real-time requirement of dynamic system identification. Through a recursive scheme of the Q-function and sufficient statistics, model parameters are updated in real-time, allowing them to adapt to time-varying system characteristics. Finally, by exploiting the redundancy and complementarity of multi-sensor data, a multi-sensor information fusion mechanism is designed that adaptively calculates the weight of each sensor based on the noise variances. This mechanism effectively fuses multi-source observation information and mitigates the impact of single-sensor failure or inaccuracy on identification performance. Numerical examples and simulations of the continuous stirred-tank reactor (CSTR) demonstrate the validity of the proposed RMSREM algorithm.

Keywords:

robust system identification; multi-sensor data fusion; ARX model; recursive EM algorithm; Student’s t-distribution

1. Introduction

System identification is a cornerstone of modern control theory and engineering applications. It aims to build mathematical models that accurately describe the dynamic characteristics of a system from input-output data [1,2]. Among the various model structures, the autoregressive with exogenous input (ARX) model has been widely applied in industrial processes, chemical production, and biological manufacturing because of its concise structure and efficient computation [3,4]. In practical environments, measurement signals are often affected by incidental factors such as intense noise, gross errors, missing data, and sensor failure. These abnormal disturbances are induced by latent variables that are not directly observable. The Expectation-Maximization (EM) algorithm [5] provides a practical statistical framework for solving maximum likelihood estimation problems involving latent variables or incomplete data. Meanwhile, multiple sensors can be deployed to collect large amounts of measurement data and thereby improve identification accuracy [6]. Most existing identification methods rely on the assumption of identically distributed data, which is not suitable for simultaneous multi-sensor processing. New identification methods are therefore needed [7,8].

Titterington et al. [9] pioneered the recursive EM (REM) algorithm via stochastic approximation, recursively updating the parameters using the gradient of the observed-data likelihood and the Fisher information matrix (FIM) of the complete data. To avoid inverting the FIM, Cappé et al. [10] investigated an online EM algorithm based on recursive sufficient statistics, with a focus on exponential family distributions. Later, the online EM algorithm was applied to the parameter estimation of hidden Markov models in [11]. A recursive EM identification method relying on sufficient statistics was developed in [12], where additional iterations were conducted at each time instant. Considering time delays, Guo et al. [13] employed REM and convex optimization for the identification of Markov jump autoregressive systems. A recursive parameter estimation approach for Dirichlet hidden Markov models was developed in [14]. A Student’s t-distribution-based REM was developed in [15] for the robust identification of linear ARX models. The varying-delay issue of an integrated measurement system was considered in [16], where an online EM algorithm was employed for parameter learning. A robust recursive identification method for time-delay systems with skewed measurement noise was developed in [17].

On the other hand, multi-sensor identification mainly involves two processing approaches: data fusion and multi-task learning (MTL) [18,19]. Data fusion technology further improves data availability and redundancy through the extensive deployment of multi-sensor systems. Simultaneously observing the same process variable with multiple sensors can reduce the impact of single-sensor failure or anomalies, improving identification accuracy and reliability [20]. Relevant research is mainly divided into two streams: probabilistic statistical methods and artificial intelligence approaches [21]. Due to its simplicity and efficiency, the weighted fusion algorithm has become an ideal choice for fusing data of varying accuracies. Optimal and unbiased fused data can be obtained by reasonably evaluating the weight coefficients in weighted fusion algorithms [22]. Data fusion methods obtain high-quality measurement data through weighted averaging or optimal fusion approaches [23], whereas MTL methods share information among sensors from the perspective of optimization objectives, improving overall performance [24,25]. However, some deficiencies remain in current approaches. Although these methods allow different noise characteristics across sensors, they mostly rely on the Gaussian noise assumption. As is well known, Gaussian-based methods are sensitive to non-Gaussian perturbations and outliers. Once sensor measurements exhibit heavy-tailed distributions or sudden spike disturbances, their identification performance degrades rapidly.

In industrial engineering practice, outliers often appear for unknown reasons, such as sensor faults and signal transmission disturbances. Improper handling of outliers may degrade model performance. In engineering, the most common outlier processing methods are data trimming and smoothing. Although these approaches are intuitive and easy to understand, the identified models suffer from information loss and estimation bias [26]. To enhance the robustness of system identification in non-Gaussian noise environments, statistical modeling methods based on the t-distribution have attracted researchers’ attention. Compared with the Gaussian distribution, the t-distribution has heavier tails in its probability density function (pdf), which increases tolerance for outliers [27]. Owing to its reliability and analytical properties, the t-distribution has been widely adopted in identification. For example, a robust Bayesian technique for logistic regression modeling was proposed in [28], where a weakly informative Student’s t prior distribution was employed. Yu et al. [29] considered the joint estimation of states and noise covariance for linear systems with unknown covariance of multiplicative noise, where the measurements were modeled as a mixture of a Student’s t-distribution and Gaussian distributions. Several alternatives to the EM algorithm were explored in [30] for ML estimation of the location, scatter matrix, and degrees of freedom (dof) of the Student’s t-distribution. Beyond the maximum likelihood framework, the t-distribution has also been applied in numerous studies [27,31] within the framework of variational Bayesian inference for identification tasks.

However, robust multi-sensor fusion has not yet received sufficient attention in online identification. Based on the above background, this paper introduces the concept of multi-sensor fusion into the online EM algorithm and proposes a robust multi-sensor recursive EM (RMSREM) algorithm. On the one hand, the proposed algorithm can effectively fuse information from multi-source observations. On the other hand, the RMSREM algorithm enhances robustness against outliers by incorporating the t-distribution in noise modeling. Moreover, the recursive EM framework enables real-time updates of the model parameters when new data arrive. With the proposed method, the heterogeneous information from multiple sensors is adequately utilized and the identification accuracy is improved. In addition, the performance of the estimated models remains stable in complex environments involving sensor plug-and-play, noise interference, and outliers. The main contributions of this article are as follows:

The Student’s t-distribution is incorporated in the algorithm to describe the statistical characteristics of measurement noise, whose heavy-tailed property promotes the algorithm’s robustness;
A recursive Q-function is derived, on which a recursive EM framework is built together with a recursion of the sufficient statistics, so that the real-time requirement of dynamic system identification is satisfied;
A multi-sensor information fusion mechanism is designed. Multi-source information is fused via adaptive calculation of the weight of each sensor, which enhances the reliability of the identification algorithm.

The structure of this paper is as follows: the preliminary concepts and background knowledge of the proposed RMSREM are introduced in Section 2, Section 3 provides the mathematical explanation of the robust multi-sensor recursive EM algorithm, Section 4 verifies the adaptability and efficacy of the proposed algorithm through a numerical example and simulations of the CSTR, and Section 5 presents the final research conclusions.

2. Problem Formulation

Consider a linear system observed by M sensors, whose ARX model is defined as

y_{k}^{m} = {(x_{k}^{m})}^{T} θ + e_{k}^{m}, k = 1, 2, \dots, N,

(1)

where k is the time index and

m = 1, 2, \dots, M

indicates the sensor number. The noisy observation obtained from the m-th sensor at instant k is denoted by

y_{k}^{m}

. The corresponding regressive vector is

x_{k}^{m} = {[y_{k - 1}^{m}, \dots, y_{k - n_{a}}^{m}, u_{k - 1}, \dots, u_{k - n_{b}}]}^{T} \in R^{n}

, which incorporates past outputs of the same sensor together with previous inputs. The system input at time k is represented by

u_{k} \in R^{1}

. The unknown parameter vector to be estimated is

θ = {[a_{1}, \dots, a_{n_{a}}, b_{1}, \dots, b_{n_{b}}]}^{T} \in R^{n}

, where

n_{a}

and

n_{b}

denote the output and input polynomial orders. The measurement noise from the m-th sensor is indicated by

e_{k}^{m}

, which is assumed to follow a Student’s t-distribution, i.e.,

e_{k}^{m} \sim t (μ_{m}, σ_{m}^{2}, ν_{m}) .
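
As a small illustration of the notation in (1), the following Python sketch assembles the regressive vector from past outputs of one sensor and past inputs, and evaluates the noise-free model output. It is only a sketch under stated assumptions: the function names build_regressor and predict, the sample data, and the 0-based time indexing are hypothetical and not taken from the paper.

```python
def build_regressor(y, u, k, na, nb):
    """Return x_k = [y[k-1], ..., y[k-na], u[k-1], ..., u[k-nb]].

    y and u are sequences indexed by time (0-based here); k must be at
    least max(na, nb) so that every lagged sample exists.
    """
    if k < max(na, nb):
        raise ValueError("not enough past samples for the requested lags")
    past_outputs = [y[k - i] for i in range(1, na + 1)]
    past_inputs = [u[k - j] for j in range(1, nb + 1)]
    return past_outputs + past_inputs


def predict(x, theta):
    """Noise-free model output x^T theta."""
    return sum(xi * ti for xi, ti in zip(x, theta))


# Hypothetical data: outputs of one sensor and the shared input signal.
y = [0.0, 1.0, 2.0, 3.0]
u = [10.0, 20.0, 30.0, 40.0]
x3 = build_regressor(y, u, k=3, na=2, nb=1)   # [y[2], y[1], u[2]]
y3_model = predict(x3, [0.5, 0.25, 0.1])      # theta = [a1, a2, b1]
```

Each sensor builds its own regressor from its own past outputs, while the input lags are shared by all sensors.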

Assumption 1.

The measurement noise samples

{e_{k}^{m}}

from the same sensor are independent and identically distributed.

Assumption 2.

The measurement sets

{y_{k}^{1}, \dots, y_{k}^{M}}

are independent but not identically distributed.

Assumption 3.

For the sake of computational simplicity, the distinct distributions are defined as Student’s t-distributions with different variances and different dofs.

The pdf of Student’s t-distribution is as follows:

t (e_{k}^{m} | μ_{m}, σ_{m}^{2}, ν_{m}) = \frac{Γ (\frac{ν_{m} + d}{2}) {|σ_{m}^{2}|}^{- \frac{1}{2}}}{{(π ν_{m})}^{\frac{d}{2}} Γ (\frac{ν_{m}}{2}) {[1 + δ (e_{k}^{m} | μ_{m}, σ_{m}^{2}) / ν_{m}]}^{\frac{ν_{m} + d}{2}}},

(2)

where

μ_{m} \in R^{1}

represents the mean of the measurement noise (taken as

μ_{m} = 0

in this work),

σ_{m}^{2} \in R^{1}

is its variance, and

ν_{m} \in R^{1}

specifies the dof. The measurement dimension is denoted by

d \in N^{1}

, with

d = 1

in this case.

Γ (t)

is the gamma function. Furthermore, the squared Mahalanobis distance between the noise term

e_{k}^{m}

and the mean value

μ_{m}

, given the variance

σ_{m}^{2}

, is defined as

δ (e_{k}^{m} | μ_{m}, σ_{m}^{2}) = {(e_{k}^{m} - μ_{m})}^{2} / σ_{m}^{2}

.

In system identification and parameter estimation, it is conventional to assume that the noise follows a Gaussian distribution, primarily because of its analytical tractability and its alignment with engineering practice. Nevertheless, Gaussian-based models are vulnerable to outliers and abnormal disturbances in complex environments. To alleviate such sensitivity, the so-called contaminated Gaussian model introduces a mixture of two Gaussian components with different variances, where the one with the larger variance is set to accommodate outliers. From a similar perspective, the Student’s t-distribution can be regarded as a limiting case of Gaussian mixtures that share an identical mean but whose variances vary continuously from 0 to ∞, governed by a variance-scaling mechanism [32]. By incorporating a latent weighting variable

r_{m, k} \in R^{1}

, which adjusts the influence of irregular deviations in the measurements, the probability of the measurement noise

e_{k}^{m}

under the t-distribution can be reformulated as

p (e_{k}^{m} | 0, σ_{m}^{2}, ν_{m}) = \int p (e_{k}^{m} | 0, σ_{m}^{2}, r_{m, k}) p (r_{m, k} | ν_{m}) d r_{m, k},

(3)

where the measurement noise

e_{k}^{m}

, when re-scaled by the latent weight

r_{m, k}

, is subject to a conditional Gaussian distribution, namely

e_{k}^{m} | 0, σ_{m}^{2}, r_{m, k} \sim N (0, σ_{m}^{2} / r_{m, k}) .

In parallel, the scaling variable

r_{m, k}

is assumed to follow a Gamma distribution parameterized by the degrees of freedom

ν_{m}

, that is

r_{m, k} | ν_{m} \sim G (ν_{m} / 2, ν_{m} / 2) .

An important property is that as

ν_{m} \to \infty

, the distribution of

r_{m, k}

collapses to the constant value 1, implying that the t-distribution gradually degrades to the Gaussian distribution.
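
The scale-mixture representation in (3) can be checked numerically. The sketch below, which is not part of the original derivation, draws the latent weight from the Gamma distribution and then the noise from the conditional Gaussian, using only the Python standard library. The function name sample_t_noise and the chosen values of the scale and dof are illustrative assumptions. Under this parameterization the quantity written as the variance acts as a scale parameter, so the sample variance is compared with the known variance of a t-distribution, which equals the scale times ν/(ν − 2) for ν > 2.

```python
import random


def sample_t_noise(n, sigma2, nu, seed=0):
    """Draw n noise samples through the Gaussian scale-mixture form of (3)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # r ~ Gamma(shape = nu/2, rate = nu/2); gammavariate expects a scale,
        # so the rate nu/2 becomes the scale 2/nu.
        r = rng.gammavariate(nu / 2.0, 2.0 / nu)
        # e | r ~ N(0, sigma2 / r); gauss expects a standard deviation.
        samples.append(rng.gauss(0.0, (sigma2 / r) ** 0.5))
    return samples


sigma2, nu = 1.0, 10.0                      # illustrative values
noise = sample_t_noise(200_000, sigma2, nu)
mean = sum(noise) / len(noise)
empirical_var = sum((e - mean) ** 2 for e in noise) / len(noise)
theoretical_var = sigma2 * nu / (nu - 2.0)  # variance of the t-distribution
```

Small values of the latent weight inflate the conditional variance, which is how the mixture produces occasional large deviations.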

To fully utilize the information from all sensors, a set

λ_{k} = {λ_{1, k}, λ_{2, k}, \dots, λ_{M, k}}

is defined for the weight assigned to each sensor at time k. Since samples are collected sequentially over time, the measurement from the m-th sensor at time k is weighted, yielding a corrected value

{\bar{y}}_{k}^{m} = λ_{m, k} y_{k}^{m}

. This corrected measurement is assumed to be Gaussian distributed as

N (λ_{m, k} {(x_{k}^{m})}^{T} θ, λ_{m, k}^{2} σ_{m}^{2} / r_{m, k})

. The set

{\bar{Y}}_{k} = {{\bar{y}}_{k}^{1}, {\bar{y}}_{k}^{2}, \dots, {\bar{y}}_{k}^{M}}

represents the weighted output values from the M sensors at time k.

Based on the above definitions, when measurement data arrive sequentially, the optimization objective is formulated in the maximum likelihood sense as

{\hat{θ}}_{k} = arg max_{θ} f ({\bar{Y}}_{k} | θ, λ_{k}), s . t . \sum_{m = 1}^{M} λ_{m, k} = 1,

(4)

where

f ({\bar{Y}}_{k} | θ, λ_{k})

is the log-likelihood function (for details, see [23]). Solving (4) yields

λ_{m, k}

and the parameter

{\hat{θ}}_{k}

. However, due to the unknown variances and dof introduced by Student’s t-distribution,

λ_{m, k}

and

{\hat{θ}}_{k}

cannot be directly obtained, which will be addressed with a recursive EM scheme in the following sections. Figure 1 illustrates the general framework of the RMSREM algorithm.

3. Parameter Estimation via Robust Multi-Sensor Recursive EM Algorithm

The EM framework is widely used to estimate systems containing unobserved variables based on the maximum likelihood principle. EM comprises two core steps, the Expectation step (E-step) and the Maximization step (M-step), which are executed iteratively. In the E-step, the expectation of the complete-data log-likelihood function with respect to the missing data is calculated; this expected value is known as the Q-function. The Q-function is given by

Q (Θ, Θ^{'}) = E_{C_{mis} | C_{obs}, Θ^{'}} \{log p (C_{obs}, C_{mis} | Θ)\},

(5)

where

C_{obs}

represents the observed variable set,

C_{mis}

denotes the unobserved (i.e., missing) variable set,

Θ

and

Θ^{'}

respectively stand for the parameter set to be estimated in the current iteration and obtained from the previous iteration. Then, the parameter set

Θ

is updated by maximizing (5) in the M-step, which can be expressed as

Θ = arg max_{Θ} Q (Θ, Θ^{'}) .

(6)

Typically, the batch EM algorithm (BEM) is an iterative method. By iteratively performing the above two steps, the algorithm will gradually converge to a local maximum of the Q-function. For the recursive EM algorithm, the key is to convert the iterative calculation process into a recursive form. The specific implementation details of the Q-function recursion will be elaborated in the following sections.

3.1. Derivation of the Recursive Q-Function

In the batch EM, all historical data within a time period would be included in parameter estimation. When new samples arrive, the BEM requires a complete re-update, resulting in a slow response to dynamic changes in the process. This drawback limits the application of the BEM. Alternatively, a robust recursive EM is designed in this paper, where the Q-function is calculated in a recursive manner.

For the robust multi-sensor identification problem, the observed variable set

C_{obs}

includes input U and output

\bar{Y}

, i.e.,

C_{obs} = {\bar{Y}, U}

, where

U = {u_{1}, \dots, u_{N}}

and

\bar{Y} = {{\bar{Y}}_{1}, \dots, {\bar{Y}}_{N}}

. The unobserved variable set

C_{mis}

consists of variance scaling factors R induced by Student’s t-distribution, i.e.,

C_{mis} = {R}

, where

R = {R_{1}, \dots, R_{N}}

denotes the set of variance scaling factors for all sensors over the entire sampling period, and

R_{N} = {r_{1, N}, \dots, r_{M, N}}

. Additionally, the parameter set to be estimated for this issue is

Θ = {θ, σ_{m}^{2}, ν_{m}}

. The log-likelihood of the complete dataset can be decomposed using the chain rule of probability as follows:

\begin{matrix} log p (C_{obs}, C_{mis} | Θ) & = log [p (\bar{Y} | R, U, Θ) p (R | U, Θ) p (U | Θ)] \\ & = \sum_{k = 1}^{N} \{\sum_{m = 1}^{M} [log p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ) + log p (r_{m, k} | ν_{m})]\} + log K, \end{matrix}

(7)

where

{\bar{y}}_{k}^{m}

denotes one realization of the measurement random variable

\bar{Y}

. Given the ARX model structure with parameter set

Θ

, the output

{\bar{y}}_{k}^{m}

is determined jointly by the regressive vector

x_{k}^{m}

and the variance scaling factor

r_{m, k}

. In contrast, the factor

r_{m, k}

is only governed by the degrees of freedom

ν_{m}

. Since the system input U is an artificially generated excitation signal that does not depend on

Θ

, the term

K = p (U | Θ)

is treated as a constant. Under these considerations, the Q-function associated with the BEM is expressed as

\begin{matrix} Q (Θ | Θ^{'}) & = E_{R | \bar{Y}, U, Θ^{'}} \{\sum_{k = 1}^{N - 1} \{\sum_{m = 1}^{M} [log p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ) + log p (r_{m, k} | ν_{m})]\}\} \\ + E_{R | \bar{Y}, U, Θ^{'}} \{\sum_{m = 1}^{M} [log p ({\bar{y}}_{N}^{m} | x_{N}^{m}, r_{m, N}, Θ) + log p (r_{m, N} | ν_{m})]\} . \end{matrix}

(8)

Next, (8) is rewritten into the following summation form:

Q (Θ | Θ^{'}) = \sum_{k = 1}^{N - 1} q_{k} (Θ | Θ^{'}) + q_{N} (Θ | Θ^{'}) .

(9)

In BEM,

q_{k} (Θ | Θ^{'})

is the expected complete-data log-likelihood of the k-th sample, with

Θ^{'}

representing the current parameter estimate. Mathematically, this expectation is detailed as

\begin{matrix} q_{k} (Θ | Θ^{'}) & = \sum_{m = 1}^{M} \int_{r_{m, k}} p (r_{m, k} | C_{obs}, Θ^{'}) log p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ) d r_{m, k} \\ + \sum_{m = 1}^{M} \int_{r_{m, k}} p (r_{m, k} | C_{obs}, Θ^{'}) log p (r_{m, k} | ν_{m}) d r_{m, k} . \end{matrix}

(10)

Correspondingly,

q_{N} (Θ | Θ^{'})

is the expectation for the N-th data instance, which has the same expression as above.

In the context of online identification, the Q-function is not computed in batch for the entire dataset. Instead, it is incrementally updated by incorporating the most recent data points. This leads to a quasi-recursive formulation of the Q-function, expressed as

Q_{k} (Θ | Θ_{k - 1}) = \sum_{i = 1}^{k - 1} {\tilde{q}}_{i} (Θ | Θ_{i - 1}) + {\tilde{q}}_{k} (Θ | Θ_{k - 1}),

(11)

where

Θ

is the parameter set generated from successive recursive updates. The quantity

{\tilde{q}}_{i} (Θ | Θ_{i - 1})

is defined as the posterior expectation of the complete log-likelihood for the i-th data point, conditioned on the parameter set

Θ_{i - 1}

estimated at time

i - 1

. Correspondingly,

{\tilde{q}}_{k} (Θ | Θ_{k - 1})

has the same definition at time k. This quantity can be written explicitly as

\begin{matrix} {\tilde{q}}_{i} (Θ | Θ_{i - 1}) & = \sum_{m = 1}^{M} \int_{r_{m, i}} p (r_{m, i} | C_{obs}, Θ_{i - 1}) log p ({\bar{y}}_{i}^{m} | x_{i}^{m}, r_{m, i}, Θ) d r_{m, i} \\ + \sum_{m = 1}^{M} \int_{r_{m, i}} p (r_{m, i} | C_{obs}, Θ_{i - 1}) log p (r_{m, i} | ν_{m}) d r_{m, i} . \end{matrix}

(12)

The recursive Q-function at time step k can be formulated as follows, on the foundation of the quasi-recursive Q-function:

{\tilde{Q}}_{k} (Θ | Θ_{k - 1}) = \frac{1}{k} Q_{k} (Θ | Θ_{k - 1}) .

(13)

Substituting (11) into (13), the recursive Q-function can then be transformed into:

{\tilde{Q}}_{k} (Θ | Θ_{k - 1}) = (1 - \frac{1}{k}) {\tilde{Q}}_{k - 1} (Θ | Θ_{k - 2}) + \frac{1}{k} {\tilde{q}}_{k} (Θ | Θ_{k - 1}) .

(14)

In this paper, the standard step size

1 / k

can be replaced by a synthetic step size

γ_{k} \in R^{1}

. As demonstrated in [10], convergence is guaranteed under the conditions that the step sizes satisfy

\sum_{k = 1}^{\infty} γ_{k} = \infty

and

\sum_{k = 1}^{\infty} γ_{k}^{2} < \infty

. Consequently, the final recursive form of the Q-function is given by

{\tilde{Q}}_{k} (Θ | Θ_{k - 1}) = (1 - γ_{k}) {\tilde{Q}}_{k - 1} (Θ | Θ_{k - 2}) + γ_{k} {\tilde{q}}_{k} (Θ | Θ_{k - 1}) .

(15)

At the initial point of the algorithm, where

k = 1

, the recursive Q-function is

{\tilde{Q}}_{1} (Θ | Θ_{0}) = {\tilde{q}}_{1} (Θ | Θ_{0})

, where

Θ_{0}

is an artificially assigned initial parameter set. When recursively derived from

k = 1

to the current step k, the evolution of

{\tilde{Q}}_{k}

is governed by the following equation:

{\tilde{Q}}_{k} (Θ | Θ_{k - 1}) = [\prod_{t = 2}^{k} (1 - γ_{t})] E_{1} (Θ) + γ_{k} E_{k} (Θ) + \sum_{i = 2}^{k - 1} [\prod_{t = i + 1}^{k} (1 - γ_{t})] γ_{i} E_{i} (Θ),

(16)

where

\begin{matrix} E_{1} (Θ) = {\tilde{q}}_{1} (Θ | Θ_{0}), \\ E_{i} (Θ) = {\tilde{q}}_{i} (Θ | Θ_{i - 1}), \\ E_{k} (Θ) = {\tilde{q}}_{k} (Θ | Θ_{k - 1}) . \end{matrix}

Now, the recursive Q-function of the RMSREM algorithm has been formulated.
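
The equivalence between the recursion (15) and its unrolled form (16) can be illustrated with scalar stand-ins for the terms E_i. The following Python sketch is an illustration only: in the algorithm the E_i are functions of the parameter set, whereas here they are plain numbers, and the names recursive_average and unrolled_weights are hypothetical. With the standard step size 1/k, both forms reduce to the sample mean, as in (13).

```python
def recursive_average(values, gammas):
    """Apply Q_k = (1 - gamma_k) * Q_{k-1} + gamma_k * q_k, with Q_1 = q_1."""
    q_tilde = values[0]
    for q_k, gamma_k in zip(values[1:], gammas[1:]):
        q_tilde = (1.0 - gamma_k) * q_tilde + gamma_k * q_k
    return q_tilde


def unrolled_weights(gammas):
    """Weight that the unrolled form (16) places on each of E_1, ..., E_k."""
    k = len(gammas)
    weights = []
    for i in range(1, k + 1):
        w = 1.0 if i == 1 else gammas[i - 1]     # E_1 carries no gamma factor
        for t in range(i + 1, k + 1):
            w *= 1.0 - gammas[t - 1]             # product of (1 - gamma_t)
        weights.append(w)
    return weights


values = [2.0, 4.0, 9.0, 1.0]                # scalar stand-ins for q_1, ..., q_4
gammas = [1.0 / k for k in range(1, 5)]      # gamma_k = 1/k; gamma_1 is unused
recursive = recursive_average(values, gammas)
weights = unrolled_weights(gammas)
unrolled = sum(w * v for w, v in zip(weights, values))
```

The weights always sum to one, so the recursive Q-function is a convex combination of the per-sample terms; a larger step size shifts that combination toward recent samples.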

3.2. Posterior Expectation of Latent Variables

Next, the posterior expectation of the latent variables in (16) is considered. As previously described, the measurements, when conditioned on the variance-scaling factors, are Gaussian distributed. After expansion, the logarithmic pdf of the measurement at time k can be written as

\begin{matrix} log p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ) & = - \frac{1}{2} log (2 π λ_{m, k}^{2}) - \frac{1}{2} log σ_{m, k}^{2} + \frac{1}{2} log r_{m, k} \\ & - \frac{r_{m, k}}{2 λ_{m, k}^{2} σ_{m, k}^{2}} {({\bar{y}}_{k}^{m} - λ_{m, k} {(x_{k}^{m})}^{T} θ_{k})}^{2} . \end{matrix}

(17)

The variance weighting factor

r_{m, k}

is assumed to follow a Gamma distribution. The logarithm of its likelihood function can be expressed as

log p (r_{m, k} | ν_{m, k}) = - log Γ (\frac{ν_{m, k}}{2}) + \frac{ν_{m, k}}{2} log (\frac{ν_{m, k}}{2}) + \frac{ν_{m, k}}{2} (log r_{m, k} - r_{m, k}) - log r_{m, k} .

(18)

Exploiting the properties of conjugate priors, the posterior distribution of the variance weighting factor

r_{m, k}

is also a Gamma distribution. Its explicit form can be written as

r_{m, k} | \bar{Y}, U, Θ_{k - 1} \sim G (\frac{ν_{m, k - 1} + 1}{2}, \frac{ν_{m, k - 1} + δ ({\bar{y}}_{k}^{m} | λ_{m, k - 1} {(x_{k}^{m})}^{T} θ_{k - 1}, λ_{m, k - 1}^{2} σ_{m, k - 1}^{2})}{2}) .

(19)

The derivation of posterior distribution of

r_{m, k}

is provided in Appendix A. Since this posterior is Gamma distributed, the posterior expectation of the variance scaling factor

r_{m, k}

can be computed and is denoted by

r_{m, k | y}^{old} \in R^{1}

as

E_{r_{m, k} | \bar{Y}, U, Θ_{k - 1}} \{r_{m, k}\} ≜ r_{m, k | y}^{old} = \frac{ν_{m, k - 1} + 1}{ν_{m, k - 1} + δ ({\bar{y}}_{k}^{m} | λ_{m, k - 1} {(x_{k}^{m})}^{T} θ_{k - 1}, λ_{m, k - 1}^{2} σ_{m, k - 1}^{2})},

(20)

where the subscript

m, k | y

indicates that

r_{m, k | y}^{old}

is computed conditioned on the measurement

y_{k}^{m}

at time k and the parameters estimated at time

k - 1

. The posterior expectation of the logarithm of the variance scaling factor, namely

log r_{m, k}

, can be expressed as

\begin{matrix} E_{r_{m, k} | \bar{Y}, U, Θ_{k - 1}} \{log r_{m, k}\} & = ψ (\frac{ν_{m, k - 1} + 1}{2}) - log (\frac{ν_{m, k - 1} + δ ({\bar{y}}_{k}^{m} | λ_{m, k - 1} {(x_{k}^{m})}^{T} θ_{k - 1}, λ_{m, k - 1}^{2} σ_{m, k - 1}^{2})}{2}) \\ & = log r_{m, k | y}^{old} + [ψ (\frac{ν_{m, k - 1} + 1}{2}) - log (\frac{ν_{m, k - 1} + 1}{2})], \end{matrix}

(21)

where

ν_{m, k - 1}

represents the dof previously estimated for the m-th sensor at time

k - 1

. The function

ψ (\cdot)

represents the digamma function, defined as

ψ (\cdot) = Γ^{'} (\cdot) / Γ (\cdot)

.

For brevity, more detailed information of the posterior expectation in the robust multi-sensor estimation is omitted. Up to this point, the derivation of the expectation step for RMSREM has been completed. The subsequent maximization step of RMSREM will be discussed in the next subsection.
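
To make the role of (20) concrete, the following Python sketch evaluates the posterior mean of the variance scaling factor for a well-fitted sample and for an outlier. It is a minimal illustration under assumed values: the function name posterior_scale_weight, the previous estimate theta_prev, and the numerical settings are hypothetical. The sensor weight is assumed nonzero; it cancels in the squared Mahalanobis distance because both the residual and the standard deviation are scaled by it.

```python
def posterior_scale_weight(y, x, theta, sigma2, nu, lam=1.0):
    """Posterior mean of r_{m,k} from (20) for one sensor at one time step.

    The weighted measurement lam * y is compared with lam * x^T theta under
    the scaled variance lam^2 * sigma2, so a nonzero lam cancels out.
    """
    prediction = sum(xi * ti for xi, ti in zip(x, theta))
    residual = lam * y - lam * prediction
    delta = residual ** 2 / (lam ** 2 * sigma2)   # squared Mahalanobis distance
    return (nu + 1.0) / (nu + delta)


theta_prev = [0.5, 0.2]     # hypothetical estimate from the previous step
x_k = [1.0, 2.0]            # regressor; the model prediction is 0.9
r_clean = posterior_scale_weight(y=1.0, x=x_k, theta=theta_prev,
                                 sigma2=0.04, nu=4.0)
r_outlier = posterior_scale_weight(y=5.0, x=x_k, theta=theta_prev,
                                   sigma2=0.04, nu=4.0)
```

A sample close to the prediction receives a weight slightly above one, whereas the outlier receives a weight close to zero, which is the mechanism by which the t-distribution limits the influence of outliers on the subsequent updates.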

3.3. Derivation of the Recursive Maximization Step

To recursively update the parameter

θ

, it is necessary to take the derivative of the recursive Q-function with respect to (w.r.t.)

θ

and set the resulting term to zero as

\frac{\partial}{\partial θ} {\tilde{Q}}_{k} (Θ | Θ_{k - 1}) = 0 .

(22)

The online solution for

θ

is given as

{\hat{θ}}_{k} = {[{(θ_{k})}^{den}]}^{- 1} [{(θ_{k})}^{num}],

(23)

where the two terms that determine the update of the parameter vector, namely the sufficient statistics, are both computed recursively. The recursion for the denominator is

{(θ_{k})}^{den} = (1 - γ_{k}) {(θ_{k - 1})}^{den} + γ_{k} \sum_{m = 1}^{M} λ_{m, k}^{2} \frac{r_{m, k | y}^{old}}{σ_{m, k - 1}^{2}} x_{k}^{m} {(x_{k}^{m})}^{T},

(24)

while the numerator is updated as

{(θ_{k})}^{num} = (1 - γ_{k}) {(θ_{k - 1})}^{num} + γ_{k} \sum_{m = 1}^{M} λ_{m, k}^{2} \frac{r_{m, k | y}^{old}}{σ_{m, k - 1}^{2}} x_{k}^{m} y_{k}^{m} .

(25)

To estimate the measurement noise variance, the recursive Q-function is differentiated w.r.t. the standard deviation

σ_{m, k}

, and then the resulting term is set to zero, leading to

\frac{\partial}{\partial σ_{m, k}} {\tilde{Q}}_{k} (Θ, Θ_{k - 1}) = 0 .

(26)

Employing a similar approach, the recursive fractional update equation for the variance

σ_{m, k}^{2}

is obtained as

{\hat{σ}}_{m, k}^{2} = {[{(σ_{m, k}^{2})}^{den}]}^{- 1} [{(σ_{m, k}^{2})}^{num}],

(27)

and the denominator is

{(σ_{m, k}^{2})}^{den} = 1,

(28)

while the numerator is updated as

{(σ_{m, k}^{2})}^{num} = (1 - γ_{k}) {(σ_{m, k - 1}^{2})}^{num} + γ_{k} r_{m, k | y}^{old} {(y_{k}^{m} - {(x_{k}^{m})}^{T} θ_{k})}^{2} .

(29)
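
The recursions (23)-(25) and (27)-(29) can be sketched in a few lines of Python. The sketch below assumes NumPy is available and makes several simplifications that are not part of the paper: the regressors are random vectors rather than lagged outputs, the E-step output and the sensor weights are replaced by constant placeholders, the denominator is initialized with a small ridge term to keep it invertible, and the step size is chosen as 1/(k + 1). The function name m_step_update and the dictionary layout are hypothetical.

```python
import numpy as np


def m_step_update(stats, gamma, xs, ys, r_old, lam, sigma2_prev):
    """One recursive M-step for theta (23)-(25) and sigma^2 (27)-(29).

    stats holds the running statistics 'den' (n x n), 'num' (n,) and
    'var_num' (M,). xs is (M, n); ys, r_old, lam, sigma2_prev are (M,).
    """
    w = lam ** 2 * r_old / sigma2_prev              # per-sensor weight
    den_new = xs.T @ (w[:, None] * xs)              # sum_m w_m x x^T
    num_new = xs.T @ (w * ys)                       # sum_m w_m x y
    stats["den"] = (1.0 - gamma) * stats["den"] + gamma * den_new
    stats["num"] = (1.0 - gamma) * stats["num"] + gamma * num_new
    theta = np.linalg.solve(stats["den"], stats["num"])          # (23)

    residuals = ys - xs @ theta                     # uses the new theta_k
    stats["var_num"] = ((1.0 - gamma) * stats["var_num"]
                        + gamma * r_old * residuals ** 2)        # (29)
    return theta, stats["var_num"]                  # denominator is 1, see (28)


rng = np.random.default_rng(0)
theta_true = np.array([0.6, 0.3])                   # hypothetical true parameters
M, n = 2, 2
stats = {"den": 1e-3 * np.eye(n), "num": np.zeros(n), "var_num": np.ones(M)}
sigma2 = np.ones(M)
for k in range(1, 2001):
    xs = rng.normal(size=(M, n))                    # stand-in regressors
    ys = xs @ theta_true + 0.1 * rng.normal(size=M)
    r_old = np.ones(M)                              # placeholder for (20)
    lam = np.full(M, 1.0 / M)                       # placeholder for (37)
    gamma = 1.0 / (k + 1)                           # assumed step size
    theta_hat, sigma2 = m_step_update(stats, gamma, xs, ys, r_old, lam, sigma2)
```

Solving the linear system in (23) with a linear solver, rather than forming the inverse explicitly, is a numerical design choice of this sketch.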

Similar to the updates of the parameter vector and variance, the update for the dof

ν_{m}

is derived from the recursive Q-function in (16). The partial derivative of

{\tilde{Q}}_{k} (Θ, Θ_{k - 1})

w.r.t.

ν_{m}

is computed and set to zero:

\frac{\partial}{\partial ν_{m}} {\tilde{Q}}_{k} (Θ, Θ_{k - 1}) = 0 .

(30)

Then, a function of

ν_{m}

is obtained as

- ψ (\frac{ν_{m}}{2}) + log (\frac{ν_{m}}{2}) + 1 + {\tilde{s}}_{m, k} = 0,

(31)

with

{\tilde{s}}_{m, k} = (1 - γ_{k}) {\tilde{s}}_{m, k - 1} + γ_{k} \{[ψ (\frac{ν_{m, k - 1} + 1}{2}) - log (\frac{ν_{m, k - 1} + 1}{2})] + (log r_{m, k | y}^{old} - r_{m, k | y}^{old})\} .

(32)

where

{\tilde{s}}_{m, k}

is an artificially defined recursive auxiliary statistic, which facilitates the update of the degrees of freedom. In the BEM framework,

ν_{m}

is obtained directly by solving the associated equation of complete data. Regarding the online update step, the auxiliary statistic

{\tilde{s}}_{m, k}

admits a recursive representation.

Solving (31) yields the estimate of the degrees of freedom at the current time k for the m-th sensor, i.e.,

{\hat{ν}}_{m, k} \in \{ν | - ψ (\frac{ν}{2}) + log (\frac{ν}{2}) + 1 + {\tilde{s}}_{m, k} = 0\}

, where the subscript k is added explicitly for online estimation. In simulation studies, this root can be computed using MATLAB’s fsolve function, whereas in practical applications, standard nonlinear root-finding or optimization methods may be employed.
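
As one possible alternative to a general-purpose solver, the scalar equation (31) can be solved by bisection, since log(x) − ψ(x) is positive and monotonically decreasing for x > 0. The Python sketch below is an illustration and not the authors’ implementation: bisection is substituted for fsolve, the digamma helper is a hand-written approximation (recurrence plus an asymptotic series) used to stay within the standard library, and the bracket limits and the test value of the dof are assumptions.

```python
import math


def digamma(x):
    """Approximate digamma for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x            # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def dof_equation(nu, s_tilde):
    """Left-hand side of (31)."""
    return -digamma(nu / 2.0) + math.log(nu / 2.0) + 1.0 + s_tilde


def solve_dof(s_tilde, lo=1e-3, hi=1e4, iters=200):
    """Bisection on (31); the left-hand side decreases monotonically in nu."""
    if dof_equation(lo, s_tilde) <= 0.0 or dof_equation(hi, s_tilde) >= 0.0:
        raise ValueError("no sign change on the bracket; adjust lo or hi")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dof_equation(mid, s_tilde) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Consistency check: construct the statistic for which nu = 5 is the root.
nu_true = 5.0
s_tilde = -(1.0 + math.log(nu_true / 2.0) - digamma(nu_true / 2.0))
nu_hat = solve_dof(s_tilde)
```

A root exists only when the auxiliary statistic is below −1; the sketch raises an error otherwise, and an implementation may prefer to clip the estimate to the bracket instead.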

3.4. Solution for Weights $λ_{m, k}$

As mentioned previously, the weights of sensors depend on the variance of measurement noise. After performing a recursive estimation of the noise variance based on hyperparameters, the multi-task likelihood function relating to

λ_{m, k}

can be expressed in the following form:

f ({\bar{Y}}_{k} | θ, λ) \propto - \sum_{m = 1}^{M} (\frac{1}{2} log \frac{λ_{m, k}^{2} σ_{m, k - 1}^{2}}{r_{m, k}^{old}}) - \sum_{m = 1}^{M} \frac{r_{m, k}^{old} {({\bar{y}}_{k}^{m} - λ_{m, k} {(x_{k}^{m})}^{T} θ_{k - 1})}^{2}}{2 λ_{m, k}^{2} σ_{m, k - 1}^{2}} .

(33)

An intermediate solution for

(λ_{m, k}^{2} σ_{m, k - 1}^{2})

that maximizes (33) is obtained by setting the corresponding partial derivative to zero. Substituting this result back into (33), the term

{({\bar{y}}_{k}^{m} - λ_{m, k} {(x_{k}^{m})}^{T} θ_{k - 1})}^{2}

is eliminated. Solving for the sensor weights then reduces to the following constrained optimization problem:

λ_{m, k} = arg min_{λ_{m, k}} \sum_{m = 1}^{M} (\frac{λ_{m, k}^{2} σ_{m, k - 1}^{2}}{r_{m, k}}), s . t . \sum_{m = 1}^{M} λ_{m, k} = 1 .

(34)

Introducing the Lagrange multiplier

ζ

, a Lagrange function is formulated as follows:

g (λ_{m, k}, ζ) = \sum_{m = 1}^{M} (\frac{λ_{m, k}^{2} σ_{m, k - 1}^{2}}{r_{m, k}}) + ζ (\sum_{m = 1}^{M} λ_{m, k} - 1),

(35)

then, the weight of the sensor

λ_{m, k}

is finally determined as

λ_{m, k} = \frac{r_{m, k}}{σ_{m, k - 1}^{2}} {(\sum_{j = 1}^{M} \frac{r_{j, k}}{σ_{j, k - 1}^{2}})}^{- 1} .

(36)

The solution of the sensor weight vector can be expressed into a matrix form as

λ_{k} = A_{M, k}^{- 1} 1_{M} {(1_{M}^{T} A_{M, k}^{- 1} 1_{M})}^{- 1},

(37)

A_{M, k} = [\begin{matrix} \frac{σ_{1, k - 1}^{2}}{r_{1, k}} & 0 & \dots & 0 \\ 0 & \frac{σ_{2, k - 1}^{2}}{r_{2, k}} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & \frac{σ_{M, k - 1}^{2}}{r_{M, k}} \end{matrix}],

(38)

where

1_{M} = {[1, 1, \dots, 1]}^{T} \in R^{M \times 1}

is an M-dimensional vector of ones.
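
Because the matrix in (38) is diagonal, the matrix form (37) reduces to normalizing the ratios of the scaling factor to the variance so that the weights sum to one. The Python sketch below illustrates this with hypothetical values; the function name sensor_weights and the numbers are assumptions made for illustration. It also checks the stationarity condition of the Lagrange function (35), under which the product of each weight with its ratio of variance to scaling factor is the same for every sensor.

```python
def sensor_weights(sigma2, r):
    """Minimizer of (34): lambda_m proportional to r_m / sigma2_m."""
    precisions = [r_m / s_m for r_m, s_m in zip(r, sigma2)]
    total = sum(precisions)                 # equals 1^T A^{-1} 1 in (37)
    return [p / total for p in precisions]


sigma2_prev = [0.01, 0.04, 0.25]   # hypothetical previous variance estimates
r_now = [1.0, 1.0, 0.1]            # third sensor currently looks like an outlier
lam = sensor_weights(sigma2_prev, r_now)

# At the optimum, lambda_m * sigma2_m / r_m is identical for all sensors
# (it equals -zeta / 2 from the Lagrange function).
stationarity = [l * s / r_m for l, s, r_m in zip(lam, sigma2_prev, r_now)]
```

In this example the most accurate sensor receives about 80% of the total weight, while the sensor with both a large variance and a small scaling factor is almost ignored.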

Hence, the derivation of the robust multi-sensor recursive EM (RMSREM) algorithm for handling multi-sensor linear ARX models has been completed. The operating steps are listed in Algorithm 1.

Algorithm 1. Robust multi-sensor recursive EM algorithm.

Require: Observations of sensors $y_{1 : k}^{m}$, regressive vectors $x_{1 : k}^{m}$;
Ensure: Updated ${\hat{θ}}_{k}$, $σ_{m, k}^{2}$, $ν_{m, k}$ for the next k
1: Initialization:
2:     Assign step size $γ_{1 : k} = 0.01$;
3:     Assign random values in $(0, 1)$ to the elements of the parameter vector $θ_{0}$;
4:     Assign large initial values $σ_{m, 0}^{2} = 1$;
5:     Assign large initial values $ν_{m, 0} = 100$;
6: while $k - 1 \to k$ do
7:     Recursive E-step (per sensor)
         Calculate the posterior expectations $E_{r_{m, k} | \bar{Y}, U, Θ_{k - 1}} \{r_{m, k}\}$ via (20);
         Calculate the posterior expectations $E_{r_{m, k} | \bar{Y}, U, Θ_{k - 1}} \{log r_{m, k}\}$ via (21);
8:     Weight update
         Update the sensor weights $λ_{m, k}$ via (37);
9:     Recursive M-step 1
         Update the model parameter $θ_{k}$ via (23);
10:    Recursive M-step 2 (per sensor)
         Update the sensor noise variance $σ_{m, k}^{2}$ via (27);
         Update the dof of Student’s t-distribution $ν_{m, k}$ via (31);
11: end while

3.5. Analysis of Convergence and Computational Complexity

A. Convergence analysis of the RMSREM algorithm.

The Student’s t-distribution belongs to the curved exponential family. When only one sensor is considered, the complete-data likelihood can be decomposed as

f (Y, Θ) = h (Y, R) exp \{- ψ (θ) + 〈s (Y, R), ϕ (θ)〉\},

(39)

where

s (Y, R)

denotes the sufficient statistics,

ϕ (θ)

denotes the natural parameters, and

〈\cdot, \cdot〉

represents the inner product. At the

k th

time instant, the sufficient statistics can be defined as

s_{k} (Y, R) = {[log r_{k} - r_{k}, r_{k} y_{k}^{2}, r_{k} x_{k} y_{k}, x_{k} x_{k}^{T}]}^{T}

, while the natural parameter can be defined as

ϕ_{k} (θ) = {[ν_{k} / 2, - 1 / (2 σ_{k}^{2}), - θ_{k} / σ_{k}^{2}, - θ_{k}^{2} / 2 σ_{k}^{2}]}^{T}

. When the Kullback-Leibler divergence

K (g_{θ^{*}} | | g_{θ}) \overset{Δ}{=} E_{π} [log \{\frac{g (Y, θ^{*})}{g (Y, θ)}\}]

is selected as the Lyapunov function, the convergence of the online algorithm can be proved using the Lyapunov stability theorem or Theorem 1 of [10], where

g (Y, θ^{*})

represents the actual probability density function of the observation Y, and

g (Y, θ)

is the observed likelihood function based on the estimated

θ

. The input variable U is omitted here for clarity.

The multi-sensor case is expected to share the same convergence property as the single-sensor case, given the sensor weights

λ_{m, k}

. A rigorous mathematical proof is left for future work.

B. Computational complexity analysis of the RMSREM algorithm.

Considering the online updating nature of the RMSREM algorithm, the computational complexity of the one-step operation is discussed.

In the proposed algorithm, the step with the heaviest computational burden is the inversion of the denominator term of the parameter vector update

{(θ_{k})}^{den} \in R^{n \times n}

, which has a computational complexity of

O (n^{3})

. The computational complexity of the recursive updating of

{(θ_{k})}^{den} \in R^{n \times n}

is

O (m \cdot n^{2})

, where m is the number of sensors involved in the update. As the sensor weight matrix is diagonal, the computational complexity of its inversion is

O (m)

, which is dominated by the steps with higher computational complexity. Moreover, the function

f (ν_{m}) = - ψ (ν_{m} / 2) + log (ν_{m} / 2) + 1

in (31) is monotonic in

ν_{m}

. Therefore, the computational complexity of solving for the dof numerically is

O (m \cdot i)

, where i is the number of iterations, which cannot be determined before the numerical optimization converges.

Therefore, the overall computational complexity of one step of the RMSREM algorithm when a new sample arrives is

O (n^{3} + m \cdot (n^{2} + i))

.
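The role of the iteration count i can be illustrated with a minimal Python sketch of a one-dimensional root search. It assumes that the dof update in (31) can be written as f(ν) + c = 0, where c is a data-dependent constant smaller than −1; this form, the bisection solver, and the names `digamma`, `f`, and `solve_dof` are assumptions for illustration and not necessarily the solver used by the authors. Because f is monotonically decreasing in ν, a bracketing search is sufficient.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1 / x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def f(nu):
    """f(nu) = -psi(nu / 2) + log(nu / 2) + 1, decreasing in nu."""
    return -digamma(nu / 2.0) + math.log(nu / 2.0) + 1.0


def solve_dof(c, lo=1e-3, hi=1e3, tol=1e-10, max_iter=200):
    """Bisection root of f(nu) + c = 0; returns (nu, number of iterations)."""
    if f(lo) + c <= 0.0 or f(hi) + c >= 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    mid = 0.5 * (lo + hi)
    for iteration in range(1, max_iter + 1):
        mid = 0.5 * (lo + hi)
        if f(mid) + c > 0.0:  # f is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            return mid, iteration
    return mid, max_iter


# Hypothetical constant chosen so that the root is nu = 5
c_example = -f(5.0)
nu_hat, iterations = solve_dof(c_example)
```

The returned iteration count depends on the bracket width and the tolerance, which is why the term i in the complexity expression is only known after the solver has stopped.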

4. Algorithm Verification

4.1. Numerical Simulation

To examine the multi-sensor characteristics of the proposed RMSREM algorithm, a numerical simulation is carried out in which two sensors with different noise variances observe a process described by a second-order linear ARX model. The governing equation is as follows:

y_{k}^{m} = {(x_{k}^{m})}^{T} θ + e_{k}^{m}, k = 1, 2, \dots, L .

(40)

Here, the regression vector is set as

x_{k}^{m} = {[y_{k - 1}^{m}, y_{k - 2}^{m}, u_{k - 1}, u_{k - 2}]}^{T}

, the input

u_{k}

follows a uniform distribution, namely

u_{k} \sim U (- 5, 5)

, and

θ = {[a_{1}, a_{2}, b_{1}, b_{2}]}^{T}

is the parameter vector. Two sets of Gaussian noise were generated, where

e_{k}^{1} \sim N (0, 0.001)

and

e_{k}^{2} \sim N (0, 1)

, which were added to the noiseless output. A proportion (5% in this case) of the measured values is then replaced with random disturbances uniformly distributed on

[- 3.5, - 3]

, serving as outliers.

In the numerical simulation, parameter changes and multi-sensor scenarios were designed to examine the robustness of the RMSREM algorithm. To test the online recursive updating of the algorithm, the sample sequence is split chronologically into two stages (Phase I and Phase II), and the parameter vector of the system changes when the phase switches. The time-varying parameter vector makes the system dynamics time-varying, which hinders the use of conventional batch algorithms, whereas the recursive RMSREM algorithm can track such changes.

As mentioned above, two types of noise are generated to simulate the multi-sensor scenario. In total, 40,000 samples are generated according to (40) with noise added, with each sensor containing 20,000 samples and each phase containing 10,000 samples per sensor. The input and dual-output curves of the numerical example near the phase transition point are shown in Figure 2, where the outliers are also plotted. The time-varying actual parameter vectors are given in Table 1.
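The data-generation procedure described above can be sketched in Python as follows. The sketch assumes that the noise-free output is simulated first and that noise and outliers are then applied to each sensor separately, as the text suggests; the function name `simulate_two_sensor_arx`, the random seed, and the zero initial conditions are hypothetical. The parameter vectors are the actual values listed in Table 1, and the second argument of N(·, ·) is treated as a variance.

```python
import random


def simulate_two_sensor_arx(theta_phases, n_per_phase, noise_vars=(0.001, 1.0),
                            outlier_ratio=0.05, outlier_range=(-3.5, -3.0), seed=0):
    """Simulate a second-order ARX process observed by several noisy sensors."""
    rng = random.Random(seed)
    n = n_per_phase * len(theta_phases)
    u = [rng.uniform(-5.0, 5.0) for _ in range(n)]  # input u_k ~ U(-5, 5)
    y = [0.0] * n                                   # noise-free output, zero initial conditions
    for k in range(2, n):
        a1, a2, b1, b2 = theta_phases[k // n_per_phase]  # parameters switch with the phase
        y[k] = a1 * y[k - 1] + a2 * y[k - 2] + b1 * u[k - 1] + b2 * u[k - 2]
    sensors, outlier_idx = [], []
    for var in noise_vars:
        std = var ** 0.5
        meas = [y_k + rng.gauss(0.0, std) for y_k in y]    # add Gaussian sensor noise
        idx = rng.sample(range(n), int(outlier_ratio * n))  # positions of the outliers
        for k in idx:
            meas[k] = rng.uniform(*outlier_range)           # replace the measurement by an outlier
        sensors.append(meas)
        outlier_idx.append(sorted(idx))
    return u, y, sensors, outlier_idx


theta_phase_1 = (1.1430, -0.4346, 0.0572, 0.2415)  # actual values, Phase I (Table 1)
theta_phase_2 = (0.7859, -0.3679, 0.3403, 0.2417)  # actual values, Phase II (Table 1)
u, y, sensors, outlier_idx = simulate_two_sensor_arx(
    [theta_phase_1, theta_phase_2], n_per_phase=10000)
```

With these settings each of the two sensors provides 20,000 samples, 10,000 per phase, of which 5% are outliers.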

The parameters obtained with the proposed RMSREM algorithm are also shown in Table 1, along with the results of the robust recursive EM (RREM) algorithm [15] for comparison. It should be noted that the RREM algorithm is implemented separately for the two sensors, denoted as RREM 1 and RREM 2, since it cannot process multi-sensor information. For the RMSREM algorithm, the parameter vector is taken at the last time instant of each phase. As illustrated in the table, the parameter vectors estimated by RREM 1 and the RMSREM algorithm are consistent with the actual parameter vectors. However, the parameters of RREM 2 fail to converge to the true values due to the excessively large noise variance. This indicates that the RMSREM algorithm can effectively process multi-sensor information and can estimate the actual parameter vector even when some sensors have failed.

Meanwhile, to evaluate the algorithm performance, the mean square error (MSE) of the estimated output is used, which is specified as

MSE = \frac{1}{N} \sum_{k = 1}^{N} {({\hat{y}}_{k} - y_{k})}^{2}

. Table 2 presents the self-validation (SV) and cross-validation (CV) MSE of each algorithm under different outlier ratios, where CV I and CV II denote cross-validation on the first and second stages, respectively. The robust batch EM (RBEM) algorithm, the robust recursive EM (RREM) algorithm, and the recursive multi-task EM (RMTEM) algorithm [25] are adopted as benchmark methods. Since neither the RBEM algorithm nor the RREM algorithm can handle multi-sensor data simultaneously, they are implemented separately for the two sensors, denoted as RBEM 1, RBEM 2 and RREM 1, RREM 2, which process Sensor 1 and Sensor 2, respectively. As shown in the table, the RMSREM algorithm, RREM 1, and RBEM 1 exhibit similar performance, while RREM 2 and RBEM 2 perform poorly due to the excessively large noise variance of Sensor 2. The RMTEM algorithm, in contrast, shows inferior performance under outlier interference.

Table 3 lists the time cost of the proposed RMSREM algorithm on an Intel Core i7-10750H CPU @ 2.60 GHz, alongside benchmark methods including the RLS algorithm, the RREM algorithm, and the RMTEM algorithm. The comparison focuses on two indices: the time cost of a one-step operation and the time cost of solving the dof

ν

, where the latter accounts for a significant portion of the computational load. It can be seen that the solution time of the RMSREM algorithm is consistent with that of other algorithms under the recursive EM framework.

The parameter convergence process of the RMSREM algorithm is illustrated in Figure 3. The figure shows that convergence occurs after 2000 samples in the first stage. When the system switches the parameter vector to stage 2 at the 10,000-th sample, the proposed recursive algorithm converges again after 1500 samples.

Furthermore, to determine the effective range of the RMSREM algorithm against outlier interference, outliers ranging from 0% to 30% (in increments of 5%) were injected into the measured values, and 50 Monte Carlo simulations were run. Figure 4 plots the MSE of the self-validation N-step prediction against the outlier ratio, where the knee point of the proposed algorithm is marked. The portion of the curve to the left of the knee point is the effective range of the algorithm in this numerical simulation. The results indicate that the knee point of the RMSREM algorithm falls within the range of 20% to 25%.

4.2. Continuous Stirred Tank Reactor Example

The continuous stirred tank reactor (CSTR) is a constant-volume, exothermic, irreversible, and nonlinear system. Its dynamic behavior follows the reaction mechanism, with the core governing equations referenced from [33] as shown below (see Figure 5 for the schematic diagram):

\frac{d C_{A} (t)}{d t} = \frac{q (t)}{V} (C_{A 0} (t) - C_{A} (t)) - k_{0} C_{A} (t) exp (\frac{- E}{R T (t)}),

(41)

\begin{matrix} \frac{d T (t)}{d t} = & \frac{q (t)}{V} (T_{0} (t) - T (t)) - \frac{(- Δ H) k_{0} C_{A} (t)}{ρ C_{p}} exp (\frac{- E}{R T (t)}) \\ & + \frac{ρ_{c} C_{p c}}{ρ C_{p} V} q_{c} (t) \{1 - exp (\frac{- h A}{q_{c} (t) ρ C_{p}})\} (T_{c 0} (t) - T (t)), \end{matrix}

(42)

where

C_{A} (t)

denotes the product concentration of Component A (the key output variable of interest in this study), and

q_{c} (t)

represents the coolant flow rate (the key manipulated input variable). The dynamic relationship between these two variables is the focus of the analysis. Due to the nonlinear nature of the CSTR, the system needs to be linearized at preset operating points to investigate this relationship. In this work, two operating points of coolant flow rate,

q_{c} (t) = 98 L / \min

and

q_{c} (t) = 105 L / \min

, are selected to simulate the dynamic changes of Component A concentration, thereby verifying the adaptability of the RMSREM algorithm. The definitions and nominal values of other parameters in governing equations can be found in [15].

To guarantee sufficient excitation, a binary signal that switches between −0.5 and 0.5 is applied to the coolant flow. The reactor system is then operated according to the governing Equations (41) and (42), and the input and output values are sampled once per minute (sampling period: 1 min). In total, 20,000 samples are recorded, with 10,000 in each stage of the simulation. Next, two different series of Gaussian noise are added to the samples, imitating the signals from two separate measuring instruments. The first series has a very small variance, namely

e_{k}^{1} \sim N (0, 1 \times 10^{- 8})

. The second series has a variance a thousand times larger, i.e.,

e_{k}^{2} \sim N (0, 1 \times 10^{- 5})

. To mimic occasional bad readings, 3% of the samples are randomly picked and replaced with outliers taking random values between −0.05 and 0.05. Figure 6 shows a segment of the input and the two sets of outputs around the stage-switching instant. The deliberately generated outliers are also marked in the figure.

The step size

γ_{k}

is a user-specified hyperparameter of the RMSREM algorithm that significantly affects its performance. To determine an appropriate step size, we conducted comparative experiments with different constant step sizes (0.001, 0.005, and 0.01) and visualized the results in Figure 7. Based on the resulting performance metrics, the step size of 0.01 was selected for both simulation examples in this work [10,11].

A first-order ARX model is adopted to describe the dynamic characteristics. The RMSREM algorithm is then implemented and compared with the RREM algorithm. Since the RREM algorithm cannot handle multi-sensor data, it is implemented separately for each sensor, denoted as RREM 1 and RREM 2. Figure 8a shows the product concentration of Component A for the self-validation samples, where the reference curve is the true value of the samples without added noise. The corresponding curves for cross-validation are shown in Figure 8b. For the sake of clarity, only selected segments of the curves are presented, with the displayed regions chosen at random. Both the SV and CV experiments involve making N-step predictions of the product concentration of Component A in the CSTR. As illustrated in the figures, even in the presence of outliers, the RMSREM algorithm accurately tracks the concentration trajectory in both validation scenarios. In comparison, RREM exhibits noticeable prediction errors as the sensor noise level rises.

The MSE of the algorithms, defined as

M S E = \frac{1}{N} \sum_{k = 1}^{N} {({\hat{y}}_{k} - y_{k})}^{2}

, is listed in Table 4, which shows the impact of different outlier proportions. All results are derived from 50 Monte Carlo simulations to ensure statistical reliability. The benchmark methods include the RBEM algorithm, the RREM algorithm (both implemented separately for the two sensors, as in the numerical simulation section), and the RMTEM algorithm. As illustrated in the table, the proposed RMSREM algorithm outperforms the RBEM, RREM, and RMTEM algorithms across all outlier proportion scenarios. Notably, RMTEM exhibits inferior performance under outlier interference, while the RBEM and RREM algorithms also deteriorate significantly when the outlier proportion increases to 5%. In contrast, the RMSREM algorithm maintains a low MSE even under such adverse conditions.

The effective range of the RMSREM algorithm in the CSTR process is explored through 50 Monte Carlo simulations. To better verify the multi-sensor robustness of the RMSREM algorithm, two ways of adding outliers are designed:

- Case 1: The outlier proportions of the two sensors are the same and are increased gradually.
- Case 2: The second sensor is set to a damaged state with 100% outliers, and the outlier proportion of the first sensor is increased gradually.

The outlier proportion varies from 0% to 7.0% with an increment of 0.5%. Figure 9 presents the MSE of N-step self-validation predictions within one phase under different degrees of outlier contamination. The mean and confidence intervals of the MSE for 3% and 6% outlier proportions are labeled in the graph. Specifically, Figure 9a shows that the effective range reaches 6.0% under Case 1, while Figure 9b indicates an effective range of 3.5% under Case 2. The comparison between these two scenarios demonstrates the robustness of the multi-sensor framework: even when one sensor is completely compromised by outliers, the proposed RMSREM algorithm can still leverage valid information from the other sensor to achieve accurate parameter identification.

The ability of the RMSREM algorithm to handle disturbances effectively is rooted in its use of the t-distribution. Drawing on the properties of this distribution, the algorithm uses the posterior expectation of the variance scaling factor as a weight in the parameter update. Specifically, when an unusual or erroneous measurement (an outlier) appears at time k, the expectation step yields a small value of

r_{m, k | y}^{old}

. Consequently, the influence of the k-th data point is diminished during the parameter updating. Simultaneously, the contribution of each sensor is further refined with the computation of

λ_{m, k}

. A lower weight is assigned to the sensor that exhibits a higher degree of measurement variability (larger variance).
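The down-weighting of outliers can be illustrated with a small Python sketch. It assumes a single sensor with unit weight, so that the posterior in (A6) is a Gamma distribution with shape (ν + 1)/2 and rate (ν + δ)/2, whose mean is (ν + 1)/(ν + δ). The function name `expected_scale` and the numerical values (residuals, variance, and ν = 5) are hypothetical.

```python
def expected_scale(residual, sigma2, nu):
    """Posterior mean of r for a Gamma((nu + 1) / 2, (nu + delta) / 2) posterior."""
    delta = residual ** 2 / sigma2  # squared Mahalanobis distance of the residual
    return (nu + 1.0) / (nu + delta)


# Assumed noise variance 0.001 and dof 5 for illustration
inlier_weight = expected_scale(residual=0.03, sigma2=0.001, nu=5.0)
outlier_weight = expected_scale(residual=-3.2, sigma2=0.001, nu=5.0)
```

A residual of the order of the noise standard deviation gives a weight close to one, whereas a residual of the size of the injected outliers gives a weight close to zero, so the corresponding sample barely affects the parameter update.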

5. Discussion

Focusing on the robust identification problem of multi-sensor ARX systems in complex industrial environments, this paper proposes a robust multi-sensor recursive EM (RMSREM) algorithm. To strengthen robustness against non-Gaussian noise and outliers, the algorithm models the measurement noise with Student’s t-distributions. An adaptive weight fusion mechanism is incorporated for multi-sensor data to mitigate the impact of single-sensor failure, and a recursive framework is adopted to achieve real-time parameter updates, adapting to the time-varying characteristics of practical systems.

Results from numerical simulations and the CSTR process case demonstrate that the proposed RMSREM algorithm achieves a lower mean square error under the same outlier contamination level compared with the RREM method. Furthermore, it maintains stable accuracy when some sensors are damaged, and can quickly track time-varying parameters when working conditions switch.

Author Contributions

Conceptualization, X.C. and J.L.; methodology, X.C. and J.L.; validation, J.L.; writing–original draft preparation, review and editing, X.C. and J.L.; supervision, X.C.; project administration, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62203317, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20210862.

Data Availability Statement

The data supporting the results of this research report have been detailed in the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of (19)

According to Bayes’ theorem, the posterior distribution of the variance weighting variable is

p (R | Y, U, Θ) = \frac{p (Y, U | R, Θ) p (R | Θ)}{p (Y, U | Θ)} .

(A1)

The denominator

p (Y, U | Θ) = \int p (Y, U | R, Θ) p (R | Θ) d R

is independent of R after integration and acts as a normalization constant. Considering time instant k, the posterior probability of

r_{m, k}

is proportional to the following probabilities according to (A1):

p (r_{m, k} | {\bar{y}}_{k}^{m}, x_{k}^{m}, Θ_{k - 1}) \propto p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ_{k - 1}) p (r_{m, k} | Θ_{k - 1}),

(A2)

where the first probability

p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ_{k - 1})

follows a Gaussian distribution as

p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ_{k - 1}) = \frac{1}{\sqrt{2 π}} \sqrt{\frac{r_{m, k}}{λ_{m, k}^{2} σ_{m, k - 1}^{2}}} exp (- \frac{r_{m, k} {({\bar{y}}_{k}^{m} - λ_{m, k} {(x_{k}^{m})}^{T} θ_{k - 1})}^{2}}{2 λ_{m, k}^{2} σ_{m, k - 1}^{2}}),

(A3)

and the prior

p (r_{m, k} | Θ_{k - 1})

follows a Gamma distribution as

p (r_{m, k} | Θ_{k - 1}) = \frac{{(\frac{ν_{m, k - 1}}{2})}^{\frac{ν_{m, k - 1}}{2}} r_{m, k}^{(\frac{ν_{m, k - 1}}{2} - 1)}}{Γ (\frac{ν_{m, k - 1}}{2})} exp (- \frac{ν_{m, k - 1}}{2} r_{m, k}) .

(A4)

Then, the posterior distribution of the variance weighting variable would be derived as

\begin{matrix} p (r_{m, k} | {\bar{y}}_{k}^{m}, x_{k}^{m}, Θ_{k - 1}) & \propto p ({\bar{y}}_{k}^{m} | x_{k}^{m}, r_{m, k}, Θ_{k - 1}) p (r_{m, k} | Θ_{k - 1}) \\ \propto {r_{m, k}}^{(\frac{ν_{m, k - 1} - 1}{2})} exp (- (\frac{{({\bar{y}}_{k}^{m} - λ_{m, k} {(x_{k}^{m})}^{T} θ_{k - 1})}^{2}}{2 λ_{m, k}^{2} σ_{m, k - 1}^{2}} + \frac{ν_{m, k - 1}}{2}) r_{m, k}) . \end{matrix}

(A5)

Comparing (A5) with the pdf of the Gamma distribution in (A4), it can be concluded that the posterior distribution of the variance weighting variable is a Gamma distribution:

r_{m, k} | \bar{Y}, U, Θ_{k - 1} \sim G (\frac{ν_{m, k - 1} + 1}{2}, \frac{ν_{m, k - 1} + δ ({\bar{y}}_{k}^{m} | {(x_{k}^{m})}^{T} θ_{k - 1}, σ_{m, k - 1}^{2})}{2}),

(A6)

where

δ ({\bar{y}}_{k}^{m} | {(x_{k}^{m})}^{T} θ_{k - 1}, σ_{m, k - 1}^{2}) ≜ {({\bar{y}}_{k}^{m} - λ_{m, k} {(x_{k}^{m})}^{T} θ_{k - 1})}^{2} / (λ_{m, k}^{2} σ_{m, k - 1}^{2})

is the square of the Mahalanobis distance. Then, the posterior distribution in (19) is obtained.
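As a worked step, the two posterior expectations required by the recursive E-step follow from standard properties of the Gamma distribution: for shape a and rate b, the mean is a / b and the expected logarithm is ψ(a) − log b, where ψ is the digamma function. The shorthand δ_{m, k} for the squared Mahalanobis distance in (A6) is introduced here only for brevity, and the result is presumably what (20) and (21) express, although those equations are not reproduced in this appendix.

```latex
% Gamma posterior (A6): shape a = (\nu_{m,k-1} + 1)/2, rate b = (\nu_{m,k-1} + \delta_{m,k})/2
% \delta_{m,k} is shorthand (assumed notation) for the squared Mahalanobis distance in (A6)
\begin{aligned}
E\{r_{m,k}\} &= \frac{a}{b} = \frac{\nu_{m,k-1} + 1}{\nu_{m,k-1} + \delta_{m,k}}, \\
E\{\log r_{m,k}\} &= \psi(a) - \log b
  = \psi\!\left(\frac{\nu_{m,k-1} + 1}{2}\right) - \log\!\left(\frac{\nu_{m,k-1} + \delta_{m,k}}{2}\right).
\end{aligned}
```

A large residual gives a large δ_{m, k} and hence a small expected value of r_{m, k}, which is the mechanism that reduces the influence of outliers.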

References

1. Ljung, L. Perspectives on system identification. Annu. Rev. Control 2010, 34, 1–12.
2. Chen, H.; Jiang, B.; Ding, S.X.; Huang, B. Data-Driven Fault Diagnosis for Traction Systems in High-Speed Trains: A Survey, Challenges, and Perspectives. IEEE Trans. Intell. Transp. Syst. 2022, 23, 1700–1716.
3. Khatibisepehr, S.; Huang, B. A Bayesian approach to robust process identification with ARX models. AIChE J. 2013, 59, 845–859.
4. Kheirandish, A.; Fatehi, A.; Gheibi, M.S. Identification of Slow-Rate Integrated Measurement Systems Using Expectation–Maximization Algorithm. IEEE Trans. Instrum. Meas. 2020, 69, 9477–9484.
5. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol. 1977, 39, 1–22.
6. Sun, Q.; Ge, Z. A Survey on Deep Learning for Data-Driven Soft Sensors. IEEE Trans. Ind. Inform. 2021, 17, 5853–5866.
7. Wang, Q.; Guo, G.; Qian, G.; Jiang, X. Distributed online expectation-maximization algorithm for Poisson mixture model. Appl. Math. Model. 2023, 124, 734–748.
8. Zhang, Y.; Yang, Q. A Survey on Multi-Task Learning. IEEE Trans. Knowl. Data Eng. 2022, 34, 5586–5609.
9. Titterington, D.M. Recursive Parameter Estimation Using Incomplete Data. J. R. Stat. Soc. Ser. B Methodol. 1984, 46, 257–267.
10. Cappé, O.; Moulines, E. On-Line Expectation–Maximization Algorithm for latent Data Models. J. R. Stat. Soc. Ser. B Methodol. 2009, 71, 593–613.
11. Cappé, O. Online EM Algorithm for Hidden Markov Models. J. Comput. Graph. Stat. 2011, 20, 728–749.
12. Zhao, Y.; Fatehi, A.; Huang, B. A data-driven hybrid ARX and Markov-Chain modeling approach to process identification with time varying time delays. IEEE Trans. Ind. Electron. 2017, 64, 4226–4236.
13. Guo, Y.; Ma, C. Identification of time-delay Markov jumps autoregressive system with recursive expectation maximum and convex optimization algorithm. Trans. Inst. Meas. Control 2023, 45, 1607–1618.
14. Vaičiulytė, J.; Sakalauskas, L. Recursive parameter estimation algorithm of the Dirichlet hidden Markov model. J. Stat. Comput. Simul. 2020, 90, 306–323.
15. Chen, X.; Zhao, S.; Liu, F. Robust identification of linear ARX models with recursive EM algorithm based on Student’s t-distribution. J. Frankl. Inst. 2021, 358, 1103–1121.
16. Salehi, Y.; Huang, B. Offline and online parameter learning for switching multirate processes with varying delays and integrated measurements. IEEE Trans. Ind. Electron. 2021, 69, 7213–7222.
17. Qian, Y.; Liu, X.; Yang, X.; Sun, X.M. Robust Recursive Identification of Time-delay Systems with Skewed Measurement Noise. IEEE Trans. Instrum. Meas. 2025, 74, 3001512.
18. Zheng, Y. Methodologies for Cross-Domain Data Fusion: An Overview. IEEE Trans. Big Data 2015, 1, 16–34.
19. Lu, S.; Gong, Y.; Luo, H.; Zhao, F.; Li, Z.; Jiang, J. Heterogeneous Multi-Task Learning for Multiple Pseudo-Measurement Estimation to Bridge GPS Outages. IEEE Trans. Instrum. Meas. 2021, 70, 1010309.
20. Lin, C.M.; Hsueh, C.S. Adaptive EKF-CMAC-Based Multisensor Data Fusion for Maneuvering Target. IEEE Trans. Instrum. Meas. 2013, 62, 2058–2066.
21. Chen, Z.; Li, W. Multisensor Feature Fusion for Bearing Fault Diagnosis Using Sparse Autoencoder and Deep Belief Network. IEEE Trans. Instrum. Meas. 2017, 66, 1693–1702.
22. Jiongqi, W.; Zhangming, H.; Haiyin, Z.; Shuxing, L.; Xuanying, Z. Optimal Weight and Parameter Estimation of Multi-structure and Unequal-Precision Data Fusion. Chin. J. Electron. 2017, 26, 1245–1253.
23. Meng, T.; Jing, X.; Yan, Z.; Pedrycz, W. A survey on machine learning for data fusion. Inf. Fusion 2020, 57, 115–129.
24. Ping, X.; Zhang, K.; Zhao, S.; Luan, X.; Liu, F. Multitask Maximum Likelihood Identification for ARX Model with Multisensor. IEEE Trans. Instrum. Meas. 2022, 71, 2509710.
25. Chu, Y.; Ping, X.; Zhao, S.; Liu, F. Robust online identification method for biofabrication processes with multiple unknown disturbances. J. Frankl. Inst. 2025, 362, 107643.
26. De Menezes, D.; Prata, D.; Secchi, A.; Pinto, J. A review on robust M-estimators for regression analysis. Comput. Chem. Eng. 2021, 147, 107254.
27. Zhang, T.; Zhao, S.; Luan, X.; Liu, F. Bayesian inference for state-space models with student-t mixture distributions. IEEE Trans. Cybern. 2022, 53, 4435–4445.
28. Asanya, K.C.; Kharrat, M.; Udom, A.U.; Torsen, E. Robust Bayesian approach to logistic regression modeling in small sample size utilizing a weakly informative student’s t prior distribution. Commun. Stat. Theory Methods 2023, 52, 283–293.
29. Yu, X.; Meng, Z. Robust Kalman Filters with Unknown Covariance of Multiplicative Noise. IEEE Trans. Autom. Control 2024, 69, 1171–1178.
30. Hasannasab, M.; Hertrich, J.; Laus, F.; Steidl, G. Alternatives to the EM algorithm for ML estimation of location, scatter matrix, and degree of freedom of the Student t distribution. Numer. Algorithms 2021, 87, 77–118.
31. Favereau, M.; Lorca, A.; Negrete-Pincetic, M.; Vicuña, S. Robust streamflow forecasting: A Student’s t-mixture vector autoregressive model. Stoch. Environ. Res. Risk Assess. 2022, 36, 3979–3995.
32. Chatzis, S.P.; Kosmopoulos, D.I. A variational Bayesian methodology for hidden Markov models utilizing Student’s-t mixtures. Pattern Recognit. 2011, 44, 295–306.
33. Guo, F.; Wu, O.; Kodamana, H.; Ding, Y.; Huang, B. An Augmented Model Approach for Identification of Nonlinear Errors-in-Variables Systems Using the EM Algorithm. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 1968–1978.

Figure 1. Framework of robust multi-sensor recursive EM algorithm.

Figure 2. Input dual-output data (sample number: 9901 to 10,100).

Figure 3. Trajectories of the parameter estimation deviations in the numerical example.

Figure 4. Self-validation MSE of the numerical example versus the degree of data contamination.

Figure 5. The schematic diagram of continuous stirred tank reactor.

Figure 6. Input-dual measurements data of CSTR (sample number: 9801 to 10,200).

Figure 7. 50 Monte Carlo simulations with step sizes of 0.001, 0.005, and 0.01.

Figure 8. (a) Product concentration trajectory of Component A for the CSTR process in self-validation. (b) Product concentration trajectory of Component A for the CSTR process in cross-validation (sampling period = 1 min).

Figure 9. (a) Boxplots of MSE from 50 simulations under different outlier ratios for Case 1. (b) Boxplots of MSE from 50 simulations under different outlier ratios for Case 2.

Table 1. Comparison between Real Parameters of Numerical Examples and Various Algorithms (with 5% Outliers).

Indicator	Actual Value		RREM 1		RREM 2		RMSREM
Indicator	Phase I	Phase II	Phase I	Phase II	Phase I	Phase II	Phase I	Phase II
Parm. $θ$	1.1430	0.7859	1.1429	0.7858	0.9336	0.6600	1.1430	0.7858
	−0.4346	−0.3679	−0.4316	−0.3673	−0.2780	−0.2936	−0.4351	−0.3665
	0.0572	0.3403	0.0572	0.3404	0.0576	0.3455	0.0578	0.3406
	0.2415	0.2417	0.2417	0.2413	0.2327	0.2573	0.2418	0.2414
Var. $σ_{1}^{2}$	0.001	0.001	0.0004	0.0004	0.9300	0.9635	0.0004	0.0003
Var. $σ_{2}^{2}$	1	1	N/A	N/A	N/A	N/A	1.0620	0.8145

N/A stands for not applicable

Table 2. Mean square error of the RMSREM algorithm compared to other benchmark methods.

Performance Index	SV	CV I	CV II	SV	CV I	CV II
Outlier Percentage	5%	5%	5%	15%	15%	15%
RBEM 1	N/A	0.0034	0.0016	N/A	0.0034	0.0018
RBEM 2	N/A	3.6222	1.7591	N/A	3.7800	1.9149
RREM 1	0.0026	0.0033	0.0017	0.0026	0.0035	0.0017
RREM 2	2.6421	3.2630	1.7452	2.7679	3.6152	1.9193
RMTEM	0.1257	0.1191	0.1322	0.5734	0.7606	0.3863
RMSREM	0.0026	0.0034	0.0017	0.0025	0.0032	0.0017

SV stands for the results of self-validation, CV I and CV II stand for the cross-validation of the first and second operating stages.

Table 3. The time cost of the RMSREM algorithm on an Intel Core i7-10750H CPU @ 2.60 GHz.

	Time Cost of One-Step Operation (ms)	Time Cost of Solving dof $ν$ (ms)
RLS	0.007	N/A
RREM	0.564	0.542
RMTEM	1.227	1.144
RMSREM	1.161	1.034

Table 4. Comparison of MSEs of the RMSREM algorithm and benchmarks for the CSTR process (order of magnitude: $\times 10^{- 7}$).


Performance Index	SV	CV I	CV II	SV	CV I	CV II
Outlier Percentage	3%	3%	3%	5%	5%	5%
RBEM 1	N/A	0.0581	2.2152	N/A	0.0652	2.7013
RBEM 2	N/A	0.0687	2.0692	N/A	0.0767	2.2498
RREM 1	0.0727	0.0593	0.0695	1.2713	0.0632	2.7816
RREM 2	1.3795	0.1022	0.1062	1.3943	0.1138	2.3364
RMTEM	1.7109	0.5125	2.9572	1.5573	0.6027	2.3902
RMSREM	0.0169	0.0014	0.0493	0.0170	0.0007	0.0051

SV stands for the results of self-validation, CV I and CV II stand for the cross-validation of the first and second operating stages.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

