Article

Algorithmic Differentiation of the MWGS-Based Arrays for Computing the Information Matrix Sensitivity Equations within the Problem of Parameter Identification

by Andrey Tsyganov 1,† and Julia Tsyganova 2,3,*,†
1 Department of Mathematics, Physics and Technology Education, Ulyanovsk State University of Education, 432071 Ulyanovsk, Russia
2 Department of Mathematics, Information and Aviation Technology, Ulyanovsk State University, 432017 Ulyanovsk, Russia
3 Department of Academic Policy and Organization of Educational Activities, Innopolis University, 420500 Innopolis, Russia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(1), 126; https://doi.org/10.3390/math10010126
Submission received: 6 November 2021 / Revised: 19 December 2021 / Accepted: 28 December 2021 / Published: 2 January 2022
(This article belongs to the Special Issue New Trends on Identification of Dynamic Systems)

Abstract

The paper considers the problem of algorithmic differentiation of information matrix difference equations for calculating the information matrix derivatives in the information Kalman filter. The equations are presented in the form of a matrix MWGS (modified weighted Gram–Schmidt) transformation. The solution is based on the usage of special methods for the algorithmic differentiation of matrix MWGS transformation of two types: forward (MWGS-LD) and backward (MWGS-UD). The main result of the work is a new MWGS-based array algorithm for computing the information matrix sensitivity equations. The algorithm is robust to machine round-off errors due to the application of the MWGS orthogonalization procedure at each step. The obtained results are applied to solve the problem of parameter identification for state-space models of discrete-time linear stochastic systems. Numerical experiments confirm the efficiency of the proposed solution.

1. Introduction

Matrix orthogonal transformations are widely used in solving various problems of computational linear algebra [1].
The problem of calculating the values of the derivatives in matrix orthogonal transformations arises in automatic differentiation [2], in perturbation and control theories, and in differential geometry, when solving such problems as computing Lyapunov exponents [3,4], numerically solving the matrix differential Riccati equation [5,6] and the Riccati sensitivity equation [7,8], and computing higher-order derivatives in experiment planning [9]. In the theory of Kalman filtering [10,11,12,13,14], orthogonal transformations are used to efficiently compute the solution of the matrix difference Riccati equation.
The methods for calculating the values of derivatives in matrix orthogonal transformations are similar in their properties to automatic (algorithmic) differentiation methods. Three methods for calculating derivatives are currently most common:
  • symbolic (analytical) differentiation;
  • numerical differentiation;
  • automatic (algorithmic) differentiation.
Symbolic differentiation allows one to obtain exact analytical formulas for the derivatives of the elements of a parameterized matrix, but this approach requires significant computational costs and is not suitable for solving problems in real time. With numerical differentiation, the result depends significantly on many factors, for example, on the step size. Algorithmic (automatic) differentiation, in contrast, does not produce expressions for the derivatives or their tabular approximation; instead, it computes the values of the derivatives at a given point for given values of the function arguments. This requires knowledge of the expression for the function or, at least, a computer program that calculates it.
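The step-size trade-off of numerical differentiation is easy to reproduce. The following MATLAB fragment is an illustration only (the function and evaluation point are not taken from the paper): it compares forward-difference approximations against a known derivative value.

```matlab
% Illustration (not from the paper): the forward-difference error depends
% strongly on the step size h, while an exact derivative value does not.
f  = @(x) sin(x);          % any smooth function with a known derivative
df = cos(1.0);             % exact derivative value at x = 1
for h = [1e-1, 1e-5, 1e-9, 1e-13]
    approx = (f(1.0 + h) - f(1.0)) / h;
    fprintf('h = %8.1e   error = %9.2e\n', h, abs(approx - df));
end
% Large h: truncation error dominates; tiny h: round-off dominates.
% Algorithmic differentiation avoids this trade-off entirely.
```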
Consider a rectangular parameterized matrix $A(\theta)$ and a diagonal matrix $D_A(\theta)$, where $\theta$ is a scalar real parameter. The problem of algorithmic differentiation of the matrix MWGS transformation (MWGS: the modified weighted Gram–Schmidt orthogonalization [10]) is to find, at a given point $\theta = \hat\theta$, the triangular (upper or lower) matrix of derivatives $B'_\theta|_{\theta=\hat\theta}$ and the diagonal matrix of derivatives $(D_B)'_\theta|_{\theta=\hat\theta}$, using the known parameterized matrices $A(\theta)$, $D_A(\theta)$ and the result of the MWGS transformation $A^T = B W^T$ (where $A^T D_A A = B D_B B^T$): the triangular (upper or lower) matrix $B$ with ones on the main diagonal and the diagonal matrix $D_B$.
Suppose that the domain of the parameter $\theta$ is such that the matrix $A$ has full column rank and the diagonal matrix $D_A > 0$. In what follows, for convenience of presentation, we denote $A \equiv A(\hat\theta)$ and $D_A \equiv D_A(\hat\theta)$. Then, we call the two pairs of matrices $\{A, D_A\}$ and $\{B, D_B\}$ the MWGS-based arrays.
In our recent papers [15,16], we have proposed two methods for algorithmic differentiation of the MWGS-based arrays. These computational methods are based on the forward MWGS-LD orthogonalization ([16], p. 66) and the backward MWGS-UD orthogonalization procedure ([10], p. 127).
In this paper, we further develop our recently obtained results. Our research aims to construct a novel computational algorithm for evaluating the derivatives of the MWGS factors of the information matrix Y. Firstly, we show how our previously suggested methods for algorithmic differentiation of the MWGS-based arrays can be applied to construct a new MWGS-based array algorithm for computing the information matrix sensitivity equations. Secondly, we demonstrate how the proposed algorithm can be efficiently applied to solve the parameter identification problem when gradient-based optimization methods are used.
The paper is organized as follows. Section 2 provides basic definitions associated with the information form of the Kalman filter, discusses the MWGS-based array algorithm for computing the information matrix, and presents two algorithms for computing derivatives of the MWGS-based arrays. Section 3 contains the main result of the paper: the new MWGS-based array algorithm for computing the information matrix sensitivity equations. Section 4 discusses the implementation details of the proposed algorithm and demonstrates how it can be applied to solve the parameter identification problem for a practical stochastic system model. Finally, conclusions are made in Section 5.

2. Methodology

2.1. Information Kalman Filter and Information Matrix

The information Kalman filter (IKF) is an alternative formulation of the well-known Kalman filter (KF) [17]. The IKF differs from the KF in that it computes not an error covariance matrix $P$ but its inverse $Y$, known as the information matrix. When there is no a priori information about the initial state value, the IKF is particularly useful because it easily starts from $Y_0 = 0$; in that case, the initial error covariance matrix $\Pi_0$ is not defined. Additionally, an implementation of the IKF can be computationally cheaper when the size of the measurement vector is greater than the size of the state vector [11]. The information filter does not use the same state vector representation as the KF; it utilizes the so-called information state $d \triangleq Y x$ instead ([11], p. 263).
Consider a discrete-time linear stochastic system
$$x_{k+1} = F_k x_k + G_k w_k, \quad k \ge 0, \qquad (1)$$
$$z_{k+1} = H_{k+1} x_{k+1} + v_{k+1} \qquad (2)$$
where $x_k \in \mathbb{R}^n$ is the state and $z_k \in \mathbb{R}^m$ are the measurements; $k$ is a discrete-time instant. The process noise $w_k \in \mathbb{R}^q$ and the measurement noise $v_k \in \mathbb{R}^m$ are Gaussian white-noise processes with zero mean and covariance matrices $Q_k > 0$ and $R_k > 0$. That means
$$E\left\{\begin{bmatrix} w_k \\ v_k \end{bmatrix} \begin{bmatrix} w_j^T & v_j^T \end{bmatrix}\right\} = \begin{bmatrix} Q_k & 0 \\ 0 & R_k \end{bmatrix} \delta_{kj} \qquad (3)$$
where $\delta_{kj}$ denotes the Kronecker delta function. The initial state vector $x_0 \sim \mathcal{N}(\bar x_0, \Pi_0)$.
Suppose that the matrices $F_k$ are invertible [11]. Consider the problem of information filtering, which is to calculate at each discrete-time moment $k$ the information state estimate $\hat d_{k|k} = Y_{k|k} \hat x_{k|k}$ given $Z_1^k = \{z_1, \ldots, z_k\}$. The solution is obtained by the conventional IKF equations, which are as follows [11]:
I. Time Update: The predicted information state estimate and the predicted information matrix obey the difference equations
$$\hat d_{k+1} = \left[I - L_k G_k^T\right] F_k^{-T} \hat d_{k|k}, \quad \hat d_{0|0} = Y_0 \bar x_0, \qquad (4)$$
$$Y_{k+1} = \left[I - L_k G_k^T\right] A_k, \quad Y_{0|0} = \Pi_0^{-1} \qquad (5)$$
where
$$A_k \triangleq F_k^{-T} Y_{k|k} F_k^{-1}, \qquad (6)$$
$$L_k = A_k G_k C_k^{-1}, \quad C_k = G_k^T A_k G_k + Q_k^{-1}. \qquad (7)$$
II. Measurement Update: The updated (filtered) information state estimate $\hat d_{k+1|k+1}$ obeys
$$\hat d_{k+1|k+1} = \hat d_{k+1} + H_{k+1}^T R_{k+1}^{-1} z_{k+1}. \qquad (8)$$
The filtered information matrix satisfies the difference equation
$$Y_{k+1|k+1} = Y_{k+1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}. \qquad (9)$$
Equations (4)–(9) can be derived from the KF formulas by taking into account the definitions of the information matrix and the information state.
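For the reader's convenience, one iteration of (4)–(9) can be transcribed into MATLAB directly. The following is a sketch: the function name is ours, and the explicit matrix inversions mirror the formulas literally; they are exactly what the array algorithms discussed below are designed to avoid.

```matlab
function [Y, d] = ikf_step(Y, d, z, F, G, H, Q, R)
% One iteration of the conventional IKF, Equations (4)-(9): a sketch.
% Y, d : filtered information matrix Y_{k|k} and information state d_{k|k}.
  n   = size(Y, 1);
  Fti = inv(F');                           % F^{-T}
  A   = Fti * Y / F;                       % (6): A_k = F^{-T} Y_{k|k} F^{-1}
  C   = G' * A * G + inv(Q);               % (7): C_k
  L   = (A * G) / C;                       % (7): L_k
  d   = (eye(n) - L * G') * (Fti * d);     % (4): predicted information state
  Y   = (eye(n) - L * G') * A;             % (5): predicted Y_{k+1}
  d   = d + H' * (R \ z);                  % (8): measurement update of d
  Y   = Y + H' * (R \ H);                  % (9): filtered Y_{k+1|k+1}
end
```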
Furthermore, we will use the following notation. Let $B$ be a triangular matrix that is either a unit upper triangular matrix, i.e., $B := U$ (with 1's on the main diagonal), or a unit lower triangular matrix, i.e., $B := L$. $D$ denotes a diagonal matrix.

2.2. The MWGS-Based Array Algorithm for Computing the Information Matrix

Let us consider Equations (5)–(7) and (9). They allow one to compute the information matrix $Y_{k|k}$ at each discrete-time instant $k$. To improve the numerical robustness to machine round-off errors, we have proposed in [18] a new MWGS-based array algorithm for computing the information matrices and the information states in the IKF.
The MWGS-based array computations imply the use of the numerically stable modified weighted Gram–Schmidt (MWGS) orthogonalization procedure for updating the required quantities. (The MWGS outperforms the conventional Gram–Schmidt algorithm in computational accuracy [19].) In [18], we have used both the forward MWGS-LD and the backward MWGS-UD orthogonalization procedures.
Each iteration of these IKF implementations has the following form: given a pair of matrices $\{A, D_A\}$ (the so-called pre-arrays), compute a pair of matrices $\{B, D_B\}$ (the post-arrays) using the MWGS orthogonalization procedure
$$A^T = B W^T \qquad (10)$$
where the rectangular matrix $A \in \mathbb{R}^{r \times s}$ and the MWGS transformation matrix $W \in \mathbb{R}^{r \times s}$ ($r \ge s$) produce the block triangular matrix $B$ with 1's on the main diagonal. The matrix $B \in \mathbb{R}^{s \times s}$ is either an upper triangular block matrix $U$ or a lower triangular block matrix $L$ such that
$$A^T D_A A = B D_B B^T \quad \text{and} \quad W^T D_A W = D_B \qquad (11)$$
where the diagonal matrices are $D_A \in \mathbb{R}^{r \times r}$, $D_B \in \mathbb{R}^{s \times s}$, and $D_A > 0$; see ([10], Lemma VI.4.1) for details.
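For illustration, a minimal MATLAB sketch of the backward MWGS-UD variant of (10)–(11) is given below (after ([10], Lemma VI.4.1); the function name mwgs_ud is ours). The forward MWGS-LD variant is obtained analogously by processing the columns in ascending order, which yields a unit lower triangular factor. In exact arithmetic the columns of $W$ are $D_A$-orthogonal, which is what makes the post-arrays in the filtering algorithms below directly readable.

```matlab
function [U, D, W] = mwgs_ud(A, DA)
% Backward MWGS-UD orthogonalization: A' = U*W' and W'*DA*W = D, Eqs. (10)-(11).
% A : r-by-s pre-array (r >= s, full column rank); DA : r-by-r diagonal, DA > 0.
% U : s-by-s unit upper triangular; D : s-by-s diagonal; W : r-by-s.
  [r, s] = size(A);
  W = zeros(r, s);  U = eye(s);  D = zeros(s, s);
  for j = s:-1:1
      w = A(:, j);
      for k = j+1:s               % DA-orthogonalize against the columns built so far
          U(j, k) = (w' * DA * W(:, k)) / D(k, k);
          w = w - U(j, k) * W(:, k);
      end
      W(:, j) = w;
      D(j, j) = w' * DA * w;      % weighted squared norm of the new column
  end
end
```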
We have proved ([18], Statement 1) that Algorithm 1 is algebraically equivalent to the IKF given by Equations (4)–(9). Thus, we have obtained an alternative method for calculating the information matrix Y within the MWGS-based array algorithm. A significant advantage of Algorithm 1 is its numerical robustness to machine round-off errors. A detailed discussion can be found in [18,20].
Algorithm 1 [18]. The MWGS-based array IKF.
Initialization. Let $Y_0 = \Pi_0^{-1}$. Compute the modified Cholesky decomposition $Y_0 = B_{Y_0} D_{Y_0} B_{Y_0}^T$. (The modified Cholesky decomposition has the form $A = B_A D_A B_A^T$, where $A$ is a symmetric positive definite matrix, $D_A$ is a diagonal matrix, and $B_A$ is a unit triangular (lower or upper) matrix [1,11].) Set the initial values $\hat d_{0|0} = Y_0 \bar x_0$ and $B_{Y_{0|0}} = B_{Y_0}$, $D_{Y_{0|0}} = D_{Y_0}$.
▹ For $k = 0, \ldots, K-1$ do
I. Time Update. Apply the modified Cholesky decomposition for the process noise covariance matrix $Q_k = B_{Q_k} D_{Q_k} B_{Q_k}^T$. Compute the matrices $B_{Q_k}^{-1}$ and $D_{Q_k}^{-1}$. Find the MWGS factors $\{B_{Y_{k+1}}, D_{Y_{k+1}}\}$ of the matrix $Y_{k+1}$ as follows:
I.A. In the case of the forward MWGS-LD factorization (i.e., $B = L$), the following steps should be done:
$$\underbrace{\begin{bmatrix} L_{Q_k}^{-T} & G_k^T F_k^{-T} L_{Y_{k|k}} \\ 0 & F_k^{-T} L_{Y_{k|k}} \end{bmatrix}}_{\text{Pre-array } A_{TU}^T} = \underbrace{\begin{bmatrix} L_{C_k} & 0 \\ L_k L_{C_k} & L_{Y_{k+1}} \end{bmatrix}}_{\text{Post-array } L_{TU}} W_{TU}^T, \qquad (12)$$
$$W_{TU}^T \underbrace{\begin{bmatrix} D_{Q_k}^{-1} & 0 \\ 0 & D_{Y_{k|k}} \end{bmatrix}}_{\text{Pre-array } D_{A_{TU}}} W_{TU} = \underbrace{\begin{bmatrix} D_{C_k} & 0 \\ 0 & D_{Y_{k+1}} \end{bmatrix}}_{\text{Post-array } D_{L_{TU}}} \qquad (13)$$
where $W_{TU} \in \mathbb{R}^{(q+n) \times (q+n)}$ is the MWGS-LD transformation matrix, $L_{TU} \in \mathbb{R}^{(q+n) \times (q+n)}$ is the block unit lower triangular post-array, and $D_{L_{TU}} \in \mathbb{R}^{(q+n) \times (q+n)}$ is the diagonal post-array. The resulting MWGS-LD factors $\{L_{Y_{k+1}}, D_{Y_{k+1}}\}$ are simply read off from the post-arrays $L_{TU}$ and $D_{L_{TU}}$.
I.B. In the case of the backward MWGS-UD factorization (i.e., $B = U$), one has to follow the next steps:
$$\underbrace{\begin{bmatrix} F_k^{-T} U_{Y_{k|k}} & 0 \\ G_k^T F_k^{-T} U_{Y_{k|k}} & U_{Q_k}^{-T} \end{bmatrix}}_{\text{Pre-array } A_{TU}^T} = \underbrace{\begin{bmatrix} U_{Y_{k+1}} & L_k U_{C_k} \\ 0 & U_{C_k} \end{bmatrix}}_{\text{Post-array } U_{TU}} W_{TU}^T, \qquad (14)$$
$$W_{TU}^T \underbrace{\begin{bmatrix} D_{Y_{k|k}} & 0 \\ 0 & D_{Q_k}^{-1} \end{bmatrix}}_{\text{Pre-array } D_{A_{TU}}} W_{TU} = \underbrace{\begin{bmatrix} D_{Y_{k+1}} & 0 \\ 0 & D_{C_k} \end{bmatrix}}_{\text{Post-array } D_{U_{TU}}} \qquad (15)$$
where $W_{TU} \in \mathbb{R}^{(n+q) \times (n+q)}$ is the MWGS-UD transformation matrix, $U_{TU} \in \mathbb{R}^{(n+q) \times (n+q)}$ is the block unit upper triangular post-array, and $D_{U_{TU}} \in \mathbb{R}^{(n+q) \times (n+q)}$ is the diagonal post-array. Again, we can easily extract the required MWGS-UD factors $\{U_{Y_{k+1}}, D_{Y_{k+1}}\}$ from the post-arrays $U_{TU}$ and $D_{U_{TU}}$.
Given $\hat d_{k|k}$, find the predicted information state estimate:
$$\hat d_{k+1} = \left[I - (L_k B_{C_k}) B_{C_k}^{-1} G_k^T\right] F_k^{-T} \hat d_{k|k} \qquad (16)$$
where the matrix product $L_k B_{C_k}$ is directly extracted either from the post-array $L_{TU}$ in (12) or from $U_{TU}$ in (14).
II. Measurement Update. Apply the modified Cholesky factorization for the measurement noise covariance matrix $R_{k+1} = B_{R_{k+1}} D_{R_{k+1}} B_{R_{k+1}}^T$. Compute the matrices $B_{R_{k+1}}^{-1}$ and $D_{R_{k+1}}^{-1}$. Find the filtered MWGS factors $\{B_{Y_{k+1|k+1}}, D_{Y_{k+1|k+1}}\}$:
$$\underbrace{\begin{bmatrix} B_{Y_{k+1}} & H_{k+1}^T B_{R_{k+1}}^{-T} \end{bmatrix}}_{\text{Pre-array } A_{MU}^T} = \underbrace{B_{Y_{k+1|k+1}}}_{\text{Post-array } B_{MU}} W_{MU}^T, \qquad (17)$$
$$W_{MU}^T \underbrace{\begin{bmatrix} D_{Y_{k+1}} & 0 \\ 0 & D_{R_{k+1}}^{-1} \end{bmatrix}}_{\text{Pre-array } D_{A_{MU}}} W_{MU} = \underbrace{D_{Y_{k+1|k+1}}}_{\text{Post-array } D_{B_{MU}}} \qquad (18)$$
where $W_{MU} \in \mathbb{R}^{(n+m) \times n}$ is the MWGS transformation matrix, $B_{MU} \in \mathbb{R}^{n \times n}$ is the block triangular post-array, and $D_{B_{MU}} \in \mathbb{R}^{n \times n}$ is the diagonal post-array. The resulting post-arrays $B_{MU}$ and $D_{B_{MU}}$ are the MWGS factors $\{B_{Y_{k+1|k+1}}, D_{Y_{k+1|k+1}}\}$.
Next, compute the filtered estimate $\hat d_{k+1|k+1}$ by (8).
▹ End.
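To make the array mechanics concrete, the backward (UD) time update I.B may be sketched in MATLAB as follows, using the mwgs_ud routine sketched above; the variable names are ours.

```matlab
% Time update I.B of Algorithm 1 (backward UD version), Eqs. (14)-(16): a sketch.
% Inputs: UY, DY (MWGS factors of Y_{k|k}); UQ, DQ (factors of Q_k);
%         F, G (system matrices); d (information state estimate d_{k|k}).
n = size(UY, 1);  q = size(DQ, 1);
Fti  = inv(F');                               % F^{-T}
preA = [Fti * UY,      zeros(n, q);           % pre-array A_{TU}^T of (14)
        G' * Fti * UY, inv(UQ')];             % U_{Q_k}^{-T} block
preD = blkdiag(DY, inv(DQ));                  % pre-array D_{A_{TU}} of (15)
[Utu, Dtu, ~] = mwgs_ud(preA', preD);         % MWGS maps pre- into post-arrays
UYp  = Utu(1:n, 1:n);      DYp = Dtu(1:n, 1:n);          % factors of Y_{k+1}
LkUC = Utu(1:n, n+1:end);  UC  = Utu(n+1:end, n+1:end);  % L_k U_{C_k} and U_{C_k}
d = (eye(n) - (LkUC / UC) * G') * (Fti * d);             % (16): predicted d_{k+1}
```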

2.3. Algorithmic Differentiation of the MWGS-Based Arrays

When solving practical problems of parameter identification [21], the discrete-time linear stochastic model (1)–(2) is often parameterized. The latter means that the system matrices may depend on an unknown parameter $\theta$. Therefore, it should be estimated together with the hidden state vector $x_k$ given the measurements $z_k$. In this case, any parameter estimation scheme includes two components, namely: the filtering method for computing an identification criterion and the chosen optimization algorithm to identify the optimal value $\hat\theta$. Altogether, this is called the adaptive filtering scheme [22].
It is well known that gradient-based optimization algorithms converge fast and, therefore, they are the preferred methods for practical implementation [8]. They require the computation of the gradient of the identification criterion, which leads to the problem of computing the adaptive filter derivatives. The related vector- and matrix-type equations are called the filter sensitivity equations with respect to the unknown parameter $\theta$.
Consider the conventional information Kalman filter, presented by Equations (4)–(9), and the MWGS-based IKF (Algorithm 1). We can construct the matrix sensitivity equations for evaluating the information matrix in the conventional IKF by direct differentiation of (5)–(7) and (9). This solution is not hard, and it is as follows:
$$(Y_{k+1})'_\theta = \left[I - L_k G_k^T\right](A_k)'_\theta - \left[(L_k)'_\theta G_k^T + L_k (G_k^T)'_\theta\right] A_k, \qquad (19)$$
$$(A_k)'_\theta = (F_k^{-T})'_\theta Y_{k|k} F_k^{-1} + F_k^{-T} (Y_{k|k})'_\theta F_k^{-1} + F_k^{-T} Y_{k|k} (F_k^{-1})'_\theta, \qquad (20)$$
$$(L_k)'_\theta = (A_k)'_\theta G_k C_k^{-1} + A_k (G_k)'_\theta C_k^{-1} + A_k G_k (C_k^{-1})'_\theta, \qquad (21)$$
$$(C_k)'_\theta = (G_k^T)'_\theta A_k G_k + G_k^T (A_k)'_\theta G_k + G_k^T A_k (G_k)'_\theta + (Q_k^{-1})'_\theta, \qquad (22)$$
$$(Y_{k+1|k+1})'_\theta = (Y_{k+1})'_\theta + (H_{k+1}^T)'_\theta R_{k+1}^{-1} H_{k+1} + H_{k+1}^T (R_{k+1}^{-1})'_\theta H_{k+1} + H_{k+1}^T R_{k+1}^{-1} (H_{k+1})'_\theta. \qquad (23)$$
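In MATLAB-like form, one pass of the recursion (19)–(23) for a scalar parameter could read as follows. This is a sketch: Y and dY denote the filtered matrix $Y_{k|k}$ and its $\theta$-derivative, and dF, dG, dH, dQ, dR the derivatives of the corresponding system matrices, all assumed available.

```matlab
% One pass of the sensitivity recursion (19)-(23) for a scalar theta: a sketch.
n   = size(Y, 1);
Fi  = inv(F);   dFi = -Fi * dF * Fi;     % derivative of F^{-1}
Qi  = inv(Q);   dQi = -Qi * dQ * Qi;     % derivative of Q^{-1}
Ri  = inv(R);   dRi = -Ri * dR * Ri;     % derivative of R^{-1}
A   = Fi' * Y * Fi;
dA  = dFi' * Y * Fi + Fi' * dY * Fi + Fi' * Y * dFi;            % (20)
C   = G' * A * G + Qi;
dC  = dG' * A * G + G' * dA * G + G' * A * dG + dQi;            % (22)
Ci  = inv(C);   dCi = -Ci * dC * Ci;     % derivative of C^{-1}
L   = A * G * Ci;
dL  = dA * G * Ci + A * dG * Ci + A * G * dCi;                  % (21)
dYp = (eye(n) - L * G') * dA - (dL * G' + L * dG') * A;         % (19)
dYf = dYp + dH' * Ri * H + H' * dRi * H + H' * Ri * dH;         % (23)
```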
However, a corresponding solution is not obvious for the MWGS-based array IKF (Algorithm 1). Finding new computational methods for evaluating the derivatives of MWGS-factors of information matrix Y is the aim of our research.
Let us consider two of our methods for algorithmic differentiation of the MWGS-based arrays. They were proposed in [15,16].
Case 1. Consider the forward MWGS-LD orthogonalization procedure (10)–(11), where $B := L$ (lower triangular matrix) and $D_B := D_L$ (diagonal matrix).
Lemma 1
([16]). Let the entries of the pre-arrays $A$, $D_A$ in (10)–(11) be known differentiable functions of a parameter $\theta$. Consider the transformation (10)–(11) in Case 1. Given the derivatives of the pre-arrays $A'_\theta$ and $(D_A)'_\theta$, we can calculate the corresponding derivatives of the post-arrays:
$$L'_\theta = L \left[\bar L_0 + \bar L_2 + \bar U_0^T\right] D_L^{-1}, \quad (D_L)'_\theta = 2 D_0 + D_2 \qquad (24)$$
where $\bar L_0$, $D_0$, $\bar U_0$ are the strictly lower triangular, diagonal, and strictly upper triangular parts of the matrix product $W^T D_A A'_\theta L^{-T}$, respectively. $D_2$ and $\bar L_2$ are the diagonal and strictly lower triangular parts of the product $W^T (D_A)'_\theta W$.
Case 2. Consider the backward MWGS-UD orthogonalization procedure (10)–(11), where $B := U$ (upper triangular matrix) and $D_B := D_U$ (diagonal matrix).
Lemma 2
([15]). Let the entries of the pre-arrays $A$, $D_A$ in (10)–(11) be known differentiable functions of a parameter $\theta$. Consider the transformation (10)–(11) in Case 2. Given the derivatives of the pre-arrays $A'_\theta$ and $(D_A)'_\theta$, we can calculate the corresponding derivatives of the post-arrays:
$$U'_\theta = U \left[\bar L_0^T + \bar U_0 + \bar U_2\right] D_U^{-1}, \quad (D_U)'_\theta = 2 D_0 + D_2 \qquad (25)$$
where $\bar L_0$, $D_0$, $\bar U_0$ are the strictly lower triangular, diagonal, and strictly upper triangular parts of the matrix product $W^T D_A A'_\theta U^{-T}$, respectively. $D_2$ and $\bar U_2$ are the diagonal and strictly upper triangular parts of the product $W^T (D_A)'_\theta W$.
Applying Lemmas 1 and 2, we construct the corresponding algorithms for calculating the values of the derivatives of the MWGS-based arrays for given parameterized matrices $A(\theta)$ and $D_A(\theta)$.
Remark 1.
Function MWGS-LD($A$, $D_A$) implements the forward MWGS orthogonalization procedure.
Remark 2.
Function MWGS-UD($A$, $D_A$) implements the backward MWGS orthogonalization procedure.
Thus, computational Algorithms 2 and 3 have the following properties:
  • They allow calculating, at a given point, the values of the derivatives of the elements of the matrix factors obtained by the MWGS transformation of a pair of parameterized matrices. In this case, there is no need to calculate the values of the derivatives of the elements of the MWGS transformation matrix.
  • These algorithms require simple matrix addition and multiplication operations, and only one triangular and one diagonal matrix inversion. Therefore, they have a simple structure and are easy to implement in program code.
  • Their correctness is strictly mathematically proved [15,16].
  • Their performance has been confirmed by practical examples [15,23].
It should be noted here that the results of Lemma 2 and Algorithm 3 have been successfully applied in [24] for constructing an efficient UD-based algorithm for the computation of maximum likelihood sensitivity of continuous-discrete systems.
Algorithm 2. Diff_LD (LD-based derivative computation).
 ▹ Input data: $A(\theta) \in \mathbb{R}^{r \times s}$, $D_A(\theta) \in \mathbb{R}^{r \times r}$, $\theta = \hat\theta \in \mathbb{R}^p$, $A'_{\theta_i} = \left[\partial a_{kj}(\theta) / \partial \theta_i\right]$, $(D_A)'_{\theta_i} = \left[\partial d_k(\theta) / \partial \theta_i\right]$, $i = 1, \ldots, p$.
 ▹ Begin
 1. evaluate $A \leftarrow A(\hat\theta)$, $D_A \leftarrow D_A(\hat\theta)$;
 2. evaluate $A'_{\hat\theta_i} \leftarrow A'_{\theta_i}|_{\theta_i = \hat\theta_i}$, $(D_A)'_{\hat\theta_i} \leftarrow (D_A)'_{\theta_i}|_{\theta_i = \hat\theta_i}$;
 3. compute $[L, D_L, W] \leftarrow$ MWGS-LD($A$, $D_A$);
▹ For $i = 1, \ldots, p$ do
 4. compute $X \leftarrow W^T D_A A'_{\hat\theta_i} L^{-T}$;
 5. split $X$ into three parts: $[\bar L_0 + D_0 + \bar U_0] \leftarrow X$;
 6. compute $V \leftarrow W^T (D_A)'_{\hat\theta_i} W$;
 7. split $V$ into three parts: $[\bar L_2 + D_2 + \bar L_2^T] \leftarrow V$;
 8. obtain the result $(D_L)'_{\hat\theta_i} \leftarrow 2 D_0 + D_2$;
 9. obtain the result $L'_{\hat\theta_i} \leftarrow L \left[\bar L_0 + \bar L_2 + \bar U_0^T\right] D_L^{-1}$.
▹ End for
 ▹ End.
 ▹ Output data: $L \in \mathbb{R}^{s \times s}$, $D_L \in \mathbb{R}^{s \times s}$; $L'_{\hat\theta_i}$, $(D_L)'_{\hat\theta_i}$, $i = 1, \ldots, p$.
Algorithm 3. Diff_UD (UD-based derivative computation).
 ▹ Input data: $A(\theta) \in \mathbb{R}^{r \times s}$, $D_A(\theta) \in \mathbb{R}^{r \times r}$, $\theta = \hat\theta \in \mathbb{R}^p$, $A'_{\theta_i} = \left[\partial a_{kj}(\theta) / \partial \theta_i\right]$, $(D_A)'_{\theta_i} = \left[\partial d_k(\theta) / \partial \theta_i\right]$, $i = 1, \ldots, p$.
 ▹ Begin
 1. evaluate $A \leftarrow A(\hat\theta)$, $D_A \leftarrow D_A(\hat\theta)$;
 2. evaluate $A'_{\hat\theta_i} \leftarrow A'_{\theta_i}|_{\theta_i = \hat\theta_i}$, $(D_A)'_{\hat\theta_i} \leftarrow (D_A)'_{\theta_i}|_{\theta_i = \hat\theta_i}$;
 3. compute $[U, D_U, W] \leftarrow$ MWGS-UD($A$, $D_A$);
▹ For $i = 1, \ldots, p$ do
 4. compute $X \leftarrow W^T D_A A'_{\hat\theta_i} U^{-T}$;
 5. split $X$ into three parts: $[\bar L_0 + D_0 + \bar U_0] \leftarrow X$;
 6. compute $V \leftarrow W^T (D_A)'_{\hat\theta_i} W$;
 7. split $V$ into three parts: $[\bar U_2^T + D_2 + \bar U_2] \leftarrow V$;
 8. obtain the result $(D_U)'_{\hat\theta_i} \leftarrow 2 D_0 + D_2$;
 9. obtain the result $U'_{\hat\theta_i} \leftarrow U \left[\bar L_0^T + \bar U_0 + \bar U_2\right] D_U^{-1}$.
▹ End for
 ▹ End.
 ▹ Output data: $U \in \mathbb{R}^{s \times s}$, $D_U \in \mathbb{R}^{s \times s}$; $U'_{\hat\theta_i}$, $(D_U)'_{\hat\theta_i}$, $i = 1, \ldots, p$.
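Algorithm 3 translates into MATLAB almost line by line. A sketch for a scalar parameter ($p = 1$) follows; mwgs_ud is the routine sketched in Section 2.2, and tril/triu/diag perform the splitting of steps 5 and 7.

```matlab
function [U, DU, dU, dDU] = diff_ud(A, DA, dA, dDA)
% Algorithm 3 (Diff_UD) for a scalar parameter theta: a sketch.
% A, DA   : pre-arrays evaluated at theta-hat;
% dA, dDA : their theta-derivatives at theta-hat.
  [U, DU, W] = mwgs_ud(A, DA);                 % step 3
  X  = W' * DA * dA / (U');                    % step 4: W^T D_A A'_theta U^{-T}
  L0 = tril(X, -1);  D0 = diag(diag(X));       % step 5: strict lower / diagonal /
  U0 = triu(X,  1);                            %         strict upper parts of X
  V  = W' * dDA * W;                           % step 6
  D2 = diag(diag(V));  U2 = triu(V, 1);        % step 7: parts of the symmetric V
  dDU = 2 * D0 + D2;                           % step 8
  dU  = U * (L0' + U0 + U2) / DU;              % step 9 (DU is diagonal, nonsingular)
end
```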

3. Main Result

The New MWGS-Based Array Algorithm for Computing the Information Matrix Sensitivity Equations

Now, we are ready to present the main result: the new MWGS-based array algorithm for computing the information matrix sensitivity equations. We extend the functionality of Algorithm 1 so that it is able to calculate not only the values of the information matrix $Y$ using the MWGS-based arrays, but also the values of their derivatives.
Let us consider the given value of the parameter $\theta = \hat\theta$.
The new Algorithm 4 naturally extends any MWGS-based IKF implementation to the evaluation of the information matrix sensitivities.
Algorithm 4. The differentiated MWGS-based array.
Initialization. Let $\theta = \hat\theta$. Evaluate the initial value of the information matrix $Y_0 = Y_0(\hat\theta)$. Find $(Y_0)'_{\hat\theta_i}$, $i = 1, \ldots, p$. Apply the modified Cholesky factorization $Y_0 = B_{Y_0} D_{Y_0} B_{Y_0}^T$. Find $(B_{Y_0})'_{\hat\theta_i}$, $(D_{Y_0})'_{\hat\theta_i}$, $i = 1, \ldots, p$. Set the initial values $\{B_{Y_{0|0}} = B_{Y_0}, D_{Y_{0|0}} = D_{Y_0}\}$ and $\{(B_{Y_{0|0}})'_{\hat\theta_i} = (B_{Y_0})'_{\hat\theta_i}, (D_{Y_{0|0}})'_{\hat\theta_i} = (D_{Y_0})'_{\hat\theta_i}\}$.
▹ For $k = 0, \ldots, K-1$ do
I. Time Update.
I.1 Evaluate the matrices $\hat F_k = F_k(\hat\theta)$, $\hat G_k = G_k(\hat\theta)$, and $\hat Q_k = Q_k(\hat\theta)$. Find $(F_k)'_{\hat\theta_i}$, $(G_k)'_{\hat\theta_i}$, and $(Q_k)'_{\hat\theta_i}$, $i = 1, \ldots, p$.
I.2 Use the modified Cholesky decomposition for the matrices $\hat Q_k$ and $(Q_k)'_{\hat\theta_i}$ to find $\{B_{\hat Q_k}, D_{\hat Q_k}\}$ and $(B_{Q_k})'_{\hat\theta_i}$, $(D_{Q_k})'_{\hat\theta_i}$, $i = 1, \ldots, p$.
I.3 Given the MWGS factors $\{B_{Y_{k|k}}, D_{Y_{k|k}}\}$ and their derivatives $\{(B_{Y_{k|k}})'_{\hat\theta_i}, (D_{Y_{k|k}})'_{\hat\theta_i}\}$, find their predicted values $\{B_{Y_{k+1}}, D_{Y_{k+1}}\}$ and their derivatives $\{(B_{Y_{k+1}})'_{\hat\theta_i}, (D_{Y_{k+1}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$) as follows:
I.A. In the case of the forward MWGS-LD factorization (i.e., B = L ), the following steps should be taken:
  • Form the pre-arrays $\{A_{TU}^T, D_{A_{TU}}\}$ and their derivatives $\{(A_{TU}^T)'_{\hat\theta_i}, (D_{A_{TU}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$):
$$A_{TU}^T = \begin{bmatrix} L_{\hat Q_k}^{-T} & \hat G_k^T \hat F_k^{-T} L_{Y_{k|k}} \\ 0 & \hat F_k^{-T} L_{Y_{k|k}} \end{bmatrix}, \quad D_{A_{TU}} = \begin{bmatrix} D_{\hat Q_k}^{-1} & 0 \\ 0 & D_{Y_{k|k}} \end{bmatrix};$$
$$(A_{TU}^T)'_{\hat\theta_i} = \begin{bmatrix} (L_{\hat Q_k}^{-T})'_{\hat\theta_i} & (\hat G_k^T \hat F_k^{-T} L_{Y_{k|k}})'_{\hat\theta_i} \\ 0 & (\hat F_k^{-T} L_{Y_{k|k}})'_{\hat\theta_i} \end{bmatrix}, \quad (D_{A_{TU}})'_{\hat\theta_i} = \begin{bmatrix} (D_{\hat Q_k}^{-1})'_{\hat\theta_i} & 0 \\ 0 & (D_{Y_{k|k}})'_{\hat\theta_i} \end{bmatrix}.$$
  • Apply Algorithm 2, where $A := A_{TU}$, $D_A := D_{A_{TU}}$; $L := L_{TU}$, $D_L := D_{L_{TU}}$.
  • Obtain the post-arrays $\{L_{TU}, D_{L_{TU}}\}$ and their derivatives $\{(L_{TU})'_{\hat\theta_i}, (D_{L_{TU}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$):
$$L_{TU} = \begin{bmatrix} L_{C_k} & 0 \\ L_k L_{C_k} & L_{Y_{k+1}} \end{bmatrix}, \quad D_{L_{TU}} = \begin{bmatrix} D_{C_k} & 0 \\ 0 & D_{Y_{k+1}} \end{bmatrix};$$
$$(L_{TU})'_{\hat\theta_i} = \begin{bmatrix} (L_{C_k})'_{\hat\theta_i} & 0 \\ (L_k L_{C_k})'_{\hat\theta_i} & (L_{Y_{k+1}})'_{\hat\theta_i} \end{bmatrix}, \quad (D_{L_{TU}})'_{\hat\theta_i} = \begin{bmatrix} (D_{C_k})'_{\hat\theta_i} & 0 \\ 0 & (D_{Y_{k+1}})'_{\hat\theta_i} \end{bmatrix}.$$
  • Extract the matrices $\{L_{Y_{k+1}}, D_{Y_{k+1}}\}$ and $\{(L_{Y_{k+1}})'_{\hat\theta_i}, (D_{Y_{k+1}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$) from the post-arrays.
I.B. In the case of the backward MWGS-UD factorization (i.e., $B = U$), one has to take the next steps:
  • Form the pre-arrays $\{A_{TU}^T, D_{A_{TU}}\}$ and their derivatives $\{(A_{TU}^T)'_{\hat\theta_i}, (D_{A_{TU}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$):
$$A_{TU}^T = \begin{bmatrix} \hat F_k^{-T} U_{Y_{k|k}} & 0 \\ \hat G_k^T \hat F_k^{-T} U_{Y_{k|k}} & U_{\hat Q_k}^{-T} \end{bmatrix}, \quad D_{A_{TU}} = \begin{bmatrix} D_{Y_{k|k}} & 0 \\ 0 & D_{\hat Q_k}^{-1} \end{bmatrix};$$
$$(A_{TU}^T)'_{\hat\theta_i} = \begin{bmatrix} (\hat F_k^{-T} U_{Y_{k|k}})'_{\hat\theta_i} & 0 \\ (\hat G_k^T \hat F_k^{-T} U_{Y_{k|k}})'_{\hat\theta_i} & (U_{\hat Q_k}^{-T})'_{\hat\theta_i} \end{bmatrix}, \quad (D_{A_{TU}})'_{\hat\theta_i} = \begin{bmatrix} (D_{Y_{k|k}})'_{\hat\theta_i} & 0 \\ 0 & (D_{\hat Q_k}^{-1})'_{\hat\theta_i} \end{bmatrix}.$$
  • Apply Algorithm 3, where $A := A_{TU}$, $D_A := D_{A_{TU}}$; $U := U_{TU}$, $D_U := D_{U_{TU}}$.
  • Obtain the post-arrays $\{U_{TU}, D_{U_{TU}}\}$ and their derivatives $\{(U_{TU})'_{\hat\theta_i}, (D_{U_{TU}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$):
$$U_{TU} = \begin{bmatrix} U_{Y_{k+1}} & L_k U_{C_k} \\ 0 & U_{C_k} \end{bmatrix}, \quad D_{U_{TU}} = \begin{bmatrix} D_{Y_{k+1}} & 0 \\ 0 & D_{C_k} \end{bmatrix};$$
$$(U_{TU})'_{\hat\theta_i} = \begin{bmatrix} (U_{Y_{k+1}})'_{\hat\theta_i} & (L_k U_{C_k})'_{\hat\theta_i} \\ 0 & (U_{C_k})'_{\hat\theta_i} \end{bmatrix}, \quad (D_{U_{TU}})'_{\hat\theta_i} = \begin{bmatrix} (D_{Y_{k+1}})'_{\hat\theta_i} & 0 \\ 0 & (D_{C_k})'_{\hat\theta_i} \end{bmatrix}.$$
  • Extract the matrices $\{U_{Y_{k+1}}, D_{Y_{k+1}}\}$ and $\{(U_{Y_{k+1}})'_{\hat\theta_i}, (D_{Y_{k+1}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$) from the post-arrays.
II. Measurement Update.
II.1 Evaluate the matrices $\hat H_k = H_k(\hat\theta)$ and $\hat R_k = R_k(\hat\theta)$. Find $(H_k)'_{\hat\theta_i}$ and $(R_k)'_{\hat\theta_i}$, $i = 1, \ldots, p$.
II.2 Use the modified Cholesky decomposition for the matrices $\hat R_k$ and $(R_k)'_{\hat\theta_i}$ to find $\{B_{\hat R_k}, D_{\hat R_k}\}$ and $(B_{R_k})'_{\hat\theta_i}$, $(D_{R_k})'_{\hat\theta_i}$, $i = 1, \ldots, p$.
II.3 Given the MWGS factors $\{B_{Y_{k+1}}, D_{Y_{k+1}}\}$ and their derivatives $\{(B_{Y_{k+1}})'_{\hat\theta_i}, (D_{Y_{k+1}})'_{\hat\theta_i}\}$, find the corresponding pairs of matrices $\{B_{Y_{k+1|k+1}}, D_{Y_{k+1|k+1}}\}$ and $\{(B_{Y_{k+1|k+1}})'_{\hat\theta_i}, (D_{Y_{k+1|k+1}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$) as follows:
  • Form the pre-arrays $\{A_{MU}^T, D_{A_{MU}}\}$ and their derivatives $\{(A_{MU}^T)'_{\hat\theta_i}, (D_{A_{MU}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$):
$$A_{MU}^T = \begin{bmatrix} B_{Y_{k+1}} & \hat H_{k+1}^T B_{\hat R_{k+1}}^{-T} \end{bmatrix}, \quad D_{A_{MU}} = \begin{bmatrix} D_{Y_{k+1}} & 0 \\ 0 & D_{\hat R_{k+1}}^{-1} \end{bmatrix};$$
$$(A_{MU}^T)'_{\hat\theta_i} = \begin{bmatrix} (B_{Y_{k+1}})'_{\hat\theta_i} & (\hat H_{k+1}^T B_{\hat R_{k+1}}^{-T})'_{\hat\theta_i} \end{bmatrix}, \quad (D_{A_{MU}})'_{\hat\theta_i} = \begin{bmatrix} (D_{Y_{k+1}})'_{\hat\theta_i} & 0 \\ 0 & (D_{\hat R_{k+1}}^{-1})'_{\hat\theta_i} \end{bmatrix}.$$
  • Apply Algorithm 2 in Case 1 or Algorithm 3 in Case 2, where $A := A_{MU}$, $D_A := D_{A_{MU}}$; $B := B_{MU}$, $D_B := D_{B_{MU}}$.
  • Obtain the post-arrays $\{B_{MU}, D_{B_{MU}}\}$ and their derivatives $\{(B_{MU})'_{\hat\theta_i}, (D_{B_{MU}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$):
$$B_{MU} = B_{Y_{k+1|k+1}}, \quad D_{B_{MU}} = D_{Y_{k+1|k+1}};$$
$$(B_{MU})'_{\hat\theta_i} = (B_{Y_{k+1|k+1}})'_{\hat\theta_i}, \quad (D_{B_{MU}})'_{\hat\theta_i} = (D_{Y_{k+1|k+1}})'_{\hat\theta_i}.$$
  • Extract the matrices $\{B_{Y_{k+1|k+1}}, D_{Y_{k+1|k+1}}\}$ and $\{(B_{Y_{k+1|k+1}})'_{\hat\theta_i}, (D_{Y_{k+1|k+1}})'_{\hat\theta_i}\}$ ($i = 1, \ldots, p$) from the post-arrays.

▹ End.
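For instance, the measurement update II of Algorithm 4 in the UD case reduces to filling the pre-arrays and their derivatives and calling Algorithm 3. A MATLAB sketch for a scalar parameter, building on the diff_ud function above (variable names are ours), is:

```matlab
% Measurement update of Algorithm 4 (UD case, scalar theta): a sketch.
% UY, DY, dUY, dDY : predicted factors of Y_{k+1} and their derivatives;
% BRti = B_R^{-T} and DRi = D_R^{-1} for R_{k+1} = B_R D_R B_R^T, with
% derivatives dBRti, dDRi;  H, dH : H_{k+1} and its derivative.
preA  = [UY,  H' * BRti];                    % pre-array A_{MU}^T of (17)
preD  = blkdiag(DY, DRi);                    % pre-array D_{A_{MU}} of (18)
dpreA = [dUY, dH' * BRti + H' * dBRti];      % derivative of the pre-array
dpreD = blkdiag(dDY, dDRi);
[UYf, DYf, dUYf, dDYf] = diff_ud(preA', preD, dpreA', dpreD);
% {UYf, DYf}: filtered factors of Y_{k+1|k+1}; {dUYf, dDYf}: their derivatives.
```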

4. Discussion

4.1. Implementation Details of Algorithm 4

Let us consider the computational scheme of the constructed algorithm in detail. The new Algorithm 4 is built on one of the two variants of the MWGS transformation and the corresponding method of algorithmic differentiation of that orthogonal transformation. Therefore, Algorithms 2 and 3 can be considered as the basic computational tools (or technologies) for the implementation of Algorithm 4. From this point of view, the software implementation of the new algorithm is simple and understandable. It consists of only three steps:
  • Fill in block pre-arrays with available data.
  • Execute an algorithm for calculating derivatives in a matrix MWGS transformation of one of the types corresponding to Case 1 or Case 2.
  • As a result, get block post-arrays and read off the required results from them in the form of matrix blocks.
The implementation scheme of the measurement update step in Algorithm 4 based on the MWGS-UD transformation is shown in Figure 1. Similarly, a general scheme for the MWGS-LD transformation can also be represented.
The computational complexity of the novel Algorithm 4 is mainly determined by the computational complexity of Algorithm 1, i.e., the MWGS-based information-form Kalman filtering algorithm. A detailed analysis of its computational complexity is given in ([18], Section 5.2).
It was shown that the conventional IKF and Algorithm 1 have complexity of the same order. However, the IKF requires four full matrix inversions while calculating the information matrix $Y$, whereas Algorithm 1 requires only one full matrix inversion, that of the matrix $F_k$. Besides, if the matrices $Q_k$ and $R_k$ are positive definite and do not depend on $k$, then the modified Cholesky decomposition needs to be performed only once, at the initialization step of the MWGS-based algorithm. If the matrix $F_k$ is nonsingular and also does not depend on $k$, then the inversion of the matrix $F$ also needs to be performed only once, i.e., at the initialization step of the MWGS-based algorithm. Algorithm 4 additionally requires only one inversion of a unit triangular matrix and one inversion of a diagonal matrix (see Algorithm 2 or 3). Therefore, no additional inversions of full matrices are required.
To summarize, the information matrix sensitivity evaluation based on the conventional IKF (Equations (5)–(7) and (19)–(23)) requires eight inversions of full matrices, while the new Algorithm 4 avoids full matrix inversions altogether and requires only the inversion of unit triangular and diagonal matrices. Thus, we can conclude that the newly proposed algorithm is computationally efficient compared to the conventional IKF.

4.2. Application of the Results Obtained to the Problem of Parameter Identification

In practice, the matrices characterizing the discrete-time linear stochastic system (1)–(2) are often known up to certain parameters. Consider the important problem of parameter identification [21]. Assume that the elements of the system matrices $F_k \in \mathbb{R}^{n \times n}$, $G_k \in \mathbb{R}^{n \times q}$, $H_k \in \mathbb{R}^{m \times n}$, $Q_k \in \mathbb{R}^{q \times q}$, $R_k \in \mathbb{R}^{m \times m}$, and $\Pi_0 \in \mathbb{R}^{n \times n}$ are functions of an unknown system parameter vector $\theta \in \mathbb{R}^p$, which needs to be identified. For the sake of simplicity, instead of $F_k(\theta)$, $G_k(\theta)$, $H_k(\theta)$, etc., we will write $F_k$, $G_k$, $H_k$, etc.
We wish to demonstrate how the new Algorithm 4 can be applied to solve the parameter identification problem for a practical stochastic system model.
Consider the instrument error model for one channel of the INS (Inertial Navigation System) given as follows [25]:
$$\begin{bmatrix} \Delta v_x \\ \beta \\ m_{Ax} \\ n_{Gy} \end{bmatrix}_{k+1} = \begin{bmatrix} 1 & -\tau g & \tau & 0 \\ \tau/a & 1 & 0 & \tau \\ 0 & 0 & b_1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \Delta v_x \\ \beta \\ m_{Ax} \\ n_{Gy} \end{bmatrix}_k + \begin{bmatrix} 0 \\ 0 \\ a_1 \\ 0 \end{bmatrix} w_k, \qquad (38)$$
$$z_{k+1} = (\Delta v_x)_{k+1} + v_{k+1} \qquad (39)$$
where $w_k \sim \mathcal{N}(0, 1)$, $v_k \sim \mathcal{N}(0, 0.01)$, $x_0 \sim \mathcal{N}(0, I_4)$, and the subscripts $x$, $y$, $A$, $G$ denote "axis $Ox$", "axis $Oy$", "Accelerometer", and "Gyro", respectively.
The state vector $x_k = \left[\Delta v_x, \beta, m_{Ax}, n_{Gy}\right]^T$, where the first element is the random error in reading the velocity along axis $Ox$ of a GSP, the second element is the angular error in determining the local vertical, the third one is the accelerometer reading random error, and the fourth one is the gyro constant drift rate.
The constants $\tau$, $g$, $a$ are, respectively, equal to the rate of data arrival and processing, the gravity acceleration, and the semi-major axis of the Earth. The quantities $b_1 = e^{-\gamma\tau} \approx 1 - \gamma\tau$ and $a_1 = H_1\sqrt{1 - b_1^2} \approx H_1\sqrt{2\gamma\tau}$. The constants $H_1$ and $\gamma$ are elements of the accepted model of the correlation function $R_{m_{Ax}}(\tau) = H_1^2 e^{-\gamma|\tau|}$. Numerical values of the model parameters are given in Table 1.
Equations (38) and (39) correspond to the general model (1)–(2). Note that, in our case, none of the system model matrices depends on $k$.
Let us suppose that parameter γ is unknown and needs to be identified. This means that the model parameter θ = γ , and therefore F = F ( θ ) , G = G ( θ ) .
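Under this parameterization, the matrices $F(\gamma)$, $G(\gamma)$ of (38) and their derivatives with respect to $\theta = \gamma$ can be formed in closed form. A MATLAB sketch using the exact expressions $b_1 = e^{-\gamma\tau}$ and $a_1 = H_1\sqrt{1 - b_1^2}$ (rather than their first-order approximations) is given below; the derivative formulas follow by direct differentiation.

```matlab
% Parameterized INS-model matrices (38) and their gamma-derivatives: a sketch.
tau = 1;  g = 9.81;  a = 0.6378245e7;  H1 = 0.10e-3;   % Table 1 values
Fmat = @(gm) [1, -tau*g, tau, 0;  tau/a, 1, 0, tau; ...
              0, 0, exp(-gm*tau), 0;  0, 0, 0, 1];
Gmat = @(gm) [0; 0; H1*sqrt(1 - exp(-2*gm*tau)); 0];
% db1/dgamma = -tau*b1;  da1/dgamma = tau*H1*b1^2 / sqrt(1 - b1^2).
dF = @(gm) [zeros(2,4); 0, 0, -tau*exp(-gm*tau), 0; zeros(1,4)];
dG = @(gm) [0; 0; tau*H1*exp(-2*gm*tau)/sqrt(1 - exp(-2*gm*tau)); 0];
```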
Solving the problem of parameter identification, we use the Active Principle of Adaptation (APA) [26,27,28,29], which consists of constructing an Auxiliary Performance Index (API) [23,28,30] and minimizing it with the use of a gradient-based numerical procedure.
The APA approach to system adaptation under parameter uncertainty differs in that it suggests an indirect state prediction error control in the API form. The API has to satisfy two main requirements:
  • it depends on the system observable values only;
  • it attains its minimum coincidently with the Original Performance Index (OPI).
The API satisfies the relation
$$\text{API} = \text{OPI} + \text{const} \qquad (40)$$
if the OPI is defined as the expected (Euclidean) norm of the state prediction error. Thus, the API and the OPI have one and the same minimizing argument $\theta^*$.
In order to construct the API, we build a Standard Observable Model (SOM), i.e., we perform the corresponding transformation of the basis in the state space from the representation (1)–(2). The model
$$\begin{aligned} x^*_{k+1} &= F^*(\theta) x^*_k + G^*(\theta) w_k, \quad k \ge 0, \\ z_{k+1} &= H^* x^*_{k+1} + v_{k+1} \end{aligned} \qquad (41)$$
is equivalent to the original model (1)–(2) and is its canonical representation, where $x^*_k$ is the new state vector, and $F^*(\theta)$, $G^*(\theta)$, and $H^*$ are matrices of the following form:
$$G^* = \tau a_1 \begin{bmatrix} 0 \\ 1 \\ 1 + b_1 \\ 1 - \rho + b_1 + b_1^2 \end{bmatrix}, \quad H^* = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}, \quad \rho = \tau^2 g / a, \qquad (42)$$
$$F^* = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \alpha_4 & \alpha_3 & \alpha_2 & \alpha_1 \end{bmatrix}, \quad \begin{aligned} \alpha_4 &= -b_1(1 + \rho), & \alpha_3 &= b_1(3 + \rho) + (1 + \rho), \\ \alpha_2 &= -\left[3 b_1 + (3 + \rho)\right], & \alpha_1 &= b_1 + 3. \end{aligned} \qquad (43)$$
From the representation (42)–(43) it follows that the maximum observability index of the system is $\ell = 4$.
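In code, the SOM matrices (42)–(43) are assembled directly from $b_1$ and $\rho$. A MATLAB sketch follows, assuming the Table 1 constants tau, g, a, H1 and the current value gm of $\gamma$ are in scope (gm is our name, chosen to avoid shadowing MATLAB's gamma function):

```matlab
% Standard Observable Model matrices (42)-(43) for a given gamma: a sketch.
b1  = exp(-gm*tau);
a1  = H1*sqrt(1 - b1^2);
rho = tau^2 * g / a;
alpha1 =  b1 + 3;
alpha2 = -(3*b1 + (3 + rho));
alpha3 =  b1*(3 + rho) + (1 + rho);
alpha4 = -b1*(1 + rho);
Fstar = [0 1 0 0; 0 0 1 0; 0 0 0 1; alpha4 alpha3 alpha2 alpha1];
Gstar = tau*a1 * [0; 1; 1 + b1; 1 - rho + b1 + b1^2];
Hstar = [1 0 0 0];
```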
Using the results of [26,27,28], we construct the auxiliary performance index (API)
$$J_\varepsilon(\theta) \triangleq \frac{1}{2} E\left\{\varepsilon_k(\theta)^T \varepsilon_k(\theta)\right\}, \qquad (44)$$
for which the auxiliary process is written in the form
$$\varepsilon_k = S(Z_{k-\ell+1}^k) - \hat x_{k-\ell+1} = \left[z_{k-3}, z_{k-2}, z_{k-1}, z_k\right]^T - \hat x_{k-3} \qquad (45)$$
where $\hat x_k$ is the prediction state estimate obtained in an adaptive filter.
Next, to get a reasonable estimate for (44), we replace it by the realizable (workable) performance index
$$J_\varepsilon(\theta, K) = \frac{1}{2K} \sum_{j=k-K+1}^{k} \varepsilon_j(\theta)^T \varepsilon_j(\theta). \qquad (46)$$
With the purpose of finding the optimal value $\theta^*$ of the unknown parameter $\theta$ and minimizing the API (46), we can use existing methods of numerical optimization. Moreover, all non-search methods require the calculation of the API gradient. Assume that a gradient-based optimization method is chosen for the parameter identification procedure. Then, from (46), we can write the expression for calculating the API gradient:
$$\frac{\partial J_\varepsilon(\theta, K)}{\partial \theta} = \frac{1}{K} \sum_{j=k-K+1}^{k} \varepsilon_j^T(\theta) \, \frac{\partial \varepsilon_j(\theta)}{\partial \theta}. \qquad (47)$$
Evaluating (47) requires the computation of the sensitivities (partial derivatives) of the auxiliary process $\varepsilon_j$ with respect to the adjustable parameter $\theta$. Various methods for calculating the sensitivities based on the standard Kalman filter are discussed in detail in [8].
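Once the auxiliary process and its sensitivities are available, (46) and (47) are simple accumulations. A MATLAB sketch (Eps and dEps are our names for arrays whose columns hold $\varepsilon_j$ and $\partial\varepsilon_j/\partial\theta$ for $j = k-K+1, \ldots, k$, as produced by the adaptive filter and by Algorithm 4, respectively) is:

```matlab
% Assembling the API (46) and its gradient (47) from the auxiliary process.
J  = sum(sum(Eps .* Eps))  / (2 * K);        % (46)
dJ = sum(sum(Eps .* dEps)) / K;              % (47)
```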
Now, we are ready to demonstrate how the new Algorithm 4 can be applied for constructing a computational scheme for the numerical identification of the parameter θ based on the API approach.
Consider again the system model (42)–(43). Denote for simplicity $F = F^*(\theta)$, $G = G^*(\theta)$, $H = H^*$, $\bar x_0 = \bar x_0^*(\theta)$, $\Pi_0 = \Pi_0^*(\theta)$. Let $\theta = \gamma$, $Y_0 = \Pi_0^{-1}$, $\hat d_{0|0} = Y_0 \bar x_0$.
The identification of the unknown parameter $\theta$ and the estimation of the state vector $x_k$ of the system (38)–(39) according to the criterion
$$\theta^* = \hat\theta_{\min} = \arg\min_{\theta \in D(\theta)} J_\varepsilon(\theta, K)$$
can be performed simultaneously by the following algorithm (Algorithm 5).
Let us consider the practical application of Algorithm 5 to identify the unknown value of $\gamma$ in the model (38)–(39). We have simulated the sequence of output signals $Z_1^{200}$ with the "true" value $\gamma = 0.0002$. All computer codes were constructed in MATLAB.
In order to conduct our numerical experiments, we have implemented Algorithms 1–5 as the corresponding MATLAB functions. Then, we have calculated the API (46) and the API gradient (47) for different values of $\gamma$. The results are illustrated in Figure 2 and Figure 3.
Algorithm 5. The API-based parameter identification computational scheme.
 BEGIN
✓ Assign an initial parameter estimate $\hat\theta_i$ for $i = 0$.
   REPEAT
1. Take the current point $\hat\theta_i$.
2. Evaluate the system matrices for (42)–(43) at $\hat\theta_i$: $F^* := F^*(\hat\theta_i)$, $G^* := G^*(\hat\theta_i)$.
3. Evaluate the matrix derivatives $(F^*)'_\theta$ and $(G^*)'_\theta$ at $\hat\theta_i$.
4. Given the output data $Z_1^K = [z_1, \ldots, z_K]^T$, compute the value of the API at $\hat\theta_i$ using (46), where the auxiliary process $\varepsilon_j(\hat\theta_i)$ in (45) can be evaluated as follows:
$$\varepsilon_j(\hat\theta_i) = Z_{j-3}^j - \hat x_{j-3} = Z_{j-3}^j - Y_{j-3}^{-1} \hat d_{j-3}.$$
The information matrix $Y_j = B_{Y_j} D_{Y_j} B_{Y_j}^T$ and the information state estimate $\hat d_j$ can be computed according to Algorithm 1 (the MWGS-based array IKF).
5. Evaluate the API gradient (47) at $\hat\theta_i$ using the results of Steps 3 and 4 and Algorithm 4, where
$$\frac{\partial \varepsilon_j(\theta)}{\partial \theta} = -\left[(Y_{j-3}^{-1})'_\theta \, \hat d_{j-3} + Y_{j-3}^{-1} \, (\hat d_{j-3})'_\theta\right]$$
and $(Y_j)'_\theta = (B_{Y_j} D_{Y_j} B_{Y_j}^T)'_\theta$ are the partial derivatives of the information matrix $Y_j$. The partial derivative vector $(\hat d_j)'_\theta$ can be evaluated by direct differentiation of (16).
6. Find $\hat\theta_{i+1}$ by the gradient-based optimization procedure
$$\hat\theta_{i+1} = \hat\theta_i - \beta_i \left. \frac{\partial J_\varepsilon(\theta, K)}{\partial \theta} \right|_{\theta = \hat\theta_i}$$
where the scalar step size parameter $\beta_i$ is designed to ensure that $J_\varepsilon(\hat\theta_{i+1}, K) \le J_\varepsilon(\hat\theta_i, K) + e$ (small $e > 0$).
7. $i := i + 1$.
   UNTIL a stopping criterion is satisfied.
 END
As can be seen from these figures, the minimum point of the API coincides with the true value of the parameter $\theta = \gamma$. Furthermore, the plot of the API gradient has negative values to the left and positive values to the right of its zero crossing, which corresponds to the minimum of the API. All this evidence substantiates our theoretical derivations.
Further, to solve the parameter identification problem, we apply the MATLAB Optimization Toolbox with the built-in function fminunc, which implements a gradient-type method. Algorithm 5 was incorporated into the optimization method fminunc to compute the API and its gradient. We have chosen the initial value $\hat\theta_0 = 0.001$ and the stopping criteria epsf $= 10^{-6}$, epsx $= 10^{-6}$.
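A driver for this experiment might look as follows. This is a sketch: api_and_gradient is a hypothetical wrapper returning the API value (46) and its gradient (47) via Algorithm 5, and we assume the stopping criteria epsf, epsx map onto the FunctionTolerance and StepTolerance options of the Optimization Toolbox.

```matlab
% Parameter identification driver: a sketch.
theta0 = 0.001;                               % initial estimate (Section 4.2)
opts = optimoptions('fminunc', ...
    'SpecifyObjectiveGradient', true, ...     % use the analytic gradient (47)
    'FunctionTolerance', 1e-6, ...            % plays the role of epsf
    'StepTolerance',     1e-6);               % plays the role of epsx
% api_and_gradient(theta, Z) returns [J, dJ] computed by Algorithm 5 on data Z.
thetaHat = fminunc(@(theta) api_and_gradient(theta, Z), theta0, opts);
```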
Results summarized in Table 2 show that the computed estimate θ ^ comes close to the true parameter value γ .
So, we conclude that the newly constructed Algorithm 4 can be efficiently applied to solve the parameter identification problem when the gradient-based optimization method is used.

5. Conclusions

This paper presents the new MWGS-based array algorithm for computing the information matrix sensitivity equations. We have extended the functionality of the MWGS-based information-form Kalman filtering methods so that they are able to calculate not only the values of the information matrix using the MWGS-based arrays, but also the values of their derivatives. The proposed algorithm is robust to machine round-off errors due to the application of the MWGS orthogonalization procedure at each step.
Moreover, we have demonstrated how the new Algorithm 4 can be applied to solve the parameter identification problem for a practical stochastic system model, i.e., a simplified version of the instrument error model of the INS. We have also suggested the new API-based parameter identification computational scheme. Numerical experiments conducted in MATLAB confirm the efficiency of the proposed solution.

Author Contributions

Conceptualization, A.T. and J.T.; methodology, A.T. and J.T.; software, A.T.; validation, J.T.; formal analysis, A.T. and J.T.; investigation, A.T. and J.T.; resources, A.T. and J.T.; data curation, A.T. and J.T.; writing—original draft preparation, J.T.; writing—review and editing, A.T. and J.T.; visualization, A.T. and J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

MWGS: Modified weighted Gram–Schmidt transformation
MWGS-LD: Forward MWGS procedure
MWGS-UD: Backward MWGS procedure
KF: Kalman filter
IKF: Information Kalman filter
TU: Time update step
MU: Measurement update step
APA: Active principle of adaptation
API: Auxiliary Performance Index

References

  1. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 1983.
  2. Giles, M. An Extended Collection of Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation; Report 08/01; Oxford University Computing Laboratory: Oxford, UK, 2008; 23p.
  3. Dieci, L.; Eirola, T. Applications of Smooth Orthogonal Factorizations of Matrices. In The IMA Volumes in Mathematics and Its Applications; Springer: New York, NY, USA, 2000; Volume 119, pp. 141–162.
  4. Dieci, L.; Russell, R.D.; Van Vleck, E.S. On the Computation of Lyapunov Exponents for Continuous Dynamical Systems. SIAM J. Numer. Anal. 1997, 34, 402–423.
  5. Kunkel, P.; Mehrmann, V. Smooth factorizations of matrix valued functions and their derivatives. Numer. Math. 1991, 60, 115–131.
  6. Dieci, L. On smooth decompositions of matrices. SIAM J. Matrix Anal. Appl. 1999, 20, 800–819.
  7. Åström, K.-J. Maximum Likelihood and Prediction Error Methods. Automatica 1980, 16, 551–574.
  8. Gupta, N.K.; Mehra, R.K. Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations. IEEE Trans. Autom. Control 1974, AC-19, 774–783.
  9. Walter, S.F. Structured Higher-Order Algorithmic Differentiation in the Forward and Reverse Mode with Application in Optimum Experimental Design. Ph.D. Thesis, Humboldt-Universität zu Berlin, Berlin, Germany, 2011.
  10. Bierman, G.J. Factorization Methods for Discrete Sequential Estimation; Academic Press: New York, NY, USA, 1977.
  11. Grewal, M.S.; Andrews, A.P. Kalman Filtering: Theory and Practice Using MATLAB, 4th ed.; John Wiley & Sons, Inc.: New York, NY, USA, 2015.
  12. Kailath, T.; Sayed, A.H.; Hassibi, B. Linear Estimation; Prentice Hall: Hoboken, NJ, USA, 2000.
  13. Gibbs, B.P. Advanced Kalman Filtering, Least-Squares and Modeling: A Practical Handbook; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2011.
  14. Grewal, M.S.; Weill, L.R.; Andrews, A.P. Global Positioning Systems, Inertial Navigation, and Integration; John Wiley & Sons, Inc.: New York, NY, USA, 2001.
  15. Tsyganova, J.V.; Kulikova, M.V. State sensitivity evaluation within UD based array covariance filters. IEEE Trans. Autom. Control 2013, 58, 2944–2950.
  16. Tsyganova, Y.V.; Tsyganov, A.V. On the Computation of Derivatives within LD Factorization of Parametrized Matrices. Bull. Irkutsk State Univ. Ser. Math. 2018, 23, 64–79. (In Russian)
  17. Kaminski, P.G.; Bryson, A.E.; Schmidt, S.F. Discrete square-root filtering: A survey of current techniques. IEEE Trans. Autom. Control 1971, AC-16, 727–735.
  18. Tsyganova, J.V.; Kulikova, M.V.; Tsyganov, A.V. A general approach for designing the MWGS-based information-form Kalman filtering methods. Eur. J. Control 2020, 56, 86–97.
  19. Björck, A. Solving linear least squares problems by Gram–Schmidt orthogonalization. BIT Numer. Math. 1967, 7, 1–21.
  20. Tsyganova, J.V.; Kulikova, M.V.; Tsyganov, A.V. Some New Array Information Formulations of the UD-based Kalman Filter. In Proceedings of the 18th European Control Conference (ECC), Napoli, Italy, 25–28 June 2019; pp. 1872–1877.
  21. Ljung, L. System Identification: Theory for the User, 2nd ed.; Prentice Hall PTR: Upper Saddle River, NJ, USA, 1999.
  22. Kulikova, M.V. Likelihood Gradient Evaluation Using Square-Root Covariance Filters. IEEE Trans. Autom. Control 2009, 54, 646–651.
  23. Semushin, I.V.; Tsyganova, J.V.; Tsyganov, A.V. Numerically Efficient LD-computations for the Auxiliary Performance Index Based Control Optimization under Uncertainties. IFAC-PapersOnLine 2018, 51, 568–573.
  24. Boiroux, D.; Juhl, R.; Madsen, H.; Jørgensen, J.B. An Efficient UD-Based Algorithm for the Computation of Maximum Likelihood Sensitivity of Continuous-Discrete Systems. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 3048–3053.
  25. Broxmeyer, C. Inertial Navigation Systems; McGraw-Hill Book Company: Boston, MA, USA, 1956.
  26. Semushin, I.V. Adaptation in Stochastic Dynamic Systems—Survey and New Results I. Int. J. Commun. Netw. Syst. Sci. 2011, 4, 17–23.
  27. Semushin, I.V. Adaptation in Stochastic Dynamic Systems—Survey and New Results II. Int. J. Commun. Netw. Syst. Sci. 2011, 4, 266–285.
  28. Semushin, I.V.; Tsyganova, J.V. Adaptation in Stochastic Dynamic Systems—Survey and New Results IV: Seeking Minimum of API in Parameters of Data. Int. J. Commun. Netw. Syst. Sci. 2013, 6, 513–518.
  29. Semushin, I.V. The APA based time-variant system identification. In Proceedings of the 53rd IEEE Conference on Decision and Control (CDC), Los Angeles, CA, USA, 15–17 December 2014; pp. 4137–4141.
  30. Tsyganova, Y.V. Computing the gradient of the auxiliary quality functional in the parametric identification problem for stochastic systems. Autom. Remote Control 2011, 72, 1925–1940.
Figure 1. Implementation scheme of the measurement update step in Algorithm 4 based on the MWGS-UD transformation.
Figure 2. The values of the API identification criterion $J_\varepsilon(\theta, K)$ depending on the values of the parameter $\theta$, calculated using Algorithms 1 and 5.
Figure 3. The values of the API gradient $\partial J_\varepsilon(\theta, K) / \partial \theta$ depending on the values of the parameter $\theta$, calculated using Algorithms 4 and 5.
Table 1. Numerical values of parameters.
$\tau$ = 1 s
$g$ = 9.81 m/s²
$a$ = $0.6378245 \cdot 10^7$ m
$b_1$ = $e^{-\gamma\tau} \approx 1 - \gamma\tau$
$a_1$ = $H_1\sqrt{1 - b_1^2} \approx H_1\sqrt{2\gamma\tau}$
$n_{Gy}$ = $0.48 \cdot 10^{-6}$ rad/s
$H_1$ = $0.10 \cdot 10^{-3}$ m/s²
$\gamma$ = $0.20 \cdot 10^{-3}$ s⁻¹
$v_k$ = $\sigma \xi_k$, $\xi_k \sim \mathcal{N}(0, 1)$
$\sigma$ = 0.1 m/s
$w_k$: $w_k \sim \mathcal{N}(0, 1)$
Table 2. Performance of the API-based identification of the model parameter $\gamma$.
"True" value $\gamma$: 0.0002
Resulting estimate $\hat\theta_{\min}$: $1.9999 \cdot 10^{-4}$
$J_\varepsilon(\hat\theta_{\min}, K)$: 0.0196
Relative estimation error $\|\gamma - \hat\theta_{\min}\| / \|\hat\theta_{\min}\|$: $7.8173 \cdot 10^{-6}$
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
