Optimal Filtering of Markov Jump Processes Given Observations with State-Dependent Noises: Exact Solution and Stable Numerical Schemes

Borisov, Andrey; Sokolov, Igor

doi:10.3390/math8040506


by Andrey Borisov ¹,* and Igor Sokolov ²

¹ Institute of Informatics Problems of Federal Research Center “Computer Science and Control” RAS, 44/2 Vavilova str., 119333 Moscow, Russia

² Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, 1-52 Leninskiye Gory, 119991 Moscow, Russia

* Author to whom correspondence should be addressed.

Mathematics 2020, 8(4), 506; https://doi.org/10.3390/math8040506

Submission received: 14 March 2020 / Revised: 29 March 2020 / Accepted: 30 March 2020 / Published: 2 April 2020

(This article belongs to the Special Issue Stability Problems for Stochastic Models: Theory and Applications)


Abstract

The paper is devoted to the optimal state filtering of finite-state Markov jump processes, given indirect continuous-time observations corrupted by Wiener noise. The crucial feature is that the observation noise intensity is a function of the estimated state, which rules out straightforward filtering approaches based on the passage to the innovation process and Girsanov’s measure change. We propose an equivalent observation transform, which allows the use of the classical nonlinear filtering framework. We obtain the optimal estimate as a solution to a discrete–continuous stochastic differential system with both continuous and counting processes on the right-hand side. For effective computer realization, we present a new class of numerical algorithms based on the exact solution to the optimal filtering problem given the time-discretized observations. The proposed estimate approximations are stable, i.e., have non-negative components and satisfy the normalization condition. We prove the assertions characterizing the approximation accuracy depending on the observation system parameters, the time discretization step, the maximal number of allowed state transitions, and the applied scheme of numerical integration.

Keywords:

stochastic differential observation system; nonlinear filtering problem; state-dependent observation noise; numerical filtering algorithm; filtering given time-discretized observations; stable approximation; approximation accuracy

1. Introduction

The Wonham filter [1], as well as the Kalman–Bucy filter [2], is one of the most widely used filtering algorithms for the states of stochastic differential observation systems. It is applied extensively for signal processing in engineering, communications, finance and economics, biology, medicine, etc. [3,4,5,6]. The filter provides the on-line estimate, optimal in the Mean Square (MS) sense, of a finite-state Markov Jump Process (MJP) given indirect continuous-time observations corrupted by Wiener noise. The elegant algorithm represents the desired estimate as a solution to a Stochastic Differential System (SDS) with continuous random processes on the Right-Hand Side (RHS).

The fundamental condition for the solution to the filtering problem is the independence of the observation noise intensity from the estimated state. It provides the right-continuity of the natural flow of $σ$-algebras induced by the observations, with subsequent utilization of the innovation process framework. Violation of the condition destroys these advantages. In the case of state-dependent observation noise, the author of [7] presents the optimal estimate within the class of linear estimates. Further, the authors of [8,9] use filters of a linear structure to solve the $H_{2}$-optimal state filtering problem. To find the absolutely optimal filtering estimate, one has to make extra efforts. First, for proper utilization of the stochastic analysis framework, one needs to reformulate the optimal filtering problem, “smoothing forward” the flow of $σ$-algebras induced by the observations. Second, in the case of state-dependent noise, the innovation process contains less information than the original observations. One has to supplement the innovation by the observation quadratic characteristic, which represents a continuous-time noiseless function of the estimated MJP state. In general, optimal filtering given partially noiseless observations is a challenging problem. Its solution can be expressed either as a sequence of regularized estimates [10] or by additional differentiation of the smooth observation components or their quadratic characteristics [11,12,13,14]. In both cases, one needs to realize a limit passage, which is difficult to implement on a computer.

Even in the traditional setting, the numerical realization of MJP state filtering is a complicated problem. For example, the explicit numerical methods based on the Itô–Taylor expansion applied to the Wonham filter equation diverge: the produced approximations violate the component-wise non-negativity condition, and over time the approximation components reach arbitrarily large absolute values. In what follows, we refer to approximations that preserve both the component-wise non-negativity and the normalization condition as stable.

The Wonham filtering equation is a particular case of the nonlinear Kushner–Stratonovich equation. To solve it, one can use various numerical algorithms:

the procedures based on the weak approximation of the original processes by Markov chains [15,16],
some variants of the splitting methods [17],
the robust procedures based on the Clark transform [18,19],
the schemes, which represent the conditional probability distributions through the logarithm [20], etc.

All these algorithms are developed for the case of additive observation noise and are based on Girsanov’s measure transform. Hence, they are inapplicable to the estimation of the MJP given observations with state-dependent noise.

The goal of the paper is two-fold. First, it presents a theoretical solution to the MS-optimal filtering problem given the observations with state-dependent noise. Second, it introduces a new class of stable numerical algorithms for the filter realization and investigates their accuracy. We organize the paper as follows. Section 2 contains a description of the studied observation system with state-dependent observation noise along with the MS-optimal filtering problem statement. To solve the problem, one needs to transform the available observations so as both to preserve the information equivalence and to make the known results of optimal nonlinear filtering applicable. Section 3 describes both the observation transformation and the SDS defining the optimal filtering estimate. The SDS is discrete–continuous and contains both continuous and counting random processes on the RHS. Previously, the author of the note [21] presented a sketch of the observation transform, but it cannot guarantee the uniqueness of the solution to that SDS.

Section 4 presents a new class of stable numerical algorithms of nonlinear filtering. The main idea is to discretize the original continuous-time observations and then find the MS-optimal filtering estimate given the sampled observations. The authors of [22] use this idea to solve a particular case of the estimation problem, namely the classification problem of a finite-state random vector given continuous-time observations with multiplicative noise. Section 4.1 contains a general solution to the problem. The corresponding estimate is a ratio whose numerator and denominator are infinite sums of integrals. They are shift-scale mixtures of Gaussians. The mixing distributions, in turn, describe the occupation time of the system state in each admissible value during the time discretization interval. In Section 4.2, we suggest approximating the estimates by a convergent sequence obtained by bounding the number s of possible state transitions that occur over the discretization interval. We replace the infinite sums in the formula of the optimal estimate by their finite analogs and also investigate the accuracy of the approximations. We refer to these approximations as the analytical ones of the s-th order. One cannot calculate the integrals analytically and has to replace them with some integral sums, which brings an extra error. Section 4.3 analyzes the value of this error and the total distance between the optimal filtering estimate given the discretized observations and its numerical realization. Section 4.4 presents a numerical example that illustrates the conformity of the theoretical estimates and their numerical realization. Section 5 contains a discussion and concluding remarks.

2. Continuous-Time Filtering Problem Statement

On the probability triplet with filtration $(Ω, \mathcal{F}, P, \{\mathcal{F}_{t}\}_{t \geq 0})$ we consider the observation system

\begin{matrix} X_{t} = X_{0} + \int_{0}^{t} Λ^{⊤} (s) X_{s} d s + M_{t}^{X}, \end{matrix}

(1)

\begin{matrix} Y_{t} = \int_{0}^{t} f (s) X_{s} d s + \int_{0}^{t} \sum_{n = 1}^{N} X_{s}^{n} G_{n}^{1 / 2} (s) d W_{s} . \end{matrix}

(2)

Here

$X_{t} = col (X_{t}^{1}, \dots, X_{t}^{N}) \in S^{N}$ is an unobservable state, which is a finite-state Markov jump process (MJP) with the state space $S^{N} ≜ \{e_{1}, \dots, e_{N}\}$ ($S^{N}$ stands for the set of all unit coordinate vectors of the Euclidean space $R^{N}$), the transition intensity matrix $Λ (t)$ and the initial distribution $π = col (π^{1}, \dots, π^{N})$; the process $M_{t}^{X}$ is an $\mathcal{F}_{t}$-adapted martingale;
$Y_{t} = col (Y_{t}^{1}, \dots, Y_{t}^{M}) \in R^{M}$ is an observation process; $W_{t} = col (W_{t}^{1}, \dots, W_{t}^{M}) \in R^{M}$ is an $\mathcal{F}_{t}$-adapted standard Wiener process characterizing the observation noise, $f (t)$ is an $M \times N$-dimensional observation matrix, and the collection of $M \times M$-dimensional matrices $\{G_{n} (t)\}_{n = \overline{1, N}}$ defines the conditional observation noise intensities given $X_{t} = e_{n}$.
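
To make the model (1), (2) concrete, the following minimal sketch simulates one trajectory on a fine grid. It is an illustration only and rests on assumptions that are not part of the paper: the system is time-invariant, the observation is scalar ($M = 1$, so `g[n]` plays the role of $G_{n}$), the state is frozen over each grid step, and the jump probability over a step is approximated to first order. The function name `simulate` and all parameter values are hypothetical.

```python
import math
import random

def simulate(lam, f, g, pi0, h, n_steps, rng):
    """Simulate a time-invariant MJP and scalar observations with state-dependent noise.

    lam[i][j] : transition intensity from state i to state j (rows sum to zero)
    f[n], g[n]: observation drift and noise intensity G_n while X = e_n
    Returns the state indices and the observation path Y on the grid k*h.
    """
    n_states = len(pi0)
    state = rng.choices(range(n_states), weights=pi0)[0]
    states, y = [state], [0.0]
    for _ in range(n_steps):
        # Increment of (2): f X dt + G^{1/2}(X) dW, with the state frozen over the step.
        dw = rng.gauss(0.0, math.sqrt(h))
        y.append(y[-1] + f[state] * h + math.sqrt(g[state]) * dw)
        # First-order approximation of the probability to leave the current state.
        if rng.random() < -lam[state][state] * h:
            others = [j for j in range(n_states) if j != state]
            weights = [lam[state][j] for j in others]
            state = rng.choices(others, weights=weights)[0]
        states.append(state)
    return states, y

rng = random.Random(0)
lam = [[-0.5, 0.5], [1.0, -1.0]]
states, y = simulate(lam, f=[1.0, -1.0], g=[0.25, 1.0], pi0=[0.5, 0.5],
                     h=1e-3, n_steps=5000, rng=rng)
```

The noise scale `math.sqrt(g[state])` changes with the hidden state, which is exactly the feature that distinguishes (2) from the classical Wonham observation model.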

The natural flow of $σ$-algebras generated by the observations $Y$ up to the moment $t$ is denoted by $\mathcal{Y}_{t} ≜ σ \{Y_{s} : s \in [0, t]\}$, $\mathcal{Y}_{0} ≜ \{\emptyset, Ω\}$.

The optimal state filtering problem given the observations $Y$ is to find the Conditional Mathematical Expectation (CME)

\hat{X}_{t} ≜ E \{X_{t} | \mathcal{Y}_{t +}\} .

(3)

3. Observation Transform and Optimal Filtering Equation

Before derivation of the optimal filtering equation we specify the properties of the observation system (1) and (2).

All trajectories of $\{X_{t}\}_{t \geq 0}$ are continuous from the right and have finite limits from the left, i.e., are càdlàg processes.
Nonrandom matrix-valued functions $Λ (t)$, $f (t)$ and $\{G_{n} (t)\}_{n = \overline{1, N}}$ consist of càdlàg components.
The noises in $Y$ are uniformly nondegenerate [10], i.e., $\min_{\substack{1 \leq n \leq N \\ t \geq 0}} G_{n} (t) > α I$ for some $α > 0$; hereinafter, $I$ is a unit matrix of appropriate dimensionality.
The processes

$K_{i j} (t) ≜ I_{\{0\}} (G_{i} (t) - G_{j} (t)), \quad i, j = \overline{1, N}$

(4)

have a finite variation; hereinafter, $I_{A} (x)$ is the indicator function of the set $A$, and $0$ is a zero matrix of appropriate dimensionality.

Conditions 1–3 are standard for filtering problems [10]. They guarantee the proper description of the MJP distribution $π (t) ≜ E \{X_{t}\}$ by the Kolmogorov system $π (t) = π + \int_{0}^{t} Λ^{⊤} (s) π (s) d s$. Condition 4 relates to the quadratic characteristic of the observation process, which is a key information source in itself. Below we show that a collection of functions $G_{n} (\cdot)$ that are distinct for different $n$ allows one to restore the state $X_{t}$ precisely given the available noisy observations. Condition 4 guarantees the local regularity of the time subsets where the functions $G_{n} (\cdot)$ coincide or differ from each other: one can express them as finite unions of intervals. The condition is not too restrictive: for instance, it is valid when $G_{n} (\cdot)$ are piecewise continuous with bounded derivatives.
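
As a small illustration of the Kolmogorov system $π (t) = π + \int_{0}^{t} Λ^{⊤} (s) π (s) d s$, the sketch below integrates it numerically. It assumes a constant intensity matrix and uses the classical fourth-order Runge–Kutta scheme, which is a choice made here for illustration and not a method taken from the paper; the names `kolmogorov_rhs` and `propagate` are hypothetical.

```python
def kolmogorov_rhs(lam, p):
    """Right-hand side Lambda^T p of the Kolmogorov system."""
    n = len(p)
    return [sum(lam[i][j] * p[i] for i in range(n)) for j in range(n)]

def propagate(lam, p0, t, n_steps=1000):
    """Integrate d p / d t = Lambda^T p on [0, t] with the classical RK4 scheme."""
    dt = t / n_steps
    p = list(p0)
    for _ in range(n_steps):
        k1 = kolmogorov_rhs(lam, p)
        k2 = kolmogorov_rhs(lam, [x + 0.5 * dt * k for x, k in zip(p, k1)])
        k3 = kolmogorov_rhs(lam, [x + 0.5 * dt * k for x, k in zip(p, k2)])
        k4 = kolmogorov_rhs(lam, [x + dt * k for x, k in zip(p, k3)])
        p = [x + dt * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

lam = [[-0.5, 0.5], [1.0, -1.0]]
p_late = propagate(lam, [1.0, 0.0], t=20.0, n_steps=2000)
```

For this two-state example the distribution approaches the stationary vector $(2/3, 1/3)$, and the components keep summing to one because the rows of $Λ$ sum to zero.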

Both the system state and observation are special square-integrable semimartingales [6,23] with the predictable characteristics

\begin{matrix} {〈 X, X 〉}_{t} ≜ X_{t} X_{t}^{⊤} - \int_{0}^{t} X_{s -} d X_{s}^{⊤} - \int_{0}^{t} d X_{s} X_{s -}^{⊤} = \\ = \int_{0}^{t} (diag (Λ^{⊤} (s) X_{s}) - Λ^{⊤} (s) diag X_{s} - diag (X_{s}) Λ (s)) d s \end{matrix}

(5)

and

{〈 Y, Y 〉}_{t} ≜ Y_{t} Y_{t}^{⊤} - \int_{0}^{t} Y_{s -} d Y_{s}^{⊤} - \int_{0}^{t} d Y_{s} Y_{s -}^{⊤} = \sum_{n = 1}^{N} \int_{0}^{t} X_{s}^{n} G_{n} (s) d s .

(6)

Conditions 1–3 and the properties of $X_{t}$ guarantee the $P$-a.s. fulfilment of the following equalities for the one-sided derivatives of ${〈 Y, Y 〉}_{t}$:

\begin{matrix} \frac{d {〈 Y, Y 〉}_{s}}{d s} |_{s = t -} = \sum_{n = 1}^{N} X_{t -}^{n} G_{n} (t -) = \sum_{n = 1}^{N} X_{t}^{n} G_{n} (t -), \\ \frac{d {〈 Y, Y 〉}_{s}}{d s} |_{s = t +} = \sum_{n = 1}^{N} X_{t -}^{n} (G_{n} (t -) + Δ G_{n} (t)) = \sum_{n = 1}^{N} X_{t}^{n} G_{n} (t), \end{matrix}

(7)

where $Δ G_{n} (t) ≜ G_{n} (t) - G_{n} (t -)$ is the jump function of $G_{n} (t)$. So, if there exists a nonrandom instant $t^{*} > 0$ such that $\sum_{n = 1}^{N} π^{n} (t^{*}) Δ G_{n} (t^{*}) \neq 0$, then $\mathcal{Y}_{t^{*}} \subset \mathcal{Y}_{t^{*} +} = \mathcal{Y}_{t^{*}} \lor σ \{\sum_{n = 1}^{N} X_{t^{*}}^{n} Δ G_{n} (t^{*})\}$. The inclusion implies that the flow of $σ$-subalgebras $\{\mathcal{Y}_{t}\}_{t \geq 0}$ is not necessarily right-continuous for the considered observations [24]. This is the reason to define the filtering estimate as the CME of $X_{t}$ with respect to the “smoothed” flow $\mathcal{Y}_{t +}$ for the subsequent correct usage of the stochastic analysis framework.

Let us transform the available observations in such a way that the optimal filtering estimate can be derived by the standard methods [6,23]. The idea of this transform was initially suggested in [11], whose authors introduce the pair

U_{t} ≜ \int_{0}^{t} {(\frac{d {〈 Y, Y 〉}_{u}}{d u} |_{u = s +})}^{- 1 / 2} d Y_{s},

(8)

{〈 Y, Y 〉}_{t} = \sum_{n = 1}^{N} \int_{0}^{t} X_{s}^{n} G_{n} (s) d s .

(9)

The authors of [11] prove the coincidence of the $σ$-algebras $\mathcal{Y}_{t} = σ \{U_{s}, 0 \leq s \leq t\} \lor σ \{{〈 Y, Y 〉}_{s}, 0 \leq s \leq t\}$ for general diffusion observation systems. However, they do not pay attention to the right-continuity of $\{\mathcal{Y}_{t}\}$. The authors of [12,14] suggest replacing the observations ${〈 Y, Y 〉}_{t}$ by their derivative

Q (t) ≜ \frac{d {〈 Y, Y 〉}_{s}}{d s} |_{s = t -} = \sum_{n = 1}^{N} X_{t -}^{n} G_{n} (t -) .

(10)

Then, one can construct the optimal estimate either by using $Q_{t}$ as a linear constraint or by differentiating (10) to extract the dynamic noises. The papers [12,14] contain a rather pessimistic conclusion: the number of differentiations is unbounded in the general case of a diffusion observation system. In contrast, we estimate a finite-state MJP and can construct the optimal filtering estimate using $Q$ without additional differentiation.

So, the transformed observations will contain:

diffusion processes with the unit diffusion,
counting stochastic processes,
indirect state observations obtained at the nonrandom discrete moments.

The first part of the transformed observations is the process $U_{t}$ (8); in view of (2) and (7) it can be rewritten as

U_{t} = \int_{0}^{t} \bar{f} (s) X_{s} d s + {\bar{W}}_{t},

(11)

where $\bar{f} (s) ≜ \sum_{n = 1}^{N} G_{n}^{- 1 / 2} (s) f (s) diag (e_{n})$ and ${\bar{W}}_{t}$ is an $\mathcal{F}_{t}$-adapted standard Wiener process [10].
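
The definition of $\bar{f}$ says that its $n$-th column is $G_{n}^{- 1 / 2} f e_{n}$, i.e., each column of $f$ is whitened by the noise intensity of the corresponding state. The sketch below computes it under a simplifying assumption introduced here only for illustration: every $G_{n}$ is diagonal, so the inverse square root is taken entry-wise. The name `normalized_observation_matrix` and the sample numbers are hypothetical.

```python
import math

def normalized_observation_matrix(f, g_diag):
    """Return the M x N matrix whose n-th column is G_n^{-1/2} f e_n.

    f      : M x N nested list (observation matrix)
    g_diag : g_diag[n] holds the M diagonal entries of G_n (diagonal G_n is assumed)
    """
    m, n = len(f), len(f[0])
    return [[f[i][k] / math.sqrt(g_diag[k][i]) for k in range(n)] for i in range(m)]

f = [[1.0, -1.0],
     [0.5, 2.0]]
g_diag = [[0.25, 4.0],   # diagonal of G_1
          [1.0, 1.0]]    # diagonal of G_2
f_bar = normalized_observation_matrix(f, g_diag)
```

For non-diagonal $G_{n}$ one would replace the entry-wise square root by a symmetric matrix inverse square root, for example through an eigendecomposition.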

The process $Q_{t}$ could play the role of the second part of the transformed observations since $\mathcal{Y}_{t} = σ \{U_{s}, Q_{s}, s \in [0, t]\}$ [11]; however, the natural flow of $σ$-algebras generated by the couple $(U, Q)$ is still not right-continuous. Moreover, the process $Q_{t}$ is matrix-valued and looks redundant for the filter derivation. The point is that $Q_{t} = Q (t, X_{t -})$ (10) is a function of the finite-set argument $X_{t}$, and it affects the estimate performance through its complete preimage

Q_{t} = Q (t, X_{t -}) \overset{Q^{- 1}}{\to} \{e_{n} \in S^{N} : G_{n} (t -) e_{n} = Q_{t}\} .

To pass to the preimage, we introduce the following transformation of $Q_{t}$:

H_{t} ≜ \sum_{n = 1}^{N} I_{\{0\}} (Q_{t} - G_{n} (t)) e_{n} .

$H_{t}$ is a $\mathcal{Y}_{t}$-adapted vector process with components 0 or 1, but its trajectories are not càdlàg. Since $X_{t -} = X_{t}$ $P$-a.s. for any $t \geq 0$, the equalities below are valid:

H_{t} = \sum_{n, k = 1}^{N} I_{\{0\}} (G_{k} (t) - G_{n} (t)) X_{t}^{k} e_{n} = K (t) X_{t} = K (t) X_{t -} \quad P\text{-a.s.},

(12)

where $K (t)$ is the $N \times N$-dimensional matrix with the components (4).

The function $K (t)$ has the following properties.

$K (t) \equiv K^{⊤} (t)$ for any $t \geq 0$.
The number of jumps of $K (\cdot)$ in any finite time interval is finite due to condition 4.
$K (t)$ is not a càdlàg function [25].
$P \{∥ Δ K (t) ∥ ∥ Δ X_{t} ∥ > 0\} = 0$ for any $t \geq 0$.
For any $t \geq 0$ there exists a transformation $T (t)$ such that the matrix $T (t) K (t)$ is trapezoidal with orthogonal rows and with 0 and 1 as the components.
$P \{T (t) H_{t} \in S^{N}\} = 1$ for any $t \geq 0$.
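
The construction of $K$ and of a transformation $T$ can be illustrated at a fixed time instant. The sketch below assumes scalar noise intensities and uses one simple admissible choice, $T = diag (r)$, where $r$ marks the smallest index in every group of states with coinciding intensities. This particular $T$ is an assumption of the example: it gives orthogonal 0/1 rows and maps $H = K X$ to a unit vector (property 6), but it does not reorder the rows into a trapezoidal form. All function names are hypothetical.

```python
def coincidence_matrix(g):
    """K_ij = 1 if G_i and G_j coincide, else 0 (scalar intensities assumed)."""
    n = len(g)
    return [[1 if g[i] == g[j] else 0 for j in range(n)] for i in range(n)]

def representative_selector(k):
    """Diagonal of T: 1 for the smallest index of each group of coinciding intensities."""
    n = len(k)
    return [1 if all(k[i][j] == 0 for j in range(i)) else 0 for i in range(n)]

def observable_state(k, r, state):
    """V = T K e_state: the unit vector labelling the group that contains the state."""
    h = [row[state] for row in k]              # H = K e_state
    return [ri * hi for ri, hi in zip(r, h)]   # T H with T = diag(r)

g = [1.0, 2.0, 1.0, 3.0]                       # states 0 and 2 share the same intensity
K = coincidence_matrix(g)
r = representative_selector(K)
V = [observable_state(K, r, n) for n in range(len(g))]
```

States 0 and 2 produce the same vector $V$, so the quadratic characteristic alone cannot separate them; states 1 and 3 are identified exactly.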

Let us define a $\mathcal{Y}_{t +}$-adapted process $V_{t} = col (V_{t}^{1}, \dots, V_{t}^{N})$ with càdlàg trajectories:

V_{t} ≜ T (t +) H_{t +} .

(13)

From (12) and (13) it follows that $V_{t} = J (t) X_{t}$ $P$-a.s., where $J (t) ≜ T (t +) K (t +)$.

We denote the set of discontinuity points of the process $V$ by $\mathcal{V}$; $\mathcal{X}$ stands for the set of discontinuity points of $X$, and $\mathcal{J}$ for the analogous set of the function $J$. The sets $\mathcal{V}$ and $\mathcal{X}$ are random; in contrast, $\mathcal{J}$ is nonrandom. The process $V_{t}$ is purely discontinuous, and due to property 4 it can be rewritten in the form

\begin{matrix} V_{t} = J (0) X_{0} + \sum_{κ \in \mathcal{V} : κ \leq t} Δ V_{κ} = J (0) X_{0} + \sum_{κ \in \mathcal{J} : κ \leq t} Δ J (κ) X_{κ} + \sum_{κ \in \mathcal{V} \setminus \mathcal{J} : κ \leq t} J (κ) Δ X_{κ} = \\ = J (0) X_{0} + \sum_{κ \in \mathcal{J} : κ \leq t} Δ J (κ) X_{κ} + \sum_{κ \in \mathcal{X} : κ \leq t} J (κ) Δ X_{κ} = \underbrace{J (0) X_{0} + \sum_{κ \in \mathcal{J} : κ \leq t} Δ J (κ) X_{κ}}_{≜ D_{t}} + \underbrace{\int_{0}^{t} J (s) d X_{s}}_{≜ R_{t}} . \end{matrix}

(14)

By definition, $V_{t} \in S^{N}$ for any $t \geq 0$. The process $D_{t}$ characterizes the observable jumps at the nonrandom moments caused by the changes of $J (t)$, and $R_{t}$ is the observable part of the jumps of the state $X_{t}$, which occur at random instants.

As the second part of the transformed observations, we choose the $N$-dimensional random process $C_{t} ≜ col (C_{t}^{1}, \dots, C_{t}^{N})$: the component $C_{t}^{n}$ counts the jumps of the process $V_{t}$ into the state $e_{n}$ that occur at random instants over the interval $[0, t]$:

C_{t}^{n} = \int_{0}^{t} (1 - e_{n}^{⊤} V_{s -}) e_{n}^{⊤} d R_{s} .

(15)

The third part of the transformed observations is the $N$-dimensional process $D_{t}$ with jumps at the nonrandom moments.

Lemma 1.

If $\bar{\mathcal{Y}}_{t} ≜ σ \{(U_{s}, C_{s}, D_{s}), s \in [0, t]\}$, then the coincidence $\bar{\mathcal{Y}}_{t} = \mathcal{Y}_{t +}$ holds for any $t \geq 0$.

The assertion of the Lemma follows immediately from the fact that the composite process $(U_{t}, C_{t}, D_{t})$ is constructed to be $\mathcal{Y}_{t +}$-adapted, and from the one-to-one correspondence of the $(U, C, D)$ and $Y$ paths:

\begin{cases} U_{t} = \int_{0}^{t} {(\frac{d {〈 Y, Y 〉}_{u}}{d u} |_{u = s +})}^{- 1 / 2} d Y_{s}, \\ C_{t} = \int_{0}^{t} (I - diag V_{s -}) d V_{s} - \sum_{κ \in \mathcal{J} : κ \leq t} (I - diag V_{κ -}) Δ V_{κ}, \\ D_{t} = \sum_{κ \in \mathcal{J} : κ \leq t} (I - diag V_{κ -}) Δ V_{κ}, \\ V_{t} = T (t +) H_{t +}, \\ H_{t} ≜ \sum_{n = 1}^{N} I_{\{0\}} (\frac{d {〈 Y, Y 〉}_{s}}{d s} |_{s = t -} - G_{n} (t)) e_{n}, \end{cases}

(16)

\begin{cases} V_{t} = D_{t} + \int_{0}^{t} \sum_{(i, j) : i \neq j}^{N} V_{s -}^{i} (e_{j} - e_{i}) d C_{s}^{j}, \\ Y_{t} = \int_{0}^{t} \sum_{n = 1}^{N} V_{s}^{n} G_{n}^{1 / 2} (s) d U_{s} . \end{cases}

(17)

Below we use the following notations: $1$ is a row vector of appropriate dimensionality formed by units, $J_{n} (s) ≜ e_{n}^{⊤} J (s)$ is the $n$-th row of the matrix $J (s)$,

Γ_{n} (s) ≜ diag (J_{n} (s)) Λ^{⊤} (s) (I - diag J_{n} (s)) .

(18)

Lemma 2.

The process $C_{t} = col (C_{t}^{1}, \dots, C_{t}^{N})$ has the following properties.

1. The $n$-th component $C_{t}^{n}$ admits the martingale representation

$C_{t}^{n} = \int_{0}^{t} 1 Γ_{n} (s) X_{s} d s + \int_{0}^{t} (1 - J_{n} (s) X_{s -}) J_{n} (s) d M_{s}^{X} .$

2. ${[C^{n}, C^{m}]}_{t} \equiv 0$ for any $n \neq m$;

${〈 C^{n}, C^{n} 〉}_{t} = \int_{0}^{t} 1 Γ_{n} (s) X_{s} d s .$

(19)

3. The innovation processes

$ν_{t}^{n} ≜ \int_{0}^{t} (d C_{s}^{n} - 1 Γ_{n} (s) {\hat{X}}_{s} d s), \quad n = \overline{1, N}$

(20)

are $\bar{\mathcal{Y}}_{t}$-adapted martingales with the quadratic characteristics

${〈 ν^{n}, ν^{n} 〉}_{t} = \int_{0}^{t} 1 Γ_{n} (s) {\hat{X}}_{s} d s .$

(21)

Proof of Lemma 2 is given in Appendix A.

Finally, the transformed observations $(U, C, D)$ take the form

\begin{cases} U_{t} = \int_{0}^{t} \bar{f} (s) X_{s} d s + {\bar{W}}_{t}, \\ C_{t}^{n} = \int_{0}^{t} 1 Γ_{n} (s) X_{s} d s + \int_{0}^{t} (1 - J_{n} (s) X_{s -}) J_{n} (s) d M_{s}^{X}, \quad n = \overline{1, N}, \\ D_{t} = J (0) X_{0} + \sum_{κ \in \mathcal{J} : κ \leq t} Δ J (κ) X_{κ} . \end{cases}

(22)

Theorem 1.

The optimal filtering estimate ${\hat{X}}_{t}$ is a strong solution to the SDS

\begin{matrix} {\hat{X}}_{t} = {({(D_{0})}^{⊤} J (0) π_{0})}^{+} diag (D_{0}) J (0) π_{0} + \int_{0}^{t} Λ^{⊤} (s) {\hat{X}}_{s} d s + \int_{0}^{t} (diag {\hat{X}}_{s} - {\hat{X}}_{s} {\hat{X}}_{s}^{⊤}) {\bar{f}}^{⊤} (s) d ω_{s} + \\ + \sum_{n = 1}^{N} \int_{0}^{t} (Γ_{n} (s) - 1 Γ_{n} (s) {\hat{X}}_{s -} I) {\hat{X}}_{s -} {(1 Γ_{n} (s) {\hat{X}}_{s -})}^{+} d ν_{s}^{n} + \\ + \sum_{κ \in \mathcal{J} : κ \leq t} ({(Δ D_{κ}^{⊤} Δ J (κ) {\hat{X}}_{κ -})}^{+} diag (Δ D_{κ}) Δ J (κ) - I) {\hat{X}}_{κ -}, \end{matrix}

(23)

where

ω_{t} ≜ U_{t} - \int_{0}^{t} \bar{f} (s) {\hat{X}}_{s} d s

(24)

and $A^{+}$ is the Moore–Penrose pseudoinverse of $A$. The solution is unique within the class of nonnegative piecewise-continuous $\mathcal{Y}_{t +}$-adapted processes with the discontinuity set lying in $\mathcal{V}$.

Proof of Theorem 1 is given in Appendix B.

The transformed observations (22) along with Theorem 1 suggest a condition of exact identifiability of the state $X_{t}$ given the indirect noisy observations $Y_{t}$ (2).

Corollary 1.

If for any $n \neq m$ ($n, m = \overline{1, N}$) the inequalities $G_{n} (s) \neq G_{m} (s)$ hold almost everywhere on $[0, t]$, then ${\hat{X}}_{t} = X_{t}$ $P$-a.s., and $X_{t}$ is the solution to SDS (23).

The proof of Corollary 1 is given in Appendix C.
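
The mechanism behind Corollary 1 can be illustrated numerically: when all noise intensities differ, the slope of the quadratic characteristic reveals the state. The sketch below is a pre-limit illustration and not the exact transform of Section 3. It assumes scalar observations, a state that stays constant over the averaging window, and a realized-variance estimate of $d 〈 Y, Y 〉_{t} / d t$; the names `realized_intensity` and `decode_state` and all numbers are hypothetical.

```python
import math
import random

def realized_intensity(increments, dt):
    """Estimate d<Y,Y>/dt over a window as the mean squared increment divided by dt."""
    return sum(d * d for d in increments) / (len(increments) * dt)

def decode_state(increments, dt, g):
    """Pick the state whose noise intensity G_n is closest to the realized one."""
    q = realized_intensity(increments, dt)
    return min(range(len(g)), key=lambda n: abs(g[n] - q))

rng = random.Random(1)
f, g = [1.0, -1.0], [0.25, 1.0]
dt, n_increments = 1e-4, 2000

decoded = []
for true_state in (0, 1):
    # Observation increments f X dt + G^{1/2}(X) dW with the state held fixed.
    incs = [f[true_state] * dt + math.sqrt(g[true_state] * dt) * rng.gauss(0.0, 1.0)
            for _ in range(n_increments)]
    decoded.append(decode_state(incs, dt, g))
```

The window length trades accuracy against delay: a finite window gives only an approximation of the derivative, which is the practical difficulty of the limit passage in the observation transform.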

4. Numerical Algorithms of Optimal Filtering

4.1. Optimal Filtering Given Discretized Observations

The previous section contains the stochastic system (23) defining the optimal filtering estimate ${\hat{X}}_{t}$. The problem of its numerical realization seems routine: we should apply the corresponding methods of numerical integration of SDSs with jumps on the RHS [26]. However, this simplicity is illusory. The problem is that the “new” counting observation $C_{t}$ and the discrete-time one $D_{t}$ are results of a certain transform of the available observation $Y$, and this transform includes a limit passage. In fact, to obtain $C_{t}$ we have to estimate (restore) the current value of the derivative $\frac{d {〈 Y, Y 〉}_{t +}}{d t}$. First, this leads to some time delay needed to accumulate the observations $Y_{t}$. Second, any pre-limit variant of $C_{t}$ either has a.s. continuous trajectories or represents their sampling, which is oscillatory in nature. Third, the considered filtering estimate is the CME of the state $X_{t}$ given the observations $Y$ up to the moment $t$. The CME has natural properties: its components are a.s. non-negative and satisfy the normalization condition. The estimates and approximations having these properties are referred to in the paper as stable. Most conventional numerical algorithms do not provide these properties for the calculated approximations: they can preserve the normalization condition only, while the components can have arbitrary signs and absolute values.
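
The loss of stability by conventional schemes can be seen already on a single step. The sketch below applies one explicit Euler–Maruyama step to the classical Wonham-type part of the filtering equation, i.e., the first line of (23) without the counting and discrete-time terms, for a scalar unit-noise observation $d U = \bar{f} X d t + d \bar{W}$. The scheme, the function name `euler_wonham_step` and the numbers are illustrative assumptions, not the algorithm proposed in the paper.

```python
def euler_wonham_step(x, lam, f_bar, d_u, dt):
    """One explicit Euler-Maruyama step of the Wonham-type equation (scalar observation)."""
    n = len(x)
    f_hat = sum(f_bar[k] * x[k] for k in range(n))        # f_bar applied to the estimate
    innovation = d_u - f_hat * dt                         # increment of the innovation process
    new_x = []
    for j in range(n):
        drift = sum(lam[i][j] * x[i] for i in range(n))   # (Lambda^T x)_j
        gain = x[j] * (f_bar[j] - f_hat)                  # j-th entry of (diag x - x x^T) f_bar^T
        new_x.append(x[j] + drift * dt + gain * innovation)
    return new_x

lam = [[-0.5, 0.5], [1.0, -1.0]]
f_bar = [2.0, -2.0]
# A moderately large observation increment, about 2.2 standard deviations for dt = 0.05.
x_next = euler_wonham_step([0.1, 0.9], lam, f_bar, d_u=-0.5, dt=0.05)
```

After this step the first component equals about $-0.0087$ while the components still sum to one: the normalization survives, the non-negativity does not.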

In the paper we present another approach to the numerical realization of the filtering algorithm above. We discretize the available observations $Y$ in time with the increment $h$ and then solve the optimal state filtering problem given the discretized observations. The resulting estimate can be considered as an approximation of the one given the initial continuous-time observations. Properties of the CME guarantee the stability of the proposed approximation.

To simplify the derivation of the numerical algorithm and its accuracy analysis, we investigate the time-invariant case of the observation system (1), (2), i.e., $Λ (t) \equiv Λ$, $f (t) \equiv f$, $G_{n} (t) \equiv G_{n}$, $n = \overline{1, N}$. The observations are discretized with the time increment $h$:

Y_{r} ≜ \int_{t_{r - 1}}^{t_{r}} f X_{s} d s + \int_{t_{r - 1}}^{t_{r}} \sum_{n = 1}^{N} X_{s}^{n} G_{n}^{1 / 2} d W_{s}, \quad r \in \mathbb{N},

(25)

where $t_{r} ≜ r h$ are equidistant time instants. We denote by $\mathcal{Y}_{r} ≜ σ \{Y_{s} : 1 \leq s \leq r\}$ the non-decreasing collection of $σ$-algebras generated by the time-discretized observations; $\mathcal{Y}_{0} ≜ \{\emptyset, Ω\}$.

The optimal state filtering problem given the discretized observations is to find ${\hat{X}}_{r} ≜ E \{X_{t_{r}} | \mathcal{Y}_{r}\}$.

Let us consider the asymptotics of $\hat{X}$. We fix some $T > 0$ and consider a condensing sequence of binary meshes $\{\frac{r T}{2^{n}}\}_{r = \overline{1, 2^{n}}}$ with time increments $h_{n} ≜ \frac{T}{2^{n}}$ and the corresponding increasing sequence of $σ$-subalgebras $\{\mathcal{Y}_{2^{n}}^{n}\}$: $\mathcal{Y}_{2^{n}}^{n} ≜ σ \{Y_{r}, 1 \leq r \leq 2^{n}\}$. The observation process $\{Y_{t}\}$ is separable, hence $σ \{⋃_{n = 1}^{\infty} \mathcal{Y}_{2^{n}}^{n}\} = \mathcal{Y}_{T}$. Then, by Lévy's theorem, ${\hat{X}}_{2^{n}} ≜ E \{X_{T} | \mathcal{Y}_{2^{n}}^{n}\} \to E \{X_{T} | \mathcal{Y}_{T}\} = E \{X_{T} | \mathcal{Y}_{T +}\} ≜ {\hat{X}}_{T}$ $P$-a.s. as $n \to \infty$. Moreover, since $E \{{\hat{X}}_{T}\} \equiv E \{{\hat{X}}_{2^{n}}\} = π (T)$, the $L_{1}$-convergence is also true: $\lim_{n \to \infty} E \{| {\hat{X}}_{T} - {\hat{X}}_{2^{n}} |\} = 0$. The convergence also holds if we replace the sequence of binary meshes by any condensing sequence with vanishing step. So, we can conclude that the optimal filtering given the discretized observations is a way to design stable convergent approximations without the observation transform $Y \to (U, C, D)$ introduced in the previous section.

To derive the filtering formula we use the approach of [27] and mathematical induction.

In the case $r = 0$ we have

{\hat{X}}_{0} = E_{} \{X_{0} | Y_{0}\} = E_{} \{X_{0}\} = π .

(26)

Let the estimate ${\hat{X}}_{r - 1} = E \{X_{t_{r - 1}} | \mathcal{Y}_{r - 1}\}$ be known for some $r \in \mathbb{N}$. Now we calculate ${\hat{X}}_{r}$ at the next time instant. To do this, we have to specify the joint conditional distribution of $(X_{t_{r}}, Y_{r})$ with respect to $\mathcal{Y}_{r - 1}$. From the observation model and ([10] Lemma 7.5) it follows that the conditional distribution of $Y_{r}$ given the $σ$-algebra $\mathcal{F}_{t_{r}}^{X} \lor \mathcal{Y}_{r - 1}$ is Gaussian with the parameters

E_{} \{Y_{r} | F_{t_{r}}^{X}\} = f υ_{r}, cov (Y_{r}, Y_{r} | F_{t_{r}}^{X}) = \sum_{n = 1}^{N} υ_{r}^{n} G_{n} .

(27)

Here, $υ_{r} = col (υ_{r}^{1}, \dots, υ_{r}^{N}) ≜ \int_{t_{r - 1}}^{t_{r}} X_{s} d s$ is a random vector composed of the occupation times of the process $X$ in each state $e_{n}$ during the interval $[t_{r - 1}, t_{r}]$.
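
The role of the occupation times in (27) can be illustrated by sampling them directly. The sketch below simulates a time-invariant MJP over one discretization interval with exponential holding times and then evaluates the conditional Gaussian parameters $f υ_{r}$ and $\sum_{n} υ_{r}^{n} G_{n}$. A scalar observation is assumed for brevity, and the names `sample_interval` and `gaussian_parameters` are hypothetical.

```python
import random

def sample_interval(lam, start, h, rng):
    """Return (occupation times, end state, number of transitions) on an interval of length h."""
    n = len(lam)
    occ = [0.0] * n
    state, t, jumps = start, 0.0, 0
    while True:
        rate = -lam[state][state]
        hold = rng.expovariate(rate) if rate > 0 else float("inf")
        if t + hold >= h:
            occ[state] += h - t          # the state survives until the end of the interval
            return occ, state, jumps
        occ[state] += hold
        t += hold
        others = [j for j in range(n) if j != state]
        state = rng.choices(others, weights=[lam[state][j] for j in others])[0]
        jumps += 1

def gaussian_parameters(f, g, occ):
    """Conditional mean f*upsilon and variance sum_n upsilon^n G_n of Y_r (scalar case)."""
    mean = sum(fn * u for fn, u in zip(f, occ))
    var = sum(gn * u for gn, u in zip(g, occ))
    return mean, var

rng = random.Random(2)
lam = [[-3.0, 3.0], [2.0, -2.0]]
h = 0.5
occ, end_state, jumps = sample_interval(lam, 0, h, rng)
mean, var = gaussian_parameters([1.0, -1.0], [0.25, 1.0], occ)
```

The occupation times always sum to $h$, so the conditional variance is a convex combination of the values $h G_{n}$; this is why the discretized observation is a shift-scale Gaussian mixture.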

Below we use the following notations:

$D ≜ \{u = col (u^{1}, \dots, u^{N}) : u^{n} \geq 0, \sum_{n = 1}^{N} u^{n} = h\}$ is an $(N - 1)$-dimensional simplex in the space $R^{N}$; $D$ is the distribution support of the vector $υ_{r}$;
$Π ≜ \{π = col (π^{1}, \dots, π^{N}) : π^{n} \geq 0, \sum_{n = 1}^{N} π^{n} = 1\}$ is a “probabilistic simplex” formed by the possible values of $π$;
$N_{r}^{X}$ is the random number of transitions of the state $X_{t}$ that occur on the interval $[t_{r - 1}, t_{r}]$;
$a_{r}^{s} ≜ \{ω \in Ω : N_{r}^{X} (ω) \leq s\}$, $A_{r}^{s} ≜ \prod_{q = 1}^{r} a_{q}^{s}$;
$ρ^{k, ℓ, q} (d u)$ is the conditional distribution of the vector $X_{t_{r}}^{ℓ} I_{\{q\}} (N_{r}^{X}) υ_{r}$ given $X_{t_{r - 1}} = e_{k}$, i.e., for any $G \in B (R^{N})$ the following equality is true:

$E \{I_{G} (υ_{r}) I_{\{q\}} (N_{r}^{X}) X_{t_{r}}^{ℓ} | X_{t_{r - 1}} = e_{k}\} = \int_{G} ρ^{k, ℓ, q} (d u);$
$N (y, m, K) ≜ {(2 π)}^{- M / 2} \det^{- 1 / 2} K \exp \{- \frac{1}{2} {∥ y - m ∥}_{K^{- 1}}^{2}\}$ is an $M$-dimensional Gaussian probability density function (pdf) with the expectation $m$ and the nondegenerate covariance matrix $K$;
${∥ α ∥}_{K}^{2} ≜ α^{⊤} K α$, ${〈 α, β 〉}_{K} ≜ α^{⊤} K β$.

The Markov property of $\{(X_{t_{r}}, Y_{r})\}_{r \geq 0}$, the total probability formula and the Fubini theorem yield the equalities below for any set $A \in B (R^{M})$:

\begin{matrix} E_{} \{X_{t_{r}} I_{A} (Y_{r}) | Y_{r - 1}\} = E_{} \{E_{} \{X_{t_{r}} I_{A} (Y_{r}) | F_{t_{r}}^{X} \lor Y_{r - 1}\} | Y_{r - 1}\} = \\ = E_{} \{X_{t_{r}} \int_{A} N (y, f υ_{r}, \sum_{p = 1}^{N} υ_{r}^{p} G_{p}) d y | Y_{r - 1}\} = \\ = E_{} \{E_{} \{X_{t_{r}} \int_{A} N (y, f υ_{r}, \sum_{p = 1}^{N} υ_{r}^{p} G_{p}) d y | X_{t_{r - 1}} \lor Y_{r - 1}\} | Y_{r - 1}\} = \\ = E_{} \{\sum_{ℓ = 1}^{N} e_{ℓ} \sum_{q = 0}^{\infty} \sum_{k = 1}^{N} e_{k}^{⊤} X_{t_{r - 1}} \int_{D} \int_{A} N (y, f u, \sum_{p = 1}^{N} u^{p} G_{p}) d y ρ^{k, ℓ, q} (d u) | Y_{r - 1}\} = \\ = \sum_{ℓ = 1}^{N} e_{ℓ} \int_{A} [\sum_{k = 1}^{N} {\hat{X}}_{r - 1}^{k} \sum_{q = 0}^{\infty} \int_{D} N (y, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{k, ℓ, q} (d u)] d y . \end{matrix}

This means that the integrand in the square brackets defines the conditional distribution of $(X_{t_{r}}, Y_{r})$ given $\mathcal{Y}_{r - 1}$. Further, the conditional expectation ${\hat{X}}_{r}$ is defined component-wise by the generalized Bayes rule [10]

{\hat{X}}_{r}^{j} = \frac{\sum_{k = 1}^{N} {\hat{X}}_{r - 1}^{k} \sum_{q = 0}^{\infty} \int_{D} N (Y_{r}, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{k, j, q} (d u)}{\sum_{i, ℓ = 1}^{N} {\hat{X}}_{r - 1}^{i} \sum_{c = 0}^{\infty} \int_{D} N (Y_{r}, f v, \sum_{n = 1}^{N} v^{n} G_{n}) ρ^{i, ℓ, c} (d v)}, j = \bar{1, N} .

(28)

So, we have proved the following

Lemma 3.

If conditions 1–3 are valid for the observation system (1), (2), then the filtering estimate ${\hat{X}}_{r}$ given the discretized observations is defined by (26) at $r = 0$, and by recursion (28) at the instant $t_{r}$ of reception of the discretized observation $Y_{r}$.

4.2. Stable Analytic Approximations

Recursion (28) cannot be realized directly because of the infinite summation in both the numerator and the denominator. We replace the infinite sums by finite ones, and the corresponding vector sequence ${\bar{X}}_{r} (s)$, calculated by the formula

{\bar{X}}_{r}^{j} (s) = \frac{\sum_{k = 1}^{N} {\bar{X}}_{r - 1}^{k} (s) \sum_{q = 0}^{s} \int_{D} N (Y_{r}, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{k, j, q} (d u)}{\sum_{i, ℓ = 1}^{N} {\bar{X}}_{r - 1}^{i} (s) \sum_{c = 0}^{s} \int_{D} N (Y_{r}, f v, \sum_{n = 1}^{N} v^{n} G_{n}) ρ^{i, ℓ, c} (d v)}, j = \bar{1, N}

(29)

is called the analytic approximation of the $s$-th order of ${\hat{X}}_{r}$. Obviously, ${\bar{X}}_{r} (s)$ is stable.
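
A first-order ($s = 1$) instance of (29) can be sketched for a scalar observation. The sketch relies on two standard facts about time-homogeneous MJPs: with no transition on the interval the occupation vector is $h e_{k}$ and the probability of this event is $e^{λ_{k k} h}$; with exactly one transition $k \to j$ at time $τ$ the corresponding density is $λ_{k j} e^{λ_{k k} τ + λ_{j j} (h - τ)}$. The integral over $τ$ is replaced by a midpoint rule, which is a choice made here for illustration; the names `gauss_pdf`, `kernel_first_order` and `step` are hypothetical.

```python
import math

def gauss_pdf(y, mean, var):
    """Scalar Gaussian density N(y, mean, var)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def kernel_first_order(y, lam, f, g, h, n_nodes=50):
    """Matrix xi^{kj} of (30) for s = 1 (at most one transition per interval)."""
    n = len(f)
    xi = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # q = 0: no transition, the whole interval is spent in e_k.
        xi[k][k] = math.exp(lam[k][k] * h) * gauss_pdf(y, f[k] * h, g[k] * h)
        for j in range(n):
            if j == k or lam[k][j] == 0.0:
                continue
            # q = 1: a single transition k -> j at time tau; midpoint rule over tau.
            total = 0.0
            for i in range(n_nodes):
                tau = (i + 0.5) * h / n_nodes
                weight = lam[k][j] * math.exp(lam[k][k] * tau + lam[j][j] * (h - tau))
                mean = f[k] * tau + f[j] * (h - tau)
                var = g[k] * tau + g[j] * (h - tau)
                total += weight * gauss_pdf(y, mean, var)
            xi[k][j] = total * h / n_nodes
    return xi

def step(x_prev, y, lam, f, g, h):
    """One step of recursion (29) with s = 1: mix by the kernel, then normalize."""
    n = len(f)
    xi = kernel_first_order(y, lam, f, g, h)
    unnorm = [sum(x_prev[k] * xi[k][j] for k in range(n)) for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

lam = [[-0.5, 0.5], [1.0, -1.0]]
x_new = step([0.5, 0.5], y=0.1, lam=lam, f=[1.0, -1.0], g=[0.25, 1.0], h=0.1)
```

Every term entering the ratio is non-negative, so the output is non-negative and sums to one regardless of the quadrature accuracy; the quadrature affects only how close the result is to (29).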

Let us introduce the following positive random numbers and matrices:

\begin{matrix} ξ_{q}^{k j} ≜ \sum_{m = 0}^{s} \int_{D} N (Y_{q}, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{k, j, m} (d u), \\ θ_{q}^{k j} ≜ \sum_{m = s + 1}^{\infty} \int_{D} N (Y_{q}, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{k, j, m} (d u), \\ ξ_{q} ≜ ∥ ξ_{q}^{k j} ∥_{k, j = \bar{1, N}}, θ_{q} ≜ {∥ θ_{q}^{k j} ∥}_{k, j = \bar{1, N}} . \end{matrix}

(30)

The estimates ${\hat{X}}_{r}$ (28) and ${\bar{X}}_{r} (s)$ (29) can be rewritten in the recurrent form:

{\hat{X}}_{r} = {(1 {(ξ_{r} + θ_{r})}^{⊤} {\hat{X}}_{r - 1})}^{- 1} {(ξ_{r} + θ_{r})}^{⊤} {\hat{X}}_{r - 1},

(31)

{\bar{X}}_{r} (s) = {(1 ξ_{r}^{⊤} {\bar{X}}_{r - 1} (s))}^{- 1} ξ_{r}^{⊤} {\bar{X}}_{r - 1} (s) .

(32)
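The stability of a recursion of the form (32) can be illustrated numerically. The snippet below is a minimal sketch, assuming randomly generated positive matrices in place of the true matrices ξ_r (which depend on the observations and on the measures ρ); the function name `normalized_step` is hypothetical.

```python
import numpy as np

def normalized_step(xi, x_prev):
    """One step of a recursion of the form (32): xi^T x / (1 xi^T x)."""
    unnormalized = xi.T @ x_prev
    return unnormalized / unnormalized.sum()

rng = np.random.default_rng(0)
N = 3
x = np.full(N, 1.0 / N)  # illustrative initial distribution
for _ in range(50):
    # Hypothetical positive matrix standing in for xi_r.
    xi = rng.uniform(0.01, 1.0, size=(N, N))
    x = normalized_step(xi, x)

print(x, x.sum())
```

Whatever positive matrices are supplied, the iterate keeps non-negative components and unit sum, which is the stability property claimed for the approximation.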

Let us define the global distance [28] between the estimates

{{\bar{X}}_{r} (s)}

and

{{\hat{X}}_{r}}

as

Σ_{r} (s) ≜ sup_{π \in Π} E_{} \{∥ {\hat{X}}_{r} - {\bar{X}}_{r} {(s) ∥}_{1}\} = sup_{π \in Π} \sum_{j = 1}^{N} E_{} \{| {\hat{X}}_{r}^{j} - {\bar{X}}_{r}^{j} (s) |\} .

(33)

This natural characteristic shows the maximal expected divergence of the recursions (28) and (29) at the r-th step.

The assertion below defines an upper bound of the characteristic

Σ_{r} (s)

.

Lemma 4.

If the conditions of Lemma 3 are valid, then

Σ_{r} (s) \leq 2 - 2 {(1 - C_{1} \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!})}^{r},

(34)

where

\bar{λ} ≜ {max}_{1 \leq n \leq N} | λ_{n n} |

, and

C_{1} = C_{1} (h, \bar{λ}) \in (0, 1)

is the following parameter:

C_{1} ≜ e^{- \bar{λ} h} \frac{(s + 1)!}{{(\bar{λ} h)}^{s + 1}} \sum_{k = s + 1}^{\infty} \frac{{(\bar{λ} h)}^{k}}{k!},

(35)

which is bounded from above:

C_{1} \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!} < 1

.

The proof of Lemma 4 is given in Appendix D.
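The quantities in Lemma 4 are easy to evaluate. The following minimal sketch computes the parameter (35) and the right-hand side of (34); the function names and the sample values of the intensity bound, step, order and number of steps are illustrative assumptions. The tail series is summed term by term, which assumes a moderate value of the product of the intensity bound and the step.

```python
import math

def poisson_tail(x, s, terms=200):
    """e^{-x} * sum_{k >= s+1} x^k / k!, summed directly to avoid cancellation."""
    term = x ** (s + 1) / math.factorial(s + 1)
    total = 0.0
    k = s + 1
    for _ in range(terms):
        total += term
        k += 1
        term *= x / k
    return math.exp(-x) * total

def c1(lam_bar, h, s):
    """Parameter C_1 from (35)."""
    x = lam_bar * h
    return poisson_tail(x, s) * math.factorial(s + 1) / x ** (s + 1)

def lemma4_bound(lam_bar, h, s, r):
    """Right-hand side of (34)."""
    return 2.0 - 2.0 * (1.0 - poisson_tail(lam_bar * h, s)) ** r

print(c1(1.0, 0.01, 1), lemma4_bound(1.0, 0.01, 1, 100))
```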

The assertion of Lemma 4 brings a practical benefit. The lemma imposes no asymptotic requirements on either the approximation order s or the discretization step h: inequality (34) is universal. In digital control systems, the data acquisition rate is mostly fixed or bounded from above. There are also extra algorithmic limitations on the rate: the “raw” data should be preprocessed, smoothed, averaged, cleaned of outliers, etc. For example, use of the central limit theorem [29] and the diffusion approximation framework [30] for renewal processes is legitimate for sufficiently long averaging intervals, whose length depends on the process moments.

Now we fix the time instant T and consider the asymptotics

h \to 0

. In this case

r = \frac{T}{h} \to \infty

and

Σ_{\frac{T}{h}} (s) \leq 2 - 2 {(1 - C_{1} \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!})}^{\frac{T}{h}} \sim 2 \bar{λ} T \frac{{(\bar{λ} h)}^{s}}{(s + 1)!} .
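The asymptotic equivalence above can be checked numerically. The sketch below, with illustrative parameter values and hypothetical function names, compares the exact right-hand side of (34) at r = T/h with the leading term as h decreases; the ratio should approach one.

```python
import math

def exact_bound(lam_bar, h, s, T):
    """Right-hand side of (34) with r = T / h."""
    x = lam_bar * h
    term = x ** (s + 1) / math.factorial(s + 1)
    tail = 0.0
    for k in range(s + 2, s + 60):
        tail += term          # adds the term with index k - 1
        term *= x / k         # moves to the term with index k
    p = math.exp(-x) * tail   # equals C_1 (lam_bar h)^{s+1} / (s+1)!
    r = round(T / h)
    # 2 - 2 (1 - p)^r, written with expm1/log1p for numerical stability.
    return -2.0 * math.expm1(r * math.log1p(-p))

def leading_term(lam_bar, h, s, T):
    return 2.0 * lam_bar * T * (lam_bar * h) ** s / math.factorial(s + 1)

ratios = [exact_bound(1.0, h, 1, 1.0) / leading_term(1.0, h, 1, 1.0)
          for h in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)
```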

4.3. Stable Numerical Approximations

In the recursion (32) we use the integrals

ξ_{r}^{i j}

, which cannot be calculated analytically. Numerical integration introduces an extra approximation error. Let us investigate its effect on the total accuracy of the numerical realization of the filter.

The integrals

ξ^{i j} (y)

are usually approximated by the sums

\begin{matrix} ξ^{i j} (y) \approx ψ^{i j} (y) ≜ \sum_{ℓ = 1}^{L} N (y, f w_{ℓ}, \sum_{p = 1}^{N} w_{ℓ}^{p} G_{p}) ϱ_{ℓ}^{i j}, & ψ (y) ≜ {∥ ψ^{i j} (y) ∥}_{i, j = \bar{1, N}}, \end{matrix}

(36)

which are defined by the collection of the pairs

{(w_{ℓ}, ϱ_{ℓ}^{i j})}_{ℓ = \bar{1, L}}

. Here,

w_{ℓ} ≜ col (w_{ℓ}^{1}, \dots, w_{ℓ}^{N}) \in D

are the points, and

ϱ_{ℓ}^{i j} \geq 0

(

ℓ = \bar{1, L}

) are the weights:

\sum_{j = 1}^{N} \sum_{ℓ = 1}^{L} ϱ_{ℓ}^{i j} \leq Q \leq 1

.

In complete analogy with

ξ_{q}

we define the approximations

ψ_{q} ≜ {∥ ψ^{i j} (Y_{q}) ∥}_{i, j = \bar{1, N}}

. By construction, the elements of

ψ_{q}

are positive random values, hence the approximation

{\tilde{X}}_{r}

{\tilde{X}}_{r} ≜ {(1 ψ_{r}^{⊤} {\tilde{X}}_{r - 1})}^{- 1} ψ_{r}^{⊤} {\tilde{X}}_{r - 1}, {\tilde{X}}_{0} = π

(37)

is stable. Below we denote the numerical integration errors and their absolute values as follows

γ^{k j} ≜ ψ^{k j} - ξ^{k j}, γ_{r} ≜ {∥ γ^{k j} (Y_{r}) ∥}_{k, j = \bar{1, N}}

(38)

{\bar{γ}}^{k j} ≜ | γ^{k j} |, {\bar{γ}}_{r} ≜ {∥| γ^{k j} (Y_{r}) |∥}_{k, j = \bar{1, N}} .

(39)

So, the recursion (32) is replaced by the scheme (37) with the same initial condition

π

.
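A minimal sketch of the scheme (36), (37) for a scalar observation (M = 1) is given below. The nodes and weights are random placeholders rather than a quadrature actually derived from the measures ρ, the observation values are arbitrary, and the function names are hypothetical; only the structure of the computation is illustrated.

```python
import numpy as np

def gaussian_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def psi_matrix(y, nodes, weights, f, G):
    """psi^{ij}(y) = sum_l N(y; f.w_l, sum_p w_l^p G_p) * weights[l, i, j]."""
    means = nodes @ f        # shape (L,)
    variances = nodes @ G    # shape (L,)
    dens = gaussian_pdf(y, means, variances)
    return np.tensordot(dens, weights, axes=(0, 0))  # shape (N, N)

def filter_step(psi, x_prev):
    """One step of a recursion of the form (37)."""
    v = psi.T @ x_prev
    return v / v.sum()

rng = np.random.default_rng(1)
N, L = 3, 5
nodes = rng.dirichlet(np.ones(N), size=L)            # placeholder points w_l
weights = rng.uniform(size=(L, N, N))                # placeholder weights
weights /= weights.sum(axis=(0, 2), keepdims=True)   # row mass equals 1, so Q = 1
f = np.zeros(N)
G = np.array([1.0, 4.0, 9.0])
x = np.array([0.333, 0.333, 0.334])
for y in (0.3, -1.2, 2.0):                           # arbitrary observation values
    x = filter_step(psi_matrix(y, nodes, weights, f, G), x)
print(x)
```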

Both (32) and (37) are constructed in light of the event

A_{r}^{s}

: the state transition numbers do not exceed the threshold s over any subintervals

[t_{q - 1}, t_{q}]

belonging to

[0, t_{r}]

. So, the distance between

{\tilde{X}}_{r}

and

{\bar{X}}_{r} (s)

should be determined taking into account

A_{r}^{s}

. In view of this fact, we propose the pseudo-metric

E_{r} (s) ≜ sup_{π \in Π} E_{} \{I_{A_{r}^{s}} (ω) {∥ {\tilde{X}}_{r} - {\bar{X}}_{r} (s) ∥}_{1}\} = sup_{π \in Π} \sum_{n = 1}^{N} E_{} \{I_{A_{r}^{s}} (ω) | {\tilde{X}}_{r}^{n} - {\bar{X}}_{r}^{n} (s) |\} .

(40)

This index reflects the maximal divergence of the algorithms (32) and (37) after r steps, when both are started from an arbitrary but common initial condition.

Theorem 2.

If the inequality

max_{i = \bar{1, N}} \sum_{j = 1}^{N} \int_{R^{M}} | ψ^{i j} (y) - ξ^{i j} (y) | d y < δ

(41)

is true for the numerical integration scheme (36), then the distance

E_{r} (s)

is bounded from above:

E_{r} (s) \leq 2 r Q^{r - 1} δ .

(42)

The proof of Theorem 2 is given in Appendix E.
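The behaviour of the bound (42) as a function of r can be inspected directly. The sketch below uses an illustrative value of δ and a hypothetical function name: for Q = 1 the bound grows linearly in r, while for Q < 1 it reaches a maximum near r = -1/ln Q and then decays.

```python
def theorem2_bound(r, Q, delta):
    """Right-hand side of (42)."""
    return 2.0 * r * Q ** (r - 1) * delta

delta = 1e-6  # illustrative quadrature accuracy
for Q in (1.0, 0.99, 0.9):
    values = [theorem2_bound(r, Q, delta) for r in (1, 10, 100, 1000)]
    print(Q, values)
```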

It is remarkable that the accuracy of the numerical algorithm for stochastic filtering can be described using only condition (41), which relates purely to calculus. Furthermore, if the total weight

Q = \sum_{ℓ, j} ϱ_{ℓ}^{i j}

is separated from unity, i.e.,

Q < 1

, then the index

E_{r} (s)

is a sublinear function of r, as is the index

Σ_{r} (s)

of the analytic accuracy. Notably, in the classic numerical algorithms for SDS solution, the global error grows linearly with the number of steps r [26].

The precision characteristics of the analytical approximation and of its numerical realization should be aggregated into one. If the conditions of Lemma 4 and Theorem 2 are valid, then the local distance (i.e., the distance after one iteration) between the optimal filtering estimate and its numerical approximation can be bounded from above:

\begin{matrix} τ (s) ≜ sup_{π \in Π} E_{} \{∥ {\hat{X}}_{1} - {\tilde{X}}_{1} ∥_{1}\} \leq sup_{π \in Π} E_{} \{I_{a_{1}^{s}} (ω) ∥ {\tilde{X}}_{1} - {\bar{X}}_{1} (s) + {\bar{X}}_{1} (s) - {\hat{X}}_{1} ∥_{1} + I_{{\bar{a}}_{1}^{s}} (ω) {∥ {\tilde{X}}_{1} - {\bar{X}}_{1} (s) ∥}_{1}\} \leq \\ \leq 2 P {{\bar{a}}_{1}^{s}} + sup_{π \in Π} E_{} \{∥ {\bar{X}}_{1} (s) - {\hat{X}}_{1} ∥_{1}\} + sup_{π \in Π} E_{} \{I_{a_{1}^{s}} (ω) {∥ {\tilde{X}}_{1} - {\bar{X}}_{1} (s) ∥}_{1}\} = \\ = 2 P {{\bar{a}}_{1}^{s}} + σ (s) + E_{1} (s) \leq 4 \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!} + 2 δ . \end{matrix}

(43)

The global distance between

{\hat{X}}_{r} ≜ E_{} \{X_{r} | Y_{r}\}

and

{\tilde{X}}_{r}

can be bounded in the similar way:

T (s) ≜ sup_{π \in Π} E_{} \{∥ {\hat{X}}_{r} - {\tilde{X}}_{r} ∥_{1}\} \leq 4 [1 - {(1 - \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!})}^{r}] + 2 r Q^{r - 1} δ .

(44)

We could choose the parameters

(h, s)

of the analytical approximation and

δ

of the numerical integration independently of each other. However, both the limited computational resources and the accuracy requirements make it necessary to optimize the parameters jointly

(h, s, δ)

.

Let us fix some time horizon T along with the order s of the analytical approximation, and consider the asymptotics

r \to \infty

, or, equivalently,

h = \frac{T}{r} \to 0

. Due to the Bernoulli inequality and the condition

0 < Q \leq 1

, we have

\begin{matrix} sup_{π \in Π} E_{} \{∥ {\tilde{X}}_{T / h} - {\hat{X}}_{T / h} ∥_{1}\} \leq 4 [1 - {(1 - \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!})}^{r}] + 2 r Q^{r - 1} δ \leq 4 r \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!} + 2 r Q^{r - 1} δ = \\ = 4 \bar{λ} T \frac{{(\bar{λ} h)}^{s}}{(s + 1)!} + 2 r Q^{r - 1} δ \leq 2 T (2 \bar{λ} \frac{{(\bar{λ} h)}^{s}}{(s + 1)!} + \frac{δ}{h}) . \end{matrix}

(45)

The first summand in the brackets represents the contribution of the analytical approximation error; the second reflects the error of the chosen numerical integration scheme. Obviously, the optimal choice of the parameters makes both summands of the same infinitesimal order, which is possible when

δ \sim \frac{{(\bar{λ} h)}^{s + 1}}{\bar{λ}}

.
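The balance between the two summands in (45) can be checked with a few lines of code. The sketch below takes δ proportional to the expression stated above (with unit proportionality constant, an illustrative assumption) and confirms that the ratio of the summands does not depend on h; the function name is hypothetical.

```python
import math

def summands(lam_bar, h, s, delta):
    """The two terms inside the brackets on the right-hand side of (45)."""
    analytic = 2.0 * lam_bar * (lam_bar * h) ** s / math.factorial(s + 1)
    numeric = delta / h
    return analytic, numeric

lam_bar, s = 1.0, 1
ratios = []
for h in (1e-2, 1e-3, 1e-4):
    delta = (lam_bar * h) ** (s + 1) / lam_bar  # choice of delta from the text
    analytic, numeric = summands(lam_bar, h, s, delta)
    ratios.append(numeric / analytic)
print(ratios)
```

A constant ratio means that both error contributions vanish at the same rate as h decreases, so neither the quadrature nor the truncation order is refined more than needed.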

4.4. Numerical Example

To illustrate the correspondence between the theoretical estimate and its realization along with the performance of the numerical algorithm, we consider the filtering problem for the observation system (1) and (2) with the following parameters:

t \in [0, 1]

,

N = 3

,

Λ = [\begin{matrix} - 1.0 & 0.2 & 0.8 \\ 0.8 & - 1.0 & 0.2 \\ 0.2 & 0.8 & - 1.0 \end{matrix}], π = [\begin{matrix} 0.333 \\ 0.333 \\ 0.334 \end{matrix}], f = [\begin{matrix} 0.0 \\ 0.0 \\ 0.0 \end{matrix}], \begin{matrix} G_{1} = 1.0, \\ G_{2} = 4.0, \\ G_{3} = 9.0 . \end{matrix}

The specified observation system has state-dependent noise, and the conditions of Corollary 1 hold, so the optimal filter (23) restores the MJP state exactly from the available noisy observations. Let us verify this theoretical fact using the recursive algorithm (37). We choose the analytical approximation of the order

s = 1

with numerical integration by the simple midpoint rectangle rule and calculate the estimate approximations for a decreasing time-discretization step:

h = 0.01; 0.001; 0.0001; 0.00001

. We expect a decrease in the estimation error, characterized by the mean-square (MS) criterion

S_{t} (h) = \sqrt{E_{} \{∥ X_{t} - {\tilde{X}}_{\frac{t}{h}} ∥_{2}^{2}\}}

. To calculate the criterion, we use the Monte Carlo method over a test sample of size 1000. Figure 1 presents the corresponding plots of the quality index

S_{t} (h)

for various values of h.
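The Monte Carlo part of such an experiment can be sketched as follows. The code simulates MJP paths with the transition intensity matrix and initial distribution of the example and evaluates the MS criterion for a given array of estimates. It does not implement the filter (37): as a placeholder, the constant prior π is used as the "estimate", and the sample size 200 and step 0.01 are illustrative choices. The function names are hypothetical.

```python
import numpy as np

LAMBDA = np.array([[-1.0, 0.2, 0.8],
                   [0.8, -1.0, 0.2],
                   [0.2, 0.8, -1.0]])
PI = np.array([0.333, 0.333, 0.334])

def simulate_mjp(rng, t_grid):
    """Sample the state index on t_grid using exponential holding times."""
    state = rng.choice(3, p=PI)
    t_next = rng.exponential(1.0 / -LAMBDA[state, state])
    path = np.empty(len(t_grid), dtype=int)
    for i, t in enumerate(t_grid):
        while t_next <= t:
            rates = LAMBDA[state].copy()
            rates[state] = 0.0
            state = rng.choice(3, p=rates / rates.sum())
            t_next += rng.exponential(1.0 / -LAMBDA[state, state])
        path[i] = state
    return path

def ms_criterion(true_states, estimates):
    """Square root of the sample mean of the squared 2-norm error, per time step."""
    sq_err = np.sum((true_states - estimates) ** 2, axis=2)
    return np.sqrt(np.mean(sq_err, axis=0))

rng = np.random.default_rng(0)
h = 0.01
t_grid = np.arange(0.0, 1.0 + h / 2, h)
paths = np.array([simulate_mjp(rng, t_grid) for _ in range(200)])
one_hot = np.eye(3)[paths]                    # shape (samples, steps, 3)
prior = np.broadcast_to(PI, one_hot.shape)    # placeholder estimate, not the filter
S = ms_criterion(one_hot, prior)
print(S[:5])
```

Replacing `prior` with the output of an actual filtering recursion would give curves comparable to those of the quality index discussed above.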

Determining the precision order provided by the chosen numerical integration method is beyond the scope of this investigation. Nevertheless, one can see the expected decrease of the estimation error as the time-discretization step decreases. We regard this result as a practical confirmation of both the theoretical assertions and the numerical algorithm.

5. Conclusions

In this paper, we investigated the optimal filtering problem for MJP states, given indirect noisy continuous-time observations. The observation noise intensity was a function of the estimated state, so it was impossible to apply the classic Wonham filter to this observation system. To overcome this obstacle, we suggested an observation transform. On the one hand, the transformed observations remained equivalent to the original ones from the informational point of view. On the other hand, the “new” observations allowed us to apply the effective stochastic analysis framework to process them. We derived the optimal filtering estimate theoretically as the unique strong solution to a discrete–continuous stochastic differential system. The transformed observations included the derivative of the quadratic characteristics, i.e., the result of a limit passage in the stochastic setting. Hence, the subsequent numerical realization of the filter became challenging. We proposed to approximate the initial continuous-time filtering problem by a sequence of optimal filtering problems given the time-discretized observations. We also used numerical integration schemes to calculate the integrals included in the estimation formula. We proved assertions characterizing the accuracy of the numerical approximation of the filtering estimate, i.e., the distance between the calculated approximation and the optimal discrete-time filtering estimate. The accuracy depended on the observation system parameters, the time-discretization step, the threshold on the number of state transitions during a time step, and the chosen numerical integration scheme. We suggested a whole class of numerical filtering algorithms. In each case, one could choose a specific algorithm individually, taking into account the characteristics of the particular observation system, the accuracy requirements, and the available computing resources.

We do not consider the presented investigation complete. First, the characterization of the distance between the initial optimal continuous-time filtering estimate and its proposed approximation is still an open problem. Second, we can use the theoretical solution to the MJP filtering problem as a basis for numerical schemes for diffusion process filtering, given observations with state-dependent noise. Third, the obtained optimal filtering estimate looks like a springboard for a solution to the optimal stochastic control of Markov jump processes, given both counting and diffusion observations with state-dependent noise. All of this research is in progress.

Author Contributions

Conceptualization, A.B., I.S.; methodology, A.B.; formal analysis and investigation, A.B., I.S.; writing—original draft preparation, A.B.; writing—review and editing, I.S.; supervision, I.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CME	Conditional mathematical expectation
MJP	Markov jump process
pdf	Probability density function
RHS	Right-hand side
SDS	Stochastic differential system

Appendix A. Proof of Lemma 2

From (14), (15), the identity

diag (a) b \equiv diag (b) a

, the fact that

J_{n} (t) \neq J_{n} (t -)

at no more than a finite number of points of any finite interval, and property 4 of the function

K (t)

, the following equalities are true

\begin{matrix} C_{t}^{n} = \int_{0}^{t} (1 - e_{n}^{⊤} V_{s -}) e_{n}^{⊤} d R_{s} = \int_{0}^{t} (1 - e_{n}^{⊤} V_{s -}) e_{n}^{⊤} J (s) (Λ^{⊤} (s) X_{s -} d s + d M_{s}^{X}) = \\ = \int_{0}^{t} (1 - J_{n} (s -) X_{s -}) J_{n} (s -) Λ^{⊤} (s) X_{s -} d s + \int_{0}^{t} (1 - e_{n}^{⊤} V_{s -}) J_{n} (s) d M_{s}^{X} = \\ = \int_{0}^{t} J_{n} (s) Λ^{⊤} (s) (I - diag J_{n} (s)) X_{s} d s + \int_{0}^{t} (1 - e_{n}^{⊤} V_{s -}) J_{n} (s) d M_{s}^{X} = \\ = \int_{0}^{t} 1 Γ_{n} (s) X_{s} d s + \int_{0}^{t} (1 - e_{n}^{⊤} V_{s -}) J_{n} (s) d M_{s}^{X} . \end{matrix}

(A1)

Assertion 1 of Lemma 2 is proved.

The definition of the processes

C_{t}^{n}

(

n = \bar{1, N}

) guarantees their strong orthogonality, i.e.,

P {Δ C_{t}^{i} Δ C_{t}^{j} \neq 0} \equiv 0

for any

i \neq j

and

t \geq 0

, so

{[C^{i}, C^{j}]}_{t} \equiv 0

.

Let us use (5), (19) and properties of X and

J_{n}

to derive the quadratic characteristics of

C^{n}

:

\begin{matrix} {〈 C^{n}, C^{n} 〉}_{t} = \int_{0}^{t} {(1 - J_{n} (s) X_{s -})}^{2} J_{n} (s) d {〈 X, X 〉}_{s} J_{n}^{⊤} (s) = \\ = \int_{0}^{t} (1 - J_{n} (s) X_{s -}) J_{n} (s) (diag (Λ^{⊤} (s) X_{s -}) - Λ^{⊤} (s) diag X_{s -} - diag (X_{s -}) Λ (s)) J_{n}^{⊤} (s) d s = \\ = \int_{0}^{t} (1 - J_{n} (s) X_{s -}) J_{n} (s) diag (J_{n} (s)) Λ^{⊤} (s) X_{s -} d s = \int_{0}^{t} J_{n} (s) Λ^{⊤} (s) (I - diag J_{n} (s)) X_{s} d s = \\ = \int_{0}^{t} 1 Γ_{n} (s) X_{s} d s . \end{matrix}

(A2)

Assertion 2 of Lemma 2 is proved.

If s and t are two arbitrary moments, such that

s \leq t

, then

\begin{matrix} E_{} \{ν_{t}^{n} - ν_{s}^{n} | {\bar{Y}}_{s}\} = E_{} \{\int_{s}^{t} J_{n} (u) Λ^{⊤} (u) (I - diag J_{n} (u)) E_{} \{(X_{u} - {\hat{X}}_{u}) | {\bar{Y}}_{u}\} d u | {\bar{Y}}_{s}\} + \\ + E_{} \{E_{} \{\int_{s}^{t} (1 - J_{n} (u) X_{u -}) J_{n} (u) d M_{u}^{X} | F_{s}\} | {\bar{Y}}_{s}\} = 0, \end{matrix}

i.e.,

ν_{t}^{n}

is a

{\bar{Y}}_{t}

-adapted martingale. Note that

ν_{t}^{n}

is purely discontinuous with unit jumps, hence

\begin{matrix} {[ν^{n}, ν^{n}]}_{t} = \sum_{τ \leq t} {(Δ ν_{τ}^{n})}^{2} = {[C^{n}, C^{n}]}_{t} = \sum_{τ \leq t} {(Δ C_{τ}^{n})}^{2} = C_{t}^{n} = \\ = \int_{0}^{t} J_{n} (s) Λ^{⊤} (s) (I - diag J_{n} (s)) X_{s} d s + \int_{0}^{t} (1 - J_{n} (s) X_{s -}) J_{n} (s) d M_{s}^{X} = \int_{0}^{t} 1 Γ_{n} (s) {\hat{X}}_{s} d s + μ_{t}^{0}, \end{matrix}

where

μ_{t}^{0}

is some

{\bar{Y}}_{t}

-adapted martingale. From the uniqueness of the special semimartingale representation of

{[ν^{n}, ν^{n}]}_{t}

it follows that

{〈 ν^{n}, ν^{n} 〉}_{t} = \int_{0}^{t} 1 Γ_{n} (s) {\hat{X}}_{s} d s

. Lemma 2 is proved. □

Appendix B. Proof of Theorem 1

We use the same approach as in ([6], Part III, Sect. 8.7) to derive the MJP filtering equations. The idea exploits the uniqueness of the representation for a special semimartingale along with the integral representation of a martingale [23].

From the Bayes rule it follows that

{\hat{X}}_{0} = E_{} \{X_{0} | D_{0}\} = {(D_{0}^{⊤} J (0) π)}^{+} diag (D_{0}) J (0) π

. Let

ϰ_{n - 1}

be a random instant of the

n - 1

-th discrete observation

Δ D_{ϰ_{n - 1}}

. We investigate evolution of

X_{t}

over the interval

[ϰ_{n - 1}, ϰ_{n})

:

X_{t} = X_{ϰ_{n - 1}} + \int_{ϰ_{n - 1}}^{t} Λ^{⊤} (s) X_{s} d s + M_{t}^{X} - M_{ϰ_{n - 1}}^{X}, t \in [ϰ_{n - 1}, ϰ_{n}) .

Conditioning both sides of the latter equality on

{\bar{Y}}_{t}

, one can show that

{\hat{X}}_{t} = {\hat{X}}_{ϰ_{n - 1}} + \int_{ϰ_{n - 1}}^{t} Λ^{⊤} (s) {\hat{X}}_{s} d s + μ_{t}^{1},

where

{μ_{t}^{1}}_{t \in [ϰ_{n - 1}, ϰ_{n})}

is an

{\bar{Y}}_{t}

-adapted martingale. For any

t \in [ϰ_{n - 1}, ϰ_{n})

the equality

{\bar{Y}}_{t} = {\bar{Y}}_{ϰ_{n - 1}} \lor σ {U_{s}, s \in (ϰ_{n - 1}, t]} \lor σ {C_{s}^{j}, s \in (ϰ_{n - 1}, t], j = \bar{1, N}}

holds. The process

{ω_{t}}

(24) is a

{\bar{Y}}_{t}

-adapted standard Wiener process [10].

The process

U_{t}

is a

{\bar{Y}}_{t}

-adapted semimartingale with

F^{X}

-conditionally-independent increments, meanwhile

{C_{t}^{j}}_{j = \bar{1, N}}

are

{\bar{Y}}_{t}

-adapted point processes. Hence, the martingale

μ_{t}^{1}

admits an integral representation ([23], Chap. 4, §8, Problem 1), i.e.,

{\hat{X}}_{t} = {\hat{X}}_{ϰ_{n - 1}} + \int_{ϰ_{n - 1}}^{t} Λ^{⊤} (s) {\hat{X}}_{s} d s + \int_{ϰ_{n - 1}}^{t} α_{s} d ω_{s} + \int_{ϰ_{n - 1}}^{t} \sum_{j = 1}^{N} β_{s}^{j} d ν_{s}^{j},

(A3)

where

α_{t}

and

{β_{t}^{j}}_{j = \bar{1, N}}

are

{\bar{Y}}_{t}

-predictable processes of appropriate dimensionality, which should be determined.

Due to the generalized Itô rule

X_{t} U_{t}^{⊤} = X_{ϰ_{n - 1}} U_{ϰ_{n - 1}}^{⊤} + \int_{ϰ_{n - 1}}^{t} (Λ^{⊤} (s) X_{s} U_{s}^{⊤} + diag (X_{s}) {\bar{f}}^{⊤} (s)) d s + μ_{t}^{2},

where

μ_{t}^{2}

is an

F_{t}

-adapted martingale. Conditioning both sides of the latter equality on

{\bar{Y}}_{t}

, we can show that

{\hat{X}}_{t} U_{t}^{⊤} = {\hat{X}}_{ϰ_{n - 1}} U_{ϰ_{n - 1}}^{⊤} + \int_{ϰ_{n - 1}}^{t} (Λ^{⊤} (s) {\hat{X}}_{s} U_{s}^{⊤} + diag ({\hat{X}}_{s}) {\bar{f}}^{⊤} (s)) d s + μ_{t}^{3},

(A4)

where

μ_{t}^{3}

is a

{\bar{Y}}_{t}

-adapted martingale. On the other hand, using the Itô rule, representation (A3) and the fact that

ω_{t}

is the Wiener process, we can obtain

{\hat{X}}_{t} U_{t}^{⊤} = {\hat{X}}_{ϰ_{n - 1}} U_{ϰ_{n - 1}}^{⊤} + \int_{ϰ_{n - 1}}^{t} (Λ^{⊤} (s) {\hat{X}}_{s} U_{s}^{⊤} + {\hat{X}}_{s} {\hat{X}}_{s}^{⊤} {\bar{f}}^{⊤} (s) + α_{s}) d s + μ_{t}^{4},

(A5)

where

μ_{t}^{4}

is a

{\bar{Y}}_{t}

-adapted martingale. One can see that (A4) and (A5) are two representations of the same special semimartingale

{\hat{X}}_{t} U_{t}^{⊤}

, hence due to the representation uniqueness the

{\bar{Y}}_{t}

-predictable process

α_{t}

should satisfy the equality

\int_{ϰ_{n - 1}}^{t} diag ({\hat{X}}_{s}) {\bar{f}}^{⊤} (s) d s = \int_{ϰ_{n - 1}}^{t} ({\hat{X}}_{s} {\hat{X}}_{s}^{⊤} {\bar{f}}^{⊤} (s) + α_{s}) d s,

and

α_{t}

may be chosen in the form

α_{t} = (diag {\hat{X}}_{t -} - {\hat{X}}_{t -} {\hat{X}}_{t -}^{⊤}) {\bar{f}}^{⊤} (t) .

(A6)
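The coefficient (A6) is straightforward to evaluate for a given estimate. The sketch below uses an illustrative estimate vector and a hypothetical 1 × 3 drift matrix, and checks that each column of the coefficient sums to zero, so this diffusion term does not disturb the normalization of the estimate. The function name is hypothetical.

```python
import numpy as np

def wiener_gain(x_hat, f_bar):
    """(diag(x_hat) - x_hat x_hat^T) f_bar^T, cf. (A6); f_bar has shape (M, N)."""
    return (np.diag(x_hat) - np.outer(x_hat, x_hat)) @ f_bar.T

x_hat = np.array([0.2, 0.5, 0.3])      # illustrative estimate with unit sum
f_bar = np.array([[1.0, -0.5, 2.0]])   # hypothetical drift matrix, M = 1
alpha = wiener_gain(x_hat, f_bar)
print(alpha, alpha.sum(axis=0))
```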

Due to the generalized Itô rule, Formulae (5), (18) and the properties of X and

J_{j}

we obtain that

X_{t} C_{t}^{j} = X_{ϰ_{n - 1}} C_{ϰ_{n - 1}}^{j} + \int_{ϰ_{n - 1}}^{t} (Λ^{⊤} (s) X_{s} C_{s}^{j} + Γ_{j} (s) X_{s}) d s + μ_{t}^{5},

where

μ_{t}^{5}

is an

F_{t}

-adapted martingale. Conditioning both sides of this equality over

{\bar{Y}}_{t}

, we get

{\hat{X}}_{t} C_{t}^{j} = {\hat{X}}_{ϰ_{n - 1}} C_{ϰ_{n - 1}}^{j} + \int_{ϰ_{n - 1}}^{t} (Λ^{⊤} (s) {\hat{X}}_{s} C_{s}^{j} + Γ_{j} (s) {\hat{X}}_{s}) d s + μ_{t}^{6},

(A7)

where

μ_{t}^{6}

is a

{\bar{Y}}_{t}

-adapted martingale. On the other hand, using the Itô rule, representation (A3) and quadratic characteristic (21), we deduce that

{\hat{X}}_{t} C_{t}^{j} = {\hat{X}}_{ϰ_{n - 1}} C_{ϰ_{n - 1}}^{j} + \int_{ϰ_{n - 1}}^{t} (Λ^{⊤} (s) {\hat{X}}_{s} C_{s}^{j} + {\hat{X}}_{s} 1 Γ_{j} (s) {\hat{X}}_{s} + β_{s}^{j} 1 Γ_{j} (s) {\hat{X}}_{s}) d s + μ_{t}^{7},

(A8)

where

μ_{t}^{7}

is a

{\bar{Y}}_{t}

-adapted martingale. Since the representations (A7) and (A8) correspond to the same special semimartingale

{\hat{X}}_{t} C_{t}^{j}

we conclude that the process

β_{s}^{j}

should satisfy the equality

\int_{ϰ_{n - 1}}^{t} Γ_{j} (s) {\hat{X}}_{s} d s = \int_{ϰ_{n - 1}}^{t} [{\hat{X}}_{s} 1 Γ_{j} (s) {\hat{X}}_{s} + β_{s}^{j} 1 Γ_{j} (s) {\hat{X}}_{s}] d s .

Acting as with the coefficient

α_{t}

, we choose the predictable processes

β_{t}^{j}

in the form

β_{t}^{j} = (Γ_{j} (t) - 1 Γ_{j} (t) {\hat{X}}_{t -} I) {\hat{X}}_{t -} {(1 Γ_{j} (t) {\hat{X}}_{t -})}^{+}, j = \bar{1, N} .

(A9)

So, on the interval

[ϰ_{n - 1}, ϰ_{n})

the optimal filtering estimate

{\hat{X}}_{t}

is described by the SDS

\begin{matrix} {\hat{X}}_{t} = {\hat{X}}_{ϰ_{n - 1}} + \int_{ϰ_{n - 1}}^{t} Λ^{⊤} (s) {\hat{X}}_{s -} d s + \int_{ϰ_{n - 1}}^{t} (diag {\hat{X}}_{s -} - {\hat{X}}_{s -} {\hat{X}}_{s -}^{⊤}) {\bar{f}}^{⊤} (s) d ω_{s} + \\ + \sum_{j = 1}^{N} \int_{ϰ_{n - 1}}^{t} (Γ_{j} (s) - 1 Γ_{j} (s) {\hat{X}}_{s -} I) {\hat{X}}_{s -} {(1 Γ_{j} (s) {\hat{X}}_{s -})}^{+} d ν_{s}^{j} . \end{matrix}

(A10)

Since

P {Δ X_{ϰ_{n}} = 0} = 1

, Equation (A10) presumes

P

-a.s. fulfilment of the equality

\begin{matrix} E_{} \{X_{ϰ_{n}} | {\bar{Y}}_{ϰ_{n - 1}} \lor σ {U_{s}, s \in (ϰ_{n - 1}, ϰ_{n}]} \lor σ {C_{s}^{j}, s \in (ϰ_{n - 1}, ϰ_{n}], j = \bar{1, N}}\} = \\ = {\hat{X}}_{ϰ_{n - 1}} + \int_{ϰ_{n - 1}}^{ϰ_{n}} Λ^{⊤} (s) {\hat{X}}_{s -} d s + \int_{ϰ_{n - 1}}^{ϰ_{n}} (diag {\hat{X}}_{s -} - {\hat{X}}_{s -} {\hat{X}}_{s -}^{⊤}) {\bar{f}}^{⊤} (s) d ω_{s} + \\ + \sum_{j = 1}^{N} \int_{ϰ_{n - 1}}^{ϰ_{n}} (Γ_{j} (s) - 1 Γ_{j} (s) {\hat{X}}_{s -} I) {\hat{X}}_{s -} {(1 Γ_{j} (s) {\hat{X}}_{s -})}^{+} d ν_{s}^{j} = {\hat{X}}_{ϰ_{n} -} . \end{matrix}

Finally,

{\bar{Y}}_{ϰ_{n}} = {\bar{Y}}_{ϰ_{n - 1}} \lor σ {U_{s}, s \in (ϰ_{n - 1}, ϰ_{n}]} \lor σ {C_{s}^{j}, s \in (ϰ_{n - 1}, ϰ_{n}], j = \bar{1, N}} \lor σ {Δ D_{ϰ_{n}}},

so, by the Bayes rule we get that

{\hat{X}}_{ϰ_{n}} = {(Δ D_{ϰ_{n}}^{⊤} Δ J (ϰ_{n}) {\hat{X}}_{ϰ_{n} -})}^{+} diag (Δ D_{ϰ_{n}}) Δ J (ϰ_{n}) {\hat{X}}_{ϰ_{n} -} .

(A11)

Equation (23) can be obtained by “gluing” together the local Equations (A10), which describe the evolution of

{\hat{X}}_{t}

on the intervals

[ϰ_{n - 1}, ϰ_{n})

, and Formula (A11), which describes the estimate correction given the observations available at the moments

ϰ_{n}

.

Uniqueness of the strong solution within the class of nonnegative piecewise-continuous

Y_{t +}

-adapted processes with discontinuity set lying in

V

can be proved in complete analogy with ([31] Chap. 9, Theorem 9.2). Theorem 1 is proved. □

Appendix C. Proof of Corollary 1

The conditions of Corollary 1 guarantee that the elements of

K (t)

(4) satisfy the equality

K_{n m} (t) = δ_{n m}

almost everywhere, hence

J (t) \equiv I

. This means that in (23)

D_{0} = X_{0} \ P\text{-a.s.}

, i.e.,

{\hat{X}}_{0} = X_{0}

. Further, from the properties of transition intensity matrix

Λ (\cdot)

and the identity

J_{n} (t) \equiv e_{n}^{⊤}

it follows that

Γ_{n} (t) = diag (e_{n}) {\bar{Λ}}^{⊤} (t)

, where

\bar{Λ} (t) ≜ Λ (t) - λ (t)

,

λ (t) ≜ diag (Λ_{11} (t), \dots, Λ_{N N} (t))

. In this case

C_{t} = \int_{0}^{t} {\bar{Λ}}^{⊤} (s) X_{s} d s + \int_{0}^{t} (I - diag X_{s -}) d M_{s}^{X},

and the n-th component counts the jumps of

X_{t}

into the state

e_{n}

that occurred on the interval

(0, t]

. This means

X_{t}

is the unique solution to the “purely discontinuous” equation

X_{t} = D_{0} + \int_{0}^{t} (I - X_{s -} 1) d C_{s},

(A12)

i.e., the state

X_{t}

is measurable with respect to

σ {D_{0}, C_{s}, 0 \leq s \leq t}

, so

{\hat{X}}_{t} = X_{t}

P

-a.s.
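Equation (A12) says that the state can be rebuilt from its initial value and the counting observations alone. A minimal sketch of this update, assuming that each jump of the counting process is encoded by the index of the state that is entered (the function name and the sample jump sequence are hypothetical):

```python
import numpy as np

def reconstruct_states(initial_state, jump_targets, n_states):
    """Apply X += (I - X 1) dC at every jump, cf. (A12)."""
    x = np.zeros(n_states)
    x[initial_state] = 1.0
    history = [x.copy()]
    for target in jump_targets:
        dC = np.zeros(n_states)
        dC[target] = 1.0           # unit jump of the target-th counting component
        x = x + dC - x * dC.sum()  # (I - x 1) dC = dC - x (1 dC)
        history.append(x.copy())
    return np.array(history)

states = reconstruct_states(0, [2, 1, 0, 2], 3)
print(states.argmax(axis=1))
```

After every jump the vector equals the unit vector of the entered state, so no filtering error remains once the counting process is observed.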

Further, we substitute

X_{t}

into (23) and verify its validity. To do this we simplify the RHS of the equality using the explicit form of

J_{n} (t)

,

Γ_{n} (t)

and

C_{t}

, along with the identities

diag X_{t} - X_{t} X_{t}^{⊤} \equiv 0

and

Δ J (t) \equiv 0

:

\begin{matrix} X_{t} = D_{0} + \int_{0}^{t} Λ^{⊤} (s) X_{s} d s + \\ + \sum_{n = 1}^{N} \int_{0}^{t} [diag (e_{n}) {\bar{Λ}}^{⊤} (s) - e_{n}^{⊤} {\bar{Λ}}^{⊤} (s) X_{s -} I] X_{s -} {(e_{n}^{⊤} {\bar{Λ}}^{⊤} (s) X_{s -})}^{+} [d C_{s}^{n} - e_{n}^{⊤} {\bar{Λ}}^{⊤} (s) X_{s -} d s] = \\ = D_{0} + \sum_{n = 1}^{N} \int_{0}^{t} [diag (e_{n}) {\bar{Λ}}^{⊤} (s) - e_{n}^{⊤} {\bar{Λ}}^{⊤} (s) X_{s -} I] X_{s -} {(e_{n}^{⊤} {\bar{Λ}}^{⊤} (s) X_{s -})}^{+} d C_{s}^{n} . \end{matrix}

The properties of counting processes also provide the following implication: if for some

\mathcal{T} \subseteq [0, T]

the equality

\int_{\mathcal{T}} e_{n}^{⊤} {\bar{Λ}}^{⊤} (s) X_{s} d s = 0

holds, then

\int_{\mathcal{T}} d C_{s}^{n} = 0

. Hence, the latter transformation can be continued:

X_{t} = D_{0} + \sum_{n = 1}^{N} \int_{0}^{t} [e_{n} - X_{s -}] e_{n}^{⊤} d C_{s} = D_{0} + \int_{0}^{t} (I - X_{s -} 1) d C_{s},

which leads to (A12). So, we have verified that under conditions of Corollary 1 the state

X_{t}

is a solution to the filtering Equation (23). Corollary 1 is proved. □

Appendix D. Proof of Lemma 4

Using notations

Ξ_{r} ≜ ξ_{1} ξ_{2} \dots ξ_{r}

and

Θ_{r} ≜ θ_{1} θ_{2} \dots θ_{r}

we can rewrite the estimates

{\hat{X}}_{r}

and

{\bar{X}}_{r} (s)

in the explicit form

{\hat{X}}_{r} = {(1 {(Ξ_{r} + Θ_{r})}^{⊤} π)}^{- 1} {(Ξ_{r} + Θ_{r})}^{⊤} π, {\bar{X}}_{r} (s) = {(1 Ξ_{r}^{⊤} π)}^{- 1} Ξ_{r}^{⊤} π .

To simplify inferences we will omit the index r in

Ξ_{r}

and

Θ_{r}

. The following relations are valid

\begin{matrix} E_{} \{{∥{\hat{X}}_{r} - {\bar{X}}_{r} (s)∥}_{1}\} = E_{} \{{∥\frac{1}{1 {(Ξ + Θ)}^{⊤} π} {(Ξ + Θ)}^{⊤} π - \frac{1}{1 Ξ^{⊤} π} Ξ^{⊤} π∥}_{1}\} = \\ = E_{} \{\frac{1}{1 {(Ξ + Θ)}^{⊤} π 1 Ξ^{⊤} π} {∥1 Ξ^{⊤} π Θ^{⊤} π - 1 Θ^{⊤} π Ξ^{⊤} π∥}_{1}\} \leq \\ \leq E_{} \{\frac{1}{1 {(Ξ + Θ)}^{⊤} π 1 Ξ^{⊤} π} (1 Ξ^{⊤} π ∥ Θ^{⊤} {π ∥}_{1} + 1 Θ^{⊤} π {∥ Ξ^{⊤} π ∥}_{1})\} = 2 E_{} \{\frac{1}{1 {(Ξ + Θ)}^{⊤} π} 1 Θ^{⊤} π\} . \end{matrix}

(A13)

Let us consider an auxiliary estimate

{\overset{˘}{X}}_{r} ≜ E_{} \{X_{t_{r}} I_{A_{r}^{s}} (ω) | Y_{r}\}

. From the Bayes rule it follows that

{\overset{˘}{X}}_{r} = \frac{1}{1 {(Ξ + Θ)}^{⊤} π} Ξ^{⊤} π

and

{\hat{X}}_{r} - {\overset{˘}{X}}_{r} = E_{} \{X_{t_{r}} I_{{\bar{A}}_{r}^{s}} (ω) | Y_{r}\} = \frac{1}{1 {(Ξ + Θ)}^{⊤} π} Θ^{⊤} π .

(A14)

From (A13) and (A14) we deduce that for

r = 1

and

\forall π \in Π

\begin{matrix} E_{} \{∥ {\hat{X}}_{1} - {\bar{X}}_{1} {(s) ∥}_{1}\} \leq 2 E_{} \{∥ E_{} \{X_{t_{1}} I_{{\bar{a}}_{1}^{s}} (ω) | Y_{1}\} ∥_{1}\} = \\ = 2 E_{} \{\sum_{n = 1}^{N} E_{} \{X_{t_{1}}^{n} I_{{\bar{a}}_{1}^{s}} (ω) | Y_{1}\}\} = 2 E_{} \{E_{} \{I_{{\bar{a}}_{1}^{s}} (ω) | Y_{1}\}\} = 2 P {{\bar{a}}_{1}^{s}} . \end{matrix}

(A15)

The counting process

N_{t}^{X}

has the quadratic characteristic

{〈 N^{X}, N^{X} 〉}_{t} = - \int_{0}^{t} \sum_{n = 1}^{N} λ_{n n} X_{s}^{n} d s

, hence the probability

P {{\bar{a}}_{1}^{s}}

can be bounded from above as

P {{\bar{a}}_{1}^{s}} \leq e^{- \bar{λ} h} \sum_{k = s + 1}^{\infty} \frac{{(\bar{λ} h)}^{k}}{k!} = C_{1} \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!} .

(A16)

Formulae (A15) and (A16) imply that

{sup}_{π \in Π} E_{} \{∥ {\hat{X}}_{1} - {\bar{X}}_{1} {(s) ∥}_{1}\} \leq 2 C_{1} \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!}

.

Markovianity of the pair

(X_{t}, N_{t}^{X})

and inequality (A16) also allow us to bound the probability

P {{\bar{A}}_{r}^{s}}

from above:

P {{\bar{A}}_{r}^{s}} \leq 1 - {(1 - C_{1} \frac{{(\bar{λ} h)}^{s + 1}}{(s + 1)!})}^{r}

, which leads to (34). Lemma 4 is proved. □

Appendix E. Proof of Theorem 2

We have

{\tilde{X}}_{1} = {(1 ψ_{1}^{⊤} π)}^{- 1} ψ_{1}^{⊤} π

,

{\bar{X}}_{1} (s) = {(1 ξ_{1}^{⊤} π)}^{- 1} ξ_{1}^{⊤} π

and

Δ_{1} = {\tilde{X}}_{1} - {\bar{X}}_{1} (s)

. Using the matrix algebra it is easy to verify that

[γ^{⊤} π 1 - 1 γ^{⊤} π I] γ^{⊤} π \equiv 0

. Both the estimates are stable, hence

∥ {\tilde{X}}_{1} ∥_{1} = {∥ {\bar{X}}_{1} (s) ∥}_{1} = 1

. The following relations are valid:

\begin{matrix} ∥ Δ_{1} ∥_{1} = \frac{1}{1 ψ_{1}^{⊤} π 1 ξ_{1}^{⊤} π} ∥ 1 ξ_{1}^{⊤} π ψ_{1}^{⊤} π - 1 ψ_{1}^{⊤} π ξ_{1}^{⊤} π ∥_{1} = \frac{1}{1 ψ_{1}^{⊤} π 1 ξ_{1}^{⊤} π} {∥ 1 ξ_{1}^{⊤} π γ_{1}^{⊤} π - 1 γ_{1}^{⊤} π ξ_{1}^{⊤} π ∥}_{1} = \\ = \frac{1}{1 ψ_{1}^{⊤} π 1 ξ_{1}^{⊤} π} {∥ [γ_{1}^{⊤} π 1 - 1 γ_{1}^{⊤} π I] ξ_{1}^{⊤} π ∥}_{1} = \\ = \frac{1}{1 ψ_{1}^{⊤} π 1 ξ_{1}^{⊤} π} ∥ [γ_{1}^{⊤} π 1 - 1 γ_{1}^{⊤} π I] [ξ_{1}^{⊤} π + γ_{1}^{⊤} π] ∥_{1} = \frac{1}{1 ξ_{1}^{⊤} π} {∥ [γ_{1}^{⊤} π 1 - 1 γ_{1}^{⊤} π I] {\tilde{X}}_{1} ∥}_{1} \leq \\ \leq \frac{1}{1 ξ_{1}^{⊤} π} ∥ [γ_{1}^{⊤} π 1 - 1 γ_{1}^{⊤} π I] ∥_{1} {∥ {\tilde{X}}_{1} ∥}_{1} \leq 2 \frac{1 {\bar{γ}}_{1}^{⊤} π}{1 ξ_{1}^{⊤} π} = 2 \sum_{i = 1}^{N} π_{i} \frac{\sum_{j = 1}^{N} {\bar{γ}}_{1}^{i j}}{\sum_{k, ℓ = 1}^{N} ξ_{1}^{k ℓ} π_{k}} . \end{matrix}
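The one-step inequality between the first and the last expressions in this chain can be sanity-checked numerically. The sketch below draws random positive matrices as stand-ins for ξ_1 and ψ_1 (an assumption made purely for illustration) and verifies that the 1-norm of the difference of the normalized vectors never exceeds twice the ratio of the total absolute perturbation to the normalizing constant.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3
worst_ratio = 0.0
for _ in range(1000):
    xi = rng.uniform(0.05, 1.0, size=(N, N))        # stand-in for xi_1
    psi = xi * rng.uniform(0.8, 1.2, size=(N, N))   # perturbed stand-in for psi_1
    pi = rng.dirichlet(np.ones(N))                  # random initial distribution
    x_tilde = psi.T @ pi / (psi.T @ pi).sum()
    x_bar = xi.T @ pi / (xi.T @ pi).sum()
    lhs = np.abs(x_tilde - x_bar).sum()
    rhs = 2.0 * (np.abs(psi - xi).T @ pi).sum() / (xi.T @ pi).sum()
    worst_ratio = max(worst_ratio, lhs / rhs)
print(worst_ratio)
```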

Using the last inequality, (41) and (A20), it can be shown that

E_{} \{I_{a_{1}^{s}} (ω) {∥ Δ_{1} ∥}_{1}\} \leq 2 \sum_{i = 1}^{N} π_{i} \int_{R^{M}} \sum_{j = 1}^{N} {\bar{γ}}^{i j} (y) d y \leq 2 δ .

Since the latter inequality is valid for any

π \in Π

, we have an upper bound for the local distance characteristic:

sup_{π \in Π} E_{} \{I_{a_{1}^{s}} (ω) {∥ {\tilde{X}}_{1} - {\bar{X}}_{1} (s) ∥}_{1}\} \leq 2 δ .

(A17)

Let us define the following products of the random matrices

ξ_{r}

and

ψ_{r}

:

Ξ_{q, r} ≜ \{\begin{matrix} ξ_{q} ξ_{q + 1} \dots ξ_{r}, & if q \leq r, \\ I & otherwise, \end{matrix}

Ψ_{q, r} ≜ \{\begin{matrix} ψ_{q} ψ_{q + 1} \dots ψ_{r}, & if q \leq r, \\ I & otherwise, \end{matrix}

Γ_{q, r} ≜ Ψ_{q, r} - Ξ_{q, r} .

To proceed with the proof of Theorem 2, we need the following auxiliary

Lemma A1.

If

ϕ_{r} ≜ ϕ_{r} (Y_{1}, \dots, Y_{r})

is a non-negative

Y_{r}

-measurable random value, and

Φ_{r} ≜ \frac{ϕ_{r}}{1 Ξ_{1, r}^{⊤} π}

, then

E_{} \{I_{A_{r}^{s}} (ω) Φ_{r}\} = \int_{R^{M}} \dots \int_{R^{M}} ϕ_{r} (y_{1}, \dots, y_{r}) d y_{r} \dots d y_{1} .

(A18)

Proof of Lemma A1.

We consider a non-negative integrable function

ϕ_{1} = ϕ_{1} (y) : R^{M} \to R_{+}

and a

Y_{1}

-measurable random value

Φ_{1} ≜ \frac{ϕ_{1} (Y_{1})}{1 ξ_{1}^{⊤} (Y_{1}) π} = \frac{ϕ_{1} (Y_{1})}{\sum_{i, j = 1}^{N} \sum_{m = 0}^{s} \int_{D} N (Y_{1}, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{i, j, m} (d u) π_{i}} .

(A19)

We find

E_{} \{I_{a_{1}^{s}} (ω) Φ_{1}\}

:

\begin{matrix} E_{} \{I_{a_{1}^{s}} (ω) Φ_{1}\} = \int_{R^{M}} \int_{D} \frac{ϕ_{1} (y) \sum_{k, ℓ = 1}^{N} \sum_{n = 0}^{s} N (y, f v, \sum_{q = 1}^{N} v^{q} G_{q}) ρ^{k, ℓ, n} (d v) π_{k}}{\sum_{i, j = 1}^{N} \sum_{m = 0}^{s} \int_{D} N (y, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{i, j, m} (d u) π_{i}} d y = \\ = \int_{R^{M}} ϕ_{1} (y) \frac{\sum_{k, ℓ = 1}^{N} \sum_{n = 0}^{s} \int_{D} N (y, f v, \sum_{q = 1}^{N} v^{q} G_{q}) ρ^{k, ℓ, n} (d v) π_{k}}{\sum_{i, j = 1}^{N} \sum_{m = 0}^{s} \int_{D} N (y, f u, \sum_{p = 1}^{N} u^{p} G_{p}) ρ^{i, j, m} (d u) π_{i}} d y = \int_{R^{M}} ϕ_{1} (y) d y . \end{matrix}

(A20)

Let us consider a non-negative integrable function

ϕ_{2} = ϕ_{2} (y_{1}, y_{2}) : R^{2 M} \to R_{+}

and a

Y_{2}

-measurable random value

\begin{matrix} Φ_{2} ≜ \frac{ϕ_{2} (Y_{1}, Y_{2})}{1 Ξ_{1, 2}^{⊤} (Y_{1}, Y_{2}) π} = \\ = \frac{ϕ_{2} (Y_{1}, Y_{2})}{\sum_{i, i_{2}, j = 1}^{N} \sum_{m_{1}, m_{2} = 0}^{s} \int_{D} \int_{D} N (Y_{1}, f u_{1}, \sum_{p_{1} = 1}^{N} u^{p_{1}} G_{p_{1}}) N (Y_{2}, f u_{2}, \sum_{p_{2} = 1}^{N} u^{p_{2}} G_{p_{2}}) ρ^{i, i_{2}, m_{1}} (d u_{1}) ρ^{i_{2}, j, m_{2}} (d u_{2}) π_{i}} . \end{matrix}

We find

E_{} \{I_{A_{2}^{s}} (ω) Φ_{2}\}

:

\begin{matrix} E_{} \{I_{A_{2}^{s}} (ω) Φ_{2}\} = \int_{R^{M}} \int_{R^{M}} ϕ_{2} (y_{1}, y_{2}) \times \\ \times \frac{\sum_{k, k_{2}, ℓ = 1}^{N} \sum_{n_{1}, n_{2} = 0}^{s} \int_{D} \int_{D} N (y_{1}, f v_{1}, \sum_{q_{1} = 1}^{N} v^{q_{1}} G_{q_{1}}) N (y_{2}, f v_{2}, \sum_{q_{2} = 1}^{N} v^{q_{2}} G_{q_{2}}) ρ^{k, k_{2}, n_{1}} (d v_{1}) ρ^{k_{2}, ℓ, n_{2}} (d v_{2}) π_{k}}{\sum_{i, i_{2}, j = 1}^{N} \sum_{m_{1}, m_{2} = 0}^{s} \int_{D} \int_{D} N (y_{1}, f u_{1}, \sum_{p_{1} = 1}^{N} u^{p_{1}} G_{p_{1}}) N (y_{2}, f u_{2}, \sum_{p_{2} = 1}^{N} u^{p_{2}} G_{p_{2}}) ρ^{i, i_{2}, m_{1}} (d u_{1}) ρ^{i_{2}, j, m_{2}} (d u_{2}) π_{i}} d y_{2} d y_{1} = \\ = \int_{R^{M}} \int_{R^{M}} ϕ_{2} (y_{1}, y_{2}) d y_{2} d y_{1} . \end{matrix}

The assertion of the lemma in the general case of

E_{} \{I_{A_{r}^{s}} (ω) Φ_{r}\}

can be verified similarly. Lemma A1 is proved. □

Let us derive an upper bound for the norm of

Δ_{r} = {\tilde{X}}_{r} - {\bar{X}}_{r}

. From the definitions of

Ξ

,

Ψ

and

Γ

it follows that

Γ_{1, r} ≜ Ψ_{1, r} - Ξ_{1, r} = \sum_{t = 1}^{r} Ψ_{1, t - 1} γ_{t} Ψ_{t + 1, r} .

(A21)

Arguing in the same way as for

Δ_{1}

, we can deduce that

∥ Δ_{r} ∥_{1} \leq \frac{1}{1 Ξ_{1, r}^{⊤} π} {∥ [Γ_{1, r}^{⊤} π 1 - 1 Γ_{1, r}^{⊤} π I] ∥}_{1} \leq 2 \sum_{t = 1}^{r} \frac{1}{1 Ξ_{1, r}^{⊤} π} 1 Ψ_{t + 1, r}^{⊤} {\bar{γ}}_{t}^{⊤} Ψ_{1, t - 1}^{⊤} π .

(A22)

To bound the contribution of each summand in (A22), we use (A18). To simplify the derivation, we consider the case

r = 3

, the function

ϕ (y_{1}, y_{2}, y_{3}) : R^{3 M} \to R_{+}

defined by

ϕ (y_{1}, y_{2}, y_{3}) = 1 ψ^{⊤} (y_{3}) {\bar{γ}}^{⊤} (y_{2}) ψ^{⊤} (y_{1}) π

, and the

Y_{3}

-measurable random variable

Φ ≜ \frac{ϕ (Y_{1}, Y_{2}, Y_{3})}{1 Ξ_{1, 3}^{⊤} (Y_{1}, Y_{2}, Y_{3}) π}

. Let us bound the expectation from above:

\begin{matrix} E_{} \{I_{A_{3}^{s}} (ω) Φ\} = \int_{R^{M}} \int_{R^{M}} \int_{R^{M}} \sum_{i, j, k, m = 1}^{N} π_{i} ψ^{i j} (y_{1}) {\bar{γ}}^{j k} (y_{2}) ψ^{k m} (y_{3}) d y_{3} d y_{2} d y_{1} = \\ = \sum_{i, j, k = 1}^{N} π_{i} \sum_{ℓ = 1}^{L} ϱ_{ℓ}^{i j} \int_{R^{M}} {\bar{γ}}^{j k} (y_{2}) d y_{2} \sum_{m = 1}^{N} \sum_{n = 1}^{L} ϱ_{n}^{k m} = Q \sum_{i, j = 1}^{N} π_{i} \sum_{ℓ = 1}^{L} ϱ_{ℓ}^{i j} \sum_{k = 1}^{N} \int_{R^{M}} {\bar{γ}}^{j k} (y_{2}) d y_{2} \leq \\ \leq Q δ \sum_{i = 1}^{N} π_{i} \sum_{j = 1}^{N} \sum_{ℓ = 1}^{L} ϱ_{ℓ}^{i j} \leq Q^{2} δ . \end{matrix}

Proceeding in the same way, we can prove that for arbitrary

r \geq 2

the inequality

E_{} \{I_{A_{r}^{s}} (ω) \frac{1 Ψ_{t + 1, r}^{⊤} {\bar{γ}}_{t}^{⊤} Ψ_{1, t - 1}^{⊤} π}{1 Ξ_{1, r}^{⊤} π}\} \leq Q^{r - 1} δ

holds for each of the r summands on the right-hand side of (A22). Finally,

E_{} \{I_{A_{r}^{s}} (ω) {∥ Δ_{r} ∥}_{1}\} \leq 2 r Q^{r - 1} δ,

and (42) follows because the latter inequality holds for arbitrary

π \in Π

. Theorem 2 is proved. □
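To get a feel for how the final bound 2 r Q^{r - 1} δ behaves as the number of steps r grows, it can simply be tabulated. The snippet below is a minimal sketch: the function name `error_bound` and the sample values of Q and δ are hypothetical and chosen only for illustration, and the function evaluates the right-hand side of the last inequality without reproducing any of the paper's numerical experiments.

```python
def error_bound(r: int, Q: float, delta: float) -> float:
    """Evaluate 2 * r * Q**(r - 1) * delta, the right-hand side of the final inequality.

    r     -- number of discretization steps (positive integer)
    Q     -- the constant Q from the derivation (non-negative)
    delta -- the constant delta from the derivation (non-negative)
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    if Q < 0 or delta < 0:
        raise ValueError("Q and delta must be non-negative")
    return 2.0 * r * Q ** (r - 1) * delta


if __name__ == "__main__":
    delta = 1e-3  # hypothetical value
    for Q in (0.9, 1.0, 1.1):  # hypothetical values
        values = [error_bound(r, Q, delta) for r in (1, 2, 5, 10)]
        print(f"Q = {Q}: " + ", ".join(f"{v:.5f}" for v in values))
```

For Q = 1 the expression grows linearly in r, while for Q > 1 the factor Q^{r - 1} makes it grow geometrically.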

Figure 1. Estimation quality index

S_{t} (h)

depending on the time-discretization step h.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
