Identification of Continuous-Discrete Hidden Markov Models with Multiplicative Observation Noise

Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Vavilova str. 44/2, 119333 Moscow, Russia
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(17), 3062; https://doi.org/10.3390/math10173062
Submission received: 26 July 2022 / Revised: 19 August 2022 / Accepted: 21 August 2022 / Published: 25 August 2022
(This article belongs to the Special Issue Mathematical Modeling, Optimization and Machine Learning)

Abstract

The paper aims to identify the parameters of a hidden Markov model. The unobservable state is a finite-state Markov jump process. The observations contain Wiener noise with state-dependent intensity. The identified parameters include the transition intensity matrix of the system state and the conditional drift and diffusion coefficients in the observations. We propose an iterative identification algorithm based on fixed-interval smoothing of the Markov state. Using the calculated state estimates, we restore all required system parameters. The paper contains a detailed description of the numerical schemes for state estimation and parameter identification. A comprehensive numerical study confirms the high precision of the proposed identification estimates.

1. Introduction

Parameter identification of stochastic dynamic systems given indirect, heterogeneous, noisy observations is one of the most complicated and demanding problems of system analysis. Usually, this type of estimation arises either in the development and validation of mathematical models, with their subsequent analysis and forecasting, or in various versions of stochastic control under incomplete information. Engineers use identification results in mechanical and electronic systems, in navigation, biology and medicine, economics and finance, as well as in speech and image processing.
There exist both a series of classical monographs and a number of comprehensive surveys reflecting the state of the art in stochastic identification theory. Hidden Markov models (HMMs) are among the most promising frameworks [1,2,3,4]. First, they form the mathematical basis for the description of processes in telecommunications [5,6], finance [7,8,9], biology and medicine [10,11,12,13], tracking and navigation [14,15], and speech and image recognition [16,17,18,19]. Second, the advanced theoretical framework of HMMs provides solutions to a series of optimal estimation and control problems [20].
The object of investigation in this paper is a subclass of continuous-time HMMs whose states are finite-state Markov jump processes (MJPs) observed via diffusion-type observations. Researchers have used various methods and algorithms to identify parameters in these HMMs [21,22,23,24,25,26]; however, the most effective is the maximum likelihood method as realized by the expectation-maximization (EM) algorithm [2,27]. It has been successfully used for parameter identification in various stochastic models, in particular for the separation of Gaussian mixtures [28]. To identify the HMM parameters, the authors of [29,30] introduced the corresponding version of the algorithm and proved conditions for its convergence to a local maximum of the likelihood function. The essential assumption there is the constant intensity of the Wiener observation noise.
The identified parameters include the transition intensity matrix (TIM) of the MJP and the conditional drifts in the observations. Note that the EM procedure leans on the optimal filtering of the system state and of some functionals of its trajectory. The present paper aims to develop a new modification of the identification algorithm for continuous-time HMM parameters with state-dependent noise intensity in the observations. Such observation noise is also called multiplicative. Estimation problems in such systems have traditionally been investigated rarely because of known theoretical difficulties (see [31,32] for details). The formal theoretical solution to the optimal filtering problem for the MJP state given diffusion observations with multiplicative noise was published recently [33,34], along with a complex of stable numerical algorithms for its realization. The presented work is the natural evolution of that research.
The paper is organized as follows. Section 2 contains a formal description of the investigated observation system and the statement of its parameter identification problem. The section also includes a comparative analysis of the various problem formulations and of the properties of the EM estimates calculated for the adjacent HMMs: the discrete-time systems and the continuous-time ones with homogeneous observation Wiener noise. Additionally, the section presents some arguments in favor of the proposed version of the identification algorithm, which is based on high-precision fixed-interval smoothing estimates of the hidden state.
Section 3 contains a numerically effective “two-filter smoothing” algorithm, which expresses the smoothed estimates as a symmetric function of the forward- and backward-time filtering ones. The calculation of the smoothed estimates represents the E-step of the EM algorithm. The M-step itself and the whole recursive EM procedure are introduced in Section 4. The section contains the conversion of the $L_2$-optimal state estimates to the $L_1$-optimal ones and the formulae for the estimation of the required functionals of the MJP state trajectory. The section also presents pseudocode sketching the structure of the EM algorithm.
The solution to the optimal MJP state filtering problem given the diffusion observations with multiplicative noise is the backbone of the identification algorithm. The theoretical estimator, being a variant of the generalized Bayes rule, contains integrals with respect to an abstract probability measure. They cannot be calculated analytically; hence, Section 5 proposes their numerical approximations together with accuracy characteristics.
Section 6 contains a complex of illustrative numerical examples. Section 6.1 presents a comparison of the filtering and fixed-interval smoothing estimates. The new version of the estimates is also compared with the one from [2], initially proposed for processing discrete-time HMMs. Section 6.2 contains a diversified numerical analysis of the proposed identification estimates. Section 7 contains our concluding remarks.

2. Problem Formulation

On the probability triplet with filtration $(\Omega, \mathcal{F}, \mathsf{P}, \{\mathcal{F}_t\}_{t \in [0,T]})$, we consider the partially observable HMM
$$X_t = X_0 + \int_0^t \Lambda^\top X_s\,ds + M_t^X, \qquad (1)$$
$$Y_r = \int_{t_{r-1}}^{t_r} f X_s\,ds + \int_{t_{r-1}}^{t_r} \sum_{n=1}^N X_s^n\, g_n^{1/2}\, dW_s, \quad r \in \{1,\dots,R\},\quad t_r = r\delta,\quad T = R\delta, \qquad (2)$$
where
• $X_t = \mathrm{col}(X_t^1,\dots,X_t^N) \in \mathbb{S}^N$ is an unobservable state, which is a finite-state MJP with the state space $\mathbb{S}^N \triangleq \{e_1,\dots,e_N\}$ ($\mathbb{S}^N$ stands for the set of all unit coordinate vectors of the Euclidean space $\mathbb{R}^N$), the transition intensity matrix (TIM) $\Lambda$ and initial distribution $\pi = \mathrm{col}(\pi_1,\dots,\pi_N)$; the process $M_t^X$ is an $\mathcal{F}_t$-adapted martingale; and
• $Y_r = \mathrm{col}(Y_r^1,\dots,Y_r^M) \in \mathbb{R}^M$ are the diffusion observations
$$Y_t = \int_0^t f X_s\,ds + \int_0^t \sum_{n=1}^N X_s^n\, g_n^{1/2}\, dW_s, \qquad (3)$$
discretized with a (small) time increment $\delta$; $W_t = \mathrm{col}(W_t^1,\dots,W_t^M) \in \mathbb{R}^M$ is an $\mathcal{F}_t$-adapted standard Wiener process characterizing the observation noise; and $f$ is an $M \times N$-dimensional observation matrix, whose columns define the conditional observation drift, whereas the collection of $M \times M$-dimensional matrices $\{g_n\}_{n=\overline{1,N}}$ represents the conditional observation diffusion coefficient given $X_t = e_n$.
The non-decreasing family of $\sigma$-algebras generated by the discretized observations $Y$ up to the moment $t_r$ is denoted by $\mathcal{Y}_r \triangleq \sigma\{Y_q\colon q = \overline{1,r}\}$, $\mathcal{Y}_0 \triangleq \{\emptyset, \Omega\}$.
We suppose that the assumptions below hold for the system (1) and (2).
• The probability triplet with filtration $(\Omega, \mathcal{F}, \mathsf{P}, \{\mathcal{F}_t\}_{t \in [0,T]})$ is a Wiener–Poisson one [35].
• All the HMM parameters $\Lambda$, $f$ and $\{g_n\}_{n=\overline{1,N}}$ are unknown nonrandom matrices of appropriate dimensionality.
• The transition intensity matrix $\Lambda$ satisfies the condition
$$\min_{i,j\colon i \ne j} \lambda_{ij} > 0 \qquad (4)$$
along with the usual identity for TIMs
$$\sum_j \lambda_{ij} \equiv 0.$$
• The noises in $Y$ are uniformly nondegenerate [36], i.e., $g_n > 0$ for all $n = \overline{1,N}$.
• The collection $\{g_n\}_{n=\overline{1,N}}$ of the observation noise intensities meets the identifiability condition [33], i.e., all the matrices $\{g_n\}$ are pairwise different.
The HMM identification problem is to find estimates of $\Lambda$, $f$ and $\{g_n\}_{n=\overline{1,N}}$ given the observations $\{Y_1,\dots,Y_R\}$.
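For readers who wish to experiment with the model, the following sketch simulates a trajectory of the system (1), (2) on a fine grid. It is an illustration only: the function name `simulate_hmm`, the array layout of `f` and `g`, and the one-step transition approximation are our assumptions, not notation from the paper.

```python
import numpy as np

def simulate_hmm(Lam, f, g, pi, T, delta, delta_sim, rng):
    """Simulate the MJP state (1) on a fine grid and the discretized
    diffusion observations (2) with state-dependent noise intensity.

    Lam : (N, N) transition intensity matrix, rows sum to zero
    f   : (M, N) conditional drift matrix (columns f e_n)
    g   : (N, M, M) conditional noise intensities, g[n] > 0
    pi  : (N,) initial distribution
    """
    N, M = f.shape[1], f.shape[0]
    n_sim = int(round(T / delta_sim))        # fine simulation grid
    sub = int(round(delta / delta_sim))      # fine steps per observation step
    R = n_sim // sub                         # number of observations

    # piecewise-constant MJP: first-order transition kernel over delta_sim
    state = rng.choice(N, p=pi)
    states = np.empty(n_sim, dtype=int)
    for i in range(n_sim):
        states[i] = state
        probs = Lam[state] * delta_sim
        probs[state] = max(0.0, 1.0 + Lam[state, state] * delta_sim)
        state = rng.choice(N, p=probs / probs.sum())

    # Euler--Maruyama increments of the observations (3), summed over [t_{r-1}, t_r]
    Y = np.zeros((R, M))
    for r in range(R):
        for i in range(r * sub, (r + 1) * sub):
            n = states[i]
            dW = rng.standard_normal(M) * np.sqrt(delta_sim)
            Y[r] += f[:, n] * delta_sim + np.linalg.cholesky(g[n]) @ dW
    return states, Y

# Example (the drift-free parameter values used in the numerical study of Section 6):
# Lam = np.array([[-18., 12., 6.], [9., -18., 9.], [6., 12., -18.]])
# f = np.array([[0., 0., 0.]]); g = np.array([[[0.1]], [[0.2]], [[0.3]]])
# states, Y = simulate_hmm(Lam, f, g, np.array([0.3, 0.4, 0.3]),
#                          T=1.0, delta=0.1, delta_sim=1e-3,
#                          rng=np.random.default_rng(0))
```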
Remark 1.
Note that condition (4) can be weakened, requiring only the ergodicity of the MJP $X$.
The identification problem for the parameters of the system (1), (2) has its own history. It represents an approximation of the parameter estimation problem given the continuous-time diffusion observations (3). The authors of [29,30] investigated the continuous-time system (1), (3) with scalar observations and additive noise (i.e., the case $g_n \equiv g$). They developed a procedure of simultaneous MJP state filtering and system parameter identification based on the EM algorithm. The parameter identification requires the estimation of the number of $e_i \to e_j$ transitions
$$N_t^{i,j} = \int_0^t X_s^i\,dX_s^j \qquad (5)$$
and the cumulative occupation time of the value $e_n$
$$O_t^n = \int_0^t X_s^n\,ds. \qquad (6)$$
The definitions given in (5) and (6) refer, respectively, to the transitions and the occupation time related to the interval $[0, t]$.
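Given a simulated state path on a fine grid (e.g., from the sketch above), the statistics (5) and (6) reduce to transition counts and dwell-time sums. The helper below is an illustrative companion to `simulate_hmm`, not code from the paper.

```python
import numpy as np

def transition_and_occupation(states, delta_sim, N):
    """N_T[i, j]: number of e_i -> e_j jumps along the path (formula (5));
    O_T[n]: cumulative time spent in state e_n (formula (6))."""
    N_T = np.zeros((N, N))
    for prev, cur in zip(states[:-1], states[1:]):
        if prev != cur:
            N_T[prev, cur] += 1
    O_T = np.bincount(states, minlength=N) * delta_sim
    return N_T, O_T
```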
The authors of the monograph [2] investigated a discrete-time analog of (1), (3). The available observation is a vector-valued process corrupted by multiplicative noise. The authors modified the EM algorithm and incorporated into it the fixed-interval smoothed estimates of $\{N_t^{i,j}\}$, $\{O_t^n\}$ and the “level sums”
$$Q_t^{n,q} = \int_0^t X_s^n\, q(Y_s)\,dY_s. \qquad (7)$$
Note that the mentioned estimates of $\{N_t^{i,j}\}$, $\{O_t^n\}$ and $\{Q_t^{n,q}\}$ are not calculated by the two-filter smoothing algorithm, similar to [37], due to the nature of the underlying processes.
The paper [33] contains a solution to the optimal filtering problem for MJP states given vector observations with multiplicative noises. The authors suggested an observation transform that splits the initial observations into a diffusion process with unit intensity, a collection of counting processes and a set of random vectors observed at some non-random moments. The theoretical estimate represents the solution to a discrete-continuous stochastic differential system (SDS) with diffusion and counting processes on the right-hand side.
The article asserts that the identifiability condition is sufficient for the precise restoration of the MJP state given the indirect noisy observations. Unfortunately, the solution to the SDS cannot be realized numerically due to the necessity of performing a double limit passage for the required observation transform. To overcome this obstacle, the authors suggested a collection of numerical filtering algorithms based on the discretized observations. They investigated the performance of the proposed approximations and showed that the algorithms deliver high-precision MJP state estimates.
It is difficult to find the optimal estimates of the processes $\{N_t^{i,j}\}$, $\{O_t^n\}$ and $\{Q_t^{n,q}\}$ directly from the discretized observations. Nevertheless, it is possible to construct suboptimal ones based on the high-precision two-filter smoothing estimates of the MJP state $X_t$.
In the presentation below, we use the following notation:
• $\mathbf{1}$ is a row vector of appropriate dimensionality formed by units;
• $I$ is the identity matrix of appropriate dimensionality;
• $\mathbf{I}_A(x)$ is the indicator function of the set $A$;
• $\|\alpha\|_K^2 \triangleq \alpha^\top K \alpha$;
• $N_r^X$ is the random number of transitions of the state $X_t$ occurring on the interval $[t_{r-1}, t_r]$;
• $\mathsf{P}^{\xi}$ is the restriction of the probability measure $\mathsf{P}$ to the $\sigma$-algebra $\sigma\{X_t, Y_t\colon 0 \le t \le T\}$;
• $\tau_r = \mathrm{col}(\tau_r^1,\dots,\tau_r^N) \triangleq \int_{t_{r-1}}^{t_r} X_s\,ds$ is the random vector composed of the occupation times of $X$ in each state $e_n$ during the interval $[t_{r-1}, t_r]$;
• $\mathcal{D} \triangleq \{u = \mathrm{col}(u^1,\dots,u^N)\colon u^n \ge 0,\ \sum_{n=1}^N u^n = \delta\}$ is the $(N-1)$-dimensional simplex in the space $\mathbb{R}^N$ that is the distribution support of the vector $\tau_r$;
• $\Pi \triangleq \{\pi = \mathrm{col}(\pi^1,\dots,\pi^N)\colon \pi^n \ge 0,\ \sum_{n=1}^N \pi^n = 1\}$ is the “probabilistic simplex” formed by the possible values of $\pi$;
• $\rho_r^{k,\bullet}(du)$ is the conditional distribution of the pair $(\tau_r, X_{t_r})$ given $X_{t_{r-1}} = e_k$, written as a vector measure, i.e., for any $G \in \mathcal{B}(\mathbb{R}^N)$ the following equality holds:
$$\mathsf{E}\big\{\mathbf{I}_G(\tau_r)\, X_{t_r} \mid X_{t_{r-1}} = e_k\big\} = \int_G \rho_r^{k,\bullet}(du);$$
• $\rho_r^{k,\bullet,q}(du)$ is the analogous conditional distribution restricted to the trajectories with exactly $q$ state transitions on $[t_{r-1}, t_r]$, i.e., for any $G \in \mathcal{B}(\mathbb{R}^N)$:
$$\mathsf{E}\big\{\mathbf{I}_G(\tau_r)\, \mathbf{I}_{\{q\}}(N_r^X)\, X_{t_r} \mid X_{t_{r-1}} = e_k\big\} = \int_G \rho_r^{k,\bullet,q}(du);$$
• $\varrho_r^{k,\bullet}(du)$ is the backward-time analogue, i.e., the conditional distribution of the pair $(\tau_{r+1}, X_{t_r})$ given $X_{t_{r+1}} = e_k$: for any $G \in \mathcal{B}(\mathbb{R}^N)$,
$$\mathsf{E}\big\{\mathbf{I}_G(\tau_{r+1})\, X_{t_r} \mid X_{t_{r+1}} = e_k\big\} = \int_G \varrho_r^{k,\bullet}(du);$$
• $\varrho_r^{k,\bullet,q}(du)$ is its restriction to the trajectories with exactly $q$ state transitions on $[t_r, t_{r+1}]$: for any $G \in \mathcal{B}(\mathbb{R}^N)$,
$$\mathsf{E}\big\{\mathbf{I}_G(\tau_{r+1})\, \mathbf{I}_{\{q\}}(N_{r+1}^X)\, X_{t_r} \mid X_{t_{r+1}} = e_k\big\} = \int_G \varrho_r^{k,\bullet,q}(du);$$
• $\mathcal{N}(y, m, K) \triangleq (2\pi)^{-M/2}\, {\det}^{-1/2} K\, \exp\big(-\tfrac{1}{2}\|y - m\|^2_{K^{-1}}\big)$ is the $M$-dimensional Gaussian probability density function (pdf) with the expectation $m$ and nondegenerate covariance matrix $K$;
• $\xi \triangleq \mathrm{vec}(\Lambda, f, \{g_n\}_{n=\overline{1,N}})$ is the $(N^2 + NM + NM^2)$-dimensional vector of all HMM parameters;
• $\mathcal{Y}_t \triangleq \sigma\{Y_s,\ 0 \le s \le t\}$ is the natural filtration generated by the continuous-time process $Y$ available up to the moment $t$.

3. Auxiliary Problem: Fixed Interval Smoothing Given Discretized Observations

The proposed HMM parameter identification algorithm uses the iterative calculation of the fixed-interval smoothing estimates $\{\widehat{X}_r^s\}_{r=\overline{1,R}}$: $\widehat{X}_r^s \triangleq \mathsf{E}\{X_{t_r} \mid \mathcal{Y}_R\}$. For the effective numerical realization of the smoothing procedure, we suggest the two-filter algorithm [37,38,39].
First, we present an algorithm for the calculation of the forward-time filtering estimate $\widehat{X}_r \triangleq \mathsf{E}\{X_{t_r} \mid \mathcal{Y}_r\}$.
Let us introduce the following positive random numbers and matrices:
$$\theta_r^{kj} \triangleq \int_{\mathcal{D}} \mathcal{N}\Big(Y_r,\, f u,\, \sum_{p=1}^N u^p g_p\Big)\,\rho_r^{k,j}(du), \qquad \theta_r \triangleq \big[\theta_r^{kj}\big]_{k,j=\overline{1,N}}. \qquad (8)$$
The assertion below gives a recursive algorithm for calculating $\widehat{X}_r$.
Proposition 1.
The conditional expectation (CE) $\widehat{X}_r$ can be calculated by the formula
$$\widehat{X}_r = \big(\mathbf{1}\,\theta_r^\top \widehat{X}_{r-1}\big)^{-1}\, \theta_r^\top \widehat{X}_{r-1}, \quad r > 0, \qquad (9)$$
with the initial condition
$$\widehat{X}_0 = \pi. \qquad (10)$$
The proof of Proposition 1 is quite similar to that of Lemma 3 in [33].
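In code, one step of the recursion (9) is just a matrix–vector product followed by normalization. A minimal Python sketch, assuming the matrix $\theta_r$ has already been computed (for instance, by the numerical schemes of Section 5):

```python
import numpy as np

def filter_step(x_prev, theta):
    """One step of the recursion (9): propagate by theta_r^T, then renormalize."""
    x = theta.T @ x_prev
    return x / x.sum()

# usage: x = pi.copy(); then, for each r, x = filter_step(x, theta_r) with theta_r
# from (8); the backward recursion (14) has the same form with the matrix (13)
```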
Second, we introduce the backward-time martingale representation of the MJP $X_t$ along with the backward-time filtering estimate. On the initial probability triplet $(\Omega, \mathcal{F}, \mathsf{P})$ we consider
• the backward-time filtration $\{\mathcal{F}_t^r\}_{t \in [0,T]}$: $\mathcal{F}_t^r \triangleq \sigma\{X_s,\, Y_T - Y_s\colon t \le s \le T\}$;
• the backward-time collection of the discretized observations $\mathcal{Y}_r^b \triangleq \sigma\{Y_q\colon r < q \le R\}$; $\mathcal{Y}_R^b \triangleq \{\emptyset, \Omega\}$.
The probability distribution $\pi(t) = \mathsf{E}\{X_t\}$, being the unique solution to the Kolmogorov equation $\pi(t) = \pi + \int_0^t \Lambda^\top \pi(s)\,ds$, has strictly positive components $\pi_n(t) > 0$ for any $n = \overline{1,N}$, $t > 0$, due to the third assumption above (see [40]). Hence, we can define the backward-time TIM for the initial MJP $X_t$ as follows:
$$\Gamma(t) \triangleq \mathrm{diag}^{-1}\big(\pi(t)\big)\,\Lambda^\top \mathrm{diag}\big(\pi(t)\big) - \mathrm{diag}^{-1}\big(\pi(t)\big)\,\mathrm{diag}\big(\Lambda^\top \pi(t)\big). \qquad (11)$$
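The backward-time TIM (11) is directly computable from $\Lambda$ and the current distribution $\pi(t)$. A sketch under the stated conventions (zero row sums, strictly positive $\pi(t)$); the function names are illustrative:

```python
import numpy as np
from scipy.linalg import expm

def backward_tim(Lam, pi_t):
    """Backward-time TIM (11):
    Gamma(t) = diag^{-1}(pi(t)) [Lam^T diag(pi(t)) - diag(Lam^T pi(t))]."""
    Gamma = (Lam.T * pi_t[None, :]) / pi_t[:, None]  # (i, j) -> lambda_ji pi_j / pi_i
    Gamma -= np.diag((Lam.T @ pi_t) / pi_t)          # enforce zero row sums
    return Gamma

# pi(t) solves the Kolmogorov equation and can be propagated by the matrix
# exponential: pi_t = expm(Lam.T * t) @ pi0
```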
The assertion below declares a backward-time representation of the MJP $X_t$.
Proposition 2.
The MJP $X_t$ (1) satisfies the backward-time SDS
$$X_t = X_T + \int_t^T \Gamma^\top(s) X_s\,ds + M_t^X - M_T^X - \int_t^T \big(\Gamma^\top(s) + \Lambda^\top\big) X_s\,ds. \qquad (12)$$
The process
$$M_t^b \triangleq M_t^X - M_T^X - \int_t^T \big(\Gamma^\top(s) + \Lambda^\top\big) X_s\,ds$$
represents an $\mathcal{F}_t^r$-adapted square integrable martingale:
$$\mathsf{E}\{M_t^b \mid \mathcal{F}_s^r\} = M_s^b \quad \mathsf{P}\text{-a.s. for any } 0 \le t \le s \le T.$$
The proof of Proposition 2 can be found in [41].
To define the backward-time filtering estimate, we introduce the positive random numbers and matrices:
$$\vartheta_r^{kj} \triangleq \int_{\mathcal{D}} \mathcal{N}\Big(Y_{r+1},\, f u,\, \sum_{p=1}^N u^p g_p\Big)\,\varrho_r^{k,j}(du), \qquad \vartheta_r \triangleq \big[\vartheta_r^{kj}\big]_{k,j=\overline{1,N}}. \qquad (13)$$
The backward-time filtering estimates $\widehat{X}_r^b \triangleq \mathsf{E}\{X_{t_r} \mid \mathcal{Y}_r^b\}$ are calculated by an algorithm similar to the forward-time one:
$$\widehat{X}_r^b = \big(\mathbf{1}\,\vartheta_r^\top \widehat{X}_{r+1}^b\big)^{-1}\, \vartheta_r^\top \widehat{X}_{r+1}^b, \quad r < R, \qquad (14)$$
with the initial condition
$$\widehat{X}_R^b = \pi(T). \qquad (15)$$
Third, in the assertion below, we present a numerically efficient two-filter algorithm for the calculation of the fixed-interval smoothing MJP state estimate.
Proposition 3.
The fixed-interval smoothing estimate $\widehat{X}_r^s = \mathrm{col}(\widehat{X}{}_r^{s,1},\dots,\widehat{X}{}_r^{s,N}) = \mathsf{E}\{X_{t_r} \mid \mathcal{Y}_R\}$, $r = \overline{0,R}$, has the form
$$\widehat{X}{}_r^{s,n} = \frac{\widetilde{X}{}_r^{s,n}}{\sum_{j=1}^N \widetilde{X}{}_r^{s,j}}, \quad n = \overline{1,N}, \qquad (16)$$
where the unnormalized estimate $\widetilde{X}_r^s \triangleq \mathrm{col}(\widetilde{X}{}_r^{s,1},\dots,\widetilde{X}{}_r^{s,N})$ is calculated via the filtering estimates $\widehat{X}_r$, $\widehat{X}_r^b$ and the prior MJP distribution $\pi(t_r)$ as
$$\widetilde{X}{}_r^{s,n} = \frac{\widehat{X}{}_r^n\, \widehat{X}{}_r^{b,n}}{\pi_n(t_r)}, \quad n = \overline{1,N}. \qquad (17)$$
The proof of Proposition 3 is similar to that of Theorem 4 in [39].
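In code, Proposition 3 reduces smoothing to an elementwise combination of the two filters. A minimal sketch (the names are ours):

```python
import numpy as np

def smooth_step(x_fwd, x_bwd, pi_t):
    """Two-filter smoothing (16)-(17): multiply the forward and backward
    filtering estimates elementwise, divide by the prior, then normalize."""
    x = x_fwd * x_bwd / pi_t
    return x / x.sum()
```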

4. EM-Like Algorithm of HMM Parameter Identification

Before describing the parameter identification algorithm, we give some motivating observations.
Let us consider the continuous-time HMM (1), (3), for which both processes $X_t$ and $Y_t$ are observable. Due to the strong law of large numbers for MJPs [42] and regenerative processes [43], the following strong limits hold as $T \to \infty$:
$$\frac{N_T^{i,j}}{O_T^i} \to \lambda_{ij} \quad \text{for any } i,j = \overline{1,N}\colon i \ne j, \qquad (18)$$
$$\frac{1}{O_T^n}\, Q_T^{n,1} \to f e_n \quad \text{for any } n = \overline{1,N}. \qquad (19)$$
The identity
$$\frac{1}{O_T^n}\bigg[Q_T^{n,1}\big(Q_T^{n,1}\big)^\top - \int_0^T Q_s^{n,1}\,\big(dQ_s^{n,1}\big)^\top - \int_0^T dQ_s^{n,1}\,\big(Q_s^{n,1}\big)^\top\bigg] = g_n \qquad (20)$$
is also true $\mathsf{P}$-a.s. for any $n = \overline{1,N}$ on the set $\{\omega \in \Omega\colon O_T^n > 0\}$. In the above formulae,
• the processes $N_t^{i,j}$ and $O_t^n$ are determined by formulae (5) and (6);
• the “level sum” $Q_t^{n,1}$ is a particular case of (7):
$$Q_t^{n,1} = \int_0^t X_s^n\,dY_s. \qquad (21)$$
One can treat the identification formulae (18) and (19) as an application of the method of moments [44]. Furthermore, as shown in [33], if $X_t$ is the unobservable state and identifiability condition 5 above holds, then it is possible to restore $X_t$ precisely given the diffusion observations $Y_t$:
$$\bar{X}_t \triangleq \mathsf{E}\{X_t \mid \mathcal{Y}_{t+}\} = X_t \quad \mathsf{P}\text{-a.s. for any } 0 \le t < T. \qquad (22)$$
Thus, in “the ideal situation” the CE $\bar{X}_t$ allows identifying the HMM parameters $\xi$. However, two facts block the direct implementation of this idea. The estimate $\mathsf{E}\{X_t \mid \mathcal{Y}_{t+}\}$ represents a solution to an SDS with both diffusion and counting processes on the right-hand side. The SDS uses a transformation of the original observations (3), which is the result of a double limit passage; there is no effective numerical procedure for its realization. Further, the CE $\bar{X}_t = \bar{X}_\xi(\omega, t)$ depends on the exact values of $\Lambda$, $f$ and $\{g_n\}$, which are unknown.
We perform several steps to overcome this obstacle and develop a suboptimal identification algorithm. First, under rather mild conditions [45], the solution to the SDS depends continuously on the parameters, i.e., if a sequence of parameters $\{\xi_j\}_{j\in\mathbb{N}}$ converges to $\xi$ as $j \to \infty$, then the sequence of the corresponding SDS solutions converges in probability: $\bar{X}_{\xi_j} \overset{\mathsf{P}}{\to} \bar{X}_\xi$ as $j \to \infty$. This means that the performance of the estimate $\bar{X}_{\xi_j}$ is close to that of the CE $\bar{X}_\xi$ when the used parameters $\xi_j$ are close enough to the exact values $\xi$.
Second, in the case of known parameters $\xi$ and a sequence of nested partitions $\{t_r\}_{r=\overline{1,R}}$, due to the Lévy theorem [40], $\widehat{X}_r \to \bar{X}_{t_r}$ $\mathsf{P}$-a.s. as $R \to \infty$. This means that, under identifiability condition 5 above, the optimal filtering estimate $\widehat{X}_r$ calculated by the discretized observations $Y$ is close to the exact MJP state $X_{t_r}$, and the optimal smoothed estimate $\widehat{X}_r^s$ is even more so.
Third, both the trajectories of the state $X_{t_r}$ and of the CE $\bar{X}_t$ given the diffusion observations $Y_t$ (3) are piecewise constant with values in $\mathbb{S}^N$. By contrast, the CE $\widehat{X}_r$ calculated by the discretized observations $Y$ (2) belongs to the “probabilistic simplex” $\Pi$. One can neutralize this issue rather easily, replacing the estimate $\widehat{X}_r$ with the maximum a posteriori probability (MAP) estimate $\breve{X}_r$:
$$\breve{X}_r = \operatorname*{argmax}_{e_n \in \mathbb{S}^N} e_n^\top \widehat{X}_r. \qquad (23)$$
The CE $\widehat{X}_r$ is the $L_2$-optimal estimate, whereas $\breve{X}_r$ is the $L_1$-optimal one.
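The conversion (23) is a one-liner in code; a sketch assuming the estimates are stacked row-wise in an array `X_s` (an assumption of ours, not notation from the paper):

```python
import numpy as np

X_s = np.array([[0.2, 0.5, 0.3],
                [0.7, 0.1, 0.2]])   # CEs: rows lie in the probabilistic simplex
# MAP conversion (23): the unit coordinate vector of the largest component
X_map = np.eye(X_s.shape[1])[X_s.argmax(axis=1)]
# X_map == [[0, 1, 0], [1, 0, 0]]
```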
Fourth, formulae (18)–(20), with the processes $N^{i,j}$, $O^n$ and $Q^{n,1}$ substituted by their CEs, can form the basis of the iterative identification procedure. Indeed, in the first step of the loop, the smoothed estimate $\breve{X}_{\xi_j}^s$ is calculated given the previous values of the HMM parameters $\xi_j$. Based on $\breve{X}_{\xi_j}^s$, one can compute the required auxiliary estimates $\widehat{N}^{i,k}$, $\widehat{O}^n$ and $\widehat{Q}^{n,1}$:
$$\widehat{N}_R^{i,k} = \sum_{r=1}^R \breve{X}{}_{r-1}^{s,i}\big(\breve{X}{}_r^{s,k} - \breve{X}{}_{r-1}^{s,k}\big), \qquad (24)$$
$$\widehat{O}_R^n = \delta \sum_{r=1}^R \breve{X}{}_{r-1}^{s,n}, \qquad (25)$$
$$\widehat{Q}_R^{n,1} = \sum_{r=1}^R \breve{X}{}_{r-1}^{s,n}\, Y_r. \qquad (26)$$
Using them, in the second step, one can recalculate the HMM parameters $\xi_{j+1}$ by the following modifications of (18), (19) and (20):
$$\widehat{\lambda}_{ik} = \frac{\widehat{N}_R^{i,k}}{\widehat{O}_R^i}, \qquad (27)$$
$$\widehat{f} e_n = \frac{1}{\widehat{O}_R^n}\, \widehat{Q}_R^{n,1}, \qquad (28)$$
$$\widehat{g}_n = \frac{1}{\widehat{O}_R^n} \sum_{r=1}^R \breve{X}{}_r^{s,n}\big(Y_r - \delta \widehat{f} e_n\big)\big(Y_r - \delta \widehat{f} e_n\big)^\top. \qquad (29)$$
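Putting (24)–(29) together, one pass of the parameter recalculation might look as follows. This is a sketch under our assumptions: the MAP-converted smoothed estimates are stacked as rows of `X_map` (unit coordinate vectors for $r = 0,\dots,R$), the discretized observations as rows of `Y`, and every state is visited, so that all $\widehat{O}_R^n > 0$.

```python
import numpy as np

def m_step(X_map, Y, delta):
    """Recalculate the HMM parameters from smoothed MAP state estimates.
    X_map : (R+1, N) rows are unit coordinate vectors (after (23))
    Y     : (R, M) discretized observations
    """
    N = X_map.shape[1]

    # sufficient statistics (24)-(26)
    N_hat = np.einsum('ri,rk->ik', X_map[:-1], X_map[1:] - X_map[:-1])  # (24)
    O_hat = delta * X_map[:-1].sum(axis=0)                              # (25)
    Q_hat = X_map[:-1].T @ Y                                            # (26)

    # parameter updates (27)-(29)
    Lam_hat = N_hat / O_hat[:, None]
    np.fill_diagonal(Lam_hat, 0.0)
    np.fill_diagonal(Lam_hat, -Lam_hat.sum(axis=1))    # zero row sums of a TIM
    f_hat = (Q_hat / O_hat[:, None]).T                 # columns are f e_n
    M = Y.shape[1]
    g_hat = np.empty((N, M, M))
    for n in range(N):
        resid = Y - delta * f_hat[:, n]                # (R, M) residuals
        g_hat[n] = (resid.T * X_map[1:, n]) @ resid / O_hat[n]
    return Lam_hat, f_hat, g_hat
```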
The authors of [2,29,30] presented versions of the EM algorithm for the continuous-time HMM (1), (3) parameters in the case of additive observation noise ($g_n \equiv g$). It is shown that formulae (18) and (19) establish the maximization step. Moreover, in [29,30], it is shown that, under the condition of mutual absolute continuity $\mathsf{P}^{\xi'} \sim \mathsf{P}^{\xi''}$ for all admissible parameter values $\xi'$ and $\xi''$, the recursive identification procedure converges to a local maximum of the conditional likelihood function.
Below (see Algorithm 1) is a sketch of the EM-like algorithm for the discrete-continuous HMM (1), (2) parameter identification. The input includes the initial parameter values $\xi_0$, the maximal number of iterations $N_{iter}$ and the minimal relative parameter variation per iteration $\varepsilon_0$.
Algorithm 1: The algorithm for HMM parameter identification.

ε ← ε₀                                ▹ Required accuracy index: initialization
N_iter ← N₀                           ▹ Maximal number of iterations: initialization

for n1 = 1 to N do                    ▹ Parameter initialization: begin
    for m1 = 1 to M do
        f̂_cur^{m1,n1} ← f̂₀^{m1,n1}
        for m2 = 1 to M do
            ĝ_cur^{n1; m1,m2} ← ĝ₀^{n1; m1,m2}
        end for
    end for
    for n2 = 1 to N do
        if n1 ≠ n2 then λ̂_cur^{n1,n2} ← λ̂₀^{n1,n2}
    end for
    λ̂_cur^{n1,n1} ← −Σ_{m: m≠n1} λ̂_cur^{n1,m}
end for                               ▹ Parameter initialization: end

N_cur ← 0

repeat                                ▹ Identification loop: begin
    X̂₀ ← π
    X̂_R^b ← π(T)
    ξ̂ ← ξ̂_cur
    N_cur ← N_cur + 1

    for r = 1 to R do                 ▹ Forward and backward time filtering: begin
        X̂_r ← X̂(X̂_{r−1}, Y_r, ξ̂)                           ▹ formulae (8), (9)
        X̂^b_{R−r} ← X̂^b(X̂^b_{R−r+1}, Y_{R−r+1}, ξ̂)         ▹ formulae (13), (14)
    end for                           ▹ Forward and backward time filtering: end

    for r = 0 to R do                 ▹ Smoothing and CE to MAP conversion: begin
        X̂^s_r ← X̂^s(X̂_r, X̂^b_r)                            ▹ formulae (16), (17)
        X̆^s_r ← X̆^s(X̂^s_r)                                  ▹ formula (23)
    end for                           ▹ Smoothing and CE to MAP conversion: end

    for n1 = 1 to N do                ▹ HMM parameter recalculation: begin
        for m1 = 1 to M do
            f̂_cur^{m1,n1} ← f̂^{m1,n1}(X̆^s, Y)              ▹ formulae (24)–(26), (28)
            for m2 = 1 to M do
                ĝ_cur^{n1; m1,m2} ← ĝ^{n1; m1,m2}(X̆^s, Y)   ▹ formulae (24)–(26), (29)
            end for
        end for
        for n2 = 1 to N do
            if n1 ≠ n2 then λ̂_cur^{n1,n2} ← λ̂^{n1,n2}(X̆^s, Y)   ▹ formulae (24)–(27)
        end for
        λ̂_cur^{n1,n1} ← −Σ_{m: m≠n1} λ̂_cur^{n1,m}
    end for                           ▹ HMM parameter recalculation: end

until (‖ξ̂_cur − ξ̂‖ / ‖ξ̂_cur‖ ≤ ε₀) or (N_cur ≥ N_iter)     ▹ Identification loop: end
return ξ̂_cur

5. Numerical Schemes of Filtering Estimate Calculation

Neither the forward-time conditional distributions $\rho^{k,\bullet}$ nor the backward-time ones $\varrho^{k,\bullet}$ are absolutely continuous with respect to the Lebesgue measure; hence, the practical calculation of the integrals (8) and (13) is nontrivial. Due to the law of total probability,
$$\theta_r^{kj} \triangleq \int_{\mathcal{D}} \mathcal{N}\Big(Y_r,\, f u,\, \sum_{p=1}^N u^p g_p\Big)\,\rho_r^{k,j}(du) = \sum_{q=0}^{\infty} \int_{\mathcal{D}} \mathcal{N}\Big(Y_r,\, f u,\, \sum_{p=1}^N u^p g_p\Big)\,\rho_r^{k,j,q}(du).$$
The results of [33] suggest that, for small enough time increments $\delta$, the sum in the last formula can be limited to only two summands ($q = 0$ and $q = 1$). In this case, we replace the estimate $\widehat{X}_r$ with its analytic approximation $\widetilde{X}_r$ of order 1:
$$\widetilde{X}_r = \big(\mathbf{1}\,\varkappa_r^\top \widetilde{X}_{r-1}\big)^{-1}\,\varkappa_r^\top \widetilde{X}_{r-1}, \quad r > 0, \qquad \widetilde{X}_0 = \pi, \qquad (30)$$
where
$$\varkappa_r^{kj} \triangleq \sum_{q=0}^{1} \int_{\mathcal{D}} \mathcal{N}\Big(Y_r,\, f u,\, \sum_{p=1}^N u^p g_p\Big)\,\rho_r^{k,j,q}(du), \qquad \varkappa_r \triangleq \big[\varkappa_r^{kj}\big]_{k,j=\overline{1,N}}. \qquad (31)$$
From the probabilistic point of view, this means that we take into account at most one state transition occurring within the discretization interval $[t_{r-1}, t_r]$. If $\bar{\lambda} \triangleq \max_{1 \le n \le N} |\lambda_{nn}|$ and $\delta$ is so small that $\bar{\lambda}\delta < 1$, then (see [33])
$$C_1 \triangleq e^{-\bar{\lambda}\delta}\,\frac{2}{(\bar{\lambda}\delta)^2} \sum_{k=2}^{\infty} \frac{(\bar{\lambda}\delta)^k}{k!} < 1, \qquad (32)$$
and the filtering approximation error is characterized by the inequality
$$\mathsf{E}\big\|\widehat{X}_r - \widetilde{X}_r\big\|_1 \le 2 - 2\Big(1 - C_1\,\frac{(\bar{\lambda}\delta)^2}{2}\Big)^r \sim \bar{\lambda}^2 t_r \delta \quad \text{as } \delta \to 0.$$
The explicit form of $\varkappa_r^{kj}$ is
$$\varkappa_r^{kj} = \delta_{kj}\, e^{\lambda_{jj}\delta}\, \mathcal{N}(Y_r, \delta f_j, \delta g_j) + (1 - \delta_{kj})\, \lambda_{kj}\, e^{\lambda_{jj}\delta} \int_0^\delta V^{kj}(Y_r, u)\,du, \qquad (33)$$
where $\delta_{kj}$ is the Kronecker symbol, $f_n \triangleq f e_n$ stands for the $n$-th column of $f$, and
$$V^{kj}(y, u) \triangleq e^{(\lambda_{kk} - \lambda_{jj})u}\, \mathcal{N}\big(y,\; u f_k + (\delta - u) f_j,\; u g_k + (\delta - u) g_j\big). \qquad (34)$$
The integral in (33) cannot be calculated analytically; thus, as with the analytic approximation $\widetilde{X}_r$, we have to calculate it numerically, introducing an additional error. We suggest using the composite midpoint rectangle method with the substep $\delta^{1+\alpha}$ for some $\alpha > 0$:
$$\varkappa_r^{kj} \approx \psi_r^{kj} \triangleq \delta_{kj}\, e^{\lambda_{jj}\delta}\, \mathcal{N}(Y_r, \delta f_j, \delta g_j) + (1 - \delta_{kj})\, \lambda_{kj}\, e^{\lambda_{jj}\delta}\, \delta^{1+\alpha} \sum_{i=1}^{\delta^{-\alpha}} V^{kj}\Big(Y_r,\, \delta^{1+\alpha}\big(i - \tfrac{1}{2}\big)\Big). \qquad (35)$$
Denoting $\psi_r \triangleq \big[\psi_r^{kj}\big]_{k,j=\overline{1,N}}$, we present the numerical approximation $\widecheck{X}_r$ of order $\min(\alpha, 1)$ based on the composite midpoint rectangle scheme:
$$\widecheck{X}_r = \big(\mathbf{1}\,\psi_r^\top \widecheck{X}_{r-1}\big)^{-1}\,\psi_r^\top \widecheck{X}_{r-1}, \quad r > 0, \qquad \widecheck{X}_0 = \pi. \qquad (36)$$
The total approximation error of $\widecheck{X}_r$ is characterized by the inequality [34]
$$\mathsf{E}\big\|\widehat{X}_r - \widecheck{X}_r\big\|_1 \le C_2\, t_r\,(\delta + C_3 \delta^{\alpha})$$
with some positive constants $C_2$ and $C_3$. Thus, the order of the numerical approximation is at most 1.
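A sketch of the weight computation (35) under the above assumptions (at most one transition per interval, midpoint rectangles with substep $\delta^{1+\alpha}$); the function name and the array layout of `f` and `g` are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def psi_matrix(Y_r, Lam, f, g, delta, alpha=1.0):
    """Approximate weights psi_r^{kj} of (35) by the composite midpoint
    rectangle rule, accounting for at most one state transition per
    discretization interval."""
    N = Lam.shape[0]
    n_sub = max(1, int(round(delta ** (-alpha))))  # number of sub-rectangles
    h = delta / n_sub                              # integration substep
    psi = np.zeros((N, N))
    for j in range(N):
        # no-transition term: stay in e_j over the whole interval
        psi[j, j] = np.exp(Lam[j, j] * delta) * multivariate_normal.pdf(
            Y_r, mean=delta * f[:, j], cov=delta * g[j])
        for k in range(N):
            if k == j:
                continue
            # single k -> j transition at the midpoint u of each sub-rectangle
            s = 0.0
            for i in range(1, n_sub + 1):
                u = h * (i - 0.5)
                mean = u * f[:, k] + (delta - u) * f[:, j]
                cov = u * g[k] + (delta - u) * g[j]
                s += np.exp((Lam[k, k] - Lam[j, j]) * u) * \
                     multivariate_normal.pdf(Y_r, mean=mean, cov=cov)
            psi[k, j] = Lam[k, j] * np.exp(Lam[j, j] * delta) * h * s
    return psi
```

The backward-time weights (41) below can be computed analogously, with the constant intensities replaced by the midpoint values of the backward-time TIM.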
Above in this section, we discussed the numerical approximation of the forward-time filtering estimate. A similar framework is applicable for the numerical realization of backward-time filtering. As in the forward-time case, we assume that at most one MJP state transition occurs within a time increment. We also have to take into account the nonhomogeneous nature of the MJP $X_t$ viewed in reversed time. The analogue of formula (33) has the form
$$\zeta_r^{kj} = \delta_{kj} \exp\Big(\int_{t_r}^{t_{r+1}} \gamma_{kk}(u)\,du\Big)\, \mathcal{N}(Y_{r+1}, \delta f_k, \delta g_k) + (1 - \delta_{kj}) \int_0^\delta U_r^{kj}(Y_{r+1}, u)\,du, \qquad (37)$$
where
$$U_r^{kj}(y, u) \triangleq \gamma_{kj}(t_r + u) \exp\Big(\int_{t_r}^{t_r + u} \gamma_{kk}(s)\,ds + \int_{t_r + u}^{t_{r+1}} \gamma_{jj}(s)\,ds\Big)\, \mathcal{N}\big(y,\; u f_k + (\delta - u) f_j,\; u g_k + (\delta - u) g_j\big). \qquad (38)$$
Both the first summand in (37) and the function $U_r^{kj}$ (38) contain integrals; therefore, we replace them with the following midpoint approximations:
$$\exp\Big(\int_{t_r}^{t_{r+1}} \gamma_{kk}(u)\,du\Big) \approx \exp\Big(\delta\, \gamma_{kk}\Big(\frac{t_r + t_{r+1}}{2}\Big)\Big), \qquad (39)$$
$$\widetilde{U}_r^{kj}(y, u) \triangleq \gamma_{kj}(t_r + u) \exp\Big(u\, \gamma_{kk}\Big(t_r + \frac{u}{2}\Big) + (\delta - u)\, \gamma_{jj}\Big(t_r + \frac{\delta + u}{2}\Big)\Big)\, \mathcal{N}\big(y,\; u f_k + (\delta - u) f_j,\; u g_k + (\delta - u) g_j\big). \qquad (40)$$
Finally, the backward-time analogue of $\psi^{kj}$, which is valid for the numerical realization, takes the form
$$\zeta_r^{kj} \approx \phi_r^{kj} \triangleq \delta_{kj} \exp\Big(\delta\, \gamma_{kk}\Big(\frac{t_r + t_{r+1}}{2}\Big)\Big)\, \mathcal{N}(Y_{r+1}, \delta f_k, \delta g_k) + (1 - \delta_{kj})\, \delta^{1+\alpha} \sum_{i=1}^{\delta^{-\alpha}} \widetilde{U}_r^{kj}\Big(Y_{r+1},\, \delta^{1+\alpha}\big(i - \tfrac{1}{2}\big)\Big). \qquad (41)$$
Denoting $\phi_r \triangleq \big[\phi_r^{kj}\big]_{k,j=\overline{1,N}}$, we present the numerical approximation of order $\min(\alpha, 1)$ based on the composite midpoint rectangle scheme for the backward-time filter:
$$\widecheck{X}{}_r^b = \big(\mathbf{1}\,\phi_r^\top \widecheck{X}{}_{r+1}^b\big)^{-1}\,\phi_r^\top \widecheck{X}{}_{r+1}^b, \quad r < R, \qquad \widecheck{X}{}_R^b = \pi(T).$$

6. Numerical Experiments

To illustrate the properties of the proposed estimation algorithms, we performed a complex of numerical experiments. We fixed the values of the basic parameters of the HMM (1), (2); varying some of them, we performed the calculations, and the obtained results highlight various performance characteristics of the suggested estimates.
We chose an HMM with the following parameters: $N = 3$, $M = 1$,
$$\Lambda = \begin{pmatrix} -18.0 & 12.0 & 6.0 \\ 9.0 & -18.0 & 9.0 \\ 6.0 & 12.0 & -18.0 \end{pmatrix}, \quad \pi = \mathrm{col}(0.3,\, 0.4,\, 0.3), \quad f = [0.0,\, 0.0,\, 0.0], \quad g_1 = 0.1,\ g_2 = 0.2,\ g_3 = 0.3. \qquad (42)$$

6.1. Filtering vs. Smoothing: Estimation Performance Analysis

The HMM state filtering numerical algorithm is the backbone of the subsequent HMM parameter identification; thus, the high precision of the proposed filtering estimates is a vital requirement.
The optimal filtering algorithm for discrete-time HMM states given observations with state-dependent noises is presented in ([2], chap. 3). Its adaptation to the time-discretized observations is a specific case of (30), with the matrix $\theta$ calculated differently from (35).
We compare the proposed filtering and smoothing estimates, calculated by the composite midpoint rectangle scheme, with the ones from [2]. Both numerical schemes are stable; the difference is exposed clearly when the time-discretization step is sufficiently large. The numerical experiment was conducted for the HMM with the basic parameter values (42) and the following auxiliary parameter values: the time interval $[0, T] = [0, 1.0]$, the time-discretization step $\delta = 0.1$, the MJP state simulation step $\delta_{sim} = 0.001$ and the step of the numerical integration in (35) $\delta_{int} = 0.01$. The estimation error variance was calculated by the Monte Carlo method over a sample of size $N_{MC} = 10^8$.
Figure 1 contains the following plots:
• $\widehat{D}_t$, $\widehat{D}_t^b$, $\widehat{D}_t^s$ — the sample variances of the forward-time filtering, backward-time filtering and fixed-interval smoothing estimates, calculated by the algorithms proposed in the paper;
• $\widetilde{D}_t$, $\widetilde{D}_t^b$, $\widetilde{D}_t^s$ — the sample variances of the forward-time filtering, backward-time filtering and fixed-interval smoothing estimates, calculated by the algorithms proposed in [2].
Figure 1. Sample error variances of the proposed estimates and the ones from [2].
Figure 2. Available discretized observations, true MJP state and its filtering and smoothing estimates.
The chosen initial distribution of the MJP $X_t$ coincides with the stationary one; hence, the vector of component variances is $(0.21,\, 0.24,\, 0.21)$. The combination of the high transition intensity and the large discretization step ($\max_j |\lambda_{jj}|\,\delta \approx 1$), along with the absence of the observation drift ($f$ is a zero vector) and the only slightly different noise intensities $\{g_n\}$, establishes poor conditions for the state estimation; hence, one can expect the variances of the estimate errors to be close to those of the MJP itself. In this numerical experiment, both the filtering and smoothing algorithms perform better than the adaptation of the discrete-time algorithms [2]. Moreover, looking at the sample variances of the second MJP component, one can see that the computational errors become so high that the smoothing demonstrates worse performance than the filtering does.
To compare the performance of the filtering estimates and the smoothing ones, the next numerical experiment was performed with the following time steps: $\delta = 0.2 \times 10^{-4}$, $\delta_{sim} = 0.2 \times 10^{-6}$ and $\delta_{int} = 0.2 \times 10^{-5}$.
Figure 2 presents specific trajectories of the available observations, the components of the accurate MJP state, their forward- and backward-time filtering estimates and the fixed-interval smoothing ones. One can see the superiority of the smoothing estimate performance.
Condition 5 is valid for the HMM with parameters (42); hence, we can expect convergence of both the filtering and smoothing estimates to the accurate state value as the time-discretization step vanishes. Figure 3 presents sample error variances for all types of estimates (forward-time filtering $\widehat{D}_t$, backward-time filtering $\widehat{D}_t^b$ and fixed-interval smoothing $\widehat{D}_t^s$) calculated with various time-discretization steps $\delta = 10^{-2}, 10^{-3}, 10^{-4}$. One can see the expected convergence.
The results of the first stage of the numerical experiment allow us to draw the following conclusions.
• Both the proposed numerical approximation based on the composite midpoint rectangle scheme (35) and the adaptation of the algorithm from [2] demonstrate stable behavior: both estimates have non-negative components and satisfy the normalization property. Their performances look similar; however, (35) demonstrates more stability in the case of a large discretization step ($\max_j |\lambda_{jj}|\,\delta \approx 1$).
• The drift $f$ is absent in the observations; however, all conditional noise intensities are different, and therefore identifiability condition 5 is valid for this HMM. The calculation results confirm the convergence of the proposed estimates to the real MJP state as $\delta \to 0$.
• The quality of the fixed-interval smoothing estimates is significantly better than that of the filtering estimates: the error variances differ by about a factor of two.

6.2. Parameter Identification

We performed the complex of the identification numerical experiments for the HMM with parameters (42), where the observation drift vector $f$ is replaced by $f = [-10.0,\, 5.0,\, 20.0]$. This change allows us to illustrate the identification performance for all HMM parameters, including the drift $f$. The observations are available on the time interval $[0.0, 100.0]$. We simulate the “continuous-time” state with the small step $\delta_{sim} = 10^{-7}$ and use the step $\delta_{int} = 10^{-6}$ in the numerical integration scheme (40).
To analyze how the time discretization of the observations affects the parameter identification performance, we choose the following time step values: $\delta = 10^{-2}$, $2 \times 10^{-4}$, $10^{-4}$ and $10^{-5}$. The number of iterations is bounded by $N_{iter} = 30$, with the minimal relative parameter variation per iteration $\varepsilon_0 = 10^{-5}$.
Table 1 contains the identification results for the TIM $\Lambda$, Table 2 presents the results of the drift $f$ identification, and Table 3 presents those of the diffusion $g$. All the tables include
• the exact parameter values ($\Lambda$, $f$ and $g$);
• the starting points for the iterative identification procedure ($\widehat{\Lambda}_0$, $\widehat{f}_0$ and $\widehat{g}_0$);
• the obtained estimates ($\widehat{\Lambda}$, $\widehat{f}$ and $\widehat{g}$).
Table 1. Identification results of the TIM Λ.

| δ       | Λ               | Λ̃_X             | Λ̂₀            | Λ̂               | Relative Errors (%) |
|---------|-----------------|------------------|----------------|------------------|---------------------|
| 10⁻²    | −18.0 12.0 6.0  | −18.6 11.6 7.0   | −1.0 0.5 0.5   | −14.6 7.4 7.2    | 18.9 38.3 20.0      |
|         | 9.0 −18.0 9.0   | 8.7 −17.2 8.5    | 0.5 −1.0 0.5   | 10.0 −16.0 6.0   | 11.1 11.1 33.3      |
|         | 6.0 12.0 −18.0  | 6.0 11.2 −17.2   | 0.5 0.5 −1.0   | 9.7 5.3 −15.0    | 61.7 55.8 16.7      |
| 2×10⁻⁴  | −18.0 12.0 6.0  | −18.9 11.8 7.1   | −1.0 0.5 0.5   | −16.6 10.1 6.5   | 7.8 15.8 8.3        |
|         | 9.0 −18.0 9.0   | 8.9 −17.5 8.6    | 0.5 −1.0 0.5   | 7.6 −14.6 7.0    | 15.6 18.9 22.2      |
|         | 6.0 12.0 −18.0  | 6.0 11.5 −17.5   | 0.5 0.5 −1.0   | 5.5 9.8 −15.3    | 8.3 18.3 15.0       |
| 10⁻⁴    | −18.0 12.0 6.0  | −18.9 11.8 7.1   | −1.0 0.5 0.5   | −17.4 11.1 6.3   | 3.3 7.5 5.0         |
|         | 9.0 −18.0 9.0   | 8.9 −17.5 8.6    | 0.5 −1.0 0.5   | 8.2 −15.6 7.4    | 8.9 13.3 17.8       |
|         | 6.0 12.0 −18.0  | 6.0 11.5 −17.5   | 0.5 0.5 −1.0   | 5.6 9.8 −15.4    | 6.7 18.3 14.4       |
| 10⁻⁵    | −18.0 12.0 6.0  | −18.9 11.8 7.1   | −1.0 0.5 0.5   | −18.4 11.4 7.0   | 2.2 5.0 16.7        |
|         | 9.0 −18.0 9.0   | 8.9 −17.5 8.6    | 0.5 −1.0 0.5   | 8.5 −16.9 8.4    | 5.6 6.1 6.7         |
|         | 6.0 12.0 −18.0  | 6.0 11.5 −17.5   | 0.5 0.5 −1.0   | 6.1 11.0 −17.1   | 1.7 8.3 5.0         |
Table 2. Identification results of the observation drift elements f.

| δ       | f               | f̂₀            | f̂               | Relative Errors (%) |
|---------|-----------------|----------------|------------------|---------------------|
| 10⁻²    | −10.0 5.0 20.0  | −1.0 0.0 1.0   | −9.9 5.1 20.1    | 1.0 2.0 0.5         |
| 2×10⁻⁴  | −10.0 5.0 20.0  | −1.0 0.0 1.0   | −10.1 5.0 20.3   | 1.0 0.0 1.5         |
| 10⁻⁴    | −10.0 5.0 20.0  | −1.0 0.0 1.0   | −10.1 5.0 20.2   | 1.0 0.0 1.0         |
| 10⁻⁵    | −10.0 5.0 20.0  | −1.0 0.0 1.0   | −10.0 4.9 20.0   | 0.0 2.0 0.0         |
Table 3. Identification results of the observation diffusion elements g.

| δ       | g            | ĝ₀              | ĝ            | Relative Errors (%) |
|---------|--------------|------------------|--------------|---------------------|
| 10⁻²    | 0.1 0.2 0.3  | 0.05 0.15 0.4    | 0.1 0.2 0.3  | 0.0 0.0 0.0         |
| 2×10⁻⁴  | 0.1 0.2 0.3  | 0.05 0.15 0.4    | 0.1 0.2 0.3  | 0.0 0.0 0.0         |
| 10⁻⁴    | 0.1 0.2 0.3  | 0.05 0.15 0.4    | 0.1 0.2 0.3  | 0.0 0.0 0.0         |
| 10⁻⁵    | 0.1 0.2 0.3  | 0.05 0.15 0.4    | 0.1 0.2 0.3  | 0.0 0.0 0.0         |
Additionally, we compare the TIM estimate $\widehat{\Lambda}$ with an estimate $\widetilde{\Lambda}_X$ calculated by (18) given the true MJP state $X_t$. Clearly, $X_t$ is not observable directly; hence, $\widetilde{\Lambda}_X$ can be treated as an “ideal estimate”.
Below, we present the evolution of the HMM parameter estimates depending on the iteration number. The results are obtained for the discretization step $\delta = 10^{-4}$. Figure 4 contains the estimates of the off-diagonal TIM $\Lambda$ coefficients in comparison with their exact values, Figure 5 contains the analogous information concerning the drift coefficients $f$, and Figure 6 presents the identification results for the diffusion coefficients $g$.
The identification quality can be illustrated in another way. One can build a histogram using the time-discretized observations as a sample. If the time step $\delta$ is small enough, then the theoretical pdf can be approximated by a mixture of Gaussians $\sum_{n=1}^N \pi_n\, \mathcal{N}(y, \delta f e_n, \delta g_n)$, where the vector $\pi = \mathrm{col}(\pi_1,\dots,\pi_N)$ is the stationary distribution of the MJP $X_t$ with the TIM $\Lambda$.
In this case, the approximation precision is $o(\delta)$. We cannot construct the “theoretical pdf” because the true values of the HMM parameters are unknown. Nevertheless, we can build its approximation $\sum_{n=1}^N \widehat{\pi}_n\, \mathcal{N}(y, \delta \widehat{f} e_n, \delta \widehat{g}_n)$, where $\widehat{f}$ and $\{\widehat{g}_n\}$ are the identified values of the drift and diffusion and $\widehat{\pi} = \mathrm{col}(\widehat{\pi}_1,\dots,\widehat{\pi}_N)$ is the stationary distribution of an MJP with the TIM $\widehat{\Lambda}$. Figure 7 contains all the pdf estimates mentioned above; the calculation is performed for the time increment $\delta = 10^{-2}$.
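The approximating mixture can be assembled directly from the identified parameters; the stationary distribution $\widehat{\pi}$ solves $\widehat{\Lambda}{}^\top \pi = 0$ with $\mathbf{1}\pi = 1$. A sketch for the scalar-observation case ($M = 1$) used in the experiments, with illustrative names:

```python
import numpy as np
from scipy.stats import norm

def stationary_distribution(Lam):
    """Solve Lam^T pi = 0 with sum(pi) = 1 via a stacked least-squares system."""
    N = Lam.shape[0]
    A = np.vstack([Lam.T, np.ones(N)])
    b = np.zeros(N + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def mixture_pdf(y, Lam_hat, f_hat, g_hat, delta):
    """Approximate pdf of a discretized observation increment (M = 1):
    sum_n pi_n N(y; delta f e_n, delta g_n), with identified parameters."""
    pi_hat = stationary_distribution(Lam_hat)
    return sum(pi_hat[n] * norm.pdf(y, loc=delta * f_hat[0, n],
                                    scale=np.sqrt(delta * g_hat[n]))
               for n in range(len(pi_hat)))
```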
The results of the second stage of the numerical experiment allow us to draw the following conclusions.
• All estimates demonstrate high precision. Even for the relatively large time increment $\delta = 10^{-2}$ and the correspondingly small number of observations $T/\delta = 10^4$, the obtained estimates are precise.
• The iterative identification procedure converges quickly. In all numerical experiments, the number of iterations does not exceed 15, and the identification procedure is terminated when the relative parameter variation per iteration falls below the threshold $\varepsilon_0$.
• The precision of the parameter estimates depends on the time increment $\delta$. The decline of $\delta$ has the greatest impact on the identification performance for the drift $f$ and diffusion $g$ parameters.
• The estimation of the transition intensity matrix $\Lambda$ was found to be a challenging problem. Its identification precision demonstrates a low dependency on the value of the time increment $\delta$. Under a fixed observation interval, the estimate $\widehat{\Lambda}$ cannot be more precise than the “ideal estimate” $\widetilde{\Lambda}_X$ calculated with precise knowledge of the state trajectory.

7. Conclusions

The paper presented a complete technological circuit for the identification of continuous-time HMM parameters. We included an identification problem formulation that is proper from the mathematical point of view. The article contains the theoretical foundation of the solution and a numerical algorithm for the HMM parameter identification. We tested the proposed identification procedure over an extended complex of numerical experiments. The numerical study confirmed the applicability of the current version of the identification algorithm and the high precision of the calculated estimates.
Nevertheless, we consider some directions for the further refinement of the algorithm and its theoretical basis. First, the authors of [29,30] proved the convergence of the identification procedure to a local maximum of the conditional likelihood function. Let us recall that they considered the stochastic differential HMM with a scalar observable process and additive observation noise (the case $g_n \equiv g$). The keystone of the proof is the absolute continuity of the probability measures connected via a Girsanov transform. This continuity does not hold in the case of the continuous-time HMM with observations corrupted by multiplicative noise. Nevertheless, it is highly likely that this property can be proven for the time-discretized observations. Potentially, this would prove the convergence of the identification procedure and allow controlling the value of the conditional likelihood function during the iterations.
Second, the proposed algorithm has a significant drawback: it does not admit direct parallelization. This is an inherent feature of all identification algorithms based on fixed-interval smoothing, including the classic Baum–Welch algorithm [21,22]. Nevertheless, the proposed two-filter version of fixed-interval smoothing admits parallelization of the forward- and backward-time filtering procedures in each iteration step. Furthermore, the identification process may converge to a local maximum of the conditional likelihood function, depending on the starting point. The identification procedure admits parallelization in the following sense: several instances of the identification procedure are performed on independent processors given the same observation bulk and different starting points. Then, we can choose one of the calculated results, using either the likelihood function or some additional information concerning the estimated parameters.
Third, the current version of the algorithm can be modified to incorporate various constraints describing the admissible set of HMM parameters. We can illustrate this idea with the above-stated problem. It contains inequality (4), which postulates strong positivity of all off-diagonal elements of the TIM $\Lambda$. This constraint defines an open admissible set of $\Lambda$ values. Theoretically, this leads to the absence of the maximum likelihood TIM estimate for certain samples. Practically, this appears as the convergence of the estimates to a matrix with some zero off-diagonal elements. To avoid this situation, one should replace the initial open set of admissible parameter values with some closed subset, using additional barriers. They can be either stationary or variable during the iterative process, to prevent the calculated estimates from leaving the admissible set.

Author Contributions

Conceptualization, A.B.; formal analysis, A.B.; funding acquisition, A.G.; investigation, A.B.; methodology, A.B.; project administration, A.B. and A.G.; software, A.B. and A.G.; supervision, A.B.; validation, A.B. and A.G.; visualization, A.B. and A.G.; writing—original draft preparation, A.B.; writing—review and editing, A.G. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Ministry of Science and Higher Education of the Russian Federation, project No. 075-15-2020-799.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The research was performed using the infrastructure of the Shared Research Facilities “High Performance Computing and Big Data” (CKP “Informatics”) of the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CE — Conditional Expectation
EM algorithm — Expectation-Maximization algorithm
HMM — Hidden Markov Model
MAP — Maximum A posteriori Probability
MJP — Markov Jump Process
pdf — probability density function
SDS — Stochastic Differential System
TIM — Transition Intensity Matrix

References

  1. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286. [Google Scholar] [CrossRef]
  2. Elliott, R.J.; Moore, J.B.; Aggoun, L. Hidden Markov Models: Estimation and Control; Springer: New York, NY, USA, 1995. [Google Scholar]
  3. Ephraim, Y.; Merhav, N. Hidden Markov processes. IEEE Trans. Inf. Theory 2002, 48, 1518–1569. [Google Scholar] [CrossRef]
  4. Cappé, O.; Moulines, E.; Ryden, T. Inference in Hidden Markov Models; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  5. Altman, E.; Avrachenkov, K.; Barakat, C. A Stochastic Model of TCP/IP with Stationary Random Losses. Perform. Eval. 2000, 30, 231–242. [Google Scholar] [CrossRef]
  6. Liu, S.; Başar, T.; Srikant, R. TCP-Illinois: A loss- and delay-based congestion control algorithm for high-speed networks. Perform. Eval. 2008, 65, 417–440. [Google Scholar] [CrossRef]
  7. Elliott, R.; Malcolm, W.; Tsoi, A. HMM volatility estimation. In Proceedings of the 41st IEEE Conference on Decision and Control, Las Vegas, NV, USA, 10–13 December 2002; Volume 1, pp. 398–404. [Google Scholar] [CrossRef]
  8. Mamon, R.; Elliott, R. Hidden Markov Models in Finance; International Series in Operations Research & Management Science; Springer: New York, NY, USA, 2010. [Google Scholar]
  9. Yuan, Y.; Mitra, G. Market Regime Identification Using Hidden Markov Models. Cap. Mark. Asset Pricing Valuat. Ejournal 2016. [Google Scholar] [CrossRef]
  10. Birney, E. Hidden Markov Models in Biological Sequence Analysis. IBM J. Res. Dev. 2001, 45, 449–454. [Google Scholar] [CrossRef]
  11. Kim, J.; Hyub, H.; Yoon, S.Z.; Choi, H.J.; Kim, K.M.; Park, S.H. Analysis of EEG to quantify depth of anesthesia using Hidden Markov Model. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 26–30 August 2014; pp. 4575–4578. [Google Scholar] [CrossRef]
  12. Vidyasagar, M. Hidden Markov Processes: Theory and Applications to Biology; Princeton University Press: Princeton, NJ, USA, 2014. [Google Scholar]
  13. Vijayabaskar, M. Introduction to Hidden Markov Models and Its Applications in Biology. Methods Mol. Biol. 2017, 1552, 1–12. [Google Scholar] [CrossRef]
  14. Rong Li, X.; Jilkov, V. Survey of maneuvering target tracking. Part V. Multiple-model methods. IEEE Trans. Aerosp. Electron. Syst. 2005, 41, 1255–1321. [Google Scholar] [CrossRef]
  15. Qiao, S.; Shen, D.; Wang, X.; Han, N.; Zhu, W. A Self-Adaptive Parameter Selection Trajectory Prediction Approach via Hidden Markov Models. IEEE Trans. Intell. Transp. Syst. 2015, 16, 284–296. [Google Scholar] [CrossRef]
  16. Li, J.; Gray, R.M. Image Segmentation and Compression Using Hidden Markov Models; The Springer International Series in Engineering and Computer Science; Springer: New York, NY, USA, 2000. [Google Scholar] [CrossRef]
  17. Willsky, A. Multiresolution Markov models for signal and image processing. Proc. IEEE 2002, 90, 1396–1458. [Google Scholar] [CrossRef]
  18. Gales, M.; Young, S. The Application of Hidden Markov Models in Speech Recognition. Found. Trends Signal Process. 2007, 1, 195–304. [Google Scholar] [CrossRef]
  19. Vaseghi, S.V. Advanced Digital Signal Processing and Noise Reduction; John Wiley & Sons, Ltd.: New York, NY, USA, 2008. [Google Scholar]
  20. Kushner, H.J.; Dupuis, P.G. Numerical Methods for Stochastic Control Problems in Continuous Time; Springer: Berlin/Heidelberg, Germany, 1992. [Google Scholar]
  21. Baum, L.E.; Petrie, T. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Ann. Math. Stat. 1966, 37, 1554–1563. [Google Scholar] [CrossRef]
  22. Baum, L.E.; Petrie, T.; Soules, G.; Weiss, N. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. Ann. Math. Stat. 1970, 41, 164–171. [Google Scholar] [CrossRef]
  23. Viterbi, A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 1967, 13, 260–269. [Google Scholar] [CrossRef]
  24. Qian, W.; Titterington, D.M.; Cox, D.R.; Titterington, D.M. Estimation of parameters in hidden Markov models. Philos. Trans. R. Soc. London Ser. A Phys. Eng. Sci. 1991, 337, 407–428. [Google Scholar] [CrossRef]
  25. Rydén, T. EM versus Markov chain Monte Carlo for estimation of hidden Markov models: A computational perspective. Bayesian Anal. 2008, 3, 659–688. [Google Scholar] [CrossRef]
  26. Khreich, W.; Granger, E.; Miri, A.; Sabourin, R. A survey of techniques for incremental learning of HMM parameters. Inf. Sci. 2012, 197, 105–130. [Google Scholar] [CrossRef]
  27. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22. [Google Scholar]
  28. Adams, S.; Beling, P. A survey of feature selection methods for Gaussian mixture models and hidden Markov models. Artif. Intell. Rev. 2017, 52, 1739–1779. [Google Scholar] [CrossRef]
  29. Zeitouni, O.; Dembo, A. Exact filters for the estimation of the number of transitions of finite-state continuous-time Markov processes. IEEE Trans. Inf. Theory 1988, 34, 890–893. [Google Scholar] [CrossRef]
  30. Dembo, A.; Zeitouni, O. Parameter estimation of partially observed continuous time stochastic processes via the EM algorithm. Stoch. Process. Their Appl. 1989, 23, 91–113. [Google Scholar] [CrossRef]
  31. Stoyanov, J. Counterexamples in Probability; Wiley: New Jersey, NJ, USA, 1997. [Google Scholar]
  32. Liptser, R.; Shiryaev, A. Theory of Martingales; Mathematics and its Applications; Springer: Dordrecht, The Netherlands, 1989. [Google Scholar]
  33. Borisov, A.; Sokolov, I. Optimal Filtering of Markov Jump Processes Given Observations with State-Dependent Noises: Exact Solution and Stable Numerical Schemes. Mathematics 2020, 8, 506. [Google Scholar] [CrossRef]
  34. Borisov, A. L1-optimal filtering of Markov jump processes. II: Numerical analysis of particular realizations schemes. Autom. Remote Control 2020, 81, 2160–2180. [Google Scholar] [CrossRef]
  35. Ishikawa, Y.; Kunita, H. Malliavin calculus on the Wiener–Poisson space and its application to canonical SDE with jumps. Stoch. Process. Their Appl. 2006, 116, 1743–1769. [Google Scholar] [CrossRef]
  36. Liptser, R.; Shiryaev, A. Statistics of Random Processes II: Applications; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar]
  37. Badawi, F.; Lindquist, A.; Pavon, M. A stochastic realization approach to the smoothing problem. IEEE Trans. Autom. Control 1979, 24, 878–888. [Google Scholar] [CrossRef]
  38. Borisov, A. Backward representation of Markov jump processes and related problems. I. Optimal linear estimation. Autom. Remote Control 2006, 67, 1228–1250. [Google Scholar] [CrossRef]
  39. Borisov, A. Backward representation of Markov jump processes and related problems. II. Optimal nonlinear estimation. Autom. Remote Control 2006, 67, 1466–1484. [Google Scholar] [CrossRef]
  40. Liptser, R.; Shiryaev, A. Statistics of Random Processes: I. General Theory; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar]
  41. Elliott, R.J. Reverse-Time Markov Processes. IEEE Trans. Inf. Theor. 1986, 32, 290–292. [Google Scholar] [CrossRef]
  42. Meyn, S.; Tweedie, R. Markov Chains and Stochastic Stability; Springer: Berlin/Heidelberg, Germany, 1993. [Google Scholar]
  43. Borovkov, A. Probability Theory; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  44. Borovkov, A. Mathematical Statistics; Gordon & Breach: Amsterdam, The Netherlands, 1998. [Google Scholar]
  45. Krylov, N. Controlled Diffusion Processes; Springer: New York, NY, USA, 1980. [Google Scholar]
Figure 3. Sample error variances of the forward- and backward-time filtering and fixed-interval smoothing estimates for the steps δ = 10⁻², 10⁻³, 10⁻⁴.
Figure 4. Estimation of the TIM Λ off-diagonal elements.
Figure 5. Estimation of the observation drift f elements.
Figure 6. Estimation of the observation diffusion g elements.
Figure 7. The true pdf of discretized observations, its histogram and estimates.