# State and Parameter Estimation from Observed Signal Increments

Institute of Mathematics, University of Potsdam, Karl-Liebknecht-Str. 24/25, D-14476 Potsdam, Germany

Author to whom correspondence should be addressed.

Received: 26 March 2019 / Revised: 13 May 2019 / Accepted: 14 May 2019 / Published: 17 May 2019

(This article belongs to the Special Issue Information Theory and Stochastics for Multiscale Nonlinear Systems)

The success of the ensemble Kalman filter has triggered a strong interest in expanding its scope beyond classical state estimation problems. In this paper, we focus on continuous-time data assimilation where the model and measurement errors are correlated and both states and parameters need to be identified. Such scenarios arise from noisy and partial observations of Lagrangian particles which move under a stochastic velocity field involving unknown parameters. We take an appropriate class of McKean–Vlasov equations as the starting point to derive ensemble Kalman–Bucy filter algorithms for combined state and parameter estimation. We demonstrate their performance through a series of increasingly complex multi-scale model systems.

The research presented in this paper has been motivated by the state and parameter estimation problem for particles moving under a stochastic velocity field, with the measurements given by partial and noisy observations of their position increments. If the deterministic contributions to the velocity field are stationary, and the position increments of the moving particle are exactly observed, then one is led to a standard parameter estimation problem for stochastic differential equations (SDEs) [1,2]. In [3], this setting was extended to the case where the deterministic contributions to the velocity field themselves undergo a stochastic time evolution. Furthermore, while continuous-time observations of position increments are the focus of the present study, the assimilation of discrete-time observations of particle positions has been investigated in [4,5] under a so-called Lagrangian data assimilation setting for atmospheric fluid dynamics.

The assumption of exactly and fully observed position increments is not always realistic, and the case of partial and noisy observations is at the centre of the present study. Access to only partial and noisy observations of position increments leads to correlations between the measurement and model errors. The theoretical impact of such correlations on state and parameter estimation problems has been discussed, for example, in [6] in the context of linear systems, and in [7] for nonlinear systems. In particular, one finds that the appropriately adjusted data likelihood involves the gradient of log-densities, which is nontrivial from a computational perspective, and which prevents a straightforward application of standard Markov chain Monte Carlo (MCMC) or sequential Monte Carlo (SMC) methods [8].

In this paper, we instead follow an alternative Monte Carlo approach based on appropriately adjusted McKean–Vlasov filtering equations, an approach pioneered in [9] in the context of the standard state estimation problem for diffusion processes. McKean–Vlasov equations, first studied in [10], are a class of SDEs in which the right-hand side depends on the law of the process itself. We rely on a particular formulation of McKean–Vlasov filtering equations, the so-called feedback particle filters [11], utilising stochastic innovation processes [12].

Our proposed Monte Carlo formulation avoids the need for estimating log-densities, and can be implemented in a numerically robust manner relying on a generalised ensemble Kalman–Bucy filter approximation applied to an extended state space formulation [13]. The ensemble Kalman–Bucy filter [14,15] has been introduced previously as an extension of the popular ensemble Kalman filter [13,16,17] to continuous-time data assimilation under the assumption of uncorrelated measurement and model errors.

While the McKean–Vlasov formulation is mathematically equivalent to the more conventional one based on the Kushner–Stratonovitch equation [7], the two approaches differ significantly in structure, suggesting different tools for their analysis as well as their numerical approximation. More broadly speaking, the McKean–Vlasov approach to filtering is appealing since its Monte Carlo implementations completely avoid the resampling step characteristic of standard SMC methods. Furthermore, a wide range of approximations is possible within the McKean–Vlasov framework, some of them, such as the ensemble Kalman–Bucy filter, applicable to high-dimensional problems. The McKean–Vlasov approach also arises naturally when analysing sequential Monte Carlo methods [18].

In Section 6, we apply the proposed algorithms to a series of state and parameter estimation problems of increasing complexity. First, we study the state and parameter estimation problem for an Ornstein–Uhlenbeck process [2]. Two further experiments investigate the behaviour of the filters for reduced model equations, with the data being collected from underlying multi-scale models. There we distinguish between the averaging and homogenisation scenarios [19]. Finally, we look at examples of nonparametric drift estimation [3] and parameter estimation for the stochastic heat equation [20].

We consider the time evolution of a random state variable ${X}_{t}\in {\mathbb{R}}^{{N}_{x}}$ in ${N}_{x}$-dimensional state space, ${N}_{x}\ge 1$, as prescribed by an SDE of the form
for time $t\ge 0$, with the drift function $f:{\mathbb{R}}^{{N}_{x}}\times {\mathbb{R}}^{{N}_{a}}\to {\mathbb{R}}^{{N}_{x}}$ depending on ${N}_{a}\ge 0$ unknown parameters $a={({a}^{1},\dots ,{a}^{{N}_{a}})}^{\mathrm{T}}\in {\mathbb{R}}^{{N}_{a}}$. Model errors are represented through standard ${N}_{w}$-dimensional Brownian motion ${W}_{t}$, ${N}_{w}\ge 1$, and a matrix $G\in {\mathbb{R}}^{{N}_{x}\times {N}_{w}}$. We also introduce the associated model error covariance matrix $Q=G{G}^{\mathrm{T}}$. We will generally assume that the initial condition ${X}_{0}$ is fixed, that is, ${X}_{0}={x}_{0}$ a.s. for given ${x}_{0}\in {\mathbb{R}}^{{N}_{x}}$. In terms of a more specific example, one can think of ${X}_{t}$ denoting the position of a particle at time $t\ge 0$ moving in ${N}_{x}=3$ dimensional space under the influence of a stochastic velocity field, with deterministic contributions given by f and stochastic perturbations by $G{W}_{t}$. In the case $G=0$, the SDE (1) reduces to an ordinary differential equation with given initial condition ${x}_{0}$.

$$\mathrm{d}{X}_{t}=f({X}_{t},a)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+G\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t},$$

We assume throughout this paper that (1) possesses unique, strong solutions for all parameter values a. See, for example, [2] (Section 3.3) for sufficient conditions on the drift function f. The distribution of ${X}_{t}$ is denoted by ${\pi}_{t}$, which we also abbreviate by ${\pi}_{t}=\mathrm{Law}({X}_{t})$. We use the same notation for measures and their Lebesgue densities, provided they exist.
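For concreteness, sample paths of SDE (1) can be generated with the Euler–Maruyama scheme. The following sketch uses an illustrative scalar Ornstein–Uhlenbeck drift $f(x,a)=-ax$; the drift, noise level, and step size are illustrative choices, not taken from this paper.

```python
import numpy as np

def euler_maruyama(f, G, x0, a, dt, n_steps, rng):
    """Simulate SDE (1), dX = f(X, a) dt + G dW, with the Euler-Maruyama scheme."""
    Nx, Nw = G.shape
    X = np.empty((n_steps + 1, Nx))
    X[0] = x0
    for n in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(Nw)  # Brownian increment
        X[n + 1] = X[n] + f(X[n], a) * dt + G @ dW
    return X

# Illustrative example: scalar Ornstein-Uhlenbeck process dX = -a X dt + g dW.
rng = np.random.default_rng(42)
f = lambda x, a: -a * x
X = euler_maruyama(f, G=np.array([[0.5]]), x0=np.array([1.0]),
                   a=2.0, dt=1e-3, n_steps=5000, rng=rng)
```

The same routine with $G=0$ recovers the deterministic special case mentioned above.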

A wide class of drift functions can be written in the form
where ${f}_{0}:{\mathbb{R}}^{{N}_{x}}\to {\mathbb{R}}^{{N}_{x}}$ is a known drift function, the ${b}_{i}:{\mathbb{R}}^{{N}_{x}}\to {\mathbb{R}}^{{N}_{x}}$, $i=1,\dots ,{N}_{a}$, denote appropriate basis functions, and the vector $a={({a}^{1},\dots ,{a}^{{N}_{a}})}^{\mathrm{T}}\in {\mathbb{R}}^{{N}_{a}}$ contains the unknown parameters of the model. The family $\{{b}_{i}(x)\}$ of basis functions, which we collect in a matrix-valued function $B(x)=({b}_{1}(x),{b}_{2}(x),\dots ,{b}_{{N}_{a}}(x))\in {\mathbb{R}}^{{N}_{x}\times {N}_{a}}$, could arise from a finite-dimensional truncation of some appropriate Hilbert space $\mathcal{H}$. See, for example, [24] for computational approaches to nonparametric drift estimation using a Galerkin approximation in $\mathcal{H}$, where the ${b}_{i}(x)$ become finite element basis functions. Furthermore, the expansion coefficients $\{{a}^{i}\}$ could be made time-dependent by letting them evolve according to some system of differential equations arising, for example, from the discretisation of an underlying partial differential equation with solutions in $\mathcal{H}$. See [3] for specific examples of such a setting. While the present paper focuses on stationary drift functions, i.e., the parameters $\{{a}^{i}\}$ are time-independent, the results from Section 3 and Section 5, respectively, can easily be extended to the non-stationary case where the parameters themselves satisfy given evolution equations.

$$f(x,a)={f}_{0}(x)+B(x)a={f}_{0}(x)+\sum _{i=1}^{{N}_{a}}{b}_{i}(x){a}^{i},$$

Data and an observation model are required in order to perform state and parameter estimation for SDEs of the form (1). In this paper, we assume that we observe partial and noisy increments $\mathrm{d}{Y}_{t}$ of the signal ${X}_{t}$, given by
for t in the observation interval $[0,T]$, $T>0$, where $H\in {\mathbb{R}}^{{N}_{y}\times {N}_{x}}$ is a given linear operator, ${V}_{t}$ denotes standard ${N}_{y}$-dimensional Brownian motion with ${N}_{y}\ge 1$ and $R\in {\mathbb{R}}^{{N}_{y}\times {N}_{y}}$ is a covariance matrix. We introduce the observation map
for later use. Unless $HG=0$, it is clear that the model error ${E}_{t}^{\mathrm{m}}:=G{W}_{t}$ in (1) and the total observation error
in (3) are correlated. The impact of correlations between the model and measurement errors on the state estimation problem has been discussed in [6,7]. Furthermore, such correlations require adjustments to sequential estimation methods [16,17,25], which are the main focus of this paper. We assume throughout this paper that the covariance matrix
of the observation error (5) is invertible.

$$\mathrm{d}{Y}_{t}=H\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{X}_{t}+{R}^{1/2}\mathrm{d}{V}_{t}=Hf({X}_{t},a)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+HG\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}+{R}^{1/2}\mathrm{d}{V}_{t},\phantom{\rule{1.em}{0ex}}{Y}_{0}={X}_{0}={x}_{0},$$

$$h(x,a)=Hf(x,a)$$

$${E}_{t}^{\mathrm{o}}:=HG{W}_{t}+{R}^{1/2}{V}_{t}$$

$$C=HG{G}^{\mathrm{T}}{H}^{\mathrm{T}}+R=HQ{H}^{\mathrm{T}}+R$$

The special case $R=0$ and $H=I$ leads to a pure parameter estimation problem which has been extensively studied in the literature in the settings of maximum likelihood and Bayesian estimators [1,2]. In Section 3, we provide a reformulation of the Bayesian approach as McKean–Vlasov equations for the parameters, based on the results in [9,11].

If $R\ne 0$, then (1) and (3) lead to a combined state and parameter estimation problem with correlated noise terms. We will first discuss the impact of this correlation on the pure state estimation problem in Section 4 assuming that the parameters of the problem are known. Again, we will derive appropriate McKean–Vlasov equations in the state variables. Our key contribution is a formulation that avoids the need for log-density estimates, and can be put into an appropriately generalised ensemble Kalman–Bucy filter approximation framework [14,15]. We also formally demonstrate that the McKean–Vlasov filter equation reduces to $\mathrm{d}{X}_{t}=\mathrm{d}{Y}_{t}$ in the limit $R\to 0$ and $H=I$, a property that is less straightforward to demonstrate for filter formulations involving log-densities.

These McKean–Vlasov equations are generalised to the combined state and parameter estimation problem via an augmentation of state space [13] in Section 5. Given the results from Section 4, such an extension is rather straightforward.

The numerical experiments in Section 6 rely exclusively on the generalised ensemble Kalman–Bucy filter approximation to the McKean–Vlasov equations, which are easy to implement and yield robust and accurate numerical results.

In this section, we treat the simpler Bayesian parameter estimation problem which arises from setting $R=0$ and $H=I$ in (3), i.e., ${N}_{y}={N}_{x}$. This leads to $\mathrm{d}{X}_{t}=\mathrm{d}{Y}_{t}$ and, furthermore, ${X}_{t}={Y}_{t}$ for all $t\in [0,T]$, provided ${X}_{0}={Y}_{0}={x}_{0}$, which we assume throughout this paper. Invertibility of $C=Q$ requires that G have rank ${N}_{x}$; that is, ${N}_{w}\ge {N}_{x}$ in (1). The data likelihood
thus follows from the observation model with additive Brownian noise in (3). Given a prior distribution ${\Pi}_{0}(a)$ for the parameters, the resulting posterior distribution at any time $t\in (0,T]$ is
according to Bayes’ theorem [7]. Here, we have introduced the shorthand
for the expectation of ${l}_{t}$ with respect to ${\Pi}_{0}$. It is well-known that the posterior distributions ${\Pi}_{t}$ satisfy the stochastic partial differential equation
with the time-dependent observation map
where $\varphi :{\mathbb{R}}^{{N}_{a}}\to \mathbb{R}$ is a compactly supported smooth test function, and ${\Pi}_{t}[\varphi ]$ again denotes the expectation of $\varphi $ with respect to ${\Pi}_{t}$. See [7] for a detailed discussion. Equation (10) is a special instance of the well-known Kushner–Stratonovitch equation from time-continuous filtering [7].

$${l}_{t}(a)=\exp \left({\int}_{0}^{t}f{({Y}_{s},a)}^{\mathrm{T}}{Q}^{-1}\mathrm{d}{Y}_{s}-\frac{1}{2}{\int}_{0}^{t}f{({Y}_{s},a)}^{\mathrm{T}}{Q}^{-1}f({Y}_{s},a)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}s\right)$$

$${\Pi}_{t}(a)=\frac{{l}_{t}(a){\Pi}_{0}(a)}{{\Pi}_{0}[{l}_{t}]}$$

$${\Pi}_{0}[{l}_{t}]={\int}_{{\mathbb{R}}^{{N}_{a}}}{l}_{t}(a){\Pi}_{0}(a)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}a$$

$$\mathrm{d}{\Pi}_{t}[\varphi ]={\left({\Pi}_{t}[\varphi \phantom{\rule{0.166667em}{0ex}}{h}_{t}]-{\Pi}_{t}[\varphi ]{\Pi}_{t}[{h}_{t}]\right)}^{\mathrm{T}}{Q}^{-1}(\mathrm{d}{Y}_{t}-{\Pi}_{t}[{h}_{t}]\mathrm{d}t)$$

$${h}_{t}(a)=f({Y}_{t},a),$$
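For a drift that is linear in the parameter, the two integrals in the likelihood (9) reduce to sufficient statistics, so the posterior (8) can be evaluated directly on a parameter grid. A minimal sketch for the illustrative scalar drift $f(y,a)=-ay$ with a flat prior; all numerical settings are illustrative and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps, a_true, g = 1e-3, 50000, 1.5, 0.5
Q = g**2

# Exactly observed increments (R = 0, H = I): dY = dX with dX = -a_true X dt + g dW.
Y = np.empty(n_steps + 1); Y[0] = 1.0
for n in range(n_steps):
    Y[n + 1] = Y[n] - a_true * Y[n] * dt + g * np.sqrt(dt) * rng.standard_normal()

# For f(y, a) = -a y, the integrals in (9) reduce to S1 = int Y dY and
# S2 = int Y^2 ds, so that log l_t(a) = -a S1/Q - a^2 S2/(2 Q).
dY = np.diff(Y)
S1 = Y[:-1] @ dY
S2 = dt * np.sum(Y[:-1] ** 2)

# Posterior (8) on a grid with a flat prior, normalised by quadrature.
a_grid = np.linspace(0.0, 3.0, 301)
log_post = -a_grid * S1 / Q - 0.5 * a_grid**2 * S2 / Q
post = np.exp(log_post - log_post.max())
post /= post.sum() * (a_grid[1] - a_grid[0])
a_map = a_grid[np.argmax(post)]
```

The grid evaluation is feasible only for very low-dimensional parameter spaces, which is precisely what motivates the McKean–Vlasov reformulation discussed next.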

We now state a McKean–Vlasov reformulation of the Kushner–Stratonovitch Equation (10) as a special instance of the feedback particle filter of [11,12]. The key idea is to formulate a stochastic differential equation in the parameters in which they are treated as time-dependent random variables. We introduce the notation ${\tilde{A}}_{t}$ for these, and require that the law of ${\tilde{A}}_{t}$ coincide with (8) for $t\in [0,T]$, i.e., with the solution to (10).

Consider the McKean–Vlasov equations
where the matrix-valued Kalman gain ${K}_{t}\in {\mathbb{R}}^{{N}_{a}\times {N}_{y}}$ satisfies

$$\mathrm{d}{\tilde{A}}_{t}={K}_{t}({\tilde{A}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{I}_{t}+{\Omega}_{t}({\tilde{A}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t,$$

$$\nabla \cdot \left({\tilde{\Pi}}_{t}\left({K}_{t}Q\right)\right)=-{\tilde{\Pi}}_{t}{\left({h}_{t}-{\tilde{\Pi}}_{t}[{h}_{t}]\right)}^{\mathrm{T}},\phantom{\rule{1.em}{0ex}}{\tilde{\Pi}}_{t}=\mathrm{Law}({\tilde{A}}_{t}).$$

The innovation process ${I}_{t}$ can be chosen to be given by either
or
and

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-\frac{1}{2}\left({h}_{t}({\tilde{A}}_{t})+{\tilde{\Pi}}_{t}[{h}_{t}]\right)\mathrm{d}t,$$

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-\left\{{h}_{t}({\tilde{A}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+G\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}\right\},$$

$${\Omega}_{t}^{i}=\frac{1}{2}\sum _{j=1}^{{N}_{a}}\sum _{k,l=1}^{{N}_{y}}{Q}^{kl}{K}_{t}^{jl}\left({\partial}_{j}{K}_{t}^{ik}\right),\phantom{\rule{1.em}{0ex}}i=1,\dots ,{N}_{a}.$$

Then, the distribution ${\tilde{\Pi}}_{t}=\mathrm{Law}({\tilde{A}}_{t})$ coincides with the solution to (10), provided that the initial distributions agree. In other words, ${\tilde{\Pi}}_{t}={\Pi}_{t}$ for all $t\in [0,T]$.

Throughout this paper, we write (12) in the more compact Stratonovitch form
where the Stratonovitch interpretation is to be applied only to ${\tilde{A}}_{t}$ in ${K}_{t}({\tilde{A}}_{t})$, while the explicit time-dependence of ${K}_{t}$ remains in its Itô interpretation. It should be noted that the matrix-valued function ${K}_{t}$ is not uniquely defined by the PDE (13). Indeed, provided ${K}_{t}$ solves (13), ${K}_{t}+{\beta}_{t}$ is also a solution whenever $\nabla \xb7\left({\tilde{\Pi}}_{t}{\beta}_{t}\right)=0$. As discussed in [15], the minimiser over all suitable ${K}_{t}$ with respect to a kinetic energy-type functional is of the form
for a vector of potential functions ${\Psi}_{t}=({\psi}_{t}^{1},\dots ,{\psi}_{t}^{{N}_{x}})$, ${\psi}_{t}^{k}:{\mathbb{R}}^{{N}_{a}}\to \mathbb{R}$. Inserting (18) into (13) leads to ${N}_{x}$ elliptic partial differential equations (often referred to as Poisson equations),
understood componentwise, where the centring condition ${\tilde{\Pi}}_{t}[{\Psi}_{t}]=0$ makes the solution unique under mild assumptions on ${\tilde{\Pi}}_{t}$ (see [26]). The numerical approximation of (19) in the context of the feedback particle filter has been discussed in [27]. Finally, (15) yields a particularly appealing formulation, since it is based on a direct comparison of $\mathrm{d}{Y}_{t}$ with a random realisation of the right-hand side of the SDE (1), given a parameter value $a={\tilde{A}}_{t}(\omega )$ and a realisation of the noise term $\mathrm{d}{W}_{t}(\omega )$. This fact will be explored further in Section 4.

$$\mathrm{d}{\tilde{A}}_{t}={K}_{t}({\tilde{A}}_{t})\circ \mathrm{d}{I}_{t},$$

$${K}_{t}=\nabla {\Psi}_{t}{Q}^{-1}$$

$$\nabla \cdot \left({\tilde{\Pi}}_{t}\nabla {\Psi}_{t}\right)=-{\tilde{\Pi}}_{t}{\left({h}_{t}-{\tilde{\Pi}}_{t}[{h}_{t}]\right)}^{\mathrm{T}},\phantom{\rule{1.em}{0ex}}{\tilde{\Pi}}_{t}[{\Psi}_{t}]=0,$$

For clarity, let us repeat Equations (13) and (18) in their index forms:

$$\sum _{i=1}^{{N}_{a}}\sum _{j=1}^{{N}_{y}}{\partial}_{i}\left({\tilde{\Pi}}_{t}\left({K}_{t}^{ij}{Q}^{jk}\right)\right)=-{\tilde{\Pi}}_{t}\left({h}_{t}^{k}-{\tilde{\Pi}}_{t}[{h}_{t}^{k}]\right),\phantom{\rule{1.em}{0ex}}k=1,\dots ,{N}_{y},$$

$$\sum _{j=1}^{{N}_{y}}{K}_{t}^{ij}(a){Q}^{jk}={\partial}_{i}{\psi}_{t}^{k}(a),\phantom{\rule{1.em}{0ex}}i=1,\dots ,{N}_{a},\phantom{\rule{1.em}{0ex}}k=1,\dots ,{N}_{y}.$$

Let us now assume that the initial distribution ${\Pi}_{0}$ is Gaussian, and that f is linear in the unknown parameters such as in (2). Then, the distributions ${\tilde{\Pi}}_{t}$ remain Gaussian for all times with mean ${\overline{a}}_{t}$ and covariance matrix ${P}_{t}^{aa}$. The elliptic PDE (13) is solved by the parameter-independent Kalman gain matrix
and one obtains the McKean–Vlasov formulation
of the Kalman–Bucy filter, with the innovation process ${I}_{t}$ defined by either
or

$${K}_{t}={P}_{t}^{aa}B{({Y}_{t})}^{\mathrm{T}}{Q}^{-1}$$

$$\mathrm{d}{\tilde{A}}_{t}={P}_{t}^{aa}B{({Y}_{t})}^{\mathrm{T}}{Q}^{-1}\mathrm{d}{I}_{t}$$

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-\left({f}_{0}({Y}_{t})+\frac{1}{2}B({Y}_{t})({\tilde{A}}_{t}+{\overline{a}}_{t})\right)\mathrm{d}t$$

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-\left\{\left({f}_{0}({Y}_{t})+B({Y}_{t}){\tilde{A}}_{t}\right)\mathrm{d}t+G\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}\right\}.$$

Please note that the Stratonovitch formulation (17) reduces to the standard Itô interpretation, since ${K}_{t}$ no longer depends explicitly on ${\tilde{A}}_{t}$.

The McKean–Vlasov Equation (23) can be extended to nonlinear, non-Gaussian parameter estimation problems by generalising the parameter-independent Kalman gain matrix (22) to

$${K}_{t}={P}_{t}^{ah}{Q}^{-1},\phantom{\rule{1.em}{0ex}}{P}_{t}^{ah}={\tilde{\Pi}}_{t}\left[(a-{\overline{a}}_{t}){({h}_{t}(a)-{\tilde{\Pi}}_{t}[{h}_{t}])}^{\mathrm{T}}\right]={\tilde{\Pi}}_{t}\left[a\phantom{\rule{0.166667em}{0ex}}{({h}_{t}(a)-{\tilde{\Pi}}_{t}[{h}_{t}])}^{\mathrm{T}}\right]$$

Clearly, the gain (26) provides only an approximation to the solution of (13). However, such approximations have become popular in nonlinear state estimation in the form of the ensemble Kalman filter [16,17], and we will test its suitability for parameter estimation in Section 6.

Numerical implementations of the proposed McKean–Vlasov approaches rely on Monte Carlo approximations. More specifically, given M samples ${\tilde{A}}_{0}^{i}$, $i=1,\dots ,M$, from the initial distribution ${\Pi}_{0}$, we introduce the interacting particle system
where the innovation processes ${I}_{t}^{i}$ are defined by either
or, alternatively,
and ${W}_{t}^{i}$, $i=1,\dots ,M$, denote independent ${N}_{w}$-dimensional Brownian motions. For ${K}_{t}^{M}$, we will use the parameter-independent empirical Kalman gain approximation
in our numerical experiments, which leads to the so-called ensemble Kalman–Bucy filter [14,15]. Please note that ${\widehat{P}}_{t}^{ah}$ provides an unbiased estimator of ${P}_{t}^{ah}$.

$$\mathrm{d}{\tilde{A}}_{t}^{i}={K}_{t}^{M}({\tilde{A}}_{t}^{i})\circ \mathrm{d}{I}_{t}^{i},$$

$$\mathrm{d}{I}_{t}^{i}=\mathrm{d}{Y}_{t}-\frac{1}{2}\left({h}_{t}({\tilde{A}}_{t}^{i})+{\overline{h}}_{t}^{M}\right)\mathrm{d}t,\phantom{\rule{2.em}{0ex}}{\overline{h}}_{t}^{M}=\frac{1}{M}\sum _{i=1}^{M}{h}_{t}({\tilde{A}}_{t}^{i}),$$

$$\mathrm{d}{I}_{t}^{i}=\mathrm{d}{Y}_{t}-\left({h}_{t}({\tilde{A}}_{t}^{i})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+G\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}^{i}\right),$$

$${K}_{t}^{M}={\widehat{P}}_{t}^{ah}{Q}^{-1},\phantom{\rule{2.em}{0ex}}{\widehat{P}}_{t}^{ah}=\frac{1}{M-1}\sum _{i=1}^{M}{\tilde{A}}_{t}^{i}{({h}_{t}({\tilde{A}}_{t}^{i})-{\overline{h}}_{t}^{M})}^{\mathrm{T}},$$

Finally, a robust and efficient time-stepping procedure for approximating ${\tilde{A}}_{{t}_{n}}$, ${t}_{n}=n\Delta t$, is provided in [28,29,30]. Denoting the approximations at time ${t}_{n}$ by ${\tilde{A}}_{n}^{i}$, $i=1,\dots ,M$, we obtain
with step size $\Delta t>0$, empirical covariance matrices
and innovation increments $\Delta {I}_{n}^{i}$ given by either
or

$${\tilde{A}}_{n+1}^{i}={\tilde{A}}_{n}^{i}+\Delta t{\widehat{P}}_{n}^{ah}{\left(Q+\Delta t{\widehat{P}}_{n}^{hh}\right)}^{-1}\Delta {I}_{n}^{i}$$

$${\widehat{P}}_{n}^{ah}=\frac{1}{M-1}\sum _{i=1}^{M}{\tilde{A}}_{n}^{i}{({h}_{n}({\tilde{A}}_{n}^{i})-{\overline{h}}_{n}^{M})}^{\mathrm{T}},\phantom{\rule{2.em}{0ex}}{\widehat{P}}_{n}^{hh}=\frac{1}{M-1}\sum _{i=1}^{M}{h}_{n}({\tilde{A}}_{n}^{i}){({h}_{n}({\tilde{A}}_{n}^{i})-{\overline{h}}_{n}^{M})}^{\mathrm{T}},$$

$$\Delta {I}_{n}^{i}=\Delta {Y}_{n}-\frac{1}{2}\left({h}_{n}({\tilde{A}}_{n}^{i})+{\overline{h}}_{n}^{M}\right)\Delta t,\phantom{\rule{2.em}{0ex}}{\overline{h}}_{n}^{M}=\frac{1}{M}\sum _{i=1}^{M}{h}_{n}({\tilde{A}}_{n}^{i}),$$

$$\Delta {I}_{n}^{i}=\Delta {Y}_{n}-\left({h}_{n}({\tilde{A}}_{n}^{i})\phantom{\rule{0.166667em}{0ex}}\Delta t+\Delta {t}^{1/2}G{\Xi}_{n}^{i}\right),\phantom{\rule{2.em}{0ex}}{\Xi}_{n}^{i}\sim \mathrm{N}(0,I).$$

Here we have used the abbreviations ${h}_{n}(a)=f({Y}_{n},a)$, ${Y}_{n}={Y}_{{t}_{n}}$, and $\Delta {Y}_{n}={Y}_{{t}_{n+1}}-{Y}_{{t}_{n}}$.

While the feedback particle formulation (17) and its ensemble Kalman–Bucy filter approximation (31) are special cases of already available formulations, they provide the starting point for our novel McKean–Vlasov equations and their numerical approximation of the combined state and parameter estimation problem with correlated measurement and model errors, which we develop in the following two sections.

We return to the observation Model (3) with $R\ne 0$ and general H. The pure state estimation problem is considered first; that is, $f(x,a)=f(x)$ in (1).

Using ${E}_{t}^{\mathrm{o}}$, given by (5), and ${E}_{t}^{\mathrm{c}}$ defined by
with the total measurement error covariance matrix C given by (6), we find that
and the covariations [2] satisfy

$${E}_{t}^{\mathrm{c}}=G(I-{G}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}HG){W}_{t}-Q{H}^{\mathrm{T}}{C}^{-1}{R}^{1/2}{V}_{t}$$

$$G{W}_{t}={E}_{t}^{\mathrm{c}}+Q{H}^{\mathrm{T}}{C}^{-1}{E}_{t}^{\mathrm{o}},$$

$${\langle {E}^{\mathrm{o}},{E}^{\mathrm{c}}\rangle}_{t}=0,\phantom{\rule{1.em}{0ex}}{\langle {E}^{\mathrm{o}},{E}^{\mathrm{o}}\rangle}_{t}=Ct,\phantom{\rule{1.em}{0ex}}{\langle {E}^{\mathrm{c}},{E}^{\mathrm{c}}\rangle}_{t}=G(I-{G}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}HG){G}^{\mathrm{T}}t.$$

These errors naturally suggest linear combinations of ${W}_{t}$ and ${V}_{t}$ in (1) and (3) that shift the correlation between measurement and model errors to the signal dynamics, yielding
where ${\widehat{W}}_{t}$ and ${\widehat{V}}_{t}$ denote mutually independent standard Brownian motions of dimension ${N}_{w}$ and ${N}_{y}$, respectively. These equations correspond exactly to the correlated noise example from [7] (Section 3.8). Furthermore, $H=I$ and $R=0$ lead to ${E}_{t}^{\mathrm{c}}=0$, $Q{H}^{\mathrm{T}}{C}^{-1/2}={C}^{1/2}$, and, hence, $\mathrm{d}{X}_{t}=\mathrm{d}{Y}_{t}$.

$$\begin{array}{cc}\hfill \mathrm{d}{X}_{t}& =f({X}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+G{(I-{G}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}HG)}^{1/2}\mathrm{d}{\widehat{W}}_{t}+Q{H}^{\mathrm{T}}{C}^{-1/2}\mathrm{d}{\widehat{V}}_{t},\hfill \end{array}$$

$$\begin{array}{cc}\hfill \mathrm{d}{Y}_{t}& =Hf({X}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+{C}^{1/2}\mathrm{d}{\widehat{V}}_{t},\hfill \end{array}$$
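The recombination of noise terms in (36) can be verified numerically: the covariances of the two signal noise contributions sum to the original $Q$, and the matrix under the square root is symmetric positive semi-definite, so the square root exists. A quick check with arbitrary (illustrative) matrices:

```python
import numpy as np

rng = np.random.default_rng(7)
Nx, Nw, Ny = 3, 4, 2
G = rng.standard_normal((Nx, Nw))
H = rng.standard_normal((Ny, Nx))
R0 = rng.standard_normal((Ny, Ny))
R = R0 @ R0.T + np.eye(Ny)                # symmetric positive definite R

Q = G @ G.T                               # model error covariance
C = H @ Q @ H.T + R                       # total observation error covariance (6)
Cinv = np.linalg.inv(C)

# The two noise terms in (36) recombine to the original model error covariance Q:
# G S G^T + Q H^T C^{-1} H Q = Q, with S = I - G^T H^T C^{-1} H G.
S = np.eye(Nw) - G.T @ H.T @ Cinv @ H @ G
model_cov = G @ S @ G.T + Q @ H.T @ Cinv @ H @ Q
```

The cross-covariance between the signal and observation noises in (36) and (37) is $Q{H}^{\mathrm{T}}{C}^{-1/2}{C}^{1/2}=Q{H}^{\mathrm{T}}$, matching the original correlated system.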

A straightforward application of the results from [7] (Section 3.8) yields the following statement:

The conditional expectations ${\pi}_{t}[\varphi ]=\mathbb{E}[\varphi ({X}_{t})|{Y}_{[0,t]}]$ satisfy
where
is the generator of (1), $h(x)=Hf(x)$ denotes the observation map, and $\varphi $ is a compactly supported smooth test function. We use the notation $Q:\nabla \nabla \varphi ={\sum }_{i,j=1}^{{N}_{x}}{Q}^{ij}{\partial }_{i}{\partial }_{j}\varphi $.

$$\begin{array}{cc}\hfill {\pi}_{t}[\varphi ]& ={\pi}_{0}[\varphi ]+{\int}_{0}^{t}{\pi}_{s}[\mathcal{L}\varphi ]\phantom{\rule{0.166667em}{0ex}}\mathrm{d}s+{\int}_{0}^{t}{\pi}_{s}{\left[\varphi h+HQ\nabla \varphi -\varphi {\pi}_{s}[h]\right]}^{\mathrm{T}}{C}^{-1}\left(\mathrm{d}{Y}_{s}-{\pi}_{s}[h]\phantom{\rule{0.166667em}{0ex}}\mathrm{d}s\right),\hfill \end{array}$$

$$\mathcal{L}=f\cdot \nabla +\frac{1}{2}Q:\nabla \nabla $$

For the convenience of the reader, we present an independent derivation in Appendix A. We note that (39) also arises as the Kushner–Stratonovitch equations for an SDE Model (1) with observations ${Y}_{t}$ satisfying the observation model
where ${\tilde{V}}_{t}$ denotes ${N}_{y}$-dimensional Brownian motion independent of the Brownian motion ${W}_{t}$ in (1). Here we have used that ${\pi}_{t}\left[HQ\nabla \log {\pi}_{t}\right]=0$. This reinterpretation of our state estimation problem in terms of uncorrelated model and observation errors and a modified observation map
allows one to apply available MCMC and SMC methods for continuous-time filtering and smoothing problems. See, for example, [16]. However, there are two major limitations of such an approach. First, it requires approximating the gradient of the log-density. Second, the modified observation Model (41) is not well-defined in the limit $R\to 0$ and $H=I$, since the density ${\pi}_{t}$ collapses to a Dirac delta function under the given initial condition ${X}_{0}={x}_{0}$ a.s.

$$\mathrm{d}{Y}_{t}=H\left(f({X}_{t})-Q\nabla \log {\pi}_{t}({X}_{t})\right)\mathrm{d}t+{C}^{1/2}\mathrm{d}{\tilde{V}}_{t},$$

$${\tilde{h}}_{t}(x)=H\left(f(x)-Q\nabla \log {\pi}_{t}(x)\right)$$

In order to circumvent these complications, we develop an alternative approach based on an appropriately modified feedback particle filter formulation in the following subsection.

While it is clearly possible to apply the standard feedback particle filter formulations using (41), the following alternative formulation avoids the need for approximating the gradient of the log-density.

Consider the McKean–Vlasov equation
where the gain ${K}_{t}\in {\mathbb{R}}^{{N}_{x}\times {N}_{y}}$ solves
with observation map $h(x)=Hf(x)$. The function ${\Omega}_{t}$ is given by
and the innovation process ${I}_{t}$ by

$$\mathrm{d}{\tilde{X}}_{t}=f({\tilde{X}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+G\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}+{K}_{t}({\tilde{X}}_{t})\circ \mathrm{d}{I}_{t}+{\Omega}_{t}({\tilde{X}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t,$$

$$\nabla \cdot \left({\tilde{\pi}}_{t}\left({K}_{t}C-Q{H}^{\mathrm{T}}\right)\right)=-{\tilde{\pi}}_{t}{\left(h-{\tilde{\pi}}_{t}[h]\right)}^{\mathrm{T}},\phantom{\rule{1.em}{0ex}}{\tilde{\pi}}_{t}=\mathrm{Law}({\tilde{X}}_{t}),$$

$${\Omega}_{t}^{i}=-\frac{1}{2}\sum _{l=1}^{{N}_{x}}\sum _{j=1}^{{N}_{y}}{\partial}_{l}{K}_{t}^{ij}{(Q{H}^{\mathrm{T}})}^{lj},\phantom{\rule{1.em}{0ex}}i=1,\dots ,{N}_{x},$$

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-\left(h({\tilde{X}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+HG\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}+{R}^{1/2}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{U}_{t}\right).$$

Here, ${W}_{t}$ and ${U}_{t}$ denote mutually independent ${N}_{x}$-dimensional and ${N}_{y}$-dimensional Brownian motions, respectively. Then, ${\tilde{\pi}}_{t}=\mathrm{Law}({\tilde{X}}_{t})$ coincides with the solution to (39), provided that the initial distributions agree.

It should be stressed that ${W}_{t}$ in (43) and (46) denotes the same Brownian motion, resulting in correlations between the innovation process and the model noise.

In this proof the Einstein summation convention over repeated indices is employed, noting that (44) takes the form

$${\partial}_{i}\left({\tilde{\pi}}_{t}\left({K}_{t}^{ij}{C}^{jk}-{(Q{H}^{\mathrm{T}})}^{ik}\right)\right)=-{\tilde{\pi}}_{t}\left({h}^{k}-{\tilde{\pi}}_{t}[{h}^{k}]\right),\phantom{\rule{1.em}{0ex}}k=1,\dots ,{N}_{y}.$$

We begin by writing (43) in its Itô-form,
where

$$\mathrm{d}{\tilde{X}}_{t}=f({\tilde{X}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+G\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{W}_{t}+{K}_{t}({\tilde{X}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{I}_{t}+{\widehat{\Omega}}_{t}({\tilde{X}}_{t})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t,$$

$$\begin{array}{cc}\hfill {\widehat{\Omega}}_{t}^{i}& ={\Omega}_{t}^{i}+\frac{1}{2}\left\{-\left({\partial}_{l}{K}_{t}^{ij}\right){(Q{H}^{\mathrm{T}})}^{lj}+2\left({\partial}_{l}{K}_{t}^{ij}\right){K}_{t}^{lk}{C}^{kj}\right\}\hfill \\ \hfill & =\left({\partial}_{l}{K}_{t}^{ij}\right)\left\{{K}_{t}^{lk}{C}^{kj}-{(Q{H}^{\mathrm{T}})}^{lj}\right\}\hfill \end{array}$$

Here, we have used that the covariation between ${K}_{t}$ and ${I}_{t}$ satisfies

$$\mathrm{d}{\langle {K}^{ij},{I}^{j}\rangle}_{t}={\partial}_{l}{K}_{t}^{ij}\left({G}^{lk}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{\langle {W}^{k},{I}^{j}\rangle}_{t}+{K}_{t}^{lk}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{\langle {I}^{k},{I}^{j}\rangle}_{t}\right).$$

Furthermore, ${\langle GW,I\rangle}_{t}=-Q{H}^{\mathrm{T}}t$ and ${\langle I,I\rangle}_{t}=2Ct$.

For a smooth compactly supported test function $\varphi $, Itô’s formula implies
where the covariation process is given by

$$\varphi ({\tilde{X}}_{t})=\varphi ({\tilde{X}}_{0})+{\int}_{0}^{t}{\partial}_{i}\varphi ({\tilde{X}}_{s})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{\tilde{X}}_{s}^{i}+\frac{1}{2}{\int}_{0}^{t}{\partial}_{i}{\partial}_{j}\varphi ({\tilde{X}}_{s})\phantom{\rule{0.166667em}{0ex}}\mathrm{d}{\langle {\tilde{X}}^{i},{\tilde{X}}^{j}\rangle}_{s},$$

$${\langle \tilde{X},\tilde{X}\rangle}_{t}=tQ-{\int}_{0}^{t}\left({K}_{s}HQ+Q{H}^{\mathrm{T}}{K}_{s}^{\mathrm{T}}\right)\mathrm{d}s+2{\int}_{0}^{t}{K}_{s}C{K}_{s}^{\mathrm{T}}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}s.$$

Our aim is to show that ${\tilde{\pi}}_{t}[\varphi ]$ coincides with ${\pi}_{t}[\varphi ]$ as defined by the Kushner–Stratonovitch Equation (39). To this end, we insert (48) and (52) into (51) and take the conditional expectation, arriving at
recalling that the generator $\mathcal{L}$ has been defined in (40). Under the assumption that ${K}_{t}$ satisfies (44), the two Equations (39) and (53) coincide. Indeed,
implies
and the $\mathrm{d}{Y}_{s}$-contributions agree. To verify the same for the $\mathrm{d}s$-contributions, we use (44) to obtain

$$\begin{array}{cc}{\tilde{\pi}}_{t}[\varphi ]\hfill & ={\tilde{\pi}}_{0}[\varphi ]+{\int}_{0}^{t}{\tilde{\pi}}_{s}[\mathcal{L}\varphi ]\phantom{\rule{0.166667em}{0ex}}\mathrm{d}s+{\int}_{0}^{t}{\tilde{\pi}}_{s}\left[({\partial}_{i}\varphi ){K}_{s}^{ij}\right]\mathrm{d}{Y}_{s}^{j}-{\int}_{0}^{t}{\tilde{\pi}}_{s}\left[({\partial}_{i}\varphi ){K}_{s}^{ij}{h}^{j}\right]\mathrm{d}s\hfill \\ \hfill & \phantom{\rule{2.em}{0ex}}+{\int}_{0}^{t}{\tilde{\pi}}_{s}\left[({\partial}_{i}\varphi )\phantom{\rule{0.166667em}{0ex}}{\widehat{\Omega}}_{s}^{i}\right]\mathrm{d}s+{\int}_{0}^{t}{\tilde{\pi}}_{s}\left[\left({\partial}_{i}{\partial}_{j}\varphi \right){\left({K}_{s}(C{K}_{s}^{\mathrm{T}}-HQ)\right)}^{ij}\right]\mathrm{d}s,\hfill \end{array}$$

$${\tilde{\pi}}_{s}\left[({\partial}_{i}\varphi )({K}_{s}^{ik}{C}^{kj}-{(Q{H}^{\mathrm{T}})}^{ij})\right]={\tilde{\pi}}_{s}\left[\varphi \left({h}^{j}-{\tilde{\pi}}_{s}\left[{h}^{j}\right]\right)\right]$$

$${\tilde{\pi}}_{s}[\nabla \varphi \xb7{K}_{s}]={\tilde{\pi}}_{s}{\left[\varphi h+HQ\nabla \varphi -\varphi {\tilde{\pi}}_{s}[h]\right]}^{\mathrm{T}}{C}^{-1},$$

$$\begin{array}{cc}{\tilde{\pi}}_{s}\left[({\partial}_{i}\varphi ){K}_{s}^{ij}({h}^{j}-{\tilde{\pi}}_{t}[{h}^{j}])\right]\hfill & =-{\int}_{{\mathbb{R}}^{{N}_{x}}}({\partial}_{i}\varphi ){K}_{s}^{ij}{\partial}_{l}\left({\tilde{\pi}}_{s}\left({K}_{s}^{ln}{C}^{nj}-{(Q{H}^{\mathrm{T}})}^{lj}\right)\right)\mathrm{d}x\hfill \\ \hfill & ={\tilde{\pi}}_{s}\left[({\partial}_{i}\varphi )\phantom{\rule{0.166667em}{0ex}}{\widehat{\Omega}}_{s}^{i}\right]+{\tilde{\pi}}_{s}\left[\left({\partial}_{i}{\partial}_{j}\varphi \right){\left({K}_{s}(C{K}_{s}^{\mathrm{T}}-{K}_{s}HQ)\right)}^{ij}\right].\hfill \end{array}$$

We note that the correlation between the innovation process ${I}_{t}$ and the model error ${W}_{t}$ leads to a correction term ${\Omega}_{t}$ in (43) which cannot be subsumed into a Stratonovich correction, in contrast to the standard feedback particle filter formulation (17).

Assuming that there exist potential functions ${\Psi}_{t}=({\psi}_{t}^{1},\dots ,{\psi}_{t}^{{N}_{y}})$, ${\psi}_{t}^{k}:{\mathbb{R}}^{{N}_{x}}\to \mathbb{R}$, solving the Poisson equation(s) (19) (with ${\tilde{\Pi}}_{t}$ being replaced by ${\tilde{\pi}}_{t}$), (44) can be solved by requiring

$${K}_{t}=(\nabla {\Psi}_{t}+Q{H}^{\mathrm{T}}){C}^{-1},$$

thus generalising (18).

If we set $R=0$, $H=I$, and ${K}_{t}=Q{H}^{\mathrm{T}}{C}^{-1}=I$ in (43), then one obtains

$$\mathrm{d}{\tilde{X}}_{t}=\mathrm{d}{Y}_{t}$$

since ${\Omega}_{t}$ vanishes, and all other terms in (43) cancel each other out. If, furthermore, ${Y}_{0}={\tilde{X}}_{0}={x}_{0}$ a.s., then ${\tilde{X}}_{t}={Y}_{t}$ for all $t\in [0,T]$, which in turn justifies our assumption that the gain ${K}_{t}$ is independent of the state variable. Hence, the McKean–Vlasov formulation (43) reproduces the exact reference trajectory ${Y}_{t}$ in the case of no measurement errors and perfectly known initial conditions.

We develop a simplified version of the feedback particle filter formulation (43) for linear SDEs and Gaussian distributions in the following subsection, which will form the basis of the generalised ensemble Kalman–Bucy filter put forward in Section 4.3 below.

Let us assume that $f(x)=Fx$ with $F\in {\mathbb{R}}^{{N}_{x}\times {N}_{x}}$, i.e., Equations (1) and (3) take the form

$$\begin{array}{ll}\mathrm{d}{X}_{t}&=F{X}_{t}\,\mathrm{d}t+G\,\mathrm{d}{W}_{t},\\ \mathrm{d}{Y}_{t}&=HF{X}_{t}\,\mathrm{d}t+HG\,\mathrm{d}{W}_{t}+{R}^{1/2}\,\mathrm{d}{V}_{t},\end{array}$$

with initial conditions drawn from a Gaussian distribution. In this case ${\pi}_{t}$ stays Gaussian for all $t>0$, i.e., ${\pi}_{t}\sim \mathrm{N}({\overline{x}}_{t},{P}_{t})$ with ${\overline{x}}_{t}\in {\mathbb{R}}^{{N}_{x}}$, ${P}_{t}\in {\mathbb{R}}^{{N}_{x}\times {N}_{x}}$. Equation (19) can be solved uniquely by ${\nabla}_{x}\Psi ={P}_{t}{F}^{\mathrm{T}}{H}^{\mathrm{T}}$, and thus the McKean–Vlasov equations for the feedback particle filter (43) reduce to

$$\mathrm{d}{\tilde{X}}_{t}=F{\tilde{X}}_{t}\,\mathrm{d}t+G\,\mathrm{d}{W}_{t}+\left({P}_{t}{F}^{\mathrm{T}}{H}^{\mathrm{T}}+Q{H}^{\mathrm{T}}\right){C}^{-1}\mathrm{d}{I}_{t},$$

with the innovation process (46) leading to

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-HF{\tilde{X}}_{t}\,\mathrm{d}t-HG\,\mathrm{d}{W}_{t}-{R}^{1/2}\,\mathrm{d}{U}_{t}.$$

We take the expectation in (60) and (61) and end up with

$$\mathrm{d}{\overline{x}}_{t}=F{\overline{x}}_{t}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t+\left({P}_{t}{F}^{\mathrm{T}}+Q\right){H}^{\mathrm{T}}{C}^{-1}\left(\mathrm{d}{Y}_{t}-HF{\overline{x}}_{t}\phantom{\rule{0.166667em}{0ex}}\mathrm{d}t\right).$$

Defining ${u}_{t}:={\tilde{X}}_{t}-{\overline{x}}_{t}$, we see that

$$\mathrm{d}{u}_{t}=F{u}_{t}\,\mathrm{d}t+G\,\mathrm{d}{W}_{t}-\left({P}_{t}{F}^{\mathrm{T}}+Q\right){H}^{\mathrm{T}}{C}^{-1}\left(HF{u}_{t}\,\mathrm{d}t+HG\,\mathrm{d}{W}_{t}+{R}^{1/2}\,\mathrm{d}{U}_{t}\right).$$

Next we use

$$\mathrm{d}\left({u}_{t}{u}_{t}^{\mathrm{T}}\right)=\mathrm{d}{u}_{t}\,{u}_{t}^{\mathrm{T}}+{u}_{t}\,\mathrm{d}{u}_{t}^{\mathrm{T}}+\mathrm{d}{\langle u,{u}^{\mathrm{T}}\rangle}_{t}$$

and ${P}_{t}=\mathbb{E}[{u}_{t}{u}_{t}^{\mathrm{T}}]$ to obtain, after some calculations,

$$\mathrm{d}{P}_{t}=(F{P}_{t}+{P}_{t}{F}^{\mathrm{T}})\,\mathrm{d}t-\left({P}_{t}{F}^{\mathrm{T}}+Q\right){H}^{\mathrm{T}}{C}^{-1}H\left(F{P}_{t}+Q\right)\mathrm{d}t+Q\,\mathrm{d}t.$$
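As a quick sanity check on the covariance equation above, the following sketch integrates its scalar version (so $H=1$ and $C=Q+R$) by forward Euler and confirms that the variance settles at a positive steady state; the parameter values $F=-1/2$, $Q=1/2$, $R=0.01$ and the step size are illustrative choices of ours, not values from the paper.

```python
# Scalar version of dP = (FP + PF^T)dt - (PF^T + Q)H^T C^{-1} H (FP + Q)dt + Q dt
# with H = 1, hence C = Q + R (illustrative parameter values).
F, Q, R = -0.5, 0.5, 0.01
C = Q + R

def riccati_rhs(P):
    """Right-hand side of the scalar covariance (Riccati) equation."""
    return 2.0 * F * P + Q - (P * F + Q) ** 2 / C

# Forward-Euler integration towards the steady state.
P, dt = 1.0, 1e-3
for _ in range(200_000):        # integrate over t in [0, 200]
    P += dt * riccati_rhs(P)

print(P, riccati_rhs(P))        # positive steady-state variance, near-zero drift
```

The steady state of the forward-Euler iteration is a root of `riccati_rhs`, so the residual printed at the end should be negligible.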

The McKean–Vlasov Equation (60) for linear systems, along with Gaussian prior and posterior distributions, suggests approximating the feedback particle filter formulation (43) for nonlinear systems by

$$\mathrm{d}{\tilde{X}}_{t}=f({\tilde{X}}_{t})\,\mathrm{d}t+G\,\mathrm{d}{W}_{t}+\left({P}_{t}^{xh}+Q{H}^{\mathrm{T}}\right){C}^{-1}\mathrm{d}{I}_{t},$$

where the innovation process ${I}_{t}$ is given by (46) as before. In other words, we approximate the gain matrix ${K}_{t}$ in (43) by the state-independent term $\left({P}_{t}^{xh}+Q{H}^{\mathrm{T}}\right){C}^{-1}$ with the covariance matrix ${P}_{t}^{xh}$ defined by

$${P}_{t}^{xh}={\tilde{\pi}}_{t}\left[(x-{\overline{x}}_{t}){(h(x)-{\tilde{\pi}}_{t}[h])}^{\mathrm{T}}\right]={\tilde{\pi}}_{t}\left[x\,{(h(x)-{\tilde{\pi}}_{t}[h])}^{\mathrm{T}}\right],$$

where ${\tilde{\pi}}_{t}$ denotes the law of ${\tilde{X}}_{t}$.

We can now generalise the ensemble Kalman–Bucy filter formulation (31) for the pure parameter estimation problem to the state estimation problem with correlated noise. We assume that M initial state values ${\tilde{X}}_{0}^{i}$ have been sampled from an initial distribution ${\pi}_{0}$ or, alternatively, ${\tilde{X}}_{0}^{i}={x}_{0}$ for all $i=1,\dots ,M$ in case the initial condition is known exactly. These state values are then propagated under the time-stepping procedure

$${\tilde{X}}_{n+1}^{i}={\tilde{X}}_{n}^{i}+\Delta t\,f({\tilde{X}}_{n}^{i})+\Delta {t}^{1/2}G{\Theta}_{n}^{i}+\left({\widehat{P}}_{n}^{xh}+Q{H}^{\mathrm{T}}\right){\left(C+\Delta t\,{\widehat{P}}_{n}^{hh}\right)}^{-1}\Delta {I}_{n}^{i}$$

with ${\Theta}_{n}^{i}\sim \mathrm{N}(0,I)$, step size $\Delta t>0$, empirical covariance matrices

$$\begin{array}{ll}{\widehat{P}}_{n}^{xh}&=\frac{1}{M-1}\sum _{i=1}^{M}{\tilde{X}}_{n}^{i}{(h({\tilde{X}}_{n}^{i})-{\overline{h}}_{n}^{M})}^{\mathrm{T}},\qquad {\overline{h}}_{n}^{M}=\frac{1}{M}\sum _{i=1}^{M}h({\tilde{X}}_{n}^{i}),\\ {\widehat{P}}_{n}^{hh}&=\frac{1}{M-1}\sum _{i=1}^{M}h({\tilde{X}}_{n}^{i}){(h({\tilde{X}}_{n}^{i})-{\overline{h}}_{n}^{M})}^{\mathrm{T}},\end{array}$$

and innovation increments $\Delta {I}_{n}^{i}$ given by

$$\Delta {I}_{n}^{i}=\Delta {Y}_{n}-\Delta t\,h({\tilde{X}}_{n}^{i})-\Delta {t}^{1/2}HG{\Theta}_{n}^{i}-\Delta {t}^{1/2}{R}^{1/2}{\Xi}_{n}^{i},\qquad {\Xi}_{n}^{i}\sim \mathrm{N}(0,I).$$
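The time-stepping procedure above can be sketched as a single update function; the function name, array layout, and the passing of a matrix square root of R are our own conventions. The demo at the end checks the exact-tracking property discussed earlier: with $H=I$, $R=0$, and an identical ensemble, the scheme reduces to ${\tilde{X}}_{n+1}^{i}={\tilde{X}}_{n}^{i}+\Delta {Y}_{n}$.

```python
import numpy as np

def enkbf_step(X, dY, f, G, H, Rsqrt, dt, rng):
    """One step of the discrete ensemble Kalman-Bucy scheme with correlated
    model/measurement noise; X has shape (M, Nx), dY shape (Ny,).
    (Helper name and conventions are ours, not the paper's.)"""
    M = X.shape[0]
    Q = G @ G.T
    R = Rsqrt @ Rsqrt.T
    C = H @ Q @ H.T + R                    # innovation covariance C = HQH^T + R
    h = f(X) @ H.T                         # h(x) = H f(x), shape (M, Ny)
    hbar = h.mean(axis=0)
    Pxh = X.T @ (h - hbar) / (M - 1)       # empirical covariances as in the text
    Phh = h.T @ (h - hbar) / (M - 1)
    K = (Pxh + Q @ H.T) @ np.linalg.inv(C + dt * Phh)
    Theta = rng.standard_normal(X.shape)   # model noise, shared with the innovation
    Xi = rng.standard_normal((M, len(dY))) # measurement noise
    dI = dY - dt * h - np.sqrt(dt) * Theta @ (H @ G).T - np.sqrt(dt) * Xi @ Rsqrt.T
    return X + dt * f(X) + np.sqrt(dt) * Theta @ G.T + dI @ K.T

# Exact tracking: H = I, R = 0, identical ensemble => X_{n+1} = X_n + dY.
rng = np.random.default_rng(0)
G = np.diag([0.5, 0.2]) ** 0.5
H, Rsqrt = np.eye(2), np.zeros((2, 2))
X0 = np.tile([0.5, -0.3], (5, 1))
dY = np.array([0.01, -0.02])
X1 = enkbf_step(X0, dY, lambda X: -X, G, H, Rsqrt, dt=0.01, rng=rng)
```

Note how the model-noise realisation `Theta` enters both the state update and the innovation; this correlation is what makes the per-member noise cancel exactly in the $R=0$, $H=I$ limit.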

The McKean–Vlasov equations of this section form the basis for the methods proposed for the combined state and parameter estimation problem to be considered next.

We now return to the combined state and parameter estimation problem, and consider the augmented dynamics

$$\begin{array}{ll}\mathrm{d}{X}_{t}&=f({X}_{t},{A}_{t})\,\mathrm{d}t+G\,\mathrm{d}{W}_{t},\\ \mathrm{d}{A}_{t}&=0,\end{array}$$

with observations (3) as before. The initial conditions satisfy ${X}_{0}={x}_{0}$ a.s., and ${A}_{0}\sim {\Pi}_{0}$. Let us introduce the extended state space variable ${Z}_{t}={({X}_{t}^{\mathrm{T}},{A}_{t}^{\mathrm{T}})}^{\mathrm{T}}$. In terms of ${Z}_{t}$, Equations (3) and (71) take the form

$$\begin{array}{ll}\mathrm{d}{Z}_{t}&=\overline{f}({Z}_{t})\,\mathrm{d}t+\overline{G}\,\mathrm{d}{W}_{t},\\ \mathrm{d}{Y}_{t}&=\overline{H}\,\mathrm{d}{Z}_{t}+{R}^{1/2}\,\mathrm{d}{V}_{t},\end{array}$$

with

$$\overline{f}(z)=\left(\begin{array}{c}f(x,a)\\ 0\end{array}\right),\qquad \overline{G}=\left(\begin{array}{cc}G& 0\\ 0& 0\end{array}\right),\qquad \overline{H}=\left(\begin{array}{cc}H& 0\end{array}\right).$$
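The augmented quantities are straightforward to assemble in code; a minimal sketch, in which the dimensions and the linear-in-the-parameter drift are illustrative choices of ours:

```python
import numpy as np

# Augmented state z = (x, a): drift f_bar, noise matrix G_bar, observation H_bar.
Nx, Na, Ny = 2, 1, 2

def f(x, a):
    return a[0] * x                          # example drift f(x, a) = a x

G = 0.5 * np.eye(Nx)
H = np.eye(Ny, Nx)

def f_bar(z):
    x, a = z[:Nx], z[Nx:]
    return np.concatenate([f(x, a), np.zeros(Na)])   # parameters are static: dA = 0

G_bar = np.block([[G, np.zeros((Nx, Na))],
                  [np.zeros((Na, Nx)), np.zeros((Na, Na))]])
H_bar = np.hstack([H, np.zeros((Ny, Na))])

z = np.array([0.5, -0.3, -0.5])              # (x, a) with a = -0.5
print(H_bar @ f_bar(z))                       # equals H f(x, a)
```

The zero blocks in `G_bar` encode that no model noise acts on the parameters, so $\overline{Q}=\overline{G}{\overline{G}}^{\mathrm{T}}$ has a vanishing parameter block.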

Thus we end up with an augmented state estimation problem of the general structure already considered in detail in Section 4. Below we provide details on some of the necessary modifications.

The appropriately extended feedback particle filter Equation (43) leads to

$$\begin{array}{ll}\mathrm{d}{\tilde{X}}_{t}&=f({\tilde{X}}_{t},{\tilde{A}}_{t})\,\mathrm{d}t+G\,\mathrm{d}{W}_{t}+({\nabla}_{x}{\Psi}_{t}({\tilde{X}}_{t},{\tilde{A}}_{t})+Q{H}^{\mathrm{T}}){C}^{-1}\circ \mathrm{d}{I}_{t}+{\Omega}_{t}({\tilde{X}}_{t},{\tilde{A}}_{t}),\\ \mathrm{d}{\tilde{A}}_{t}&={\nabla}_{a}{\Psi}_{t}({\tilde{X}}_{t},{\tilde{A}}_{t}){C}^{-1}\circ \mathrm{d}{I}_{t},\end{array}$$

where (46) takes the form

$$\mathrm{d}{I}_{t}=\mathrm{d}{Y}_{t}-\left(h({\tilde{X}}_{t},{\tilde{A}}_{t})\,\mathrm{d}t+HG\,\mathrm{d}{W}_{t}+{R}^{1/2}\,\mathrm{d}{U}_{t}\right)$$

with observation map (4) and correction ${\Omega}_{t}$ given by (45), with Q replaced by $\overline{Q}=\overline{G}{\overline{G}}^{\mathrm{T}}$ and H by $\overline{H}$. In the Poisson equation(s) (19), ${\tilde{\Pi}}_{t}$ is replaced by ${\tilde{\pi}}_{t}$ denoting the joint density of $({\tilde{X}}_{t},{\tilde{A}}_{t})$. We also stress that ${\Psi}_{t}$ becomes a function of x and a, and we distinguish between gradients with respect to x and a using the notation ${\nabla}_{x}$ and ${\nabla}_{a}$, respectively.

Numerical implementations of the extended feedback particle filter are demanding due to the need for solving the Poisson equation(s) (19). Instead, we again rely on the ensemble Kalman–Bucy filter approximation, which we describe next.

We approximate the joint density ${\tilde{\pi}}_{t}$ of ${\tilde{Z}}_{t}$ by an ensemble of particles

$${\tilde{Z}}_{t}^{i}=\left(\begin{array}{c}{\tilde{X}}_{t}^{i}\\ {\tilde{A}}_{t}^{i}\end{array}\right),$$

that is,

$${\tilde{\pi}}_{t}\approx \frac{1}{M}\sum _{i=1}^{M}{\delta}_{{\tilde{Z}}_{t}^{i}},$$

where ${\delta}_{{z}^{\prime}}$ denotes the Dirac delta function centred at ${z}^{\prime}$. The initial ensemble satisfies ${\tilde{X}}_{0}^{i}={x}_{0}$ for all $i=1,\dots ,M$, and the initial parameter values ${\tilde{A}}_{0}^{i}$ are independent draws from the prior distribution ${\Pi}_{0}$.

At the same time, we make the approximation ${\tilde{Z}}_{t}\sim \mathrm{N}({\overline{z}}_{t}^{M},{\widehat{P}}_{t}^{zz})$ when dealing with the Kalman gain of the feedback particle filter. Here the empirical mean ${\overline{z}}_{t}^{M}$ has components

$${\overline{x}}_{t}^{M}=\frac{1}{M}\sum _{i=1}^{M}{\tilde{X}}_{t}^{i},\qquad {\overline{a}}_{t}^{M}=\frac{1}{M}\sum _{i=1}^{M}{\tilde{A}}_{t}^{i},$$

and the joint empirical covariance matrix is given by

$${\widehat{P}}_{t}^{zz}=\frac{1}{M-1}\sum _{i=1}^{M}{\tilde{Z}}_{t}^{i}{({\tilde{Z}}_{t}^{i}-{\overline{z}}_{t}^{M})}^{\mathrm{T}}=\left(\begin{array}{cc}{\widehat{P}}_{t}^{xx}& {\widehat{P}}_{t}^{xa}\\ {({\widehat{P}}_{t}^{xa})}^{\mathrm{T}}& {\widehat{P}}_{t}^{aa}\end{array}\right).$$

As in Section 4.3, the solution to (19) can be approximated by

$${\nabla}_{x}{\Psi}_{t}={P}_{t}^{xh},\qquad {\nabla}_{a}{\Psi}_{t}={P}_{t}^{ah},$$

where the covariance matrices ${P}_{t}^{xh}$ and ${P}_{t}^{ah}$ are estimated by their empirical counterparts

$$\begin{array}{ll}{\widehat{P}}_{t}^{xh}&=\frac{1}{M-1}\sum _{i=1}^{M}{\tilde{X}}_{t}^{i}{(h({\tilde{X}}_{t}^{i},{\tilde{A}}_{t}^{i})-{\overline{h}}_{t}^{M})}^{\mathrm{T}},\\ {\widehat{P}}_{t}^{ah}&=\frac{1}{M-1}\sum _{i=1}^{M}{\tilde{A}}_{t}^{i}{(h({\tilde{X}}_{t}^{i},{\tilde{A}}_{t}^{i})-{\overline{h}}_{t}^{M})}^{\mathrm{T}},\end{array}$$

with ${\overline{h}}_{t}^{M}$ defined by

$${\overline{h}}_{t}^{M}=\frac{1}{M}\sum _{i=1}^{M}h({\tilde{X}}_{t}^{i},{\tilde{A}}_{t}^{i}).$$

Summing everything up, we obtain the following generalised ensemble Kalman–Bucy filter equations

$$\begin{array}{ll}\mathrm{d}{\tilde{X}}_{t}^{i}&=f({\tilde{X}}_{t}^{i},{\tilde{A}}_{t}^{i})\,\mathrm{d}t+G\,\mathrm{d}{W}_{t}^{i}+({\widehat{P}}_{t}^{xh}+Q{H}^{\mathrm{T}}){C}^{-1}\,\mathrm{d}{I}_{t}^{i},\\ \mathrm{d}{\tilde{A}}_{t}^{i}&={\widehat{P}}_{t}^{ah}{C}^{-1}\,\mathrm{d}{I}_{t}^{i},\end{array}$$

where the innovations are given by

$$\mathrm{d}{I}_{t}^{i}=\mathrm{d}{Y}_{t}-\left(h({\tilde{X}}_{t}^{i},{\tilde{A}}_{t}^{i})\,\mathrm{d}t+HG\,\mathrm{d}{W}_{t}^{i}+{R}^{1/2}\,\mathrm{d}{U}_{t}^{i}\right),$$

and ${W}_{t}^{i}$ and ${U}_{t}^{i}$ denote independent ${N}_{x}$-dimensional and ${N}_{y}$-dimensional Brownian motions, respectively, for $i=1,\dots ,M$.

The interacting particle Equation (83) can be time-stepped along the lines discussed in Section 4.3 for the pure state estimation formulation of the ensemble Kalman–Bucy filter.

We now apply the generalised ensemble Kalman–Bucy filter formulation (83) with innovation (84) to five different model scenarios.

Our first example is provided by the Ornstein–Uhlenbeck process

$$\mathrm{d}{X}_{t}=a{X}_{t}\,\mathrm{d}t+{Q}^{1/2}\mathrm{d}{W}_{t}$$

with unknown parameter $a\in \mathbb{R}$, and known initial condition ${X}_{0}=1/2$. We assume an observation model of the form (3) with $H=1$, and a measurement error variance of $R=0.01$, $R=0.0001$, or $R=0$. The model error variance is set to either $Q=0.5$ or $Q=0.005$. Except for the case $R=0$, a combined state and parameter estimation problem is to be solved. We implement the ensemble Kalman–Bucy filter (Section 5.2) with innovation (84), step size $\Delta t=0.005$, and ensemble size $M=1000$. The data is generated using the Euler–Maruyama method applied to (85), with $a=-1/2$ and integrated over a time-interval $[0,500]$ with the same step size. The prior distribution ${\Pi}_{0}$ for the parameter is Gaussian with mean $\overline{a}=-1/2$ and variance ${\sigma}_{a}^{2}=2$. The results can be found in Figure 1. We find that the ensemble Kalman–Bucy filter is able to successfully identify the unknown parameter under all tested experimental settings, except for the largest measurement error case where $R=0.01$. There, a small systematic offset of the estimated parameter value can be observed. One can also see that the variance in the parameter estimate monotonically decreases in time in all cases, while the variance in the state estimates approximately reaches a steady state.
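The experiment can be sketched in a few lines of code. The following minimal reproduction is our own: it uses a shorter run ($T=200$), a smaller ensemble ($M=100$), $R=0.0001$, a fixed seed, and reuses the regularised denominator $C+\Delta t\,{\widehat{P}}^{hh}$ from the state-estimation time-stepping for both updates, so the results are indicative only.

```python
import numpy as np

# Combined state/parameter estimation for dX = aX dt + Q^{1/2} dW, h(x,a) = H a x.
rng = np.random.default_rng(1)
a_true, Q, R, H = -0.5, 0.5, 1e-4, 1.0
dt, T, M = 0.005, 200.0, 100
n_steps = int(T / dt)
C = H * Q * H + R                       # innovation covariance (scalars here)

# Reference trajectory by Euler-Maruyama; store observed increments dY.
x = 0.5
dY = np.empty(n_steps)
for n in range(n_steps):
    dx = dt * a_true * x + np.sqrt(dt * Q) * rng.standard_normal()
    dY[n] = H * dx + np.sqrt(dt * R) * rng.standard_normal()
    x += dx

# Ensemble: known initial state, Gaussian prior N(-1/2, 2) on the parameter.
X = np.full(M, 0.5)
A = -0.5 + np.sqrt(2.0) * rng.standard_normal(M)

for n in range(n_steps):
    h = H * A * X
    hbar = h.mean()
    Pxh = np.sum(X * (h - hbar)) / (M - 1)
    Pah = np.sum(A * (h - hbar)) / (M - 1)
    Phh = np.sum(h * (h - hbar)) / (M - 1)
    S = C + dt * Phh
    Theta = rng.standard_normal(M)      # model noise, shared with the innovation
    Xi = rng.standard_normal(M)         # measurement noise
    dI = dY[n] - dt * h - np.sqrt(dt * Q) * H * Theta - np.sqrt(dt * R) * Xi
    X = X + dt * A * X + np.sqrt(dt * Q) * Theta + (Pxh + Q * H) / S * dI
    A = A + Pah / S * dI

print(A.mean(), A.std())                # parameter estimate and its spread
```

With the small measurement error used here, the state gain is close to one, so the ensemble states track the observed increments while the parameter ensemble contracts towards the reference value.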

Consider the equations

$$\begin{array}{ll}\mathrm{d}{Y}_{t}&=\left(1-{Z}_{t}^{2}\right){Y}_{t}\,\mathrm{d}t+{Q}^{1/2}\mathrm{d}{W}_{t}^{y},\\ \mathrm{d}{Z}_{t}&=-\frac{\alpha}{\epsilon}{Z}_{t}\,\mathrm{d}t+\sqrt{\frac{2\lambda}{\epsilon}}\mathrm{d}{W}_{t}^{z}\end{array}$$

from [19] for $\lambda ,\alpha ,\gamma ,\epsilon >0$, and initial condition ${Y}_{0}=1/2$, ${Z}_{0}=0$. The reduced equations in the limit $\epsilon \to 0$ are given by (85), with parameter value

$$a=1-\frac{\lambda}{\alpha}$$

and initial condition ${X}_{0}=1/2$. The reduced dynamics corresponds to a (stable) Ornstein–Uhlenbeck process for $\lambda /\alpha >1$. We wish to estimate the parameter a from observed increments

$$\Delta {Y}_{n}={Y}_{n+1}-{Y}_{n}+\Delta {t}^{1/2}{R}^{1/2}{\Xi}_{n},\qquad {\Xi}_{n}\sim \mathrm{N}(0,1),$$

where the sequence ${\{{Y}_{n}\}}_{n\ge 0}$ is obtained by time-stepping (86) using the Euler–Maruyama method with a step size $\Delta t$. We set $\lambda =3$, $\alpha =2$ (so that $a=-1/2$), $Q=0.5$, and $\epsilon \in \{0.1,0.01\}$ in our experiments. The measurement noise is set to $R=0.01$ or $R=0$ (pure parameter estimation).

We implement the ensemble Kalman–Bucy filter (83) with innovation (84), step size $\Delta t=\epsilon /50$, and ensemble size $M=1000$ for the reduced Equation (87). The data is generated from an Euler–Maruyama discretization of (86) with the same step size. We also investigate the effect of subsampling the observations for $\epsilon =0.01$ by solving (86) with step size $\Delta t=\epsilon /50$ and storing only every tenth solution ${Y}_{n}$, while the reduced equations and the ensemble Kalman–Bucy filter equations are integrated with $\Delta t=\epsilon /5$. The results are shown in Figure 2. Figure 3 shows the results for the same experiments repeated with a smaller ensemble size of $M=10$. We find that the smaller ensemble size leads to more noisy estimates for the variance in ${\tilde{X}}_{n}$ and a faster decay of the variance in ${\tilde{A}}_{n}$, but the estimated parameter values are equally well converged. Subsampling does not lead to significant changes in the estimated parameter values. This is in contrast to the example considered next.

We finally mention [31] for alternative approaches to sequential estimation in the context of averaging, which, however, rely on different assumptions on the data.

In this example, the data is produced by integrating the multi-scale SDE

$$\begin{array}{ll}\mathrm{d}{Y}_{t}&=\left(\frac{\sqrt{\sigma /2}}{\epsilon}{Z}_{t}+a{Y}_{t}\right)\mathrm{d}t,\\ \mathrm{d}{Z}_{t}&=-\frac{1}{{\epsilon}^{2}}{Z}_{t}\,\mathrm{d}t+\frac{\sqrt{2}}{\epsilon}\mathrm{d}{W}_{t}^{z}\end{array}$$

with parameter values $\epsilon =0.1$, $a=-1/2$, $\sigma =1/2$, and initial condition ${Y}_{0}=1/2$, ${Z}_{0}=0$. Here, ${W}_{t}^{z}$ denotes standard Brownian motion. The equations are discretised with step size $\Delta \tau ={\epsilon}^{2}/50=0.0002$, and the resulting increments (88) are stored over a time interval $[0,500]$. See [32] for more details.

According to homogenisation theory, the reduced model is given by (85) with $Q=\sigma $, and we wish to estimate the parameter a from the data $\{\Delta {Y}_{n}\}$ produced according to (88). It is known that the standard maximum likelihood estimator (MLE)

$${a}_{\mathrm{ML}}=\frac{{\sum}_{n}{Y}_{{t}_{n}}({Y}_{{t}_{n+1}}-{Y}_{{t}_{n}})}{{\sum}_{n}{Y}_{{t}_{n}}^{2}\,\Delta \tau}$$

leads to ${a}_{\mathrm{ML}}=0$ in the limit $\Delta \tau \to 0$ and the observation interval $T\to \infty $. This MLE corresponds to $H=I$ and $R=0$ in our extended state space formulation of the problem. Subsampling can be achieved by choosing an appropriate time-step $\Delta t>\Delta \tau $ in the ensemble Kalman–Bucy filter equations and a corresponding subsampling of the data points ${Y}_{n}$ in (88). We used $\Delta t=50\Delta \tau =0.01$ and $\Delta t=500\Delta \tau =0.1$, respectively. The results can be found in Figure 4. It can be seen that only the larger subsampling leads to a correct estimate of the parameter a. This is in line with known results for the maximum likelihood estimator (90). See [32] and references therein.
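For reference, the estimator (90) itself is a one-liner. The sketch below applies it to a trajectory of the *reduced* model (85), where it is consistent, so it illustrates the estimator but not the homogenisation failure; parameter values follow the text ($a=-1/2$, $Q=1/2$), while the seed and step size are our choices.

```python
import numpy as np

# Drift MLE (90) on Euler-Maruyama data from the reduced model dX = aX dt + Q^{1/2} dW.
rng = np.random.default_rng(42)
a_true, Q, dtau, T = -0.5, 0.5, 0.01, 500.0
n = int(T / dtau)

Y = np.empty(n + 1)
Y[0] = 0.5
noise = np.sqrt(dtau * Q) * rng.standard_normal(n)
for k in range(n):
    Y[k + 1] = Y[k] + dtau * a_true * Y[k] + noise[k]

# Ratio of summed state-increment products to summed squared states times dtau.
a_ml = np.sum(Y[:-1] * np.diff(Y)) / (np.sum(Y[:-1] ** 2) * dtau)
print(a_ml)        # close to a_true in this consistent setting
```

Feeding the same formula with increments from the multi-scale data (88) instead would reproduce the degeneracy ${a}_{\mathrm{ML}}\to 0$ discussed above unless the data is subsampled.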

We consider nonparametric drift estimation for one-dimensional SDEs over a periodic domain $[0,2\pi )$ in the setting considered from a theoretical perspective in [33]. There, a zero-mean Gaussian process prior $\mathcal{GP}(0,{\mathcal{D}}^{-1})$ is placed on the unknown drift function, with inverse covariance operator

$$\mathcal{D}:=\eta [{(-\Delta )}^{p}+\kappa I].$$

The integer parameter p sets the regularity of the process, whereas $\eta ,\kappa \in {\mathbb{R}}^{+}$ control its characteristic correlation length and stationary variance.
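One way to draw from the prior $\mathcal{GP}(0,{\mathcal{D}}^{-1})$ is to discretise $\mathcal{D}$ and use a Cholesky factorisation of the precision matrix; the sketch below assumes a periodic finite-difference Laplacian and a smaller grid ($N_d=64$) and illustrative values of $p$, $\eta$, $\kappa$, all our own choices.

```python
import numpy as np

# Sample from GP(0, D^{-1}) with D = eta[(-Laplacian)^p + kappa I] on a periodic grid.
Nd, p, eta, kappa = 64, 2, 1.0, 1.0
dx = 2.0 * np.pi / Nd

# Periodic finite-difference approximation of -Laplacian (symmetric PSD).
L = (2.0 * np.eye(Nd) - np.roll(np.eye(Nd), 1, axis=0)
     - np.roll(np.eye(Nd), -1, axis=0)) / dx**2
D = eta * (np.linalg.matrix_power(L, p) + kappa * np.eye(Nd))

# If D = R R^T (Cholesky), then x = R^{-T} z with z ~ N(0, I) has
# covariance (R R^T)^{-1} = D^{-1}.
Rchol = np.linalg.cholesky(D)
rng = np.random.default_rng(0)
z = rng.standard_normal(Nd)
sample = np.linalg.solve(Rchol.T, z)     # one prior draw of the drift function
```

Since $\kappa>0$ makes $D$ positive definite, the Cholesky factorisation always exists; for large grids an FFT-based factorisation of the circulant precision would be cheaper.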

Spatial discretization of the problem is carried out by first defining a grid of ${N}_{d}$ evenly spaced points on the domain, at locations ${x}_{i}=i\Delta x$, $\Delta x=2\pi /{N}_{d}$. The drift function is projected onto compactly supported functions centred at these points, which are piecewise linear with

$${b}_{i}({x}_{j})={\delta}_{ij},$$

and linear interpolation is used to define a drift function $f(x,a)$ for all $x\in [0,2\pi )$, that is, it is of the form (2) with ${f}_{0}(x)\equiv 0$. In this example, we set ${N}_{d}=200$. Sample realisations, as well as the reference drift ${f}^{*}$, can be found in Figure 5a.
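The hat-function representation amounts to periodic linear interpolation of the nodal coefficients; a small sketch (the helper name, grid size, and test coefficients are ours):

```python
import numpy as np

# Evaluate f(x, a) = sum_i a_i b_i(x) on the periodic domain [0, 2*pi)
# by linear interpolation between nodal values a_i.
Nd = 8
dx = 2.0 * np.pi / Nd
grid = np.arange(Nd) * dx

def drift(x, a):
    """Piecewise-linear, periodic interpolation of nodal values a_i."""
    x = np.asarray(x) % (2.0 * np.pi)
    i = np.floor(x / dx).astype(int) % Nd
    w = x / dx - np.floor(x / dx)              # weight within the cell
    return (1.0 - w) * a[i] + w * a[(i + 1) % Nd]

a = np.sin(grid)                                # example nodal coefficients
print(drift(grid, a))                           # reproduces a at the nodes
```

By construction the interpolant satisfies ${b}_{i}({x}_{j})={\delta}_{ij}$: evaluating at a grid point returns the corresponding coefficient exactly, and midpoints return the average of the two neighbouring coefficients.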

Data is generated by integrating the SDE (1) with drift ${f}^{*}$ forward in time from initial condition ${X}_{0}=\pi $ and with noise level $Q=0.1$, using the Euler–Maruyama discretisation with step size $\Delta t=0.1$ over one million time-steps. The spatial distribution of the solutions ${X}_{n}$ is plotted in Figure 5b. The data is then given by

$$\Delta {Y}_{n}={X}_{n+1}-{X}_{n}+\Delta {t}^{1/2}{R}^{1/2}{\Xi}_{n}$$

with $R=0.00001$. Data assimilation is performed using the time-discretised ensemble Kalman–Bucy filter Equation (83) with innovation (84), ensemble size $M=200$, and step size $\Delta t=0.1$.

The final estimate of the drift function (ensemble mean) and the ensemble of drift functions can be found in Figure 5c. Figure 5d displays the ensemble of state estimates and the value of the reference solution at the final time. We find that the ensemble Kalman–Bucy filter is able to successfully estimate the drift function and the model states. Further experiments reveal that the drift function can only be identified for sufficiently small measurement errors.

Consider the stochastic heat equation on the periodic domain $x\in [0,2\pi )$, given in conservative form by the stochastic partial differential equation (SPDE)

$$\mathrm{d}u(x,t)=\nabla \cdot \left(\theta (x)\nabla u(x,t)\right)\mathrm{d}t+{\sigma}^{1/2}\,\mathrm{d}W(x,t),$$

where $W(x,t)$ is space-time white noise. With constant $\theta (x)=\theta $, this SPDE reduces to

$$\mathrm{d}u(x,t)=\theta \Delta u(x,t)\,\mathrm{d}t+{\sigma}^{1/2}\,\mathrm{d}W(x,t).$$

In this example, we examine the estimation of $\theta $ from incremental measurements of a locally averaged quantity $q(x,t)$ that arises naturally in a standard finite volume discretisation of (95).

To discretise the system, one first defines ${q}_{t}^{i}=q({x}_{i},t)$ around ${N}_{d}=200$ grid points ${x}_{i}$ on a regular grid, separated by distances $\Delta x$, as

$$\begin{array}{c}\hfill {q}_{t}^{i}={\int}_{{x}_{i}-\Delta x/2}^{{x}_{i}+\Delta x/2}u(x,t)\mathrm{d}x.\end{array}$$

The conservative (drift) term in (94) reduces to

$${\int}_{{x}_{i}-\Delta x/2}^{{x}_{i}+\Delta x/2}\nabla \cdot \left(\theta (x)\nabla u(x,t)\right)\mathrm{d}x={\theta}_{i+1/2}\nabla {u}_{t}^{i+1/2}-{\theta}_{i-1/2}\nabla {u}_{t}^{i-1/2},$$

where ${\theta}_{i\pm 1/2}\equiv \theta ({x}_{i}\pm \Delta x/2)$. The standard finite difference approximations

$$\nabla {u}_{t}^{i+1/2}\simeq \frac{{u}_{t}^{i+1}-{u}_{t}^{i}}{\Delta x},\qquad {u}_{t}^{i}\simeq \Delta {x}^{-1}{q}_{t}^{i}$$

then yield the ${N}_{d}$-dimensional SDE

$$\mathrm{d}{q}_{t}^{i}=\theta \left(\frac{{q}_{t}^{i+1}-2{q}_{t}^{i}+{q}_{t}^{i-1}}{\Delta {x}^{2}}\right)\mathrm{d}t+{\sigma}^{1/2}\Delta {x}^{1/2}\,\mathrm{d}{W}_{t}^{i}$$

for constant $\theta $, where ${W}_{t}^{i}$ are independent one-dimensional Brownian motions in time.
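An Euler–Maruyama step for this finite-volume SDE can be sketched as follows; $\theta$, $\sigma$, and the step-size choice are illustrative, and the demo checks that the conservative drift preserves the total mass $\sum_i q^i$.

```python
import numpy as np

# Euler-Maruyama step for the finite-volume heat SDE on a periodic grid.
Nd = 200
dx = 2.0 * np.pi / Nd
theta, sigma = 1.0, 0.5
dt = dx**2 / 80.0                     # well below the stability bound theta*dx^2/2

def heat_step(q, rng=None):
    """One Euler-Maruyama step; rng=None gives the deterministic drift step."""
    lap = (np.roll(q, -1) - 2.0 * q + np.roll(q, 1)) / dx**2
    q_new = q + dt * theta * lap
    if rng is not None:
        q_new += np.sqrt(sigma * dx * dt) * rng.standard_normal(Nd)
    return q_new

q = np.zeros(Nd)
q[Nd // 2] = 1.0                      # a unit of "mass" in one cell
q = heat_step(q)                       # drift step: total mass is conserved
```

Because the periodic discrete Laplacian has zero row sums, the drift cannot change $\sum_i q^i$; only the noise term moves the total mass.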

Following recent results from [20], we consider the estimation of a constant value $a=\theta $ from measurements $\mathrm{d}{q}_{t}^{*}$ at a fixed location/index ${j}^{*}\in \{1,\dots ,{N}_{d}\}$. The data trajectory is thus given by

$$\mathrm{d}{Y}_{t}=\mathrm{d}{q}_{t}^{*}+{R}^{1/2}\,\mathrm{d}{V}_{t},$$

where ${R}^{1/2}$ is a scalar and ${V}_{t}$ is a standard Brownian motion in one dimension. We perform numerical experiments in which the initial state ${q}_{0}^{i}$ is set to zero for all indices i and the prior on the unknown parameter $a=\theta $ is uniform over the interval $[0.2,1.8]$.

The increment data is generated by first integrating (95) forward in time from the known initial condition ${q}_{0}^{i}=0$ for all i. The equation is discretised in time using the Euler–Maruyama method. It is known that $\Delta t<\theta \Delta {x}^{2}/2$ is required for stability of the Euler–Maruyama discretisation; we use the much smaller time step $\Delta t=\Delta {x}^{2}/80$. The solution is sampled with this same time step, and increment measurements are approximated at time ${t}_{n}$ by setting the measurement noise level R to zero in (100), resulting in

$$\begin{array}{c}\hfill \Delta {Y}_{n}={q}_{n+1}^{*}-{q}_{n}^{*}.\end{array}$$

Note that the associated model error in (1) is given by $G={\sigma}^{1/2}\Delta {x}^{1/2}I$ and the matrix H in (3) projects the vector of state increments onto a single component with index ${j}^{*}={N}_{d}/2$. Simulations are performed over the time-interval $[0,20]$. The results can be found in Figure 6a. We also compute the model evidence for a sequence of parameter values $\theta \in \{0.2,0.3,\dots ,1.8\}$ based on a standard Kalman–Bucy filter [6] for the associated linear state estimation problem. See Figure 6b. Both approaches agree with the reference value $\theta =1$.

The results presented here demonstrate that the proposed methodology can be applied to a broad range of continuous-time state and parameter estimation problems with correlated measurement and model errors. Alternatively, one could have employed standard SMC or MCMC methods utilising the modified observation Model (41) as implied by the Kushner–Stratonovich formulation (39) of the filtering problem. However, such implementations require the approximation of the additional $Q\nabla \log {\pi}_{t}$ term, which is nontrivial if only samples from ${\pi}_{t}$ are available. Furthermore, the limiting behaviour of such implementations in the limit $R\to 0$ and $H=I$ (pure parameter estimation problem) is unclear since ${\pi}_{t}$ degenerates into a Dirac delta distribution, potentially leading to numerical difficulties in this singular regime. The proposed generalised feedback particle filter formulation avoids these issues through the use of stochastic innovations which are correlated with the model noise. In other words, the distribution ${\pi}_{t}$ does not appear explicitly in the innovation process (46), and the correlated noise terms cancel each other out as discussed in Remark 3 for $R=0$ and $H=I$. The main computational challenge of the feedback particle filter approach is given by the need for finding the Kalman gain matrix (57). However, the constant-gain ensemble Kalman–Bucy approximation

$${K}_{t}\approx \left({P}^{xh}+Q{H}^{\mathrm{T}}\right){C}^{-1}$$

is easy to implement. In fact, the only differences with the standard ensemble Kalman–Bucy filter formulation of [14] are in the additional $Q{H}^{\mathrm{T}}$ term in the Kalman gain, and a correlation between the stochastic innovation process and the model error. While the ensemble Kalman–Bucy filter gave rather satisfactory results for the numerical experiments displayed in Section 6, strongly non-Gaussian distributions might require more accurate approximations to the Kalman gain matrix (57). In that case, one could rely on the particle-based diffusion map approximation considered in [27].

In this paper, we have derived McKean–Vlasov equations for combined state and parameter estimation from continuously observed state increments. An approximate and robust implementation of these McKean–Vlasov equations in the form of a generalised ensemble Kalman–Bucy filter has been provided and applied to a range of increasingly complex model systems. Future work will address the treatment of temporally-correlated measurement and model errors, as well as a rigorous analysis of these McKean–Vlasov equations in the contexts of multi-scale dynamics and nonparametric drift estimation.

Methodology, N.N. and S.R.; software, S.R. and P.J.R.; validation, N.N., S.R. and P.J.R.; writing—original draft preparation, N.N., S.R.; writing—review and editing, N.N., S.R. and P.J.R.

This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) through grants CRC 1294 ‘Data Assimilation’ (project A06) and CRC 1114 ‘Scaling Cascades’ (project A02).

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

In this appendix we outline a derivation of the Kushner–Stratonovich Equation (39) for the signal-observation dynamics given by (38). In fact, we only compute the evolution equation (termed the modified Zakai equation) for the unnormalised filtering distribution ${\rho}_{t}[\varphi ]=\mathbb{E}\left[{l}_{t}\varphi ({X}_{t})|{Y}_{[0,t]}\right]$, where the likelihood ${l}_{t}$ is given by

$${l}_{t}\equiv l({Y}_{[0,t]}|{X}_{[0,t]})=\exp \left({\int}_{0}^{t}f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}\,\mathrm{d}{Y}_{s}-\frac{1}{2}{\int}_{0}^{t}f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}Hf({X}_{s})\,\mathrm{d}s\right).$$

Obtaining the Kushner–Stratonovich formulation is then standard, applying Itô’s formula to the Kallianpur–Striebel formula ${\pi}_{t}[\varphi ]={\rho}_{t}[\varphi ]/{\rho}_{t}[\mathbf{1}]$; see ([7], Chapter 3). The following result is in agreement with Corollaries 3.39 and 3.40 in [7].

The modified Zakai equation is given by

$${\rho}_{t}[\varphi ]={\rho}_{0}[\varphi ]+{\int}_{0}^{t}{\rho}_{s}[\mathcal{L}\varphi ]\,\mathrm{d}s+{\int}_{0}^{t}{\rho}_{s}\left[\varphi {f}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}\right]\mathrm{d}{Y}_{s}+{\int}_{0}^{t}{\rho}_{s}\left[\nabla \varphi \right]Q{H}^{\mathrm{T}}{C}^{-1}\,\mathrm{d}{Y}_{s},$$

where the generator $\mathcal{L}$ has been defined in (40).

For convenience, let us define the process

$${M}_{t}={\int}_{0}^{t}f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}\,\mathrm{d}{Y}_{s},$$

where ${Y}_{s}$ satisfies (38b). From ${\langle Y\rangle}_{t}=Ct$ we see that

$${\langle M\rangle}_{t}={\int}_{0}^{t}f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}Hf({X}_{s})\,\mathrm{d}s.$$

Hence, the likelihood takes the form

$${l}_{t}=\exp\left({M}_{t}-\frac{1}{2}{\langle M\rangle}_{t}\right),$$

satisfying the SDE

$$\mathrm{d}{l}_{t}={l}_{t}\,\mathrm{d}{M}_{t}.$$
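The exponential-martingale structure of ${l}_{t}$ can be checked numerically: under the reference measure, where $Y$ is pure noise with ${\langle Y\rangle}_{t}=Ct$, one has $\mathbb{E}[{l}_{t}]=1$. The following minimal sketch (not from the paper; all parameter values are hypothetical, and the signal is frozen at a constant for simplicity) verifies this by Monte Carlo in a scalar setting.

```python
import numpy as np

# Sanity check: E[l_t] = 1 under the reference measure, where the
# observation increments are pure noise, dY = sqrt(c) dB.
# Hypothetical scalar setting: constant signal x, H = h, C = c.
rng = np.random.default_rng(0)
x, h, c = 1.0, 1.0, 1.0
t, n_steps, n_paths = 1.0, 100, 20000
dt = t / n_steps

# observation increments under the reference measure
dY = np.sqrt(c * dt) * rng.standard_normal((n_paths, n_steps))

g = x * h / c                 # integrand f(X)^T H^T C^{-1} (constant here)
M = g * dY.sum(axis=1)        # M_t = int_0^t g dY_s
qv = g**2 * c * t             # <M>_t = g^2 <Y>_t
l = np.exp(M - 0.5 * qv)      # likelihood l_t = exp(M_t - <M>_t / 2)

print(l.mean())               # approximately 1
```

Since the integrand is constant in this toy case, the check reduces to the moment-generating function of a Gaussian; the same construction applies path-wise for state-dependent $f$.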

For an arbitrary smooth, compactly supported test function $\varphi$, Itô's formula implies

$$\begin{aligned}{l}_{t}\varphi ({X}_{t})=\varphi ({X}_{0})&+{\int}_{0}^{t}\varphi ({X}_{s})\,\mathrm{d}{l}_{s}+{\int}_{0}^{t}{l}_{s}\nabla \varphi ({X}_{s})\cdot \mathrm{d}{X}_{s}\\ &+\frac{1}{2}\sum _{i,j=1}^{{N}_{x}}{\int}_{0}^{t}{l}_{s}{\partial}_{i}{\partial}_{j}\varphi ({X}_{s})\,\mathrm{d}{\langle {X}^{i},{X}^{j}\rangle}_{s}+\sum _{i=1}^{{N}_{x}}{\int}_{0}^{t}{\partial}_{i}\varphi ({X}_{s})\,\mathrm{d}{\langle l,{X}^{i}\rangle}_{s},\end{aligned}$$

where ${X}_{s}$ satisfies (38a). For the covariation process ${\langle l,X\rangle}_{t}$ we obtain

$$\mathrm{d}{\langle l,X\rangle}_{t}={l}_{t}\,\mathrm{d}{\langle M,X\rangle}_{t}={l}_{t}\,f{({X}_{t})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}HQ\,\mathrm{d}t,$$

using ${\langle Y,X\rangle}_{t}=HQt$. Furthermore, ${\langle X,X\rangle}_{t}=Qt$, which follows from the definition of the stochastic contributions in (38a).
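The intermediate cross-variation can be spelled out (a one-line consequence of ${\langle Y,X\rangle}_{t}=HQt$, added here for readability):

$${\langle M,X\rangle}_{t}={\int}_{0}^{t}f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}\,\mathrm{d}{\langle Y,X\rangle}_{s}={\int}_{0}^{t}f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}HQ\,\mathrm{d}s.$$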

We now apply the conditional expectation to (A7). Noticing that

$${\int}_{0}^{t}\varphi ({X}_{s})\,\mathrm{d}{l}_{s}={\int}_{0}^{t}{l}_{s}\varphi ({X}_{s})f{({X}_{s})}^{\mathrm{T}}{H}^{\mathrm{T}}{C}^{-1}\,\mathrm{d}{Y}_{s},$$

the result follows from (A6). □
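The pairing ${\rho}_{t}[\varphi ]=\mathbb{E}[{l}_{t}\varphi ({X}_{t})\,|\,{Y}_{[0,t]}]$ with ${\pi}_{t}[\varphi ]={\rho}_{t}[\varphi ]/{\rho}_{t}[\mathbf{1}]$ admits a direct Monte Carlo reading: simulate independent signal paths, weight each by its likelihood, and self-normalise. The sketch below illustrates this (it is not the paper's ensemble Kalman–Bucy algorithm) for a hypothetical scalar model under the simplifying assumption of independent signal and observation noise, so the correlation term involving $Q{H}^{\mathrm{T}}$ vanishes; all model choices and parameter values are illustrative.

```python
import numpy as np

# Weighted-ensemble approximation of pi_t[phi] via Kallianpur-Striebel
# (illustrative sketch). Hypothetical scalar model:
#   dX = f(X) dt + sqrt(q) dW,   dY = h f(X_true) dt + sqrt(c) dB,
# with W and B independent (no Q-correlation term).
rng = np.random.default_rng(1)
f = lambda x: -x                      # hypothetical drift
h, q, c = 1.0, 0.1, 0.05
t, n_steps, n_paths = 1.0, 200, 5000
dt = t / n_steps

# one "true" signal path and its observed increments dY
x_true = 0.5
dY = np.empty(n_steps)
for k in range(n_steps):
    dY[k] = h * f(x_true) * dt + np.sqrt(c * dt) * rng.standard_normal()
    x_true += f(x_true) * dt + np.sqrt(q * dt) * rng.standard_normal()

# ensemble of signal paths with log-likelihoods log l_t = M_t - <M>_t / 2
X = rng.standard_normal(n_paths)      # prior ensemble at time 0
logl = np.zeros(n_paths)
for k in range(n_steps):
    g = f(X) * h / c                  # integrand f(X)^T H^T C^{-1}
    logl += g * dY[k] - 0.5 * g**2 * c * dt
    X += f(X) * dt + np.sqrt(q * dt) * rng.standard_normal(n_paths)

w = np.exp(logl - logl.max())         # numerically stabilised weights
w /= w.sum()                          # self-normalisation: pi = rho / rho[1]
posterior_mean = np.dot(w, X)         # pi_t[phi] for phi(x) = x
print(posterior_mean)
```

In the correlated-noise setting of (38), the ensemble members would additionally share noise with the observation path, which is precisely what the extra $Q{H}^{\mathrm{T}}{C}^{-1}$ term in the modified Zakai equation accounts for.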

- Kutoyants, Y. Statistical Inference for Ergodic Diffusion Processes; Springer: New York, NY, USA, 2004.
- Pavliotis, G. Stochastic Processes and Applications; Springer: New York, NY, USA, 2014.
- Apte, A.; Hairer, M.; Stuart, A.; Voss, J. Sampling the posterior: An approach to non-Gaussian data assimilation. Phys. D Nonlinear Phenom. **2007**, 230, 50–64.
- Salman, H.; Kuznetsov, L.; Jones, C.; Ide, K. A method for assimilating Lagrangian data into a shallow-water-equation ocean model. Mon. Weather Rev. **2006**, 134, 1081–1101.
- Apte, A.; Jones, C.; Stuart, A. A Bayesian approach to Lagrangian data assimilation. Tellus A **2008**, 60, 336–347.
- Simon, D. Optimal State Estimation; Wiley: Hoboken, NJ, USA, 2006.
- Bain, A.; Crisan, D. Fundamentals of Stochastic Filtering; Springer: New York, NY, USA, 2009.
- Liu, J. Monte Carlo Strategies in Scientific Computing; Springer: New York, NY, USA, 2001.
- Crisan, D.; Xiong, J. Approximate McKean–Vlasov representation for a class of SPDEs. Stochastics **2010**, 82, 53–68.
- McKean, H. A class of Markov processes associated with nonlinear parabolic equations. Proc. Natl. Acad. Sci. USA **1966**, 56, 1907–1911.
- Yang, T.; Mehta, P.; Meyn, S. Feedback particle filter. IEEE Trans. Autom. Control **2013**, 58, 2465–2480.
- Reich, S. Data assimilation: The Schrödinger perspective. Acta Numer. **2019**, 28, 635–710.
- Majda, A.; Harlim, J. Filtering Complex Turbulent Systems; Cambridge University Press: Cambridge, UK, 2012.
- Bergemann, K.; Reich, S. An ensemble Kalman–Bucy filter for continuous data assimilation. Meteorol. Z. **2012**, 21, 213–219.
- Taghvaei, A.; de Wiljes, J.; Mehta, P.; Reich, S. Kalman filter and its modern extensions for the continuous-time nonlinear filtering problem. ASME J. Dyn. Syst. Meas. Control **2017**, 140.
- Law, K.; Stuart, A.; Zygalakis, K. Data Assimilation: A Mathematical Introduction; Springer: New York, NY, USA, 2015.
- Reich, S.; Cotter, C. Probabilistic Forecasting and Bayesian Data Assimilation; Cambridge University Press: Cambridge, UK, 2015.
- Moral, P.D. Mean Field Simulation for Monte Carlo Integration; Chapman and Hall/CRC: London, UK, 2013.
- Pavliotis, G.; Stuart, A. Multiscale Methods; Springer: New York, NY, USA, 2008.
- Altmeyer, R.; Reiß, M. Nonparametric Estimation for Linear SPDEs from Local Measurements; Technical Report; Humboldt University Berlin: Berlin, Germany, 2019.
- Saha, S.; Gustafsson, F. Particle filtering with dependent noise processes. IEEE Trans. Signal Process. **2012**, 60, 4497–4508.
- Berry, T.; Sauer, T. Correlations between systems and observation errors in data assimilation. Mon. Weather Rev. **2018**, 146, 2913–2931.
- Mitchell, H.L.; Daley, R. Discretization error and signal/error correlation in atmospheric data assimilation: (I). All scales resolved. Tellus A **1997**, 49, 32–53.
- Papaspiliopoulos, O.; Pokern, Y.; Roberts, G.; Stuart, A. Nonparametric estimation of diffusion: A differential equation approach. Biometrika **2012**, 99, 511–531.
- Särkkä, S. Bayesian Filtering and Smoothing; Cambridge University Press: Cambridge, UK, 2013.
- Laugesen, R.S.; Mehta, P.G.; Meyn, S.P.; Raginsky, M. Poisson’s equation in nonlinear filtering. SIAM J. Control Optim. **2015**, 53, 501–525.
- Taghvaei, A.; Mehta, P.; Meyn, S. Gain Function Approximation in the Feedback Particle Filter; Technical Report; University of Illinois at Urbana-Champaign: Champaign, IL, USA, 2019.
- Amezcua, J.; Kalnay, E.; Ide, K.; Reich, S. Ensemble transform Kalman–Bucy filters. Q. J. R. Meteorol. Soc. **2014**, 140, 995–1004.
- De Wiljes, J.; Reich, S.; Stannat, W. Long-time stability and accuracy of the ensemble Kalman–Bucy filter for fully observed processes and small measurement noise. SIAM J. Appl. Dyn. Syst. **2018**, 17, 1152–1181.
- Blömker, D.; Schillings, C.; Wacker, P. A strongly convergent numerical scheme for ensemble Kalman inversion. SIAM J. Numer. Anal. **2018**, 56, 2537–2562.
- Harlim, J. Model error in data assimilation. In Nonlinear and Stochastic Climate Dynamics; Franzke, C., Kane, T.O., Eds.; Cambridge University Press: Cambridge, UK, 2017; pp. 276–317.
- Krumscheid, S.; Pavliotis, G.; Kalliadasis, S. Semi-parametric drift and diffusion estimation for multiscale diffusions. SIAM J. Multiscale Model. Simul. **2011**, 11, 442–473.
- Van Waaij, J.; van Zanten, H. Gaussian process methods for one-dimensional diffusion: Optimal rates and adaptation. Electron. J. Stat. **2016**, 10, 628–645.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).