State and Parameter Estimation from Observed Signal Increments

The success of the ensemble Kalman filter has triggered a strong interest in expanding its scope beyond classical state estimation problems. In this paper, we focus on continuous-time data assimilation where the model and measurement errors are correlated and both states and parameters need to be identified. Such scenarios arise from noisy and partial observations of Lagrangian particles which move under a stochastic velocity field involving unknown parameters. We take an appropriate class of McKean–Vlasov equations as the starting point to derive ensemble Kalman–Bucy filter algorithms for combined state and parameter estimation. We demonstrate their performance through a series of increasingly complex multi-scale model systems.


Introduction
The research presented in this paper has been motivated by the state and parameter estimation problem for particles moving under a stochastic velocity field, with the measurements given by partial and noisy observations of their position increments. If the deterministic contributions to the velocity field are stationary, and the position increments of the moving particle are exactly observed, then one is led to a standard parameter estimation problem for stochastic differential equations (SDEs) [1,2]. In [3], this setting was extended to the case where the deterministic contributions to the velocity field themselves undergo a stochastic time evolution. Furthermore, while continuous-time observations of position increments are at the focus of the present study, the assimilation of discrete-time observations of particle positions has been investigated in [4,5] under a so-called Lagrangian data assimilation setting for atmospheric fluid dynamics.
The assumption of exactly and fully observed position increments is not always realistic, and the case of partial and noisy observations is at the centre of the present study. Access to only partial and noisy observations of position increments leads to correlations between the measurement and model errors. The theoretical impact of such correlations on state and parameter estimation problems has been discussed, for example, in [6] in the context of linear systems, and in [7] for nonlinear systems. In particular, one finds that the appropriately adjusted data likelihood involves the gradient of log-densities, which is nontrivial from a computational perspective, and which prevents a straightforward application of standard Markov chain Monte Carlo (MCMC) or sequential Monte Carlo (SMC) methods [8].
In this paper, we instead follow an alternative Monte Carlo approach based on appropriately adjusted McKean-Vlasov filtering equations, an approach pioneered in [9] in the context of the standard state estimation problem for diffusion processes. McKean-Vlasov equations, first studied in [10], are a class of SDEs in which the right-hand side depends on the law of the process itself. We rely on a particular formulation of McKean-Vlasov filtering equations, the so-called feedback particle filters [11], utilising stochastic innovation processes [12].
Our proposed Monte Carlo formulation avoids the need for estimating log-densities, and can be implemented in a numerically robust manner relying on a generalised ensemble Kalman-Bucy filter approximation applied to an extended state space formulation [13]. The ensemble Kalman-Bucy filter [14,15] has been introduced previously as an extension of the popular ensemble Kalman filter [13,16,17] to continuous-time data assimilation under the assumption of uncorrelated measurement and model errors.
While the McKean-Vlasov formulation is essentially mathematically equivalent to the more conventional one based on the Kushner-Stratonovitch equation [7], these two approaches differ significantly in structure, suggesting different tools for their analysis as well as numerical approximations. More broadly speaking, the McKean-Vlasov approach to filtering is appealing since its Monte Carlo implementations completely avoid the need for resampling characteristic of standard SMC methods. Furthermore, a wide range of approximations are possible within the McKean-Vlasov framework with some of them, such as the ensemble Kalman-Bucy filter, applicable to high-dimensional problems. The McKean-Vlasov approach also arises naturally when analysing sequential Monte Carlo methods [18].
In Section 6, we apply the proposed algorithms to a series of state and parameter estimation problems of increasing complexity. First, we study the state and parameter estimation problem for an Ornstein-Uhlenbeck process [2]. Two further experiments investigate the behaviour of the filters for reduced model equations, with the data being collected from underlying multi-scale models. There we distinguish between the averaging and homogenisation scenarios [19]. Finally, we look at examples of nonparametric drift estimation [3] and parameter estimation for the stochastic heat equation [20].
We finally mention that SMC methods for correlated noise terms in discrete-time have been discussed, for example, in [21] and in the context of the ensemble Kalman filter in [22]. Similar ideas have also been pursued in a more applied context in [23].

Mathematical Problem Formulation
We consider the time evolution of a random state variable X t ∈ R N x in N x -dimensional state space, N x ≥ 1, as prescribed by an SDE of the form dX t = f (X t , a) dt + G dW t (1) for time t ≥ 0, with the drift function f : R N x × R N a → R N x depending on N a ≥ 0 unknown parameters a = (a 1 , . . . , a N a ) T ∈ R N a . Model errors are represented through standard N w -dimensional Brownian motion W t , N w ≥ 1, and a matrix G ∈ R N x ×N w . We also introduce the associated model error covariance matrix Q = GG T . We will generally assume that the initial condition X 0 is fixed, that is, X 0 = x 0 a.s. for given x 0 ∈ R N x . In terms of a more specific example, one can think of X t denoting the position of a particle at time t ≥ 0 moving in N x = 3 dimensional space under the influence of a stochastic velocity field, with deterministic contributions given by f and stochastic perturbations by GW t . In the case G = 0, the SDE (1) reduces to an ordinary differential equation with given initial condition x 0 . We assume throughout this paper that (1) possesses unique, strong solutions for all parameter values a. See, for example, [2] (Section 3.3) for sufficient conditions on the drift function f . The distribution of X t is denoted by π t , which we also abbreviate by π t = Law(X t ). We use the same notation for measures and their Lebesgue densities, provided they exist.
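As a concrete illustration, the model (1) can be integrated with the Euler-Maruyama method. The following minimal Python sketch (function names and parameter values are our own illustrative choices, not taken from the paper) simulates the Ornstein-Uhlenbeck special case f(x, a) = a x that reappears in Section 6.

```python
import numpy as np

def euler_maruyama(f, a, G, x0, dt, n_steps, rng):
    """Simulate dX_t = f(X_t, a) dt + G dW_t with the Euler-Maruyama scheme."""
    Nx, Nw = G.shape
    x = np.empty((n_steps + 1, Nx))
    x[0] = x0
    for n in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=Nw)  # Brownian increment
        x[n + 1] = x[n] + f(x[n], a) * dt + G @ dW
    return x

# Ornstein-Uhlenbeck example: f(x, a) = a * x, G = sqrt(Q).
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x, a: a * x, a=-0.5,
                      G=np.array([[np.sqrt(0.5)]]),
                      x0=np.array([0.5]), dt=0.005, n_steps=1000, rng=rng)
```

The same routine generates the synthetic data trajectories used in the experiments of Section 6, up to the specific choices of drift and noise level.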

Example 1.
A wide class of drift functions can be written in the form f (x, a) = f 0 (x) + ∑ N a i=1 a i b i (x) = f 0 (x) + B(x)a, (2) where f 0 : R N x → R N x is a known drift function, the b i : R N x → R N x , i = 1, . . . , N a , denote appropriate basis functions, and the vector a = (a 1 , . . . , a N a ) T ∈ R N a contains the unknown parameters of the model. The family {b i (x)} of basis functions, which we collect in a matrix-valued function B(x) = (b 1 (x), b 2 (x), . . . , b N a (x)) ∈ R N x ×N a , could arise from a finite-dimensional truncation of some appropriate Hilbert space H. See, for example, [24] for computational approaches to nonparametric drift estimation using a Galerkin approximation in H, where the b i (x) become finite element basis functions. Furthermore, the expansion coefficients {a i } could be made time-dependent by letting them evolve according to some system of differential equations arising, for example, from the discretisation of an underlying partial differential equation with solutions in H. See [3] for specific examples of such a setting. While the present paper focuses on stationary drift functions, i.e., the parameters {a i } are time-independent, the results from Sections 3 and 5, respectively, can easily be extended to the non-stationary case where the parameters themselves satisfy given evolution equations.
Data and an observation model are required in order to perform state and parameter estimation for SDEs of the form (1). In this paper, we assume that we observe partial and noisy increments dY t of the signal X t , given by dY t = H dX t + R 1/2 dV t (3) for t in the observation interval [0, T], T > 0, where H ∈ R N y ×N x is a given linear operator, V t denotes standard N y -dimensional Brownian motion with N y ≥ 1 and R ∈ R N y ×N y is a covariance matrix. We introduce the observation map h(x, a) := H f (x, a) (4) for later use. Unless HG = 0, it is clear that the model error E m t := GW t in (1) and the total observation error E o t := HGW t + R 1/2 V t (5) in (3) are correlated. The impact of correlations between the model and measurement errors on the state estimation problem has been discussed by [6,7]. Furthermore, such correlations require adjustments to sequential estimation methods [16,17,25] which are the main focus of this paper. We assume throughout this paper that the covariance matrix C := HQH T + R (6) of the observation error (5) is invertible. The special case R = 0 and H = I leads to a pure parameter estimation problem which has been extensively studied in the literature in the settings of maximum likelihood and Bayesian estimators [1,2]. In Section 3, we provide a reformulation of the Bayesian approach as McKean-Vlasov equations for the parameters, based on the results in [9,11].
If R ≠ 0, then (1) and (3) lead to a combined state and parameter estimation problem with correlated noise terms. We will first discuss the impact of this correlation on the pure state estimation problem in Section 4, assuming that the parameters of the problem are known. Again, we will derive appropriate McKean-Vlasov equations in the state variables. Our key contribution is a formulation that avoids the need for log-density estimates and can be put into an appropriately generalised ensemble Kalman-Bucy filter approximation framework [14,15]. We also formally demonstrate that the McKean-Vlasov filter equation reduces to dX t = dY t in the limit R → 0 and H = I, a property that is less straightforward to demonstrate for filter formulations involving log-densities.
These McKean-Vlasov equations are generalised to the combined state and parameter estimation problem via an augmentation of state space [13] in Section 5. Given the results from Section 4, such an extension is rather straightforward.
The numerical experiments in Section 6 rely exclusively on the generalised ensemble Kalman-Bucy filter approximation to the McKean-Vlasov equations, which are easy to implement and yield robust and accurate numerical results.

Parameter Estimation from Noiseless Data
In this section, we treat the simpler Bayesian parameter estimation problem which arises from setting R = 0 and H = I in (3), i.e., N y = N x . This leads to dX t = dY t and, furthermore, X t = Y t for all t ∈ [0, T], provided X 0 = Y 0 = x 0 which we assume throughout this paper. Invertibility of C = Q requires that G has rank N x ; that is, N w ≥ N x in (1). The data likelihood thus follows from the observation model with additive Brownian noise in (3) and is given by Girsanov's theorem as l t (a) = exp ( ∫ 0 t f (Y s , a) T Q −1 dY s − (1/2) ∫ 0 t f (Y s , a) T Q −1 f (Y s , a) ds ). Given a prior distribution Π 0 (a) for the parameters, the resulting posterior distribution at any time t ∈ (0, T] is Π t (a) = l t (a) Π 0 (a) / Π 0 [l t ] according to Bayes' theorem [7]. Here, we have introduced the shorthand Π 0 [l t ] for the expectation of l t with respect to Π 0 . It is well-known that the posterior distributions Π t satisfy the stochastic partial differential equation dΠ t [φ] = ( Π t [φ h t T ] − Π t [φ] Π t [h t ] T ) Q −1 ( dY t − Π t [h t ] dt ) (10) with the time-dependent observation map h t (a) := f (Y t , a), where φ : R N a → R is a compactly supported smooth test function, and Π t [φ] again denoting the expectation of φ with respect to Π t . See [7] for a detailed discussion. Equation (10) is a special instance of the well-known Kushner-Stratonovitch equation from time-continuous filtering [7].
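For intuition, the Bayes formula above can be evaluated directly on a parameter grid when N a = 1. The following sketch (all numerical settings are illustrative, not the paper's experiment) applies the standard Girsanov log-likelihood for the Ornstein-Uhlenbeck drift parameter to a noiselessly observed trajectory; the grid sum stands in for the normalisation Π 0 [l t ].

```python
import numpy as np

rng = np.random.default_rng(1)
Q, a_true, dt, T = 0.5, -0.5, 0.005, 50.0
n = int(T / dt)

# Generate a noiseless data trajectory Y_t = X_t (case R = 0, H = I).
x = np.empty(n + 1); x[0] = 0.5
for k in range(n):
    x[k + 1] = x[k] + a_true * x[k] * dt + np.sqrt(Q * dt) * rng.normal()
dx = np.diff(x)

# Girsanov log-likelihood on a parameter grid:
# log l_t(a) = (a * sum X dX - 0.5 * a^2 * sum X^2 dt) / Q.
a_grid = np.linspace(-3.0, 2.0, 501)
s1, s2 = np.sum(x[:-1] * dx), np.sum(x[:-1] ** 2) * dt
log_lik = (a_grid * s1 - 0.5 * a_grid ** 2 * s2) / Q

# Gaussian prior N(-1/2, 2) and normalised posterior via Bayes' theorem.
log_prior = -0.5 * (a_grid + 0.5) ** 2 / 2.0
log_post = log_lik + log_prior
da = a_grid[1] - a_grid[0]
post = np.exp(log_post - log_post.max())
post /= post.sum() * da
a_map = a_grid[np.argmax(post)]
```

For longer observation windows the posterior concentrates around the data-generating parameter, in line with the classical consistency results cited above.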

Feedback Particle Filter
We now state a McKean-Vlasov reformulation of the Kushner-Stratonovitch Equation (10) as a special instance of the feedback particle filter of [11,12]. The key idea is to formulate a stochastic differential equation in the parameters in which they are treated as time-dependent random variables. We introduce the notation A t for these, and require that the law of A t coincide with (8) for t ∈ [0, T], i.e., with the solution to (10).

Lemma 1 (Feedback particle filter). Consider the McKean-Vlasov equations
where the matrix-valued Kalman gain K t ∈ R N a ×N y satisfies the elliptic equation (13). The innovation process I t can be chosen in either a deterministic or a stochastic form. Then, the distribution Π t = Law( A t ) coincides with the solution to (10), provided that the initial distributions agree. In other words, Π t = Π t for all t ∈ [0, T].
Throughout this paper, we write (12) in the more compact Stratonovitch form, where the Stratonovitch interpretation is to be applied only to A t in K t ( A t ), while the explicit time-dependence of K t remains in its Itô interpretation. It should be noted that the matrix-valued function K t is not uniquely defined by the PDE (13). Indeed, provided K t solves (13), K t + β t is also a solution whenever ∇ · Π t β t = 0. As discussed in [15], the minimiser over all suitable K t with respect to a kinetic energy-type functional is of the gradient form (18), for a vector of potential functions Ψ t = (ψ 1 t , . . . , ψ N x t ), ψ k t : R N a → R. Inserting (18) into (13) leads to N x elliptic partial differential equations (often referred to as Poisson equations), understood componentwise, where the centring condition Π t [Ψ t ] = 0 makes the solution unique under mild assumptions on Π t (see [26]). The numerical approximation of (19) in the context of the feedback particle filter has been discussed in [27]. Finally, (15) yields a particularly appealing formulation, since it is based on a direct comparison of dY t with a random realisation of the right-hand side of the SDE (1), given a parameter value a = A t (ω) and a realisation of the noise term dW t (ω). This fact will be explored further in Section 4. For later reference, one can also state (13) and (18) in their index forms.

Ensemble Kalman-Bucy Filter
Let us now assume that the initial distribution Π 0 is Gaussian, and that f is linear in the unknown parameters, such as in (2). Then, the distributions Π t remain Gaussian for all times, with mean a t and covariance matrix P aa t . The elliptic PDE (13) is solved by the parameter-independent Kalman gain matrix (22), and one obtains the McKean-Vlasov formulation (23) of the Kalman-Bucy filter, with the innovation process I t defined in either its deterministic or its stochastic form. Please note that the Stratonovitch formulation (17) reduces to the standard Itô interpretation, since K t no longer depends explicitly on A t .
The McKean-Vlasov Equation (23) can be extended to nonlinear, non-Gaussian parameter estimation problems by generalising the parameter-independent Kalman gain matrix (22) to an empirical covariance-based gain (26). Clearly, the gain (26) provides only an approximation to the solution of (13). However, such approximations have become popular in nonlinear state estimation in the form of the ensemble Kalman filter [16,17], and we will test their suitability for parameter estimation in Section 6.
Numerical implementations of the proposed McKean-Vlasov approaches rely on Monte Carlo approximations. More specifically, given M samples A i 0 , i = 1, . . . , M, from the initial distribution Π 0 , we introduce an interacting particle system in the variables A i t , which leads to the so-called ensemble Kalman-Bucy filter [14,15]. Please note that the empirical covariance matrix computed from the ensemble provides an unbiased estimator of P ah t . Finally, a robust and efficient time-stepping procedure for approximating A t n , t n = n∆t, is provided in [28-30]. Denoting the approximations at time t n by A i n , i = 1, . . . , M, we obtain an update with step size ∆t > 0, empirical covariance matrices, and innovation increments ∆I i n given in either a deterministic or a stochastic form. Here we have used the abbreviations h n (a) = f (Y n , a), Y n = Y t n , and ∆Y n = Y t n+1 − Y t n . While the feedback particle formulation (17) and its ensemble Kalman-Bucy filter approximation (31) are special cases of already available formulations, they provide the starting point for our novel McKean-Vlasov equations and their numerical approximation of the combined state and parameter estimation problem with correlated measurement and model errors, which we develop in the following two sections.
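The time-stepping scheme just described can be sketched for a scalar example as follows. We use the deterministic innovation variant; the step size, ensemble size, and prior are illustrative choices of ours rather than those of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, a_true, dt, T, M = 0.5, -0.5, 0.005, 100.0, 200
n = int(T / dt)

# Data: exactly observed increments dY_t = dX_t of an OU process.
y = np.empty(n + 1); y[0] = 0.5
for k in range(n):
    y[k + 1] = y[k] + a_true * y[k] * dt + np.sqrt(Q * dt) * rng.normal()

# Ensemble Kalman-Bucy parameter filter with deterministic innovation
# dI^i = dY - 0.5 * (h(A^i) + mean h) dt, where h_n(a) = f(Y_n, a) = a Y_n.
A = rng.normal(-0.5, np.sqrt(2.0), size=M)           # prior ensemble
for k in range(n):
    h = A * y[k]                                     # h_n(A^i)
    P_ah = np.mean((A - A.mean()) * (h - h.mean()))  # empirical cross-covariance
    dI = (y[k + 1] - y[k]) - 0.5 * (h + h.mean()) * dt
    A = A + (P_ah / Q) * dI                          # gain K = P_ah Q^{-1}
a_est = A.mean()
```

Since h is linear in the parameter here, the ensemble update reproduces the exact Gaussian posterior up to Monte Carlo error, and the ensemble mean approaches the data-generating value.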

State Estimation for Noisy Data
We return to the observation Model (3) with R ≠ 0 and general H. The pure state estimation problem is considered first; that is, the parameters in (1) are assumed to be known.
Using E o t , given by (5), and a suitably defined complementary error process E c t , with the total measurement error covariance matrix C given by (6), we find that these errors naturally suggest linear combinations of W t and V t in (1) and (3) that shift the correlation between measurement and model errors to the signal dynamics, yielding transformed equations in which W t and V t denote mutually independent standard Brownian motions of dimension N w and N y , respectively. These equations correspond exactly to the correlated noise example from [7] (Section 3.8).
Furthermore, H = I and R = 0 lead to E c t = 0, QH T C −1/2 = C 1/2 and, hence, dX t = dY t . A straightforward application of the results from [7] (Section 3.8) yields the Kushner-Stratonovitch equation for the filtering distributions π t . Here, L denotes the generator of (1), h(x) = H f (x) denotes the observation map, and φ is a compactly supported smooth test function.
For the convenience of the reader, we present an independent derivation in Appendix A. We note that (39) also arises as the Kushner-Stratonovitch equations for an SDE Model (1) with observations Y t satisfying the observation model where V t denotes N y -dimensional Brownian motion independent of the Brownian motion W t in (1).
Here we have used that π t [HQ∇ log π t ] = 0. This reinterpretation of our state estimation problem in terms of uncorrelated model and observation errors and a modified observation map allows one to apply available MCMC and SMC methods for continuous-time filtering and smoothing problems. See, for example, [16]. However, there are two major limitations of such an approach. First, it requires approximating the gradient of the log-density. Second, the modified observation Model (41) is not well-defined in the limit R → 0 and H = I, since the density π t collapses to a Dirac delta function under the given initial condition X 0 = x 0 a.s. In order to circumvent these complications, we develop an alternative approach based on an appropriately modified feedback particle filter formulation in the following subsection.

Generalised Feedback Particle Filter Formulation
While it is clearly possible to apply the standard feedback particle filter formulations using (41), the following alternative formulation avoids the need for approximating the gradient of the log-density.

Lemma 3 (Feedback particle filter with correlated innovation). Consider the McKean-Vlasov equation
where the gain K t ∈ R N x ×N y solves the elliptic equation (44). The function Ω t is given by (45) and the innovation process I t by (46). Here, W t and U t denote mutually independent N x -dimensional and N y -dimensional Brownian motions, respectively. Then, π t = Law( X t ) coincides with the solution to (39), provided that the initial distributions agree.
It should be stressed that W t in (43) and (46) denotes the same Brownian motion, resulting in correlations between the innovation process and the model noise.
Proof. In this proof the Einstein summation convention over repeated indices is employed, together with the index form of (44). We begin by writing (43) in its Itô form (48). Here, we have used the covariation between K t and I t . Furthermore, ⟨GW, I⟩ t = −QH T t and ⟨I, I⟩ t = 2Ct. For a smooth compactly supported test function φ, Itô's formula implies (51) with covariation process (52). Our aim is to show that π t [φ] coincides with π t [φ] as defined by the Kushner-Stratonovitch Equation (39). To this end, we insert (48) and (52) into (51) and take the conditional expectation, arriving at (53), recalling that the generator L has been defined in (40). Under the assumption that K t satisfies (44), the two Equations (39) and (53) coincide. Indeed, the dY s -contributions agree. To verify the same for the ds-contributions, we use (44). Finally, collecting terms in (53) and (56), and applying (55) to the remaining ds-contribution, i.e., − π s [∇φ · K s ] π s [h], leads to the desired result.
We note that the correlation between the innovation process I t and the model error W t leads to a correction term Ω t in (43) which cannot be subsumed into a Stratonovitch correction, in contrast to the standard feedback particle filter formulation (17).

Remark 3.
If we set R = 0, H = I, and K t = QH T C −1 = I in (43), then one obtains dX t = dY t , since Ω t vanishes and all other terms in (43) cancel each other out. If, furthermore, Y 0 = X 0 = x 0 a.s., then X t = Y t for all t ∈ [0, T], which in turn justifies our assumption that the gain K t is independent of the state variable. Hence, the McKean-Vlasov formulation (43) reproduces the exact reference trajectory Y t in the case of no measurement errors and perfectly known initial conditions.
We develop a simplified version of the feedback particle filter formulation (43) for linear SDEs and Gaussian distributions in the following subsection, which will form the basis of the generalised ensemble Kalman-Bucy filter put forward in the follow-up Section 4.3.

Generalised Kalman-Bucy Filter
Let us assume that f (x) = Fx with F ∈ R N x ×N x , i.e., Equations (1) and (3) become linear, with initial conditions drawn from a Gaussian distribution. In this case π t stays Gaussian for all t > 0, i.e., π t ∼ N(x t , P t ) with x t ∈ R N x , P t ∈ R N x ×N x . Equation (19) can be solved uniquely by ∇ x Ψ t = P t F T H T , and thus the McKean-Vlasov equations for the feedback particle filter (43) reduce to the linear Equation (60), with the innovation process (46) specialised accordingly in (61). Taking the expectation in (60) and (61) yields an evolution equation for the mean x t . Defining u t := X t − x t and using P t = E[u t u T t ], we obtain, after some calculations, the corresponding evolution equation for the covariance matrix. Hence we have shown that our McKean-Vlasov formulation (60) agrees with the standard Kalman-Bucy filter equations for the mean and the covariance matrix in the correlated noise case [6].

Ensemble Kalman-Bucy Filter
The McKean-Vlasov Equation (60) for linear systems, along with Gaussian prior and posterior distributions, suggests approximating the feedback particle filter formulation (43) for nonlinear systems, where the innovation process I t is given by (46) as before. In other words, we approximate the gain matrix K t in (43) by the state-independent term (P xh t + QH T )C −1 , with the covariance matrix P xh t defined with respect to π t , the law of X t . We can now generalise the ensemble Kalman-Bucy filter formulation (31) for the pure parameter estimation problem to the state estimation problem with correlated noise. We assume that M initial state values X i 0 have been sampled from an initial distribution π 0 or, alternatively, X i 0 = x 0 for all i = 1, . . . , M in case the initial condition is known exactly. These state values are then propagated under the time-stepping procedure with Θ i n ∼ N(0, I), step size ∆t > 0, empirical covariance matrices, and innovation increments ∆I i n . The McKean-Vlasov equations of this section form the basis for the methods proposed for the combined state and parameter estimation problem to be considered next.
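A minimal scalar sketch of this generalised ensemble Kalman-Bucy update follows, using the state-independent gain (P xh + QH T )C −1 and an innovation that reuses the model-noise increments, so that innovation and model errors are correlated as described above. The linear drift and all numerical settings are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
F, Q, H, R, dt, T, M = -1.0, 0.5, 1.0, 0.01, 0.005, 20.0, 200
G = np.sqrt(Q); n = int(T / dt)
C = H * Q * H + R                      # total measurement error covariance

# Reference trajectory and increment observations dY = H dX + R^{1/2} dV.
x_ref = 0.5; Y = np.empty(n); xs = np.empty(n)
for k in range(n):
    dx = F * x_ref * dt + G * np.sqrt(dt) * rng.normal()
    Y[k] = H * dx + np.sqrt(R * dt) * rng.normal()
    x_ref += dx; xs[k] = x_ref

# Generalised EnKBF: gain K = (P^{xh} + Q H) / C; the innovation reuses the
# model-noise increment dW^i (correlated noise) plus an independent dU^i.
X = np.full(M, 0.5)                    # exactly known initial condition
for k in range(n):
    h = H * F * X                      # observation map h(x) = H f(x)
    P_xh = np.mean((X - X.mean()) * (h - h.mean()))
    K = (P_xh + Q * H) / C
    dW = np.sqrt(dt) * rng.normal(size=M)
    dU = np.sqrt(dt) * rng.normal(size=M)
    dI = Y[k] - h * dt - H * G * dW - np.sqrt(R) * dU
    X = X + F * X * dt + G * dW + K * dI
err = abs(X.mean() - xs[-1])
```

Note that for R → 0 and H = I the gain approaches one and the drift and noise contributions cancel against the innovation, so each particle simply tracks the observed increments, in line with Remark 3.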

Combined State and Parameter Estimation
We now return to the combined state and parameter estimation problem, and consider the augmented dynamics (71), in which the static parameters satisfy dA t = 0, with observations (3) as before. The initial conditions satisfy X 0 = x 0 a.s., and A 0 ∼ Π 0 . Let us introduce the extended state space variable Z t = (X T t , A T t ) T . In terms of Z t , the Equations (3) and (71) take the form of a single extended SDE system with noise matrix Ḡ and observation matrix H̄. Thus we end up with an augmented state estimation problem of the general structure already considered in detail in Section 4. Below we provide details on some of the necessary modifications.

Feedback Particle Filter Formulation
The appropriately extended feedback particle filter Equation (43) now governs the augmented variable Z t , where the innovation (46) takes the same form with observation map (4), and the correction Ω t is given by (45) with Q replaced by Q̄ = Ḡ Ḡ T and H by H̄.
In the Poisson equation(s) (19), Π t is replaced by π t denoting the joint density of ( X t , A t ). We also stress that Ψ t becomes a function of x and a, and we distinguish between gradients with respect to x and a using the notation ∇ x and ∇ a , respectively. Numerical implementations of the extended feedback particle filter are demanding due to the need for solving the Poisson equation(s) (19). Instead, we again rely on the ensemble Kalman-Bucy filter approximation, which we describe next.

Ensemble Kalman-Bucy Filter
We approximate the joint density π t of Z t by an ensemble of M equally weighted particles Z i t , i = 1, . . . , M; that is, π t ≈ M −1 ∑ M i=1 δ Z i t , where δ z denotes the Dirac delta function centred at z. The initial ensemble satisfies X i 0 = x 0 for all i = 1, . . . , M, and the initial parameter values A i 0 are independent draws from the prior distribution Π 0 . At the same time, we make the approximation Z t ∼ N(z M t , P zz t ) when dealing with the Kalman gain of the feedback particle filter. Here, the empirical mean z M t = M −1 ∑ M i=1 Z i t and the joint empirical covariance matrix P zz t = (M − 1) −1 ∑ M i=1 ( Z i t − z M t )( Z i t − z M t ) T are computed from the ensemble, and W i t and U i t denote independent N x -dimensional and N y -dimensional Brownian motions, respectively, for i = 1, . . . , M.
The interacting particle Equation (83) can be time-stepped along the lines discussed in Section 4.3 for the pure state estimation formulation of the ensemble Kalman-Bucy filter.
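The augmented formulation can be sketched for the Ornstein-Uhlenbeck model with unknown drift parameter as follows; only the state block of the model noise and of the QH T gain contribution is nonzero, since the parameters are static. All settings are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
Q, a_true, H, R, dt, T, M = 0.5, -0.5, 1.0, 0.0001, 0.005, 200.0, 300
G = np.sqrt(Q); n = int(T / dt); C = H * Q * H + R

# Synthetic increment data from the OU reference model.
x_ref = 0.5; Y = np.empty(n)
for k in range(n):
    dx = a_true * x_ref * dt + G * np.sqrt(dt) * rng.normal()
    Y[k] = H * dx + np.sqrt(R * dt) * rng.normal()
    x_ref += dx

# Augmented ensemble Z^i = (X^i, A^i) with blockwise gains.
X = np.full(M, 0.5)                           # exactly known X_0
A = rng.normal(-0.5, np.sqrt(2.0), size=M)    # parameter prior
for k in range(n):
    h = H * A * X                             # h(z) = H f(x, a) = H a x
    Kx = (np.mean((X - X.mean()) * (h - h.mean())) + Q * H) / C
    Ka = np.mean((A - A.mean()) * (h - h.mean())) / C
    dW = np.sqrt(dt) * rng.normal(size=M)
    dU = np.sqrt(dt) * rng.normal(size=M)
    dI = Y[k] - h * dt - H * G * dW - np.sqrt(R) * dU
    X = X + A * X * dt + G * dW + Kx * dI     # state block
    A = A + Ka * dI                           # parameter block
a_est = A.mean()
```

For small R the state ensemble tracks the observed increments closely while the parameter ensemble contracts around the data-generating value, mirroring the behaviour reported in Section 6.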

Numerical Results
We now apply the generalised ensemble Kalman-Bucy filter formulation (83) with innovation (84) to five different model scenarios.

Parameter Estimation for the Ornstein-Uhlenbeck Process
Our first example is provided by the Ornstein-Uhlenbeck process dX t = a X t dt + Q 1/2 dW t (85) with unknown parameter a ∈ R, and known initial condition X 0 = 1/2. We assume an observation model of the form (3) with H = 1, and measurement error variances R = 0.01, R = 0.0001, and R = 0. The model error variance is set to either Q = 0.5 or Q = 0.005. Except for the case R = 0, a combined state and parameter estimation problem is to be solved. We implement the ensemble Kalman-Bucy filter (83) with innovation (84), step size ∆t = 0.005, and ensemble size M = 1000. The data is generated using the Euler-Maruyama method applied to (85), with a = −1/2, integrated over the time-interval [0, 500] with the same step size. The prior distribution Π 0 for the parameter is Gaussian with mean a = −1/2 and variance σ 2 a = 2. The results can be found in Figure 1. We find that the ensemble Kalman-Bucy filter is able to successfully identify the unknown parameter under all tested experimental settings, except for the largest measurement error case where R = 0.01. There, a small systematic offset of the estimated parameter value can be observed. One can also see that the variance in the parameter estimate monotonically decreases in time in all cases, while the variance in the state estimates approximately reaches a steady state.

Averaging
Consider the equations from [19] for λ, α, γ, ε > 0, and initial condition Y 0 = 1/2, Z 0 = 0. The reduced equations in the limit ε → 0 are given by (85), with a parameter value a determined by λ and α, and initial condition X 0 = 1/2. The reduced dynamics corresponds to a (stable) Ornstein-Uhlenbeck process for λ/α > 1. We wish to estimate the parameter a from observed increments, where the sequence {Y n } n≥0 is obtained by time-stepping (86) using the Euler-Maruyama method with a step size ∆t. We set λ = 3, α = 2 (so that a = −1/2), Q = 0.5, and ε ∈ {0.1, 0.01} in our experiments. The measurement noise is set to R = 0.01 or R = 0 (pure parameter estimation). We implement the ensemble Kalman-Bucy filter (83) with innovation (84), step size ∆t = ε/50, and ensemble size M = 1000 for the reduced Equation (87). The data is generated from an Euler-Maruyama discretisation of (86) with the same step size. We also investigate the effect of subsampling the observations for ε = 0.01 by solving (86) with step size ∆t = ε/50 and storing only every tenth solution Y n , while the reduced equations and the ensemble Kalman-Bucy filter equations are integrated with ∆t = ε/5. The results are shown in Figure 2. Figure 3 shows the results for the same experiments repeated with a smaller ensemble size of M = 10. We find that the smaller ensemble size leads to more noisy estimates for the variance in X n and a faster decay of the variance in A n , but the estimated parameter values are equally well converged. Subsampling does not lead to significant changes in the estimated parameter values. This is in contrast to the example considered next. We finally mention [31] for alternative approaches to sequential estimation in the context of averaging using, however, different assumptions on the data.
Homogenisation
According to homogenisation theory, the reduced model is given by (85) with Q = σ, and we wish to estimate the parameter a from the data {∆Y n } produced according to (88). It is known that the standard maximum likelihood estimator (MLE), a ML = ∑ n Y n ∆Y n / (∆τ ∑ n Y n 2 ), leads to a ML = 0 in the limit ∆τ → 0 and the observation interval T → ∞. This MLE corresponds to H = I and R = 0 in our extended state space formulation of the problem. Subsampling can be achieved by choosing an appropriate time-step ∆t > ∆τ in the ensemble Kalman-Bucy filter equations and a corresponding subsampling of the data points Y n in (88). We used ∆t = 50∆τ = 0.01 and ∆t = 500∆τ = 0.1, respectively. The results can be found in Figure 4. It can be seen that only the larger subsampling leads to a correct estimate of the parameter a. This is in line with known results for the maximum likelihood estimator (90). See [32] and references therein.
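The maximum likelihood estimator (90) and the subsampling stride can be expressed compactly as follows. As a sanity check, the sketch applies the estimator to data generated directly from the reduced model (85), where any stride recovers a; reproducing the failure for small strides would require the multiscale data, which is omitted here. The helper name and all settings are illustrative.

```python
import numpy as np

def mle_drift(y, dtau, stride=1):
    """MLE a_ML = sum Y_n dY_n / (sum Y_n^2 * step), evaluated on the
    subsampled series y[::stride] with effective step stride * dtau."""
    ys = y[::stride]
    dy = np.diff(ys)
    return np.sum(ys[:-1] * dy) / (np.sum(ys[:-1] ** 2) * stride * dtau)

# Data generated directly from the reduced model (85) with a = -1/2.
rng = np.random.default_rng(5)
a, Q, dtau, n = -0.5, 0.5, 0.001, 400_000
y = np.empty(n + 1); y[0] = 0.5
for k in range(n):
    y[k + 1] = y[k] + a * y[k] * dtau + np.sqrt(Q * dtau) * rng.normal()
a_hat = mle_drift(y, dtau, stride=10)
```

With multiscale data in place of the reduced-model trajectory, the stride becomes the decisive tuning parameter, as discussed above.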

Nonparametric Drift and State Estimation
We consider nonparametric drift estimation for one-dimensional SDEs over a periodic domain [0, 2π) in the setting considered from a theoretical perspective in [33]. There, a zero-mean Gaussian process prior GP (0, D −1 ) is placed on the unknown drift function, with inverse covariance operator D. The integer parameter p sets the regularity of the process, whereas η, κ ∈ R + control its characteristic correlation length and stationary variance. Spatial discretisation of the problem is carried out by first defining a grid of N d evenly spaced points on the domain, at locations x i = i∆x, ∆x = 2π/N d . The drift function is projected onto compactly supported functions centred at these points, which are piecewise linear and satisfy b i (x j ) = δ ij , and linear interpolation is used to define a drift function f (x, a) for all x ∈ [0, 2π); that is, it is of the form (2) with f 0 (x) ≡ 0. In this example, we set N d = 200. Sample realisations, as well as the reference drift f * , can be found in Figure 5a. Data is generated by integrating the SDE (1) with drift f * forward in time from initial condition X 0 = π and with noise level Q = 0.1, using the Euler-Maruyama discretisation with step size ∆t = 0.1 over one million time-steps. The spatial distribution of the solutions X n is plotted in Figure 5b. The data is then given by the observation model (3) with R = 0.00001. Data assimilation is performed using the time-discretised ensemble Kalman-Bucy filter Equation (83) with innovation (84), ensemble size M = 200, and step size ∆t = 0.1. The final estimate of the drift function (ensemble mean) and the ensemble of drift functions can be found in Figure 5c. Figure 5d displays the ensemble of state estimates and the value of the reference solution at the final time. We find that the ensemble Kalman-Bucy filter is able to successfully estimate the drift function and the model states. Further experiments reveal that the drift function can only be identified for sufficiently small measurement errors.
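The piecewise-linear basis and the resulting drift parameterisation (2) with f 0 ≡ 0 can be sketched as follows; the helper name basis_matrix is ours, and the periodic hat functions form a partition of unity by construction.

```python
import numpy as np

def basis_matrix(x, Nd=200, L=2 * np.pi):
    """Evaluate Nd periodic piecewise-linear hat functions b_i, centred at
    grid points x_i = i * dx, returning the matrix B(x) with entries b_i(x)."""
    dx = L / Nd
    x = np.atleast_1d(x) % L
    B = np.zeros((x.size, Nd))
    j = (x // dx).astype(int)              # index of the left grid point
    w = (x - j * dx) / dx                  # linear interpolation weight
    B[np.arange(x.size), j % Nd] = 1.0 - w
    B[np.arange(x.size), (j + 1) % Nd] = w
    return B

# Drift of the form (2) with f_0 = 0: f(x, a) = B(x) @ a.
a = np.sin(np.linspace(0, 2 * np.pi, 200, endpoint=False))  # example coefficients
f_val = basis_matrix(np.array([1.0])) @ a
```

With this construction, f(x_j, a) = a_j at every grid point, so the coefficient vector a estimated by the filter can be read off directly as grid values of the drift.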

SPDE Parameter Estimation
Consider the stochastic heat equation on the periodic domain x ∈ [0, 2π), given in conservative form by the stochastic partial differential equation (SPDE), where W(x, t) is space-time white noise. With constant θ(x) = θ, this SPDE reduces to the standard stochastic heat equation with diffusivity θ. In this example, we examine the estimation of θ from incremental measurements of a locally averaged quantity q(x, t) that arises naturally in a standard finite volume discretisation of (95).
To discretise the system, one first defines q_t^i ≈ q(x_i, t) around N_d = 200 grid points x_i on a regular grid with spacing Δx. The conservative (drift) term in (94) then reduces to the standard finite difference approximation (θ_{i+1/2}(q^{i+1} - q^i) - θ_{i-1/2}(q^i - q^{i-1}))/Δx², where θ_{i±1/2} ≡ θ(x_i ± Δx/2). This yields a system of SDEs of the form (1), in which the W_t^i are independent one-dimensional Brownian motions in time. Following recent results from [20], we consider the estimation of a constant parameter a = θ from measurements dq_t^* at a fixed location with index j* ∈ {1, . . . , N_d}. The data trajectory is thus given by (100), where R^{1/2} is a scalar and V_t is a standard one-dimensional Brownian motion. We perform numerical experiments in which the initial state q_0^i is set to zero for all indices i, and the prior on the unknown parameter a = θ is uniform over the interval [0.2, 1.8].
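For concreteness, the conservative drift term of the discretised system can be sketched as follows (our own illustrative Python, not the paper's code; theta_half holds the interface values θ_{i+1/2}, and np.roll implements the periodic boundary):

```python
import numpy as np

def heat_drift(q, theta_half, dx):
    """Conservative finite-volume drift for d/dx(theta(x) dq/dx) on a
    periodic grid.

    The i-th component equals
      (theta_{i+1/2} (q_{i+1} - q_i) - theta_{i-1/2} (q_i - q_{i-1})) / dx^2,
    where theta_half[i] approximates theta at the interface x_i + dx/2.
    """
    flux = theta_half * (np.roll(q, -1) - q) / dx   # flux through x_{i+1/2}
    return (flux - np.roll(flux, 1)) / dx
```

Because the scheme is in flux form, the drift sums to zero over the grid, so the discretisation conserves the spatial integral of q up to the noise contribution.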
The increment data is generated by first integrating (95) forward in time from the known initial condition q^i(0) = 0 for all i. The equation is discretised in time using the Euler-Maruyama method. It is known that Δt < Δx²/(2θ) is required for stability of the Euler-Maruyama discretisation; we use the much smaller time step Δt = Δx²/80. The solution is sampled with this same time step, and increment measurements at time t_n are obtained by setting the measurement noise level R to zero in (100). Note that the associated model error in (1) is given by G = σ^{1/2}Δx^{1/2} I, and the matrix H in (3) projects the vector of state increments onto the single component with index j* = N_d/2. Simulations are performed over the time interval [0, 20]. The results can be found in Figure 6a. We also compute the model evidence for a sequence of parameter values θ ∈ {0.2, 0.3, . . . , 1.8} using a standard Kalman-Bucy filter [6] for the associated linear state estimation problem; see Figure 6b. Both approaches agree with the reference value θ = 1.
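A minimal sketch of this data generation step (our own illustrative Python; it uses the per-cell noise amplitude σ^{1/2}Δx^{1/2} quoted above, and a short time horizon in place of the interval [0, 20] used in the experiments):

```python
import numpy as np

def simulate_increments(theta=1.0, sigma=1.0, Nd=200, T=0.01, j_star=100, seed=0):
    """Euler-Maruyama integration of the spatially discretised stochastic
    heat equation with constant theta,

        dq_i = theta * (q_{i+1} - 2 q_i + q_{i-1}) / dx^2 * dt + g dW_i,

    returning the noise-free increment data dq at grid index j_star
    (R = 0 in (100)). The step dt = dx^2 / 80 lies well below the
    stability bound dx^2 / (2 theta) for theta near 1.
    """
    rng = np.random.default_rng(seed)
    dx = 2 * np.pi / Nd
    dt = dx ** 2 / 80
    n_steps = int(T / dt)
    g = np.sqrt(sigma * dx)              # model error amplitude G
    q = np.zeros(Nd)                     # known initial condition q_i(0) = 0
    dys = np.empty(n_steps)
    for n in range(n_steps):
        lap = (np.roll(q, -1) - 2 * q + np.roll(q, 1)) / dx ** 2
        dq = theta * lap * dt + g * rng.normal(scale=np.sqrt(dt), size=Nd)
        dys[n] = dq[j_star]              # increment measurement, R = 0
        q += dq
    return dys
```

The returned increments dys play the role of the data dy_{t_n} assimilated by the filter.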

Discussion
The results presented here demonstrate that the proposed methodology can be applied to a broad range of continuous-time state and parameter estimation problems with correlated measurement and model errors. Alternatively, one could have employed standard SMC or MCMC methods utilising the modified observation model (41) as implied by the Kushner-Stratonovich formulation (39) of the filtering problem. However, such implementations require the approximation of the additional Q∇ log π_t term, which is nontrivial if only samples from π_t are available. Furthermore, the limiting behaviour of such implementations as R → 0 with H = I (the pure parameter estimation problem) is unclear, since π_t degenerates into a Dirac delta distribution, potentially leading to numerical difficulties in this singular regime. The proposed generalised feedback particle filter formulation avoids these issues through the use of stochastic innovations which are correlated with the model noise. In particular, the distribution π_t does not appear explicitly in the innovation process (46), and the correlated noise terms cancel each other out, as discussed in Remark 3 for R = 0 and H = I.

The main computational challenge of the feedback particle filter approach lies in computing the Kalman gain matrix (57). However, the constant-gain ensemble Kalman-Bucy approximation is easy to implement. In fact, the only differences from the standard ensemble Kalman-Bucy filter formulation of [14] are the additional QH^T term in the Kalman gain and the correlation between the stochastic innovation process and the model error. While the ensemble Kalman-Bucy filter gave rather satisfactory results for the numerical experiments displayed in Section 6, strongly non-Gaussian distributions might require more accurate approximations to the Kalman gain matrix (57). In that case, one could rely on the particle-based diffusion map approximation considered in [27].
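To make the constant-gain approximation concrete, here is a hypothetical single-step implementation in Python. It is our own sketch, not the paper's Equations (83)-(84): the gain carries the additional QH^T term, each ensemble member's innovation reuses its own model noise, and the innovation covariance C = HQH^T + R is our assumption.

```python
import numpy as np

def enkbf_step(X, dy, f, Q_sqrt, H, R_sqrt, dt, rng):
    """One Euler step of a constant-gain ensemble Kalman-Bucy filter for
    increment observations dy = H dX + R^{1/2} dV (illustrative sketch).

    X : (M, d) ensemble of states; dy : observed increment over the step.
    The gain includes the extra Q H^T term, and the innovation of each
    member reuses its own model noise, realising the correlation between
    measurement and model errors.
    """
    M, d = X.shape
    Q = Q_sqrt @ Q_sqrt.T
    C = H @ Q @ H.T + R_sqrt @ R_sqrt.T        # assumed innovation covariance
    A = X - X.mean(axis=0)                     # ensemble anomalies
    P = A.T @ A / (M - 1)                      # empirical covariance
    K = (P + Q) @ H.T @ np.linalg.inv(C)       # gain = (P H^T + Q H^T) C^{-1}
    drift = np.array([f(x) for x in X])
    dW = rng.normal(size=(M, Q_sqrt.shape[1])) * np.sqrt(dt)
    dV = rng.normal(size=(M, R_sqrt.shape[1])) * np.sqrt(dt)
    dX_model = drift * dt + dW @ Q_sqrt.T      # model step incl. noise
    innovation = dy - dX_model @ H.T - dV @ R_sqrt.T
    return X + dX_model + innovation @ K.T
```

Setting R_sqrt to zero and H to the identity recovers the cancellation of the correlated noise terms discussed above for the pure parameter estimation regime.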

Conclusions
In this paper, we have derived McKean-Vlasov equations for combined state and parameter estimation from continuously observed state increments. An approximate and robust implementation of these McKean-Vlasov equations, in the form of a generalised ensemble Kalman-Bucy filter, has been provided and applied to a range of increasingly complex model systems. Future work will address the treatment of temporally correlated measurement and model errors, as well as a rigorous analysis of these McKean-Vlasov equations in the contexts of multi-scale dynamics and nonparametric drift estimation.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix

Consider the process l_t satisfying the SDE dl_t = l_t dM_t. For an arbitrary smooth, compactly supported test function φ, Itô's formula implies (A7), where X_s satisfies (38a). For the covariation process ⟨l, X⟩_t we obtain d⟨l, X⟩_t = l_t d⟨M, X⟩_t = l_t f(X_t)^T H^T C^{-1} HQ dt, using d⟨Y, X⟩_t = HQ dt. Furthermore, d⟨X, X⟩_t = Q dt, which follows from the definition of the stochastic contributions in (38a). We now apply the conditional expectation to (A7), and the result follows from (A6).