Continuous-Discrete Path Integral Filtering

A summary of the relationship between the Langevin equation, Fokker-Planck-Kolmogorov forward equation (FPKfe) and the Feynman path integral descriptions of stochastic processes relevant for the solution of the continuous-discrete filtering problem is provided in this paper. The practical utility of the path integral formula is demonstrated via some nontrivial examples. Specifically, it is shown that the simplest approximation of the path integral formula for the fundamental solution of the FPKfe can be applied to solve nonlinear continuous-discrete filtering problems quite accurately. The Dirac-Feynman path integral filtering algorithm is quite simple, and is suitable for real-time implementation.


Introduction
The following continuous-discrete filtering problem often arises in practice. The time evolution of the state, or signal of interest, is well-described by a continuous-time stochastic process. However, the state process is not directly observable, i.e., the state process is a hidden continuous-time Markov process. Instead, what is measured is a related discretetime stochastic process termed the measurement process. The continuous-discrete filtering problem is to estimate the state of the system, given the measurements [1].
When the state and measurement processes are linear, excellent performance is often obtained using the Kalman filter [2,3]. However, the Kalman filter merely propagates the conditional mean and covariance, so is not a universally optimal filter and is inadequate for problems with non-Gaussian characteristics (e.g., multi-modal). When the state and/or measurement processes are nonlinear, a (non-unique) linearization of the problem leads to an extended Kalman filter.If the nonlinearity is benign, it is still very effective. However, for the general case, it cannot provide a robust solution.
The complete solution of the filtering problem is the conditional probability density function of the state given the observations. It is complete in the Bayesian sense, i.e., it contains all the probabilistic information about the state process that is in the measurements and the initial condition. The solution is termed universal if the initial distribution can be arbitrary. From the conditional probability density, one can compute quantities optimal under various criteria. For instance, the conditional mean is the least mean-squares estimate.
The solution of the continuous-discrete filtering problem requires the solution of a linear, parabolic, partial differential equation (PDE) termed the Fokker-Planck-Kolmogorov forward equation (FPKfe). There are three main techniques to solve the FPKfe type of equations, namely, finite difference methods [4,5], spectral methods [6], and finite/spectral element methods [7]. However, numerical solution of PDEs is not straightforward. For example, the error in a naïve discretization may not vanish as the grid size is reduced, i.e., it may not be convergent. Another possibility is that the method may not be consistent, i.e., it may tend to a different PDE in the limit that the discretization spacing vanishes. Furthermore, the numerical method may be unstable, or there may be severe time step size restrictions.
The fundamental solution of the FPKfe can be represented in terms of a Feynman path integral [8]. The path integral formula can be derived directly from the Langevin equation. A textbook discussion for derivation of the path integral representation of the fundamental solution of the FPKfe corresponding to the Langevin equations for additive and multiplicative noise cases can be found in [9] and [10]. These results have been simplified and generalized for the general case in [11] and [12]. In this paper, it is demonstrated that the simplest approximate path integral formulae lead to very accurate solution of the nonlinear continuousdiscrete filtering problem.
In Section 2, the basic concepts of continuous-discrete filtering theory is reviewed. In Section 3, the path integral formulae for the case of additive and multiplicative noise cases are summarized. In Section 4, an elementary solution of the continuous-discrete filtering problem is presented that is based on the path integral formulae. Some examples illustrating the path integral filtering is presented in the following section. Some remarks on practical implementational aspects of path integral filtering is presented in Section 6. The appendix summarizes the path integral results derived in [11] and [12].

Langevin Equation and the FPKfe
The general continuous-time state model is described by the following stochastic differential equation (SDE): Here x(t) and f (x(t), t) are n−dimensional column vectors, the diffusion vielbein e(x(t), t) is an n × p e matrix and v(t) is a pedimensional column vector. The noise process v(t) is assumed to be Brownian with covariance Q(t) and the quantity g ≡ eQe T is termed the diffusion matrix. All functions are assumed to be sufficiently smooth. Equation 2.1 is also referred to as the Langevin equation. It is interpreted in the Itô sense (see Appendix A). Throughout the paper, bold symbols refer to the stochastic processes while the corresponding plain symbol refers to a sample of the process. Let σ 0 (x) be the initial probability distribution of the state process. Then, the evolution of the probability distribution of the state process described the Langevin equation, p(t, x), is described by the FPKfe, i.e., (2.2)

Fundamental Solution of the FPKfe
The solution of the FPKfe can be written as an integral equation. To see this, first note that the complete information is in the transition probability density which also satisfies the FPKfe except with a δ−function initial condition. Specifically, let t ′′ > t ′ , and consider the following PDE: Such a solution, i.e., P (t, x|t ′ , x ′ ), is also known as the fundamental solution of the FPKfe.
From the fundamental solution one can compute the probability at a later time for an arbitrary initial condition as follows: In this paper, all integrals are assumed to be from −∞ to +∞, unless otherwise specified. Therefore, in order to solve the FPKfe it is sufficient to solve for the transition probability density P (t, x|t ′ , x ′ ). Note that this solution is universal in the sense that the initial distribution can be arbitrary.

Continuous-Discrete Filtering
In this paper, it is assumed that the measurement model is described by the following discretetime stochastic process where y(t) ∈ R m×1 , h ∈ R m×1 , and the noise process w(t) is assumed to be a white noise process. Note that y(t 0 ) = 0. It is assumed that p(y(t k )|x(t k )) is known. Then, the universal continuous-discrete filtering problem can be solved as follows. Let the initial distribution be σ 0 (x) and let the measurements be collected at time instants t 1 , t 2 , . . . , t k , . . .. Let p(t k−1 |Y (t k−1 )) be the conditional probability density at time t k−1 , where Y (τ ) = {y(t l ) : t 0 < t l ≤ τ }. Then the conditional probability density at time t k , after incorporating the measurement y(t k ), is obtained via the prediction and correction steps: Often (as in this paper), the measurement model is described by an additive Gaussian noise model, i.e., with w(t) ∼ N (0, R(t)), i.e., p(y(t k )|x) = 1 Observe that, as in the PDE formulations, one may use a convenient set of basis functions. Then, the evolution of each of the basis functions under the FPKfe follows from Equation 2.4. Since the basis functions are independent of measurements, the computation may be performed off-line. Finally, note that this solution of the filtering problem is universal. In conclusion, the determination of the fundamental solution of the FPKfe is equivalent to the solution of the universal optimal nonlinear filtering problem. A solution for the time independent case with orthogonal diffusion matrix in terms of ordinary integrals was presented in [13]. However, the integrand is complicated and not easily implementable in practice. In the next section,the fundamental solution for the general case in terms of path integrals is summarized. It is shown that it leads to formulae that are simple to implement.

Path Integral Formulas
In this section, path integral formulae derived in [11] and [12] are summarized. It is assumed that t ′′ > t ′ . Details on the formulae are summarized in Appendix A.

Additive Noise
When the diffusion vielbein is independent of the state, i.e., where all quantities are as defined in Section II-C, the noise is said to be additive. The path integral formula for the transition probability density is given by where the Lagrangian L (r) (t, x,ẋ) is defined as and e ia (t)Q ab (t)e bj (t), (3.4) and Here, r ∈ [0, 1] specifies the discretization of the SDE; see Appendix A for details. The quantity S(t ′′ , x ′ ) = t ′′ t ′ L (r) (t, x,ẋ)dt is referred to as the action.

Multiplicative Noise
The state model for the general case is given by As discussed in more detail in Appendix A, there is ambiguity in the definition of this SDE which is due to the fact that dv(t) ≈ O( √ dt). The path integral formula for the general discretization is complicated and summarized in Appendix A. In the simplest Itô case, it reduces to where the Lagrangian L (r,0) (t, x,ẋ) is defined as and A nice feature of the Itô interpretation is that the formula is the same as that for the simpler additive noise case (with some obvious changes). Note that it is always possible to convert from a SDE defined in any sense (say, Stratanovich or s = 0) to the corresponding Ito SDE. Therefore, this can be considered to be the result for the general case.

Dirac-Feynman Path Integral Filtering
The path integral is formally defined as the N → ∞ limit of a N multi-dimensional integrals and yields the correct answer for arbitrary time step size. In this section, an algorithm for continuous-discrete filtering using the simplest approximation to the path integral formula, termed the Dirac-Feynman approximation, is derived.

Dirac-Feynman Approximation
Consider first the additive noise case. When the time step ǫ ≡ t ′′ − t ′ is infinitesimal, the path integral is given by where the Lagrangian is This leads to a natural approximation for the path integral for small time steps: A special case is the one-step pre-point approximate formula The one-step symmetric approximate path integral formula for the transition probability amplitude (as originally used by Feynman in quantum mechanics [8]) is . Note that for the explicit time-dependent case the time has also been symmetrized in the hope that it will give a more accurate result. Of course, for small time steps and if the time dependence is benign, the error in using this or the end points is small.
Similarly, for the multiplicative noise case in the Itô interpretation/discretization of the state SDE the following approximate formula results: For the multiplicative noise case, the simplest one-step approximation is the pre-point discretization where r = s = 0: Since s = 0, this means that we are using the Itô interpretation of the state model Langevin equation. When r = 1/2, it is termed the Feynman convention, while s = 1/2 corresponds to the Stratanovich interpretation.

The Dirac-Feynman Algorithm
The one-step formulae discussed in the previous section lead to the simplest path integral filtering algorithm, termed the Dirac-Feynman (DF) algorithm. The steps for DF algorithm may be summarized as follows: 1. From the state model, obtain the expression for the Lagrangian. Specifically, • For the additive noise case, the Lagrangian is given by Equation 3.3; • For the multiplicative noise case with Itô discretization the Lagrangian is given by Equation 3.8, while for the general discretization the action is given in Appendix A.
3. Compute the transition probability density P (t ′′ , x ′′ |t ′ , x ′ ) using the appropriate formula (e.g., Equations 4.5 or 4.8). The grid spacing should be such that the transition probability tensor is adequately sampled, as discussed below.

At time t k
(a) The prediction step is accomplished by The measurement at time t k are incorporated in the correction step via (4.10)

Practical Computational Strategies
The above general filtering algorithm based on the Dirac-Feynman approximation of the path integral formula computes the conditional probability density at grid points. This can be computationally very expensive as the number of grid points can be very large, especially for larger dimensions. Here, a few approximations will be presented that drastically reduces the computational load.
The most crucial property that is exploited is that the transition probability density is an exponential function. Consequently, many elements of the transitional probabilty tensor are negligible small, the precise number depending on the grid spacing. A significant computational saving is obtained when the (user-defined) "exponentially small" quantities are simply set to zero. In that case, the transition probability density is approximated by a sparse tensor, which results in huge savings in memory and computational load.
The next key issue is that of grid spacing. An appropriate grid spacing is one that adequately samples the conditional probability density. Of course, the conditional probability density is not known, but its effective domain (i.e., where it is significant) is clearly a function of the signal and measurement model drifts and noises. For instance, the grid spacing should be of the order of change in state expected in a time step, which is not always easy to determine for a generic model. However, if the measurement noise is small, finer grid spacing is required so as to capture the state information in precise measurements. However, if the measurement noise is large, it may be unnecessary to use a fine grid spacing even if the state model noise is very small since the measurements are not that informative. Alternatively, If the grid spacing is too large compared to the signal model noise vielbein term, replace the difusion matrix with an "effective diffusion matrix" that is taken to be a constant times the grid spacing, i.e., noise inflation. This additional approximation can still lead to useful results as shown in an example below.
It is also noted that the grid spacing is a function of the time steps. This is analogous to the case of PDE solution via discretization. Thus, when using the one-step DF approximation, there will not be a gain by reducing the grid spacing to smaller values (and at the cost of drastically increasing processing time). It is then more appropriate to use multi-step approximations to get more accurate results.
Here are some possibilities for practical implemenetation: 1. Precompute the transition probability tensor wth pre-determined and fixed grid. There are two options: (a) Compute the correction at all the grid points; (b) Compute the correction only where the prediction result is significant.
It is the second of those options that will be used in this paper.
2. Another option is to use a focussed adaptive grid, much as in PDE approaches. Specifically, at each time step: (a) Find where the prediction step result is significant; (b) Find the domain in the state space where the conditional probability density is significant, and possibly interpolate. For the multi-modal case, there would be several disjoint regions; (c) Compute the transition probability tensor with those points as initial points and propagate to points in region suggested by state model.
Thus, the grid is moving. In this case, the grid can be finer than in the previous case, although then the computational advantage of pre-computation is lost.
3. Precompute the solution using basis functions. For instance, in many applications wavelets have been known to represent a wide variety of functions arising in practice. Then, instead of using the transition probability tensor, FPKfe solutions with wavelet basis functions are stored.

Examples
In this section, a couple of two-dimensional examples are presented that illustrate the utility of the path integral formulae presented in this paper. The signal and measurement models are both nonlinear in these examples. Therefore, the Kalman filter is not a reliable solution for these problems. The symmetric discretization formula was used. The MATLAB tensor toolbox developed by Bader and Kolda was used for the computations [14]. The approximation techniques are discussed in Section 4.3. In addition, in order to speed up the pre-computation of the transition probability tensor, it was assumed that P (f 1 , i.e., the "extent" of P was chosen to be 2. Thus, this implementation of the DF algorithm is sub-optimal in many ways. For comparison, the performance of the SIR particle filter based on the Euler discretization of the state model SDE is also included [15]. The MATLAB toolbox PFlib was used in the particle filter simulations [16].  Consider the state model with the nonlinear measurement model Here σ x 1 σ x 2 = 0.001 0.03 and we consider two values for σ y , namely σ y = 0.2 and σ y = 2. This example was studied in [17] and the extended Kalman is known to fail for this model. The Lagrangian for this model is easily seen to be given by  Figure 3: Conditional mean x 1 (t) computed for the measurement sample of Figure 1.
Consider first the σ y = 0.2 case. Figure 1 shows a sample trajectory with initial state [0.37, 0.31], and in Figure 2 is shown a measurement sample. The time step size is 0.01 and the number of time steps is 200.
The spatial interval [−0.8, 0.8] × [−0.8, 0.8] is subdivided into 42 × 42 equal intervals. The signal model noise is very small requiring much finer grid spacing. Instead, as discussed in Section 4.3, the effective σ's were taken to be α × ∆x 1 ∆x 2 with α = 1. The initial distribution is taken to be uniform.
The conditional mean for x 1 (t) and x 2 (t) are plotted in Figures 3 and 4. The RMS error was found to be 0.1180 and the time taken was 40 seconds. Observe that the conditional mean is within a standard deviation of the true state almost all of the time confirming that the tracking quality is good. It is noted that the variance is larger for x 1 (t). The reason can be understood from Figure 5 which plots the marginal conditional probability density of the state variable x 1 (t)-it is bi-modal. The bi-modal nature (for a significant fraction of the time) is the reason the EKF will fail in this instance. The performance is seen to be similar to that reported in [17] which was obtained using considerably more involved techniques and finer grid spacing.
This example was also investigated using the SIR-PF. The SIR-PF implemented with 5000 particles took about 155 seconds and the RMS error was found to be 0.165 even when Conditional Mean with 2σ bounds for the sample path (measurement noise σ y =0.2): Nx1=42Nx2=42. Figure 4: Conditional mean x 2 (t) computed for the measurement sample of Figure 1.
initiated about the true state, i.e., initial distribution was chosen to be Gaussian with mean [0.37, 0.31] and variance I 2 , where I 2 is the 2 × 2 identity matrix. When the variance was reduced to 10 −2 × I, it resulted in RMS error of only 0.022. Thus, the performance of the SIR-PF depends crucially on the initial condition. It is also noted that no bi-modality of the marginal pdf of state x 1 (t) at T = 1 was observed for the SIR-PF simulations when the number of particles was 5000. Upon increasing the number of particles to 10, 000, the bi-modality was noted, although the RMSE was not significantly smaller. Next consider the larger measurement noise case. A measurement sample corresponding to the state trajectory in Figure 1 is shown in Figure 6. The spatial interval [−1.6, 1.6]×[−1, 1] was subdivided into 62 × 62 equispaced grid points. The RMS error was found to be 0.128 and the time taken was about 110 seconds. The bimodality at T = 1 is evident in this case as well (see Figure 7).
The SIR-PF was also implemented. When initialized as Gaussian with zero mean and unit variance, the tracking performance of the SIR-PF failed; the RMSE was found to be 25.34 when using 5000 particles (taking about 110 seconds). A sample performance is shown in Figures 8 and 9; it is clear that the state x 1 is poorly tracked. Even when using 50,000 particles, a sample run that took 25 minutes, resulted in RMS error of 16.53.

Consider the state model
dx 2 (t) = (x 1 (t) + sin(x 2 (t)))dt + dv 2 (t), and the measurement model Here dv i (t) are uncorrelated standard Wiener processes, and w i (t k ) ∼ N (0, 1). A sample of the state and measurement processes are shown in Figures 10 and 11 respectively. The discretization time step is 0.01.
The initial distribution is taken to be a Gaussian with zero mean and variance of 10. Figures 12 and 13 plots the conditional mean for the state. It is seen that the tracking is quite good despite the error at the start; the RMS error was found to be 0.54. The interval [−6, 6] was uniformly divided into 62 grid points and the extent of the transition probability tensor was 2. The 2000 time steps took about 8 minutes.
The SIR particle filter was also implemented with the same initial condition and with 5000 particles. The RMS error was found to be 1.48. Each run took about 10 minutes. Conditional Mean with 2σ bounds for the sample path (measurement noise σ y =1): Nx1=62Nx2=62. Figure 12: Conditional mean for state x 1 (t) computed for the measurement sample of Figure 11 and with initial distribution N (0, 10.
Next, consider time step of 0.2, i.e., only every twentieth measurement sample is assumed given. Figures 15 and 16 show the conditional means of the states. The number of grid points is smaller; the grid spacing is chosen to be twice the previous instance. Consequently, the computational effort is less, requiring only about 14 seconds. It is noted that the tracking performance is very good and the error estimated form the conditional probability density using this approximation is reliable. Now the RMS error is found to be 0.69, and only 0.31 if the first few errors are ignored.
In contrast, the error using the SIR-PF (with 2000 particles that took 37 seconds) is found to fail with RMS error of 3.68; see Figures 17 and 18 for typical results. The results for increasing the number of particles to 50, 000 (14 minutes execution time) did not improve the situation significantly (RMS error of 3.34); it would be better to subdivide the time step into several time steps and do the SIR-PF. For the purposes of this paper, it is sufficient to note that in this instance a single-step DF algorithm succeeds where the one-step SIR-PF fails. Conditional Mean with 2σ bounds for the sample path (measurement noise σ y =1): Nx1=62Nx2=62. Figure 13: Conditional mean for state x 2 (t) computed for the measurement sample of Figure 11 and with initial distribution N (0, 10. Conditional Mean with 2σ bounds for the sample path (measurement noise σ y =1): Nx1=32Nx2=32. Figure 15: Conditional mean for state x 1 (t) when measurement sample given by Figure 14. Conditional Mean with 2σ bounds for the sample path (measurement noise σ y =1): Nx1=32Nx2=32. Figure 16: Conditional mean for state x 2 (t) when measurements are every 0.2 seconds (Figure 14).

Additional Remarks
It is remarkable to note that the simplest approximations to the path integral formulae leads to very accurate results. Note that the time steps are small, but not infinitesimal. Such time step sizes are not unrealistic in real world applications.
It is particularly noteworthy since it was found that SIR-PF was not a reliable solution to the studied problems. Note that the rigorous results for MC type of techniques assume that the signal model drifts are bounded. If that is not the case, as here, the SIR-PF is not guarenteed to work well. In any case, the speed of convergence to the correct solution is not specified for a general filtering problem, as emphasized in [18]; PFs need to be "tuned" to the problem to get desired level of performance. In fact, for discrete-time filtering problems, excellent performance also follows from a well-chosen grid using sparse tensor techniques [19]. Clearly, it is not axiomatic that a generic particle filter will lead to significant computation savings (or performance) over a well-chosen sparse grid method, at least for smaller dimensional problems.
Observe also that the DF path integral filtering formulae have a simple and clear physical interpretation. Specifically, when the signal model noise is small the transition probability is significant only near trajectories satisfying the noiseless equation. The noise variance quantifies the extent to which the state may deviate from the noiseless trajectory.
The following additional observations can be made on the simplest path integral filtering method proposed in this paper: 1. In universal nonlinear filtering, including path integral filtering discussed here, the standard deviation computed from the (accurately computed) conditional probability density is a reliable measure of the filter performance. This is not the case for suboptimal methods like the EKF.

2.
In the examples studied, only the simplest one-step approximate formulae for the path integral expression were applied. There is a large body of work on more accurate onestep formulae that could be used to get better results if the formulae used in this paper are not accurate enough (see, for instance, [10]) .
3. Observe that higher accuracy (than the DF approximation) is attained by approximating the path integral with a finite-dimensional integral. The most efficient technique for evaluating such integrals would be to use Monte Carlo or quasi Monte Carlo methods. Another possibility is to use Monte Carlo based techniques for computing path integrals [20]. Observe that this is different from particle filtering.
4. The major source of computational savings following from noting that the transition probability is given in terms of an exponential function. This implies that P (t ′′ , x ′′ |t ′ , x ′ ) is non-negligible only in a very small region, or the transition probability density tensor is sparse. The sparsity property is crucial for storage and computation speed.

5.
In the example studied in Section 5.1 the grid spacing was larger than the noise. Since the grid must be able to sample the probability density, the effective noise vielbein was taken to be a constant (1 in our example) times the grid spacing, i.e., the signal model noise term is "inflated". Of course, this means that the result is not as accurate as the solution that uses the smaller values for the noise. However, it may still lead to acceptable results (as in the first example) at significantly lower computational effort.
6. Observe that even with the coarse sampling the computed conditional probability density is "smooth". It seems apparent that a finer spatial grid spacing (with the same temporal grid spacing)will yield essentially the same result (using the DF approximation) at significantly higher computational cost. This was observed in the two examples studied in this paper. Of course, a multiple time step approximation would be more accurate.
7. Also note that the conditional mean estimation is quite good, i.e., of the order of the grid spacing, even for the coarser resolutions. This confirms the view that the conditional probability density calculated at grid points approximates very well the true value at those grid points (provided the computations are accurate). Alternatively, an interpolated version of the fundamental solution at coarser grid is close to the actual value. This suggests that a practical way of verifying the validity of the approximation is to note if the variation in the statistics with grid spacing, such as the conditional mean, is minimal.
8. It is also noted that the PDE-based methods are considerably more complicated for general two-or higher-dimensional problems. Specifically, the non-diagonal diffusion matrix case is no harder to tackle using path integral methods than the diagonal case. This is in sharp contrast to the PDE approach which for higher-dimensional problems is typically based on operator splitting approaches. The operator splitting approaches cannot be reliable approximation for the general case.
9. In this paper, the prediction and correction steps were carried out using a uniform grid. It is clear that a much faster approach for the correction part for higher dimensional problems would be to use Monte Carlo integration methods. 10. Observe that the one-step approximation of the path integral can be stored more compactly. Compact representation of the transition probability density, especially in the Itô case where it is of the Gaussian form. Even for the general case, the transition probability density from a certain initial point and given time step can be stored in terms of a few points with the rest obtainable via interpolation.
11. Observe that the prediction step computation was sped up considerably by restricting calculation only in areas with significant conditional probability mass (or more accurately, in the union of the region of significant probability mass of p(y(t k )|x) and p(x|Y (t k−1 ))).
12. It is noted that when the DF approximation is used with larger time steps, a coarser grid is more appropriate, which requires far fewer computations. Thus, a quasi-realtime implementation could use the coarse-grid approximation with larger time steps to identify local regions where the conditional pdf is significant so that a more accurate computation can then be carried out. 13. In this paper, it has been assumed that the diffusion matrix is invertible. Often this arises when it is a matter of defining a state variable as a time derivative of the other (dx 1 (t) = x 2 (t)dt). It is plausible that the addition of a small, positive perturbation to the diffusion matrix (so that the perturbed diffusion matrix is invertible) will not lead to large errors.
14. When the step size is too large, the approximation will not be adequate. However, unlike some PDE discretization schemes, the degradation in performance is more graceful. For instance, positivity is always maintained since the transition probability density is manifestly positive. It is also significant to note that in physics path integral methods are used to compute quantities where t ′ → −∞ and t ′′ → +∞ (see, for instance, [21]). This is not possible by simple discretization of the corresponding PDE due to time step restrictions (note that implicit schemes are not as accurate).
15. For the multiplicative noise case, the choice of s = 0 leads to a more complicated form of the Lagrangian. The accuracy of the one-step approximation depends on s in addition to r and will be model-dependent. 16. Note that, unlike the result of S-T. Yau and Stephen Yau in [13], there is no rigorous bound on errors obtained for the Dirac-Feynman path integral formulae studied in the examples. It is known rigorously for a large class of problems that the continuum path integral formula converges to the correct solution [22].

Conclusion
In this paper, a new approach to solving the continuous-discrete filtering problem is presented. It is based on the Feynman path integral, which has been spectacularly successful in many areas of theoretical physics. The application of path integral methods to quantum field theory has also given striking insights to large areas of pure mathematics. The path integral methods has been shown offer deep insight into the solution of the continuous-discrete filtering problem that has potentially useful practical implications. In particular, it is demonstrated via non-trivial examples that the simplest approximations suggested by the path integral formulation can yield a very accurate solution of the filtering problem. The proposed Dirac-Feynman path integral filtering algorithm is very simple and easy to implement and practical for modest size problems, such as those arising in target tracking applications. Such formulae are also especially suitable from a real-time implementation point of view since it enables us to focus computation only on domains of significant probability mass. The application of path integral filtering for radar tracking problems, especially those with significant nonlinearity in the state model, will be investigated in subsequent papers. In a recent paper [23], it has been shown that the Feynman path integral filtering techniques also leads to new insights into the general continuous-continuous nonlinear filtering problem.

Acknowledgements
The author is grateful to Defence Research and Development Canada (DRDC) Ottawa for supporting this work under a Technology Investment Fund.

A.1 Additive Noise
The additive noise model is interpreted as the continuum limit of Observe that any r ∈ [0, 1] leads to the same continuum expression. The transition probability density for the additive noise case is given by (see [11]) where the Lagrangian L (r) (t, x,ẋ) is This formal path integral expression is defined as the continuum limit of 1 (2πǫ) n det g(t ′′ ) N k=1 d n x(t ′ + kǫ) 1 (2πǫ) n det g(t ′ + kǫ) exp −S (r) ǫ (t ′′ , t ′ ) , (A.7) where the discretized action S (r) ǫ (t ′′ , t ′ ) is defined as and where x (r) (t k ) = x(t k−1 ) + r(x(t k ) − x(t k−1 )). (A.9)
In this section, the Einstein summation convention is adopted, i.e., all repeated indices are assumed to be summed over, so that e ia dv a = p a=1 e ia dv a . Also, ∂ ∂x (r) i is written as ∂ (r) i . Note that the change in Equation A.10 when f i (x (r) (t p ), t p ) and e ia (x (s) (t p ), t p ) are replaced with f i (x (r) (t p ), t p−1 ) and e ia (x (s) (t p ), t p−1 ) is of O(ǫ 2 ) and O(ǫ 3/2 ) respectively. Hence, it may be ignored in the continuum limit as it is of order higher than O(ǫ).
In summary, there are infinitely many possible discretizations parametrized by two reals r, s ∈ [0, 1]. In the continuum limit, i.e., ǫ → 0, observe that the stochastic process depends on s, but not on r. When s = 0, the limiting equation is said to be interpreted as an Ito SDE, while when s = 1 2 , the equation is said to be interpreted in the Stratanovich sense. In [12] it is shown that for the general multiplicative noise case The discretized expression for the general case is complicated but can be written down from these results in a straightforward manner.