Density Difference Grid Design in a Point-Mass Filter

: The paper deals with the Bayesian state estimation of nonlinear stochastic dynamic systems. The stress is laid on the point-mass ﬁlter, solving the Bayesian recursive relations for the state estimate conditional density computation using the deterministic grid-based numerical integration method. In particular, the grid design is discussed and the novel density difference grid is proposed. The proposed grid design covers such regions of the state-space where the conditional density is signiﬁcantly spatially varying, by the dense grid. In other regions, a sparse grid is used to keep the computational complexity low. The proposed grid design is thoroughly discussed, analyzed, and illustrated in a numerical study.


Introduction
State estimation of nonlinear discrete-time stochastic dynamic systems from noisy or incomplete measurements has been a subject of considerable research interest. The topic plays an important role in signal processing, control, and fault detection of both small and large scale systems in virtually all fields of industry [1] including power generation and transmission, battery live prediction [2][3][4][5][6][7][8], automotive [9,10], navigation [11,12], speech and image processing etc. State estimation methods has also become important in non-technical fields including, for example, economics, biology, or weather forecasting.
In the Bayesian framework, a general solution to the state estimation problem is given by the Bayesian recursive relations (BRRs), which compute the probability density functions (PDFs) of the state conditioned on the measurements. The conditional PDFs fully describe the estimate of the immeasurable state of a nonlinear or non-Gaussian stochastic dynamic system. Besides a set of linear models leading e.g., to the Kalman filter (KF), the BRRs are not analytically tractable and an approximation in their solution must be used. The resulting approximate methods can be divided with respect to the validity of the resulting estimates into global and local filters [13,14].
The local filters provide computationally efficient estimates in the form of the conditional mean and covariance matrix with potentially limited performance due to inherent underlying Gaussian assumption, which is not always realistic (as the first two moments usually do not represent a full description of the immeasurable state) . These filters are represented by the extended Kalman filter, the unscented Kalman filter, or the stochastic integration filter [14][15][16][17][18].
As opposed to the local filters, the global filters provide estimates in the form of conditional PDFs without any assumption on the conditional distribution family. The global filters are capable of estimating the state of a strongly nonlinear or non-Gaussian system but usually at the cost of higher computational demands. Among these, the Gaussian sum filter [19], the particle filter [20], and the point-mass filter [21][22][23][24] have attracted a considerable attention.
The point-mass filter (PMF) considered in this paper is based on a numerical solution to the BRRs using deterministic grid-based numerical integration rules, which means the BRRs are solved and the conditional PDF is computed at the grid points only. Although a conceptual design of the PMF is known from the seventies [13], an application in the terrain-aided navigation (TAN) [22] is reported as late as the nineties mainly because of the computational complexity of the filter. Therefore, since the development of the PMF, extensive attention is devoted to the efficient filter design reducing the computational complexity. In the literature, two main directions of reduction can be found; • Design of the Rao-Blackwellized (or marginalized) PMF [23,25], • Efficient convolution in the PMF prediction step [21,22,24,[26][27][28][29].
The former approach is based on the decomposition of the state vector into two parts; nonlinearly and linearly modeled. As a consequence, just the nonlinearly modeled part of the state is estimated by the computationally expensive PMF and the remaining part of the state vector is estimated by a set of computationally cheap KFs. The Rao-Blackwellized PMF is thus suitable for possibly high-dimensional models with specific conditionally linear structure and Gaussian noises. The latter approach typically does not restrict the class of the allowed state-space models, but it rather focuses on computationally effective implementation of the convolution, as the most demanding step, in the prediction of the PMF.
Recently, a novel method for the efficient convolution, based on the the density-specific grid (DSG) design, was proposed [24]. Compared to other methods (typically assuming one equidistant grid), the DSG assumes two different grids (each with different properties); • Denser grid supporting symmetrical vicinity of the conditional PDF mean (where a higher volume of the conditional PDF is expected), • Sparser grid supporting conditional PDF tail region (where a marginal volume of the conditional PDF is expected).
The DSG brings significant improvement to the estimation quality especially for unimodal and rather symmetric conditional PDFs with light tails. If the density is multimodal, heavy-tailed, or heavy-skewed, the benefit of the DSG is marginal.
The goal of the paper is, thus, to design an advanced version of the DSG, further denoted as the density difference grid (DDG). The proposed DDG is based on the differentiation of the conditional PDF at the sparse grid, which allows accurate determination of the regions with a significant spatial change of the PDF. In these regions, the denser grid is created to reasonably well capture the PDF shape. As a consequence, the DDG has an ability to better approximate asymmetric, heavy-tailed, and multi-modal PDFs, i.e., the PDFs with dominant higher-order moments, with a negligible increase of the overall PMF computational complexity.
The rest of the paper is organized as follows. In Section 2, the system description and a brief introduction to the Bayesian state estimation is presented. Then, in Section 3, the points-mass filter is overviewed. This overview is accompanied with a short recapitulation of the state-of-the-art grid designs. Section 4 proposes a novel DDG design and Section 5 shows the DDG performance in a numerical study. Finally, concluding remarks are drawn in Section 6.

System Description and Bayesian State Estimation
Let the following discrete-time state-space model of a nonlinear stochastic dynamic system be considered, where the vectors x k ∈ R n x , u k ∈ R n u , and z k ∈ R n z represent the unknown state of the system and the known input and measurement at time instant k, respectively. The state and measurement functions f k : R n x ×n u → R n x and h k : R n x → R n z are known general transformations.
Particular realizations of the state and measurement noises w k and v k are unknown, but their PDFs, i.e., the state noise PDF p(w k ) and the measurement noise PDF p(v k ), are supposed to be known and independent of the known initial state PDF p(x 0 ). The goal of the state estimation, particularly of the filtering, is to find an estimate of the state x k conditioned on all measurements z k = [z 0 , z 1 , . . . , z k ] up to the time instant k. In the Bayesian framework, the state estimate in the form of the conditional PDF p(x k |z k ), ∀k, is sought. Note that, considering the model (1), (2), the state transition PDF and the BRRs (3)-(5) should be conditioned also on available sequence of the input u k , ∀k. However, for the sake of notational simplicity, the input signal is assumed to be implicitly part of the condition and it is not explicitly stated, i.e., p(x k+1 |x k ) = p(x k+1 |x k ; u k ), p(x k |z k ) = p(x k |z k ; u k−1 ), and p(x k+1 |z k ) = p(x k+1 |z k ; u k ).
The Bayesian state estimator design stems from the solution of the Bayesian recursive relations (BRRs) for the recursive computation of the conditional PDFs [15] p( where p(x k |z k ) is the filtering PDF computed by the Bayes' rule (3) and p(x k |z k−1 ) is the one-step predictive PDF computed by the Chapman-Kolmogorov Equation (4). The PDFs p(x k |x k−1 ) and p(z k |x k ) are the state transition PDF obtained from system dynamics (1) and the measurement PDF obtained from measurement Equation (2), respectively. The PDF is the one-step predictive PDF of the measurement. The recursion (3), (4) starts from the initial PDF, i.e., p(x 0 |z −1 ) = p(x 0 ).

Point-Mass Filter Summary and Grid Designs
The PMF is based on the numerical solution to BRRs using the deterministic integration rules. As a consequence, a significant part of the (continuous) state-space is approximated by the grid of isolated points in which the conditional PDFs are computed. The PDF evaluated at the grid points is denoted as the point-mass PDF or, alternatively, as the point-mass density (PMD).

Point-Mass PDF Approximation
Approximation of a conditional PDF p(x k |z m ), where m = k for the filtering PDF (3) and m = k − 1 for the predictive PDF (4), by a piece-wise constant PMDp(x k |z m ; ξ k ) defined at the set of the discrete grid points ξ k = {ξ is a normalization constant, and δ k is a volume of the i-th point neighbourhood defined below, T defines a (hyper-)rectangular neighbourhood of a grid point ξ (i) k , where the PDF p(x k |z m ) is assumed to be constant and has value P k|m (ξ . . , n x , and otherwise, so that Note that in (6)-(9) the notation x(j) meaning the j-th element of the vector x was used. Illustration of the point-mass PDF approximation (6) with omitted time indices is shown in Figure 1. The point-mass density (6) can be, thus, interpreted as a sum of weighted uniform distributions. Another interpretations, such as the a sum of the Dirac delta functions, can be found in [22,26].

Point-Mass Filter Algorithm
The basic algorithm of the point-mass filter (PMF) can be summarised by the following Algorithm 1 [26,30]: approximating the initial PDF p(x 0 |z −1 ); 2. Measurement update: Compute the filtering point-mass PDF where is the value of the filtering PDF at i-th grid point; 3. Grid construction: Based on the filtering estimate (11) and the state Equation (1) where the value of the predictive PDF at j-th grid point is 5. Set k = k + 1 and go to the step 2); For the sake of clarity, a detailed derivation of the PMF can be found in Appendix A. Note that the PMF filter provides estimates in the form of the PMDp(x k |z m ; ξ k ) approximating the conditional PDF p(x k |z m ). Assuming small grid point vicinity ∆ k , the conditional moments, e.g., the mean and covariance matrix, can be easily computed according to [23] x PMD,k|m = when required. The moments are, however, generally not needed for the PMF run.

Grid Design
A crucial part of the PMF design that is significantly affecting the filter estimation performance is a specification of the grid points before time update step (13) and, by extension, in the initialization (10).
When designing a grid, the number of grid points should be decided first. The more grid points N are considered, the better the accuracy of the density approximation can be achieved but at the cost of higher computational demands. Because of the convolution (13), (14), the computational complexity grows exponentially with N. Thus, the number of grid points is solely driven by the available computational capacity and required precision of the estimate (an analysis of ideal grid resolution can be found in [31]).
The grid should be designed to cover a significant region of the conditional PDFs support sufficiently well. Therefore, it is necessary to specify two grid characteristics 1. Region of the PDF support to be covered by the grid points, 2. Locations of grid points inside the region.
The former characteristic specification is common for all single-grid design techniques. The region R for the predictive PDF p(x k+1 |z k ) is (hyper-)rectangular and centred around the mean where P thr is a user-defined threshold. Unfortunately, except for the initial condition, neither the predictive moments nor the predictive PDF is known at "Grid construction" step of Algorithm 1 (the PDF is to be computed in the subsequent step "Time update"). A reasonable approach is to compute the first two (approximate) predictive moments of the statê and to construct an approximate Gaussian predictive PDF where the notation N {x;x, P} stands for the Gaussian PDF of the random variable x with the meanx and the covariance matrix P (for simplicity, the random variable has same notation as its realization). Then, the sought region R can be found analogously to (17), where the approximate predictive PDF p A (x k+1 |z k ) (20) is used instead of the exact one p(x k+1 |z k ). The predictive moments (18) and (19) are computed on the basis of known filtering moments (given by (15), (16), and (11)) and state Equation (1). If the dynamics is nonlinear, approximate predictive moments can be computed e.g., by the unscented transformation or various numerical integration rules [16,32]. Note that, the approximate PDF (20) is used for the region R determination only. The latter characteristic is specific for each grid design technique.

Equidistant Grid Design
The equidistant grid design [22] is a standard grid design technique, which covers the region R with equidistantly located points, i.e., the neighbourhood ∆ k is the same for all grid points.

Density Specific Grid Design
The DSG design [24], is based on an unequally spaced placement of the fixed number of points over the PDF support R to be approximated. The main idea is to approximate the PDF, where the volume of the PDF is significant or significantly varying by a denser grid (as in this area, a slope of the PDF is expected to be significant) and the tail of the PDF by a sparser grid (as the tail is flat). Thus, the DSG design determines, besides the region R, also the hyper-rectangular subregion R in , where the grid is denser. The determination of the subregion again depends on the computed approximate predictive moments (18), (19).
As can be seen in [24], the DSG design significantly outperforms the standard equidistant grid design if the underlying conditional PDF is unimodal, rather symmetric, and light-tailed. If any of the mentioned properties are violated, the DSG benefit is marginal.

Goal of the Paper
The goal of the paper is to propose a novel unequally spaced grid design, called the density difference grid design (further abbreviated as DDG), which provides improved PMF estimation performance independently of the conditional PDF properties.

Density Difference Grid Design
The DDG design is based on the idea that such part of the support, where the conditional PDF is significantly spatially varying, should be covered by the dense grid (to sufficiently well capture PDF behavior). On the other hand, the remaining part of the support, i.e., the parts where the conditional PDF is almost flat, can be reasonably well approximated by the sparse grid. To realize this basic idea of the DDG, the following three steps should be performed: 1. Design of the sparse (outer) grid and evaluation of the predictive PMD at these grid points, 2. Differentiation of the "sparse" predictive PMD, 3. Design of the dense (inner) grids in the vicinity of such "sparse" grid points, which are associated with a large value of the difference.
Compared to the DSG design discussed in [24], the DDG design utilizes full information about the shape of the predictive PMD (and not only the moments) and also the denser (inner) grid need not be hyper-rectangular, which further improves the fit of the PDF and its grid support. The steps are particularized below.

Sparse Grid Design
The sparse grid design takes advantage of the PMF time-update, that the PDF value of a single point ξ (j) k+1 can be counted independently of any other grid points. The sparse grid design is given by Algorithm 2: Algorithm 2: DDG Sparse Grid 1. Define a threshold p thr below which the probability at a grid point is considered to be negligible and can be neglected. The threshold setting can be related to the computation precision of a given platform; 2. State Similarly as in the DSG design [24], compute the predictive momentsx A,k+1|k (18) and P A,k+1|k (19) and set the number of standard deviations (STDs) n σ of the marginal distributions of (20), which are used for specification of an a priori (hyper-)rectangular region R A to be covered by the grid points; 3. At the boundaries of the region R A select a small set of grid points and evaluate the predictive probability at these points. If the probability of the point is (significantly) below the threshold p thr , then the particular bound can be shifted towards the mean. On the other hand, if the probability of the point is (significantly) over the threshold p thr , then the particular bound can be shifted in an opposite direction from the mean. The shifted points are, then, re-evaluated; 4. When all boundary points are properly set, create the smallest possible hyper-rectangle defining R including all the boundary points. As a consequence, the resulting region R need not be symmetrical with respect to the mean; 5. Define a cardinality of the sparse grid N sparse and dense grid N dense = N − N sparse . Fill the region R by equidistantly placed N sparse hyper-rectangular grid points {ξ (j) sparse,k+1 } Nsparse j=1 (each sparse grid point is associated with a neighbourhood ∆ sparse,k+1 ). A reasonable choice N sparse = 1 This algorithm is illustrated in Figure 2 for the two dimensional state-space.

Differentiation of Sparse Grid PMD
To find the state-space regions, where the conditional predictive PDF p(x k+1 |z k ) is expected to significantly spatially vary (i.e., the regions to be covered by the dense grid), the PMD evaluated at the sparse grid points is differenced according to Algorithm 3: sparse,k+1 compute the divided difference vector d k+1 ∈ R 2 nx [14,33] according to (21) where · 2 denotes the Euclidean norm and the index vector ı = [ı 1 , ı 2 , . . . , ı 2 nx ] is selected so that the closest neighbouring points of the j-th point ξ (j) sparse,k+1 are chosen. It means two neighbouring points are selected for n x = 1, four neighbouring points for n x = 2, eight neighbouring points for n x = 3, etc.; 3. Compute the grid points rating as a norm of the vector d k+1 ∈ R 2 nx (21) according to which characterizes the overall spatial variability of the PDF in the vicinity of the j-th sparse This algorithm is illustrated in Figure 3 for the two dimensional state-space.

Design of Dense Grid
Having defined sparse grid points (Algorithm 2) and computed their difference rating (Algorithm 3), which in fact precisely denotes the state-space regions with significant change or variability of the conditional PDF, it is possible to construct the dense grid according to Algorithm 4:  (22); 2. Set the splitting ratio r (similarly as in the DSG design in [24]), which determines number of dense grid points r n x used for covering the vicinity of the j-th sparse grid ξ (j) sparse,k+1 . Reasonable choice is r = {2, 3}, but note that, the ratio need not be constant and it can depend on the rating d This algorithm is illustrated in Figure 4 for the two dimensional state-space.

Summary of DDG Design and Properties
The DDG design is given by the three steps realized by Algorithms 2-4. An example of the final DDG for the PMD shown in Figure 3 can be found in Figure 5.
Compared to the recently proposed DSG, the DDG dense (inner) grid need not be rectangular and, thus, it more realistically covers the support of the conditional PDF with the significant volume without any requirement on computation and utilization of a set of higher-order moments. The DDG design also reduces the user interaction and minimizes the number of the user-defined parameters (compared to the standard grid design, the DDG design requires specification of three additional parameters only; the number of sparse grid points N sparse and the threshold p thr in Algorithm 2 and splitting ration r in Algorithm 4). On the other, the DDG design is slightly more computationally intensive as illustrated in the following section.

Numerical Illustration
The influence of the grid design is illustrated in two examples; static point-mass approximation and dynamic state estimation using a PMF with three different grid designs: • Standard equidistant (EQ) grid design ( [22], Section 3.3.1), • DSG design ( [24], Section 3.3.2), • Proposed DDG design (Section 4).

PDF Point-Mass Approximation Quality
In the first example, let a true two-dimensional, i.e., n x = 2, Gaussian mixture PDF with four components be considered, where the particular weights, means, and covariance matrices are given as follows x 1x2x3x4 = 1 2 4 5 4 1 3.5 −3.5 , The PDF is illustrated in Figure 6. The PDF (23) is approximated by the PMDp(x; ξ) (6) with the number of grid points N = {50, 200, 500} for all three considered grid designs with R covering three STDs. The DSG and DDG designs used such user-defined parameters, which provided the best performance, namely the PMD with the DSG used R in covering two STDs and the PMD with the DDG used parameters r = 2 and N sparse = 0.4N. The results can be found in Table 1. It can be seen that the proposed DDG provides the best approximation quality among the considered grids with the same number of points. A degraded performance of the DSG-based design for the approximation of multimodal distributions can also be noted.
For the sake of completeness, a layout of the tested grids is shown in Figure 7.

PMF in Terrain-Aided Navigation with Spherical Coordinates
The second example considers the terrain-aided navigation (TAN) scenario. The TAN system is a navigation system suitable for GNSS-denied environments, where the vehicle horizontal position is determined by the correlation of the measured vertical terrain profile and the pre-recorded terrain map [23]. The models used in the TAN are highly nonlinear, thus, they are often used as benchmark examples for illustration of the state estimation techniques performance. In this section, the considered the TAN scenario is described by the state-space model [34] where k = 0, 1, 2, . . . , 10, φ k is the sought vehicle latitude in degrees, λ k is the longitude in degrees, , ∆T = 1 is the sampling period in minutes, K = 0.4363 is the constant heading relative to the north in radians (measurement provided e.g., by the compass), V = [V north , V east ] T is the known constant velocity vector (measurement provided e.g., by an inertial navigation system) in meters/sec, and R(λ k ) = r 2 1 cos(λ k ) 2 + r 2 2 sin(λ k ) 2 (r 1 cos(λ k )) 2 + (r 2 sin(λ k )) 2 (29) is Earth radius at latitude λ k . Constants r 1 = 6, 378, 137 and r 2 = 6, 356, 800 are radii of the Earth at the equator and pole, respectively, in meters. The measurement function h k (·) is the the map in the Geographic coordinate system. For the simulation, the maps provided by the Shuttle Radar Topography Mission (SRTM) were used (https://www2.jpl.nasa.gov/srtm/). The noises w k and v k are described as The PMF was configured with three grid designs with N = 100, all with R covering ±4 STDs in each axis (i.e., in latitude and longitude). Namely, the following implementations were compared  (22) were split into r 2 × 6 = 54 dense grid points (each sparse point were split into nine dense points). In total, this implementation used N = 44 + 54 = 98 grid points.
Performance of the filters were compared using the following criteria  Table 2. It can be seen that the DDG method is better than the equidistant and DSG grid methods, while keeping similar computational complexity. The PMF estimate with the DDG is approximately about 11% more accurate. Note that transformation of the angular longitude and latitude error into the Cartesian coordinate system results in the improvement of the several meters (only by better selection of the grid points and with almost the same computational complexity).

Concluding Remarks
The paper focused on the point-mass filter for the nonlinear state estimation with the stress on the grid design as a key filter design step. In particular, a new density difference grid design was proposed. The proposed design was based on evaluation of the predictive PDF in a set of the spare grid points. Subsequently, such point-mass density was differentiated and the state-space regions with significantly varying density were located. These regions are, finally, covered by the dense grid and the detailed point-mass density was completed. The density difference grid properties were analyzed, discussed, and illustrated. It was shown that the point-mass density and the point-mass filter with the proposed grid provides consistently better performance independently of the conditional probability density function shape (e.g., multimodal, heavy-tailed) than with other grid design strategies, while the computational complexity is preserved. The performance was illustrated in two examples, including terrain-aided navigation inspired scenario.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: where the evaluation of the filtering PDF at ξ and the (point-mass) approximate normalization constant iŝ Note that the support grid stays the same during measurement-update.

Appendix A.3. Derivation of Time-Update
Predictive probability is given by the Chapman-Kolmogorov Equation (4), where p(x k |z k ) is the filtration density and is a transition PDF of the state, which is given by the dynamics (1). However, only the approximate filtering PMD (A4) is known and its substitution into (4) readŝ p(x k+1 |z k ; ξ k ) = p(x k+1 |x k ) In order to solve this relation in accord to the PMF design philosophy, a new grid {ξ (j) k+1 } N j=1 is defined. This grid should cover the part of the state-space where the predictive probability is expected to be. The location and properties of the new grid can be found on the basis of the moment-based prediction as discussed in Section 3.3 or in [22,24]. Having the new grid, i.e., the grid for the time instant k + 1, the transition PDF (A7) can be approximated by the PMD as followŝ By substitution of (A9) into (A8), the predictive PMD is of the form where the predictive probability value at a new grid point ξ The expression (A10), commonly denoted as the convolution, is the most computationally demanding step of the PMF. Note that the value of P k+1|k (ξ (j) k+1 ) is independent over j, therefore parallel programming and computing tools can be used to speed the algorithm up.