1. Introduction
When tracking vehicles or robots that are only capable of planar motion within rooms or on plains, their pose can be described by a two-dimensional position and a heading angle. Integrating the heading angle into the tracking is essential for realistic motion models of nonholonomic vehicles [
1]. Such models may also include other angles, such as a trailer angle or the steering angle for slowly steering vehicles [
2]. Estimating one’s own position and orientation is also an essential component for simultaneous localization and mapping [
3].
Each position–angle pair corresponds to an element in the special Euclidean group 
, which is the group of two-dimensional rigid-body motions (Section 10.6.2, [
4]). Quantities in 
 are also relevant in other settings, such as in control [
5]. However, it is nontrivial to rigorously estimate states in 
, which is why even nowadays, some trackers rely merely on estimating the velocities along the two axes [
6]. No heading angle is required to use a classical constant velocity or constant acceleration model [
7]. However, such simple models generally do not adequately take the kinematics of the vehicle or robot into account.
For estimation on , it is relevant to consider that the pose involves both an angle, which is a periodic quantity on the unit circle , and a real vector in . Stacking the orientation, described by an angle in  and the position yields a three-dimensional vector, which is an element of the Cartesian product . While all entries are real valued, one needs to consider the periodicity of the angle and not treat the vector as a three-dimensional Euclidean vector.
One can provide a filter for 
 based on the extended Kalman filter [
8]. However, such filters are generally suboptimal since they cannot fully take the periodicity of the angle into account. Especially in cases with high uncertainties, the local linearization can cause a degradation of estimation accuracy. A very general filter that can be easily adapted to the 
 domain is the particle filter (PF) [
9]. However, PFs suffer from particle degeneracy and impoverishment [
10] and can be slow to converge [
11]. Other general approaches that can be applied to 
 are the invariant extended and unscented Kalman filter [
12] and the unscented Kalman filter on manifolds (UKF-M) [
13]. An approach tailored explicitly to 
 that is based on dual quaternions was proposed in [
14]. A filter based on sample reduction for scenarios in which only samples of the noise are available was presented in [
15].
It is possible to make use of the property that the state can be described as a Cartesian product of a nonlinear domain and a linear domain. If we denote the part of our state describing the orientation with 
 and the part describing its translation with 
, then we can rewrite our joint density as a product of the conditional density 
 and the marginalized density for the periodic part 
. Splitting the state up into a part that is well suited for estimation using the Kalman filter and another part for which a nonlinear filter should be applied is a technique that has become popular for PFs as Rao–Blackwellization [
16] and has also found application in other filters such as the point mass filter [
17]. This technique was also the foundation for the grid-based filter for SE(2) [
18].
The filter we present in this paper also builds upon the idea of splitting up the state. It differs from the filter proposed in [
18] in several regards. First, system models in our novel filter should be given in the form of a transition density instead of an equation of random variables. Second, no way to provide a smooth continuous density was presented in [
18]. In our current paper, we consider different ways to provide a continuous density based on the parameters of our filter.
Another limitation of [
18] is that the conditionals for the likelihood and the system noise all have the same parameters. In our current paper, we have no such limitation. Thus, the system and measurement noise for the position can be orientation dependent. The system noise can depend on both the current angle and the potential future angle. Especially the dependency on the current angle is important because the uncertainty in the predicted position can depend on the orientation in many applications. For the likelihood, which describes the measurement model, orientation dependency is relevant since the orientation of an object can have an influence on the error in the position measurement. Last, unlike [
18], we allow for system inputs that depend on the orientation, which can be the key to solving scenarios such as the evaluation scenario in this paper.
For the nonlinear part concerning the angle, we use a grid-based filter that was first presented in [
19] and later extended for arbitrary nonlinear models and generalized to arbitrary-dimensional hypertori in [
20]. The filter uses trigonometric polynomials [
21] to provide an interpolation of the grid values that closely matches the original continuous density even when using small numbers of grid points. Similar filters were also presented for the (hyper)sphere and hyperhemisphere [
22,
23]. All these filters have in common that they can be interpreted to use a subdivision of the state space into non-overlapping regions. Since the domains in the grid filters mentioned in this paragraph are bounded, they can be subdivided into areas of finite size.
For our novel filter, which we call the state space subdivision filter (S3F), we subdivide the domain , which can be used to describe planar poses, into a finite number of non-overlapping parts. For this, we subdivide  into the set of areas  with . Each of the n areas in the subdivision is in the form , with a and b in . Note that for the unit circle, we get a valid interval even if . In this case, the interval crosses the boundary of periodicity, which is located at  in our case. Based on the subdivision of , the subdivision of  is then . Please note that the focus of our explanations and derivations will be on the elements of  and that we will use the term area to refer to the subsets of the periodic domain and not subsets of the entire state space.
For the derivation of our new filter, we introduce three assumptions. For an angle , the marginalized density  and the conditional density  are assumed to only depend on which area  lies in and not on the precise value of . Furthermore, we assume that  is a Gaussian density. This will allow us to use the basic ideas of the grid filter for the density describing the orientation and the Kalman filter for the conditional part. However, due to dependencies, significant adjustments are required.
While we focus on 
 in this paper, the S3F can be directly applied for 
 with 
 and most key parts of our filter are not limited to 
. It can be adopted for other Cartesian products, such as of a hypertorus or hypersphere and a Euclidean domain, using grid filters for the respective domain [
22,
23]. To allow for easier visualization, we limit ourselves to plotting densities on 
. This domain could, e.g., be used to estimate the rotor angle and velocity of a motor. In 
Figure 1, we show the density of a partially wrapped normal distribution [
24] on 
 with mean 
 and a covariance matrix with 1 in the entries on the diagonal and 
 in the off-diagonal entries. An illustration of all the assumptions of our filter is given in 
Figure 2. We will later regard how better continuous densities can be derived from the parameters of the filter.
The paper is structured as follows. In the next section, we go into detail on how we represent densities, how we approximate given densities, and how a continuous density can be provided. The update and prediction steps of the S3F are derived in 
Section 3. We evaluate the accuracy and run time of the S3F and other state-of-the-art filters in a simulated scenario in 
Section 4. A discussion, a conclusion, and an outlook are presented in 
Section 5.
  2. Density Representation and Approximation
Since all areas in 
 are in the form of 
, each area is (when considering the periodicity of the domain) connected and contains more than one element. For each area 
, we specify a grid point 
 in 
, which is in some way representative of the area. For this, we define the topology-aware distance (Section 2.3.4, [
25])
      
      for two angles 
 and 
.
We place the grid point  such that it is in  and has the same distance to a and b. To simplify the filter, we only consider subdivisions of the periodic domain into areas of equal size. Hence, for n regions, each area is of size . With all these specifications, the subdivision only depends on the starting point of the first region when  is always sorted such that the next region after  is . Without loss of generality, we say the first region is the region in which 0 lies. Thus, the starting point of the first region must lie in .
In our implementation of the filter, we choose the subdivision that leads to the first grid point to be at 0. In this subdivision, the first area starts at . Thus, assuming , we obtain the subdivision  and the grid points . We use this subdivision in our filter because all grid points can be represented as  with .
We proceed by providing the full details on the assumptions of our filters that we briefly summarized in the introduction. If all assumptions hold, all densities and likelihoods involved can be represented accurately and the update step will be precise (as we later see, the prediction step involves additional assumptions and approximations). To each grid point 
, we assign a grid value, which we denote by 
. There are different ways to interpret the grid values [
23]. However, we will not go into detail on this in our current paper and merely focus on the interpretation that 
 describes the function value of 
 in the area 
.
This leads us to the assumptions (A1)–(A3) that underlie the density representation in this paper.
- (A1)
- The function value in each area is identical to the corresponding grid value, i.e., . 
- (A2)
- The conditional density  is the same for all  (we denote it by ). 
- (A3)
- The conditional density  is Gaussian for every considered . 
By combining assumptions (A2) and (A3), we get the assumption that every 
 is Gaussian. We shall call the mean of that Gaussian 
 and the covariance 
, and write the pdf of the normal distribution 
 at 
 as 
. The combination of all three assumptions leads to a joint density consisting of scaled Gaussians that are extruded along the periodic dimension, as can be seen in 
Figure 2. For densities encountered in many scenarios, the assumptions (A1)–(A3) do not hold. By approximating densities and likelihoods based on these assumptions, we introduce errors. However, the errors tend to get smaller with an increasing number of regions 
n.
Based on these assumptions, we can now derive a formula for the parametric density. We start with the first two assumptions and then also introduce the third one, leading to
      
      with 
 denoting the set containing all tuples 
 for 
 and 
 being an indicator function that is 1 when the expression in braces is true and 0 otherwise. Throughout this paper, we will use that the marginalized density for the periodic part does not depend on the parameters of the Gaussians. This can be proven via
      
Thus, we can also write the marginal as , with  being the vector containing all grid values.
In the remainder of this section, we first address how we can obtain the parameters for a given density. The second subsection addresses how we can calculate the parameter of a normalized density from the parameters describing an unnormalized one. In the third subsection, we describe other ways to provide continuous densities based on the same parameter set. This only serves as a means to obtain smoother densities than the one in 
Figure 2 and is not used in the prediction and update steps.
  2.1. Deriving Parameters for a Given Density
We first discuss the case when the position and orientation are independent in the next paragraph and then proceed with the general case. The position and orientation may be independent for the initial prior density or certain likelihood functions. Note that the orientation and position usually do not stay independent throughout the prediction and filter steps. If they did, separate filters could be used and the S3F would not be required.
If the position and orientation are independent, we have 
. For the periodic part 
, the function values at the grid points 
 are stored in the vector of grid values 
, as is done by the grid filter for the circle [
19]. If 
 is a Gaussian density, all 
 and 
 are set to its parameters (the parameters are the same for each area due to the independence of 
 and 
). If 
 is not a Gaussian, we can approximate it with a Gaussian, e.g., via moment matching.
We now consider the case in which the two parts are dependent. We do not require the full density 
 but only the value of the density at the grid points. The values can be determined via
        
        which involves a 2-D integral for each of the 
n areas. Based on the grid values, we obtain the conditional density
        
        for every region. If the parameters of the Gaussian are not directly available, we can use
        
        to obtain the mean vector of the Gaussian for the area 
i and
        
        for the entry in column 
j and row 
k of the covariance matrix for area 
i. In total, we need 
n 2-D integrals for the grid values, 
 2-D integrals for all components of the means for all areas, and 
 2-D integrals for all components of the covariance matrices. Due to the expensive integrals, determining the parameters should be avoided during prediction and update steps. However, the integral formulae can be useful for, e.g., transforming a prior density offline.
The grid values are guaranteed to be nonnegative since they are function values of  or integrals of function values of . However, it cannot be guaranteed that any of the grid values is positive. If none is positive, using a higher resolution or a different filter should be considered. Furthermore, the density may not be normalized. If  is normalized, then the joint density based on the parameters is also normalized since the Gaussian for the translation is inherently normalized. However, it may happen that  is unnormalized. This is the reason why we wrote the vector as , and we will continue using this decorator to denote that the density described by the vector may be unnormalized. The final step of our approximation procedure is to determine the parameters of the normalized density. This is done as described in the next subsection.
  2.2. Normalization
For normalizing a density, it is crucial how a parametric density as described by (
3) can be scaled. It is trivial to see that if one scales every grid value with a factor 
, all function values are scaled by 
. The factor by which we need to scale the function is the reciprocal of the integral since scaling it with this value will lead to a function that integrates to one. We first derive the formula for the integral of the parametric density. For this, we use (
4) to obtain
        
With this formula, we can confirm that the joint density is normalized when  is normalized.
We can now proceed using the assumption (A1) to get
        
        in which 
 is the size of the respective subset of the periodic domain 
 and 
 denotes the 1-norm of the vector. Hence, the vector describing a normalized density can be obtained via
        
  2.3. Providing Continuous Densities
One way to provide a continuous density is to use (
3), which integrates all assumptions on which we base the prediction and update steps. Since modulo arithmetics can be employed to find the relevant area in 
 and evaluating the corresponding Gaussian is also in 
, evaluating the density is in 
. We can also apply (
3) for Cartesian products of 
 with other periodic (or bounded) domains.
In 
Figure 2, we illustrate how an approximation according to (
3) looks like. While the result clearly resembles the original density, it is not an accurate approximation. In our first modification to obtain a density that is closer to the original density, we drop the assumption (A1). In the paper on the grid filter [
19] that our current paper is based on, two ways to interpolate a periodic function based on function values on a grid are given. The first one does not ensure the nonnegativity of the density, while the other one does. We use the interpolation that ensures the nonnegativity and apply it to the periodic part 
. As stated previously, the joint density is normalized if the density for the periodic part is normalized. Thus, the normalization of the density when changing the interpolation scheme for the periodic part can be directly concluded from the normalization of the density for the periodic part, which is proven in the appendix of [
19]. In 
Figure 3, we show the result when the conditional densities are unchanged. The density is now much smoother but still shows significant changes at the borders of the different areas and is not yet an accurate approximation of the original density.
To obtain smooth transitions between the areas, we drop the assumption (A2) and consider potentially (different) conditionals within each area. Since we only have parameters of 
n Gaussians, we explore ways to generate other conditionals based on these parameters. For the conditional at a grid point 
, the parameters 
 and 
 should always be used. For any other point 
, we determine the points 
 and 
 (for 
, 
 shall be 1) that are closest to 
 in respect to the distance 
. Then, we calculate a value that describes how much of the distance between two consecutive grid points (which is 
) lies between 
 and 
 using
        
The corresponding factor for  is .
As the first of two alternative ways to obtain other conditionals, we propose to also drop (A3) and use the Gaussian mixture
        
        instead of the Gaussian at the closest grid point. This leads to a much smoother result, as can be seen in 
Figure 4. However, 
 may be multimodal for 
.
Our second option is to preserve (A3) and use a convex combination of the Gaussians’ parameters as the parameters of a new Gaussian according to
        
This guarantees that the conditional density of 
 is always unimodal. In the formula for the covariance, we use that the convex combination of symmetric positive definite matrices is always symmetric positive definite. While this approach leads to a good approximation quality, we did not observe a clear superiority over using (
13). For our considered example, the plot is visually indistinguishable from 
Figure 4, which is why we have omitted it. This approach may be more expensive in practical implementations since a different covariance matrix is used for every point at which the pdf is to be evaluated. The other two approaches involve evaluating at most 
n differently parameterized Gaussian densities, regardless of the number of points at which the pdf should be evaluated.
  3. Filter Derivation
In this section, we derive the update and prediction steps of the S3F. A formula that we will require in this section is the fundamental Gaussian identity (App. D, [
26])
      
      which can be applied for symmetric positive definite matrices 
 and 
, the matrix 
, and vectors 
, 
, and 
 with compatible sizes.
  3.1. Update Step
The update step is performed to improve our parameters based on measurements. For simplicity, we say a single measurement is obtained at time step 
t and denote it by 
. In our filter, we work with measurement models in the form of a likelihood function 
. The likelihood function (which is not necessarily a density) describes the probability that the measurement 
 is obtained when the state is 
 (we use ; in brackets to denote that a vector is stacked vertically). The update step is based on Bayes’ rule, leading to
        
        with 
 denoting the posterior and 
 the prior density. We now focus on obtaining the unnormalized prior 
, which we can easily normalize, as explained in 
Section 2.2.
To simplify the notation, we omit all time indices (in the update step, it is always 
t) except in the index of the state and measurement. We also omit conditioning on measurements for additional brevity (refer to (
16) to see which density is conditioned on which measurements). In the previous prediction step (or the initialization), we obtained the prior density
        
For the likelihood function, we also introduce the assumptions (A1) that 
 for all 
, (A2) that 
 for all 
, and (A3) that all conditionals are Gaussians, leading to
        
Since the assumptions may not be fulfilled in general, 
 will generally not be equal to the original likelihood 
. We assume we have the grid values 
 or have the marginalized density for the periodic part 
 and can evaluate it at any point. If integrals are required to determine (or approximate) any of the parameters, this has a significant negative impact on the run times. However, in many settings (see, e.g., our evaluation scenario in 
Section 4.1) integrals are not required to obtain the likelihood function.
The likelihood function is special in multiple ways. First, while it is nonnegative everywhere, it may not be and does not need to be normalized. The multiplication of the likelihood and the prior density is generally unnormalized, and thus, a normalization step is anyway required at the end of the update step. Second, the likelihood may only depend on a certain part of the state. If the likelihood does not depend on the linear part, one can just update the grid values as in the regular grid filter [
19] without adapting the parameters of the Gaussians. If the likelihood does not depend on the orientation part, we set all 
 to 1. It is important to still perform the update step as described in this subsection and not merely perform Kalman filter updates for updating the Gaussians’ parameters in each region. This will become evident in our derivation.
We first introduce the assumptions (A1) and (A2) to obtain
        
Then, we use (A3) and (
15), leading to
        
        with
        
As the parameters of the unnormalized posterior density, we thus obtain the parameters of the Gaussians according to (
21) and (
22) and the grid values using
        
The parameters of the normalized posterior density can be calculated as explained in 
Section 2.2 using 
. The Gaussians’ parameters are unchanged by the normalization. From the formulae, we can see that the update step has a run time in 
.
No negative values can result from the formulae for the update step. However, it is possible that all new grid values are zero. In this case, using a higher grid resolution or employing a different filter should be attempted.
  3.2. Prediction Step
The prediction step is based on a description of the system model in the form of a transition density 
, which describes the probability of transitioning to state 
 when the current state is 
. The foundation for describing the predicted density for the next time step is the Chapman–Kolmogorov equation
        
        for our state space. The formula on the right-hand side can be implemented as two consecutive operations. The first is to obtain the joint density 
 and the second is to marginalize 
 and 
 out.
We now approximate the transition density using a suitable parametric density. As in the update step, we omit the time indices for the functions (refer to (
24) for these) and the measurements on which the density is conditioned. We split the transition density into a marginalized density for the periodic part and a conditional density according to
        
To arrive at a feasible prediction step, we now assume that the orientation at time step  is conditionally independent of the position at time step t given the orientation at time step t. When this is the case, a transition density  may be directly available. If only the full transition density is available, one can obtain  by marginalizing the position at time step  out of  and discarding the (irrelevant) dependency on the position at time step t. When the conditional independence does not hold, one may consider conditioning on a specific value for the position part, e.g., the current estimate of the filter or the mean vector in the respective area.
The key idea of the prediction step is to consider the transition density as a function (not a density) of . By reordering the parameters, we can see it as a function of  Then, we can consider the toroidal manifold  as the periodic domain that we subdivide. For compatibility with , we subdivide  equally along both axes so that we obtain the areas . For each area , the corresponding grid point is . Then, we can get a similar grid-based representation as previously discussed by introducing some assumptions that resemble (A1)–(A3). Combined with the assumption introduced in the previous paragraph, we arrive at the list of assumptions
- (B1)
-  holds for all . 
- (B2)
- The function value of  in each area in  is identical to the grid value for this area, i.e.,  for all , . 
- (B3)
- The conditional density  is the same function for all ,  (we denote it by ). 
- (B4)
- Each conditional density can be written as
             - 
            and we have all relevant system matrices  - , inputs  - , and covariance matrices  - . 
With (B3), we assume that for a fixed orientation, the transitional part follows a linear system model. Writing it using random vectors (written in bold font), the model can be described by
        
        with fixed and known system input 
 and system noise 
. If 
 is not a Gaussian, we propose to approximate it with one via moment matching. The input 
 can be different for each area and can thus depend on the orientation. If it is a function in the form of 
, one can use 
 as 
, i.e., 
. Similarly, the input can also be a function of 
 and 
. While we never explicitly consider inputs for the orientation part, they can be considered by the nonlinear system model described by 
.
In our description of the transition density, we shall from now on omit the time indices of the input vector and the system and covariance matrices, which is always 
t. Based on its parameters, the transition density is given by
        
Due to potential approximations involved, the density may be unnormalized. By default, we renormalize after the prediction step (if necessary) and do not perform any normalization on the parametric transition density.
We start our derivation of the predicted density by writing out and reformulating the joint density as
        
We define 
 and marginalize all components of the state at time step 
t out to obtain the predicted density
        
We now introduce the assumption (B4) and use that the value of the pdf of a Gaussian at a point only depends on the distance between the point and the mode to rearrange the parameters of the first Gaussian. This allows us to employ (
15). Thus, we get
        
        in which the precise values of 
 and 
 are irrelevant. Since 
 is the only term that depends on 
, we can push back the integral sign to that term. Then, it is evident that the integral is 1 because we integrate a Gaussian density over the whole domain.
By adapting the parameters of the remaining Gaussian density in a way that does not change the function value, we obtain
        
Finally, we pull the factor 
 out of the second sum to obtain
        
This is already reminiscent of the desired form (
3). The factors in front of the second sum can be directly combined into the grid values for the predicted density 
. However, instead of a single Gaussian, we now have a mixture of Gaussians on 
 with the weights 
.
To prevent an increase in the number of parameters and an increase in the complexity of the filter over time, we perform mixture reduction. In our current implementation, we always reduce the mixture to a single Gaussian using moment matching. However, it would also be possible to perform a different reduction and preserve more components with alternative approaches [
27]. Our use of moment matching is motivated by its speed and ease of implementation. Moment matching yields
        
        for the predicted mean of the 
ith area. For the covariance, we need to include both the part stemming from the individual covariances and the additional part caused by the differences in the means. For the area 
i, we obtain the formula
        
        with 
 denoting the sample covariance of the weighted zero mean (note that we subtract 
) sample set. For a set of vectors 
 and a vector of weights 
, we obtain the values of the sample covariance according to
        
        in which 
 is the 
lth component of the vector 
. Determining this sum is in 
. It has to be determined for all 3 (unique) entries of the covariance matrix.
Since both the mean and the covariance can be determined in , the total effort of the mixture reduction is in . Since n mixture reductions are required, the total effort involved to determine all new means and covariances is in . Calculating each of the n entries of  also involves summing over n values, and thus, the calculation of the grid values is also in . With both operations being in , the total effort of the prediction step is in .
  4. Evaluation
In our evaluation, we consider a tracking scenario on 
 and compare our novel S3F with a particle filter (PF), the progressive 
 Bingham filter (ProgSE2BF) [
14], and the unscented Kalman filter for manifolds (UKF-M).
The PF, ProgSE2BF, and S3F are part of the latest version of libDirectional [
28]. For the UKF-M, we use the implementation of [
13]. The UKF-M involves a parameter that can be tuned, which is called 
 in [
29]. This parameter influences how far away from the mean the sigma points of the UKF [
30] are placed. [
13] recommends values in the range 
. The implementation allows specifying different parameters for the state uncertainty, the uncertainty introduced in the prediction step, and the measurement uncertainty. We set all entries of the vector to the same value and considered the values 
, 
, 
, and 1. While the results were similar for the different parameters, using 1 led to the best results. Therefore, we used this parameter in our evaluation.
  4.1. Scenario Description
In the considered scenario, we track a vehicle for 50 time steps. Its initial position is distributed according to 
 (with 
 denoting a zero vector and 
 an identity matrix), and its orientation (describing its heading angle) 
 is distributed according to a von Mises distribution (Section 3.5.4, [
25]) with 
 and 
 (i.e., 
) in each run.
The vehicle moves approximately one unit per time step in the direction it is facing. Random noise is applied to both the position and the orientation. The system model is thus
        
        with 
 and 
 for all 
. The uncertainty in the orientation influences the uncertainty in the position as 
 influences 
. Thus, while 
 and 
 are independent, the noise in the angle influences later positions as 
 influences 
, which in turn influences 
. In each run, we drew the initial state from the initial prior and then generated the new states by applying the system model and sampling from the involved distributions. Thus, the path was different for each run. We provide four sample trajectories in 
Figure 5.
We can only measure the position and not the heading angle of the vehicle. The measurement model is  with  for all . This makes it challenging for filters since they have to derive information about the heading angle based on the dependency of the heading angle and position over time.
  4.2. Models Used by the Filters
In this subsection, we describe how we applied the filters to the considered scenario. If possible, required conversions are done offline once and the results are used for all runs of the filter.
For the initialization of the PF with 
n particles, 
 and 
 were sampled 
n times. These samples were then used as equally weighted samples for the state representation of the PF. For the ProgSE2BF, we draw 100,000 samples from 
 and 
 and fit a density for dual quaternions states [
31] to the samples.
Due to the independence of the distributions for the initial position and angle, we can directly store the function values on the grid in the vector 
 and initialize all Gaussians with 
 and 
 for 
 for the S3F. For the UKF-M, we approximate the von Mises distribution with a wrapped normal (WN) distribution (Section 3.5.7, [
25]) 
 via moment matching. The covariance matrix for the UKF-M is then a block diagonal matrix with the squared standard deviation 
 of the WN and the covariance matrix 
 for the position on the diagonal.
Concerning the system model, we can sample the noise distributions for the PF and apply (
36) (with samples instead of random variables) to the samples to obtain the samples of the predicted density. For the ProgSE2BF, we draw 100,000 samples from the system noise 
 and fit a density for dual quaternion states to it. This has to be done only once offline because the resulting density can be used in all runs and time steps.
For the S3F, we evaluate  on the two-dimensional grid to obtain the grid values . This is done in a precomputation step offline. As the input , we use , which describes the change in the position in one time step for the orientation .
For the UKF-M, we use an existing model for tracking on  for scenarios involving gyroscopes. We provide a gyroscope reading of 1  per time step (which is the true velocity when disregarding noise) to the model. For the system noise, we match  with a WN distribution  and provide  to the filter. In the gyroscope-based model, the position noise needs to be given as longitudinal and transversal shifts. While our uncertainty in the position is given relative to the world coordinate system, our system covariance matrix for the position is , and this covariance matrix does not change under rotations of the coordinate system. Thus, we use  as the system covariance for the position for the UKF-M.
All filters except the UKF-M use likelihood functions. In the PF, the particle weights are multiplied with the function values of the likelihood at the particle positions. Afterward, a resampling step is performed. For the ProgSE2BF, we use the progressive update step that uses a likelihood function. The parameter for the progression (called 
 in [
14]) is set to 
. For the S3F, all grid values 
 are set to 1 since the measurement does not provide any information on the orientation. The measurement 
 is used as the mean 
 for all Gaussians and 
 is used for all covariances 
. A measurement function that extracts the state from the position is provided to the UKF-M. The employed covariance matrix is the covariance matrix of the measurement noise, i.e., 
.
  4.3. Evaluation Metrics
For all filters, we measured the run times and divided them by the number of time steps to provide the run time per time step. The run times were measured on an Intel Xeon Gold 6230 running at 2.1 GHz with 3 GB of RAM allocated to the job. Matlab 2021a ran with a single thread on Red Hat Enterprise Linux 8.2.
To assess the estimation quality of the filters, we consider the accuracy of the position and orientation estimates. We consider them separately since one would have to specify a weighting factor when combining them, and any chosen value may appear arbitrary. Multiple runs were performed, and the errors and run times of all individual runs were condensed into a single value by determining their average.
For the PF, we discarded the part of the vector describing the orientation to marginalize it out and then determined the mean of the particles to obtain the position estimate. We did this likewise for the orientation component using the mean direction (Section 2.2.1, [
25]) instead of the mean. For the S3F, we determined the mean of the linear part by marginalizing out the orientation part. This can be done analytically by determining the vector 
, weighting the Gaussians according to 
, and finally extracting the mean of the Gaussian mixture (see (
34)). The circular mean of the weighted grid points is used as the estimate for the orientation.
The average Euclidean distance between the true and the estimated position is used as the position error. As error criterion for the periodic part, we use the topology-aware distance (
1). For both the position and orientation, we compare the estimation accuracy after the update step. Since the position (and thus the measurement) is influenced by the previous orientation, it contains information on it. However, no information on the change from the previous to the current time step can be obtained. Hence, the accuracy of the orientation estimate is inherently limited by the uncertainty in the system evolution from the previous to the current time step. Therefore, not even a noise-free position measurement could lead to an elimination of all uncertainty in the orientation component.
We will also consider how often the filters failed during the evaluation. The PF can fail when all particle weights are zero after an update step. Since the likelihood is nonzero everywhere, this can only happen due to numerical imprecision (double precision was used). The S3F could potentially fail if all grid points are 0 in some time step.
Based on these criteria, we performed two evaluations. In the first, we extracted the estimates in all time steps and considered the error over the course of the time steps. While the filter results were highly variable in a single run, we also determined the averages over multiple runs to see whether the filter performance degrades over time or stays constant. This evaluation also showed us when the effect of the prior density had mostly vanished for the tracker. We chose to use only 100 particles for the PF and 15 areas for the S3F for this evaluation to see if the filters tend to diverge for low numbers of particles or areas.
In the second evaluation, we aimed to assess the accuracy of the filters during longer tracking tasks. For this, we chose to only consider the accuracy at the 50th time step. Our first evaluation showed that the filters had reached a stable accuracy by then. We did not average over the errors in all times steps to eliminate the effect of the first time steps, in which the prior density still influences the filter result or the filter has not yet reached a stable accuracy. Further, averaging over the errors in all time steps would have made us average over correlated errors. The run times are only considered in the second evaluation. We did not extract an estimate in every time step in this evaluation, and thus, the run times measured exclude the extraction of an estimate.
  4.4. Evaluation Results
In the first subsection, we provide the results of our first evaluation, in which we only considered specific configurations for all filters. The results of the second evaluation, in which we considered different numbers of particles for the PF and areas for the S3F, are given in the second subsection.
  4.4.1. Evaluation Considering All Time Steps
We first consider a single run and depict the position errors in 
Figure 6 and the orientation errors in 
Figure 7. We can see that the error goes up and down over time, which is what we had expected. While most filters work well on average, the system can evolve in a much different way than we expect it to do on average due to the effects of the random variables in the system model. Furthermore, with the considered Gaussian measurement noise, the obtained measurements in each time step can have arbitrarily large errors, just with small errors being more likely than large errors. From the results for a single run, it is not evident which filter performs best. It is merely evident that the PF with 100 particles once goes off track in its position estimate and appears to be worse overall.
To allow for a better comparison of the filters, we averaged the errors in all time steps over 200,000 runs. The results are shown in 
Figure 8 and 
Figure 9. None of the considered filters failed in any of the runs. We can see that the S3F with 15 areas performed best both concerning the position and the orientation estimates. The UKF-M yielded better position estimates than the ProgSE2BF. However, the ProgSE2BF provided better orientation estimates. The PF only outperformed the UKF-M in the beginning and was the worst filter at the end of the run. We can see that the UKF-M is slow to converge to a stable quality. When it does, it performs better than the PF with 100 particles. However, as we will see in the next section, the PF can provide better estimates than both the UKF-M and the ProgSE2BF given a sufficient number of samples. Due to the high number of runs, the oscillating behavior of the UKF-M in the error in the orientation component is certain not to be purely noise and may be attributed to its suboptimal handling of the information on the orientation. An important observation for the second evaluation is that it is fair to compare the filters at the 50th time step if one wants to analyze the long-term accuracy of the filters.
  4.4.2. Evaluation of the Last Estimates and Run Times
The filters were evaluated based on 70,000 runs in this evaluation. Several configurations were considered for the PF and S3F. The PF with 5 particles failed in 7014 runs and the PF with 11 particles in 145 runs. No failures were observed for all the other configurations and filters.
In 
Figure 10 and 
Figure 11, we show the average errors over the number of areas or particles. The ProgSE2BF and UKF-M have a fixed number of parameters and are therefore shown as straight lines. Using only 15 areas, the S3F achieved a higher mean accuracy than a PF with 1,000,000 particles. Using 7 areas, the S3F achieved better accuracy than the ProgSE2BF for the position and orientation. The UKF-M is better than the S3F with 3 areas but worse than the S3F with 7 areas. The PF achieves a better accuracy than the ProgSE2BF when using 700 particles. Using 15 areas, the S3F already achieved very high accuracy. The difference to the accuracy obtained using 30 areas is only in the order of 
. The accuracy of the S3F with 15 areas is also higher (although only slightly) than that of a particle filter using 1,000,000 particles. We believe this accuracy to be close to the accuracy an optimal Bayes filter would achieve in this scenario.
The run times of the filters are shown in 
Figure 12. We can see that the S3F is much slower per area than the PF per particle, which was expected. We also observe an expected faster increase in the run time for the S3F. While the PF is in 
, the S3F is in 
. The S3F is faster than the ProgSE2BF in all considered configurations. The PF with 25,000 particles had a run time similar to that of the ProgSE2BF but required much more run time with 1,000,000 particles. The UKF-M was faster than all other filters except the PF with 21 or fewer particles.
For a fair comparison of the filters, which have very different run times, we compare the errors over the run times in 
Figure 13 and 
Figure 14. A configuration to the lower left of another is strictly better because more accurate results were achieved using less run time. Configurations to the upper left were faster but provided less accurate results. Ones to the lower right were slower but better in terms of accuracy. The latter two cases do not allow for a clear ranking. The plots can be used to assess which filters should be considered given certain run time constraints. However, they can only serve as a rough guideline since the run times highly depend on the scenario, implementation, and hardware.
The S3F using 3 areas is worse than the PF with 300 and 700 particles. However, for configurations of the PF with 5000 particles and more, one can find at least one configuration of the S3F that performs better. In the configuration using 15 areas, the S3F still uses less than 3 ms per time step and is faster and more accurate than the PF with 1,000,000 particles.
Using 7 areas, the S3F achieves better accuracy than the ProgSE2BF at lower run times. The ProgSE2BF is slower than the UKF-M and provides worse position estimates, but its orientation estimates are better, and thus, none of the two filters is clearly superior to the other. The UKF-M is better than the S3F using 3 areas. However, using 7 areas, the S3F yields more accurate results than the UKF-M while taking only less than  ms per time step. The PF provides more accurate results than the UKF-M in the configuration with 700 particles, in which it takes less than  ms per time step.
  5. Discussion, Summary, and Outlook
In the first subsection, we compare our novel filter with other state-of-the-art approaches based on their theoretical properties and evaluation results. The conclusion of our paper is provided in the second subsection. The third subsection concludes the paper with an outlook.
  5.1. Discussion
While some existing approaches, such as the UKF-M and the ProgSE2BF, use a fixed number of parameters, the number of parameters can be increased to arbitrary numbers in the S3F and the PF. Generally, a higher number of particles in the PF or areas in the S3F leads to better results.
If good results at rapid run times are desired or only very low computation power is available, the UKF-M may be a good choice. The S3F should be preferred over the ProgSE2BF as the S3F can achieve better results at lower run times. Generally, both the PF and S3F can achieve higher accuracy than the UKF-M and the ProgSE2BF. Thus, if sufficient computation power is available, one should rather employ the S3F or PF instead of the UKF-M or ProgSE2BF.
While the run time of the PF only increases linearly in the number of particles, the accuracy of its estimates may converge only slowly. This is also reflected in the evaluation results in 
Section 4.4. The increase in the run time for the S3F is linear in the update step and quadratic in the prediction step, leading to a total complexity of 
. However, the convergence of the accuracy is so much faster than for the PF, and almost optimal results were achieved with relatively low run times. Hence, we do not recommend using a PF with a high number of particles but rather an S3F. Another advantage of the S3F is that it is deterministic, while the PF is non-deterministic (unless one uses a fixed seed for the random number generator).
In our evaluation, good results were obtained, even though none of the considered densities fulfilled the assumptions (A1)–(A3) and (B2)–(B4). However, higher numbers of areas may be required for high-quality estimates when the densities are more concentrated.
Different ways to provide a continuous density exist for the S3F. No densities are inherently provided by the PF. A limitation of the S3F is that one has to provide a transition density for the orientation that is conditionally independent of the previous state. If no such transition density can be provided and no good approximation can be generated using information from the filter, one may consider using the PF instead.
  5.2. Summary
In this paper, we presented a novel filter for 
 for which we rewrote the joint density for the full 
 state as a marginalized density for the orientation and a conditional density for the position. By subdividing the domain along the periodic dimension into multiple areas, we were able to present an amalgamation of a grid filter and a Kalman filter. While the S3F for 
 states requires more operations than a 1-D grid filter for 
 that permits the use of arbitrary transition densities [
20], it is also in 
 and thus in the same complexity class.
Our filter does not support all kinds of system models in the prediction step. Further, since densities do generally not fulfill the assumptions introduced in this paper, the filter involves approximations. However, in our evaluation, the S3F showed a much better convergence than the PF and achieved a very high accuracy using a low number of areas. Using 15 areas, higher accuracy was achieved than with a PF with 1,000,000 particles. The S3F also outperformed the ProgSE2BF by achieving a slightly higher accuracy using considerably less run time. While not as fast as the UKF-M, the S3F can achieve far higher accuracy.
  5.3. Outlook
Future work will involve applying the S3F to 
 [
32] and other manifolds that can be written as a Cartesian product of periodic (or bounded) and linear manifolds. As another possible extension, using ideas of the extended or unscented Kalman filter in the prediction step may help to allow for a certain degree of nonlinearity in the system model for the position.
Throughout this paper, we assumed the system and measurement models to be precisely known. Future work may concern approaches for imprecisely known models. One way to deal with imprecision in the parameters of linear models on linear domains is to employ robust filters [
33,
34]. For 
, even if, e.g., only the measurement model for the position is imprecise, this will have to be also considered for the orientation as the position and orientation are generally dependent.
Finally, adaptive subdivisions may help to keep the number of areas low without sacrificing too much accuracy. The basis for this could be adaptive grids as used in [
35]. Reducing the required number of areas is particularly important for higher dimensions, e.g., for estimating the pose of a drone along with sensor drifts, biases, and other relevant quantities.