Kalman Filtering for Attitude Estimation with Quaternions and Concepts from Manifold Theory

The problem of attitude estimation is broadly addressed using the Kalman filter formalism and unit quaternions to represent attitudes. This paper is also included in this framework, but introduces a new viewpoint from which the notions of “multiplicative update” and “covariance correction step” are conceived in a natural way. Concepts from manifold theory are used to define the moments of a distribution in a manifold. In particular, the mean and the covariance matrix of a distribution of unit quaternions are defined. Non-linear versions of the Kalman filter are developed applying these definitions. A simulation is designed to test the accuracy of the developed algorithms. The results of the simulation are analyzed and the best attitude estimator is selected according to the adopted performance metric.


Introduction
Mechanical state estimation of a vehicle is a field of interest. A vehicle is considered a rigid body, and its state of motion is represented by 4 mathematical objects: two of them represent its position and velocity, and the other two represent its orientation, and angular velocity. This paper is focused on the estimation of the angular state, composed of orientation, and angular velocity.
Although there are other mathematical tools used for estimation [1], the Kalman Filter [2] has become the algorithm par excellence in this area. Because of its simplicity, the rigor and elegance in its mathematical derivation, and its recursive nature it is very attractive for many practical applications. Its non-linear versions have been widely used in orientation estimation: the Extended Kalman Filter (EKF), and the Unscented Kalman Filter (UKF) [3]. However, there are problems arising from the used parametrization to represent the orientation.
The orientation of a system is represented by the rotation transformation that relates two reference frames: a reference frame anchored to that system, and an external reference frame. A thorough survey of attitude representations is provided in Reference [4]. The parametrization used to represent the rotation transformation could be singular, or present discontinuities among others. Table 1 summarizes the main characteristics of the most used parametrizations.
Having in mind that the special orthogonal group SO(3) has dimension three, ideally we would seek for a continuous and non-singular representation expressed by 3 parameters. However, since 1964 we know that "...it is topologically impossible to have a global 3-dimensional parametrization without singular points for the rotation group" [5]. Knowing this, we would not be wrong to say that unit quaternions are the most convenient representation we have, and that we will have for orientations. In Reference [6] the literature on attitude estimation is reviewed until 1982, when other

Quaternions
Quaternions are hypercomplex numbers composed of a real part and an imaginary part. The imaginary part is expressed using three different imaginary units {i, j, k} satisfying the Hamilton axiom: A quaternion q can be represented with 4 real numbers, and using several notations: We will denote quaternions with bold italic symbols ( q ), while vectors will be denoted with bold upright symbols ( q ). Vectors will be written in matrix form, and the transposed of a matrix M will be denoted as M T .
Quaternion product is defined by Equation (1) which produces the multiplication rule where (·) represents the usual dot product, and (×) represents the 3-vector cross product. Note that the quaternion product ( * ) is different from the product denoted by (⊗) in reference [4]. Given this multiplication rule, the inverse of a quaternion q (the one for which q * q −1 = q −1 * q = 1) is given by where q * represents the complex conjugate quaternion. Note that if q is a unit quaternion (a quaternion with q = 1), then q −1 = q * .

Quaternions Representing Rotations
Each rotation transformation is mapped with a rotation matrix R , and with two unit quaternions q and −q all of them related through Note that R(q) = R(−q) . Quaternions representing rotations have the form q = cos(θ/2) , q sin(θ/2) T , where q denotes the unit vector that defines the rotation axis, and θ the angle of rotation. Having this form, they satisfy the restriction q 2 = q 2 0 + q 2 1 + q 2 2 + q 2 3 = 1.
This means that quaternions describing rotations live in the unit sphere of R 4 , S 3 . This space is a manifold, and some concepts regarding these mathematical objects are useful in our context. In particular, the concept of chart is of special interest.

Distributions of Unit Quaternions
When dealing with the Kalman filter, the distribution of a random variable x is encoded by its mean x = E[ x ] and its covariance matrix P defined as This definition makes sense when our random variables are defined in the Euclidean space. But how do we define the covariance matrix of a random variable living in a manifold like ours? How can we define the covariance for unit quaternions if q − q does not represent a rotation? (Unit quaternions form a group under multiplication, but not under addition. This means that the addition of two unit quaternions may not result in another unit quaternion. Therefore, the addition of two unit quaternions may not represent a rotation.) What would be the covariance matrix if each quaternion was equiprobable in the unit sphere? We cannot redefine the covariance matrix, because the Kalman filter uses this precise form in its derivations, but we can take advantage of the properties of a manifold. Let us retrieve some important definitions: Definition 1 (Homeomorphism). A homeomorphism is a function f : X → Y between two topological spaces X and Y satisfying the next properties: If such a function exists, we say that X and Y are homeomorphic.
Definition 2 (Manifold). A n-manifold M n is a topological space in which each point is locally homeomorphic to the Euclidean space R n . This is, each point x ∈ M n has a neighborhood N ⊂ M n for which we can define a homeomorphism f : N → B n , with B n the unit ball of R n .

Definition 3 (Chart).
A chart for a manifold M n , is a homeomorphism ϕ from an open subset U ⊂ M n , to an open subset of the Euclidean space V ⊂ R n . This is, a chart is a function with ϕ a homeomorphism. Traditionally, a chart is expressed as the pair (U, ϕ) .
Given these definitions we can continue our reasoning.
In Reference [8] it talks about four "attitude error representations". Namely, the one we will call Orthographic (O), the Rodrigues Parameters (RP), the Modified Rodrigues Parameters (MRP), and the Rotation Vector (RV). The first three are what we know as stereographic projections (and are called Orthographic, Gnomonic, and Stereographic respectively). The last one is a projection called Equidistant. But all four are charts defining a homeomorphism from the manifold S 3 to the Euclidean space R 3 . This is, they map a point q in the manifold with a point e in R 3 . Table 2 arranges these chart definitions, together with their domain and image. We must ensure the charts to be bijections so that they properly define a homeomorphism, and that they do not map q and −q with different points of R 3 since they represent the same rotation. We achieve this by the given definitions of the domain and image for each chart. Figure 1 shows how points in the sphere S 2 (subspace of the sphere S 3 , where quaternions live) are mapped to points in R 2 (subspace of R 3 , where the images of the charts are contained) through each one of the named charts. Since our charts are homeomorphisms, it is possible to invert the functions. Figure 2 shows how points from R 2 are mapped to points in the manifold through the inverted charts. As pointed in Reference [8], all four charts provide the same second-order approximation for a point e ∈ R 3 near the origin, to a quaternion q ∈ S 3 :  We should notice that having R 3 and S 3 different metrics, a chart ϕ will inevitably produce a deformation of the space. However, for quaternions in the neighborhood of the identity quaternion (top of the sphere), our charts behave like the identity transformation between the imaginary part of these quaternions, and the points near the origin in R 3 , as suggested by (10). This is a desirable property, as this means that the space around the identity quaternion closely resembles the Euclidean space, which is the space for which the Kalman filter is designed. But this just happens in the neighborhood of the identity quaternion. However, we can extend this property for any quaternion q ∈ S 3 noting that any quaternion q ∈ S 3 can be expressed as a "deviation" from the first one through the quaternion product: where δ q represents such a deviation. (This definition is arbitrary: we could have chosen to relate the quaternions through q = δ q * q , but it is important to establish one of these definitions, and then be consequent with it. However, (11) entails a computational advantage for the computation of (37).) Then, we define a chart ϕ q for each quaternion q ∈ S 3 as where δ q = q * * q , and where we have denoted the point of the Euclidean space mapped with the quaternion q ∈ S 3 through the chart ϕ q as e q . Then, we will have a set of charts ϕ q q , each one resembling the Euclidean space around a quaternion q ∈ S 3 , and mapping this last quaternion to the origin of R 3 . We will refer to the Euclidean space associated with the chart ϕ q as the q-centered chart. Thus, the homeomorphism ϕ −1 q takes a point e q in the q-centered chart and maps it to a point q in the manifold through After reviewing these concepts, we can define the covariance matrix of a distribution of unit quaternions.
Given a unit quaternion q and a chart ϕ , we will define the expected value of a distribution of unit quaternions in the q-centered chart as and its covariance matrix as and the probability density of each unit quaternion q would be defined through the homeomorphism q = ϕ −1 q ( e q ) . Then, a distribution of unit quaternions needs of four mathematical objects to be encoded: ϕ , q , e q , P q . Although a distribution of unit quaternions is unique, given this definition, its expected value e q and its covariance matrix P q may take different values depending on the chosen quaternion q and chart ϕ . However, knowing that the Kalman filter is designed for the Euclidean space, it will be convenient to choose a unit quaternion q central in the distribution, in order that the manifold space around it closely resembles the most significant region for the covariance matrix in the q-centered chart. It is particularly convenient to choose a quaternion q such that e q = 0 , so that the covariance matrix is centered in the origin of the q-centered chart.

Transition Maps
At some step of the Kalman filter, we will have a distribution of unit quaternions defined in a q-centered chart, and we will be interested in expressing our distribution in another p-centered chart. The concept of transition map is relevant for this purpose.
For the present case, let us consider two unit quaternions p and q both related through These two quaternions define the charts ϕ p and ϕ q . We build the transition map that relates a point e q expressed in the q-centered chart with a point e p expressed in the p-centered chart doing That is to say, first we take the point e q in the q-centered chart, and we obtain its associated quaternion q in the manifold using ϕ −1 q . Then, we transform this quaternion q to a point e p in the p-centered chart. Nevertheless, knowing the quaternion δ we do not need to explicitly compute q . In fact, being able to express the same quaternion q as two different deviations, Note the equivalence of expressions (18c) and (19). Table 3 displays the transition maps for the charts studied. The detailed derivations of these transition maps can be found in Appendix A. Figure 3 attempts to provide some insight into how points are transformed through the transition map of each chart. Table 3. Transition maps for the charts studied.

Chart
Transition Map e p e q

Manifold Kalman Filters
In this section we present the models adopted for the Manifold Kalman Filters (MKF), and we display the resulting algorithms.
The state of the system at a time t is defined by an orientation, encoded with a unit quaternion q t , and by an angular velocity ω t . We will consider them to be random variables, and we will try to estimate their value using a Kalman filter.
Our unit quaternions {q t ∈ H : q t = 1} will define the rotation transformation that relates a vector v t expressed in a reference frame S attached to the solid whose state we want to describe, with the same vector v t expressed in an external reference frame S : For example, if we measure an acceleration a t in reference frame S , the acceleration in the inertial reference frame S would be given by a t = R(q t ) a t . This acceleration would be the one that we would have to integrate to obtain the position estimated by an accelerometer.
The vector ω t will define the angular velocity of the solid measured in S . Note that we do not include the bias of the sensors in the state of our system. We will assume that our sensors are calibrated, so the biases are zero.
We can predict the value of the random variables that describe the state of our system through the following motion equations: where q ω (t) is a random variable that represents the process noise, and is associated with the torque acting on the system, and with its inertia tensor. Its expected value at a given time t will be denoted as q ω t , and its covariance matrix will be denoted as Q ω t . We will assume that we have sensors giving measurements of angular velocity ω m t (which provide information about the relative change in orientation), and of a vector v m t whose value v t , expressed in the external reference frame S , is known (this provides information about absolute orientation). Examples of such sensors could be a gyroscope giving angular velocity measurements, an accelerometer measuring the gravity vector near the Earth surface (v t := −g), or a magnetometer measuring the Earth magnetic field (v t := B). The measurement model relates these measurements with the variables that describe the state of the system: where r ω t and r v t are random variables with zero mean and covariance matrices R ω t and R v t respectively that represent the measurement noises, and q v t is another random variable with mean q v t and covariance matrix Q v t representing external disturbances in the measurement of the vector v t . For example, it could represent accelerations others than gravity for an accelerometer, or magnetic disturbances produced by moving irons for a magnetometer.
We will assume that the measurements arrive at discrete times {t n } n . The format x t|t n will be used to denote a variable x at a time t , having included measurements up to a time t n with t > t n . For the n-th time stamp, in which a measurement arrives, we will write x t|n for the sake of simplicity. Then, our knowledge about the state at a time t , having included measurements up to a time t n with t > t n , is described by a distribution encoded in the collection of mathematical objects ϕ , p , x p t|n , P p t|n as described in Section 2.3. For the present case, x p t|n = e p t|n , ω t|n T is the expected value of the distribution, and P p t|n is its 6 × 6 covariance matrix, both expressing the quaternion distribution in the p-centered chart. Preferably, p will be a unit quaternion central in the distribution, so that the mapping of points from the p-centered chart to the manifold causes minimal deformation in such distribution. The unit quaternion q t|n = ϕ −1 p e p t|n will be our best estimation of the real quaternion q t that defines the orientation of the system with respect to the external reference frame S at time t .
The following subsections present the developed Kalman filters: one version based on the EKF and another version based on the UKF. The EKF is based on the linearization of the non-linear models to calculate the predicted covariance matrices. That is, the EKF approximates non-linear functions using their Jacobian matrices. To apply the EKF, our functions must be differentiable. On the other hand, the UKF is based on a deterministic sampling to approximate the distribution of our random variables. We select a minimal set of samples whose mean and covariance matrix are those of the state distribution. Then, they are transformed by the non-linear models, and the resulting set of points is used to compute the means and covariance matrices necessary to perform the Kalman update. This second approach does not need the functions to be differentiable.

Manifold Extended Kalman Filter
In this section we present the EKF-based estimator: the Manifold Extended Kalman Filter (MEKF). We offer here the main results of the more detailed derivation given in Appendix B.
A measurement arrives at time t n . Our knowledge about the orientation at a previous time t n-1 is described by a distribution expressed in the q n-1|n-1 -centered chart. We assume that this distribution has mean and covariance matrix P q n-1|n-1 n-1|n-1 . This is, we have an initial four ϕ , q n-1|n-1 , ω n-1|n-1 , P q n-1|n-1 n-1|n-1 .
The state prediction at time t n given all the information up to t n-1 is computed through with The measurement prediction at the same time is given by v m At this point, we compute the Kalman gain K n and use it to obtain the optimal estimation of the state: where x q n|n-1 n|n-1 = e q n|n-1 n|n-1 = 0 , ω n|n-1 T . Finally, we need to obtain the updated unit quaternion, q n|n , and compute the mean and the covariance matrix in the q n|n -centered chart, so that the distribution is expressed in the same conditions as at the beginning of the iteration. The point e q n|n-1 n|n that results from (41), and that is defined in the q n|n-1 -centered chart, correspond to a unit quaternion in the manifold. This is the updated unit quaternion q n|n which we are looking for: Knowing that the Kalman update (41) could produce any point in the q n|n-1 -centered chart we will need to "saturate" to the closest point contained in the image of each chart. The point e q n|n-1 n|n in the q n|n-1 -centered chart is the origin in the q n|n -centered chart. Then, the expected value of the state in this new chart will be given by x q n|n n|n = e q n|n n|n = 0 , ω n|n T , as at the beginning of the iteration.
To update the covariance matrix we need to consider its definition (15). We want to compute P q n|n having P q n|n-1 and knowing the relation e p e q provided by the transition maps in Table 3. Continuing with the EKF philosophy, the update for the covariance matrix will be found by linearizing e p e q around the point where the majority of information is comprised (in our case, the point e q = e q n|n-1 n|n ): where we have used the big O notation to describe the limiting behavior of the error term of the approximation as e q → e q . In particular, if we define then, and the final update for the covariance matrix will be computed through (47b) Table 4 summarizes the resulting T-matrix for each chart, along with their application domain. A detailed derivation of these T-matrices can be found in Appendix C. Table 4. T-matrices for the transition maps of the charts studied.

Chart T( δ ) Matrix Domain
After the final computation we obtain the four that is a condition equivalent to (27) in which we started the iteration.

Manifold Unscented Kalman Filter
In this section we present the UKF-based estimator: the Manifold Unscented Kalman Filter (MUKF).
A measurement z n arrives at time t n . Our knowledge about the orientation at a previous time t n-1 is described by a distribution expressed in the q n-1|n-2 -centered chart. This distribution is encoded in the four ϕ , q n-1|n-2 , x q n-1|n-2 n-1|n-1 , P q n-1|n-2 n-1|n-1 . (49) The first step in the UKF is to create the augmented N×1 mean x n and N×N covariance matrix P n . Since the measurement equations are linear for the random variables r ω t and r v t , we can leave their covariance matrices out of the augmented one and add them later: Then, we obtain the matrix L n which satisfies L n L T n = P n and we use it to generate the 2N+1 sigma points {X j } 2N j=0 as described in Ref. [15]: being W j = (1 − W 0 )/(2N) for j = 0 where W 0 regulates the importance given to the sigma point X 0 in the computation of the mean. These sigma points {X j } j are expressed in the q n-1|n-2 -centered chart. We need to express them in the manifold before applying the evolution equations and the measurement equations: where for the j-th sigma point, X e j is its chart point part and X q j is the quaternion with which it is mapped, X ω j is its angular velocity part, X q ω j is its angular velocity noise part, Y ω j is its angular velocity prediction, Y q j is the quaternion part of its prediction (we have assumed that the angular velocity Y ω j is constant in the time interval [t n-1 , t n ) so that we can use (A20)), X v j is the vector process noise part, Z v j is its vector measurement prediction, Z ω j is its angular velocity measurement prediction, and ∆t n = t n − t n-1 . Note that when applying the inverse chart ϕ −1 we will need to "saturate" X e j to the closest point in the image of ϕ . Having these new sigma points, we can obtain the means and covariance matrices of the distributions present in the UKF. First, defining Z j := Z v j , Z ω j T , the means are computed through x q n|n-1 n|n-1 where we have used a variation of the result provided in Ref. [16]. Namely, with q j · q k > 0 for j, k = 0, . . . , 2N . This result is shown to minimize the fourth order approximation of the distance defined as the sum of squared angles between the rotation transformation represented by each quaternion q j , and the one represented by q . This approach to compute the mean quaternion is extremely efficient, and its derivation is elegant and simple. In order to ensure that q j · q k > 0 , it is useful to remember the property that both q and −q represent the same rotation. This property is also useful for introducing the quaternions in the domain of ϕ to execute the next step of the filter. After this, we use the obtained mean quaternion q n|n-1 to express each sigma point in the q n|n-1 -centered chart, and compute the covariance matrices: where we have denoted Y j := Y e j , Y ω j − ω n|n-1 T . Finally, we compute the UKF version of the Kalman gain K n , and we use it to obtain the optimal estimation of the state: x q n|n-1 n|n arriving at the same conditions in which we began the iteration, with a distribution expressed in the q n|n-1 -centered chart, and encoded by the four ϕ , q n|n-1 , x q n|n-1 n|n , P q n|n-1 n|n .
Our best estimation for the orientation at this time is being e q n|n-1 n|n the part of the mean x q n|n-1 n|n that represent the quaternion in the q n|n-1 -centered chart.
Note that setting q n-1|n-2 := q n-1|n-1 and e q n-1|n-2 n-1|n-1 := 0 at the beginning of each iteration yields the traditional version of the algorithm, where a "reset operation" is performed instead of the covariance matrix update.

Simulation Results
This section presents the results of the simulations used to measure the accuracy of each estimator. Simulations are chosen instead of real experiments because a real system entails an uncertainty in the measurement of the true attitude: the attitude that is used to compare with that estimated by the algorithms. There are sources of error ranging from a miscalibration of the measurement system to a possible bias in the "true attitude" produced by another attitude estimator, which makes it problematic to define an adequate metric to measure the accuracy of the algorithms. For this reason, the authors consider that using a simulation is more reliable to avoid possible biases in the results due to said sources of error. Others have performed similar types of tests [7,17]. However, the results do not seem to be statistically conclusive: only the estimations of some orientation trajectories are shown.
We perform our comparison through a simulation in which we do have an absolute knowledge of the attitude of the system: a true oracle exists in a simulation. Therefore, we can compare the real orientation with the attitude estimated by the algorithms having fed them only with simulated measurements that we obtain from such known orientations. We will extract our performance metrics from a wide set of orientation trajectories in order to obtain statistically conclusive results.
We try to answer three questions with the simulation test. The first question is, is there a chart for which we get a greater accuracy in attitude estimation? The second one is, what algorithm produces the most accurate attitude estimation, the MEKF or the MUKF? The last question stems from the fact that previous algorithms on attitude estimation, such as the Multiplicative Extended Kalman Filter, did not contemplate updating the distribution from one chart to another as done at (47b) in the MEKF. However, their estimators performed well [6,7,12]. Then the third question is, does this "chart update" imply an improvement in the accuracy of the attitude estimation?
Although a simulation has been used to compare our algorithms, these have also been tested with a real IMU. In the Supplementary Materials one can find a demonstration video, the source code used in the video, the source code used to generate the simulations, and the source code used to obtain the computational cost of the algorithms in each platform.

Performance Metric
We have already described a quaternion q as a deviation from another quaternion q as q = q * δ . Now we define the instantaneous error between an estimated attitude, represented by a unit quaternion q , and the real attitude, represented by the unit quaternion q , as the angle we have to rotate one of them to transform it into the other. This is, the angle of the rotation transformation defined by the quaternion δ e such that q = q * δ e . Recalling (6), this angle can be computed as: having previously ensured that q · q ≥ 0 using the fact that both q and −q represent the same rotation transformation.
Angle θ e will vary along an orientation trajectory. Then, we will define the mean error in orientation estimation for a given trajectory starting at time t = 0 and ending at time t = T as Finally, e θ will depend on the followed trajectory, and on the set of taken measurements. We will need to generate several orientation trajectories to obtain the mean value e θ and the variance σ 2 e θ that characterize the distribution of the error in orientation estimation e θ for each algorithm. We will define the confidence interval for the computed e θ as where N s is the number of samples taken for the e θ computation, so that σ 2 e θ /N s is the variance of the sample mean distribution.
Being that the lower the better, the value of e θ gives us a measure of how well an algorithm estimates the orientation. We will consider that the performance of an algorithm A is better than the performance of other algorithm B if e θ (A) < e θ (B) and their confidence intervals do not overlap.

Simulation Scheme
To compute the performance metrics we will need to generate a large number of simulations. Each independent simulation will consist of three steps: initialization, convergence, and estimation.
In the initialization step we set up the initial conditions accordingly to the chosen simulation parameters. This includes generating the initial unit quaternion q 0 from a uniform distribution in S 3 , setting the initial angular velocity ω 0 to zero, setting the update frequency f update , generating the variances of the process noises σ 2 ω and σ 2 v from a uniform distribution in the intervals (0, Q ω max ] and (0, Q v max ] respectively, and initializing the estimation algorithm. The initialization of the MEKF includes setting q 0|0 = 1 , ω 0|0 = 0 rad/s , and P q 0|0 0|0 = 10 2 I . On the other hand, the initialization of the MUKF includes setting q 0|−1 = 1 , e q 0|−1 0|0 = 0 , ω 0|0 = (1, 1, 1) T rad/s , and P q 0|−1 0|0 = 10 2 I . The angular velocity is not initialized to 0 in the MUKF because it has been observed that it is sometimes necessary to "break the symmetry" for the algorithm to converge; especially when we do not apply the chart update (when we perform the "reset operation") for the RV chart. The covariance matrices that appear in both algorithms are initialized as Q ω n = I rads 2 /s 4 , Q v n = 10 −2 I p.d.u. ("p.d.u." stands for "Procedure Defined Unit". In the present case it depends on the definition of the vector v ), R ω n = R ω I rads 2 /s 2 , R v n = R v I p.d.u., where R ω and R v are the variances of the measurement noise that will be used in the simulation. We give this information about the measurement noise to the algorithms because it can be obtained offline, while the information about the process noise cannot. Given that a priori we cannot know how the system will behave, the values of Q ω n and Q v n have been chosen according to what we understand could be normal. Choosing these values we are assuming that after a second it is normal for the angular velocity to have changed by 1 rad/s , and also that it is normal to find external noises added to the vector v t of magnitude 10 −1 p.d.u. . For the mean values we set q ω n = 0 rads/s 2 , and q v = 0 p.d.u. . In the convergence step we keep the system in the initial orientation q 0 . Simulated measurements are generated using (23) and (24). For each measurement, a different v t is sampled from a uniform distribution in the unit sphere of R 3 . The values for each component of q v t , r v t , and r ω t are obtained from normal distributions with zero mean and variances σ 2 v , R v , and R ω respectively. The term R T ( q t ) in (23) is obtained from the true attitude q t , which in the convergence step takes the value of q t = q 0 . The term ω t in (24) is the true angular velocity, which in the convergence step takes the value ω t = 0 . The tested algorithm updates its state estimation until the inequality θ e (t) < θ 0 e is satisfied, where θ e (t) is the value of the error (72), and θ 0 e is a parameter in the simulation. The convergence step could have been replaced by an initialization of the attitude estimated by the algorithm q t to the real value q t , but then it would have also been necessary to fix a certain covariance matrix. Since the metric of the space generated by each chart is different, it is difficult to set a covariance matrix that provides the same information for each chart. It seemed more natural to the authors to allow the algorithm to find the true attitude by its own means, and for the covariance matrix to converge to a value in each case. Finally, in the estimation step we generate a random but continuous orientation sequence using a Wiener process for the angular velocity: where n t is a random vector whose components are sampled from a normal distribution with zero mean and variance σ 2 ω , and δt is the simulation time step that is related to the algorithm time step ∆t trough dtdtsim δt = ∆t , being dtdtsim an integer parameter that determines the simulation updates per algorithm update. Note that we multiply n t by √ δt and not by δt . We do it this way so that the covariance matrix after k steps does not depend on the simulation time step δt . In fact, after a time T = k δt the covariance matrix of the angular velocity will have grown by ∆P ω = k I σ 2 ω δt = I σ 2 ω T , and not by (∆P ω ) = k I σ 2 ω (δt) 2 = I σ 2 ω T δt . After each dtdtsim simulation updates, a simulated measurement is generated in the same way it was done in the convergence step, and the algorithm is updated with it. The simulation will run for a time Tsim = k ∆t , where k is an integer number. This way we will perform the last algorithm update at the end of the simulation. The error (72) will be evaluated after each algorithm update, and it will be added up through the simulation to obtain the averaged error (73). After each simulation, we will obtain a sample for the computation of e θ and σ 2 e θ . We will perform N s of these simulations to obtain the confidence interval (74).

Results
In this section we present the results of the simulations. The algorithms are tested for update frequencies f update = 1/∆t in the interval [2,1000] Hz . This range has been chosen thinking about the possible limitations of a real system. For example, the maximum data rate of a low cost IMU is around 1000 Hz . On the other hand, the update frequency may be limited by processing. The computational cost of each estimator has been evaluated in two platforms: an Arduino MEGA 2560, and a Raspberry Pi 3 Model B. The code has been written in c++. The resulting maximum update frequencies are presented in Figure 4, which indicates that the MEKF can be executed approximately 3 times faster than the MUKF. Although the algorithms have been developed allowing a different ∆t n for each update, the simulations are performed using a constant ∆t , and the simulation parameters depicted in Table 5. The parameters θ 0 e , Tsim , dtdtsim , and N s have been chosen trying to reach a compromise between the precision of the results, and the execution time of the simulation. The values for Q ω max and Q v max have been chosen in such a way that the estimation algorithms face both normal situations ( Q ω n ≈ σ 2 ω I and Q v n ≈ σ 2 v I ) and situations that were not foreseen ( Q ω n = σ 2 ω I or Q v n = σ 2 v I ). A typical low cost IMU has R ω ≈ 10 −4 rad 2 /s 2 and R v ≈ 10 −4 g 2 . The values chosen for R represent an imprecise sensor (10 −2 ), a normal sensor (10 −4 ), and a precise sensor (10 −6 ). The value of W 0 has been chosen so that all sigma points have the same importance, but very similar results, if not identical, have been obtained for other selections of W 0 .

Chart Choice
The results of the simulation are presented in Figure 5. The average of the performance metric is shown along with its confidence interval for each of the selected update frequencies. The results of the MEKF and the MUKF are shown in different graphs, but drawn in the same one are the results for each chart and for a given MKF. In this way we are able to distinguish if a chart has an advantage over the others.
We observe that there is no chart that is especially advantageous. All things being equal, we would opt for the RP chart. For this chart it is not necessary to worry about the domain since it maps q and −q with the same point of R 3 and with the same T-matrix; or of the image since it is all R 3 . In addition, the expressions of ϕ −1 and the T-matrix for the MEKF are simpler for the RP chart. These computational advantages make us prefer the RP chart over the others.  We note that the MEKF performs the same or better than the MUKF. This differs from the usual experience, in which the UKF outperforms the EKF in traditional non-linear estimation applications. The fact that the charts resemble the Euclidean space near the origin (see Section 2.3) might be favoring the MEKF, since the Jacobian matrices, used to approximate the non-linear functions, are defined at that point. However, the sigma points generated for the MUKF are sampled far from the origin of the chart, where the non-linearities become notorious. We are facing a very particular scenario in which the model is approximately linear for the MEKF, while for the MUKF it is not. In addition, due to the difference in computational cost (see Figure 4), the MUKF update frequencies will generally be lower than those of the MEKF, which will imply worse accuracy in its estimations. Then, the MEKF with the RP chart seems to be our best option. Figure 7 presents the results of each MKF with each chart in a different graph, but displayed in the same one are the results using the "chart update" and the results without using it. We can observe that there is almost no difference between using the "chart update" and not using it. The concepts used in this paper have helped us to understand the mechanisms of the MKF, and ultimately to arrive to the concepts of "multiplicative update", and of "covariance correction step" with the T-matrix definition. However, it is not necessary to apply the latest update (47b) in practice: we will obtain essentially the same accuracy in our estimations.

Conclusions
We have used concepts from manifold theory to define the expected value and the covariance matrix of a distribution in a manifold. In particular, we have defined the expected value and covariance matrix of a distribution of unit quaternions in S 3 , the unit sphere in R 4 , using the concept of chart. These definitions have helped us to develop Kalman filters for orientation estimation, where the attitude has been represented by a unit quaternion. They have also helped us solve the problem of the "covariance correction step". Two estimators have been developed: one based on the EKF (the MEKF), and another based on the UKF (the MUKF). The MEKF and the MUKF have been tested in simulations, and some results have been obtained. The conclusions of the simulations are:

•
There is no chart that presents a clear advantage over the others, but the RP chart has some characteristics that motivate us to prefer it.

•
The MEKF is preferable to the MUKF due to its lower computational cost and its greater accuracy in orientation estimation.

•
The "chart update" is not necessary for the MKF in practice.
Then, the MEKF with the RP chart and without applying the "chart update" is our best attitude estimator according to the adopted performance metric. This algorithm resembles the conventional "Multiplicative Extended Kalman Filter", but we have obtained the MEKF without having to redefine any aspect of the classic Kalman filter. Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Derivation of Transition Maps
This appendix contains the derivation of the transition map for each chart.
Appendix A.1. Orthographic Using the inverse of the transformation that defines the chart, ϕ −1 , Finally, applying the chart definition, Finally, applying the chart definition, with δ p = δ 0 e q sin e q 2 − cos e q 2 δ − δ × e q sin e q 2 . (A13) Note that all transition maps are expressed using the δ quaternion. Given that e = ϕ δ we could also have expressed them using e which is what we get after applying the Kalman update (41). However, our choice makes transition maps to take a simpler form. In addition, having to compute the quaternion δ to perform (43c), this choice does not imply a computational overhead.

Appendix B. Details in the Derivation of the MEKF
with q ω t the expected value of the random variable q ω at time t . Doing ω 0 = ω n-1|n-1 we arrive at On the other hand, approximating (22) with its Taylor series up to first order around the current state ( q , ω ) , and taking its expected value we obtain This differential equation has no general closed solution. But if we assume that the expected value of the process noise q ω (t) is zero when t ∈ (t n-1 , t n ) , so that ω (t) is constant in that interval, then we will have the matrix differential equation This differential equation has the solution where q 0 represents the initial conditions. After taking q 0 = q n-1|n-1 , we obtain the prediction q t|n-1 , that can be expressed using the quaternion product as with ∆t = t − t n-1 .
we know [18] that the covariance matrix satisfies the following differential equation: where F = ∂ f ∂x , and G = ∂ g ∂q . This is so because the evolution equation for ∆x = x − x is approximately given by: and P is defined as P = E ∆x (∆x) T . However, we have a different definition for P : Then we need to find the evolution equation for e q . Recall that we are assuming e q = 0 at the beginning of the iteration. Knowing that any quaternion in the unit sphere can be expressed as a deviation from a central quaternion q as q = q * δ , and using the differential Equations (22) and (A16), we can find a differential equation for the quaternion δ : where a dot over a symbol represents time derivative, and we have obviated the time dependence. Isolating the time derivativeδ , Knowing that, for each of our charts, the δ quaternion can be approximated by (10) as e → 0 , then we can obtain an approximate differential equation for a point e expressed in the q-centered chart. Note that we have not explicitly denoted e q or δ q . This will be assumed implicitly, since these quantities will always be expressed in the q-centered chart in this appendix. Using the chain rule for a time derivative and expression (A26e), Then, the first order approximation to differential Equation (A27c) would bė On the other hand, combining Equations (21) and (A14) we obtain therefore matrices F , G , and Q in (A22) are in our case We are now in a position to solve the differential Equation (A22). Let us consider its homogeneous version first: which has as solution Taking into account the definition of matrix exponential, and after computing the powers of F we obtain where we have denoted Ω = ω × , and δ ω = cos ω t 2 , ω ω sin ω t 2 . We also have assumed that t takes small values so we can approximate the infinite sums truncating in the first term. To find the solution of the non-homogeneous differential equation we use the variation of constants method: P = e F t C(t) e F T t =⇒ (A37) =⇒ d P dt = F e F t C(t) e F T t + e F t C(t) e F T t F T + e F t d C(t) dt e F T t = (A38) = F P + P F T + e F t d C(t) dt e F T t .
Identifying terms with (A22) we obtain that Finally, truncating the summation in (A42) at the first non-zero elements, and inserting the result into (A37), we obtain (32) where we have identified C 0 = P(0) through the initial conditions.

Appendix B.2. Measurement Prediction
This subsection contains the derivation of the equations for the measurement prediction.
with ε ikl the Levi-Civita symbol. Applying (44) to (A59), being δ ij the Kronecker delta. Returning to matrix notation, the linear approximation of N e q is N e q = δ 0 I − δ × e q − e q + O e q − e q 2 . (A61) On the other hand, evaluating (A57) at e q we obtain the zeroth order approximation: Finally, combining (A61) and (A62c) we can compute the linear approximation of (A6) : δ × e q = (A66a) Then, as with the RP chart, the approximation of N e q does not have terms of order O(1) , and we will only need to approximate D e q to the zeroth order.
We can write (A64) as being δ ij the Kronecker delta. Returning to matrix notation, the linear approximation of N e q is N e q = 8 δ 0 I + 2 δ (e q ) T − 8 δ × e q − e q + O e q − e q 2 = (A69a) On the other hand, evaluating (A65) at e q we obtain the zeroth order approximation: D e q ≈ 16 + 16 δ 2 (1 + δ 0 ) 2 + δ 0 16 − 16 where we have used the equality δ 2 = 1 − δ 2 0 for unit quaternions. Finally, combining (A69b) and (A70c) we can compute the linear approximation of (A9): Then, the first order approximation of δ p around e q will have the form and δ p → 0 as e q → e q . Taking the Taylor series of the arcsin x , arcsin δ p so that (A12) is linearized as 2 δ p δ p arcsin δ p = 2 T e q − e q + O e q − e q 2 . (A75) We only lack the T matrix. We will need the linear approximations of cos e q /2 and e q sin e q /2 around e q . To this end we will first obtain the linear approximation of x :