A Bi-Invariant Statistical Model Parametrized by Mean and Covariance on Rigid Motions

This paper describes a statistical model of wrapped densities for bi-invariant statistics on the group of rigid motions of a Euclidean space. Probability distributions on the group are constructed from distributions on tangent spaces and pushed to the group by the exponential map. We provide an expression for the Jacobian determinant of the exponential map of SE(n), which enables us to obtain explicit expressions of the densities on the group. Besides having explicit expressions, the strengths of this statistical model are that the densities are parametrized by their moments and are easy to sample from. Unfortunately, we are not able to provide convergence rates for density estimation. We provide instead a numerical comparison between the moment-matching estimators on SE(2) and R^3, which exhibit similar behavior.


Introduction
This work is an extended version of the conference paper [1], focused on SE(2). We provide here a formula for SE(n) with arbitrary n ≥ 2, and a numerical evaluation of the convergence of the moment-matching density estimator on SE(2).
Probability density estimation problems generally fall into one of two categories: estimating a density on a Euclidean vector space or estimating a density on a non-Euclidean manifold. In turn, estimation problems on non-Euclidean manifolds can be divided into different categories depending on the nature of the manifold. The two main classes of non-Euclidean manifolds encountered in statistics are Riemannian manifolds and Lie groups. On Riemannian manifolds, the objects studied in statistics should be consistent with the Riemannian distance. For instance, means of distributions are defined as points minimizing the average squared Riemannian distance. On a Lie group, the objects should be consistent with the group law. Direct products of compact Lie groups and vector spaces, for example, belong to both categories: they admit a Riemannian metric invariant under left and right multiplications. However, in full generality, Lie groups do not admit such nice metrics, hence the need for statistical tools based solely on the group law and not on a Riemannian distance.
The definition of a statistical mean on Lie groups was addressed by Pennec and Arsigny in [2], where the authors define bi-invariant means on arbitrary Lie groups as exponential barycenters [3]. Once the bi-invariant mean is defined, higher-order bi-invariant centered moments can be defined in the tangent space at the mean. We build on this notion of moments to address the problem of constructing statistical models on SE(n), the group of direct isometries of R^n. The wrapped distribution model we propose has several advantages. First, it is stable under left and right multiplications. Second, densities have explicit expressions and are parameterized by their mean and covariance rather than by their concentration matrix, as for the normal distributions defined in [4]. Third, the densities are easy to sample from.

Euclidean Groups
For a condensed introduction to Lie group theory for robotics, see [17], and for several relevant calculations on low-dimensional rigid motions, see the series of notes [18][19][20]. SE(n) is the set of all direct isometries of the Euclidean space R^n. The composition law of maps makes SE(n) a group. For each element g of SE(n), there are a unique rotation R and a unique vector t such that g(u) = Ru + t, hence the isometry g can be represented by the couple (R, t). The group structure of SE(n) is not a direct product between the special orthogonal group and the group of translations, but a semi-direct product SO(n) ⋉_φ R^n with the translations as the normal subgroup:

    (R_1, t_1) · (R_2, t_2) = (R_1 R_2, φ_{R_1}(t_2) + t_1),

where we simply have φ_R(t) = Rt. Let Ψ_(R,t) denote the conjugation by (R, t). A short calculation gives

    Ψ_(R,t)(Q, s) = (R Q R^{-1}, t + R s − R Q R^{-1} t).

Recall that Ad_(R,t) = d(Ψ_(R,t))_e. Hence, after unfolding the elements of the Lie algebra se(n) into column vectors, the matrix representation of Ad_(R,t) is given by

    Ad_(R,t) = [ Ad_R  0 ; C  R ],    (1)

where C is an n × n(n−1)/2 matrix and Ad_R is the adjoint representation of rotations. The structure of this adjoint matrix implies first that SE(n) is unimodular, i.e., admits a bi-invariant measure, and second that the derivative of the exponential admits an explicit expression, as we will see in Section 3.2. To see that SE(n) is unimodular, consider a left-invariant volume form ω. The volume form is bi-invariant if and only if dL_g ∘ dR_{g^{-1}}(ω_e) = ω_e, or equivalently det(dL_g ∘ dR_{g^{-1}}) = det(Ad_g) = 1. Since SO(n) is compact, it admits a bi-invariant measure, hence det(Ad_R) = 1, and we have det(Ad_(R,t)) = det(Ad_R) · det(R) = 1.
We denote by µ_G the bi-invariant measure associated with ω. The fact that SE(n) is unimodular has a significant impact on the definition of statistical tools: it is possible to manipulate densities of probability distributions with respect to a canonical measure.
A convenient way to represent elements of SE(n) is to identify the isometry (R, t) with the (n+1) × (n+1) matrix

    [ R  t ; 0  1 ].

It is easy to check that the composition of isometries corresponds to matrix multiplication. SE(n) is thus seen as a Lie subgroup of GL_{n+1}(R). Our density modelling framework is intrinsic and does not depend on a specific choice of coordinates. However, it is useful for some computations to set a reference basis. The tangent space at the identity element, denoted T_e SE(n), is spanned by the matrices of the form

    A_{i,j} = [ E_{i,j} − E_{j,i}  0 ; 0  0 ],  i < j,    and    T_i = [ 0  e_i ; 0  0 ],

where E_{i,j} is the n × n matrix with a 1 at index (i, j) and zeros elsewhere, and e_i is the i-th basis vector of R^n. Let B_e = (A_{i,j}, T_i) be the reference basis of T_e SE(n). B_e can be translated by left multiplication to make a left-invariant field of bases B. Depending on the context, A will denote an n × n skew-symmetric matrix or its embedding in the Lie algebra of GL_{n+1}(R), and tangent vectors will be denoted by the letter u: u = (A, T).
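As a quick sanity check of the homogeneous-matrix representation, the sketch below (plain NumPy; the helper names are ours) builds two planar isometries and verifies that matrix multiplication reproduces the composition rule (R_1, t_1)·(R_2, t_2) = (R_1 R_2, R_1 t_2 + t_1).

```python
import numpy as np

def se_n_matrix(R, t):
    """Embed the isometry (R, t) of R^n as an (n+1)x(n+1) homogeneous matrix."""
    n = t.shape[0]
    g = np.eye(n + 1)
    g[:n, :n] = R
    g[:n, n] = t
    return g

def rot2(a):
    """2x2 rotation matrix of angle a."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# Composition of isometries is matrix multiplication:
# (R1, t1) o (R2, t2) = (R1 R2, R1 t2 + t1).
g1 = se_n_matrix(rot2(0.3), np.array([1.0, 2.0]))
g2 = se_n_matrix(rot2(-0.1), np.array([0.5, 0.0]))
composed = g1 @ g2
R12 = rot2(0.3) @ rot2(-0.1)
t12 = rot2(0.3) @ np.array([0.5, 0.0]) + np.array([1.0, 2.0])
assert np.allclose(composed, se_n_matrix(R12, t12))
```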
Recall that a skew-symmetric matrix can be block-diagonalized with 2 × 2 rotation generators on the diagonal, followed by a 0 when the dimension is odd. For each n × n skew-symmetric matrix A, we denote by θ_1, . . . , θ_⌊n/2⌋ the set of angles of the 2 × 2 blocks. The identification of SE(n) with a Lie subgroup of GL_{n+1}(R) makes the computation of the exponential map easy: the group exponential is simply the matrix exponential. Let U be the subset of T_e SE(n) defined by

    U = { u = (A, T) : |θ_i| < π for all i }.

It can be checked that the exponential map on U is a bijection. Therefore, we can define the logarithm on SE(n) as the inverse of the exponential on U.
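Since the group exponential is the matrix exponential, it can be computed with off-the-shelf routines. The sketch below (SciPy's expm/logm; the helper name is ours) checks on SE(2) that the matrix logarithm inverts the exponential inside the injectivity domain.

```python
import numpy as np
from scipy.linalg import expm, logm

def se2_alg(theta, t1, t2):
    """Element (A, T) of se(2) as a 3x3 matrix; theta is the rotation angle."""
    return np.array([[0.0, -theta, t1],
                     [theta, 0.0, t2],
                     [0.0, 0.0, 0.0]])

u = se2_alg(1.2, 0.7, -0.3)   # |theta| < pi: inside the injectivity domain U
g = expm(u)                   # group exponential = matrix exponential
v = np.real(logm(g))          # the logarithm recovers u on U
assert np.allclose(v, u, atol=1e-10)
```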

Bi-Invariant Local Linearizations
Moments and densities are defined using local linearizations of the group. Hence, to obtain bi-invariant statistics, the linearization must be compatible with left and right multiplications. This section describes why the exponential map provides such linearizations from arbitrary elements.
Though we do not use this formalism, the construction of the exponential at g can be viewed in the general setting of Cartan connections on Lie groups. The exponential at g is then the exponential of a bi-invariant connection, see [21][22][23].

The Exponential at a Point g
Since the exponential maps the lines of the tangent space at e to the one-parameter subgroups of SE(n), it is a natural candidate to linearize the group near the identity. To linearize the group around an arbitrary element g, it is possible to move g to the identity by multiplication by g^{-1}, use the linearization at the identity to obtain a tangent vector in T_e SE(n), and map the resulting tangent vector to T_g SE(n) by multiplication by g. Fortunately, we can check that this procedure does not depend on the choice of left or right multiplication. Recall that on a Lie group,

    g · exp(dL_{g^{-1}} u) = exp(dR_{g^{-1}} u) · g,    u ∈ T_g G,    (2)

where dL_g and dR_g are the differentials of the left and right multiplications. This property enables the transport of the exponential map to any element of the group without ambiguity in the choice of left or right multiplication, exp_g(u) = g · exp(dL_{g^{-1}} u); see Figure 1 for a visual illustration. Denote by U_g ⊂ T_g SE(n) = dL_g(U) the injectivity domain of exp_g. The logarithm log_g : SE(n) → U_g becomes

    log_g(h) = dL_g(log(g^{-1} h)).

We now have a linearization of the group around an arbitrary g ∈ SE(n). The bi-invariant nature of the linearization is summarized in Figure 2. Independence from the choice of left or right multiplication in the definition of the exponential at an arbitrary point was the key ingredient of the definition of the bi-invariant mean in [2]. It is again a key property in our statistical model. The strength of the exponential map is that it turns some general algebra problems into linear algebra. Once the space has been lifted to a tangent space, the problem of left and right invariances is reduced to the study of the commutation with the differentials of left and right multiplications. Since the tangent spaces do not have a canonical basis or scalar product, the manipulations we perform, such as computing a mean, a covariance or estimating a density, should not depend on the choice of a particular coordinate system.
Hence if these manipulations commute with all the linear invertible transformations, in particular with the left and right differentials, they induce bi-invariant operations.
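The independence from the choice of left or right multiplication can be verified numerically: in matrix form, dL_{g^{-1}} u = g^{-1} u and dR_{g^{-1}} u = u g^{-1}, so the two candidate definitions of exp_g can be compared directly (a NumPy/SciPy sketch; the helper name is ours).

```python
import numpy as np
from scipy.linalg import expm

def se2(theta, t):
    c, s = np.cos(theta), np.sin(theta)
    g = np.eye(3)
    g[:2, :2] = [[c, -s], [s, c]]
    g[:2, 2] = t
    return g

g = se2(0.8, np.array([1.0, -0.5]))
v = np.array([[0.0, -0.4, 0.2],   # an element of the Lie algebra se(2)
              [0.4, 0.0, 0.6],
              [0.0, 0.0, 0.0]])
u = g @ v                          # tangent vector at g (left translation of v)

left = g @ expm(np.linalg.inv(g) @ u)    # g . exp(dL_{g^-1} u)
right = expm(u @ np.linalg.inv(g)) @ g   # exp(dR_{g^-1} u) . g
assert np.allclose(left, right, atol=1e-10)
```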

Jacobian Determinant of the Exponential
A measure µ on T_g SE(n) can be pushed forward to the group using the exponential at g. This push-forward measure is denoted exp_g*(µ). Since exp_g commutes with the right and left actions, so does the push-forward of measures. To obtain expressions of the densities on the group, it is necessary to compute the Jacobian determinant of the exponential; see Figure 3. Assume that µ has a density f with respect to a Lebesgue measure of T_g SE(n) and that its support is contained in an injectivity domain U_g of exp_g. The density f_SE(n) of the measure pushed to the group is given by

    f_SE(n)(exp_g(u)) = f(u) / |det(d(exp_g)_u)|,    (3)

where d(exp_g)_u is the differential of exp_g at the vector u, expressed in the left-invariant reference field of bases. Since SE(n) is unimodular, i.e., µ_G is bi-invariant, the density of the pushed-forward measure also commutes with the left and right translations of SE(n). We now compute this Jacobian determinant at the identity element. For the sake of notation, we drop the index e and let d exp_u be the differential of the group exponential at the tangent vector u, expressed in the bases B_e and B_exp(u). d exp_u has the following expression (see [20,24]):

    d exp_u = dL_exp(u) ∘ Σ_{k≥0} (−1)^k/(k+1)! (ad_u)^k.    (4)

Since det(dL_exp(u)) = 1, the Jacobian determinant of the exponential is given by the determinant of the series. Fortunately, the adjoint action can be diagonalized and the determinant can be computed explicitly. Recall that ad_{u=(A,T)} = d(Ad_(R,t))_{(R,t)=e}(A, T). Using Equation (1), we have that the matrix of ad_u has the following form

    ad_(A,T) = [ ad_A  0 ; D  A ],

where A is an n × n skew-symmetric matrix, ad_A is the adjoint map in the Lie algebra of skew-symmetric matrices, and D is an n × n(n−1)/2 matrix. Since the matrix of ad_(A,T) is block triangular, its eigenvalues are the eigenvalues of ad_A together with those of A. For each eigenvalue λ of ad_u, the corresponding eigenvalue of the series in Equation (4) is (1 − e^{−λ})/λ; when λ = 0, this term can be extended by continuity to 1. Hence, using the fact that the determinant of a diagonalizable matrix is the product of its eigenvalues, we have

    det(d exp_u) = Π_λ (1 − e^{−λ})/λ,    (5)

where the product is taken over the eigenvalues λ of ad_u. A is by definition skew-symmetric.
Since the adjoint representation of SO(n) is the representation of a compact group, there is a basis of matrices in which the ad_A are skew-symmetric. Hence, the eigenvalues appearing in Equation (5) are purely imaginary and come in conjugate pairs: each pair λ = ±iα contributes a factor (1 − e^{−iα})(1 − e^{iα})/α² = (2 − 2 cos α)/α². We provide a direct computation of the eigenvalues of ad_A in Appendix A. If n is even, we have then

    J(u) = Π_{1≤j<k≤n/2} [(2 − 2 cos(θ_j − θ_k))/(θ_j − θ_k)²] [(2 − 2 cos(θ_j + θ_k))/(θ_j + θ_k)²] · Π_{1≤j≤n/2} (2 − 2 cos θ_j)/θ_j²,    (6)

and for n odd,

    J(u) = Π_{1≤j<k≤(n−1)/2} [(2 − 2 cos(θ_j − θ_k))/(θ_j − θ_k)²] [(2 − 2 cos(θ_j + θ_k))/(θ_j + θ_k)²] · Π_{1≤j≤(n−1)/2} [(2 − 2 cos θ_j)/θ_j²]².    (7)

Let J_g(u) = |det(d exp_{g,u})|. Since exp_g(u) = g · exp_e(dL_{g^{-1}} u), the computation at an arbitrary g reduces to a computation at the identity. Furthermore, det(dL_g) = 1 when expressed in the left-invariant field of bases.
Hence, expressed in the bases B_g and B_exp_g(u), the determinant of d exp_{g,u} is given by

    J_g(u) = J(dL_{g^{-1}} u).    (8)

On SE(2) we simply have

    J(u) = (2 − 2 cos θ)/θ².
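The closed form on SE(2) can be cross-checked against a finite-difference Jacobian of the matrix exponential expressed in the left-invariant bases (a NumPy/SciPy sketch; the step size and helper names are ours).

```python
import numpy as np
from scipy.linalg import expm

def alg(x):
    """se(2) element from coordinates x = (theta, t1, t2) in the basis B_e."""
    th, t1, t2 = x
    return np.array([[0, -th, t1], [th, 0, t2], [0, 0, 0.0]])

def coords(v):
    """Coordinates of an se(2) matrix in the basis B_e."""
    return np.array([v[1, 0], v[0, 2], v[1, 2]])

def jac_exp_numeric(x, h=1e-6):
    """Numerical Jacobian determinant of exp at u = alg(x), left-invariant bases."""
    g_inv = np.linalg.inv(expm(alg(x)))
    cols = []
    for i in range(3):
        dx = np.zeros(3); dx[i] = h
        dg = (expm(alg(x + dx)) - expm(alg(x - dx))) / (2 * h)
        cols.append(coords(g_inv @ dg))   # express in the basis at exp(u)
    return abs(np.linalg.det(np.column_stack(cols)))

theta = 1.3
closed_form = 2 * (1 - np.cos(theta)) / theta**2
assert abs(jac_exp_numeric(np.array([theta, 0.5, -0.2])) - closed_form) < 1e-4
```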

Bi-Invariant Means
Bi-invariant means on Lie groups were introduced by Pennec and Arsigny, see [2]. An element ḡ in a Lie group G is said to be a bi-invariant mean of g_1, . . . , g_k ∈ G, or of a probability distribution µ on G, if

    Σ_i log_ḡ(g_i) = 0    or    ∫_G log_ḡ(g) dµ(g) = 0.
Observe that ḡ is not necessarily unique; see [2,26,27] for more details. Using Equation (2), it is straightforward to check that the mean is compatible with left and right multiplications:

    dL_g(log_ḡ(h)) = log_{gḡ}(g h)    and    dR_g(log_ḡ(h)) = log_{ḡg}(h g).

Hence, if Σ_i log_ḡ(g_i) = 0, we also have Σ_i log_{gḡ}(g g_i) = 0 and Σ_i log_{ḡg}(g_i g) = 0.
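The exponential-barycenter characterization suggests the classical fixed-point iteration ḡ ← ḡ · exp(k^{-1} Σ_i log_ḡ(g_i)). Below is a minimal NumPy/SciPy sketch on SE(2); the iteration count and helper names are ours, and convergence is only expected for sufficiently concentrated data.

```python
import numpy as np
from scipy.linalg import expm, logm

def se2(theta, t1, t2):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, t1], [s, c, t2], [0, 0, 1.0]])

def biinvariant_mean(samples, iters=50):
    """Fixed-point iteration for the exponential barycenter (concentrated data)."""
    mean = samples[0]
    for _ in range(iters):
        logs = [np.real(logm(np.linalg.inv(mean) @ g)) for g in samples]
        mean = mean @ expm(sum(logs) / len(logs))
    return mean

samples = [se2(0.2, 1.0, 0.0), se2(-0.2, 0.9, 0.2), se2(0.1, 1.1, -0.1)]
mean = biinvariant_mean(samples)
# At the bi-invariant mean, the tangent residual sum_i log_mean(g_i) vanishes.
residual = sum(np.real(logm(np.linalg.inv(mean) @ g)) for g in samples)
assert np.linalg.norm(residual) < 1e-8
```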

Covariance in a Vector Space
In this section, the bold letter u represents a vector and the letter u its coordinates in a basis. Let us recall the definition of the covariance of a distribution on a vector space in a coordinate system. Let e_1, . . . , e_n be a basis of the vector space V and µ a distribution on V. The covariance of µ in V is defined by

    Σ = E_µ[(u − μ̄)(u − μ̄)^t],    (9)

where u and μ̄ are the coordinate expressions of the vector u and of the average of µ, and E_µ(·) is the expectation with respect to µ.
Let K : R+ → R+ be such that K(‖x‖) is a probability density on R^n whose covariance matrix in the canonical basis is the identity matrix, and let µ be the distribution on V whose density with respect to λ_e is

    u ↦ det(Σ)^{−1/2} K(‖Σ^{−1/2}(u − μ̄)‖),    (10)

where λ_e is the Lebesgue measure induced by e_1, . . . , e_n. It is easy to check that the covariance of µ is Σ.
Since the tangent space of a Lie group does not have a canonical basis, it is sometimes useful to define objects independently of coordinates. The coordinate-free definition of the covariance becomes

    Σ = E_µ[(u − μ̄) ⊗ (u − μ̄)] ∈ V ⊗ V.

Recall that V ⊗ V is naturally identified with the space of bilinear forms on V*. Let B* be the bilinear form on V* associated with Σ. If B* is positive definite, it induces an isomorphism between V* and V, and Σ is then naturally identified with a bilinear form B on V. The definition of µ in Equation (10) becomes

    dµ(u) = K(‖u − μ̄‖_B) dλ_B(u),

where λ_B is the Lebesgue measure on V induced by B and ‖·‖_B is the associated norm. In this formulation, it clearly appears that µ does not depend on a basis.
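To illustrate the role of Σ^{1/2}, the sketch below draws samples with identity covariance from a compactly supported isotropic density (uniform on a ball, rescaled; this specific kernel choice and the sample size are ours) and checks empirically that u = Σ^{1/2} v has covariance Σ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_isotropic(n_samples, dim):
    """Compactly supported samples with identity covariance: uniform ball, rescaled."""
    v = rng.normal(size=(n_samples, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)        # uniform directions
    r = rng.uniform(size=(n_samples, 1)) ** (1.0 / dim)  # uniform-ball radius
    return v * r * np.sqrt(dim + 2)                      # ball covariance is I/(dim+2)

Sigma = np.array([[0.5, 0.2], [0.2, 0.3]])
L = np.linalg.cholesky(Sigma)
u = sample_isotropic(200_000, 2) @ L.T   # u = L v with L L^t = Sigma
emp = u.T @ u / len(u)                   # empirical covariance (the mean is 0)
assert np.allclose(emp, Sigma, atol=0.01)
```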

Covariance of a Distribution on SE(n)
Let µ be a distribution on SE(n) such that its bi-invariant mean ḡ is uniquely defined. The covariance tensor of µ is defined as

    Σ = E_µ[log_ḡ(g) ⊗ log_ḡ(g)] ∈ T_ḡ SE(n) ⊗ T_ḡ SE(n);

see Figure 4 for a visual illustration. Again, using Equation (2) and the bi-invariance of the mean, the compatibility of the covariance with left and right multiplications is straightforward. Denote by g · Σ and Σ · g the push-forwards of the tensor Σ by left and right multiplication by g. We then have

    g · Σ = E_µ[dL_g(log_ḡ(g)) ⊗ dL_g(log_ḡ(g))],

which is the covariance of the distribution g · µ, the push-forward of µ by L_g. The same goes for right multiplications. However, it is important to note that for a covariance Σ defined on T_g SE(n), pushing the covariance to the tangent space at the identity using left and right multiplications usually gives different results:

    dL_{g^{-1}} · Σ = Ad_{g^{-1}}(dR_{g^{-1}} · Σ),

where Ad_g(·) is interpreted as the map on tensors induced by the adjoint representation.
For two distributions µ_1 and µ_2 with different means, the covariance tensors are objects defined in different tangent spaces. The collection of all these spaces forms the tangent bundle TSE(n), and covariances are identified with points in the tensor bundle TSE(n) ⊗ TSE(n).
In the reference field of bases B, the covariance Σ has a matrix Σ given by

    Σ_{i,j} = E_µ[u_i u_j],

where (u_1, . . . , u_d) are the coordinates of log_ḡ(g) in the basis B_ḡ. In principal geodesic analysis, the matrix Σ is sometimes referred to as a linearized quantity, in contrast to exact principal geodesic analysis, see [28].
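In practice the matrix of the covariance tensor is computed from the coordinates of the logarithms taken at the mean; a minimal SE(2) sketch (the helper names are ours):

```python
import numpy as np
from scipy.linalg import logm

def se2(theta, t1, t2):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, t1], [s, c, t2], [0, 0, 1.0]])

def coords(v):
    """Coordinates (theta, t1, t2) of an se(2) matrix in the basis B_e."""
    return np.array([v[1, 0], v[0, 2], v[1, 2]])

def covariance_matrix(samples, mean):
    """Matrix of the covariance tensor, in the left-invariant basis at the mean."""
    logs = np.array([coords(np.real(logm(np.linalg.inv(mean) @ g)))
                     for g in samples])
    return logs.T @ logs / len(logs)

# A pair {g, g^-1} has bi-invariant mean e, so the covariance lives in T_e SE(2).
g = se2(0.3, 0.1, -0.2)
Sigma = covariance_matrix([g, np.linalg.inv(g)], np.eye(3))
assert np.allclose(Sigma, Sigma.T) and np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)
```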

The Model
Let K : R+ → R+ be such that:
(i) ∫_{R^n} K(‖u‖) du = 1;
(ii) ∫_{R^n} u u^t K(‖u‖) du = I, I being the n × n identity matrix;
(iii) K(x) = 0 for x > a, for some a ∈ R.
Condition (i) imposes that K(‖u‖) is a probability density on R^n, condition (ii) that its covariance matrix is the identity matrix, and condition (iii) that it has bounded support.
The statistical model is defined by pushing densities of the form K(‖u‖) from tangent spaces to the group via the exponential, where the Euclidean norms on the tangent spaces are parameters of the distributions. To avoid summing densities over the multiple inverse images of the exponential map, it is convenient to deal with densities K(‖u‖) whose supports are included in injectivity domains, hence requirement (iii). Let C_g be the set of covariance matrices compatible with the injectivity domain U_g; see Figure 5. When covariance matrices are expressed in the left-invariant reference basis, the set C_g is the same for all g and the subscript can be dropped. When Σ ∈ C_g, the support of the probability distribution µ on T_g SE(n) defined by

    dµ(u) = det(Σ)^{−1/2} K(‖Σ^{−1/2} u‖) dλ_g(u)

is contained in U_g, so that µ is the lift by log_g of its push-forward on the group. Here λ_g denotes the Lebesgue measure of T_g SE(n).
The density of the push-forward of µ is then

    f_{g,Σ}(exp_g(u)) = det(Σ)^{−1/2} K(‖Σ^{−1/2} u‖) / J_g(u),

or, expressed at h ∈ SE(n),

    f_{g,Σ}(h) = det(Σ)^{−1/2} K(‖Σ^{−1/2} log_g(h)‖) / J_g(log_g(h)),

where J_g is given in Equation (8). The set of such probability densities when g and Σ vary forms a natural parametric statistical model:

    M = { f_{g,Σ} : g ∈ SE(n), Σ ∈ C }.

The commutation relations of Section 3.1 imply that M is closed under left and right multiplications. The fact that g and Σ are the moments of f_{g,Σ} plays a major role in the relevance of the model M. This fact holds when Σ is small enough; a more precise result should follow in future work.
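On SE(2), the density f_{e,Σ} can be evaluated explicitly once a kernel K is fixed. The sketch below uses the uniform kernel on the ball of radius √5 (one admissible choice satisfying conditions (i)-(iii); all helper names are ours).

```python
import numpy as np
from scipy.linalg import logm

SQRT5 = np.sqrt(5.0)
BALL_VOL = 4.0 / 3.0 * np.pi * SQRT5**3   # volume of the radius-sqrt(5) ball in R^3

def K(x):
    """Uniform kernel: K(|u|) is a density on R^3 with identity covariance."""
    return np.where(x <= SQRT5, 1.0 / BALL_VOL, 0.0)

def jac(theta):
    """Jacobian determinant of exp on SE(2), J(u) = (2 - 2 cos(theta)) / theta^2."""
    return 1.0 if abs(theta) < 1e-8 else 2.0 * (1.0 - np.cos(theta)) / theta**2

def density(h, Sigma):
    """Density f_{e,Sigma} of the wrapped model at h in SE(2), w.r.t. mu_G."""
    u = np.real(logm(h))
    x = np.array([u[1, 0], u[0, 2], u[1, 2]])   # coordinates of log_e(h) in B_e
    w = np.linalg.inv(np.linalg.cholesky(Sigma)) @ x   # |w| = |Sigma^{-1/2} x|
    return K(np.linalg.norm(w)) / (np.sqrt(np.linalg.det(Sigma)) * jac(x[0]))

# At the identity: log = 0 and J = 1, so f = K(0) / sqrt(det Sigma).
val = density(np.eye(3), 0.04 * np.eye(3))
assert np.isclose(float(val), 1.0 / (BALL_VOL * 0.04**1.5))
```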

Sampling Distributions of M
An advantage of constructing distributions from tangent spaces is that they are easy to sample: it suffices to be able to sample from the probability density p on R+ proportional to K(x), p ∝ K. Recall that the dimension of the tangent spaces is d = n(n+1)/2.
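One concrete sampling scheme for the uniform-ball kernel (this specific scheme and the helper names are ours): draw a uniform direction, a ball-distributed radius, rescale by Σ^{1/2}, and push the tangent vector to the group with exp_g.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def sample_wrapped(g, Sigma, n_samples):
    """Sample the wrapped distribution: draw in T_g SE(2), then push with exp_g."""
    d = 3                                        # dim SE(2) = n(n+1)/2 with n = 2
    v = rng.normal(size=(n_samples, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    v = v * r * np.sqrt(d + 2)                   # uniform ball of radius sqrt(5): cov = I
    u = v @ np.linalg.cholesky(Sigma).T          # impose covariance Sigma
    out = []
    for th, t1, t2 in u:
        a = np.array([[0, -th, t1], [th, 0, t2], [0, 0, 0.0]])
        out.append(g @ expm(a))                  # exp_g(dL_g u)
    return out

samples = sample_wrapped(np.eye(3), 0.04 * np.eye(3), 1000)
# The samples stay in SE(2): rotation block with determinant 1.
assert all(np.isclose(np.linalg.det(s[:2, :2]), 1.0) for s in samples)
```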

Evaluation of the Convergence of the Moment-Matching Estimator
All the experiments in this section were performed using the Python package geomstats, see [29], available at http://geomstats.ai. Let g_1, . . . , g_k be points in SE(n) with a unique bi-invariant mean ĝ and such that the empirical covariance

    Σ̂ = (1/k) Σ_i log_ĝ(g_i) ⊗ log_ĝ(g_i)

belongs to C_ĝ, and assume that the moments of f_{ĝ,Σ̂} µ_G are (ĝ, Σ̂). The compatibilities with left multiplications, f_{g·ĝ, g·Σ̂} = g · f_{ĝ,Σ̂} and Σ̂(g·g_1, . . . , g·g_k) = g · Σ̂(g_1, . . . , g_k), and similarly with right multiplications, imply that the maximum likelihood and the moment-matching estimators are bi-invariant. On the one hand, finding the maximum likelihood estimate when g_1, . . . , g_k are i.i.d. requires an optimization procedure. On the other hand, matching moments is straightforward, provided that the moments of f_{ḡ,Σ} are (ḡ, Σ). In most cases, this moment-matching estimator is expected to have reasonable convergence properties; however, there are currently no theoretical results on the convergence of bi-invariant means and covariances on Lie groups. Hence, for now, it is only possible to evaluate the convergence empirically on specific examples. Let K, Σ_1 and Σ_2 be the kernel and the covariance matrices used in the experiments. The function K verifies (i), (ii) and (iii) of Section 5.1. Since √5 < π/2, Σ_1 and Σ_2 are admissible covariances, Σ_1, Σ_2 ∈ C. Σ_2 is chosen such that it correlates the rotation and translation coordinates.
Given a set of i.i.d. samples g_1, . . . , g_k of the density f_{e,Σ}, the estimated density of the moment-matching estimator is f_{ĝ,Σ̂}. For the sake of notation, we drop the subscripts and simply write f and f̂. To characterize the convergence of the estimator, we compare the convergence of f̂ on SE(2) with that of the analogous moment-matching estimator on T_e SE(2) ≅ R^3 using the samples log(g_1), . . . , log(g_k).
Any L^p distance between densities provides a way to evaluate the convergence in a bi-invariant way. The L^1 distance is particularly meaningful in the context of probabilities and has the advantage of being independent of a reference measure. Therefore, we evaluate the expectation of the L^1 distance to f,

    E[ ∫_{SE(2)} |f − f̂| dµ_G ],

and its Euclidean analogue on R^3, where k is the number of samples of f used to compute f̂. The integrals over SE(2) can be estimated using a Monte-Carlo sampling adapted to the distributions. Indeed,

    ∫_{SE(2)} |f − f̂| dµ_G = ∫_S |1 − f̂/f| f dµ_G ≈ (1/N) Σ_{i=1}^N |1 − f̂(u_i)/f(u_i)|,

where S is the support of f and the u_i are i.i.d. samples of f. The L^1 distances between f and f̂ are estimated using 5000 Monte-Carlo samples, and the expectation of the L^1 distance is estimated using 200 estimates f̂. Figure 6 depicts the decay of the expected L^1 distance with the number of samples for the SE(2) and R^3 cases using the covariance Σ_1, and Figure 7 using the covariance Σ_2. For a given covariance Σ, the error decays on SE(2) and R^3 seem to be asymptotically related by a multiplicative factor close to 1. Future work should focus on gaining insight into the phenomena underlying the error decay in the general case.
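The Monte-Carlo identity ∫|f − f̂| dµ_G = E_f[|1 − f̂/f|] underlying these estimates can be illustrated in a simplified one-dimensional setting (a SciPy sketch with two Gaussian densities standing in for f and f̂; all parameters are ours).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(2)

# Monte-Carlo identity: int |f - f_hat| = E_f[ |1 - f_hat/f| ],
# valid when the support of f_hat is contained in the support of f.
f = norm(loc=0.0, scale=1.0)       # stands in for the true density
f_hat = norm(loc=0.1, scale=1.1)   # stands in for the estimated density

u = f.rvs(size=200_000, random_state=rng)
mc = np.mean(np.abs(1.0 - f_hat.pdf(u) / f.pdf(u)))

exact, _ = quad(lambda x: abs(f.pdf(x) - f_hat.pdf(x)), -10.0, 10.0, limit=200)
assert abs(mc - exact) < 0.01
```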

Conclusion and Perspectives
In this paper, we have described a statistical model M of densities for bi-invariant statistics on SE(n). Even though we do not provide convergence rates, we showed experimentally on an example that density estimation on SE(2) behaves similarly to estimation on R^3. Future work will focus on a deeper analysis of the performance of the moment-matching estimator, on proposing detailed algorithms to estimate densities in a mixture model, and on generalizing the construction to other Lie groups.

Appendix A

Let A be an n × n skew-symmetric matrix and let P be an invertible matrix such that D = P^{-1}AP is a matrix with vanishing entries outside of ⌊n/2⌋ 2-by-2 blocks

    A_j = [ 0  −a_j ; a_j  0 ]

along the diagonal. If n is even, the eigenvalues of ad_A are the numbers

    i(±a_j ± a_k), 1 ≤ j < k ≤ n/2,    and    0, with multiplicity n/2.
If n is odd, the eigenvalues of ad_A are the numbers

    i(±a_j ± a_k), 1 ≤ j < k ≤ (n−1)/2,    ±i a_j, 1 ≤ j ≤ (n−1)/2,    and    0, with multiplicity (n−1)/2.
Proof. Let g_P(X) = PXP^{-1}. Since g_P is invertible and ad_A = g_P ∘ ad_D ∘ g_P^{-1}, ad_A and ad_D have the same eigenvalues. Consider first the case n odd. Let X be a skew-symmetric matrix; X can be decomposed into (n−1)/2 × (n−1)/2 2-by-2 sub-matrices B_{i,j}, (n−1)/2 1-by-2 sub-matrices u_j on the last line of X, (n−1)/2 2-by-1 sub-matrices u_j^t on the last column, and a 1-by-1 sub-matrix x = X_{n,n}. Y = ad_D(X) can be decomposed in the same way into sub-matrices C_{i,j}, v_j, v_j^t and y = Y_{n,n}. From the block products thus obtained, it follows that each subspace A_{i,j}, i ≠ j, of matrices with vanishing entries outside the 2-by-2 blocks (i, j) and (j, i), is ad_D-stable. These subspaces are four-dimensional, and a direct calculation shows that the eigenvalues of ad_D restricted to them are i(±a_i ± a_j). The subspaces A_i defined by the blocks u_i and u_i^t are stable as well, and the computation shows that the corresponding eigenvalues are ±i a_i. ad_D restricted to the blocks A_{i,i} vanishes, thus 0 is an eigenvalue with multiplicity (n−1)/2. In the case n even, only the eigenvalues associated with the blocks A_{i,j} remain.
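The eigenvalue structure stated above can be checked numerically by assembling the matrix of ad_A in the basis (E_{i,j} − E_{j,i}) of skew-symmetric matrices (a NumPy sketch for n = 5; the helper names are ours).

```python
import numpy as np

def ad_matrix(A):
    """Matrix of ad_A = [A, .] acting on n x n skew-symmetric matrices."""
    n = A.shape[0]
    basis = []
    for i in range(n):
        for j in range(i + 1, n):
            E = np.zeros((n, n))
            E[i, j], E[j, i] = 1.0, -1.0
            basis.append(E)
    coords = lambda X: np.array([X[i, j] for i in range(n) for j in range(i + 1, n)])
    return np.column_stack([coords(A @ B - B @ A) for B in basis])

# n = 5, A block-diagonal with angles a1, a2 (and a trailing zero).
a1, a2 = 0.7, 0.3
A = np.zeros((5, 5))
A[0, 1], A[1, 0] = -a1, a1
A[2, 3], A[3, 2] = -a2, a2

eig = np.linalg.eigvals(ad_matrix(A))
# Predicted: i(+-a1 +- a2), +-i a1, +-i a2, and 0 with multiplicity (n-1)/2 = 2.
pred = [a1 - a2, -(a1 - a2), a1 + a2, -(a1 + a2), a1, -a1, a2, -a2, 0.0, 0.0]
assert np.allclose(eig.real, 0.0, atol=1e-10)
assert np.allclose(np.sort(eig.imag), np.sort(pred), atol=1e-10)
```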