A Geodesic-Based Riemannian Gradient Approach to Averaging on the Lorentz Group

Abstract: In this paper, we propose an efficient algorithm to solve the averaging problem on the Lorentz group O(n, k). First, we introduce the geometric structure of O(n, k) endowed with a Riemannian metric for which the geodesics can be written in closed form. Then, an algorithm based on the Riemannian-steepest-descent approach is presented. Finally, we compare this algorithm with the Euclidean gradient algorithm and the extended Hamiltonian algorithm. Numerical experiments show that the geodesic-based Riemannian-steepest-descent algorithm performs best in terms of convergence rate.


Introduction
Averaging measured data on matrix manifolds is a frequently arising problem in many fields. For example, one may need to determine an average eye of a set of optical systems [1]. In [2], the authors consider the statistics of covariance matrices for biomedical engineering applications. Current research mainly focuses on compact Lie groups, such as the special orthogonal group SO(n) [3] and the unitary group U(n) [4], or on other matrix manifolds, such as the special Euclidean group SE(n, R) [5], the Grassmann manifold Gr(n, k) [6,7], the Stiefel manifold St(n, k) [8,9] and the symmetric positive-definite matrix manifold SPD(n) [10,11]. However, there is little research on averaging over the Lorentz group O(n, k).
In [12], Clifford algebra, as a generalization of quaternions, is applied to approximate the average on the special Lorentz group SO(1, 2). In engineering, the Lorentz group plays a fundamental role, e.g., in the analysis of the motion of charged particles in electromagnetism [13]. In physics, the Lorentz group forms a basis for the transformation of parabolic catadioptric image features [14]. According to [15], the Lorentz group O(3, 1) can be identified with the group of linear transformations of the projective plane and is used to characterize the catadioptric fundamental matrices that arise in the study of catadioptric cameras.
This paper investigates averaging on the Lorentz group endowed with a left-invariant metric, which plays the role of the Riemannian metric, so that the geodesics are given in closed form. The resulting averaging optimization problem can be solved numerically via a geodesic-based Riemannian-steepest-descent algorithm (RSDA). Furthermore, the devised RSDA is compared with a line-search algorithm, the Euclidean gradient algorithm (EGA), and a second-order learning algorithm, the extended Hamiltonian algorithm (EHA), proposed in [16,17].
The remainder of the paper is organized as follows. In Section 2, we summarize the geometric structure of the Lorentz group. The averaging optimization problem, the geodesic-based Riemannian-steepest-descent algorithm and the extended Hamiltonian algorithm on the Lorentz group are presented in Section 3. In Section 4, we show the numerical results of the RSDA and compare its convergence rate with that of the other two algorithms. Section 5 presents the conclusions of the paper.

Geometry of the Lorentz Group
In this section, we review the foundations of the Lorentz group and describe a Riemannian metrization in which the geodesics can be written in closed form.
Let M(n, R) be the set of n × n real matrices and GL(n, R) be its subset containing only the nonsingular matrices. It is well known that GL(n, R) is a Lie group, i.e., a group that is also a differentiable manifold and for which group multiplication and inversion are smooth. The Lorentz group O(n, k) is defined as

O(n, k) := {A ∈ GL(n + k, R) : A^T J A = J},  J := [ I_n  0 ; 0^T  −I_k ],

where I is the identity matrix, and 0 is the n × k zero matrix.
For any A ∈ O(n, k), the identities A^T J A = J and A J A^T = J hold. The Lie algebra o(n, k), the tangent space T_A O(n, k) and the normal space N_A O(n, k) associated with the Lorentz group O(n, k) can be characterized as follows:

o(n, k) = {Ω ∈ M(n + k, R) : Ω^T J + J Ω = 0},  T_A O(n, k) = {A Ω : Ω ∈ o(n, k)},

and N_A O(n, k) is the orthogonal complement of T_A O(n, k) in M(n + k, R) with respect to the Euclidean metric. We employ the left-invariant metric

⟨X, Y⟩_A := tr((A^{-1} X)^T A^{-1} Y),  X, Y ∈ T_A O(n, k),  (1)

as the Riemannian metric on O(n, k). Let γ : [0, 1] → O(n, k) be a smooth curve in O(n, k); its length associated with the Riemannian metric (1) is defined as

L(γ) := ∫_0^1 ⟨γ̇(t), γ̇(t)⟩_{γ(t)}^{1/2} dt.  (2)

The geodesic distance d(A_1, A_2) between two points A_1 and A_2 in O(n, k) is the infimum of the lengths of curves connecting them. Applying the variational method to Equation (2), we find that the geodesic equation is γ̈ + Γ_γ(γ̇, γ̇) = 0, where Γ denotes the Christoffel symbol, and the over-dot and double over-dot denote first- and second-order derivatives with respect to the parameter t, respectively. When the initial values γ(0) = A ∈ O(n, k) and γ̇(0) = X ∈ T_A O(n, k) are specified, we denote the solution by γ_{A,X}(t), which is associated with the Riemannian exponential map exp_A : T_A O(n, k) → O(n, k), exp_A(X) := γ_{A,X}(1).
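The characterizations above are straightforward to check numerically. The following sketch (in Python with NumPy/SciPy; all function names are ours, not from the paper) builds the signature matrix J, draws a random element of o(n, k) in the form Ω = JS with S skew-symmetric, and verifies that its exponential satisfies the defining identity A^T J A = J:

```python
import numpy as np
from scipy.linalg import expm

def lorentz_J(n, k):
    """Signature matrix J = diag(I_n, -I_k) defining O(n, k)."""
    return np.diag(np.concatenate([np.ones(n), -np.ones(k)]))

def is_lorentz(A, J, tol=1e-10):
    """Check the defining identity A^T J A = J."""
    return np.linalg.norm(A.T @ J @ A - J) < tol

def random_lie_algebra(n, k, rng, scale=0.3):
    """Random element of o(n, k): Omega = J S with S skew-symmetric,
    which satisfies Omega^T J + J Omega = 0."""
    C = scale * rng.standard_normal((n + k, n + k))
    return lorentz_J(n, k) @ (C - C.T)

rng = np.random.default_rng(0)
J = lorentz_J(3, 1)
Omega = random_lie_algebra(3, 1, rng)
assert np.linalg.norm(Omega.T @ J + J @ Omega) < 1e-12
A = expm(Omega)                      # exp maps o(n, k) into O(n, k)
assert is_lorentz(A, J)
```

Since O(n, k) is a group and is stable under transposition, such exponentials can be multiplied freely while remaining on the manifold.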
Let f : O(n, k) → R be a differentiable function and let ∇_A f ∈ T_A O(n, k) denote the Riemannian gradient of f with respect to the metric (1), defined by the compatibility condition

⟨∇_A f, X⟩_A = ⟨∂_A f, X⟩_E for all X ∈ T_A O(n, k),  (3)

where ∂_A f denotes the Euclidean gradient of f and ⟨X, Y⟩_E := tr(X^T Y) denotes the Euclidean metric.
The exponential of a matrix A ∈ M(n, R) is defined by e^A := Σ_{j=0}^∞ A^j / j!. We remark that e^{A+B} = e^A e^B when AB = BA for A and B in M(n, R).
For the Frobenius norm ‖·‖, when ‖A − I_n‖ < 1, the logarithm of A is given by the convergent series log A = Σ_{j=1}^∞ (−1)^{j+1} (A − I_n)^j / j. In general, log(AB) ≠ log A + log B. We here recall the following important fact (Lemma 1): if f admits a power series representation f(A) = Σ_j c_j A^j, then d tr( f (A)) = tr(W dA), where W denotes the derivative f′(A).
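These matrix exponential and logarithm facts can be illustrated numerically; the snippet below is only a sanity check of the stated identities, using SciPy's expm/logm:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
A = 0.2 * rng.standard_normal((4, 4))
B = 2.0 * A                      # AB = BA, so e^{A+B} = e^A e^B applies
assert np.allclose(expm(A + B), expm(A) @ expm(B))

C = 0.2 * rng.standard_normal((4, 4))   # generic C does not commute with A
assert not np.allclose(expm(A + C), expm(A) @ expm(C))

# When ||M - I|| < 1 the logarithm series converges, and logm inverts expm
M = np.eye(4) + 0.09 * rng.standard_normal((4, 4))
assert np.linalg.norm(M - np.eye(4)) < 1.0
assert np.allclose(expm(logm(M)), M)
```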
The structure of the Riemannian gradient associated with the Riemannian metric (1) is given by the following result.
Theorem 1. The Riemannian gradient of a sufficiently regular function f : O(n, k) → R associated with the Riemannian metric (1) satisfies

∇_A f = (1/2) A (A^T ∂_A f − J (∂_A f)^T A J).  (4)

Proof of Theorem 1. Write ∇_A f = AΩ with Ω ∈ o(n, k). According to equality (3), for every tangent vector X = AΨ with Ψ ∈ o(n, k), the gradient ∇_A f satisfies

tr(Ω^T Ψ) = tr((∂_A f)^T AΨ) = tr((A^T ∂_A f)^T Ψ).

As Ψ is arbitrary in o(n, k), the condition above implies that A^T ∂_A f − Ω is orthogonal to o(n, k), i.e., A^T ∂_A f − Ω = JS with S = S^T. Hence Ω is the orthogonal projection of A^T ∂_A f onto o(n, k), namely Ω = (1/2)(A^T ∂_A f − J (∂_A f)^T A J). Substituting Ω into the expression ∇_A f = AΩ, we finish the proof.
We would like to give credit to [19] for the explicit expression of the geodesic on the general linear group endowed with the natural left-invariant Riemannian metric (1). Under the same Riemannian metric, the geodesic on the Lorentz group can also be written in closed form (cf. Theorem 2.14 in [19]).

Proposition 1. The geodesic γ_{A,X} : [0, 1] → O(n, k) emanating from A ∈ O(n, k) with initial velocity X ∈ T_A O(n, k) is given in closed form by

γ_{A,X}(t) = A e^{t U^T} e^{t (U − U^T)},  U := A^{-1} X.  (5)

Proof of Proposition 1. By the variational formula applied to the length functional (2) and the definition H(t) := γ^{-1}(t) γ̇(t), a geodesic satisfies the Euler-Lagrange equation Ḣ = H^T H − H H^T. A direct computation verifies that the curve (5), emanating from A with velocity X, is a solution of this Euler-Lagrange equation.
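Assuming the closed form (5), a numerical check that γ(0) = A, γ̇(0) = X, and that the curve remains on O(3, 1) can be sketched as follows (helper names are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def geodesic(A, X, t):
    """Closed-form geodesic (5): gamma(t) = A e^{t U^T} e^{t(U - U^T)},
    with U = A^{-1} X."""
    U = np.linalg.solve(A, X)
    return A @ expm(t * U.T) @ expm(t * (U - U.T))

# Sanity checks on O(3, 1)
J = np.diag([1.0, 1.0, 1.0, -1.0])
rng = np.random.default_rng(2)

def rand_algebra(scale):
    C = scale * rng.standard_normal((4, 4))
    return J @ (C - C.T)           # element of o(3, 1)

A = expm(rand_algebra(0.2))        # base point on O(3, 1)
X = A @ rand_algebra(0.2)          # tangent vector at A

assert np.allclose(geodesic(A, X, 0.0), A)             # gamma(0) = A
h = 1e-5                                               # gamma'(0) = X
dG = (geodesic(A, X, h) - geodesic(A, X, -h)) / (2 * h)
assert np.allclose(dG, X, atol=1e-6)
G1 = geodesic(A, X, 1.0)                               # stays on the group
assert np.allclose(G1.T @ J @ G1, J, atol=1e-8)
```

The last assertion reflects the fact that o(n, k) is stable under transposition, so both exponential factors in (5) belong to O(n, k).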

Optimization on the Lorentz Group
This section investigates the averaging optimization problem on the Lorentz group, i.e., finding the average of a set of Lorentz matrices. Sections 3.1 and 3.2 present the geodesic-based Riemannian-steepest-descent algorithm and the extended Hamiltonian algorithm on the Lorentz group, respectively.

Riemannian-Steepest-Descent Algorithm on the Lorentz Group
According to the expression of the geodesic (5), setting B = γ_{A,X}(1) yields

A^{-1} B = e^{U^T} e^{U − U^T},  U = A^{-1} X,  (6)

and, furthermore, we have

log(A^{-1} B) = log(e^{U^T} e^{U − U^T}).  (7)

Note that the geodesic distance d(A, B) = ⟨X, X⟩_A^{1/2} = ‖A^{-1} X‖ = ‖U‖. From Equation (6), it is difficult to give an explicit expression of U in terms of A and B. In [20], the authors show that the geodesic distance connecting A and B can be obtained by solving the following problem:

min_U J(U) := ‖e^{U^T} e^{U − U^T} − A^{-1} B‖².  (8)

Then, the geodesic distance between A and B is d(A, B) = ‖U*‖, where U* is a minimizer of Equation (8). In order to compute U*, a descent procedure can be applied, so we first need to find the gradient. Differentiating J with respect to U and applying Lemma 1 gives the Euclidean gradient ∂_U J of J(U) (9). On the other hand, recall the Goldberg exponential formula in [21],

log(e^X e^Y) = Σ_{n≥1} Σ_{|ω|=n} g_ω ω,  (10)

where ω denotes a word in the letters X and Y, and the inner sum is over all words ω of length |ω| = n. The symbol g_ω denotes the Goldberg coefficient of the word ω, a rational number; for a word beginning with X, it can be computed from the recursively defined polynomials G_r(t), where [·] denotes the greatest integer of the enclosed number. According to Theorem 1 in [22], we can obtain the following result.
Proof of Proposition 2. According to the Goldberg exponential formula (10), Equation (7) can be recast as a series in which ω denotes a word in the letters U^T and U − U^T. By using properties of the Goldberg coefficients g_ω (see [23]), the sum of |g_ω| extended over all words ω of length n with m parts can be bounded. Therefore, since ‖U − U^T‖ ≤ 2‖U‖ and, for n ≥ 2, each word of length n satisfies ‖ω‖ ≤ 2^{n−1} ‖U‖^n, the series is dominated by a convergent geometric series; the last step holds provided that ‖U‖ < 1/2.
Remark 3. Proposition 2 implies that we can replace the geodesic distance d(A, B) = ‖U‖ by ‖log(A^{-1} B)‖, and the latter is much easier to handle.
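The surrogate distance of Remark 3 can be sketched in a few lines. The check below verifies that, along a one-parameter perturbation B = A e^V with V ∈ o(3, 1), the surrogate recovers ‖V‖ exactly (function names are ours):

```python
import numpy as np
from scipy.linalg import expm, logm

def discrepancy(A, B):
    """D(A, B) = ||log(A^{-1} B)||_F, the surrogate for the geodesic
    distance suggested by Proposition 2 and Remark 3."""
    return np.linalg.norm(logm(np.linalg.solve(A, B)), 'fro')

J = np.diag([1.0, 1.0, 1.0, -1.0])
rng = np.random.default_rng(4)

def rand_algebra(scale):
    C = scale * rng.standard_normal((4, 4))
    return J @ (C - C.T)           # element of o(3, 1)

A = expm(rand_algebra(0.2))
V = rand_algebra(0.1)
B = A @ expm(V)                    # B = A e^V, so log(A^{-1} B) = V
assert abs(discrepancy(A, B) - np.linalg.norm(V, 'fro')) < 1e-10
assert discrepancy(A, A) < 1e-12
```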
Definition 1. In order to compare different Lorentz matrices, the following discrepancy is employed:

D(A, B) := ‖log(A^{-1} B)‖.  (13)

For the Lorentz group, the criterion function f : O(n, k) → R that measures a collection {B_1, B_2, . . ., B_N} of randomly generated samples is defined by

f(A) := (1/(2N)) Σ_{q=1}^N D²(A, B_q) = (1/(2N)) Σ_{q=1}^N tr(log(B_q^{-1} A)^T log(B_q^{-1} A)),  (14)

which appears as a specialization, to the Lorentz group, of the Karcher criterion [24].
Definition 2. The average value of {B_1, B_2, . . ., B_N} is defined as

Ā := argmin_{A ∈ O(n,k)} f(A).  (15)

Based on the line-search method with the negative Euclidean gradient in the Euclidean case, in order to achieve the average value, a geodesic-based Riemannian-steepest-descent algorithm can be expressed as [25]

A_{p+1} = γ_{A_p, −∇_{A_p} f}(t_p),  (16)

where p ≥ 0 is an iteration step-counter, the initial guess A_0 ∈ O(n, k) is arbitrarily chosen, and t_p denotes a (possibly variable) stepsize schedule. Note that the Riemannian gradient of the criterion function is crucial to solving the optimization problem by the geodesic-based Riemannian-steepest-descent algorithm (16).
Theorem 2. The Riemannian gradient of the criterion function (14) is obtained by substituting the Euclidean gradient of (14) into formula (4); we denote the resulting expression by (17).

Proof of Theorem 2. Taking the differential of the criterion function (14) with respect to A and applying Lemma 1 yields the Euclidean gradient ∂_A f (18). Substituting the Euclidean gradient (18) into the expression (4) of the Riemannian gradient finishes the proof.

Associated with the expressions of the geodesic (5) and the Riemannian gradient (17) of the criterion function, the geodesic-based Riemannian-steepest-descent algorithm (16) is recast as

A_{p+1} = A_p e^{t_p U_p^T} e^{t_p (U_p − U_p^T)},  U_p := −A_p^{-1} ∇_{A_p} f.  (19)

In order to compute a nearly-optimal stepsize, the function f(A_{p+1}), considered as a function of the stepsize t_p, can be expanded in a Taylor series at the point t_p = 0 as

f(A_{p+1}) = f(A_p) + f_{1,p} t_p + (1/2) f_{2,p} t_p² + O(t_p³),

where f_{1,p} and f_{2,p} are the coefficients of the Taylor expansion.
Furthermore, the nearly-optimal stepsize value that minimizes the second-order Taylor polynomial is

t_p = −f_{1,p} / f_{2,p}.  (20)

Thus, to obtain f_{1,p} and f_{2,p}, the function f(A_{p+1}) is recast as

f(A_{p+1}) = (1/(2N)) Σ_q tr( log(B_q^{-1} A_p e^{t_p U_p^T} e^{t_p (U_p − U_p^T)})^T log(B_q^{-1} A_p e^{t_p U_p^T} e^{t_p (U_p − U_p^T)}) ).
Differentiating f(A_{p+1}) with respect to t_p and setting t_p = 0 yields the coefficient f_{1,p}; differentiating once more and setting t_p = 0 yields f_{2,p}. Hence, the nearly-optimal stepsize of the geodesic-based Riemannian-steepest-descent algorithm (19) is given by (20), and the iteration algorithm (19) with the stepsize (20) may be implemented explicitly.
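The resulting iteration can be sketched as follows. This is a simplified stand-in, not the paper's exact method: the descent direction U_p is taken as the mean matrix logarithm (the gradient of the log-discrepancy surrogate of (14), rather than the exact Riemannian gradient (17)), and a unit stepsize replaces the nearly-optimal stepsize (20); all names are ours.

```python
import numpy as np
from scipy.linalg import expm, logm

def rsda_average(samples, A0, steps=50, t=1.0, tol=1e-12):
    """Geodesic-based steepest-descent averaging on O(n, k) (sketch).
    Descent direction in the Lie algebra: U_p = (1/N) sum_q log(A_p^{-1} B_q);
    the update then moves along the closed-form geodesic (5)."""
    A = A0.copy()
    for _ in range(steps):
        U = np.mean([np.real(logm(np.linalg.solve(A, B))) for B in samples],
                    axis=0)
        if np.linalg.norm(U) < tol:
            break
        A = A @ expm(t * U.T) @ expm(t * (U - U.T))   # geodesic step
    return A

# Two commuting boosts: the average is the geodesic midpoint
v = np.array([0.3, 0.0, 0.0])
W = np.zeros((4, 4)); W[:3, 3] = v; W[3, :3] = v    # pure boost in o(3, 1)
B1, B2 = expm(0.2 * W), expm(0.6 * W)
Abar = rsda_average([B1, B2], A0=B1)
assert np.allclose(Abar, expm(0.4 * W), atol=1e-8)
```

In this commuting example the sketch reaches the midpoint in a single iteration, mirroring the one-step behavior reported in Section 4 for two samples.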

Extended Hamiltonian Algorithm on the Lorentz Group
References [16,17] introduced a general theory of extended Hamiltonian (second-order) learning on Riemannian manifolds, in particular as an instance of learning by constrained criterion-function optimization on matrix manifolds. For the Lorentz group, the extended Hamiltonian algorithm can be expressed as

Ȧ = X,  Ẋ = −Γ_A(X, X) − μ X − ∇_A f,  (22)

where A ∈ O(n, k), X ∈ T_A O(n, k) denotes the instantaneous learning velocity, f : O(n, k) → R denotes a criterion function, and the constant μ > 0 denotes a viscosity parameter.
Inserting the expressions of the Christoffel matrix function Γ_A in Remark 1 and of the Riemannian gradient (4) into the general extended Hamiltonian system (22), calculations lead to explicit update expressions. The second equation may be integrated numerically by a Euler stepping method, while the first one may be integrated via the geodesic; namely, system (22) may be implemented as a discrete iteration with learning rate η > 0. The effectiveness of the algorithm is ensured if and only if the learning rate is suitably bounded in terms of λ_max, the largest eigenvalue of the Hessian matrix of the criterion function f(A) (see [17]).
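A discrete-time sketch of this scheme is given below. Caveat: the Christoffel correction Γ_A(X, X) of system (22) is omitted (small-velocity regime) and the gradient is the log-discrepancy surrogate, so this is a heavy-ball simplification rather than the full EHA; η, μ and all names are illustrative.

```python
import numpy as np
from scipy.linalg import expm, logm

def eha_average(samples, A0, eta=0.2, mu=1.0, steps=300):
    """Momentum (extended-Hamiltonian style) averaging sketch on O(n, k).
    V is the left-trivialized velocity in the Lie algebra; each step
    damps V (viscosity mu), adds the negative surrogate gradient, and
    moves A along exp(eta * V)."""
    A, V = A0.copy(), np.zeros_like(A0)
    N = len(samples)
    for _ in range(steps):
        grad = -sum(np.real(logm(np.linalg.solve(A, B))) for B in samples) / N
        V = V - eta * (mu * V + grad)    # Euler step for the velocity
        A = A @ expm(eta * V)            # move along the group
    return A

v = np.array([0.3, 0.0, 0.0])
W = np.zeros((4, 4)); W[:3, 3] = v; W[3, :3] = v    # pure boost in o(3, 1)
B1, B2 = expm(0.2 * W), expm(0.6 * W)
assert np.allclose(eha_average([B1, B2], A0=B1), expm(0.4 * W), atol=1e-6)
```

As expected for a damped second-order scheme, this converges more slowly than the steepest-descent step in the commuting test case, consistent with the iteration counts reported in Section 4.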

Numerical Experiments
In this section, we give two numerical experiments on the Lorentz group O(3, 1) to demonstrate the effectiveness and performance of the proposed algorithms. The Lorentz group O(3, 1) has four connected components. Here, we study the optimization on the connected component SO_0(3, 1) containing the identity matrix I_4.
In order to compare the behavior of the optimization methods RSDA, EHA and EGA, the parameters η = 0.5 and μ = 1.5 are used in the following experiments.

Numerical Experiments on Averaging Two Lorentz Matrices
In Section 3.1, it was noticed that the geodesic distance d(A, B) and the discrepancy D(A, B) between two points A and B are, in general, different. Now, given two points B_1, B_2 ∈ O(3, 1), numerical experiment results are shown to compare d(B_1, B_2) and D(B_1, B_2). The optimization problem to compute the geodesic distance d(B_1, B_2) is solved by means of a descent procedure along the negative Euclidean gradient (9) with a line-search strategy. In Figure 1, the results show that the Frobenius norm of U decreases steadily while the criterion function J(U) tends to the constant 0 during the iteration. The numerical experiments yield

U* = [  0.0000   0.0000   0.0000  −0.4000
        0.0000  −0.0000  −0.0000   0.0000
        0.0000  −0.0000  −0.0000   0.0000
       −0.4000   0.0000   0.0000  −0.0000 ].

It is interesting that the geodesic distance between B_1 and B_2 is d(B_1, B_2) = ‖U*‖ ≈ 0.5657, and the discrepancy between B_1 and B_2 is D(B_1, B_2) ≈ 0.5657. In the following experiment, we consider the average value of the two points B_1 and B_2. Figure 2 displays the results of running the RSDA, the EHA and the EGA. Note that the iteration sequence {A_p} is expected to converge to a matrix located amid the samples {B_q}, so that the condition ‖B_q^{-1} A_p − I_4‖ < 1 holds. The initial point is set as A_0 = B_1, which satisfies ‖B_2^{-1} A_0 − I_4‖ ≈ 0.5921 < 1. More initial points can be chosen in a neighborhood of the matrix B_1 by rule (25). It can be found that the averaging algorithms converge steadily; the RSDA has the fastest convergence rate among the three algorithms and needs only one iteration to obtain the average value

Ā = [ 1.0453   0.0000   0.0000   0.3045
      0.0000   1.0000   0.0000   0.0000
      0.0000   0.0000   1.0000   0.0000
      0.3045   0.0000   0.0000   1.0453 ],

whereas the EHA and the EGA need 13 and 18 iterations, respectively, to reach the same accuracy. It is interesting to compare the discrepancy D(Ā, B_1) ≈ 0.2828 with the discrepancy D(Ā, B_2) ≈ 0.2828, which confirms that the computed average value is truly a midpoint (i.e., it is located at approximately equal discrepancy from the two matrices).

Numerical Experiments on Averaging Several Lorentz Matrices
The numerical experiments rely on the availability of a way to generate pseudo-random samples on the Lorentz group. Given a point B ∈ O(n, k), referred to as the "center of mass" or simply the center of the random distribution, it is possible to generate a random sample B_q ∈ O(n, k) in a neighborhood of B by the rule [26,27]

B_q = B e^{B^{-1} X},  (24)

where X ∈ T_B O(n, k), and the center matrix B can be generated by a curve departing from the identity matrix, B = e^V, with V a randomly generated matrix in o(n, k).
According to the structure of the tangent space T_B O(n, k), expression (24) can be written as (25), where C ∈ R^{(n+k)×(n+k)} is any unstructured random matrix. The average value of the generated samples can be obtained after seven iterations using the RSDA, while the EHA and the EGA need 15 iterations. Figure 3b gives the norm ‖A^{-1} ∇_A f‖ of the Riemannian gradient (17) during the iteration by the RSDA, from which we can observe that the norm of the Riemannian gradient tends to the constant 0 after iteration 6. Figure 3c,d give the values of the discrepancy D(A, B_q) before and after the iteration, respectively. In particular, note that all the discrepancies between the samples B_q and the matrix A decrease substantially. The results show that the convergence rate of the RSDA is again the fastest among the three algorithms.
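Sample generation by rule (24) can be sketched as follows. Since the explicit form of the projection (25) is not reproduced in the text, we use the standard orthogonal projection onto o(n, k), V = (C − J C^T J)/2, as one concrete choice; function names are ours:

```python
import numpy as np
from scipy.linalg import expm

def proj_lie_algebra(C, J):
    """Project an unstructured matrix C onto o(n, k):
    V = (C - J C^T J)/2 satisfies V^T J + J V = 0."""
    return 0.5 * (C - J @ C.T @ J)

def random_lorentz_sample(B, J, rng, spread=0.1):
    """Pseudo-random sample near the center B, rule (24): B_q = B e^V
    with V a random element of o(n, k)."""
    V = proj_lie_algebra(spread * rng.standard_normal(J.shape), J)
    return B @ expm(V)

rng = np.random.default_rng(3)
J = np.diag([1.0, 1.0, 1.0, -1.0])
B = expm(proj_lie_algebra(0.3 * rng.standard_normal((4, 4)), J))  # center
for _ in range(5):
    Bq = random_lorentz_sample(B, J, rng)
    assert np.allclose(Bq.T @ J @ Bq, J, atol=1e-10)   # Bq lies in O(3, 1)
```

The `spread` parameter controls how tightly the samples cluster around the center B.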

Conclusions
In this paper, we considered the averaging optimization problem on the Lorentz group O(n, k), namely, computing the average of a set of Lorentz matrices. In order to tackle the related optimization problem, a geodesic-based Riemannian-steepest-descent algorithm was presented on the Lorentz group endowed with a left-invariant (Riemannian) metric. The devised averaging algorithm was compared with the extended Hamiltonian algorithm and the Euclidean gradient algorithm. The results of numerical experiments on the Lorentz group O(3, 1) show that the convergence rate of the geodesic-based Riemannian-steepest-descent algorithm is the best among the three algorithms.

Figure 1. The criterion function J(U) and the Frobenius norm ‖U‖ during the iteration, respectively.

Figure 3. (a) Convergence comparison among the RSDA, the EHA and the EGA during the iteration; (b) the norm ‖A^{-1} ∇_A f‖ of the Riemannian gradient (17) during the iteration by the Riemannian-steepest-descent algorithm (RSDA); (c,d) the discrepancy D(A, B_q) before the iteration (A = A_0) and after the iteration (A = Ā), respectively.
Figure 3a shows the values of the criterion function (14) obtained by the RSDA, the EHA and the EGA when averaging 50 Lorentz matrices.