Next Article in Journal
A Sparse Quasi-Newton Method Based on Automatic Differentiation for Solving Unconstrained Optimization Problems
Previous Article in Journal
Enhance Contrast and Balance Color of Retinal Image
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Tutorial

Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems

Dipartimento di Ingegneria dell’Informazione, Università Politecnica delle Marche, Via Brecce Bianche, I-60131 Ancona, Italy
Symmetry 2021, 13(11), 2092; https://doi.org/10.3390/sym13112092
Submission received: 23 August 2021 / Revised: 12 October 2021 / Accepted: 26 October 2021 / Published: 4 November 2021

Abstract

:
The aim of the present tutorial paper is to recall notions from manifold calculus and to illustrate how these tools prove useful in describing system-theoretic properties. Special emphasis is put on embedded manifold calculus (which is coordinate-free and relies on the embedding of a manifold into a larger ambient space). In addition, we also consider the control of non-linear systems whose states belong to curved manifolds. As a case study, synchronization of non-linear systems by feedback control on smooth manifolds (including Lie groups) is surveyed. Special emphasis is also put on numerical methods to simulate non-linear control systems on curved manifolds. The present tutorial is meant to cover a portion of the mentioned topics, such as first-order systems, but it does not cover topics such as covariant derivation and second-order dynamical systems, which will be covered in a subsequent tutorial paper.

1. Introduction

The theory of dynamical systems whose state spaces possess the structures of curved manifolds has been applied primarily in physics (especially to mathematically describe the theory of general relativity at the beginning of the 20th century). More recently, dynamical systems on manifolds have proven their relevance in a number of subjects in engineering and applied sciences such as in robotics and in biomedical engineering [1,2,3,4,5] to name a few. The observation at the core of such applications is that those dynamical systems whose descriptive variables are bound to one another by non-linear holonomic constraints may be studied by means of the rich variety of mathematical tools provided by manifold calculus (a shortened terminology to denote ’calculus on manifold’) and may be framed in the class of dynamical systems on manifold.
The structures of the state manifolds of such dynamical systems depend on the application at hand. As a concrete example, whenever the dynamics of a rigid body is concerned, the state manifold of choice is the special orthogonal group SO ( 3 ) , because it encodes the attitude of a flying drone or a submarine robot. Most applications of interest (such as the computation of the dynamics of flying bodies) concern well-studied and well-understood curved manifolds, such as the special orthogonal group, the unit hypersphere and the group of symmetric, positive-definite matrices.
From the perspective of performing numerical simulations of dynamical systems on a computing platform, it is necessary to design adequate numerical methods to compute approximate solutions which still meet the structures of the state manifolds. Classical numerical methods, such as those in the Euler class or in the Runge–Kutta class, will fail when applied directly to such dynamical systems because they were designed to work on flat spaces and cannot cope with non-flat manifolds [6].
As a specific applied field, the time synchronization of first-order dynamical systems on curved state manifolds by non-linear control will be surveyed. The time synchronization of non-autonomous dynamical systems has been applied in physiology [7,8], ecology [9], atmosphere physics [10], neurology [11] and many more applied fields [12,13,14]. Synchronization theory is a topic that appears to be very interesting from an abstract point of view and, at the same time, very useful in a number of applications. Indeed, synchronization theory appears to be an exciting and interdisciplinary research topic that combines key ideas from system theory, control theory and manifold calculus. In particular, non-linear control theory on manifolds may supply tools to design control fields to force a pair of dynamical systems into synchronizing their dynamics over time [15,16].
The present tutorial paper is devoted to recalling fundamental notions from manifold calculus and to explaining how these concepts apply to system theory and to non-linear control theory, taking time synchronization as a representative case of study in non-linear control. The present contribution is oriented to readers who either possess a good command of calculus but not of manifold calculus, or to readers who claim an understanding of theoretical manifold calculus, but do not have enough insight in the application/computation aspects of this field. In addition, a basic understanding of system theory and control theory will be assumed. The content of the present tutorial paper may be summarized as follows:
  • It provides a clear and well-motivated introduction to manifold calculus, the basis of system and control theories on manifolds, with special emphasis on computational and applicational aspects. The present contribution provides practical formulas to deal with those real-valued manifolds that, in the author’s experience, are the most accessed in engineering and applied science problems. As a matter of fact, complex-valued manifolds are not treated at all.
  • It clearly states and illustrates the idea that, when one wishes to perform a simulation, by a computing platform, of dynamical systems on manifolds described in terms of differential equations, it is necessary to time-discretize such differential equations in a suitable way. In order to achieve such a discretization, it is not safe to invoke standard discretization methods (such as the ones based on Euler forward–backward discretization), which do not work as they stand on curved manifolds. One should therefore resort to more sophisticated numerical integration techniques.
  • By the author’s choice, the present tutorial paper does not carry any graphical illustrations nor any numerical simulation results. Readers who are interested in deepening their understanding of this topic are invited to to sketch graphs autonomously and to code examples in their favorite programming language.
The present paper is organized as follows. Section 2 lays out some introductory material, such as the motivation behind manifold calculus and a list of manifolds mostly accessed in applications. Section 3 introduces the notion of curves and tangent bundles and allied topics, such as normal spaces. Section 4 provides a first introduction, with examples, to first-order dynamical systems on manifolds. Section 5 presents a special kind of derivative of a function having a manifold as a domain and a manifold as a co-domain, termed a pushforward map. Section 6 introduces a class of manifolds that have peculiar features: the Lie groups. Section 7, in detail, treats the fundamental concept of metrization of a curved space through the more familiar notion of metrization of vector spaces. Section 8 introduces notions such as geodesic lines, Riemannian distance and exponential maps. Section 9 surveys the notion of Riemannian gradient and illustrates such a concept via examples. Section 10 introduces and exemplifies the notion of parallel transport along a curve, which is of paramount importance in manifold calculus and its computer-based implementations. Section 11 outlines the concept of manifold retraction and vector transport, which are computationally convenient approximations of exponential maps and parallel transport, respectively. Section 12 illustrates a feedback control theory suitable for application to first-order systems insisting on state manifolds. Section 13 introduces the notion of Riemannian Hessian, which stems from a second-order approximation of a manifold-to-scalar function, and recalls optimization algorithms that extend the Newton method to look for a zero of a vector field. Section 14 concludes this tutorial paper.
As a distinctive aspect of the present tutorial paper, the main flow of discussion is based on coordinate-free (or component-free) expressions, which facilitates the implementation of the main equations by a matrix-friendly computational language (such as MATLAB © ). Coordinate-free manifold calculus is introduced by the technical tool of embedded calculus, which stems from embedding a manifold into a larger ambient space where the calculation rules are simpler and more familiar to readers. The starred subsections, namely subsections marked by an asterisk (*), address some specific arguments, related to coordinate-prone manifold calculus, which may be skipped by the uninterested readers without detriment to the comprehension of the main flow of this presentation.
The present tutorial paper does not cover a number of subjects, such as the covariant derivation of vector fields, continuous-time second-order dynamical systems arising from a Lagrangian framework nor higher-order discrete-time dynamical systems, nor the key topics related to manifold curvature. These topics will be the subject of a forthcoming tutorial paper.

2. Coordinate-Free Embedded Manifold Calculus

In this section, we recall the relevant notation and fundamental properties of matrix calculus, as well as several examples of manifolds of interest in engineering and applied sciences.
In manifold calculus, both embedded (or extrinsic) and intrinsic coordinates may be accessed. With the aim of elucidating the difference between extrinsic and intrinsic coordinates for manifold elements, let us mention the case of the space SO ( 2 ) of planar rotations. The space SO ( 2 ) is a mono-dimensional manifold; therefore, any element X SO ( 2 ) may be pointed to by one intrinsic coordinate: a matrix X SO ( 2 ) may be represented as
X ( θ ) = cos θ sin θ sin θ cos θ ,
with θ [ 0 , 2 π ) . On the other hand, by embedding the space SO ( 2 ) into the space R 2 × 2 of 2 × 2 real-valued matrices, it turns out that any element of SO ( 2 ) may be regarded as a two-by-two real-valued matrix
X = X 11 X 12 X 21 X 22 ,
whose entries must satisfy the four non-linear constrains:
X 11 2 + X 21 2 = 1 , X 22 2 + X 12 2 = 1 , X 11 X 12 + X 21 X 22 = 0 , X 11 X 22 X 12 X 21 = 1 .
The four parameters X 11 , X 12 , X 21 , X 22 denote embedded or extrinsic coordinates. The orthogonality and normality constraints of the columns of the matrix X may be represented through two coordinate-free constraints in a compact way as
X X = I 2   and   det ( X ) = 1 ,
where a superscript denotes matrix transposition and I 2 denotes a 2 × 2 identity matrix.
Another illustrative example is the ordinary sphere S 2 that may be parametrized as follows:
x = cos θ sin φ y = sin θ sin φ z = cos φ
with parameters 0 θ < 2 π and 0 φ π (in cartography, the coordinate θ is termed ‘latitude’, while the coordinate φ is termed ‘longitude’.) In general, the dimension of a manifold denotes the minimal number of parameters that are required to individuate a point on a manifold. While in common speech the ordinary sphere is termed ‘three-dimensional’, it is indeed a bi-dimensional manifold embedded in a three-dimensional ambient space. The dimension of the manifold is 2; the dimension of the embedding space, instead, is 3.
In general, in engineering and applied mathematics, using intrinsic coordinates is deprecated. As we have just seen, representing an ordinary sphere through intrinsic coordinates is fairly easy, but what about, for example, a hypersphere S 9 ? Any intrinsic representation would require nine angles and a series of complicated trigonometric expressions. Assuming that a manifold of interest may be embedded into a larger ambient space A , it is way easier to represent one such manifold as a subset of points of an ambient space that meet certain constraints. For example, the sphere S 9 may be embedded in R 10 and every point of such a sphere may be represented by an array x R 10 such that
i = 1 10 x i 2 = x x = 1 .
Clearly, the ten coordinates x i , termed embedded coordinates, turn out to be unnecessary (nine coordinates would be enough). The existence of an embedding may be evaluated through classical results of manifold calculus, which also state the best dimension of the ambient space A on the basis of the structural properties of the embedding. Two such results are the strong Whitney embedding theorem and the Nash isometric embedding theorem for Riemannian manifolds. In the present tutorial, we are concerned exclusively with manifolds that may be embedded into a linear ambient space.
The number of extrinsic coordinates is, in general, far larger than the number of intrinsic coordinates; hence, a coordinate-free embedded representation is, in general, redundant. However, as long as computer implementation is concerned, a coordinate-free representation is advantageous because computing languages, such as MATLAB © and Python, generally deal seamlessly with bi-dimensional arrays. With reference to the sphere, let us examine a simple example that tells us why manifolds need their own calculus.
Example 1.
Take a point x S 2 and a direction v R 3 { 0 } , such that x v = 0 , and let us write the parametric equation of a line departing from x in the direction v as ( t ) : = x + t v , with 0 t 1 . It is important to point out that none of the points over such a line belong to the sphere, except for the point x. In fact, we have
( t ) ( t ) = ( x + t v ) ( x + t v ) = 1 + t 2 v 2 ,
since v is orthogonal to x. It turns out that ( t ) ( t ) 1 if t 0 , hence the assertion. ■
This simple example tells us that standard constructs, such as straight lines, do not work on curved manifolds, hence the need of a specific calculus to be consistently developed.

2.1. General Notation and Properties

In general, column arrays are denoted by a lower-case letter, while matrices are denoted by an upper-case letter. Manifold elements, for a generic manifold, are denoted as lower-case letters.
A number of matrix functions and factorizations will be invoked in this paper, namely:
  • Matrix trace: The trace of a square matrix M R p × p (namely the sum of its principal-diagonal entries) is denoted by tr ( M ) . Matrix trace has a cyclic permutation invariance property. For example, given three conformable (i.e., mutually multipliable) matrices A , B , C , it holds that:
    tr ( A B C ) = tr ( B C A ) = tr ( C A B ) .
  • Matrix square root: Given a matrix M R p × p , its square root R is the unique matrix such that R 2 = P . Not every matrix admits a square root. Special square roots (such as a symmetric square root) will be defined later.
  • Spectral factorization: Given a matrix M R p × p , let us assume that there exists an orthogonal matrix X (i.e., one such that X X = I p ) and a real diagonal matrix D such that M = X D X . The expression on the right-hand side denotes the spectral factorization of the matrix M. Such a factorization turns out to be very useful in evaluating matrix polynomials. For example, it holds that
    M 4 = ( X D X ) 4 = ( X D X ) ( X D X ) ( X D X ) ( X D X ) = X D ( X X ) D ( X X ) D ( X X ) D X = X D D D D X = X D 4 X .
    Exponentiating full matrices is cumbersome, while exponentiating diagonal matrices is amiable.
  • Thin QR factorization: Any matrix M R p × q , with p q , may be factored as the product of a p × p orthogonal matrix Q and a p × q upper triangular matrix R, namely M = Q R . In general, a QR factorization is not unique [17]. To remove such kind of indeterminacy, the R-factor may be chosen with strictly positive entries on its main diagonal, so that the factorization is unique.
  • Compact singular value factorization (SVD): Compact SVD of a matrix M R p × q is a matrix factorization of the type M = A D B in which D is square diagonal of size r × r , where r min { p , q } is the rank of M, and has only nonzero singular values. In this variant, A denotes a p × r matrix and B denotes a q × r matrix, such that A A = B B = I r [18].
  • Polar factorization: Given a real-valued p × n matrix M, its polar factorization is written as M = X S , where X denotes a p × n matrix such that X X = I n , termed polar factor, and S denotes a symmetric positive semidefinite n × n matrix [19]. The polar factorization of a matrix always exists and, if the matrix is full rank, its polar factor is unique.
  • Matrix exponential: Given a matrix M R p × p , its matrix exponential is denoted as Exp ( M ) . Matrix exponential is defined via a series as
    Exp ( M ) : = k = 0 M k k ! .
    There exist special formulas to compute the matrix exponential via a finite number of operations for special matrices (see, for example, [20] and references therein). For example, for a symmetric matrix M R p × p , that admits a spectral factorization M = X D X , it holds that
    Exp ( M ) = X Exp ( D ) X ,
    where Exp ( diag ( λ 1 , λ 2 , λ 3 , , λ p ) ) = diag ( e λ 1 , e λ 2 , e λ 3 , , e λ p ) .
  • Principal matrix logarithm: Given a matrix M R p × p , its principal matrix logarithm is denoted as Log ( M ) . Matrix logarithm is defined via a series as
    Log ( M ) : = k = 1 ( I p M ) k k , defined   for I p M < 1 .
    In the specialized literaure it is possible to find special formulas to compute the principal matrix logarithm via a finite number of operations for special matrices. For example, for a positive-definite matrix M R p × p , which admits a spectral factorization M = X D X , it holds that
    Log ( M ) = X Log ( D ) X ,
    where Log ( diag ( λ 1 , λ 2 , λ 3 , , λ p ) ) = diag ( log λ 1 , log λ 2 , log λ 3 , , log λ p ) . Recall that any symmetric, positive-definite matrix has only positive eigenvalues; hence, Log ( D ) is a real-valued matrix.
The other notation used within this paper is defined at its earliest occurrence. We just recall that the symbol : = denotes a definition and that the derivative of a function F ( t ) with respect to its scalar parameter t is denoted as F ˙ .

2.2. Manifolds and Embedded Manifolds (or Submanifolds)

The formal definition of a smooth manifold is quite convoluted, as it requires notions from mathematical topology [21]. More practically, a manifold may be essentially regarded as a generalization of surfaces in higher dimensions that is endowed with the noticeable property of being locally similar to a flat space. In addition, a manifold may generally be regarded as an abstract mathematical object, not necessarily ready for computation, whereas, in practice, manifolds of interest in applied science and engineering are essentially matrix manifolds.
Let us consider a smooth manifold M and a point x on it. From an abstract point of view, x is an element of a set M and does not necessarily carry any intrinsic numerical features. In order to be able to develop calculus on manifolds, to, for instance, compute the directional derivative of a function f : M R , it is convenient to ‘coordinatize’ a manifold. To this aim, let us take a neighborhood (open set) U M that contains the point x and a coordinate map ψ : U R p . The size p denotes the minimal number of coordinates that is necessary to specify the location of the point x unequivocally and is taken as the dimension of the manifold. The map ψ needs to be a one-to-one map. In this way, we attach a set of coordinates ψ ( x ) to the point x. Such a construction establishes a smooth one-to-one correspondence between a point on a manifold and its coordinate point; therefore, the two concepts may be confused and we may safely speak of a point x M when actually speaking of its coordinates.
The above theoretical construction carries a number of practical drawbacks. For instance, notice that in general a manifold cannot be covered by a single coordinate map. Indeed, in general a manifold needs to be covered by a number of neighborhoods U k , each of which is equipped with a coordinate map ψ k : U k R p . The set { U k } thus forms a basis for the manifold. Such a basis need not be finite, although it needs to be countable and is termed ‘atlas’. In general, the basis neighborhoods U k may happen to be overlapping one another; hence, the coordinate maps ψ k need to satisfy the compatibility conditions. Such conditions formalize the natural requirement that there need to be a one-to-one smooth correspondence between any two different coordinate systems insisting on regions of the manifolds belonging to more than one neighborhood. In formal terms, if U k U h , then the ‘transition functions’ ψ k 1 ψ h and ψ h 1 ψ k should possess the structure of diffeomorphisms, namely C functions endowed with C inverses.
As mentioned in the introduction, in the present tutorial we are neglecting coordinates in favor of embeddings, except in starred sections that cover coordinate-prone calculations and that may be skipped by uninterested readers. A smooth manifold is by nature a continuous object. Manifolds of interest in applications, described in embedded terms, may be summarized as follows:
  • Hypercube: The simplest manifold of interest is perhaps the hypercube R p , which is essentially the set spanned by p real-valued variables (or p-tuples).
  • Hypersphere, oblique manifold, hyperellipsoid: A hypersphere is represented as S p 1 : = { x R p x x = 1 } and is the subset of points of the hypercube with unit Euclidean distance from point 0. This is a smooth manifold of dimension p 1 embedded in the hypercube R p ; in fact, with only p 1 coordinates, we can identify unequivocally any point on a sphere. In [21], it is shown how to ‘coordinatize’ such a manifold through, e.g., the stereographic projection, which requires two coordinate maps applied to two convenient neighborhoods (each including only one ‘pole’) on the sphere. The special cases are S 1 , the unit circle, and S 2 , the ordinary sphere. There exist a number of applications insisting on the hyperspheres S p 1 such as, for instance, blind deconvolution [22,23], data classification [24], adaptive pattern recognition [25] and motion planning, optimization, and verification in robotics and in computational biology [26]. A smooth manifold closely related to the unit hypersphere is the oblique manifold [27], defined as:
    OB ( p ) : = { X R p × p diag ( X X ) = I p } ,
    where the operator diag ( · ) returns the zero matrix except for the main diagonal that is copied from the main diagonal of its argument. The structure of the oblique manifold may be easily studied on the basis of the unit hypersphere; in fact, the following identification holds true:
    OB ( p ) S p 1 × S p 1 × × S p 1 p times ,
    hence, each column of a OB ( p ) -matrix may be treated as a S p 1 -column array. The hyperellipsoid L p 1 is defined as:
    L p 1 : = { x R p D R x 2 = 1 } ,
    with D R p × p being diagonal and positive-definite and R R p × p being a hyper-rotation such that R R = I p and det ( R ) = 1 . The mathematical structure of the hyperellipsoid L p 1 may be studied on the basis of the manifold structure of the hypersphere. The hyperellipsoid L p 1 is used, e.g., in the calibration of magnetometers [28].
  • General linear group and special linear group: The general linear group is defined as GL ( p ) : = { X R p × p det ( X ) 0 }. This is the subset of the space of p × p matrices R p × p which are invertible. The special linear group is defined as Sl ( p ) : = { X R p × p det ( X ) = 1 }. This is the subset of the general linear group made by all matrices with a unitary determinant.
  • Orthogonal group, special orthogonal group, special Euclidean group: An orthogonal group of size p is defined by O ( p ) : = { X R p × p X X = I p } . The manifold O ( p ) has dimension 2 1 p ( p 1 ) . In fact, every matrix in O ( p ) possesses p 2 entries which are constrained by 2 1 p ( p + 1 ) orthogonality/normality restrictions. The manifold of special orthogonal matrices is defined as SO ( p ) : = X R p × p X X = I p , det ( X ) = 1 . A smooth manifold closely related to the special orthogonal group is the special Euclidean group, denoted as SE ( p ) , that finds applications in robotics (see, e.g., [29]). The special Euclidean group is a set of ( p + 1 ) × ( p + 1 ) matrices defined as:
    SE ( p ) : = X δ 0 1 X SO ( p ) , δ R p .
  • Stiefel manifold: The (compact) Stiefel manifold is defined as:
    St ( n , p ) : = { X R n × p X X = I p } ,
    where p n . Avery Stiefel matrix has n p entries, but its elements are constrained by 2 1 p ( p 1 ) non-linear constraints, and hence the dimension of a Stiefel manifold St ( n , p ) is n p 2 1 p ( p 1 ) . Exemplary applications are blind source separation [30], non-negative matrix factorization [31], best basis search/selection [32], electronic structures computation [33] and factor analysis in psychometrics [34]. A generalization of the Stiefel manifold, to be applied to principal subspace tracking, was studied in the contribution [35]. Such a generalized Stiefel manifold is defined as:
    St B ( n , p ) = { X R n × p X B X = I p } ,
    where B denotes any symmetric, positive-definite n × n matrix. The contribution [35] studied the structure of the tangent bundle of the generalized Stiefel manifold and suggested a computationally convenient calculus for this manifold.
  • Real symplectic group: The real symplectic group is defined as
    Sp ( 2 n ) : = { Q R 2 n × 2 n | Q J Q = J } , J : = 0 n I n I n 0 n ,
    where the symbol I n denotes again a n × n identity matrix and the symbol 0 n denotes a whole-zero n × n matrix. The skew-symmetric matrix J enjoys the following properties: J 2 = I 2 n , J 1 = J = J .
  • Manifold of symmetric, positive-definite (SPD) matrices: The main features of the space
    S + ( p ) : = { P R p × p | P = P , P > 0 }
    of symmetric, positive-definite matrices was surveyed, e.g., in [36,37]. We recall that a matrix P R p × p is termed positive definite if for every column array z R p { 0 } it holds that z P z > 0 . A useful property is that the (real) eigenvalues of any SPD matrix are all strictly positive. A related manifold is the space S + ( n , p ) of symmetric fixed-rank positive-semidefinite matrices, that may be defined as:
    S + ( n , p ) : = { X X X R n × p , rank ( X ) = p } .
    The growing use of low-rank matrix approximations to retain tractability in large-scale applications boosted extensions of the calculus of positive-definite matrices to their low-rank counterparts [38].
  • Grassmann manifold: A Grassmann manifold Gr ( n , p ) is a set of subspaces of R n spanned by p-independent vectors, namely
    Gr ( n , p ) = { span ( w 1 , w 2 , w 3 , , w p ) } ,
    with w 1 , w 2 , w 3 , , w p R n being a p-tuple of arbitrary and linearly independent n-dimensional arrays. Grassmann manifolds are compact, smooth manifolds and are special cases of more general objects termed flag manifolds [39]. A representation of any of such subspaces may be assumed as the equivalence class [ X ] = { X R X St ( n , p ) , R O ( p ) } [33]. In practice, an element [ X ] of the Grassmann manifold Gr ( n , p ) is represented by a matrix in St ( n , p ) , whose columns span the subspace [ X ] .
It is easy to see that manifolds of interest in applications may exhibit large dimensions. Coordinatizing these manifolds may be inconvenient for practical purposes when their sizes are larger than some units. For this reason, matrix manifolds are treated as submanifolds of R p or R p × n and their elements are represented by solid arrays.
In general, we shall denote as A an ambient space that a manifold is embedded into. We shall assume that any ambient space A of interest in the present tutorial paper is a Euclidean vector space, namely, a finite-dimensional vector space over the real numbers endowed with an inner product, that we shall denote as · , · A .
Example 2.
An elementary example of manifold calculus involving the circle S 1 is the Borsuk–Ulam theorem.
Theorem 1.
Let f : S 1 R denote a continuous function. There exist two antipodal points x , x S 1 such that f ( x ) = f ( x ) .
Proof. 
Let g ( y ) : = f ( y ) f ( y ) with y S 1 . Choose an arbitrary point x 0 S 1 :
  • If g ( x 0 ) = 0 , then x = x 0 ;
  • If g ( x 0 ) 0 , notice that g ( x 0 ) = f ( x 0 ) f ( ( x 0 ) ) = ( f ( x 0 ) f ( x 0 ) ) = g ( x 0 ) . Since g is continuous and takes a different sign in two different points of its domain, there must exist at least a point x S 1 such that g ( x ) = 0 .
The above cases prove the assertion by exhaustion. □
The Borsuk–Ulam theorem has an interesting consequence for the temperature of the Earth. In fact, let us identify the equator with S 1 and the temperature over the equator with the function f, then the Borsuk–Ulam theorem implies that there exist two antipodal points over the equator where the temperature is the same. Indeed, the Borsuk–Ulam theorem holds in every dimension, which implies, for example, that if S 2 represents the Earth’s surface and f : S 2 R 2 returns the atmospheric temperature and the Earth’s pressure in a point of the Earth surface, then there exist two antipodal points on Earth that exhibit the same temperature-pressure value pair! ■
Matrix factorizations are characterized by a paramount importance in applied calculations, as illustrated in the following.
Example 3.
Let us recall that every symmetric, positive-definite matrix may be factorized as:
P = R D R ,
where P S + ( n ) , R SO ( n ) and D denotes a diagonal n × n matrix. In fact, D is the matrix whose in-diagonal entries coincide with the eigenvalues and R is the matrix whose columns coincide with the eigenvectors of P. The factorization (24) simplifies some calculations, for example, it holds that:
P 3 = P P P = ( R D R ) ( R D R ) ( R D R ) = R D ( R R ) D ( R R ) D R P 3 = R D D D R = R D 3 R , P = R D R , Exp ( P ) = i = 0 P i i ! = i = 0 ( R D R ) i i ! = i = 0 R D i R i ! = R Exp ( D ) R , P 1 = R D 1 R .
The net result is that convoluted matrix operations may be turned into fairly simple computations that require a finite number of elementary operations. ■
Manifolds may be classified as compact or non-compact:
  • The manifolds S n 1 A , with A : = R n , and SO ( n ) A , with A : = R n × n , are compact since there exists a ball B ( 0 , r ) A , with r < , that contains them.
  • The manifold S + ( n ) is non-compact since no t finite-radius ball exists that contains it.
In practical terms, the elements of a compact manifold are necessarily limited in value, while the elements of a non-compact manifold may take arbitrarily large values.
From a computational point of view, dynamical systems on compact manifolds, even if they turn out to be unstable, do not pose serious implementation problems, while dynamical systems on non-compact manifolds may result in implementational difficulties (and easily cause runtime errors).

3. Smooth Curves, Tangent Vector Fields, Tangent Spaces and Bundle, Normal Spaces

An interesting object we may think of on a smooth manifold M is a smooth curve γ : [ ϵ , ϵ ] M , with ϵ > 0 . It is worth remarking that a curve may cross different coordinate charts ( U k , ψ k ) ; therefore, it is generally necessary to split a curve in as many branches (or segments) as coordinate charts it crosses. The function γ describes a curve on the manifold M delimited by the endpoints γ ( ϵ ) and γ ( ϵ ) .

3.1. Curves and Bundles for Embedded Manifolds

Let us assume that a manifold M is embedded into an ambient space A of suitable dimensions (for instance, the sphere S p 1 is embedded in the ambient space A : = R p ). In the following, a number of examples, and a counterexample, are discussed to clarify the important notion of a smooth curve on a manifold.
Example 4.
On a hypersphere S n 1 , consider the following function:
γ ( t ) : = x + t v ( x + t v ) ( x + t v )
with t [ ϵ , ϵ ] and x , v R n being arbitrary (but let us avoid the singularity x + t v = 0 ). Notice that γ ( 0 ) = x / x . To verify that such a function γ traces indeed a curve on the hyersphere it suffices to show that, for every t [ ϵ , ϵ ] and x , v R n , it holds that γ ( t ) S n 1 . Indeed,
γ ( t ) γ ( t ) = ( x + t v ) ( x + t v ) ( x + t v ) ( x + t v ) 2 = 1 .
In addition, consider the following function
γ ( t ) : = x cos ( v v t ) + v v v sin ( v v t ) ,
with t [ ϵ , ϵ ] and with x S n 1 and v R n { 0 } such that v x = 0 . (In this case, it turns out that γ ( 0 ) = x .) Similarly to the previous example, in order to prove that γ is a valid curve on the hypersphere, it suffices to show that γ ( t ) S n 1 for every t [ ϵ , ϵ ] :
γ ( t ) γ ( t ) = [ x cos ( v t ) + v v sin ( v t ) ] [ x cos ( v t ) + v v sin ( v t ) ] = ( x x ) cos 2 ( v t ) + 2 x v v sin ( v t ) cos ( v t ) + v v v 2 sin 2 ( v t ) = ( 1 ) cos 2 ( v t ) + 2 ( 0 ) v sin ( v t ) cos ( v t ) + ( 1 ) sin 2 ( v t ) = 1 .
It is not difficult to evaluate γ ( t ) and, in particular, to show that γ ( 0 ) = v .
Let us now consider an example of a curve over the manifold SO ( n ) of the hyper-rotations. In particular, let us examine the following curve on the manifold SO ( 2 ) :
γ ( t ) : = cos ( b t ) sin ( b t ) sin ( b t ) cos ( b t )
with t [ ϵ , ϵ ] and b R being arbitrarily chosen. (Notice that γ ( 0 ) = I 2 .) In order to show that function γ represents a curve over the space of planar rotations, it suffices to compute the product γ ( t ) γ ( t ) and to show that it keeps equal to the identity I 2 and that det ( γ ( t ) ) = 1 for every value of t in its range:
γ ( t ) γ ( t ) = cos ( b t ) sin ( b t ) sin ( b t ) cos ( b t ) cos ( b t ) sin ( b t ) sin ( b t ) cos ( b t ) = cos ( b t ) sin ( b t ) sin ( b t ) cos ( b t ) cos ( b t ) sin ( b t ) sin ( b t ) cos ( b t ) = 1 0 0 1 , det ( γ ( t ) ) = cos 2 ( b t ) ( sin 2 ( b t ) ) = 1 .
Further to this, let us prove that the following function represents a curve over the manifold SO ( n ) :
γ ( t ) : = X ( I n + t H ) ( I n t H ) 1
with t [ ϵ , ϵ ] , X SO ( n ) and H = H R n × n . (Even in this case, it holds that γ ( 0 ) = X .) In order to prove such a statement, it is necessary to compute γ ( t ) γ ( t ) and to show that it equals I n and that det ( γ ( t ) ) = 1 at any time:
γ ( t ) γ ( t ) = [ X ( I n + t H ) ( I n t H ) 1 ] X ( I n + t H ) ( I n t H ) 1 = ( I n t H ) ( I n + t H ) ( X X ) ( I n + t H ) ( I n t H ) 1 = ( I n + t H ) 1 ( I n + t H ) ( I n + t H ) ( I n t H ) 1 = ( I n + t H ) [ ( I n + t H ) 1 ( I n + t H ) ] ( I n t H ) 1 = ( I n + t H ) ( I n t H ) 1 = ( I n t H ) ( I n t H ) 1 = I n , det ( γ ( t ) ) = det ( X ) det ( I n + t H ) det 1 ( I n t H ) = det ( X ) det ( I n + t H ) det 1 ( ( I n + t H ) ) = 1 ,
where the superscript denotes the inverse of the transposed matrix.
Notice that, in the proof, we have made use of some matrix identities, including the commutativity property ( I n + t H ) ( I n t H ) 1 = ( I n t H ) 1 ( I n + t H ) and det ( A ) = det ( A ) .
As a further example, let us consider the following function, which we shall prove to represent a curve over the manifold of the 2 × 2 symmetric, positive-definite matrices:
γ ( t ) : = a ( t + 1 ) 2 b ( t + 1 ) 2 b ( t + 1 ) 2 c ( t + 1 ) 2
with t [ ϵ , ϵ ] , 0 < ϵ < 1 , a > 0 and a c b 2 > 0 . (Notice that γ ( 0 ) = a b b c .) To prove the assertion, it suffices to show that γ ( t ) = γ ( t ) and that the eigenvalues of γ ( t ) are strictly positive for any t in its range. Symmetry is immediately verified. The eigenvalues are a ( t + 1 ) 2 > 0 and ( a c b 2 ) ( t + 1 ) 4 > 0 .
Let us now consider a counterexample. Define the following function:
γ ( t ) : = P + t Q ,
with t [ ϵ , ϵ ] and P , Q S + ( n ) arbitrary. In general, the above function does not represent a curve in S + ( n ) . To prove such an assertion, it suffices to recall that, in general, a linear combination of two positive-definite matrices does not result in a positive-definite matrix. As a numerical example, consider the following: P : = 4 1 1 1 and Q : = 2 1 1 1 . Then, γ ( t ) = 4 + 2 t 1 t 1 t 1 + t . It is readily verified that if t [ 4 , 4 ] , then γ ( t ) is not necessarily positive-definite for every value of t in the range (for example, γ ( 3 ) is not positive-definite). From this counterexample, we can verify that the manifold S + ( n ) , as all manifolds just exemplified, is not a linear space with respect to standard matrix operations. ■
From a system-theoretic perspective, a curve may be thought of as a trajectory generated by a dynamical system whose state space is a smooth manifold. Take, for instance, the case of a little magnetized ball rolling over a metallic spherical surface: no matter how the little ball moves over the surface of the sphere, during its motion the contact point will describe a curve γ ( t ) over the larger sphere from the initial time t = ϵ to t = + ϵ . The parameter t may be thought of as a time index, whose progression corresponds to the flowing of time. Thinking of physical objects moving over curved surfaces helps in understanding the main concepts in manifold calculus, even though a certain degree of abstraction is still necessary, not only because the dimensions of manifolds of interest may be far larger than 3, but also because most manifolds are difficult to visualize in practice.
A noteworthy property of curves is that they are equivalent up to a regular reparameterization.
Comparing the position over a smooth curve in two close-by instants provides the notion of speed, namely, how quickly an object is moving along a trajectory and in which direction. In particular, denoting by γ x : [ ϵ , ϵ ] M a curve such that γ x ( 0 ) = x , the quantity
v x : = lim h 0 γ x ( h ) γ x ( 0 ) h = d γ x ( t ) d t t = 0 = γ ˙ x ( 0 ) A ,
denotes such a speed, which is represented as a tangent vector  v x at the point x on the manifold. Clearly, the vector v x does not belong to the curved manifold M , whereas it is tangent to it in the point x. (Notice that in this tutorial the word ‘vector’ is reserved to tangent vectors, while a column-type array—which may represent a location—is termed ‘array’.)
Consider every possible smooth curve on a manifold of dimension p passing through the point x and compute the tangent vectors to all these curves in the point x at once. The collection of these tangent vectors span a linear space of dimension p, which is referred to as tangent space (or ‘tangent plane’) to the manifold M at the point x, and is denoted with
T x M : = { γ ˙ x ( 0 ) γ x } A .
Since there exist infinitely many curves passing through a given point with the same velocity, one should define a tangent vector as an equivalence class of curves passing through such a given point, while being tangent to each other at that point. A tangent plane is hence a collection of such vectors. In addition, we should underline that each tangent space is a subset of the ambient space due to the special nature of A , being both a point set and a vector space.
Taking a point x M and a tangent vector v x M , any pair ( x , v x ) is thought of as belonging to an abstract set termed tangent bundle, defined as
T M : = { ( x , v ) A 2 x M , v T x M } .
In order to facilitate computation, it is useful to introduce the concept of the normal space of an embedded manifold in a given point under a chosen metric:
N x M : = { z A z , v A = 0 , v T x M } .
The normal space N x M represents the orthogonal complement of the tangent space with respect to a Euclidean ambient space A that the manifold M is embedded into (in fact, some authors denote normal spaces as T x M ).
Let us examine the structure of the tangent/normal spaces of a number of smooth manifolds:
  • Hypercube: Since the space R p is linear, when it is embedded into itself, each tangent space coincides to the whole space, namely, for every, x R p it holds that T x R p R p . Since a normal space is the orthogonal complement to a tangent space, we must conclude that N x R p = { 0 } .
  • Hypersphere: At every point x S p 1 , the tangent space has the structure
    T x S p 1 : = { v R p v x = 0 } .
    The normal space N x S p 1 at every point of the hypersphere, which is the orthogonal complement of the tangent space with respect to the ambient space A : = R p that the manifold S p 1 is embedded in, has the structure
    N x S p 1 : = { λ x λ R }
    in the case that the ambient space is equipped with the Euclidean inner product v , w A : = v w .
  • Special orthogonal group: The tangent space of the manifold SO ( p ) has the structure
    T X SO ( p ) = { V R p × p V X + X V = 0 p } .
    This may be proven by differentiating a generic curve γ ( t ) SO ( p ) passing through X at t = 0 . Every such curve satisfies the orthogonal-group characteristic equation γ ( t ) γ ( t ) = I p ; therefore, after differentiation, one obtains
    γ ˙ ( 0 ) γ ( 0 ) + γ ( 0 ) γ ˙ ( 0 ) = 0 p .
    By recalling that the tangent space is formed by velocity vectors γ ˙ ( 0 ) , the above-mentioned result is readily achieved. Provided the ambient space A : = R p × p is endowed with the canonical Euclidean metric V , W A : = tr ( V W ) , the normal space at a point X may be defined as
    N X SO ( p ) = { N R p × p tr ( N V ) = 0 , V T X SO ( p ) } .
    It is easy to convince oneself that every tangent vector V T X SO ( p ) may be written as V = X H , with H skew-symmetric (i.e., such that H = H ); then, any element N N X SO ( p ) may be written as N = X S , with S = S . In fact, the normality condition implies 0 = tr ( V ( X S ) ) = tr ( S V X ) = tr ( X V S ) , which is equivalent to tr ( ( V X ) S ) ; therefore, the normality condition may be recast as tr ( ( V X ) ( S S ) ) = 0 . It is hence necessary and sufficient that S = S ; therefore,
    N X SO ( p ) = { X S S = S R p × p } .
  • Stiefel manifold: Given a trajectory [ ϵ , ϵ ] t X ( t ) St ( n , p ) , derivation with respect to the parameter t yields X ˙ X + X X ˙ = 0 , which means that the tangent space to the manifold St ( n , p ) in a point X St ( n , p ) has the structure:
    T X St ( n , p ) = { V R n × p V X + X V = 0 } .
    The normal space has the structure:
    N X St ( n , p ) = { X S S R p × p , S S = 0 } .
  • Real symplectic group: The tangent space associated with the real symplectic group has the structure:
    T Q Sp ( 2 n ) = { V R 2 n × 2 n V J Q + Q J V = 0 2 n } .
    The tangent spaces and the normal space associated with the real symplectic group may be characterized as follows:
    T Q Sp ( 2 n ) = { V = Q J S S R 2 n × 2 n , S = S } , N Q Sp ( 2 n ) = { N = J Q H H R 2 n × 2 n , H = H } .
  • Space of symmetric, positive-definite matrices: Given a point P S + ( p ) , its tangent bundle may be characterized simply by observing that every curve γ : [ ϵ , ϵ ] S + ( p ) satisfies γ ( t ) = γ ( t ) and γ ( t ) > 0 . Only the equality constraint influences the structure of the tangent space; therefore,
    T P S + ( p ) = { S R p × p S = S } .
    Notice that every tangent space is identical to each other as it does not depend on the base point P.
  • Grassmann manifold: For every element [ X ] Gr ( n , p ) , the tangent space may be represented as:
    T [ X ] Gr ( n , p ) = { V R n × p | X V = 0 } .
    A tangent space T [ X ] Gr ( n , p ) may be decomposed as the direct sum of a horizontal space and of a vertical space at [ X ] Gr ( n , p ) [40]. Starting from a point [ X ] Gr ( n , p ) , moving along a horizontal direction causes a change in subspace, while moving along a vertical direction does not change the subspace [ X ] .

3.2. Vector Fields

Let us now consider a map v x : M T M that associates a tangent vector v x with every point x of a manifold. One such map is termed tangent vector field (or simply vector field, for short). The set of all tangent vector fields of a manifold M is denoted as Γ ( M ) . Vector fields are objects of prime importance in manifold calculus. In physical system dynamics, vector fields are associated with, e.g., speed and acceleration.
Example 5.
A vector field on a manifold M is a function f : M T M that assigns to every point x M a tangent vector f ( x ) T x M . A vector field may even depend on the time parameter, in which case it will be denoted as f ( t , x ) with f : R × M T M .
Let us now consider the hypersphere S 7 embedded in A : = R 8 and the function f : S 7 R 8 defined as:
f ( x ) : = ( I 8 x x ) A x ,
where x indicates a column array of 8 elements and A R 8 × 8 is an arbitrary constant matrix. Such a function describes a vector field in Γ ( S 7 ) because it holds that f ( x ) T x S 7 for every x. In fact,
x f ( x ) = x ( A x x x A x ) = x A x ( x x ) ( x A x ) = 0 .
Notice that the tangent vector f ( x ) is inextricably tied to the point x on the manifold. In fact, in general, taking two distinct points x , y M , it turns out that f ( x ) T y M . It is then sensible to describe a vector field as a set of pairs:
( x , f ( x ) ) T M .
As a further example, let us consider the manifold SO ( 8 ) embedded in A : = R 8 × 8 and a function f : R × SO ( 8 ) R 8 × 8 :
f ( t , R ) : = R A t + R 2 B R ,
where R is a 8 × 8 orthogonal matrix variable, while A and B are 8 × 8 skew-symmetric constant matrices (namely, A = A and B = B ). Such a function represents a (time-varying) vector field on the manifold of hyper-rotations in that, for every R SO ( 8 ) and t R , it holds that f ( t , R ) T R SO ( 8 ) . Such an assertion may be proven as follows
R f ( t , R ) + f ( t , R ) R = R ( R A t + R 2 B R ) + ( R A t + R 2 B R ) R = ( R R ) A t + ( R R ) R B R + A ( R R ) t + R B R ( R R ) = A t + R B R A t R B R = 0 n .
As a last example, let us take the manifold S + ( 5 ) embedded in A : = R 5 × 5 and a function f : S + ( 5 ) R 5 × 5 defined as
f ( P ) : = P + P 2 + 3 P 1 ,
where P is a 5 × 5 symmetric, positive-definite matrix variable (hence, even P 2 = P P and P 1 are symmetric, positive-definite). It turns out that the function f represents a vector field in Γ ( S + ( 5 ) ) ; in fact, for every P S + ( 5 ) , it holds that f ( P ) T P S + ( 5 ) :
f ( P ) f ( P ) = ( P + P 2 + 3 P 1 ) ( P + P 2 + 3 P 1 ) = P + P 2 + 3 P P P 2 3 P 1 = P + P 2 + 3 P 1 P P 2 3 P 1 = 0 n .
(Notice that, in the proof, only the symmetry of the matrix P and of its powers P 2 and P 1 has been made use of.) ■
As a matter of fact, manifold calculus primarily deals with two kinds of objects:
  • Points on a manifold, denoted as x M A ;
  • Tangent vectors, denoted as v T x M A .
and occasionally on normal arrays, which are instrumental in calculations. In the case of matrix manifolds, which are of prime importance in applications, the ambient space A : = R p × q ; hence, points x and tangent vectors v are essentially arrays or matrices (either rectangular or square).

3.3. Canonical Curves, Canonical Basis of a Tangent Space*

In intrinsic manifold calculus, the way to regard, e.g., tangent spaces and vector fields is based on differential operators [21]. Formally, if F denotes a smooth function space, a tangent vector v T x M is defined in such a way that v : F R ; namely, v ( f ) denotes the directional derivative of the function f F along the direction v. Recall that a differential operator in T x M may be written as a linear combination of elementary differential operators, namely the derivatives along the basis axes, through some coefficients.
Let M denote a smooth manifold of dimension p and let γ x : [ ϵ , ϵ ] M denote any smooth curve such that γ x ( 0 ) = x . Let us assume, for simplicity, that the image of γ x is entirely contained in a chart ( U , ψ ) . The map ψ γ x traces out a smooth curve in the parameter space R p , which is differentiable in the usual multivariable calculus sense. Let us denote the intrinsic coordinates of the points over the curve as
( θ 1 ( t ) , θ 2 ( t ) , θ 3 ( t ) , , θ p ( t ) ) = ψ ( γ x ( t ) ) .
The superscript notation θ i to denote the i th coordinate is standard in manifold calculus. This notation helps when checking expressions written in intrinsic coordinates. Since the chart ψ is invertible by definition, we may represent the curve γ in local coordinates by
γ x ( t ) = ψ 1 ( θ 1 ( t ) , θ 2 ( t ) , θ 3 ( t ) , , θ p ( t ) ) .
In particular, it holds that x = ψ 1 ( θ 1 ( 0 ) , θ 2 ( 0 ) , θ 3 ( 0 ) , , θ p ( 0 ) ) . On the basis of the representation (60), we may define as many as pcanonical curves as γ x i ( t ) : = ψ 1 ( θ 1 ( 0 ) , θ 2 ( 0 ) , , θ i ( 0 ) + t , , θ p ( 0 ) ) for t [ ϵ , ϵ ] and i = 1 , 2 , 3 , , p . Namely, a canonical curve around a point is traced on a manifold by letting only one of the p coordinates vary at a time. Let us now consider the tangent vectors
i x : = γ ˙ x i ( 0 ) = d γ x i ( t ) d t t = 0 T x M .
The set { 1 x , 2 x , , p x } is termed canonical basis of the vector space T x M . Its property of being a basis is implied by the fact that such canonical vectors are linearly independent from one another. Therefore, any tangent vector may be written as a linear combination of canonical vectors, that is
v x = v x 1 1 x + v x 2 2 x + + v x p p x ,
where the quantities v x 1 represent the coefficients of the linear combination. To shorten the equations, the above expression may be written simply as v = v i i , where the information about what happens at the point x has been suppressed and the summation is implied by the presence of a repeated index (i), one in upper position ( i ) and one in lower position ( i ), which is commonly referred to as Einstein summation convention.
Example 6.
Let us consider the case of a sphere S 2 embedded into R 3 . A parametrization of S 2 (excluding the South pole) is
x = ψ 1 ( θ 1 , θ 2 ) : = sin θ 1 cos θ 2 sin θ 1 sin θ 2 cos θ 1
for 0 θ 1 π (where θ 1 is commonly referred to as ‘latitude’) and 0 θ 2 < 2 π (where θ 2 is commonly referred to as ‘longitude’). For example, the point ψ 1 ( 0 , 0 ) coincides with the North pole, while the point ψ 1 ( π 2 , 0 ) lays on the equator of the sphere. Given a point x = ψ 1 ( θ 0 1 , θ 0 2 ) , the curve
γ x 1 ( t ) = sin ( θ 0 1 + t ) cos θ 0 2 sin ( θ 0 1 + t ) sin θ 0 2 cos ( θ 0 1 + t )
traces a ‘meridian’ through x, while the curve
γ x 2 ( t ) = sin θ 0 1 cos ( θ 0 2 + t ) sin θ 0 1 sin ( θ 0 2 + t ) cos θ 0 1
traces a ‘parallel’ through x. Hence, by definition, the tangent space T x S 2 is spanned by the canonical basis vectors
1 x = γ ˙ x 1 ( 0 ) = d γ x 1 ( t ) d t t = 0 = cos θ 0 1 cos θ 0 2 cos θ 0 1 sin θ 0 2 sin θ 0 1
and
2 x = γ ˙ x 2 ( 0 ) = d γ x 2 ( t ) d t t = 0 = sin θ 0 1 sin θ 0 2 sin θ 0 1 cos θ 0 2 0 .
For example, it is readily seen that
1 ( π 2 , 0 ) = 0 0 1 , 2 ( π 2 , 0 ) = 0 1 0 ,
represents tangent vectors at [ 1 0 0 ] (i.e., the North pole). ■
The dual space to a tangent space T x M , denoted as T x M , is termed cotangent space. Elements of a cotangent space are termed covectors ω T x * M . Covectors combine with vectors by ‘annihilation’ to a scalar. Given a canonical basis { 1 x , 2 x , , p x } of T x M , it is possible to define a canonical basis { d x 1 , d x 2 , , d x p } of T x * M by the conditions d x i ( j ) = δ j i . The union of all cotangent spaces form a cotangent bundle T * M : = { ( x , ω ) x M , ω T x * M } .

4. First-Order Dynamical Systems on Manifolds

A number of physical phenomena are described in system theory as systems of coupled differential equations in several variables. A fairly general representation of such systems is
x ˙ ( t ) = f ( t , x ( t ) ) , x ( 0 ) = x 0 R p , t [ 0 , t f ] ,
where x R p denotes a state variable array and represents a set of descriptive variables, and f : R × R p R p represents a state-transition function. The multi-variable array-type function f denotes a (possibly time-varying) vector field that represents the set of velocities of change of the state of the system. The solutions of the differential Equation (69) represent integral curves of the vector field f. In this case, the space R p represents the flat state space and the velocity space of the model (69) at once.
In order to take into account further constraints on the state variables, termed invariants, it is convenient to introduce the notion of curved state space in the form of a smooth manifold. This logical process leads to a type of first-order dynamical system described by
x ˙ ( t ) = f ( t , x ( t ) ) , x ( 0 ) = x 0 M , t [ 0 , t f ] ,
where f : R × M T M denotes the state-transition function of the mathematical model and x M denotes the state of the system at any given time t. Even in this case, the function f denotes the velocity field; however, in this case, the state space is the manifold M , while the velocity space is the tangent bundle T M .
Let us underline two aspects of the above equation:
  • Meaning of x ˙ ( t ) in the expression (70): Since we assumed the manifold M to be embedded in an ambient space A of type R p × q , the quantity x ˙ ( t ) is an array or a matrix made of the derivatives of the entries of x ( t ) with respect to the time parameter t. Let us recall, however, that even if such a specification helps one understand the subject, we shall never write any relation involving single components of such matrix-type objects as we shall treat any state variable x as a whole, (except in a low-dimensional example). The solution x ( t ) of the differential Equation (70) is the trajectory of the system and is represented by a curve on the manifold M ; hence, x ˙ ( t ) denotes the speed along the trajectory since, for every t [ 0 , t f ] , it holds that ( x ( t ) , x ˙ ( t ) ) T M , namely x ˙ Γ ( M ) .
  • Separation (or non-equivalency) of first-order and second-order systems: When dealing with dynamical systems in R p × q , the system (69) is actually fairly general since, upon introducing additional variables, it is possible to turn an n-th order system into a first-order system. We consider now how such a property stems from the legitimate ‘confusion’ between the state space and velocity space. In contrast to this, when dealing with dynamical systems on manifolds, such confusion is not legitimate, since x M while x ˙ T x M ; namely, the system state and the system velocity belong to very different spaces. It suffices to recall that M is generally a curved space (i.e., non linear) while each T x M is a vector space (flat, linear). For this reason, second-order systems are not assimilable to first-order systems.
Example 7.
Let us reconsider the function (52). On the basis of such a vector field, we may define the following first-order system on the hypersphere S n 1 :
x ˙ ( t ) = ( I n x ( t ) x ( t ) ) A x ( t ) , x ( 0 ) = x 0 S n 1 .
Such a system may be rewritten equivalently as:
x ˙ = A x ( x A x ) x .
Let us examine the equilibrium points of such dynamical system. Any equilibrium point x must satisfy the equation
A x = ( x A x ) x .
Whenever A is symmetric and positive-definite, Equation (73) establishes that x is an eigenvector of matrix A. In effect, the state of the system (72) evolves towards an eigenvector of the matrix A. For this reason, the dynamical system (73) represents a prototypical example of a continuous-time calculation system.
The differential Equation (72) is commonly referred to as theOja equationand was studied by Prof. Erkki Oja from the Helsinki University of Technology (currently Aalto University). It was studied in several contexts, including automation and control [41], and constitutes an exemplification of the fact that even a fairly simple computing element might be able to perform a complex calculations, namely, extracting an eigenvector x (and the corresponding eigenvalue λ : = x A x ) from an arbitrarily-sized matrix. ■
Let us now consider an example, drawn from circuit theory, that introduces the notion of ‘invariants’ for a dynamical system, which indeed motivates the study of dynamical systems on manifolds.
Example 8.
In order to emphasize the notion ofinvariantsin connection to dynamical systems, let us consider the simple model of an ideal DC-to-DC converter studied in [42]. The mathematical model of such converter reads:
C 1 d v 1 d t = ( 1 u ) i 3 , C 2 d v 2 d t = u i 3 , L 3 d i 3 d t = ( 1 u ) v 1 u v 2 ,
where v 1 , v 2 are state voltages (across capacitors), i 3 is a state current (across an inductor) and u { 0 , 1 } denotes the control input (a switch). The following function is an invariant for such an electrical circuit
I : = 1 2 C 1 v 1 2 + 1 2 C 2 v 2 2 + 1 2 L 3 i 3 2 .
Indeed, such a quantity represents the total energy across the electrical network. By invariant it is meant that I ˙ = 0 , namely that the time-function I ( t ) keeps the same value for every t. Notice, in fact, that
d I d t = v 1 C 1 d v 1 d t + v 2 C 2 d v 2 d t + i 3 L 3 d i 3 d t = v 1 ( 1 u ) i 3 + v 2 u i 3 + i 3 ( ( 1 u ) v 1 u v 2 ) = 0 .
In order to simplify the analysis of such a dynamical system, let us define the following abstract state variables:
x 1 : = C 1 2 v 1 , x 2 : = C 2 2 v 2 , x 3 : = L 3 2 i 3 .
Let us arrange the above state variables into the state array x : = x 1 x 2 x 3 . The system (74) may thus be rewritten as x ˙ = f ( t , x ) , with
f t , x 1 x 2 x 3 : = 0 0 1 C 1 L 3 ( 1 u ( t ) ) 0 0 1 C 2 L 3 u ( t ) 1 C 1 L 3 ( 1 u ( t ) ) 1 C 2 L 3 u ( t ) 0 = : H ( t ) x 1 x 2 x 3 .
In short, f ( t , x ) = H ( t ) x .
In terms of the new state variables, the invariant may be rewritten as:
I ( x ) : = x 1 2 + x 2 2 + x 3 2 .
Assuming, for instance, that I ( x 0 ) = 1 , Equation (79) establishes that the state x belongs to the sphere S 2 . Hence, we may observe that the state space of the DC-to-DC converter is the unit sphere S 2 embedded in R 3 .
As a last verification step, let us prove that f ( t , x ) = H ( t ) x T x S 2 for every t. It suffices to prove that ( H x ) x = 0 , namely that x H x = 0 . By definition of H, it is readily found to be a skew-symmetric matrix, namely that for every t, it holds that H ( t ) = H ( t ) . By transposing the product x H x , we find the sought result; in fact,
( x H x ) = x H x = ( x H x ) ,
and since x H x is a scalar, it must hold that ( x H x ) = x H x , therefore it must be equal to zero. ■
First-order dynamical systems are characterized by a (possibly varying) vector field that determines its dynamics, as illustrated by the following example.
Example 9.
Let us consider the following first-order dynamical system on the manifold SO ( 3 ) of three-dimensional rotations in space:
R ˙ ( t ) = A R ( t ) A R ( t ) , R ( 0 ) = I 3 .
In this system, the matrix R SO ( 3 ) represents the orientation of a moving orthogonal frame attached to a rigid body with respect to an inertial reference frame, while the arbitrary constant matrix A R 3 × 3 determines its rotational speed (namely, the orientation of the axis of rotation and the rotational velocity).
The orientation matrix is often termedattitudefor rigid bodies such as drones and satellites:
  • The instance R = I 3 indicates that the object is horizontal with respect to the reference frame;
  • The instance R I 3 indicates that it is necessary to rotate the body-fixed axes to align them to the inertial axes.
The dynamical system (81) is of the type R ˙ = f ( R ) . Let us verify that f ( R ) : = A R A R is a vector field of T SO ( 3 ) . To show such a property, it suffices to prove that f ( R ) R + R f ( R ) = 0 3 . This is in fact true:
( A R A R ) R + R ( A R A R ) = A R R A ( R R ) + R A ( R R ) A R = A R R A + R A A R = 0 3 .
It is worth noticing that the property f Γ ( SO ( 3 ) ) holds true even in the case that A is a time-varying matrix field, namely A = A ( t ) . ■

5. Tangent Maps: Pushforward and Pullback

Let us consider two Riemannian manifolds M and N . Any smooth function f : M N transforms a curve in M into a curve in N . Since both curves are associated with their velocity vector fields, which define their tangent spaces, one might wonder how the function f maps tangent spaces in M into tangent spaces in N . The answer is materialized as a pushforward map.
Let x M , then a pushforward map d x f : T x M T f ( x ) N is defined such that for every smooth curve γ x : [ ϵ , ϵ ] M , γ x ( 0 ) = x it holds that:
d d t f ( γ x ( t ) ) t = 0 = : d x f ( γ ˙ x ( 0 ) ) .
In general, therefore, given a function f : M N that maps a point x from the manifold M to the point f ( x ) on the manifold N , the map d x f associates to any tangent vector v belonging to the tangent space T x M a tangent vector d x f ( v ) belonging to the tangent space T f ( x ) N by ‘pushing’ such vector to such space.
A pushforward map is indeed a linear approximation of a smooth map on tangent spaces. Any tangent map d x f ( v ) is linear in the argument v. The map d x f at a point x represents, in practical terms, the best linear approximation of the function f near x. A pushforward map may be regarded as a generalization of the total derivative of ordinary calculus. (Some authors would denote a pushforward map by an asterisk as in f ; however, this notation takes up the space of the reference point x hence hindering it.)
Let us consider two special cases of interest.
  • Pushforward of a manifold-to-scalar function: The special case that N : = R , namely that f is a manifold-to-scalar function, is particularly important in applications. Such a special case will be covered later since it involves the notion of Riemannian gradient.
  • Pushforward of a matrix-to-matrix function: This is the case that the smooth manifolds M and N are real matrix manifolds embedded in R p × p . Any smooth function between any such pair of manifolds is of matrix-to-matrix type. Let us assume that the function f is analytic about a point X 0 M , namely, that it may be expressed as a polynomial series:
    f ( X ) = k = 0 a k ( X X 0 ) k , a k R .
    Then, the pushforward map d X f ( V ) in a point X M applied to the tangent direction V T X M may be expressed as:
    d X f ( V ) = k = 1 a k r = 0 k 1 ( X X 0 ) r V ( X X 0 ) k r 1 .
    It is easily recognized that the tangent map d f X ( V ) is linear in the argument V. As a reference for the readers, we recall the analytic expansion of three matrix-to-matrix functions, that may be used to compute the corresponding pushforward maps:
    • Matrix exponential: For the map f ( X ) = Exp ( X ) , it holds that X 0 = 0 , a k = ( k ! ) 1 for k 0 ;
    • Principal matrix logarithm: For the map f ( X ) = Log ( X ) , it holds that X 0 = I p , a 0 = 0 , a k = ( 1 ) k + 1 k 1 for k 1 ;
    • Matrix inversion: For the map f ( X ) = X 1 , it holds that X 0 = I p , a k = ( 1 ) k for k 0 .
The inverse of a pushforward map d x f is termed pullback map and is denoted as ( d x f ) 1 . A map ( d x f ) 1 ( w ) ‘pulls’ any tangent vector w from the tangent space T f ( x ) N back to the tangent space T x M .

6. Lie Groups, Lie Algebras, Lie Brackets

Lie groups are hybrid mathematical constructions sharing properties that characterize smooth manifolds and algebraic groups. Let us recall that an algebraic group is an algebraic structure ( G , m , i , e ) made of a set G , either discrete or continuous, endowed with an internal operation denoted as m : G × G G , usually referred to as group multiplication (not necessarily a multiplication is the standard sense, though), an inversion operation denoted as i : G G , and an identity element with respect to group multiplication denoted as e. The multiplication, inversion and identity are related in such a way that, for every triple x , y , z G , it holds that:
m ( x , e ) = m ( e , x ) = x m ( x , i ( x ) ) = m ( i ( x ) , x ) = e , m ( x , m ( y , z ) ) = m ( m ( x , y ) , z ) .
Note that, in general, group multiplication is not commutative, i.e. m ( x , y ) m ( y , x ) .
Two instances of algebraic groups are ( Z , + , , 0 ) , a discrete group, and ( GL ( p ) , · , 1 , I p ), a continuous group. The structure ( Z , + , , 0 ) represents the set of integer numbers (either positive or negative, also including 0) with the standard addition as a group multiplication, which implies that the inverse is the standard subtraction while the identity is the 0 element. The second example, namely the structure ( GL ( p ) , · , 1 , I p ), represents the subset of non-singular matrices of size p × p , endowed with the standard matrix-to-matrix multiplication ‘·’ as group multiplication. In such a case, the inverse operation coincides with the standard matrix inversion while the group identity coincides with the identity matrix I p . It is straightforward to show that such group operations/identities satisfy the recalled group axioms. A counterexample of a structure that is not an algebraic group is given by the set of non-negative integer numbers Z 0 + , which does not form a group under standard addition/subtraction (in fact, the subtraction of two positive integers does not necessarily return a positive integer).
With the notions of algebraic groups and smooth manifolds, we may now define a well-known object of manifold calculus, namely a Lie group. A Lie group conjugates the properties of an algebraic group and of a smooth manifold, as it is a set endowed with group properties and a manifold structure. Paraphrasing Wirth:
Lie group := Manifold + Algebraic group.
Let us denote by T x G the tangent space of a Lie group G at a point x G . The tangent space at the identity, namely T e G , represents a special instance of tangent spaces. In fact, such a tangent space, upon being endowed with a binary operator termed Lie brackets, possesses the structure of a so-called Lie algebra and will be denoted as g .
Let us examine the structure of the Lie algebra of the following Lie groups:
  • Hypercube: The hypercube, also known as a translation group, is a Lie group under standard matrix sum (matrix subtraction and zero matrix complete the group structure). The Lie algebra of R p coincides with itself.
  • General linear group and the special linear group: Both the general linear group GL ( p ) and the special linear group Sl ( p ) are Lie groups under standard matrix multiplication and inversion. The Lie algebra of the general linear group, namely gl ( p ) , coincides with R p × p . The linear algebra associated with the special linear group is more interesting and its determination involves some clever matrix computations. Let us consider Sl ( p ) endowed with standard matrix multiplication, inversion and I p as the group identity and a curve γ : [ ϵ , ϵ ] Sl ( p ) defined by γ ( t ) : = Exp ( t A ) , where Exp denotes the matrix exponential and A R p × p . Clearly, γ ( 0 ) = I p ; hence, γ ˙ ( 0 ) represents any element of the Lie algebra sl ( p ) . It is not hard to prove that
    γ ˙ ( t ) = A Exp ( t A ) ,
    hence, γ ˙ ( 0 ) = A sl ( p ) . Now, let us recall a result from matrix calculus [43]:
    d d t det ( γ ( t ) ) t = 0 = tr ( γ ˙ ( 0 ) ) .
    In the present case, since det ( γ ( t ) ) 1 , it follows from the above considerations that tr ( A ) = 0 . In conclusion, we found that
    sl ( p ) = { A R p × p tr ( A ) = 0 } ,
    the space of traceless matrices. The algebra sl ( p ) has dimension p 2 1 .
  • Special orthogonal group: The SO ( n ) manifold is a Lie group under standard matrix multiplication. The Lie algebra associated with the special orthogonal group is the set of skew-symmetric matrices so ( p ) : = { H R p × p H + H = 0 p } . In fact, at the identity it holds that T I p SO ( p ) = so ( p ) . The Lie algebra so ( p ) is a vector space of dimension 2 1 p ( p 1 ) .
  • Real symplectic group: The Lie algebra associated with the real symplectic group may be characterized as follows:
    sp ( 2 n ) = { H = J S S R 2 n × 2 n , S = S } ,
    where the quantity J denotes again the fundamental skew-symmetric matrix.
  • Manifold of symmetric, positive-definite matrices: The Lie algebra associated with the Lie group S + ( p ) is the set of p × p symmetric matrices, namely s + ( p ) : = { S R p × p S = S } . The space of symmetric, positive-definite matrices is not a group under standard matrix multiplication. We recall from [44] the following group structure ( S + ( p ) , m , i , e ) :
    -
    Multiplication: m ( P , Q ) : = Exp ( Log P + Log Q ) , (logarithmic multiplication), with P , Q S + ( p ) , where ‘ Log ’ denotes the principal matrix logarithm;
    -
    Identity element: e = I p (notice that Log I p = 0 p );
    -
    Inverse: i ( P ) = Exp ( Log P ) (matrix inversion), with P S + ( p ) (any symmetric, positive-definite matrix is non-singular).
    It is easy to verify that the proposed instances of m , i , e satisfy the algebraic-group axioms in S + ( p ) . Additionally, the logarithmic multiplication on S + ( p ) is compatible with its smooth manifold structure, as the map ( P , Q ) m ( P , i ( Q ) ) = Exp ( Log P Log Q ) is smooth.
An essential peculiarity of any Lie groups ( G , m , i , e ) is that the whole group may always be brought back to a convenient neighborhood of the identity e and the same holds true for every tangent space T x G , x G , which may be brought back to the algebra g . Let us consider, for instance, a curve γ x : [ ϵ , ϵ ] G such that γ x ( 0 ) = x . We may define a new curve
γ ˜ x ( t ) : = m ( γ x ( t ) , i ( x ) )
that has the property γ ˜ x ( 0 ) = e . Conversely, γ x ( t ) = m ( γ ˜ x ( t ) , x ) . This operation closely resembles a translation of a curve into a neighborhood of the group identity.
The above observation leads to defining two Lie-group functions:
  • Right translation: Defined as a function R x : G G by R x ( y ) : = m ( y , x ) for every pair ( x , y ) G ;
  • Left-translation: Defined as a function L x : G G by L x ( y ) : = m ( i ( x ) , y ) for every pair ( x , y ) G ;
the distinction between the left translation and right translation is somewhat arbitrary: we chose a rather non-standard one.
Notice that R x and L x commute; in fact,
R x ( L x ( y ) ) = m ( m ( i ( x ) , y ) , x ) = m ( i ( x ) , m ( y , x ) ) = L x ( R x ( y ) ) .
Let us consider a simple example to get acquainted with the notion of left/right translation.
Example 10.
Let us particularize the notion of left/right translation to the familiar hypercube. In R p , considered as a group under standard array addition, these functions may be defined as R x ( y ) : = y + x and L x ( y ) = x + y . The composition R x ( L x ( y ) ) returns R x ( L x ( y ) ) = L x ( y ) + x = x + y + x , namely R x ( L x ( y ) ) = y ; hence, R x L x = id x . Since R x and L x commute, we may say that, in this specific case, the functions R x and L x are the inverse of one another. In general, this is not true, though, and the failure of the composition R x L x to equal an identity map is caused by the lack of commutativity of a group. ■
Now, let us consider the curve L x ( γ x ) : it crosses the identity e at t = 0 . Consequently, the pushforward d x L x maps every vector in T x G to a vector in T e G . Namely,
d x L x : T x G g .
Likewise, taking an arbitrary smooth curve γ e passing by the identity e G at t = 0 and considering the curve R x ( γ e ) , it is readily seen that the latter crosses the point x at t = 0 . Consequently, the pushforward d e R x maps every vector in T e G to a vector in T x G . Namely,
d e R x : g T x G .
Recall that T x G and g are of equal dimension and that the pushforward map is linear; hence, the pushforward map d x L x allows us to translate a vector belonging to a tangent space of a group to a vector belonging to its algebra (and vice versa through its pullback). This is the reason for which the Lie algebra of a Lie group is sometimes termed the ‘generator’ of a group.
It is easy to see that, if the structure of g is known for a group G , it might be convenient to coordinatize a neighborhood of the identity of G through elements of the associated algebra with the help of a conveniently selected homeomorphism (namely, a continuous function between topological spaces that has a continuous inverse function). Such a homeomorphism is known in the literature as an exponential map and is denoted as exp : g G . We shall discuss the notion of exponential map for general manifolds in a later section.
As mentioned, a Lie algebra is endowed with a binary operator termed Lie bracket, denoted as [ · , · ] . Let us survey its derivation. Let us define the function Ψ x : G G termed inner isomorphism as
Ψ x ( y ) : = L x ( R x ( y ) ) .
The inner isomorphism provides a measure of non-commutativity of a Lie group. It is interesting to notice that the function Ψ x preserves the inner product of G ; in fact, taken two elements y , z G , it holds that
Ψ x ( m ( y , z ) ) = m ( i ( x ) , m ( m ( y , z ) , x ) ) = m ( i ( x ) , m ( m ( m ( y , x ) , m ( i ( x ) , z ) ) , x ) ) = m ( m ( i ( x ) , m ( y , x ) ) , m ( i ( x ) , m ( z , x ) ) ) = m ( Ψ x ( y ) , Ψ x ( z ) ) ;
hence, it represents a Lie-group automorphism.
Now, let us take the differential of Ψ x ( y ) with respect to the variable y at y = e . To this aim, it suffices to consider a curve y : [ ϵ , ϵ ] G such that y ( 0 ) = e and an element x G , and to compute the derivative of Ψ x ( y ( t ) ) with respect to t at t = 0 . The calculation of the derivative gives
d d t Ψ x ( y ( t ) ) = ( d y ( t ) Ψ x ) ( y ˙ ( t ) ) = ( d R x ( y ( t ) ) L x ) ( d y ( t ) R x ) ( y ˙ ( t ) ) .
Setting y = e and letting v : = y ˙ ( 0 ) g and Ad x : = d e Ψ x yields
Ad x ( v ) : = ( d x L x ) ( d e R x ) ( v ) .
By virtue of the properties (93) and (94), Ad x is a linear map from g to itself, namely an endomorphism in End ( g ) . The map Ad x is termed adjoint representation of the Lie algebra g .
In addition, let us now consider a smooth curve x : [ ϵ , ϵ ] G such that x ( 0 ) = e and x ˙ ( 0 ) = u g and let us evaluate the map
ad u : = d d t ( d x L x ) ( d e R x ) t = 0 .
By the chain rule, one obtains
ad u : = d d t ( d x ( t ) L x ( t ) ) t = 0 ( d e R e ) + ( d e L e ) d d t ( d e R x ( t ) ) t = 0 .
This is termed adjoint operator and coincides with the Lie bracket of two tangent vectors up to sign, namely
[ u , v ] : = ad u ( v ) .
The Lie bracket provides a measure of non-commutativity of the algebra g .
Example 11.
Let us consider the Lie group GL ( n ) endowed with standard matrix multiplication and let us write down explicitly the inner isomorphism, the adjoint representation and the Lie bracket.
Since L X ( Y ) = X 1 Y and R X ( Y ) = Y X , the inner isomorphism reads Ψ X ( Y ) = X 1 Y X .
Taking a curve Y ( t ) GL ( n ) such that Y ( 0 ) = I n and Y ˙ ( 0 ) = V R n × n yields
Ad X ( V ) = d d t ( X 1 Y ( t ) X ) t = 0 = X 1 V X .
To end with, taking a curve X ( t ) GL ( n ) such that X ( 0 ) = I n and X ˙ ( 0 ) = U R n × n yields
ad U ( V ) = d d t ( X 1 ( t ) V X ( t ) ) t = 0 = ( U V ) + V U = [ U , V ] .
In the above calculation, we have used a known matrix identity, namely
d d t X 1 ( t ) = X 1 ( t ) X ˙ ( t ) X 1 ( t ) .
In this case, [ A , B ] : = A B B A denotes the matrix commutator. ■

7. Metrization, Riemannian Manifolds

A Riemannian manifold is a smooth manifold M whose tangent bundle is equipped with a smooth family of positive-definite inner products M x · , · x R . An inner product is locally defined at every point x M of a manifold as a bilinear function from T x M × T x M to R . It is important to remark that the inner product acts on two elements of a tangent space to the manifold at a given point x; it therefore depends (smoothly) on the point x. Hence, such an inner product gives rise to a local metric. Whenever an inner product · , · x does not depend explicitly on the point x, it is termed uniform. Given two tangent vectors v , w T x M , their inner products are denoted as v , w x .

7.1. Coordinate-Free Metrization by Inner Products and Metric Kernels

Let us recall some of the general properties of any inner product. In general, any vector space V may be endowed with an inner product (also termed scalar product) denoted by · , · : V × V R . Any inner product has a set of properties:
  • For every u , v V , it holds that u , v = v , u ,
  • For every u , v , w V , it holds that u + w , v = u , v + w , v ,
  • For every u , v V and c R , it holds that c u , v = c u , v ,
  • The norm of a vector v V is defined as v : = v , v ,
  • An inner product is non-degenerate if and only if u , v = 0 for every v V implies u = 0 .
Let us consider a simple example about the above notions.
Example 12.
Let us consider the special case V : = R p and the inner product:
u , v = u S v ,
with S being a symmetric, p × p matrix. If S, in addition to being symmetric, is also positive-definite, then such an inner product is non-degenerate. ■
It pays to recall the notion of adjoint operator with respect to a metric on a vector space. Let us consider again the vector space V endowed with an inner product · , · : V × V R and a linear operator ω : V V . Ad adjoint operator ω : V V is one such that, for every v , w V , it holds that
ω ( v ) , w = v , ω ( w ) .
A self-adjoint operator is one such that ω = ω , while an antiadjoint operator is one such that ω = ω . The following observation on self-adjoint operators in quadratic forms is in order.
Observation 1.
Every operator may be decomposed as the sum of a self-adjoint and of an antiadjoint operator; in fact, it holds that
ω = 1 2 ( ω + ω ) + 1 2 ( ω ω ) .
As far as quadratic forms ω ( v ) , v are concerned, only the self-adjoint component plays a role. In fact, it is not hard to show that
ω ( v ) , v = 1 2 ( ω + ω ) ( v ) , v .
This is indeed true for any arbitrary inner product of the type ω ( v ) , w , in fact, some straightforward algebraic work reveals that
ω ( v ) , w = 1 2 ω ( w + v ) , w + v ω ( w v ) , w v .
Since the right-hand side is a combination of quadratic forms, the left-hand side depends only on the self-adjoint part of ω.
In the case of a manifold, at every point, x M corresponds a tangent space T x M that may be endowed with an inner product. Therefore, to denote the inner product assigned to the vector space T x M , we use the notation · , · x .
Example 13.
Given a Riemannian manifold ( M , · , · ) and two smooth vector fields f , g Γ ( M ) , we may define the function h : M R as
h ( x ) : = f ( x ) , g ( x ) x .
Notice that the inner product variessmoothlyfrom one tangent space to another; hence, the function h is regular in M . The function h : M R represents a scalar field on M .
As a further example, let us consider an arbitrary curve γ : [ ϵ , ϵ ] M over a manifold M . On the basis of such curve, we can define a function φ : [ ϵ , ϵ ] R as
φ ( t ) : = f ( γ ( t ) ) , g ( γ ( t ) ) γ ( t ) .
Notice that the inner product ‘accompanies’ the point x = γ ( t ) that travels along the curve and involves different tangent spaces. Even the function φ : [ ϵ , ϵ ] R represents a scalar fieldrestrictedto the curve γ. ■
On a Riemannian manifold M , the norm of a vector v T x M is defined as v x : = v , v x . A specific property of Riemannian manifolds is that the norm is positive-definite; namely v x 2 0 , where v x = 0 if an only if v = 0 .
Choosing the ‘right’ metric for a given manifold in a given application is one of the most challenging aspects of metrization, especially because it is seldom unclear what ‘right’ practically means. When in doubt, it is possible to resort to canonical metrics. A number of canonical metrics for different manifolds of interest are summarized in the following:
  • Hypercube: For the space R p of (column) arrays, the canonical metric is the Euclidean metric u , v x : = u v , for every u , v R p , while for the space R p × q of rectangular matrices, the canonical metric is the Euclidean metric U , V X : = tr ( U V ) , for every U , V R p × q . Notice that both these metrics do not depend explicitly on the point that they are calculated at, and hence they are uniform.
  • Hypersphere: The hypersphere S p 1 embedded into the ambient space R p inherits its canonical metric; hence, we shall choose u , v x : = u v for every u , v T x S p 1 . Even this metric is uniform.
  • General linear group and the special linear group: A metric for the general linear group GL ( p ) A is:
    U , V A : = tr ( ( A 1 U ) ( A 1 V ) ) , U , V T A GL ( p ) .
    Such a metric was popularized, for instance, in [45], in the context of machine learning.
  • Special orthogonal group: The canonical metric in SO ( p ) is defined as U , V X : = tr ( U V ) , for any X SO ( p ) and U , V T X SO ( p ) . Notice that the norm of a tangent vector V T X SO ( p ) is V X = V , V = tr ( V V ) = : V F , known as the Frobenius norm [18].
  • Stiefel manifold: There are two well-known metrics for the Stiefel manifold, namely, the Euclidean metric and the canonical metric.
    • Euclidean metric: A possible metric that the Stiefel manifold may be endowed with is the Euclidean metric, inherited from the embedding of St ( n , p ) in R n × p :
      U , V X : = tr ( U V ) , U , V T X St ( n , p ) .
    • Canonical metric: The Stiefel manifold may be endowed with a second kind of metric, termed ‘canonical metric’. The associated inner product reads:
      U , V X : = tr U I n 1 2 X X V , U , V T X St ( n , p ) ,
      which, unlike the Euclidean metric (113), is not uniform over the Stiefel manifold.
  • Real symplectic group: There exist two known metrics in the scientific literature that were applied to the real symplectic group.
    • Khvedelidze–Mladenov metric: A metric for the real symplectic group Sp ( 2 n ) is:
      U , V Q : = tr ( Q 1 U Q 1 V ) , U , V T Q Sp ( 2 n ) .
      It is referred to as Khvedelidze–Mladenov metric (or KM metric, for short, [46]). This is an indefinite metric; hence, a manifold endowed with this metric is not Riemannian (it is in fact referred to as a pseudo-Riemannian manifold).
    • Euclidean metric: A further metric for the real symplectic group Sp ( 2 n ) is:
      U , V Q : = tr ( ( Q 1 U ) ( Q 1 V ) ) , U , V T Q Sp ( 2 n ) .
      Such a metric is inherited from the embedding of a real symplectic group Sp ( 2 n ) into the space of real invertible matrices GL ( 2 n ) that, in turn, is embedded into the real hypercube R 2 n × 2 n and hence inherits its canonical metric.
  • Space of symmetric, positive-definite matrices: The canonical metric in S + ( p ) is defined as U , V P : = tr ( U P 1 V P 1 ) , for any P S + ( p ) and U , V T P S + ( p ) . Clearly, this is not a uniform metric.
  • Grassmann manifold: The canonical metric on a Grassmann manifold is
    U , V [ X ] = tr ( U V ) , U , V T [ X ] Gr ( n , p ) ,
    which corresponds to the Euclidean metric in the Stiefel manifold that is used to represent elements of the Grassmann manifold.
Example 14.
The inner product (115) gives rise to a metric which is not positive-definite on the space Sp ( 2 n ) GL ( 2 n ) . To verify this property, it suffices to evaluate the structure of the squared norm V Q 2 = tr ( ( Q 1 V ) 2 ) with Q Sp ( 2 n ) and V T Q Sp ( 2 n ) . By the structure of the tangent space T Q Sp ( 2 n ) , it is known that Q 1 V = J S with S R 2 n × 2 n symmetric. It holds that:
J S = 0 n I n I n 0 n S 1 S 2 S 2 T S 3 = S 2 T S 3 S 1 S 2 ,
with S 1 , S 3 R n × n being symmetric and S 2 R n × n being arbitrary. Hence, tr ( ( J S ) 2 ) = 2 tr ( S 2 2 ) 2 tr ( S 1 S 3 ) , which has an indefinite sign. ■
In general, there exists a canonical metric inherited from the ambient space A that a manifold M is embedded in. We know that whenever a manifold is embedded in an ambient space A , the inner product between two tangent vectors may be written as
u , w x = u , G x ( w ) A ,
where · , · A denotes an inner product in A . The expression (118) is based on a metric kernel  G : T M T M . Metric kernels play a prominent role in coordinate-free embedded manifold calculus since a kernel and its derivative determine most of the main functions and maps that we shall encounter in the next sections. We assume that the metric kernel has the following properties:
  • Linearity: G x ( v ) is linear in v, namely G x ( v + α w ) = G x ( v ) + α G x ( w ) , for every α R , x M and v , w T x M ;
  • Symmetry: G x is a self-adjoint operator, namely u , G x ( w ) A = G x ( u ) , w A ;
  • Closure with respect to T M : G x is an endomorphism of T x M , namely G x ( v ) T x M , for every x M and v T x M ;
  • Invertibility: G x is invertible, namely, its inverse G x 1 is well-defined for every x M .
In general, a metric kernel is not well-defined in the whole ambient space A . We shall invoke the fact that it is well-defined in T M and that it might be extended to M × A , in fact in the expression G x ( a ) one must take x M but it is allowable to take a A , since the metric kernel is linear in the argument a. Occasionally, we might need to extend the metric kernel G by an operator G ¯ : A 2 A that is defined at least in a neighborhood of M × A and such that G ¯ G in T M . (In principle, such an ‘extendability’ requirement is not necessary, although it facilitates some computations and certainly clarifies some computational developments.)
Example 15.
Let us consider a hypersphere S n 1 A where A : = R n endowed with a metric u , w x = u , w A : = u w . It is readily seen that, in this example, G x = id x ; therefore, the metric kernel is linear, self-adjoint, invertible and an endomorphism of T x S n 1 .
Let us further consider the manifold S + ( n ) A endowed with the metric U , W P : = tr ( U P 1 W P 1 ) , where A : = R n × n is endowed with the metric U , V A : = tr ( U V ) . In this case, the metric kernel does not coincide with the identity, rather:
G P ( W ) : = P 1 W P 1 .
Let us verify that such metric kernel has the four mentioned properties:
1. 
Linearity: G P ( W ) appears to be linear in W; in fact, it holds that G P ( V + α W ) = P 1 ( V + α W ) P 1 = P 1 V P 1 + α P 1 W P 1 , for every α R , V , W T P A ,
2. 
Symmetry: G P is self-adjoint; in fact, it holds that U , G P ( W ) A = tr ( U ( P 1 W P 1 ) ) = tr ( ( P 1 U P 1 ) W ) = G P ( U ) , W A , by virtue of the cyclic permutation invariance of the trace operator,
3. 
Closure: G P turns out to be an endomorphism of T P S + ( n ) ; in fact, for every matrix U T P S + ( n ) , it holds that G P ( U ) = P 1 U P 1 is a symmetric matrix. Hence, it belongs to T P S + ( n ) (notice that ( P 1 U P 1 ) = P U P = P 1 U P 1 because both U and P are symmetric),
4. 
Invertibility: G P is invertible; in fact, if W = G P ( U ) = P 1 U P 1 , then U = P W P = : G P 1 ( W ) .
Notice that G P is not well-defined in R n × n (in fact, if P is taken in R n × n , P 1 does not necessarily exist).
As a last example, let us consider the Stiefel manifold St ( n , p ) A is endowed with the metric U , W X : = tr ( U ( I n 1 2 X X ) W ) , where A : = R n × p endowed with U , V A : = tr ( U V ) . In this case, the metric kernel G X : R n × p R n × p reads
G X ( W ) : = ( I n 1 2 X X ) W .
Linearity and symmetry are readily proven. Closure may be proven as follows. Take X St ( n , p ) and W T X St ( n , p ) , namely W X + X W = 0 p . Define U : = ( I n 1 2 X X ) W . It now suffices to show that U T X St ( n , p ) , which holds true; in fact,
U X + X U = W X 1 2 W X X X + X W 1 2 X X X W = W X + X W 1 2 W X + X W = 0 p .
The invertibility of the matrix kernel follows from the invertibility of the matrix I n 1 2 X X . From the Sherman–Morrison–Woodbury (matrix inversion) formula [47]
( A + U C V ) 1 = A 1 A 1 U ( C 1 + V A 1 U ) 1 V A 1 ,
it follows, upon setting A : = I n , C : = 1 2 I p , U : = X and V : = X , that
I n 1 2 X X 1 = I n X ( 2 I p + X X ) 1 X = I n + X X ,
that exists for every X St ( n , p ) . ■
Further, let us examine separately and concisely the structure of the metric kernel for the symplectic group endowed with its canonical metric.
Example 16.
Let us consider the symplectic group Sp ( 2 n ) endowed with the canonical metric V , W Q : = tr ( ( Q 1 V ) ( Q 1 W ) ) . Assume the symplectic group Sp ( 2 n ) to be embedded in the ambient space A : = R 2 n × 2 n endowed with the Euclidean metric U , V A : = tr ( U V ) . A simple expansion gives
V , W Q : = tr ( V Q Q 1 W ) ,
hence the metric kernel reads G Q ( W ) : = ( Q Q 1 ) W .

7.2. Covariancy, Contravariancy, Tensors*

The convention on index used so far comes from the nature of different objects in relation to their behavior upon coordinate changes.
Let a smooth manifold M be a submanifold of dimension p of a Euclidean space and let us fix a curve γ : [ ε , ε ] M , described by:
γ x ( t ) = ( x 1 ( t ) , x 2 ( t ) , , x p ( t ) ) .
The subscript x tells that the components of the curve are expressed through coordinates x. The tangent vector v : = γ ˙ x ( 0 ) has the components:
v x = ( x ˙ 1 ( 0 ) , x ˙ 2 ( 0 ) , , x ˙ p ( 0 ) ) .
Now, let us introduce a change of variable x i : = κ i ( ξ 1 , ξ 2 , , ξ p ) , for i = 1 , 2 , , p , such that
γ ξ ( t ) = ( κ 1 ( ξ 1 ( t ) , ξ 2 ( t ) , , ξ p ( t ) ) , , κ p ( ξ 1 ( t ) , ξ 2 ( t ) , , ξ p ( t ) ) ) .
In the new coordinates, the tangent vector at t = 0 is expressed as
v ξ = κ 1 ξ i ξ ˙ i ( 0 ) , κ 2 ξ i ξ ˙ i ( 0 ) , , κ p ξ i ξ ˙ i ( 0 ) = x 1 ξ i ξ ˙ i ( 0 ) , x 2 ξ i ξ ˙ i ( 0 ) , , x p ξ i ξ ˙ i ( 0 ) .
At the heart of intrinsic manifold calculus is the requirement that one such tangent vector stays exactly the same, no matter which coordinates are used. Observe that the components v x j : = x ˙ j ( 0 ) and the components v ξ j : = ξ ˙ j ( 0 ) are related by
v x j = x j ξ i v ξ i .
Such a set of equations prescribes how the components of a tangent vector vary upon a coordinate change. Every object that obeys the above law is termed contravariant. Contravariant objects are marked by an upper index.
The canonical basis vectors i do not obey the law (129). In fact, from the requirement that v x = v ξ , it follows that
v x j j x = x j ξ i v ξ i j x = v ξ i i ξ ,
which implies that x j ξ i j x = i ξ . Multiplying both sides by ξ i x k gives
ξ i x k x j ξ i j x = ξ i x k i ξ .
Since ξ i x k x j ξ i = x j x k = δ k j , we obtain the transformation law
k x = ξ i x k i ξ .
Every object that obeys the above law is termed covariant. Covariant objects are marked by a lower index.
Concerning tangent vectors, we may summarize the above results saying that the components of a tangent vector v T x M are contravariant, while the basis of a tangent space T x M is covariant.
Given two tangent vectors u , v T x M , their inner products may be written as
u , v x = u i i x , v j j x x = u i v j i x , j x x ,
due to the bilinearity of the inner product. Now, define g i j ( x ) : = i x , j x x . Then, u , v x = g i j ( x ) u i v j . The functions g i j are clearly symmetric in their indexes (hence, on a smooth manifold of dimension p, these functions are p 2 , but only 2 1 p ( p 1 ) are independent). Moreover, these functions are covariant in both indexes.
Given the canonical basis { d x 1 , d x 2 , , d x p } of a cotangent space T x * M , every covector may be written as ω = ω i d x i . It is easy to show that the components of a covector are covariant, while the basis elements of a cotangent space are contravariant.
Vectors and covectors are special cases of more general objects termed tensors. A vector is a ( 1 , 0 ) -tensor, while a covector is a ( 0 , 1 ) -tensor. In general, it is possible to construct ( p , q ) -tensors by an operation termed tensor product denoted by ⊗. Each component of one such tensor has p upper indexes and q lower indexes. For example, one can construct the metric tensor g = g i j d x i d x j , which is a ( 0 , 2 ) -tensor. Its inverse is g 1 = g i j i j , which is a ( 2 , 0 ) -tensor. Additionally, δ j i represents the mixed components of the so-termed fundamental tensor δ = δ j i i d x j , which is a ( 1 , 1 ) -tensor.
Example 17.
Let us show in what sense functions g i j ( x ) are covariant components of a tensor. Given a variable change x = κ ( ξ ) , from property (132) it follows that
g i j ( κ ( ξ ) ) = i x , j x κ ( ξ ) = ξ k x i k ξ , ξ h x j h ξ κ ( ξ ) = ξ k x i ξ h x j k ξ , h ξ κ ( ξ ) = ξ k x i ξ h x j g k h ( ξ ) .
This is the transformation law of a ( 0 , 2 ) -tensor. The point is that the components g i j ( x ) transform into components g k h ( ξ ) through a linear expression, whose coefficients depend on the derivatives of one set of coordinates with respect to the other set. ■

8. Geodesic Arc, Riemannian Distance, Exponential and Logarithmic Map

A geodesic on a smooth manifold may be intuitively looked at in different ways:
  • On a general manifold, the concept of geodesic extends the concept of a straight line from a flat space to a curved space. In fact, let us consider a curved manifold M embedded in an ambient space R p . Such ambient space contains straight lines in the usual meaning, but the manifold, being curved, hardly accommodates any straight lines. Geodesics are curves that resemble straight lines in that they copy some of their distinguishing features.
  • On a metrizable manifold, a geodesic connecting two points is locally defined as the shortest curve on the manifold connecting these endpoints. Therefore, once a metric is specified, the equation of the geodesic arises from the minimization of a length functional. Such a definition comes from the observation that any straight line in R p is indeed the shortest path connecting any two given points.
  • A further distinguishing feature of straight lines is that they are self-parallel, namely, sliding a straight lines infinitesimally along itself returns the same exact lines. Such a concept gives rise to a definition of geodesic which requires to specify the mathematical meaning of ‘sliding a piece of line infinitesimally along itself’. The technical argument to access such a definition is covariant derivation, which is not covered in the present tutorial (while it will be covered in a forthcoming review paper).
  • Another intuitive interpretation is based on the observation that a geodesic emanating from a point on a manifold coincides with the path followed by a particle sliding on the manifold at a constant speed. For a manifold embedded in a larger ambient space and in special circumstances, this is equivalent to the requirement that the naïve (or embedded) acceleration of the particle is either zero or perpendicular to the tangent space to the manifold at every point of its trajectory. (In the present tutorial paper, we use the term naïve acceleration to distinguish it from covariant acceleration, which will only be defined in a subsequent tutorial.)
Starting from the above informal description, we are going to examine in detail the notion of embedded geodesy. In particular, we shall treat the problem according to an energy-minimizing principle (which is in fact equivalent to a length-minimizing principle).

8.1. Coordinate-Free Embedded Geodesy

The length of a smooth curve γ : [ ϵ , ϵ ] M may be evaluated though a rectification formula as:
L γ : = ϵ ϵ γ ˙ ( t ) , γ ˙ ( t ) γ ( t ) d t .
The net result of this argument is that, through the definition of an inner product on the tangent bundle to a Riemannian manifold, we are able to measure the lengths of paths in the manifold itself, which turns the manifold into a metric space.
Example 18.
Let us consider the manifold described in intrinsic coordinates:
M : = { ( θ 1 , θ 2 , sin ( θ 1 ) + 1 ) R 3 ( θ 1 , θ 2 ) R 2 } .
Let us consider the curve ( θ 1 , θ 2 ) = ( t , t ) obtained by mapping a segment of the plane R 2 to the manifold M , namely:
γ ( t ) : = ( t , t , sin ( t ) + 1 ) , t [ a , b ] .
Let us assume the manifold M to be endowed with the metric v , w x : = v w . In order to evaluate the length of such a curve, it is necessary to evaluate the velocity field γ ˙ :
γ ˙ ( t ) = ( 1 , 1 , cos ( t ) ) , t [ a , b ] .
The length of the curve is now obtained through the rectification formula:
L ( γ ) : = a b 1 2 + 1 2 + cos 2 ( t ) d t = a b 2 + cos 2 ( t ) d t .
Such an integral may not be expressed in terms of elementary functions. ■
On the basis of the concepts of ‘curve’ and of ‘length of a curve’, it is possible to define the notion of ‘distance between two points on a Riemannian manifold’ d ( · , · ) .
To define a notion of distance, let us consider what follows:
  • Given a manifold ( M , · , · ) , take two arbitrary points p , q M ;
  • Choose a curve γ : [ 0 , 1 ] M that has p and q as endpoints, namely, such that γ ( 0 ) = p and γ ( 1 ) = q ;
  • Take as as distance d ( p , q ) the length of such a curve, namely L ( γ ) ; one such definition seems a good starting point, but needs to be perfected since there exist infinitely many curves joining two given points.
It is important to underline that the notion of distance is not univocally defined since it depends on the inner product that a manifold is endowed with. Since there exist infinitely many curves joining two points on a manifold, a distance between two points p and q is actually defined as the length of the shortest curve connecting such two points, namely
d ( p , q ) : = inf γ 0 1 γ ˙ ( t ) , γ ˙ ( t ) γ ( t ) d t .
The problem of selecting the shortest path connecting two points is far from easy to solve. A possible method to look for it is the so-called variational method that we are going to introduce in the following.
The key point is to introduce an energy functional defined as
E ( γ ) : = 0 1 γ ˙ ( t ) , γ ˙ ( t ) γ ( t ) d t ,
whose minimum argument coincides to a geodesic. The variational method consists of looking for a curve that makes the energy functional stationary, namely, for a curve such that δ E ( γ ) = 0 . A concept that facilitates the formulation of the variational method is that of normal space to a manifold in a point.
Clearly, the above definition introduces yet another degree of freedom, since the definition of normal space is based on a choice of inner product · , · A to evaluate orthogonality. It is instructive to notice that, since a normal space is defined as the orthogonal complement to a tangent space with respect to the ambient space, then
T x M N x M = A .
In other terms, every element in A may be decomposed exactly as the sum of a tangent and a normal vector.
Example 19.
Let us evaluate the structure of the normal spaces to the hypersphere S n 1 embedded in A : = R n . Let us recall that S n 1 : = { x R n x x = 1 } and that T x S n 1 = { v R n v x = 0 } . Let us select y , z A : = y z for every y , z A . Since, by definition,
N x S n 1 : = { z R n z v = 0 , v R n s u c h t h a t v x = 0 } ,
it is easily found that
N x S n 1 = { λ x λ R } .
Let us further determine the structure of the normal spaces for the manifold SO ( n ) . Let us recall that SO ( n ) : = { R R n × n R R = I n , det ( R ) = 1 } and that T R SO ( n ) = { V R n × n V R + R V = 0 } . Let us take A : = R n × n and Y , Z A : = tr ( Y Z ) for Y , Z A . Let us also recall that
V R + R V = ( R V ) + R V = H + H = 0 n ,
where H : = R V . Hence, any tangent vector at R may be written as V = R H with H being skew-symmetric. By definition,
N R SO ( n ) : = { Z R n × n tr ( Z V ) = 0 , V = R H , H + H = 0 n } ,
and hence it is readily found that
N R SO ( n ) = { R S S R n × n , S = S } .
As a verification step, let us show that, for every n × n special orthogonal matrix R, skew-symmetric matrix H and symmetric matrix S, it holds that R H , R S A = 0 . To this aim, let us show that:
R H , R S A = tr ( ( R H ) R S ) = tr ( H ( R R ) S ) = tr ( H S ) .
Now, it is not hard to prove that every skew-symmetric matrix H is ‘orthogonal’ to every symmetric matrix S; in fact:
tr ( H S ) = tr ( ( H S ) ) B e c a u s e tr ( X ) = tr ( X ) = tr ( S H ) = tr ( S ( H ) ) = tr ( ( H ) S ) B e c a u s e tr ( X Y ) = tr ( Y X ) = tr ( H S ) ,
and therefore tr ( H S ) = 0 . From this result it follows that R H , R S A = 0 .
As a last example, let us determine the structure of the normal space to the manifold S + ( n ) . Recall that S + ( n ) : = { P R n × n P = P , P > 0 } and that T P S + ( n ) = { V R n × n V V = 0 } . In this case, choose A : = R n × n and Y , Z A : = tr ( Y Z ) for every Y , Z A . By definition, it holds that
N P S + ( n ) : = { Z R n × n tr ( Z V ) = 0 , V s u c h t h a t V V = 0 } ,
from which it turns out that
N P S + ( n ) = { Z R n × n Z + Z = 0 } .
In fact, N P S + ( n ) represents the set of n × n matrices that are orthogonal to all n × n symmetric matrices, which coincides with the set of all n × n skew-symmetric matrices. Notice that N P S + ( n ) does not depend explicitly on the base point P. ■
The notion of normal space affords formulating the variational problem δ E ( γ ) = 0 as a differential inclusion. For every specific manifold, it will then be possible to turn such a differential inclusion into a differential equation to be solved under appropriate initial conditions. Let us consider a curve γ : [ 0 , 1 ] M and let us define the fundamental form F : T M R as:
F ( x , v ) : = v , v x .
The following result holds.
Theorem 2.
For a manifold M , the variational equation δ E ( γ ) = 0 is equivalent to the following differential inclusion:
F γ d d t F γ ˙ N γ M .
Notice that such a relation appears as a generalized form of the well-known Euler–Lagrange equation of rational mechanics [48].
Proof. 
In order to prove such an important result, notice that the energy functional may be rewritten as:
E ( γ ) = 0 1 F ( γ , γ ˙ ) d t .
It is important to notice that, by virtue of the property (118), for every ( x , v ) T M , there exists a neighborhood of ( x , v ) in A 2 , where the partial derivatives F ( x , v ) x and F ( x , v ) v exist (technically, one should use an extension F ¯ in place of F ). Such partial derivatives are intended in the Gateaux sense, namely
lim ϵ 0 F ( x + ϵ u , v ) F ( x , v ) ϵ = : F ( x , v ) x , u A , for every u A , lim ϵ 0 F ( x , v + ϵ u ) F ( x , v ) ϵ = : F ( x , v ) v , u A , for every u A .
Let us define an arbitrary smooth variation η : [ 0 , 1 ] T M , namely a deviation from the geodesic γ , such that η ( 0 ) = η ( 1 ) = 0 .
For a sufficiently small constant ϵ > 0 , it is possible to evaluate the quantity F ( γ + ϵ η , γ ˙ + ϵ η ˙ ) . Applying the Taylor series expansion in A 2 gives:
F ( γ + ϵ η , γ ˙ + ϵ η ˙ ) = F ( γ , γ ˙ ) + ϵ F γ , η A + ϵ F γ ˙ , η ˙ A + O ( ϵ 2 ) ,
where O denotes a Landau symbol and it is used to compactly represent the remainder of the series (e.g., in Lagrange form). The variation of the energy of a curve may be concretely defined as:
δ E ( γ ) : = lim ϵ 0 E ( γ + ϵ η ) E ( γ ) ϵ = lim ϵ 0 1 ϵ 0 1 [ F ( γ + ϵ η , γ ˙ + ϵ η ˙ ) F ( γ , γ ˙ ) ] d t = 0 1 F γ , η A d t + 0 1 F γ ˙ , η ˙ A d t + 0 1 lim ϵ 0 O ( ϵ 2 ) ϵ d t .
The last integral is null. The second last integral may be rewritten, upon integration by parts, as:
0 1 F γ ˙ , η ˙ A d t = F γ ˙ , η A 0 1 0 1 d d t F γ ˙ , η A d t .
The first addendum on the right-hand side is null because the variation η vanishes at the endpoints of the curve; therefore,
δ E ( γ ) = 0 1 F γ d d t F γ ˙ , η A d t .
By imposing that δ E ( γ ) = 0 it is readily obtained that, since the variation η is arbitrary, it must hold the relation (153). □
Let us apply the above theorem to a number of manifolds.
Example 20.
Let us determine the equation of geodesics on the hypersphere S n 1 through Equation (153). Let us recall that S n 1 : = { x R n x x = 1 } ; assume that v , w x : = v w (independently of x) and let us recall that N x S n 1 = { λ x λ R } . Now, the equation of the geodesic reads:
( γ ˙ γ ˙ ) γ d d t ( γ ˙ γ ˙ ) γ ˙ = 0 2 d γ ˙ d t = 2 γ ¨ { λ γ λ R }
In summary, the equation for the geodesic is
γ ¨ ( t ) = λ γ ( t ) .
(Since λ is arbitrary, it absorbed the factor 2 .) The geodesic equation is then a second-order differential equation in γ that needs two conditions to be solved.
As a further example, let us derive the geodesic equation for the manifold SO ( n ) . Let us recall that SO ( n ) : = { R R n × n R R = I n , det ( R ) = 1 } ; the canonical inner product in this space is V , W R : = tr ( V W ) (independently of R) and that N R SO ( n ) = { R S S R n × n , S = S } . According to the general principle, the geodesic equation reads as
( γ ˙ γ ˙ ) γ d d t ( γ ˙ γ ˙ ) γ ˙ = 0 2 d γ ˙ d t = 2 γ ¨ { γ S S R n × n , S = S }
In summary, the geodesic equation on the space SO ( n ) reads as
γ ¨ = γ S ,
with S denoting an arbitrary symmetric function. ■
Through the notion of geodesic arcs, one may define the notion of Riemannian (or geodesic) distance between two nearby points in a smooth manifold. In fact, let us denote by x , y M two nearby points and assume that there exists a unique geodesic arc γ x y : [ 0 , 1 ] M such that γ x y ( 0 ) = x and γ x y ( 1 ) = y . By definition, the length of such geodesic arcs is taken as the distance between its two endpoints, namely:
d ( x , y ) : = 0 1 γ ˙ x y ( t ) , γ ˙ x y ( t ) γ x y ( t ) d t .
It is worth noticing, at this point, that the expression of the geodesic distance may be rewritten in terms of the fundamental form F , namely
d ( x , y ) = 0 1 F ( γ x y , γ ˙ x y ) d t .
A noteworthy (and extremely useful) result that we are going to prove is that, along a geodesic, the fundamental form keeps constant. We shall see after the proof that such a result noticeably simplifies the computation of the geodesic distance.
Theorem 3.
Along a geodesic arc γ : [ 0 , 1 ] M , the function F ( γ ( t ) , γ ˙ ( t ) ) stays constant with respect to t.
Proof. 
Let us mention that, for any given pair ( x , v ) T M , it holds that
F ( x , v ) v , v A = 2 F ( x , v ) ,
which clearly stems from the fact that F ( x , v ) is quadratic in v. Such a property may be shown by recalling that F ( x , v ) = v , G x ( v ) A , where G denotes again a metric kernel, and by noticing that
F ( x , v ) v = 2 G x ( v ) ,
from which it follows that
F ( x , v ) v , v A = 2 G x ( v ) , v A ,
hence the property (166). Let us now show that the differential inclusion (153) and the property (166) imply the constancy of the fundamental form F along a geodesic, namely that:
F ( γ ( t ) , γ ˙ ( t ) ) = F ( γ ( 0 ) , γ ˙ ( 0 ) ) f o r e v e r y v a l u e t [ 0 , 1 ]
on every geodesic. By multivariable calculus it is readily proven that:
d d t F ( γ , γ ˙ ) = F γ , d γ d t A + F γ ˙ , d γ ˙ d t A .
Integrating such an equation over the interval [ 0 , t ] gives:
0 t d F = 0 t F γ , d γ d t A d t + 0 t F γ ˙ , d γ ˙ d t A d t .
The integral on the left-hand side is equal to F 0 t , while the last integral on the right-hand side may be evaluated through integration by parts and equals
0 t F γ ˙ , d γ ˙ d t A d t = F γ ˙ , d γ d t A 0 t 0 t d d t F γ ˙ , d γ d t A d t = 2 F 0 t 0 t d d t F γ ˙ , d γ d t A d t ,
by Equation (166). From Equations (171) and (172), it follows that:
F 0 t = 0 t F γ d d t F γ ˙ , d γ d t A d t + 2 F 0 t ,
hence
F 0 t = 0 t F γ d d t F γ ˙ , d γ d t A d t .
From the differential inclusion (153), it follows that the integrand is null; therefore, F 0 t = 0 for every t [ 0 , 1 ] . □
The above result means that any geodesic trajectory is traveled at a constant speed. Since, along a geodesic arc, the speed γ ˙ x y ( t ) , γ ˙ x y ( t ) γ x y ( t ) is constant for any t, it holds that
d ( x , y ) = γ ˙ x y ( 0 ) , γ ˙ x y ( 0 ) x .
It is worth underlining that, provided that one knows the functional expression of the geodesic connecting two points, their distance does not require integration, but just algebraic operations. (The difficulty is hidden by the fact that finding the exact expression of a geodesic connecting two points may be more troublesome than one might expect.)
We make an observation in order to justify an idea presented at the beginning of this subsection about normal naïve acceleration for a geodesic.
Observation 2.
Let us take a closer look at the differential inclusion (153), where F ( x , v ) = v , G x ( v ) , namely
γ ˙ , G γ ( γ ˙ ) A γ d d t γ ˙ , G γ ( γ ˙ ) A γ ˙ N γ M .
Under the hypothesis that the metric kernel G x does not depend explicitly on the point x, the first term on the left-hand side vanishes. In this case, let us plainly assume that G x id x . In addition, we have already established that γ ˙ , G γ ( γ ˙ ) A γ ˙ = 2 G γ ( γ ˙ ) . The above differential inclusion may hence be written as
( G x id x ) γ ¨ N γ M ,
namely, the naïve acceleration is perpendicular to the velocity. This is perhaps the simplest form of a geodesic equation on a manifold. This is, in fact, the case for the hypersphere surveyed in an example.
A geodesic arc may be expressed in terms of two pieces of information, in addition to its endpoints, such as the initial values γ ( 0 ) = x M and γ ˙ ( 0 ) = v T x M , which represent the point where a geodesic departs from and its initial speed, respectively. A geodesic arc determined by these two pieces of information will be denoted as γ x , v : [ 0 , 1 ] M .
Example 21.
Let us consider a geodesic γ x , v : [ 0 , 1 ] M , with ( x , v ) T M , such that γ ˙ x , v ( 0 ) = v . In this case, it holds that
L ( γ x , v ) = v , v x = F ( x , v ) .

8.2. Exponential and Logarithmic Maps

To a geodesic arc γ x , v : [ 0 , 1 ] M is associated a map exp x : T x M M defined as:
exp x ( v ) : = γ x , v ( 1 ) .
The function (179) is termed exponential map with pole x M and is of paramount importance in manifold calculus. The exponential map exp : T M M takes two arguments, the point x M over the manifold and a vector v T x M tangent to the manifold at the pole x, and returns a point on the manifold itself. A practical reading of the exponential map is that it advances a point x toward a direction v, like vector addition moves a point along a straight line in flat spaces. In other words, the following two expressions are the counterparts of each other in a flat space R n and in a manifold M :
γ x , v ( t ) = x + t v i n R n , t [ 0 , 1 ] , γ x , v ( t ) = exp x ( t v ) i n M , t [ 0 , 1 ] .
Much like the first expression denotes the shortest/straightest line on a flat space (endowed with a Euclidean metric), the second expression denotes the shortest/straightest line over a curved manifold.
The exponential map depends on the pair ( x , v ) T M and is locally invertible around v = 0 . This follows from the fact that exp x ( 0 ) = x and by local invertibility results. In the flat manifold R n , the exponential map may be inverted easily; in fact
from y = x + v it follows that v = y x .
In the flat manifold R n , we recover the classical and intuitive meaning of the exponential map and of its inverse:
  • Exponential map: Given a point x and a vector v in R n , the exponential map exp x ( v ) : = x + v moves the point from x to x + v . (Indeed, the term ‘vector’ comes from the homonymus Latin term that means ‘transporter’.)
  • Inverse exponential map: Given two points x and y in R n , the inverse of the exponential map exp x 1 ( y ) returns a vector v such that exp x ( v ) = exp x ( exp x 1 ( y ) ) = x + ( y x ) y . In other words, the inverse exponential map applied to two points x , y determines the vector v that ‘transports’ x to y.
Let us observe that the inverse exponential map is non-symmetric in its arguments, namely the notation exp x 1 ( y ) makes it immediately clear that it is not allowed to swap x with y. Indeed, even in the flat space R n , it holds that exp x 1 ( y ) : = y x = ( x y ) = exp y 1 ( x ) . (In the case of a curved manifold, such a reciprocity relation is more convoluted.)
Generally, the inverse exponential map is termed logarithmic map. On a manifold M where it is defined an exponential map exp : T M M a logarithmic map is denoted as log : M 2 T M . A logarithmic map takes as arguments two points x , y M and returns a vector v = log x ( y ) T x M . It is important to underline that a logarithmic map is defined only locally, namely, only if x , y M lay sufficiently close to one another. The lack of a global logarithmic map may be understood as follows. A logarithmic map is defined by the following equation:
exp x ( log x y ) = y ,
and therefore, the existence of log x y is tied to the chance of determining one and only one geodesic arc γ x y that connects the given points x , y M , and then, by the equality γ x y ( t ) = γ x , v ( t ) , it follows that log x y v . However, not every pair of points may be connected by a unique geodesic, and hence a global logarithmic map, in general, fails to exist. A quick example of such an unavoidable problem is found on the sphere S 2 ; taking x as the North pole and y as the South pole, there exist infinitely many geodesic lines that connect the two poles. Hence, log x y is undetermined.
Let us examine the expressions of geodesic arcs, geodesic distances and exponential maps for manifolds of interest in system theory and non-linear control.
  • Hypercube: The space R p endowed with the Euclidean metric admits straight lines as geodesics; in fact, since T x R p = R p it follows that N x R p = { 0 } , and hence the geodesic equation is simply γ ¨ x , v = 0 and its solution is γ x , v ( t ) = x + t v for every x R p , v T x R p and t [ 0 , 1 ] . Now, take two points x , y R p and look for a geodesic arc connecting them with t [ 0 , 1 ] . It is necessary to find a vector v T x R p such that γ x , v ( 1 ) = y . Such a vector is clearly v = y x ; hence, the unique geodesic arc connecting x to y is γ x y ( t ) = x + t ( y x ) . Since γ ˙ x y ( 0 ) , γ ˙ x y ( 0 ) x = v , the Riemannian distance reads
    d ( x , y ) = y x ,
    which is a well-known result from calculus and geometry.
  • Hypersphere: On the hypersphere S p 1 embedded in the Euclidean space R p , a geodesic line may be conceived as a curve on which a particle, departing from the point x S p 1 with velocity v T x S p 1 , slides with constant speed v , where · denotes the standard L 2 vector norm. On the hypersphere, we denote such a curve as γ x , v ( t ) , where the variable t [ 0 , 1 ] provides a parametrization of the curve. The differential equation characterizing geodesics on the hypersphere may be determined by observing that, with the given conditions, in this case the naïve acceleration of the particle must be either null or normal to the tangent space at any point of the hypersphere itself, namely γ ¨ x , v ( t ) N γ x , v ( t ) S p 1 . Since the normal space to a hypersphere at a point x is radial along x, the geodesic equation reads as γ ¨ x , v ( t ) = λ γ x , v ( t ) . In explicit form, the equation of the geodesic on the unit hypersphere may be written as [49]:
    γ x , v ( t ) = x cos ( v t ) + v sin ( v t ) v 1 , t [ 0 , 1 ] ,
    as it is easy to verify by substitution. Additionally, it is easy to verify that γ x , v ( 0 ) = x , γ ˙ x , v ( 0 ) = v and that γ x , v ( t ) γ x , v ( t ) = 1 for every t. The exponential map associated with the above geodesic is
    exp x ( v ) : = x cos ( v ) + v sin ( v ) v 1 .
    The relationship (184) for the geodesic represents a ‘great circle’ on the hypersphere. Now let us take two points x , y S p 1 (non-antipodal, such that x y 1 ) and let us look for a geodesic arc of the form (184) connecting them. It is clearly necessary to find a vector v T x S p 1 such that γ x , v ( 1 ) = y . Such an equation in the unknown v may be expressed explicitly as
    x cos ( v ) + v sinc ( v ) = y ,
    where ‘ sinc ’ denotes the cardinal sine function. Pre-multiplying the above equation by x gives cos ( v ) = x y , namely v = acos ( x y ) ; hence
    v = y x ( x y ) sinc ( acos ( x y ) ) .
    This expression represents the inverse of the exponential map applied to points x , y S p 1 , namely
    log x y : = ( I p x x ) y sinc ( acos ( x y ) ) .
    Notice that such a logarithmic map is defined only when x y 1 , namely when the two points are antipodal. The unique geodesic arc connecting x to y is given by
    γ x y ( t ) = x cos ( acos ( x y ) t ) + y x ( x y ) sinc ( acos ( x y ) ) sinc ( acos ( x y ) t ) .
    A noticeable consequence is that, since γ ˙ x y ( 0 ) , γ ˙ x y ( 0 ) x = v , the Riemannian distance between the points x and y reads
    d ( x , y ) = acos ( x y ) ,
    where the inverse cosine function ‘ acos ’ returns a value in [ 0 , π ] .
  • Special orthogonal group: In general, it is not easy to obtain the expression of a geodesic arc on a given manifold in closed form. In the present case, with the assumptions considered, the geodesic on SO ( p ) departing from the identity with velocity H so ( p ) has expression γ ˜ ( t ) = exp ( t H ) . (It is important to verify that γ ˜ ( 0 ) = I p and d γ ˜ ( t ) d t t = 0 = H .) It might be useful to verify such an essential result by the help of the following arguments. A geodesic γ ˜ ( t ) on the Riemannian manifold SO ( p ) , embedded in the Euclidean ambient space A : = R p × p and endowed with its canonical metric, departing from the identity I p , should satisfy γ ˜ ¨ ( t ) N γ ˜ ( t ) SO ( p ) , and therefore it should hold:
    γ ˜ ¨ ( t ) = γ ˜ ( t ) S ( t ) , w i t h S ( t ) = S ( t ) .
    Additionally, we know that any geodesic arc belongs entirely to the base manifold; therefore γ ˜ ( t ) γ ˜ ( t ) = I p . By differentiating such an expression two times with respect to the parameter t, one obtains:
    γ ˜ ¨ ( t ) γ ˜ ( t ) + 2 γ ˜ ˙ ( t ) γ ˜ ˙ ( t ) + γ ˜ ( t ) γ ˜ ¨ ( t ) = 0 p .
    By plugging Equation (191) into Equation (192), we find that S ( t ) = γ ˜ ˙ ( t ) γ ˜ ˙ ( t ) , which leads to the second-order differential equation on the orthogonal group:
    γ ˜ ¨ ( t ) = γ ˜ ( t ) ( γ ˜ ˙ ( t ) γ ˜ ˙ ( t ) ) ,
    to be solved with the initial conditions γ ˜ ( 0 ) = I p and γ ˜ ˙ ( 0 ) = H . It is a straightforward task to verify that the solution to this second-order differential equation is given by the one-parameter curve γ ˜ ( t ) = Exp ( t H ) , where ‘Exp’ denotes a matrix exponential.
    The expression of the geodesic arc in the position of interest may be made explicit by taking advantage of the Lie-group structure of the orthogonal group endowed with the canonical metric. In fact, let us consider the pair X SO ( p ) and V T X SO ( p ) as well as the geodesic γ ( t ) that emanates from X, namely γ ( 0 ) = X , with velocity V. We claim that the geodesic departing from X SO ( p ) in the direction V T X SO ( p ) is:
    γ X , V ( t ) = X Exp ( t X V ) .
    In fact, let us consider the left-translated curve γ ˜ ( t ) = X γ X , V ( t ) . It has the following properties:
  • The curve γ X , V ( t ) belongs to the orthogonal group at any time. This may be proven by computing the quantity γ X , V ( t ) γ X , V ( t ) and taking into account that the identity Exp ( t H ) = Exp ( t H ) holds true. Therefore,
    γ X , V ( t ) γ X , V ( t ) = Exp ( t X V ) X X Exp ( t X V ) = Exp ( t X V ) Exp ( t X V ) = = I p .
  • (2)
    It satisfies Equation (193); hence, it is a geodesic. In fact, notice that γ ˜ ( t ) = X X Exp ( t X V ) = Exp ( t X V ) ; hence, γ ˜ ˙ ( t ) = X V Exp ( t X V ) and γ ˜ ¨ ( t ) = ( X V ) 2 Exp ( t X V ) . Now, the right-hand side of Equation (193) has the expression Exp ( t X V ) ( X V Exp ( t X V ) ) X V Exp ( t X V ) = Exp ( t X V ) Exp ( t X V ) ( X V ) X V Exp ( t X V ) = ( X V ) 2 Exp ( t X V ) because X V so ( p ) , hence the claim.
    (3)
    It satisfies γ ˜ ( 0 ) = X and γ ˜ ˙ ( 0 ) = X ( X V ) = V ; hence, it has the correct base point and direction.
    Therefore, the exponential map associated with the above geodesic expression reads
    exp X ( V ) : = X Exp ( X V ) .
    Now, fixed two points X , Y SO ( p ) , let us compute the geodesic arc γ X Y : [ 0 , 1 ] SO ( p ) that joins them. It all boils down in finding a tangent vector V T X SO ( p ) such that Y = X Exp ( X V ) . Pre-multiplying this expression by X gives X Y = Exp ( X V ) ; hence, Log ( X Y ) = X V . Pre-multiplying the last equality by X gives the sought tangent vector as V = X Log ( X Y ) . The logarithmic map associated with the above exponential map therefore reads
    log X Y : = X Log ( X Y ) .
    The Riemannian distance between X and Y then takes the expression
    d ( X , Y ) = X Log ( X Y ) F = Log ( X Y ) F ,
    where ‘Log’ denotes, as usual, the matrix logarithm.
  • Stiefel manifold: Let us consider the expression of geodesics corresponding to two metrics.
    • Euclidean metric: The solution of the geodesic equation, with the initial conditions γ X , V ( 0 ) = X St ( n , p ) and γ ˙ X , V ( 0 ) = V T X St ( n , p ) , reads [33]:
      γ X , V ( t ) = [ X V ] Exp t X V V V I p X V I 2 p , p Exp ( t X V ) ,
      for t [ 0 , 1 ] . The expression of the exponential map corresponding to the Euclidean metric reads, therefore,
      exp X ( V ) : = [ X V ] Exp X V V V I p X V I 2 p , p Exp ( X V ) ,
      while the expression of the logarithmic map and of the Riemannian distance between two given points, in closed form, are unknown at present (to the best of the author’s knowledge). The pseudo-identity matrix I 2 p , p , with 2 p rows and p columns, is just an identity matrix topping a zero matrix.
    • Canonical metric: The geodesic arc γ X , V may be computed as follows. Let Q and R denote the factors of the thin QR factorization of the matrix V, then:
      γ X , V ( t ) = [ X Q ] Exp t X V R R 0 p I p 0 p ,
      for t [ 0 , 1 ] . The expression of the exponential map corresponding to the canonical metric is, therefore,
      exp X ( V ) : = [ X Q ] Exp X V R R 0 p I p 0 p ,
      while the expressions of the logarithmic map and of the Riemannian distance between two points are still unknown.
      In fact, neither the logarithmic map nor the geodesic distance are known in closed form for a Stiefel manifold.
  • Real symplectic group: According to the two considered metrics, we have:
    • KM metric: Under the pseudo-Riemannian metric (115), it is indeed possible to solve the geodesic equation in closed form. The geodesic curve γ Q , V : [ 0 , 1 ] Sp ( 2 n ) with Q Sp ( 2 n ) and V T Q Sp ( 2 n ) corresponding to the indefinite Khvedelidze–Mladenov metric (115) has the expression:
      γ Q , V ( t ) = Q Exp ( t Q 1 V ) .
      In fact, the geodesic equation in variational form is:
      δ 0 1 tr ( γ 1 γ ˙ γ 1 γ ˙ ) d t = 0 .
      The calculation of this variation is facilitated by the following rules of the calculus of variations:
      δ ( Q 1 ) = Q 1 ( δ Q ) Q 1 ,
      δ d Q d t = d d t ( δ Q ) ,
      δ ( Q Z ) = ( δ Q ) Z + Q ( δ Z ) ,
      for curves Q , Z : [ 0 , 1 ] Sp ( 2 n ) . By computing the variations, integrating by parts and recalling that the variations vanish at endpoints, it is found that the geodesic equation in variational form reads:
      0 1 tr ( δ γ ( γ 1 γ ¨ γ 1 γ 1 γ ˙ γ 1 γ ˙ γ 1 ) ) d t = 0 .
      The variation δ γ T γ Sp ( 2 n ) is arbitrary. By the structure of the normal space N Q Sp ( 2 n ) , the equation tr ( P δ γ ) = 0 , with δ γ T γ Sp ( 2 n ) , implies that P = H J γ 1 with H so ( 2 n ) . Therefore, Equation (207) is satisfied if and only if:
      γ 1 γ ¨ γ 1 γ 1 γ ˙ γ 1 γ ˙ γ 1 = H J γ 1 , H so ( 2 n ) ,
      or, equivalently,
      γ ¨ γ ˙ γ 1 γ ˙ = γ H J ,
      for some H so ( 2 n ) . In order to determine the value of matrix H, note that:
      γ J γ J = 0 γ ¨ J γ + 2 γ ˙ J γ ˙ + γ γ ¨ = 0 .
      Substituting the expression γ ¨ = γ ˙ γ 1 γ ˙ + γ H J into the above equation yields the condition J H J = 0 . Hence, H = 0 and the geodesic equation reads:
      γ ¨ γ ˙ γ 1 γ ˙ = 0 .
      Its solution, with the initial conditions γ ( 0 ) = Q Sp ( 2 n ) and γ ˙ ( 0 ) = V T X Sp ( 2 n ) , is found to be of the form (202). By definition of matrix exponential, it follows that γ ˙ Q , V ( t ) = V Exp ( t Q 1 V ) .
    • Euclidean metric: The expression of the geodesic corresponding to the Euclidean metric was derived in [50]. Let γ : [ 0 , 1 ] Sp ( 2 n ) be a geodesic arc connecting the points X , Y Sp ( 2 n ) . Let us define h ( t ) : = γ 1 ( t ) γ ˙ ( t ) . The geodesic that minimizes the following energy functional
      0 1 tr ( h ( t ) h ( t ) ) d t ,
      is the solution of the differential (Lax) equation:
      h ˙ ( t ) = h ( t ) h ( t ) h ( t ) h ( t ) .
      Furthermore, for the initial conditions γ ( 0 ) = X Sp ( 2 n ) and γ ˙ ( 0 ) = V T X Sp ( 2 n ) , the geodesic on the real symplectic group is given by
      γ X , V ( t ) = X Exp ( t ( X 1 V ) ) Exp ( t [ ( X 1 V ) ( X 1 V ) ] ) ,
      in the case that the real symplectic group is equipped by a Euclidean metric.
  • Space of symmetric, positive-definite matrices: The geodesic arc, corresponding to the canonical metric, emanating from a point P S + ( p ) in the direction V T P S + ( p ) has the expression:
    γ P , V ( t ) = P S Exp t P 1 S V P 1 S P S ,
    where · S denotes a symmetric matrix square root. The exponential and the logarithmic maps for P , Q S + ( p ) and V T P S + ( p ) thus read:
    exp P ( V ) = P S Exp P 1 S V P 1 S P S ,
    log P ( Q ) = P S Log P 1 S Q P 1 S P S .
    The symmetric matrix square root of a S + ( p ) matrix P may be computed by means of its eigenvalue factorization. In fact, if the matrix P is factored as X diag ( λ 1 , λ 2 , λ 3 , , λ p ) X , with X O ( p ) and every λ k > 0 , then it holds that P S = X diag ( λ 1 , λ 2 , λ 3 , , λ p ) X . The squared Riemannian distance between two points P , Q S + ( p ) is given by
    d 2 ( P , Q ) = log P Q , log P Q P = tr ( ( log P Q ) P 1 ( log P Q ) P 1 ) = tr P S Log P 1 S Q P 1 S P S P 1 P S Log P 1 S Q P 1 S P S P 1 = tr Log 2 P 1 S Q P 1 S P S P 1 P S = tr Log 2 P 1 S Q P 1 S .
    Notice that the following identity holds true:
    tr ( I p P 1 S Q P 1 S ) k = tr ( I p Q P 1 ) k ,
    and hence the Riemannian distance between two SPD matrices may be written equivalently as
    d 2 ( P , Q ) = tr ( Log 2 ( Q P 1 ) ) ,
    as obtained by the series expansion of the matrix logarithm function (cf. Section 2.1).
  • Grassmann manifold: A geodesic arc on a Grassmann manifold emanating from [ X ] Gr ( n , p ) with the velocity V T [ X ] Gr ( n , p ) may be written as:
    γ [ X ] , V ( t ) = [ X B A ] cos ( D t ) sin ( D t ) B , = X B cos ( D t ) B + A sin ( D t ) B .
    where X St ( n , p ) denotes a Stiefel matrix whose columns form an orthonormal basis of the subspace [ X ] and A D B denotes the compact singular value factorization of the matrix V. The sin/cos functions applied to a diagonal matrix simply act, in terms of components, on the entries of the main diagonal. As a compact, smooth Riemannian manifold, a Grassmann manifold is geodesically complete, which implies that any two points on Gr ( n , p ) may be connected by a geodesic arc.
    The exponential map associated with the canonical metric reads:
    exp [ X ] ( V ) = X B cos ( D ) B + A sin ( D ) B .
    The logarithmic map log [ X ] [ Y ] of two subspaces [ X ] , [ Y ] Gr ( n , p ) is not easy to compute in general. In [40], it is shown that if their Stiefel representatives X , Y St ( n , p ) are such that the product X Y is symmetric, then log [ X ] [ Y ] = A D B , where B ( cos D ) B denotes the spectral factorization of the matrix X Y and A : = ( I X X ) Y B ( sin D ) 1 .
Example 22.
Let us compute the length of a geodesic curve γ x y : [ 0 , 1 ] S n 1 that, by definition, represents the distance between points x , y S n 1 :
d ( x , y ) : = L ( γ x y ) .
Let us start by showing that:
γ ˙ x y ( t ) = 1 sin θ θ cos ( θ t ) y θ cos ( θ ( 1 t ) ) x
with cos θ : = x y ; hence,
γ ˙ x y ( t ) 2 = θ 2 sin 2 θ cos 2 ( θ t ) + cos 2 ( θ ( 1 t ) ) 2 cos ( θ t ) cos ( θ ( 1 t ) ) x y = θ 2 = acos 2 ( x y ) .
It is instructive to verify that γ ˙ x y ( t ) 2 does not actually depend on t, which facilitates the computation of the length. In fact, the length of γ x y : [ 0 , 1 ] S n 1 is given by:
L ( γ x y ) = 0 1 γ ˙ x y ( t ) d t = acos ( x y ) 0 1 d t .
Therefore, the distance between x , y S n 1 equals:
d ( x , y ) = acos ( x y ) .
As a further example, let us determine the length of the geodesic curve γ R , V : [ 0 , 1 ] SO ( n ) given by the relationship (194). By definition, it holds that
L ( γ R , V ) = 0 1 γ ˙ R , V ( t ) d t .
Let us show that
γ ˙ R , V ( t ) = ( R R ) V Exp ( R V t ) = V Exp ( R V t ) ,
therefore
L ( γ R , V ) = 0 1 tr [ ( V Exp ( R V t ) ) V Exp ( R V t ) ] d t = 0 1 tr [ Exp ( R V t ) V V Exp ( R V t ) ] d t = 0 1 V d t = V ,
as proven in the general theoretical developments. ■
The following observation clarifies a technical argument.
Observation 3.
The parameter t in the expressions of geodesic lines, which we may identify astime, indicates which point of a geodesic one is referring to. Normally, the value t = 0 corresponds to the initial point x, namely γ x , v ( 0 ) x . It is instructive to observe that, in all examined cases, the velocity v and the time parameter t appear to be multiplied to one another. For example, the equation of the geodesic γ x , v on a hypersphere may be rewritten as
γ x , v ( t ) = cos ( v t ) x + sin ( v t ) v t v t , t > 0 .
This means that it is always possible to re-scale the parameter t as α t as long as the velocity v is re-scaled as v / α , where α > 0 . This is the reason why the time parameter t normally ranges in the interval [ 0 , 1 ] .

8.3. Geodesic Interpolation

The notions of metric, distance, geodesics, are related to the notion of interpolation. In fact, the interpolation between two points x , y M may be defined through the optimization problem [51]:
ι x y ( t ) : = arg min z M [ ( 1 t ) d 2 ( x , z ) + t d 2 ( z , y ) ] ,
with parameter t [ 0 , 1 ] providing the degree of interpolation between the two points. The one-parameter curve ι x y : [ 0 , 1 ] M describes a trajectory over the manifold M , having x and y as endpoints, which may be regarded as an interpolation curve. For example, the value ι x y ( 1 2 ) is regarded as a midpoint between points x and y.
Example 23.
Let us consider two close points X , Y SO ( p ) and let us look for an interpolation of such points in SO ( p ) endowed with its canonical metric. We are looking for a minimizer of the criterion:
Z ( 1 t ) Log ( X Z ) F 2 + t Log ( Y Z ) F 2 ,
with t [ 0 , 1 ] assigned. We may look for an explicit solution of the above minimization problem by reasoning as follows. The points X and Y may be connected by a geodesic arc γ X , V ( τ ) = X Exp ( τ X V ) with τ [ 0 , 1 ] and V T X SO ( p ) such that γ X , V ( 1 ) = Y , namely, with V satisfying condition Exp ( X V ) = X Y . As the interpolating point ι X Y ( t ) must be as close as possible to both X and Y, it should belong to the geodesic γ X , V ( τ ) ; therefore, we are left with the problem of minimizing the criterion function:
τ ( 1 t ) Log ( X X Exp ( τ X V ) ) F 2 + t Log ( Y X Exp ( τ X V ) ) F 2 = ( 1 t ) τ X V F 2 + t ( τ 1 ) X V F 2 = ( ( 1 t ) τ 2 + t ( τ 1 ) 2 ) Log ( X Y ) F 2 ,
with respect to the variable τ. In the second line, we have used the identity Y X = Exp ( X V ) . The minimum of the above criterion function in [ 0 , 1 ] is readily seen to be achieved in τ = t ; thus, the interpolating curve reads:
ι X Y ( t ) = X Exp ( t Log ( X Y ) ) = X ( X Y ) t
for 0 t 1 . Notice that a matrix power A α , with A R n × n and α R , is defined (inasmuch as it exists) in terms of matrix exponential/logarithm, hence the last identity. ■
Generalizing the notion of midpoint, one comes up with the notion of mean value (and of dispersion of a sample set around its mean value) in a metrizable manifold M . We may consider what follows:
  • The notion of ‘mean value’ of objects in a metrizable space should reflect the intuitive understanding that the mean value is an element of the space that locates ‘amidst’ the available objects. Therefore, a fundamental tool in the definition of mean value is a measure of ‘how far apart’ elements in the sample space lie to one another.
  • The notion of ‘metric variance’ of objects in a metrizable space should be defined in a way that accounts for the dispersion of these objects about their mean values and also depends on how the dissimilarity of such objects is measured.
A way of defining the mean value of a set of objects x k M , k = 1 , , K , is provided by the notion of Fréchet sample mean. The Fréchet sample mean and associated sample metric variance [52] may be defined as:
μ : = arg min x M 1 K k = 1 K d 2 ( x , x k ) ,
σ 2 : = min x M 1 K k = 1 K d 2 ( x , x k ) ,
where d denotes a distance function in the Riemannian manifold M . It is worth noting that there is no guarantee, in general, that the optimization problem (233) admits a unique solution. For example, consider the space S p 1 and a sample set consisting of two antipodal points; then, every point over the equator is a mean value. (Notice that a mean is Fréchet only if it is a global minimizer, otherwise it is a Karcher mean.) If the sample points are close enough to each other, it is known that the mean value is unique [53].
The notion of mean value of a set of points belonging to a curved manifold is utterly important in system theory and non-liner control. In fact, the mean value μ is, by definition, close to all points in a collection of points. Therefore, the tangent space T μ M associated with a cloud of data points may serve as a reference tangent space (see, for example, [54]).

8.4. Coordinate-Prone Geodesy, Christoffel Symbols*

In terms of the variation of an energy functional, a geodesic on a manifold M is defined as the unique curve γ ( t ) M , t [ 0 , 1 ] that meets the following requirement
δ 0 1 γ ˙ , γ ˙ γ d t = 0 ,
where the symbol δ denotes the variation of the integral functional. The integral functional in (235) represents the total action associated with the parametrized smooth curve γ : [ 0 , 1 ] M of local coordinates x k = γ k ( t ) and may be written explicitly as:
0 1 γ ˙ , γ ˙ γ d t = 0 1 g i j ( γ ( t ) ) γ ˙ i ( t ) γ ˙ j ( t ) d t .
The variation δ in the expression (235) corresponds to a perturbation of the total action (236). Let η : [ 0 , 1 ] M denote an arbitrary parametrized smooth curve of local coordinates η k ( t ) such that η k ( 0 ) = η k ( 1 ) = 0 . Define the perturbed action as
A η * ( h ) : = 0 1 g i j ( γ + h η ) ( γ ˙ i + h η ˙ i ) ( γ ˙ j + h η ˙ j ) d t ,
with h [ ϵ , ϵ ] , ϵ > 0 . The condition (235) may be expressed explicitly in terms of the perturbed action as:
lim h 0 A η * ( h ) A η * ( 0 ) h = 0 , perturbation η .
The perturbed total action may be expanded around h = 0 as follows:
A η * ( h ) = 0 1 g i j ( x ) + h g i j x k η k + O ( h 2 ) x = γ ( t ) [ γ ˙ i γ ˙ j + h ( γ ˙ i η ˙ j + γ ˙ j η ˙ i ) + O ( h 2 ) ] d t = 0 1 g i j ( γ ) γ ˙ i γ ˙ j d t + h 0 1 g i j ( x ) ( γ ˙ i η ˙ j + γ ˙ j η ˙ i ) + η k g i j x k γ ˙ i γ ˙ j x = γ ( t ) d t + O ( h 2 ) = A η * ( 0 ) + h 0 1 g i k ( x ) γ ˙ i η ˙ k + g k j ( x ) γ ˙ j η ˙ k + g i j x k γ ˙ i γ ˙ j η k x = γ ( t ) d t + O ( h 2 ) ,
where O ( h 2 ) is such that lim h 0 h 1 O ( h 2 ) = 0 . The first term in the integral on the right-hand side of the last line may be integrated by parts, namely:
0 1 g i k ( γ ) γ ˙ i η ˙ k d t = g i k ( γ ) γ ˙ i η k 0 1 0 1 d ( g i k ( γ ) γ ˙ i ) d t η k d t ,
whose first term on the right-hand side vanishes to zero because η k ( 0 ) = η k ( 1 ) = 0 , hence:
0 1 g i k ( γ ) γ ˙ i η ˙ k d t = 0 1 g i k x j γ ˙ i γ ˙ j + g i k ( x ) γ ¨ i x = γ ( t ) η k d t .
A similar result holds for the second term within the integral on the right-hand side. Therefore, it holds that:
A η * ( h ) A η * ( 0 ) h = 0 1 g i j x k g i k x j g k j x i γ ˙ i γ ˙ j 2 g i k ( x ) γ ¨ i x = γ ( t ) η k d t + O ( h 2 ) h .
The condition (238), therefore, implies that:
g i k γ ¨ i + 1 2 g i j x k + g i k x j + g k j x i γ ˙ i γ ˙ j = 0 .
Let us now recall the notion of Christoffel symbols of the first kind (named after Elwin Bruno Christoffel) associated with a metric tensor of components g i j . These coefficients are defined as:
Γ i j k : = 1 2 g j k x i + g i k x j g i j x k .
On top of it, the Christoffel symbols of the second kind are defined as Γ i j h : = g h k Γ i j k . The Christoffel symbols of the second kind are symmetric in the covariant indices, namely, Γ i j h = Γ j i h .
On the basis of the Christoffel symbols, Equation (241) may be rewritten as
g i k γ ¨ i + Γ i j k γ ˙ i γ ˙ j = 0 .
Multiplying by g h k and introducing the Christoffel symbols of the second kind, Equation (243) may be written as:
γ ¨ i + Γ j k i γ ˙ j γ ˙ k = 0 .
Such a system of second-order differential equations needs two initial conditions to be solved. Typical initial conditions are γ i ( 0 ) = x i and γ ˙ i ( 0 ) = v i .
Example 24.
One might wonder if the Christoffel symbols defined in (242) are the components of a tensor. The answer is negative, because none of its lower index denote covariancy. ■

9. Riemannian Gradient of a Manifold-to-Scalar Function

The gradient of a scalar function in a point of its manifold-type domain represents a degree of variability of the function in a vicinity of such a point. In the present section, we are going to survey the notion of ‘Riemannian gradient’ starting from more familiar cases to obtain a full definition for manifold-to-scalar functions.

9.1. Riemannian Gradient: Motivation and Definition

In the simplest case of a scalar function of a scalar variable f : R R , its degree of variability is quantified by its slope d f d x . In fact, let us recall that, given a point x R , moving slightly away from it of a little amount h R , the value of the function f ( x + h ) may be related to the value of f ( x ) through a Taylor series expansion. Such an expansion, truncated to a first-order term by a Lagrange-type remainder, reads
f ( x + h ) = f ( x ) + d f ( x ) d x h + O ( h 2 ) ,
and hence the first derivative of the function (that is, its gradient) truly quantifies the variability of the function around the point x. In the above expression, O again denotes a Landau symbol.
In the more involved case of a scalar function of several variables, f : R p R , the gradient f x quantifies again the degree of variability of the function around a point, although such information is spread along p axes. In fact, in this case, there exist multiple directions along which the variability of a function may be explored. After choosing a direction, one may move away along a straight line. Starting from a point x R p and moving away along a direction v R p of a small fraction t, the value of f ( x + t v ) may be related to the value of f ( x ) through a multivariable Taylor series expansion which, in the first order, reads
f ( x + t v ) = f ( x ) + f ( x ) x ( t v ) + O ( t 2 ) .
Therefore, the gradient of the function f (which, in this case, may be termed ‘Euclidean gradient’ and coincides with the ‘Gateaux derivative’ and with the ‘Jacobian’ of the function) encodes the variability of the function and the quantity f ( x ) x v represents the directional derivative of the function f at x along the direction v. In the following, we shall utilize the shorter notation x : = x .
In the case of a scalar function whose domain is a curved manifold f : M R , its gradient, which we shall denote as grad x f , serves again to quantify the degree of variability of a function in the neighborhood of point x M along a given tangent direction. On the same line of reasoning as before, starting from a point x M one may think of moving away slightly along a given tangent direction v T x M . After choosing a direction, the simplest way to move away from a point along a given direction is to travel along a geodesic line for a short time. Thus, taking a geodesic γ x , v : [ ϵ , ϵ ] M , the value of f ( γ x , v ) may be related to the value f ( x ) through a Taylor-type expansion, which is written as
f ( γ x , v ( t ) ) = f ( x ) + grad x f , t v x + O ( t 2 ) .
Therefore, the gradient of the function f quantifies the variability of a function around a point, while grad x f , v x may be interpreted as a directional derivative of the function f at x along a direction v. From the relationship (247), we see that
f ( γ x , v ( t ) ) f ( x ) t = grad x f , v x + O ( t 2 ) t .
Taking the limit as t approaches zero, the second addendum on the right-hand side vanishes to zero; hence, we obtain
grad x f , v x = lim t 0 f ( γ x , v ( t ) ) f ( x ) t = d f ( γ x , v ( t ) ) d t t = 0 .
The right-hand side is resemblant of a Gateaux derivative of the function f in the direction v. The relationship (249) may be indeed taken as a definition of the gradient of the function with the proviso that it should hold for every v T x M .
By taking a closer look at the expression (249), it is readily seen that it holds even if the geodesic line is replaced by any smooth curve γ : [ ϵ , ϵ ] M as long as γ ( 0 ) = x and γ ˙ ( 0 ) = v , in which case we can write
grad x f , v x = lim t 0 f ( γ ( t ) ) f ( x ) t .
Now let us underline an illusory contradiction in the above expression.
Observation 4.
The curve γ takes part only in closed proximity of the point x; therefore, the right-hand side of the above relation only depends on the point x and the tangent v, as well as the function f, and, in particular, it does not depend on the metric. In other words, the result of the expression lim t 0 f ( γ ( t ) ) f ( x ) t quantifies how much the function f varies upon moving away from the point x toward the direction v. As opposed to this, the left-hand side of the relation (250) seems to depend on the metric · , · . Since the metric may be chosen arbitrarily and independently of the function f and of the quantities x and v, the above relation looks like a conceptual clash.
The only possible way to fix such an illusory contradiction is to recognize that the gradient itself must depend on the metric in such a way that the directional derivative grad x f , v x does not. In other words, if we denote by · , · ( 1 ) and · , · ( 2 ) two metrics for the manifold M , then it must hold that
grad x ( 1 ) f , v x ( 1 ) = grad x ( 2 ) f , v x ( 2 )
for every v T x M , where grad x ( 1 ) f denotes the gradient of the function f at x corresponding to the metric · , · ( 1 ) and grad x ( 2 ) f denotes the gradient of the function f at x corresponding to the metric · , · ( 2 ) .
Let us formalize the above discussion with the aim of formalizing the notion of Riemannian gradient on a Riemannian manifold. Let us consider a Riemannian manifold M embedded in an ambient space A and, for every point x, the tangent space T x M . Let us now consider a smooth function f : M R and an inner product · , · A in A . We shall require that the function f is extendable to a neighborhood of x in A so as to be able to compute the Gateaux derivative x of the function f with respect to x. (As usual, we are supposed to use an extension f ¯ in the following expressions.) Let us recall that
lim h 0 f ( x + h v ) f ( x ) h = : x f , v A .
We denote the inner product that the tangent bundle T M is endowed with at x M as · , · x . The Riemannian gradient grad x M f of the function f over the manifold M at the point x is uniquely defined by the following two conditions:
  • Tangency condition: For every x M , grad x M f T x M .
  • Metric compatibility condition: For every x M and every v T x M A , it holds that grad x M f , v x = x f , v A .
The tangency condition codifies the requirement that a Riemannian gradient must always be a tangent vector to the manifold, namely ( x , grad x f ) T M , while the metric compatibility condition codifies the requirement that the inner product between a gradient vector and any other tangent vector belonging to the same tangent space takes a value that is invariant with respect to the chosen metric. The latter condition is better understood by considering that the linear component of the variation of a function when moving away from a point x of a quantity v is given by
d x f ( v ) = grad x M f , v x .
Clearly, the amount d x f ( v ) depends only on x, f and the displacement v, certainly not on the metric. As a consequence, the gradient needs to depend on the metric. The ‘reference’ inner product is taken as the inner product that the ambient space is endowed with. In fact, the Riemannian gradient of a manifold-to-scalar function represents the rate of change of such function at a given point toward a given direction. The superscript M may be removed for simplicity.
Let us survey the calculation of the Riemannian gradient in a number of spaces of interest in applications.
  • Hypercube: In the space R p endowed with the Euclidean metric, the Gateaux derivative of a regular function f : R p R is simply the column array of partial derivatives of function f with respect to the entries of the column array x = [ x ( 1 ) x ( 2 ) x ( 3 ) x ( p ) ] , namely
    grad x R p f = x f : = f x ( 1 ) f x ( 2 ) f x ( p ) .
    Likewise, the Gateaux derivative of a regular function f : R p × q R is the Jacobian matrix of partial derivatives of function f with respect to the entries of the matrix X, denoted by X ( i , j ) , namely
    grad X R p × q f = X f : = f X ( 1 , 1 ) f X ( 1 , 2 ) f X ( 1 , q ) f X ( 2 , 1 ) f X ( 2 , 2 ) f X ( 2 , q ) f X ( p , 1 ) f X ( p , 2 ) f X ( p , q ) .
  • Hypersphere: Given a regular function f : S p 1 R , its Riemannian gradient at x S p 1 is denoted as grad x S p 1 f T x S p 1 . The Riemannian gradient associated with the canonical metric has the expression:
    grad x S p 1 f = ( I p x x ) x f .
    In fact, the metric compatibility condition prescribes that grad x S p 1 f , v = x f , v , for every v T x S p 1 ; hence, grad x S p 1 f x f , v = 0 for every v T x S p 1 , and therefore grad x S p 1 f x f N x S p 1 . It follows that
    grad x S p 1 f = x f + λ x ,
    for some λ R . The tangency condition then entails ( grad x S p 1 f ) x = 0 , hence that ( x f + λ x ) x = 0 , namely λ = x x f , which gives the stated result.
  • Special orthogonal group: Let us compute the Riemannian gradient of a regular function f : SO ( p ) R . Let the manifold SO ( p ) be equipped with its canonical metric. Let grad X SO ( p ) f denote the gradient vector of a function f at R SO ( p ) derived from the canonical metric. According to the compatibility condition for the Riemannian gradient it must hold that:
    V , grad R SO ( p ) f = V , R f , V T R SO ( p )
    and therefore:
    V , grad R SO ( p ) f R f = 0 , V T R SO ( p ) .
    This implies that the quantity grad R SO ( p ) f R f belongs to the normal space N R SO ( p ) , namely:
    grad R SO ( p ) f = R f + R S , w i t h S b e i n g p × p s y m m e t r i c .
    In order to determine the unknown matrix S, we may exploit the tangency condition, namely ( grad R SO ( p ) f ) R + R ( grad R SO ( p ) f ) = 0 p . Let us first pre-multiply both sides of Equation (258) by R , which gives:
    R grad R SO ( p ) f = R R f + S .
    The above equation, transposed hand by hand, gives:
    ( grad R SO ( p ) f ) R = R f R + S .
    Hand-by-hand summation of the last two equations gives:
    R R f + R f R = 2 S ,
    that is:
    S = 1 2 R R f + R f R .
    By plugging the expression (259) into the expression (258), we obtain the Riemannian gradient in the orthogonal group, corresponding to its canonical metric:
    grad R SO ( p ) f = 1 2 R f R R f R .
  • Stiefel manifold: Let us compute the expression of the Riemannian gradient of a regular function f : St ( n , p ) R at a point X St ( n , p ) corresponding to the Euclidean and the canonical metrics. Recall that the Riemannian gradient grad X St ( n , p ) f in a Stiefel manifold embedded in the Euclidean space R n × p is the unique matrix in T X St ( n , p ) such that:
    tr U X f = U , grad X f X , U T X St ( n , p ) .
    • Euclidean metric: The metric compatibility condition becomes tr ( U ( X f grad X St ( n , p ) f ) ) = 0 , which implies that grad X St ( n , p ) f = X f + X S , with S being p × p symmetric. Pre-multiplying both sides of this equation by the matrix X yields S + X X f = X grad X St ( n , p ) f . Transposing both hands of the above equation and summing hand by hand yields:
      S = 1 2 X f X + X X f + 1 2 ( grad X St ( n , p ) f ) X + X grad X St ( n , p ) f .
      From the condition grad X St ( n , p ) f T X St ( n , p ) , according to Equation (46), it follows that ( grad X St ( n , p ) f ) X + X grad X St ( n , p ) f = 0 . In conclusion, the sought Riemannian gradient reads:
      grad X St ( n , p ) f = X f 1 2 X X f X + X X f .
    • Canonical metric: The metric compatibility condition prescribes that:
      tr U X f I n 1 2 X X grad X St ( n , p ) f = 0 , U T X St ( n , p ) ,
      and therefore, invoking the tangency condition as well, it turns out that:
      X f I n 1 2 X X grad X St ( n , p ) f = X S , S = 1 2 X X f + X f X .
      Solving for the Riemannian gradient yields the final expression:
      grad X St ( n , p ) f = x f X X f X .
  • Real symplectic group: According to the two considered metrics, the expression of the gradient may be computed as outlined below.
    • KM metric: The structure of the pseudo-Riemannian gradient of a regular function f : Sp ( 2 n ) R associated with the Khvedelidze–Mladenov metric (115) is given by
      Q f = 1 2 Q J Q Q f J J Q f Q .
      In fact, the pseudo-Riemannian gradient of a regular function f : Sp ( 2 n ) R associated with the metric (115) is computed as the solution of the following system of equations:
      tr ( Q 1 grad Q f Q 1 V ) = tr ( Q f V ) , V T Q Sp ( 2 n ) , grad Q f J Q + Q J grad Q f = 0 .
      The first constraint ensures the compatibility of the pseudo-Riemannian gradient with the chosen metric, while the second constraint enforces the requirement for the pseudo-Riemannian gradient to lay on a specific tangent space. In particular, the metric compatibility condition may be recast as:
      tr ( V ( Q f Q grad Q f Q ) ) = 0 .
      The above condition implies that Q f Q grad Q f Q N Q Sp ( 2 n ) , and hence that Q f Q grad Q f Q = J Q H with H so ( 2 n ) . Therefore, the pseudo-Riemannian gradient of the criterion function f has the expression:
      Q f = Q Q f Q Q H J .
      In order to determine the value of the unknown skew-symmetric matrix H, it is sufficient to plug the expression (270) of the gradient within the tangency condition, which becomes:
      Q Q f Q Q H J J Q + Q J Q Q f Q Q H J = 0 .
      Solving for H gives:
      H = 1 2 J Q Q f + Q f Q J .
      Plugging the above expression into (270) gives the result (267).
    • Euclidean metric: The structure of the Riemannian gradient Q f of a regular function f : Sp ( 2 n ) R corresponding to the metric (116) reads:
      Q f = 1 2 Q J Q f Q J J Q Q f .
      In fact, the Riemannian gradient Q f must satisfy the conditions:
      Q f = Q J ( H Q 1 J Q f ) , H = 1 2 ( Q 1 J Q f + Q f Q J ) ,
      from which the expression of the Riemannian gradient associated with the metric (116) follows.
  • Manifold of symmetric, positive-definite matrices: The Riemannian gradient grad P f of the function f : S + ( p ) R may be calculated as the unique vector in T P S + ( p ) that satisfies the following equation:
    tr V P f = tr V P 1 ( grad P S + ( p ) f ) P 1 , V T P S + ( n ) .
    The solution of the above equation satisfies:
    P f P 1 ( grad P S + ( p ) f ) P 1 = 1 2 P f P f ,
    and hence the expression of the sought Riemannian gradient follows:
    grad P S + ( p ) f = 1 2 P P f + P f P .
  • Grassmann manifold: The Riemannian gradient grad [ X ] f may be calculated by its definition and reads as:
    [ X ] Gr ( n , p ) f = ( I n X X ) X f .
The above calculations may be conveniently unified by recalling the notion of metric kernel, which affords turning the metric compatibility condition into a projection. Let us recall that the inner product v , u x may be written as v , G x ( u ) A . On the basis of such an equality, the metric compatibility condition may be rewritten as
G x ( grad x M f ) , v A = x f , v A f o r e v e r y v T x M .
Equivalently, G x ( grad x M f ) x f , v A = 0 for every v T x M . By the definition of a normal space, such a condition may be rewritten as
G x ( grad x M f ) x f N x M .
In practical terms, the Gateaux derivative x f A may be uniquely decomposed into the sum of two terms, a tangent one (that it, the gradient) and a normal one. Upon discarding the normal component, the remainder is the sought gradient.
The latter observation may be expressed using the notion of orthogonal projection. Let us define an orthogonal projector  Π x : A T x M as an operator that maps an element of the ambient space to a tangent vector in a specific tangent space. Let us observe that
Π x ( a ) T x M a Π x ( a ) N x M .
The projection operator Π x : A T x M is defined by the two conditions:
  • Tangency: Π x ( a ) T x M for x M and a A .
  • Complementarity: v , a Π x ( a ) A = 0 for all v T x M .
Notice that if ν N x M then Π x ( ν ) = 0 , while if v T x M then Π x ( v ) = v .
Observation 5.
Orthogonal projection stems from a minimization problem, namely, given a point a A and a vector space V , the orthogonal projection of a on V is the shortest vector in V that approximates a, namely
min v V a v , a v A .
Applying orthogonal projection to both sides of the relationship (280) leads to
Π x ( G x ( grad x M f ) ) Π x ( x f ) = 0 ,
but since Π x ( G x ( grad x M f ) ) = G x ( grad x M f ) , we may write the explicit formula
grad x M f = G x 1 ( Π x ( x f ) ) .
The latter expression is quite suggestive, as it states that the Riemannian gradient of a function may be computed as the projection of its Gateaux derivative over a tangent space, to make it a tangent vector, further compensated by the metric kernel to take into account the effect of the metric (namely, to make it sure that the inner product of gradient with any tangent vector is independent of the metric). Notice that grad x M f , v x = G x 1 ( Π x ( x f ) ) , G x ( v ) A . Since G x is self-adjoint, such expression simplifies into Π x ( x f ) , v A which is, in fact, independent of the metric kernel.
Let us summarize a few expressions of interest of orthogonal projection.
  • Hypersphere: An expression of an orthogonal projector Π x : R n T x S p 1 , for x S p 1 , where the ambient space R p is endowed with a Euclidean metric, is:
    Π x ( a ) : = ( I p x x ) a .
    Notice that a Π x ( a ) = ( x a ) x is radial (that is, directed along x) and hence normal.
  • Stiefel manifold: It might be useful to define an orthogonal projection operator Π X : R n × p T X St ( n , p ) , for X St ( n , p ) . Let us assume the ambient space A : = R n × p be endowed with a Euclidean metric U , V A : = tr ( U V ) . In this case, the orthogonal projection takes the expression
    Π X ( U ) : = U 1 2 X ( X U + U X ) ,
  • Grassmann manifold: An expression of orthogonal projection Π [ X ] : R n × p T [ X ] Gr ( n , p ) , for [ X ] Gr ( n , p ) is
    Π [ X ] ( V ) : = ( I n X X ) V .
Let us apply the above considerations to a quadratic function on the hypersphere.
Example 25.
Let us consider the function f : S n 1 R defined as:
f ( x ) : = 1 2 x P 0 x ,
with P 0 S + ( n ) being constant. We assume the hypersphere to be embedded in the ambient space R n endowed with a Euclidean metric. The Euclidean gradient (and also the Gateaux derivative) of such a function reads as
f x = P 0 x .
Since, on the hypersphere, Π x ( a ) = ( I x x ) a , the Riemannian gradient of the function f takes the expression
grad x f = ( I n x x ) P 0 x = P 0 x ( x P 0 x ) x .
Let us verify that the expression (286) is indeed an orthogonal projection over the tangent bundle T St ( n , p ) .
Example 26.
The expression Π X ( U ) : = U 1 2 X ( X U + U X ) realizes an orthogonal projection over the tangent space T X St ( n , p ) . To prove such a statement, it suffices to show that Π X ( U ) T X St ( n , p ) and that U Π X ( U ) , V A = 0 for every V T X St ( n , p ) , which represent the conditions of tangency and of orthogonality, respectively. To what concerns the first property:
X Π X ( U ) + Π X ( U ) X = X U + U X 1 2 ( X X ) ( X U + U X ) 1 2 ( X U + U X ) ( X X ) = 0 .
To what concerns the second property, notice that
U Π X ( U ) , V A = tr ( V ( U U + 1 2 X ( X U + U X ) ) = 1 2 tr ( V X ( X U + U X ) ) .
Now observe that H : = V X is skew-symmetric, while S : = X U + U X is symmetric, for every X St ( n , p ) , U A and V T X St ( n , p ) . Since tr ( H S ) 0 , the statement holds true. ■
A further interesting counterexample clarifies the notion of orthogonality in orthogonal projection.
Example 27.
Let us consider the function Π X ( U ) : = U X U X which realizes a projection over the tangent bundle T St ( n , p ) but not an orthogonal one as long as the Euclidean metric is concerned. Showing that it indeed realizes a projection is fairly easy; in fact,
X Π X ( U ) + Π X ( U ) X = X U U X + U X X U = 0 .
Likewise, we can show that such a projection isnotorthogonal in the Euclidean metric; in fact,
U Π X ( U ) , V A = tr ( V ( U U + X U X ) ) = tr ( ( V X ) ( U X ) ) .
Since the matrix product U X is arbitrary, the latter expression is not guaranteed to be null. ■
As a last example, let us consider the case of the symplectic group endowed with its canonical metric. The orthogonal projection is not easy to express in closed form.
Example 28.
Let us consider the symplectic group Sp ( 2 n ) embedded into the ambient space A : = R 2 n × 2 n endowed with a Euclidean metric A , B A : = tr ( A B ) . By definition of orthogonal projection, Π Q ( A ) must be an element of the tangent space T Q Sp ( 2 n ) , while A Π Q ( A ) must be an element of the normal space N Q Sp ( 2 n ) . According to the characterization (49), it should then hold
A Π Q ( A ) = J Q H ,
namely Π Q ( A ) = A J Q H , with H skew-symmetric to be determined. From the tangency condition, it follows that
( A J Q H ) J Q + Q J ( A J Q H ) = 0 .
Through some manipulations, the above equation may be rewritten as
H Q Q + Q Q H = A J Q + Q J A .
All quantities in the above equation are known except for the unknown skew-symmetric matrix H. The above equation is an instance of the more general Sylvester equation (for a review, see, e.g., [55]).
According to [55], there exist a number of expressions to write the solution to such an equation. One solution is based on the Kronecker operations. In particular, in this case we can use the Kronecker sum I 2 n Q Q and the vectorization operator ‘ vec ’ to rewrite Equation (297) as
( I 2 n Q Q ) vec ( H ) = vec ( A J Q + Q J A ) .
The orthogonal projection operation may therefore be expressed as
Π Q ( A ) = A J Q vec 1 { ( I 2 n Q Q ) 1 vec ( A J Q + Q J A ) } .
A more elegant expression stems from an integral representation of the solution of the Sylvester Equation (297), namely
Π Q ( A ) = A + J Q 0 + Exp ( Q Q t ) ( A J Q + Q J A ) Exp ( Q Q t ) d t .
For the integral representation of the solution of a Sylvester equation, we refer the readers to the aforementioned review paper [55]. We mention that none of these solutions are efficient from a computational point of view.

9.2. Application of Riemannian Gradient to Optimization on Manifold

An important application of the Riemannian gradient, which is of specific interest in system theory and non-linear control, is gradient-based mathematical optimization, which we are going to briefly survey in the remainder of the present section.
Let us preliminarily observe that the relationship (250) that defines the notion of Riemannian gradient possess a further interpretation. Let us take a regular function f : M R , a regular curve γ : [ ϵ , ϵ ] M and a metric · , · in T M . The functions f and γ may be composed as f γ : [ ϵ , ϵ ] R . Therefore, from the relation (250), it follows that:
d d t f ( γ ( t ) ) = grad γ ( t ) f , γ ˙ ( t ) γ ( t )
along the curve γ . As a matter of fact, such a relation is a counterpart of the familiar rule of derivation for composite functions
d d t f ( g ( t ) ) = d f d g ( g ( t ) ) d g d t ( t ) .
To determine the maximum of a function on a manifold it is possible to make use of a gradient method. Such a method is based on the following important property: given a regular function f : M R , the solution x ( t ) of the following initial value problem
x ˙ ( t ) = ± grad x ( t ) f , x ( 0 ) = x 0 M ,
is such that
lim t + x ( t ) = x
where x denotes a point of local minimum of the function f near x 0 if one chooses the sign ‘−’ in front of the gradient; otherwise, x denotes a local maximum of the function f near x 0 if one chooses the sign ‘+’ in front of the gradient.
For a function with domain R n , the extremal points are among those that make its partial derivatives vanish to zero. A similar result holds for functions whose domain is a smooth manifold. Given a differentiable function f : M R , the extremal points are those x M for which:
grad x f = 0 .
This observation may be exploited
  • To determine those points x to which the solutions of the differential Equation (303) will tend toward;
  • Considering that Equation (305) is often non-linear and difficult to solve in closed form, while the differential Equation (303) may be solved (albeit approximately) by numerical recipes, the differential Equation (303) may be considered as a way to solve an equation such as (305).
On the basis of the relation (301), it is quite straightforward to show the following fundamental result.
Theorem 4.
The differential equation
x ˙ ( t ) = grad x ( t ) f , x ( 0 ) = x 0 M ,
generates trajectories toward points of the domain corresponding to ever-decreasing values of the function f.
Proof. 
Let us observe that
f ˙ = grad x f , x ˙ x = grad x f , grad x f x = grad x f x 2 0 ,
where equality holds if and only if grad x f = 0 . □
In a fully analogous way, upon choosing the sign + in front of the gradient, the corresponding differential equation generates trajectories tending toward ever-increasing values of the function f. Notice that the above facts are completely independent of the initial value x 0 and from the chosen metric.
In practical terms, it is safe to assume that the function f admits maxima and minima. For example, in the case of a continuous function f defined on a compact manifold, such as the hypersphere S n 1 , a theorem by Weierstrass ensures that f admits at least a maximum and a minimum. (In fact, the Weierstrass extreme value theorem holds for topological spaces which any manifold is an instance of.)
To summarize, the differential Equation (303) may be exploited to look for the extremal points of a function:
  • The dynamical system x ˙ ( t ) = + grad x ( t ) f , x ( 0 ) = x 0 M generates a trajectory in the state space M that tends toward a point of maximum of the function f located near the initial state x 0 ,
  • Conversely, the dynamical system x ˙ ( t ) = grad x ( t ) f , x ( 0 ) = x 0 M generates a trajectory in the space M that tends toward a point of minimum of the function f located near the initial state x 0 .
The locality property of such a gradient-based extremum search is due to the monotonicity property (307), which essentially shows that the trajectory of such systems fall in the basin of attraction of the extremal point that the initial condition also belongs to. In practice, the choice of an initial condition (also termed, in this context, initial guess), is a sensitive step that determines the success or the failure of extremal-point searching. In fact, a function f might possess several extremal points (namely, local maxima and minima). It pays to keep in mind that the extremal-point searching method based on the first-order dynamical systems (303) is able to determine only an extremal point at a time and, in particular, only the one nearest to the initial guess x 0 . It is therefore important to put into effect some pre-estimation technique to initialize the search procedure correctly.
From an algorithmic point of view, as opposed to what is required in numerical simulation where precision is a common demand, in the case of optimization by a gradient method, only the last point of the trajectory, namely x , needs to be approximated with good precision, and hence the employed numerical method does not need to be of a high order.
Example 29.
Let us consider Oja’s equation [56,57]
x ˙ ( t ) = ( I n x ( t ) x ( t ) ) P 0 x ( t ) , x ( 0 ) = x 0 S n 1 ,
with P 0 S + ( n ) . As we have seen in the previous example, the right-hand side of Oja’s equation coincides with the Riemannian gradient of the function (often termed ‘Rayleigh quotient’) f ( x ) : = 1 2 x P 0 x over the unit hypersphere endowed with its canonical metric. Such a function is quadratic, continuous and differentiable and is defined over a compact space; hence, it admits a maximum over the space S n 1 . Indeed, it admits n local maxima. The differential Equation (308) hence affords determining one of its local maxima depending on the initial condition.
The extremal points are those that make the gradient grad x f = P 0 x ( x P 0 x ) x vanish to zero, namely the solutions x S n 1 to the equation
P 0 x = ( x P 0 x ) x .
Clearly, the solutions x coincide with the eigenvectors of the symmetric, positive-definite matrix P 0 . Let us observe that the eigenvalue associated with the eigenvector x is λ : = x P 0 x ; hence, f ( x ) = 1 2 λ . Since the dynamical system (308) looks for the maximal value of the function f, Oja’s equation generates a trajectory that tends toward the eigenvector of P 0 corresponding to its maximal eigenvalue. ■
The differential equations of the kind
x ˙ ( t ) = grad x ( t ) f ,
represent an instance of dynamical systems of the first order, on a manifold, of the gradient type. (Notice that to switch the sign in front of the gradient, it suffices to take the function f instead of f. Likewise, to rescale the gradient of a factor η > 0 , it suffices to take η f instead of f.)

9.3. A Golden Gradient Rule: Gradient of Squared Distance

A golden calculation rule involving the notion of Riemannian gradient and that of Riemannian distance is as follows:
grad x d 2 ( x , y ) = 2 log x y .
The formal proof of this result which, of course, holds under technical conditions, traces back to [53]. It is instructive to verify this property in a couple of cases.
Example 30.
Let us verify the calculation rule (311) for the familiar case that M = R p . We have already shown that d 2 ( x , y ) = log x y 2 . In the case of the hypercube R p , endowed with a Euclidean metric, it holds that log x y = y x ; therefore, d 2 ( x , y ) = x y 2 , the familiar Euclidean distance in R p . Now, let us observe that
x y 2 x = 2 ( x y ) ,
and henceforth
d 2 ( x , y ) x = 2 ( y x ) = 2 log x y .
In addition, we can verify such a property for the hypersphere S p 1 . Let us recall that d 2 ( x , y ) = arccos 2 ( x y ) ; therefore, it holds that
d 2 ( x , y ) x = 2 d ( x , y ) 1 1 ( x y ) 2 y .
By the calculation rule for the gradient on the unit hypersphere, we obtain
grad x d 2 ( x , y ) = 2 d ( x , y ) sin d ( x , y ) ( I p x x ) y = 2 ( I p x x ) y sinc d ( x , y )
which coincides precisely with 2 log x y on the unit hypersphere. ■
For those who are mathematically minded, it is interesting to read a sketch of proof of the golden rule (311) based on properties that we have already surveyed.
Theorem 5.
Let us consider a Riemannian manifold ( M , · , · ) embedded in an ambient space ( A , · , · A ) and a pair of points x , y M such that d ( x , y ) and log x y are well defined. The Riemannian gradient of the squared distance d 2 ( x , y ) with respect to the variable x equals 2 log x y .
Proof. 
Let us define the following functions:
  • An arbitrary sufficiently smooth curve α x ( t ) such that α x ( 0 ) = x , while α ˙ x ( t ) is arbitrary;
  • The fundamental form F ( x , v ) : = v , v x ;
  • A geodesic curve c ( t , s ) connecting the point y to the point α x ( t ) , emanating from the former, with parameter s [ 0 , 1 ] ;
  • The partial derivatives c ˙ ( t , s ) : = c t ( t , s ) and c ( t , s ) : = c s ( t , s ) .
In addition, notice that the following properties hold:
(P1)
c ( t , 0 ) = y , therefore c ˙ ( t , 0 ) = 0 ;
(P2)
c ( t , 1 ) = α x ( t ) , therefore c ˙ ( t , 1 ) = α ˙ x ( t ) ;
(P3)
c ( t , 1 ) = log α x ( t ) y . (In fact, notice that if γ y x : [ 0 , 1 ] M denotes a geodesic from x to y then it holds that γ ˙ x y ( 0 ) = log y x and γ y x ( s ) = γ x y ( 1 s ) ; therefore, γ ˙ y x ( 1 ) = γ ˙ x y ( 0 ) = log y x .)
Let us recall that the squared Riemannian distance between two points may be written in terms of the fundamental form, namely
d 2 ( α x ( t ) , y ) = F ( c ( t , s ) , c ( t , s ) ) ,
where the right-hand side of the above equation is independent of s, as we have shown in Section 8. Let us integrate both sides of the above equation with respect to s in the interval [ 0 , 1 ] . This gives:
d 2 ( α x ( t ) , y ) = 0 1 F ( c ( t , s ) , c ( t , s ) ) d s .
Now, let us compute the derivative of both sides with respect to the parameter t:
d d t d 2 ( α x ( t ) , y ) = 0 1 d d t F ( c ( t , s ) , c ( t , s ) ) d s = 0 1 F c ( t , s ) , c t ( t , s ) A d s + 0 1 F c ( t , s ) , c t ( t , s ) A d s .
The second integral may be rewritten through the rule of integration by parts upon observing that c t = c ˙ s (Schwarz) and recalling that F ( x , v ) v = 2 G x ( v ) , where G denotes the metric kernel associated with the metrics in M and A . The application of such rule gives:
0 1 F c ( t , s ) , c t ( t , s ) A d s = 2 G x ( c ( t , s ) ) , c ˙ ( t , s ) A s = 0 1 0 1 d d t F c ( t , s ) , c ˙ ( t , s ) A d s .
Putting the pieces together gives:
d d t d 2 ( α x ( t ) , y ) = 0 1 F c ( t , s ) d d t F c ( t , s ) , c ˙ ( t , s ) A d s + 2 c ( t , s ) , c ˙ ( t , s ) c ( t , s ) s = 0 1
Since c ( · , s ) is a geodesic, the first integral is null due to the differential inclusion (153); therefore, we rest with
d d t d 2 ( α x ( t ) , y ) = 2 c ( t , 1 ) , c ˙ ( t , 1 ) c ( t , 1 ) 2 c ( t , 0 ) , c ˙ ( t , 0 ) c ( t , 0 ) .
Now, due to property P 1 , the second term on the right-hand side of the above equality is null; hence, thanks to properties P 2 P 3 , the above equality simplifies to
d d t d 2 ( α x ( t ) , y ) = 2 log α x ( t ) y , α ˙ x ( t ) α x ( t ) .
Setting t = 0 and recalling the definition of gradient (249) proves the golden rule. □
Let us emphasize how useful such a result is, since in non-linear control a number of functions to differentiate are based on distances. Let us apply the golden calculation rule to a non-linear control problem known as consensus optimization.
Example 31.
A category of dynamical systems of gradient type on manifolds are those associated withconsensus optimizationamong a set of agents [58,59]. Let us consider, as an example, a dynamical system that affords reaching attitudinal consensus in a fleet of flying objects (such as drones).
The attitude of a flying agent (namely, its orientation in space) may be represented by a rotation matrix R SO ( 3 ) that summarizes the three attitudinal coordinatesroll,pitchandyaw. A fleet of N flying agents may be described as a set of N rotation matrices R i SO ( 3 ) with i = 1 , 2 , 3 , , N .
The attitude of each agent in the fleet is then described by a time-function R i ( t ) . Consensus optimization may be formulated as follows:
  • Given a metric for the manifold SO ( 3 ) , it is possible to define a distance function d : SO ( 3 ) × SO ( 3 ) R 0 + between any pair of attitude matrices in SO ( 3 ) ;
  • A hierarchy is established among the agents in a fleet through a set of weights w i j 0 ;
  • Then, consensus optimization consists of determining an evolution law for each agent to minimize the distance between any pair of attitude matrices weighted according to the assigned hierarchy, namely, to minimize the function
    f i : = 1 2 j = 1 N w i j d 2 ( R i , R j ) , w i t h i = 1 , 2 , 3 , , N .
    Notice that d ( R i , R i ) = 0 ; hence, the values assigned to the diagonal coefficients w i i are unimportant.
On the basis of the calculation rule (311), it turns out that
grad R i f i = j = 1 N w i j log R i R j .
The set of gradient-type control systems to achieve consensus optimization within the fleet reads:
R ˙ i ( t ) = j = 1 N w i j log R i ( t ) R j ( t ) , R i ( 0 ) = R i 0 SO ( 3 ) ,
with i = 1 , 2 , 3 , , N and t 0 . (Notice that log R R = 0 .) ■

9.4. Riemannian Gradient in Coordinates*

Let f : M R denote a smooth manifold-to-real function, whose gradient at a point x M is sought. Let us fix a coordinate system on a local chart that includes the point x whose local coordinates are x 1 , x 2 , x p .
In order to determine the gradient grad x f , let us make use of the metric compatibility condition, recalling that
  • The Euclidean inner product reads x f , v = f x k v k .
  • The inner product reads grad x f , v x = g i k ( grad x f ) i v k .
The metric compatibility condition requires that
f x x k v k = g i k ( grad x f ) i v k for   every   choice   of   v ,
hence, g i k ( grad x f ) i = f x x k . Multiplying both sides by g h k one obtains g h k g i k ( grad x f ) i = g h k f x x k . Since g h k g i k = δ i h , we obtain
( grad x f ) h = g h k f x x k .
The Riemannian gradient in intrinsic coordinates is therefore expressed as
grad x f = g h k f x x k h x .
In manifold calculus, the Riemannian gradient has yet another interpretation in relation to the differential of a manifold-to-real function and to so-called musical isomorphisms. In fact, there exist canonical operators to convert contravariant components to covariant (also referred to as lowering an index) or vice versa (raising an index).
Given a regular function f : M R , its differential d f : T M T * M is expressed by:
d f x ( v ) = grad x f , v x , v T x M ,
where grad x f : M T x M denotes its Riemannian gradient. Namely, the differential represents the linear part of the change of a function when moving away from a point x in the direction v. In local coordinates:
d f x ( v ) = g i j ( x ) ( x f ) i v j .
Since the differential does not depend on the choice of coordinates, it must hold that d f x ( v ) = ( i f ) x v i . It follows the metric-compatibility condition g i j ( x ) ( x f ) j = ( i f ) x . Such expressions may be interpreted through ‘musical’ isomorphisms:
  • Sharp isomorphism ( ): The ‘sharp’ isomorphism : T * M T M takes a cotangent vector and returns a tangent vector. Namely, given a cotangent vector ω = ω i d x i , the sharp isomorphism acts like v : = ω = g i j ω i j . For example, in the case of gradient, one may say that grad x f = ( d f x ) .
  • Flat isomorphism ( ): The ‘flat’ isomorphism : T M T * M takes a tangent vector and returns a cotangent vector. Namely, given a tangent vector v = v i i x , the flat isomorphism acts like ω : = v = g i j v i d x j . Then, the differential is the dual of gradient via a flat isomorphism.

10. Parallelism and Parallel Transport along a Curve

Parallel transport along a curve on a manifold is a fundamental concept that emerges from the curved nature of manifolds and is deeply interwoven in the fabric of manifold calculus.

10.1. Properties and Definition of Parallel Transport

Given a curve on a Riemannian manifold M embedded into a Euclidean space A , a point x M on the curve and a tangent vector v T x M to the curve, if such a vector is moved to another point on the curve by ordinary rigid translation (which is available in A and may be referred to as ‘parallel translation’), in general it will not be tangent to the curve anymore. This observation suggests that it is necessary to define a notion of transport that is compatible with the structure of the manifold.
Another reason to motivate the notion of parallel transport is the lack of existence of ‘uniform’ vector fields on curved manifolds. On a flat space, such as R p , a uniform vector field u : R p R p is a field x u x such that u x is constant (same direction, length and orientation) for every x R p . On a curved manifold embedded in an ambient space A , however, it is unlikely that a tangent vector field may take the same value in two points, because tangency at one point would most likely imply lack of tangency in another point. The closest notion to uniformity would be parallelism: Informally speaking, a vector field is said to be parallel along a curve if it keeps the same orientation with respect to such a curve’s own velocity field. Notice that this property is expressed in quite vague terms: such vagueness is indeed intended and implies that there exist infinitely many ways to interpret the notion of parallel transport.
Example 32.
Let M denote a Riemannian manifold and x w x a tangent vector field on the tangent bundle T M ; namely, the function w assigns a tangent vector to each point x on the manifold. In the familiar case that M = R 3 , there exists a notion ofuniformvector field; namely, a vector field may take the same amplitude and direction in any point of the base-space M . This is the case, for example, of an electric field within an ideal capacitor or an induction field within an ideal electrical coil. A distinguishing feature of a uniform vector field in R 3 is that its directional derivative is zero everywhere; namely, given any point x R 3 and any direction v T x R 3 , it holds that
lim h 0 w x + h v w x h = 0 ,
where the left-hand side represents the directional derivative of the vector field w Γ ( M ) along a direction v at the point x.
On a curved manifold M , uniform vector fields hardly exist, as the rigid translation of a tangent vector to a different point of a manifold will most likely result in a non-tangent vector. The closest notion to uniformity that may be recovered on a curved manifold is that of parallelism. A parallel vector field needs to satisfy a generalized version of the condition (331) (which we shall not see in this survey since it requires covariant derivation, which will be covered in a separate survey). ■
The notion of parallelism of a vector field gives rise to an important canonical operator in manifold calculus, namely, parallel transport along a curve. Parallel transport allows one to, e.g., compare vectors belonging to different tangent spaces. On a smooth Riemannian manifold M , fix a smooth curve γ : [ 0 , 1 ] M . The parallel transport map associated with the curve γ is denoted as P γ s t : T γ ( s ) M T γ ( t ) M , which is a linear map for every s , t [ 0 , 1 ] , namely
P γ s t ( α w + β v ) = α P γ s t ( w ) + β P γ s t ( v )
for every α , β R and u , v T γ ( s ) M . The parallel transport map depends smoothly on its arguments and is such that P γ s s coincides with the identity map and P γ u t P γ s u = P γ s t for every s , u , t [ 0 , 1 ] . (Notice that the last property holds on the same curve only, while it does not hold for different arcs.)
In most cases of importance in system theory and control, the main interest lies in the parallel transport of a tangent vector along a geodesic curve on a Riemannian manifold, rather than along a generic smooth curve. Parallel transport along a geodesic arc may be conjugated in at least two ways:
  • Parallel transport along a geodesic arc joining two given points: Given two (sufficiently close) points x , y M , and a tangent vector w T x M , the parallel transport of w from x to y along the geodesic arc connecting x to y is denoted by P x y ( w ) , since this notation shows all relevant information. In fact, the notation P x y for the parallel transport operator is a shortened version of P γ x y 0 1 : T x M T y M , where γ x y : [ 0 , 1 ] M would denote the geodesic arc such that γ x y ( 0 ) = x and γ x y ( 1 ) = y .
  • Parallel transport along a geodesic specified by a point and a tangent direction: Given a point x M , and a tangent direction v T x M , the parallel transport of a vector w T x M along the geodesic arc departing from the point x toward the direction v is denoted by P γ x , v 0 t ( w ) , where γ x , v : [ ϵ , ϵ ] M , with ϵ > 0 and t [ ϵ , ϵ ] , would denote the geodesic curve such that γ x , v ( 0 ) = x and γ ˙ x , v ( 0 ) = v .
There exist other special cases of interest in control and automation. One such special case supplies a partial answer to the problem of effecting parallel transport on a manifold for which the structure of the parallel transport operator is unknown. Another special case is parallel transport along a closed loop, which may arise when dealing with periodic trajectories or self-intersecting trajectories.
  • Self parallel transport along a geodesic arc: A distinguishing feature of a geodesic arc is that it parallel-transports its own initial slope. On a setting where the parallel transport operator is unknown in closed form, there exists no known way to transport a given vector w along a geodesic curve γ x , v . However, it is possible to parallel-transport a given vector v along a geodesic γ x , v , namely to compute P γ x , v 0 t ( v ) : it coincides exactly with γ ˙ x , v ( t ) . Such a numerical trick was invoked, for instance, in [39], Subsection III.A.
  • Parallel transport along a closed loop: Parallel transport of a vector along a (piece-wise continuous) loop is denoted as P : T x M T x M , where x is termed base of the loop. In general, P ( v ) v , for v T x M . This phenomenon is referred to as anholonomy. Whenever P realizes an isometry, since P ( v ) x = v x , the operator P changes only the orientation of v. For example, if M R 3 , then we may say that P ( v ) is a rotated version of a tangent vector v laying in the same tangent space, namely, P may be represented as an element of the orthogonal group O ( 3 ) . Intuitively, holonomy is specifically related to curved spaces, and therefore holonomy must be related to curvature. This conjecture is made precise by the Ambrose–Singer theorem [60].
In the following, we shall go through a detailed derivation of the notion of parallel transport along a curve on the tangent bundle of a manifold embedded in an ambient space. In general, the result of parallel transport depends on the curve; moreover, parallel transport may be conceived in a way that preserves the angle between transported vectors but not their lengths, or to preserve both.

10.2. Coordinate-Free Derivation of Parallel Transport

Let us develop a notion of parallel transport P such that, in every point of the curve γ , it holds that P γ 0 t ( w ) T γ M . In addition, a fundamental requirement that parallel transport should fulfill is that, given two tangent vectors u , w T x M that form an angle α , parallel transport should preserve the value of such angle along the whole curve, namely the angle between P γ 0 t ( u ) and P γ 0 t ( w ) . Let us recall that the angle α between two vectors u , w in T x M is defined by
cos ( α ) : = u , w x u x w x .
We may thus require parallel transport to fulfill:
  • Tangency condition: for every t [ 0 , 1 ] it should hold that P γ 0 t ( w ) T γ ( t ) M ,
  • Conformal isometry condition: for every value of the parameter t [ 0 , 1 ] it should hold that P γ 0 t ( u ) , P γ 0 t ( w ) γ ( t ) keeps constant to u , w x along the curve.
Let us verify that the second condition implies both conformality and isometry. To what concerns isometry, let us show that taking w = u gives:
P γ 0 t ( u ) , P γ 0 t ( u ) γ ( t ) = P γ 0 t ( u ) γ ( t ) 2 ,
from which it follows that P γ 0 t ( u ) γ ( t ) keeps constant along the curve γ to u x . It is straightforward to prove conformality; in fact,
P γ 0 t ( u ) , P γ 0 t ( w ) γ ( t ) P γ 0 t ( u ) γ ( t ) P γ 0 t ( w ) γ ( t ) = u , w x u x w x , t [ 0 , 1 ] .
(Let us recall that a linear transformation that preserves angles is termed conformal.) It is important to underline that the above conditions do not define parallel transport univocally, but they represent minimal requirements for an operator to be qualified as parallel transport. In other words, parallel transport may be defined in several ways as long as it meets the above conditions, which turn out to be of great importance in non-linear control.
To ease notation, let us define the following fields:
  • u ¯ ( t ) : = P γ 0 t ( u ) : a vector field ( γ ( t ) , u ¯ ( t ) ) that represents the evolution, over the curve γ , of the vector u that is tangent to the manifold at γ ( 0 ) and after transport it will be tangent at γ ( 1 ) ,
  • w ¯ ( t ) : = P γ ( t ) 0 t ( w ) : analogous to u ¯ ( t ) , both are transport fields,
  • Q ( t ) : = u ¯ ( t ) , w ¯ ( t ) γ ( t ) : a scalar field that represents the inner product between the above two transport fields along the curve γ .
For a manifold M embedded in an ambient space A endowed with a metric · , · A , the condition of conformality and isometry may be written in a more detailed way. Let us recall from Section 7 that the inner product of two tangent vectors may be written as u , w x = u , G x ( w ) A , where the metric kernel has key properties, namely G x ( v ) is linear in v, G x is self-adjoint (namely, u , G x ( w ) A = G x ( u ) , w A ), G x is invertible and its inverse is denoted as G x 1 , and G x maps T x M to itself.
In order to perform the following calculations seamlessly, it pays to highlight a few calculation rules about the metric kernel G and the fundamental form F :
  • Taylor series expansion of the kernel G: Given G x ( v ) with x M , v T x M , let us write
    G γ x , w ( t ) ( v ) = G x ( v ) + G x ( v , t w ) + O ( t 2 ) ,
    where γ x , w : [ ϵ , ϵ ] M denotes any smooth curve such that γ x , w ( 0 ) = x and γ ˙ x , w ( 0 ) = w , with w T x M denoting a direction along which a variation of the metric kernel is sought. The quantity G x ( v , w ) represents a first-order derivative of the kernel G x calculated with respect to the variable x. (Notation-wise, the ‘bullet’ derivative is a convenient way to denote a derivative that cannot be expressed in more specific terms. For example, if f , g are real functions of a real variable, one might denote d d x f ( g ( x ) ) = f ˙ ( g ( x ) ) g ˙ ( x ) as f ( g ( x ) , g ˙ ( x ) ) .) The formal definition goes like
    G x ( v , w ) : = d d t G γ x , w ( t ) ( v ) t = 0 ,
    where again γ x , w : [ ϵ , ϵ ] M denotes any smooth curve such that γ x , w ( 0 ) = x and γ ˙ x , w ( 0 ) = w . The function G x ( v , w ) is linear in the arguments v and w and, in general, non-linear in x. Notice that the extendability property of G x from T x M to A holds also for the first argument of the derivative G x .
  • Commutativity with the inner product in the ambient space: In the expression u , G x ( v , w ) A it is allowed to swap the arguments u and v. In fact, notice that
    u , G x ( v , w ) A = u , d d t G γ x , w ( t ) ( v ) t = 0 A = d d t u , G γ x , w ( t ) ( v ) A t = 0 = d d t G γ x , w ( t ) ( u ) , v A t = 0 = d d t G γ x , w ( t ) ( u ) t = 0 , v A = v , G x ( u , w ) A .
    This property of G x holds even if the first argument of the derivative belongs to A .
  • Partial derivatives of the fundamental form: Let us recall the definition of the fundamental form F ( x , v ) : = v , G x ( v ) A . Its partial derivatives read
    F x ( x , v ) = G x ( v , v ) , F v ( x , v ) = 2 G x ( v ) .
Let us see a few examples to clarify the above definitions and properties.
Example 33.
Let us start considering the case of the manifold S + ( n ) endowed with its canonical metric, embedded in the ambient space A : = R n × n endowed with a Euclidean metric. In this setting, we have seen that G P ( V ) = P 1 V P 1 . By definition, its ‘bullet derivative’ reads
G P ( V , W ) = d d t ( γ 1 V γ 1 ) t = 0 ,
where γ : [ ϵ , ϵ ] S + ( n ) denotes any smooth curve such that γ ( 0 ) = P S + ( n ) and γ ˙ ( 0 ) = W T P S + ( n ) . By using the noticeable matrix-flow derivation rule (104), we obtain
d d t ( γ 1 V γ 1 ) = ( γ 1 γ ˙ γ 1 ) V γ 1 + γ 1 V ( γ 1 γ ˙ γ 1 ) .
Setting t = 0 leads to the sought expression
G P ( V , W ) = P 1 W P 1 V P 1 P 1 V P 1 W P 1 .
Let us verify the property of commutativity with the inner product. We may calculate that
G P ( V , W ) , U A = tr ( P 1 W P 1 V P 1 U ) tr ( P 1 V P 1 W P 1 U ) , G P ( U , W ) , V A = tr ( P 1 U P 1 W P 1 V ) tr ( P 1 W P 1 U P 1 V ) .
The two quantities may be proven equal by the cyclic permutation property of the trace.
Another example that we are surveying concerns the manifold Sp ( 2 n ) endowed with its canonical metric again embedded in the ambient space A : = R n × n endowed with a Euclidean metric. In this setting, we have seen that G Q ( V ) = Q Q 1 V . Its ‘bullet derivative’ reads
G Q ( V , W ) = d d t ( γ γ 1 V ) t = 0 ,
where γ : [ ϵ , ϵ ] Sp ( 2 n ) denotes any smooth curve such that γ ( 0 ) = Q Sp ( 2 n ) and γ ˙ ( 0 ) = W T Q Sp ( 2 n ) .
By invoking the identity (104) once again, we obtain
d d t ( γ γ 1 ) = ( γ 1 γ ˙ γ 1 ) γ 1 + γ ( γ 1 γ ˙ γ 1 ) .
Setting t = 0 leads to the expression
G Q ( V , W ) = Q ( W Q + Q 1 W ) Q 1 V .
Let us verify the commutativity property with the inner product. We may calculate that
U , G P ( V , W ) A = tr ( U Q ( W Q + Q 1 W ) Q 1 V ) , V , G P ( U , W ) A = tr ( V Q ( W Q + Q 1 W ) Q 1 U ) .
These two quantities may be proven equal by transposing the arguments of the traces. ■
The condition of conformality and isometry may be written very compactly as Q ˙ = 0 . By multivariable calculus and the above definition and properties, we obtain:
Q ˙ = u ¯ ˙ , G γ ( w ¯ ) A + u ¯ , G γ ( w ¯ , γ ˙ ) A + u ¯ , G γ ( w ¯ ˙ ) A .
(This is one situation in which we need to invoke the ‘extendability’ property of the metric kernel; in fact, notice that w ¯ ˙ does not necessarily belong to T γ M .) In the second term on the right-hand side, the vector fields u ¯ and w ¯ may be swapped, and hence we may write:
Q ˙ = u ¯ ˙ , G γ ( w ¯ ) A + w ¯ , G γ ( u ¯ , γ ˙ ) A + u ¯ , G γ ( w ¯ ˙ ) A .
Since the above development is still too general to lead to a computable notion of parallel transport, let us refer parallelism to the velocity field associated with the curve γ , namely, let us set w ¯ : = γ ˙ . This means that the parallel transport operation will keep fixed the angle between the transported vector and the velocity vector associated with the curve. The expression of Q ˙ becomes then:
Q ˙ = u ¯ ˙ , G γ ( γ ˙ ) A + γ ˙ , G γ ( u ¯ , γ ˙ ) A + u ¯ , G γ ( γ ¨ ) A .
In the above relation, all quantities are known except for the transport field u ¯ of which we are seeking a temporal evolution law.
A special case, of sure interest in applications since it dispenses us from choosing a curve γ , is parallel transport along a geodesic line. Let thus γ denote a geodesic and let us recall that, in this case, the expression G γ ( γ ¨ ) may be rewritten in terms of the velocity γ ˙ through the differential inclusion that characterizes a geodesic line. Let us then revamp the geodesic equation with the aim of writing it in a Christoffel normal form. Recall that, given the fundamental form for a line F ( γ , γ ˙ ) = γ ˙ , G γ ( γ ˙ ) A , whenever it is a geodesic line it holds that:
F γ d d t F γ ˙ N γ M .
According to the relationships in (339)
F γ = G γ ( γ ˙ , γ ˙ ) , F γ ˙ = 2 G γ ( γ ˙ ) ,
and then
d d t F γ ˙ = 2 G γ ( γ ˙ , γ ˙ ) + 2 G γ ( γ ¨ ) ,
therefore the differential inclusion (351) that defines a geodesic line becomes
G γ ( γ ˙ , γ ˙ ) 2 G γ ( γ ¨ ) = B γ ( γ ˙ , γ ˙ ) ,
where B x : ( T x M ) 2 N x M is a bilinear function of its vector arguments and returns an element of the normal space N x M . Ultimately, we obtained
G γ ( γ ¨ ) = 1 2 G γ ( γ ˙ , γ ˙ ) 1 2 B γ ( γ ˙ , γ ˙ ) .
This same expression may be rewritten compactly in one of the following normal Christoffel forms:
G γ ( γ ¨ ) + Γ ¯ γ ( γ ˙ , γ ˙ ) = 0 , o r γ ¨ + Γ ¯ ¯ γ ( γ ˙ , γ ˙ ) = 0 ,
where the function Γ ¯ x is termed Christoffel form of the 1 st kind, while the quantity Γ ¯ ¯ x : = G x 1 Γ ¯ x is termed Christoffel form of the 2 nd kind. Notice that only the restricted Christoffel form Γ ¯ x ( v , v ) with x M and v T x M has been defined so far, namely
Γ ¯ x ( v , v ) : = 1 2 G x ( v , v ) + 1 2 B x ( v , v ) .
Since both terms on the right-hand side are bilinear in their vector arguments, the Christoffel form is quadratic. It is now necessary to define the full Christoffel form Γ x ( u , w ) , with x M and u , w T x M on the basis of its restricted version. Such a result may be achieved through the following polarization formula:
Γ ¯ x ( u , w ) : = 1 4 Γ ¯ x ( u + w , u + w ) Γ ¯ x ( u w , u w ) ,
which stems from the requirements that the full Christoffel form is bilinear and symmetric in its vector argument, namely Γ ¯ x ( u , w ) = Γ ¯ x ( w , u ) .
Employing the normal Christoffel form in the latest expression of Q ˙ gives:
Q ˙ = u ¯ ˙ , G γ ( γ ˙ ) A + γ ˙ , G γ ( u ¯ , γ ˙ ) A u ¯ , Γ ¯ γ ( γ ˙ , γ ˙ ) A .
Since Γ ¯ is derived by polarization from a restricted bilinear function, it is self-adjoint in both its arguments; hence, it commutes with the inner product · , · A . Therefore, in the third term on the right hand side it is possible to swap the transport field u ¯ with any of the two fields γ ˙ , which leads to:
Q ˙ = u ¯ ˙ , G γ ( γ ˙ ) A + γ ˙ , G γ ( u ¯ , γ ˙ ) A γ ˙ , Γ ¯ γ ( u ¯ , γ ˙ ) A = u ¯ ˙ , G γ ( γ ˙ ) A + γ ˙ , Γ ¯ γ ( u ¯ , γ ˙ ) A
where we have used the fact that Γ ¯ and 1 2 G differ by a purely normal component.
In order to make the above addenda uniform to one another, let us introduce in the above expression twice the identity G γ G γ 1 ( = id γ ) obtaining:
Q ˙ = G γ ( γ ˙ ) , u ¯ ˙ A + G γ ( γ ˙ ) , G γ 1 Γ ¯ γ ( u ¯ , γ ˙ ) A .
Now it is possible to gather the terms in G γ ( γ ˙ ) and to write the above sum as a single inner product:
Q ˙ = G γ ( γ ˙ ) , u ¯ ˙ + G γ 1 Γ ¯ γ ( u ¯ , γ ˙ ) A = 0 .
Ultimately, the last relationship still expresses the basic property of parallel transport to realize a conformal isometry, which still leaves much room open for a precise definition of parallel transport. The condition Q ˙ = 0 alone would give rise to infinitely many equations to compute a transport field, and hence parallel transport. Among these infinitely many, one arises simply by requiring the right-hand side of the above inner product to vanish to zero. Namely:
u ¯ ˙ + Γ ¯ ¯ γ ( u ¯ , γ ˙ ) = 0 .
The solution of such equation supplies an expression for the transport field which, in turn, supplies a notion of parallel transport along a geodesic line, namely
P x y ( u ) : = u ¯ ( 1 ) ,
where u ¯ ( t ) denotes the solution of the differential Equation (363) with the initial condition u ¯ ( 0 ) = u . Such a differential equation in A is first-order and linear in the unknown transport field. Since the set of solutions to a linear first-order differential equation is a linear space, parallel transport is a linear isomorphism of the tangent bundle T M .
Example 34.
As anticipated, a noteworthy property of any geodesic line is that it self-transports its own velocity along. In other words, given a geodesic γ x y : [ 0 , 1 ] M that connects two points x , y M , the following function turns out to realize parallel transport:
P γ x y 0 t ( γ ˙ x y ( 0 ) ) : = γ ˙ x y ( t ) .
Let us verify that it meets the fundamental properties any parallel transport should meet:
  • Tangency: for every t [ 0 , 1 ] it holds that P γ x y 0 t ( γ ˙ x y ( 0 ) ) T γ x y ( t ) M ; in fact, P γ x y 0 t ( γ ˙ x y ( 0 ) ) = γ ˙ x y ( t ) is tangent to the curve at every point γ x y ( t ) .
  • Conformal isometry: for every t [ 0 , 1 ] it holds that P γ x y 0 t ( γ ˙ x y ( 0 ) ) , P γ x y 0 t ( γ ˙ x y ( 0 ) ) γ x y ( t ) = γ ˙ x y ( t ) γ x y ( t ) 2 which is constant along a geodesic,
as claimed. ■
As a special case, it is worth considering a manifold M endowed with a uniform metric. In this instance, the metric kernel G x is independent of the point x, and therefore it holds that G 0 along every (geodesic) line. The transport field equation hence simplifies into
u ¯ ˙ = 1 2 G 1 B γ ( u ¯ , γ ˙ ) .
In the ever simpler instance where the metric is Euclidean, the metric kernel coincides with the identity (namely G x id x ), and hence it turns out that
u ¯ ˙ = 1 2 B γ ( u ¯ , γ ˙ ) .
Recalling that the function B maps the double tangent bundle ( T M ) 2 to the normal bundle N M , it is quite straightforward to figure out a practical interpretation of the differential Equation (366). It basically prescribes that, when traveling from a point over the curve γ to another infinitesimally close point, the transport field looses its normal component to stay tangent.
The following two examples aim at clarifying the above theoretical developments.
Example 35.
Let us apply the theoretical development just analyzed to determine a transport field equation on a hypersphere S n 1 A with A : = R n endowed with the metric u , w x : = u , w A = u w .
Let us start by determining the Christoffel form (of the first or second kind is unessential, in this case, because the metric kernel coincides with the identity). The easiest way to determine Christoffel forms is via a geodesic equation. The fundamental form, in this case, reads F : = γ ˙ γ ˙ . From the fundamental form we find that
F γ = 0   a n d   F γ ˙ = 2 γ ˙ ,
therefore
d d t F γ ˙ = 2 γ ¨ .
The geodesic equation then reads γ ¨ = λ γ ; hence, λ = γ ¨ γ . From the tangency of the velocity γ ˙ , it follows that:
γ ˙ γ = 0 d / d t γ ¨ γ + γ ˙ γ ˙ = 0 ,
and henceforth γ ¨ γ = γ ˙ 2 . As a consequence, λ = γ ˙ 2 and:
Γ ¯ x ( u , w ) = Γ ¯ ¯ x ( u , w ) = ( u w ) x .
We may now verify that Γ ¯ x ( u , w ) , v A = Γ ¯ x ( v , w ) , u A . It holds that
Γ ¯ x ( u , w ) , v A = ( u w ) ( v x ) = 0 , Γ ¯ x ( v , w ) , u A = ( v w ) ( u x ) = 0 ,
hence the assertion.
The equation of the transport field, in this case, reads as:
u ¯ ˙ + ( u ¯ γ ˙ ) γ = 0 .
Such differential equation may be solved and returns an expression for the transport field u ¯ . The definition (363) returns then the expression for the parallel transport from the point x S n 1 of a vector w T x S n 1 to the point y S n 1 :
P x y ( w ) : = I n ( x + y ) y 1 + x y w .
Let us verify some characteristic features of the above map:
  • Linearity: The linearity of P x y ( w ) with respect to w (but not with respect to x ad y) is quite apparent.
  • Identity: Letting y x leads to an identity map. In fact, it holds that P x x ( w ) = I n ( x + x ) x 1 + x x w = I n 2 x x 2 w = ( I n x x ) w = w x ( x w ) = w 0 = w .
  • Tangency: It should hold that y P x y ( w ) = 0 . In fact: y I n ( x + y ) y 1 + x y w = y w y ( x + y ) y w 1 + x y = y w ( y x + 1 ) y w 1 + x y = y w y w = 0 .
Important properties of the map (373) have been hence verified. ■
Example 36.
Let us consider the case of the manifold S + ( n ) A endowed with the metric U , W P : = tr ( U P 1 W P 1 ) embedded in the ambient space A : = R n × n endowed with the metric U , V A : = tr ( U V ) . Recall from Section 7 that the corresponding metric kernel takes the expression G P ( W ) : = P 1 W P 1 (which, as a function of W, is well-defined in the whole ambient space A ).
Let us determine the Christoffel form of the second kind through calculations related to geodesy. Define the fundamental form on a curve as F ( γ , γ ˙ ) : = tr ( γ ˙ γ 1 γ ˙ γ 1 ) and observe that:
F γ ˙ = 2 γ 1 γ ˙ γ 1 [ 2 G γ ( γ ˙ ) ] ,
F γ = 2 γ 1 γ ˙ γ 1 γ ˙ γ 1 [ G γ ( γ ˙ , γ ˙ ) ] ,
from which it follows that:
d d t F γ ˙ = 2 d d t ( γ 1 ) γ ˙ γ 1 + 2 γ 1 γ ¨ γ 1 + 2 γ 1 γ ˙ d d t ( γ 1 ) .
Recalling the matrix identity
d d t ( γ 1 ) = γ 1 γ ˙ γ 1 ,
leads to
d d t F γ ˙ = 4 γ 1 γ ˙ γ 1 γ ˙ γ 1 + 2 γ 1 γ ¨ γ 1 ,
which, in turn, implies that:
F γ d d t F γ ˙ = 2 γ 1 γ ˙ γ 1 γ ˙ γ 1 + 4 γ 1 γ ˙ γ 1 γ ˙ γ 1 2 γ 1 γ ¨ γ 1 , = 2 γ 1 γ ˙ γ 1 γ ˙ γ 1 2 γ 1 γ ¨ γ 1 , = 2 γ 1 ( γ ˙ γ 1 γ ˙ γ ¨ ) γ 1 .
The transport equation, in normal Christoffel form, hence reads:
γ ¨ γ ˙ γ 1 γ ˙ ω γ = 0 ,
where ω denotes a skew-symmetric matrix function (namely ω = ω ) such that the product ω γ belongs to the normal space at γ.
In order to determine a function ω, let us invoke the condition of tangency of the velocity field γ ˙ that, in this case, is expressed as γ ˙ = γ ˙ . From such a condition, it follows that γ ¨ = γ ¨ ; hence:
( γ ¨ γ ˙ γ 1 γ ˙ ω γ ) = γ ¨ γ ˙ γ 1 γ ˙ γ ω = 0 ,
from which it follows that ω = 0 . The restricted Christoffel form of the second kind, in this case, hence reads as
Γ ¯ ¯ P ( V , V ) : = V P 1 V .
To determine the complete Christoffel form of the second kind, let us make use of the polarization formula (358):
Γ ¯ ¯ P ( V , W ) : = 1 4 Γ ¯ ¯ P ( V + W , V + W ) + Γ ¯ ¯ P ( V W , V W ) = 1 4 ( V + W ) P 1 ( V + W ) + ( V W ) P 1 ( V W ) = 1 2 V P 1 W 1 2 W P 1 V .
Let us now verify that the form Γ ¯ commutes with the ambient inner product. We have Γ ¯ = G Γ ¯ ¯ ; therefore,
Γ ¯ P ( V , W ) = 1 2 P 1 ( V P 1 W + W P 1 V ) P 1 .
As a consequence, we have
Γ ¯ P ( V , W ) , U A = 1 2 tr ( U P 1 V P 1 W P 1 + U P 1 W P 1 V P 1 ) , Γ ¯ P ( U , W ) , V A = 1 2 tr ( V P 1 U P 1 W P 1 + V P 1 W P 1 U P 1 ) ,
which may be easily proven equal by the cyclic commutation property of matrix trace.
As a consequence, and employing the relation (342), the parallel transport equation is found to be:
U ¯ ˙ + γ ( γ 1 γ ˙ γ 1 U ¯ γ 1 γ 1 U ¯ γ 1 γ ˙ γ 1 ) γ 1 2 γ ˙ γ 1 U ¯ 1 2 U ¯ γ 1 γ ˙ = 0 ,
or, after a few straightforward simplifications,
U ¯ ˙ 1 2 γ ˙ γ 1 U ¯ + U ¯ γ 1 γ ˙ = 0 .
Through the definition (363), we may obtain the expression of the parallel transport from a point P S + ( n ) of a tangent vector W T P S + ( n ) to a point Q S + ( n ) :
P P Q ( W ) : = Q P 1 W P 1 Q ,
where the matrix square root · returns a symmetric matrix. Let us verify two properties of the found operator:
  • Identity: Setting Q P leads to parallel transport collapse to an identity map. Indeed, it holds that P P P ( W ) : = P P 1 W P 1 P = W .
  • Tangency: It holds that ( P P Q ( W ) ) = P P Q ( W ) . In fact, we have ( P P Q ( W ) ) = ( P 1 Q ) 2 W ( Q P 1 ) 2 = ( Q P ) 1 2 W ( P Q ) 1 2 = Q P 1 W P 1 Q = P P Q ( W ) .
Linearity is apparent as well. ■
The calculations illustrated in the previous example requires the evaluation of the partial derivatives of the fundamental form F , determined in (374) and (375). It is quite instructive to sketch such calculations.
Example 37.
Let us survey in detail the calculations that led to relations (374) and (375). Such calculations may be carried out through calculus (exact way) or by analytic approximations (which is more clear from a computational viewpoint).
To begin with, it is useful to analytically justify a matrix approximation, namely
( P + E ) 1 P 1 P 1 E P 1 ,
with P S + ( n ) and E R n × n with E small. Such approximation may be proven through the notion ofanalytic matrix function. By definition, a matrix-to-matrix function f : R n × n R n × n is analytic in a point X 0 R n × n if it admits, in a neighborhood of such a point, a polynomial series expansion, namely:
f ( X ) = n = 0 a n ( X X 0 ) n , f o r X X 0 < ρ
where a n R are coefficients of the series and ρ > 0 denotes the radius of convergence of the polynomial series.
Let us now consider the function f ( X ) : = ( I n + X ) 1 . It is analytic in a neighborhood of the point X = 0 n and the associated polynomial series, truncated to the second term, reads:
( I n + X ) 1 I n X ,
from which it readily follows that
( P + E ) 1 = ( P ( I n + P 1 E ) ) 1 = ( I n + P 1 E ) 1 P 1 ( I n P 1 E ) P 1 = P 1 P 1 E P 1 .
To justify the partial derivatives (374) and (375), let us define F ( P , V ) : = tr ( V P 1 V P 1 ) and evaluate its partial derivatives with respect to P and V. The partial derivative of a matrix-to-scalar function may be thought of as the ‘coefficient’ of the linear term arising from an additive perturbation.
Applying such a practical idea to evaluate the partial derivative with respect to the first argument gives:
F ( P + E , V ) = F ( P , V ) + F P , E A + higher order terms ,
where E denotes a matrix (termedperturbation) whose entries are small numbers (such that E is very small). In the present case, we have:
F ( P + E , V ) = tr ( V ( P + E ) 1 V ( P + E ) 1 ) , tr ( V ( P 1 P 1 E P 1 ) V ( P 1 P 1 E P 1 ) ) tr ( V P 1 V P 1 ) tr ( V P 1 E P 1 V P 1 ) tr ( V P 1 V P 1 E P 1 ) + higher order terms = F ( P , V ) tr ( P 1 V P 1 V P 1 E ) tr ( P 1 V P 1 V P 1 E ) + higher order terms = F ( P , V ) 2 tr ( P 1 V P 1 V P 1 E ) + h i g h e r   o r d e r   t e r m s .
Comparing this expression with (392), one obtains the sought result, namely:
F P = 2 P 1 V P 1 V P 1 .
To what concerns the second argument, we may write that:
F ( P , V + E ) = F ( P , V ) + F V , E A + higher order terms ,
where E again denotes a perturbation. In this case, we have:
F ( P , V + E ) = tr ( ( V + E ) P 1 ( V + E ) P 1 ) , = tr ( V P 1 V P 1 ) + tr ( E P 1 V P 1 ) + tr ( V P 1 E P 1 ) + higher order terms = F ( P , V ) + tr ( P 1 V P 1 E ) + tr ( P 1 V P 1 E ) + higher order terms = F ( P , V ) + 2 tr ( P 1 V P 1 E ) + higher order terms .
Comparing such an expression with (395) yields:
F V = 2 P 1 V P 1 ,
which is the sought result. ■
Let us summarizes a few known formulas about parallel transport. All the formulas given below hold under the provision that the manifolds are endowed with their canonical metrics.
  • Hypercube: The hypercube endowed with the standard Euclidean metric is a flat space; hence, parallel transport may be realized as a rigid translation. Namely parallel transport is an identity map.
  • Hypersphere: Parallel transport on the hypersphere S p 1 of the tangent vector w T x S p 1 along the geodesic arc γ x , v of an extent t may be computed by [39]:
    P γ x , v 0