Reduction by Lie Group Symmetries in Diffeomorphic Image Registration and Deformation Modelling

We survey the role of reduction by symmetry in the large deformation diffeomorphic metric mapping framework for registration of a variety of data types (landmarks, curves, surfaces, images and higher-order derivative data). Particle relabelling symmetry allows the equations of motion to be reduced to the Lie algebra allowing the equations to be written purely in terms of the Eulerian velocity field. As a second use of symmetry, the infinite dimensional problem of finding correspondences between objects can be reduced for a range of concrete data types, resulting in compact representations of shape and spatial structure. Using reduction by symmetry, we describe these models in a common theoretical framework that draws on links between the registration problem and geometric mechanics. We outline these constructions and further cases where reduction by symmetry promises new approaches to the registration of complex data types.


Introduction
Registration, the task of establishing correspondences between multiple instances of objects, such as images, landmarks, curves and surfaces, plays a fundamental role in a range of computer vision applications, including shape modelling [1], motion compensation and optical flow [2], remote sensing [3] and medical imaging [4]. In the subfield of computational anatomy, establishing inter-subject correspondences between organs allows the statistical study of organ shape and shape variability [5]. Examples of the fundamental role of registration include quantifying developing Alzheimer's disease by establishing correspondences between brain tissue at different stages of the disease [6]; measuring the effect of chronic obstructive pulmonary disease on lung tissue after removing variability caused by the respiratory process [7]; and correlating the shape of the hippocampus to schizophrenia after inter-subject registration [8].
In this paper, we survey the role of reduction by symmetry in diffeomorphic registration and deformation modelling, linking symmetry as seen from the field of geometric mechanics with the image registration problem. All of our calculations will be formal and void of functional analytic detail, although citations will be used when available. We focus on large deformations modelled in subgroups of the group of diffeomorphic mappings on the spatial domain in the context of large deformation diffeomorphic metric mapping (LDDMM) [1,[9][10][11]. Connections with geometric mechanics [12,13] have highlighted the role of symmetry, and properties that were previously known to be connected with the specific data types have been described in a common theoretical framework [14]. We wish to describe these connections in a form that highlights the role of symmetry and points towards future applications of the ideas.

Symmetry and Information
One of the main reasons symmetry is useful in data analysis and numerics is its ability to reduce the complexity of information that represents data. Lower information complexity can lead to more stable statistical analysis and the reduced need of computational resources.
As a toy example, consider a spinning top. Upon choosing a reference configuration, the orientation of the top is given by a rotation matrix, i.e., an element R ∈ SO(3) (see Figure 1). To describe the direction of the tip of the top, it suffices to provide the orientation matrix R. However, R is contained in SO(3), a three-dimensional space, while the space of possible directions is the two-sphere, S 2 , which is only of dimension two. Therefore, providing the full matrix R is an over-representation of the tip direction. It suffices to solely provide the vector R · k ∈ S 2 where k = (0, 0, 1) is the direction of the tip in a reference configuration. Note that ifR · k = k, then R · k = R ·R · k. Therefore, given only the direction k = R · k, we can only reconstruct R up to an elementR, which preserves k. The subgroup of rotations that preserve k can be identified with SO (2). Specifically, this identification comes from perceiving a rotation about k as a rotation of the plane, which is perpendicular to k. This insight allows us to express the space of directions S 2 as a homogeneous space S 2 ≡ SO(3)/ SO (2). In terms of information, we can cartoonishly express this by: "orientation" = "direction of tip" + "orientation around the tip" This picture is typical for many group quotients. Generally speaking, if X is a manifold and G acts freely and properly on X, then: When X is infinite dimensional, this formula is less insightful. However, X/G is smaller than X in that there exists a surjective map from X to X/G with non-trivial level sets. Reduction by symmetry can be implemented when a problem posed on X has G symmetry and can be rewritten as a problem posed on X/G. As X/G is smaller than X, reduction by symmetry can yield more stable subsequent statistical analysis of observed data and more tractable algorithms, as will be shown later in the article. This reduction is particularly dramatic when dim(X) = ∞ and dim(X/G) < ∞. reference configuration current configuration tip tip Figure 1. A diagram relating a top in the reference configuration to its current configuration via a rotation matrix R ∈ SO(3).

Symmetry in Registration
Registration of objects contained in a spatial domain, e.g., the volume to be imaged by a scanner, can be formulated as the search for a deformation that transforms both domain and objects to establish an inter-object match. The data available when solving a registration problem is generally insufficient for determining the displacement of every point of the domain. This is the case when images to be matched have areas of constant intensity and no derivative information can guide the registration. For example, the "best" deformation for matching the two discs in Figure 2 is ambiguous, except at the boundary of the discs, where the images are non-constant. Similarly, when 3D shapes are matched based on the similarity of their surfaces, the deformation of the interior cannot be derived from the available information. In these cases, the deformation model is over-complete, and a range of deformations can provide equally adequate matches for the data. The registration problem or the registration cost-function is thus symmetric with respect to the subset of transformations just described. When the deformation model is a Lie group, the deformations for which the registration is symmetric form a subgroup. The quotient by this subgroup of symmetries of the registration cost-function can provide vastly more compact representations. This situation arises in several cases with the LDDMM framework: when registering images, only displacements orthogonal to the level lines of an image are needed, and when registering shapes, the information left in the quotient is supported on the shape surface only.  Figure 2. A registration of two discs of different sizes (a,b) with one example of a warp that brings (b) into correspondence with (a) visualized by its effect on an initially regular grid (c). Using symmetry, the dimensionality of the registration problem can be reduced from infinite to finite. In this case, six parameters of a one-jet particle (see Section 5.4) in the centre of the moving image encode the entire deformation. The six parameters can roughly be described as a position in R 2 , a scaling, a stretch, a shear and a rotation. (a) Fixed image; (b) moving image; (c) warp.

Content and Outline
Although a degree of comfort with differential geometry will be assumed, it is the aim of this paper to make the role of symmetry in registration and deformation modelling clear to the non-expert in geometric mechanics and symmetry groups in image registration. We begin the paper by presenting the background for the registration problem and the large deformation approach before outlining some necessary notions from differential geometry. For more information on the Riemannian geometry behind the LDDMM approach to image registration, we refer the reader to [5]. We continue by describing how reduction by symmetry leads to an Eulerian formulation of the equations of motion when reducing to the Lie algebra. The symmetry of dissimilarity measures allows additional reductions, and we use isotropy subgroups to reduce the complexity of the registration problem further. Lastly, we survey the effect of symmetry in a range of concrete registration problems. The paper ends with concluding remarks.

Registration
The registration problem consists of finding correspondences between objects that are typically point sets (landmarks), curves, surfaces, images or more complicated spatially-dependent data, such as diffusion weighted images (DWI). The problem can be approached by letting M be a spatial domain containing the objects to be registered. M can be a compact finite dimensional differentiable manifold without boundary or R d itself with d = 2, 3. It is common to consider manifolds with boundaries, as well. In such cases, care must be taken with regards to boundary conditions. For example, vector-fields must be tangential to the boundary. Here, we consider only manifolds without a boundary.
A map ϕ : M → M can deform or warp the domain by mapping each x ∈ M to ϕ(x). The deformation encoded in the warp will apply to the objects in M , as well as the domain itself. For example, if the objects to be registered consist of point sets {x 1 , . . . , x N }, x i ∈ M , the set will be mapped to {ϕ(x 1 ), . . . , ϕ(x N )}. For surfaces S ⊂ M , ϕ similarly results in the warped surface ϕ(S). Because those operations are associative, the mapping ϕ acts on {x i } or S, and we write ϕ · {x i } and ϕ · S for the warped objects. An image is a function I : M → R, and ϕ acts on I as well, in this case by composition with its inverse ϕ · I = I • ϕ −1 ; see Figure 3. For this, ϕ must be is invertible, and commonly, we restrict to the set of invertible and differentiable mappings Diff(M ). For various other types of data objects, the action of a warp on objects can be defined in a manner similar to that of point sets, surfaces and images. This fact relates a range registration problems to the common case of finding appropriate warps ϕ, which bring the objects into correspondence. Different shape instances can be realized by letting warps act on a base shape, and a class of shape models can thereby be obtained by using deformations as shape representations [1]. Given two images I 0 , I 1 : M → R, image registration involves finding a warp ϕ, such that ϕ · I 0 is close to I 1 as measured by a dissimilarity measure F (ϕ · I 0 , I 1 ).
The search for appropriate warps can be formulated in a variational formulation with an energy: where F is a dissimilarity measure of the difference between the deformed objects and R is a regularization term that penalizes unwanted properties of ϕ, such as spatial irregularity. If two objects o 1 and o 2 are to be matched, F can take the form F (ϕ · o 0 , o 1 ) using the action of ϕ on o 0 ; for image matching, an often used dissimilarity measure is the L 2 -difference or sum of square differences (SSD) having the form F (ϕ · I 0 , The regularization term can take various forms often modelling physical properties, such as elasticity [15], and derivatives of ϕ are often penalized to ensure smoothness. For some choices of R, existence and analytical properties of minimizers of Equation (1) have been derived [16]; however, in general, it is difficult to ensure that solutions are diffeomorphic by penalizing ϕ in itself. The free-form-deformation (FFD; [17]) and related approaches model the deformation by a displacement vector field u on M = R d , so that ϕ(x) = x + u(x). Smoothness is here ensured by the choice of basis functions, e.g., B-splines, or by applying a regularization term on u. Smooth and invertible mappings can be obtained by integrating flows [9,11] to obtain one-parameter families or paths of mappings ϕ t , t ∈ [0, 1]. The warp ϕ 0 at t = 0 is here the identity mapping id ∈ Diff(M ), and the dissimilarity is measured at the endpoint ϕ 1 . The time evolution of ϕ t can be described by the differential equation: with the flow field u t being a vector field on M . Numerically, the map ϕ can be represented by how it maps a finite set of points, and a numerical scheme might simply implement Euler-integration on each point, i.e., " A relaxation of this idea is now a standard method in optical flows [18]. The space of flow fields is denoted by V . In the LDDMM [1] framework, the regularization is applied to the flow field and integrated over time giving the energy: Here, the time-dependent diffeomorphism ϕ t is related to u t through Equation (2). If the norm · V that measures the irregularity of u t is sufficiently strong (e.g., H k with k > d 2 + 1), then ϕ t will be a diffeomorphism for all t [19]. This approach thus gives a direct way of enforcing properties of the generated warp: instead of regularizing ϕ directly, the analysis is lifted to a normed space V that is much easier control. The energy E in Equation (3) has the same minimizers as the geometric formulation of LDDMM used in the next section.
Direct approaches to solving the optimization problem in Equation (3) must handle the fact that the problem of finding a warp is now expanded to that of finding a time-dependent family of warps. This is a huge increase in dimensionality. This formulation of registration is thus very difficult to represent numerically and to optimize and analyse statistically. For several data types, it has been shown how optimal paths for Equation (3) have specific properties that reduce the dimensionality of the problem, making practical solutions feasible. In the next section, we outline the geometric framework that is needed when we, in the later sections, use reduction by symmetry to describe these data-dependent results in a common theoretical framework.

Notions from Differential Geometry
In this section, we will introduce a number of notions from differential geometry in a fairly informal manner. We will use conventions from [20,21] where a more rigorous understanding of differential geometry can be found.
We will assume that the reader has at least an intuitive picture of the notion of a smooth manifold M . For the purpose of this paper, M will either be assumed to be compact without a boundary or R n . The tangent bundle of M is the space of velocity vectors tangential to M . Notationally, the tangent bundle of M , denoted T M , is the set of pairs (x, v) where x ∈ M and v is a vector tangential to M at the point x (see Figure 4). A vector-field is a continuous map u : M → T M , such that u(x) ∈ T M is a vector above x for all x ∈ M . We will denote the space of vector-fields by X(M ).
Given a vector-field u ∈ X(M ), we may consider the initial value problem: for t ∈ [0, 1]. Given an initial condition x 0 , the point x 1 = x(1) given by solving this initial value problem is uniquely determined if it exists. Under many circumstances (e.g., if M is compact or if M = R d and u(x) grows sub-linearly), an x 1 exists for each x 0 , and there is a continuous invertible map Φ u t : x 0 ∈ M → x 1 ∈ M , which we call the flow of u. Given a time-dependent vector-field, u t ∈ X(M ) for t ∈ [0, 1], we can consider the initial value problem with dx dt = u t (x(t)). This will yield a flow map, Φ u t 0 ,t 1 which is the flow from time t = t 0 to t = t 1 . If u t is smooth, the flow map will be smooth, as well, in particular a diffeomorphism. We denote the set of diffeomorphisms by Diff(M ).
Conversely, let ϕ t ∈ Diff(M ) be a time-dependent diffeomorphism. For any x ∈ M , we observe that ϕ t (x) is a curve in M . If this curve is differentiable we may consider its time-derivative, dϕt dt (x) ∈ T M , which is a vector above the point ϕ t (x) ∈ M . From these observations, it follows that dϕt dt [ϕ −1 t (x)] is a vector above x. Therefore, the map u t : , is a vector-field called the Eulerian velocity field of ϕ t .
As will be described shortly, the Eulerian velocity field contains less data than dϕt dt . This reduction in data can be viewed from the perspective of symmetry. Given any ψ ∈ Diff(M ), the curve ϕ t can be transformed to the curve ϕ t • ψ. We observe that: Thus, ϕ t and ϕ t • ψ both have the same Eulerian velocity fields. In other words, the Eulerian velocity field, u t , is invariant under particle relabellings. More precisely, we may view Diff(M ) as a manifold in its own right, and view dϕt dt as a vector in the infinite-dimensional tangent bundle T Diff(M ) above the "point" ϕ t ∈ Diff(M ). Thus, the vector dϕt dt contains both velocities and a base diffeomorphism ϕ t . Given u t and ϕ t , we can reconstruct dϕt dt via dϕt dt = u t • ϕ t . As has been shown, we can also construct u t from dϕt dt by its own definition. However, we cannot reconstruct dϕt dt from u t , which is why u t contains less data.
Finally, we will denote some linear operators on the space of vector-fields. Let Φ ∈ Diff(M ), and let u ∈ X(M ). The push-forward of u by Φ is the vector-field given by: In local coordinates (x 1 , . . . , x n ), this looks like: By inspection, we see that Φ * is a linear operator on X(M ). One can view Φ * u as "u in a new coordinate system", because any geometric property of u is also inherited by As Φ * is a linear operator, a well-defined operator exists, which is dual to Φ * . Let X(M ) * denote the dual space to X(M ), i.e., the set of linear maps X(M ) → R, which are continuous with respect to a chosen vector-space topology on X(M ). Given m ∈ X(M ) * , we define Φ * m ∈ X(M ) * by the identity: for all u ∈ X(M ), where m, u denotes the evaluation of m on v. We can define Φ * m := (Φ −1 ) * m, which yields the identity: In local coordinates, we may represent m as a one-form density, given by m i (x)dx i ⊗(dx 1 ∧· · ·∧dx n ) with components m 1 (x), . . . , m n (x). In this local coordinate description, the i-th component of the push-forward looks like: Finally, we define the Lie derivative. Let w ∈ X(M ). The Lie derivative operator, with respect to w, is the linear operator £ w : X(M ) → X(M ) defined by: for all u ∈ X(M ).
We conclude the section with a table of notation for the reader's convenience, see Table 1. Table 1. Notation.

Reduction by Symmetry in LDDMM
In this section, we will present necessary conditions satisfied by local extremizers of the variational problem Equation (3). The resulting conditions will first involve the computation of a curve in Diff(M ), as well as its time-derivative in T Diff(M ). We then invoke a Diff(M ) symmetry of the problem to reduce this computation to a computation on X(M ) instead of T Diff(M ) ∼ = Diff(M ) × X(M ). Secondly, we describe how the symmetry of the dissimilarity measure allows further reductions.

Reduction to the Lie Algebra
The variational formulation Equation (3) of LDDMM is equivalent to minimizing the energy: where d : Diff(M ) × Diff(M ) → R is a Riemannian distance metric on Diff(M ) induced by the norm · V , id is the identity diffeomorphism, and F : Diff(M ) → R is a dissimilarity measure, i.e., a function measuring the disparity between the deformed template and the target object. Example 1. Given images I 0 , I 1 ∈ L 2 (M ), we consider the dissimilarity measure: In this article, we will consider the metric on connected components of Diff(M ) given by: where u t denotes a one-parameter family of vector fields and · is a norm on X(M ), the Lie algebra of Diff(M ). The norm is generally assumed to be admissible, i.e., embedded in C 1 0 (M, R n+k ) for sufficiently large k, so that a constant C exists satisfying u ≥ C u 1,∞ for all u ∈ X(M ) ([1], Chapter 9). In the case where M = R n , we can define u 1,∞ chart-wise by a partition of unity (e.g., see the construction of H k norms on M in [22]) or intrinsically in terms of the Riemannian gradient. Both choices would yield identical topologies, and so, this choice has no significance as far as the article is concerned. From now on, we will overload notation and let X(M ) denote the set of C k vector fields with finite norm. For finite k, this makes X(M ) a Banach space, and it breaks the Lie algebra structure. The consequences of this breakage will not be explored here, and we will continue to treat X(M ) formally as a Lie algebra. We will later be using the space of homeomorphisms generated by X(M ), which is a subspace of C k -diffeomorphisms. Again, we will overload the notation and call this space Diff(M ), even though this is usually reserved for smooth diffeomorphisms. In the case where M = R d , we assume that our norm is such that decay conditions at infinity for u ∈ X(M ) arise naturally as a result of requiring the norm to be finite. If · is induced by an inner-product, the inner-product is formally a Riemannian metric on Diff(M ) given by: and d is the Riemannian distance with respect to this metric. The norm is often defined in terms of an operator P : X(M ) → X(M ) * as u 2 = P [u], u , and the assumed admissibility implies that (X(M ), · ) has a reproducible kernel Hilbert space structure (RKHS; [1], Chapter 8). For example, we could consider M = R, P = dx ⊗ (1 − ∂ 2 x ), and the vector-field u(x) = exp(−|x|) is mapped to the one-form density dx ⊗ δ, where δ is the Dirac delta distribution (see [23]. In particular, in the case that M = R n , a matrix-valued kernel function K : R n × R n → R n×n exists satisfying the reproducing property P [K(·, x)a], u = a T u(x) for all x ∈ R n and a ∈ R n (see [24]). We will denote RKHSs by V and the norms by · V .
Given P , minimizers of E Equation (6) must satisfy: That Equation (7) is a necessary condition satisfied by the minimizers of Equation (6) (7). The term F only penalizes the end-point of the geodesic, and the minimization condition manifests as the third line of Equation (7).
Issues regarding the well-posedness of Equation (7) are non-trivial, because P is merely injective, but not bijective, and so, there is no guarantee that P can be inverted on a given m t ∈ X(M ) * at each time in order to obtain a vector-field u t ∈ X(M ). Fortunately, safety guards for well-posedness are known (e.g., [19], Theorem 1, or [25]).
Using Equation (7) for computational purposes is difficult because Diff(M ) is a non-linear infinite dimensional space. Moreover, the dissimilarity measure F only comes into play at time t = 1, and the distance function is an integral over the vector-space X(M ). It would be beneficial if we could rewrite the extremizers in terms of the Eulerian velocity field u and the flow at t = 1. In fact, this is often the case. One (formally) must take the time-derivative of the term "(Φ u t,1 ) * m(1)" and apply Equation (5). Explicitly, this computation is performed as follows. Let w ∈ X(M ), and observe: As w is arbitrary, we find ∂ t m t + £ ut [m t ] = 0. This allows us to reformulate Equation (7) as: The advantage of this formulation is that the bulk of the computation occurs on the vector-space X(M ) rather than on the space T Diff(M ). Registration algorithms based on Equation (8) differ from the algorithm proposed by Beg et al. in [26]. In [26], a gradient descent on the time-dependent Eulerian vector field u t is used to minimize the curve energy. The algorithm is posed on the velocity field u t instead of the momentum field m t as Equation (8) suggests. The momentum at time t is retrieved by a transport equation similar to the first equation in Equation (7). The evolution Equation (8) effectively allows one to search over the space of initial conditions, X(M ) * , rather than over the larger space of curves, C 1 ([0, 1]; X(M )).
This reduction of the problem from T Diff(M ) given in Equation (7) to the problem over space of vector-fields, X(M ) given in Equation (8) is the first instance of reduction by symmetry. In particular, this corresponds to the fact that the space of vector-fields X(M ) is identifiable as a quotient space: is the quotient projection.

Isotropy Subgroups
The reduction to dynamics on T Diff(M ) to dynamics on X(M ) occurs primarily because the distance function is Diff(M ) invariant. However, one cannot completely abandon Diff(M ), because the solution requires one to compute the Time 1 flow, Φ u 0,1 . Fortunately, there is a second reduction, which allows us to avoid computing Φ u 0,1 in its entirety. This second reduction corresponds to the invariance properties of the dissimilarity measure F . Let G F ⊂ Diff(M ) denote the set of diffeomorphisms, which leave F invariant, i.e.: One can readily verify that G F is a subgroup of Diff(M ), and so, we call G F the isotropy subgroup of F .
Having defined G F , we can now consider the homogeneous space Q = Diff(M )/G F , which is the quotient space induced by the action of the right composition of G F on Diff(M ). This quotient space is "smaller" in the sense of the amount of data required to describe an element of it. In terms of maps, this can be seen by defining the map ϕ ∈ Diff(M ) → q = [ϕ] /G F ∈ Q, where [ϕ] /G F denotes the equivalence class of ϕ. We call this mapping the quotient projection, because it sends Diff(M ) to Q surjectively. While these notions are theoretically quite complicated, often they manifest less so in practice.
Example 2. In this example, we consider a simple aspect of the landmark matching problem. Let M ⊂ R n be the closure of some open set. Let x 1 , x 2 , y 1 , y 2 ∈ M with x 1 = x 2 , and consider the dissimilarity measure: We see that: and: . Two examples of diffeomorphisms contained in G F are shown in Figure 5. Example 3. In this example, we consider the matching problem for greyscale images. Let I 0 , I 1 ∈ H k (M ) be images. There is a natural action of Diff(M ) on H k given by sending each image I ∈ H k (M ) to I • ϕ ∈ H k (M ). We could consider the matching function F : Diff(M ) → R given by: This function measures the difference between a deformed version of I 0 and a fixed image, I 1 . The isotropy subgroup G F is the group of images that preserve 0 0 . Such a diffeomorphism would preserve each of the level lets of I 0 , but could permute the points within a given level set.
If one is able to understand Q, then one can use this insight to reformulate the dissimilarity measure F as a function on Q, rather than Diff(M ). In particular, there exists a unique function F Q : Q → R defined by the property F Q ([ϕ] /G F ) = F (ϕ). Again, this is useful in the sense of data, as illustrated in the following example.
Example 4. Consider the dissimilarity measure F of Example 2. The function, F Q : Q → R is: Finally, note that Diff(M ) acts upon Q by the left action: Usually, we will simply write ψ · q for the action of ψ ∈ Diff(M ) on a given q ∈ Q. This means that X(M ) acts upon Q infinitesimally, as it is the Lie algebra of Diff(M ). ψ · (q 1 , q 2 ) = (ψ(q 1 ), ψ(q 2 )) for ψ ∈ Diff(M ) and q = (q 1 , q 2 ) ∈ Q. The infinitesimal action of u ∈ X(M ) on Q is: u · (q 1 , q 2 ) = (u(q 1 ), u(q 2 )) ∈ T q Q These constructions allow us to rephrase the initial optimization problem using a reduced curve energy. Minimization of E is equivalent to minimization of: where q(1) is obtained by integrating the ODE, dq(t) dt = u t · q(t) with the initial condition q(0) = [id] /G F , where id ∈ Diff(M ) is the identity transformation. We see that this curve energy only depends on the Eulerian velocity field and the equivalence class q(1). Minimizers of E Q must necessarily satisfy: A geometric derivation of this formula can be found in [13] (Lemma 2.8, (2.12), and (2.13)). Again, the solution only depends on the Eulerian velocity and q(1). For this reason, we see that the G F symmetry of F provides a second reduction in the data needed to solve our original problem.

Orthogonality
In addition to reducing the amount of data that we must keep track of, there is an additional consequence to the G F -symmetry of F . In particular, there is a potentially massive constraint satisfied by the Eulerian velocity u.
To describe this, we must introduce an isotropy algebra. Given q(t) = [Φ u 0,t ] /G F , we can define the (time-dependent) isotropy algebra: This is nothing, but the "Lie algebra" associated with the isotropy group G q(t) = {ψ ∈ Diff(M ) | ψ · q(t) = q(t)}. The use of quotes here is deliberate. If we let X(M ) denote an RKHS obtained from the space of vector-fields, then some of these are permitted to be non-smooth, which means that the standard Lie bracket of vector-fields does not close.
It turns out that the velocity field u t that minimizes E (or E Q ) is orthogonal to g q(t) with respect to the chosen inner-product. Intuitively, this is quite sensible, because velocities that do not change q(t) do not alter the data and simply waste control effort. Equivalently, said from the perspective Lagrange-multipliers, we know that the Lagrange-multipliers used to enforce optimality should point in a direction orthogonal to those which leave the cost functional unaltered. These variations on the same statement are formalized below. Proposition 1. Let u satisfy Equation (8) or Equation (9). Then, m = P [u] annihilates g q(t) .
Proof. Let u be the solution to Equation (9). We will first prove that u 1 is orthogonal to g q(1) . Let w 1 ∈ g q(1) . We observe: However, w 1 leaves q(1) fixed, so Φ w(1) · q(1) = 0. Therefore, P [u 1 ], w 1 = 0. Let w t = [Φ u t,1 ] * w 1 . In coordinates, this means: One can directly verify that w t ∈ g q(t) for all t ∈ [0, 1]. Denoting m t = P [u t ], as in Equation (9), we find: The last equality follows from Equation (5). Thus, P [u t ], w t is constant. We have already verified that at t = 1, this inner-product is zero, and therefore, P [u t ], w t = 0 for all time. That w 1 is an arbitrary element of g q(1) makes w t an arbitrary element of g q(t) at each time. Thus, u t is orthogonal to g q(t) for all time.
At this point, we should return to our example to illustrate this idea. Example 6. Again, consider the setup of Example 2. In this case, q(t) = (q 1 (t), q 2 (t)) ∈ M × M − ∆ M ×M . The space g z(t) is the space of vector-fields that vanish at q 1 (t) and q 2 (t). Therefore, u t is orthogonal to q(t) if and only if m t = P [u t ] satisfies: for some covectors p 1 , p 2 ∈ X(M ) * and for any v ∈ X(M ). In other words: where δ x (·) denotes the Dirac delta functional centred at x.
This orthogonality constraint allows one to reduce the evolution equation on X(M ) to an evolution equation on Q, which might be finite dimensional if G F is large enough. In particular, there is a horizontal lift, h ↑ : T Q → X(M ), uniquely defined by the conditions h ↑ (q,q) · q = 0, and h ↑ (q,q) ⊥ g q with respect to the chosen inner-product on vector-fields.
Example 7. Consider the setup of Example 2 with M = R n . Then, Q = R n × R n − ∆ R n ×R n . Let K : R n × R n → R n×n be the reproducing kernel of P . Then, h ↑ : T Q → X(R n ) is given by: where p 1 , p 2 ∈ R n are such that p 1 + K(q 1 − q 2 )p 2 =q 1 and K(q 2 − q 1 )p 1 + p 2 =q 2 .
One can immediately observe that h ↑ is injective and linear inq. In other words, h ↑ (q, ·) : T q Q → X(M ) is an injective linear map for fixed q ∈ Q. Because the optimal u t is orthogonal to g q(t) , we may invert h ↑ (q(t), ·) on u t . In particular, we may often write the equation of motion on T Q, rather than on X(M ). This is a massive reduction if Q is finite dimensional. In particular, the inner-product structure on X(M ) induces a Riemannian metric on Q given by: (8) and (9) map to the geodesic equations on Q.

The equations of motion in Equations
Proposition 2. Let u extremize E or E Q . Then, there exists a unique trajectory q(t) ∈ Q, such that u = h ↑ ( dq(t) dt ). Moreover, q(t) is a geodesic with respect to the metric g.
Proof. Let u minimize E. Thus, u satisfies Equation (9). By the previous proposition, u t is orthogonal to g q(t) . As h ↑ (q(t), ·) : T q(t) Q → X(M ) is injective on g ⊥ q(t) , there exists a uniqueq(t), such that h ↑ (q(t),q(t)) = u t . Note that E can be written as: h ↑ (q(t),q(t)) V dt + F (q(1)) = g q (q,q) 1/2 dt + F (q(1)) Thus, minimizers of E correspond to geodesics in Q with respect to the metric g.
If we let H : T * Q → R be the Hamiltonian induced by the metric on Q, we obtain the most data-efficient form or Equations (8) and (9). Minimizers of E (or E Q ) are: We see that this is a boundary value problem posed entirely on Q. If Q is finite dimensional, this is a massive reduction in terms of data requirements.

Descending Group Action
A related approach to defining distances on a space of objects to be registered consists of defining an object space O upon which Diff(M ) acts transitively (this means that for any o 1 , o 2 ∈ O, there exists a ϕ ∈ Diff(M ) such that ϕ · o 1 = o 2 ) with distance: Here, the distance on O is defined directly from the distance in the group acting on the objects; see for example [1,5]. With this approach, the Riemannian metric descends from Diff(M ) to a Riemannian metric on O, and geodesics on O lift by horizontality to geodesics on Diff(M ). The quotient spaces Q obtained by reduction by symmetry and their geometric structure correspond to the object spaces and geometries defined with this approach. Intuitively, reduction by symmetry can be considered a removal of redundant information to obtain compact representations, while letting the metric descend to the object space O constitutes an approach to defining a geometric structure on an already known space of objects.

Examples
Here, we present a number of concrete examples of how reduction by symmetry can reduce the infinite dimensional registration problem over Diff(M ) to lower, in some cases finite, dimensional problems. In all examples, the symmetry of the dissimilarity measure with respect to a subgroup of Diff(M ) gives a reduced space by quotienting out the isotropy subgroups.

Landmark Matching
The space Q used in the examples in Section 4 constitutes a special case of the landmark matching problem, where sets of landmarks Q = {(x 1 , . . . , x N )| x i ∈ M, x i = x j ∀i = j} are placed into spatial correspondence trough the left action ϕ · (x 1 , . . . , The landmark space Q arises as a quotient of Diff(M ) from the isotropy group G F , as in Example 2. Reduction from Diff(M ) to Q in the landmark case has been used in a series of papers starting with [27]. Hamilton's equations (Equation (10)) take the form: where DK denotes the spatial derivative of the reproducing kernel K. Generalizing the situation in Example 6, the momentum field is a finite sum of Dirac measures N j=1 p j ⊗ δ q j , and the Eulerian velocity field is the corresponding finite linear combination of the kernel evaluated at q i : u(·) = N j=1 K(· − q j )p j . Registration of landmarks is often in practice done by optimizing over the initial value of the momentum p in the ODE to minimize E, a strategy called shooting [28]. Using symmetry, the optimization problem is thus reduced from an infinite dimensional time-dependent problem to an N dim(M ) dimensional optimization problem involving integration of a 2N dim(M ) dimensional ODE on T * Q.

Curve and Surface Matching
The space of smooth non-intersecting closed parametrized curves in R n is also known as the space of embeddings, denoted Emb(S 1 , R n ). The parametrization can be removed by considering the right action of Diff(S 1 ) on Emb(S 2 , R n ) given by: Then, the quotient space Gr(S 1 , R n ) := Emb(S 1 , R n )/ Diff(S 1 ) is the space of unparameterized curves. The space Gr(S 1 , R n ) is a special case of a non-linear Grassmannian [29], and it has a manifold structure under certain conditions on the space of embeddings and the space of diffeomorphisms [30]. When the parametrization is not removed, embedded curves and surfaces can be matched with the current dissimilarity measure [31,32]. If M is a volume manifold, then the objects are considered elements of the dual space of Ω k (M ), the space of differential k-forms on M . In the surface case, a bounded submanifold S ⊂ M can be seen as an element of [Ω k (M )] * by its evaluation on a k-form, w ∈ Ω 2 (M ), given by: where w| S ∈ Ω k (S) is the restriction of w to S. The dual space (Ω k (M )) * can be equipped with a norm that enables surfaces to be quantitatively compared as elements of the vector-space (Ω k (M )) * . Note that the evaluation Equation (11) does not depend on the parametrization of S, as it is written in a coordinate-free form. Coordinate-based formulations of Equation (11) are available in [31,32]. This technique is computationally much more tractable than using the Hausdorff distance, which requires pairwise comparisons between all points between two surfaces. The isotropy groups for curves and surfaces generalize the isotropy groups of landmarks by consisting of warps that keep the objects fixed, i.e., The momentum field will be supported on the transported curves/surfaces ϕ(t).S for optimal paths for E in Diff(M ).

Image Matching
Images can be registered using either the L 2 -difference defined in Example 1 or with other dissimilarity measures, such as mutual information or the correlation ratio [33,34]. The similarity will be invariant to any infinitesimal deformation orthogonal to the gradient of dissimilarity measure. In the L 2 case, this is equivalent to any infinitesimal deformation orthogonal to the level sets of the moving image [35]. The momentum field thus has the form p(t) = α(t)∇ϕ(t).I 0 for a smooth function α(t) on M (see Figure 6), and the registration problem can be reduced to a search over the scalar field α(t) instead of vector field p(t).
Minimizers for E follow the PDE [5]: with m t representing the deformed image at time t.
In particular, the isotropy group of a source image f 0 ∈ C ∞ (M ) is the subgroup of diffeomorphisms, which preserve the level sets of f 0 . The quotient space Diff(M )/Iso(f 0 ) can be identified with the orbit of f 0 by diffeomorphisms, i.e., Diff(M )/Iso(f 0 ) ∼ = Orb(f 0 ) := {ϕ * f 0 | ϕ ∈ Diff(M )}. This orbit is difficult to identify with a more concrete object, in contrast to, e.g., the case of landmark matching. However, it can be characterized by various properties. For example, for a function f ∈ Orb(f 0 ) and any c ∈ f 0 (M ) ⊂ R, the level sets f −1 (c) and f −1 0 (c) have the same topology. Figure 6. In image matching, the gradient of the L 2 -difference will be orthogonal to level lines of the image, and symmetry implies that the momentum field will be orthogonal to the level lines, so that p(t) = α(t)∇ϕ(t).I 0 for a time-dependent scalar field α.

Jet Matching
In [14,36], an extension of the landmark case has been developed where higher-order spatial information is advected with the landmarks. The spaces of jet-particles arise as extensions of the reduced landmark space Q by quotienting out smaller isotropy subgroups known as jet-groups. A thorough account of jet-groups, including Lie group and algebra structures, can be found in [37] (Chapter 4). We provide a brief introduction here. Let G (0) be the isotropy subgroup for a single landmark: Let now k be a positive integer. For any k-differentiable map f from a neighbourhood of q, the k-jet of f is denoted J That is, the elements of G (k) fix the Taylor expansion of the deformation ϕ up to order k. The definition naturally extends to finite numbers of landmarks, and the quotients Q (k) = G/G (k) can be identified as the sets: with S 1 2 being the space of rank (1, 2) tensors symmetric in the lower indices. Intuitively, the space Q (0) is the regular landmark space with information about the position of the points; the one-jet space Q (1) carries for each jet information about the position and the Jacobian matrix of the warp at the jet position; and the two-jet space Q (2) carries in addition the Hessian matrix of the warp at the jet position. The momentum for Q (0) in coordinates consists of N vectors representing the local displacement of the points. With the one-jet space Q (0) , the momentum in addition contains d × d matrices that can be interpreted as locally linear deformations at the jet positions [36]. In combination with the displacement, the one-jet momenta can thus be regarded as locally affine transformations. The momentum fields for Q (2) add symmetric tensors encoding local second order deformation. The local effect of the jet-particles is sketched in Figure 7.
When the dissimilarity measure F is dependent not just on positions, but also on higher-order information around the points, reduction by symmetry implies that optimal solutions for E will be parametrized by k-jets in the same way as Q (0) parametrizesoptimal paths for E in the landmark case. The higher-order jets can thus be used for landmark matching when the dissimilarity measure is dependent on the local geometry around the landmarks. For example, matching of the first order structure, such as image gradients, leads to first-order jets, and matching of local curvature leads to second-order jets. Figure 7. With discrete image matching, the image is sampled at a regular grid Λ h , h > 0, and the image matching PDE (12) is reduced to an ODE on a finite dimensional reduced space Q. With the approximation F (0) (13), the momentum field will encode local displacement, as indicated by the horizontal arrows (top row). With a first order expansion, the solution space will be jet space Q (1) , and locally affine motion is encoded around each grid point (middle row). The O(h d+2 ) approximation F (2) includes second order information, and the system reduces to the jet space Q (2) with second order motion encoded at each grid point (lower row).

Discrete Image Matching
The image matching problem can be discretized by evaluating the L 2 -difference at a finite number of points. In practice, this always happens when the integral M |I 0 • ϕ −1 (x) − I 1 (x)| 2 dx is evaluated at finitely many pixels of the image. In [36,38], it is shown how this reduces the image matching PDE (12) to a finite dimensional system on Q when the integral is approximated by pointwise evaluation at a grid Λ h : where h > 0 denotes the grid spacing. F (0) approximates F to order O(h d ), d = dim(M ). The reduced space Q encodes the position of the points ϕ −1 (x), x ∈ Λ h , and the lifted Eulerian momentum field is a finite sum of point measures p = x∈Λ h a x ⊗ δ ϕ −1 (x) . For each grid point, the momentum encodes the local displacement of the point. In [38], a discretization scheme with higher-order accuracy is in addition introduced with an O(h d+2 ) approximation F (2) of F . The increased accuracy results in the entire energy E being approximated to order O(h d+2 ). The solution space in these cases becomes the jet-space Q (2) . For a given order of approximation, a corresponding reduction in the number of required discretization points is obtained. The reduction is countered by the increased information encoded in each two-jet. The momentum field thus encodes both local displacement, local linear deformation and second order deformation; see Figure 7. The discrete solutions will converge to solutions of the non-discretized problem as h → 0.

DWI/DTI Matching
Image matching is invariant with respect to variations parallel to the level lines of the images. With diffusion weighted images (DWI) and the variety of models for the diffusion information (e.g., diffusion tensor imaging (DTI) [39], Gaussian mixture fields [40]), first or higher-order information can be reintroduced into the matching problem. In essence, by letting the dissimilarity measure depend on the diffusion information, the isotropy subgroup of the matching problem becomes smaller.
The exact form of the of DWI matching problem depends on the diffusion model and how Diff(M ) acts on the diffusion image. In [41], the diffusion is represented by the principal direction of the diffusion tensor, and the data objects to be matched are thus vector fields. The action by elements of Diff(M ) is defined by: The action rotates the diffusion vector by the Jacobian of the warp, keeping its length fixed. Similar models can be applied to DTI with the preservation of principle direction scheme (PPD, [42,43]) and to GMF-based models [44]. The dependency on the Jacobian matrix implies that a reduced model must carry first order information in a similar fashion to the one-jet space Q (1) ; however, any irrotational part of the Jacobian can be removed by symmetry. The full effect of this has yet to be explored.
As in the case of image matching, the quotient can be identified with the orbit of the source data under diffeomorphisms.

Fluid Mechanics
Incidentally, the equation of motion: ∂ t m t + £ u [m t ] = 0 , u t = K * m t is an eccentric way of writing Euler's equation for an inviscid incompressible fluid if we assume u t ∈ X(R n ) is initially in the space of divergence free vector-fields and K * is the Riemannian flat map (which implies that m t and u t can be identified as functions on R n ) [45]. This fact was exploited in [46] to create a sequence of regularized models to Euler's equations by considering a sequence of kernels, such that the operator K * (viewed as a map to one-form densities) converges to a surjection onto the annihilator gradient vector-fields (this is written as a projection onto divergence free vector-fields in [46]). Moreover, if one replaces Diff(M ) by the subgroup of volume preserving diffeomorphisms Diff vol (M ), then (formally) one can produce incompressible particle methods using the same reduction arguments presented here. In fact, jet-particles were independently discovered in this context as a means of simulating fluids in [47]. It is notable that [47] is a mechanics paper, and the particle methods that were produced were approached from the perspective of reduction by symmetry without any knowledge of the related work being done in image registration.
In [48], one of the kernel parameters in [46], which controls the compressibility of the u, was taken to the incompressible limit. This allowed a realization of the particle methods described in [47]. The constructions of [48] are the same as presented in this article, but with Diff(M ) replaced by the group of volume-preserving diffeomorphisms of R d . Velocity fields induced by first order jet-particles are visualized in Figure 8.

Discussion and Conclusions
The information available for solving the registration problem is in practice not sufficient for uniquely encoding the deformation between the objects to be registered. Symmetry arises in both particle relabelling symmetry that gives the Eulerian formulation of the equations of motion and in symmetry groups for specific dissimilarity measures.
For landmark matching, reduction by symmetry reduces the infinite dimensional registration problem to a finite dimensional problem on the reduced landmark space Q. For matching curves and surfaces, symmetry implies that the momentum stays concentrated at the curves and surfaces allowing a reduction by the isotropy groups of warps that leave the objects fixed. In image matching, symmetry allows reduction by the group of warps that do not change the level sets of the image. Jet-particles arise from smaller isotropy subgroups and, hence, larger reduced spaces Q (1) and Q (2) that encode locally affine and second order information.
Reduction by symmetry allows these cases to be handled in one theoretical framework. We have surveyed the mathematical construction behind the reduction approach and its relation to the above-mentioned examples. As data complexity rises both in terms of resolution and structure, symmetry will continue to be an important tool for removing redundant information and achieving compact data representations.