2.2. Schubert Cells
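Grassmannians have a well-known cell decomposition into Schubert cells. Consider the sequence of subspaces of $\mathbb{R}^n$:
$$\mathbb{R}^0 \subset \mathbb{R}^1 \subset \cdots \subset \mathbb{R}^n,$$
where $\mathbb{R}^i$ consists of the vectors of the form $(x_1, \dots, x_i, 0, \dots, 0)$. Any $k$-plane $X$ gives rise to a sequence of integers
$$0 \le \dim(X \cap \mathbb{R}^1) \le \dim(X \cap \mathbb{R}^2) \le \cdots \le \dim(X \cap \mathbb{R}^n) = k.$$
Consecutive integers differ by at most 1.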
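Definition 2. A Schubert symbol $\sigma = (\sigma_1, \dots, \sigma_k)$ is a sequence of $k$ integers satisfying $1 \le \sigma_1 < \sigma_2 < \cdots < \sigma_k \le n$. Given a Schubert symbol $\sigma$, let $e(\sigma)$ denote the set of $k$-planes $X$ such that
$$\dim(X \cap \mathbb{R}^{\sigma_i}) = i \quad \text{and} \quad \dim(X \cap \mathbb{R}^{\sigma_i - 1}) = i - 1, \qquad i = 1, \dots, k.$$
Each $X \in G_k(\mathbb{R}^n)$ belongs to precisely one of the sets $e(\sigma)$.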
Lemma 2. $e(\sigma)$ is an open cell of dimension $d(\sigma) = (\sigma_1 - 1) + (\sigma_2 - 2) + \cdots + (\sigma_k - k)$.
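In terms of matrices, $X \in e(\sigma)$ if and only if it can be described as the row space of a $k \times n$ matrix of the form
$$\begin{pmatrix}
* \cdots * & 1 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 \\
* \cdots * & * & * \cdots * & 1 & 0 \cdots 0 & 0 & 0 \cdots 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
* \cdots * & * & * \cdots * & * & * \cdots * & 1 & 0 \cdots 0
\end{pmatrix}$$
where the $i$-th row has $\sigma_i$-th entry positive (say equal to 1) and all subsequent entries zero. Equivalently, we could (and do in the sequel) consider the column space of the transpose of this matrix.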
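For example, the possible Schubert symbols and cells for $G_2(\mathbb{R}^4)$ are as follows. Such a symbol has the form $\sigma = (\sigma_1, \sigma_2)$ where $1 \le \sigma_1 < \sigma_2 \le 4$.
| Schubert symbol $\sigma$ | $\dim e(\sigma)$ |
|---|---|
| (1, 2) | 0 |
| (1, 3) | 1 |
| (1, 4) | 2 |
| (2, 3) | 2 |
| (2, 4) | 3 |
| (3, 4) | 4 |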
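The enumeration above can be reproduced mechanically. The following is a minimal sketch, assuming the standard dimension formula $d(\sigma) = \sum_i (\sigma_i - i)$; the function name `schubert_cells` is illustrative and not taken from any library.

```python
from itertools import combinations

def schubert_cells(k, n):
    """Map each Schubert symbol 1 <= s_1 < ... < s_k <= n to its cell dimension."""
    cells = {}
    for sigma in combinations(range(1, n + 1), k):
        # d(sigma) = (s_1 - 1) + (s_2 - 2) + ... + (s_k - k)
        cells[sigma] = sum(s - i for i, s in enumerate(sigma, start=1))
    return cells

# Symbols and cell dimensions for 2-planes in 4-space.
cells = schubert_cells(2, 4)
for sigma, dim in cells.items():
    print(sigma, dim)
```

For 2-planes in 4-space this lists six symbols, with the top cell corresponding to (3, 4).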
Theorem 2. The sets $e(\sigma)$ form the cells of a CW-decomposition of $G_k(\mathbb{R}^n)$.
Proposition 1. The number of $r$-cells in $G_k(\mathbb{R}^n)$ is equal to the number of partitions of $r$ into at most $k$ integers, each of which is $\le n - k$.
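As a sanity check on this count, one can compare the number of Schubert symbols of each dimension with a direct count of bounded partitions. This is only a sketch: both helper names are hypothetical, and the cell dimension is assumed to be $\sum_i (\sigma_i - i)$.

```python
from collections import Counter
from itertools import combinations

def cells_by_dimension(k, n):
    """Count Schubert symbols of each dimension for k-planes in n-space."""
    counts = Counter()
    for sigma in combinations(range(1, n + 1), k):
        counts[sum(s - i for i, s in enumerate(sigma, start=1))] += 1
    return counts

def bounded_partitions(r, max_parts, max_size):
    """Number of partitions of r into at most max_parts parts, each <= max_size."""
    if r == 0:
        return 1
    if max_parts == 0 or max_size == 0:
        return 0
    # Either no part equals max_size, or remove one part equal to max_size.
    total = bounded_partitions(r, max_parts, max_size - 1)
    if r >= max_size:
        total += bounded_partitions(r - max_size, max_parts - 1, max_size)
    return total

k, n = 2, 4
counts = cells_by_dimension(k, n)
partition_counts = [bounded_partitions(r, k, n - k) for r in range(k * (n - k) + 1)]
print([counts[r] for r in range(k * (n - k) + 1)], partition_counts)
```

The two lists printed at the end should agree entry by entry for any choice of `k` and `n`.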
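The mod 2 homology of $G_k(\mathbb{R}^n)$ is easily computed from the Schubert cell decomposition: since the induced boundary maps are all either 0 or multiplication by 2, the mod 2 homology has a basis corresponding to the cells.
Continuing the example of $G_2(\mathbb{R}^4)$, we have
$$H_i(G_2(\mathbb{R}^4); \mathbb{Z}/2) = \begin{cases} \mathbb{Z}/2, & i = 0, 1, 3, 4, \\ (\mathbb{Z}/2)^2, & i = 2. \end{cases}$$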
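Since we will need it below, we also note the integral homology of $G_2(\mathbb{R}^4)$ [7]:
$$H_i(G_2(\mathbb{R}^4); \mathbb{Z}) = \begin{cases} \mathbb{Z}, & i = 0, 4, \\ \mathbb{Z}/2, & i = 1, 2, \\ 0, & \text{otherwise}. \end{cases}$$
Using the Universal Coefficient Theorem, one then quickly deduces that, for an odd prime $p$, the homology groups $H_i(G_2(\mathbb{R}^4); \mathbb{Z}/p)$ are $\mathbb{Z}/p$ for $i = 0, 4$ and 0, otherwise.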
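For the reader who wants the intermediate step, here is a sketch of the Universal Coefficient computation in degree 2. It assumes the integral groups are $\mathbb{Z}, \mathbb{Z}/2, \mathbb{Z}/2, 0, \mathbb{Z}$ in degrees 0 through 4, and writes $G$ for $G_2(\mathbb{R}^4)$ (a shorthand introduced only here).

```latex
% Universal Coefficient Theorem with field coefficients F:
%   H_i(G; F) \cong (H_i(G; Z) \otimes F) \oplus Tor(H_{i-1}(G; Z), F)
\begin{align*}
H_2(G;\mathbb{Z}/2)
  &\cong \bigl(H_2(G;\mathbb{Z}) \otimes \mathbb{Z}/2\bigr)
     \oplus \operatorname{Tor}\bigl(H_1(G;\mathbb{Z}), \mathbb{Z}/2\bigr) \\
  &\cong (\mathbb{Z}/2 \otimes \mathbb{Z}/2)
     \oplus \operatorname{Tor}(\mathbb{Z}/2, \mathbb{Z}/2)
   \cong \mathbb{Z}/2 \oplus \mathbb{Z}/2, \\
H_2(G;\mathbb{Z}/p)
  &\cong (\mathbb{Z}/2 \otimes \mathbb{Z}/p)
     \oplus \operatorname{Tor}(\mathbb{Z}/2, \mathbb{Z}/p)
   = 0 \qquad (p \text{ an odd prime}).
\end{align*}
```

The mod 2 answer has rank 2, matching the two 2-cells of the Schubert decomposition, while the 2-torsion contributes nothing at odd primes.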
2.3. Persistent Homology
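Suppose we are given a finite nested sequence of finite simplicial complexes
$$K_{R_1} \subset K_{R_2} \subset \cdots \subset K_{R_p},$$
where the $R_i$ are real numbers $R_1 < R_2 < \cdots < R_p$. For each homological degree $\ell \ge 0$, we then obtain a sequence of homology groups and induced linear transformations (homology with $\mathbb{Z}/2$-coefficients for simplicity)
$$H_\ell(K_{R_1}) \to H_\ell(K_{R_2}) \to \cdots \to H_\ell(K_{R_p}).$$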
Since the complexes are finite, each $H_\ell(K_{R_i})$ is a finite-dimensional vector space. Thus, there are only finitely many distinct homology classes. A particular class $z$ may come into existence in $H_\ell(K_{R_s})$, and then one of two things happens. Either $z$ maps to 0 (i.e., the cycle representing $z$ gets filled in) in some $H_\ell(K_{R_t})$, $R_s < R_t$, or $z$ maps to a nontrivial element in $H_\ell(K_{R_p})$. This yields a barcode, a collection of interval graphs lying above an axis parametrized by $R$. An interval of the form $[R_s, R_t]$ corresponds to a class that appears at $R_s$ and dies at $R_t$. Classes that live to $K_{R_p}$ are usually represented by the infinite interval $[R_s, \infty)$ to indicate that such classes are real features of the full complex $K_{R_p}$.
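As an example, consider the boundary of the tetrahedron $T$ with filtration $T_0 \subset T_1 \subset \cdots \subset T_5$ defined by $T_0 = \{\text{the four vertices}\}$, $T_1 = T_0 \cup \{\text{the six edges}\}$, $T_2 = T_1 \cup \{\text{one triangle}\}$, $T_3 = T_2 \cup \{\text{a second triangle}\}$, $T_4 = T_3 \cup \{\text{a third triangle}\}$, and $T_5 = T_4 \cup \{\text{the fourth triangle}\}$ (this is topologically a 2-sphere). The barcodes for this filtration are shown in Figure 1. Note that, initially, there are four components ($\beta_0 = 4$), which get connected in $T_1$, when three independent 1-cycles are born ($\beta_1 = 3$). These three 1-cycles die successively as triangles get added in $T_2$, $T_3$, and $T_4$. The addition of the final triangle in $T_5$ creates a 2-cycle ($\beta_2 = 1$).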
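The Betti numbers quoted in this example can be checked by direct linear algebra over $\mathbb{Z}/2$. The sketch below is not the software used in the paper; it assumes the six stages are indexed 0 through 5, that all edges enter at once, and that the triangles are added in an arbitrary (here lexicographic) order.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices, top_dim=2):
    """Mod 2 Betti numbers of a simplicial complex given as sorted vertex tuples."""
    by_dim = {d: sorted(s for s in simplices if len(s) == d + 1)
              for d in range(top_dim + 2)}
    index = {d: {s: i for i, s in enumerate(by_dim[d])} for d in by_dim}
    ranks = {0: 0}  # the boundary of a vertex is zero
    for d in range(1, top_dim + 2):
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # codimension-one faces of s
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        ranks[d] = rank_gf2(rows)
    # beta_d = (number of d-simplices) - rank(boundary_d) - rank(boundary_{d+1})
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(top_dim + 1)]

vertices = [(v,) for v in range(4)]
edges = list(combinations(range(4), 2))
triangles = list(combinations(range(4), 3))

stages = [set(vertices), set(vertices) | set(edges)]
for t in triangles:
    stages.append(stages[-1] | {t})  # add one triangle per stage

for i, stage in enumerate(stages):
    print(f"stage {i}: Betti numbers {betti_numbers(stage)}")
```

Reading the printed rows in order gives the bars: a class is born when a Betti number increases and dies when it decreases.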
For analyzing point cloud data, one needs a simplicial complex modeling the underlying space. Since it is impossible to know a priori if a complex is “correct”, one builds a nested family of complexes approximating the data cloud, computes the persistent homology of the resulting filtration, and looks for homology classes that exist in long sections of the filtration. We discuss two popular methods for doing this in the next subsection.
2.4. Vietoris–Rips and Witness Complexes
Now suppose we are given a discrete set $X$ of points in some metric space (typically a Euclidean space $\mathbb{R}^m$). The standard example of such an object is a sample of points from some geometric object $M$. We would like to recover information about $M$ from the sample $X$, and the first step is to obtain an approximation of $M$ using only the point cloud $X$. There are many such techniques; perhaps the most classical is the Delaunay triangulation of $X$. This is defined as follows. Say $X = \{x_1, \dots, x_N\} \subset \mathbb{R}^m$. The Voronoi decomposition of $\mathbb{R}^m$ relative to $X$ is the partition of $\mathbb{R}^m$ into cells $V_i$, $i = 1, \dots, N$, defined by
$$V_i = \{x \in \mathbb{R}^m : d(x, x_i) \le d(x, x_j) \text{ for all } j \ne i\}.$$
The corresponding Delaunay triangulation, $\mathrm{Del}(X)$, is the nerve of the Voronoi decomposition; that is, a collection $\{x_{i_0}, \dots, x_{i_\ell}\}$ forms an $\ell$-simplex in $\mathrm{Del}(X)$ if $V_{i_0} \cap \cdots \cap V_{i_\ell} \ne \emptyset$. One obtains a geometric realization of $\mathrm{Del}(X)$ via the map $V_i \mapsto x_i$. See Figure 2 for an example.
While the Delaunay triangulation provides a good approximation to the underlying space $M$, it has several disadvantages. If the point cloud $X$ is large, there will be a very large number of simplices in $\mathrm{Del}(X)$. In addition, $\mathrm{Del}(X)$ suffers from the “curse of dimensionality”; that is, if the ambient dimension $m$ is large, calculating the Voronoi decomposition is computationally expensive.
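There are many popular alternatives to the Delaunay triangulation. The one used most often is the Vietoris–Rips complex, which is built as follows. Consider the point cloud $X$ and let $r > 0$. The Vietoris–Rips complex with parameter $r$ is the simplicial complex $\mathrm{VR}_r(X)$ whose $k$-simplices are
$$\{x_{i_0}, \dots, x_{i_k}\} \quad \text{with} \quad d(x_{i_a}, x_{i_b}) \le r \text{ for all } 0 \le a, b \le k.$$
That is, if one imagines a ball of radius $r/2$ around each point $x_i$, then we join the points $x_i$ and $x_j$ with an edge if the balls intersect. Observe that if $r \le r'$ then there is an inclusion of complexes $\mathrm{VR}_r(X) \subseteq \mathrm{VR}_{r'}(X)$. We therefore have a nested sequence of complexes
$$\mathrm{VR}_{r_1}(X) \subseteq \mathrm{VR}_{r_2}(X) \subseteq \cdots \subseteq \mathrm{VR}_{r_p}(X), \qquad r_1 < r_2 < \cdots < r_p,$$
and we may study the persistent homology of this filtration. The corresponding barcodes yield information about the topology of the underlying space $M$.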
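To make the construction concrete, here is a brute-force sketch of a Vietoris–Rips complex on a small point cloud. It assumes the convention that a simplex is included when all pairwise distances are at most `r`; the function `vietoris_rips` is illustrative and is not the interface of Eirene, Ripser, or Perseus.

```python
from itertools import combinations
from math import dist

def vietoris_rips(points, r, max_dim=2):
    """Simplices (as index tuples) whose vertices are pairwise within distance r."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= r}
    simplices = [(i,) for i in range(n)]
    for d in range(1, max_dim + 1):
        for s in combinations(range(n), d + 1):
            # A simplex enters as soon as all of its edges are present.
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices

# Four corners of a unit square, at three increasing scales.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    cx = vietoris_rips(square, r)
    print(r, [sum(1 for s in cx if len(s) == d + 1) for d in range(3)])
```

At the middle scale the four sides form a loop with no filling triangle; at the largest scale the diagonals appear and the loop is filled. Enumerating all subsets is exponential, so this is only suitable for tiny examples.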
Many software packages support the calculation of Vietoris–Rips persistence on point clouds. In this paper, we use the Eirene package developed by Gregory Henselman [8]. Other popular programs include Ulrich Bauer’s Ripser [9] and Vidit Nanda’s Perseus [10].
In Section 3.5, we shall use the witness complexes of de Silva and Carlsson [11]. The idea is to model the Delaunay triangulation on a smaller set of points $L \subset X$, called landmarks, in such a way that the topology of the underlying object is well-approximated. Moreover, the definition makes sense in any metric space, so assume that $X$ is a metric space with distance function $d$ (e.g., $X$ could be a finite point cloud in $\mathbb{R}^m$ with the usual Euclidean distance). Choose a subset $L$ of $X$ and let $R \ge 0$ be a real number.
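The witness complex $W(X, L, R)$ is defined as follows:
- The vertex set of $W(X, L, R)$ is $L$;
- $l_a, l_b \in L$ span an edge if there exists an $x_i \in X$, called a witness, such that $\max\{d(l_a, x_i), d(l_b, x_i)\} \le R + m_i$, where $m_i$ is the second-smallest distance from $x_i$ to a point of $L$;
- A collection $\{l_{a_0}, \dots, l_{a_p}\}$ spans a $p$-simplex if $l_{a_r}, l_{a_s}$ span an edge for all $r \ne s$.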
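The definition can be sketched in a few lines of code. This version assumes the edge condition in the form used by de Silva and Carlsson, namely that a witness $x$ admits the edge when both landmark distances are at most $R$ plus the second-smallest distance from $x$ to a landmark; that reading, and the name `witness_complex`, are assumptions rather than the paper's implementation.

```python
from itertools import combinations
from math import dist

def witness_complex(points, landmark_idx, R, max_dim=2):
    """Lazy witness complex on the given landmarks (needs at least two of them)."""
    landmarks = [points[i] for i in landmark_idx]
    edges = set()
    for x in points:  # every data point may act as a witness
        d = [dist(l, x) for l in landmarks]
        m = sorted(d)[1]  # second-smallest distance from x to a landmark
        for a, b in combinations(range(len(landmarks)), 2):
            if max(d[a], d[b]) <= R + m:
                edges.add((a, b))
    simplices = [(a,) for a in range(len(landmarks))] + sorted(edges)
    for p in range(2, max_dim + 1):
        for s in combinations(range(len(landmarks)), p + 1):
            # A p-simplex is included when all of its edges are present.
            if all(e in edges for e in combinations(s, 2)):
                simplices.append(s)
    return simplices

# Eight collinear points; landmarks at positions 0, 3 and 7.
points = [(float(x), 0.0) for x in range(8)]
small = witness_complex(points, [0, 3, 7], R=0.0)
large = witness_complex(points, [0, 3, 7], R=2.0)
print(small)
print(large)
```

In this toy example the complex with the smaller parameter is a path on the three landmarks, and increasing the parameter adds the remaining edge and the triangle. Simplices are reported as tuples of positions in `landmark_idx`.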
Examples of witness complexes are shown in Figure 2b alongside the associated Delaunay triangulation. Four landmark points were chosen using the maxmin procedure described below. The complexes on the left and on the right are built with two different values of $R$. Note that the larger value of $R$ yields a complex with more simplices. In addition, note that the witness complex is a coarse approximation of the Delaunay triangulation.
We make some observations about this definition. Let $D$ be the matrix of distances from points in $L$ to points in $X$:
$$D = \bigl(d(l_a, x_i)\bigr), \qquad l_a \in L,\ x_i \in X.$$
If $R = 0$, then $l_a, l_b$ form an edge if there is an $x_i \in X$ such that $d(l_a, x_i)$ and $d(l_b, x_i)$ are the two smallest entries in the $i$-th column of $D$. This is analogous to the existence of an edge in the Delaunay triangulation $\mathrm{Del}(L)$.
For $R > 0$, one may think of relaxing the boundaries of the Voronoi diagram of $L$ and taking the nerve of the resulting covering of $X$.
If $R \le R'$, then there is an inclusion of simplicial complexes $W(X, L, R) \subseteq W(X, L, R')$.
By a theorem of de Silva and Carlsson [11], this complex is a natural analogue of the Delaunay triangulation for a space represented by point cloud data.
Suppose that $X$ is a sample of points from some object $M \subset \mathbb{R}^m$. There is no guarantee that $W(X, L, R)$ recovers the topology of $M$, but experiments on familiar geometric objects [11] (spheres, for example) suggest that, for a suitable range of values of $R$ and good choices of landmarks $L$, the topology of $W(X, L, R)$ is the same as that of $M$. This raises two questions: how should the landmarks $L$ be chosen, and which value of $R$ should be used?
The second question is best handled via the use of persistent homology, which we discussed in Section 2.3 above. As for the choice of landmarks, there are three standard options:
The maxmin procedure yields more evenly-spaced landmarks, but tends to emphasize extremal points. It is generally more reliable than a random selection [11]. Another useful resource is [12]. In our experiments in Section 3.5 below, we use the maxmin process to generate landmarks.
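For orientation, the maxmin selection is usually described as a greedy farthest-point rule: start from one landmark, then repeatedly add the point whose distance to the nearest landmark chosen so far is largest. The sketch below follows that description; the function name and the `first` parameter (which fixes the starting point for reproducibility, where a random start is more common) are assumptions.

```python
from math import dist

def maxmin_landmarks(points, num_landmarks, first=0):
    """Greedy maxmin selection; returns indices of the chosen landmarks."""
    chosen = [first]
    # min_dist[i] = distance from point i to its nearest chosen landmark
    min_dist = [dist(p, points[first]) for p in points]
    while len(chosen) < num_landmarks:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(m, dist(p, points[nxt]))
                    for m, p in zip(min_dist, points)]
    return chosen

# Nine collinear points: the endpoints are picked first, then the midpoint.
line = [(float(x), 0.0) for x in range(9)]
landmarks = maxmin_landmarks(line, 3, first=0)
print(landmarks)
```

The example shows both features noted in the text: the landmarks are evenly spread, and the extremal points are selected early.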
2.5. Sampling Procedures
To build a Vietoris–Rips or witness complex on points in $G_k(\mathbb{R}^n)$, we need to develop a sampling procedure. The first question to be asked is in which Euclidean space do we embed $G_k(\mathbb{R}^n)$? This is highly nontrivial. Even in the case of projective spaces ($k = 1$), it is not so obvious how to proceed. A whole industry has been devoted to the question of the minimal embedding dimension of $\mathbb{RP}^n$ [13], but the proof of the minimality of any particular embedding rarely comes with an explicit formula for the map. An exception is if one insists on an isometric embedding [14], but the minimal dimension of such an embedding for $\mathbb{RP}^n$ is $n(n+3)/2$, which grows rather quickly.
For arbitrary Grassmannians, one could try to use the Plücker embedding $G_k(\mathbb{R}^n) \to \mathbb{P}(\Lambda^k \mathbb{R}^n)$ defined by $\operatorname{span}(v_1, \dots, v_k) \mapsto [v_1 \wedge \cdots \wedge v_k]$ (where $[v]$ denotes the line spanned by the vector $v$) and then embed the target projective space into Euclidean space. Of course, this explodes the dimension further, making this an impractical solution. Aside from some low dimensional projective spaces, we will instead approach this problem via the following result.
Proposition 2. The manifold $G_k(\mathbb{R}^n)$ is diffeomorphic to the smooth manifold consisting of all $n \times n$ symmetric, idempotent matrices of trace $k$. The map $\varphi$ realizing this takes a $k$-plane $X$ to the operator defined by orthogonal projection onto $X$.
Proof. If $X$ is a $k$-plane with orthonormal basis $v_1, \dots, v_k$, denote by $A$ the $n \times k$ matrix having the $v_i$ as columns. Define a map $\varphi$ by $\varphi(X) = AA^T$. This map is clearly smooth since it consists of polynomials in the entries of the various $v_i$. Moreover, it is well-defined since, if $w_1, \dots, w_k$ is another orthonormal basis of $X$ with associated matrix $B$, then there is an orthogonal $k \times k$ matrix $O$ such that $B = AO$. Then, $BB^T = AOO^TA^T = AA^T$. The matrix $\varphi(X)$ is symmetric: $(AA^T)^T = AA^T$. It is idempotent: $(AA^T)(AA^T) = A(A^TA)A^T = AA^T$ (note that $A^TA = I_k$, the $k \times k$ identity matrix, since the columns of $A$ are orthonormal). Finally, the trace of $\varphi(X)$ is $k$ since its rank is $k$ and its only eigenvalues are 0 and 1. Thus, the image of $\varphi$ lies in the set of symmetric, idempotent matrices of trace $k$. To see that $\varphi$ surjects onto this set, note that such a matrix $B$ is projection onto a $k$-dimensional subspace $X$ and there exists a basis with $\varphi(X) = B$. Injectivity of $\varphi$ follows since the subspace determined by a projection is unique. □
Now, to generate a sample of points on which to build a Vietoris–Rips or witness complex, we will use the embedding $\varphi$ of Proposition 2. A crude sampling is then obtained by the following procedure:
1. Select $k$ random vectors in $\mathbb{R}^n$.
2. Perform the Gram–Schmidt orthogonalization algorithm to yield an orthonormal set $v_1, \dots, v_k$. Let $A$ be the matrix with the $v_i$ as columns.
3. Compute $AA^T$.
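The three steps above can be sketched with nothing beyond the standard library. The choice of Gaussian entries for the random vectors is an assumption (the text only says "random"), and the helper names are illustrative.

```python
import random

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors given as lists of floats."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def projection_matrix(basis):
    """Return A A^T, where the orthonormal vectors in `basis` are the columns of A."""
    n = len(basis[0])
    return [[sum(b[i] * b[j] for b in basis) for j in range(n)]
            for i in range(n)]

def random_point(k, n, rng):
    """One crude sample: k random vectors in n-space, orthonormalized, then A A^T."""
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    return projection_matrix(gram_schmidt(vectors))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
P = random_point(2, 4, rng)
print("trace:", sum(P[i][i] for i in range(4)))
```

Flattening each returned matrix to a vector of length $n^2$ gives a point of the cloud; up to rounding, the matrix is symmetric, idempotent, and has trace $k$.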
One immediate problem with this process is that the $k$-plane it constructs lives in the top-dimensional Schubert cell with probability 1. However, since we know the space we are interested in, and we know its homology, we can bias our sample to ensure we include points from each Schubert cell. The following procedure implements this idea:
1. Determine the percentage of sample points desired from each Schubert cell. For example, one might choose 5% from a 1-cell, 10% from a 2-cell, and so on.
2. Elements of a given Schubert cell correspond to the column space of a particular matrix form. Generate such a matrix $B$ using random vectors of the required form.
3. Generate a random orthogonal matrix $X$.
4. Add the matrix $X P X^T$ to the point cloud, where $P$ is the orthogonal projection onto the column space of $B$.
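A possible rendering of this biased procedure is sketched below. Several points are assumptions and not taken from the text: the free entries of the Schubert form are drawn from a Gaussian, the random orthogonal matrix is obtained by orthonormalizing a Gaussian matrix, the conjugation is written as $Q P Q^T$ with $P$ the projection onto the column space, and the per-cell counts in `quota` are purely illustrative.

```python
import random

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors given as lists of floats."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def schubert_columns(sigma, n, rng):
    """Spanning vectors of Schubert form: the i-th vector has a 1 in (1-based)
    position sigma_i, random entries before it, and zeros after it."""
    return [[rng.gauss(0.0, 1.0) for _ in range(s - 1)] + [1.0] + [0.0] * (n - s)
            for s in sigma]

def biased_sample_point(sigma, n, rng):
    basis = gram_schmidt(schubert_columns(sigma, n, rng))
    A = transpose(basis)                      # n x k with orthonormal columns
    P = matmul(A, basis)                      # projection A A^T
    Q = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(n)]
                      for _ in range(n)])     # rows orthonormal, so Q is orthogonal
    return matmul(matmul(Q, P), transpose(Q))  # conjugate: Q P Q^T

rng = random.Random(1)
# Illustrative number of samples per Schubert symbol (2-planes in 4-space).
quota = {(1, 2): 1, (1, 3): 2, (1, 4): 3, (2, 3): 3, (2, 4): 5, (3, 4): 6}
cloud = [biased_sample_point(sigma, 4, rng)
         for sigma, count in quota.items() for _ in range(count)]
print(len(cloud), "points")
```

Each entry of `cloud` is again a symmetric idempotent matrix of trace 2, so it is a valid point of the embedded Grassmannian; a fresh orthogonal matrix is drawn for every sample.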
Note the final step above. If we merely took the matrix $B$, we would not end up with a well-distributed sample. For example, in the case of $G_2(\mathbb{R}^4)$, such a matrix lying in the 1-cell of the Schubert decomposition has the following form, with $a \in \mathbb{R}$:
$$B = \begin{pmatrix} 1 & 0 \\ 0 & a \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
The corresponding point in $\mathbb{R}^{16}$ would have most coordinates equal to 0, which is clearly not what we want. Conjugating the corresponding projection matrices by a random orthogonal matrix $X$ (a different $X$ for each $B$) yields a wider distribution of points in $\mathbb{R}^{16}$.