Isometry invariant shape recognition of projectively perturbed point clouds by the mergegram extending 0D persistence

Rigid shapes should be naturally compared up to rigid motion or isometry, which preserves all inter-point distances. The same rigid shape can often be represented by noisy point clouds of different sizes. Hence the isometry shape recognition problem requires methods that are independent of the cloud size. This paper studies stable-under-noise isometry invariants for the recognition problem stated in the harder form, where the given clouds can be related by affine or projective transformations. The first contribution is the stability proof for the invariant mergegram, which completely determines a single-linkage dendrogram in general position. The second contribution is the experimental demonstration that the mergegram outperforms other invariants in recognizing isometry classes of point clouds extracted from perturbed shapes in images.

The Farthest Point Sampling (FPS) has quadratic complexity in the number of points [20, section 3.6] and was successfully tested on small clouds.
The proposed invariant mergegram extends the 0-dimensional persistence in the area of Topological Data Analysis (TDA), which grew from the theory of size functions [28]. TDA views a point cloud A ⊂ R^m not by fixing any distance threshold but across all scales s, for example by blurring the given points to balls of a variable radius s. The resulting evolution of topological shapes is summarized by a persistence diagram, which is invariant under isometries of R^m. TDA can be combined with machine learning and statistical tools due to its stability under noise, which was first proved by Cohen-Steiner et al. [9] and then extended to a very general form by Chazal et al. [5]. The mergegram extends the persistence diagram PD(A) to a stronger invariant whose stability under perturbations in the above sense is proved in section 5 for the first time. The idea of a mergegram is related to the Reeb graph [24] or the merge tree [21] for the sublevel set filtration of a scalar function. The mergegram MG is defined at a more abstract level for any clustering dendrogram, which can be reconstructed from MG in general position.
Since both persistence diagrams and mergegrams are unordered collections of pairs, the experiments in section 6 will use the neural network PersLay [4], whose output is invariant under permutations of input points by design. PersLay extends the neural network DeepSet [29] for unordered sets and introduces new layers to specifically handle persistence diagrams, as well as a new form of representing such permutation-invariant layers. In other related work, deep learning was recently applied to the outputs of hierarchical clustering [13, 7, 17] and to 0-dimensional persistence [8, 14].

3 Single-linkage clustering and the invariant mergegram of a dendrogram

Example 3.1. Fig. 2 illustrates the key concepts before the formal Definitions 3.2, 3.4, 3.6 for the point cloud A = {0, 1, 3, 7, 10} in the real line R. Imagine that we gradually blur the original data points by growing disks of the same radius s around the given points.
The disks of the closest points 0, 1 start overlapping at the scale s = 0.5 when these points merge into one cluster {0, 1}. This merger is shown by blue arcs joining at the node at s = 0.5 in the single-linkage dendrogram, see the bottom left picture in Fig. 2. The persistence diagram PD in the bottom middle picture of Fig. 2 represents this merger by the pair (0, 0.5), meaning that a singleton cluster of (say) point 1 was born at the scale s = 0 and then died at s = 0.5 by merging into another cluster containing point 0. When the clusters {0, 1, 3} and {7, 10} merge at s = 2, this event is encoded in the persistence diagram by the single pair (0, 2), meaning that one cluster inherited from (say) point 7 was born at s = 0 and died at s = 2. The new mergegram in the bottom right picture of Fig. 2 represents the above merger by the following two pairs. The pair (1, 2) means that the cluster {0, 1, 3} merges at the current scale s = 2 and was previously formed at the smaller scale s = 1. The pair (1.5, 2) means that the other cluster {7, 10} merges at the scale s = 2 and was previously formed at s = 1.5.
The 0D persistence diagram represents the cluster of the whole cloud A by the pair (0, +∞), because A was inherited from a singleton cluster starting from s = 0. The mergegram represents the same cluster A by the pair (2, +∞), because A was formed during the last merger of {0, 1, 3} and {7, 10} at s = 2 and continues to live as s → +∞.
In the above dendrogram every vertical arc going up from a scale b to d contributes one pair (b, d) to the mergegram. So both singleton clusters {7}, {10} merging at s = 1.5 contribute one pair (0, 1.5) of multiplicity two, shown by two red circles in Fig. 2.

Definition 3.2 (single-linkage clustering). Let A be a finite set in a metric space X with a distance d : X × X → [0, +∞). Given a distance threshold, which will be called a scale s, any points a, b ∈ A should belong to one SL cluster if and only if there is a finite sequence a = a_1, . . . , a_m = b ∈ A such that any two successive points are at a distance of at most 2s, so d(a_i, a_{i+1}) ≤ 2s for i = 1, . . . , m − 1; equivalently, the closed disks of radius s around successive points overlap, as in Example 3.1. Let ∆_SL(A; s) denote the collection of SL clusters at the scale s. For s = 0, any point a ∈ A forms a singleton cluster {a}. Representing each cluster from ∆_SL(A; s) over all s ≥ 0 by one point, we get the single-linkage dendrogram ∆_SL(A) visualizing how clusters merge, see the first bottom picture in Fig. 2.
For any s > 0, all SL clusters ∆_SL(A; s) can be obtained as the connected components of a Minimum Spanning Tree MST(A) after removing all edges longer than 2s.

Definition 3.3 (partition set P(A)). For any set A, a partition of A is a finite collection of non-empty disjoint subsets A_1, . . . , A_k ⊂ A whose union is A. The single-block partition of A consists of the set A itself. The partition set P(A) consists of all partitions of A.

Definition 3.4 below extends a dendrogram from [3, section 3.1] to arbitrary (possibly infinite) sets A. Since every partition of A is finite by Definition 3.3, we don't need to add that an initial partition of A is finite. Non-singleton sets are now allowed.

Definition 3.4 (dendrogram). A dendrogram over a set X is a function ∆ : [0, +∞) → P(X) of a scale s ≥ 0 such that (3.4a) there is a scale r ≥ 0 for which ∆(s) is the single-block partition of X for all s ≥ r; (3.4b) if s ≤ t, then any set from ∆(s) is contained in some set from ∆(t); (3.4c) there are only finitely many merge scales, at which two or more sets merge into a larger merge set. Each merge set B born at a scale b and merging into a larger set at a scale d has the life interval life(B) = [b, d).

Example 3.5. Consider the dendrogram over the set {0, 1, 2} in which the singletons {0}, {1} merge at the scale s_1 = 1 and the resulting set merges with {2} at the scale s_2 = 2. Then the merge set {0, 1} has life = [1, 2). At the scale s_2 = 2 the only merge set {0, 1, 2} has life = [2, +∞). The notation ∆ is motivated as the first (Greek) letter of the word dendrogram and by the ∆-shape of a typical tree.
Condition (3.4a) says that the partition of X becomes trivial (a single block) for all sufficiently large scales. Condition (3.4b) means that if the scale s is increasing, then sets from a partition ∆(s) can only merge but cannot split. Condition (3.4c) implies that there are only finitely many mergers, when two or more subsets of X merge into a larger merge set.

The mergegram of a dendrogram and its stability

Definition 3.6 (mergegram MG(∆)). The mergegram of a dendrogram ∆ has the pair (birth, death) ∈ R^2 for each merge set B of ∆ with life(B) = [birth, death). If any life interval appears k times, the pair (birth, death) has the multiplicity k in MG(∆).
If our input is a point cloud A in a metric space, then the mergegram MG(∆_SL(A)) is an isometry invariant of A, because ∆_SL(A) depends only on inter-point distances. Though ∆_SL(A), like any dendrogram, is unstable under perturbations of points, the key advantage of MG(∆_SL(A)) is its stability, which will be proved in Theorem 5.8.
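To make this invariant concrete, here is a minimal sketch in Python (an illustration, not the authors' C++ implementation [11]) that computes MG(∆_SL(A)) for a finite cloud in R via Kruskal-style union-find, using the convention of Example 3.1 that clusters merge at half the length of the corresponding MST edge:

```python
# A minimal sketch of MG(Delta_SL(A)) for a finite cloud A in R.
from itertools import combinations

def mergegram(points):
    n = len(points)
    parent = list(range(n))      # union-find forest over indices of points
    birth = [0.0] * n            # birth scale of the cluster at each root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # process all pairwise half-distances in increasing order (Kruskal);
    # only MST edges trigger mergers, the rest are skipped below
    edges = sorted((abs(points[i] - points[j]) / 2, i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for s, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue             # i, j already lie in one SL cluster
        for r in (ri, rj):       # each merging cluster contributes (birth, s)
            if birth[r] < s:     # skip trivial pairs with the empty life [s, s)
                pairs.append((birth[r], s))
        parent[rj] = ri
        birth[ri] = s            # the new merge set is born at the scale s
    pairs.append((birth[find(0)], float('inf')))  # the whole cloud never dies
    return sorted(pairs)

print(mergegram([0, 1, 3, 7, 10]))
# [(0.0, 0.5), (0.0, 0.5), (0.0, 1.0), (0.0, 1.5), (0.0, 1.5),
#  (0.5, 1.0), (1.0, 2.0), (1.5, 2.0), (2.0, inf)]
```

Running the sketch on A = {0, 1, 3, 7, 10} reproduces all pairs of Example 3.1, including the multiplicity-two pairs (0, 0.5) and (0, 1.5).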
The set of real numbers can be considered as a category R in the following sense. The objects of R are all real numbers. Any two real numbers a ≤ b define a single morphism a → b. The composition of morphisms a → b and b → c is the morphism a → c. In this language, a persistence module is a functor from R to the category of vector spaces. A basic example of a persistence module is an interval module, which assigns the vector space Z_2 to every s in an interval J (and 0 otherwise), with identity maps inside J. An interval J between points p < q in R can be one of the following types: closed [p, q], open (p, q), half-open [p, q) or (p, q], encoded by the decorated pairs (p^-, q^+), (p^+, q^-), (p^-, q^-), (p^+, q^+), respectively. The endpoints p, q can also take the infinite values ±∞, but without superscripts.
We illustrate the abstract concepts above by geometric constructions. Let f : X → R be a continuous function on a topological space X. Its sublevel sets X^f_s = f^{-1}((-∞, s]) satisfy the inclusion X^f_s ⊂ X^f_r for any s ≤ r, so all sublevel sets form a nested sequence of subspaces within X, and these inclusions respect compositions similarly to a dendrogram ∆ in Definition 3.4. On a metric space X with a metric d : X × X → [0, +∞), a typical example of such a function is the distance d_A to a finite subset A ⊂ X: for any point p ∈ X, let d_A(p) be the distance from p to a closest point of A. For any r ≥ 0, the sublevel set X^{d_A}_r = {p ∈ X : d_A(p) ≤ r} is the union of closed balls with radius r and centers at all points of A. The above construction of a filtration {X^f_s} can be considered as a functor from R to the category of topological spaces. Below we discuss the simplest case of dimension 0.
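As a quick plain-Python check (ours, not from the paper), for A = {0, 1, 3, 7, 10} ⊂ R the sublevel set X^{d_A}_s is a union of intervals [a − s, a + s], and its number of connected components drops exactly at the merge scales s = 0.5, 1, 1.5, 2 of Example 3.1:

```python
# Count connected components of the union of intervals [a-s, a+s] for a in A.
def components(A, s):
    A = sorted(A)
    count = 1
    for a, b in zip(A, A[1:]):
        if b - s > a + s:     # a gap: consecutive intervals do not overlap
            count += 1
    return count

A = [0, 1, 3, 7, 10]
print([components(A, s) for s in (0.4, 0.6, 1.1, 1.6, 2.1)])  # [5, 4, 3, 2, 1]
```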

Example 4.3 (persistent homology). For any topological space X, the 0-dimensional homology H_0(X) is the vector space (with coefficients in Z_2) generated by all connected components of X. Let {X_s} be any filtration of nested spaces, e.g. the sublevel sets X^f_s based on a continuous function f : X → R. The inclusions X_s ⊂ X_r for s ≤ r induce linear maps between homology groups H_0(X_s) → H_0(X_r) and define the persistent homology {H_0(X_s)}, which satisfies the conditions of a persistence module from Definition 4.1.
If X is a finite set of m points, then H_0(X) is the direct sum Z_2^m of m copies of Z_2.
The persistence modules that can be decomposed as direct sums of interval modules can be described in a simple combinatorial way by persistence diagrams in R^2.

Definition 4.4 (persistence diagram PD(V)). Let a persistence module V be decomposed as a direct sum of interval modules. The persistence diagram PD(V) is the multiset of pairs (p, q) ∈ R^2, one for each interval module with endpoints p < q, counted with multiplicity. The 0-dimensional persistence diagram of a topological space X with a continuous function f : X → R is denoted by PD{H_0(X^f_s)}. Lemma 5.6 will prove that the merge module M(∆) of any dendrogram ∆ is decomposable into interval modules. Hence the mergegram MG(∆) from Definition 3.6 can be interpreted as the persistence diagram of M(∆).
The following result describes how the persistence diagram PD of the distance-based filtration of any point cloud A can be obtained from the mergegram MG(∆_SL(A)).
Theorem 4.5. For any finite point cloud A, the 0D persistence diagram PD{H_0(X^{d_A}_s)} contains each pair (0, s) exactly #d − #b times, where #d is the number of deaths d_i = s and #b is the number of births b_i = s among the pairs (b_i, d_i) ∈ MG(∆_SL(A)); all trivial pairs (0, 0) are ignored. Since two clouds can have equal 0D persistence diagrams but different mergegrams, Theorem 4.5 justifies that the mergegram is strictly stronger than 0D persistence as an isometry invariant of a point cloud.
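The following minimal check (ours, assuming the multiplicity convention restated above) recovers the 0D persistence diagram of Example 3.1 from its mergegram:

```python
# Multiplicity of (0, s) in PD equals (#deaths at s) - (#births at s) over MG.
from collections import Counter

def pd_from_mergegram(mg):
    births = Counter(b for b, d in mg)
    deaths = Counter(d for b, d in mg)
    return [(0, s) for s in sorted(set(deaths))
            for _ in range(deaths[s] - births[s])]

mg = [(0, 0.5), (0, 0.5), (0, 1), (0.5, 1), (0, 1.5), (0, 1.5),
      (1, 2), (1.5, 2), (2, float('inf'))]
print(pd_from_mergegram(mg))
# [(0, 0.5), (0, 1), (0, 1.5), (0, 2), (0, inf)]
```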
New Reconstruction Theorem 4.6 below can be contrasted with the weakness of 0D persistence PD{H_0(X^{d_A}_s)}, which consists only of pairs (0, s) whose finite deaths are half-lengths of edges in a Minimum Spanning Tree MST(A). In Example 3.1 these scales s = 0.5, 1, 1.5, 2 are insufficient to reconstruct the SL dendrogram in Fig. 2. Such a unique reconstruction is possible by using the richer invariant mergegram as follows. At the smallest merge scale s > 0, the births must be b_1 = b_2 = 0. We start drawing a dendrogram ∆ by merging any two points of A at this smallest scale s. To realize a merger at any larger s, we should select two clusters representing the pairs (b_1, s) and (b_2, s).
If b_i = 0, then we take any of the unmerged points of A. If b_i > 0, then the already constructed dendrogram contains a unique non-singleton cluster determined by the scale b_i ∈ (0, s). Hence at any merge scale s we know how to select two clusters to merge. The only choice comes from choosing points of A or permuting nodes of ∆.
Following the above proof of Theorem 4.6 for the cloud A = {0, 1, 3, 7, 10} in Example 3.1, the first two pairs (0, 0.5) ∈ MG(∆ SL (A)) indicate that we should merge two points of A at s = 0.5. The scale s = 0.5 uniquely determines this 2-point cluster.
The next two pairs (0, 1), (0.5, 1) mean that the above cluster born at s = 0.5 should merge at s = 1 with a singleton cluster (any free point of A). The resulting 3-point cluster is uniquely determined by its merge scale s = 1. The further two pairs (0, 1.5), (0, 1.5) say that a new 2-point cluster is formed at s = 1.5 by the two remaining points of A.
The final pairs (1, 2), (1.5, 2) tell us to merge at s = 2 the two clusters formed earlier at s = 1 and s = 1.5. The resulting dendrogram ∆ has the expected combinatorial structure as in Fig. 2, though we can draw ∆ in another way by permuting points of A.
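The reconstruction procedure above can be sketched in code as follows (our Python illustration under the general position assumption that at most one non-singleton cluster is born at each scale; the labels p0, p1, . . . are hypothetical placeholders for points of A):

```python
# Rebuild the merge structure of a dendrogram from its mergegram.
from collections import defaultdict

def reconstruct_dendrogram(mg):
    finite = sorted((b, d) for b, d in mg if d != float('inf'))
    by_death = defaultdict(list)
    for b, d in finite:
        by_death[d].append(b)
    born_at = {}                 # birth scale -> unique non-singleton cluster
    free = 0                     # counter of not-yet-merged abstract points
    merges = []
    for s in sorted(by_death):   # process merge scales in increasing order
        clusters = []
        for b in by_death[s]:
            if b == 0:           # a singleton: take any unmerged point of A
                clusters.append(frozenset({f'p{free}'}))
                free += 1
            else:                # the unique cluster formed earlier at scale b
                clusters.append(born_at.pop(b))
        born_at[s] = frozenset().union(*clusters)
        merges.append((s, born_at[s]))
    return merges

mg = [(0, 0.5), (0, 0.5), (0, 1), (0.5, 1), (0, 1.5), (0, 1.5),
      (1, 2), (1.5, 2), (2, float('inf'))]
for s, cluster in reconstruct_dendrogram(mg):
    print(s, sorted(cluster))
# 0.5 ['p0', 'p1']  /  1 ['p0', 'p1', 'p2']  /  1.5 ['p3', 'p4']
# 2 ['p0', 'p1', 'p2', 'p3', 'p4']
```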

Stability of the mergegram for any single-linkage dendrogram
This section fully proves the stability of the mergegram, which was stated in [11, Theorem 7.4] without proofs of the key Lemmas 5.6 and 5.7. For simplicity, we consider vector spaces with coefficients in Z_2 = {0, 1}, which can be replaced by any field.
Definition 5.1 introduces homomorphisms between persistence modules, which are needed to state the stability of persistence diagrams PD{H_0(X^f_s)} under perturbations of a function f : X → R. This result will imply the stability of the mergegram MG(∆_SL(A)) for the dendrogram ∆_SL(A) of the single-linkage clustering of a set A ⊂ X.

Definition 5.1 (a homomorphism of degree δ between persistence modules). Let U and V be persistence modules over R with the structure maps u^t_s : U_s → U_t and v^t_s : V_s → V_t. A homomorphism U → V of degree δ is a collection of linear maps φ_s : U_s → V_{s+δ}, s ∈ R, such that v^{t+δ}_{s+δ} ∘ φ_s = φ_t ∘ u^t_s for all s ≤ t. Let Hom^δ(U, V) be the set of all homomorphisms U → V of degree δ. Persistence modules U, V are isomorphic if they have mutually inverse homomorphisms U → V and V → U of degree 0.
For a persistence module V with structure maps v^t_s : V_s → V_t, the simplest example of a homomorphism of degree δ ≥ 0 is the shift 1^δ_V : V → V given by the maps v^{s+δ}_s : V_s → V_{s+δ}; the maps v^t_s defining the structure of V shift all vector spaces V_s by the difference δ = t − s.
The concept of interleaved modules below is an algebraic generalization of a geometric perturbation of a set X in terms of (the homology of) its sublevel sets X_s.

Definition 5.2 (interleaving distance ID). Persistence modules U, V are δ-interleaved for δ ≥ 0 if there are homomorphisms φ ∈ Hom^δ(U, V) and ψ ∈ Hom^δ(V, U) such that ψ ∘ φ = 1^{2δ}_U and φ ∘ ψ = 1^{2δ}_V. The interleaving distance is ID(U, V) = inf{δ ≥ 0 | U and V are δ-interleaved}.

Definition 5.3 (bottleneck distance BD). Let multisets C, D contain finitely many points (p, q) ∈ R^2, p < q, of finite multiplicity and all diagonal points (p, p) ∈ R^2 of infinite multiplicity. For δ ≥ 0, a δ-matching is a bijection h : C → D such that |h(a) − a|_∞ ≤ δ in the L∞-distance on R^2 for any point a ∈ C. The bottleneck distance between persistence modules U, V is BD(U, V) = inf{δ | there is a δ-matching between PD(U) and PD(V)}.
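Since mergegrams are finite multisets of pairs, their bottleneck distance can be computed with standard TDA software. A hypothetical usage sketch, assuming the third-party GUDHI library is installed:

```python
import gudhi  # assumption: the GUDHI Python package provides bottleneck_distance

mg1 = [(0, 0.5), (0, 0.5), (0, 1), (0.5, 1)]
mg2 = [(0, 0.55), (0, 0.45), (0, 1.1), (0.5, 0.9)]
# each pair is matched within L-infinity distance delta or to the diagonal
print(gudhi.bottleneck_distance(mg1, mg2))  # 0.1 for this small perturbation
```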
The original stability of persistence for sequences of sublevel sets was extended as Theorem 5.4 to q-tame persistence modules. A persistence module V is q-tame if any non-diagonal square in the persistence diagram PD(V) contains only finitely many points, see [5, section 2.8]. Any finitely decomposable persistence module is q-tame.

Theorem 5.4 (stability of persistence [5]). If persistence modules U, V are q-tame, then BD(U, V) ≤ ID(U, V) in the interleaving distance of Definition 5.2.

We first prove that φ_r is well-defined. If r ∈ life(A), then A ∈ M_r(∆). We know that M_r(∆) is generated by the elements A ∈ ∆(r) for which r ∈ life(A). Thus the compositions satisfy φ_r ∘ ψ_r = id and ψ_r ∘ φ_r = id. It remains to prove that the morphisms behave correctly under the functors ψ, φ. The proofs for both cases are essentially the same, so we prove the claim only for ψ. The goal is to prove that the following diagram commutes, where i^t_s is the direct sum of the corresponding maps (i^t_s)_A of interval modules.

New experiments on isometry recognition of substantially distorted real shapes
This section fulfills the final condition (d) of Problem 1.1 by experimentally comparing the mergegram with 0D persistence and with distributions of distances to neighbors on 15000 clouds. The earlier paper [11] ran experiments only on randomly generated clouds.
We considered 15 classes of shapes represented by black-and-white images of mythical creatures [2], see Fig. 1. These shapes were chosen to make the shape recognition problem truly challenging. Indeed, similar creatures from this dataset are represented by slightly different shapes, which can be hard to distinguish up to isometry. For example, several images of a horse differ only in minor features such as a saddle or a different tail, which makes the horses nearly identical.
Shape generation. For each image, we generated 1000 perturbed images by affine and projective transformations to get 15000 distorted shapes split into 15 classes.
First we rotated each image around its central point by an angle generated uniformly in the interval [0, 2π), using the function cv::rotate from the OpenCV library. If needed, we extended the resulting image to fit all black pixels of the rotated shape into a bounding box. Then both affine and projective transformations distort each image by using a noise parameter δ such that the value δ = 0 represents the identity transformation. Projective transformations are implemented as compositions of the already applied rotations above and the OpenCV function cv::getPerspectiveTransform(), which is parametrized by a 4-point array v = (a_0, a_1, a_2, a_3) of points a_i ∈ Z^2, i = 0, 1, 2, 3. This function maps the four corners of the w × h image to the points a_0, a_1, a_2, a_3, which uniquely determines the projective transformation of the rectangle w × h. The points a_i are randomly sampled by using the noise parameter δ in two ways (see the sketch after the figure caption below):
• Uniform noise: each coordinate has a uniform distribution controlled by the noise parameter δ.
• Gaussian noise: each coordinate has a Gaussian distribution truncated to the image.

Fig. 8: Generating distorted shapes by applying random rotations, affine and projective transformations, which substantially affect the extracted clouds of Harris corner points [27], shown in red.
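A minimal sketch of the corner-noise step (using the Python cv2 bindings instead of the paper's C++ calls; the exact sampling of the points a_i may differ from ours):

```python
import numpy as np
import cv2

def random_projective(img, delta, rng=np.random.default_rng()):
    """Perturb the 4 corners of img by uniform noise of relative magnitude
    delta and warp by the resulting projective transformation."""
    h, w = img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    noise = rng.uniform(-delta, delta, size=(4, 2)) * [w, h]
    M = cv2.getPerspectiveTransform(corners, (corners + noise).astype(np.float32))
    return cv2.warpPerspective(img, M, (w, h))
```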

Point cloud extraction. For each distorted image, we extracted classical Harris corner points [27] due to their simplicity, see the red points in Fig. 8. For detecting corners, the OpenCV function cv::cornerHarris was used with the parameters blockSize = 3, apertureSize = 5, k = 0.04, thresh = 120. However, one can use any reliable alternative such as FAST [26] or the scale-invariant feature transform (SIFT) [19].
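A sketch of this extraction step with the stated parameters (Python cv2; rescaling the Harris response to [0, 255] before applying the integer threshold is our assumption):

```python
import numpy as np
import cv2

def harris_cloud(img, block_size=3, aperture_size=5, k=0.04, thresh=120):
    """Return the (x, y) coordinates of Harris corners as a point cloud."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    response = cv2.cornerHarris(gray, block_size, aperture_size, k)
    # rescale the response so that the integer threshold from the paper applies
    response = cv2.normalize(response, None, 0, 255, cv2.NORM_MINMAX)
    ys, xs = np.where(response > thresh)
    return np.column_stack([xs, ys]).astype(np.float32)
```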
After describing the available point cloud data above, we specify condition (1.1d) of Isometry Recognition Problem 1.1 in the context of supervised machine learning.

Problem 6.1 (experimental recognition). Given a labeled dataset split into classes of similar but projectively distorted shapes, develop a supervised learning tool to recognize the class of a distorted shape with high accuracy despite substantial noise.
Since all isometry invariants are independent of point ordering, the most suitable neural network is PersLay [4], whose output is invariant under permutations by design. Each layer is a combination of a coefficient layer ω : R^m → R, a transformation layer φ : R^m → R^q and a permutation-invariant operation op, combined as PersLay(D) = op({ω(p) · φ(p) : p ∈ D}) over a diagram D. Coordinates of all input points are linearly normalized to [0, 1]. We used the following parameters of the PersLay network for all experiments below.
The max layer MAX(q) consists of the following functions.
The coefficient layer ω : R^m → R is the weight ω(x_1, . . . , x_m) = k|x_1 − x_2|, where k is a trainable scalar and the dimension is typically m = 2. The transformation layer φ : {diagrams of points in R^m} → R^q is the function φ(D) = Σ_{p∈D} λp + γ·maxpool(D) + β, where λ, γ are trainable m × q matrices, β ∈ R^q is a trainable vector, and maxpool(D) returns, for every i = 1, . . . , m, the maximum of the i-th coordinates over all points of D. The operational layer op : R^q → R^t puts all coordinates in increasing order and composes the result with a standard densely connected layer [12] Dense : R^q → R^t.
The output is a vector in R^t for t = 15 image classes. A final prediction is obtained by choosing the class with the largest coordinate in the output vector.
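A toy TensorFlow sketch in the spirit of the MAX(q) layer above (our simplified illustration with dense fixed-size batches; the real PersLay implementation handles variable-size diagrams and is not reproduced here):

```python
import tensorflow as tf

class ToyMaxLayer(tf.keras.layers.Layer):
    """Permutation-invariant layer: weight * pointwise map, reduced by max."""
    def __init__(self, q=75):
        super().__init__()
        self.dense = tf.keras.layers.Dense(q, activation='relu')
        self.k = self.add_weight(name='k', shape=(), initializer='ones')

    def call(self, diagrams):                     # shape (batch, n_points, 2)
        # per-point weight k * |x1 - x2|, as in the coefficient layer above
        w = self.k * tf.abs(diagrams[..., 0] - diagrams[..., 1])
        phi = self.dense(diagrams)                # pointwise transformation
        return tf.reduce_max(w[..., None] * phi, axis=1)  # invariant reduction
```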
The image layer Im[x, y] for integer parameters x, y and a multiset of points in the unit square [0, 1]^2 consists of the following functions.
The coefficient layer ω : R^2 → R is a piecewise constant function trained on x · y parameters, defined on the partition P(x, y) of the unit square into x · y rectangles [i/x, (i+1)/x] × [j/y, (j+1)/y] for i = 0, . . . , x − 1 and j = 0, . . . , y − 1.
Let φ_p : R^2 → R be the Gaussian density centered at a point p with a trainable standard deviation σ. The transformation layer φ : R^2 → R^{xy} consists of the xy functions φ_p, where p runs over all centroids of the partition P(x, y).
The operation layer op takes the sum over the given point cloud. A final prediction is made by composing the operation layer with the Dense layer.
Finally, the PersLay network used the optimizer tf.keras.adam with the standard learning rate 0.01 and 150 epochs, the loss function SparseCategoricalCrossEntropy, an 80:20 split into training and testing data, and a 5-fold Monte Carlo cross-validation for each run (see the configuration sketch at the end of this section). Fig. 9, 10, 11 show that the mergegram MG consistently outperforms two other isometry invariants: 0D persistence and the multiset NN(4) consisting of distances to the 4 nearest neighbors per given point. The simpler multiset NN(2) performed worse. A given cloud C ⊂ R^2 was considered as a baseline input. The noise factor δ reached 25%, which means that original images were distorted by up to a quarter of the image size.
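A hedged sketch of the stated training configuration (the toy model and random data below are placeholders, not the authors' pipeline):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation='relu'),
                             tf.keras.layers.Dense(15)])  # 15 shape classes
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
x = np.random.rand(100, 8).astype('float32')  # placeholder invariant vectors
y = np.random.randint(0, 15, size=100)        # placeholder class labels
model.fit(x, y, epochs=150, validation_split=0.2, verbose=0)  # 80:20 split
```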

A discussion of novel contributions and further open problems
This paper has further demonstrated that the provably stable-under-noise invariant mergegram of a dendrogram is a fast and efficient tool in the challenging problem of isometry shape recognition, especially for substantially distorted images.
In comparison with the conference version [11], section 4 proved the new Theorem 4.6 describing how to reconstruct a single-linkage dendrogram in general position from its much simpler mergegram. It is hard to define a continuous metric between dendrograms, especially because they can be unstable under perturbations. Theorem 4.6 allows us to measure a continuous similarity between dendrograms in general position as the bottleneck distance between their unique mergegrams. This distance can be computed in time O(n^{1.5} log n) [18] for diagrams consisting of at most n points.
Section 5 provided a full proof of the stability of the mergegram under perturbations of points, while the earlier paper [11] only announced this result without proving the highly non-trivial Lemmas 5.6 and 5.7, which required heavy algebraic machinery. Example 3.1 and the discussion following Theorem 4.5 justify that the invariant mergegram is strictly stronger than 0D persistence. This theoretical fact is now confirmed by the new experiments on 15000 point clouds extracted from substantially distorted real shapes. In Fig. 9, 10, 11 the mergegram outperformed the other isometry invariants. Since the distribution NN(2) of distances to the two closest neighbors per point performed badly, we strengthened this invariant to NN(4) using distances to the four nearest neighbors. However, even NN(4) always performed worse than the original point cloud, which cannot be considered an isometry invariant. For the very high levels of 20% and 25% distortions in projective transformations, the PersLay network trained on a point cloud achieved high recognition rates, because we extensively tried many parameters in the layers MAX(75) and Im[20, 20] for the best trade-off between accuracy and speed. The C++ code for the mergegram is at [11].
We thank all reviewers in advance for their valuable time and helpful suggestions.