A Note on Ultrametric Spaces, Minimum Spanning Trees and the Topological Distance Algorithm

Abstract: We relate the definition of an ultrametric space to the topological distance algorithm, an algorithm defined in the context of peer-to-peer network applications. Although greedy algorithms for constructing minimum spanning trees, such as Prim's or Kruskal's algorithm, have been known for a long time, they require the complete graph to be specified and the weights of all edges to be known upfront. However, if the weights of the underlying graph stem from an ultrametric, the minimum spanning tree can be constructed incrementally, and it is not necessary to know the full graph in advance. This is possible because, owing to the ultrametric property, the join algorithm responsible for adding new nodes in the topological distance algorithm is independent of the order in which the nodes are added. Apart from its mathematical elegance, which some readers might find interesting in itself, this approach provides not only proofs (clearer ones, in the opinion of the author) of the optimality theorems, i.e., of the minimum spanning tree construction, but also a simple proof of the optimality of the reconstruction algorithm, which was omitted in previous publications. Furthermore, we define a new algorithm by extending the join algorithm to minimize the topological distance and the (network) latency together, and we provide a correctness proof.


Introduction
Our motivation stems from algorithms for peer-to-peer network applications, as considered in [1,2]. In these papers, a simple distance based on IP addresses, called the topological distance, was introduced along with a greedy topological distance algorithm that computes (spanning) trees of minimum weight with respect to this distance. It is important to note that, unlike, e.g., Prim's algorithm, the topological distance algorithm computes spanning trees incrementally (see Remark 2 below), which is essential for the applications considered. In this short note, we relate the topological distance and the greedy algorithm to the theory of greedy algorithms on matroids and greedoids, where the objective function stems from an ultrametric instead of a usual metric. Our contributions and motivations are threefold. First, our approach generalizes the results of [1] to arbitrary metrics satisfying the axioms of an ultrametric space. To the best of our knowledge, the connection between the topological distance algorithm, ultrametric spaces, and the abstract discrete mathematics of matroids and greedoids has not been published before, and we hope that practitioners will find our remarks useful, or at least appealing. Second, our approach yields simpler and more elegant proofs of some of the results proven in [1] and enables extensions of the algorithms presented therein. Third, even if the metric involved does not possess the properties of an ultrametric, our results can nonetheless be useful in practice, as the relevant metrics are often close to an ultrametric.
For example, in networking applications in which nodes are connected to each other via a combination of local and global area networks, the network parameters of distance and latency of the associated routing paths differ primarily due to the effects of the (few) long-distance routes involved, and hence can be approximated quite well by an ultrametric; see the remarks in Section 3.3.

Introduction and Notation
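For the reader's convenience and to make this note self-contained, we first fix our notation and recall the definitions of ultrametric spaces, matroids, and greedoids; see [3][4][5][6].

Definition 1. A metric space (X, d) is called ultrametric iff its metric d (distance function) satisfies the strong triangle inequality:
$$d(x, z) \le \max\{d(x, y),\, d(y, z)\} \quad \text{for all } x, y, z \in X.$$

Corollary 1. Let (X, d) be an ultrametric space; then every triangle is an acute isosceles triangle, i.e., for any three points x, y, and z, the two largest of the three distances d(x, y), d(y, z), d(x, z) are equal. In particular,
$$d(x, y) < d(x, z) \implies d(y, z) = d(x, z).$$

Proof. This follows immediately from the definition: applying the strong triangle inequality to each side in turn shows that no side can be strictly longer than both other sides, hence the two largest sides coincide.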
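To make Definition 1 and Corollary 1 concrete, the following Python sketch checks the strong triangle inequality by brute force on a finite distance matrix and tests the "two largest sides are equal" property of Corollary 1. Representing the metric as a matrix and the helper names `is_ultrametric` and `two_largest_equal` are illustrative assumptions, not part of the original text.

```python
from itertools import product

def is_ultrametric(D):
    """Brute-force check of the strong triangle inequality
    d(x, z) <= max(d(x, y), d(y, z)) for a symmetric distance matrix D."""
    n = len(D)
    return all(D[x][z] <= max(D[x][y], D[y][z])
               for x, y, z in product(range(n), repeat=3))

def two_largest_equal(a, b, c):
    """Corollary 1: in an ultrametric triangle the two largest sides coincide."""
    s = sorted((a, b, c))
    return s[1] == s[2]

# Two clusters {0, 1} and {2, 3}: distance 1 (resp. 2) inside, 3 across.
U = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]
# Three points on a line at positions 0, 1, 2 (ordinary Euclidean metric).
E = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]

print(is_ultrametric(U))  # True
print(is_ultrametric(E))  # False: d(0, 2) = 2 > max(d(0, 1), d(1, 2)) = 1
print(all(two_largest_equal(U[x][y], U[y][z], U[x][z])
          for x, y, z in product(range(4), repeat=3)))  # True
```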

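Definition 2. A matroid M on a finite ground set X is a pair (X, F) with $F \subseteq 2^X$ satisfying:
M1 $\emptyset \in F$.
M2 If $A \in F$ and $B \subseteq A$, then $B \in F$ (hereditary property).
M3 If $A, B \in F$ and $|A| > |B|$, then there exists $x \in A \setminus B$ such that $B \cup \{x\} \in F$ (exchange axiom).
A greedoid M on a finite ground set X is a pair (X, F) with $F \subseteq 2^X$ satisfying M1 and M3.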
If, e.g., for an undirected graph G = (X, E) we define $F := \{T \subseteq E \mid T \text{ does not contain a cycle}\}$, then it is well known that (E, F) is a matroid (the graphic matroid of G), and hence a greedoid; see [4].
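As an illustration of Definition 2 and the graphic matroid mentioned above, the sketch below enumerates all cycle-free edge subsets of the complete graph on four vertices (using a small union-find structure) and verifies axioms M1 to M3 by brute force. The function names are hypothetical and chosen for this example only.

```python
from itertools import combinations

def is_forest(num_vertices, edges):
    """True iff the edge set contains no cycle (union-find with path halving)."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:        # u and v are already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True

def independent_sets(num_vertices, edges):
    """All cycle-free edge subsets, i.e., the family F of the graphic matroid."""
    return [frozenset(s) for k in range(len(edges) + 1)
            for s in combinations(edges, k) if is_forest(num_vertices, s)]

def check_matroid_axioms(F):
    Fs = set(F)
    m1 = frozenset() in Fs
    m2 = all(frozenset(B) in Fs for A in F
             for k in range(len(A)) for B in combinations(A, k))
    m3 = all(any(B | {x} in Fs for x in A - B)
             for A in F for B in F if len(A) > len(B))
    return m1, m2, m3

K4_edges = list(combinations(range(4), 2))
F = independent_sets(4, K4_edges)
print(len(F))                   # 38 forests (1 + 6 + 15 + 16 by size)
print(check_matroid_axioms(F))  # (True, True, True)
```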
Any undirected graph G equipped with a metric d can be considered a weighted, undirected graph by assigning the weight d(x, x') to each edge {x, x'}. For any finite weighted graph G with edge set E(G), we denote by d(G) the total distance (or total weight) of the graph, defined as
$$d(G) := \sum_{\{x, x'\} \in E(G)} d(x, x').$$
In the sequel, let $X = \{x_1, \dots, x_n\}$ be a finite set equipped with a metric d. For any such set, one can naturally define a complete graph K = (X, E), i.e., an undirected, connected graph in which every pair of distinct vertices is connected by a unique edge, by taking X as the vertex set and $E := \{\{x_i, x_j\} \mid x_i, x_j \in X,\ x_i \neq x_j\}$ as the edge set. Using the metric d, we turn K into a weighted complete graph and define its total distance as above. Now, for a complete graph K and a given ordering of its vertices, i.e., a sequence $S = (x_1, \dots, x_n)$ of the vertices of K, we construct trees T sequentially by applying the following greedy algorithm:

Definition 3. Let (K, d) be a complete graph equipped with a metric d. Then, for any ordered sequence $S = (x_1, \dots, x_n)$ of the vertices of K, we define the join algorithm by the following steps:
1. Start with the tree T consisting of the single vertex $x_1$ and no edges.
2. For i = 2 to n, repeat step 3:
3. Find the nearest neighbor $x_j$ of $x_i$ in T with respect to d, and add $x_i$ together with the connecting edge $\{x_i, x_j\}$ to T.
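The steps of Definition 3 translate almost literally into code. The following Python sketch is a minimal illustration, assuming vertices are hashable labels and d is any callable metric; ties in step 3 are broken by taking the first candidate in insertion order, a choice the definition leaves open.

```python
def join(order, d):
    """Join algorithm (Definition 3).

    order: sequence of vertices (x_1, ..., x_n); d: callable metric d(u, v).
    Returns the tree edges as (new vertex, attached-to vertex) pairs."""
    tree = [order[0]]                                  # step 1: T = {x_1}
    edges = []
    for x in order[1:]:                                # step 2: i = 2, ..., n
        nearest = min(tree, key=lambda y: d(x, y))     # step 3: nearest neighbor in T
        edges.append((x, nearest))
        tree.append(x)
    return edges

def total_distance(edges, d):
    """Total distance d(T): the sum of the edge weights."""
    return sum(d(u, v) for u, v in edges)

# Ultrametric on four vertices: clusters {0, 1} and {2, 3}.
D = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]

def d(u, v):
    return D[u][v]

print(join([0, 1, 2, 3], d))                     # [(1, 0), (2, 0), (3, 2)]
print(total_distance(join([0, 1, 2, 3], d), d))  # 6
print(total_distance(join([2, 0, 3, 1], d), d))  # 6
```

Different orders produce different trees here, but the same total distance, in line with the results below.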

Remark 1.
While the join algorithm looks similar to the well-known Prim's algorithm [4] applied to a complete graph K, there is an important difference: in Prim's algorithm, the next vertex to be added to the tree is chosen among all vertices not yet in the tree as the one with minimum distance to the tree, whereas in the join algorithm the next vertex is simply dictated by the given ordering of the vertices, and only its connecting edge is chosen greedily as the one with minimum distance.
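The difference described in Remark 1 can be seen by placing a dense O(n²) variant of Prim's algorithm next to the join algorithm. In this hedged sketch (helper names are hypothetical), `prim` decides which vertex to add next, while `join` takes the vertices in a prescribed order; feeding Prim's insertion order into `join` reproduces the same edge weights, since each vertex is then attached to its nearest neighbor in the same vertex set.

```python
def prim(vertices, d):
    """Dense Prim's algorithm: repeatedly pick, among ALL vertices outside the
    tree, the one closest to the tree. Returns (edges, insertion order)."""
    root, rest = vertices[0], vertices[1:]
    best = {v: (d(v, root), root) for v in rest}   # v -> (distance to tree, attachment)
    order, edges = [root], []
    while best:
        v = min(best, key=lambda u: best[u][0])    # Prim chooses the next vertex
        _, attach = best.pop(v)
        edges.append((v, attach))
        order.append(v)
        for u in best:                              # update distances to the grown tree
            if d(u, v) < best[u][0]:
                best[u] = (d(u, v), v)
    return edges, order

def join(order, d):
    """Join algorithm: the order is prescribed, only the edge is chosen greedily."""
    tree, edges = [order[0]], []
    for x in order[1:]:
        edges.append((x, min(tree, key=lambda y: d(x, y))))
        tree.append(x)
    return edges

def weight(edges, d):
    return sum(d(u, v) for u, v in edges)

D = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]

def d(u, v):
    return D[u][v]

prim_edges, prim_order = prim([0, 1, 2, 3], d)
print(prim_order, weight(prim_edges, d))     # [0, 1, 2, 3] 6
print(weight(join(prim_order, d), d))        # 6: same weight along Prim's order
```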
Obviously, as trivial examples show, the trees constructed in general depend on the order in which the vertices are added. We prove in the next subsection (see Proposition 1), however, that for an ultrametric d the total distance does not depend on the order. Furthermore (see Theorem 1), the join algorithm is guaranteed to construct a minimum spanning tree for any given ultrametric d. The ultrametric property is essential, as simple examples show that the join algorithm fails to construct a minimum spanning tree for arbitrary metrics. Indeed, by Proposition 2 (see also Corollary 2), the ultrametric property is also necessary; i.e., if the join algorithm always constructs a minimum spanning tree, the metric must be an ultrametric.

Optimality Results
Proposition 1 (Permutation independence of the join algorithm). Assume that K is endowed with a metric d satisfying the strong triangle inequality, i.e., (K, d) is an ultrametric space. Let π denote a permutation of 1, ..., n. For n vertices $x_1, \dots, x_n$, denote by $T_{(1,\dots,n)}$ the tree obtained by joining $x_1, \dots, x_n$ in this order with the greedy join algorithm, and by $T_{\pi(1,\dots,n)}$ the tree obtained by joining $x_{\pi(1)}, \dots, x_{\pi(n)}$ in the permuted order. Then
$$d(T_{(1,\dots,n)}) = d(T_{\pi(1,\dots,n)}).$$
Proof. It is well known that any permutation can be written as a product of adjacent transpositions (this follows, e.g., immediately from the correctness of sorting algorithms such as bubble sort or insertion sort). Moreover, the weight of the edge added for any later vertex depends only on the set of vertices already in the tree, not on the shape of the tree. Hence, it is enough to prove the proposition for an adjacent transposition of two newly added vertices x and y. We consider several cases (omitting the trivial case n = 1):

1. Case n = 2: By symmetry of d, for any two vertices x, y ∈ K we have $d(T_{(x,y)}) = d(T_{(y,x)})$.

2. Case n > 2: Denote by $T_{(Z,x,y)}$ the tree obtained by joining x and y, in this order, to a given tree Z with the join algorithm.

(a) Assume that x and y are connected by an edge, i.e., that the tree $T_{(Z,x,y)}$ looks as depicted in Figure 1a, where x is attached to its nearest neighbor z in Z. We consider three cases 2(a)i-2(a)iii:

i. Let d(x, z) > d(x, y); then by Corollary 1, d(z, y) = d(x, z). Hence, if we define T' as in Figure 1b, we obtain $d(T_{(Z,x,y)}) = d(T')$, i.e., T' is equivalent to $T_{(Z,x,y)}$ with regard to its total distance. Since T' can be constructed by first joining y and then x, and the greedy choices of the join algorithm cannot produce a larger total distance, we conclude
$$d(T_{(Z,y,x)}) \le d(T') = d(T_{(Z,x,y)}).$$

ii. Let d(x, z) < d(x, y); then by Corollary 1, d(z, y) = d(x, y). Hence, if we define T' as in Figure 1c, we obtain $d(T_{(Z,x,y)}) = d(T')$. Again, as T' can be constructed by first joining y and then x (note that T' is symmetric with respect to the order in which x and y are added), we conclude
$$d(T_{(Z,y,x)}) \le d(T') = d(T_{(Z,x,y)}).$$

iii. Let d(x, z) = d(x, y); then by the strong triangle inequality, $d(z, y) \le \max\{d(z, x), d(x, y)\} = d(x, y)$. Hence, for T' defined as in Figure 1c, we infer $d(T') \le d(T_{(Z,x,y)})$. However, as $T_{(Z,x,y)}$ is obtained by greedily minimizing the distance of y, only equality can hold, and we get
$$d(T_{(Z,y,x)}) \le d(T') = d(T_{(Z,x,y)}).$$

(b) On the other hand, assume that x and y are not directly linked in $T_{(Z,x,y)}$, i.e., y is attached to a vertex of Z. Writing $d(v, Z) := \min_{z \in Z} d(v, z)$, the greedy choice of the join algorithm gives $d(y, Z \cup \{x\}) = d(y, Z)$ and $d(x, Z \cup \{y\}) \le d(x, Z)$, and therefore
$$d(T_{(Z,y,x)}) = d(Z) + d(y, Z) + d(x, Z \cup \{y\}) \le d(Z) + d(x, Z) + d(y, Z) = d(T_{(Z,x,y)}).$$

Therefore, putting (2a) and (2b) together, we conclude that in all cases $d(T_{(Z,y,x)}) \le d(T_{(Z,x,y)})$. Exchanging the roles of x and y (a symmetrization argument), we conclude that only equality can hold.

While the ultrametric property is sufficient to ensure order independence, it is also necessary in the following sense.

Proposition 2. Let K = (X, E) be an arbitrary complete graph equipped with a metric d. If the total distance of the tree produced by the join algorithm (Definition 3) is the same for every permutation of the vertices $S = (x_1, \dots, x_n)$ of K, then the metric is an ultrametric.
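Proof. Assume |X| ≥ 3, as otherwise the statement is trivially true. We first consider |X| = 3, i.e., the triangle K = (A, B, C) (Figure 2a), and write a := d(A, B), b := d(A, C), and c := d(B, C). Recall that d is an ultrametric iff, in every triangle, the two largest side lengths are equal. Joining B and C to A, with B first and C afterwards (Figure 2b), yields a tree of total distance a + min{b, c}; joining C first and B afterwards yields b + min{a, c}. By assumption, $d(T_{(A,B,C)}) = d(T_{(A,C,B)})$, i.e., a + min{b, c} = b + min{a, c}. We distinguish four cases:
1. c ≤ b and c ≤ a: then a + c = b + c, i.e., c ≤ a = b.
2. c ≤ b and a ≤ c: then a + c = b + a, i.e., a ≤ c = b.
3. b ≤ c and c ≤ a: then a + b = b + c, i.e., b ≤ c = a.
4. b ≤ c and a ≤ c: then a + b = b + a, which yields no information.
In the first three cases, these equalities imply the ultrametric property. To prove the ultrametric property in the fourth case as well, we consider joining A and B to C in either order. By assumption, $d(T_{(C,A,B)}) = d(T_{(C,B,A)})$, i.e., b + min{a, c} = c + min{a, b}. Since a ≤ c in the fourth case, this reads a + b = c + min{a, b}: if b ≤ a, this yields b ≤ a = c, and if a ≤ b, it yields a ≤ b = c. In either case, the two largest distances coincide, which proves the ultrametric property in this case as well.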
If |X| > 3, consider an arbitrary triangle D := (A, B, C) ⊂ K. Since the join algorithm produces the same total distance for every permutation, we may, without loss of generality, consider orderings in which A, B, and C are the first three vertices joined, followed by the remaining vertices in a fixed order. By definition of the join algorithm, the edge added for each later vertex depends only on the set of vertices already in the tree; hence, if the total distances of the trees on the first three vertices differed for two orderings of A, B, and C, the total distances for K would differ as well, which, by assumption, is not the case. Thus, we can apply the argument for |X| = 3 to conclude that the triangle D satisfies the ultrametric property. As this applies to every triangle D, the metric is an ultrametric on K.

Theorem 1. Let (K, d) be a complete graph equipped with an ultrametric d. Then the join algorithm constructs a minimum spanning tree of K for the ultrametric d.
Proof. Prim's algorithm, applied with the ultrametric d as the objective function (i.e., a real-valued function d : E → ℝ) to be optimized, is a greedy algorithm that is optimal, as proven in [4]. If the join algorithm is applied to the vertices in the order in which Prim's algorithm adds them, each vertex is attached to its nearest neighbor in the current tree, exactly as in Prim's algorithm, so both trees have the same total distance. According to Proposition 1, we can re-order the vertices to match the order of Prim's algorithm without changing the total distance.

Corollary 2. Let (K, d) be a complete graph equipped with a metric d. If the join algorithm always constructs a minimum spanning tree of K for the metric d, then d is an ultrametric.

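Proof. If the join algorithm always constructs a minimum spanning tree, the total distance is the same (namely, the minimum) for every ordering of the vertices; hence, d is an ultrametric by Proposition 2.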
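Proposition 2 and Corollary 2 can be checked numerically on three points. The Python sketch below (helper names are hypothetical) first shows that for three points on a line, which form an ordinary but not an ultrametric metric space, the total distance produced by the join algorithm depends on the order; it then verifies by brute force over small integer triangles that order independence holds exactly when the two largest sides are equal.

```python
from itertools import permutations

def join_total(order, d):
    """Total distance of the tree built by the join algorithm for this order."""
    tree, total = [order[0]], 0
    for x in order[1:]:
        total += min(d(x, y) for y in tree)   # weight of the greedy edge
        tree.append(x)
    return total

def order_independent(D):
    """True iff every ordering of the vertices yields the same total distance."""
    totals = {join_total(p, lambda i, j: D[i][j]) for p in permutations(range(len(D)))}
    return len(totals) == 1

# Points on a line at 0, 1 and 3: d(A,B) = 1, d(B,C) = 2, d(A,C) = 3.
pos = [0, 1, 3]
line = [[abs(p - q) for q in pos] for p in pos]
totals = {join_total(p, lambda i, j: line[i][j]) for p in permutations(range(3))}
print(sorted(totals))   # [3, 4]: order (A, C, B) gives 3 + 1 instead of 1 + 2

def brute_force_triangles(max_len=6):
    """Compare order independence with the ultrametric property for all
    integer triangles a = d(A,B), b = d(A,C), c = d(B,C) up to max_len."""
    for a in range(1, max_len + 1):
        for b in range(1, max_len + 1):
            for c in range(1, max_len + 1):
                if a > b + c or b > a + c or c > a + b:
                    continue                      # not a metric: skip
                D = [[0, a, b], [a, 0, c], [b, c, 0]]
                s = sorted((a, b, c))
                if order_independent(D) != (s[1] == s[2]):
                    return False
    return True

print(brute_force_triangles())   # True
```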
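Corollary 3. For every i ≤ n, the subtree produced by the join algorithm after joining $x_1, \dots, x_i$ is a minimum spanning tree of the complete subgraph $K_i$ on these vertices for the ultrametric d. (This follows from Theorem 1, since the restriction of d to $\{x_1, \dots, x_i\}$ is again an ultrametric, and the first i steps of the join algorithm coincide with the join algorithm applied to $x_1, \dots, x_i$.)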

Remark 2.
From Corollary 3 it follows that the join algorithm is incremental in the sense that it can construct a minimum spanning tree for $K_i$, i ≤ n, without any information about the vertices $x_j$, j > i. It is also stable; i.e., a vertex added to the tree is never rearranged later. This is important for the network applications considered in [1,2,7], as it allows an optimal overlay network, e.g., for video conference participants, to be constructed dynamically with minimal modifications of the overlay network when new participants join.
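Proposition 1, Theorem 1, Corollary 3, and the incremental property of Remark 2 can all be sanity-checked by brute force on a small ultrametric. The sketch below (a hedged illustration with hypothetical helper names) compares, for every ordering of five vertices, the weight of each intermediate join tree with the minimum spanning tree weight of the corresponding vertex subset, computed independently with Kruskal's algorithm.

```python
from itertools import combinations, permutations

# Ultrametric on five vertices: {0,1} and {2,3} at distance 1, the two pairs at
# distance 2 from each other, and vertex 4 at distance 4 from everything.
D = [[0, 1, 2, 2, 4],
     [1, 0, 2, 2, 4],
     [2, 2, 0, 1, 4],
     [2, 2, 1, 0, 4],
     [4, 4, 4, 4, 0]]

def d(u, v):
    return D[u][v]

def join_weights(order):
    """Edge weights added by the join algorithm, in insertion order."""
    tree, weights = [order[0]], []
    for x in order[1:]:
        weights.append(min(d(x, y) for y in tree))
        tree.append(x)
    return weights

def mst_weight(vertices):
    """Kruskal's algorithm on the complete graph over `vertices`."""
    comp = {v: v for v in vertices}

    def find(v):
        while comp[v] != v:
            v = comp[v]
        return v

    total = 0
    for u, v in sorted(combinations(vertices, 2), key=lambda e: d(*e)):
        ru, rv = find(u), find(v)
        if ru != rv:                 # edge joins two different trees of the forest
            comp[ru] = rv
            total += d(u, v)
    return total

def check_all_orders():
    for order in permutations(range(5)):
        w = join_weights(order)
        # prefix i: the tree on the first i vertices uses the first i - 1 edges
        for i in range(2, 6):
            if sum(w[:i - 1]) != mst_weight(order[:i]):
                return False
    return True

print(mst_weight(range(5)))   # 8
print(check_all_orders())     # True
```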

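Corollary 4. The join algorithm constructs the tree for n vertices in O(n²) time.

Proof. For each vertex $x_i$ added to the tree, we have to calculate the minimum of $d(x_i, x_j)$ over the i − 1 vertices $x_j$ already in T. Summing over i = 2, ..., n yields n(n − 1)/2 distance evaluations, i.e., O(n²).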

Remark 3.
Prim's algorithm can be implemented in O(|E| + n log n) using a Fibonacci heap; see [8]. For a complete graph, we have |E| = n(n − 1)/2, so this bound becomes O(n²); moreover, calculating the distances $d(x_i, x_j)$ for all edges is itself in O(n²). Hence, Prim's algorithm has the same complexity as the join algorithm.
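The quadratic cost of the join algorithm is easy to confirm empirically by counting distance evaluations. In the following sketch (an illustration only; the wrapper `join_counting` is hypothetical), joining n vertices evaluates d exactly n(n − 1)/2 times, because the i-th vertex is compared with the i − 1 vertices already in the tree.

```python
def join_counting(order, d):
    """Join algorithm that additionally counts the evaluations of d."""
    calls = 0

    def counted(x, y):
        nonlocal calls
        calls += 1
        return d(x, y)

    tree, edges = [order[0]], []
    for x in order[1:]:
        nearest = min(tree, key=lambda y: counted(x, y))  # one call per tree vertex
        edges.append((x, nearest))
        tree.append(x)
    return edges, calls

for n in (2, 10, 100):
    _, calls = join_counting(list(range(n)), lambda x, y: abs(x - y))
    print(n, calls, n * (n - 1) // 2)   # the last two numbers agree
```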
An alternative to Prim's algorithm for constructing minimum spanning trees is Kruskal's algorithm [4]. It maintains a forest of trees on X and greedily adds an edge of minimum weight connecting two different trees of the forest until all trees are connected. The reconstruction algorithm, as defined in [1] to "repair" trees after removing a vertex (see Section 3), is based on Kruskal's idea and attaches whole subtrees by a modification of step 3 of Definition 3:

Definition 4. Let T be a minimum spanning tree in (K, d) obtained by applying the join algorithm, and let x ∈ K be a vertex to be deleted. The reconstruction algorithm is then defined by the following steps:

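1. Consider x as the root of the subtree of T denoted by $T_x$, and find all children $x_i$ of x in $T_x$.
2. For each child $x_i$, find the nearest neighbor $x_j$ of $x_i$ with respect to d in the current tree $T \setminus T_x$ (which already contains the subtrees of the previously processed children), and add $x_i$, together with its subtree and the connecting edge $\{x_i, x_j\}$, to $T \setminus T_x$.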
Theorem 2. If (K, d) is an ultrametric space, the reconstruction algorithm constructs a minimum spanning tree for the ultrametric d.
Proof. Considering forests of trees, one can apply Kruskal's algorithm to the greedoid defined by the matroid above, where the subtrees rooted at the children $x_i$ of x form the components of the forest; for details, see [4]. In Kruskal's algorithm, one selects the minimum-weight edge connecting a vertex of such a subtree to the existing tree. By the construction of the join algorithm and since (K, d) is an ultrametric space, this minimum is attained at the root $x_i$ of the subtree.
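The reconstruction steps of Definition 4 and the claim of Theorem 2 can be illustrated with the following Python sketch. It stores the join tree as parent pointers, deletes a vertex x, and re-attaches each child subtree by its root to the nearest vertex of the tree built so far, so later subtrees may attach to previously re-attached ones. Three details are assumptions of this sketch rather than statements of the original: this reading of step 2, how a deleted root is handled (its first child becomes the new root), and that ties are broken by list order. The final loop compares the result with an independently computed Kruskal minimum spanning tree.

```python
from itertools import combinations

D = [[0, 1, 2, 2, 4],
     [1, 0, 2, 2, 4],
     [2, 2, 0, 1, 4],
     [2, 2, 1, 0, 4],
     [4, 4, 4, 4, 0]]          # an ultrametric on five vertices

def d(u, v):
    return D[u][v]

def join(order):
    """Join algorithm (Definition 3) returning parent pointers {child: parent}."""
    parent, tree = {}, [order[0]]
    for x in order[1:]:
        parent[x] = min(tree, key=lambda y: d(x, y))
        tree.append(x)
    return parent

def reconstruct(parent, x):
    """Sketch of Definition 4: delete x and re-attach each child subtree."""
    vertices = set(parent) | set(parent.values())
    children = {v: [] for v in vertices}
    for c, p in parent.items():
        children[p].append(c)

    def subtree(v):                        # v and all of its descendants
        out, stack = [], [v]
        while stack:
            u = stack.pop()
            out.append(u)
            stack.extend(children[u])
        return out

    t_x = set(subtree(x))
    # keep every edge that is not incident to x
    new_parent = {c: p for c, p in parent.items() if c != x and p != x}
    current = [v for v in sorted(vertices) if v not in t_x]   # T \ T_x
    for c in children[x]:                  # step 1: children x_i of x
        if current:                        # step 2: nearest neighbor of the root x_i
            new_parent[c] = min(current, key=lambda y: d(c, y))
        # (assumption: if x was the root, its first child becomes the new root)
        current.extend(subtree(c))         # the whole subtree joins the current tree
    return new_parent

def mst_weight(vs):
    """Independent check: Kruskal's algorithm on the complete graph over vs."""
    comp = {v: v for v in vs}

    def find(v):
        while comp[v] != v:
            v = comp[v]
        return v

    total = 0
    for u, v in sorted(combinations(vs, 2), key=lambda e: d(*e)):
        if find(u) != find(v):
            comp[find(u)] = find(v)
            total += d(u, v)
    return total

tree = join([4, 0, 2, 1, 3])               # {0: 4, 2: 0, 1: 0, 3: 2}
for x in range(5):
    rebuilt = reconstruct(tree, x)
    rest = [v for v in range(5) if v != x]
    print(x, sum(d(c, p) for c, p in rebuilt.items()), mst_weight(rest))
```

Deleting vertex 0 shows why the growing tree matters: the second child subtree attaches to the first one at distance 2 instead of to vertex 4 at distance 4.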

Remark 4.
A subtlety is worth noting: it is claimed in [9] that the general proof of the optimality of the greedy algorithm for general greedoids, i.e., Theorem XI.1.3 of [4], contains a subtle error. However, our proofs are not affected by this argument, as Kruskal's algorithm applies to matroids, as pointed out therein, and Prim's algorithm, which has been known to be valid for a long time anyway, is covered by Theorem XI.2.2 of [4] (see Theorem 9 of [9]).

Topological Distance
In [1], the so-called topological distance was introduced as a tool to optimize peer-to-peer networks. For practical usage, the incremental nature of the join algorithm (as opposed to Prim's algorithm) is crucial (see [7]), as this algorithm is used as the basis of a peer-to-peer application protocol. For the reader's convenience, we recall its definition (it is similar to a metric based on a taxonomy, which is known to satisfy the strong triangle inequality):

Definition 5. Let x and y be two vertices representing communication nodes in an IP network; then their topological distance d(x, y) is defined by counting the number of differing static network coordinates $C_i(x)$. Formally, if h denotes the number of coordinates and $C(x) = (C_0(x), \dots, C_i(x), \dots, C_{h-1}(x))$, then
$$d(x, y) := h - m(x, y),$$
where $m(x, y) := \max\{\, k \in \{0, \dots, h\} \mid C_i(x) = C_i(y) \text{ for all } i < k \,\}$ denotes the number of leading coordinates in which x and y agree.

Lemma 1. Let X be a set of communication nodes equipped with the topological distance d. Then:
1. d is a metric.
2. d satisfies the strong triangle axiom; i.e., X is an ultrametric space if endowed with d.
Proof. Obvious from the definition.
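For experimentation, a topological distance can be sketched as follows. This sketch assumes hierarchical coordinates (for example, region, country, provider, and subnet, listed from coarsest to finest) and computes d(x, y) as h minus the length of the longest common prefix of C(x) and C(y), an assumption that matches the taxonomy analogy mentioned above; the coordinate values and function names are hypothetical. The brute-force check at the end tests the strong triangle inequality of Lemma 1 on the sample nodes.

```python
from itertools import product

def topological_distance(cx, cy):
    """h minus the number of leading coordinates on which C(x) and C(y) agree
    (hypothetical sketch; cx, cy are coordinate tuples of equal length h)."""
    h = len(cx)
    m = 0
    while m < h and cx[m] == cy[m]:
        m += 1
    return h - m

# Hypothetical coordinates (region, country, provider, subnet) for five nodes.
nodes = [("EU", "DE", "P1", "a"),
         ("EU", "DE", "P1", "b"),
         ("EU", "DE", "P2", "a"),
         ("EU", "FR", "P1", "a"),
         ("US", "CA", "P3", "a")]

d = topological_distance
print(d(nodes[0], nodes[1]), d(nodes[0], nodes[2]), d(nodes[0], nodes[4]))  # 1 2 4
print(all(d(x, z) <= max(d(x, y), d(y, z))
          for x, y, z in product(nodes, repeat=3)))                      # True
```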
Theorem 3. The JOIN algorithm as defined in [1] and analyzed in [2] constructs a minimum spanning tree for the topological distance d.

Proof.
As the JOIN algorithm is equivalent to the algorithm join defined in Definition 3, this is a corollary of Lemma 1 and Theorem 1.

Theorem 4.
The RECONSTRUCT_TREE algorithm as defined in [1] and analyzed in [2] constructs a minimum spanning tree for the topological distance d.
Proof. The RECONSTRUCT_TREE algorithm is equivalent to the algorithm reconstruction defined in Definition 4; thus, this is a corollary of Lemma 1 and Theorem 2.

Latency
When constructing applications, in addition to minimizing the topological distance d, one often also wants to control the latency of the communication links, i.e., to minimize the maximum latency between any pair of nodes. We prove that it is possible to modify the JOIN algorithm of [1,2] in such a way that, among all trees with minimum topological distance d, it also minimizes this maximum latency. We formally define latency as follows:

Definition 6. Let T be a tree equipped with a metric d. Any two vertices $x_i$ and $x_j$ of T are connected by a unique path of m vertices, which we denote by $x_{i_1}, \dots, x_{i_m}$, such that $x_{i_1} = x_i$ and $x_{i_m} = x_j$. Then the latency between $x_i$ and $x_j$ is defined as
$$\ell(x_i, x_j) := \sum_{k=1}^{m-1} d(x_{i_k}, x_{i_{k+1}}),$$
and the tree latency L of T is defined as
$$L(T) := \max_{x_i, x_j \in T} \ell(x_i, x_j).$$

Definition 7. The snowflake algorithm JOIN-S (the name stems from the snowflake-like shapes it tends to generate) is defined by joining each node as in the JOIN algorithm, with the following modification: if multiple nearest candidates with the same topological distance exist, the one that yields the lowest tree latency for the resulting tree is chosen.
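Definitions 6 and 7 can be sketched in Python as follows, assuming that the latency of a pair of vertices is the sum of the edge weights along their tree path and that the tree latency is the maximum over all pairs. `join_s` breaks ties between equally near candidates by the tree latency after attaching the new vertex. This is a minimal illustration under stated assumptions (trees as parent pointers, brute-force latency evaluation, remaining ties broken by list order), not the implementation of [1,2]. With four mutually equidistant vertices, JOIN-S yields the star (latency 2), whereas a different, equally valid tie-breaking in plain JOIN could yield a chain with the same total distance but latency 3.

```python
def tree_latency(parent, d):
    """L(T): the maximum over all vertex pairs of the sum of edge weights
    along the unique tree path (brute force: one DFS per vertex)."""
    adj = {}
    for c, p in parent.items():
        adj.setdefault(c, []).append((p, d(c, p)))
        adj.setdefault(p, []).append((c, d(c, p)))
    best = 0
    for s in adj:
        stack, seen = [(s, 0)], {s}
        while stack:
            u, dist = stack.pop()
            best = max(best, dist)
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, dist + w))
    return best

def join_s(order, d):
    """JOIN-S sketch: among the nearest candidates, pick the one that gives
    the lowest tree latency after attaching x."""
    parent, tree = {}, [order[0]]
    for x in order[1:]:
        dmin = min(d(x, y) for y in tree)
        candidates = [y for y in tree if d(x, y) == dmin]
        parent[x] = min(candidates,
                        key=lambda y: tree_latency({**parent, x: y}, d))
        tree.append(x)
    return parent

def d(u, v):
    return 0 if u == v else 1      # four mutually equidistant nodes

star = join_s(["a", "b", "c", "d"], d)
chain = {"b": "a", "c": "b", "d": "c"}   # also a valid JOIN result (other tie-breaks)
print(star)                                            # {'b': 'a', 'c': 'a', 'd': 'a'}
print(tree_latency(star, d), tree_latency(chain, d))   # 2 3
```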
Theorem 5. The JOIN-S algorithm creates a tree with minimum topological distance and, among all such trees, minimum tree latency.
Proof. Define d' := d + λL; then d' is a real-valued objective function that we can minimize within the framework of matroids and greedoids; see [4]. Since the topological distance takes integer values, the total distances of two trees differ by at least 1 whenever they differ at all. Hence, if we choose λ > 0 small enough to ensure that λL(T) < 1 for every tree T, minimizing d' amounts to first minimizing d and then, among all trees with the same minimum d, minimizing the tree latency L. (The condition λL < 1 effectively subordinates the latency costs L to the topological costs d.) It is also clear that a greedy choice with respect to d' corresponds to the JOIN-S algorithm. Hence, the proof follows from general matroid/greedoid theory, as in [4], if we can ensure that the analogue of Proposition 1 holds. By carefully examining the proof of Proposition 1, we conclude that the equivalent trees depicted in Figure 1a,b have not only the same d but also the same latency L; thus, case 2(a)i carries over without modification. On the other hand, in cases 2(a)ii and 2(a)iii, the "symmetric" tree depicted in Figure 1c has the same d but a lower latency L, and it would therefore be selected by the JOIN-S algorithm. As the configuration depicted in Figure 1c is independent of the order, we conclude that JOIN-S is order-independent in all cases, which completes the proof.

Definition 8.
The snowflake RECONSTRUCT_TREE-S algorithm is defined by replacing the join algorithm in Definition 4 of the reconstruction algorithm with the JOIN-S algorithm of Definition 7.

Corollary 5.
The RECONSTRUCT_TREE-S algorithm creates a tree with minimum topological distance and, among all such trees, minimum tree latency.
Proof. This follows from the equivalence of the JOIN algorithm with Prim's and Kruskal's algorithms applied to d'.

Outlook
For practical applications, it would be interesting to investigate whether distance measures can be approximated by distances satisfying the strong triangle inequality, in other words, whether ultrametric spaces can be used as approximation tools. For example, the network configuration depicted in Figure 3a is based on the data measured in [2]; the edge weights denote the actual latencies in milliseconds. As shown in Figure 3b, it can be approximated quite well by an ultrametric: the individual weights have to be changed only by small percentages, and the total distances of the trees produced by the join algorithm differ by less than 2% when calculated with the ultrametric rather than with the empirical numbers. This is primarily because the latencies of local links are similar to each other, as are those of national links, while the two groups differ in scale. As follow-up work to [7], we plan to investigate this empirically and more quantitatively, using appropriate proximity measures to judge the approximation quality.