# Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees


## Abstract

Both the triplet and the quartet distance between two trees are defined by the number of leaf subsets (of size three or four, respectively) that induce different topologies in the two trees. Computing the distances by explicitly enumerating all such sets takes time O(n³) or O(n⁴) just for enumerating the sets. The different topologies can be counted implicitly, however, and in this paper, we review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies for this: one based on dynamic programming and another based on coloring leaves in one tree and updating a hierarchical decomposition of the other.

## 1. Introduction

Several algorithms for computing the triplet and quartet distances between binary trees have been developed, based on two main strategies: one based on dynamic programming with complexity O(n²) for both the quartet and triplet distance, and one based on coloring leaves in a tree traversal with complexity O(n log n) for quartet distances (with very little work, this algorithm can be modified to compute the triplet distance, but it has never been described as such in the literature). The dynamic programming approach was the first algorithm that was modified to deal with the quartet distance for general trees, and in the following section, we describe a number of approaches to doing this. In the same section, we describe how the tree coloring approach can be adapted to general trees, leading to the best current worst-case running times. We then turn to practical implementations, present experimental results for the best algorithms and show that the theoretically fastest algorithms can also be implemented to run efficiently in practice.

## 2. The Triplet and Quartet Distances

Given two trees, T₁ and T₂, each with n leaves with labels from the same set of n labels, the triplet distance is defined for rooted trees, while the quartet distance is defined for unrooted trees.

The possible combinations of topologies that a set of three or four leaves can induce in two trees, T₁ and T₂, of arbitrary degree can be partitioned into the five cases, A–E, listed in Figure 2. Note that in the resolved-resolved case, the triplet/quartet topologies may either agree (A) or differ (B) in the two trees. In the resolved-unresolved and unresolved-resolved cases (C and D), they always differ, and in the unresolved-unresolved case (E), they always agree. For binary trees, only the two resolved-resolved cases are possible.

## 3. Computing the Triplet and Quartet Distances between Binary Trees

#### 3.1. A Dynamic Programming Approach

Directing an edge in a binary unrooted tree splits the tree into a subtree, F₁, behind the edge and two subtrees, F₂ and F₃, in front of the edge (see Figure 3a). We will write this as e: ij → kl and say that the directed edge, e, claims all quartets, ij|kl, with i, j ∈ F₁, k ∈ F₂ and l ∈ F₃ (or l ∈ F₂ and k ∈ F₃). If a tree contains the quartet topology, ij|kl, it will be claimed by exactly two such edges: one with i, j ∈ F₁, k ∈ F₂ and l ∈ F₃ (or l ∈ F₂ and k ∈ F₃); and one with k, l ∈ G₁, i ∈ G₂ and j ∈ G₃ (or j ∈ G₂ and i ∈ G₃), as in Figure 3b.

Given two trees, T₁ and T₂, the algorithm now simply iterates through all pairs of edges, e₁ in T₁ and e₂ in T₂, and counts how many oriented quartets, ij → kl, are claimed by both e₁ and e₂. This number, denoted A(e₁, e₂), can be computed as:

$$A(e_1, e_2) = \binom{|F_1 \cap G_1|}{2}\big(|F_2 \cap G_2| \cdot |F_3 \cap G_3| + |F_2 \cap G_3| \cdot |F_3 \cap G_2|\big)$$

where F₁, F₂, F₃ and G₁, G₂, G₃ are the subtrees induced by e₁ and e₂, respectively, and |F ∩ G| denotes the number of leaves shared between a subtree, F, of T₁ and a subtree, G, of T₂. The number of shared quartets (and, by extension, the quartet distance) can then be computed as:

$$\mathrm{shared}(T_1, T_2) = \frac{1}{2}\sum_{e_1 \in T_1}\sum_{e_2 \in T_2} A(e_1, e_2)$$

since each shared quartet topology is counted by exactly two pairs of directed edges. As each tree contains O(n) edges, evaluating this double sum takes time O(n²).

All the intersection sizes, |F ∩ G|, can be computed by dynamic programming in time O(n²): if F contains subtrees, F₁ and F₂, and G contains subtrees, G₁ and G₂, then |F ∩ G| = |F₁ ∩ G₁| + |F₁ ∩ G₂| + |F₂ ∩ G₁| + |F₂ ∩ G₂|. If F or G (or both) are leaves, similar, but simpler, cases are used. In total, the algorithm therefore computes the quartet distance between two binary trees in O(n²) time.
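The intersection table can be sketched directly from this recursion. The code below is a minimal illustration under our own tree representation, not the paper's implementation: it fills |F ∩ G| bottom-up for two rooted binary trees given as nested tuples with unique string leaf labels, spending constant work per entry.

```python
def postorder(tree):
    """Yield every node of `tree` (leaves and internal tuples), children first."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from postorder(child)
    yield tree

def intersection_table(t1, t2):
    """Table of |F ∩ G| for every subtree F of t1 and G of t2."""
    table = {}
    for f in postorder(t1):
        for g in postorder(t2):
            if not isinstance(f, tuple) and not isinstance(g, tuple):
                table[(f, g)] = 1 if f == g else 0       # two leaves
            elif isinstance(f, tuple):
                # sum the recursion over the children of f
                table[(f, g)] = sum(table[(c, g)] for c in f)
            else:                                        # f leaf, g internal
                table[(f, g)] = sum(table[(f, c)] for c in g)
    return table
```

With unique leaf labels, no two subtrees compare equal, so the nested tuples can serve directly as dictionary keys; with O(n) subtrees per tree, the table has O(n²) entries.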

#### 3.2. Tree Coloring

The O(n log² n) and O(n log n) algorithms [14,15] take a different approach than dynamic programming, but they are also based on oriented quartets. Rather than associating oriented quartets to edges, however, the algorithms associate oriented quartets to inner nodes. Given an inner node, v, with three incident subtrees, F, G and H, we say that v claims the oriented quartets, ij → kl, where i and j are in two different subtrees of v and k and l are both in the remaining subtree. Node v thus claims a set of oriented quartets, and similar to before, each quartet is associated with exactly two inner nodes. Furthermore, similar to before, the idea is to compute the double sum Σ_{v₁∈T₁} Σ_{v₂∈T₂} A(v₁, v₂), where A(v₁, v₂) now denotes the number of oriented quartets (in the remainder of this section, we will use quartet and oriented quartet interchangeably) associated with nodes, v₁ and v₂, which induce the same quartet topology in both T₁ and T₂. The two algorithms, however, only explicitly iterate over the nodes in T₁, while for each v₁ ∈ T₁, they compute the inner sum Σ_{v₂∈T₂} A(v₁, v₂) implicitly using a data structure called the hierarchical decomposition tree of T₂. To realize this strategy, both algorithms use a coloring procedure in which the leaves of the two trees are colored using three colors, here denoted 1, 2 and 0. For an internal node, v₁ ∈ T₁, we say that T₁ is colored according to v₁ if the leaves in one of the subtrees all have the color 1, the leaves in another of the subtrees all have the color 2 and the leaves in the remaining subtree all have the color 0. Additionally, we say that a quartet, ij → kl, is compatible with a coloring if i and j have different colors and k and l both have the remaining color. From this setup, we immediately get that if T₁ is colored according to a node, v₁ ∈ T₁, then the set of quartets in T₁ that are compatible with this coloring is exactly the set of quartets associated with v₁ and, furthermore, if we color the leaves of T₂ in the same way as in T₁, the set of quartets, S, in T₂ that are compatible with this coloring is exactly the set of quartets that are associated with v₁ and induce the same topology in both T₁ and T₂; thus, Σ_{v₂∈T₂} A(v₁, v₂) = |S|.

Coloring T₁ according to each of its internal nodes in turn and counting the compatible quartets in T₂ explicitly would take time O(n²), as we would run through n − 2 different colorings and perform O(n) work for each. By handling the coloring of T₁ recursively and using the “smaller-half trick”, we can ensure that each leaf changes color only O(log n) times, giving O(n log n) color changes in total. To compute the number of compatible quartets in T₂ for each coloring, we use a hierarchical decomposition of T₂. The main feature of this data structure is that we can decorate it in a way such that it can return the number of quartets in T₂ compatible with the current coloring in constant time. To achieve this, the hierarchical decomposition uses O(log n) time to update its decoration when a single leaf changes color. Thus, in total, we use O(n log² n) time [14]. The algorithm for computing the triplet distance between binary trees in time O(n log² n) from [16] also uses this strategy, although the decoration of the hierarchical decomposition differs significantly. In the following two subsections, we describe the coloring procedure and the hierarchical decomposition tree data structure used in the quartet distance algorithm. Finally, in Section 3.2.3, we describe how the algorithm was tweaked in [15] to obtain time complexity O(n log n).

#### 3.2.1. Coloring Leaves in T_{1} Using the “Smaller-Half Trick”

The coloring procedure first roots T₁ in an arbitrary leaf. It then, for each inner node, v₁ ∈ T₁, calculates the number of leaves, |v₁|, in the subtree rooted at v₁. This is done in a postorder traversal starting at the new (designated) root of T₁, and the information is stored in the nodes. In this traversal, all leaves are also given the color 1, and the new root is given the color 0. The procedure then runs through all colorings of T₁ recursively, as described and illustrated in Figure 4, starting at the single child of the root of T₁. In Figure 4, large(v₁), respectively small(v₁), denotes the child of v₁ that constitutes the root of the larger, respectively smaller, of the two subtrees under v₁, measured in the number of leaves, and `get_shared()` is a call to the hierarchical decomposition of T₂ to retrieve the number of quartets in T₂ that are compatible with the current coloring. During the traversal, the following two invariants are maintained for all nodes, v₁ ∈ T₁: (1) when `Count`(v₁) is called, all leaves in the subtree rooted at v₁ have the color 1, and all other leaves have the color 0; and (2) when `Count`(v₁) returns, all leaves have the color 0. As illustrated in Figure 4, these invariants imply that when `get_shared()` is called as a subprocedure of `Count`(v₁), then T₁ is colored according to v₁. Hence, the correctness of the algorithm follows, assuming that `get_shared()` returns the number of oriented quartets in T₂ compatible with the current coloring of the leaves.

**Figure 4.** Traversing T₁ using the “smaller-half trick” to ensure that each leaf changes color at most O(log n) times.

A leaf only changes color when it belongs to the smaller subtree, small(v₁), which is at most half the size of the entire subtree rooted at v₁; this can therefore happen at most log n times for each leaf, leading to O(n log n) color changes in total.
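The recursion of Figure 4 can be sketched as follows. This is our own simplified reconstruction: the HDT query `get_shared()` is stubbed out, and we start directly at the top of a rooted binary tree given as nested tuples rather than rooting in a leaf first. A counter records every single-leaf color change, making the smaller-half bound observable.

```python
def leaves_of(tree):
    """List the leaf labels below `tree`."""
    return [tree] if not isinstance(tree, tuple) else [
        leaf for child in tree for leaf in leaves_of(child)]

def count(v, color, changes):
    """Figure-4-style traversal. Invariant (1) on entry: every leaf below v
    has color 1, every other leaf color 0. Invariant (2) on return: all 0."""
    if not isinstance(v, tuple):
        color[v] = 0                      # leaf: restore invariant (2)
        changes[0] += 1
        return
    a, b = v
    big, small = (a, b) if len(leaves_of(a)) >= len(leaves_of(b)) else (b, a)

    def recolor(subtree, c):
        for leaf in leaves_of(subtree):
            color[leaf] = c
            changes[0] += 1

    recolor(small, 2)                     # T1 is now colored according to v
    # ... here get_shared() would query the decorated HDT of T2 ...
    recolor(small, 0)                     # invariant (1) holds for big
    count(big, color, changes)
    recolor(small, 1)                     # invariant (1) holds for small
    count(small, color, changes)
```

Recursing into the larger child first and recoloring only the smaller subtree is exactly what bounds the total number of color changes by O(n log n).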

#### 3.2.2. The Hierarchical Decomposition Tree

The hierarchical decomposition tree of T₂, HDT(T₂), is a representation of T₂ as a binary tree data structure with logarithmic height. The nodes in HDT(T₂) are called components, and each of them is of one of the six types illustrated in Figure 5. The nodes in T₂ (including leaves) constitute the leaves of HDT(T₂): each leaf in T₂ is contained in a component of type (i), and each inner node of T₂ is contained in a component of type (ii). An inner node/component of HDT(T₂) represents a connected subset of nodes in T₂, and it is formed as the union of two adjacent components using one of the compositions in Figure 5(iii)–(vi), where two components are adjacent if there is an edge in T₂ connecting the two subsets of nodes they represent. The root of HDT(T₂) is formed using the composition in Figure 5(vi), and it represents the entire set of nodes (including leaves) in T₂.

**Figure 5.** The six different types of components in a hierarchical decomposition tree. Types (**i**) and (**ii**) consist of, respectively, a single leaf or an inner node of the original binary tree. Types (**iii**)–(**vi**) are formed as unions of two adjacent components. Type (**vi**) occurs only at the root of the hierarchical decomposition tree.

To build HDT(T₂), we first make a component of type (i) or (ii) for each node in T₂ and then greedily combine and replace pairs of components, using the compositions in Figure 5(iii)–(vi), in subsequent iterations, until only the single root component is left. This greedy strategy requires O(log n) iterations using O(n) time in total, and the result is an HDT of height O(log n) (see [14,15] for details).

To be able to retrieve the number of compatible quartets in T₂ from HDT(T₂) in constant time, we decorate it in the following way: for each component, C, in HDT(T₂), we store a tuple, (a, b, c), of integers and a function, F. The integers, a, b and c, denote the number of leaves with the colors 1, 2 and 0, respectively, in the connected part of T₂ represented by C. The function, F, takes three parameters for each external edge of C (all components have between zero and three external edges, depending on their type; e.g., type (i) components have one external edge, while type (iii) components have two). If C has k external edges, we number these arbitrarily from one to k and denote the three parameters for edge i by aᵢ, bᵢ and cᵢ. Of the two ends of an external edge, one is in the component and the other is not. The parameters, aᵢ, bᵢ and cᵢ, denote the number of leaves with the colors 1, 2 and 0, respectively, in the subtree of T₂ rooted at the endpoint of edge i that is not in C. Finally, F states, as a function of these parameters, the number of quartets that are both associated with nodes contained in C and compatible with the current coloring of the leaves. It turns out that F is a polynomial of total degree at most four.

We will not go into the details of how the decoration of HDT(T₂) is initialized and updated, but the crucial point is that for a component, C, which is the composition of two components, C′ and C″, the decoration of C can be computed in constant time, provided that the decorations of C′ and C″ are known. This means that the decoration can be initialized in O(n) time in a postorder traversal and that it can be updated in O(log n) time when a single leaf changes color, by updating the path from the type (i) component containing the leaf to the root of HDT(T₂).

#### 3.2.3. Tweaking the Algorithm to Obtain Time Complexity O(n log n)

The O(n log n) algorithm [15] is based on an “extended smaller-half trick”: if we define the cost c_v = |small(v)|(1 + log(|v|/|small(v)|)) for each inner node, v, and c_v = 0 for each leaf, v, then we have a total cost of Σ_{v∈T} c_v ≤ n log n. This means that if we can decrease the time used per inner node, v₁, in T₁ from |small(v₁)| log n, as in the previous two subsections, to O(c_{v₁}), the total time used will be reduced from O(n log² n) to O(n log n).
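A quick numerical sanity check of this bound: the sketch below (our own illustration) sums c_v, in the form stated above, over random binary tree shapes and verifies that the total never exceeds n log₂ n.

```python
import math
import random

def random_binary(n, rng):
    """Random binary tree shape over n leaves, as nested pairs."""
    nodes = list(range(n))
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2))
        merged = (nodes[i], nodes[j])
        nodes = [x for k, x in enumerate(nodes) if k not in (i, j)]
        nodes.append(merged)
    return nodes[0]

def cost(tree):
    """Return (#leaves, Σ c_v) with c_v = |small(v)|(1 + log2(|v|/|small(v)|))."""
    if not isinstance(tree, tuple):
        return 1, 0.0                      # c_v = 0 for leaves
    na, ca = cost(tree[0])
    nb, cb = cost(tree[1])
    small, total = min(na, nb), na + nb
    return total, ca + cb + small * (1 + math.log2(total / small))

rng = random.Random(42)
for n in (8, 64, 1024):
    leaves, c = cost(random_binary(n, rng))
    assert leaves == n and c <= n * math.log2(n) + 1e-9
```

Equality is attained on perfectly balanced trees, e.g., for n = 4 the total cost is exactly 8 = 4 log₂ 4.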

The first ingredient is contraction: if we, for a node, v₁ ∈ T₁, mark all leaves in the subtree of T₁ rooted at v₁ as non-contractible and contract T₂ according to this marking, then we get a contraction, T₂^(v₁), with at most 4|v₁| − 5 nodes, and thus, we can build a hierarchical decomposition tree, HDT(T₂^(v₁)), from this contraction with height O(log |v₁|) in time O(|v₁|). Hence, if we use HDT(T₂^(v₁)) instead of HDT(T₂) when visiting v₁ ∈ T₁, the time needed for visiting v₁, disregarding the time spent on building T₂^(v₁), is now O(|small(v₁)| log |v₁|). However, this is still too much to use the extended smaller-half trick to obtain our goal.

Since, for each v₁ ∈ T₁, HDT(T₂^(v₁)) has O(|v₁|) nodes, the |small(v₁)| leaf-to-root paths in HDT(T₂^(v₁)) that need to be updated when visiting v₁ contain O(|small(v₁)|(1 + log(|v₁|/|small(v₁)|))) nodes in total. This means that if we spend only constant time to update each of these, we get a total time for coloring and counting, still disregarding the time used to build contractions, of O(n log n) by the extended smaller-half trick. To spend only constant time for each of the nodes, we split the update procedure for HDT(T₂^(v₁)) after the |small(v₁)| color changes into two phases in the following way: (1) we first mark all internal nodes in HDT(T₂^(v₁)) on paths from the |small(v₁)| type (i) components to the root, by marking bottom-up from each of the type (i) components until we find the first already marked component; (2) we then update all the marked nodes recursively in a postorder traversal starting at the root of HDT(T₂^(v₁)).

However, if we were to build T₂^(v₁) from scratch for every inner node, v₁, in T₁, we would spend O(n²) time just doing this. To fix this problem, we will only contract T₂ whenever a constant fraction of its leaves have been given the color 0, and we will not do it from scratch every time. Figure 6 shows how this is achieved using the subroutines, `Contract` and `Extract`. In this pseudo-code, T₂ and T₂′ are used to refer both to the trees themselves and to their associated hierarchical decomposition trees.

The routine, `Contract`(X, T₂), constructs a contraction of T₂ of size O(|X|), where each leaf with the color, X, has been regarded as non-contractible and where |X| is the number of leaves with the color, X. This is done in O(|T₂|) time, where |T₂| is the number of nodes in the current version of T₂. Note that T₂ is not static, but is a parameter to `FastCount` and is changed in specific recursive calls. See [15] for the details of `Contract`. The routine, `Extract`(small(v₁), T₂), uses the hierarchical decomposition of T₂ to extract a copy of T₂ at the point in the algorithm where T₁ is colored according to v₁. The extracted tree is a copy of T₂ in which all leaves in the subtree rooted at small(v₁) keep their color, while all other leaves have the color 0. The first call to `Contract` in line 9 builds a contraction, T₂′, of this copy, and this contraction is used as T₂ in the recursive call on small(v₁) in line 15. The second call to `Contract` in line 12 is only executed when |T₂| > 5|large(v₁)| (i.e., when more than 4/5 of the leaves have the color 0). This contraction is used in the recursive call on large(v₁) in line 13.

Consider first the time spent on extraction and contraction in line 9. Since |T₂| = O(|v₁|) when we perform it, the total time used on this line is O(n log n) by the extended smaller-half trick. We now consider the time spent on contracting T₂ in line 12. We perform this line whenever |T₂| > 5|large(v₁)|. Since the leaves in large(v₁) are exactly the leaves regarded as non-contractible when we contract, the new T₂ has at most 4|large(v₁)| − 5 nodes. Hence, the size of T₂ is reduced by a factor of at least 4/5. This implies that the sequence of contractions applied to a hierarchical decomposition results in a sequence of data structures of geometrically decaying sizes. Since a contraction takes time O(|T₂|), the total time spent on line 12 is linear in the initial size of T₂, i.e., it is dominated by the time for constructing the initial hierarchical decomposition tree, which is O(n). In total, we therefore use O(n log n) time for contracting and extracting T₂.

## 4. Dealing with General Trees

#### 4.1. Dynamic Programming Approaches

The dynamic programming approach from Section 3.1 can be generalized to trees of arbitrary degree, where a directed edge now has one subtree behind it and several subtrees in front of it. Iterating over all pairs of directed edges still gives O(n²) pairs of trees, but the total sum of degrees for each tree is bounded by O(n); therefore, the product of the pairs of degrees is bounded by O(n²). Worse, however, is computing A(e₁, e₂), where we need to consider all ways of picking trees F₂ and F₃ and pairing them with choices of G₂ and G₃, with a worst-case performance of O(d⁴) for each pair of edges.

An alternative approach, with running time O(n³), is to consider all triplets, {i, j, k}, and count, for each fourth leaf label, x, how many of the quartets, {i, j, k, x}, have the same topology in the two trees, whether this be the resolved topologies, ix|jk, ij|xk or ik|jx, counting A, or the unresolved topology, (ijkx), counting E. Counting this way, each shared quartet topology will be counted twelve times; so, the algorithm actually computes 12 · (A + E), and we must divide by twelve at the end. We explicitly iterate through all O(n³) triplets, and the crux of the algorithm is counting the shared quartet topologies in constant time.

Consider a triplet, {i, j, k}, and its center node, C, in T₁. Let Fᵢ, Fⱼ and Fₖ be the trees in T₁ connected to C containing leaves i, j and k, respectively, and let Gᵢ, Gⱼ and Gₖ be the corresponding trees in T₂. A resolved quartet, ix|jk, is shared between the two trees if x is in both Fᵢ and Gᵢ (see Figure 7), so the number of such quartets is |Fᵢ ∩ Gᵢ| − 1 (minus one, because we would otherwise also count ii|jk). Therefore, given i, j and k, the number of shared resolved quartets is:

|Fᵢ ∩ Gᵢ| + |Fⱼ ∩ Gⱼ| + |Fₖ ∩ Gₖ| − 3

For the unresolved case, let F₋ᵢⱼₖ denote all the leaves in T₁, except those in Fᵢ, Fⱼ and Fₖ, and similarly, let G₋ᵢⱼₖ denote the set of leaves in T₂ not in Gᵢ, Gⱼ and Gₖ. The number of unresolved quartets, (ijkx), is then given by |F₋ᵢⱼₖ ∩ G₋ᵢⱼₖ|. We cannot directly build a table of all |F₋ᵢⱼₖ ∩ G₋ᵢⱼₖ| in O(n²), since just from the number of indices (i, j and k), we would need a table of size O(n³). Instead, we can compute:

|F₋ᵢⱼₖ ∩ G₋ᵢⱼₖ| = |G₋ᵢⱼₖ| − (|G₋ᵢⱼₖ ∩ Fᵢ| + |G₋ᵢⱼₖ ∩ Fⱼ| + |G₋ᵢⱼₖ ∩ Fₖ|)

where

|G₋ᵢⱼₖ| = n − (|Gᵢ| + |Gⱼ| + |Gₖ|)

(we can compute |Gₗ| for all l in O(n) preprocessing) and:

|G₋ᵢⱼₖ ∩ Fₗ| = |Fₗ| − (|Fₗ ∩ Gᵢ| + |Fₗ ∩ Gⱼ| + |Fₗ ∩ Gₖ|)
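Since Fᵢ, Fⱼ and Fₖ are pairwise disjoint (and likewise the G trees), these are plain inclusion-exclusion identities over leaf sets. The sketch below (our own illustration, not from the paper) verifies them on random disjoint subsets:

```python
import random

random.seed(1)
leaves = set(range(100))                       # the n leaf labels

def disjoint_triple(universe):
    """Three disjoint 20-element subsets of `universe`."""
    pool = random.sample(sorted(universe), 60)
    return set(pool[:20]), set(pool[20:40]), set(pool[40:60])

Fi, Fj, Fk = disjoint_triple(leaves)
Gi, Gj, Gk = disjoint_triple(leaves)
F_rest = leaves - (Fi | Fj | Fk)               # F_-ijk
G_rest = leaves - (Gi | Gj | Gk)               # G_-ijk

# |G_-ijk| = n - (|Gi| + |Gj| + |Gk|)
assert len(G_rest) == len(leaves) - (len(Gi) + len(Gj) + len(Gk))
# |G_-ijk ∩ Fl| = |Fl| - (|Fl ∩ Gi| + |Fl ∩ Gj| + |Fl ∩ Gk|)
for Fl in (Fi, Fj, Fk):
    assert len(G_rest & Fl) == len(Fl) - (len(Fl & Gi) + len(Fl & Gj) + len(Fl & Gk))
# |F_-ijk ∩ G_-ijk| = |G_-ijk| - Σ_l |G_-ijk ∩ Fl|
assert len(F_rest & G_rest) == len(G_rest) - sum(
    len(G_rest & Fl) for Fl in (Fi, Fj, Fk))
```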

To count in O(n³) total time, we thus only need to find the center nodes in constant time for each triplet, {i, j, k}. We achieve this by finding a linear number of centers, in linear time, for a linear number of triplets: for each pair, {i, j}, we iterate through all k and find their corresponding centers in linear time. There are O(n²) pairs, {i, j}; for each, we iterate through the leaves, k, in O(n) time and count the number of shared quartets, {i, j, k, x}, giving us a total running time of O(n³).

An alternative strategy avoids counting the unresolved cases explicitly. The differences between T₁ and T₂ are given by B + C + D in Figure 2. If we have functions, A(T₁, T₂) and B(T₁, T₂), that compute A and B for the two trees, respectively, then we can compute the quartet distance without considering unresolved topologies explicitly, since A + B + C = A(T₁, T₁), A + B + D = A(T₂, T₂) and:

B + C + D = A(T₁, T₁) + A(T₂, T₂) − 2A(T₁, T₂) − B(T₁, T₂)
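This identity is simple arithmetic over the five counts of Figure 2: the quartets resolved in T₁ are A + B + C, those resolved in T₂ are A + B + D. A minimal check (our own illustration) over all small combinations:

```python
import itertools

# For any non-negative counts A..D: with resolved(T1) = A+B+C and
# resolved(T2) = A+B+D, the distance B+C+D equals
# resolved(T1) + resolved(T2) - 2A - B.
for A, B, C, D in itertools.product(range(3), repeat=4):
    r1 = A + B + C          # A(T1, T1): quartets resolved in T1
    r2 = A + B + D          # A(T2, T2): quartets resolved in T2
    assert B + C + D == r1 + r2 - 2 * A - B
```

The unresolved-unresolved count, E, never appears, which is exactly why this strategy avoids the unresolved cases.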

To compute A and B, consider directed edges, e₁ and e₂, in T₁ and T₂, respectively, and let v₁ and v₂ denote the destination nodes of the edges. If v₁ or v₂ are high-degree nodes, then the edges do not correspond to unique claims, since the trees, F₂, F₃, G₂ and G₃, are not uniquely defined (see Figure 9a). To get around this, the algorithm transforms the trees by expanding (arbitrarily) each high-degree node into a binary tree, tagging edges so that we can distinguish between original edges and edges caused by the expansion (see Figure 9b). We then define extended claims, each consisting of two edges, e₁ and e₁′, such that e₁ is one of the original edges, e₁′ is a result of expanding nodes and the oriented path from e₁ to e₁′ only goes through expansion edges. The extended claims play the role that claims do in the original algorithm, and each resolved quartet in the original tree is claimed by exactly two extended claims in the expanded tree.

Counting over all pairs of extended claims yields an algorithm with running time O(n²d²).

By considering the ways in which one extended claim claims ij → kl (with i, j ∈ F₁, k ∈ F₂ and l ∈ F₃) while the other claims ik → jl, il → jk, jk → il or jl → ik, we can count B in a similar fashion.

The running time can be reduced to O(n²d) by changing the counting schemes for the A and B functions. The gist of the improved counting scheme is to extend the table of shared leaf counts, |F ∩ G|, with the values |F̄ ∩ G|, |F ∩ Ḡ| and |F̄ ∩ Ḡ|, where F̄ denotes the set of leaves not in subtree F. Using these extra counts, the algorithm builds additional tables, over-counting the number of quartets claimed and, then, adjusting the over-count. For details on this, rather involved, counting scheme, we refer to the original paper [21].

Nielsen et al. [22] later developed an O(n^(2+α)) algorithm, where α = (ω − 1)/2 and O(n^ω) is the time it takes to multiply two n × n matrices. Using the Coppersmith-Winograd algorithm [23], where ω = 2.376, this yields a running time of O(n^2.688), and it was the first guaranteed sub-cubic time algorithm for computing the quartet distance between general trees. Here, the underlying idea is to reduce an explicit iteration over O(d²) pairs of claims to a matrix-matrix multiplication as part of the counting iteration. Again, we refer to the original paper for details [22].

#### 4.2. Tree Coloring

Stissing et al. [24] presented an O(d⁹ n log n) time algorithm for computing the quartet distance between two trees, T₁ and T₂, where d = max{d₁, d₂} for dᵢ, the maximal degree of a node in tree Tᵢ. This was the first algorithm for general trees allowing for sub-quadratic time. The approach of [24] stays rather close to that of [14,15], but with a decoration scheme adapted to general trees. The dependency, d⁹, on d stems from the HDT data structure having a d factor in its balance bound when applied to general trees and from the applied decoration scheme for the HDT requiring O(d⁸) time for the update of the decoration of a node based on its children after a color change.

The algorithms of Brodal et al. [5] reduce the running time for the quartet distance to O(dn log n), where d can furthermore be taken to be min{d₁, d₂}, which is of significance if only one of the trees has a high degree.

The hierarchical decomposition tree for general trees has L components, containing a single leaf from T₂, and I components, containing a single internal node from T₂. The internal nodes of the HDT are formed by C components, containing a connected set of nodes with at most two external edges to other components, and G components, containing the subtrees of some of the siblings of a node in T₂. During construction of the HDT, C and G components are created by the transformations and compositions shown in Figure 11. The construction proceeds in rounds, with each round performing a set of non-overlapping compositions. It is shown in [5] that in O(log n) rounds and O(n) time, a 1/log(10/9)-locally balanced HDT is formed. This plays the role of the HDT described in Section 3.2, but now for general trees.

**Figure 12.** The anchors (white nodes) of resolved and unresolved triplets and quartets. Edges in the figures represent paths in the tree.

The procedure for coloring T₁ recursively can be seen in Figure 13. It maintains the following invariants: (1) when entering a node, v, all leaves in the subtree of v have the color 1, and all leaves not in the subtree of v have the color 0; (2) when exiting v, all leaves in T₁ have the color 0. To establish Invariant (1) initially, all leaves are colored 1 at the start. As in Section 3.2, the operations, `Contract` and `Extract`, are then added (analogously to Figure 6) for exploiting a variant of the “extended smaller-half trick”, in order to arrive at the stated running times.

## 5. Experimental Results

For the quartet distance between binary trees, we compared the O(n log² n) time algorithm implemented in QDist [14,26], the O(n^2.688) time algorithm from Nielsen et al. [22] and the O(dn log n) time algorithm from Brodal et al. [5]. For the quartet distance between non-binary trees, we considered trees with degrees of eight and 128, using only the last two of the three algorithms, since the QDist algorithm only works on binary trees. Note, however, that the implementation from [5] is not strictly the variation described in the paper; it contains algorithmic optimizations improving both the asymptotic and practical runtime, as described in [25].

In these experiments, the implementation of the O(n^2.688) time algorithm is faster than the implementation of the O(n log² n) time algorithm. For all input sizes depicted, however, the implementation of the O(dn log n) time algorithm is the fastest, computing the distance between two trees with one million leaves in well under two minutes.

For non-binary trees, the implementation of the O(n^2.688) time algorithm is faster on small trees (where the exact point at which the other algorithm becomes faster depends on the degree).

For the triplet distance, the implementation of the O(n log² n) time algorithm [16] is faster than the O(n log n) time algorithm [5] for all measured sizes. For the general algorithm, we again observe that it performs faster on high-degree trees (where, since the number of leaves is fixed, we have fewer internal nodes to handle).

**Figure 16.** Triplet distance running time. For the O(n log² n) time algorithm, which only handles binary trees, results are only shown for a degree of d = 2, while for the general algorithm, results are also shown for d = 8 and d = 128.

## 6. Conclusions

## Acknowledgments

## References

- Robinson, D.; Foulds, L.R. Comparison of phylogenetic trees. Math. Biosci.
**1981**, 53, 131–147. [Google Scholar] [CrossRef] - Estabrook, G.F.; McMorris, F.R.; Meacham, C.A. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst. Zool.
**1985**, 34, 193–200. [Google Scholar] [CrossRef] - Critchlow, D.E.; Pearl, D.K.; Qian, C. The triples distance for rooted bifurcating phylogenetic trees. Syst. Biol.
**1996**, 45, 323–334. [Google Scholar] [CrossRef] - Day, W.H.E. Optimal-algorithms for comparing trees with labeled leaves. J. Classif.
**1985**, 2, 7–28. [Google Scholar] [CrossRef] - Brodal, G.S.; Fagerberg, R.; Mailund, T.; Pedersen, C.N.S.; Sand, A. Efficient Algorithms for Computing the Triplet and Quartet Distance between Trees of Arbitrary Degree. In Proceedings of the annual ACM-SIAM Symposium on Discrete Algorithms (SODA), New Orleans, LA, USA, January 2013; pp. 1814–1832.
- Steel, M.; Penny, D. Distributions of tree comparison metrics—Some new results. Syst. Biol.
**1993**, 42, 126–141. [Google Scholar] - Bandelt, H.J.; Dress, A. Reconstructing the shape of a tree from observed dissimilarity data. Adv. Appl. Math.
**1986**, 7, 309–343. [Google Scholar] [CrossRef] - Huson, D.H.; Scornavacca, C. Dendroscope 3: An interactive tool for rooted phylogenetic trees and networks. Syst. Biol.
**2012**, 61, 1061–1067. [Google Scholar] [CrossRef] - Snir, S.; Rao, S. Quartet MaxCut: A fast algorithm for amalgamating quartet trees. Mol. Phylogenetics Evol.
**2012**, 62, 1–8. [Google Scholar] [CrossRef] - Bansal, M.S.; Dong, J.; Fernández-Baca, D. Comparing and aggregating partially resolved trees. Theor. Comput. Sci.
**2011**, 412, 6634–6652. [Google Scholar] [CrossRef] - Pompei, S.; Loreto, V.; Tria, F. On the accuracy of language trees. PLoS One
**2011**, 6, e20109. [Google Scholar] [CrossRef] - Walker, R.S.; Wichmann, S.; Mailund, T.; Atkisson, C.J. Cultural phylogenetics of the Tupi language family in lowland South America. PLoS One
**2011**, 7, e35025. [Google Scholar] - Bryant, D.; Tsang, J.; Kearney, P.; Li, M. Computing the Quartet Distance between Evolutionary Trees. In Proceedings of the annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA, January 2000.
- Brodal, G.S.; Fagerberg, R.; Pedersen, C.N.S. Computing the Quartet Distance Between Evolutionary Trees in Time O(n log² n). In Proceedings of the annual International Symposium on Algorithms and Computation, Christchurch, New Zealand, 19–21 December 2001; Springer: Berlin/Heidelberg, Germany, 2001; Volume 2223, pp. 731–742. [Google Scholar] - Brodal, G.S.; Fagerberg, R.; Pedersen, C.N.S. Computing the quartet distance between evolutionary trees in time O(n log n). Algorithmica
**2004**, 38, 377–395. [Google Scholar] [CrossRef] - Sand, A.; Brodal, G.S.; Fagerberg, R.; Pedersen, C.N.S.; Mailund, T. A practical O(n log² n) time algorithm for computing the triplet distance on binary trees. BMC Bioinforma. **2013**, 14, S18. [Google Scholar] - Mehlhorn, K. Data Structures and Algorithms: Sorting and Searching; Springer: Berlin/Heidelberg, Germany, 1984. [Google Scholar]
- Buneman, O.P. The recovery of trees from measures of dissimilarity. In Mathematics of the Archeological and Historical Sciences; Kendall, D.G., Tautu, P., Eds.; Columbia University Press: New York, NY, USA, 1971; pp. 387–395. [Google Scholar]
- Bryant, D.; Moulton, V. A polynomial time algorithm for constructing the refined buneman tree. Appl. Math. Lett.
**1999**, 12, 51–56. [Google Scholar] [CrossRef] - Christiansen, C.; Mailund, T.; Pedersen, C.N.S.; Randers, M. Computing the Quartet Distance Between Trees of Arbitrary Degree. In Proceeding of the annual Workshop on Algorithms in Bioinformatics, Mallorca, Spain, 3–6 October 2005; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3692, pp. 77–88. [Google Scholar]
- Christiansen, C.; Mailund, T.; Pedersen, C.N.S.; Randers, M.; Stissing, M. Fast calculation of the quartet distance between trees of arbitrary degrees. Algorithms Mol. Biol.
**2006**, 1, 16–16. [Google Scholar] [CrossRef] - Nielsen, J.; Kristensen, A.; Mailund, T.; Pedersen, C.N.S. A sub-cubic time algorithm for computing the quartet distance between two general trees. Algorithms Mol. Biol.
**2011**. [Google Scholar] [CrossRef] - Coppersmith, D.; Winograd, S. Matrix multiplication via arithmetic progressions. J. Symb. Comput.
**1990**, 9, 251–280. [Google Scholar] [CrossRef] - Stissing, M.; Pedersen, C.N.S.; Mailund, T.; Brodal, G.S.; Fagerberg, R. Computing the Quartet Distance between Evolutionary Trees of Bounded Degree. In Proceedings of the Asia-Pacific Bioinformatics Conference, Hong Kong, 15–17 January 2007; pp. 101–110.
- Johansen, J.; Holt, M.K. Computing Triplet and Quartet Distances. Master’s Thesis, Aarhus University, Department of Computer Science, Aarhus, Denmark, 2013. [Google Scholar]
- Mailund, T.; Pedersen, C.N.S. QDist–Quartet distance between evolutionary trees. Bioinformatics
**2004**, 20, 1636–1637. [Google Scholar] [CrossRef]

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Sand, A.; Holt, M.K.; Johansen, J.; Fagerberg, R.; Brodal, G.S.; Pedersen, C.N.S.; Mailund, T.
Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees. *Biology* **2013**, *2*, 1189-1209.
https://doi.org/10.3390/biology2041189
