From trees to barcodes and back again: theoretical and statistical perspectives

Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a sort of stochastic inverse to the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. We prove moreover that the TNS algorithm is stable with respect to specific types of noise.


Introduction
Although geometric approaches to analyzing data have been extensively used for many years, the first topological methods for data analysis were developed only recently, e.g., [8], [7], [22], [26], [25] and [4]. Topological Data Analysis (TDA) is a fairly new field at the intersection of data science and algebraic topology, the aim of which is to provide robust mathematical, statistical, and algorithmic methods to infer and analyze the topological and geometric structures underlying complex data. These data are often represented as point clouds in Euclidean or metric spaces, though TDA methods have also been generalized to geometric objects and graphs. TDA has proved its utility in a wide range of applications in biology [11], [3], [18], [10], material science [16], and geology [19], among other fields. Although it is still rapidly evolving, TDA now provides a set of powerful and efficient tools that can be used in combination with or as complements to other data science tools. One of the most promising applications of TDA is to the study of the brain, where it has served to analyze neuronal morphologies [13], brain networks [21], [23], [13], and brain functionality [24]. Motivated by the desire to objectively classify neuronal morphologies, in a previous publication (Kanari and Hess in [13]) we designed a topological signature for trees, the Topological Morphology Descriptor (TMD), that assigns a barcode (i.e., a multi-set of open intervals, called bars, in the real line) to any geometric tree (i.e., any finite binary tree embedded in R^3). We showed that the TMD algorithm effectively determines the reliability of clusterings of random and neuronal trees. Moreover, using the TMD algorithm, we performed an objective, stable classification of pyramidal cells in the rat neocortex [14], based only on the shape of their dendrites.
A frequent topic of discussion in the context of TDA is how to define an inverse to the process of associating a particular topological descriptor to a dataset, i.e., how to design a practical algorithm to recover the input data from a topological descriptor, such as a barcode. Oudot and Solomon [20] and Curry et al. [6] have proposed partial solutions to this problem. The main obstacle that renders this endeavor particularly challenging has proven to be the computational complexity of the space of inputs considered. To avoid this obstacle, it is reasonable to constrain the input space and search only for an inverse transformation that is relevant in a specific context, for instance to look for solutions only in the space of embedded graphs, as in [2].
In the context of geometric trees, we have designed an algorithm to reverse-engineer the TMD [12], in order to digitally generate artificial neurons, to compensate for the dearth of available biological reconstructions. This algorithm, called Topological Neuron Synthesis (TNS), stochastically generates a geometric tree from a barcode, in a biologically grounded manner. As shown in [12], the synthesized neurons are statistically indistinguishable from the corresponding reconstructed neurons in terms of both their morphological characteristics and the networks they form.
In this article, we further study the properties of this generative algorithm, from mathematical and statistical perspectives. We perform a theoretical and computational analysis of the TMD and TNS algorithms and their mathematical properties, in which symmetric groups play a key role. In particular, we investigate in detail the extent to which the TNS provides an inverse to the TMD. First, we carefully define our objects of study (geometric trees, barcodes, and persistence diagrams), then recall the TMD and TNS algorithms. We also introduce two distinct classifications of geometric trees: into combinatorial types and into TMD-types. The symmetric groups play an important role in our classification of trees into TMD-types. These complementary descriptions provide us with a language in which to formulate our results on the relationship between the TMD and the TNS.
In the next section, we introduce tools to describe the set of geometric trees that realize a specific barcode, i.e., whose TMD is equal to that barcode. In particular we establish an explicit formula for the cardinality of this set, which we use to describe how the cardinality changes when a new bar is added to a barcode or two bars of a barcode are permuted. Cayley graphs of symmetric groups provide a useful visualization of these effects.
We then study the composite of the TNS and TMD algorithms from a theoretical perspective, to quantify the extent to which the TNS acts as an inverse to the TMD. For a given barcode B, we show that the probability that the bottleneck distance between the barcodes B and TMD • TNS(B) is greater than ε decreases exponentially with ε, thus establishing a form of stability for the TNS. We prove, moreover, that the probability that two bars of a barcode B will be permuted by applying TMD • TNS decreases exponentially with the distance between the terminations of the two bars, which is another form of stability. Together these stability results imply that the TNS is an excellent approximation to a (right) inverse to the TMD.
In the final section we present computational results that illustrate the complex relationship between a barcode and its possible tree-realizations. In particular, we study the distinguishing characteristics of "biological" geometric trees, i.e., those that arise from digital reconstructions of neurons, as opposed to arbitrary geometric trees. We also show that both the combinatorial type and the TMD-type of a geometric tree can change significantly when applying the composite TNS • TMD, from which it follows that the TNS is not a left inverse to the TMD. The tree T that we start with is indicated in dashed red lines under the new tree T' = TNS • TMD(T). The trees T and T' can be quite different combinatorially, as seen on the right.

Mathematical background
Precisely what a mathematician means by the terms "tree" and "barcode" can vary depending on context. First, we specify what these terms mean in this article. We then recall biologically motivated algorithms for generating barcodes from trees [13] and trees from barcodes [12], the relation between which will be made clear in the following sections.

Trees
A finite rooted tree T is an acyclic, finite, directed graph such that each vertex is of degree at most 3, with a distinguished vertex r of degree 1, called the root. A vertex v of T is a parent of a vertex w if there is a directed edge from w to v; the vertex w is then a child of v. Each vertex of T has a single parent, except for the root r, which has no parent, and at most two children. The non-root vertices of degree 1 are called the leaves of T, and the vertices of degree 3 the branch points of T. A finite tree T is fully specified by its set of vertices, equipped with the partial order "is a parent of". Our main objects of study in this article are geometric trees, i.e., embeddings of finite rooted trees in R^3, which are often used to model neurons. We assume, moreover, that if a vertex v is the parent of a vertex w, then the distance from the root to v is less than that from the root to w. Let T denote the set of geometric trees.
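As a concrete (and purely illustrative) data structure, a geometric tree can be stored as a list of vertex positions together with a parent pointer per vertex. The following Python sketch is ours, not taken from any reference implementation; it records exactly the data used in this article, namely the embedding and the partial order "is a parent of".

```python
import math

class GeometricTree:
    """A finite rooted tree embedded in R^3: each vertex stores its
    position and the index of its parent (-1 for the root, vertex 0)."""

    def __init__(self):
        self.positions = []  # (x, y, z) per vertex
        self.parents = []    # parents[v] is the parent of vertex v

    def add_vertex(self, position, parent=-1):
        self.positions.append(position)
        self.parents.append(parent)
        return len(self.positions) - 1

    def children(self, v):
        return [w for w, p in enumerate(self.parents) if p == v]

    def degree(self, v):
        return len(self.children(v)) + (0 if self.parents[v] == -1 else 1)

    def dist_to_root(self, v):
        """delta(v): Euclidean distance from vertex v to the root."""
        return math.dist(self.positions[0], self.positions[v])

# A small tree: root -> branch point -> two leaves
t = GeometricTree()
root = t.add_vertex((0.0, 0.0, 0.0))
bp = t.add_vertex((1.0, 0.0, 0.0), parent=root)
leaf1 = t.add_vertex((2.0, 1.0, 0.0), parent=bp)
leaf2 = t.add_vertex((3.0, 0.0, 0.0), parent=bp)
leaves = [v for v in range(4) if v != root and t.degree(v) == 1]
```

The root has degree 1 and the branch point degree 3, matching the definition above.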
We say that two geometric trees T and T' are combinatorially equivalent, denoted T ∼_comb T', if they are embeddings of the same finite rooted tree.
In other words, the combinatorial type of a geometric tree is independent of its embedding in R^3. A persistence barcode can equivalently be represented as a multi-set of points in R^2, called a persistence diagram, where a bar (b_i, d_i) corresponds to a point in R^2 with x-coordinate d_i and y-coordinate b_i. If B is a barcode, we let PD(B) denote the associated persistence diagram. Note that, under this correspondence, the points of PD(B) lie below the diagonal, since b_i is less than d_i for every i.

Barcodes
We say that a barcode B = {(b_i, d_i)}_{i=0,...,n} is strict if the first bar (b_0, d_0) properly contains all the others, i.e., b_0 < b_i and d_i < d_0 for all i, and no two bars are born or die at the same time, i.e., b_i ≠ b_j and d_i ≠ d_j for all i ≠ j. The birth times of a strict barcode admit a total ordering. Without loss of generality, we assume that the bars are ordered by birth value, that is b_0 < b_1 < · · · < b_n. Let B^st denote the set of strict barcodes, and let B^st_n denote the subset of those strict barcodes with n + 1 bars. We say that two strict barcodes are equivalent if the orderings of their death times, relative to the common ordering by birth times, coincide. We denote the equivalence class containing a strict barcode B = {(b_i, d_i)}_{i=0,...,n} by (i_1 ... i_n), where d_{i_k} > d_{i_{k+1}} for all 1 ≤ k < n. For example, (2134) corresponds to the barcode with 5 bars shown in Figure 2: its deaths are ordered d_2 > d_1 > d_3 > d_4, leading to the notation (2134).
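The equivalence class of a strict barcode can be computed in a few lines: sort the bars by birth, then record the order of the remaining deaths. The sketch below is our own illustration (all names are ours).

```python
def equivalence_class(barcode):
    """Equivalence class (i_1 ... i_n) of a strict barcode: order the bars
    by birth (bar 0 is the one containing all the others), then list the
    indices of bars 1, ..., n by decreasing death time."""
    bars = sorted(barcode, key=lambda bar: bar[0])
    return tuple(sorted(range(1, len(bars)), key=lambda i: -bars[i][1]))

# The class (2134): five bars with d_2 > d_1 > d_3 > d_4
B = [(0.0, 10.0), (1.0, 7.0), (2.0, 8.0), (3.0, 5.0), (4.0, 3.0)]
cls = equivalence_class(B)
```

Here `cls` is `(2, 1, 3, 4)`, the class denoted (2134) in the text.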

The TMD: from trees to barcodes
The TMD (Topological Morphology Descriptor) is a many-to-one function from the set of geometric trees to the set of barcodes that encodes the overall shape of the tree, both the topology of the branching structure of a tree and its embedding in R^3 [13]. It is defined recursively as follows. Let T be a rooted tree with root r and set N of vertices, with subset L of leaves. Let δ : N → R_{≥0} be the function that assigns to each vertex its Euclidean distance to the root r.
For each vertex v, let µ(v) denote the largest value of δ attained by a leaf of the subtree rooted at v; we order the children of any vertex of T by their µ-values. The algorithm that extracts the TMD of a geometric tree T proceeds as follows (Figure 3). Start by creating a set A of active vertices, originally set equal to L, and an empty barcode. For each leaf l, the algorithm proceeds recursively along its unique path to the root r. At each branch point b, one applies the standard Elder Rule from topological data analysis [5], removing from A all of the children of b, and adding b to A. One bar is added to the barcode for each child of b except (any one of) the oldest. Each child removed from A corresponds to a path from some leaf l to b, which is recorded in the barcode as a bar (δ(b), δ(l)). These operations are applied iteratively to all the vertices until the root r is reached, at which point A contains only r and a leaf l for which µ is maximal among all leaves, which is recorded in the barcode as a bar (0, δ(l)).
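The recursion above can be sketched compactly. The following Python function is our own illustration, not the reference implementation of [13]; it computes µ bottom-up and then applies the Elder Rule at each branch point.

```python
def tmd(parents, delta):
    """Sketch of the TMD (Elder Rule): parents[v] is the parent of vertex v
    (-1 for the root, vertex 0) and delta[v] its distance to the root.
    Returns the barcode as a sorted list of (birth, death) pairs."""
    children = {v: [] for v in range(len(parents))}
    for v, p in enumerate(parents):
        if p >= 0:
            children[p].append(v)

    mu = {}  # mu[v]: largest delta-value over the leaves below v

    def compute_mu(v):
        mu[v] = delta[v] if not children[v] else max(compute_mu(c) for c in children[v])
        return mu[v]

    compute_mu(0)
    barcode = []

    def collect(v):
        kids = sorted(children[v], key=lambda c: mu[c], reverse=True)
        for c in kids[1:]:        # every child but the oldest dies here
            barcode.append((delta[v], mu[c]))
        for c in kids:
            collect(c)

    collect(0)
    barcode.append((0.0, mu[0]))  # the surviving leaf yields the longest bar
    return sorted(barcode)

# root -> branch point at distance 1 -> leaves at distances 2 and 3
assert tmd([-1, 0, 1, 1], [0.0, 1.0, 2.0, 3.0]) == [(0.0, 3.0), (1.0, 2.0)]
```

On the two-leaf example, the leaf at distance 3 is the oldest and survives to the root, while the other leaf dies at the branch point, as the Elder Rule dictates.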
If T is a digital reconstruction of a neuron, and the function δ is the path distance from the soma, then TMD(T ) is actually a strict barcode. Indeed, the probability for two branch points or leaves to be exactly the same distance from the soma is almost zero, and TMD(T ) always has a longest bar that contains all the others. This observation justifies our interest in the subset of strict barcodes.
The TMD gives rise to an equivalence relation on T: two geometric trees T and T' are TMD-equivalent if TMD(T) = TMD(T'). We provide below an in-depth analysis of the TMD-equivalence classes of geometric trees. Geometric trees can be combinatorially equivalent without being TMD-equivalent and vice-versa, cf. Figure 5.

The TNS: from barcodes to trees
The topological neuron synthesis (TNS) algorithm [12] stochastically generates synthetic neurons, in particular for use in digital reconstructions of brain circuitry [17]. In this paper, we focus on the sub-process of the TNS that stochastically generates a geometric tree from a strict barcode, in such a way that if a tree T is generated from a barcode B, then TMD(T) is "close to" B, with respect to an appropriate metric on the set of barcodes, up to some stochastic noise, cf. section 4. Henceforth, when we refer to the TNS, we mean this sub-process.
To grow geometric trees, the TNS algorithm first initiates growth, then loops through steps of elongation and branching/termination. Each branch of the tree is elongated as a directed random walk [1] with memory. At each step, a growing tip is assigned probabilities to bifurcate, to terminate, or to continue that depend on the path distance from the root and on a chosen bar of the selected barcode. Once a bar has been used, it is removed from the barcode. The growth of a tree terminates when no bars remain to be used. We now provide further details of the two steps in this process.

Bifurcation / Termination
The branching process in the TNS algorithm is based on the concept of a Galton-Watson tree [9], which is a finite rooted tree recursively generated as follows. At each step, a number of offspring is independently sampled from a distribution. Since a geometric tree consists only of bifurcations, terminations, and continuations, the accepted values for the number of offspring are: zero (termination), one (continuation), and two (bifurcation). The Galton-Watson algorithm generates only a combinatorial tree, with no embedding in space, so we modify the traditional process to introduce a dependency of the tree growth on the embedding, so that the bifurcation/termination probabilities depend on the path distance of the growing tip from the root.
The bifurcation/termination step of the growth process of a geometric tree with associated barcode B proceeds as follows. Each growing tip of the tree is assigned a bar (b i , d i ) sampled from the barcode B and a bifurcation angle a i . The growing tip first checks the probability to bifurcate, then the probability to terminate. If the growing tip does not bifurcate or terminate, then the branch continues to elongate. The probability to bifurcate depends on b i : as the distance from the root to the growing tip approaches b i , the probability to bifurcate increases exponentially until it attains a maximum of 1 at b i . Similarly, the probability to terminate depends exponentially on d i .
The probabilities to bifurcate and terminate are sampled from an exponential distribution e^{−λx}, whose free parameter λ should be wisely chosen. A very steep exponential distribution (high value of λ) reduces the variance of the population of geometric trees synthesized based on the same barcode. On the other hand, a very low value of λ results in trees that are almost random, since the dependence on the input persistence barcode is decreased significantly. If we assume that growth takes place in discrete steps of size L, the value of the parameter λ should be of the order of the step size L, to ensure biologically appropriate variance [12]. Assuming L = 1 in some appropriate units, we usually select λ ≈ 1, so that the bifurcation and termination points are stochastically chosen but still strongly correlated with the input persistence barcodes.
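A minimal sketch of this decision step, under simplifying assumptions of our own (one bifurcation check followed by one termination check per growth step, probabilities of the exponential form above, and illustrative function names):

```python
import math
import random

def step_decision(tip_distance, bar, lam=1.0, rng=random):
    """Fate of a growing tip at path distance tip_distance from the root,
    given its assigned bar (b_i, d_i): the probability to bifurcate
    (resp. terminate) grows like exp(-lam * remaining distance) and
    reaches 1 at b_i (resp. d_i).  The bifurcation check comes first."""
    b_i, d_i = bar
    p_bifurcate = math.exp(-lam * max(b_i - tip_distance, 0.0))
    p_terminate = math.exp(-lam * max(d_i - tip_distance, 0.0))
    if rng.random() < p_bifurcate:
        return "bifurcate"   # in the TNS the bar is then removed from the barcode
    if rng.random() < p_terminate:
        return "terminate"   # the tip is deactivated and the bar removed
    return "continue"

# At the birth target itself the tip bifurcates with probability 1
assert step_decision(5.0, (5.0, 9.0)) == "bifurcate"
```

A large λ makes the decisions nearly deterministic at the targets b_i and d_i, while a small λ decouples them from the input barcode, as discussed above.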
In contrast to other neuron synthesis algorithms [15] that sample the branching and termination probabilities from independent distributions, in the TNS the correlation of these probabilities is captured in the structure of the barcode. When the growing tip bifurcates, the corresponding bar is removed from the input barcode to exclude re-sampling of the same conditional probability, thus recording the tree's growth history, which is essential for reproducing the branching structure. In the event of a termination, the growing tip is deactivated, and the bar that corresponds to this termination point is removed from the reference barcode.
At a bifurcation, the directions of the two daughter branches created depend on the bifurcation angle a i . In this study, we focus primarily on the combinatorial type and the TMD type of the generated geometric tree, so we do not investigate the effect of bifurcation angles on the growth.

Elongation
We now describe how the synthesized trees are embedded in R^3. A segment of a growing tree is the portion of the tree between a pair of consecutive vertices (parent and child). Each synthesized tree is grown segment by segment. The direction of a segment, i.e., the vector d from its starting point to its end point, is a weighted sum of three unit vectors: the cumulative memory m of the directions of previous segments within a branch, a target vector t, and a random vector r [15]. The memory term is a weighted sum of the previous directions of the branch, with the weights decreasing with distance from the tip. As long as the memory function decreases faster than linearly with the distance from the growing tip, the exact choice of function is not important [12]. The target vector is chosen at the beginning of each branch and depends on the bifurcation angles. The random component is a unit vector sampled uniformly from the unit sphere in R^3 at each step. The direction of the segment, d = ρr + τt + µm, thus depends on three weight parameters ρ, τ, and µ, where ρ + τ + µ = 1. An increase of the randomness weight ρ results in a highly tortuous branch, approaching the limit of a simple random walk when ρ = 1. If the targeting weight τ = 1, the branch will be a straight line in the target direction. Different combinations of the three parameters (τ, ρ, µ) generate more or less meandering branches and thus reproduce a large diversity of geometric trees.
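The weighted sum d = ρr + τt + µm can be sketched directly. In the snippet below (ours, with illustrative names), the random unit vector is obtained by normalizing an isotropic Gaussian sample, a standard way to sample the sphere uniformly.

```python
import math
import random

def segment_direction(memory, target, rho=0.2, tau=0.5, mu=0.3, rng=random):
    """Direction of the next segment, d = rho*r + tau*t + mu*m, where r is
    a random unit vector, t the target direction, and m the memory
    direction; the weights must sum to 1."""
    assert abs(rho + tau + mu - 1.0) < 1e-9
    # uniform random unit vector: normalize an isotropic Gaussian sample
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in g))
    r = [x / norm for x in g]
    return [rho * r[k] + tau * target[k] + mu * memory[k] for k in range(3)]

# With tau = 1 the branch grows straight towards the target
straight = segment_direction([0.0, 0.0, 1.0], [1.0, 0.0, 0.0],
                             rho=0.0, tau=1.0, mu=0.0)
```

By the triangle inequality the resulting direction has norm at most ρ + τ + µ = 1, so the step size is controlled by the weights.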

The Elder Rule and TNS
The TNS provides a sort of right inverse to the TMD. To recreate a tree that is close to TMD-equivalent to the original, the branch corresponding to a particular bar (b i , d i ) in the barcode can be attached only to branches corresponding to bars (b j , d j ) such that d i < d j and b i > b j . This rule ensures that the Elder rule (at a bifurcation, the older component survives) holds in the TMD transformation. As a result, only a subset of trees with n branches can be generated by the TNS from a given strict barcode with n bars.

Tree-realizations of barcodes
In this section we provide an in-depth analysis of the set of geometric trees that realize a specific strict barcode B, i.e., each of which has TMD equal to B.

Realizing barcodes as trees
A geometric tree T is a tree-realization of a barcode B if TMD(T ) = B, i.e., T ∈ TMD −1 (B). Examples of tree-realizations are provided in Figure 4B, while Figure 5 shows all the possible combinatorial types of tree-realizations of a strict barcode with n = 4.
In Figures 4 and 5, we encode the combinatorial structure of the tree, i.e., how the branches may be attached to each other, in an adjacency matrix in which the (i, j) coefficient is non-zero if the Elder Rule allows bar i to be connected to bar j. For example, in Figure 4A, bars 1, 2, and 3 may all be connected to the black bar 0, thus the coefficients (0, 1), (0, 2), (0, 3) are all non-zero in the corresponding adjacency matrix. Note that in each realization only a subset of these possible attachments is actually made (Figure 4B), since each branch can be attached to only one other branch.
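Under the attachment condition of the previous section (branch i may attach to branch j when b_j < b_i and d_i < d_j), this adjacency matrix can be computed as follows. The Python sketch is ours, with names of our own choosing.

```python
def attachment_matrix(barcode):
    """Adjacency matrix of allowed attachments: entry (j, i) is 1 when the
    branch of bar i may be attached to the branch of bar j, i.e. when
    b_j < b_i and d_i < d_j (the Elder Rule)."""
    bars = sorted(barcode, key=lambda bar: bar[0])
    n = len(bars)
    return [[1 if bars[j][0] < bars[i][0] and bars[i][1] < bars[j][1] else 0
             for i in range(n)]
            for j in range(n)]

# Strictly ordered barcode with four bars: arrows 0->1,2,3; 1->2,3; 2->3
A = attachment_matrix([(0.0, 10.0), (1.0, 9.0), (2.0, 8.0), (3.0, 7.0)])
```

The rows of `A` reproduce the arrows of the connectivity diagram described above, and the sum of column i is the number of bars properly containing bar i.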
The connectivity diagram (bottom of Figure 4A) provides another representation of the pairs of branches that may be connected, in agreement with the Elder Rule. The arrow on an edge in the diagram indicates the direction of the connection. In this example, there are arrows from 0 towards 1, 2, and 3, from 1 to 2 and 3, and from 2 to 3.

Figure 4: A strict barcode, whose bars are ordered according to birth times, defines a unique ordering of death times. This ordering and the Elder Rule constrain the possible combinatorial types of trees that correspond to this barcode.

Proof. The order of the deaths in a strict barcode B completely determines the set of combinatorial equivalence classes of its possible tree-realizations. Indeed, the two pairs of bars in Figure 6(2) lead to the same adjacency possibilities for their respective branches. Only move (1) in Figure 6, corresponding to switching the order of the deaths of the two bars, modifies the possible tree adjacencies.

Figure 6: The two possible moves that respect the condition of a realisable barcode. Move (1) modifies the barcode's ordering, whereas move (2) does not change the order of the deaths.

The combinatorics of tree-realization
The tree-realization number TRN(B) of a strict barcode B = {(b_i, d_i)}_{i=0,...,n} is given by the product, over i = 1, ..., n, of the index of the bar (b_i, d_i), i.e., the number of bars of B that properly contain it. A version of this formula was established by Curry in [5]. In particular, the maximum tree-realization number for a strict barcode with n + 1 bars is n!, in the specific case where d_n < ... < d_1 < d_0. We call this case a strictly ordered barcode. Lemma 3.2 enables us to quantify how adding a new bar changes the tree-realization number.
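The product formula is easy to check computationally. The sketch below (ours) computes each bar's index by counting the bars that properly contain it and multiplies the indices together.

```python
def tree_realization_number(barcode):
    """TRN(B): the product, over the bars other than the longest one, of
    each bar's index (the number of bars properly containing it)."""
    bars = sorted(barcode, key=lambda bar: bar[0])
    trn = 1
    for i in range(1, len(bars)):
        trn *= sum(1 for j in range(len(bars))
                   if bars[j][0] < bars[i][0] and bars[i][1] < bars[j][1])
    return trn

# Strictly ordered barcode with n + 1 = 4 bars: TRN = 3! = 6
assert tree_realization_number([(0, 10), (1, 9), (2, 8), (3, 7)]) == 6
```

For the barcode of Example 3.4 below (class (213), with deaths d_0 > d_2 > d_1 > d_3), the indices are 1, 1, and 3, so TRN(B) = 3.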
Proof. The condition on d_{n+1} implies that the new bar (b_{n+1}, d_{n+1}) is included in exactly k other bars, so its index is k.

Example 3.4. Let B be a barcode with four bars such that d_0 > d_2 > d_1 > d_3, i.e., its equivalence class is (213). It is easy to see that TRN(B) = 3 (see Figure 7). If we add a new bar (b_4, d_4) that is included in exactly k of the other bars, then the tree-realization number is multiplied by k, the index of the new bar.

We can also apply Lemma 3.2 to determine how switching the order of two consecutive deaths in a barcode affects the tree-realization number.
Proof. It is enough to prove (1), since (2) then follows by switching the roles of B and B'.
, but otherwise respects all of the same inclusion relations as (b_{i_{k+1}}, d_{i_{k+1}}). Because no other bars are affected when passing from B to B', we can conclude.
Example 3.6. In Figure 9, we show all the possible death-transpositions in a strict barcode with five bars. As an example, take B in the equivalence class (2134), so the barcode satisfies d_2 > d_1 > d_3 > d_4. The index of (b_4, d_4) is 4, because it is included in all the other bars. Permuting d_3 and d_4 leads to a barcode in the equivalence class (2143). The index of the last bar is now 3, because it is no longer included in the third bar.
Recall the bijection σ : B^st_n → S_n from section 2.2. Permuting the order of the deaths d_i and d_{i+1} corresponds to a transposition (i, i + 1) in S_n (and to move (1) in Figure 6). Studying the allowed moves and their effects on the barcode is equivalent to studying the symmetric group seen as generated by transpositions of type (i, i + 1), enabling us to create the following revealing visualization of the effect of switching the order of deaths or of adding a new bar. Figure 8 shows the Cayley graph of S_3 generated by the permutations (12) and (23) and the corresponding equivalence classes of barcodes. The vertices of the graph correspond to the permutations in the symmetric group and their corresponding barcode types, and the edges between them to the transpositions transforming one permutation into another. The number next to each bar is its index.

Stability of the TNS
In this section, we investigate the effect of the composition of the TNS and TMD algorithms from a theoretical perspective. Given a strict barcode B, let B' denote a barcode obtained as TMD • TNS(B). Expressing the similarity between B and B' in terms of the bottleneck distance enables us to establish one form of stability for the TNS in the first part of this section. We establish another type of stability for the TNS in the second part, when we show that the probability that the order of two specific bars will be altered upon applying TMD • TNS decreases exponentially with the distance between the death times of the two bars. Together these stability results imply that the TNS is an excellent approximation to a (right) inverse to the TMD.

Bottleneck stability
Henceforth, we call the endpoints of the bars of the barcode B targets, as the TNS algorithm either creates a new branch or terminates a branch when the distance from the root approaches a birth or death point, respectively. By definition of the TNS algorithm (cf. section 2.4), when approaching a target, there is an exponential probability to bifurcate (create a new branch) or terminate, depending on λ. Therefore, the distance between b_i and b'_i and the distance between d_i and d'_i both follow an exponential distribution of parameter λ. Let X_i denote the random variable |d_i − d'_i| ∼ Exp(λ) for all i ∈ {0, ..., n}.
The notion of similarity between barcodes that we consider here is the standard bottleneck distance. Given two strict barcodes B and B' with the same number of bars, the bottleneck distance between them is

d(B, B') = min_γ max_i max(|b_i − b'_{γ(i)}|, |d_i − d'_{γ(i)}|),

where γ ranges over all permutations matching the bars of B with the bars of B'. The TNS algorithm thus exhibits a form of "barcode-type stability": the probability that the bottleneck distance between B and TMD • TNS(B) is greater than ε decreases exponentially with ε.
Proof. Since the differences between the new and original values of the targets all follow an exponential distribution of parameter λ, each of them exceeds ε with probability exp(−λε). Matching each bar of B with the corresponding bar of B', i.e., taking γ to be the identity permutation, bounds d(B, B') from above by the largest of these differences. Since the bottleneck distance is computed by taking the infimum over all permutations, it follows that the probability that d(B, B') > ε decreases exponentially with ε.
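This behavior admits a quick Monte Carlo check, under a toy noise model of our own: each endpoint is perturbed by an independent Exp(λ) amount with a random sign, and the identity matching is used as an upper bound on the bottleneck distance.

```python
import random

def perturb(barcode, lam, rng):
    """Toy noise model: every endpoint moves by an Exp(lam) amount in a
    random direction, as in the analysis of the targets above."""
    return [(b + rng.choice([-1, 1]) * rng.expovariate(lam),
             d + rng.choice([-1, 1]) * rng.expovariate(lam))
            for (b, d) in barcode]

def matched_distance(B1, B2):
    """Upper bound on the bottleneck distance given by the identity matching."""
    return max(max(abs(b1 - b2), abs(d1 - d2))
               for (b1, d1), (b2, d2) in zip(B1, B2))

B = [(0.0, 10.0), (1.0, 7.0), (2.0, 8.0), (3.0, 5.0)]
lam, trials = 1.0, 20000
tails = {}
for eps in (2.0, 3.0, 4.0):
    rng = random.Random(0)
    tails[eps] = sum(matched_distance(B, perturb(B, lam, rng)) > eps
                     for _ in range(trials)) / trials
# tails[eps] decays roughly like exp(-lam * eps) as eps grows
```

The estimated tail probabilities decrease by roughly a factor of e^λ per unit increase of ε, in line with the stability statement.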
We perform two experiments to illustrate our theoretical results computationally. First, we compute the bottleneck distance between input barcodes B and output barcodes B', for increasing values of λ from 0.01 to 10. We observe (see Figure 10A) that the computational result matches very closely the expected behavior, d(B, B') ∼ Exp(λ). Second, we compute the bottleneck distance between input and output barcodes for various fixed values of λ, where the input barcodes arise by gradually decreasing the death time of one bar of an initial barcode B and thus increasing the distance to the next death time in the sequence (see Figure 10B). All other bars of the initial barcode B remain the same. We observe that the bottleneck distance depends only on the value of λ and not on the distance between the bars of the input barcode.

Figure 10: A. Bottleneck distance as a function of λ. We compute the bottleneck distance between an input barcode B and an output barcode B', for increasing values of λ from 0.01 to 10. We observe that the computational result (scatter points in red) matches very closely the expected behavior d(B, B') ∼ Exp(λ) and can be approximated with an exponential (red curve). B. Bottleneck distance as a function of the distance between two bars, for different values of λ. We choose two bars (b_i, d_i) and (b_j, d_j) of the initial barcode B that are consecutive in the order of deaths. The x-axis represents the distance between these consecutive death times as the death time of one of the two bars decreases. All other bars remain the same. For each of these input barcodes, we generated 100 synthesized barcodes and computed the bottleneck distance between the input and output barcodes, which depends only on the value of λ and not on the distance between the bars of the input barcode.

Transposition stability
As the TNS algorithm is a stochastic process, the image of a strict barcode B under TMD • TNS is a random barcode B'; in particular, the order of the death times of two bars (b_i, d_i) and (b_j, d_j) may be reversed.

Figure 11: We are interested in the case where d'_j < d'_i when we start from d_i < d_j. The distances |d_i − d'_i| and |d_j − d'_j| both follow an exponential law of parameter λ. The probability to terminate increases exponentially when approaching d_i and d_j, as represented by the blue arrows.
The TNS thus exhibits a sort of "transposition stability": the probability that the death times of two bars will be transposed decreases exponentially with the distance between those death times.
Let Y = X_j − X_i. As X_j and X_i both follow an exponential law of parameter λ, their difference Y follows a Laplace law, with density function f_Y(t) = (λ/2) exp(−λt) when t ≥ 0. Therefore, the probability that the two death times are transposed is P(Y > d_j − d_i) = (1/2) exp(−λ(d_j − d_i)), which decreases exponentially with the gap d_j − d_i.
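This tail computation can be verified numerically. The sketch below (ours) estimates P(X_j − X_i > d_j − d_i) by simulation and compares it to the Laplace tail (1/2)e^{−λ(d_j − d_i)}.

```python
import math
import random

# X_i and X_j are the Exp(lam) deviations of the two death times; a
# transposition corresponds to Y = X_j - X_i exceeding the gap d_j - d_i.
rng = random.Random(1)
lam, gap, trials = 1.0, 1.5, 200000   # gap = d_j - d_i
hits = sum(rng.expovariate(lam) - rng.expovariate(lam) > gap
           for _ in range(trials))
empirical = hits / trials
theoretical = 0.5 * math.exp(-lam * gap)   # Laplace tail P(Y > gap)
```

With λ = 1 and a gap of 1.5, both values are close to 0.11, confirming the exponential decay.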

Remark 4.3.
Since the TNS is based on a stochastic process, multiple transpositions can occur when generating a new tree from a barcode. This makes it challenging to determine the overall probability of changing equivalence classes when computing the composite TMD • TNS. Note that the TNS might also affect the birth order, but we will not discuss this possible effect in this paper.

Computational exploration of tree-realization
In this section we present computational results that illustrate the complex relationship between the equivalence class of a barcode and its possible tree-realizations. We first present four results concerning all geometric trees: a computation of the distribution of tree-realization numbers across the set of equivalence classes of strict barcodes for various numbers of bars, a computation of the empirical distribution of combinatorial types of geometric trees in a synthesized population as a function of the equivalence class of the input barcode, a measurement of the diversity of TMD-equivalence classes among the realizations of a fixed barcode, and simulations of the fluctuations in tree-realization number that can occur as two bars gradually switch the order of their deaths.
We conclude by reporting on an experiment that sheds light on the distinguishing characteristics of "biological" geometric trees, i.e., those that arise from digital reconstructions of neurons.

The distribution of tree-realization numbers
We illustrate here how the number of tree-realizations of strict barcodes with n + 1 bars depends on n. In Figure 13 we present the distribution of tree-realization numbers across equivalence classes of barcodes with n + 1 bars, for 1 ≤ n ≤ 10. As mentioned in section 3.2, the tree-realization number is maximal for a fixed number of bars if and only if the barcode is strictly ordered.

Figure 13: Histogram of tree-realization numbers for equivalence classes of barcodes with n + 1 bars (1 ≤ n ≤ 10). The maximal tree-realization number for a fixed number of bars is achieved by exactly one equivalence class, that of the strictly ordered barcode.
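Since an equivalence class of strict barcodes with n + 1 bars is determined by a permutation in S_n, this distribution can be reproduced by enumerating the symmetric group. The sketch below (ours) does so for barcodes with five bars.

```python
from itertools import permutations
from collections import Counter

def trn_of_class(perm):
    """Tree-realization number of the class (i_1 ... i_n): births are
    0 < 1 < ... < n, deaths decrease along the permutation, and all lie
    below the death of bar 0, which contains the others."""
    n = len(perm)
    death = {0: n + 1.0}
    for rank, i in enumerate(perm):     # d_{i_1} > d_{i_2} > ...
        death[i] = n - rank + 0.5
    trn = 1
    for i in range(1, n + 1):           # index of bar i = #{j < i : d_j > d_i}
        trn *= sum(1 for j in range(n + 1) if j < i and death[i] < death[j])
    return trn

# Distribution of TRN over the 24 classes of strict barcodes with 5 bars
counts = Counter(trn_of_class(p) for p in permutations(range(1, 5)))
```

Only the identity permutation, i.e., the strictly ordered barcode, attains the maximum 4! = 24, in agreement with the histogram.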

Empirical distributions of combinatorial types of trees
In this section, we explore computationally the probability to generate different combinatorial tree types (see Figure 5 A-F) with the TNS. We observe that this probability depends on the choice of the parameter λ (cf. section 2.4). When λ > 2, the TNS is more likely to generate trees with all branches connected to the longest branch, due to the design of the algorithm. On the other hand, for smaller values of λ, the probability to generate different types of trees is approximately uniform.
Focusing on our preferred value of λ, we generated 1000 trees for λ = 1 and computed the percentage of each combinatorial tree type that is realized for each equivalence class of barcodes with four bars (Figure 14). There are six possible equivalence classes of strict barcodes with four bars and six combinatorial equivalence classes of geometric trees with four branches. Figure 14: Empirical distribution (percentage of 1000 trees) of synthesized geometric trees with four branches by combinatorial tree type (columns A-F) for a given input barcode equivalence class (rows), when λ = 1. We observe that the distribution is approximately uniform.

Diversity of realized TMD-equivalence classes
We now explore the diversity of TMD-equivalence classes of geometric trees that can be synthesized from a fixed barcode, in the particular case of the TMD of a biologically meaningful tree. For a fixed geometric tree with eight branches arising from a digital reconstruction of a layer 4 pyramidal cell, we computed its TMD barcode, to which we applied the TNS with λ = 1 to generate a set of 100 geometric trees. We computed the barcode-type and the persistence diagrams of the synthesized trees (Figure 15).
In agreement with the results presented in Figure 10, the persistence diagrams of the synthesized trees (Figure 15B, in blue) are essentially indistinguishable from the persistence diagram of the original barcode (Figure 15B, in red). On the other hand, the TMD-equivalence class of a synthesized tree is not necessarily equal to that of the original tree (Figure 15A). Here we represent the TMD-equivalence class of a tree in terms of the permutation σ_B corresponding to the equivalence class of its TMD barcode B. In this graphical representation, we plot the birth (or bifurcation) index k (on the x-axis) against the death (or termination) index σ_B(k) (on the y-axis). A strictly ordered barcode thus corresponds to the set of points along the diagonal.
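This graphical representation is straightforward to compute from a barcode. The sketch below is hedged: the ranking convention (deaths ranked in decreasing order, so that a strictly ordered barcode yields the identity permutation and hence points on the diagonal) is one plausible reading of the construction, and the barcodes are made up.

```python
def barcode_permutation(bars):
    """Permutation sigma_B of a strict barcode (sketch, one plausible
    convention): order bars by increasing birth, then sigma_B(k) is the
    rank of the k-th bar's death among all deaths in DECREASING order,
    so a strictly ordered (nested) barcode gives the identity."""
    by_birth = sorted(bars, key=lambda bar: bar[0])
    deaths_desc = sorted((d for _, d in bars), reverse=True)
    return [deaths_desc.index(d) + 1 for _, d in by_birth]

# A strictly ordered barcode lies on the diagonal ...
print(barcode_permutation([(0, 9), (1, 8), (2, 7)]))  # [1, 2, 3]
# ... while swapping two deaths moves two points off it.
print(barcode_permutation([(0, 9), (1, 7), (2, 8)]))  # [1, 3, 2]
```

Plotting the pairs (k, σ_B(k)) then gives exactly the birth-index versus death-index picture described above.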

Statistics of changing classes
Motivated by the theoretical results on the probability to change classes in section 4.2, we analyze here several simulations of gradual switching of death order of two bars and the resulting effect on tree realizations and their associated barcodes.
Let B be a strict barcode, and let (b_i, d_i), (b_j, d_j) be bars of B such that d_i < d_j. By Lemma 4.2, for a fixed choice of the parameter λ (cf. section 2.4), the probability that the order of these two bars is reversed in TMD ∘ TNS(B) depends exponentially on the distance between d_i and d_j. Thus, as the distance between d_i and d_j decreases, the probability that the order of the bars changes increases. When there is no k such that d_i < d_k < d_j, Proposition 3.5 provides a formula for the tree-realization number of the new barcode obtained when such a switch happens, as long as the order of the birth times is not also reversed.

We start with a geometric tree T extracted from a digital reconstruction of a neuron and compute its associated barcode B = TMD(T). We choose two bars (b_i, d_i) and (b_j, d_j) of B that are consecutive in the order of deaths and divide the interval (d_i, d_j) into 50 equally sized subintervals. For 0 ≤ k ≤ 50, let B_k be the barcode that is identical to B, except that its i-th bar is replaced by (b_i, d_i + k(d_j − d_i)/50). Figure 16 provides an example of this construction, where the barcodes are represented as persistence diagrams for visualization purposes (cf. section 2.2). To test whether the barcode B_k might be equivalent to the original barcode B, we compute its tree-realization number: if TRN(B_k) ≠ TRN(B), then B and B_k are not equivalent. Figure 17 shows several examples of the endpoint-switching process described above and the corresponding evolution of the tree-realization number as k increases.
The top row clearly illustrates the exponential behavior of the class changes. When the distance between the death times of the two bars is very small (they are closest when k = 25), the tree-realization number oscillates between its values for two different classes; otherwise it stays constant.
The two middle rows come from the same biological tree and hence have the same starting barcode. The difference is that in the second row, the death times of the two bars are already very close, leading to more frequent changes of equivalence class than in the third row.
The bottom row illustrates Remark 4.3 well: since several bars are close to each other (represented here by several nearby points in the persistence diagram), applying the TNS algorithm leads to frequent changes of equivalence class, producing the oscillatory behavior of the tree-realization number curve. In Figure 17, where it is not otherwise clear, the two switching points are circled in orange; the right column shows the corresponding evolution of the tree-realization number TRN(B_k) as k increases.
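The family of perturbed barcodes B_k is easy to construct explicitly. The sketch below (with a made-up four-bar barcode; in the experiment each B_k is then passed through the TNS and the TMD, and the tree-realization number of the result is tracked) slides the death of bar i toward that of bar j in 50 equal steps:

```python
def interpolated_barcodes(bars, i, j, steps=50):
    """Endpoint-switching family: B_k equals B except that the death of
    bar i is moved k/steps of the way from d_i to the death d_j of bar j
    (the two bars are assumed consecutive in the order of deaths)."""
    b_i, d_i = bars[i]
    d_j = bars[j][1]
    family = []
    for k in range(steps + 1):
        new_bars = list(bars)  # copy B, then perturb only bar i
        new_bars[i] = (b_i, d_i + k * (d_j - d_i) / steps)
        family.append(new_bars)
    return family

# Made-up strict barcode; the deaths 6.0 and 7.0 are consecutive.
bars = [(0.0, 10.0), (1.0, 6.0), (2.0, 7.0), (3.0, 9.0)]
family = interpolated_barcodes(bars, i=1, j=2)
print(family[0][1], family[25][1], family[50][1])
# (1.0, 6.0) (1.0, 6.5) (1.0, 7.0)
```

At k = 50 the two deaths coincide, so the barcode is no longer strict; the interesting oscillatory behavior appears as the deaths approach each other.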

Tree-realizations of biological barcodes
Since the original objective in developing the TMD was to classify digital reconstructions of neurons, it is natural to ask whether barcodes that arise biologically exhibit any special characteristics compared to those arising from other sets of geometric trees. In Figure 18 we employ the graphical representation of permutations introduced in section 5.3 to display as red dots all the permutations corresponding to TMD barcodes of biological trees with at most 30 branches, arising from a population of digital reconstructions of neurons. Clearly, only a small fraction of the set of all possible permutations is realized by barcode-equivalence classes of geometric trees extracted from digital reconstructions of neurons, even though every black dot in the plot corresponds to a pair (k, σ(k)) for some permutation σ. In future work, we intend to study the biological relevance of this restriction.
To provide further insight into the subset of TMD-equivalence classes of biological geometric trees within the set of all possible TMD-equivalence classes, we computed the tree-realization number as a function of the number of bars for a population of barcodes obtained by applying the TMD to geometric trees extracted from a population of digitally reconstructed neurons. We compared the values obtained to the maximum tree-realization number and to the tree-realization numbers of randomly chosen barcodes with the same number of bars (Figure 19). Interestingly, the barcodes that correspond to apical dendrites (relatively complex neuronal trees that perform significant processing tasks) exhibit a narrower range of possible tree-realization numbers than random barcodes of the same size. On the other hand, barcodes of basal dendrites (less complex neuronal trees) exhibit tree-realization numbers similar to those of the randomly generated barcodes.

Figure 19: (A) The log of the tree-realization number for barcodes of basal dendrites (in blue), in comparison with random barcodes (in yellow) and the maximum tree-realization number (n! for n + 1 bars) (in red). (B) The log of the tree-realization number for barcodes of apical dendrites (in blue), in comparison with random barcodes (in yellow) and the maximum tree-realization number (in red).

Discussion
In this paper we presented and analyzed two algorithms that are relevant in topological data analysis: the TMD, which encodes the structure of a geometric tree in a barcode, and the TNS, which generates a geometric tree from a barcode. We proved that the TNS is robust with respect to small perturbations of barcodes; an analogous stability result for the TMD was established in [13].
We observed that ordering the bars in the persistence barcode according to birth times results in the natural association of a permutation to the barcode, based on death times, giving rise to a meaningful equivalence relation on the set of barcodes. We also introduced a natural, combinatorial equivalence relation on geometric trees. For any barcode, we analyzed the set of combinatorial equivalence classes of those geometric trees whose TMD corresponds to that barcode, providing a simple, explicit formula for its cardinality in terms of the permutation associated to the barcode. Cayley graphs of symmetric groups provide a useful visualization of how this cardinality varies as bars in the barcode are transposed.
We illustrated our theoretical results computationally. In addition, we computed the probability for the TNS to generate different combinatorial tree types from a fixed barcode and found it to be a function of the parameter λ on which the TNS depends, a result that can be explained by the stochastic nature of the TNS algorithm. This stochastic nature also leads to variation in the equivalence classes of barcodes associated by the TMD to the trees generated from a fixed barcode by the TNS. In particular, when starting with the TMD of a "biological" tree (i.e., one arising from a digital neuron reconstruction) that includes bars with similar birth or death times, we observed oscillatory behavior between two (or more) different classes, increasing the variance of the generated trees.
We also initiated an analysis of the distinctive features of biological trees compared to random trees. We discovered that the barcodes associated by the TMD to trees representing neuronal morphologies represent a small fraction of possible equivalence classes of barcodes. It follows that the set of combinatorial types of geometric trees that are biologically realized is also constrained, indicating a biological preference for specific tree structures. There is much yet to discover about which geometric or combinatorial features distinguish biological trees among all geometric trees and why.
In future work we intend to further investigate the effect of different types of noise on the TNS algorithm. For instance, we have considered only the effect of transposing two bars, but other types of changes are certainly also relevant, such as switching both births and deaths, as mentioned in Remark 4.3. On a more neuroscientific note, we intend to continue exploring the distinguishing characteristics of biological trees, with the goal of explaining the structural and functional reasons for the observed geometric and combinatorial constraints.
On the mathematical side, we are currently analyzing the structure on the space of barcodes revealed by the symmetric groups and determining what information can be extracted from the induced stratification of this space. This structure on the space of barcodes should also provide significant insights into the still somewhat mysterious space of geometric trees, which is of considerable interest to a wide range of mathematicians.

Author contributions
Conceptualization by L.K., experiments performed by L.K. and A.G. Formal analysis and theorem formulation by L.K., A.G. and K.H. Supervision by K.H. All authors wrote the paper and agreed to the published version of the manuscript.

Funding
This study was supported by funding to the Blue Brain Project, a research center of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government's ETH Board of the Swiss Federal Institutes of Technology. AG and KH gratefully acknowledge the support of the Swiss National Science Foundation, grant number CRSII5 177237.