# From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives

## Abstract

## 1. Introduction

## 2. Mathematical Background

#### 2.1. Trees

#### 2.2. Barcodes

#### 2.3. The TMD: From Trees to Barcodes

#### 2.4. The TNS: From Barcodes to Trees

#### 2.4.1. Bifurcation/Termination

#### 2.4.2. Elongation

#### 2.4.3. The Elder Rule and TNS

## 3. Tree-Realizations of Barcodes

#### 3.1. Realizing Barcodes as Trees

**Lemma 1.**

**Proof.**

#### 3.2. The Combinatorics of Tree-Realization

**Lemma 2.**

**Proof.**

**not** satisfy

**Lemma 3.**

**Proof.**

**Example 1.**

**Proposition 1.**

1. If ${i}_{k}<{i}_{k+1}$, then ${\mathrm{index}}_{{i}_{k+1}}\left({B}^{\prime}\right)={\mathrm{index}}_{{i}_{k+1}}\left(B\right)-1$, and $$\mathrm{TRN}\left({B}^{\prime}\right)=\frac{\mathrm{TRN}\left(B\right)({\mathrm{index}}_{{i}_{k+1}}\left(B\right)-1)}{{\mathrm{index}}_{{i}_{k+1}}\left(B\right)}.$$
2. If ${i}_{k}>{i}_{k+1}$, then ${\mathrm{index}}_{{i}_{k+1}}\left({B}^{\prime}\right)={\mathrm{index}}_{{i}_{k+1}}\left(B\right)+1$, and $$\mathrm{TRN}\left({B}^{\prime}\right)=\frac{\mathrm{TRN}\left(B\right)({\mathrm{index}}_{{i}_{k+1}}\left(B\right)+1)}{{\mathrm{index}}_{{i}_{k+1}}\left(B\right)}.$$
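
As a hedged arithmetic check, the two cases can be read against the values listed in Table 2, assuming that ${B}^{\prime}$ denotes the barcode obtained from B by switching the k-th and (k+1)-st largest death times. The index values on the right are inferred from the tabulated tree-realization numbers; they are not stated in the table.

```latex
% Case 1: B^1 = (2,6,8,1,5,7,4,3), switch positions 4 and 5, i_4 = 1 < i_5 = 5.
\mathrm{TRN}(\widehat{B}^{1})
  = \frac{810\,\bigl(\mathrm{index}_{5}(B^{1}) - 1\bigr)}{\mathrm{index}_{5}(B^{1})}
  = 540
  \;\Longrightarrow\; \mathrm{index}_{5}(B^{1}) = 3.

% Case 2: B^4 = (8,6,7,4,3,1,2,5), switch positions 1 and 2, i_1 = 8 > i_2 = 6.
\mathrm{TRN}(\widehat{B}^{4})
  = \frac{20\,\bigl(\mathrm{index}_{6}(B^{4}) + 1\bigr)}{\mathrm{index}_{6}(B^{4})}
  = 40
  \;\Longrightarrow\; \mathrm{index}_{6}(B^{4}) = 1.
```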

**Proof.**

**Example 2.**

## 4. Stability of the TNS

#### 4.1. Bottleneck Stability

**Lemma 4.**

**Proof.**

#### 4.2. Transposition Stability

**Lemma 5.**

**Proof.**

**Remark 1.**

## 5. Computational Exploration of Tree-Realization

#### 5.1. The Distribution of Tree-Realization Numbers

#### 5.2. Empirical Distributions of Combinatorial Types of Trees

#### 5.3. Diversity of Realized TMD-Equivalence Classes

#### 5.4. Statistics of Changing Classes

#### 5.5. Tree-Realizations of Biological Barcodes

## 6. Discussion

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** The two composites of TMD and TNS. (Left) An illustration of how a neuron (black) is modeled as a tree (dashed red lines). We describe this process and how to extract a barcode from a tree in Section 2.3. (Top) The composite $\mathrm{TMD}\circ \mathrm{TNS}$ applied to a barcode B. The new barcode ${B}^{\prime}=\mathrm{TMD}\circ \mathrm{TNS}\left(B\right)$ is indicated in dashed lines on top of the barcode B on the right. We show in Section 4 that the barcodes B and ${B}^{\prime}$ will almost certainly be very similar and quantify this similarity. (Bottom) The composite $\mathrm{TNS}\circ \mathrm{TMD}$ applied to a tree T. The tree T that we start with is indicated in dashed red lines under the new tree ${T}^{\prime}=\mathrm{TNS}\circ \mathrm{TMD}\left(T\right)$. The trees T and ${T}^{\prime}$ can be quite different combinatorially, as seen on the right.

**Figure 2.** A strict barcode belonging to the equivalence class $\left(2134\right)$. One bar $({b}_{0},{d}_{0})$ contains all the others. The remaining bars are ordered by their birth times $({b}_{1}<{b}_{2}<{b}_{3}<{b}_{4})$. Similarly, the deaths are ordered ${d}_{2}>{d}_{1}>{d}_{3}>{d}_{4}$, leading to the notation $\left(2134\right)$.
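
The notation described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `permutation_type` and the numerical values are hypothetical, and the input is assumed to list the bars other than the longest bar $({b}_{0},{d}_{0})$ as (birth, death) pairs.

```python
def permutation_type(bars):
    """Return the death-order permutation of a strict barcode.

    `bars` lists (birth, death) pairs and excludes the longest bar (b_0, d_0).
    """
    # Label the bars 1..n by increasing birth time.
    by_birth = sorted(bars, key=lambda bar: bar[0])
    labelled = list(enumerate(by_birth, start=1))
    # Read the labels off in order of decreasing death time.
    by_death = sorted(labelled, key=lambda item: item[1][1], reverse=True)
    return tuple(label for label, _ in by_death)


# Hypothetical values with b1 < b2 < b3 < b4 and d2 > d1 > d3 > d4.
example = [(1.0, 8.0), (2.0, 9.0), (3.0, 7.0), (4.0, 6.0)]
print(permutation_type(example))  # (2, 1, 3, 4)
```

The result does not depend on the order in which the bars are passed in, since the labels are assigned after sorting by birth time.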

**Figure 3.** The algorithm to encode a tree structure as a persistence barcode. (**A**) Neuronal tree; (**B**) persistence barcode generated with TMD. Each branch in the tree (**A**) corresponds to a bar in the barcode (**B**); the circled numbers encode the correspondence between branches and bars. Terminations are shown in blue, bifurcations in red, and branches in between in black.

**Figure 4.** A strict barcode, whose bars are ordered according to birth times (greyscale), defines a unique ordering of death times. This ordering and the Elder Rule constrain the possible combinatorial types of trees that can be realized from this barcode. (**A**) The notation used in this paper: a barcode determines an adjacency matrix of possible connectivities, which can equivalently be presented as a connectivity diagram; (**B**) examples of possible tree-realizations, ranging from one in which all branches connect to the longest one (top) to a random one (bottom).

**Figure 5.** Tree-realizations of all possible strict barcodes B with four bars. Each row (left) represents a possible permutation of death times for the corresponding order of bars in B, i.e., a possible TMD-type. Each barcode can be realized by a subset of all the combinatorial tree types, each represented by a column, with a corresponding adjacency matrix.

**Figure 6.** The two possible moves that respect the condition of a realizable barcode. Move $\left(\mathbf{A}\right)$ modifies the barcode’s ordering, whereas move $\left(\mathbf{B}\right)$ does not change the order of the deaths.

**Figure 7.** A representation of the space of barcodes with four bars, up to permutation equivalence class, as the Cayley graph of ${\mathfrak{S}}_{3}$ generated by the transpositions $\left(12\right)$ and $\left(23\right)$. Each vertex is an element of the group. Each edge represents the transposition that converts one endpoint into the other and is colored by generator. The number to the right of each bar is its index. All trees T such that $\mathrm{TMD}\left(T\right)=B$ are indicated next to a barcode B with the corresponding tree type of Figure 5. The number of such trees can be computed using Lemma 2.

**Figure 8.** A barcode B in equivalence class $\left(213\right)$ is shown in black. There are three possible combinatorial equivalence classes of trees whose TMD barcode is B, also represented in black. After adding the extra bar in red, we obtain a new barcode ${B}^{\prime}$, in the equivalence class $\left(2143\right)$. In a tree-realization of ${B}^{\prime}$, the branch corresponding to the red bar can be attached to any of the branches corresponding to the 0th, 1st, and 2nd bars, represented on the trees by the red branches. This leads to nine possible combinatorial equivalence classes of trees for the barcode $\left(2143\right)$.

**Figure 9.** The Cayley graph representing ${\mathfrak{S}}_{4}$ generated by $\left(12\right),\left(23\right),\left(34\right)$. For the four outside elements, we provide the barcode associated with the permutation. The number next to each bar is its index.
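
The Cayley graphs of Figures 7 and 9 can be enumerated with a short sketch. We assume here that an edge joins two permutations that differ by swapping two adjacent positions, which models switching two consecutive death times; the function name `cayley_graph` is hypothetical and the convention (acting on positions rather than on values) is our choice.

```python
from itertools import permutations


def cayley_graph(n):
    """Cayley graph of the symmetric group on n letters for adjacent transpositions."""
    vertices = list(permutations(range(1, n + 1)))
    edges = set()
    for perm in vertices:
        for k in range(n - 1):
            # Swap the entries in positions k and k + 1.
            neighbour = list(perm)
            neighbour[k], neighbour[k + 1] = neighbour[k + 1], neighbour[k]
            # A frozenset stores each undirected edge only once.
            edges.add(frozenset((perm, tuple(neighbour))))
    return vertices, edges


for n in (3, 4):
    vertices, edges = cayley_graph(n)
    print(n, len(vertices), len(edges))
```

Each vertex has $n-1$ neighbours, so the graph has $n!\,(n-1)/2$ edges: 6 vertices and 6 edges for ${\mathfrak{S}}_{3}$, and 24 vertices and 36 edges for ${\mathfrak{S}}_{4}$.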

**Figure 10.** Upper bound on the probability that the bottleneck distance between B and $\mathrm{TMD}\circ \mathrm{TNS}\left(B\right)$ is larger than $\epsilon $ (Equation (1)) for various values of $\lambda $ and for $n=10$.

**Figure 11.** (**A**) The ${\ell}_{1}$-distance between the black bullet and the diamond follows an Erlang$(2,\lambda )$ distribution. The interior of the green square defines a bound for the ${\ell}_{1}$-distance from the black bullet that depends on the value of the parameter $\lambda $. (**B**) If the endpoints of the bars of B are sufficiently far away from each other and $B\sim \mathrm{TMD}\circ \mathrm{TNS}\left(B\right)$, then, with high probability, taking $\gamma =I$ will minimize the ${\ell}_{1}$-distance between pairs of endpoints of bars. (**C**) If the endpoints of B are instead close to each other, then it is more likely that $B\nsim \mathrm{TMD}\circ \mathrm{TNS}\left(B\right)$, so that the optimal choice of $\gamma $ (represented by red segments) is not the identity. The red distances do not necessarily follow exponential distributions, so the proof of Lemma 4 does not apply.
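
An Erlang$(2,\lambda )$ variable is the sum of two independent exponential variables of rate $\lambda $, with cumulative distribution function $1-{e}^{-\lambda x}(1+\lambda x)$. The sketch below compares this closed form with a simulation. Reading the two summands as the displacements of the birth and death of a bar is our assumption about panel (A), and the function names and the values of `lam` and `eps` are hypothetical.

```python
import math
import random


def erlang2_cdf(x, lam):
    """CDF of an Erlang(2, lam) variable: 1 - exp(-lam * x) * (1 + lam * x)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-lam * x) * (1.0 + lam * x)


def sample_l1_distance(lam, rng):
    """Sum of two independent Exp(lam) displacements (assumed model)."""
    return rng.expovariate(lam) + rng.expovariate(lam)


rng = random.Random(0)          # fixed seed so the run is reproducible
lam, eps = 1.0, 2.0             # illustrative values only
samples = [sample_l1_distance(lam, rng) for _ in range(100_000)]
empirical = sum(s <= eps for s in samples) / len(samples)
print(empirical, erlang2_cdf(eps, lam))
```

The empirical fraction of samples below `eps` should agree with the closed form up to sampling error.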

**Figure 12.** (**A1**–**A3**) Bottleneck distance as a function of $\lambda $. We compute the bottleneck distance between an input barcode B and an output barcode ${B}^{\prime}$ for $\lambda $ between 0.01 and 2. (**A1**) From barcode B (in black), a tree (in red) is generated using the TNS, which results in a new barcode ${B}^{\prime}=\mathrm{TMD}\circ \mathrm{TNS}\left(B\right)$ (in red). (**A2**) The average bottleneck distance (red points) is compared to the expected mean of the probability distribution function found in Lemma 4 (blue curve). (**A3**) The bottleneck distances (red) are compared to the cumulative distribution function for $0<\epsilon <200$ and $0<\lambda <2$ (blue). (**B1**–**B2**) Bottleneck distance between B and ${B}^{\prime}$ as a function of distances between bars in B. (**B1**) We consider barcodes of the same permutation type for different distances between two bars $({b}_{i},{d}_{i})$ and $({b}_{j},{d}_{j})$ of the initial barcode B that are consecutive in the order of deaths. (**B2**) For each input barcode with increasing ${d}_{i}$ (the distance between death times is shown on the x-axis), 100 synthesized barcodes are generated and the bottleneck distance between the input and output barcodes is computed (y-axis); this distance depends only on the value of $\lambda $ and not on the distance between the bars.

**Figure 13.** We are interested in the case where ${d}_{j}^{\prime}<{d}_{i}^{\prime}$ when we start from ${d}_{i}<{d}_{j}$. The distances $|{d}_{i}-{d}_{i}^{\prime}|$ and $|{d}_{j}-{d}_{j}^{\prime}|$ both follow an exponential law of parameter $\lambda $. The probability to terminate increases exponentially when approaching ${d}_{i}$ and ${d}_{j}$, as represented by the blue arrows.

**Figure 14.** (**A**) Example of two bars changing order, which results in switching of classes. We show here a tree, its barcode, and the corresponding persistence diagram when two consecutive deaths switch their order. The impact of the change is illustrated by the red arrows; (**B**) percentage of order changes per 100 repetitions for varied distance between death times of two consecutive bars of the input barcode. Comparison of theoretical results (solid lines) to simulations (scatter plot) for different values of $\lambda $.

**Figure 15.** Histogram of tree-realization numbers for equivalence classes of barcodes with $n+1$ bars ($1\le n\le 10$). The maximal tree-realization number for a fixed number of bars is achieved by exactly one equivalence class, that of the strictly ordered barcode.
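
A histogram of this kind can be reproduced for small n by brute force. The sketch below is not the authors' code and rests on an assumption: that the tree-realization number is the product of the bar indices, where the index of a bar is one (for the longest bar) plus the number of bars that are born earlier and die later. This reading reproduces the values listed in Table 2; the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def tree_realization_number(perm):
    """Product of bar indices under the assumed index rule."""
    seen = []      # labels of bars with a larger death time
    indices = []
    for label in perm:
        # 1 accounts for the longest bar; add earlier-born bars that die later.
        indices.append(1 + sum(1 for other in seen if other < label))
        seen.append(label)
    return prod(indices)


def trn_histogram(n):
    """Count equivalence classes (permutations of 1..n) by tree-realization number."""
    return Counter(
        tree_realization_number(p) for p in permutations(range(1, n + 1))
    )


hist = trn_histogram(4)
print(sorted(hist.items()))
print(max(hist) == factorial(4), hist[max(hist)])
```

Under this assumption the maximum for $n+1$ bars is $n!$, attained only by the identity permutation, in line with the caption.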

**Figure 16.** Empirical distribution (percentage of 1000 trees) of synthesized geometric trees with four branches by combinatorial tree type (**A**–**F**) for a given input barcode equivalence class (rows), when $\lambda =1$. We observe that the distribution is approximately uniform.

**Figure 17.** (**A**) Barcode-equivalence classes, represented by the corresponding permutations, and (**B**) persistence diagrams of 100 synthesized cells based on a geometric tree with eight branches, extracted from a layer 4 pyramidal cell. The barcode-equivalence classes of the synthesized trees (represented by blue dots) can differ from that of the original tree due to the stochastic nature of the synthesis algorithm. The persistence diagrams of the synthesized trees ((**B**), blue) are essentially indistinguishable from those of the original tree ((**B**), red).

**Figure 18.** We begin with a barcode with eight bars. The death times ${d}_{{i}_{4}}$ and ${d}_{{i}_{5}}$ (i.e., the 4th and 5th largest death times) are gradually switched as k increases, represented by the color of the points in the persistence diagram shifting from red to blue. When $k=0$ (in red), we have the original barcode B, and when $k=50$ (in blue) we obtain a barcode identical to the original, except that $({b}_{{i}_{4}},{d}_{{i}_{4}})$ is replaced by $({b}_{{i}_{4}},{d}_{{i}_{5}})$ and $({b}_{{i}_{5}},{d}_{{i}_{5}})$ by $({b}_{{i}_{5}},{d}_{{i}_{4}})$.

**Figure 19.** On the left, evolution of $\mathrm{PD}\left({B}_{k}^{\prime}\right)$ as k increases (represented by the point color shifting from red at $k=0$ to blue at $k=50$), for various pairs of bars. Where it is not clear, we circle in orange the two points that switch. On the right, the corresponding evolution of the tree-realization number $\mathrm{TRN}\left({B}_{k}^{\prime}\right)$ as k increases. For instance, as indicated in Table 2, the tree-realization number of ${B}^{1}$ is 810 and that of ${\widehat{B}}^{1}={B}_{50}^{1}$ is 540. The barcodes ${B}_{k}^{\prime}$ exhibit the behavior described in Lemma 5, except for the last row, in which death times that are too close to each other (circled in purple and green) interfere with the process. Without this interference, the tree-realization numbers should oscillate between 20 and 40. When k gets close to 50 (blue), the death time ${d}_{{i}_{1}}$ (largest death time) starts interfering with the third one ${d}_{{i}_{3}}$ (circled in purple) in the tree synthesis process.

**Figure 20.** (**A**) TMD-equivalence classes of a population of biological geometric trees with at most 30 bars (red dots), represented by their associated permutations; (**B**) examples of TMD-equivalence classes of individual biological geometric trees with eight branches, extracted from layer 4 pyramidal cells (red dots).

**Figure 21.** The log of the tree-realization number for barcodes with varying numbers of bars. (**A**) The log of the tree-realization number for barcodes of basal dendrites (in blue) in comparison with random barcodes (in yellow) and the maximum tree-realization number ($n!$ for $n+1$ bars) (in red); (**B**) the log of the tree-realization number for barcodes of apical dendrites (in blue) in comparison with random barcodes (in yellow) and the maximum tree-realization number (in red).

**Table 1.** Summary and terminology of the TMD and TNS algorithms. The TMD computes the barcode of a tree from the tips of branches towards the root, whereas the TNS grows the tree in the opposite direction, from the root to the leaves.

| | TMD | TNS |
|---|---|---|
| Goal | Compute the barcode of a tree based on a distance function | Grow a new tree from a barcode |
| Directionality | From leaves to root | From root to leaves |
| Domains | $\left\{\mathrm{geometric}\phantom{\rule{4.pt}{0ex}}\mathrm{trees}\right\}\longrightarrow \left\{\mathrm{barcodes}\right\}$ | $\left\{\mathrm{barcodes}\right\}\longrightarrow \left\{\mathrm{geometric}\phantom{\rule{4.pt}{0ex}}\mathrm{trees}\right\}$ |
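
The leaves-to-root direction of the TMD can be illustrated with the following sketch. It reflects our reading of the algorithm rather than the authors' implementation: at each bifurcation the branch reaching the farthest tip continues (the Elder Rule) and every other branch is recorded as a bar. The input format (`children` and `f` dictionaries) and the convention that a bar is stored as a (bifurcation distance, termination distance) pair are assumptions.

```python
def tmd_barcode(children, f, root):
    """Leaves-to-root sweep returning bars as (bifurcation, termination) pairs.

    children: dict mapping a node to the list of its child nodes (hypothetical format)
    f: dict mapping a node to its distance from the root
    """
    bars = []

    def sweep(node):
        kids = children.get(node, [])
        if not kids:
            return f[node]                      # a leaf starts a branch at its tip
        values = [sweep(kid) for kid in kids]
        oldest = max(values)
        survivor = values.index(oldest)         # Elder Rule: farthest tip continues
        for i, value in enumerate(values):
            if i != survivor:
                bars.append((f[node], value))   # the other branches stop here
        return oldest

    # The surviving branch ends at the root and gives the longest bar.
    bars.append((f[root], sweep(root)))
    return sorted(bars)


# Hypothetical tree: root 0, one bifurcation at node 1, two tips 2 and 3.
children = {0: [1], 1: [2, 3]}
f = {0: 0.0, 1: 1.0, 2: 3.0, 3: 2.0}
print(tmd_barcode(children, f, root=0))  # [(0.0, 3.0), (1.0, 2.0)]
```

The recursion keeps the sketch short; an explicit stack would be preferable for deep trees.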

**Table 2.** For each example displayed in Figure 19, we list the permutation type and the tree-realization number of the original barcode B and of $\widehat{B}={B}_{50}$, and the indices of the bars that are switched. The superscript i in ${B}^{i}$ indicates the corresponding row of Figure 19. For example, the largest death time of barcode ${B}^{1}$ is the second bar (in order of birth times), and its shortest death is the third one. When we switch the 4th and 5th (from largest to smallest) death times in ${B}^{1}$ and ${\widehat{B}}^{1}$, the TRN changes from 810 to 540.

| | Permutation | TRN | Bars That Switch |
|---|---|---|---|
| ${B}^{1}$ | $(2,6,8,1,5,7,4,3)$ | 810 | 4 and 5 |
| ${\widehat{B}}^{1}$ | $(2,6,8,5,1,7,4,3)$ | 540 | 4 and 5 |
| ${B}^{2}$ | $(5,7,6,4,2,1,3)$ | 12 | 2 and 3 |
| ${\widehat{B}}^{2}$ | $(5,6,7,4,2,1,3)$ | 18 | 2 and 3 |
| ${B}^{3}$ | $(5,7,6,4,2,1,3)$ | 12 | 3 and 4 |
| ${\widehat{B}}^{3}$ | $(5,7,4,6,2,1,3)$ | 18 | 3 and 4 |
| ${B}^{4}$ | $(8,6,7,4,3,1,2,5)$ | 20 | 1 and 2 |
| ${\widehat{B}}^{4}$ | $(6,8,7,4,3,1,2,5)$ | 40 | 1 and 2 |
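
The tabulated values can be cross-checked with a short script. It is a sketch under an assumption, not the authors' code: we take the index of a bar to be one (for the longest bar) plus the number of bars that are born earlier and die later, and the tree-realization number to be the product of the indices. The names `bar_index`, `trn`, and `TABLE_2` are hypothetical.

```python
from math import prod


def bar_index(perm, label):
    """Assumed index: 1 + number of earlier-born bars that die later."""
    position = perm.index(label)
    # Bars listed before `label` die later; smaller labels are born earlier.
    return 1 + sum(1 for other in perm[:position] if other < label)


def trn(perm):
    """Tree-realization number as the product of the bar indices."""
    return prod(bar_index(perm, label) for label in perm)


# Permutation types and tree-realization numbers copied from Table 2.
TABLE_2 = {
    (2, 6, 8, 1, 5, 7, 4, 3): 810,
    (2, 6, 8, 5, 1, 7, 4, 3): 540,
    (5, 7, 6, 4, 2, 1, 3): 12,
    (5, 6, 7, 4, 2, 1, 3): 18,
    (5, 7, 4, 6, 2, 1, 3): 18,
    (8, 6, 7, 4, 3, 1, 2, 5): 20,
    (6, 8, 7, 4, 3, 1, 2, 5): 40,
}

for perm, expected in TABLE_2.items():
    print(perm, trn(perm), expected)
```

Under this reading every computed value matches the table, which supports the assumed index rule without proving that it is the definition used in the paper.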

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kanari, L.; Garin, A.; Hess, K.
From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives. *Algorithms* **2020**, *13*, 335.
https://doi.org/10.3390/a13120335
