# Computing Persistent Homology of Directed Flag Complexes


## Abstract


## 1. Introduction

## 2. Results

#### 2.1. Sample Computations

- BE = HumanContact/Windsurfers—An undirected network containing interpersonal contacts between windsurfers in southern California during the fall of 1986,
- GP = Social/Google+—A directed network that contains Google+ user to user links,
- IN = HumanContact/Infectious—face-to-face behavior of people during the exhibition INFECTIOUS: STAY AWAY in 2009 at the Science Gallery in Dublin,
- JA = HumanSocial/Jazz musicians—collaboration network between Jazz musicians,
- MQ = Animal/Macaques—A directed network containing dominance behaviour in a colony of 62 adult female Japanese macaques,
- PR = Metabolic/Human protein (Figeys)—a network of interactions between proteins in humans.

- FN = 19th step of the Rips filtration of the 40-cycle graph (equipped with the graph metric), i.e., the graph with 40 vertices in which each vertex is connected to all others except the vertex at graph distance exactly 20 from itself,
- BB = sample of a Blue Brain Project reconstruction of the neocortical column of a rat [6],
- ER = an Erdős-Rényi graph with a similar number of vertices and connection probability as such a Blue Brain Project sample,
- CE = a C. Elegans brain network [13], and
- BA = a Barabási-Albert graph with a similar number of vertices and connection probability as a C. Elegans brain network.
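
The FN description can be checked with a few lines of Python. The sketch below is our own illustration, not part of Flagser; the helper names `cycle_distance` and `rips_step_edges` are hypothetical, and we assume that step $r$ of the Rips filtration joins every pair of vertices at graph distance at most $r$.

```python
from itertools import combinations


def cycle_distance(i: int, j: int, n: int) -> int:
    """Graph distance between vertices i and j on the n-cycle."""
    d = abs(i - j) % n
    return min(d, n - d)


def rips_step_edges(n: int, step: int) -> list[tuple[int, int]]:
    """Undirected edges of the Rips filtration of the n-cycle at `step`:
    all vertex pairs at graph distance at most `step`."""
    return [(i, j) for i, j in combinations(range(n), 2)
            if cycle_distance(i, j, n) <= step]


edges = rips_step_edges(40, 19)
print(len(edges))  # 760
```

The only missing pairs are the 20 antipodal ones, so the graph has $\binom{40}{2} - 20 = 760$ edges, which agrees with the FN row of Table 1.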

#### 2.2. Comparisons

#### 2.3. Availability of Source Code

- Flagser-Count: A modification of a basic procedure in Flagser, optimised for very large networks.
- Deltser: A modification of Flagser that computes persistent homology on any finite delta complex (like a simplicial complex, except more than one simplex may be supported on the same set of vertices).
- Tournser: The flag tournaplex of a directed graph is the delta complex whose simplices are all tournaments that occur in the graph. Tournser is an adaptation of Flagser that computes the (persistent) homology of the tournaplex of a finite directed graph. It uses two filtrations that occur naturally on tournaplexes, but can also be used in conjunction with other filtrations.
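
To make the tournaplex concrete, the following brute-force Python sketch counts its simplices by dimension. It is an illustration only, exponential in the number of vertices and unrelated to how Tournser works internally; the function name `tournaplex_counts` is hypothetical. It relies on the observation that a tournament occurring in the graph is obtained by choosing one existing directed edge for every pair of vertices, so a vertex set with a reciprocal pair supports several simplices, which is why the tournaplex is a delta complex rather than a simplicial complex.

```python
from itertools import combinations
from math import prod


def tournaplex_counts(n: int, edges: set[tuple[int, int]], max_dim: int) -> list[int]:
    """Number of simplices of the flag tournaplex in dimensions 0..max_dim.

    A k-simplex is a tournament on k + 1 vertices that occurs in the digraph:
    one directed edge is chosen for every pair of vertices.  A vertex set
    therefore contributes the product, over its pairs, of the number of
    directed edges (0, 1 or 2) joining that pair.
    """
    def multiplicity(u: int, v: int) -> int:
        return ((u, v) in edges) + ((v, u) in edges)

    counts = []
    for k in range(max_dim + 1):
        counts.append(sum(
            prod(multiplicity(u, v) for u, v in combinations(vs, 2))
            for vs in combinations(range(n), k + 1)))
    return counts


# A cyclic triangle is a tournament, so it spans one 2-simplex here,
# although its directed flag complex has no 2-simplex.
print(tournaplex_counts(3, {(0, 1), (1, 2), (2, 0)}, 2))  # [3, 3, 1]
```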

## 3. Discussion

## 4. Materials and Methods

#### 4.1. Mathematical Background

**Definition 1.**

**Definition 2.**

**Definition 3.**

**Definition 4.**

**Definition 5.**

#### 4.2. Representing the Directed Flag Complex in Memory

1. For an arbitrary fixed vertex ${v}_{0}\in V$, iterate over all simplices that have ${v}_{0}$ as their initial vertex.
2. Repeat Step 1 for every vertex (but see the third point below).

- The construction of the directed flag complex is highly parallelisable. In principle, one could use one CPU for each vertex of the graph, so that each CPU computes only the simplices with that vertex as initial vertex.
- Only the graph is loaded into memory. The directed flag complex, which is usually much larger, has no impact on the memory footprint.
- The procedure skips branches of the iteration based on the prefix. The idea is that if a simplex is not contained in the complex in question, then no simplex containing it as a face can be in the complex either. Therefore, once we have established that a simplex $({v}_{0},\dots ,{v}_{k})$ is not contained in our complex, we do not need to continue Step 1 from it, i.e., to compute the vertices $w$ that are reachable from each of the ${v}_{i}$. This allows us to skip the entire iteration branch of all simplices with prefix $({v}_{0},\dots ,{v}_{k})$, which is particularly useful when iterating over the simplices of a subcomplex.
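
The two steps and the prefix-based pruning can be sketched in Python as follows. This is an illustrative sketch, not Flagser's C++ implementation: the graph is assumed to be loop-free and given as a list `out_nbrs` of out-neighbour sets, and the names `simplices_from`, `directed_flag_complex` and the subcomplex predicate `keep` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterator

Simplex = tuple[int, ...]


def simplices_from(v0: int, out_nbrs: list[set[int]],
                   keep: Callable[[Simplex], bool]) -> Iterator[Simplex]:
    """Step 1: depth-first enumeration of the simplices of dFl(G) with initial vertex v0.

    (v0, ..., vk, w) is a simplex iff w is an out-neighbour of every vi, so the
    candidate extensions of a simplex are the intersection of the out-neighbour
    sets of its vertices.  If `keep` rejects a simplex, no simplex with that
    prefix is visited (valid when `keep` describes a subcomplex).
    """
    stack: list[tuple[Simplex, set[int]]] = [((v0,), out_nbrs[v0])]
    while stack:
        simplex, candidates = stack.pop()
        if not keep(simplex):
            continue  # skip the whole branch with this prefix
        yield simplex
        for w in candidates:
            stack.append((simplex + (w,), candidates & out_nbrs[w]))


def directed_flag_complex(out_nbrs: list[set[int]],
                          keep: Callable[[Simplex], bool] = lambda s: True) -> Iterator[Simplex]:
    """Step 2: repeat Step 1 for every vertex; the calls are independent,
    so this loop could be distributed over several CPUs."""
    for v0 in range(len(out_nbrs)):
        yield from simplices_from(v0, out_nbrs, keep)


# Transitive triangle 0 -> 1, 0 -> 2, 1 -> 2.
graph = [{1, 2}, {2}, set()]
print(sorted(Counter(len(s) - 1 for s in directed_flag_complex(graph)).items()))
# [(0, 3), (1, 3), (2, 1)]
```

Only `out_nbrs` is held in memory; simplices are produced one at a time, mirroring the memory argument above.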

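The adjacency matrix of the graph is stored as a collection of bitsets, one per row, so that set intersections can be computed with a bitwise AND (for example, `0111 and 1101 = 0101`). This makes the computation of the directed flag complex more efficient. Given a k-simplex $({v}_{0},\dots ,{v}_{k})$ in $dFl\left(G\right)$, the set of vertices $w$ such that $({v}_{0},\dots ,{v}_{k},w)$ is a $(k+1)$-simplex is the intersection of the sets of out-neighbours of ${v}_{0},\dots ,{v}_{k}$. These sets are given by the positions of the `1`’s in the bitsets of the adjacency matrix in the rows corresponding to the vertices ${v}_{0},\dots ,{v}_{k}$, and thus the set intersection can be computed as the logical “AND” of those bitsets.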
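
A minimal Python sketch of the bitset intersection, assuming each adjacency-matrix row is stored as a Python integer whose bit `v` is set when there is an edge to vertex `v` (Flagser's own bitset layout may differ); `adjacency_bitsets` and `extension_vertices` are hypothetical names.

```python
def adjacency_bitsets(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Row u of the adjacency matrix as an int whose bit v is set iff u -> v."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
    return rows


def extension_vertices(simplex: tuple[int, ...], rows: list[int]) -> list[int]:
    """Vertices w such that (v0, ..., vk, w) is a simplex of dFl(G),
    computed as the bitwise AND of the rows of v0, ..., vk."""
    mask = rows[simplex[0]]
    for v in simplex[1:]:
        mask &= rows[v]
    return [w for w in range(len(rows)) if (mask >> w) & 1]


# The example from the text, with bits written most significant first.
assert 0b0111 & 0b1101 == 0b0101

# Transitive tournament on 4 vertices: i -> j whenever i < j.
rows = adjacency_bitsets(4, [(i, j) for i in range(4) for j in range(i + 1, 4)])
print(extension_vertices((0, 1), rows))  # [2, 3]
```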

- perform computations on the complex, such as counting simplices or computing coboundaries, without having to recompute the intersections described above, and
- associate data to each simplex with fast lookup time.

#### 4.3. Computing the Coboundaries

#### 4.4. Performance Considerations

#### 4.4.1. Sorting the Columns of the Coboundary Matrix

- Sort columns in ascending order by their filtration value. This is necessary for the algorithm to produce correct results.
- Sort columns in ascending order by the number of non-zero entries in the column. This aims to keep small the number of new non-zero entries produced when such a column is used in a reduction.
- Sort columns in descending order by the position of the pivot element of the unreduced column. This means that the larger the pivot (i.e., the higher the index of the last non-trivial entry), the lower the index of the column in the sorted list. The idea behind this is that “long columns”, i.e., columns whose pivot element has a high index, should be as sparse as possible, so that using such a column to reduce the columns to its right is more economical. Since initial coboundary matrices are typically very sparse, ordering the columns in this way makes more sparse columns with large pivots available to the reduction process.
- Sort columns in descending order by their “pivot gap”, i.e., by the distance between the pivot and the next non-trivial entry with a smaller row index. A large pivot gap is desirable when the column is used to reduce columns to its right, because the non-trivial entry that it adds to the column being reduced then appears at a lower index. This may yield fewer reduction steps on average.
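
One way to combine the four criteria is a lexicographic sort key, sketched below in Python. This rests on our own assumptions: columns are stored as ascending lists of row indices over $\mathbb{F}_2$, the criteria are applied in the order listed (filtration value first, since that one is required for correctness), and the class `Column`, the helper functions and the convention for single-entry columns are hypothetical. Flagser may apply these criteria differently.

```python
from dataclasses import dataclass


@dataclass
class Column:
    filtration: float
    rows: list[int]  # row indices of the non-zero entries, ascending


def pivot(col: Column) -> int:
    """Row index of the last non-zero entry (-1 for an empty column)."""
    return col.rows[-1] if col.rows else -1


def pivot_gap(col: Column) -> int:
    """Distance between the pivot and the next non-zero entry with a smaller row index.

    Hypothetical convention: a single-entry column is measured against a
    virtual entry at row -1, and an empty column has gap 0.
    """
    if not col.rows:
        return 0
    previous = col.rows[-2] if len(col.rows) > 1 else -1
    return col.rows[-1] - previous


def sort_key(col: Column) -> tuple[float, int, int, int]:
    return (col.filtration,     # ascending filtration value (needed for correctness)
            len(col.rows),      # then fewer non-zero entries first
            -pivot(col),        # then larger pivots first
            -pivot_gap(col))    # then larger pivot gaps first


columns = [Column(0.0, [1, 4]), Column(0.0, [4]), Column(0.0, [2, 6]), Column(1.0, [0])]
print([c.rows for c in sorted(columns, key=sort_key)])  # [[4], [2, 6], [1, 4], [0]]
```

Because Python's sort is stable, columns that tie on all four keys keep their original relative order.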

**Remark 1.**

#### 4.4.2. Approximate Computation

#### 4.4.3. Dynamic Priority Queue

#### 4.4.4. Apparent Pairs

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Blue Brain Project. Blue Brain Project, General Website. Available online: https://www.epfl.ch/research/domains/bluebrain (accessed on 2 July 2019).
2. Laboratory for Topology and Neuroscience. Available online: https://hessbellwald-lab.epfl.ch/ (accessed on 2 July 2019).
3. Maria, C.; Boissonnat, J.D.; Glisse, M.; Yvinec, M. The Gudhi Library: Simplicial Complexes and Persistent Homology; Research Report RR-8548; INRIA: Rocquencourt, France, 2014.
4. Bauer, U.; Kerber, M.; Reininghaus, J.; Wagner, H. Phat—Persistent Homology Algorithms Toolbox. J. Symb. Comput. **2017**, 78, 76–90.
5. Blue Brain Project. Digital Reconstruction of Neocortical Microcircuitry. 2019. Available online: https://bbp.epfl.ch/nmc-portal/downloads (accessed on 2 July 2019).
6. Reimann, M.W.; Nolte, M.; Scolamiero, M.; Turner, K.; Perin, R.; Chindemi, G.; Dłotko, P.; Levi, R.; Hess, K.; Markram, H. Cliques of Neurons Bound into Cavities Provide a Missing Link between Structure and Function. Front. Comput. Neurosci. **2017**, 11, 48.
7. Dłotko, P. Topological Neuroscience Software. Available online: http://neurotop.gforge.inria.fr (accessed on 2 July 2019).
8. Blue Brain Project. Mouse Whole-Neocortex Connectome Model. 2019. Available online: https://portal.bluebrain.epfl.ch/resources/models/mouse-projections/ (accessed on 2 July 2019).
9. Smith, J.P. Flagser-Adaptions. 2019. Available online: https://github.com/JasonPSmith/flagser-adaptions (accessed on 2 July 2019).
10. Bauer, U. Ripser: A Lean C++ Code for the Computation of Vietoris-Rips Persistence Barcodes. 2015–2018. Available online: http://ripser.org (accessed on 29 August 2018).
11. Kunegis, J. KONECT—The Koblenz Network Collection. In Proceedings of the 22nd International Conference on World Wide Web Companion, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350.
12. Kunegis, J. KONECT—The Koblenz Network Collection Website. Available online: http://konect.uni-koblenz.de/ (accessed on 2 July 2019).
13. Varshney, L.R.; Chen, B.L.; Paniagua, E.; Hall, D.H.; Chklovskii, D.B. Structural Properties of the Caenorhabditis elegans Neuronal Network. PLoS Comput. Biol. **2011**, 7, e1001066.
14. Adamaszek, M. Special cycles in independence complexes and superfrustration in some lattices. Topol. Appl. **2013**, 160, 943–950.
15. Zomorodian, A. Fast construction of the Vietoris-Rips complex. Comput. Graph. **2010**, 34, 263–271.
16. Binchi, J.; Merelli, E.; Rucco, M.; Petri, G.; Vaccarino, F. jHoles: A Tool for Understanding Biological Complex Networks via Clique Weight Rank Persistent Homology. Electron. Notes Theor. Comput. Sci. **2014**, 306, 5–18.
17. Lütgehetmann, D. Computing Homology of Directed Flag Complexes. 2019. Available online: https://github.com/luetge/flagser (accessed on 2 July 2019).
18. Costa, A.; Farber, M. Large random simplicial complexes, I. J. Topol. Anal. **2016**, 8, 399–429.
19. Sanchez-Garcia, R.J.; Fennelly, M.; Norris, S.; Wright, N.; Niblo, G.; Brodzki, J.; Bialek, J.W. Hierarchical spectral clustering of power grids. IEEE Trans. Power Syst. **2014**, 29, 2229–2237.
20. Munkres, J.R. Elements of Algebraic Topology; Addison-Wesley: Boston, MA, USA, 1984.
21. Hatcher, A. Algebraic Topology; Cambridge University Press: Cambridge, UK, 2002.
22. Edelsbrunner, H.; Harer, J. Persistent homology—A survey. In Surveys on Discrete and Computational Geometry; American Mathematical Soc.: Providence, RI, USA, 2008; Volume 453, pp. 257–282.
23. Bauer, U. Ripser: Efficient Computation of Vietoris-Rips Persistence Barcodes. 2018. Available online: http://ulrich-bauer.org/ripser-talk.pdf (accessed on 29 August 2018).
24. Yannakakis, M. Computing the Minimum Fill-In is NP-Complete. SIAM J. Algebraic Discret. Methods **1981**, 2, 77–79.
25. Forman, R. Morse Theory for Cell Complexes. Adv. Math. **1998**, 134, 90–145.

**Figure 1.** Trajectories of ${\beta}_{2}$ and ${\beta}_{3}$ as a function of the approximation parameter. Changing the approximation parameter by an order of magnitude results in only a small change in the computed Betti numbers.

**Figure 2.** Cell count in the Blue Brain reconstruction of the somatosensory cortex. Computation was run on an HPC cluster using 256 CPUs, required 55.69 GB of memory and took 7.5 h to complete.

**Figure 3.** The simplex count for the PL-Left region of a mouse neocortex (local connections only). The complex is 21-dimensional with more than 1.2 trillion 11-dimensional simplices. Computation was run in parallel on two nodes of an HPC cluster with 256 CPUs each, required 1 GB of memory and took five days to complete (run as 10 different jobs of 24 h each).

**Figure 4.** Four large datasets taken from the Koblenz Network Collection [11]: (**a**) social network of YouTube users and their friendship connections; (**b**) friendship data of Facebook users; (**c**) road network of the State of Texas, where a node is an intersection between roads or a road endpoint and edges are road segments; and (**d**) road network of the State of California. All computations were carried out on an HPC cluster using 256 cores.

**Figure 5.** A digraph for which using apparent pairs to compute the homology of the directed flag complex gives incorrect results.

**Table 1.** Performance of Flagser on a variety of datasets: BE = Windsurfers, GP = Google+, IN = Infectious, JA = Jazz, MQ = Macaques, PR = Protein, FN = 19th step of the Rips filtration of the 40-cycle graph, BB = Blue Brain Project neocortical column, BA = Barabási–Albert graph, CE = C. Elegans, ER = Erdős-Rényi graph. In all cases, density is given as connection probability, as a percentage. Dataset$^{\dagger}$ means that the Approximate function of Flagser was used with approximation parameter 100,000. Dataset$^{*}$ means that the homology computation was run on a single core of an HPC cluster.

Dataset | Vertices | Edges | Density (%) | Flagser-Count time | Flagser-Count memory | Flagser Homology time | Flagser Homology memory |
---|---|---|---|---|---|---|---|
BE | 43 | 336 | 18.172 | 0.01 s | 3.32 MB | 0.08 s | 3.81 MB |
GP | 23.6 K | 39.2 K | 0.007 | 0.50 s | 564.40 MB | 2.09 s | 569.54 MB |
IN | 410 | 2765 | 1.644 | 0.04 s | 3.68 MB | 1.27 s | 9.13 MB |
JA$^{*}$ | 198 | 2742 | 6.994 | 49.63 s | 3.57 MB | 10 h 44 min 3.00 s | 75.79 GB |
MQ | 62 | 1187 | 30.879 | 0.09 s | 3.44 MB | 9.41 s | 50.22 MB |
PR | 2239 | 6452 | 0.129 | 0.04 s | 8.63 MB | 0.07 s | 9.17 MB |
FN | 40 | 760 | 47.500 | 2 min 23.26 s | 3.46 MB | Unfinished in 24 h | |
BB$^{\dagger,*}$ | 31.3 K | 7.8 M | 0.793 | 23.76 s | 1.08 GB | 12 h 15 min 3.00 s | 52.96 GB |
BA | 280 | 1911 | 2.437 | 0.00 s | 3.52 MB | 0.03 s | 3.84 MB |
CE | 279 | 2194 | 2.819 | 0.02 s | 3.55 MB | 0.12 s | 4.55 KB |
ER$^{\dagger}$ | 31.3 K | 7.9 M | 0.804 | 15.06 s | 1.08 GB | 1 h 30 min 51.00 s | 4.98 GB |
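
The Density column appears to use the number of edges divided by the square of the number of vertices, rather than by $n(n-1)$: for BE, $336/43^{2}\approx 0.18172$, whereas $336/(43\cdot 42)\approx 0.186$. This convention is our inference from the table values, not a statement by the authors; the Python sketch below (with a hypothetical helper `density_percent`) recomputes a few rows under that assumption.

```python
def density_percent(vertices: int, edges: int) -> float:
    """Connection probability in percent, assuming the convention edges / vertices**2."""
    return 100 * edges / vertices ** 2


for name, n, e in [("BE", 43, 336), ("MQ", 62, 1187), ("FN", 40, 760), ("CE", 279, 2194)]:
    print(name, round(density_percent(n, e), 3))
```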

Dataset | Persistence time | Persistence memory |
---|---|---|
BE | 0.09 s | 3.91 MB |
MQ | 11.47 s | 69.54 MB |

**Table 3.** Performance of Flagser on the Blue Brain Project graph with respect to different values of the Approximate parameter. The Betti numbers computed are an approximation to the true Betti numbers. See Table 4 for accuracy.

Approx. | Time | Memory | $\beta_{1}$ | $\beta_{2}$ | $\beta_{3}$ | $\beta_{4}$ | $\beta_{5}$ |
---|---|---|---|---|---|---|---|
1 | 40 min 44 s | 11.77 GB | 14,640 | 9,831,130 | 30,219,098 | 513,675 | 785 |
10 | 38 min 44 s | 11.96 GB | 15,148 | 14,831,585 | 12,563,309 | 13,278 | 40 |
100 | 40 min 29 s | 12.69 GB | 15,188 | 13,956,297 | 7,171,621 | 9898 | 37 |
1000 | 55 min 40 s | 15.42 GB | 15,283 | 14,598,249 | 5,780,569 | 9822 | 37 |
10,000 | 2 h 24 min 17 s | 23.78 GB | 15,410 | 14,872,057 | 5,219,666 | 9821 | 37 |
100,000 | 12 h 15 min 3 s | 52.96 GB | 15,438 | 14,992,658 | 4,951,674 | 9821 | 37 |

**Table 4.** Number of skipped columns of the coboundary matrix for the Blue Brain Project graph with respect to different values of the Approximate parameter. The approximation accuracy of ${\beta}_{i}$ theoretically depends on the number of columns skipped in the coboundary matrices for ${\delta}_{i-1}$ and ${\delta}_{i}$.

Approx. | $\delta_{1}$ | $\delta_{2}$ | $\delta_{3}$ | $\delta_{4}$ | $\delta_{5}$ |
---|---|---|---|---|---|
1 | 7,724,934 | 56,746,602 | 19,092,109 | 664,481 | 4162 |
10 | 6,714,811 | 15,944,792 | 796,559 | 372 | 0 |
100 | 179,794 | 4,107,019 | 6414 | 0 | 0 |
1000 | 14,209 | 1,902,229 | 42 | 0 | 0 |
10,000 | 1002 | 1,054,397 | 0 | 0 | 0 |
100,000 | 27 | 664,857 | 0 | 0 | 0 |
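
The captions of Tables 3, 4 and 9 describe the Approximate option as skipping the reduction of columns that would need too many reduction steps. The Python sketch below illustrates that idea on a plain left-to-right column reduction over $\mathbb{F}_2$. It is our own simplification, not Flagser's code: we assume that one "reduction step" is one column addition, and the function `reduce_with_cap` and its `max_steps` argument are hypothetical.

```python
from typing import Optional


def reduce_with_cap(columns: list[set[int]], max_steps: Optional[int] = None):
    """Left-to-right column reduction over F2 with an optional cap on column additions.

    columns[j] is the set of row indices of the non-zero entries of column j.
    A column that would need more than `max_steps` additions is skipped: it is
    left unreduced and never used to reduce later columns.
    Returns (sorted list of (pivot row, column) pairs, number of skipped columns).
    """
    reduced: list[Optional[set[int]]] = []
    pivot_owner: dict[int, int] = {}  # pivot row -> index of the column owning it
    skipped = 0
    for j, original in enumerate(columns):
        col = set(original)
        steps = 0
        while col and max(col) in pivot_owner:
            if max_steps is not None and steps >= max_steps:
                break
            col ^= reduced[pivot_owner[max(col)]]  # add an earlier column mod 2
            steps += 1
        if col and max(col) in pivot_owner:
            skipped += 1  # give up on this column
            reduced.append(None)
            continue
        reduced.append(col)
        if col:
            pivot_owner[max(col)] = j
    return sorted(pivot_owner.items()), skipped


cols = [{0, 1}, {1, 2}, {0, 2}]
print(reduce_with_cap(cols))               # ([(1, 0), (2, 1)], 0)
print(reduce_with_cap(cols, max_steps=1))  # ([(1, 0), (2, 1)], 1)
```

The kept columns are fully reduced among themselves, so the number of pivot pairs is the rank of the matrix with the skipped columns removed, which differs from the true rank by at most the number of skipped columns. Since $\beta_i$ depends on the ranks of $\delta_{i-1}$ and $\delta_i$, its error is then bounded by the number of columns skipped in those two matrices.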

**Table 5.** Performance of Flagser on an Erdős-Rényi graph with respect to different values of the Approximate parameter. The Betti numbers range from dimension 0 to dimension 3.

Approx. | Time | Memory | $\beta_{0}$ | $\beta_{1}$ | $\beta_{2}$ | $\beta_{3}$ |
---|---|---|---|---|---|---|
1 | 4 min 45 s | 3.25 GB | 1 | 19,675 | 14,675,052 | 40 |
10 | 4 min 31 s | 3.10 GB | 1 | 19,981 | 9,204,135 | 4 |
100 | 4 min 44 s | 3.18 GB | 1 | 19,981 | 8,074,149 | 4 |
1000 | 5 min 48 s | 3.33 GB | 1 | 19,981 | 7,921,730 | 4 |
10,000 | 14 min 18 s | 3.72 GB | 1 | 19,982 | 7,876,858 | 4 |
100,000 | 1 h 30 min 51 s | 4.98 GB | 1 | 19,984 | 7,857,917 | 4 |

**Table 6.** Number of skipped columns of the coboundary matrix for an Erdős-Rényi graph with respect to different values of the Approximate parameter, in dimensions 1 to 3.

Approx. | $\delta_{1}$ | $\delta_{2}$ | $\delta_{3}$ |
---|---|---|---|
1 | 7,692,010 | 719,041 | 124 |
10 | 1,501,909 | 3 | 0 |
100 | 371,920 | 0 | 0 |
1000 | 219,501 | 0 | 0 |
10,000 | 174,628 | 0 | 0 |
100,000 | 155,685 | 0 | 0 |

**Table 7.** A comparison of the run times of Flagser and Gudhi+NetworkX for computing the homology of the undirected flag complex of six undirected graphs. DNF denotes that the computation did not finish within 24 h, Approx. 100,000 denotes the use of the Approximate function of Flagser with value 100,000, and MD = 6 denotes the use of the max dimension option of Flagser to compute only up to dimension 6. Where the max dimension option was used, the computation did not finish within 24 h without it. All computations were done on an HPC cluster using a single CPU. The BBP layers are taken from the Blue Brain Project website [1] and the other networks are taken from KONECT [11]. We treat all graphs as simple undirected graphs for these computations; since reciprocal edges and loops were removed, the edge counts in the table may differ from those of the original graphs.

Dataset | Vertex Count | Edge Count | Gudhi+NetworkX | Flagser |
---|---|---|---|---|
C. Elegans | 279 | 2194 | 1.229 s, 585 MB | 0.705 s, 589 MB |
BBP Layer 2 | 3416 | 132,309 | 11 s, 919 MB | 145 s, 745 MB |
BBP Layer 5 | 6114 | 756,107 | DNF | DNF |
PGP | 10,680 | 24,316 | 3 h 6 min 20 s, 92 GB | 3 h 25 min 46 s, 5.4 GB (MD = 6) |
OpenFlights | 2939 | 19,256 | DNF | 3 h 45 min 18 s, 5.2 GB (MD = 6) |
SimpleWiki | 100,312 | 824,968 | DNF | DNF |

Dataset | Flagser (Approx. 100,000) | Betti Numbers | Approx. Betti Numbers |
---|---|---|---|
C. Elegans | 0.701 s, 524 MB | [1, 162, 83] | [1, 162, 83] |
BBP Layer 2 | 129 s, 612 MB | [1, 5290, 106,360, 5] | [1, 5288, 106,381, 5] |
BBP Layer 5 | 13 h 4 min 45 s, 1.4 GB | Unknown | [1, 1222, 1,427,597, 1133] |
PGP | 3 h 30 min 32 s, 5.3 GB (MD = 6) | [1, 1093, 22, 1] | [1, 1093, 22, 1, 0, 0, 0] |
OpenFlights | 3 h 37 min 19 s, 5.3 GB (MD = 6) | [24,354, 310, 107, 53, 2, 2, 0] | [24,354, 309, 108, 53, 4, 2, 0] |
SimpleWiki | 15 h 7 min 41 s, 21.0 GB (MD = 6) | Unknown | [304, 125,571, 45,306, 31,866, 10,486, 319, 6] |

**Table 8.** The precise number of columns skipped in the Flagser computations presented in Table 7. The notation ${\delta}_{i}$ refers to the i-th coboundary operator.

Dataset | $\delta_{1}$ | $\delta_{2}$ | $\delta_{3}$ | $\delta_{4}$ | $\delta_{5}$ | $\delta_{6}$ | $\delta_{7}$ |
---|---|---|---|---|---|---|---|
C. Elegans | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
BBP Layer 2 | 0 | 23 | 0 | 0 | 0 | 0 | 0 |
BBP Layer 5 | 0 | 32 | 49,614 | 0 | 0 | 0 | 0 |
PGP | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
OpenFlights | 0 | 2 | 0 | 2 | 0 | 0 | 0 |
SimpleWiki | 0 | 56 | 1357 | 1280 | 63 | 0 | 0 |

**Table 9.** Theoretical and actual error in the computation of the first Betti number when skipping the reduction of columns that require more than the given number of reduction steps. The cell counts are 4656, 35,587, and 1,485,099 in dimensions $0$, $1$ and $2$, respectively, and the first Betti number is 3321. Computation times are averages over ten runs on a MacBook Pro (2.5 GHz Intel Core i7). Source of data: Blue Brain Project, Layer 4 of the neocortical column reconstruction, Version 5.

Approximation | None | $10^{4}$ | $10^{3}$ | $10^{2}$ | $10^{1}$ |
---|---|---|---|---|---|
Theoretical error | 0 | 322 | 1380 | 7849 | 192,638 |
Actual error | 0 | 0 | 19 | 58 | 65 |
Computation time | 113 s | 49 s | 43 s | 41 s | 39 s |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lütgehetmann, D.; Govc, D.; Smith, J.P.; Levi, R.
Computing Persistent Homology of Directed Flag Complexes. *Algorithms* **2020**, *13*, 19.
https://doi.org/10.3390/a13010019
