Computing persistent homology of directed flag complexes

We present Flagser, a new software package designed to construct the directed flag complex of a finite directed graph and to compute persistent homology for flexibly defined filtrations on the graph and the resulting complex. The persistent homology computation in Flagser is based on the program Ripser [Bau18a], but is optimized specifically for large computations. The construction of the directed flag complex is done in a way that allows easy parallelization across arbitrarily many cores. Flagser can also work with undirected graphs. For homology computations Flagser offers an Approximate option, which dramatically shortens compute time while remaining remarkably accurate. We demonstrate the power of Flagser by applying it to the construction of the directed flag complex of digital reconstructions of brain microcircuitry by the Blue Brain Project, and to several other examples; in some instances we also compute homology. For a more complete performance analysis, we further apply Flagser to other data collections. In all cases the hardware used in the computation, the memory consumption and the compute time are recorded.


Introduction
Over the past decade, homology and persistent homology, particularly in the form of persistent homology of Vietoris-Rips complexes of point clouds, have entered the field of data analysis. Their efficient computation is therefore of great interest, and has led to several computational tools such as Gudhi [MBGY14], PHAT [BKRW17] and other packages.
In an ongoing collaboration with the Blue Brain Project (Link 9.1) and the Laboratory for Topology and Neuroscience (Link 9.9) we study certain ordered simplicial complexes (Definition 2.3) arising from directed graphs that model brain microcircuitry reconstructions created by the Blue Brain Project team. Having access to these reconstructions gives us a unique opportunity to apply topological tools to the study of brain circuitry models at a scale and level of biological accuracy that was never possible before. Thus the need arose for a software package that can generate the complexes in question from the adjacency matrix of the circuits under investigation and compute homology and persistent homology of such complexes with respect to a variety of filtrations.
The size of the networks we considered initially was on the order of 30 thousand vertices and 8 million directed edges (a reconstructed neocortical column of a 14-day-old rat (Link 9.3)). Software was designed as part of the work that led to the publication [RNS+17], and is freely available (Link 9.7). However, the magnitude of the resulting complexes, typically 7-dimensional with about 70 million 2- and 3-simplices, limited our ability to conduct certain deeper investigations, as even the construction of the complexes required a considerable number of compute hours on a large HPC node, and homology computations with the full complex were completely out of reach. More recent reconstructions by the BBP team produced much larger and more complex models (see for instance the model of the full mouse cortex: Link 9.2). In addition, attempting to understand neuroscientific phenomena, such as neuronal activity as a response to stimuli or synaptic plasticity, naturally invites the use of persistent homology associated to various filtrations of the complexes. Thus the need arose for a flexible and efficient computational package that is capable of generating complexes out of very large directed graphs and computing homology and persistent homology within reasonable compute time and memory usage.
The software package Flagser presented in this article was developed as part of an EPSRC-funded project, Topological Analysis of Neural Systems. It is designed specifically for the kind of constructions and high performance computations required in this project, but is potentially applicable in a variety of other areas. The homology computation part of Flagser is based on a rather recent addition to the topological data analysis software collection, Ripser by Ulrich Bauer [Bau18a]. However, unlike Ripser, the Flagser package includes the capability to generate the (directed or undirected) flag complex of a finite (directed or undirected) graph from its connectivity matrix. In addition, the reduction algorithm and coboundary computation methods were modified and optimized for use with very large datasets. We also added a flexible way of associating weights to vertices and/or edges of a given graph, and the capability to have user-defined filtrations on the complex, resulting from the given weights. Another very useful feature of Flagser is an Approximate function that allows the homology computation part of the program to compute Betti numbers and persistent homology within a user-defined accuracy, and as a result dramatically reduce compute time. Flagser is also capable of computing homology with coefficients in odd characteristic. With Flagser we are able to perform the construction of the directed flag complex of the neocortical column of a rat (Link 9.3) on a laptop in less than 30 seconds, and to compute a good approximation of its homology within a few hours on an HPC cluster (Figure 3). Such computations were completely out of reach before.
The usefulness of Flagser is clearly not limited to neuroscience. Topological analysis of graphs and complexes arising in both theoretical problems and applied studies has become quite prevalent in recent years [CF16, SGFN+14]. Hence Flagser, in its original or a modified form, could become a major computational aid in many fields.
In the following, we discuss the main design choices and their impact on computations. The last section contains real-world examples demonstrating the usefulness of the software package.
Flagser is an open source package, licensed under the LGPL 3.0 scheme. Some task-specific modifications are available as well (see Section 7).

Mathematical background
In this section, we give a brief introduction to directed flag complexes of directed graphs, and to persistent homology. For more detail, see [Mun84], [Hat02] and [RNS+17].
Definition 2.1. A graph G is a pair (V, E), where V is a finite set referred to as the vertices of G, and E is a subset of the set of unordered pairs e = {v, w} of distinct points of V, which we call the edges of G. Geometrically the pair {v, w} indicates that the vertices v and w are adjacent to each other in G. A directed graph, or a digraph, is similarly a pair (V, E) of vertices V and edges E, except that the edges are ordered pairs of distinct vertices, i.e. the pair (v, w) indicates that there is an edge from v to w in G. In a digraph we allow reciprocal edges, i.e. both (v, w) and (w, v) may be edges of G, but we exclude loops, i.e. edges of the form (v, v).
Next we define the main topological objects considered in this article.
Definition 2.2. An abstract simplicial complex on a vertex set V is a collection X of non-empty finite subsets σ ⊆ V that is closed under taking non-empty subsets. An ordered simplicial complex on a vertex set V is a collection of non-empty finite ordered subsets σ ⊆ V that is closed under taking non-empty ordered subsets. Notice that the vertex set V does not have to carry any underlying order. In both cases the subsets σ are called the simplices of X, and σ is said to be a k-simplex, or a k-dimensional simplex, if it consists of k + 1 vertices. If σ ∈ X is a simplex and τ ⊆ σ, then τ is said to be a face of σ. If τ is a subset of V (ordered if X is an ordered simplicial complex) that contains σ, then τ is called a coface of σ. If σ = (v_0, ..., v_k) is a k-simplex in X, then the i-th face of σ is the subset ∂_i(σ) = (v_0, ..., v̂_i, ..., v_k), where by v̂_i we mean that the i-th vertex is omitted. For d ≥ 0 denote by X_d the set of all d-dimensional simplices of X.
Notice that if X is any abstract simplicial complex with vertex set V, then by fixing a linear order on the set V we may think of each of the simplices of X as an ordered set, where the order is induced from the ordering on V. Hence any abstract simplicial complex together with an ordering on its vertex set determines a unique ordered simplicial complex. Clearly any two orderings on the vertex set of the same complex yield two distinct but isomorphic ordered simplicial complexes. Hence from now on we may assume that any simplicial complex we consider is ordered.
Definition 2.3. Let G = (V, E) be a directed graph. The directed flag complex dFl(G) is defined to be the ordered simplicial complex whose k-simplices are all ordered (k + 1)-cliques, i.e. (k + 1)-tuples σ = (v_0, v_1, ..., v_k), such that v_i ∈ V for all i, and (v_i, v_j) ∈ E for i < j. The vertex v_0 is called the initial vertex or the source of σ, while the vertex v_k is called the terminal vertex or the sink of σ.
Let F be a field, and let X be an ordered simplicial complex. For any k ≥ 0 let C_k(X) be the F-vector space with basis given by the set X_k of all k-simplices of X. If σ = (v_0, ..., v_k) is a k-simplex in X, then the i-th face of σ is the subset ∂_i(σ) = (v_0, ..., v̂_i, ..., v_k), where v̂_i means that the vertex v_i is removed to obtain the corresponding (k − 1)-simplex. For k ≥ 1, define the k-th boundary map

d_k : C_k(X) → C_{k−1}(X),   d_k(σ) = Σ_{i=0}^{k} (−1)^i ∂_i(σ).

The boundary map has the property that d_k ∘ d_{k+1} = 0 for each k ≥ 1 [Hat02]. Thus Im(d_{k+1}) ⊆ Ker(d_k) and one can make the following fundamental definition:

Definition 2.4. Let X be an ordered simplicial complex. The k-th homology group of X (over F), denoted by H_k(X) = H_k(X; F), is defined to be

H_k(X) = Ker(d_k)/Im(d_{k+1}),

where by convention d_0 = 0, so that H_0(X) = C_0(X)/Im(d_1).

If X is an ordered simplicial complex, and Y ⊆ X is a subcomplex, then the inclusion Y ⊆ X induces an inclusion of chain complexes C_k(Y) ⊆ C_k(X). Hence one obtains an induced homomorphism for each k ≥ 0,

H_k(Y) → H_k(X).

This is a particular case of a much more general property of homology, namely its functoriality with respect to so-called simplicial maps. We will not, however, require this level of generality in the discussion below.
Definition 2.5. A filtered simplicial complex is a simplicial complex X together with a filter function f : X → R, i.e. a function satisfying the property that if σ is a face of a simplex τ in X, then f(σ) ≤ f(τ).

Notice that the condition imposed on the function f in Definition 2.5 ensures that for each b ∈ R ∪ {∞}, the sublevel set X[b] = {σ ∈ X | f(σ) ≤ b} is a subcomplex of X. By default we identify X with X[∞]. Thus a filter function f as above defines an increasing family of subcomplexes X[b] of X that is parametrized by the real numbers.
If the values of the filter function f form a finite subset of the real numbers, as will always be the case when working with actual data, then by enumerating those values in their natural order we obtain a finite sequence of subcomplex inclusions

X[0] ⊆ X[1] ⊆ · · · ⊆ X[n] = X.

For such a sequence one defines its persistent homology as follows: for each i ≤ j and each k ≥ 0 one has a homomorphism

h_k^{i,j} : H_k(X[i]) → H_k(X[j]).

The image of this homomorphism is a k-th persistent homology group of X with respect to the given filtration, as it represents all homology classes that are present in H_k(X[i]) and carry over to H_k(X[j]). To use the common language in the subject, Im(h_k^{i,j}) consists of the classes that were "born" at or before filtration level i and are still "alive" at filtration level j. The rank of Im(h_k^{i,j}) is a persistent Betti number of X and is denoted by β_k^{i,j}. Persistent homology can be represented as a collection of persistence pairs, i.e. pairs (i, j) with 0 ≤ i < j ≤ n. The multiplicity of a persistence pair (i, j) in dimension k is, roughly speaking, the number of linearly independent k-dimensional homology classes that are born at filtration level i and die at filtration level j. More formally, the multiplicity is given by

μ_k^{i,j} = (β_k^{i,j−1} − β_k^{i,j}) − (β_k^{i−1,j−1} − β_k^{i−1,j}).

Persistence pairs can be encoded in persistence diagrams or persistence barcodes. See [EH08] for more detail.
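To make the notions of birth and death concrete, consider persistence in degree 0 for a filtered graph: a connected component is born at the filtration value of its earliest vertex, and dies when it merges into an older component (the "elder rule"). The following is a small illustrative sketch; the function name and input conventions are our own and are not part of Flagser.

```python
def persistence_h0(vertex_births, edges):
    """Degree-0 persistence pairs of a filtered graph via union-find.

    vertex_births: {vertex: birth filtration value}
    edges: list of (filtration value, u, v) triples, the 1-simplices.
    Returns (birth, death) pairs; the component born earliest never
    dies and is reported with death = float('inf')."""
    parent = {v: v for v in vertex_births}
    birth = dict(vertex_births)  # birth value of each component's root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    for f, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle: no event in degree 0
        # elder rule: the younger of the two components dies at value f
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru
        if f > birth[rv]:  # record only pairs of positive persistence
            pairs.append((birth[rv], f))
        parent[rv] = ru
    roots = {find(v) for v in vertex_births}
    pairs.extend((birth[r], float('inf')) for r in roots)
    return sorted(pairs)
```

For instance, a path a—b—c in which c appears later than a and b produces one pair for each merge, plus one essential class that never dies.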
Flagser computes the persistent homology of filtered directed flag complexes. The filtration of the flag complexes is based on filtrations of the 0-simplices and/or the 1-simplices, from which the filtration values of the higher-dimensional simplices can then be computed by various formulas. When using the trivial filtration, which assigns the value 0 to each simplex, Flagser simply computes the ordinary simplicial Betti numbers. The software uses a variety of tricks to shorten the computation time (see below for more details), and outperforms many other software packages, even in the case of unfiltered complexes (i.e. complexes with the trivial filtration). In the following, we mainly focus on (persistent) homology with coefficients in finite fields; unless stated explicitly otherwise, we work over the field F_2 with two elements.
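One natural such formula, used for the weighted graphs in Section 6, assigns to each simplex the maximal weight of an edge it contains. A minimal sketch, with input conventions of our own choosing:

```python
from itertools import combinations

def max_edge_filtration(simplex, edge_weight):
    """Filtration value of an ordered simplex as the maximal weight of
    an edge it contains; vertices are assigned the value 0.  In a
    directed simplex (v_0, ..., v_k) every pair (v_i, v_j) with i < j
    is an edge, so `combinations` enumerates exactly its edges."""
    if len(simplex) < 2:
        return 0.0
    return max(edge_weight[e] for e in combinations(simplex, 2))
```

Any such rule is a valid filter function in the sense of Definition 2.5, since a face's edges form a subset of those of any coface.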

Representing the directed flag complex in memory
An important feature of Flagser is its memory efficiency, which allows it to carry out rather large computations on a laptop with a mere 16GB of RAM (see the controlled examples in Section 6). In order to achieve this, the directed flag complex of a graph is generated on the fly and not stored in memory. An n-simplex of the directed flag complex dFl(G) corresponds to a directed (n + 1)-clique in the graph G = (V, E). Hence each n-simplex is identified with its list of vertices, ordered by their in-degrees in the corresponding directed clique from 0 to n. The construction of the directed flag complex is performed by a double iteration: (I) For each vertex of the graph, iterate over all simplices which have this vertex as initial vertex. (II) Iterate (I) over all vertices.
This procedure enumerates all simplices exactly once.
In more detail, consider a fixed vertex v_0. The first iteration of Step (I) starts by finding all vertices w such that there exists a directed edge from v_0 to w. This produces the set S_1^{v_0} of all 1-simplices of the form (v_0, w), i.e. all 1-simplices that have v_0 as their initial vertex. In the second iteration of Step (I), construct for each 1-simplex (v_0, v_1) ∈ S_1^{v_0} obtained in the first iteration the set S_2^{v_0,v_1}, by intersecting the two sets of vertices that can be reached from v_0 and v_1, respectively. In other words,

S_2^{v_0,v_1} = {(v_0, v_1, w) | (v_0, w) ∈ S_1^{v_0} and (v_1, w) ∈ S_1^{v_1}}.

This is the set of all 2-simplices in dFl(G) of the form (v_0, v_1, w). Proceed by iterating Step (I), thus creating the prefix tree of the set of simplices with initial vertex v_0. The process must terminate, since the graph G is assumed to be finite. In Step (II) the same procedure is iterated over all vertices of the graph (but see (c) below). This algorithm has three main advantages: (a) The construction of the directed flag complex is highly parallelizable. In theory one could use one CPU per vertex of the graph, so that each CPU computes only the simplices with this vertex as initial vertex. (b) Only the graph is loaded into memory. The directed flag complex, which is usually much bigger, does not have any impact on the memory footprint. (c) The procedure skips branches of the iteration based on the prefix. The idea is that if a simplex is not contained in the complex in question, then no simplex containing it as a face can be in the complex. Therefore, if we have computed that a simplex (v_0, ..., v_k) is not contained in our complex, then we do not have to iterate Step (I) on its vertices to compute the vertices w that are reachable from each of the v_i. This allows us to skip the full iteration branch of all simplices with prefix (v_0, ..., v_k). This is particularly useful when iterating over the simplices of a subcomplex.
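The double iteration above can be sketched in a few lines. This is an illustrative model, not Flagser's C++ code; the function name and the representation of the graph as a dictionary of out-neighbour sets are our own.

```python
def directed_flag_simplices(out_neighbours, max_dim=None):
    """Enumerate every simplex of dFl(G) exactly once, grouped by
    initial vertex.  out_neighbours[v] is the set of vertices w with
    (v, w) an edge.  Each outer iteration is independent of the
    others, so in principle each could run on its own core."""
    def extend(prefix, candidates):
        yield prefix
        if max_dim is not None and len(prefix) > max_dim:
            return  # prune: do not descend below this prefix
        for w in candidates:
            # vertices reachable from all of prefix + (w,):
            # intersect the running candidate set with w's out-neighbours
            yield from extend(prefix + (w,), candidates & out_neighbours[w])

    for v in out_neighbours:                        # Step (II)
        yield from extend((v,), out_neighbours[v])  # Step (I)
```

On the directed 3-clique a → b, a → c, b → c this yields three vertices, three edges and one 2-simplex, each exactly once.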
In the code, the adjacency matrix (and, for improved performance, its transpose) are both stored as bitsets¹. This makes the computation of the directed flag complex more efficient. Given a k-simplex (v_0, ..., v_k) in dFl(G), the set of vertices extending it is computed as the intersection of the sets of vertices that each of v_0, ..., v_k has a connection to. Each such set is given by the positions of the 1's in the bitset of the adjacency matrix in the row corresponding to the respective vertex, and thus the set intersection can be computed as the logical "AND" of those bitsets.
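The bitset intersection can be imitated in Python by packing each adjacency-matrix row into an integer, with bit w set iff the row's vertex has an edge to w; intersecting neighbour sets is then a chain of bitwise ANDs. A sketch under these assumptions (not Flagser's actual representation):

```python
def common_out_neighbours(adj_rows, simplex):
    """Bitmask of the vertices receiving an edge from every vertex of
    the simplex.  adj_rows[v] is row v of the adjacency matrix packed
    into an integer, so the set intersection is a bitwise AND."""
    result = ~0  # all bits set: the full vertex set
    for v in simplex:
        result &= adj_rows[v]
    return result
```

For example, with rows 0b0111 and 0b1101 the intersection is 0b0111 & 0b1101 = 0b0101, exactly the "AND" example from the footnote.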
In some situations, we might have enough memory available to load the full complex into memory. For these cases, there exists a modified version of the software that stores the tree described above in memory, allowing the user to
• perform computations on the complex, such as counting simplices or computing coboundaries, without having to recompute the intersections described above, and
• associate data to each simplex with fast lookup time.
The second feature is useful for assigning unique identifiers to each simplex. Without the complex in memory, a hash map with a custom hashing function has to be created for this purpose, which slows down the lookup of the global identifier of a simplex dramatically.

Computing the coboundaries
In Ripser [Bau18a], cohomology is computed instead of homology. The reason for this choice is that for typical examples, the coboundary matrix contains many more columns that can be skipped in the reduction algorithm than the boundary matrix does [Bau18b].
If σ is a simplex in a complex X, then σ is a canonical basis element in the chain complex for X with coefficients in the field F. The hom-dual σ*, i.e. the function that associates 1 ∈ F with σ and 0 with any other simplex, is a basis element in the cochain complex computing the cohomology of X with coefficients in F. The differential in the cochain complex requires considering the coboundary simplices of any given simplex, i.e. all those simplices that contain the original simplex as a face of codimension 1.

¹ Bitsets store a single bit of information (0 or 1) as one physical bit in memory, whereas usually every bit is stored as one byte (8 bits). This representation improves the memory usage by a factor of 8, and additionally allows the compiler to use much more efficient bitwise logical operations, like "AND" or "OR". Example of the bitwise logical "AND" operation: 0111 & 1101 = 0101.
In Ripser, the coboundary simplices are never stored in memory. Instead, the software enumerates the coboundary simplices of any given simplex by a clever indexing technique, omitting all coboundary simplices that are not contained in the complex defined by the filtration stage being computed. However, this technique only works if the number of theoretically possible coboundary entries is sufficiently small. In the directed setting, and when working with a directed graph with a large number of vertices, this number is typically too large to be enumerated by a 64-bit integer. Therefore, in Flagser we precompute the coboundary simplices into a sparse matrix, achieving fast coboundary iteration at the expense of longer initialization time and a larger memory load. An advantage of this technique is that the computation of the coboundary matrices can be fully parallelized, again splitting the simplices into groups indexed by their initial vertex.
Given an ordered list of vertex identifiers (v_0, ..., v_k) characterising a directed simplex σ, the list of its coboundary simplices is computed by enumerating all possible ways of inserting a new vertex into this list to create a simplex in the coboundary of σ. Given a position 0 ≤ p ≤ k + 1, the set of vertices that can be inserted at that position is the intersection of the set of vertices that have a connection from each of the vertices v_0, ..., v_{p−1} with the set of vertices that have a connection to each of the vertices v_p, ..., v_k. This intersection can again be efficiently computed by bitwise logical operations on the rows of the adjacency matrix and its transpose (the transpose is necessary to efficiently compute the incoming connections of a given vertex).
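The insertion procedure can be sketched as follows; the function and argument names are ours, and Python sets stand in for the bitset rows of the adjacency matrix and its transpose.

```python
def coboundary(simplex, out_nb, in_nb):
    """All (k+1)-simplices containing `simplex` as a codimension-1 face.

    out_nb[v] / in_nb[v] are the out- and in-neighbour sets of v; the
    in-neighbours play the role of the transposed adjacency matrix."""
    n = len(simplex)
    for p in range(n + 1):             # insertion position 0 <= p <= k+1
        candidates = None
        for v in simplex[:p]:          # w must receive an edge from these
            s = out_nb[v]
            candidates = s if candidates is None else candidates & s
        for v in simplex[p:]:          # w must send an edge to these
            s = in_nb[v]
            candidates = s if candidates is None else candidates & s
        for w in candidates:
            yield simplex[:p] + (w,) + simplex[p:]
```

Since loops are excluded, any vertex w connected to every vertex of the simplex is automatically distinct from all of them, so no extra check is needed.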
With the procedure described above, we can compute the coboundary of every k-simplex. The simplices in this representation of the coboundary are given by their ordered lists of vertices; to turn this into a coboundary matrix we have to assign consecutive numbers to the simplices of each dimension. We do this by creating a hash map, giving all simplices of a fixed dimension consecutive numbers starting at 0. This numbering corresponds to choosing a basis of the chain complex by fixing an ordering, and using the hash map we can look up the number of each simplex in that ordering. This allows us to turn the coboundary information computed above into a matrix representation. If the flag complex is stored in memory, creating a hash map becomes unnecessary, since we can use the complex as a hash map directly, adding the dimension-wise consecutive identifiers as additional information to each simplex. This eliminates the lengthy process of hashing all simplices and is therefore useful in situations where memory is not an issue.

Performance considerations
Flagser is optimized for large computations. In this section we describe some of the algorithmic methods by which this is achieved.

5.1. Sorting the columns of the coboundary matrix.
The reduction algorithm iteratively reduces the columns in order of their filtration value (i.e. the filtration value of the simplex that the column represents). Each column is reduced by adding previous (already reduced) columns until its pivot element does not appear as the pivot of any previous column. While reducing the columns, the birth and death times for the persistence diagram can be read off: roughly speaking, if a column with a certain filtration value f_0 ends up with a pivot equal to the pivot of a column with a higher filtration value f_1, then this column represents a class that is born at time f_0 and dies at time f_1. Experiments showed that the order of the columns is crucial for the performance of this reduction algorithm, so Flagser sorts the columns that have the same filtration value by an experimentally determined heuristic (see below). This is of course only useful when there are many columns with the same filtration value, for example when taking the trivial filtration, where every simplex has the same value. In these cases, however, it can make very costly computations tractable.
The heuristic to sort the columns is applied as follows:
(1) Sort columns in ascending order by their filtration value. This is necessary for the algorithm to produce correct results.
(2) Sort columns in ascending order by the number of non-zero entries in the column. This tries to keep small the number of new non-zero entries that a reduction with such a column produces.
(3) Sort columns in descending order by the position of the pivot element of the unreduced column. This means that the larger the pivot (i.e. the higher the index of the last non-trivial entry), the lower the index of the column in the sorted list. The idea behind this is that "long columns", i.e. columns whose pivot element has a high index, should be as sparse as possible. Using such a column to reduce the columns to its right is thus more economical. Initial coboundary matrices are typically very sparse, so ordering the columns in this way provides sparser columns with large pivots for use in the reduction process.
(4) Sort columns in descending order by their "pivot gap", i.e. by the distance between the pivot and the next non-trivial entry with a smaller row index. A large pivot gap is desirable in case the column is used in the reduction of columns to its right, because when using such a column to reduce another column, the new non-trivial entry in the column being reduced appears at a lower index. This may yield fewer reduction steps on average.
Remark 5.1. We tried various heuristics, amongst them different orderings of the above sorting criteria, but the setting above produced by far the best results. Running a genetic algorithm in order to find good combinations and orderings of sorting criteria produced slightly better results on the small examples we trained on, but these did not generalize to bigger examples. It is known that finding the perfect ordering for this type of reduction algorithm is NP-hard [Yan81].

5.2. Approximate computation.
Another method that speeds up computations considerably is approximate computation. This simply means allowing Flagser to skip columns that need many reduction steps. The reduction of these columns takes the most time, so skipping even only the columns with extremely long reduction chains gives significant performance gains. Skipping a column changes the rank of the matrix by at most one, so the error can be explicitly bounded. This is especially useful for computations with trivial (i.e. constant) filtrations, as the error in the resulting Betti numbers is easier to interpret than the error in the set of persistence pairs.
Experiments on smaller examples show that the theoretical error bound computed from the number of skipped columns is usually much bigger than the actual error. It is therefore usually possible to quickly get rather reliable results that can afterwards be refined with more computation time. See Figure 1 for an example of the speed-up gained by approximate computation. The user of Flagser can specify an approximation level, given by a number (defaulting to infinity) that determines after how many reduction steps the current column is skipped. The lower the number, the greater the performance gain (and the more uncertainty about the result). At present it is not possible to specify an actual error margin for the computation, as it is very complicated to predict the number of reductions that will be needed for the different columns.
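A toy model of the capped reduction, over F_2 and with columns represented as Python sets of row indices (our own conventions, not Flagser's), makes the error bound concrete: each abandoned column changes the computed rank by at most one.

```python
def reduce_columns(columns, max_steps=None):
    """Mod-2 column reduction with an optional cap on reduction steps.

    columns: list of sets of non-zero row indices, already sorted.
    Returns (pivot_of, skipped): a map from pivot row to its reduced
    column, and the number of columns abandoned in approximate mode."""
    pivot_of = {}
    skipped = 0
    for col in columns:
        col = set(col)
        steps = 0
        while col:
            pivot = max(col)
            if pivot not in pivot_of:
                pivot_of[pivot] = col  # fully reduced, pivot is new
                break
            if max_steps is not None and steps >= max_steps:
                skipped += 1           # give up on this column
                break
            col ^= pivot_of[pivot]     # add the earlier column (mod 2)
            steps += 1
    return pivot_of, skipped
```

On the boundary matrix of a triangle the exact rank is 2; capping at one reduction step skips the final column, and the reported skip count certifies that the rank is off by at most one.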

5.3. Dynamic priority queue.
When reducing the matrix, Ripser stores the column that is currently being reduced as a priority queue, ordering the entries in descending order by filtration value and, for entries with the same filtration value, in ascending order by an arbitrarily assigned index. Each index can appear multiple times in this queue, but due to the sorting all repetitions are next to each other in a contiguous block. To determine the final entries of the column after the reduction process, one has to loop over the whole queue and sum the coefficients of all repeated entries.

[Figure caption: The computations were made as averages over ten runs on a MacBook Pro, 2.5 GHz Intel Core i7.]

Figure 2. A digraph for which using apparent pairs to compute the homology of the directed flag complex gives the wrong results.
This design makes the reduction of columns with few reduction steps very fast, but gets very slow (and memory-intensive) for columns with many reduction steps: each time an entry is added to the queue, it has to "bubble" through the queue to find its right place. Therefore, Flagser enhances this queue by dynamically switching to a hash-based approach if the queue gets too big: after a certain number of elements have been inserted into the queue, the queue object starts to track the inserted ids in a hash map, storing each new entry with coefficient 1. If a new entry is already found in the hash map, it is not inserted into the queue again; instead, the coefficient in the hash map is updated, preventing the sorting issue described above. When removing elements from the front of the queue, the queue object takes the coefficient stored in the hash map into account when computing the final coefficient of each index.

5.4. Apparent pairs.
In Ripser, so-called apparent pairs are used in order to skip some computations completely. These rules are based on discrete Morse theory [For98], but they only apply to non-directed simplicial complexes. Since we consider directed graphs, the associated flag complex is a semi-simplicial rather than a simplicial complex, so we cannot use this simplification. Indeed, experiments showed that enabling the skipping of apparent pairs yields wrong results for directed flag complexes of certain directed graphs. For example, when applied to the directed flag complex of the graph in Figure 2, using apparent pairs gives β_1 = β_2 = 0, while not using apparent pairs gives β_1 = β_2 = 1, which is visibly the correct answer.
Experiments showed that, when applied to directed flag complexes arising from graphs without reciprocal connections (so that the resulting directed flag complex is a simplicial complex), enabling the skipping of apparent pairs does not give a big performance advantage. In fact, we found a significant increase in run time for subgraphs of the connectome of the reconstruction of the neocortical column based on data averaged across the five rats that were used in [RNS+17], the Google plus network and the Twitter network from KONECT (Link 9.8).
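The dynamically switching priority queue described above can be modelled in a few lines. The following toy version (class name and threshold semantics are ours) works with F_2 coefficients only, so it tracks multiplicities and reports a pivot only when its multiplicity is odd; Flagser itself also supports odd characteristics.

```python
import heapq

class DynamicColumn:
    """Column container for the reduction: a max-heap of row indices
    that switches to hash-map multiplicity tracking once it grows past
    a threshold, avoiding repeated 'bubbling' of duplicate entries."""

    def __init__(self, threshold=64):
        self.heap = []      # max-heap via negated indices, may hold repeats
        self.counts = None  # index -> multiplicity, used after the switch
        self.threshold = threshold

    def push(self, index):
        if self.counts is None:
            heapq.heappush(self.heap, -index)
            if len(self.heap) > self.threshold:
                # switch: move multiplicities into a hash map and
                # deduplicate the heap (a sorted list is a valid heap)
                self.counts = {}
                for neg in self.heap:
                    self.counts[-neg] = self.counts.get(-neg, 0) + 1
                self.heap = sorted(set(self.heap))
        elif index in self.counts:
            self.counts[index] += 1  # duplicate: no re-insertion at all
        else:
            self.counts[index] = 1
            heapq.heappush(self.heap, -index)

    def pop_pivot(self):
        """Largest row index with odd total multiplicity, or None."""
        while self.heap:
            index = -heapq.heappop(self.heap)
            if self.counts is None:
                mult = 1
                while self.heap and -self.heap[0] == index:
                    heapq.heappop(self.heap)  # drain the repeat block
                    mult += 1
            else:
                mult = self.counts.pop(index)
            if mult % 2 == 1:
                return index
        return None
```

Before the switch, duplicates are summed by draining contiguous repeat blocks, exactly as in Ripser; after the switch, a duplicate push is a single hash-map update.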

Sample Computations
In this section we present some sample computations carried out using Flagser. Almost all computations were done on a laptop PC with 32GB of RAM and a 2-core processor. The data files were either taken from online public domain sources or generated for the purpose of computation, as detailed below. In the table above, a dagger symbol on the dataset abbreviation means that the approximate function of Flagser was used with approximation parameter 100000 (the code stops reducing a column after 100000 steps). This allows for computation of homology in a reasonable compute time. An asterisk stands for a homology computation that was carried out on a single core of an HPC cluster. This was done because the memory requirements of the homology computation were too large to perform it on a laptop.
The first six datasets were taken from the public domain collection KONECT (the Koblenz Network Collection, Link 9.8). We chose the following datasets: HumanContact/Windsurfers, Social/Google+, HumanContact/Infectious, HumanSocial/Jazz musicians, Animal/Macaques, Metabolic/Human protein (Figeys). See Figure 3 for details on performance. Two of the networks came as weighted graphs, which allowed us to compute persistent homology in those two cases (Figure 4). The filtration was determined by assigning to each simplex the maximal weight of an edge it contains. Next, we tested the performance of Flagser with respect to several values of the Approximate parameter. We performed the computation on two graphs: (a) a Blue Brain Project neocortical column reconstruction, denoted above by BB (Figure 5), and (b) an Erdős-Rényi graph with the same number of vertices and connection probability (Figure 8). These computations were carried out on a single core of an HPC cluster. Consider for example β_4. By Figures 5 and 6 the true value is β_4 = 9821, since in that case no columns are skipped in δ_3 and δ_4. Thus perfect accuracy is achieved with Approximate parameter 10000. With Approximate parameters 1000 and 100, however, the theoretical error is 42 and 6414 respectively, whereas the actual error is 1 and 77 respectively. In the case of β_2 we do not know the actual answer. However, since the total number of columns skipped with Approximate parameter 100000 is 664884, and the approximate value computed for β_2 is 14992658, the calculation is accurate to within less than ±5%. Similarly, β_1 is computed to be 15438 with 27 skipped columns, which corresponds to an accuracy of at least ±0.2%. For β_3 the theoretical bound is clearly not as good, but here, as in all other cases where a precise computation was not carried out, the trajectory of the approximated Betti numbers as the approximation parameter grows suggests that the number computed is actually much closer to the real Betti number than the theoretical error suggests (Figure 7).
We performed a similar computation with an Erdős-Rényi random directed graph with the same number of vertices as the Blue Brain Project column and the same average probability of connection (Figures 8 and 9). Notice that there is a very substantial difference in performance between the Blue Brain Project graph and a random graph with the same size and density parameters. The difference can be explained by the much higher complexity of the Blue Brain Project graph, as witnessed by its Betti numbers.
A remarkable feature of Flagser is its parallelisable construction algorithm for the directed flag complex. To demonstrate the capability of the software, we computed some large-scale examples. In particular, we computed the cell counts of two sample circuits produced by the Blue Brain Project. The first is a reconstruction of the somatosensory cortex of a rat (Figure 10), and the second is the so-called PL region of the reconstruction of the full neocortex of a mouse (Figure 11). We also computed the simplex counts of some large non-biological networks (Figure 12).
The somatosensory cortex (Figure 10) is a digraph with about 1.7 million vertices and 1.1 billion edges (average connection probability 0.04%). The Blue Brain Project model of the full mouse cortex is available online (Link 9.2). The full reconstruction consists of approximately 10 million neurons. Its synaptic connections fall into three categories: local (short-distance) connections, intermediate-distance connections and long-distance connections. We computed the cell count of the PL region of the left hemisphere in the reconstruction, consisting of approximately 64,000 neurons and 17.5 million local connections (average connection probability 0.4%). We did not consider any connections other than the local ones.
Interestingly, the number of neurons in the model of the PL region (Link 9.2) is roughly twice that in the reconstruction of the neocortical column studied in [RNS+17], and the probability of connection is roughly half. Hence we expected a comparable performance of Flagser, and the results are therefore quite surprising. The likely reason for the difference in computation speeds is the higher dimensionality of the model of the mouse neocortex, which results in a gigantic number of simplices in the middle dimensions (approximately 1.1 trillion 11-dimensional simplices): the former contains simplices of dimension 21 (see Figure 11), whereas the neocortical column of the rat only contains simplices up to dimension 7. These computations were done on an adapted version of Flagser (available at ??). This alternate version has been adapted in the following ways: the code to compute homology is removed, as it is not needed; the option to load the graph from a compressed sparse matrix format is included, as this speeds up reading the graph; the graph can be stored using Google's dense hash map to reduce the memory needed for very large graphs; and partial results are printed during run time, which allows the computation to be resumed if it is stopped (for example, if the process is killed due to time limitations on HPC).
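One of the adaptations above, loading from a compressed sparse matrix format, amounts to the standard compressed sparse row layout for adjacency: one contiguous array of neighbours plus an offset array, instead of one text line per edge. A minimal sketch of the layout (illustrative only; the adapted version's exact on-disk format is not specified here):

```python
def edges_to_csr(n, edges):
    """Compressed sparse row adjacency for a digraph on vertices 0..n-1:
    `indices[indptr[v]:indptr[v+1]]` lists the out-neighbours of v.
    Two flat integer arrays replace per-edge text parsing, which is the
    kind of loading speed-up the adapted version targets."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    indptr, indices = [0], []
    for nbrs in out:
        indices.extend(sorted(nbrs))
        indptr.append(len(indices))
    return indptr, indices

# Directed triangle 0 -> 1 -> 2, 0 -> 2:
# indptr = [0, 2, 3, 3], indices = [1, 2, 2]
```

The same two-array layout also makes the neighbourhood intersections of the counting algorithm cache-friendly.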

Availability of source code
Flagser was developed as part of an EPSRC-funded project on applications of topology in neuroscience. As such, Flagser is an open source package, licensed under the LGPL 3.0 scheme. The code can be found on the GitHub page of Daniel Lütgehetmann, who owns the copyright for the code and will maintain the page (Link 9.5).
In the course of using Flagser in the project, three adapted versions were developed; they can be found on the GitHub page of Jason Smith (Link 9.6), who will maintain the page. The following modifications are available.
• Flagser-count: A modification of a basic procedure in Flagser, optimised for very large networks.
• Deltser: A modification of Flagser that computes persistent homology of any finite delta complex (like a simplicial complex, except that more than one simplex may be supported on the same set of vertices).
• Tournser: The flag tournaplex of a directed graph is the delta complex whose simplices are all tournaments that occur in the graph. Tournser is an adaption of Flagser which computes (persistent) homology of the tournaplex of a finite directed graph. Tournser uses two filtrations that occur naturally in tournaplexes, but can also be used in conjunction with other filtrations.
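To make the tournaplex concrete: a simplex on a given vertex set is a choice of one present direction for each pair of its vertices, so a reciprocal pair of edges yields two distinct simplices on the same support, which is exactly why a delta complex (rather than a simplicial complex) is needed. A small illustrative counter (our sketch, not Tournser itself):

```python
from itertools import combinations

def count_tournaments(edges, size):
    """Count (size-1)-simplices of the flag tournaplex: tournaments on
    `size` vertices occurring in the digraph. For each pair of vertices
    in a support, any present direction may be chosen; a reciprocal
    pair contributes two choices, so one support can carry several
    simplices."""
    edge_set = set(edges)
    verts = sorted({v for e in edges for v in e})
    total = 0
    for support in combinations(verts, size):
        choices = 1
        for u, v in combinations(support, 2):
            # number of available orientations of the pair {u, v}
            choices *= ((u, v) in edge_set) + ((v, u) in edge_set)
        total += choices  # zero if some pair is not connected at all
    return total

# With a reciprocal edge 0 <-> 1 and edges 0 -> 2, 1 -> 2, the support
# {0, 1, 2} carries two tournaments, one for each direction of {0, 1}.
```

The natural filtrations Tournser uses on tournaplexes are not reproduced here.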

Figure 4. Performance of Flagser computing persistence. The last five datasets are: the 19th step of the Rips filtration of the 40-cycle graph (equipped with the graph metric), a sample of a Blue Brain Project reconstruction of the neocortical column of a rat [RNS+17], an Erdős-Rényi graph with a similar number of vertices and connection probability as such a network, a C. elegans brain network [VCP+11] (Link 9.4), and a Barabási-Albert graph with a similar number of vertices and connection probability as a C. elegans brain network.

Figure 7. Trajectories of β2 and β3 as a function of the approximation parameter. A change in order of magnitude in the approximation parameter results in a small change in the Betti number computed.

Figure 10. Cell count in the Blue Brain reconstruction of the somatosensory cortex. Computation was run on an HPC cluster using 256 CPUs, required 55.69GB of memory and took 7.5 hours to complete.

Figure 11. The simplex count for the PL-Left region of a mouse neocortex (local connections only). The complex is 21-dimensional with more than 1.2 trillion 11-dimensional simplices. Computation was run in parallel on two nodes of an HPC cluster with 256 CPUs each, required 1GB of memory and took 5 days to complete (run as 10 different jobs of 24 hours each).

Figure 12. Four large datasets taken from the Koblenz Network Collection (Link 9.8): (a) social network of YouTube users and their friendship connections, (b) friendship data of Facebook users, (c) road network of the State of Texas, where a node is an intersection between roads or a road endpoint and edges are road segments, and (d) road network of the State of California. All computations were carried out on an HPC cluster using 256 cores.
Theoretical and actual error in the computation of the first Betti number when skipping the reduction of columns that require certain numbers of reduction steps. The cell counts are 4656, 35587 and 1485099 in dimensions 0, 1 and 2 respectively, and the first Betti number is 3321.
Figure 5. Performance of Flagser on the Blue Brain Project graph with respect to different values of the Approximate parameter. The Betti numbers computed are an approximation to the true Betti numbers. See Figure 6 for accuracy.

Figure 6. Number of skipped columns of the coboundary matrix for the Blue Brain Project graph with respect to different values of the Approximate parameter. The approximation accuracy of βi depends theoretically on the number of columns skipped in the coboundary matrices for δi−1 and δi.
Figure 8. Performance of Flagser on an Erdős-Rényi graph with respect to different values of the Approximate parameter. The Betti numbers range from dimension 0 to dimension 3.