Merging discrete Morse vector fields: a case of stubborn geometric parallelization

We address the basic question in discrete Morse theory of combining discrete gradient fields that are partially defined on subsets of the given complex. This is a well-posed question when the discrete gradient field $V$ is generated using a fixed algorithm which has a local nature. One example is ProcessLowerStars, a widely used algorithm for computing persistent homology associated to a grey-scale image in 2D or 3D. While the algorithm for $V$ may be inherently local, computed within stars of vertices and so embarrassingly parallelizable, in practical use it is natural to want to distribute the computation over patches $P_{i}$, apply the chosen algorithm to compute the fields $V_{i}$ associated to each patch, and then assemble the ambient field $V$ from these. Simply merging the fields from the patches, even when that makes sense, gives the wrong answer. We develop both very general merging procedures and leaner versions designed for specific, easy-to-arrange covering patterns.


Introduction
Discrete Morse theory [4] is a fairly new and powerful tool, created by Robin Forman in the 1990s, that has found applications in many fields. One important application of discrete Morse theory is to streamlining homology calculations. It has been a very useful tool in topological data analysis, for example in computing persistent homology.
More specifically, there are algorithms that take 2D and 3D digital grayscale images and create discrete Morse functions from the grayscale values defined on pixels or voxels. These images give rise to cubical complexes, where the voxels or pixels are the 0-cells of the complex. One algorithm, ProcessLowerStars from [12], takes all of the cells in the cubical complex and places each either singly in the set of critical cells C, or as part of a face-coface (of codimension 1) pair in the discrete vector field V. The set of critical cells and the discrete vector field are the defining features of a discrete Morse function and, in turn, of the discrete Morse complex. In discrete Morse theory, the discrete vector field determines the maps in the simplified chain complex of the Morse complex that is created from the critical cells of the discrete Morse function.
(Date: January 28, 2022. 2020 Mathematics Subject Classification: primary 57Q70, 68Q85; secondary 68U05, 68W10, 68W15.)
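To make the translation from image to complex concrete, here is a minimal sketch (the helper name `cubical_grid_cells` is ours, not from [12]) enumerating the cells of the cubical complex attached to an m × n grid of pixels, with the pixels as 0-cells:

```python
def cubical_grid_cells(m, n):
    """Return (vertices, edges, squares) of the cubical complex of an m x n pixel grid."""
    # 0-cells: one per pixel
    vertices = [(i, j) for i in range(m) for j in range(n)]
    # 1-cells: horizontal and vertical edges between adjacent pixels
    edges = ([((i, j), (i + 1, j)) for i in range(m - 1) for j in range(n)] +
             [((i, j), (i, j + 1)) for i in range(m) for j in range(n - 1)])
    # 2-cells: unit squares spanned by 2 x 2 blocks of pixels
    squares = [((i, j), (i + 1, j + 1)) for i in range(m - 1) for j in range(n - 1)]
    return vertices, edges, squares

v, e, s = cubical_grid_cells(3, 3)
# a grid is contractible, so its Euler characteristic |V| - |E| + |F| is 1
assert len(v) - len(e) + len(s) == 1
```

A 3 × 3 pixel grid thus carries 9 vertices, 12 edges, and 4 squares; the same counts scale to the patches used later in the paper.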
Consider a general algorithm α, applied to cells in a regular CW complex K, which takes data from k-neighborhoods of cells in K and uses that data either to pair cells together in a list which is to become the discrete vector field V or to place the cells singly in the list of critical cells C. The basic question we address is how to merge the vector fields in the elementary situation where K = U ∪ W, and U, W are two subcomplexes to which α is applied individually to get the discrete vector fields V(U) and V(W). The goal is to reconstruct the correct vector field V(K) that we would get on all of K. It turns out this is not straightforward, in that $V(K) \neq V(U) \cup V(W)$ in general, and so α is not embarrassingly parallelizable. Being able to do such a computation will involve looking at where mistakes occur in naively merging the vector fields together as in the right-hand side of the formula (or, in general, related vector fields) and "crossing out" those mistakes.
What do we mean by an algorithm being embarrassingly parallelizable? One can take an example with operators in calculus to illustrate the difference between embarrassingly and stubbornly parallelizable algorithms. Differentiation is embarrassingly parallelizable with respect to addition and subtraction, i.e. $(f(x) \pm g(x))' = f'(x) \pm g'(x)$. But it is stubbornly parallelizable with respect to multiplication and division, i.e. $(f(x)g(x))' = f'(x)g(x) + f(x)g'(x)$, which is a non-obvious replacement of the straightforward but incorrect formula $(f(x)g(x))' = f'(x)g'(x)$.
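The analogy can be checked mechanically. The following sketch (helper names are ours) represents polynomials as coefficient lists and verifies that the product rule holds while the "embarrassingly parallel" guess fails:

```python
# polynomials as coefficient lists: [a0, a1, a2, ...] means a0 + a1*x + a2*x^2 + ...
def deriv(p):
    return [i * c for i, c in enumerate(p)][1:] or [0]

def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def pmul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def trim(p):  # drop trailing zero coefficients for comparison
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

f = [1, 2]      # f(x) = 1 + 2x
g = [0, 0, 3]   # g(x) = 3x^2
# (fg)' = f'g + fg' holds ...
assert trim(deriv(pmul(f, g))) == trim(padd(pmul(deriv(f), g), pmul(f, deriv(g))))
# ... while the naive (fg)' = f'g' fails
assert trim(deriv(pmul(f, g))) != trim(pmul(deriv(f), deriv(g)))
```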
We will present the main parallelization theorem, Theorem 3.1, that will allow us to do this merging for a general uniformly k-local algorithm α. Here k-local refers to the nature of the algorithm: it makes the classification decision about a cell or a pair by processing only the information from k-neighborhoods of cells in a regular CW complex. This turns out to be the common type of algorithm in practice; the authors know of no algorithm α that builds discrete Morse fields or functions and does not have this property. We then make this merging process more efficient in Corollary 3.3 by noting some geometric properties of the vector fields we get, which leads to a merging that involves applying α on fewer subcomplexes, i.e. a more efficient distributed computation. We then look at the situation where the subcomplexes U and W are in antithetic position, a condition on the relative position of U and W stated in terms of $U[k]$, the $k$-neighborhood of the subcomplex $U$. This special relative positioning of the patches allows us to further improve the efficiency by using smaller auxiliary patches. This is spelled out in Corollary 3.5.
In the context of α being ProcessLowerStars, it turns out that the rigid structure of the cubical complexes formed from the images gives a very efficient formula, related to but not exactly matching the formula for the more general case. We will present a formula, in the 2D image case, that allows us to take lists of discrete vector fields directly from two patches and merge them together to get the discrete vector field of the union of the two patches, provided the two patches have an overlap that contains at least one 2-cell. This is done in section 4 and is an illustration of how the general formulas can be further tweaked to improve the efficiency of the procedure.
Why do we want to parallelize the ProcessLowerStars algorithm? For one, if one has a particularly large image with a lot of pixels or voxels, being able to break the image up and apply the algorithm on each piece will reduce the time cost compared to applying the algorithm on the whole image. It may also be that an image arrives in pieces, à la online machine learning. The methods of this paper allow each piece to be processed as it arrives, either gradually building the discrete field or putting the whole field together as a final quick step.
In an application to the explicit assembly strategy for the ProcessLowerStars algorithm on a 2D image, we will assume that our computer has a hypothetical memory constraint, i.e. it is restricted to processing at most a certain number of pixels at a time. In this situation we will have to break up the image so that the number of pixels in each patch is less than the constraint of our computer. These patches will be rectangular, which allows us to index our patches in both the horizontal and vertical directions. We will apply the ProcessLowerStars algorithm to each patch (and all intersections, in both the horizontal and vertical directions, of adjacent patches). We create an algorithm that merges the discrete vector fields of the patches to obtain the discrete vector field of the whole image, using the formula on two patches. In order to do this, our algorithm will merge starting in the top left corner of the image and move along each horizontal strip and their intersections. Once the discrete vector field of every horizontal strip and their intersections is found, the algorithm then moves vertically from the top down to merge the discrete vector fields of the horizontal strips two at a time, again making use of the formula that will be provided.
Finally we will look at another specific special geometric situation that involves simplicial trees in section 5. This is done with two purposes. It is known that finite products of trees contain coarse images of the most common structures in geometry of finite asymptotic dimension. A sufficient partition scheme in a product of trees will therefore provide a generic template to approach most common regular cellular complexes. On the other hand, we will find that a tree is an example that distinguishes the advantages of each of the two main parallelization results of the paper.
We finish with a collection of remarks on applications of the results, the literature, and directions for future work in sections 6 and 7.

Discrete gradient vector fields
All complexes in this paper are regular CW complexes, in the sense that characteristic maps of all cells are homeomorphisms onto the closures of cells in the complex. We refer for the general background in discrete Morse theory on regular CW complexes to [10].
Notation 2.1. Suppose K is a regular complex, and suppose f is a discrete Morse function on K. Then there are well-defined sets of critical cells C(K, f ) and the discrete vector field that can be thought of as a set of pairs of regular cells V (K, f ) or V (K). When we use the notation (σ ≤ τ ) for a pair in V (K), we mean σ is a codimension 1 face of τ .

Definition 2.2.
We introduce the following relation among cells in K. Two cells $\sigma_1$ and $\sigma_2$ are adjacent if the closures of the two cells in K have a point in common. The star of a cell σ is the smallest cellular subcomplex of K that contains all cells adjacent to σ. Given a collection of cells C of K, let the star of C be the smallest subcomplex that contains the stars of all of the cells in C. If L is a subcomplex, we also call its star the 1-neighborhood of L, denoted L[1]. The star of a star is the 2-neighborhood, etc. We will use the notation L[n] for the n-neighborhood of L. The k-border of a subcomplex L of K consists of those cells in L whose k-neighborhoods are not contained entirely in L.
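These notions can be sketched on the simplest regular CW complex, a path (a 1D cubical complex) with vertices `('v', i)` and edges `('e', i)` joining vertices i and i+1. All names here are ours, for illustration only:

```python
def path_complex(n):
    # vertices 0..n-1 and edges 0..n-2
    return {('v', i) for i in range(n)} | {('e', i) for i in range(n - 1)}

def closure(cell):
    kind, i = cell
    return {cell} if kind == 'v' else {cell, ('v', i), ('v', i + 1)}

def star(cells, K):
    # all cells of K adjacent to some cell in `cells` (closures intersect),
    # closed under taking faces, i.e. the smallest subcomplex containing them
    touched = {c for c in K if any(closure(c) & closure(s) for s in cells)}
    return set().union(*(closure(c) for c in touched))

def neighborhood(cells, K, k):
    # L[k]: the star applied k times
    for _ in range(k):
        cells = star(cells, K)
    return cells

def k_border(L, K, k):
    # cells of L whose k-neighborhood in K is not contained in L
    return {c for c in L if not neighborhood({c}, K, k) <= L}

K = path_complex(5)
assert star({('v', 2)}, K) == {('v', 1), ('v', 2), ('v', 3), ('e', 1), ('e', 2)}
L = {('v', 0), ('v', 1), ('e', 0)}
assert k_border(L, K, 1) == {('v', 1), ('e', 0)}
```

Note that the interior vertex `('v', 0)` of L is not in the 1-border: its whole star already lies in L.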
We will assume that the lists C(K, f) and V(K, f) are the results of applying an algorithm α of the following nature.

Definition 2.3. Given a face-coface pair of cells (σ ≤ τ), the algorithm decides whether the pair (σ ≤ τ) is placed in the list V(K) based on data specific to α coming from the k-neighborhood σ[k], for some uniformly fixed number k > 0. The result of the algorithm is a discrete vector field $V_\alpha(K)$, which can therefore be realized as V(K, f) for some discrete Morse function f. We will say α with this property is uniformly local or, more precisely, uniformly k-local.
In fact, in all applications generating V(K) that the authors have seen in the literature, this is always done using a uniformly 1-local algorithm. This is not surprising, of course, as traditional smooth vector fields are generated through differentiation procedures which are inherently local.

Example 2.4 (ProcessLowerStars is uniformly local). An example of such an algorithm is ProcessLowerStars, which produces V(K) starting with an arbitrary positive bounded function (a.k.a. grayscale function) on cells (pixels or voxels) of a 2D or 3D pixel grid K. Since the algorithm qualifies (σ ≤ τ) based on the values of a function on vertices in σ[1], this algorithm is uniformly k-local for k = 1.
To be more specific, to make these images into a regular CW complex, we translate them to cubical complexes by making the pixels or voxels correspond to 0-cells of the cubical complex K. This gives us a positive bounded function, call it g, on a cubical complex, more specifically, a cubulated plane or space. To apply the algorithm, it is ideal to have the values of g on all of the 0-cells be unique. If they are not, one can make small changes (e.g., adding a linear ramp) to g to ensure they are unique. Once the values of g are unique, the algorithm inspects all of the cells in the lower star of a 0-cell x. The lower star of x, L(x), contains all cells σ ∈ K in the cubical complex such that $g(x) = \max_{y \in \sigma} g(y)$. To give an ordering to higher-dimensional cells, a new function G is introduced, defined as follows: if σ contains the vertices $\{x, y_{i_1}, \dots, y_{i_n}\}$, then $G(\sigma) = (g(x), g(y_{i_1}), \dots, g(y_{i_n}))$, where $g(x) > g(y_{i_1}) > \dots > g(y_{i_n})$. This allows us to impose the lexicographic ordering on the values of G when performing the algorithm.
The algorithm itself then takes all of the cells in L(x) and either pairs a cell τ with a codimension-1 face σ, placing (σ ≤ τ) in V, or takes a cell σ and places it singly in C, in the following way:
• If L(x) = {x}, then x ∈ C. Otherwise, take the minimal (with respect to G) 1-cell τ, and (x ≤ τ) ∈ V. All other 1-cells are added to a queue called PQzero, since they have no remaining unpaired faces in L(x). All cofaces of codimension 1 of τ in L(x) that have exactly one unpaired face in L(x) are added to a different queue called PQone.
• The algorithm then takes the minimal cell (with respect to G) that is in PQone and either moves it to PQzero, if it has no unpaired faces remaining in L(x), or, if it still has an unpaired face, pairs it with that face and puts the pair into V. Then all cofaces of codimension 1 of both cells that were just paired are inspected; if any of these cofaces have exactly one unpaired face, they are added to the queue PQone.
• Then, if PQone is empty and PQzero is not empty, the algorithm takes the minimal cell (with respect to G) in PQzero, call it δ, and places it singly in C. Then all cofaces of codimension 1 of δ are inspected; if any of these cofaces have exactly one unpaired face, they are placed in PQone.
This will keep going until both PQzero and PQone are empty. Note that, in particular, $L(x) \subseteq x[1]$, using the notation defined above. ProcessLowerStars, in fact, only looks at σ[1], where σ is a 0-cell.

Example 2.5 (failure of naive merge). We present a simple example with $V_\alpha(K) \neq V_\alpha(U) \cup V_\alpha(W)$,
where α is the ProcessLowerStars algorithm from [12] and U and W are subcomplexes of K such that K = U ∪ W. Let K be the following 1-dimensional cubical complex with the function values, f, on the 0-cells.
Computing the lists shows that $V_\alpha(U) \cup V_\alpha(W) \neq V_\alpha(K)$, so simply merging the lists from the subcomplexes does not give us the correct discrete vector field for all of K. In this particular instance, we get an extra pairing, the pair {4, 43} from U.
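The failure can be reproduced with a short sketch on a path (1D cubical complex). In dimension 1 the lower-star rule reduces to: a vertex with no lower neighbor is critical; otherwise it is paired with its G-minimal lower edge, and any remaining lower edges become critical. The function and cell names below are ours, not from [12]:

```python
def lower_star_field(verts, g):
    """verts: consecutive vertex indices of a subpath; g: vertex index -> value."""
    vs, V, C = set(verts), set(), set()
    for x in verts:
        # lower edges: edges to neighbors with smaller value
        lower = [(min(x, y), max(x, y)) for y in (x - 1, x + 1)
                 if y in vs and g[y] < g[x]]
        if not lower:
            C.add(('v', x))
        else:
            # G(edge) = (g(x), g(other endpoint)); lexicographic minimum
            lower.sort(key=lambda e: g[e[0] + e[1] - x])
            V.add((('v', x), ('e',) + lower[0]))
            C.update(('e',) + e for e in lower[1:])
    return V, C

g = {0: 1, 1: 4, 2: 3, 3: 2}            # values on the 0-cells of K
V_K, _ = lower_star_field([0, 1, 2, 3], g)
V_U, _ = lower_star_field([1, 2, 3], g)  # U spans the vertices valued 4, 3, 2
V_W, _ = lower_star_field([0, 1], g)     # W spans the vertices valued 1, 4
assert V_U | V_W != V_K
# the extra "mistake": in U the vertex valued 4 pairs with the edge {4,3},
# while in K it pairs with the smaller edge {4,1}
assert (V_U | V_W) - V_K == {(('v', 1), ('e', 1, 2))}
```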
The failure of naive merging is made clearer using the following definition and explanation. Let $\sigma_C[k]$ be the k-star of σ viewed as a cell within the subcomplex C of K. This construction is intrinsic to C and so may differ from σ[k], which is the same as $\sigma_K[k]$, for some combinations of σ and k. Since the algorithm α takes as input the data from $\sigma_C[k]$, there are cases when the discrepancy leads to different constructions of the vector fields, as Example 2.5 shows. This also points to some cases when this discrepancy does not happen.

Proposition 2.6. Suppose K is a disjoint union of subcomplexes $C_i$, for i from some index set I; in other words, a connected subset of K is contained entirely within one and only one $C_i$. Then $V_\alpha(K) = \bigcup_{i \in I} V_\alpha(C_i)$.

Proof. For every pair (σ ≤ τ), its inclusion in $V_\alpha(C_i)$ is decided by α from the data within $\sigma_{C_i}[k]$. Since in our case $\sigma_{C_i}[k] = \sigma_K[k]$, that decision is precisely the same within $C_i$ as within K.
This argument makes it clear that failure of the simple merge is caused by possible discrepancies between $\sigma_C[k]$ and $\sigma_K[k]$ for cells σ in the k-border of C (see Definition 2.2).

The parallelization theorems
Suppose a complex K is assigned a gradient vector field V using an algorithm α. Suppose further that U and W are two subcomplexes of K that form a covering of K. In particular, there are vector fields $V_\alpha(U)$ and $V_\alpha(W)$, as well as $V_\alpha(U[k])$ and $V_\alpha(W[k])$ on the enlargements. The informal idea is this. It is elementary to check that $V_\alpha(K) \subseteq V_\alpha(U[k]) \cup V_\alpha(W[k])$, using that the algorithm α is uniformly k-local. Let us refer to the difference as the "mistakes" in building $V_\alpha(K)$ from the union. We will see that, for the same reason, the mistakes can be identified by a computation within an enlargement of the intersection U ∩ W. We now give a formal proof.

Theorem 3.1. Suppose α is a uniformly k-local algorithm and K = U ∪ W for subcomplexes U and W. Then
$$V_\alpha(K) = \bigl(V_\alpha(U[k]) \cup V_\alpha(W[k])\bigr) \setminus M,$$
where
$$M = \{(\sigma \le \tau) \in V_\alpha(U[k]) \cup V_\alpha(W[k]) \mid \sigma \in U[k] \cap W[k] \text{ and } (\sigma \le \tau) \notin V_\alpha((U[k+1] \cap W[k+1])[k])\},$$
with the analogous formula for the lists of critical cells. The set M can be viewed as all the mistakes made near the borders of U[k] and W[k]. This set is interpreted as pairs of cells in $U[k+1] \cap W[k+1]$, and so it can be computed entirely within a bounded enlargement of the intersection U ∩ W.
Proof. If the pair (σ ≤ τ) is contained in $V_\alpha(K)$, it was placed in this list according to the algorithm α. According to the assumption, this placement is the outcome of inspection of data relevant to α in σ[k]. Observe that whenever the cell σ is outside of W[k], the neighborhood σ[k] is contained in U, so the outcome of the decision α makes on inclusion of (σ ≤ τ) in, respectively, $V_\alpha(K)$ and $V_\alpha(U[k])$ is the same. This shows that such (σ ≤ τ) is disqualified from M, as it is excluded exactly when it is excluded as part of $V_\alpha(K)$. Our conclusion is that whenever (σ ≤ τ) is in M, σ must be in W[k]. A symmetric argument proves that it must also be in U[k].
The last paragraph proved that our pair of cells σ, τ is contained in $U[k+1] \cap W[k+1]$. Therefore α uses the same data for placing (σ ≤ τ) in $V_\alpha(K)$ as in $V_\alpha((U[k+1] \cap W[k+1])[k])$, since the neighborhood σ[k] is contained in that subcomplex. This allows us to rewrite the set of mistakes M in terms of the lists computed on $U[k]$, $W[k]$, and $(U[k+1] \cap W[k+1])[k]$, which gives the first formula. Letting C denote the lists of critical cells, the same argument as above applies to cells placed singly in the critical lists and gives the second formula.

Remark 3.2. Our interest in this theorem is that it allows us to express the global list $V_\alpha(K)$ in terms of lists computed on smaller subcomplexes.
The computation of these smaller lists can be further simplified by the observation that the difference between the naive merge and the correct list can be identified within an enlargement of the intersection U ∩ W; this is the content of Corollary 3.3.

Proof. The formula from Theorem 3.1 is rewritten in terms of the intersection of the enlargements of the two covering sets.

We next implement the formula in this simple case of a two-set covering. There is a special geometric situation, that of an antithetic pair of subcomplexes, with a more direct identification of intersections of enlargements. We can now state the following consequence of Corollary 3.3.

Corollary 3.5.
Suppose α is an algorithm that is uniformly k-local for some k ≥ 1, and suppose U, W is an antithetic pair of subcomplexes covering K. Then the formula of Corollary 3.3 can be computed using only the enlargements of U, W, and U ∩ W.

Proof. The subcomplexes in Corollary 3.3 now have a description based entirely on the enlargements of the covering complexes and their intersections.

Inputs:
• k ≥ 1
• α, a uniformly k-local algorithm
• K, a regular CW complex
• D, data attached to cells in K, specific for use in α
• U(1), U(2), a covering of K by subcomplexes given as subsets of K
Outputs:
• the merged discrete vector field on K

# preprocessing step to obtain discrete vector fields in enlargements of patches
for i = 1, 2 do
  apply α to the enlargement U(i)[k] to obtain $V_\alpha(U(i)[k])$
end for
# build the union and remove the mistakes as in Theorem 3.1

Remark 3.6. The advantage of the antithetic situation is that one can build the union working with the intersection U ∩ W and one single enlarging procedure. This is particularly true when the size of the intersection U ∩ W is significantly smaller than the sizes of U and W. In the case of simplicial trees, which we address later in section 5, the size of U ∩ W can be arranged to be an order of magnitude smaller than either U or W in the Euclidean case, and that order can be made arbitrarily smaller in the case of a tree, depending on the valence of the tree.
In the rest of the section we illustrate in the specific context of the algorithm ProcessLowerStars [12] that the general theorems can be improved on in situations with specific α and specific geometry.
Suppose U and W are rectangular patches in a 2D cubical grid with K = U ∪ W, whose intersection U ∩ W = [b, c] × J contains at least one 2-cell, for an interval J.

Theorem 3.7. Applying ProcessLowerStars as α to the two sets U and W as above allows some savings in the size of the processed lists by using a more efficient formula: the mistakes in $V_\alpha(U) \cup V_\alpha(W)$ are exactly the pairs supported in U ∩ W that do not appear in $V_\alpha((U \cap W)[1])$, and removing them gives $V_\alpha(K)$.

Notice that this theorem does not follow from any of the general theorems before, because no enlargement of the patches U and W is required for the containment.

Proof. The proof is based on a case-by-case analysis of ProcessLowerStars applied to stars of vertices near [b, c] × J. As before, we are merging two lists $V_\alpha(U)$ and $V_\alpha(W)$. We know that in each complex there are pairs that involve the vertices in b × J that may be included in non-critical pairs in error. In a complementary fashion, the same can be said about the critical cells in $C_\alpha(U)$ and $C_\alpha(W)$. We know the errors can happen in this specific case of α because of a 2D counterexample similar to Example 2.5. In much the same way, the errors happen at this boundary because that is where incomplete information about stars in K is used within U and W. Let us call the pairs included in error "mistakes" and try to identify the mistakes in the merged list $V_\alpha(U) \cup V_\alpha(W)$. Are they guaranteed to be inside $V_\alpha(U \cap W)$? This turns out to be true in this case, but is not expected for a general uniformly local α or for more general geometric decompositions. Next, suppose a pair in $V_\alpha(U) \cup V_\alpha(W)$ is a mistake; then we know that it is not in $V_\alpha((U \cap W)[1])$, because now both cells are in a star entirely contained in (U ∩ W)[1], where the algorithm has complete information. This identifies the difference to be removed and completes the proof.

Distributed ProcessLowerStars algorithm on a 2D digital image
We apply Theorem 3.7 to parallelize an algorithm of type α on grayscale 2D images, in particular the algorithm ProcessLowerStars from [12].
These digital images are modeled by cubical complexes with a grid-like structure, which makes them a little easier to work with than other cubical complexes: we can index our patches with horizontal and vertical components, as will be outlined shortly. Our formula for merging the vector fields together only makes use of two patches. In some cases, in particular with an image with many pixels and a computer program having a constraint on the number of pixels it can process with the algorithm, it is very likely that one would end up with more than two patches. The grid-like structure that these cubical complexes possess will allow us to apply our formula to two pieces at a time. First, the image will be split up into patches, which will partition the cubical complex. How many patches we end up with will be determined by the number of pixels in the image and the constraint on the number of pixels our computer program can process. After taking the star of each of the patches in both the vertical and horizontal directions separately, we will be left with new patches with overlap, both vertically and horizontally, between adjacent patches. This will allow us to apply our formula.
After the partitioning into patches and taking the stars in both directions of all the patches, we apply the ProcessLowerStars algorithm to each patch and all of the overlaps of the adjacent patches, including the overlaps of the overlaps. The algorithm gives the discrete vector field in each of the patches and all of the overlaps of adjacent patches. We then start applying our formula, beginning at the upper left-hand corner of our complex with the first two starred patches. This gives the discrete vector field of the union of these two starred patches. We then proceed by applying our formula to this new bigger patch and the adjacent starred patch to its right. We continue until we reach the end of this horizontal strip. We then move down vertically to the next row of starred patches and go through the same procedure. We do this for every horizontal strip of starred patches and obtain the discrete vector field for each horizontal strip.
We then move to the overlaps of the horizontal strips and use the same procedure as for the strips themselves, starting from the left-hand side and applying our formula as we move to the right end of the strip. We eventually end up with the discrete vector fields of the overlaps of the horizontal strips, which we will need in the next part of our algorithm when we start applying our formula vertically. We now have the discrete vector fields of all of the horizontal strips and their overlaps, so we can proceed to apply our formula in the vertical direction, starting with the topmost horizontal strip and the strip directly below it, making use of the discrete vector fields of the overlaps that we computed in the previous steps. We continue downward until we have the discrete vector field of one big patch and the horizontal strip below it, along with the discrete vector field of their overlap. We apply our formula to these last two patches and end up with the discrete vector field of the whole image.
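The sweep order described above can be sketched as a schedule generator, with no discrete Morse computation involved; the operation labels and bookkeeping are ours, for illustration only:

```python
def merge_schedule(m, rows):
    """Order of two-patch merges for an m-column, `rows`-row patch grid."""
    ops = []
    # 1) left-to-right along each horizontal strip
    for j in range(rows):
        for i in range(m - 1):
            ops.append(('strip', j, 'merge cols', i, i + 1))
    # 2) left-to-right along each overlap of vertically adjacent strips
    for j in range(rows - 1):
        for i in range(m - 1):
            ops.append(('overlap', (j, j + 1), 'merge cols', i, i + 1))
    # 3) top-down merge of the strips, two at a time
    for j in range(rows - 1):
        ops.append(('vertical', 'merge strips', j, j + 1))
    return ops

sched = merge_schedule(3, 2)
# 2 strips x 2 merges + 1 overlap row x 2 merges + 1 vertical merge
assert len(sched) == 7
assert sched[-1] == ('vertical', 'merge strips', 0, 1)
```

Every step merges exactly two patches, so each step stays within the two-patch formula of Theorem 3.7.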
We make this precise with pseudo-code for the distributed ProcessLowerStars algorithm. We assume that the constraint of our computer program in applying the ProcessLowerStars algorithm on a digital image is N pixels. Assume that a digital image has D pixels, where D > N. We break the image up into n disjoint patches such that n > D/N, where patch $P^{\#}_{i,j}$ is a subcomplex and corresponds to the patch in the ith column and jth row, for all i = 1, · · · , m and j = 1, · · · , ℓ, such that m · ℓ = n.
Let $P^{\#}_{i,j}[a, b]$ be the subcomplex that is the a-neighborhood of $P^{\#}_{i,j}$ in the horizontal direction together with the b-neighborhood of $P^{\#}_{i,j}$ in the vertical direction. We take our disjoint decomposition into the $P^{\#}_{i,j}$ and enlarge each patch by a 1-neighborhood in each direction, calling the result $P_{i,j}$. This gives us $P_{i,j} = P^{\#}_{i,j}[1,1]$, where each patch now overlaps its neighbors and the overlaps contain at least one 2-cell. We use the following notation:
• $P^{*}_{i,j}$, the union of the patches $P_{k,j}$ for k ≤ i (the union of patches moving across a horizontal strip),
• $(P_{i,j} \cap P_{i,j+1})^{*}$, the union of the intersections $P_{k,j} \cap P_{k,j+1}$ for k ≤ i (the union of patches moving horizontally across the intersections of patches that are vertically adjacent to each other, i.e. moving horizontally along the intersection of strips),
• $(P_{i,j}[0,1] \cap P_{i,j+1}[0,1])^{*}$, defined analogously (the union of patches moving horizontally across the intersections of 1-neighborhoods in the vertical direction of the patches that are vertically adjacent to each other).
Figure 2 is used to illustrate this notation. The figure is intended to show a part of the decomposition of the 2D plane centered on the middle patch denoted $P_{i,j}$. The orange, green, and blue overlapping rows are the unions of such patches across the values of the horizontal parameter i. One useful observation is that there are only double overlaps of the rows, so that there is always at least 2δ clearance between non-adjacent rows such as the orange and the blue, for a fixed value of δ. This guarantees that δ-enlargements of this covering of the plane by rows have the same nerve as the rows themselves. The nerve is isomorphic to a triangulation of the real line. The figure also shows the set $P^{*}_{i,j}$, which is shaded red, and the union of intersections of patches $(P_{i,j} \cap P_{i,j+1})^{*}$. It is clear that the finer covering of the plane by individual patches has the same property as above: its δ-enlargement has the same nerve as the covering itself.
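The bookkeeping of enlargements such as $P^{\#}_{i,j}[a,b]$ reduces to interval arithmetic in pixel coordinates. A minimal sketch (names are ours), with clipping at the image boundary:

```python
def enlarge(patch, a, b, width, height):
    """patch = (i0, i1, j0, j1), inclusive pixel ranges; grow by a pixels
    horizontally and b pixels vertically, clipped to the image."""
    i0, i1, j0, j1 = patch
    return (max(i0 - a, 0), min(i1 + a, width - 1),
            max(j0 - b, 0), min(j1 + b, height - 1))

def overlap(p, q):
    i0, i1 = max(p[0], q[0]), min(p[1], q[1])
    j0, j1 = max(p[2], q[2]), min(p[3], q[3])
    return (i0, i1, j0, j1) if i0 <= i1 and j0 <= j1 else None

# two horizontally adjacent disjoint patches in an 8 x 4 image
P1, P2 = (0, 3, 0, 3), (4, 7, 0, 3)
Q1, Q2 = enlarge(P1, 1, 1, 8, 4), enlarge(P2, 1, 1, 8, 4)
# after the [1,1]-enlargement the overlap is 2 pixels wide,
# so it contains a 2-cell of the cubical complex
assert overlap(Q1, Q2) == (3, 4, 0, 3)
```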

Inputs:
• D, digital image pixels
• g, grayscale values on pixels
Outputs:
• C, critical cells
• V, discrete vector field
Constraint:
• Memory = N pixels

for all i and j do
  # preprocessing step to obtain discrete vector fields in all patches and their overlaps
  apply the ProcessLowerStars algorithm to the patch $P_{i,j}$ and to each overlap of adjacent patches
end for
for j = 1, · · · , ℓ do
  for i = 1, · · · , m − 1 do
    Update: merge with the next patch, using overlaps enlarged by [1,0]  # moving horizontally along strips
  end for
end for
for j = 1, · · · , ℓ do
  for i = 1, · · · , m − 1 do
    Update: merge with the next intersection, using overlaps enlarged by [1,0]  # moving horizontally along intersections of strips
  end for
end for
for j = 1, · · · , ℓ do
  for i = 1, · · · , m − 1 do
    Update: merge with the next intersection, using overlaps enlarged by [1,1]  # moving horizontally along intersections of strips enlarged by 1 vertically
  end for
end for
for j = 1, · · · , ℓ − 1 do
  Update: merge the strip $P^{*}_{m,j}[0,1]$ with the strip below it  # moving vertically down strips
end for

Generalization to a hierarchical tree-like decomposition
Our main motivation for this work has been the parallelization of the specific algorithm ProcessLowerStars on 2D images carried out in sections 3 and 4. This algorithm has been proven to work on cubical cellular complexes based on standard cubical grids in 2D and 3D [12]. However, our theorems in section 3 place only general locality constraints on the type of the algorithm and on the geometry of the regular cellular complex. In this section we leverage some geometric material from [5] and [1] to construct the required antithetic coverings for grids in nD for all n and, more generally, for any subcomplex of a product of finitely many locally finite trees.
It is known from a result of Dranishnikov [3,8] that all metric spaces which satisfy a very weak and natural geometric condition called finite asymptotic dimension (FAD) can be coarsely embedded in a finite product of locally finite trees with uniform distortion. This allows us to give a useful antithetic covering of any cellular complex which can be given an FAD metric with a universal bound on the size of all cells.
As we observed before in Remark 3.6, the importance of the case of a tree or a product of trees is also as an illustration of how crucial the improvements can be in passage from using the most general Theorem 3.1 to using Corollary 3.5.
We saw in the previous section a worked-out example of use of antithetic decompositions in the case of a cubical grid in 2D. Just as in that example, it is most natural to decompose a multi-parameter geometry according to projections to subsets in one chosen parameter. There should result an inductively defined decomposition of the entire cubical complex. The general kind of parameter for our purposes is tree-based, with partial order, generalizing from the totally ordered real line.
A simplicial tree is a simplicial complex that is connected and has no cycles. Recall also that the nerve of a collection of subsets of a given set is the simplicial complex whose vertices are the subsets and whose simplices correspond to nonempty intersections of families of subsets.

Definition 5.1. Suppose U is a covering of a metric space. We will denote by U[k] the covering by k-enlargements of members of U, for a number k ≥ 0. We will say that a covering U is a tree-like decomposition with margin k if the nerve of U[k] is a simplicial tree. Suppose each of the covering sets, with the subspace metric, also has a tree-like decomposition with margin k. Then we say that the resulting covering by smaller sets is a hierarchical tree-like decomposition with margin k and depth 2. Inductively, for a natural number D, one defines a hierarchical tree-like decomposition with margin k and depth D. We will refer to the sets that appear in such a hierarchical decomposition and are not unions of other sets as primary sets.
The most useful hierarchical tree-like decomposition of depth D can be obtained for any subset of a product of D simplicial trees. The simplest case of this type of decomposition is when the trees are obtained as triangulations of the real line, which generalizes 2D decompositions, as in Algorithm 2, to higher dimensions. Clearly, intersecting any subset X of a tree T with the produced covering gives a covering of X with exactly the same three properties, except possibly with a forest of trees instead of a single tree. This gives a hierarchical tree-like decomposition of X with margin r and depth 1. Figure 3 is used to illustrate this notation. In this figure, $v_0$ is the root of the tree. Notice that the vertices labeled $a_1$, $a_2$, $a_3$ are all at distance 3 from $v_0$. The blue, the orange, and the red subsets can be described as the jet subsets $J(a_i, 1, 4)$ for 1 ≤ i ≤ 3. They overlap with the green subset, which itself can be described as the jet subset $J(b_1, 1, 4)$. Notice that the bi-colored edges represent the overlaps between the different colored subsets. This should illustrate the pattern of jet-generated covering sets and their overlaps in the whole tree. One more feature that can be seen in this figure is that, with this particular set of choices of the vertices and the bounds $l_1$, $l_2$, there are only double overlaps; there are no overlaps between jet subsets of higher multiplicities. In other words, the nerve of such a covering is 1-dimensional. Now it can be easily seen in the picture that the cellular enlargement of all subsets by 1 unit still has a 1-dimensional nerve. The corresponding enlargements are simply the jet subsets $J(a_i, 0, 5)$, for 1 ≤ i ≤ 3, and $J(b_1, 0, 5)$. This illustrates how this procedure, with the specific choices we make, produces a family of patches with margin 1.
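On the simplest tree, a triangulated line, the double-overlap pattern and its stability under enlargement can be checked directly. The interval pattern below is our own choice, for illustration: only consecutive covering sets meet, and the 1-enlarged covering still has a 1-dimensional nerve:

```python
def intervals(count, step=4, length=5):
    # covering of a segment of the line by overlapping intervals
    return [(step * i, step * i + length) for i in range(count)]

def nerve_edges(ivs):
    # pairs of covering sets with nonempty intersection (1-simplices of the nerve)
    return {(a, b) for a in range(len(ivs)) for b in range(a + 1, len(ivs))
            if ivs[a][0] <= ivs[b][1] and ivs[b][0] <= ivs[a][1]}

def enlarge(ivs, d):
    return [(lo - d, hi + d) for lo, hi in ivs]

cov = intervals(4)
# only consecutive intervals overlap, before and after 1-enlargement:
# the >= 2*delta clearance between non-adjacent members keeps the nerve a path
assert nerve_edges(cov) == {(0, 1), (1, 2), (2, 3)}
assert nerve_edges(enlarge(cov, 1)) == {(0, 1), (1, 2), (2, 3)}
```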
For a product Π of D trees, one starts by performing this construction in each of the factors, then takes the product of the covering families to create a covering of Π. This is a hierarchical tree-like decomposition with margin r and depth D. Again, intersecting any subset X of the product with the covering subsets is a hierarchical tree-like decomposition of X with margin r and depth D.

Remark 5.3.
Recall that a real tree in geometry is a geodesic metric space where every triangle is a tripod. The divergence behavior of geodesics in a real tree allows the above constructions in simplicial trees to generalize verbatim to real trees and their products.
In the rest of this section, we show how to use a hierarchical tree-like decomposition similar to Algorithm 2.
Suppose our complex $K$ is a subcomplex of $\Pi$ with the product cubical structure, where each factor carries its simplicial tree structure. Distributing the computation simplifies it: the algorithm is applied to slices that are reduced in size compared to the total complex.
In our case this process is performed inductively using the covering families $\mathcal{V}_i$ in each tree coordinate. This guarantees that all slices used in the computation form coverings with margin at least $r$. We remind the reader that, because of the assumptions, both a covering set $V$ and its enlargement $V[r]$ can themselves be viewed as vertex sets of forests of trees.
If $\pi_i$ is the projection onto the $i$-th coordinate, we have the slices $S_{i,V} = \pi_i^{-1}(V)$ for all $V$ in $\mathcal{V}_i$. Since $\pi_i$ is a 1-Lipschitz function, the slices form a covering of $K$ with margin at least $r$. We think of $\mathcal{V}_i$ as analogues of the intervals in Algorithm 2, so the following terminology is natural.
Notation 5.4. Given a vector $x$ with $i$-th coordinate a set $V_i$ from $\mathcal{V}_i$, there are "hypercubes" $C_x = \bigcap_i \pi_i^{-1}(V_i)$. These hypercubes can serve as cells in a grid-like structure in $\Pi$. Note that the empty set $\emptyset$ can be made a valid value for the coordinates $V_i$, with the convention that an empty coordinate imposes no constraint, so the intersection is taken over the non-empty coordinates. Let $x$ be a vector as above with the property that if $\emptyset$ appears as a value then all subsequent coordinates must be $\emptyset$. We will use $d(x)$ to denote the highest index for which the value is non-empty. Now, for increasing values of $s$, the sets $X(s)$ of vectors $x$ with $d(x) = s$ and the corresponding coverings $C(s)$ by the hypercubes $C_x$ for $x \in X(s)$ form the covering sets that generalize the rectangles and strips from Algorithm 2. This is precisely the inductive structure that was leveraged in the 2D plane, in the case of each tree $T_i$ being a simplicial line covered by intervals.
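The slices and hypercubes can be sketched on a finite set of points of the product, with the convention that an empty coordinate imposes no constraint (this is what makes the "strips" of Algorithm 2 a special case). The names `slice_` and `hypercube` are illustrative, not notation from the text.

```python
def slice_(points, i, V):
    """Slice S_{i,V}: preimage of V under the i-th coordinate projection."""
    return {p for p in points if p[i] in V}

def hypercube(points, x):
    """Hypercube C_x: intersection of the slices over the non-empty
    coordinates of x.  An empty set in a coordinate imposes no
    constraint, so that coordinate's factor is left free."""
    out = set(points)
    for i, V in enumerate(x):
        if V:
            out &= slice_(points, i, V)
    return out

# A 4 x 4 grid of points in the product of two simplicial lines.
points = {(a, b) for a in range(4) for b in range(4)}

rect = hypercube(points, ({0, 1}, {2, 3}))   # both coordinates constrained
strip = hypercube(points, ({0, 1}, set()))   # a "strip": second factor free
assert rect <= strip                         # the rectangle sits in the strip
```
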

Remarks on applications
There is now a large library of applications of discrete gradient fields, both through their use in simplifying computations in topological data analysis and in direct applications to concrete problems. Our treatment of their parallelization in this paper is very general. To our knowledge, all useful gradient fields that appear in the literature are uniformly local, and so the methods of this paper can be applied to them.
Just to give an example of gradient fields used for motion planning in robotics, we mention a couple of recent papers of the second author. The problem in motion planning is to create an algorithm for a path a robot would take through the so-called Free Space of all allowable configurations, the configurations of the robot that avoid all obstructions in the physical environment. The Free Space is really a parameter space of feasible robot positions. One approach to modeling it is to sample the Free Space and build the corresponding simplicial approximation to it through some Vietoris-Rips complex built on the sample as vertices [14]. There are several ways to produce a discrete Morse function on a simplicial complex which restricts to given values on the vertices, or essentially equivalently a gradient field, some very well known such as [9]. All of them are uniformly local algorithms. One way this can be used for practical motion planning is as follows. Convex polyhedral obstructions usually produce convex polyhedral exclusion zones complementary to the Free Space. One may choose a density estimator for the sample in the Free Space, with values at every sample point, which can then be extended to a Morse function. It is argued in [15] that critical marginal points can act as "lighthouses" in planning a small finite collection of paths that provably contains an optimal path of smallest length, which can then be easily extracted.
There are two kinds of applications of our formulas in this setting. The construction of the vector field can be distributed by considering an arbitrary or antithetic covering of the Free Space. This allows one to construct, patch by patch, the vector field that identifies the lighthouses. There is also the option of processing the data "as needed", for example by exploring an adjacent patch and the next lighthouse only when the robot approaches its border.

Discussion
There are plenty of comments in the literature pointing out that, by its nature, ProcessLowerStars is embarrassingly parallelizable, cf. Robins et al. [12] itself and, for another example, section 2 of Gyulassy et al. [7]. In the same way, other uniformly local algorithms $\alpha$ are embarrassingly parallelizable. This perspective on parallelization is not that useful unless $\alpha$ is very expensive to compute by itself. A more urgent need is the type of distributed computation that builds partial vector fields on patches and combines them together. Since discrete vector fields are analogues of smooth vector fields, it is not surprising that they are also very locally defined and are not expensive to compute in each locality. These observations justify the distributed computation in this paper as the useful perspective on parallelization in this context.
Related to the point above, we wonder whether there are useful discrete vector fields that are not uniformly local, or even ones that are uniformly local for $k \ge 2$ but not for $k = 1$. The authors are not aware of any in the literature. To mention another example of a broadly used algorithm, it is easy to see that ExtractRaw of King et al. [9], which is used to generate a discrete vector field from values of a Morse function on 0-cells, is uniformly local for $k = 1$. If we are correct, then the method of this paper applies to all known discrete vector fields.
Our future plans include extending Theorem 3.7 and Algorithm 2 to the 3D case, where ProcessLowerStars has been proven to work [12]. We will also address similar custom efficiency improvements of the general formulas for another algorithm, MorseReduce, due to Mischaikow and Nanda [11]. This is a versatile algorithm that uses discrete Morse theory as a preprocessing tool for persistent homology computations in topological data analysis. The striking feature of MorseReduce is that it is dimension-independent and can be applied to very general regular cellular complexes.
Our results apply to infinite complexes and infinite coverings of complexes. This comment might seem to have no practical implications; however, it shows that our parallelization theorems allow a dynamic approach to processing the data. As in the robotics application described in the preceding section, one may not want to process all available patches at the same time but rather proceed one patch at a time, depending on the needs of a dynamic process such as planning a path. In this case there is a need to incorporate the new patch into the pre-existing framework. There might be infinitely many possible patches in the agnostic planning process. The theorems can be used to extend the definition of the discrete vector field to each successive patch as many times as needed.
Finally, we would like to contrast our theorems with the comparable parallelization algorithms of [6] and [13]. We are grateful to the referee who pointed out these algorithms to us. Both papers deal with discrete models approximated by smooth gradient vector fields and geometrically parallelize the computation of the associated Morse-Smale complexes. The goals of these authors are very much akin to ours; in fact, we refer the reader to the additional motivation in the Related Work sections of both papers. The main feature distinguishing our theorems from section 3 is their generality: they apply to any uniformly local algorithm on any regular cellular complex of any dimension, while the results of the referenced papers are more specific, geared toward the computation of Morse-Smale complexes associated to gradient flows on lower-dimensional manifolds. We do not say that these are special cases of the general theorems; they are certainly leaner and more efficient algorithms for that specific task, much like our Theorem 3.7 is not a special case of Theorem 3.1 and its corollaries.