A Review of and Some Results for Ollivier–Ricci Network Curvature

Characterizing topological properties and anomalous behaviors of higher-dimensional topological spaces via notions of curvatures is by now quite common in mainstream physics and mathematics, and it is therefore natural to try to extend these notions from the non-network domains in a suitable way to the network science domain. In this article we discuss one such extension, namely Ollivier’s discretization of Ricci curvature. We first motivate, define and illustrate the Ollivier–Ricci Curvature. In the next section we provide some “not-previously-published” bounds on the exact and approximate computation of the curvature measure. In the penultimate section we review a method based on the linear sketching technique for efficient approximate computation of the Ollivier–Ricci network curvature. Finally in the last section we provide concluding remarks with pointers for further reading.


Introduction
It is by now quite common in mainstream physics and mathematics [1,2] to characterize topological properties and anomalous behaviors of higher-dimensional topological spaces via notions of (local and global) curvatures of these spaces, e.g., in general relativity, extreme variations of four-dimensional space-time curvatures via geodesic incompleteness lead to characterizations of black holes [3]. It is therefore natural to try to extend these notions from the non-network domains (e.g., from continuous metric spaces or from higher-dimensional geometric objects) in a suitable way to the network science domain so that non-trivial new topological characteristics of networks can be captured. There are several ways this can be achieved; we briefly mention two other approaches before proceeding with the approach that is the main topic of this paper. Note that such extensions need to overcome at least two key challenges, namely that (i) networks are discrete (non-continuous) objects, and that (ii) networks may not necessarily have an associated natural geometric embedding.
One notion of network curvature that has been well-studied in the network theory literature, first suggested by Gromov in a non-network group theoretic context [4], is the Gromov-hyperbolic curvature. First defined for infinite continuous metric spaces [2], the measure was later adopted for finite graphs. Usually the measure is defined via properties of geodesic triangles or via equivalent (in a sense that can be made precise) 4-node conditions, though Gromov originally defined the measure using Gromov-product nodes in [4]. Informally, any infinite metric space has a finite Gromov-hyperbolicity measure if it behaves metrically in the large scale as a negatively curved Riemannian manifold, and thus the value of this measure can be correlated to the standard scalar curvature of a hyperbolic

Basic Notations and Terminologies
To simplify exposition, we assume in this paper that the given network G = (V, E) is an undirected unweighted connected graph (the terms "graph" and "network" are used interchangeably in this paper); generalization of the corresponding definitions and concepts to the case of non-negative edge weights is mostly straightforward. The following notations will be used in the rest of this paper.
For a node v ∈ V, Nbr(v) = { u | {v, u} ∈ E} denotes the set of neighbors of v, and deg(v) = | Nbr(v) | denotes the degree of v.
dist_G(u, v) (or simply dist(u, v)) denotes the distance (i.e., the number of edges in a shortest path) between the nodes u and v in G.
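As a small illustration of this notation, the following sketch stores a graph as a dictionary of neighbor sets and computes distances by breadth-first search. The helper name `bfs_distances` and the 4-cycle example are assumptions made for illustration only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return dist(source, x) for every node x reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


# Hypothetical example: the 4-cycle a-b-c-d-a, with adj[v] playing the role of Nbr(v).
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
deg = {v: len(adj[v]) for v in adj}   # deg(v) = |Nbr(v)|
dist_from_a = bfs_distances(adj, "a")
```

Here `deg["a"]` is 2 and `dist_from_a["c"]` is 2, since c is the node opposite to a on the cycle.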

Ollivier-Ricci Curvature: Motivation, Definition and Illustration
In this section, we provide the formal definition of the Ollivier-Ricci curvature. First, we need to define the so-called Earth Mover's Distance (EMD) (also known as the L1 transportation distance, the L1 Wasserstein distance and the Monge-Kantorovich-Rubinstein distance) [24][25][26][27]. For the purpose of this paper, it suffices to define the distance in the discrete setting of a network as follows. Suppose that we have two probability distributions P_1 and P_2 on a subset ∅ ⊂ V′ ⊆ V of nodes, i.e., two real-valued functions P_1, P_2 : V′ → [0, 1] with Σ_{v ∈ V′} P_1(v) = Σ_{v ∈ V′} P_2(v) = 1. We can think of every number P_1(v) as the maximum total amount of "earth" (dirt) at node v that can be moved to other nodes, and every number P_2(v) as the maximum total amount of earth node v can store in its storage. The cost of transporting one unit of earth from node u to node v is dist_G(u, v), and the goal is to satisfy the storage requirement of all nodes by moving earth as needed while minimizing the total transportation cost. Letting the variable z_{u,v} ∈ [0, 1] denote the amount of shipment from node u to node v in an optimal solution, EMD for the two probability distributions P_1 and P_2 on V′ can be formulated as the linear programming (LP) problem shown in Figure 1, which can be solved in polynomial time. One can also think of the EMD solution as the distance between the two probability distributions P_1 and P_2 on the set of nodes V′ based on the shortest-path metric on G. We will use the notation EMD(V′, P_1, P_2) to denote the value of the objective function in an optimal solution of the LP in Figure 1.
For an intuitive understanding of the connection of EMD to Ollivier-Ricci curvature for networks, we informally recall one way of defining the Ricci curvature measure for a smooth Riemannian manifold. The Ricci curvature at a point x in the manifold along a direction can be thought of as transporting a small ball centered at x along that direction and measuring the "distortion" of that ball.
The role of the direction is captured by the edge {u, v}, the roles of the balls at the two nodes are played by the distributions P_1 and P_2, and the role of the distortion due to transportation is captured by the EMD measure. More precisely, given our input graph G = (V, E) and an edge {u, v} ∈ E, the paper [20] uses the EMD measure to define the "coarse Ricci curvature" RIC(u, v) along the edge {u, v} in the following manner (see Figure 2 for an illustration):

[Figure 1: LP formulation for EMD on V′ with the distributions P_1 and P_2; variables: z_{u,v} for every pair of nodes u, v ∈ V′.]
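Let the probability distributions P_1 and P_2 be the uniform distributions P_u and P_v, respectively, over the nodes in {u} ∪ Nbr(u) and {v} ∪ Nbr(v), respectively (if the given graph has non-negative node weights, then another option is to normalize the restrictions of these node weights to the sub-graph H_{u,v} and use them for the distributions P_1 and P_2), i.e.,

$$P_u(x) = \begin{cases} \frac{1}{\deg(u)+1}, & \text{if } x \in \{u\} \cup \mathrm{Nbr}(u) \\ 0, & \text{otherwise} \end{cases} \qquad P_v(x) = \begin{cases} \frac{1}{\deg(v)+1}, & \text{if } x \in \{v\} \cup \mathrm{Nbr}(v) \\ 0, & \text{otherwise} \end{cases}$$

Remembering that dist_G(u, v) = 1 for an edge {u, v} ∈ E, we can then define the coarse Ricci curvature as (cf. [20] (Definition 3)):

$$\mathrm{RIC}(u,v) = 1 - \mathrm{EMD}(V_{u,v}, P_u, P_v)$$

The measure can easily be extended to graphs with non-negative edge weights; redefine dist(u, v) to be the minimum total weight over all possible paths between u and v and use the equation:

$$\mathrm{RIC}(u,v) = 1 - \frac{\mathrm{EMD}(V_{u,v}, P_u, P_v)}{\mathrm{dist}(u,v)}$$

Some authors also define the discrete Ricci curvature RIC(u) for a node u ∈ V by taking the average of the discrete Ricci curvature over all edges incident on u, e.g., by letting

$$\mathrm{RIC}(u) = \frac{1}{\deg(u)} \sum_{v \in \mathrm{Nbr}(u)} \mathrm{RIC}(u,v)$$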
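The definition above can be turned into a small exact computation. The sketch below is one possible way to do this for unweighted graphs, assuming RIC(u, v) = 1 − EMD with P_u and P_v uniform on the closed neighborhoods of u and v. Instead of a general LP solver it treats the EMD as a transportation problem and solves it with a simple successive-shortest-path minimum-cost flow; this is a choice made here for self-containment, not the method of the reviewed papers. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def min_cost_flow(n, edges, s, t):
    """Cost of a min-cost maximum flow; edges are (tail, head, capacity, cost)."""
    graph = [[] for _ in range(n)]
    for a, b, cap, cost in edges:
        graph[a].append([b, cap, cost, len(graph[b])])      # forward arc
        graph[b].append([a, 0, -cost, len(graph[a]) - 1])   # residual arc
    total = 0
    while True:
        # Bellman-Ford shortest path from s in the residual graph.
        dist, prev = [None] * n, [None] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for a in range(n):
                if dist[a] is None:
                    continue
                for i, (b, cap, cost, _rev) in enumerate(graph[a]):
                    if cap > 0 and (dist[b] is None or dist[a] + cost < dist[b]):
                        dist[b], prev[b], changed = dist[a] + cost, (a, i), True
            if not changed:
                break
        if dist[t] is None:
            return total
        # Bottleneck capacity along the path, then augment.
        push, b = None, t
        while b != s:
            a, i = prev[b]
            push = graph[a][i][1] if push is None else min(push, graph[a][i][1])
            b = a
        b = t
        while b != s:
            a, i = prev[b]
            graph[a][i][1] -= push
            graph[b][graph[a][i][3]][1] += push
            b = a
        total += push * dist[t]


def ollivier_ricci(adj, u, v):
    """RIC(u, v) = 1 - EMD for an edge {u, v} of an unweighted graph."""
    support_u, support_v = list(adj[u] | {u}), list(adj[v] | {v})
    du, dv = len(support_u), len(support_v)
    # Scale all masses by du * dv so that they become integers: each node in
    # support_u ships dv units and each node in support_v stores du units.
    s, t = 0, du + dv + 1
    edges = []
    for i, x in enumerate(support_u):
        edges.append((s, 1 + i, dv, 0))
        dist = bfs_distances(adj, x)
        for j, y in enumerate(support_v):
            edges.append((1 + i, 1 + du + j, du * dv, dist[y]))
    for j in range(dv):
        edges.append((1 + du + j, t, du, 0))
    return 1 - Fraction(min_cost_flow(t + 1, edges, s, t), du * dv)


# Hypothetical examples: the middle edge of a 4-node path and an edge of K4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
ric_path = ollivier_ricci(path, 1, 2)
ric_k4 = ollivier_ricci(k4, 0, 1)
```

For the path the two end nodes must exchange mass 1/3 over distance 3, so the EMD is 1 and `ric_path` is 0; for K4 the two distributions coincide, so `ric_k4` is 1. Exact rational arithmetic is used so that results can be compared with hand calculations.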

An Illustration of Computing the Value of RIC(u, v) For a Two-dimensional Grid
Consider an infinite two-dimensional grid on the plane and any edge {u, v} of the grid as shown in Figure 3. Note that any node of the grid has exactly 4 neighbors, thus P_u(x) = 1/5 for every x ∈ {u} ∪ Nbr(u) and P_v(x) = 1/5 for every x ∈ {v} ∪ Nbr(v). Moreover, the sets of nodes Nbr(u) \ {v} and Nbr(v) \ {u} are disjoint, thus it is easy to see that EMD(V_{u,v}, P_u, P_v) = 1 (see Figure 3), and therefore RIC(u, v) = 1 − 1 = 0.
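The value EMD = 1 for the grid can be checked by hand. The following worked calculation is a sketch under the assumption that the grid nodes are the integer points of the plane with u = (0, 0) and v = (1, 0); the coordinates and the test function φ are introduced here only for illustration.

```latex
% Assumed coordinates: u = (0,0), v = (1,0) in the integer grid.
\begin{align*}
\operatorname{supp}(P_u) &= \{(0,0),\,(1,0),\,(-1,0),\,(0,1),\,(0,-1)\}, \\
\operatorname{supp}(P_v) &= \{(0,0),\,(1,0),\,(2,0),\,(1,1),\,(1,-1)\}.
\end{align*}
% Transport plan: leave the mass on u and v in place and move
% (0,1) -> (1,1), (0,-1) -> (1,-1), (-1,0) -> (2,0), each carrying mass 1/5.
\[
\mathrm{EMD}(V_{u,v},P_u,P_v) \le \tfrac{1}{5}\,(1 + 1 + 3) = 1 .
\]
% Matching lower bound: \varphi(a,b) = a is 1-Lipschitz for the grid metric.
\[
\mathrm{EMD}(V_{u,v},P_u,P_v) \ge \sum_{x} \varphi(x)\,\bigl(P_v(x) - P_u(x)\bigr)
 = \tfrac{1}{5}\,(0+1+2+1+1) - \tfrac{1}{5}\,(0+1-1+0+0) = 1 .
\]
\[
\mathrm{RIC}(u,v) = 1 - \mathrm{EMD}(V_{u,v},P_u,P_v) = 0 .
\]
```

The upper bound comes from an explicit shipment and the lower bound from the standard duality between transportation cost and 1-Lipschitz functions, so the two together pin the value down exactly.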

Exact and Approximate Computation of RIC(u, v)
Note that any node x ∈ V_{u,v} with P_u(x) = P_v(x) = 0 can be ignored in the calculation of EMD(V_{u,v}, P_u, P_v). Thus, a straightforward calculation of RIC(u, v) requires the following two steps:
(i) Find the pair-wise distances between the nodes in Nbr(u) and Nbr(v). This can be done in O(n^ω log n) time using Seidel's algorithm [28], where n is the number of nodes and ω is the value such that two n × n matrices can be multiplied in O(n^ω) time; the smallest current value of ω is slightly less than 2.373 [29].
(ii) Solve the LP in Figure 1, which has O(deg(u) deg(v)) variables and O(deg(u) deg(v)) constraints, via standard LP solvers such as the interior-point method. Alternatively, the LP can be solved by minimum-cost network flow algorithms by viewing it as a transportation problem, e.g., see [30].
However, the calculation of EMD(V_{u,v}, P_u, P_v) (and therefore RIC(u, v)) can be further simplified if we make some more observations.
Consider a pair of nodes u′ ∈ Nbr(u) and v′ ∈ Nbr(v) for an edge {u, v} ∈ E. Note that there are only four possible values of dist_G(u′, v′): dist_G(u′, v′) = 0 if u′ = v′, dist_G(u′, v′) = 1 if {u′, v′} ∈ E, dist_G(u′, v′) = 2 if u′ ≠ v′, {u′, v′} ∉ E and there is a path of length 2 between u′ and v′, and dist_G(u′, v′) = 3 for all other cases. Thus, to find all pair-wise distances between the nodes in Nbr(u) and Nbr(v) we only need to check for paths up to length 3, which can be done faster in O(n^ω) time using Seidel's algorithm [28] again.
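The case analysis above can be sketched directly with set operations on neighbor sets. This is only an illustration of the four cases and not the matrix-multiplication-based method cited in the text; the function name `neighbor_pair_distances` is hypothetical.

```python
def neighbor_pair_distances(adj, u, v):
    """dist_G(x, y) for all x in Nbr(u), y in Nbr(v), where {u, v} is an edge."""
    if v not in adj[u]:
        raise ValueError("{u, v} must be an edge of the graph")
    result = {}
    for x in adj[u]:
        for y in adj[v]:
            if x == y:
                d = 0                 # same node (a common neighbor)
            elif y in adj[x]:
                d = 1                 # x and y are adjacent
            elif adj[x] & adj[y]:
                d = 2                 # x and y share a neighbor
            else:
                d = 3                 # the path x-u-v-y always exists
            result[(x, y)] = d
    return result


# Hypothetical example: the 5-cycle 0-1-2-3-4-0 and its edge {0, 1}.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
pairs = neighbor_pair_distances(cycle5, 0, 1)
```

In this example the pair (4, 2) gets distance 2 via the shared neighbor 3, while pairs involving u or v themselves get distance 1.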
For further discussion, consider the total variation distance (TVD) between the two distributions P_u and P_v on the set of nodes in V_{u,v}:

$$\| P_u - P_v \|_{TVD} = \frac{1}{2} \sum_{x \in V_{u,v}} \left| P_u(x) - P_v(x) \right|$$

Proposition 1. $1 - 3\,\| P_u - P_v \|_{TVD} \le \mathrm{RIC}(u,v) \le 1 - \| P_u - P_v \|_{TVD}$.

The bound in Proposition 1 may not necessarily be a tight approximation for RIC(u, v); for example, for the grid in Figure 3 we get ||P_u − P_v||_TVD = 3/5, giving −4/5 ≤ RIC(u, v) ≤ 2/5 as an approximation to the actual value of RIC(u, v) = 0.
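The grid numbers quoted above can be reproduced with a few lines of code. The sketch below assumes that the TVD is the usual half of the L1 distance between the two distributions and that the bounds take the form 1 − 3·TVD ≤ RIC(u, v) ≤ 1 − TVD, which is consistent with the values 3/5, −4/5 and 2/5 given for the grid. The helper names and the finite 4 × 3 grid patch standing in for the infinite grid are assumptions.

```python
from fractions import Fraction


def uniform_closed_neighborhood(adj, u):
    """P_u: uniform distribution on {u} together with Nbr(u)."""
    mass = Fraction(1, len(adj[u]) + 1)
    return {x: mass for x in adj[u] | {u}}


def total_variation(p, q):
    """Half of the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def tvd_bounds(adj, u, v):
    """Return (TVD, lower bound, upper bound) for RIC(u, v)."""
    tvd = total_variation(uniform_closed_neighborhood(adj, u),
                          uniform_closed_neighborhood(adj, v))
    return tvd, 1 - 3 * tvd, 1 - tvd


def grid_graph(width, height):
    """Finite patch of the two-dimensional grid as a dict of neighbor sets."""
    adj = {(i, j): set() for i in range(width) for j in range(height)}
    for (i, j) in adj:
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in adj:
                adj[(i, j)].add(nb)
                adj[nb].add((i, j))
    return adj


# Interior edge of a 4 x 3 patch: both endpoints have all 4 grid neighbors.
grid = grid_graph(4, 3)
tvd, lower, upper = tvd_bounds(grid, (1, 1), (2, 1))
```

With this setup `tvd` is 3/5 and the bounds are −4/5 and 2/5, matching the values in the text; the gap to the true value 0 shows how loose the TVD-based estimate can be.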
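For development of further bounds, consider the edge {u, v} ∈ E. Assume without loss of generality that deg(u) ≤ deg(v) and G has 4 or more nodes, thus deg(v) ≥ 2. Suppose that u and v have 0 ≤ ℓ ≤ deg(u) common neighbour nodes as shown below:

Nbr(u) = { v, p_1, p_2, . . . , p_k, q_1, q_2, . . . , q_ℓ }
Nbr(v) = { u, q_1, q_2, . . . , q_ℓ, r_1, r_2, . . . , r_m }
where q_1, q_2, . . . , q_ℓ are the ℓ ≥ 0 common neighbours.

Note that the two probability vectors P_u and P_v for the edge {u, v} are as shown below:

node x:   u, v, q_1, . . . , q_ℓ   |   p_1, . . . , p_k   |   r_1, . . . , r_m
P_u(x):   1/(deg(u)+1)           |   1/(deg(u)+1)     |   0
P_v(x):   1/(deg(v)+1)           |   0                |   1/(deg(v)+1)

By our assumption 1/(deg(u)+1) ≥ 1/(deg(v)+1), and thus a straightforward calculation (using k = deg(u) − ℓ − 1 and m = deg(v) − ℓ − 1) gives the following value for ||P_u − P_v||_TVD:

$$\| P_u - P_v \|_{TVD} = \frac{1}{2} \left[ (\ell+2) \left( \frac{1}{\deg(u)+1} - \frac{1}{\deg(v)+1} \right) + \frac{k}{\deg(u)+1} + \frac{m}{\deg(v)+1} \right] = 1 - \frac{\ell+2}{\deg(v)+1}$$

Combined with Proposition 1, this gives

$$-2 + \frac{3(\ell+2)}{\deg(v)+1} \le \mathrm{RIC}(u,v) \le \frac{\ell+2}{\deg(v)+1}$$

and in particular it always holds that −2 < RIC(u, v) ≤ 1.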
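As a sanity check on the calculation above, the following sketch compares a direct evaluation of the TVD from the two probability vectors with the closed form 1 − (ℓ + 2)/(deg(v) + 1). It assumes the labelling Nbr(u) = {v, p_1, ..., p_k, q_1, ..., q_ℓ} and Nbr(v) = {u, q_1, ..., q_ℓ, r_1, ..., r_m} with k ≤ m (so that deg(u) ≤ deg(v)); the closed form and the function names are assumptions of this illustration.

```python
from fractions import Fraction


def tvd_from_counts(k, ell, m):
    """TVD of P_u and P_v computed term by term from the neighbor counts."""
    deg_u, deg_v = k + ell + 1, m + ell + 1
    pu, pv = Fraction(1, deg_u + 1), Fraction(1, deg_v + 1)
    # u, v and the ell common neighbors carry mass under both distributions;
    # the k nodes p_i carry mass only under P_u, the m nodes r_i only under P_v.
    l1 = (ell + 2) * abs(pu - pv) + k * pu + m * pv
    return l1 / 2


def tvd_closed_form(ell, deg_v):
    """Candidate closed form 1 - (ell + 2) / (deg(v) + 1)."""
    return 1 - Fraction(ell + 2, deg_v + 1)


# Compare both expressions over a small range of configurations with k <= m.
mismatches = [
    (k, ell, m)
    for ell in range(0, 6)
    for m in range(0, 8)
    for k in range(0, m + 1)
    if tvd_from_counts(k, ell, m) != tvd_closed_form(ell, m + ell + 1)
]
```

The list `mismatches` stays empty over this range, and the grid case (k = m = 3, ℓ = 0) gives 3/5 in both forms.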
For further bounds, suppose that there exists a γ ∈ {1, 2, 3} such that for any two distinct nodes u′ ∈ Nbr(u) and v′ ∈ Nbr(v) we have that dist(u′, v′) is exactly γ. In that case, it follows that Now, suppose that G has no cycles of 5 or fewer edges containing the edge {u, v} (a tree is a trivial example of such a graph). This implies γ = 3 and ℓ = 0, giving the following bound.

Review of Efficient Approximate Computation of RIC(u, v) via Linear Sketching
It is clear that a crucial bottleneck in computing RIC(u, v) for an arbitrary graph G = (V, E) is the computation of EMD(V_{u,v}, P_u, P_v), since it seems to require solving a linear program with O(deg(u) deg(v)) variables and O(deg(u) deg(v)) constraints (note that in the worst case deg(u) deg(v) can be as large as Θ(n^2), where n is the number of nodes of G). In this section we review a non-trivial approach for computing EMD(V_{u,v}, P_u, P_v) provided we settle for a slightly non-optimal solution for EMD(V_{u,v}, P_u, P_v).
Linear sketching is a popular method to perform approximate computations on large data sets using dimensionality reduction [31]. The general (informal) intuition behind linear sketching is to take linear projections of the given data set and then use these projections to provide solutions to the original problem. Significant research has been done on the problem of estimating EMD using linear sketches for general metric spaces [32][33][34][35][36]. In this section, we discuss the results by McGregor and Stubbs [37] to approximately estimate EMD on a graph metric (i.e., the metric induced by inter-node distances in a graph, as is the case for computing RIC(u, v)). Recall that our bottleneck is the computation of EMD(V_{u,v}, P_u, P_v) for the given graph G.
The first step is to transform the problem of computing EMD(V_{u,v}, P_u, P_v) by standard techniques to the following equivalent problem, which will be denoted by EMD_d. Given two multi-sets A, B ⊆ X over a ground set X with |A| = |B| = k, and a metric d : X × X → R^+ on X, compute the minimum cost of a perfect matching between A and B, i.e., using π_{A,B} to denote a 1-1 mapping from A to B, we need to compute

$$\mathrm{EMD}_d(A,B) = \min_{\pi_{A,B}} \sum_{a \in A} d\bigl(a, \pi_{A,B}(a)\bigr)$$

For the purpose of measuring approximation quality, we say that an algorithm is an (ε, δ)-algorithm for computing a quantity of value Q if the value Q̂ returned by the algorithm satisfies

$$\Pr\bigl[\, (1-\varepsilon)\, Q \le \widehat{Q} \le (1+\varepsilon)\, Q \,\bigr] \ge 1 - \delta$$

The basic approach of McGregor and Stubbs in [37] is to define two vectors x, y ∈ R^{|E|} corresponding to the sets A and B. We then estimate EMD_d(A, B) by posing it as an ℓ1-regression problem using the vectors x, y and a set of other vectors defined by the structure of the underlying graph. The idea is to take some random projections of these vectors to a smaller dimensional space and then perform ℓ1-regression on these projections to save space and time. The following result by Kane et al. [38] is crucial to the analysis of this approach (the notation Pr_{M∼ν} is the standard notation for denoting that the entries of M are drawn from the distribution ν):

(⋆) There exists a distribution ("q-dimensional sketch") ν over linear maps from R^n → R^q where q = O(ε^{−2} log n log δ^{−1}) and a "post-processing" function f : R^q → R such that for any x ∈ R^n with polynomially-bounded entries, it holds that

$$\Pr_{M \sim \nu}\bigl[\, (1-\varepsilon)\, \|x\|_1 \le f(Mx) \le (1+\varepsilon)\, \|x\|_1 \,\bigr] \ge 1 - \delta$$

To understand how the above result relates to the calculation of EMD_d(A, B), first consider the case when the given instance of EMD_d(A, B) is one dimensional, i.e., let G = (V, E) be a path with n nodes V = {1, . . . , n} and n − 1 edges E = {e_1, . . . , e_{n−1}} where e_i = {i, i + 1}, let A, B ⊆ V, and let d(i, j) = dist_G(i, j) for all i, j ∈ V. Then we can associate the computation of EMD_d(A, B) with a norm estimation problem in the following manner.
Assume that we have vectors x = (x_1, . . . , x_{n−1}) ∈ R^{n−1} and y = (y_1, . . . , y_{n−1}) ∈ R^{n−1} such that for all i ∈ {1, . . . , n − 1} the following assertions hold: x_i = |{a ∈ A | i ≥ a}| and y_i = |{b ∈ B | i ≥ b}|. Then, it can be shown that EMD_d(A, B) = ||x − y||_1, and thus we can use the result of Kane et al. [38] as stated in (⋆) directly.
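The path case lends itself to a quick numerical check. The sketch below compares the prefix-count formula with a brute-force minimum over all perfect matchings, and then estimates the same L1 norm from a linear sketch. Note that the sketch used here is the classical Cauchy random projection with a median estimator, swapped in for simplicity; it is not the construction of Kane et al. [38]. All names, the instance (A, B, n) and the sketch size q are assumptions chosen for illustration.

```python
import math
import random
from itertools import permutations


def emd_bruteforce(A, B, d):
    """Minimum-cost perfect matching between multi-sets A and B by enumeration."""
    return min(sum(d(a, b) for a, b in zip(A, perm)) for perm in permutations(B))


def prefix_counts(points, n):
    """Vector with entries |{p in points : p <= i}| for i = 1, ..., n - 1."""
    return [sum(1 for p in points if p <= i) for i in range(1, n)]


def cauchy_matrix(q, dim, rng):
    """q x dim matrix of independent standard Cauchy entries."""
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]
            for _ in range(q)]


def sketch(M, vec):
    """Linear map vec -> M vec."""
    return [sum(m * c for m, c in zip(row, vec)) for row in M]


def estimate_l1(sketched):
    """Median of absolute values; for Cauchy sketches this estimates the L1 norm."""
    vals = sorted(abs(s) for s in sketched)
    return vals[len(vals) // 2]


# Hypothetical instance on a path with n = 12 nodes.
n, A, B = 12, [1, 4, 4, 9], [2, 3, 10, 12]
x, y = prefix_counts(A, n), prefix_counts(B, n)
exact = sum(abs(xi - yi) for xi, yi in zip(x, y))          # ||x - y||_1
brute = emd_bruteforce(A, B, lambda a, b: abs(a - b))

# Sketch x and y separately with the same matrix; linearity gives M(x - y).
rng = random.Random(0)
M = cauchy_matrix(2001, n - 1, rng)
approx = estimate_l1([sx - sy for sx, sy in zip(sketch(M, x), sketch(M, y))])
```

Here `exact` and `brute` both equal 11, while `approx` is a randomized estimate of the same value. Sketching x and y separately and subtracting afterwards is what makes the approach useful when A and B are held by different parties or arrive as streams.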
As a second illustration of the above point, suppose that the graph G in the previous example is now a cycle of n nodes V = {1, . . . , n} and n edges E = {e_1, . . . , e_n} where e_i = {i, i + 1} for i ∈ {1, . . . , n − 1} and e_n = {n, 1}. Suppose that we simply ignore the last edge e_n so that the graph becomes a path and we can apply the previous approach. However, this omission of e_n changes the distance between the nodes i ∈ A and j ∈ B from d(i, j) = min{ |i − j|, |i − n| + 1 + |1 − j|, |i − 1| + 1 + |n − j| } to a new distance d′(i, j) = |i − j|. To resolve this issue, we make a sequence of guesses for the number of pairs of nodes that will be joined using the edge e_n. More precisely, for λ ∈ {−k, −k + 1, . . . , k − 1, k} let C_λ be the multi-set consisting of λ copies of "1" if λ > 0 and |λ| copies of "n" if λ < 0. Then, one can show that

$$\mathrm{EMD}_d(A,B) \le |\lambda| + \mathrm{EMD}_{d'}\bigl(A \uplus C_{\lambda},\, B \uplus C_{-\lambda}\bigr)$$

with equality for some λ ∈ {−k, −k + 1, . . . , k − 1, k}, where ⊎ denotes the union for multi-sets. Thus, we can use the result in (⋆) in the following manner. First define two vectors x = (x_1, . . . , x_n) ∈ R^n and y = (y_1, . . . , y_n) ∈ R^n where x_i = |{a ∈ A | i ≥ a}| and y_i = |{b ∈ B | i ≥ b}| for i ∈ {1, . . . , n − 1}, and x_n = y_n = 0. Let z = x − y and c = (1, . . . , 1) ∈ R^n. Then, it follows that

$$|\lambda| + \mathrm{EMD}_{d'}\bigl(A \uplus C_{\lambda},\, B \uplus C_{-\lambda}\bigr) = \| z + \lambda c \|_1$$

Define the function f : R → R as f(λ) = ||z + λc||_1; clearly EMD_d(A, B) = min_{λ ∈ {−k, −k+1, ..., k−1, k}} f(λ). For a specific λ ∈ {−k, −k + 1, . . . , k − 1, k}, we can use (⋆) to find an approximation f̂_λ of f(λ) using an O(ε^{−2} log n log(kδ^{−1}))-dimensional sketch of z such that

$$\Pr\bigl[\, (1-\varepsilon)\, f(\lambda) \le \widehat{f}_{\lambda} \le (1+\varepsilon)\, f(\lambda) \,\bigr] \ge 1 - \frac{\delta}{2k+1}$$

Iterating the process 2k + 1 times and using the union bound for probabilities, we get

$$\Pr\Bigl[\, (1-\varepsilon)\, \mathrm{EMD}_d(A,B) \le \min_{\lambda} \widehat{f}_{\lambda} \le (1+\varepsilon)\, \mathrm{EMD}_d(A,B) \,\Bigr] \ge 1 - \delta$$

It is possible to design a more careful approach that iterates only O(log k) times instead of 2k + 1 times. The ideas behind this approach as described above can be extended to trees with some non-trivial effort.
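The guessing step for the cycle can be illustrated without any sketching, by evaluating f(λ) exactly for every λ and comparing with a brute-force matching under the cycle metric. The code below is a sketch under the assumption that f(λ) is the L1 norm of z + λc with x_n = y_n = 0 as described above; the function names and the instance are hypothetical.

```python
from itertools import permutations


def emd_bruteforce(A, B, d):
    """Minimum-cost perfect matching between multi-sets A and B by enumeration."""
    return min(sum(d(a, b) for a, b in zip(A, perm)) for perm in permutations(B))


def cycle_distance(i, j, n):
    """Shortest-path distance between nodes i and j on the cycle 1, ..., n."""
    return min(abs(i - j), n - abs(i - j))


def cycle_vector(A, B, n):
    """z = x - y with prefix counts in coordinates 1..n-1 and 0 in coordinate n."""
    x = [sum(1 for a in A if a <= i) for i in range(1, n)] + [0]
    y = [sum(1 for b in B if b <= i) for i in range(1, n)] + [0]
    return [xi - yi for xi, yi in zip(x, y)]


def f(z, lam):
    """f(lambda) = || z + lambda * c ||_1 with c the all-ones vector."""
    return sum(abs(zi + lam) for zi in z)


def emd_on_cycle(A, B, n):
    """Minimum of f over the guesses lambda in {-k, ..., k}."""
    k, z = len(A), cycle_vector(A, B, n)
    return min(f(z, lam) for lam in range(-k, k + 1))


# Hypothetical instance on a cycle with n = 8 nodes.
n, A, B = 8, [1, 2, 8], [7, 7, 3]
via_guesses = emd_on_cycle(A, B, n)
via_matching = emd_bruteforce(A, B, lambda a, b: cycle_distance(a, b, n))
path_only = f(cycle_vector(A, B, n), 0)   # ignoring the edge e_n entirely
```

For this instance both `via_guesses` and `via_matching` equal 4 (attained at λ = −1), whereas `path_only` is 8, which shows how much is lost if the edge e_n is simply dropped.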
Finally, the approach can indeed be generalized to the case when G is an arbitrary graph (which applies to computing RIC(u, v)) in the following manner. The basic idea to calculate EMD_d(A, B) for an arbitrary graph G is to reduce it in an approximate sense to that of computing EMD for a tree. Let T = (V, E_T) be an arbitrary spanning tree of G, and let F = E \ E_T. The tree T defines a natural tree metric d′ where d′(a, b) is the length of the shortest path between a and b in T for all a, b ∈ V. One can then express EMD_d(A, B) in terms of EMD_{d′}(A′, B′) for some A′ ⊇ A and B′ ⊇ B in the following manner. For f = (u, v) ∈ F and λ_f ∈ {−k, −k + 1, . . . , k − 1, k}, let C^f_{λ_f} be the multi-set consisting of λ_f copies of "u" if λ_f > 0 and |λ_f| copies of "v" if λ_f < 0. Then the following bound holds: The above inequality leads to the following approach. Fix an arbitrary node r ∈ V as the root of the spanning tree T, and let P_T(u, v) denote the set of edges in the unique path in T between nodes u and v. Define the two vectors x, y ∈ R^{|E|} as follows (x_e and y_e denote the components of x and y, respectively, indexed by the edge e ∈ E): This leads to the following optimization problem: The above optimization problem can be solved using several approaches, e.g., using a recursive regression algorithm that exploits the convexity of f or using some recent results on robust regression via sub-space embeddings [39,40].
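The tree sub-problem that the reduction above relies on can be illustrated in isolation. The sketch below uses the well-known fact that, for a tree metric, the EMD between two equal-size multi-sets equals the sum over tree edges of the absolute imbalance between the numbers of points of A and of B lying below that edge (relative to a fixed root). This is an assumption-laden illustration of the tree case only: the corrections λ_f for the non-tree edges in F are not modelled, and all names are hypothetical.

```python
from collections import deque
from itertools import permutations


def bfs_order(tree_adj, root):
    """Return (parent map, BFS visiting order) of the tree rooted at root."""
    parent, order, queue = {root: None}, [root], deque([root])
    while queue:
        x = queue.popleft()
        for y in tree_adj[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
                queue.append(y)
    return parent, order


def emd_on_tree(tree_adj, root, A, B):
    """Sum over tree edges of |#A below the edge - #B below the edge|."""
    parent, order = bfs_order(tree_adj, root)
    load = {v: 0 for v in tree_adj}
    for a in A:
        load[a] += 1
    for b in B:
        load[b] -= 1
    total = 0
    for v in reversed(order):              # children are handled before parents
        if parent[v] is not None:
            total += abs(load[v])          # imbalance crossing edge {v, parent[v]}
            load[parent[v]] += load[v]
    return total


def tree_distance(tree_adj, a, b):
    """Length of the unique tree path between a and b (by BFS from a)."""
    dist, queue = {a: 0}, deque([a])
    while queue:
        x = queue.popleft()
        for y in tree_adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist[b]


# Hypothetical tree with edges 0-1, 0-2, 1-3, 1-4, 2-5, rooted at node 0.
tree = {0: {1, 2}, 1: {0, 3, 4}, 2: {0, 5}, 3: {1}, 4: {1}, 5: {2}}
A, B = [3, 4, 5], [0, 2, 2]
via_edges = emd_on_tree(tree, 0, A, B)
via_matching = min(
    sum(tree_distance(tree, a, b) for a, b in zip(A, perm))
    for perm in permutations(B)
)
```

Both `via_edges` and `via_matching` evaluate to 6 on this instance. Because the edge-based formula is an L1 norm of a vector that depends linearly on the indicator vectors of A and B, it is the kind of quantity to which a linear sketch can be applied.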

Discussion
In this paper we have reviewed some computational aspects of the Ollivier-Ricci curvature for networks, and shown a few simple computational bounds. As already mentioned in Section 1, there are other notions of network curvature that are also used by researchers, and therefore this review should not be viewed as championing the Ollivier-Ricci curvature over other curvatures. We hope that this review will motivate further research on the exciting interplay between notions of curvatures from network and non-network domains. Some applications of network curvatures for real-world networks appear in references such as [11,13,15,16,18].
We conclude our article by mentioning an interesting application of the Ollivier-Ricci curvature for Markov chains for graph coloring and other problems (precise technical descriptions of these results are beyond the scope of this introductory review). The probability distributions on nodes used to compute EMD in the Ollivier-Ricci curvature can be naturally associated with a Markov process on the given graph (as a very simplified illustration, one can use a "normalized version" of EMD(V_{u,v}, P_u, P_v) as the probability of transition between the states corresponding to nodes u and v). Such associations have a long history in the Markov chain literature under various names such as path coupling [41], and the values of RIC(u, v) have been used (explicitly or implicitly) to prove useful properties of the Markov chain, such as fast convergence to its stationary distribution, in many settings such as graph colouring [41] and sampling of paths with constraints [42].

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
EMD    Earth Mover's Distance
RIC    Ricci curvature