1. Introduction
The modeling of social, biological, and technological systems as complex networks (or graphs) has gained significant attention across research areas as diverse as computer science, statistical physics, biology, neuroscience, and social science. Network science has, in fact, become ubiquitous [1]. In particular, since there is a deep interplay between the structural and dynamical features of networked systems, analyzing the network topology helps in understanding the network's functional properties, such as robustness, efficiency in information transmission, and resilience [1]. Metric parameters play a central role in deciphering information diffusion processes. Indeed, complex networks are renowned for their ability to efficiently transfer information or signals from node to node, and a proper way to quantitatively analyze the information flow is through structural measures such as the diameter, the characteristic path length, the closeness and betweenness centralities, and the efficiency [1,2,3].
The basis for the definition of all metric measures is the concept of distance between two nodes. If one deals with only dyadic node–node interactions, then there is no doubt as to how to define the distance between nodes: the only way to construct a path from node s to node t is by means of a sequence of intersecting edges, with the first one containing s and the last one containing t. The corresponding path length is then the number of edges in the sequence, and the distance (or geodesic distance) between the two nodes is the length of the shortest path.
However, describing many real-world processes requires accounting for group interactions, i.e., interactions among more than two nodes, and it is therefore critical to refer to other mathematical objects that better represent such higher-order relationships [4,5]. These objects are called hypergraphs, and quantifying the efficiency of spreading processes on them requires extending the classical definitions of graph structural measures to the hypergraph setting.
The simplest generalization of any graph topological measure to the higher-order case is its computation in the hypergraph clique projection, a representation where each hyperedge is replaced by a clique of pairwise interactions among all its nodes. However, by considering the clique projection, one loses very important information about the structure of such higher-order interactions. This was clearly demonstrated in a series of recent studies, which showed that more elaborate generalizations of topological measures are needed in order to provide significant and meaningful information about the hypergraph structure [6,7,8,9,10,11]. In our study, we introduce a new concept of distance among nodes in a hypergraph.
Moreover, a series of fundamental macroscopic network topology descriptors (such as the betweenness centrality, the closeness centrality, or the efficiency) are explicitly based on the concept of metric structure or distance in the graph. As we will show in more detail below, the current literature contains some attempts to generalize the concept of distance [4,7,8,12,13,14], which, however, are not systematic and mostly focus on only one of the distinguishing features that higher-order structures display.
A first, naive extension of the term distance would be to replace the word “edge” with the word “hyperedge” in the graph path definition given above, leaving the rest of the definition unchanged. This way, one would obtain the so-called “distance in the hypergraph clique projection”. While such a definition frequently appears in the literature [15,16,17,18,19,20,21,22], it oversimplifies the rich structure of higher-order interactions. Indeed, in the case of a classic (unweighted) graph, all edges are roughly equivalent, as they all have the same cardinality (equal to two), and furthermore, all their intersections have the same cardinality (equal to one). This is not, however, the case in higher-order systems, where hyperedges may differ in size and may intersect differently. Therefore, it seems reasonable that such extra information be taken into account in the hypergraph distance definition.
To make an illustrative example, let us refer to the two panels of
Figure 1. In both cases, the hypergraphs are composed of two communities (of arbitrary internal structure) and contain hyperedges bridging the communities. Let us consider the hypergraph of
Figure 1a, and let us suppose that a random walker starts at node
u in the first community. Then, the probabilities to get from node
u to node
v by using no more than two steps either through the path including only the hyperedge
(
) or by using the path
(
) are
where
is the edge
’s cardinality (in
Figure 1a
) and
is the number of hyperedges incident to node
u. It is clear that
is significantly smaller than
, so it is natural to suppose that the length of the path
should be larger than the length of the path
.
Now, if one considers instead the hypergraph of
Figure 1b and, as before, compares the probabilities
with
of getting from node
u to node
v by using no more than two steps, then one has
Now, one has , and therefore the path should be assigned a distance shorter than that assigned to .
As a consequence, the concept of length of a path in a hypergraph should take into account not only the number of hyperedges involved but also their sizes and the sizes of their intersections: the bigger the size of a hyperedge, the larger the associated distance should be, and the bigger the intersection between hyperedges, the closer the nodes should be. These two distinguishing features (varying sizes of hyperedges and varying sizes of hyperedge intersections) are discussed separately in the literature, in particular as follows:
- (1) If one assigns weights to hyperedges as proper functions of their sizes, then the distance between a pair of nodes $s, t$ is the sum of the weights of all hyperedges in the shortest path, i.e.,
$$d(s,t) = \min_{p \in \mathcal{P}(s,t)} \sum_{e \in p} w(|e|), \quad (1)$$
where $\mathcal{P}(s,t)$ is the set of paths between $s$ and $t$ and $w(|e|)$ is the weight of the hyperedge $e$ of size $|e|$. The heuristics behind this definition come from social networks: if one assumes that the size of each hyperedge is the size of a group of friends, then as the group grows, it becomes more difficult for its members to keep in touch with each other. This idea was used in Refs. [7,14], where the authors define hypergraph random walks, arguing that “the walkers may give more or less importance to hyperedges depending on their size”.
- (2) The difference in the size of intersections between hyperedges was taken into account in the definition of k-walks introduced in Refs. [4,8,12,13]. A k-walk is a sequence of hyperedges such that each pair of successive hyperedges is adjacent and intersects in at least k vertices.
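The two ingredients listed above can be illustrated separately with a minimal Python sketch. It is not taken from the cited references: the function names, the toy hypergraphs and the weight `lambda size: size - 1` are hypothetical choices made only for illustration, and hyperedge weights are assumed to be non-negative so that Dijkstra's algorithm applies.

```python
import heapq
from collections import deque


def size_weighted_distance(hyperedges, s, t, weight):
    """Minimum, over hyperedge paths from s to t, of the sum of weight(|e|).

    `weight` is a user-supplied non-negative function of the hyperedge size
    (an assumption of this sketch, required by Dijkstra's algorithm).
    """
    if s == t:
        return 0.0
    edges = [frozenset(e) for e in hyperedges]
    best = {}
    heap = []
    # A path may start from any hyperedge containing s.
    for idx, e in enumerate(edges):
        if s in e:
            best[idx] = weight(len(e))
            heap.append((best[idx], idx))
    heapq.heapify(heap)
    while heap:
        cost, idx = heapq.heappop(heap)
        if cost > best.get(idx, float("inf")):
            continue  # stale heap entry
        if t in edges[idx]:
            return cost
        for jdx, f in enumerate(edges):
            if jdx != idx and edges[idx] & f:
                new_cost = cost + weight(len(f))
                if new_cost < best.get(jdx, float("inf")):
                    best[jdx] = new_cost
                    heapq.heappush(heap, (new_cost, jdx))
    return float("inf")


def k_walk_length(hyperedges, s, t, k):
    """Fewest hyperedges in a k-walk from s to t, or None if there is none."""
    edges = [frozenset(e) for e in hyperedges]
    start = [i for i, e in enumerate(edges) if s in e]
    seen = set(start)
    queue = deque((i, 1) for i in start)
    while queue:
        idx, steps = queue.popleft()
        if t in edges[idx]:
            return steps
        for jdx, f in enumerate(edges):
            # Successive hyperedges must share at least k vertices.
            if jdx not in seen and len(edges[idx] & f) >= k:
                seen.add(jdx)
                queue.append((jdx, steps + 1))
    return None


# Hypothetical toy data: one large hyperedge and a detour of two small ones.
toy_a = [{1, 2, 3, 4, 5}, {1, 6}, {6, 5}]
d_unit = size_weighted_distance(toy_a, 1, 5, lambda size: 1)
d_size = size_weighted_distance(toy_a, 1, 5, lambda size: size - 1)

# Hypothetical toy data for k-walks.
toy_b = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
walk_k1 = k_walk_length(toy_b, 1, 5, 1)
walk_k2 = k_walk_length(toy_b, 1, 5, 2)
```

With unit weights the single large hyperedge gives the shortest route from node 1 to node 5, whereas with the size-dependent weight the detour through the two small hyperedges is shorter. In the second toy hypergraph, node 5 is reachable from node 1 by a 1-walk but not by a 2-walk, because the last two hyperedges share only one vertex.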
Although both ingredients have been considered separately in the structural analysis of hypergraphs, they have never, so far, been combined to describe the metric structure of hypergraphs. The main goal of this paper is, therefore, to combine them in order to obtain a tailored metric structure for hypergraphs that properly extends that of classic (dyadic) networks. Once this structure is defined (in the next section), network efficiency, closeness and betweenness centralities can also be defined as basic metric structural measures of the hypernetwork topology.
Our manuscript includes, therefore, three novel contributions to the literature. First, it introduces a new distance measure. Second, it gives the generalizations of the three crucial characteristics of network topology listed above. Third, it provides a series of examples showing the effectiveness of the new concepts in uncovering meaningful information about higher-order networks. The novel approach is first illustrated with specific synthetic networks, where one immediately sees that it provides important information on how information is transferred through the networks. Our numerical tests on real-world higher-order networks then reveal that the new metric measures uncover several important features of hypergraph properties. In particular, we show that our measures yield different assessments of node importance and of other network features from the information-transferability point of view. Finally, we show that our measure is particularly suitable for describing the structure of hypergraphs that differ markedly from graphs, i.e., hypergraphs in which hyperedges of large size are frequent and the nodes in these hyperedges are rarely connected by other hyperedges of smaller sizes.
2. Methods
The basic notation used here is the same as that of Ref. [5]. In particular, a (dyadic) complex network is a (undirected and unweighted) network $G=(V,E)$, where $V=\{1,\dots,n\}$ for some $n \in \mathbb{N}$ is the set of nodes (or vertices) and $E$ is the set of links (or edges), i.e., a finite family of (unordered) pairs of nodes. Throughout this paper, only undirected and unweighted networks are considered, but all our results can be straightforwardly extended to the case of directed and weighted networks. A hypergraph is a pair $\mathcal{H}=(V,H)$, where $V$ is a (finite) set of nodes and $H$ is a (finite) family of (non-empty) subsets of $V$, i.e., $\emptyset \neq e \subseteq V$ for all $e \in H$. Each element of $H$ is called a hyperedge of $\mathcal{H}$.
Notice that every complex network is a hypergraph, but the reverse is not true. Nevertheless, given a hypergraph $\mathcal{H}=(V,H)$, one can define its complex network projection on the same node set $V$, where each hyperedge is replaced by the clique made of all the pairwise interactions among the hyperedge’s nodes, i.e., two distinct nodes are linked whenever at least one hyperedge contains both of them. This network is called the clique projection of $\mathcal{H}$.
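The clique projection and the resulting geodesic distance can be sketched in a few lines of Python. The function names and the toy hypergraph are hypothetical and serve only as an illustration of the definition above.

```python
from collections import deque
from itertools import combinations


def clique_projection(hyperedges):
    """Adjacency sets of the clique projection: link every pair in a hyperedge."""
    adj = {}
    for e in hyperedges:
        for v in e:
            adj.setdefault(v, set())
        for a, b in combinations(e, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_distance(adj, source, target):
    """Geodesic distance in an unweighted graph, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj[node]:
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


# Hypothetical toy hypergraph: one large hyperedge and two small ones.
toy = [{1, 2, 3, 4, 5}, {1, 6}, {6, 5}]
adj = clique_projection(toy)
d_15 = bfs_distance(adj, 1, 5)
d_26 = bfs_distance(adj, 2, 6)
```

In the projection, nodes 1 and 5 are at distance one simply because they share a hyperedge, regardless of how large that hyperedge is; this is the information loss discussed in the introduction.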
In order to introduce a metric structure in $\mathcal{H}$ (i.e., a notion of distance between nodes of $\mathcal{H}$), we first need to establish the notion of paths in $\mathcal{H}$. Given a pair of nodes $i,j \in V$, a path from $i$ to $j$ is a (finite) sequence of hyperedges $e_1,\dots,e_k$ such that $i \in e_1$, $j \in e_k$ and $e_\ell \cap e_{\ell+1} \neq \emptyset$ for all $1 \le \ell < k$. One can associate a (positive) real number to every path from $i$ to $j$ (the length of the path), and then the distance between $i$ and $j$ is the minimal length among all the paths from $i$ to $j$. The critical point, as we will see in Section 2.1, is therefore how to define the length of each path in a hypergraph.
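As a small illustration of the path notion just described (a sequence of hyperedges whose first element contains the start node, whose last element contains the end node, and whose consecutive elements intersect), the following Python sketch checks whether a given sequence qualifies. The function name `is_path` is a hypothetical choice.

```python
def is_path(sequence, i, j):
    """True if `sequence` of hyperedges is a path from node i to node j."""
    seq = [frozenset(e) for e in sequence]
    if not seq or i not in seq[0] or j not in seq[-1]:
        return False
    # Consecutive hyperedges must have a non-empty intersection.
    return all(a & b for a, b in zip(seq, seq[1:]))


ok_two_steps = is_path([{1, 2}, {2, 3}], 1, 3)
ok_one_step = is_path([{1, 2, 3}], 1, 3)
broken = is_path([{1, 2}, {3, 4}], 1, 4)
```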
The missing mathematical ingredient is the concept of the linegraph. Given a hypergraph $\mathcal{H}=(V,H)$, its (unweighted and undirected) linegraph $L(\mathcal{H})$ is a graph whose node set is the set of hyperedges of $\mathcal{H}$ and in which there is a link between $e, e' \in H$ if $e \cap e' \neq \emptyset$. Notice that the linegraph has self-loops in all its nodes, since for every $e \in H$ one has that $e \cap e = e \neq \emptyset$. A weighted linegraph is introduced in Section 2.1 in order to define the path length in a hypergraph by assigning a non-negative weight to each link in $L(\mathcal{H})$. Further properties and results about hypergraphs and their linegraphs can be found in Refs. [4,5].
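The linegraph construction can be sketched as follows. This is only an illustration: the function name, the dictionary representation (hyperedge index pairs mapped to intersection sizes) and the toy hypergraph are assumptions of the sketch. The intersection sizes are stored because they are the raw quantities on which a weighting of the links can later be based.

```python
def linegraph(hyperedges):
    """Links of the linegraph as {(a, b): |e_a ∩ e_b|} with a <= b.

    Hyperedges are assumed non-empty, so every node of the linegraph
    carries a self-loop (a, a) whose stored value is the hyperedge size.
    """
    edges = [frozenset(e) for e in hyperedges]
    links = {}
    for a, e in enumerate(edges):
        for b in range(a, len(edges)):
            common = len(e & edges[b])
            if common > 0:
                links[(a, b)] = common
    return links


# Hypothetical toy hypergraph with three hyperedges.
toy_links = linegraph([{1, 2, 3}, {3, 4}, {5, 6}])
```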
2.1. Weighted Linegraphs
The use of weights on the links of the linegraph allows quantifying both the size of the hyperedges and that of the intersections among hyperedges. Given a hypergraph $\mathcal{H}$, there is no unique way of assigning weights to the links of its linegraph, so a weight assignment rule must be specified. As we will show momentarily, the weight of each self-loop $(e,e)$ depends only on the size of the corresponding hyperedge $e$, while the weights of the other links in the linegraph are calculated as proper functions of the sizes of the corresponding hyperedges and of the size of their intersection.
For the sake of illustration of our definitions, let us initially refer to the hypergraph (made of 11 nodes and 3 hyperedges) depicted in
Figure 2, and its associated linegraph.
For every node
of a hypergraph
, one can denote by
the set of hyperedges incident to the node
i (in the example of
Figure 2,
and
). Then, one can define the
weighted distance between the pair of nodes
as
where
is the distance between the hyperedges
e and
(
) in the weighted linegraph of
, while
is the weight of the self-loop
in the weighted linegraph. The critical point is now defining the weights of the linegraph in order to give sense to the definition given above.
In particular, the weight function on the linegraph must have the following properties:
- (1) The distance between nodes obtained from Formula (2) must coincide with the classic graph distance whenever one considers a (dyadic) complex network $G=(V,E)$ as a hypergraph.
- (2) The bigger the intersection of the hyperedges, the smaller the distance should be. For instance, if one considers the case illustrated in Figure 3, then the weighted distance between i and j in panel (a) should be smaller than that in panel (b). Therefore, the distance should be inversely proportional to the intersection size.
- (3) The bigger the hyperedges involved, the larger the weighted distance should be. In particular, taking as an illustrative example the case of Figure 4, the weighted distance in panel (b) should be larger than that in panel (a), since the hyperedges are bigger in case (b) while the size of the intersection is the same.
- (4) Finally, the larger the number of hyperedges involved in the path, the larger the weighted distance should be. In other words, the new metric structure should be sensitive to the number of hyperedges in the paths considered. In particular, if one takes two paths, one with only one hyperedge and another with two hyperedges, but with the same number of nodes involved in both cases, then the path length should be smaller in the first case (see the illustration in Figure 5, where panel (b) must give a larger distance than panel (a)).
A possible choice for the weight function
that verifies the four desired properties is to assign, for every
, a weight
It is easy, indeed, to verify that the function in Equation (
3) accomplishes the following properties:
Let us discuss explicitly the issue of the computational complexity associated with the calculation of our distance. It is reasonable to analyze separately two steps of the distance calculations: the construction of the weighted linegraph and the distance calculation per se. As for the construction of the weighted linegraph, the worst-case estimation of the complexity is, obviously,
. Moreover, looking at Formula (
2), estimating the distance
between nodes
i and
k needs the computation of the distances between all pairs of nodes
and
in the weighted linegraph
. The complexity associated with computing
is therefore given by the worst-case estimate for Dijkstra’s algorithm, which is
[
24]. The resulting estimate of the computational complexity is
.
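The two computational steps discussed above (building a weighted linegraph, then running Dijkstra's algorithm on it) can be sketched as follows. This is a sketch under explicit assumptions, not the procedure of this paper: `placeholder_weight` is a hypothetical stand-in that does NOT reproduce Equation (3), self-loops are omitted, and the final combination prescribed by Formula (2) is not implemented.

```python
import heapq


def build_weighted_linegraph(hyperedges, link_weight):
    """Adjacency dict of the linegraph; compares every pair of hyperedges."""
    edges = [frozenset(e) for e in hyperedges]
    adj = {a: {} for a in range(len(edges))}
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            if edges[a] & edges[b]:
                w = link_weight(edges[a], edges[b])
                adj[a][b] = w
                adj[b][a] = w
    return adj


def dijkstra(adj, source):
    """Shortest distances from `source` with a binary heap (weights >= 0)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nb, w in adj[node].items():
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return dist


def placeholder_weight(e, f):
    # Hypothetical weight, NOT Equation (3): it only encodes the idea that
    # a larger overlap between two hyperedges gives a shorter link.
    return 1.0 / len(e & f)


# Hypothetical toy hypergraph.
toy = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
toy_adj = build_weighted_linegraph(toy, placeholder_weight)
toy_dist = dijkstra(toy_adj, 0)
```

The nested loop in `build_weighted_linegraph` visits every pair of hyperedges once, and `dijkstra` is the standard heap-based variant; any other non-negative link weight can be passed in place of the placeholder.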
To conclude this section, we remark that the choice of the weighting function in Equation (3) is not unique: other choices can be made that equally satisfy the four properties desired for a distance.
2.2. Some Structural Measures
The metric structure of networked systems plays a central role in understanding the dynamics and processes that take place on them, including robustness [25], diffusion [1] and resilience [26]. There is a plethora of measures related to the metric structure of networks in the literature; here we limit ourselves to network efficiency, closeness centrality and betweenness centrality, and discuss their extension to the hypergraph setting by using the weighted distance introduced in Section 2.1.
The (global) efficiency of a graph, introduced in Ref. [27], measures the performance of the network in transferring information. Formally, given a complex network $G=(V,E)$ of $n$ nodes, its efficiency is defined as
$$E(G) = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d(i,j)}, \quad (4)$$
where $d(i,j)$ is the distance between nodes $i$ and $j$.
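The efficiency can be computed from any pairwise distance function, which is what makes it easy to reuse with different metric structures. The sketch below assumes the usual convention that unreachable pairs (infinite or missing distance) contribute zero; the function name and the toy distance are hypothetical.

```python
import math


def global_efficiency(nodes, distance):
    """Average of 1/d(i, j) over all ordered pairs of distinct nodes."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for i in nodes:
        for j in nodes:
            if i == j:
                continue
            d = distance(i, j)
            # Unreachable pairs (None or infinity) contribute zero.
            if d and math.isfinite(d):
                total += 1.0 / d
    return total / (n * (n - 1))


# Hypothetical toy example: a path graph 1 - 2 - 3, where d(i, j) = |i - j|.
eff_path = global_efficiency([1, 2, 3], lambda i, j: abs(i - j))
```

Passing a different `distance` callable (for instance, one based on the clique projection or on a weighted linegraph) changes the metric structure without touching the efficiency code.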
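It is straightforward to extend this measure to a hypergraph $\mathcal{H}$ of $n$ nodes by simply considering a metric structure in $\mathcal{H}$. As already discussed in the introduction, one could take a simple approach and endow $\mathcal{H}$ with the metric structure given by its clique projection network. With this metric, the efficiency of $\mathcal{H}$ is simply the efficiency of its clique projection. On the other hand, if one considers the weighted metric structure defined by Equation (2), then one can calculate the weighted efficiency of the hypergraph $\mathcal{H}$ as
$$E_w(\mathcal{H}) = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d_w(i,j)}, \quad (5)$$
where $d_w(i,j)$ denotes the weighted distance between nodes $i$ and $j$ given by Equation (2).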
Similarly, one can introduce a weighted hypergraph analogue of closeness centrality. If one takes again a network $G=(V,E)$ of $n$ nodes, then the closeness centrality [1,2,3] of node $i \in V$ is
$$C(i) = \frac{n-1}{\sum_{j \neq i} d(i,j)}. \quad (6)$$
Notice that the top nodes (according to the closeness centrality values) can be seen as the most aware nodes in an information/social network [1,2,3]. By using expression (6), it is easy to transfer this concept to the hypergraph setting, simply by computing the same quantity with the weighted distance of Equation (2) in place of $d(i,j)$, for every node $i$ in a hypergraph $\mathcal{H}$. Also in this case, if one considers the metric structure given by the clique projection, then one can define an alternative closeness centrality as the closeness centrality of $i$ in the clique projection, for every node $i$ in the hypergraph.
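A minimal Python sketch of closeness centrality for an arbitrary distance function follows. Two assumptions are made here and are not taken from the text: the score is normalized as the inverse of the average distance to the other nodes, and a node that cannot reach every other node gets a score of zero. The function name and toy distance are hypothetical.

```python
import math


def closeness(nodes, distance, i):
    """Inverse of the average distance from i to all other nodes.

    Assumed conventions of this sketch: (n - 1) / sum normalization, and
    0.0 when some node is unreachable or the sum of distances is zero.
    """
    others = [j for j in nodes if j != i]
    if not others:
        return 0.0
    total = sum(distance(i, j) for j in others)
    if total == 0 or not math.isfinite(total):
        return 0.0
    return len(others) / total


# Hypothetical toy example: a path graph 1 - 2 - 3, where d(i, j) = |i - j|.
path_nodes = [1, 2, 3]
c_middle = closeness(path_nodes, lambda i, j: abs(i - j), 2)
c_end = closeness(path_nodes, lambda i, j: abs(i - j), 1)
```

A different normalization changes the values but not the ranking of the nodes, since all scores are rescaled by the same factor.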
Finally, the weighted betweenness centrality of a node $i$ in a hypergraph $\mathcal{H}$ can also be defined, but in this case some extra remarks are in order. Let us recall that, given a network $G=(V,E)$, the betweenness centrality of each node $i \in V$ is given by
$$B(i) = \sum_{u \neq v,\; u,v \neq i} \frac{|\sigma_{uv}(i)|}{|\sigma_{uv}|},$$
where $\sigma_{uv}$ is the set of all shortest paths between $u$ and $v$, $\sigma_{uv}(i)$ is the set of shortest paths between $u$ and $v$ that pass through $i$, and $|\cdot|$ is the set cardinality operator; i.e., it is the sum, over all possible node pairs in the graph, of the fraction of shortest paths going through node $i$.
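For a dyadic, unweighted graph, the betweenness centrality recalled above can be sketched with breadth-first searches that count shortest paths. The function names and the toy graph are hypothetical, and one convention is assumed: the sum runs over ordered pairs of endpoints, so each unordered pair is counted twice in an undirected graph.

```python
from collections import deque


def bfs_counts(adj, source):
    """Distances from `source` and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma


def betweenness(adj, i):
    """Sum over ordered pairs (u, v) of the fraction of shortest paths via i."""
    info = {u: bfs_counts(adj, u) for u in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for u in adj:
        for v in adj:
            if u == v or i in (u, v):
                continue
            dist_u, sigma_u = info[u]
            if v not in dist_u or i not in dist_u or v not in dist_i:
                continue  # some node is unreachable
            # i lies on a shortest u-v path iff the distances add up.
            if dist_u[i] + dist_i[v] == dist_u[v]:
                total += sigma_u[i] * sigma_i[v] / sigma_u[v]
    return total


# Hypothetical toy graph: a path 1 - 2 - 3.
path_adj = {1: {2}, 2: {1, 3}, 3: {2}}
b_middle = betweenness(path_adj, 2)
b_end = betweenness(path_adj, 1)
```

The sketch recomputes all breadth-first searches for each call, which is adequate for small examples; it does not cover the hypergraph case, where a rule for deciding when a path of hyperedges passes through a node is needed first.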
If one wants to extend this notion to a general hypergraph
, one should specify first what it means that a path from
u to
v goes through node
t when the path is a series of hyperedges. Following Ref. [
18], we assume that if we have a path described by the sequence of hyperedges
, where
and
, then it provides us with a sequence of nodes
, where
(nodes
are called
bridging nodes in Ref. [
18]). Once we fix this notion, the weighted betweenness centrality of a node
i in a hypergraph
is defined as
where
is the set of shortest paths between nodes
u and
v whose lengths are calculated by using the proposed weighted hypergraph measure, and the elements of the set are all possible sequences of hyperedges forming a path from node
u to node
v;
is the set of those shortest paths between nodes
u and
v, for which the intersection of one pair (or some pairs) of hyperedges includes node
i; and
is the size of such a latter intersection. Once again, if one considers the clique projection
, then one could define an alternative betweenness centrality as
for every node
i in the hypergraph.
4. Conclusions
In conclusion, we have introduced a novel definition of distance for hypergraphs that extends the classic methods reported in the literature. In particular, the new distance explicitly takes into account two critical factors (the inter-node distance within each hyperedge and the distance between hyperedges in the network) which have never, so far, been considered together. As a consequence, distances are computed in a properly weighted linegraph of the original hypergraph. We illustrated the benefit of adopting such a metric structure with reference to a series of small synthetic hypernetworks, and we then applied our approach to large real-world hypergraphs, revealing that the information obtained about the hypergraph structure is far different from that acquired by computing distances with the standard clique projection approach. In particular, the difference is significant for hypergraphs in which hyperedges of large sizes are frequent and nodes are rarely connected by hyperedges of small sizes. This evidence suggests that our measure may be of great help in various circumstances, especially when a correct assessment of the roles of nodes is needed from the information-transferability point of view.
Possible follow-ups of our study include the use of our novel definition of distance to measure other topological properties of hypergraphs, or to analyze processes taking place on them. This is the case, for instance, for random walks, local efficiency, clustering coefficients, community structure features, and modularity. Furthermore, a similar approach can also be adopted in the case of directed and annotated hypergraphs.