1. Introduction
The notion of Burt’s structural holes, used when analyzing social networks, is pervasive and fascinating [1]. Intuitively, it refers to the absence of connections between groups; filling these voids and bridging these gaps can be a source of opportunities and very beneficial for the individuals able to do so. On the one hand, the versatility of this concept has stimulated the definition of several measures, each capturing a different aspect [2], and each measure has been used in different frameworks when analyzing real-world networks and social capital theories [3]. On the other hand, this has also generated confusion about which exact measure one has to compute and how to compute it [4,5]. Moreover, computing these measures directly from the definition formulas can be slow and computationally intensive, because it requires looping over each node’s neighbors (and its neighbors’ neighbors).
As clearly described by [6], the advantages of structural holes lie in the information benefits and in the control benefits. The former refer to the possibility of acquiring information from different communities, where there may be different opinions, ideas, and pieces of information. The latter, instead, refer to the better negotiating position that a structural hole spanner has because she has access to unique pieces of information compared with other agents, who are confined within the groups being bridged. Several applications have shown that structural hole spanners are important for channeling information flows: acting as brokers, they bridge groups that would otherwise not communicate with each other. However, by blocking and controlling such information flows, they also play a crucial role in shaping opinion dynamics, since they can keep closed communities unaware of other groups’ opinions and beliefs, possibly exacerbating echo-chamber effects [7,8,9]. In a world dominated by online social networks, studying these phenomena has important applications to how opinions about public and health issues are formed and spread, as recently demonstrated by the different and conflicting ideas about the COVID-19 vaccination campaigns that circulated in restricted groups of the population [10,11].
In this note, I consider the main measures associated with Burt’s structural holes, namely effective size, redundancy, local constraint, and constraint, and derive simple formulas for them based on the adjacency matrix of the network. This provides a unified framework in which all measures are not only computed starting from the adjacency matrix but also expressed in matrix form. It also helps to interpret and compare these measures, and it produces intuitive and naïve algorithms, based on matrix multiplications, that work fairly well on medium-sized networks (see Section 7 for more details on this). While this approach is clean and simple, it has clear limitations when the network being analyzed becomes very large. In such a case, one should avoid storing the matrices explicitly and rely on distributed algorithms and more advanced techniques for triangle listing in sparse graphs. One of the limiting factors of this paper’s simple approach is that the matrix $A^2$ is denser than the adjacency matrix $A$. A comparison in a controlled environment should show that more efficient algorithms based on triangle listing with vertex orderings and neighborhood markers [12] scale better with the number of nodes than the naïve algorithms proposed here.
Thus, the contribution of this paper to the extant literature consists, first, in making clear that all these measures can be easily and directly computed starting from the network’s adjacency matrix and, secondly, that the formulas obtained can be used as naïve algorithms that can work when applied on networks of medium dimension without the use of advanced techniques.
The paper is structured as follows. In Section 2, we clarify the notation that will be used in the rest of the article, especially concerning matrix operations and element-wise operations. In Section 3, we start from the definition of effective size (and redundancy) given by [4] and show how it can be written with vector- and matrix-based operations (i.e., Equation (4)). With the same line of reasoning, in Section 4 and Section 5, we start from the definitions of local constraint and of constraint given by [5] and show how they too can be written with vector- and matrix-based operations, in Equations (16) and (17), respectively. Then, in Section 6, we adopt the same approach and show how the measure developed by [13], called improved structural holes, can be written with matrix–vector operations, as in Equation (18). Lastly, Section 7 concludes the paper.
2. Notation
In what follows, matrices are denoted by capital letters (e.g., $A$ and $P$) and their elements by the corresponding lowercase letter with subscripts (e.g., $a_{ij}$ and $p_{ij}$). Generic nodes of a network (i.e., a graph) will be indicated by $i$, $j$, or $k$. Consequently, an adjacency matrix will be indicated by $A = (a_{ij}) \in \mathbb{R}^{n \times n}$, where $n$ is the number of nodes and the elements $a_{ij}$ can be 0 or 1 for binary networks or generic real numbers for weighted networks.
Vectors and their elements will be respectively denoted by bold letters (e.g., $\mathbf{d}$ and $\mathbf{e}$) and letters with a (single) subscript (e.g., $d_i$). The vector obtained by taking the diagonal elements of a square matrix $A$ is denoted by $\operatorname{diag}(A)$; analogously, the matrix that has the vector $\mathbf{v}$ as its diagonal and 0s elsewhere is denoted by $\operatorname{diag}(\mathbf{v})$. The transpose of a vector or matrix is denoted by a superscript $T$ (e.g., $\mathbf{v}^T$ and $A^T$). Hereafter, vectors are considered as columns, that is, $(n \times 1)$-matrices, and their transposes as row vectors, that is, $(1 \times n)$-matrices. Accordingly, the (matrix) multiplication of a column vector times a row vector will yield a matrix (e.g., $\mathbf{v}\mathbf{w}^T$), whereas $\mathbf{v}^T\mathbf{w}$ will yield a scalar.
The matrix multiplication between two matrices $A$ and $B$ will be denoted by juxtaposition, i.e., $AB$, whereas element-by-element operations such as element-wise multiplication or division will be denoted, respectively, by ⊙ and ⊘. The $n$-dimensional unit vector in $\mathbb{R}^n$ containing all 0s but one 1 in the $i$-th position is denoted by $\mathbf{u}_i$, while the vector containing all 1s is denoted by $\mathbf{1}$. The identity matrix is denoted by $I$.
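As an illustration only, the notation above can be mapped onto array operations as in the following sketch. It assumes Python with NumPy, which the paper itself does not prescribe; the matrices `A` and `B` and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical 3x3 matrices, used only to illustrate the notation.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
B = np.array([[1., 2., 0.],
              [0., 1., 0.],
              [3., 0., 1.]])

ones = np.ones(3)      # the all-ones vector
u0 = np.eye(3)[0]      # unit vector with a single 1 in the first position

matrix_product = A @ B                  # juxtaposition AB
hadamard = A * B                        # element-wise multiplication
# Element-wise division; entries where B is 0 are set to 0 by assumption.
ratio = np.divide(A, B, out=np.zeros_like(A), where=B != 0)

diag_vector = np.diag(A @ A)            # vector of diagonal elements of a matrix
diag_matrix = np.diag(diag_vector)      # matrix with a given vector on its diagonal

outer = np.outer(ones, u0)              # column vector times row vector: a matrix
inner = ones @ u0                       # row vector times column vector: a scalar
```

Note that `np.diag` plays both roles described in the text: it extracts the diagonal of a matrix and builds a diagonal matrix from a vector.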
3. Effective Size and Redundancy for Undirected Binary Networks
The original definition of effective size and redundancy in Burt’s works was complicated, but Borgatti [4] has shown that it can be simplified. Here, we consider an undirected and binary network with no self-loops. The intuitive idea (see Figure 1) is first to compute a node’s redundancy, which is the mean number of connections from a neighbor to other neighbors. The effective size is then obtained by subtracting the redundancy from the node’s degree.
Let $r_i$ be the redundancy of node $i$ and let $t_i$ be “the number of ties in the network (not including ties to ego)” [4] (Note: “ego” here is node $i$). The redundancy is then simply (since the network is assumed undirected, the links of $t_i$ have to be counted twice)
$$r_i = \frac{2\,t_i}{d_i}, \tag{1}$$
where $d_i$ is $i$’s degree. Notice that $r_i$ goes from 0 to $d_i - 1$. (This is also closely related to the notion of local clustering, which can be thought of as a normalized version of redundancy ranging from 0 to 1. It can easily be shown that the relationship between the local clustering $\mathit{cl}_i$ and redundancy $r_i$ of a node $i$ of degree $d_i > 1$ is given by $\mathit{cl}_i = r_i/(d_i - 1)$ [14]). The effective size $e_i$ of node $i$ is then defined as
$$e_i = d_i - r_i. \tag{2}$$
We now look at how to compute this in a matricial form. Let $A = (a_{ij})$ be the adjacency matrix of such an undirected and binary network, and let $\mathbf{d} = (d_i)$ be the vector of nodes’ degrees. It should be noted that $A$ is a symmetric matrix only containing 0s and 1s. In such a case, the vector of nodes’ degrees can be obtained in several ways, for example, as $\mathbf{d} = A\mathbf{1}$ or $\mathbf{d} = \operatorname{diag}(A^2)$, where $\mathbf{1}$ is the vector containing all 1s. Notice also that, for a binary network, the elements of the square $A^2$ count the number of common neighbors. Indeed, for every two nodes $i$ and $j$, the $(i,j)$-th element of $A^2$ is
$$(A^2)_{ij} = \sum_{k} a_{ik}\,a_{kj}, \tag{3}$$
since $a_{ik}$ is different from 0 if and only if $i$ and $k$ are linked; analogously, $a_{kj}$ is different from 0 if and only if $k$ and $j$ are linked. Obviously, we only want to count the common neighbors for pairs of nodes that are actually linked in the network. To do so, it suffices to multiply $A^2$ element by element by $A$ itself. Lastly, we want to sum all these numbers and divide them by the corresponding degree.
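The claim that the entries of the squared adjacency matrix count common neighbors can be checked numerically. The sketch below assumes Python with NumPy and uses a hypothetical 4-node network (it is not the network of Figure 1); the helper `common_neighbors` is introduced here only for the comparison.

```python
import numpy as np

# Hypothetical undirected binary network with edges 0-1, 0-2, 0-3, 1-2.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

A2 = A @ A  # standard matrix product

def common_neighbors(A, i, j):
    """Count common neighbors of i and j directly from the neighbor sets."""
    neighbors_i = set(np.flatnonzero(A[i]))
    neighbors_j = set(np.flatnonzero(A[j]))
    return len(neighbors_i & neighbors_j)

# Every entry of A2 should match the set-based count.
n = A.shape[0]
matches = all(A2[i, j] == common_neighbors(A, i, j)
              for i in range(n) for j in range(n))
print(matches)

# Keep only the counts for pairs that are actually linked.
linked_counts = A * A2
print(linked_counts.sum(axis=1))  # twice the number of ties among each node's neighbors
```

In this toy network, node 0 has neighbors 1, 2, and 3 with a single tie among them (1-2), so its row sum of `A * A2` is 2.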
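Summing up, a matricial way to compute the vector of nodes’ effective size, $\mathbf{e}$, is by computing the following vector:
$$\mathbf{e} = \mathbf{d} - \big[(A \odot A^2)\,\mathbf{1}\big] \oslash \mathbf{d}, \tag{4}$$
where $A^2$ is $A$ squared with the standard matrix multiplication and $\mathbf{1}$ is the vector containing all 1s. The $i$-th component of such a vector, $e_i$, is node $i$’s effective size. By definition, the redundancy is just the last term, that is, $\mathbf{r} = \big[(A \odot A^2)\,\mathbf{1}\big] \oslash \mathbf{d}$, where $\mathbf{r} = (r_i)$.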
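A minimal sketch of Equation (4), assuming Python with NumPy and a dense symmetric 0/1 adjacency matrix without self-loops. The function name `effective_size` and the toy network are hypothetical, and assigning redundancy 0 to isolated nodes (where the division by the degree is undefined) is a convention chosen here, not something stated in the text.

```python
import numpy as np

def effective_size(A):
    """Return (effective size, redundancy) for an undirected binary network."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                          # degree vector d = A 1
    twice_ties = (A * (A @ A)).sum(axis=1)     # (A ⊙ A²) 1 = 2 t_i
    # Assumption: isolated nodes (d_i = 0) get redundancy 0 instead of a division error.
    r = np.divide(twice_ties, d, out=np.zeros_like(d), where=d > 0)
    return d - r, r

# Hypothetical network with edges 0-1, 0-2, 0-3, 1-2 (not the one in Figure 1).
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

e, r = effective_size(A)
print(e)  # effective sizes
print(r)  # redundancies
```

For node 0 (degree 3, one tie among its neighbors), the redundancy is 2/3 and the effective size is 3 - 2/3 = 7/3, in agreement with Equations (1) and (2).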
An Example
Consider the network in
Figure 1. The adjacency matrix and the degree vector are, respectively,
For simplicity, in
A, the 0s are not indicated. Notice that self loops are not allowed.
Now, since
Equation (
4) yields the effective size for each node:
4. Local Constraint (a.k.a. Dyadic Constraint)
In this section, we adopt the same approach used in the previous section to obtain a formula based on the network’s adjacency matrix for the so-called local constraint.
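Let $A = (a_{ij})$ be the adjacency matrix of a network (not necessarily binary or undirected). That is, $A$ is not necessarily symmetric and may contain elements different from 0 and 1. The only assumption here is that no self-loop is allowed, that is, $a_{ii} = 0$ for all nodes $i$. Following Everett and Borgatti [5], the local constraint on $i$ with respect to $j$, denoted $\ell_{ij}$, is defined by
$$\ell_{ij} = \Big(p_{ij} + \sum_{k \in N(i) \setminus \{j\}} p_{ik}\,p_{kj}\Big)^2, \tag{5}$$
where $N(i)$ is the set of neighbors of $i$, and $p_{ij}$ is the normalized mutual weight of the edges joining $i$ and $j$, that is,
$$p_{ij} = \frac{a_{ij} + a_{ji}}{\sum_{k} (a_{ik} + a_{ki})}. \tag{6}$$
This is also known as the dyadic constraint. Notice that, assuming the absence of self-loops, every $p_{ii} = 0$ because $a_{ii} = 0$. This implies that the term with $k = j$ vanishes, so the second term in Definition (5) can be written as
$$\sum_{k \in N(i) \setminus \{j\}} p_{ik}\,p_{kj} = \sum_{k \in N(i)} p_{ik}\,p_{kj} \tag{7}$$
and, hence, $\ell_{ij}$ becomes
$$\ell_{ij} = \Big(p_{ij} + \sum_{k \in N(i)} p_{ik}\,p_{kj}\Big)^2. \tag{8}$$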
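Now, let us focus on $P = (p_{ij})$, writing Equation (6) in matricial terms:
$$p_{ij} = \frac{(A + A^T)_{ij}}{\mathbf{u}_i^T (A + A^T)\,\mathbf{1}}, \tag{9}$$
where $\mathbf{u}_i$ is the $i$-th unit vector and $\mathbf{1}$ is the vector containing all 1s, and let us define vector $\mathbf{s} = (s_i)$, where
$$s_i = \mathbf{u}_i^T (A + A^T)\,\mathbf{1}. \tag{10}$$
Notice that this denominator is the multiplication of a row vector times a column vector, which is a number.
Thus,
$$\mathbf{s} = (A + A^T)\,\mathbf{1}, \tag{11}$$
and we can consider the vector containing all inverted elements:
$$\mathbf{s}^{-1} = \mathbf{1} \oslash \mathbf{s}. \tag{12}$$
Notice that $A + A^T$ is always symmetric, even if $A$ is not.
We then define a matrix whose diagonal is equal to $\mathbf{s}^{-1}$ and which is 0 elsewhere, that is, $D = \operatorname{diag}(\mathbf{s}^{-1})$. Now, we can finally compute $P$ as follows:
$$P = D\,(A + A^T). \tag{13}$$
By pre-multiplying by a diagonal matrix, we are simply multiplying every row $i$ of $A + A^T$ by the corresponding element $s_i^{-1}$ of the diagonal.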
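The construction of the normalized mutual weights can be sketched as follows, assuming Python with NumPy and non-negative weights. The function name, the toy weighted directed matrix `W`, and the convention of leaving rows of isolated nodes at 0 are assumptions made for illustration.

```python
import numpy as np

def normalized_mutual_weights(A):
    """Row-normalize A + A^T, i.e., P = diag(1 ⊘ s) (A + A^T) with s = (A + A^T) 1."""
    A = np.asarray(A, dtype=float)
    S = A + A.T                      # symmetric even if A is not
    s = S.sum(axis=1)                # s = (A + A^T) 1
    # Assumption: isolated nodes (s_i = 0) keep an all-zero row.
    inv_s = np.divide(1.0, s, out=np.zeros_like(s), where=s > 0)
    # Scaling row i by inv_s[i] is the same as np.diag(inv_s) @ S.
    return inv_s[:, None] * S

# Hypothetical weighted, directed network on 3 nodes.
W = np.array([[0., 2., 0.],
              [1., 0., 3.],
              [0., 0., 0.]])

P = normalized_mutual_weights(W)
print(P)
```

Broadcasting `inv_s[:, None] * S` avoids building the diagonal matrix explicitly, which is a design choice of this sketch; the result coincides with the pre-multiplication by a diagonal matrix described in the text.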
We will now focus on $\ell_{ij}$. Consider again the second term of the definition’s formula as written in Equation (8):
$$\sum_{k \in N(i)} p_{ik}\,p_{kj} = \sum_{k} a_{ik}\,p_{ik}\,p_{kj}, \tag{14}$$
where the summation on the right-hand side is over all nodes $k$ (not just limited to $i$’s neighbors). Note that, if the network is weighted, then here one has to first compute the binary version $A$ of the weighted adjacency matrix $W$, where $a_{ij} = 1$ if and only if $w_{ij} \neq 0$; otherwise, $a_{ij} = 0$. One can then simply apply the formula written in the text.
Written in matricial form, this summation in Equation (14) can simply be expressed as
$$(A \odot P)\,P, \tag{15}$$
where ⊙ is the element-wise matricial multiplication and the second is a matrix multiplication. The modification regarding the dyadic constraint mentioned above consists in taking
here.
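To conclude, we can write the matrix $L = (\ell_{ij})$ containing all links’ local constraints as follows:
$$L = \big(P + (A \odot P)\,P\big) \odot \big(P + (A \odot P)\,P\big). \tag{16}$$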
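Summing up, the algorithm to compute the local constraints with this formula takes the adjacency matrix $A$ as input and proceeds with the following steps:
Step 1. Compute $A + A^T$ and its vector of row sums $\mathbf{s} = (A + A^T)\,\mathbf{1}$.
Step 2. Compute $P = \operatorname{diag}(\mathbf{1} \oslash \mathbf{s})\,(A + A^T)$.
Step 3. Compute the indirect term $(A \odot P)\,P$, using the binary version of $A$ if the network is weighted.
Step 4. Compute $L$ as the element-wise square of $P + (A \odot P)\,P$.
Clearly, the output of such an algorithm is the matrix $L$, where $\ell_{ij}$ is the local constraint of link $(i,j)$.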
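The algorithm just summarized can be sketched in a few lines, assuming Python with NumPy and, for simplicity, an undirected (symmetric) network with non-negative weights. The function name `local_constraint_matrix` and the toy network are hypothetical; this is an illustration of the matrix formula, not a reference implementation.

```python
import numpy as np

def local_constraint_matrix(W):
    """Local constraints as the element-wise square of P + (A ⊙ P) P."""
    W = np.asarray(W, dtype=float)
    S = W + W.T
    s = S.sum(axis=1)
    # Normalized mutual weights; rows of isolated nodes stay at 0 (assumption).
    P = np.divide(S, s[:, None], out=np.zeros_like(S), where=s[:, None] > 0)
    A = (W != 0).astype(float)       # binary version of the adjacency matrix
    M = P + (A * P) @ P              # direct term plus indirect term
    return M * M                     # element-wise square

# Hypothetical undirected binary network with edges 0-1, 0-2, 0-3, 1-2.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

L = local_constraint_matrix(A)
print(L[0, 1], L[0, 3], L[1, 0], L[3, 0])
```

In this toy network, the constraint on node 0 from node 1 combines the direct share 1/3 with the indirect path through node 2 (1/3 times 1/2), giving (1/2)² = 1/4. Entries of `L` for unlinked pairs can be non-zero; they are discarded when the node-level constraint is computed.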
5. Constraint
In line with what was done in the previous sections, here we obtain a matrix-based formula to compute the network’s constraint, starting from the local constraint computed in Section 4.
Let $L = (\ell_{ij})$ be the local constraint matrix computed in Equation (16). According to Everett and Borgatti [5], the constraint for node $i$ (also known as first-type constraint in the terminology used by [3]) is
$$c_i = \sum_{j \in N(i)} \ell_{ij}.$$
Notice that, in our notation, $N(i)$ does not include $i$ itself. To be even more clear, one could then write $j \in N(i) \setminus \{i\}$.
One can re-write this as follows:
$$c_i = \sum_{j} a_{ij}\,\ell_{ij},$$
where $A = (a_{ij})$ is the adjacency matrix. In case the network is weighted, the matrix $A$ here is the binary version of the weighted adjacency matrix $W$, as observed in Section 4.
Therefore, the vector $\mathbf{c}$ containing the constraints of the network is obtained by summing the rows of the matrix $A \odot L$:
$$\mathbf{c} = (A \odot L)\,\mathbf{1}. \tag{17}$$
Remember that, in our notation, vectors are always considered as columns.
Thus, one can use this matrix Formula (17) as an algorithm to compute the constraints of the network gathered in vector $\mathbf{c}$, where $c_i$ is node $i$’s constraint.
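As a sanity check of Formula (17), the sketch below computes the constraint vector with matrix operations and compares it with a direct loop over neighbors. It assumes Python with NumPy and an undirected network without isolated nodes; both function names and the toy network are hypothetical.

```python
import numpy as np

def constraint_matrix_form(W):
    """Constraint vector c = (A ⊙ L) 1, with L the element-wise square of P + (A ⊙ P) P."""
    W = np.asarray(W, dtype=float)
    S = W + W.T
    P = S / S.sum(axis=1, keepdims=True)   # assumes no isolated nodes
    A = (W != 0).astype(float)             # binary version of the adjacency matrix
    M = P + (A * P) @ P
    return (A * (M * M)).sum(axis=1)

def constraint_loops(W):
    """Same quantity computed from the definition, looping over neighbors."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]

    def p(i, j):
        total = sum(W[i, k] + W[k, i] for k in range(n))
        return (W[i, j] + W[j, i]) / total

    c = np.zeros(n)
    for i in range(n):
        neighbors = [k for k in range(n) if W[i, k] != 0]
        for j in neighbors:
            indirect = sum(p(i, k) * p(k, j) for k in neighbors if k != j)
            c[i] += (p(i, j) + indirect) ** 2
    return c

# Hypothetical undirected binary network with edges 0-1, 0-2, 0-3, 1-2.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

c_matrix = constraint_matrix_form(A)
c_loops = constraint_loops(A)
print(c_matrix)
print(np.allclose(c_matrix, c_loops))
```

For node 0, the three local constraints are 1/4, 1/4, and 1/9, so its constraint is 11/18; the pendant node 3 has constraint 1, since its only contact fully constrains it.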
6. Improved Structural Holes
In this section, we adopt the same approach used so far and apply it to a variation of the constraint measure called improved structural holes (ISH), developed by [13]. This measure has been used to identify key nodes in networks. Compared with centrality measures such as betweenness and closeness, it has the advantage of using only local information; compared with more local measures of centrality, such as degree centrality, it is better able to capture the importance of a node. We now describe how to compute the ISH with a matrix-based formula.
Consistently with the notation used so far, let us follow [
13] and consider their definitions.
The
node importance (also known as ISH) of
i is then defined as
In matricial form, the term within parenthesis
, where now
k is free to vary from 1 to
n, and can be written as
Thus, the vector of the ISH,
, can be computed as
Concerning the computational complexity, as already observed by [13], the use of naïve matrix multiplications to compute the ISH would result in a complexity of the order $O(n^3)$; however, a deeper analysis can show that it can be computed with a time complexity of
, where
is the average of the degrees squared.
7. Discussion and Conclusions
The notion of structural holes, proposed in Burt’s seminal work [15], has been widely used to explain how, in networks where information flows through contacts, certain positions matter more than others. Individuals classified as structural hole spanners are able to exploit the lack of connections between separate parts of the network by bridging them and acting as gatekeepers or brokers. These concepts have seen applications in the literature of management, sociology, and organization science, where the majority of the works conclude that, where such network gaps are present, the role played by intermediaries gains importance precisely because they can control the flow of information and, hence, gain a competitive advantage from their position [16,17,18,19,20].
Structural holes theory is related to the theory of the strength of weak ties [21], because weak ties are capable of connecting parts of the network that would otherwise remain distant. It is also often seen as competing with the theory in [22] on the importance of network closure, redundancy, and network density [23]. However, these theories ultimately rely on the same underlying principle: ties alter the constraint on ego [24].
Although the notion of structural holes is widespread and very often used, its actual measurement may be complicated for several reasons: First, a variety of measures have been proposed, and each one is tailored to a specific application, whose purpose is to capture a specific aspect of the structural hole notion. Second, as it often occurs for network measures, their actual computation with naïve techniques obtained from the definition formulas would require looping over every node and over every node’s neighbor.
This paper thus contributes to the previous literature by making clear that all the measures commonly associated with the notion of structural holes can indeed be computed starting from the network’s adjacency matrix, and we provide formulas that are based on this matrix and mainly use only matrix multiplication. As a consequence, these formulas can be used as naïve algorithms and directly applied to medium-sized networks to compute such structural holes measures. We also highlight the limitations of this approach: it can be used on moderately sized networks (see Figure 2), but it should be outperformed by algorithms based on more advanced techniques that optimize memory and speed, although further research is still needed to explore this aspect.