In the following subsections, we first introduce a very fast algorithm for generating a BN, then tackle the problem of the BL appearing as a function of the density of the network.
3.1. A Fast Algorithm for a BN with Maximal/Minimal Assortativity
Creating a BN is a first step that can serve as a basis for comparing and testing algorithms. The selection of 100 nodes is without loss of generality: for a different number of nodes, all that is needed is to recalculate how many nodes contribute to the count of each leading digit. The overall approach remains the same. Here, we propose an algorithm that builds a BN directly. The pseudo-code is as follows:
1. initialize a network with N nodes and 0 edges
2. assign each node its degree so as to fulfill the BL
3. Until each degree is reached:
   select the beginning and end of each edge
which, from the point of view of the adjacency matrix, reads as follows:
1. create an NxN matrix A with each element equal to 0
2. create a vector v of length N storing the degree of each node
3. Until each degree is reached:
   select i, j, and set A(i,j)=A(j,i)=1
The first step is $O(N^2)$, as it involves the creation of an NxN matrix in which each element is equal to 0. In practice, the second step creates a list in which each node is assigned the desired node degree (for instance, nodes 1–4 are assigned node degree 9, nodes 5–9 are assigned node degree 8, etc., down to the last 30 nodes with degree 1, although this is not the only possibility). This step is $O(N)$, as it consists of reading a vector with N entries.
The third step is the selection of the beginning and end of each edge. To perform the task, the list is scrolled to select the match, which in principle can be done randomly. However, random matching of the beginning and end of each link is not as fast as following a precise criterion, since it involves a pseudo-random number generator. We propose two criteria, one aiming at maximal assortativity, the other at minimal assortativity. Therefore, the last part can be detailed as follows:
1. create an NxN matrix A with each element equal to 0
2. create a vector v of length N,
   assigning the degrees in descending order
3. for each node i = 1, ..., N,
   until its node degree v(i) is reached:
      match the other end j of each edge
      with the first available node
In the above, ‘available’ means a node that is not already connected to node i and whose own node degree has not yet been reached.
Because the order of the degrees is descending, the algorithm begins with the nodes with the highest degree.
The algorithm provides a BN. This follows directly from the second step, in which the node degrees are assigned so as to fulfill the BL.
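The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the per-degree node counts in `COUNTS` are an assumption (chosen to agree with the 30 nodes of degree 1, 5 nodes of degree 8 and 4 nodes of degree 9 quoted in the text), and the names `benford_degrees` and `build_bn` are hypothetical.

```python
# Assumed number of nodes with degree 1..9 for N = 100 (not given in full
# in the text; chosen so that the counts sum to 100).
COUNTS = [30, 18, 12, 10, 8, 7, 6, 5, 4]


def benford_degrees(counts=COUNTS):
    """Step 2: degree vector v, listed in descending order of degree."""
    v = []
    for degree in range(9, 0, -1):
        v.extend([degree] * counts[degree - 1])
    return v


def build_bn(v):
    """Steps 1 and 3: greedy matching with the first available node."""
    n = len(v)
    A = [[0] * n for _ in range(n)]  # step 1: N x N matrix of zeros
    remaining = list(v)              # edges still missing at each node
    for i in range(n):
        j = 0
        while remaining[i] > 0 and j < n:
            # 'available': distinct from i, not yet linked to i,
            # and with its own degree not yet reached
            if j != i and A[i][j] == 0 and remaining[j] > 0:
                A[i][j] = A[j][i] = 1
                remaining[i] -= 1
                remaining[j] -= 1
            j += 1
    return A, remaining


v = benford_degrees()
A, remaining = build_bn(v)
edges = sum(map(sum, A)) // 2  # each edge is stored twice in A
print(edges, max(remaining))
```

Under the assumed counts, every entry of `remaining` ends at zero, so each node reaches its assigned degree without any loop being needed. For other degree lists the greedy scan may leave some degrees unmet, which is the situation discussed in Remark 8.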
Figure 2 shows the network.
Remark 6. A network obtained in this way gives rise to the maximal assortativity. The descending order ensures that nodes with a high degree first form edges with other high-degree nodes, and form edges with lower-degree nodes only when there is no better possibility [1]. Because assortativity is the correlation among the node degrees, any inversion in the sequence immediately decreases the values in the formula. The complexity of this match is that of one scan of the list to assign the node degrees ($O(N)$) followed by a second scan to find the first available node; thus, with N nodes the complexity is $O(N^2)$, which is much faster than any random rewiring procedure, as it avoids the computational time needed for the pseudo-random generator.
Remark 7. The proposed algorithm has a computational time $O(N^2)$.
Remark 8. Here, we introduce a condition to avoid loops (i.e., edges whose two ends are the same node) except where strictly necessary to match the degree list. In fact, generally speaking, not all assignments of degrees to the nodes are compatible with the topology of a network. Figure 3 shows this issue. If loops are not allowed and four nodes out of five have degree four, then the fifth node needs to have degree four as well. For instance, if we assign degree 3 to the fifth node, we need to remove one link; hence one of the other nodes, say node b, needs to have its degree decreased to 3 as well. Therefore, the set of degrees is incompatible with the network unless we allow loops. When running the algorithm on a network of 100 nodes, 171 edges are created, which corresponds to $\frac{1}{2}\sum_{d=1}^{9} d\,n_d$, where $d$ is the node degree and $n_d$ is the number of nodes with degree $d$. The constant $\frac{1}{2}$ is needed due to the bidirectional role of the edges. The density is , and there are no loops.
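Whether a given list of degrees can be realized without loops can be checked before running the construction. The sketch below uses the Erdős–Gallai criterion, a standard graph-theoretic test that is brought in here for illustration and is not part of the procedure described in the text; the function name `is_graphical` is hypothetical.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: True if the degrees can be realized by a
    network without loops and without repeated edges."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2:  # every edge adds 2 to the sum of the degrees
        return False
    n = len(d)
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


print(is_graphical([4, 4, 4, 4, 4]))  # five nodes, all of degree four
print(is_graphical([4, 4, 4, 4, 3]))  # the incompatible set of Remark 8
print(is_graphical([4, 4, 4, 4, 2]))  # even degree sum, still incompatible
```

The first list corresponds to the complete network on five nodes; the other two fail the test, so they can only be matched if loops are allowed.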
The condition on the match among nodes with the closer (higher) degree can be inverted, setting the connections among the nodes with either the highest node degree or the lowest one. The result continues to be a BN network, as the requirements on the BN are unchanged, except now with the assortativity slightly negative and very close to 0, with one link less than needed, resulting in the need to add one loop. The density remains the same. The computational complexity remains the same as well, as the list is simply scrolled in the reverse direction.
Figure 4 shows the network.
Remark 9. This is not the only way to create a BN. For instance, in a BN, a node with degree 1 makes the same contribution to the distribution as a node with degree 10, 11, ⋯, 19, as the leading digit remains 1. In general, a node contributes to the count of a leading digit x if it has x, $10x$, $10x+1$, ⋯, $10x+9$ edges, meaning that each node degree may take 11 different values and still contribute to the count of the same leading digit. The computational complexity remains the same as a function of N, as $N-1$ is an upper limit for the edges departing from every single node. However, keeping the node degrees as low as possible contributes to the speed of the algorithm (obviously, creating 30 connections for the set of the 30 nodes with degree 1 is 10 times faster than creating 30 × 10 connections in which each node with degree 1 is replaced by a node with degree 10).
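The 11 degrees that share a leading digit can be listed explicitly. The following minimal sketch assumes degrees below 100, as in the 100-node example; the helper names `leading_digit` and `admissible_degrees` are hypothetical.

```python
def leading_digit(k):
    """Leading decimal digit of a positive integer."""
    while k >= 10:
        k //= 10
    return k


def admissible_degrees(x):
    """Degrees below 100 whose leading digit is x: x itself and 10x, ..., 10x + 9."""
    return [x] + list(range(10 * x, 10 * x + 10))


for x in range(1, 10):
    print(x, admissible_degrees(x))
```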
3.2. The BN as a Function of the Density of the Network
This section recalls the first results on random networks, where the task was to understand the density required for particular properties. Many densities can be compatible with the validity of the BL on the node degree distribution because only the leading digit contributes to the BL. The same argument as in Remark 9 allows us to calculate the total number of BNs which can be obtained from a network with N nodes. If the identity of each node has to be kept the same, then there are $11^N$ BNs (11 possible values for each of the N nodes, where each value can be taken independently of the values of the other nodes). The number of possible networks is simply too high for an exhaustive analysis. If we focus on network topology, the identification number of each node is not relevant. For instance, in a group of four nodes with the leading digit of the degree equal to 9, it is not relevant if the first has degree 9 and the remaining three have degree 99, or if the second has degree 9 and the others have degree 99. What matters is how many have degree 9, 90, 91, ⋯, 99. Therefore, in each set of nodes having the same leading digit, the number of possible assignments for the node degree is calculated as the number of combinations with repetition of 11 objects. Two combinations with repetition are considered identical if they have the same elements repeated the same number of times, regardless of their order. Recall that the number of combinations with repetition of $n$ elements taken $k$ at a time is $\binom{n+k-1}{k}$. Therefore, the total number of networks with topologies different from each other is $\prod_{d=1}^{9}\binom{n_d+10}{n_d}$ different configurations, where $n_d$ is the number of nodes whose degree has leading digit $d$. As this number of networks remains too high for exhaustive generation and analysis of each, we fix a discrete set of densities.
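To give a sense of the magnitudes involved, the two counts can be evaluated numerically. This is a sketch under stated assumptions: the per-digit node counts in `counts` are assumed for N = 100 (they are not listed in full in the text), and `multisets` is a hypothetical helper implementing the standard formula for combinations with repetition.

```python
from math import comb, prod

# Assumed number of nodes per leading digit 1..9 for N = 100.
counts = [30, 18, 12, 10, 8, 7, 6, 5, 4]


def multisets(n_objects, k):
    """Combinations with repetition of n_objects elements taken k at a time."""
    return comb(n_objects + k - 1, k)


# Nodes keep their identity: 11 independent choices per node.
labelled = 11 ** sum(counts)

# Node identity ignored: one multiset of 11 admissible degrees per digit.
unlabelled = prod(multisets(11, n_d) for n_d in counts)

print(multisets(11, 4))   # assignments for the four nodes with leading digit 9
print(len(str(labelled)), len(str(unlabelled)))  # number of decimal digits
```

Even after discarding node identities, the count remains far beyond what can be enumerated, which is consistent with restricting the analysis to a discrete set of densities.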
In this section, we first perform a preliminary analysis of the range of densities of BNs, then obtain a picture of the assortativity as a function of the densities through a rewiring procedure.
3.2.1. Analysis of the Range of Densities of BNs
Keeping $N = 100$ as our reference, a BN in which each node has at least one link and with the minimum number of edges (that is, minimum density) is the same as the one built in the previous section. In fact, the set of node degrees is the lowest which can fit the BL. Possibly lower densities of a BN can be obtained if some nodes have 0 connections, allowing the percentages of the node degrees to fit the BL despite being calculated on a lower number of nodes. Alternatively, if we want to increase the number of edges, the minimal amount which we have to add is 9, to move from a node with degree 1 to one with degree 10. This results in a gap in the possible set of densities, while after this value there can be many BNs with intermediate values for the densities, up to the one with the maximum number of edges. The latter has 30 nodes with degree 19, 18 nodes with degree 29, ⋯, and 4 nodes with degree 99, due to the role of the leading digit. The number of edges is 2160, which corresponds to $\frac{1}{2}\sum_{d=1}^{9}(10d+9)\,n_d$, where $10d+9$ is the node degree and $n_d$ is the number of nodes with leading digit $d$. The density is .
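The two extreme edge counts can be checked by hand. The worked computation below assumes the per-digit node counts 30, 18, 12, 10, 8, 7, 6, 5, 4 (only the first two and the last two are stated in the text), and it assumes the usual definition of density for a network without loops, $\rho = 2E/(N(N-1))$; the symbols $E_{\min}$, $E_{\max}$ and $\rho$ are introduced here for illustration only.

```latex
\begin{align*}
E_{\min} &= \tfrac{1}{2}\,(30\cdot 1 + 18\cdot 2 + 12\cdot 3 + 10\cdot 4 + 8\cdot 5
            + 7\cdot 6 + 6\cdot 7 + 5\cdot 8 + 4\cdot 9) = \tfrac{342}{2} = 171,\\
E_{\max} &= \tfrac{1}{2}\,(30\cdot 19 + 18\cdot 29 + 12\cdot 39 + 10\cdot 49 + 8\cdot 59
            + 7\cdot 69 + 6\cdot 79 + 5\cdot 89 + 4\cdot 99) = \tfrac{4320}{2} = 2160,\\
\rho_{\min} &= \frac{2\cdot 171}{100\cdot 99} \approx 0.035, \qquad
\rho_{\max} = \frac{2\cdot 2160}{100\cdot 99} \approx 0.436 .
\end{align*}
```

Both edge counts agree with the values of 171 and 2160 reported in the text; the density values hold only under the assumed definition.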
3.2.2. Rewiring Algorithm
Rewiring is a fairly direct method for achieving a target topology. The pseudo-code can be outlined as follows:
1. start from a random seed network with the required density
2. while the network is not a BN
(or the maximal number of trials is reached)
2.a select a link for the rewire
2.b if the rewire produces a network closer to a BN:
then accept the rewire
otherwise skip
end
3. store the distances from a BN
4. report the data in a figure
Remark 10. A notion of distance from a BN is essential for measuring whether the resulting network is closer to a BN, and thus whether to accept the rewire.
Remark 11. The algorithm follows a descending direction (i.e., the rewire is accepted only if the distance from a BN decreases).
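The descending rewiring loop can be sketched as follows. Several choices here are assumptions made for illustration only: the distance is taken to be the Euclidean distance between the leading-digit histogram and the BL vector with 9 components, isolated nodes are left out of the histogram, a network is treated as a BN once the distance falls below a tolerance `tol`, and the names `distance_from_bl` and `rewire_towards_bn` are hypothetical.

```python
import math
import random

# Benford's law: expected frequency of each leading digit 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(k):
    while k >= 10:
        k //= 10
    return k


def distance_from_bl(degrees):
    """Assumed distance: Euclidean norm of (histogram - BL)."""
    positive = [k for k in degrees if k > 0]  # isolated nodes are ignored
    freq = [0.0] * 9
    for k in positive:
        freq[leading_digit(k) - 1] += 1 / len(positive)
    return math.sqrt(sum((f - b) ** 2 for f, b in zip(freq, BENFORD)))


def rewire_towards_bn(n, m, max_trials=5000, tol=1e-2, seed=0):
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edge_list = rng.sample(pairs, m)      # step 1: random seed network, m >= 1
    edge_set = set(edge_list)
    degrees = [0] * n
    for u, v in edge_list:
        degrees[u] += 1
        degrees[v] += 1
    start = best = distance_from_bl(degrees)
    for _ in range(max_trials):           # step 2: bounded number of trials
        if best <= tol:
            break
        idx = rng.randrange(m)            # 2.a: link selected for the rewire
        new = rng.choice(pairs)
        if new in edge_set:
            continue
        old = edge_list[idx]
        trial = degrees[:]
        for node in old:
            trial[node] -= 1
        for node in new:
            trial[node] += 1
        d = distance_from_bl(trial)
        if d < best:                      # 2.b: accept only if closer to a BN
            edge_set.remove(old)
            edge_set.add(new)
            edge_list[idx] = new
            degrees, best = trial, d
    return edge_list, start, best         # step 3: distances to be stored


edge_list, start, best = rewire_towards_bn(100, 171)
print(round(start, 3), round(best, 3))
```

Because a rewire is accepted only when the distance decreases, the final distance can never exceed the initial one, and the number of edges (hence the density) is unchanged throughout.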
Conformity tests can provide an answer regarding either the rejection or acceptance of a probability distribution; this answer need not be only a yes/no, and various scales of conformity degrees can be used [7,9,27]. However, conformity tests are not the best choice for running simulations. Those that provide only four degrees of acceptance (‘conformity’, …, ‘nonconformity’) are too coarse to form a basis for simulations. Moreover, it is easy to find through cross-checking that all the conformity tests are computationally more expensive than calculating a histogram and the distance from a vector with 9 components. Our notion of distance does not aim at providing a conformity test, although it can be used to elaborate on the matter as soon as bounds are defined.
Here, we deepen our analysis by focusing on the assortativity. The starting point is a BN; the rewiring aims at either increasing or decreasing the assortativity while maintaining the BN and allowing swapping of the edges. The algorithm is an iterative one, and can be outlined as follows:
1. select two links of a BN network
2. if the swap increases (decreases) the assortativity,
then accept the swap
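A swap of the ends of two links leaves every node degree unchanged, so the network stays a BN while the assortativity moves. The sketch below illustrates one such iteration scheme; it assumes that assortativity is computed as the Pearson correlation between the degrees at the two ends of each edge (counting every edge in both directions), and the names `assortativity` and `swap_for_assortativity` are hypothetical.

```python
import random


def assortativity(edges, degrees):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, v in edges:                # each edge counted in both directions
        xs += [degrees[u], degrees[v]]
        ys += [degrees[v], degrees[u]]
    mean = sum(xs) / len(xs)          # xs and ys share the same mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var > 0 else float("nan")


def swap_for_assortativity(edges, n, steps=2000, increase=True, seed=0):
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    degrees = [0] * n
    for u, v in edge_list:
        degrees[u] += 1
        degrees[v] += 1
    current = assortativity(edge_list, degrees)
    for _ in range(steps):
        i, j = rng.sample(range(len(edge_list)), 2)   # step 1: two links
        (a, b), (c, d) = edge_list[i], edge_list[j]
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # reject swaps that would create a loop or a repeated edge
        if a == d or c == b or e1 in edge_set or e2 in edge_set:
            continue
        trial = edge_list[:]
        trial[i], trial[j] = e1, e2
        r = assortativity(trial, degrees)             # degrees are unchanged
        if (r > current) if increase else (r < current):   # step 2
            edge_set -= {edge_list[i], edge_list[j]}
            edge_set |= {e1, e2}
            edge_list, current = trial, r
    return edge_list, current


# Star with three leaves: every edge joins degree 3 to degree 1.
print(assortativity([(0, 1), (0, 2), (0, 3)], [3, 1, 1, 1]))
```

Setting `increase=False` drives the assortativity downward instead. Recomputing the correlation from scratch at every step keeps the sketch short; an incremental update would be faster.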
Table 4 summarizes the results of the simulation steps. The first row reports the densities which were examined, sampled with a finer step below a threshold density, to obtain fine detail, and with a coarser step above it. The BN appears over a range of densities, shown in bold. Below the minimal density, a BN can still be found if the histogram is calculated on the nodes which have at least one edge.
Figure 4 shows a graphical representation of the results of Table 4.
3.2.3. An Intermediate Algorithm for the Immediate Construction of a BN and Random Rewiring
The distance from a BN which we use for simulation is fast and accurate. It does not involve a rewiring process, which is computationally more expensive. Suppose, however, that a seed network is assigned as a starting point for simulations. Is it possible to drive the rewiring without random selection of the nodes to be checked? In other words, can the edges to be rewired be selected in a targeted way? The answer to this question provides a way of targeting the rewiring process. To outline this through an example, we refer to Figure 1, specifically the High Energy Physics collaboration network. The maximal distance from the BL is in the bin corresponding to the leading digit 5. Removing edges from that set of nodes would quickly improve the proximity of the distribution of the node degrees to the BL. Of course, this targeted selection can be carried out using the distance already introduced, by working on the nodes of each bin instead of selecting them at random. The computational time is $O(N^2)$, as it involves a double reading of a list to determine which nodes need to have edges removed or added, followed by another scrolling of the list of nodes to find the match.
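The targeted selection can be illustrated by locating the leading-digit bin that deviates most from the BL. This is a sketch under assumptions: the deviation is measured as the signed difference between the observed frequency and the BL value, at least one node is assumed to have an edge, and the names `bin_deviations` and `worst_bin` are hypothetical.

```python
import math

# Benford's law: expected frequency of each leading digit 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(k):
    while k >= 10:
        k //= 10
    return k


def bin_deviations(degrees):
    """Observed minus expected frequency for each leading digit 1..9."""
    positive = [k for k in degrees if k > 0]  # assumes at least one such node
    freq = [0.0] * 9
    for k in positive:
        freq[leading_digit(k) - 1] += 1 / len(positive)
    return [f - b for f, b in zip(freq, BENFORD)]


def worst_bin(degrees):
    """Leading digit with the largest absolute deviation, and its sign."""
    dev = bin_deviations(degrees)
    digit = max(range(1, 10), key=lambda d: abs(dev[d - 1]))
    return digit, dev[digit - 1]


digit, excess = worst_bin([5, 52, 5, 1, 2, 57, 3, 5])
print(digit, excess > 0)
```

A positive deviation marks a bin from which edges should be removed (or to which edges should be added, so that nodes move to another bin), while a negative one marks a bin that needs more nodes; the nodes to rewire can then be drawn from that bin rather than at random.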