Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.


Introduction
Network science [1][2][3][4] is one of the most rapidly advancing scientific fields of investigation. The success of this field is deeply rooted in its interdisciplinarity. In fact, network science characterizes the underlying structure and dynamics of complex systems ranging from on-line social networks to molecular networks and the brain. Additionally, the theoretical tools and techniques used by network science are coming from different disciplines including statistical mechanics, statistics, machine learning and computer science.
In the last twenty years significant attention has been devoted to modelling frameworks for complex networks. Since most real networks, from the Internet to molecular networks, are sparse, i.e., they have an average degree that does not depend on the network size, statistical mechanics models focus on modelling sparse networks. These statistical mechanics models can be divided [...] We also compare the correlations of the real dataset with those of the simulation results to show that the simulations are able to generate only weak degree correlations. Therefore a more refined model should be formulated to capture this additional network property.
The paper is structured as follows. In Section 2 we introduce the definition of the desired statistical properties of network models: projectivity and exchangeability. In Section 3 we discuss major examples of sparse network models (the Barabási-Albert model and the uncorrelated network ensembles) and characterize them with respect to the properties of projectivity and exchangeability. In Section 4 we present an account of the difficulties in combining projectivity and exchangeability with the sparseness of networks and we give a brief review of the approaches investigated in the recent literature on the subject. In Section 5 we present a network process mimicking a network sampling process. We characterize its structural and dynamical properties, relating this non-equilibrium model to equilibrium uncorrelated network ensembles, and we characterize its statistical properties. In Section 6 we show the possible use of the proposed network process as a null model for modelling real power-law network datasets. Finally in Section 7 we give the conclusions.

Statistical Terms
Projectivity and exchangeability are two very basic and very natural statistical requirements for reliable statistical network models. In physical terms, projectivity is directly related to the principle of locality, while exchangeability is related to symmetry. In this section, we first discuss projectivity and exchangeability to make clear that they really are "must-haves" in any statistically useful network model, while in the next two sections we will comment on difficulties in combining them both in models of sparse networks, i.e., networks having average degree independent of the network size $N$ [41]. While projectivity and exchangeability are desired properties of statistically reliable network models, the relevance of these requirements for any realistic network model is a subject of scientific debate (see for instance the contribution of Karthik Bharath in the discussion of the F. Caron and E. Fox paper [33]). In fact it is often observed that most real networks can hardly be exchangeable. Indeed, in the vast majority of real networks nodes carry labels related to some rich metadata, and a random permutation of the node labels would result in a different network whose probability of being produced by the same stochastic process is certainly not expected to equal the probability with which that process generates the real network.
In order to investigate the properties of reliable statistical models we consider a network process mimicking the successive sampling of a network by expanding the set of sampled nodes and detecting all the interactions among this set of nodes.
To this end we consider a set of networks $\{G_t\}_{t = 1, 2, \ldots}$ with $G_t = (V_t, E_t)$ and increasing network size $N_t = |V_t| = t$. The sequence of networks defines a network process, i.e., $G_t = (V_t, E_t)$ is an induced subgraph of the network $G_{t'} = (V_{t'}, E_{t'})$ for all $t < t'$, with node set $V_t \subset V_{t'}$ if $t < t'$. We label the nodes in order of their appearance in the network such that [...] where the projective map $\pi_{t',t}$ maps networks $G_{t'}$ of a larger size $t' > t$ to their subgraph $G_t$ of a smaller size $t$.
In other words this means that one can first generate a larger graph $G_{t'}$ using the model, then reduce its size to $t$ by throwing out some $t' - t$ nodes according to the projective map specification, and the probability with which the resulting graph $G_t$ is generated using this two-step procedure will be the same as if graph $G_t$ was generated by the model directly.
[...] it is easy to see that we should also have [...]. Therefore to guarantee projectivity the expected degree of each node should grow linearly with the network size, resulting in a dense network with the total number of links $L$ scaling with the network size $N$ as $L = O(N^2)$. This implies that the random network $G(N, p)$ with $p$ independent of $N$ is a projective model whereas the Poisson random network $G(N, p)$ with $p = z/N$ and $z$ independent of $N$ is not projective. In fact one cannot throw out $N' - N$ nodes from a network of size $N'$ produced by $G(N', z/N')$ and hope that the resulting network will have the same probability as in $G(N, z/N)$, simply because the links in the $G(N', z/N')$ and $G(N, z/N)$ ensembles exist with different probabilities $z/N'$ and $z/N$ that depend on the graph size. Alternatively, if one attempts to formulate $G(N, z/N)$ as a growing model, then since the edge existence probability depends on $N$, the addition of a new node affects the probability of existence of edges in the existing network. Since this probability is a decreasing function of $N$ ($z/N$), upon the addition of a new node all the existing edges must be removed with some probability ($1/N$). In other words, in such a growing model new node additions must necessarily affect the existing network structure.
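The mismatch between $G(N', z/N')$ and $G(N, z/N)$ can be illustrated numerically. The sketch below compares the expected number of links among $N$ nodes generated directly with the expected number among $N$ nodes retained from a larger graph. The function names and the values $z = 4$, $N = 100$, $N' = 1000$ are illustrative assumptions, not taken from the paper.

```python
def edge_probability(n_nodes: int, z: float) -> float:
    """Link probability of the sparse random graph G(N, z/N)."""
    return z / n_nodes


def expected_links(n_sub: int, p: float) -> float:
    """Expected number of links among n_sub nodes when every pair is linked with probability p."""
    return p * n_sub * (n_sub - 1) / 2


z = 4.0                      # illustrative average-degree parameter
n_small, n_large = 100, 1000  # illustrative sizes N and N'

# Links among N nodes when the graph is generated directly at size N.
direct = expected_links(n_small, edge_probability(n_small, z))

# Links among N nodes kept after generating at size N' and discarding N' - N nodes.
projected = expected_links(n_small, edge_probability(n_large, z))

print(direct, projected)  # the two values differ by the factor N'/N
```

The two-step procedure yields a subgraph that is sparser by a factor $N'/N$, which is why a size-dependent connection probability is incompatible with projectivity.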

Impasse with Sparsity
Surprisingly, combining projectivity and exchangeability with the additional constraint of sparsity, i.e., the requirement that the average degree of the sampled networks is independent of the network size, has been a major impasse. If we exclude spatially embedded networks [31], to the best of our knowledge there exists no model of sparse networks that would be both projective and exchangeable at the same time. This situation is in stark contrast with the case of dense graphs. Dense graphs are known to have well-defined thermodynamic limits known as graphons, and any graphon-based network model is both exchangeable and projective [30].
The thermodynamic limits of sparse graphs are at present quite poorly understood, which appears to be one of the reasons behind the mentioned impasse. Several attempts have been made to understand the limits of sparse graphs, including, for example, sparse $L^p$ graphons [32], which are not projective, or stretched graphons, a.k.a. graphexes [33][34][35]. In the latter case, graphs are sparse, exchangeable and projective, but with two major caveats: (1) the average degree cannot be constant, it must diverge with $N$ (but possibly slower than linearly); (2) exchangeability is completely redefined: it is not with respect to node labels $1, \ldots, N$, but with respect to artificial labels which are positive real numbers.
Another class of attempts suggests giving up completely on the node label exchangeability requirement and considering edge exchangeability instead, e.g., using variations of Pitman-Yor processes [36][37][38]. It remains unclear at present whether these developments imply that the many network models that were found to be quite useful in practice and that do use node labels $1, \ldots, N$ are statistically hopeless. It seems more likely that further research is needed to understand and resolve this projectivity vs. exchangeability impasse in sparse network models.

Proposed Solution of the Impasse Based on Network Geometry
In [31] it was shown that a generic network model is projective if the probability of edge existence, i.e., the connection probability, does not depend on the network size N. In fact if the connection probability does depend on N, then, the addition of new nodes to the existing network in the growing formulation of the model necessarily affects the existing network structure and the network cannot be projective.
In order to formulate network models in which the connection probability does not depend on the network size $N$, embedding networks in space can turn out to be very useful. In fact spatially embedded networks can combine projectivity with a constant average degree [31], as their spatial embedding ensures projectivity when the connection probability is local and nodes typically connect to nodes that are spatially close. For instance if the nodes are uniformly distributed in $\mathbb{R}^2$ and each node connects only to the nodes within a constant radius $r_0$, by sampling the network by progressively expanding the spatial region of interest we can build a projective model with constant average degree. This is clearly a realistic scenario in most real networks as it is unlikely that a local event in a spatial network causes a global change in the network. For instance in the Internet, the appearance of a new customer of a local Internet provider in Bolivia cannot lead to immediate severance of customers by a local Internet provider in Bhutan.
It turns out that models that are not explicitly constructed from spatial embeddings can also be analysed using geometrical arguments, hence shedding light on their statistical properties. In this vein, it was recently shown that the hypersoft configuration model, which defines maximum-entropy random graphs with a given degree distribution, is sparse and either exchangeable or projective [39]. Both sparsity and exchangeability definitions are traditional in the model, i.e., the average degree is constant and exchangeability is with respect to labels 1, . . . , N, so that the only caveats are in "either-or" and also in that this "either-or" is achieved only for specific degree distributions (power law with exponent γ = 3 in [39]).
In the exchangeable equilibrium formulation of the model, nodes are points sprinkled at random onto an interval $A_N$ of an $N$-dependent length $L_N$, where $L_N$ is a growing function of $N$, according to a non-uniform point density (if this point density is exponential, then the resulting degree distribution is a power law), and then all pairs of points/nodes $i$ and $j$, $j > i = 1, \ldots, N$, at sprinkled coordinates $x_i$ and $x_j$ are connected by an edge with the entropy-maximizing Fermi-Dirac connection probability that does not depend on the network size $N$.
In the projective growing formulation of the same model, the interval $A_N$ grows with $N$, its length growing according to $L_N$; new node $N + 1$ appears in the interval increment $A_{N+1} \setminus A_N$ of length $L_{N+1} - L_N$, and then connects to existing nodes with the same connection probability as in the exchangeable formulation.
The difficulty of combining projectivity and exchangeability is evident in this example: in the exchangeable formulation, node labels $i$ are random and uncorrelated with their coordinates $x_i$, while in the projective formulation, nodes are labelled in the increasing order of their coordinates: $i < j \Leftrightarrow x_i < x_j$. If nodes are labelled this way, then the projective map $\pi_{N',N}$ trivially throws out nodes with labels $N + 1, \ldots, N'$, and the resulting graph satisfies the projectivity requirement since the connection probability does not depend on $N$, and since the remaining $N$ nodes lie in $A_N$. If the node labels are random, however, as they are in the exchangeable formulation, then it remains unclear if even an asymptotically correct projective map can be constructed.

Statistical Mechanics Model with Hidden Variables
Our goal here is to reconcile sparseness with a reliable statistical modelling framework without assuming the existence of an embedding geometrical space. In this endeavour we will define a projective network process yielding a sequence of networks growing by the successive addition of nodes and links. To each node $i$ we associate a hidden variable $\theta_i$ that is a proxy for the degree that the node will acquire in the model. When we average over all the possible sequences determining the successive addition of the links, the statistical properties of the network model obey scaling laws and reduce to those of the uncorrelated network model of any size $N$ in the sparse regime.
Although this model does not ultimately reconcile sparseness with both exchangeability and projectivity, we will see in Section 6 that it provides a very reliable null model for power-law networks even if only a subsample of the original network is considered.

The Model
The model can be interpreted as a weighted growing network model where we allow multiedges. In the model every node $i$ is assigned a hidden variable $\theta_i$ drawn from a hidden variable distribution $\rho(\theta)$.
Starting at $t = 1$ from a single isolated node, at each time $t > 1$ a new node $i$ is added to the network and draws $\kappa_i$ links to the existing nodes of the network, where $\kappa_i$ is chosen according to the Poisson distribution with average $\theta_i$, i.e.,
$$\hat{P}(\kappa_i|\theta_i) = \frac{\theta_i^{\kappa_i}}{\kappa_i!}\, e^{-\theta_i}.$$
Each new link is attached to a node $j$ already present in the network with probability
$$\Pi_j = \frac{\theta_j}{\sum_{r=1}^{i-1} \theta_r}.$$
Note that not all the new links will yield new connections because the nodes $i$ and $j$ might be already connected. Additionally note that this model does not implement preferential attachment, as the linking probability depends only on the externally attributed hidden variable $\theta_j$ of the target node and not on its dynamically acquired degree $k_j$. Whenever a new link connects node $i$ to an already connected node $j$ the multi-edge between node $i$ and node $j$ is reinforced, i.e., the weight of the link between node $i$ and node $j$ increases by one.
Here and in the following we will indicate by $\mathbf{a}$ the adjacency matrix of the network, by $t_i$ the time at which node $i$ has been added to the network, by $k_i$ the node degree and by $s_i$ the node strength, i.e., the sum of the weights of the links incident to node $i$.
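The growth rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes that each new link chooses its target $j$ with probability proportional to $\theta_j$, normalised over the nodes already present, and that all hidden variables are strictly positive. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for moderate means only: exp(-mean) underflows for very large means.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def grow_network(thetas, seed=0):
    """Return link weights {(j, i): w} with j < i for the growing process.

    Node 0 starts isolated. Node i arrives at step i and draws Poisson(theta_i)
    links; each link targets an earlier node j with probability proportional
    to theta_j (assumed normalisation: sum of theta over earlier nodes).
    """
    rng = random.Random(seed)
    weights = defaultdict(int)
    for i in range(1, len(thetas)):
        kappa = sample_poisson(thetas[i], rng)
        if kappa == 0:
            continue
        # Slicing makes this O(N^2) overall; fine for a sketch, not for large N.
        targets = rng.choices(range(i), weights=thetas[:i], k=kappa)
        for j in targets:
            weights[(j, i)] += 1  # a repeated target reinforces the multi-edge
    return dict(weights)


def strengths_and_degrees(weights, n_nodes):
    """Strength s_i (sum of link weights) and degree k_i (number of distinct neighbours)."""
    strengths = [0] * n_nodes
    degrees = [0] * n_nodes
    for (j, i), w in weights.items():
        strengths[i] += w
        strengths[j] += w
        degrees[i] += 1
        degrees[j] += 1
    return strengths, degrees
```

Calling `grow_network([2.0] * 1000)` followed by `strengths_and_degrees(...)` gives one realisation with identical hidden variables; replacing the list by power-law distributed values gives the heterogeneous case discussed in the following sections.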

The Strength of a Node and Its Dependence on the Hidden Variable θ
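The hidden variable $\theta_i$ modulates the temporal evolution of the strength of node $i$. In fact in the mean-field approach [1,5,44], since at each time an average of $\langle\theta\rangle$ links are added or reinforced, the average strength $s_i(t|t_i, \theta_i, \kappa_i)$ of node $i$ given the time $t_i$ of its arrival in the network, its hidden variable $\theta_i$ and its initial strength $\kappa_i$ obeys the equation
$$\frac{d s_i}{dt} = \langle\theta\rangle \frac{\theta_i}{\langle\theta\rangle t} = \frac{\theta_i}{t}, \qquad s_i(t_i) = \kappa_i,$$
with solution
$$s_i(t|t_i, \theta_i, \kappa_i) = \kappa_i + \theta_i \ln\left(\frac{t}{t_i}\right).$$
Therefore in this model the strength depends both on the time of arrival of the node in the network and on its hidden variable. If we average the strength over the nodes with the same hidden variable, however, we see that the average strength $\bar{s}_i(\theta_i)$ of nodes with hidden variable $\theta_i$ is given in the large network limit $t \gg 1$ by
$$\bar{s}_i(\theta_i) = 2\theta_i.$$
In fact we have
$$\bar{s}_i(\theta_i) = \frac{1}{t}\sum_{t_i=1}^{t}\sum_{\kappa_i} \hat{P}(\kappa_i|\theta_i)\, s_i(t|t_i, \theta_i, \kappa_i) \simeq \theta_i + \theta_i \int_0^1 dx\, \ln\frac{1}{x} = 2\theta_i.$$
This implies that if we attribute to a node a hidden variable $\theta_i$ and we consider a set of models in which the time of arrival of node $i$ is taken randomly, the strength of node $i$ is (on average over the different network models) determined only by its hidden variable.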
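The averaging over arrival times can be checked numerically. The sketch below assumes the mean-field form $s_i = \kappa_i + \theta_i \ln(t/t_i)$, replaces $\kappa_i$ by its Poisson mean $\theta_i$, and averages over arrival times $t_i = 1, \ldots, t$. The function names are illustrative.

```python
import math


def mean_field_strength(theta, kappa, t, t_i):
    """Assumed mean-field strength of a node that arrived at t_i, observed at time t."""
    return kappa + theta * math.log(t / t_i)


def average_strength(theta, t):
    """Average over uniformly random arrival times, with kappa set to its mean theta."""
    total = sum(mean_field_strength(theta, theta, t, t_i) for t_i in range(1, t + 1))
    return total / t


# For large t the average approaches twice the hidden variable.
print(average_strength(3.0, 100_000))
```

The finite-size deviation from $2\theta_i$ decays roughly as $\ln(t)/t$, so the limit is already accurate for moderately large networks.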

Strength Distribution
The strength distribution of the model is a convolution of exponentials. To find the strength distribution we use the master equation approach [44] under the assumption that the hidden variable distribution has a well defined average value $\langle\theta\rangle$. To this end we write the equation for $N^t_\theta(s)$, the average number of nodes with hidden variable $\theta$ that have strength $s \geq 0$ at time $t$, as [...] where $\delta(x, y)$ indicates the Kronecker delta and where we denote by $\Pi(\theta)$ the probability that a node with hidden variable $\theta$ is attached to the new node arrived in the network at time $t$ by one of its connections, i.e.,
$$\Pi(\theta) = \frac{\theta}{\langle\theta\rangle t}.$$
Given the continuous growth of the network, asymptotically in time, for $t \gg 1$, it is possible to assume that
$$N^t_\theta(s) \simeq t\, P_\theta(s),$$
where $P_\theta(s)$ is the probability that a random node has strength $s$ and hidden variable $\theta$. By inserting this asymptotic expression in the master Equation (21) and solving for $P_\theta(s)$ we get [...] Therefore given the value of the hidden variable $\theta$ and the initial number of links $\kappa$ the strength distribution is exponential. The overall strength distribution $P(s)$ of the model, determining the probability that a random node has strength $s$, is given by the integral of $P_\theta(s)$ over all possible values of the hidden variable $\theta$, i.e.,
$$P(s) = \int d\theta\, P_\theta(s). \tag{25}$$
This result reveals that the strength distribution can be different from the distribution of hidden variables. For instance if all the hidden variables are the same, the strength distribution will still allow for fluctuations of the strengths. However for power-law hidden variable distributions the strength distribution has a power-law tail with the same exponent $\gamma$ for $s \gg 1$. In fact, by inserting the explicit expression of $\hat{P}(\kappa|\theta)$ and of $\rho(\theta)$ in Equation (25) we get [...] For $s \gg 1$ we can approximate the sum over $\kappa$ with the infinite sum, getting [...] where the last expression is valid if $s \gg 1$.
Therefore, although in general it is not true that the hidden variable distribution is the same as the strength distribution, in the case of power-law distributed hidden variables the strength distribution displays a power-law tail with the same exponent. Note that this is valid for power-law exponents in the range $\gamma \in (2, 3]$ but also for $\gamma > 3$, since the derivation only requires a well defined average $\langle\theta\rangle$. Therefore in this case the hidden variables can be used to directly tune the strength distribution.
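For orientation, one master equation consistent with the growth rule can be written and solved as follows. This is a sketch under stated assumptions, not necessarily the form used by the authors: the new node is assumed to contribute on average $\langle\theta\rangle$ links, each reaching a given node of hidden variable $\theta$ with probability $\Pi(\theta) = \theta/(\langle\theta\rangle t)$, and to enter with strength $\kappa$ drawn from $\hat{P}(\kappa|\theta)$.

```latex
% Assumed master equation for the number of nodes with hidden variable theta and strength s
\begin{align}
N^{t+1}_{\theta}(s) &= N^{t}_{\theta}(s)
  + \langle\theta\rangle\,\Pi(\theta)
    \left[N^{t}_{\theta}(s-1)\bigl(1-\delta(s,0)\bigr) - N^{t}_{\theta}(s)\right]
  + \rho(\theta)\,\hat{P}(s|\theta), \\
% Stationary ansatz N^t_theta(s) ~ t P_theta(s) turns it into a recursion in s
(1+\theta)\,P_{\theta}(s) &= \theta\,P_{\theta}(s-1)\bigl(1-\delta(s,0)\bigr)
  + \rho(\theta)\,\hat{P}(s|\theta), \\
% Solution: the initial strength kappa convolved with a geometric (discrete exponential) law
P_{\theta}(s) &= \rho(\theta)\sum_{\kappa=0}^{s}\hat{P}(\kappa|\theta)\,
  \frac{1}{1+\theta}\left(\frac{\theta}{1+\theta}\right)^{s-\kappa}.
\end{align}
```

Under these assumptions $\sum_s P_\theta(s) = \rho(\theta)$, and the mean strength at fixed $\theta$ is $\theta + \theta = 2\theta$ (the Poisson mean plus the geometric mean), in agreement with the mean-field result of the previous section.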

Connection Probability
In this section we derive the expression for the connection probability between any two nodes. Let us consider the probability $P(a_{ij} = 1|\theta_i, \theta_j, \kappa_j, t_j > t_i)$ that node $i$ is connected to node $j$, i.e., $a_{ij} = 1$, given the hidden variables of node $i$ and node $j$, their times of arrival with $t_j > t_i$ and the initial strength $\kappa_j$ of node $j$. This probability is one minus the probability that none of the initial links of node $j$ connects to node $i$, i.e.,
$$P(a_{ij} = 1|\theta_i, \theta_j, \kappa_j, t_j > t_i) = 1 - \left(1 - \frac{\theta_i}{\langle\theta\rangle t_j}\right)^{\kappa_j}.$$
If we now average over the probability $\hat{P}(\kappa_j|\theta_j)$ we get the closed form expression
$$p_{ij} = 1 - \exp\left(-\frac{\theta_i \theta_j}{\langle\theta\rangle t_j}\right), \tag{31}$$
where we have assumed that the average of the hidden variables $\langle\theta\rangle$ is well defined. Therefore we have found that the connection probability between two nodes depends both on the hidden variables and on their time of arrival in the network. It follows that the model is not expected to be exchangeable, as this would require a connection probability independent of the time of arrival of the two nodes. However the fact that this connection probability depends not only on the time of arrival of the nodes in the network (or the order in which they are sampled) but also on their hidden variables can be a useful characteristic of a reliable statistical model.
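The Poisson average leading to the closed form can be verified numerically. The sketch assumes that a single link of node $j$ reaches node $i$ with probability $\theta_i/(\langle\theta\rangle t_j)$ and that $\kappa_j$ is Poisson with mean $\theta_j$. The function names, the truncation `kappa_max` and the parameter values are illustrative.

```python
import math


def connection_probability_given_kappa(theta_i, mean_theta, t_j, kappa_j):
    """One minus the probability that none of the kappa_j links of node j reaches node i."""
    q = theta_i / (mean_theta * t_j)  # assumed single-link attachment probability
    return 1.0 - (1.0 - q) ** kappa_j


def connection_probability(theta_i, theta_j, mean_theta, t_j):
    """Closed form obtained after averaging over the Poisson-distributed kappa_j."""
    return 1.0 - math.exp(-theta_i * theta_j / (mean_theta * t_j))


def poisson_averaged_probability(theta_i, theta_j, mean_theta, t_j, kappa_max=200):
    """Explicit Poisson average, truncated at kappa_max (negligible tail for moderate theta_j)."""
    total = 0.0
    pmf = math.exp(-theta_j)  # Poisson weight of kappa = 0
    for kappa in range(kappa_max + 1):
        total += pmf * connection_probability_given_kappa(theta_i, mean_theta, t_j, kappa)
        pmf *= theta_j / (kappa + 1)  # recursive update to the weight of kappa + 1
    return total


explicit = poisson_averaged_probability(3.0, 5.0, 2.0, 50)
closed = connection_probability(3.0, 5.0, 2.0, 50)
print(explicit, closed)
```

The agreement follows from the Poisson generating function, $\sum_\kappa \hat{P}(\kappa|\theta_j)(1-q)^\kappa = e^{-\theta_j q}$.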

Degree Distribution in the Sparse Regime
Here we derive the degree distribution of the model in the sparse regime, when we can assume that $p_{ij} \ll 1$. We will show that in this regime each node has a Poisson degree distribution with an expected degree $\bar{k}_i$ depending both on the value of its hidden variable and on the time of its arrival in the network.
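The probability $P(k_i|\theta_i, t_i)$ that a node $i$, arrived in the network at time $t_i$ with hidden variable $\theta_i$, has degree $k_i$ can be calculated starting from the connection probabilities $p_{ij}$ given by Equation (31). Let us indicate with $\mathbf{a}_i = \{a_{ij}\,|\,j \in \{1, 2, \ldots, N\}\}$ the elements of the adjacency matrix in the $i$-th row, indicating the connections of node $i$. Since node $i$ is connected with each node $j$ with probability $p_{ij}$ given by Equation (31), the probability $P(\mathbf{a}_i)$ is given by
$$P(\mathbf{a}_i) = \prod_{j \neq i} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}}.$$
Using this result we can express the probability $P(k_i|\theta_i, t_i)$ that node $i$ has degree $k_i$ as
$$P(k_i|\theta_i, t_i) = \sum_{\mathbf{a}_i} P(\mathbf{a}_i)\, \delta\Big(k_i, \sum_{j \neq i} a_{ij}\Big) = \frac{1}{2\pi}\int_{-\pi}^{\pi} d\omega\, e^{\mathrm{i}\omega k_i} \sum_{\mathbf{a}_i} P(\mathbf{a}_i)\, e^{-\mathrm{i}\omega \sum_{j \neq i} a_{ij}},$$
where we have used the integral representation of the Kronecker delta $\delta(x, y)$ and $\mathrm{i}$ denotes the imaginary unit. By performing the sum over all the elements of $\mathbf{a}_i$ we get
$$P(k_i|\theta_i, t_i) = \frac{1}{2\pi}\int_{-\pi}^{\pi} d\omega\, e^{\mathrm{i}\omega k_i + F(\omega)},$$
where
$$F(\omega) = \sum_{j \neq i} \ln\left[1 + p_{ij}\left(e^{-\mathrm{i}\omega} - 1\right)\right].$$
For $p_{ij} \ll 1$ we can approximate $F(\omega)$ with
$$F(\omega) \simeq \bar{k}_i \left(e^{-\mathrm{i}\omega} - 1\right),$$
where $\bar{k}_i$ is the expected degree of node $i$ given by
$$\bar{k}_i = \sum_{j \neq i} p_{ij}. \tag{37}$$
Note here that since the connection probability $p_{ij}$ depends both on the hidden variables of the nodes $i$ and $j$ and on their arrival times in the network, it follows that the expected degree $\bar{k}_i$ of node $i$ will also be a function of both the node's hidden variable and its time of arrival in the network. Using Equations (34) and (36) we can derive the explicit expression for $P(k_i|\theta_i, t_i)$. In fact we have
$$P(k_i|\theta_i, t_i) = \frac{1}{2\pi}\int_{-\pi}^{\pi} d\omega\, e^{\mathrm{i}\omega k_i}\, e^{\bar{k}_i (e^{-\mathrm{i}\omega} - 1)} = e^{-\bar{k}_i} \sum_{h=0}^{\infty} \frac{\bar{k}_i^{\,h}}{h!}\, \frac{1}{2\pi}\int_{-\pi}^{\pi} d\omega\, e^{\mathrm{i}\omega (k_i - h)},$$
and by identifying the last integral with the Kronecker delta $\delta(h, k_i)$ we get the Poisson distribution
$$P(k_i|\theta_i, t_i) = \frac{\bar{k}_i^{\,k_i}}{k_i!}\, e^{-\bar{k}_i}.$$
Therefore the probability that node $i$, which arrived in the network at time $t_i$ with hidden variable $\theta_i$, has degree $k_i$ is given by the Poisson distribution with average $\bar{k}_i$ given by Equation (37).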
It follows that the degree distribution $P(k)$ of the network at time $t$ is given by
$$P(k) = \frac{1}{t}\sum_{t_i=1}^{t}\int d\theta\, \rho(\theta)\, P(k|\theta, t_i).$$
Note that for sufficiently sparse networks, where any two connected nodes are typically connected by a link of weight one, the degree of a node can be identified with its strength,
$$k_i \simeq s_i.$$
It follows that in this case the degree distribution can be approximated by the strength distribution, and if the hidden variables are power-law distributed with power-law exponent $\gamma$ (as described in Equation (26)) then the degree distribution also has a power-law tail with the same exponent $\gamma$, i.e.,
$$P(k) \propto k^{-\gamma}$$
for $k \gg 1$.
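The quality of the Poisson approximation in the sparse regime can be checked against the exact distribution of a sum of independent Bernoulli link indicators. The sketch below is illustrative: it uses 300 potential neighbours with equal connection probability 0.01 (assumed values, not taken from the paper), and the function names are hypothetical.

```python
import math


def exact_degree_distribution(probabilities):
    """Exact distribution of the number of successes among independent Bernoulli trials."""
    dist = [1.0]  # before any trial the degree is 0 with probability 1
    for p in probabilities:
        updated = [0.0] * (len(dist) + 1)
        for k, weight in enumerate(dist):
            updated[k] += weight * (1.0 - p)   # link absent
            updated[k + 1] += weight * p       # link present
        dist = updated
    return dist


def poisson_pmf(mean, k):
    """Poisson probability computed in log space to avoid overflow for large k."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


probabilities = [0.01] * 300                 # illustrative sparse connection probabilities
expected_degree = sum(probabilities)         # sum over j of p_ij
exact = exact_degree_distribution(probabilities)
largest_gap = max(abs(exact[k] - poisson_pmf(expected_degree, k)) for k in range(len(exact)))
print(expected_degree, largest_gap)
```

The largest pointwise gap shrinks as the individual connection probabilities decrease at fixed expected degree, which is the regime $p_{ij} \ll 1$ assumed in the text.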

Random Permutation of the Node Sequence
Here we investigate whether the described network process can be related to the generation of uncorrelated networks. In this way we aim at reconciling the non-equilibrium growing nature of the network model, displaying projectivity, with the properties of exchangeable but not projective uncorrelated network models.
We observe that the connection probability $p_{ij}$ of Equation (31) depends both on the hidden variables and on the times of arrival of the nodes $i$ and $j$ in the network. However if we consider several realizations of the model in which the times of arrival of node $i$ and node $j$ are random, but the hidden variables are preserved, we observe that the probability that node $i$ and node $j$ are connected satisfies [...] Therefore if the network is sufficiently sparse, i.e., [...] we have that the expected degree $\tilde{k}_i(\theta_i)$ of a random node $i$ of hidden variable $\theta_i$ is given by
$$\tilde{k}_i(\theta_i) = 2\theta_i,$$
and the probability that a node with hidden variable $\theta_i$ is connected with a node with hidden variable $\theta_j$, independently of their times of arrival in the network, is given by the uncorrelated network marginal corresponding to the number of nodes in the sample, i.e.,
$$\tilde{p}_{ij} = \frac{\tilde{k}_i \tilde{k}_j}{\langle\tilde{k}\rangle N} = \frac{2\theta_i \theta_j}{\langle\theta\rangle N}.$$
Note that in this case if the sample increases in size and includes $N' > N$ nodes, the probability that node $i$ and node $j$ are connected will satisfy
$$\tilde{p}_{ij} = \frac{2\theta_i \theta_j}{\langle\theta\rangle N'}.$$
In this case the network process induces a probability $\tilde{p}_{ij}$ that depends on the network size $N$ and at the same time enforces the sparseness of the network. In fact the expected degrees $\{\tilde{k}_i\}$ of the nodes are determined only by the hidden variables and are independent of the network size.
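The factor 2 in the uncorrelated marginal can be traced to the average of $1/\max(t_i, t_j)$ over random arrival orders. The sketch below assumes the sparse form $p_{ij} \simeq \theta_i\theta_j/(\langle\theta\rangle \max(t_i, t_j))$ and two distinct arrival times drawn uniformly from $1, \ldots, N$. The function name is illustrative.

```python
def mean_inverse_later_arrival(n_nodes):
    """Exact average of 1 / max(t_i, t_j) over two distinct uniform arrival times in 1..n_nodes.

    The later arrival equals m with probability 2 (m - 1) / (n (n - 1)).
    """
    n = n_nodes
    return sum(2.0 * (m - 1) / (m * n * (n - 1)) for m in range(2, n + 1))


n_nodes = 1000
ratio = mean_inverse_later_arrival(n_nodes) * n_nodes / 2.0
print(ratio)  # approaches 1 as n_nodes grows, i.e. the average tends to 2 / N
```

Multiplying $\theta_i\theta_j/\langle\theta\rangle$ by this average therefore gives approximately $2\theta_i\theta_j/(\langle\theta\rangle N)$, with corrections of relative order $\ln(N)/N$.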

Entropy of the Network Model
In order to compare our model with hidden variable distribution $\rho(\theta)$ to an uncorrelated network ensemble in which the expected degrees are $\tilde{k}_i = 2\theta_i$, in this section we use information theory tools.
Specifically we will compare the entropy of the two ensembles. The entropy of a network model or of a network ensemble [14][15][16][17]40] is a fundamental tool to evaluate the information content in the network model. It indicates the logarithm of the typical number of networks generated by the ensemble and as such evaluates the complexity of the model and can be used in inference problems [40]. Since for our network model the connection probability $p_{ij}$ of any pair of nodes $i$ and $j$ is given by Equation (31), the entropy of the model is given by
$$S = -\sum_{i<j}\left[p_{ij}\ln p_{ij} + (1 - p_{ij})\ln(1 - p_{ij})\right],$$
where in the sparse regime we can approximate $p_{ij}$ with $t_j > t_i$ as
$$p_{ij} \simeq \frac{\theta_i \theta_j}{\langle\theta\rangle t_j}.$$
Similarly for the uncorrelated network ensemble with connection probability $\tilde{p}_{ij}$ the entropy is given by
$$\tilde{S} = -\sum_{i<j}\left[\tilde{p}_{ij}\ln \tilde{p}_{ij} + (1 - \tilde{p}_{ij})\ln(1 - \tilde{p}_{ij})\right].$$
In order to compare these two entropies we use the explicit expression for the connection probability $\tilde{p}_{ij}$ when we put $\tilde{k}_i(\theta_i) = 2\theta_i$, which reads
$$\tilde{p}_{ij} = \frac{2\theta_i \theta_j}{\langle\theta\rangle N}.$$
By performing a straightforward calculation we find that $S$ is given, up to the linear terms in $N$, by [...] and that the entropy $S$ of our model is smaller than the entropy of the uncorrelated network ensemble. In fact, $S$ differs from $\tilde{S}$ only by [...] The entropy difference $\Delta S$ quantifies the information loss when the proposed network process is approximated with its corresponding uncorrelated network model. We observe here that the uncorrelated network model is obtained when the causal construction of the original network model is disregarded and the only retained information is the probability $\tilde{p}_{ij}$ that two nodes of hidden variables $\theta_i$ and $\theta_j$ are connected regardless of their times of arrival in the network. Therefore $\Delta S$ captures the loss of information when the causal nature of the original model is disregarded. Interestingly in the large network limit $N \gg 1$, $|\Delta S|$ is small compared to $S$, revealing the proximity between the two models.
Additionally $\Delta S$ depends only on $\langle\theta\rangle$, indicating that the information loss from one model to the other is independent of the particular distribution of the hidden variables $\rho(\theta)$ as long as $\langle\theta\rangle$ is kept constant.
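The size of the entropy difference can be estimated with a short calculation. The following is a sketch and not necessarily the authors' derivation: it assumes the sparse approximations $p_{ij} \simeq \theta_i\theta_j/(\langle\theta\rangle t_j)$ and $\tilde{p}_{ij} = 2\theta_i\theta_j/(\langle\theta\rangle N)$, hidden variables independent of the arrival order, and it drops terms that are sublinear in $N$.

```latex
% Sparse limit: -(1-p) ln(1-p) ~ p, and both ensembles have ~ <theta> N expected links
\begin{align}
S &\simeq -\sum_{i<j} p_{ij}\ln p_{ij} + \sum_{i<j} p_{ij},
\qquad
\sum_{i<j} p_{ij} \simeq \sum_{i<j} \tilde{p}_{ij} \simeq \langle\theta\rangle N, \\
% The terms in ln(theta_i theta_j / <theta>) coincide to linear order in N and cancel
S - \tilde{S} &\simeq \sum_{i<j} p_{ij}\ln t_j - \sum_{i<j}\tilde{p}_{ij}\ln\frac{N}{2}
  \simeq \langle\theta\rangle\,(N\ln N - N) - \langle\theta\rangle N\,(\ln N - \ln 2), \\
\Delta S &= S - \tilde{S} \simeq -\langle\theta\rangle N\,(1 - \ln 2).
\end{align}
```

Under these assumptions $\Delta S$ is negative, linear in $N$ and dependent only on $\langle\theta\rangle$, while $S$ itself grows as $\langle\theta\rangle N \ln N$, so that $|\Delta S|/S$ vanishes as $1/\ln N$. This is consistent with the qualitative statements in the text.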

Statistical Testing of the Model
In order to study the utility of the proposed model as a null model for sampled data we consider three power-law networks: the arxiv hep-ph (high energy physics phenomenology) citation network [45,46], the Berkeley-Stanford web network [47] and the Notre Dame web network [48], of network sizes N = 34,546, N = 685,230 and N = 325,000 respectively. All data are freely available on the Stanford Network Analysis Project webpage. To each node of the network we assign a different label $i \in \{1, 2, \ldots, N\}$ according to a random permutation of the indices from 1 up to $N$. We then assign to each node $i$ of the network a hidden variable
$$\theta_i = \frac{k_i}{2},$$
where $k_i$ is the observed degree of node $i$ in the dataset. Given our random node labelling and the hidden variables $\{\theta_i\}_{i = 1, 2, \ldots, N}$ we have generated a random network according to the proposed network process. Interestingly the proposed model preserves to a large extent the degree distribution (see the comparison of the real degree distribution with the one generated by the model in Figure 1). Additionally these results are quite stable if we consider a model generated only by adding a subsample of randomly chosen nodes, showing that the model preserves the degree distribution under random sub-sampling of the nodes (see Figure 1).

Figure 1. The degree distributions $P(k)$ of the three analysed datasets are compared with the results of the model generated by using all the nodes of the network or with just a subsample of nodes of the network of size N. Panels (a-c) display the results for the arxiv hep-ph citation network [45,46] (N = 34,546), the Berkeley-Stanford web network [47] (N = 685,230) and the Notre Dame web network [48] (N = 325,000) respectively.
The generated model however is to be considered mostly as uncorrelated. In fact if we compare the degree correlations of the real datasets with the degree correlations of the network generated by the model we observe that the model deviates from the real data and displays very weak/marginal degree correlations (see Figure 2). From the results obtained for the three studied network datasets it seems that the model is better able to reproduce weakly assortative behaviour than strongly disassortative behaviour. In future, modifications of the proposed model could be envisaged to capture also the degree correlations of real datasets.

Figure 2. The average degree $k_{nn}(k)$ of the neighbours of a node of degree $k$ in the three analysed datasets is compared with the results of the model generated by using all the nodes of the network or with just a subsample of nodes of the network of size N. Panels (a-c) display the results for the arxiv hep-ph citation network [45,46] (N = 34,546), the Berkeley-Stanford web network [47] (N = 685,230) and the Notre Dame web network [48] (N = 325,000) respectively.
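The preparation of the null-model input described in this section can be sketched as follows. The rule $\theta_i = k_i/2$ is an assumption inferred from the expected degree $2\theta_i$ derived earlier. The function name, the toy degree list and the `sample_size` argument are hypothetical.

```python
import random


def prepare_hidden_variables(degrees, seed=0, sample_size=None):
    """Randomly relabel the nodes and assign theta_i = k_i / 2 (assumed rule).

    degrees: observed degree of every node in the dataset.
    sample_size: if given, keep only that many randomly chosen nodes,
    mimicking a null model built from a subsample of the data.
    Returns the hidden variables in the order in which nodes enter the process.
    """
    rng = random.Random(seed)
    order = list(range(len(degrees)))
    rng.shuffle(order)                # random permutation of the node labels
    if sample_size is not None:
        order = order[:sample_size]   # a prefix of a random permutation is a random subsample
    return [degrees[node] / 2.0 for node in order]


toy_degrees = [4, 2, 6, 8]            # toy stand-in for a real degree sequence
thetas_full = prepare_hidden_variables(toy_degrees, seed=1)
thetas_sub = prepare_hidden_variables(toy_degrees, seed=1, sample_size=2)
print(thetas_full, thetas_sub)
```

The resulting list can then be fed to any implementation of the growth process; since the process is projective, the subsampled list is simply the beginning of the full sequence when the same seed is used.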

Conclusions
In conclusion, we have given a wide overview of the desirability of the projectivity and exchangeability properties in good statistical models and we have emphasized the difficulty in combining these properties with the sparseness of the network. While this problem is a widely discussed subject in statistics of networks and graph theory, here we have proposed a model that provides a trade-off solution. Our model describes a network process in which nodes and links are successively added according to a probability dependent on some hidden variables associated with the nodes. As long as the hidden variables are power-law distributed this model generates a scale-free network with the same exponent. This model is projective but not exchangeable. However, the expected probability that two nodes are connected when one considers a random permutation of the sequence in which nodes are added to the network reduces to the expression valid for the marginal of an uncorrelated exchangeable network with the same expected degrees (given by twice the hidden variables), provided the network is sufficiently sparse. Finally, we tested this model as a statistical null model for scale-free sparse real networks, showing that it can reproduce the degree distribution (but not the degree correlations) even if only a partial subset of the data is considered.