A Mixture Model of Truncated Zeta Distributions with Applications to Scientific Collaboration Networks †

The degree distribution has attracted considerable attention from network scientists in the last few decades to have knowledge of the topological structure of networks. It is widely acknowledged that many real networks have power-law degree distributions. However, the deviation from such a behavior often appears when the range of degrees is small. Even worse, the conventional employment of the continuous power-law distribution usually causes an inaccurate inference as the degree should be discrete-valued. To remedy these obstacles, we propose a finite mixture model of truncated zeta distributions for a broad range of degrees that disobeys a power-law behavior in the range of small degrees while maintaining the scale-free behavior. The maximum likelihood algorithm alongside the model selection method is presented to estimate model parameters and the number of mixture components. The validity of the suggested algorithm is evidenced by Monte Carlo simulations. We apply our method to five disciplines of scientific collaboration networks with remarkable interpretations. The proposed model outperforms the other alternatives in terms of the goodness-of-fit.


Introduction
Network science focuses on the study of complex networks such as telecommunication, computer, biological, cognitive, and social networks. A network consists of nodes and links. The topological structure, which explores how nodes are connected in the system, has been investigated with great interest [1][2][3]. Network researchers have examined foundational network topologies using various network-related quantities such as the degree distribution, clustering coefficient, and average path length. Most networks are dynamic, so accordingly, network-related quantities also change over time. Studying the evolution of these network quantities could provide insight into the behavior of individuals expressed by nodes and the change of topological properties of the network. The evolution of a network has been studied from various perspectives, e.g., the community [4,5], richget-richer [1], node heterogeneity [2,6], and link persistence [3].
The degree of a node is the number of links connected to a node. The degree distribution of a network, P(k), tells us the probability of degree k that a randomly chosen node will have. One challenge in studying network science is to develop simplified measures that capture the network structure. The degree distribution is one of such measures that help to find influential nodes in the system. Many attempts have been made to study the degree distribution using Poisson, exponential, and power-law distributions. In particular, the analysis of the power-law degree distribution has been considered as one of the basic steps, and networks that have power-law degree distributions are often referred to as scale-free networks. We can frequently observe power-law degree distributions in collaboration, World Wide Web, protein-protein interactions, and semantic networks [7][8][9][10]. The emergence of hubs (highly connected nodes) is a consequence of a scale-free property of networks. A rich-get-richer mechanism, also called a popularity effect [11], has been known to produce power-law degree distributions and hubs [1,6].
Many dynamic network models have been developed to explain the power-law degree distribution in real networks. The most widely known dynamic model for the power-law degree distribution is the rich-get-richer generative model in Barabasi and Albert (1999) [1], called the BA model. They employ the preferential attachment mechanism in which nodes with more neighbors tend to receive more links from other nodes. Specifically, the algorithm of the BA model has the following parts with the parameter m that controls the number of links over time: • Growth: At each time point, a node enters the network. Then the node tries to connect with m nodes in the network. • Preferential attachment: The newly entered node connects with node i with probability proportional to the degree of node i.
The BA model yields a power-law degree distribution with exponent 3, i.e., P(k) ∝ k −3 . The exact form, given by Bollobas et al. in [12], is: , k = m, m + 1, · · · . (1) There are many variant models to cover a broad range of power-law exponents [13][14][15], degree correlations [16], accelerating growth [17][18][19], and node heterogeneity [6,20,21]. Another model for generating the power-law degree distribution is a copying network model presented in Kumar et al. (2000) [22]. In this model, newly entered nodes randomly select some existing nodes and copy some of the links. The power-law distribution has the form of P(k) ∝ k −α , which can be expressed as ln P(k) = −α ln k + constant. Therefore, a straight line of ln P(k) plot on ln k (loglog plot) can be an indication of a power-law relationship, and its slope is a power-law exponent. In real networks, however, the degree distribution does not have a shape of a straight line in the entire range. There are many empirical distributions where the power-law behavior is not observed in the range of small degrees. Many variants of the power-law distribution are developed to address this issue, such as generalized powerlaw distributions [23,24], composite distributions with threshold [25,26], and power-law distributions with an exponential cutoff [27,28]. However, these methods do not consider the essential foundation of the power-law. According to the BA model and its variants, the power-law nature is an inherent property exhibited from the preferential attachment rule. The model presented in this paper preserves the power-law nature to avoid manual modifications of the power-law distribution function, given by P(k) ∝ k −α .
Note that the BA model has a parameter m. Jordan (2006) [29] relieved the constant m condition that, at each time, the number M of connections can change over time according to the distribution of M. The degree distribution turned out to be We note the following statistical property.

Theorem 1.
Suppose that M has a finite support, M = 1, · · · , M max . Then the degree distribution of Jordan's model can be expressed as a mixture distribution, given by where w m = P(M = m) is a mixture weight corresponding to the mth mixture component P(k|m) in Equation (1), m = 1, · · · , M max .
Proof. Note that we can rewrite P(k|m) and P(k) as (1) and (2). Then, we have Theorem 1 suggests that the degree distribution might be expressed as a mixture distribution. For example, we consider a network where a new node connects to one or two existing nodes, with probabilities P(M = 1) = 2/3 and P(M = 2) = 1/3. Then we have the following degree distribution in a mixture form of the two BA model's distributions with different m, Inspired by this property, we consider a mixture model as an explanation of the deviation from the power-law in the range of small degrees. Moreover, many studies have considered the degree as a continuous variable using continuity assumptions [30,31]. This approach may mislead researchers since the degree is discrete-valued. Therefore, a discrete power-law distribution, called a truncated zeta distribution, is used in this paper.
In this study, we propose a mixture model of truncated zeta distributions for the analysis of degree distributions. The proposed model covers the entire range of the degree distribution through a mixture of truncated zeta distributions while maintaining the scale-free nature of a network. We can characterize the degree distribution more accurately through the discrete version of the power-law distribution. In addition, we present the maximum likelihood estimation algorithm along with a model selection method. A simulation study examines the validity of the proposed estimation procedure. In addition, real collaboration networks are investigated with the proposed model to describe the characteristics of the degree distribution.
We focus on analyzing actual scientific collaboration networks and have made significant advancements compared to the previous work in Jung and Phoa (2020) [32]. The major improvements are as follows: • We detected some inconsistency in the scientific collaboration data. For example, "Smith, James," "Smith, John," and "Smith, Jacob" are all stated as "Smith, J" before 2007. Hence, we change the period of data to avoid the author name inconsistency for accurate inferences. • Moreover, a more elaborate analysis of the real data is conducted with noteworthy interpretations. • The validity of the presented algorithm is addressed with Monte Carlo simulations.
• Extensive comparison studies are performed to show the superiority of the proposed model. We compared the proposed model with generalized Pareto models as well as base models that lack discreteness or mixture nature. • We provide more detailed explanations throughout the paper.
The rest of the paper is organized as follows. The continuous and discrete power-law distributions are defined in Section 2, and the proposed mixture model of truncated zeta distributions is defined in Section 3. Section 4 presents the estimation method of the mixture model, and the validity of the estimation procedure is demonstrated in Section 5. In Section 6, we analyze the scientific collaboration network by applying the proposed mixture model with interpretations. Section 7 concludes the paper.

Continuous Power-Law Distribution
The probability density function (pdf) of the continuous power-law distribution parameterized by the power-law exponent α > 1 and the minimum value l > 0, denoted by PL(α, l), is given by [7] Note that the support is real values greater than or equal to l. The complementary cumulative distribution function is useful to describe the tail of the power-law distribution, expressed as Its moments are given by provided α > m + 1. We can deduce that PL(α, l) has the mean and variance given by where the mean and variance are well-defined for α > 2 and α > 3, respectively.
The continuous power-law distribution has been widely used in the analysis of the degree distribution. To analyze the discrete-valued variable degree, we need an approximation method. Setting a constant c, 0 ≤ c ≤ 1, for the correction of continuity, the degree distribution can be approximated by for an integer value k.
One of the most common approaches is to round values to the nearest integer, which corresponds to c = 0.5 [7,33]. This rounding approach is acceptable when considering the tail part of the power-law distribution. However, if k is small, the constant c that satisfies the exact equation of (5) may be considerably less than 0.5, and it should be avoided. According to Clauset et al. (2009) in [7], the rounding approach is reasonable for k > 6.
Since the node degree is usually small, the approximation may lead to misunderstanding when performing statistical analysis on the degree distribution, such as generating node degrees and fitting the distribution to real data. We consider the exact version of the discrete power-law in the next subsection.

Truncated Zeta Distribution
The truncated zeta distribution, denoted by TZ(α, l), is a discrete form of the powerlaw distribution. Parameters are the same as the continuous power-law distribution in which α > 1 is the power-law exponent and l > 0 is the minimum value. The probability mass function (pmf) of TZ(α, l) is given by [7] g(k|α, l) = 1 where ζ(·, ·) is the Hurwitz zeta function which can be regarded as the normalizing constant of the distribution. The Hurwitz zeta function in Equation (6) corresponds to the continuous counterpart 1/(α − 1)l α−1 in Equation (4) via the upper Riemann sum approximation, expressed as The complementary cumulative distribution function is given by Next, the moments of the truncated zeta distribution are expressed as We can straightforwardly derive the mean and variance as provided α > 2 for the mean and α > 3 for the variance. Figure 1 shows some pmfs of TZ(α, l) when α = 2.50. We can check that the straight line of a log-log plot can be strong evidence of a power-law behavior.

Truncated Zeta Mixture Model
We consider the finite mixture model of truncated zeta distributions by fixing the power-law exponent α for mixture components while varying minimum values to produce a mixture of truncated zeta distributions. The probability mass function is represented as where g(k|α, l) is the pmf of TZ(α, l), and L is the number of mixture components. Mixture weights w = (w 1 , w 2 , · · · , w L ) In this paper, we assume that the minimum value l is equal to 1, but it can be modified according to the data. The tail of most real networks follows the power-law distribution, and Equation (7) has the exact power-law behavior for sufficiently large degrees.
Theorem 2. For k larger than or equal to L, the truncated zeta mixture distribution in Equation (7) has the exact power-law relationship, given by Proof. By using the pmf of TZ(α, l), we can write Since the term inside the bracket is independent with k, the pmf of the mixture is proportional to k −α .
Mixture models may suffer from the non-identifiability issue even for finite mixtures. The following theorem proves that the proposed truncated zeta mixture model is identifiable. (7) is identifiable with respect to α, L, and w.
In Figure 2, we depict some log-log plots of mixture distributions. We can see that this model can handle empirical degree distributions that do not follow the power-law distribution at small degrees.

Estimation Algorithm
We use the Expectation-Maximization (EM) algorithm to estimate the exponent parameter α and mixture weights w for a given number of mixture components L. Let k = (k 1 , k 2 , · · · , k N ) be the observed data, and z n be the membership of k n , where the membership z n is assigned as z n = l if k n is from the lth mixture component TZ(α, l). We consider the membership variable z = (z 1 , z 2 , · · · , z N ) as missing. Let θ = (α, w) = (α, w 1 , · · · , w L ) be the parameters of the mixture model.
The complete-data likelihood function is given by We proceed by taking the logarithms of both sides to have the complete-data log-likelihood function: We define Q(θ|θ * ) as the expected value of the log-likelihood given the observed data k and the current parameter estimate θ * = (α * , w * ), which can be expressed as where γ(l, n, θ * ) is the membership responsibility of the nth observation k n corresponding to the lth mixture component TZ(α, l). They are defined by the posterior probabilities of mixture component memberships for each observation, γ(l, n, θ * ) = P(z n = l|k n , θ * ) = w * l g(k n |α * , l) ∑ L l =1 w * l g(k n |α * , l ) , l = 1, 2, · · · , L, n = 1, 2, · · · , N.
The E-step computes membership responsibilities.
In the M-step, a new parameter estimateθ = argmax θ Q(θ|θ * ) is computed using the quantity previously computed in the E-step. To findw, we need to solve the following optimization problem: Subject to L ∑ l=1 w l = 1, and w l ≥ 0, l = 1, 2, · · · , L.
The Lagrange multiplier method yieldš Next,α can be found in the partial derivative of Q with respect to α, given by Here, ζ α (α, l) is the partial derivative of the Hurwitz zeta function with respect to α, given by Then, the equation gives the desiredα. Unfortunately, a closed-form solution does not exist in general. In this paper, we employ Brent's method [34] to solve Equation (10) with respect to α. The two steps are necessarily repeated until the convergence obtains the final parameter estimateθ. The process is summarized in Algorithm 1.
In order to select the number of mixture components L, we employ the Bayesian information criterion (BIC) considering the trade-off between the goodness-of-fit and the complexity of the model. BIC is given by the following formula: where α and w are the obtained estimated parameters given the number of mixture components L. We choose L giving the smallest BIC, where the candidates of L are the integers from 1 to the minimum of the two values: 100 and the nearest integer to 0.90×(maximum degree).

Monte Carlo Simulation
We study the validity of the presented estimation methods using the synthetic data. We consider three values of true power-law exponents α = 2.50, 3.00, 3.50 and also three values of the number of mixture components L = 2, 4, 6. For each pair of α and L, we generate 1000 samples from the finite mixture of truncated zeta distributions in Equation (7). Each process is repeated 30 times. Mixture weights w l , l = 1, 2, · · · , L are set to 1/L for each dataset.
We assume that the true number of components is given. Algorithm 1 is applied to each dataset to obtain the parameter estimateθ = (α,ŵ). Table 1 presents the parameter estimation result. We find that the average values of estimated parameters are very close to the true parameters, implying that the EM algorithm works well in estimating the exponent parameter α and weights w. Table 1. Estimates of θ of the EM algorithm applied to the synthetic data with the true number of mixture components. We present the average (standard deviation in parentheses) values of estimated parametersθ = (α,ŵ) of 30 datasets for each case. Next, we assume that the number L of mixture components is not given. We will check whether the presented algorithm can find the appropriate number of mixture components. Note that BIC is used as a model selection criterion for the generated datasets. Table 2 shows the result of determining L. The result indicates that the maximum likelihood method yields a reasonable estimation result in finding L. We study the scientific collaboration network obtained from the Web of Science, where a large-scale database collects the information of all published scientific articles in the world. Among all 275 disciplines, we randomly choose five, which are Biotechnology and Applied Microbiology, Computer Science, Environmental Science, Materials Science, and Physical Chemistry, for demonstration. We reorganize the Web of Science database into a network data structure such that the nodes of networks are authors, and two authors are connected by an undirected link if there is at least one paper co-authored by them. We employ the data from 2007 to 2016 as the inconsistency of author's names is observed before the year 2007. The data from 2007 to 2009 are used to accumulate data so that the dependence of node degrees can be ignorable. We analyze the degree distribution of the data from 2010 to 2016. Table 3 shows the summary statistics in the year 2016. We apply the presented estimation method to degree distributions from 2010 to 2016. In Figure 3, we plot degree distributions and the estimated fitting lines of the proposed model in the years 2001 and 2016. The plots suggest that the truncated zeta mixture shows a good performance in fitting degree distributions. The beginning points (see the vertical lines in Figure 3) of power-law behaviors are reasonably estimated viaL. This result shows the usefulness of the proposed model for the degree distribution that deviates from the power-law distribution on a small degree part.   Figure 4 shows the temporal changes ofL andα. The estimated number L of mixture components exhibits a rising tendency, andα shows an opposite trend. They constantly move towards large L and small α regions for all fields, indicating that the stabilization of the degree distribution has not yet been achieved. Many network scientists [1,13,14] have concentrated on the stationary or the converged degree distribution. The result implies that, however, non-stationary network models are of great importance, such as acceleration models [15]. Moreover, many works in real data analysis have focused on the power-law exponent of a snapshot of a network. However, the temporal variation ofα tells us that the significance of temporal models can account for the temporal change of the power-law exponent.  It should be noted thatL tends to approach different values across fields. According to Equation (3) and the relevant interpretation in Section 1, the number of mixture components L and mixture weights w are closely related to the distribution of the number of links in the system. Therefore,L could be a measure to the distribution of the number of links. On the other hand,α seems to converge a value near 2.80 for all fields, suggesting that α could be a network-specific quantity instead of a field-specific quantity.
We now focus on the field of Computer Science. Table 4 presents the result of applying the model to the field of Computer Science in more detail. We can observe an interesting pattern in the estimated mixture weights, whereŵ 1 andŵ 2 show decreasing trends whilst w 3 ,ŵ 4 , andŵ 5 show the opposite. According to Jordan's model [29] and Equation (3), mixture weights have much to do with the number of newly made links. The decreasing trend of w 1 and w 2 and the increasing trend of w 3 , w 4 , · · · indicates the increasing number of links over time. As shown in Table 4, the number of created links tends to increase over time. Since there are many new links compared to new nodes (authors), the average degree gradually increased from 4.70 in 2010 to 6.68 in 2016. The increase in the average degree also explains that the state of this network is still evolving. With the rapid advancement in technology and science, many publications have been produced by researchers. In particular, Computer Science is making greater progress due to recently emerging areas: artificial intelligence and big data. The proposed model tries to explain the increasing mean degree in two ways: decreasing power-law exponents α and an appropriate change in the mixture weight, suggesting that the proposed model is helpful to describe the change of the network pattern.

Comparison to Other Models
Our model is developed to deal with two essential characteristics of the degree distribution: the non-power-law behavior at small degrees and the discreteness. We study the superiority of the proposed model to base models that lack one of these characteristics.
We first consider the standard discrete power-law distribution, TZ(α, 1). The estimates of the power-law exponent α are obtained by fitting the data into TZ(α, 1), and the result is presented in Table 5. Theα estimates of the standard discrete power-law distribution are smaller than those of the mixture distribution. As we can observe in Figure 5, the smalldegree data that deviates from the power-law makes the exponent small. The estimated fitting lines indicate that the standard discrete power-law distribution is inappropriate to describe the degree distribution. Next, the degree distribution is applied to the mixture of the continuous powerlaw distribution using the rounding method with the constant c = 0.5 of the continuity correction. The continuous version of the mixture of truncated zeta distributions can be constructed by the substitution of TZ(α, l) to PL(α, l − c). In addition, we imitate the method in Section 4 for the continuous mixture to compare the performance of the continuous mixture with the discrete mixture. The estimation procedure is identical to the case of the discrete mixture model. Table 5 presents the estimation result of α and mixture weights w. The estimatesL andα are much smaller in continuous mixture models due to the non-exactness of the continuous model. In addition, estimated mixture weightsŵ are considerably different from the discrete mixture. According to Equation (3), mixture weights are involved in the distribution of the number of links. Therefore, we should use the discrete version of the power-law to determine mixture weights correctly. Figure 5 shows that the fitting lines of the continuous mixture seem to deviate from empirical distributions.
As we can see in Tables 4 and 5, the smallest BIC values are achieved in the proposed model as well. Thus, we can conclude that the proposed model outperforms the continuous model as well as the standard discrete power-law model. Next, we compare the proposed model with generalized Pareto distributions in which all degree ranges are covered. We use both continuous and discrete types of generalized Pareto distributions. The complementary cumulative density function of the continuous generalized Pareto distribution GPD(µ = 0.5, σ, ξ) is given by There are few studies concerning the discrete version of the generalized Pareto distribution.
We here use the model in Prieto et al. (2013) [35], expressed as DGP(α, λ, µ = 1). This distribution has advantages over the continuous distribution since the node degree is discrete. Table 6 shows the results of discrete and continuous generalized Pareto distributions applied to the field of Computer Science. In Figure 6, we plot fitting results in the years 2010 and 2016.  Interestingly, Figure 6 suggests that the continuous version has better fitting results. The discrete version has difficulty in explaining a range of large degrees. We can see that our model performs better than the two generalized Pareto distributions, as well indicated by the BIC values (see Tables 4 and 6).

Concluding Remark
Inspired by Jordan's model [29], a novel mixture model for the degree distribution is proposed to describe the entire range of degrees while maintaining the power-law or the scale-free property of a network. The truncated zeta distribution enables us to analyze discrete distributions for accuracy purposes. The parameter estimation procedure is presented along with the model selection criterion for determining the number of mixture components. A simulation study shows the validity of the suggested estimation procedure. The practical performance of the model is studied through the comparison analysis with the other techniques.
We perform the real data analysis on five disciplines of the scientific collaboration data obtained from the Web of Science. We observe the increasing tendency in the number of mixture components and the decreasing tendency in the power-law exponent. In addition, mixture weights change over time. It can be suggested from these results that the analyzed networks are still in an evolving state, highlighting the practical importance of non-stationary temporal network models. The non-convergence of the degree distribution might be due to the short-term analysis performed. Determining whether the collaboration network will stabilize the equilibrium remains as future work.
We can observe power-law distributions not only in the degree distribution but also in sandpile avalanches, species extinctions, city sizes, and so on. The proposed model could be useful when (i) the distribution does not follow the power-law only in small values while the power-law is suitable for large values, (ii) the background knowledge does not support the manual modification of the power-law relationship, or (iii) a mixture distribution can be regarded as reasonable for describing data. Data Availability Statement: Restrictions apply to the availability of these data. Data was obtained from a neo4j database refined from the Web of Science database. They are available from Dr. Frederick Kin Hing Phoa with the permission of the URA team of ISM (Japan) and Clarivate Analytics.