A Model for the Frequency Distribution of Multi-Scale Phenomena

Abstract: Frequency analysis is often used to investigate the structure of systems representing multi-scale real-world phenomena. In many different environments, functional relationships characterized by a power law have been recognized but, in many cases, this simple model has turned out to be inadequate, and other models have been proposed. In this paper, we propose a general abstract model which constitutes a unifying framework, including many models found in the literature, such as the mixed model, the exponential cut-off, and the log-normal. It is based on a discrete-time stochastic process, which leads to a recurrence relation describing the temporal evolution of the system. The steady-state solution of the system highlights the probability distribution which underlies the frequency behavior. A particular instance of the general model, called cubic-cut-off, was analyzed and tested in a number of experiments, producing good answers in difficult cases, even in the presence of peculiar behaviors.


Introduction
A common activity in statistical science is the collection and investigation of data whose underlying phenomenon can be described by random variables. Sometimes, data, for example human heights, are normally distributed. However, there exist many phenomena, called scale-free, whose data cannot be described by a normal distribution because the values do not gather around a mean value but span many orders of magnitude. Occurring in a wide variety of physical, biological, social, and information environments, these phenomena are assumed to share some similarity in the structure of the underlying probability mechanisms [1].
To describe relations and processes occurring in real-world phenomena, different structures can be employed. A scalar phenomenon is characterized by a single distribution of values, called degrees, associated with given items. Examples of such phenomena are the world wealth, the word frequencies in natural languages, or the populations of large cities.
A more complex structure is represented by graphs (see Reference [2] and its extensive bibliography), where the edges provide connections among the nodes. The items are the nodes, and the number of edges connected to a node is its degree. A classical example is the graph which describes the structure of the web, where the nodes and the edges represent, respectively, the web pages and the links from one page to another. Graphs like this apply to many man-made and naturally occurring phenomena. One of the most widely used methods to investigate these structures is frequency analysis, which explores the relationship between the number of items having the same degree and the degree itself. For example, the analysis of large subsets of the web has shown that there are many pages with a small degree and few pages with a large degree.
Some phenomena belonging to different environments, e.g., the distribution of wealth in a society, the frequencies of words in natural languages, or the frequency of the inlinks of a network, have been recognized to approximately follow functional relationships characterized by a power law [3], that is, a relation of the form f(x) = a x^(−ρ), where ρ > 0 and a is a constant scaling factor. A power law has a well-defined mean over x ∈ [1, ∞) only if ρ > 2, and it is the only scale-free distribution.
For many other phenomena, such as, for example, the frequency of the outlinks of a network [4] or the populations of cities, the pure power law is inadequate. In many cases, substantial modifications are required. Among them, the following have often been suggested: (1) the exponential cut-off, where the power law is corrected by an exponential term responsible for a faster decay of the solution for large j, and (2) the log-normal, where a log term is responsible for a bending down for small j.
Various underlying probability distributions have been proposed for modeling the frequency behavior. They are mainly based on an attachment strategy defining the relationship between the degree of an item and the probability that its degree is increased by 1. The simplest model, which adopts a uniform attachment strategy as suggested in Reference [2], would generate a random dataset with most items having a comparable number of degrees. This behavior does not reflect real-world datasets, where there are many items with a very small degree and a non-negligible number of hub items with a high degree. To remedy this, a preferential attachment strategy has been proposed (see Reference [5]). This strategy captures the "rich get richer" effect. A mixed model combines the uniform and preferential approaches (see, for example, References [6][7][8]).
To describe the frequency behavior of multi-scale phenomena, in this paper, we propose a general model, which constitutes a unifying abstract framework able to include many models found in the literature, such as the mixed model, the log-normal model, and the exponential cut-off model. It is based on a discrete-time stochastic process, which leads to a recurrence relation describing the temporal evolution of the system. The steady-state solution of the system highlights the probability distribution, which underlies the frequency behavior and rules the strategy on which the attachment policy relies.
A particular instance of the general model, which we call the cubic-cut-off model, is taken into consideration with the aim of dealing, at the same time, with items having a very small degree or a very large degree, providing a correct characterization of the degree distributions on the full range of the available data, even in the presence of peculiar behaviors. This cubic-cut-off model lends itself to a definition of the attachment strategy, which characterizes, in a simple way, the behavior of the system. It has been tested in a number of experiments, producing better answers than the classical models, even in difficult cases.
The paper is structured as follows. The structure of the datasets taken into consideration in our analysis and the formal definition of the general model from which our proposed cubic-cut-off model stems are described in Sections 2.1 and 2.2. The discrete-time stochastic process and the steady-state solution are described in Section 2.3. The classical Beta, power law, log-normal, and cut-off models are derived in Sections 2.4 and 2.5. In Section 3, we examine the problems caused by the collection, the representation, and the fitting of the real-world data. Finally, in Section 4, we test our model in comparison with the classical ones on a collection of 39 files of data, extracted from 21 datasets, including typical examples, such as the web, the movie actors graph, the supermarket purchases, or the number of social media followers.

The Frequency Distribution Model
The world wealth, the word frequencies in natural languages, or the populations of large cities represent real-world phenomena whose structure is characterized by a single distribution of values.
In order to describe the processes which guide their evolution, models of their frequency distributions are often devised. First of all, we give some definitions about the structure of the datasets we are considering.

The Structure of Datasets
The simplest way to treat real-world phenomena is to associate with each considered item, say the kth one, a value y_k which somehow measures the feature of interest. For example, y_k could be the number of occurrences of the kth word in a linguistic corpus or the number of inhabitants of the kth city. We say that y_k is the degree of the kth item. The number of items having the same degree j is the frequency and is given by

Q_j = #{ k : y_k = j }.   (1)

An analogous function can also be referred to when we deal with phenomena described by more complex structures. We examine, for example, the structure implementing graphs, which are usually used to design models with vertices (the nodes) and edges (the links) for the interconnections. The degree deg(v) of a node v is the number of links connected to v, and the number of nodes having the same degree j is given by

Q_j = #{ v : deg(v) = j }.   (2)

Definition (2) coincides with (1) if we assimilate node v to item k and deg(v) to y_k. Generally, the values Q_j which describe real-world phenomena span many orders of magnitude. For this reason, it is common in the literature to switch to the log-log plane for their graphical representation.
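As a concrete illustration, the frequency function (1) can be computed directly from the list of degrees; this is a minimal sketch in Python (the function name is ours):

```python
from collections import Counter

def frequency(degrees):
    """Frequency function Q_j of (1): number of items whose degree equals j."""
    return dict(sorted(Counter(degrees).items()))

# Five hypothetical items with degrees y_k = 1, 1, 2, 5, 1:
# three items have degree 1, one has degree 2, one has degree 5.
```

The same code computes definition (2) if `degrees` holds the values deg(v) of the nodes of a graph.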

The Model
We now give the definition of a general model for describing the frequency behavior of multi-scale phenomena. Such a definition, based on infinite sequences verifying simple mathematical properties, aims at setting a unifying abstract framework for many approaches found in the literature.
A model M = (p, f) of the form (3) consists of a sequence p = {p_j}, with j ≥ 0, satisfying ∑_{j=0}^∞ p_j = 1, and a sequence f = {f_j} satisfying relation (4), from which relation (5) follows. Thanks to these relations, a model M can be defined through any positive real infinite sequence g_j such that ∑_{j=1}^∞ j g_j converges to a limit θ, by setting f_j = g_j / θ and deriving p_j through (5). Note that, in the rest of the paper, we use the notation p_j, with implicitly varying index j, to denote either the jth element of sequence p = {p_j} or the whole sequence, depending on the context.
In the following section, we briefly outline the discrete-time stochastic process which leads to a model of form (3), where the sequence f j is the expected value of the sequence Q j .

The Discrete-Time Stochastic Process
Frequency analysis, often used to investigate the structure of a system, allows a deep insight into the design underlying a dataset. The frequency distribution model we consider in this paper is based on the following discrete-time stochastic process: we assume that, at time t, a set of N(t) items exists, with N(0) = 0, and that t is updated in correspondence with a unit increase of the degree of an item. Let Q_j^(t) denote the number of items having degree j ≥ 1 at time t, so that N(t) = ∑_{j≥1} Q_j^(t). In our setting, let p_j^(t) denote the probability that, at time t + 1, an item having degree j is considered. There are two possibilities.

• If the item is new, different from any item already existing in the set, it is added to the set and is given degree 1. Let β, with 0 < β < 1, be the probability of this event.

• If the item already exists in the set, its degree is increased by 1. In this case, we assume that the event has a probability proportional to the ratio δ_j q_j^(t)/t, where δ_j does not depend on t.

The variation of q_j^(t+1) with respect to q_j^(t) is given by the equation which describes the temporal evolution of the stochastic system. We look for the steady-state solution of the system. So, we let t → ∞, set p_j = lim_{t→∞} p_j^(t), and assume that q_j^(t) converges. Comparing with (4), we see that the pair M = (p, f), with p = {p_j} and f = {f_j}, defines a model of the form (3). The solution f_j is the expected value of the number of items having degree j, and the probability p_j is the expected value of the total number of items having degree larger than j. An important feature to evaluate the qualitative evolution of the system is the ratio δ_j = p_j / f_j, called the attachment rule [9]. In the linear case, δ_j is, apart from an additive constant, proportional to the degree j of the item. However, this kind of attachment, even if widely studied in the literature, is rarely observed in real-world data, while nonlinear attachments, where δ_j depends on a nonlinear function of j, are more commonly observed [10]. In the following sections, both linear and nonlinear attachment rules are examined.

The Linear Case
We consider first the linear case δ_j = r + s j, with r > 0 and 0 < s < 1. Replacing p_j into (9), we obtain a recurrence which is solved exactly by the (complete) Beta function (a classical text for the Beta function is Reference [11] (p. 258), but, for its important properties, see Reference [12]). In fact, the Beta function B(j, ρ) for positive j and ρ verifies the recursion

B(j + 1, ρ) = j / (j + ρ) B(j, ρ).

It follows that f_j may have the form f_j = c B(j + γ, ρ), provided that the series ∑_{j=1}^∞ j f_j is convergent and c is chosen in such a way that the series converges to 1. The series converges only for ρ > 2 (that is, s < 1). If r > 0, model M corresponds to the one known in the literature as the mixed model. In fact, we can give an interesting interpretation of formula (13) in the time-dependent setting considered at the beginning of the section, by specifying the function δ_j of (6). If the item considered at time t + 1 already exists, let k be its index. The mixed model specifies the following policy to choose k.
(a1) With probability α, 0 < α < 1, the index k is chosen according to its degree j (this policy is known as preferential attachment), and (a2) with probability 1 − α, the index k is chosen at random (this policy is known as uniform attachment).
Then, p_j^(t) with j ≥ 1 is given by the sum of two terms: because of assumption (a2), the first term is proportional to q_j^(t)/N(t) and, because of assumption (a1), the second term is proportional to j q_j^(t)/t. Function δ_j quantifies the attachment rule: the higher s, the more preferential the attachment. If the uniform attachment were the only policy applied, all the items would acquire approximately the same degree. When applied to graphs, the preferential attachment expresses the concept that new links tend to attach themselves to nodes already having more links. From (14), the parameters r and s are expressed in terms of α and β; having assumed r > 0 and 0 < s < 1, the condition 0 < α, β < 1 is verified. The steady-state solution (13) holds with the corresponding values of the parameters, and the starting condition for p_j is again p_0 = β. An asymptotic approximation v_j of f_j for large j is obtained by neglecting γ with respect to j and writing the first-order expansion of B(j, ρ) for fixed ρ. We get

v_j = d j^(−ρ),

where d is a suitable constant, showing that v_j satisfies a power law. Function v_j is a good approximation of f_j for large j, as shown in Figure 1, where the log-log plots of f_j (solid line) and of v_j (dashed line) are given for two different choices of the parameters α and β. Note that v_j is not the solution of a mixed model. The case of the power law function will be taken up again in the next section.
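The time-dependent process with policies (a1) and (a2) can be sketched as a simple simulation. This is our own illustrative code, not the authors' implementation, and the parameter values are arbitrary:

```python
import random

def simulate_mixed_model(T, alpha, beta, seed=0):
    """Sketch of the mixed-model process: at each time step, with probability
    beta a new item of degree 1 is added; otherwise an existing item is
    chosen, preferentially with probability alpha (a1) or uniformly with
    probability 1 - alpha (a2), and its degree is increased by 1."""
    rng = random.Random(seed)
    degrees = [1]  # seed the process with one item of degree 1
    for _ in range(T):
        if rng.random() < beta:
            degrees.append(1)          # new item of degree 1
        elif rng.random() < alpha:
            # (a1) preferential: probability proportional to the degree
            r, acc = rng.random() * sum(degrees), 0.0
            for k, d in enumerate(degrees):
                acc += d
                if r < acc:
                    degrees[k] += 1
                    break
        else:
            # (a2) uniform: any existing item equally likely
            degrees[rng.randrange(len(degrees))] += 1
    return degrees
```

Each step adds exactly one unit of total degree, so after T steps the total degree is T + 1; the empirical frequencies Q_j of the resulting degrees can then be compared with the steady-state solution for the chosen α and β.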

The General Case
When dealing with real-world data, often improperly collected or contaminated by noise, superpositions of several different models, defined on non-overlapping intervals of j, have been suggested. We prefer instead to consider a single model obtained by combining some basic functions.
In the literature, many different functions f_j have been proposed. Some of them lead to solutions of a model M which implies a nonlinear ratio p_j / f_j. In general, pairs (p_j, f_j) which solve Equation (4) exactly are not immediately found. So, we suggest choosing some interesting f_j and deriving p_j from them, as shown in Section 2.2.
In practice, f_j is obtained by fitting the given samples in the log-log space, i.e., by using its log-log representation f(z) = log f_j, with z = log j.
The function f_j must be normalized in such a way that ∑_{j=1}^∞ j f_j = 1. To guarantee the convergence of the series, we must assume that f_j has an asymptotic growth rate lower than j^(−2). Let us examine some important examples.
• The power law model, in which the log-log function is a straight line, f(z) = a_0 + a_1 z, where a_1 < −2, and a_0 guarantees that the solution f^(P)_j = θ j^(a_1), with θ = exp(a_0), is normalized; the normalizing constant can be expressed through ζ(s), the Riemann zeta function, and ζ(s, q), the Hurwitz zeta function. Actually, this case has already been met in the previous section (see (15)) as an asymptotic approximation of the Beta function. In fact, f^(P)_j is a realization of Zipf's law, which describes the tail of a Yule-Simon distribution.
• The log-normal model, in which the log-log function is a parabola, f(z) = a_0 + a_1 z + a_2 z², where a_2 < 0, and a_0 guarantees that the solution f^(L)_j = θ j^(a_1) exp(a_2 log² j), with θ = exp(a_0), is normalized; the convergence of the series follows from the convergence of the series of negative exponentials. The log-normal solution coincides with the probability density function of the log-normal distribution, as can be seen by a suitable setting of the parameters.

• The cut-off model, in which the log-log function contains an exponential term, f(z) = a_0 + a_1 z + a_e exp(z), where a_e < 0, and a_0 guarantees that the solution f^(C)_j = θ j^(a_1) exp(a_e j), with θ = exp(a_0), is normalized. As in the previous case, the convergence of the series follows from the convergence of the series of negative exponentials. The cut-off solution coincides with the probability density function of the power law with exponential cut-off distribution.
• We suggest a unifying approach, in which the function f(z) is

f(z) = a_0 + a_1 z + a_2 z² + a_3 z³ + a_e exp(z),   (20)

where a_e < 0, and a_0 guarantees that the solution is normalized.
The exponential term exp(a_e j) is responsible for a faster decay of the solution with respect to the power law for large j, while the log terms are responsible for a bending down for small j. To show the different behaviors of the functions f_j considered above, Figure 2 gives their log-log plots. The probabilities p are derived as shown in Section 2.2, obtaining the sequence p_j of (22). This suggests expressing π_j in the log-log scale with a basis similar to that used for f^(O)(z). So, we assume for π(z) an expression of the form

π(z) = h(z) + η exp(z),   (23)

where h(z) is a function of order lower than exp(z), and η < 0 is a coefficient to be determined. Now, we impose that the dominant terms of f^(O)_j and π_{j−1} − π_j in the asymptotic setting coincide; it follows that η = a_e. We postpone the choice of a suitable function h(z) to the next section, where a fitting technique is suggested. The validation of this procedure will be effectively checked by the experimentation. The same technique also allows finding the probabilities corresponding to the log-normal and the cut-off functions.
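In the original j scale, the cubic-cut-off log-log basis (20) translates into a closed-form f_j; a small evaluation sketch follows (the coefficient values in the checks are arbitrary, for illustration only):

```python
import math

def cubic_cutoff(j, a0, a1, a2, a3, ae):
    """Cubic-cut-off solution evaluated at degree j, obtained by
    exponentiating f(z) = a0 + a1*z + a2*z**2 + a3*z**3 + ae*exp(z)
    at z = log j (note exp(z) = j, so the last term becomes ae*j)."""
    z = math.log(j)
    return math.exp(a0 + a1 * z + a2 * z ** 2 + a3 * z ** 3 + ae * j)
```

With a_2 = a_3 = a_e = 0, the function reduces to the pure power law θ j^(a_1), θ = exp(a_0); with a_e < 0, the exponential term forces the faster decay in the tail.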

Treatment of the Data
When data from real-life phenomena are sampled and analyzed, intrinsic problems of various kinds arise, namely:

• The crawling process through which data are acquired can produce complete or partial datasets. English Wikipedia-2018 is an example of a complete crawl, whereas the English Web must inevitably be partially crawled.

• For the visualization of multi-scale data, a log-log plot is required, in order to better highlight the properties of the data and the possible correspondence with the chosen model. For example, if the chosen model is the power law, the log-log data should have a straight-line representation.

• In the previous sections, we looked for approximations of a function f_j verifying q_j^(t) = t f_j for t large enough. Actually, when real-world phenomena (such as the web or the whole English language) are considered, t is so large that it can be assumed infinite. In practice, we deal with J samples Q_j, and, typically, the quantity ∑_{j=1}^J j Q_j is much smaller than t. So, we assume Q_j ≈ d f_j, where d is a suitable scaling factor. Note that Q_j, being the number of items having degree j, is a nonnegative integer, while d f_j is a real number which can be very small. The quantization phenomenon cannot be considered statistical noise (as done by some authors) but is an intrinsic characteristic of the sampled data. For example, if d f_j = 10^(−3), the corresponding values Q_j are mostly 0 but sometimes 1 or 2. Obviously, the zeros become more and more probable as the last data are approached.
In the log-log scale, the values Q_j = 1, 2, . . . are gathered in plateaus on the tail of the dataset. Figure 3 shows the base-10 log-log representation of the frequencies of two datasets described in the next section: a set of English words and a set of MovieLens ratings. The quantization phenomenon is evident.

• A dequantization process can be accomplished by binning the data: the data values belonging to a given small interval (called a bin) are replaced by a value representative of that interval. When the binning is performed in the log-log scale, negative values might be generated. This procedure is essential to recover the asymptotic properties of the phenomenon and allows reducing the size of the data while performing some sort of smoothing. In Figure 4, the same data as in Figure 3 are presented, together with the result of binning. It is clear that the binning reveals the different asymptotic behaviors of the two datasets.

The Binning
We suggest the following logarithmic binning, which produces bins of equal width in the log-log scale.
Given τ > 1, we consider the sequence h_i = τ^(i−1), i = 1, . . . , n + 1, where n is such that h_n ≤ J < h_(n+1). The ith bin is J_i = [h_i, h_(i+1)) for i = 1, . . . , n. The set Y of the binned data is formed by the pairs (x_i, y_i) representative, in the log-log scale, of the points falling in the ith bin. If no point (j, Q_j) exists with j ∈ J_i, the pair (x_i, y_i) is discarded (this may happen for large i). Note that, because of the discarded pairs, the set Y might have size smaller than n but, for simplicity, we still denote by n the size of Y. In the experimentation, τ has been tuned through a preliminary processing.
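The logarithmic binning just described can be sketched as follows. The choice of the bin representative (here, the mean of the log coordinates of the points in the bin) is our assumption, since the text leaves its exact definition open:

```python
import math

def log_binning(Q, tau):
    """Logarithmic binning sketch: bins [tau**(i-1), tau**i) have equal
    width in the log scale. Each kept pair (x_i, y_i) is here the mean of
    (log j, log Q_j) over the nonzero points in the bin (an assumption);
    empty bins are discarded, as in the text."""
    J = max(Q)
    bins, h = [], 1.0
    while h <= J:
        pts = [(math.log(j), math.log(q)) for j, q in Q.items()
               if h <= j < h * tau and q > 0]
        if pts:  # bins with no data points are discarded
            xs, ys = zip(*pts)
            bins.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        h *= tau
    return bins
```

For instance, with τ = 2 the degrees fall into bins [1, 2), [2, 4), [4, 8), . . . ; a bin containing no sampled degree produces no pair.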

The Fitting
The fitting procedure is performed in the log-log plane on the binned data Y. Let g(z) be the cubic-cut-off function defined in (20): g(z) = a_0 + a_1 z + a_2 z² + a_3 z³ + a_e exp(z), where a_e < 0.
We compute the least-squares fit (25) of g(z) on the binned data, imposing the constraint a_e ≤ 0. For the other functions of Section 2.5, we compute the fit (25) by setting to zero some coefficients of g(z).
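A possible realization of the constrained fit, sketched with NumPy (the paper does not specify the solver; here the constraint a_e ≤ 0 is handled by simply dropping the exponential term when the unconstrained fit yields a positive coefficient):

```python
import numpy as np

def fit_cubic_cutoff(x, y):
    """Least-squares fit of g(z) = a0 + a1*z + a2*z**2 + a3*z**3 + ae*exp(z)
    on binned log-log data (x_i, y_i), returning [a0, a1, a2, a3, ae].
    If the unconstrained solution violates ae <= 0, the exponential
    column is removed and ae is set to 0 (a simple active-set step)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3, np.exp(x)])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    if coef[4] > 0:
        coef = np.append(np.linalg.lstsq(A[:, :4], y, rcond=None)[0], 0.0)
    return coef
```

Setting some columns of the design matrix to zero recovers the fits of the power law, log-normal, and cut-off functions of Section 2.5.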
If b_e < 0, the solution is

g_j = exp(b_0) j^(b_1) exp(b_2 log² j) exp(b_3 log³ j) exp(b_e j).

In some cases, the coefficient b_e might be zero. The series ∑_{j=1}^∞ j g_j is convergent for b_e < 0 or, when b_e = 0, for b_3 < 0. The corresponding model (p, f) is derived according to (5). We can give a closed-form approximation of the sequence p_j through the function π_j defined in (24).
We have already suggested that η coincides with the coefficient of the exponential term of f j , in the present case η = b e , but we still need to compute the function h(z) defined in (23).
In analogy with what has been done for f(z), we try for h(z) a polynomial regression of degree 3. So, we consider a subset of m ≤ n integers j_i, i = 1, . . . , m, in [1, J], equispaced in the logarithmic scale, such that j_1 = 1 and j_m = J. Then, setting x_i = log j_i and y_i = log p_(j_i), we solve the corresponding least-squares minimum problem. Replacing in (23), we get

π(z) = c_0 + c_1 z + c_2 z² + c_3 z³ + b_e exp(z),
π_j = exp(c_0) j^(c_1) exp(c_2 log² j) exp(c_3 log³ j) exp(b_e j).
A specific performance index ε_p (31) controls the effectiveness of the similarity of π_j to p_j. A too large ε_p would raise doubts on the approximation, possibly due to numerical instability in the computation of f_j.

The Attachment Rule
Instead of computing directly δ_j as the ratio between the sequences p_j and f_j for 1 ≤ j ≤ J, the sequence δ_j can be approximated by a function ξ_j obtained by exploiting the closed-form approximation π_j of p_j. The error of this approximation is measured by a specific performance index ε_δ (32). If ε_δ is sufficiently small, the investigation of ξ_j for 1 ≤ j ≤ J gives useful hints on δ_j. The quantity

ν = min{ k such that max_{j∈[2,J]} ξ_j / j^k ≤ 1 }   (29)

satisfies ξ_j = s_j j^ν, where s_j = ξ_j / j^ν, with 0 < s_j ≤ 1.
Then, we can take s_j as the probability for an attachment rule δ_j on the whole interval [2, J]. The value j = 1 has been excluded from definition (29) because, at j = 1, the ratio ξ_j/j^k does not change with k, so the condition could never be met when ξ_j assumes its maximum at j = 1. Of course, if the function ξ_j/j^ν is decreasing in [2, J], the value ν obtained from (29) coincides with ν = log₂ ξ_2. The function s_j takes the place of the coefficient s in (10). We call the attachment sublinear if ν < 1, superlinear if ν > 1, and pseudo-linear if ν = 1.
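Treating k as a real exponent, definition (29) can be computed directly: since ξ_j/j^ν ≤ 1 is equivalent to ν ≥ log ξ_j / log j for j ≥ 2, the exponent ν is the maximum of these ratios over the interval. A minimal sketch of our own reading of (29), assuming ξ_j > 0:

```python
import math

def attachment_exponent(xi, j_min=2):
    """Smallest nu with xi_j / j**nu <= 1 for all j in [j_min, J]:
    xi is a dict mapping degree j to xi_j; the answer is the maximum
    of log(xi_j)/log(j) over j >= j_min."""
    return max(math.log(x) / math.log(j) for j, x in xi.items() if j >= j_min)
```

When ξ_j/j^ν is decreasing on [2, J], the maximum is attained at j = 2, recovering ν = log₂ ξ₂ as noted above.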
The attachment exponent ν, as defined in (29), holds for the whole interval, but it depends excessively on the head of the dataset, whose behavior, even though in agreement with our model, could generate a uselessly overestimated attachment rule. An attachment exponent less affected by the first points could be more indicative. To this aim, we restrict the computation of (29) to a subinterval which leaves out the first j_min points (in our experimentation, we took j_min = 20).
The quantity ν can be used as a possible numerical measure to discriminate different types of datasets.

Performance Indices
In the experimentation, the function g(z) used for the fitting has been chosen among all the functions taken into consideration in the previous sections. Let g^(H)(z), with H ∈ {B, P, L, C, O}, denote one of these functions. The corresponding normalized functions f^(H)(z) are: the Beta function f^(B)(z) = f(z) defined in (16), the power law function f^(P)(z) defined in (17), the log-normal function f^(L)(z) defined in (18), the cut-off function f^(C)(z) defined in (19), and the cubic-cut-off function f^(O)(z) defined in (21).
Least-squares procedures solve the minimization problem (25), except for the Beta function, which requires a nonlinear minimization procedure (we used a Nelder-Mead procedure). The quality of the fitting is measured by the NRMSE (normalized root-mean-square error)

ε^(H) = sqrt( (1/n) ∑_{j=1}^n (g^(H)(x_j) − y_j)² ) / (y_max − y_min),

where y_min and y_max are the minimum and the maximum of the values y_j for j = 1, . . . , n. Besides the error ε^(H), the suitability of the model to the dataset can be measured by the scaling factor θ of (26).
In the case of b_e = 0, if b_3 > 0, the series ∑_{j=1}^∞ j g_j does not converge, and, in practice, θ is given a very large value. The same thing can also occur when the series is convergent but numerical instability prevents a correct computation. When θ is too large, we judge the model to be inadequate for that dataset. The symbol ∞ in the tables of the next section identifies this case.
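The NRMSE convention used throughout (root-mean-square error divided by the range of the reference values, consistent with the y_min/y_max normalization stated above) can be sketched as:

```python
import math

def nrmse(y_ref, y_fit):
    """Normalized root-mean-square error: RMSE of (y_fit - y_ref)
    divided by the range max(y_ref) - min(y_ref)."""
    mse = sum((a - b) ** 2 for a, b in zip(y_ref, y_fit)) / len(y_ref)
    return math.sqrt(mse) / (max(y_ref) - min(y_ref))
```

The same helper applies to all three indices ε^(H), ε_p, and ε_δ, with the appropriate reference and fitted sequences.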
Two more performance indices have emerged in the presentation of the whole fitting procedure.
(1) The quality of the approximation of p_j by π_j is measured by the NRMSE

ε_p = sqrt( (1/m) ∑_{i=1}^m (π_(j_i) − p_(j_i))² ) / (p_max − p_min),

where p_min and p_max are the minimum and the maximum of the values p_(j_i) for i = 1, . . . , m.
(2) A too large discrepancy between δ_j and ξ_j suggests that the similarity of the bases used for π(z) and f(z) cannot be assumed. This is measured by the NRMSE

ε_δ = sqrt( (1/m) ∑_{i=1}^m (ξ_(j_i) − δ_(j_i))² ) / (δ_max − δ_min),

where δ_min and δ_max are the minimum and the maximum of the values δ_(j_i) for i = 1, . . . , m.
These two indices ε_p and ε_δ have been evaluated for all the functions, but only the values obtained for the cubic-cut-off are reported in the next section.

Experiments
The experimentation has been performed on a machine with a 3.2 GHz 8-core Intel Xeon W processor using Mathematica™ version 12 and carried out on 21 datasets divided into three groups: scalar phenomena, directed graphs, and bipartite graphs. The code, together with the datasets not available elsewhere, can be downloaded from Reference [13]. For each dataset, the citation, a brief description, the number N of items, and the size S, equal to the total number of degrees, are given below.
Following the description, a first table summarizes the results of the experimentation. Columns 1-5 of the table show the errors of the solutions computed by the different procedures. The error is replaced by ∞ if the series ∑_{j=1}^∞ j g_j does not converge. For the power law, this means that ρ < 2, i.e., a well-defined mean does not exist. The error is replaced by an * if it exceeds three times the best error for the same dataset. Columns 6 and 7 list the indices ε_p and ε_δ of the cubic-cut-off. Column 8 lists the exponent ν of the attachment rule defined in (29), where j ∈ [j_min, J], with j_min = 20.
A second table gives the log-log representation of g^(O)(z) for some selected datasets. For these datasets, the base-10 log-log plots of the cubic-cut-off functions (solid line) are given, superimposed on the original data (gray points) and the binned data (black points). An integer i on the axis corresponds to 10^i in the linear scale.

Scalar Phenomena
The scalar phenomena are characterized by a single distribution of values. For each dataset, the file of the pairs (j, Q_j), where Q_j is the frequency function defined in (1), is generated. The name of the file corresponds to the name of the dataset. The errors of the computed solutions are given in Table 1. Two files have been selected: their log-log solutions g^(O)(z) are given in Table 2, and the log-log plots of the cubic-cut-off functions are given in Figure 5.

Table 2. Solutions of two selected scalar phenomena.

citiesPopulation: −1.05 + 2.690z − 0.358z² + 0.009z³ − 5.9·10^(−8) e^z
english: 16.08 − 1.840z + 0.041z² − 0.002z³

Directed Graphs
In directed graphs, the edges have an orientation, so there exist inlinks (pointing to a node) and outlinks (originating from a node). In this case, the degree of the node becomes, more specifically, the indegree, which counts the inlinks, and the outdegree, which counts the outlinks. For each graph, two files are generated with the frequency function (2): the one containing the indegrees and the one containing the outdegrees. Their names correspond to the name of the graph with the suffix .i and .o, respectively.

• clueweb12 [19]. The web graph underlying ClueWeb12, a dataset created to support research on information retrieval and related human language technologies. N = 978 M, and S = 43.6 G.

The errors of the computed solutions are given in Table 3. Four files have been selected: their log-log solutions g^(O)(z) are given in Table 4. Note that eu2015.i lacks the exponential term, and b_3 is so small that the cubic-cut-off solution is nearly equal to the log-normal solution, as confirmed by the same error in Table 3.
The log-log plots of the cubic-cut-off functions are given in Figures 6 and 7.

Bipartite Graphs
Bipartite graphs contain two types of nodes, active and passive. The edges connect an active node with a passive one. As in the previous case, for each graph, two files are generated with the frequency function (2): the one with the suffix .i contains the indegrees (i.e., the degrees of the passive nodes), and the one with the suffix .o contains the outdegrees (i.e., the degrees of the active nodes).
Most of the considered graphs are rating networks between persons and items they have rated. The rating values are ignored, and only the information of whether a person has rated an item is retained.

• fine foods [25]. The dataset of the reviews of fine foods from Amazon. The data span a period of more than 10 years, up to October 2012. N = 74.2 K, and S = 568 K.

• last.fm. A large database of listening data crawled by [26] using the last.fm API. Both the user-song relations (N = 211 K, and S = 1.29 G) and the user-song relations weighted with the number of plays (N = 211 K, and S = 4.96 G) were considered.

• movielens [27]. The dataset describes the 5-star rating and free-text tagging activity of MovieLens, a movie recommendation service. It contains 25 M ratings. These data were created between 9 January 1995 and 21 November 2019. N = 162 K, and S = 25 M.

• supermarket. A small database of supermarket purchases collected by [28]. Both the user-product relations (N = 60.4 K, and S = 24.6 M) and the user-product relations weighted with the number of purchases (N = 60.4 K, and S = 107 M) were considered.

• Yahoo! artists [29]. The artist ratings collected from the Yahoo! Webscope dataset R1. This dataset represents an (anonymized) snapshot of the Yahoo! Music community's preferences for various musical artists, collected in one month sometime prior to March 2004. N = 1.95 M, and S = 116 M.
The errors of the computed solutions are given in Table 5. Four files have been selected: their log-log solutions g (O) (z) are given in Table 6, and the log-log plots of the cubic-cut-off functions are given in Figures 8 and 9.

Comments
The first thing we note from Tables 1, 3, and 5 is that the cubic-cut-off mostly outperforms the other procedures, which can be ranked as follows: log-normal, Beta, cut-off, power law. The winning point of the cubic-cut-off and the log-normal is their better ability to adapt to the bending of the head, but the log-normal behaves worse than the cubic-cut-off in the tail because of the lack of the exponential term. The relevance of the logarithmic terms is confirmed by the fact that only in a small number of cases does the cut-off reach the performance of the cubic-cut-off. The small values of ε_p and ε_δ provide an indirect proof of the validity of the cubic-cut-off model.
From Table 3, we note a characteristic behavior of the function ξ_j of the cubic-cut-off for most directed graphs, in particular for the Wikipedia graphs: typically, the indegree files have a larger attachment exponent than the outdegree files. This agrees with the reasonable idea that the outlink process is independent of the degree of the node, while the inlink process relies more on the degree of the node.
On the contrary, the difference between indegree and outdegree files in bipartite graphs appears reversed, pointing out the active role of a person in choosing a particular item. This is evident, for example, in the case of the supermarket datasets. A possible explanation for these outcomes is some aggressive commercial policy that directs purchases toward the more advertised products. It is worth noting that cubic-cut-off succeeds in coping even with the particularly difficult datasets last.fm.o and last.fmW.o, which exhibit a very messy head. In these cases, since the resulting attachment rule is conditioned by dozens of head points, it seems appropriate to set j_min = 100, thus reducing the exponent to values near 2.
A large attachment exponent appears for the citiesPopulation dataset as well, reflecting the recognized great attractiveness of the world's most important cities.
Finally, for many of the considered datasets, s_j turns out to be decreasing in the tail, suggesting that the attachment rule might weaken progressively when the items have a very large degree. If we associate the degree of an item with its age (as is often done), in the sense that an item with a larger degree is assumed to be older, this weakening behavior could be considered a possible indicator of a phenomenon of obsolescence.
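A decreasing attachment rate of this kind can also be observed directly from data. The following sketch shows one hypothetical way to estimate an empirical attachment rate by degree from a stream of attachment events; it is not the paper's definition of s_j, only a simple estimator that counts how often attachments hit degree-j items, normalized by the time items spend at degree j:

```python
from collections import Counter

def attachment_rate(events):
    """Empirical attachment rate by degree for a stream of attachment
    events (one target item per event): attachments received while at
    degree j, divided by the item-steps spent at degree j. This is a
    hypothetical estimator, not the paper's definition of s_j."""
    degree = Counter()    # current degree of each item
    hits = Counter()      # attachments received while at degree j
    exposure = Counter()  # item-steps spent at degree j
    for item in events:
        for j, n in Counter(degree.values()).items():
            exposure[j] += n
        hits[degree[item]] += 1
        degree[item] += 1
    return {j: hits[j] / exposure[j] for j in hits if exposure[j]}

# Item "a" is hit at degrees 1 and 2, each after one step of exposure.
print(attachment_rate(["a", "a", "a", "b"]))  # {1: 1.0, 2: 1.0}
```

A systematically lower estimated rate at large j would be consistent with the obsolescence interpretation suggested above.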

Conclusions and Future Work
In this paper, a model for the frequency analysis of systems representing multi-scale real-world phenomena has been proposed. At its basis, a discrete-time stochastic process leads to a steady-state solution ruling the attachment policy. The attachment rule, which in the original mixed model is linear, has been enriched to include elements of the exponential cut-off model and of the log-normal model.
The proposed model, called cubic-cut-off, has been applied to a large number of datasets and proves to be more effective than other models, such as the widely applied log-normal and cut-off, which in some cases are unable to give acceptable approximations, as clearly appears from the inspection of Tables 1, 3, and 5, where the cubic-cut-off is compared with the Beta function, the power law, the log-normal, and the cut-off.
In a few cases, its behavior is only slightly better than that of the log-normal, showing that the cubic and exponential terms added to the log-normal have a small influence; in most cases, however, the presence of these terms is essential to obtain good results.
The frequency analysis performed in this paper applies to a network modeling based on graph representations as a discrete structure. The model we proposed belongs to the class of parametric models, where some finite set of parameters is assumed. Alternatively, the class of non-parametric models, where an infinite set of parameters is assumed, can be taken into consideration; see, for example, Reference [30], where a Bayesian non-parametric model for random graphs is shown to exhibit power-law behavior, and a general framework for bipartite graphs, directed multigraphs, and undirected graphs is described. In future research, we are interested in studying non-parametric models applied to networks exhibiting peculiar behaviors, like the ones considered in this paper.
Author Contributions: All the authors have contributed substantially and in equal measure to all the phases of the work reported. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.