Taylor’s Law in Innovation Processes

Taylor’s law quantifies the scaling properties of the fluctuations of the number of innovations occurring in open systems. Urn-based modeling schemes have already proven to be effective in modeling this complex behaviour. Here, we present analytical estimations of Taylor’s law exponents in such models, by leveraging their representation in terms of triangular urn models. We also highlight the correspondence of these models with Poisson–Dirichlet processes and demonstrate how a non-trivial Taylor’s law exponent is a kind of universal feature in systems related to human activities. We base this result on the analysis of four collections of data generated by human activity: (i) written language (from a Gutenberg corpus); (ii) an online music website (Last.fm); (iii) Twitter hashtags; (iv) an online collaborative tagging system (Del.icio.us). While Taylor’s law observed in the last two datasets agrees with the plain model predictions, we need to introduce a generalization to fully characterize the behaviour of the first two datasets, where temporal correlations are possibly more relevant. We suggest that Taylor’s law is a fundamental complement to Zipf’s and Heaps’ laws in unveiling the complex dynamical processes underlying the evolution of systems featuring innovation.


Introduction
The laws of Zipf [1][2][3], Heaps [4,5] and Taylor [6,7], which quantify, respectively, the frequency distribution of elements in a given system, the rate at which new elements enter a given system, and the fluctuations in that rate, are recognized as the most general statistical laws characterizing complex systems featuring innovation. As such, they also set minimal requirements that the predictions of a given modeling scheme should satisfy in order to correctly address the fundamental mechanisms driving innovation processes. Zipf's law, or the generalized Zipf's law predicting a frequency-rank distribution of the form f(R) = R^{−β}, with 0 < β < +∞ (whereas the strict Zipf's law refers to β = 1), characterizes disparate systems, from city populations to earthquake amplitudes to the frequency of words in written texts, and different explanations for its emergence have been proposed so far [8][9][10]. While Zipf's law is a static property of the system, Heaps' law explicitly refers to its evolution and states that the number of distinct elements D(n), when the system consists of n elements, follows a power law D(n) ∝ n^γ, with 0 < γ ≤ 1. This points to two fundamental properties shared by different systems related to human activity, from natural language, to the way humans listen to music, interact in collaborative online systems, or build up collaborations in research: (i) new elements continuously enter the system; (ii) the rate at which innovation occurs slows down with the intrinsic time of the system (when the strict inequality γ < 1 holds), e.g., it is easier and easier to continue with established elements than to introduce new ones. In this work, we analyze Taylor's law, which quantifies the fluctuations of D(n) around its mean, in four datasets: (i) a subset of the Gutenberg corpus of English texts; (ii) Twitter hashtags; (iii) a collaborative tagging system (Delicious); (iv) the list of temporally ordered songs listened to by many users on the Last.fm website. We observe that Taylor's law for the actual sequences of events follows, in all the datasets, the form σ[D(n)] ∝ µ[D(n)]^β, with β ≳ 1.
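As a minimal illustration of the first two laws, the rank-frequency spectrum and the D(n) curve can be extracted from any token sequence. The function name and the toy input below are ours, not part of the original analysis:

```python
from collections import Counter

def zipf_heaps(tokens):
    """Rank-frequency list (Zipf) and distinct-element curve D(n) (Heaps) of a token sequence."""
    freqs = sorted(Counter(tokens).values(), reverse=True)  # f(R) for ranks R = 1, 2, ...
    seen, heaps = set(), []
    for tok in tokens:
        seen.add(tok)
        heaps.append(len(seen))  # D(n) after the first n tokens
    return freqs, heaps
```

On log-log axes, the slope of freqs versus rank estimates −β, and the slope of heaps versus n estimates γ.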
Furthermore, we highlight how the randomized sequences, obtained by retaining all the elements of the original sequences while changing their temporal order (see Section 5 for details), show a different behavior in the Gutenberg corpus and Last.fm compared with Twitter and Delicious. This issue remains not fully explained and leaves open the need for a deeper understanding of the process responsible for this behavior.
The paper is organized as follows. In the next section, we recall the urn model with triggering, with particular emphasis on its connection with the two-parameter Poisson-Dirichlet process [27,28]. We recall, in particular, how the urn model with triggering can be recast to be equivalent to the latter stochastic process; at the same time, it extends the Poisson-Dirichlet process to the region where the latter is not defined, i.e., the region of linear innovation growth. In Section 3, relying on known results on triangular urn models [32,33], we characterize the limit distribution of the number of distinct elements D(n), and in Section 4 we derive Taylor's law for the urn model with triggering. In Section 5, we discuss Taylor's law in the four datasets mentioned above. Finally, in Section 6, we discuss two different mechanisms that can increase the exponent of Taylor's law to a value β > 1, as observed in the considered real-world systems.

The Urn Model with Triggering
The urn model with triggering, introduced in [12], is a minimal model, based on Pólya's urn, able to reproduce the main statistical signatures of innovation processes, namely Zipf's, Heaps' and Taylor's laws. It casts in a mathematical framework the idea of the expansion into the adjacent possible [13,34,35], where the space of possibilities is continuously enlarged as part of it is realized. A crucial ingredient is thus the correlation between the emergence of novel elements in the system. The model works as follows. An urn initially contains N_0 > 0 distinct balls of different colors. Then, at each time step t, a ball is drawn at random from the urn to construct a sequence S of events, and it is put back in the urn. Further,

• if the color of the extracted ball is a new one (it appears for the first time in S, i.e., it is a realization of a novelty), then we add ρ̂ balls of the same color plus ν + 1 distinct balls of different new colors, which were not yet present in the urn; note that we use here the word new in two different senses: on the one hand, we refer to events that occur for the first time; on the other, to new colors that enter the space of possible events;
• if the color of the extracted ball is already present in S, we add ρ balls of the same color.
Therefore, if C_{t+1} is the color of the ball extracted at time t + 1 and D_t is the number of distinct colors extracted up to time t, we have:

P(C_{t+1} = new | C_1, . . . , C_t) = (N_0 + ν D_t) / (N_0 + ρ t + a D_t),

where a := −ρ + ρ̂ + ν + 1. Moreover, if c denotes an old color, we have

p_{c,t} := P(C_{t+1} = c | C_1, . . . , C_t) = (ρ̂ + 1 + ρ (K_{c,t} − 1)) / (N_0 + ρ t + a D_t),

where K_{c,t} denotes the number of extractions of the color c up to time t.
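The dynamics just described can be simulated directly. The sketch below follows the rules above, with rho_hat standing for the reinforcement of a first-time color (ρ̂ in the text); the default parameter values are our own illustrative choices:

```python
import random

def urn_with_triggering(steps, N0=10, rho=3, rho_hat=3, nu=2, seed=0):
    """Simulate the urn model with triggering; returns the trajectory of D_t.

    Colors are integers; counts[c] is the number of balls of color c in the urn.
    """
    rng = random.Random(seed)
    counts = {c: 1 for c in range(N0)}   # N0 distinct balls, one per color
    next_color = N0
    drawn = set()                        # colors already appeared in the sequence S
    D = []
    for _ in range(steps):
        colors = list(counts)
        weights = [counts[c] for c in colors]
        c = rng.choices(colors, weights=weights)[0]
        if c not in drawn:               # a novelty: reinforce it and trigger nu+1 new colors
            drawn.add(c)
            counts[c] += rho_hat
            for _ in range(nu + 1):
                counts[next_color] = 1
                next_color += 1
        else:                            # an old color: plain Polya reinforcement
            counts[c] += rho
        D.append(len(drawn))
    return D
```

Tracking D_t over many independent runs is what Sections 3 and 4 need in order to compare µ[D_t] and σ[D_t].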

Values of the Model Parameters
Note that we have p_{c,t} > 0 for each t when ρ̂ > −1. The model can also be defined for ρ̂ = −1, but this implies that every extraction yields a new color, i.e., D_t = t for all t. Moreover, the value ρ = 0 is possible, but in that case p_{c,t} would not depend on K_{c,t}, i.e., no reinforcement effect would be present. Therefore, we focus on the case ρ̂ > −1, ν ≥ 0 and ρ > 0.

Triangular Urn Schemes and Innovation Rate
Concerning the behavior of the number of distinct elements D t , the above urn model can be seen as a triangular two-color urn scheme [32,33,39,40]. More precisely, we can consider an urn model with the following dynamics. The urn initially contains N 0 > 0 black balls. Then, at each time step t, a ball is drawn at random from the urn and

• if the color of the extracted ball is black, then we replace the extracted ball with a white ball and we add ρ̂ white balls plus ν + 1 black balls;
• if the color of the extracted ball is white, we return the extracted ball to the urn together with ρ additional white balls.
Therefore, in this urn scheme the extraction of a black (resp. white) ball corresponds to the extraction of a new (resp. old) color in the urn model with triggering. If we denote by B_t and W_t, respectively, the number of black and white balls in the urn at time step t, and by δ_t a random variable taking values in {0, 1} such that δ_t = 1 if the ball extracted at time step t is black, then we have B_0 = N_0 > 0, W_0 = 0 and, for each t ≥ 0,

B_{t+1} = B_t + ν δ_{t+1},   W_{t+1} = W_t + ρ̃ δ_{t+1} + ρ (1 − δ_{t+1}),

with ρ̃ := ρ̂ + 1 and P(δ_{t+1} = 1 | δ_1, . . . , δ_t) = B_t / (B_t + W_t). A dynamics of this kind is a two-color urn model with triangular replacement matrix

( ν  ρ̃ )
( 0  ρ )

The balance condition, which requires that the number of added balls be the same at each time step, independently of the color of the extracted ball, corresponds to the particular case a = ν + ρ̃ − ρ = 0. Recalling that we are assuming ν ≥ 0, ρ̃ > 0 and ρ > 0, the balance condition is possible only if ρ > ν. According to the above notation, we can write

D_t = (B_t − N_0) / ν   for ν > 0.

Therefore, when ν > 0, the asymptotic behaviour of D_t coincides with that of B_t/ν and, from the results in [32,33,39] (simply translating the results proven in those papers in terms of the considered model), we immediately obtain (in the following, →a.s. means almost sure convergence and →d means convergence in distribution) that, for 0 < ν < ρ,

D_t / t^{ν/ρ} →a.s. D,

where D is a suitable random variable with finite moments. In particular, when a = 0, the random variable D has probability density function

f_D(x) = c x^{N_0/ν} f_ML(x),

where c is a normalizing constant and f_ML denotes the probability density function of the Mittag-Leffler distribution with parameter ν/ρ. Hence, for a = 0, explicit expressions for the moments of D_t are available (see Section 4). When ν > ρ, the number of distinct elements instead grows linearly,

D_t / t →a.s. (ν − ρ)/a,

and the second-order behaviour depends on the value of ρ/ν. Precisely, denoting by N(0, σ²) the normal distribution with mean value equal to zero and variance equal to σ², we have:

• for ρ/ν < 1/2:  √t (D_t/t − (ν − ρ)/a) →d N(0, σ²);
• for ρ/ν = 1/2:  √(t/ln t) (D_t/t − (ν − ρ)/a) →d N(0, σ²);
• for ρ/ν > 1/2:  t^{1−ρ/ν} (D_t/t − (ν − ρ)/a) →d Z,

where Z is a suitable random variable.
For the degenerate case ν = 0, we trivially have B_t = N_0 for each t. Moreover, we recall that ρ̃ > 0 and W_t − ρt = (ρ̃ − ρ)D_t. Hence, when ρ̃ ≠ ρ, the asymptotic behaviour of D_t follows from the results on W_t (see [33]); that is, we have

D_t / ln t →a.s. N_0/ρ

and

(D_t − (N_0/ρ) ln t) / √(ln t) →d N(0, N_0/ρ).

The balance condition with ν = 0 means ρ̃ = ρ, and in this case we have a Dirichlet process with parameter θ = N_0/ρ; the above convergence results still hold true.
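The two-color projection is also convenient numerically, since only the counts (B_t, W_t) need to be tracked. The sketch below (our own, with illustrative parameter values) lets one check the identities B_t = N_0 + ν D_t and B_t + W_t = N_0 + ρ t + a D_t at every step:

```python
import random

def triangular_urn(steps, N0=10, rho=3, rho_hat=3, nu=2, seed=0):
    """Two-color projection of the urn with triggering: black = not-yet-drawn colors."""
    rng = random.Random(seed)
    B, W, D = N0, 0, 0                  # black balls, white balls, number of black draws
    history = []
    for _ in range(steps):
        if rng.random() < B / (B + W):  # draw black: one new color is discovered
            B += nu                     # extracted black is recolored white, nu+1 blacks added
            W += rho_hat + 1            # rho_hat reinforcements plus the recolored ball
            D += 1
        else:                           # draw white: plain reinforcement
            W += rho
        history.append((B, W, D))
    return history
```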

Taylor's Law
Taylor's law connects the standard deviation of a random variable to its mean. In the considered model, when the balance condition a = 0 is satisfied, we can obtain explicit formulas for the moments of D_t: indeed, from [33,41], using the relation D_t = (B_t − N_0)/ν, we obtain explicit expressions for µ[D_t] and µ[D_t²], and so σ²[D_t] = µ[D_t²] − µ[D_t]². To our knowledge, in the unbalanced case there are no explicit formulas for the first and second asymptotic moments of D_t. Here, we conjecture that suitable uniform integrability conditions hold for the convergence results in Section 3, so that the almost sure convergence and the convergence in distribution imply the convergence of the first two moments (see, e.g., [42,43]). In other words, we leverage the convergence results in Section 3 in order to guess the corresponding Taylor's law. For 0 < ν < ρ, the almost sure convergence of D_t/t^{ν/ρ} to the random variable D then gives µ[D_t] ≈ µ[D] t^{ν/ρ} and σ[D_t] ≈ σ[D] t^{ν/ρ}, so that σ[D_t] ∝ µ[D_t], i.e., a Taylor exponent β = 1. For ν > ρ, since the almost sure limit in (7) is a constant, we cannot exploit the almost sure convergence (7) in order to obtain a Taylor's law as done for the previous case 0 < ν < ρ. However, from the convergence in distribution (8), we can guess the asymptotic behaviour of the fluctuations. Since D_t/t ∈ [0, 1] for all t, the almost sure convergence (9) implies the convergence of the moments (see [43]) for that equation; however, this is not enough in order to get a Taylor's law, and we need to use (10), (11) and (12).
First of all, we observe that

σ²[D_t/t] = µ[(D_t/t)²] − µ[D_t/t]².

Hence:

• for ρ/ν < 1/2, we guess from (10) that the first term on the right-hand side of the above equality behaves as σ²/t, while the second term is o(1/t), and so we get σ[D_t] ∝ µ[D_t]^{1/2}, with the constant of proportionality equal to σ √(a/(ν − ρ));
• for ρ/ν = 1/2, we guess from (11) that the first term on the right-hand side of the above equality behaves as σ² ln(t)/t, while the second term is o(ln(t)/t), and so we get σ[D_t] ∝ (µ[D_t] ln µ[D_t])^{1/2}, with the constant of proportionality equal to σ √(a/ρ);
• for ρ/ν > 1/2, we guess from (12) that the first term and the second term on the right-hand side of the above equality behave as µ[Z²] t^{2(ρ/ν−1)} and µ[Z]² t^{2(ρ/ν−1)}, respectively, and so we get σ[D_t] ∝ µ[D_t]^{ρ/ν}.

In the degenerate case ν = 0, from the almost sure convergence (13) we guess µ[D_t] ≈ (N_0/ρ) ln t and, from the convergence in distribution (14), we guess that σ²[D_t]/ln t converges to N_0/ρ, while the correction terms converge to zero. Hence, we find σ[D_t] ∝ µ[D_t]^{1/2}. All the above theoretical predictions are supported by simulations, shown in Figure 1, left. We also report in Figure 1, right, Taylor's law for the corresponding reshuffled sequences, where the elements are the same as in the original sequences but their temporal order is lost. For a discussion of the shuffling procedure, see the next section.
Figure 1. Each realization is a sequence of 10^6 elements. Right: Taylor's law from the same sequences as in the left-side picture, individually reshuffled so as to lose the temporal order (refer to the parallel file random reshuffling procedure discussed in Section 5 and in Figure 2).
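A log-log least-squares fit of σ[D_t] against µ[D_t] across independent realizations gives a numerical estimate of the Taylor exponent β; the helper below is our own sketch of that estimation:

```python
import math

def taylor_exponent(runs):
    """Fit sigma[D(n)] ∝ mu[D(n)]^beta by least squares on log-log values.

    runs: list of equal-length D(n) trajectories, one per realization.
    """
    T = len(runs[0])
    xs, ys = [], []
    for n in range(T):
        vals = [r[n] for r in runs]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / (len(vals) - 1)  # sample variance
        if var > 0 and mu > 0:
            xs.append(math.log(mu))
            ys.append(0.5 * math.log(var))  # log sigma
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))
```

Feeding it trajectories produced by the urn simulations above should reproduce the regimes derived in this section (β = 1 for ν < ρ; β = 1/2 or β = ρ/ν for ν > ρ), up to finite-size effects.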

Taylor's Law in Real World Systems
We base our empirical analysis on four datasets whose content is the result of voluntary human activity. The first dataset consists of English written texts from the on-line collection of public-domain books hosted by the Gutenberg project [44]. This dataset was crawled in 2007. From it, we selected the 100 longest books. In this case, innovations are represented by new words entering a text. The second dataset contains the list of songs listened to by 1000 Last.fm users until the 5th of May 2009 [45]. This list is ordered according to the time of listening. Songs listened to for the first time on the Last.fm platform are considered as innovations. The third dataset contains the time-ordered list of tags in the platform Del.icio.us [46], where users used keywords (tags) to categorize bookmarked URLs. The dataset contains the tag sequence of user activity from early 2004 up to the end of 2006. The Del.icio.us platform has since been discontinued. We treat as an innovation the very first usage of a tag in Del.icio.us. The fourth and last dataset contains the time-ordered sequence of 10% of all the hashtags that appeared in January 2013 on the micro-blogging platform Twitter [47]. In this dataset too, we consider brand new hashtags entering the system as innovations. All four datasets were already studied in previous works [12,14].
In order to estimate the average number of distinct tokens and their standard deviation, we preprocessed the data by splitting them into sequences of a given fixed length. We use the generic term token to denote the elements of the sequences, which are words in the Gutenberg dataset, song titles in Last.fm, tags in Del.icio.us and hashtags in Twitter. In Gutenberg, we consider the natural splitting, each sequence being a book. To obtain sequences of the same length, we cut all the books at the length of the shortest one, thus extracting the first 200,000 words of each of the 100 books. In Del.icio.us, we took the last 2 × 10^7 tags and split them into 1000 chunks of 20,000 tags each. In Last.fm, we selected the last 19 × 10^6 titles and split them into 190 chunks of length 100,000. In Twitter, we selected the last 346 × 10^5 hashtags and created 346 chunks of length 100,000.
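The chunking and the parallel counting of distinct tokens described above can be sketched as follows (function and variable names are ours):

```python
def distinct_counts_parallel(sequence, n_chunks, chunk_len):
    """Split a token sequence into equal chunks and track D(N) in each chunk in parallel."""
    chunks = [sequence[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]
    trajectories = []
    for chunk in chunks:
        seen, traj = set(), []
        for tok in chunk:
            seen.add(tok)
            traj.append(len(seen))      # D(N) after the first N tokens of this chunk
        trajectories.append(traj)
    # mu(N) and sigma(N) across chunks, for each position N
    mu = [sum(t[n] for t in trajectories) / n_chunks for n in range(chunk_len)]
    sd = [(sum((t[n] - mu[n]) ** 2 for t in trajectories) / n_chunks) ** 0.5
          for n in range(chunk_len)]
    return mu, sd
```

Plotting sd against mu for each position N then gives the empirical Taylor's law of Figure 3.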
The estimation of the average number of distinct tokens, as well as of their standard deviation, is done by determining the number of distinct tokens appearing before a given position in all the split chunks in parallel. For example, in Gutenberg, we count the number of distinct words D_i(N) appearing among the first N words of each of the M = 100 books and calculate

µ(N) = (1/M) Σ_{i=1}^{M} D_i(N),   σ²(N) = (1/M) Σ_{i=1}^{M} (D_i(N) − µ(N))².   (16)

We plot these two quantities one against the other for each N and display the result in Figure 3. In order to evaluate the influence of the macroscopic statistical properties of the tokens, e.g., Zipf's law, on Taylor's law, we destroy correlations by reshuffling the sequences. We perform two different shuffles, with increasing randomization, as displayed in Figure 2. In the first one, which we denote as parallel file random, we shuffle the tokens within each sequence. In the other one, which we call parallel random, we shuffle the tokens across all the sequences. The results of these randomization schemes on Taylor's law are shown in Figure 3. Let us first note that the Del.icio.us and Twitter datasets feature a Taylor exponent, that is, the exponent β in the relation σ(N) ∝ µ(N)^β, approximately equal to one. This behavior is well reproduced by the urn model with triggering discussed in Section 2, in the parameter region ν < ρ (refer to Figure 1), that is, the region where its exchangeable counterpart, namely the two-parameter Poisson-Dirichlet process, is defined. Conversely, the Gutenberg and Last.fm datasets show a significant deviation from the linear relation between the standard deviation and the mean of D_t, featuring a Taylor exponent β > 1. The simple urn model with triggering, as well as the two-parameter Poisson-Dirichlet process, fails to predict this deviation from a unitary exponent. However, simple generalizations of the considered model can account for it. In the next section, we discuss two possible approaches leading to a similar effect on the Taylor exponent.
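The two randomization schemes can be sketched as follows, assuming the corpus is already split into equal-length chunks (function names are ours; the text calls the schemes parallel file random and parallel random):

```python
import random

def parallel_file_random(chunks, seed=0):
    """Shuffle tokens within each chunk separately (local reshuffle)."""
    rng = random.Random(seed)
    out = []
    for chunk in chunks:
        c = list(chunk)
        rng.shuffle(c)
        out.append(c)
    return out

def parallel_random(chunks, seed=0):
    """Pool tokens from all chunks, shuffle globally, then re-split (global reshuffle)."""
    rng = random.Random(seed)
    pool = [tok for chunk in chunks for tok in chunk]
    rng.shuffle(pool)
    k = len(chunks[0])
    return [pool[i * k:(i + 1) * k] for i in range(len(chunks))]
```

Both schemes preserve each token's total frequency over the corpus, so Zipf's law is untouched; only the temporal correlations differ.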
Before doing that, we wish to comment further on the results obtained on the reshuffled sequences. The parallel random procedure asymptotically produces a trivial (equal to 1/2) Taylor exponent for all the datasets, reflecting the fact that random sampling from a power law distribution produces a Taylor exponent β = 1/2 [7,21]. The parallel file random procedure again requires distinguishing the Gutenberg and Last.fm datasets from Del.icio.us and Twitter. While in the latter datasets the locally reshuffled sequences behave essentially as the original (temporally ordered) ones, in the first two datasets a peculiar behavior of the locally reshuffled sequences emerges, similar in the two datasets and stable against different choices of sets of books in the Gutenberg dataset (Figure 4). The discrepancy between Taylor's law in the reshuffled sequences and in the original ones indicates that random sampling from different power law distributions cannot account for the observations, and a different dynamical process has to be considered. The exact mechanism leading to the observed behavior remains an open question, which calls for a more detailed analysis, probably adopting hierarchical models, where correlations between the word distributions in different books are taken into account. This will be the topic of further work.

Figure 4. Same as Figure 3 (top left), with 20 different realizations of the parallel file random reshuffling procedure. We see that the difference between the curve referring to the ordered sequences and those referring to the reshuffled ones is much larger than the fluctuations due to different realizations of the reshuffling.

Two Mechanisms that Increase Fluctuations
We here propose two mechanisms that generalize the basic model and are able to account for the higher exponent. On the one hand, increased fluctuations can be obtained through a quenched stochasticity in the model parameters. That is, each book can be considered as an instance of the considered stochastic process with parameters extracted from a given probability distribution. The term quenched refers to the fact that the parameters are extracted for each realization of the process and remain fixed throughout the generation of the sequence. As a second mechanism, we consider the urn model with semantic triggering, introduced in [12] to account for the observed clusterization in the emergence of novelties.

Random Parameters
For the sake of analytical simplicity, we here discuss in detail only the case with ν as the random parameter. We show from simulations that similar behaviors are obtained when we take ρ or N_0 as the random parameter (Figure 5).

• (Case ν > ρ) As seen before, the Taylor exponent in the case ν > ρ is always smaller than 1. Suppose now that ρ̂ and ρ are constants and that there exists a random variable X_0, with σ²[X_0] > 0, that gives the value of ν. Given the value of X_0, the urn process behaves as described before. If X_0 is concentrated on (ρ, +∞), that is, X_0/ρ > 1 almost surely, then, on the event {X_0 = ν}, the sequence D_t/t converges almost surely to the value (ν − ρ)/(ν + ρ̃ − ρ). Therefore, since D_t/t is bounded, the first two moments of D_t/t converge as well [43]. Therefore, by setting D = (X_0 − ρ)/(X_0 + ρ̃ − ρ), we get µ[D_t] ≈ µ[D] t and σ[D_t] ≈ σ[D] t, with σ[D] > 0, and hence σ[D_t] ∝ µ[D_t]. This means that while a deterministic parameter ν > ρ gives a Taylor exponent smaller than 1, a random parameter ν, with ν/ρ > 1 almost surely, gives a Taylor exponent equal to 1.
• (Case ν < ρ) As seen before, the Taylor exponent in the case ν < ρ is equal to 1. Suppose now, as before, that X_0 is a random variable, with σ²[X_0] > 0, that gives the value of ν, while the other parameters are constant. If X_0 is concentrated on (0, ρ), that is, X_0/ρ < 1 almost surely, then, on the event {X_0 = ν}, the sequence t^{−ν/ρ} D_t converges almost surely to a suitable random variable D_ν. Moreover, from [33], we have explicit asymptotic expressions for the conditional moments of D_t given {X_0 = ν}. Assuming, as in the previous section, a condition of uniform integrability, we can obtain µ[D_t] by averaging the conditional first moment over X_0, which involves t^{X_0/ρ} and a correction term g_1(X_0), where g_1(ν) is the function given in (17) with q = 1. Similarly, µ[D_t²] involves t^{2X_0/ρ} and a correction term g_2(X_0), where g_2(ν) is the function given in (17) with q = 2.
If we neglect the terms g_q(X_0) in the above mean values, we have

µ[D_t] ∝ µ[t^{X_0/ρ}] = G_{X_0}(ln t / ρ),   µ[D_t²] ∝ µ[t^{2X_0/ρ}] = G_{X_0}(2 ln t / ρ),

where G_{X_0} is the moment-generating function of X_0. For instance, if X_0 is uniformly distributed on (0, ρ), we get G_{X_0}(ln t/ρ) = (t − 1)/ln t and G_{X_0}(2 ln t/ρ) = (t² − 1)/(2 ln t), and so, as above, the standard deviation σ[D_t] grows faster than the mean µ[D_t]. From Figure 5, we see that the above predictions are valid asymptotically, after a long transient where a law σ[D_t] ∝ µ[D_t]^β, with β > 1, seems to hold.
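A quenched choice of ν can be simulated by drawing ν once per realization and then running the two-color dynamics of Section 3 unchanged. In this sketch, the law of ν (a uniform integer pick with ν/ρ > 1) and all default values are our own illustrative assumptions:

```python
import random

def quenched_nu_runs(n_runs, steps, N0=100, rho=1, rho_hat=1, seed=0):
    """D_t trajectories of the urn with triggering, with nu quenched per realization."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        nu = rng.randint(2, 6)            # hypothetical quenched law for nu, here nu/rho > 1
        # compact two-color simulation of D_t (see Section 3)
        B, W, D, traj = N0, 0, 0, []
        for _ in range(steps):
            if rng.random() < B / (B + W):  # a novelty is drawn
                B += nu
                W += rho_hat + 1
                D += 1
            else:                           # an old element is drawn
                W += rho
            traj.append(D)
        runs.append(traj)
    return runs
```

Fitting σ[D_t] against µ[D_t] across such runs should show the β = 1 behavior predicted above for X_0/ρ > 1 almost surely.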

Urn Model with Semantic Triggering
For the sake of completeness, we recall here the urn model with semantic triggering introduced in [12], where it was shown that this generalization of the basic model discussed in Section 2 is crucial in order to reproduce higher-level features ruling the introduction of novelties in real systems. Let us again consider an urn U initially containing N_0 > 0 distinct balls with different colors. Each ball is endowed with a color and with a label as well. Balls with different colors can share the same label, each label defining a semantic group, while balls with different labels necessarily have different colors. The N_0 balls belong to N_0/(ν + 1) groups, the elements in the same group sharing a common label. In the following, we will say that an element a triggered the entry into the urn of the element b if b is one of the ν + 1 elements added to the urn when a is drawn for the first time. We thus define the following process. To construct the sequence S, we randomly choose the first element. Then, at each time step t: (i) we give weight 1 to: (a) each element in U with the same label, say C, as s_{t−1} (the last element added to the sequence); (b) the element that triggered the entry into the urn of s_{t−1}; and (c) the elements triggered by s_{t−1}; a weight η ≤ 1 is given to any other element in U; (ii) the element s_t is chosen by drawing randomly from U, each element with a probability proportional to its weight; (iii) the element s_t is added to the sequence S and put back into U along with ρ additional copies of it; (iv) if and only if the chosen element s_t is new (i.e., it appears for the first time in the sequence S), ν + 1 brand new distinct elements (balls with different colors, not yet present in the urn), all with a common brand new label, are added to U.
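The update rules (i)-(iv) can be sketched as follows; the bookkeeping of labels and triggering relations is our own minimal implementation, with illustrative default parameters (N_0 must be a multiple of ν + 1):

```python
import random

def urn_semantic_triggering(steps, N0=12, rho=3, nu=2, eta=0.6, seed=0):
    """Sketch of the urn model with semantic triggering; returns the trajectory of D_t."""
    rng = random.Random(seed)
    k = nu + 1
    counts = {c: 1 for c in range(N0)}      # balls per color
    label = {c: c // k for c in range(N0)}  # N0/(nu+1) initial semantic groups
    parent, children = {}, {}               # triggering relations between colors
    next_color, next_label = N0, N0 // k
    drawn, seq, D = set(), [], []
    for _ in range(steps):
        colors = list(counts)
        if not seq:                          # first element: uniform random choice
            weights = [counts[c] for c in colors]
        else:                                # rule (i): boost semantically related elements
            last = seq[-1]
            boosted = {c for c in colors if label[c] == label[last]}
            if last in parent:
                boosted.add(parent[last])
            boosted.update(children.get(last, ()))
            weights = [counts[c] * (1.0 if c in boosted else eta) for c in colors]
        c = rng.choices(colors, weights=weights)[0]   # rule (ii)
        counts[c] += rho                     # rule (iii): put back with rho extra copies
        if c not in drawn:                   # rule (iv): a novelty triggers nu+1 new colors
            drawn.add(c)
            children[c] = []
            for _ in range(k):
                counts[next_color] = 1
                label[next_color] = next_label
                parent[next_color] = c
                children[c].append(next_color)
                next_color += 1
            next_label += 1
        seq.append(c)
        D.append(len(drawn))
    return D
```

Setting eta = 1 removes the semantic gating and recovers the dynamics of the basic model of Section 2.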
We thus introduced a mechanism through which the occurrence of a ball with a given label facilitates further occurrences, close in time, of other balls with the same label, i.e., semantically related to it. Note that if η = 1, the dynamics of this model reduces to that of the model described in Section 2. We do not analyze in detail the behavior of this model (the interested reader can refer to [12]), but we recall that it again produces power laws for Heaps' and Zipf's laws, with exponents min(νη/ρ, 1) ≤ γ ≤ min(ν/ρ, 1) and 1/γ, respectively. The behavior of Taylor's law is reported in Figure 5 for some choices of the model parameters, showing that this model also accounts for an exponent β > 1. For the sake of completeness, we also show in Figure 5 a case where the model with semantic triggering is coupled with a random choice of the model parameters, observing that this does not lead to any substantially different behavior.

Figure 5. Top: Taylor's law in the urn model with triggering, with parameters N_0 = 100, ρ = 1, and ν randomly extracted for each simulation of the process from a uniform distribution on the interval (0, 1) (left) and from an exponential distribution restricted to the interval (0, 1) with parameter λ = 1 (right), as discussed in the main text. Center: Taylor's law in the urn model with triggering, with parameters, respectively: (left) N_0 = 100, ν = 2, ρ = 3 + r_i, with r_i randomly extracted for each simulation of the process from an exponential distribution with mean 1; (right) ν = 2, ρ = 3, N_0 = 1 + n_i, with n_i randomly extracted for each simulation of the process from an exponential distribution with mean 10^4.
Bottom: (left) Taylor's law in the urn model with semantic triggering, with parameters N_0 = 100, ν = 6, ρ = 9, η = 0.6; (right) Taylor's law in the urn model with semantic triggering, with parameters ν = 2, ρ = 3, η = 0.6, N_0 = 1 + n_i, with n_i randomly extracted for each simulation of the process from an exponential distribution with mean 10^4. The parameters of the simulations were chosen so as to lie in the regime ν < ρ. The parameter η = 0.6 used in the bottom graphs was chosen in the regime where the Heaps' and Zipf's laws feature exponents compatible with those observed in real systems. In all the figures, the curves for Taylor's law are constructed from 100 independent realizations of the process (M = 100 in Equation (16)).

Conclusions
In this paper, we discussed predictions for Taylor's law both in a recently introduced modeling scheme based on the notion of the adjacent possible [12] and in four open systems characterized by human activities, where a notion of innovation can be defined. We obtained rigorous mathematical predictions relying on known results for triangular urn models. We supported analytical results and conjectures with simulations of the discussed stochastic process. Further, contrasting the model's predictions with observations from real data, we proposed two, not necessarily alternative, generalizations of the model to account for the deviations of real data from a purely linear dependence of σ[D_t] on µ[D_t]. Namely, we considered the effect of a quenched stochasticity in the model parameters and the introduction of semantic correlations, already discussed in [12].
By providing a rigorous mathematical framework to characterize the recently introduced urn model with triggering, the present paper opens the way to a deeper comprehension of the basic mechanisms underlying the observed universalities. On the other hand, a careful analysis of real data highlights relevant observables that unveil distinct behaviours in different systems, possibly due to varying degrees of correlations. A deeper understanding of this subtle behavior could shed some light on distinctive features of human language or on the cognitive and social pressure driving cultural production and fruition.
We finally note that we did not consider here hierarchical models, which we plan to investigate in future work. Hierarchical generalizations of the Poisson-Dirichlet process have proved to be extremely promising in inference problems adopting a Bayesian approach, such as topic modeling in textual corpora. We think that a hierarchical approach can be fundamental to reproduce further statistical features observed in written texts and not yet fully explained, such as, for instance, the double slope observed in Zipf's law in large text corpora and the subtle behaviour of Taylor's law discussed in Section 5.