# Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible


Physics Department, Sapienza University of Rome, P.le Aldo Moro 5, 00185 Rome, Italy

Sony Computer Science Laboratories, 6, rue Amyot, 75005 Paris, France

Complexity Science Hub Vienna, Josefstädter Strasse 39, A-1080 Vienna, Austria

Author to whom correspondence should be addressed.

Received: 31 July 2018 / Revised: 17 September 2018 / Accepted: 25 September 2018 / Published: 30 September 2018

(This article belongs to the Special Issue Economic Fitness and Complexity)

Zipf’s, Heaps’ and Taylor’s laws are ubiquitous in many different systems where innovation processes are at play. Together, they represent a compelling set of stylized facts regarding the overall statistics, the innovation rate and the scaling of fluctuations for systems as diverse as written texts and cities, ecological systems and stock markets. Many modeling schemes have been proposed in the literature to explain those laws, but only recently has a modeling framework been introduced that accounts for the emergence of those laws without deducing one of the laws from the others and without ad hoc assumptions. This modeling framework is based on the concept of the adjacent possible space and its key feature of being dynamically restructured while its boundaries are explored, i.e., conditionally on the occurrence of novel events. Here, we illustrate this approach and show how this simple modeling framework, instantiated through a modified Pólya’s urn model, is able to reproduce Zipf’s, Heaps’ and Taylor’s laws within a unique self-consistent scheme. In addition, the same modeling scheme embraces other less common evolutionary laws (Hoppe’s model and Dirichlet processes) as particular cases.

Innovation processes are ubiquitous. New elements constantly appear in virtually all systems, and the occurrence of the new goes well beyond what we now call innovation. The term innovation refers to a complex set of phenomena that includes not only the appearance of new elements in a given system, e.g., technologies, ideas, words, cultural products, etc., but also their adoption by a given population of individuals. From this perspective, one can distinguish between a personal, or local, experience of the new (for instance, when we discover a new favorite writer or a new song) and a global occurrence of the new, i.e., every time something appears that never appeared before (for instance, when we write a new book or compose a new song). In all these cases there is something new entering the history of a given system or a given individual.

Given the paramount relevance of innovation processes, it is highly important to grasp their nature and understand how the new emerges in all its possible instantiations. To this end, it is essential to fix a certain number of stylized facts characterizing the overall phenomenology of the new and quantifying its occurrence and its dynamical properties. Here we focus in particular on three basic laws whose general validity has been assessed in virtually all systems displaying innovation: Zipf’s law [1,2,3,4], quantifying the frequency distribution of elements in a given system; Heaps’ law [5,6], quantifying the rate at which new elements enter a given system; and Taylor’s law [7], quantifying the intrinsic fluctuations of variables associated with the occurrence of the new. Any basic theory, supposedly close to the actual phenomenology of innovation processes, should be able at least to explain those three laws from first principles. Despite an abundant literature on the subject across many different disciplines, a clear and self-consistent framework to explain the above-mentioned stylized facts has been missing for a very long time. Many approaches have been proposed so far, often adopting ad hoc assumptions or attempting to derive one of the three laws while taking the others for granted. The aim of this paper is to bring order to the often scattered and disordered literature by proposing a self-consistent framework that, in its simplicity and generality, is able to account for the existence of the three laws from first principles.

The framework we propose is based on the notion of the “Adjacent Possible” and, more generally, on the interplay between what François Jacob called the “actual” and the “possible”: the actual realization of a given phenomenon and the space of possibilities still unexplored. Originally introduced by the famous biologist and complex-systems scientist Stuart Kauffman, the notion of the adjacent possible [8,9] refers to the progressive expansion, or restructuring, of the space of possibilities, conditional on the occurrence of novel events. Based on this early intuition, we recently introduced, in collaboration with Steven Strogatz, a mathematical framework [10,11] to investigate the dynamics of the new via the adjacent possible. The modeling scheme builds on older schemes, namely Pólya’s urns, and it mathematically formalizes the notion that “one thing leads to another”, i.e., the intuitive idea, which presumably we all share, that innovation processes are non-linear and that the conditions for the occurrence of a given event may be realized only after something else has happened.

It turns out that the mathematical framework encoding the notion of the adjacent possible represents a sufficient first-principle scheme to explain Zipf’s, Heaps’ and Taylor’s laws on the same ground. In this paper we present this approach and discuss the links it bears with other approaches. In particular, we discuss the relation of our approach to well-known stochastic processes, widely studied in the framework of nonparametric Bayesian inference, namely the Dirichlet and the Poisson-Dirichlet processes [12,13,14]. In addition, based on this comparison, a coherent framework emerges in which the adjacent possible scheme appears as crucial to understanding the basic phenomenology of innovation processes. Though we can only conjecture that the expansion of the adjacent possible space is also a necessary condition for the validity of the three laws mentioned above, no counterexample has been found so far of a model that, without a dynamical space of possibilities, satisfactorily explains the empirically observed laws.

Let us consider a generic text and count the number of occurrences of each word. Now, suppose one repeats this operation for all the distinct words in a long text, ranks all the words according to their frequency of occurrence, and plots the number of occurrences vs. the rank. This is what George Kingsley Zipf did [2,3,4] in the 1920s. A more recent analysis of the same behaviour is reported in Figure 1, based on data from the Gutenberg corpus [15].

The existence of straight lines in the log-log plot is the signature of power-law functions of the form:

$$f\left(R\right)\sim {R}^{-\alpha}$$

The original result obtained by Zipf, corresponding to the first slope with $\alpha \simeq 1$, revealed a striking regularity in the way words are adopted in texts: if $f\left(1\right)$ denotes the frequency of the most frequent word (rank $R=1$), the frequency of the second most frequent word is $f\left(1\right)/2$, that of the third $f\left(1\right)/3$, and so on. For high ranks, i.e., highly infrequent words, one observes a second slope, with an exponent larger than two.
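As an illustration (not part of the original paper), the frequency-rank spectrum of a text can be computed in a few lines. A minimal Python sketch, where `zipf_rank_frequency` is a name of our choosing:

```python
from collections import Counter

def zipf_rank_frequency(text):
    """Return word frequencies sorted by rank (rank 1 = most frequent word)."""
    words = text.lower().split()
    counts = Counter(words)
    # most_common() sorts by decreasing count, which is exactly the rank order
    return [freq for _, freq in counts.most_common()]

sample = "the cat and the dog and the bird"
freqs = zipf_rank_frequency(sample)
print(freqs)  # [3, 2, 1, 1, 1]: 'the' occurs 3 times, 'and' 2 times, the rest once
```

On a real corpus one would plot these frequencies against their ranks on log-log axes and look for the straight-line signature of Equation (1).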

It should be remarked that perhaps the first to observe the above-reported law was Jean-Baptiste Estoup, who was the General Secretary of the Institut Sténographique de France. In his book Gammes sténographiques [1,16], he pioneered the investigation of the regularity of word frequencies and observed that the frequency with which words are used in texts appears to follow a power-law behaviour. This observation was later acknowledged by Zipf [2] and examined in depth, giving rise to what is also known as the Estoup-Zipf law. From now on we shall refer to this law as Zipf’s law.

It is also important to remark that Zipf-like behaviours have been observed in a large variety of cases and situations. Zipf himself reported [4] on the distribution of metropolitan districts in 1940 in the USA, as well as service establishments, manufacturers, and retail stores in the USA in 1939. As the years have passed, the examples and situations where Zipf’s law has been invoked have been steadily growing. For instance, Zipf’s law has been invoked for city populations, the statistics of webpage visits and other internet traffic data, company sizes and other economic data, science citations and other bibliometric data, as well as scaling in natural and physical phenomena. A thorough account of all these cases is beyond the scope of the present paper, and we refer to recent reviews and references therein for an account of the latest developments [17,18,19].

Let us now take a step forward and look at a generic text (or, without loss of generality, at a generic sequence of characters), focusing now on the occurrence of novelties. For a generic text, one can ask when new words, i.e., words that never occurred before in the text, appear. Now, if one plots the number of new words as a function of the number of words read (which is our measure of intrinsic time), one gets a plot like that of Figure 2, where one observes two main behaviors.

For short times one observes a linear growth: at the beginning, basically all the words appear for the first time. Later on, the growth slows down and an asymptotic behavior is observed of the form:

$$D\left(N\right)\sim {N}^{\gamma}$$

with $\gamma \in [0,1]$. In the specific case of Figure 2, $\gamma \simeq 0.45$, but the exponent slightly changes from text to text. The relation of Equation (2) is known as Heaps’ law after Harold Stanley Heaps [6], who formulated it in the framework of information retrieval (see also [20]), though its first discovery is due to Gustav Herdan [5] in the framework of linguistics (see also [21,22]). From now on we shall refer to it as Heaps’ law.
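The dictionary-growth curve $D(N)$ underlying Heaps’ law can be measured directly from any sequence. A minimal sketch (the function name `heaps_curve` is ours, not from the paper):

```python
def heaps_curve(sequence):
    """Number of distinct elements D(N) after reading the first N elements."""
    seen = set()
    curve = []
    for element in sequence:
        seen.add(element)          # set membership makes repeats free
        curve.append(len(seen))    # D(N) after N elements
    return curve

curve = heaps_curve(["a", "b", "a", "c", "b", "d"])
print(curve)  # [1, 2, 2, 3, 3, 4]
```

Fitting the tail of this curve on log-log axes against $N$ gives an estimate of the Heaps exponent $\gamma$.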

In this section we compare the two laws just observed: Zipf’s law for the frequencies of occurrence of the elements in a system and Heaps’ law for their temporal appearance. It has often been claimed that Heaps’ and Zipf’s laws are trivially related and that one can derive Heaps’ law once Zipf’s law is known. This is not true in general. It turns out to be true only under the specific hypothesis of random sampling, as follows. Suppose the existence of a strict power-law behaviour of the frequency-rank distribution, $f\left(R\right)\sim {R}^{-\alpha}$, and construct a sequence of elements by randomly sampling from this Zipf distribution $f\left(R\right)$. Through this procedure, one recovers a Heaps’ law with the functional form $D\left(t\right)\sim {t}^{\gamma}$ [23,24] with $\gamma =1/\alpha $. In order to show this, we need to consider the correct expression for $f\left(R\right)$, including the normalisation factor, whose expression can be derived through the following approximated integral:

$${\int}_{1}^{{R}_{max}}f\left(\tilde{R}\right)d\tilde{R}=1\phantom{\rule{0.166667em}{0ex}}.$$

Let us now distinguish the two cases. For $\alpha \ne 1$ one has:

$$f\left(R\right)=\frac{1-\alpha}{{R}_{max}^{1-\alpha}-1}{R}^{-\alpha}\phantom{\rule{0.166667em}{0ex}},$$

while for $\alpha =1$ one obtains:

$$f\left(R\right)=\frac{1}{\mathrm{log}{R}_{max}}{R}^{-1}\phantom{\rule{0.166667em}{0ex}}.$$

When $\alpha >1$, one can neglect the term ${R}_{max}^{1-\alpha}$ in Equation (4), and when $\alpha <1$, one can write ${R}_{max}^{1-\alpha}-1\simeq {R}_{max}^{1-\alpha}$.

Summarizing one has then:

$$\begin{array}{cc}\hfill \alpha >1\phantom{\rule{3.33333pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}f\left(R\right)\simeq (\alpha -1){R}^{-\alpha}\phantom{\rule{0.166667em}{0ex}}.\hfill \\ \hfill \end{array}$$

$$\begin{array}{cc}\hfill \alpha =1\phantom{\rule{3.33333pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}f\left(R\right)\simeq \frac{{R}^{-1}}{ln{R}_{max}}\phantom{\rule{0.166667em}{0ex}}.\hfill \\ \hfill \end{array}$$

$$\begin{array}{cc}\hfill 0<\alpha <1\phantom{\rule{3.33333pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}f\left(R\right)\simeq (1-\alpha )\frac{{R}^{-\alpha}}{{R}_{max}^{1-\alpha}}\phantom{\rule{0.166667em}{0ex}}.\hfill \end{array}$$

We are now interested in estimating the number D of distinct elements appearing in the sequence as a function of its length N. To do that, let us consider the entrance of a new element (never appeared before) in the sequence, and let the number of distinct elements in the sequence be D after this entrance. This new element will have maximum rank ${R}_{max}=D$ and frequency $f\left({R}_{max}\right)=1/N$. From Equation (6) we obtain:

$$f\left(D\right)\simeq (\alpha -1){D}^{-\alpha}=\frac{1}{N}\phantom{\rule{0.166667em}{0ex}},$$

which, after an inversion, gives:

$$D\simeq {N}^{\gamma}\phantom{\rule{1em}{0ex}}\mathrm{with}\phantom{\rule{1em}{0ex}}\gamma =\frac{1}{\alpha}\phantom{\rule{0.166667em}{0ex}}.$$

The same reasoning can be extended to the other ranges of $\alpha $ as follows:

$$\begin{array}{cc}\hfill \alpha >1\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}:& \phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}f\left(D\right)\simeq (\alpha -1){D}^{-\alpha}=\frac{1}{N}\phantom{\rule{0.166667em}{0ex}}.\hfill \\ \hfill \end{array}$$

$$\begin{array}{cc}\hfill \alpha =1\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}:& \phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}f\left(D\right)\simeq \frac{1}{DlnD}=\frac{1}{N}\phantom{\rule{0.166667em}{0ex}}.\hfill \\ \hfill \end{array}$$

$$\begin{array}{cc}\hfill 0<\alpha <1\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}:& \phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}f\left(D\right)\simeq \frac{1-\alpha}{{D}^{1-\alpha}-1}{D}^{-\alpha}=\frac{1}{N}\phantom{\rule{0.166667em}{0ex}}.\hfill \end{array}$$

Inverting these relations, one eventually finds:

$$\begin{array}{cc}\hfill \alpha >1\phantom{\rule{4pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\simeq {N}^{\gamma}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{with}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\gamma =1/\alpha \hfill \end{array}$$

$$\begin{array}{cc}\hfill \alpha =1\phantom{\rule{4pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\simeq N/lnN\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{with}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\gamma \simeq 1\hfill \end{array}$$

$$\begin{array}{cc}\hfill 0<\alpha <1\phantom{\rule{4pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\simeq N\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{with}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\gamma =1\phantom{\rule{0.166667em}{0ex}}.\hfill \end{array}$$

Summarizing, under the hypothesis of random sampling from a frequency rank distribution expressed by a power-law function $f\left(R\right)\sim {R}^{-\alpha}$, one recovers a Heaps’ law $D\left(N\right)\sim {N}^{\gamma}$ with the following relation between $\gamma $ and $\alpha $:

$$\begin{array}{cc}\hfill \alpha >1\phantom{\rule{4pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\gamma =1/\alpha \hfill \\ \hfill 0<\alpha \le 1\phantom{\rule{4pt}{0ex}}:& \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\gamma =1.\hfill \end{array}$$
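The random-sampling argument above can be checked numerically. The following sketch is our own illustration (not from the paper): it draws ranks by inverse-transform sampling of a continuous power law $f(R)\propto R^{-\alpha}$ truncated at an arbitrarily chosen $R_{max}$, and estimates $\gamma$ from the growth of distinct ranks.

```python
import math, random

def sample_zipf_heaps(alpha, n_draws, r_max=10**6, seed=1):
    """Sample ranks from f(R) ∝ R^-alpha on [1, r_max] and track D(N)."""
    rng = random.Random(seed)
    seen = set()
    distinct = []
    for _ in range(n_draws):
        u = rng.random()
        # Inverse transform of the truncated continuous power law
        if alpha != 1.0:
            r = ((1 - u) + u * r_max ** (1 - alpha)) ** (1 / (1 - alpha))
        else:
            r = r_max ** u
        seen.add(int(r))           # discretize to integer ranks
        distinct.append(len(seen))
    return distinct

# Estimate gamma from the log-log slope between N = 10^3 and N = 10^5
D = sample_zipf_heaps(alpha=2.0, n_draws=10**5)
gamma = (math.log(D[-1]) - math.log(D[999])) / (math.log(10**5) - math.log(10**3))
print(round(gamma, 2))  # close to 1/alpha = 0.5, as in Equation (17)
```

This reproduces the $\gamma = 1/\alpha$ branch for $\alpha > 1$; for short sequences the measured slope deviates, consistently with the finite-size effects discussed below.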

In [24], it has been demonstrated that finite-size effects can affect the above relations, which turn out to hold only for very long sequences. For short enough sequences, one observes a systematic deviation from Equation (17), especially for $\alpha $ values close to 1.

Another important observation is now in order. The assumption of random sampling considered above is strong and sometimes unrealistic (e.g., [25]). First of all, it implies the a priori existence of Zipf’s law with an infinite support. In addition, the frequency-rank plots one empirically observes are far from featuring a pure power-law behavior. In all those cases, the relation between Zipf’s law and Heaps’ law seen above and summarized by Equation (17) happens to hold only when looking at the tail of the Zipf’s plot, i.e., at high ranks (small frequencies) in the frequency-rank plots, and at long times, i.e., high N, in the plot expressing Heaps’ law. In a later section we shall also discuss the so-called Taylor’s law, which connects the standard deviation s of a random variable (for instance, the size D of the dictionary) to its mean $\mu $. Simple analytic calculations [26] show that Poissonian sampling of a power law leads to a Taylor’s law with exponent $1/2$, i.e., $s\propto \sqrt{\mu}$. This is not the case for real texts, for which one observes an exponent close to 1 [26].

The ensemble of all these facts implies that the empirical findings of Zipf’s and Heaps’ laws cannot be explained by deriving only one of the laws and deducing the other accordingly, based on Equation (17). Rather, Zipf’s, Heaps’ and Taylor’s laws should all be derived in the framework of a self-consistent theory. This is precisely the aim of this paper.

We now introduce a simple modeling scheme able to reproduce both Zipf’s and Heaps’ laws simultaneously. Crucial for this result is the conditional expansion of the space of possibilities, which we will elucidate in the following. In [8,9], S. Kauffman introduces and discusses the notion of the adjacent possible, namely all those things that are one step away from what actually exists. The idea is that evolution does not proceed by leaps, but moves in a space where each element is connected with its precursor. Kauffman’s theoretical concept of the adjacent possible, originally discussed in his investigations of molecular and biological evolution, has also been applied to the study of innovation and technological evolution [27,28]. To clarify the concept, let us think about a baby who is learning to talk. We can say almost surely that she will not utter “serendipity” as the first word in her life. More than this, we can safely guess that her first word will be “papa”, or “mama”, or one among a short list of other possibilities. In other words, in the period of lallation, only a few words belong to the space of the adjacent possible and can be actualized in the near future. Once the baby has learned how to utter simple words, she can try more sophisticated ones, involving more demanding articulation efforts. In the process of learning, her space of possibilities (her adjacent possible) grows considerably, with the result that guessing a priori the first 100 words learned by a child is much less obvious than guessing which will be the first one.

Here we formalize the idea that, by opening up new possibilities, an innovation paves the way for other innovations, explicitly introducing this concept in a model based on Pólya’s urn. In particular, we will discuss the simplest version of the model introduced in [10], which we will name the Pólya’s urn model with triggering (PUT). The interest of this model lies, on the one hand, in its generality, since the only assumptions it makes refer to general, not system-specific, mechanisms for the expansion into the adjacent possible; on the other hand, its simplicity allows one to derive analytical solutions.

The model works as follows (please refer to Figure 3). An urn $\mathcal{U}$ initially contains ${N}_{0}$ distinct elements, represented by balls of different colors. By randomly extracting elements from the urn, we construct a sequence $\mathcal{S}$ mimicking the evolution of our system (e.g., the sequence of words in a given text). Both the urn and the sequence enlarge during the process: (i) at each time step t, an element ${s}_{t}$ is drawn at random from the urn, added to the sequence, and put back in the urn along with $\rho $ additional copies of it (Figure 3A); (ii) iff the chosen element ${s}_{t}$ is new (i.e., it appears for the first time in the sequence $\mathcal{S}$), $\nu +1$ brand new distinct elements are also added to the urn (Figure 3B). These new elements represent the set of new possibilities opened up by the seed ${s}_{t}$. Hence $\nu +1$ is the size of the new adjacent possible available once an innovation occurs.
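A minimal simulation of rules (i) and (ii) may help fix ideas. The following sketch is our own illustration of the PUT dynamics, not the authors’ code; colors are represented by integer labels and the urn by a list of multiplicities.

```python
import random

def polya_urn_with_triggering(rho, nu, n_steps, n0=10, seed=42):
    """Simulate the Pólya urn model with triggering (PUT); return the sequence."""
    rng = random.Random(seed)
    counts = [1] * n0          # N0 distinct balls, one copy each
    total = n0                 # total number of balls in the urn
    seen = set()               # colors already appearing in the sequence
    sequence = []
    for _ in range(n_steps):
        # Draw a ball with probability proportional to its multiplicity
        x = rng.randrange(total)
        acc = 0
        for color, c in enumerate(counts):
            acc += c
            if x < acc:
                break
        sequence.append(color)
        counts[color] += rho           # rule (i): add rho extra copies
        total += rho
        if color not in seen:          # rule (ii): a novelty triggers
            seen.add(color)
            counts.extend([1] * (nu + 1))  # nu + 1 brand-new colors
            total += nu + 1
    return sequence

seq = polya_urn_with_triggering(rho=4, nu=2, n_steps=5000)
print(len(set(seq)))  # D(n): grows roughly as n^(nu/rho) when rho > nu
```

With $\rho = 4$ and $\nu = 2$, the number of distinct elements grows sublinearly, in line with the asymptotic solutions derived next.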

Simple asymptotic formulas can be derived, in terms of the model parameters $\rho $ and $\nu $, for the number $D\left(n\right)$ of distinct elements appearing in the sequence as a function of the sequence’s length n (Heaps’ law), and for the asymptotic power-law behavior of the frequency-rank distribution (Zipf’s law). In order to do so, one can write a recursive formula for $D\left(n\right)$ as:

$$D(n+1)=D\left(n\right)+{P}_{N}\left(n\right),$$

where we have defined ${P}_{N}\left(n\right)$ as the probability of drawing a new ball (never extracted before) at time n (note that we consider intrinsic time, that is, we identify the time elapsed with the length of the sequence constructed). The probability ${P}_{N}\left(n\right)$ is equal to the ratio (at time n) between the number of elements in the urn never extracted and the total number of elements in the urn. Approximating Equation (18) with its continuous limit, we can write:

$$\frac{dD}{dn}=\frac{{N}_{0}+\nu D}{{N}_{0}+(\nu +1)D+\rho n},$$

where ${N}_{0}$ is the number of balls, all distinct, initially placed in the urn. This equation can be integrated analytically in the limit of large n, when ${N}_{0}$ can be neglected, by performing the change of variable $z=\frac{D}{n}$. After some algebra (detailed computations can be found in [11] and in Appendix A for an extended model), we find the asymptotic solutions (valid for large n):

$$\begin{array}{ccc}\hfill \rho >\nu & \Rightarrow & D\left(n\right)\sim {\left(\frac{\rho -\nu}{\rho +1}\right)}^{\frac{\nu}{\rho}}{n}^{\frac{\nu}{\rho}}\phantom{\rule{0.166667em}{0ex}};\hfill \end{array}$$

$$\begin{array}{ccc}\hfill \rho <\nu & \Rightarrow & D\left(n\right)\sim \frac{\nu -\rho}{\nu +1}n\phantom{\rule{0.166667em}{0ex}};\hfill \end{array}$$

$$\begin{array}{ccc}\hfill \rho =\nu & \Rightarrow & D\left(n\right)logD\sim \frac{\nu}{\nu +1}n\to D\sim \frac{\nu}{\nu +1}\frac{n}{logn}\phantom{\rule{0.166667em}{0ex}}.\hfill \end{array}$$

For the derivation of the Zipf’s law we refer the reader to the SI of [10] and to Appendix B for an alternative derivation based on the continuous approximation. Numerical results contrasted with the theoretical predictions for the Heaps’ and Zipf’s laws are reported in Figure 4.

One question that naturally emerges concerns the relevance of the notion of the adjacent possible and its conditional growth. One could, for instance, argue that the same predictions of the PUT model could be replicated by having all the possible outcomes of a process immediately available from the outset, instead of appearing progressively through the conditional process related to the very notion of the adjacent possible. In order to remove all doubt, we consider an urn initially filled with ${N}_{0}$ distinct colors, with ${N}_{0}$ arbitrarily large, with no other colors entering the urn during the process of construction of the sequence $\mathcal{S}$. This is the multicolor Pólya urn model [29], and here we briefly discuss the Heaps’ and Zipf’s laws emerging from it. Let us thus consider an urn initially containing ${N}_{0}$ balls, all of different colors. At each time step, a ball is withdrawn at random, added to a sequence, and placed back in the urn along with $\rho $ additional copies of it. This process corresponds to the one depicted in Figure 3A, that is, to the rule of the PUT model in the case where the drawn element is not new.

Note that, although in this case the urn does not acquire new colors during the process, we can still study the dynamics of innovation by looking at the entrance of new colors into the growing sequence. Let us then consider ${N}_{0}$ very large, so that we can consider a long time interval far from saturation (when there are still many colors in the urn that have not yet appeared in $\mathcal{S}$). The number of different colors $D\left(n\right)$ added to the sequence at time n follows the equation (in the continuous limit):

$$\frac{dD}{dn}=\frac{{N}_{0}-D\left(n\right)}{{N}_{0}+\rho n}\phantom{\rule{0.166667em}{0ex}},\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}D\left(0\right)=0\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\Rightarrow \phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}D\left(n\right)={N}_{0}\left[1-{\left(1+\frac{\rho n}{{N}_{0}}\right)}^{-\frac{1}{\rho}}\right]\phantom{\rule{0.166667em}{0ex}}.$$

We thus obtain that for $\rho n\ll {N}_{0}$, $D\left(n\right)$ follows a linear behaviour ($D\left(n\right)\simeq n$), while for large n it saturates at $D\left(n\right)\simeq {N}_{0}$, failing to predict the power-law (sublinear) growth of new elements. In Figure 5, we report results for both the Heaps’ and Zipf’s laws predicted by the model along with their theoretical predictions, referring the reader to [30] for a detailed derivation of the Zipf’s law. It is evident that a simple exploration of a static, though large, space of possibilities cannot account for the empirical observations summarized by the Zipf’s and Heaps’ laws.
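The saturation of the static urn is easy to see in simulation. A minimal sketch of the multicolor Pólya urn (our own illustration, with arbitrarily chosen parameters):

```python
import random

def static_polya_distinct(rho, n0, n_steps, seed=0):
    """Multicolor Pólya urn (no new colors): distinct colors in the sequence."""
    rng = random.Random(seed)
    counts = [1] * n0
    total = n0
    seen = set()
    for _ in range(n_steps):
        x = rng.randrange(total)
        acc = 0
        for color, c in enumerate(counts):
            acc += c
            if x < acc:
                break
        seen.add(color)
        counts[color] += rho   # reinforcement only, never a new color
        total += rho
    return len(seen)

d_static = static_polya_distinct(rho=3, n0=50, n_steps=20000)
print(d_static)  # bounded by N0 = 50, however long the sequence
```

However long the sequence, $D(n)$ can never exceed $N_0$ here, whereas in the PUT model the conditional injection of new colors keeps the dictionary growing.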

The PUT model is closely related to well known stochastic processes, widely studied in the framework of nonparametric Bayesian inference, namely the Dirichlet and the Poisson-Dirichlet processes. We will discuss here those processes in terms of their predictive probabilities, referring to excellent reviews [12,13,14] for a complete and formal definition of them.

The problem can be framed in the following way. Given a sequence of events ${x}_{1},\cdots ,{x}_{n}$, we want to estimate the probability that the next event will be $\tilde{x}$, where $\tilde{x}$ can be one of the already seen events ${x}_{i}$, $i=1,\cdots ,n$, or a completely new one, unseen until the intrinsic time n.

Let us first discuss the Poisson-Dirichlet process, whose predictive conditional probability reads:

$$p\left(\tilde{x}\right|{x}_{1},\cdots ,{x}_{n};\alpha ,\theta ,{P}_{0})=\frac{\theta +D\alpha}{\theta +n}{P}_{0}\left(\tilde{x}\right)+\sum _{i=1}^{D}\frac{{n}_{i}-\alpha}{\theta +n}{\delta}_{{\tilde{x}}_{i},\tilde{x}}\phantom{\rule{0.166667em}{0ex}},$$

where $0\le \alpha <1$ and $\theta >-\alpha $ are parameters of the model, ${P}_{0}$ is a given continuous probability distribution, defined a priori on the possible values the variables ${x}_{i}$ can take, named the base probability distribution, and ${\tilde{x}}_{i}$ are the D distinct values appearing in the sequence ${x}_{1},\cdots ,{x}_{n}$, with respective multiplicities ${n}_{i}$. Let us briefly discuss Equation (24). The first term on the right-hand side is the probability that ${x}_{n+1}$ takes a value that has never appeared before, i.e., a novel event. This happens with probability $\frac{\theta +D\alpha}{\theta +n}$, depending both on the total number n of events seen until time n and on the number D of distinct events seen until time n. In this way, the Poisson-Dirichlet process implicitly encodes the idea that the more novelties are actualized, the higher the probability of encountering further novelties. The second term in Equation (24) weights the probability that ${x}_{n+1}$ equals one of the events that have previously occurred, and differs from a bare proportionality rule when $\alpha >0$.
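Sequences can be generated directly from this predictive rule. A minimal sketch (our own illustration; fresh integers stand in for draws from the continuous base distribution ${P}_{0}$, which are distinct with probability 1):

```python
import random

def poisson_dirichlet_sequence(alpha, theta, n_steps, seed=7):
    """Generate a sequence via the two-parameter Poisson-Dirichlet predictive rule."""
    rng = random.Random(seed)
    counts = []      # n_i: multiplicity of each distinct value seen so far
    sequence = []
    for n in range(n_steps):
        D = len(counts)
        # New value with probability (theta + D*alpha) / (theta + n)
        if rng.random() < (theta + D * alpha) / (theta + n):
            counts.append(1)
            sequence.append(D)   # label the new value with a fresh integer
        else:
            # Existing value i with probability (n_i - alpha) / (theta + n);
            # conditional on "not new", value i has weight n_i - alpha.
            x = rng.uniform(0, n - D * alpha)
            acc = 0.0
            for i, c in enumerate(counts):
                acc += c - alpha
                if x < acc:
                    break
            counts[i] += 1
            sequence.append(i)
    return sequence

seq = poisson_dirichlet_sequence(alpha=0.5, theta=1.0, n_steps=10**4)
print(len(set(seq)))  # D(n) grows roughly as n^alpha
```

Note that at $n=0$ the first draw is new with probability 1, so no special initialization is needed.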

The Poisson-Dirichlet process predicts an asymptotic power-law behavior for the number $D\left(n\right)$ of distinct elements seen as a function of the sequence length n. The exact expression for the expected value of $D\left(n\right)$ can be found in [12]. Here we report the results obtained under the same approximations made for the urn model with triggering:

$$\frac{dD}{dn}=\frac{\theta +\alpha D}{\theta +n}\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\left(0\right)=0\phantom{\rule{0.166667em}{0ex}},$$

which can be solved by separation of variables, leading to:

$$D\left(n\right)\sim \frac{{\theta}^{1-\alpha}{(\theta +n)}^{\alpha}}{\alpha}-\frac{\theta}{\alpha}\phantom{\rule{3.33333pt}{0ex}}.$$

Note that the Poisson-Dirichlet process predicts a sublinear power-law behavior for $D\left(n\right)$ but cannot reproduce a linear growth for it, since it is only defined for $\alpha <1$.

The ubiquity of the Poisson-Dirichlet process is due, together with its ability to produce sequences featuring Heaps’ and Zipf’s laws, to the fundamental property of exchangeability [12,31]. This refers to the fact that the probability of a sequence generated by the Poisson-Dirichlet process does not depend on the order of the elements in the sequence: $p({x}_{1},\cdots ,{x}_{n};\alpha ,\theta ,{p}_{0})=p(\pi \left({x}_{1}\right),\cdots ,\pi \left({x}_{n}\right);\alpha ,\theta ,{p}_{0})$ for any permutation $\pi $ of the sequence elements, so that we can write a joint probability distribution $p({n}_{1},\cdots ,{n}_{D};\alpha ,\theta ,{p}_{0})$ for the numbers of occurrences of the variables ${x}_{i}$. Exchangeability is a powerful property, related to the de Finetti theorem [32,33]; it is also a strong and sometimes unrealistic assumption on the lack of correlations and causality in the data.

Returning to the PUT model, we observe that the model produces, in general, sequences that are not exchangeable. It recovers exchangeability in a particular case, corresponding to a slightly different definition of rule (i): the drawn element ${s}_{t}$ is put back in the urn along with $\rho $ additional copies of it only if ${s}_{t}$ is not new; otherwise (i.e., when we apply rule (ii)), ${s}_{t}$ is put back in the urn along with $\tilde{\rho}$ additional copies of it, with $\tilde{\rho}=\rho -(\nu +1)$. In this particular case, the PUT model corresponds exactly to the Poisson-Dirichlet process, with $\theta =\frac{{N}_{0}}{\rho}$ and $\alpha =\frac{\nu}{\rho}$. In this case, at odds with the previously discussed version of the model, the urn acquires the same number of balls at each time step, regardless of whether a novelty occurs. This variant makes the generated sequences exchangeable, but imposes the constraint $\rho \ge \nu +1$, so that in this case we cannot recover the linear growth of $D\left(n\right)$; the same holds for the Poisson-Dirichlet process. We demonstrate in Appendix A that the dependence of the power-law exponents of the Heaps’ and Zipf’s laws on the PUT model’s parameters $\rho $ and $\nu $ is the same as in Equations (20)–(22) if we modify rule (i) with any $\tilde{\rho}\ge 0$.

Here we wish to remark that the urn representation of the PUT model allows for straightforward generalizations in which correlations can be explicitly taken into account (see for instance [10] for a first step in this direction). In addition, it can easily be rephrased in terms of walks in a complex space (for instance a graph), allowing one to consider more complex underlying structures for the space of possibilities (see for instance the SI of [10,34,35]).

By setting $\alpha =0$ in Equation (24), we obtain the predictive conditional probability for the Dirichlet process, which predicts a logarithmic growth of $D\left(n\right)$ [12]. Correspondingly, if we choose $\nu =0$ in the urn model, we obtain:

$$\frac{dD}{dn}=\frac{{N}_{0}}{{N}_{0}+D+\rho n}$$

If we now neglect $D\left(n\right)$ in the denominator of Equation (27), we can solve it in the limit of large n:

$$D\left(n\right)\sim \frac{{N}_{0}}{\rho}log\left(1+\frac{\rho}{{N}_{0}}n\right).$$
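A minimal numerical sketch (the values of ${N}_{0}$, $\rho $ and the run length are arbitrary illustrative choices) of the urn with $\nu =0$ can be compared with this logarithmic prediction:

```python
import math
import random

def urn_nu0(n_steps, n0, rho, seed):
    """PUT urn with nu = 0: a novelty adds a single brand-new color,
    so the number of distinct elements grows only logarithmically."""
    rng = random.Random(seed)
    urn = list(range(n0))           # N0 distinct colors, one ball each
    seen, next_color = set(), n0
    for _ in range(n_steps):
        s = rng.choice(urn)
        urn += [s] * rho            # reinforcement: rho additional copies
        if s not in seen:
            seen.add(s)
            urn.append(next_color)  # nu + 1 = 1 brand-new color
            next_color += 1
    return len(seen)

n0, rho, n = 20, 5, 50000
d_sim = urn_nu0(n, n0, rho, seed=7)
d_pred = (n0 / rho) * math.log(1 + rho * n / n0)   # logarithmic prediction
print(d_sim, round(d_pred, 1))
```

The simulated number of distinct elements should be of the order of the prediction (roughly $D\approx 38$ with these parameters), with Poissonian-like fluctuations around it.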

The same asymptotic growth of $D\left(n\right)$ is also found in one of the first models introducing innovation in the framework of Pólya’s urn, namely Hoppe’s model [36]. The motivation of Hoppe’s work was to derive the Ewens sampling formula [37], describing the allelic partition at equilibrium of a sample from a population evolved according to a discrete-time Wright-Fisher process [38,39]. In Hoppe’s model, innovations are introduced through a special ball, the “mutator”. In particular, the process starts with only the mutator in the urn, with a mass $\theta $. At any time n, a ball is withdrawn with a probability proportional to its mass, and, if the ball is the mutator, it is placed back in the urn along with a ball of a brand new color, with unitary mass, thus increasing the number of different colors present in the urn. Otherwise, the selected ball is placed back in the urn along with another ball of the same color. Writing the recursive formula for $D\left(n\right)$ and taking the continuous limit, we obtain:

$$\frac{dD\left(n\right)}{dn}=\frac{\theta}{\theta +n}\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\left(0\right)=0\phantom{\rule{0.166667em}{0ex}},$$

which is exactly Equation (27) with $\alpha =0$. It predicts a logarithmic increase of the number of new colors in the urn:

$$D\left(n\right)=\theta ln(\theta +n)-\theta ln\left(\theta \right)=\theta ln(1+\frac{n}{\theta})\phantom{\rule{0.166667em}{0ex}},$$

which corresponds to Equation (28) upon identifying $\frac{{N}_{0}}{\rho}$ with $\theta $. Hoppe’s urn scheme is non-cooperative in the sense that one novelty does nothing to facilitate another. In other words, while in Hoppe’s model a mechanism is already present that allows for the expansion of the space of possibilities, this mechanism is completely independent of the actual realization of a novelty, and it fails to reproduce both the Heaps’ and the Zipf’s laws.
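Hoppe’s urn can be simulated in a few lines (the value of $\theta $ and the run length are arbitrary illustrative choices), checking the logarithmic growth of the number of colors:

```python
import math
import random

def hoppe(n_steps, theta, seed):
    """Hoppe's urn: a mutator of mass theta triggers a brand-new color;
    otherwise an ordinary ball is duplicated (mass-proportional draw)."""
    rng = random.Random(seed)
    balls = []                      # ordinary balls, one entry per unit mass
    n_colors = 0
    for _ in range(n_steps):
        if rng.random() < theta / (theta + len(balls)):
            n_colors += 1           # mutator drawn: add a new color
            balls.append(n_colors)
        else:
            balls.append(rng.choice(balls))   # duplicate an existing ball
    return n_colors

theta, n = 10.0, 100000
d_sim = hoppe(n, theta, seed=3)
d_pred = theta * math.log(1 + n / theta)   # logarithmic growth derived above
print(d_sim, round(d_pred, 1))
```

With these parameters the prediction gives $D\approx 92$, and simulated values fluctuate around it, confirming that the innovation rate is too weak to produce a power-law Heaps’ behaviour.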

From Equations (14)–(16), it is clear that randomly sampling a Zipf’s law with a given exponent results in a Heaps’ law, with linear or sublinear exponents tuned by the Zipf’s exponent. On the other hand, Equations (20)–(22) show that the PUT model produces the same Heaps’ exponents, with the same relation to the Zipf’s exponent as in random sampling. A legitimate question is therefore whether PUT is actually performing a kind of sophisticated random sampling of an underlying Zipf’s law. One possible way to discriminate PUT from random sampling is to look at the fluctuation scaling, i.e., the Taylor’s law discussed in Section 2.3, which connects the standard deviation s of a random variable to its mean $\mu $. Simple analytic calculations [26] show that the Poissonian sampling of a power law leads to a Taylor’s law with exponent 1/2, i.e., $s\propto \sqrt{\mu}$.
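The relation between the two exponents under random sampling can be illustrated with a short sketch (the Zipf exponent, the truncated support and the run length are arbitrary illustrative choices): drawing i.i.d. tokens from a Zipf law with exponent $\left|\alpha \right|=2$ should yield a Heaps’ exponent close to $1/\left|\alpha \right|=0.5$.

```python
import bisect
import math
import random

def sample_zipf_stream(n_steps, alpha, r_max, seed):
    """Draw i.i.d. tokens from a truncated Zipf law p_r ~ r**(-alpha)
    and track the number of distinct tokens D(n)."""
    rng = random.Random(seed)
    cum, tot = [], 0.0
    for r in range(1, r_max + 1):
        tot += r ** (-alpha)
        cum.append(tot)             # cumulative (unnormalized) weights
    seen, d_of_n = set(), []
    for _ in range(n_steps):
        u = rng.random() * tot
        seen.add(bisect.bisect_left(cum, u))   # inverse-CDF sampling
        d_of_n.append(len(seen))
    return d_of_n

d = sample_zipf_stream(10000, alpha=2.0, r_max=100000, seed=5)
# Heaps' exponent estimated between n = 1000 and n = 10000
gamma = math.log(d[9999] / d[999]) / math.log(10)
print(gamma)                        # expected near 1/|alpha| = 0.5
```

While the Heaps’ exponent comes out right, the fluctuations of $D\left(n\right)$ around its mean in this sampling scheme remain Poissonian, which is exactly what Taylor’s law discriminates.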

Real text analysis shows instead a Taylor exponent of 1 [26], which points to the conclusion that the process of writing texts is not an uncorrelated choice of words from a fixed distribution. In [26] this was explained by the “topic-dependent frequencies of individual words”. The empirical observation therein was that the frequency of a given word changes according to the topic of the writing. For example, the term “electron” has a high frequency in physics books and a low frequency in fairy tales, so that its rank is low in the first case and high in the second. The result is that there exist different Zipf’s laws with the same exponent, depending on the topic, and the enhanced variance of the dictionary size is ascribable to this multitude of Zipf’s laws, which adds further variability to the sampling process.

In PUT there is certainly no topicality as in real texts. Nevertheless, we numerically find a linear Taylor’s law in the case of sublinear Heaps’ exponents ($\nu <\rho $). In PUT, there is no Zipf’s law beforehand: it is instead built during the process, and this is sufficient to boost the variance of the dictionary size, on average, at any given time.

In Figure 6, we show the numerical results of two simulations of PUT with $\nu <\rho $, one with $\nu =\rho $ and one with $\nu >\rho $, in order to cover all the possible cases of Equations (20)–(22), plus a random sampling from a Zipfian distribution with exponent $\alpha =-2$. Besides the interesting linearity of the fluctuation scaling in the case $\nu <\rho $, the behaviour in the case of fast-growing spaces, $\nu >\rho $, is also worth pointing out. In that case, Heaps’ law is linear, as shown in Equation (21), and the traditional model of reference is the Yule-Simon model (YSM) [40]. The YSM generates a sequence of characters with the following iterative rule: starting from an initial character, at each time step a brand new character is chosen with a constant probability p, while with probability $1-p$ one of the characters already present in the sequence is selected (which implies drawing characters with their multiplicity, i.e., preferential attachment). In this way, in the YSM, the rate of growth of the number of distinct characters is constant and equal to p, and this constant rate of innovation yields a linear Heaps’ law. The preferential attachment rule leads to a Zipf’s law with exponent $\left|\alpha \right|=1-p$. This is consistent with Equation (16) of random sampling and also with Equation (21) of PUT. The difference between the YSM and PUT can be appreciated through Taylor’s law. In the YSM, new characters appear with probability p, so that the average number of distinct characters at step N is $\mu =pN$, and the variance is ${\sigma}^{2}=Np(1-p)=\mu (1-p)$, as in the binomial distribution. As a result, in the YSM one recovers the Poissonian result $\sigma \propto \sqrt{\mu}$. In contrast, PUT numerically features an exponent of $\simeq 0.58$, i.e., larger than $1/2$ but still smaller than 1 (see Figure 6).
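The binomial argument for the YSM is easy to check numerically. The following sketch (the values $p=0.1$, $N=2000$ and the number of runs are arbitrary illustrative choices) simulates the YSM and verifies that the mean number of distinct characters is close to $pN$ and its variance close to $\mu (1-p)$:

```python
import random
import statistics

def ysm_distinct(n_steps, p, rng):
    """Yule-Simon model: with probability p emit a brand-new character,
    otherwise copy a uniformly chosen earlier token (preferential
    attachment, since characters are drawn with their multiplicity)."""
    seq = [0]
    n_chars = 1
    for _ in range(n_steps - 1):
        if rng.random() < p:
            n_chars += 1
            seq.append(n_chars)
        else:
            seq.append(rng.choice(seq))
    return n_chars

rng = random.Random(11)
p, n, runs = 0.1, 2000, 200
samples = [ysm_distinct(n, p, rng) for _ in range(runs)]
mu = statistics.mean(samples)
var = statistics.variance(samples)
print(mu, var)   # mu close to p*n = 200; var close to mu*(1-p) = 180
```

The sub-Poissonian variance confirms the $\sigma \propto \sqrt{\mu}$ scaling of the YSM, in contrast with the linear Taylor’s law of PUT.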

Given the intrinsic inability of the YSM to account for sub-linear dictionary growth, Zanette and Montemurro [41] proposed a simple variant of it. In this variant (ZM), the rate of introduction of new characters, i.e., p, is no longer constant. It is instead made time-dependent, with an ad hoc functional form chosen so as to reproduce the right range of Heaps’ exponents: for a Heaps’ exponent $\gamma $, the rate of innovation p is chosen proportional to ${t}^{\gamma -1}$. This expedient reproduces both Zipf’s law and, by construction, Heaps’ law. The two mechanisms producing Zipf’s and Heaps’ laws are independent of each other, as in the YSM, so that we expect for Taylor’s law the same behavior as in the YSM, i.e., an exponent of $0.5$. After all, the ZM model can be seen as a YSM with a diluted time flow, which should not affect the scaling of the fluctuations at a given time. Figure 6 shows that the ZM model indeed features a Taylor’s exponent of $0.5$ (magenta curve).
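Since in the ZM variant the new-versus-old decision is independent of the sequence content, the dictionary growth depends only on the sequence of innovation coin flips. A minimal sketch (with illustrative $\gamma =0.6$) checks that the Heaps’ exponent equals $\gamma $ by construction:

```python
import math
import random

def zm_distinct(n_steps, gamma, seed):
    """Zanette-Montemurro variant: the innovation probability decays as
    p(t) ~ t**(gamma - 1), producing a sublinear Heaps' law by
    construction. Only the innovation coin flips matter for D(n)."""
    rng = random.Random(seed)
    d, d_of_n = 0, []
    for t in range(1, n_steps + 1):
        if rng.random() < t ** (gamma - 1):   # p(1) = 1, then decaying
            d += 1
        d_of_n.append(d)
    return d_of_n

gamma = 0.6
d = zm_distinct(10000, gamma, seed=42)
# measured Heaps' exponent between n = 1000 and n = 10000
g_meas = math.log(d[9999] / d[999]) / math.log(10)
print(g_meas)   # expected near gamma = 0.6
```

Because the innovation events are independent Bernoulli draws, the fluctuations of $D\left(n\right)$ stay Poissonian, consistent with the Taylor exponent $0.5$ observed for the ZM model.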

For the Poisson-Dirichlet and the Dirichlet processes, analytical solutions can be computed for the moments of the probability distribution $P\left(D\right(n\left)\right)$ [13,42], yielding asymptotic Taylor’s exponents of 1 and $1/2$, respectively. Numerical results are given in Figure 6. Note that a non-trivial exponent in the Taylor’s law is featured by the Poisson-Dirichlet process, where the probability for a novelty to occur depends on the number of previous novelties, while the Dirichlet process lacks both of these properties.

In this paper we have argued that the notion of the adjacent possible is key to explaining the occurrence of the Zipf’s, Heaps’ and Taylor’s laws in a very general way. We have presented a mathematical framework, based on the notion of the adjacent possible and instantiated through a Pólya’s urn modelling scheme, that accounts for the simultaneous validity of the three laws just mentioned in all their possible regimes.

We think this is a very important result that will help in assessing the relevance and the scope of the many approaches proposed so far in the literature. In order to be as clear as possible, let us itemize the key points:

- The first point concerns the many claims in the literature that Heaps’ law can be deduced by simply sampling a Zipf-like distribution of event frequencies. Though, as seen above, it is possible to deduce a power-law behaviour for the growth of the number of distinct elements by randomly drawing from a Zipf-like distribution, this procedure does not reproduce the empirical results. It has been conjectured in [26] that texts are subject to a topicality phenomenon, i.e., writers do not sample the same Zipf’s law; this implies that the same word can appear at different ranking positions depending on the specific context. Though this is an interesting point, we think that the deduction of Heaps’ law from the sampling of a Zipfian distribution is not satisfactory for two different reasons. First, the empirical Heaps’ and Zipf’s laws are never pure power laws. We have seen, for instance, that for written texts the frequency-rank plot features a double slope. Nevertheless, a relation exists between the exponent of the frequency-rank distribution at high ranks (rare words) and the asymptotic exponent of Heaps’ law. In other words, the behaviour of the rarest words is responsible for the entrance rate of new words (or new items). Second, even if a pure power-law behaviour were observed, we have shown that the statistics of fluctuations, represented by Taylor’s law, would not reproduce the empirical results (unless a specific sampling procedure based on the hypothesis of topicality is adopted [26]). The conclusion is that, in general, Heaps’ and Zipf’s laws are non-trivially related, and their explanation should instead be based on first principles.
- Models featuring a fixed space of possibilities are not able to reproduce the simultaneous occurrence of the three laws. For instance, a multicolor Pólya’s urn model [29] does not even produce power-law-like behaviours for Zipf’s and Heaps’ laws. It rather features a saturation phenomenon, related to the exploration of the predefined boundaries of the space of possibilities. The conclusion here is that one needs a modelling scheme featuring a space of possibilities with dynamical, for instance expanding, boundaries.
- Models that incorporate the possibility to expand the space of possibilities, like the Yule-Simon model [40] or the Hoppe model [36], fail to explain the empirical results. In the Yule-Simon model, the innovation rate is constant and Heaps’ law is reproduced only with the trivial unitary exponent. An ad hoc correction has been proposed by Zanette and Montemurro [41], who postulate a sublinear power-law Heaps’ law from the outset, without providing any first-principle explanation for it. In addition, the result is not satisfactory because the resulting time series does not obey Taylor’s law, being instead compatible with a series of i.i.d. variables. The question is then why this approach does not reproduce Taylor’s law despite fixing the expansion of the space of possibilities. In our opinion, what is lacking in the scheme by Zanette and Montemurro is the interplay between the preferential attachment mechanism and the exploration of new possibilities, in other words the triggering effect, which is instead a key feature of the PUT model (see next item). The situation is different for the Hoppe model [36], i.e., a multicolor Pólya’s urn with a special mutator color. In this case, though a self-consistent expansion of the space of possibilities is in place, an explicit triggering mechanism, in which the realization of an innovation facilitates the realization of further innovations, is lacking. The innovation rate is then too weak, and Heaps’ law features only a logarithmic growth, i.e., slower than any sublinear power-law behaviour.
- The Pólya’s urn model with triggering (PUT) [10], incorporating the notion of the adjacent possible, allows one to simultaneously account for the three laws, Zipf’s, Heaps’ and Taylor’s, in all their regimes, without ad hoc or arbitrary assumptions. In this case, the space of possibilities expands conditional to the occurrence of novel events, in a way that is compatible with the empirical findings. From the mathematical point of view, the expansion into the adjacent possible solves another issue related to Zipf’s and Heaps’ generative models: in PUT one can switch with continuity from the sublinear to the linear regime of the dictionary growth, and vice versa, by tuning one parameter only, the ratio $\nu /\rho $. This ratio is not limited to a ratio of integers: in the SI of [10] it was demonstrated that the same expressions for the Heaps’ and Zipf’s laws are recovered if one uses parameters $\rho $ and $\nu $ extracted from a distribution with fixed mean. One possible strategy is to fix an integer $\rho $ while $\nu $ can assume any real value (in simulations, a floating point value), whose fractional part can be taken into account by resorting to probabilities. Therefore, it is perfectly sound to state that one switches with continuity from the sublinear regime to the linear one in the interval $|\nu /\rho -1|<\epsilon $, with $\epsilon \ll 1$, although the rigorous mathematical characterization of the transition is far from being understood.
- It should be remarked that the Poisson-Dirichlet process [12,13,14] is also able to explain the three laws, Zipf’s, Heaps’ and Taylor’s, though only in the strictly sub-linear regime of Heaps’ law: it cannot account for a constant innovation rate, as the PUT modelling scheme does. We also point out that the PUT model embraces the Poisson-Dirichlet and the Dirichlet processes as particular cases.

In this paper we highlighted that the simultaneous occurrence of the Zipf’s, Heaps’ and Taylor’s laws can be explained in the framework of the adjacent possible. This implies considering a space of possibilities that expands or gets restructured conditional to the occurrence of a novel event. The Pólya’s urn with triggering features these properties. Poisson-Dirichlet processes can also be said to belong to the adjacent possible scheme: though no explicit mention of the space of possibilities is made in those schemes, the probability of the occurrence of a novel event depends closely on how many novelties occurred in the past. We recall that the PUT model includes Dirichlet-like processes as particular cases. From this perspective, PUT-like models seem to be good candidates to explain higher-order features connected to innovation processes. We conclude by saying that the very notion of the adjacent possible, though sufficient to explain the stylized facts of innovation processes, can only be conjectured to be a necessary condition for the validity of the three laws mentioned above: no counterexample has been found so far in which the empirically observed laws are satisfactorily explained without a dynamically restructured space of possibilities.

Conceptualization, F.T., V.L. and V.D.P.S.; Data curation, F.T. and V.D.P.S.; Formal analysis, F.T. and V.D.P.S.; Funding acquisition, V.L.; Investigation, F.T., V.L. and V.D.P.S.; Methodology, F.T., V.L. and V.D.P.S.; Resources, V.L.; Software, V.D.P.S., F.T.; Validation, F.T., V.L. and V.D.P.S.; Writing—Original Draft, F.T., V.L. and V.D.P.S.; Writing—Review & Editing, F.T., V.L. and V.D.P.S.

This research was partially funded by the Sony Computer Science Laboratories Paris.

We thank S. H. Strogatz for interesting discussions. V.D.P.S. is grateful to M. Osella for interesting discussions on the relevance of Taylor’s law. V.D.P.S. acknowledges the Austrian Research Promotion Agency FFG under grant #857136 for financial support.

The authors declare no conflict of interest.

We here derive the Heaps’ law for a general variation of the urn model with triggering. For the sake of completeness, we recall the model: an urn $\mathcal{U}$ initially contains ${N}_{0}$ distinct elements, represented by balls of different colors. By randomly extracting elements from the urn, we construct a sequence $\mathcal{S}$. Both the urn and the sequence grow during the process. At each time step t, an element ${s}_{t}$ is drawn at random from the urn: (i) if the chosen element ${s}_{t}$ is old (i.e., it already appeared in the sequence $\mathcal{S}$), it is added to the sequence and put back in the urn along with $\rho $ additional copies of it; (ii) if the chosen element ${s}_{t}$ is new (i.e., it appears for the first time in the sequence $\mathcal{S}$), it is added to the sequence and put back in the urn along with $\tilde{\rho}$ additional copies of it; further, $\nu +1$ brand new distinct elements are also added to the urn.
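As a numerical companion to the derivation below, the rules just stated translate directly into a short simulation (a sketch with arbitrary illustrative parameters; the standard PUT model corresponds to $\tilde{\rho}=\rho $), which for $\nu <\rho $ should display a Heaps’ exponent close to $\nu /\rho $:

```python
import math
import random

def put_general(n_steps, n0, rho, rho_tilde, nu, seed):
    """General urn model with triggering, as stated above: old elements
    are reinforced with rho copies, new ones with rho_tilde copies plus
    nu + 1 brand-new distinct elements."""
    rng = random.Random(seed)
    urn = list(range(n0))                    # N0 distinct colors
    seen, next_color = set(), n0
    d_of_n = []
    for _ in range(n_steps):
        s = rng.choice(urn)
        if s in seen:
            urn += [s] * rho                 # rule (i): old element
        else:
            seen.add(s)
            urn += [s] * rho_tilde           # rule (ii): new element
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1
        d_of_n.append(len(seen))
    return d_of_n

# standard PUT (rho_tilde = rho) with nu < rho:
# Heaps' law should be sublinear with exponent nu / rho = 0.5
d = put_general(30000, n0=4, rho=4, rho_tilde=4, nu=2, seed=9)
g = math.log(d[29999] / d[2999]) / math.log(10)
print(g)   # expected near 0.5
```

Changing `rho_tilde` while keeping $\tilde{\rho}\ge 0$ should leave the measured exponent unchanged, as demonstrated analytically in the remainder of this appendix.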

We can now write the equation governing the growth of the number of distinct elements $D\left(n\right)$ as a function of the total number n of elements in the sequence (n obviously also denotes the time step t above):

$$\frac{dD}{dn}=\frac{{N}_{0}+\nu D}{{N}_{0}+aD+\rho n},$$

where we have defined $a=\nu +1-\rho +\tilde{\rho}$.

By defining $z=\frac{D}{n}$ and neglecting ${N}_{0}$, we can write:

$$\frac{dz}{dn}=\frac{1}{n}\frac{dD}{dn}-\frac{D}{{n}^{2}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\Rightarrow \phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{0.166667em}{0ex}}\frac{dD}{dn}=n\frac{dz}{dn}+z=\frac{\nu z}{az+\rho},$$

which gives:

$${\int}_{z\left({n}_{0}\right)}^{z\left(n\right)}\frac{az+\rho}{z(\nu -az-\rho )}dz={\int}_{{n}_{0}}^{n}\frac{dn}{n}.$$

Here we note that by definition $0\le z\le 1$, and $z\left({n}_{0}\right)=D\left({n}_{0}\right)/{n}_{0}$, for a given ${n}_{0}$ such that the solutions we find are valid for any $n\ge {n}_{0}$. In order to integrate Equation (A3) we need to study the sign of the expression $\nu -az-\rho $. Let us do this by considering separately the cases $\rho >\nu $ and $\rho <\nu $, postponing the computation for $\rho =\nu $.

In this case, if $a\ge 0$ we have $\nu -az-\rho <0$ (and thus obviously $az+\rho -\nu >0$), while if $a<0$ there exists a ${z}_{0}$ such that $\nu -az-\rho <0$ for $z<{z}_{0}$. Thus, if $z\left(n\right)$ is decreasing in n, we can safely perform the integration for any $n\ge {n}_{0}$, for some ${n}_{0}$. Let us make this assumption and verify it at the end of the computation. By integrating Equation (A3) we thus obtain:

$$-log(az+\rho -\nu )(1+\frac{\rho}{\nu -\rho}){\mid}_{z\left({n}_{0}\right)}^{z\left(n\right)}+\frac{\rho}{\nu -\rho}logz{\mid}_{z\left({n}_{0}\right)}^{z\left(n\right)}=logn{\mid}_{{n}_{0}}^{n},$$

and solving:

$${(az+\rho -\nu )}^{\nu}=A{n}^{\rho -\nu}{z}^{\rho}\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}A=exp\left(C\right)\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}C=log\frac{{(az\left({n}_{0}\right)+\rho -\nu )}^{\nu}}{z{\left({n}_{0}\right)}^{\rho}}-log{n}_{0}^{\rho -\nu}.$$

We can now substitute $z=\frac{D}{n}$, and after some algebra we can write:

$$D-\frac{a}{{A}^{\frac{1}{\nu}}}{D}^{\frac{\nu}{\rho}}=B{n}^{\frac{\nu}{\rho}}\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}B={A}^{-\frac{1}{\rho}}{(\rho -\nu )}^{\frac{\nu}{\rho}},$$

which gives the solution:

$$D\left(n\right)=B{n}^{\frac{\nu}{\rho}}+O\left({n}^{\frac{{\nu}^{2}}{{\rho}^{2}}}\right).$$

We observe that ${lim}_{n\to \infty}z\left(n\right)={lim}_{n\to \infty}\frac{D\left(n\right)}{n}=0$ monotonically, so that the assumption made above is satisfied.

Let us now assume $\nu -az-\rho >0$ and let us verify the assumption at the end. We thus write:

$$-log(\nu -az-\rho )(1+\frac{\rho}{\nu -\rho}){\mid}_{z\left({n}_{0}\right)}^{z\left(n\right)}+\frac{\rho}{\nu -\rho}logz{\mid}_{z\left({n}_{0}\right)}^{z\left(n\right)}=logn{\mid}_{{n}_{0}}^{n}.$$

After a calculation similar to that of the case $\rho >\nu $, we arrive at the relation:

$$D+\frac{{A}^{\frac{1}{\nu}}}{a}{D}^{\frac{\rho}{\nu}}=\frac{\nu -\rho}{a}n\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}A=exp\left(C\right)\phantom{\rule{3.33333pt}{0ex}},\phantom{\rule{0.166667em}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}C=log\frac{{(\nu -az\left({n}_{0}\right)-\rho )}^{\nu}}{z{\left({n}_{0}\right)}^{\rho}}-log{n}_{0}^{\rho -\nu},$$

which gives the solution:

$$D\left(n\right)=\frac{\nu -\rho}{a}n-\frac{{A}^{\frac{1}{\nu}}}{a}O\left({n}^{\frac{\rho}{\nu}}\right).$$

Note that $a\ge 0$ for $\nu >\rho $. We already discussed in the main text the case $a=0$, which has to be treated separately, so here we consider $a>0$. From (A10) we observe that $z\left(n\right)=D\left(n\right)/n$ is increasing in n: $z\left(n\right)=\frac{\nu -\rho}{a}-\tilde{z}\left(n\right)$ with ${lim}_{n\to \infty}\tilde{z}\left(n\right)=0$. Thus $\nu -\rho -az=a\tilde{z}\left(n\right)>0$, vanishing in the asymptotic limit $n\to \infty $. The initial assumption is thus satisfied in the entire range of z values.

In the following we derive the expression of Zipf’s exponents for the model of Pólya’s urn with innovations, by exploiting the continuous approximation.

The time evolution of the number of different colors D in the stream can be approximated by the following equation (see Equation (19) with ${N}_{0}=1$):

$$\dot{D}=\frac{1+\nu D}{1+\rho t+(\nu +1)D}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{with}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\left(0\right)=1.$$

Putting aside the particular cases $\nu =0$ and $\nu =\rho $, Equation (A11) can be solved analytically to yield, at leading order for large t (see also Appendix A):

$$D\left(t\right)\approx \left\{\begin{array}{cc}\frac{\nu -\rho}{\nu +1}\phantom{\rule{0.166667em}{0ex}}t& \hfill \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{if}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\nu >\rho \\ {\left(\frac{\rho (\rho -\nu )}{\nu (\rho -1)+2\rho}\phantom{\rule{0.166667em}{0ex}}t\right)}^{\nu /\rho}& \hfill \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{if}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\nu <\rho \end{array}\right.\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{with}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}t\gg 1.$$

The two regimes given by the relative values of $\nu $ and $\rho $ result in two different Heaps’ exponents $\gamma $, i.e., $\gamma =1$ and $\gamma =\nu /\rho $.

In the denominator of Equation (A11), the total number of balls in the urn appears: $N\left(t\right)=1+\rho t+(\nu +1)D$, so that we can write:

$$N\left(t\right)=\frac{\nu D}{\dot{D}}\approx \frac{\nu t}{\gamma}\approx \left\{\begin{array}{c}\nu t\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{if}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\nu >\rho \hfill \\ \rho t\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\mathrm{if}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\nu <\rho \hfill \end{array}\right..$$

We denote by ${N}_{k}$ the number of colors occurring k times in the urn. In particular, we have ${\sum}_{k}{N}_{k}=D$. The following master equation can be written for the ${N}_{k}$:

$$\frac{\partial {N}_{k}}{\partial t}=\frac{(k-\rho ){N}_{k-\rho}}{N\left(t\right)}-\frac{k{N}_{k}}{N\left(t\right)}\approx -\frac{\rho}{N\left(t\right)}\frac{\partial k{N}_{k}}{\partial k}\approx -\frac{\rho \dot{D}}{\nu D}\frac{\partial k{N}_{k}}{\partial k}.$$

We introduce now the probability ${p}_{k}$ that a given color appears k-times in the urn, i.e., the corresponding normalized version of the number of occurrences ${N}_{k}$. In order to have $\sum {p}_{k}=1$, we must choose ${N}_{k}=D{p}_{k}$. The idea is that, as the time runs, the probabilities ${p}_{k}$ will tend to a stationary distribution, i.e., a distribution independent of t. By substituting ${N}_{k}=D{p}_{k}$ in Equation (A14), we get

$${p}_{k}=-\frac{\rho}{\nu}\frac{\partial k{p}_{k}}{\partial k}.$$

This equation can be solved easily by substituting ${p}_{k}\propto {k}^{-\beta}$ and solving for $\beta $, which leads to

$$\beta =1+\frac{\nu}{\rho}$$

and hence to the frequency-rank exponent $\alpha =\rho /\nu $. Note that, while $\gamma $ depends on the relative values of $\nu $ and $\rho $, $\alpha $ does not. Relating the ${p}_{k}$ distribution in the urn to that of the stream is an easy task.
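Explicitly, writing ${p}_{k}=C{k}^{-\beta}$, the substitution in Equation (A15) reads:

$$\frac{\partial \left(k{p}_{k}\right)}{\partial k}=\frac{\partial}{\partial k}\left(C{k}^{1-\beta}\right)=(1-\beta )C{k}^{-\beta}\phantom{\rule{0.166667em}{0ex}},$$

so that Equation (A15) becomes $C{k}^{-\beta}=-\frac{\rho}{\nu}(1-\beta )C{k}^{-\beta}$, i.e., $1=\frac{\rho}{\nu}(\beta -1)$, whence $\beta =1+\frac{\nu}{\rho}$.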

In addition, in this case Equation (A11) can be solved analytically, with the solution involving the Lambert W function. At large values of t, the solution can be approximated as

$$D\left(t\right)\approx \frac{\nu}{\nu +1}\phantom{\rule{0.166667em}{0ex}}\frac{t}{logt},$$

so that $N\left(t\right)\approx \nu t$. Equation (A14) can then be written as:

$$\frac{\partial D{p}_{k}}{\partial t}=-\frac{\nu \dot{D}}{\nu D}\frac{\partial Dk{p}_{k}}{\partial k}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\longrightarrow \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}{p}_{k}=-\frac{\partial k{p}_{k}}{\partial k},$$

which results in $\beta =2$ and $\alpha =1$.

This case is identical to Hoppe’s urn model. When a ball with a brand new color is extracted, exactly one new color enters the urn, so that the number of unobserved colors stays the same during the whole dynamics. If we start with one single ball, there will always be exactly one unobserved color in the urn, and this color plays the same role as the black ball with unitary weight in Hoppe’s model. The equation for the growth of novelties is:

$$\dot{D}=\frac{1}{1+\rho t+D}\phantom{\rule{3.33333pt}{0ex}}\stackrel{t\to \infty}{\approx}\phantom{\rule{3.33333pt}{0ex}}\frac{1}{\rho t}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\longrightarrow \phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}\phantom{\rule{3.33333pt}{0ex}}D\approx \frac{1}{\rho}logt,$$

while the frequency-rank distribution decays exponentially. In order to introduce the counterpart of the weight of the black ball in Hoppe’s model, whenever a novelty is extracted, w balls of the same brand new color could be added to the urn.

- Estoup, J.B. Les Gammes Sténographiques; Institut Sténographique de France: Paris, France, 1916.
- Zipf, G.K. Relative Frequency as a Determinant of Phonetic Change. Harvard Stud. Class. Philol. 1929, 40, 1–95.
- Zipf, G.K. The Psychobiology of Language; Houghton-Mifflin: New York, NY, USA, 1935.
- Zipf, G.K. Human Behavior and the Principle of Least Effort; Addison-Wesley: Reading, MA, USA, 1949.
- Herdan, G. Type-Token Mathematics: A Textbook of Mathematical Linguistics; Janua Linguarum, Series Maior, No. 4; Mouton & Company: The Hague, The Netherlands, 1960.
- Heaps, H.S. Information Retrieval: Computational and Theoretical Aspects; Academic Press: Orlando, FL, USA, 1978.
- Taylor, L. Aggregation, Variance and the Mean. Nature 1961, 189, 732.
- Kauffman, S.A. Investigations: The Nature of Autonomous Agents and the Worlds They Mutually Create; SFI Working Papers; Santa Fe Institute: Santa Fe, NM, USA, 1996.
- Kauffman, S.A. Investigations; Oxford University Press: New York, NY, USA; Oxford, UK, 2000.
- Tria, F.; Loreto, V.; Servedio, V.D.P.; Strogatz, S.H. The dynamics of correlated novelties. Nat. Sci. Rep. 2014, 4.
- Loreto, V.; Servedio, V.D.P.; Tria, F.; Strogatz, S.H. Dynamics on expanding spaces: Modeling the emergence of novelties. In Universality and Creativity in Language; Altmann, E., Esposti, M.D., Pachet, F., Eds.; Springer: Cham, Switzerland, 2016; pp. 59–83.
- Pitman, J. Combinatorial Stochastic Processes; Lecture Notes in Mathematics; Springer-Verlag: Berlin, Germany, 2006; Volume 1875.
- Buntine, W.; Hutter, M. A Bayesian View of the Poisson-Dirichlet Process. arXiv 2010, arXiv:1007.0296.
- De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Pruenster, I.; Ruggiero, M. Are Gibbs-Type Priors the Most Natural Generalization of the Dirichlet Process? IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 212–229.
- Hart, M. Project Gutenberg. 1971. Available online: http://www.gutenberg.org/.
- Petruszewycz, M. L’histoire de la loi d’Estoup-Zipf: Documents. Math. Sci. Hum. 1973, 44, 41–56.
- Li, W. Zipf’s Law everywhere. Glottometrics 2002, 5, 14–21.
- Newman, M.E.J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351.
- Piantadosi, S.T. Zipf’s word frequency law in natural language: A critical review and future directions. Psychon. Bull. Rev. 2014, 21, 1112–1130.
- Baeza-Yates, R.; Navarro, G. Block addressing indices for approximate text retrieval. J. Am. Soc. Inf. Sci. 2000, 51, 69–82.
- Baayen, R. Word Frequency Distributions; Text, Speech and Language Technology, Vol. 1; Springer: Dordrecht, The Netherlands, 2001.
- Egghe, L. Untangling Herdan’s law and Heaps’ law: Mathematical and informetric arguments. J. Am. Soc. Inf. Sci. Technol. 2007, 58, 702–709.
- Serrano, M.A.; Flammini, A.; Menczer, F. Modeling statistical properties of written text. PLoS ONE 2009, 4, e5372.
- Lü, L.; Zhang, Z.K.; Zhou, T. Zipf’s law leads to Heaps’ law: Analyzing their relation in finite-size systems. PLoS ONE 2010, 5, e14139.
- Cristelli, M.; Batty, M.; Pietronero, L. There is More than a Power Law in Zipf. Sci. Rep. 2012, 2, 812.
- Gerlach, M.; Altmann, E.G. Scaling laws and fluctuations in the statistics of word frequencies. New J. Phys. 2014, 16, 113010.
- Johnson, S. Where Good Ideas Come From: The Natural History of Innovation; Riverhead Hardcover: New York, NY, USA, 2010.
- Wagner, A.; Rosen, W. Spaces of the possible: Universal Darwinism and the wall between technological and biological innovation. J. R. Soc. Interface 2014, 11, 20131190.
- Gouet, R. Strong Convergence of Proportions in a Multicolor Pólya Urn. J. Appl. Probab. 1997, 34, 426–435.
- Tria, F. The dynamics of innovation through the expansion in the adjacent possible. Nuovo Cim. C 2016, 39, 280.
- Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Relat. Fields 1995, 102, 145–158.
- De Finetti, B. La Prévision: Ses Lois Logiques, Ses Sources Subjectives. Annales de l’Institut Henri Poincaré 1937, 17, 1–68.
- Zabell, S. Predicting the unpredictable. Synthese 1992, 90, 205–232.
- Monechi, B.; Ruiz-Serrano, A.; Tria, F.; Loreto, V. Waves of Novelties in the Expansion into the Adjacent Possible. PLoS ONE 2017, 12, e0179303.
- Iacopini, I.; Milojević, S.; Latora, V. Network Dynamics of Innovation Processes. Phys. Rev. Lett. 2018, 120, 048301.
- Hoppe, F.M. Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 1984, 20, 91–94.
- Ewens, W. The Sampling Theory of Selectively Neutral Alleles. Theor. Popul. Biol. 1972, 3, 87–112.
- Fisher, R.A. The Genetical Theory of Natural Selection; Clarendon Press: Oxford, UK, 1930.
- Wright, S. Evolution in Mendelian populations. Genetics 1931, 16, 97.
- Simon, H. On a class of skew distribution functions. Biometrika 1955, 42, 425–440.
- Zanette, D.; Montemurro, M. Dynamics of Text Generation with Realistic Zipf’s Distribution. J. Quant. Linguist. 2005, 12, 29.
- Yamato, H.; Shibuya, M. Moments of some statistics of Pitman sampling formula. Bull. Inf. Cybern.
**2000**, 32, 1–10. [Google Scholar]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).