The Rescaled Pólya Urn and the Wright-Fisher process with mutation

In [arXiv:1906.10951 (forthcoming in Advances in Applied Probability), arXiv:2011.05933 (published in PLOS ONE)] the authors introduce, study and apply a new variant of the Eggenberger-Pólya urn, called the "Rescaled" Pólya urn, which, for a suitable choice of the model parameters, is characterized by the following features: (i) a "local" reinforcement, i.e. a reinforcement mechanism mainly based on the last observations, (ii) a random persistent fluctuation of the predictive mean, and (iii) a long-term almost sure convergence of the empirical mean to a deterministic limit, together with a chi-squared goodness of fit result for the limit probabilities. In this work, motivated by some empirical evidence in [arXiv:2011.05933 (published in PLOS ONE)], we show that the multidimensional Wright-Fisher diffusion with mutation can be obtained as a suitable limit of the predictive means associated to a family of rescaled Pólya urns.


Introduction
The standard Eggenberger-Pólya urn [11,22] has been widely studied and generalized. In its simplest form, this model with $k$ colors works as follows. An urn contains $N_{0\,i}$ balls of color $i$, for $i = 1, \dots, k$, and, at each time-step, a ball is extracted from the urn and then returned to the urn together with $\alpha > 0$ additional balls of the same color. Therefore, if we denote by $N_{n\,i}$ the number of balls of color $i$ in the urn at time-step $n$, we have
$$N_{n\,i} = N_{n-1\,i} + \alpha\,\xi_{n\,i} \qquad \text{for } n \ge 1,$$
where $\xi_{n\,i} = 1$ if the extracted ball at time-step $n$ is of color $i$, and $\xi_{n\,i} = 0$ otherwise. The parameter $\alpha$ regulates the reinforcement mechanism: the greater $\alpha$, the greater the dependence of $N_{n\,i}$ on $\sum_{h=1}^{n} \xi_{h\,i}$.

In [1,2,3] the Rescaled Pólya (RP) urn has been introduced, studied, generalized and applied. This model is characterized by the introduction of a parameter $\beta$ in the original model so that
$$N_{n\,i} = b_i + B_{n\,i} \qquad \text{with} \qquad B_{n+1\,i} = \beta B_{n\,i} + \alpha\,\xi_{n+1\,i}, \quad n \ge 0.$$
Therefore, the urn initially contains $b_i + B_{0\,i} > 0$ balls of color $i$ and the parameter $\beta \ge 0$, together with $\alpha > 0$, regulates the reinforcement mechanism. More precisely, the term $\beta B_{n\,i}$ links $N_{n+1\,i}$ to the "configuration" at time-step $n$ through the "scaling" parameter $\beta$, and the term $\alpha\,\xi_{n+1\,i}$ links $N_{n+1\,i}$ to the outcome of the extraction at time-step $n+1$ through the parameter $\alpha$. The case $\beta = 1$ corresponds to the standard Eggenberger-Pólya urn with an initial number $N_{0\,i} = b_i + B_{0\,i}$ of balls of color $i$. When $\beta < 1$, the RP urn model exhibits the following three features:
(i) a "local" reinforcement, i.e. a reinforcement mechanism mainly based on the last observations;
(ii) a random persistent fluctuation of the predictive mean $\psi_{n\,i} = P(\xi_{n+1\,i} = 1 \mid \xi_{h\,j},\ 0 \le h \le n,\ 1 \le j \le k)$;
(iii) a long-term almost sure convergence of the empirical mean $\sum_{n=1}^{N} \xi_{n\,i}/N$ to the deterministic limit $p_i = b_i / \sum_{j=1}^{k} b_j$, and a chi-squared goodness of fit result for the long-term probability distribution $\{p_1, \dots, p_k\}$.
Regarding point (iii), we specifically have that the chi-squared statistic
$$\chi^2 = N \sum_{i=1}^{k} \frac{(O_i/N - p_i)^2}{p_i} = \sum_{i=1}^{k} \frac{(O_i - N p_i)^2}{N p_i},$$
where $N$ is the size of the sample and $O_i = \sum_{n=1}^{N} \xi_{n\,i}$ is the number of observations equal to $i$ in the sample, is asymptotically distributed as $\chi^2(k-1)\lambda$, with $\lambda > 1$. This means that the presence of correlation among observations mitigates the effect of the sample size $N$, which multiplies the chi-squared distance between the observed frequencies and the expected probabilities. This aspect is important for statistical applications in the context of a "big sample", when a small value of the chi-squared distance might be significant, and hence a correction related to the correlation between observations is desirable. A possible application in the context of clustered data, with independence between clusters and correlation, due to a reinforcement mechanism, inside each cluster, is described in [1,2].
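As a minimal sketch of the correction discussed above, the following Python snippet computes the chi-squared statistic from observed counts and divides it by a factor `lam`. The function names are illustrative, and the value of `lam` (the constant $\lambda > 1$) is assumed to be supplied by the user, since its expression is not reproduced in this section.

```python
def chi_squared_statistic(counts, p):
    """N * sum_i (O_i/N - p_i)^2 / p_i for observed counts O_i and probabilities p_i."""
    n_obs = sum(counts)
    return n_obs * sum((o / n_obs - pi) ** 2 / pi for o, pi in zip(counts, p))


def rescaled_statistic(counts, p, lam):
    """Statistic divided by lam; lam > 1 is an input here (assumption),
    so that the result can be compared with a chi-squared(k - 1) law."""
    return chi_squared_statistic(counts, p) / lam


if __name__ == "__main__":
    counts = [40, 60]
    p = [0.5, 0.5]
    print(chi_squared_statistic(counts, p))
    print(rescaled_statistic(counts, p, lam=2.0))  # lam = 2.0 is a made-up value
```

With a fixed chi-squared distance, the uncorrected statistic grows linearly in the sample size, which is why a factor $\lambda > 1$ makes the test less sensitive on large correlated samples.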
In [3] the RP urn has been applied as a good model for the evolution of the sentiment associated to Twitter posts. For these processes the estimated values of $\beta$ are strictly smaller than 1, but very near to 1. Note that the RP urn dynamics with such a value for $\beta$ cannot be approximated by the standard Pólya urn ($\beta = 1$), because one would lose the fluctuations of the predictive means and the possibility of touching the barriers $\{0, 1\}$. In Figure 1, we show the plots of the processes $(\psi_{n\,1})_n$ and $(\xi_{n\,1})_n$, reconstructed from the data and rescaled in time as $t = n(1-\beta)^2$. (Details about the analyzed data sets, the reconstruction process and the parameter estimation can be found in [3].) In this work, we show that the law of such processes can be approximated by that of the Wright-Fisher diffusion with mutation. More precisely, we prove that the multidimensional Wright-Fisher diffusion with mutation can be obtained as a suitable limit of the predictive means associated to a family of RP urns with $\beta \in [0,1)$, $\beta \to 1$.
The Wright-Fisher (WF) class of diffusion processes models the evolution of the relative frequency of a genetic variant, or allele, in a large randomly mating population with a finite number $k$ of genetic variants. When $k = 2$, the WF diffusion obeys the one-dimensional stochastic differential equation
$$dX_t = F(X_t)\,dt + \sqrt{X_t(1 - X_t)}\,dW_t. \tag{1}$$
The drift coefficient, $F : [0, 1] \to \mathbb{R}$, can include a variety of evolutionary forces such as mutation and selection. For example, $F(x) = p_1 - (p_1 + p_2)x = p_1(1 - x) - p_2 x$ describes a process with recurrent mutation between the two alleles, governed by the mutation rates $p_1 > 0$ and $p_2 > 0$. The drift vanishes when $x = p_1/(p_1 + p_2)$, which is an attracting point for the dynamics. Equation (1) can be generalized to the case $k > 2$.

The WF diffusion processes are widely employed in Bayesian Statistics, as models for time-evolving priors [12,14,24,28] and as a discrete-time finite-population construction method of the two-parameter Poisson-Dirichlet diffusion [6]. They have been applied in genetics [4,16,23,26,29,31], in biophysics [7,8], in filtering theory [5,25] and in finance [9,13].

The benefit coming from the proven limit result is twofold. First, the known properties of the WF process can give a description of the RP urn when the parameter $\beta$ is strictly smaller than one, but very near to one. Second, the given result might furnish the theoretical basis for a new simulation method for the WF process. Indeed, simulation from Equation (1) is highly nontrivial because there is no known closed form expression for the transition function of the diffusion, even in the simple case with null drift [18].
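For orientation only, the one-dimensional equation with mutation drift $F(x) = p_1(1-x) - p_2 x$ can be discretized with a plain Euler-Maruyama scheme, as in the sketch below. This is a generic approximation chosen here for illustration, not the simulation method suggested by the limit result; the clipping to $[0,1]$ and all function names are assumptions of the sketch.

```python
import math
import random


def mutation_drift(x, p1, p2):
    """F(x) = p1 (1 - x) - p2 x, which vanishes at x = p1 / (p1 + p2)."""
    return p1 * (1.0 - x) - p2 * x


def euler_maruyama_wf(x0, p1, p2, t_end, n_steps, seed=0):
    """Approximate path of dX = F(X) dt + sqrt(X (1 - X)) dW on [0, t_end]."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        diffusion = math.sqrt(max(0.0, x * (1.0 - x)))
        x = x + mutation_drift(x, p1, p2) * dt + diffusion * noise
        x = min(1.0, max(0.0, x))  # assumption: clip overshoots back into [0, 1]
        path.append(x)
    return path


if __name__ == "__main__":
    path = euler_maruyama_wf(x0=0.5, p1=1.0, p2=3.0, t_end=1.0, n_steps=1000)
    print(min(path), max(path))
```

The clipping step is a crude fix for the fact that the discretized path can leave $[0,1]$; it biases the behavior near the barriers, which is one reason why exact or better-adapted simulation methods are of interest.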
The rest of the paper is structured as follows. In Section 2 we set up our notation and we formally define the RP urn model. Section 3 provides the main result of this work, that is, the convergence result for a suitable family of predictive means associated to RP urns with $\beta \to 1$. In Section 4 we list some properties of the considered stochastic processes. In particular, we recall some properties of the WF diffusion with mutation, connecting them to the parameters of the RP urn model. Section 5 focuses on the case $k = 2$. Finally, in Section 6 we introduce the notion of dominant component (color in the RP urn), related to the possibility of reaching the barrier 1. The paper closes with two technical appendices.

The Rescaled Pólya urn
In all the sequel (unless otherwise specified) we suppose given two parameters $\alpha > 0$ and $\beta \ge 0$. Given a vector $\mathbf{x} = (x_1, \dots, x_k)^\top \in \mathbb{R}^k$, we set $|\mathbf{x}| = \sum_{i=1}^{k} |x_i|$ and $\|\mathbf{x}\|^2 = \mathbf{x}^\top \mathbf{x} = \sum_{i=1}^{k} |x_i|^2$. Moreover we denote by $\mathbf{1}$ and $\mathbf{0}$ the vectors with all the components equal to 1 and equal to 0, respectively.

Figure 1: Twitter data. In [3] the RP urn has been proven to be a good model for the evolution of the sentiment associated to Twitter posts. For these processes we have estimated values of $\beta$ smaller than 1, but very near to 1. We here plot the processes $(\psi_{n\,1})_n$ (red color) and $(\xi_{n\,1})_n$ (blue color), reconstructed from the data and rescaled in time as $t = n(1-\beta)^2$. Details about the analyzed data sets, the reconstruction process and the estimated parameters can be found in [3].
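To formally work with the RP urn model presented in the introduction, we add here some notation. Throughout, the expression "number of balls" is not to be understood literally: all the quantities are real numbers, not necessarily integers. The urn initially contains $b_i + B_{0\,i} > 0$ distinct balls of color $i$, with $i = 1, \dots, k$. We set $\mathbf{b} = (b_1, \dots, b_k)^\top$ and $\mathbf{B}_0 = (B_{0\,1}, \dots, B_{0\,k})^\top$. Throughout (unless otherwise specified) we assume $|\mathbf{b}| > 0$ and we set $\mathbf{p} = \mathbf{b}/|\mathbf{b}|$. At each time-step $(n+1) \ge 1$, a ball is drawn at random from the urn, obtaining the random vector $\boldsymbol{\xi}_{n+1} = (\xi_{n+1\,1}, \dots, \xi_{n+1\,k})^\top$ defined as
$$\xi_{n+1\,i} = \begin{cases} 1 & \text{when the extracted ball at time } n+1 \text{ is of color } i, \\ 0 & \text{otherwise,} \end{cases}$$
and the number of balls in the urn is updated as
$$\mathbf{N}_{n+1} = \mathbf{b} + \mathbf{B}_{n+1} \qquad \text{with} \qquad \mathbf{B}_{n+1} = \beta \mathbf{B}_n + \alpha\,\boldsymbol{\xi}_{n+1}, \tag{2}$$
which gives
$$\mathbf{B}_n = \beta^n \mathbf{B}_0 + \alpha \sum_{h=1}^{n} \beta^{n-h} \boldsymbol{\xi}_h. \tag{3}$$
Similarly, from the equality $|\mathbf{B}_{n+1}| = \beta |\mathbf{B}_n| + \alpha$, we get
$$|\mathbf{B}_n| = \beta^n |\mathbf{B}_0| + \alpha \sum_{h=1}^{n} \beta^{n-h}. \tag{4}$$
Setting $r^*_n = |\mathbf{N}_n| = |\mathbf{b}| + |\mathbf{B}_n|$, that is the total number of balls in the urn at time-step $n$, we get the relations
$$r^*_{n+1} = r^*_n + (\beta - 1)|\mathbf{B}_n| + \alpha \tag{5}$$
and
$$r^*_{n+1} = \beta\, r^*_n + (1 - \beta)|\mathbf{b}| + \alpha. \tag{6}$$
Moreover, setting $\mathcal{F}_0$ equal to the trivial $\sigma$-field and $\mathcal{F}_n = \sigma(\boldsymbol{\xi}_1, \dots, \boldsymbol{\xi}_n)$ for $n \ge 1$, the conditional probabilities $\boldsymbol{\psi}_n = (\psi_{n\,1}, \dots, \psi_{n\,k})^\top$ of the extraction process, also called predictive means, are
$$\boldsymbol{\psi}_n = E[\boldsymbol{\xi}_{n+1} \mid \mathcal{F}_n] = \frac{\mathbf{N}_n}{|\mathbf{N}_n|} = \frac{\mathbf{b} + \mathbf{B}_n}{r^*_n} \tag{7}$$
and, from (3) and (4), we have
$$\boldsymbol{\psi}_n = \frac{\mathbf{b} + \beta^n \mathbf{B}_0 + \alpha \sum_{h=1}^{n} \beta^{n-h} \boldsymbol{\xi}_h}{|\mathbf{b}| + \beta^n |\mathbf{B}_0| + \alpha \sum_{h=1}^{n} \beta^{n-h}}. \tag{8}$$
The dependence of $\boldsymbol{\psi}_n$ on $\boldsymbol{\xi}_h$ is governed by the factor $f(h, n) = \alpha \beta^{n-h}$, with $1 \le h \le n$, $n \ge 0$. In the case of the standard Eggenberger-Pólya urn, which corresponds to $\beta = 1$, each observation $\boldsymbol{\xi}_h$ has the same "weight" $f(h, n) = \alpha$. Instead, when $\beta < 1$ the factor $f(h, n)$ increases with $h$, so that the main contribution is given by the most recent extractions. We refer to this phenomenon as "local" reinforcement. The case $\beta = 0$ is an extreme case, for which $\boldsymbol{\psi}_n$ depends only on the last extraction $\boldsymbol{\xi}_n$. By means of (7), together with (2) and (5), we get
$$\boldsymbol{\psi}_{n+1} - \boldsymbol{\psi}_n = -\frac{(1-\beta)|\mathbf{b}|}{r^*_{n+1}} (\boldsymbol{\psi}_n - \mathbf{p}) + \frac{\alpha}{r^*_{n+1}} (\boldsymbol{\xi}_{n+1} - \boldsymbol{\psi}_n). \tag{9}$$
Setting $\Delta \mathbf{M}_{n+1} = \boldsymbol{\xi}_{n+1} - \boldsymbol{\psi}_n$ and letting $\epsilon_n = |\mathbf{b}|(1 - \beta)/r^*_{n+1}$ and $\delta_n = \alpha/r^*_{n+1}$, from (9) we obtain
$$\boldsymbol{\psi}_{n+1} - \boldsymbol{\psi}_n = -\epsilon_n (\boldsymbol{\psi}_n - \mathbf{p}) + \delta_n\, \Delta \mathbf{M}_{n+1}. \tag{10}$$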
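The urn dynamics described in this section can be sketched in a few lines of Python. The snippet below is an illustrative simulation under the assumption that all $b_i$ and $B_{0\,i}$ are non-negative (so that sums coincide with the norms $|\cdot|$); the function names are hypothetical. The helper `check_recursion` verifies numerically that the increments of the predictive means equal a drift term with weight $|\mathbf{b}|(1-\beta)/r^*_{n+1}$ toward $\mathbf{p}$ plus an innovation term with weight $\alpha/r^*_{n+1}$.

```python
import random


def simulate_rp_urn(b, B0, alpha, beta, n_steps, seed=0):
    """Simulate an RP urn; return predictive means, draws and urn totals.

    Assumes b[i] >= 0 and B0[i] >= 0, so that sum() equals the norm |.|.
    """
    rng = random.Random(seed)
    k = len(b)
    B = list(B0)
    psi_path, xi_path, r_path = [], [], []
    for _ in range(n_steps + 1):
        r_star = sum(b) + sum(B)  # total content of the urn at this step
        psi = [(b[i] + B[i]) / r_star for i in range(k)]  # predictive mean
        psi_path.append(psi)
        r_path.append(r_star)
        if len(xi_path) == n_steps:
            break
        color = rng.choices(range(k), weights=psi)[0]  # draw a ball
        xi = [1 if i == color else 0 for i in range(k)]
        xi_path.append(xi)
        B = [beta * B[i] + alpha * xi[i] for i in range(k)]  # rescale, reinforce
    return psi_path, xi_path, r_path


def check_recursion(b, alpha, beta, psi_path, xi_path, r_path):
    """Largest absolute gap between psi_{n+1} - psi_n and
    -eps_n (psi_n - p) + delta_n (xi_{n+1} - psi_n)."""
    abs_b = sum(b)
    p = [bi / abs_b for bi in b]
    worst = 0.0
    for n, xi in enumerate(xi_path):  # xi_path[n] is the draw at step n + 1
        eps = abs_b * (1.0 - beta) / r_path[n + 1]
        delta = alpha / r_path[n + 1]
        for i in range(len(b)):
            lhs = psi_path[n + 1][i] - psi_path[n][i]
            rhs = -eps * (psi_path[n][i] - p[i]) + delta * (xi[i] - psi_path[n][i])
            worst = max(worst, abs(lhs - rhs))
    return worst


if __name__ == "__main__":
    b, B0 = [1.0, 2.0], [0.5, 0.5]
    psi, xi, r = simulate_rp_urn(b, B0, alpha=1.0, beta=0.9, n_steps=200)
    print(check_recursion(b, 1.0, 0.9, psi, xi, r))
```

Running the simulation with $\beta$ close to 1 and a long horizon gives paths of the predictive mean that keep fluctuating, in contrast with the standard urn ($\beta = 1$), where they settle down.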

Main result
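Consider the RP urn with parameters $\alpha > 0$, $\beta \in [0, 1)$ and $\mathbf{B}_0$ such that $|\mathbf{B}_0| = r(\beta) = \alpha/(1 - \beta)$, and set $b = |\mathbf{b}| > 0$. Consequently, the total number of balls in the urn along the time-steps is constantly equal to $r^*(\beta) = b + r(\beta)$ and, if we denote by $\boldsymbol{\psi}^{(\beta)} = (\boldsymbol{\psi}^{(\beta)}_n)_n$ the predictive means corresponding to the fixed value $\beta$, we have the dynamics
$$\boldsymbol{\psi}^{(\beta)}_{n+1} - \boldsymbol{\psi}^{(\beta)}_n = -\epsilon(\beta)\,(\boldsymbol{\psi}^{(\beta)}_n - \mathbf{p}) + \delta(\beta)\, \Delta \mathbf{M}^{(\beta)}_{n+1}, \tag{11}$$
where
$$\epsilon(\beta) = \frac{b(1-\beta)}{r^*(\beta)} = \frac{b(1-\beta)^2}{\alpha + b(1-\beta)}, \qquad \delta(\beta) = \frac{\alpha}{r^*(\beta)} = \frac{\alpha(1-\beta)}{\alpha + b(1-\beta)} \tag{12}$$
and
$$\Delta \mathbf{M}^{(\beta)}_{n+1} = \boldsymbol{\xi}^{(\beta)}_{n+1} - \boldsymbol{\psi}^{(\beta)}_n. \tag{13}$$
The following result holds true:

Theorem. Let $\mathbf{X}^{(\beta)}_t = \boldsymbol{\psi}^{(\beta)}_{\lfloor t/(1-\beta)^2 \rfloor}$ denote the predictive means rescaled in time as $t = n(1-\beta)^2$, and suppose that $\mathbf{X}^{(\beta)}_0 = \boldsymbol{\psi}^{(\beta)}_0$ weakly converges towards some $\mathbf{X}_0$ when $\beta \to 1$. Then, for $\beta \to 1$, the family of stochastic processes $\{\mathbf{X}^{(\beta)}, \beta \in [0, 1)\}$ weakly converges towards the $k$-alleles Wright-Fisher diffusion $\mathbf{X} = (\mathbf{X}_t)_{t \ge 0}$, with type-independent mutation kernel given by $\mathbf{p}$ and dynamics
$$d\mathbf{X}_t = -\frac{b}{\alpha}(\mathbf{X}_t - \mathbf{p})\,dt + \Sigma(\mathbf{X}_t)\,d\mathbf{W}_t, \tag{14}$$
with
$$\Sigma(\mathbf{X}_t)\Sigma(\mathbf{X}_t)^\top = \mathrm{diag}(\mathbf{X}_t) - \mathbf{X}_t \mathbf{X}_t^\top. \tag{15}$$

Proof. Fix a sequence $(\beta_n)$, with $\beta_n \in [0, 1)$ and $\beta_n \to 1$. The sequence of processes $\{\mathbf{X}^{(\beta_n)}, n \in \mathbb{N}\}$ is bounded, and hence we have to prove the tightness of the sequence in the space $D^k[0, \infty)$ of right-continuous functions with the usual Skorohod topology, and the characterization of the law of the unique limit process.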
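The scaling behind the limit can be checked numerically. The sketch below assumes the step sizes $\epsilon(\beta) = b(1-\beta)/r^*(\beta)$ and $\delta(\beta) = \alpha/r^*(\beta)$, i.e. the quantities $\epsilon_n$, $\delta_n$ of Section 2 evaluated at the constant total $r^*(\beta) = b + \alpha/(1-\beta)$, together with the time unit $(1-\beta)^2$ used in the Introduction. Under these assumptions the drift rate per unit of rescaled time tends to $b/\alpha$ and the squared noise scale tends to 1 as $\beta \to 1$. Function names are illustrative.

```python
def step_sizes(alpha, b, beta):
    """eps(beta) and delta(beta) when the urn total is constantly b + alpha/(1 - beta)."""
    r_star = b + alpha / (1.0 - beta)
    eps = b * (1.0 - beta) / r_star
    delta = alpha / r_star
    return eps, delta


def rescaled_coefficients(alpha, b, beta):
    """Drift rate and squared noise scale per unit of rescaled time t = n (1 - beta)^2."""
    eps, delta = step_sizes(alpha, b, beta)
    dt = (1.0 - beta) ** 2
    return eps / dt, delta ** 2 / dt


if __name__ == "__main__":
    for beta in (0.9, 0.99, 0.999, 0.9999):
        drift_rate, noise_sq = rescaled_coefficients(alpha=1.0, b=2.0, beta=beta)
        print(beta, drift_rate, noise_sq)  # approaches b/alpha = 2 and 1
```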
We note that, for any $f \in C^2_b$, the partial derivatives in (16) are uniformly bounded, as $\mathbf{x}$ belongs to the [...] $\{\mathbf{X}^{(\beta_n)}, n \in \mathbb{N}\}$ is tight in the space of right-continuous functions with the usual Skorohod topology. Since, for any $n$ and $t$, $\mathbf{X}^{(\beta_n)}_t \in S$, then $\mathbf{1}^\top \Sigma(\mathbf{X}_t) = \mathbf{0}^\top$. Moreover, the generator of the limit process is determined by the limit [...] Hence, the weak limit of the sequence of the bounded processes $\mathbf{X}^{(\beta_n)}$ is the diffusion process [...] The expression (15) [...]

Some properties
We list here some properties.
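The Markov diffusion process $\mathbf{X}_t$ in (14) may be redefined as $\mathbf{Y}_t = (X_{t,1}, \dots, X_{t,k-1})^\top$ on $\mathbf{y} \in T^{k-1}$ with the corresponding generator
$$\mathcal{L} = \frac{1}{2} \sum_{i,j=1}^{k-1} y_i(\delta_{ij} - y_j) \frac{\partial^2}{\partial y_i \partial y_j} - \frac{b}{\alpha} \sum_{i=1}^{k-1} (y_i - p_i) \frac{\partial}{\partial y_i}. \tag{19}$$
The Kolmogorov forward equation for the density $p(\mathbf{y}, t)$ of the limiting process $\mathbf{Y}_t$ is
$$\frac{\partial p}{\partial t} = \frac{1}{2} \sum_{i,j=1}^{k-1} \frac{\partial^2}{\partial y_i \partial y_j} \big[ y_i(\delta_{ij} - y_j)\, p \big] + \frac{b}{\alpha} \sum_{i=1}^{k-1} \frac{\partial}{\partial y_i} \big[ (y_i - p_i)\, p \big]. \tag{20}$$
Therefore, it is not hard to show that the limit invariant ergodic distribution is
$$p(\mathbf{y}) = \frac{\Gamma\big(2\frac{b}{\alpha}\big)}{\prod_{i=1}^{k} \Gamma\big(2\frac{b}{\alpha} p_i\big)} \prod_{i=1}^{k} x_i^{\,2\frac{b}{\alpha} p_i - 1}, \qquad x_i = y_i \ (i < k), \quad x_k = 1 - y_1 - \dots - y_{k-1}, \tag{21}$$
because it satisfies the stationary version of (20) (see also [30]). The above distribution is the Dirichlet distribution $\mathrm{Dir}\big(2\frac{b}{\alpha}\mathbf{p}\big)$ as a function of $\mathbf{x} = (\mathbf{y}, 1 - y_1 - \dots - y_{k-1})$.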
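To connect the invariant law with the urn parameters, the sketch below builds the parameter vector $2\frac{b}{\alpha}\mathbf{p}$ of the Dirichlet distribution named above, draws samples from it through normalized Gamma variables (a standard construction, used here as an assumption of the sketch), and reports the stationary mean $p_i$ and variance $p_i(1-p_i)/(2b/\alpha + 1)$ given by the usual Dirichlet moment formulas. Function names are illustrative.

```python
import random


def dirichlet_parameters(alpha, b, p):
    """Parameter vector 2 (b / alpha) p of the invariant Dirichlet law."""
    return [2.0 * b / alpha * pi for pi in p]


def sample_dirichlet(theta, rng):
    """One Dirichlet(theta) draw obtained by normalizing independent Gamma variables."""
    g = [rng.gammavariate(t, 1.0) for t in theta]
    total = sum(g)
    return [gi / total for gi in g]


def stationary_moments(alpha, b, p):
    """Mean and variance of each component under the invariant law."""
    theta0 = 2.0 * b / alpha
    means = list(p)
    variances = [pi * (1.0 - pi) / (theta0 + 1.0) for pi in p]
    return means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.2, 0.3, 0.5]
    theta = dirichlet_parameters(alpha=1.0, b=2.0, p=p)
    draws = [sample_dirichlet(theta, rng) for _ in range(20000)]
    empirical_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(p))]
    print(empirical_mean, stationary_moments(1.0, 2.0, p))
```

The variance formula makes the role of $b/\alpha$ explicit: a small ratio spreads the stationary mass toward the boundary of the simplex, while a large ratio concentrates it around $\mathbf{p}$.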

Transition density of the limit process
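The transition density $p(\mathbf{y}_0, \mathbf{y}; t)$ is defined by $P(\mathbf{Y}_t \in S \mid \mathbf{Y}_0 = \mathbf{y}_0) = \int_{S \cap T^{k-1}} p(\mathbf{y}_0, \mathbf{y}; t)\,d\mathbf{y}$ and it can be represented in terms of series of orthogonal polynomials given in Appendix A. We first note that the limiting invariant ergodic distribution $p(\mathbf{y})$ in (21) and the generator of the process $\mathbf{Y}_t$ in (19) may be rewritten on $T^{k-1}$ in terms of $\gamma_i = 2\frac{b}{\alpha} p_i - 1$, obtaining
$$p(\mathbf{y}) = w_\gamma \prod_{i=1}^{k-1} y_i^{\gamma_i}\, (1 - |\mathbf{y}|)^{\gamma_k}$$
and
$$\mathcal{L} = \frac{1}{2} \Big[ \sum_{i,j=1}^{k-1} y_i(\delta_{ij} - y_j) \frac{\partial^2}{\partial y_i \partial y_j} + \sum_{i=1}^{k-1} \Big( (\gamma_i + 1) - \big(k + \sum_{j=1}^{k} \gamma_j\big) y_i \Big) \frac{\partial}{\partial y_i} \Big]. \tag{22}$$
These two expressions coincide with those given in Appendix A. Let $V_{n,\gamma}$ be the space of orthogonal polynomials of degree $n$ as defined there and let $f^\gamma_{\mathbf{n}}$ be one of the three orthogonal bases given there. Then (22) implies [...]. Note that each $\psi^\gamma_{\mathbf{n}}(t, \mathbf{y}) = e^{-\nu_n t} f^\gamma_{\mathbf{n}}(\mathbf{y})$ satisfies the Kolmogorov backward equation associated to the process $\mathbf{Y}_t$, since [...] The orthogonality and the completeness of the polynomial system imply that [...] The transition density $p(\mathbf{y}_0, \mathbf{y}; t)$ may then be computed by differentiating $U_{\mathbf{y}}(\mathbf{y}_0, t)$, obtaining (cf. [19, Eq. (15.13.11)])
$$p(\mathbf{y}_0, \mathbf{y}; t) = \frac{\partial}{\partial \mathbf{y}} \sum_{n} \ \sum_{\mathbf{n} :\, n_1 + \dots + n_{k-1} = n} c_{\mathbf{n}}(\mathbf{y})\, \psi^\gamma_{\mathbf{n}}(t, \mathbf{y}_0).$$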

Two-dimensional urn
In the next proposition we point out the behavior obtained when we look at the aggregated evolution of two groups J1, J2 of urn colors.
In addition, $X$ [...]

Proof. It is sufficient to apply Theorem 2 and note that the process $\mathbf{X}^{(J)}_t$ satisfies the SDE (18), that now reads [...] Now, if we further specialize the grouping choice to $J = (\{i\}, \{1, \dots, i-1, i+1, \dots, k\})$, we get [...] We note that in this case $Y_{t,i} = X$ [...]

Excursions from $z_0$
Let $J = \{J_1, J_2\}$ be as in Proposition 1, which implies that $Z_t = \sum_{l \in J_1} X_{t,l}$ satisfies the following equation
$$dZ_t = \frac{b}{\alpha} \Big( \sum_{l \in J_1} p_l - Z_t \Big) dt + \sqrt{Z_t(1 - Z_t)}\,dW_t,$$
that is (23) with $a_0 = \frac{b}{\alpha} \sum_{l \in J_1} p_l$ and $a_1 = \frac{b}{\alpha} - a_0$. We focus here on some properties of $Z$. As in [19, Section 15.3], let $a$ and $b$ be fixed, subject to $0 < a < b < 1$, and let $\tau_A$ be the hitting time of the set $A$ (for $z \in (0, 1)$, we set $\tau_z = \tau_{\{z\}}$ for simplicity) and $\tau^* = \tau_{\{a,b\}} = \min(\tau_a, \tau_b)$ be the first time the process reaches either $a$ or $b$. We highlight some classical problems that are linked to $\tau_b$, $\tau_a$ and $\tau^*$.

Problem 2.
Find $w(z_0) = E\big( \int_0^{\tau^*} g(Z_t)\,dt \mid Z_0 = z_0 \big)$, with $g$ a bounded and continuous function. This quantity is the expected cost up to the time when either $a$ or $b$ is first reached, under the cost rate $g$, starting from $z_0 \in (a, b)$. When $g \equiv 1$, this problem gives the mean time to reach either $a$ or $b$, starting from $z_0 \in (a, b)$. The solution $w : (a, b) \to \mathbb{R}$ may be computed in terms of $u : (a, b) \to [0, 1]$ above, of the scale function $S : (0, 1) \to \mathbb{R}$ of $Z_t$ given in (24), and of the speed density $m : (0, 1) \to \mathbb{R}$ of $Z_t$ given in (25). By [19, Eq. (15.3.11)] we get [...] A complete characterization of $G$ in terms of a second order differential equation may be found in [19, p. 199]. An explicit formula for $G$ is possible only when $a_0 = 0$ (extinction of $J_1$, not admitted in our model); it may be found in [19, p. 208].
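As a numerical companion to these problems, the sketch below evaluates the probability of reaching $b$ before $a$ through the classical scale-function ratio $(S(z_0) - S(a))/(S(b) - S(a))$. It assumes the scale density $s(z) = z^{-2a_0}(1-z)^{-2a_1}$, obtained here from the standard formula $s(z) = \exp\big(-\int 2\mu(z)/\sigma^2(z)\,dz\big)$ applied to the drift $a_0(1-z) - a_1 z$ and the squared diffusion coefficient $z(1-z)$; the expressions (24) and (25) of the paper are not reproduced in this section, so this form should be read as an assumption. The midpoint quadrature and the function names are illustrative choices.

```python
def integrate(f, lo, hi, n=2000):
    """Midpoint rule; it never evaluates f at the endpoints lo and hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (j + 0.5) * h) for j in range(n))


def wf_scale_density(a0, a1):
    """Assumed scale density s(z) = z^(-2 a0) (1 - z)^(-2 a1) on (0, 1)."""
    return lambda z: z ** (-2.0 * a0) * (1.0 - z) ** (-2.0 * a1)


def hitting_probability(s, a, b, z0):
    """(S(z0) - S(a)) / (S(b) - S(a)): probability of reaching b before a from z0."""
    return integrate(s, a, z0) / integrate(s, a, b)


if __name__ == "__main__":
    s = wf_scale_density(a0=0.7, a1=0.7)
    for z0 in (0.3, 0.5, 0.7):
        print(z0, hitting_probability(s, a=0.2, b=0.8, z0=z0))
```

In the symmetric case $a_0 = a_1$ with $a$ and $b$ placed symmetrically around $1/2$, the value at $z_0 = 1/2$ is $1/2$, which gives a quick sanity check of the quadrature.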

Problem 3.
In [15], it is stated that for a one-dimensional diffusion with limiting invariant distribution with density $\pi(x)$, one has [...], where [...] denote the first exit time from $(x - \varepsilon, x + \varepsilon)$ and the time of first return to $x$ after leaving $(x - \varepsilon, x + \varepsilon)$, respectively. The proof in [15] can be easily modified to our context, so that for $Z_t$ the above relation reads [...]

Accessible and inaccessible boundaries: recessive sets and dominant components
Looking at (21), we give the following definition: [...] that is (23) [...] In the same spirit, Corollary 1 states that $Z_t = 1 - X_{t,i}$ satisfies the SDE
$$dZ_t = \frac{b}{\alpha} \big( (1 - p_i) - Z_t \big)\,dt + \sqrt{Z_t(1 - Z_t)}\,dW_t.$$
The results 3 and 4 in Proposition 2 are then a consequence of the classification of the boundary point $z = 0$ given in Appendix B.
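Fix $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_k)$ with $\gamma_i > -1$ for any $i \in \{1, \dots, k\}$. The classical polynomials on $T^{k-1}$ are orthogonal with respect to the $L^1(T^{k-1})$ weight function
$$f_\gamma(\mathbf{y}) = y_1^{\gamma_1} \cdots y_{k-1}^{\gamma_{k-1}} (1 - |\mathbf{y}|)^{\gamma_k},$$
where the normalization constant $w_\gamma$ of $f_\gamma$ is given by the Dirichlet integral
$$\frac{1}{w_\gamma} = \int_{T^{k-1}} f_\gamma(\mathbf{y})\,d\mathbf{y} = \frac{\prod_{i=1}^{k} \Gamma(\gamma_i + 1)}{\Gamma\big(\sum_{i=1}^{k} \gamma_i + k\big)}.$$
Then, we may define $\pi_\gamma(\mathbf{y}) = w_\gamma f_\gamma(\mathbf{y})$, which is a density on $T^{k-1}$. The Hilbert space that we consider here is hence defined on $T^{k-1}$ by the inner product $\langle f, g \rangle_\gamma = \int_{T^{k-1}} f(\mathbf{y}) g(\mathbf{y}) \pi_\gamma(\mathbf{y})\,d\mathbf{y}$, which gives the orthogonality stated above. As proven in [10, Section 5.3], the space $V_{n,\gamma}$ of orthogonal polynomials of degree $n$ is an eigenspace of eigenfunctions $f$ of the second-order differential operator [...] with eigenvalue $\lambda_n = n\big(n + k + \sum_{i=1}^{k} \gamma_i\big)$ (see [10, Eq. (5.3.4)]), that is, [...], for any $f \in V_{n,\gamma}$.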
In [10, Section 5.3], three orthogonal bases of $V_{n,\gamma}$ are presented. Each one of these three bases is made of functions identified by the $\binom{n+k-2}{n}$ possible choices of $\mathbf{n} = (n_1, \dots, n_{k-1})$ with $n_i \ge 0$ and $|\mathbf{n}| = \sum_i n_i = n$, as follows.
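Jacobi: the standard extension of Jacobi polynomials given in [10, Proposition 5.3.1] as [...], where $p^{(a_1, a_2)}_n(t)$ is the standard Jacobi polynomial defined on $-1 < t < 1$ and $h^\gamma_{P,\mathbf{n}}$ is the normalizing factor given in [10, Proposition 5.3.1] in terms of products of Pochhammer symbols, so that $\langle P^\gamma_{\mathbf{n}_1}, P^\gamma_{\mathbf{n}_2} \rangle_\gamma = \mathbb{1}_{\mathbf{n}_1 \equiv \mathbf{n}_2}$.

Monic orthogonal basis: in [10, Proposition 5.3.2] it is proven that the following family is an orthogonal basis of $V_{n,\gamma}$: [...], where $\mathbf{m} \le \mathbf{n}$ means $m_i \le n_i$ for any $i = 1, \dots, k - 1$. Note that in this case the normalizing factor $h^\gamma_{V,\mathbf{n}}$ is not given explicitly, and then $\langle V^\gamma_{\mathbf{n}_1}, V^\gamma_{\mathbf{n}_2} \rangle_\gamma = \mathbb{1}_{\mathbf{n}_1 \equiv \mathbf{n}_2} (h^\gamma_{V,\mathbf{n}_1})^2$.

Rodrigues formula: in [10, Proposition 5.3.3] it is proven that the following family is an orthogonal basis of $V_{n,\gamma}$, given in terms of the Rodrigues formula: [...] Again, the normalizing factor $h^\gamma_{U,\mathbf{n}}$ is not given explicitly, and then $\langle U^\gamma_{\mathbf{n}_1}, U^\gamma_{\mathbf{n}_2} \rangle_\gamma = \mathbb{1}_{\mathbf{n}_1 \equiv \mathbf{n}_2} (h^\gamma_{U,\mathbf{n}_1})^2$. Moreover, the two families $(U^\gamma_{\mathbf{n}})_{\mathbf{n}}$ and $(V^\gamma_{\mathbf{n}})_{\mathbf{n}}$ are biorthogonal, in the sense that $\langle U^\gamma_{\mathbf{n}_1}, V^\gamma_{\mathbf{n}_2} \rangle_\gamma = 0$ whenever $\mathbf{n}_1 \ne \mathbf{n}_2$.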

B Wright-Fisher boundary types
In this section we recall a classification of the boundaries of the one-dimensional Wright-Fisher process with mutation given in [19, p. 239, Example 8] (see also [17]).
Fix $a_0, a_1 \ge 0$, and let $Z_t$ be the process with values in $[0, 1]$ that satisfies the SDE
$$dZ_t = \big( -a_1 Z_t + a_0(1 - Z_t) \big)\,dt + \sqrt{\max(0, Z_t(1 - Z_t))}\,dW_t. \tag{23}$$
When $a_0, a_1 > 0$, the SDE may be written as
$$dZ_t = -(a_0 + a_1) \Big( Z_t - \frac{a_0}{a_0 + a_1} \Big)\,dt + \sqrt{\max(0, Z_t(1 - Z_t))}\,dW_t$$
and the process that starts at $Z_0 \in (0, 1)$ will never leave the strip $[0, 1]$. In particular, the classification above states whether the process $Z_t$ will reach the boundary infinitely many times (and the boundary point is a reflecting barrier) or will never reach it. More precisely, $z$ is a regular boundary if and only if $Z_t$ reaches the reflecting barrier $z$ infinitely many times, while $z$ is an entrance boundary if and only if the process $Z_t$ will never touch $z$. Moreover, we may compute the scale function $S : (0, 1) \to \mathbb{R}$, defined as the integral of its derivative