A Compound Poisson Perspective of Ewens–Pitman Sampling Model

Emanuele Dolera; Stefano Favaro

doi:10.3390/math9212820

¹ Department of Mathematics, University of Pavia, Via Adolfo Ferrata 5, 27100 Pavia, Italy

² Collegio Carlo Alberto, Piazza V. Arbarello 8, 10122 Torino, Italy

³ IMATI-CNR “Enrico Magenes”, 27100 Pavia, Italy

⁴ Department of Economic and Social Sciences, Mathematics and Statistics, University of Torino, Corso Unione Sovietica 218/bis, 10134 Torino, Italy

Mathematics 2021, 9(21), 2820; https://doi.org/10.3390/math9212820

This article belongs to the Special Issue Bayesian Predictive Inference and Related Asymptotics—Festschrift for Eugenio Regazzini's 75th Birthday


Abstract

The Ewens–Pitman sampling model (EP-SM) is a distribution for random partitions of the set $\{1, \dots, n\}$, with $n \in \mathbb{N}$. It is indexed by real parameters $α$ and $θ$ such that either $α \in [0, 1)$ and $θ > -α$, or $α < 0$ and $θ = -mα$ for some $m \in \mathbb{N}$. For $α = 0$, the EP-SM reduces to the Ewens sampling model (E-SM), which admits a well-known compound Poisson perspective in terms of the log-series compound Poisson sampling model (LS-CPSM). In this paper, we consider a generalisation of the LS-CPSM, referred to as the negative binomial compound Poisson sampling model (NB-CPSM). We show that it extends the compound Poisson perspective of the E-SM to the more general EP-SM for either $α \in (0, 1)$ or $α < 0$. We then apply the interplay between the NB-CPSM and the EP-SM to study the large $n$ asymptotic behaviour of the number of blocks in the corresponding random partitions, which leads to a new proof of Pitman’s $α$-diversity. We discuss the proposed results and conjecture that analogous compound Poisson representations may hold for the class of $α$-stable Poisson–Kingman sampling models, of which the EP-SM is a noteworthy special case.

Keywords:

Berry–Esseen type theorem; Ewens–Pitman sampling model; exchangeable random partitions; log-series compound Poisson sampling model; Mittag–Leffler distribution function; negative binomial compound Poisson sampling model; Pitman’s α-diversity; Wright distribution function

1. Introduction

The Pitman–Yor process is a discrete random probability measure indexed by real parameters $α$ and $θ$ such that either $α \in [0, 1)$ and $θ > -α$, or $α < 0$ and $θ = -mα$ for some $m \in \mathbb{N}$; see, e.g., Perman et al. [], Pitman [] and Pitman and Yor [].

Let $\{V_i\}_{i \geq 1}$ be independent random variables such that $V_i$ is distributed as a Beta distribution with parameter $(1 - α, θ + iα)$, for $i \geq 1$. For $α < 0$, we adopt the convention that $V_m = 1$ and $V_i$ is undefined for $i > m$. Set $P_1 := V_1$ and $P_i := V_i \prod_{1 \leq j \leq i-1}(1 - V_j)$ for $i \geq 2$, so that $\sum_{i \geq 1} P_i = 1$ almost surely. The Pitman–Yor process is then the random probability measure $\tilde{p}_{α,θ}$ on $(\mathbb{N}, 2^{\mathbb{N}})$ such that $\tilde{p}_{α,θ}(\{i\}) = P_i$ for $i \geq 1$. The Dirichlet process (Ferguson []) arises for $α = 0$.

Because $\tilde{p}_{α,θ}$ is discrete, a random sample $(X_1, \dots, X_n)$ from $\tilde{p}_{α,θ}$ induces a random partition $\Pi_n$ of $\{1, \dots, n\}$ through the equivalence $i \sim j \Leftrightarrow X_i = X_j$ (Pitman []). Let $K_n(α, θ) := K_n(X_1, \dots, X_n) \leq n$ be the number of blocks of $\Pi_n$. For $r = 1, \dots, n$, let $M_{r,n}(α, θ) := M_{r,n}(X_1, \dots, X_n)$ be the number of blocks of $\Pi_n$ with frequency $r$, so that $\sum_{1 \leq r \leq n} M_{r,n} = K_n$ and $\sum_{1 \leq r \leq n} r M_{r,n} = n$. Pitman [] showed that:

\Pr[(M_{1,n}(α, θ), \dots, M_{n,n}(α, θ)) = (x_1, \dots, x_n)] = n! \frac{(θ/α)_{(\sum_{i=1}^{n} x_i)}}{(θ)_{(n)}} \prod_{i=1}^{n} \frac{(α(1-α)_{(i-1)}/i!)^{x_i}}{x_i!},

(1)

where $(x)_{(n)}$ is the ascending factorial of $x$ of order $n$, i.e., $(x)_{(n)} := \prod_{0 \leq i \leq n-1}(x + i)$. The distribution (1) is referred to as the Ewens–Pitman sampling model (EP-SM). For $α = 0$, it reduces to the Ewens sampling model (E-SM) in Ewens [].

The Pitman–Yor process plays a critical role in a variety of research areas, such as mathematical population genetics, Bayesian nonparametrics, machine learning, excursion theory, combinatorics and statistical physics. See Pitman [] and Crane [] for a comprehensive treatment of this subject.
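To make (1) concrete, the following minimal Python sketch evaluates the EP-SM probability of a multiplicity vector $(x_1, \dots, x_n)$ with exact rational arithmetic. It then checks that these probabilities sum to one over all partitions of a small $n$. This is our own illustration, and the helper names (`multiplicity_vectors`, `rising`, `ep_sm_pmf`) are hypothetical. The code uses the algebraic rewriting $α^k(θ/α)_{(k)} = \prod_{j<k}(θ + jα)$, so the Ewens case $α = 0$ needs no special treatment.

```python
from fractions import Fraction
from math import factorial


def multiplicity_vectors(n):
    """Yield every tuple (x_1, ..., x_n) of non-negative integers with sum_i i * x_i == n."""
    def build(i, remaining):
        if i == 0:
            if remaining == 0:
                yield ()
            return
        for x_i in range(remaining // i + 1):  # number of blocks of size i
            for head in build(i - 1, remaining - i * x_i):
                yield head + (x_i,)
    yield from build(n, n)


def rising(x, k):
    """Ascending factorial (x)_(k) = x (x + 1) ... (x + k - 1), exact for Fractions."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def ep_sm_pmf(x, alpha, theta):
    """Probability (1) of the multiplicity vector x = (x_1, ..., x_n) under the EP-SM.

    alpha**k * (theta/alpha)_(k) is rewritten as prod_{j<k} (theta + j alpha),
    so the Ewens case alpha = 0 is covered as well.
    """
    n = sum(i * x_i for i, x_i in enumerate(x, start=1))
    k = sum(x)
    numerator = Fraction(1)
    for j in range(k):
        numerator *= theta + j * alpha
    prob = factorial(n) * numerator / rising(theta, n)
    for i, x_i in enumerate(x, start=1):
        prob *= (rising(1 - alpha, i - 1) / factorial(i)) ** x_i / factorial(x_i)
    return prob


cases = [(Fraction(1, 2), Fraction(1)),   # alpha in (0, 1), theta > -alpha
         (Fraction(0), Fraction(2)),      # Ewens sampling model
         (Fraction(-1), Fraction(3))]     # alpha < 0, theta = -m alpha with m = 3
for alpha, theta in cases:
    total = sum(ep_sm_pmf(x, alpha, theta) for x in multiplicity_vectors(6))
    print(alpha, theta, total)  # the total is exactly 1 in each case
```

In the case $α < 0$, $θ = -mα$, every partition with more than $m$ blocks receives probability zero, because the product $\prod_{j<k}(θ + jα)$ contains the factor $θ + mα = 0$.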

The E-SM admits a well-known compound Poisson perspective in terms of the log-series compound Poisson sampling model (LS-CPSM). See Charalambides [] and the references therein for an overview of compound Poisson models.

We consider a population of individuals with a random number $K$ of distinct types, and let $K$ be distributed as a Poisson distribution with parameter $λ = -z\log(1 - q)$, for $q \in (0, 1)$ and $z > 0$. For $i \in \mathbb{N}$, let $N_i$ denote the random number of individuals of type $i$ in the population. The $N_i$'s are independent of $K$ and of each other, with the common distribution:

\Pr[N_1 = x] = -\frac{1}{x\log(1 - q)} q^{x}

(2)

for $x \in \mathbb{N}$. Let $S = \sum_{1 \leq i \leq K} N_i$ and, for $r = 1, \dots, S$, let $M_r = \sum_{1 \leq i \leq K} \mathbb{1}_{\{N_i = r\}}$. That is, $M_r$ is the random number of $N_i$'s equal to $r$, so that $\sum_{r \geq 1} M_r = K$ and $\sum_{r \geq 1} r M_r = S$. Let $(M_1(z, n), \dots, M_n(z, n))$ denote a random variable whose distribution coincides with the conditional distribution of $(M_1, \dots, M_S)$ given $S = n$. Then (Section 3, Charalambides []):

\Pr[(M_1(z, n), \dots, M_n(z, n)) = (x_1, \dots, x_n)] = \frac{n!}{(z)_{(n)}} \prod_{i=1}^{n} \frac{(z/i)^{x_i}}{x_i!}.

(3)

The distribution (3) is referred to as the LS-CPSM, and it is equivalent to the E-SM: the distribution (3) coincides with the distribution (1) with $α = 0$ and $θ = z$. Therefore, the distributions of $K(z, n) = \sum_{1 \leq r \leq n} M_r(z, n)$ and $M_r(z, n)$ coincide with the distributions of $K_n(0, z)$ and $M_{r,n}(0, z)$, respectively.

Let $\overset{w}{⟶}$ denote weak convergence. From Korwar and Hollander [], $K(z, n)/\log n \overset{w}{⟶} z$ as $n \to +\infty$. From Ewens [], it follows that $M_r(z, n) \overset{w}{⟶} P_{z/r}$ as $n \to +\infty$, where $P_z$ is a Poisson random variable with parameter $z$.
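The conditioning step above can be checked numerically. The Python sketch below relies on an assumption not spelled out in the text, namely the Poisson-thinning property: under the compound Poisson construction, $M_1, M_2, \dots$ are independent Poisson variables with means $λ\Pr[N_1 = r] = zq^r/r$. Under that assumption, the conditional law given $S = n$ should agree with (3) for every $q \in (0, 1)$. The function names are hypothetical.

```python
import math


def multiplicity_vectors(n):
    """Yield every tuple (x_1, ..., x_n) of non-negative integers with sum_i i * x_i == n."""
    def build(i, remaining):
        if i == 0:
            if remaining == 0:
                yield ()
            return
        for x_i in range(remaining // i + 1):
            for head in build(i - 1, remaining - i * x_i):
                yield head + (x_i,)
    yield from build(n, n)


def conditional_law(n, z, q):
    """Law of (M_1, ..., M_n) given S = n in the log-series compound Poisson model.

    Assumes Poisson thinning: M_r ~ Poisson(lam * Pr[N_1 = r]) independently, with
    lam = -z log(1 - q) and Pr[N_1 = r] from (2). The factor exp(-lam) cancels.
    """
    lam = -z * math.log(1 - q)
    means = [lam * (-(q ** r) / (r * math.log(1 - q))) for r in range(1, n + 1)]
    weights = {}
    for x in multiplicity_vectors(n):
        w = 1.0
        for mean, x_r in zip(means, x):
            w *= mean ** x_r / math.factorial(x_r)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def ls_cpsm_pmf(x, z):
    """Right-hand side of (3): n! / (z)_(n) * prod_i (z / i)**x_i / x_i!."""
    n = len(x)
    prob = math.factorial(n) / math.prod(z + i for i in range(n))
    for i, x_i in enumerate(x, start=1):
        prob *= (z / i) ** x_i / math.factorial(x_i)
    return prob


n, z = 7, 1.5
gaps = {}
for q in (0.2, 0.7):
    law = conditional_law(n, z, q)
    gaps[q] = max(abs(p - ls_cpsm_pmf(x, z)) for x, p in law.items())
print(gaps)  # both gaps are at floating-point rounding level
```

The factor $q^n$ common to all vectors with $\sum_r r x_r = n$ drops out when normalising. This is why the conditional law does not depend on $q$.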

In this paper, we consider a generalisation of the LS-CPSM, referred to as the negative binomial compound Poisson sampling model (NB-CPSM). The NB-CPSM is indexed by real parameters $α$ and $z$ such that either $α \in (0, 1)$ and $z > 0$, or $α < 0$ and $z < 0$. The LS-CPSM is recovered by letting $α \to 0$ with $z > 0$.

We show that the NB-CPSM extends the compound Poisson perspective of the E-SM to the more general EP-SM for either $α \in (0, 1)$ or $α < 0$. Specifically, we show that:
(i) for $α \in (0, 1)$, the EP-SM (1) admits a representation as a randomised NB-CPSM with $α \in (0, 1)$ and $z > 0$, where the randomisation acts on $z$ with respect to a scale mixture between a Gamma and a scaled Mittag–Leffler distribution (Pitman []);
(ii) for $α < 0$, the NB-CPSM admits a representation in terms of a randomised EP-SM with $α < 0$ and $θ = -mα$ for some $m \in \mathbb{N}$, where the randomisation acts on $m$ with respect to a tilted Poisson distribution arising from the Wright function (Wright []).

We then apply the interplay between the NB-CPSM and the EP-SM to the large $n$ asymptotic behaviour of the number of distinct blocks in the corresponding random partitions. In particular, we combine the randomised representation in (i) with the large $n$ asymptotic behaviour of the number of distinct blocks under the NB-CPSM. This yields a new proof of Pitman’s $α$-diversity (Pitman []), namely the large $n$ asymptotic behaviour of $K_n(α, θ)$ under the EP-SM for $α \in (0, 1)$.

2. A Compound Poisson Perspective of EP-SM

To introduce the NB-CPSM, we consider a population of individuals with a random number $K$ of types, and let $K$ be distributed as a Poisson distribution with parameter $λ = z[1 - (1 - q)^{α}]$. Here, either $q \in (0, 1)$, $α \in (0, 1)$ and $z > 0$, or $q \in (0, 1)$, $α < 0$ and $z < 0$. For $i \in \mathbb{N}$, let $N_i$ be the random number of individuals of type $i$ in the population. The $N_i$'s are independent of $K$ and of each other, with the common distribution:

\Pr[N_1 = x] = -\frac{1}{1 - (1 - q)^{α}} \binom{α}{x} (-q)^{x}

(4)

for $x \in \mathbb{N}$. Note that $-\binom{α}{x}(-1)^{x} = α(1-α)_{(x-1)}/x!$. Hence (4) is a proper probability distribution on $\mathbb{N}$ in both parameter regimes, and these coefficients, multiplied by $z$, are the weights appearing in (5) below.

Let $S = \sum_{1 \leq i \leq K} N_i$ and, for $r = 1, \dots, S$, let $M_r = \sum_{1 \leq i \leq K} \mathbb{1}_{\{N_i = r\}}$. That is, $M_r$ is the random number of $N_i$'s equal to $r$, so that $\sum_{r \geq 1} M_r = K$ and $\sum_{r \geq 1} r M_r = S$. Let $(M_1(α, z, n), \dots, M_n(α, z, n))$ be a random variable whose distribution coincides with the conditional distribution of $(M_1, \dots, M_S)$ given $S = n$. Then (Section 3, Charalambides []):

\Pr[(M_1(α, z, n), \dots, M_n(α, z, n)) = (x_1, \dots, x_n)] = \frac{n!}{\sum_{j=0}^{n} C(n, j; α) z^{j}} \prod_{i=1}^{n} \frac{[z\, α(1-α)_{(i-1)}/i!]^{x_i}}{x_i!},

(5)

where $C(n, j; α) = \frac{1}{j!}\sum_{0 \leq i \leq j} \binom{j}{i} (-1)^{i} (-iα)_{(n)}$ is the generalised factorial coefficient (Charalambides []). We adopt the provisos $C(n, 0; α) = 0$ for all $n \in \mathbb{N}$, $C(n, j; α) = 0$ for all $j > n$, and $C(0, 0; α) = 1$. The distribution (5) is referred to as the NB-CPSM. As $α \to 0$, the distribution (4) reduces to the distribution (2), and hence the NB-CPSM (5) reduces to the LS-CPSM (3).

The next theorem states the large $n$ asymptotic behaviour of the counting statistics $K(α, z, n) = \sum_{1 \leq r \leq n} M_r(α, z, n)$ and $M_r(α, z, n)$ arising from the NB-CPSM.
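As an illustration of (4) and (5), the sketch below makes three checks. First, it computes $C(n, j; α)$ from the alternating-sum definition with exact rational arithmetic. Second, it checks that the stated provisos follow from that formula. Third, it compares (5) with the conditional law of the compound Poisson construction for two values of $q$. As in the LS-CPSM case, the third check assumes the Poisson-thinning property, under which $M_r$ is Poisson with mean $λ\Pr[N_1 = r]$. All function names are hypothetical.

```python
import math
from fractions import Fraction


def multiplicity_vectors(n):
    """Yield every tuple (x_1, ..., x_n) of non-negative integers with sum_i i * x_i == n."""
    def build(i, remaining):
        if i == 0:
            if remaining == 0:
                yield ()
            return
        for x_i in range(remaining // i + 1):
            for head in build(i - 1, remaining - i * x_i):
                yield head + (x_i,)
    yield from build(n, n)


def rising(x, k):
    """Ascending factorial (x)_(k), exact when x is a Fraction."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def gen_factorial_coeff(n, j, alpha):
    """C(n, j; alpha) from its explicit alternating-sum definition (exact for Fraction alpha).

    The provisos C(n, 0; alpha) = 0 for n >= 1, C(n, j; alpha) = 0 for j > n and
    C(0, 0; alpha) = 1 are reproduced by this formula without special cases.
    """
    return sum((-1) ** i * math.comb(j, i) * rising(-i * alpha, n)
               for i in range(j + 1)) / math.factorial(j)


def weight(alpha, i):
    """alpha (1 - alpha)_(i - 1) / i!, which equals -binom(alpha, i) (-1)**i."""
    return alpha * rising(1 - alpha, i - 1) / math.factorial(i)


def nb_cpsm_pmf(x, alpha, z):
    """Right-hand side of (5) for a float z."""
    n = len(x)
    norm = sum(float(gen_factorial_coeff(n, j, alpha)) * z ** j for j in range(n + 1))
    prob = math.factorial(n) / norm
    for i, x_i in enumerate(x, start=1):
        prob *= (z * float(weight(alpha, i))) ** x_i / math.factorial(x_i)
    return prob


def conditional_law(n, alpha, z, q):
    """Law of (M_1, ..., M_n) given S = n, with K ~ Poisson(z [1 - (1 - q)**alpha]) and N_i ~ (4).

    Assumes Poisson thinning, as in the LS-CPSM case: M_r ~ Poisson(lam * Pr[N_1 = r]).
    """
    a = float(alpha)
    lam = z * (1 - (1 - q) ** a)
    means = []
    for r in range(1, n + 1):
        binom = math.prod(a - i for i in range(r)) / math.factorial(r)  # binom(alpha, r)
        means.append(lam * (-binom * (-q) ** r / (1 - (1 - q) ** a)))
    weights = {}
    for x in multiplicity_vectors(n):
        w = 1.0
        for mean, x_r in zip(means, x):
            w *= mean ** x_r / math.factorial(x_r)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


gaps = {}
for alpha, z in ((Fraction(1, 3), 2.0), (Fraction(-2), -1.0)):
    for q in (0.3, 0.6):
        law = conditional_law(6, alpha, z, q)
        gaps[(alpha, z, q)] = max(abs(p - nb_cpsm_pmf(x, alpha, z)) for x, p in law.items())
print(gaps)  # every gap is at floating-point rounding level
```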

Theorem 1.

Let $P_λ$ denote a Poisson random variable with parameter $λ > 0$. As $n \to +\infty$:

(i): for $α \in (0, 1)$ and $z > 0$:

$K(α, z, n) \overset{w}{⟶} 1 + P_{z}$

(6)

and

$M_{r}(α, z, n) \overset{w}{⟶} P_{\frac{α(1-α)_{(r-1)}}{r!} z};$

(7)

(ii): for $α < 0$ and $z < 0$:

$\frac{K(α, z, n)}{n^{\frac{-α}{1-α}}} \overset{w}{⟶} \frac{(αz)^{\frac{1}{1-α}}}{-α}$

(8)

and

$M_{r}(α, z, n) \overset{w}{⟶} P_{\frac{α(1-α)_{(r-1)}}{r!} z}.$

(9)

Proof.

We start with the proof of (6). Recall that the probability generating function $G(\cdot; λ)$ of $P_λ$ is $G(s; λ) = \exp\{λ(s - 1)\}$ for any $s > 0$. Now, let $G(\cdot; α, z, n)$ be the probability generating function of $K(α, z, n)$. The distribution of $K(α, z, n)$ follows by combining the NB-CPSM (5) with Theorem 2.15 of Charalambides []. In particular:

G(s; α, z, n) = \frac{\sum_{j=1}^{n} C(n, j; α)(sz)^{j}}{\sum_{j=1}^{n} C(n, j; α) z^{j}}.

We show that $G(s; α, z, n) \to s\exp\{z(s - 1)\}$ as $n \to +\infty$, for any $s > 0$. This implies (6), since $s\exp\{z(s - 1)\}$ is the probability generating function of $1 + P_z$. By a direct application of the definition of $C(n, j; α)$, we write:

\sum_{j=1}^{n} C(n, j; α) z^{j} = \sum_{i=1}^{n} (-1)^{i} (-iα)_{(n)} \sum_{k=i}^{n} \frac{1}{k!}\binom{k}{i} z^{k} = \sum_{i=1}^{n} (-1)^{i} (-iα)_{(n)} e^{z} z^{i} \frac{Γ(n - i + 1, z)}{i!\, Γ(n - i + 1)},

where $Γ(a, x) := \int_{x}^{+\infty} t^{a-1} e^{-t} dt$ denotes the upper incomplete Gamma function for $a, x > 0$, and $Γ(a) := \int_{0}^{+\infty} t^{a-1} e^{-t} dt$ denotes the Gamma function for $a > 0$. The second equality uses $\sum_{m=0}^{N-1} z^{m}/m! = e^{z}Γ(N, z)/Γ(N)$ with $N = n - i + 1$. Accordingly, we obtain the identity:

G(s; α, z, n) = e^{z(s-1)} \frac{-zs\frac{Γ(n, zs)}{Γ(n)} + \sum_{i=2}^{n} (-1)^{i} \frac{(-iα)_{(n)}}{(-α)_{(n)}} (zs)^{i} \frac{Γ(n - i + 1, zs)}{i!\, Γ(n - i + 1)}}{-z\frac{Γ(n, z)}{Γ(n)} + \sum_{i=2}^{n} (-1)^{i} \frac{(-iα)_{(n)}}{(-α)_{(n)}} z^{i} \frac{Γ(n - i + 1, z)}{i!\, Γ(n - i + 1)}}.

Since $\lim_{n \to +\infty} Γ(n, x)/Γ(n) = 1$ for any $x > 0$, the proof of (6) is completed by showing that, for any $t > 0$:

\lim_{n \to +\infty} \sum_{i=2}^{n} (-1)^{i} \frac{(-iα)_{(n)}}{(-α)_{(n)}} \frac{Γ(n - i + 1, t)}{Γ(n - i + 1)} \frac{t^{i}}{i!} = 0.

(10)

By the definition of ascending factorials and the reflection formula $Γ(x)Γ(1 - x) = π/\sin(πx)$ of the Gamma function, applied with $x = -iα$, it holds:

\frac{(-iα)_{(n)}}{(-α)_{(n)}} = -\frac{Γ(n - iα)}{Γ(n - α)} \frac{\sin(iπα)}{π} Γ(iα + 1) Γ(-α).
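The identity above, with its leading minus sign coming from $Γ(-iα)Γ(1 + iα) = -π/\sin(iπα)$, can be spot-checked numerically. The sketch below compares the ratio computed directly from ascending factorials with the reflection-formula expression. It uses only Python's standard `math.gamma`, with values of $iα$ that are not integers, since the identity is stated away from the poles. The function names are our own.

```python
import math


def rising(x, n):
    """Ascending factorial (x)_(n) as a float."""
    out = 1.0
    for i in range(n):
        out *= x + i
    return out


def ratio_direct(i, alpha, n):
    """(-i alpha)_(n) / (-alpha)_(n), computed from the definition."""
    return rising(-i * alpha, n) / rising(-alpha, n)


def ratio_reflection(i, alpha, n):
    """Same ratio via Gamma(x) Gamma(1 - x) = pi / sin(pi x) at x = -i alpha (i alpha not an integer)."""
    return (-math.gamma(n - i * alpha) / math.gamma(n - alpha)
            * math.sin(i * math.pi * alpha) / math.pi
            * math.gamma(i * alpha + 1) * math.gamma(-alpha))


alpha, n = 0.3, 10
for i in range(2, 6):  # i * alpha = 0.6, 0.9, 1.2, 1.5
    print(i, ratio_direct(i, alpha, n), ratio_reflection(i, alpha, n))
```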

In particular, by means of the monotonicity of the function $[1, +\infty) \ni z \mapsto Γ(z)$, we can write:

\frac{1}{i!}\left|\frac{(-iα)_{(n)}}{(-α)_{(n)}}\right| \leq \frac{|Γ(-α)|}{π} \frac{Γ(n - 2α)}{Γ(n - α)} \frac{Γ(iα + 1)}{i!}

(11)

for any $n \in \mathbb{N}$ such that $n > 1/(1 - α)$, and $i \in \{2, \dots, n\}$. Note that $Γ(n, x)/Γ(n) \leq 1$. Then, we apply (11) to obtain:

\left|\sum_{i=2}^{n} (-1)^{i} \frac{(-iα)_{(n)}}{(-α)_{(n)}} \frac{Γ(n - i + 1, t)}{Γ(n - i + 1)} \frac{t^{i}}{i!}\right| \leq \sum_{i=2}^{n} \frac{t^{i}}{i!}\left|\frac{(-iα)_{(n)}}{(-α)_{(n)}}\right| \leq \frac{|Γ(-α)|}{π} \frac{Γ(n - 2α)}{Γ(n - α)} \sum_{i \geq 0} t^{i} \frac{Γ(iα + 1)}{i!}.

By the Stirling approximation, $Γ(n - 2α)/Γ(n - α) \sim n^{-α}$ as $n \to +\infty$. Moreover, expanding $e^{tz^{α}}$ and integrating term by term, we have:

\sum_{i \geq 0} t^{i} \frac{Γ(iα + 1)}{i!} = \int_{0}^{+\infty} e^{tz^{α} - z} dz < +\infty.

For any fixed $t > 0$, the integral is finite because $tz^{α} < z/2$ whenever $z > (2t)^{\frac{1}{1-α}}$. This completes the proof of (10), and hence the proof of (6).

For the proof of (7), we use the falling factorial moments of $M_r(α, z, n)$, which follow by combining the NB-CPSM (5) with Theorem 2.15 of Charalambides []. Let $(a)_{[n]}$ be the falling factorial of $a$ of order $n$, i.e., $(a)_{[n]} = \prod_{0 \leq i \leq n-1}(a - i)$, for any $a \in \mathbb{R}^{+}$ and $n \in \mathbb{N}_0$, with the proviso $(a)_{[0]} = 1$. Then, we write:

\begin{matrix} E[(M_r(α, z, n))_{[s]}] \\ = (-1)^{rs} (n)_{[rs]} \binom{α}{r}^{s} (-z)^{s} \frac{\sum_{j=0}^{n-rs} C(n - rs, j; α) z^{j}}{\sum_{j=0}^{n} C(n, j; α) z^{j}} \\ = (-1)^{rs} (n)_{[rs]} \binom{α}{r}^{s} (-z)^{s} \frac{(-α)_{(n-rs)}}{(-α)_{(n)}} \\ \times \frac{-z\frac{Γ(n - rs, z)}{Γ(n - rs)} + \sum_{i=2}^{n-rs} (-1)^{i} \frac{(-iα)_{(n-rs)}}{(-α)_{(n-rs)}} z^{i} \frac{Γ(n - rs - i + 1, z)}{i!\, Γ(n - rs - i + 1)}}{-z\frac{Γ(n, z)}{Γ(n)} + \sum_{i=2}^{n} (-1)^{i} \frac{(-iα)_{(n)}}{(-α)_{(n)}} z^{i} \frac{Γ(n - i + 1, z)}{i!\, Γ(n - i + 1)}}. \end{matrix}

By the same argument used in the proof of (6), it holds that:

\lim_{n \to +\infty} \frac{-z\frac{Γ(n - rs, z)}{Γ(n - rs)} + \sum_{i=2}^{n-rs} (-1)^{i} \frac{(-iα)_{(n-rs)}}{(-α)_{(n-rs)}} z^{i} \frac{Γ(n - rs - i + 1, z)}{i!\, Γ(n - rs - i + 1)}}{-z\frac{Γ(n, z)}{Γ(n)} + \sum_{i=2}^{n} (-1)^{i} \frac{(-iα)_{(n)}}{(-α)_{(n)}} z^{i} \frac{Γ(n - i + 1, z)}{i!\, Γ(n - i + 1)}} = 1.

Then:

\lim_{n \to +\infty} E[(M_r(α, z, n))_{[s]}] = (-1)^{rs} \binom{α}{r}^{s} (-z)^{s} = \left[\frac{α(1-α)_{(r-1)}}{r!} z\right]^{s}

follows from the fact that $(n)_{[rs]}(-α)_{(n-rs)}/(-α)_{(n)} \to 1$ as $n \to +\infty$, since both $(n)_{[rs]}$ and $(-α)_{(n)}/(-α)_{(n-rs)}$ behave like $n^{rs}$. The proof of the large $n$ asymptotics (7) is completed by recalling that the falling factorial moment of order $s$ of $P_λ$ is $E[(P_λ)_{[s]}] = λ^{s}$.
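The convergence of the first factorial moments ($s = 1$) can be illustrated numerically. The sketch below computes $\log\sum_{j} C(m, j; α) z^j$ for $m \le n$ with the triangular recurrence $C(m + 1, k; α) = αC(m, k - 1; α) + (m - kα)C(m, k; α)$. This recurrence is a standard property of generalised factorial coefficients that we take as given; it is not stated in the text. The sketch then evaluates $E[M_r(α, z, n)] = (n)_{[r]}\, z\,α(1-α)_{(r-1)}/r! \cdot D_{n-r}/D_n$ against the limits in (7) and (9). Rows are rescaled at every step to avoid overflow, and the names are hypothetical.

```python
import math


def log_normalisers(alpha, z, n):
    """Return [log D_0, ..., log D_n], where D_m = sum_k C(m, k; alpha) z**k.

    Propagates the weights C(m, k; alpha) z**k with the recurrence
    C(m+1, k; alpha) = alpha C(m, k-1; alpha) + (m - k alpha) C(m, k; alpha).
    All weights are positive for alpha in (0, 1), z > 0 and for alpha < 0, z < 0.
    """
    row = [1.0]       # weights of row m, divided by exp(log_scale); they sum to 1
    log_scale = 0.0   # log D_m
    logs = [0.0]      # D_0 = C(0, 0; alpha) = 1
    for m in range(n):
        new = [0.0] * (m + 2)
        for k in range(1, m + 2):
            cur = row[k] if k <= m else 0.0
            new[k] = alpha * z * row[k - 1] + (m - k * alpha) * cur
        total = sum(new)
        log_scale += math.log(total)
        row = [w / total for w in new]
        logs.append(log_scale)
    return logs


def mean_block_count(alpha, z, n, r, logs=None):
    """E[M_r(alpha, z, n)] = (n)_[r] * w_r z * D_{n-r} / D_n (the case s = 1)."""
    if logs is None:
        logs = log_normalisers(alpha, z, n)
    w_r = alpha * math.prod(1 - alpha + i for i in range(r - 1)) / math.factorial(r)
    log_ratio = (math.lgamma(n + 1) - math.lgamma(n - r + 1)
                 + logs[n - r] - logs[n])
    return math.exp(log_ratio) * w_r * z


alpha, z, n = 0.5, 1.0, 1000
logs = log_normalisers(alpha, z, n)
for r in (1, 2, 3):
    limit = alpha * math.prod(1 - alpha + i for i in range(r - 1)) / math.factorial(r) * z
    print(r, mean_block_count(alpha, z, n, r, logs), limit)
```

For $α < 0$, $z < 0$ the same function applies unchanged. For example, with $α = z = -1$ the limit in (9) is $1$ for every $r$, and the approach to it is slower, of order $n^{-1/2}$.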

For the proof of (8), let $α = -σ$ with $σ > 0$ and let $z = -ζ$ with $ζ > 0$. By a direct application of Equation (2.27) of Charalambides [], we write the following identity:

\sum_{j=0}^{n} C(n, j; -σ)(-ζ)^{j} = (-1)^{n} \sum_{v=0}^{n} s(n, v)(-σ)^{v} \sum_{j=0}^{v} ζ^{j} S(v, j),

where $s(n, v)$ and $S(v, j)$ denote the Stirling numbers of the first and second kind, respectively. Note that $\sum_{0 \leq j \leq v} ζ^{j} S(v, j)$ is the moment of order $v$ of a Poisson random variable with parameter $ζ > 0$. Then, we write:

\sum_{j=0}^{n} C(n, j; -σ)(-ζ)^{j} = \sum_{v=0}^{n} |s(n, v)| σ^{v} \sum_{j \geq 0} j^{v} e^{-ζ} \frac{ζ^{j}}{j!} = \sum_{j \geq 0} e^{-ζ} \frac{ζ^{j}}{j!} \int_{0}^{+\infty} x^{n} f_{G_{σj,1}}(x) dx.

(12)

The last equality uses $\sum_{v} |s(n, v)|(σj)^{v} = (σj)_{(n)}$, which is the moment of order $n$ of a Gamma random variable $G_{σj,1}$ with shape parameter $σj$ and scale parameter 1.

That is, setting $B_n(w) := \sum_{j=0}^{n} C(n, j; -σ)(-w)^{j}$ for $w > 0$:

B_n(w) = E[(G_{σP_w,1})^{n}],

(13)

where $G_{a,1}$ and $P_w$ are independent random variables, $G_{a,1}$ is a Gamma random variable with shape parameter $a > 0$ and scale parameter 1, and $P_w$ is a Poisson random variable with parameter $w$. Accordingly, the distribution $μ_{σ,w}$ of $G_{σP_w,1}$ is:

μ_{σ,w}(dt) = e^{-w}δ_0(dt) + \left(\sum_{j \geq 1} \frac{e^{-w} w^{j}}{j!} \frac{1}{Γ(jσ)} e^{-t} t^{jσ-1}\right) dt

for $t > 0$. The discrete component of $μ_{σ,w}$ does not contribute to the expectation (13). We therefore focus on the absolutely continuous component, whose density can be written as:

\sum_{j \geq 1} \frac{e^{-w} w^{j}}{j!} \frac{1}{Γ(jσ)} e^{-t} t^{jσ-1} = \frac{e^{-(w+t)}}{t} W_{σ,0}(wt^{σ}),

where $W_{σ,τ}(y) := \sum_{j \geq 0} \frac{y^{j}}{j!\, Γ(jσ + τ)}$ is the Wright function (Wright []), here evaluated at $τ = 0$. The $j = 0$ term vanishes because $1/Γ(0) = 0$. Hence:

B_n(w) = \int_{0}^{+\infty} t^{n} \frac{e^{-(w+t)}}{t} W_{σ,0}(wt^{σ}) dt.

(14)

Split the integral as $\int_{0}^{M} + \int_{M}^{+\infty}$ for any $M > 0$. As $n \to +\infty$, the contribution of the latter integral dominates that of the former. Hence $W_{σ,0}$ can be equivalently replaced by its asymptotics as $y \to +\infty$:

W_{σ,0}(y) \sim c(σ) y^{\frac{1}{2(1+σ)}} \exp\{σ^{-1}(σ + 1)(σy)^{\frac{1}{1+σ}}\},

for some constant $c(σ)$ depending solely on $σ$; see Theorem 2 in Wright []. Hence:

\begin{matrix} B_n(w) & \sim c(σ) \int_{0}^{+\infty} t^{n-1} e^{-(w+t)} (wt^{σ})^{\frac{1}{2(1+σ)}} \exp\{\frac{σ+1}{σ}(σwt^{σ})^{\frac{1}{1+σ}}\} dt \\ & = c(σ) e^{-w} w^{\frac{1}{2(1+σ)}} \int_{0}^{+\infty} t^{n + \frac{σ}{2(1+σ)} - 1} \exp\{A(w, σ) t^{\frac{σ}{1+σ}} - t\} dt, \end{matrix}

where $A(w, σ) := \frac{σ+1}{σ}(σw)^{\frac{1}{1+σ}}$. The problem thus reduces to an integral whose asymptotic behaviour is described in Berg []. From Equation (31) of Berg [] and the Stirling approximation, we have:

B_n(w) \sim c(σ) e^{-w} w^{\frac{1}{2(1+σ)}} Γ(n) \exp\{A(w, σ) n^{\frac{σ}{1+σ}}\}.

(15)
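The Wright-function asymptotics used to reach (15) can be sanity-checked in the one case where the constant is easy to obtain. For $σ = 1$ we have $W_{1,0}(y) = \sqrt{y}\, I_1(2\sqrt{y})$, where $I_1$ is the modified Bessel function. Together with $I_1(x) \sim e^{x}/\sqrt{2πx}$, this gives $c(1) = 1/(2\sqrt{π})$. This constant is our own computation, not taken from the text. The sketch below sums the Wright series in log space and shows that its ratio to the asymptotic expression tends to 1.

```python
import math


def log_wright0(y, sigma, terms=400):
    """log W_{sigma,0}(y) for y > 0, summing the series over j >= 1 in log space."""
    logs = [j * math.log(y) - math.lgamma(j + 1) - math.lgamma(j * sigma)
            for j in range(1, terms + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_wright0_asymptotic(y, sigma, c):
    """log of c y**(1/(2(1+sigma))) exp{(sigma+1)/sigma (sigma y)**(1/(1+sigma))}."""
    return (math.log(c) + math.log(y) / (2 * (1 + sigma))
            + (sigma + 1) / sigma * (sigma * y) ** (1 / (1 + sigma)))


# For sigma = 1: W_{1,0}(y) = sqrt(y) I_1(2 sqrt(y)) and I_1(x) ~ e**x / sqrt(2 pi x),
# which gives c(1) = 1 / (2 sqrt(pi)).
c1 = 1 / (2 * math.sqrt(math.pi))
ratios = {y: math.exp(log_wright0(y, 1.0) - log_wright0_asymptotic(y, 1.0, c1))
          for y in (100.0, 400.0, 1600.0)}
print(ratios)  # the ratios increase towards 1 as y grows
```

For general $σ$ the constant $c(σ)$ is not needed in the argument, because it cancels in the ratio $B_n(sζ)/B_n(ζ)$.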

This asymptotic expansion leads directly to (8). Indeed, let $G(\cdot; -σ, -ζ, n)$ be the probability generating function of the random variable $K(-σ, -ζ, n)$, which reads $G(s; -σ, -ζ, n) = B_n(sζ)/B_n(ζ)$ for $s > 0$. Then, by (15), for any fixed $s > 0$ we write:

G(s; -σ, -ζ, n) \sim e^{-ζ(s-1)} s^{\frac{1}{2(1+σ)}} \exp\left\{n^{\frac{σ}{1+σ}} \frac{σ+1}{σ} (σζ)^{\frac{1}{1+σ}} \left[s^{\frac{1}{1+σ}} - 1\right]\right\}.

(16)

Since (15) holds uniformly in $w$ on compact sets, we may evaluate $G(s; -σ, -ζ, n)$ at a point $s_n$ and extend the validity of (16) with $s_n$ in place of $s$, as long as $\{s_n\}_{n \geq 1}$ varies in a compact subset of $[0, +\infty)$. We choose $s_n = s^{β(n)}$ with $β(n) = n^{-\frac{σ}{1+σ}}$, and note that $β(n) \to 0$ as $n \to +\infty$. Thus, $s_n \simeq 1 + β(n)\log s \to 1$, and we have:

n^{\frac{σ}{1+σ}} \frac{σ+1}{σ} (σζ)^{\frac{1}{1+σ}} \left[s_n^{\frac{1}{1+σ}} - 1\right] \to \frac{(σζ)^{\frac{1}{1+σ}}}{σ} \log s.

Consequently, $E[s^{K(-σ,-ζ,n)/n^{σ/(1+σ)}}] = G(s_n; -σ, -ζ, n) \to s^{(σζ)^{1/(1+σ)}/σ}$. This implies that $K(-σ, -ζ, n)/n^{\frac{σ}{1+σ}} \overset{w}{⟶} \frac{(σζ)^{\frac{1}{1+σ}}}{σ}$ as $n \to +\infty$, which is (8) with $α = -σ$ and $z = -ζ$. This completes the proof of (8).

For the proof of (9), again let $α = -σ$ with $σ > 0$ and $z = -ζ$ with $ζ > 0$. As in the proof of (7), we use the falling factorial moments of $M_r(-σ, -ζ, n)$, that is:

E[(M_r(-σ, -ζ, n))_{[s]}] = (-1)^{rs} (n)_{[rs]} \binom{-σ}{r}^{s} ζ^{s} \frac{\sum_{j=0}^{n-rs} C(n - rs, j; -σ)(-ζ)^{j}}{\sum_{j=0}^{n} C(n, j; -σ)(-ζ)^{j}}.

We now apply the same large $n$ arguments as in the proof of (7). In particular, by the large $n$ asymptotics (15), as $n \to +\infty$:

\frac{\sum_{j=0}^{n-rs} C(n - rs, j; -σ)(-ζ)^{j}}{\sum_{j=0}^{n} C(n, j; -σ)(-ζ)^{j}} = \frac{B_{n-rs}(ζ)}{B_n(ζ)} \sim n^{-rs},

since $Γ(n - rs)/Γ(n) \sim n^{-rs}$ and $(n - rs)^{\frac{σ}{1+σ}} - n^{\frac{σ}{1+σ}} \to 0$. Then:

\lim_{n \to +\infty} E[(M_r(-σ, -ζ, n))_{[s]}] = (-1)^{rs} \binom{-σ}{r}^{s} ζ^{s} = \left[\frac{-σ(1+σ)_{(r-1)}}{r!}(-ζ)\right]^{s}

follows from the fact that $(n)_{[rs]} \sim n^{rs}$ as $n \to +\infty$. The proof of the large $n$ asymptotics (9) is completed by recalling that the falling factorial moment of order $s$ of $P_λ$ is $E[(P_λ)_{[s]}] = λ^{s}$. □
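Statements (6) and (8) of Theorem 1 can be illustrated numerically with a short Python sketch. It computes the exact law of $K(α, z, n)$, which is proportional to $C(n, k; α) z^k$ by the probability generating function above. The sketch assumes, as a standard fact not stated in the text, the triangular recurrence $C(m + 1, k; α) = αC(m, k - 1; α) + (m - kα)C(m, k; α)$. Case (i) is compared with $1 + P_z$ in total variation. Case (ii) is compared with the limiting constant in (8) through the mean of the rescaled block count. Names and parameter values are illustrative.

```python
import math


def k_distribution(alpha, z, n, kmax):
    """Pr[K(alpha, z, n) = k] for k = 0, ..., kmax under the NB-CPSM (5).

    Pr[K = k] is proportional to C(n, k; alpha) z**k. The weights are propagated with
    the recurrence C(m+1, k; alpha) = alpha C(m, k-1; alpha) + (m - k alpha) C(m, k; alpha)
    and renormalised at each step. Entries with k > kmax are dropped, which is harmless
    when Pr[K > kmax] is negligible.
    """
    row = [1.0] + [0.0] * kmax  # row m = 0: C(0, 0; alpha) = 1
    for m in range(n):
        new = [0.0] * (kmax + 1)
        for k in range(1, min(m + 1, kmax) + 1):
            new[k] = alpha * z * row[k - 1] + (m - k * alpha) * row[k]
        total = sum(new)
        row = [w / total for w in new]
    return row


def poisson_pmf(lam, k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


n = 2000

# Statement (6): alpha in (0, 1), z > 0; compare K(alpha, z, n) with 1 + P_z.
alpha, z = 0.5, 1.0
p = k_distribution(alpha, z, n, kmax=60)
target = [0.0] + [poisson_pmf(z, k - 1) for k in range(1, 61)]
tv = 0.5 * sum(abs(a - b) for a, b in zip(p, target))

# Statement (8): alpha < 0, z < 0; K / n**(-alpha/(1-alpha)) should be near its limit.
alpha2, z2 = -1.0, -1.0
p2 = k_distribution(alpha2, z2, n, kmax=300)
scaled_mean = sum(k * pk for k, pk in enumerate(p2)) / n ** (-alpha2 / (1 - alpha2))
limit = (alpha2 * z2) ** (1 / (1 - alpha2)) / (-alpha2)
print(tv, scaled_mean, limit)
```

The contrast between the two regimes is visible in the output. For $α \in (0, 1)$ the block count stays bounded in distribution. For $α < 0$ it grows like $n^{-α/(1-α)}$, here $\sqrt{n}$.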

In the rest of the section, we use the NB-CPSM (5) to introduce a compound Poisson perspective of the EP-SM. In particular, our result extends the well-known compound Poisson perspective of the E-SM to the EP-SM for either $α \in (0, 1)$ or $α < 0$.

For $α \in (0, 1)$, let $f_α$ denote the density function of a positive $α$-stable random variable $X_α$, that is, a random variable for which $E[\exp\{-tX_α\}] = \exp\{-t^{α}\}$ for any $t > 0$. For $α \in (0, 1)$ and $θ > -α$, let $S_{α,θ}$ be a positive random variable with density function:

f_{S_{α,θ}}(s) = \frac{Γ(θ + 1)}{αΓ(θ/α + 1)} s^{\frac{θ-1}{α} - 1} f_α(s^{-\frac{1}{α}}).

That is, $S_{α,θ}$ is a scaled Mittag–Leffler random variable (Chapter 1, Pitman []). Let $G_{a,b}$ be a Gamma random variable with shape parameter $a > 0$ and scale parameter $b > 0$, independent of $S_{α,θ}$. Then, for $α \in (0, 1)$, $θ > -α$ and $n \in \mathbb{N}$, let:

\bar{X}_{α,θ,n} \overset{d}{=} G_{θ+n,1}^{α} S_{α,θ}.

(17)
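For the special case $θ = 0$, where $S_{α,0} = X_α^{-α}$, the random variable in (17) can be simulated directly. The sketch below draws $X_α$ through Kanter's representation of the positive $α$-stable law. That representation is a well-known sampler, and we use it here as an assumption rather than as part of the paper. The sketch then checks the Monte Carlo means against $E[S_{α,0}] = 1/Γ(1+α)$ and $E[G_{n,1}^{α}] = Γ(n+α)/Γ(n)$. General $θ \ne 0$ would require a polynomially tilted stable law and is not covered. The seed and sample size are arbitrary choices.

```python
import math
import random


def positive_stable(alpha, rng):
    """Draw X_alpha with E[exp(-t X_alpha)] = exp(-t**alpha), 0 < alpha < 1 (Kanter's representation)."""
    u = 0.0
    while u == 0.0:                  # U uniform on (0, pi)
        u = math.pi * rng.random()
    e = 0.0
    while e == 0.0:                  # E standard exponential
        e = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.sin(u) ** (1 / alpha)
            * (math.sin((1 - alpha) * u) / e) ** ((1 - alpha) / alpha))


def x_bar_theta0(alpha, n, rng):
    """One draw of bar X_{alpha,0,n} = G_{n,1}**alpha * S_{alpha,0}, with S_{alpha,0} = X_alpha**(-alpha)."""
    s = positive_stable(alpha, rng) ** (-alpha)
    return rng.gammavariate(n, 1.0) ** alpha * s


rng = random.Random(2021)
alpha, n, draws = 0.5, 5, 100_000
mean_s = sum(positive_stable(alpha, rng) ** (-alpha) for _ in range(draws)) / draws
mean_x = sum(x_bar_theta0(alpha, n, rng) for _ in range(draws)) / draws
exact_s = 1 / math.gamma(1 + alpha)                            # E[S_{alpha,0}]
exact_x = math.gamma(n + alpha) / math.gamma(n) * exact_s      # E[bar X_{alpha,0,n}]
print(mean_s, exact_s, mean_x, exact_x)
```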

Finally, for $α < 0$, $z < 0$ and $n \in \mathbb{N}$, let $\tilde{X}_{α,z,n}$ be a random variable on $\mathbb{N}$ whose distribution is a tilted Poisson distribution arising from the identity (12). Precisely, for any $x \in \mathbb{N}$:

\Pr[\tilde{X}_{α,z,n} = x] = \frac{1}{\sum_{j=1}^{n} C(n, j; α) z^{j}} \frac{e^{z}(-z)^{x} Γ(-xα + n)}{x!\, Γ(-xα)}.

(18)

The next theorem uses $\bar{X}_{α,θ,n}$ and $\tilde{X}_{α,z,n}$ to establish an interplay between the NB-CPSM (5) and the EP-SM (1), which extends the compound Poisson perspective of the E-SM.
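That (18) defines a probability distribution is equivalent to the identity (12), written with $α = -σ$ and $z = -ζ$. The Python sketch below checks this numerically for one illustrative choice of $(α, z, n)$. It computes the normaliser $\sum_{j} C(n, j; α) z^{j}$ exactly from the definition of the generalised factorial coefficients and sums the probabilities (18) over a truncated range of $x$. The truncation point 120 is an arbitrary choice that leaves a negligible tail.

```python
import math
from fractions import Fraction


def rising(x, k):
    """Ascending factorial (x)_(k), exact when x is a Fraction."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def gen_factorial_coeff(n, j, alpha):
    """C(n, j; alpha) = (1/j!) sum_i (-1)**i binom(j, i) (-i alpha)_(n)."""
    return sum((-1) ** i * math.comb(j, i) * rising(-i * alpha, n)
               for i in range(j + 1)) / math.factorial(j)


def tilted_poisson_pmf(x, alpha, z, n, norm):
    """Pr[tilde X_{alpha,z,n} = x] in (18), using Gamma(-x alpha + n) / Gamma(-x alpha) = (-x alpha)_(n)."""
    return math.exp(z) * (-z) ** x * float(rising(-x * alpha, n)) / math.factorial(x) / norm


alpha, z, n = Fraction(-1, 2), -1.5, 8
norm = sum(float(gen_factorial_coeff(n, j, alpha)) * z ** j for j in range(1, n + 1))
probs = [tilted_poisson_pmf(x, alpha, z, n, norm) for x in range(0, 121)]
print(norm, sum(probs))  # the probabilities add up to 1 up to rounding
```

The value $x = 0$ has probability zero, because $(0)_{(n)} = 0$. This matches $1/Γ(0) = 0$ in (18).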

Theorem 2.

Let $(M_{1,n}(α, θ), \dots, M_{n,n}(α, θ))$ be distributed as the EP-SM (1), and let $\bar{X}_{α,θ,n}$ be the random variable defined in (17), independent of $(M_{1,n}(α, θ), \dots, M_{n,n}(α, θ))$. Moreover, let $(M_1(α, z, n), \dots, M_n(α, z, n))$ be distributed as the NB-CPSM (5), and let $\tilde{X}_{α,z,n}$ be the random variable defined in (18), independent of $(M_1(α, z, n), \dots, M_n(α, z, n))$. Then:

(i): for $α \in (0, 1)$ and $θ > -α$:

$(M_{1,n}(α, θ), \dots, M_{n,n}(α, θ)) \overset{d}{=} (M_{1}(α, \bar{X}_{α,θ,n}, n), \dots, M_{n}(α, \bar{X}_{α,θ,n}, n));$

(ii): for $α < 0$ and $z < 0$:

$(M_{1}(α, z, n), \dots, M_{n}(α, z, n)) \overset{d}{=} (M_{1,n}(α, -\tilde{X}_{α,z,n}α), \dots, M_{n,n}(α, -\tilde{X}_{α,z,n}α)).$

Proof.

The proof of statement (i) relies on the classical integral representation of the Gamma function. We apply the integral representation of $Γ(θ/α + k)$ to the EP-SM (1), for $x_1, \dots, x_n \in \{0, \dots, n\}$ with $\sum_{i=1}^{n} x_i = k$ and $\sum_{i=1}^{n} i x_i = n$. This gives:

\begin{matrix} \Pr[(M_{1,n}(α, θ), \dots, M_{n,n}(α, θ)) = (x_1, \dots, x_n)] \\ = n! \frac{α^{k}}{Γ(θ + n)} \prod_{i=1}^{n} \frac{((1-α)_{(i-1)}/i!)^{x_i}}{x_i!} \frac{Γ(θ + 1)}{αΓ(θ/α + 1)} \\ \times \int_{0}^{+\infty} z^{θ/α - 1} e^{-z} \frac{z^{k}}{\sum_{j=1}^{n} C(n, j; α) z^{j}} \left(\sum_{j=1}^{n} C(n, j; α) z^{j}\right) dz. \end{matrix}

By Equation (13) of Favaro et al. [], this equals:

\begin{matrix} n! \frac{α^{k}}{Γ(θ + n)} \prod_{i=1}^{n} \frac{((1-α)_{(i-1)}/i!)^{x_i}}{x_i!} \frac{Γ(θ + 1)}{αΓ(θ/α + 1)} \\ \times \int_{0}^{+\infty} z^{θ/α - 1} e^{-z} \frac{z^{k}}{\sum_{j=1}^{n} C(n, j; α) z^{j}} \left(e^{z} z^{n/α} \int_{0}^{+\infty} y^{n} e^{-yz^{1/α}} f_α(y) dy\right) dz \\ = \int_{0}^{+\infty} \frac{n!}{\sum_{j=0}^{n} C(n, j; α) z^{j}} \prod_{i=1}^{n} \frac{(z\, α(1-α)_{(i-1)}/i!)^{x_i}}{x_i!} \\ \times \frac{Γ(θ + 1)}{αΓ(θ + n)Γ(θ/α + 1)} z^{θ/α + n/α - 1} \int_{0}^{+\infty} y^{n} e^{-yz^{1/α}} f_α(y)\, dy\, dz \\ = \int_{0}^{+\infty} \Pr[(M_1(α, z, n), \dots, M_n(α, z, n)) = (x_1, \dots, x_n)] \\ \times \frac{Γ(θ + 1)}{αΓ(θ + n)Γ(θ/α + 1)} z^{θ/α + n/α - 1} \int_{0}^{+\infty} y^{n} e^{-yz^{1/α}} f_α(y)\, dy\, dz. \end{matrix}

By the distribution of $\bar{X}_{α,θ,n}$, the factor multiplying the NB-CPSM probability in the last integrand is the density function $f_{\bar{X}_{α,θ,n}}(z)$ of $\bar{X}_{α,θ,n}$. Therefore:

\Pr[(M_{1,n}(α, θ), \dots, M_{n,n}(α, θ)) = (x_1, \dots, x_n)] = \int_{0}^{+\infty} \Pr[(M_1(α, z, n), \dots, M_n(α, z, n)) = (x_1, \dots, x_n)] f_{\bar{X}_{α,θ,n}}(z) dz.

This completes the proof of (i).

For the proof of statement (ii), take any $α < 0$, $m \in \mathbb{N}$, $k \leq m$ and $n \in \mathbb{N}$, and define the function $m \mapsto A(m; k, α, n) = \frac{m!}{(m-k)!} \frac{Γ(-mα)}{Γ(-mα + n)}$. By (18) and the exponential series $\sum_{m \geq k} (-z)^{m-k}/(m-k)! = e^{-z}$, the following identity holds:

\frac{(-z)^{k}}{\sum_{j=1}^{n} C(n, j; α) z^{j}} = \sum_{m \geq k} A(m; k, α, n) \Pr[\tilde{X}_{α,z,n} = m].

(19)

We apply (19) to the NB-CPSM (5), for $x_1, \dots, x_n \in \{0, \dots, n\}$ with $\sum_{i=1}^{n} x_i = k$ and $\sum_{i=1}^{n} i x_i = n$. This gives:

\begin{matrix} \Pr[(M_1(α, z, n), \dots, M_n(α, z, n)) = (x_1, \dots, x_n)] \\ = \sum_{m \geq k} n! (-1)^{k} A(m; k, α, n) \Pr[\tilde{X}_{α,z,n} = m] \prod_{i=1}^{n} \frac{(α(1-α)_{(i-1)}/i!)^{x_i}}{x_i!} \\ = \sum_{m \geq k} n! (-1)^{k} \frac{m!}{(m-k)!} \frac{Γ(-mα)}{Γ(-mα + n)} \Pr[\tilde{X}_{α,z,n} = m] \prod_{i=1}^{n} \frac{(α(1-α)_{(i-1)}/i!)^{x_i}}{x_i!} \\ = \sum_{m \geq k} n! \frac{(\frac{-mα}{α})_{(k)}}{(-mα)_{(n)}} \prod_{i=1}^{n} \frac{(α(1-α)_{(i-1)}/i!)^{x_i}}{x_i!} \Pr[\tilde{X}_{α,z,n} = m] \\ = \sum_{m \geq k} \Pr[(M_{1,n}(α, -mα), \dots, M_{n,n}(α, -mα)) = (x_1, \dots, x_n)] \Pr[\tilde{X}_{α,z,n} = m]. \end{matrix}

This completes the proof of (ii). □
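Statement (ii) of Theorem 2 can be verified numerically for a small sample size. The Python sketch below evaluates the NB-CPSM probability (5) for every multiplicity vector with $n = 5$. It compares each value with the mixture $\sum_{m \ge 1} \Pr[\text{EP-SM}(α, -mα) = x]\,\Pr[\tilde{X}_{α,z,n} = m]$, truncated at $m = 80$. The parameter values are illustrative, and all helper names are hypothetical.

```python
import math
from fractions import Fraction


def multiplicity_vectors(n):
    """Yield every tuple (x_1, ..., x_n) of non-negative integers with sum_i i * x_i == n."""
    def build(i, remaining):
        if i == 0:
            if remaining == 0:
                yield ()
            return
        for x_i in range(remaining // i + 1):
            for head in build(i - 1, remaining - i * x_i):
                yield head + (x_i,)
    yield from build(n, n)


def rising(x, k):
    """Ascending factorial (x)_(k), exact when x is a Fraction."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out


def gen_factorial_coeff(n, j, alpha):
    return sum((-1) ** i * math.comb(j, i) * rising(-i * alpha, n)
               for i in range(j + 1)) / math.factorial(j)


def weight(alpha, i):
    """alpha (1 - alpha)_(i - 1) / i!, the per-block weight shared by (1) and (5)."""
    return alpha * rising(1 - alpha, i - 1) / math.factorial(i)


def ep_sm_pmf(x, alpha, theta):
    """EP-SM (1): n! (theta/alpha)_(k) / (theta)_(n) * prod_i weight(alpha, i)**x_i / x_i!."""
    n, k = len(x), sum(x)
    prob = math.factorial(n) * rising(theta / alpha, k) / rising(theta, n)
    for i, x_i in enumerate(x, start=1):
        prob *= weight(alpha, i) ** x_i / math.factorial(x_i)
    return prob


def nb_cpsm_pmf(x, alpha, z):
    """NB-CPSM (5) for a float z."""
    n = len(x)
    norm = sum(float(gen_factorial_coeff(n, j, alpha)) * z ** j for j in range(n + 1))
    prob = math.factorial(n) / norm
    for i, x_i in enumerate(x, start=1):
        prob *= (z * float(weight(alpha, i))) ** x_i / math.factorial(x_i)
    return prob


def mixture_pmf(x, alpha, z, m_max=80):
    """Right-hand side of Theorem 2 (ii), with Pr[tilde X = m] taken from (18)."""
    n = len(x)
    norm = sum(float(gen_factorial_coeff(n, j, alpha)) * z ** j for j in range(1, n + 1))
    total = 0.0
    for m in range(1, m_max + 1):
        p_m = math.exp(z) * (-z) ** m * float(rising(-m * alpha, n)) / math.factorial(m) / norm
        total += float(ep_sm_pmf(x, alpha, -m * alpha)) * p_m
    return total


alpha, z, n = Fraction(-1, 2), -1.5, 5
gap = max(abs(nb_cpsm_pmf(x, alpha, z) - mixture_pmf(x, alpha, z))
          for x in multiplicity_vectors(n))
print(gap)  # at rounding-error level
```

Values $m < k$ contribute nothing, because $((-mα)/α)_{(k)} = (-m)_{(k)}$ vanishes. This mirrors the restriction $m \geq k$ in the proof.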

Theorem 2 presents a compound Poisson perspective of the EP-SM in terms of the NB-CPSM, thus extending the well-known compound Poisson perspective of the E-SM in terms of the LS-CPSM.

Statement (i) of Theorem 2 shows that for $α \in (0, 1)$ and $θ > -α$, the EP-SM admits a representation in terms of the NB-CPSM with $α \in (0, 1)$ and $z > 0$, where the randomisation acts on the parameter $z$ with respect to the distribution (17). Precisely, this is a compound mixed Poisson sampling model: a compound sampling model in which the distribution of the random number $K$ of distinct types in the population is a mixture of Poisson distributions with respect to the law of $\bar{X}_{α,θ,n}$.

Statement (ii) of Theorem 2 shows that for $α < 0$ and $z < 0$, the NB-CPSM admits a representation in terms of a randomised EP-SM with $α < 0$ and $θ = -mα$ for some $m \in \mathbb{N}$, where the randomisation acts on the parameter $m$ with respect to the distribution (18).

Remark 1.

The randomisation procedure introduced in Theorem 2 is reminiscent of a class of Gibbs-type sampling models introduced in Gnedin and Pitman []. This class starts from the EP-SM with $α < 0$ and $θ = -mα$, for some $m \in \mathbb{N}$, and then assumes that the parameter $m$ follows an arbitrary distribution on $\mathbb{N}$; see, for example, Theorem 12 of Gnedin and Pitman [] and Gnedin []. However, unlike the definition of Gnedin and Pitman [], in our context the distribution of $m$ depends on the sample size $n$.

For $α \in (0, 1)$ and $θ > -α$, Pitman [] first studied the large $n$ asymptotic behaviour of $K_n(α, θ)$; see also Gnedin and Pitman [] and the references therein. Let $\overset{a.s.}{⟶}$ denote almost sure convergence, and let $S_{α,θ}$ be the scaled Mittag–Leffler random variable defined above. Theorem 3.8 of Pitman [] uses a martingale convergence argument to show that:

\frac{K_n(α, θ)}{n^{α}} \overset{a.s.}{⟶} S_{α,θ}

(20)

as $n \to +\infty$. The random variable $S_{α,θ}$ is referred to as Pitman’s $α$-diversity. For $α < 0$ and $θ = -mα$ for some $m \in \mathbb{N}$, the large $n$ asymptotic behaviour of $K_n(α, θ)$ is trivial, that is:

K_n(α, θ) \overset{w}{⟶} m

(21)

as $n \to +\infty$.

We refer to Dolera and Favaro [,] for Berry–Esseen type refinements of (20), and to Favaro et al. [,] and Favaro and James [] for generalisations of (20) with applications to Bayesian nonparametrics; see also Pitman [] (Chapter 4) for a general treatment of (20). In view of Theorem 2, it is natural to ask whether there is an interplay between Theorem 1 and the large $n$ asymptotic behaviours (20) and (21). In what follows, we show two things:
(i) (20), with almost sure convergence replaced by convergence in distribution, arises by combining (6) with (i) of Theorem 2;
(ii) (8) arises by combining (21) with (ii) of Theorem 2.

This provides an alternative proof of Pitman’s $α$-diversity.
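A simple numerical companion to (20) compares the exact mean $E[K_n(α, θ)]$ with $n^{α}E[S_{α,θ}]$. The Python sketch below computes $E[K_n(α, θ)]$ recursively from the standard sequential (Chinese restaurant) description of the EP-SM. In that description, given $k$ blocks after $i$ draws, the next draw opens a new block with probability $(θ + αk)/(θ + i)$. The sketch then compares the result with the known mean $E[S_{α,θ}] = Γ(θ+1)/(αΓ(θ+α))$. Both facts are taken from the literature rather than from this paper, and convergence of means is a consequence of, not a substitute for, (20).

```python
import math


def mean_blocks(alpha, theta, n):
    """E[K_n(alpha, theta)] via the EP-SM sequential rule.

    Given k blocks after i draws, draw i + 1 opens a new block with probability
    (theta + alpha k) / (theta + i), hence
        E[K_{i+1}] = E[K_i] (1 + alpha / (theta + i)) + theta / (theta + i).
    """
    mean = 1.0  # K_1 = 1
    for i in range(1, n):
        mean = mean * (1 + alpha / (theta + i)) + theta / (theta + i)
    return mean


def mean_alpha_diversity(alpha, theta):
    """E[S_{alpha,theta}] = Gamma(theta + 1) / (alpha Gamma(theta + alpha))."""
    return math.gamma(theta + 1) / (alpha * math.gamma(theta + alpha))


alpha, theta, n = 0.5, 1.0, 200_000
ratio = mean_blocks(alpha, theta, n) / n ** alpha
print(ratio, mean_alpha_diversity(alpha, theta))  # close for large n
```

The residual gap is of order $n^{-α}$ and comes from the additive constant in $E[K_n(α, θ)]$. This is why a large $n$ is used here.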

Theorem 3.

Let $K_n(α, θ)$ and $K(α, z, n)$ denote the number of blocks under the EP-SM and the NB-CPSM, respectively. As $n \to +\infty$:

(i): For $α \in (0, 1)$ and $θ > -α$:

$\frac{K_{n}(α, θ)}{n^{α}} \overset{w}{⟶} S_{α,θ}.$

(22)

(ii): For $α < 0$ and $z < 0$:

$\frac{K(α, z, n)}{n^{\frac{-α}{1-α}}} \overset{w}{⟶} \frac{(αz)^{\frac{1}{1-α}}}{-α}.$

(23)

Proof.

We show that (22) arises by combining (6) with statement (i) of Theorem 2. For any pair of

N

-valued random variables U and V, let

d_{T V} (U; V)

be the total variation distance between the distribution of U and the distribution of V. Furthermore, let

P_{c}

denote a Poisson random variable with parameter

c > 0

. For any

α \in (0, 1)

and

t > 0

, we show that as

n \to + \infty

:

d_{T V} (K (α, t n^{α}, n); 1 + P_{t n^{α}}) \to 0 .

(24)

This implies (22). The proof of (24) requires a careful analysis of the probability generating function of

K (α, t n^{α}, n)

. In particular, let us define

ω (t; n, α) : = t n^{α} + \frac{t M_{α}^{'} (t)}{M_{α} (t)}

, where

M_{α} (t) : = \frac{1}{π} \sum_{m = 1}^{\infty} \frac{{(- t)}^{m - 1}}{(m - 1)!} Γ (α m) sin (π α m)

is the Wright–Mainardi function (Mainardi et al. []). Then, we apply Corollary 2 of Dolera and Favaro [] to conclude that

d_{T V} (K (α, t n^{α}, n); 1 + P_{ω (t; n, α)}) \to 0

as

n \to + \infty

. Finally, we applied inequality (2.2) in Adell and Jodrá [] to obtain:

\begin{matrix} d_{T V} (1 + P_{t n^{α}}; 1 + P_{ω (t; n, α)}) & = d_{T V} (P_{t n^{α}}; P_{ω (t; n, α)}) \leq \frac{t M_{α}^{'} (t)}{M_{α} (t)} min \{1, \frac{\sqrt{(2 / e)}}{\sqrt{ω (t; n, α)} + \sqrt{t n^{α}}}\} \end{matrix}

Since the prefactor in front of the minimum does not depend on $n$, while the minimum vanishes as $n \to +\infty$, we have $d_{TV}(1 + P_{t n^{α}}; 1 + P_{ω(t; n, α)}) \to 0$ as $n \to +\infty$, and (24) follows by the triangle inequality for $d_{TV}$.

Now, keeping $α$ and $t$ fixed as above, we show that (24) entails (22). To this aim, we introduce the Kolmogorov distance $d_{K}$ which, for any pair of $\mathbb{R}_{+}$-valued random variables $U$ and $V$, is defined by $d_{K}(U; V) := \sup_{x \geq 0} |\Pr[U \leq x] - \Pr[V \leq x]|$. Since $S_{α, θ}$ has a continuous distribution function, the claim to be proven is equivalent to $d_{K}(K_{n}(α, θ)/n^{α}; S_{α, θ}) \to 0$ as $n \to +\infty$. We exploit statement (i) of Theorem 2, which yields the distributional identity $K_{n}(α, θ) \overset{d}{=} K(α, \bar{X}_{α, θ, n}, n)$. Thus, by the basic properties of the Kolmogorov distance,

$$\begin{aligned} d_{K}(K_{n}(α, θ)/n^{α}; S_{α, θ}) &\leq d_{K}(K_{n}(α, θ); K(α, n^{α} S_{α, θ}, n)) \\ &\quad + d_{K}(K(α, n^{α} S_{α, θ}, n); 1 + P_{n^{α} S_{α, θ}}) \\ &\quad + d_{K}([1 + P_{n^{α} S_{α, θ}}]/n^{α}; S_{α, θ}), \end{aligned} \tag{25}$$

where $\{P_{λ}\}_{λ \geq 0}$ denotes a homogeneous Poisson process with unit rate, independent of $S_{α, θ}$. The desired conclusion follows once we prove that each of the three summands on the right-hand side of (25) goes to zero as $n \to +\infty$. Before proceeding, we recall that $d_{K}(U; V) \leq d_{TV}(U; V)$. Therefore, for the first of these terms, we write

$$d_{K}(K_{n}(α, θ); K(α, n^{α} S_{α, θ}, n)) \leq \frac{1}{2} \sum_{k=1}^{n} \left| C(n, k; α) \frac{Γ(k + θ/α)}{α Γ(θ/α + 1)} \frac{Γ(θ + 1)}{Γ(n + θ)} - \int_{0}^{+\infty} \frac{C(n, k; α) (t n^{α})^{k}}{d_{n}(t)} f_{S_{α, θ}}(t) \, dt \right|$$

with $d_{n}(t) := \sum_{j=1}^{n} C(n, j; α) (t n^{α})^{j}$. Now, let us define $d_{n}^{*}(t) := e^{t n^{α}} (n-1)! \, t^{-1/α} f_{α}(t^{-1/α})$. Accordingly, by the triangle inequality, the right-hand side above is bounded from above by

$$\begin{aligned} &\frac{1}{2} \sum_{k=1}^{n} \left| C(n, k; α) \frac{Γ(k + θ/α)}{α Γ(θ/α + 1)} \frac{Γ(θ + 1)}{Γ(n + θ)} - \int_{0}^{+\infty} \frac{C(n, k; α) (t n^{α})^{k}}{d_{n}^{*}(t)} f_{S_{α, θ}}(t) \, dt \right| \\ &\quad + \frac{1}{2} \int_{0}^{+\infty} \frac{|d_{n}^{*}(t) - d_{n}(t)|}{d_{n}^{*}(t)} f_{S_{α, θ}}(t) \, dt. \end{aligned}$$

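Then, by exploiting the identity

$$\int_{0}^{+\infty} \frac{(t n^{α})^{k}}{d_{n}^{*}(t)} f_{S_{α, θ}}(t) \, dt = \frac{1}{(n-1)!} \frac{Γ(k + θ/α)}{n^{θ}} \frac{Γ(θ + 1)}{α Γ(θ/α + 1)},$$

the $k$-th difference inside the absolute value equals $C(n, k; α) \frac{Γ(k + θ/α) Γ(θ + 1)}{α Γ(θ/α + 1)} \big[ \frac{1}{Γ(n + θ)} - \frac{1}{Γ(n) n^{θ}} \big]$, whose bracketed factor does not depend on $k$. Since the probabilities of $K_{n}(α, θ)$ sum to one over $k$, we can write

$$\sum_{k=1}^{n} \left| C(n, k; α) \frac{Γ(k + θ/α)}{α Γ(θ/α + 1)} \frac{Γ(θ + 1)}{Γ(n + θ)} - \int_{0}^{+\infty} \frac{C(n, k; α) (t n^{α})^{k}}{d_{n}^{*}(t)} f_{S_{α, θ}}(t) \, dt \right| = \left| 1 - \frac{Γ(n + θ)}{Γ(n) n^{θ}} \right|,$$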

which goes to zero as $n \to +\infty$ for any $θ > -α$, by Stirling’s approximation. To show that the integral $\int_{0}^{+\infty} \frac{|d_{n}^{*}(t) - d_{n}(t)|}{d_{n}^{*}(t)} f_{S_{α, θ}}(t) \, dt$ also goes to zero as $n \to +\infty$, we resort to identities (13)–(14) of Dolera and Favaro [], as well as Lemma 3 therein. In particular, let

$Δ: (0, +\infty) \to (0, +\infty)$ denote a suitable continuous function, independent of $n$, such that $Δ(z) = O(1)$ as $z \to 0$ and $Δ(z) f_{α}(1/z) = O(z^{-\infty})$ as $z \to +\infty$, that is, $Δ(z) f_{α}(1/z)$ vanishes faster than any negative power of $z$. Then, we write

$$\begin{aligned} \int_{0}^{+\infty} \frac{|d_{n}^{*}(t) - d_{n}(t)|}{d_{n}^{*}(t)} f_{S_{α, θ}}(t) \, dt &\leq \left| \frac{(n/e)^{n} \sqrt{2 π n}}{n!} - 1 \right| \\ &\quad + \frac{(n/e)^{n} \sqrt{2 π n}}{n!} \, \frac{1}{n} \int_{0}^{+\infty} Δ(t^{1/α}) f_{S_{α, θ}}(t) \, dt. \end{aligned}$$

Since $\int_{0}^{+\infty} Δ(t^{1/α}) f_{S_{α, θ}}(t) \, dt < +\infty$ by Lemma 3 of Dolera and Favaro [], both summands on the above right-hand side go to zero as $n \to +\infty$, again by Stirling’s approximation. Thus, the first summand on the right-hand side of (25) goes to zero as $n \to +\infty$. The second summand on the right-hand side of (25) can be bounded by

$$\int_{0}^{+\infty} d_{TV}(K(α, t n^{α}, n); 1 + P_{t n^{α}}) f_{S_{α, θ}}(t) \, dt.$$

Since the integrand is bounded by $f_{S_{α, θ}}(t)$ and, by (24), converges to zero for every $t > 0$, this quantity goes to zero as $n \to +\infty$ by dominated convergence. Finally, for the third summand on the right-hand side of (25), a conditioning argument reduces the problem to a direct application of the law of large numbers for renewal processes (Section 10.2, Grimmett and Stirzaker []). In particular, this leads to $n^{-α} P_{t n^{α}} \overset{a.s.}{\longrightarrow} t$ for any $t > 0$, which entails that $n^{-α} P_{n^{α} S_{α, θ}} \overset{a.s.}{\longrightarrow} S_{α, θ}$ as $n \to +\infty$. Hence $[1 + P_{n^{α} S_{α, θ}}]/n^{α}$ converges to $S_{α, θ}$ almost surely, and thus in distribution; since $S_{α, θ}$ has a continuous distribution function, the convergence also holds in the Kolmogorov distance. Thus, the third term also goes to zero as $n \to +\infty$, and (22) follows.
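Several of the quantities that the proof of (22) shows to vanish can be checked numerically. The following sketch, assuming $α = 1/2$, $θ = 1/2$ and $t = 1$ for the demonstration, evaluates the Wright–Mainardi function $M_{α}$ and its derivative from the series defining it, forms $ω(t; n, α)$, and compares the exact total variation distance between $P_{t n^{α}}$ and $P_{ω(t; n, α)}$ with the Adell–Jodrá-type bound $|λ - μ| \min\{1, \sqrt{2/e}/(\sqrt{λ} + \sqrt{μ})\}$. It also reports the Kolmogorov distance, which never exceeds the total variation distance, and the gaps $|1 - Γ(n+θ)/(Γ(n) n^{θ})|$ and $|(n/e)^{n}\sqrt{2πn}/n! - 1|$. All function names are hypothetical, and the series is summed naively, which is adequate only for moderate $t$; for $α = 1/2$ the result can be checked against the closed form $M_{1/2}(t) = e^{-t^{2}/4}/\sqrt{π}$.

```python
import math


def m_wright(t: float, alpha: float, terms: int = 200) -> tuple[float, float]:
    """Return (M_alpha(t), M_alpha'(t)) from the truncated series
    M_alpha(t) = (1/pi) * sum_{m>=1} (-t)^(m-1) / (m-1)! * Gamma(alpha*m) * sin(pi*alpha*m).
    Naive summation: adequate for moderate t > 0 and 0 < alpha < 1 only."""
    if not (t > 0 and 0 < alpha < 1):
        raise ValueError("need t > 0 and 0 < alpha < 1")
    value = deriv = 0.0
    for m in range(1, terms + 1):
        c = math.sin(math.pi * alpha * m)
        sign = 1.0 if (m - 1) % 2 == 0 else -1.0  # sign of (-t)^(m-1)
        log_g = math.lgamma(alpha * m)
        # t^(m-1) * Gamma(alpha*m) / (m-1)!, assembled on the log scale to avoid overflow
        value += sign * c * math.exp((m - 1) * math.log(t) + log_g - math.lgamma(m))
        if m >= 2:
            # d/dt [(-t)^(m-1) / (m-1)!] = (-1)^(m-1) * t^(m-2) / (m-2)!
            deriv += sign * c * math.exp((m - 2) * math.log(t) + log_g - math.lgamma(m - 1))
    return value / math.pi, deriv / math.pi


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_distances(lam: float, mu: float) -> tuple[float, float]:
    """(d_TV, d_K) between Poisson(lam) and Poisson(mu), lam, mu > 0, summing the pmfs
    up to a cutoff far in the right tail (the neglected mass is negligible)."""
    cutoff = int(max(lam, mu) + 20.0 * math.sqrt(max(lam, mu)) + 50)
    abs_sum, cdf_gap, d_k = 0.0, 0.0, 0.0
    for k in range(cutoff + 1):
        p, q = poisson_pmf(k, lam), poisson_pmf(k, mu)
        abs_sum += abs(p - q)
        cdf_gap += p - q  # F_lam(k) - F_mu(k)
        d_k = max(d_k, abs(cdf_gap))
    return 0.5 * abs_sum, d_k


def poisson_tv_bound(lam: float, mu: float) -> float:
    """|lam - mu| * min{1, sqrt(2/e) / (sqrt(lam) + sqrt(mu))}."""
    return abs(lam - mu) * min(1.0, math.sqrt(2.0 / math.e) / (math.sqrt(lam) + math.sqrt(mu)))


def gamma_ratio_gap(n: int, theta: float) -> float:
    """|1 - Gamma(n + theta) / (Gamma(n) * n**theta)|."""
    return abs(1.0 - math.exp(math.lgamma(n + theta) - math.lgamma(n) - theta * math.log(n)))


def stirling_gap(n: int) -> float:
    """|(n/e)^n * sqrt(2*pi*n) / n! - 1|."""
    log_ratio = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n) - math.lgamma(n + 1)
    return abs(math.exp(log_ratio) - 1.0)


if __name__ == "__main__":
    alpha, theta, t = 0.5, 0.5, 1.0
    m_val, m_der = m_wright(t, alpha)
    shift = t * m_der / m_val  # omega(t; n, alpha) - t * n**alpha, which does not depend on n
    for n in (10, 100, 1_000, 10_000):
        lam = t * n ** alpha
        omega = lam + shift
        tv, dk = poisson_distances(lam, omega)
        print(f"n={n:>6}  d_TV={tv:.2e}  bound={poisson_tv_bound(lam, omega):.2e}  d_K={dk:.2e}  "
              f"gamma gap={gamma_ratio_gap(n, theta):.2e}  Stirling gap={stirling_gap(n):.2e}")
```

Working on the log scale keeps the series and the Poisson probabilities free of overflow; the alternating series can still lose accuracy through cancellation for large $t$, which this sketch does not address.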

Now, we consider (23), showing that it arises by combining (21) with statement (ii) of Theorem 2. In particular, by an obvious conditioning argument, we have, as $n \to +\infty$,

$$\frac{K_{n}(α, \tilde{X}_{α, z, n} |α|)}{\tilde{X}_{α, z, n}} \overset{a.s.}{\longrightarrow} 1.$$

At this stage, we consider the probability generating function of $\tilde{X}_{α, z, n}$ and immediately obtain $E[s^{\tilde{X}_{α, z, n}}] = B_{n}(-s z)/B_{n}(-z)$ for $n \in \mathbb{N}$ and $s \in [0, 1]$, with the same $B_{n}$ as in (13) and (14). Therefore, the asymptotic expansion already provided in (15) entails

$$\frac{\tilde{X}_{α, z, n}}{n^{\frac{-α}{1-α}}} \overset{w}{\longrightarrow} \frac{(α z)^{\frac{1}{1-α}}}{-α} \tag{26}$$

as $n \to +\infty$. Indeed, (26) follows by applying exactly the same arguments used to prove (8). Now, since statement (ii) of Theorem 2 gives $K(α, z, n) \overset{d}{=} K_{n}(α, \tilde{X}_{α, z, n} |α|)$, we have

$$\frac{K(α, z, n)}{n^{\frac{-α}{1-α}}} \overset{d}{=} \frac{K_{n}(α, \tilde{X}_{α, z, n} |α|)}{\tilde{X}_{α, z, n}} \cdot \frac{\tilde{X}_{α, z, n}}{n^{\frac{-α}{1-α}}},$$

and the claim follows from a direct application of Slutsky’s theorem. This completes the proof. □
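The conditioning step in the proof of (23) uses the fact that, for $α < 0$ and $θ = m|α|$ with $m \in \mathbb{N}$, $K_{n}(α, m|α|)/m \to 1$. In the sequential construction of the EP-SM, a new block is opened with probability $(θ + kα)/(θ + i) = |α|(m - k)/(θ + i)$, which vanishes once $k = m$, so the number of blocks never exceeds $m$. The following simulation sketch (hypothetical function names; an illustration only, not part of the authors' argument) shows $K_{n}(α, m|α|)$ reaching $m$ as $n$ grows.

```python
import random


def ep_block_sizes_negative_alpha(n: int, alpha: float, m: int, rng: random.Random) -> list[int]:
    """Block sizes of an EP-SM partition of {1, ..., n} with alpha < 0 and theta = m * |alpha|.

    Sequential rule: given block sizes n_1, ..., n_k among the first i items, item i + 1 joins
    block j with probability (n_j - alpha) / (theta + i) and opens a new block with probability
    (theta + k * alpha) / (theta + i) = |alpha| * (m - k) / (theta + i).
    """
    if alpha >= 0 or m < 1:
        raise ValueError("need alpha < 0 and a positive integer m")
    theta = -m * alpha
    sizes = [1]  # item 1 opens the first block
    for _ in range(1, n):
        k = len(sizes)
        weights = [s - alpha for s in sizes]
        if k < m:
            weights.append(theta + k * alpha)  # positive as long as k < m
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes


if __name__ == "__main__":
    alpha, m, reps = -1.0, 5, 200
    rng = random.Random(2021)
    for n in (10, 100, 1_000):
        ks = [len(ep_block_sizes_negative_alpha(n, alpha, m, rng)) for _ in range(reps)]
        print(f"n={n:>5}  mean K_n/m={sum(ks) / (reps * m):.3f}  max K_n={max(ks)}")
```

Omitting the new-block option once $k = m$ makes the cap $K_{n} \leq m$ exact in the code, rather than relying on a floating-point weight being exactly zero.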

3. Discussion

The NB-CPSM is a compound Poisson sampling model generalising the popular LS-CPSM. In this paper, we introduced a compound Poisson perspective of the EP-SM in terms of the NB-CPSM, thus extending the well-known compound Poisson perspective of the E-SM in terms of the LS-CPSM. We conjecture that an analogous perspective holds true for the class of α-stable Poisson–Kingman sampling models (Pitman [] and Pitman []), of which the EP-SM is a noteworthy special case. That is, for α ∈ (0, 1), we conjecture that an α-stable Poisson–Kingman sampling model admits a representation as a randomised NB-CPSM with α ∈ (0, 1) and z > 0, where the randomisation acts on z with respect to a scale mixture between a Gamma distribution and a suitable transformation of the Mittag–Leffler distribution. We believe that such a compound Poisson representation would be crucial for deriving Berry–Esseen type refinements of the large n asymptotic behaviour of K_n under α-stable Poisson–Kingman sampling models; see Section 6.1 of Pitman [] and the references therein. This line of research would extend the preliminary works of Dolera and Favaro [,] on Berry–Esseen type theorems under the EP-SM. Work on this, and on the more general settings induced by normalised random measures (Regazzini et al. []) and Poisson–Kingman models (Pitman []), is ongoing.

Author Contributions

Formal analysis, E.D. and S.F.; writing—original draft preparation, E.D. and S.F.; writing—review and editing, E.D. and S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 817257.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the editor and two anonymous referees for all their comments and suggestions which remarkably improved the original version of the present paper. Emanuele Dolera and Stefano Favaro wish to express their enormous gratitude to Eugenio Regazzini, whose fundamental contributions to the theory of Bayesian statistics have always been a great source of inspiration, transmitting enthusiasm and method for the development of their own research. The authors gratefully acknowledge the financial support from the Italian Ministry of Education, University and Research (MIUR), “Dipartimenti di Eccellenza” grant 2018–2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

Perman, M.; Pitman, J.; Yor, M. Size-biased sampling of Poisson point processes and excursions. Probab. Theory Relat. Fields 1992, 92, 21–39.
Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Relat. Fields 1995, 102, 145–158.
Pitman, J.; Yor, M. The two parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 1997, 25, 855–900.
Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1973, 1, 209–230.
Pitman, J. Combinatorial Stochastic Processes; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 2006.
Ewens, W. The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 1972, 3, 87–112.
Crane, H. The ubiquitous Ewens sampling formula. Stat. Sci. 2016, 31, 1–19.
Charalambides, C.A. Distributions of random partitions and their applications. Methodol. Comput. Appl. Probab. 2007, 9, 163–193.
Korwar, R.M.; Hollander, M. Contributions to the theory of Dirichlet processes. Ann. Stat. 1973, 1, 705–711.
Wright, E.M. The asymptotic expansion of the generalized Bessel function. Proc. Lond. Math. Soc. 1935, 38, 257–270.
Charalambides, C.A. Combinatorial Methods in Discrete Distributions; Wiley: Hoboken, NJ, USA, 2005.
Berg, L. Asymptotische Darstellungen für Integrale und Reihen mit Anwendungen. Math. Nachrichten 1958, 17, 101–135.
Favaro, S.; James, L.F. A note on nonparametric inference for species variety with Gibbs-type priors. Electron. J. Stat. 2015, 9, 2884–2902.
Gnedin, A.; Pitman, J. Exchangeable Gibbs partitions and Stirling triangles. J. Math. Sci. 2006, 138, 5674–5685.
Gnedin, A. A species sampling model with finitely many types. Electron. Commun. Probab. 2010, 8, 79–88.
Dolera, E.; Favaro, S. A Berry–Esseen theorem for Pitman’s α-diversity. Ann. Appl. Probab. 2020, 30, 847–869.
Dolera, E.; Favaro, S. Rates of convergence in de Finetti’s representation theorem, and Hausdorff moment problem. Bernoulli 2020, 26, 1294–1322.
Favaro, S.; Lijoi, A.; Prünster, I. Asymptotics for a Bayesian nonparametric estimator of species richness. Bernoulli 2012, 18, 1267–1283.
Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I. Bayesian nonparametric inference for species variety with a two parameter Poisson-Dirichlet process prior. J. R. Stat. Soc. Ser. B 2009, 71, 992–1008.
Mainardi, F.; Mura, A.; Pagnini, G. The M-Wright function in time-fractional diffusion processes: A tutorial survey. Int. J. Differ. Equat. 2010, 104505.
Adell, J.A.; Jodrá, P. Exact Kolmogorov and total variation distances between some familiar discrete distributions. J. Inequalities Appl. 2006, 64307.
Grimmett, G.; Stirzaker, D. Probability and Random Processes; Oxford University Press: Oxford, UK, 2001.
Pitman, J. Poisson-Kingman partitions. In Science and Statistics: A Festschrift for Terry Speed; Goldstein, D.R., Ed.; Institute of Mathematical Statistics: Tachikawa, Japan, 2003.
Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. 2003, 31, 560–585.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
