Abstract
We introduce mixtures of species sampling sequences (mSSS) and discuss how these sequences are related to various types of Bayesian models. As a particular case, we recover species sampling sequences with general (not necessarily diffuse) base measures. These models include some “spike-and-slab” non-parametric priors recently introduced to provide sparsity. Furthermore, we show how mSSS arise while considering hierarchical species sampling random probabilities (e.g., the hierarchical Dirichlet process). Extending previous results, we prove that mSSS are obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. Using this representation, we give an explicit expression of the Exchangeable Partition Probability Function of the partition generated by an mSSS. Some special cases are discussed in detail—in particular, species sampling sequences with general base measures and a mixture of species sampling sequences with Gibbs-type latent partition. Finally, we give explicit expressions of the predictive distributions of an mSSS.
1. Introduction
Discrete random measures have been widely used in Bayesian nonparametrics. Noteworthy examples of such random measures are the Dirichlet process [1], the Pitman–Yor process [2,3], (homogeneous) normalized random measures with independent increments (see, e.g., [4,5,6,7]), Poisson–Kingman random measures [8] and stick-breaking priors [9]. All the previous random measures are of the form
where are i.i.d. random variables taking values in a Polish space with common distribution H, and are random positive weights in , independent of , such that .
With a few exceptions—see, e.g., [1,4,10,11,12,13,14]—the base measure H of a random measure P in (1) is usually assumed to be diffuse, since this simplifies the derivation of various analytical results.
The diffuseness of H is also assumed in the definition of the so-called species sampling sequences [15], i.e., exchangeable sequences whose directing measure is a discrete random probability of type (1). In this case, the diffuseness of H is motivated by the interpretation of species sampling sequences as sequences describing a sampling mechanism for discovering species from an unknown population. In this context, the atoms represent the infinitely many possible distinct species, and the diffuseness of H ensures that there is no redundancy in this description.
On the other hand, from a Bayesian point of view, the diffuseness of H is not always reasonable, and there are situations in which a discrete (or mixed) H is indeed natural. For example, recent interest in species sampling models with a spike-and-slab base measure emerged in [16,17,18,19,20,21] in order to induce sparsity and facilitate variable selection. Other models, which are implicitly related to species sampling sequences with non-diffuse base measures, are mixtures of Dirichlet processes [10] and hierarchical random measures; see, e.g., [22,23,24,25].
The combinatorial structure of species sampling sequences derived from the random measure (1) with general H has recently been studied in [14].
In this paper, we discuss some relevant properties of species sampling sequences with general base measures, as well as some further generalizations, namely mixtures of species sampling sequences with general base measures (mSSS).
An mSSS is an exchangeable sequence whose directing random measure is of type (1), where is a sequence of exchangeable random variables and are random positive weights in with , independent of .
The core of the results that we prove in this paper is that all the mSSS can be obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. We summarize the results of Section 3 in the next statement.
The following are equivalent:
- is an mSSS;
- with probability one , where is a sequence of integer-valued random variables independent of the Zs such that, conditionally on , the are independent and .
- with probability one , where is an exchangeable sequence with the same law as , Π is an exchangeable partition, independent of , obtained by sampling from , and is the index of the block in Π containing n.
The partition obtained from is the so-called paint-box process associated with . In general, this partition, called the latent partition, does not coincide with the partition induced by the . Note that the sequence is also latent, in the sense that it cannot be recovered if only is known. On the other hand, combining the information contained in and in , one obtains complete knowledge of and, in particular, of its clustering behavior. This last observation is essential for the development of all the other results presented in our paper.
The rest of the paper is organized as follows. Section 2 reviews some important results on species sampling models and exchangeable random partitions. Section 3 introduces mixtures of species sampling sequences and discusses how these sequences are related to various types of Bayesian models. In the same section, the stochastic representations for mixtures of species sampling sequences sketched above are proven. In Section 4, we provide an explicit expression of the Exchangeable Partition Probability Function (EPPF) of the partition generated by such sequences. This result is achieved considering two EPPFs arising from a suitable latent partition structure. Some special cases are further detailed. Finally, Section 5 deals with the predictive distributions of mixtures of species sampling sequences.
2. Background Materials
In this section, we briefly review some basic notions of exchangeable random partitions and species sampling models.
2.1. Exchangeable Random Partitions
A partition of is an unordered collection of disjoint non-empty subsets (blocks) of such that . A partition has blocks (with ) and , with , is the number of elements of the block c. We denote by the collection of all partitions of and, given a partition, we list its blocks in ascending order of their smallest element, i.e., in order of their appearance. For instance, we write and not .
A sequence of random partitions, , defined on a common probability space, is called a random partition of if, for each n, the random variable takes values in and, for , the restriction of to is (consistency property).
In order to define an exchangeable random partition, given a permutation of and a partition in , we denote by the partition with blocks for . A random partition of is said to be exchangeable if has the same distribution as for every n and every permutation of . In other words, its law is invariant under the action of all permutations (acting on in the natural way).
The law of any exchangeable random partition on is completely characterized by its Exchangeable Partition Probability Function (EPPF); in other words, there exists a unique symmetric function on the integers such that, for any partition in ,
where k is the number of blocks in . In the following, we shall write to denote an exchangeable partition of with EPPF . Note that an EPPF is indeed a family of symmetric functions defined on . To simplify the notation, we write instead of . Alternatively, one can assume that is a function on . See [26].
Given a sequence of random variables taking values in some measurable space, the random partition induced by X is defined as the random partition formed by the equivalence classes of the random equivalence relation if and only if . One can check that a partition induced by an exchangeable random sequence is an exchangeable random partition.
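To make the notion concrete, the following minimal Python sketch computes the partition induced by a finite sequence, listing the blocks in order of their least element; the function name and the example values are purely illustrative.

```python
def induced_partition(values):
    """Return the partition induced by a sequence: indices (1-based) sharing
    the same value form a block; blocks are listed in order of appearance."""
    blocks, order = {}, []
    for n, v in enumerate(values, start=1):
        if v not in blocks:
            blocks[v] = []
            order.append(v)          # remember first-appearance order
        blocks[v].append(n)
    return [blocks[v] for v in order]

# The sequence (a, b, a, c, b) induces the partition {1,3}, {2,5}, {4}.
print(induced_partition(["a", "b", "a", "c", "b"]))
```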
Recall that, by de Finetti’s theorem, a sequence taking values in a Polish space is exchangeable if and only if the s, given some random probability measure Q on , are conditionally independent with common distribution Q. Moreover, the random probability Q, known as the directing random measure of X, is the almost sure limit (with respect to weak convergence) of the empirical process .
Based on de Finetti’s theorem, Kingman’s correspondence theorem sets up a one-to-one map between the law of an exchangeable random partition on (i.e., its EPPF) and the law of random ranked weights satisfying and (with probability one). To state the theorem, recall that a partition is said to be generated by a (possibly random) , if it is generated by a sequence of integer-valued random variables that are conditionally independent given with conditional distribution
Note that is the magnitude of the so-called “dust” component; indeed, each sampled from this part, i.e., , contributes a singleton to the partition . A consequence is that, if a.s., the partition has no singletons. The partition is sometimes referred to as the -paintbox process; see [27].
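The paintbox mechanism (3) can be simulated directly. The sketch below, with illustrative names and weights, draws a partition of {1, …, n} from a vector of weights; any leftover mass plays the role of the dust component and produces singleton blocks.

```python
import numpy as np

def paintbox_partition(p, n, rng=None):
    """Sample a partition of {1,...,n}: index i is assigned to interval j
    with probability p[j]; the leftover mass 1 - sum(p) (the 'dust')
    turns i into a singleton block."""
    rng = np.random.default_rng() if rng is None else rng
    cum = np.cumsum(np.asarray(p, dtype=float))
    blocks, label_to_block = [], {}
    for i in range(1, n + 1):
        j = int(np.searchsorted(cum, rng.random(), side="right"))
        if j >= len(cum):                  # landed in the dust part
            blocks.append([i])             # brand-new singleton block
        else:                              # landed in interval j
            if j not in label_to_block:
                label_to_block[j] = len(blocks)
                blocks.append([])
            blocks[label_to_block[j]].append(i)
    return blocks

# Weights summing to 0.9: roughly 10% of the indices end up as dust singletons.
print(paintbox_partition([0.5, 0.3, 0.1], 20))
```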
Let . We are now ready to state Kingman’s theorem.
Theorem 1
([28]). Given any exchangeable random partition Π with EPPF , denote by the blocks of the partition rearranged in decreasing order with respect to the number of elements in the blocks of . Then,
for some random taking values in ∇. Moreover, on a possibly enlarged probability space, there is a sequence of integer-valued random variables , conditionally independent given , such that (3) holds and the partition induced by I is equal to Π a.s.
Kingman’s theorem is usually stated in a slightly weaker form (see, e.g., Theorem 2.2 in [26]) and the equality between and is given in law. The previous “almost sure” version can be easily derived by inspecting the proof of Kingman’s theorem given in [29].
A consequence of the previous theorem is that for in (4) defines a bijection between the set of EPPFs and the laws on ∇.
If is proper, i.e., a.s., then Kingman’s correspondence between and the EPPF can be made explicit by
where is the set of all ordered k-tuples of distinct positive integers. See Chapter 2 [26].
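For later reference, the explicit form of this correspondence for proper weights can be written as follows; the symbols below are placeholders for the notation used in the text (Φ denotes the EPPF and the p̃'s the ranked weights), and the formula is the standard one from Chapter 2 of [26].

```latex
\Phi(n_1,\dots,n_k)
  \;=\; \sum_{(j_1,\dots,j_k)\in D_k}
        \mathbb{E}\!\left[\,\prod_{i=1}^{k} \tilde p_{j_i}^{\,n_i}\right],
\qquad
D_k := \{\text{ordered $k$-tuples of distinct positive integers}\}.
```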
Given an EPPF , one deduces the corresponding sequence of predictive distributions, which is the sequence of conditional distributions
when . Starting with , given (with ), the conditional probability of adding a new block (containing ) to is
while the conditional probability of adding to the ℓ-th block of (for ) is
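The predictive rule above can be turned into a simple sequential sampler: given the current block sizes, the probability of joining an existing block (or of opening a new one) is a ratio of EPPF values. The Python sketch below uses the Ewens EPPF of the Dirichlet process (recalled in Section 3.1) as a concrete example; the function names and the default θ = 1 are illustrative, and any consistent EPPF could be plugged in.

```python
import math, random

def ewens_eppf(counts, theta=1.0):
    """Ewens (Dirichlet process) EPPF, used only as a concrete example:
    theta^k * prod_i (n_i - 1)! / (theta)_n, with (theta)_n the rising factorial."""
    n = sum(counts)
    num = theta ** len(counts) * math.prod(math.factorial(c - 1) for c in counts)
    den = math.prod(theta + i for i in range(n))
    return num / den

def grow_partition(n, eppf=ewens_eppf):
    """Sequentially build a partition of {1,...,n}: item m+1 joins the l-th
    existing block with probability eppf(sizes with n_l + 1)/eppf(sizes),
    or opens a new block with the complementary EPPF ratio."""
    sizes = [1]                              # item 1 starts the first block
    for _ in range(1, n):
        base = eppf(sizes)
        probs = [eppf(sizes[:l] + [sizes[l] + 1] + sizes[l + 1:]) / base
                 for l in range(len(sizes))]
        probs.append(eppf(sizes + [1]) / base)          # new block
        choice = random.choices(range(len(probs)), weights=probs)[0]
        if choice == len(sizes):
            sizes.append(1)
        else:
            sizes[choice] += 1
    return sizes                             # block sizes in order of appearance

print(grow_partition(20))
```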
2.2. Species Sampling Models
A species sampling random probability (SSrp) is a random probability of the form
where are i.i.d. random variables taking values in a Polish space with common distribution H, and are random positive weights in , independent of , such that with probability one. These random probability measures are also known as Type III random probability measures; see [30].
Given the SSrp in (8), let be the ranked sequence obtained from rearranging the s in decreasing order. One can always write
where is a suitable random reordering of the original sequence . It is easy to check that are i.i.d. random variables with law H independent of . Hence, H and the EPPF associated via Kingman’s correspondence with completely characterize the law of P, from now on denoted by .
with H diffuse are also characterized as directing random measures of a particular type of exchangeable sequences, known as species sampling sequences. Let be an EPPF and H a diffuse probability measure on a Polish space . An exchangeable sequence taking values in is a species sampling sequence, , if the law of is characterized by the predictive system:
- (PS1) ;
We summarize here some results proven in [15].
Proposition 1
([15]). Let H be a diffuse probability measure; then, an exchangeable sequence is characterized by (PS1)–(PS2) if and only if its directing random measure is an .
As noted in [29], the partition induced by any exchangeable sequence taking values in with directing measure depends only on the sequence , where are the random atoms forming the discrete component of and ordered in such a way that . Combining this observation with the previous proposition, one can see that, when H is diffuse and is an , the partition is equal (a.s.) to (where I is defined as in Kingman’s theorem) and has EPPF . Note that [29] defines the -paintbox process as any random partition where is an exchangeable sequence with directing random measure (9) and H is a diffuse measure.
One can show (see the proof of Proposition 13 in [15]) that an can be obtained by assigning the values of an i.i.d. sequence with distribution H to the classes of an independent exchangeable random partition with EPPF . More formally, for a random partition , let be the random index denoting the block containing n, i.e.,
or equivalently if for some (and hence all) . If is an i.i.d. sequence with law H (diffuse), is an exchangeable partition with , and and are stochastically independent, then
is an . Note that the s appearing in (10) are not the same s of (8), although they have the same law.
It is worth mentioning that the original characterization given in [15] of species sampling sequences is stronger than the one summarized here. Indeed, the original definition of SSS is given using a slightly weaker predictive assumption. For details, see Proposition 13 and the discussion following Proposition 11 in [15].
In summary, when H is diffuse, one can build a species sampling sequence by one of the following equivalent constructions:
- sampling an exchangeable sequence whose directing random measure is a species sampling random probability of the form (8);
- using the predictive system (PS1)–(PS2);
- assigning the values of an i.i.d. sequence with distribution H to the classes of an independent exchangeable random partition, as in (10).
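The third construction can be sketched in a few lines of Python. Here the latent exchangeable partition is grown with the Chinese-restaurant predictive of the Dirichlet process (one of many possible EPPFs), H is taken to be a standard normal as an example of a diffuse base measure, and the function names are illustrative.

```python
import random

def sample_sss(n, theta=1.0):
    """Construction (10): grow a latent exchangeable partition (here via the
    Chinese-restaurant predictive), attach one i.i.d. draw from a diffuse H
    (here N(0,1)) to each block, and set the n-th observation equal to the
    value attached to the block containing n."""
    block_of, sizes = [], []
    for _ in range(n):
        j = random.choices(range(len(sizes) + 1), weights=sizes + [theta])[0]
        if j == len(sizes):
            sizes.append(0)
        sizes[j] += 1
        block_of.append(j)
    values = [random.gauss(0.0, 1.0) for _ in sizes]   # i.i.d. draws from H
    return [values[j] for j in block_of]

print(sample_sss(10, theta=2.0))
```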
3. Mixture of Species Sampling Models
We now discuss some possible generalizations of the notion of species sampling sequences, and we show that the three constructions presented above are no longer equivalent in this setting.
3.1. Definitions and Relation to Other Models
Exchangeable sequences sampled from an with a general base measure, also known as generalized species sampling sequences (), have been introduced and studied in [14,25].
Definition 1
(). is a if it is an exchangeable sequence with directing random measure P, where , H being any measure on (not necessarily diffuse).
Clearly, a with H diffuse is an . On the contrary, if is a with H non-diffuse, (PS1)–(PS2) are no longer true. Moreover, the EPPF of the random partition induced by with H non-diffuse is not . The relation between the partition induced by and has been studied in [14].
In [25], the definition of with H not necessarily diffuse was motivated by an interest in defining the class of the so-called hierarchical species sampling models. If are exchangeable random variables with a directing random measure of hierarchical type, one has that
In order to understand why the general definition of is useful in this context, note that, even if is diffuse and is proper (i.e., the associated with by Kingman’s correspondence are proper), the conditional distribution of given is not an , since is a.s. a purely atomic probability measure on . Moreover, assuming that is proper, we can write
where are conditionally i.i.d. with common distribution , given , and are associated by Kingman’s correspondence with the EPPF . In other words, in this case, are exchangeable with directing random measure , where and are independent and are exchangeable with directing measure .
The previous observation suggests a further generalization of species sampling sequences.
Definition 2
(). We say that is a mixture of species sampling sequences () if it is an exchangeable sequence with directing random measure
where is an exchangeable sequence with directing random measure , a sequence of random weights in ∇ with EPPF such that , and are stochastically independent.
First of all, note that is a particular case of Definition 2, obtained from a deterministic . Moreover, Definition 2 can be seen as a mixture of . Indeed, if is as in Definition 2 and is the directing random measure of , then the conditional distribution of given is a . This motivates the name “mixture of species sampling sequences”.
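The only change with respect to the construction of a species sampling sequence recalled in Section 2.2 is that the block values are now exchangeable rather than i.i.d. from a fixed H. The sketch below produces such exchangeable block values as conditionally i.i.d. draws from a randomly generated discrete distribution (a crude stand-in for a random base measure); all names and parameter values are illustrative. Feeding these values to the block-assignment step of the previous sketch, in place of the i.i.d. normal draws, gives an mSSS in which distinct latent blocks can share the same value.

```python
import random

def exchangeable_block_values(k, gamma=2.0, n_atoms=4):
    """Exchangeable block values: first draw a random discrete distribution
    (n_atoms normal atoms with Dirichlet-distributed weights), then draw the
    k values conditionally i.i.d. from it, so repeated values are possible."""
    atoms = [random.gauss(0.0, 1.0) for _ in range(n_atoms)]
    g = [random.gammavariate(gamma, 1.0) for _ in range(n_atoms)]
    weights = [x / sum(g) for x in g]
    return [random.choices(atoms, weights=weights)[0] for _ in range(k)]

print(exchangeable_block_values(6))   # ties among the six values are likely
```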
It is worth noticing that one can also consider more general mixtures of SSS. The most general mixture one can consider leads to a random probability measure of the form (11), where are exchangeable random variables with directing random measure , is a sequence of random weights in ∇ such that , where , and are not necessarily stochastically independent.
As an example of this more general situation, we describe the so-called mixtures of Dirichlet processes as defined in [10]. Recall that a Dirichlet process is defined as a random probability measure characterized by the system of finite n-dimensional distributions
where is the Dirichlet measure (on the simplex) of parameters and is a finite -additive measure on . It is well known that a Dirichlet process is an for and
where is the rising factorial (or Pochhammer polynomial); see [2,31]. A mixture of Dirichlet processes is defined in [10] as a random probability measure P characterized by the n-dimensional distributions
where, now, is a kernel measure on (in particular, is a finite -additive measure on for every ), is a (Borel) regular space (e.g., a Polish space) and Q is a probability measure on .
Using the fact that a Dirichlet process is the described above, one can prove that any mixture of Dirichlet processes has a representation of the form (11), where and are stochastically dependent. More precisely, the joint law of is characterized by the law of the (augmented) random element
given by the following:
- is a random variable taking values in U with law Q;
- ;
- are exchangeable random variables with directing random measure ;
- is a sequence of random weights in ∇ such that and the conditional distribution of given depends only on . In particular, the (conditional) EPPF associated with the law of given has the form
Note that the marginal EPPF of the , obtained by integrating (14) with respect to the law of , is
Without further assumptions, a mixture of Dirichlet processes is a mixture of SSrp with and possibly dependent. Nevertheless, with this representation at hand, one can easily deduce that if is sampled from a mixture of Dirichlet processes under the additional hypothesis that Q is such that and are independent, then satisfies Definition 2, with and given by (15).
In the rest of the paper, we focus our attention on mSSS for which and are independent.
3.2. Representation Theorems for mSSS
In this section, we give two alternative representations for exchangeable sequences as in Definition 2, which generalize Proposition 1 in [14].
Proposition 2.
An exchangeable sequence is an as in Definition 2 if and only if
where , and are as in Definition 2 , are further conditionally (given ) i.i.d. random variables with conditional distribution , and is a sequence of integer-valued random variables independent of the Zs and , such that, conditionally on , the are independent and (3) holds. All the random elements are defined on a possibly enlarged probability space.
Proof.
Let , where , , are defined as in Definition 2 (mSSS). Set . On a possibly enlarged probability space, let be a sequence of random variables conditionally i.i.d. given with conditional distribution and let be a sequence of integer-valued random variables conditionally independent given with conditional distribution (3) with in place of . One can also assume that and are independent given ; see Lemma A1 in the Appendix A. Set and define
Let us show that the law of given is the same as the law of given . Take n Borel sets and non-zero integer numbers . One has
Conditionally on , the are i.i.d. with law so that
Marginalizing with respect to ,
Recalling that ,
almost surely. Since is Polish, we have proven that, given P, are i.i.d. with common distribution P. In particular, we have proven that given is the same as the law of given . This concludes the proof of the “if part”, since, by the previous argument, any sequence of the form is of type (mSSS). To complete the proof, it remains to conclude the “only if part”. Setting , we have proven that the conditional distribution of given is the same as the conditional distribution of given . At this stage, Lemma A3 in the Appendix A yields that there is such that a.s., i.e., a.s. In addition, ; hence, the are conditionally i.i.d. given and the s are conditionally independent given with the conditional distribution defined by (3). □
Proposition 3.
An exchangeable sequence is an as defined in Definition 2 if and only if
where is an exchangeable sequence with the same law as Z, Π is an exchangeable partition with EPPF , and Π and are independent.
Remark 1.
Note that the s appearing in (16) are not the same s appearing in Definition 2, although they have the same law.
Proof of Proposition 3.
If is mSSS, then, by Proposition 2, we know that . Let be the partition induced by ; then, has EPPF by Kingman’s theorem 1. Denote by (with ) the distinct values of in order of appearance, and set
When , set , where , and define the remaining for accordingly as . Let be integers in , and denote the distinct values in in order of appearance by . Let be measurable sets in ; if , then
where the sum runs over all the non-zero distinct integers different from . Since is a function of I and I and Z are independent, it follows that
where the second equality follows by exchangeability. Summing in ℓ, one obtains
For , the sum is not needed and the same result follows. This shows that is an exchangeable sequence with the same law as Z, and and are independent. To conclude, note that, with probability one, , and hence
Conversely, let us assume that and let be the weights obtained from by (4). Let be the integer-valued random variables appearing in Theorem 1 such that a.s. It follows that and , where the are defined as above for . Setting
with conditionally i.i.d. given with law , independent of everything else. Arguing as above, one can check that the are exchangeable random variables with the same law as , independent of . To conclude, note that, in particular,
The conclusion follows by Proposition 2. □
A simple consequence of the previous proposition is the following.
Corollary 1.
Let be an as defined in Definition 2. For every Borel set in ,
4. Random Partitions Induced by mSSS
Let be the random partition induced by an exchangeable sequence defined as in Definition 2, and let be the partition induced by the corresponding exchangeable sequence (see Proposition 3). Finally, let be the partition with EPPF appearing in Proposition 3. As already observed, if is an i.i.d. sequence from a diffuse H, then a.s. and hence . The same result follows if is exchangeable without ties (see Corollary 2). When is not the trivial partition, it is clear by construction that different blocks in can merge in the final clustering configuration (i.e., ). In other words, two observations can share the same value because either they belong to the same block in the latent partition or they are in different blocks but they share the same value (from ). This simple observation leads us to write the EPPF of the random partition using the EPPF of and of .
4.1. Explicit Expression of the EPPF
If is a partition of with () and , we can easily describe all the partitions finer than , which are compatible with in the merging process described above. To do this, first of all, note that any block can arise from the union of blocks in the latent partition. Hence, given , where , we define the set
See Figure 1 for an example. Once a specific configuration in is considered, the blocks of the latent partition contributing to the block are characterized by the sufficient statistics , where is the number of blocks of j elements among the blocks above. This leads, for in , to the definition of
In summary, the set of partitions , which are compatible with in the merging process described above, can be written as
where is the set of all the partitions in with blocks such that
- it is possible to determine k subsets containing of these blocks;
- the union of the blocks in the i-th subset coincides with the i-th block of for ;
- in the i-th block, there are blocks with j elements, for .
Figure 1.
Pictorial representation of the latent partition structure of an mSSS. In the example, the partition induced by for is , and it is represented using rounded squares (bottom left). Circles at the top left represent a compatible latent partition, namely . The partition on induced by the latent , i.e., , is represented with squares in the middle of the figure. Combining and , one obtains . The statistics , and corresponding to this particular configuration are shown in the box at the bottom right.
Given the EPPF , let
where is any sequence of integer numbers such that for every i and for every i and j. Note that since the value of depends only on the statistics , is well-defined. See, e.g., [26].
Proposition 4.
Let be an . Denote by the random partition induced by ξ and by the EPPF of the partition induced by . If is a partition of with and , then
Proof.
Start by writing
which gives
Whenever ,
Therefore, we can write (20) as
Define the function as if is in the i-th subset of blocks, i.e., if . Recalling that k is the number of blocks in , define a partition on with k blocks, where the i-th block is
Recalling that is the partition induced by the s, one has
which gives
since and are independent. To conclude, note that the vector of the cardinalities of the blocks in is ; hence, if is the EPPF of , one has . Since the cardinality of is , one obtains the thesis. □
Corollary 2.
Let be defined according to (mSSS). If , then with probability one.
Proof.
If , by exchangeability, . Hence, the s are distinct with probability one. Since by Proposition 3, it follows that . □
Remark 2.
Note that, as a special case, we recover the fact that if ξ is a with H diffuse (i.e., it is a ), then the random partition induced by ξ is a.s. Π.
4.2. EPPF When Is of Gibbs Type
An important class of exchangeable random partitions is that of Gibbs-type partitions, introduced in [32] and characterized by the EPPF
where , and are positive real numbers such that and
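In standard notation (our placeholder symbols: σ for the discount parameter, V_{n,k} for the weights, (x)_m for the rising factorial), the Gibbs-type EPPF and the consistency recursion satisfied by its weights read as follows; see [32].

```latex
\Phi(n_1,\dots,n_k) \;=\; V_{n,k}\,\prod_{i=1}^{k} (1-\sigma)_{n_i-1},
\qquad
V_{n,k} \;=\; (n-\sigma k)\,V_{n+1,k} + V_{n+1,k+1}, \quad V_{1,1}=1 .
```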
A noteworthy example of Gibbs-type EPPF is the so-called Pitman–Yor two-parameter family. It is defined by
where and ; or and for some integer m; see [2,31].
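As a worked example, the following Python sketch evaluates the Pitman–Yor EPPF in its usual form (Gibbs-type weights given by a product of factors θ + iσ divided by the rising factorial (θ+1)_{n-1}, times the block factors); the function names and the default parameter values are illustrative.

```python
import math

def rising(x, m):
    """Rising factorial (Pochhammer symbol) (x)_m = x (x+1) ... (x+m-1)."""
    return math.prod(x + i for i in range(m))

def pitman_yor_eppf(counts, sigma=0.5, theta=1.0):
    """Pitman-Yor two-parameter EPPF: a Gibbs-type EPPF with
    V_{n,k} = prod_{i=1}^{k-1}(theta + i*sigma) / (theta + 1)_{n-1}."""
    n, k = sum(counts), len(counts)
    v_nk = math.prod(theta + i * sigma for i in range(1, k)) / rising(theta + 1.0, n - 1)
    return v_nk * math.prod(rising(1.0 - sigma, c - 1) for c in counts)

# Sanity check: the two partitions of {1,2} have probabilities summing to one.
print(pitman_yor_eppf([2]) + pitman_yor_eppf([1, 1]))   # -> 1.0
```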
In order to state the next result, we recall that
where is the generalized Stirling number of the first kind; see (3.12) in [26]. In the same book, various equivalent definitions of generalized Stirling numbers are presented.
Corollary 3.
Let be defined as in Proposition 4 with of Gibbs type defined in (22). If is a partition of with () and , then
Proof.
Combining Proposition 4 with (22), one obtains
□
4.3. The EPPF of a
As a special case, we now consider the partition induced by a with general base measure H. For the rest of the section, it is useful to decompose H as
where is the collection of points with positive H-probability, , and is a diffuse probability measure on . The sum is taken over .
We now describe , i.e., the EPPF of the partition induced by . Let in , where , and assume that the realization of has k blocks of cardinality . Set if the corresponding to the i-th block of comes from the diffuse component , while if it is equal to . Since the blocks in need to be associated with different values of the , one has that necessarily if for . In this case, the block is a singleton, which is . On the other hand, if , i.e., a merging occurred, necessarily, . Note that it is also possible that but . This motivates the definition of the set
for in where . The probability of obtaining, in an i.i.d. sample of length from H, exactly k ordered blocks with cardinality , such that observations in each block are equal and observations in distinct blocks are different, is
By exchangeability, turns out to be . Note also that if , reduces to
where runs over all distinct positive integers (less than or equal to if is finite), which is nothing other than (5) with deterministic weights.
To rewrite in a different way, given in , let be the vector containing all the elements and let r be its length, with possibly if , and define for
with the convention that when . A simple combinatorial argument shows that
Proposition 4 immediately gives the next proposition.
Proposition 5.
Let ξ be a and let be the random partition induced by ξ. If is a partition of with and , then
Remark 4.
Once again, if H is diffuse, then for every . Hence, the above formula reduces to the familiar
4.4. EPPF for with Spike-and-Slab Base Measure
We now consider the special case of with a spike-and-slab base measure. A spike-and-slab measure is defined as
where , is a point of and is a diffuse measure on . This type of measure has been used as a base measure in the Dirichlet process by [16,17,18,19,20] and in the Pitman–Yor process by [21].
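For intuition, a single draw from such a base measure can be simulated as below; the spike location, the slab and the weight are illustrative choices. Using this H in the constructions of Section 2.2 and Section 3, two distinct latent blocks are merged in the induced partition exactly when both of them pick the spike.

```python
import random

def spike_and_slab_draw(a0=0.3, spike=0.0, slab=lambda: random.gauss(2.0, 1.0)):
    """One draw from a spike-and-slab base measure as in (26): with probability
    a0 return the fixed atom `spike`, otherwise draw from the diffuse slab."""
    return spike if random.random() < a0 else slab()

# Roughly a fraction a0 of the draws coincide with the spike value.
print([round(spike_and_slab_draw(), 2) for _ in range(8)])
```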
Here, we deduce by Proposition 5 the explicit form of the EPPF of the random partition induced by a sequence sampled from a species sampling random probability with such a base measure.
Proposition 6.
Let H be as in (26), be the random partition induced by a and Π be an exchangeable random partition with EPPF . If is a partition of with (), then
where, conditionally on the fact that has blocks with sizes , the probability that has blocks is denoted by . If, in addition, is of Gibbs type (22), then
Proof.
In this case, if and for some because H has only one atom. Moreover, is clearly symmetric and
By Proposition 5,
where is any vector of r positive integers with sum such that of them are equal to j. In view of the definition of , formula (27) is immediately obtained.
If is of Gibbs type, taking into account (24), then
and the second part of the thesis follows by simple algebra. □
Applying Proposition 6 to the Pitman–Yor EPPF defined in (23), one immediately recovers the results stated in Theorem 1 and Corollary 1 of [21].
5. Predictive Distributions
In this section, we provide some expressions for the predictive distributions of mixtures of species sampling sequences.
5.1. Some General Results
Let be as in Definition 2 and let and be the sequence of exchangeable random variables and the exchangeable random partition appearing in Proposition 3. Let
be the -field generated by . By Proposition 3, one has a.s.; hence, is measurable. Note that, in general, can be strictly contained in . Set and, for any , let be a kernel corresponding to the conditional distribution of given (i.e., the -predictive distribution of the exchangeable sequence ). Finally, recall that is the partition induced by and define as the distinct values in order of appearance of with .
Proof.
Set
Since , one can write
Now, since and are independent, it follows that and also
Combining all the claims, one obtains (28). The second part of the proof follows since, if , the s are distinct with probability one. Since , it follows that , and with probability one and . Hence, (29) follows from (28). □
Remark 5.
Note that (29) can also be derived as follows. is equivalent to the fact that is almost surely diffuse. Hence, conditionally on , we have a ; then, by (PS2) in Section 2.2, one has
Taking the conditional expectation of the previous equation, given , we obtain
and the thesis follows since one can check (arguing as in the proof of the proposition) that
Assume now that the random variables are defined on by a Bayesian model with likelihood and prior , where f is a density with respect to a dominating measure and Q is a probability measure defined on a Polish space U (the space of parameters). In other words,
Note that this means that , where . Bayes’ theorem (see, e.g., Theorem 1.31 in [33]) gives
where is the usual posterior distribution, which is
If is a diffuse measure, one obtains . Hence, (29) in Proposition 7 applies and one has
For example, one can apply this result to a mixture of Dirichlet processes in the sense of [10], as briefly described at the end of Section 3.1. Assume that and are independent and that for a suitable dominating diffuse measure .
Under these hypotheses, a sample from a mixture of Dirichlet processes is an mSSS with described in (15) and, in addition, . Combining (15) with (6) and (7), one obtains
and
Hence, the predictive distribution of given is (33) for and given above.
Note that the same result can be deduced by combining Lemma 1 and Corollary 3.2’ in [10].
Example 1
(Species Sampling NIG). Let be defined as a mixture of normal random variables with Normal-Inverse-Gamma prior. In other words, given , ,
where denotes a normal distribution of mean μ and variance and is the inverse gamma distribution with shape α and scale β. Let be the density of a Student-T distribution with ν degrees of freedom and location/scale parameters, i.e.,
It is well known that, under these assumptions, has density , where the parameters are updated
Thus, in this case, if are distinct real numbers and , one has
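A sketch of the cluster-specific predictive used here, assuming the standard conjugate Normal-Inverse-Gamma parameterization (mean given the variance is normal with precision factor λ; variance is inverse gamma with shape α and scale β); the function name, the hyper-parameter defaults and the exact parameterization are illustrative and may differ from those adopted in the original formulas.

```python
import numpy as np
from scipy import stats

def nig_posterior_predictive(x, mu0=0.0, lam0=1.0, a0=2.0, b0=1.0):
    """Posterior predictive of a normal likelihood under the conjugate
    Normal-Inverse-Gamma prior: a Student-t with 2*a_n degrees of freedom,
    location mu_n and scale sqrt(b_n * (lam_n + 1) / (a_n * lam_n))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 0:
        mu_n, lam_n, a_n, b_n = mu0, lam0, a0, b0
    else:
        xbar = x.mean()
        lam_n = lam0 + n
        mu_n = (lam0 * mu0 + n * xbar) / lam_n
        a_n = a0 + n / 2.0
        b_n = (b0 + 0.5 * ((x - xbar) ** 2).sum()
               + lam0 * n * (xbar - mu0) ** 2 / (2.0 * lam_n))
    scale = np.sqrt(b_n * (lam_n + 1.0) / (a_n * lam_n))
    return stats.t(df=2.0 * a_n, loc=mu_n, scale=scale)

# Predictive density at 0 given the distinct values observed in one cluster.
pred = nig_posterior_predictive([0.1, -0.2, 0.05])
print(pred.pdf(0.0))
```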
We show an application of (33) to a real dataset by choosing and according to a Pitman–Yor two-parameter family; see (23). The data are the relative changes in reported larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties, taken from Section 2.1 of [34]. We apply our models to both the raw data and the rounded data (rounded to the second decimal digit) in order to obtain ties in the ξs. In the evaluation of the predictive CDFs, we fix , , and . In Figure 2, we report the empirical CDF of the rounded data (solid line), the predictive CDF obtained from (33) (dotted line) and the predictive CDF of a Pitman–Yor species sampling sequence (see PS2) with , (dashed line). Similar plots are reported in Figure 3, with raw data in place of the rounded data. Note that, in all the various settings, the influence of the hyper-parameters is stronger on the CDF of the simple Pitman–Yor species sampling model than on the corresponding predictive CDF derived from (33).
Figure 2.
Predictive CDFs for the relative changes in larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties; data taken from Section 2.1 of [34]. Data have been rounded to the second decimal. Here, and . Solid line: empirical CDF. Dotted line: predictive CDF from (33). Dashed line: predictive CDF from PS2 with , . Different plots correspond to different values of and . In all the plots, the predictive CDFs are evaluated with , , and .
Figure 3.
Predictive CDFs for the relative changes in larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties; data taken from Section 2.1 of [34]. Raw data, without rounding. Here, and . Solid line: empirical CDF. Dotted line: predictive CDF from (33). Dashed line: predictive CDF from PS2 with , . Different plots correspond to different values of and . In all the plots, the predictive CDFs are evaluated with , , and .
5.2. Predictive Distributions for
We now deduce an explicit form for the predictive distribution of a with general base measure H given in (25).
Recall that we denote by the partition induced by , with , and by the latent partition appearing in Proposition 3. We also set
The variable is a discrete random variable that takes value 0 if comes from the diffuse component of H.
Let be the set of all the possible configurations of that are compatible with the observed partition and the additional information given by , . In order to describe this set, observe that if , then the block may arise from the union of several blocks in , while, if , then for some . Note that it may happen that .
Recalling that the elements in in (17) are used to describe the numbers of sub-blocks into which the blocks of have been divided to form the latent partition , it turns out that the set has the additional constraint whenever . These considerations yield that, starting from and , the set of admissible can also be described by resorting to the definition of as follows:
With this definition in place, one has
where and have been defined in Section 4.1.
For any in and any in , we define
In other words, corresponds to the configuration obtained from by adding one new element as a new block. In the following, let , and let be obtained from by adding 1 to its i-th component.
Proposition 8.
Let be a . Then, for any A in ,
where
Proof.
We start by defining the following events for :
Since conditioning on is equivalent to conditioning on , one can write
Now, set
and
On , one has (up to zero probability sets)
while, on (up to zero probability sets),
Note that (up to zero probability sets)
Hence,
Similarly, using that , one obtains
where
At this stage, note that, by construction,
where is characterized by
and then
Hence,
which shows, in particular, that and are conditionally independent given . Since , and depend logically only on , one obtains
and, finally,
Since are discrete random variables, we use the elementary definition of the conditional probability of events to evaluate the conditional distributions (38) and (39). Specifically, assume that , , , and, for a given event E, write
As for the denominator in (40), letting and , using (34), one obtains
As for the numerators in (40), when , we start from
where, in the last equality, we used that for , one has
Taking the sum over gives
Combining these with (39) and (40) and recalling that , one obtains
Finally, it remains to consider (40) when . Now, observe that
where denotes the partition in obtained from by adding to the i-th block of . Note that, for the second equality, we used that, on , one has .
Combining these results, one obtains the thesis. □
6. Conclusions and Discussion
We have defined a new class of exchangeable sequences, called mixtures of species sampling sequences (mSSS). We have shown that these sequences include various well-known Bayesian nonparametric models. In particular, the observations of many nonparametric hierarchical models (e.g., hierarchical Dirichlet process, hierarchical Pitman–Yor process and, more generally, hierarchical species sampling models [22,23,24,25]) are mSSS. We have shown that also observations sampled from a mixture of Dirichlet processes [10] are mSSS, under some additional assumptions. Our general class also includes species sampling sequences with a general (not necessarily diffuse) base measure, which have been used in various applications, e.g., in the case of “spike-and-slab”-type nonparametric priors [16,17,18,19,20,21].
We believe that our general framework sheds light on the common structure of all the above-mentioned models, leading to a possible unified treatment of some of their important features. Our techniques provide unified proofs for various results that, up to now, have been proven with ad hoc methods.
We have proven that all the mSSS are obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. This representation is proven in the strong sense of an almost sure equality (see Section 3) and leads to the simple and clear derivation of an explicit expression for the EPPF of an mSSS. We believe that our general proof simplifies the derivation of the EPPF of many of the above-mentioned particular cases. Moreover, our results show that the clustering and the predictive structure of various well-known models do not depend on the relation between these models and completely random measures, but are essentially a consequence of the simple combinatorial structure of these sequences. Many important differences between well-known models (such as mixtures of Dirichlet and hierarchical Dirichlet) can be explained easily by simple differences in the latent partition and the corresponding latent exchangeable sequence.
We stress that a clear understanding of the clustering structure of mSSS is fundamental for practical purposes, since these models are typically used to cluster observations. Moreover, we hope that the explicit expression for EPPFs in our general framework can lead to the development of new MCMC algorithms for sampling from the posterior distribution.
Finally, we believe that some of the results we have proven here for mSSS can be extended to the more general case of partially exchangeable arrays. In this direction, a possible generalization of mSSS for future work is to consider partially exchangeable arrays with mixtures of species sampling random probability measures as directing measures.
Author Contributions
F.B.: Methodology, simulation, writing and editing. L.L.: Methodology, writing and editing. All authors have read and agreed to the published version of the manuscript.
Funding
This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 817257.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
F. Bassetti and L. Ladelli wish to express their gratitude to Professor Eugenio Regazzini, who has been an inspiring teacher and outstanding guide in many fields of probability and statistics.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
In what follows, denotes the law of a random element X. For ease of reference, we state here Lemma 5.9 and Corollary 5.11 of [35].
Lemma A1
(Extension 1). Fix a probability kernel K between two measurable spaces S and T, and let σ be a random element defined on taking values in S. Then, there exists a random element η in T, defined on some extension of the original probability space Ω, such that a.s. and, moreover, η is conditionally independent, given σ, of any other random element on Ω.
Lemma A2
(Extension 2). Fix two Borel spaces S and T, a measurable mapping and some random elements σ in S and in T with . Then, there is a random element η defined on some extension of the original probability space, such that and a.s.
We need the following variant of the previous result.
Lemma A3
(Extension 3). Fix three Borel spaces , and , a measurable mapping and some random elements in and in , all defined on a probability space . Assume that the conditional law of given is the same as the conditional law of given (P-almost surely). Then, there is a random element τ defined on some extension of the original probability space taking values in such that
- a.s.
- .
Proof.
Define by , set and . By hypothesis, it is clear that . Thus, by Lemma A2, one has that, on an enlargement of , there exists such that and a.s. Hence, a.s. but also a.s. It remains to show the second part of the thesis. Since a.s. and , where , it follows that . □
References
- Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1973, 1, 209–230.
- Pitman, J.; Yor, M. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 1997, 25, 855–900.
- Perman, M.; Pitman, J.; Yor, M. Size-biased sampling of Poisson point processes and excursions. Probab. Theory Relat. Fields 1992, 92, 21–39.
- Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. 2003, 31, 560–585.
- James, L.F.; Lijoi, A.; Prünster, I. Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. 2009, 36, 76–97.
- Lijoi, A.; Prünster, I. Models beyond the Dirichlet process. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: New York, NY, USA, 2010.
- De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I.; Ruggiero, M. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 212–229.
- Pitman, J. Poisson-Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Beachwood, OH, USA, 2003; Volume 40, pp. 1–34.
- Ishwaran, H.; James, L.F. Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 2001, 96, 161–173.
- Antoniak, C.E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 1974, 2, 1152–1174.
- Cifarelli, D.M.; Regazzini, E. Distribution functions of means of a Dirichlet process. Ann. Stat. 1990, 18, 429–442.
- Sangalli, L.M. Some developments of the normalized random measures with independent increments. Sankhyā 2006, 68, 461–487.
- Broderick, T.; Wilson, A.C.; Jordan, M.I. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli 2018, 24, 3181–3221.
- Bassetti, F.; Ladelli, L. Asymptotic number of clusters for species sampling sequences with non-diffuse base measure. Stat. Probab. Lett. 2020, 162, 108749.
- Pitman, J. Some developments of the Blackwell-MacQueen urn scheme. In Statistics, Probability and Game Theory; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1996; Volume 30, pp. 245–267.
- Dunson, D.B.; Herring, A.H.; Engel, S.M. Bayesian selection and clustering of polymorphisms in functionally related genes. J. Am. Stat. Assoc. 2008, 103, 534–546.
- Kim, S.; Dahl, D.B.; Vannucci, M. Spiked Dirichlet process prior for Bayesian multiple hypothesis testing in random effects models. Bayesian Anal. 2009, 4, 707–732.
- Suarez, A.J.; Ghosal, S. Bayesian clustering of functional data using local features. Bayesian Anal. 2016, 11, 71–98.
- Cui, K.; Cui, W. Spike-and-slab Dirichlet process mixture models. Open J. Stat. 2012, 2, 512–518.
- Barcella, W.; De Iorio, M.; Baio, G.; Malone-Lee, J. Variable selection in covariate dependent random partition models: An application to urinary tract infection. Stat. Med. 2016, 35, 1373–1389.
- Canale, A.; Lijoi, A.; Nipoti, B.; Prünster, I. On the Pitman–Yor process with spike and slab base measure. Biometrika 2017, 104, 681–697.
- Teh, Y.; Jordan, M.I. Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: New York, NY, USA, 2010.
- Teh, Y.W.; Jordan, M.I.; Beal, M.J.; Blei, D.M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 2006, 101, 1566–1581.
- Camerlenghi, F.; Lijoi, A.; Orbanz, P.; Prünster, I. Distribution theory for hierarchical processes. Ann. Stat. 2019, 47, 67–92.
- Bassetti, F.; Casarin, R.; Rossini, L. Hierarchical species sampling models. Bayesian Anal. 2020, 15, 809–838.
- Pitman, J. Combinatorial Stochastic Processes; Lectures from the 32nd Summer School on Probability Theory Held in Saint-Flour, 7–24 July 2002, with a Foreword by Jean Picard; Lecture Notes in Mathematics; Springer: Berlin, Germany, 2006; Volume 1875.
- Crane, H. The ubiquitous Ewens sampling formula. Stat. Sci. 2016, 31, 1–19.
- Kingman, J.F.C. The representation of partition structures. J. Lond. Math. Soc. 1978, 18, 374–380.
- Aldous, D.J. Exchangeability and related topics. In École d’été de Probabilités de Saint-Flour, XIII—1983; Lecture Notes in Mathematics; Springer: Berlin, Germany, 1985; Volume 1117, pp. 1–198.
- Kallenberg, O. Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrscheinlichkeitstheorie Und Verw. Geb. 1973, 27, 23–36.
- Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Relat. Fields 1995, 102, 145–158.
- Gnedin, A.; Pitman, J. Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 2005, 325, 83–102, 244–245.
- Schervish, M.J. Theory of Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 1995.
- Marin, J.M.; Robert, C.P. Bayesian Core: A Practical Approach to Computational Bayesian Statistics; Springer Texts in Statistics; Springer: New York, NY, USA, 2007; pp. xiv+255.
- Kallenberg, O. Foundations of Modern Probability, 3rd ed.; Probability Theory and Stochastic Modelling; Springer: New York, NY, USA, 2021; Volume 99.