A Central Limit Theorem for Predictive Distributions

Patrizia Berti; Luca Pratelli; Pietro Rigo

doi:10.3390/math9243211

,

and

¹

Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università di Modena e Reggio-Emilia, Via Campi 213/B, 41100 Modena, Italy

²

Accademia Navale di Livorno, 57127 Livorno, Italy

³

Dipartimento di Scienze Statistiche “P. Fortunati”, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy

^*

Author to whom correspondence should be addressed.

Mathematics2021, 9(24), 3211;https://doi.org/10.3390/math9243211

This article belongs to the Special Issue Bayesian Predictive Inference and Related Asymptotics—Festschrift for Eugenio Regazzini's 75th Birthday

Version Notes

Order Reprints

Abstract

Let S be a Borel subset of a Polish space and F the set of bounded Borel functions

f : S \to R

. Let

a_{n} (\cdot) = P (X_{n + 1} \in \cdot ∣ X_{1}, \dots, X_{n})

be the n-th predictive distribution corresponding to a sequence

(X_{n})

of S-valued random variables. If

(X_{n})

is conditionally identically distributed, there is a random probability measure

μ

on S such that

\int f d a_{n} \overset{a . s .}{⟶} \int f d μ

for all

f \in F

. Define

D_{n} (f) = d_{n} \{\int f d a_{n} - \int f d μ\}

for all

f \in F

, where

d_{n} > 0

is a constant. In this note, it is shown that, under some conditions on

(X_{n})

and with a suitable choice of

d_{n}

, the finite dimensional distributions of the process

D_{n} = \{D_{n} (f) : f \in F\}

stably converge to a Gaussian kernel with a known covariance structure. In addition,

E \{φ (D_{n} (f)) ∣ X_{1}, \dots, X_{n}\}

converges in probability for all

f \in F

and

φ \in C_{b} (R)

.

Keywords:

bayesian predictive inference; central limit theorem; conditional identity in distribution; exchangeability; predictive distribution; stable convergence

MSC:

60B10; 60G25; 60G09; 60F05; 62F15; 62M20

1. Introduction

All random elements appearing in the sequel are defined on a common probability space, say

(Ω, A, P)

. We denote by S a Borel subset of a Polish space and by

B

the Borel

σ

-field on S. We let

\begin{matrix} P = \{probability measures on B\} and \\ F = \{real bounded Borel functions on S\} . \end{matrix}

Moreover, if

λ \in P

and

f \in F

, we write

λ (f)

to denote

λ (f) = \int f d λ .

In other terms, depending on the context,

λ

is regarded as a function on

B

or a function on F. This slight abuse of notation is quite usual (see, e.g., [1,2]) and very useful for the purposes of this note.

Let

X = (X_{1}, X_{2}, \dots)

be a sequence of S-valued random variables and

F_{0} = {\emptyset, Ω} and F_{n} = σ (X_{1}, \dots, X_{n}) .

The predictive distributions of X are the random probability measures on

(S, B)

given by

a_{n} (\cdot) = P (X_{n + 1} \in \cdot ∣ F_{n}) for all n \geq 0 .

Under some conditions, there is a further random probability measure

μ

on

(S, B)

such that

μ (f) \overset{a . s .}{=} lim_{n} a_{n} (f) for each f \in F .

(1)

For instance, condition (1) holds if X is exchangeable. More generally, it holds if X is conditionally identically distributed (c.i.d.), as defined in Section 2. Note also that, since S is separable, condition (1) implies

a_{n} \to μ

weakly. Regarding

a_{n}

and

μ

as measurable functions from

Ω

into

P

, one obtains

P ({ω \in Ω : a_{n, ω} \to μ_{ω} weakly}) = 1 .

Assume condition (1), fix a sequence

d_{n}

of positive constants, and define

D_{n} (f) = d_{n} \{a_{n} (f) - μ (f)\} for each f \in F .

This note deals with the process

D_{n} = \{D_{n} (f) : f \in F\} .

Our goal is to show that, under some conditions on X and with a suitable choice of the constants

d_{n}

, the finite-dimensional distributions of

D_{n}

stably converge, as

n \to \infty

, to a certain Gaussian limit.

To be more precise, we recall that a kernel on

(S, B)

is a measurable map

α : S \to P

. This means that

α (x) \in P

, for each

x \in S

, and the function

x \mapsto α (x) (A)

is

B

-measurable for each

A \in B

. In what follows, we write

α (x) (f) = \int f (y) α (x) (d y) for all x \in S and f \in F .

Next, as in [3], suppose the predictive distributions of X satisfy the recursive equation

a_{n + 1} = q_{n} a_{n} + (1 - q_{n}) α (X_{n + 1}) a . s . for all n \geq 0,

(2)

where

q_{0}, q_{1}, \dots \in (0, 1)

are constants and

α

is a kernel on

(S, B)

. Moreover, let

ν (\cdot) = P (X_{1} \in \cdot)

be the marginal distribution of

X_{1}

. Under condition (2), X is c.i.d. whenever

α

is a regular conditional distribution for

ν

given a sub-

σ

-field

G \subset B

; see ([3] Section 5). Hence, we assume

α (\cdot) (A) = E_{ν} (1_{A} ∣ G), ν - a . s .,

(3)

for all

A \in B

and some sub-

σ

-field

G \subset B

. For instance, condition (3) holds if

α (x) = δ_{x} for all x \in S

where

δ_{x}

denotes the unit mass at the point x (just let

G = B

). In addition, we assume

\sum_{n} {(1 - q_{n})}^{2} < \infty and lim_{n} d_{n} sup_{k \geq n} (1 - q_{k - 1}) = 0

where

d_{n} = {(\sum_{k \geq n} {(1 - q_{k})}^{2})}^{- 1 / 2} .

In this framework, it is shown that

(D_{n} (f_{1}), \dots, D_{n} (f_{p})) ⟶ N_{p} (0, Σ) stably

(4)

for all

p \geq 1

and all

f_{1}, \dots, f_{p} \in F

, where

Σ

is the random covariance matrix with entries

σ_{j k} = \int α (x) (f_{j}) α (x) (f_{k}) μ (d x) - μ (f_{j}) μ (f_{k}) .

We actually prove something more than (4). Let

C_{b} (R)

denote the set of real bounded continuous functions on

R

. Then, it is shown that

E \{φ (D_{n} (f)) ∣ F_{n}\} \overset{P}{⟶} N (0, σ^{2}) (φ)

(5)

for all

f \in F

and

φ \in C_{b} (R)

, where

σ^{2} = \int α (x) {(f)}^{2} μ (d x) - μ {(f)}^{2} .

Based on (5), it is not hard to deduce condition (4).

Before concluding the Introduction, several remarks are in order.

(i): A remarkable special case is $α (x) = δ_{x}$ for all $x \in S$ . Indeed, Equation (2) holds with $α = δ$ in some meaningful situations, including Dirichlet sequences; see ([3] Section 4) for other examples. Thus, suppose $α = δ$ . Then, the above formulae reduce to $σ_{j k} = μ (f_{j} f_{k}) - μ (f_{j}) μ (f_{k})$ and $σ^{2} = μ (f^{2}) - μ {(f)}^{2}$ . Moreover, if $ν$ is non-atomic and

$\prod_{j = 0}^{n} q_{j} \to 0 and \sum_{n} \prod_{j = 0}^{n} q_{j} = \infty,$

then $μ$ takes the form

$μ \overset{a . s .}{=} \sum_{n} V_{n} δ_{Y_{n}}$

where $(V_{n})$ and $(Y_{n})$ are independent sequences and $(Y_{n})$ is i.i.d. with $Y_{1} \sim ν$ ; see ([3] Theorem 20) and [4] for details.
(ii): Let $l^{\infty} (G)$ be the set of real bounded functions on G, where G is any subset of F. For instance, if $S = R$ , one could take $G = \{1_{(- \infty, x]} : x \in R\}$ . In view of (4), a natural question is whether $D_{n}$ has a limit in distribution when $l^{\infty} (G)$ is equipped with a suitable distance. As an example, $l^{\infty} (G)$ could be equipped with the uniform distance (as in [1,2]) or with some weaker distance (as in [5]). Even if natural, this question is neglected in this note. We hope and plan to investigate it in a forthcoming paper.
(iii): For fixed $f \in F$ , condition (4) provides some information on the convergence rate of $a_{n} (f)$ to $μ (f)$ . Define $L_{n} = u_{n} | a_{n} (f) - μ (f) |$ where $u_{n} > 0$ is any sequence of constants. Then, condition (4) yields $L_{n} \overset{P}{⟶} 0$ whenever $u_{n} / d_{n} \to 0$ . Furthermore, $L_{n} \overset{P}{⟶} \infty$ provided $u_{n} / d_{n} \to \infty$ and $σ^{2} > 0$ a.s.
(iv): The condition ${lim}_{n} d_{n} {sup}_{k \geq n} (1 - q_{k - 1}) = 0$ is just a technical assumption which guarantees that, asymptotically, there are no dominating terms. In a sense, this condition is analogous to the weak Lindeberg’s condition in the classical CLT for independent summands.
(v): From a Bayesian point of view, $μ$ can be seen as a random parameter of the data sequence X. This is quite clear if X is exchangeable, for, in this case, X is conditionally i.i.d. given $μ$ . If X is only c.i.d., the role of $μ$ is not as crucial, but $μ$ still contributes to specify the probability distribution of X; see ([3] Section 2.1). Thus, in a Bayesian framework, conditions (4)–(5) may be useful to make (asymptotic) inference about $μ$ . To this end, an alternative could be proving a limit theorem for $W_{n} = w_{n} (μ_{n} - μ)$ , where $w_{n}$ is a suitable constant and $μ_{n} = (1 / n) \sum_{j = 1}^{n} δ_{X_{j}}$ the empirical measure. However, $D_{n}$ has two advantages with respect to $W_{n}$ . It usually converges at a better rate and the variance of the limit distribution is smaller; see, e.g., Example 3.
(vi): Conditions (4)–(5) are our main results. They can be motivated in at least two ways. Firstly, from the theoretical perspective, conditions (4)–(5) fit into the results concerning the asymptotic behavior of conditional expectations (see, e.g., [6,7,8] and references therein). Secondly, from the practical perspective, conditions (4)–(5) play a role in all those fields where predictive distributions are basic objects. The main example is Bayesian predictive inference. Indeed, the predictive distributions investigated in this note have been introduced in connection with Bayesian prediction problems; see [3]. Another example is the asymptotic behavior of certain urn schemes. Related subjects, where (4)–(5) are potentially useful, are empirical processes for dependent data, Glivenko-Cantelli-type theorems and merging of opinions. Without any claim of being exhaustive, a list of references is: [3,5,9,10,11,12,13,14,15,16,17,18,19,20,21].

2. Preliminaries

In this note,

N_{p} (0, C)

denotes the Gaussian law on the Borel sets of

R^{p}

with mean 0 and covariance matrix C, where C is symmetric and semidefinite positive. If

p = 1

and

c \geq 0

is a scalar, we write

N (0, c)

instead of

N_{1} (0, c)

and

N (0, c) (φ) = \int φ (x) N (0, c) (d x)

for all bounded measurable

φ : R \to R

. Note that, if

Σ

is a random covariance matrix,

N_{p} (0, Σ)

is a random probability measure on the Borel sets of

R^{p}

.

Let us briefly recall stable convergence. Let

A^{+} = {H \in A : P (H) > 0}

. Fix a random probability measure K on

(S, B)

and define

λ_{H} (A) = E \{K (A) ∣ H\} for all A \in B and H \in A^{+} .

Each

λ_{H}

is a probability measure on

B

. Then,

X_{n}

converges stably to K, written

X_{n} \to K

stably, if

P (X_{n} \in \cdot ∣ H) ⟶ λ_{H} weakly for all H \in A^{+} .

In particular,

X_{n}

converges in distribution to

λ_{Ω}

. However, stable convergence is stronger than convergence in distribution. To see this, take a further random variable

X : Ω \to S

. Then,

X_{n} \overset{P}{⟶} X

if, and only if,

X_{n} \to δ_{X}

stably. Thus, stable convergence is strictly connected to convergence in probability. Moreover,

(X_{n}, X) \to K \times δ_{X}

stably whenever

X_{n} \to K

stably. Therefore, if

X_{n}

converges stably,

(X_{n}, X)

still converges stably for any S-valued random variable X.

We next turn to conditional identity in distribution. Say that X is conditionally identically distributed (c.i.d.) if

P (X_{k} \in \cdot ∣ F_{n}) = P (X_{n + 1} \in \cdot ∣ F_{n}) a . s . for all k > n \geq 0 .

Thus, at each time n, the future observations

(X_{k} : k > n)

are identically distributed given the past. This is actually weaker than exchangeability. Indeed, X is exchangeable if, and only if, it is stationary and c.i.d.

C.i.d. sequences were introduced in [9,22] and then investigated in various papers; see, e.g., [3,4,5,11,23,24,25,26,27,28,29].

The asymptotics of c.i.d. sequences is similar to that of exchangeable ones. To see this, suppose X is c.i.d. and define the empirical measures

μ_{n} = \frac{1}{n} \sum_{j = 1}^{n} δ_{X_{j}} .

Then, there is a random probability measure

μ

on

(S, B)

such that

μ (A) \overset{a . s .}{=} lim_{m} μ_{m} (A) for each fixed A \in B .

It follows that

\begin{matrix} E \{μ (A) ∣ F_{n}\} = lim_{m} E \{μ_{m} (A) ∣ F_{n}\} \\ = lim_{m} \frac{1}{m} \sum_{j = n + 1}^{m} P (X_{j} \in A ∣ F_{n}) = P (X_{n + 1} \in A ∣ F_{n}) a . s . \end{matrix}

for all

n \geq 0

and

A \in B

. Therefore, as in the exchangeable case, the predictive distributions can be written as

a_{n} (\cdot) = P (X_{n + 1} \in \cdot ∣ F_{n}) = E \{μ (\cdot) ∣ F_{n}\} a . s .

Using the martingale convergence theorem, this implies

μ (f) \overset{a . s .}{=} lim_{n} E \{μ (f) ∣ F_{n}\} = lim_{n} a_{n} (f) for all f \in F .

Furthermore, X is asymptotically exchangeable, in the sense that the probability distribution of the shifted sequence

(X_{n}, X_{n + 1}, \dots)

converges weakly to an exchangeable probability measure on

(S^{\infty}, B^{\infty})

.

Finally, we state a technical result to be used later on.

Lemma 1.

Let

(Y_{n})

be a sequence of real integrable random variables, adapted to the filtration

(F_{n})

, and

Z_{n} = E (Y_{n + 1} ∣ F_{n}) .

Let V be a real non-negative random variable and

0 < b_{1} < b_{2} < \dots

an increasing sequence of constants, such that

b_{n} ↑ \infty

and

b_{n} / b_{n + 1} \to 1

. Suppose

(Y_{n}^{2})

is uniformly integrable,

Z_{n} \overset{a . s .}{⟶} Z

for some random variable Z, and define

T_{n} = b_{n} (Z_{n} - Z) .

Then,

E \{φ (T_{n}) ∣ F_{n}\} \overset{P}{⟶} N (0, V) (φ) for all φ \in C_{b} (R)

provided

b_{n}^{2} \sum_{k \geq n} {(Z_{k} - Z_{k - 1})}^{2} \overset{P}{⟶} V;

(6)

lim_{n} b_{n} E \{sup_{k \geq n} | Z_{k} - Z_{k - 1} |\} = 0;

(7)

\sum_{k \geq n} E |E (Z_{k + 1} ∣ F_{k}) - Z_{k}| = o (1 / b_{n}) .

(8)

Proof.

Just repeat the proof of ([10] Theorem 1) with

b_{n}

in the place of

\sqrt{n}

. □

3. Main Result

Let us go back to the notation of Section 1. Recall that

q_{n} \in (0, 1)

is a constant for each

n \geq 0

and

d_{n} = {(\sum_{k \geq n} {(1 - q_{k})}^{2})}^{- 1 / 2}

. We aim to prove the following CLT.

Theorem 1.

Assume conditions (2)–(3) and

\sum_{n} {(1 - q_{n})}^{2} < \infty a n d lim_{n} d_{n} sup_{k \geq n} (1 - q_{k - 1}) = 0 .

Then, there is a random probability measure μ on

(S, B)

such that

μ (f) \overset{a . s .}{=} lim_{n} a_{n} (f) a n d E \{φ (D_{n} (f)) ∣ F_{n}\} \overset{P}{⟶} N (0, σ^{2}) (φ)

for all

f \in F

and

φ \in C_{b} (R)

, where

σ^{2} = \int α (x) {(f)}^{2} μ (d x) - μ {(f)}^{2} .

As a consequence,

(D_{n} (f_{1}), \dots, D_{n} (f_{p})) ⟶ N_{p} (0, Σ) s t a b l y

for all

p \geq 1

and all

f_{1}, \dots, f_{p} \in F

where the covariance matrix Σ has entries

σ_{j k} = \int α (x) (f_{j}) α (x) (f_{k}) μ (d x) - μ (f_{j}) μ (f_{k}) .

Proof.

Due to conditions (2)–(3), X is c.i.d.; see ([3] Section 5). Hence, as noted in Section 2, there is a random probability measure

μ

on

(S, B)

such that

a_{n} (f) \overset{a . s .}{=} E \{μ (f) ∣ F_{n}\} for all f \in F .

By martingale convergence, it follows that

a_{n} (f) \overset{a . s .}{⟶} μ (f)

for all

f \in F

.

We next prove condition (5). Fix

f \in F

and define

b_{n} = d_{n}, Y_{n} = a_{n} (f), Z = μ (f) and V = σ^{2} .

Then,

(Y_{n}^{2})

is uniformly integrable (for f is bounded) and

b_{n}

satisfies the conditions of Lemma 1. Moreover,

Z_{n} = E (Y_{n + 1} ∣ F_{n}) = E \{E (μ (f) ∣ F_{n + 1}) ∣ F_{n}\} = E \{μ (f) ∣ F_{n}\} = a_{n} (f) a . s .

so that

Z_{n} \overset{a . s .}{⟶} Z

. Therefore, Lemma 1 applies. Hence, to prove (5), it suffices to check conditions (6)–(8).

Let

c = sup | f |

. Since

E (Z_{k + 1} ∣ F_{k}) = Z_{k}

a.s., condition (8) is trivially true. Moreover, condition (2) implies

\begin{matrix} Z_{k} - Z_{k - 1} & = & a_{k} (f) - a_{k - 1} (f) \\ = & q_{k - 1} a_{k - 1} (f) + (1 - q_{k - 1}) α (X_{k}) (f) - a_{k - 1} (f) \\ = & (1 - q_{k - 1}) \{α (X_{k}) (f) - a_{k - 1} (f)\} a . s . for all k \geq 1 . \end{matrix}

Hence, condition (7) holds, since

d_{n} E \{sup_{k \geq n} | Z_{k} - Z_{k - 1} |\} \leq 2 c d_{n} sup_{k \geq n} (1 - q_{k - 1}) ⟶ 0 .

It remains to prove condition (6), namely

d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} {\{α (X_{k}) (f) - a_{k - 1} (f)\}}^{2} \overset{P}{⟶} σ^{2} .

First note that, since

a_{k - 1} {(f)}^{2} \overset{a . s .}{⟶} μ {(f)}^{2}

as

k \to \infty

, one obtains

d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} a_{k - 1} {(f)}^{2} = \frac{\sum_{k \geq n} {(1 - q_{k - 1})}^{2} a_{k - 1} {(f)}^{2}}{\sum_{k \geq n} {(1 - q_{k})}^{2}} \overset{a . s .}{⟶} μ {(f)}^{2} .

Next, define

R_{k} = α (X_{k}) {(f)}^{2} and M_{n} = d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} \{R_{k} - E (R_{k} ∣ F_{k - 1})\} .

Then,

\begin{matrix} E (M_{n}^{2}) & = & d_{n}^{4} \sum_{k \geq n} {(1 - q_{k - 1})}^{4} E \{{(R_{k} - E (R_{k} ∣ F_{k - 1}))}^{2}\} \\ \leq & 4 c^{4} d_{n}^{4} \sum_{k \geq n} {(1 - q_{k - 1})}^{4} \\ \leq & 4 c^{4} d_{n}^{2} sup_{k \geq n} {(1 - q_{k - 1})}^{2} \cdot d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} \\ ⟶ & 0 . \end{matrix}

Moreover,

E (R_{k} ∣ F_{k - 1}) = E \{\int α (x) {(f)}^{2} μ (d x) ∣ F_{k - 1}\} \overset{a . s .}{⟶} \int α (x) {(f)}^{2} μ (d x) .

Therefore,

d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} R_{k} = M_{n} + d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} E (R_{k} ∣ F_{k - 1}) \overset{P}{⟶} \int α (x) {(f)}^{2} μ (d x) .

By the same argument, it follows that

d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} α (X_{k}) (f) a_{k - 1} (f) \overset{P}{⟶} μ (f) \int α (x) (f) μ (d x) .

In addition, as proved in the Claim below,

\int α (x) (f) μ (d x) \overset{a . s .}{=} μ (f) .

Collecting all pieces together, one finally obtains

d_{n}^{2} \sum_{k \geq n} {(1 - q_{k - 1})}^{2} {\{α (X_{k}) (f) - a_{k - 1} (f)\}}^{2} \overset{P}{⟶} μ {(f)}^{2} + \int α (x) {(f)}^{2} μ (d x) - 2 μ {(f)}^{2} = σ^{2} .

Hence, condition (6) holds.

This concludes the proof of (5). We next prove that (5) ⇒ (4). Let

p \geq 1

and

f_{1}, \dots, f_{p} \in F

. Fix

u_{1}, \dots, u_{p} \in R

and define

U_{n} = \sum_{j = 1}^{p} u_{j} D_{n} (f_{j}) and σ_{u}^{2} = \sum_{j, k} u_{j} u_{k} σ_{j k} .

Moreover, for each

H \in A^{+}

, define the probability measure

λ_{H} (A) = E \{N (0, σ_{u}^{2}) (A) ∣ H\} for each Borel set A \subset R .

We have to show that

P (U_{n} \in \cdot ∣ H) ⟶ λ_{H} weakly for each H \in A^{+} .

(9)

To this end, call

ϕ_{H}

the characteristic function of

λ_{H}

, namely

ϕ_{H} (t) = E (\int e^{i t x} N (0, σ_{u}^{2}) (d x) ∣ H) = E (e^{- t^{2} σ_{u}^{2} / 2} ∣ H) for all t \in R .

Letting

f = \sum_{j = 1}^{p} u_{j} f_{j}

, one obtains

U_{n} = D_{n} (f) and σ_{u}^{2} = \int α (x) {(f)}^{2} μ (d x) - μ {(f)}^{2} .

Therefore, condition (5) yields

E (e^{i t U_{n}}) = E (E \{e^{i t D_{n} (f)} ∣ F_{n}\}) ⟶ E (e^{- t^{2} σ_{u}^{2} / 2}) = ϕ_{Ω} (t)

for each

t \in R

. Hence, condition (9) holds for

H = Ω

. Next, suppose

H \in ⋃_{n} F_{n}

and

P (H) > 0

. Then, for large n, one obtains

E (1_{H} e^{i t U_{n}}) = E (1_{H} E \{e^{i t D_{n} (f)} ∣ F_{n}\}) .

Hence, for each

t \in R

, condition (5) still implies

P (H) ϕ_{H} (t) = E (1_{H} e^{- t^{2} σ_{u}^{2} / 2}) = lim_{n} E (1_{H} E \{e^{i t D_{n} (f)} ∣ F_{n}\}) = lim_{n} E (1_{H} e^{i t U_{n}}) .

Therefore, condition (9) holds whenever

H \in ⋃_{n} F_{n}

and

P (H) > 0

. Based on this fact, by standard arguments, condition (9) easily follows for each

H \in A^{+}

.

To conclude the proof of the Theorem, it remains only to show that:

Claim:

\int α (x) (f) μ (d x) \overset{a . s .}{=} μ (f)

for all

f \in F

.

Proof of the Claim:

By (3),

α

is a regular conditional distribution for

ν

given a sub-

σ

-field of

B

, where

ν

is the marginal distribution of

X_{1}

. Therefore, as proved in ([3] Lemma 6), there is a set

A \in B

such that

ν (A) = 1

and

\int α (z) (f) α (x) (d z) = α (x) (f) for all x \in A and f \in F .

Since X is c.i.d. (and, thus, identically distributed) one also obtains

P (X_{n} \in A) = ν (A) = 1

for all

n \geq 1

.

Having noted these facts, fix

f \in F

. Since

a_{0} = ν

and

α

is a regular conditional distribution for

ν

,

\int α (x) (f) a_{0} (d x) = a_{0} (f) .

Moreover, if

\int α (x) (f) a_{n} (d x) = a_{n} (f)

a.s. for some

n \geq 0

, then

\begin{matrix} \int α (x) (f) a_{n + 1} (d x) & = & q_{n} \int α (x) (f) a_{n} (d x) + (1 - q_{n}) \int α (x) (f) α (X_{n + 1}) (d x) \\ = & q_{n} a_{n} (f) + (1 - q_{n}) α (X_{n + 1}) (f) \\ = & a_{n + 1} (f) a . s . \end{matrix}

By induction, one obtains

\int α (x) (f) a_{n} (d x) = a_{n} (f)

a.s. for each

n \geq 0

. Hence,

\int α (x) (f) μ (d x) = lim_{n} \int α (x) (f) a_{n} (d x) = lim_{n} a_{n} (f) = μ (f) a . s .

. □

We do not know whether

E \{φ (D_{n} (f)) ∣ F_{n}\}

converges a.s. (and not only in probability) under the conditions of Theorem 1. However, it can be shown that

E \{φ (D_{n} (f)) ∣ F_{n}\}

converges a.s. under slightly stronger conditions on

q_{n}

.

Under conditions (2)–(3), for Theorem 1 to work, it suffices that

lim_{n} n^{b} (1 - q_{n}) = c for some b > 1 / 2 and c > 0 .

(10)

In addition, if (10) holds, then

\frac{n^{b - 1 / 2}}{d_{n}} \to \frac{c}{\sqrt{2 b - 1}} .

Hence, letting

D_{n}^{*} = n^{b - 1 / 2} (a_{n} - μ)

, one obtains

(D_{n}^{*} (f_{1}), \dots, D_{n}^{*} (f_{p})) ⟶ N_{p} (0, \frac{c^{2}}{2 b - 1} Σ) stably,

for all

p \geq 1

and all

f_{1}, \dots, f_{p} \in F

, provided conditions (2), (3) and (10) hold.

We close this note with some examples.

Example 1.

Let

q_{n} = \frac{n + θ_{n}}{n + 1 + θ_{n + 1}}

where

(θ_{n})

is a bounded increasing sequence with

θ_{0} > 0

. Then, X is c.i.d. (because of (2)–(3)) but is exchangeable if and only if

θ_{n} = θ_{0}

for all n. In any case, since condition (10) holds with

b = c = 1

, Theorem 1 applies and

d_{n}

can be replaced by

\sqrt{n}

. Letting

D_{n}^{*} = \sqrt{n} (a_{n} - μ)

, it follows that

(D_{n}^{*} (f_{1}), \dots, D_{n}^{*} (f_{p})) ⟶ N_{p} (0, Σ) stably .

It is worth noting that, in the special case

θ_{n} = θ_{0}

for all n, the predictive distributions of X reduce to

a_{n} = \frac{θ_{0} ν + \sum_{i = 1}^{n} α (X_{i})}{n + θ_{0}} .

Therefore, X is a Dirichlet sequence if

α = δ

. The general case, where α is any kernel satisfying condition (3), is investigated in [30]. It turns out that X satisfies most properties of Dirichlet sequences. In particular, μ has the same distribution as

μ^{*} = \sum_{n} V_{n} α (Y_{n}),

where

(V_{n})

and

(Y_{n})

are independent sequences,

(Y_{n})

is i.i.d. with

Y_{1} \sim ν

, and

(V_{n})

has the stick breaking distribution. Nevertheless, as shown in the next example, X can behave quite differently from a Dirichlet sequence.

Example 2

(Example 1 continued). Let

H

be a countable partition of S such that

H \in B

and

ν (H) > 0

for all

H \in H

. Define

α (x) = \sum_{H \in H} 1_{H} (x) ν (\cdot ∣ H) = ν (\cdot ∣ H_{x}) for all x \in S

where

H_{x}

is the only element of the partition

H

, such that

x \in H

. Then, α is a regular conditional distribution for ν given

σ (H)

(i.e., condition (3) holds). If the

q_{n}

are as in Example 1 with

θ_{n} = θ_{0}

for all n, one obtains

a_{n} = \frac{θ_{0} ν + \sum_{i = 1}^{n} ν (\cdot ∣ H_{X_{i}})}{n + θ_{0}} .

Therefore,

a_{n} ≪ ν for all n \geq 0 .

(11)

This is a striking difference with respect to Dirichlet sequences. For instance, if ν is non-atomic, condition (11) yields

P (X_{i} = X_{j} f o r s o m e i \neq j) = 0

while

P (X_{i} = X_{j} f o r s o m e i \neq j) = 1

if X is a Dirichlet sequence. Note also that, for each

f \in F

,

σ^{2} = \int α (x) {(f)}^{2} μ (d x) - μ {(f)}^{2} = \sum_{H \in H} ν {(f ∣ H)}^{2} μ (H) - μ {(f)}^{2}

while

σ^{2} = μ (f^{2}) - μ {(f)}^{2}

if X is a Dirichlet sequence. Other choices of α, which make X quite different from a Dirichlet sequence, are in [30].

Example 3.

A meaningful special case is

\sum_{n} (1 - q_{n}) < \infty

. In this case,

\prod_{j = 0}^{\infty} q_{j} : = lim_{n} \prod_{j = 0}^{n} q_{j}

exists and is strictly positive. Hence, μ admits the representation

μ = ν \prod_{j = 0}^{\infty} q_{j} + \sum_{i = 1}^{\infty} α (X_{i}) (1 - q_{i - 1}) \prod_{j = i}^{\infty} q_{j} .

As an example, under conditions (2)–(3), Theorem 1 applies whenever

q_{n} {= exp {- (c + n)}^{- 2}} f o r s o m e c o n s t a n t c > 0 .

With this choice of

q_{n}

, one obtains

(1 - q_{n}) {(c + n)}^{2} \to 1

, so that

\sum_{n} (1 - q_{n}) < \infty

and μ can be written as above. Note also that

lim_{n} \frac{d_{n}}{{(c + n)}^{3 / 2}} = \sqrt{3} .

Therefore, for fixed

f \in F

, the rate of convergence of

a_{n} (f)

to

μ (f)

is

n^{- 3 / 2}

and not the usual

n^{- 1 / 2}

.

Author Contributions

Methodology, P.B., L.P. and P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 817257.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

We are grateful to Giorgio Letta and Eugenio Regazzini. They not only introduced us to probability theory, they also shared with us their enthusiasm and some of their expertise.

Conflicts of Interest

The authors declare no conflict of interest.

References

Dudley, R.M. Uniform Central Limit Theorems; Cambridge University Press: Cambridge, UK, 1999. [Google Scholar]
Van der Vaart, A.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996. [Google Scholar]
Berti, P.; Dreassi, E.; Pratelli, L.; Rigo, P. A class of models for Bayesian predictive inference. Bernoulli 2021, 27, 702–726. [Google Scholar] [CrossRef]
Berti, P.; Dreassi, E.; Pratelli, L.; Rigo, P. Asymptotics of certain conditionally identically distributed sequences. Statist. Prob. Lett. 2021, 168, 108923. [Google Scholar] [CrossRef]
Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for empirical processes based on dependent data. Electron. J. Probab. 2012, 17, 1–18. [Google Scholar] [CrossRef]
Crimaldi, I.; Pratelli, L. Convergence results for conditional expectations. Bernoulli 2005, 11, 737–745. [Google Scholar] [CrossRef]
Goggin, E.M. Convergence in distribution of conditional expectations. Ann. Probab. 1994, 22, 1097–1114. [Google Scholar] [CrossRef]
Lan, G.; Hu, Z.C.; Sun, W. Products of conditional expectation operators: Convergence and divergence. J. Theore. Probab. 2021, 34, 1012–1028. [Google Scholar] [CrossRef]
Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for a class of identically distributed random variables. Ann. Probab. 2004, 32, 2029–2052. [Google Scholar] [CrossRef]
Berti, P.; Crimaldi, I.; Pratelli, L.; Rigo, P. A central limit theorem and its applications to multicolor randomly reinforced urns. J. Appl. Probab. 2011, 48, 527–546. [Google Scholar] [CrossRef]
Berti, P.; Pratelli, L.; Rigo, P. Exchangeable sequences driven by an absolutely continuous random measure. Ann. Probab. 2013, 41, 2090–2102. [Google Scholar] [CrossRef][Green Version]
Blackwell, D.; Dubins, L.E. Merging of opinions with increasing information. Ann. Math. Statist. 1962, 33, 882–886. [Google Scholar] [CrossRef]
Cifarelli, D.M.; Regazzini, E. De Finetti’s contribution to probability and statistics. Statist. Sci. 1996, 11, 253–282. [Google Scholar] [CrossRef]
Cifarelli, D.M.; Dolera, E.; Regazzini, E. Frequentistic approximations to Bayesian prevision of exchangeable random elements. Int. J. Approx. Reason. 2016, 78, 138–152. [Google Scholar] [CrossRef]
Dolera, E.; Regazzini, E. Uniform rates of the Glivenko-Cantelli convergence and their use in approximating Bayesian inferences. Bernoulli 2019, 25, 2982–3015. [Google Scholar] [CrossRef]
Fortini, S.; Ladelli, L.; Regazzini, E. Exchangeability, predictive distributions and parametric models. Sankhyā Indian J. Stat. Ser. A 2000, 62, 86–109. [Google Scholar]
Hahn, P.R.; Martin, R.; Walker, S.G. On recursive Bayesian predictive distributions. J. Am. Stat. Assoc. 2018, 113, 1085–1093. [Google Scholar] [CrossRef]
Morvai, G.; Weiss, B. On universal algorithms for classifying and predicting stationary processes. Probab. Surv. 2021, 18, 77–131. [Google Scholar] [CrossRef]
Pitman, J. Some developments of the Blackwell-MacQueen urn scheme. Stat. Probab. Game Theory IMS Lect. Notes Mon. Ser. 1996, 30, 245–267. [Google Scholar]
Pitman, J. Combinatorial Stochastic Processes; Lectures from the XXXII Summer School in Saint-Flour; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Regazzini, E. Old and recent results on the relationship between predictive inference and statistical modeling either in nonparametric or parametric form. In Bayesian Statistics 6; Oxford University Press: Oxford, UK, 1999; pp. 571–588. [Google Scholar]
Kallenberg, O. Spreading and predictable sampling in exchangeable sequences and processes. Ann. Probab. 1988, 16, 508–534. [Google Scholar] [CrossRef]
Airoldi, E.M.; Costa, T.; Bassetti, F.; Leisen, F.; Guindani, M. Generalized species sampling priors with latent beta reinforcements. J. Am. Stat. Assoc. 2014, 109, 1466–1480. [Google Scholar] [CrossRef]
Bassetti, F.; Crimaldi, I.; Leisen, F. Conditionally identically distributed species sampling sequences. Adv. Appl. Probab. 2010, 42, 433–459. [Google Scholar] [CrossRef][Green Version]
Cassese, A.; Zhu, W.; Guindani, M.; Vannucci, M. A Bayesian nonparametric spiked process prior for dynamic model selection. Bayesian Anal. 2019, 14, 553–572. [Google Scholar] [CrossRef]
Fong, E.; Holmes, C.; Walker, S.G. Martingale posterior distributions. arXiv 2021, arXiv:2103.15671v1. [Google Scholar]
Fortini, S.; Petrone, S. Predictive construction of priors in Bayesian nonparametrics. Braz. J. Probab. Statist. 2012, 26, 423–449. [Google Scholar] [CrossRef]
Fortini, S.; Petrone, S.; Sporysheva, P. On a notion of partially conditionally identically distributed sequences. Stoch. Proc. Appl. 2018, 128, 819–846. [Google Scholar] [CrossRef]
Fortini, S.; Petrone, S. Quasi-Bayes properties of a procedure for sequential learning in mixture models. J. R. Stat. Soc. B 2020, 82, 1087–1114. [Google Scholar] [CrossRef]
Berti, P.; Dreassi, E.; Leisen, F.; Pratelli, L.; Rigo, P. Kernel based Dirichlet sequences. arXiv 2021, arXiv:2106.00114. [Google Scholar]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A Central Limit Theorem for Predictive Distributions

Abstract

1. Introduction

2. Preliminaries

3. Main Result

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics