Abstract
Let S be a Borel subset of a Polish space and $\mathcal{F}$ the set of real bounded Borel functions $f : S \to \mathbb{R}$. Let $a_n(\cdot) = P\bigl(X_{n+1} \in \cdot \mid X_1, \ldots, X_n\bigr)$ be the n-th predictive distribution corresponding to a sequence $X = (X_1, X_2, \ldots)$ of S-valued random variables. If X is conditionally identically distributed, there is a random probability measure μ on S such that $a_n(f) \overset{a.s.}{\longrightarrow} \mu(f)$ for all $f \in \mathcal{F}$. Define $D_n(f) = d_n\,\bigl\{a_n(f) - \mu(f)\bigr\}$ for all $f \in \mathcal{F}$, where $d_n$ is a positive constant. In this note, it is shown that, under some conditions on X and with a suitable choice of $d_n$, the finite-dimensional distributions of the process $D_n = \bigl\{D_n(f) : f \in \mathcal{F}\bigr\}$ stably converge to a Gaussian kernel with a known covariance structure. In addition, $E\bigl[\varphi(D_n(f)) \mid X_1, \ldots, X_n\bigr]$ converges in probability for all $f \in \mathcal{F}$ and $\varphi \in C_b(\mathbb{R})$.
Keywords:
Bayesian predictive inference; central limit theorem; conditional identity in distribution; exchangeability; predictive distribution; stable convergence
MSC:
60B10; 60G25; 60G09; 60F05; 62F15; 62M20
1. Introduction
All random elements appearing in the sequel are defined on a common probability space, say $(\Omega, \mathcal{A}, P)$. We denote by S a Borel subset of a Polish space and by $\mathcal{B}$ the Borel σ-field on S. We let
$$\mathcal{F} = \bigl\{f : f \text{ is a real bounded measurable function on } S\bigr\}.$$
Moreover, if λ is a probability measure on $\mathcal{B}$ and $f \in \mathcal{F}$, we write $\lambda(f)$ to denote
$$\lambda(f) = \int f\,d\lambda.$$
In other terms, depending on the context, λ is regarded as a function on $\mathcal{B}$ or a function on $\mathcal{F}$. This slight abuse of notation is quite usual (see, e.g., [1,2]) and very useful for the purposes of this note.
Let
$$X = (X_1, X_2, \ldots)$$
be a sequence of S-valued random variables and
$$\mathcal{F}_0 = \{\emptyset, \Omega\}, \qquad \mathcal{F}_n = \sigma(X_1, \ldots, X_n).$$
The predictive distributions of X are the random probability measures on $(S, \mathcal{B})$ given by
$$a_n(\cdot) = P\bigl(X_{n+1} \in \cdot \mid \mathcal{F}_n\bigr), \quad n \geq 0.$$
Under some conditions, there is a further random probability measure μ on $(S, \mathcal{B})$ such that
$$a_n(f) \overset{a.s.}{\longrightarrow} \mu(f) \quad\text{for each } f \in \mathcal{F}. \tag{1}$$
For instance, condition (1) holds if X is exchangeable. More generally, it holds if X is conditionally identically distributed (c.i.d.), as defined in Section 2. Note also that, since S is separable, condition (1) implies $a_n \longrightarrow \mu$ weakly a.s. Regarding $a_n$ and μ as measurable functions from $\Omega$ into the set of probability measures on $\mathcal{B}$, equipped with the topology of weak convergence, one obtains
$$a_n \overset{a.s.}{\longrightarrow} \mu \quad\text{in the topology of weak convergence}.$$
Assume condition (1), fix a sequence $(d_n)$ of positive constants, and define
$$D_n(f) = d_n\,\bigl\{a_n(f) - \mu(f)\bigr\} \quad\text{for all } f \in \mathcal{F}.$$
This note deals with the process
$$D_n = \bigl\{D_n(f) : f \in \mathcal{F}\bigr\}.$$
Our goal is to show that, under some conditions on X and with a suitable choice of the constants $d_n$, the finite-dimensional distributions of $D_n$ stably converge, as $n \to \infty$, to a certain Gaussian limit.
To be more precise, we recall that a kernel on $(S, \mathcal{B})$ is a measurable map α from S into the probability measures on $\mathcal{B}$. This means that $\alpha(x)$ is a probability measure on $\mathcal{B}$, for each $x \in S$, and the function $x \mapsto \alpha(x)(B)$ is $\mathcal{B}$-measurable for each $B \in \mathcal{B}$. In what follows, we write $\alpha(f)$ for the function on S given by
$$\alpha(f)(x) = \alpha(x)(f) = \int f(y)\,\alpha(x)(dy).$$
Next, as in [3], suppose the predictive distributions of X satisfy the recursive equation
$$a_n = q_n\,a_{n-1} + (1 - q_n)\,\alpha(X_n) \quad\text{for all } n \geq 1, \tag{2}$$
where $q_n \in [0, 1)$ are constants and α is a kernel on $(S, \mathcal{B})$. Moreover, let
$$\nu(\cdot) = P(X_1 \in \cdot)$$
be the marginal distribution of $X_1$. Under condition (2), X is c.i.d. whenever α is a regular conditional distribution for ν given a sub-σ-field $\mathcal{G} \subset \mathcal{B}$; see ([3] Section 5). Hence, we assume
$$E_\nu\bigl(f \mid \mathcal{G}\bigr) = \alpha(f), \quad \nu\text{-a.s.}, \tag{3}$$
for all $f \in \mathcal{F}$ and some sub-σ-field $\mathcal{G} \subset \mathcal{B}$ (in particular, $\alpha(f)$ is $\mathcal{G}$-measurable for each $f \in \mathcal{F}$). For instance, condition (3) holds if
$$\alpha(x) = \delta_x \quad\text{for all } x \in S,$$
where $\delta_x$ denotes the unit mass at the point x (just let $\mathcal{G} = \mathcal{B}$). In addition, we assume
$$\sum_n\,(1 - q_n)^2 < \infty \quad\text{and}\quad \lim_n\,\frac{(1 - q_n)^2}{\sum_{k \geq n}\,(1 - q_k)^2} = 0,$$
and we take $d_n = u_n^{-1/2}$, where
$$u_n = \sum_{k > n}\,(1 - q_k)^2.$$
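As a concrete illustration of this setup, the following minimal Python sketch simulates a sequence driven by the recursion (2). All specific choices (a finite state space, uniform ν, the kernel $\alpha(x) = \delta_x$, and $q_n = n/(n+1)$) are demo assumptions rather than requirements of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

s = 4                                    # demo state space S = {0, ..., s-1}
nu = np.full(s, 1.0 / s)                 # marginal law of X_1 (uniform, for the demo)
alpha = np.eye(s)                        # kernel alpha(x) = delta_x (row x)
q = lambda n: n / (n + 1.0)              # demo choice: 1 - q_n = 1/(n+1)

N = 10_000
a = nu.copy()                            # a_0 = nu
for n in range(1, N + 1):
    x = rng.choice(s, p=a)               # X_n drawn from the current predictive a_{n-1}
    a = q(n) * a + (1.0 - q(n)) * alpha[x]   # recursion (2)

print("a_N ≈", np.round(a, 4))           # for large N, a_N(f) approximates mu(f)
```

For this choice, $1 - q_n = 1/(n+1)$, so $\sum_n (1-q_n)^2 < \infty$ and $(1-q_n)^2 / \sum_{k \geq n} (1-q_k)^2$ is of order $1/n$; the assumptions above are thus satisfied.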
In this framework, it is shown that
$$\bigl(D_n(f_1), \ldots, D_n(f_m)\bigr) \longrightarrow \mathcal{N}(0, \Sigma) \quad\text{stably} \tag{4}$$
for all $m \geq 1$ and all $f_1, \ldots, f_m \in \mathcal{F}$, where Σ is the random covariance matrix with entries
$$\sigma_{j,k} = \mu\bigl(\alpha(f_j)\,\alpha(f_k)\bigr) - \mu(f_j)\,\mu(f_k).$$
We actually prove something more than (4). Let $C_b(\mathbb{R})$ denote the set of real bounded continuous functions on $\mathbb{R}$. Then, it is shown that
$$E\bigl[\varphi\bigl(D_n(f)\bigr) \mid \mathcal{F}_n\bigr] \overset{P}{\longrightarrow} \mathcal{N}\bigl(0, \sigma^2(f)\bigr)(\varphi) \tag{5}$$
for all $f \in \mathcal{F}$ and $\varphi \in C_b(\mathbb{R})$, where
$$\sigma^2(f) = \mu\bigl(\alpha(f)^2\bigr) - \mu(f)^2.$$
Based on (5), it is not hard to deduce condition (4).
Before concluding the Introduction, several remarks are in order.
- (i)
- A remarkable special case is $\alpha(x) = \delta_x$ for all $x \in S$. Indeed, Equation (2) holds with $\alpha(x) = \delta_x$ in some meaningful situations, including Dirichlet sequences; see ([3] Section 4) for other examples. Thus, suppose $\alpha(x) = \delta_x$. Then, the above formulae reduce to
$$\sigma^2(f) = \mu(f^2) - \mu(f)^2 \quad\text{and}\quad \sigma_{j,k} = \mu(f_j\,f_k) - \mu(f_j)\,\mu(f_k).$$
Moreover, if ν is non-atomic and
$$\sum_n\,(1 - q_n) = \infty,$$
then μ takes the form
$$\mu = \sum_{j=1}^\infty V_j\,\delta_{Z_j},$$
where $(V_j)$ and $(Z_j)$ are independent sequences and $(Z_j)$ is i.i.d. with $Z_1 \sim \nu$; see ([3] Theorem 20) and [4] for details (a simulation sketch of this representation is given after this list of remarks).
- (ii)
- Let $l^\infty(G)$ be the set of real bounded functions on G, where G is any subset of $\mathcal{F}$. For instance, if $S = \mathbb{R}$, one could take $G = \bigl\{1_{(-\infty, t]} : t \in \mathbb{R}\bigr\}$. In view of (4), a natural question is whether $\{D_n(f) : f \in G\}$ has a limit in distribution when $l^\infty(G)$ is equipped with a suitable distance. As an example, $l^\infty(G)$ could be equipped with the uniform distance (as in [1,2]) or with some weaker distance (as in [5]). Even if natural, this question is neglected in this note. We hope and plan to investigate it in a forthcoming paper.
- (iii)
- For fixed $f \in \mathcal{F}$, condition (4) provides some information on the convergence rate of $a_n(f)$ to $\mu(f)$. Define
$$W_n = c_n\,\bigl\{a_n(f) - \mu(f)\bigr\},$$
where $(c_n)$ is any sequence of constants. Then, condition (4) yields $W_n \overset{P}{\longrightarrow} 0$ whenever $c_n/d_n \to 0$. Furthermore, $W_n$ fails to converge to 0 in probability provided $c_n/d_n \to \infty$ and $\sigma^2(f) > 0$ a.s.
- (iv)
- The condition $\lim_n\,(1-q_n)^2 / \sum_{k \geq n}\,(1-q_k)^2 = 0$ is just a technical assumption which guarantees that, asymptotically, there are no dominating terms. In a sense, this condition is analogous to the weak Lindeberg condition in the classical CLT for independent summands.
- (v)
- From a Bayesian point of view, μ can be seen as a random parameter of the data sequence X. This is quite clear if X is exchangeable, for, in this case, X is conditionally i.i.d. given μ. If X is only c.i.d., the role of μ is not as crucial, but μ still contributes to specify the probability distribution of X; see ([3] Section 2.1). Thus, in a Bayesian framework, conditions (4)–(5) may be useful to make (asymptotic) inference about μ. To this end, an alternative could be proving a limit theorem for $c_n\,\{\hat{\mu}_n - \mu\}$, where $c_n$ is a suitable constant and $\hat{\mu}_n$ the empirical measure. However, $a_n$ has two advantages with respect to $\hat{\mu}_n$: it usually converges at a better rate, and the variance of the limit distribution is smaller; see, e.g., Example 3.
- (vi)
- Conditions (4)–(5) are our main results. They can be motivated in at least two ways. Firstly, from the theoretical perspective, conditions (4)–(5) fit into the results concerning the asymptotic behavior of conditional expectations (see, e.g., [6,7,8] and references therein). Secondly, from the practical perspective, conditions (4)–(5) play a role in all those fields where predictive distributions are basic objects. The main example is Bayesian predictive inference. Indeed, the predictive distributions investigated in this note have been introduced in connection with Bayesian prediction problems; see [3]. Another example is the asymptotic behavior of certain urn schemes. Related subjects, where (4)–(5) are potentially useful, are empirical processes for dependent data, Glivenko-Cantelli-type theorems and merging of opinions. Without any claim of being exhaustive, a list of references is: [3,5,9,10,11,12,13,14,15,16,17,18,19,20,21].
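As anticipated in remark (i), a draw of a random measure of the form $\mu = \sum_j V_j\,\delta_{Z_j}$ is easy to simulate. In the sketch below, the stick-breaking form of the weights $(V_j)$ (correct in the Dirichlet special case discussed in Example 1, with parameter θ) and the standard normal choice for ν are assumptions made for the demo, and the series is truncated at J atoms.

```python
import numpy as np

rng = np.random.default_rng(1)

theta, J = 2.0, 500                       # demo: stick-breaking parameter and truncation
B = rng.beta(1.0, theta, size=J)          # stick-breaking fractions
V = B * np.concatenate(([1.0], np.cumprod(1.0 - B)[:-1]))   # weights V_j
Z = rng.standard_normal(J)                # Z_j i.i.d. from a non-atomic nu (demo: N(0,1))

f = np.tanh                               # a bounded f in F
print("mu(f) ≈", float(np.sum(V * f(Z))))  # mu(f) = sum_j V_j f(Z_j), up to truncation
```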
2. Preliminaries
In this note, $\mathcal{N}(0, C)$ denotes the Gaussian law on the Borel sets of $\mathbb{R}^m$ with mean 0 and covariance matrix C, where C is symmetric and positive semidefinite. If $m = 1$ and $c \geq 0$ is a scalar, we write $\mathcal{N}(0, c)$ instead of $\mathcal{N}(0, C)$, and
$$\mathcal{N}(0, C)(\varphi) = \int \varphi(x)\,\mathcal{N}(0, C)(dx)$$
for all bounded measurable $\varphi : \mathbb{R}^m \to \mathbb{R}$. Note that, if C is a random covariance matrix, $\mathcal{N}(0, C)$ is a random probability measure on the Borel sets of $\mathbb{R}^m$.
Let us briefly recall stable convergence. Let $(Y_n)$ be a sequence of $\mathbb{R}^m$-valued random variables. Fix a random probability measure K on the Borel sets of $\mathbb{R}^m$ and define
$$K_A(\cdot) = \frac{E\bigl[K(\cdot)\,1_A\bigr]}{P(A)} \quad\text{for each } A \in \mathcal{A} \text{ with } P(A) > 0.$$
Each $K_A$ is a probability measure on the Borel sets of $\mathbb{R}^m$. Then, $Y_n$ converges stably to K, written $Y_n \longrightarrow K$ stably, if
$$P\bigl(Y_n \in \cdot \mid A\bigr) \longrightarrow K_A \quad\text{weakly, for each } A \in \mathcal{A} \text{ with } P(A) > 0.$$
In particular, $Y_n$ converges in distribution to $K_\Omega(\cdot) = E\bigl[K(\cdot)\bigr]$ (just take $A = \Omega$). However, stable convergence is stronger than convergence in distribution. To see this, take a further $\mathbb{R}^m$-valued random variable Y. Then, $Y_n \overset{P}{\longrightarrow} Y$ if, and only if, $Y_n \longrightarrow \delta_Y$ stably. Thus, stable convergence is strictly connected to convergence in probability. Moreover, $(Y_n, Y) \longrightarrow K \times \delta_Y$ stably whenever $Y_n \longrightarrow K$ stably. Therefore, if $Y_n$ converges stably, $(Y_n, X)$ still converges stably for any S-valued random variable X.
We next turn to conditional identity in distribution. Say that X is conditionally identically distributed (c.i.d.) if
$$P\bigl(X_k \in \cdot \mid \mathcal{F}_n\bigr) = P\bigl(X_{n+1} \in \cdot \mid \mathcal{F}_n\bigr) \quad\text{a.s. for all } k > n \geq 0.$$
Thus, at each time n, the future observations $(X_k : k > n)$ are identically distributed given the past $\mathcal{F}_n$. This is actually weaker than exchangeability. Indeed, X is exchangeable if, and only if, it is stationary and c.i.d.
C.i.d. sequences were introduced in [9,22] and then investigated in various papers; see, e.g., [3,4,5,11,23,24,25,26,27,28,29].
The asymptotics of c.i.d. sequences is similar to that of exchangeable ones. To see this, suppose X is c.i.d. and define the empirical measures
$$\hat{\mu}_n = \frac{1}{n}\,\sum_{i=1}^n\,\delta_{X_i}.$$
Then, there is a random probability measure μ on $(S, \mathcal{B})$ such that
$$\hat{\mu}_n(f) \overset{a.s.}{\longrightarrow} \mu(f) \quad\text{for each } f \in \mathcal{F}.$$
It follows that
$$a_n(f) = \lim_m\,E\bigl[\hat{\mu}_m(f) \mid \mathcal{F}_n\bigr] = E\bigl[\mu(f) \mid \mathcal{F}_n\bigr] \quad\text{a.s.}$$
for all $n \geq 0$ and $f \in \mathcal{F}$. Therefore, as in the exchangeable case, the predictive distributions can be written as
$$a_n(f) = E\bigl[\mu(f) \mid \mathcal{F}_n\bigr].$$
Using the martingale convergence theorem, this implies
$$a_n(f) \overset{a.s.}{\longrightarrow} \mu(f) \quad\text{for each } f \in \mathcal{F},$$
namely, condition (1) holds.
Furthermore, X is asymptotically exchangeable, in the sense that the probability distribution of the shifted sequence $(X_n, X_{n+1}, \ldots)$ converges weakly to an exchangeable probability measure on $S^\infty$.
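The two convergences above can be observed numerically. The following sketch (same demo assumptions as before: $\alpha(x) = \delta_x$, $q_n = n/(n+1)$, uniform ν on a finite S) tracks, along one trajectory, both the empirical measure $\hat{\mu}_n$ and the predictive $a_n$; they stabilize around the same random limit μ.

```python
import numpy as np

rng = np.random.default_rng(2)

s, N = 3, 20_000
a = np.full(s, 1.0 / s)                  # a_0 = nu (uniform, demo choice)
counts = np.zeros(s)
for n in range(1, N + 1):
    x = rng.choice(s, p=a)               # X_n ~ a_{n-1}
    counts[x] += 1.0
    a = (n / (n + 1.0)) * a + (1.0 / (n + 1.0)) * np.eye(s)[x]

print("empirical  hat{mu}_N :", np.round(counts / N, 4))
print("predictive a_N       :", np.round(a, 4))   # both approximate the same mu
```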
Finally, we state a technical result to be used later on.
Lemma 1.
Let $(Y_n : n \geq 1)$ be a sequence of real integrable random variables, adapted to the filtration $(\mathcal{G}_n : n \geq 0)$, and
$$Z_n = E\bigl[Y_{n+1} \mid \mathcal{G}_n\bigr].$$
Let V be a real non-negative random variable and $(b_n)$ an increasing sequence of constants, such that $b_n \longrightarrow \infty$ and $b_n / b_{n+1} \longrightarrow 1$. Suppose $(Y_n)$ is uniformly integrable, $Z_n \overset{a.s.}{\longrightarrow} Z$ for some random variable Z, and define
$$W_n = b_n\,(Z_n - Z).$$
Then,
$$E\bigl[\varphi(W_n) \mid \mathcal{G}_n\bigr] \;\overset{P}{\longrightarrow}\; \mathcal{N}(0, V)(\varphi) \quad\text{for each } \varphi \in C_b(\mathbb{R}),$$
provided
$$b_n^2\,\sum_{k \geq n}\,(Z_{k+1} - Z_k)^2 \;\overset{P}{\longrightarrow}\; V, \tag{6}$$
$$b_n\,E\Bigl[\sup_{k \geq n}\,\bigl|Z_{k+1} - Z_k\bigr|\Bigr] \;\longrightarrow\; 0, \tag{7}$$
$$\sum_k\,\bigl(Z_{k+1} - Z_k\bigr)^2 < \infty \quad\text{a.s.} \tag{8}$$
Proof.
Just repeat the proof of ([10] Theorem 1) with $b_n$ in the place of $\sqrt{n}$. □
3. Main Result
Let us go back to the notation of Section 1. Recall that $q_n \in [0, 1)$ is a constant for each $n \geq 1$ and $d_n = u_n^{-1/2}$, where $u_n = \sum_{k > n}\,(1 - q_k)^2$. We aim to prove the following CLT.
Theorem 1.
Assume conditions (2)–(3) and
$$\sum_n\,(1 - q_n)^2 < \infty \quad\text{and}\quad \lim_n\,\frac{(1 - q_n)^2}{\sum_{k \geq n}\,(1 - q_k)^2} = 0.$$
Then, there is a random probability measure μ on $(S, \mathcal{B})$ such that
$$E\bigl[\varphi\bigl(D_n(f)\bigr) \mid \mathcal{F}_n\bigr] \overset{P}{\longrightarrow} \mathcal{N}\bigl(0, \sigma^2(f)\bigr)(\varphi)$$
for all $f \in \mathcal{F}$ and $\varphi \in C_b(\mathbb{R})$, where
$$\sigma^2(f) = \mu\bigl(\alpha(f)^2\bigr) - \mu(f)^2.$$
As a consequence,
$$\bigl(D_n(f_1), \ldots, D_n(f_m)\bigr) \longrightarrow \mathcal{N}(0, \Sigma) \quad\text{stably}$$
for all $m \geq 1$ and all $f_1, \ldots, f_m \in \mathcal{F}$, where the covariance matrix Σ has entries
$$\sigma_{j,k} = \mu\bigl(\alpha(f_j)\,\alpha(f_k)\bigr) - \mu(f_j)\,\mu(f_k).$$
Proof.
Due to conditions (2)–(3), X is c.i.d.; see ([3] Section 5). Hence, as noted in Section 2, there is a random probability measure μ on $(S, \mathcal{B})$ such that
$$a_n(f) = E\bigl[\mu(f) \mid \mathcal{F}_n\bigr] \quad\text{a.s. for all } f \in \mathcal{F}.$$
By martingale convergence, it follows that $a_n(f) \overset{a.s.}{\longrightarrow} \mu(f)$ for all $f \in \mathcal{F}$.
We next prove condition (5). Fix $f \in \mathcal{F}$ and define
$$Y_n = f(X_n), \qquad \mathcal{G}_n = \mathcal{F}_n, \qquad b_n = d_n, \qquad Z = \mu(f), \qquad V = \sigma^2(f).$$
Then, $(Y_n)$ is uniformly integrable (for f is bounded) and $(d_n)$ satisfies the conditions of Lemma 1: it is increasing, $d_n \to \infty$ (since $u_n \to 0$), and $d_n / d_{n+1} = \sqrt{1 - (1-q_{n+1})^2 / u_n} \to 1$. Moreover,
$$Z_n = E\bigl[Y_{n+1} \mid \mathcal{G}_n\bigr] = a_n(f) \overset{a.s.}{\longrightarrow} \mu(f),$$
so that $W_n = d_n\,\{a_n(f) - \mu(f)\} = D_n(f)$. Therefore, Lemma 1 applies. Hence, to prove (5), it suffices to check conditions (6)–(8).
Let $\|f\| = \sup_{x \in S}\,|f(x)|$. Since
$$\sum_k\,(Z_{k+1} - Z_k)^2 \leq 4\,\|f\|^2\,\sum_k\,(1 - q_k)^2 < \infty \quad\text{a.s.},$$
condition (8) is trivially true. Moreover, condition (2) implies
$$Z_{k+1} - Z_k = a_{k+1}(f) - a_k(f) = (1 - q_{k+1})\,\bigl\{\alpha(X_{k+1})(f) - a_k(f)\bigr\}.$$
Hence, condition (7) holds, since
$$d_n\,E\Bigl[\sup_{k \geq n}\,|Z_{k+1} - Z_k|\Bigr] \;\leq\; 2\,\|f\|\,\sup_{k > n}\,\frac{1 - q_k}{\sqrt{u_n}} \;\leq\; 2\,\|f\|\,\sup_{k > n}\,\frac{1 - q_k}{\Bigl(\sum_{j \geq k}\,(1 - q_j)^2\Bigr)^{1/2}} \;\longrightarrow\; 0.$$
It remains to prove condition (6), namely
$$d_n^2\,\sum_{k \geq n}\,(1 - q_{k+1})^2\,\bigl\{\alpha(X_{k+1})(f) - a_k(f)\bigr\}^2 \;\overset{P}{\longrightarrow}\; \sigma^2(f).$$
First note that, since $d_n^2\,u_n = 1$, the weights $d_n^2\,(1 - q_{k+1})^2$, $k \geq n$, are non-negative and sum to 1; hence, if $(R_k)$ is a bounded sequence of random variables such that $R_k \overset{a.s.}{\longrightarrow} R$ as $k \to \infty$, one obtains
$$d_n^2\,\sum_{k \geq n}\,(1 - q_{k+1})^2\,R_k \;\overset{a.s.}{\longrightarrow}\; R.$$
Next, define
$$U_k = \alpha(f)(X_{k+1})^2 - a_k\bigl(\alpha(f)^2\bigr) \quad\text{and}\quad V_k = a_k(f)\,\bigl\{\alpha(f)(X_{k+1}) - a_k\bigl(\alpha(f)\bigr)\bigr\}.$$
Then,
$$E\bigl[U_k \mid \mathcal{F}_k\bigr] = E\bigl[V_k \mid \mathcal{F}_k\bigr] = 0, \qquad |U_k| \leq 2\,\|f\|^2, \qquad |V_k| \leq 2\,\|f\|^2,$$
since the conditional distribution of $X_{k+1}$ given $\mathcal{F}_k$ is $a_k$. Moreover,
$$E\biggl[\Bigl(d_n^2\,\sum_{k \geq n}\,(1 - q_{k+1})^2\,U_k\Bigr)^2\biggr] = d_n^4\,\sum_{k \geq n}\,(1 - q_{k+1})^4\,E\bigl[U_k^2\bigr] \leq 4\,\|f\|^4\,\sup_{k > n}\,\frac{(1 - q_k)^2}{u_n} \longrightarrow 0.$$
Therefore,
$$d_n^2\,\sum_{k \geq n}\,(1 - q_{k+1})^2\,U_k \;\overset{P}{\longrightarrow}\; 0.$$
By the same argument, it follows that
$$d_n^2\,\sum_{k \geq n}\,(1 - q_{k+1})^2\,V_k \;\overset{P}{\longrightarrow}\; 0.$$
In addition, as proved in the Claim below,
$$\mu\bigl(\alpha(f)\bigr) = \mu(f) \quad\text{a.s.}$$
Collecting all pieces together, since
$$\bigl\{\alpha(X_{k+1})(f) - a_k(f)\bigr\}^2 = U_k - 2\,V_k + a_k\bigl(\alpha(f)^2\bigr) - 2\,a_k(f)\,a_k\bigl(\alpha(f)\bigr) + a_k(f)^2$$
and $a_k\bigl(\alpha(f)^2\bigr) - 2\,a_k(f)\,a_k\bigl(\alpha(f)\bigr) + a_k(f)^2$ converges a.s. to $\mu\bigl(\alpha(f)^2\bigr) - 2\,\mu(f)\,\mu\bigl(\alpha(f)\bigr) + \mu(f)^2 = \sigma^2(f)$, one finally obtains
$$d_n^2\,\sum_{k \geq n}\,(1 - q_{k+1})^2\,\bigl\{\alpha(X_{k+1})(f) - a_k(f)\bigr\}^2 \;\overset{P}{\longrightarrow}\; \sigma^2(f).$$
Hence, condition (6) holds.
This concludes the proof of (5). We next prove that (5) ⇒ (4). Let $m \geq 1$ and $f_1, \ldots, f_m \in \mathcal{F}$. Fix $t = (t_1, \ldots, t_m) \in \mathbb{R}^m$ and define
$$f = \sum_{j=1}^m\,t_j\,f_j.$$
Moreover, for each $A \in \mathcal{A}$ with $P(A) > 0$, define the probability measure
$$K_A(\cdot) = \frac{E\bigl[\mathcal{N}(0, \Sigma)(\cdot)\,1_A\bigr]}{P(A)}.$$
We have to show that
$$P\Bigl(\bigl(D_n(f_1), \ldots, D_n(f_m)\bigr) \in \cdot \mid A\Bigr) \longrightarrow K_A \quad\text{weakly} \tag{9}$$
for all $A \in \mathcal{A}$ with $P(A) > 0$. To this end, call $\phi_A$ the characteristic function of $K_A$, namely
$$\phi_A(t) = \int e^{\,i\,t^{\top}x}\,K_A(dx) = \frac{E\bigl[\exp\bigl(-\frac{1}{2}\,t^{\top}\Sigma\,t\bigr)\,1_A\bigr]}{P(A)}.$$
Letting $\varphi(x) = e^{\,i\,x}$ (i.e., applying condition (5) to the real and the imaginary parts of φ), and noting that $D_n(f) = \sum_j t_j\,D_n(f_j)$ and $\sigma^2(f) = t^{\top}\Sigma\,t$, one obtains
$$E\bigl[e^{\,i\,D_n(f)} \mid \mathcal{F}_n\bigr] \;\overset{P}{\longrightarrow}\; \exp\Bigl(-\frac{1}{2}\,t^{\top}\Sigma\,t\Bigr).$$
Therefore, condition (5) yields
$$E\bigl[e^{\,i\,D_n(f)}\bigr] \longrightarrow E\Bigl[\exp\Bigl(-\frac{1}{2}\,t^{\top}\Sigma\,t\Bigr)\Bigr] = \phi_\Omega(t)$$
for each $t \in \mathbb{R}^m$. Hence, condition (9) holds for $A = \Omega$.
for each . Hence, condition (9) holds for . Next, suppose and . Then, for large n, one obtains
Hence, for each , condition (5) still implies
Therefore, condition (9) holds whenever and . Based on this fact, by standard arguments, condition (9) easily follows for each .
To conclude the proof of the Theorem, it remains only to show that:
Claim:
$$\mu\bigl(\alpha(f)\bigr) = \mu(f) \quad\text{a.s. for all } f \in \mathcal{F}.$$
Proof of the Claim:
By (3), α is a regular conditional distribution for ν given a sub-σ-field $\mathcal{G}$ of $\mathcal{B}$, where ν is the marginal distribution of $X_1$. Therefore, as proved in ([3] Lemma 6), there is a set $A \in \mathcal{B}$ such that $\nu(A) = 1$ and
$$\alpha(x)\bigl(\alpha(f)\bigr) = \alpha(x)(f) \quad\text{for all } x \in A \text{ and } f \in \mathcal{F}.$$
Since X is c.i.d. (and, thus, identically distributed) one also obtains
$$P(X_n \in A) = \nu(A) = 1 \quad\text{for all } n \geq 1.$$
Having noted these facts, fix $f \in \mathcal{F}$. Since $X_1 \sim \nu$ and α is a regular conditional distribution for ν,
$$a_0\bigl(\alpha(f)\bigr) = \nu\bigl(\alpha(f)\bigr) = \nu(f) = a_0(f).$$
Moreover, if $a_n\bigl(\alpha(f)\bigr) = a_n(f)$ a.s. for some $n \geq 0$, then
$$a_{n+1}\bigl(\alpha(f)\bigr) = q_{n+1}\,a_n\bigl(\alpha(f)\bigr) + (1 - q_{n+1})\,\alpha(X_{n+1})\bigl(\alpha(f)\bigr) = q_{n+1}\,a_n(f) + (1 - q_{n+1})\,\alpha(X_{n+1})(f) = a_{n+1}(f) \quad\text{a.s.},$$
where the second equality is because $X_{n+1} \in A$ a.s. By induction, one obtains $a_n\bigl(\alpha(f)\bigr) = a_n(f)$ a.s. for each $n \geq 0$. Hence,
$$\mu\bigl(\alpha(f)\bigr) = \lim_n\,a_n\bigl(\alpha(f)\bigr) = \lim_n\,a_n(f) = \mu(f) \quad\text{a.s.} \qquad \square$$
We do not know whether $E\bigl[\varphi(D_n(f)) \mid \mathcal{F}_n\bigr]$ converges a.s. (and not only in probability) under the conditions of Theorem 1. However, it can be shown that it converges a.s. under slightly stronger conditions on the sequence $(q_n)$.
Under conditions (2)–(3), for Theorem 1 to work, it suffices that
$$\lim_n\,n^{a}\,(1 - q_n) = c \quad\text{for some constants } a > 1/2 \text{ and } c > 0. \tag{10}$$
In addition, if (10) holds, then
$$d_n = \Bigl(\sum_{k > n}\,(1 - q_k)^2\Bigr)^{-1/2} \sim \frac{\sqrt{2a - 1}}{c}\;n^{\,a - \frac{1}{2}}.$$
Hence, letting $d_n = n^{\,a - \frac{1}{2}}$, one obtains
$$\bigl(D_n(f_1), \ldots, D_n(f_m)\bigr) \longrightarrow \mathcal{N}\Bigl(0,\,\frac{c^2}{2a - 1}\,\Sigma\Bigr) \quad\text{stably}$$
for all $m \geq 1$ and all $f_1, \ldots, f_m \in \mathcal{F}$, provided conditions (2), (3) and (10) hold.
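The asymptotic relation between $d_n$ and $n^{a - 1/2}$ under condition (10) can be checked numerically. The sketch below uses the demo choice $1 - q_n = c\,n^{-a}$ exactly, with $a = 3/2$ and $c = 1/2$, and truncates the tail sums at a large K.

```python
import numpy as np

a_exp, c = 1.5, 0.5                       # demo exponent and constant in (10)
K = 2_000_000                             # truncation of the infinite tail sums
one_minus_q = c * np.arange(1, K + 1, dtype=float) ** (-a_exp)
tail = np.cumsum((one_minus_q ** 2)[::-1])[::-1]   # tail[n] = sum_{k > n} (1-q_k)^2

for n in (10, 100, 1_000, 10_000):
    d_n = tail[n] ** (-0.5)               # d_n = (sum_{k>n} (1-q_k)^2)^(-1/2)
    approx = (np.sqrt(2 * a_exp - 1) / c) * n ** (a_exp - 0.5)
    print(n, d_n / approx)                # the ratio approaches 1
```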
We close this note with some examples.
Example 1.
Let
$$q_n = \frac{\theta_n + n - 1}{\theta_n + n},$$
where $(\theta_n)$ is a bounded increasing sequence with $\theta_1 > 0$. Then, X is c.i.d. (because of (2)–(3)) but is exchangeable if and only if $\theta_n = \theta_1$ for all n. In any case, since condition (10) holds with $a = c = 1$, Theorem 1 applies and $d_n$ can be replaced by $\sqrt{n}$. Letting $d_n = \sqrt{n}$, it follows that
$$\Bigl(\sqrt{n}\,\bigl\{a_n(f_1) - \mu(f_1)\bigr\}, \ldots, \sqrt{n}\,\bigl\{a_n(f_m) - \mu(f_m)\bigr\}\Bigr) \longrightarrow \mathcal{N}(0, \Sigma) \quad\text{stably}.$$
It is worth noting that, in the special case $\theta_n = \theta$ for all n, the predictive distributions of X reduce to
$$a_n = \frac{\theta\,\nu + \sum_{i=1}^n\,\alpha(X_i)}{\theta + n}.$$
Therefore, X is a Dirichlet sequence if $\alpha(x) = \delta_x$. The general case, where α is any kernel satisfying condition (3), is investigated in [30]. It turns out that X satisfies most properties of Dirichlet sequences. In particular, μ has the same distribution as
$$\sum_{j=1}^\infty\,V_j\,\alpha(Z_j),$$
where $(V_j)$ and $(Z_j)$ are independent sequences, $(Z_j)$ is i.i.d. with $Z_1 \sim \nu$, and $(V_j)$ has the stick breaking distribution. Nevertheless, as shown in the next example, X can behave quite differently from a Dirichlet sequence.
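For a numerical glance at Example 1, one can approximate $D_n(f) = \sqrt{n}\,\{a_n(f) - \mu(f)\}$ by estimating $\mu(f)$ with the predictive at a much larger horizon N. The choices below ($S = \{0, 1\}$, $\theta = 1$, ν = Bernoulli(1/2), $f = 1_{\{1\}}$, and the horizons n, N) are demo assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_draw(n=200, N=5_000):
    a = np.array([0.5, 0.5])                 # a_0 = nu = Bernoulli(1/2), demo choice
    a_n = None
    for k in range(1, N + 1):                # theta = 1, i.e., q_k = k/(k+1)
        x = rng.choice(2, p=a)
        a = (k / (k + 1.0)) * a + (1.0 / (k + 1.0)) * np.eye(2)[x]
        if k == n:
            a_n = a.copy()
    return np.sqrt(n) * (a_n[1] - a[1])      # f = indicator of {1}; a_N[1] estimates mu(f)

sample = np.array([one_draw() for _ in range(200)])
print("mean ≈", round(float(sample.mean()), 3), " variance ≈", round(float(sample.var()), 3))
# The limit is a mixture of centered Gaussians with random variance
# sigma^2(f) = mu(f){1 - mu(f)}, since alpha(x) = delta_x here.
```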
Example 2
(Example 1 continued). Let $\{H_1, H_2, \ldots\}$ be a countable partition of S such that $H_j \in \mathcal{B}$ and $\nu(H_j) > 0$ for all j. Define
$$\alpha(x) = \nu\bigl(\cdot \mid H_x\bigr) = \frac{\nu\bigl(\cdot \cap H_x\bigr)}{\nu(H_x)},$$
where $H_x$ is the only element of the partition $\{H_1, H_2, \ldots\}$, such that $x \in H_x$. Then, α is a regular conditional distribution for ν given $\mathcal{G} = \sigma(H_1, H_2, \ldots)$ (i.e., condition (3) holds). If the $q_n$ are as in Example 1 with $\theta_n = \theta$ for all n, one obtains
$$a_n = \frac{\theta\,\nu + \sum_{i=1}^n\,\nu\bigl(\cdot \mid H_{X_i}\bigr)}{\theta + n}.$$
Therefore,
$$\mu \ll \nu \quad\text{a.s.} \tag{11}$$
This is a striking difference with respect to Dirichlet sequences. For instance, if ν is non-atomic, condition (11) yields
$$\mu\{x\} = 0 \quad\text{for all } x \in S, \text{ a.s.},$$
while μ is a.s. discrete if X is a Dirichlet sequence. Note also that, for each $n \geq 1$,
$$P\bigl(X_{n+1} \in \{X_1, \ldots, X_n\}\bigr) = 0 \quad\text{whenever ν is non-atomic},$$
while $P\bigl(X_{n+1} \in \{X_1, \ldots, X_n\}\bigr) > 0$ if X is a Dirichlet sequence. Other choices of α, which make X quite different from a Dirichlet sequence, are in [30].
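A small sketch of Example 2 (demo assumptions: $S = \{0, \ldots, 5\}$, uniform ν, a two-block partition, θ = 1) shows why $\mu \ll \nu$: within each block, the predictive remains proportional to ν, and only the block weights are random.

```python
import numpy as np

rng = np.random.default_rng(5)

s = 6
nu = np.full(s, 1.0 / s)                       # demo: uniform nu
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]
alpha = np.zeros((s, s))
for H in blocks:
    cond = np.zeros(s)
    cond[H] = nu[H] / nu[H].sum()              # nu(. | H)
    alpha[H] = cond                            # alpha(x) = nu(. | H_x) for every x in H

a = nu.copy()
for n in range(1, 5_001):                      # theta = 1: q_n = n/(n+1)
    x = rng.choice(s, p=a)
    a = (n / (n + 1.0)) * a + (1.0 / (n + 1.0)) * alpha[x]

print(np.round(a, 4))   # uniform within each block: only the block weights are random
```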
Example 3.
A meaningful special case is $\sum_n\,(1 - q_n) < \infty$. In this case,
$$L = \prod_{j=1}^\infty\,q_j = \lim_n\,\prod_{j=1}^n\,q_j$$
exists and is strictly positive. Hence, μ admits the representation
$$\mu = L\,\nu + \sum_{k=1}^\infty\,(1 - q_k)\,\Bigl(\prod_{j > k}\,q_j\Bigr)\,\alpha(X_k).$$
As an example, under conditions (2)–(3), Theorem 1 applies whenever
$$1 - q_n = \frac{c}{n^{3/2}} \quad\text{for some constant } 0 < c < 1.$$
With this choice of $q_n$, one obtains condition (10) with $a = 3/2$, so that $\sum_n\,(1 - q_n) < \infty$ and μ can be written as above. Note also that
$$d_n = \Bigl(\sum_{k > n}\,(1 - q_k)^2\Bigr)^{-1/2} = \Bigl(c^2\,\sum_{k > n}\,k^{-3}\Bigr)^{-1/2} \sim \frac{\sqrt{2}}{c}\;n.$$
Therefore, for fixed $f \in \mathcal{F}$, the rate of convergence of $a_n(f)$ to $\mu(f)$ is n and not the usual $\sqrt{n}$.
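The representation of μ in Example 3 is the limit, as the horizon grows, of the unrolled recursion (2); the finite-horizon identity behind it can be verified directly. Demo assumptions below: $S = \{0, 1, 2\}$, $\alpha(x) = \delta_x$, $c = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(4)

s, c, N = 3, 0.5, 5_000
qs = 1.0 - c * np.arange(1, N + 1, dtype=float) ** (-1.5)   # 1 - q_n = c / n^(3/2)
nu = np.full(s, 1.0 / s)

a = nu.copy()
xs = np.empty(N, dtype=int)
for n in range(N):
    xs[n] = rng.choice(s, p=a)
    a = qs[n] * a + (1.0 - qs[n]) * np.eye(s)[xs[n]]        # recursion (2)

suffix = np.append(np.cumprod(qs[::-1])[::-1], 1.0)         # suffix[k] = prod_{j > k} q_j
unrolled = suffix[0] * nu                                   # (prod_{j <= N} q_j) * nu
for k in range(N):
    unrolled += (1.0 - qs[k]) * suffix[k + 1] * np.eye(s)[xs[k]]

print(np.allclose(a, unrolled))   # True: a_N matches the truncated representation of mu
```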
Author Contributions
Methodology, P.B., L.P. and P.R. All authors have read and agreed to the published version of the manuscript.
Funding
This research received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 817257).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Acknowledgments
We are grateful to Giorgio Letta and Eugenio Regazzini. They not only introduced us to probability theory, they also shared with us their enthusiasm and some of their expertise.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Dudley, R.M. Uniform Central Limit Theorems; Cambridge University Press: Cambridge, UK, 1999. [Google Scholar]
- Van der Vaart, A.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996. [Google Scholar]
- Berti, P.; Dreassi, E.; Pratelli, L.; Rigo, P. A class of models for Bayesian predictive inference. Bernoulli 2021, 27, 702–726. [Google Scholar] [CrossRef]
- Berti, P.; Dreassi, E.; Pratelli, L.; Rigo, P. Asymptotics of certain conditionally identically distributed sequences. Statist. Prob. Lett. 2021, 168, 108923. [Google Scholar] [CrossRef]
- Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for empirical processes based on dependent data. Electron. J. Probab. 2012, 17, 1–18. [Google Scholar] [CrossRef]
- Crimaldi, I.; Pratelli, L. Convergence results for conditional expectations. Bernoulli 2005, 11, 737–745. [Google Scholar] [CrossRef]
- Goggin, E.M. Convergence in distribution of conditional expectations. Ann. Probab. 1994, 22, 1097–1114. [Google Scholar] [CrossRef]
- Lan, G.; Hu, Z.C.; Sun, W. Products of conditional expectation operators: Convergence and divergence. J. Theor. Probab. 2021, 34, 1012–1028. [Google Scholar] [CrossRef]
- Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for a class of identically distributed random variables. Ann. Probab. 2004, 32, 2029–2052. [Google Scholar] [CrossRef]
- Berti, P.; Crimaldi, I.; Pratelli, L.; Rigo, P. A central limit theorem and its applications to multicolor randomly reinforced urns. J. Appl. Probab. 2011, 48, 527–546. [Google Scholar] [CrossRef]
- Berti, P.; Pratelli, L.; Rigo, P. Exchangeable sequences driven by an absolutely continuous random measure. Ann. Probab. 2013, 41, 2090–2102. [Google Scholar] [CrossRef][Green Version]
- Blackwell, D.; Dubins, L.E. Merging of opinions with increasing information. Ann. Math. Statist. 1962, 33, 882–886. [Google Scholar] [CrossRef]
- Cifarelli, D.M.; Regazzini, E. De Finetti’s contribution to probability and statistics. Statist. Sci. 1996, 11, 253–282. [Google Scholar] [CrossRef]
- Cifarelli, D.M.; Dolera, E.; Regazzini, E. Frequentistic approximations to Bayesian prevision of exchangeable random elements. Int. J. Approx. Reason. 2016, 78, 138–152. [Google Scholar] [CrossRef]
- Dolera, E.; Regazzini, E. Uniform rates of the Glivenko-Cantelli convergence and their use in approximating Bayesian inferences. Bernoulli 2019, 25, 2982–3015. [Google Scholar] [CrossRef]
- Fortini, S.; Ladelli, L.; Regazzini, E. Exchangeability, predictive distributions and parametric models. Sankhyā Indian J. Stat. Ser. A 2000, 62, 86–109. [Google Scholar]
- Hahn, P.R.; Martin, R.; Walker, S.G. On recursive Bayesian predictive distributions. J. Am. Stat. Assoc. 2018, 113, 1085–1093. [Google Scholar] [CrossRef]
- Morvai, G.; Weiss, B. On universal algorithms for classifying and predicting stationary processes. Probab. Surv. 2021, 18, 77–131. [Google Scholar] [CrossRef]
- Pitman, J. Some developments of the Blackwell-MacQueen urn scheme. Stat. Probab. Game Theory IMS Lect. Notes Mon. Ser. 1996, 30, 245–267. [Google Scholar]
- Pitman, J. Combinatorial Stochastic Processes; Lectures from the XXXII Summer School in Saint-Flour; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Regazzini, E. Old and recent results on the relationship between predictive inference and statistical modeling either in nonparametric or parametric form. In Bayesian Statistics 6; Oxford University Press: Oxford, UK, 1999; pp. 571–588. [Google Scholar]
- Kallenberg, O. Spreading and predictable sampling in exchangeable sequences and processes. Ann. Probab. 1988, 16, 508–534. [Google Scholar] [CrossRef]
- Airoldi, E.M.; Costa, T.; Bassetti, F.; Leisen, F.; Guindani, M. Generalized species sampling priors with latent beta reinforcements. J. Am. Stat. Assoc. 2014, 109, 1466–1480. [Google Scholar] [CrossRef]
- Bassetti, F.; Crimaldi, I.; Leisen, F. Conditionally identically distributed species sampling sequences. Adv. Appl. Probab. 2010, 42, 433–459. [Google Scholar] [CrossRef][Green Version]
- Cassese, A.; Zhu, W.; Guindani, M.; Vannucci, M. A Bayesian nonparametric spiked process prior for dynamic model selection. Bayesian Anal. 2019, 14, 553–572. [Google Scholar] [CrossRef]
- Fong, E.; Holmes, C.; Walker, S.G. Martingale posterior distributions. arXiv 2021, arXiv:2103.15671v1. [Google Scholar]
- Fortini, S.; Petrone, S. Predictive construction of priors in Bayesian nonparametrics. Braz. J. Probab. Statist. 2012, 26, 423–449. [Google Scholar] [CrossRef]
- Fortini, S.; Petrone, S.; Sporysheva, P. On a notion of partially conditionally identically distributed sequences. Stoch. Proc. Appl. 2018, 128, 819–846. [Google Scholar] [CrossRef]
- Fortini, S.; Petrone, S. Quasi-Bayes properties of a procedure for sequential learning in mixture models. J. R. Stat. Soc. B 2020, 82, 1087–1114. [Google Scholar] [CrossRef]
- Berti, P.; Dreassi, E.; Leisen, F.; Pratelli, L.; Rigo, P. Kernel based Dirichlet sequences. arXiv 2021, arXiv:2106.00114. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).