Mathematics 2018, 6(6), 99; doi:10.3390/math6060099
Convergence in Total Variation to a Mixture of Gaussian Laws
Accademia Navale, Viale Italia 72, 57100 Livorno, Italy
Dipartimento di Matematica “F. Casorati”, Universita’ di Pavia, via Ferrata 1, 27100 Pavia, Italy
Author to whom correspondence should be addressed.
Received: 29 April 2018 / Accepted: 5 June 2018 / Published: 11 June 2018
Abstract: It is not unusual that $X_n \overset{dist}{\longrightarrow} VZ$ where $X_n$, $V$, $Z$ are real random variables, $V$ is independent of $Z$ and $Z \sim \mathcal{N}(0,1)$. An intriguing feature is that $P(VZ \in A) = E\{\mathcal{N}(0, V^2)(A)\}$ for each Borel set $A \subset \mathbb{R}$, namely, the probability distribution of the limit $VZ$ is a mixture of centered Gaussian laws with (random) variance $V^2$. In this paper, conditions for $d_{TV}(X_n, VZ) \to 0$ are given, where $d_{TV}(X_n, VZ)$ is the total variation distance between the probability distributions of $X_n$ and $VZ$. To estimate the rate of convergence, a few upper bounds for $d_{TV}(X_n, VZ)$ are given as well. Special attention is paid to the following two cases: (i) $X_n$ is a linear combination of the squares of Gaussian random variables; and (ii) $X_n$ is related to the weighted quadratic variations of two independent Brownian motions.
Keywords: mixture of Gaussian laws; rate of convergence; total variation distance; Wasserstein distance; weighted quadratic variation
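All random elements involved in the sequel are defined on a common probability space $(\Omega, \mathcal{F}, P)$. We let $\mathcal{B}$ denote the Borel $\sigma$-field on $\mathbb{R}$ and $\mathcal{N}(a, b)$ the Gaussian law on $\mathcal{B}$ with mean $a$ and variance $b$, where $a \in \mathbb{R}$, $b \ge 0$, and $\mathcal{N}(a, 0) = \delta_a$. Moreover, $Z$ always denotes a real random variable such that:

$$Z \sim \mathcal{N}(0, 1).$$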
In plenty of frameworks, it happens that:

$$X_n \overset{dist}{\longrightarrow} VZ, \tag{1}$$

where $X_n$ and $V$ are real random variables and $V$ is independent of $Z$. Condition (1) actually occurs in the CLT, both in its classical form (with $V$ constant) and in its exchangeable and martingale versions (Examples 3 and 4). In addition, condition (1) arises in several recent papers with various distributions for $V$. See, e.g., [1,2,3,4,5,6,7,8].
An intriguing feature of condition (1) is that the probability distribution of the limit:

$$P(VZ \in A) = E\{\mathcal{N}(0, V^2)(A)\}, \quad A \in \mathcal{B},$$

is a mixture of centered Gaussian laws with (random) variance $V^2$. Moreover, condition (1) can often be strengthened into:

$$d_W(X_n, VZ) \to 0, \tag{2}$$

where $d_W(X_n, VZ)$ is the Wasserstein distance between the probability distributions of $X_n$ and $VZ$. In fact, condition (2) amounts to (1) provided the sequence $(X_n)$ is uniformly integrable; see Section 2.1.
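As a quick numerical illustration of the mixture representation, the sketch below compares a Monte Carlo estimate of $P(VZ \le x)$ with the mixture value $E\{\Phi(x/V)\}$, where $\Phi$ is the standard normal distribution function. The two-point law chosen for $V$, the function names, and the sample size are assumptions made for this example only.

```python
import math
import random


def std_normal_cdf(x: float) -> float:
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixture_cdf(x, v_values, v_probs):
    """E[N(0, V^2)((-inf, x])] for a discrete, strictly positive V."""
    return sum(p * std_normal_cdf(x / v) for v, p in zip(v_values, v_probs))


def simulated_cdf(x, v_values, v_probs, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(V * Z <= x) with V independent of Z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        v = rng.choices(v_values, weights=v_probs)[0]
        z = rng.gauss(0.0, 1.0)
        if v * z <= x:
            hits += 1
    return hits / n_samples


# Hypothetical choice: V equals 1 or 3 with probability 1/2 each.
V_VALUES = [1.0, 3.0]
V_PROBS = [0.5, 0.5]

exact = mixture_cdf(1.0, V_VALUES, V_PROBS)
approx = simulated_cdf(1.0, V_VALUES, V_PROBS)
print(f"mixture formula: {exact:.4f}, simulation: {approx:.4f}")
```

The two numbers should agree up to Monte Carlo error, which is what the identity between the law of $VZ$ and the Gaussian mixture predicts.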
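(*) Give conditions for $d_{TV}(X_n, VZ) \to 0$, where

$$d_{TV}(X_n, VZ) = \sup_{A \in \mathcal{B}} |P(X_n \in A) - P(VZ \in A)|.$$

Under such (or stronger) conditions, estimate the rate of convergence, i.e., find quantitative bounds for $d_{TV}(X_n, VZ)$.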
Problem (*) is addressed in this paper. Before turning to results, however, we mention an example.
Example 1.
Let B be a fractional Brownian motion with Hurst parameter H and
The asymptotics of and other analogous functionals of the B-paths (such as weighted power variations) is investigated in various papers. See, e.g., [5,7,8,9,10] and references therein. We note also that:where the stochastic integral is meant in Skorohod’s sense (it reduces to an Ito integral if ).
In Example 1, problem (*) admits a reasonable solution. In fact, in a sense, Example 1 is our motivating example.
This paper includes two main results.
The first (Theorem 1) is of a general type. Suppose , where is the characteristic function of . (In particular, has an absolutely continuous distribution). Then, an upper bound for is provided in terms of and . In some cases, this bound allows one to prove and to estimate the convergence rate. In Example 5, for instance, such a bound improves on the existing ones; see Theorem 3.1 of  and Remark 3.5 of . However, for the upper bound to work, one needs information on and , which is not always available. Thus, it is convenient to have some further tools.
In the second result (Theorem 2), the ideas underlying Example 1 are adapted to weighted quadratic variations; see [5,8,9]. Let B and be independent standard Brownian motions andwhere is a suitable function, and . Under some assumptions on f (weaker than those usually requested in similar problems), it is shown that O, where . Furthermore, O if one also assumes . (We recall that, if and are non-negative numbers, the notation O means that there is a constant c such that for all n).
2.1. Distances between Probability Measures
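In this subsection, we recall a few known facts on distances between probability measures. We denote by $(S, \mathcal{E})$ a measurable space and by $\mu$ and $\nu$ two probability measures on $\mathcal{E}$.
The total variation distance between $\mu$ and $\nu$ is:

$$\|\mu - \nu\| = \sup_{A \in \mathcal{E}} |\mu(A) - \nu(A)|.$$

If $X$ and $Y$ are $(S, \mathcal{E})$-valued random variables, we also write:

$$d_{TV}(X, Y) = \|P(X \in \cdot) - P(Y \in \cdot)\| = \sup_{A \in \mathcal{E}} |P(X \in A) - P(Y \in A)|$$

to denote the total variation distance between the probability distributions of $X$ and $Y$.
Next, suppose $S$ is a separable metric space, $\mathcal{E}$ the Borel $\sigma$-field and

$$\int d(x, x_0)\,\mu(dx) + \int d(x, x_0)\,\nu(dx) < \infty \quad \text{for some } x_0 \in S,$$

where $d$ is the distance on $S$. The Wasserstein distance between $\mu$ and $\nu$ is:

$$W(\mu, \nu) = \inf_{X \sim \mu,\; Y \sim \nu} E\{d(X, Y)\},$$

where inf is over the pairs $(X, Y)$ of $(S, \mathcal{E})$-valued random variables such that $X \sim \mu$ and $Y \sim \nu$. By a duality theorem, $W(\mu, \nu)$ admits the representation:

$$W(\mu, \nu) = \sup_f \Bigl| \int f\,d\mu - \int f\,d\nu \Bigr|,$$

where sup is over those functions $f : S \to \mathbb{R}$ such that $|f(x) - f(y)| \le d(x, y)$ for all $x, y \in S$; see, e.g., Section 11.8 of . Again, if $X$ and $Y$ are $(S, \mathcal{E})$-valued random variables, we write:

$$d_W(X, Y) = W\bigl(P(X \in \cdot),\, P(Y \in \cdot)\bigr)$$

to mean the Wasserstein distance between the probability distributions of $X$ and $Y$.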
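To make the two distances concrete, here is a minimal sketch for the simplest situations where they can be computed by hand. It relies on two standard facts that are not specific to this paper: for laws on a finite set, the supremum over events equals half the sum of the absolute differences of the point masses; and on the real line, the Wasserstein distance between two empirical laws with the same number of atoms is attained by pairing the sorted samples. The function names are hypothetical.

```python
def total_variation(p: dict, q: dict) -> float:
    """sup_A |p(A) - q(A)| for two probability mass functions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def wasserstein_1d(xs, ys) -> float:
    """Wasserstein distance between the empirical laws of two equal-size real samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    # On the real line the monotone (sorted) pairing is an optimal coupling.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(total_variation(p, q))                   # mass 0.25 has to be moved
print(wasserstein_1d([0.0, 1.0], [2.0, 1.0]))  # each atom is shifted by 1
```

Total variation ignores how far mass is moved, while the Wasserstein distance weights the displacement, which is why the two notions of convergence differ.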
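Finally, we make precise the connections between convergence in distribution and convergence according to Wasserstein distance in the case $S = \mathbb{R}$. Let $X_n$ and $X$ be real random variables such that $E|X_n| + E|X| < \infty$ for each $n$. Then, the following statements are equivalent:
- $d_W(X_n, X) \to 0$;
- $X_n \overset{dist}{\longrightarrow} X$ and $E|X_n| \to E|X|$;
- $X_n \overset{dist}{\longrightarrow} X$ and the sequence $(X_n)$ is uniformly integrable.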
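The role of uniform integrability can be seen on a standard counterexample, sketched below with exact rational arithmetic. The sequence used here (value $n$ with probability $1/n$, value $0$ otherwise) is an assumption chosen for illustration and does not come from the paper. Since the limit is the constant $0$, there is only one coupling, so the Wasserstein distance to $0$ is simply the first absolute moment.

```python
from fractions import Fraction


def law_of_x_n(n: int) -> dict:
    """Law of X_n: value n with probability 1/n, value 0 otherwise."""
    return {0: 1 - Fraction(1, n), n: Fraction(1, n)}


def wasserstein_to_zero(law: dict) -> Fraction:
    """d_W(X_n, 0) = E|X_n|, because the target law is a point mass at 0."""
    return sum(abs(x) * p for x, p in law.items())


def prob_far_from_zero(law: dict, eps=Fraction(1, 2)) -> Fraction:
    """P(|X_n| > eps); it tends to 0, so X_n converges to 0 in distribution."""
    return sum(p for x, p in law.items() if abs(x) > eps)


for n in (1, 10, 100, 1000):
    law = law_of_x_n(n)
    print(n, prob_far_from_zero(law), wasserstein_to_zero(law))
```

Here the probability of being away from $0$ vanishes while the Wasserstein distance stays equal to $1$: the sequence is not uniformly integrable, and convergence in distribution alone does not give convergence in Wasserstein distance.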
2.2. Two Technical Lemmas
The following simple lemma is fundamental for our purposes.
Lemma 1.
If , and , then:
Note also that, if , Lemma 1 yields:
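For orientation, the total variation distance between two centered Gaussian laws can be computed exactly, which gives a reference value against which any upper bound of the Lemma 1 type can be checked. The sketch below is not the bound of Lemma 1; it is an independent computation. It uses the fact that the two densities cross at $\pm x_0$ with $x_0^2 = b_1 b_2 \log(b_2/b_1)/(b_2 - b_1)$, and compares the resulting closed form with a midpoint-rule evaluation of $\frac12\int |f_1 - f_2|$. Function names and grid sizes are assumptions.

```python
import math


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float, var: float) -> float:
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def tv_centered_gaussians(b1: float, b2: float) -> float:
    """sup_A |N(0,b1)(A) - N(0,b2)(A)| for variances b1, b2 > 0 (closed form)."""
    if b1 <= 0 or b2 <= 0:
        raise ValueError("variances must be strictly positive")
    if b1 == b2:
        return 0.0
    lo, hi = min(b1, b2), max(b1, b2)
    # The densities cross at +/- x0; the smaller-variance law dominates on (-x0, x0).
    x0 = math.sqrt(lo * hi * math.log(hi / lo) / (hi - lo))
    return 2.0 * (std_normal_cdf(x0 / math.sqrt(lo)) - std_normal_cdf(x0 / math.sqrt(hi)))


def tv_numeric(b1: float, b2: float, half_width=12.0, steps=200_000) -> float:
    """Midpoint-rule approximation of 0.5 * integral of |f1 - f2|."""
    scale = math.sqrt(max(b1, b2))
    left = -half_width * scale
    h = 2.0 * half_width * scale / steps
    total = 0.0
    for i in range(steps):
        x = left + (i + 0.5) * h
        total += abs(normal_pdf(x, b1) - normal_pdf(x, b2))
    return 0.5 * total * h


print(tv_centered_gaussians(1.0, 4.0), tv_numeric(1.0, 4.0))
```

The distance depends on the variances only through their ratio, so it goes to $0$ exactly when the ratio of the two variances goes to $1$.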
The next result, needed in Section 4, is just a consequence of Lemma 1. In such a result, and are separable metric spaces, and Borel functions, and X and Y random variables with values in and , respectively.
Lemma 2.
Let ν be the probability distribution of Y. If X is independent of Y andfor ν-almost all , then:
Proof of Lemma 2.
Since X is independent of Y,
Thus, since and have centered Gaussian laws and has strictly positive variance, for -almost all , Lemma 1 yields:☐
3. A General Result
As in Section 1, let , V and Z be real random variables, with and V independent of Z. Since , it can be assumed . We also assume , so that we can define:
In addition, we let:where U is a standard normal random variable independent of .
We aim to estimate . Under some conditions, however, the latter quantity can be replaced by .
Lemma 3.
For each ,
In addition, if , then:
Proof of Lemma 3.
The Lemma is trivially true if . Hence, it can be assumed . Define and note that:
For each ,
Hence, Lemma 1 yields:
On the other hand, the probability distribution of can also be written as:
Arguing as above, Lemma 1 implies again:for each . Letting with , it follows that:
Inequality (3) holds true for every joint distribution for the pair . In particular, inequality (3) holds if such a joint distribution is taken to be one that realizes the Wasserstein distance, namely, one such that . In this case, one obtains:
Finally, if , it suffices to note that:☐
For Lemma 3 to be useful, should be kept under control. This can be achieved under various assumptions. One is to ask to admit a Lipschitz density with respect to Lebesgue measure.
Let be the characteristic function of and
Given , suppose and . Then, there is a constant k, independent of n, such that:
In particular,for each , and
It is worth noting that, if , the condition follows from . On the other hand, can be weakened into whenever for some ; see Section 2.1.
Proof of Theorem 1
If , the Theorem is trivially true. Thus, it can be assumed .
Since is integrable, the probability distribution of admits a density with respect to Lebesgue measure. In addition,
Given , it follows that:
Since and , one obtains:for some constant . Hence,
Minimizing over t, one finally obtains:where is a constant that depends on only. This concludes the proof. ☐
Theorem 1 provides upper bounds for in terms of and . It is connected to Proposition 4.1 of , where is replaced by the Kolmogorov distance.
In particular, Theorem 1 implies that provided a.s. and
In addition, Theorem 1 allows one to estimate the convergence rate. As an extreme example, if , and for all , then:
We next turn to examples. In each of these examples, Z is a standard normal random variable independent of all other random elements.
Example 2. (Classical CLT).
Let and , where is an i.i.d. sequence of real random variables such that and . In this case, ; see Theorem 2.1 of . Suppose now that for all and has a density f (with respect to Lebesgue measure) such that . Then, for all , and Theorem 1 yields:
This rate, however, is quite far from optimal. Under the present assumptions on , in fact, ; see Theorem 1 of .
We finally prove . It is well known that for all β implies for all β. Hence, it suffices to prove . Let ϕ be the characteristic function of and . An integration by parts yields for each . By Lemma 1.4 of , one also obtains for (just let and in Lemma 1.4 of ). Since for each ,
Using these inequalities, follows from a direct calculation.
As noted above, the rate provided by Theorem 1 in the classical CLT is not optimal. While not exciting, this fact is to be expected. Indeed, Theorem 1 is a general result, applying to arbitrary , and should not be expected to give optimal bounds in a very special case (such as the classical CLT).
Example 3. (Exchangeable CLT).
Suppose now that is an exchangeable sequence of real random variables with . Definewhere is the tail σ-field of . By de Finetti’s theorem,
Hence, provided . As to Theorem 1, note that (see e.g. Theorem 3.1 of ) and
Furthermore, . Thus, by Theorem 1, whenever
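To see why the limit in the exchangeable case is a genuine mixture rather than a single Gaussian law, the following simulation sketch uses a toy exchangeable sequence: a random scale shared by the whole sequence, multiplied by i.i.d. random signs. This construction, the two-point law of the scale, and all names are assumptions introduced here for illustration; they are not taken from the paper. Conditionally on the scale the sequence is i.i.d., which is the structure given by de Finetti's theorem.

```python
import math
import random


def exchangeable_normalized_sum(n: int, rng: random.Random) -> float:
    """n^{-1/2} * sum of X_i, where X_i = theta * sign_i and theta is shared."""
    theta = rng.choice([1.0, 3.0])  # hypothetical random scale, drawn once per sequence
    total = sum(theta * rng.choice([-1.0, 1.0]) for _ in range(n))
    return total / math.sqrt(n)


def sample_moments(samples):
    """Empirical second and fourth moments."""
    m = len(samples)
    second = sum(x * x for x in samples) / m
    fourth = sum(x ** 4 for x in samples) / m
    return second, fourth


rng = random.Random(1)
samples = [exchangeable_normalized_sum(100, rng) for _ in range(10_000)]
second, fourth = sample_moments(samples)
print(f"second moment: {second:.2f}, fourth moment: {fourth:.1f}")
```

With this choice the second moment is $E(\theta^2) = 5$, but the fourth moment is close to $3\,E(\theta^4) = 123$, well above the value $3 \cdot 5^2 = 75$ of a Gaussian law with variance $5$. The heavier tails are the signature of a mixture of centered Gaussian laws with random variance.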
Example 4. (Martingale CLT).
Letwhere is an array of real square integrable random variables and . For each , let:be sub-σ-fields of with . A well known version of the CLT (see e.g. Theorem 3.2 of ) states that provided:
- is -measurable and a.s.;
- , , ;
- V is measurable with respect to the σ-field generated by where .
Now, in addition to (i)–(ii)–(iii) or (i)–(ii)–(iv), suppose . Then, Theorem 1 (applied with ) implies whenever a.s. and . Moreover,
Our last example is connected to the second order Wiener chaos. We first note a simple fact as a lemma.
Lemma 4.
Let be a centered Gaussian random vector. Define:where and is an independent copy of ξ. Then, the characteristic function ψ of Y can be written as:
Proof of Lemma 4.
Let , and . Then,
Example 5. (Squares of Gaussian random variables).
For each , let be a centered Gaussian random vector and
Take an independent copy of and define:
Note that is a (random) quadratic form of the covariance matrix . Therefore, .
Since agrees with the characteristic function of , Lemma 4 yields:
Being , it follows that:
Hence,so that whenever for some .
To summarize, applying Theorem 1 with , one obtains:provided , for some V independent of Z, and
The bound (4) requires strong conditions, which may not be easily verifiable in real problems. However, the above result is sometimes helpful, possibly in connection with the martingale CLT of Example 4. As an example, the conditions for (4) are not hard to check when are independent for fixed n. We also note that, to our knowledge, the bound (4) improves on the existing ones. In fact, letting in Theorem 3.1 of  (see also Remark 3.5 of ) one only obtains .
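As a hedged illustration of the kind of characteristic-function computation that underlies Example 5, consider the simplest instance, which is an assumption made here and not a case singled out in the text: independent standard normal $\xi_1, \dots, \xi_d$, an independent copy $\xi'$ of $\xi$, and real coefficients $a_1, \dots, a_d$ (all symbols are introduced for this sketch). Using the standard identity $E\,e^{is\xi_1^2} = (1 - 2is)^{-1/2}$, the symmetrized quadratic form has a real characteristic function with polynomial decay:

```latex
\begin{align*}
Y &= \sum_{j=1}^{d} a_j \bigl( \xi_j^2 - (\xi_j')^2 \bigr), \\
\psi(t) = E\, e^{itY}
  &= \prod_{j=1}^{d} (1 - 2 i a_j t)^{-1/2}\,(1 + 2 i a_j t)^{-1/2}
   = \prod_{j=1}^{d} \bigl( 1 + 4 a_j^2 t^2 \bigr)^{-1/2}, \\
\psi(t) &\le \bigl( 1 + 4 a^2 t^2 \bigr)^{-d/2}
  \quad \text{if } |a_j| \ge a > 0 \text{ for all } j .
\end{align*}
```

In this special case $\psi$ decays like $|t|^{-d}$, so the more coefficients stay away from zero, the faster the decay. This is the type of control on the characteristic function that integrability conditions require.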
4. Weighted Quadratic Variations
Theorem 1 works nicely if one is able to estimate and , which is usually quite hard. Thus, it is convenient to have some further tools. In this section, is upper bounded via Lemma 2. We focus on a special case, but the underlying ideas are easily adapted to more general situations. The results in , for instance, arise from a version of such ideas.
For any function , denote:
Let be an integer, a Borel function, and a real process. The weighted q-variation of J on is:
As noted in , fixing the asymptotic behavior of is useful for determining the rate of convergence of some approximation schemes of stochastic differential equations driven by J. Moreover, the study of is also motivated by parameter estimation and by the analysis of the single-path behaviour of J. See [5,9,18,19,20,21] and references therein.
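A weighted q-variation can be computed directly from a discretized path. The sketch below assumes the common unnormalized form $\sum_{k} f(J_{k/n})\,(J_{(k+1)/n} - J_{k/n})^q$ with left-endpoint weights on $[0, 1]$; the exact normalization and centering used in the paper are not reproduced here, and the function names are hypothetical. The path is a simulated standard Brownian motion.

```python
import math
import random


def brownian_path(n: int, seed: int = 0) -> list:
    """Values of a standard Brownian motion at times k/n, k = 0, ..., n."""
    rng = random.Random(seed)
    step_sd = math.sqrt(1.0 / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def weighted_q_variation(path, f, q: int) -> float:
    """Sum over k of f(J_{k/n}) * (J_{(k+1)/n} - J_{k/n})^q (assumed form)."""
    return sum(
        f(path[k]) * (path[k + 1] - path[k]) ** q
        for k in range(len(path) - 1)
    )


path = brownian_path(20_000)
# With f = 1 and q = 2 this is the realized quadratic variation on [0, 1].
print(weighted_q_variation(path, lambda x: 1.0, 2))
# A non-trivial weight, here f(x) = cos(x), chosen only as an example.
print(weighted_q_variation(path, math.cos, 2))
```

With the constant weight the sum is close to $1$, the quadratic variation of Brownian motion on $[0, 1]$; the fluctuations around such limits, suitably rescaled, are the objects studied in this section.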
More generally, given an -valued process:one could define:
The weight of depends now on I. Thus, in a sense, can be regarded as the weighted q-variation of J relative to I.
Here, we focus on:where B and are independent standard Brownian motions. Note that, letting and , one obtains:
Thus, can be seen as the difference between the quadratic variations of B and relative to .
We aim to show that, under mild assumptions on f, the probability distributions of converge in total variation to a certain mixture of Gaussian laws. We also estimate the rate of convergence. The smoothness assumptions on f are weaker than those usually requested in similar problems; see, e.g., .
Let B and be independent standard Brownian motions and Z a standard normal random variable independent of . Define by Equation (5) and
Suppose andfor some constant c and all . Then, there is a constant k independent of n satisfying:
Moreover, if , one also obtains .
To understand better the spirit of Theorem 2, think of the trivial case . Then, the asymptotic behavior of can be deduced by classical results. In fact, and this rate is optimal; see Theorem 1 of . On the other hand, since , the same conclusion can be drawn from Theorem 2.
We finally prove Theorem 2.
Proof of Theorem 2.
First note that and are independent standard Brownian motions and
Note also that:
Thus, in order to apply Lemma 2, it suffices to let , , and
For fixed , and are centered Gaussian random variables. Since ,
On noting that , one also obtains:
By Lemma 2 and the Cauchy–Schwarz inequality,
If , since , Lemma 2 implies again:
Thus, to conclude the proof, it suffices to show that O.
Define and note that:
Since , it suffices to see that O for each i. Since Y has independent increments and ,
Therefore, O for each i, and this concludes the proof. ☐
Author Contributions
Each author contributed in exactly the same way to each part of this paper.
Funding
This research was supported by the Italian Ministry of Education, University and Research (MIUR): Dipartimenti di Eccellenza Program (2018-2022) - Dept. of Mathematics “F. Casorati”, University of Pavia.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Azmoodeh, E.; Gasbarra, D. New moments criteria for convergence towards normal product/tetilla laws. arXiv, 2017.
2. Eichelsbacher, P.; Thäle, C. Malliavin-Stein method for Variance-Gamma approximation on Wiener space. Electron. J. Probab. 2015, 20, 1–28.
3. Gaunt, R.E. On Stein’s method for products of normal random variables and zero bias couplings. Bernoulli 2017, 23, 3311–3345.
4. Gaunt, R.E. Wasserstein and Kolmogorov error bounds for variance-gamma approximation via Stein’s method I. arXiv, 2017.
5. Nourdin, I.; Nualart, D.; Tudor, C.A. Central and non-central limit theorems for weighted power variations of fractional Brownian motion. Ann. I.H.P. 2010, 46, 1055–1079.
6. Nourdin, I.; Poly, G. Convergence in total variation on Wiener chaos. Stoch. Proc. Appl. 2013, 123, 651–674.
7. Nourdin, I.; Nualart, D.; Peccati, G. Quantitative stable limit theorems on the Wiener space. Ann. Probab. 2016, 44, 1–41.
8. Pratelli, L.; Rigo, P. Total Variation Bounds for Gaussian Functionals. 2018. Submitted. Available online: http://www-dimat.unipv.it/rigo/frac.pdf (accessed on 10 April 2018).
9. Nourdin, I.; Peccati, G. Weighted power variations of iterated Brownian motion. Electron. J. Probab. 2008, 13, 1229–1256.
10. Peccati, G.; Yor, M. Four limit theorems for quadratic functionals of Brownian motion and Brownian bridge. In Asymptotic Methods in Stochastics, AMS, Fields Institute Communication Series; Amer. Math. Soc.: Providence, RI, USA, 2004; pp. 75–87.
11. Dudley, R.M. Real Analysis and Probability; Cambridge University Press: Cambridge, UK, 2004.
12. Nourdin, I.; Peccati, G. Normal Approximations with Malliavin Calculus: From Stein’s Method to Universality; Cambridge University Press: Cambridge, UK, 2012.
13. Goldstein, L. L1 bounds in normal approximation. Ann. Probab. 2007, 35, 1888–1930.
14. Sirazhdinov, S.K.H.; Mamatov, M. On convergence in the mean for densities. Theory Probab. Appl. 1962, 7, 424–428.
15. Petrov, V.V. Limit Theorems of Probability Theory: Sequences of Independent Random Variables; Clarendon Press: Oxford, UK, 1995.
16. Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for a class of identically distributed random variables. Ann. Probab. 2004, 32, 2029–2052.
17. Hall, P.; Heyde, C.C. Martingale Limit Theory and Its Applications; Academic Press: New York, NY, USA, 1980.
18. Barndorff-Nielsen, O.E.; Graversen, S.E.; Shephard, N. Power variation and stochastic volatility: A review and some new results. J. Appl. Probab. 2004, 44, 133–143.
19. Gradinaru, M.; Nourdin, I. Milstein’s type schemes for fractional SDEs. Ann. I.H.P. 2009, 45, 1058–1098.
20. Neuenkirch, A.; Nourdin, I. Exact rate of convergence of some approximation schemes associated to SDEs driven by a fractional Brownian motion. J. Theor. Probab. 2007, 20, 871–899.
21. Nourdin, I. A simple theory for the study of SDEs driven by a fractional Brownian motion in dimension one. In Seminaire de Probabilites; Springer: Berlin, Germany, 2008; Volume XLI, pp. 181–197.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).