1. Introduction
All random elements involved in the sequel are defined on a common probability space $(\Omega, \mathcal{A}, P)$. We let $\mathcal{B}$ denote the Borel $\sigma$-field on $\mathbb{R}$ and $\mathcal{N}(a, b)$ the Gaussian law on $\mathcal{B}$ with mean $a$ and variance $b$, where $a \in \mathbb{R}$, $b \ge 0$, and $\mathcal{N}(a, 0) = \delta_a$. Moreover, $Z$ always denotes a real random variable such that:
$$Z \sim \mathcal{N}(0, 1).$$
In plenty of frameworks, it happens that:
$$X_n \longrightarrow VZ \quad\text{in distribution}, \tag{1}$$
where $X_n$ and $V$ are real random variables and $V$ is independent of $Z$. Condition (1) actually occurs in the CLT, both in its classical form (with $V$ constant) and in its exchangeable and martingale versions (Examples 3 and 4). In addition, condition (1) arises in several recent papers with various distributions for $V$. See, e.g., [1,2,3,4,5,6,7,8].
An intriguing feature of condition (1) is that the probability distribution of the limit $VZ$ is a mixture of centered Gaussian laws with (random) variance $V^2$. Moreover, condition (1) can often be strengthened into:
$$d_W(X_n,\, VZ) \longrightarrow 0, \tag{2}$$
where $d_W(X_n, VZ)$ is the Wasserstein distance between the probability distributions of $X_n$ and $VZ$. In fact, condition (2) amounts to (1) provided the sequence $(X_n)$ is uniformly integrable; see Section 2.1.
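To make the mixture structure explicit, the following display (ours, but it follows at once by conditioning on $V$, which is independent of $Z$) can be kept in mind:
$$P(VZ \in A) \;=\; E\bigl[P(VZ \in A \mid V)\bigr] \;=\; E\bigl[\mathcal{N}(0, V^2)(A)\bigr] \qquad \text{for every } A \in \mathcal{B},$$
with the convention $\mathcal{N}(0, 0) = \delta_0$ on the event $\{V = 0\}$.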
A few (engaging) problems are suggested by conditions (1) and (2). One is:

(*) Give conditions for $d_{TV}(X_n, VZ) \rightarrow 0$, where
$$d_{TV}(X_n, VZ) = \sup_{A \in \mathcal{B}}\,\bigl| P(X_n \in A) - P(VZ \in A)\bigr|.$$
Under such (or stronger) conditions, estimate the rate of convergence, i.e., find quantitative bounds for $d_{TV}(X_n, VZ)$.
Problem (*) is addressed in this paper. Before turning to results, however, we mention an example.
Example 1. Let $B$ be a fractional Brownian motion with Hurst parameter $H$ and
The asymptotics of $X_n$ and other analogous functionals of the $B$-paths (such as weighted power variations) are investigated in various papers. See, e.g., [5,7,8,9,10] and references therein. We note also that:
where the stochastic integral is meant in Skorohod's sense (it reduces to an Itô integral if $H = 1/2$). Let and . In [8], it is shown that, for every , there is a constant $k$ (depending on $H$ and $\beta$ only) such that:
where $Z$ is a standard normal random variable independent of $V$. Furthermore, the rate is quite close to being optimal; see condition (2) of [8].

In Example 1, problem (*) admits a reasonable solution. In fact, in a sense, Example 1 is our motivating example.
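The phenomenon behind Example 1 can be seen numerically. The sketch below is ours and is not taken from [8]: it simulates an fBm path exactly (Cholesky factorization of the increment covariance) and checks that the normalized, unweighted quadratic variation concentrates around 1; the function name and parameters are illustrative only.

```python
import numpy as np

def fbm_increments(n, H, rng):
    """Exact simulation of fractional Gaussian noise via Cholesky factorization.

    Returns the n increments B_{j/n} - B_{(j-1)/n} of an fBm with Hurst index H
    on the grid {1/n, ..., 1}; each increment has variance n**(-2*H).
    """
    k = np.arange(n)
    # Autocovariance of fractional Gaussian noise with unit step, rescaled to step 1/n.
    gamma = 0.5 * (np.abs(k - 1) ** (2 * H) + (k + 1) ** (2 * H) - 2 * k ** (2 * H))
    cov = gamma[np.abs(k[:, None] - k[None, :])] * n ** (-2.0 * H)
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))
    return L @ rng.standard_normal(n)

rng = np.random.default_rng(0)
n, H = 512, 0.3
incr = fbm_increments(n, H, rng)
# Normalized quadratic variation: should be close to 1 for large n.
print(n ** (2 * H - 1) * np.sum(incr ** 2))
```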
This paper includes two main results.
The first (Theorem 1) is of the general type. Suppose $\int \lvert\phi_n(t)\rvert\,dt < \infty$, where $\phi_n$ is the characteristic function of $X_n$. (In particular, $X_n$ has an absolutely continuous distribution.) Then, an upper bound for $d_{TV}(X_n, VZ)$ is provided in terms of $\phi_n$ and $d_W(X_n, VZ)$. In some cases, this bound makes it possible to prove $d_{TV}(X_n, VZ) \rightarrow 0$ and to estimate the convergence rate. In Example 5, for instance, such a bound improves on the existing ones; see Theorem 3.1 of [6] and Remark 3.5 of [7]. However, for the upper bound to work, one needs information on $\phi_n$ and $d_W(X_n, VZ)$, which is not always available. Thus, it is convenient to have some further tools.
In the second result (Theorem 2), the ideas underlying Example 1 are adapted to weighted quadratic variations; see [5,8,9]. Let $B$ and $\widetilde{B}$ be independent standard Brownian motions and
where $f$ is a suitable function, and . Under some assumptions on $f$ (weaker than those usually required in similar problems), it is shown that O , where . Furthermore, O if one also assumes . (We recall that, if $a_n$ and $b_n$ are non-negative numbers, the notation $a_n = O(b_n)$ means that there is a constant $c$ such that $a_n \le c\, b_n$ for all $n$.)
2. Preliminaries
2.1. Distances between Probability Measures
In this subsection, we recall a few known facts on distances between probability measures. We denote by $(S, \mathcal{E})$ a measurable space and by $\mu$ and $\nu$ two probability measures on $\mathcal{E}$.
The total variation distance between $\mu$ and $\nu$ is:
$$d_{TV}(\mu, \nu) = \sup_{A \in \mathcal{E}}\,\lvert \mu(A) - \nu(A)\rvert.$$
If $X$ and $Y$ are $S$-valued random variables, we also write $d_{TV}(X, Y)$ to denote the total variation distance between the probability distributions of $X$ and $Y$.
Next, suppose $S$ is a separable metric space, $\mathcal{E}$ the Borel $\sigma$-field and
$$\int d(x, x_0)\,\mu(dx) + \int d(x, x_0)\,\nu(dx) < \infty \quad\text{for some } x_0 \in S,$$
where $d$ is the distance on $S$. The Wasserstein distance between $\mu$ and $\nu$ is:
$$d_W(\mu, \nu) = \inf\, E\,d(X, Y),$$
where inf is over the pairs $(X, Y)$ of $S$-valued random variables such that $X \sim \mu$ and $Y \sim \nu$. By a duality theorem, $d_W$ admits the representation:
$$d_W(\mu, \nu) = \sup_f\,\Bigl\lvert \int f\,d\mu - \int f\,d\nu \Bigr\rvert,$$
where sup is over those functions $f: S \to \mathbb{R}$ such that $\lvert f(x) - f(y)\rvert \le d(x, y)$ for all $x, y \in S$; see, e.g., Section 11.8 of [11]. Again, if $X$ and $Y$ are $S$-valued random variables, we write $d_W(X, Y)$ to mean the Wasserstein distance between the probability distributions of $X$ and $Y$.
Finally, we make precise the connections between convergence in distribution and convergence according to Wasserstein distance in the case $S = \mathbb{R}$. Let $(X_n)$ and $X$ be real random variables such that $E\lvert X_n\rvert < \infty$ for each $n$. Then, the following statements are equivalent:
- $d_W(X_n, X) \rightarrow 0$;
- $X_n \rightarrow X$ in distribution and $E\lvert X_n\rvert \rightarrow E\lvert X\rvert$;
- $X_n \rightarrow X$ in distribution and the sequence $(X_n)$ is uniformly integrable.
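As a concrete illustration of the two distances just recalled, in the real case $S = \mathbb{R}$, here is a small numerical sketch; the helper names are ours, and the formulas used are the standard ones ($d_{TV}$ as half the $L^1$ distance between densities, $d_W$ as the $L^1$ distance between distribution functions).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def tv_distance(p, q):
    """Total variation distance between two densities on the real line:
    d_TV = sup_A |P(A) - Q(A)| = (1/2) * integral of |p - q|."""
    val, _ = quad(lambda x: abs(p(x) - q(x)), -np.inf, np.inf)
    return 0.5 * val

def wasserstein_distance_1d(F, G):
    """Wasserstein distance between two laws on the real line:
    d_W = integral of |F - G|, the difference of the distribution functions."""
    val, _ = quad(lambda x: abs(F(x) - G(x)), -np.inf, np.inf)
    return val

# N(0, 1) versus N(0, 1.2): both distances are small but positive.
p, q = norm(0, 1), norm(0, np.sqrt(1.2))
print(tv_distance(p.pdf, q.pdf))
print(wasserstein_distance_1d(p.cdf, q.cdf))
```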
2.2. Two Technical Lemmas
The following simple lemma is fundamental for our purposes.
Lemma 1. If , and , then:

Lemma 1 is well known; see, e.g., Proposition 3.6.1 of [12] and Lemma 3 of [8].
Note also that, if
, Lemma 1 yields:
The next result, needed in Section 4, is just a consequence of Lemma 1. In such a result, $S$ and $T$ are separable metric spaces, $f: S \times T \to \mathbb{R}$ and $g: S \times T \to \mathbb{R}$ are Borel functions, and $X$ and $Y$ are random variables with values in $S$ and $T$, respectively.
Lemma 2. Let $\nu$ be the probability distribution of $Y$. If $X$ is independent of $Y$ and
for $\nu$-almost all $y$, then:

Proof. Since $X$ is independent of $Y$,
Thus, since $f(X, y)$ and $g(X, y)$ have centered Gaussian laws and $g(X, y)$ has strictly positive variance, for $\nu$-almost all $y$, Lemma 1 yields:
☐
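For completeness, the disintegration step at the start of the proof can be spelled out as follows (the display is ours): for every Borel set $A \subset \mathbb{R}$, independence of $X$ and $Y$ gives
$$\bigl|P\bigl(f(X,Y) \in A\bigr) - P\bigl(g(X,Y) \in A\bigr)\bigr| = \Bigl|\int \bigl[P\bigl(f(X,y) \in A\bigr) - P\bigl(g(X,y) \in A\bigr)\bigr]\,\nu(dy)\Bigr| \le \int d_{TV}\bigl(f(X,y),\, g(X,y)\bigr)\,\nu(dy),$$
and taking the supremum over $A$ yields $d_{TV}\bigl(f(X,Y), g(X,Y)\bigr) \le \int d_{TV}\bigl(f(X,y), g(X,y)\bigr)\,\nu(dy)$.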
3. A General Result
As in Section 1, let $(X_n)$, $V$ and $Z$ be real random variables, with $Z \sim \mathcal{N}(0, 1)$ and $V$ independent of $Z$. Since $VZ$ and $\lvert V\rvert Z$ have the same probability distribution, it can be assumed $V \ge 0$
. We also assume
, so that we can define:
In addition, we let:
where
U is a standard normal random variable independent of
.
We aim to estimate $d_{TV}(X_n, VZ)$. Under some conditions, however, the latter quantity can be replaced by .
Lemma 3. In addition, if , then:

Proof. The Lemma is trivially true if
. Hence, it can be assumed
. Define
and note that:
On the other hand, the probability distribution of
can also be written as:
Arguing as above, Lemma 1 implies again:
for each
. Letting
with
, it follows that:
Inequality (
3) holds true for
every joint distribution for the pair
. In particular, inequality (
3) holds if such a joint distribution is taken to be one that realizes the Wasserstein distance, namely, one such that
. In this case, one obtains:
Finally, if
, it suffices to note that:
☐
For Lemma 3 to be useful, should be kept under control. This can be achieved under various assumptions. One is to ask to admit a Lipschitz density with respect to Lebesgue measure.
Theorem 1. Let $\phi_n$ be the characteristic function of $X_n$ and
Given , suppose and . Then, there is a constant $k$, independent of $n$, such that:
In particular,
for each , and

It is worth noting that, if
, the condition
follows from
. On the other hand,
can be weakened into
whenever
for some
; see
Section 2.1.
Proof of Theorem 1. If , the theorem is trivially true. Thus, it can be assumed .
Since $\phi_n$ is integrable, the probability distribution of $X_n$ admits a density $f_n$ with respect to Lebesgue measure. In addition,
Given
, it follows that:
Since
and
, one obtains:
for some constant
. Hence,
Minimizing over
t, one finally obtains:
where
is a constant that depends on
only. This concludes the proof. ☐
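The density step at the beginning of the proof rests on the standard Fourier-inversion bound, which, with the notation above, can be spelled out as follows (the display is ours):
$$f_n(x) = \frac{1}{2\pi}\int e^{-itx}\,\phi_n(t)\,dt, \qquad \lvert f_n(x) - f_n(y)\rvert \le \frac{\lvert x - y\rvert}{2\pi}\int \lvert t\rvert\,\lvert\phi_n(t)\rvert\,dt,$$
so that $f_n$ is Lipschitz as soon as $\int \lvert t\rvert\,\lvert\phi_n(t)\rvert\,dt < \infty$; the second inequality uses $\lvert e^{-itx} - e^{-ity}\rvert \le \lvert t\rvert\,\lvert x - y\rvert$.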
Theorem 1 provides upper bounds for $d_{TV}(X_n, VZ)$ in terms of $\phi_n$ and $d_W(X_n, VZ)$. It is connected to Proposition 4.1 of [4], where the total variation distance is replaced by the Kolmogorov distance.
In particular, Theorem 1 implies that $d_{TV}(X_n, VZ) \rightarrow 0$ provided a.s. and

In addition, Theorem 1 allows one to estimate the convergence rate. As an extreme example, if , and for all , then:
We next turn to examples. In each such example, Z is a standard normal random variable independent of all other random elements.
Example 2. (Classical CLT). Let $X_n = n^{-1/2}\sum_{i=1}^{n}\xi_i$ and $V = 1$, where $(\xi_n)$ is an i.i.d. sequence of real random variables such that $E(\xi_1) = 0$ and $E(\xi_1^2) = 1$. In this case, ; see Theorem 2.1 of [13]. Suppose now that for all and $\xi_1$ has a density $f$ (with respect to Lebesgue measure) such that . Then, for all , and Theorem 1 yields:
This rate, however, is quite far from optimal. Under the present assumptions on $\xi_1$, in fact, ; see Theorem 1 of [14]. We finally prove . It is well known that for all $\beta$ implies for all $\beta$. Hence, it suffices to prove . Let $\phi$ be the characteristic function of $\xi_1$ and . An integration by parts yields for each . By Lemma 1.4 of [15], one also obtains for (just let and in Lemma 1.4 of [15]). Since for each ,
Using these inequalities, follows from a direct calculation.
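As a quick numerical aside on the Wasserstein part of this picture (see Section 2.1), the following sketch is ours and uses centered uniform summands with unit variance: it estimates $d_W(X_n, Z)$ from samples, with `scipy.stats.wasserstein_distance` applied to the two empirical distributions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def sample_X_n(n, size):
    """Draws of X_n = (xi_1 + ... + xi_n) / sqrt(n) for xi_i uniform on
    [-sqrt(3), sqrt(3)], which are centered with unit variance."""
    xi = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(size, n))
    return xi.sum(axis=1) / np.sqrt(n)

z = rng.standard_normal(100_000)  # reference sample from N(0, 1)
for n in (1, 5, 50, 500):
    x = sample_X_n(n, 100_000)
    print(n, wasserstein_distance(x, z))  # decreases towards 0 as n grows
```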
As noted above, the rate provided by Theorem 1 in the classical CLT is not optimal. While not exciting, this fact could be expected. Indeed, Theorem 1 is a general result, applying to arbitrary $X_n$, and should not be expected to give optimal bounds in a very special case (such as the classical CLT).
Example 3. (Exchangeable CLT). Suppose now that $(\xi_n)$ is an exchangeable sequence of real random variables with . Define
where is the tail $\sigma$-field of $(\xi_n)$. By de Finetti's theorem,
Hence, provided . As to Theorem 1, note that (see, e.g., Theorem 3.1 of [16]) and
Furthermore, . Thus, by Theorem 1, whenever

Example 4. (Martingale CLT). Let
where is an array of real square-integrable random variables and . For each , let:
be sub-$\sigma$-fields of with . A well-known version of the CLT (see, e.g., Theorem 3.2 of [17]) states that provided:
- (i)
is -measurable and a.s.;
- (ii)
, , ;
- (iii)
.
Condition (iii) can be replaced by:
- (iv)
V is measurable with respect to the σ-field generated by where .
Note also that, under (i), one obtains .
Now, in addition to (i)–(ii)–(iii) or (i)–(ii)–(iv), suppose . Then, Theorem 1 (applied with ) implies whenever a.s. and . Moreover,

Our last example is connected to the second-order Wiener chaos. We first note a simple fact as a lemma.
Lemma 4. Let $\xi$ be a centered Gaussian random vector. Define:
where and is an independent copy of $\xi$. Then, the characteristic function $\psi$ of $Y$ can be written as:

Proof. Let
,
and
. Then,
Example 5. (Squares of Gaussian random variables). For each , let be a centered Gaussian random vector and
Take an independent copy of and define:
Note that is a (random) quadratic form of the covariance matrix . Therefore, .
Since agrees with the characteristic function of , Lemma 4 yields:
Being , it follows that:
Hence,
so that whenever for some . To summarize, applying Theorem 1 with , one obtains:
provided , for some $V$ independent of $Z$, and

The bound (4) requires strong conditions, which may not be easily verifiable in real problems. However, the above result is sometimes helpful, possibly in connection with the martingale CLT of Example 4. As an example, the conditions for (4) are not hard to check when are independent for fixed $n$. We also note that, to our knowledge, the bound (4) improves on the existing ones. In fact, letting in Theorem 3.1 of [6] (see also Remark 3.5 of [7]), one only obtains .

4. Weighted Quadratic Variations
Theorem 1 works nicely if one is able to estimate $\phi_n$ and $d_W(X_n, VZ)$, which is usually quite hard. Thus, it is convenient to have some further tools. In this section, $d_{TV}(X_n, VZ)$ is upper bounded via Lemma 2. We focus on a special case, but the underlying ideas are easily adapted to more general situations. The results in [8], for instance, arise from a version of such ideas.
For any function , denote:
Let $q$ be an integer, $f: \mathbb{R} \to \mathbb{R}$ a Borel function, and $J$ a real process. The weighted $q$-variation of $J$ on $[0, 1]$ is:
As noted in [5], fixing the asymptotic behavior of the weighted $q$-variations is useful for determining the rate of convergence of some approximation schemes for stochastic differential equations driven by $J$. Moreover, the study of the weighted $q$-variations is also motivated by parameter estimation and by the analysis of the single-path behavior of $J$. See [5,9,18,19,20,21] and references therein.
More generally, given an $\mathbb{R}^2$-valued process:
one could define:
The weight now depends on $I$. Thus, in a sense, the latter quantity can be regarded as the weighted $q$-variation of $J$ relative to $I$.
Here, we focus on:
where $B$ and $\widetilde{B}$ are independent standard Brownian motions. Note that, letting and , one obtains:
Thus, $X_n$ can be seen as the difference between the quadratic variations of $B$ and $\widetilde{B}$ relative to .
We aim to show that, under mild assumptions on $f$, the probability distributions of $X_n$ converge in total variation to a certain mixture of Gaussian laws. We also estimate the rate of convergence. The smoothness assumptions on $f$ are weaker than those usually required in similar problems; see, e.g., [5].
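Before stating the theorem, here is a small simulation sketch. It is ours, and it implements one plausible statistic of this flavor, namely a weighted difference of the quadratic variations of two independent Brownian motions; the exact definition in Equation (5) (scaling, weights) may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_qv_difference(n, f, rng):
    """One draw of a statistic of the same flavor as Equation (5):
    sqrt(n) * sum_j f(B_{(j-1)/n}) * ((dB_j)^2 - (dB~_j)^2), where B and B~
    are independent standard Brownian motions on the grid j/n, j = 0, ..., n."""
    dB = rng.standard_normal(n) / np.sqrt(n)       # increments of B
    dBt = rng.standard_normal(n) / np.sqrt(n)      # increments of B~ (independent)
    B_left = np.concatenate(([0.0], np.cumsum(dB)[:-1]))  # B at left grid points
    return np.sqrt(n) * np.sum(f(B_left) * (dB ** 2 - dBt ** 2))

# With f = cos, the sample should look approximately like a centered
# mixture of Gaussian laws for large n.
samples = np.array([weighted_qv_difference(2000, np.cos, rng) for _ in range(500)])
print(samples.mean(), samples.std())
```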
Theorem 2. Let $B$ and $\widetilde{B}$ be independent standard Brownian motions and $Z$ a standard normal random variable independent of $(B, \widetilde{B})$. Define $X_n$ by Equation (5) and
Suppose and
for some constant $c$ and all . Then, there is a constant $k$ independent of $n$ satisfying:
Moreover, if , one also obtains .
To better understand the spirit of Theorem 2, think of the trivial case $f \equiv 1$. Then, the asymptotic behavior of $X_n$ can be deduced from classical results. In fact,
and this rate is optimal; see Theorem 1 of [14]. On the other hand, since , the same conclusion can be drawn from Theorem 2.
We finally prove Theorem 2.
Proof of Theorem 2. First note that $(B + \widetilde{B})/\sqrt{2}$ and $(B - \widetilde{B})/\sqrt{2}$ are independent standard Brownian motions and
Thus, in order to apply Lemma 2, it suffices to let
,
, and
For fixed
,
and
are centered Gaussian random variables. Since
,
On noting that
, one also obtains:
By Lemma 2 and the Cauchy–Schwarz inequality,
If
, since
, Lemma 2 implies again:
Thus, to conclude the proof, it suffices to show that O.
Define
and note that:
Since
, it suffices to see that
O
for each
i. Since
Y has independent increments and
,
Therefore, O for each i, and this concludes the proof. ☐