Wasserstein Bounds in the CLT of the MLE for the Drift Coefficient of a Stochastic Partial Differential Equation

Abstract: In this paper, we are interested in the rate of convergence in the central limit theorem for the maximum likelihood estimator (MLE) of the drift coefficient of a stochastic partial differential equation, based on continuous-time observations of the Fourier coefficients u_i(t), i = 1, . . . , N, of the solution over a finite time interval [0, T]. We provide explicit upper bounds in the Wasserstein distance for the rate of convergence when N → ∞ and/or T → ∞. In the case when T is fixed and N → ∞, the upper bounds obtained in our results are sharper than the bounds in the Kolmogorov distance given in the relevant papers of Mishra and Prakasa Rao, and of Kim and Park.

Here, the v_i, i = 1, 2, . . . , are the Fourier coefficients of the initial condition f. It can be shown (see [1]) that u(t, x) belongs to L^2([0, T] × Ω; L^2([0, 1])) together with its derivative in x; it vanishes at x = 0 and x = 1, and its norm in L^2([0, 1]) is continuous in t. In addition, u(t, x) is the only solution to (1) with the above properties. Let Π_N be the finite-dimensional subspace of L^2([0, 1]) spanned by {e_1, . . . , e_N}. The likelihood ratio of the projection u^N of the solution u(t, x) onto the subspace Π_N (see [2,3]) can then be written explicitly, with P_θ denoting the probability measure on C([0, T]) generated by u^N. By maximizing the log-likelihood ratio with respect to the parameter θ, we obtain the maximum likelihood estimator (MLE) θ_{N,T} of θ based on u^N, given in (3). Moreover, using (2) and (3), we can write the estimation error θ_{N,T} − θ in an explicit form (see (4)). Recently, several papers have provided explicit upper bounds in the Kolmogorov distance for the rate of convergence in the central limit theorem of estimators of coefficients in stochastic Gaussian models; see, e.g., [4-8].
The purpose of this paper is to derive upper bounds in the Wasserstein distance for the rate of convergence of the distribution of the MLE θ_{N,T} when N → ∞ and/or T → ∞. Upper bounds in the Kolmogorov distance for the central limit theorem of the MLE θ_{N,T}, as N → ∞ with T fixed, are provided in [4,9]. Let us describe what has been proved in this direction. In [9], Mishra and Prakasa Rao proved that there exists a constant C_{θ,T}, depending on θ, ‖f‖²_{L²((0,1))} and T, such that the Kolmogorov bound (5) holds for any γ > 0 and all N ≥ N_0, with N_0 depending on θ and T, where Z ∼ N(0, 1) denotes a standard normal random variable. Moreover, in the case when f = 0 (so that v_i = 0 for all i = 1, 2, . . .), the upper bound in (5) reduces to (6). In this case, we notice that the upper bound of the Kolmogorov distance given by (6) does not vanish as N → ∞, and hence does not by itself establish the normal approximation of the MLE θ_{N,T}; a sharper upper bound is needed to prove the normal approximation in the Kolmogorov distance. This problem was solved by Kim and Park in [4], who improved the bound in (5) to one converging to zero as N → ∞ with T fixed, using techniques based on a combination of Malliavin calculus and Stein's method. More precisely, they proved, in the case when f = 0, that, for sufficiently large N, there exists a constant C_{θ,T} depending on θ and T such that the Kolmogorov distance between the normalized MLE and Z is at most C_{θ,T}/N, where the normalizing factor ϕ_N(θ) is given by ϕ_N(θ) = (T/(2θ)) ∑_{i=1}^N λ_i. The goal of this paper is to provide Berry-Esseen bounds in the Wasserstein distance for the MLE θ_{N,T} when N → ∞ and/or T → ∞. Let us first recall that the estimator θ_{N,T} is strongly consistent and asymptotically normal in three asymptotic regimes: N → ∞ with T fixed, and T → ∞ with N fixed (see, for instance, [10] and the references therein), and N, T → ∞ simultaneously (see [11]). However, the asymptotic distribution of an estimator is of limited practical use unless the rate of convergence is known.
To the best of our knowledge, no Berry-Esseen bounds are known for the MLE θ_{N,T} in terms of the Wasserstein distance when N → ∞ and/or T → ∞. Recall that, if X, Y are two real-valued integrable random variables, then the Wasserstein distance between the law of X and the law of Y is given by

d_W(X, Y) = sup_{h ∈ Lip(1)} |E[h(X)] − E[h(Y)]|,

where Lip(1) is the set of all Lipschitz functions h with Lipschitz constant 1. In what follows, in order to simplify the notation, we set u(0, x) = f(x) = 0, and hence u_i(0) = 0 for all i ≥ 1. The following are the main results of this paper.
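To make the metric concrete, here is a small numerical sketch (an illustration, not part of the paper's argument). In one dimension, d_W equals the L^1 distance between quantile functions, so the empirical distance between two equal-size samples is the mean absolute difference of the sorted samples; for two normal laws with equal variance, d_W is the difference of the means (here 0.5).

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size samples.

    In one dimension, d_W is the L1 distance between the quantile
    functions, i.e., the mean absolute difference of the sorted samples.
    """
    return np.abs(np.sort(x) - np.sort(y)).mean()

rng = np.random.default_rng(0)
sample_a = rng.standard_normal(200_000) + 0.5   # N(0.5, 1)
sample_b = rng.standard_normal(200_000)         # N(0, 1)

# For equal variances, d_W(N(m1, s^2), N(m2, s^2)) = |m1 - m2| = 0.5 here.
d = wasserstein_1d(sample_a, sample_b)
print(round(d, 3))
```

With 2 × 10^5 samples, the empirical value is within a few thousandths of the true distance 0.5.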

• Case 1: N → ∞ and T fixed. There exists a positive constant C_{θ,T} depending only on θ and T such that, for every N ≥ 1, the bound (7) holds; in particular, the normal approximation of the normalized MLE follows as N → ∞.
• Case 2: T → ∞ and N fixed. There exists a positive constant C_{θ,N} depending only on θ and N such that, for every T ≥ 1, the bound (8) holds; in particular, the normal approximation follows as T → ∞.
• Case 3: N → ∞ and T → ∞. There exists a positive constant C_θ depending only on θ such that, for every N ≥ 1 and T ≥ 1, the bound (9) holds; in particular, the normal approximation follows as N, T → ∞.

Remark 1. Note that, in Case 1 (N → ∞ and T fixed), we obtain the upper bound O(1/N^{3/2}) in the Wasserstein distance for the normal approximation of the MLE θ_{N,T}, while the upper bound in the Kolmogorov distance obtained by Kim and Park [4] is O(1/N).
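As a numerical illustration of Case 1 (T fixed, N observed modes), the estimator can be computed by Monte Carlo. The sketch below rests on assumptions not fixed by the text above and made only for illustration: unit noise intensity, eigenvalues λ_i = (iπ)² as for the Dirichlet Laplacian on [0, 1], independent Ornstein-Uhlenbeck modes du_i = −θλ_i u_i dt + dW_i, and the classical MLE form θ̂_{N,T} = −(∑_i λ_i ∫_0^T u_i dU_i) / (∑_i λ_i² ∫_0^T u_i² dt).

```python
import numpy as np

# Hypothetical setting for illustration (not the paper's exact model):
# independent OU modes du_i = -theta*lambda_i*u_i dt + dW_i with
# lambda_i = (i*pi)^2, observed on [0, T], and the classical MLE
# theta_hat = - sum_i lam_i int u_i du_i / sum_i lam_i^2 int u_i^2 dt.
rng = np.random.default_rng(1)
theta, N, T, n = 2.0, 10, 1.0, 20_000
dt = T / n
lam = (np.arange(1, N + 1) * np.pi) ** 2

a = np.exp(-theta * lam * dt)                  # exact OU one-step transition
s = np.sqrt((1.0 - a**2) / (2.0 * theta * lam))
u = np.zeros(N)
int_u2 = np.zeros(N)                           # Riemann sums of int u_i^2 dt
for _ in range(n):
    int_u2 += u**2 * dt
    u = a * u + s * rng.standard_normal(N)

# By Ito's formula with u_i(0) = 0: int_0^T u_i du_i = (u_i(T)^2 - T) / 2,
# since the quadratic variation of each mode over [0, T] equals T.
int_udu = (u**2 - T) / 2.0
theta_hat = -np.sum(lam * int_udu) / np.sum(lam**2 * int_u2)
print(theta_hat)
```

With θ = 2, N = 10, T = 1, the estimate typically lands within a few asymptotic standard deviations (about 0.03 here) of the true value.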
The paper is organized as follows. Section 2 contains some preliminaries presenting the tools we need from the analysis on Wiener space, including Wiener chaos calculus and Malliavin calculus. In Section 3, we derive upper bounds for the rate of convergence of the distribution of the MLE θ_{N,T} when N → ∞ and/or T → ∞ (see Theorem 1). We also include in this section a lemma that plays an important role in the proof of Theorem 1.

Preliminaries
In this section, we recall some elements from the analysis on Wiener space and the Malliavin calculus for Gaussian processes that we will need in the paper. For more details, we refer the reader to [12,13]. Let H := L^2([0, T]) and let {W(ϕ), ϕ ∈ H} be an isonormal Gaussian process, that is, a centered Gaussian family of random variables on a probability space (Ω, F, P) such that E(W(ϕ)W(ψ)) = ⟨ϕ, ψ⟩_H. In this case, we write W_t := W(1_{[0,t]}) and ∫_0^T ϕ(s) dW(s) := W(ϕ) for every ϕ ∈ H.
The Wiener chaos H_p of order p is defined as the closure in L^2(Ω) of the linear span of the random variables H_p(W(ϕ)), where ϕ ∈ H, ‖ϕ‖_H = 1, and H_p is the Hermite polynomial of degree p.
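The Hermite structure behind the chaoses can be checked numerically. A brief illustrative sketch (not part of the paper) using NumPy's probabilists' Hermite module and Gauss-Hermite quadrature to verify the orthogonality relation E[H_p(Z) H_q(Z)] = p! δ_{pq}, which underlies the orthogonality of the chaoses H_p:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Gauss-Hermite nodes/weights for the weight exp(-x^2/2); dividing the
# weights by sqrt(2*pi) turns quadrature sums into expectations under N(0,1).
x, w = He.hermegauss(40)
w = w / np.sqrt(2.0 * np.pi)

def hermite(p, t):
    """Probabilists' Hermite polynomial H_p evaluated at t."""
    c = np.zeros(p + 1)
    c[p] = 1.0
    return He.hermeval(t, c)

# Orthogonality: E[H_p(Z) H_q(Z)] = p! if p = q, and 0 otherwise.
m33 = np.sum(w * hermite(3, x) * hermite(3, x))   # 3! = 6
m32 = np.sum(w * hermite(3, x) * hermite(2, x))   # 0
print(m33, m32)
```

The 40-point rule integrates polynomials of degree up to 79 exactly, so both values agree with the closed forms to machine precision.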

• Multiple Wiener-Itô integral. The multiple Wiener stochastic integral I_p of order p with respect to W is defined as an isometry between the Hilbert space H^{⊙p} = L^2_sym([0, T]^p) (the symmetric tensor product), equipped with the norm √(p!) ‖·‖_{H^{⊗p}}, and the Wiener chaos H_p of order p, equipped with the L^2(Ω) norm.
• The Wiener chaos expansion. Let F ∈ L^2(Ω); then, there exists a unique sequence of functions f_p ∈ H^{⊙p} such that

F = E[F] + ∑_{p≥1} I_p(f_p),

where the terms I_p(f_p) are all mutually orthogonal in L^2(Ω) and E[F^2] = (E[F])^2 + ∑_{p≥1} p! ‖f_p‖^2_{H^{⊗p}}.
• Product formula and contractions. Let p, q ≥ 1 be integers and f ∈ H^{⊙p}, g ∈ H^{⊙q}; then

I_p(f) I_q(g) = ∑_{r=0}^{p∧q} r! (p choose r)(q choose r) I_{p+q−2r}(f ⊗_r g),   (10)

where f ⊗_r g is the contraction of f and g of order r, an element of H^{⊗(p+q−2r)} obtained by integrating r arguments of f against r arguments of g over [0, T]^r. The case p = q = 1 of (10) is particularly handy and can be written in its symmetrized form

I_1(f) I_1(g) = I_2(f ⊗̃ g) + ⟨f, g⟩_H,   (11)

where f ⊗ g means the tensor product of f and g and f ⊗̃ g its symmetrization.
• Hypercontractivity property in Wiener chaos. Fix q ≥ 1. For any p ≥ 2, there exists a constant c_{p,q} depending only on p and q such that, for every F ∈ ⊕_{l=1}^q H_l,

(E|F|^p)^{1/p} ≤ c_{p,q} (E|F|^2)^{1/2}.

It should be noted that the constants c_{p,q} above are known with some precision when F ∈ H_q: by ([12], Corollary 2.8.14), c_{p,q} = (p − 1)^{q/2}.
• Optimal fourth moment theorem. Let Z denote the standard normal law. Let X_n ∈ H_q be a sequence such that E[X_n] = 0 and Var[X_n] = 1, and assume that X_n converges in distribution to the normal law, which is equivalent to lim_n E[X_n^4] = 3 (this equivalence, proved originally in [14], is known as the fourth moment theorem). Then, we have an optimal estimate for the total variation distance d_TV(X_n, Z), known as the optimal fourth moment theorem, proved in [15].
This optimal estimate also holds for the Wasserstein distance d_W(X_n, Z) (see [16], Remark 2.2), as follows: there exist two constants c, C > 0, depending only on the sequence X but not on n, such that

c max(|k_3(X_n)|, k_4(X_n)) ≤ d_W(X_n, Z) ≤ C max(|k_3(X_n)|, k_4(X_n)).

Moreover, we recall that, for a standardized random variable X, i.e., with E[X] = 0 and E[X^2] = 1, the third and fourth cumulants are, respectively,

k_3(X) = E[X^3] and k_4(X) = E[X^4] − 3.

Fix T ≥ 1 and an integer N ≥ 1. Recall that, if H = L^2([0, T]; R^N) and W = (W_1, W_2, . . . , W_N), where W_1, W_2, . . . , W_N are independent standard Brownian motions, then, for every h = (h_1, . . . , h_N) ∈ H, the multiple integral I_1(h) is defined by

I_1(h) = ∑_{i=1}^N ∫_0^T h_i(s) dW_i(s),   (14)

and E[I_1(h)^2] = ‖h‖^2_H. Moreover, if g ∈ H^{⊙2}, then the third and fourth cumulants of I_2(g) satisfy the following (see (6.2) and (6.6) in [17], respectively):

k_3(I_2(g)) = 8 ⟨g, g ⊗_1 g⟩_{H^{⊗2}}   (15)

and

k_4(I_2(g)) = 16 ( ‖g ⊗_1 g‖^2_{H^{⊗2}} + 2 ‖g ⊗̃_1 g‖^2_{H^{⊗2}} ).   (16)

Throughout the paper, Z ∼ N(0, 1) denotes a standard normal random variable, while N(µ, σ^2) denotes a normal variable with mean µ and variance σ^2.
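The cumulant formulas for I_2 quoted above can be sanity-checked on the simplest second-chaos element (a check for illustration, not part of the paper). Take g = e ⊗ e with ‖e‖_H = 1, so that I_2(g) = Z² − 1 with Z ∼ N(0, 1), and g ⊗_1 g coincides with g and with its symmetrization; the contraction formulas then predict k_3 = 8 · 1 = 8 and k_4 = 16(1 + 2) = 48. Since Var(I_2(g)) = 2 here, we use the unnormalized cumulants k_3 = E[X³] and k_4 = E[X⁴] − 3 Var(X)².

```python
import numpy as np

# Even Gaussian moments E[Z^(2k)] = (2k - 1)!! for Z ~ N(0, 1).
m = {2 * k: int(np.prod(np.arange(2 * k - 1, 0, -2))) for k in range(1, 5)}
# m == {2: 1, 4: 3, 6: 15, 8: 105}

# X = Z^2 - 1 is centered with Var(X) = E[Z^4] - 2 E[Z^2] + 1 = 2.
var_x = m[4] - 2 * m[2] + 1

# Third and fourth cumulants via the binomial expansion of (Z^2 - 1)^n:
k3 = m[6] - 3 * m[4] + 3 * m[2] - 1
k4 = (m[8] - 4 * m[6] + 6 * m[4] - 4 * m[2] + 1) - 3 * var_x**2

print(k3, k4)   # 8 48, matching the contraction formulas
```

The exact moment computation reproduces k_3 = 8 and k_4 = 48, as the contraction formulas require.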

Berry-Esseen Bounds for the MLE
Recall that, in what follows, in order to simplify the notation, we set u(0, x) = f(x) = 0 and hence u_i(0) = 0 for all i ≥ 1. In this case, since Equation (2) is linear, it is immediate to solve it explicitly; one then gets the formula (17) for the processes u_i. Let us introduce the sequences defined in (18) and the normalizing factor ϕ_{N,T}(θ) defined in (19). Combining (4) and (18), we obtain, for every N ≥ 1 and T > 0, the representation (20) of the normalized estimation error. Using (14), we can rewrite the stochastic integrals appearing in (20) as multiple Wiener-Itô integrals. On the other hand, applying the product formula (11) to the square of ∫_0^s e^{θλ_i r} dW_i(r), we obtain, for every i = 1, . . . , N, a decomposition into a second-chaos term and a deterministic term. This, together with the linearity of I_2, yields a second-chaos representation (22) with an explicit kernel. According to (20)-(22), we can write, for every N ≥ 1 and T > 0, the error representation (23). We will also need the following lemma.

Lemma 1. For every α > 0 and 0 ≤ a < b, the identity (24) holds, where the function µ_α(x) is increasing and hence µ_α(x) > µ_α(0) = 0 for all x > 0. Furthermore, for every p, T_0 > 0, there exists a positive constant C_{θ,T_0}, depending only on θ and T_0, such that the estimate (25) holds, where the processes u_i, i = 1, . . . , N, are given by (17).

Proof.
We will use arguments similar to those of ([16], Proposition 6.3). Let 0 ≤ a < b. Using the fact that, for every t > a, the integral ∫_a^t e^{−α(t−u)} dW_u is independent of F_a^W, we obtain (24). Moreover, since µ_α(x) = (1 − e^{−2αx})/2, we have µ'_α(x) = α e^{−2αx} > 0 for all x > 0, so the function µ_α(x) is increasing. Thus, the proof of (24) is complete.
Let us now prove (25). Fix p, T_0 > 0, and let m be a positive integer such that m/(2p) > 1. Using Hölder's inequality, we obtain the moment estimate (26). Hence, using the fact that X ≥ 0 almost surely and applying the Carbery-Wright inequality, there is a universal constant c > 0 such that, for any ε > 0, we can write the small-ball estimate (27). Next, we use (24) with α = θλ_i, together with the fact that, for any fixed x > 0, the function y ↦ µ_y(x) is increasing on (0, ∞). Moreover, µ_α(x)/x is positive and continuous on (0, ∞). Combining these facts, we get (28) and (29). Therefore, combining (27)-(29), we deduce that (30) holds for every T ≥ T_0. Consequently, it follows from (26) and (30) that the desired estimate holds for all T ≥ T_0, which completes the proof of (25).
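The variance computation behind (24) can be checked numerically. A minimal sketch, assuming (as the proof suggests) that µ_α(x) = (1 − e^{−2αx})/2: by the Itô isometry, Var(∫_a^t e^{−α(t−u)} dW_u) = ∫_a^t e^{−2α(t−u)} du = (1 − e^{−2α(t−a)})/(2α) = µ_α(t − a)/α.

```python
import numpy as np

# Quadrature check of the Ito isometry behind (24):
# Var(int_a^t e^{-alpha(t-u)} dW_u) = int_a^t e^{-2 alpha (t-u)} du
#                                   = (1 - e^{-2 alpha (t-a)}) / (2 alpha),
# i.e., mu_alpha(t - a)/alpha under the assumed mu_alpha(x) = (1 - e^{-2 alpha x})/2.
alpha, a, t = 1.5, 0.0, 2.0
u = np.linspace(a, t, 200_001)
f = np.exp(-2.0 * alpha * (t - u))
dx = u[1] - u[0]
var_quad = dx * (f[0] / 2 + f[1:-1].sum() + f[-1] / 2)   # trapezoid rule
closed_form = (1.0 - np.exp(-2.0 * alpha * (t - a))) / (2.0 * alpha)
print(abs(var_quad - closed_form))
```

With 2 × 10^5 trapezoid panels, the quadrature agrees with the closed form far below 10^{-8}.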

Theorem 1.
Suppose that θ > 0. Let θ_{N,T} be the MLE given by (3), and let ϕ_{N,T}(θ) be the normalizing factor given by (19). Then, there exists a positive constant C_θ, depending only on θ, such that, for any integer N ≥ 1 and any real number T ≥ 1, the stated bound on d_W(√(ϕ_{N,T}(θ)) (θ_{N,T} − θ), Z) holds, where Z denotes a standard normal random variable. Consequently, the estimates (7)-(9) are obtained.