Convergence of Relative Entropy for Euler–Maruyama Scheme to Stochastic Differential Equations with Additive Noise

For a family of stochastic differential equations driven by additive Gaussian noise, we study the asymptotic behavior of the corresponding Euler–Maruyama scheme by deriving its convergence rate in terms of relative entropy. Our convergence rate in terms of relative entropy complements the conventional rates in the strong and weak sense and induces some other properties of the Euler–Maruyama scheme. For example, convergence in the total variation distance follows directly from Pinsker's inequality. Moreover, when the drift is β-Hölder continuous (0 < β < 1) in the spatial variable, a convergence rate in terms of the weighted variation distance is also established. Neither of these convergence results seems to be directly obtainable from any other convergence result for the Euler–Maruyama scheme. The main tool this paper relies on is the Girsanov transform.


Introduction
Consider the following d-dimensional stochastic differential equation (SDE)

dX_t = b(X_t) dt + σ(X_t) dW_t, X_0 = x ∈ R^d, (1)

where b : R^d → R^d, σ : R^d → R^d ⊗ R^m, and (W_t)_{t≥0} is an m-dimensional Brownian motion on some complete filtered probability space (Ω, F, (F_t)_{t≥0}, P).
Usually, it can be proved that (1) has strong well-posedness under reasonable conditions, whereas an explicit representation of the solution of (1) is unknown. Instead, one may develop various numerical schemes to approximate (1); see [1] and the references therein for more background. When the coefficients are regular, strong/weak convergence of numerical schemes for SDEs has been investigated considerably; see, for instance, the monograph [1].
One of the most popular numerical schemes for SDEs is the Euler–Maruyama (EM) scheme; an introduction can be found in [1] and the references therein. The EM scheme is a simple and computationally efficient numerical method for approximating solutions of SDEs, which involve both deterministic and stochastic (random) components and are applied in various fields, including physics, finance and biology. It extends the Euler method for ordinary differential equations by adapting it to handle the stochastic part of the equation. Besides serving as a fundamental tool for the numerical solution of SDEs, the EM scheme has also attracted much attention regarding its convergence rate, and there are several related works in the literature.
For strong convergence of the EM scheme for SDEs, there are some basic results under irregular coefficients. Yan (2002) [2] used the Meyer–Tanaka formula and estimates of local times to derive a strong convergence rate of the EM scheme for one-dimensional SDEs whose drift is Lipschitz continuous and whose diffusion is Hölder continuous. Gyöngy and Rásonyi (2011) [3] adopted a Yamada–Watanabe approximation approach to derive strong convergence of the EM scheme for one-dimensional SDEs with Hölder-continuous diffusions, where the drifts need not be Lipschitz continuous. Halidias and Kloeden (2008) [4] established strong convergence of an EM scheme for SDEs with monotone drift, which may be discontinuous. Leobacher and Szölgyenyi (2016) and Müller-Gronbach and Yaroslavtseva (2020) [5,6] investigated strong convergence of the EM scheme for one-dimensional SDEs with piecewise Lipschitz-continuous drifts.
Moreover, other authors have used transformation tools to obtain strong convergence of the EM scheme in more complex settings. Leobacher and Szölgyenyi (2017) [7] studied an EM scheme in the multi-dimensional case, which Leobacher and Szölgyenyi (2018) [8] extended to the multi-dimensional and degenerate case. The proofs are based on a transformation that turns the piecewise Lipschitz-continuous drifts into globally Lipschitz-continuous ones. Besides the transformation mentioned above, the Zvonkin transform is an alternative tool for handling the convergence of the EM scheme for SDEs with irregular coefficients. Bao, Huang and Yuan (2019) and Pamen and Taguchi (2017) [9,10] studied SDEs with Hölder- or Hölder–Dini-continuous drifts. Bao, Huang and Zhang (2022) [11] focused on an integrability condition; see the references in [9-11] for more results.
For weak convergence, one can refer to [12,13], wherein the drift satisfies an integrability condition and the main tool is the Girsanov transform.
We remark that all of the references mentioned above study weak or strong convergence of the EM scheme. As far as we know, there are no results on convergence in the sense of relative entropy.
In this paper, we further characterize the asymptotic behavior of the EM scheme for SDEs by studying its convergence rate in terms of the so-called relative entropy. Our main results show that the distribution of the EM iteration for SDEs driven by additive Gaussian noise converges, in relative entropy, to the distribution of the true solution, and hence the asymptotic behavior of the EM scheme is obtained. Although relative entropy is commonly viewed as a measure of the dissimilarity between two probability distributions, it is not a metric: it is asymmetric in its arguments and fails to satisfy the triangle inequality. Despite not being a metric, relative entropy maintains strong connections with various other metrics, including the total variation distance, Fisher information divergence and the Wasserstein distance; see, e.g., [14] and the references therein. These relationships highlight the versatility of relative entropy and its role in connecting with various other measures of dissimilarity between probability distributions, making it a valuable tool in information theory and related fields.
Relative entropy is widely used across a spectrum of disciplines, spanning probability theory, statistics, statistical physics, machine learning, neuroscience and information theory, owing to its many advantageous properties; see, e.g., [15] and the references therein. These qualities make it a versatile tool in diverse applications and fields of study. In addition to the conventional convergence analysis of EM schemes in the strong and weak senses, the relative entropy convergence in our findings may unveil previously unexplored properties of SDEs, presenting prospects for further exploration and investigation.
As an example of a direct corollary of our main results, the convergence of the EM scheme in total variation is implied by the well-known Pinsker's inequality, which states that the relative entropy between two probability measures provides an upper bound for their total variation distance.
Moreover, when the drift of the SDE (4) is β-Hölder continuous (0 < β < 1) in the spatial variable, a convergence rate in the weighted variation distance is also deduced.
The paper is organized as follows: In Section 2, we review some related concepts and definitions. In Section 3, we state our assumptions and present the main results. In Section 4, we give the proofs of all results. Conclusions and discussions are provided in Section 5.

Preliminaries
In this section, we first state some definitions and related concepts used in the main results of this paper.

Euler-Maruyama Scheme
The Euler–Maruyama scheme is a numerical method commonly used for approximating solutions of SDEs. Since SDEs involve both deterministic and stochastic components, their solutions are more challenging to obtain than those of ordinary differential equations. The basic idea is similar to the traditional Euler method for ordinary differential equations, adapted to handle the stochastic terms. The Euler–Maruyama scheme is one of the simplest time-discrete approximations of an Itô process, and it is sometimes called the Euler–Maruyama approximation or Euler approximation.
We consider an Itô process satisfying the stochastic differential Equation (1). Given discretization times 0 = τ_0 < τ_1 < · · · < τ_N = T, the discrete EM scheme is the iterative scheme

Y_{n+1} = Y_n + b(Y_n) ∆_n + σ(Y_n) ∆W_n, ∆W_n := W_{τ_{n+1}} − W_{τ_n},

for n = 0, 1, 2, · · · , N − 1, with initial value Y_0 = X_0. We shall also write ∆_n = τ_{n+1} − τ_n for the nth time increment and call δ = max_n ∆_n the maximum time step.
In this paper, we shall consider equidistant discretization times τ_n = nδ with δ = T/N for some integer N large enough so that δ ∈ (0, 1). When the diffusion coefficient is identically zero, that is, when σ = 0, the stochastic iterative scheme reduces to the deterministic Euler scheme for the ordinary differential equation dX_t = b(X_t)dt. The main difference is that we need to generate the random increments ∆W_n = W_{τ_{n+1}} − W_{τ_n}. From the properties of the Wiener process, we know that these increments are independent Gaussian random variables with mean E(∆W_n) = 0 and variance E((∆W_n)²) = ∆_n. A sequence of independent Gaussian pseudo-random numbers produced by a random number generator can be used for the increments of the Wiener process.
The recursive structure of the discrete EM scheme, which evaluates approximate values of the Itô process at the discretization instants only, is the key to its successful implementation for the numerical approximation of SDEs. For a given time discretization, the discrete EM scheme determines values of the approximation process at the discretization times only. We also need the continuous EM scheme, i.e.,

dX_t^δ = b(X_{t_δ}^δ) dt + σ(X_{t_δ}^δ) dW_t, X_0^δ = X_0,

where t_δ := ⌊t/δ⌋δ, and ⌊t/δ⌋ is the integer part of t/δ. More details and introductions can be found in [1] and the references therein.
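As an illustration (ours, not from the paper), the discrete scheme above can be sketched in Python for the additive-noise case, where σ is a constant; the function name and parameters are our own choices:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, N, rng=None):
    """Discrete EM scheme on the equidistant grid tau_n = n * delta, delta = T / N,
    for the additive-noise SDE dX_t = b(X_t) dt + sigma dW_t (sigma a constant)."""
    rng = np.random.default_rng() if rng is None else rng
    delta = T / N
    y0 = np.atleast_1d(np.asarray(x0, dtype=float))
    Y = np.empty((N + 1, y0.shape[0]))
    Y[0] = y0
    for n in range(N):
        # Independent Gaussian increments: mean 0 and variance delta per component.
        dW = rng.normal(0.0, np.sqrt(delta), size=y0.shape[0])
        Y[n + 1] = Y[n] + b(Y[n]) * delta + sigma * dW
    return Y

# With sigma = 0 the scheme reduces to the deterministic Euler method for
# dX_t = b(X_t) dt; for b(x) = -x the endpoint should be close to exp(-1).
path = euler_maruyama(lambda x: -x, sigma=0.0, x0=1.0, T=1.0, N=1000)
print(path[-1, 0])
```

The σ = 0 check mirrors the remark above that the scheme degenerates to the deterministic Euler method when the noise vanishes.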

Relative Entropy
Kullback and Leibler (1951) [16] first introduced the definition of relative entropy, which is also called the Kullback–Leibler divergence (K-L divergence for short). The definition of relative entropy is as below.
Definition 1 (Relative entropy). The relative entropy of two probability measures ν and µ on R^d is defined as

Ent(ν | µ) := ∫_{R^d} log(dν/dµ) dν if ν is absolutely continuous with respect to µ, and Ent(ν | µ) := ∞ otherwise,

where dν/dµ is the Radon–Nikodym derivative of ν with respect to µ.
Relative entropy is a concept from information theory and probability theory that measures how one probability distribution diverges from another. Roughly speaking, the relative entropy between two probability measures quantifies how "close" the two distributions are, although it is not a distance in the metric sense. In Chapter 4 of reference [17], the authors provide many properties of relative entropy (K-L divergence), establish the relationship between relative entropy, cross entropy and the conventional differential entropy, and give some examples of relative entropy calculations, for instance for exponential, normal and Poisson distributions.
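To make the definition concrete, the following Python sketch (our own illustration) compares the closed-form relative entropy between two one-dimensional normal distributions with a direct numerical evaluation of ∫ log(dν/dµ) dν; the helper name kl_gaussian is an assumption of this example:

```python
import numpy as np

def kl_gaussian(mu1, s1, mu2, s2):
    """Closed-form Ent(nu | mu) for nu = N(mu1, s1^2) and mu = N(mu2, s2^2)."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# Numerical check of Ent(nu | mu) = \int log(dnu/dmu) dnu on a fine grid.
mu1, s1, mu2, s2 = 0.0, 1.0, 0.5, 1.5
x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]
p = np.exp(-(x - mu1)**2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))  # density of nu
q = np.exp(-(x - mu2)**2 / (2 * s2**2)) / (s2 * np.sqrt(2 * np.pi))  # density of mu
numeric = np.sum(p * np.log(p / q)) * dx
closed = kl_gaussian(mu1, s1, mu2, s2)
print(closed, numeric)
```

The two values agree up to discretization error, illustrating that the definition reduces to an ordinary integral once a common dominating measure is available.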

Total Variation and Weighted Variation Distance
Definition 2 (Total variation). For two probability measures γ, γ̃ on R^d, the total variation distance is formulated as

‖γ − γ̃‖_var := sup_{A ∈ B(R^d)} |γ(A) − γ̃(A)|.

Remark 1. In view of Pinsker's inequality

‖γ − γ̃‖_var ≤ (Ent(γ | γ̃)/2)^{1/2}, (3)

the convergence of the total variation distance is implied directly by the relative entropy convergence.
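A quick numerical illustration of Remark 1 (ours, not from the paper): for two Gaussians, the total variation distance computed as (1/2) ∫ |p − q| dx stays below the Pinsker bound (Ent/2)^{1/2}; the distribution parameters below are arbitrary choices.

```python
import numpy as np

# Total variation (as sup_A |nu(A) - mu(A)| = (1/2) * \int |p - q| dx) versus
# the Pinsker bound sqrt(Ent(nu | mu) / 2), evaluated on a fine grid.
mu1, s1, mu2, s2 = 0.0, 1.0, 0.3, 1.2
x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]
p = np.exp(-(x - mu1)**2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
q = np.exp(-(x - mu2)**2 / (2 * s2**2)) / (s2 * np.sqrt(2 * np.pi))
tv = 0.5 * np.sum(np.abs(p - q)) * dx
ent = np.sum(p * np.log(p / q)) * dx
print(tv, np.sqrt(ent / 2))  # tv does not exceed the Pinsker bound
```

This is exactly the mechanism used later for Corollary 1: a relative entropy bound translates into a total variation bound.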
Definition 4 (Weighted variation distance). For any k > 1, the weighted variation distance for two probability measures γ, γ̃ on R^d is formulated as

‖γ − γ̃‖_{k,var} := sup_{|f| ≤ 1 + |·|^k} |γ(f) − γ̃(f)|,

where the supremum is taken over measurable functions f with |f(x)| ≤ 1 + |x|^k. Remark 2. The convergence of the weighted variation distance is not implied by the relative entropy convergence alone, so we need to investigate it further.

Stochastic Differential Equation Description
Definition 5 (Stochastic differential equations with additive noise). In this paper, we consider the following SDE

dX_t = b(X_t) dt + dW_t, X_0 = x ∈ R^d, (4)

where b : R^d → R^d, and (W_t)_{t≥0} is a d-dimensional Brownian motion on some complete filtered probability space (Ω, F, (F_t)_{t≥0}, P).

Assumptions
Throughout the paper, we impose the following assumptions on the drift term b of the SDE (4).

Main Results
Let L ξ denote the distribution of a random variable ξ.The main result is the following theorem.
Theorem 1. Assume the drift term of the SDE (4) satisfies assumption (A). Then there exists a constant C_{T,x,d} such that the relative entropy Ent(L_{X_t^δ} | L_{X_t}) is bounded explicitly in terms of the step size δ. Consequently, we have lim_{δ→0} Ent(L_{X_t^δ} | L_{X_t}) = 0 for every t ∈ (0, T]. Remark 5. Theorem 1 gives the convergence rate of the EM scheme in the sense of relative entropy for the SDE (4) with additive noise, so its asymptotic behavior can be established. The main tool of the proof is the Girsanov transform. The details of the proof can be found in Section 4.
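As an informal sanity check of the limit in Theorem 1 (our illustration, with an assumed linear drift b(x) = −ax, which is globally Lipschitz), one can compare the EM marginal with the exact marginal of an Ornstein–Uhlenbeck process: both laws are Gaussian, so the relative entropy is available in closed form.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """Relative entropy Ent(N(m1, v1) | N(m2, v2)) for one-dimensional Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2)**2) / v2 - 1.0)

def em_vs_exact_kl(a, x0, T, N):
    """For the additive-noise SDE dX_t = -a X_t dt + dW_t, both the exact marginal
    X_T and the EM marginal Y_N are Gaussian, so Ent(L_{Y_N} | L_{X_T}) is explicit."""
    delta = T / N
    r = 1.0 - a * delta
    m_em = x0 * r**N                                   # EM mean after N steps
    v_em = delta * (1.0 - r**(2 * N)) / (1.0 - r**2)   # EM variance after N steps
    m_ex = x0 * np.exp(-a * T)                         # exact mean at time T
    v_ex = (1.0 - np.exp(-2 * a * T)) / (2 * a)        # exact variance at time T
    return kl_gauss(m_em, v_em, m_ex, v_ex)

kls = [em_vs_exact_kl(a=1.0, x0=1.0, T=1.0, N=N) for N in (10, 100, 1000)]
print(kls)
```

The computed values decrease toward 0 as δ = T/N shrinks, consistent with the limit stated in Theorem 1; no claim is made here about the exact rate under assumption (A).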

Corollary 1.
When assumption (A) is satisfied, we have lim_{δ→0} ‖L_{X_t^δ} − L_{X_t}‖_var = 0 for every t ∈ (0, T]. Remark 6. Corollary 1 gives the convergence of the EM scheme in the total variation distance. It is implied directly by Pinsker's inequality (3).

Theorem 2.
If assumption (A) holds for some 0 < β < 1, then for any k ≥ 1 there exists a constant c_{k,T,x,d} such that the weighted variation distance ‖L_{X_t^δ} − L_{X_t}‖_{k,var} is bounded explicitly in terms of δ. Remark 7. Theorem 2 gives the convergence of the EM scheme in the weighted variation distance. This convergence is not a direct application of the relative entropy convergence of the EM scheme in Theorem 1. The details of the proof can be found in Section 4.

Proof of Theorem 1
Before proving Theorem 1, we prepare some auxiliary lemmas. The first lemma below plays a crucial role in the proof of Theorem 1.

Lemma 1. Assume (A).
Then for any k ≥ 1, there exists a constant C_{T,d,k} > 0 such that the corresponding kth moment estimate holds. Proof. Without loss of generality, we only prove the inequality for X_t^δ, since the argument for X_t is similar.
For any n ≥ 1, define the stopping time ζ_n as in the display. By (A), it is easy to see that the drift admits the corresponding bound. Combining this with (9), we can find a constant c_0 > 0 such that the stated estimate holds. By the Burkholder–Davis–Gundy inequality, there exists a constant c_1 > 0 such that the maximal inequality holds. Putting this into (11) and applying Gronwall's inequality, we find the required constant. Note that this yields that P-a.s. lim_{n→∞} ζ_n = ∞, which, combined with Fatou's lemma, yields the claim. So we complete the proof.
Lemma 2. Under (A), there exists a constant C_{T,x,d} > 0 such that the stated bound holds. Proof. Note the displayed identity. This, together with (10), Lemma 1 and the stated fact, yields the bound. Therefore, the proof is completed.

Proof of Theorem 1. For any t ∈ [0, T], let R_t be defined as in the display. Then (15) can be rewritten accordingly. Fix t_0 ∈ (0, T]. By (A) and Girsanov's theorem, we conclude that {R_{t∧τ_n}}_{t∈[0,t_0]} is a martingale and W_t^δ is a d-dimensional Brownian motion up to t_0 ∧ τ_n under the probability measure Q_n = R_{t_0∧τ_n} P. This, together with (A) and Lemma 2, implies the corresponding bound. From this and (16), we derive the next estimate. Combined with the martingale convergence theorem, this implies that {R_t}_{t∈[0,t_0]} is a martingale, and it follows from Fatou's lemma that the bound holds. Applying Girsanov's theorem again, we conclude that {R_t}_{t∈[0,t_0]} is a martingale and W_t^δ is a d-dimensional Brownian motion up to t_0 under the probability measure Q = R_{t_0} P, and hence the distribution of {X_t}_{t∈[0,t_0]} under Q is equal to that of {X_t^δ}_{t∈[0,t_0]} under P. As a result, the stated identity holds. By Young's inequality, we derive the bound on Ent(L_{X_{t_0}^δ} | L_{X_{t_0}}). The proof is completed.

Proof of Corollary 1
Corollary 1 is a direct consequence of Theorem 1 and Pinsker's inequality (3).

Proof of Theorem 2
Proof. First, we have the pointwise estimate for some constant c > 0. From this, together with (18) and Lemma 1, we derive the desired bound. The proof is completed.