A Dissipation of Relative Entropy by Diffusion Flows

Abstract: Given a probability measure, we consider the diffusion flows of probability measures associated with the partial differential equation (PDE) of Fokker–Planck. Our flows of probability measures are defined as solutions of the Fokker–Planck equation with the same strictly convex potential, which means that the flows share the same equilibrium. We then investigate the time derivative of the relative entropy in the case where both the object and the reference measures move according to the above diffusion flows, from which we obtain a certain dissipation formula and also an integral representation of the relative entropy.


Introduction
We shall begin with the definitions and fix some notation. Several results from the literature that we will use later are also gathered in this section. Probability measures on R^n in this paper are always assumed to be absolutely continuous with respect to the Lebesgue measure. Thus, when we say that a probability measure µ has the continuous density f, we mean dµ(x) = f(x) dx, and the measure µ is sometimes identified with its density f. Throughout this paper, an integral sign without a specified domain means integration over the whole space R^n.
Definition 1. For a probability measure µ on R^n with the density f, the entropy of µ (or f) is defined by

H(µ) = −∫ f(x) log f(x) dx,    (1)

and, if the density f is smooth, then we define the Fisher information of µ (or f) as

I(µ) = ∫ f(x) |∇ log f(x)|² dx.    (2)

For an R^n-valued random variable X, if X is distributed according to the probability measure µ, then we define the entropy H(X) and the Fisher information I(X) of X by H(X) = H(µ) and I(X) = I(µ), respectively.
In the one-dimensional case, the derivative ∇ log f in (2) is usually called the score function of X (or µ) and denoted by ρ_X (or ρ_µ). For a differentiable function ξ with bounded derivative, the score function satisfies

E[ξ(X) ρ_X(X)] = −E[ξ'(X)],

which is known as Stein's identity.
Let X be an R^n-valued random variable distributed according to the probability measure µ, and let Z be an n-dimensional standard Gaussian random variable (with mean vector 0 and identity covariance matrix I_n) independent of X. Then, for τ > 0, the independent sum X + √τ Z is called the Gaussian perturbation of X.
We denote by µ_τ the probability measure corresponding to the Gaussian perturbation X + √τ Z, and f_τ stands for the density of µ_τ. It is fundamental that the density function f_τ satisfies the heat equation

∂f_τ/∂τ = (1/2) ∆ f_τ,

where ∆ = ∇·∇ is the Laplacian operator.
The remarkable relationship between the entropy and the Fisher information can be established by the Gaussian perturbation as follows (see, for instance, [1] or [2]), which is known as the de Bruijn identity.
Lemma 1. Let X be an R^n-valued random variable distributed according to the probability measure µ. Then, for the Gaussian perturbation, it holds that

d/dτ H(X + √τ Z) = (1/2) I(X + √τ Z).

Namely, using the density f_τ of the Gaussian perturbed measure µ_τ, we can write

d/dτ H(f_τ) = (1/2) I(f_τ).

Definition 2. Let µ and ν be probability measures on R^n with µ ≪ ν (µ is absolutely continuous with respect to ν). We denote the probability density functions of µ and ν by f and g, respectively. Then, as ways of indicating the difference between two measures, we shall introduce the following quantities: the relative entropy H(µ | ν) of µ with respect to ν, equivalently H(f | g) of f with respect to g, is defined by

H(µ | ν) = H(f | g) = ∫ f(x) log ( f(x)/g(x) ) dx.

Although it does not appear to have received widespread attention, it is natural to define the relative Fisher information I(µ | ν) of µ with respect to ν, I(f | g) of f with respect to g, as (see, for instance, [3])

I(µ | ν) = I(f | g) = ∫ f(x) |∇ log ( f(x)/g(x) )|² dx,

where the relative density f/g is assumed to be sufficiently smooth so that the above expressions make sense.
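The de Bruijn identity of Lemma 1 can be sanity-checked numerically in a case where everything is available in closed form. In the sketch below (not from the paper; the Gaussian choice of X and all numerical parameters are ours for illustration), a finite-difference derivative of the entropy of a Gaussian perturbation is compared with half its Fisher information.

```python
import math

# For X ~ N(0, s), the perturbation X + sqrt(tau) Z is N(0, s + tau), so both
# the entropy and the Fisher information have closed forms.
def gaussian_entropy(var):
    # differential entropy of N(0, var): (1/2) log(2*pi*e*var)
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def gaussian_fisher(var):
    # Fisher information of N(0, var): 1/var
    return 1.0 / var

s, tau, h = 2.0, 0.5, 1e-6
# finite-difference derivative of tau -> H(X + sqrt(tau) Z)
dH = (gaussian_entropy(s + tau + h) - gaussian_entropy(s + tau - h)) / (2.0 * h)
# de Bruijn identity: the derivative equals (1/2) I(X + sqrt(tau) Z)
assert abs(dH - 0.5 * gaussian_fisher(s + tau)) < 1e-8
```

Here the exact derivative is 1/(2(s + τ)), so the agreement is not specific to the chosen parameters.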
The relative entropy H(f | g) and the relative Fisher information I(f | g) take non-negative values, and are 0 if and only if f(x) = g(x) for almost all x ∈ R^n. Similarly to Definition 1, for random variables X and Y with the distributions µ and ν, the relative entropy and the relative Fisher information of X with respect to Y are defined as H(X | Y) = H(µ | ν) and I(X | Y) = I(µ | ν), respectively.

In view of the de Bruijn identity, one might expect a similar connection between the relative entropy and the relative Fisher information. Indeed, Verdú in [4] investigated the derivative d/dτ H(µ_τ | ν_τ) for two Gaussian perturbations, and derived the following identity of de Bruijn type via the minimum mean-square error (MMSE) in estimation theory.

Lemma 2. Let X and Y be R^n-valued random variables distributed according to the probability measures µ and ν, respectively. Then, for the Gaussian perturbations, it holds that

d/dτ H(µ_τ | ν_τ) = −(1/2) I(µ_τ | ν_τ),

where µ_τ and ν_τ are the corresponding measures of the Gaussian perturbations X + √τ Z and Y + √τ Z.
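For two Gaussian laws, both the relative entropy and the relative Fisher information have closed forms, so the identity of Lemma 2 can be checked numerically. The following sketch (our illustration, with arbitrarily chosen means and variances) compares a finite-difference derivative of τ ↦ H(µ_τ | ν_τ) with −(1/2) I(µ_τ | ν_τ); the Gaussian perturbation simply adds τ to both variances.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    # relative entropy H(N(m1, v1) | N(m2, v2))
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def rel_fisher_gauss(m1, v1, m2, v2):
    # relative Fisher information I(N(m1, v1) | N(m2, v2))
    return v1 * (1.0 / v2 - 1.0 / v1) ** 2 + (m1 - m2) ** 2 / v2 ** 2

# Gaussian perturbation: variances grow by tau, means are unchanged
m1, v1, m2, v2, tau, h = 0.3, 1.5, -0.2, 0.8, 0.4, 1e-6
dH = (kl_gauss(m1, v1 + tau + h, m2, v2 + tau + h)
      - kl_gauss(m1, v1 + tau - h, m2, v2 + tau - h)) / (2.0 * h)
# Lemma 2: dH/dtau = -(1/2) I(mu_tau | nu_tau)
assert abs(dH + 0.5 * rel_fisher_gauss(m1, v1 + tau, m2, v2 + tau)) < 1e-6
```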
An alternative proof of this identity by direct calculation with integration by parts has been given in [5]. It should be noted that, in the formula of Lemma 2, the reference measure does move by the same heat equation.
Other derivative formulas for the relative entropy have been investigated in [6–8]; they are closely related to the theory of optimal transport and to functional inequalities of information. It is common in these fields that the reference measure is kept fixed at the equilibrium measure. Here, we shall recall such a derivative formula and list some useful related results.
Let V be a C^1 map on R^n and consider the probability measure κ given by

dκ(x) = (1/Z) e^{−V(x)} dx,

where Z = ∫ e^{−V(x)} dx is the normalization constant. Such a probability measure κ is called the equilibrium (or Gibbs) measure for the potential function V. Given a probability measure µ_0, we consider the diffusion flow of probability measures (µ_t)_{t≥0} associated with the gradient ∇V; that is, the density f_t of the measure µ_t (t > 0) is defined as the solution to the partial differential equation

∂f_t/∂t = ∆f_t + ∇·(f_t ∇V),    (12)

which is called the Fokker-Planck equation. It is easily seen that the long-time asymptotic stationary measure for the Fokker-Planck Equation (12) is given by the above equilibrium (Gibbs) measure.
Setting the equilibrium measure as the reference, we can understand the relationship between the relative entropy and the relative Fisher information via the Fokker-Planck equation as follows (see, for instance, [8]):

Proposition 1. Let (µ_t)_{t≥0} be a diffusion flow of probability measures associated with the gradient ∇V, and let κ be the equilibrium measure for the potential function V. Then, the differential formula

d/dt H(µ_t | κ) = −I(µ_t | κ)

holds. In the case where the potential function V is strictly K-convex, that is, Hess V ≥ K I_n for some constant K > 0, we can obtain an inequality between the relative entropy and the relative Fisher information with respect to the equilibrium measure for the potential V, which is known as the logarithmic Sobolev inequality (see, for instance, [7,9,10]).
Theorem 1. Let κ be the equilibrium measure for the potential function V. If the potential function V is strictly K-convex, then it follows that, for any probability measure µ (≪ κ),

H(µ | κ) ≤ (1/(2K)) I(µ | κ).

Combining Proposition 1 and Theorem 1, we can obtain the following convergence of the diffusion flow to the equilibrium:

Proposition 2. Let (µ_t)_{t≥0} be the diffusion flow of probability measures by the Fokker-Planck equation associated with the strictly K-convex potential V, and let κ be the equilibrium measure for the potential V. Then, it follows that

d/dt H(µ_t | κ) ≤ −2K H(µ_t | κ),

which implies that µ_t converges exponentially fast, as t → ∞, to the equilibrium κ in the relative entropy. Namely,

H(µ_t | κ) ≤ e^{−2Kt} H(µ_0 | κ).

The diffusion flow for the quadratic potential V(x) = |x|²/2 is called the Ornstein-Uhlenbeck flow, and the corresponding Fokker-Planck equation reduces to

∂f_t/∂t = ∆f_t + ∇·(x f_t).    (17)

In this case, we can obtain the explicit solution f_t, and it follows that the equilibrium measure becomes the standard Gaussian.
Furthermore, it is known that the solution to Equation (17) can be represented in terms of random variables as follows: let X be a random variable on R^n having the initial density f_0, and let Z be an n-dimensional standard Gaussian random variable independent of X. Then, the density function of the independent sum

e^{−t} X + √(1 − e^{−2t}) Z

gives the solution f_t to the partial differential Equation (17). Since the Ornstein-Uhlenbeck flow has the Gaussian equilibrium, it has been widely used as a technical tool in proofs of Gross's logarithmic Sobolev inequality [9] and of the Talagrand inequality [11].
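For a Gaussian initial law, the random-variable representation above makes the Ornstein-Uhlenbeck flow fully explicit, so the exponential convergence of Proposition 2 (with K = 1 for the quadratic potential) can be observed directly. A minimal sketch, with an arbitrarily chosen initial mean and variance:

```python
import math

def kl_gauss(m1, v1, m2, v2):
    # relative entropy H(N(m1, v1) | N(m2, v2))
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def ou(m, v, t):
    # law of e^{-t} X + sqrt(1 - e^{-2t}) Z for X ~ N(m, v)
    return m * math.exp(-t), 1.0 + (v - 1.0) * math.exp(-2.0 * t)

m0, v0 = 1.0, 3.0
H0 = kl_gauss(m0, v0, 0.0, 1.0)   # relative entropy to the N(0,1) equilibrium
for t in (0.5, 1.0, 2.0):
    mt, vt = ou(m0, v0, t)
    # Proposition 2 with K = 1: H(mu_t | kappa) <= e^{-2t} H(mu_0 | kappa)
    assert kl_gauss(mt, vt, 0.0, 1.0) <= math.exp(-2.0 * t) * H0
```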
Here, we shall mention one more useful result concerning the convergence in relative entropy, which is called the Csiszár-Kullback-Pinsker inequality (see, for instance, [12] or [13]).

Lemma 3. The convergence in relative entropy is stronger than the convergence in L¹-norm; that is, for probability densities f and g, it holds that

|| f − g ||_1 ≤ √( 2 H(f | g) ).

The problem of finding the time derivative of the relative entropy between two densities evolving under the same continuity equation has been investigated in [14,15]. In this paper, we will treat the Fokker-Planck equation with a strictly convex potential as our continuity equation, because it is the first natural extension of the heat equation, and a dissipation formula similar to Lemma 2 of Verdú can be derived by the fundamental method of integration by parts, as in [5].
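The Csiszár-Kullback-Pinsker inequality of Lemma 3 can likewise be checked numerically. The sketch below (our illustration; the two Gaussian densities are arbitrary choices) approximates the L¹ distance on a grid by the midpoint rule and compares its square with twice the relative entropy.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    # relative entropy H(N(m1, v1) | N(m2, v2))
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def gauss_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def l1_dist(m1, v1, m2, v2, lo=-20.0, hi=20.0, n=40001):
    # midpoint-rule approximation of the L1 distance between the two densities
    dx = (hi - lo) / n
    return sum(abs(gauss_pdf(lo + (i + 0.5) * dx, m1, v1)
                   - gauss_pdf(lo + (i + 0.5) * dx, m2, v2)) * dx
               for i in range(n))

# Pinsker: ||f - g||_1^2 <= 2 H(f | g)
assert l1_dist(0.0, 1.0, 1.0, 2.0) ** 2 <= 2.0 * kl_gauss(0.0, 1.0, 1.0, 2.0)
```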
The time integration of our formula will give an integral representation of the relative entropy. Applying this representation to the Ornstein-Uhlenbeck flows, we can give an extension of the formula for the entropy gap.

Dissipation of the Relative Entropy
We will calculate the time derivative of the relative entropy in the case where both the objective and the reference measures evolve by the Fokker-Planck equation with the same strictly convex potential. We shall begin by describing our setting precisely.
• Situation A: Let µ_0 and ν_0 be Lebesgue absolutely continuous probability measures on R^n with µ_0 ≪ ν_0, and let µ_t and ν_t (t ≥ 0) be the diffusion flows by the Fokker-Planck equation with the strictly K-convex potential function V, starting from µ_0 and ν_0, respectively. The growth rate of the potential function V is assumed to be at most polynomial.
We assume that, for t ≥ 0, the measures µ_t and ν_t have finite Fisher information, I(µ_t) < ∞ and I(ν_t) < ∞, and are absolutely continuous with respect to the Lebesgue measure, with densities f_t and g_t that are sufficiently smooth and rapidly decreasing at infinity. Furthermore, it is naturally required that µ_t ≪ ν_t. We shall also impose the following assumption on the relative densities, which causes no loss of generality but simplifies the proof.
• Assumption on the relative densities D: Let dκ(x) = e^{−V(x)} dx be the equilibrium measure of the potential function V, where the potential function V is normalized (shifted) so that Z = 1. This normalization (shift) has no effect on the Fokker-Planck equation, because the equation depends only on the gradient ∇V.
We may assume that the relative densities f_t(x)/e^{−V(x)} and g_t(x)/e^{−V(x)} are bounded away from zero and infinity for sufficiently large t. Namely, there exist uniform constants 0 < m_0 ≤ M_0 < ∞ and 0 < m_1 ≤ M_1 < ∞ such that

m_0 ≤ f_t(x)/e^{−V(x)} ≤ M_0 and m_1 ≤ g_t(x)/e^{−V(x)} ≤ M_1

for sufficiently large t. Hence, the relative density f_t/g_t is also bounded away from zero and infinity for sufficiently large t; that is,

m_0/M_1 ≤ f_t(x)/g_t(x) ≤ M_0/m_1

for sufficiently large t.
Remark 1. The above technical assumptions on the relative densities are justified by the non-linear approximation argument given by Otto and Villani in [8] and by the following fact: in our situation, the density f_t of the diffusion flow by the Fokker-Planck equation converges to the equilibrium e^{−V} in L¹-norm as t → ∞, by combining Proposition 2 with Lemma 3; so does g_t.
Proposition 3. Let µ_t and ν_t (t ≥ 0) be the flows of probability measures on R^n by the Fokker-Planck equation as in Situation A, with the assumptions on the relative densities D. Then, it holds that, for t > 0,

d/dt H(f_t | g_t) = −I(f_t | g_t).

Proof. We expand the derivative of the relative entropy as

d/dt H(f_t | g_t) = d/dt ∫ f_t log f_t dx − d/dt ∫ f_t log g_t dx.    (24)

We know that f_t and g_t converge to the equilibrium e^{−V} in L¹, that the time derivatives ∂_t f_t and ∂_t g_t converge to 0 as t → ∞, and that the densities f_t, g_t and |∂_t f_t|, |∂_t g_t| are uniformly bounded in t. Furthermore, by our assumptions on the relative densities, f_t/g_t is bounded away from zero and infinity. Hence, we are allowed to exchange integration and t-differentiation, which is justified by a routine argument with the bounded convergence theorem (see, for instance, [2] and also [8]).
Then, the first term on the right-hand side of (24) is calculated with the Fokker-Planck equation, which produces the three integrals (I), (II), and (III) in (25). The integral (I) in (25) is clearly 0. By applying integration by parts, the integral (II) can be rewritten as (26). Here, it should be noted that (log f_t) ∇f_t vanishes at infinity by the following observation: if we factorize it as (log f_t) ∇f_t = (√f_t log f_t)(∇f_t/√f_t), then, as µ_t has finite Fisher information I(µ_t) < ∞, the factor ∇f_t/√f_t must be bounded at infinity. Furthermore, √f_t log f_t vanishes at infinity by the limit formula lim_{ξ→+0} ξ log ξ = 0.
The integral (III) in (25) becomes (28) by the following observations: since f_t is rapidly decreasing at infinity and the growth rate of the potential function V is at most polynomial by our assumption, we have lim_{|x|→∞} f_t ∇V = 0. The limit lim_{|x|→∞} f_t log f_t = 0 follows as above. Thus, (log f_t) f_t ∇V vanishes at infinity. Substituting (26) and (28) into (25), we obtain (30).

Next, we treat the second term on the right-hand side of (24), which can be reformulated by the Fokker-Planck equation into the four integrals (IV), (V), (VI), and (VII) in (31). The integral (IV) in (31) can be reformulated by applying integration by parts as in (32), where we can see that ∇g_t (f_t/g_t) vanishes at infinity by the following observation: factorize it as ∇g_t (f_t/g_t) = (∇g_t/√g_t)(√g_t f_t/g_t); then the boundedness of √g_t (f_t/g_t) follows from our assumptions on the relative density, and that of ∇g_t/√g_t comes from the finiteness of the Fisher information I(ν_t) < ∞.
Applying integration by parts again, the integral (V) in (31) becomes (34). Because the growth rate of the function V is at most polynomial and f_t is rapidly decreasing at infinity, the term g_t ∇V (f_t/g_t) = f_t ∇V vanishes at infinity.
The integral (VI) in (31) can be reformulated as (36), where we can find that (log g_t) ∇f_t vanishes at infinity by factorizing it as (∇f_t/√f_t)(√f_t log g_t), with the assumption I(µ_t) < ∞ and the vanishing of √f_t log g_t at infinity. The last integral (VII) in (31) becomes (37), where (log g_t) f_t ∇V vanishes at infinity by the same factorization and for the same reasons as above. In the reformulation of the integrals (VI) and (VII), we have, of course, used integration by parts. Substituting Equations (32) to (37) into (31), we obtain (39). Finally, combining (30) and (39), we obtain (40), which completes the proof.

Remark 2. The assumption that the surface terms vanish in the integrations by parts, that is, the vanishing at infinity in the proof of Proposition 3, is rather common in physics; it has also been repeatedly employed in a series of works by Plastino et al., for instance, in [16,17].
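The dissipation formula of Proposition 3 can be illustrated numerically in a fully explicit case (this example is ours, not part of the proof): under the Ornstein-Uhlenbeck flow, i.e., the quadratic potential V(x) = x²/2, Gaussian initial laws remain Gaussian, so H(f_t | g_t) and I(f_t | g_t) are in closed form and the identity d/dt H(f_t | g_t) = −I(f_t | g_t) can be checked by a finite difference. Initial parameters are arbitrary.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    # relative entropy H(N(m1, v1) | N(m2, v2))
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def rel_fisher_gauss(m1, v1, m2, v2):
    # relative Fisher information I(N(m1, v1) | N(m2, v2))
    return v1 * (1.0 / v2 - 1.0 / v1) ** 2 + (m1 - m2) ** 2 / v2 ** 2

def ou(m, v, t):
    # Ornstein-Uhlenbeck evolute of N(m, v): law of e^{-t} X + sqrt(1-e^{-2t}) Z
    return m * math.exp(-t), 1.0 + (v - 1.0) * math.exp(-2.0 * t)

a, b = (1.0, 3.0), (-0.5, 0.5)   # initial (mean, variance) of f_0 and g_0

def H(t):
    (m1, v1), (m2, v2) = ou(*a, t), ou(*b, t)
    return kl_gauss(m1, v1, m2, v2)

t, h = 0.7, 1e-5
dH = (H(t + h) - H(t - h)) / (2.0 * h)
(m1, v1), (m2, v2) = ou(*a, t), ou(*b, t)
# Proposition 3: d/dt H(f_t | g_t) = -I(f_t | g_t)
assert abs(dH + rel_fisher_gauss(m1, v1, m2, v2)) < 1e-6
```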
Next, we will see the convergence of the relative entropy for the pair of time evolutes under the same Fokker-Planck equation.

Proposition 4. Under Situation A with Assumption D, the relative entropy H(f_t | g_t) converges exponentially fast to 0 as t → ∞.
Proof. We first expand the relative entropy H(f_t | g_t) as

H(f_t | g_t) = ∫ f_t log ( f_t e^{V} ) dx − ∫ f_t log ( g_t e^{V} ) dx,

where V(x) is the potential function of the Fokker-Planck equation. Then, we obtain

H(f_t | g_t) = H(f_t | e^{−V}) − ∫ f_t ( log g_t + V ) dx.    (42)

Since the first term on the right-hand side of (42) is the relative entropy H(f_t | e^{−V}), we concentrate our attention on the second term.
We put the set P ⊂ R^n as P = { x ∈ R^n : log g_t(x) + V(x) ≥ 0 }, and estimate the second term over P and over its complement separately. For sufficiently large t, by virtue of the assumption on the relative densities, we can obtain the estimation

| ∫ f_t ( log g_t + V ) dx | ≤ M || g_t − e^{−V} ||_1

with the positive constant M = max{M_0, M_1}.
As we have mentioned in Lemma 3, the relative entropy controls the L¹-norm, so that

|| g_t − e^{−V} ||_1 ≤ √( 2 H(g_t | e^{−V}) ).

Thus, we obtain that, for sufficiently large t,

H(f_t | g_t) ≤ H(f_t | e^{−V}) + M √( 2 H(g_t | e^{−V}) ).

Taking the limit t → ∞, it follows that H(f_t | g_t) → 0 exponentially fast, because H(f_t | e^{−V}) and H(g_t | e^{−V}) converge to 0 exponentially fast with rate 2K.
By the dissipation formula in Proposition 3, together with the above convergence, we can obtain the following integral representation of the relative entropy.

Theorem 2. Let f_t and g_t (t ≥ 0) be the flows of probability densities on R^n by the Fokker-Planck equation under Situation A with the assumptions on the relative densities D. Then, we have the integral representation of the relative entropy

H(f_0 | g_0) = ∫_0^∞ I(f_t | g_t) dt.    (48)

If, in particular, we choose the equilibrium e^{−V} as the initial reference measure g_0, then the reference is stationary, g_t = e^{−V} (t ≥ 0). Hence, as a direct consequence of the above theorem, we have the following integral formula:

H(f_0 | e^{−V}) = ∫_0^∞ I(f_t | e^{−V}) dt.    (49)
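The integral representation (48) can be illustrated in the explicit Gaussian case used above (our example, with arbitrary initial parameters): for two Gaussian initial laws under the Ornstein-Uhlenbeck flow, we integrate t ↦ I(f_t | g_t) by the trapezoidal rule and compare with H(f_0 | g_0); since the integrand decays exponentially, a finite horizon already captures the whole integral.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    # relative entropy H(N(m1, v1) | N(m2, v2))
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def rel_fisher_gauss(m1, v1, m2, v2):
    # relative Fisher information I(N(m1, v1) | N(m2, v2))
    return v1 * (1.0 / v2 - 1.0 / v1) ** 2 + (m1 - m2) ** 2 / v2 ** 2

def ou(m, v, t):
    # Ornstein-Uhlenbeck evolute of N(m, v) (quadratic potential V(x) = x^2/2)
    return m * math.exp(-t), 1.0 + (v - 1.0) * math.exp(-2.0 * t)

f0, g0 = (0.8, 2.0), (-0.3, 0.6)   # initial (mean, variance) pairs

def I(t):
    (m1, v1), (m2, v2) = ou(*f0, t), ou(*g0, t)
    return rel_fisher_gauss(m1, v1, m2, v2)

# trapezoidal rule on [0, T]; the integrand decays like e^{-2t}, so T = 20
# is effectively infinity
T, n = 20.0, 20000
dt = T / n
integral = sum(0.5 * (I(i * dt) + I((i + 1) * dt)) * dt for i in range(n))
# Theorem 2: H(f_0 | g_0) equals the time integral of I(f_t | g_t)
assert abs(integral - kl_gauss(*f0, *g0)) < 1e-3
```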

An Application to the Entropy Gap
In this section, we shall apply the time-integration formula of Theorem 2 to the Ornstein-Uhlenbeck flows, which gives an extension of the formula for the entropy gap. For simplicity, we will consider the one-dimensional case in this section.
Among random variables with unit variance, the Gaussian has the largest entropy. Let X be a standardized (mean 0 and variance 1) random variable, and let Z be a standard Gaussian random variable. Then, the quantity H(Z) − H(X) is called the entropy gap or the non-Gaussianity, and it coincides, of course, with the relative entropy H(X | Z). It is known (see, for instance, [18]) that this entropy gap can be written as an integral of the Fisher information. Namely,

H(Z) − H(X) = ∫_0^∞ ( I(X_t) − 1 ) dt = ∫_0^∞ I(X_t | Z) dt,    (50)

where X_t is the time evolute at t of the random variable X by the Ornstein-Uhlenbeck semigroup in (17). It is easy to see that our formula (48) of Theorem 2 covers (50) as one of the special cases.
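As a concrete instance of the entropy gap (our illustration, not taken from the text): for the standardized uniform distribution on (−√3, √3), both sides of the identity H(Z) − H(X) = H(X | Z) can be evaluated, the entropies in closed form and the relative entropy by numerical integration over the support of X.

```python
import math

# standardized uniform X on (-sqrt(3), sqrt(3)): mean 0, variance 1
a = math.sqrt(3.0)
H_X = math.log(2.0 * a)                        # entropy of Uniform(-a, a)
H_Z = 0.5 * math.log(2.0 * math.pi * math.e)   # entropy of N(0, 1)
gap = H_Z - H_X                                # entropy gap (non-Gaussianity)

def phi(x):
    # standard Gaussian density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# the gap coincides with the relative entropy H(X | Z): midpoint rule on the
# support of X, where its density is the constant f = 1/(2a)
n = 100000
dx = 2.0 * a / n
f = 1.0 / (2.0 * a)
kl = sum(f * math.log(f / phi(-a + (i + 0.5) * dx)) * dx for i in range(n))
assert gap > 0.0
assert abs(gap - kl) < 1e-6
```

The positivity of the gap reflects the maximum-entropy property of the Gaussian among unit-variance laws.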
In the formula (48) of Theorem 2, even for the case of the quadratic potential V(x) = x²/2, we can choose the initial reference measure ν_0 more freely, other than the standard Gaussian, as we illustrate below. Let X be a centered random variable of variance σ² (not a unit in general) and let G be a centered Gaussian of the same variance σ². Then, applying the integral formula with the potential function V(x) = x²/2, the relative entropy H(X | G), which is equal to the entropy gap H(G) − H(X), can be written as the integral

H(G) − H(X) = ∫_0^∞ I(X_t | G_t) dt,    (51)

where X_t and G_t are the time evolutes of X and G, respectively, by the Ornstein-Uhlenbeck semigroup. In Figure 2, the dashed curve indicates the convergence of H(f_t | g_t) in Figure 1.
Example 2. In the second example, we put the initial reference measure as g_0(x) = ϕ(x, 3); that is, we take the centered Gaussian of variance 3 as the initial reference measure, while f_0(x) = u(x) is unchanged. In Figure 3, the convergence of H(f_t | g_t) is illustrated. In Figure 4, the dashed curve indicates the convergence of H(f_t | g_t) in Figure 3.

Example 3. In the third numerical example, the initial objective and the initial reference measures are given as the uniform distributions on the intervals (−1, 1) and (−1/2, 1/2), respectively. Namely, we set the densities as f_0(x) = u(x) and g_0(x) = 2u(2x). We illustrate the convergence of H(f_t | g_t) in Figure 5. In Figure 6, the dashed curve indicates the convergence of H(f_t | g_t) in Figure 5.

Conclusions
The partial differential equation of Fokker-Planck describes the flow of probability measures for a diffusion process. The diffusion by the Fokker-Planck equation with a strictly convex potential V has the long-time asymptotic stationary measure e^{−V}. In the case of the relative entropy with the stationary measure e^{−V} as the reference, the dissipation formula for the relative entropy of the diffusion flow by the Fokker-Planck equation with the potential V is known in the literature.
In this paper, we have derived a similar dissipation formula in a more flexible situation. Namely, we have considered the situation in which the reference measure also evolves by the Fokker-Planck equation with the same potential function V as the objective measure. We have then obtained another integral representation of the relative entropy, which gives an extension of the formula for the entropy gap.

Figure 3. The value of H(f_t | g_t).

Figure 4. The value of H(f_t | e^{−V}) + 2H(g_t | e^{−V}).

Figure 5. The value of H(f_t | g_t).

Figure 6. The value of H(f_t | e^{−V}) + 2H(g_t | e^{−V}).