A Representation of the Relative Entropy with Respect to a Diffusion Process in Terms of Its Infinitesimal Generator

Olivier Faugeras; James MacLaurin

doi:10.3390/e16126705


INRIA Sophia Antipolis Méditerranée, 2004 Route des Lucioles, Sophia Antipolis, France

* Author to whom correspondence should be addressed.

Entropy 2014, 16(12), 6705-6721; https://doi.org/10.3390/e16126705

This article belongs to the Section Information Theory, Probability and Statistics


Abstract

In this paper we derive an integral (with respect to time) representation of the relative entropy (or Kullback–Leibler Divergence) R(μ||P), where μ and P are measures on C([0, T]; ℝ^d). The underlying measure P is a weak solution to a martingale problem with continuous coefficients. Our representation is in the form of an integral with respect to its infinitesimal generator. This representation is of use in statistical inference (particularly involving medical imaging). Since R(μ||P) governs the exponential rate of convergence of the empirical measure (according to Sanov’s theorem), this representation is also of use in the numerical and analytical investigation of finite-size effects in systems of interacting diffusions.

Keywords:

relative entropy; Kullback–Leibler; diffusion; martingale formulation

1. Introduction

In this paper we derive an integral representation of the relative entropy R(μ||P), where μ is a measure on C([0, T]; ℝ^d) and P governs the solution to a stochastic differential equation (SDE). The relative entropy is used to quantify the distance between two measures. It has considerable applications in statistics, imaging, information theory and communications. It has been used in the long-time analysis of Fokker–Planck equations [1,2], the analysis of dynamical systems [3] and the analysis of spectral density functions [4]. It has been used in financial mathematics to quantify the difference between martingale measures [5,6]. It has also been shown in [7] that the existence of the minimal relative entropy martingale measure for birth and death processes can be reduced to the problem of solving a Hamilton–Jacobi–Bellman equation; furthermore, the minimal entropy martingale measures (MEMMs) for geometric Lévy processes are investigated in [8]. The finiteness of R(μ||P) has been shown to be equivalent to the invertibility of certain shifts on Wiener space, when P is the Wiener measure [9,10]. However, one of the most frequent uses of the relative entropy is in statistical inference (particularly in medical imaging) [11,12]. For example, in data fitting, it is a standard technique to select the parameters that minimise the relative entropy of two conditional probability distributions [13]. Modelling in medical imaging increasingly involves diffusion processes with state space C([0, T]; ℝ^d), for which the expression

R (μ ‖ P) = E^{μ} [\log \frac{d μ}{d P}]

or the variational definition in Definition 1 may not always be tractable. Furthermore, it is not always clear that one may simply approximate the relative entropy by successively calculating it for the marginals over increasingly fine time-discretisations, since these expressions may asymptotically approach infinity (see (4) below).

Another very important application of the relative entropy is in the field of Large Deviations. Sanov’s theorem dictates that the empirical measure induced by independent samples governed by the same probability law P converges towards its limit exponentially fast, and the constant governing the rate of convergence is the relative entropy [14]. Large Deviations have been applied, for example, to spin glasses [15], neural networks [16–18] and mean-field models of interacting particles [19,20]. In the mean-field theory of neuroscience in particular, there has been recent interest in the modelling of “finite size effects” [18,21], that is, the deviations from the limiting behaviour for a population of a particular size. Large Deviations provides a mathematically rigorous tool to do this. Here the limiting system is typically the law P of a stochastic process, and therefore the exponential rate of decay of the probability that the empirical measure of the system is “near” some measure μ is the relative entropy R(μ||P). However, the numerical calculation of R(μ||P) is not straightforward: the results of this paper provide an alternative characterization of R(μ||P), which assists in this calculation.

For example, the rate function for the Large Deviation Principle of the interacting particle model of [20] is expressed directly in terms of the relative entropy between two measures on the space of continuous functions (see in particular Theorem 5.2 of that paper). Similarly, the rate function in [18] (Theorem 10) may be expressed as a function of the relative entropy. In more detail, the rate function \overset{⌣}{J} in [18] (Theorem 10) is of the form

\overset{⌣}{J} (μ) = \lim_{n \to \infty} \frac{1}{| V_{n} |} R (μ^{V_{n}} ‖ Ξ^{V_{n}}) .

Here Ξ is the law of the process in [18] (Equation (31)), i.e., the law of a ℤ^d-indexed stochastic process, and μ^{V_n} and Ξ^{V_n} are the marginals over the finite hypercube V_n of side length (2n + 1). The results of this paper give a means of evaluating R(μ^{V_n}||Ξ^{V_n}) and therefore \overset{⌣}{J}(μ).

In this paper we derive a specific integral (with respect to time) representation of the relative entropy R(μ||P) when P is the law of a diffusion process. The representation is in terms of the infinitesimal generator of P. This P is the same as in [22] (Section 4). The representation makes use of regular conditional probabilities. We expect that in some circumstances, it ought to be more tractable than the standard definition in Definition 1, and thus it might be of practical use in the applications listed above.

2. Outline of Main Result

Let T be the Banach Space C([0, T];R^d) equipped with the norm

‖ X ‖ = \sup_{s \in [0, T]} {| X_{s} |},

(1)

where |⋅| is the standard Euclidean norm over ℝ^d. We let (F_t) be the canonical filtration over (T, B(T)). For some topological space X, we let B(X) be the Borel σ-algebra and ℳ(X) the space of all probability measures on (X, B(X)). Unless otherwise indicated, we endow ℳ(X) with the topology of weak convergence. Let σ = {t_1, t_2, …, t_m} be a finite set of elements such that t_1 ≥ 0, t_m ≤ T and t_j < t_{j+1}. We term σ a partition, and denote the set of all such partitions by J. The set of all partitions of the above form such that t_1 = 0 and t_m = T is denoted J_*. We define |σ| = \sup_{1 ≤ j ≤ m−1} (t_{j+1} − t_j). For t ∈ [0, T] and σ ∈ J_*, we define \underline{σ}(t) = \sup {s ∈ σ | s ≤ t}. The following definition of relative entropy is standard.

Definition 1. Let (Ω, ℋ) be a measurable space, and μ, ν probability measures on it. The relative entropy of μ with respect to ν is

R_{ℋ} (μ ‖ ν) = \sup_{f \in ℰ} {E^{μ} [f] - \log E^{ν} [\exp (f)]} \in ℝ \cup {\infty},

where ℰ is the set of all bounded ℋ-measurable functions. If the σ-algebra is clear from the context, we omit the ℋ and write R(μ||ν). If Ω is Polish and ℋ = B(Ω), then we only need to take the supremum over the set of all continuous bounded functions.
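As a small illustration of Definition 1, the following sketch evaluates the variational expression on a three-point space, where everything is explicit. The distributions `mu` and `nu` and the helper names are assumptions chosen for illustration, not part of the paper. When μ is absolutely continuous with respect to ν, the supremum is attained at f = log(dμ/dν), and any other bounded f gives a value that is no larger.

```python
import math
import random

def kl(mu, nu):
    """Relative entropy sum_i mu_i log(mu_i / nu_i) of finite distributions."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)

def variational_objective(f, mu, nu):
    """E^mu[f] - log E^nu[exp(f)] for a function f given by its list of values."""
    mean_f = sum(m * fi for m, fi in zip(mu, f))
    log_mgf = math.log(sum(n * math.exp(fi) for n, fi in zip(nu, f)))
    return mean_f - log_mgf

# Hypothetical example distributions on a three-point space.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]

# Candidate maximiser f = log(dmu/dnu).
f_star = [math.log(m / n) for m, n in zip(mu, nu)]
best = variational_objective(f_star, mu, nu)

# Randomly drawn bounded functions never exceed the relative entropy.
rng = random.Random(0)
trials = [variational_objective([rng.uniform(-3, 3) for _ in mu], mu, nu)
          for _ in range(1000)]

print(kl(mu, nu), best, max(trials))
```

On a finite space the supremum is a maximum; on path space this is generally not a practical way to compute R(μ||P), which is the motivation for the representation derived in this paper.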

Let P ∈ ℳ(T) be the following law governing a Markov–Feller diffusion process on T. Stipulate P to be a weak solution (with respect to the canonical filtration) of the local martingale problem with infinitesimal generator

L_{t} (f) = \frac{1}{2} \sum_{1 \leq j, k \leq d} a^{j k} (t, x) \frac{\partial^{2} f}{\partial x^{j} \partial x^{k}} + \sum_{1 \leq j \leq d} b^{j} (t, x) \frac{\partial f}{\partial x^{j}},

for f in C²(ℝ^d), i.e., the space of twice continuously differentiable functions. The initial condition (governing P_0) is μ_I ∈ ℳ(ℝ^d). The coefficients a^{jk}, b^j : [0, T] × ℝ^d → ℝ are assumed to be continuous (over [0, T] × ℝ^d), and the matrix a(t, x) is strictly positive definite for all t and x. Here P is assumed to be the unique weak solution. We note that the above infinitesimal generator is the same as in [22] (p. 269) (note particularly its Remark 4.4). We note that P is the law of the solution Y = (Y^j) to the following stochastic differential equation: for 1 ≤ j ≤ d,

d Y_{t}^{j} = b^{j} (t, Y) d t + \sum_{k = 1}^{d} a^{j k} (t, Y) d W^{k} .

Here (W^k) are independent Wiener processes.

Our major result is the following. Let μ ∈ ℳ(T) govern a random variable X ∈ T. For x ∈ T, we write μ_{|[0,s],x} for the regular conditional probability (rcp) given X_r = x_r for all r ∈ [0, s]. The marginal of μ_{|[0,s],x} at some time t ≥ s is written μ_{t|[0,s],x}.

Theorem 1. Let (σ^{(m)})_{m ∈ ℤ^+} be any sequence of partitions such that σ^{(m)} ⊆ σ^{(m+1)} and |σ^{(m)}| → 0 as m → ∞. For μ ∈ ℳ(T),

\begin{array}{l} R (μ ‖ P) = R_{ℱ_{0}} (μ ‖ P) + \sup_{σ \in J_{*}} Γ (σ) \\ = R_{ℱ_{0}} (μ ‖ P) + \lim_{m \to \infty} Γ (σ^{(m)}) \end{array}

(2)

where

Γ (σ) = E^{μ (x)} [\int_{0}^{T} \sup_{f \in D} {\frac{\partial}{\partial t} E^{μ_{t | [0, \underline{σ} (t)], x}} [f] - E^{μ_{t | [0, \underline{σ} (t)], x} (y)} [L_{t} f (y) + \frac{1}{2} \sum_{j, k = 1}^{d} a^{j k} (t, y) \frac{\partial f}{\partial y^{j}} \frac{\partial f}{\partial y^{k}}]} d t] .

(3)

Here D is the Schwartz space of compactly supported functions ℝ^d → ℝ, possessing continuous derivatives of all orders. If \frac{\partial}{\partial t} E^{μ_{t | [0, \underline{σ} (t)], x}} [f] does not exist, then we consider it to be ∞.

Our paper has the following format. In Section 3 we make some preliminary definitions, defining the process P against which the relative entropy is taken in this paper. In Section 4 we employ the projective limits approach of [22] to obtain the chief result of this paper: Theorem 1. This gives an explicit integral representation of the relative entropy. In Section 5 we apply the result in Theorem 1 to various corollaries, including the particular case when μ is the solution of a martingale problem. We finish by comparing our results to those of [19] and [20].

3. Preliminaries

We outline some necessary definitions. For σ ∈ J of the form σ = {t_1, t_2, …, t_m}, let σ_{;j} = {t_1, …, t_j}. We denote the number of elements in a partition σ by m(σ). We let J_s be the set of all partitions lying in [0, s]. For 0 < s < t ≤ T, we let J_{s;t} be the set of all partitions of the form σ ∪ {t}, where σ ∈ J_s.

Let π_σ : T → T_σ := ℝ^{d × m(σ)} be the natural projection, i.e., such that π_σ(x) = (x_{t_1}, …, x_{t_{m(σ)}}). We similarly define the natural projection π_{αγ} : T_γ → T_α (for α ⊆ γ ∈ J), and we define π_{[s,t]} : T → C([s, t]; ℝ^d) to be the natural restriction of x ∈ T to [s, t]. The expectation of some measurable function f with respect to a measure μ is written as E^{μ(x)}[f(x)], or simply E^μ[f] when the context is clear.

For s < t, we write F_{s,t} = π_{[s,t]}^{-1} B(C([s, t]; ℝ^d)) and F_σ = π_σ^{-1} B(T_σ). We define F_{s;t} to be the σ-algebra generated by F_s and F_γ (where γ = {t}). For μ ∈ ℳ(T), we denote its image laws by

μ_{σ} := μ \circ π_{σ}^{-1} \in ℳ (T_{σ})

and

μ_{[s, t]} := μ \circ π_{[s, t]}^{-1} \in ℳ (C ([s, t]; ℝ^{d})) .

Let μ ∈ ℳ(T) govern a random variable X = (X_s) ∈ T. For z ∈ ℝ^d, the rcp given X_s = z is written as μ_{|s,z}. For x ∈ C([0, s]; ℝ^d) or T, the rcp given that X_u = x_u for all 0 ≤ u ≤ s is written as μ_{|[0,s],x}. The rcp given that X_u = x_u for all u ≤ s, and X_t = z, is written as μ_{|s,x;t,z}. For σ ∈ J_s and z ∈ (ℝ^d)^{m(σ)}, the rcp given that X_u = z_u for all u ∈ σ is written as μ_{|σ,z}. All of these measures are considered to be in ℳ(C([s, T]; ℝ^d)) (unless indicated otherwise in particular circumstances). The probability laws governing X_t (for t ≥ s), for each of these, are respectively μ_{t|s,z}, μ_{t|[0,s],x} and μ_{t|σ,z}. We clearly have μ_{s|s,z} = δ_z, for μ_s a.e. z, and similarly for the others.

Remark. See [23] (Definition 5.3.16) for a definition of a rcp. Technically, if we let μ^{*}_{|s,z} be the rcp given X_s = z according to this definition, then μ_{|s,z} = μ^{*}_{|s,z} ∘ π_{[s,T]}^{-1} and μ_{t|s,z} = μ^{*}_{|s,z} ∘ π_{\{t\}}^{-1}. By [23] (Theorem 3.18), μ_{|s,z} is well-defined for μ_s a.e. z. Similar comments apply to the other rcp’s defined above.

In the definition of the relative entropy, we abbreviate R_{F_σ}(μ||P) by R_σ(μ||P). If σ = {t}, we write R_t(μ||P).

4. The Relative Entropy R(⋅||P ) Using Projective Limits

In this section we derive an integral representation of the relative entropy R(μ||P), for arbitrary μ ∈ ℳ(T). We start with the standard result in Theorem 2, before adapting the projective limits approach of [22] to obtain the central result (Theorem 1).

We begin with a standard decomposition result for the relative entropy [24].

Lemma 1. Let X be a Polish space with sub σ-algebras G ⊆ F ⊆ B(X). Let μ and ν be probability measures on (X, F), and their regular conditional probabilities over G be (respectively) μ_ω and ν_ω. Then

R_{F} (μ ‖ ν) = R_{G} (μ ‖ ν) + E^{μ (ω)} [R_{F} (μ_{ω} ‖ ν_{ω})] .
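The decomposition in Lemma 1 can be checked by hand on a finite product space. The sketch below is only an illustration under assumed data: the joint laws `mu` and `nu` on {0,1} × {0,1,2} are hypothetical, G is taken to be the σ-algebra generated by the first coordinate, and F the full σ-algebra.

```python
import math

def kl(p, q):
    """Relative entropy of two finite distributions given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical joint laws: mu[x][y] is the mass of the pair (x, y).
mu = [[0.10, 0.25, 0.15], [0.20, 0.05, 0.25]]
nu = [[0.20, 0.10, 0.10], [0.15, 0.30, 0.15]]

def marginal(joint):
    """Law of the first coordinate (the restriction to G)."""
    return [sum(row) for row in joint]

def conditional(joint, x):
    """Law of the second coordinate given that the first equals x."""
    total = sum(joint[x])
    return [v / total for v in joint[x]]

def flatten(joint):
    return [v for row in joint for v in row]

full = kl(flatten(mu), flatten(nu))            # R_F(mu || nu)
coarse = kl(marginal(mu), marginal(nu))        # R_G(mu || nu)
conditional_part = sum(                        # E^mu[ R_F(mu_w || nu_w) ]
    marginal(mu)[x] * kl(conditional(mu, x), conditional(nu, x))
    for x in range(2)
)

print(full, coarse + conditional_part)
```

The two printed numbers agree up to rounding, which is the finite-space form of the identity that is applied repeatedly in the proofs below.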

The following Theorem is a straightforward consequence of [25] (Theorem 6.6): we provide an alternative proof using the theory of Large Deviations in Section 6.

Theorem 2. If α, σ ∈ J and α ⊆ σ, then R_α(μ||P) ≤ R_σ(μ||P). Furthermore,

R_{F_{s, t}} (μ ‖ P) = \sup_{σ \in J, σ \subset [s, t]} R_{σ} (μ ‖ P),

(4)

R_{F_{s; t}} (μ ‖ P) = \sup_{σ \in J_{s; t}} R_{σ} (μ ‖ P) .

(5)

It suffices for the supremum in (4) to take σ ⊂ Q_{s,t}, where Q_{s,t} is any countable dense subset of [s, t]. Thus we may assume that there exists a sequence σ^{(n)} ⊂ Q_{s,t} of partitions such that σ^{(n)} ⊆ σ^{(n+1)}, |σ^{(n)}| → 0 as n → ∞ and

R_{F_{s, t}} (μ ‖ P) = \lim_{n \to \infty} R_{σ^{(n)}} (μ ‖ P) .

(6)
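To see both the monotonicity in Theorem 2 and the way the supremum in (4) can be infinite, consider a one-dimensional sketch. The setting is an assumption made purely for illustration: μ and P are the laws of scaled Brownian motions started at 0 with constant diffusion coefficients `a_mu` and `a_p`, so the increments over a partition are independent centred Gaussians and R_σ(μ||P) is a finite sum of Gaussian relative entropies.

```python
import math

def kl_centered_gaussians(var_mu, var_p):
    """Relative entropy of N(0, var_mu) with respect to N(0, var_p)."""
    r = var_mu / var_p
    return 0.5 * (r - 1.0 - math.log(r))

def r_sigma(partition, a_mu, a_p):
    """R_sigma(mu || P) for scaled Brownian motions started at 0.

    Increments over [t0, t1] are independent N(0, a * (t1 - t0)) under both
    laws, so the relative entropy of the marginal over the partition is the
    sum of the relative entropies of the increments.
    """
    return sum(
        kl_centered_gaussians(a_mu * (t1 - t0), a_p * (t1 - t0))
        for t0, t1 in zip(partition, partition[1:])
    )

def dyadic(level, horizon=1.0):
    """Nested dyadic partitions of [0, horizon] with 2**level intervals."""
    n = 2 ** level
    return [horizon * k / n for k in range(n + 1)]

# Mismatched diffusion coefficients: the values grow without bound.
values_mismatch = [r_sigma(dyadic(m), 2.0, 1.0) for m in range(6)]
# Matching diffusion coefficients (and no drift): the values stay at zero.
values_match = [r_sigma(dyadic(m), 1.0, 1.0) for m in range(6)]

print(values_mismatch)
print(values_match)
```

In the mismatched case each refinement doubles the value, so the limit in (6) is infinite; this is consistent with the requirement a = c that appears later in Lemma 5.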

We now provide a technical lemma.

Lemma 2. Let t > s, α, σ ∈ J_s, σ ⊂ α and s ∈ σ. Then for μ_σ a.e. x, R_t(μ_{|σ,x} ‖ P_{|s,x_s}) = R(μ_{t|σ,x} ‖ P_{t|s,x_s}). Secondly,

E^{μ_{σ} (x)} [R_{t} (μ_{| σ, x} ‖ P_{| s, x_{s}})] \leq E^{μ_{σ} (z)} [R_{t} (μ_{| α, z} ‖ P_{| s, z_{s}})] .

Proof. The first statement is immediate from Definition 1 and the Markovian nature of P. For the second statement, it suffices to prove this in the case that α = σ ∪ {u}, for some u < s. We note that, using a property of regular conditional probabilities, for μ_σ a.e. x,

μ_{t | σ, x} = E^{μ_{u | σ, x} (ω)} [μ_{t | α, v (x, ω)}],

(7)

where v(x, ω) ∈ T_α, v(x, ω)_u = ω, v(x, ω)_r = x_r for all r ∈ σ.

We consider A to be the set of all finite disjoint partitions a ⊂ B(ℝ^d) of ℝ^d. The expression for the entropy in [26] (Lemma 1.4.3) yields

E^{μ_{σ} (x)} [R (μ_{t | σ, x} ‖ P_{t | s, x_{s}})] = E^{μ_{σ} (x)} [\sup_{a \in A} \sum_{A \in a} μ_{t | σ, x} (A) \log \frac{μ_{t | σ, x} (A)}{P_{t | s, x_{s}} (A)}] .

Here the summand is considered to be zero if μ_{t|σ,x}(A) = 0, and infinite if μ_{t|σ,x}(A) > 0 and P_{t|s,x_s}(A) = 0. Making use of (7), we find that

\begin{array}{l} E^{μ_{σ} (x)} [R (μ_{t | σ, x} ‖ P_{t | s, x_{s}})] \\ = E^{μ_{σ} (x)} [\sup_{a \in A} \sum_{A \in a} E^{μ_{u | σ, x} (ω)} [μ_{t | α, v (x, ω)} (A)] \log \frac{μ_{t | σ, x} (A)}{P_{t | s, x_{s}} (A)}] \\ \leq E^{μ_{σ} (x)} E^{μ_{u | σ, x} (ω)} [\sup_{a \in A} \sum_{A \in a} μ_{t | α, v (x, ω)} (A) \log \frac{μ_{t | σ, x} (A)}{P_{t | s, x_{s}} (A)}] \\ = E^{μ_{α} (z)} [\sup_{a \in A} \sum_{A \in a} μ_{t | α, z} (A) \log \frac{μ_{t | σ, π_{σ α} z} (A)}{P_{t | s, z_{s}} (A)}] . \end{array}

We note that, for μ_α a.e. z, if μ_{t|σ,π_{σα}z}(A) = 0 in this last expression, then μ_{t|α,z}(A) = 0 and we consider the summand to be zero. To complete the proof of the lemma, it is thus sufficient to prove that for μ_α a.e. z

\sup_{a \in A} \sum_{A \in a} μ_{t | α, z} (A) \log \frac{μ_{t | α, z} (A)}{P_{t | s, z_{s}} (A)} \geq \sup_{a \in A} \sum_{A \in a} μ_{t | α, z} (A) \log \frac{μ_{t | σ, π_{σ α} z} (A)}{P_{t | s, z_{s}} (A)} .

However, in turn, the above inequality will be true if we can prove that for each partition a such that P_{t|s,z_s}(A) > 0 and μ_{t|σ,π_{σα}z}(A) > 0 for all A ∈ a,

\sum_{A \in a} μ_{t | α, z} (A) \log \frac{μ_{t | α, z} (A)}{P_{t | s, z_{s}} (A)} - \sum_{A \in a} μ_{t | α, z} (A) \log \frac{μ_{t | σ, π_{σ α} z} (A)}{P_{t | s, z_{s}} (A)} \geq 0.

The left hand side is equal to

\sum_{A \in a} μ_{t | α, z} (A) \log \frac{μ_{t | α, z} (A)}{μ_{t | σ, π_{σ α} z} (A)} .

An application of Jensen’s inequality demonstrates that this is greater than or equal to zero. □
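The second statement of Lemma 2 can be verified directly in a finite toy setting. The sketch below makes several assumptions for illustration only: there are three times u < s < t, the state space is {0, 1}, σ = {s}, α = {u, s}, the law `mu` is an arbitrary randomly generated joint law, and `p_trans` is a hypothetical Markov transition from time s to time t standing in for P.

```python
import itertools
import math
import random

def kl(p, q):
    """Relative entropy of two finite distributions given as lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

states = (0, 1)
rng = random.Random(1)

# Arbitrary (non-Markov) joint law of (X_u, X_s, X_t) on {0,1}^3.
weights = {k: rng.uniform(0.1, 1.0) for k in itertools.product(states, repeat=3)}
total = sum(weights.values())
mu = {k: w / total for k, w in weights.items()}

# Hypothetical transition probabilities P(X_t = . | X_s = x_s).
p_trans = {0: [0.7, 0.3], 1: [0.4, 0.6]}

def law_t_given_s(xs):
    """Mass of {X_s = xs} under mu and the conditional law of X_t."""
    mass = [sum(mu[(xu, xs, xt)] for xu in states) for xt in states]
    z = sum(mass)
    return z, [m / z for m in mass]

def law_t_given_us(xu, xs):
    """Mass of {X_u = xu, X_s = xs} under mu and the conditional law of X_t."""
    mass = [mu[(xu, xs, xt)] for xt in states]
    z = sum(mass)
    return z, [m / z for m in mass]

# Left side of the lemma: condition only on sigma = {s}.
coarse = 0.0
for xs in states:
    z, cond = law_t_given_s(xs)
    coarse += z * kl(cond, p_trans[xs])

# Right side of the lemma: condition on alpha = {u, s}.
fine = 0.0
for xu in states:
    for xs in states:
        z, cond = law_t_given_us(xu, xs)
        fine += z * kl(cond, p_trans[xs])

print(coarse, fine)
```

Conditioning on the larger set α can only increase the expected conditional relative entropy, in line with the Jensen argument that closes the proof.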

Remark. If, contrary to the definition, we briefly consider μ_{|[0,t],x} to be a probability measure on T such that μ_{|[0,t],x}(A) = 1, where A is the set of all points y such that y_s = x_s for all s ≤ t, then it may be seen from the definition of R that

R_{F_{T}} (μ_{| [0, t], x} ‖ P_{| [0, t], x}) = R_{F_{t, T}} (μ_{| [0, t], x} ‖ P_{| [0, t], x}) = R_{F_{t, T}} (μ_{| [0, t], x} ‖ P_{| t, x_{t}}) .

(8)

We have also made use of the Markov property of P. This is why our convention, to which we now return, is to consider μ_{|[0,t],x} to be a probability measure on (C([t, T]; ℝ^d), F_{t,T}).

This leads us to the following expressions for R(μ||P).

Lemma 3. Each σ in the expressions below is of the form {t_1 < t_2 < … < t_{m(σ)−1} < t_{m(σ)}} for some integer m(σ).

R (μ ‖ P) = R_{0} (μ ‖ P) + \sum_{j = 1}^{m (σ) - 1} E^{μ_{[0, t_{j}]} (x)} [R_{F_{t_{j}, t_{j + 1}}} (μ_{| [0, t_{j}], x} ‖ P_{| t_{j}, x_{t_{j}}})],

(9)

R (μ ‖ P) = R_{0} (μ ‖ P) + \sup_{σ \in J_{*}} \sum_{j = 1}^{m (σ) - 1} E^{μ_{σ_{; j}} (x)} [R_{t_{j + 1}} (μ_{t_{j + 1} | σ_{; j}, x} ‖ P_{t_{j + 1} | t_{j}, x_{t_{j}}})],

(10)

E^{μ_{[0, s]} (x)} [R_{t} (μ_{t | [0, s], x} ‖ P_{t | s, x_{s}})] = \sup_{σ \in J_{s}} E^{μ_{σ} (y)} [R_{t} (μ_{t | σ, y} ‖ P_{t | s, y_{s}})],

(11)

where in this last expression 0 ≤ s < t ≤ T.

Proof. Consider the sub σ-algebra F_{0, t_{m(σ)−1}}. We then find, through an application of Lemma 1 and (8), that

R (μ ‖ P) = R_{F_{0, t_{m (σ) - 1}}} (μ ‖ P) + E^{μ_{[0, t_{m (σ) - 1}]} (x)} [R_{F_{t_{m (σ) - 1}, t_{m (σ)}}} (μ_{| [0, t_{m (σ) - 1}], x} ‖ P_{| t_{m (σ) - 1}, x_{t_{m (σ) - 1}}})] .

We may continue inductively to obtain the first identity.

We use Theorem 2 to prove the second identity. It suffices to take the supremum over J_*, because R_σ(μ||P) ≥ R_γ(μ||P) if γ ⊂ σ. It thus suffices to prove that

R_{σ} (μ ‖ P) = R_{0} (μ ‖ P) + \sum_{j = 1}^{m (σ) - 1} E^{μ_{σ_{; j}} (x)} [R_{t_{j + 1}} (μ_{t_{j + 1} | σ_{; j}, x} ‖ P_{t_{j + 1} | t_{j}, x_{t_{j}}})] .

(12)

However, this also follows from repeated application of Lemma 1. To prove the third identity, we firstly note that

\begin{array}{l} R_{F_{s; t}} (μ ‖ P) = R_{0} (μ ‖ P) + \sup_{σ \in J_{s; t}} \sum_{j = 1}^{m (σ) - 1} E^{μ_{σ_{; j}} (x)} [R_{t_{j + 1}} (μ_{t_{j + 1} | σ_{; j}, x} ‖ P_{t_{j + 1} | t_{j}, x_{t_{j}}})] \\ = \sup_{σ \in J_{s}} {R_{σ} (μ ‖ P) + E^{μ_{σ} (x)} [R_{t} (μ_{t | σ, x} ‖ P_{t | s, x_{s}})]} . \end{array}

The proof of this is entirely analogous to that of the second identity, except that it makes use of (5) instead of (4). However, after another application of Lemma 1, we also have that

R_{F_{s; t}} (μ ‖ P) = R_{F_{0, s}} (μ ‖ P) + E^{μ_{[0, s]} (x)} [R_{t} (μ_{t | [0, s], x} ‖ P_{t | s, x_{s}})] .

On equating these two different expressions for R_{F_{s;t}}(μ ‖ P), we obtain

E^{μ_{[0, s]} (x)} [R_{t} (μ_{t | [0, s], x} ‖ P_{t | s, x_{s}})] = \sup_{σ \in J_{s}} {(R_{σ} (μ ‖ P) - R_{F_{0, s}} (μ ‖ P)) + E^{μ_{σ} (x)} [R_{t} (μ_{t | σ, x} ‖ P_{t | s, x_{s}})]} .

Let (σ^{(k)}) ⊂ J_s, σ^{(k−1)} ⊆ σ^{(k)}, be such that \lim_{k \to \infty} R_{σ^{(k)}} (μ ‖ P) = R_{F_{0, s}} (μ ‖ P). Such a sequence exists by (4). Similarly, let (γ^{(k)}) ⊆ J_s be a sequence such that E^{μ_{γ^{(k)}} (x)} [R_{t} (μ_{t | γ^{(k)}, x} ‖ P_{t | s, x_{s}})] is non-decreasing and, as k → ∞, asymptotically approaches \sup_{σ \in J_{s}} E^{μ_{σ} (x)} [R_{t} (μ_{t | σ, x} ‖ P_{t | s, x_{s}})]. Lemma 2 dictates that E^{μ_{σ^{(k)} \cup γ^{(k)}} (x)} [R_{t} (μ_{t | σ^{(k)} \cup γ^{(k)}, x} ‖ P_{t | s, x_{s}})] asymptotically approaches the same limit as well. Clearly \lim_{k \to \infty} R_{σ^{(k)} \cup γ^{(k)}} (μ ‖ P) = R_{F_{0, s}} (μ ‖ P) because of the monotonicity stated at the start of Theorem 2. This yields the third identity.

4.1. Proof of Theorem 1

In this section we work towards the proof of Theorem 1, making use of some results in [22]. However, we first require some more definitions.

If K ⊂ ℝ^d is compact, let D_K be the set of all f ∈ D whose support is contained in K. The corresponding space of real distributions is D′, and we denote the action of θ ∈ D′ by ⟨θ, f⟩. If θ ∈ ℳ(ℝ^d), then clearly ⟨θ, f⟩ = E^θ[f]. We let C_{0}^{2, 1} (ℝ^{d}) denote the set of all continuous functions of compact support possessing continuous spatial derivatives of first and second order and a continuous time derivative of first order. For f ∈ D and t ∈ [0, T], we define the random variable ∇_t f : ℝ^d → ℝ^d such that

{(\nabla_{t} f (y))}^{i} = \sum_{j = 1}^{d} a^{i j} (t, y) \frac{\partial f}{\partial y^{j}}

(for x ∈ T we may also understand ∇_t f(x) := ∇_t f(x_t)). Let a_{ij} be the components of the matrix inverse of a^{ij}. For random variables X, Y : T → ℝ^d, we define the inner product

{(X, Y)}_{t, x} = \sum_{i, j = 1}^{d} X^{i} (x) Y^{j} (x) a_{i j} (t, x_{t})

with associated norm

| X |_{t, x}^{2} = {(X (x), X (x))}_{t, x} .

We note that

| \nabla_{t} f |_{t, x}^{2} = \sum_{i, j = 1}^{d} a^{i j} (t, x_{t}) \frac{\partial f}{\partial z^{i}} (x_{t}) \frac{\partial f}{\partial z^{j}} (x_{t}) .
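The last identity is a short linear-algebra computation: applying the inverse matrix a_{ij} to the vector ∇_t f, whose components are sums of a^{ij} ∂f/∂z^j, leaves the quadratic form of a^{ij} in the plain gradient. The following sketch checks this numerically for a 2 × 2 example; the matrix `a` and the gradient vector `grad` are hypothetical values chosen for illustration.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical symmetric positive definite diffusion matrix a^{ij}(t, x_t).
a = [[2.0, 0.5], [0.5, 1.0]]
a_inv = inverse_2x2(a)          # components a_{ij}

# Hypothetical gradient of f at the point x_t.
grad = [0.3, -1.2]

# (nabla_t f)^i = sum_j a^{ij} df/dz^j
nabla = [sum(a[i][j] * grad[j] for j in range(2)) for i in range(2)]

# |nabla_t f|^2_{t,x} computed with the inner product weighted by a_{ij}.
norm_via_inverse = sum(nabla[i] * nabla[j] * a_inv[i][j]
                       for i in range(2) for j in range(2))

# Right-hand side of the identity: quadratic form of a^{ij} in the gradient.
norm_via_a = sum(a[i][j] * grad[i] * grad[j]
                 for i in range(2) for j in range(2))

print(norm_via_inverse, norm_via_a)
```

Both expressions agree up to rounding, which is why the quadratic term in (3) can be written with a^{jk} directly.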

Let M be the space of all continuous maps [0, T] → ℳ(ℝ^d), equipped with the topology of uniform convergence. For s ∈ [0, T], ϑ ∈ M and ν ∈ ℳ(ℝ^d) we define n(s, ϑ, ν) ≥ 0 such that

n {(s, ϑ, ν)}^{2} = \sup_{f \in D} {⟨ ϑ, f ⟩ - \frac{1}{2} E^{ν (y)} [| \nabla_{s} f |_{s, y}^{2}]} .

(13)

This definition is taken from [22] (Equation (4.7)); we note that n is convex in ϑ. For γ ∈ ℳ(T), we may naturally write n(s, γ, ν) := n(s, ω, ν), where ω is the projection of γ onto M, i.e., ω(s) = γ_s. It is shown in [22] that this projection is continuous. The following two definitions, lemma and two propositions are all taken (with some small modifications) from [22].

Definition 2. Let I be an interval of the real line. A measure μ ∈ ℳ(T) is called absolutely continuous over I if for each compact set K ⊂ ℝ^d there exists a neighbourhood U_K of 0 in D_K and an absolutely continuous function H_K : I → ℝ such that

| E^{μ_{u}} [f] - E^{μ_{v}} [f] | \leq | H_{K} (u) - H_{K} (v) |,

for all u, v ∈ I and f ∈ U_K.

Lemma 4. [22] (Lemma 4.2) If μ is absolutely continuous over an interval I, then its derivative exists (in the distributional sense) for Lebesgue a.e. t ∈ I. That is, for Lebesgue a.e. t ∈ I, there exists {\dot{μ}}_{t} ∈ D′ such that for all f ∈ D

\lim_{h \to 0} \frac{1}{h} (⟨ μ_{t + h}, f ⟩ - ⟨ μ_{t}, f ⟩) = ⟨ {\dot{μ}}_{t}, f ⟩ .

Definition 3. For 0 ≤ s < t ≤ T and ν ∈ ℳ(C([s, t]; ℝ^d)), let L_{s, t}^{2} (ν) be the Hilbert space of all measurable maps h : [s, t] × ℝ^d → ℝ^d with inner product

[h_{1}, h_{2}] = \int_{s}^{t} E^{ν_{u} (x)} [{(h_{1} (u, x), h_{2} (u, x))}_{u, x}] d u .

We denote by L_{s, t, \nabla}^{2} (ν) the closure in L_{s, t}^{2} (ν) of the linear subset generated by maps of the form (u, x) → ∇_u f, where f ∈ C_{0}^{2, 1} ([s, t], ℝ^{d}). We note that functions in L_{s, t, \nabla}^{2} (ν) only need to be defined du ⊗ ν_u(dx) almost everywhere.

Recall that n is defined in (13), and note that ⟨ * L_{t} μ_{t}, f ⟩ := ⟨ μ_{t}, L_{t} f ⟩.

Proposition 1. Assume that μ ∈ ℳ(C([r, s]; ℝ^d)), such that μ_r = δ_y for some y ∈ ℝ^d and 0 ≤ r < s ≤ T. We have that [22] (Equation (4.9) and Lemma 4.8)

\int_{r}^{s} n {(t, {\dot{μ}}_{t} - * L_{t} μ_{t}, μ_{t})}^{2} d t = \sup_{f \in C_{0}^{2, 1} (ℝ^{d})} {E^{μ_{s} (x)} [f (s, x)] - f (r, y) - \int_{r}^{s} E^{μ_{t} (x)} [(\frac{\partial}{\partial t} + L_{t}) f (t, x) + \frac{1}{2} | \nabla_{t} f (t, x) |_{t, x}^{2}] d t} .

(14)

It clearly suffices to take the supremum over a countable dense subset. Assume now that \int_{r}^{s} n {(t, {\dot{μ}}_{t} - * L_{t} μ_{t}, μ_{t})}^{2} d t < \infty. Then for Lebesgue a.e. t, {\dot{μ}}_{t} = * K_{t} μ_{t}, where [22] (Lemma 4.8(3))

K_{t} f (\cdot) = L_{t} f (\cdot) + \sum_{1 \leq j \leq d} {(h^{μ} (t, \cdot))}^{j} \frac{\partial f}{\partial x^{j}} (\cdot),

(15)

for some h^{μ} ∈ L_{r, s, \nabla}^{2} (μ) that satisfies [22] (Lemma 4.8(4))

\int_{r}^{s} n {(t, {\dot{μ}}_{t} - * L_{t} μ_{t}, μ_{t})}^{2} d t = \frac{1}{2} \int_{r}^{s} E^{μ_{t} (x)} [| h^{μ} (t, x) |_{t, x}^{2}] d t < \infty .

(16)

Remark. We reach (17) from the proof of Lemma 9 in [22] (Eq. (4.10)). One should note also that in Equation (4.10) of [22] the relative entropy R is written as L_{ν}^{(1)}. To reach (18), we also use the equivalence between (4.7) and (4.8) in [22].

Proposition 2. Assume that μ ∈ ℳ(T), such that μ_r = δ_y for some y ∈ ℝ^d and 0 ≤ r < s ≤ T. If R_{F_{r, s}} (μ ‖ P_{| r, y}) < \infty, then μ is absolutely continuous on [r, s], and [22] (Lemma 4.9)

R_{F_{r, s}} (μ ‖ P_{| r, y}) \geq \int_{r}^{s} n {(t, {\dot{μ}}_{t} - * L_{t} μ_{t}, μ_{t})}^{2} d t .

(17)

Here the derivative {\dot{μ}}_{t} is defined in Lemma 4. For all f ∈ D, [22] (Eq. (4.35))

E^{μ_{s}} [f] - \log E^{P_{s | r, y}} [\exp (f)] \leq \int_{r}^{s} n {(t, {\dot{μ}}_{t} - * L_{t} μ_{t}, μ_{t})}^{2} d t .

(18)

We are now ready to prove Theorem 1 (the central result).

Proof. Fix a partition σ = {t_1, …, t_m}. We may conclude from (9) and (17) that

R (μ ‖ P) \geq R_{0} (μ ‖ P) + \sum_{j = 1}^{m - 1} E^{μ_{[0, t_{j}]} (x)} \int_{t_{j}}^{t_{j + 1}} n {(t, {\dot{μ}}_{t | [0, t_{j}], x} - * L_{t} μ_{t | [0, t_{j}], x}, μ_{t | [0, t_{j}], x})}^{2} d t .

(19)

The integrand on the right hand side is measurable as a function of x, so that the expectation E^{μ_{[0, t_{j}]} (x)} is well defined, due to the equivalent expression (14). We may infer from (18) that

\begin{array}{l} E^{μ_{[0, t_{j}]} (x)} \int_{t_{j}}^{t_{j + 1}} n {(t, {\dot{μ}}_{t | [0, t_{j}], x} - * L_{t} μ_{t | [0, t_{j}], x}, μ_{t | [0, t_{j}], x})}^{2} d t \\ \geq E^{μ_{[0, t_{j}]} (x)} [\sup_{f \in D} {E^{μ_{t_{j + 1} | [0, t_{j}], x}} [f] - \log E^{P_{t_{j + 1} | t_{j}, x_{t_{j}}}} [\exp (f)]}] \\ = E^{μ_{[0, t_{j}]} (x)} [\sup_{f \in C_{b} (ℝ^{d})} {E^{μ_{t_{j + 1} | [0, t_{j}], x}} [f] - \log E^{P_{t_{j + 1} | t_{j}, x_{t_{j}}}} [\exp (f)]}] . \end{array}

(20)

This last step follows by noting that if ν ∈ ℳ(ℝ^d) and f ∈ C_b(ℝ^d), and the expectation of f with respect to ν is finite, then there exists a sequence (K_n) of compact subsets of ℝ^d such that

\int_{ℝ^{d}} f (x) d ν (x) = \lim_{n \to \infty} \int_{K_{n}} f (x) d ν (x) .

In turn, for each n there exist (f_{n}^{(m)}) ⊂ D_{K_{n}} such that we may write

\int_{K_{n}} f (x) d ν (x) = \lim_{m \to \infty} \int_{K_{n}} f_{n}^{(m)} (x) d ν (x) .

This allows us to conclude that the two suprema are the same. The last expression in (20) is merely

E^{μ_{[0, t_{j}]} (x)} [R_{t_{j + 1}} (μ_{t_{j + 1} | [0, t_{j}], x} ‖ P_{t_{j + 1} | t_{j}, x_{t_{j}}})] .

By (11), this is greater than or equal to

E^{μ_{σ_{; j}} (y)} [R_{t_{j + 1}} (μ_{t_{j + 1} | σ_{; j}, y} ‖ P_{t_{j + 1} | t_{j}, y_{t_{j}}})] .

We thus obtain the theorem using (10).

5. Some Corollaries

We state some corollaries of Theorem 1. In the course of this section we make progressively stronger assumptions on the nature of μ, culminating in the elegant expression for R(μ||P) when μ is a solution of a martingale problem. We finish by comparing our work with that of [19,20].

Corollary 1. Suppose that μ ∈ ℳ(T) and R(μ||P) < ∞. Then for all s and μ a.e. x, μ_{|[0,s],x} is absolutely continuous over [s, T]. For each s ∈ [0, T] and μ a.e. x ∈ T, for Lebesgue a.e. t ≥ s

{\dot{μ}}_{t | [0, s], x} = * K_{t | s, x}^{μ} μ_{t | [0, s], x}

(21)

where, for some h_{s, x}^{μ} ∈ L_{s, T, \nabla}^{2} (μ_{| [0, s], x}),

K_{t | s, x}^{μ} f (y) = L_{t} f (y) + \sum_{j = 1}^{d} h_{s, x}^{μ, j} (t, y) \frac{\partial f}{\partial y^{j}} (y) .

(22)

Furthermore,

R (μ ‖ P) = R_{0} (μ ‖ P) + \frac{1}{2} \sup_{σ \in J_{*}} \int_{0}^{T} E^{μ (w)} E^{μ_{t | [0, \underline{σ} (t)], w} (z)} [{| h_{\underline{σ} (t), w}^{μ} (t, z) |}_{t, z}^{2}] d t .

(23)

For any dense countable subset Q_{0,T} of [0, T], there exists a sequence of partitions σ^{(n)} ⊂ σ^{(n+1)} ⊂ Q_{0,T} such that, as n → ∞, |σ^{(n)}| → 0 and

R (μ ‖ P) = R_{0} (μ ‖ P) + \frac{1}{2} \lim_{n \to \infty} \int_{0}^{T} E^{μ (w)} E^{μ_{t | [0, {\underline{σ}}^{(n)} (t)], w} (z)} [{| h_{{\underline{σ}}^{(n)} (t), w}^{μ} (t, z) |}_{t, z}^{2}] d t .

(24)

Remark. It is not immediately clear that we may simplify (23) further (barring further assumptions). The reason for this is that we only know that E^{μ_{t | [0, \underline{σ} (t)], w} (z)} [{| h_{\underline{σ} (t), w}^{μ} (t, z) |}_{t, z}^{2}] is measurable (as a function of w), but it has not been proven that h_{\underline{σ} (t), w}^{μ} (t, z) is measurable (as a function of w).

Proof. Let σ = {0 = t_1, …, t_m = T} be an arbitrary partition. For all j < m, we find from Lemma 3 that R_{F_{t_{j}, t_{j + 1}}} (μ_{| [0, t_{j}], x} ‖ P_{| t_{j}, x_{t_{j}}}) < \infty for μ_{[0, t_{j}]} a.e. x ∈ C([0, t_j]; ℝ^d). We thus find from Proposition 2 that, for all such x, μ_{|[0, t_j], x} is absolutely continuous on [t_j, t_{j+1}]. We are then able to obtain (21) and (22) from Propositions 1 and 2. From (2), (16) and (21) we find that

R (μ ‖ P) = R_{0} (μ ‖ P) + \frac{1}{2} \sup_{σ \in J_{*}} E^{μ (x)} \int_{0}^{T} E^{μ_{t | [0, \underline{σ} (t)], x} (z)} [{| h_{\underline{σ} (t), x}^{μ} (t, z) |}_{t, z}^{2}] d t .

(25)

The above integral must be finite (since we are assuming R(μ||P) is finite). Furthermore E^{μ_{t | [0, \underline{σ} (t)], x} (z)} [{| h_{\underline{σ} (t), x}^{μ} (t, z) |}_{t, z}^{2}] is (t, x) measurable as a consequence of the equivalent form (14). This allows us to apply Fubini’s theorem to obtain (23). The last statement on the sequence of maximising partitions follows from Theorem 2.

Corollary 2. Suppose that R(μ||P) < ∞. Suppose that for all s ∈ Q_{0,T} (any countable, dense subset of [0, T]), for μ a.e. x and Lebesgue a.e. t,

h_{s, x}^{μ} (t, x_{t}) = E^{μ_{| [0, s], x; t, x_{t}} (w)} [h^{μ} (t, w)]

for some progressively measurable random variable h^μ : [0, T] × T → ℝ^d. Then

R (μ ‖ P) = R_{0} (μ ‖ P) + \frac{1}{2} \int_{0}^{T} E^{μ (w)} [{| h^{μ} (t, w) |}_{t, w_{t}}^{2}] d t .

Proof. Let G^{s,x;t,y} be the sub σ-algebra consisting of all B ∈ B(T) such that for all w ∈ B, w_r = x_r for all r ≤ s and w_t = y. Thus h_{s, x}^{μ} (t, y) = E^{μ_{| [0, s], x; t, y} (w)} [h^{μ} (t, w)] = E^{μ} [h^{μ} (t, \cdot) | G^{s, x; t, y}]. By [27] (Corollary 2.4), since \cap_{s < t} G^{s, x; t, x_{t}} = G^{t, x; t, x_{t}} (restricting to s ∈ Q_{0,T}), for μ a.e. x,

\lim_{s \to t^{-}} E^{μ_{| [0, s], x; t, x_{t}} (w)} [h^{μ} (t, w)] = h^{μ} (t, x),

(26)

where s ∈ Q_{0,T}. By the properties of the regular conditional probability, we find from (24) that

R (μ ‖ P) = R_{0} (μ ‖ P) + \frac{1}{2} \lim_{n \to \infty} \int_{0}^{T} E^{μ (w)} [{| E^{μ_{| [0, {\underline{σ}}^{(n)} (t)], w; t, w_{t}} (υ)} [h^{μ} (t, υ)] |}_{t, w_{t}}^{2}] d t .

(27)

By assumption, the above limit is finite. Thus by Fatou’s lemma, and using the properties of the regular conditional probability,

R (μ ‖ P) \geq R_{0} (μ ‖ P) + \frac{1}{2} \int_{0}^{T} E^{μ (w)} [\liminf_{n \to \infty} {| E^{μ_{| [0, {\underline{σ}}^{(n)} (t)], w; t, w_{t}} (υ)} [h^{μ} (t, υ)] |}_{t, w_{t}}^{2}] d t .

Through use of (26),

R (μ ‖ P) \geq R_{0} (μ ‖ P) + \frac{1}{2} \int_{0}^{T} E^{μ (w)} [{| h^{μ} (t, w) |}_{t, w_{t}}^{2}] d t .

Conversely, through an application of Jensen’s inequality to (27),

R (μ ‖ P) \leq R_{0} (μ ‖ P) + \frac{1}{2} \lim_{n \to \infty} \int_{0}^{T} E^{μ (w)} [E^{μ_{| [0, {\underline{σ}}^{(n)} (t)], w; t, w_{t}} (υ)} [{| h^{μ} (t, υ) |}_{t, w_{t}}^{2}]] d t .

A property of the regular conditional probability yields

R (μ ‖ P) \leq R_{0} (μ ‖ P) + \frac{1}{2} \int_{0}^{T} E^{μ (w)} [{| h^{μ} (t, w) |}_{t, w_{t}}^{2}] d t .

Remark. The condition in the above corollary is satisfied when μ is a solution to a martingale problem—see Lemma 5.

We may further simplify the expression in Theorem 1 when μ is a solution to the following martingale problem. Let {c^jk, e^j} be progressively measurable functions [0, T] × T → ℝ. We suppose that c^jk = c^kj. For all 1 ≤ j, k ≤ d, c^jk(t, x) and e^j(t, x) are assumed to be bounded for x ∈ L (where L is compact) and all t ∈ [0, T]. For

f \in C_{0}^{2} (R^{d})

and x ∈ T, let

ℳ_{u} (f) (x) = \sum_{1 \leq j, k \leq d} c^{j k} (u, x) \frac{\partial^{2} f}{\partial y^{j} \partial y^{k}} (x_{u}) + \sum_{1 \leq j \leq d} e^{j} (u, x) \frac{\partial f}{\partial y^{j}} (x_{u}) .

We assume that for all such f, the following is a continuous martingale (relative to the canonical filtration) under μ

f (X_{t}) - f (X_{0}) - \int_{0}^{t} ℳ_{u} f (X) d u .

(28)

The law governing X₀ is stipulated to be ν ∈ ℳ(ℝ^d).

From now on we switch from our earlier convention and we consider μ_{|[0,s],x} to be a measure on T such that, for μ a.e. x ∈ T, μ_{|[0,s],x}(A_{s,x}) = 1, where A_{s,x} is the set of all X ∈ T satisfying X_t = x_t for all 0 ≤ t ≤ s. This is a property of a regular conditional probability (see Theorem 3.18 in [23]). Similarly, μ_{|s,x;t,y} is considered to be a measure on T such that for μ a.e. x ∈ T, μ_{|s,x;t,y}(B_{s,x;t,y}) = 1, where B_{s,x;t,y} is the set of all X ∈ A_{s,x} such that X_t = y. We may apply Fubini’s Theorem (since f is compactly supported and bounded) to (28) to find that

⟨ μ_{t | [0, s], x}, f ⟩ - f (x_{s}) = \int_{s}^{t} E^{μ_{| [0, s], x}} [ℳ_{u} f] d u .

This ensures that μ_{|[0,s]} is absolutely continuous over [s, T], and that

⟨ {\dot{μ}}_{t | [0, s], x}, f ⟩ = E^{μ_{| [0, s], x}} [ℳ_{t} f] .

(29)

Lemma 5. If R(μ||P) < ∞ then for Lebesgue a.e. t ∈ [0, T] and μ a.e. x ∈ T,

a (t, x_{t}) = c (t, x) .

(30)

If R(μ||P) < ∞ then

R (μ ‖ P) = R (ν ‖ μ_{I}) + \frac{1}{2} E^{μ (x)} [\int_{0}^{T} | b (s, x_{s}) - e (s, x) |_{s, x_{s}}^{2} d s] .

(31)

Proof. It follows from R(μ||P) < ∞, (21) and (22) that for all s and μ a.e. x, for Lebesgue a.e. t ≥ s

E^{μ | s, x; t, x_{t}} [c (t, \cdot)] = a (t, x_{t}) .

(32)

Let us take a countable dense subset Q_{0,T} of [0, T]. There thus exists a null set N ⊆ [0, T] such that for every s ∈ Q_{0,T}, μ a.e. x and every t ∉ N the above equation holds. We may therefore conclude (30) using [27] (Corollary 2.4) and taking s → t⁻. From (29), we observe that for all s ∈ [0, T] and μ a.e. x, for Lebesgue a.e. t

h_{s, x}^{μ} (t, x_{t}) = E^{μ | [0, s], x; t, x_{t}} [e (t, \cdot)] .

Equation (31) thus follows from Corollary 2.
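As a minimal numerical sketch of the integral term in Equation (31), consider the one-dimensional case under the following illustrative assumptions, none of which are taken from the text above: P solves dX = b(t, X_t) dt + σ dW, μ solves dX = e(t, X_t) dt + σ dW with the same constant σ and a drift depending only on the current state, and the weighted norm reduces to |v|² = v²/σ². The helper names `entropy_integral` and `simulate` are hypothetical. The time integral is approximated by a left-endpoint Riemann sum along Euler-Maruyama samples of μ.

```python
import numpy as np

def entropy_integral(paths, dt, b, e, sigma):
    """Approximate 0.5 * E[ int_0^T (b(s, x_s) - e(s, x_s))^2 / sigma^2 ds ].

    paths : array of shape (n_paths, n_steps + 1), samples of x under mu
    dt    : time step of the grid
    b, e  : drift functions (t, x) -> array, for P and mu respectively
    sigma : constant diffusion coefficient shared by P and mu (assumption)
    """
    n_steps = paths.shape[1] - 1
    times = dt * np.arange(n_steps)      # left endpoint of each interval
    x_left = paths[:, :-1]               # path values at the left endpoints
    diff = b(times, x_left) - e(times, x_left)
    integrand = diff**2 / sigma**2       # assumed form of the weighted norm
    return 0.5 * np.mean(np.sum(integrand, axis=1) * dt)

def simulate(e, sigma, x0, T, n_steps, n_paths, rng):
    """Euler-Maruyama samples of dX = e(t, X) dt + sigma dW started at x0."""
    dt = T / n_steps
    x = np.full((n_paths, n_steps + 1), float(x0))
    for k in range(n_steps):
        noise = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x[:, k + 1] = x[:, k] + e(k * dt, x[:, k]) * dt + sigma * noise
    return x, dt

# Usage: P is a driftless Brownian motion, mu has constant drift 0.7.
# For constant drifts the integral equals 0.5 * T * (b - e)^2 / sigma^2.
zero_drift = lambda t, x: np.zeros_like(x)
const_drift = lambda t, x: np.full_like(x, 0.7)
rng = np.random.default_rng(0)
paths, dt = simulate(const_drift, sigma=2.0, x0=0.0, T=1.5,
                     n_steps=50, n_paths=100, rng=rng)
estimate = entropy_integral(paths, dt, zero_drift, const_drift, sigma=2.0)
closed_form = 0.5 * 1.5 * 0.7**2 / 2.0**2
print(estimate, closed_form)
```

Since both laws start from the same point in this sketch, the initial term R(ν‖μ_I) vanishes and the printed estimate approximates the whole relative entropy.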

5.1. Comparison of our Results to Those of Fischer et al. [19,20]

We have already noted in the introduction that one may infer a variational representation of the relative entropy from [19,20] by assuming that the coefficients of the underlying stochastic process are independent of the empirical measure in these papers. The assumptions in [20] on the underlying process P are both more general and more restrictive than ours. They are more general insofar as the coefficients of the SDE may depend on the past history of the process and the diffusion coefficient is allowed to be degenerate. However, our assumptions are more general insofar as we only require P to be the unique (in the sense of probability law) weak solution of the SDE, whereas [20] requires P to be the unique strong solution of the SDE. Of course, when both sets of assumptions are satisfied, one may infer that the expressions for the relative entropy are identical.

6. Proof of Theorem 2

The following is an alternative proof to that of [25] (Theorem 6.6) employing the theory of Large Deviations. The fact that, if α ⊆ σ, then R_α(μ||P) ≤ R_σ(μ||P), follows from Lemma 1. We prove the first expression (4) in the case s = 0, t = T (the proof of the second identity (5) is analogous).

Definition 4. A sequence of probability laws Γ^N on some topological space Ω equipped with its Borel σ-algebra is said to satisfy a strong Large Deviation Principle with rate function I : Ω → ℝ if for all open sets O,

\liminf_{N \to \infty} N^{- 1} \log Γ^{N} (O) \geq - \inf_{x \in O} I (x)

and for all closed sets F

\limsup_{N \to \infty} N^{- 1} \log Γ^{N} (F) \leq - \inf_{x \in F} I (x) .

If furthermore the set {x : I(x) ≤ α} is compact for all α ≥ 0, we say that I is a good rate function.

We define the following empirical measures.

Definition 5. For x ∈ T^N, y ∈ T_σ^N, let

{\hat{μ}}^{N} (x) = \frac{1}{N} \sum_{1 \leq j \leq N} δ_{x^{j}} \in ℳ (T), \quad {\hat{μ}}_{σ}^{N} (y) = \frac{1}{N} \sum_{1 \leq j \leq N} δ_{y^{j}} \in ℳ (T_{σ}) .

Clearly \hat{μ}_{σ}^{N}(x_{σ}) = π_{σ}(\hat{μ}^{N}(x)). The image law P^{\otimes N} \circ (\hat{μ}^{N})^{-1} is denoted by Π_{s,t}^{N} ∈ ℳ(ℳ(T)). Similarly, for σ ∈ J, the image law P_{σ}^{\otimes N} \circ (\hat{μ}_{σ}^{N})^{-1} on ℳ(T_σ) is denoted by Π_{σ}^{N} ∈ ℳ(ℳ(T_σ)). Since T and T_σ are Polish spaces, we have by Sanov’s theorem (see Theorem 6.2.10 in [14]) that Π^N satisfies a strong Large Deviation Principle with good rate function R(⋅||P). Similarly, Π_{σ}^{N} satisfies a strong Large Deviation Principle on ℳ(T_σ) with good rate function R_{F_{σ}}(⋅‖P).
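To illustrate what Sanov's theorem asserts in the simplest possible setting, the following sketch replaces the path space by the two-point space {0, 1}. This is an assumption made purely for illustration and is not the setting of the proof. For N independent Bernoulli(p) samples, the empirical measure lies in the closed set {ν : ν({1}) ≥ a} with a probability that decays exponentially in N, and for a > p the decay rate is R(Bernoulli(a)‖Bernoulli(p)). The function names are hypothetical.

```python
import math

def kl_bernoulli(a, p):
    """Relative entropy R(Bernoulli(a) || Bernoulli(p)), with 0 < a, p < 1."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_binomial_tail(n, p, a):
    """log P(S_n / n >= a) for S_n ~ Binomial(n, p), computed by log-sum-exp."""
    k_min = math.ceil(a * n)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(log_terms)  # factor out the largest term to avoid underflow
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def empirical_rate(n, p, a):
    """-(1/n) log P(empirical mean >= a); approaches the rate as n grows."""
    return -log_binomial_tail(n, p, a) / n

# Usage: compare the finite-N decay rate with the relative entropy.
p, a = 0.5, 0.75
for n in (100, 400, 4000):
    print(n, empirical_rate(n, p, a), kl_bernoulli(a, p))
```

The gap between the two printed columns shrinks roughly like log(N)/N, which is the kind of finite-size correction that the exponential rate alone does not capture.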

We now define the projective limit \underline{ℳ}(T). If α, γ ∈ J, α ⊂ γ, then we may define the projection π_{α γ}^{ℳ} : ℳ(T_γ) → ℳ(T_α) as π_{α γ}^{ℳ}(ξ) := ξ \circ π_{α γ}^{-1}. An element of \underline{ℳ}(T) is then a member ⊗_σ ζ(σ) of the Cartesian product ⊗_{σ∈J} ℳ(T_σ) satisfying the consistency condition π_{α γ}^{ℳ}(ζ(γ)) = ζ(α) for all α ⊂ γ. The topology on \underline{ℳ}(T) is the minimal topology necessary for the natural projection \underline{ℳ}(T) → ℳ(T_α) to be continuous for all α ∈ J. That is, it is generated by open sets of the form

A_{γ, O} = \{\otimes_{σ} ζ (σ) \in \underline{ℳ} (T) : ζ (γ) \in O\},

(33)

for some γ ∈ J and open O (with respect to the weak topology of ℳ(T_γ)).

We may continuously embed ℳ(T) into the projective limit \underline{ℳ}(T) of its marginals, letting ι denote this embedding. That is, for any σ ∈ J, (ι(μ))(σ) = μ_σ. We note that ι is continuous because ι⁻¹(A_{γ,O}) is open in ℳ(T), for all A_{γ,O} of the form in (33). We equip \underline{ℳ}(T) with the Borel σ-algebra generated by this topology. The embedding ι is measurable with respect to this σ-algebra because the topology of ℳ(T) has a countable base. The embedding induces the image laws (Π^N ○ ι⁻¹) on ℳ(\underline{ℳ}(T)). For σ ∈ J, it may be seen that Π_{σ}^{N} = Π^{N} \circ ι^{-1} \circ (π_{σ}^{ℳ})^{-1} ∈ ℳ(ℳ(T_σ)), where π_{σ}^{ℳ}(\otimes_{α} μ(α)) = μ(σ).

It follows from [22] (Thm 3.3) that Π^N ○ ι⁻¹ satisfies a Large Deviation Principle with rate function sup_{σ∈J} R_σ(μ||P). However, we note that ι is one-to-one, because any two measures μ, ν ∈ ℳ(T) such that μ_σ = ν_σ for all σ ∈ J must be equal. Furthermore, ι is continuous. Because of Sanov’s theorem, (Π^N) is exponentially tight (see Defn 1.2.17, Exercise 1.2.19 in [14] for a definition of exponential tightness and proof of this statement). These facts mean that we may apply the inverse contraction principle [14] (Thm 4.2.4) to infer that Π^N satisfies a Large Deviation Principle with the rate function sup_{σ∈J} R_σ(μ||P). Since rate functions are unique [14] (Lemma 4.1.4), we obtain the first identity in conjunction with Sanov’s theorem. The second identity (5) follows similarly. We may repeat the argument above, while restricting to σ ⊂ Q_{s,t}. We obtain the same conclusion because the σ-algebra generated by (F_σ)_{σ⊂Q_{s,t}} is the same as F_{s,t}. The last identity follows from the fact that, if α ⊆ σ, then R_α(μ||P) ≤ R_σ(μ||P).
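The identity R(μ||P) = sup_σ R_σ(μ||P) can be checked numerically in a Gaussian example, sketched below under illustrative assumptions that are not part of the proof: P is the law of a standard Brownian motion started at 0, and μ is the law of the Ornstein-Uhlenbeck process dX = -θ X dt + dW started at 0. The finite-dimensional marginals are then centred Gaussians with explicit covariances, so R_σ is a Gaussian relative entropy, while the path-space value is taken to be ½ E∫(θ X_s)² ds in closed form. All function names are hypothetical.

```python
import numpy as np

def gaussian_kl(cov_mu, cov_p):
    """R(N(0, cov_mu) || N(0, cov_p)) for centred Gaussian laws on R^m."""
    m = cov_mu.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_p, cov_mu))
    _, logdet_mu = np.linalg.slogdet(cov_mu)
    _, logdet_p = np.linalg.slogdet(cov_p)
    return 0.5 * (trace_term - m + logdet_p - logdet_mu)

def marginal_entropy(times, theta):
    """R_sigma(mu || P) for the time grid `times` (all strictly positive)."""
    s, t = np.meshgrid(times, times, indexing="ij")
    cov_bm = np.minimum(s, t)  # Brownian motion started at 0
    # Ornstein-Uhlenbeck process dX = -theta X dt + dW started at 0
    cov_ou = (np.exp(-theta * np.abs(t - s))
              - np.exp(-theta * (t + s))) / (2 * theta)
    return gaussian_kl(cov_ou, cov_bm)

def path_entropy(theta, T):
    """0.5 * E int_0^T (theta X_s)^2 ds under the OU law, in closed form."""
    return 0.25 * theta * (T - (1 - np.exp(-2 * theta * T)) / (2 * theta))

# Usage: nested dyadic grids on (0, T]; each grid refines the previous one.
theta, T = 1.0, 1.0
for n in (1, 2, 4, 8, 16, 32, 64):
    grid = T * np.arange(1, n + 1) / n
    print(n, marginal_entropy(grid, theta))
print("path space:", path_entropy(theta, T))
```

Because the grids are nested, the printed marginal entropies should be nondecreasing in n, consistent with R_α(μ||P) ≤ R_σ(μ||P) for α ⊆ σ, and should approach the path-space value from below.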

Acknowledgments

This work was supported by INRIA FRM, ERC-NERVI number 227747, European Union Project # FP7-269921 (BrainScales), and Mathemacs # FP7-ICT-2011.9.7.

Author Contributions

Both authors contributed to all parts of the article. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Plastino, A.; Miller, H.; Plastino, A. Minimum Kullback entropy approach to the Fokker-Planck equation. Phys. Rev. E 1997, 56, 3927–3934.
2. Desvillettes, L.; Villani, C. On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems. Part 1: The Linear Fokker-Planck Equation. Comm. Pure Appl. Math. 2001, 54, 1–42.
3. Yu, S.; Mehta, P. The Kullback-Leibler Rate Metric for Comparing Dynamical Systems. In Proceedings of Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, Shanghai, China, 16–18 December 2009; pp. 8363–8368.
4. Georgiou, T.T.; Lindquist, A. Kullback-Leibler approximation of spectral density functions. IEEE Trans. Inf. Theory 2003, 49, 2910–2917.
5. Frittelli, M. The minimal entropy martingale measure and the valuation problem in incomplete markets. Math. Finance 2000, 10, 39–52.
6. Grandits, P.; Rheinländer, T. On the minimal entropy martingale measure. Ann. Prob. 2002, 30, 1003–1038.
7. Miyahara, Y. Minimal Relative Entropy Martingale Measure of Birth and Death Process. In Discussion Papers in Economics; Nagoya City University: Nagoya, Japan, 2000.
8. Miyahara, Y. On the Minimal Entropy Martingale Measures for Geometric Lévy Processes. In Discussion Papers in Economics; Nagoya City University: Nagoya, Japan, 2000.
9. Üstünel, A.S. Entropy, invertibility and variational calculus of the adapted shifts on Wiener Space. J. Funct. Anal. 2009, 257, 3655–3689.
10. Lassalle, R. Invertibility of adapted perturbations of the identity on abstract Wiener space. J. Funct. Anal. 2012, 262, 2734–2776.
11. Akaike, H. Likelihood of a model and information criteria. J. Econometrics 1981, 16, 3–14.
12. Do, M.; Vetterli, M. Wavelet-based Texture Retrieval Using Generalized Gaussian Density and Kullback–Leibler Distance. IEEE Trans. Image Process. 2002, 11, 146–158.
13. Bozdogan, H. Akaike’s Information Criterion and Recent Developments in Information Complexity. J. Math. Psychol. 2000, 44, 62–91.
14. Dembo, A.; Zeitouni, O. Large Deviations Techniques, 2nd ed.; Springer: Berlin, Germany, 1997.
15. Ben-Arous, G.; Guionnet, A. Large deviations for Langevin spin glass dynamics. Probab. Theor. Relat. Field. 1995, 102, 455–509.
16. Moynot, O.; Samuelides, M. Large deviations and mean-field theory for asymmetric random recurrent neural networks. Probab. Theor. Relat. Field. 2002, 123, 41–75.
17. Faugeras, O.; MacLaurin, J. A large deviation principle for networks of rate neurons with correlated synaptic weights. 2013, arXiv:1302.1029.
18. Faugeras, O.; MacLaurin, J. Large Deviations of an Ergodic Synchronous Neural Network with Learning. 2014, arXiv:1404.0732v3 [math.PR].
19. Budhiraja, A.; Dupuis, P.; Fischer, M. Large deviation properties of weakly interacting processes via weak convergence methods. Ann. Prob. 2012, 40, 74–102.
20. Fischer, M. On the form of the large deviation rate function for the empirical measures of weakly interacting systems. Bernoulli 2014, 20, 1765–1801.
21. Baladron, J.; Fasoli, D.; Faugeras, O.; Touboul, J. Mean field description of and propagation of chaos in recurrent multipopulation networks of Hodgkin-Huxley and Fitzhugh-Nagumo neurons. 2011, arXiv:1110.4294.
22. Dawson, D.A.; Gärtner, J. Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics 1987, 20, 247–308.
23. Karatzas, I.; Shreve, S.E. Brownian Motion and Stochastic Calculus, 2nd ed.; Graduate Texts in Mathematics, Volume 113; Springer-Verlag: New York, NY, USA, 1991.
24. Donsker, M.; Varadhan, S. Asymptotic Evaluation of Certain Markov Process Expectations for Large Time, IV. Comm. Pure Appl. Math. 1983, XXXVI, 183–212.
25. Xanh, N.X.; Zessin, H. Ergodic Theorems for Spatial Processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 1979, 48, 133–158.
26. Dupuis, P.; Ellis, R.S. A Weak Convergence Approach to the Theory of Large Deviations; John Wiley & Sons: London, UK, 1997.
27. Revuz, D.; Yor, M. Continuous Martingales and Brownian Motion, 2nd ed.; Springer-Verlag: Berlin, Germany, 1991.

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).