**Proof.** The direction from (ii) to (i) follows directly from Itô’s calculus.

From (i) to (ii), the proof consists of the following four steps: 1. show that the maximum principle on smooth functions is equivalent to the law of Wiener processes, 2. show that the invariance of the law is preserved on the Wiener path, 3. set up the approximation on the Wiener path by showing that the martingale fairness is preserved, and 4. extend the result to the model $(\Omega ,\mathcal{F},\mathbb{P})$.

Step 1. The maximum principle is simply the first- and second-order derivative conditions from calculus. If a function $f:\mathbb{S}\to \mathbb{R}$ attains its maximum at a point $x\in \mathbb{S}$, then ${\nabla}_{x}f(x)=0$ and ${\Delta}_{x}f(x)\le 0$. Furthermore, if $f$ is a time-dependent function $f:[0,T)\times \mathbb{S}\to \mathbb{R}$ considered on a time interval $[0,t]$, and $f$ attains its maximum at $x$ at time $t$, then $\partial f(t,x)/\partial t\ge 0$ with ${\nabla}_{x}f(t,x)=0$ and ${\Delta}_{x}f(t,x)\le 0$. The inequality $\partial f(t,x)/\partial t\ge 0$ expresses the uncertainty of the future: $f(\cdot ,x)$ could either be strictly increasing in $t$ or attain its optimum at $t$. Since the maximum principle is preserved up to the second order, we have the heat equation (A1). Without loss of generality, in Steps 1 to 3 we consider only the standard case with diffusion factors $a(x)=b(x)=1$, but (A1) holds for any real vectors $a(x)$ and $b(x)$. The solution of (A1) is the well-known Wiener process.
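As a numerical illustration of the link between the heat equation and the Wiener process, the following minimal sketch checks by finite differences that the Gaussian transition density of a standard Wiener process satisfies $\partial u/\partial t=\frac{1}{2}{\partial}^{2}u/\partial {x}^{2}$. The drift-free form and the factor $\frac{1}{2}$ are assumptions made for the sketch, and the normalization used in (A1) may differ. The names `heat_kernel` and `heat_residual` are hypothetical helpers.

```python
import math

def heat_kernel(t, x):
    """Transition density of a standard Wiener process started at 0."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, h=1e-3):
    """Central finite-difference estimate of u_t - 0.5 * u_xx at (t, x)."""
    u_t = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    u_xx = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x)
            + heat_kernel(t, x - h)) / (h * h)
    return u_t - 0.5 * u_xx

# Evaluate the residual on a small grid of (t, x) points.
residuals = [abs(heat_residual(t, x))
             for t in (0.5, 1.0, 2.0)
             for x in (-1.0, 0.0, 1.5)]
max_residual = max(residuals)
print("largest |u_t - 0.5 u_xx| on the grid:", max_residual)
```

The residual is not exactly zero because of finite-difference truncation, but it should shrink as `h` decreases.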

Step 2. To formalize the concept of the Wiener path, we need to introduce the path space. Suppose that a series of realizations ${\left\{{x}_{{t}_{i}}\right\}}_{{t}_{i}\le {t}_{N}}$ corresponds to $t$ via ${x}_{{t}_{i}}=\psi ({t}_{i})$ for ${t}_{i}\le {t}_{N}$. Then $\psi :[0,\infty )\to \mathbb{S}$ is a continuous path with image in the complete separable space $\mathbb{S}$. The path space $\mathfrak{P}(\mathbb{S})=C([0,\infty ),\mathbb{S})$ is the space of continuous paths $\psi $. The $\sigma $-algebra $\mathcal{PB}$ is generated by the evaluation maps $\psi \in \mathfrak{P}(\mathbb{S})\mapsto \psi (t)\in \mathbb{S}$. The measure $\mathcal{W}$ on $\mathfrak{P}(\mathbb{S})$ is called the Wiener measure, such that for a sequence ${\left\{\psi ({t}_{i})\right\}}_{{t}_{i}\le {t}_{N}}={\left\{{x}_{{t}_{i}}\right\}}_{{t}_{i}\le {t}_{N}}$:

The measure is tight in the sense that, if $t-s<\epsilon $,

for any metric $\rho (\cdot ,\cdot )$. This is the Ascoli-Arzelà criterion for compact subsets.

We need to show that the invariance property of $\mathcal{W}$ is a restatement of the independent identical increment property.

Identical: Note that a function $f$ over $\psi $ will not change the expression except that $\psi (t)$ is replaced by $f(\psi (t))$. By Lemma 3.4.3 and Theorem 3.4.16 (Kolmogorov’s Criterion) of Stroock (2000), we have that for a subset $\mu $ of all tight measures $\mathcal{M}(\mathfrak{P}(\mathbb{S}))$ and $\psi \in \mathfrak{P}(\mathbb{R})$:

where ${C}_{T}<\infty $ is a constant, $\alpha >0$ and $r\ge 1$. Then we have

This means that the increments are controlled by the length of the time interval. When the interval is extremely small, all increments are essentially treated the same, so the smooth function $f$ does not matter for the law of $\mathcal{W}$.

Independent: For $\psi $, $\varpi \in \mathfrak{P}(\mathbb{R})$, let $\varpi (t)=\psi (t+s)-\psi (s)$. By the definition of the Wiener measure, $\psi (s)$ and $\varpi (t)$ are associated with $\mathcal{W}$ on the time intervals $[0,s]$ and $[0,t]$ respectively. Since these correspond to non-overlapping portions of the path, they are independent.
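The two increment properties can be illustrated by simulation. The sketch below builds discretized Wiener paths from Gaussian steps (so it illustrates the definitions rather than proving them) and estimates, for $s=1$ and $t=0.5$, the variance of $\varpi (t)=\psi (t+s)-\psi (s)$, its fourth moment, and its correlation with $\psi (s)$. For a standard Wiener path these should be close to $t$, $3{t}^{2}$ and $0$; the fourth-moment identity is the classical instance of a Kolmogorov-type moment bound with $r=4$ and $\alpha =1$. The grid size, sample count and helper names are assumptions chosen for the example.

```python
import math
import random

def wiener_path(n_steps, dt, rng):
    """Grid values psi(0), psi(dt), ..., psi(n_steps * dt) of a simulated path."""
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
dt, n_steps, n_paths = 0.05, 30, 20000
i_s, i_t = 20, 10                      # s = 1.0 and t = 0.5 on the grid
t = i_t * dt

psi_s, varpi_t = [], []
for _ in range(n_paths):
    path = wiener_path(n_steps, dt, rng)
    psi_s.append(path[i_s])                        # psi(s)
    varpi_t.append(path[i_s + i_t] - path[i_s])    # psi(t + s) - psi(s)

var_increment = mean([v * v for v in varpi_t])     # expected near t
fourth_moment = mean([v ** 4 for v in varpi_t])    # expected near 3 * t**2
cov = mean([p * v for p, v in zip(psi_s, varpi_t)]) - mean(psi_s) * mean(varpi_t)
var_psi_s = mean([p * p for p in psi_s]) - mean(psi_s) ** 2
var_varpi = var_increment - mean(varpi_t) ** 2
correlation = cov / math.sqrt(var_psi_s * var_varpi)  # expected near 0

print(var_increment, fourth_moment, correlation)
```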

Step 3. We look for a martingale representation because a martingale plays the role of a “stochastic constant”. In the deterministic case, suppose we define an integral curve $\psi (\cdot )$ of a smooth vector field $a$ on $\mathbb{R}$, starting at $x\in \mathbb{R}$. Then the path $\psi $ with $\psi (0)=x$ has the property that

is a constant for any $f\in {C}^{\infty}$. If there is a stochastic analog, then we can use this stochastic constant to establish our approximating model. The aim is to maintain a stable “error”.

Recall the path space $\mathfrak{P}(\mathbb{S})$ and its $\sigma $-algebra $\mathcal{PB}$. For an incremental element $\psi (t)-\psi (s)$ on $\mathfrak{P}(\mathbb{R})$, the Fourier transform is:

where $x=\varpi (t-s)=\psi (t)-\psi (s)$. What we want to obtain is a martingale and a “constant” under $\mathcal{W}$. From the above equation, it is easy to see that we can obtain both of them simultaneously if we shift the element $\exp (i\xi \psi (t))$ by a Gaussian factor $\exp ({|\xi |}^{2}t/2)$:

Let a triplet denote this martingale on the Wiener path $\mathcal{W}$:

We define the Fourier transform of f by $\mathbb{F}f(\xi )={\int}_{-\infty}^{\infty}f(x){e}^{i\xi x}dx$, and the inverse Fourier transform is ${\mathbb{F}}^{-1}f(\xi )={\int}_{-\infty}^{\infty}f(x){e}^{-i\xi x}dx$.
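The effect of the Gaussian shift can be checked by Monte Carlo. The sketch below estimates the expectation of $\exp (i\xi \psi (t)+{|\xi |}^{2}t/2)$ under a standard Wiener measure for several values of $t$; a constant expectation of $1$ is a necessary condition for the martingale property, not a full verification of it. The sample size, the value $\xi =1$ and the helper name `mean_shifted_exponential` are assumptions made for the example.

```python
import cmath
import math
import random

def mean_shifted_exponential(xi, t, n_samples, rng):
    """Monte Carlo mean of exp(i*xi*psi(t) + xi**2 * t / 2) with psi(t) ~ N(0, t)."""
    total = 0j
    for _ in range(n_samples):
        psi_t = rng.gauss(0.0, math.sqrt(t))
        total += cmath.exp(1j * xi * psi_t + 0.5 * xi * xi * t)
    return total / n_samples

rng = random.Random(1)
xi = 1.0
means = {t: mean_shifted_exponential(xi, t, 50000, rng) for t in (0.5, 1.0, 2.0)}
for t, m in means.items():
    print(t, m)   # each estimate should be close to 1 + 0j
```

Without the factor $\exp ({|\xi |}^{2}t/2)$ the expectation would decay like $\exp (-{|\xi |}^{2}t/2)$, which is why the shift is needed to obtain a quantity that is constant in expectation.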

As in the deterministic case, the ideal representation of $f(t,\psi (t))$ on $\mathcal{W}$ is the path integral:

We need to check whether the approximation error is a “constant” in the stochastic sense. Note that

By the property ${\mathbb{F}}^{-1}(\frac{\partial}{\partial x})(\cdot )=i\xi {\mathbb{F}}^{-1}(\cdot )$, we have

The approximating error is

The Fourier term ${\mathbb{F}}^{-1}f$ is bounded and irrelevant for $\mathcal{W}$. If ${M}_{\xi}(t)$ is a martingale under $\mathcal{W}$, then the error will be a stochastic constant. Rewrite ${M}_{\xi}(t)$ as:

The second term can be written as

and the first term can be written as ${e}^{i\xi t-\frac{1}{2}{|\xi |}^{2}t}{e}^{i\xi \psi (t)+\frac{1}{2}{|\xi |}^{2}t}$. Fubini’s Lemma together with (A2) implies that

Thus $\left(f(t,\psi )-{\int}_{0}^{t}\left[{\nabla}_{x}f+\frac{1}{2}{\Delta}_{x}f\right](\tau ,\psi )d\tau ,{\mathcal{PB}}_{t},\mathcal{W}\right)$ is a martingale.

Now we consider the general case in (A1). If the state moves with velocity $a({X}_{t})$, the path derivative becomes $a(\cdot )\nabla f$. Moreover, the Laplace operator $\Delta $ in the heat equation (A1) may be associated with a volatility coefficient $b(\cdot )$. Then the approximating model is given by

which is the integral of the Feller generator $\mathbb{A}$ on $f$:

The generator is a dual representation of a diffusion process $(a,b)$ such that

where $b({X}_{t})=\sigma {({X}_{t})}^{T}\sigma ({X}_{t})$ and ${V}_{t}$ is a Wiener process.
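To make the role of the generator concrete, here is a minimal simulation sketch. It assumes the standard one-dimensional forms $\mathbb{A}f=a{f}^{\prime}+\frac{1}{2}b{f}^{\prime \prime}$ and $d{X}_{t}=a({X}_{t})dt+\sigma ({X}_{t})d{V}_{t}$ with $b={\sigma}^{2}$, which may differ in notation from the displayed formulas. The coefficients $a(x)=-x$, $\sigma (x)=1$, the test function $f(x)={x}^{2}$, and the Euler-Maruyama discretization are illustrative choices, not taken from the text. The sketch compares the mean of $f({X}_{T})-f({X}_{0})$ with the mean of the same quantity after subtracting the time integral of $\mathbb{A}f$; only the latter should be close to zero.

```python
import math
import random

def a(x):
    return -x          # drift (illustrative choice)

def sigma(x):
    return 1.0         # volatility, so b(x) = sigma(x)**2 = 1

def f(x):
    return x * x       # smooth test function

def generator_f(x):
    # (A f)(x) = a(x) f'(x) + 0.5 * b(x) f''(x), with f'(x) = 2x and f''(x) = 2
    return a(x) * 2.0 * x + 0.5 * sigma(x) ** 2 * 2.0

def simulate(x0, horizon, n_steps, rng):
    """Return (f(X_T) - f(x0), the same quantity minus the integral of A f)."""
    dt = horizon / n_steps
    x, integral = x0, 0.0
    for _ in range(n_steps):
        integral += generator_f(x) * dt                              # left-point Riemann sum
        x += a(x) * dt + sigma(x) * rng.gauss(0.0, math.sqrt(dt))    # Euler-Maruyama step
    raw = f(x) - f(x0)
    return raw, raw - integral

rng = random.Random(2)
n_paths = 10000
results = [simulate(1.0, 1.0, 50, rng) for _ in range(n_paths)]
raw_mean = sum(r for r, _ in results) / n_paths
compensated_mean = sum(c for _, c in results) / n_paths
print("mean of f(X_T) - f(x0):          ", raw_mean)
print("mean after subtracting int of Af:", compensated_mean)
```

The compensated mean is not exactly zero: it carries Monte Carlo noise and a discretization bias of order `dt`.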

Step 4. Since the martingale with initial condition $\mathcal{W}(\psi (0)=x)=1$ completely characterizes $\mathcal{W}$, the above result can be extended to any $\mathbb{P}$ by the Principle of Accompanying Laws and Donsker’s Invariance Principle (Theorems 3.1.14 and 3.4.20, Stroock 2000) if and only if $\mathbb{P}$ belongs to the family of all tight measures, $\mathcal{M}(\mathfrak{P}(\mathbb{S}))$. In our setup, $\mathbb{S}$ is a compact metric space so the collection of $\mathbb{P}(\cdot )$ over $\mathcal{S}$ is tight. The Principle of Accompanying Laws says that if a sequence is in a complete separable space with tight measure, the law of this sequence will weakly converge. Donsker’s Invariance Principle says that for independent increment processes, the convergent law is the law of the Wiener process. Therefore, $\mathbb{P}\in \mathcal{M}(\mathfrak{P}(\mathbb{S}))$ and

is a martingale.
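The convergence described by Donsker’s Invariance Principle can be illustrated with a rescaled random walk. The sketch below samples the endpoint ${S}_{n}/\sqrt{n}$ of a walk with $n$ independent $\pm 1$ steps and compares its second and fourth moments with those of a standard Gaussian, $1$ and $3$. This checks only the endpoint marginal, not convergence of the whole path; the step distribution, the sizes `n` and `n_paths`, and the helper name are assumptions made for the example.

```python
import math
import random

def scaled_walk_endpoint(n, rng):
    """S_n / sqrt(n) for a walk with n independent +/-1 steps."""
    heads = bin(rng.getrandbits(n)).count("1")   # number of +1 steps among n fair bits
    return (2 * heads - n) / math.sqrt(n)

rng = random.Random(3)
n, n_paths = 1000, 20000
endpoints = [scaled_walk_endpoint(n, rng) for _ in range(n_paths)]

second_moment = sum(z * z for z in endpoints) / n_paths    # Gaussian limit: 1
fourth_moment = sum(z ** 4 for z in endpoints) / n_paths   # Gaussian limit: 3
print(second_moment, fourth_moment)
```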

The IF claim says that a martingale exists for $f(t,{X}_{t})$ on $(\Omega ,\mathcal{F},\mathbb{P})$. The maximum principle restricts the process to be ${\mathcal{PB}}_{t}$-adapted, thus $\mathcal{F}\sim \mathcal{PB}$ and the result holds on $(\mathbb{S},\mathcal{S})$ with the probability space $(\Omega ,\mathcal{F},\mathbb{P})$. □