Article

Convergence of Limiting Cases of Continuous-Time, Discrete-Space Jump Processes to Diffusion Processes for Bayesian Inference

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Mathematics 2025, 13(7), 1084; https://doi.org/10.3390/math13071084
Submission received: 27 February 2025 / Revised: 18 March 2025 / Accepted: 25 March 2025 / Published: 26 March 2025
(This article belongs to the Section D1: Probability and Statistics)

Abstract

Jump-diffusion algorithms are applied to sampling from Bayesian posterior distributions. We consider a class of random sampling algorithms based on continuous-time jump processes. The semigroup theory of random processes lets us show that limiting cases of certain jump processes acting on discretized spaces converge to diffusion processes as the discretization is refined. One of these processes leads to the familiar Langevin diffusion equation; another leads to an entirely new diffusion equation.

1. Introduction

This paper develops Monte Carlo algorithms based on continuous-time jump processes for sampling from Bayesian posterior distributions. Section 3 relates some specific continuous-time jump processes to diffusion processes, thereby linking two of the principal tools in the Monte Carlo algorithm designer's bag of tricks and offering insight into both.

1.1. Two Kinds of Convergence

This paper invokes the term convergence in two distinct ways. The first sense of convergence is the convergence of the marginal distribution of a random process over time to a stationary distribution, as defined on p. 238 of [1]. In Bayesian inference applications, we seek Markov processes whose stationary distribution is the desired Bayesian posterior under investigation. The "time" here refers not to the physical time over which we collect data, but to an "algorithmic time" associated with the fictitious process we construct.
The second sense of convergence is the convergence of a sequence of different processes to some limiting process, with the convergence being taken over the entire process. This is a more complex issue. While the convergence described in the preceding paragraph is over simple spaces, such as $\Re^n$, or perhaps groups like $SO(n)$, the convergence of entire processes deals with abstract infinite-dimensional spaces, and defining these notions properly requires considerable care.
The treatise by Ethier and Kurtz [1] is one of the most extensive available on the detailed characterization of Markov processes and their convergence via semigroup theory, and we invoke its definitions and theorems extensively. Semigroup theory is our Swiss army knife for the study of convergence, particularly the infinitesimal generators (defined on p. 8 of [1]) that characterize these semigroups (defined on p. 6 of [1]). A semigroup and its generator can be related via a differential equation according to Proposition 1.5(b) on p. 9 of [1]; the uniqueness of this characterization is addressed by Proposition 2.9 on p. 15 of [1].
To show that a chosen process converges to the stationary distribution, it suffices to show that the integral of the generator against the stationary distribution is zero (Proposition 9.2 on p. 239 of [1]). This sort of argument has been a staple of research into pattern-theoretic [2,3,4,5] jump-diffusion algorithms [6,7,8,9,10,11,12,13,14], beginning with Theorem 1 of [15] and the appendices of [16,17].
To show convergence in the second sense—convergence of entire processes—we show that their associated generators converge to the generator of the limiting process. This implies the convergence of semigroups (Theorem 6.1 on p. 28 of [1]), which, in turn, implies the convergence of processes (Theorem 2.5 on p. 167 of [1]) in the Skorohod topology, as defined on pp. 116–118 of [1] and pp. 293–295 of [18]; see also Theorem 7.8 on p. 131 of [1].

1.2. Algorithmic Context

Geman and Hwang [19], among others [20,21,22], proposed simulating Langevin diffusions defined by the stochastic differential equation (SDE)

$$ dX(t) = -\tfrac{1}{2} \nabla E(X(t))\, dt + dW(t), \qquad (1) $$

where $W(t)$ is a Wiener process, to sample from densities of the form $\pi(x) = \exp[-E(x)]/Z$. Langevin diffusion is essentially a gradient ascent on $\log \pi$ with an additional random term. To accommodate scenes of varying dimension, [15] combined diffusions with jump processes to move between subspaces.
Langevin diffusions must be discretized for computer implementation, and the discretized process only approximately maintains detailed balance. Besag [23] suggested that this can be readily remedied by performing a Metropolis–Hastings [24,25] acceptance/rejection step after each discretized Langevin diffusion step; the resulting discrete-time Markov chain algorithm retains the advantages of the Langevin diffusion approach, namely, the analogy with gradient search (natural for continuous variables) and completely parallel site updating. Thus, diffusion techniques fit nicely under the Metropolis–Hastings umbrella, as sketched below. Roberts and Rosenthal [26] explore this approach in depth and deduce optimal step sizes.
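To make this concrete, here is a minimal sketch of one step of such a Metropolis-adjusted Langevin scheme. All names are illustrative: `E` and `grad_E` stand for the energy and its gradient in $\pi(x) \propto \exp[-E(x)]$, and `h` is the discretization step size.

```python
import numpy as np

def mala_step(x, E, grad_E, h, rng):
    """One Metropolis-adjusted Langevin step with step size h.

    The proposal is an Euler discretization of
    dX = -(1/2) grad E(X) dt + dW; the accept/reject step restores
    detailed balance with respect to pi(x) = exp(-E(x))/Z.
    """
    # Euler-discretized Langevin proposal.
    y = x - 0.5 * h * grad_E(x) + np.sqrt(h) * rng.standard_normal(np.shape(x))

    # Log density of the Gaussian proposal at b, given current point a.
    def log_q(b, a):
        return -np.sum((b - a + 0.5 * h * grad_E(a)) ** 2) / (2.0 * h)

    # Metropolis-Hastings log acceptance ratio.
    log_alpha = (E(x) - E(y)) + log_q(x, y) - log_q(y, x)
    return y if np.log(rng.uniform()) < log_alpha else x
```

Without the acceptance test, the chain samples only an approximation of π whose bias grows with h; with it, π is the exact stationary distribution for any h, and [26] addresses how to tune h.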
The remaining sections offer two primary take-home messages in this vein:
  • Metropolis–Hastings algorithms and other discrete-time Markov chain Monte Carlo algorithms have been the subject of much investigation in the statistical literature. In Section 2.2, we introduce a continuous-time analogue of a conditional Gibbs sampler [27], in which the process jumps at exponentially distributed times whose mean is proportional to the inverse of the total probability contained in a neighborhood around the current state. Compared with discrete-time MCMC techniques, such continuous-time jump processes appear to have remained largely unexplored by the random sampling community, although they are well known as models for physical phenomena: see, for instance, Chapters 4 and 5 of [28].
  • If derivatives are difficult to compute, Metropolis–Hastings proposals can be made using a density centered around the old value, such as a Gaussian. The acceptance/rejection step ensures that we still achieve the target distribution. Most intriguingly, ref. [29] shows that certain continuous-time interpolations of some Metropolis sampling algorithms, such as the Gaussian-proposal samplers mentioned here, converge weakly to Langevin diffusions with the same stationary distribution. Thus, even if the gradient is not explicitly employed, it implicitly guides the inference in a limiting sense. The difficulty of computing derivatives when recognizing objects in visual aerial images for the U.K. Defence Evaluation and Research Agency led the authors of [30] to construct jump-diffusion algorithms using such an approach. We explore variations of this idea in Section 3, where algorithms that operate on discretized sample spaces are shown to converge to different kinds of diffusions as the discretization is refined.

2. Continuous-Time Jump Processes

2.1. Definitions

A continuous-time Markov jump process is characterized by a jump transition intensity $q(x, dy)$ (using the notation on p. 561 of [15], along the lines of [31]). The generator for a jump process is

$$ A f(x) = -q(x) f(x) + \int q(x, dy)\, f(y), \qquad (2) $$

where the jump intensity is defined as

$$ q(x) = \int q(x, dy). \qquad (3) $$
As described on p. 163 of [1] (and in accordance with Theorem 1 on p. 353 of [31]), a continuous-time jump process may be simulated by generating a discrete-time Markov chain in which we wait a random amount of time at each stage. Beginning from a state $x_0$ at time $\tau = t_0$, a sample path of the process $X(\tau)$ is constructed as follows (a code sketch follows the list):
  1. Draw an exponentially distributed random variable $w_i$ with mean $1/q(x_i)$.
  2. Let $t_{i+1} = t_i + w_i$, and set $X(\tau) = x_i$ for $\tau \in [t_i, t_{i+1})$.
  3. Draw $y$ from the transition distribution
     $$ Q(x_i, dy) = \frac{q(x_i, dy)}{q(x_i)}, \qquad (4) $$
     and let $x_{i+1} = y$.
  4. Assign $i \leftarrow i + 1$ and go to step 1.
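A minimal sketch of this construction, under the simplifying assumption of a finite state space so that the intensities form a matrix `rates` with entries $q(x, y)$ (all names illustrative):

```python
import numpy as np

def simulate_jump_process(rates, x0, t_end, rng):
    """Simulate a continuous-time jump process on a finite state space.

    rates[x, y] is the jump intensity q(x, y); the total intensity is
    q(x) = sum_y rates[x, y].  Returns the sequence of visited states
    and the exponential holding time spent in each.
    """
    x, t = x0, 0.0
    states, holds = [], []
    while t < t_end:
        q_x = rates[x].sum()                    # total jump intensity q(x)
        w = rng.exponential(1.0 / q_x)          # step 1: holding time, mean 1/q(x)
        states.append(x)                        # step 2: X(tau) = x on [t, t + w)
        holds.append(w)
        t += w
        x = rng.choice(rates.shape[1], p=rates[x] / q_x)  # step 3: draw from Q(x, .)
    return np.array(states), np.array(holds)
```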

2.2. A Conditional Gibbs Sampler Subordinated to a Markov Process

We now consider a random-sampling algorithm based on a continuous-time jump process analogous to Corollary 1 of Theorems 1 and 2 of [15]. In [15], the process diffuses between jumps. In this section, we keep the state constant between jumps.
Suppose we restrict the set of potential transitions to a neighborhood around the current state x, denoted $N(x)$. We require the neighborhood to have the following properties:
  • Reversibility: $y \in N(x)$ if and only if $x \in N(y)$. Written in terms of indicator functions, we have $I_{N(x)}(y) = I_{N(y)}(x)$.
  • Connectedness: For any x and y, there exists a finite sequence of states $z_1, \ldots, z_m$ such that $z_1 \in N(x), z_2 \in N(z_1), \ldots, y \in N(z_m)$.
Now suppose we define our jump transition intensity according to the posterior density, restricted to the neighborhood:

$$ q(x, dy) = \pi(y)\, I_{N(x)}(y)\, dy \qquad (5) $$

and

$$ q(x) = \int_{N(x)} \pi(y)\, dy. \qquad (6) $$
Proposition 9.2 on p. 239 of [1] may be used to show that $\pi(x)\,dx$ is an invariant measure of the resulting jump process. Furthermore, the connectedness of the neighborhoods implies the irreducibility of the underlying chain, which, in turn, implies that the stationary measure is unique and that the process (starting from any initial distribution) converges in total variation norm to the invariant measure (see pp. 73–74 of [32] for details and a further discussion of ergodicity).
Notice (from step 1 in the construction of jump processes given in the previous subsection) that if the process is at state x, the process will tend to jump away from x rapidly if there is a lot of probability mass contained in the neighborhood around x. If there is little probability in the surrounding neighborhood, the process will tend to spend more time loitering at x.
During implementation, there is no need for the computer to physically wait an amount of time $w_i$. When computing statistics from the chain, we can simply weight the samples by the waiting times $w_i$, as in the sketch below.
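A sketch of this sampler on a discretized line illustrates the idea. Here the grid values of π stand in for the neighborhood integrals in (5) and (6) (the grid spacing only rescales algorithmic time); the neighborhood half-width `k` and all names are illustrative choices.

```python
import numpy as np

def conditional_gibbs_jump(pi_vals, k, n_jumps, rng):
    """Continuous-time conditional Gibbs sampler on a 1-D grid.

    pi_vals[i] is the (unnormalized) posterior at grid point i, and
    N(i) = {i-k, ..., i+k} \\ {i}.  Returns visited grid indices and
    their waiting times.
    """
    i = rng.integers(len(pi_vals))
    states, waits = [], []
    for _ in range(n_jumps):
        nbrs = [j for j in range(max(0, i - k), min(len(pi_vals), i + k + 1)) if j != i]
        mass = pi_vals[nbrs]
        q_i = mass.sum()                           # q(x): posterior mass in N(x)
        waits.append(rng.exponential(1.0 / q_i))   # loiter longer where q is small
        states.append(i)
        i = rng.choice(nbrs, p=mass / q_i)         # jump per pi restricted to N(x)
    return np.array(states), np.array(waits)

# Time-weighted averages estimate posterior expectations, e.g., the mean:
#   states, waits = conditional_gibbs_jump(pi_vals, 5, 10_000, np.random.default_rng(0))
#   posterior_mean = np.sum(grid[states] * waits) / np.sum(waits)
```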

3. Limiting Cases of Discrete-State Continuous-Time Processes

We now present two theorems illustrating how continuous-time random processes defined on discretizations of ℜ with step size ϵ converge to diffusions as the discretization is made finer (as in Section 5.7 of [33]). The first theorem explores a special case of the continuous-time jumping algorithm outlined in Section 2.2. The second is analogous to a Metropolis–Hastings acceptance/rejection scheme. In the second case, the limiting diffusion is a Langevin diffusion. In the first case, the limiting diffusion is a different sort of diffusion that, to our knowledge, has not been previously reported.
One advantage of the generator approach used in this section is that one can easily combine the small-step diffusions presented here with jumping processes that move between subspaces by adding the generators associated with the diffusion and jump processes. An elegant example of this approach is given in Section 5.4 of [32].
Section 5.5 of [32] discusses the ergodic properties of diffusion processes, including the uniqueness of their invariant measures and their convergence to those invariant measures.

3.1. Conditional Gibbs Style

Theorem 1. 
Let π be a strictly positive, twice-differentiable probability density on ℜ with a Gibbs representation $\pi(x) = \exp[-E(x)]/Z$. Consider a family of jump processes $X_\epsilon(t)$, indexed by $\epsilon > 0$, with a jump measure given by

$$ q_\epsilon(x, dy) = \frac{1}{\epsilon^2} \left[ \pi(x+\epsilon)\, \delta(y - (x+\epsilon)) + \pi(x-\epsilon)\, \delta(y - (x-\epsilon)) \right] dy, \qquad (7) $$

where δ is the Dirac delta function (this is analogous to Equation (20) of [15]). As $\epsilon \to 0$, $X_\epsilon$ converges in the Skorohod topology to a diffusion process X defined by the novel SDE

$$ dX(t) = -2 E'(X(t))\, \pi(X(t))\, dt + \sqrt{2\pi(X(t))}\, dW(t). \qquad (8) $$

Moreover, $\pi(x)\,dx$ is a stationary measure for X.
Remark 1. 
Notice that (8) is somewhat similar to a Langevin SDE, with some important distinctions. The process still drifts along the gradient of the posterior, but the multiplication by $\pi(X(t))$ in the drift term implies that the gradient pulls harder when exploring regions of higher probability. Similarly, the Wiener process is more "turbulent" in regions of high probability. We have been unable to find (8) in the literature (such as among the examples and problems in Chapter 7 of [34]) and therefore do not have a name for it.
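For intuition, (8) can be simulated with a simple Euler–Maruyama scheme; the sketch below makes the scaling of both the drift and the noise by π explicit. This is a rough discretization, with `E_prime` and `pi` as illustrative names for $E'$ and π.

```python
import numpy as np

def simulate_novel_sde(E_prime, pi, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of
    dX = -2 E'(X) pi(X) dt + sqrt(2 pi(X)) dW.

    Both the drift and the noise are scaled by the local probability,
    so the process moves (and rattles) fastest where pi is large.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        drift = -2.0 * E_prime(x[n]) * pi(x[n])
        diffusion = np.sqrt(2.0 * pi(x[n]))
        x[n + 1] = x[n] + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal()
    return x
```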
Proof. 
Here, the jump intensity is

$$ q_\epsilon(x) = \int q_\epsilon(x, dy) = \frac{1}{\epsilon^2} \left[ \pi(x+\epsilon) + \pi(x-\epsilon) \right]. \qquad (9) $$
The generator (2) associated with index ϵ is given by

$$ A_\epsilon f(x) = \frac{1}{\epsilon^2} \Big\{ -\left[\pi(x+\epsilon) + \pi(x-\epsilon)\right] f(x) + \pi(x+\epsilon) f(x+\epsilon) + \pi(x-\epsilon) f(x-\epsilon) \Big\}. \qquad (10) $$
Since we take limits to form derivatives, we restrict the domain of f to twice-differentiable, bounded functions.
Expanding the functions with ϵ in their arguments in Taylor series to second order, the limit of (10) as $\epsilon \to 0$ is

$$
\begin{aligned}
A_0 f(x) \overset{\mathrm{df}}{=} \lim_{\epsilon \to 0} A_\epsilon f(x)
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon^2} \Big\{ -\left[\pi(x+\epsilon) + \pi(x-\epsilon)\right] f(x) + \pi(x+\epsilon) f(x+\epsilon) + \pi(x-\epsilon) f(x-\epsilon) \Big\} \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon^2} \Big\{ -\left[2\pi(x) + \epsilon^2 \pi''(x)\right] f(x) + 2 f(x)\pi(x) + \epsilon^2 (\pi \cdot f)''(x) \Big\} \\
&= -\pi''(x) f(x) + \pi''(x) f(x) + 2\pi'(x) f'(x) + \pi(x) f''(x) \\
&= 2\pi'(x) f'(x) + \pi(x) f''(x), \qquad (11)
\end{aligned}
$$
which is the generator of a diffusion with infinitesimal mean $2\pi'(x) = -2E'(x)\pi(x)$ and infinitesimal variance $2\pi(x)$ (Equation (1.2) on p. 366 of [1]). This process can be implicitly defined by our newly discovered SDE

$$ dX(t) = -2 E'(X(t))\, \pi(X(t))\, dt + \sqrt{2\pi(X(t))}\, dW(t), \qquad (12) $$

where $W(t)$ is a standard Wiener process.
Theorem 6.1 on p. 28 of [1] shows that the convergence of generators implies the convergence of their associated semigroups. Combining this with Theorem 2.5 on p. 167 of [1] implies that $X_\epsilon \to X$ in the Skorohod topology as $\epsilon \to 0$.
By Proposition 9.2 on p. 239 of [1], π is a stationary density of X if and only if $\int A_0 f(x)\, \pi(x)\, dx = 0$ for all f in the domain of the generator:

$$ \int A_0 f(x)\, \pi(x)\, dx = \int \pi^2(x) f''(x)\, dx + \int 2\pi(x)\pi'(x) f'(x)\, dx. \qquad (13) $$

Applying integration by parts (let $u = \pi^2(x)$ and $dv = f''(x)\,dx$, so $du = 2\pi(x)\pi'(x)\,dx$ and $v = f'(x)$) to the first term and employing the fact that $\pi(x)$ and $f'(x)$ must both go to zero at the extremes yields

$$ \int A_0 f(x)\, \pi(x)\, dx = \pi^2(x) f'(x) \Big|_{-\infty}^{\infty} - \int 2\pi(x)\pi'(x) f'(x)\, dx + \int 2\pi(x)\pi'(x) f'(x)\, dx, $$

which is zero. □
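The limit computed in (11) is easy to sanity-check numerically: the finite-difference generator (10) applied to a smooth test function should approach $2\pi'(x) f'(x) + \pi(x) f''(x)$ as ϵ shrinks. A minimal sketch, using a standard normal π and $f(x) = \cos x$ as illustrative choices:

```python
import numpy as np

# Standard normal target and a smooth, bounded test function.
pi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
f = lambda x: np.cos(x)

def A_eps(x, eps):
    """Jump-process generator (10), evaluated by finite differences."""
    return (-(pi(x + eps) + pi(x - eps)) * f(x)
            + pi(x + eps) * f(x + eps)
            + pi(x - eps) * f(x - eps)) / eps**2

def A_0(x):
    """Limiting diffusion generator (11): 2 pi'(x) f'(x) + pi(x) f''(x)."""
    pi_prime = -x * pi(x)                    # pi'(x) for the standard normal
    return 2.0 * pi_prime * (-np.sin(x)) + pi(x) * (-np.cos(x))

for eps in (0.1, 0.01, 0.001):
    print(eps, A_eps(0.7, eps), A_0(0.7))    # gap shrinks like eps**2
```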

3.2. Metropolis Style

This section is in the spirit of Corollary 2 of Theorems 1 and 2 of [15]. An informal recollection of the classic construction of Brownian motion on ℜ in terms of coin flips will be helpful for interpreting the next theorem. At each time step, if the coin comes up heads, our Brownian traveller takes a step to the left; if tails, a step to the right. Letting the time between steps and the size of the steps shrink yields Brownian motion. Now, suppose we place a density π over the range of places the traveller may sojourn, and suppose that instead of automatically taking a step after each coin flip, the traveller first looks at the value of the density π at the place they have been asked to move to and compares it with the value of π where they currently stand. If the probability is higher, they take the step; if lower, they decide whether to accept or reject the step in the traditional Metropolis fashion. At each stage, the traveller waits for a random amount of time drawn from an exponential distribution with mean $\epsilon^2$. This heuristic scenario is made precise in the next theorem, where we find that this alternate traveller's journey, instead of converging to Brownian motion, remarkably converges to a Langevin diffusion with stationary density π. For clarity, we make the Metropolis procedure explicit in our theorem statement by allowing the process to “jump” to the same state; we address this further in a remark after the proof.
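A minimal sketch of the traveller's walk described above, assuming an unnormalized density function `pi` (all names illustrative):

```python
import numpy as np

def metropolis_traveller(pi, x0, eps, t_end, rng):
    """Continuous-time Metropolis walk on the grid {x0 + k * eps}.

    Waits an exponential time with mean eps**2, proposes a step of
    +/- eps by a fair coin flip, and accepts with probability
    min(pi(proposal) / pi(current), 1); rejections stay put.
    """
    x, t = x0, 0.0
    times, states = [0.0], [x0]
    while t < t_end:
        t += rng.exponential(eps**2)                       # mean eps^2 holding time
        y = x + (eps if rng.uniform() < 0.5 else -eps)     # fair-coin proposal
        if rng.uniform() < min(pi(y) / pi(x), 1.0):        # Metropolis rule
            x = y
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)
```

As ϵ shrinks, sample paths of this walk come to resemble the Langevin diffusion of the theorem rather than Brownian motion.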
Theorem 2. 
Let π be a strictly positive, twice-differentiable probability density on ℜ with a Gibbs representation $\pi(x) = \exp[-E(x)]/Z$. Consider a family of jump processes $X_\epsilon(t)$, indexed by $\epsilon > 0$, with a jump measure given by

$$
\begin{aligned}
q_\epsilon(x, dy) = \frac{1}{\epsilon^2} \bigg\{ & \frac{1}{2} \min\!\left[\frac{\pi(x+\epsilon)}{\pi(x)}, 1\right] \delta(y - (x+\epsilon)) + \frac{1}{2} \min\!\left[\frac{\pi(x-\epsilon)}{\pi(x)}, 1\right] \delta(y - (x-\epsilon)) \\
&+ \frac{1}{2} \left(1 - \min\!\left[\frac{\pi(x+\epsilon)}{\pi(x)}, 1\right]\right) \delta(y - x) + \frac{1}{2} \left(1 - \min\!\left[\frac{\pi(x-\epsilon)}{\pi(x)}, 1\right]\right) \delta(y - x) \bigg\}\, dy. \qquad (14)
\end{aligned}
$$
The first two terms of (14) correspond to Metropolis acceptances; the latter two correspond to rejections. As $\epsilon \to 0$, $X_\epsilon$ converges in the Skorohod topology to a diffusion process X defined by the Langevin SDE

$$ dX(t) = -\tfrac{1}{2} E'(X(t))\, dt + dW(t). \qquad (15) $$

Moreover, $\pi(x)\,dx$ is a stationary measure for X.
Proof. 
Here, the jump intensity is

$$ q_\epsilon(x) = \int q_\epsilon(x, dy) = \frac{1}{\epsilon^2}. $$
The generator (2) associated with the index ϵ is given by

$$ A_\epsilon f(x) = \frac{1}{2\epsilon^2} \bigg\{ -\left( \min\!\left[\frac{\pi(x+\epsilon)}{\pi(x)}, 1\right] + \min\!\left[\frac{\pi(x-\epsilon)}{\pi(x)}, 1\right] \right) f(x) + \min\!\left[\frac{\pi(x+\epsilon)}{\pi(x)}, 1\right] f(x+\epsilon) + \min\!\left[\frac{\pi(x-\epsilon)}{\pi(x)}, 1\right] f(x-\epsilon) \bigg\}. \qquad (16) $$
It will be easiest to consider separate cases. For $\pi(x+\epsilon) > \pi(x)$ and $\pi(x-\epsilon) < \pi(x)$, generator (16) simplifies to

$$ A_\epsilon f(x) = \frac{1}{2\epsilon^2} \left\{ -\left[1 + \frac{\pi(x-\epsilon)}{\pi(x)}\right] f(x) + f(x+\epsilon) + \frac{\pi(x-\epsilon)}{\pi(x)} f(x-\epsilon) \right\}. \qquad (17) $$
Expanding the functions with ϵ in their arguments in Taylor series to second order yields

$$
\begin{aligned}
A_0 f(x) &\overset{\mathrm{df}}{=} \lim_{\epsilon \to 0} A_\epsilon f(x) \\
&= \lim_{\epsilon \to 0} \frac{1}{2\epsilon^2} \bigg\{ -\left[1 + \frac{\pi(x) - \epsilon \pi'(x) + (\epsilon^2/2)\pi''(x)}{\pi(x)}\right] f(x) + \left[f(x) + \epsilon f'(x) + \frac{\epsilon^2}{2} f''(x)\right] \\
&\qquad\qquad + \frac{\pi(x) f(x) - \epsilon (\pi \cdot f)'(x) + (\epsilon^2/2)(\pi \cdot f)''(x)}{\pi(x)} \bigg\} \\
&= \lim_{\epsilon \to 0} \frac{1}{2\epsilon^2} \left\{ \epsilon^2 \frac{\pi'(x) f'(x)}{\pi(x)} + \epsilon^2 f''(x) \right\} \\
&= \frac{\pi'(x)}{2\pi(x)} f'(x) + \frac{1}{2} f''(x). \qquad (18)
\end{aligned}
$$

Working this through for the opposite case, with $\pi(x+\epsilon) < \pi(x)$ and $\pi(x-\epsilon) > \pi(x)$, yields the same result.
Now consider the case where $\pi(x+\epsilon) < \pi(x)$ and $\pi(x-\epsilon) < \pi(x)$ hold as $\epsilon \to 0$, so that $\pi'(x) = 0$. Here,

$$
\begin{aligned}
A_0 f(x) &\overset{\mathrm{df}}{=} \lim_{\epsilon \to 0} A_\epsilon f(x) = \lim_{\epsilon \to 0} \frac{1}{2\epsilon^2} \left\{ -\frac{\left[2\pi(x) + \epsilon^2 \pi''(x)\right] f(x)}{\pi(x)} + \frac{2\pi(x) f(x) + \epsilon^2 (\pi \cdot f)''(x)}{\pi(x)} \right\} \\
&= \frac{1}{2} \left\{ -\frac{\pi''(x) f(x)}{\pi(x)} + \frac{\pi''(x) f(x) + 2\pi'(x) f'(x) + \pi(x) f''(x)}{\pi(x)} \right\} = \frac{f''(x)}{2}. \qquad (19)
\end{aligned}
$$
Finally, consider the case where $\pi(x+\epsilon) > \pi(x)$ and $\pi(x-\epsilon) > \pi(x)$ hold as $\epsilon \to 0$, so that $\pi'(x) = 0$. Now we have

$$ A_0 f(x) \overset{\mathrm{df}}{=} \lim_{\epsilon \to 0} A_\epsilon f(x) = \lim_{\epsilon \to 0} \frac{1}{2\epsilon^2} \left\{ -2 f(x) + f(x+\epsilon) + f(x-\epsilon) \right\} = \lim_{\epsilon \to 0} \frac{1}{2\epsilon^2} \left\{ -2 f(x) + 2 f(x) + \epsilon^2 f''(x) \right\} = \frac{f''(x)}{2}. \qquad (20) $$
Notice that (19) and (20) can both be written as (18), since $\pi'(x) = 0$ in those cases; (18) is the generator of a diffusion process with infinitesimal mean $\pi'(x)/[2\pi(x)] = -E'(x)/2$ and infinitesimal variance 1. This process can be implicitly defined by the SDE for the Langevin diffusion

$$ dX(t) = -\tfrac{1}{2} E'(X(t))\, dt + dW(t), \qquad (21) $$
where $W(t)$ is a standard Wiener process. As in the preceding theorem, applying Theorem 6.1 on p. 28 of [1], along with Theorem 2.5 on p. 167 of [1], reveals that $X_\epsilon \to X$ in the Skorohod topology as $\epsilon \to 0$.
Again, by Proposition 9.2 on p. 239 of [1], π is a stationary density of X if and only if $\int A_0 f(x)\, \pi(x)\, dx = 0$ for all f in the domain of the generator:

$$ \int A_0 f(x)\, \pi(x)\, dx = \frac{1}{2} \int \pi(x) f''(x)\, dx + \frac{1}{2} \int \pi'(x) f'(x)\, dx. \qquad (22) $$

Applying integration by parts (let $u = \pi(x)$ and $dv = f''(x)\,dx$, so $du = \pi'(x)\,dx$ and $v = f'(x)$) to the first term and employing the fact that $\pi(x)$ and $f'(x)$ must both go to zero at the extremes yields

$$ \int A_0 f(x)\, \pi(x)\, dx = \frac{1}{2} \left\{ \pi(x) f'(x) \Big|_{-\infty}^{\infty} - \int \pi'(x) f'(x)\, dx + \int \pi'(x) f'(x)\, dx \right\}, $$

which is zero. □
Remark 2. 
Consider a jump transition intensity consisting of the first two terms of (14):

$$ q_\epsilon(x, dy) = \frac{1}{\epsilon^2} \left\{ \frac{1}{2} \min\!\left[\frac{\pi(x+\epsilon)}{\pi(x)}, 1\right] \delta(y - (x+\epsilon)) + \frac{1}{2} \min\!\left[\frac{\pi(x-\epsilon)}{\pi(x)}, 1\right] \delta(y - (x-\epsilon)) \right\} dy, \qquad (23) $$

which is analogous to Equation (21) of [15] (except that the “proposals” are chosen with equal probability instead of according to a prior). The corresponding jump intensity is

$$ q_\epsilon(x) = \frac{1}{2\epsilon^2} \left\{ \min\!\left[\frac{\pi(x+\epsilon)}{\pi(x)}, 1\right] + \min\!\left[\frac{\pi(x-\epsilon)}{\pi(x)}, 1\right] \right\}, \qquad (24) $$
as would be used in the algorithm described in Section 2.1. The resulting generator $A_\epsilon f(x)$ exactly matches (16), so it describes the same family of processes, though it suggests a different practical implementation (see the sketch below).
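In implementation terms, dropping the self-jump terms means the rejection probability is folded into a longer random holding time rather than repeated proposals. A sketch of this jump-only variant (names illustrative):

```python
import numpy as np

def metropolis_jump_only(pi, x0, eps, n_jumps, rng):
    """Jump-only variant of Remark 2: the process never self-transitions.

    The total intensity (24) shrinks where both neighbors are improbable,
    so the holding time there lengthens, playing the role of repeated
    rejections in the formulation with self-jumps.
    """
    x = x0
    states, waits = [], []
    for _ in range(n_jumps):
        a_plus = min(pi(x + eps) / pi(x), 1.0)
        a_minus = min(pi(x - eps) / pi(x), 1.0)
        q = (a_plus + a_minus) / (2.0 * eps**2)      # jump intensity (24)
        states.append(x)
        waits.append(rng.exponential(1.0 / q))
        # Choose a neighbor in proportion to its acceptance weight.
        x = x + eps if rng.uniform() < a_plus / (a_plus + a_minus) else x - eps
    return np.array(states), np.array(waits)
```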

4. Conclusions

Section 3 presents two new theorems demonstrating that certain random sampling algorithms operating on a discretized space, in which moves are restricted to neighboring points, converge to diffusions as the discretization is refined to the continuum. A Metropolis–Hastings formulation leads to the Langevin SDE; in contrast, a Gibbs-style formulation leads to an entirely new SDE that, to our knowledge, had not previously appeared in the literature.
These theorems were formulated for one-dimensional processes; future work could consider extensions to multi-dimensional processes, including Cartesian products that involve rotation groups, as in [35]. A first step toward an extension to $\Re^n$ might be to add terms associated with increments and decrements of ϵ in the additional coordinates to (7) and (14).
The convergence results in this paper do not make any claims about the rates of convergence; future work on numerical simulations would yield insight along these lines by comparing different methods.

Funding

This research was supported by the U.S. Army Research Office (DAAH04-95-0494 and DAAH04-94-G-0209) and the U.S. Air Force Office of Scientific Research (F49620-03-1-0340).

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviation is used in this manuscript:
SDE  Stochastic differential equation

References

  1. Ethier, S.; Kurtz, T. Markov Processes: Characterization and Convergence; John Wiley and Sons: Hoboken, NJ, USA, 1986. [Google Scholar]
  2. Grenander, U. General Pattern Theory: A Mathematical Study of Regular Structures; Oxford University Press: Oxford, UK, 1994. [Google Scholar]
  3. Grenander, U. Elements of Pattern Theory; Johns Hopkins University Press: Baltimore, MD, USA, 1996. [Google Scholar]
  4. Grenander, U.; Miller, M. Pattern Theory: From Representation to Inference; Oxford University Press: Oxford, UK, 2006. [Google Scholar]
  5. Mumford, D.; Desolneux, A. Pattern Theory: The Stochastic Analysis of Real World Signals; A.K. Peters/CRC Press: Boca Raton, FL, USA, 2010. [Google Scholar]
  6. Sworder, D.; Boyd, J. Jump-Diffusion Processing in Tracking/Recognition. IEEE Trans. Signal Process. 1998, 46, 235–239. [Google Scholar]
  7. Srivastava, A.; Miller, M.; Grenander, U. Multiple Target Direction of Arrival Tracking. IEEE Trans. Signal Process. 1995, 43, 1282–1285. [Google Scholar]
  8. Zhu, S.C. Stochastic jump-diffusion process for computing medial axes in Markov random fields. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21, 1158–1169. [Google Scholar]
  9. Han, F.; Tu, Z.; Zhu, S. Range Image Segmentation by an Effective Jump-Diffusion Method. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1139–1153. [Google Scholar]
  10. Phillips, D.; Smith, A. Bayesian model comparison via jump diffusions. In Markov Chain Monte Carlo in Practice; Gilks, W., Richardson, S., Spiegelhalter, D., Eds.; Chapman and Hall: Boca Raton, FL, USA, 1996; Chapter 13; pp. 215–239. [Google Scholar]
  11. Lanterman, A.; Miller, M.; Snyder, D. General Metropolis-Hastings jump diffusions for automatic target recognition in infrared scenes. Opt. Eng. 1997, 36, 1123–1137. [Google Scholar]
  12. Lanterman, A. Jump-Diffusion Algorithm for Multiple Target Recognition using Laser Radar Range Data. Opt. Eng. 2001, 40, 1724–1728. [Google Scholar]
  13. Liu, X.; Lo, D.; Thuan, C. Unsupervised Learning Based Jump-Diffusion Process for Object Tracking in Video Surveillance. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; Volume 2018, pp. 5060–5066. [Google Scholar]
  14. Kumar, S.; Paria, B.; Tsvetkov, Y. Gradient-based Constrained Sampling from Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 2251–2277. [Google Scholar]
  15. Grenander, U.; Miller, M.I. Representations of Knowledge in Complex Systems. J. R. Stat. Soc. B 1994, 56, 549–603. [Google Scholar]
  16. Miller, M.; Srivastava, A.; Grenander, U. Conditional-Mean Estimation Via Jump-Diffusion Processes in Multiple Target Tracking/Recognition. IEEE Trans. Signal Process. 1995, 43, 2678–2690. [Google Scholar]
  17. Miller, M.; Grenander, U.; O’Sullivan, J.A.; Snyder, D. Automatic Target Recognition Organized via Jump-Diffusion Algorithms. IEEE Trans. Image Process. 1997, 6, 1–17. [Google Scholar]
  18. Durrett, R. Stochastic Calculus: A Practical Introduction; CRC Press: Boca Raton, FL, USA, 1996. [Google Scholar]
  19. Geman, S.; Hwang, C.R. Diffusions for Global Optimization. SIAM J. Control Optim. 1986, 24, 1031–1043. [Google Scholar]
  20. Amit, Y.; Grenander, U.; Piccioni, M. Structural image restoration through deformable templates. J. Am. Stat. Assoc. 1991, 86, 376–387. [Google Scholar] [CrossRef]
  21. Ghaderi, S.; Ahookhosh, M.; Arany, A.; Skupin, A.; Patrinos, P.; Moreau, Y. Smoothing Unadjusted Langevin Algorithms for Nonsmooth Composite Potential Functions. Appl. Math. Comput. 2024, 464, 128377. [Google Scholar]
  22. Gelfand, S.; Mitter, S. Recursive Stochastic Algorithms for Global Optimization in ℝd. SIAM J. Control Optim. 1991, 29, 999–1018. [Google Scholar] [CrossRef]
  23. Besag, J. Contribution to the Discussion of the Paper by Grenander and Miller. J. R. Stat. Soc. B 1994, 56, 591–592. [Google Scholar]
  24. Metropolis, N.; Rosenbluth, A.; Rosenbluth, M.; Teller, A.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1092. [Google Scholar] [CrossRef]
  25. Hastings, W.K. Monte Carlo Sampling Methods Using Markov Chains, and their Applications. Biometrika 1970, 57, 97–109. [Google Scholar]
  26. Roberts, G.; Rosenthal, J. Optimal Scaling of Discrete Approximations to Langevin diffusions. J. R. Stat. Soc. B 1998, 60, 255–268. [Google Scholar] [CrossRef]
  27. Geman, S.; Geman, D. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 721–741. [Google Scholar]
  28. Gillespie, D. Markov Processes: An Introduction for Physical Scientists; Academic Press: San Diego, CA, USA, 1992. [Google Scholar]
  29. Gelfand, S.; Mitter, S. Weak Convergence of Markov Chain Sampling Methods and Annealing Algorithms to Diffusions. J. Optim. Theory Appl. 1991, 68, 483–498. [Google Scholar] [CrossRef]
  30. Reno, A.; Gillies, D.; Booth, D. Deformable Models for Object Recognition in Aerial Images. In Automatic Target Recognition VIII; SPIE: Bellingham, WA, USA, 1998; Volume 3371, pp. 322–333. [Google Scholar]
  31. Gikhman, I.; Skorohod, A. Introduction to the Theory of Random Processes; Saunders: Philadelphia, PA, USA, 1965. [Google Scholar]
  32. Srivastava, A. Inference on Transformation Groups Generating Patterns on Rigid Motions. D.Sc. Dissertation, Dept. of Electrical Engineering, Sever Institute of Technology, Washington University, St. Louis, MO, USA, 1996. [Google Scholar]
  33. Lanterman, A. Modeling Clutter and Target Signatures for Pattern-Theoretic Understanding of Infrared Scenes. D.Sc. Dissertation, Dept. of Electrical Engineering, Sever Institute of Technology, Washington University, St. Louis, MO, USA, 1998. [Google Scholar]
  34. Øksendal, B. Stochastic Differential Equations: An Introduction with Applications, 4th ed.; Springer: New York, NY, USA, 1995. [Google Scholar]
  35. Srivastava, A.; Grenander, U.; Jensen, G.R.; Miller, M.I. Jump-diffusion Markov Processes on Orthogonal Groups for Object Pose Estimation. J. Stat. Plan. Inference 2002, 103, 15–37. [Google Scholar] [CrossRef]