Article

Rate of Entropy Production in Stochastic Mechanical Systems

by
Gregory S. Chirikjian
Department of Mechanical Engineering, National University of Singapore, Singapore 117575, Singapore
Entropy 2022, 24(1), 19; https://doi.org/10.3390/e24010019
Submission received: 6 November 2021 / Revised: 20 December 2021 / Accepted: 22 December 2021 / Published: 23 December 2021
(This article belongs to the Collection Randomness and Entropy Production)

Abstract
Entropy production in stochastic mechanical systems is examined here with strict bounds on its rate. Stochastic mechanical systems include pure diffusions in Euclidean space or on Lie groups, as well as systems evolving on phase space for which the fluctuation-dissipation theorem applies, i.e., return-to-equilibrium processes. Two separate ways for ensembles of such mechanical systems forced by noise to reach equilibrium are examined here. First, a restorative potential and damping can be applied, leading to a classical return-to-equilibrium process wherein energy taken out by damping can balance the energy going in from the noise. Second, processes evolving on a compact configuration space (such as random walks on spheres, torsion angles in chain molecules, and rotational Brownian motion) have long-time solutions that are constant over the configuration space, regardless of whether or not damping and random forcing balance. This is a kind of potential-free equilibrium distribution resulting from topological constraints. Inertial and noninertial (kinematic) systems are considered. These systems can consist of unconstrained particles or more complex systems with constraints, such as rigid bodies or linkages. These more complicated systems evolve on Lie groups and model phenomena such as rotational Brownian motion and nonholonomic robotic systems. In all cases, it is shown that the rate of entropy production is closely related to the appropriate concept of Fisher information matrix of the probability density defined by the Fokker–Planck equation. Classical results from information theory are then repurposed to provide computable bounds on the rate of entropy production in stochastic mechanical systems.

1. Introduction

The second law of thermodynamics introduced the concept of entropy, and states that at the macroscopic scale entropy is nondecreasing for a closed isolated system, i.e., one for which no heat enters or leaves and on which no external work is performed. This law was postulated in the mid 1800s, motivated in part by the desire for efficiency gains in steam engines. A closed system could be an isolated container on a lab bench, or the whole universe, leading to the famous phrase “the universe tends toward disorder” which is a paraphrase of a statement made by Rudolf Clausius circa 1865.
Statistical mechanical arguments were developed to explain the second law at the microscopic/molecular level, and were established in large part by Ludwig Boltzmann's classical work for ideal gases involving deterministic collision models between particles. In his famous H-theorem, Boltzmann proved that:

$$\frac{dH}{dt} \leq 0$$

where

$$H(t) \doteq \int_{\mathbb{R}^3} f(\mathbf{v},t)\,\log f(\mathbf{v},t)\,d\mathbf{v}$$

with $f(\mathbf{v},t)$ being the velocity distribution of gas particles in a container at a particular value of time, $t$, under the assumption of isotropic spatial distribution. This expression is related to the more general Gibbs formula for entropy given later in this introduction, which differs in sign and scale. As $t$ increases, $f(\mathbf{v},t)$ converges to an equilibrium distribution, $f_\infty(\mathbf{v})$, known as the Maxwell–Boltzmann distribution, where $H$ is minimized (or entropy is maximized) for a given value of temperature.
A completely different approach from Boltzmann's is that of the stochastic mechanical system, in which each element in the statistical ensemble is a classical mechanical system forced by a combination of conservative forces, random Brownian motion, and viscous damping. A stochastic mechanical system is modeled using stochastic differential equations, and ensemble behavior is described by the corresponding Fokker–Planck equation. Such systems are not limited to particle systems, and can either be modeled as having inertia or be purely kinematic. They can be complex molecules or even systems with nonholonomic kinematic constraints, such as robots.
The main contributions of this paper are as follows:
(1)
The recognition that stochastic mechanical models can be used in place of the original deterministic collision models used to formulate statistical mechanics;
(2)
The interpretation of these stochastic models as Itô or Stratonovich is irrelevant for systems that have nonzero mass, but these interpretations provide different results as the mass becomes zero, an effect almost never discussed in statistical mechanical works;
(3)
Conditions for a stochastic mechanical system with configuration-dependent noise and damping to reach equilibrium (i.e., a time-independent probability distribution on phase space) are established, generalizing the Einstein relations and providing a statement that is new and different from those for detailed balance, and that represents a new observation in fluctuation-dissipation theory;
(4)
The aforementioned stationary pdf to which a stochastic mechanical ensemble converges is in fact the Boltzmann distribution from statistical mechanics, here obtained via a different path than in classical statistical mechanics;
(5)
Novel solutions to diffusion equations on Lie groups are provided;
(6)
Inequalities from information theory are extended beyond their original setting (e.g., to include diffusion processes on Lie groups in addition to Euclidean spaces) and used to bound the entropy and rate of entropy production in stochastic mechanical systems such as the rigid Brownian rotor and mobile robots, which are beyond the scope of classical statistical mechanics.
Mixed in with these new results is a substantial amount of review material, since it makes little sense to present intricacies about differences between Itô and Stratonovich stochastic differential equations or to talk about diffusions on Lie groups to someone familiar only with statistical mechanics or information theory. Moreover, despite the similarities in the form of entropy in these fields, the inequalities of information theory are rarely known in statistical mechanics, and the concept of Hamiltonian mechanics and phase space which are essential in statistical mechanics may be unknown to those familiar with information theory.
Stochastic mechanical models (i.e., stochastic differential equations and associated Fokker–Planck equations) are examined here with the goal of establishing strict bounds on the rate of entropy production. Individual stochastic trajectories can be generated numerically from a stochastic differential equation, but it is an ensemble of such trajectories that conveys important statistical information about the system. These ensembles can be used to generate a family of histograms indexed by time, or the associated Fokker–Planck equation can be solved to obtain the same family of probability densities. Regardless of which approach is taken, each stochastic mechanical system has an associated family of probability density functions (indexed by time) from which entropy and its rate can be computed.
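As a concrete illustration of this ensemble-to-entropy pipeline, the following minimal Python sketch (not from the paper; all parameter values are assumed) simulates sample paths of a scalar Brownian motion by the Euler-Maruyama method, builds a histogram estimate of $f(x,t)$, and compares the histogram entropy against the closed-form Gaussian value derived in the next section.

```python
import numpy as np

# A minimal sketch (assumed parameters): estimate S(t) for the scalar
# diffusion dx = sqrt(D) dw by building a histogram from an ensemble
# of Euler-Maruyama sample paths.
rng = np.random.default_rng(0)
D, dt, n_steps, n_paths = 1.0, 1e-3, 2000, 100_000

x = np.zeros(n_paths)                      # delta initial condition at x = 0
for _ in range(n_steps):
    x += np.sqrt(D * dt) * rng.standard_normal(n_paths)

# Histogram estimate of f(x, t) at t = n_steps * dt
hist, edges = np.histogram(x, bins=200, density=True)
widths = np.diff(edges)
mask = hist > 0                            # avoid log(0) in empty bins
S_est = -np.sum(hist[mask] * np.log(hist[mask]) * widths[mask])

t = n_steps * dt
S_exact = 0.5 * np.log(2 * np.pi * np.e * D * t)   # Gaussian entropy, d = 1
print(S_est, S_exact)                      # the two agree closely
```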
A probability density function (pdf) on a measure space $(X,\mu)$ which is indexed by time is a function $f: X\times\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ such that:

$$\int_X f(x,t)\,d\mu(x) = 1.$$

The entropy of $f$ at each value of time is defined as:

$$S(t) \doteq -\int_X f(x,t)\,\log f(x,t)\,d\mu(x). \tag{1}$$

The choice of base of the logarithm amounts to a choice of measurement units for $S$, as a different base will result in a different scaling of $S(t)$. Throughout this article, $\log \doteq \log_e = \ln$.
When considering a statistical mechanical system, the entropy of a time-varying probability density on phase space is defined using Gibbs' formula as:

$$S_B(t) \doteq -k_B\int_{\mathbf{q}}\int_{\mathbf{p}} f(\mathbf{p},\mathbf{q},t)\,\log f(\mathbf{p},\mathbf{q},t)\,d\mathbf{p}\,d\mathbf{q} \tag{2}$$

where $\mathbf{q}\in Q\subseteq\mathbb{R}^n$ is a set of coordinates, and $\mathbf{p}\in\mathbb{R}^n$ are the corresponding conjugate momenta. The Lebesgue measure on the $2n$-dimensional phase space, $d\mathbf{p}\,d\mathbf{q} = dq_1\cdots dq_n\,dp_1\cdots dp_n$, is invariant under coordinate changes, though such changes do affect the bounds of integration $Q$, unless $Q = \mathbb{R}^n$. The inclusion of the Boltzmann constant $k_B$ in (2) is for consistency with the statistical mechanics literature.

It should be noted that this integral is an approximation of a sum over microstates, the size of which is specified by the Planck scale. Whereas, mathematically speaking, continuous entropy has no lower bound, for physical systems it does, because discrete entropy computed by summing over microstates is always nonnegative. As long as the features of the probability density function (pdf) are coarser than the Planck scale, entropy differences computed using this continuous integral formula and the analogous entropy formula computed by summing over discrete states will be the same. That is, $S_{\mathrm{cont}} \neq S_{\mathrm{disc}}$, but $\Delta S_{\mathrm{cont}} = \Delta S_{\mathrm{disc}}$ for physical systems.
In information theory, entropy can also be defined over a continuous space (in which case it is called "differential" entropy), or over a discrete space, such as a set of symbols. It is no coincidence that the same expression (without the factor of $k_B$) appears with the same name in information theory. As the folklore goes, the founder of information theory, Claude Shannon, had not yet fixed on the name 'entropy', and had a conversation with the mathematician John von Neumann, who suggested [1,2]:
“You should call it entropy, for two reasons. First, your expression has been used in statistical mechanics under that name. Second, nobody really knows what entropy is, so in a debate you will always have the advantage.”
These words are not exact because it was supposedly a phone conversation (if it ever happened at all) subsequently relayed through other people.
The configurational probability density is the marginal distribution:

$$f(\mathbf{q},t) \doteq \frac{1}{|\det M(\mathbf{q})|^{1/2}}\int_{\mathbf{p}} f(\mathbf{p},\mathbf{q},t)\,d\mathbf{p} \tag{3}$$

which is defined in this way so that:

$$\int_{\mathbf{q}\in Q} f(\mathbf{q},t)\,|\det M(\mathbf{q})|^{1/2}\,d\mathbf{q} = 1.$$

The corresponding configurational entropy is:

$$S_Q(t) \doteq -\int_{\mathbf{q}\in Q} f(\mathbf{q},t)\,\log f(\mathbf{q},t)\,|\det M(\mathbf{q})|^{1/2}\,d\mathbf{q} \tag{4}$$

where $M(\mathbf{q})$ is the metric tensor for the configuration space, which is the mass matrix in the case of a mechanical system with inertia.
A unimodular Lie group is one with an integration measure that is invariant under shifts from the left and the right. In the case when $\mathbf{q}$ parameterizes a unimodular Lie group, $G$, such as the group of rotations or the Euclidean motion group, then the entropy of a time-indexed pdf $f: G\times\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is:

$$S_G(t) \doteq -\int_G f(g,t)\,\log f(g,t)\,dg \tag{5}$$

where the Haar measure $dg$ takes on different appearances under changes of parametrization, but the value of the integral is independent of parametrization and is invariant under shifts of the form $g\to g_0\circ g$ and $g\to g\circ g_0$ for any fixed $g_0\in G$.
The main topic of this article is to study the rate of entropy increase in stochastically forced mechanical systems. That is, for any of the entropies defined above, to calculate or bound $\dot{S}$. In particular, by simply moving the time derivative inside the integral and using the product rule from calculus,

$$\dot{S}(t) = -\int_X \left(\frac{\partial f}{\partial t}\,\log f + \frac{\partial f}{\partial t}\right)d\mu(x).$$

However, since $f(x,t)$ is a probability density function at each value of time, whose integral is equal to 1,

$$\int_X \frac{\partial f}{\partial t}\,d\mu(x) = \frac{d}{dt}\int_X f(x,t)\,d\mu(x) = 0,$$

and so the second term in the integrand above integrates to zero. Consequently:

$$\dot{S}(t) = -\int_X \frac{\partial f}{\partial t}\,\log f\,d\mu(x). \tag{6}$$
It is also possible to bound the value of entropy itself. The function $\Phi(x) = -\log x$ is a convex function. Consequently, Jensen's inequality gives:

$$S = \int_X f(x)\,\Phi(f(x))\,d\mu(x) \geq \Phi\left(\|f\|^2\right) \tag{7}$$

or

$$S \geq -\log \|f\|^2 \tag{8}$$

where

$$\|f\|^2 \doteq \int_X |f(x)|^2\,d\mu(x).$$
As a consequence of Parseval’s inequality, (8) can then be stated in Fourier space. This is true not only for the case of Euclidean space, but for wide classes of unimodular Lie groups [3].
Moreover, $S(t)$ can be bounded from above in some contexts. For example, on Euclidean spaces it is well known that Gaussian distributions have maximum entropy over all pdfs with a given mean and covariance. Therefore, if $f(\mathbf{x},t)$ is an arbitrary time-evolving pdf on $\mathbb{R}^d$ with mean $\boldsymbol{\mu}_f(t)$ and covariance $\Sigma_f(t)$, and if $\rho_{\boldsymbol{\mu},\Sigma}(\mathbf{x})$ denotes a Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\Sigma$, then:

$$S_f(t) \leq S_{\rho_{\boldsymbol{\mu}_f,\Sigma_f}}(t). \tag{9}$$
On a compact space such as the circle or rotation group, the uniform distribution has the absolute maximum entropy of all distributions on those spaces.

2. Rate of Entropy Production for Stochastic Processes on Euclidean Space

The stochastic mechanical models addressed here are described as stochastic differential equations forced by Brownian noise (i.e., Gaussian white noise, or equivalently, increments of Wiener processes).

2.1. Review of Stochastic Differential Equations

Stochastic differential equations (SDEs) are forced by Gaussian white noises $dw_i$, which are increments of uncorrelated unit-strength Wiener processes. These define Brownian motion processes. That is, each $dw_i(t)$ can be viewed as an independent random draw from a one-dimensional Gaussian distribution with zero mean and variance $dt$. Let $d$ denote the dimension of the Euclidean space on which the random process evolves. For systems with inertia, $d = 2n$ is the dimension of phase space, and for noninertial systems $d = n$. The independent uncorrelated unit-strength white noises $dw_i$ form the components of a vector $d\mathbf{w}\in\mathbb{R}^d$. For example, given the stochastic differential equation on $\mathbb{R}^d$

$$d\mathbf{x} = B\,d\mathbf{w}$$

where $B\in\mathbb{R}^{d\times d}$ is a constant full-rank matrix, the distribution $f(\mathbf{x},t)$ describing the ensemble of an infinite number of trajectories will satisfy:

$$\frac{\partial f}{\partial t} = \frac{1}{2}\sum_{i,j=1}^{d} D_{ij}\,\frac{\partial^2 f}{\partial x_i\,\partial x_j}$$

where $D = [D_{ij}] = BB^T$. The solution of this equation subject to initial conditions $f(\mathbf{x},0) = \delta(\mathbf{x})$ is the time-varying Gaussian distribution:

$$f(\mathbf{x},t) = \frac{1}{(2\pi)^{d/2}\,|Dt|^{1/2}}\,\exp\left(-\tfrac{1}{2}\,\mathbf{x}^T(Dt)^{-1}\mathbf{x}\right) \tag{10}$$

where $|A|$ denotes the determinant of a square matrix $A$. The entropy for this can be computed in closed form as [4]:

$$S(t) = \log\left[(2\pi e)^{d/2}\,|\Sigma(t)|^{1/2}\right] \tag{11}$$

where $\Sigma(t) = Dt$ in this context. More generally, (11) can be used as the upper bound in (9) with $\Sigma = \Sigma_f$, since entropy is independent of the mean.

As an example of (8), applying it to a Gaussian distribution on $\mathbb{R}^d$ gives:

$$S(t) \geq \log\left[(4\pi)^{d/2}\,|Dt|^{1/2}\right]. \tag{12}$$

Comparing with the exact expression in (11) verifies this, since $4\pi < 2\pi e$ and $\log(x)$ is a monotonically increasing function.
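The following short numerical check (with an assumed diffusion matrix $D$) illustrates these formulas: it evaluates the exact entropy (11), verifies the lower bound (12), and confirms by finite differences the rate $\dot{S} = d/(2t)$ that follows from (11) and is derived in Section 2.3.1 below.

```python
import numpy as np

# Illustrative check of (11) and (12) for a constant diffusion matrix D
# (values are assumed for the example, not taken from the paper).
d = 3
D = np.diag([0.5, 1.0, 2.0])

def S_exact(t):
    # S(t) = log[(2 pi e)^{d/2} |D t|^{1/2}], Eq. (11) with Sigma = D t
    return np.log((2*np.pi*np.e)**(d/2) * np.sqrt(np.linalg.det(D*t)))

for t in (0.5, 1.0, 2.0):
    S_lower = np.log((4*np.pi)**(d/2) * np.sqrt(np.linalg.det(D*t)))  # Eq. (12)
    h = 1e-6
    rate_fd = (S_exact(t + h) - S_exact(t)) / h       # numerical dS/dt
    print(S_exact(t) >= S_lower, np.isclose(rate_fd, d/(2*t), rtol=1e-4))
```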
There are two major kinds of stochastic differential equations (SDEs): Itô and Stratonovich. Both are forced by Gaussian white noises $dw_i$. In the simple example above, Itô and Stratonovich interpretations lead to the same result, but in more complex cases where the coupling matrix $B$ is configuration-dependent, the two interpretations will differ. A brief review of the main features of these two different interpretations of SDEs is given here, based on the longer exposition in [5,6].
Historically, the Itô interpretation came first. If

$$dx_i(t) = a_i(x_1(t),\ldots,x_d(t),t)\,dt + \sum_{j=1}^{m} B_{ij}(x_1(t),\ldots,x_d(t),t)\,dw_j(t) \quad\text{for}\quad i = 1,\ldots,d \tag{13}$$

is an Itô SDE describing a random process on $\mathbb{R}^d$, where now $B\in\mathbb{R}^{d\times m}$, then the corresponding Fokker–Planck equation governing the probability density of the ensemble of states, $f(\mathbf{x},t)$, is [5,6]

$$\frac{\partial f(\mathbf{x},t)}{\partial t} = -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\left[a_i(\mathbf{x},t)\,f(\mathbf{x},t)\right] + \frac{1}{2}\sum_{i,j=1}^{d}\frac{\partial^2}{\partial x_i\,\partial x_j}\left[\sum_{k=1}^{m} B_{ik}(\mathbf{x},t)\,B_{kj}^T(\mathbf{x},t)\,f(\mathbf{x},t)\right]. \tag{14}$$

Itô SDEs are popular in mathematical statistics contexts, measurement and filtering theory, and finance, because of the ease with which expectations can be taken, and because of the associated martingale properties. In engineering contexts, when modeling physical processes, the Stratonovich interpretation of SDEs described below is more popular because standard rules of calculus can be used [7,8]. For this reason the Stratonovich interpretation is also popular in differential-geometric contexts, since moving between coordinate patches involves calculus operations. A Stratonovich SDE describing the exact same random process as the Itô SDE given above is written as

$$dx_i(t) = a_i^s(x_1(t),\ldots,x_d(t),t)\,dt + \sum_{j=1}^{m} B_{ij}(x_1(t),\ldots,x_d(t),t)\,\circledS\,dw_j(t) \quad\text{for}\quad i = 1,\ldots,d$$

where $\circledS$ is used to denote the Stratonovich interpretation of the SDE, which differs from the Itô version. These interpretations are interchangeable, with drift terms related as:

$$a_i(\mathbf{x},t) = a_i^s(\mathbf{x},t) + \frac{1}{2}\sum_{j=1}^{m}\sum_{k=1}^{d}\frac{\partial B_{ij}}{\partial x_k}\,B_{kj}. \tag{15}$$

This illustrates that there is a very simple way to interconvert between Itô and Stratonovich by either adding or subtracting the last term in (15).
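A small sketch of this interconversion is given below: the correction term in (15) is evaluated numerically with finite differences for a user-supplied $B(\mathbf{x})$. The function names and the example coefficient $B(x) = \sin x$ are hypothetical, chosen only to make the correction easy to verify by hand.

```python
import numpy as np

# Sketch of the interconversion (15) between Ito and Stratonovich drifts:
# a_i = a_i^s + (1/2) sum_{j,k} (dB_ij/dx_k) B_kj, with the derivative of
# the d x m matrix-valued function B taken by central differences.
def stratonovich_to_ito_drift(a_s, B, x, eps=1e-6):
    d = x.size
    Bx = B(x)                                # d x m
    corr = np.zeros(d)
    for k in range(d):                       # dB/dx_k by central differences
        dx = np.zeros(d); dx[k] = eps
        dB_k = (B(x + dx) - B(x - dx)) / (2 * eps)
        corr += 0.5 * dB_k @ Bx[k, :]        # (1/2) sum_j (dB_ij/dx_k) B_kj
    return a_s(x) + corr

# Example with d = m = 1, B(x) = [[sin x]]: correction = (1/2) B B' = (1/2) sin x cos x
a_s = lambda x: -x                           # hypothetical Stratonovich drift
B   = lambda x: np.array([[np.sin(x[0])]])
x0  = np.array([0.7])
print(stratonovich_to_ito_drift(a_s, B, x0),
      -x0[0] + 0.5*np.sin(x0[0])*np.cos(x0[0]))
```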
The Stratonovich form of the Fokker–Planck equation (FPE) is written as [5,6]:

$$\frac{\partial f}{\partial t} = -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\left(a_i^s f\right) + \frac{1}{2}\sum_{i,j=1}^{d}\frac{\partial}{\partial x_i}\left[\sum_{k=1}^{m} B_{ik}\,\frac{\partial}{\partial x_j}\left(B_{jk} f\right)\right]. \tag{16}$$

When $B$ is independent of the configuration variable $\mathbf{x}$, the Itô and Stratonovich versions of the FPE are always the same, as can be seen from (15), where the discrepancy between the two versions of the drift term vanishes as the partial derivatives of $B$ with respect to $\mathbf{x}$ vanish. This is a sufficient condition for Itô and Stratonovich SDEs to yield the same FPE, but it is not a necessary one, as will be seen in later examples. Similar equations hold for processes evolving on manifolds and Lie groups, as will be discussed later in the paper.
The main topic addressed here is the rate at which S ( t ) changes. This can be observed by substituting the solution of the Fokker–Planck equations into the definition of S ˙ in (6).

2.2. Rate of Entropy Production

Using the Stratonovich form of the FPE in (16), we arrive at the following theorem:
Theorem 1.
The rate of entropy production for $f(\mathbf{x},t)$ governed by (16) evolving freely on Euclidean space is positive and bounded from below by

$$\dot{S} \geq \frac{1}{2}\int_{\mathbb{R}^d} \mathrm{trace}\left[\frac{\left(B(\mathbf{x},t)^T\nabla f\right)\left(B(\mathbf{x},t)^T\nabla f\right)^T}{f}\right]d\mathbf{x} \tag{17}$$

when

$$\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\left(a_i^s - \sum_{k,j} B_{ik}\,\frac{\partial B_{jk}}{\partial x_j}\right) \geq 0 \tag{18}$$

with equality holding when

$$a_i^s - \sum_{k,j} B_{ik}\,\frac{\partial B_{jk}}{\partial x_j} = c_i \tag{19}$$

is constant for all values of $i$. In the case when $B$ is a constant and $D \doteq BB^T$, then

$$\dot{S} = \frac{1}{2}\,\mathrm{trace}\left[D\,F(t)\right] \tag{20}$$

where

$$F(t) = \int_{\mathbb{R}^d}\frac{(\nabla f)(\nabla f)^T}{f}\,d\mathbf{x}$$

is the Fisher information matrix of $f(\mathbf{x},t)$.
Proof. 
The $\partial f/\partial t$ term in

$$\dot{S} = -\int_{\mathbb{R}^d}\frac{\partial f}{\partial t}\,\log f\,d\mathbf{x}$$

can be expressed in terms of spatial derivatives by substituting in the FPE. This is written as two terms using integration by parts, with the surface terms at infinity vanishing due to the fact that the process evolves freely and the pdf must decay to zero at infinity. First,

$$\int_{\mathbb{R}^d}\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\left(a_i^s f\right)\log f\,d\mathbf{x} = -\int_{\mathbb{R}^d}\sum_{i=1}^{d} a_i^s\,\frac{\partial f}{\partial x_i}\,d\mathbf{x}.$$

Second,

$$-\int_{\mathbb{R}^d}\frac{1}{2}\sum_{i,j=1}^{d}\frac{\partial}{\partial x_i}\left[\sum_{k=1}^{m} B_{ik}\,\frac{\partial}{\partial x_j}\left(B_{jk} f\right)\right]\log f\,d\mathbf{x} = \frac{1}{2}\int_{\mathbb{R}^d}\frac{1}{f}\sum_{i,j=1}^{d}\sum_{k=1}^{m} B_{ik}\,\frac{\partial}{\partial x_j}\left(B_{jk} f\right)\frac{\partial f}{\partial x_i}\,d\mathbf{x}.$$

Expanding

$$\frac{\partial}{\partial x_j}\left(B_{jk} f\right) = \frac{\partial B_{jk}}{\partial x_j}\,f + B_{jk}\,\frac{\partial f}{\partial x_j}$$

and recollecting terms gives

$$\dot{S} = -\int_{\mathbb{R}^d}\sum_{i=1}^{d}\left(a_i^s - \sum_{j=1}^{d}\sum_{k=1}^{m} B_{ik}\,\frac{\partial B_{jk}}{\partial x_j}\right)\frac{\partial f}{\partial x_i}\,d\mathbf{x} + \frac{1}{2}\int_{\mathbb{R}^d}\frac{1}{f}\sum_{k=1}^{m}\left(\sum_{i=1}^{d} B_{ik}\,\frac{\partial f}{\partial x_i}\right)\left(\sum_{j=1}^{d} B_{jk}\,\frac{\partial f}{\partial x_j}\right)d\mathbf{x}.$$

The second integral is always nonnegative, as it is a positive semi-definite quadratic form, and can be written as:

$$\frac{1}{2}\,\mathrm{trace}\left[\int_{\mathbb{R}^d}\frac{\left(B(\mathbf{x},t)^T\nabla f\right)\left(B(\mathbf{x},t)^T\nabla f\right)^T}{f}\,d\mathbf{x}\right].$$

The first term can have either sign. However, if

$$a_i^s - \sum_{k,j} B_{ik}\,\frac{\partial B_{jk}}{\partial x_j} = c_i$$

is constant, then the first integral will vanish, and if by integration by parts the derivative in the first term is transferred over, the condition in the statement of the theorem will result, since $f \geq 0$. □
When (18) holds, it is clear that a looser lower bound than (17), akin to (20), can be obtained as:

$$\dot{S} \geq \frac{1}{2}\,\mathrm{trace}\left[D_0(t)\,F(t)\right] \tag{21}$$

by constructing a positive definite matrix $D_0(t) = D_0^T(t)$ such that the following matrix inequality:

$$D_0(t) \leq B(\mathbf{x},t)\,B^T(\mathbf{x},t) \quad \forall\,\mathbf{x}\in\mathbb{R}^d$$

is satisfied. The reason why (21) then holds is that the trace is linear, the trace of the product of positive semi-definite matrices is nonnegative, and both $F$ and $BB^T - D_0$ are positive semi-definite.
The condition (19) and result (20) are for diffusion processes with constant diffusion tensor and drift. Given initial conditions $f(\mathbf{x},0) = f_0(\mathbf{x})$, the solution $f(\mathbf{x},t)$ in this case will be of the form:

$$f(\mathbf{x},t) = \left(f_0 * \rho_{t\mathbf{a},\,tBB^T}\right)(\mathbf{x}) \tag{22}$$

where $\rho_{\boldsymbol{\mu},\Sigma}(\mathbf{x})$ is a multivariate Gaussian with mean $\boldsymbol{\mu}$ and covariance $\Sigma$. The convolution of any two functions $f_1, f_2 \in (L^1\cap L^2)(\mathbb{R}^d)$ is defined as:

$$(f_1 * f_2)(\mathbf{x}) \doteq \int_{\mathbb{R}^d} f_1(\mathbf{y})\,f_2(\mathbf{x}-\mathbf{y})\,d\mathbf{y}.$$
Fisher information plays an important part in probability theory [9,10] and its connections to physics also have been recognized [11,12,13]. For a recent review of its properties see [14].
Several inequalities from information theory can then be used to bound both entropy and entropy rate by quantities that are easily computable. For example, it is known that [15,16,17]:

$$\frac{1}{\mathrm{tr}\left[F(f_1 * f_2)\,P\right]} \geq \frac{1}{\mathrm{tr}\left[F(f_1)\,P\right]} + \frac{1}{\mathrm{tr}\left[F(f_2)\,P\right]} \tag{24}$$

where $P$ is any real positive definite symmetric matrix with the same dimensions as $F$. When $P = D$ this can then be used to give a lower bound on $\mathrm{tr}[FD]$, and hence on $\dot{S}$. Moreover, one reason for the significance of (24) in information theory is that it provides a path for proving the entropy power inequality [15,16] described below.

The entropy power of a pdf $f(\mathbf{x})$ on $\mathbb{R}^d$ was defined by Shannon as [4,18]:

$$N(f) \doteq \frac{\exp\left(2\,S(f)/d\right)}{2\pi e}$$

where $S(f)$ denotes the entropy of $f$. The entropy power inequality is:

$$N(f_1 * f_2) \geq N(f_1) + N(f_2). \tag{25}$$

Since the logarithm is a strictly increasing function, this provides a lower bound on $S(f_1 * f_2)$, and hence can be used to bound the entropy of $f(\mathbf{x},t)$ of the form in (22). In Section 3 lower bounds on $\dot{S}$ will be derived.
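As a numerical illustration (with assumed covariances), the snippet below evaluates both sides of (25) for two Gaussians on $\mathbb{R}^2$, for which the convolution is again a Gaussian with summed covariances. Equality in (25) holds for Gaussians with proportional covariances; the covariances chosen below are not proportional, so the inequality is strict.

```python
import numpy as np

# Entropy power inequality (25) for two Gaussians (assumed covariances).
d = 2
Sig1 = np.diag([1.0, 0.5])
Sig2 = np.diag([0.3, 2.0])

def entropy(Sigma):                 # Eq. (11)
    return np.log((2*np.pi*np.e)**(d/2) * np.sqrt(np.linalg.det(Sigma)))

def power(S):                       # N(f) = exp(2 S / d) / (2 pi e)
    return np.exp(2*S/d) / (2*np.pi*np.e)

N1, N2 = power(entropy(Sig1)), power(entropy(Sig2))
N12 = power(entropy(Sig1 + Sig2))   # convolution of Gaussians sums covariances
print(N12, N1 + N2, N12 >= N1 + N2) # strict inequality here
```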
It should be noted that (24) and (25) only apply for convolution on Euclidean spaces, and do not even apply for diffusion processes on the circle. In contrast, other bounds on non-Euclidean spaces, and for processes that are not necessarily homogeneous diffusions, are presented later in this paper.

2.3. Examples

2.3.1. Brownian Motion in Euclidean Spaces

Brownian motion in $d$-dimensional Euclidean space with Dirac delta initial conditions was already reviewed, with pdf (10) and entropy (11). From this, the rate of entropy production can be computed explicitly as:

$$\dot{S} = \frac{(d/2)\,(2\pi e)^{d/2}\,|D|^{1/2}\,t^{d/2-1}}{(2\pi e)^{d/2}\,|D|^{1/2}\,t^{d/2}} = \frac{d}{2t}.$$

The version of the Fisher information matrix in the theorem for a Gaussian is the inverse of the covariance, and hence:

$$F = (Dt)^{-1} = t^{-1}D^{-1}.$$

Consequently, (20) gives:

$$\dot{S} = \frac{1}{2}\,\mathrm{trace}\left(t^{-1}\,\mathbb{I}\right) = \frac{d}{2t}.$$

2.3.2. Brownian Motion on the Torus/Circle

The stochastic differential equation

$$d\theta = \sqrt{D}\,dw$$

with constant scalar $D$ describing Brownian motion on the unit circle has an associated Fokker–Planck equation

$$\frac{\partial f}{\partial t} = \frac{1}{2}\,D\,\frac{\partial^2 f}{\partial\theta^2}$$

which is the same as in the case on the line. However, the identification $\theta(\pi) = \theta(-\pi)$ is imposed rather than free boundary conditions.

Let:

$$\rho(x,t) \doteq \frac{1}{\sqrt{2\pi Dt}}\,e^{-\frac{x^2}{2Dt}},$$

which is the solution of the heat equation on the real line subject to initial conditions $\rho(x,0) = \delta(x)$.

The solution of the heat equation on the circle subject to initial condition $f(\theta,0) = \delta(\theta)$ is then [3,5,6]

$$f(\theta,t) = \sum_{k=-\infty}^{\infty}\rho(\theta - 2\pi k;\,t) = \frac{1}{2\pi} + \frac{1}{\pi}\sum_{n=1}^{\infty} e^{-Dtn^2/2}\,\cos n\theta. \tag{26}$$

The above two equalities represent two very different ways of describing the solution. In the first equality, the Gaussian solution presented in the previous section is wrapped (or folded) around the circle. The second is a Fourier series solution. The first is efficient when $Dt$ is small; in such cases retaining only the $k = 0$ term may be sufficient as an approximation. When $Dt$ is large, truncating the Fourier expansion at $n = 1$ may be sufficient.
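The following sketch (with assumed values of $D$ and $t$) compares the two forms of (26) numerically, confirming that the truncated wrapped-Gaussian sum and the truncated Fourier series agree to machine precision at moderate $Dt$.

```python
import numpy as np

# Compare the two forms of (26): the wrapped-Gaussian sum and the Fourier
# series give the same f(theta, t). Truncation levels are illustrative.
D, t = 1.0, 0.5
theta = np.linspace(-np.pi, np.pi, 9)

def rho(x):                                   # heat kernel on the line
    return np.exp(-x**2/(2*D*t)) / np.sqrt(2*np.pi*D*t)

f_wrap = sum(rho(theta - 2*np.pi*k) for k in range(-20, 21))
f_fourier = 1/(2*np.pi) + sum(np.exp(-D*t*n**2/2)*np.cos(n*theta)/np.pi
                              for n in range(1, 50))
print(np.max(np.abs(f_wrap - f_fourier)))     # ~ machine precision
```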
Statistical quantities such as the mean and variance can be computed in closed form as:

$$\mu(t) = \int_{-\pi}^{\pi}\theta\,f(\theta,t)\,d\theta = 0$$

and

$$\sigma^2(t) = \int_{-\pi}^{\pi}\theta^2\,f(\theta,t)\,d\theta = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\,e^{-Dtn^2/2}.$$

In both expressions in (26), summations are present, which makes the exact analytical computation of logarithms, and hence entropy, problematic. However, since the function $\Phi(x) = -\log x$ is monotonically decreasing, when $a, b > 0$ we have $\Phi(a+b) < \Phi(a)$. Consequently, since $\rho > 0$ everywhere,

$$S(t) \leq -\int_{-\pi}^{\pi} f(\theta,t)\,\log\rho(\theta,t)\,d\theta = \frac{1}{2}\log(2\pi Dt) + \frac{\sigma^2(t)}{2Dt}. \tag{27}$$

In the extreme case when the distribution reaches equilibrium, $f(\theta) = 1/(2\pi)$, entropy attains the maximum value possible, and hence:

$$S(t) \leq S_\infty = \log(2\pi). \tag{28}$$

It should be noted that all calculations here are done relative to the measure $d\theta$. For a compact space like the circle, it is common to normalize the measure so that:

$$V = \int_V 1\,dV = 1.$$

In doing so for the heat kernel on the circle, this would involve redefining $dV = d\theta/2\pi$ and $f(\theta,t)\to 2\pi f(\theta,t)$. This has no effect on the mean and covariance, but the value of entropy is shifted such that the entropy of the uniform distribution is equal to zero and the entropy of all other distributions is negative. Rewriting (28) in a way that does not depend on the choice of normalization of measure gives:

$$S(t) \leq \log V.$$

In other words, the value of entropy depends not only on the base of the logarithm, but also on the way that the integration measure is scaled.

2.3.3. Concentration of Species Transport in Inhomogeneous Compressible Flow

The concentration $c(x,t)$ of a species in inhomogeneous compressible flow can be modeled by the equation [19,20,21]:

$$\frac{\partial c}{\partial t} = \frac{\partial}{\partial x}\left[D(x,t)\,\frac{\partial c}{\partial x}\right] - \frac{\partial}{\partial x}\left[u(x,t)\,c\right] \tag{30}$$

where $D(x,t) = D_0(1+\kappa_0 x)^2$ and $u(x,t) = u_0(1+\kappa_0 x)$. This is an FPE, and it is possible to work backwards to find the corresponding SDE.

We see immediately from the above partial differential equation (PDE) that if $c(x,0)$ is normalized to be a probability density function, then $c(x,t)$ will preserve this property, because integrating both sides over $x$ gives that the right side is zero, and the time derivative on the left side commutes with the integral over $x$.

Equation (30) represents a one-dimensional example of what was presented in the theorem, and the rate of entropy increase is:

$$\dot{S} = \int_{-\infty}^{\infty}\frac{1}{c}\,D(x,t)\left(\frac{\partial c}{\partial x}\right)^2 dx + u_0\kappa_0$$

since the drift term in the entropy rate computation simplifies as:

$$-\int_{-\infty}^{\infty} u\,\frac{\partial c}{\partial x}\,dx = \int_{-\infty}^{\infty} c\,\frac{\partial u}{\partial x}\,dx = u_0\kappa_0\int_{-\infty}^{\infty} c(x,t)\,dx = u_0\kappa_0.$$

2.3.4. Homogeneous Transport in Couette Flow

As another example from classical fluid mechanics, consider 2D homogeneous transport in Couette flow governed by the equation [22,23]:

$$\frac{\partial c}{\partial t} = D_0\left(\frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial y^2}\right) - \frac{U_0}{H}\,y\,\frac{\partial c}{\partial x}$$

where $y\in[0,H]$ and again $c(\mathbf{x},0)$ is normalized to be a probability density function, and hence $c(\mathbf{x},t)$ retains this property. In this case the region over which the equation holds is an infinite slab, with the concentration and its gradient vanishing at $y = 0$ and $y = H$.

We then arrive at:

$$\dot{S} = D_0\int_{-\infty}^{\infty}\int_{0}^{H}\frac{1}{c}\left[\left(\frac{\partial c}{\partial x}\right)^2 + \left(\frac{\partial c}{\partial y}\right)^2\right]dy\,dx + \boldsymbol{\mu}\cdot\mathbf{e}_2$$

where

$$\boldsymbol{\mu} = \int_{-\infty}^{\infty}\int_{0}^{H}\mathbf{x}\,c(\mathbf{x},t)\,dy\,dx$$

is the mean of $c(\mathbf{x},t)$.
The pdf for concentration in both of the above examples can be solved in closed form using the methods in [23] in which inhomogeneous processes on Euclidean space are recast as homogeneous processes on an appropriately chosen Lie group. However, the purpose of these examples is to illustrate the relationship between entropy rate and Fisher information. In the next section, a classical result of information theory is used to bound the Fisher information with covariance. Covariance can be propagated without explicit knowledge of the pdf, which will be demonstrated.

3. Bounding Rate of Entropy Production with Covariance

The version of the Fisher information matrix that appears in entropy rate computations is in general not easy to compute. An exception to this statement is when the pdf is Gaussian, in which case

$$F_{\mathrm{gauss}} = \Sigma_{\mathrm{gauss}}^{-1}.$$

For other exponential families, closed-form expressions are also possible. However, in general the time-varying Fisher information matrix for a pdf satisfying a Fokker–Planck equation will not be easy to compute.
However, it is possible to bound the Fisher information matrix with the covariance of the pdf, and covariance can be propagated as an ordinary differential equation even when no explicit solution for the time-varying pdf is known.
In this section, the Cramér-Rao Bound [24,25], a famous inequality in Euclidean statistics and information theory, is reviewed. Recently it has been extended to manifolds and Lie groups [26,27,28,29,30,31]. A second kind of inequality for compact spaces such as tori, spheres, and rotation groups was introduced in [32]. Then examples where covariance is propagated directly from the FPE are given in which the aforementioned inequalities can be put to use in bounding the rate of entropy generation. An outline of the general procedure is given here.
Let $\boldsymbol{\mu}$ and $\Sigma$ respectively denote the mean and covariance of a pdf $f(\mathbf{x};\boldsymbol{\mu},\Sigma)$. Then:

$$\boldsymbol{\mu} \doteq \int_{\mathbb{R}^n}\mathbf{x}\,f(\mathbf{x};\boldsymbol{\mu},\Sigma)\,d\mathbf{x},$$

or equivalently

$$\int_{\mathbb{R}^n}(\mathbf{x}-\boldsymbol{\mu})\,f(\mathbf{x};\boldsymbol{\mu},\Sigma)\,d\mathbf{x} = \mathbf{0}, \tag{31}$$

and

$$\Sigma \doteq \int_{\mathbb{R}^n}(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T f(\mathbf{x};\boldsymbol{\mu},\Sigma)\,d\mathbf{x}.$$

When presented with an FPE of the form:

$$\frac{\partial f}{\partial t} = \mathcal{D}f,$$

where $\mathcal{D}$ is the operator on the right-hand side of (16), ordinary differential equations describing the evolution of $\boldsymbol{\mu}(t)$ and $\Sigma(t)$ can be defined as:

$$\dot{\boldsymbol{\mu}} = \int_{\mathbb{R}^n}\mathbf{x}\,(\mathcal{D}f)\,d\mathbf{x}$$

and

$$\dot{\Sigma} = \int_{\mathbb{R}^n}(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T(\mathcal{D}f)\,d\mathbf{x},$$

where integration by parts can be used in some cases to obtain explicit closed-form expressions for the integrals on the right-hand side of both equations. Once $\Sigma(t)$ is obtained, it can be used to bound entropy from above using the Gaussian entropy formula (11) together with (9), and the rate of entropy production from below using the results given in the next section.

3.1. The Cramér-Rao Bound

The Cramér-Rao Bound (or CRB) is a way to bound the covariance of an estimated statistical quantity [24,25]. Here it will not be used in its most general form; it will only be used for the unbiased estimation of the mean of a probability density function on $\mathbb{R}^n$.

In the standard derivation of the CRB, as given in [18,33] for the case of estimation of the mean, the gradient of this expression with respect to $\boldsymbol{\mu}$ is computed, where $\partial f/\partial\boldsymbol{\mu}$ denotes the gradient as a column vector, and $\partial f/\partial\boldsymbol{\mu}^T \doteq [\partial f/\partial\boldsymbol{\mu}]^T$.
Differentiation of both sides of (31) with respect to $\boldsymbol{\mu}^T$ gives:

$$\frac{\partial}{\partial\boldsymbol{\mu}^T}\int_{\mathbb{R}^n}[\mathbf{x}-\boldsymbol{\mu}]\,f\,d\mathbf{x} = \int_{\mathbb{R}^n}[\mathbf{x}-\boldsymbol{\mu}]\,\frac{\partial f}{\partial\boldsymbol{\mu}^T}\,d\mathbf{x} - \mathbb{I} = \mathbb{O}.$$

Here the derivative is taken under the integral. The product rule for differentiation is then used, together with the fact that $f$ is a pdf in $\mathbf{x}$. $\mathbb{O}$ denotes the $m\times m$ zero matrix resulting from computing $\partial\mathbf{0}/\partial\boldsymbol{\mu}^T$. That is, since the zero vector is a constant quantity, each partial derivative with respect to $\mu_i$ is zero as well, resulting in an array of zero vectors, $\mathbb{O}$.

The above equation can be written as [18,24,25]

$$\mathbb{I} = \int_{\mathbb{R}^n}\mathbf{a}(\mathbf{x},\boldsymbol{\mu})\,\mathbf{b}^T(\mathbf{x},\boldsymbol{\mu})\,d\mathbf{x} \;\in\; \mathbb{R}^{p\times m} \tag{32}$$

where

$$\mathbf{a}(\mathbf{x},\boldsymbol{\mu}) = [f]^{\frac{1}{2}}\,[\mathbf{x}-\boldsymbol{\mu}]$$

and

$$\mathbf{b}(\mathbf{x},\boldsymbol{\mu}) = [f]^{-\frac{1}{2}}\,\frac{\partial f}{\partial\boldsymbol{\mu}}.$$

Using the fact that $f(\mathbf{x};\boldsymbol{\mu},\Sigma) = f(\mathbf{x}-\boldsymbol{\mu};\mathbf{0},\Sigma)$ means that:

$$\mathbf{b}(\mathbf{x},\boldsymbol{\mu}) = -[f]^{-\frac{1}{2}}\,\frac{\partial f}{\partial\mathbf{x}}.$$

Then it becomes clear that:

$$F = \int_{\mathbb{R}^n}\mathbf{b}(\mathbf{x},\boldsymbol{\mu})\,[\mathbf{b}(\mathbf{x},\boldsymbol{\mu})]^T\,d\mathbf{x}$$

and

$$\Sigma = \int_{\mathbb{R}^n}\mathbf{a}(\mathbf{x},\boldsymbol{\mu})\,[\mathbf{a}(\mathbf{x},\boldsymbol{\mu})]^T\,d\mathbf{x}.$$
Following the logic in [33], two arbitrary vectors are introduced: $\mathbf{v}\in\mathbb{R}^p$ and $\mathbf{w}\in\mathbb{R}^m$. Then (32) is multiplied on the left by $\mathbf{v}^T$ and on the right by $\mathbf{w}$ to give:

$$\mathbf{v}^T\,\mathbb{I}\,\mathbf{w} = \int_{\mathbb{R}^n}\mathbf{v}^T\mathbf{a}(\mathbf{x},\boldsymbol{\mu})\,\mathbf{b}^T(\mathbf{x},\boldsymbol{\mu})\,\mathbf{w}\,d\mathbf{x}.$$

Regrouping terms in the resulting expression, squaring, and using the Cauchy–Schwarz inequality then gives [24,25,33]:

$$\left(\int_{\mathbb{R}^n}\mathbf{v}^T(\mathbf{a}\,\mathbf{b}^T)\,\mathbf{w}\,d\mathbf{x}\right)^2 = \left(\int_{\mathbb{R}^n}(\mathbf{v}^T\mathbf{a})(\mathbf{b}^T\mathbf{w})\,d\mathbf{x}\right)^2 \leq \left(\int_{\mathbb{R}^n}(\mathbf{v}^T\mathbf{a})^2\,d\mathbf{x}\right)\left(\int_{\mathbb{R}^n}(\mathbf{w}^T\mathbf{b})^2\,d\mathbf{x}\right) = \left(\int_{\mathbb{R}^n}\mathbf{v}^T\mathbf{a}\mathbf{a}^T\mathbf{v}\,d\mathbf{x}\right)\left(\int_{\mathbb{R}^n}\mathbf{w}^T\mathbf{b}\mathbf{b}^T\mathbf{w}\,d\mathbf{x}\right).$$

From the Equations (33)–(35), this can be written as:

$$(\mathbf{v}^T\,\mathbb{I}\,\mathbf{w})^2 \leq (\mathbf{v}^T\Sigma\,\mathbf{v})(\mathbf{w}^TF\,\mathbf{w}).$$

Making the choice $\mathbf{w} = F^{-1}\mathbf{v}$ yields:

$$\left(\mathbf{v}^TF^{-1}\mathbf{v}\right)^2 \leq (\mathbf{v}^T\Sigma\,\mathbf{v})\left(\mathbf{v}^TF^{-1}\mathbf{v}\right).$$

This simplifies to:

$$\mathbf{v}^T\left(\Sigma - F^{-1}\right)\mathbf{v} \geq 0 \quad\text{for arbitrary}\quad \mathbf{v}\in\mathbb{R}^n.$$

Consequently, the term in parentheses is a positive semi-definite matrix, or, as a matrix inequality [18,24,25,33]:

$$\Sigma \geq F^{-1},$$

which is the famous Cramér-Rao Bound (for the special case of an unbiased estimator of the mean). This is equivalently:

$$\Sigma^{-1} \leq F.$$

Then, for example, in all of the equations for entropy production in time-varying pdfs on Euclidean space presented earlier, it is possible to bound from below using a cascade of inequalities such as:

$$\mathrm{tr}[DF] \geq \lambda_{\min}(D)\,\mathrm{tr}[F] \geq \lambda_{\min}(D)\,\mathrm{tr}\left[\Sigma^{-1}\right].$$

3.2. An Example

Returning to the example of species transport in a compressible 1D flow outlined in Section 2.3.3, this section illustrates how the entropy rate can be bounded from below using the CRB even when a closed-form solution for the pdf is not known.
From the FPE itself, it is possible to propagate the mean and covariance. Multiplying both sides of (30) by $x$ and integrating by parts gives the following ordinary differential equation (ODE) for the mean $\mu(t)$:

$$\dot{\mu} = (2D_0\kappa_0 + u_0)(1 + \kappa_0\mu)$$

subject to initial conditions $\mu(0) = \mu_0$. This ODE can be solved in closed form for $\mu(t)$. However, even if it could not be, it could be solved by numerical integration, which is much easier than solving the FPE. Similarly, since:

$$\sigma^2 = \int_{-\infty}^{\infty}(x-\mu)^2\,c(x,t)\,dx = -\mu^2 + \int_{-\infty}^{\infty}x^2\,c(x,t)\,dx,$$

multiplying (30) by $x^2$ and integrating by parts gives a way to propagate the covariance with an ODE of the form:

$$\frac{d}{dt}\left(\sigma^2\right) = F(\mu,\sigma^2),$$

which can be solved either analytically or numerically subject to initial conditions $\sigma^2(0) = \sigma_0^2$.
It is worth noting that even in cases where such propagation of moments by the FPE is not possible (for example, when higher moments creep into the equations so that they do not close), it is still possible to numerically generate a large ensemble of sample paths from the SDE corresponding to the FPE and compute the variance (or covariance in multi-dof systems). Covariance estimation is much more stable than pdf estimation, and so using the CRB as a lower bound is more reliable than directly attempting to compute entropy, entropy rate, or Fisher information when the pdf is not known explicitly.
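The sketch below illustrates this sample-path alternative for the species-transport example: the Itô SDE matching (30) has drift $u + \partial D/\partial x$ and noise coefficient $\sqrt{2D}$ (obtained by rewriting (30) in the Itô form (14)), and the ensemble variance follows directly from the simulated paths. All parameter values are assumed.

```python
import numpy as np

# Simulate the Ito SDE corresponding to (30): dx = [u(x) + D'(x)] dt
# + sqrt(2 D(x)) dw, and estimate mean and variance from the ensemble.
rng = np.random.default_rng(2)
D0, k0, u0 = 0.1, 0.2, 0.5
dt, n_steps, n_paths = 1e-3, 5000, 50_000

def Dfun(x):   return D0 * (1 + k0*x)**2
def Dprime(x): return 2*D0*k0*(1 + k0*x)
def u(x):      return u0 * (1 + k0*x)

x = np.zeros(n_paths)
for _ in range(n_steps):
    x += (u(x) + Dprime(x))*dt + np.sqrt(2*Dfun(x)*dt)*rng.standard_normal(n_paths)

print(x.mean(), x.var())   # the variance feeds the CRB-based bound on S-dot
```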

4. Classical Statistical Mechanics as Stochastic Mechanics

Classical statistical mechanics, as developed by Boltzmann, Maxwell, and Gibbs, states that entropy increases. For an introduction to phase space and equilibrium statistical mechanics see [34]. Nonequilibrium statistical mechanics has been studied extensively over a long period of time, starting with Boltzmann, and is summarized in a number of books including [35,36,37,38,39]. Important results continue to be developed in modern times, e.g., [40]. An alternative to the classical Boltzmann–Gibbs formulation is stochastic mechanics [41,42,43,44]. The difference is that in Boltzmann's original formulation of statistical mechanics the model describing collisions between gas molecules was deterministic. At the beginning of the twentieth century, Einstein's formulation of Brownian motion also did not explicitly model random forces, though Langevin's did. In all of those early works on Brownian motion there was no concept of a Wiener process or of Itô or Stratonovich stochastic calculus; these are mid-twentieth-century constructs that came after. Consequently, revisiting results in statistical mechanics using more modern stochastic modeling techniques sheds light on some old problems and provides a basis for building connections between statistical mechanics, stochastic modeling, and information theory.
Here a Hamiltonian formulation of stochastic mechanics is used. The Hamiltonian of a mechanical system is defined as the total system energy written in terms of the conjugate momenta $\mathbf{p}$ and generalized coordinates $\mathbf{q}$:

$$H(\mathbf{p},\mathbf{q}) \doteq \frac{1}{2}\,\mathbf{p}^TM^{-1}(\mathbf{q})\,\mathbf{p} + V(\mathbf{q}).$$

Here $M(\mathbf{q})$ is the configuration-dependent mass matrix and

$$\mathbf{p} \doteq M(\mathbf{q})\,\dot{\mathbf{q}}.$$
The beauty of the Hamiltonian formulation is that the volume in phase space (i.e., the joint p - q space) is invariant under coordinate changes.

4.1. Properties of Phase Space

As is well known and explained in [5,6], if $\mathbf{q}$ and $\mathbf{q}'$ are two different sets of coordinates, then kinetic energy is expressed as:

$$T = \frac{1}{2}\,\dot{\mathbf{q}}^TM(\mathbf{q})\,\dot{\mathbf{q}} = \frac{1}{2}\,\dot{\mathbf{q}}'^TM'(\mathbf{q}')\,\dot{\mathbf{q}}'.$$

Then, with the Jacobian relating rates of change,

$$\dot{\mathbf{q}} = J(\mathbf{q}')\,\dot{\mathbf{q}}',$$

it is clear that:

$$M'(\mathbf{q}') = J^T(\mathbf{q}')\,M(\mathbf{q}(\mathbf{q}'))\,J(\mathbf{q}').$$

From the above, and the definition of conjugate momenta, $p_i \doteq \partial T/\partial\dot{q}_i$,

$$\mathbf{p}' = J^T(\mathbf{q}')\,\mathbf{p}. \tag{40}$$

Therefore, the two phase spaces have volume elements that are related as:

$$d\mathbf{p}\,d\mathbf{q} = \left|\det\begin{pmatrix} J^{-T}(\mathbf{q}') & \partial\left(J^{-T}(\mathbf{q}')\,\mathbf{p}'\right)/\partial\mathbf{q}'^T \\ \mathbb{O} & J(\mathbf{q}') \end{pmatrix}\right|\,d\mathbf{p}'\,d\mathbf{q}'.$$

The determinant of the upper-triangular block matrix in the above equation is equal to 1, illustrating the invariance:

$$d\mathbf{p}\,d\mathbf{q} = d\mathbf{p}'\,d\mathbf{q}'. \tag{41}$$

The key to this result is how $\mathbf{p}$ transforms in (40). A similar result holds in the Lie group setting, wherein the cotangent bundle of a Lie group can be endowed with an operation making it unimodular even when the underlying group is not [45]. This is analogous to the reason why (4) requires the metric tensor weighting and is coordinate dependent, while (41) is not.

4.2. Hamilton’s Equations for a System Forced by External Noise and Damping

Hamilton’s equations of motion are:
d p i d t = H q i + F i
d q i d t = H p i .
where F i are generalized external forces. In the case in which the mechanical system is forced by noise and viscous damping, then after multiplication by d t , these equations of motion become:
d p i = 1 2 p T M q i p d t V q i d t e i T C M 1 p d t + e i T B d w
and
d q i = e i T M 1 p d t
where e i is the i t h natural unit basis vector. Note that the configuration-dependant mass matrix M = M ( q ) , noise matrix B = B ( q ) , and damping C = C ( q ) appear prominently in these equations.
Equations (44) and (45) can be written together as:
d q d p = α ( p , q ) γ ( p , q ) d t + O O O B ( q ) d w d w .
(Here d w multiplies zeros and hence is inconsequential).
The vector-valued function α and γ are defined by their entries:
α i e i T M 1 p γ i 1 2 p T M q i p V q i e i T C M 1 p .
The Fokker–Planck equation corresponding to (46), which together with an initial distribution $f(\mathbf{q},\mathbf{p},0) = f_0(\mathbf{q},\mathbf{p})$ defines the family of time-evolving pdfs $f(\mathbf{q},\mathbf{p};t)$, is

$$\frac{\partial f}{\partial t} + \sum_{i=1}^{n}\frac{\partial}{\partial q_i}\left(\alpha_i f\right) + \sum_{i=1}^{n}\frac{\partial}{\partial p_i}\left(\gamma_i f\right) - \frac{1}{2}\sum_{k=1}^{n}\sum_{i,j=1}^{n}\frac{\partial^2}{\partial p_i\,\partial p_j}\left(b_{ik}b_{kj}^T f\right) = 0 \tag{47}$$

where $b_{ij} = \mathbf{e}_i^TB\mathbf{e}_j$ is the $i,j^{th}$ entry of $B$.

Note that for any mechanical system with inertia the diffusion is the same regardless of Itô or Stratonovich interpretation, as:

$$\frac{\partial^2}{\partial p_i\,\partial p_j}\left(b_{ik}b_{kj}^T f\right) = \frac{\partial}{\partial p_i}\left[b_{ik}\,\frac{\partial}{\partial p_j}\left(b_{kj}^T f\right)\right] = b_{ik}b_{kj}^T\,\frac{\partial^2 f}{\partial p_i\,\partial p_j}.$$

That is, even though $B$ is configuration dependent, the structure of the FPE in the case of mechanical systems with inertia places partial derivatives with respect to momenta in the diffusion terms, and such partial derivatives pass through the configuration-dependent $B$ matrix. For this reason, in mechanical systems with inertia it does not matter whether Itô or Stratonovich interpretations of SDEs are used. This freedom allows the modeler to take the best of both worlds. However, when approximations are made in modeling the initial equations of motion, such as assuming that the inertia is negligible, then the above Hamiltonian formulation no longer applies and one must be very careful as to the interpretation of the SDE as Itô or Stratonovich.

4.3. The Boltzmann Distribution

The Boltzmann distribution is defined as:

$$f_\infty(\mathbf{q},\mathbf{p}) \doteq \frac{1}{Z}\,\exp\left(-\beta\,H(\mathbf{p},\mathbf{q})\right) \tag{48}$$

where $\beta \doteq 1/(k_BT)$, with $k_B$ denoting Boltzmann's constant and $T$ the temperature measured in Kelvin.

The partition function is defined as:

$$Z = \int_{\mathbf{q}}\int_{\mathbf{p}}\exp\left(-\beta\,H(\mathbf{p},\mathbf{q})\right)d\mathbf{p}\,d\mathbf{q}.$$

The reason for using the subscript $\infty$ in defining (48) is the following theorem.
Theorem 2.
If $\mathbf{q}\in\mathbb{R}^n$ globally parameterizes the configuration manifold of a mechanical system, then the solution of the Fokker–Planck Equation (47) will satisfy

$$\lim_{t\to\infty} f(\mathbf{p},\mathbf{q},t) = f_\infty(\mathbf{p},\mathbf{q})$$

if and only if

$$C = \frac{\beta}{2}\,BB^T. \tag{50}$$
Proof. 
We begin by noting that (47) can be simplified a bit. First,

$$\frac{\partial}{\partial q_i}\left(\alpha_i f\right) = \frac{\partial\alpha_i}{\partial q_i}\,f + \alpha_i\,\frac{\partial f}{\partial q_i}$$

and

$$\frac{\partial}{\partial p_i}\left(\gamma_i f\right) = \frac{\partial\gamma_i}{\partial p_i}\,f + \gamma_i\,\frac{\partial f}{\partial p_i}.$$

It is not difficult to show that:

$$\sum_{i=1}^{n}\left(\frac{\partial\alpha_i}{\partial q_i} + \frac{\partial\gamma_i}{\partial p_i}\right) = -\mathrm{tr}\left(CM^{-1}\right).$$

Using this, and considering the equilibrium condition when $\partial f_\infty/\partial t = 0$, then reduces (47) to

$$-\mathrm{tr}\left(CM^{-1}\right)f_\infty + \sum_{i=1}^{n}\left(\alpha_i\,\frac{\partial f_\infty}{\partial q_i} + \gamma_i\,\frac{\partial f_\infty}{\partial p_i}\right) - \frac{1}{2}\sum_{i,j=1}^{n}\frac{\partial^2}{\partial p_i\,\partial p_j}\left[(BB^T)_{ij}\,f_\infty\right] = 0, \tag{51}$$

where the substitution

$$(BB^T)_{ij} = \sum_{k=1}^{n} b_{ik}b_{kj}^T$$

has been made.

Note that:

$$\frac{\partial f_\infty}{\partial q_i} = -\beta\left(\frac{\partial V}{\partial q_i} + \frac{1}{2}\,\mathbf{p}^T\frac{\partial M^{-1}}{\partial q_i}\,\mathbf{p}\right)f_\infty,$$

$$\frac{\partial f_\infty}{\partial p_i} = -\beta\,\mathbf{e}_i^TM^{-1}\mathbf{p}\,f_\infty = -\beta\,\alpha_i\,f_\infty,$$

and hence significant cancellation results in:

$$\sum_{i=1}^{n}\left(\alpha_i\,\frac{\partial f_\infty}{\partial q_i} + \gamma_i\,\frac{\partial f_\infty}{\partial p_i}\right) = \beta\,\boldsymbol{\alpha}^TC\boldsymbol{\alpha}\,f_\infty.$$

Moreover,

$$\frac{\partial^2 f_\infty}{\partial p_i\,\partial p_j} = \left(-\beta\,m_{ij}^{-1} + \beta^2\,\alpha_i\alpha_j\right)f_\infty.$$

Substituting into (51) therefore gives:

$$-\mathrm{tr}\left(CM^{-1}\right) + \beta\,\boldsymbol{\alpha}^TC\boldsymbol{\alpha} + \frac{\beta}{2}\,\mathrm{tr}\left(M^{-1}BB^T\right) - \frac{\beta^2}{2}\,\boldsymbol{\alpha}^TBB^T\boldsymbol{\alpha} = 0.$$

This shows that $f_\infty(\mathbf{p},\mathbf{q})$ is in fact a solution of (51) if (50) holds. The necessary conditions for the above equality to hold boil down to the necessary conditions for the two independent statements:

$$\mathrm{tr}\left[\left(C - \frac{\beta}{2}\,BB^T\right)M^{-1}\right] = 0$$

and

$$\boldsymbol{\alpha}^T\left(C - \frac{\beta}{2}\,BB^T\right)\boldsymbol{\alpha} = 0$$

to hold. The independence of these follows from the fact that some terms depend on $\boldsymbol{\alpha}$ and others do not, and the main equality must hold for all values of $\boldsymbol{\alpha}$.

The only way that both of the above can hold is if $C - \frac{\beta}{2}BB^T$ is skew-symmetric. However, damping matrices, like stiffness and mass matrices, are symmetric, as is $BB^T$. Hence (50) must hold. □
Note that the necessary and sufficient condition in (50) for the Boltzmann distribution to be the equilibrium/stationary solution holds even when $B$ and $C$ are dependent on $\mathbf{q}$. As such, this is a generalization of the fluctuation-dissipation theorem (which is stated for particles) to the case of complex mechanical systems that can be modeled as a collection of rigid bodies (e.g., biological macromolecules).

4.4. Marginal Densities and the Conundrum as Mass Becomes Zero

Marginal densities of $f(\mathbf{p},\mathbf{q},t)$ can be defined as:

$$f(\mathbf{p},t) \doteq \int_{\mathbf{q}} f(\mathbf{p},\mathbf{q},t)\,|\det M(\mathbf{q})|^{1/2}\,d\mathbf{q}$$

and

$$f(\mathbf{q},t) \doteq |\det M(\mathbf{q})|^{-1/2}\int_{\mathbf{p}} f(\mathbf{p},\mathbf{q},t)\,d\mathbf{p},$$

which is consistent with (3).

In the equilibrium case it is always possible to compute $f_\infty(\mathbf{q})$ in closed form as:

$$f_\infty(\mathbf{q}) = \frac{1}{Z_c}\,e^{-\beta V(\mathbf{q})} \tag{52}$$

where

$$Z_c = \int_{\mathbf{q}} e^{-\beta V(\mathbf{q})}\,|\det M(\mathbf{q})|^{1/2}\,d\mathbf{q}$$

is the configurational partition function. Then,

$$\int_{\mathbf{q}} f_\infty(\mathbf{q})\,|\det M(\mathbf{q})|^{1/2}\,d\mathbf{q} = 1.$$

In contrast, in general $f_\infty(\mathbf{p})$ can only be computed easily in closed form when $M(\mathbf{q}) = M_0$ is constant. In this case:

$$f_\infty(\mathbf{p}) = \frac{\beta^{d/2}}{(2\pi)^{d/2}\,|\det M_0|^{1/2}}\,\exp\left(-\frac{\beta}{2}\,\mathbf{p}^TM_0^{-1}\mathbf{p}\right) \tag{53}$$

is a Gaussian distribution in the momenta.
Though (53) degenerates as the inertia goes to zero, it does so gracefully, since both $|\det M|^{1/2}$ and $Z_c$ approach zero in the same way as the system mass goes to zero. We can then use it as the baseline truth against which to compare approximations in which inertia is neglected. For example, consider the spring-mass-damper with noise:

$$m\ddot{x} + c(x)\,\dot{x} + kx = b(x)\,n(t)$$

where $c(x)$ and $b(x)$ are nonlinear functions satisfying the condition $2c(x) = \beta\,b(x)^2$. If $n\,dt = dw$, then as $m\to 0$ we have a conundrum unless $c = c_0$ and $b = b_0$ are constant. Namely, which of the following interpretations is correct?

$$dx_1 = -k\,c(x_1)^{-1}\,x_1\,dt + 2\beta^{-1}\,b(x_1)^{-1}\,dw$$

or

$$dx_2 = -k\,c(x_2)^{-1}\,x_2\,dt + 2\beta^{-1}\,b(x_2)^{-1}\,\circledS\,dw\,?$$

It did not matter when there was inertia, as both gave the same FPE in that case, but making the approximation that the mass is zero creates a situation where a choice now must be made.

The answer can be informed by comparing the corresponding pdfs that solve the associated Fokker–Planck equations, $f_1(x,t)$ and $f_2(x,t)$, with $f_\infty(x)$ in (52). Short of that, we can examine the behavior of the mean as a function of time, and the behavior of the equilibrium distributions as compared with (52).
The Fokker–Planck equations corresponding to the above SDEs are, respectively:

$$\frac{\partial f_1}{\partial t} = k\,\frac{\partial}{\partial x}\left(c^{-1}x\,f_1\right) + \frac{2}{\beta^2}\,\frac{\partial^2}{\partial x^2}\left(b^{-2}f_1\right)$$

and

$$\frac{\partial f_2}{\partial t} = k\,\frac{\partial}{\partial x}\left(c^{-1}x\,f_2\right) + \frac{2}{\beta^2}\,\frac{\partial}{\partial x}\left[b^{-1}\,\frac{\partial}{\partial x}\left(b^{-1}f_2\right)\right].$$

Expanding and considering equilibrium conditions gives

$$\Delta_1 = k\,\frac{\partial}{\partial x}\left(c^{-1}x\,f_1\right) + \frac{2}{\beta^2}\,\frac{\partial}{\partial x}\left[-2\,b^{-3}\,\frac{\partial b}{\partial x}\,f_1 + b^{-2}\,\frac{\partial f_1}{\partial x}\right]$$

and

$$\Delta_2 = k\,\frac{\partial}{\partial x}\left(c^{-1}x\,f_2\right) + \frac{2}{\beta^2}\,\frac{\partial}{\partial x}\left[-b^{-3}\,\frac{\partial b}{\partial x}\,f_2 + b^{-2}\,\frac{\partial f_2}{\partial x}\right]$$

where an exact solution would give $\Delta_i = 0$.

The exact configurational marginal from the Hamiltonian formulation is

$$f_\infty(x) = \left(\frac{\beta k}{2\pi}\right)^{\frac{1}{2}} e^{-\beta kx^2/2},$$

and it has the property

$$\frac{\partial f_\infty}{\partial x} = -\beta k\,x\,f_\infty.$$

Substituting into the above, and observing that

$$k\,\frac{\partial}{\partial x}\left(c^{-1}x\,f_\infty\right) + \frac{2}{\beta^2}\,\frac{\partial}{\partial x}\left(b^{-2}\,\frac{\partial f_\infty}{\partial x}\right) = 0$$

due to the relationship between $b$ and $c$, then

$$\Delta_1 = -\frac{4}{\beta^2}\,\frac{\partial}{\partial x}\left(b^{-3}\,\frac{\partial b}{\partial x}\,f_\infty\right) = 2\,\Delta_2.$$

This means that neither interpretation gives the true answer at equilibrium, but the magnitude of the discrepancy in the Stratonovich model is half that of the Itô model. For this reason, unless modeling systems in phase space, or unless there are physical grounds for choosing a particular SDE (e.g., working backwards from Fick's law), it is safest to consider diffusions with constant diffusion tensors, as will be the case throughout the remainder of this paper.
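As a numerical companion to this comparison (with a hypothetical $b(x)$ and assumed parameters), the sketch below integrates the zero-mass SDE under the Itô interpretation (Euler-Maruyama) and under the Stratonovich interpretation (a Heun-type midpoint treatment of the noise), and compares the resulting long-time variances with the exact configurational value $1/(\beta k)$ implied by $f_\infty(x)$ above.

```python
import numpy as np

# Ito (Euler-Maruyama) vs. Stratonovich (Heun) integration of the zero-mass
# SDE with state-dependent b(x); parameters and b(x) are assumed examples.
rng = np.random.default_rng(3)
beta, k = 1.0, 1.0
b = lambda x: 1.0 + 0.5*np.tanh(x)       # hypothetical noise function
c = lambda x: 0.5*beta*b(x)**2           # enforces 2 c(x) = beta b(x)^2

drift = lambda x: -k*x/c(x)
noise = lambda x: 2.0/(beta*b(x))
dt, n_steps, n_paths = 1e-3, 10_000, 10_000

x_ito = np.zeros(n_paths)
x_str = np.zeros(n_paths)
for _ in range(n_steps):
    dw = np.sqrt(dt)*rng.standard_normal(n_paths)
    x_ito += drift(x_ito)*dt + noise(x_ito)*dw
    # Heun step: average the noise coefficient with its predictor value
    x_pred = x_str + drift(x_str)*dt + noise(x_str)*dw
    x_str += drift(x_str)*dt + 0.5*(noise(x_str) + noise(x_pred))*dw

print(x_ito.var(), x_str.var(), 1.0/(beta*k))   # neither matches exactly
```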

5. Stochastic Systems on Unimodular Lie Groups

A stochastic mechanical system that has a Lie group as its configuration space can be studied in a coordinate-free way [5,6]. These systems can be purely kinematic, or can have inertia. Concrete examples are used here to illustrate, and then general theorems are provided to quantify the rate of entropy production. Different connections between Lie groups and thermodynamics than what is presented here have been made in the literature [46,47,48,49].

5.1. Review of Unimodular Matrix Lie Groups with S O ( 3 ) and S E ( 2 ) as Examples

The use of geometric (and particularly Lie-theoretic) methods in the control of mechanical systems and robots has been studied extensively over the past half century [50,51,52,53,54]. The material and notation in this section summarizes more in-depth treatments in [3,5,6].
A matrix Lie group is a group with elements that are matrices, for which group multiplication is matrix multiplication, and for which the underlying space is an analytic manifold with the operations of group multiplication and inversion of elements being analytic also. Intuitively, matrix Lie groups are continuous families of invertible matrices with structure that is preserved under multiplication and inversion. The dimension of a matrix Lie group is the dimension of its manifold, not the dimension of the square matrices describing its elements.
For example, the group of rigid-body displacements in the Euclidean plane, $SE(2)$, can be described with elements of the form:

$$g(x,y,\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{pmatrix}. \tag{54}$$

The dimension is 3 because there are three free parameters, $(x,y,\theta)$. This group is not compact, as $x$ and $y$ can take values on the real line.
The group of pure rotations in 3D can be described by the rotation matrices:

$$SO(3) \doteq \left\{ R \in \mathbb{R}^{3\times 3} \;\middle|\; RR^T = \mathbb{I},\; \det R = +1 \right\}.$$

$SO(3)$ is a compact 3-dimensional manifold. Again, the fact that the dimension of the matrices is also 3 is coincidental.
A unimodular Lie group is defined by the property that a measure $dg$ can be constructed such that the integral over the group has the property that

$$\int_G f(g)\,dg = \int_G f(g_0\circ g)\,dg = \int_G f(g\circ g_0)\,dg \tag{55}$$

for any fixed $g_0\in G$ and any function $f\in L^1(G)$. It can also be shown that, as a consequence of (55):

$$\int_G f(g)\,dg = \int_G f(g^{-1})\,dg.$$

These properties are natural generalizations of those familiar to us for functions on Euclidean space.

As we are primarily concerned with probability density functions, for which

$$\int_G f(g)\,dg = 1,$$

these clearly meet the condition of being in $L^1(G)$.

In the case of $SO(3)$, the bi-invariant measure expressed in terms of $ZXZ$ Euler angles $(\alpha,\beta,\gamma)$ is $dR = \sin\beta\,d\alpha\,d\beta\,d\gamma$. In the case of $SE(2)$, the bi-invariant measure is $dg = dx\,dy\,d\theta$.
The convolution of probability density functions on a unimodular Lie group is a natural operation, and is defined as:

$$(f_1 * f_2)(g) \doteq \int_G f_1(h)\,f_2(h^{-1}\circ g)\,dh.$$
The convolution of two probability density functions is again a probability density.
In addition to being natural spaces over which to integrate probability density functions, natural concepts of directional derivatives of functions exist in the matrix Lie group setting. This builds on the fact that associated with every matrix Lie group is a matrix Lie algebra.
In the case of $SO(3)$, the Lie algebra consists of $3\times 3$ skew-symmetric matrices of the form

$$X = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix} = \sum_{i=1}^{3} x_iE_i.$$

The matrices $\{E_i\}$ form a basis for the set of $3\times 3$ skew-symmetric matrices. The coefficients $\{x_i\}$ are all real. The notation relating the matrix $X$ and the vector $\mathbf{x} = [x_1,x_2,x_3]^T$ is [51,52]

$$\mathbf{x} = X^{\vee} \quad\text{and}\quad X = \hat{\mathbf{x}}. \tag{59}$$

This is equivalent to identifying $E_i$ with $\mathbf{e}_i$.

For $SE(2)$ the basis elements are different, and are of the form

$$E_1' = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad E_2' = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}; \quad E_3' = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Every element of the Lie algebra associated with $SE(2)$ can be written as a linear combination of these, and the notation (59) is still used to identify these matrices with natural unit basis vectors $\mathbf{e}_i\in\mathbb{R}^3$. For example, $X' = x_1E_1' + x_2E_2' + x_3E_3'$. (Here the primes are used so as not to confuse the Lie algebra elements for $SE(2)$ with those for $SO(3)$, but when working with a single Lie group and Lie algebra the primes will be dropped, as in the discussion below, which is for the generic case.)
For an arbitrary unimodular matrix Lie group, a natural concept of directional derivative is

$$(\tilde{X}f)(g) \doteq \left.\frac{d}{dt}\,f\left(g\circ\exp(tX)\right)\right|_{t=0}. \tag{60}$$

Here the argument of the function $f$ is read as the product of $g$ and $\exp(tX)$, which are each in $G$, as is their product. If $X = \sum_i x_iE_i$ for constants $\{x_i\}$, this derivative has the property

$$(\tilde{X}f)(g) = \sum_i x_i\,(\tilde{E}_if)(g).$$

Such derivatives appear in invariant statements of Fokker–Planck equations on unimodular Lie groups. Moreover, these derivatives can be used together with integration to state results such as integration by parts:

$$\int_G f_1(g)\,(\tilde{E}_if_2)(g)\,dg = -\int_G f_2(g)\,(\tilde{E}_if_1)(g)\,dg.$$

There are no surface terms because either the group is infinite in its extent (and so the functions must decay to zero at the boundaries), or it is compact (in which case the functions must match values when arriving from different directions), or both, as for a group such as $SE(2)$.
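A finite-difference sketch of the directional derivative (60) on $SE(2)$ is given below; the test function $f$ and the evaluation point are hypothetical, and `scipy.linalg.expm` is used for the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

# Numerical version of the right Lie derivative (60) on SE(2):
# (E_i~ f)(g) = d/dt f(g exp(t E_i)) at t = 0, by central differences.
E1 = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0.]])
E3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0.]])

def f(g):                       # any smooth function of (x, y, theta)
    x, y = g[0, 2], g[1, 2]
    th = np.arctan2(g[1, 0], g[0, 0])
    return np.exp(-(x**2 + y**2)) * np.cos(th)

def lie_deriv(f, g, E, t=1e-6):
    return (f(g @ expm(t*E)) - f(g @ expm(-t*E))) / (2*t)

g = expm(0.3*E3) @ expm(0.5*E1)          # some group element
print(lie_deriv(f, g, E1), lie_deriv(f, g, E3))
```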

5.2. The Noisy Kinematic Cart

The stochastic kinematic cart has been studied extensively in the robotics literature [5,6,55,56,57,58]. In this model (which is like a motor-driven wheelchair) the two wheels each have radius $r$, and the wheelbase (distance between wheels) is denoted as $L$. The nonholonomic equations of motion are

$$\begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{pmatrix} = \begin{pmatrix} \frac{r}{2}\cos\theta & \frac{r}{2}\cos\theta \\ \frac{r}{2}\sin\theta & \frac{r}{2}\sin\theta \\ \frac{r}{L} & -\frac{r}{L} \end{pmatrix} \begin{pmatrix} \dot{\phi}_1 \\ \dot{\phi}_2 \end{pmatrix}. \tag{61}$$

When the wheel rates consist of a constant deterministic part and a stochastic part, then

$$d\phi_1 = \omega\,dt + \sqrt{D}\,dw_1$$

$$d\phi_2 = \omega\,dt + \sqrt{D}\,dw_2 \tag{63}$$

and multiplying (61) by $dt$ and substituting in (63) results in an SDE. This is an example where it does not matter whether the SDE is of Itô or Stratonovich type, even though $B$ is not constant. The corresponding Fokker–Planck equation for the probability density function $f(x,y,\theta;t)$ with respect to the measure $dx\,dy\,d\theta$ is [58]:

$$\frac{\partial f}{\partial t} = -r\omega\cos\theta\,\frac{\partial f}{\partial x} - r\omega\sin\theta\,\frac{\partial f}{\partial y} + \frac{D}{2}\left[\frac{r^2}{2}\cos^2\theta\,\frac{\partial^2 f}{\partial x^2} + \frac{r^2}{2}\sin 2\theta\,\frac{\partial^2 f}{\partial x\,\partial y} + \frac{r^2}{2}\sin^2\theta\,\frac{\partial^2 f}{\partial y^2} + \frac{2r^2}{L^2}\,\frac{\partial^2 f}{\partial\theta^2}\right], \tag{64}$$

which is subject to the initial conditions $f(x,y,\theta;0) = \delta(x)\,\delta(y)\,\delta(\theta)$.
The coordinates $(x,y,\theta)$ that define the position and orientation of the cart relative to the world frame are really parameterizing the group of rigid-body motions of the plane, $SE(2)$. Each element of this unimodular Lie group can be described as a homogeneous transformation matrix of the form in (54), in which case the group law is matrix multiplication.

Then (61) can be written in coordinate-free notation as:

$$\left(g^{-1}\,\frac{dg}{dt}\right)^{\vee} = A\,\dot{\boldsymbol{\phi}} \quad\text{where}\quad A = \frac{r}{2}\begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 2/L & -2/L \end{pmatrix}.$$

Here the notation $\vee$ is used as in [3,5,6,51], in analogy with (59), but for the case of $SE(2)$ rather than $SO(3)$.

The coordinate-free version of the above Fokker–Planck equation can be written compactly in terms of these Lie derivatives as [58]:

$$\frac{\partial f}{\partial t} = -r\omega\,\tilde{E}_1f + \frac{r^2D}{4}\,\tilde{E}_1^2f + \frac{r^2D}{L^2}\,\tilde{E}_3^2f \tag{65}$$

with initial conditions $f(g;0) = \delta(g)$. The resulting time-evolving pdf is denoted as $f(g;t)$ with respect to the natural bi-invariant integration measure for $SE(2)$, which is $dg = dx\,dy\,d\theta$. Solutions of (65) can be obtained in different regimes (small $Dt$ and large $Dt$) either using Lie-group Gaussian distributions or Lie-group Fourier expansions, as in [58,59]. That is not the goal here. Instead, the purpose of this example is to provide a concrete case for the derivations that follow regarding the rate of entropy production.
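For concreteness, a minimal Euler-Maruyama simulation of the cart SDE obtained from (61) and (63) is sketched below with assumed values of $r$, $L$, $\omega$, and $D$; histograms of the resulting ensemble of poses approximate the density that (64) propagates.

```python
import numpy as np

# Euler-Maruyama sketch of the noisy kinematic cart (61)-(63); all
# parameter values are assumed for illustration.
rng = np.random.default_rng(4)
r, L, omega, D = 0.1, 0.3, 2.0, 0.5
dt, n_steps, n_paths = 1e-3, 2000, 10_000

x = np.zeros(n_paths); y = np.zeros(n_paths); th = np.zeros(n_paths)
for _ in range(n_steps):
    dphi1 = omega*dt + np.sqrt(D*dt)*rng.standard_normal(n_paths)
    dphi2 = omega*dt + np.sqrt(D*dt)*rng.standard_normal(n_paths)
    x  += 0.5*r*np.cos(th)*(dphi1 + dphi2)
    y  += 0.5*r*np.sin(th)*(dphi1 + dphi2)
    th += (r/L)*(dphi1 - dphi2)

# Sample mean pose; histograms of (x, y, th) approximate f(x, y, theta; t)
print(x.mean(), y.mean(), th.mean())
```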
It should be noted that degenerate diffusions on $SE(2)$ occur not only in this problem, but also in models of the visual cortex [60,61,62,63,64,65]. Phase noise is a problem in coherent optical communication systems that has been identified in the literature [66,67,68,69,70]. The Fokker–Planck equations describing phase noise have been developed and solved using various methods [71,72,73]. Remarkably, these FPEs are of the same kind as those for the kinematic cart, inpainting, visual cortex modeling, etc. Moreover, the natural extension of (64) and (65) to $SE(3)$ has found applications in modeling DNA (as reviewed in [3,5,6]) and flexible steerable needles for robotic surgery [57,74,75,76].

5.3. Rotational Brownian Motion

Starting with Perrin [77], various efforts at mathematical modeling of rotational Brownian motion have been undertaken over the past century [78,79,80,81,82]. These include both inertial and noninertial theories. A major application is in the spectroscopy of macromolecules [83,84]. Essentially the same mathematics is applicable to modeling the time-evolving uncertainty in mechanical gyroscopes [85].
Brownian motion on Riemannian manifolds and Lie groups also has been studied over a long period of time in the mathematics literature [86,87,88,89,90,91], with the rotation group and three-sphere being two very popular objects [92,93,94]. In addition to forcing by white noise, forcing by Lévy processes (white noise with jumps) has also been investigated [95].

5.3.1. Inertial Theory

Euler’s equation of motion for a rotating rigid body subjected to an external potential, noise, and damping can be written as:
I 0 d ω + ω × ( I 0 ω ) d t = ( E ˜ V ) ( R ) d t C 0 ω d t + B 0 d w
where
( E ˜ V ) ( R ) = ( E ˜ 1 V ) ( R ) ( E ˜ 2 V ) ( R ) ( E ˜ 3 V ) ( R ) .
Here ω is the body-fixed description of angular velocity, which is related to a time-evolving rotation matrix (using the hat notation in (59)) as:
R ˙ = R ω ^ .
The moment of inertia matrix, I 0 , damping matrix, C 0 , and noise matrix B 0 are all constant. Equations (66) and (67) define a stochastic process evolving on the tangent bundle of S O ( 3 ) .
This can be re-written using angular momentum, $\boldsymbol{\pi} = I_0\boldsymbol{\omega}$, as

$$d\boldsymbol{\pi} = \boldsymbol{\pi}\times(I_0^{-1}\boldsymbol{\pi})\,dt - (\tilde{E}V)(R)\,dt - C_0I_0^{-1}\boldsymbol{\pi}\,dt + B_0\,d\mathbf{w}. \tag{68}$$

Equations (68) and

$$\dot{R} = R\,\left(\widehat{I_0^{-1}\boldsymbol{\pi}}\right) \tag{69}$$

define a stochastic process on the cotangent bundle of $SO(3)$.
Note that $\mathbf{p}\neq\boldsymbol{\pi}$. To see this, expand angular velocity and kinetic energy in coordinates as $\boldsymbol{\omega} = J(\mathbf{q})\,\dot{\mathbf{q}}$ and

$$T = \frac{1}{2}\,\dot{\mathbf{q}}^TJ(\mathbf{q})^TI_0J(\mathbf{q})\,\dot{\mathbf{q}}.$$

Consequently $M(\mathbf{q}) = J(\mathbf{q})^TI_0J(\mathbf{q})$ and $\mathbf{p} = M(\mathbf{q})\,\dot{\mathbf{q}}$. In contrast, $\boldsymbol{\pi} = I_0J(\mathbf{q})\,\dot{\mathbf{q}}$. Therefore, in order to use the general results from statistical mechanics, the interconversion

$$\mathbf{p} = J(\mathbf{q})^T\boldsymbol{\pi}$$

must be done. Moreover, $C_0$ in the above equations is not the same as $C$ in the Hamiltonian formulation. A Rayleigh dissipation function will be of the form

$$\mathcal{R} = \frac{1}{2}\,\boldsymbol{\omega}^TC_0\,\boldsymbol{\omega} = \frac{1}{2}\,\dot{\mathbf{q}}^TJ(\mathbf{q})^TC_0J(\mathbf{q})\,\dot{\mathbf{q}},$$

indicating that $C(\mathbf{q}) = J(\mathbf{q})^TC_0J(\mathbf{q})$. Then, converting (68) to the Hamiltonian form, the viscous and noise terms become:

$$J(\mathbf{q})^T\left[-C_0I_0^{-1}\boldsymbol{\pi}\,dt + B_0\,d\mathbf{w}\right] = -J(\mathbf{q})^TC_0I_0^{-1}I_0J(\mathbf{q})\,d\mathbf{q} + J(\mathbf{q})^TB_0\,d\mathbf{w}.$$

If $C(\mathbf{q}) = J(\mathbf{q})^TC_0J(\mathbf{q})$ and $B(\mathbf{q}) = J(\mathbf{q})^TB_0$, and if $J$ is invertible, then the condition in (50) becomes completely equivalent to:

$$C_0 = \frac{\beta}{2}\,B_0B_0^T.$$

The structure of the $C_0$ matrix for a rigid body is a function of its shape. For example, the viscous drag on an ellipsoid was characterized in [96]. Given $C_0$, it is possible to define $B_0 = \sqrt{2\beta^{-1}}\,C_0^{1/2}$.
When I = I and V = 0 (68) becomes
d π = β 2 B 0 B 0 T π d t + B 0 d w .
This is an Ornstein–Uhlenbeck process, and the corresponding Fokker–Planck equation can be solved for $f(\pi,t)$ in closed form as a time-varying Gaussian when the initial condition is $f(\pi,0) = \delta(\pi)$. The equilibrium solution is the Boltzmann distribution
$$ f(\pi) = c(\beta)\, \exp\!\left(-\frac{\beta}{2}\, \|\pi\|^2\right) $$
where $c(\beta)$ is the usual normalizing constant for a Gaussian distribution.
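The following sketch (with assumed values of $\beta$ and $B_0$, and an Euler–Maruyama discretization) simulates an ensemble of this Ornstein–Uhlenbeck process from the delta initial condition and checks that the sample covariance equilibrates to the Boltzmann value $\beta^{-1}\mathbb{I}$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, B0 = 2.0, 0.7 * np.eye(3)            # assumed parameters
A = 0.5 * beta * B0 @ B0.T                 # drift matrix (beta/2) B0 B0^T

dt, n_steps, n_samples = 1e-2, 5_000, 20_000
pi = np.zeros((n_samples, 3))              # f(pi, 0) = delta(pi)
for _ in range(n_steps):
    dw = np.sqrt(dt) * rng.standard_normal((n_samples, 3))
    pi += -pi @ A.T * dt + dw @ B0.T

print("sample covariance:\n", np.cov(pi.T))
print("Boltzmann covariance (1/beta) I, with 1/beta =", 1.0 / beta)
```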

5.3.2. Noninertial Theory

When the inertia is negligible, as in the case of rotational Brownian motion of molecules, (69) and (66) give
$$ \omega\, dt = B_1\, dw \quad\text{where}\quad B_1 = \frac{2}{\beta}\, B_0^{-T}. \tag{71} $$
This can be expressed in coordinates as a Stratonovich equation:
$$ \dot{q} = J^{-1}(q)\, B_1\, w, $$
or it can be kept in the invariant form (71). The corresponding Fokker–Planck equation is of the form
$$ \frac{\partial f}{\partial t} = \frac{1}{2}\sum_{i,j=1}^{3} D_{ij}\, \tilde{E}_i \tilde{E}_j f, $$
where $D = B_1 B_1^T$ and each $\tilde{E}_i$ is as in (60) with $X = E_i$.
The short-time solution of this equation subject to Dirac delta initial conditions is the Gaussian in exponential coordinates. Hence, for short times, entropy and entropy rate can be computed in closed form using the results from the Euclidean case.
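A sketch of this short-time behavior (with an assumed $B_1$ and horizon, not from the paper): simulate the kinematic process by composing small random rotations, and compare the sample covariance of the exponential coordinates $\mathrm{vee}(\log R)$ with the prediction $B_1 B_1^T\, t$ that follows directly from the SDE.

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def vee(M):
    """Inverse of hat: extract the vector from a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

rng = np.random.default_rng(2)
B1 = np.diag([0.3, 0.4, 0.5])              # assumed noise matrix
dt, t_final, n_samples = 1e-3, 0.05, 1_000

xs = []
for _ in range(n_samples):
    R = np.eye(3)
    for _ in range(int(t_final / dt)):
        dw = np.sqrt(dt) * rng.standard_normal(3)
        R = R @ expm(hat(B1 @ dw))         # compose small random rotations
    xs.append(np.real(vee(logm(R))))       # exponential coordinates of R

xs = np.array(xs)
print("sample covariance:\n", np.cov(xs.T))
print("short-time prediction B1 B1^T t:\n", B1 @ B1.T * t_final)
```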
A special case within the noninertial theory is that of isotropic diffusion. Let
$$ \nabla^2 \,\doteq\, \tilde{E}_1^2 + \tilde{E}_2^2 + \tilde{E}_3^2. $$
An isotropic driftless diffusion on $SO(3)$ is one of the form
$$ \frac{\partial f}{\partial t} = K\, \nabla^2 f. $$
The heat kernel for $SO(3)$ is the solution of this equation subject to the initial condition $f(R,0) = \delta(R)$.
Rotation matrices can be expressed in terms of the axis and angle of rotation using Euler's formula:
$$ R(\theta,\nu,\lambda) = \exp[\theta\, \hat{n}(\nu,\lambda)] = \mathbb{I} + \sin\theta\; \hat{n}(\nu,\lambda) + (1-\cos\theta)\, [\hat{n}(\nu,\lambda)]^2 $$
where $\hat{n}(\nu,\lambda)$ is the skew-symmetric matrix such that, for an arbitrary vector $v \in \mathbb{R}^3$ and the vector cross product $\times$,
$$ \hat{n}(\nu,\lambda)\, v = n(\nu,\lambda) \times v $$
and
$$ n(\nu,\lambda) = \begin{bmatrix} \sin\nu \cos\lambda \\ \sin\nu \sin\lambda \\ \cos\nu \end{bmatrix}. $$
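A quick numerical check of this parameterization (with arbitrary test angles; not from the paper) confirms that the closed form agrees with the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def R_axis_angle(theta, nu, lam):
    """Euler's formula: R = I + sin(theta) n^ + (1 - cos(theta)) (n^)^2."""
    n = np.array([np.sin(nu) * np.cos(lam),
                  np.sin(nu) * np.sin(lam),
                  np.cos(nu)])
    N = hat(n)
    return np.eye(3) + np.sin(theta) * N + (1.0 - np.cos(theta)) * N @ N

theta, nu, lam = 1.2, 0.7, 2.1             # arbitrary test angles
n = np.array([np.sin(nu) * np.cos(lam), np.sin(nu) * np.sin(lam), np.cos(nu)])
print(np.allclose(R_axis_angle(theta, nu, lam), expm(theta * hat(n))))  # True
```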
There are several different ways to choose the ranges of these coordinates so as to fully parameterize $SO(3)$. One way is to view the coordinates as filling a solid ball of radius $\pi$, in which $\theta \in [0,\pi]$ serves as the radius and $\nu \in [0,\pi]$, $\lambda \in [0,2\pi)$ are the usual spherical angles. Another way is to let $\theta \in [0,2\pi)$ and cut the range of one of the other variables in half. For example, $\nu \in [0,\pi/2]$ and $\lambda \in [0,2\pi)$ restricts $n$ to the upper hemisphere, while $\nu \in [0,\pi]$ and $\lambda \in [0,\pi)$ would correspond to the western hemisphere (if the initial datum is chosen appropriately). In these hemispherical models, the great circle bounding the hemisphere is taken to be half open and half closed, so that no rotation is parameterized redundantly.
There are benefits to each of these parameterizations. For example, allowing the $[0,2\pi)$ range for $\theta$ reflects the fact that, for fixed $n$, rotation around that axis by $2\pi$ returns to the starting orientation; that is, the `little group' of rotations around $n$, which is isomorphic to $SO(2)$, is a `maximal torus' in $SO(3)$. Likewise, parameterizing the whole sphere of axes has value. The best of both worlds can therefore be achieved by double covering the rotations, expanding both ranges simultaneously. Moreover, each range $[0,2\pi)$ can be replaced with $[-\pi,\pi)$. Then, when performing integration, all that needs to be done is to divide by 2 afterwards.
Using these parameters (on the double cover), the integration measure $dR$ for which the volume of $SO(3)$ is normalized to 1 is
$$ dR = \frac{1}{4\pi^2}\, \sin^2(\theta/2)\, \sin\nu\; d\theta\, d\lambda\, d\nu. $$
When computing integrals,
$$ \begin{aligned} \int_{SO(3)} f(R)\, dR &= \frac{1}{2\pi^2} \int_{\nu=0}^{\pi}\!\int_{\lambda=0}^{2\pi}\!\int_{\theta=0}^{\pi} f(R(\theta,\nu,\lambda))\, \sin^2(\theta/2)\, \sin\nu\; d\theta\, d\lambda\, d\nu \\ &= \frac{1}{2\pi^2} \int_{\nu=0}^{\pi/2}\!\int_{\lambda=0}^{2\pi}\!\int_{\theta=-\pi}^{\pi} f(R(\theta,\nu,\lambda))\, \sin^2(\theta/2)\, \sin\nu\; d\theta\, d\lambda\, d\nu \\ &= \frac{1}{2\pi^2} \int_{\nu=0}^{\pi}\!\int_{\lambda=0}^{\pi}\!\int_{\theta=-\pi}^{\pi} f(R(\theta,\nu,\lambda))\, \sin^2(\theta/2)\, \sin\nu\; d\theta\, d\lambda\, d\nu. \end{aligned} $$
Doubling the range gives
$$ \int_{SO(3)} f(R)\, dR = \frac{1}{4\pi^2} \int_{\nu=0}^{\pi}\!\int_{\lambda=0}^{2\pi}\!\int_{\theta=-\pi}^{\pi} f(R(\theta,\nu,\lambda))\, \sin^2(\theta/2)\, \sin\nu\; d\theta\, d\lambda\, d\nu, $$
and when $f(R(\theta,\nu,\lambda)) = f(\theta) = f(-\theta)$,
$$ \int_{SO(3)} f(R)\, dR = \frac{2}{\pi} \int_{\theta=0}^{\pi} f(\theta)\, \sin^2(\theta/2)\, d\theta = \frac{1}{\pi} \int_{\theta=-\pi}^{\pi} f(\theta)\, \sin^2(\theta/2)\, d\theta. \tag{76} $$
All normalizations are such that:
$$ \int_{SO(3)} 1\, dR = 1. $$
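These normalizations are easy to verify by quadrature. The sketch below (grid sizes arbitrary) checks both the $\theta\in[0,\pi]$ form with prefactor $1/(2\pi^2)$ and the doubled-range form with prefactor $1/(4\pi^2)$:

```python
import numpy as np
from scipy.integrate import trapezoid

def so3_volume(theta_lo, theta_hi, prefactor, n=400):
    theta = np.linspace(theta_lo, theta_hi, n)
    nu = np.linspace(0.0, np.pi, n)
    T, N = np.meshgrid(theta, nu, indexing="ij")
    w = np.sin(T / 2.0) ** 2 * np.sin(N)          # lambda-independent weight
    inner = trapezoid(trapezoid(w, nu, axis=1), theta)
    return prefactor * 2.0 * np.pi * inner        # lambda integral gives 2*pi

print(so3_volume(0.0, np.pi, 1.0 / (2.0 * np.pi**2)))     # ~1.0
print(so3_volume(-np.pi, np.pi, 1.0 / (4.0 * np.pi**2)))  # ~1.0
```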
The Laplacian operator for $SO(3)$ in this axis-angle parameterization is [3,97,98]
$$ \nabla^2 = \frac{\partial^2}{\partial\theta^2} + \cot(\theta/2)\,\frac{\partial}{\partial\theta} + \frac{1}{4\sin^2(\theta/2)} \left[ \frac{\partial^2}{\partial\nu^2} + \cot\nu\,\frac{\partial}{\partial\nu} + \frac{1}{\sin^2\nu}\,\frac{\partial^2}{\partial\lambda^2} \right]. $$
It can be shown that the isotropic solution does not depend on $\nu$ or $\lambda$, and so all that needs to be solved is [99,100]
$$ \frac{\partial f}{\partial t} = K \left[ \frac{\partial^2 f}{\partial\theta^2} + \cot(\theta/2)\,\frac{\partial f}{\partial\theta} \right] \tag{78} $$
subject to the initial condition $f(R,0) = \delta(R)$.
A basis for all functions on $SO(3)$ that depend only on $\theta$ is the set $\{\chi_l(\theta)\,|\, l \in \mathbb{Z}_{\geq 0}\}$, where
$$ \chi_l(\theta) = \frac{\sin\!\left(\left(l+\tfrac{1}{2}\right)\theta\right)}{\sin\!\left(\tfrac{\theta}{2}\right)}. $$
These are eigenfunctions of the Laplacian:
$$ \nabla^2 \chi_l = -\,l(l+1)\, \chi_l. $$
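A finite-difference sanity check of this eigenfunction property (arbitrary $l$ and grid; accuracy limited only by the finite-difference error):

```python
import numpy as np

l = 3
theta = np.linspace(0.2, np.pi - 0.2, 2_000)
chi = np.sin((l + 0.5) * theta) / np.sin(theta / 2.0)
d1 = np.gradient(chi, theta)
d2 = np.gradient(d1, theta)
radial_laplacian = d2 + d1 / np.tan(theta / 2.0)   # radial part of (78)
residual = radial_laplacian + l * (l + 1) * chi
print(np.abs(residual[10:-10]).max())   # ~0 up to finite-difference error
```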
Consequently, the Fourier series solution of the isotropic heat equation on $SO(3)$ is known to be:
$$ f(R(\theta,\nu,\lambda),\, t) = \sum_{l=0}^{\infty} (2l+1)\, \chi_l(\theta)\, e^{-l(l+1)Kt} = \left(\sin\frac{\theta}{2}\right)^{-1} \sum_{l=0}^{\infty} (2l+1)\, \sin\!\left(\left(l+\tfrac{1}{2}\right)\theta\right) e^{-l(l+1)Kt}. $$
Note that:
$$ \lim_{t\to\infty} f(R(\theta,\nu,\lambda),\, t) = 1 $$
and
$$ \int_{SO(3)} f(R,t)\, dR = 1. $$
When $t = 0$, the above becomes the Fourier series for the Dirac delta function, $f(R,0) = \delta(R)$.
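The series is straightforward to evaluate numerically. The sketch below (with arbitrary values of $Kt$ and an arbitrary truncation order) checks by quadrature in $\theta$ that the solution remains normalized and flattens toward the uniform density 1 as $t$ grows:

```python
import numpy as np
from scipy.integrate import trapezoid

def heat_kernel(theta, Kt, l_max=80):
    """Truncated character series for the isotropic SO(3) heat kernel."""
    f = np.zeros_like(theta)
    for l in range(l_max + 1):
        chi = np.sin((l + 0.5) * theta) / np.sin(theta / 2.0)
        f += (2 * l + 1) * chi * np.exp(-l * (l + 1) * Kt)
    return f

theta = np.linspace(1e-6, np.pi, 2_000)        # avoid the removable point 0
w = (2.0 / np.pi) * np.sin(theta / 2.0) ** 2   # weight from the class-function integral

for Kt in (0.05, 0.5, 5.0):
    f = heat_kernel(theta, Kt)
    print(f"Kt={Kt}: integral={trapezoid(f * w, theta):.4f}, "
          f"max|f-1|={np.abs(f - 1.0).max():.2e}")
```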
As with the case of the heat equation on the circle, an alternative solution exists, analogous to a folded Gaussian. Denote this solution as $\rho(\theta,t)$, and
$$ f(R(\theta,\nu,\lambda),\, t) = e^{Kt/4} \left(\sin\frac{\theta}{2}\right)^{-1} \sum_{k\in\mathbb{Z}} \rho(\theta + 2\pi k,\, t). \tag{80} $$
This can be derived in two steps. First, let
$$ f(R(\theta,\nu,\lambda),\, t) = \left(\sin\frac{\theta}{2}\right)^{-1} h(\theta, t). $$
Substituting into (78) and simplifying gives
$$ \frac{\partial h}{\partial t} = K \left[ \frac{\partial^2 h}{\partial\theta^2} + \frac{1}{4}\, h \right]. $$
Then, substituting $h(\theta,t) = e^{at}\, q(\theta,t)$ and simplifying shows that when $a = K/4$,
$$ \frac{\partial q}{\partial t} = K\, \frac{\partial^2 q}{\partial\theta^2}. \tag{82} $$
The fundamental solution of this heat equation (the 1D heat kernel) is
$$ q(\theta, t) = \frac{1}{2\sqrt{\pi K t}}\, e^{-\theta^2/4Kt}. $$
However, this solution is not a satisfactory choice for $\rho(\theta,t)$, for two reasons. First, $SO(3)$ is a three-dimensional space, so the normalization is not correct: the normalizing factor should be proportional to $1/t^{3/2}$, yet arbitrarily changing the temporal dependence would cause the result to no longer be a solution of (82). Second, whereas division by $\sin(\theta/2)$ is harmless in the definition of $\chi_l(\theta)$, because zeros in the denominator are balanced by zeros in the numerator, that is not the case here.
Both problems can be solved at once by realizing that if $q(\theta,t)$ solves (82), then so too does $C\, \partial q/\partial\theta$ for an arbitrary constant $C$. Consequently, we take as a candidate solution [99,100]:
$$ \rho(\theta, t) = C\, \frac{1}{(\pi K t)^{3/2}}\; \theta\, e^{-\theta^2/4Kt}. $$
The normalizing factor $C$ is chosen so that (80) is a pdf; being a constant, it is independent of $t$. Choosing a relatively small value of $t$, the summation in (80) reduces to a single term, and (76) becomes
$$ \frac{1}{\pi}\, \frac{C\, e^{Kt/4}}{(\pi K t)^{3/2}} \int_{\theta=-\pi}^{\pi} \theta\, e^{-\theta^2/4Kt}\, \sin(\theta/2)\; d\theta = 1. $$
Moreover, for small $t$ the integral over $[-\pi,\pi]$ can be replaced with an integral over the whole real line. Consequently, since
$$ \int_{-\infty}^{\infty} \theta\, e^{-\theta^2/4Kt}\, \sin(\theta/2)\; d\theta = 2\sqrt{\pi}\, (Kt)^{3/2}\, e^{-Kt/4}, $$
it follows that
$$ C = \pi^2/2 $$
and
$$ f(R(\theta,\nu,\lambda),\, t) = \frac{\sqrt{\pi}}{2}\, \frac{e^{Kt/4}}{(Kt)^{3/2}} \left(\sin\frac{\theta}{2}\right)^{-1} \sum_{k\in\mathbb{Z}} (\theta + 2\pi k)\, e^{-(\theta+2\pi k)^2/4Kt}. $$
Since $f(R,t) > 0$ and integrates to 1, it is a valid pdf. What remains to be checked is that
$$ \lim_{t\to\infty} f(R,t) = 1 $$
and
$$ \lim_{t\to 0} f(R,t) = \delta(R). $$
If these conditions hold, then the solution is valid.
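Numerically, the folded form and the character series agree to near machine precision away from $\theta = 0$, which also corroborates the choice $C = \pi^2/2$. A comparison sketch (arbitrary $Kt$ and truncation orders):

```python
import numpy as np

def series_solution(theta, Kt, l_max=100):
    f = np.zeros_like(theta)
    for l in range(l_max + 1):
        f += ((2 * l + 1) * np.sin((l + 0.5) * theta)
              / np.sin(theta / 2.0) * np.exp(-l * (l + 1) * Kt))
    return f

def folded_solution(theta, Kt, k_max=20):
    pref = 0.5 * np.sqrt(np.pi) * np.exp(Kt / 4.0) / Kt**1.5
    s = np.zeros_like(theta)
    for k in range(-k_max, k_max + 1):
        x = theta + 2.0 * np.pi * k
        s += x * np.exp(-x**2 / (4.0 * Kt))
    return pref * s / np.sin(theta / 2.0)

theta = np.linspace(0.05, np.pi - 0.05, 50)
Kt = 0.1
print(np.abs(series_solution(theta, Kt) - folded_solution(theta, Kt)).max())
```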
This provides a way to bound the entropy, much as in the case of the circle. The next section makes exact statements that hold for all values of time.

5.4. Rate of Entropy Production under Diffusion on Unimodular Lie Groups

The entropy of a pdf on a Lie group is defined in (5). If $f(g,t)$ is a pdf that satisfies a diffusion equation, then some interesting properties of $S_f(t)$ that are independent of the initial conditions result. For example, if $\dot{S}_f = dS_f/dt$, then differentiating under the integral gives
$$ \dot{S}_f = -\int_G \left\{ \frac{\partial f}{\partial t}\, \log f + \frac{\partial f}{\partial t} \right\} dg. $$
Moreover, since $f$ is a pdf,
$$ \int_G \frac{\partial f}{\partial t}\, dg = \frac{d}{dt} \int_G f(g,t)\, dg = 0, $$
and so the second term in braces in the expression for $\dot{S}_f$ integrates to zero.
Substituting the diffusion equation
$$ \frac{\partial f}{\partial t} = \frac{1}{2} \sum_{i,j=1}^n D_{ij}\, \tilde{E}_i \tilde{E}_j f \;-\; \sum_{k=1}^n a_k\, \tilde{E}_k f \tag{85} $$
into the expression for $\dot{S}_f$ gives [5,6,31]
$$ \begin{aligned} \dot{S} &= -\int_G \left\{ \frac{1}{2}\sum_{i,j=1}^n D_{ij}\, \tilde{E}_i\tilde{E}_j f - \sum_{k=1}^n a_k\, \tilde{E}_k f \right\} \log f\; dg \\ &= -\frac{1}{2}\sum_{i,j=1}^n D_{ij} \int_G (\tilde{E}_i\tilde{E}_j f)\, \log f\; dg \;+\; \sum_{k=1}^n a_k \int_G (\tilde{E}_k f)\, \log f\; dg \\ &= \frac{1}{2}\sum_{i,j=1}^n D_{ij} \int_G (\tilde{E}_j f)(\tilde{E}_i \log f)\; dg \;-\; \sum_{k=1}^n a_k \int_G f\, (\tilde{E}_k \log f)\; dg \\ &= \frac{1}{2}\sum_{i,j=1}^n D_{ij} \int_G \frac{1}{f}\, (\tilde{E}_j f)(\tilde{E}_i f)\; dg \;-\; \sum_{k=1}^n a_k \int_G \tilde{E}_k f\; dg \\ &= \frac{1}{2}\sum_{i,j=1}^n D_{ij} \int_G \frac{1}{f}\, (\tilde{E}_j f)(\tilde{E}_i f)\; dg \;=\; \frac{1}{2}\, \mathrm{tr}[DF], \end{aligned} $$
where $F = [F_{ij}]$ is the Lie-group version of the Fisher information matrix, with entries
$$ F_{ij} \,\doteq\, \int_G \frac{1}{f}\, (\tilde{E}_j f)(\tilde{E}_i f)\; dg. $$
Consequently,
$$ \frac{1}{2}\, \lambda_{\min}(D)\, \mathrm{tr}[F] \;\leq\; \dot{S} \;\leq\; \frac{1}{2}\, \lambda_{\max}(D)\, \mathrm{tr}[F]. $$
The above result is an extension of one presented in [5,6,31].
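As a concrete numerical check (a sketch, not from the paper), consider the isotropic diffusion $f_t = K\nabla^2 f$ on $SO(3)$, which in the convention of (85) corresponds to $D = 2K\,\mathbb{I}$, so that $\dot{S} = K\,\mathrm{tr}[F]$. Because the heat kernel depends only on $\theta$, the Fisher trace reduces to $\int_{SO(3)} (\partial f/\partial\theta)^2 f^{-1}\, dR$ (as can be confirmed by integration by parts against the radial Laplacian in (78)), and both sides of the identity can be evaluated by quadrature; $K$, $t$, and the grids below are arbitrary choices.

```python
import numpy as np
from scipy.integrate import trapezoid

K, t, dt = 1.0, 0.3, 1e-4
theta = np.linspace(1e-4, np.pi - 1e-4, 4_000)
w = (2.0 / np.pi) * np.sin(theta / 2.0) ** 2     # class-function weight

def f(Kt, l_max=60):
    out = np.zeros_like(theta)
    for l in range(l_max + 1):
        out += ((2 * l + 1) * np.sin((l + 0.5) * theta)
                / np.sin(theta / 2.0) * np.exp(-l * (l + 1) * Kt))
    return out

def entropy(Kt):
    ff = f(Kt)
    return -trapezoid(ff * np.log(ff) * w, theta)

# left side: finite-difference entropy rate
S_dot = (entropy(K * (t + dt)) - entropy(K * (t - dt))) / (2.0 * dt)

# right side: K * tr[F] with tr[F] = int (df/dtheta)^2 / f dR
ff = f(K * t)
trF = trapezoid(np.gradient(ff, theta) ** 2 / ff * w, theta)
print(S_dot, K * trF)      # the two values should nearly coincide
```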

5.5. The Generalized de Bruijn Identity

Here a theorem derived in [5,6,31] is restated.
Theorem 3.
Let the solution of the diffusion Equation (85) with constant $\mathbf{a} = [a_1, \ldots, a_n]^T$, subject to the initial condition $f(g,0; D,\mathbf{a}) = \delta(g)$, be denoted $f_{D,\mathbf{a},t}(g) = f(g,t; D,\mathbf{a})$. Let $\alpha(g)$ be another differentiable pdf on the group. Then
$$ \frac{d}{dt}\, S(\alpha * f_{D,\mathbf{a},t}) = \frac{1}{2}\, \mathrm{tr}\!\left[ D\, F(\alpha * f_{D,\mathbf{a},t}) \right]. $$
Proof. 
The solution of the diffusion equation
$$ \frac{\partial\rho}{\partial t} = \frac{1}{2} \sum_{i,j=1}^n D_{ij}\, \tilde{E}_i \tilde{E}_j \rho \;-\; \sum_{k=1}^n a_k\, \tilde{E}_k \rho \tag{87} $$
subject to the initial condition $\rho(g,0) = \alpha(g)$ is $\rho(g,t) = (\alpha * f_{D,\mathbf{a},t})(g)$. Then computing the derivative of $S(\rho(g,t))$ with respect to time yields
$$ \frac{d}{dt}\, S(\rho) = -\frac{d}{dt} \int_G \rho(g,t)\, \log\rho(g,t)\; dg = -\int_G \left\{ \frac{\partial\rho}{\partial t}\, \log\rho + \frac{\partial\rho}{\partial t} \right\} dg. \tag{88} $$
By substituting in (87), the partial derivative with respect to time can be replaced with Lie derivatives. However,
$$ \int_G \tilde{E}_k\, \rho\; dg = \int_G \tilde{E}_i \tilde{E}_j\, \rho\; dg = 0. $$
Consequently, the second term on the right side of (88) vanishes completely. Using the integration-by-parts formula
$$ \int_G f_1\, \tilde{E}_k f_2\; dg = -\int_G f_2\, \tilde{E}_k f_1\; dg $$
(there are no surface terms: as with the circle and the real line, each coordinate in the integral either wraps around or goes to infinity),
with $f_1 = \log\rho$ and $f_2 = \rho$, then gives
$$ \begin{aligned} \frac{d}{dt}\, S(\alpha * f_{D,\mathbf{a},t}) &= \frac{1}{2} \sum_{i,j=1}^n D_{ij} \int_G \frac{1}{\alpha * f_{D,\mathbf{a},t}}\, \tilde{E}_j(\alpha * f_{D,\mathbf{a},t})\, \tilde{E}_i(\alpha * f_{D,\mathbf{a},t})\; dg \\ &= \frac{1}{2} \sum_{i,j=1}^n D_{ij}\, F_{ij}(\alpha * f_{D,\mathbf{a},t}) \;=\; \frac{1}{2}\, \mathrm{tr}\!\left[ D\, F(\alpha * f_{D,\mathbf{a},t}) \right]. \end{aligned} $$
This means that:
$$ S(\alpha * f_{D,\mathbf{a},t_2}) - S(\alpha * f_{D,\mathbf{a},t_1}) = \frac{1}{2} \int_{t_1}^{t_2} \mathrm{tr}\!\left[ D\, F(\alpha * f_{D,\mathbf{a},t}) \right] dt. $$
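The identity is easy to verify numerically in the simplest unimodular group, the circle. In the sketch below (with an assumed test density $\alpha$, diffusion constant $D$, and time $t$; FPE convention as in (85), so the circle equation is $\rho_t = \tfrac{1}{2}D\,\rho_{xx}$), the convolution $\alpha * f_{D,t}$ is computed spectrally, and a finite-difference estimate of $\frac{d}{dt}S$ is compared with $\tfrac{1}{2}D\,F$:

```python
import numpy as np

n = 4_096
x = np.arange(n) * (2.0 * np.pi / n)
dx = 2.0 * np.pi / n
k = np.fft.fftfreq(n, d=1.0 / n)                  # integer wave numbers

alpha = np.exp(np.cos(x) + 0.5 * np.sin(2.0 * x)) # assumed test pdf
alpha /= alpha.sum() * dx
D, t, dt = 0.5, 0.2, 1e-4

def rho(t):
    """alpha convolved with the heat kernel, via Fourier coefficients."""
    return np.fft.ifft(np.fft.fft(alpha) * np.exp(-0.5 * D * k**2 * t)).real

def entropy(t):
    r = rho(t)
    return -np.sum(r * np.log(r)) * dx

S_dot = (entropy(t + dt) - entropy(t - dt)) / (2.0 * dt)

r = rho(t)
dr = np.fft.ifft(1j * k * np.fft.fft(r)).real     # spectral derivative
F = np.sum(dr**2 / r) * dx                        # Fisher information
print(S_dot, 0.5 * D * F)                         # should nearly agree
```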
Whereas some inequalities of information theory generalize to the Lie-group setting, as demonstrated above, others do not. For example, under convolution on a Lie group, (24) and (25) do not hold in general. As for the Cramér–Rao bound, there are versions for Lie groups applicable to small values of $Dt$ or with different concepts of covariance, but not in a way that is directly applicable to the scenarios formulated here.

6. Conclusions

Stochastic mechanical systems can describe individual representatives of a statistical-mechanical ensemble (as with a rotor in rotational Brownian motion), or they can be stand-alone systems subjected to noise, such as a kinematic cart robot. When these systems have mass, Itô and Stratonovich SDEs lead to the same Fokker–Planck equation; even in the case of inertia-free kinematic systems evolving on Lie groups, the two formulations can coincide. From the Fokker–Planck equation, the rate of entropy production can be computed. For diffusion processes (with or without drift), the rate of entropy increase is related simply to the diffusion matrix and the Fisher information matrix. This result holds both in Euclidean spaces and on unimodular Lie groups (including cotangent-bundle groups), which are a common configuration space for mechanical systems. As systems approach equilibrium, the entropy rate approaches zero. Two different ways to approach equilibrium were discussed: (1) when there is a restoring potential; and (2) when the configuration space is bounded. By using the monotonicity and convexity properties of the logarithm together with inequalities from information theory, computable bounds on entropy and the entropy rate were established.

Funding

This work was performed under MOE Tier 1 grant R-265-000-655-114, Faculty Board funds C-265-000-071-001, and National University of Singapore Startup grants R-265-000-665-133 and R-265-000-665-731.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Tribus, M. Information theory and thermodynamics. In Heat Transfer, Thermodynamics and Education: Boelter Anniversary Volume; Johnson, H.A., Ed.; McGraw-Hill: New York, NY, USA, 1964. [Google Scholar]
  2. Avery, J. Information Theory and Evolution; World Scientific: Singapore, 2003. [Google Scholar]
  3. Chirikjian, G.S.; Kyatkin, A.B. Harmonic Analysis for Engineers and Applied Scientists; Dover: Mineola, NY, USA, 2016. [Google Scholar]
  4. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University Illinois Press: Urbana, IL, USA, 1949. [Google Scholar]
  5. Chirikjian, G.S. Stochastic Models, Information Theory, and Lie Groups: Volume I; Birkhäuser: Boston, MA, USA, 2009. [Google Scholar]
  6. Chirikjian, G.S. Stochastic Models, Information Theory, and Lie Groups: Volume II; Birkhäuser: Boston, MA, USA, 2012. [Google Scholar]
  7. Stratonovich, R.L. Topics in the Theory of Random Noise: Volume I; Gordon and Breach Science Publishers, Inc.: New York, NY, USA, 1963. [Google Scholar]
  8. Stratonovich, R.L. Topics in the Theory of Random Noise: Volume II; CRC Press: Boca Raton, FL, USA, 1967. [Google Scholar]
  9. Fisher, R. Theory of Statistical Estimation. Proc. Camb. Philos. Soc. 1925, 22, 700–725. [Google Scholar] [CrossRef] [Green Version]
  10. Kullback, S. Information Theory and Statistics; Dover Publications Inc.: Mineola, NY, USA, 1968. [Google Scholar]
  11. Jaynes, E.T. Information Theory and Statistical Mechanics, I. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
  12. Jaynes, E.T. Information Theory and Statistical Mechanics, II. Phys. Rev. 1957, 108, 171–190. [Google Scholar] [CrossRef]
  13. Frieden, B.R. Physics from Fisher Information; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
  14. Zegers, P. Fisher Information Properties. Entropy 2015, 17, 4918–4939. [Google Scholar] [CrossRef] [Green Version]
  15. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112. [Google Scholar] [CrossRef] [Green Version]
  16. Blachman, N.M. The convolution inequality for entropy powers. IEEE Trans. Inform. Theory 1965, 11, 267–271. [Google Scholar] [CrossRef]
  17. Dembo, A.; Cover, T.M.; Thomas, J.A. Information Theoretic Inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518. [Google Scholar] [CrossRef] [Green Version]
  18. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  19. Muller, S.; Bachmann, M.; Kroninger, D.; Kurz, T.; Helluy, P. Comparison and validation of compressible flow simulations of laser-induced cavitation bubbles. Comput. Fluids 2009, 38, 1850–1862. [Google Scholar] [CrossRef] [Green Version]
  20. Iwase, M.; Liang, Y.; Masuda, Y.; Morimoto, M.; Matsuoka, T.; Boek, E.S.; Kaito, Y.; Nakagawa, K. Application of a digital oil model to solvent-based enhanced oil recovery of heavy crude oil. Energy Fuels 2019, 33, 10868–10877. [Google Scholar] [CrossRef]
  21. Jia, X.; Zeng, F.; Gu, Y. Semi-analytical solutions to one-dimensional advection-diffusion equations with variable diffusion coefficient and variable flow velocity. Appl. Math. Comput. 2013, 221, 268–281. [Google Scholar] [CrossRef]
  22. Orlandi, P.; Bernardini, M.; Pirozzoli, S. Poiseuille and Couette flows in the transitional and fully turbulent regime. J. Fluid Mech. 2015, 770, 424–441. [Google Scholar] [CrossRef]
  23. Sun, Y.; Jayaraman, A.S.; Chirikjian, G.S. Lie group solutions of advection-diffusion equations. Phys. Fluids 2021, 33, 046604. [Google Scholar] [CrossRef]
  24. Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946. [Google Scholar]
  25. Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–89. [Google Scholar]
  26. Chirikjian, G.S. Information theory on Lie groups and mobile robotics applications. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 2751–2757. [Google Scholar]
  27. Bonnabel, S.; Barrau, A. An intrinsic Cramér-Rao bound on SO(3) for (dynamic) attitude filtering. In Proceedings of the 54th IEEE Conference on Decision and Control (CDC), Osaka, Japan, 15–18 December 2015; pp. 2158–2163. [Google Scholar]
  28. Bonnabel, S.; Barrau, A. An intrinsic Cramér-Rao bound on Lie groups. In Proceedings of the International Conference on Geometric Science of Information, Palaiseau, France, 28–30 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 664–672. [Google Scholar]
  29. Solo, V.; Chirikjian, G.S. On the Cramer-Rao Bound in Riemannian Manifolds with Application to SO(3). In Proceedings of the 2020 59th IEEE Conference on Decision and Control (CDC), Jeju, Korea, 14–18 December 2020; pp. 4117–4122. [Google Scholar]
  30. Solo, V.; Chirikjian, G.S. Ito, Stratonovich and Geometry. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; pp. 3026–3032. [Google Scholar]
  31. Chirikjian, G.S. Information-Theoretic Matrix Inequalities and Diffusion Processes on Unimodular Lie Groups. In Geometric Structures of Information; Nielsen, F., Ed.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 213–249. [Google Scholar]
  32. Chirikjian, G.S. From Wirtinger to Fisher Information Inequalities on Spheres and Rotation Groups. In Proceedings of the 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; pp. 730–736. [Google Scholar]
  33. Crassidis, J.L.; Junkins, J.L. Optimal Estimation of Dynamic Systems; Chapman & Hall/CRC: Boca Raton, FL, USA, 2004. [Google Scholar]
  34. Gibbs, J.W. Elementary Principles in Statistical Mechanics: Developed With Especial Reference to the Rational Foundation of Thermodynamics; Cambridge University Press: Cambridge, UK, 1902; Reissued by Kessinger Publishing and BiblioBazaar 2008. [Google Scholar]
  35. Balescu, R. Equilibrium and Nonequilibrium Statistical Mechanics; Wiley: New York, NY, USA, 1975. [Google Scholar]
  36. de Groot, S.R.; Mazur, P. Non-Equilibrium Thermodynamics; North-Holland Publishing Co.: Amsterdam, The Netherlands; Interscience Publishers, Inc.: New York, NY, USA, 1962. [Google Scholar]
  37. Prigogine, I. Non-Equilibrium Statistical Mechanics; John Wiley and Sons Inc.: New York, NY, USA, 1962. [Google Scholar]
  38. McLennan, J.A. Introduction to Non-Equilibrium Statistical Mechanics; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1989. [Google Scholar]
  39. Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: Oxford, UK, 2001. [Google Scholar]
  40. Jarzynski, C. Nonequilibrium equality for free energy differences. Phys. Rev. Lett. 1997, 78, 2690–2693. [Google Scholar] [CrossRef] [Green Version]
  41. Kac, M. Some Stochastic Problems in Physics and Mathematics; Colloquium Lectures in the Pure and Applied Sciences, Magnolia Petroleum Company: Galveston, TX, USA, 1957. [Google Scholar]
  42. Bismut, J.-M. Mécanique Aléatoire; Springer: Berlin, Germany, 1981. [Google Scholar]
  43. Nelson, E. Dynamical Theories of Brownian Motion; Princeton University Press: Princeton, NJ, USA, 1967. [Google Scholar]
  44. Nelson, E. Review of stochastic mechanics. J. Physics Conf. Ser. 2012, 361, 012011. [Google Scholar] [CrossRef]
  45. Jayaraman, A.S.; Campolo, D.; Chirikjian, G.S. Black-Scholes theory and diffusion processes on the cotangent bundle of the affine group. Entropy 2020, 22, 455. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Barbaresco, F. Koszul information geometry and Souriau geometric temperature/capacity of Lie group thermodynamics. Entropy 2014, 16, 4521–4565. [Google Scholar] [CrossRef] [Green Version]
  47. Barbaresco, F. Geometric theory of heat from Souriau Lie groups thermodynamics and Koszul Hessian geometry: Applications in information geometry for exponential families. Entropy 2016, 18, 386. [Google Scholar] [CrossRef] [Green Version]
  48. Marle, C.M. From tools in symplectic and Poisson geometry to J.-M. Souriau’s theories of statistical mechanics and thermodynamics. Entropy 2016, 18, 370. [Google Scholar] [CrossRef] [Green Version]
  49. De Saxcé, G. Link between Lie group statistical mechanics and thermodynamics of continua. Entropy 2016, 18, 254. [Google Scholar] [CrossRef] [Green Version]
  50. Brockett, R.W. System Theory on Group Manifolds and Coset Spaces. SIAM J. Control 1972, 10, 265–284. [Google Scholar] [CrossRef]
  51. Murray, R.; Li, Z.; Sastry, S. A Mathematical Introduction to Robotics; CRC Press: Boca Raton, FL, USA, 1994. [Google Scholar]
  52. Bullo, F.; Lewis, A.D. Geometric Control of Mechanical Systems; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  53. Holm, D. Geometric Mechanics, Part I: Dynamics and Symmetry; World Scientific: Singapore, 2008. [Google Scholar]
  54. Holm, D. Geometric Mechanics, Part II: Rotating, Translating and Rolling; World Scientific: Singapore, 2008. [Google Scholar]
  55. Smith, P.; Drummond, T.; Roussopoulos, K. Computing MAP Trajectories by Representing, Propagating and Combining PDFs over Groups. In Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; Volume 2, pp. 1275–1282. [Google Scholar]
  56. Thrun, S.; Burgard, W.; Fox, D. Probabilistic Robotics; MIT Press: Cambridge, MA, USA, 2005. [Google Scholar]
  57. Park, W.; Liu, Y.; Moses, M.; Chirikjian, G.S. Kinematic State Estimation and Motion Planning for Stochastic Nonholonomic Systems Using the Exponential Map. Robotica 2008, 26, 419–434. [Google Scholar] [CrossRef] [Green Version]
  58. Zhou, Y.; Chirikjian, G.S. Probabilistic Models of Dead-Reckoning Error in Nonholonomic Mobile Robots. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422), Taipei, Taiwan, 14–19 September 2003. [Google Scholar]
  59. Long, A.W.; Wolfe, K.C.; Mashner, M.J.; Chirikjian, G.S. The banana distribution is Gaussian: A localization study with exponential coordinates. In Robotics: Science and Systems VIII; MIT Press: Cambridge, MA, USA, 2013; p. 265. [Google Scholar]
  60. Mumford, D. Elastica and computer vision. In Algebraic Geometry and Its Applications; Bajaj, C., Ed.; Springer: New York, NY, USA, 1994. [Google Scholar]
  61. Williams, L.R.; Jacobs, D.W. Stochastic completion fields: A neural model of illusory contour shape and salience. Neural Comput. 1997, 9, 837–858. [Google Scholar] [CrossRef] [PubMed]
  62. Williams, L.R.; Jacobs, D.W. Local Parallel Computation of Stochastic Completion Fields. Neural Comput. 1997, 9, 859–881. [Google Scholar] [CrossRef]
  63. Zweck, J.; Williams, L.R. Euclidean Group Invariant Computation of Stochastic Completion Fields Using Shiftable-Twistable Functions. J. Math. Imaging Vis. 2004, 21, 135–154. [Google Scholar] [CrossRef]
  64. Citti, G.; Sarti, A. A Cortical Based Model of Perceptual Completion in the Roto-Translation Space. J. Math. Imaging Vis. 2006, 24, 307–326. [Google Scholar] [CrossRef]
  65. Duits, R.; Franken, E. Left-invariant parabolic evolutions on SE(2) and contour enhancement via invertible orientation scores Part I: Linear left-invariant diffusion equations on SE(2). Quart. Appl. Math. 2010, 68, 255–292. [Google Scholar] [CrossRef] [Green Version]
  66. Bond, D.J. The statistical properties of phase noise. Br. Telecom. Technol. J. 1989, 7, 12–17. [Google Scholar]
  67. Foschini, G.J.; Greenstein, L.J.; Vannucci, G. Noncoherent detection of coherent lightwave signals corrupted by phase noise. IEEE Trans. Commun. 1988, COM-36, 306–314. [Google Scholar] [CrossRef]
  68. Foschini, G.J.; Vannucci, G. Characterizing filtered light waves corrupted by phase noise. IEEE Trans. Inf. Theory 1988, 34, 1437–1448. [Google Scholar] [CrossRef]
  69. Foschini, G.J.; Vannucci, G.; Greenstein, L.J. Envelope statistics for filtered optical signals corrupted by phase noise. IEEE Trans. Commun. 1989, 37, 1293–1302. [Google Scholar] [CrossRef]
  70. Garrett, I.; Bond, D.J.; Waite, J.B.; Lettis, D.S.L.; Jacobsen, G. Impact of phase noise in weakly coherent systems: A new and accurate approach. J. Light. Technol. 1990, 8, 329–337. [Google Scholar] [CrossRef]
  71. Garrett, I.; Jacobsen, G. Phase noise in weakly coherent systems. IEEE Proc. 1989, 136 Pt. J, 159–165. [Google Scholar] [CrossRef]
  72. Zhang, X. Analytically solving the Fokker–Planck equation for the statistical characterization of the phase noise in envelope detection. J. Light. Technol. 1995, 13, 1787–1794. [Google Scholar] [CrossRef]
  73. Wang, Y.; Zhou, Y.; Maslen, D.K.; Chirikjian, G.S. Solving the Phase-Noise Fokker–Planck Equation Using the Motion-Group Fourier Transform. IEEE Trans. Commun. 2006, 54, 868–877. [Google Scholar] [CrossRef] [Green Version]
  74. Park, W.; Kim, J.S.; Zhou, Y.; Cowan, N.J.; Okamura, A.M.; Chirikjian, G.S. Diffusion-based motion planning for a nonholonomic flexible needle model. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 4600–4605. [Google Scholar]
  75. Park, W.; Wang, Y.; Chirikjian, G.S. The path-of-probability algorithm for steering and feedback control of flexible needles. Int. J. Robot. Res. 2010, 29, 813–830. [Google Scholar] [CrossRef] [PubMed]
  76. Webster, R.J., III; Kim, J.-S.; Cowan, N.J.; Chirikjian, G.S.; Okamura, A.M. Nonholonomic Modeling of Needle Steering. Int. J. Robot. Res. 2006, 25, 509–525. [Google Scholar] [CrossRef]
  77. Perrin, F. Étude Mathématique du Mouvement Brownien de Rotation. Ann. Sci. École Norm. Supérieure 1928, 45, 1–51. [Google Scholar] [CrossRef]
  78. Furry, W.H. Isotropic Rotational Brownian Motion. Phys. Rev. 1957, 107, 7–13. [Google Scholar] [CrossRef]
  79. Favro, L.D. Theory of the Rotational Brownian Motion of a Free Rigid Body. Phys. Rev. 1960, 119, 53–62. [Google Scholar] [CrossRef]
  80. Hubbard, P.S. Angular velocity of a nonspherical body undergoing rotational Brownian motion. Phys. Rev. A 1977, 15, 329–336. [Google Scholar] [CrossRef]
  81. Steele, W.A. Molecular Reorientation in Liquids. I. Distribution Functions and Friction Constants. II. Angular Autocorrelation Functions. J. Chem. Phys. 1963, 38, 2404–2418. [Google Scholar] [CrossRef]
  82. McConnell, J. Rotational Brownian Motion and Dielectric Theory; Academic Press: New York, NY, USA, 1980. [Google Scholar]
  83. Weber, G. Rotational Brownian motion and polarization of the fluorescence of solutions. Adv. Protein Chem. 1953, 8, 415–459. [Google Scholar] [PubMed]
  84. Tao, T. Time-dependent fluorescence depolarization and Brownian rotational diffusion coefficients of macromolecules. Biopolymers 1969, 8, 609–632. [Google Scholar] [CrossRef]
  85. Willsky, A.S. Dynamical Systems Defined on Groups: Structural Properties and Estimation. Ph.D. Dissertation, Department Aeronautics and Astronautics, M.I.T., Cambridge, MA, USA, June 1973. [Google Scholar]
  86. Itô, K. Brownian Motions in a Lie Group. Proc. Jpn. Acad. 1950, 26, 4–10. [Google Scholar] [CrossRef]
  87. Itô, K. Stochastic Differential Equations in a Differentiable Manifold. Nagoya Math. J. 1950, 1, 35–47. [Google Scholar] [CrossRef] [Green Version]
  88. Itô, K. Stochastic Differential Equations in a Differentiable Manifold (2). Mem. Coll. Sci. Univ. Kyoto Ser. A Math. 1953, 28, 81–85. [Google Scholar] [CrossRef]
  89. McKean, H.P., Jr. Stochastic Integrals; Academic Press: New York, NY, USA, 1969. [Google Scholar]
  90. Gangolli, R. On the Construction of Certain Diffusions on a Differentiable Manifold. Z. Wahrscheinlichkeitstheorie Und Verw. Geb. 1964, 2, 406–419. [Google Scholar] [CrossRef]
  91. Duncan, T.E. Stochastic Systems in Riemannian Manifolds. J. Optim. Theory Appl. 1979, 27, 399–426. [Google Scholar] [CrossRef]
  92. McKean, H.P., Jr. Brownian Motions on the 3-Dimensional Rotation Group. Mem. Coll. Sci. Univ. Kyoto Ser. A 1960, 33, 25–38. [Google Scholar] [CrossRef]
  93. Gorman, C.D. Brownian Motion of Rotation. Trans. Am. Math. Soc. 1960, 94, 103–117. [Google Scholar] [CrossRef]
  94. Liao, M. Random Motion of a Rigid Body. J. Theor. Probab. 1997, 10, 201–211. [Google Scholar]
  95. Liao, M. Lévy Processes in Lie Groups; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  96. Jeffrey, G.B. The Motion of Ellipsoidal Particles Immersed in a Viscous Fluid. Proc. R. Soc. Lond. Ser. A 1922, 102, 161–179. [Google Scholar]
  97. Varshalovich, D.A.; Moskalev, A.N.; Khersonskii, V.K. Quantum Theory of Angular Momentum; World Scientific: Singapore, 1988. [Google Scholar]
  98. Gel'fand, I.M.; Minlos, R.A.; Shapiro, Z.Ya. Representations of the Rotation and Lorentz Groups and Their Applications; Macmillan: New York, NY, USA, 1963. [Google Scholar]
  99. Chétalet, O.; Chirikjian, G.S. Sampling and Convolution on Motion Groups Using Generalized Gaussian Functions. Electron. Comput. Kinemat. 2002, 1. [Google Scholar]
  100. Lee, S.; Chirikjian, G.S. Pose Analysis of Alpha–Carbons in Proteins. Int. J. Robot. Res. 2005, 24, 183–210. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
