Symplectic Bregman Divergences

Nielsen, Frank

doi:10.3390/e26121101

Sony Computer Science Laboratories Inc., Tokyo 141-0022, Japan

Entropy 2024, 26(12), 1101; https://doi.org/10.3390/e26121101

Submission received: 26 August 2024 / Revised: 5 December 2024 / Accepted: 15 December 2024 / Published: 16 December 2024

(This article belongs to the Special Issue Information Geometry for Data Analysis)

Abstract

We present a generalization of Bregman divergences in finite-dimensional symplectic vector spaces that we term symplectic Bregman divergences. Symplectic Bregman divergences are derived from a symplectic generalization of the Fenchel–Young inequality which relies on the notion of symplectic subdifferentials. The symplectic Fenchel–Young inequality is obtained using the symplectic Fenchel transform which is defined with respect to the symplectic form. Since symplectic forms can be built generically from pairings of dual systems, we obtain a generalization of Bregman divergences in dual systems obtained by equivalent symplectic Bregman divergences. In particular, when the symplectic form is derived from an inner product, we show that the corresponding symplectic Bregman divergences amount to ordinary Bregman divergences with respect to composite inner products. Some potential applications of symplectic divergences in geometric mechanics, information geometry, and learning dynamics in machine learning are touched upon.

Keywords:

dual system; duality product; inner product; symplectic form; symplectic matrix group; symplectic subdifferential; symplectic Fenchel transform; Moreau proximation; geometric mechanics

1. Introduction

Symplectic geometry [1,2,3] was pioneered by Lagrange around 1808–1810 [4,5,6], who analyzed the motions and dynamics (evolution curves) of a finite set of m point mass particles in a time interval T in the phase space as a 1D curve

C = \{c (t) = (q_{1}, p_{1}, \dots, q_{m}, p_{m}) (t) : t \in T \subset R\} \subset R^{2 m n}

, where

q_{i} (t) \in R^{n}

’s denote the point locations at time t and

p_{i} (t) \in R^{n}

’s encode the momentum, i.e.,

p_{i} (t) = m_{i} {\dot{q}}_{i}

with

{\dot{q}}_{i} = \frac{d}{d t} q_{i} (t)

. See Figure 1. (Notice that Joseph-Louis Lagrange (1736–1813) was 72 years old in 1808, and is famous for his treatise on analytic mechanics [7,8], first published in French in 1788 when he was 52 years old).

The Hamiltonian coupled equations [9] governing the system motion are written in the phase space as follows:

\frac{d q^{i}}{d t} = \frac{\partial H}{\partial p_{i}}, \quad \frac{d p_{i}}{d t} = - \frac{\partial H}{\partial q^{i}},

(1)

where

H (q, p, t)

is the Hamiltonian describing the system. Lagrange originally started a new kind of calculus, “symplectic calculus”. Symplectic geometry can be thought of as the first discovered non-Euclidean geometric structure, since hyperbolic geometry is usually considered to have been first studied by Lobachevsky and Bolyai around 1820–1830. We refer to the paper entitled “The symplectization of science” [10] for an outreach article on symplectic geometry.
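As a minimal sketch of how Equation (1) drives a trajectory in phase space, the following integrates Hamilton's equations for a single particle with the symplectic Euler scheme. The harmonic-oscillator Hamiltonian, the step size, and all function names are illustrative assumptions, not taken from the paper.

```python
# Illustrative Hamiltonian (an assumption, not from the paper):
# H(q, p) = p^2 / (2 m) + k q^2 / 2, a 1D harmonic oscillator.

def hamiltonian(q, p, m=1.0, k=1.0):
    return p * p / (2.0 * m) + 0.5 * k * q * q


def symplectic_euler_step(q, p, dt, m=1.0, k=1.0):
    # dp/dt = -dH/dq = -k q: update the momentum first.
    p_new = p - dt * k * q
    # dq/dt = dH/dp = p / m: then update the position with the new momentum.
    q_new = q + dt * p_new / m
    return q_new, p_new


def step_jacobian_determinant(dt, m=1.0, k=1.0):
    # The step is linear in (q, p); its matrix is
    # [[1 - dt^2 k / m, dt / m], [-dt k, 1]].
    a, b = 1.0 - dt * dt * k / m, dt / m
    c, d = -dt * k, 1.0
    return a * d - b * c


def integrate(q0, p0, dt, steps):
    q, p = q0, p0
    for _ in range(steps):
        q, p = symplectic_euler_step(q, p, dt)
    return q, p


q_end, p_end = integrate(1.0, 0.0, dt=0.01, steps=10_000)
energy_drift = abs(hamiltonian(q_end, p_end) - hamiltonian(1.0, 0.0))
```

Each step is a linear map of the (q, p) plane with unit determinant, so it preserves the oriented area. The energy is not conserved exactly, but its drift stays bounded over long runs.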

The adjective “symplectic” stems from Greek: It means “braided together” and conveys the interactions of point mass particle positions with their momenta. Its use in mathematics originated in the work of Hermann Weyl (see §6 on symplectic groups in [11]). A synonymous adjective is “complex”, which has been used to describe the braided numbers z of

C = {z = a + i b : (a, b) \in R^{2}}

. Complex has its etymological root in Latin. In differential geometry, symplectic structures are closely related to (almost) complex structures on vector spaces and smooth manifolds [2].

In physics, symplectic geometry is not only at the core of classical mechanics (i.e., conservative reversible mechanics) and quantum mechanics [12], but has also recently been used to model and study dynamics of systems exhibiting dissipative terms [13,14] which are irreversible. As a pure geometry, symplectic geometry can be studied on its own by mathematicians, and gave birth to the field of symplectic topology [15]. Thus, symplectic geometry can be fruitfully applied to various areas beyond its original domain of geometric mechanics. For example, symplectic geometry has been considered in machine learning for accelerating numerical optimization methods based on symplectic integrators [16] and in physics-informed neural networks [17,18] (PINNs).

In this paper, we define symplectic Bregman divergences (Definition 5) which recover as special cases Bregman divergences [19] defined with respect to composite inner products. A Bregman divergence induced by a strictly convex and differentiable (potential) function F (called the Bregman generator) between

x_{1}

and

x_{2}

of X is defined in [19] (1967) by

B_{F} (x_{1} : x_{2}) = F (x_{1}) - F (x_{2}) - 〈 x_{1} - x_{2}, \nabla F (x_{2}) 〉,

(2)

where

〈 \cdot, \cdot 〉

is an inner product on X. Let

Γ_{0} (X)

denote the set of functions which are lower semi-continuous convex with non-empty effective domains. The convex conjugate

F^{*} (x^{*})

obtained by the Legendre–Fenchel transform

F^{*} (x^{*}) = \sup_{x \in X} \{〈 x^{*}, x 〉 - F (x)\}

yields a dual Bregman divergence

B_{F^{*}}

when the function

F \in Γ_{0} (X)

is of Legendre type [20,21]:

B_{F^{*}} (x_{1}^{*} : x_{2}^{*}) = F^{*} (x_{1}^{*}) - F^{*} (x_{2}^{*}) - 〈 x_{1}^{*} - x_{2}^{*}, \nabla F^{*} (x_{2}^{*}) 〉,

such that

B_{F} (x_{1} : x_{2}) = B_{F^{*}} (x_{2}^{*} : x_{1}^{*})

with

F^{*} (x^{*}) = 〈 x^{*}, {(\nabla F)}^{- 1} (x^{*}) 〉 - F ({(\nabla F)}^{- 1} (x^{*})) .
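The duality between a Bregman divergence and the Bregman divergence of the conjugate generator (with swapped arguments) can be checked numerically on a scalar example. The generator F(x) = x log x - x on x > 0, whose conjugate is F*(y) = exp(y), is an assumption chosen for this sketch, and the helper names are hypothetical.

```python
import math


def bregman(F, grad_F, x1, x2):
    # Scalar instance of Equation (2): F(x1) - F(x2) - (x1 - x2) F'(x2).
    return F(x1) - F(x2) - (x1 - x2) * grad_F(x2)


# Illustrative Legendre-type generator on x > 0 (an assumption for this sketch).
def F(x):
    return x * math.log(x) - x


def grad_F(x):
    return math.log(x)


# Its convex conjugate F*(y) = exp(y), with gradient exp(y).
def F_star(y):
    return math.exp(y)


def grad_F_star(y):
    return math.exp(y)


x1, x2 = 2.0, 0.5
primal = bregman(F, grad_F, x1, x2)
# Dual coordinates are x* = grad F(x); note the swapped argument order.
dual = bregman(F_star, grad_F_star, grad_F(x2), grad_F(x1))
```

Here `primal` and `dual` agree up to rounding, which is the identity relating the divergence of F to that of its conjugate for this particular generator.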

This paper introduces and extends the work of Buliga and de Saxcé [13,14], which is motivated by geometric irreversible mechanics. In contrast with [13,14], this expository paper is targeted at an audience familiar with Bregman divergences [19] in machine learning and information geometry [22], and does not assume any prior knowledge of geometric mechanics. Furthermore, we consider only finite-dimensional spaces in this study.

The paper is organized as follows: In Section 2, we define symplectic vector spaces and explain the representation of symplectic forms using dual pairings. We then define the symplectic Fenchel transform and the symplectic Fenchel–Young inequality in Section 3. The definitions of symplectic Fenchel–Young divergences (Definition 4) and symplectic Bregman divergences (Definition 5) are reported in Section 4. In particular, we show how to recover Bregman divergences with respect to composite inner products as special cases in Section 5 (Property 1). In general, symplectic Bregman divergences allow one to define Bregman divergences in dual systems equipped with pairing products. Finally, we recall the role of Bregman divergences in dually flat manifolds of information geometry in Section 6, and motivate the introduction of symplectic Bregman divergences in geometric mechanics (e.g., symplectic BEN principle of [13,14]) and learning dynamics in machine learning.

2. Dual Systems, Linear Symplectic Forms, and Symplectomorphisms

2.1. Symplectic Forms Derived from Dual Systems

We begin with two definitions:

Definition 1

(Dual system). Let X and Y be finite m-dimensional vector spaces [23] equipped with a pairing product

b (\cdot, \cdot)

, i.e., a bilinear map:

b (\cdot, \cdot) : X \times Y \to R,

such that all continuous linear functionals on X and Y are expressed as

x^{#} (\cdot) = b (x, \cdot)

and

y^{#} (\cdot) = b (\cdot, y)

, respectively. The triplet

(X, Y, b (\cdot, \cdot))

forms a dual system.

(Notice that when the type of X is different from the type of Y then the bilinear map cannot be symmetric).

Definition 2

(Symplectic vector space). A symplectic vector space

(Z, ω)

is a vector space Z equipped with a map [24]

ω : Z \times Z \to R

which is

1.: bilinear: $\forall α, α^{'}, β, β^{'} \in R, \forall z_{1}, z_{1}^{'}, z_{2}, z_{2}^{'} \in Z$ , we have

$ω (α z_{1} + α^{'} z_{1}^{'}, β z_{2} + β^{'} z_{2}^{'}) = α β ω (z_{1}, z_{2}) + α β^{'} ω (z_{1}, z_{2}^{'}) + α^{'} β ω (z_{1}^{'}, z_{2}) + α^{'} β^{'} ω (z_{1}^{'}, z_{2}^{'}),$
2.: skew-symmetric (or alternating): $ω (z_{2}, z_{1}) = - ω (z_{1}, z_{2})$ , and
3.: non-degenerate: if for a $z_{0}$ , we have $ω (z, z_{0}) = 0$ for all $z \in Z$ then we have $z_{0} = 0$ .

Notice that skew-symmetry implies that

ω (z, z) = 0

for all

z \in Z

since

ω (z, z) = - ω (z, z)

and hence

2 ω (z, z) = 0

. The map

ω

is called a linear symplectic form [24,25].

We define the symplectic form

ω

induced by the pairing product of a dual system as follows:

ω (z_{1}, z_{2}) = b (x_{1}, y_{2}) - b (x_{2}, y_{1}),

(3)

where

z_{1} = (x_{1}, y_{1})

and

z_{2} = (x_{2}, y_{2})

belong to

Z = X \oplus Y

.

Let us report several examples of linear symplectic forms:

Let $X = V$ be a finite n-dimensional vector space with the dual space of linear functionals $Y = V^{*}$ (space of covectors l). The natural pairing $((x, l)) = l (x) = \sum_{i} x^{i} l_{i}$ of a vector $x \in V$ with a covector $l \in V^{*}$ is an example of dual product. (We use the superscript index for indicating components of contravariant vectors and subscript index for specifying components of covariant vectors [9]). We define the symplectic form $ω$ induced by the natural pairing of vectors with covectors as follows:

$ω (z_{1}, z_{2}) = ((x_{1}, l_{2})) - ((x_{2}, l_{1})) = l_{2} (x_{1}) - l_{1} (x_{2}),$

(4)

where $z_{1} = (x_{1}, l_{1})$ and $z_{2} = (x_{2}, l_{2})$ belong to $Z = V \oplus V^{*}$ .
Consider $(X, 〈 \cdot, \cdot 〉)$ an inner product space of dimension n. The product space $Z = X \oplus X$ of even dimension $2 n$ can be equipped with the following map $ω : Z \times Z \to R$ induced by the inner product:

$ω (z_{1}, z_{2}) = 〈 x_{1}, y_{2} 〉 - 〈 x_{2}, y_{1} 〉,$

(5)

where $z_{1} = (x_{1}, y_{1}) \in Z$ and $z_{2} = (x_{2}, y_{2}) \in Z$ .

For example, let

X = R

and

〈 x_{1}, x_{2} 〉 = x_{1} x_{2}

. Then

ω (z_{1}, z_{2}) = x_{1} y_{2} - x_{2} y_{1}

. This symplectic form can be interpreted as the determinant of the matrix

M = [\begin{matrix} x_{1} & x_{2} \\ y_{1} & y_{2} \end{matrix}]

which corresponds geometrically to the signed (oriented) area of the parallelogram defined by the vectors

z_{1} = (x_{1}, y_{1})

and

z_{2} = (x_{2}, y_{2})

. See Figure 2. (This example indicates the link with integration of 2D manifolds equipped with fields of symplectic forms smoothly varying called differential 2-forms [9]).

In a finite-dimensional vector space, we can express the inner product as

〈 x, y 〉 = x^{⊤} Q y

for a symmetric positive-definite matrix

Q \in R^{n \times n}

. Let

Q = L L^{⊤}

be the Cholesky decomposition of Q. Then we have

〈 x, y 〉 = {(L^{⊤} x)}^{⊤} I (L^{⊤} y) = {〈 L^{⊤} x, L^{⊤} y 〉}_{0},

where I is the

n \times n

identity matrix and

{〈 x, y 〉}_{0} = x^{⊤} y

is the Euclidean inner product. Thus the form

ω_{0}

induced by

{〈 \cdot, \cdot 〉}_{0}

can be expressed using linear algebra as

ω_{0} (z_{1}, z_{2}) = z_{1}^{⊤} [\begin{matrix} 0 & I \\ - I & 0 \end{matrix}] z_{2} = z_{1}^{⊤} Ω_{0} z_{2},

where

Ω_{0} \in R^{2 n \times 2 n}

is a skew-symmetric matrix:

Ω_{0}^{⊤} = - Ω_{0}

. More generally, we may consider skew-symmetric matrices of the form

Ω = [\begin{matrix} 0 & Q \\ - Q & 0 \end{matrix}]

to define the symplectic form

ω_{Q}

induced by the inner product

{〈 x, y 〉}_{Q} = x^{⊤} Q y

.
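To make the matrix form concrete, the sketch below evaluates the symplectic form induced by an inner product with matrix Q in two ways: directly from the definition in Equation (5), and as a bilinear form with the block matrix whose off-diagonal blocks are Q and -Q (which is what the definition expands to when Q is symmetric). The sample matrix, the vectors, and the function names are assumptions made for illustration.

```python
def inner(x, y, Q):
    # <x, y>_Q = x^T Q y
    n = len(x)
    return sum(x[i] * Q[i][j] * y[j] for i in range(n) for j in range(n))


def omega(z1, z2, Q):
    # z = (x, y) is stored as a flat list of length 2n.
    n = len(Q)
    x1, y1 = z1[:n], z1[n:]
    x2, y2 = z2[:n], z2[n:]
    return inner(x1, y2, Q) - inner(x2, y1, Q)


def omega_matrix(Q):
    # Block matrix [[0, Q], [-Q, 0]] of size 2n x 2n.
    n = len(Q)
    top = [[0.0] * n + list(Q[i]) for i in range(n)]
    bottom = [[-Q[i][j] for j in range(n)] + [0.0] * n for i in range(n)]
    return top + bottom


def bilinear(z1, M, z2):
    # z1^T M z2
    m = len(z1)
    return sum(z1[i] * M[i][j] * z2[j] for i in range(m) for j in range(m))


# Hypothetical symmetric positive-definite Q and sample points in R^4.
Q = [[2.0, 1.0], [1.0, 3.0]]
z1 = [1.0, 2.0, 0.5, -1.0]
z2 = [-3.0, 0.5, 2.0, 4.0]
Omega = omega_matrix(Q)
direct = omega(z1, z2, Q)
via_matrix = bilinear(z1, Omega, z2)
```

The two evaluations coincide, and swapping the arguments flips the sign, as skew-symmetry requires.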

2.2. Linear Symplectomorphisms and the Groups of Symplectic Matrices

A symplectic form

ω

can be expressed as a

2 n \times 2 n

matrix

Ω = [ω_{i j}]

such that

ω_{i j} = ω (b_{i}, b_{j})

where

b_{1} = e_{1}, \dots, b_{n} = e_{n}, b_{n + 1} = f_{1}, \dots, b_{2 n} = f_{n}

are the basis vectors, and

ω (z_{1}, z_{2}) = z_{1}^{⊤} Ω z_{2}

.

The Darboux basis [2] of the canonical form

ω_{0}

of

R^{2 n}

is such that

ω (e_{i}, f_{j}) = δ_{i j}

and

ω (e_{i}, e_{j}) = ω (f_{i}, f_{j}) = 0

where

δ

denotes the Kronecker delta function.

Ω_{0} \in Sp (2 n)

is the symplectic matrix

[\begin{matrix} 0 & I \\ - I & 0 \end{matrix}]

corresponding to the canonical form

ω_{0}

of

R^{2 n}

.

A transformation

t : V \to V

is called a linear symplectomorphism when

ω (t (z_{1}), t (z_{2})) = ω (z_{1}, z_{2})

(i.e.,

t^{*} ω = ω

), i.e., when

T^{⊤} Ω T = Ω

where T is the matrix representation of t. In particular, t is a linear symplectomorphism with respect to

ω_{0}

when

T^{⊤} Ω_{0} T = Ω_{0}

. Any symplectic vector space

(V, ω)

of dimension

2 n

is symplectomorphic to the canonical symplectic space

(R^{2 n}, ω_{0})

.

Linear symplectomorphisms can be represented by symplectic matrices of the symplectic group [11,26]

Sp (2 n)

:

\begin{matrix} Sp (2 n) & = & \{T : T^{⊤} Ω_{0} T = Ω_{0}\} \subset GL (2 n), \\ & = & \{T = [\begin{matrix} A & B \\ C & D \end{matrix}] : - C^{⊤} A + A^{⊤} C = 0, - C^{⊤} B + A^{⊤} D = I, - D^{⊤} B + B^{⊤} D = 0\} . \end{matrix}

Transpose and inverse of symplectic matrices are symplectic matrices. The inverse of a symplectic matrix T is given by

\begin{matrix} T^{- 1} & = & - Ω_{0} T^{⊤} Ω_{0}, \\ & = & [\begin{matrix} D^{⊤} & - B^{⊤} \\ - C^{⊤} & A^{⊤} \end{matrix}] . \end{matrix}

Symplectic matrices of

Sp (2 n)

have unit determinant (

Sp (2 n) \subset SL (2 n) \subset GL (2 n)

), and in the particular case of

n = 1

,

Sp (2)

corresponds precisely to the set of matrices with unit determinant. Thus rotation matrices of

SO (2)

which have unit determinant form a subgroup of

Sp (2)

.
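The defining condition of the symplectic group and the inverse formula can be checked on small matrices for n = 1. The sketch below uses plain nested lists; the sample matrices (a shear, a rotation, and a scaling) and the helper names are assumptions chosen for illustration.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Canonical matrix Omega_0 for n = 1.
OMEGA0 = [[0.0, 1.0], [-1.0, 0.0]]


def is_symplectic(T, tol=1e-12):
    # Checks T^T Omega_0 T = Omega_0 entrywise.
    lhs = matmul(matmul(transpose(T), OMEGA0), T)
    return all(abs(lhs[i][j] - OMEGA0[i][j]) <= tol
               for i in range(2) for j in range(2))


def symplectic_inverse(T):
    # T^{-1} = -Omega_0 T^T Omega_0
    M = matmul(matmul(OMEGA0, transpose(T)), OMEGA0)
    return [[-v for v in row] for row in M]


angle = 0.3
shear = [[1.0, 2.0], [0.0, 1.0]]                      # determinant 1
rotation = [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle), math.cos(angle)]]       # determinant 1
scaling = [[2.0, 0.0], [0.0, 2.0]]                    # determinant 4

shear_times_inverse = matmul(shear, symplectic_inverse(shear))
```

The shear and the rotation pass the test while the uniform scaling does not, in line with Sp(2) being the set of 2 x 2 matrices of unit determinant.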

Sesquilinear symplectic forms can also be defined on complex linear spaces [27].

3. Symplectic Fenchel Transform, Symplectic Subdifferentials, and Symplectic Fenchel–Young (in)Equality

Let

F : Z = X \times Y \to R \cup {+ \infty}

be a convex lower semi-continuous (lsc) function called a potential function.

Definition 3

(Symplectic Fenchel conjugate). The symplectic Fenchel conjugate

F^{* ω} (z^{'})

is defined by

F^{* ω} (z^{'}) = sup_{z \in Z} \{ω (z^{'}, z) - F (z)\} .

Notice that since

ω

is skew-symmetric, the order of the arguments in

ω

is important: The symplectic Fenchel transform optimizes with respect to the second argument of

ω (\cdot, \cdot)

.

The symplectic subdifferential of F at z is defined by

\partial^{ω} F (z) = \{z_{1} \in Z : \forall z_{2} \in Z, F (z + z_{2}) \geq F (z) + ω (z_{1}, z_{2})\} .

The differential operator

\partial^{ω}

is a set-valued operator:

\partial^{ω} : F ⇉ Z

, where

F

is the set of potential functions. An element of the symplectic subdifferential of F at z is called a symplectic subgradient.

Remark 1.

Moreau generalized the Fenchel conjugate using a cost function [28]. In particular, the duality induced by a logarithmic cost function was studied in [29], and led to a generalization of Bregman divergences called the logarithmic divergences, which are canonical divergences of manifolds of constant sectional curvature in information geometry.

Remark 2.

In geometric mechanics [2], the symplectic gradient on a symplectic manifold

(M, ω)

is the Hamiltonian vector field, i.e., the vector field

X_{H}

such that the Hamiltonian mechanics equation is written concisely as

ω (X_{H}, \cdot) = d H

.

Theorem 1

(Symplectic Fenchel–Young inequality, Theorem 2.3 of [13,14]). Let

F (z)

be a convex (i.e.,

F (z) = F (x, y)

is jointly convex, i.e., convex with respect to

z = (x, y)

) and lower semi-continuous function. Then the following inequality holds:

\forall z, z^{'} \in Z, F (z) + F^{* ω} (z^{'}) \geq ω (z^{'}, z),

with equality if and only if

z^{'} \in \partial^{ω} F (z)

.

Let us again notice that the argument order in

ω (\cdot, \cdot)

is important.
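A small numerical check of Theorem 1 is sketched below for X = Y = R with the symplectic form ω(z1, z2) = x1 y2 - x2 y1 and the quadratic potential F(z) = (x^2 + y^2)/2. The potential is an assumption chosen for illustration. For it, a direct computation (not taken from the paper) gives the symplectic conjugate F^{*ω}(z') = (x'^2 + y'^2)/2 and the equality case z' = (y, -x); the code cross-checks the conjugate against a brute-force grid search.

```python
def omega(z1, z2):
    # omega(z1, z2) = x1 * y2 - x2 * y1
    return z1[0] * z2[1] - z2[0] * z1[1]


def F(z):
    return 0.5 * (z[0] ** 2 + z[1] ** 2)


def F_conj_closed(zp):
    # Closed form of sup_z {omega(zp, z) - F(z)} for this quadratic F.
    return 0.5 * (zp[0] ** 2 + zp[1] ** 2)


def F_conj_grid(zp, half_width=3, per_unit=20):
    # Brute-force approximation of the supremum on a regular grid.
    ticks = [i / per_unit
             for i in range(-half_width * per_unit, half_width * per_unit + 1)]
    return max(omega(zp, (x, y)) - F((x, y)) for x in ticks for y in ticks)


def symplectic_gradient(z):
    # Point z' at which the inequality becomes an equality for this F.
    return (z[1], -z[0])


def gap(z, zp):
    # F(z) + F^{*omega}(z') - omega(z', z), non-negative by Theorem 1.
    return F(z) + F_conj_closed(zp) - omega(zp, z)


z = (0.3, -1.2)
gap_at_gradient = gap(z, symplectic_gradient(z))
gap_elsewhere = gap(z, (1.0, 1.0))
grid_value = F_conj_grid((1.0, -0.5))
```

The gap vanishes at `symplectic_gradient(z)` and is strictly positive at the other point. Evaluating `omega(z, zp)` instead of `omega(zp, z)` would break the equality, which illustrates why the argument order matters.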

Assume that the potential functions are smooth and that symplectic subdifferentials consist only of single-element sets (singletons). By abuse of language, we shall call in this paper the symplectic gradient of F the single element of the symplectic subdifferential

\partial^{ω}

, and denote it by

\nabla^{ω} F

:

\partial^{ω} F (z) = {\nabla^{ω} F (z)}

. (Our terminology and notation is thus not to be confused with the Hamiltonian vector field

X_{H}

of geometric mechanics).

4. Symplectic Fenchel–Young Divergences and Symplectic Bregman Divergences

Divergences are smooth dissimilarity functions (see Section 4.2 of [30]). From the symplectic Fenchel–Young inequality of Theorem 1, we can define the symplectic Fenchel–Young divergence as follows:

Definition 4

(Symplectic Fenchel–Young divergence). Let

F : Z = X \times Y \to R

be a smooth convex function. Then the symplectic Fenchel–Young divergence is the following non-negative measure of dissimilarity between z and

z^{'}

:

Y_{F} (z, z^{'}) = F (z) + F^{* ω} (z^{'}) - ω (z^{'}, z) \geq 0 .

(6)

We have

Y_{F} (z, z^{'}) = 0

if and only if

z^{'} \in \partial^{ω} F (z)

, i.e.,

z^{'} = \nabla^{ω} F (z)

when F is smooth.

Let us now define the symplectic Bregman divergence

B_{F}^{ω} (z_{1} : z_{2})

as

Y_{F} (z_{1}, z_{2}^{'})

where

z_{2}^{'} = \nabla^{ω} F (z_{2})

. Using the following identity derived from the symplectic Fenchel–Young equality:

F^{* ω} (\nabla^{ω} F (z)) = ω (\nabla^{ω} F (z), z) - F (z),

and the bilinearity of the symplectic form, we obtain:

\begin{matrix} B_{F}^{ω} (z_{1} : z_{2}) & = & Y_{F} (z_{1}, z_{2}^{'}), \\ & = & F (z_{1}) + F^{* ω} (z_{2}^{'}) - ω (z_{2}^{'}, z_{1}), \\ & = & F (z_{1}) + ω (\nabla^{ω} F (z_{2}), z_{2}) - F (z_{2}) - ω (\nabla^{ω} F (z_{2}), z_{1}), \\ & = & F (z_{1}) - F (z_{2}) - ω (\nabla^{ω} F (z_{2}), z_{1} - z_{2}) . \end{matrix}

(7)

Since

ω

is skew-symmetric, we can also rewrite Equation (7) equivalently as

B_{F}^{ω} (z_{1} : z_{2}) = F (z_{1}) - F (z_{2}) + ω (z_{1} - z_{2}, \nabla^{ω} F (z_{2})) .

(8)

Definition 5

(Symplectic Bregman divergence). Let

(Z = X \times Y, ω)

be a symplectic vector space. Then the symplectic Bregman divergence between

z_{1}

and

z_{2}

of Z induced by a smooth convex potential

F (z)

is

B_{F}^{ω} (z_{1} : z_{2}) = F (z_{1}) - F (z_{2}) - ω (\nabla^{ω} F (z_{2}), z_{1} - z_{2}),

where the symplectic subdifferential is the singleton

\partial^{ω} F (z) = {\nabla^{ω} F (z)}

.

Remark 3.

The ordinary Bregman divergences (BDs) have been generalized to non-smooth strictly convex potential functions using a subdifferential map in [31,32,33] to choose among several potential subgradients at a given location. Similarly, we can extend symplectic Bregman divergences to non-smooth strictly convex potential functions using a symplectic subdifferential map.

5. Particular Cases Recover Composite Bregman Divergences

When

Y = X

and

(X, 〈 \cdot, \cdot 〉)

is an inner-product space, we may consider the composite inner-product on

Z = X \times X

:

〈 〈 z_{1}, z_{2} 〉 〉 = 〈 x_{1}, x_{2} 〉 + 〈 y_{1}, y_{2} 〉,

with

z_{1} = (x_{1}, y_{1})

and

z_{2} = (x_{2}, y_{2})

.

Let

I : Z \to Z

be the linear function

I (z) = z

and denote by

J : Z \to Z

the linear function defined by

J (z) = J (x, y) = (- y, x) .

Notice that this definition of J makes sense because

X = Y

and thus

(- y, x) \in Z

. We check that we have

J^{2} (x, y) = J (- y, x) = (- x, - y) = - (x, y)

, i.e.,

J^{2} = - I

. Furthermore, we have

g (z_{1}, z_{2}) = ω (z_{1}, J z_{2}) = 〈 x_{1}, x_{2} 〉 + 〈 y_{1}, y_{2} 〉

which is a positive-definite inner product. That is, the automorphism J is a complex structure compatible with

ω

(J is a symplectomorphism).

We can express the symplectic form

ω (z_{1}, z_{2}) = 〈 x_{1}, y_{2} 〉 - 〈 x_{2}, y_{1} 〉

induced by the inner product using the composite inner product as follows:

\begin{matrix} ω (z_{1}, z_{2}) & = & 〈 x_{1}, y_{2} 〉 - 〈 x_{2}, y_{1} 〉 = 〈 〈 J (z_{1}), z_{2} 〉 〉, \\ ω (- J z_{1}, z_{2}) & = & 〈 〈 z_{1}, z_{2} 〉 〉 . \end{matrix}

Similarly, the symplectic subdifferential of F can be expressed using the ordinary subdifferential (and vice versa) as follows:

\begin{matrix} z^{'} \in \partial^{ω} F (z) & \Leftrightarrow & J (z^{'}) \in \partial F (z), \\ z^{'} \in \partial F (z) & \Leftrightarrow & - J (z^{'}) \in \partial^{ω} F (z) . \end{matrix}

When subdifferentials are singletons, we thus have

\begin{matrix} J (\nabla^{ω} F (z)) & = & \nabla F (z), \\ \nabla^{ω} F (z) & = & - J (\nabla F (z)) . \end{matrix}

Last, the symplectic Fenchel conjugate of F is related to the ordinary Fenchel conjugate

F^{*}

of F as follows:

F^{* ω} (z) = F^{*} (J (z)) .

Thus in that case the symplectic Bregman divergence amounts to an ordinary Bregman divergence:

\begin{matrix} B_{F}^{ω} (z_{1} : z_{2}) & = & F (z_{1}) - F (z_{2}) - ω (\nabla^{ω} F (z_{2}), z_{1} - z_{2}), \\ & = & F (z_{1}) - F (z_{2}) + ω (- \nabla^{ω} F (z_{2}), z_{1} - z_{2}), \\ & = & F (z_{1}) - F (z_{2}) + ω (J (\nabla F (z_{2})), z_{1} - z_{2}), \\ & = & F (z_{1}) - F (z_{2}) - 〈 〈 z_{1} - z_{2}, \nabla F (z_{2}) 〉 〉, \\ & = & B_{F} (z_{1} : z_{2}) . \end{matrix}

Property 1.

When the symplectic form ω is induced by an inner product

〈 \cdot, \cdot 〉

of X, the symplectic Bregman divergence

B_{F}^{ω} (z_{1} : z_{2})

between

z_{1} = (x_{1}, y_{1})

and

z_{2} = (x_{2}, y_{2})

of

Z = X \times X

amounts to an ordinary Bregman divergence with respect to the composite inner-product

〈 〈 z_{1}, z_{2} 〉 〉 = 〈 x_{1}, x_{2} 〉 + 〈 y_{1}, y_{2} 〉

:

B_{F}^{ω} (z_{1} : z_{2}) = B_{F} (z_{1} : z_{2}) = F (z_{1}) - F (z_{2}) - 〈 〈 z_{1} - z_{2}, \nabla F (z_{2}) 〉 〉 .

Furthermore, if the potential function

F (z)

is separable, i.e.,

F (z) = F_{1} (x) + F_{2} (y)

for Bregman generators

F_{1}

and

F_{2}

, then we have

B_{F}^{ω} (z_{1} : z_{2}) = B_{F_{1}} (x_{1} : x_{2}) + B_{F_{2}} (y_{1} : y_{2})

where the Bregman divergences

B_{F_{1}}

and

B_{F_{2}}

are defined with respect to the inner product of X.
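Property 1 can be illustrated numerically for X = Y = R with the product of reals as inner product. In the sketch below, the symplectic gradient is computed as -J applied to the ordinary gradient, and the symplectic Bregman divergence is compared with the ordinary one. The composite inner product pairs x with x and y with y, consistent with g(z1, z2) = ω(z1, J z2). The non-separable potential and all function names are assumptions made for illustration.

```python
import math


def F(z):
    # Hypothetical smooth, jointly convex, non-separable potential.
    x, y = z
    return math.exp(x) + math.exp(y) + 0.5 * (x + y) ** 2


def grad_F(z):
    x, y = z
    s = x + y
    return (math.exp(x) + s, math.exp(y) + s)


def omega(z1, z2):
    # Symplectic form induced by the product of reals: x1 * y2 - x2 * y1.
    return z1[0] * z2[1] - z2[0] * z1[1]


def composite_inner(z1, z2):
    # <<z1, z2>> = <x1, x2> + <y1, y2>
    return z1[0] * z2[0] + z1[1] * z2[1]


def symplectic_gradient(z):
    # J(x, y) = (-y, x), hence -J(x, y) = (y, -x), applied to grad F(z).
    gx, gy = grad_F(z)
    return (gy, -gx)


def symplectic_bregman(z1, z2):
    d = (z1[0] - z2[0], z1[1] - z2[1])
    return F(z1) - F(z2) - omega(symplectic_gradient(z2), d)


def ordinary_bregman(z1, z2):
    d = (z1[0] - z2[0], z1[1] - z2[1])
    return F(z1) - F(z2) - composite_inner(d, grad_F(z2))


z1, z2 = (0.4, -1.0), (-0.7, 0.3)
sb = symplectic_bregman(z1, z2)
ob = ordinary_bregman(z1, z2)
```

Both quantities agree up to rounding and are positive for distinct points, as expected for a strictly convex potential.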

Notice that the symplectic Fenchel–Young inequality can be rewritten using the ordinary Fenchel–Young inequality and the linear function J as:

\begin{matrix} F (z) + F^{* ω} (z^{'}) & \geq & ω (z^{'}, z), \\ F (z) + F^{*} (J (z^{'})) & \geq & 〈 〈 J (z^{'}), z 〉 〉 . \end{matrix}

6. Summary, Discussion, and Perspectives

Since their inception in operations research, Bregman divergences [19] have proven instrumental in many scientific fields including information theory, statistics, and machine learning, to cite just a few. Let

(X, 〈 \cdot, \cdot 〉)

be a Hilbert space, and

F : X \to R

a strictly convex and smooth real-valued function. Then the Bregman divergence induced by F is defined in [19] (1967) by

B_{F} (x_{1} : x_{2}) = F (x_{1}) - F (x_{2}) - 〈 x_{1} - x_{2}, \nabla F (x_{2}) 〉 .

In this work, we consider finite-dimensional vector spaces equipped with an inner product.

In information geometry [22,34,35], a smooth dissimilarity

D (p, q)

between two points p and q on an n-dimensional smooth manifold M induces a statistical structure on the manifold [36], i.e., a triplet

(g, \nabla, \nabla^{*})

where the Riemannian metric tensor g and the torsion-free affine connections ∇ and

\nabla^{*}

are induced by the divergence

D

. The duality in information geometry is expressed by the fact that the mid-connection

\frac{\nabla + \nabla^{*}}{2}

corresponds to the Levi-Civita connection induced by g. To build the divergence-based information geometry [37], the divergence

D (p, q)

is interpreted as a scalar function on the product manifold

M \times M

of dimension

2 n

. Thus, the divergence

D

is called a contrast function [36] or yoke [38]. Conversely, a statistical structure

(g, \nabla, \nabla^{*})

on an n-dimensional manifold M induces a contrast function [39]. When the statistical manifold

(M, g, \nabla, \nabla^{*})

is dually flat with

θ (\cdot)

the global ∇-affine coordinate system and

η (\cdot)

the global

\nabla^{*}

-affine coordinate system [40], there exist two dual global potential functions

ϕ

and

ϕ^{*}

on the manifold M such that

ϕ (p) = F (θ (p))

and

ϕ^{*} (p) = F^{*} (η (p))

where

F^{*} (η)

is the Legendre–Fenchel convex conjugate of

F (θ)

. The canonical dually flat divergence on M is then defined by

D (p, q) = F (θ (p)) + F^{*} (η (q)) - \sum_{i = 1}^{n} θ^{i} (p) η_{i} (q),

and amounts to a Fenchel–Young divergence or equivalently a Bregman divergence:

D (p, q) = Y_{F} (θ (p) : η (q)) = B_{F} (θ (p) : θ (q)),

where the Fenchel–Young divergence is defined by

Y_{F} (θ : η^{'}) = F (θ) + F^{*} (η^{'}) - \sum_{i = 1}^{n} θ^{i} η_{i}^{'} .
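The equivalence between the Fenchel–Young form in mixed coordinates and the Bregman form in θ-coordinates can be verified on a one-dimensional example. The sketch below uses the log-partition function of the Bernoulli family, F(θ) = log(1 + exp θ), whose conjugate is the negative binary entropy. This choice of generator and the function names are assumptions made for illustration.

```python
import math


def F(theta):
    # Illustrative generator: Bernoulli log-partition function.
    return math.log1p(math.exp(theta))


def grad_F(theta):
    # eta = F'(theta) is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-theta))


def F_star(eta):
    # Convex conjugate on 0 < eta < 1: negative binary entropy.
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)


def fenchel_young(theta, eta_prime):
    # Y_F(theta : eta') = F(theta) + F*(eta') - theta * eta'
    return F(theta) + F_star(eta_prime) - theta * eta_prime


def bregman(theta1, theta2):
    return F(theta1) - F(theta2) - (theta1 - theta2) * grad_F(theta2)


theta_p, theta_q = 0.7, -1.3
eta_q = grad_F(theta_q)
fy = fenchel_young(theta_p, eta_q)
bd = bregman(theta_p, theta_q)
```

The values `fy` and `bd` coincide up to rounding, and the Fenchel–Young divergence vanishes when its two arguments are dual coordinates of the same point.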

The Riemannian metric g of a dually flat space can be expressed as

g = \nabla d ϕ = \nabla^{*} d ϕ^{*}

or in the

θ

-coordinates by

g_{i j} (θ) = \frac{\partial^{2}}{\partial θ_{i} \partial θ_{j}} F (θ)

and in the

η

-coordinates by

g_{i j} (η) = \frac{\partial^{2}}{\partial η_{i} \partial η_{j}} F^{*} (η)

. That is, g is a Hessian metric [40],

(g, \nabla)

a Hessian structure and

(g, \nabla^{*})

a dual Hessian structure. In differential geometry,

(M, g, \nabla)

is called a Hessian manifold which admits a dual Hessian structure

(g, \nabla^{*})

. In particular, a Hessian manifold is of Koszul type [40] when there exists a closed 1-form

α

such that

g = \nabla α

.

Remark 4.

Notice that the potential functions F and

F^{*}

are not defined uniquely although the potential functions ϕ and

ϕ^{*}

on the manifold are. Indeed, consider the generator

\bar{F} (θ) = F (A θ + b) + 〈 c, θ 〉 + d

for invertible matrix

A \in GL (d, R)

, vectors

b, c \in R^{d}

and scalar

d \in R

. The gradient of the generator

\bar{F}

is

η = \nabla \bar{F} (θ) = A^{⊤} \nabla F (A θ + b) + c

. Solving the equation

\nabla \bar{F} (θ) = η

yields the reciprocal gradient

θ (η) = \nabla \bar{G} (η) = A^{- 1} (\nabla G (A^{- ⊤} (η - c)) - b)

from which the Legendre convex conjugate is obtained as

\bar{G} (η) = 〈 η, \nabla \bar{G} (η) 〉 - \bar{F} (\nabla \bar{G} (η))

. We have

B_{F} (θ_{1} : θ_{2}) = B_{\bar{F}} ({\bar{θ}}_{1} : {\bar{θ}}_{2})

where

\bar{θ} = A^{- 1} (θ - b)

.
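The invariance stated in Remark 4 can be checked in one dimension, where the invertible matrix reduces to a nonzero scalar. The sketch below uses the generator F(θ) = exp(θ) and arbitrary constants. These choices and the names `A`, `B_SHIFT`, `C`, `D`, and `to_bar` are assumptions made for illustration.

```python
import math


def bregman(F, grad_F, t1, t2):
    return F(t1) - F(t2) - (t1 - t2) * grad_F(t2)


def F(t):
    # Illustrative generator.
    return math.exp(t)


def grad_F(t):
    return math.exp(t)


# Hypothetical affine reparameterization constants (A must be nonzero).
A, B_SHIFT, C, D = 2.0, 0.5, -1.0, 3.0


def F_bar(t):
    # F_bar(t) = F(A t + b) + c t + d
    return F(A * t + B_SHIFT) + C * t + D


def grad_F_bar(t):
    # Chain rule: A F'(A t + b) + c
    return A * grad_F(A * t + B_SHIFT) + C


def to_bar(theta):
    # theta_bar = A^{-1} (theta - b)
    return (theta - B_SHIFT) / A


theta1, theta2 = 1.2, -0.4
lhs = bregman(F, grad_F, theta1, theta2)
rhs = bregman(F_bar, grad_F_bar, to_bar(theta1), to_bar(theta2))
```

The affine term and the constant cancel inside the Bregman divergence, so `lhs` and `rhs` agree up to rounding.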

It has been shown that a divergence

D

also allows one to define a symplectic structure

ω

on a statistical manifold [38,41]. The symplectic vector space

(R^{2 n}, ω_{0})

viewed as a symplectic manifold has symplectic form

ω_{0} = \sum_{i = 1}^{n} d x^{i} \land d y_{i} = - d (\sum_{i = 1}^{n} y_{i} d x^{i})

. There are no local invariants but only global invariants on symplectic manifolds (symplectic topology). That is, a symplectic structure is flat.

In this expository paper, we have defined symplectic Fenchel–Young divergences and equivalent symplectic Bregman divergences by following the study of geometric mechanics reported in [13,14]. The symplectic Bregman divergence between two points

z_{1}

and

z_{2}

on a symplectic vector space

(Z, ω)

induced by a convex potential function

F : Z \to R

is defined by

B_{F}^{ω} (z_{1} : z_{2}) = F (z_{1}) - F (z_{2}) + ω (z_{1} - z_{2}, \nabla^{ω} F (z_{2})),

where

\nabla^{ω} F

has been called the symplectic gradient in this paper, and assumed to be the unique element of the symplectic subdifferential at any

z \in Z

, i.e.,

\partial^{ω} F (z) = {\nabla^{ω} F (z)}

. Symplectic Bregman divergences are used to define Bregman divergences on dual systems (Figure 3). In the particular case of dual system

(X, X, 〈 \cdot, \cdot 〉)

, we recover ordinary Bregman divergences with composite inner products.

In finite

2 n

-dimensional symplectic vector spaces, linear symplectic forms

ω

can be represented by non-degenerate skew-symmetric matrices, and their linear symplectomorphisms by symplectic matrices of the matrix group

Sp (2 n)

. Buliga and de Saxcé [13,14] considered geometric mechanics with dissipative terms, and stated the following “symplectic Brezis–Ekeland–Nayroles principle” (SBEN principle for short):

Definition 6

(SBEN principle [13,14]). The natural evolution path

z (t) = z_{rev} (t) + z_{irr} (t) \in Z

for

t \in [0, T]

in a geometric mechanical system with convex dissipation potential

ϕ (z)

minimizes among all admissible paths

\int_{0}^{T} Y_{F}^{ω} (\dot{z} (t), {\dot{z}}_{irr} (t)) d t

and satisfies

Y_{F}^{ω} (\dot{z} (t), {\dot{z}}_{irr} (t)) = 0

for all

t \in [0, T]

, where

Y_{F}^{ω}

denotes the symplectic Fenchel–Young divergence induced by ϕ, and

z_{rev} (t)

and

z_{irr} (t)

are the reversible and irreversible parts of the particle

z (t)

, respectively.

The decomposition of

z = z_{rev} + z_{irr}

into two parts can be interpreted as Moreau’s proximation [42,43] associated to the potential function

ϕ

: Indeed, let

F (z)

be a convex function of

Z = R^{d}

. Then for all

z \in R^{d}

, we can uniquely decompose z as

z = \bar{z} + z^{*}

such that

F (\bar{z}) + F^{*} (z^{*}) = 〈 \bar{z}, z^{*} 〉

(Fenchel–Young equality) where

z^{*} = \nabla F (\bar{z})

(see Proposition in Section 4 of [42]). The part $\bar{z}$ is called the proximation with respect to F, and the part

z^{*}

is the proximation with respect to the convex conjugate

F^{*}

.
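Moreau's decomposition can be made concrete on a scalar quadratic potential, for which the proximation has a closed form. In the sketch below, a point z is split into a part u (the proximation with respect to F) and a part u* (the proximation with respect to the conjugate), and the Fenchel–Young equality between the two parts is checked. The potential F(u) = (λ/2) u^2, the value of λ, and the names `LAM`, `prox_F`, and `moreau_decomposition` are assumptions made for illustration.

```python
LAM = 3.0  # hypothetical curvature parameter of the quadratic potential


def F(u):
    return 0.5 * LAM * u * u


def F_star(v):
    # Convex conjugate of F for the product of reals.
    return v * v / (2.0 * LAM)


def grad_F(u):
    return LAM * u


def prox_F(z):
    # argmin_u { F(u) + (u - z)^2 / 2 }, in closed form for this quadratic F.
    return z / (1.0 + LAM)


def moreau_decomposition(z):
    u = prox_F(z)
    u_star = z - u  # remaining part, the proximation with respect to F*
    return u, u_star


z = 2.0
u, u_star = moreau_decomposition(z)
fenchel_young_gap = F(u) + F_star(u_star) - u * u_star
```

For z = 2 and λ = 3 the two parts are 0.5 and 1.5, the second part equals the gradient of F at the first, and the Fenchel–Young gap is zero.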

We may consider the non-separable potential functions

F (z) = F (x, y) = x f (y / x)

which are obtained from the perspective transform [44,45] of arbitrary convex functions

f (u)

to define symplectic Bregman divergences. The perspective functions

F (x, y)

are jointly convex if and only if their corresponding generators f are convex. Such perspective transforms play a fundamental role in information theory [46] and information geometry [22].
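As a hedged illustration of the perspective transform, the sketch below builds F(x, y) = x f(y / x) from the convex generator f(u) = u log u, which yields y log(y / x), and spot-checks joint convexity through the midpoint inequality on random pairs of points with positive coordinates. The generator, the sampling box, and the function names are assumptions, and a finite random test is only a sanity check, not a proof.

```python
import math
import random


def f(u):
    # Illustrative convex generator on u > 0.
    return u * math.log(u)


def perspective(x, y):
    # F(x, y) = x f(y / x), defined here for x > 0 and y > 0.
    return x * f(y / x)


def midpoint_gap(a, b):
    # (F(a) + F(b)) / 2 - F((a + b) / 2), non-negative when F is convex.
    mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return 0.5 * (perspective(*a) + perspective(*b)) - perspective(*mid)


rng = random.Random(0)  # fixed seed for reproducibility


def sample_point():
    return (rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))


pairs = [(sample_point(), sample_point()) for _ in range(1000)]
min_gap = min(midpoint_gap(a, b) for a, b in pairs)
```

The smallest midpoint gap over the sampled pairs is non-negative up to rounding, consistent with joint convexity of the perspective of a convex generator.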

In machine learning, symplectic geometry has been used for designing accelerated optimization methods [16,47] (Bregman–Lagrangian framework) and physics-informed neural networks [17,18] (PINNs).

This paper aims to spur interest in either designing or defining symplectic divergences from first principles, and to demonstrate their roles when studying thermodynamics [48] or the learning dynamics of ML and AI systems.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

Author Frank Nielsen is employed by the company Sony Computer Science Laboratories Inc. The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author declares no conflicts of interest.

Figure 1. The motion of a single point particle $q(t)$ with mass $m$ and momentum $p(t) = m \dot{q}(t)$ on a 1D line can be modeled as a curve $C = \{c(t) = (q(t), p(t)) : t \in T \subset \mathbb{R}\}$ in the phase space $\mathbb{R}^{2}$.
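
As an illustration of this caption, the phase-space curve can be sampled numerically. The following is a minimal sketch only: the trajectory $q(t) = \cos t$, the mass value, the time grid, and the helper name `phase_space_curve` are assumptions chosen for the example and do not come from the article.

```python
import math


def phase_space_curve(q, q_dot, mass, times):
    """Sample c(t) = (q(t), p(t)) with momentum p(t) = mass * q_dot(t)."""
    return [(q(t), mass * q_dot(t)) for t in times]


# Hypothetical trajectory (an assumption for illustration): q(t) = cos(t),
# so the velocity is q_dot(t) = -sin(t).
mass = 2.0
times = [k * math.pi / 4 for k in range(9)]  # samples of T = [0, 2*pi]
curve = phase_space_curve(math.cos, lambda t: -math.sin(t), mass, times)

for t, (q_t, p_t) in zip(times, curve):
    print(f"t={t:.3f}  q={q_t:+.3f}  p={p_t:+.3f}")
```

For this assumed trajectory, every sampled point satisfies $q^2 + (p/m)^2 = 1$, so the curve traced in the phase space $\mathbb{R}^2$ is an ellipse.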

Figure 2. Interpreting a 2D symplectic form $\omega(z_{1}, z_{2})$ as the signed area of a parallelogram with first oriented edge $z_{1}$ (grey). A pair of vectors defines two possible orientations of the parallelogram: the orientation compatible with $z_{1}$ and the reverse orientation compatible with $z_{2}$. The form $\omega$ is called the standard area form.
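
The signed-area reading of this caption can be checked on a small example. The sketch below assumes the usual coordinate expression of the standard area form on $\mathbb{R}^2$, $\omega(z_1, z_2) = q_1 p_2 - p_1 q_2$ with $z_i = (q_i, p_i)$; the function name `omega` and the sample vectors are chosen here for illustration.

```python
def omega(z1, z2):
    """Standard area form on R^2 (assumed convention): q1*p2 - p1*q2."""
    q1, p1 = z1
    q2, p2 = z2
    return q1 * p2 - p1 * q2


# Illustrative vectors (assumptions, not taken from the article).
z1 = (2.0, 0.0)
z2 = (1.0, 3.0)

area = omega(z1, z2)     # orientation compatible with z1 as first edge
reverse = omega(z2, z1)  # reverse orientation, with z2 as first edge

print(area, reverse)
```

Swapping the two arguments flips the sign of the area, which is the skew-symmetry of $\omega$; in particular, $\omega(z, z) = 0$ for every $z$.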

Figure 3. Bregman divergences generalized to dual systems $(X, Y, b(\cdot, \cdot))$: a symplectic form $\omega$ on the space $Z = X \oplus Y$ is induced by the pairing product. The Bregman divergence on the dual system is then defined as the symplectic Bregman divergence on the symplectic vector space $(Z, \omega)$.
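
To make the construction in this caption concrete, the sketch below builds a symplectic form from a pairing. It assumes the common convention $\omega((x_1, y_1), (x_2, y_2)) = b(x_1, y_2) - b(x_2, y_1)$ and uses the Euclidean dot product as a hypothetical pairing $b$; the names `pairing` and `induced_omega` and the sample vectors are illustrative only.

```python
def pairing(x, y):
    """Hypothetical dual pairing b(x, y): the Euclidean dot product."""
    return sum(xi * yi for xi, yi in zip(x, y))


def induced_omega(z1, z2, b=pairing):
    """Form on Z = X (+) Y induced by b (assumed convention):
    omega((x1, y1), (x2, y2)) = b(x1, y2) - b(x2, y1).
    """
    (x1, y1), (x2, y2) = z1, z2
    return b(x1, y2) - b(x2, y1)


# Illustrative elements z = (x, y) with x in X = R^2 and y in Y = R^2.
z1 = ((1.0, 0.0), (0.0, 2.0))
z2 = ((0.0, 1.0), (3.0, 0.0))

value = induced_omega(z1, z2)    # b(x1, y2) - b(x2, y1) = 3 - 2
swapped = induced_omega(z2, z1)  # skew-symmetry gives the opposite sign

print(value, swapped)
```

Passing a different bilinear function as `b` changes the induced form without touching `induced_omega`, which mirrors how the symplectic form depends only on the chosen pairing.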

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
