Generalized Legendre Transforms Have Roots in Information Geometry^†

Frank Nielsen

Sony Computer Science Laboratories, 3-14-13 Higashi Gotanda, Shinagawa Ku, Tokyo 141-0022, Japan

^† This paper is dedicated to Professor Shun-ichi Amari in honor of his 90th birthday.

Entropy 2026, 28(1), 44; https://doi.org/10.3390/e28010044 (registering DOI)

Submission received: 5 December 2025 / Revised: 28 December 2025 / Accepted: 28 December 2025 / Published: 30 December 2025

(This article belongs to the Special Issue SUURI of Information Geometry: Dedicated to SUURI Engineer Professor Shun’ichi Amari on the Occasion of His 90th Birthday)

Abstract

Artstein-Avidan and Milman [Annals of Mathematics (2009), 169:661–674] characterized invertible reverse-ordering transforms on the space of lower semi-continuous extended real-valued convex functions as affine deformations of the ordinary Legendre transform. In this work, we first prove that all those generalized Legendre transforms of functions correspond to the ordinary Legendre transform of dually corresponding affine-deformed functions. In short, generalized convex conjugates are ordinary convex conjugates of dually affine-deformed functions. Second, we explain how these generalized Legendre transforms can be derived from the dual Hessian structures of information geometry.

Keywords:

reverse-ordering and convex duality; Legendre transform; information geometry; dually flat space and Hessian manifold; Bregman and Fenchel–Young divergences; affine and curvilinear coordinate systems

1. Introduction

Let

R

denote the field of real numbers and

\bar{R} = R \cup \{\pm \infty\}

be the extended real line. An m-variate extended function

F : R^{m} \to \bar{R}

is proper when its effective domain

Θ = dom (F) := \{θ : F (θ) < + \infty\}

is non-empty and lower semi-continuous (lsc.) when its epigraph

epi (F) := \{(θ, y) : y \geq F (θ), θ \in dom (F)\}

is closed with respect to the metric topology of

R^{m}

. We denote by

Γ_{0}

(shortcut for

Γ_{0} (R^{m})

) the space of proper lower semi-continuous extended real-valued convex functions.

Consider the Legendre–Fenchel transform [1] (LFT)

L F

of a function F:

(L F) (η) : = sup_{θ \in R^{m}} \{〈 θ, η 〉 - F (θ)\},

(1)

where

〈 v, v^{'} 〉 : = \sum_{i = 1}^{m} v_{i} v_{i}^{'}

denotes the Euclidean inner product. The function

F^{*} = L F

is called the convex conjugate function. Some examples of convex conjugates are reported in Appendix B.
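As a rough numerical illustration of Equation (1), the supremum can be approximated by a maximum over a finite grid in the scalar case m = 1. The sketch below is only an illustration: the grid bounds and step, and the helper name `legendre_fenchel`, are assumptions of this example and not part of the paper. It compares grid values with the closed forms F*(η) = η²/2 for F(θ) = θ²/2 and F*(η) = η log η − η for F(θ) = exp θ.

```python
import math

def legendre_fenchel(F, eta, thetas):
    """Approximate (L F)(eta) = sup_theta {theta*eta - F(theta)} by a max over a grid."""
    return max(theta * eta - F(theta) for theta in thetas)

# Hypothetical grid on [-10, 10] with step 0.001 (an assumption of this sketch).
GRID = [i / 1000.0 for i in range(-10000, 10001)]

def half_square(theta):
    return 0.5 * theta * theta

# Closed forms for comparison: eta^2 / 2 and eta*log(eta) - eta.
conj_square = legendre_fenchel(half_square, 1.5, GRID)
conj_exp = legendre_fenchel(math.exp, 2.0, GRID)

print(conj_square, 0.5 * 1.5 ** 2)
print(conj_exp, 2.0 * math.log(2.0) - 2.0)
```

The grid value is only meaningful when the maximizer lies inside the grid; for steep functions or large |η| the bounds would have to be widened.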

The Legendre transform originated in the work of Adrien-Marie Legendre [2] circa 1805. Legendre considered variational calculus problems and introduced the variable transform. The Legendre transform was later formalized by Hamilton [3] and Gibbs [4] as a way to switch between equivalent descriptions of a system by exchanging variables like position ↔ momentum or energy ↔ entropy. This variable duality principle induced by the Legendre transform turned out to be fundamental in many scientific fields, like thermodynamics [5] for relating thermodynamic potentials, quantum field theory [6] for connecting Lagrangian and Hamiltonian formulations (and underlying effective actions), economics for relating expenditure with utility functions [7], and convex optimization [8], where it is the cornerstone of duality theory in convex analysis.

The Moreau–Fenchel–Rockafellar theorem [9] states that when a function

F \in Γ_{0}

, its biconjugate function

{(F^{*})}^{*}

coincides with the function F:

{F^{*}}^{*} = F

. That is, the Legendre transform is an involutive transform of

Γ_{0}

. For general lsc. functions F (possibly non-convex), it is known that the biconjugate function

{(F^{*})}^{*} : = L (L F)

is the largest lsc. convex function that lower bounds F:

{(F^{*})}^{*} \leq F

. This lower-bound convexification property has proven useful in machine learning [10].
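The convexification property can be observed numerically on a non-convex function. The following sketch, which assumes a scalar double-well function F(θ) = (θ² − 1)² restricted to a hypothetical grid on [−2, 2], applies a discrete Legendre–Fenchel transform twice; the grids and the helper `conjugate_on_grid` are illustrative choices, not taken from the paper.

```python
def conjugate_on_grid(values, xs, ys):
    """Discrete LFT: for each y, the max over grid points x of x*y - values[x]."""
    return [max(x * y - v for x, v in zip(xs, values)) for y in ys]

xs = [i / 50.0 for i in range(-100, 101)]   # theta grid on [-2, 2]
ys = [j / 10.0 for j in range(-300, 301)]   # eta grid on [-30, 30], covers the slopes of F

double_well = [(x * x - 1.0) ** 2 for x in xs]   # non-convex, minima at theta = -1 and 1
conj = conjugate_on_grid(double_well, xs, ys)
biconj = conjugate_on_grid(conj, ys, xs)

mid = len(xs) // 2   # index of theta = 0
print("F(0) =", double_well[mid], " biconjugate at 0 =", biconj[mid])
```

Between the two wells the biconjugate flattens to the common minimum value, while it agrees with F at the grid end points where F is already convex.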

This study was motivated by the fundamental result of Artstein-Avidan and Milman [11,12], who proved the following theorem:

Theorem 1

([12], Theorem 7). Let

T

be an invertible transform such that the following are true:

$F_{1} \leq F_{2} \Rightarrow T F_{2} \leq T F_{1}$ ;
$T F_{1} \leq T F_{2} \Rightarrow F_{2} \leq F_{1}$ .

Then,

T

is a generalized Legendre–Fenchel transform (GLFT), written canonically as

(T F) (η) = λ (L F) (E η + f) + 〈 η, g 〉 + h,

(2)

where

λ > 0

,

E \in GL (R^{m})

(general linear group),

f, g \in R^{m}

, and

h \in R

.
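To make Equation (2) concrete, here is a minimal scalar (m = 1) sketch of a generalized Legendre–Fenchel transform built from closed-form conjugates, together with a spot check of the order-reversing property on sample points. The function name `glft` and the parameter values are hypothetical; the pair F₁(θ) = θ²/2 ≤ F₂(θ) = θ² is chosen only because its conjugates η²/2 and η²/4 are known in closed form.

```python
def glft(conj, lam, E, f, g, h):
    """Return eta -> lam * conj(E*eta + f) + g*eta + h (Equation (2) with m = 1)."""
    return lambda eta: lam * conj(E * eta + f) + g * eta + h

# F1(theta) = theta^2 / 2 <= F2(theta) = theta^2, with closed-form conjugates.
conj_F1 = lambda eta: 0.5 * eta * eta    # (L F1)(eta) = eta^2 / 2
conj_F2 = lambda eta: 0.25 * eta * eta   # (L F2)(eta) = eta^2 / 4

params = dict(lam=2.0, E=-3.0, f=0.5, g=1.0, h=-4.0)   # illustrative values, lam > 0, E != 0
T_F1 = glft(conj_F1, **params)
T_F2 = glft(conj_F2, **params)

etas = [k / 10.0 for k in range(-50, 51)]
order_reversed = all(T_F2(eta) <= T_F1(eta) for eta in etas)
print("T F2 <= T F1 on the sample:", order_reversed)
```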

We note in passing that Fenchel [13,14] interpreted the graph of the Legendre transform of an m-variate function

F \in Γ_{0}

as the polarity with respect to the paraboloid surface of

R^{m + 1}

P = \{(θ, y = \frac{1}{2} \sum_{i = 1}^{m} θ_{i}^{2}) : θ \in R^{m}\} \subset R^{m + 1},

of the graph of F. This geometric polarity connection of the Legendre transform is all the more interesting since Böröczky and Schneider [15] characterized the duality mapping that sends convex bodies of a Euclidean space containing the origin in their interior to convex bodies of the same space [11].

In this work, we shall first prove that a generalized convex conjugate

T F

obtained by Equation (2) can always be expressed as the ordinary convex conjugate of a corresponding affine-deformed function. Namely, we shall show that

(T F) (η) = (L (λ F (A θ + b) + 〈 θ, c 〉 + d)) (η),

where

A \in GL (R^{m})

,

b, c \in R^{m}

and

d \in R

are defined according to

E, f, g, h

(details reported in Theorem 2). This equivalence result allows us to interpret the origin of the generalized Legendre transforms from the viewpoint of information geometry [16,17] and untangle the various degrees of freedom used when defining generalized Legendre transforms in Section 3.

2. Generalized Legendre Transforms as Ordinary Legendre Transforms

Consider a parameter

P = (λ, A, b, c, d) \in P : = R_{> 0} \times GL (R^{m}) \times R^{m} \times R^{m} \times R,

and deform a function

F (θ)

by carrying out affine transformations of both the parameter argument and its output as follows:

F_{P} (θ) : = λ F (A θ + b) + 〈 θ, c 〉 + d .

(3)

Those affine deformations preserve convexity:

Property 1

(Convexity-preserving affine deformations). Let

F \in Γ_{0}

. Then,

F_{P}

belongs to

Γ_{0}

for all

P \in P

.

Proof.

Let us check the convexity of

F_{P}

:

α F_{P} (θ_{1}) + (1 - α) F_{P} (θ_{2}) = λ (α F ({\bar{θ}}_{1}) + (1 - α) F ({\bar{θ}}_{2})) + 〈 α θ_{1} + (1 - α) θ_{2}, c 〉 + d,

where

α \in [0, 1]

,

{\bar{θ}}_{1} := A θ_{1} + b

and

{\bar{θ}}_{2} := A θ_{2} + b

. Since F is convex, we have

α F ({\bar{θ}}_{1}) + (1 - α) F ({\bar{θ}}_{2}) \geq F (α {\bar{θ}}_{1} + (1 - α) {\bar{θ}}_{2})

. Let

{\bar{θ}}_{α} := α {\bar{θ}}_{1} + (1 - α) {\bar{θ}}_{2} = A θ_{α}^{'} + b

with

θ_{α}^{'} := α θ_{1} + (1 - α) θ_{2}

. Since λ > 0, we obtain

α F_{P} (θ_{1}) + (1 - α) F_{P} (θ_{2}) \geq λ F (A θ_{α}^{'} + b) + 〈 θ_{α}^{'}, c 〉 + d = F_{P} (α θ_{1} + (1 - α) θ_{2}),

hence proving that

F_{P}

is convex. Since the lower semi-continuous property is ensured (i.e., the epigraph of

F_{P}

is closed), we conclude that

F_{P} \in Γ_{0}

. □
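Property 1 can be spot-checked numerically in the scalar case. The sketch below builds F_P from Equation (3) for F = exp and evaluates the Jensen gap α F_P(θ₁) + (1 − α) F_P(θ₂) − F_P(α θ₁ + (1 − α) θ₂) on a few sample points; the helper names and parameter values are assumptions made for this illustration, and a finite sample is of course not a proof.

```python
import math

def affine_deform(F, lam, A, b, c, d):
    """Return F_P(theta) = lam * F(A*theta + b) + c*theta + d for scalar theta."""
    return lambda theta: lam * F(A * theta + b) + c * theta + d

F_P = affine_deform(math.exp, lam=2.0, A=-1.5, b=0.3, c=0.7, d=-1.0)

def jensen_gap(G, t1, t2, alpha):
    """alpha*G(t1) + (1-alpha)*G(t2) - G(alpha*t1 + (1-alpha)*t2); nonnegative if G is convex."""
    return alpha * G(t1) + (1 - alpha) * G(t2) - G(alpha * t1 + (1 - alpha) * t2)

points = [-2.0, -0.5, 0.0, 1.0, 2.5]
alphas = [0.1, 0.5, 0.9]
gaps = [jensen_gap(F_P, t1, t2, a) for t1 in points for t2 in points for a in alphas]
print("smallest Jensen gap on the sample:", min(gaps))
```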

Next, let us express the convex conjugate of a function

F_{P}

according to the ordinary convex conjugate

F^{*} = L F

:

Proposition 1

(LFT of an affine-deformed function). The Legendre transform of

F_{P}

with

P = (λ, A, b, c, d) \in P

is

{(F_{P})}^{*} = {(F^{*})}_{P^{⋄}}

where

P^{⋄} := (λ, \frac{1}{λ} A^{- ⊤}, - \frac{1}{λ} A^{- ⊤} c, - A^{- 1} b, 〈 A^{- 1} b, c 〉 - d) \in P .

(4)

That is, we have

L (F_{P}) = {(L F)}_{P^{⋄}}

.

Proof.

Let

Γ_{1} \subset Γ_{0}

be defined as the class of differentiable, strictly convex Legendre-type functions [18] (see Appendix A). In general, given a function

F (θ) \in Γ_{1}

, we proceed as follows to calculate its convex conjugate. First, we find the inverse function

{(\nabla F)}^{- 1}

of its gradient

\nabla F

to obtain the gradient of the convex conjugate

\nabla F^{*} = {(\nabla F)}^{- 1}

. Then, we have

F^{*} (η) = 〈 θ, η 〉 - F (θ) = 〈 {(\nabla F)}^{- 1} (η), η 〉 - F ({(\nabla F)}^{- 1} (η)) .

For parameter

P = (λ, A, b, c, d)

, consider the strictly convex and differentiable function

F_{P} (θ) = λ F (A θ + b) + 〈 c, θ 〉 + d

for an invertible matrix

A \in GL (R^{m})

, vectors

b, c \in R^{m}

, and scalars

d \in R

and

λ \in R_{> 0}

. The gradient of

F_{P}

is

η = \nabla F_{P} (θ) = λ A^{⊤} \nabla F (A θ + b) + c .

We denote by

G = F^{*}

and

G_{P} = {(F_{P})}^{*}

the Legendre convex conjugates of F and

F_{P}

, respectively. By solving the equation

\nabla F_{P} (θ) = η

, we find the reciprocal gradient

θ (η) = \nabla G_{P} (η)

:

\nabla G_{P} (η) = A^{- 1} (\nabla G (\frac{1}{λ} A^{- ⊤} (η - c)) - b) .

Therefore, writing

u := \nabla G (\frac{1}{λ} A^{- ⊤} (η - c))

so that

\nabla G_{P} (η) = A^{- 1} (u - b)

, the Legendre convex conjugate is obtained as follows:

\begin{matrix} G_{P} (η) & = & 〈 η, \nabla G_{P} (η) 〉 - F_{P} (\nabla G_{P} (η)), \\ & = & 〈 η - c, A^{- 1} (u - b) 〉 - λ F (u) - d, \\ & = & λ G (\frac{1}{λ} A^{- ⊤} (η - c)) - 〈 A^{- 1} b, η 〉 + 〈 A^{- 1} b, c 〉 - d, \\ & = & λ^{'} G (A^{'} η + b^{'}) + 〈 c^{'}, η 〉 + d^{'}, \end{matrix}

where

\begin{matrix} λ^{'} & = & λ, \\ A^{'} & = & \frac{1}{λ} A^{- ⊤}, \\ b^{'} & = & - \frac{1}{λ} A^{- ⊤} c, \\ c^{'} & = & - A^{- 1} b, \\ d^{'} & = & 〈 A^{- 1} b, c 〉 - d . \end{matrix}

Hence, it follows that we have

P^{⋄} = (λ, \frac{1}{λ} A^{- ⊤}, - \frac{1}{λ} A^{- ⊤} c, - A^{- 1} b, 〈 A^{- 1} b, c 〉 - d) \in P .

□
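A quick numerical sanity check of Proposition 1 can be run in dimension m = 2 with a non-symmetric matrix A. The sketch below is an illustration under stated assumptions, not the paper's code: it takes the self-conjugate potential F(θ) = ½∥θ∥² (so that F* = F), evaluates (F_P)* at η = ∇F_P(θ) through the Fenchel–Young equality (F_P)*(η) = 〈θ, η〉 − F_P(θ), and compares the result with (F*)_{P^⋄}(η). The dual parameter is written with the inverse transpose A^{-⊤} in its matrix and first vector slots and with 〈A^{-1} b, c〉 in its scalar slot, which is what the gradient equation η = λ A^⊤ ∇F(Aθ + b) + c leads to; for symmetric A these reduce to expressions in A^{-1} alone. All helper names and numerical values are hypothetical.

```python
def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def transpose(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

def inverse(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def F(v):
    """Self-conjugate potential F(v) = |v|^2 / 2, so F* = F and grad F is the identity."""
    return 0.5 * dot(v, v)

lam, A, b, c, d = 2.0, [[1.0, 2.0], [0.0, 3.0]], [0.5, -1.0], [1.5, 0.25], -0.75

def F_P(theta):
    u = [x + y for x, y in zip(matvec(A, theta), b)]
    return lam * F(u) + dot(theta, c) + d

def grad_F_P(theta):
    u = [x + y for x, y in zip(matvec(A, theta), b)]          # grad F(u) = u
    return [lam * x + y for x, y in zip(matvec(transpose(A), u), c)]

# Dual parameter, written with the inverse transpose of A (assumption stated above).
A_inv = inverse(A)
A_inv_T = transpose(A_inv)
A_d = [[x / lam for x in row] for row in A_inv_T]
b_d = [-x / lam for x in matvec(A_inv_T, c)]
c_d = [-x for x in matvec(A_inv, b)]
d_d = dot(matvec(A_inv, b), c) - d

def conj_via_diamond(eta):
    arg = [x + y for x, y in zip(matvec(A_d, eta), b_d)]
    return lam * F(arg) + dot(c_d, eta) + d_d                 # uses F* = F

theta = [0.3, -1.2]
eta = grad_F_P(theta)
reference = dot(theta, eta) - F_P(theta)                      # Fenchel-Young equality
gap = abs(conj_via_diamond(eta) - reference)
print("absolute gap:", gap)
```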

Let us check that the (diamond) ⋄ operator on affine deformation parameters is an involution:

Proposition 2

(⋄ involution). The parameter transformation

P^{⋄}

is an involution, where

{(P^{⋄})}^{⋄} = P

.

Proof.

Let

P = (λ, A, b, c, d)

and

P^{⋄} = (λ^{'}, A^{'}, b^{'}, c^{'}, d^{'})

with

\begin{matrix} λ^{'} & = & λ, \\ A^{'} & = & \frac{1}{λ} A^{- ⊤}, \\ b^{'} & = & - \frac{1}{λ} A^{- ⊤} c, \\ c^{'} & = & - A^{- 1} b, \\ d^{'} & = & 〈 A^{- 1} b, c 〉 - d . \end{matrix}

We check that

{(P^{⋄})}^{⋄} = (λ^{″}, A^{″}, b^{″}, c^{″}, d^{″}) = (λ, A, b, c, d) = P

component-wise, using ${A^{'}}^{- 1} = λ A^{⊤}$ and ${A^{'}}^{- ⊤} = λ A$, as follows:

\begin{matrix} λ^{″} & = & λ^{'} & = & λ, \\ A^{″} & = & \frac{1}{λ} {A^{'}}^{- ⊤} & = & \frac{1}{λ} (λ A) & = & A, \\ b^{″} & = & - \frac{1}{λ} {A^{'}}^{- ⊤} c^{'} & = & - A (- A^{- 1} b) & = & b, \\ c^{″} & = & - {A^{'}}^{- 1} b^{'} & = & - λ A^{⊤} (- \frac{1}{λ} A^{- ⊤} c) & = & c, \\ d^{″} & = & 〈 {A^{'}}^{- 1} b^{'}, c^{'} 〉 - d^{'} & = & 〈 - c, - A^{- 1} b 〉 - 〈 A^{- 1} b, c 〉 + d & = & d . \end{matrix}

□
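The involution can also be checked mechanically. The following sketch restricts to the scalar case m = 1, where A is a nonzero real number and transposition plays no role; the function name `diamond` and the sample parameter are assumptions of this example.

```python
def diamond(P):
    """Scalar (m = 1) version of the map P -> P^diamond from Proposition 1."""
    lam, A, b, c, d = P
    return (lam, 1.0 / (lam * A), -c / (lam * A), -b / A, b * c / A - d)

P = (2.0, -1.5, 0.3, 0.7, -1.0)   # illustrative values with lam > 0 and A != 0
P_dual = diamond(P)
P_back = diamond(P_dual)
max_error = max(abs(x - y) for x, y in zip(P, P_back))
print("P^diamond =", P_dual)
print("largest component error after two applications:", max_error)
```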

The ⋄ involution is consistent with the involutive property of the Legendre–Fenchel transform on affine-deformed functions:

{({(F_{P})}^{*})}^{*} = L (L F_{P}) = L ({(L F)}_{P^{⋄}}) = {(L (L F))}_{{(P^{⋄})}^{⋄}} = F_{P} .

Let us now define the following notation to express the generalized Legendre convex conjugates:

Definition 1

(Generalized Legendre–Fenchel convex conjugates [12]). Let

L_{λ, E, f, g, h}

denote a generalized Legendre–Fenchel transform

L_{λ, E, f, g, h} F : = L_{P} F : = λ (L F) (E η + f) + 〈 η, g 〉 + h

for the parameter

P = (λ, E, f, g, h) \in P

.

Our result is that we can interpret those generalized Legendre–Fenchel transforms of Equation (2) as the ordinary Legendre transform on corresponding affine-deformed convex functions:

Theorem 2.

For any

F \in Γ_{0}

, we have

L_{P} (F) : = {(F^{*})}_{P} = L (F_{P^{⋄}})

.

Proof.

By definition,

L_{P} F : = {(L F)}_{P}

. Since

P = {(P^{⋄})}^{⋄}

(Proposition 2), we have

{(L F)}_{P} = {(L F)}_{{(P^{⋄})}^{⋄}}

, and by using Proposition 1, we obtain

{(L F)}_{{(P^{⋄})}^{⋄}} = L (F_{P^{⋄}})

. To summarize, we have

L_{P} (F) = {(L F)}_{P} = {(L F)}_{{(P^{⋄})}^{⋄}} = L (F_{P^{⋄}}) .

□
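Theorem 2 lends itself to a numerical spot check in the scalar case m = 1. The sketch below, given under stated assumptions, uses F(θ) = θ⁴/4, whose conjugate F*(η) = (3/4)|η|^{4/3} is known in closed form, computes the generalized transform as (F*)_P, and compares it with an ordinary conjugate of F_{P^⋄} obtained by a ternary search on the concave objective θη − F_{P^⋄}(θ). The search interval, iteration count, and all helper names are hypothetical choices.

```python
def diamond(P):
    lam, A, b, c, d = P
    return (lam, 1.0 / (lam * A), -c / (lam * A), -b / A, b * c / A - d)

def deform(G, P):
    """Return G_P(x) = lam * G(A*x + b) + c*x + d for scalar x."""
    lam, A, b, c, d = P
    return lambda x: lam * G(A * x + b) + c * x + d

def conjugate(G, eta, lo=-100.0, hi=100.0, iters=200):
    """Ternary search for sup_theta {theta*eta - G(theta)}; assumes the maximizer is in [lo, hi]."""
    obj = lambda t: t * eta - G(t)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            lo = m1
        else:
            hi = m2
    return obj(0.5 * (lo + hi))

F = lambda t: 0.25 * t ** 4
F_star = lambda e: 0.75 * abs(e) ** (4.0 / 3.0)

P = (2.0, -1.5, 0.3, 0.7, -1.0)                     # plays the role of (lam, E, f, g, h)
generalized = deform(F_star, P)                     # L_P F = (F*)_P
ordinary = lambda eta: conjugate(deform(F, diamond(P)), eta)   # L(F_{P^diamond})

errors = [abs(generalized(e) - ordinary(e)) for e in (-2.0, -0.5, 0.0, 1.0, 3.0)]
print("largest discrepancy:", max(errors))
```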

That is, in plain words, the Artstein-Avidan–Milman generalized Legendre transforms [11,12] are ordinary Legendre transforms on affine-deformed functions evaluated on affine deformed arguments.

Next, we describe an information-geometric interpretation of Theorem 2 which casts light on the origin and meanings of the degrees of freedom used to define generalized Legendre transforms. Thus, understanding the ordinary Legendre transform from the information-geometric viewpoint allows one to understand how to recover the generalized Legendre transforms formalized axiomatically by Artstein-Avidan–Milman [11,12].

3. An Information-Geometric Interpretation of Generalized Legendre Transforms

We can interpret the fact that generalized Legendre convex conjugates are affine-deformed convex conjugates through the lens of information geometry [16]. Consider a smooth, strictly convex function

F : R^{m} \to \bar{R}

of the space

Γ_{2}

of twice-differentiable lsc. Legendre-type convex functions (with

Γ_{2} \subset Γ_{1} \subset Γ_{0}

). This function

F (θ)

defines a dually flat m-dimensional space [16]

(M, g, \nabla, \nabla^{*})

which is a global chart manifold

M = {p}

of points p equipped with dual Hessian structures [19]

(g, \nabla)

and

(g, \nabla^{*})

. The primal Hessian structure is induced by a potential function

ψ

on M with a torsion-free, flat, affine connection ∇ such that

ψ (p) = F (θ (p))

, where

(M, θ (\cdot))

is the primal ∇-affine coordinate system. The dual Hessian structure is induced by a dual potential function

ϕ

on M with a dual torsion-free, flat, affine connection

\nabla^{*}

such that

ϕ (p) = F^{*} (η (p))

, where

(M, η (\cdot))

is the dual

\nabla^{*}

-affine coordinate system. The Riemannian metric g can be expressed in the dual coordinate charts as

g (θ) = \nabla^{2} F (θ)

or

g (η) = \nabla^{2} F^{*} (η)

, where

F^{*}

is the convex conjugate. These two potential functions

ψ

and

ϕ

living on the manifold M satisfy the Fenchel–Young inequality:

ψ (p) + ϕ (q) \geq \sum_{i = 1}^{m} θ_{i} (p) η_{i} (q),

(5)

with equality if and only if

p = q

on M. The metric tensor g can be expressed as

g = \nabla d ψ

or equivalently as

g = \nabla^{*} d ϕ

, where d denotes the exterior derivative and

ψ

and

ϕ

are 0-forms (scalar functions on M).

Now, the ∇-affine coordinate system

θ

is defined up to an affine transformation. That is, if

θ (\cdot)

is a ∇-affine coordinate system, then so is the coordinate system

\bar{θ} (\cdot) = A θ (\cdot) + b

for any $A \in GL (R^{m})$ and $b \in R^{m}$. However, once the parameters A and b are fixed, this choice fully determines the dual

\nabla^{*}

coordinate system

η (\cdot)

.

The potential function

ψ

is also reconstructed by solving differential equations modulo an affine term

〈 c, θ 〉 + d

(e.g., see the proofs relying on the Poincaré lemma in [16,19,20,21]). Fixing the degrees of freedom in the reconstruction of

ψ

will also fix the corresponding affine term in

F^{*}

, which expresses

ϕ

. The dual potential functions

ψ

and

ϕ

on the manifold M are related by the fiberwise Legendre transform [22] in geometric mechanics.

Thus, the information geometry of dually flat spaces allows one to explain the decoupling of the interactions of the dual set of parameters

(A, b, c, d)

with

(E, f, g, h)

. Lastly, the scalar parameter

λ > 0

is the degree of freedom obtained from the fact that if

(M, g, \nabla, \nabla^{*})

is a dually flat space, then so is

(M, λ g, \nabla, \nabla^{*})

.

The Fenchel–Young inequality induced by the dual potential functions of Equation (5) defines a canonical divergence on a dually flat space that we term the dually flat Hessian divergence:

D_{g, \nabla} (p : q) = ψ (p) + ϕ (q) - \sum_{i = 1}^{m} θ_{i} (p) η_{i} (q) \geq 0 .

We have the following reference duality [23]:

D_{g, \nabla} (q : p) = D_{g, \nabla^{*}} (p : q)

.

The dually flat Hessian divergence can be expressed using the dual coordinate systems as a corresponding Fenchel–Young divergence defined by

Y_{F, F^{*}} (θ : η^{'}) : = F (θ) + F^{*} (η^{'}) - \sum_{i = 1}^{m} θ_{i} η_{i}^{'} .

A Fenchel–Young divergence can also be expressed equivalently as dual Bregman divergences

B_{F}

or

B_{F^{*}}

:

Y_{F, F^{*}} (θ : η^{'}) = B_{F} (θ : θ^{'}) = B_{F^{*}} (η^{'} : η),

where

θ^{'} = \nabla F^{*} (η^{'})

and

η = \nabla F (θ)

with

B_{F} (θ : θ^{'}) = F (θ) - F (θ^{'}) - 〈 θ - θ^{'}, \nabla F (θ^{'}) 〉 .
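The identity between the Fenchel–Young divergence and the two dual Bregman divergences can be verified numerically on a simple scalar pair. The sketch below assumes F(θ) = exp θ with conjugate F*(η) = η log η − η on η > 0, so that ∇F = exp and ∇F* = log; the function names and sample values are illustrative only.

```python
import math

F = math.exp                                     # F(theta) = exp(theta)
grad_F = math.exp
F_star = lambda eta: eta * math.log(eta) - eta   # conjugate, valid for eta > 0
grad_F_star = math.log

def fenchel_young(theta, eta_prime):
    """Y_{F,F*}(theta : eta') = F(theta) + F*(eta') - theta * eta'."""
    return F(theta) + F_star(eta_prime) - theta * eta_prime

def bregman(G, grad_G, x, y):
    """B_G(x : y) = G(x) - G(y) - (x - y) * grad G(y)."""
    return G(x) - G(y) - (x - y) * grad_G(y)

theta, eta_prime = 0.4, 2.5
theta_prime = grad_F_star(eta_prime)   # theta' = grad F*(eta')
eta = grad_F(theta)                    # eta = grad F(theta)

y_div = fenchel_young(theta, eta_prime)
b_primal = bregman(F, grad_F, theta, theta_prime)
b_dual = bregman(F_star, grad_F_star, eta_prime, eta)
print(y_div, b_primal, b_dual)
```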

Thus, the dually flat Hessian divergence can be expressed as dual Fenchel–Young divergences (using the fact that

{F^{*}}^{*} = F

) in the mixed

θ

/

η

coordinate systems as

D_{g, \nabla} (p : q) = Y_{F, F^{*}} (θ (p) : η (q)) = Y_{F^{*}, F} (η (q) : θ (p)),

(6)

D_{g, \nabla} (p : q) = \frac{1}{λ} Y_{F_{P}, F_{P^{⋄}}^{*}} (\bar{θ} (p) : \bar{η} (q)) = \frac{1}{λ} Y_{F_{P^{⋄}}^{*}, F_{P}} (\bar{η} (q) : \bar{θ} (p)), \quad \forall P \in P .

(7)

This follows from the fact that

ψ (p) = F_{P} (\bar{θ} (p)) = F (θ (p))

and

ϕ (q) = F_{P^{⋄}}^{*} (\bar{η} (q)) = F^{*} (η (q))

for any

P \in P

.
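The invariance expressed by Equation (7) can be illustrated through the Bregman form of the divergence: the affine terms of F_P cancel in the Bregman divergence and only the factor λ remains. The scalar sketch below adopts one explicit convention as an assumption, namely that F_P is evaluated at a coordinate t while F is evaluated at θ = A t + b; the names and values are hypothetical.

```python
import math

def bregman(G, grad_G, x, y):
    """B_G(x : y) = G(x) - G(y) - (x - y) * grad G(y)."""
    return G(x) - G(y) - (x - y) * grad_G(y)

lam, A, b, c, d = 2.0, -1.5, 0.3, 0.7, -1.0      # illustrative parameter P with m = 1

F, grad_F = math.exp, math.exp
F_P = lambda t: lam * F(A * t + b) + c * t + d
grad_F_P = lambda t: lam * A * grad_F(A * t + b) + c

t_p, t_q = 0.2, -0.9                             # coordinates at which F_P is evaluated
theta_p, theta_q = A * t_p + b, A * t_q + b      # corresponding arguments of F

lhs = bregman(F, grad_F, theta_p, theta_q)
rhs = bregman(F_P, grad_F_P, t_p, t_q) / lam
print(lhs, rhs)
```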

Thus, we can define an equivalence relation ∼ between the functions of

Γ_{2}

:

F \sim \tilde{F}

if and only if

\tilde{F} = F_{P}

for some

P \in P

. Furthermore, a distance invariant under the Legendre transform [24] can be defined in the moduli space

Γ_{2} / \sim

of dually flat spaces.

Lastly, we may consider some curvilinear dual coordinate systems instead of the mutually orthogonal, dually affine coordinate systems. Let us transform the affine

θ

coordinate system into some arbitrary curvilinear coordinate system

\tilde{θ}

. The dual affine

η

coordinate system transforms correspondingly into the curvilinear coordinate system

\tilde{η}

. The underlying geometric structures have been called the

(u, v)

structures by Amari [16,25] or the

(ρ, τ)

structures by Zhang [23]. The dual Bregman divergences or, equivalently, the dual Fenchel–Young divergences of the underlying dually flat space need to be addressed as undeforming arguments [26,27]

\tilde{θ}

or

\tilde{η}

. Since the inverse transform

\tilde{θ} \to θ

can be interpreted as a representation function, the Bregman divergences addressed as undeforming

\tilde{θ}

arguments were called representational Bregman divergences in [26]. For example, the

β

divergences are Bregman divergences [28,29,30] which can be transformed by a nonlinear change of coordinates into

α

divergences [31]. Although the

α

divergences are Bregman divergences for any

α

on positive measures [26,27],

α

divergences are not Bregman divergences anymore when constrained on the probability simplex (except for the

α = \pm 1

case).

Remark 1.

In information geometry [16], and more generally in Hessian geometry [19], a Legendre-type

C^{3}

convex function

F (θ)

of

Γ_{3}

induces a dually flat space or manifold. The convex function

F (θ)

(potential function) may stem from several modeling sources. We may consider the following settings to build dually flat spaces:

The cumulant function of a natural exponential family [16,32]: Let $(X, Ω, μ)$ be a measure space with sample space $X$, σ-algebra Ω, and positive measure μ (e.g., counting or Lebesgue measure). An exponential family $E$ is a set of probability measures $E = \{P_{θ} (x) = exp (〈 x, θ 〉 - F (θ)) d μ (x) : θ \in Θ\}$, where

$Θ = {θ : F (θ) = log \int exp (〈 x, θ 〉) d μ (x) < \infty},$

is the natural parameter space. $Z (θ) = exp (F (θ))$ is strictly log-convex [32], and hence $F (θ) = log Z (θ)$ is strictly convex (and differentiable). In this case, the convex conjugate amounts to the negative Shannon or differential entropy [33] $F^{*} (η) = - S (p_{θ}) = \int p_{θ} (x) log p_{θ} (x) d μ (x)$ , where $S (p) = - \int p (x) log p (x) d μ (x)$ is the Shannon entropy (when μ is the counting measure) or the differential entropy (when μ is the Lebesgue measure).
The negative Shannon entropy of a mixture family [16]: A mixture family $M$ is a set of probability measures

$M = \{m_{θ} (x) = \sum_{i = 1}^{d - 1} θ_{i} p_{i} (x) + (1 - \sum_{i = 1}^{d - 1} θ_{i}) p_{0} (x)\},$

parameterized by a normalized positive weight vector $θ \in Θ = Δ_{d}$ (the standard $(d - 1)$ -dimensional simplex) such that the functions $1, p_{0} (x), \dots, p_{d - 1} (x)$ are linearly independent. The function $F (θ) = - S (m_{θ})$ is strictly convex and differentiable (see [34] for a proof). In this case, the convex conjugate $F^{*} (η) = S^{\times} (p_{0}, m_{θ})$ is the cross-entropy of $p_{0} (x)$ with the mixture $m_{θ} (x)$ , where $S^{\times} (p, q) = - \int p (x) log q (x) d μ (x)$ .
The characteristic function of a regular cone [19] (i.e., convex and pointed cone): Let $K$ be a regular cone and $K^{*} = \cap_{θ \in K} {η : 〈 θ, η 〉 \geq 0}$ be its dual cone. We define $F (θ) = log χ_{K} (θ)$ , where $χ_{K} (θ)$ is the characteristic function $χ_{K} (θ) = \int_{K^{*}} exp (- 〈 θ, η 〉) d η$ . One can further build generalized Wishart exponential families on the cones [35]. (Note that Massieu [36] introduced the concept of characteristic functions and their convex conjugates in thermodynamics in the 19th century.) See also the barrier functions (logarithms of characteristic functions) on convex cones for interior point methods [37].
The Guillemin potential of a (Delzant) polytope [38] which recovers the negative Shannon entropy when considering the standard simplex.
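As a small worked instance of the first setting, consider the Bernoulli family on {0, 1} with the counting measure, whose cumulant function is F(θ) = log(1 + exp θ) and whose dual parameter η = ∇F(θ) is the success probability. The sketch below, with hypothetical function names, checks numerically that the convex conjugate evaluated through the Fenchel–Young equality equals the negative Shannon entropy of the corresponding distribution.

```python
import math

def log_partition(theta):
    """Cumulant function of the Bernoulli family: F(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def mean_parameter(theta):
    """eta = grad F(theta), the success probability."""
    return 1.0 / (1.0 + math.exp(-theta))

def shannon_entropy(eta):
    """Entropy of the Bernoulli distribution with success probability eta."""
    return -(eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta))

theta = 0.8
eta = mean_parameter(theta)
conjugate_value = theta * eta - log_partition(theta)   # F*(eta) by the Fenchel-Young equality
print(conjugate_value, -shannon_entropy(eta))
```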

4. Conclusions

To summarize, we first proved that the generalized Legendre transforms obtained from the reverse-ordering involutive transform axiomatization of [11,12] can be interpreted as the ordinary Legendre transform of corresponding affine-deformed functions. Second, we explained how these generalized Legendre transforms are merely different expressions of the well-known geometric Legendre transforms on dually flat spaces (Hessian manifolds with global charts).

Figure 1 summarizes the various classes of convex functions used in this paper, with some of their key properties used to define information-geometric structures. In particular, the class of functions

Γ_{3}

consists of thrice-differentiable Legendre-type lsc. convex functions (with

Γ_{3} \subset Γ_{2} \subset Γ_{1} \subset Γ_{0}

), which allows one to construct

α

geometry [16] from the Amari–Chentsov cubic tensor [17].

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

Author Frank Nielsen was employed by the company Sony Computer Science Laboratories. The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Legendre-Type Functions

Rockafellar [18] showed that there exist convex lsc. functions of

Γ_{0}

with corresponding gradient domains that are not convex. For example, Rockafellar showed that the bivariate function

F (θ) = F (θ_{1}, θ_{2}) = \frac{1}{4} (\frac{θ_{1}^{2}}{θ_{2}} + θ_{1}^{2} + θ_{2}^{2})

defined on the upper half-plane domain

Θ = R \times R_{> 0}

is strictly convex, but its gradient domain H is not convex. Thus, to rule out these functions, Rockafellar defined Legendre-type functions [18]:

Definition A1

(Legendre-type function [18]). A convex function

F : Θ \subset R^{m} \to \bar{R}

is of a Legendre type if the following are true:

Θ is a non-empty open effective domain;
F is strictly convex and differentiable on Θ;
F becomes infinitely steep close to boundary points of its effective domain:

$\forall θ \in Θ, \forall θ^{'} \in \partial Θ, lim_{λ \to 0} \frac{d}{d λ} F (λ θ + (1 - λ) θ^{'}) = - \infty .$

When F is of a Legendre type, so is its convex conjugate

F^{*}

, and moreover the gradients of convex conjugates are reciprocal to each other such that

\nabla F^{*} = {(\nabla F)}^{- 1}

and

\nabla F = {(\nabla F^{*})}^{- 1}

. This property of reciprocal gradients obtained from the LFT of Legendre-type functions is strong since, in general, the inverse function theorem only guarantees local inversion of multivariate functions and not the existence of global inverse functions.

We shall denote by

Γ_{1} (Θ) \subset Γ_{0} (Θ)

the subset of proper lsc. strictly convex and differentiable functions of a Legendre type defined on the open effective domain

Θ

. We define the Legendre–Fenchel transform of the pair $(Θ, F)$ as

L (Θ, F) = (H, F^{*})

where H is the gradient domain, and we call

F^{*}

the convex conjugate of F. We have

L L (Θ, F) = (Θ, F)

for Legendre-type functions

F \in Γ_{1} (Θ)

. The concept of a Legendre-type function is related to the concept of steep exponential families in statistics.

In general, convex conjugate pairs enjoy the following reverse-ordering property:

Property A1

(Reverse-ordering). Let

F_{1}

and

F_{2}

in

Γ_{0} (Θ)

. If

F_{2} \leq F_{1}

, then

L F_{2} \geq L F_{1}

. If

F_{2} \geq F_{1}

, then

L F_{2} \leq L F_{1}

.

Proof.

Assume that

F_{2} \leq F_{1}

, and let us prove that

F_{2}^{*} = L F_{2} \geq F_{1}^{*} = L F_{1}

:

\begin{matrix} F_{1}^{*} (η) & = & sup_{θ} \{〈 η, θ 〉 - F_{1} (θ)\}, \\ & = & 〈 η, {(\nabla F_{1})}^{- 1} (η) 〉 - F_{1} ({(\nabla F_{1})}^{- 1} (η)), \\ & \leq & 〈 η, {(\nabla F_{1})}^{- 1} (η) 〉 - F_{2} ({(\nabla F_{1})}^{- 1} (η)), \\ & \leq & sup_{θ} \{〈 η, θ 〉 - F_{2} (θ)\} = F_{2}^{*} (η) . \end{matrix}

The case

F_{2} \geq F_{1}

is similar to the case

F_{1} \geq F_{2}

by exchanging the roles of

F_{1}

and

F_{2}

(i.e.,

F_{2} \leftrightarrow F_{1}

). □

Appendix B. Some Examples of Convex Conjugates

Example A1.

Let

F (θ) = 〈 a, θ 〉 + b

be an affine function. Then, we have

F^{*} (η) = \begin{cases} - b, & \text{if } η = a, \\ + \infty, & \text{if } η \neq a . \end{cases}

Let

1_{A} (x) = \begin{cases} 0, & \text{if } x \in A, \\ + \infty, & \text{if } x \notin A, \end{cases}

be the indicator function of a set A. Then, we can write

F^{*} = 1_{\{a\}} - b

.

Example A2.

Let

F (θ) = | θ |

for

θ \in R

. Then, we have the convex conjugate

F^{*} (η) = 1_{[- 1, 1]} (η) = \begin{cases} 0, & \text{if } | η | \leq 1, \\ + \infty, & \text{otherwise} . \end{cases}

Example A3.

Consider

F (θ) = exp θ

for

Θ = R

. We have

F^{*} (η) = \begin{cases} η log η - η, & \text{if } η > 0, \\ 0, & \text{if } η = 0, \\ + \infty, & \text{if } η < 0 . \end{cases}

The convex conjugate is the negative of the scalar Shannon entropy function extended to positive real values.

Example A4.

For

p \in (1, + \infty)

, let

F_{p} (θ) = \frac{1}{p} {∥ θ ∥}^{p}

for

θ \in Θ = R^{m}

. Then, we have

F_{p}^{*} (η) = F_{q} (η)

, where

\frac{1}{p} + \frac{1}{q} = 1

. The powers p and q are called a conjugate pair of exponents. Notice that the Fenchel–Young inequality reduces to Young’s inequality in the scalar case:

F_{p} (θ) + F_{q} (η) \geq 〈 θ, η 〉 .

Furthermore, by integrating Young’s inequality on both sides for

θ = | f (x) |

and

η = | g (x) |

for

x \in X

and

f \in L_{p}

and

g \in L_{q}

(Lebesgue spaces), we obtain

\frac{1}{p} {∥ f ∥}_{p}^{p} + \frac{1}{q} {∥ g ∥}_{q}^{q} \geq {∥ f g ∥}_{1},

from which Hölder’s inequality ${∥ f g ∥}_{1} \leq {∥ f ∥}_{p} {∥ g ∥}_{q}$ follows by applying it to the normalized functions $f / {∥ f ∥}_{p}$ and $g / {∥ g ∥}_{q}$.

In fact, we have the self-duality

F = F^{*}

only for

F (θ) = F_{2} (θ) = \frac{1}{2} {∥ θ ∥}^{2}

.
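Example A4 can be spot-checked in the scalar case. The sketch below, with hypothetical names and sample values, takes p = 3 (so that q = 3/2), evaluates the conjugate of F_p at its stationary point θ = sign(η)|η|^{1/(p−1)}, compares it with F_q, and then tests Young's inequality on a few pairs.

```python
def F(p, x):
    """F_p(x) = |x|^p / p for scalar x."""
    return abs(x) ** p / p

p = 3.0
q = p / (p - 1.0)            # conjugate exponent: 1/p + 1/q = 1

def conjugate_Fp(eta):
    """sup_theta {theta*eta - F_p(theta)}, attained at theta = sign(eta) * |eta|^(1/(p-1))."""
    sign = 1.0 if eta >= 0 else -1.0
    theta = sign * abs(eta) ** (1.0 / (p - 1.0))
    return theta * eta - F(p, theta)

etas = [-2.0, -0.3, 0.0, 0.7, 1.9]
conj_errors = [abs(conjugate_Fp(e) - F(q, e)) for e in etas]
young_gaps = [F(p, t) + F(q, e) - t * e for t in (-1.5, 0.2, 2.0) for e in etas]
print("largest conjugate error:", max(conj_errors))
print("smallest Young gap:", min(young_gaps))
```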

When a function

F \in Γ_{0} (Θ)

is not of the Legendre type, the Fenchel–Young inequality may be tight for several pairs

(θ^{'}, η)

instead of a single pair

(θ = \nabla F^{*} (η), η = \nabla F (θ))

. A subgradient

η

of

F (θ)

at

θ_{0}

satisfies

F (θ) - F (θ_{0}) \geq 〈 η, θ - θ_{0} 〉 \quad \text{for all } θ \in R^{m} .

The subdifferential

\partial F (θ_{0})

of F at

θ_{0}

is the set of subgradients. The subdifferential operator

\partial F : R^{m} ⇉ R^{m}

is multivalued:

\partial F (θ_{0}) = \begin{cases} \{η : F (θ) - F (θ_{0}) \geq 〈 η, θ - θ_{0} 〉, \forall θ \in R^{m}\}, & \text{if } θ_{0} \in Θ = dom (F), \\ \emptyset, & \text{if } θ_{0} \notin Θ . \end{cases}

The following example illustrates the LFT of an even convex function whose gradient is written piecewise, together with a check of the Legendre-type conditions on the whole line and on a restricted domain.

Example A5.

Let $F (θ) := exp (| θ |) - | θ | - 1$ be defined on $Θ = R$. Then, $F^{*} (η) = (1 + | η |) log (1 + | η |) - | η |$, defined on $H = R$. These two functions F and $F^{*}$ are even and convex, and they form a conjugate pair (Figure A1). Although $| θ |$ is not differentiable at $θ = 0$, the function $F (θ)$ is: since $F (θ) = \frac{1}{2} θ^{2} + O (| θ |^{3})$ near the origin, both one-sided derivatives vanish at $θ = 0$, so that $\partial F (0) = \{0\}$ and

\nabla F (θ) = \begin{cases} exp (θ) - 1, & θ \geq 0, \\ - exp (- θ) + 1, & θ \leq 0 . \end{cases}

The function $F^{*} (η)$ is differentiable everywhere as well, and we have

\nabla F^{*} (η) = \begin{cases} log (1 + η), & η \geq 0, \\ - log (1 - η), & η \leq 0 . \end{cases}

The gradients $\nabla F (θ)$ and $\nabla F^{*} (η)$ are reciprocal to each other on R. The function $(R, F (θ))$ is of a Legendre type: it is strictly convex and differentiable on the open domain R, which has no boundary points, so the steepness condition of Definition A1 holds vacuously. The same holds for $(R, F^{*} (η))$, so that $F, F^{*} \in Γ_{1} (R)$. In contrast, the restrictions of F and $F^{*}$ to the positive domain $R_{> 0}$ are not of a Legendre type in the sense of Definition A1, because their derivatives tend to 0 instead of $- \infty$ at the boundary point 0.

Figure A1. A pair $(F (θ), F^{*} (η))$ of conjugate functions (top) with their gradients plotted (bottom). Both functions are differentiable everywhere, and we have $\nabla F^{*} = {(\nabla F)}^{- 1}$ (visualized by rotating the $x y$ axes by 90 degrees).
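The conjugate pair of Example A5 can be checked numerically away from the origin. The sketch below, whose function names and sample points are assumptions of this illustration, verifies the Fenchel–Young equality F(θ) + F*(η) = θη at η = ∇F(θ) and the reciprocity ∇F*(∇F(θ)) = θ for a few values θ ≠ 0, using the gradient formulas stated in the example.

```python
import math

def F(theta):
    return math.exp(abs(theta)) - abs(theta) - 1.0

def F_star(eta):
    return (1.0 + abs(eta)) * math.log1p(abs(eta)) - abs(eta)

def grad_F(theta):
    """Gradient formulas of Example A5 for theta > 0 and theta < 0."""
    return math.exp(theta) - 1.0 if theta > 0 else -math.exp(-theta) + 1.0

def grad_F_star(eta):
    return math.log1p(eta) if eta >= 0 else -math.log1p(-eta)

thetas = [-2.0, -0.5, 0.25, 1.0, 3.0]
# Fenchel-Young equality at eta = grad F(theta): F(theta) + F*(eta) = theta * eta.
fy_gaps = [abs(F(t) + F_star(grad_F(t)) - t * grad_F(t)) for t in thetas]
# Reciprocity of the gradients away from theta = 0.
roundtrip = [abs(grad_F_star(grad_F(t)) - t) for t in thetas]
print("largest Fenchel-Young gap:", max(fy_gaps))
print("largest round-trip error:", max(roundtrip))
```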

References

1. Bauschke, H.H.; Lucet, Y. What is… a Fenchel conjugate? Not. AMS 2012, 59, 44–46.
2. Legendre, A.M. Nouvelles méthodes pour la détermination des orbites des comètes; Firmin Didot: Paris, France, 1805.
3. Hamilton, W.R. On a General Method in Dynamics. In Philosophical Transactions of the Royal Society of London; Cambridge University: Cambridge, UK, 1834; pp. 247–308.
4. Gibbs, J.W. Elementary Principles of Statistical Mechanics; Yale University Press: New Haven, CT, USA, 1902.
5. Callen, H.B. Thermodynamics and an Introduction to Thermostatistics; John Wiley & Sons: Hoboken, NJ, USA, 1985.
6. Weinberg, S. The Quantum Theory of Fields; Cambridge University Press: Cambridge, UK, 1995; Volume 2.
7. Mas-Colell, A.; Whinston, M.D.; Green, J.R. Microeconomic Theory; Oxford University Press: New York, NY, USA, 1995; Volume 1.
8. Rockafellar, R.T. Convex Analysis. Princet. Math. Ser. 1970, 28, 470.
9. Correa, R.; Hantoute, A.; López, M.A. Fundamentals of Convex Analysis and Optimization; Springer: Berlin/Heidelberg, Germany, 2023.
10. Möllenhoff, T.; Khan, M.E. SAM as an Optimal Relaxation of Bayes. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023; ICLR: Appleton, WI, USA, 2023.
11. Artstein-Avidan, S.; Milman, V. A characterization of the concept of duality. Electron. Res. Announc. 2007, 14, 42–59.
12. Artstein-Avidan, S.; Milman, V. The concept of duality in convex analysis, and the characterization of the Legendre transform. Ann. Math. 2009, 169, 661–674.
13. Fenchel, W. On Conjugate Convex Functions. Can. J. Math. 1949, 1, 73–77.
14. Fenchel, W. On conjugate convex functions. In Traces and Emergence of Nonlinear Programming; Springer: Berlin/Heidelberg, Germany, 2013; pp. 125–129.
15. Böröczky, K.J.; Schneider, R. A characterization of the duality mapping for convex bodies. Geom. Funct. Anal. 2008, 18, 657–667.
16. Amari, S.i. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
17. Ay, N.; Jost, J.; Vân Lê, H.; Schwachhöfer, L. Information Geometry; Springer: Berlin/Heidelberg, Germany, 2017; Volume 64.
18. Rockafellar, R.T. Conjugates and Legendre transforms of convex functions. Can. J. Math. 1967, 19, 200–205.
19. Shima, H. The Geometry of Hessian Structures; World Scientific: Singapore, 2007.
20. Amari, S.i.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000; Volume 191.
21. Morales, P.A.; Korbel, J.; Rosas, F.E. Geometric structures induced by deformations of the Legendre transform. Entropy 2023, 25, 678.
22. Leok, M.; Zhang, J. Connecting information geometry and geometric mechanics. Entropy 2017, 19, 518.
23. Naudts, J.; Zhang, J. Legendre duality: From thermodynamics to information geometry. Inf. Geom. 2024, 7, 623–649.
24. Attouch, H.; Wets, R.J.B. Isometries for the Legendre-Fenchel transform. Trans. Am. Math. Soc. 1986, 296, 33–60.
25. Nock, R.; Nielsen, F.; Amari, S.I. On conformal divergences and their population minimizers. IEEE Trans. Inf. Theory 2015, 62, 527–538.
26. Nielsen, F.; Nock, R. The dual Voronoi diagrams with respect to representational Bregman divergences. In Proceedings of the Sixth International Symposium on Voronoi Diagrams (ISVD), Copenhagen, Denmark, 23–26 June 2009; IEEE: New York, NY, USA, 2009; pp. 71–78.
27. Amari, S.i. α-Divergence is unique, belonging to both f-divergence and Bregman divergence classes. IEEE Trans. Inf. Theory 2009, 55, 4925–4931.
28. Csiszar, I. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 1991, 19, 2032–2066.
29. Murata, N.; Takenouchi, T.; Kanamori, T.; Eguchi, S. Information geometry of U-Boost and Bregman divergence. Neural Comput. 2004, 16, 1437–1481.
30. Hennequin, R.; David, B.; Badeau, R. Beta-divergence as a subclass of Bregman divergence. IEEE Signal Process. Lett. 2010, 18, 83–86.
31. Dikmen, O.; Yang, Z.; Oja, E. Learning the information divergence. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 1442–1454.
Barndorff-Nielsen, O. Information and Exponential Families: In Statistical Theory; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
Nielsen, F.; Nock, R. Entropies and cross-entropies of exponential families. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; IEEE: New York, NY, USA, 2010; pp. 3621–3624. [Google Scholar]
Nielsen, F.; Hadjeres, G. Monte Carlo information-geometric structures. In Geometric Structures of Information; Springer: Berlin/Heidelberg, Germany, 2018; pp. 69–103. [Google Scholar]
Graczyk, P.; Ishi, H.; Mamane, S. Wishart exponential families on cones related to tridiagonal matrices. Ann. Inst. Stat. Math. 2019, 71, 439–471. [Google Scholar] [CrossRef]
Massieu, F. Thermodynamique: Mémoire sur les Fonctions Caractéristiques des Divers Fluides et sur la Théorie des Vapeurs; Académie des Sciences de L’Institut National de France: Paris, France, 1876; Volume 22. [Google Scholar]
Güler, O. Barrier functions in interior point methods. Math. Oper. Res. 1996, 21, 860–885. [Google Scholar] [CrossRef]
Fujita, H. The generalized Pythagorean theorem on the compactifications of certain dually flat spaces via toric geometry. Inf. Geom. 2024, 7, 33–58. [Google Scholar] [CrossRef]

Figure 1. The ordinary Legendre transform for various classes of functions: relationships with Fenchel–Young and Bregman divergences, dually flat Hessian divergence, and α-geometry in information geometry.

Share and Cite

MDPI and ACS Style

Nielsen, F. Generalized Legendre Transforms Have Roots in Information Geometry. Entropy 2026, 28, 44. https://doi.org/10.3390/e28010044

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
