λ-Deformation: A Canonical Framework for Statistical Manifolds of Constant Curvature

Zhang, Jun; Wong, Ting-Kam Leonard

doi:10.3390/e24020193


by Jun Zhang ^1,2,* and Ting-Kam Leonard Wong ^3

^1 Department of Psychology, University of Michigan, Ann Arbor, MI 48109-1109, USA
^2 Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1109, USA
^3 Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 1A1, Canada
^* Author to whom correspondence should be addressed.

Entropy 2022, 24(2), 193; https://doi.org/10.3390/e24020193

Submission received: 15 December 2021 / Revised: 15 January 2022 / Accepted: 18 January 2022 / Published: 27 January 2022

(This article belongs to the Special Issue Review Papers for Entropy)


Abstract: This paper systematically presents the λ-deformation as the canonical framework of deformation to the dually flat (Hessian) geometry, which has been well established in information geometry. We show that, based on deforming the Legendre duality, all objects in the Hessian case have their correspondence in the λ-deformed case: λ-convexity, λ-conjugation, λ-biorthogonality, λ-logarithmic divergence, λ-exponential and λ-mixture families, etc. In particular, λ-deformation unifies Tsallis and Rényi deformations by relating them to two manifestations of an identical λ-exponential family, under subtractive or divisive probability normalization, respectively. Unlike the different Hessian geometries of the exponential and mixture families, the λ-exponential family, in turn, coincides with the λ-mixture family after a change of random variables. The resulting statistical manifolds, while still carrying a dualistic structure, replace the Hessian metric and a pair of dually flat conjugate affine connections with a conformal Hessian metric and a pair of projectively flat connections carrying constant (nonzero) curvature. Thus, λ-deformation is a canonical framework in generalizing the well-known dually flat Hessian structure of information geometry.

Keywords: Legendre duality; λ-duality; λ-exponential family; λ-mixture family; conformal Hessian; constant curvature space

1. Introduction

Information geometry is a differential-geometric framework for studying finite-dimensional statistical models that coherently integrates the following notions:

(i): A differentiable manifold $M$ consisting of probability density functions or finite measures on a common sample space;
(ii): A divergence function $D [p | | p^{'}]$ that defines an asymmetric proximity between points p, $p^{'}$ in $M$ ;
(iii): A Riemannian metric $g$ plus a pair of torsion-free dual (conjugate) affine connections $\nabla, \nabla^{*}$ on $M$ .

For completeness, we recall that a pair of affine connections $\nabla$, $\nabla^{*}$ on $M$ are said to be dual (or conjugate) with respect to a Riemannian metric $g$ if for any vector fields X, Y, and Z on $M$, one has:

Z g (X, Y) = g (\nabla_{Z} X, Y) + g (X, \nabla_{Z}^{*} Y) .

(1)

Here, $(M, g, \nabla, \nabla^{*})$ is called a dualistic structure. When D is the Kullback–Leibler divergence (or more generally, an f-divergence), the induced Riemannian metric $g$ is the Fisher–Rao metric, and the induced cubic form $C = \nabla^{*} - \nabla$ is the Amari–Chentsov tensor [1]. It can be shown that the Fisher–Rao metric and the Amari–Chentsov tensor are the unique invariants, of second and third order respectively, under sufficient statistics on the manifold $M$ [2].

Geometrically, the standard model (denoted the $S$-model in this paper) uses a pair of affine connections that are torsion-free, though in general, they are not curvature-free. An alternative, “partially flat” model (denoted the $P$-model in this paper) was recently investigated in [3], leading to the notion of “statistical mirror symmetry” [4]. Under the $P$-model, the affine connections $\nabla$ and $\nabla^{*}$ are allowed to carry torsion, but are both curvature-free. See [4] for the geometric properties of the $P$-model leading to a symplectic-to-complex correspondence characteristic of mirror Calabi–Yau manifolds studied in string theory and mathematical physics.

Within the usual $S$-model, a special case is the dually flat geometry where the Riemannian metric can be expressed under special coordinate systems as a Hessian metric. Two prominent examples are the exponential family and the mixture family, where the Hessian metric coincides with the Fisher–Rao metric. The Hessian geometry is said to be dually flat because the Riemann curvature tensors of both the primal and the dual connections vanish; the corresponding primal and dual affine coordinate systems are linked via Legendre transformations by a pair of convex potentials. For an exponential family, these coordinates are precisely the natural (canonical) and mixture (expectation) coordinate systems, respectively. Note that the Hessian metric itself is not flat, as its Levi-Civita connection has nonzero curvature in general.

Between the well-understood dually flat Hessian geometry and the full-blown $S$-model, there is a wide range of geometries capturing various probability models. Of special interest are generalizations of the exponential family, namely deformed exponential families. The ϕ-exponential family was introduced in the context of statistical physics [5]; it was later shown [6] to be equivalent to the U-model [7] motivated by applications in machine learning. Reference [6] also revealed that both the ϕ- and U-models can be generated from the $(ρ, τ)$-model [8] through the mechanism of “gauge selection”. The $(ρ, τ)$-metric generalizes the Fisher–Rao metric and may lead to a conformal Hessian metric for a ϕ-exponential family. However, the connections are typically not curvature-free unless a special type of gauge is selected; this underlies the geometric characterization of the q-exponential model of Tsallis by [9,10,11].

In recent years, the second author [12], motivated by previous works with Pal on mathematical finance and optimal transport [13,14,15,16], studied a class of deformed exponential families generating constant curvatures through the use of a new divergence function called logarithmic divergence. By constant (information geometric) curvature, we mean that both the primal and dual Riemann curvature tensors have (the same) constant sectional curvature with respect to $g$. In [17], the present authors developed a unified framework, based on the notions of λ-duality and the λ-exponential family, which appears to provide a canonical extension of the dually flat geometry to the constant curvature case. Previously, statistical manifolds with constant curvature were studied using the abstract tools of affine differential geometry; see, e.g., [1,18] (also see [19]). Our framework provides a concrete approach and an explicit construction that elucidates how the properties of the exponential family and the dually flat geometry may be extended to the constant curvature case. In this paper, a careful exposition of the λ-deformation framework is provided from the perspective of λ-duality, namely the λ-deformation of Legendre duality.

The rest of the paper is organized as follows. In Section 2, we review the standard $S$-model of information geometry with a focus on the dually flat geometry, based on convex duality and Bregman divergence, of the exponential and mixture families. The section closes with a preview of λ-deformation by introducing a suite of four deformation functions, as two pairs of mutually inverse functions: $\log_{λ}$ versus $\exp_{λ}$ and $κ_{λ}$ versus $γ_{λ}$, with the first pair deforming log and exp and the second pair deforming the identity function. In Section 3, we describe the λ-duality, which deforms the standard convex duality. In particular, we compare λ-duality and standard Legendre duality and show their relations to each other upon a change of parameterization. In Section 4, we define the λ-gradient and then the λ-logarithmic divergence and study the constant curvature information geometry the latter induces. In Section 5, we relate λ-divergence to Rényi entropy by introducing the λ-exponential and λ-mixture family. The two expressions of the λ-exponential family under divisive and subtractive normalization correspond to, respectively, Rényi deformation and Tsallis deformation. Section 6 concludes with a comparison of λ-deformation with the standard dually flat (Hessian) framework.

2. The Standard Model of Information Geometry

2.1. The Standard Model

We begin by recalling the standard framework (referred to as the $S$-model) of parametric information geometry [1,20]. Let $M$ be a finite-dimensional differentiable manifold with dimension d and $θ = (θ^{1}, \dots, θ^{d})$ be a local coordinate system. The most important case is where $M$ is a manifold of parametric probability density functions. However, the idea of deforming Legendre duality to λ-duality, and hence dually flat (Hessian) manifolds to manifolds of constant curvature, discussed in Section 3 and Section 4 is entirely general and does not rely on $M$ being a manifold of probability density functions.

Let $(X, μ)$ be a measure space, where μ is called the reference (or dominating) measure. Let $Θ \subset R^{d}$ be an open domain. A parametric family of density functions is a mapping $θ \in Θ \mapsto p(\cdot | θ)$, where each $p(\cdot | θ)$ is a probability density function with respect to μ, i.e., $\int_{X} p(ζ | θ) d μ(ζ) = 1$. We assume that the family is sufficiently regular that all analytical operations (such as differentiation under the integral sign) can be performed as needed.

While a dualistic structure $(M, g, \nabla, \nabla^{*})$ can be defined abstractly, in practice, it is often constructed by a divergence, namely a smooth, non-negative function $D[\cdot || \cdot]$ on $M \times M$ such that $D[p || p^{'}] = 0$ only if $p = p^{'}$ and the (0,2)-tensor $g$ it induces on $M$ (see (2) below) is positive definite. Intuitively, $D[p || p^{'}]$ defines a notion of “asymmetric distance” between points p and $p^{'}$ of $M$. When $M$ is a manifold of density functions, a prominent example is the Kullback–Leibler (KL) divergence (relative entropy) given by:

H [p | | p^{'}] = \int p log \frac{p}{p^{'}} d μ .

When dealing with parametric probability families, p and $p^{'}$ are replaced by $p(\cdot | θ)$ and $p(\cdot | θ^{'})$, and we denote $D[p || p^{'}]$ as $D(θ, θ^{'})$ with an abuse of notation, that is:

D [p (\cdot | θ) | | p (\cdot | θ^{'})] \equiv D (θ, θ^{'}),

and similarly for H. The bracket notation $[p || p^{'}]$ in the divergence for probability density functions emphasizes the asymmetry in p and $p^{'}$; see [1].

Eguchi [21] showed that any divergence function (called a “contrast function” there) induces a dualistic structure $(M, g, \nabla, \nabla^{*})$. In local coordinates, given $D(θ, θ^{'})$, the components $g_{i j}$ of the metric $g$ are given by:

g_{i j} (θ) = - {\frac{\partial^{2}}{\partial θ^{i} \partial θ^{' j}} D (θ, θ^{'})|}_{θ = θ^{'}} = {\frac{\partial^{2}}{\partial θ^{i} \partial θ^{j}} D (θ, θ^{'})|}_{θ = θ^{'}},

(2)

and the Christoffel symbols of the conjugate connections $\nabla$ and $\nabla^{*}$ are given respectively by:

Γ_{i j, k} (θ) = - {\frac{\partial^{3}}{\partial θ^{i} \partial θ^{j} \partial θ^{' k}} D (θ, θ^{'})|}_{θ^{'} = θ}, Γ_{i j, k}^{*} (θ) = - {\frac{\partial^{3}}{\partial θ^{' i} \partial θ^{' j} \partial θ^{k}} D (θ, θ^{'})|}_{θ = θ^{'}} .

(3)

Conversely, given any dualistic structure $(M, g, \nabla, \nabla^{*})$, there exists a divergence D that induces it, but this D is not unique in general [22]. Thus, the standard model $S$ is completely encoded by the choice of a divergence function.
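As a numerical illustration of (2), the following minimal sketch recovers the metric from a divergence by finite differences. The choice of a one-parameter Bernoulli family with success probability θ, and the helper names `kl_bernoulli`, `metric_mixed`, and `metric_pure`, are assumptions made for this example only.

```python
import numpy as np

def kl_bernoulli(theta, theta_p):
    """KL divergence D(theta, theta') between Bernoulli(theta) and Bernoulli(theta')."""
    return (theta * np.log(theta / theta_p)
            + (1 - theta) * np.log((1 - theta) / (1 - theta_p)))

def metric_mixed(div, theta, h=1e-4):
    """First expression in (2): minus the mixed second derivative on the diagonal."""
    mixed = (div(theta + h, theta + h) - div(theta + h, theta - h)
             - div(theta - h, theta + h) + div(theta - h, theta - h)) / (4 * h * h)
    return -mixed

def metric_pure(div, theta, h=1e-4):
    """Second expression in (2): second derivative in the first argument on the diagonal."""
    return (div(theta + h, theta) - 2 * div(theta, theta)
            + div(theta - h, theta)) / (h * h)

theta0 = 0.3
g_mixed = metric_mixed(kl_bernoulli, theta0)
g_pure = metric_pure(kl_bernoulli, theta0)
g_fisher = 1.0 / (theta0 * (1 - theta0))  # closed-form Fisher information of Bernoulli
```

Both finite-difference estimates should agree with the closed-form Fisher information up to discretization error, consistent with the statement that the KL divergence induces the Fisher–Rao metric.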

2.2. Dually Flat Geometry

The most important example of a dualistic structure is the dually flat geometry, which is induced by a Bregman divergence [23]. Let $M$ be prescribed with an affine coordinate chart $θ \in Θ$ on an open convex set $Θ \subset R^{d}$. Let $ϕ : Θ \to R$ be a differentiable convex function; specifically, we assume that ϕ is $C^{2}$ and its Hessian $D^{2} ϕ$ is strictly positive definite. The Bregman divergence of ϕ is defined by:

B_{ϕ} (θ, θ^{'}) = ϕ (θ) - ϕ (θ^{'}) - D ϕ (θ^{'}) \cdot (θ - θ^{'}), θ, θ^{'} \in Θ,

where $D ϕ(θ) = (\frac{\partial}{\partial θ^{1}} ϕ(θ), \dots, \frac{\partial}{\partial θ^{d}} ϕ(θ))^{\top}$ is the Euclidean gradient and $a \cdot b$ denotes the standard dot product. We call $θ \in Θ$ the primal coordinates, and $η = D ϕ(θ)$ the dual coordinates, where the inverse of $D ϕ$ is given by $θ = D ϕ^{*}(η)$. Here, the Legendre conjugate $ϕ^{*}$ (or convex conjugate) of ϕ is defined by:

ϕ^{*} (η) = sup_{θ} (θ \cdot η - ϕ (θ)) .

(4)

Then, the components $g_{i j}$ of the Riemannian metric $g$, under the respective local coordinate system, are given by:

g_{i j} (θ) = \frac{\partial^{2}}{\partial θ^{i} \partial θ^{j}} ϕ (θ), g_{i j} (η) = \frac{\partial^{2}}{\partial η^{i} \partial η^{j}} ϕ^{*} (η) .

(5)

In particular, $g$ is a Hessian metric with potential ϕ (resp. $ϕ^{*}$) under θ (resp. η). Furthermore, the Christoffel symbols of $\nabla$ and $\nabla^{*}$ are given respectively by:

Γ_{i j, k} (θ) = 0, Γ_{i j, k}^{*} (η) = 0 .

(6)

From (6), we see that the Riemann curvature tensors of both $\nabla$ and $\nabla^{*}$ vanish. Thus, we call this a dually flat geometry. Furthermore, a $\nabla$-geodesic (resp. $\nabla^{*}$-geodesic) is a constant velocity straight line under the θ (resp. η) coordinate system.

Moreover, the θ and η coordinates are biorthogonal in the sense that:

g (\frac{\partial}{\partial θ^{i}}, \frac{\partial}{\partial η^{j}}) = δ_{i j},

(7)

and the Bregman divergence takes the forms of:

B_{ϕ} (θ, θ^{'} (η^{'})) = A_{ϕ} (θ, η^{'}) = A_{ϕ^{*}} (η^{'}, θ) = B_{ϕ^{*}} (η^{'}, η (θ))

with $η = D ϕ(θ)$ and $θ^{'} = D ϕ^{*}(η^{'})$, where A is called the canonical divergence:

A_{ϕ} (θ, η^{'}) = ϕ (θ) + ϕ^{*} (η^{'}) - θ \cdot η^{'} = A_{ϕ^{*}} (η^{'}, θ) .

(8)

Following [24,25], we call the equality between two expressions of B and the equality between two expressions of A in (8) reference–representation biduality. In [26], the identity (8) was used to motivate a family of Fenchel–Young losses in the context of regularized prediction in machine learning. Last but not least, the Bregman divergence satisfies the generalized Pythagorean theorem: given points P, Q, and R, we have the equality:

B_{ϕ} (θ_{Q}, θ_{P}) + B_{ϕ} (θ_{R}, θ_{Q}) = B_{ϕ} (θ_{R}, θ_{P})

if and only if the $\nabla$-geodesic between Q and R and the $\nabla^{*}$-geodesic between Q and P meet $g$-orthogonally at Q. As we will see in Section 4, all the properties above have natural generalizations under our λ-framework. We stress that the dually flat geometry depends crucially on classical convex (or Legendre) duality, as seen from (4) and (8).
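The biduality relations around (8) can be checked numerically. The sketch below assumes, purely for illustration, the separable potential $ϕ(θ) = \sum_i e^{θ^i}$, whose Legendre conjugate is $ϕ^{*}(η) = \sum_i (η_i \log η_i - η_i)$; the function names are hypothetical.

```python
import numpy as np

# Illustrative convex potential and its Legendre conjugate (an assumed example).
def phi(theta):
    return np.sum(np.exp(theta))

def grad_phi(theta):
    return np.exp(theta)               # eta = D phi(theta)

def phi_star(eta):
    return np.sum(eta * np.log(eta) - eta)

def grad_phi_star(eta):
    return np.log(eta)                 # theta = D phi*(eta)

def bregman(f, grad_f, a, b):
    """Bregman divergence B_f(a, b) = f(a) - f(b) - Df(b) . (a - b)."""
    return f(a) - f(b) - grad_f(b) @ (a - b)

def canonical(theta, eta_p):
    """Canonical divergence A_phi(theta, eta') = phi(theta) + phi*(eta') - theta . eta'."""
    return phi(theta) + phi_star(eta_p) - theta @ eta_p

theta = np.array([0.3, -0.5, 1.0])
theta_p = np.array([-0.2, 0.4, 0.1])
eta, eta_p = grad_phi(theta), grad_phi(theta_p)

b_primal = bregman(phi, grad_phi, theta, theta_p)         # B_phi(theta, theta')
a_mixed = canonical(theta, eta_p)                         # A_phi(theta, eta')
b_dual = bregman(phi_star, grad_phi_star, eta_p, eta)     # B_phi*(eta', eta)
```

The three quantities coincide up to rounding, with the order of the arguments reversed in the dual Bregman divergence, as stated in the text.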

2.3. Exponential and Mixture Families

The dually flat Hessian geometry arises naturally in the exponential and mixture families of probability densities. Given a reference measure μ on a state space $X$, an exponential family is a parameterized probability density $p^{(e)}(\cdot | θ)$ of the form:

p^{(e)} (ζ | θ) = e^{θ \cdot F (ζ) - ϕ (θ)},

(9)

where $θ = (θ^{1}, \dots, θ^{d}) \in Θ \subseteq R^{d}$ and $F(ζ) = (F_{1}(ζ), \dots, F_{d}(ζ))$ is a vector of sufficient statistics. In (9), the cumulant generating function ϕ, defined by:

ϕ (θ) = log \int e^{θ \cdot F} d μ,

enforces the normalization $\int p^{(e)} d μ = 1$. The exponential family generalizes the Boltzmann–Gibbs distribution in statistical physics, where $Z(θ) = e^{ϕ(θ)}$ is called the partition function.

The information geometry of the exponential family begins with the observation that ϕ is convex. Then, ϕ defines a Bregman divergence $B_{ϕ}$ giving rise to a dually flat structure. It can be shown that the Bregman divergence is a KL-divergence:

B_{ϕ} (θ, θ^{'}) = H [p^{(e)} (\cdot | θ^{'}) | | p^{(e)} (\cdot | θ)] .

The induced Riemannian metric $g$, which is the Fisher–Rao metric, becomes the Hessian metric $D^{2} ϕ$; in components $g_{i j}$:

g_{i j} (θ) = \int (\frac{\partial}{\partial θ^{i}} log p^{(e)} (ζ | θ)) (\frac{\partial}{\partial θ^{j}} log p^{(e)} (ζ | θ)) p^{(e)} (ζ | θ) d μ = \frac{\partial^{2}}{\partial θ^{i} \partial θ^{j}} ϕ (θ) .

Equivalently, $g(θ)$ is the covariance matrix of the sufficient statistics F:

g_{i j} (θ) = \int p^{(e)} (ζ | θ) (F_{i} (ζ) - \int p^{(e)} (ζ | θ) F_{i} (ζ)) (F_{j} (ζ) - \int p^{(e)} (ζ | θ) F_{j} (ζ)) d μ .

Furthermore, the dual coordinates $η = D ϕ(θ)$ are the expectation coordinates given by:

η = \int p^{(e)} (ζ | θ) F (ζ) d μ,

and the dual potential function $ϕ^{*}$ is, as a function of η, the negative Shannon entropy:

ϕ^{*} (η) = - H [p^{(e)} (\cdot | θ)] = \int p^{(e)} (ζ | θ) log p^{(e)} (ζ | θ) d μ .

A theoretical justification for the exponential family is that it maximizes the Shannon entropy subject to constraints on the expected value of the vector of random functions $F(\cdot)$.
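The identities of this subsection can be verified on a small example. The sketch below assumes a hypothetical five-point sample space with counting measure as the reference measure and sufficient statistics $F(ζ) = (ζ, ζ^{2})$; none of these choices come from the paper.

```python
import numpy as np

# Hypothetical finite sample space; the reference measure is the counting measure.
zeta = np.arange(5.0)
F = np.stack([zeta, zeta**2], axis=1)   # sufficient statistics, shape (5, 2)

def phi(theta):
    """Cumulant generating function: log of the partition function."""
    return np.log(np.sum(np.exp(F @ theta)))

def density(theta):
    """Exponential family density (9) on the finite sample space."""
    return np.exp(F @ theta - phi(theta))

def kl(p, q):
    return np.sum(p * np.log(p / q))

def hessian_fd(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    d = len(x)
    steps = np.eye(d) * h
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            hess[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                          - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])) / (4 * h * h)
    return hess

theta = np.array([0.2, -0.1])
theta_p = np.array([-0.3, 0.05])
p, p_p = density(theta), density(theta_p)

eta_p = p_p @ F                          # expectation coordinates at theta'
bregman_value = phi(theta) - phi(theta_p) - eta_p @ (theta - theta_p)
kl_value = kl(p_p, p)                    # note the reversed order of arguments

centered = F - p @ F
covariance = centered.T @ (p[:, None] * centered)   # covariance of F under p(.|theta)
hessian = hessian_fd(phi, theta)                     # Hessian metric D^2 phi
```

Here `bregman_value` matches `kl_value`, and the finite-difference Hessian of ϕ matches the covariance matrix of F up to discretization error.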

The mixture family is another probability family that is very useful in both theory and applications. Let $P_{0}(ζ), P_{1}(ζ), \dots, P_{d}(ζ)$ be a set of affinely independent probability densities with respect to the same dominating measure μ. Given mixture parameters $η_{i} > 0$ for $i = 0, \dots, d$ with $\sum_{i = 0}^{d} η_{i} = 1$, the mixture family $p^{(m)}(\cdot | η)$ is defined by:

p^{(m)} (ζ | η) = \sum_{i = 0}^{d} η_{i} P_{i} (ζ) = P_{0} (ζ) + \sum_{i = 1}^{d} η_{i} (P_{i} (ζ) - P_{0} (ζ)),

where $(η_{1}, \dots, η_{d})$ may be taken as the independent parameters. It can be shown that the negative Shannon entropy:

ψ (η) = - H [p^{(m)} (\cdot | η)] = \int p^{(m)} (ζ | η) log p^{(m)} (ζ | η) d μ

of a mixture family is convex in η. Using ψ as the potential function, we have:

B_{ψ} (η, η^{'}) = H [p^{(m)} (\cdot | η) | | p^{(m)} (\cdot | η^{'})],

which is again a KL-divergence and induces a dually flat geometry. In summary, the exponential and mixture families are both dually flat when the geometry is induced by the KL-divergence.

For completeness, we note that the convex conjugate of $ψ(η)$ is:

ψ^{*} (θ) = - \int P_{0} (ζ) log p^{(m)} (ζ | η) d μ,

with conjugate parameters $θ = D ψ$ given by:

θ^{i} = \int (P_{i} (ζ) - P_{0} (ζ)) log p^{(m)} (ζ | η) d μ .

2.4. Deforming exp and log

The exponential function used in the exponential family:

p^{(e)} (ζ | θ) = exp {θ \cdot F (ζ) - ϕ (θ)} = \frac{e^{θ \cdot F (ζ)}}{Z (θ)}

allows the cumulant generating function $ϕ(θ)$ (also called the potential function) and the partition function $Z(θ)$ to be linked by the simple relation $ϕ = \log Z$. The equivalence of using ϕ as subtractive normalization and Z as divisive normalization of the same exponential family $\int p^{(e)}(ζ | θ) d μ = 1$ is due to the elementary, but crucial property $\exp(x + y) = \exp(x) \exp(y)$ of the exponential function. Using a functional form other than exp (exponential function) or log (logarithm function) is referred to as deformation in information geometric (statistical and information theoretic) contexts, and the resulting probability families are called “deformed” families. Typically, this is performed by regarding log, or equivalently exp, as special cases of some parametric class of functions that include them as special members.

More generally, the exponential/logarithmic function can be considered within a non-parametric function space that includes exp or log as a special member. Several approaches can be found in the literature, including the ϕ-deformed exponential approach by Naudts [5,27,28], the conjugate $(ρ, τ)$-embedding approach by the first author [8,25,29], and the U-model by Eguchi [7,30]. The ϕ-model and U-model are both one-function models, while the $(ρ, τ)$-model uses two free functions. It eventually became clear in the 2018 paper [6] by Naudts and the first author that (i) the ϕ- and U-models are equivalent; (ii) they are special cases of the $(ρ, τ)$-model upon a particular fixing of the “gauge freedom”; (iii) the corresponding $(ρ, τ)$-geometry of the manifold of the ϕ-exponential family can have different appearances (gauge freedom), such as a Hessian geometry (under one type of gauge selection) and a conformal Hessian geometry (under another type of gauge selection). The work [6] unified the intermediary results in [10,11,31] and provided a general deformation framework that preserves the rigid interlocking of: (i) the functional form of entropy, cross-entropy, and relative entropy (divergence); (ii) the functional form of the deformed probability family with the corresponding normalization and potential and the duality between the natural and expectation parameterizations; (iii) the expressions of the Riemannian metric (Fisher–Rao metric in general and Hessian metric in particular) and of the conjugate connections. Some of these concepts have their correspondence in nonparametric probability families as well [32,33,34].

Although the $(ρ, τ)$-model may admit a conformal Hessian metric (more rigorously stated: the ϕ-exponential family with the $(ρ, τ)$-metric under a certain gauge will lead to conformal Hessian geometry), the dual connections are not projectively flat (as they are in the geometry studied in [12]). As a result, while the connections are not flat (torsion-free, but not curvature-free), they are in general not of the constant-curvature type either. Therefore, they are “too general” and do not generate spaces of constant curvature.

2.5. Highlights of $λ$ -Deformation

Here enters λ-deformation as a middle ground [17]. The λ-deformation theory absorbs the q-deformation model of Tsallis and the $F^{(\pm α)}$ model of Wong [12] in deforming the exponential family and unifies the subtractive and divisive normalization; this is an occasion where subtractive and divisive normalizations are still linked by a simple reparameterization of the probability family.

Let us introduce some notation. Consider the following deformed logarithm and exponential functions (note the slight difference from the $\log_{q}$ notation used by Tsallis in how the subscript indicates the deformation parameter):

{log}_{λ} (t) = \frac{1}{λ} (t^{λ} - 1), {exp}_{λ} (t) = {(1 + λ t)}^{1 / λ} .

More precisely, we define $\exp_{λ} : R \to [0, \infty]$ (where $λ \in R, λ \neq 0$) by:

{exp}_{λ} (t) = {[1 + λ t]}_{+}^{1 / λ},

where $[a]_{+} = \max\{a, 0\}$. In our analysis, we assume implicitly that $1 + λ t > 0$, which is shown to hold for λ-duality, so the subscript + can be omitted. Furthermore, $\frac{d}{d t} \exp_{λ}(t) = [\exp_{λ}(t)]^{1 - λ}$, so $\exp_{λ}(\cdot)$ is convex if and only if $λ < 1$. For this reason, we restrict λ to this range as in [9,28]. Below, we also take $\log t = - \infty$ whenever $t \leq 0$. Note that our notation differs slightly from Tsallis’ indexing of the deformed logarithm and exponential functions; see Section 5.

Next, we construct another pair of inverse functions $κ_{λ}, γ_{λ}$ by:

κ_{λ} = log \circ {exp}_{λ}, γ_{λ} = {log}_{λ} \circ exp,

where ∘ denotes function composition. Explicitly, they are:

κ_{λ} (t) = \frac{1}{λ} log (1 + λ t), γ_{λ} (t) = \frac{1}{λ} (e^{λ t} - 1) .

(10)

This suite of four functions, namely $\exp_{λ}, \log_{λ}$ as an inverse pair and $κ_{λ}, γ_{λ}$ as another inverse pair, is called λ-deformation and used in the discussions of λ-convexity, λ-conjugation, and λ-duality. Regular exponential and logarithmic functions are recovered when $λ \to 0$, whence both $κ_{λ}$ and $γ_{λ}$ reduce to the identity function.
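A minimal sketch of the four deformation functions follows. The Python names (`log_lam`, `exp_lam`, `kappa`, `gamma`) are hypothetical, and the code assumes $λ \neq 0$ and arguments for which $1 + λ t > 0$.

```python
import numpy as np

def log_lam(t, lam):
    """Deformed logarithm: (t^lam - 1) / lam, for t > 0."""
    return (t**lam - 1.0) / lam

def exp_lam(t, lam):
    """Deformed exponential: [1 + lam * t]_+^(1 / lam)."""
    return np.maximum(1.0 + lam * t, 0.0) ** (1.0 / lam)

def kappa(t, lam):
    """kappa_lam = log o exp_lam, i.e. log(1 + lam * t) / lam."""
    return np.log1p(lam * t) / lam

def gamma(t, lam):
    """gamma_lam = log_lam o exp, i.e. (e^(lam * t) - 1) / lam."""
    return np.expm1(lam * t) / lam

lam = 0.5
t = np.array([0.2, 1.0, 2.5])

roundtrip_exp_log = exp_lam(log_lam(t, lam), lam)   # should return t
roundtrip_kappa_gamma = kappa(gamma(t, lam), lam)   # should return t
kappa_by_composition = np.log(exp_lam(t, lam))      # should equal kappa(t, lam)
gamma_by_composition = log_lam(np.exp(t), lam)      # should equal gamma(t, lam)

# Small |lam| approximates the undeformed functions.
small = 1e-6
near_exp = exp_lam(1.0, small)      # close to e
near_identity = gamma(1.0, small)   # close to 1
```

Using `log1p` and `expm1` keeps the evaluation stable when $λ t$ is small, which is the regime where the functions approach their undeformed limits.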

Using these four functions, Wong and Zhang [17] developed the λ-deformation framework to solve the problem of relating the exponential family under subtractive normalization:

p^{(λ)} (ζ | θ) = {exp}_{λ} (θ \cdot F (ζ) - ϕ_{λ} (θ))

to that under divisive normalization:

p^{(λ)} (ζ | ϑ) = {exp}_{λ} (ϑ \cdot F (ζ)) e^{- φ_{λ} (ϑ)} .

There, the same λ-deformed exponential family can be expressed by two parameterizations θ and ϑ linked through:

θ = ϑ e^{- λ φ_{λ} (ϑ)} ⟺ ϑ = \frac{θ}{1 - λ ϕ_{λ} (θ)},

while the normalization functions $ϕ_{λ}$ and $φ_{λ}$ (with different domains) are linked through:

ϕ_{λ} (θ) = γ_{- λ} (φ_{λ} (ϑ)) ⟺ φ_{λ} (ϑ) = κ_{- λ} (ϕ_{λ} (θ)) .
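The link between the two normalizations can be checked numerically. The sketch below assumes a hypothetical four-point sample space with counting measure, statistics $F(ζ) = (ζ, ζ^{2})$, and $λ = 0.5$; it starts from the divisive form, applies the stated reparameterization, and compares with the subtractive form.

```python
import numpy as np

lam = 0.5
zeta = np.arange(4.0)                    # hypothetical finite sample space
F = np.stack([zeta, zeta**2], axis=1)    # statistics, shape (4, 2)

def exp_lam(t):
    return np.maximum(1.0 + lam * t, 0.0) ** (1.0 / lam)

def gamma(t, l):
    return np.expm1(l * t) / l

def kappa(t, l):
    return np.log1p(l * t) / l

# Divisive normalization: p = exp_lam(vartheta . F) * exp(-varphi(vartheta)).
vartheta = np.array([0.4, 0.1])
varphi = np.log(np.sum(exp_lam(F @ vartheta)))
p_divisive = exp_lam(F @ vartheta) * np.exp(-varphi)

# Reparameterize: theta = vartheta * exp(-lam * varphi), phi = gamma_{-lam}(varphi).
theta = vartheta * np.exp(-lam * varphi)
phi = gamma(varphi, -lam)

# Subtractive normalization: p = exp_lam(theta . F - phi(theta)).
p_subtractive = exp_lam(F @ theta - phi)

# Inverse relations stated in the text.
vartheta_back = theta / (1.0 - lam * phi)
varphi_back = kappa(phi, -lam)
```

Both expressions give the same probability vector, and the inverse relations recover ϑ and $φ_{λ}(ϑ)$ up to rounding.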

The λ-deformation framework led to a unified way of looking at the Tsallis entropy (related to the subtractive normalization) and Rényi entropy (related to the divisive normalization), as well as generating new insights into the distinction between the exponential and mixture families through the lens of deformation theory. To understand this deformation better, we describe the underlying mathematical framework of λ-deformation.

3. Deforming the Legendre Duality: $λ$ -Duality

In this section, we describe the λ-duality and its link to the standard Legendre duality. We start by defining the notions of λ-conjugate and λ-convexity/λ-concavity, then draw a parallel to the regular Legendre duality. We proceed to establish a formal correspondence between the λ-duality and classical convex duality, including the associated notions of the λ-gradient, λ-logarithmic divergence, etc. Some of the derivations are illustrative, yet heuristic; a rigorous analysis in the spirit of Rockafellar [35] is left for future research.

3.1. Legendre Duality and Bregman Divergence Reviewed

Recall from (4) that the convex conjugate of a function f on $R^{d}$ is defined by:

f^{*} (u) = sup_{x} (x \cdot u - f (x)), u \in R^{d} .

(11)

It can be proven that:

(i): $f^{*}$ is convex;
(ii): ${({(f^{*})}^{*})}^{*} = f^{*}$ ;
(iii): ${(f^{*})}^{*} = f$ if f is convex and lower semicontinuous.

When f is also differentiable, the Legendre transformation:

u = D f (x),

which can be motivated by the first-order condition in (11), defines a “dual variable” u, satisfying the Fenchel identity:

f (x) + f^{*} (u) = x \cdot u .

We have $x = D f^{*}(u)$, provided the Hessian $D^{2} f$ is positive definite. The function f also defines a Bregman divergence $B_{f}$ given by:

B_{f} (x, x^{'}) = f (x) - f (x^{'}) - D f (x^{'}) \cdot (x - x^{'}) \geq 0 .

(12)

The Bregman divergence satisfies the reference–representation biduality [24,25] in the sense that:

B_{f} (x, x^{'}) = B_{f^{*}} (u^{'}, u)

where

u = D f (x), u^{'} = D f (x^{'}) .

Note that when f is convex and differentiable, the non-negativity of the Bregman divergence encodes the fact that for any $x, x^{'}$:

f (x) - f (x^{'}) \geq D f (x^{'}) \cdot (x - x^{'}) .

3.2. $λ$ -Deformation of Legendre Duality

The main idea behind the λ-deformation of the Legendre duality (“λ-duality”) is to replace the term $x \cdot u$ in (11) by a monotone transformation of $x \cdot u$. Given a parameter $λ \in R \setminus \{0\}$, later revealed to be the curvature parameter of the information geometric characterization, we replace the term $x \cdot u$ by:

κ_{λ} (x \cdot u) = \frac{1}{λ} log (1 + λ x \cdot u),

(13)

where $κ_{λ}(t)$ and its inverse $γ_{λ}(t)$ are given by (10). With this in mind, we give the following definition.

Definition 1 (λ-conjugation). Let $Ω, Ω^{'} \subset R^{d}$. Given a function $f : Ω \to R$, we define its λ-conjugate $f^{(λ)}$ by:

f^{(λ)} (u) = sup_{x \in Ω} (κ_{λ} (x \cdot u) - f (x)), u \in Ω^{'} .

(14)

Generalized convex dualities have been heavily used in optimal transport theory [36,37] to characterize the optimal transport plans; in this context, it is called the c-duality where c is the cost function of the transport problem. A major novelty of our framework is that the functional form of $κ_{λ}$ (and of $γ_{λ}$) leads to explicit formulas, which are not available in the general case. We remark that this is closely related to the fact that the associated information geometry has constant curvature λ.

It turns out that the λ-conjugation defined by (14) corresponds to an appropriately generalized notion of convexity or concavity, through the aid of the function $γ_{λ}$ given by (10). Henceforth, we let $λ \in R \setminus \{0\}$ be a fixed constant.

Definition 2 (λ-exponential convexity and concavity). Let $Ω \subset R^{d}$ be an open convex set. A function $f : Ω \to R$ is said to be λ-exponentially convex (“λ-convex”), or λ-exponentially concave (“λ-concave”), if:

G_{λ, f} (x) = (γ_{λ} \circ f) (x) = \frac{1}{λ} (e^{λ f (x)} - 1)

is convex, or concave, on Ω. When f is $C^{2}$, we have equivalently that f is λ-convex, or λ-concave, if the Hessian of $G_{λ, f} \equiv γ_{λ} \circ f$ is positive definite, or negative definite.

Note that the additive term $- 1 / λ$ in the above definition of $G_{λ, f}(x) = \frac{1}{λ}(e^{λ f(x)} - 1)$ is not necessary; it is included so that $\lim_{λ \to 0} G_{λ, f}(x) = f(x)$, meaning that the limiting case of 0-convexity is just ordinary convexity.

It is easily shown that, for a fixed $λ > 0$,

(i): f is $λ$ -convex if and only if $- f$ is $(- λ)$ -concave;
(ii): f is $λ$ -concave if and only if $- f$ is $(- λ)$ -convex.
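To make Definition 2 concrete, the sketch below tests λ-convexity on a one-dimensional grid through second differences of $G_{λ, f}$. The example function $f(x) = - \log x$ on $(0, \infty)$ is an assumption chosen for illustration: here $G_{λ, f}(x) = (x^{- λ} - 1) / λ$, whose second derivative $(1 + λ) x^{- λ - 2}$ is positive exactly when $λ > - 1$.

```python
import numpy as np

def G(f, lam, x):
    """G_{lam, f}(x) = (exp(lam * f(x)) - 1) / lam."""
    return np.expm1(lam * f(x)) / lam

def second_differences(values):
    return values[:-2] - 2.0 * values[1:-1] + values[2:]

def is_lambda_convex(f, lam, grid, tol=1e-12):
    """Grid check that G_{lam, f} is convex (uniform grid assumed)."""
    return bool(np.all(second_differences(G(f, lam, grid)) >= -tol))

def is_lambda_concave(f, lam, grid, tol=1e-12):
    """Grid check that G_{lam, f} is concave (uniform grid assumed)."""
    return bool(np.all(second_differences(G(f, lam, grid)) <= tol))

grid = np.linspace(0.5, 3.0, 201)

def f(x):
    return -np.log(x)     # illustrative choice, not taken from the paper

def neg_f(x):
    return np.log(x)

convex_at_half = is_lambda_convex(f, 0.5, grid)           # lam = 0.5 > -1
convex_at_minus_half = is_lambda_convex(f, -0.5, grid)    # lam = -0.5 > -1
convex_at_minus_two = is_lambda_convex(f, -2.0, grid)     # lam = -2 < -1
# Property (i): f is lam-convex if and only if -f is (-lam)-concave.
neg_f_concave = is_lambda_concave(neg_f, -0.5, grid)
```

A grid test of this kind is only a necessary check, not a proof; it is meant to illustrate how the sign of λ and the shape of f interact in the definition.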

Proposition 1. Given any $f : Ω \to R$, we define the variable $\tilde{x}$, which has range $\tilde{Ω} \subset R^{d}$, and the function $g : \tilde{Ω} \to R$ by:

\tilde{x} = x e^{- λ f (x)} = x (1 - λ G_{- λ, f} (x)),

(15)

g (\tilde{x}) = \frac{1}{- λ} (e^{- λ f (x)} - 1) = γ_{- λ} (f (x)) = G_{- λ, f} (x) .

(16)

Then, the convex (Legendre) conjugate $g^{*}$ of the function g:

g^{*} (u) = sup_{\tilde{x} \in \tilde{Ω}} (\tilde{x} \cdot u - g (\tilde{x}))

is related to the λ-conjugate $f^{(λ)}$ of the function f via:

g^{*} (u) = \frac{1}{λ} (e^{λ f^{(λ)} (u)} - 1) = γ_{λ} (f^{(λ)} (u)) = G_{λ, f^{(λ)}} (u) .

(17)

Proof. We first prove the following identities:

\begin{aligned} (1 + λ x \cdot u) e^{- λ f (x)} &= e^{- λ f (x)} + λ e^{- λ f (x)} x \cdot u \\ &= (1 - λ g (\tilde{x})) + λ \tilde{x} \cdot u \\ &= 1 + λ (\tilde{x} \cdot u - g (\tilde{x})), \end{aligned}

where, going from the first to the second line, we used (15) and the fact:

1 - λ g (\tilde{x}) = e^{- λ f (x)},

which is a rewriting of the definition of g given by (16).

With the above identity, we can proceed to prove the proposition. For $u \in Ω^{'}$, we have:

\begin{aligned} f^{(λ)} (u) &= \sup_{x \in Ω} (\frac{1}{λ} \log (1 + λ (x \cdot u)) - f (x)) \\ &= \sup_{x \in Ω} \frac{1}{λ} \log ((1 + λ x \cdot u) e^{- λ f (x)}) \\ &= \sup_{\tilde{x} \in \tilde{Ω}} \frac{1}{λ} \log (1 + λ (\tilde{x} \cdot u - g (\tilde{x}))) \\ &= \frac{1}{λ} \log (1 + λ \sup_{\tilde{x} \in \tilde{Ω}} (\tilde{x} \cdot u - g (\tilde{x}))) \\ &= \frac{1}{λ} \log (1 + λ g^{*} (u)) \\ &= κ_{λ} (g^{*} (u)) . \end{aligned}

Here, the supremum can be moved inside because $t \mapsto \frac{1}{λ} \log (1 + λ t)$ is increasing for either sign of λ.

Recasting the above relation yields (17). □
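Relation (17) can be checked numerically in one dimension by replacing the suprema with maxima over a grid. In the sketch below, the choices $λ = 0.5$, $f(x) = x^{2} / 2$, the interval $[0.1, 1.2]$ for Ω, and the evaluation point $u = 1.3$ are all assumptions made for illustration; the interval is chosen so that $x \mapsto \tilde{x}$ is increasing and g is well defined as a function of $\tilde{x}$.

```python
import numpy as np

lam = 0.5
x = np.linspace(0.1, 1.2, 2001)    # grid standing in for Omega (assumed)

def f(x):
    return 0.5 * x**2              # illustrative f, not taken from the paper

def kappa(t, l):
    return np.log1p(l * t) / l

def gamma(t, l):
    return np.expm1(l * t) / l

def lam_conjugate(u):
    """Grid approximation of f^(lam)(u) = sup_x (kappa_lam(x u) - f(x)), as in (14)."""
    return np.max(kappa(x * u, lam) - f(x))

# Change of variable (15) and transformed function (16), evaluated on the same grid.
x_tilde = x * np.exp(-lam * f(x))
g_values = gamma(f(x), -lam)

def g_star(u):
    """Grid approximation of the Legendre conjugate g*(u) = sup (x_tilde u - g)."""
    return np.max(x_tilde * u - g_values)

u = 1.3
lhs = g_star(u)
rhs = gamma(lam_conjugate(u), lam)   # right-hand side of (17)
```

Because the identity in the proof holds pointwise in x and $κ_{λ}$ is increasing, the two grid maxima are attained at the same grid point, so `lhs` and `rhs` agree up to rounding rather than only up to grid resolution.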

Recall from convex analysis that $g^{*}$ is always a convex function regardless of whether g is convex (by the property of Legendre conjugation). The expression $g^{*}(u) = γ_{λ}(f^{(λ)}(u))$ in (17) therefore implies that $f^{(λ)}$ is λ-convex, by the definition of λ-convexity.

Corollary 1. For any $f : Ω \to R$, its λ-conjugate $f^{(λ)}(u)$ as defined by (14) is a λ-convex function of u on $Ω^{'}$ (note that $Ω^{'}$ may not necessarily be convex).

Proof. We can also give a direct proof (essentially reversing the steps of the proof of Proposition 1).

\begin{aligned} g^{*} (u) &= \sup_{\tilde{x} \in \tilde{Ω}} (\tilde{x} \cdot u - g (\tilde{x})) \\ &= \sup_{x \in Ω} (e^{- λ f (x)} (x \cdot u) - \frac{1}{- λ} (e^{- λ f (x)} - 1)) \\ &= \sup_{x \in Ω} \frac{1}{λ} ((1 + λ x \cdot u) e^{- λ f (x)}) - \frac{1}{λ} \\ &= \frac{1}{λ} (\sup_{x \in Ω} e^{\log (1 + λ x \cdot u) - λ f (x)} - 1) \\ &= \frac{1}{λ} (e^{\sup_{x \in Ω} (\log (1 + λ x \cdot u) - λ f (x))} - 1) \\ &= \frac{1}{λ} (e^{λ f^{(λ)} (u)} - 1) \\ &= γ_{λ} (f^{(λ)} (u)) . \end{aligned}

□

Corollary 1 is the extension of the claim that for any f, the standard Legendre conjugate

f^{*}

as given by (11) is always a convex function. Because of this, we can prove, in analogy to the standard Legendre conjugation ∗, the following relations:

(i): ${({(f^{(λ)})}^{(λ)})}^{(λ)} = f^{(λ)}$ for any f.
(ii): ${(f^{(λ)})}^{(λ)} = f$ if f is $λ$ -convex.

3.3. Relations between the $λ$ -Duality and Legendre Duality

We proceed to establish a formal relationship between the

λ

-duality and the ordinary Legendre duality, by relating the

λ

-conjugation of a

λ

-convex function f, denoted by

f^{(λ)}

, to the standard Legendre conjugation of a function (denoted by ∗).

We continue the analysis performed in Proposition 1. Taking

λ

-conjugation for a second time,

\begin{matrix} {(f^{(λ)})}^{(λ)} (x) & = & sup_{u \in Ω^{'}} \frac{1}{λ} (log (1 + λ (x \cdot u))) - f^{(λ)} (u) \\ & = & sup_{\tilde{u} \in Ω^{'}} \frac{1}{λ} (log (1 + λ (x \cdot \tilde{u} - \tilde{g} (\tilde{u})))) \\ & = & \frac{1}{λ} log (1 + λ sup_{\tilde{u} \in Ω^{'}} (x \cdot \tilde{u} - \tilde{g} (\tilde{u}))) \\ & = & \frac{1}{λ} log (1 + λ {\tilde{g}}^{*} (x)) \\ & = & κ_{λ} ({\tilde{g}}^{*} (x)) . \end{matrix}

Here, the variable

\tilde{u}

is defined by:

\tilde{u} = u e^{- λ f^{(λ)} (u)},

and the function

\tilde{g}

by:

\tilde{g} (\tilde{u}) = γ_{- λ} (f^{(λ)} (u)) = G_{- λ, f^{(λ)}}^{} (u) .

(18)

When f is

λ

-convex, we have

{(f^{(λ)})}^{(λ)} = f

. Therefore:

{\tilde{g}}^{*} (x) = γ_{λ} (f (x)) = G_{λ, f}^{} (x) .

Therefore,

\tilde{g} (\tilde{u}) = {(G_{λ, f}^{})}^{*} (\tilde{u})

. That is, the function

\tilde{g}

is just the (regular) Legendre conjugation

^{*}

of the function

G_{λ, f}^{} (x)

. In

\tilde{u}

parameterization, the

\tilde{g}

function has the expression of (18) with

\tilde{u}

and u related by (20). This parallels the fact that

g (\tilde{x}) = {(G_{λ, f^{(λ)}}^{})}^{*} (\tilde{x})

, and in

\tilde{x}

parameterization, the g function has the expression of (16) with

\tilde{x}

and x related by (19).

Summarizing the above, we have:

Theorem 1

(Connecting

λ

-duality to Legendre duality). Let f be a λ-convex function and

f^{(λ)}

be its λ-conjugate. Denote two functions g and

\tilde{g}

:

\begin{matrix} g (\tilde{x}) & = & G_{- λ, f}^{} (x) = γ_{- λ} (f (x)), \\ \tilde{g} (\tilde{u}) & = & G_{- λ, f^{(λ)}}^{} (u) = γ_{- λ} (f^{(λ)} (u)), \end{matrix}

where the two variables

\tilde{x}

and

\tilde{u}

are given by:

\begin{matrix} \tilde{x} = x e^{- λ f (x)} ⟺ x = \frac{\tilde{x}}{1 - λ g (\tilde{x})}, \end{matrix}

(19)

\begin{matrix} \tilde{u} = u e^{- λ f^{(λ)} (u)} ⟺ u = \frac{\tilde{u}}{1 - λ \tilde{g} (\tilde{u})} . \end{matrix}

(20)

Then, the following statements are equivalent:

(i): The $(x, u)$ variables satisfy the λ-duality of a pair of λ-convex functions $(f, f^{(λ)})$ :

$κ_{λ} (x \cdot u) = f (x) + f^{(λ)} (u);$

(21)
(ii): The $(\tilde{x}, u)$ variables satisfy the Legendre duality of a pair of convex functions $(g, g^{*})$ :

$\tilde{x} \cdot u = g (\tilde{x}) + g^{*} (u)$

(22)

with:

$g^{*} (u) = G_{λ, f^{(λ)}}^{} (u) = γ_{λ} (f^{(λ)} (u));$
(iii): The $(x, \tilde{u})$ variables satisfy the Legendre duality of a pair of convex functions $(\tilde{g}, {\tilde{g}}^{*})$ :

$x \cdot \tilde{u} = {\tilde{g}}^{*} (x) + \tilde{g} (\tilde{u})$

(23)

with:

${\tilde{g}}^{*} (x) = G_{λ, f}^{} (x) = γ_{λ} (f (x)) .$

Proof.

To prove the equivalence of (21) and (22), we re-write the latter as:

e^{- λ f (x)} x \cdot u = γ_{- λ} (f (x)) + γ_{λ} (f^{(λ)} (u)) = \frac{1}{λ} (e^{λ f^{(λ)} (u)} - e^{- λ f (x)}),

where we inserted the following relations:

g (\tilde{x}) = γ_{- λ} (f (x)), g^{*} (u) = γ_{λ} (f^{(λ)} (u))

and replaced

\tilde{x}

by x using (19). Multiplying

e^{λ f (x)}

on both sides, we obtain:

x \cdot u = \frac{1}{λ} (e^{λ (f^{(λ)} (u) + f (x))} - 1) = γ_{λ} (f^{(λ)} (u) + f (x)) .

(24)

Noting

{(γ_{λ})}^{- 1} = κ_{λ}

verifies (21). To prove the equivalence of (21) and (23), we rely on an analogous identity:

(1 + λ x \cdot u) e^{- λ f^{(λ)} (u)} = 1 + λ (x \cdot \tilde{u} - \tilde{g} (\tilde{u})),

where:

\tilde{u} = u e^{- λ f^{(λ)} (u)}, \tilde{g} (\tilde{u}) = γ_{- λ} (f^{(λ)} (u)) .

We have, after multiplying

e^{- λ f^{(λ)} (u)}

on both sides of (24),

x \cdot u e^{- λ f^{(λ)} (u)} = \frac{1}{λ} (e^{λ f (x)} - e^{- λ f^{(λ)} (u)}) = γ_{λ} (f (x)) + γ_{- λ} (f^{(λ)} (u)) = {\tilde{g}}^{*} (x) + \tilde{g} (\tilde{u}),

where the last step used:

{\tilde{g}}^{*} (x) = γ_{λ} ({(f^{(λ)})}^{(λ)} (x)), \tilde{g} (\tilde{u}) = γ_{- λ} (f^{(λ)} (u)) .

Noting

{(f^{(λ)})}^{(λ)} = f

due to f assumed to be

λ

-convex, then (23) follows. □
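The equivalences in Theorem 1 can be illustrated numerically at a single conjugate pair. The sketch below is only a sanity check under stated assumptions: f(x) = x²/2 in one dimension, λ = 0.5, and the value f^{(λ)}(u) obtained from the λ-duality (21) at the pair (x, u) with u given by the λ-gradient formula (25); none of these choices come from the text.

```python
import math

LAM = 0.5  # illustrative deformation parameter

def kappa(t, lam):
    return math.log(1.0 + lam * t) / lam

def gamma(t, lam):
    return (math.exp(lam * t) - 1.0) / lam

# Hypothetical example: f(x) = x^2 / 2, so Df(x) = x.
x = 0.7
f_x = 0.5 * x * x
u = x / (1.0 - LAM * x * x)            # lambda-conjugate variable, formula (25)
f_lam_u = kappa(x * u, LAM) - f_x      # f^(lam)(u) from the lambda-duality (21)

# Change of variables (19) and (20).
x_t = x * math.exp(-LAM * f_x)
u_t = u * math.exp(-LAM * f_lam_u)

# Link functions of Theorem 1.
g = gamma(f_x, -LAM)             # g(x_tilde)
g_star = gamma(f_lam_u, LAM)     # g*(u)
g_t = gamma(f_lam_u, -LAM)       # g_tilde(u_tilde)
g_t_star = gamma(f_x, LAM)       # g_tilde*(x)

gap_22 = x_t * u - (g + g_star)        # Legendre duality (22)
gap_23 = x * u_t - (g_t_star + g_t)    # Legendre duality (23)
print(gap_22, gap_23)                  # both vanish up to rounding
```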

We see that the functions

γ_{λ}

and

γ_{- λ}

serve as link functions from the

(f, f^{(λ)})

-pair of the

λ

-deformed Legendre conjugation to the

(g, g^{*})

-pair and the

(\tilde{g}, {\tilde{g}}^{*})

-pair of the regular Legendre conjugation.

4. $λ$ -Logarithmic Divergence and Its Dualistic Geometry

In this section, we study the

λ

-deformation of the Bregman (canonical) divergence function and the resulting dualistic geometry (Riemannian metric and dual connections), which correspond to the

λ

-duality. This involves first establishing the

λ

-deformation to the gradient operation (so-called

λ

-gradient), which then leads to the so-called

λ

-logarithmic divergence function as deformation to the Bregman divergence. Finally, we show that the resulting Riemannian metric is a conformal Hessian metric, while the resulting dual connections are projectively flat (with constant curvature). The conformal and projective factor is parameterized by

λ

, which gives the curvature of the constant curvature space.

4.1. $λ$ -Gradient

Definition 3

(

λ

-gradient). For

x \in Ω

, define the λ-gradient

D^{(λ)} f

by:

D^{(λ)} f (x) = \frac{1}{1 - λ D f (x) \cdot x} D f (x) .

(25)

Theorem 2.2 of [17] established the above formula for deforming the gradient of a function, motivated by the

λ

-duality setting. For mathematical convenience, it is proven under some regularity conditions; a full generalization along the lines of [35] is a natural direction for further research.

Theorem 2

(

λ

-gradient for

λ

-duality). Let

λ \neq 0

, and let f be a λ-exponentially convex function that is

C^{2}

on some open convex set

Ω \subset R^{d}

, such that (a)

D^{2} G_{λ, f}^{}

is strictly positive definite and (b)

1 - λ D f (x) \cdot x > 0

on Ω. Then we have

(i): $D^{(λ)} f$ is a $C^{1}$ -diffeomorphism from Ω to its range $Ω^{'}$ .
(ii): Denote $u = D^{(λ)} f (x)$ . We have $1 + λ x \cdot u > 0$ , and the following identity holds:

$f (x) + f^{(λ)} (u) = \frac{1}{λ} log (1 + λ x \cdot u) \equiv κ_{λ} (x \cdot u) .$
(iii): Furthermore, $x = D^{(λ)} f^{(λ)} (u)$ .
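To make parts (ii) and (iii) concrete, the following sketch checks them by finite differences in one dimension. It assumes the hypothetical function f(x) = x²/2 with λ = 0.5, for which condition (b) holds on |x| < √2; the helper `x_of_u`, which inverts u = x / (1 − λx²) in closed form, is specific to this example.

```python
import math

LAM = 0.5  # illustrative deformation parameter

def kappa(t):
    return math.log(1.0 + LAM * t) / LAM

def f(x):                 # hypothetical f(x) = x^2 / 2
    return 0.5 * x * x

def lam_grad_f(x):        # Equation (25) with Df(x) = x
    return x / (1.0 - LAM * x * x)

def x_of_u(u):
    # Inverse of lam_grad_f on the branch through the origin
    # (root of LAM * u * x^2 + x - u = 0).
    if u == 0.0:
        return 0.0
    return (-1.0 + math.sqrt(1.0 + 4.0 * LAM * u * u)) / (2.0 * LAM * u)

def f_conj(u):            # f^(lam)(u) from the identity in part (ii)
    x = x_of_u(u)
    return kappa(x * u) - f(x)

x0 = 0.6
u0 = lam_grad_f(x0)

# Part (iii): the lambda-gradient of f^(lam) at u0 should return x0.
h = 1e-6
d_conj = (f_conj(u0 + h) - f_conj(u0 - h)) / (2.0 * h)   # central difference
x_back = d_conj / (1.0 - LAM * d_conj * u0)
print(x0, x_back)
```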

Note that the

λ

-gradient

D^{(λ)} f

differs from the regular gradient

D f

by a scalar multiplication. The duality between x and u under the

λ

-duality is mediated by a dual variable

u = D^{(λ)} f (x)

, which plays an important role in what follows.

Let:

(a): $u = D^{(λ)} f (x)$ denote the $λ$ -conjugate variable corresponding to x with respect to $f (x)$ ;
(b): $\hat{u} = D g (\tilde{x})$ be the Legendre conjugate variable corresponding to $\tilde{x}$ with respect to $g (\tilde{x})$ ;
(c): $x = D^{(λ)} f^{(λ)} (u)$ denote the $λ$ -conjugate variable corresponding to u with respect to $f^{(λ)} (u)$ ;
(d): $\hat{x} = D \tilde{g} (\tilde{u})$ be the Legendre conjugate variable corresponding to $\tilde{u}$ with respect to $\tilde{g} (\tilde{u})$ .

Is there a simple relationship between them? The following proposition says

u (x) = \hat{u} (\tilde{x})

, where

\tilde{x}

and x are linked by (19), and

x (u) = \hat{x} (\tilde{u})

, where

\tilde{u}

and u are linked by (20).

Proposition 2.

We have:

u = D_{x}^{(λ)} f (x) = D_{\tilde{x}}^{} g (\tilde{x}), x = D_{u}^{(λ)} f^{(λ)} (u) = D_{\tilde{u}}^{} \tilde{g} (\tilde{u})

Here, we add the subscript to

D

to emphasize the argument with respect to which the derivative is taken.

Proof.

We use matrix notation, where the gradient is regarded as a column vector. Applying the multivariate chain rule to the definition (16) of g, we have:

{(D_{\tilde{x}} g (\tilde{x}))}^{⊤} = e^{- λ f (x)} {(D_{x} f (x))}^{⊤} \frac{\partial x}{\partial \tilde{x}} (\tilde{x}),

where

\frac{\partial x}{\partial \tilde{x}} (\tilde{x})

is the Jacobian of the transformation

\tilde{x} \mapsto x

and

{(\cdot)}^{⊤}

denotes the transpose. For two vectors x and y, their outer product is denoted by

x y^{⊤}

, which is a rank-one square matrix with the

(i, j)

-entry

x^{i} y^{j}

.

From (15), we have:

\frac{\partial \tilde{x}}{\partial x} (x) = e^{- λ f (x)} (I - λ x {(D_{x} f (x))}^{⊤}) .

Since

1 - λ D_{x} f (x) \cdot x > 0

by assumption, we can invert the Jacobian by the Sherman–Morrison formula (see [12], Proposition 4) to obtain:

\frac{\partial x}{\partial \tilde{x}} (\tilde{x}) = e^{λ f (x)} (I + \frac{λ x {(D_{x} f (x))}^{⊤}}{1 - λ D_{x} f (x) \cdot x}) .

Plugging this into the above, we have:

{(D_{\tilde{x}} g (\tilde{x}))}^{⊤} = \frac{{(D_{x} f (x))}^{⊤}}{1 - λ D_{x} f (x) \cdot x} .

Using (25) to relate

D_{x}^{(λ)} f (x)

to

D_{x} f (x)

, the first relation involving

D_{\tilde{x}} g (\tilde{x})

is proven. The proof of the second relation in this proposition is analogous. □
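The Sherman–Morrison step and the resulting identity can be checked numerically in two dimensions. The sketch below assumes the hypothetical function f(x) = |x|²/2 (so that D f(x) = x), the point x = (0.3, 0.4), and λ = 0.5; matrices are stored as nested lists to keep the example dependency-free.

```python
import math

LAM = 0.5
x = (0.3, 0.4)

def f(x):                       # hypothetical f(x) = |x|^2 / 2
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

d = (x[0], x[1])                # D f(x) for this f
s = 1.0 - LAM * (d[0] * x[0] + d[1] * x[1])   # 1 - lam * Df(x).x, must be > 0
e = math.exp(-LAM * f(x))
I2 = [[1.0, 0.0], [0.0, 1.0]]

# Jacobian of x -> x_tilde and its claimed inverse (Sherman-Morrison).
J = [[e * (I2[i][j] - LAM * x[i] * d[j]) for j in range(2)] for i in range(2)]
J_inv = [[(I2[i][j] + LAM * x[i] * d[j] / s) / e for j in range(2)] for i in range(2)]
prod = [[sum(J[i][k] * J_inv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]

# Chain rule: (D g)^T = exp(-lam f) (D f)^T (dx / dx_tilde).
dg = [e * sum(d[k] * J_inv[k][j] for k in range(2)) for j in range(2)]
lam_grad = [d[j] / s for j in range(2)]       # Equation (25)
print(prod, dg, lam_grad)
```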

Just as ordinary convexity leads to the notion of Bregman divergence (12), the notion of

λ

-exponential convexity leads to a generalization that we call the λ-logarithmic divergence. Henceforth, we let

f : Ω \to R

be a

λ

-exponentially convex function on an open convex domain

Ω \subset R^{d}

, and we assume that the regularity conditions in Theorem 2 hold.

4.2. $λ$ -Logarithmic Divergence

By the definition of the

λ

-convexity, we have that

G_{λ, f}^{} (x) = γ_{λ} (f (x))

is convex on

Ω

. By the ordinary convexity of

G_{λ, f}^{}

, we have:

G_{λ, f}^{} (x) - G_{λ, f}^{} (x^{'}) \geq D G_{λ, f}^{} (x^{'}) \cdot (x - x^{'}), x, x^{'} \in Ω .

In terms of f, we have, after some manipulations,

γ_{λ} (f (x) - f (x^{'})) \geq D f (x^{'}) \cdot (x - x^{'}) .

Since

γ_{λ}

is increasing, we have:

f (x) - f (x^{'}) \geq {(γ_{λ})}^{- 1} (D f (x^{'}) \cdot (x - x^{'})) = κ_{λ} (D f (x^{'}) \cdot (x - x^{'})) .

This motivates the following definition.

Definition 4

(

λ

-logarithmic divergence). We define the λ-logarithmic divergence of f by:

\begin{matrix} L_{λ, f} (x, x^{'}) & = & f (x) - f (x^{'}) - κ_{λ} (D f (x^{'}) \cdot (x - x^{'})) \\ & = & f (x) - f (x^{'}) - \frac{1}{λ} log (1 + λ D f (x^{'}) \cdot (x - x^{'})), \quad x, x^{'} \in Ω . \end{matrix}

(26)

See Figure 1 for a graphical illustration. We note that the logarithmic correction in (26) corresponds to a logarithmic first-order approximation, based at

x^{'}

, which is possible due to the

λ

-exponential convexity of f. We also note that when

λ > 0

, it is possible that

L_{λ, f} (x, x^{'}) = \infty

. Nevertheless,

L_{λ, f} (x, x^{'})

is finite when x and

x^{'}

are sufficiently close. Formally, letting

λ \to 0

in (26) recovers the Bregman divergence.
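The three remarks above (non-negativity, possible infinite values for λ > 0, and the Bregman limit) can be illustrated with a short computation. This is a sketch under assumptions not taken from the text: f(x) = x²/2 in one dimension and the specific evaluation points below.

```python
import math

def f(x):        # hypothetical f(x) = x^2 / 2
    return 0.5 * x * x

def df(x):
    return x

def log_divergence(x, xp, lam):
    """Equation (26); returns +inf when the logarithm is undefined (lam > 0)."""
    arg = 1.0 + lam * df(xp) * (x - xp)
    if arg <= 0.0:
        return math.inf
    return f(x) - f(xp) - math.log(arg) / lam

def bregman(x, xp):
    return f(x) - f(xp) - df(xp) * (x - xp)

x, xp = 0.9, 0.2
L_half = log_divergence(x, xp, 0.5)      # finite and non-negative
L_small = log_divergence(x, xp, 1e-6)    # close to the Bregman divergence
B = bregman(x, xp)
L_far = log_divergence(-1.0, 1.2, 0.5)   # points far apart: divergence is infinite
print(L_half, L_small, B, L_far)
```

For this example the λ-logarithmic divergence dominates the Bregman divergence when λ > 0, since κ_λ(t) ≤ t.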

4.3. $λ$ -Logarithmic Divergence in Different Forms

We now prove a lemma about the relationship of the variables

x, u

and gradients or

λ

-gradients of f or

f^{(λ)}

. We assume, for convenience, that

1 + λ x \cdot u > 0

for all

x \in Ω, u \in Ω^{'}

.

Lemma 1.

Given

u = D_{x}^{(λ)} f (x)

or equivalently

x = D_{u}^{(λ)} f^{(λ)} (u)

, for arbitrary

x^{'}, u^{'}

(such that the expressions are well defined), we have the following identities:

\begin{matrix} κ_{λ} (u \cdot x^{'}) - κ_{λ} (u \cdot x) & = & κ_{λ} (u \cdot (x^{'} - x) {(Π_{λ})}^{- 1}), \end{matrix}

(27)

\begin{matrix} κ_{λ} (u^{'} \cdot x) - κ_{λ} (u \cdot x) & = & κ_{λ} ((u^{'} - u) \cdot x {(Π_{λ})}^{- 1}) . \end{matrix}

(28)

where

Π_{λ}

is a multiplicative factor (function of x or u) given by:

Π_{λ} \equiv 1 + λ x \cdot u = \frac{1}{1 - λ D f (x) \cdot x} = \frac{1}{1 - λ D f^{(λ)} (u) \cdot u} .

Proof.

Since

u = D_{x}^{(λ)} f (x)

, substituting (25), we have:

u \cdot x = \frac{D f (x) \cdot x}{1 - λ D f (x) \cdot x}

and:

1 + λ u \cdot x = \frac{1}{1 - λ D f (x) \cdot x}

so:

1 + λ u \cdot x^{'} = \frac{1 + λ D f (x) \cdot (x^{'} - x)}{1 - λ D f (x) \cdot x} = (1 + λ u \cdot x) (1 + λ D f (x) \cdot (x^{'} - x)) .

Taking the logarithm and rearranging, we obtain (27).

On the other hand, because:

x = D_{u}^{(λ)} f^{(λ)} (u) = \frac{D f^{(λ)} (u)}{1 - λ D f^{(λ)} (u) \cdot u},

we also have:

1 + λ x \cdot u = \frac{1}{1 - λ D f^{(λ)} (u) \cdot u} .

The proof of (28) is similar. □

In the above lemma,

x^{'}

and

u^{'}

are arbitrary; it is interesting that a modified form of “linearity” holds even though

κ_{λ}

is itself nonlinear. As a consequence, we have an alternative expression for

L_{λ, f} (x, x^{'})

.

Proposition 3.

L_{λ, f} (x, x^{'})

defined by (26) can also be written as:

L_{λ, f} (x, x^{'}) = f (x) - f (x^{'}) - κ_{λ} (x \cdot u^{'}) + κ_{λ} (x^{'} \cdot u^{'}) .

where

u^{'} = D^{(λ)} f (x^{'})

.

Of course, we may express the

λ

-logarithmic divergence using the conjugate variables

u, u^{'}

as well. Indeed, we have the analogous reference–representation biduality (see [24,25]) that is characteristic of Bregman divergence and canonical divergence for dually flat spaces, that is (8). See [38] for the reference–representation biduality of a general c-divergence (which includes both the Bregman and logarithmic divergences) based on optimal transport.

Theorem 3.

The λ-logarithmic divergence satisfies the reference–representation biduality, namely:

L_{λ, f^{(λ)}} (u^{'}, u) = L_{λ, f} (x, x^{'}),

where

u = D^{(λ)} f (x)

and

u^{'} = D^{(λ)} f (x^{'})

. Moreover, define the λ-deformed canonical divergence

A_{λ, f}

by:

A_{λ, f} (x, u^{'}) = f (x) + f^{(λ)} (u^{'}) - \frac{1}{λ} log (1 + λ x \cdot u^{'}) = A_{λ, f^{(λ)}} (u^{'}, x) .

We have:

L_{λ, f} (x, x^{'}) = A_{λ, f} (x, u^{'}) = A_{λ, f^{(λ)}} (u^{'}, x) = L_{λ, f^{(λ)}} (u^{'}, u) .
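The chain of equalities in Theorem 3 can be verified numerically for a simple example. The sketch assumes f(x) = x²/2 in one dimension with λ = 0.5, uses the closed-form inverse `x_of_u` that is specific to this f, and takes D f^{(λ)}(u) = x / Π_λ, which follows from the expression for Π_λ in Lemma 1.

```python
import math

LAM = 0.5  # illustrative deformation parameter

def kappa(t):
    return math.log(1.0 + LAM * t) / LAM

def f(x):                # hypothetical f(x) = x^2 / 2 on |x| < sqrt(2)
    return 0.5 * x * x

def lam_grad_f(x):       # u = D^(lam) f(x), Equation (25), with Df(x) = x
    return x / (1.0 - LAM * x * x)

def x_of_u(u):           # inverse of lam_grad_f (branch through the origin)
    if u == 0.0:
        return 0.0
    return (-1.0 + math.sqrt(1.0 + 4.0 * LAM * u * u)) / (2.0 * LAM * u)

def f_conj(u):           # f^(lam)(u) via Theorem 2 (ii)
    x = x_of_u(u)
    return kappa(x * u) - f(x)

def L_primal(x, xp):     # Equation (26) for f
    return f(x) - f(xp) - kappa(xp * (x - xp))

def A_canonical(x, up):  # lambda-deformed canonical divergence
    return f(x) + f_conj(up) - kappa(x * up)

def L_dual(up, u):       # Equation (26) applied to f^(lam)
    x = x_of_u(u)
    d_conj = x / (1.0 + LAM * x * u)     # D f^(lam)(u) = x / Pi_lam
    return f_conj(up) - f_conj(u) - kappa(d_conj * (up - u))

x, xp = 0.9, 0.2
u, up = lam_grad_f(x), lam_grad_f(xp)
values = (L_primal(x, xp), A_canonical(x, up), L_dual(up, u))
print(values)            # three numerically equal values
```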

Proposition 3 also allows us to derive our next theorem (Theorem 4) linking

λ

-logarithmic divergence and Bregman divergence (also see [19] for a discussion of conformal divergence in the affine immersion setting).

Theorem 4.

The canonical forms of the λ-logarithmic divergence

A_{λ, f}

and

A_{λ, f^{(λ)}}

are related to the canonical forms of the Bregman divergence

A_{g^{*}}

and

A_{{\tilde{g}}^{*}}

via a conformal transformation and the non-linear link function

κ_{- λ}

:

\begin{matrix} A_{λ, f^{(λ)}} (u^{'}, x) & = & κ_{- λ} (e^{- λ f^{(λ)} (u^{'})} A_{g^{*}} (u^{'}, \tilde{x})) \\ & = & A_{λ, f} (x, u^{'}) = κ_{- λ} (e^{- λ f (x)} A_{{\tilde{g}}^{*}} (x, \tilde{u^{'}})) . \end{matrix}

Proof.

\begin{matrix} A_{λ, f^{(λ)}} (u^{'}, x) & = & f^{(λ)} (u^{'}) + f (x) - \frac{1}{λ} log (1 + λ u^{'} \cdot x) \\ & = & f^{(λ)} (u^{'}) - \frac{1}{λ} log ((1 + λ u^{'} \cdot x) e^{- λ f (x)}) \\ & = & \frac{1}{λ} log (1 + λ g^{*} (u^{'})) - \frac{1}{λ} log (e^{- λ f (x)} + λ u^{'} \cdot (x e^{- λ f (x)})) \\ & = & \frac{1}{λ} log (1 + λ g^{*} (u^{'})) - \frac{1}{λ} log (1 - λ g (\tilde{x}) + λ u^{'} \cdot \tilde{x}) \\ & = & - \frac{1}{λ} log (\frac{1 + λ (u^{'} \cdot \tilde{x} - g (\tilde{x}))}{1 + λ g^{*} (u^{'})}) \\ & = & - \frac{1}{λ} log (1 + λ \frac{u^{'} \cdot \tilde{x} - g (\tilde{x}) - g^{*} (u^{'})}{1 + λ g^{*} (u^{'})}) \\ & = & - \frac{1}{λ} log (1 - λ \frac{A_{g^{*}} (u^{'}, \tilde{x})}{1 + λ g^{*} (u^{'})}) \\ & = & - \frac{1}{λ} log (1 - λ (e^{- λ f^{(λ)} (u^{'})} A_{g^{*}} (u^{'}, \tilde{x}))) \\ & = & κ_{- λ} (e^{- λ f^{(λ)} (u^{'})} A_{g^{*}} (u^{'}, \tilde{x})) . \end{matrix}

The proof of the second line of Theorem 4 is similar. We have

A_{λ, f^{(λ)}} (u^{'}, x) = A_{λ, f} (x, u^{'})

from Theorem 3. □
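A quick numerical check of the first line of Theorem 4 may help in following the proof. It is a sketch under assumptions: f(x) = x²/2 in one dimension, λ = 0.5, the points x = 0.9 and x' = 0.2, and f^{(λ)}(u') evaluated through the λ-duality at the conjugate pair (x', u').

```python
import math

LAM = 0.5  # illustrative deformation parameter

def kappa(t, lam):
    return math.log(1.0 + lam * t) / lam

def gamma(t, lam):
    return (math.exp(lam * t) - 1.0) / lam

def f(x):                       # hypothetical f(x) = x^2 / 2
    return 0.5 * x * x

x, xp = 0.9, 0.2
up = xp / (1.0 - LAM * xp * xp)             # u' = D^(lam) f(x'), Equation (25)
f_conj_up = kappa(xp * up, LAM) - f(xp)     # f^(lam)(u') from the lambda-duality

# Left-hand side: lambda-deformed canonical divergence A_{lam, f}(x, u').
A_lam = f(x) + f_conj_up - kappa(x * up, LAM)

# Right-hand side: conformally rescaled Bregman canonical divergence.
x_t = x * math.exp(-LAM * f(x))             # x_tilde, Equation (19)
g = gamma(f(x), -LAM)                       # g(x_tilde)
g_star = gamma(f_conj_up, LAM)              # g*(u')
A_breg = g_star + g - up * x_t              # A_{g*}(u', x_tilde)
rhs = kappa(math.exp(-LAM * f_conj_up) * A_breg, -LAM)

print(A_lam, rhs)   # equal up to rounding
```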

4.4. Dualistic Geometry of $λ$ -Logarithmic Divergence

Regard

x \in Ω

as the primal (global) coordinate system of a manifold

M

. As described in Section 2.1, we may use the

λ

-logarithmic divergence

L_{λ, f}

of f to construct a dualistic structure

(M, g, \nabla, \nabla^{*})

. In this subsection, we provide explicit expressions of the corresponding coefficients and state some key geometric consequences.

We begin with the Riemannian metric.

Theorem 5.

The Riemannian metric

g

induced from

L_{λ, f} (x, x^{'})

is given in primal coordinate x by:

g (x) = D^{2} f (x) + λ (D f (x)) {(D f (x))}^{⊤} = e^{- λ f (x)} D^{2} G_{λ, f}^{} (x) .

(29)

Proof.

According to (2), we perform direct differentiation of (26):

g_{i j} (x) = {\frac{\partial^{2}}{\partial x^{i} \partial x^{j}} L_{λ, f} (x, x^{'})|}_{x = x^{'}}

and obtain the expression of (29). □

By symmetry, under the dual coordinate system

u = D^{(λ)} f (x)

, we have:

g (u) = D^{2} f^{(λ)} (u) + λ (D f^{(λ)} (u)) {(D f^{(λ)} (u))}^{⊤} .

From the first equality in (29), we see that

g

is a rank-one correction of the Hessian matrix

D^{2} f (x)

. From the second equality, we see that

g

is in fact a conformal Hessian metric, i.e., it has the form

g (x) = e^{- λ f (x)} g_{0} (x)

, where

g_{0} (x) = D^{2} G_{λ, f}^{} (x)

is the Hessian metric induced by the convex function

G_{λ, f}^{} (x) = \frac{1}{λ} (e^{λ f (x)} - 1)

. This conclusion is entirely anticipated from Theorem 4.
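Both expressions in (29) can be compared against a finite-difference Hessian of the divergence. The following sketch assumes f(x) = x²/2 in one dimension (so D²f = 1), λ = 0.5, and a step size h = 10⁻⁴; these choices are illustrative only.

```python
import math

LAM = 0.5

def f(x):                 # hypothetical f(x) = x^2 / 2, with f'' = 1
    return 0.5 * x * x

def df(x):
    return x

def log_divergence(x, xp):       # Equation (26)
    return f(x) - f(xp) - math.log(1.0 + LAM * df(xp) * (x - xp)) / LAM

def G(x):                        # G_{lam, f}(x) = (exp(lam f(x)) - 1) / lam
    return (math.exp(LAM * f(x)) - 1.0) / LAM

x0, h = 0.6, 1e-4

# Metric as the second derivative of L in its first argument at x = x'.
g_fd = (log_divergence(x0 + h, x0) - 2.0 * log_divergence(x0, x0)
        + log_divergence(x0 - h, x0)) / h ** 2

# First expression in (29): rank-one correction of the Hessian of f.
g_rank_one = 1.0 + LAM * df(x0) ** 2

# Second expression in (29): conformal Hessian metric.
d2G = (G(x0 + h) - 2.0 * G(x0) + G(x0 - h)) / h ** 2
g_conformal = math.exp(-LAM * f(x0)) * d2G

print(g_fd, g_rank_one, g_conformal)
```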

To compute the Christoffel symbols of the primal and dual connections, we need an expression of the inverse of the Riemannian metric

g (x)

as a matrix. This is provided by the following proposition.

Proposition 4.

The metric

g

can be expressed as:

g (x) = \frac{1}{Π_{λ} (x)} (I_{d} - \frac{λ}{Π_{λ} (x)} u x^{⊤}) \frac{\partial u}{\partial x} (x),

(30)

where

\frac{\partial u}{\partial x}

is the Jacobian matrix of the coordinate transformation

x \mapsto u

and

I_{d}

is the

d \times d

identity matrix with Kronecker

δ_{i j}

as its entries. Here:

Π_{λ} (x) = 1 + λ x \cdot u = \frac{1}{1 - λ D f (x) \cdot x}

and

Π_{λ} (x) > 0

for

x \in Ω

and

u = D^{(λ)} f (x)

, due to Part (ii) of Theorem 2.

Moreover, the inverse of

g (x)

can be expressed as:

{(g (x))}^{- 1} = Π_{λ} (x) \frac{\partial x}{\partial u} (u) (I_{d} + λ u x^{⊤}) .

(31)

Proof.

Using the

λ

-logarithmic divergence represented as the generalized canonical divergence

A_{λ, f}

(26), we apply (2) to obtain:

\begin{matrix} g_{i j} (x) & = & - {\frac{\partial^{2}}{\partial x^{i} \partial x^{' j}} L_{λ, f} (x, x^{'})|}_{x = x^{'}} \\ & = & - {\frac{\partial^{2}}{\partial x^{i} \partial x^{' j}} \{f (x) + f^{(λ)} (u^{'}) - \frac{1}{λ} log (1 + λ x \cdot u^{'})\}|}_{x = x^{'}} \\ & = & \frac{1}{Π_{λ} (x)} \{\frac{\partial u^{i}}{\partial x^{j}} - \frac{λ}{Π_{λ} (x)} u^{i} \sum_{k = 1}^{d} x^{k} \frac{\partial u^{k}}{\partial x^{j}}\} . \end{matrix}

Expressing the above expression using matrix notations gives (30). Formula (31) follows by inverting (30) using the Sherman–Morrison formula. □

Under the dualistic structure induced by a

λ

-logarithmic divergence, the primal and dual coordinate vector fields are no longer biorthogonal in the sense of (7). Nevertheless, we have the following generalization. Again, we write

Π_{λ} (x) = 1 + λ x \cdot u

.

Corollary 2.

The inner product of the coordinate vector fields

\frac{\partial}{\partial x^{i}}

and

\frac{\partial}{\partial u^{j}}

is given by a λ-deformed “biorthogonality” relation:

g (\frac{\partial}{\partial x^{i}}, \frac{\partial}{\partial u^{j}}) = \frac{1}{Π_{λ} (x)} δ_{i j} - \frac{λ}{Π_{λ} {(x)}^{2}} x^{j} u^{i} .

Proof.

Write

\frac{\partial}{\partial u^{j}} = \sum_{m = 1}^{d} \frac{\partial x^{m}}{\partial u^{j}} \frac{\partial}{\partial x^{m}}

. Then:

g (\frac{\partial}{\partial x^{i}}, \frac{\partial}{\partial u^{j}}) = \sum_{m = 1}^{d} \frac{\partial x^{m}}{\partial u^{j}} g (\frac{\partial}{\partial x^{i}}, \frac{\partial}{\partial x^{m}}) .

Simplifying the expression using (30) gives the result. For details, see ([12], Proposition 8). □

Theorem 6.

The Christoffel symbols of the primal connection ∇ are given by:

Γ_{i j, k} (x) = - \frac{λ}{Π_{λ} {(x)}^{2}} (u^{j} \frac{\partial u^{i}}{\partial x^{k}} + u^{i} \frac{\partial u^{j}}{\partial x^{k}}) + \frac{2 λ^{2}}{Π_{λ} {(x)}^{3}} \sum_{ℓ = 1}^{d} u^{i} u^{j} x^{ℓ} \frac{\partial u^{ℓ}}{\partial x^{k}},

where

Π_{λ} (x) = 1 + λ x \cdot u

as in Proposition 4. Furthermore, let

Γ_{i j}^{k} = \sum_{ℓ = 1}^{d} Γ_{i j, ℓ} g^{ℓ k}

be the Christoffel symbol of the second kind, then:

Γ_{i j}^{k} (x) = \frac{- λ}{Π_{λ} (x)} (u^{i} δ_{j}^{k} + u^{j} δ_{i}^{k}) = - λ (\frac{\partial f}{\partial x^{i}} (x) δ_{j}^{k} + \frac{\partial f}{\partial x^{j}} (x) δ_{i}^{k}),

(32)

where δ is the Kronecker delta.

Similarly, under the dual coordinate system u, the Christoffel symbol (of the second kind) of the dual connection

\nabla^{*}

is given by:

Γ_{i j}^{* k} (u) = - λ (\frac{\partial f^{(λ)}}{\partial u^{i}} (u) δ_{j}^{k} + \frac{\partial f^{(λ)}}{\partial u^{j}} (u) δ_{i}^{k}) .

(33)

Proof.

This is a straightforward computation using (3) and Proposition 4. The details, which are a minor modification of the proof of ([12], Proposition 5), are omitted. □

Although the curvatures of ∇ and

\nabla^{*}

are nonzero, it can be shown that ∇ and

\nabla^{*}

are both projectively flat, i.e., each of them is projectively equivalent to a flat connection. Specifically, any ∇-geodesic (resp.

\nabla^{*}

-geodesic) is a time-reparameterized straight line under the x (resp. u) coordinate system.
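The straight-line property can be seen directly from (32): the geodesic equation reduces to an acceleration that is always parallel to the velocity. The sketch below integrates this equation with a crude explicit Euler scheme, assuming the hypothetical function f(x) = |x|²/2 in two dimensions, λ = 0.5, and arbitrary initial data; it is an illustration, not an accurate geodesic solver.

```python
import math

LAM = 0.5

def grad_f(x):                    # hypothetical f(x) = |x|^2 / 2, so Df(x) = x
    return (x[0], x[1])

x0 = (0.1, 0.2)                   # initial position (illustrative)
v0 = (0.3, -0.1)                  # initial velocity (illustrative)
x, v = list(x0), list(v0)
dt = 1e-3

for _ in range(1000):
    d = grad_f(x)
    # With Gamma^k_ij = -lam (f_i delta^k_j + f_j delta^k_i), the geodesic
    # equation x''^k = -Gamma^k_ij x'^i x'^j becomes x'' = 2 lam (Df . x') x'.
    c = 2.0 * LAM * (d[0] * v[0] + d[1] * v[1])
    x = [x[0] + dt * v[0], x[1] + dt * v[1]]
    v = [v[0] + dt * c * v[0], v[1] + dt * c * v[1]]

# The displacement stays collinear with the initial velocity ...
disp = (x[0] - x0[0], x[1] - x0[1])
cross = disp[0] * v0[1] - disp[1] * v0[0]
# ... while the coordinate speed changes (time reparameterization).
speed_ratio = math.hypot(v[0], v[1]) / math.hypot(v0[0], v0[1])
print(cross, speed_ratio)
```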

Theorem 7.

The sectional curvatures of ∇ and

\nabla^{*}

with respect to

g

are both equal to λ.

Proof.

See ([12], Theorem 15). □

Using the dual projective flatness and Corollary 2, Reference ([12], Theorem 16) showed that the

λ

-logarithmic divergence satisfies a generalized Pythagorean theorem, which generalizes the property of Bregman divergence outlined in Section 2.2.

Theorem 8

(Generalized Pythagorean theorem). Let

P, Q, R \in M

. Then:

L_{λ, f} (x_{Q}, x_{P}) + L_{λ, f} (x_{R}, x_{Q}) = L_{λ, f} (x_{R}, x_{P})

if and only if the ∇-geodesic between Q and R and the

\nabla^{*}

-geodesic between Q and P meet

g

-orthogonally at Q.

To summarize, the dually flat geometry becomes a dually projectively flat geometry with constant sectional curvature

λ

, and the Hessian metric becomes a conformal Hessian metric. Nevertheless, the primal and dual geodesics are still straight lines (up to time reparametrizations), and the generalized Pythagorean theorem holds.

We say that the above

λ

-deformation framework is “canonical” because the statistical manifold

(M, g, \nabla, \nabla^{*})

, with a conformal Hessian metric

g_{i j}

given by (29) and a pair of dual projectively flat affine connections

Γ_{i j}^{k}, Γ_{i j}^{* k}

given by (32) and (33), is the only statistical structure with constant curvature ([12], Theorem 15). Moreover, given such a statistical manifold, one can construct locally a

λ

-logarithmic divergence, which induces the given geometry.

5. Linking $λ$ -Deformation to Rényi Entropy and Divergence

5.1. Relation between Tsallis’ and Rényi’s Deformation Expressions

Recall that Tsallis [39], in the context of statistical physics, introduced the generalized entropy:

H_{λ}^{Tsallis} [p] = \int p {log}_{λ} (\frac{1}{p}) d μ = \frac{1}{λ} (\int {(p (ζ))}^{1 - λ} d μ - 1);

note that we use

λ

here in place of

q = 1 - λ

as in [40].

Tsallis entropy is related to Rényi entropy [41], defined as:

H_{λ}^{Rényi} [p] : = \frac{1}{λ} log (\int p^{1 - λ} (ζ) d μ),

through a monotonic transformation:

H_{λ}^{Tsallis} [p] = \frac{1}{λ} (e^{λ H_{λ}^{Rényi} [p]} - 1) .

In our current notation,

H_{λ}^{Tsallis} [p] = γ_{λ} (H_{λ}^{Rényi} [p]) ⟺ H_{λ}^{Rényi} [p] = κ_{λ} (H_{λ}^{Tsallis} [p]) .
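For a discrete distribution (counting measure), the monotone link between the two entropies is easy to confirm numerically. The probability vector and the value λ = 0.5 below are illustrative assumptions; the deformed logarithm is taken to be log_λ(t) = (t^λ − 1)/λ, the inverse of exp_λ.

```python
import math

LAM = 0.5
p = [0.2, 0.3, 0.5]     # hypothetical discrete distribution

def log_lam(t, lam):
    """Deformed logarithm, inverse of exp_lam(t) = (1 + lam t)^(1/lam)."""
    return (t ** lam - 1.0) / lam

# Tsallis entropy as the integral of p * log_lam(1/p).
tsallis = sum(pi * log_lam(1.0 / pi, LAM) for pi in p)

# Renyi entropy (1/lam) * log of the integral of p^(1 - lam).
renyi = math.log(sum(pi ** (1.0 - LAM) for pi in p)) / LAM

gamma_of_renyi = (math.exp(LAM * renyi) - 1.0) / LAM      # gamma_lam(H^Renyi)
kappa_of_tsallis = math.log(1.0 + LAM * tsallis) / LAM    # kappa_lam(H^Tsallis)
print(tsallis, gamma_of_renyi, renyi, kappa_of_tsallis)
```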

Rényi divergence (with Rényi index

1 - λ

) is defined by:

H_{λ}^{Rényi} [p | | p^{'}] = \frac{- 1}{λ} log \int {(p (ζ))}^{1 - λ} {(p^{'} (ζ))}^{λ} d μ .

Rényi divergence is additive: given two product measures

p_{1} \otimes p_{2}

and

p_{1}^{'} \otimes p_{2}^{'}

, we have:

H_{λ}^{Rényi} [p_{1} \otimes p_{2} | | p_{1}^{'} \otimes p_{2}^{'}] = H_{λ}^{Rényi} [p_{1} | | p_{1}^{'}] + H_{λ}^{Rényi} [p_{2} | | p_{2}^{'}] .

Because Tsallis entropy is not additive, this has been used as an argument for favoring Rényi entropy as a physical concept over Tsallis entropy; see [28] (Section 9.3) and [42].

5.2. $λ$ -Exponential Family

Under the

λ

-deformation, there is an intrinsic link between the subtractive and divisive normalizations of the

λ

-deformed exponential family. Starting with the observation:

e^{κ_{λ} (t)} = {(1 + λ t)}^{1 / λ} = {exp}_{λ} (t),

we investigate the identity:

{(1 + λ ϑ \cdot F (ζ))}^{1 / λ} e^{- φ_{λ} (ϑ)} = {(1 + λ (θ \cdot F (ζ) - ϕ_{λ} (θ)))}^{1 / λ} .

Taking the

λ

-th power and equating both sides, we obtain the conditions for the above identity to hold:

\begin{matrix} θ = ϑ e^{- λ φ_{λ} (ϑ)} & ⟺ & ϑ = \frac{θ}{1 - λ ϕ_{λ} (θ)}, \\ ϕ_{λ} (θ) = \frac{1}{- λ} (e^{- λ φ_{λ} (ϑ)} - 1) & ⟺ & φ_{λ} (ϑ) = - \frac{1}{λ} log (1 - λ ϕ_{λ} (θ)) . \end{matrix}

This fact led us to define a

λ

-exponential family that can be normalized both subtractively and divisively: the former denoted by

p (ζ | θ)

and the latter denoted by

p (ζ | ϑ)

.

Proposition 5

(Reparameterization equivalence). Let

λ \neq 0

. With respect to a given reference measure μ and a fixed vector of random functions

F (ζ) = (F_{1} (ζ), \dots, F_{d} (ζ))

, the λ-exponential family is given by

p^{(λ)} (ζ | θ)

under subtractive normalization and by

p^{(λ)} (ζ | ϑ)

under divisive normalization; they are reparametrizations of each other:

p^{(λ)} (ζ | θ) = {exp}_{λ} (θ \cdot F (ζ) - ϕ_{λ} (θ)) = {exp}_{λ} (ϑ \cdot F (ζ)) e^{- φ_{λ} (ϑ)} = p^{(λ)} (ζ | ϑ) .

(34)

Here, the function

ϕ_{λ} (θ)

is called subtractive

λ

-potential and used for subtractive normalization, while

φ_{λ} (ϑ)

is called divisive

λ

-potential and used for divisive normalization. Note that

ϕ_{λ}

and

φ_{λ}

may not have the same domain. They satisfy:

ϕ_{λ} (θ) = γ_{- λ} (φ_{λ} (ϑ)) ⟺ φ_{λ} (ϑ) = κ_{- λ} (ϕ_{λ} (θ)),

where:

κ_{- λ} (t) = - \frac{1}{λ} log (1 - λ t), γ_{- λ} (t) = \frac{1}{λ} (1 - e^{- λ t}) .

Note again that we use

ϑ

for the divisive normalization setting and distinguish it from

θ

for the subtractive normalization setting. For later convenience, we also note:

e^{- λ φ_{λ} (ϑ)} = 1 - λ ϕ_{λ} (θ) .
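The reparameterization equivalence (34) can be checked on a finite sample space. In the sketch below, the four values of a single statistic F, the parameter ϑ = 0.3, the counting reference measure, and λ = 0.5 are all illustrative assumptions.

```python
import math

LAM = 0.5
F = [-1.0, 0.0, 0.5, 2.0]    # hypothetical values of F on a 4-point space (d = 1)
vartheta = 0.3               # divisive parameter; 1 + LAM * vartheta * F > 0

def exp_lam(t):
    return (1.0 + LAM * t) ** (1.0 / LAM)

# Divisive normalization: phi(vartheta) = log sum exp_lam(vartheta * F).
phi_div = math.log(sum(exp_lam(vartheta * Fz) for Fz in F))
p_div = [exp_lam(vartheta * Fz) * math.exp(-phi_div) for Fz in F]

# Map to the subtractive parameterization.
theta = vartheta * math.exp(-LAM * phi_div)
phi_sub = (1.0 - math.exp(-LAM * phi_div)) / LAM   # gamma_{-lam}(phi_div)
p_sub = [exp_lam(theta * Fz - phi_sub) for Fz in F]

max_gap = max(abs(a - b) for a, b in zip(p_div, p_sub))
print(sum(p_div), sum(p_sub), max_gap)
```

The inverse map can be checked in the same way, since ϑ = θ / (1 − λ ϕ_λ(θ)).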

5.2.1. Under Subtractive Normalization

The deformed exponential family takes the form:

p (ζ | θ) = {exp}_{λ} (θ \cdot F (ζ) - ϕ_{λ} (θ))

(35)

where

θ \cdot F (ζ) = \sum_{i = 1}^{d} θ^{i} F_{i} (ζ)

, and the subtractive

λ

-potential

ϕ_{λ} (θ)

is specified by the normalization:

1 = \int p (ζ | θ) d μ = \int {exp}_{λ} (θ \cdot F (ζ) - ϕ_{λ} (θ)) d μ .

This leads to:

\frac{\partial ϕ_{λ}}{\partial θ^{i}} = \int \tilde{p} (ζ | θ) F_{i} (ζ) d μ

with the escort transformation given by:

\tilde{p} (ζ | θ) = \frac{{(p (ζ | θ))}^{1 - λ}}{\int {(p (ζ | θ))}^{1 - λ} d μ} .

(36)

Clearly, when

λ \to 0

, we recover the regular exponential family (9). It was Tsallis who introduced the q-exponential family, where

q = 1 - λ

.

5.2.2. Under Divisive Normalization

To deform the exponential family through divisive normalization, we use a smooth monotone function

κ_{λ} (\cdot)

and define a parametric probability family, which takes the form:

log p (ζ | ϑ) = κ_{λ} (ϑ \cdot F (ζ)) - φ_{λ} (ϑ) .

Note that we use the symbol

ϑ

to distinguish it from the parameter

θ

in the subtractive case. Here:

φ_{λ} (ϑ) = log \int e^{κ_{λ} (ϑ \cdot F (ζ))} d μ

is the divisive normalization function, and it is assumed that:

\int e^{κ_{λ} (ϑ \cdot F (ζ))} d μ < \infty

in the domain of

ϑ

(the natural parameter set). It is possible that the support of the density depends on the parameter

ϑ

, as in the case of the q-exponential family; see [17]. To avoid technicalities, we assume that the support of

p (ζ | ϑ)

is independent of

ϑ

.

Writing out

κ_{λ} (\cdot)

, the resulting family is:

p (ζ | ϑ) = {(1 + λ ϑ \cdot F (ζ))}^{1 / λ} e^{- φ_{λ} (ϑ)},

(37)

where the divisive

λ

-potential

φ_{λ} (ϑ)

, given by:

φ_{λ} (ϑ) = log \int {(1 + λ ϑ \cdot F (ζ))}^{1 / λ} d μ,

(38)

is finite on the parameter set. This family unifies the

F^{(\pm α)}

-families introduced in [12].

5.3. $λ$ -Mixture Family

We next define a mixture-type family dual to the

λ

-exponential family, in the same way that the mixture family is dual to the exponential family. The form of the family is justified by its compatibility with the

λ

-duality.

Definition 5

(

λ

-mixture family). Let

λ \neq 0, 1

be given. The λ-mixture family with respect to a fixed set of densities

P_{0} (ζ), P_{1} (ζ), \dots, P_{d} (ζ)

is defined by:

p^{(λ)} (ζ | η) = \frac{1}{Z_{λ} (η)} {(\sum_{i = 0}^{d} η_{i} {\tilde{P}}_{i} (ζ))}^{1 / (1 - λ)},

(39)

where

η = (η_{1}, \dots, η_{d})

is the mixture parameter satisfying

0 \leq η_{i} \leq 1

and

η_{0} = 1 - \sum_{i = 1}^{d} η_{i} > 0

. Here,

{\tilde{P}}_{i}, i = 0, 1, \dots, d

denotes the escort transformation, as given by (36), of the given

P_{i}

’s:

{\tilde{P}}_{i} (ζ) = \frac{{(P_{i} (ζ))}^{1 - λ}}{\int {(P_{i} (ζ))}^{1 - λ} d μ},

where the denominator is assumed to exist, and

Z_{λ} (η)

represents the integral:

\begin{matrix} Z_{λ} (η) & = \int {(\sum_{i = 0}^{d} η_{i} {\tilde{P}}_{i} (ζ))}^{1 / (1 - λ)} d μ, \end{matrix}

which is assumed to converge for all η and to be differentiable under the integral sign.

Denote:

C_{i} = \int {(P_{i} (ζ))}^{1 - λ} d μ

and:

{\tilde{η}}_{i} = \frac{1}{h_{λ} (η)} \frac{η_{i}}{C_{i}}

where:

h_{λ} (η) = \sum_{i = 0}^{d} \frac{η_{i}}{C_{i}} .

Then,

0 \leq {\tilde{η}}_{i} \leq 1

and

\sum_{i = 0}^{d} {\tilde{η}}_{i} = 1 .

Since

\sum_{i = 0}^{d} η_{i} {\tilde{P}}_{i} (ζ) = h_{λ} (η) \sum_{i = 0}^{d} {\tilde{η}}_{i} {(P_{i} (ζ))}^{1 - λ}

, we can express

p^{(λ)}

in terms of

\tilde{η}

:

\begin{matrix} p^{(λ)} & = & \frac{1}{Z_{λ} (η)} {(\sum_{i = 0}^{d} η_{i} {\tilde{P}}_{i} (ζ))}^{1 / (1 - λ)} = \frac{{(h_{λ} (η))}^{1 / (1 - λ)}}{Z_{λ} (η)} {(\sum_{i = 0}^{d} {\tilde{η}}_{i} {(P_{i} (ζ))}^{1 - λ})}^{1 / (1 - λ)} \\ & = & e^{\frac{1}{1 - λ} log h_{λ} (η) - log Z_{λ} (η)} {((1 - \sum_{i = 1}^{d} {\tilde{η}}_{i}) {(P_{0} (ζ))}^{1 - λ} + \sum_{i = 1}^{d} {\tilde{η}}_{i} {(P_{i} (ζ))}^{1 - λ})}^{1 / (1 - λ)} \\ & = & e^{\frac{1}{1 - λ} log h_{λ} (η) - log Z_{λ} (η)} {(1 + \sum_{i = 1}^{d} {\tilde{η}}_{i} \frac{{(P_{i} (ζ))}^{1 - λ} - {(P_{0} (ζ))}^{1 - λ}}{{(P_{0} (ζ))}^{1 - λ}})}^{1 / (1 - λ)} P_{0} (ζ) . \end{matrix}

Setting:

F_{i} (ζ) = \frac{1}{1 - λ} ({(\frac{P_{i} (ζ)}{P_{0} (ζ)})}^{1 - λ} - 1),

with

d ν = P_{0} (ζ) d μ

, the density of the

λ

-mixture family

p^{(λ)}

with respect to the new measure

ν

now has the form:

p^{(λ)} (ζ | \tilde{η}) = {(1 + (1 - λ) \tilde{η} \cdot F (ζ))}^{1 / (1 - λ)} e^{- ψ_{1 - λ} (\tilde{η})},

where

ψ_{1 - λ} (\tilde{η}) = log Z_{λ} (η) - \frac{1}{1 - λ} log h_{λ} (η)

. Thus, we showed the following:

Proposition 6

(Relation between

λ

-exponential and

λ

-mixture families). Suppose

λ \neq 0, 1

. A λ-mixture family with pure densities:

P (ζ) = {P_{0} (ζ), P_{1} (ζ), \dots, P_{d} (ζ)}

becomes a λ-exponential family with the vector of random functions:

F (ζ) = {F_{1} (ζ), \dots, F_{d} (ζ)}

after a transformation of the dominating measure

d μ \to d ν = P_{0} (ζ) d μ

and the random variables

P (ζ) \to F (ζ)

:

F_{i} (ζ) = \frac{1}{1 - λ} ({(\frac{P_{i} (ζ)}{P_{0} (ζ)})}^{1 - λ} - 1) = {log}_{1 - λ} (\frac{P_{i} (ζ)}{P_{0} (ζ)}),

and a reparameterization

η \to \tilde{η}

:

{\tilde{η}}_{i} = \frac{1}{h_{λ} C_{i}} η_{i} ⟺ η_{i} = {\tilde{η}}_{i} (h_{λ} C_{i})

with:

h_{λ} = \sum_{i = 0}^{d} \frac{η_{i}}{C_{i}} = {(\sum_{i = 0}^{d} {\tilde{η}}_{i} C_{i})}^{- 1} .
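Proposition 6 can be illustrated on a finite sample space by building the λ-mixture density from (39) and, separately, the exponential-type form in the reparameterized variables, then comparing the two after normalization. The three component densities, the mixture weights, the counting reference measure, and λ = 0.5 are illustrative assumptions.

```python
LAM = 0.5
# Hypothetical component densities P_0, P_1, P_2 on a 3-point space.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.1, 0.3, 0.6]]
eta = [0.5, 0.3, 0.2]            # eta_0 = 1 - eta_1 - eta_2
n_pts, n_comp = 3, 3
power = 1.0 / (1.0 - LAM)

# Escort transformations and the lambda-mixture density (39).
C = [sum(pz ** (1.0 - LAM) for pz in Pi) for Pi in P]
escort = [[P[i][z] ** (1.0 - LAM) / C[i] for z in range(n_pts)] for i in range(n_comp)]
mix = [sum(eta[i] * escort[i][z] for i in range(n_comp)) ** power for z in range(n_pts)]
Z = sum(mix)
p_mix = [m / Z for m in mix]

# Reparameterization eta -> eta_tilde and the statistics F_i = log_{1-lam}(P_i / P_0).
h = sum(eta[i] / C[i] for i in range(n_comp))
eta_t = [eta[i] / (h * C[i]) for i in range(n_comp)]
F = [[((P[i][z] / P[0][z]) ** (1.0 - LAM) - 1.0) / (1.0 - LAM) for z in range(n_pts)]
     for i in (1, 2)]

# Exponential-type form with respect to d nu = P_0 d mu, normalized numerically.
expo = [P[0][z] * (1.0 + (1.0 - LAM) * sum(eta_t[i + 1] * F[i][z] for i in range(2))) ** power
        for z in range(n_pts)]
S = sum(expo)
p_exp = [e / S for e in expo]

print(p_mix)
print(p_exp)   # the same density, up to rounding
```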

5.4. Potential Functions as Rényi Entropies

We now show that our

λ

-duality framework is naturally compatible with the

λ

-exponential and

λ

-mixture families, with Rényi entropy and Rényi divergence replacing Shannon entropy and Kullback–Leibler divergence. In what follows, we assume

λ < 1

.

Proposition 7

(For the

λ

-exponential family). With respect to the λ-exponential family defined by (37) with divisive potential function

φ_{λ}

given by (38), we have:

(i): $φ_{λ} (ϑ)$ is λ-convex. Moreover, $1 - λ D φ_{λ} (ϑ) \cdot ϑ > 0$ .
(ii): The λ-conjugate variable $η = D^{(λ)} φ_{λ} (ϑ) = D ϕ_{λ} (θ)$ is the escort expectation:

$η = \frac{\int {(p (ζ | θ))}^{1 - λ} F (ζ) d μ}{\int {(p (ζ | θ))}^{1 - λ} d μ} = \int \tilde{p} (ζ | θ) F (ζ) d μ .$
(iii): The λ-conjugate function $ψ_{λ} (η)$ with respect to $φ_{λ} (ϑ)$ is given by:

$ψ_{λ} (η) = - H_{λ}^{Rényi} [p (\cdot | ϑ)] .$
(iv): The λ-logarithmic divergence is the Rényi divergence:

$L_{λ, φ_{λ}} (ϑ, ϑ^{'}) = H_{λ}^{Rényi} [p (\cdot | ϑ^{'}) | | p (\cdot | ϑ)] .$
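Item (ii) can be illustrated numerically under subtractive normalization. The sketch below is a toy check under stated assumptions, not the proof given in [17]: it uses a counting measure on four points, a made-up scalar statistic `F`, the deformed exponential ${exp}_{λ} (t) = (1 + λ t)^{1 / λ}$ (the form consistent with the λ-mixture density written earlier in this section), and hypothetical helper names. It solves for $ϕ_{λ} (θ)$ by bisection and compares a central finite-difference derivative with the escort expectation.

```python
import math

LAM = 0.5                 # illustrative lambda in (0, 1)
F = [0.0, 1.0, 2.0, 3.0]  # made-up scalar statistic on a 4-point sample space


def exp_lam(t):
    """Assumed deformed exponential (1 + LAM*t)^(1/LAM); needs 1 + LAM*t > 0."""
    return (1.0 + LAM * t) ** (1.0 / LAM)


def total_mass(theta, phi):
    """Sum over the sample space of exp_lam(theta*F - phi)."""
    return sum(exp_lam(theta * f_z - phi) for f_z in F)


def subtractive_potential(theta):
    """Solve total_mass(theta, phi) = 1 for phi by bisection."""
    # phi must stay below this bound so that every point keeps positive mass.
    hi = min(theta * f_z for f_z in F) + 1.0 / LAM - 1e-9
    if total_mass(theta, hi) >= 1.0:
        raise ValueError("no full-support normalization for this theta")
    lo = hi - 1.0
    while total_mass(theta, lo) < 1.0:  # mass decreases as phi grows
        lo -= 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_mass(theta, mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def escort_expectation(theta):
    """Expectation of F under the escort weights p^(1 - LAM)."""
    phi = subtractive_potential(theta)
    p = [exp_lam(theta * f_z - phi) for f_z in F]
    w = [p_z ** (1.0 - LAM) for p_z in p]
    return sum(w_z * f_z for w_z, f_z in zip(w, F)) / sum(w)


theta, h = 0.3, 1e-5
grad = (subtractive_potential(theta + h) - subtractive_potential(theta - h)) / (2 * h)
print(grad, escort_expectation(theta))
```

The agreement follows from differentiating the normalization constraint in $θ$ and using that the derivative of the assumed ${exp}_{λ}$ equals ${exp}_{λ}$ raised to the power $1 - λ$, which is where the escort weights come from.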

Proposition 8 (For the λ-mixture family). With respect to the λ-mixture family given by (39) with its potential function $ψ_{λ} (η)$ given by:

ψ_{λ} (η) = \frac{1 - λ}{λ} log \int {(\sum_{i = 0}^{d} η_{i} {\tilde{P}}_{i})}^{1 / (1 - λ)} d μ = \frac{1 - λ}{λ} log Z_{λ} (η),

we have:

(i): The potential function $ψ_{λ} (η)$ is a λ-convex function of η.
(ii): The potential function $ψ_{λ} (η)$ is given by:

$ψ_{λ} (η) = - H_{λ}^{Rényi} [p (\cdot | η)] .$
(iii): The λ-logarithmic divergence is the Rényi divergence:

$L_{λ, ψ_{λ}} (η, η^{'}) = H_{λ}^{Rényi} [p (\cdot | η) \| p (\cdot | η^{'})] .$

The proofs of the above Proposition 7 (about the λ-exponential family) and Proposition 8 (about the λ-mixture family) can be found in [17].

6. Summary and Conclusions

Our paper summarizes a canonical approach to deforming exponential and mixture families and the associated dually flat Hessian geometry. The λ-exponential family we introduced has two parameterizations (35) and (37):

p^{(λ)} (ζ | \cdot) = {exp}_{λ} (ϑ \cdot F (ζ)) e^{- φ_{λ} (ϑ)} = {exp}_{λ} (θ \cdot F (ζ) - ϕ_{λ} (θ)) .

The first expression uses divisive normalization (through $φ_{λ} (ϑ)$) and the second uses subtractive normalization (through $ϕ_{λ} (θ)$). A typical example of subtractive normalization is the q-exponential family with the associated Tsallis entropy, whereas an example of divisive normalization is the $F^{(\pm α)}$-family and the associated Rényi entropy. These two versions of deformation to an exponential family are two faces of the same coin; furthermore, the λ-exponential family is also linked to the λ-mixture family, when $λ \neq 0, 1$, via a reparameterization of the random functions $F (ζ)$ above.

The coincidence of these two parameterizations of the deformed family is associated with the λ-duality, which is the main focus of our exposition. The λ-duality is a “deformation” (see Table 1) of the usual Legendre duality reviewed in Section 3.1. In a nutshell, instead of convex functions, we worked with λ-convex functions f such that $\frac{1}{λ} (e^{λ f} - 1)$ is convex, for a fixed $λ \neq 0$. Furthermore, instead of the convex conjugate, we used the λ-conjugate given by:

f^{(λ)} (u) = sup_{x} (\frac{1}{λ} log (1 + λ x \cdot u) - f (x)) .

The expression of the λ-duality:

κ_{λ} (x \cdot u) = f (x) + f^{(λ)} (u),

turns out to be a re-write of the Legendre duality between $\tilde{x}$ and $u$:

\tilde{x} \cdot u = g (\tilde{x}) + g^{*} (u), \quad \text{with} \quad \tilde{x} = x e^{- λ f (x)};

and a re-write of the Legendre duality between $x$ and $\tilde{u}$:

x \cdot \tilde{u} = {\tilde{g}}^{*} (x) + \tilde{g} (\tilde{u}), \quad \text{with} \quad \tilde{u} = u e^{- λ f^{(λ)} (u)} .

Therefore, λ-duality is in essence the Legendre duality with a λ-dependent rescaling of the variables:

\tilde{x} = x e^{- λ f (x)} ⟺ x = \frac{\tilde{x}}{1 - λ g (\tilde{x})}

and:

\tilde{u} = u e^{- λ f^{(λ)} (u)} ⟺ u = \frac{\tilde{u}}{1 - λ \tilde{g} (\tilde{u})} .

The two pairs of convex functions $g, g^{*}$ and $\tilde{g}, {\tilde{g}}^{*}$ are linked with the pair of λ-convex functions $f, f^{(λ)}$ via:

\begin{matrix} g (\tilde{x}) & = & G_{- λ, f} (x) = γ_{- λ} (f (x)) = (γ_{- λ} \circ f) (x); \\ g^{*} (u) & = & G_{λ, f^{(λ)}} (u) = γ_{λ} (f^{(λ)} (u)) = (γ_{λ} \circ f^{(λ)}) (u); \\ \tilde{g} (\tilde{u}) & = & G_{- λ, f^{(λ)}} (u) = γ_{- λ} (f^{(λ)} (u)) = (γ_{- λ} \circ f^{(λ)}) (u); \\ {\tilde{g}}^{*} (x) & = & G_{λ, f} (x) = γ_{λ} (f (x)) = (γ_{λ} \circ f) (x) . \end{matrix}
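These relations can be sanity-checked numerically. The sketch below is illustrative only: it assumes $κ_{λ} (t) = \frac{1}{λ} log (1 + λ t)$ and $γ_{λ} (t) = \frac{1}{λ} (e^{λ t} - 1)$ (the forms suggested by the λ-conjugate and by the definition of λ-convexity in this section), picks the toy function $f (x) = x^{2} / 2$ with $λ = 0.5$ in one dimension, and uses hypothetical function names. For this f the supremum defining $f^{(λ)}$ has a closed-form maximizer, so the Legendre rewrite in the variables $(\tilde{x}, u)$ can be evaluated directly.

```python
import math

LAM = 0.5  # illustrative lambda > 0


def kappa(t):
    """kappa_lambda(t) = log(1 + LAM*t) / LAM (assumed form)."""
    return math.log(1.0 + LAM * t) / LAM


def gamma(t, lam):
    """gamma_lam(t) = (exp(lam*t) - 1) / lam (assumed form)."""
    return (math.exp(lam * t) - 1.0) / lam


def f(x):
    """Toy function; gamma(f(x), LAM) is convex, so f is lambda-convex."""
    return 0.5 * x * x


def argmax_x(u):
    """Maximizer of kappa(x*u) - f(x): a root of LAM*u*x^2 + x - u = 0."""
    return 2.0 * u / (1.0 + math.sqrt(1.0 + 4.0 * LAM * u * u))


def f_conj(u):
    """lambda-conjugate of f, evaluated at the closed-form maximizer."""
    x = argmax_x(u)
    return kappa(x * u) - f(x)


def legendre_gap(u):
    """x_tilde*u - g(x_tilde) - g*(u), which should vanish."""
    x = argmax_x(u)
    x_tilde = x * math.exp(-LAM * f(x))   # rescaled primal variable
    g = gamma(f(x), -LAM)                 # g(x_tilde) = gamma_{-lam}(f(x))
    g_star = gamma(f_conj(u), LAM)        # g*(u) = gamma_{lam}(f^(lam)(u))
    return x_tilde * u - g - g_star


for u in (-2.0, -0.5, 0.7, 3.0):
    print(u, f_conj(u), legendre_gap(u))
```

The last printed column should be zero up to rounding. The check only covers the pair $(\tilde{x}, u)$; the pair $(x, \tilde{u})$ can be verified the same way by swapping the roles of $γ_{λ}$ and $γ_{- λ}$.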

The λ-duality leads to nontrivial mathematical questions, e.g., a differential calculus in the spirit of Rockafellar and analogous to functions of the Legendre type. Some of the derivations in the current paper were heuristic, and a complete and rigorous development is left for future research.

Coming back to the probability families, we first verified that the subtractive potential $ϕ_{λ} (θ)$ is convex in $θ$ and the divisive potential $φ_{λ} (ϑ)$ is λ-convex in $ϑ$. Subtractive normalization using $ϕ_{λ} (θ)$ is associated with the regular Legendre duality, whereas divisive normalization using $φ_{λ} (ϑ)$ is associated with the λ-duality. This gives an interpretation of the distinctiveness of Rényi entropy (used in the latter) from Tsallis entropy (used in the former) based on their intimate connection to the λ-duality (for $λ \neq 0$) or to the Legendre duality. As λ is the parameter that controls the curvature in the Riemannian geometry of these probability families (see [12]), our framework provides a simple parametric deformation from the dually flat geometry (of the exponential model) to the dually projectively flat geometry (of the λ-exponential model). We expect that this framework will generate new insights in the applications of the q-exponential family and related concepts in statistical physics and information science.

Author Contributions

Conceptualization, J.Z.; Formal analysis, J.Z. and T.-K.L.W.; Investigation, T.-K.L.W.; Writing—original draft, J.Z.; Writing—review & editing, J.Z. and T.-K.L.W. All authors have read and agreed to the published version of the manuscript.

Funding

J.Z. is supported by United States Air Force Office of Scientific Research, grant number AFOSR-FA9550-19-1-0213. T-K.L.W. is supported by NSERC Discovery Grant RGPIN-2019-04419 and a Connaught New Researcher Award.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Amari, S.I.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000; Volume 191.
2. Ay, N.; Jost, J.; Vân Lê, H.; Schwachhöfer, L. Information geometry and sufficient statistics. Probab. Theory Relat. Fields 2015, 162, 327–364.
3. Zhang, J.; Khan, G. From Hessian to Weitzenböck: Manifolds with torsion-carrying connections. Inf. Geom. 2019, 2, 77–98.
4. Zhang, J.; Khan, G. Statistical mirror symmetry. Differ. Geom. Its Appl. 2020, 73, 101678.
5. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 2004, 5, 102.
6. Naudts, J.; Zhang, J. Rho–tau embedding and gauge freedom in information geometry. Inf. Geom. 2018, 1, 79–115.
7. Murata, N.; Takenouchi, T.; Kanamori, T.; Eguchi, S. Information geometry of U-Boost and Bregman divergence. Neural Comput. 2004, 16, 1437–1481.
8. Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195.
9. Amari, S.I.; Ohara, A. Geometry of q-exponential family of probability distributions. Entropy 2011, 13, 1170–1185.
10. Amari, S.I.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Phys. A Stat. Mech. Appl. 2012, 391, 4308–4319.
11. Ohara, A.; Matsuzoe, H.; Amari, S.I. Conformal geometry of escort probability and its applications. Mod. Phys. Lett. B 2012, 26, 1250063.
12. Wong, T.K.L. Logarithmic divergences from optimal transport and Rényi geometry. Inf. Geom. 2018, 1, 39–78.
13. Pal, S.; Wong, T.K.L. Multiplicative Schrödinger problem and the Dirichlet transport. Probab. Theory Relat. Fields 2020, 178, 613–654.
14. Pal, S.; Wong, T.K.L. The geometry of relative arbitrage. Math. Financ. Econ. 2016, 10, 263–293.
15. Pal, S.; Wong, T.K.L. Exponentially concave functions and a new information geometry. Ann. Probab. 2018, 46, 1070–1113.
16. Wong, T.K.L. Information Geometry in Portfolio Theory. In Geometric Structures of Information; Springer: Berlin/Heidelberg, Germany, 2019; pp. 105–136.
17. Wong, T.K.L.; Zhang, J. Tsallis and Rényi deformations linked via a new λ-duality. arXiv 2021, arXiv:2107.11925.
18. Kurose, T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. Second Ser. 1994, 46, 427–433.
19. Wong, T.K.L.; Yang, J. Logarithmic divergence: Geometry and interpretation of curvature. In International Conference on Geometric Science of Information; Springer: Berlin/Heidelberg, Germany, 2019; pp. 413–422.
20. Amari, S.I. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016.
21. Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631–647.
22. Matumoto, T. Any statistical manifold has a contrast function—On the C3-functions taking the minimum at the diagonal of the product manifold. Hiroshima Math. J. 1993, 23, 327–332.
23. Nagaoka, H.; Amari, S.I. Differential Geometry of Smooth Families of Probability Distributions; Technical Report METR 82-7; University of Tokyo: Tokyo, Japan, 1982.
24. Zhang, J. Referential duality and representational duality on statistical manifolds. In Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo, Japan, 12–16 December 2005; Volume 1216, pp. 58–67.
25. Zhang, J. Nonparametric information geometry: From divergence function to referential-representational biduality on statistical manifolds. Entropy 2013, 15, 5384–5418.
26. Blondel, M.; Martins, A.F.; Niculae, V. Learning with Fenchel-Young losses. J. Mach. Learn. Res. 2020, 21, 1–69.
27. Naudts, J. Generalized exponential families and associated entropy functions. Entropy 2008, 10, 131–149.
28. Naudts, J. Generalized Thermostatistics; Springer: Berlin/Heidelberg, Germany, 2011.
29. Zhang, J. On monotone embedding in information geometry. Entropy 2015, 17, 4485–4499.
30. Eguchi, S. Information geometry and statistical pattern recognition. Sugaku Expos. 2006, 19, 197–216.
31. Matsuzoe, H. Hessian structures on deformed exponential families and their conformal structures. Differ. Geom. Its Appl. 2014, 35, 323–333.
32. Newton, N.J. An infinite-dimensional statistical manifold modelled on Hilbert space. J. Funct. Anal. 2012, 263, 1661–1681.
33. Montrucchio, L.; Pistone, G. Deformed exponential bundle: The linear growth case. In International Conference on Geometric Science of Information; Springer: Berlin/Heidelberg, Germany, 2017; pp. 239–246.
34. De Andrade, L.H.; Vieira, F.L.; Cavalcante, C.C. On Normalization Functions and ϕ-Families of Probability Distributions. In Progress in Information Geometry: Theory and Applications; Springer Nature Switzerland AG: Cham, Switzerland, 2021.
35. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970.
36. Villani, C. Topics in Optimal Transportation; American Mathematical Society: Providence, RI, USA, 2003.
37. Villani, C. Optimal Transport: Old and New; Springer: Berlin/Heidelberg, Germany, 2008.
38. Wong, T.K.L.; Yang, J. Pseudo-Riemannian geometry encodes information geometry in optimal transport. Inf. Geom. 2021, 1–29.
39. Tsallis, C. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
40. Tsallis, C. What are the numbers that experiments provide. Quim. Nova 1994, 17, 468–471.
41. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics; The Regents of the University of California: Oakland, CA, USA, 1961.
42. Van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.

Figure 1. Illustration of the λ-logarithmic divergence. Top: $λ = - 1$ and $f (x) = \sqrt{x (10 - x)}$. Bottom: $λ = 1$ and $f (x) = 2 log x$. In both cases, $x^{'} = 4$ and $x = 8$, and we plot the function on the interval $(2, 9)$. Note that the first-order logarithmic approximation (dashed grey curve) supports the graph of f from below.


Table 1. Generalization of objects from the Hessian (dually flat) geometry to the λ-deformed (dually projectively flat) geometry.


Objects	Conventional ($λ = 0$)	$λ$-Deformed
transformation	Legendre	$λ$-Legendre
conjugation	${sup}_{x} (x \cdot u - f (x))$	${sup}_{x} (κ_{λ} (x \cdot u) - f (x))$
potentials	convex	$λ$-convex
associated divergence	Bregman	$λ$-logarithmic
Riemannian metric	Hessian	conformal Hessian
affine connections	dually flat	dually projectively flat
curvature of connections	0	constant $λ \neq 0$
biorthogonal coordinates	$(x, u)$	$(\tilde{x}, u)$ or $(x, \tilde{u})$
probability family	exponential	$λ$-exponential
probability family	mixture	$λ$-mixture
associated divergence	Kullback–Leibler	Rényi
associated entropy	Shannon	Rényi/Tsallis

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
