λ-Deformation: A Canonical Framework for Statistical Manifolds of Constant Curvature

This paper systematically presents the λ-deformation as the canonical deformation framework for the dually flat (Hessian) geometry that is well established in information geometry. We show that, based on deforming the Legendre duality, every object in the Hessian case has its counterpart in the λ-deformed case: λ-convexity, λ-conjugation, λ-biorthogonality, λ-logarithmic divergence, λ-exponential and λ-mixture families, etc. In particular, λ-deformation unifies the Tsallis and Rényi deformations by relating them to two manifestations of one and the same λ-exponential family, under subtractive or divisive probability normalization, respectively. Unlike the exponential and mixture families, which carry different Hessian geometries, the λ-exponential family coincides with the λ-mixture family after a change of random variables. The resulting statistical manifolds, while still carrying a dualistic structure, replace the Hessian metric and the pair of dually flat conjugate affine connections with a conformal Hessian metric and a pair of projectively flat connections carrying constant (nonzero) curvature. Thus, λ-deformation is a canonical framework for generalizing the well-known dually flat Hessian structure of information geometry.


Introduction
Information geometry is a differential-geometric framework for studying finite-dimensional statistical models that coherently integrates the following notions: (i) a differentiable manifold M consisting of probability density functions or finite measures on a common sample space; (ii) a divergence function D[p||p′] that defines an asymmetric proximity between points p, p′ in M; (iii) a Riemannian metric g plus a pair of torsion-free dual (conjugate) affine connections ∇, ∇* on M.
For completeness, we recall that a pair of affine connections ∇, ∇* on M are said to be dual (or conjugate) with respect to a Riemannian metric g if for any vector fields X, Y, and Z on M, one has: Zg(X, Y) = g(∇_Z X, Y) + g(X, ∇*_Z Y).
Here, (M, g, ∇, ∇*) is called a dualistic structure. When D is the Kullback-Leibler divergence (or, more generally, an f-divergence), the induced Riemannian metric g is the Fisher-Rao metric, and the induced cubic form C = ∇* − ∇ is the Amari-Chentsov tensor [1]. It can be shown that the Fisher-Rao metric and the Amari-Chentsov tensor are unique invariants, of respectively second and third orders, under sufficient statistics on the manifold M [2].
Geometrically, the standard model (denoted the S-model in this paper) uses a pair of affine connections that are torsion-free, though in general, they are not curvature-free. An alternative, "partially flat" model (denoted the P-model in this paper) was recently investigated in [3], leading to the notion of "statistical mirror symmetry" [4]. Under the P-model, the affine connections ∇ and ∇* are allowed to carry torsion, but are both curvature-free. See [4] for the geometric properties of the P-model leading to a symplectic-to-complex correspondence characteristic of mirror Calabi-Yau manifolds studied in string theory and mathematical physics.
Within the usual S-model, a special case is the dually flat geometry where the Riemannian metric can be expressed under special coordinate systems as a Hessian metric. Two prominent examples are the exponential family and the mixture family, where the Hessian metric coincides with the Fisher-Rao metric. The Hessian geometry is said to be dually flat because the Riemann curvature tensors of both the primal and the dual connections vanish; the corresponding primal and dual affine coordinate systems are linked via Legendre transformations by a pair of convex potentials. For an exponential family, these coordinates are precisely the natural (canonical) and mixture (expectation) coordinate systems, respectively. Note that the Hessian metric itself is not flat as its Levi-Civita connection contains curvature in general.
Between the well-understood dually flat Hessian geometry and the full-blown S-model, there is a wide range of geometries capturing various probability models. Of special interest are generalizations of the exponential family, namely deformed exponential families. The φ-exponential family was introduced in the context of statistical physics [5]; it was later shown [6] to be equivalent to the U-model [7] motivated by applications in machine learning. Moreover, [6] revealed that both the φ- and U-models can be generated from the (ρ, τ)-model [8] through the mechanism of "gauge selection". The (ρ, τ)-metric generalizes the Fisher-Rao metric and may lead to a conformal Hessian metric for a φ-exponential family. However, the connections are typically not curvature-free unless a special type of gauge is selected; this underlies the geometric characterization of the q-exponential model of Tsallis by [9][10][11].
In recent years, the second author [12], motivated by previous works with Pal on mathematical finance and optimal transport [13][14][15][16], studied a class of deformed exponential families generating constant curvatures through the use of a new divergence function called logarithmic divergence. By constant (information geometric) curvature, we mean that both the primal and dual Riemann curvature tensors have (the same) constant sectional curvature with respect to g. In [17], the present authors developed a unified framework, based on the notions of λ-duality and the λ-exponential family, which appears to provide a canonical extension of the dually flat geometry to the constant curvature case. Previously, statistical manifolds with constant curvature were studied using the abstract tools of affine differential geometry; see, e.g., [1,18] (also see [19]). Our framework provides a concrete approach and an explicit construction that elucidates how the properties of the exponential family and the dually flat geometry may be extended to the constant curvature case. In this paper, a careful exposition of the λ-deformation framework is provided from the perspective of λ-duality, namely the λ-deformation of Legendre duality.
The rest of the paper is organized as follows. In Section 2, we review the standard S-model of information geometry with a focus on the dually flat geometry, based on convex duality and Bregman divergence, of the exponential and mixture families. The section closes with a preview of λ-deformation by introducing a suite of four deformation functions, as two pairs of mutually inverse functions: log_λ versus exp_λ and κ_λ versus γ_λ, with the first pair deforming log and exp and the second pair deforming the identity function. In Section 3, we describe the λ-duality, which deforms the standard convex duality. In particular, we compare λ-duality and standard Legendre duality and show their relations to each other upon a change of parameterization. In Section 4, we define the λ-gradient and then the λ-logarithmic divergence and study the constant curvature information geometry the latter induces. In Section 5, we relate the λ-divergence to the Rényi entropy by introducing the λ-exponential and λ-mixture families. The two expressions of the λ-exponential family under divisive and subtractive normalization correspond to, respectively, the Rényi deformation and the Tsallis deformation. Section 6 concludes with a comparison of λ-deformation with the standard dually flat (Hessian) framework.

The Standard Model
We begin by recalling the standard framework (referred to as the S-model) of parametric information geometry [1,20]. Let M be a finite-dimensional differentiable manifold with dimension d and θ = (θ 1 , . . . , θ d ) be a local coordinate system. The most important case is where M is a manifold of parametric probability density functions. However, the idea of deforming Legendre duality to λ-duality and hence dually flat (Hessian) manifolds to manifolds of constant curvature discussed in Sections 3 and 4 is entirely general and does not rely on M being a manifold of probability density functions.
Let (X, µ) be a measure space, where µ is called the reference (or dominating) measure. Let Θ ⊂ R^d be an open domain. A parametric family of density functions is a mapping θ ∈ Θ → p(·|θ), where each p(·|θ) is a probability density function with respect to µ, i.e., ∫_X p(ζ|θ) dµ(ζ) = 1. We assume that the family is sufficiently regular such that all analytical operations (such as differentiation under the integral sign) can be performed as needed.
While a dualistic structure (M, g, ∇, ∇*) can be defined abstractly, in practice, it is often constructed from a divergence, namely a smooth, non-negative function D[·||·] on M × M such that D[p||p′] = 0 only if p = p′ and the (0,2)-tensor g it induces on M (see (2) below) is positive definite. Intuitively, D[p||p′] defines a notion of "asymmetric distance" between points p and p′ of M. When M is a manifold of density functions, a prominent example is the Kullback-Leibler (KL) divergence (relative entropy) given by:

D_KL[p||p′] = ∫_X p(ζ) log (p(ζ)/p′(ζ)) dµ(ζ) = H[p||p′] − H[p],

where H[p||p′] = −∫_X p log p′ dµ is the cross-entropy and H[p] = H[p||p] is the entropy. When dealing with parametric probability families, p and p′ are replaced by p(·|θ) and p(·|θ′), and we then denote D[p||p′] as D(θ, θ′) with an abuse of notation, that is:

D(θ, θ′) = D[p(·|θ)||p(·|θ′)],

and similarly for H as well; the notation [p||p′] in the divergence for probability density functions emphasizes the asymmetry in p, p′; see [1]. Eguchi [21] showed that any divergence function (called a "contrast function" there) induces a dualistic structure (M, g, ∇, ∇*). In local coordinates, given D(θ, θ′), the components g_ij of the metric g are given by:

g_ij(θ) = − (∂^2/∂θ^i ∂θ′^j) D(θ, θ′) |_{θ′=θ},    (2)

and the Christoffel symbols of the conjugate connections ∇ and ∇* are given respectively by:

Γ_{ij,k}(θ) = − (∂^3/∂θ^i ∂θ^j ∂θ′^k) D(θ, θ′) |_{θ′=θ},   Γ*_{ij,k}(θ) = − (∂^3/∂θ′^i ∂θ′^j ∂θ^k) D(θ, θ′) |_{θ′=θ}.    (3)

Conversely, given any dualistic structure (M, g, ∇, ∇*), there exists a divergence D that induces it, but this D is not unique in general [22]. Thus, the standard model S is completely encoded by the choice of a divergence function.

Dually Flat Geometry
The most important example of a dualistic structure is the dually flat geometry, which is induced by a Bregman divergence [23]. Let M be prescribed with an affine coordinate chart θ ∈ Θ on an open convex set Θ ⊂ R^d. Let φ : Θ → R be a differentiable convex function; specifically, we assume that φ is C^2 and its Hessian D^2 φ is strictly positive definite. The Bregman divergence of φ is defined by:

B_φ(θ, θ′) = φ(θ) − φ(θ′) − Dφ(θ′) · (θ − θ′),

where Dφ = (∂φ/∂θ^1, ..., ∂φ/∂θ^d) is the Euclidean gradient and a · b denotes the standard dot product. We call θ ∈ Θ the primal coordinates, and η = Dφ(θ) the dual coordinates, where the inverse of Dφ is given by θ = Dφ*(η). Here, the Legendre conjugate φ* (or convex conjugate) of φ is defined by:

φ*(η) = sup_{θ ∈ Θ} {θ · η − φ(θ)}.    (4)

Then, the components g_ij of the Riemannian metric g, under the respective local coordinate systems, are given by:

g_ij = ∂^2 φ/∂θ^i ∂θ^j (under θ),   g_ij = ∂^2 φ*/∂η_i ∂η_j (under η).

In particular, g is a Hessian metric with potential φ (resp. φ*) under θ (resp. η). Furthermore, the Christoffel symbols of ∇ and ∇* are given respectively by:

Γ_{ij,k} ≡ 0 (under θ),   Γ*_{ij,k} ≡ 0 (under η).    (6)

From (6), we see that the Riemann curvature tensors of both ∇ and ∇* vanish. Thus, we call this a dually flat geometry. Furthermore, a ∇-geodesic (resp. ∇*-geodesic) is a constant-velocity straight line under the θ (resp. η) coordinate system. Moreover, the θ and η coordinates are biorthogonal in the sense that:

⟨∂/∂θ^i, ∂/∂η_j⟩ = δ^i_j,    (7)

and the Bregman divergence takes the forms of:

B_φ(θ, θ′) = B_{φ*}(η′, η) = A(θ, η′),    (8)

with η = Dφ(θ) and η′ = Dφ(θ′), where A is called the canonical divergence:

A(θ, η′) = φ(θ) + φ*(η′) − θ · η′.

Following [24,25], we call the equality between the two expressions of B and the equality between the two expressions of A in (8) reference-representation biduality. In [26], the identity (8) was used to motivate a family of Fenchel-Young losses in the context of regularized prediction in machine learning. Last but not least, the Bregman divergence satisfies the generalized Pythagorean theorem: given points P, Q, and R, we have the equality:

B_φ(R, P) = B_φ(R, Q) + B_φ(Q, P)

if and only if the ∇-geodesic between Q and R and the ∇*-geodesic between Q and P meet g-orthogonally at Q.
As we will see in Section 4, all the properties above have natural generalizations under our λ-framework. We stress that the dually flat geometry depends crucially on classical convex (or Legendre) duality, as seen from (4) and (8).
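As a concrete numerical illustration of these identities (a minimal sketch, not taken from the paper: the 1-d potential φ(θ) = e^θ is chosen purely for convenience, so that Dφ = exp, φ*(η) = η log η − η, and Dφ* = log), the following checks the biduality and canonical-divergence equalities in (8):

```python
import math

# Illustrative 1-d potential: phi(theta) = e^theta (convex, hypothetical example).
# Then D phi = exp, phi*(eta) = eta*log(eta) - eta, and D phi* = log.
phi = math.exp
dphi = math.exp
phi_star = lambda eta: eta * math.log(eta) - eta

def bregman(F, dF, a, b):
    """Bregman divergence B_F(a, b) = F(a) - F(b) - dF(b) * (a - b)."""
    return F(a) - F(b) - dF(b) * (a - b)

theta, theta_p = 0.3, -0.8
eta, eta_p = dphi(theta), dphi(theta_p)   # dual coordinates eta = D phi(theta)

B_primal = bregman(phi, dphi, theta, theta_p)
B_dual = bregman(phi_star, math.log, eta_p, eta)        # reference/representation swapped
A_canon = phi(theta) + phi_star(eta_p) - theta * eta_p  # canonical divergence A(theta, eta')

# (8): B_phi(theta, theta') = B_{phi*}(eta', eta) = A(theta, eta')
```

The three quantities agree to machine precision, illustrating the reference-representation biduality for this particular potential.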

Exponential and Mixture Families
The dually flat Hessian geometry arises naturally in the exponential and mixture families of probability densities. Given a reference measure µ on a state space X, an exponential family is a parameterized probability density p^(e)(·|θ) of the form:

p^(e)(ζ|θ) = exp(θ · F(ζ) − φ(θ)),    (9)

where θ = (θ^1, ..., θ^d) ∈ Θ ⊆ R^d and F(ζ) = (F_1(ζ), ..., F_d(ζ)) is a vector of sufficient statistics. In (9), the cumulant generating function φ, defined by:

φ(θ) = log ∫_X exp(θ · F(ζ)) dµ(ζ),

enforces the normalization ∫ p^(e) dµ = 1. The exponential family generalizes the Boltzmann-Gibbs distribution in statistical physics, where Z(θ) = e^{φ(θ)} is called the partition function. The information geometry of the exponential family begins with the observation that φ is convex. Then, φ defines a Bregman divergence B_φ giving rise to a dually flat structure. It can be shown that the Bregman divergence is a KL-divergence:

B_φ(θ, θ′) = D_KL[p^(e)(·|θ′) || p^(e)(·|θ)].

The induced Riemannian metric g (in matrix components g_ij), the Fisher-Rao metric given by:

g_ij(θ) = E_θ[∂_i log p^(e)(ζ|θ) ∂_j log p^(e)(ζ|θ)],

becomes a Hessian metric D^2 φ:

g_ij(θ) = ∂^2 φ/∂θ^i ∂θ^j.

Equivalently, g(θ) is the covariance matrix of the sufficient statistics F:

g_ij(θ) = Cov_θ(F_i, F_j).

Furthermore, the dual coordinate η = Dφ(θ) is the expectation coordinates given by:

η = ∫_X p^(e)(ζ|θ) F(ζ) dµ(ζ),

and the dual potential function φ* is, as a function of η, the negative Shannon entropy:

φ*(η) = ∫_X p^(e)(ζ|θ) log p^(e)(ζ|θ) dµ(ζ).

A theoretical justification for the exponential family is that it maximizes the Shannon entropy under constraints on the expected values of the vector of random functions F(·).
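As an illustration, these identities can be verified for the Bernoulli family (a minimal hypothetical example with F(ζ) = ζ on X = {0, 1}, so that φ(θ) = log(1 + e^θ) and η = Dφ(θ) is the success probability): the Bregman divergence equals the KL divergence with arguments swapped, and the Hessian of φ equals the variance of F:

```python
import math

# Bernoulli exponential family: p(1|theta) = e^theta/(1+e^theta), p(0|theta) = 1/(1+e^theta).
phi = lambda th: math.log(1.0 + math.exp(th))         # cumulant generating function
eta = lambda th: math.exp(th) / (1.0 + math.exp(th))  # dual (expectation) coordinate = D phi

def kl(th_a, th_b):
    """KL divergence between the Bernoulli densities p(.|th_a) and p(.|th_b)."""
    pa, pb = eta(th_a), eta(th_b)
    return pa * math.log(pa / pb) + (1 - pa) * math.log((1 - pa) / (1 - pb))

th, th_p = 1.2, -0.4
bregman = phi(th) - phi(th_p) - eta(th_p) * (th - th_p)  # B_phi(theta, theta')
kl_check = kl(th_p, th)                                  # D_KL[p(.|theta') || p(.|theta)]

# Fisher-Rao metric = second derivative of phi = variance of F under p(.|theta):
h = 1e-4
hessian_fd = (phi(th + h) - 2 * phi(th) + phi(th - h)) / h**2
variance = eta(th) * (1 - eta(th))
```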
The mixture family is another probability family that is very useful in both theory and applications. Let P_0(ζ), P_1(ζ), ..., P_d(ζ) be a set of affinely independent probability densities with respect to the same dominating measure µ. Given mixture parameters η_i > 0 for i = 0, ..., d with ∑_{i=0}^d η_i = 1, the mixture family p^(m)(·|η) is defined by:

p^(m)(ζ|η) = ∑_{i=0}^d η_i P_i(ζ) = P_0(ζ) + ∑_{i=1}^d η_i (P_i(ζ) − P_0(ζ)),

where (η_1, ..., η_d) may be taken as the independent parameters. It can be shown that the negative Shannon entropy:

ψ(η) = ∫_X p^(m)(ζ|η) log p^(m)(ζ|η) dµ(ζ)

of a mixture family is convex in η. Using ψ as the potential function, we have:

B_ψ(η, η′) = D_KL[p^(m)(·|η) || p^(m)(·|η′)],

which is again a KL-divergence and induces a dually flat geometry. In summary, the exponential and mixture families are both dually flat when the geometry is induced by the KL-divergence. For completeness, we note that the convex conjugate of ψ(η) is:

ψ*(θ) = − ∫_X P_0(ζ) log p^(m)(ζ|η) dµ(ζ),

with conjugate parameters θ = Dψ(η) given by:

θ^i = ∫_X (P_i(ζ) − P_0(ζ)) log p^(m)(ζ|η) dµ(ζ).
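These expressions can likewise be checked on a finite sample space. The sketch below uses a hypothetical three-point sample space with d = 2; the conjugate value is computed as −∑ P_0(ζ) log p^(m)(ζ|η), an expression one can validate against the Fenchel identity ψ*(θ) = θ · η − ψ(η), which the code asserts:

```python
import math

# Three affinely independent densities on a 3-point sample space (hypothetical example).
P0, P1, P2 = [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]

def mix(e1, e2):
    """Mixture density p^(m)(.|eta) = P0 + e1*(P1 - P0) + e2*(P2 - P0)."""
    return [p0 + e1 * (p1 - p0) + e2 * (p2 - p0) for p0, p1, p2 in zip(P0, P1, P2)]

def psi(e1, e2):
    """Negative Shannon entropy of the mixture (the convex potential)."""
    return sum(p * math.log(p) for p in mix(e1, e2))

def dpsi(e1, e2):
    """Conjugate parameters theta_i = sum (P_i - P0) log p^(m)."""
    p = mix(e1, e2)
    return [sum((pi - p0) * math.log(pk) for pi, p0, pk in zip(Pi, P0, p))
            for Pi in (P1, P2)]

ea, eb = (0.3, 0.2), (0.5, 0.4)
th_b = dpsi(*eb)
B_psi = psi(*ea) - psi(*eb) - sum(t * (x - y) for t, x, y in zip(th_b, ea, eb))
kl = sum(pa * math.log(pa / pb) for pa, pb in zip(mix(*ea), mix(*eb)))

# Fenchel identity: psi(eta) + psi*(theta) = theta . eta, with psi* the cross-entropy term.
th_a = dpsi(*ea)
psi_star = -sum(p0 * math.log(p) for p0, p in zip(P0, mix(*ea)))
fenchel_gap = psi(*ea) + psi_star - sum(t * e for t, e in zip(th_a, ea))
```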

Deforming exp and log
The exponential function used in the exponential family:

p^(e)(ζ|θ) = exp(θ · F(ζ) − φ(θ)) = Z(θ)^{-1} exp(θ · F(ζ))

allows the cumulant generating function φ(θ) (also called the potential function) and the partition function Z(θ) to be linked by the simple relation φ = log Z. The equivalence of using φ as subtractive normalization and Z as divisive normalization of the same exponential family (with ∫ p^(e)(ζ|θ) dµ = 1) is due to the elementary, but crucial, property exp(x + y) = exp(x) exp(y) of the exponential function. Using a functional form other than exp (the exponential function) or log (the logarithmic function) is referred to as deformation in information-geometric (statistical and information-theoretic) contexts, and the resulting probability families are called "deformed" families. Typically, this is performed by regarding log, or equivalently exp, as a special case of some parametric class of functions that includes it as a special member. More generally, the exponential/logarithmic function can be considered within a non-parametric function space that includes exp or log as a special member. Several approaches can be found in the literature, including the φ-deformed exponential approach by Naudts [5,27,28], the conjugate (ρ, τ)-embedding approach by the first author [8,25,29], and the U-model by Eguchi [7,30]. The φ-model and U-model are both one-function models, while the (ρ, τ)-model uses two free functions. It eventually became clear in the 2018 paper [6] by Naudts and the first author that (i) the φ- and U-models turn out to be equivalent; (ii) they are special cases of the (ρ, τ)-model upon a particular fixing of the "gauge freedom"; (iii) the corresponding (ρ, τ)-geometry of the manifold of the φ-exponential family can have different appearances (gauge freedoms), such as a Hessian geometry (under one type of gauge selection) and a conformal Hessian geometry (under another type of gauge selection).
The work [6] unified the intermediary results in [10,11,31] and provided a general deformation framework that preserves the rigid interlocking of: (i) the functional form of entropy, cross-entropy, and relative entropy (divergence); (ii) the functional form of the deformed probability family with the corresponding normalization and potential and the duality between the natural and expectation parameterizations; (iii) the expressions of the Riemannian metric (Fisher-Rao metric in general and Hessian metric in particular) and of the conjugate connections. Some of these concepts have their correspondence in nonparametric probability families as well [32][33][34].
Although the (ρ, τ)-model may admit a conformal Hessian metric (more rigorously stated: the φ-exponential family with the (ρ, τ)-metric under a certain gauge will lead to a conformal Hessian geometry), the dual connections are not projectively flat (unlike the geometry studied in [12]). As a result, while the connections are not flat (torsion-free, but not curvature-free), they are not in general of the constant-curvature type either. Therefore, these models are "too general" and do not generate the spaces of constant curvature.

Highlights of λ-Deformation
Here enters λ-deformation as a middle ground [17]. The λ-deformation theory absorbs the q-deformation model of Tsallis and the F (±α) model of Wong [12] in deforming the exponential family and unifies the subtractive and divisive normalization-this is an occasion where subtractive and divisive normalizations are still linked by a simple reparameterization of the probability family.
Let us introduce some notation. Consider the following deformed logarithm and exponential functions (note the slight difference from the log_q notation used by Tsallis, in the way the subscript indicates the deformation parameter):

log_λ(t) = (t^λ − 1)/λ (t > 0),   exp_λ(t) = [1 + λt]_+^{1/λ},

where [a]_+ = max{a, 0}. In our analysis, we assume implicitly that 1 + λt > 0, which is shown to hold under λ-duality, so the subscript + can be omitted. For this reason, we restrict the arguments to this range, as in [9,28]. Below, we also take log t = −∞ whenever t ≤ 0. Note that our notation differs slightly from Tsallis' indexing of the deformed logarithm and exponential functions; see Section 5.
Next, we construct another pair of inverse functions κ_λ, γ_λ by:

κ_λ = log ∘ exp_λ,   γ_λ = log_λ ∘ exp,

where ∘ denotes function composition. Explicitly, they are:

κ_λ(t) = (1/λ) log(1 + λt),   γ_λ(t) = (e^{λt} − 1)/λ.    (10)

This suite of four functions, namely exp_λ, log_λ as an inverse pair and κ_λ, γ_λ as another inverse pair, is called the λ-deformation and is used in the discussions of λ-convexity, λ-conjugation, and λ-duality. The regular exponential and logarithmic functions are recovered when λ → 0, whence both κ_λ and γ_λ reduce to the identity function. Using these four functions, Wong and Zhang [17] developed the λ-deformation framework to solve the problem of relating the exponential family under subtractive normalization:

p(ζ|θ) = exp_λ(θ · F(ζ) − φ_λ(θ)),

to that under divisive normalization:

p(ζ|ϑ) = exp_λ(ϑ · F(ζ)) / Z_λ(ϑ).

There, the same λ-deformed exponential family can be expressed by the two parameterizations θ and ϑ linked through:

ϑ = θ / (1 − λ φ_λ(θ)),

while the normalization functions φ_λ and ϕ_λ := log Z_λ (with different domains) are linked through:

φ_λ(θ) = γ_{−λ}(ϕ_λ(ϑ)),   i.e.,   1 − λ φ_λ(θ) = Z_λ(ϑ)^{−λ}.

The λ-deformation framework led to a unified way of looking at the Tsallis entropy (related to the subtractive normalization) and the Rényi entropy (related to the divisive normalization), as well as generating new insights into the distinction between the exponential and mixture families through the lens of deformation theory. To understand this deformation better, we describe the underlying mathematical framework of λ-deformation.
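The four deformation functions are easy to tabulate numerically. A minimal sketch (λ = 0.5; the explicit functional forms below are the ones assumed throughout this section) checking the two inverse pairs, the compositions with exp and log, and the λ → 0 limits:

```python
import math

lam = 0.5  # deformation parameter lambda

log_l = lambda t, a=lam: (t**a - 1.0) / a                     # deformed logarithm log_lambda
exp_l = lambda t, a=lam: max(1.0 + a * t, 0.0) ** (1.0 / a)   # deformed exponential, with [.]_+
kappa = lambda t, a=lam: math.log(1.0 + a * t) / a            # kappa_lambda = log o exp_lambda
gamma = lambda t, a=lam: (math.exp(a * t) - 1.0) / a          # gamma_lambda = log_lambda o exp

t = 0.8
pair1 = exp_l(log_l(t))     # inverse pair: should recover t
pair2 = kappa(gamma(t))     # inverse pair: should recover t
comp1 = math.log(exp_l(t))  # should equal kappa(t)
comp2 = log_l(math.exp(t))  # should equal gamma(t)

# lambda -> 0 limits: log_l -> log, while kappa (and gamma) -> identity.
small = 1e-8
lim_log = log_l(t, small)  # approximately log(t)
lim_id = kappa(t, small)   # approximately t
```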

Deforming the Legendre Duality: λ-Duality
In this section, we describe the λ-duality and its link to the standard Legendre duality. We start by defining the notions of λ-conjugation and λ-convexity/λ-concavity, then draw a parallel to the regular Legendre duality. We proceed to establish a formal correspondence between the λ-duality and classical convex duality, including the associated notions of the λ-gradient, λ-logarithmic divergence, etc. Some of the derivations are illustrative yet heuristic; a rigorous analysis in the spirit of Rockafellar [35] is left to future research.

Legendre Duality and Bregman Divergence Reviewed
Recall from (4) that the convex conjugate of a function f on R^d is defined by:

f*(u) = sup_x {x · u − f(x)}.    (11)

It can be proven that:

f(x) + f*(u) ≥ x · u for all x, u (the Fenchel-Young inequality).

When f is further differentiable, then the Legendre transformation:

u = Df(x),

which can be motivated by the first-order condition in (11), defines a "dual variable" u, satisfying the Fenchel identity:

f(x) + f*(u) = x · u.

We have x = Df*(u), provided the second derivative D^2 f is positive definite. The function f also defines a Bregman divergence B_f given by:

B_f(x, x′) = f(x) − f(x′) − Df(x′) · (x − x′).    (12)

The Bregman divergence satisfies the reference-representation biduality [24,25] in the sense that:

B_f(x, x′) = B_{f*}(u′, u),   u = Df(x), u′ = Df(x′).

Note that when f is convex and differentiable, the nonnegativity of the Bregman divergence encodes the fact that for any x, x′:

f(x) ≥ f(x′) + Df(x′) · (x − x′).

λ-Deformation of Legendre Duality
The main idea behind the λ-deformation of the Legendre duality ("λ-duality") is to replace the term x · u in (11) by a monotone transformation of x · u. Given a parameter λ ∈ R \ {0}, later revealed to be the curvature parameter of the information-geometric characterization, we replace the term x · u by:

κ_λ(x · u) = (1/λ) log(1 + λ x · u),

where κ_λ(t) and its inverse γ_λ(t) are given by (10). With this in mind, we give the following definition.

Definition 1 (λ-conjugation).
Let Ω, Ω′ ⊂ R^d. Given a function f : Ω → R, its λ-conjugate f^(λ) : Ω′ → R is defined by:

f^(λ)(u) = sup_{x ∈ Ω} {κ_λ(x · u) − f(x)}.    (14)

Generalized convex dualities have been heavily used in optimal transport theory [36,37] to characterize the optimal transport plans; in this context, it is called the c-duality, where c is the cost function of the transport problem. A major novelty of our framework is that the functional form of κ_λ (and of γ_λ) leads to explicit formulas, which are not available in the general case. We remark that this is closely related to the fact that the associated information geometry has constant curvature λ.
It turns out that the λ-conjugation defined by (14) corresponds to an appropriately generalized notion of convexity or concavity, through the aid of the function γ λ given by (10). Henceforth, we let λ ∈ R \ {0} be a fixed constant.
Definition 2 (λ-exponential convexity and concavity). Let Ω ⊂ R^d be an open convex set. A function f : Ω → R is said to be λ-exponentially convex ("λ-convex"), or λ-exponentially concave ("λ-concave"), if:

γ_λ(f(x)) = (e^{λ f(x)} − 1)/λ

is convex, or concave, respectively, on Ω. Note that the additive term −1/λ in γ_λ does not affect convexity; it ensures that γ_λ reduces to the identity as λ → 0, meaning that in the limiting case, 0-convexity is just ordinary convexity.
It is easily shown that, for λ > 0 a fixed positive number, f is λ-convex if and only if e^{λ f} is convex (and, for λ < 0, if and only if e^{λ f} is concave).

Proposition 1. Given any f : Ω → R, we define the variable x̃, which has range Ω̃ ⊂ R^d, and the function g : Ω̃ → R by:

x̃ = e^{−λ f(x)} x,    (15)
g(x̃) = γ_{−λ}(f(x)) = (1 − e^{−λ f(x)})/λ.    (16)

Then, the convex (Legendre) conjugate g* of the function g:

g*(u) = sup_{x̃ ∈ Ω̃} {x̃ · u − g(x̃)},

is related to the λ-conjugate f^(λ) of the function f via:

g*(u) = γ_λ(f^(λ)(u)).    (17)

Proof. We first prove the following identities:

γ_λ(κ_λ(x · u) − f(x)) = [(1 + λ x · u) e^{−λ f(x)} − 1]/λ
                       = (x · u) e^{−λ f(x)} − (1 − e^{−λ f(x)})/λ = x̃ · u − g(x̃),

where, going from the first to the second line, we used (15) and the fact:

e^{−λ f(x)} = 1 − λ g(x̃),

which is a re-write of the definition of g given by (16).
With the above identity, we can proceed to prove this proposition. For u ∈ Ω′, we have:

γ_λ(f^(λ)(u)) = γ_λ(sup_{x ∈ Ω} {κ_λ(x · u) − f(x)}) = sup_{x ∈ Ω} γ_λ(κ_λ(x · u) − f(x)) = sup_{x̃ ∈ Ω̃} {x̃ · u − g(x̃)} = g*(u),

where we used the fact that γ_λ is increasing. Recasting the above relation yields (17).
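Proposition 1 can be probed numerically in 1-d, with all suprema replaced by maxima over a finite grid. The sketch below is a heuristic check, not a proof; it assumes the λ-conjugate in the form f^(λ)(u) = sup_x {κ_λ(x·u) − f(x)} together with the change of variable x̃ = e^{−λ f(x)} x and g(x̃) = (1 − e^{−λ f(x)})/λ, and uses the λ-convex test function f = κ_λ(h) with h(x) = x²:

```python
import math

lam = 0.5
kappa = lambda t: math.log(1.0 + lam * t) / lam    # kappa_lambda
gamma = lambda t: (math.exp(lam * t) - 1.0) / lam  # gamma_lambda
f = lambda x: kappa(x * x)  # lambda-convex: gamma(f(x)) = x^2 is convex

grid = [i / 1000.0 for i in range(-3000, 3001)]    # x-grid on [-3, 3]

def f_conj(u):
    """lambda-conjugate f^(lambda)(u) = sup_x { kappa(x u) - f(x) } (grid maximum)."""
    return max(kappa(x * u) - f(x) for x in grid if 1.0 + lam * x * u > 0)

def g_star(u):
    """Legendre conjugate of g over the same grid, parameterized through x:
    x_tilde = e^{-lam f(x)} x and g(x_tilde) = (1 - e^{-lam f(x)})/lam."""
    return max(math.exp(-lam * f(x)) * x * u - (1.0 - math.exp(-lam * f(x))) / lam
               for x in grid)

# Proposition 1: g*(u) = gamma(f^(lambda)(u)); this holds even gridwise,
# since gamma is increasing and maps the candidate values pointwise.
gaps = [abs(g_star(u) - gamma(f_conj(u))) for u in (0.2, 0.5, 1.0)]
```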
Recall from convex analysis that g* is always a convex function, regardless of whether g is convex (by the property of Legendre conjugation). The expression g*(u) = γ_λ(f^(λ)(u)) in (17) therefore implies that f^(λ) is λ-convex, by the definition of λ-convexity.

Corollary 1. For any f : Ω → R, its λ-conjugate f^(λ)(u) as defined by (14) is a λ-convex function of u on Ω′ (note that Ω′ may not necessarily be convex).
Proof. This follows from Proposition 1 and the discussion above; one can also give a direct proof (essentially reversing the steps of the proof of Proposition 1).
Corollary 1 is the extension of the claim that, for any f, the standard Legendre conjugate f* as given by (11) is always a convex function. Because of this, we can prove, in analogy to the standard Legendre conjugation *, the following relations: (f^(λ))^(λ) ≤ f in general, with equality (f^(λ))^(λ) = f when f is λ-convex.

Relations Between the λ-Duality and Legendre Duality
We proceed to establish a formal relationship between the λ-duality and the ordinary Legendre duality, by relating the λ-conjugation of a λ-convex function f , denoted by f (λ) , to the standard Legendre conjugation of a function (denoted by * ).
By Proposition 1, the pair (f, f^(λ)) of the λ-conjugation determines the ordinary Legendre pair (g, g*) through the change of variables x̃ = e^{−λ f(x)} x. Since f^(λ) is itself λ-convex (Corollary 1), the same construction can be applied to f^(λ): multiplying by e^{−λ f^(λ)(u)} and setting ũ = e^{−λ f^(λ)(u)} u and g̃(ũ) = γ_{−λ}(f^(λ)(u)), we obtain another ordinary Legendre pair (g̃, g̃*), with g̃*(x) = γ_λ(f(x)) whenever f is λ-convex.

We see that the functions γ_λ and γ_{−λ} serve as link functions from the (f, f^(λ))-pair of the λ-deformed Legendre conjugation to the (g, g*)-pair and the (g̃, g̃*)-pair of the regular Legendre conjugation.

λ-Logarithmic Divergence and Its Dualistic Geometry
In this section, we study the λ-deformation of the Bregman (canonical) divergence function and the resulting dualistic geometry (Riemannian metric and dual connections), which corresponds to the λ-duality. This involves first establishing the λ-deformation of the gradient operation (the so-called λ-gradient), which then leads to the so-called λ-logarithmic divergence function as a deformation of the Bregman divergence. Finally, we show that the resulting Riemannian metric is a conformal Hessian metric, while the resulting dual connections are projectively flat (with constant curvature). The conformal and projective factor is parameterized by λ, which gives the curvature of the constant curvature space.

λ-Gradient
Definition 3 (λ-gradient). For x ∈ Ω, define the λ-gradient D^(λ) f by:

D^(λ) f(x) = Df(x) / (1 − λ x · Df(x)).    (25)

The work [17] (Theorem 2.2) showed that the above formula gives the appropriate deformation of the gradient of a function motivated by the λ-duality setting. For mathematical convenience, it is proven under some regularity conditions; a full generalization along the lines of [35] is a natural direction for further research.
Theorem 2 (λ-gradient for λ-duality). Let λ ≠ 0, and let f be a λ-exponentially convex function that is C^2 on some open convex set Ω ⊂ R^d, such that (i) D^2 G_{λ,f} is strictly positive definite, where G_{λ,f} := γ_λ ∘ f, and (ii) 1 − λ x · Df(x) > 0 on Ω. Then, for u = D^(λ) f(x), we have 1 + λ x · u > 0, and the following identity holds:

f(x) + f^(λ)(u) = κ_λ(x · u),

i.e., the supremum in (14) is attained at x.

Note that the λ-gradient D^(λ) f differs from the regular gradient Df by a scalar multiplication. The duality between x and u under the λ-duality is mediated by the dual variable u = D^(λ) f(x), which plays an important role in what follows. Let: (a) u = D^(λ) f(x) denote the λ-conjugate variable corresponding to x with respect to f(x); (b) û = Dg(x̃) be the Legendre conjugate variable corresponding to x̃ with respect to g(x̃); (c) x = D^(λ) f^(λ)(u) denote the λ-conjugate variable corresponding to u with respect to f^(λ)(u); (d) x̂ = Dg̃(ũ) be the Legendre conjugate variable corresponding to ũ with respect to g̃(ũ).

Is there a simple relationship between them? The following proposition says u(x) = û(x̃), where x̃ and x are linked by (15), and x(u) = x̂(ũ), where ũ and u are linked by ũ = e^{−λ f^(λ)(u)} u.

Proposition 2.
We have:

D^(λ)_x f(x) = D_x̃ g(x̃)   and   D^(λ)_u f^(λ)(u) = D_ũ g̃(ũ).

Here, we add the subscript to D to emphasize the argument with respect to which the derivative is taken.

Proof. We use matrix notations where the gradient is regarded as a column vector. Applying the multivariate chain rule to (16), we have:

D_x̃ g(x̃) = [(∂x̃/∂x)ᵀ]^{-1} D_x[γ_{−λ}(f(x))] = [(∂x̃/∂x)ᵀ]^{-1} e^{−λ f(x)} D_x f(x),

where ∂x̃/∂x is the Jacobian of the transformation x → x̃ and (·)ᵀ denotes the transpose. For two vectors x and y, their outer product is denoted by x yᵀ, which is a rank-one square matrix with the (i, j)-entry x_i y_j. From (15), we have:

∂x̃/∂x = e^{−λ f(x)} (I_d − λ x (D_x f(x))ᵀ).

Since 1 − λ D_x f(x) · x > 0 by assumption, we can invert the Jacobian by the Sherman-Morrison formula (see [12], Proposition 4) to obtain:

(∂x̃/∂x)^{-1} = e^{λ f(x)} (I_d + (λ / (1 − λ x · D_x f(x))) x (D_x f(x))ᵀ).

Plugging this into the above, we have:

D_x̃ g(x̃) = (I_d + (λ / (1 − λ x · D_x f(x))) D_x f(x) xᵀ) D_x f(x) = D_x f(x) / (1 − λ x · D_x f(x)).

Using (25) to relate D^(λ)_x f(x) to D_x f(x), the first relation involving D_x̃ g(x̃) is proven. The proof of the second relation in this proposition is analogous.
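The λ-gradient can be checked against the variational problem it is meant to solve. In the 1-d sketch below (the test function f = κ_λ(x²) with λ = 0.5 again; the closed form D^(λ) f(x) = f′(x)/(1 − λ x f′(x)) is the convention assumed here), the supremum defining f^(λ)(u) at u = D^(λ) f(x₀) is attained (on the grid) at x₀, and the λ-analogue of the Fenchel identity, f(x₀) + f^(λ)(u) = κ_λ(x₀ u), holds:

```python
import math

lam = 0.5
kappa = lambda t: math.log(1.0 + lam * t) / lam
f = lambda x: kappa(x * x)                    # lambda-convex test function
df = lambda x: 2.0 * x / (1.0 + lam * x * x)  # f'(x)

x0 = 0.7
u = df(x0) / (1.0 - lam * x0 * df(x0))  # lambda-gradient D^(lam) f(x0)

# The supremum sup_x { kappa(x u) - f(x) } should be attained at x0.
grid = [i / 2000.0 for i in range(-4000, 4001)]  # x in [-2, 2]
vals = [(kappa(x * u) - f(x), x) for x in grid if 1.0 + lam * x * u > 0]
best_val, best_x = max(vals)

fenchel_gap = f(x0) + best_val - kappa(x0 * u)  # lambda-Fenchel identity residual
positivity = 1.0 + lam * x0 * u                 # should be > 0 (Theorem 2)
```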
Just as ordinary convexity leads to the notion of Bregman divergence (12), the notion of λ-exponential convexity leads to a generalization that we call the λ-logarithmic divergence.
Henceforth, we let f : Ω → R be a λ-exponentially convex function on an open convex domain Ω ⊂ R^d, and we assume that the regularity conditions in Theorem 2 hold.

λ-Logarithmic Divergence
By the definition of λ-convexity, we have that G_{λ,f} = γ_λ ∘ f is convex. By the ordinary convexity of G_{λ,f}, we have:

G_{λ,f}(x) ≥ G_{λ,f}(x′) + DG_{λ,f}(x′) · (x − x′).

In terms of f, we have, after some manipulations,

e^{λ (f(x) − f(x′))} ≥ 1 + λ Df(x′) · (x − x′).

Since γ_λ is increasing, we have:

f(x) − f(x′) ≥ κ_λ(Df(x′) · (x − x′)) = (1/λ) log(1 + λ Df(x′) · (x − x′)).

This motivates the following definition.
Definition 4 (λ-logarithmic divergence). We define the λ-logarithmic divergence of f by:

L_{λ,f}(x, x′) = f(x) − f(x′) − κ_λ(Df(x′) · (x − x′)) = f(x) − f(x′) − (1/λ) log(1 + λ Df(x′) · (x − x′)).    (26)

See Figure 1 for a graphical illustration. We note that the logarithmic correction in (26) corresponds to a logarithmic first-order approximation, based at x′, which is possible due to the λ-exponential convexity of f. We also note that when λ > 0, it is possible that L_{λ,f}(x, x′) = ∞. Nevertheless, L_{λ,f}(x, x′) is finite when x and x′ are sufficiently close. Formally, letting λ → 0 in (26) recovers the Bregman divergence.
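Two properties of the λ-logarithmic divergence are easy to check numerically in 1-d: non-negativity for a λ-convex f, and recovery of the Bregman divergence as λ → 0. The sketch below assumes the form L_{λ,f}(x, x′) = f(x) − f(x′) − (1/λ) log(1 + λ f′(x′)(x − x′)) and uses f(x) = x², which is λ-convex for the λ > 0 chosen here:

```python
import math

def L_div(lam, f, df, x, xp):
    """lambda-logarithmic divergence L_{lam,f}(x, x') in 1-d; infinity when undefined."""
    arg = 1.0 + lam * df(xp) * (x - xp)
    return math.inf if arg <= 0 else f(x) - f(xp) - math.log(arg) / lam

def bregman(f, df, x, xp):
    return f(x) - f(xp) - df(xp) * (x - xp)

f = lambda x: x * x
df = lambda x: 2.0 * x

# Non-negativity on a grid of pairs (lam = 0.5; f = x^2 is lambda-convex here,
# since (e^{0.5 x^2} - 1)/0.5 is convex).
pts = [i / 10.0 for i in range(-15, 16)]
min_val = min(L_div(0.5, f, df, x, xp) for x in pts for xp in pts)

# lam -> 0 recovers the Bregman divergence of f.
gap = abs(L_div(1e-7, f, df, 1.3, 0.4) - bregman(f, df, 1.3, 0.4))
```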

λ-Logarithmic Divergence in Different Forms
We now prove a lemma about the relationship between the variables x, u and the gradients or λ-gradients of f and f^(λ). We assume, for convenience, that 1 + λ x · u > 0 whenever the expression arises.

[Figure 1. Graphical illustration of the λ-logarithmic divergence. The first-order logarithmic approximation (dashed grey curve) supports the graph of f from below.]

Lemma 1. Let u = D^(λ) f(x), so that x = D^(λ) f^(λ)(u). For arbitrary x′, u′ (such that the expressions are well defined), we have the following identities:

κ_λ(x′ · u) = κ_λ(x · u) + κ_λ(Df(x) · (x′ − x)) = κ_λ(x · u) + κ_λ((x′ − x) · u / Π_λ),    (27)
κ_λ(x · u′) = κ_λ(x · u) + κ_λ(Df^(λ)(u) · (u′ − u)) = κ_λ(x · u) + κ_λ((u′ − u) · x / Π_λ),    (28)

where Π_λ is a multiplicative factor (a function of x or u) given by:

Π_λ = 1 + λ x · u.

Proof. Since u = D^(λ)_x f(x), substituting (25), we have:

1 + λ x′ · u = (1 + λ x · u)(1 + λ Df(x) · (x′ − x)).

Taking the logarithm and rearranging, we obtain (27). On the other hand, because:

Df(x) = u / (1 + λ x · u) = u / Π_λ,

we also have:

κ_λ(Df(x) · (x′ − x)) = κ_λ((x′ − x) · u / Π_λ).

The proof of (28) is similar.
In the above lemma, x′ and u′ are arbitrary; it is interesting that a modified form of "linearity" holds even though κ_λ is itself nonlinear. As a consequence, we have an alternative expression for L_{λ,f}(x, x′).

Proposition 3. With u′ = D^(λ) f(x′), the λ-logarithmic divergence (26) can also be written as:

L_{λ,f}(x, x′) = f(x) − f(x′) − κ_λ(x · u′) + κ_λ(x′ · u′).

Of course, we may express the λ-logarithmic divergence using the conjugate variables u, u′ as well. Indeed, we have the analogous reference-representation biduality (see [24,25]) that is characteristic of the Bregman divergence and the canonical divergence for dually flat spaces, that is, (8). See [38] for the reference-representation biduality of a general c-divergence (which includes both the Bregman and logarithmic divergences) based on optimal transport.

Theorem 3. The λ-logarithmic divergence satisfies the reference-representation biduality, namely:

L_{λ,f}(x, x′) = L_{λ,f^(λ)}(u′, u) = A_{λ,f}(x, u′),

where A_{λ,f} denotes the canonical form:

A_{λ,f}(x, u′) = f(x) + f^(λ)(u′) − κ_λ(x · u′).

Proposition 3 also allows us to derive our next theorem (Theorem 4), linking the λ-logarithmic divergence and the Bregman divergence (also see [19] for a discussion of conformal divergence in the affine immersion setting).

Theorem 4.
The canonical forms of the λ-logarithmic divergence, A_{λ,f} and A_{λ,f^(λ)}, are related to the canonical forms of the Bregman divergence, A_{g*} and A_{g̃}, via a conformal transformation and a non-linear link function:

γ_λ(A_{λ,f}(x, u′)) = e^{λ f(x)} (1 + λ x · u′)^{−1} A_{g*}(u′, x̃),
γ_λ(A_{λ,f^(λ)}(u, x′)) = e^{λ f^(λ)(u)} (1 + λ x′ · u)^{−1} A_{g̃}(x′, ũ),

where A_{g*}(u′, x̃) = g*(u′) + g(x̃) − x̃ · u′ and A_{g̃}(x′, ũ) = g̃(ũ) + g̃*(x′) − x′ · ũ.

Proof.
Proof. The first line follows by a direct computation combining Theorem 3 and Proposition 1. The proof of the second line of Theorem 4 is similar; we have A_{λ,f^(λ)}(u′, x) = A_{λ,f}(x, u′) from Theorem 3.

Dualistic Geometry of λ-Logarithmic Divergence
Regard x ∈ Ω as the primal (global) coordinate system of a manifold M. As described in Section 2.1, we may use the λ-logarithmic divergence L λ, f of f to construct a dualistic structure (M, g, ∇, ∇ * ). In this subsection, we provide explicit expressions of the corresponding coefficients and state some key geometric consequences.
We begin with the Riemannian metric.
Theorem 5. The Riemannian metric g induced from L_{λ,f}(x, x′) is given in the primal coordinates x by:

g(x) = D^2 f(x) + λ Df(x) (Df(x))ᵀ = e^{−λ f(x)} D^2 G_{λ,f}(x).    (29)

Proof. According to (2), we perform direct differentiation of (26):

g_ij(x) = − (∂^2/∂x^i ∂x′^j) L_{λ,f}(x, x′) |_{x′=x}

and obtain the expression of (29).

By symmetry, under the dual coordinate system u = D^(λ) f(x), we have:

g(u) = D^2 f^(λ)(u) + λ Df^(λ)(u) (Df^(λ)(u))ᵀ = e^{−λ f^(λ)(u)} D^2 G_{λ,f^(λ)}(u).

From the first equality in (29), we see that g is a rank-one correction of the Hessian matrix D^2 f(x). From the second equality, we see that g is in fact a conformal Hessian metric, i.e., it has the form g(x) = e^{−λ f(x)} g_0(x), where g_0(x) = D^2 G_{λ,f}(x) is the Hessian metric induced by the convex function G_{λ,f}(x) = (1/λ)(e^{λ f(x)} − 1). This conclusion is entirely anticipated from Theorem 4.
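In 1-d, the metric can be cross-checked by finite differences: the mixed second derivative −∂²L/∂x ∂x′ on the diagonal should match both the rank-one-corrected Hessian f″ + λ (f′)² and the conformal form e^{−λf} G″ with G = (e^{λf} − 1)/λ. A sketch with f(x) = x² and λ = 0.5, assuming the form of L_{λ,f} quoted in the comments:

```python
import math

lam = 0.5
f = lambda x: x * x
df = lambda x: 2.0 * x

def L_div(x, xp):
    """L_{lam,f}(x, x') = f(x) - f(x') - (1/lam) log(1 + lam f'(x')(x - x'))."""
    return f(x) - f(xp) - math.log(1.0 + lam * df(xp) * (x - xp)) / lam

x = 0.8
g_rank1 = 2.0 + lam * df(x) ** 2  # f'' + lam (f')^2

# Conformal Hessian form: e^{-lam f} * G'' with G = (e^{lam f} - 1)/lam (central difference).
G = lambda y: (math.exp(lam * f(y)) - 1.0) / lam
h = 1e-4
g_conf = math.exp(-lam * f(x)) * (G(x + h) - 2.0 * G(x) + G(x - h)) / h**2

# Metric from the divergence: g = -d^2 L / dx dx' on the diagonal (mixed central difference).
g_fd = -(L_div(x + h, x + h) - L_div(x + h, x - h)
         - L_div(x - h, x + h) + L_div(x - h, x - h)) / (4.0 * h * h)
```

All three expressions agree, consistent with (29) in this 1-d setting.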
To compute the Christoffel symbols of the primal and dual connections, we need an expression for the inverse of the Riemannian metric g(x) as a matrix. This is provided by the following proposition. Proposition 4. The metric g can be expressed as: where ∂u/∂x is the Jacobian matrix of the coordinate transformation x → u and I_d is the d × d identity matrix with Kronecker δ_{ij} as its entries. Here, Π_λ(x) = 1 + λ x · u, and Π_λ(x) > 0 for x ∈ Ω and u = D^{(λ)}f(x), by Part (ii) of Theorem 2.
Moreover, the inverse of g(x) can be expressed as: Proof. Writing the λ-logarithmic divergence in its generalized canonical form A_{λ,f} as in (26), we apply (2) to obtain the stated expressions.
Under the dualistic structure induced by a λ-logarithmic divergence, the primal and dual coordinate vector fields are no longer biorthogonal in the sense of (7). Nevertheless, we have the following generalization. Again, we write Π_λ(x) = 1 + λ x · u. Corollary 2. The inner product of the coordinate vector fields ∂/∂x_i and ∂/∂u_j is given by a λ-deformed "biorthogonality" relation: Proof. Simplifying the expression using (30) gives the result. For details, see ([12], Proposition 8).
Theorem 6. The Christoffel symbols of the primal connection ∇ are given by: where Π_λ(x) = 1 + λ x · u as in Proposition 4. Furthermore, let Γ^k_{ij} = ∑_{ℓ=1}^{d} Γ_{ij,ℓ} g^{ℓk} be the Christoffel symbols of the second kind; then: where δ is the Kronecker delta.
Similarly, under the dual coordinate system u, the Christoffel symbol (of the second kind) of the dual connection ∇ * is given by: Proof. This is a straightforward computation using (3) and Proposition 4. The details, which are a minor modification of the proof of ( [12], Proposition 5), are omitted.
Although the curvatures of ∇ and ∇ * are nonzero, it can be shown that ∇ and ∇ * are both projectively flat, i.e., each of them is projectively equivalent to a flat connection. Specifically, any ∇-geodesic (resp. ∇ * -geodesic) is a time-reparameterized straight line under the x (resp. u) coordinate system. Theorem 7. The sectional curvatures of ∇ and ∇ * with respect to g are both equal to λ.
Using the dual projective flatness and Corollary 2, Reference ( [12], Theorem 16) showed that the λ-logarithmic divergence satisfies a generalized Pythagorean theorem, which generalizes the property of Bregman divergence outlined in Section 2.2.
Theorem 8 (Generalized Pythagorean theorem). Let P, Q, R ∈ M. Then: if and only if the ∇-geodesic between Q and R and the ∇*-geodesic between Q and P meet g-orthogonally at Q.
To summarize, the dually flat geometry becomes a dually projectively flat geometry with constant sectional curvature λ, and the Hessian metric becomes a conformal Hessian metric. Nevertheless, the primal and dual geodesics are still straight lines (up to time reparametrizations), and the generalized Pythagorean theorem holds.
We say that the above λ-deformation framework is "canonical" because the statistical manifold (M, g, ∇, ∇ * ), with a conformal Hessian metric g ij given by (29) and a pair of dual projectively flat affine connections Γ k ij , Γ * k ij given by (32) and (33), is the only statistical structure with constant curvature ( [12], Theorem 15). Moreover, given such a statistical manifold, one can construct locally a λ-logarithmic divergence, which induces the given geometry.

Relation Between Tsallis' and Rényi's Deformation Expressions
Recall that Tsallis [39], in the context of statistical physics, introduced the generalized entropy: S_λ[p] = (1/λ)( ∫ p^{1−λ} dµ − 1 ); note that we use λ here in place of q = 1 − λ as in [40]. Tsallis entropy is related to Rényi entropy [41], defined as: H_λ[p] = (1/λ) log ∫ p^{1−λ} dµ, through a monotonic transformation: S_λ[p] = (1/λ)( e^{λ H_λ[p]} − 1 ).
In our current notation, the Rényi divergence is additive: given two pairs of product measures p_1 ⊗ p_2 and p′_1 ⊗ p′_2, we have: D_λ(p_1 ⊗ p_2 ‖ p′_1 ⊗ p′_2) = D_λ(p_1 ‖ p′_1) + D_λ(p_2 ‖ p′_2). Because Tsallis entropy is not additive, this has been used as an argument for favoring Rényi entropy over Tsallis entropy as a physical concept; see [28] (Section 9.3) and [42].
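The monotone transformation between the two entropies and the additivity contrast can be checked numerically. The following sketch (our own illustration; the distributions and λ = 0.4 are arbitrary choices) uses discrete distributions, with the λ = 1 − q convention of the text:

```python
import numpy as np

lam = 0.4                      # lambda = 1 - q
p = np.array([0.2, 0.3, 0.5])  # a distribution on a 3-point space
r = np.array([0.6, 0.4])       # a distribution on a 2-point space

def renyi(p, lam):
    # H_lam[p] = (1/lam) log sum p^{1-lam}
    return np.log(np.sum(p ** (1 - lam))) / lam

def tsallis(p, lam):
    # S_lam[p] = (1/lam) (sum p^{1-lam} - 1)
    return (np.sum(p ** (1 - lam)) - 1) / lam

# monotone transformation: S_lam = (e^{lam H_lam} - 1)/lam
assert np.isclose(tsallis(p, lam), np.expm1(lam * renyi(p, lam)) / lam)

# Renyi entropy is additive over product measures; Tsallis entropy is not
prod = np.outer(p, r).ravel()
assert np.isclose(renyi(prod, lam), renyi(p, lam) + renyi(r, lam))
assert not np.isclose(tsallis(prod, lam), tsallis(p, lam) + tsallis(r, lam))
```

The additivity of the Rényi entropy is immediate here, since ∑(p_i r_j)^{1−λ} factors into the product of the two marginal sums, and the logarithm turns the product into a sum.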

λ-Exponential Family
Under the λ-deformation, there is an intrinsic link between the subtractive and divisive normalizations of the λ-deformed exponential family. Starting with the observation: we investigate the identity: Taking the λ-th power and equating both sides, we obtain the conditions for the above identity to hold: This fact led us to define a λ-exponential family that can be normalized both subtractively and divisively: the former denoted by p(ζ|θ) and the latter denoted by p(ζ|ϑ).

Proposition 5 (Reparameterization equivalence).
Let λ ≠ 0. With respect to a given reference measure µ and a fixed vector of random functions F(ζ) = (F_1(ζ), . . . , F_d(ζ)), the λ-exponential family is given by p^{(λ)}(ζ|θ) under subtractive normalization and by p^{(λ)}(ζ|ϑ) under divisive normalization; the two are reparametrizations of each other: Here, the function φ_λ(θ), called the subtractive λ-potential, is used for subtractive normalization, while ϕ_λ(ϑ), called the divisive λ-potential, is used for divisive normalization. Note that φ_λ and ϕ_λ need not have the same domain. They satisfy: Note again that we use ϑ in the divisive normalization setting to distinguish it from θ in the subtractive normalization setting. For later convenience, we also note:
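A small numerical sketch of the reparameterization equivalence follows. It is our own illustration: the four-point sample space and the explicit relations θ = ϑ e^{−λϕ_λ(ϑ)} and φ_λ(θ) = (1 − e^{−λϕ_λ(ϑ)})/λ are our assumed reading of Proposition 5, consistent with exp_λ(t) = (1 + λt)^{1/λ}:

```python
import numpy as np

lam = 0.3
F = np.array([0.0, 1.0, 2.0, 3.0])  # random function on a 4-point sample space
vartheta = 0.8                       # divisive natural parameter

# divisive normalization: p = (1 + lam*vartheta*F)^{1/lam} * exp(-phi_div)
w = (1 + lam * vartheta * F) ** (1 / lam)
phi_div = np.log(w.sum())            # divisive lambda-potential
p_div = w * np.exp(-phi_div)

# assumed reparameterization: theta = vartheta * e^{-lam*phi_div},
# subtractive potential phi_sub = (1 - e^{-lam*phi_div}) / lam
theta = vartheta * np.exp(-lam * phi_div)
phi_sub = -np.expm1(-lam * phi_div) / lam

# subtractive normalization: p = exp_lam(theta*F - phi_sub)
p_sub = (1 + lam * (theta * F - phi_sub)) ** (1 / lam)

assert np.allclose(p_sub, p_div)     # same family member
assert np.isclose(p_sub.sum(), 1.0)  # subtractive form is already normalized
```

The algebra behind the check: 1 + λ(θF − φ_sub) = e^{−λϕ_div}(1 + λϑF), so taking the 1/λ power reproduces the divisively normalized density exactly.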

Under Divisive Normalization
To deform the exponential family through divisive normalization, we use a smooth monotone function κ_λ(·) and define a parametric probability family of the form: Note that we use the symbol ϑ to distinguish it from the parameter θ of the subtractive case. Here: is the divisive normalization function, and it is assumed that ∫ e^{κ_λ(ϑ·F(ζ))} dµ < ∞ on the domain of ϑ (the natural parameter set). It is possible that the support of the density depends on the parameter ϑ, as in the case of the q-exponential family; see [17]. To avoid such technicalities, we assume that the support of p(ζ|ϑ) is independent of ϑ.
Writing out κ_λ(·), the resulting family is: p^{(λ)}(ζ|ϑ) = (1 + λ ϑ · F(ζ))^{1/λ} e^{−ϕ_λ(ϑ)}, where the divisive λ-potential ϕ_λ(ϑ) = log ∫ (1 + λ ϑ · F(ζ))^{1/λ} dµ is finite on the parameter set. This family unifies the F^{(±α)}-families introduced in [12].

λ-Mixture Family
We next define a mixture-type family dual to the λ-exponential family, in an analogous way that an exponential family is dual to the mixture family. The form of the family is justified by its compatibility with the λ-duality.

Potential Functions as Rényi Entropies
We now show that our λ-duality framework is naturally compatible with the λ-exponential and λ-mixture families, with Rényi entropy and Rényi divergence replacing Shannon entropy and Kullback-Leibler divergence. In what follows, we assume λ < 1.
Proposition 7 (For the λ-exponential family). With respect to the λ-exponential family defined by (37), with divisive potential function ϕ_λ given by (38), we have: is the escort expectation: (iii) The λ-conjugate function ψ_λ(η) with respect to ϕ_λ(ϑ) is given by: (iv) The λ-logarithmic divergence is the Rényi divergence: Proposition 8 (For the λ-mixture family). With respect to the λ-mixture family given by (39), with its potential function ψ_λ(η) given by: we have: (i) The potential function ψ_λ(η) is a λ-convex function of η.
(ii) The potential function ψ λ (η) is given by: (iii) The λ-logarithmic divergence is the Rényi divergence: The proofs of the above Proposition 7 (about the λ-exponential family) and Proposition 8 (about the λ-mixture family) can be found in [17].
The two expressions reflect subtractive and divisive normalizations: a typical example of the former is the q-exponential family with the associated Tsallis entropy, whereas an example of the latter is the F^{(±α)}-family with the associated Rényi entropy. These two versions of deformation of an exponential family are two sides of the same coin; furthermore, the λ-exponential family is also linked to the λ-mixture family, when λ ≠ 0, 1, via a reparameterization of the random functions F(ζ) above.
The coincidence of these two parameterizations of the deformed family is tied to the λ-duality, which is the main focus of our exposition. The λ-duality is a "deformation" (see Table 1) of the usual Legendre duality reviewed in Section 3.1. In a nutshell, instead of convex functions, we work with λ-convex functions f such that (1/λ)(e^{λ f} − 1) is convex, for a fixed λ ≠ 0. Furthermore, instead of the convex conjugate, we use the λ-conjugate given by: The expression of the λ-duality: turns out to be a re-write of the Legendre duality between x̃ and u, x̃ · u = g(x̃) + g*(u), with x̃ = x e^{−λ f(x)}; and a re-write of the Legendre duality between x and ũ, x · ũ = g̃*(x) + g̃(ũ), with ũ = u e^{−λ f^{(λ)}(u)}.
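The λ-conjugation can be explored numerically. The sketch below is our own illustration under two assumptions stated plainly: that κ_λ(t) = (1/λ) log(1 + λt) and that the λ-conjugate takes the sup form f^{(λ)}(u) = sup_x { κ_λ(x·u) − f(x) }; the test function and grids are arbitrary. It checks λ-biconjugation, i.e., that conjugating twice recovers a λ-convex f:

```python
import numpy as np

lam = 0.5

def kappa(t):
    # assumed deformed link: kappa_lam(t) = (1/lam) log(1 + lam t)
    return np.log1p(lam * t) / lam

def f(x):
    # lambda-convex test function: (e^{lam f} - 1)/lam = x^2 is convex
    return kappa(x * x)

def conjugate(fun_vals, new_pts, grid):
    # grid approximation of f^{(lam)}(u) = sup_x { kappa(x*u) - f(x) }
    return np.array([np.max(kappa(grid * u) - fun_vals) for u in new_pts])

xs = np.linspace(-1.2, 1.2, 2001)   # grid for the inner sup
us = np.linspace(-1.0, 1.0, 401)    # dual-variable grid
f_conj = conjugate(f(xs), us, xs)   # lambda-conjugate of f

# lambda-biconjugation: applying the conjugate twice recovers f
# (restricted to points whose maximizer stays inside the u-grid)
xs2 = np.linspace(-0.4, 0.4, 201)
f_biconj = conjugate(f_conj, xs2, us)
assert np.allclose(f_biconj, f(xs2), atol=1e-2)
```

The restriction of xs2 keeps the optimizing u within the grid; the residual error is pure grid discretization, mirroring how the Fenchel-Moreau biconjugation works for ordinary convex functions.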
The λ-duality leads to nontrivial mathematical questions, e.g., a differential calculus in the spirit of Rockafellar and an analogue of functions of Legendre type. Some of the derivations in the current paper were heuristic; a complete and rigorous development is left for future research.
Coming back to the probability families, we first verified that the subtractive potential φ λ (θ) is convex in θ and the divisive potential ϕ λ (ϑ) is λ-convex in ϑ. Subtractive normalization using φ λ (θ) is associated with the regular Legendre duality, whereas divisive normalization using ϕ λ (ϑ) is associated with the λ-duality. This gives an interpretation of the distinctiveness of Rényi entropy (used in the latter) from Tsallis entropy (used in the former) based on their intimate connection to the λ-duality (for λ = 0) or to the Legendre duality. As λ is the parameter that controls the curvature in the Riemannian geometry of these probability families (see [12]), our framework provides a simple parametric deformation from the dually flat geometry (of the exponential model) to the dually projectively flat geometry (of the λ-exponential model). We expect that this framework will generate new insights in the applications of the q-exponential family and related concepts in statistical physics and information science. Table 1. Generalization of objects from the Hessian (dually flat) geometry to the λ-deformed (dually projectively flat) geometry.

Conflicts of Interest:
The authors declare no conflict of interest.