Geometric Structures Induced by Deformations of the Legendre Transform

The recent link discovered between generalized Legendre transforms and non-dually flat statistical manifolds suggests a fundamental reason behind the ubiquity of Rényi’s divergence and entropy in a wide range of physical phenomena. However, these early findings still provide little intuition on the nature of this relationship and its implications for physical systems. Here we shed new light on the Legendre transform by revealing the consequences of its deformation via symplectic geometry and complexification. These findings reveal a novel common framework that leads to a principled and unified understanding of physical systems that are not well-described by classic information-theoretic quantities.


Introduction
The Legendre transform [1] plays a key (albeit not always transparent) role in many areas of mathematical physics. Specifically, it allows for the identification of dual coordinates and potentials that yield theories in terms of more convenient variables, being instrumental in diverse areas of physics ranging from relativistic field theory to condensed matter physics. Applications of the transform have their roots in classical physics: in analytical mechanics it links the Lagrangian and Hamiltonian formulations, and in thermodynamics it bridges intensive and extensive variables. These notions have led to more general frameworks which, in turn, gave rise to the development of symplectic topology [2].
Far from being a relic, the Legendre transform remains central to contemporary physics. It is used in classical field theory, where the index of pairs of components becomes continuous, and in quantum field theory, where it relates the generator of connected Green functions to the quantum effective action, i.e., the generator of one-particle irreducible Green functions. Furthermore, the relevance of the Legendre transform has led to generalizations in the context of perturbative quantum field theories [3,4]. Overall, the transform continues to be at the core of important developments in current research.
The Legendre transform also plays a fundamental role in information geometry, where it mediates the relationship between primal and dual coordinates within the non-Riemannian geometry induced by dually flat statistical manifolds [5]. This duality gives rise to relationships of orthogonality in these geometries, corresponding to alternative

Preliminaries
The Legendre transform is, at its core, an exploration of the properties of convex functions. Despite its importance, the transform is unfortunately typically introduced as an obscure algebraic 'trick', with no explanation of why it plays such an important role in many different areas of physics. For completeness, this section presents a basic standard interpretation of the Legendre transform in mathematical physics, which is then complemented by a deeper view based on information geometry in Section 3.
The most straightforward interpretation of the Legendre transform comes from the geometry of graphs of functions [22]. In this view, the Legendre transform of a convex function F is another function G that keeps track of the (negative) height at which the tangent to F touches the y-axis, which is usually reparametrized in terms of the slope of F. This view is easy to grasp, but unfortunately makes the construction seem arbitrary while failing to explain why this procedure is so fundamental.
A more principled view comes from an algebraic perspective as follows. If $F(x)$ with $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ is a strictly convex function (i.e., its Hessian is positive-definite), then the gradient map with components $y_i(x) := \partial F/\partial x_i(x)$, $i = 1, \dots, n$, is strictly monotone in $x_1, \dots, x_n$. This means that there exists a one-to-one correspondence between $x$ and $y = (y_1, \dots, y_n)$; said differently, there exist mappings $y_i(x)$ and $x_k(y)$ that transform one into the other. Using these mappings, it would be natural to consider the possibility of reparametrizing $F$ in terms of $y$ instead of $x$. However, instead of focusing on such a reparametrization, an elegant move is to consider the function $G(y) = x(y) \cdot y - F(x(y))$. Interestingly, the resulting pair $F(x)$ and $G(y)$ exhibits the following symmetry:
$$\frac{\partial F}{\partial x_i}(x) = y_i, \qquad \frac{\partial G}{\partial y_i}(y) = x_i. \tag{1}$$
Useful properties of this transformation are that it preserves convexity (i.e., the transform of a convex function is another convex function) and that it is an 'involution', that is, the Legendre transform of the transform of a convex function is the function itself. The symmetry of these relationships is graphically represented in the right-hand side of Figure 1.
Figure 1. Each transform brings elements of one space to the other. Please note that while Section 2 presents the classical view of Legendre transforms acting over convex functions, the rest of this work follows Ref. [9] in focusing on concave functions. (Right:) The symmetry that governs the algebraic relationships between convex dual functions and dual coordinates, which is mediated by the Legendre derivative operator $D_L$, which differs from the standard Euclidean gradient when the transform is deformed.
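The construction above can be checked numerically in one dimension. The following is a minimal sketch, assuming the illustrative choice F(x) = exp(x) (not taken from the text), for which y(x) = exp(x), x(y) = log(y), and G(y) = y log(y) - y. All function names are hypothetical helpers introduced for this example.

```python
import math

def F(x):
    # Strictly convex function (illustrative assumption, not from the paper)
    return math.exp(x)

def dF(x):
    # Dual variable y(x) = dF/dx
    return math.exp(x)

def x_of_y(y):
    # Inverse mapping x(y), valid for y > 0
    return math.log(y)

def G(y):
    # Legendre transform: G(y) = x(y) * y - F(x(y))
    x = x_of_y(y)
    return x * y - F(x)

def dG(y, h=1e-6):
    # Central finite difference of G, used to check dG/dy = x
    return (G(y + h) - G(y - h)) / (2 * h)

def legendre_of_G(x):
    # Transform of the transform, evaluated at the dual point y = dF(x)
    y = dF(x)
    return x * y - G(y)

if __name__ == "__main__":
    x0 = 0.7
    print("x recovered from dG(y(x)):", dG(dF(x0)))
    print("F(x) vs transform of G   :", F(x0), legendre_of_G(x0))
    print("Fenchel gap at (0.3, 2.0):", F(0.3) + G(2.0) - 0.3 * 2.0)
```

The first check illustrates the gradient symmetry between F and G, the second the involution property, and the third the Fenchel inequality F(x) + G(y) ≥ x · y at a pair (x, y) that is not dual.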
Overall, one can think of the Legendre transform as acting on two inputs, x and F, and providing two outputs: the dual variable y and the convex conjugate G (similarly, the Fourier transform of a time series F(t) can be thought of as giving two outputs too: the spectrum of amplitudes G(s) (analogous to the conjugate function) and the frequency domain s itself (analogous to the dual variable)). Pairs of convex functions {F, G} satisfying Equation (1) are known as convex duals, with {x, y} being known as dual variables. Additionally, convex functions and their duals satisfy the Fenchel inequality F(x) + G(y) ≥ x · y. The multiple useful properties of Legendre duals are leveraged in various areas of mathematics and engineering, particularly in convex optimization [23].
A more general definition of the Legendre transform of a convex function $F$ is given by
$$G(y) = \sup_{x} \{ C(x, y) - F(x) \}. \tag{2}$$
This definition applies even when $F$ is not everywhere differentiable, and recovers the above procedure for the case where $C(x, y) = x \cdot y$. For other choices of $C$, this opens the door to so-called "deformed" Legendre transforms, which play an important role in optimal transport theory [24]. Interestingly, dual functions according to these generalized Legendre transforms satisfy relationships analogous to Equation (1), but where the role of the Euclidean gradient is replaced by a 'Legendre derivative' operator $D_L$, which is formally defined in Section 3.4. The goal of this paper is to explore the implications of such deformations of the Legendre transform for physical systems.
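A supremum-based transform of this kind can be approximated by a brute-force maximum over a grid. The sketch below assumes the form G(y) = sup_x [C(x, y) - F(x)] and the undeformed choice C(x, y) = x · y; the grid, the test functions, and the helper name `c_transform` are illustrative assumptions rather than part of the original text.

```python
def c_transform(F, C, xs):
    # Returns G with G(y) = max over the grid xs of C(x, y) - F(x),
    # a discrete stand-in for the supremum over x.
    def G(y):
        return max(C(x, y) - F(x) for x in xs)
    return G

def bilinear(x, y):
    # Undeformed case: C(x, y) = x * y
    return x * y

def quadratic(x):
    # Smooth convex example: F(x) = x^2 / 2, whose transform is y^2 / 2
    return 0.5 * x * x

def absolute(x):
    # Convex but not differentiable at 0: F(x) = |x|
    return abs(x)

# Grid on [-5, 5] with spacing 0.001 (illustrative resolution)
GRID = [i * 0.001 - 5.0 for i in range(10001)]

G_quadratic = c_transform(quadratic, bilinear, GRID)
G_absolute = c_transform(absolute, bilinear, GRID)

if __name__ == "__main__":
    print("G_quadratic(1.5):", G_quadratic(1.5))   # close to 1.5**2 / 2
    print("G_absolute(0.5) :", G_absolute(0.5))    # close to 0 for |y| <= 1
```

The second example shows that the supremum form remains well defined where the gradient-based procedure is not available; replacing `bilinear` with another function of two arguments gives a deformed transform in the same way.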

Legendre Transform in Information Geometry
In this section we present the key role of the Legendre transform in statistical manifolds. For this purpose, Section 3.1 first introduces the necessary background about information geometry to the unfamiliar reader. Then, Section 3.2 explains how the standard Legendre transform describes the geometry of dually flat spaces, which are naturally associated with the Kullback-Leibler divergence and the Shannon entropy. Building on this, Section 3.3 then presents how other divergences lead to more general geometries, and Section 3.4 develops how generalized Legendre transforms are a natural way to build and describe them. Please note that hereafter we use Einstein's summation convention for convenience of the notation.

The Dual Structure of Statistical Manifolds
Our exposition is focused on statistical manifolds $M$ whose elements are probability distributions $p_\xi(s)$, with $s \in S$ being the possible events accounted for by the probability distribution and $\xi \in O \subset \mathbb{R}^d$, with $O$ an open set of parameter values. The geometry of such statistical manifolds is determined by two structures: a metric tensor $g$ and a pair of torsion-free affine connections $(\nabla, \nabla^*)$ that are dual with respect to $g$. Intuitively, $g$ establishes norms and angles between tangent vectors and, in turn, establishes curve length and the shortest curves. On the other hand, the affine connection defines covariant derivatives of vector fields and thereby the notion of parallel transport between neighboring tangent spaces, which determines what a straight curve is.
Traditional Riemannian geometry is built on the assumption that the shortest and the straightest curves locally coincide, which is pivotal to the development of general relativity. This assumption leads to the study of the metric-compatible Levi-Civita connection, whose geodesics are locally distance-minimizing and which satisfies $\nabla = \nabla^*$ and is, hence, completely determined by the metric. However, modern approaches motivated by information geometry [25] and gravitational theories [26,27] consider more general scenarios, where connections may not be derivable from the metric. In such geometries, the parallel transport operator $\Pi : T_pM \to T_qM$ and its dual $\Pi^*$, induced by $\nabla$ and $\nabla^*$ respectively, might differ (the dual transport operator acts on cotangent vectors and is defined by the condition $g_q(\Pi V, \Pi^* W) = g_p(V, W)$ for all $V \in T_pM$ and $W \in T^*_pM$). The departure of $\nabla$ and $\nabla^*$ from self-duality can be shown to be proportional to Chentsov's tensor, which allows for a single degree of freedom traditionally denoted by $\alpha \in \mathbb{R}$ [25]. Put simply, $\alpha$ captures the degree of asymmetry between short and straight curves, with $\alpha = 0$ corresponding to metric-compatible connections where $\nabla = \nabla^*$.
An important property of the geometry of a statistical manifold $(M, g, \nabla, \nabla^*)$ is its curvature, which can be of two types: the (Riemann-Christoffel) metric curvature or the curvature associated with the connection. Both quantities capture the distortion induced by parallel transport over closed curves, the former with respect to the Levi-Civita connection and the latter with respect to $\nabla$ and $\nabla^*$. In the sequel, we use the term curvature to refer exclusively to the latter type. Statistical manifolds with zero curvature (equivalently, manifolds where it is possible to find a pair of coordinate charts under which the connection and its dual vanish at every point of the manifold) are said to be dually flat.

Dually Flat Geometry, Bregman Divergences, and the Legendre Transform
The geometry of Riemannian manifolds is typically formulated in terms of a single set of local coordinates. However, the fact that non-Riemannian manifolds have two dissimilar affine connections $\nabla$ and $\nabla^*$ makes it more natural to describe their geometry in terms of two dual coordinates $\xi$ and $\eta$ [25]. Specifically, while in Riemannian geometry orthogonality can be assessed between the different dimensions of a single set of coordinates, in statistical manifolds it is more fruitful to consider orthogonality between elements of the primal coordinates $\xi$ and the dual coordinates $\eta$ [6,10]. A standard example of dual coordinates in a statistical manifold is where $\xi$ corresponds to the natural parameters of an exponential family distribution and $\eta$ corresponds to the associated expectation values. In the sequel, we follow Schouten's notation in which upper indices are reserved for dual coordinates, i.e.,
$$\partial_i := \frac{\partial}{\partial \xi^i}, \qquad \partial^i := \frac{\partial}{\partial \eta_i}.$$
Under this notation, $\partial_i$ gives rise to a basis for the tangent space $T_pM$, while $\partial^i$ is related to a natural dual basis of the cotangent space $T^*_pM$.
A Riemannian metric is always "locally flat", i.e., it can be brought down to its signature (a Kronecker delta) at a given point $p \in M$ by choosing an appropriate coordinate chart. It is not guaranteed, however, that such a chart would preserve the delta in a neighborhood of $p$; finding a chart that satisfies this property globally is the hallmark of a flat geometry. Analogously, affine geometries are also locally flat when their dual structure is considered, therefore satisfying $g(\partial_i, \partial^j) = \delta_i^j$ for an appropriate pair of primal and dual coordinate charts $\{\xi^i, \eta_i\}$ at some point $p$. In a similar fashion, this property in general only holds locally; dually flat geometries are characterized by the fact that one can find a pair of coordinates that satisfies this condition of orthogonality on the whole manifold (under these coordinate charts, one can show that both the connection and its dual vanish, hence the term dual flatness).
For an orthogonal pair $\{\xi, \eta\}$ of a given dually flat manifold, the Jacobians of the mappings $\xi \to \eta$ and $\eta \to \xi$ are both symmetric. To confirm this, let us write $g_{ij} := g(\partial_i, \partial_j)$ and first note that
$$g(\partial_i, \partial_j) = (\partial_i \eta_k)\, g(\partial^k, \partial_j) = (\partial_i \eta_k)\, \delta^k_j = \partial_i \eta_j,$$
where the first equality follows from the chain rule of derivatives $\partial_i = (\partial_i \eta_k)\, \partial^k$. Then, using the fact that Riemannian metrics are always symmetric, one can see that $\partial_i \eta_j = g_{ij} = g_{ji} = \partial_j \eta_i$. A similar derivation shows that $g^{ij} = \partial^i \xi^j$, and hence $\partial^i \xi^j = \partial^j \xi^i$ (note that $g_{ij} = \partial_i \eta_j$ and $g^{ij} = \partial^i \xi^j$ are consistent with the fact that for orthogonal coordinates $g(\partial_i, \partial^k) = g_{ij} g^{jk} = \delta_i^k$).
There is an intimate relationship between an orthogonal pair of coordinates in a dually flat manifold and the Legendre transform. To see this, we first note that the symmetry of the Jacobian of $\xi \to \eta$ implies that the 1-form $\omega = \eta_i\, d\xi^i$ is closed ($d\omega = 0$), and this, via the Poincaré lemma, implies in turn the existence of a scalar potential $\psi \in C^\infty$ that satisfies
$$\partial_i \psi = \eta_i, \qquad \partial_i \partial_j \psi = g_{ij}. \tag{4}$$
Note that the second condition, combined with the fact that $g_{ij}$ is positive-semidefinite, implies that $\psi$ is convex. By a similar line of reasoning, the symmetry of $g^{ij}$ induces a dual convex potential $\phi$ that satisfies
$$\partial^i \phi = \xi^i, \qquad \partial^i \partial^j \phi = g^{ij}. \tag{5}$$
Furthermore, a direct calculation shows that the dual potentials $\psi(\xi^1, \dots, \xi^n)$ and $\phi(\eta_1, \dots, \eta_n)$ always satisfy $d\{\psi + \phi - \xi^i \eta_i\} = 0$. This implies that, modulo an unimportant constant, the following relationship holds over any dually flat manifold:
$$\psi(\xi) + \phi(\eta) - \xi^i \eta_i = 0. \tag{6}$$
(Equation (6) holds on any manifold but only locally; in contrast, dually flat spaces are a special case in which dual potentials $\phi$, $\psi$ that satisfy Equations (4) and (5) can be defined over the whole manifold.)
Let us now consider the behavior of Equation (6) on dually flat spaces when the coordinates and potentials are evaluated at different points of the manifold. For this, let us denote as $\xi(p)$ and $\eta(q)$ the coordinates of $p$ and the dual coordinates of $q$, respectively, with $p, q \in M$, and define the so-called Bregman divergence $D$ as
$$D(p\|q) := \psi(\xi(p)) + \phi(\eta(q)) - \xi^i(p)\, \eta_i(q). \tag{7}$$
Then, by Equation (5), the differential of the mapping $q \to D(p_0\|q)$ is
$$dD(p_0\|q) = \big(\xi^i(q) - \xi^i(p_0)\big)\, d\eta_i(q). \tag{8}$$
From this, and considering that $D$ is by definition the sum of two convex functions minus a bilinear term, one can verify that this mapping attains its unique minimum when $q = p_0$. Interestingly, at this minimal value one recovers Equation (6), which implies that $D = 0$. This shows that Bregman divergences are non-negative.
These results suggest an alternative definition for $\phi$ and $\psi$, conceiving them as a maximum of the following maps:
$$\psi(\xi) = \max_{\eta} \{\xi^i \eta_i - \phi(\eta)\}, \qquad \phi(\eta) = \max_{\xi} \{\xi^i \eta_i - \psi(\xi)\}.$$
This reveals that the orthogonal coordinate pair is always dual in the Legendre sense, or equivalently, that dual flatness implies that the potentials are convex duals. This property generalizes the well-known Legendre duality between the natural and expectation parameters of an exponential family [28], showing that the same holds for any coordinate pair as long as they satisfy local flatness.
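As a concrete illustration of a Bregman divergence built from dual potentials, the following sketch uses the Bernoulli family, an illustrative assumption not taken from the text. Here ξ is the natural (log-odds) parameter, η the expectation parameter, ψ(ξ) = log(1 + e^ξ), and ϕ(η) is the negative Shannon entropy. In this example, the quantity ψ(ξ(p)) + ϕ(η(q)) − ξ(p) η(q) coincides with the Kullback-Leibler divergence of the distribution at q relative to the one at p. All function names are hypothetical.

```python
import math

def psi(xi):
    # Log-partition function of the Bernoulli family (primal potential)
    return math.log1p(math.exp(xi))

def eta_of_xi(xi):
    # Expectation parameter: eta = d psi / d xi
    return 1.0 / (1.0 + math.exp(-xi))

def phi(eta):
    # Dual potential: negative Shannon entropy of Bernoulli(eta)
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def bregman(xi_p, eta_q):
    # psi at p, phi at q, minus the pairing of xi(p) with eta(q)
    return psi(xi_p) + phi(eta_q) - xi_p * eta_q

def kl_bernoulli(a, b):
    # Kullback-Leibler divergence KL(Bernoulli(a) || Bernoulli(b))
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

if __name__ == "__main__":
    xi_p, eta_q = 0.4, 0.8
    print("Bregman form :", bregman(xi_p, eta_q))
    print("KL divergence:", kl_bernoulli(eta_q, eta_of_xi(xi_p)))
    print("On diagonal  :", bregman(xi_p, eta_of_xi(xi_p)))
```

Evaluating on the diagonal (both arguments describing the same distribution) returns zero up to rounding, which is the Legendre relation between the two potentials.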

Divergences as a General Tool to Establish Geometries
This subsection explains how divergences, such as the one introduced in Equation (7), can be used as a convenient tool to establish a geometry on a statistical manifold ( [29], Section 4). Importantly, this approach does not lack generality, as any geometry can be expressed from an appropriate divergence [30][31][32].
Divergences are a general class of functions that assess the dissimilarity of their arguments. More specifically, a divergence is a smooth, distance-like function $D[x; x']$ that satisfies $D[x; x'] \geq 0$ and vanishes only when $x = x'$. Divergences are more general (hence weaker) notions than distances, as they do not need to be symmetric in their arguments and may not respect the triangle inequality. Of the various types of divergences explored in the literature [33], two are particularly important: $f$-divergences (which are monotonic with respect to coarse-grainings of the domain of events $S$ [34]) and Bregman divergences (studied in the previous section).
Let us show how divergences can be used to establish metrics and connections over manifolds. For this, let us use the shorthand notation $D[\xi; \xi'] := D(p\|q)$ when expressing $D$ in terms of coordinates $\xi = \xi(p)$ and $\xi' = \xi(q)$. Then, the Riemannian metric of the manifold is recovered from the second-order expansion of the divergence as follows:
$$D[\xi; \xi + d\xi] = \frac{1}{2}\, g_{ij}(\xi)\, d\xi^i d\xi^j + O(|d\xi|^3),$$
which is positive-definite due to the non-negativity of $D$. This construction leads to Fisher's metric, which is the unique metric that emerges from a broad class of divergences ([29], Th. 5), with this being closely related to Chentsov's theorem [35][36][37][38]. Similarly, connections emerge at the third-order expansion of the divergence as follows:
$$\Gamma_{ijk}(\xi) = -\partial_i \partial_j \partial'_k D[\xi; \xi']\big|_{\xi' = \xi}, \tag{12a}$$
$$\Gamma^*_{ijk}(\xi) = -\partial'_i \partial'_j \partial_k D[\xi; \xi']\big|_{\xi' = \xi}, \tag{12b}$$
where $\partial'_i$ denotes differentiation with respect to $\xi'^i$. In summary, Fisher's metric is insensitive to the choice of divergence but the resulting connections are not, and therefore the effects of a particular $D$ manifest only at the third order. Bregman divergences always give rise to flat geometries, as for them the connection coefficients vanish in the corresponding affine coordinates, and therefore other types of divergences are needed in order to establish curved non-Riemannian geometries.
As mentioned in Section 3.1, the deviation of a given connection $\nabla$ from its corresponding metric-compatible (i.e., Levi-Civita) counterpart can be measured by $\alpha T$, where $T$ corresponds to the invariant Amari-Chentsov tensor [39,40] and $\alpha \in \mathbb{R}$ is a free parameter. The invariance of $T$ implies that the value of $\alpha$ entirely determines the connection, and the corresponding geometry can be obtained from a divergence of the form [10]
$$D_\alpha[p; q] = \frac{4}{1 - \alpha^2} \left( 1 - \int p(s)^{\frac{1-\alpha}{2}}\, q(s)^{\frac{1+\alpha}{2}}\, ds \right), \tag{13}$$
which is known as the α-divergence. As important particular cases, if $\alpha = 0$ then $D_\alpha$ becomes the square of Hellinger's distance, and in the limit $\alpha \to \pm 1$ it gives the well-known Kullback-Leibler divergence. Furthermore, it can be shown that the Kullback-Leibler divergence is a Bregman divergence, which in turn implies that for those cases the resulting geometry is flat. This illustrates the fact that being Riemannian (i.e., $\alpha = 0$) and Euclidean ($\alpha = \pm 1$) are independent features of a geometry.
We finish this subsection by noting that multiple divergences can give rise to the same geometry. A one-to-one relationship between divergences and geometries is obtained when considering conformal-projective equivalence classes of divergences, whose members are related via both conformal and projective transformations. For a more detailed explanation, we refer the interested reader to Ref. [10], Sec. 2-D.
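The particular cases of the α-divergence mentioned above can be checked numerically for discrete distributions. The sketch below assumes the commonly used form D_α[p; q] = 4/(1 − α²) · (1 − Σ p^((1−α)/2) q^((1+α)/2)); the sign convention for α and the normalization of the Hellinger distance are assumptions of this example, and other references use different ones.

```python
import math

def alpha_divergence(p, q, alpha):
    # Assumed convention: weights (1 - alpha)/2 on p and (1 + alpha)/2 on q.
    # Only valid for alpha different from +1 and -1.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    overlap = sum(pi ** a * qi ** b for pi, qi in zip(p, q))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - overlap)

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for strictly positive vectors
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def hellinger_sq(p, q):
    # Unnormalized squared Hellinger distance: sum (sqrt(p) - sqrt(q))^2
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    q = [0.4, 0.4, 0.2]
    print("alpha = 0      :", alpha_divergence(p, q, 0.0),
          " 2 * Hellinger^2:", 2.0 * hellinger_sq(p, q))
    print("alpha near -1  :", alpha_divergence(p, q, -0.999),
          " KL(p || q):", kl(p, q))
    print("alpha near +1  :", alpha_divergence(p, q, 0.999),
          " KL(q || p):", kl(q, p))
```

Under this convention the two Kullback-Leibler limits differ only in the order of the arguments, which reflects the duality between the connections at α = −1 and α = +1.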

Generalized Legendre Transforms as a Natural Way to Describe Curved Manifolds
Sections 3.2 and 3.3 clarified the intimate relationship that exists between dually flat manifolds, Bregman divergences, and the Legendre transform. Here we explain how these relationships are altered in more complex geometries.
In curved geometries it is impossible to construct dual potentials that satisfy Equation (6) on the whole manifold. This impossibility is a symptom of the fact that the divergence that gives rise to this geometry, e.g., the α-divergence given in Equation (13), is not a Bregman divergence, but only an $f$-divergence [34]. To better understand the nature of the α-divergence, let us consider in detail its relationship with Bregman divergences. Bregman divergences, as given in Equation (7), can also be expressed as
$$D_\Phi[\xi; \xi'] = \Phi(\xi) - \Phi(\xi') - \langle D\Phi(\xi'),\, \xi - \xi' \rangle,$$
with $D$ denoting the Euclidean gradient. Hence, $D_\Phi[\xi; \xi']$ measures how convex the function $\Phi$ is at $\xi$ in the direction of $\xi - \xi'$ (this also explains the asymmetry that exists in the arguments of a Bregman divergence) and exploits the fact that a first-order approximation of a convex function always underestimates its value (i.e., that $\Phi(\xi) \geq \Phi(\xi') + \langle D\Phi(\xi'),\, \xi - \xi' \rangle$). Interestingly, such a first-order approximation can also be built on an intermediate point between $\xi$ and $\xi'$, which leads to
$$\frac{1-\alpha}{2}\, \Phi(\xi) + \frac{1+\alpha}{2}\, \Phi(\xi') \geq \Phi(\xi_\alpha),$$
where $\xi_\alpha = \frac{1-\alpha}{2}\, \xi + \frac{1+\alpha}{2}\, \xi'$, with $\alpha \in (-1, 1)$ being a one-dimensional parameter that regulates how close $\xi_\alpha$ is to $\xi$ and $\xi'$. This inequality leads to a family of divergences [41] indexed by $\alpha$, given by
$$D^{(\alpha)}_\Phi[\xi; \xi'] = \frac{4}{1 - \alpha^2} \left( \frac{1-\alpha}{2}\, \Phi(\xi) + \frac{1+\alpha}{2}\, \Phi(\xi') - \Phi(\xi_\alpha) \right), \tag{16}$$
where the factor $4/(1 - \alpha^2)$ [...] $D^{(\alpha)}_\Phi$ becomes the α-divergence. Importantly, divergences of the form of Equation (16) with $\alpha \neq \pm 1$ are not Bregman divergences (as they cannot be expressed in terms of convex conjugates as in Equation (7)), and hence they do not lead to flat geometries (see Section 3.3).
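The relationship between the intermediate-point construction and Bregman divergences can be explored numerically. The following sketch assumes the family D^(α)_Φ[ξ; ξ'] = 4/(1 − α²) · [ (1−α)/2 Φ(ξ) + (1+α)/2 Φ(ξ') − Φ(ξ_α) ] with ξ_α = (1−α)/2 ξ + (1+α)/2 ξ', evaluated for scalar arguments. The test functions and helper names are illustrative assumptions.

```python
import math

def bregman(Phi, dPhi, x, y):
    # Gap between Phi(x) and the first-order approximation of Phi built at y
    return Phi(x) - Phi(y) - dPhi(y) * (x - y)

def alpha_family(Phi, x, y, alpha):
    # Convexity gap at the intermediate point x_alpha, rescaled by 4/(1 - alpha^2).
    # Only valid for alpha strictly between -1 and 1.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    gap = a * Phi(x) + b * Phi(y) - Phi(a * x + b * y)
    return 4.0 / (1.0 - alpha ** 2) * gap

def square(x):
    return x * x

def d_square(x):
    return 2.0 * x

if __name__ == "__main__":
    # For a quadratic potential the family does not depend on alpha
    print("quadratic, alpha = 0.3 :", alpha_family(square, 1.0, 3.0, 0.3))
    print("quadratic, Bregman     :", bregman(square, d_square, 1.0, 3.0))
    # For a non-quadratic potential the Bregman form is recovered as alpha -> 1
    print("exp, alpha = 0.9999    :", alpha_family(math.exp, 0.2, 1.1, 0.9999))
    print("exp, Bregman           :", bregman(math.exp, math.exp, 0.2, 1.1))
    print("exp, alpha = 0         :", alpha_family(math.exp, 0.2, 1.1, 0.0))
```

For a quadratic Φ every member of the family equals (ξ − ξ')², whereas for Φ = exp the value changes with α and only approaches the Bregman divergence in the limit, consistent with the statement that intermediate values of α fall outside the Bregman class.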
Fortunately, recent results suggest a way to express non-Bregman divergences in terms of generalized Legendre transforms [9]. The generalized Legendre transform is based on a link function (link functions are typically used as cost functions driving optimization problems in the literature focused on optimal transport [24]), which corresponds to a smooth function $C : M \times M \to \mathbb{R}$ that connects generalized potentials $\phi$ and $\psi$ via the following relationship:
$$\psi(\xi) + \phi(\eta) = C(\xi, \eta), \tag{17}$$
which holds for all $(\xi, \eta)$ pairs belonging to the $C$-superdifferential of $\psi$. In this manner, $\eta$ can be interpreted as the $C$-supergradient of $\psi$ at $\xi$ [42]. Put differently, for a given link function $C$, a pair of generalized potentials are functions $\phi$, $\psi$ that are related via a generalized Fenchel-Legendre $C$-transform as follows:
$$\psi(\xi) = \inf_{\eta} \{ C(\xi, \eta) - \phi(\eta) \}, \tag{18a}$$
$$\phi(\eta) = \inf_{\xi} \{ C(\xi, \eta) - \psi(\xi) \}. \tag{18b}$$
Note that these equations use a different sign than Equation (2), which leads to the consideration of concave instead of convex functions. Arguments for adopting this choice are discussed in Ref. [9]. Following the rationale that led to Equation (7), for a given function $C$ and $C$-conjugate potentials $\phi$, $\psi$, one can define a generalized Bregman divergence (this divergence is known as a $C$-divergence, recently introduced in the context of optimal transport [42], where $C$ refers to the corresponding cost function; here we use another term to stress its relationship with key geometric notions), given by
$$D(p\|q) := C(\xi(p), \eta(q)) - \psi(\xi(p)) - \phi(\eta(q)). \tag{19}$$
Equations (18a) and (18b) imply that $D(p\|q) \geq 0$, with equality if and only if $p = q$. Interestingly, while the metric induced by generalized Bregman divergences is the Fisher metric, Equations (12a) and (12b) imply that the connections are given by [...]. If $C(\xi, \eta) = \xi \cdot \eta$ then $\Gamma_{ijk}(\xi) = \Gamma^*_{ijk}(\xi) = 0$, and hence curved geometries in this construction only arise from non-trivial link functions, i.e., from deformations of the Legendre transform.
For the dual geometries that arise from the α-divergence, one can identify the corresponding link function following a two-step procedure. First, one applies a monotonic transformation that turns the α-divergence into the Rényi divergence [43] of order $\gamma$ (note that we follow Ref. [44] in adopting a shifted indexing, thereby referring to $\gamma = n - 1$ as the order of Rényi's entropy, with $n \geq 0$ corresponding to the order in the standard definition): [...], related to the $\alpha$ parameter of the divergence in Equation (13) as $\alpha = -1 + 2\gamma$. This step leverages the fact that both divergences generate the same geometry, being part of the same conformal-projective equivalence class ([10], Sec. 2-D). Note that when $\gamma \to 0$ the Rényi divergence tends to the Kullback-Leibler divergence. As a second step, one uses the fact that the Rényi divergence can be expressed in terms of generalized convex conjugates ([9], Th. 13), and hence it can be recovered as a generalized Bregman divergence as in Equation (19), where the link function is given by
$$C(\xi, \eta) = \frac{1}{\gamma} \log(1 + \gamma\, \xi \cdot \eta), \tag{22}$$
which tends to $\xi \cdot \eta$ when $\gamma \to 0$, and the corresponding generalized potential is [...]. Furthermore, it has been shown that this non-trivial logarithmic link function (or, equivalently, the Rényi divergence) gives rise to dual geometries of constant curvature [9]. Therefore, this divergence constitutes a natural first step in the exploration of statistical manifolds of more complex geometry.
To conclude, let us introduce the notion of Legendre derivative (this corresponds to the $C$-gradient in optimal transport theory; see, e.g., [9]). For given generalized potentials $\phi$ and $\psi$, the corresponding Legendre derivative is the operator $D_L$ that satisfies $D_L \psi(\xi) = \eta$ and $D_L \phi(\eta) = \xi$.
The functional form of $D_L$ is determined by the corresponding link function. For example, for the case of $C(\xi, \eta) = \xi \cdot \eta$, Equations (4) and (5) show that $D_L$ is given by the Euclidean gradient. In contrast, for a logarithmic link function as in Equation (22), one can find that the corresponding (non-Euclidean) Legendre derivative acting on a smooth function $\phi$ is given by
$$D_L \phi(\eta) = \frac{D\phi(\eta)}{1 - \gamma\, \eta \cdot D\phi(\eta)},$$
with $D$ denoting the Euclidean gradient.
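The way a logarithmic link function deforms the gradient can be verified with a short calculation. The sketch below assumes the link function C(ξ, η) = (1/γ) log(1 + γ ξ · η), consistent with the statement that C tends to ξ · η as γ → 0, and assumes that at a dual pair the Euclidean gradient of the potential at ξ equals ∂C/∂ξ = η / (1 + γ ξ · η). Solving this relation for η gives η = g / (1 − γ g · ξ), where g is the Euclidean gradient. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def grad_xi_link(xi, eta, gamma):
    # Euclidean gradient of C(xi, eta) = log(1 + gamma * xi.eta) / gamma
    # with respect to xi; requires 1 + gamma * xi.eta > 0.
    s = 1.0 + gamma * dot(xi, eta)
    return [e / s for e in eta]

def legendre_derivative(g, xi, gamma):
    # Deformed gradient: maps the Euclidean gradient g at xi to the dual
    # point eta (assumed closed form for the logarithmic link function).
    denom = 1.0 - gamma * dot(g, xi)
    return [gi / denom for gi in g]

if __name__ == "__main__":
    xi = [0.3, -0.2]
    eta = [0.5, 1.0]
    gamma = 0.7
    g = grad_xi_link(xi, eta, gamma)
    print("Euclidean gradient :", g)
    print("recovered dual eta :", legendre_derivative(g, xi, gamma))
    print("gamma = 0 (no deformation):", legendre_derivative(g, xi, 0.0))
```

With γ = 0 the operator returns the Euclidean gradient unchanged, which is the undeformed case C(ξ, η) = ξ · η.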

Symplectic and Kähler Structures in Information Geometry
This section studies the realization of symplectic structures in statistical manifolds. This naturally leads towards considering the complexification of statistical manifolds, which enables a new avenue to develop insights about the Legendre transform. Complex manifolds are 'bigger' bundles that possess a richer structure benefited by greater symmetry. These complex structures are quintessential to physics, being related to the quantization of the spin and coherent states [45], entanglement [46], string theory [47], and Kähler oscillators [48].
The reasoning pursued here is that by recasting manifolds as complex structures with a higher degree of symmetry, one can obtain a more detailed understanding of their geometry and their relationship with the deformed Legendre transform. To develop this idea, we first establish a parallel between statistical manifolds and phase spaces. In doing this, it is important to note that while in statistical manifolds the dual coordinates ξ and η usually refer to the same point, in phase spaces they typically refer to canonical pairs (e.g., position and momentum) and hence correspond to different dimensions. This naturally leads to the consideration of product manifolds of two times the dimensionality of the original one.

Establishing Dynamics on Phase Space
In analytical mechanics, the Legendre transform enables the derivation of the Hamiltonian formalism from the Lagrangian, a smooth function of $n$ generalized coordinates $q$, velocities $\dot{q}$, and time $t$. By doing this, one trades $n$ second-order equations of motion for $2n$ first-order differential equations of the form
$$\dot{q}^i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q^i}.$$
Notice that the transformation $(q, p) \to (p, -q)$ preserves the form of the above equations. This symmetry is a reflection of a rich mathematical structure that provides the foundations of classical mechanics, which we introduce in the rest of this subsection.
We start by reviewing the standard method to establish dynamics over a manifold based on the Hamiltonian formulation of classical mechanics, as described, for instance, in Refs. [2,49,50]. For this, let us consider a phase space $M$ that describes the possible configurations of a system of interest. More specifically, each point in $M$ has the form $z = (q^1, \dots, q^n, p_1, \dots, p_n)$, with $(q^1, \dots, q^n) \in \mathbb{R}^n$ corresponding to a configuration manifold $Q$, and $(p_1, \dots, p_n) \in \mathbb{R}^n$ corresponding to its generalized conjugate momenta. Dynamics over the phase space $M$ are established by a Hamiltonian $H : M \to \mathbb{R}$ via the following equations of motion:
$$\dot{z} = X_H(z),$$
where the Hamiltonian vector field is given by
$$X_H = J\, DH, \qquad J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}, \tag{28}$$
and $D$ denotes the standard gradient (see Equation (25)). In this way, dynamics are established by following the integral curves of $X_H$. Through any point $z \in M$ there is a trajectory governed by the dynamics induced by the Hamiltonian, which is unique because the equations are first-order and $X_H$ is smooth.
Above, the role of Equation (28), which turns the Hamiltonian into a vector field, can be re-framed in a more principled manner via symplectic geometry [51] as follows. A symplectic form $\omega$ is a 2-form on $M$ that is closed ($d\omega = 0$) and non-degenerate ($\forall v \neq 0\ \exists u : \omega(v, u) \neq 0$). On a symplectic manifold (i.e., a manifold equipped with a symplectic form), the flow of the Hamiltonian $H$ can be defined as the vector field $X_H$ that satisfies the following relationship:
$$\iota_{X_H} \omega = -dH, \tag{29}$$
where $\iota_X \omega = \omega(X, \cdot)$ is the 1-form that results from the interior contraction of $\omega$. Above, $dH$ is the differential of $H$ and the sign corresponds to a convention in the definition of the symplectic form. The fact that $\omega$ is non-degenerate guarantees that one can always find a unique $X_H$ that satisfies Equation (29). Additionally, the closure of the symplectic form locally implies, by the Poincaré lemma, the existence of a tautological 1-form $\theta$ (also known as the canonical 1-form or symplectic potential), which satisfies the condition $\omega = d\theta$. This coordinate-invariant expression for $\omega$ emphasizes its topological nature. Symplectic manifolds belong to equivalence classes established via symplectomorphisms (i.e., diffeomorphisms that preserve the symplectic form), which correspond to canonical transformations in the context of analytical mechanics. The symplectic form allows us to determine a vector field from a smooth function up to diffeomorphisms that preserve the symplectic form, i.e., $\mathcal{L}_{X_H} \omega = 0$.
Furthermore, the geometry of the phase space gives an account of important properties of the underlying system. Indeed, while an unconstrained system may be described by a phase space of the form $M = \mathbb{R}^{2n}$, more complicated systems are usually reflected by more convoluted geometries. As a simple example, a pendulum is described by a phase space in the form of a cylinder, which has a flat internal geometry but a non-trivial topology. The next subsections explore the implications of phase spaces with non-zero curvature.
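The pendulum mentioned above offers a simple way to see a Hamiltonian vector field in action. The sketch below assumes the dimensionless Hamiltonian H(q, p) = p²/2 − cos(q) and integrates the flow with a leapfrog (Störmer-Verlet) scheme, a standard symplectic integrator that is swapped in here for illustration and is not discussed in the text. Step size, initial condition, and function names are arbitrary assumptions.

```python
import math

def hamiltonian(q, p):
    # Pendulum energy in dimensionless units (illustrative assumption)
    return 0.5 * p * p - math.cos(q)

def hamiltonian_vector_field(q, p):
    # X_H = (dH/dp, -dH/dq) = (p, -sin q)
    return (p, -math.sin(q))

def leapfrog(q, p, dt, steps):
    # Half step in p, full step in q, half step in p. Each sub-step is a
    # shear of phase space, so the composed map preserves dq ^ dp.
    for _ in range(steps):
        p += 0.5 * dt * hamiltonian_vector_field(q, p)[1]
        q += dt * hamiltonian_vector_field(q, p)[0]
        p += 0.5 * dt * hamiltonian_vector_field(q, p)[1]
    return q, p

if __name__ == "__main__":
    q0, p0 = 1.0, 0.0
    q1, p1 = leapfrog(q0, p0, dt=0.01, steps=1000)
    print("initial energy:", hamiltonian(q0, p0))
    print("final energy  :", hamiltonian(q1, p1))
```

Because H is constant along the integral curves of X_H, the energy printed at the end stays close to its initial value; the small residual is the discretization error of the integrator.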

Symplectic Structure under the Deformed Legendre Transform
Section 3.3 shows that, from an information-geometric perspective, divergences can be used to determine the metric and connections of a manifold. In this subsection, we show how divergences also generate a symplectic 2-form, from which much of the insights from Hamiltonian mechanics can be inherited. This, in turn, allows us to study probability distributions in phase space and discuss the flow induced by divergences. Our results will show that the symplectic 2-form induced by the divergence on the phase space and the induced Hamiltonian dynamics are different from the ones induced on the product manifold when the geometry is curved-or equivalently, when the Legendre transform has been deformed.
To start, let us introduce some terminology. We will contrast structures on the cotangent bundle of statistical manifolds with structures in the product manifold $M \times M$ made of pairs of the form $(p, q)$. The product manifold is often parameterized using dual coordinates as $(\xi, \eta) := (\xi(p), \eta(q))$ (as a consequence, in this section $\xi$ and $\eta$ refer to different points in the manifold, unless it is explicitly specified to be otherwise). In addition, let us use the projection operators over the left and right elements, $\pi_l(p, q) = p$ and $\pi_r(p, q) = q$, to define the sub-manifolds $M_q := \pi_r^{-1}(q) = M \times \{q\} \cong M$ and $M_p := \pi_l^{-1}(p) = \{p\} \times M \cong M$. The diagonal of the product manifold will be denoted as $\Delta \subset M \times M$, being made of pairs of the form $(p, p)$.
Divergences are smooth functions mapping $M \times M$ into $\mathbb{R}$, and we are interested in the geometrical structure that such mappings induce. To investigate this, let us consider the canonical symplectic form $\omega_p$ on $T^*M_p$, which can be expressed in terms of a local chart $(U, \xi^k, \nu_k)$ as
$$\omega_p = d\xi^k \wedge d\nu_k,$$
with $\nu_k$ being the conjugate coordinate to $\xi^k$. Note that, thanks to Darboux's theorem [2], such canonical pairs are guaranteed to always exist locally. Let us then recast the map presented in Equation (8) as the symplectomorphism $L_D : M \times M \to T^*M_p$ given by
$$L_D(\xi, \eta) = (\xi^k, \nu_k) = \big(\xi^k,\, \partial_k D(\xi, \eta)\big).$$
As shown in [52,53], this map induces, via the pull-back $L_D^* \omega_p = \omega_D$, the following symplectic form on $M \times M$:
$$\omega_D = L_D^* \omega_p \tag{32a}$$
$$= d\xi^k \wedge d\big(\partial_k D(\xi, \eta)\big) \tag{32b}$$
$$= \partial_i \partial_k D\, d\xi^k \wedge d\xi^i + \partial^i \partial_k D\, d\xi^k \wedge d\eta_i \tag{32c}$$
$$= \partial_k \partial^i D(\xi, \eta)\, d\xi^k \wedge d\eta_i, \tag{32d}$$
where the vanishing of the first term in (32c) is a result of the commutativity of the second derivatives of the divergence. Note that $\partial_k \partial^i D(\xi, \eta)$ reduces to the Fisher metric when evaluated on $\Delta$ (i.e., when $\xi$ and $\eta$ are evaluated at the same element $p$), but is different otherwise. Importantly, the same symplectic form on $M \times M$ is obtained by pulling back the canonical symplectic form $\omega_q := d\eta_k \wedge d\lambda^k$ on $T^*M_q$ (where $(\eta, \lambda)$ form a canonical pair) in an analogous fashion, using here the symplectomorphism $R_D : M \times M \to T^*M_q$ given by
$$R_D(\xi, \eta) = (\eta_k, \lambda^k) = \big(\eta_k,\, -\partial^k D(\xi, \eta)\big).$$
Now that the symplectic form given by Equation (32d) has been identified as the natural one on $M \times M$, our next step is to investigate how it is influenced by the manifold's curvature. For this, note first that if the divergence $D$ is a generalized Bregman divergence, then its associated symplectic form depends solely on the link function. In effect, a direct calculation shows that for this case
$$\omega_D = \frac{\partial^2 C(\xi, \eta)}{\partial \xi^i\, \partial \eta_j}\, d\xi^i \wedge d\eta_j.$$
This clarifies how, although identical on the cotangent bundle $T^*M$, the symplectic structure induced by different divergences may differ on $M \times M$.

Rényi's Symplectic 2-Form and Flow
While the dually flat geometry established by Bregman divergences leads to the symplectic form $\omega_D = d\xi^i \wedge d\eta_i$, for $\gamma$-curved geometry the Rényi divergence induces the symplectic form stated in Equation (35). The coefficients of this symplectic form coincide with the metric tensor in Ref. [9] (Proposition 4), this time on the product manifold $M \times M$. The symplectic form in Equation (35) is closed, as can be confirmed by a direct calculation leading to $d\omega_D = 0$. This, in turn, implies the local existence of a corresponding tautological 1-form via the Poincaré lemma, as explained in the previous section. Similar to the derivation that led to Equation (35), we define the canonical 1-form $\theta_p = \nu_i\, d\xi^i$ on $T^*M_p$ and evaluate its pull-back onto $M \times M$, which yields Equation (36). This expression, hence, characterizes the 1-form emerging from connections that describe the projectively flat geometry induced by Rényi's divergence. As a last step, let us leverage the symplectic form $\omega_D$ to evaluate the action of the smooth function $D_\gamma$ on the product manifold $M \times M$. This function is of particular interest as it generates integral curves of constant $D$, and hence the induced flow is closed within the diagonal $\Delta \simeq M$. For this purpose, let us denote as $X_\gamma = X_\gamma^i\, \partial_{\xi^i} + X_{\gamma j}\, \partial_{\eta_j}$ the vector field generated by the observable $D_\gamma$ and the corresponding symplectic form. We are interested in the vector fields that preserve the symplectic form $\omega_D$, i.e., the vector fields $X_\gamma$ that satisfy $\mathcal{L}_{X_\gamma} \omega_D = 0$, where $\mathcal{L}_{X_\gamma} \omega_D$ denotes the Lie derivative of $\omega_D$ in the direction of $X_\gamma$. Then, using Cartan's magic formula one finds that $\mathcal{L}_{X_\gamma} \omega_D = d(\iota_{X_\gamma} \omega_D) + \iota_{X_\gamma} d\omega_D = d(\iota_{X_\gamma} \omega_D)$, where the last equality is a consequence of the fact that $\omega_D$ is closed. Therefore, $\mathcal{L}_{X_\gamma} \omega_D$ vanishes whenever $X_\gamma$ is Hamiltonian (29), i.e., if $X_\gamma$ satisfies $\iota_{X_\gamma} \omega_D + dD_\gamma = 0$.
One can then determine the Rényi vector field via explicit evaluation of the interior product, which results in a Hamiltonian flow generated by Rényi's divergence. The corresponding Rényi vector field is given in Equation (40), where $D_L^{(\gamma)}$ is the Legendre derivative operator introduced in Section 3.4. As mentioned above, this Rényi flow is closed within the diagonal $\Delta$. Moreover, the above result implies that the flows on the diagonal follow the geodesics with respect to the primal and dual connections, which naturally satisfy Equation (24). In this way, we gain a new understanding of what deforming the exponential family implies. The square brackets in Equation (40) imply that the set of points flowing along the integral curves of $X_\gamma$ corresponds to enforcing the dual coordinate pair as the Legendre derivative of the potential at the diagonal. Hence, the Bregman limit (i.e., $\gamma \to 0$) leads to the dual parameterization of exponential families from the regular Legendre transformation, whereas finite $\gamma \neq 0$ leads to the deformed family of distributions obtained from Rényi's divergence ([9], Section 4), which describes the sets of points flowing along the integral curves of $X_\gamma$, with external points diverging away from them.
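The mechanism behind this flow can be illustrated numerically with a minimal sketch. It does not use the Rényi divergence itself: as a stand-in, it assumes a one-dimensional toy Bregman divergence generated by the potential exp(t), and a symplectic form with the single coefficient given by the mixed second derivative of that divergence. All function names are hypothetical. The sketch solves the Hamiltonian condition for the vector field and integrates it with a fourth-order Runge-Kutta scheme.

```python
import math

def D(x, y):
    """Toy 1-D Bregman divergence generated by phi(t) = exp(t) (an assumption)."""
    return math.exp(x) - math.exp(y) - math.exp(y) * (x - y)

def grad_D(x, y):
    """Partial derivatives (dD/dx, dD/dy) of the toy divergence."""
    return math.exp(x) - math.exp(y), -math.exp(y) * (x - y)

def mixed_D(x, y):
    """Mixed derivative d^2 D / dx dy, used as the coefficient of omega = f dx ^ dy."""
    return -math.exp(y)

def hamiltonian_field(x, y):
    """Solve iota_X omega + dD = 0 for X = (X^x, X^y) with omega = f dx ^ dy."""
    dDx, dDy = grad_D(x, y)
    f = mixed_D(x, y)
    return -dDy / f, dDx / f

def rk4_step(x, y, h):
    """One classical Runge-Kutta step along the Hamiltonian vector field."""
    k1 = hamiltonian_field(x, y)
    k2 = hamiltonian_field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = hamiltonian_field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = hamiltonian_field(x + h * k3[0], y + h * k3[1])
    x_new = x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    y_new = y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x_new, y_new

def flow(x, y, h=1e-3, steps=1000):
    """Integrate the flow for `steps` steps of size `h` and return the end point."""
    for _ in range(steps):
        x, y = rk4_step(x, y, h)
    return x, y

# Usage: an off-diagonal point moves, yet the divergence stays constant.
start = (0.5, 0.0)
end = flow(*start)
print("start:", start, "D =", D(*start))
print("end:  ", end, "D =", D(*end))
```

In this toy setting the divergence is conserved along each integral curve because the vector field annihilates it by construction, while points on the diagonal are fixed, since both partial derivatives of the divergence vanish there.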

Complexification of Statistical Manifolds
This section discusses some fundamental aspects of complex geometry, followed by the complexification of statistical manifolds. Then, the next section focuses on the complex structure induced by the Rényi divergence. For a more extensive treatment of the properties of complex manifolds, we refer the reader to Refs. [54][55][56].
A complex manifold can be depicted as a topological space that locally looks like C n . One way to try building a complex manifold would be to consider a 2n-dimensional real manifold, and then arrange a set of coordinates {x k c } into complex combinations such as x 2k−1 c + ix 2k c . Unfortunately, such an arrangement is not only arbitrary, but also, more importantly, it is coordinate-dependent. In effect, additional structure on the manifold is required for it to be 'complexifiable'.
One way to build a complex manifold is via a tensor field $J_a{}^b$ of real components satisfying $J^2 = -1$, which provides a linear endomorphism $J : T_pM \to T_pM$. Notably, since the eigenvalues of such a tensor are $\pm i$, it cannot be diagonalized in a real vector space; hence, the coefficients of vectors in $T_pM$ must be allowed to be complex-valued (i.e., $T_p^{\mathbb{C}}M = T_pM \otimes \mathbb{C}$). By arranging the $2n$ local coordinates into complex coordinates $x^k + i y^k$, e.g., via $x^k = x_c^{2k-1}$, $y^k = x_c^{2k}$, one can express $J$ in complex coordinates as $J_a{}^b = i\,\delta_a^b$ and $J_{\bar a}{}^{\bar b} = -i\,\delta_{\bar a}^{\bar b}$, with vanishing mixed components. Hereon, $a$ and $\bar a$ are indices within $\{1, \dots, n\}$, with the bar being used to distinguish between holomorphic and anti-holomorphic components. The tensor $J$ is known as an "almost complex structure", and the pair $(M, J)$ as an almost complex manifold. With the aid of $J$, the complexified $T_pM$ can be decomposed into holomorphic and anti-holomorphic parts via the projection operators $[P^{(\pm)}]_a{}^b = \frac{1}{2}(\delta_a^b \mp i J_a{}^b)$, where the factor of $i$ ensures that $P^{(\pm)}$ are idempotent. These projection operators can be used to decompose any $k$-form into $(p, q)$-forms with $p + q = k$.
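The algebra of these projection operators can be checked in the simplest case with a short sketch. It assumes the standard almost complex structure on the real plane, represented by a 2x2 rotation matrix, and the convention in which the projectors are one half of the identity minus or plus $i$ times $J$; the helper names are hypothetical. The last lines show that the same expression without the factor of $i$ fails to be a projector.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    """Multiply every entry of a 2x2 matrix by the scalar c."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def add(a, b):
    """Entry-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """True when two 2x2 matrices agree entry-wise up to tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def is_idempotent(p):
    """True when p squared equals p, i.e., p is a projector."""
    return close(matmul(p, p), p)

IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
J = [[0, -1], [1, 0]]  # standard almost complex structure on R^2 (assumed)

# Projectors onto the +i and -i eigenspaces of J.
P_PLUS = scale(0.5, add(IDENTITY, scale(-1j, J)))
P_MINUS = scale(0.5, add(IDENTITY, scale(1j, J)))

# Same expression without the factor of i, for comparison.
NAIVE = scale(0.5, add(IDENTITY, J))

print("J^2 = -1:", close(matmul(J, J), scale(-1, IDENTITY)))
print("P+ and P- idempotent:", is_idempotent(P_PLUS), is_idempotent(P_MINUS))
print("P+ P- = 0:", close(matmul(P_PLUS, P_MINUS), ZERO))
print("J P+ = i P+:", close(matmul(J, P_PLUS), scale(1j, P_PLUS)))
print("naive (1 + J)/2 idempotent:", is_idempotent(NAIVE))
```

The two projectors are complementary, and $J$ acts on their images as multiplication by $+i$ and $-i$ respectively, which is what identifies those images as the holomorphic and anti-holomorphic parts.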
As suggested above, every complex manifold is also a real manifold, but the converse does not always hold. A necessary and sufficient condition on $J$ for a real manifold to be a complex one is $N_{ab}{}^{c} = 0$, where $N_{ab}{}^{c}$ stands for the Nijenhuis tensor (42), in which square brackets denote the antisymmetrization of indices (note that the connections appearing from the covariant derivatives cancel out, which is why it is often found written in terms of partial derivatives in spite of being a tensor). Up to this point, the complex manifold $(M, J)$ has not been equipped with a metric; in fact, a $J$-compatible metric may not exist (e.g., in Hopf manifolds). When such a metric does exist, it must satisfy two compatibility conditions. The first condition implies that the pure holomorphic and anti-holomorphic components of the metric vanish; hence, $ds^2 = g_{\mu\nu}\, dx^\mu \otimes dx^\nu = g_{a\bar b}\, dz^a \otimes d\bar z^{\bar b}$ is Hermitian. The second condition enforces the vanishing of the Nijenhuis tensor (42), not only guaranteeing complexification, but also implying that the Kähler 2-form $\omega = \frac{i}{2}\, g_{a\bar b}\, dz^a \wedge d\bar z^{\bar b}$, which serves as the manifold's symplectic form, is closed. In components, Equation (44) means that $\partial_a g_{b\bar c} = \partial_b g_{a\bar c}$ and $\partial_{\bar b} g_{a\bar c} = \partial_{\bar c} g_{a\bar b}$. Analogously to (5), these expressions can be locally integrated, revealing the metric $g_{a\bar b} = \partial_a \partial_{\bar b} K$, with $K$ being a real-valued smooth function known as the Kähler potential. This potential is not unique, as it is only determined up to the addition of a holomorphic and an anti-holomorphic function: $K \to K + f(z) + \bar f(\bar z)$. Furthermore, $K$ may not be globally defined (on a compact manifold, if it were, $\omega$ would be exact and so would the volume form built from it, whose integral would then vanish by Stokes' theorem, violating the non-degeneracy condition for the metric). In this way, the Riemannian metric as well as the symplectic form are determined by $K$, as $\omega = \frac{i}{2}\, \partial \bar\partial K$, with $\{\partial, \bar\partial\}$ denoting the Dolbeault operators $\partial = dz^a \wedge \partial_a$ and $\bar\partial = d\bar z^{\bar a} \wedge \partial_{\bar a}$.
The similarities between the Kähler expressions above and the ones in Section 3.4 are no coincidence, as $K$ itself must be convex. These similarities have been, in fact, the catalyst for the investigation of more intimate relations between the space of Kähler metrics and convexity [57] and various applications in the context of optimal transport [58].
In statistical manifolds the fundamental object is its divergence D, and therefore the constraints on the metric are ultimately enforced on D. Hence, the conditions for complexification of a manifold translate into two conditions over the corresponding divergence [59]: Above, the primed indices denote differentiation with respect to y ∈ M q (as opposed to regular derivatives with respect to x ∈ M p ). Although the first condition above is trivially satisfied when evaluated at the diagonal (as shown in Equation (11)), it is not automatic for it to hold on the whole M × M manifold. Both conditions arise from the construction of an invariant arc element ds 2 from the symmetric and antisymmetric parts, given by where g D and ω D denote the metric and symplectic form induced by the divergence D on M × M . Note that ω D is equivalent to the one derived in (36), while the components of g D are expressed in Equation (45). The second condition for the complexification of a statistical manifold is motivated by the fact that, if one is interested in expressing ds 2 as ∂∂D, then the condition (2) should be satisfied on M × M for κ ∈ R.
Importantly, divergences that can be expressed as in Equation (16) for a given convex function $\Phi$ satisfy the conditions discussed above, and hence the geometries they induce are compatible with a complex structure [58,59]. These divergences induce a geometry of constant scalar curvature given by $\kappa = \alpha_- \alpha_+$ with $\alpha_+ = -\gamma$ and $\alpha_- = 1 + \gamma$. Furthermore, $\Phi(\alpha_+ x + \alpha_- y)$ serves as the local Kähler potential of the manifold. It is worth noting that $\gamma \to 0$ results in a vanishing $K$, so the Kähler structure cannot be defined in that limit. Indeed, $\gamma = 0$ is an excluded value for these expressions, and its limit should be worked out prior to complexification, as discussed in Ref. [59].

Complex Rényi Geometry under the Deformed Legendre Transform
Let us now exploit the general results presented in the previous section to deepen our understanding of the geometry induced by the Rényi divergence on statistical manifolds. The Rényi divergence D γ belongs to the family of divergences that can be expressed as in Equation (16) using Φ(x) as given by This means that the geometry that arises from the Rényi divergence is susceptible to being complexified. Furthermore, when evaluated on arguments that correspond to probability distributions (i.e., x a = log p a and y a = log q a ) then the first two terms in Equation (16) vanish, and therefore the Rényi divergence itself serves as the Kähler potential.
Let us now show that the two conditions for complexification discussed in the previous subsection are satisfied by product manifolds M × M endowed by a geometry induced by Rényi's divergence. For this, we adopt complex coordinates w a = x a + iy a ∈ C with x a = log p a and y a = log q a for p, q ∈ M . Using these coordinates, one finds that where we are using the shorthand notations Γ = 1 2 (α − + iα + ) and z a = exp(wΓ a ). In this manner, Φ(α + x + α − y) (or, equivalently, D γ (x, y)) can be identified as the Kähler potential for the product manifold.
The resemblance between the induced symplectic form in Equation (35) and the connection (36) of the previous section to the well-known Fubini-Study metric and its connection is suggestive of the complex-projective spaces $\mathbb{CP}^n$ (for an overview of $\mathbb{CP}^n$ spaces, please refer to Refs. [54][55][56]). Unfortunately, complexification of the local charts does not preserve the functional form of the symplectic form given by Equation (35), nor that of the canonical 1-form given by Equation (36). Nevertheless, special circumstances, such as $\gamma = 1$ and a restriction to the diagonal $\Delta$, do lead to $\mathbb{CP}^n$ upon complexification. Disregarding the pure holomorphic and anti-holomorphic functions of the divergence, the link function of the deformed Legendre transform can be directly read as the Kähler potential, which in turn generates the Fubini-Study metric. The case of complex dimension $n = \dim_{\mathbb{C}} M = 1$ (two real dimensions), that is, $M = \mathbb{CP}^1 \subset \mathbb{C}^2$, is of particular interest to physical systems. Indeed, from a group-theoretic perspective, this manifold, which corresponds to the coset space $SU(2)/U(1)$ (isomorphic to the Riemann sphere, $S^2 \simeq \mathbb{CP}^1$), is crucial for the formulation of spin coherent states [45,60] and the geometric quantization of the spin [49]. In addition, $\mathbb{CP}^1$ describes pure quantum states whose direct product enables a nice geometric formulation of many phenomena of interest, including entangled systems [46]. The connection on this manifold corresponds to the canonical 1-form, which is now determined by its Kähler potential via the Dolbeault operators (here the index takes only one entry, $a = 1$, with trivial generalization to $\mathbb{CP}^n$). Note that this gauge field is consistent with the expression obtained for the connection 1-form found in Equation (36). Let us now show how a quantization of the 2-sphere restricts the allowed values for the Rényi parameter $\gamma$.
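For readers who want to see a Kähler potential generate the Fubini-Study metric explicitly, the following minimal sketch treats the one-dimensional case. It assumes the textbook potential log(1 + |z|^2) in an affine coordinate z = x + iy and the normalization in which the metric component is the mixed derivative of the potential with respect to z and its conjugate; the normalization used in this paper may differ by a constant factor that depends on the Rényi parameter. Function names are hypothetical. The mixed derivative equals one quarter of the flat Laplacian, which is approximated here with a five-point stencil.

```python
import math

def kahler_potential(x, y):
    """Textbook Fubini-Study potential log(1 + |z|^2) on a chart of CP^1 (assumed)."""
    return math.log1p(x * x + y * y)

def metric_from_potential(x, y, h=1e-3):
    """Approximate d_z d_zbar K = (1/4) Laplacian(K) with a five-point stencil."""
    K = kahler_potential
    laplacian = (K(x + h, y) + K(x - h, y) + K(x, y + h) + K(x, y - h)
                 - 4.0 * K(x, y)) / (h * h)
    return 0.25 * laplacian

def fubini_study(x, y):
    """Closed-form Fubini-Study component 1 / (1 + |z|^2)^2 in this normalization."""
    return 1.0 / (1.0 + x * x + y * y) ** 2

# Usage: compare the numerical and closed-form values at a few sample points.
for point in [(0.0, 0.0), (0.3, -0.7), (1.5, 2.0)]:
    print(point, metric_from_potential(*point), fubini_study(*point))
```

The agreement between the two columns reflects the fact that, in this normalization, the metric is fully determined by second derivatives of the potential.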
As the Poincaré lemma tells us, every closed form is locally exact, and hence the existence of closed forms failing to be exact reflects some non-trivial aspect of the topology of the manifold. This feature is captured by the cohomology groups $H^k(M, \mathbb{R})$, whose classes are represented by closed $k$-forms that are not globally exact. In this sense, the Kähler form belongs to $H^2(M, \mathbb{R})$. Single-valuedness on the manifold requires $\omega_D$ to define an integral cohomology class, i.e., an element of $H^2(M, \mathbb{Z})$. Therefore, the symplectic two-form must be an integer multiple of $\omega_D$. Hence, the covariant derivative is $\nabla_a = \partial_a - i k A_z$ with $k \in \mathbb{Z}$ (not to be confused with the manifold's complex dimension $n$), and the same holds for its anti-holomorphic counterpart. The holomorphic polarization (see Appendix A) imposes the condition $\nabla_{\bar a} \psi = 0$ on the wave function $\psi$, a function whose squared modulus gives the probability density, closely resembling wave functions in quantum mechanics. This condition yields an implicit equation, which is solved by physical solutions $\psi_{\rm phys}$ determined by a holomorphic function $f(z)$, with a resulting probability density $P(z) = |\psi_{\rm phys}|^2$. The holomorphic function $f(z)$ can be expanded on the basis $\{1, z, z^2, \dots, z^k\}$, as higher powers would cause $P(z)$ to diverge; hence, the Hilbert space is of finite dimension, as $\psi_{\rm phys}$ is defined over the 2-sphere. Just as holomorphic polarization for $\gamma = 0$ results in exponential family distributions (Appendix A), one recovers the Rényi maximum entropy distributions as a polarization of the manifold for other values of $\gamma$. Moreover, by identifying $\gamma = 1/k$, one realizes (keeping the sign of $\gamma$) that $k \in \mathbb{Z}^+$ introduces the restriction $\gamma \in (0, 1]$, which corresponds to $\alpha \in (-1, 1]$ and reflects a positive curvature, as discussed in Ref. [10].
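The counting of admissible monomials can be made explicit with a short worked sketch. It assumes the affine coordinate $z$ on $\mathbb{CP}^1$, the Kähler potential $K = \log(1 + |z|^2)$, a gauge in which polarized solutions take the form shown in the first line, and the invariant measure $d^2z / (1 + |z|^2)^2$. These gauge and normalization choices are assumptions made for illustration and may differ from the conventions of the equations in this section.

```latex
\begin{aligned}
\psi_{\rm phys}(z, \bar z) &= f(z)\,\bigl(1 + |z|^{2}\bigr)^{-k/2},
\qquad
|\psi_{\rm phys}|^{2} = \frac{|f(z)|^{2}}{\bigl(1 + |z|^{2}\bigr)^{k}}, \\[4pt]
\int_{\mathbb{C}} \frac{|z|^{2m}}{\bigl(1 + |z|^{2}\bigr)^{k}}\,
\frac{d^{2}z}{\bigl(1 + |z|^{2}\bigr)^{2}}
&= 2\pi \int_{0}^{\infty} \frac{r^{2m+1}}{\bigl(1 + r^{2}\bigr)^{k+2}}\, dr
 = \pi\, \frac{m!\,(k - m)!}{(k + 1)!},
\qquad 0 \le m \le k .
\end{aligned}
```

For $m \ge k + 1$ the radial integrand decays no faster than $1/r$, so the integral diverges. Under these assumptions only the monomials $1, z, \dots, z^{k}$ are normalizable, which gives a space of dimension $k + 1$.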
Although ruled out by the polarization, it is interesting to note that considering $\gamma \notin (0, 1]$ would result in the manifold having hyperbolic topology and becoming non-compact, hence not being susceptible to complexification. These results establish $\gamma \in (0, 1]$ as values of special physical significance: $\gamma = 1$ for spin coherent states [45], worldline formalism [61], Kähler oscillators [48], and entanglement [46], and other values in $\gamma \in (0, 1]$ for systems described through the geometric quantization framework. Notably, this range does not include $\gamma = 0$, which corresponds to conventional dually flat geometry and the Shannon entropy.

Conclusions
The Legendre transform, a fundamental piece of classic and contemporary physics, has a direct but non-trivial correspondence with the dually flat geometry of statistical manifolds induced by Shannon's entropy and the Kullback-Leibler divergence. This paper explores how deformations of the Legendre transform induce a departure from this regime and have multiple consequences for symplectic geometry and complexification. Taken together, these results provide some first steps towards a novel, rigorous, and encompassing understanding of physical systems that are not well-described by classic information-theoretic quantities.
The role of the Legendre transform in analytical mechanics differs from that in information geometry; in the latter, dual coordinates refer to different descriptions of the same point, whereas in the former, they refer to an isomorphism between the tangent and cotangent bundles. In flat geometry the symplectic form of the cotangent bundle is equivalent to a canonical area form on the product manifold. In contrast, our results show that this equivalence is broken if the manifold is curved. Interestingly, this implies that a deformation of the regular Legendre transform results in the failure of the natural coordinates to form a canonical pair. Furthermore, an analysis of the deformed symplectic form and flow that arises in curved manifolds reveals a new understanding of the family of maximum Rényi entropy distributions, which are found to form sets of points flowing along the integral curves of the flow.
The departure of the symplectic form of the product manifold from that of the cotangent bundle provides a promising lead to study coupled physical systems, with non-canonical coordinates, like the pair induced by the Rényi geometry, being subjects of special interest. For instance, there have been studies on the consequences of deformations of the symplectic form in field theory [62] and in $\mathbb{CP}^n$ Kähler oscillators, where deformations of the symplectic structure via a magnetic field are explored [48]. Other related phenomena have been studied in Fermi liquids under an external magnetic field, where the magnetic field couples to Berry's curvature, deforming the symplectic form. Such deformations have been shown to have strong consequences for observables, as the invariant phase volume is modified via a topological invariant [63,64]. An interesting avenue for future research is to investigate whether there are divergences that can recapitulate these deformations, providing a mathematical scaffolding for the study of such systems.
In this work we have established a broad range of nonzero $\gamma$ values that are relevant from more than just a mathematical perspective. Both symplectic topology and Kähler manifolds are sensitive to the topology rather than to local changes in geometry. Furthermore, they are sensitive to the physical systems to which they now connect. In particular, our results show that $\gamma = 1$ corresponds to a special case that is associated with the $\mathbb{CP}^1$ manifolds relevant across various fields such as coherent states [45], worldline formalism [61], Kähler oscillators [48] and entanglement [46], to name a few. Via geometric quantization methods, our results show that holomorphic polarization leads to $\gamma \in (0, 1]$. This reveals a further array of values of interest beyond the value $\gamma = 0$ that characterizes conventional dually flat Shannon systems.
The results presented here establish a first step in uncovering the consequences that the relationship between generalized Legendre transforms and curved statistical manifolds has for physical systems. We hope that this investigation may foster future work on these important implications, which may reveal other hidden threads connecting seemingly dissimilar approaches, such as the one revealed here relating non-Shannon entropies and non-canonical coordinates. Such investigations may lead towards a principled and unified understanding of physical systems that are not well-described by traditional approaches, providing solid foundations to support and guide some of today's effective but ad hoc procedures of analysis.