On Monotone Embedding in Information Geometry

A paper was published (Harsha and Subrahamanian Moosath, 2014) in which the authors claimed to have discovered an extension to Amari’s α-geometry through a general monotone embedding function. It is pointed out here that this so-called (F, G)-geometry (which includes F-geometry as a special case) is identical to Zhang’s (2004) extension of the α-geometry, in which the pair of monotone embedding functions was named ρ and τ instead of the F and H used in Harsha and Subrahamanian Moosath (2014). Their weighting function G for the Riemannian metric is a merely cosmetic consequence of writing the score function in log-representation rather than in the (ρ, τ)-representation of Zhang (2004). It is further shown here that the metric and α-connections obtained by Zhang (2004) through arbitrary monotone embeddings are a unique extension of the α-geometric structure. As a special case, Naudts’ (2004) φ-logarithm embedding (using the so-called log_φ function) is recovered with the identification ρ = φ, τ = log_φ, with the φ-exponential exp_φ given by the associated convex function linking the two representations.

subsequent work [4][5][6][7][8]. The metric and affine connections proposed by [1] are identical to those of [2] apart from notation: the embedding functions F and H in [1] were denoted as ρ and τ in [2], and the weighting function G in [1] is a trivial rewriting of the convex function f used by [2]. This paper will start in Section 1 with a review of Amari's α-geometry and α-embedding, a review of Zhang's (2004) [2] extension to ρ-embedding with an arbitrary monotone function and a summary of Harsha and Subrahamanian Moosath (2014) [1]. Then, the equivalence of [1] to [2] is shown. In Section 2, after analyzing the group of monotone embedding functions, a stronger statement is made: the construction of [2] is a unique dualistic extension of Amari's α-geometry through arbitrary monotone embedding in place of α-embedding. As an important special case, we illustrate how the deformed logarithm log_φ associated with an arbitrary strictly increasing function φ, as investigated by Naudts (2004) [3], arises naturally from identifying φ with ρ and with a proper choice of the auxiliary function f as a part of Zhang's theory.
A Riemannian manifold M_µ with its metric g and the family of α-connections Γ^(α) in the form of (1) and (2) has been called α-geometry. Amari's α-geometry can be specified in terms of a symmetric (0, 2)-tensor g_ij (the Fisher-Rao metric) and a totally symmetric (0, 3)-tensor T_ijk (sometimes called the Amari-Chentsov tensor), which is linked to the α-connections via: where Γ^LC_{ij,k} is the Levi-Civita connection corresponding to the Riemannian metric g.
As an extension of the logarithmic embedding l(p) = log p of a probability density function p, an α-embedding function [10] is defined through l^(α): R₊ → R: It is an interesting observation (e.g., p. 46 in [11]) that the α-geometry can be recovered under such an α-representation (scaling) of the probability function; that is, the Fisher-Rao metric turns out to be α-independent (i.e., embedding independent) and the ±1-connections are precisely the α-connections: A variant of the α-embedding of a probability function plays an important role in Tsallis statistics; see [12][13][14]. On the geometric side, [15,16] showed that the α-scaling of the probability functions leads to a conformal transformation.
1.2. Zhang (2004) [2] Extension: ρ-Embedding and (ρ, τ)-Geometry
Zhang [2,4,6] obtained generalizations of the α-geometry for a pair of monotone embeddings, called ρ- and τ-embeddings, generalizing the α-embedding. Given any smooth strictly convex function f: R → R, with convex conjugate f* given by:
f*(t) = t (f′)⁻¹(t) − f((f′)⁻¹(t)),
Zhang (2004) defines a pair of conjugate representations [2] (Section 3.2) using two strictly increasing functions ρ, τ from R → R: (1) we call ρ-representation of a probability function p the mapping p → ρ(p); (2) we say the τ-representation of the probability function, p → τ(p), is conjugate to the ρ-representation with respect to a smooth and strictly convex function f, or simply that τ is f-conjugate to ρ, if:
τ(p) = f′(ρ(p)) = ((f*)′)⁻¹(ρ(p)), (10)
which can be equivalently written as:
ρ(p) = (f′)⁻¹(τ(p)) = (f*)′(τ(p)). (11)
These equalities in (10) and (11) hold, and they are equivalent, because f′ and (f*)′ are both strictly increasing (due to the strict convexity of f and f*) and (f*)′ = (f′)⁻¹; the τ so defined is itself strictly increasing. As a first example, we may set ρ(t) = t, τ(t) = log t. Then, f′(t) = log t, from which we derive f(t) = t log t − t + 1 and f*(t) = exp(t) − 1 (the additive constant does not affect (f*)′). That ρ(p) and τ(p) are just the p and log p representations reflects the conventional dual embeddings that have later been extended to φ- and log_φ-embedding in [3]. In Section 2.2, it will be shown that Naudts' φ-logarithm formulation is recovered as a special case of the (ρ, τ)-embedding.
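The first example can be checked numerically. The following is a minimal sketch, assuming the conjugacy relations τ(p) = f′(ρ(p)) and ρ(p) = (f*)′(τ(p)) and the Legendre form of the convex conjugate; all function names are illustrative choices and not notation from [2]. Evaluated this way, the conjugate of f(t) = t log t − t + 1 comes out as exp(s) − 1, which agrees with exp up to an additive constant that does not affect (f*)′.

```python
import math

# Illustrative names; rho(t) = t and tau(t) = log t are the first example pair.
def rho(p):
    return p

def tau(p):
    return math.log(p)

def f(t):
    return t * math.log(t) - t + 1.0

def f_prime(t):
    # derivative of f
    return math.log(t)

def f_star(s):
    # Legendre conjugate s*t - f(t), evaluated at t = (f')^{-1}(s) = exp(s)
    t = math.exp(s)
    return s * t - f(t)

def f_star_prime(s):
    # derivative of f*, which is the inverse function of f'
    return math.exp(s)

def conjugacy_gaps(p):
    """Absolute residuals of tau(p) = f'(rho(p)) and rho(p) = (f*)'(tau(p))."""
    return (abs(tau(p) - f_prime(rho(p))),
            abs(rho(p) - f_star_prime(tau(p))))
```

Calling `conjugacy_gaps(p)` for any p > 0 should return two residuals at the level of floating-point rounding.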
As another example, we may set ρ(p) = l^(β)(p) to be the β-representation given by Equation (6); this would have been traditionally called "alpha-embedding", except that we use the symbol β, so that the α-parameter is reserved for indexing α-connections. In this case, the conjugate representation is the (−β)-representation τ(p) = l^(−β)(p): In this case, ρ and τ are conjugate with respect to f, where f is given by:

Based on divergence functions constructed under monotone embedding, Zhang [2] showed:

Proposition 1 ([2], Proposition 7). Using an arbitrary monotone embedding function ρ and an arbitrary smooth strictly convex function f, a generalization of α-geometry is obtained, with metric and α-connections taking the form: where: As special cases,

Furthermore, taking a pair of monotone representations, the metric tensor and affine connections stated in Proposition 1 have dualistic expressions:

([2], Proposition 8) Using two arbitrary monotone embedding functions ρ and τ, the metric and α-connections of (14)-(16) are:

As a special case, when ρ, τ take the familiar alpha-embeddings (12) (using β as the parameter), the α-connections become (αβ)-connections: with the product α · β playing the role of the alpha-parameter indexing the family of connections.
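The embedding independence of the metric under the (β, −β) pair can be illustrated numerically. The sketch below makes two assumptions: it uses the standard form of Amari's representation, l^(β)(p) = (2/(1−β)) p^((1−β)/2) with l^(1)(p) = log p, and it takes the metric to be the pairing E_µ{∂ρ(p) ∂τ(p)} under the counting measure. The Bernoulli family and all function names are hypothetical choices made for illustration only.

```python
import math

def l_rep(p, beta):
    """Standard alpha-representation (assumed form), indexed here by beta."""
    if beta == 1:
        return math.log(p)
    return 2.0 / (1.0 - beta) * p ** ((1.0 - beta) / 2.0)

def bernoulli(x, theta):
    """Hypothetical one-parameter family on {0, 1} with P(x = 1) = theta."""
    return theta if x == 1 else 1.0 - theta

def metric(theta, beta, h=1e-6):
    """Sum over x of d(rho(p))/dtheta * d(tau(p))/dtheta, where
    rho = l^(beta) and tau = l^(-beta); central differences in theta."""
    total = 0.0
    for x in (0, 1):
        p_plus = bernoulli(x, theta + h)
        p_minus = bernoulli(x, theta - h)
        d_rho = (l_rep(p_plus, beta) - l_rep(p_minus, beta)) / (2.0 * h)
        d_tau = (l_rep(p_plus, -beta) - l_rep(p_minus, -beta)) / (2.0 * h)
        total += d_rho * d_tau
    return total

def fisher_bernoulli(theta):
    """Closed-form Fisher information of the Bernoulli family."""
    return 1.0 / (theta * (1.0 - theta))
```

For every β tried, `metric(theta, beta)` should agree with `fisher_bernoulli(theta)` up to finite-difference error, because the product of the derivatives of l^(β) and l^(−β) is 1/p.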
Statement 1. Equations (14) and (22) give the same Riemannian metric; Equations (17) and (23) give the same affine connection; and Equations (18) and (25) give the same conjugate connection, as long as:
Proof. Re-writing (14), and keeping in mind: so: Comparing the above with (22), obviously, F is just ρ, and G is linked to f and ρ: where we have used (10).
Statement 2. The conjugate embedding function H is the same as τ. The conjugate connection (25), when expressed using H, has the same form as (23) for Γ^{G,F}_{ij,k} using F.
Proof. Applying Definition (24) immediately yields H′ = τ′. Therefore, (apart from a constant) H(p) = τ(p). Next, we will express (25) explicitly using the conjugate embedding function H (rather than F) and the weighting function G. That is to say, we will simplify the terms in the middle parenthesis of (25): Hence, (25) has the same expression as (23), showing the duality between the embedding function H and the embedding function F.
By Statement 1, starting from F (that is, ρ) and G and imposing the conjugacy requirement on the pair of affine connections, one is guaranteed to derive H (that is, τ) as the conjugate embedding function.
During the review of their manuscript [1] and in subsequent personal communications, these authors argued that they used a different approach: (F, G)-geometry is derived by embedding the manifold into the space of random variables and suitably defining the inner product using the F-expectation (their Equation (15)) and the (F, G)-expectation (their Equation (32)) as a general weighted expectation of a random variable, while Zhang (2004) [2] derived the geometry through constructing a divergence function. This difference, however, is entirely superficial, because the relationship between divergence functions and geometric structure (metric and affine connection) is well-established by Eguchi's work [17,18] and known to information geometers. Therefore, neither the approach nor the results of Harsha and Moosath's proposed (F, H, G) extension to Amari's α-geometry differ from Zhang's proposed (ρ, τ, f) extension, with the following correspondence between the symbols used by the two papers: The difference in the representation of the score function, as log-representation in [1] or as ρ- or τ-representation in [2], is cosmetic.

Monotone Embedding as a Transformation Group
Monotone representations of any given probability function form a transformation group, with functional composition as the group composition operation and the functional inverse as the group inverse operation. This was pointed out by Zhang [6] (Section 2.2.2). We state it as a lemma here.
Lemma 1. Denote by Ω the set of strictly increasing functions from R → R. Then, (Ω, ∘) forms a group, with ∘ denoting functional composition.
Proof. We easily verify that: (1) closure for ∘: for any ρ₁, ρ₂ ∈ Ω, ρ₂ ∘ ρ₁, defined as ρ₂(ρ₁(·)), is strictly increasing, and hence, ρ₂ ∘ ρ₁ ∈ Ω; (2) existence of a unique identity element: the identity function ι, which satisfies ρ ∘ ι = ι ∘ ρ = ρ, is strictly increasing, and hence, ι ∈ Ω and is unique; (3) existence of inverse: for any ρ ∈ Ω, its functional inverse ρ⁻¹, which satisfies ρ⁻¹ ∘ ρ = ρ ∘ ρ⁻¹ = ι, is also strictly increasing, and hence, ρ⁻¹ ∈ Ω; (4) associativity of ∘: for any three ρ₁, ρ₂, ρ₃ ∈ Ω, (ρ₃ ∘ ρ₂) ∘ ρ₁ = ρ₃ ∘ (ρ₂ ∘ ρ₁).

Recall that the derivatives of smooth strictly convex functions are strictly increasing functions. From this perspective, f′ = τ ∘ ρ⁻¹ and (f*)′ = ρ ∘ τ⁻¹, encountered above, are themselves two mutually inverse strictly increasing functions. This is the rationale behind Zhang's [2] choice of f (and f*) as the auxiliary function to capture conjugate embedding, rather than using G as in [1]. The following identities are useful; they are obtained by differentiating (10) and (11): therefore: and: With respect to (41), taking log on both sides yields: Rearranging and differentiating: Making use of (40) yields: Note the coupling between f and ρ, τ given by (10), (11), (40) and (44). They allow us to cast (14) and (15) in terms of f* and τ. Among the triple (f, ρ, τ), given any two, the third is specified. In particular, if we arbitrarily choose two strictly increasing functions ρ and τ as embedding functions and require them to be conjugate embeddings, then f is specified by f′(t) = τ(ρ⁻¹(t)). In terms of the conjugate function f*, the relation is (f*)′(t) = ρ(τ⁻¹(t)). The function f (or f*) is important in constructing the general class of divergence functions.
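The group operations, and the way a chosen pair (ρ, τ) fixes f′ and (f*)′, can be sketched in code. This is a minimal illustration, assuming that each monotone function is stored together with its inverse; the `Monotone` container, the helper names and the particular pair ρ(t) = 2t + 1, τ(t) = sinh t are hypothetical choices, not taken from [2] or [6].

```python
import math
from typing import Callable, NamedTuple

class Monotone(NamedTuple):
    """A strictly increasing bijection of R, stored with its inverse."""
    fwd: Callable[[float], float]
    inv: Callable[[float], float]

def compose(a: Monotone, b: Monotone) -> Monotone:
    """Group operation: (a o b)(t) = a(b(t)); its inverse is b^{-1} o a^{-1}."""
    return Monotone(lambda t: a.fwd(b.fwd(t)), lambda t: b.inv(a.inv(t)))

def inverse(a: Monotone) -> Monotone:
    """Group inverse: swap the forward map and its inverse."""
    return Monotone(a.inv, a.fwd)

IDENTITY = Monotone(lambda t: t, lambda t: t)

# Hypothetical embedding pair, both strictly increasing on all of R.
rho = Monotone(lambda t: 2.0 * t + 1.0, lambda s: (s - 1.0) / 2.0)
tau = Monotone(math.sinh, math.asinh)

# Conjugacy fixes the derivative of the auxiliary convex function:
f_prime = compose(tau, inverse(rho))        # f'    = tau o rho^{-1}
f_star_prime = compose(rho, inverse(tau))   # (f*)' = rho o tau^{-1}
```

By construction `f_prime` and `f_star_prime` are mutual inverses, and `f_prime.fwd(rho.fwd(p))` reproduces `tau.fwd(p)`, which is the conjugacy relation written with group operations.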

Naudts' φ-Logarithm as a Special Case
In his 2004 publication [3], Naudts considered the "deformed" logarithm function as an extension to the exponential family of densities that is log-linear. Given a strictly increasing and strictly positive function φ: R₊ → R₊, the φ-logarithm is defined as: The deformed exponential, denoted exp_ψ, is defined by: (Naudts (2004) used the notation exp_φ, so our current rendition has a subtle difference shown as (48) and (49) below.) It can be shown that the deformed functions log_φ and exp_ψ are in fact inverse functions of each other if: Stated alternatively, the deformed logarithmic function h(t) = log_φ(t) can be viewed as the solution to the following integral and its equivalent differential equation: whereas the deformed exponential function h(t) = exp_ψ(t) can be viewed as the solution to the following integral and its equivalent differential equation: We now show that the above formulation can be re-written as (ρ, τ)-embeddings with a particular choice of the function f (or equivalently, f*). Set φ(t) = ρ(t) and f*(t) = exp_ψ(t), so that (f*)′(t) = ψ(t) from (46). Therefore, we derive: That is, when φ is chosen as the ρ-representation, the deformed logarithm log_φ turns out to be the τ-representation, while the deformed exponential is nothing but f*. The relationship (47) is identical to (10) and (11).
The expression (51) in Proposition 2 shows that for any ρ, if one can find a decomposition ρ = g′ ∘ g⁻¹ in terms of g, then g would be the ρ-exponential, g⁻¹ the ρ-logarithm and g′ the linking function. In the case of the φ → log_φ transformation, g = f*.
Naudts' [3] deformed logarithm/exponential embedding approach and Zhang's [2] (ρ, τ)-embedding approach can be seen as playing complementary roles in information geometry: the former makes it easy to generalize exponentiation and logarithm as inverse operations obeying desired differential/integral equations, while the latter makes it apparent how conjugate (ρ, τ)-embeddings lead to bidualistic expressions for the underlying geometric structures (metric and conjugate connections).
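A small numerical sketch may help make the deformed logarithm concrete. It assumes Naudts' integral definition log_φ(t) = ∫₁ᵗ ds/φ(s), evaluated with a composite Simpson rule; the helper names and the test functions φ(s) = s and φ(s) = s^q are illustrative choices. With φ(s) = s the ordinary logarithm is recovered, and with φ(s) = s^q the result is the q-deformed logarithm (t^(1−q) − 1)/(1 − q).

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def log_phi(phi, t):
    """Deformed logarithm: integral of 1/phi(s) from 1 to t (assumed definition)."""
    return simpson(lambda s: 1.0 / phi(s), 1.0, t)

def q_log(t, q):
    """Closed form of log_phi for phi(s) = s**q with q != 1."""
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)
```

For example, `log_phi(lambda s: s, 2.5)` approximates `math.log(2.5)`, and `log_phi(lambda s: s ** 0.5, 2.5)` approximates `q_log(2.5, 0.5)`; arguments below 1 give negative values, since the integral then runs backwards.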

Uniqueness of (ρ, τ )-Geometry
It is known [19,20] that the Fisher-Rao metric and α-connections (equivalently, the Amari-Chentsov tensor T) are the only invariants of sufficient statistics under the Markov morphism of a random variable. In [22,23], the Fisher-Rao metric has been extended to allow a weighting function. In [2,6], general weighting functions for affine connections were made compatible with the generalized (i.e., weighted) Fisher-Rao metric, since they result from divergence functions that are allowed to have the freedom of monotone embedding. The recent reinvention [1] constructed weighted connections that turned out to be identical to the expressions given by [2]. A natural question is, then, whether Zhang's (ρ, τ)-geometry is the unique construction given the freedom of arbitrary monotone embedding. Below, arguments will be provided, along with a proof, for a positive answer to this question.
First, when a probability function p(ζ|θ) (as a function of a random variable indexed by ζ and a background measure µ) is embedded into the parametric manifold M_Θ, there are several traditional choices for tangent vectors: ∂_i p, ∂_i log p, ∂_i √p, etc. Each of these is linked with a weighting function (expectation operator), so that the tangent vectors are zero-mean random variables: where the weighting functions are, respectively, one, p, √p: For these various choices, the directions of the tangent vectors are all the same. We can consider the above as special cases of ρ-embedding, with ρ(t) = t, log t, √t, respectively. Because ∂_i(ρ(p)) = ρ′(p) ∂_i p, a tangent vector retains its direction under any choice of monotone embedding function.
To investigate the weighting function for general monotone ρ-embedding, let us consider the f-normalization (foliation) condition, cf. [21], where f is a given convex function. Differentiate the above; we get: Therefore, we can see that τ(p) = f′(ρ(p)), what we have called the f-conjugate of ρ, is precisely the weighting function that makes ∂_i ρ a zero-mean random function at any point of M_Θ (i.e., for any value of θ ∈ Θ).
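The zero-mean property of the three traditional tangent vectors can be checked on a toy model. The sketch below assumes a hypothetical three-state family p(x|θ) ∝ exp(θx) with counting measure µ and uses central differences for ∂/∂θ; the weights 1, p and √p are those named in the text, while the model and all function names are illustrative.

```python
import math

STATES = (0, 1, 2)  # hypothetical finite sample space; mu is the counting measure

def p(x, theta):
    """Exponential-tilt family, p(x|theta) proportional to exp(theta * x)."""
    z = sum(math.exp(theta * y) for y in STATES)
    return math.exp(theta * x) / z

def d_theta(embed, x, theta, h=1e-6):
    """Central-difference derivative of theta -> embed(p(x|theta))."""
    return (embed(p(x, theta + h)) - embed(p(x, theta - h))) / (2.0 * h)

def weighted_mean(weight, embed, theta):
    """Sum over x of weight(p) * d/dtheta embed(p), i.e. the weighted expectation."""
    return sum(weight(p(x, theta)) * d_theta(embed, x, theta) for x in STATES)

def tangent_means(theta):
    """Weighted means of the tangent vectors d p, d log p and d sqrt(p)."""
    return {
        "p": weighted_mean(lambda q: 1.0, lambda q: q, theta),
        "log p": weighted_mean(lambda q: q, math.log, theta),
        "sqrt p": weighted_mean(math.sqrt, math.sqrt, theta),
    }
```

All three entries of `tangent_means(theta)` should vanish up to finite-difference error, since each weighted sum reduces to a constant multiple of the derivative of the total probability.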
Next, consider the Fisher-Rao metric (1), which can be written as E_µ{∂_i p ∂_j log p} = E_µ{∂_i log p ∂_j p}, the pairing of a random function with a random functional under two embeddings p and log p. A natural generalization (see [6]) is to use two (independently chosen) monotone embeddings ρ, τ: This is precisely (14), with the weighting function for the Riemannian metric as f″(ρ(p))(ρ′(p))² = τ′(p)ρ′(p), when tangent vectors are expressed as ∂_i p (identity representation). When the ρ-representation or τ-representation is adopted, the weighting function is simply f″(ρ(p)) or (f*)″(τ(p)), respectively.

Third, given the ρ, τ embeddings, we can construct two affine connections on the manifold as follows. Differentiate (57), ∂g_ij(θ), and compare with the relation that defines conjugate connections: we can identify: with Γ_{ki,j} and: with Γ*_{kj,i}, respectively. Their difference is, by definition, the Amari-Chentsov (0,3)-tensor T:
Proposition 3. T as given by (62) is a totally symmetric (0,3)-tensor.
Proof. First, we prove that T(θ) is totally symmetric: Since (62) clearly implies T_ijk = T_jik, we only need to establish T_ijk = T_ikj. Applying the chain-rule of differentiation, and taking into account: (62) becomes: Next, we prove that T_ijk is indeed a (0,3)-tensor. This is done through examining the behavior of T under a coordinate transform θ → θ̃, with the (inverse) Jacobian matrix ∂θ^k/∂θ̃^l, which affects: and: Therefore: after substituting (69), (70) and (62). T indeed transforms to T̃ in a manner that defines a (0, 3)-tensor. Therefore, the proposition is proven.
We now cast the Amari-Chentsov tensor T in an alternative form that gives an explicit form of the weighting function. Given ρ, τ, because of Lemma 1, there exists another monotone embedding σ, such that σ(ρ) = τ. Differentiating, Differentiating again, we obtain: Substituting the above into (62), we obtain an expression of T in terms of ρ (which plays the role of embedding function) and σ (which plays the role of weighting function): Similarly, we can obtain: Therefore, under the τ-representation, σ⁻¹ (the inverse function of σ) serves as the weighting function. Note that σ = f′, σ⁻¹ = (f*)′ when ρ and τ are said to be conjugate. Furthermore, note the negative sign in (75) compared with (74); this precisely reflects "representation duality" with a ρ ←→ τ exchange.
To summarize, because α-geometry {M, g, T} is uniquely specified given a Riemannian metric g and the Amari-Chentsov tensor T, the above derivations show that they both enjoy the freedom of two monotone/convex functions, with the freedom in specifying g coupled to the freedom in specifying T in the same way that the metric and connections are coupled via the Codazzi relation for statistical manifolds. That the weighting functions used to construct linear, symmetric bilinear and totally symmetric trilinear functionals (on random functions) turn out to be f′(ρ(·)), f″(ρ(·)), f‴(ρ(·)), respectively, is noteworthy. See [6] for more discussion.
Zhang (2004) [2] showed that (1) and (2) reflect two different types of duality in information geometry, with (1) concerning the reference/comparison status of a pair of points (functions) expressed in the divergence function ("reference duality") and (2) concerning their representation under arbitrary monotone scaling ("representation duality"). Both can lead to (3), the family of α-connections. Therefore, care has to be taken in delineating these two kinds of duality; for instance, the αβ-connection we derived in (21) reflects how reference duality and representation duality interact in the alpha-connections.
The present analysis elaborated representation duality in information geometry by working out the freedom in allowing two (independently chosen) embedding functions ρ, τ or, equivalently, one embedding function ρ along with a weighting function f, while the (ρ, f) pair can be dually chosen to be the (τ, f*) pair. Naudts' (2004) [3] φ-logarithm is but a special case of the (ρ, τ) duality, in which f plays the role of the "integral-of-the-reciprocal" operation, that is, taking the log of a function. This linkage then leads to f* and τ as inverse functions. The phenomenon of biduality emerges when exchanging ρ ←→ τ or (ρ, f) ←→ (τ, f*) leaves the Riemannian metric invariant, but switches the two connections (the latter half of the statement is equivalent to changing the sign of the Amari-Chentsov tensor). Therefore, the present paper, while elaborating the theory developed in [2], re-asserts the distinction between two kinds of duality that were originally confounded in Amari's theory of α-geometry, one through the freedom of selecting monotone embedding functions ("representation duality") and the other through the freedom of assigning referential status to points for pair comparison ("reference duality").
Finally, it is noted that the (bi)dualistic structure of the (ρ, τ)-geometry (generalizing α-geometry) is preserved in the non-parametric (infinite-dimensional) setting as well [4,6], with the α-connection structure cast in a more general way. Theorem 1 of [4] gives non-parametric expressions of the metric and connections under monotone embedding, mirroring the forms (14) and (15) in the parametric case.

Conclusion
The Riemannian metric and the pair of conjugate connections derived by Harsha and Moosath [1] are identical to the (ρ, τ)-geometry obtained by Zhang in [2]. The (ρ, τ)-embedding also recovers Naudts' deformed logarithm/exponential formulation. It is further shown in this paper that, when α-embedding is relaxed to arbitrary monotone embeddings, the (ρ, τ)-geometry so obtained is the unique extension of Amari's α-geometry in terms of its representational freedom.