Deformed Algebras and Generalizations of Independence on Deformed Exponential Families

A deformed exponential family is a generalization of exponential families. Since useful classes of power-law tailed distributions are described by deformed exponential families, they are important objects in the theory of complex systems. Though deformed exponential families are defined by deformed exponential functions, these functions do not satisfy the law of exponents in general. Deformed algebras have been introduced based on deformed exponential functions. In this paper, after summarizing such deformed algebraic structures, we clarify how deformed algebras work on deformed exponential families. In fact, deformed algebras lead to generalizations of expectations. Three kinds of expectations for random variables are introduced in this paper, and we discuss why these generalized expectations are natural from the viewpoint of information geometry. In addition, deformed algebras lead to generalizations of independence. Whereas it is difficult to check the well-definedness of deformed independence in general, κ-independence is always well defined on κ-exponential families. This is one of the advantages of κ-exponential families in complex systems. Consequently, the maximum likelihood method can be generalized for κ-exponential families from the viewpoint of information geometry.


Introduction
An exponential family is a set of probability distributions and an important statistical model in the mathematical sciences. For example, the set of all Gaussian distributions is an exponential family. A deformed exponential family is one of the generalizations of exponential families, and it has been studied in anomalous statistical physics (cf. [1]) and in machine learning theory (cf. [2,3]). A useful class of power-law tailed distributions, such as the set of all Student's t-distributions, is a deformed exponential family.
In the study of deformed exponential families, a deformed exponential function and a deformed logarithm function play important roles. However, these functions do not satisfy the law of exponents in general. Hence, deformed algebraic structures and deformed differentials have been introduced in anomalous statistical physics (cf. [4–7]). In addition, a random variable that follows a power-law tailed distribution may not have a finite mean or variance. To overcome this problem, a deformed probability distribution called an escort distribution (cf. [1,8]) has been introduced, and expectations with respect to the escort distribution have been discussed.
In this paper, after summarizing such deformed algebraic structures, we clarify how a deformed algebra works on deformed exponential families. In particular, we show that a deformed sum works on the sample space, whereas a deformed product works on the target function space. This distinction clarifies how deformed algebras should be used.
Since the deformed sum works on the sample space (i.e., the domain of random variables), the sample space can be regarded as a deformed algebraic space rather than the standard Euclidean space. This deformation leads to generalizations of expectations of random variables. In this paper, we consider three kinds of expectations, including the expectation with respect to the escort distribution mentioned above, and we explain why these expectations are natural from the viewpoint of information geometry. Here, information geometry is a differential geometric approach to statistical estimation (cf. [9]). As a consequence, generalized expectations give local coordinate systems of deformed exponential families, and such coordinate systems are closely related to a dually flat structure and to a projective structure of deformed exponential families (see also [10–13], etc.).
The deformed product works on the target space of probability distributions. This deformation leads to generalizations of independence. Though it is difficult to check the well-definedness of deformed independence in general, κ-independence for κ-exponential families is always well defined. This is an advantage of κ-exponential families among deformed exponential families. Hence, we consider a κ-generalization of the maximum likelihood method. In information geometry, it is known that the maximum likelihood estimator for a curved exponential family is characterized by the Kullback–Leibler divergence projection from the observed data. Based on this fact, we give a κ-generalization of the divergence projection-type theorem for the κ-maximum likelihood estimator.
In this paper, new contributions are stated as theorems (i.e., Theorems 1, 3, 4 and 7), whereas known results are stated as propositions.

Deformed Exponential Families
In this section, we give definitions of deformed exponential functions and deformed exponential families. For more details, see [1,10,11,14], for example. We assume that all functions are real functions and that variables take real values, since we consider probability distributions on the real number field.
Let χ be a strictly increasing function from $\mathbb{R}_{>0}$ to $\mathbb{R}_{>0}$. We define a χ-logarithm function (or a deformed logarithm function) by:
$$\ln_\chi s := \int_1^s \frac{1}{\chi(t)}\,dt .$$
The inverse of the χ-logarithm function is called a χ-exponential function (or a deformed exponential function), and it is given by:
$$\exp_\chi t := 1 + \int_0^t \lambda(s)\,ds ,$$
where the function $\lambda(s)$ is given by $\lambda(\ln_\chi s) = \chi(s)$.
We remark that the χ-logarithm function $\ln_\chi$ and the χ-exponential function $\exp_\chi$ are usually called the φ-logarithm and the φ-exponential, respectively (cf. [1,15]). However, the symbol φ is used for the dual Hessian potential function in information geometry, so we use χ as the deformation function in this paper.
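Example 1. Suppose that a deformation function χ(s) is given by:
$$\chi(s) = \frac{2s}{s^{\kappa} + s^{-\kappa}} \qquad (-1 < \kappa < 1,\ \kappa \neq 0).$$
Then, the deformed logarithm and the deformed exponential are given by:
$$\ln_\kappa s = \frac{s^{\kappa} - s^{-\kappa}}{2\kappa}, \qquad \exp_\kappa t = \left(\kappa t + \sqrt{1 + \kappa^2 t^2}\right)^{1/\kappa},$$
respectively. The function $\ln_\kappa s$ is called a κ-logarithm function and $\exp_\kappa t$ a κ-exponential function (cf. [6]). By taking a limit κ → 0, these functions coincide with the standard logarithm and the standard exponential, respectively. While s > 0 is needed for defining the κ-logarithm function $\ln_\kappa s$, the κ-exponential function $\exp_\kappa t$ is defined on all of $\mathbb{R}$, since $\kappa t + \sqrt{1 + \kappa^2 t^2}$ is always positive.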
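A minimal Python sketch of Example 1, assuming the closed forms $\ln_\kappa s = (s^{\kappa} - s^{-\kappa})/(2\kappa)$ and $\exp_\kappa t = (\kappa t + \sqrt{1+\kappa^2 t^2})^{1/\kappa}$ with $-1 < \kappa < 1$. The names `chi_kappa`, `ln_kappa` and `exp_kappa` are illustrative, not taken from the paper. The checks confirm that the two functions are mutually inverse, that $\frac{d}{ds}\ln_\kappa s = 1/\chi(s)$, that $\exp_\kappa$ is finite and positive for every real t, and that κ → 0 recovers the ordinary exponential.

```python
import math


def chi_kappa(s: float, kappa: float) -> float:
    """Deformation function chi(s) = 2s / (s^kappa + s^-kappa), s > 0."""
    return 2 * s / (s**kappa + s ** (-kappa))


def ln_kappa(s: float, kappa: float) -> float:
    """kappa-logarithm (s^kappa - s^-kappa) / (2 kappa); needs s > 0."""
    if s <= 0:
        raise ValueError("ln_kappa requires s > 0")
    if kappa == 0:
        return math.log(s)  # kappa -> 0 limit
    return (s**kappa - s ** (-kappa)) / (2 * kappa)


def exp_kappa(t: float, kappa: float) -> float:
    """kappa-exponential (kappa t + sqrt(1 + kappa^2 t^2))^(1/kappa); defined for all real t."""
    if kappa == 0:
        return math.exp(t)  # kappa -> 0 limit
    base = kappa * t + math.sqrt(1 + (kappa * t) ** 2)  # strictly positive for every t
    return base ** (1 / kappa)


if __name__ == "__main__":
    k = 0.3
    for t in (-50.0, -2.0, 0.0, 1.5, 20.0):
        assert math.isclose(ln_kappa(exp_kappa(t, k), k), t, rel_tol=1e-9, abs_tol=1e-12)
    # d/ds ln_kappa(s) = 1 / chi(s), checked by a central difference at s = 2
    h = 1e-6
    slope = (ln_kappa(2 + h, k) - ln_kappa(2 - h, k)) / (2 * h)
    print(slope, 1 / chi_kappa(2.0, k))
    print(exp_kappa(1.0, 1e-6), math.e)  # nearly equal: the kappa -> 0 limit
```

The branch for κ = 0 is only a convenience so that the same functions cover the undeformed case.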
Example 2. Suppose that χ(s) is given by a power function $\chi(s) = s^{q}$ (q > 0, q ≠ 1). Then, the deformed logarithm and the deformed exponential are given by:
$$\ln_q s = \frac{s^{1-q} - 1}{1 - q}, \qquad \exp_q t = \left(1 + (1-q)t\right)^{\frac{1}{1-q}}.$$
The function $\ln_q s$ is called a q-logarithm, and $\exp_q t$ a q-exponential (cf. [1,8]). Taking a limit q → 1, these functions coincide with the standard logarithm and the standard exponential, respectively. The condition s > 0 is needed for defining $\ln_q s$. In the q-exponential case, the condition:
$$1 + (1-q)t > 0 \qquad (1)$$
is also necessary, since the base of the power in $\exp_q t$ must be positive. Condition (1) is called the anti-exponential condition for the q-exponential function.
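A minimal Python sketch of Example 2, assuming $\ln_q s = (s^{1-q}-1)/(1-q)$ and $\exp_q t = (1+(1-q)t)^{1/(1-q)}$. The helper names are hypothetical. The anti-exponential condition (1) is enforced explicitly: `exp_q` raises an error instead of returning a value when $1 + (1-q)t \le 0$.

```python
import math


def ln_q(s: float, q: float) -> float:
    """q-logarithm (s^(1-q) - 1) / (1 - q); needs s > 0."""
    if s <= 0:
        raise ValueError("ln_q requires s > 0")
    if q == 1:
        return math.log(s)  # q -> 1 limit
    return (s ** (1 - q) - 1) / (1 - q)


def exp_q(t: float, q: float) -> float:
    """q-exponential (1 + (1-q) t)^(1/(1-q)).

    Raises ValueError when the anti-exponential condition 1 + (1-q) t > 0 fails.
    """
    if q == 1:
        return math.exp(t)  # q -> 1 limit
    base = 1 + (1 - q) * t
    if base <= 0:
        raise ValueError(f"anti-exponential condition violated: 1 + (1-q)t = {base}")
    return base ** (1 / (1 - q))


if __name__ == "__main__":
    print(exp_q(ln_q(3.0, 0.5), 0.5))  # round trip, approximately 3.0
    try:
        exp_q(3.0, 2.0)  # 1 + (1 - 2) * 3 = -2 <= 0
    except ValueError as err:
        print("rejected:", err)
```

Many references instead set $\exp_q t := 0$ when $1 + (1-q)t \le 0$ (a cut-off convention); raising an error keeps condition (1) visible, which matches how the text treats it as a constraint on the arguments.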
Let Ω be a total sample space. We say that a statistical model $S_\chi$ on Ω is a χ-exponential family or a deformed exponential family if $S_\chi$ is a set of probability density functions such that:
$$S_\chi := \left\{ p(x;\theta) \;\middle|\; p(x;\theta) = \exp_\chi\left[\sum_{i=1}^{n}\theta^i F_i(x) - \psi(\theta)\right],\ \theta \in \Theta \subset \mathbb{R}^n \right\},$$
where $F_1(x), \dots, F_n(x)$ are functions on Ω, $\theta = \{\theta^1, \dots, \theta^n\}$ is a parameter and $\psi(\theta)$ is the normalization with respect to the parameter θ. We assume that $S_\chi$ is a statistical model in the sense of information geometry; that is, a probability density $p(x;\theta) \in S_\chi$ has support entirely on Ω. See Chapter 2 in [9] for more details. The normalization function ψ is convex, but it may not be strictly convex in general. We assume that ψ is strictly convex in this paper, so that we can induce a Riemannian metric from ψ (see Section 7). In addition, the functions $F_1(x), \dots, F_n(x)$, $\psi(\theta)$ and the parameter θ must satisfy the anti-exponential condition. For example, in the q-exponential case,
$$1 + (1-q)\left(\sum_{i=1}^{n}\theta^i F_i(x) - \psi(\theta)\right) > 0 .$$
We remark that it is not easy to determine how the anti-exponential condition restricts the domain of $\{\theta^i\}$ and the range of $\{F_i(x)\}$; we give a further discussion at the end of this section. We say that a deformed exponential family is a κ-exponential family if its deformed exponential function is a κ-exponential function $\exp_\kappa$, and a q-exponential family if its deformed exponential function is a q-exponential function $\exp_q$. These deformed exponential families are denoted by $S_\kappa$ and $S_q$, respectively.
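Suppose that $M_\chi$ is a submanifold of $S_\chi$, that is,
$$M_\chi := \left\{ p(x;\theta(u)) \in S_\chi \;\middle|\; u \in U \subset \mathbb{R}^m \right\}, \qquad m < n .$$
The submanifold $M_\chi$ is called a curved χ-exponential family of $S_\chi$. By similar arguments, we can define a curved q-exponential family $M_q$ in $S_q$ and a curved κ-exponential family $M_\kappa$ in $S_\kappa$.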
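Example 3 (Discrete distributions (cf. [10])). Suppose that $\Omega = \{x_0, x_1, \dots, x_n\}$ is a finite sample space. Denote by $S_n$ the set of all probability distributions on Ω with positive probabilities:
$$S_n := \left\{ p(x;\eta) \;\middle|\; \eta_i > 0,\ \sum_{i=0}^{n}\eta_i = 1,\ p(x;\eta) = \sum_{i=0}^{n}\eta_i\,\delta_i(x) \right\}, \qquad \delta_i(x) := \begin{cases} 1 & (x = x_i),\\ 0 & (x \neq x_i). \end{cases}$$
The natural parameters and the normalization are given by:
$$\theta^i := \ln_\chi \eta_i - \ln_\chi \eta_0 \quad (i = 1, \dots, n), \qquad \psi(\theta) := -\ln_\chi \eta_0 .$$
Then, we obtain:
$$\ln_\chi p(x;\eta) = \sum_{i=1}^{n}\theta^i\,\delta_i(x) - \psi(\theta).$$
This implies that $S_n$ is a χ-exponential family for any χ.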
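Example 3 can be checked numerically. The sketch below takes $\chi(s) = s^{q}$ (the q-logarithm of Example 2, with q = 1 giving the ordinary logarithm) as an assumed concrete choice of χ, maps a probability vector $\eta = (\eta_0, \dots, \eta_n)$ to $\theta^i = \ln_q\eta_i - \ln_q\eta_0$ and $\psi = -\ln_q\eta_0$, and rebuilds each probability as $\exp_q(\theta^j - \psi)$ (or $\exp_q(-\psi)$ for $x_0$). Helper names are hypothetical.

```python
import math


def ln_q(s: float, q: float) -> float:
    return math.log(s) if q == 1 else (s ** (1 - q) - 1) / (1 - q)


def exp_q(t: float, q: float) -> float:
    if q == 1:
        return math.exp(t)
    base = 1 + (1 - q) * t
    if base <= 0:
        raise ValueError("anti-exponential condition violated")
    return base ** (1 / (1 - q))


def natural_coordinates(eta, q):
    """theta^i = ln_q eta_i - ln_q eta_0 (i = 1..n) and psi = -ln_q eta_0."""
    theta = [ln_q(e, q) - ln_q(eta[0], q) for e in eta[1:]]
    psi = -ln_q(eta[0], q)
    return theta, psi


def prob(j, theta, psi, q):
    """p(x_j) = exp_q(sum_i theta^i delta_i(x_j) - psi); only theta^j survives for j >= 1."""
    exponent = (theta[j - 1] if j >= 1 else 0.0) - psi
    return exp_q(exponent, q)


if __name__ == "__main__":
    eta = [0.1, 0.2, 0.3, 0.4]
    for q in (0.5, 1.0, 1.7):
        theta, psi = natural_coordinates(eta, q)
        print(q, [round(prob(j, theta, psi, q), 12) for j in range(len(eta))])
```

Because every exponent equals $\ln_q\eta_j$ with $\eta_j > 0$, the anti-exponential condition is automatically satisfied here, whatever q is.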
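Example 4 (Student t-distributions (cf. [10])). Fix a number q (1 < q < 3). A Student t-distribution (or a q-normal distribution) is given by:
$$p(x;\mu,\sigma) := \frac{1}{Z_q(\sigma)}\left[1 + \frac{q-1}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{\frac{1}{1-q}}, \qquad Z_q(\sigma) := \sqrt{\frac{3-q}{q-1}}\;B\!\left(\frac{3-q}{2(q-1)}, \frac{1}{2}\right)\sigma,$$
where $B(\cdot,\cdot)$ is the beta function. By taking a limit q → 1, a Student t-distribution converges to a normal distribution.
The set of all Student t-distributions $S_q$ is a q-exponential family. In fact, the natural parameters and the functions $F_i$ are given by:
$$\theta^1 = \frac{2 Z_q(\sigma)^{q-1}}{3-q}\,\frac{\mu}{\sigma^2}, \qquad \theta^2 = -\frac{Z_q(\sigma)^{q-1}}{3-q}\,\frac{1}{\sigma^2}, \qquad F_1(x) = x, \qquad F_2(x) = x^2,$$
respectively. Then, we obtain:
$$\ln_q p(x;\theta) = \theta^1 x + \theta^2 x^2 - \psi(\theta),$$
where ψ is the normalization defined by:
$$\psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} - \frac{Z_q(\sigma)^{q-1} - 1}{1-q}.$$
Hence, the set of all Student t-distributions $S_q$ is a q-exponential family.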
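The q-exponential representation of Student t-distributions can be checked numerically. This sketch assumes the q-normal parametrization $p(x;\mu,\sigma) \propto [1 + \frac{q-1}{3-q}\frac{(x-\mu)^2}{\sigma^2}]^{1/(1-q)}$ with $1 < q < 3$, $Z_q(\sigma) = \sqrt{(3-q)/(q-1)}\,B(\frac{3-q}{2(q-1)}, \frac12)\,\sigma$, $\theta^1 = 2Z_q^{q-1}\mu/((3-q)\sigma^2)$, $\theta^2 = -Z_q^{q-1}/((3-q)\sigma^2)$ and $\psi = -(\theta^1)^2/(4\theta^2) - (Z_q^{q-1}-1)/(1-q)$. It checks that $\ln_q p$ equals $\theta^1 x + \theta^2 x^2 - \psi$, and that with $q = (\nu+3)/(\nu+1)$ the density agrees with the textbook location-scale Student t density with ν degrees of freedom (ν = 3 gives q = 1.5). Helper names are hypothetical.

```python
import math


def ln_q(s: float, q: float) -> float:
    return (s ** (1 - q) - 1) / (1 - q)


def beta_fn(a: float, b: float) -> float:
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def z_q(sigma: float, q: float) -> float:
    """Normalization Z_q(sigma) of the q-normal density for 1 < q < 3."""
    return math.sqrt((3 - q) / (q - 1)) * beta_fn((3 - q) / (2 * (q - 1)), 0.5) * sigma


def q_normal(x: float, mu: float, sigma: float, q: float) -> float:
    u2 = ((x - mu) / sigma) ** 2
    return (1 + (q - 1) / (3 - q) * u2) ** (1 / (1 - q)) / z_q(sigma, q)


def natural_parameters(mu: float, sigma: float, q: float):
    """(theta^1, theta^2, psi) for F_1(x) = x and F_2(x) = x^2."""
    c = z_q(sigma, q) ** (q - 1)
    theta1 = 2 * c * mu / ((3 - q) * sigma**2)
    theta2 = -c / ((3 - q) * sigma**2)
    psi = -theta1**2 / (4 * theta2) - (c - 1) / (1 - q)
    return theta1, theta2, psi


def student_t_pdf(x: float, nu: float, mu: float, sigma: float) -> float:
    """Location-scale Student t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2) * sigma)
    return c * (1 + ((x - mu) / sigma) ** 2 / nu) ** (-(nu + 1) / 2)


if __name__ == "__main__":
    q, mu, sigma = 1.5, 0.7, 1.3
    t1, t2, psi = natural_parameters(mu, sigma, q)
    for x in (-3.0, 0.0, 2.5):
        print(ln_q(q_normal(x, mu, sigma, q), q), t1 * x + t2 * x * x - psi)
```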
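Let us give further considerations about deformed exponential families. In the case 0 < q < 1, a q-normal distribution has the following form:
$$p(x;\mu,\sigma) := \frac{1}{Z_q(\sigma)}\left[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{\frac{1}{1-q}},$$
where the normalization $Z_q(\sigma)$ is given by:
$$Z_q(\sigma) := \sqrt{\frac{3-q}{1-q}}\;B\!\left(\frac{2-q}{1-q}, \frac{1}{2}\right)\sigma .$$
The anti-exponential condition for this q-normal distribution is:
$$1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2} > 0, \qquad (2)$$
hence the domain of the random variable x is given by:
$$\mu - \sigma\sqrt{\frac{3-q}{1-q}} < x < \mu + \sigma\sqrt{\frac{3-q}{1-q}} .$$
In this case, the set of q-normal distributions $S_q = \{p(x;\mu,\sigma)\}$ is not a statistical model in the sense of information geometry [9], since the support of $p(x;\mu,\sigma)$ depends on the parameter (µ, σ).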
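For 0 < q < 1 the q-normal density is compactly supported, and its support moves with (µ, σ). The sketch below assumes the normalization $Z_q(\sigma) = \sqrt{(3-q)/(1-q)}\,B(\frac{2-q}{1-q}, \frac12)\,\sigma$ obtained from the beta integral, computes the support from the anti-exponential condition, and checks by Simpson's rule that the density integrates to 1 on it. Returning 0 outside the support is a cut-off convention chosen for this example, not something the text prescribes; all helper names are hypothetical.

```python
import math


def beta_fn(a: float, b: float) -> float:
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def support(mu: float, sigma: float, q: float):
    """Open interval where 1 - (1-q)/(3-q) (x-mu)^2/sigma^2 > 0, for 0 < q < 1."""
    half = sigma * math.sqrt((3 - q) / (1 - q))
    return mu - half, mu + half


def z_q(sigma: float, q: float) -> float:
    return sigma * math.sqrt((3 - q) / (1 - q)) * beta_fn((2 - q) / (1 - q), 0.5)


def q_normal(x: float, mu: float, sigma: float, q: float) -> float:
    base = 1 - (1 - q) / (3 - q) * ((x - mu) / sigma) ** 2
    if base <= 0:
        return 0.0  # outside the domain fixed by the anti-exponential condition
    return base ** (1 / (1 - q)) / z_q(sigma, q)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    lo, hi = support(1.0, 2.0, 0.5)
    print((lo, hi), simpson(lambda x: q_normal(x, 1.0, 2.0, 0.5), lo, hi))
    print(support(1.0, 3.0, 0.5))  # a larger sigma gives a wider support
```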
On the other hand, for a q-normal distribution, fix parameters q, µ, σ.By introducing a new parameter α (0 < α < q/(1 − q)), we set: The transformation (q, σ) → (q α , σ α ) defined by ( 3) is called a τ -transformation [16].From straightforward calculations, we have: This equation implies that, from Equation ( 2), the domain of random variable x is invariant under τ -transformations.Hence, a one-dimensional statistical model is defined by: However, S qα is not a deformed exponential family in our setting, since the exponent q α of the deformed exponential function depends on the parameter α.

Non-Additive Differentials
In this section, we consider deformed algebras and deformed differential equations to characterize deformed exponential functions.

κ-Deformed Algebras and κ-Exponential Functions
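We begin with the κ-exponential case. For more details about κ-deformed algebras, see [6]. Let $\exp_\kappa$ be a κ-exponential function and $\ln_\kappa$ a κ-logarithm function. Since $\exp_\kappa$ and $\ln_\kappa$ do not satisfy the law of exponents, we introduce the κ-sum $\oplus_\kappa$ and the κ-product $\otimes_\kappa$ as follows:
$$x_1 \oplus_\kappa x_2 := \ln_\kappa\left(\exp_\kappa x_1 \cdot \exp_\kappa x_2\right) = x_1\sqrt{1 + \kappa^2 x_2^2} + x_2\sqrt{1 + \kappa^2 x_1^2},$$
$$y_1 \otimes_\kappa y_2 := \exp_\kappa\left(\ln_\kappa y_1 + \ln_\kappa y_2\right), \qquad y_1 > 0,\ y_2 > 0. \qquad (4)$$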
The conditions $y_1 > 0$ and $y_2 > 0$ are necessary for defining the κ-logarithm function. On the other hand, such conditions are not necessary for defining the κ-exponential function.
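From the definitions of the κ-deformed algebras, we have the following deformed law of exponents:
$$\exp_\kappa(x_1 \oplus_\kappa x_2) = \exp_\kappa x_1 \cdot \exp_\kappa x_2, \qquad \ln_\kappa(y_1 \cdot y_2) = \ln_\kappa y_1 \oplus_\kappa \ln_\kappa y_2,$$
$$\exp_\kappa(x_1 + x_2) = \exp_\kappa x_1 \otimes_\kappa \exp_\kappa x_2, \qquad \ln_\kappa(y_1 \otimes_\kappa y_2) = \ln_\kappa y_1 + \ln_\kappa y_2 .$$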
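The κ-deformed law of exponents can be checked numerically. This sketch assumes the closed form $x_1 \oplus_\kappa x_2 = x_1\sqrt{1+\kappa^2x_2^2} + x_2\sqrt{1+\kappa^2x_1^2}$ (which follows from $\ln_\kappa(\exp_\kappa x_1 \cdot \exp_\kappa x_2)$ with the closed forms of Example 1) and defines the κ-product through $\exp_\kappa(\ln_\kappa y_1 + \ln_\kappa y_2)$. The function names are hypothetical. Note that the κ-product is finite and positive for every pair of positive inputs, even very small ones, since $\exp_\kappa$ is defined on all of $\mathbb{R}$.

```python
import math


def ln_kappa(s: float, k: float) -> float:
    return (s**k - s ** (-k)) / (2 * k)


def exp_kappa(t: float, k: float) -> float:
    return (k * t + math.sqrt(1 + (k * t) ** 2)) ** (1 / k)


def kappa_sum(x1: float, x2: float, k: float) -> float:
    """x1 (+)_k x2 in closed form; equals ln_k(exp_k x1 * exp_k x2)."""
    return x1 * math.sqrt(1 + (k * x2) ** 2) + x2 * math.sqrt(1 + (k * x1) ** 2)


def kappa_product(y1: float, y2: float, k: float) -> float:
    """y1 (x)_k y2 := exp_k(ln_k y1 + ln_k y2); defined for every y1, y2 > 0."""
    return exp_kappa(ln_kappa(y1, k) + ln_kappa(y2, k), k)


if __name__ == "__main__":
    k, x1, x2, y1, y2 = 0.4, 1.3, -2.1, 0.02, 7.5
    print(exp_kappa(kappa_sum(x1, x2, k), k), exp_kappa(x1, k) * exp_kappa(x2, k))
    print(ln_kappa(kappa_product(y1, y2, k), k), ln_kappa(y1, k) + ln_kappa(y2, k))
    print(kappa_sum(x1, -x1, k))  # -x is the inverse element of x for the kappa-sum
```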
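Since the inverse element of x with respect to the κ-sum is −x, we define the κ-difference $\ominus_\kappa$ by:
$$x_1 \ominus_\kappa x_2 := x_1 \oplus_\kappa (-x_2) = x_1\sqrt{1 + \kappa^2 x_2^2} - x_2\sqrt{1 + \kappa^2 x_1^2}. \qquad (5)$$
By taking a limit with respect to the κ-difference, we define a (non-additive) κ-differential as follows:
$$\frac{d_\kappa f(x)}{d_\kappa x} := \lim_{y \to x}\frac{f(x) - f(y)}{x \ominus_\kappa y}. \qquad (6)$$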
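We remark that the non-additive κ-differential $d_\kappa/d_\kappa x$ characterizes the κ-exponential function. Consider the following deformed differential equations:
$$\frac{d_\kappa f(x)}{d_\kappa x} = f(x), \qquad (7)$$
$$\frac{d f(x)}{dx} = \chi\big(f(x)\big) = \frac{2 f(x)}{f(x)^{\kappa} + f(x)^{-\kappa}}. \qquad (8)$$
Then, the eigenfunction f(x) of both equations with f(0) = 1 is the κ-exponential function. That is,
$$f(x) = \exp_\kappa x .$$
In fact, from the definition of the κ-difference (5), we have:
$$\frac{d_\kappa f(x)}{d_\kappa x} = \sqrt{1 + \kappa^2 x^2}\,\frac{d f(x)}{dx}, \qquad \chi(\exp_\kappa x) = \frac{\exp_\kappa x}{\sqrt{1 + \kappa^2 x^2}},$$
hence the two deformed differential equations (7) and (8) are essentially equivalent. We call the non-additive differential equation (7) a non-additive representation and the deformed differential equation (8) an escort representation.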
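Both representations can be checked with difference quotients. The sketch below approximates the non-additive κ-differential by the quotient $(f(x) - f(y))/(x \ominus_\kappa y)$ with $y = x \mp h$, averaged over both sides to cancel the first-order error (a numerical choice made here, not part of the definition), and compares it with f itself, with $\chi(f)$ for the ordinary derivative, and with $\sqrt{1+\kappa^2x^2}\,f'(x)$. It assumes the closed form of $\exp_\kappa$ from Example 1; the helper names are hypothetical.

```python
import math


def exp_kappa(t: float, k: float) -> float:
    return (k * t + math.sqrt(1 + (k * t) ** 2)) ** (1 / k)


def chi_kappa(s: float, k: float) -> float:
    return 2 * s / (s**k + s ** (-k))


def kappa_difference(x1: float, x2: float, k: float) -> float:
    """x1 (-)_k x2 = x1 sqrt(1 + k^2 x2^2) - x2 sqrt(1 + k^2 x1^2)."""
    return x1 * math.sqrt(1 + (k * x2) ** 2) - x2 * math.sqrt(1 + (k * x1) ** 2)


def kappa_derivative(f, x: float, k: float, h: float = 1e-6) -> float:
    """Symmetrized difference quotient approximating d_k f / d_k x."""
    left = (f(x) - f(x - h)) / kappa_difference(x, x - h, k)
    right = (f(x) - f(x + h)) / kappa_difference(x, x + h, k)
    return 0.5 * (left + right)


def ordinary_derivative(f, x: float, h: float = 1e-6) -> float:
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    k = 0.35
    f = lambda t: exp_kappa(t, k)
    for x in (-3.0, 0.0, 1.2, 4.0):
        # non-additive representation, escort representation, function value
        print(x, kappa_derivative(f, x, k), chi_kappa(f(x), k), f(x))
```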
Remark 1. A κ-sum works on the domain of a κ-exponential function (i.e., the sample space Ω), and a κ-product works on the target space. This implies that the sample space can be regarded as a deformed algebraic space rather than the standard Euclidean space. In fact, the sample space and the target space are regarded as commutative fields (equivalently, Abelian fields in the terminology of [7]). The κ-sum is the additive group structure of a commutative field structure on the sample space, and the κ-product is the multiplicative group structure on the target space (see also Remark 2 and [7]). We believe that this fact is very important in the theory of non-extensive statistical physics.
Recall the definition of Napier's constant. The standard exponential function has the following infinite product expression:
$$\exp x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{n}.$$
In the κ-exponential case, we have the following.
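Theorem 1. Fix a real number $x \in \mathbb{R}$. Suppose that $n > |x|$ and $n \in \mathbb{N}$. Then, we have:
$$\exp_\kappa x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{\otimes_\kappa n},$$
where:
$$a^{\otimes_\kappa n} := \underbrace{a \otimes_\kappa a \otimes_\kappa \cdots \otimes_\kappa a}_{n\ \text{times}} .$$
Proof. From the assumption, the inequality $1 + x/n > 0$ always holds. Hence, from the definition of the κ-product (4), we have:
$$\left(1 + \frac{x}{n}\right)^{\otimes_\kappa n} = \exp_\kappa\left[n \ln_\kappa\left(1 + \frac{x}{n}\right)\right] \qquad (9)$$
(the assumption $n > |x|$ guarantees that the κ-logarithm in Equation (9) is defined). Using the asymptotic expansions $(1 + x/n)^{\pm\kappa} = 1 \pm \kappa x/n + O(1/n^2)$, we have:
$$n \ln_\kappa\left(1 + \frac{x}{n}\right) = \frac{n}{2\kappa}\left[\left(1 + \frac{x}{n}\right)^{\kappa} - \left(1 + \frac{x}{n}\right)^{-\kappa}\right] = x + O\!\left(\frac{1}{n}\right).$$
Substituting this expansion into (9), we have:
$$\left(1 + \frac{x}{n}\right)^{\otimes_\kappa n} = \exp_\kappa\left[x + O\!\left(\frac{1}{n}\right)\right].$$
By taking a limit n → ∞ and using the continuity of $\exp_\kappa$, we obtain the result.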
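Theorem 1 can be illustrated numerically, reading $a^{\otimes_\kappa n}$ as the n-fold κ-product. The sketch computes the κ-power both by repeated κ-products and through $\exp_\kappa(n\ln_\kappa a)$, checks that the two agree, and shows the error against $\exp_\kappa x$ shrinking roughly like 1/n. The closed forms of Example 1 are assumed, and the helper names are hypothetical.

```python
import math


def ln_kappa(s, k):
    return (s**k - s ** (-k)) / (2 * k)


def exp_kappa(t, k):
    return (k * t + math.sqrt(1 + (k * t) ** 2)) ** (1 / k)


def kappa_product(y1, y2, k):
    return exp_kappa(ln_kappa(y1, k) + ln_kappa(y2, k), k)


def kappa_power_iterated(a, n, k):
    """a (x)_k a (x)_k ... (x)_k a with n factors, by repeated kappa-products."""
    result = a
    for _ in range(n - 1):
        result = kappa_product(result, a, k)
    return result


def kappa_power_closed(a, n, k):
    """Same quantity via ln_k(a (x)_k b) = ln_k a + ln_k b, i.e. exp_k(n ln_k a)."""
    return exp_kappa(n * ln_kappa(a, k), k)


if __name__ == "__main__":
    k, x = 0.3, 1.5
    for n in (10, 100, 1000, 10000):  # n > |x| as required by Theorem 1
        err = abs(kappa_power_closed(1 + x / n, n, k) - exp_kappa(x, k))
        print(n, err)
```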
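q-Deformed Algebras and q-Exponential Functions
Let us consider the q-exponential case (cf. [4]). Let $\exp_q$ be a q-exponential function, and let $\ln_q$ be a q-logarithm function. The q-deformed algebras, i.e., the q-sum $\oplus_q$ and the q-product $\otimes_q$, are defined as follows:
$$x_1 \oplus_q x_2 := \ln_q\left(\exp_q x_1 \cdot \exp_q x_2\right) = x_1 + x_2 + (1-q)x_1 x_2,$$
$$y_1 \otimes_q y_2 := \exp_q\left(\ln_q y_1 + \ln_q y_2\right) = \left(y_1^{1-q} + y_2^{1-q} - 1\right)^{\frac{1}{1-q}}, \qquad (10)$$
where the conditions $1 + (1-q)x_1 > 0$ and $1 + (1-q)x_2 > 0$ are needed for defining the q-exponential functions, and $y_1 > 0$, $y_2 > 0$ are needed for the q-logarithm. Under the q-deformed algebras, the q-deformed law of exponents holds:
$$\exp_q(x_1 \oplus_q x_2) = \exp_q x_1 \cdot \exp_q x_2, \qquad \ln_q(y_1 \cdot y_2) = \ln_q y_1 \oplus_q \ln_q y_2,$$
$$\exp_q(x_1 + x_2) = \exp_q x_1 \otimes_q \exp_q x_2, \qquad \ln_q(y_1 \otimes_q y_2) = \ln_q y_1 + \ln_q y_2. \qquad (11)$$
The inverse element of x with respect to the q-sum is given by:
$$\ominus_q x := -\frac{x}{1 + (1-q)x}.$$
Hence, the q-difference should be defined by:
$$x_1 \ominus_q x_2 := x_1 \oplus_q (\ominus_q x_2) = \frac{x_1 - x_2}{1 + (1-q)x_2}.$$
By taking a limit with respect to the q-difference, we define a (non-additive) q-differential as follows:
$$\frac{d_q f(x)}{d_q x} := \lim_{y \to x}\frac{f(x) - f(y)}{x \ominus_q y} = \left[1 + (1-q)x\right]\frac{d f(x)}{dx}.$$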
Let us consider the following deformed differential equations:
$$\frac{d_q f(x)}{d_q x} = f(x), \qquad (12)$$
$$\frac{d f(x)}{dx} = f(x)^{q}. \qquad (13)$$
Then, the eigenfunction f(x) of both equations with f(0) = 1 is the q-exponential function. That is,
$$f(x) = \exp_q x .$$
In the same way as in the κ-exponential case, we say that the non-additive differential equation (12) is the non-additive representation and the deformed differential equation (13) is the escort representation. We remark again that a q-sum (a deformed sum) works on the domain of a q-exponential function and that a q-product (a deformed product) works on the target space. Hence, the sample space Ω may not be the standard Euclidean space.
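The q-deformed algebra and both differential representations can be checked in a few lines. This sketch assumes the closed forms $x_1 \oplus_q x_2 = x_1 + x_2 + (1-q)x_1x_2$, $\ominus_q x = -x/(1+(1-q)x)$, $x_1 \ominus_q x_2 = (x_1 - x_2)/(1+(1-q)x_2)$ and $y_1 \otimes_q y_2 = (y_1^{1-q} + y_2^{1-q} - 1)^{1/(1-q)}$; the symmetrized difference quotient is a numerical choice made here. Unlike the κ-product, the q-product can fail its anti-exponential condition for positive inputs, which the last check illustrates. Function names are hypothetical.

```python
import math


def exp_q(t, q):
    base = 1 + (1 - q) * t
    if base <= 0:
        raise ValueError("anti-exponential condition violated")
    return base ** (1 / (1 - q))


def q_sum(x1, x2, q):
    return x1 + x2 + (1 - q) * x1 * x2


def q_inverse(x, q):
    return -x / (1 + (1 - q) * x)


def q_difference(x1, x2, q):
    return (x1 - x2) / (1 + (1 - q) * x2)


def q_product(y1, y2, q):
    base = y1 ** (1 - q) + y2 ** (1 - q) - 1
    if base <= 0:
        raise ValueError("q-product undefined: anti-exponential condition violated")
    return base ** (1 / (1 - q))


def q_derivative(f, x, q, h=1e-6):
    """Symmetrized difference quotient approximating d_q f / d_q x."""
    left = (f(x) - f(x - h)) / q_difference(x, x - h, q)
    right = (f(x) - f(x + h)) / q_difference(x, x + h, q)
    return 0.5 * (left + right)


if __name__ == "__main__":
    q, x1, x2 = 0.6, 0.8, -0.5
    print(math.isclose(exp_q(q_sum(x1, x2, q), q), exp_q(x1, q) * exp_q(x2, q)))
    f = lambda t: exp_q(t, q)
    print(q_derivative(f, 1.5, q), f(1.5))  # non-additive representation (12)
    try:
        q_product(3.0, 3.0, 2.0)  # 1/3 + 1/3 - 1 < 0
    except ValueError as err:
        print("rejected:", err)
```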
An infinite product expression of the q-exponential function is given as follows.
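Proposition 2 (cf. [17]). Fix a real number x. For all integers $n \in \mathbb{N}$, suppose that:
$$1 + \frac{x}{n} > 0, \qquad n\left(1 + \frac{x}{n}\right)^{1-q} - (n-1) > 0. \qquad (14)$$
Then, we have:
$$\exp_q x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{\otimes_q n},$$
where:
$$a^{\otimes_q n} := \underbrace{a \otimes_q a \otimes_q \cdots \otimes_q a}_{n\ \text{times}} .$$
Proof. From the definition of the q-product (10) and the anti-exponential condition (14), we have:
$$\left(1 + \frac{x}{n}\right)^{\otimes_q n} = \left[n\left(1 + \frac{x}{n}\right)^{1-q} - (n-1)\right]^{\frac{1}{1-q}}.$$
Using an asymptotic expansion:
$$\left(1 + \frac{x}{n}\right)^{1-q} = 1 + (1-q)\frac{x}{n} + O\!\left(\frac{1}{n^2}\right),$$
we have:
$$n\left(1 + \frac{x}{n}\right)^{1-q} - (n-1) = 1 + (1-q)x + O\!\left(\frac{1}{n}\right).$$
Hence, we have:
$$\left(1 + \frac{x}{n}\right)^{\otimes_q n} = \left[1 + (1-q)x + O\!\left(\frac{1}{n}\right)\right]^{\frac{1}{1-q}}.$$
By taking a limit n → ∞, we obtain the result.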
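Proposition 2 can be illustrated the same way as Theorem 1. The sketch computes the n-fold q-product both by iteration and through the closed form $[n a^{1-q} - (n-1)]^{1/(1-q)}$, whose positivity check is condition (14), and shows the error against $\exp_q x$ decreasing with n. The q-product formula $(y_1^{1-q} + y_2^{1-q} - 1)^{1/(1-q)}$ is assumed, and the helper names are hypothetical.

```python
import math


def exp_q(t, q):
    base = 1 + (1 - q) * t
    if base <= 0:
        raise ValueError("anti-exponential condition violated")
    return base ** (1 / (1 - q))


def q_product(y1, y2, q):
    """y1 (x)_q y2 = (y1^(1-q) + y2^(1-q) - 1)^(1/(1-q)), when the base is positive."""
    base = y1 ** (1 - q) + y2 ** (1 - q) - 1
    if base <= 0:
        raise ValueError("q-product undefined: anti-exponential condition violated")
    return base ** (1 / (1 - q))


def q_power_iterated(a, n, q):
    """a (x)_q a (x)_q ... (x)_q a with n factors."""
    result = a
    for _ in range(n - 1):
        result = q_product(result, a, q)
    return result


def q_power_closed(a, n, q):
    """[n a^(1-q) - (n-1)]^(1/(1-q)); the positivity check is condition (14)."""
    base = n * a ** (1 - q) - (n - 1)
    if base <= 0:
        raise ValueError("condition (14) violated")
    return base ** (1 / (1 - q))


if __name__ == "__main__":
    q, x = 0.7, 1.2
    for n in (10, 100, 1000, 10000):
        approx = q_power_closed(1 + x / n, n, q)
        print(n, approx, math.isclose(approx, exp_q(x, q), rel_tol=1e-3))
```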
Remark 2. If a deformed sum and a deformed product are well defined, then similar arguments apply to any χ-exponential function. However, it is difficult to describe the anti-exponential conditions in general. If we admit the complex number field as the domain and the target of a statistical model (cf. [18]), then the deformed algebras are well defined [7]. In fact, we can define the following commutative field structures if all of the objects are well defined: The roles of the multiplicative group structure $\otimes_\chi$ on the sample space and the additive group structure $\oplus_\chi$ on the target space are not clear. We may need algebraic probability theory to clarify these group structures (for applications of algebraic structures to statistics, see [19,20], for example).

Expectation Functionals
As we have seen in the previous section, the sample space Ω may not be the standard Euclidean space.Let us consider suitable expectations for deformed exponential families.
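For a χ-exponential probability $p(x;\theta) \in S_\chi$, we define the escort distribution $P_\chi(x;\theta)$ and the normalized escort distribution $P^{\mathrm{esc}}_\chi(x;\theta)$ of $p(x;\theta)$ by:
$$P_\chi(x;\theta) := \chi\{p(x;\theta)\}, \qquad P^{\mathrm{esc}}_\chi(x;\theta) := \frac{1}{Z_\chi(\theta)}\,\chi\{p(x;\theta)\}, \qquad Z_\chi(\theta) := \int_\Omega \chi\{p(x;\theta)\}\,dx,$$
respectively. The χ-canonical expectation $E_{\chi,p}[*]$ and the normalized χ-escort expectation $E^{\mathrm{esc}}_{\chi,p}[*]$ are defined by:
$$E_{\chi,p}[f(x)] := \int_\Omega f(x)\,P_\chi(x;\theta)\,dx, \qquad E^{\mathrm{esc}}_{\chi,p}[f(x)] := \int_\Omega f(x)\,P^{\mathrm{esc}}_\chi(x;\theta)\,dx,$$
respectively. Even though the χ-canonical expectation is an integration with respect to a positive density that is not normalized in general, this expectation is natural from the viewpoint of differential geometry, as we will see in later sections. On the other hand, we call the standard expectation with respect to $p(x;\theta)$ a simple expectation and denote it by:
$$E_p[f(x)] := \int_\Omega f(x)\,p(x;\theta)\,dx .$$
A χ-canonical expectation and a normalized χ-escort expectation with respect to a κ-exponential probability $p(x;\theta)$ are called the κ-canonical expectation and the normalized κ-escort expectation and are denoted by $E_{\kappa,p}[*]$ and $E^{\mathrm{esc}}_{\kappa,p}[*]$, respectively. In the q-exponential case, they are called the q-canonical expectation and the normalized q-escort expectation and are denoted by $E_{q,p}[*]$ and $E^{\mathrm{esc}}_{q,p}[*]$, respectively. For a Student t-distribution $p(x;\mu,\sigma) \in S_q$, the normalized q-escort mean $\mu_q$ and the normalized q-escort variance $\sigma_q^2$ are given by:
$$\mu_q := E^{\mathrm{esc}}_{q,p}[x] = \mu, \qquad \sigma_q^2 := E^{\mathrm{esc}}_{q,p}\left[(x - \mu)^2\right] = \sigma^2,$$
respectively. Hence, the normalized q-escort expectation $E^{\mathrm{esc}}_{q,p}[*]$ is a natural generalization of the simple expectation $E_p[*]$.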
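The statement $\mu_q = \mu$, $\sigma_q^2 = \sigma^2$ can be checked numerically. The sketch assumes the Student t-distribution in the q-normal form $p \propto [1 + \frac{q-1}{3-q}\frac{(x-\mu)^2}{\sigma^2}]^{1/(1-q)}$ with $1 < q < 3$, takes q = 1.5 (three degrees of freedom), and integrates over the real line with the substitution $x = \mu + \sigma\tan t$ and the midpoint rule. For comparison it also computes the simple variance $E_p[(x-\mu)^2]$, which for three degrees of freedom is $3\sigma^2$ rather than $\sigma^2$; with two or fewer degrees of freedom the simple variance does not exist, while the escort variance stays finite. Helper names are hypothetical.

```python
import math


def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def q_normal(x, mu, sigma, q):
    """Student t-distribution in q-normal form (1 < q < 3)."""
    z = math.sqrt((3 - q) / (q - 1)) * beta_fn((3 - q) / (2 * (q - 1)), 0.5) * sigma
    return (1 + (q - 1) / (3 - q) * ((x - mu) / sigma) ** 2) ** (1 / (1 - q)) / z


def integrate_real_line(g, mu, sigma, n=20000):
    """Midpoint rule for the integral of g over R after substituting x = mu + sigma tan(t)."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        total += g(mu + sigma * math.tan(t)) * sigma / math.cos(t) ** 2
    return total * h


def moments(mu, sigma, q):
    p = lambda x: q_normal(x, mu, sigma, q)
    w = lambda x: p(x) ** q                    # un-normalized escort P_q = p^q
    z_esc = integrate_real_line(w, mu, sigma)  # escort normalization
    esc_mean = integrate_real_line(lambda x: x * w(x), mu, sigma) / z_esc
    esc_var = integrate_real_line(lambda x: (x - esc_mean) ** 2 * w(x), mu, sigma) / z_esc
    simple_var = integrate_real_line(lambda x: (x - mu) ** 2 * p(x), mu, sigma)
    return esc_mean, esc_var, simple_var


if __name__ == "__main__":
    print(moments(0.4, 1.7, 1.5))  # escort mean ~ 0.4, escort variance ~ 1.7**2
```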
Next, we consider non-additive integrals to elucidate the relations between the deformed algebras and the escort expectations.In particular, we discuss the κ-exponential case.
Let f (x) be a function on the sample space Ω.Then, we define a (non-additive) κ-integral (cf.[5]) by the following formula: where w(x) is a weight function defined by: Obviously, this is the inverse operation of the non-additive κ-differential (6).
When Ω is a discrete set, $\Omega = \{x_0, x_1, \dots, x_n\}$, we define a (non-additive) κ-summation by: From the definition of the κ-exponential function, we have the following.
Theorem 3. Suppose that $\chi(s) = 2s/(s^{\kappa} + s^{-\kappa})$ is the deformation function with respect to the κ-logarithm function. Then, $\chi(\exp_\kappa x)$ coincides with the weight function w(x) with respect to the non-additive κ-integral. That is, the following formula holds: From the above theorem, we think that the κ-canonical expectation $E_{\kappa,p}[*]$ gives a suitable weight for the sample space Ω. We may consider a non-additive χ-integral for general χ (in the q-exponential case, the corresponding q-integral is introduced in [4]). However, we have to check the well-definedness of the χ-integral carefully, since the anti-exponential condition must be satisfied.

Geometry of χ-Exponential Families with Simple Expectations
In this section, we consider the geometry of χ-exponential families by generalizing the e-representation and the m-representation of probability densities.For more details, see [11].
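Let $S_\chi$ be a χ-exponential family. We define a χ-score function $s_\chi(x;\theta) = \left((s_\chi)^1(x;\theta), \dots, (s_\chi)^n(x;\theta)\right)^{T} \in \mathbb{R}^n$ by:
$$(s_\chi)^i(x;\theta) := \frac{\partial}{\partial\theta^i}\ln_\chi p(x;\theta) \qquad (i = 1, \dots, n).$$
Under suitable conditions, we can define Riemannian metrics on $S_\chi$ by: In the same manner as for an invariant statistical manifold, a differential $\partial_i p(x;\theta)$ and a χ-score function $\partial_i \ln_\chi p(x;\theta)$ are regarded as tangent vectors of a χ-exponential family $S_\chi$, where $\partial_i := \partial/\partial\theta^i$. Hence, the χ-score function is a generalization of the e-representation of $p(x;\theta)$.
Theorem 4. The Riemannian metrics $g^E$, $g^M$ and $g^N$ on $S_\chi$ coincide. That is, $g^E = g^M = g^N$.
Proof. For a χ-exponential distribution $p(x;\theta)$, its derivatives are given as follows:
$$\partial_i p(x;\theta) = \chi\{p(x;\theta)\}\left(F_i(x) - \partial_i\psi(\theta)\right), \qquad \partial_i \ln_\chi p(x;\theta) = F_i(x) - \partial_i\psi(\theta), \qquad \partial_i\partial_j \ln_\chi p(x;\theta) = -\partial_i\partial_j\psi(\theta).$$
By substituting the above formulas into (17)–(19), we obtain the results.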
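The derivative formula $\partial_i p = \chi(p)(F_i - \partial_i\psi)$ used in the proof of Theorem 4 can be checked by finite differences on a small q-exponential family ($\chi(s) = s^q$). In this sketch the sample space Ω = {0, 1, 2, 3}, the statistics $F_1(x) = x$, $F_2(x) = x^2$ and the value q = 1.5 are hypothetical choices, and $\psi(\theta)$ is computed by bisection from the normalization $\sum_x p(x;\theta) = 1$ (for q > 1 the total mass decreases from +∞ to 0 as ψ grows, so the root exists and is unique). As a by-product it confirms $\partial_1\psi = E^{\mathrm{esc}}_{q,p}[F_1]$.

```python
import math

Q = 1.5                                     # q > 1: psi(theta) exists and is unique
OMEGA = [0.0, 1.0, 2.0, 3.0]                # hypothetical finite sample space
FEATURES = [lambda x: x, lambda x: x * x]   # hypothetical F_1(x) = x, F_2(x) = x^2


def exp_q(t, q=Q):
    base = 1 + (1 - q) * t
    if base <= 0:
        raise ValueError("anti-exponential condition violated")
    return base ** (1 / (1 - q))


def theta_dot_f(theta, x):
    """sum_i theta^i F_i(x)."""
    return sum(th * f(x) for th, f in zip(theta, FEATURES))


def psi(theta, q=Q):
    """Solve sum_x exp_q(theta.F(x) - psi) = 1 for psi by bisection (valid for q > 1)."""
    top = max(theta_dot_f(theta, x) for x in OMEGA)

    def mass(c):
        return math.fsum(exp_q(theta_dot_f(theta, x) - c, q) for x in OMEGA)

    lo = top - 1 / (q - 1) + 1e-9   # just inside the anti-exponential condition: mass(lo) > 1
    hi = top + 1.0
    while mass(hi) > 1:             # mass decreases to 0 as psi grows
        hi += 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def density(theta, q=Q):
    c = psi(theta, q)
    return [exp_q(theta_dot_f(theta, x) - c, q) for x in OMEGA]


def escort_expectation(theta, f, q=Q):
    """Normalized q-escort expectation sum f p^q / sum p^q."""
    w = [p**q for p in density(theta, q)]
    return math.fsum(wi * f(x) for wi, x in zip(w, OMEGA)) / math.fsum(w)


if __name__ == "__main__":
    theta, h = [0.3, -0.2], 1e-5
    dpsi = (psi([0.3 + h, -0.2]) - psi([0.3 - h, -0.2])) / (2 * h)
    print(dpsi, escort_expectation(theta, FEATURES[0]))  # should agree closely
```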
We remark that integrations are carried out with respect to un-normalized χ-escort distributions.If we define Riemannian metrics by normalized χ-escort expectations, they do not coincide in general.Their Riemannian metrics are conformally equivalent (cf.[11]).
By differentiating Equation (18), we can define dual affine connections $\nabla^{M(e)}$ and $\nabla^{M(m)}$ on $S_\chi$ by: From the definitions of the χ-exponential family and the χ-logarithm function, we obtain $\Gamma^{M(e)}_{ij,k}(\theta) \equiv 0$. Hence, the parameter $\theta = \{\theta^i\}$ is a $\nabla^{M(e)}$-affine coordinate system, and the connection $\nabla^{M(e)}$ is flat. These facts imply that the triplet $(S_\chi, \nabla^{M(e)}, g^M)$ is a Hessian manifold. The cubic form $C^M_{ijk}$ of $(S_\chi, \nabla^{M(e)}, g^M)$ is: To give Hessian potential functions of $(S_\chi, \nabla^{M(e)}, g^M)$, we define functions $I_\chi$ and $\Psi$ by: where the function $V_\chi(t)$ is given by: We call $I_\chi$ a generalized entropy functional and $\Psi$ a generalized Massieu potential.
Let us consider a divergence function on a χ-exponential family. The canonical divergence D on $(S_\chi, \nabla^{M(e)}, g^M)$ is defined by: On the other hand, the χ-divergence (or U-divergence) on $S_\chi$ is defined by: where the function $U_\chi(t)$ is given by: Then, the χ-divergence $D_\chi$ coincides with the canonical divergence D on $(S_\chi, \nabla^{M(m)}, g^M)$. We remark that the χ-divergence is naturally constructed from a bias-corrected χ-score function. See [11,23] for more details.
In the q-exponential case, the χ-divergence is given by:
$$D_\chi(p, r) = \frac{1}{(1-q)(2-q)}\int_\Omega p(x)^{2-q}\,dx - \frac{1}{1-q}\int_\Omega p(x)\,r(x)^{1-q}\,dx + \frac{1}{2-q}\int_\Omega r(x)^{2-q}\,dx,$$
which is known as the β-divergence, or a density power divergence in statistics [24]. This divergence is useful in robust statistics. We remark that the generalization of e- and m-representations through an arbitrary monotone embedding function was first studied in [25]. For further generalizations through monotone embedding functions, see [26,27]. These generalizations of e- and m-representations are also related to the U-geometry in information geometry (cf. [21,22]). When the embedding function χ(t) is the identity (q = 1 in the q-exponential case and κ = 0 in the κ-exponential case), the results in this section reduce to the standard results for exponential families [11].
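For discrete distributions the integrals become sums, and the q-case χ-divergence can be computed directly. The sketch assumes the closed form $\sum_x [p^{2-q}/((1-q)(2-q)) - p\,r^{1-q}/(1-q) + r^{2-q}/(2-q)]$ (0 < q < 2, q ≠ 1) and cross-checks it against the U-divergence form $\sum_x [U(\ln_q r) - U(\ln_q p) - p(\ln_q r - \ln_q p)]$ with $U(t) = (1+(1-q)t)^{(2-q)/(1-q)}/(2-q)$, an antiderivative of $\exp_q$ assumed here. It also checks that the divergence vanishes for p = r and approaches the Kullback–Leibler divergence as q → 1. Function names are hypothetical.

```python
import math


def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)


def beta_divergence(p, r, q):
    """Closed form of the q-case chi-divergence for discrete p, r (0 < q < 2, q != 1)."""
    a = 1 - q
    return math.fsum(
        pi ** (2 - q) / (a * (2 - q)) - pi * ri**a / a + ri ** (2 - q) / (2 - q)
        for pi, ri in zip(p, r)
    )


def u_divergence(p, r, q):
    """sum U(ln_q r) - U(ln_q p) - p (ln_q r - ln_q p), with U' = exp_q."""
    U = lambda t: (1 + (1 - q) * t) ** ((2 - q) / (1 - q)) / (2 - q)
    return math.fsum(
        U(ln_q(ri, q)) - U(ln_q(pi, q)) - pi * (ln_q(ri, q) - ln_q(pi, q))
        for pi, ri in zip(p, r)
    )


def kl(p, r):
    return math.fsum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


if __name__ == "__main__":
    p, r = [0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]
    print(beta_divergence(p, r, 0.6), u_divergence(p, r, 0.6))
    print(beta_divergence(p, r, 1 - 1e-5), kl(p, r))  # close to KL near q = 1
```

The U-divergence form makes the construction from $\exp_q$ visible, while the closed form avoids evaluating $\ln_q$ and is the one usually quoted in robust statistics.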

Geometry of Deformed Exponential Families with χ-Escort Expectation
Since a χ-exponential distribution has a normalization term $\psi(\theta)$, we induce geometric structures directly from the potential function ψ. For more details, see [10,11]. When the embedding function χ(t) is the identity, the results in this section also reduce to the standard results for exponential families [11].
Then, φ(η) is the potential of g χ with respect to {η i }.
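The identification of the normalized χ-escort expectations with the dual coordinates $\eta_i$ can be checked directly. The short derivation below assumes only that differentiation under the integral sign is allowed, and uses $\frac{d}{dt}\exp_\chi t = \chi(\exp_\chi t)$, which follows from the definition of $\exp_\chi$ as the inverse of $\ln_\chi$; here $Z_\chi(\theta) = \int_\Omega \chi\{p(x;\theta)\}\,dx$ denotes the escort normalization.

$$
\begin{aligned}
0 = \partial_i \int_\Omega p(x;\theta)\,dx
  &= \int_\Omega \chi\{p(x;\theta)\}\bigl(F_i(x) - \partial_i\psi(\theta)\bigr)\,dx
   = \int_\Omega F_i(x)\,\chi\{p(x;\theta)\}\,dx - Z_\chi(\theta)\,\partial_i\psi(\theta),\\
\eta_i := \partial_i\psi(\theta)
  &= \frac{1}{Z_\chi(\theta)}\int_\Omega F_i(x)\,\chi\{p(x;\theta)\}\,dx
   = E^{\mathrm{esc}}_{\chi,p}[F_i(x)].
\end{aligned}
$$

With the Legendre transform $\phi(\eta) = \sum_i \theta^i\eta_i - \psi(\theta)$, one gets $\partial\phi/\partial\eta_i = \theta^i$, and, since ψ is assumed strictly convex, the Hessian of φ with respect to $\{\eta_i\}$ is the inverse matrix of the Hessian $\partial_i\partial_j\psi(\theta)$, taken here to be $g^\chi_{ij}(\theta)$.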
Let us consider divergence functions. The canonical divergence of $(S_\chi, \nabla^{\chi(e)}, g^\chi)$ is given by:
$$D(p, r) = \psi(\theta(p)) + \phi(\eta(r)) - \sum_{i=1}^{n}\theta^i(p)\,\eta_i(r).$$
On the other hand, a χ-relative entropy (or a generalized relative entropy) $D_\chi(p, r)$ on $S_\chi$ is defined by:
$$D_\chi(p, r) := E^{\mathrm{esc}}_{\chi,p}\left[\ln_\chi p(x) - \ln_\chi r(x)\right]. \qquad (20)$$
If the deformation function χ is the identity function χ(s) = s, then the χ-relative entropy coincides with the Kullback–Leibler divergence. In addition, the χ-relative entropy $D_\chi$ coincides with the canonical divergence on $(S_\chi, \nabla^{\chi(m)}, g^\chi)$. In fact, in the same way as for a standard exponential family, we have:
$$D_\chi(p, r) = \psi(\theta(r)) + \phi(\eta(p)) - \sum_{i=1}^{n}\theta^i(r)\,\eta_i(p) = D(r, p).$$
In the κ-exponential case, we call the χ-relative entropy (20) a κ-relative entropy and denote it by $D_\kappa$. On the other hand, in the q-exponential case, the χ-relative entropy for a q-exponential family is called a normalized Tsallis relative entropy, which is given by:
$$D^{T}_q(p, r) := E^{\mathrm{esc}}_{q,p}\left[\ln_q p(x) - \ln_q r(x)\right] = \frac{1 - \int_\Omega p(x)^{q}\,r(x)^{1-q}\,dx}{(1-q)\,Z_q(p)},$$
where $Z_q(p) := \int_\Omega p(x)^q\,dx$ is the normalization of the escort distribution $P^{\mathrm{esc}}_q(x)$ of p(x). Denote by $(S_q, \nabla^{q(e)}, g^q)$ and $(S_q, \nabla^{q(m)}, g^q)$ the Hessian manifolds induced from the normalization $\psi(\theta)$. Then, the normalized Tsallis relative entropy coincides with the canonical divergence of the Hessian manifold $(S_q, \nabla^{q(m)}, g^q)$.
Remark 3. For a q-exponential family $S_q$, the normalized Tsallis relative entropy induces a Hessian manifold (i.e., a flat statistical manifold) $(S_q, \nabla^{q(m)}, g^q)$, whereas the α-divergence
$$D^{(\alpha)}(p, r) = \frac{4}{1-\alpha^2}\left(1 - \int_\Omega p(x)^{\frac{1-\alpha}{2}}\,r(x)^{\frac{1+\alpha}{2}}\,dx\right)$$
with α = 1 − 2q induces an invariant statistical manifold $(S_q, \nabla^{(1-2q)}, g)$. Since a constant multiplication is not essential in differential geometry, the difference is caused by the normalization of the escort distribution:
$$D^{T}_q(p, r) = \frac{q}{Z_q(p)}\,D^{(1-2q)}(p, r).$$
In this case, the two statistical manifolds $(S_q, \nabla^{q(m)}, g^q)$ and $(S_q, \nabla^{(1-2q)}, g)$ are (−1)-conformally equivalent (cf. [29,30]). This implies that the normalization of a probability density is not a trivial problem. The normalization does affect the induced geometric structures and, consequently, the estimation methods for statistical inference.
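A minimal discrete sketch of the two divergences compared in Remark 3, assuming sums in place of integrals, $Z_q(p) = \sum_x p(x)^q$, and Amari's convention $D^{(\alpha)}(p,r) = \frac{4}{1-\alpha^2}(1 - \sum_x p^{(1-\alpha)/2} r^{(1+\alpha)/2})$. It checks that $E^{\mathrm{esc}}_{q,p}[\ln_q p - \ln_q r]$ equals $(1 - \sum_x p^q r^{1-q})/((1-q)Z_q(p))$, that it differs from the α-divergence with α = 1 − 2q only by the factor $q/Z_q(p)$, and that it is nonnegative. Function names are hypothetical.

```python
import math


def ln_q(s, q):
    return (s ** (1 - q) - 1) / (1 - q)


def escort_normalization(p, q):
    """Z_q(p) = sum_x p(x)^q."""
    return math.fsum(pi**q for pi in p)


def tsallis_relative_entropy(p, r, q):
    """Normalized Tsallis relative entropy E^esc_{q,p}[ln_q p - ln_q r] (discrete case)."""
    z = escort_normalization(p, q)
    return math.fsum(pi**q * (ln_q(pi, q) - ln_q(ri, q)) for pi, ri in zip(p, r)) / z


def alpha_divergence(p, r, alpha):
    """Amari's alpha-divergence for alpha != +1, -1 (discrete case)."""
    s = math.fsum(pi ** ((1 - alpha) / 2) * ri ** ((1 + alpha) / 2) for pi, ri in zip(p, r))
    return 4 / (1 - alpha**2) * (1 - s)


if __name__ == "__main__":
    p, r = [0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]
    for q in (0.5, 1.5):
        z = escort_normalization(p, q)
        print(q, tsallis_relative_entropy(p, r, q), q / z * alpha_divergence(p, r, 1 - 2 * q))
```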

Discussion about Expectations
We give further discussions about expectation functionals.Since a deformed exponential family S χ is regarded as a manifold, we can choose an arbitrary local coordinate system for S χ .From this point of view, simple expectations {E p [F i (x)]} and normalized χ-escort expectations {E esc χ,p [F i (x)]} are nothing but local coordinates of the statistical model.However, in differential geometry, we often use appropriate coordinates depending on the background geometry, e.g., Darboux coordinates in symplectic geometry and isothermal coordinates in geometry of minimal surfaces.From Propositions 5 and 6, the simple expectations {E p [F i (x)]} and the normalized χ-escort expectations {E esc χ,p [F i (x)]} give appropriate coordinates for (S χ , g M , ∇ M (e) , ∇ M (m) ) and (S χ , g χ , ∇ χ(e) , ∇ χ(m) ), respectively, since they are the dual affine coordinates of the natural parameters {θ i }.
From the assumptions on deformed exponential families, there always exists a dually flat structure $(S_\chi, g^\chi, \nabla^{\chi(e)}, \nabla^{\chi(m)})$, but $(S_\chi, g^M, \nabla^{M(e)}, \nabla^{M(m)})$ does not exist in general (see [31] for more details). In addition, from Theorem 3, the deformed algebra on the sample space Ω is reflected in the canonical expectation $E_{\chi,p}[*]$. Hence, we think that the canonical expectation $E_{\chi,p}[*]$ and the normalized χ-escort expectation $E^{\mathrm{esc}}_{\chi,p}[*]$ are more natural than the simple expectation $E_p[*]$.
Recall that we cannot consider infinitely many q-products to define a joint distribution. In the case of Student t-distributions, the number N of q-marginal distributions must satisfy N < 2/(q − 1). Otherwise, the normalization Z diverges.
Similarly, we say that $X_1, X_2, \dots, X_N$ are κ-independent with e-normalization (or exponential normalization) if:
$$p(x_1, x_2, \dots, x_N) = \exp_\kappa\left[\ln_\kappa p_1(x_1) + \ln_\kappa p_2(x_2) + \cdots + \ln_\kappa p_N(x_N) - c\right],$$
where c is the normalization of $p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N)$ defined by:
$$\int \exp_\kappa\left[\sum_{k=1}^{N}\ln_\kappa p_k(x_k) - c\right] dx_1 \cdots dx_N = 1 .$$
We remark that the e-normalization is different from the m-normalization in general. See [33] for further discussion.
A normalization of the joint distribution is not required in several problems. In these cases, we define a joint positive distribution (not a probability distribution) by the κ-marginal probability distributions,
$$f(x_1, x_2, \dots, x_N) := p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N),$$
and we say that $X_1, X_2, \dots, X_N$ are simply κ-independent.
Remark 4. As we mentioned in Remark 2, it is difficult to describe the anti-exponential conditions explicitly in the χ-exponential case. Though several authors have introduced χ-independence (which is called U-independence in [2,3] and F-independence in [34]), they did not mention the anti-exponential conditions. Hence, χ-independence was not well defined in their papers.
On the other hand, the anti-exponential condition of the κ-deformed algebra (4) is always satisfied, since the κ-exponential function is defined on all of $\mathbb{R}$, so that the κ-product of any densities $p(x;\theta) \in S_\kappa$ is defined. Therefore, κ-independence is well defined for κ-exponential families. This is an advantage of κ-exponential families.
Before we discuss a generalization of maximum likelihood methods, we recall the difference between Gauss' law of error and the maximum likelihood method.
In the case of Gauss' law of error, we consider the following likelihood function: