Mixture and Exponential Arcs on Generalized Statistical Manifold

In this paper, we investigate the mixture arc on generalized statistical manifolds. We show that the generalization of the mixture arc is well defined, and we provide a generalization of the open exponential arc together with its properties. We use a ϕ-family of distributions as our general statistical model.


Introduction
Information geometry [1][2][3] is the part of probability theory dedicated to investigating families of probability density functions equipped with a differential-geometric structure. Such a structure for multi-parameter families of distributions was provided in [4]. In the mid-1980s, related topics such as fiber bundle theory and the duality of connections on statistical models were investigated by Amari [5] and by Amari and Nagaoka [6], respectively. In the parametric case, the exponential, mixture and α-connections, together with their dual structure, are among the most important geometric objects [6], since the dual structure of the α-connections is the key feature distinguishing statistical manifolds from arbitrary differentiable manifolds. Divergence functions are an essential topic in information geometry, in both the parametric and the non-parametric case, since a metric and dual connections can be induced from a divergence [7][8][9][10]. Finding an information-geometric foundation for multi-parameter families of probability distributions under a more general description is one of the topics of interest in information geometry [11][12][13][14].
Non-parametric statistical models [15] are important in a wide range of areas [16,17]. In the parametric case, the manifold of probability density functions inherits a Euclidean topology from the space of its natural parameters. In the non-parametric case, a major challenge is to define a convenient topology and a notion of convergence. Pistone and Sempi [18] were the first to formulate a rigorous infinite-dimensional extension. In that work, the set $\mathcal{P}_\mu$ of all strictly positive probability densities was endowed with the structure of an exponential Banach manifold, using Orlicz spaces associated with a Young function. In a later work [19], further properties of the statistical manifold were studied, specifically regarding the orthogonality condition.
As in the parametric case, the mixture and exponential connections are among the most important geometric objects in non-parametric models. To obtain these connections, it is necessary to guarantee the existence of the open arcs, which are the geodesics of the manifold. Using the notion of exponential convergence, Gibilisco and Pistone [20] investigated those connections. In that work, the exponential and mixture connections were constructed so that the relation between them is the same as in the parametric case. Another approach was used in [21], where the mixture arc was also studied. Moreover, Grasselli [21] proved that two probability densities in the same neighborhood are connected by an open mixture arc if and only if the difference between their random variables is bounded.
The exponential statistical manifold was later studied in [22] with another system of charts, based on the statistical model $\mathcal{E}(p)$, called the maximal exponential model. Cena and Pistone [22] proved that this model is the set of all positive densities connected to a given positive density p by an open exponential arc, and vice versa. In that work, the open mixture arc and the open exponential arc were used to discuss properties of this model, such as the e-connection and the m-connection, as in [6]. The exponential model $\mathcal{E}(p)$, together with the open exponential and mixture arcs, was also studied more recently by Santacroce et al. in 2016 [23] and in 2017 [24], where a proof of duality properties of statistical models was provided. Examples of applications of non-parametric information geometry to statistical physics using the connection by open arcs were studied in [25].
The generalization of the exponential statistical manifold has been an active topic of research in recent years. Pistone [26] used Kaniadakis' κ-exponential [27] in the construction of a statistical manifold. Vigelis and Cavalcante [28] proposed a ϕ-family of probability distributions $\mathcal{F}_c^{\varphi}$, which generalizes the exponential family $\mathcal{E}(p)$. This generalization replaces the exponential function by a deformed exponential ϕ that satisfies some properties and endows the set $\mathcal{P}_\mu$ with a Banach manifold structure, the so-called generalized statistical manifold. In [29], a review of non-parametric information geometry is provided, with attention to issues specific to the infinite-dimensional setting. In that work, the deformed exponential manifold was studied with the deformed exponential function defined in [30], and a model space was built according to the proposal in [28].
Necessary and sufficient conditions for any two probability distributions to be connected by a ϕ-arc were given in [31]. In this work, we ensure the existence of a generalized mixture arc for probability distributions in the same ϕ-family $\mathcal{F}_c^{\varphi}$, with a deformed exponential function that satisfies some properties. Moreover, we find a generalization of open exponential arcs and we prove, as in [22], that the ϕ-family $\mathcal{F}_c^{\varphi}$ is the set of all densities connected by an open arc to a given positive density p = ϕ(c), and vice versa.
The rest of the paper is organized as follows. In Section 2, we revisit results about Musielak-Orlicz spaces and ϕ-families of probability distributions. We also briefly recall the subdifferential of a convex function. In Section 3, where we provide our main results, we show that the generalized mixture arc is well defined. In Section 4, we discuss the generalized open exponential and mixture arcs. Finally, our conclusions and perspectives are stated in Section 5.

Preliminary Results
The statistical manifold $\mathcal{P}_\mu$ can be equipped with a structure of $C^\infty$-Banach manifold, using the Musielak-Orlicz space $L^{\Phi_c}$ associated with the Musielak-Orlicz function $\Phi_c(t, u) = \varphi(t, c(t) + u) - \varphi(t, c(t))$. Each connected component of the statistical manifold gives rise to a ϕ-family of probability distributions $\mathcal{F}_c^{\varphi}$. In this section, we provide an introduction to Musielak-Orlicz spaces and to the construction of the ϕ-family of probability distributions.
Let $(T, \Sigma, \mu)$ denote the underlying measure space. A function $\Phi : T \times [0, \infty] \to [0, \infty]$ is said to be a Musielak-Orlicz function if, for µ-a.e. $t \in T$:
(i) $\Phi(t, \cdot)$ is convex and lower semicontinuous;
(ii) $\Phi(t, 0) = \lim_{u \downarrow 0} \Phi(t, u) = 0$ and $\Phi(t, \infty) = \infty$;
(iii) $\Phi(\cdot, u)$ is measurable for every $u \geq 0$.
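We notice that, by (i)-(ii), $\Phi(t, \cdot)$ is not identically equal to $0$ or $\infty$ on the interval $(0, \infty)$. Let $L^0$ be the linear space of all real-valued, measurable functions on $T$. Given a Musielak-Orlicz function $\Phi$, we define the functional
$$I_\Phi(u) = \int_T \Phi(t, |u(t)|)\, d\mu, \quad u \in L^0.$$
The Musielak-Orlicz space, the Musielak-Orlicz class and the Morse-Transue space generated by a Musielak-Orlicz function $\Phi$ are defined, respectively, by
$$L^\Phi = \{u \in L^0 : I_\Phi(\lambda u) < \infty \text{ for some } \lambda > 0\},$$
$$\tilde{L}^\Phi = \{u \in L^0 : I_\Phi(u) < \infty\},$$
$$E^\Phi = \{u \in L^0 : I_\Phi(\lambda u) < \infty \text{ for all } \lambda > 0\}.$$
The Musielak-Orlicz space $L^\Phi$ is a Banach space when it is equipped with the Luxemburg norm
$$\|u\|_\Phi = \inf\{\lambda > 0 : I_\Phi(u/\lambda) \leq 1\},$$
or with the Orlicz norm
$$\|u\|_{\Phi,0} = \sup\left\{ \left| \int_T u v\, d\mu \right| : v \in \tilde{L}^{\Phi^*},\ I_{\Phi^*}(v) \leq 1 \right\},$$
where $\Phi^*(t, v) = \sup_{u \geq 0}\,(uv - \Phi(t, u))$ is the Fenchel conjugate of $\Phi(t, \cdot)$, which is also a Musielak-Orlicz function. These norms are equivalent and the inequalities $\|u\|_\Phi \leq \|u\|_{\Phi,0} \leq 2\|u\|_\Phi$ hold for all $u \in L^\Phi$ [32].
A Musielak-Orlicz function is said to satisfy the $\Delta_2$-condition, or to belong to the $\Delta_2$-class (denoted by $\Phi \in \Delta_2$), if we can find a constant $\alpha > 0$ and a non-negative function $f \in \tilde{L}^\Phi$ such that
$$\Phi(t, 2u) \leq \alpha\, \Phi(t, u), \quad \text{for all } u \geq f(t),$$
for µ-a.e. $t \in T$. If the Musielak-Orlicz function $\Phi$ satisfies the $\Delta_2$-condition, then $I_\Phi(u) < \infty$ for every $u \in L^\Phi$ [32]. In this case $L^\Phi$, $\tilde{L}^\Phi$ and $E^\Phi$ are equal as sets. Moreover, if $\Phi$ does not satisfy the $\Delta_2$-condition, then $E^\Phi$ is a proper subspace of $L^\Phi$. Every function $\Phi$ that satisfies the $\Delta_2$-condition is finite-valued. Indeed, we define
$$b_\Phi(t) = \sup\{u \geq 0 : \Phi(t, u) < \infty\},$$
and, assuming that $b_\Phi(t) < \infty$, we get $\alpha\,\Phi(t, \tfrac{1}{2}u) < \Phi(t, u) = \infty$ for all $b_\Phi(t) < u < 2 b_\Phi(t)$, which implies that $\Phi$ cannot satisfy the $\Delta_2$-condition. For more information see, for instance, [32,33].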
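As a numerical illustration of the Luxemburg norm, the following minimal sketch assumes a finite set T with weights `mu` and a finite-valued Musielak-Orlicz function that does not depend on t, passed as a callable. The helper names `modular` and `luxemburg_norm` are hypothetical and are not taken from the references. For $\Phi(u) = u^2$ the Luxemburg norm reduces to the usual $L^2$ norm, which gives a simple sanity check.

```python
import numpy as np

def modular(phi, u, mu):
    """I_Phi(u) = sum_t Phi(|u(t)|) mu(t) on a finite set T (illustrative assumption)."""
    return float(np.sum(phi(np.abs(u)) * mu))

def luxemburg_norm(phi, u, mu, iters=200):
    """Approximate inf{lam > 0 : I_Phi(u / lam) <= 1} by bisection.

    Assumes Phi is finite-valued, continuous and increasing, so that
    lam -> I_Phi(u / lam) is non-increasing in lam.
    """
    if not np.any(u):
        return 0.0
    lo, hi = 0.0, 1.0
    while modular(phi, u / hi, mu) > 1.0:   # enlarge hi until it is feasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if modular(phi, u / mid, mu) <= 1.0:
            hi = mid                        # mid is feasible, shrink from above
        else:
            lo = mid
    return hi

mu = np.array([0.2, 0.3, 0.5])
u = np.array([1.0, -2.0, 0.5])

norm_sq = luxemburg_norm(lambda x: x ** 2, u, mu)   # Phi(u) = u^2
norm_exp = luxemburg_norm(np.expm1, u, mu)          # Phi(u) = e^u - 1
```

In the second call, the modular evaluated at `u / norm_exp` is equal to one up to rounding, as expected for a continuous, finite-valued $\Phi$.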
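We say that a Musielak-Orlicz function $\Phi$ satisfies the $\nabla_2$-condition, or belongs to the $\nabla_2$-class, if we can find a constant $\gamma > 1$ and a non-negative function $f \in \tilde{L}^\Phi$ such that
$$\Phi(t, u) \leq \frac{1}{2\gamma}\, \Phi(t, \gamma u), \quad \text{for all } u \geq f(t).$$
We notice that a Musielak-Orlicz function may satisfy the $\nabla_2$-condition without satisfying the $\Delta_2$-condition.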
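The (topological) dual space of $L^\Phi$ is denoted by $(L^\Phi)^*$ and is represented in the following way [32,34,35]:
$$(L^\Phi)^* = L^{\Phi^*} \oplus (L^\Phi)^{\sim}_s,$$
where $L^{\Phi^*}$ is the set of the order continuous functionals and $(L^\Phi)^{\sim}_s$ is formed by the singular components. If the Musielak-Orlicz function $\Phi \in \Delta_2$, then all functionals in $(L^\Phi)^*$ are order continuous and represented by
$$f_v(u) = \int_T u v\, d\mu, \quad v \in L^{\Phi^*}.$$
Otherwise, if $\Phi \notin \Delta_2$, then the functionals $f$ in $(L^\Phi)^*$ can be uniquely expressed as
$$f = f_c + f_s,$$
where $f_c$ is the order continuous component and $f_s$ is the singular component.
While exponential families are based on the exponential function, ϕ-families are based on deformed exponential functions. A deformed exponential $\varphi : T \times \mathbb{R} \to (0, \infty)$ is a function that satisfies the following properties, for µ-a.e. $t \in T$ [28]:
(i) $\varphi(t, \cdot)$ is convex and injective;
(ii) $\varphi(t, -\infty) = 0$ and $\varphi(t, \infty) = \infty$;
(iii) There exists a measurable function $u_0 : T \to (0, \infty)$ such that $\int_T \varphi(c + \lambda u_0)\, d\mu < \infty$ for all $\lambda > 0$, for every measurable function $c : T \to \mathbb{R}$ for which $\int_T \varphi(c)\, d\mu = 1$.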
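In de Souza et al. [36], Lemma 1, it was shown that the constraint $\int_T \varphi(c)\, d\mu = 1$ can be replaced by $\int_T \varphi(c)\, d\mu < \infty$. Thus, the condition (iii) can be rewritten as:
(iii') There exists a measurable function $u_0 : T \to (0, \infty)$ such that $\int_T \varphi(c + \lambda u_0)\, d\mu < \infty$ for all $\lambda > 0$, for every measurable function $c : T \to \mathbb{R}$ for which $\int_T \varphi(c)\, d\mu < \infty$.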
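The Musielak-Orlicz function
$$\Phi_c(t, u) = \varphi(t, c(t) + u) - \varphi(t, c(t)), \tag{5}$$
for a measurable function $c : T \to \mathbb{R}$ such that $\varphi(t, c(t))$ is µ-integrable, was defined in [28]. Thus, the sets $L^{\Phi_c}$, $\tilde{L}^{\Phi_c}$ and $E^{\Phi_c}$ are denoted by $L_c^{\varphi}$, $\tilde{L}_c^{\varphi}$ and $E_c^{\varphi}$, respectively, when $\Phi_c$ is given by (5). Let
$$\mathcal{P}_\mu = \left\{ p \in L^0 : p > 0 \text{ and } \int_T p\, d\mu = 1 \right\}$$
be the collection of which each ϕ-family is a subset, where $L^0$ is the linear space of all real-valued, measurable functions on $T$. For each probability density $p = \varphi(c) \in \mathcal{P}_\mu$, we have an associated ϕ-family of probability distributions
$$\mathcal{F}_c^{\varphi} = \{ \varphi(c + u - \psi(u) u_0) : u \in \mathcal{B}_c^{\varphi} \},$$
where $\psi$ is the normalizing function, which makes each member of the family integrate to one, and the set $\mathcal{B}_c^{\varphi}$ is the intersection of the convex set
$$\mathcal{K}_c^{\varphi} = \left\{ u \in L_c^{\varphi} : \int_T \varphi(c + \lambda u)\, d\mu < \infty \text{ for some } \lambda > 1 \right\}$$
with the closed subspace
$$B_c^{\varphi} = \left\{ u \in L_c^{\varphi} : \int_T u\, \varphi'_+(c)\, d\mu = 0 \right\}.$$
The behavior of the normalizing function near the boundary was studied in [33,37].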
The normalizing function $\psi : \mathcal{K}_c^{\varphi} \to \mathbb{R}$ is a convex function [28]. Assuming that ϕ is continuously differentiable, the normalizing function is Gâteaux-differentiable and its Gâteaux-derivative is
$$\partial\psi(u)\, v = \frac{\int_T v\, \varphi'(c + u - \psi(u) u_0)\, d\mu}{\int_T u_0\, \varphi'(c + u - \psi(u) u_0)\, d\mu}, \tag{8}$$
with $u \in \mathcal{K}_c^{\varphi}$ and $v \in L_c^{\varphi}$. In the next section, we recall some differentiability properties of convex functions on infinite-dimensional spaces.
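The normalizing function and its Gâteaux-derivative can be explored numerically on a finite sample space. The sketch below is only an illustration under stated assumptions: T is a three-point set, ϕ = exp and $u_0 = 1$ (the classical exponential case), and the derivative is taken in the ratio form obtained by differentiating the constraint $\int_T \varphi(c + u - \psi(u)u_0)\, d\mu = 1$. The function names `normalizing_psi` and `gateaux_derivative` are hypothetical.

```python
import numpy as np

def normalizing_psi(phi, c, u, u0, mu, lo=-50.0, hi=50.0, iters=200):
    """Solve sum_t phi(c + u - psi * u0) mu = 1 for psi by bisection.

    Assumes phi is increasing and u0 > 0, so the total mass is decreasing
    in psi, and that the root lies in [lo, hi].
    """
    def total(s):
        return float(np.sum(phi(c + u - s * u0) * mu))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid      # mass too large: psi must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gateaux_derivative(dphi, c, u, v, u0, mu, psi_u):
    """Ratio sum(v * phi') / sum(u0 * phi'), with phi' evaluated at c + u - psi(u) u0."""
    w = dphi(c + u - psi_u * u0) * mu
    return float(np.sum(v * w) / np.sum(u0 * w))

# Finite sample space with phi = exp and u0 = 1 (illustrative assumption).
mu = np.array([0.25, 0.25, 0.5])
p = np.array([0.8, 1.6, 0.8])        # density w.r.t. mu: sum(p * mu) = 1
c = np.log(p)
u0 = np.ones_like(c)
u = np.array([0.3, -0.2, 0.1])
v = np.array([1.0, 0.5, -1.0])

psi_u = normalizing_psi(np.exp, c, u, u0, mu)
closed_form = np.log(np.sum(p * np.exp(u) * mu))   # log E_p[e^u] when phi = exp

# Central finite difference of psi in the direction v.
eps = 1e-6
fd = (normalizing_psi(np.exp, c, u + eps * v, u0, mu)
      - normalizing_psi(np.exp, c, u - eps * v, u0, mu)) / (2 * eps)
formula = gateaux_derivative(np.exp, c, u, v, u0, mu, psi_u)
```

For ϕ = exp the normalizing function is the cumulant-type quantity $\log \int_T p\, e^{u}\, d\mu$, and the finite-difference quotient `fd` agrees with `formula` up to discretization error. Other deformed exponentials can be passed as `phi` and `dphi`, provided they satisfy the monotonicity assumption stated in the docstring.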

The Subdifferential of a Convex Function
In this section, we discuss some properties of extended real-valued convex functions in Banach spaces, i.e., functions with values in $\mathbb{R} \cup \{\pm\infty\}$. Mainly, we recall the subdifferentials of lower semicontinuous convex functions and their properties.
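Let $E$ be a Banach space with dual space $E^*$. A function $f : E \to \mathbb{R} \cup \{\pm\infty\}$ is a convex function on $E$ if its epigraph [38]
$$\operatorname{epi} f = \{(x, r) \in E \times \mathbb{R} : f(x) \leq r\}$$
is a convex subset of $E \times \mathbb{R}$. The effective domain of $f$ is $\operatorname{dom} f = \{x \in E : f(x) < \infty\}$, and $f$ is proper if $\operatorname{dom} f \neq \emptyset$ and $f(x) > -\infty$ for all $x \in E$. An element $x^* \in E^*$ is a subgradient of $f$ at $x$ if
$$f(y) \geq f(x) + \langle x^*, y - x \rangle, \quad \text{for all } y \in E.$$
We denote by $\partial f(x)$ the set of subgradients of $f$ at $x$, and the subdifferential of $f$ is the multivalued map $\partial f : E \to 2^{E^*}$, $x \mapsto \partial f(x)$. The subdifferential may be empty at points of $\operatorname{dom} f$, so we denote by
$$D(\partial f) = \{x \in E : \partial f(x) \neq \emptyset\}$$
its effective domain. The conjugate of $f$ is the function
$$f^*(x^*) = \sup_{x \in E} \{\langle x^*, x \rangle - f(x)\}, \quad x^* \in E^*. \tag{9}$$
Observe that, if $f$ is proper, then "sup" in Equation (9) may be restricted to the points $x \in \operatorname{dom} f$. The conjugate $f^*$ is a convex and lower semicontinuous function on $E^*$ and, jointly with $f$, satisfies the well-known Young's inequality
$$f(x) + f^*(x^*) \geq \langle x^*, x \rangle,$$
with equality holding if and only if $x^* \in \partial f(x)$. If $f$ is a lower semicontinuous function, the subdifferential $\partial f^*$ of the conjugate function $f^*$ coincides with $(\partial f)^{-1}$ ([39], Proposition 2.33).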
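It is known that, if $f$ is a lower semicontinuous proper convex function, then
$$\operatorname{int} \operatorname{dom} f \subset D(\partial f) \subset \operatorname{dom} f;$$
see also [40]. The subdifferential of a convex function is closely related to the Gâteaux-gradient. If the convex function $f$ is Gâteaux-differentiable at $x$, then $\partial f(x)$ consists of a single element, namely the Gâteaux-gradient of $f$ at $x$. In the next section, we investigate the subdifferential of the normalizing function ψ. This result will be useful to prove that the generalized mixture arc is well defined, which is one of our main goals in this work.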

Construction of Generalized Mixture Arcs
The normalizing function $\psi : \mathcal{K}_c^{\varphi} \to \mathbb{R}$ is convex and Gâteaux-differentiable, and its derivative is given by Equation (8). With these facts in mind, we can write the generalized mixture arc as
$$\lambda F(p) + (1 - \lambda) F(q), \quad \lambda \in [0, 1], \tag{11}$$
where
$$F(p) = \frac{\varphi'(\varphi^{-1}(p))}{\int_T u_0\, \varphi'(\varphi^{-1}(p))\, d\mu}$$
and p, q belong to a ϕ-family $\mathcal{F}_c^{\varphi}$. We can rewrite the functional $F(p)$ as
$$F(p) = \frac{\varphi'(c + u - \psi(u) u_0)}{\int_T u_0\, \varphi'(c + u - \psi(u) u_0)\, d\mu}, \tag{13}$$
with $p = \varphi(c + u - \psi(u) u_0)$, and Equation (13) is the Gâteaux-gradient of ψ. Thus, for the generalized mixture arc to be well defined, it is necessary that the set of functionals in Equation (13) be convex. As mentioned in Section 2.2, the subdifferential and the Gâteaux-gradient are closely related. For this reason, we investigate the subdifferential of ψ.

Subdifferential of the Normalizing Function ψ
If the Musielak-Orlicz function (5) does not satisfy the $\Delta_2$-condition, then the boundary $\partial\mathcal{B}_c^{\varphi}$ is non-empty [33]. The effective domain of the normalizing function ψ is the set
$$\operatorname{dom} \psi = \mathcal{B}_c^{\varphi} \cup \{\partial\mathcal{B}_c^{\varphi}\}_{<\infty},$$
where $\{\partial\mathcal{B}_c^{\varphi}\}_{<\infty}$ is the set of points in the boundary of $\mathcal{B}_c^{\varphi}$ such that $\psi(u) < \infty$. The behavior of the normalizing function ψ near the boundary of $\mathcal{B}_c^{\varphi}$ was discussed in [33]. To describe the subdifferentials of ψ, we first need some properties of ψ. Our first result is that ψ is lower semicontinuous.
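Proof. To prove that ψ is lower semicontinuous, it suffices to show that, for every $\alpha \in \mathbb{R}$, the sublevel set $C_\alpha = \{u : \psi(u) \leq \alpha\}$ is closed. We define the set $B$ of all functions $u$ such that
$$\int_T \varphi(c + u - \alpha u_0)\, d\mu \leq 1,$$
and we prove that $B$ is a closed set and that $B = C_\alpha$. Let $\{u_n\}$ be a sequence in $B$ such that $\|u_n - u\|_{\Phi_c} \to 0$. Then, passing to a subsequence if necessary, $u_n \to u$ µ-a.e. Since ϕ is a continuous function, Fatou's lemma gives
$$\int_T \varphi(c + u - \alpha u_0)\, d\mu \leq \liminf_{n \to \infty} \int_T \varphi(c + u_n - \alpha u_0)\, d\mu \leq 1;$$
thus, $u \in B$ and $B$ is a closed set. Now, we prove that $B = C_\alpha$. Let $u$ be a function in $C_\alpha$, so that $\psi(u) \leq \alpha$. The function ϕ is strictly increasing, so that
$$\int_T \varphi(c + u - \alpha u_0)\, d\mu \leq \int_T \varphi(c + u - \psi(u) u_0)\, d\mu = 1,$$
and hence $u \in B$. Suppose that there exists $w \in B \setminus C_\alpha$. Then $w \in B$, which implies that $\int_T \varphi(c + w - \alpha u_0)\, d\mu \leq 1$, and $w \notin C_\alpha$, which implies that $\psi(w) > \alpha$. Then
$$\int_T \varphi(c + w - \alpha u_0)\, d\mu > \int_T \varphi(c + w - \psi(w) u_0)\, d\mu = 1.$$
This contradicts the assumption that $w \in B$. Therefore, $B = C_\alpha$ and $C_\alpha$ is closed.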
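The subdifferential of ψ at a function $u \in \operatorname{dom} \psi$ is the set
$$\partial\psi(u) = \{u^* \in (L^{\Phi_c})^* : \psi(v) \geq \psi(u) + \langle u^*, v - u \rangle \text{ for all } v \in L^{\Phi_c}\},$$
where $(L^{\Phi_c})^*$ denotes the dual space of $L^{\Phi_c}$. We know that, for all $u \in \mathcal{B}_c^{\varphi}$, the normalizing function ψ is Gâteaux-differentiable and the Gâteaux-gradient is given by Equation (13). Hence, $\partial\psi(u)$ consists of a single element, the functional
$$v \mapsto \frac{\int_T v\, \varphi'(c + u - \psi(u) u_0)\, d\mu}{\int_T u_0\, \varphi'(c + u - \psi(u) u_0)\, d\mu}. \tag{16}$$
In fact, we prove below that, if the functional (16) belongs to $L^{\Phi_c^*}$, then (16) belongs to $\partial\psi(u)$.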
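Proof. Assume that the functional (16) belongs to $L^{\Phi_c^*}$. Let $v$ be a function in $\mathcal{B}_c^{\varphi}$ such that $\int_T \varphi(c + u + v)\, d\mu < \infty$; in other words, $u + v \in \operatorname{dom} \psi$. Then
$$\int_T \varphi(c + u - \psi(u) u_0)\, d\mu = 1 \quad \text{and} \quad \int_T \varphi(c + u + v - \psi(u + v) u_0)\, d\mu = 1.$$
Consequently, by the convexity of ϕ,
$$0 = \int_T \left[ \varphi(c + u + v - \psi(u + v) u_0) - \varphi(c + u - \psi(u) u_0) \right] d\mu \geq \int_T \varphi'(c + u - \psi(u) u_0) \left[ v - (\psi(u + v) - \psi(u)) u_0 \right] d\mu,$$
which rearranges to
$$\psi(u + v) - \psi(u) \geq \frac{\int_T v\, \varphi'(c + u - \psi(u) u_0)\, d\mu}{\int_T u_0\, \varphi'(c + u - \psi(u) u_0)\, d\mu}. \tag{17}$$
Inequality (17) holds for all $v \in \mathcal{B}_c^{\varphi}$ and the result follows.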
We need to find the subdifferential of ψ for $u$ in the set $\{\partial\mathcal{B}_c^{\varphi}\}_{<\infty}$. We know that ψ is a proper lower semicontinuous convex function, so
$$\operatorname{int} \operatorname{dom} \psi \subset D(\partial\psi) \subset \operatorname{dom} \psi.$$
Since we are interested in proving that the set of functionals in Equation (13) is convex, and these functionals are order continuous, we need to analyze only the order continuous part of the subdifferential, i.e., the part of the subdifferential that belongs to $L^{\Phi_c^*}$. We need to investigate whether the functional in Equation (16) belongs to $L^{\Phi_c^*}$ for $u \in \{\partial\mathcal{B}_c^{\varphi}\}_{<\infty}$. For this, we will use some auxiliary results.

Lemma 2.
Let Φ * and Ψ * denote the complementary functions to the Musielak-Orlicz functions Φ and Ψ, respectively. Suppose that, for constants α, λ > 0, there exists a non-negative function f ∈L Ψ such that for all u > f (t).

Lemma 3.
The ∆ 2 -condition is equivalent to the statement that, for every λ ∈ (0, 1), there exist a constant α λ ∈ (0, 1), and a non-negative function f λ ∈L Φ such that The ∇ 2 -condition is equivalent to the statement that, for any λ ∈ (0, 1), there exist a constant γ λ > 1, and a non-negative function f λ ∈L Φ such that for all u > f λ (t).
The next result follows from Lemmas 2 and 3.
As a consequence of Proposition 5, it is possible to find $u \in \{\partial\mathcal{B}_c^{\varphi}\}_{<\infty}$ for which the functional in Equation (16) does not belong to $L^{\Phi_c^*}$ and, therefore, does not belong to $\partial\psi(u)$. We conclude in this section that, if the functional (23) belongs to $L^{\Phi_c^*}$, then the functional belongs to $\partial\psi(u)$ for $u \in \operatorname{dom} \psi$. In the next section, we finally prove that the set of functionals formed by the Gâteaux-gradients of the normalizing function ψ that belong to $L^{\Phi_c^*}$ is convex, so we can guarantee that the generalized mixture arc is well defined.

Convexity of the Functionals Set
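We already know that, for the generalized mixture arc in Equation (11) to be well defined, it is necessary that the set of functionals
$$\left\{ \frac{\varphi'(c + u - \psi(u) u_0)}{\int_T u_0\, \varphi'(c + u - \psi(u) u_0)\, d\mu} \in L^{\Phi_c^*} : u \in \operatorname{dom} \psi \right\} \tag{24}$$
be convex. From Proposition 2, the set in Equation (24) is contained in the range of $\partial\psi$, the set given by
$$\operatorname{range} \partial\psi = \bigcup_{u \in \operatorname{dom} \psi} \partial\psi(u). \tag{25}$$
Let $\psi^*$ be the conjugate function of ψ. Since $\psi^*$ is a l.s.c. proper convex function, $\operatorname{int} \operatorname{dom} \psi^*$ and $\operatorname{dom} \psi^*$ are convex sets. Moreover, the range of $\partial\psi$ is the effective domain of $\partial\psi^*$, since $\partial\psi^* = (\partial\psi)^{-1}$, so that $\operatorname{int} \operatorname{dom} \psi^* \subset D(\partial\psi^*) \subset \operatorname{dom} \psi^*$ is the same as $\operatorname{int} \operatorname{dom} \psi^* \subset \operatorname{range} \partial\psi \subset \operatorname{dom} \psi^*$.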
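To prove that the set in Equation (24) is convex, we analyze the set in Equation (25) in three cases, according to whether the elements $u^*, v^*$ of Equation (25) lie in $\operatorname{int} \operatorname{dom} \psi^*$ or in $D(\partial\psi^*) \setminus \operatorname{int} \operatorname{dom} \psi^*$. The last case concerns elements $u^*, v^*$ of Equation (25) that both belong to $D(\partial\psi^*) \setminus \operatorname{int} \operatorname{dom} \psi^*$.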
Multiplying Equation (30) by λ and Equation (31) by (1 − λ), and adding the two resulting equations, we obtain Equation (32). Combining Equations (29) and (32), we obtain a relation that contradicts Equation (28). This implies that $\partial\psi^*(u^*)$ is a singleton, which completes the proof.
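Therefore, by Fact 2, there exists no functional $u^*$ in Equation (25) such that $u^* \in D(\partial\psi^*) \setminus \operatorname{int} \operatorname{dom} \psi^*$. Thus, Equation (25) is a convex set and, as a consequence, the generalized mixture arc is well defined, since the set in Equation (24) is a convex set. Indeed, let $u, v$ be functions in $\operatorname{dom} \psi$ such that the functionals of the form (16) associated with $u$ and with $v$ belong to Equation (24). Clearly, every convex combination of these two functionals belongs to Equation (25). We note that the functionals in Equation (24) are the only elements in Equation (25) that belong to $L^{\Phi_c^*}$; hence, for each $\lambda \in [0, 1]$ there exists a function $w_\lambda \in \operatorname{dom} \psi$ such that the convex combination with weight λ is the functional of the form (16) associated with $w_\lambda$. Thus, the set in Equation (24) is a convex set.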
In this section, we proved that the generalized mixture arc is well defined for a strictly convex deformed exponential ϕ. In the next section, we discuss generalized open exponential arcs and generalized open mixture arcs.
In [31], necessary and sufficient conditions for any two probability distributions to be ϕ-connected were provided. In this section, we discuss what it means for two probability distributions $p, q \in \mathcal{P}_\mu$ to be ϕ-connected by an open arc. We generalize the open exponential arcs and open mixture arcs defined in [22] and studied later in [23].

Generalized Open Exponential Arcs
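Let us define the generalized open arcs and prove some of their properties.
Definition 1. We say that $p, q \in \mathcal{P}_\mu$ are ϕ-connected by an open arc if there exists an open interval $I \supset [0, 1]$ such that $p(\alpha) \propto \varphi((1 - \alpha)\varphi^{-1}(p) + \alpha\varphi^{-1}(q))$ belongs to $\mathcal{P}_\mu$ for every $\alpha \in I$, where the normalizing term $k(\alpha)$ depends on α, p and q.
In the following proposition, we give an equivalent definition of ϕ-connection by an open arc.
Proposition 7. Two densities $p, q \in \mathcal{P}_\mu$, with $p = \varphi(c)$, are ϕ-connected by an open arc if and only if there exist an open interval $I \supset [0, 1]$ and a random variable $v \in L_c^{\varphi}$ such that $p(\alpha) \propto \varphi(c + \alpha v)$ belongs to $\mathcal{P}_\mu$ for all $\alpha \in I$, with $p(0) = p$ and $p(1) = q$.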
Proof. Let us assume that p, q are ϕ-connected, i.e., $\int_T \varphi((1 - \alpha)\varphi^{-1}(p) + \alpha\varphi^{-1}(q))\, d\mu < \infty$ for all $\alpha \in I$. Since
$$(1 - \alpha)\varphi^{-1}(p) + \alpha\varphi^{-1}(q) = c + \alpha v,$$
where $v = \varphi^{-1}(q) - \varphi^{-1}(p)$ and $\varphi(c) = p$, we have $v \in L_c^{\varphi}$. Moreover, $p(\alpha) \propto \varphi(c + \alpha v)$ belongs to $\mathcal{P}_\mu$ for every $\alpha \in I$, with $p(0) = \varphi(c) = p$ and $p(1) = q$. The converse follows immediately.
The requirement $v \in L_c^{\varphi}$ is what makes it necessary to define open arcs. As a consequence of Proposition 7, if $p, q \in \mathcal{P}_\mu$ are ϕ-connected by an open arc, then the random variable $v$ belongs to $\mathcal{K}_c^{\varphi}$, since $\int_T \varphi(c + \alpha v)\, d\mu < \infty$ for all $\alpha \in (-\varepsilon, 1 + \varepsilon)$. With this, we can prove the following results.
With this, we prove that, for $\varphi(c) = p$, the ϕ-family of probability distributions $\mathcal{F}_c^{\varphi}$ is the set of all $q \in \mathcal{P}_\mu$ such that q is ϕ-connected by an open arc to p.
Proof. It follows from Corollary 1 that p and q are in the same ϕ-family, so that $\tilde{c} = c + u - \psi(u) u_0$, and the result follows from Vigelis and Cavalcante [28], Lemma 5.
Now, we show that the connection by generalized open exponential arcs is an equivalence relation.

Proposition 8. The relation in Definition 1 is an equivalence relation.
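Proof. Reflexivity and symmetry follow from the definition, so it remains to prove transitivity. Consider $p, q, r \in \mathcal{P}_\mu$ such that p is ϕ-connected to q and to r. We need to prove that q and r are also ϕ-connected. By Corollary 1, both q and r belong to the ϕ-family $\mathcal{F}_c^{\varphi}$ with $\varphi(c) = p$, which is also the set of all densities that are ϕ-connected to q. Therefore, q and r are ϕ-connected.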
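The arcs of Definition 1 can be visualized on a finite sample space. The following sketch is an illustration only: it assumes a three-point set T, the classical choice ϕ = exp with $u_0 = 1$, and a normalization of the form $\varphi(\cdot - k(\alpha)u_0)$, where the sign convention for $k(\alpha)$ is our assumption. The function name `phi_arc` is hypothetical. In this classical case the arc reduces to the exponential arc $p(\alpha) \propto p^{1-\alpha} q^{\alpha}$ of [22].

```python
import numpy as np

def phi_arc(p, q, alpha, mu, phi=np.exp, phi_inv=np.log, iters=200):
    """Point p(alpha) on the arc through p (alpha = 0) and q (alpha = 1).

    Sketch with u0 = 1: p(alpha) = phi((1 - alpha) phi^{-1}(p) + alpha phi^{-1}(q) - k),
    where the scalar k = k(alpha) is found by bisection so that
    sum(p(alpha) * mu) = 1. Assumes phi is increasing and k lies in [-50, 50].
    """
    base = (1.0 - alpha) * phi_inv(p) + alpha * phi_inv(q)
    lo, hi = -50.0, 50.0
    for _ in range(iters):
        k = 0.5 * (lo + hi)
        if np.sum(phi(base - k) * mu) > 1.0:
            lo = k        # mass too large: k must increase
        else:
            hi = k
    return phi(base - 0.5 * (lo + hi))

mu = np.array([0.25, 0.25, 0.5])
p = np.array([0.8, 1.6, 0.8])    # sum(p * mu) = 1
q = np.array([1.2, 0.4, 1.2])    # sum(q * mu) = 1

# The arc is "open": it extends slightly beyond the endpoints 0 and 1.
alphas = [-0.1, 0.0, 0.5, 1.0, 1.1]
arc = {a: phi_arc(p, q, a, mu) for a in alphas}

# Classical closed form for phi = exp, used as a reference at alpha = 0.5.
geo = np.sqrt(p * q)
reference_half = geo / np.sum(geo * mu)
```

Each `arc[a]` is a strictly positive density with respect to `mu`, and the values at 0 and 1 recover p and q. On a finite set every pair of positive densities is connected in this way; the integrability condition in Definition 1 only becomes restrictive in the infinite-dimensional setting.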
We know from Corollary 1 that the ϕ-family $\mathcal{F}_c^{\varphi}$ coincides with the set of all $q \in \mathcal{P}_\mu$ which are ϕ-connected to p by an open arc. We now want to prove that the ϕ-family $\mathcal{F}_c^{\varphi}$ is convex for some deformed exponentials ϕ.
Lemma 4. Let ϕ be a fixed deformed exponential. Assuming that $(\varphi^{-1})''(x)$ is continuous and that the condition in Equation (37) holds, the function $F(x) = \varphi(\alpha\varphi^{-1}(x) + k)$, for some fixed $\alpha > 1$ and $k \in \mathbb{R}$, is convex.
Proof. If $F''(x) \geq 0$ for all $\alpha > 1$ and all $x$, then $F(x)$ is a convex function. Since ϕ is an increasing function, we have $[\varphi'(\alpha\varphi^{-1}(x))]^3 > 0$. Hence, $F''(x) \geq 0$ holds if and only if a condition is satisfied that follows from Equation (37).
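For the classical choice ϕ = exp, the function of Lemma 4 has the closed form $F(x) = e^{k} x^{\alpha}$, which is convex on $(0, \infty)$ exactly when $\alpha \geq 1$. The sketch below checks this numerically through second differences on a uniform grid; the grid, the parameter values and the helper names `F` and `convex_on_grid` are illustrative assumptions, and a grid test of this kind is evidence of convexity rather than a proof.

```python
import numpy as np

def F(x, alpha, k, phi=np.exp, phi_inv=np.log):
    """F(x) = phi(alpha * phi^{-1}(x) + k), here with phi = exp by default."""
    return phi(alpha * phi_inv(x) + k)

def convex_on_grid(values, tol=1e-12):
    """Check that all second differences on a uniform grid are non-negative."""
    second_diff = values[2:] - 2.0 * values[1:-1] + values[:-2]
    return bool(np.all(second_diff >= -tol))

x = np.linspace(0.05, 5.0, 400)     # uniform grid inside (0, infinity)
k = -0.3

vals_convex = F(x, alpha=1.7, k=k)      # alpha > 1, as required in Lemma 4
vals_concave = F(x, alpha=0.5, k=k)     # alpha < 1, outside the hypothesis

is_convex = convex_on_grid(vals_convex)
is_convex_small_alpha = convex_on_grid(vals_concave)
```

The second case shows why the restriction $\alpha > 1$ matters: for α = 0.5 the same construction yields a strictly concave function, so the grid test fails.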

Proposition 9.
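Let $p \in \mathcal{P}_\mu$ be such that $\varphi(c) = p$. Assume that $(\varphi^{-1})''(x)$ is continuous and that the condition of Lemma 4 holds for some fixed $\alpha > 1$ and $k \in \mathbb{R}$. Then, the ϕ-family of probability distributions $\mathcal{F}_c^{\varphi}$ is convex.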
since q ∈ F ϕ c and, therefore, p and q are ϕ-connected by an open arc.

Conclusions
In this work, we have generalized the open exponential arc and the open mixture arc for probability distributions. Moreover, we have shown that the generalization of the open mixture arc is well defined for a strictly convex deformed exponential. Given two ϕ-connected probability distributions $p_1$ and $p_2$, we can define the generalized parallel transport $\tau^{(1)}_{p_1, p_2}$ between the tangent spaces $T_{p_1}\mathcal{P}_\mu$ and $T_{p_2}\mathcal{P}_\mu$, given by
$$u \mapsto u - \frac{\int_T u\, \varphi'(\varphi^{-1}(p_2))\, d\mu}{\int_T u_0\, \varphi'(\varphi^{-1}(p_2))\, d\mu}\, u_0,$$
where $T_p\mathcal{P}_\mu \simeq B_c^{\varphi}$ with $p = \varphi(c)$. A next step is to find a generalized parallel transport $\tau^{(-1)}_{p_1, p_2}$ that is dual to $\tau^{(1)}_{p_1, p_2}$. Another goal is to investigate whether the generalized Rényi divergence $D_{\varphi}^{\alpha}(\cdot \,\|\, \cdot)$ defined in [36] for two ϕ-connected probability distributions can be related to the statistical divergence associated with $(\tau^{(-1)}_{p_1, p_2}, \tau^{(1)}_{p_1, p_2}, \langle \cdot, \cdot \rangle)$.
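The transport map above can be sketched numerically on a finite sample space. This is a minimal illustration under stated assumptions: a three-point set T, the classical choice ϕ = exp with $u_0 = 1$, and a hypothetical helper name `transport`. In this case the map reduces to centering with respect to $p_2$, that is, $u \mapsto u - \int_T u\, p_2\, d\mu$.

```python
import numpy as np

def transport(u, p2, u0, mu, dphi=np.exp, phi_inv=np.log):
    """u -> u - (sum(u * w) / sum(u0 * w)) * u0, with w = phi'(phi^{-1}(p2)) * mu."""
    w = dphi(phi_inv(p2)) * mu
    return u - (np.sum(u * w) / np.sum(u0 * w)) * u0

mu = np.array([0.25, 0.25, 0.5])
p2 = np.array([1.2, 0.4, 1.2])       # density w.r.t. mu: sum(p2 * mu) = 1
u0 = np.ones(3)
u = np.array([0.7, -0.4, 0.2])       # an arbitrary tangent vector at p1

tau_u = transport(u, p2, u0, mu)

# Weight defining the target tangent space; equals p2 * mu when phi = exp.
w2 = np.exp(np.log(p2)) * mu
centering_residual = float(np.sum(tau_u * w2))
```

The transported vector satisfies the centering constraint of the target tangent space, so `centering_residual` vanishes up to rounding. The map is also idempotent, which is consistent with it acting as a projection onto that space along $u_0$.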