Article

# Geometry Induced by a Generalization of Rényi Divergence

1 Instituto Federal do Ceará, Campus Maracanaú, Fortaleza 61939-140, Brazil
2 Computer Engineering School, Campus Sobral, Federal University of Ceará, Sobral 62010-560, Brazil
3 Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza 60455-900, Brazil
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2016, 18(11), 407; https://doi.org/10.3390/e18110407
Received: 6 September 2016 / Revised: 27 October 2016 / Accepted: 11 November 2016 / Published: 17 November 2016
(This article belongs to the Special Issue Differential Geometrical Theory of Statistics)

## Abstract

In this paper, we propose a generalization of Rényi divergence, and then we investigate its induced geometry. This generalization is given in terms of a φ-function, the same function that is used in the definition of non-parametric φ-families. The properties of φ-functions proved to be crucial in the generalization of Rényi divergence. Assuming appropriate conditions, we verify that the generalized Rényi divergence reduces, in a limiting case, to the φ-divergence. In a generalized statistical manifold, the φ-divergence induces a pair of dual connections $D^{(-1)}$ and $D^{(1)}$. We show that the family of connections $D^{(\alpha)}$ induced by the generalization of Rényi divergence satisfies the relation $D^{(\alpha)} = \frac{1-\alpha}{2} D^{(-1)} + \frac{1+\alpha}{2} D^{(1)}$, with $\alpha \in [-1, 1]$.

## 1. Introduction

Information geometry, the study of statistical models equipped with a differentiable structure, was pioneered by the work of Rao, and gained maturity with the work of Amari and many others [2,3,4]. It has been successfully applied in many different areas, such as statistical inference, machine learning, signal processing, and optimization [4,5]. In appropriate statistical models, the differentiable structure is induced by a (statistical) divergence. The Kullback–Leibler divergence induces a Riemannian metric, called the Fisher–Rao metric, and a pair of dual connections, the exponential and mixture connections. A statistical model endowed with the Fisher–Rao metric is called a (classical) statistical manifold. Amari also considered a family of α-divergences that induce a family of α-connections.
Much research in recent years has focused on the geometry of non-standard statistical models [6,7,8]. These models are defined in terms of a deformed exponential (also called ϕ-exponential). In particular, κ-exponential models and q-exponential families are investigated in [9,10]. Non-parametric (or infinite-dimensional) φ-families were introduced by the authors in [11,12], which generalize exponential families in the non-parametric setting [13,14,15,16]. Based on the similarity between exponential and φ-families, we defined the so-called φ-divergence, with respect to which the Kullback–Leibler divergence is a particular case. Statistical models equipped with a geometric structure induced by φ-divergences, which are called generalized statistical manifolds, are investigated in [17,18]. With respect to these connections, parametric φ-families are dually flat.
The φ-divergence is intrinsically related to the $(\rho, \tau)$-model of Zhang, which was proposed in [19,20], extended to the infinite-dimensional setting in , and explained in more detail in [22,23]. For instance, the metric induced by the φ-divergence and the $(\rho, \tau)$-generalization of the Fisher–Rao metric, for the choices $\rho = \varphi^{-1}$ and $f = \rho^{-1}$, differ by a conformal factor.
Among the many attempts to generalize the Kullback–Leibler divergence, Rényi divergence is one of the most successful, having found many applications. In the present paper, we propose a generalization of Rényi divergence, which we use to define a family of α-connections. This generalization is based on an interpretation of Rényi divergence as a kind of normalizing function. To generalize Rényi divergence, we consider functions satisfying some suitable conditions. To a function for which these conditions hold, we give the name of φ-function. In a limiting case, the generalized Rényi divergence reduces to the φ-divergence. In [17,18], the φ-divergence gives rise to a pair of dual connections $D^{(-1)}$ and $D^{(1)}$. We show that the connection $D^{(\alpha)}$ induced by the generalization of Rényi divergence satisfies the convex combination $D^{(\alpha)} = \frac{1-\alpha}{2} D^{(-1)} + \frac{1+\alpha}{2} D^{(1)}$.
Eguchi investigated a geometry based on a normalizing function similar to the one used in the generalization of Rényi divergence. In that work, results were derived supposing that this normalizing function exists; conditions for its existence were not given. In the present paper, the existence of the normalizing function is ensured by the conditions involved in the definition of φ-functions.
The rest of the paper is organized as follows. In Section 2, φ-functions are introduced and some properties are discussed. The Rényi divergence is generalized in Section 3. We investigate in Section 4 the geometry induced by the generalization of Rényi divergence. Section 4.2 provides evidence of the role of the generalized Rényi divergence in φ-families.

## 2. φ-Functions

Rényi divergence is defined in terms of the exponential function (to be more precise, the logarithm). A way of generalizing Rényi divergence is to replace the exponential function by another function, which satisfies some suitable conditions. To a function for which these conditions hold, we give the name φ-function. In this section, we define and investigate some properties of φ-functions.
Let $( T , Σ , μ )$ be a measure space. Although we do not restrict our analysis to a particular measure space, the reader can think of T as the set of real numbers $R$, Σ as the Borel σ-algebra on $R$, and μ as the Lebesgue measure. We can also consider T to be a discrete set, a case in which μ is the counting measure.
We say that $φ : R → ( 0 , ∞ )$ is a φ-function if the following conditions are satisfied:
(a1)
$φ ( · )$ is convex;
(a2)
$\lim_{u \to -\infty} \varphi(u) = 0$ and $\lim_{u \to \infty} \varphi(u) = \infty$;
(a3)
there exists a measurable function $u 0 : T → ( 0 , ∞ )$ such that
$\int_T \varphi(c + \lambda u_0)\, d\mu < \infty, \quad \text{for all } \lambda > 0,$
for each measurable function $c : T → R$ satisfying $∫ T φ ( c ) d μ = 1$.
Thanks to condition (a3), we can generalize Rényi divergence using φ-functions. These conditions first appeared in , where the authors constructed non-parametric φ-families of probability distributions. We remark that if T is finite, condition (a3) is always satisfied.
Examples of functions $φ : R → ( 0 , ∞ )$ satisfying (a1)–(a3) abound. An example of great relevance is the exponential function $φ ( u ) = exp ( u )$, which satisfies conditions (a1)–(a3) with $u 0 = 1 T$. Another example of φ-function is the Kaniadakis’ κ-exponential [12,27,28].
Example 1.
The Kaniadakis’ κ-exponential $exp κ : R → ( 0 , ∞ )$ for $κ ∈ [ − 1 , 1 ]$ is defined as
$\exp_\kappa(u) = \begin{cases} \bigl(\kappa u + \sqrt{1 + \kappa^2 u^2}\,\bigr)^{1/\kappa}, & \text{if } \kappa \neq 0, \\ \exp(u), & \text{if } \kappa = 0, \end{cases}$
whose inverse is the so-called Kaniadakis’ κ-logarithm $\log_\kappa : (0, \infty) \to \mathbb{R}$, which is given by
$\log_\kappa(u) = \begin{cases} \dfrac{u^\kappa - u^{-\kappa}}{2\kappa}, & \text{if } \kappa \neq 0, \\ \ln(u), & \text{if } \kappa = 0. \end{cases}$
It is clear that $\exp_\kappa(\cdot)$ satisfies (a1) and (a2). Let $u_0 : T \to (0, \infty)$ be any measurable function for which $\int_T \exp_\kappa(u_0)\, d\mu < \infty$. We will show that $u_0$ satisfies expression (1). For any $u \in \mathbb{R}$ and $\alpha \geq 1$, we can write
$\exp_\kappa(\alpha u) = \alpha^{1/|\kappa|} \bigl(|\kappa| u + \sqrt{1/\alpha^2 + |\kappa|^2 u^2}\,\bigr)^{1/|\kappa|} \leq \alpha^{1/|\kappa|} \bigl(|\kappa| u + \sqrt{1 + |\kappa|^2 u^2}\,\bigr)^{1/|\kappa|} = \alpha^{1/|\kappa|} \exp_\kappa(u),$
where we used that $\exp_\kappa(\cdot) = \exp_{-\kappa}(\cdot)$. Then, we conclude that $\int_T \exp_\kappa(\alpha u_0)\, d\mu < \infty$ for all $\alpha \geq 0$. Fix any measurable function $c : T \to \mathbb{R}$ such that $\int_T \exp_\kappa(c)\, d\mu = 1$. For each $\lambda > 0$, we have
$\int_T \exp_\kappa(c + \lambda u_0)\, d\mu \leq \frac{1}{2} \int_T \exp_\kappa(2c)\, d\mu + \frac{1}{2} \int_T \exp_\kappa(2\lambda u_0)\, d\mu \leq 2^{1/|\kappa| - 1} \int_T \exp_\kappa(c)\, d\mu + 2^{1/|\kappa| - 1} \int_T \exp_\kappa(\lambda u_0)\, d\mu < \infty,$
which shows that $exp κ ( · )$ satisfies (a3). Therefore, the Kaniadakis’ κ-exponential $exp κ ( · )$ is an example of φ-function.
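The κ-exponential and κ-logarithm above are easy to check numerically. The sketch below (function names `kexp`/`klog` are ours, not the paper's) verifies that they are inverses and that the scaling bound $\exp_\kappa(\alpha u) \leq \alpha^{1/|\kappa|} \exp_\kappa(u)$ for $\alpha \geq 1$ used in Example 1 holds:

```python
import math

# Kaniadakis kappa-exponential and kappa-logarithm; kexp/klog are our names.

def kexp(u, kappa):
    """exp_kappa(u) = (kappa*u + sqrt(1 + kappa^2 u^2))^(1/kappa), exp(u) if kappa == 0."""
    if kappa == 0:
        return math.exp(u)
    return (kappa * u + math.sqrt(1 + kappa**2 * u**2)) ** (1 / kappa)

def klog(v, kappa):
    """log_kappa(v) = (v^kappa - v^(-kappa)) / (2*kappa), the inverse of exp_kappa."""
    if kappa == 0:
        return math.log(v)
    return (v**kappa - v**(-kappa)) / (2 * kappa)

kappa = 0.5
for u in [-2.0, -0.3, 0.0, 1.7]:
    assert abs(klog(kexp(u, kappa), kappa) - u) < 1e-9   # inverse pair

# Scaling bound from Example 1: exp_kappa(a*u) <= a^(1/|kappa|) * exp_kappa(u), a >= 1.
a = 3.0
for u in [-1.0, 0.0, 2.5]:
    assert kexp(a * u, kappa) <= a ** (1 / abs(kappa)) * kexp(u, kappa) + 1e-12
```

The bound is what makes $\int_T \exp_\kappa(\alpha u_0)\, d\mu$ finite for every $\alpha \geq 0$ once it is finite for $\alpha = 1$.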
The restriction that $∫ T φ ( c ) d μ = 1$ can be weakened, as asserted in the next result.
Lemma 1.
Let $\tilde{c} : T \to \mathbb{R}$ be any measurable function such that $\int_T \varphi(\tilde{c})\, d\mu < \infty$. Then, $\int_T \varphi(\tilde{c} + \lambda u_0)\, d\mu < \infty$ for all $\lambda > 0$.
Proof.
Notice that if $\int_T \varphi(\tilde{c})\, d\mu \geq 1$, then $\int_T \varphi(\tilde{c} - \alpha u_0)\, d\mu = 1$ for some $\alpha \geq 0$. From the definition of $u_0$, it follows that $\int_T \varphi(\tilde{c} + \lambda u_0)\, d\mu = \int_T \varphi(c + (\alpha + \lambda) u_0)\, d\mu < \infty$, where $c = \tilde{c} - \alpha u_0$. Now assume that $\int_T \varphi(\tilde{c})\, d\mu < 1$. Consider any measurable set $A \subseteq T$ with measure $0 < \mu(A) < \mu(T)$. Let $u : T \to [0, \infty)$ be a measurable function supported on A satisfying $\varphi(\tilde{c} + u) 1_A = [\varphi(\tilde{c}) + \alpha] 1_A$, where $\alpha = \bigl(1 - \int_T \varphi(\tilde{c})\, d\mu\bigr) / \mu(A)$. Defining $c = (\tilde{c} + u) 1_A + \tilde{c}\, 1_{T \setminus A}$, we see that $\int_T \varphi(c)\, d\mu = 1$. By the definition of $u_0$, we can write
$\int_T \varphi(\tilde{c} + \lambda u_0)\, d\mu \leq \int_T \varphi(c + \lambda u_0)\, d\mu < \infty, \quad \text{for any } \lambda > 0,$
which is the desired result. ☐
As a consequence of Lemma 1, condition (a3) can be replaced by the following one:
(a3’) There exists a measurable function $u_0 : T \to (0, \infty)$ such that
$\int_T \varphi(c + \lambda u_0)\, d\mu < \infty, \quad \text{for all } \lambda > 0,$
for each measurable function $c : T \to \mathbb{R}$ for which $\int_T \varphi(c)\, d\mu < \infty$.
Without the equivalence between conditions (a3) and (a3’), we could not generalize Rényi divergence in the manner we propose. In fact, φ-functions could be defined directly in terms of (a3’), without mentioning (a3). We chose to begin with (a3) because this condition appeared initially in .
Not all functions $φ : R → ( 0 , ∞ )$, for which conditions (a1) and (a2) hold, satisfy condition (a3). Such a function is given below.
Example 2.
Assume that the underlying measure μ is σ-finite and non-atomic. This is the case of the Lebesgue measure. Let us consider the function
$\varphi(u) = \begin{cases} e^{(u+1)^2/2}, & u \geq 0, \\ e^{u + 1/2}, & u \leq 0, \end{cases}$
which clearly is convex, and satisfies the limits $\lim_{u \to -\infty} \varphi(u) = 0$ and $\lim_{u \to \infty} \varphi(u) = \infty$. Given any measurable function $u_0 : T \to (0, \infty)$, we will find a measurable function $c : T \to \mathbb{R}$ with $\int_T \varphi(c)\, d\mu < \infty$, for which expression (2) is not satisfied.
For each $m ≥ 1$, we define
$v_m(t) := \Bigl( \frac{m \log(2)}{u_0(t)} - \frac{u_0(t)}{2} - 1 \Bigr) 1_{E_m}(t),$
where $E_m = \bigl\{ t \in T : \frac{m \log(2)}{u_0(t)} - \frac{u_0(t)}{2} - 1 > 0 \bigr\}$. Because $v_m \uparrow \infty$, we can find a sub-sequence $\{v_{m_n}\}$ such that
$\int_{E_{m_n}} e^{(v_{m_n} + u_0 + 1)^2/2}\, d\mu \geq 2^n.$
According to (Lemma 8.3 in ), there exists a sub-sequence $w_k = v_{m_{n_k}}$ and pairwise disjoint sets $A_k \subseteq E_{m_{n_k}}$ for which
$\int_{A_k} e^{(w_k + u_0 + 1)^2/2}\, d\mu = 1.$
Let us define $c = \bar{c}\, 1_{T \setminus A} + \sum_{k=1}^\infty w_k 1_{A_k}$, where $A = \bigcup_{k=1}^\infty A_k$ and $\bar{c}$ is any measurable function such that $\varphi(\bar{c}(t)) > 0$ for $t \in T \setminus A$ and $\int_{T \setminus A} \varphi(\bar{c})\, d\mu < \infty$. Observing that
$e^{(w_k(t) + u_0(t) + 1)^2/2} = 2^{m_{n_k}} e^{(w_k(t) + 1)^2/2}, \quad \text{for } t \in A_k,$
we get
$\int_{A_k} e^{(w_k + 1)^2/2}\, d\mu = \frac{1}{2^{m_{n_k}}}, \quad \text{for every } k \geq 1.$
Then, we can write
$\int_T \varphi(c)\, d\mu = \int_{T \setminus A} \varphi(\bar{c})\, d\mu + \sum_{k=1}^\infty \int_{A_k} e^{(w_k + 1)^2/2}\, d\mu = \int_{T \setminus A} \varphi(\bar{c})\, d\mu + \sum_{k=1}^\infty \frac{1}{2^{m_{n_k}}} < \infty.$
On the other hand,
$\int_T \varphi(c + u_0)\, d\mu = \int_{T \setminus A} \varphi(\bar{c} + u_0)\, d\mu + \sum_{k=1}^\infty \int_{A_k} e^{(u_0 + w_k + 1)^2/2}\, d\mu \geq \sum_{k=1}^\infty 1 = \infty,$
which shows that (2) is not satisfied.

## 3. Generalization of Rényi Divergence

In this section, we provide a generalization of Rényi divergence, which is given in terms of a φ-function. This generalization also depends on a parameter $α ∈ [ − 1 , 1 ]$; for $α = ± 1$, it is defined as a limit. Supposing that the underlying φ-function is continuously differentiable, we show that this limit exists and results in the φ-divergence . In what follows, all probability distributions are assumed to have positive density. In other words, they belong to the collection
$\mathcal{P}_\mu = \Bigl\{ p \in L^0 : \int_T p\, d\mu = 1 \text{ and } p > 0 \Bigr\},$
where $L^0$ is the space of all real-valued, measurable functions on T, with equality μ-a.e. (μ-almost everywhere).
The Rényi divergence of order $α ∈ ( − 1 , 1 )$ between two probability distributions p and q in $P μ$ is defined as
$D_R^{(\alpha)}(p \| q) = \frac{4}{\alpha^2 - 1} \log \int_T p^{\frac{1-\alpha}{2}} q^{\frac{1+\alpha}{2}}\, d\mu.$
For $α = ± 1$, the Rényi divergence is defined by taking a limit:
$D_R^{(-1)}(p \| q) = \lim_{\alpha \downarrow -1} D_R^{(\alpha)}(p \| q),$
$D_R^{(1)}(p \| q) = \lim_{\alpha \uparrow 1} D_R^{(\alpha)}(p \| q).$
Under some conditions, the limits in (5) and (6) are finite-valued, and converge to the Kullback–Leibler divergence. In other words,
$D_R^{(-1)}(p \| q) = D_R^{(1)}(q \| p) = D_{\mathrm{KL}}(p \| q) < \infty,$
where $D_{\mathrm{KL}}(p \| q)$ denotes the Kullback–Leibler divergence between p and q, which is given by
$D_{\mathrm{KL}}(p \| q) = \int_T p \log \frac{p}{q}\, d\mu.$
These conditions are stated in Proposition 1, given at the end of this section, for the case involving the generalized Rényi divergence.
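For a finite T with the counting measure, the limiting behavior of (4) can be checked directly. The sketch below (function names are ours) implements the Rényi divergence of order α and verifies that it approaches the Kullback–Leibler divergence as $\alpha \downarrow -1$, and the reversed Kullback–Leibler divergence as $\alpha \uparrow 1$:

```python
import math

# Discrete sketch of Eq. (4) and the Kullback-Leibler divergence; renyi_div and
# kl_div are our names, with p and q strictly positive probability vectors.

def renyi_div(p, q, alpha):
    """D_R^(alpha)(p||q) = 4/(alpha^2 - 1) * log sum_t p^((1-a)/2) q^((1+a)/2)."""
    s = sum(pi ** ((1 - alpha) / 2) * qi ** ((1 + alpha) / 2) for pi, qi in zip(p, q))
    return 4 / (alpha**2 - 1) * math.log(s)

def kl_div(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# Limits alpha -> -1 and alpha -> 1 recover KL(p||q) and KL(q||p), respectively.
assert abs(renyi_div(p, q, -1 + 1e-6) - kl_div(p, q)) < 1e-4
assert abs(renyi_div(p, q, 1 - 1e-6) - kl_div(q, p)) < 1e-4

# Duality of Eq. (4): D_R^(alpha)(p||q) = D_R^(-alpha)(q||p).
assert abs(renyi_div(p, q, 0.3) - renyi_div(q, p, -0.3)) < 1e-12
```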
The Rényi divergence in its standard form is given by
$D^{(\alpha)}(p \| q) = \frac{1}{\alpha - 1} \log \int_T p^\alpha q^{1-\alpha}\, d\mu, \quad \text{for } \alpha \in (0, 1).$
Expression (4) is related to this form by
$D_R^{(\alpha)}(p \| q) = \frac{2}{1 - \alpha} D^{((1-\alpha)/2)}(p \| q).$
Beyond the change of variables, which results in α ranging in $[ − 1 , 1 ]$, expressions (4) and (7) differ by the factor $2 / ( 1 − α )$. We opted to insert the term $2 / ( 1 − α )$ so that some kind of symmetry could be maintained when the limits $α ↓ − 1$ and $α ↑ 1$ are considered. In addition, the geometry induced by the version (4) conforms with Amari’s notation .
The Rényi divergence $D R ( α ) ( · ∥ · )$ can be defined for every $α ∈ R$. However, for $α ∉ ( − 1 , 1 )$, the expression (4) may not be finite-valued for every p and q in $P μ$. To avoid some technicalities, we just consider $α ∈ [ − 1 , 1 ]$.
Given p and q in $P μ$, let us define
$\kappa(\alpha) = -\log \int_T p^{\frac{1-\alpha}{2}} q^{\frac{1+\alpha}{2}}\, d\mu, \quad \text{for } \alpha \in [-1, 1],$
which can be used to express the Rényi divergence as
$D_R^{(\alpha)}(p \| q) = \frac{4}{1 - \alpha^2} \kappa(\alpha), \quad \text{for } \alpha \in (-1, 1).$
The function $κ ( α )$, which depends on p and q, can be defined as the unique non-negative real number for which
$\int_T \exp\Bigl( \frac{1-\alpha}{2} \ln(p) + \frac{1+\alpha}{2} \ln(q) + \kappa(\alpha) \Bigr) d\mu = 1.$
The function $\kappa(\alpha)$ plays the role of a normalizing term. The generalization of Rényi divergence that we propose is based on the interpretation of $\kappa(\alpha)$ given in (8). Instead of the exponential function, we consider a φ-function in (8).
Fix any φ-function $φ : R → ( 0 , ∞ )$. Given any p and q in $P μ$, we take $κ ( α ) = κ ( α ; p , q ) ≥ 0$ so that
$\int_T \varphi\Bigl( \frac{1-\alpha}{2} \varphi^{-1}(p) + \frac{1+\alpha}{2} \varphi^{-1}(q) + \kappa(\alpha) u_0 \Bigr) d\mu = 1,$
or, in other words, the term inside the integral is a probability distribution in $P μ$. The existence and uniqueness of $κ ( α )$ as defined in (9) is guaranteed by condition (a3’).
We define a generalization of the Rényi divergence of order $α ∈ ( − 1 , 1 )$ as
$D_\varphi^{(\alpha)}(p \| q) = \frac{4}{1 - \alpha^2} \kappa(\alpha).$
For $α = ± 1$, this generalization is defined as a limit:
$D_\varphi^{(-1)}(p \| q) = \lim_{\alpha \downarrow -1} D_\varphi^{(\alpha)}(p \| q),$
$D_\varphi^{(1)}(p \| q) = \lim_{\alpha \uparrow 1} D_\varphi^{(\alpha)}(p \| q).$
The cases $\alpha = \pm 1$ are related to a generalization of the Kullback–Leibler divergence, the so-called φ-divergence, which was introduced by the authors in . (It was pointed out to us by an anonymous referee that this form of divergence is a special case of the $(\rho, \tau)$-divergence for $\rho = \varphi^{-1}$ and $f = \rho^{-1}$ (see Section 3.5 in ), apart from a conformal factor, which is the denominator of (13).) The φ-divergence is given by
$D_\varphi(p \| q) = \frac{\displaystyle \int_T \frac{\varphi^{-1}(p) - \varphi^{-1}(q)}{(\varphi^{-1})'(p)}\, d\mu}{\displaystyle \int_T \frac{u_0}{(\varphi^{-1})'(p)}\, d\mu}.$
Under some conditions, the limit in (11) or (12) is finite-valued and converges to the φ-divergence:
$D_\varphi^{(-1)}(p \| q) = D_\varphi^{(1)}(q \| p) = D_\varphi(p \| q) < \infty.$
To show (14), we make use of the following result.
Lemma 2.
Assume that $\varphi(\cdot)$ is continuously differentiable. If, for $\alpha_0, \alpha_1 \in \mathbb{R}$, the expression
$\int_T \varphi\Bigl( \frac{1-\alpha}{2} \varphi^{-1}(p) + \frac{1+\alpha}{2} \varphi^{-1}(q) \Bigr) d\mu < \infty$
is satisfied for all $\alpha \in [\alpha_0, \alpha_1]$, then the derivative of $\kappa(\alpha)$ exists at any $\alpha \in (\alpha_0, \alpha_1)$, and is given by
$\frac{\partial \kappa}{\partial \alpha}(\alpha) = -\frac{1}{2} \frac{\displaystyle \int_T [\varphi^{-1}(q) - \varphi^{-1}(p)]\, \varphi'(c_\alpha)\, d\mu}{\displaystyle \int_T \varphi'(c_\alpha)\, u_0\, d\mu},$
where $c_\alpha = \frac{1-\alpha}{2} \varphi^{-1}(p) + \frac{1+\alpha}{2} \varphi^{-1}(q) + \kappa(\alpha) u_0$.
Proof.
For $α ∈ ( α 0 , α 1 )$ and $κ > 0$, define
$g(\alpha, \kappa) = \int_T \varphi\Bigl( \frac{1-\alpha}{2} \varphi^{-1}(p) + \frac{1+\alpha}{2} \varphi^{-1}(q) + \kappa u_0 \Bigr) d\mu.$
The function $κ ( α )$ is defined implicitly by $g ( α , κ ( α ) ) = 1$. If we show that
(i)
the function $g(\alpha, \kappa)$ is continuous in a neighborhood of $(\alpha, \kappa(\alpha))$,
(ii)
the partial derivatives $\partial g / \partial \alpha$ and $\partial g / \partial \kappa$ exist and are continuous at $(\alpha, \kappa(\alpha))$,
(iii)
and $\frac{\partial g}{\partial \kappa}(\alpha, \kappa(\alpha)) > 0$,
then by the Implicit Function Theorem $κ ( α )$ is differentiable at $α ∈ ( α 0 , α 1 )$, and
$\frac{\partial \kappa}{\partial \alpha}(\alpha) = -\frac{(\partial g / \partial \alpha)(\alpha, \kappa(\alpha))}{(\partial g / \partial \kappa)(\alpha, \kappa(\alpha))}.$
We begin by verifying that $g(\alpha, \kappa)$ is continuous. For fixed $\alpha \in (\alpha_0, \alpha_1)$ and $\kappa > 0$, set $\kappa_0 = 2\kappa$. Denoting $A = \{t \in T : \varphi^{-1}(q(t)) > \varphi^{-1}(p(t))\}$, we can write
$\varphi\Bigl( \frac{1-\beta}{2} \varphi^{-1}(p) + \frac{1+\beta}{2} \varphi^{-1}(q) + \lambda u_0 \Bigr) \leq \varphi\Bigl( \varphi^{-1}(p) + \frac{1+\beta}{2} [\varphi^{-1}(q) - \varphi^{-1}(p)] + \kappa_0 u_0 \Bigr) \leq \varphi\Bigl( \varphi^{-1}(p) + \frac{1+\alpha_1}{2} [\varphi^{-1}(q) - \varphi^{-1}(p)] + \kappa_0 u_0 \Bigr) 1_A + \varphi\Bigl( \varphi^{-1}(p) + \frac{1+\alpha_0}{2} [\varphi^{-1}(q) - \varphi^{-1}(p)] + \kappa_0 u_0 \Bigr) 1_{T \setminus A},$
for every $β ∈ ( α 0 , α 1 )$ and $λ ∈ ( 0 , κ 0 )$. Because the function on the right-hand side of (18) is integrable, we can apply the Dominated Convergence Theorem to conclude that
$\lim_{(\beta, \lambda) \to (\alpha, \kappa)} g(\beta, \lambda) = g(\alpha, \kappa).$
Now, we will show that the derivative of $g ( α , κ )$ with respect to α exists and is continuous. Consider the difference
$\frac{g(\gamma, \lambda) - g(\beta, \lambda)}{\gamma - \beta} = \int_T \frac{1}{\gamma - \beta} \Bigl[ \varphi\Bigl( c_\beta + \frac{\gamma - \beta}{2} [\varphi^{-1}(q) - \varphi^{-1}(p)] + \lambda u_0 \Bigr) - \varphi(c_\beta + \lambda u_0) \Bigr] d\mu,$
where $c_\beta = \frac{1-\beta}{2} \varphi^{-1}(p) + \frac{1+\beta}{2} \varphi^{-1}(q)$. Represent by $f_{\beta,\gamma,\lambda}$ the function inside the integral sign in (19). For fixed $\alpha \in (\alpha_0, \alpha_1)$ and $\kappa > 0$, denote $\bar{\alpha}_0 = (\alpha_0 + \alpha)/2$, $\bar{\alpha}_1 = (\alpha + \alpha_1)/2$, and $\kappa_0 = 2\kappa$. Because $\varphi(\cdot)$ is convex and increasing, it follows that
$|f_{\beta,\gamma,\lambda}| \leq f_{\bar{\alpha}_1, \alpha_1, \kappa_0} 1_A - f_{\bar{\alpha}_0, \alpha_0, \kappa_0} 1_{T \setminus A} =: f, \quad \text{for all } \beta, \gamma \in (\bar{\alpha}_0, \bar{\alpha}_1) \text{ and } \lambda \in (0, \kappa_0),$
where $A = \{t \in T : \varphi^{-1}(q(t)) > \varphi^{-1}(p(t))\}$. Observing that f is integrable, we can use the Dominated Convergence Theorem to get
$\lim_{\gamma \to \beta} \int_T f_{\beta,\gamma,\lambda}\, d\mu = \int_T \lim_{\gamma \to \beta} f_{\beta,\gamma,\lambda}\, d\mu,$
and then
$\frac{\partial g}{\partial \alpha}(\beta, \lambda) = \frac{1}{2} \int_T [\varphi^{-1}(q) - \varphi^{-1}(p)]\, \varphi'(c_\beta + \lambda u_0)\, d\mu.$
For $\beta \in (\bar{\alpha}_0, \bar{\alpha}_1)$ and $\lambda \in (0, \kappa_0)$, the function inside the integral sign in (20) is dominated by f. As a result, a second use of the Dominated Convergence Theorem shows that $\partial g / \partial \alpha$ is continuous at $(\alpha, \kappa)$:
$\lim_{(\beta, \lambda) \to (\alpha, \kappa)} \frac{\partial g}{\partial \alpha}(\beta, \lambda) = \frac{\partial g}{\partial \alpha}(\alpha, \kappa).$
Using similar arguments, one can show that $\frac{\partial g}{\partial \kappa}(\alpha, \kappa)$ exists and is continuous at any $\alpha \in (\alpha_0, \alpha_1)$ and $\kappa > 0$, and is given by
$\frac{\partial g}{\partial \kappa}(\alpha, \kappa) = \int_T u_0\, \varphi'(c_\alpha + \kappa u_0)\, d\mu.$
Clearly, expression (21) implies that $\frac{\partial g}{\partial \kappa}(\alpha, \kappa) > 0$ for all $\alpha \in (\alpha_0, \alpha_1)$ and $\kappa > 0$.
We proved that items (i)–(iii) are satisfied. As a consequence, the derivative of $\kappa(\alpha)$ exists at any $\alpha \in (\alpha_0, \alpha_1)$. Expression (16) for the derivative of $\kappa(\alpha)$ follows from (17), (20) and (21). ☐
As an immediate consequence of Lemma 2, we get the proposition below.
Proposition 1.
Assume that $φ ( · )$ is continuously differentiable.
(a)
If, for some $α 0 < − 1$, expression (15) is satisfied for all $α ∈ [ α 0 , − 1 )$, then
$D_\varphi^{(-1)}(p \| q) = \lim_{\alpha \downarrow -1} D_\varphi^{(\alpha)}(p \| q) = 2 \frac{\partial \kappa}{\partial \alpha}(-1) = D_\varphi(p \| q) < \infty.$
(b)
If, for some $α 1 > 1$, expression (15) is satisfied for all $α ∈ ( 1 , α 1 ]$, then
$D_\varphi^{(1)}(p \| q) = \lim_{\alpha \uparrow 1} D_\varphi^{(\alpha)}(p \| q) = -2 \frac{\partial \kappa}{\partial \alpha}(1) = D_\varphi(q \| p) < \infty.$
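The defining condition (9) determines $\kappa(\alpha)$ implicitly, so in practice it must be found numerically. The sketch below (our own setup, on a finite T with counting measure) solves (9) by bisection, which works because the integral is increasing in $\kappa$ and is at most 1 at $\kappa = 0$; choosing $\varphi = \exp$ and $u_0 = 1$ it also checks that the generalized divergence reduces to the Rényi divergence (4):

```python
import math

# kappa(alpha) from the normalization (9), found by bisection on a finite T.
# With phi = exp and u_0 = 1 the result must match the Renyi divergence (4).

def kappa_alpha(p, q, alpha, phi, phi_inv, u0):
    def total(k):
        return sum(phi((1 - alpha) / 2 * phi_inv(pi) + (1 + alpha) / 2 * phi_inv(qi) + k * u)
                   for pi, qi, u in zip(p, q, u0))
    lo, hi = 0.0, 1.0
    while total(hi) < 1.0:          # total is increasing in k and total(0) <= 1
        hi *= 2
    for _ in range(200):            # bisection on total(k) = 1
        mid = (lo + hi) / 2
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gen_renyi(p, q, alpha, phi, phi_inv, u0):
    """D_phi^(alpha)(p||q) = 4/(1 - alpha^2) * kappa(alpha)."""
    return 4 / (1 - alpha**2) * kappa_alpha(p, q, alpha, phi, phi_inv, u0)

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
u0 = [1.0, 1.0, 1.0]
alpha = 0.3

d_gen = gen_renyi(p, q, alpha, math.exp, math.log, u0)
s = sum(pi ** ((1 - alpha) / 2) * qi ** ((1 + alpha) / 2) for pi, qi in zip(p, q))
d_renyi = 4 / (alpha**2 - 1) * math.log(s)
assert abs(d_gen - d_renyi) < 1e-8
```

Replacing `math.exp`/`math.log` by another φ-function and its inverse (e.g., the Kaniadakis pair) gives the genuinely generalized divergence.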

## 4. Generalized Statistical Manifolds

Statistical manifolds consist of a collection of probability distributions endowed with a metric and α-connections, which are defined in terms of the derivative of $l(t; \theta) = \log p(t; \theta)$. In a generalized statistical manifold, the metric and connections are defined in terms of $f(t; \theta) = \varphi^{-1}(p(t; \theta))$. Instead of the logarithm, we consider the inverse $\varphi^{-1}(\cdot)$ of a φ-function. Generalized statistical manifolds were introduced by the authors in [17,18]. Among examples of generalized statistical manifolds, (parametric) φ-families of probability distributions are of greatest importance. The non-parametric counterpart was investigated in [11,12]. The metric in φ-families can be defined as the Hessian of a function; i.e., φ-families are Hessian manifolds . In [17,18], the φ-divergence gives rise to a pair of dual connections $D^{(-1)}$ and $D^{(1)}$; then, for $\alpha \in (-1, 1)$, the α-connection $D^{(\alpha)}$ is defined as the convex combination $D^{(\alpha)} = \frac{1-\alpha}{2} D^{(-1)} + \frac{1+\alpha}{2} D^{(1)}$. In the present paper, we show that the connection induced by $D_\varphi^{(\alpha)}(\cdot \| \cdot)$, the generalization of Rényi divergence, corresponds to $D^{(\alpha)}$.

#### 4.1. Definitions

Let $\varphi : \mathbb{R} \to (0, \infty)$ be a φ-function. A generalized statistical manifold $\mathcal{P} = \{p(t; \theta) : \theta \in \Theta\}$ is a collection of probability distributions $p_\theta(t) := p(t; \theta)$, indexed by parameters $\theta = (\theta^1, \dots, \theta^n) \in \Theta$ in a one-to-one relation, such that
(m1)
Θ is a domain (open and connected set) in $R n$;
(m2)
$p ( t ; θ )$ is differentiable with respect to θ;
(m3)
the matrix $g = ( g i j )$ defined by
$g_{ij} = -E'_\theta \Bigl[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial \theta^i \partial \theta^j} \Bigr],$
is positive definite at each $θ ∈ Θ$, where
$E'_\theta[\,\cdot\,] = \frac{\displaystyle \int_T (\,\cdot\,)\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu}{\displaystyle \int_T u_0\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu};$
(m4)
the operations of integration with respect to μ and differentiation with respect to $θ i$ commute in all calculations found below, which are related to the metric and connections.
The matrix $g = (g_{ij})$ equips $\mathcal{P}$ with a metric. By the chain rule, the tensor related to $g = (g_{ij})$ is invariant under change of coordinates. The (classical) statistical manifold is a particular case in which $\varphi(u) = \exp(u)$ and $u_0 = 1_T$.
We introduce a notation similar to Equation (23) that involves higher-order derivatives of $\varphi(\cdot)$. For each $n \geq 1$, we define
$E^{(n)}_\theta[\,\cdot\,] = \frac{\displaystyle \int_T (\,\cdot\,)\, \varphi^{(n)}(\varphi^{-1}(p_\theta))\, d\mu}{\displaystyle \int_T u_0\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu}.$
We also use $E'_\theta[\,\cdot\,]$, $E''_\theta[\,\cdot\,]$ and $E'''_\theta[\,\cdot\,]$ to denote $E^{(n)}_\theta[\,\cdot\,]$ for $n = 1, 2, 3$, respectively. The notation (24) appears in expressions related to the metric and connections.
Using property (m4), we can find an alternate expression for $g_{ij}$ as well as an identification involving tangent spaces. The matrix $g = (g_{ij})$ can be equivalently defined by
$g_{ij} = E''_\theta \Bigl[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j} \Bigr].$
As a consequence of this equivalence, the tangent space $T_{p_\theta} \mathcal{P}$ can be identified with $\widetilde{T}_{p_\theta} \mathcal{P}$, the vector space spanned by $\frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i}$, and endowed with the inner product $\langle \widetilde{X}, \widetilde{Y} \rangle_\theta := E''_\theta[\widetilde{X} \widetilde{Y}]$. The mapping
$\sum_i a^i \frac{\partial}{\partial \theta^i} \mapsto \sum_i a^i \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i}$
defines an isometry between $T_{p_\theta} \mathcal{P}$ and $\widetilde{T}_{p_\theta} \mathcal{P}$.
To verify (25), we differentiate $\int_T p_\theta\, d\mu = 1$ with respect to $\theta^i$, to get
$0 = \frac{\partial}{\partial \theta^i} \int_T p_\theta\, d\mu = \int_T \frac{\partial}{\partial \theta^i} \varphi(\varphi^{-1}(p_\theta))\, d\mu = \int_T \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i}\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu.$
Now, differentiating with respect to $\theta^j$, we obtain
$0 = \int_T \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial \theta^i \partial \theta^j}\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu + \int_T \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j}\, \varphi''(\varphi^{-1}(p_\theta))\, d\mu,$
and then (25) follows. In view of (26), we notice that every vector $\widetilde{X}$ belonging to $\widetilde{T}_{p_\theta} \mathcal{P}$ satisfies $E'_\theta[\widetilde{X}] = 0$.
The metric $g = (g_{ij})$ gives rise to a Levi–Civita connection ∇ (i.e., a torsion-free, metric connection), whose corresponding Christoffel symbols $\Gamma_{ijk}$ are given by
$\Gamma_{ijk} := \frac{1}{2} \Bigl( \frac{\partial g_{ki}}{\partial \theta^j} + \frac{\partial g_{kj}}{\partial \theta^i} - \frac{\partial g_{ij}}{\partial \theta^k} \Bigr).$
Using expression (25) to calculate the derivatives in (27), we can express
$\Gamma_{ijk} = E''_\theta \Bigl[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial \theta^i \partial \theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^k} \Bigr] + \frac{1}{2} E'''_\theta \Bigl[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^k} \Bigr] - \frac{1}{2} E''_\theta \Bigl[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^k} \Bigr] E''_\theta \Bigl[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j} \Bigr] - \frac{1}{2} E''_\theta \Bigl[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^k} \Bigr] E''_\theta \Bigl[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \Bigr] + \frac{1}{2} E''_\theta \Bigl[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j} \Bigr] E''_\theta \Bigl[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^k} \Bigr].$
As we will show later, the Levi–Civita connection ∇ corresponds to the connection derived from the divergence $D φ ( α ) ( · ∥ · )$ with $α = 0$.

#### 4.2. φ-Families

Let $c : T → R$ be a measurable function for which $p = φ ( c )$ is a probability density in $P μ$. Fix measurable functions $u 1 , ⋯ , u n : T → R$. A (parametric) φ-family $F p = { p θ : θ ∈ Θ }$, centered at $p = φ ( c )$, is a set of probability distributions in $P μ$, whose members can be written in the form
$p_\theta := \varphi\Bigl( c + \sum_{i=1}^n \theta^i u_i - \psi(\theta) u_0 \Bigr), \quad \text{for each } \theta = (\theta^i) \in \Theta,$
where $ψ : Θ → [ 0 , ∞ )$ is a normalizing function, which is introduced so that expression (28) defines a probability distribution belonging to $P μ$.
The functions $u 1 , ⋯ , u n$ are not arbitrary. They are chosen to satisfy the following assumptions:
(i)
$u 0 , u 1 , ⋯ , u n$ are linearly independent,
(ii)
$∫ T u i φ ′ ( c ) d μ = 0$, and
(iii)
there exists $ε > 0$ such that $∫ T φ ( c + λ u i ) d μ < ∞$, for all $λ ∈ ( − ε , ε )$.
Moreover, the domain $\Theta \subseteq \mathbb{R}^n$ is defined as the set of all vectors $\theta = (\theta^i)$ for which
$\int_T \varphi\Bigl( c + \lambda \sum_{i=1}^n \theta^i u_i \Bigr) d\mu < \infty, \quad \text{for some } \lambda > 1.$
Condition (i) implies that the mapping defined by (28) is one-to-one. Assumption (ii) makes ψ a non-negative function. Indeed, by the convexity of $\varphi(\cdot)$, along with (ii), we can write
$\int_T \varphi(c)\, d\mu = \int_T \Bigl[ \varphi(c) + \sum_{i=1}^n \theta^i u_i\, \varphi'(c) \Bigr] d\mu \leq \int_T \varphi\Bigl( c + \sum_{i=1}^n \theta^i u_i \Bigr) d\mu,$
which implies $\psi(\theta) \geq 0$. By condition (iii), the domain Θ is an open neighborhood of the origin. If the set T is finite, condition (iii) is always satisfied. One can show that the domain Θ is open and convex. Moreover, the normalizing function ψ is also convex (or strictly convex if $\varphi(\cdot)$ is strictly convex). Conditions (ii) and (iii) also appear in the definition of non-parametric φ-families. For further details, we refer to [11,12].
In a φ-family $\mathcal{F}_p$, the matrix $(g_{ij})$ given by (22) or (25) can be expressed as the Hessian of ψ. If $\varphi(\cdot)$ is strictly convex, then $(g_{ij})$ is positive definite. From
$\frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} = u_i - \frac{\partial \psi}{\partial \theta^i} u_0, \qquad -\frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial \theta^i \partial \theta^j} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j} u_0,$
it follows that $g_{ij} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}$, since $E'_\theta[u_0] = 1$.
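The Hessian identity can be tested numerically in the classical case $\varphi = \exp$, $u_0 = 1_T$ on a finite T, where $\psi$ is the log-normalizer and $g_{ij}$ reduces to the covariance of $u_i, u_j$ under $p_\theta$. The setup below (our own toy example, not from the paper) compares a finite-difference Hessian of ψ with that covariance:

```python
import math

# Toy phi-family with phi = exp and u_0 = 1 on T = {0,1,2,3} (an exponential
# family): psi is the log-normalizer; its Hessian equals Cov_theta(u_i, u_j).

T = range(4)
c = [math.log(0.25)] * 4                      # p = phi(c) is the uniform density
u = [[1.0, -1.0, 0.0, 0.0],                   # u_1, u_2: zero mean under p and
     [0.0, 1.0, -1.0, 0.0]]                   # linearly independent with u_0 = 1

def psi(theta):
    return math.log(sum(math.exp(c[t] + sum(th * ui[t] for th, ui in zip(theta, u)))
                        for t in T))

def p_theta(theta):
    s = psi(theta)
    return [math.exp(c[t] + sum(th * ui[t] for th, ui in zip(theta, u)) - s) for t in T]

theta = [0.3, -0.2]
pt = p_theta(theta)
assert abs(sum(pt) - 1.0) < 1e-12             # psi normalizes p_theta
assert abs(psi([0.0, 0.0])) < 1e-12 and psi(theta) >= 0.0

h = 1e-4                                      # central finite-difference Hessian
for i in range(2):
    for j in range(2):
        tpp = theta[:]; tpp[i] += h; tpp[j] += h
        tpm = theta[:]; tpm[i] += h; tpm[j] -= h
        tmp = theta[:]; tmp[i] -= h; tmp[j] += h
        tmm = theta[:]; tmm[i] -= h; tmm[j] -= h
        hess = (psi(tpp) - psi(tpm) - psi(tmp) + psi(tmm)) / (4 * h * h)
        mean_i = sum(p * u[i][t] for t, p in zip(T, pt))
        mean_j = sum(p * u[j][t] for t, p in zip(T, pt))
        cov = sum(p * (u[i][t] - mean_i) * (u[j][t] - mean_j) for t, p in zip(T, pt))
        assert abs(hess - cov) < 1e-6         # g_ij = Hessian of psi
```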
The next two results show how the generalization of Rényi divergence and the φ-divergence are related to the normalizing function in φ-families.
Proposition 2.
In a φ-family $F p$, the generalization of Rényi divergence for $α ∈ ( − 1 , 1 )$ can be expressed in terms of the normalizing function ψ as follows:
$D_\varphi^{(\alpha)}(p_\theta \| p_\vartheta) = \frac{2}{1 + \alpha} \psi(\theta) + \frac{2}{1 - \alpha} \psi(\vartheta) - \frac{4}{1 - \alpha^2} \psi\Bigl( \frac{1-\alpha}{2} \theta + \frac{1+\alpha}{2} \vartheta \Bigr),$
for all $θ , ϑ ∈ Θ$.
Proof.
Recall the definition of $κ ( α )$ as the real number for which
$\int_T \varphi\Bigl( \frac{1-\alpha}{2} \varphi^{-1}(p_\theta) + \frac{1+\alpha}{2} \varphi^{-1}(p_\vartheta) + \kappa(\alpha) u_0 \Bigr) d\mu = 1.$
Using expression (28) for probability distributions in $F p$, we can write
$\frac{1-\alpha}{2} \varphi^{-1}(p_\theta) + \frac{1+\alpha}{2} \varphi^{-1}(p_\vartheta) + \kappa(\alpha) u_0 = c + \sum_{i=1}^n \Bigl( \frac{1-\alpha}{2} \theta^i + \frac{1+\alpha}{2} \vartheta^i \Bigr) u_i - \Bigl[ \frac{1-\alpha}{2} \psi(\theta) + \frac{1+\alpha}{2} \psi(\vartheta) - \kappa(\alpha) \Bigr] u_0 = c + \sum_{i=1}^n \Bigl( \frac{1-\alpha}{2} \theta^i + \frac{1+\alpha}{2} \vartheta^i \Bigr) u_i - \psi\Bigl( \frac{1-\alpha}{2} \theta + \frac{1+\alpha}{2} \vartheta \Bigr) u_0.$
The last equality is a consequence of the domain Θ being convex. Thus, it follows that
$\kappa(\alpha) = \frac{1-\alpha}{2} \psi(\theta) + \frac{1+\alpha}{2} \psi(\vartheta) - \psi\Bigl( \frac{1-\alpha}{2} \theta + \frac{1+\alpha}{2} \vartheta \Bigr).$
By the definition of $D φ ( α ) ( · ∥ · )$, we get (29). ☐
Proposition 3.
In a φ-family $F p$, the φ-divergence is related to the normalizing function ψ by the equality
$D_\varphi(p_\theta \| p_\vartheta) = \psi(\vartheta) - \psi(\theta) - \nabla \psi(\theta) \cdot (\vartheta - \theta),$
for all $θ , ϑ ∈ Θ$.
Proof.
To show (30), we use
$\frac{\partial \psi}{\partial \theta^i}(\theta) = \frac{\displaystyle \int_T u_i\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu}{\displaystyle \int_T u_0\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu},$
which is a consequence of (Lemma 10 in ). In view of $(\varphi^{-1})'(u) = 1/\varphi'(\varphi^{-1}(u))$, expression (13) with $p = p_\theta$ and $q = p_\vartheta$ results in
$D_\varphi(p_\theta \| p_\vartheta) = \frac{\displaystyle \int_T [\varphi^{-1}(p_\theta) - \varphi^{-1}(p_\vartheta)]\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu}{\displaystyle \int_T u_0\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu}.$
Inserting into (31) the difference
$\varphi^{-1}(p_\theta) - \varphi^{-1}(p_\vartheta) = \Bigl( c + \sum_{i=1}^n \theta^i u_i - \psi(\theta) u_0 \Bigr) - \Bigl( c + \sum_{i=1}^n \vartheta^i u_i - \psi(\vartheta) u_0 \Bigr) = \psi(\vartheta) u_0 - \psi(\theta) u_0 - \sum_{i=1}^n (\vartheta^i - \theta^i) u_i,$
we get expression (30). ☐
In Proposition 2, the expression on the right-hand side of Equation (29) defines a divergence on its own, which was investigated by Jun Zhang in . Proposition 3 asserts that the φ-divergence $D_\varphi(p_\theta \| p_\vartheta)$ coincides with the Bregman divergence [31,32] associated with the normalizing function ψ for the points ϑ and θ in Θ. Because ψ is convex and attains a minimum at $\theta = 0$, it follows that $\frac{\partial \psi}{\partial \theta^i}(\theta) = 0$ at $\theta = 0$. As a result, equality (30) reduces to $D_\varphi(p \| p_\theta) = \psi(\theta)$.
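Proposition 3 can be checked numerically in the classical case $\varphi = \exp$, $u_0 = 1_T$ on a finite T, where the φ-divergence is the Kullback–Leibler divergence and ψ the log-normalizer. The sketch below (our own toy setup) verifies that the Bregman divergence of ψ reproduces $D_{\mathrm{KL}}(p_\theta \| p_\vartheta)$:

```python
import math

# Bregman-divergence check of Proposition 3 for phi = exp, u_0 = 1 on a finite T:
# KL(p_theta || p_vartheta) = psi(vartheta) - psi(theta) - grad psi(theta).(vartheta - theta).

T = range(4)
c = [math.log(0.25)] * 4
u = [[1.0, -1.0, 0.0, 0.0], [0.0, 1.0, -1.0, 0.0]]

def psi(theta):
    return math.log(sum(math.exp(c[t] + sum(th * ui[t] for th, ui in zip(theta, u)))
                        for t in T))

def p_theta(theta):
    s = psi(theta)
    return [math.exp(c[t] + sum(th * ui[t] for th, ui in zip(theta, u)) - s) for t in T]

def grad_psi(theta, h=1e-6):
    """Central-difference gradient of the normalizing function psi."""
    g = []
    for i in range(len(theta)):
        tp = theta[:]; tp[i] += h
        tm = theta[:]; tm[i] -= h
        g.append((psi(tp) - psi(tm)) / (2 * h))
    return g

theta, varth = [0.3, -0.2], [-0.1, 0.4]
pt, pv = p_theta(theta), p_theta(varth)

kl = sum(a * math.log(a / b) for a, b in zip(pt, pv))
bregman = psi(varth) - psi(theta) - sum(
    gi * (vi - ti) for gi, vi, ti in zip(grad_psi(theta), varth, theta))
assert abs(kl - bregman) < 1e-8
```

With $\theta = 0$ the right-hand side collapses to $\psi(\vartheta)$, matching the remark above.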

#### 4.3. Geometry Induced by $D φ ( α ) ( · ∥ · )$

In this section, we assume that $φ ( · )$ is continuously differentiable and strictly convex. The latter assumption guarantees that
$D_\varphi^{(\alpha)}(p \| q) = 0 \quad \text{if and only if} \quad p = q.$
The generalized Rényi divergence induces a metric $g = (g_{ij})$ in generalized statistical manifolds $\mathcal{P}$. This metric is given by
$g_{ij} = -\frac{\partial}{\partial \theta^i}\Big|_p \frac{\partial}{\partial \theta^j}\Big|_q D_\varphi^{(\alpha)}(p \| q) \Big|_{q=p}.$
To show that this expression defines a metric, we have to verify that $g i j$ is invariant under change of coordinates, and $( g i j )$ is positive definite. The first claim follows from the chain rule. The positive definiteness of $( g i j )$ is a consequence of Proposition 4, which is given below.
Proposition 4.
The metric induced by $D φ ( α ) ( · ∥ · )$ coincides with the metric given by (22) or (25).
Proof.
Fix any $\alpha \in (-1, 1)$. Applying the operator $\bigl(\frac{\partial}{\partial \theta^j}\bigr)_{p_\vartheta}$ to
$\int_T \varphi(c_\alpha)\, d\mu = 1,$
where $c_\alpha = \frac{1-\alpha}{2} \varphi^{-1}(p_\theta) + \frac{1+\alpha}{2} \varphi^{-1}(p_\vartheta) + \kappa(\alpha) u_0$, we obtain
$\int_T \Bigl[ \frac{1+\alpha}{2} \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial \theta^j} + \Bigl( \frac{\partial}{\partial \theta^j} \Bigr)_{p_\vartheta} \kappa(\alpha)\, u_0 \Bigr] \varphi'(c_\alpha)\, d\mu = 0,$
which results in
$\Bigl( \frac{\partial}{\partial \theta^j} \Bigr)_{p_\vartheta} \kappa(\alpha) = -\frac{1+\alpha}{2} \frac{\displaystyle \int_T \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial \theta^j}\, \varphi'(c_\alpha)\, d\mu}{\displaystyle \int_T u_0\, \varphi'(c_\alpha)\, d\mu}.$
By the standard differentiation rules, we can write
$\Bigl( \frac{\partial}{\partial \theta^i} \Bigr)_{p_\theta} \Bigl( \frac{\partial}{\partial \theta^j} \Bigr)_{p_\vartheta} \kappa(\alpha) = -\frac{1+\alpha}{2} \frac{\displaystyle \int_T \Bigl[ \frac{1-\alpha}{2} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} + \Bigl( \frac{\partial}{\partial \theta^i} \Bigr)_{p_\theta} \kappa(\alpha)\, u_0 \Bigr] \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial \theta^j}\, \varphi''(c_\alpha)\, d\mu}{\displaystyle \int_T u_0\, \varphi'(c_\alpha)\, d\mu} + \frac{1+\alpha}{2} \frac{\displaystyle \int_T \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial \theta^j}\, \varphi'(c_\alpha)\, d\mu}{\displaystyle \int_T u_0\, \varphi'(c_\alpha)\, d\mu} \cdot \frac{\displaystyle \int_T u_0 \Bigl[ \frac{1-\alpha}{2} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} + \Bigl( \frac{\partial}{\partial \theta^i} \Bigr)_{p_\theta} \kappa(\alpha)\, u_0 \Bigr] \varphi''(c_\alpha)\, d\mu}{\displaystyle \int_T u_0\, \varphi'(c_\alpha)\, d\mu}.$
Noticing that $\int_T \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial \theta^j}\, \varphi'(c_\alpha)\, d\mu = 0$ for $p_\vartheta = p_\theta$, the second term on the right-hand side of Equation (34) vanishes, and then
$\Bigl( \frac{\partial}{\partial \theta^i} \Bigr)_{p_\theta} \Bigl( \frac{\partial}{\partial \theta^j} \Bigr)_{p_\vartheta} \kappa(\alpha) \Big|_{p_\vartheta = p_\theta} = -\frac{1-\alpha^2}{4} \frac{\displaystyle \int_T \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j}\, \varphi''(\varphi^{-1}(p_\theta))\, d\mu}{\displaystyle \int_T u_0\, \varphi'(\varphi^{-1}(p_\theta))\, d\mu}.$
If we use the notation introduced in (24), we can write
$g_{ij} = -\frac{\partial}{\partial \theta^i}\Big|_{p_\theta} \frac{\partial}{\partial \theta^j}\Big|_{p_\vartheta} D_\varphi^{(\alpha)}(p_\theta \| p_\vartheta) \Big|_{p_\vartheta = p_\theta} = E''_\theta \Bigl[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial \theta^j} \Bigr].$
It remains to show the case $α = ± 1$. Comparing (13) and (23), we can write
$ D_\varphi(p_\theta \,\|\, p_\vartheta) = E_\theta'\big[ \varphi^{-1}(p_\theta) - \varphi^{-1}(p_\vartheta) \big]. $
We use the equivalent expressions
$ g_{ij} = \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p} D_\varphi^{(\alpha)}(p \,\|\, q) \Big|_{q=p} = \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!q} D_\varphi^{(\alpha)}(p \,\|\, q) \Big|_{q=p}, $
which follow from condition (32), to infer that
$ g_{ij} = \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p_\vartheta} D_\varphi(p_\theta \,\|\, p_\vartheta) \Big|_{p_\theta = p_\vartheta} = -E_\theta'\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \Big]. $
Because $D_\varphi^{(-1)}(p \,\|\, q) = D_\varphi^{(1)}(q \,\|\, p) = D_\varphi(p \,\|\, q)$, we conclude that the metric defined by (22) coincides with the metric induced by $D_\varphi^{(-1)}(\cdot \,\|\, \cdot)$ and $D_\varphi^{(1)}(\cdot \,\|\, \cdot)$. ☐
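To make the objects appearing in the proof concrete, it may help to look at the classical case $\varphi = \exp$ (so $\varphi^{-1} = \ln$ and $u_0 = 1$). This is only an illustration; the normalization $D_\varphi^{(\alpha)} = \frac{4}{1-\alpha^2}\kappa(\alpha)$ used below is our reading of the definition, consistent with the scaling in the proof of Proposition 5. The condition $\int_T \varphi(c_\alpha)\, d\mu = 1$ then determines $\kappa(\alpha)$ in closed form:

```latex
% Classical case \varphi = \exp, u_0 = 1 (illustrative sketch)
1 = \int_T \exp\!\Big( \tfrac{1-\alpha}{2}\ln p_\theta
      + \tfrac{1+\alpha}{2}\ln p_\vartheta + \kappa(\alpha) \Big)\, d\mu
  = e^{\kappa(\alpha)} \int_T p_\theta^{(1-\alpha)/2}\, p_\vartheta^{(1+\alpha)/2}\, d\mu ,
\qquad\text{so}\qquad
\kappa(\alpha) = -\ln \int_T p_\theta^{(1-\alpha)/2}\, p_\vartheta^{(1+\alpha)/2}\, d\mu .
```

Hence $D_\varphi^{(\alpha)}(p_\theta \,\|\, p_\vartheta) = -\frac{4}{1-\alpha^2} \ln \int_T p_\theta^{(1-\alpha)/2} p_\vartheta^{(1+\alpha)/2}\, d\mu$, which is proportional to the classical Rényi divergence of order $(1-\alpha)/2$, and its mixed second derivative at $p_\vartheta = p_\theta$ recovers the Fisher–Rao metric, in line with Proposition 4.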
In generalized statistical manifolds, the generalized Rényi divergence $D_\varphi^{(\alpha)}(\cdot \,\|\, \cdot)$ induces a connection $D^{(\alpha)}$, whose Christoffel symbols $\Gamma_{ijk}^{(\alpha)}$ are given by
$ \Gamma_{ijk}^{(\alpha)} = - \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p} \Big(\frac{\partial}{\partial\theta^k}\Big)_{\!q} D_\varphi^{(\alpha)}(p \,\|\, q) \Big|_{q=p}. $
Because $D_\varphi^{(\alpha)}(p \,\|\, q) = D_\varphi^{(-\alpha)}(q \,\|\, p)$, it follows that $D^{(\alpha)}$ and $D^{(-\alpha)}$ are mutually dual for any $\alpha \in [-1, 1]$. In other words, $\Gamma_{ijk}^{(\alpha)}$ and $\Gamma_{ijk}^{(-\alpha)}$ satisfy the relation $\frac{\partial g_{jk}}{\partial\theta^i} = \Gamma_{ijk}^{(\alpha)} + \Gamma_{ikj}^{(-\alpha)}$. A development involving expression (35) results in
$ \Gamma_{ijk}^{(1)} = E_\theta''\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] - E_\theta'\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big], \qquad (37) $
and
$ \Gamma_{ijk}^{(-1)} = E_\theta''\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] + E_\theta'''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] - E_\theta''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \Big] - E_\theta''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \Big]. \qquad (38) $
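The duality relation $\frac{\partial g_{jk}}{\partial\theta^i} = \Gamma_{ijk}^{(\alpha)} + \Gamma_{ikj}^{(-\alpha)}$ can be checked numerically in the classical one-parameter setting (where all index permutations coincide). The sketch below again uses an Amari-type α-divergence for a Bernoulli family as an illustrative stand-in; the finite-difference stencils are our own choices.

```python
def alpha_div(x, y, alpha):
    # Amari-type alpha-divergence between Bernoulli(x) and Bernoulli(y),
    # playing the role of D_phi^(alpha) in the classical case (illustration).
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4.0 / (1 - alpha**2) * (1 - x**a * y**b - (1 - x)**a * (1 - y)**b)

def metric(theta, alpha, h=1e-4):
    # g = -d/dp d/dq D(p||q) at q = p, by central differences
    D = lambda x, y: alpha_div(x, y, alpha)
    return -(D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h * h)

def christoffel(theta, alpha, h=1e-3):
    # Gamma^(alpha) = -d2/dp2 d/dq D(p||q) at q = p (single parameter)
    D = lambda x, y: alpha_div(x, y, alpha)
    num = (D(theta + h, theta + h) - 2 * D(theta, theta + h) + D(theta - h, theta + h)
           - D(theta + h, theta - h) + 2 * D(theta, theta - h) - D(theta - h, theta - h))
    return -num / (2 * h**3)

theta, alpha, h = 0.3, 0.4, 1e-3
dg = (metric(theta + h, alpha) - metric(theta - h, alpha)) / (2 * h)
print(dg, christoffel(theta, alpha) + christoffel(theta, -alpha))
```

The two printed numbers agree up to discretization error, as the duality relation predicts.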
For $\alpha \in (-1, 1)$, the Christoffel symbols $\Gamma_{ijk}^{(\alpha)}$ can be written as a convex combination of $\Gamma_{ijk}^{(-1)}$ and $\Gamma_{ijk}^{(1)}$, as asserted in the next result.
Proposition 5.
The Christoffel symbols $\Gamma_{ijk}^{(\alpha)}$ induced by the divergence $D_\varphi^{(\alpha)}(\cdot \,\|\, \cdot)$ satisfy the relation
$ \Gamma_{ijk}^{(\alpha)} = \frac{1-\alpha}{2}\, \Gamma_{ijk}^{(-1)} + \frac{1+\alpha}{2}\, \Gamma_{ijk}^{(1)}, \qquad \text{for } \alpha \in [-1, 1]. \qquad (39) $
Proof.
For $\alpha = \pm 1$, equality (39) follows trivially. Thus, we assume $\alpha \in (-1, 1)$. By (34), we can write
$ \Big(\frac{\partial}{\partial\theta^i}\Big)_{\!p_\theta} \Big(\frac{\partial}{\partial\theta^k}\Big)_{\!p_\vartheta} \kappa(\alpha) = -\frac{1+\alpha}{2}\, \frac{\int_T \big[ \frac{1-\alpha}{2} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} + \big(\frac{\partial}{\partial\theta^i}\big)_{p_\theta} \kappa(\alpha)\, u_0 \big] \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial\theta^k}\, \varphi''(c_\alpha)\, d\mu}{\int_T u_0\, \varphi'(c_\alpha)\, d\mu} + \frac{1+\alpha}{2}\, \frac{\int_T \frac{\partial \varphi^{-1}(p_\vartheta)}{\partial\theta^k}\, \varphi'(c_\alpha)\, d\mu}{\int_T u_0\, \varphi'(c_\alpha)\, d\mu} \cdot \frac{\int_T u_0 \big[ \frac{1-\alpha}{2} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} + \big(\frac{\partial}{\partial\theta^i}\big)_{p_\theta} \kappa(\alpha)\, u_0 \big] \varphi''(c_\alpha)\, d\mu}{\int_T u_0\, \varphi'(c_\alpha)\, d\mu}. \qquad (40) $
Applying $\big(\frac{\partial}{\partial\theta^j}\big)_{p_\theta}$ to the first term on the right-hand side of (40), and then equating $p_\vartheta = p_\theta$, we obtain
$ -\frac{1-\alpha^2}{4}\, E_\theta''\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] - \frac{1+\alpha}{2}\, \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p_\theta} \kappa(\alpha)\, E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] - \frac{1-\alpha^2}{4} \cdot \frac{1-\alpha}{2}\, E_\theta'''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] + \frac{1-\alpha^2}{4} \cdot \frac{1-\alpha}{2}\, E_\theta''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \Big]. \qquad (41) $
Similarly, if we apply $\big(\frac{\partial}{\partial\theta^j}\big)_{p_\theta}$ to the second term on the right-hand side of (40), and set $p_\vartheta = p_\theta$, we get
$ \frac{1-\alpha^2}{4} \cdot \frac{1-\alpha}{2}\, E_\theta''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \Big]. \qquad (42) $
Collecting (41) and (42), we can write
$ \Gamma_{ijk}^{(\alpha)} = -\frac{4}{1-\alpha^2}\, \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p_\theta} \Big(\frac{\partial}{\partial\theta^k}\Big)_{\!p_\vartheta} \kappa(\alpha) \Big|_{p_\vartheta = p_\theta} = E_\theta''\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] + \frac{1-\alpha}{2}\, E_\theta'''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] - \frac{1-\alpha}{2}\, E_\theta''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \Big] - \frac{1-\alpha}{2}\, E_\theta''\Big[ \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^i} \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^j} \Big] - \frac{1+\alpha}{2}\, E_\theta'\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \Big] E_\theta''\Big[ u_0 \frac{\partial \varphi^{-1}(p_\theta)}{\partial\theta^k} \Big], \qquad (43) $
where we used
$ \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p_\theta} \kappa(\alpha) = \frac{1-\alpha^2}{4}\, \Big(\frac{\partial^2}{\partial\theta^i \partial\theta^j}\Big)_{\!p_\theta} D_\varphi^{(\alpha)}(p_\theta \,\|\, p_\vartheta) \Big|_{p_\vartheta = p_\theta} = \frac{1-\alpha^2}{4}\, g_{ij} = -\frac{1-\alpha^2}{4}\, E_\theta'\Big[ \frac{\partial^2 \varphi^{-1}(p_\theta)}{\partial\theta^i \partial\theta^j} \Big]. \qquad (44) $
Expression (39) follows from (37), (38) and (43). ☐
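Proposition 5 can also be observed numerically in the classical one-parameter setting, where the α-divergence tends to the Kullback–Leibler divergence (in one order or the other) as α → ∓1. The sketch below, an illustration with our own function names and finite-difference stencil, checks the convex-combination relation (39) for a Bernoulli family.

```python
import math

def alpha_div(x, y, alpha):
    # Amari-type alpha-divergence between Bernoulli(x) and Bernoulli(y);
    # at alpha = -1 it is KL(p||q), and at alpha = +1 it is KL(q||p)
    # (illustrative stand-in for D_phi^(alpha) in the classical case).
    if alpha == -1.0:
        return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))
    if alpha == 1.0:
        return alpha_div(y, x, -1.0)
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4.0 / (1 - alpha**2) * (1 - x**a * y**b - (1 - x)**a * (1 - y)**b)

def christoffel(theta, alpha, h=1e-3):
    # Gamma^(alpha) = -d2/dp2 d/dq D(p||q) at q = p, by finite differences
    D = lambda x, y: alpha_div(x, y, alpha)
    num = (D(theta + h, theta + h) - 2 * D(theta, theta + h) + D(theta - h, theta + h)
           - D(theta + h, theta - h) + 2 * D(theta, theta - h) - D(theta - h, theta - h))
    return -num / (2 * h**3)

theta, alpha = 0.3, 0.4
lhs = christoffel(theta, alpha)
rhs = ((1 - alpha) / 2) * christoffel(theta, -1.0) + ((1 + alpha) / 2) * christoffel(theta, 1.0)
print(lhs, rhs)
```

Up to discretization error, the two printed values coincide, matching relation (39).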

## 5. Conclusions

In [17,18], the authors introduced a pair of dual connections $D^{(-1)}$ and $D^{(1)}$ induced by the φ-divergence. The main motivation of the present work was to find a (non-trivial) family of α-divergences whose induced α-connections are convex combinations of $D^{(-1)}$ and $D^{(1)}$. As a result of our efforts, we proposed a generalization of Rényi divergence. The connection $D^{(\alpha)}$ induced by this generalization satisfies the relation $D^{(\alpha)} = \frac{1-\alpha}{2} D^{(-1)} + \frac{1+\alpha}{2} D^{(1)}$. To generalize Rényi divergence, we made use of properties of φ-functions, which makes evident the importance of φ-functions in the geometry of non-standard models. In standard statistical manifolds, even though Amari's α-divergence and Rényi divergence (with $\alpha \in [-1, 1]$) do not coincide, they induce the same family of α-connections. This striking result requires further investigation. Future work should focus on how the generalization of Rényi divergence is related to Zhang's $(\rho, \tau)$-divergence, and also on how the present proposal is related to the model presented in .

## Acknowledgments

The authors are indebted to the anonymous reviewers for their valuable comments and corrections, which led to a great improvement of this paper. Charles C. Cavalcante also thanks the CNPq (Proc. 309055/2014-8) for partial funding.

## Author Contributions

All authors contributed equally to the design of the research. The research was carried out by all authors. Rui F. Vigelis and Charles C. Cavalcante gave the central idea of the paper and managed the organization of it. Rui F. Vigelis wrote the paper. All the authors read and approved the final manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91.
2. Amari, S.-I. Differential geometry of curved exponential families—Curvatures and information loss. Ann. Stat. 1982, 10, 357–385.
3. Amari, S.-I. Differential-Geometrical Methods in Statistics; Springer: Berlin/Heidelberg, Germany, 1985; Volume 28.
4. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry (Translations of Mathematical Monographs); American Mathematical Society: Providence, RI, USA, 2000; Volume 191.
5. Amari, S.-I. Information Geometry and Its Applications; Applied Mathematical Sciences Series; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
6. Amari, S.-I.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometries. Physica A 2012, 391, 4308–4319.
7. Matsuzoe, H. Hessian structures on deformed exponential families and their conformal structures. Differ. Geom. Appl. 2014, 35 (Suppl.), 323–333.
8. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 2004, 5, 102.
9. Pistone, G. κ-exponential models from the geometrical viewpoint. Eur. Phys. J. B 2009, 70, 29–37.
10. Amari, S.-I.; Ohara, A. Geometry of q-exponential family of probability distributions. Entropy 2011, 13, 1170–1185.
11. Vigelis, R.F.; Cavalcante, C.C. The Δ2-Condition and φ-Families of Probability Distributions. In Geometric Science of Information; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8085, pp. 729–736.
12. Vigelis, R.F.; Cavalcante, C.C. On φ-families of probability distributions. J. Theor. Probab. 2013, 26, 870–884.
13. Cena, A.; Pistone, G. Exponential statistical manifold. Ann. Inst. Stat. Math. 2007, 59, 27–56.
14. Grasselli, M.R. Dual connections in nonparametric classical information geometry. Ann. Inst. Stat. Math. 2010, 62, 873–896.
15. Pistone, G.; Sempi, C. An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 1995, 23, 1543–1561.
16. Santacroce, M.; Siri, P.; Trivellato, B. New results on mixture and exponential models by Orlicz spaces. Bernoulli 2016, 22, 1431–1447.
17. Vigelis, R.F.; Cavalcante, C.C. Information Geometry: An Introduction to New Models for Signal Processing. In Signals and Images; CRC Press: Boca Raton, FL, USA, 2015; pp. 455–491.
18. Vigelis, R.F.; de Souza, D.C.; Cavalcante, C.C. New Metric and Connections in Statistical Manifolds. In Geometric Science of Information; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9389, pp. 222–229.
19. Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195.
20. Zhang, J. Referential Duality and Representational Duality on Statistical Manifolds. In Proceedings of the 2nd International Symposium on Information Geometry and Its Applications, Pescara, Italy, 12–16 December 2005; pp. 58–67.
21. Zhang, J. Nonparametric information geometry: From divergence function to referential-representational biduality on statistical manifolds. Entropy 2013, 15, 5384–5418.
22. Zhang, J. Divergence Functions and Geometric Structures They Induce on a Manifold. In Geometric Theory of Information; Springer: Berlin/Heidelberg, Germany, 2014; pp. 1–30.
23. Zhang, J. On monotone embedding in information geometry. Entropy 2015, 17, 4485–4489.
24. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1961; Volume I, pp. 547–561.
25. Van Erven, T.; Harremoës, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
26. Eguchi, S.; Komori, O. Path Connectedness on a Space of Probability Density Functions. In Geometric Science of Information; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9389, pp. 615–624.
27. Kaniadakis, G.; Lissia, M.; Scarfone, A.M. Deformed logarithms and entropies. Physica A 2004, 340, 41–49.
28. Kaniadakis, G. Theoretical foundations and mathematical formalism of the power-law tailed statistical distributions. Entropy 2013, 15, 3983–4010.
29. Musielak, J. Orlicz Spaces and Modular Spaces; Springer: Berlin/Heidelberg, Germany, 1983; Volume 1034.
30. Shima, H. The Geometry of Hessian Structures; World Scientific: Singapore, 2007.
31. Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749.
32. Bregman, L.M. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 1967, 7, 200–217.
33. Zhang, J.; Hästö, P. Statistical manifold as an affine space: A functional equation approach. J. Math. Psychol. 2006, 50, 60–65.

## Share and Cite

MDPI and ACS Style

De Souza, D.C.; Vigelis, R.F.; Cavalcante, C.C. Geometry Induced by a Generalization of Rényi Divergence. Entropy 2016, 18, 407. https://doi.org/10.3390/e18110407
