This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Information geometry studies the dually flat structure of a manifold, highlighted by the generalized Pythagorean theorem. The present paper studies a class of Bregman divergences called the (ρ, τ)-divergence.

Information geometry, originating from the invariant structure of a manifold of probability distributions, consists of a Riemannian metric and dually coupled affine connections with respect to the metric [

The present paper studies a general and unique class of decomposable divergence functions in

Positive-definite (PD) matrices appear in many engineering problems, such as convex programming, diffusion tensor analysis and multivariate statistical analysis [_{n}

The present paper is organized as follows. Section 2 is preliminary, giving a short introduction to a dually flat manifold and the Bregman divergence. It also defines the cluster center due to a divergence. Section 3 defines the (_{n}

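A manifold is said to have a dually flat Riemannian structure when it has two affine coordinate systems, $\theta = (\theta^1, \ldots, \theta^n)$ and $\eta = (\eta_1, \ldots, \eta_n)$, together with two convex functions, $\psi(\theta)$ and $\varphi(\eta)$, such that the two coordinate systems are connected by the Legendre transformations:

$$\eta = \nabla \psi(\theta), \qquad \theta = \nabla \varphi(\eta),$$

where ∇ is the gradient operator. The Riemannian metric is given by:

$$g_{ij}(\theta) = \partial_i \partial_j \psi(\theta), \qquad g^{ij}(\eta) = \partial^i \partial^j \varphi(\eta),$$

in the respective coordinate systems, with $\partial_i = \partial / \partial \theta^i$ and $\partial^i = \partial / \partial \eta_i$. A curve that is linear in the θ-coordinates is called a θ-geodesic, and a curve that is linear in the η-coordinates is called an η-geodesic.

A dually flat manifold has a unique canonical divergence, which is the Bregman divergence defined by the convex functions,

$$D[P : Q] = \psi(\theta_P) + \varphi(\eta_Q) - \theta_P \cdot \eta_Q,$$

where $\theta_P$ denotes the θ-coordinates of $P$, $\eta_Q$ denotes the η-coordinates of $Q$ and $\theta_P \cdot \eta_Q = \sum_i \theta_P^i \eta_{Qi}$.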
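As a minimal sketch of a canonical divergence built from a Legendre pair, the following assumes the illustrative potential $\psi(\theta) = \sum_i \exp(\theta^i)$, whose dual is $\varphi(\eta) = \sum_i (\eta_i \log \eta_i - \eta_i)$. This choice, and all function names, are assumptions made for the example rather than definitions taken from the text. Under this assumption the η-coordinates are the positive measure itself, and the divergence reduces to a generalized Kullback-Leibler form.

```python
import math

def psi(theta):
    """Convex potential psi(theta) = sum_i exp(theta_i) (illustrative assumption)."""
    return sum(math.exp(t) for t in theta)

def phi(eta):
    """Legendre dual of psi: phi(eta) = sum_i (eta_i log eta_i - eta_i)."""
    return sum(e * math.log(e) - e for e in eta)

def to_eta(theta):
    """eta = grad psi(theta); componentwise because psi is a sum of scalar terms."""
    return [math.exp(t) for t in theta]

def to_theta(eta):
    """theta = grad phi(eta), the inverse of to_eta."""
    return [math.log(e) for e in eta]

def canonical_divergence(theta_p, eta_q):
    """D[P:Q] = psi(theta_P) + phi(eta_Q) - theta_P . eta_Q."""
    inner = sum(t * e for t, e in zip(theta_p, eta_q))
    return psi(theta_p) + phi(eta_q) - inner

def generalized_kl(p, q):
    """Closed form of the same quantity: sum_i (p_i - q_i + q_i log(q_i / p_i))."""
    return sum(a - b + b * math.log(b / a) for a, b in zip(p, q))

# Two positive measures given in eta-coordinates (hypothetical values).
p = [0.5, 1.5, 2.0]
q = [1.0, 1.0, 3.0]
print(canonical_divergence(to_theta(p), q), generalized_kl(p, q))
```

The two printed values should agree up to rounding, and the divergence vanishes when both arguments are the same point.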

Given three points,

Given a smooth submanifold, _{S}

Then, _{S}_{S}

We have the dual of the above theorems where

A divergence,

where
_{P}_{Q}

An

is a typical example of decomposable divergence in the manifold of probability distributions, where _{i}_{i}

by using a scalar convex function,

When

where ′ denotes differentiation, so that the transformation is computationally tractable. Its inverse transformation is also componentwise,

where

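Consider a cluster of points $P_1, \ldots, P_m$, whose θ-coordinates are $\theta_1, \ldots, \theta_m$ and whose η-coordinates are $\eta_1, \ldots, \eta_m$.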

By differentiating ∑_{i}

Hence, the cluster-center theorem due to Banerjee

The _{R}

When we need to obtain the _{R}

However, in many cases, the transformation is computationally heavy and intractable when the dimension of the manifold is large. The transformation is easy in the case of a decomposable divergence. This is the motivation for considering a general class of decomposable Bregman divergences.
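The cluster-center computation can be sketched for a decomposable case. The example assumes that the center $R$ minimizes $\sum_i D[R : P_i]$, with the Bregman divergence generated by the illustrative potential $\psi(\theta) = \sum_i \exp(\theta^i)$; both the side on which the center appears and the potential are assumptions made for this sketch. Under them, the η-coordinates of the center are the arithmetic mean of the η-coordinates of the points, and the θ-coordinates follow from a componentwise logarithm.

```python
import math

def divergence(theta_r, eta_p):
    """D[R:P] = psi(theta_R) + phi(eta_P) - theta_R . eta_P for psi = sum exp."""
    psi = sum(math.exp(t) for t in theta_r)
    phi = sum(e * math.log(e) - e for e in eta_p)
    return psi + phi - sum(t * e for t, e in zip(theta_r, eta_p))

def cluster_center_eta(etas):
    """Arithmetic mean of the eta-coordinates, taken coordinate by coordinate."""
    m = len(etas)
    return [sum(column) / m for column in zip(*etas)]

def eta_to_theta(eta):
    """Componentwise dual transformation (cheap because psi is decomposable)."""
    return [math.log(e) for e in eta]

def total_divergence(eta_r, etas):
    """Sum of divergences from a candidate center R to every cluster point."""
    theta_r = eta_to_theta(eta_r)
    return sum(divergence(theta_r, eta_p) for eta_p in etas)

# Hypothetical cluster of three positive measures in eta-coordinates.
cluster = [[1.0, 2.0], [3.0, 0.5], [2.0, 3.5]]
center = cluster_center_eta(cluster)  # [2.0, 2.0]

# Moving the candidate center away from the mean increases the total divergence.
print(total_divergence(center, cluster))
print(total_divergence([2.1, 2.0], cluster))
```

The point of the sketch is the cost: the mean is taken in one coordinate system and converted to the other with one scalar function per component, with no matrix inversion or iterative solve.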

Let
_{1}_{n}_{i}

and _{1}_{n}_{i}

and

Let

the

By using these functions, we construct new coordinate systems ^{i}_{i}

They are called the _{ρ,τ}_{ρ,τ}

We introduce two scalar functions of

Then, the first and second derivatives of _{ρ,τ}

Since _{ρ,τ}_{ρ,τ}

We then define two decomposable convex functions of

They are Legendre duals to each other.
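A numerical sketch of the construction may help. It assumes that the coordinates are $\theta^i = \rho(\xi_i)$ and $\eta_i = \tau(\xi_i)$ for two increasing scalar functions, and that the componentwise potential satisfies $d\tilde\psi / d\theta = \tau(\xi)$. Under these assumptions each component of the Bregman divergence can be written as $\int_{\xi'}^{\xi} \{\tau(s) - \tau(\xi')\} \rho'(s)\, ds$, which is the form integrated below. The function names, the Simpson quadrature and the test pairs are illustrative choices, not taken from the paper.

```python
import math

def integrate(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even. Works for b < a too."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def rho_tau_divergence(xi, xi_prime, rho, tau, d_rho):
    """Sum over components of int_{xi'}^{xi} (tau(s) - tau(xi')) * rho'(s) ds.

    rho is accepted for completeness; only its derivative d_rho enters the integral.
    """
    total = 0.0
    for x, y in zip(xi, xi_prime):
        total += integrate(lambda s: (tau(s) - tau(y)) * d_rho(s), y, x)
    return total

# Assumed pair rho = log, tau = identity: recovers a generalized KL divergence.
xi = [0.5, 1.5, 2.0]
xi_prime = [1.0, 1.0, 3.0]
d_kl = rho_tau_divergence(xi, xi_prime, math.log, lambda s: s, lambda s: 1.0 / s)
closed = sum(x - y + y * math.log(y / x) for x, y in zip(xi, xi_prime))
print(d_kl, closed)

# Hypothetical power pair rho(s) = s**a, tau(s) = s**b with a = 0.5, b = 2.
a, b = 0.5, 2.0
d_pow = rho_tau_divergence([2.0], [1.0],
                           lambda s: s ** a, lambda s: s ** b,
                           lambda s: a * s ** (a - 1))
print(d_pow)
```

Because both functions are increasing, the integrand has the same sign as the direction of integration, so every component is non-negative and vanishes only when the two measures coincide in that component.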

The (

where

The (

The Riemannian metric is:

and hence Euclidean, because the Riemann-Christoffel curvature due to the Levi-Civita connection vanishes, too.

The following theorem is new, characterizing the (

The (

We have dually flat connections, (∇_{ρ,τ}_{ρ,τ}

The

The Riemann-Christoffel curvature of

for any

As a special case of the (

This was introduced by Cichocki, Cruces and Amari in [

The affine and dual affine coordinates are:

and the convex functions are:

where:

The induced (

for _{n}_{n}_{i}

The

we have:

The

It is written as:

The _{n}

The classes of

When

Zhang already introduced the (

Let

where ∇ is the gradient operator with respect to matrix _{ij}

where tr is the trace of a matrix.
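The matrix Bregman divergence can be checked on a small case. The sketch assumes the standard form $D[P : Q] = \phi(P) - \phi(Q) - \mathrm{tr}\{\nabla\phi(Q)(P - Q)\}$ and the illustrative choice $\phi(P) = -\log\det P$, for which $\nabla\phi(Q) = -Q^{-1}$ and the divergence has the closed form $\mathrm{tr}(PQ^{-1}) - \log\det(PQ^{-1}) - n$. To stay free of external libraries it is restricted to 2×2 matrices stored as nested lists; all helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse(a):
    d = det(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def bregman_logdet(p, q):
    """phi(P) - phi(Q) - tr(grad phi(Q) (P - Q)) with phi = -log det (assumed)."""
    diff = [[p[i][j] - q[i][j] for j in range(2)] for i in range(2)]
    grad_q = [[-v for v in row] for row in inverse(q)]  # grad phi(Q) = -Q^{-1}
    return -math.log(det(p)) + math.log(det(q)) - trace(matmul(grad_q, diff))

def logdet_closed_form(p, q):
    """tr(P Q^{-1}) - log det(P Q^{-1}) - n, here with n = 2."""
    pq_inv = matmul(p, inverse(q))
    return trace(pq_inv) - math.log(det(pq_inv)) - 2

def congruence(g, p):
    """Transform P into G P G^T."""
    return matmul(matmul(g, p), transpose(g))

# Hypothetical positive-definite matrices and a rotation by 0.7 radians.
P = [[2.0, 0.5], [0.5, 1.0]]
Q = [[1.0, 0.2], [0.2, 3.0]]
c, s = math.cos(0.7), math.sin(0.7)
O = [[c, -s], [s, c]]

print(bregman_logdet(P, Q), logdet_closed_form(P, Q))
print(bregman_logdet(congruence(O, P), congruence(O, Q)))
```

All three printed values should agree up to rounding: the generic Bregman form matches the closed form, and the value is unchanged when both matrices are rotated together. For this particular potential the same check also passes when the rotation is replaced by a general invertible matrix.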

It induces a dually flat structure to the manifold of positive-definite matrices, where the affine coordinate system (

A convex function,

holds for any orthogonal transformation ^{T}_{1}_{n}

it is said to be decomposable. We have:

A divergence

When it is derived from a decomposable convex function,

We give well-known examples of decomposable convex functions and the divergences derived from them:

For ^{2}, we have:

where ||^{2} is the Frobenius norm:

For

The affine coordinate system is ^{−1}. The derived geometry is the same as that of multivariate Gaussian probability distributions with mean zero and covariance matrix

For

This divergence is used in quantum information theory. The affine coordinate system is

We extend the (

be _{ρ,τ}_{ρ,τ}

They are not convex with respect to

The (

The Euclidean, Gaussian and von Neumann divergences given in

When

(

By using the (

so that the (

This is a Bregman divergence, where the affine coordinate system is ^{α}^{β}

The

The affine coordinate system is

The

We extend the concept of invariance under the orthogonal group to that under the general linear group,

holds for any

We identify matrix

where

The invariant, flat and decomposable divergence under

We have focused on flat and decomposable divergences. There are many interesting non-decomposable divergences. We first discuss a general class of flat divergences in

We can describe a general class of flat divergence in

which is not necessarily a componentwise function. Any pair of invertible

The dual coordinates

so that we have:

This implies that a pair (^{−1}(

This is different from the case of decomposable divergence, where any monotone pair of

Ohara and Eguchi [

where _{V}

In such a case, we can introduce dually flat structure to _{n}_{V}

The derived divergence is:

When _{V}

A dually flat structure is introduced in the manifold of probability distributions [

where:

The dual affine coordinates are the

The divergence, _{q}

We can generalize it to the case of _{n}

This is flat, but not decomposable.

The _{n}

This is neither flat nor decomposable. This is a projective divergence in the sense that, for any

Therefore, it can be defined in the submanifold of tr

We have shown that the (

When we treat the manifold of probability distributions, it is a submanifold of the manifold of positive measures, where the total sum of measures is restricted to one. This is a nonlinear constraint in the

Quantum information theory deals with positive-definite Hermitian matrices of trace one [

The author declares no conflicts of interest.