Abstract
Information-theoretic divergences are widely used in information theory, statistics, and other application areas. To meet the requirement of metric properties, we introduce a class of new bounded metrics based on the triangular discrimination. Moreover, we obtain some sharp inequalities for the triangular discrimination and other information-theoretic divergences. Their asymptotic approximation properties are also discussed.
1. Introduction
In many applications such as pattern recognition, machine learning, statistics, optimization and other applied branches of mathematics, it is beneficial to use information-theoretic divergences rather than the squared Euclidean distance to estimate the (dis)similarity of two probability distributions or positive arrays [,,,,,,,,]. Among them, the Kullback–Leibler divergence (relative entropy), the triangular discrimination, the variation distance, the Hellinger distance, the Jensen–Shannon divergence, the symmetric Chi-square divergence, the J-divergence and other important measures often play a critical role. Unfortunately, most of these divergences do not satisfy the metric properties and are unbounded []. As we know, metric properties are preconditions for numerous convergence properties of iterative algorithms []. Moreover, boundedness is also of great concern in numerical computations and simulations. In [], Endres and Schindelin proved that the square root of twice the Jensen–Shannon divergence is a metric. The triangular discrimination, presented by Topsøe in [], is a non-logarithmic measure and is simple to compute. Inspired by [], we study the triangular discrimination. The main result of this paper is the introduction of a class of new metrics derived from the triangular discrimination. Finally, some new relationships among the triangular discrimination, the Jensen–Shannon divergence, the square of the Hellinger distance and the variation distance are also obtained.
2. Definition and Auxiliary Results
Definition 1. Let
$$\Gamma_n = \Big\{ P = (p_1, p_2, \ldots, p_n) \;\Big|\; p_i \geq 0,\ \sum_{i=1}^{n} p_i = 1 \Big\}, \quad n \geq 2,$$
be the set of all complete finite discrete probability distributions. For all $P, Q \in \Gamma_n$, the triangular discrimination is defined by
$$\Delta(P, Q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}.$$
In the above definition, we use the convention, based on the limit property, that $\frac{0}{0} = 0$.
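As a quick computational illustration, the following minimal Python sketch evaluates the triangular discrimination under the standard definition above; the helper name `triangular_discrimination` is purely illustrative.

```python
# A minimal sketch of the triangular discrimination, assuming the standard
# definition Delta(P, Q) = sum_i (p_i - q_i)^2 / (p_i + q_i) with 0/0 := 0.
def triangular_discrimination(P, Q):
    total = 0.0
    for p, q in zip(P, Q):
        if p + q > 0:  # convention: a term with p_i = q_i = 0 contributes 0
            total += (p - q) ** 2 / (p + q)
    return total

# Example on three outcomes.
print(triangular_discrimination([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))  # ~0.6667
```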
The triangular discrimination is obviously symmetric, nonnegative and vanishes for , but it does not fulfill the triangle inequality. In view of the foregoing, the concept of triangular discrimination should be generalized. If , we study the function:
where .
In the following, the power of the summand in  with all  is discussed.
Definition 2. Let the function be defined by
It is easy to see that  and . For all , the issue of whether  satisfies the triangle inequality is considered in the following.
Lemma 1. If the function is defined by
with , then
Proof. As
we can get
Lemma 2. If the function is defined by  with , then h is monotonically increasing in  and monotonically decreasing in .
Proof. A straightforward computation of the derivative shows
in and in . Thus the lemma holds.
Assuming , we introduce the function  defined by
Lemma 3. The function has two minima, one at and the other at .
Proof. The derivative of the function is
So  for  and  for . This shows that  is monotonically decreasing in  and monotonically increasing in .
Next consider the monotonicity of in the open interval .
From Lemma 3, we have
From Lemma 2, we have
Using (5) and (6),
Let
then
Equality holds if and only if . Hence, with respect to the variable r in the open interval ,  and  are both monotonically decreasing, so  is also monotonically decreasing. Using (4),
this shows . Hence  has only one zero in the open interval  with respect to the variable r. As a consequence,  has only one zero in the open interval  with respect to the variable r. This means  in the interval , and  in the interval . From the above we know  has only one maximum and no minimum in the open interval .
As a result, the conclusion in the lemma is obtained. ☐
Theorem 1. Let , then
Proof. If , then . The triangle inequality (7) obviously holds.
If  and one of  is equal to 0, it is easy to verify that (7) holds.
Next we assume  without loss of generality. Note that the following identity holds:
From Lemma 3 the triangle inequality (7) can be easily proved for any number . ☐
Corollary 1. Let . If , then
Proof. Let  and ; then , which follows from the concavity of . Now a γ satisfying  can be found. Thus, from Theorem 1,
This is the triangle inequality (8) for the function . ☐
Theorem 2. Let . If , then the triangle inequality (8) does not hold.
Proof. Assuming , let . First, the following identity holds:
The derivative of the function l is
When , let
Using l’Hôpital’s rule,
So
According to the definition of the derivative, there exists a  such that for any ,
This shows the triangle inequality (8) does not hold. ☐
Summing up the theorems and the corollary above, we obtain the main theorem:
Theorem 3. The function satisfies the triangle inequality (8) if and only if .
3. Metric Properties of
In this section, we mainly prove the following theorem:
Theorem 4. The function is a metric on the space if and only if .
Proof. From (2) we can get . It is easy to see that  with equality only for  and . So the question is whether the triangle inequality
holds for any .
When , , the triangle inequality (9) holds trivially. So we assume  in the following.
Next we consider the value of α in two cases:
(i) :
From Theorem 3, the inequality holds. Applying Minkowski’s inequality we have
So the triangle inequality (9) holds.
(ii) :
Let
where
Then .
Next we prove that  and  are not extreme points of the function . By symmetry, we only need to prove that  is not an extreme point.
By partial derivative,
Since , we may assume without loss of generality that  and .
Then substituting (11) and (12) into (10), we have
Therefore,  is not an extreme point of the function . For the same reason,  is also not an extreme point.
By the definition of an extreme point, there exists a point  such that . Since , we have . This inequality is inconsistent with the triangle inequality (9).
From what has been discussed above, the conclusion of the theorem is obtained. ☐
The generalization of this result to continuous probability distributions is straightforward. Consider a measurable space , and let P, Q be probability distributions with Radon–Nikodym densities  with respect to a dominating σ-finite measure μ. Then
is a metric if and only if .
Next we discuss the maximum and minimum of . It is obvious that  is the minimum, attained if and only if . Because  can be rewritten in the form
it attains the maximum 2 when  are two distinct deterministic distributions, namely . The metric  then achieves its maximum value .
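These extreme values can be checked numerically; the sketch below is illustrative only and assumes the standard form of the triangular discrimination from Definition 1.

```python
# Numerical check of the extremes stated above: the triangular discrimination
# vanishes for identical distributions and attains its maximum 2 for two
# distinct deterministic distributions.
def triangular_discrimination(P, Q):
    return sum((p - q) ** 2 / (p + q) for p, q in zip(P, Q) if p + q > 0)

print(triangular_discrimination([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(triangular_discrimination([1.0, 0.0], [0.0, 1.0]))  # 2.0
```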
4. Some Inequalities among the Information-Theoretic Divergences
Definition 3. For all $P, Q \in \Gamma_n$, the Jensen–Shannon divergence is defined by
$$JS(P, Q) = \frac{1}{2} \sum_{i=1}^{n} \left[ p_i \ln \frac{2 p_i}{p_i + q_i} + q_i \ln \frac{2 q_i}{p_i + q_i} \right].$$
The square of the Hellinger distance is defined by
$$h^2(P, Q) = \frac{1}{2} \sum_{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2.$$
The variation distance is defined by
$$V(P, Q) = \sum_{i=1}^{n} \left| p_i - q_i \right|.$$
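For concreteness, the sketch below implements the three measures of Definition 3 under the normalizations written above, which follow the conventions common in this literature; constant factors may differ from the authors' original displays, and the function names are illustrative.

```python
import math

def jensen_shannon(P, Q):
    """JS(P, Q) = 1/2 * sum_i [p_i ln(2p_i/(p_i+q_i)) + q_i ln(2q_i/(p_i+q_i))]."""
    js = 0.0
    for p, q in zip(P, Q):
        m = (p + q) / 2
        if p > 0:
            js += 0.5 * p * math.log(p / m)
        if q > 0:
            js += 0.5 * q * math.log(q / m)
    return js

def hellinger_squared(P, Q):
    """h^2(P, Q) = 1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(P, Q))

def variation_distance(P, Q):
    """V(P, Q) = sum_i |p_i - q_i|."""
    return sum(abs(p - q) for p, q in zip(P, Q))
```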
Next we introduce Csiszár’s f-divergence [].
Definition 4. Let $f : (0, \infty) \rightarrow \mathbb{R}$ be a convex function satisfying $f(1) = 0$. The f-divergence measure introduced by Csiszár is defined as
$$C_f(P, Q) = \sum_{i=1}^{n} q_i\, f\!\left( \frac{p_i}{q_i} \right) \qquad (15)$$
for all $P, Q \in \Gamma_n$.
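A generic sketch of this definition, restricted to strictly positive $q_i$ for simplicity (the zero-probability cases are covered by the usual limiting conventions), could look as follows; the name `f_divergence` is illustrative.

```python
# Sketch of Csiszar's f-divergence C_f(P, Q) = sum_i q_i * f(p_i / q_i),
# assuming all q_i > 0; zero-probability terms would need the limiting
# conventions used in the text.
def f_divergence(f, P, Q):
    return sum(q * f(p / q) for p, q in zip(P, Q) if q > 0)
```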
The triangular discrimination, the Jensen–Shannon divergence, the square of the Hellinger distance and the variation distance are all f-divergences.
Example 1. (Triangular Discrimination) Let us consider
in (15). Then we can verify is convex because , , and .
Example 2. (Jensen–Shannon divergence) Let us consider
in (15). Then we can verify is convex because , and . By standard inequality , holds.
Example 3. (Square of Hellinger distance) Let us consider
in (15). Then we can verify is convex because , , and .
Example 4. (Variation distance) Let us consider
in (15). Then we can easily get is convex, , and .
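The generating functions of Examples 1 and 4 can be verified numerically. The sketch below assumes the standard generators $f(u) = (u-1)^2/(u+1)$ for the triangular discrimination and $f(u) = |u-1|$ for the variation distance.

```python
# Check numerically that f(u) = (u-1)^2/(u+1) generates the triangular
# discrimination and f(u) = |u-1| generates the variation distance.
def f_divergence(f, P, Q):
    return sum(q * f(p / q) for p, q in zip(P, Q) if q > 0)

P, Q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]

delta_via_f = f_divergence(lambda u: (u - 1) ** 2 / (u + 1), P, Q)
delta_direct = sum((p - q) ** 2 / (p + q) for p, q in zip(P, Q))
print(abs(delta_via_f - delta_direct) < 1e-12)  # True

v_via_f = f_divergence(lambda u: abs(u - 1), P, Q)
v_direct = sum(abs(p - q) for p, q in zip(P, Q))
print(abs(v_via_f - v_direct) < 1e-12)  # True
```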
Theorem 5. Let  be two nonnegative generating functions and suppose there exist real constants  such that  and if  then
if , then . We have the inequalities:
Proof. The conditions can be rewritten as . So from the formula (15),
and
☐
We have shown that , , ,  are all nonnegative. In the following we derive some inequalities.
Theorem 6.
Proof. When , neither  nor  is equal to 0. We consider the function:
The derivative of the function is
Let
A straightforward computation of the derivative shows
So  is a concave function when  and . This means  attains its maximum 0 at the point . Accordingly,  when . From (16), we find
and
Using l’Hôpital’s rule (differentiating twice),
Using l’Hôpital’s rule (differentiating once),
Thus
When , . As a consequence of Theorem 5, we obtain the result
Thus the theorem is proved. ☐
Theorem 7.
Proof. When , neither  nor  is equal to 0. We consider the function:
The derivative of the function is
By the standard inequality ,
So
and
Using l’Hôpital’s rule (differentiating twice),
Using l’Hôpital’s rule (differentiating once),
Thus
or
When , . As a consequence of Theorem 5, we obtain the result
Thus the theorem is proved. ☐
Theorem 8.
Proof. When , neither  nor  is equal to 0. We consider the function:
When , . As a consequence of Theorem 5, we obtain the result . This means . Next,
Thus the theorem is proved. ☐
From the above theorems, inequalities among these measures are given by
These inequalities are sharper than the inequalities in [] Theorem 2 and [] (Section 3.1).
5. Asymptotic Approximation
Definition 5. For all $P, Q \in \Gamma_n$, the Chi-square divergence is defined by
$$\chi^2(P, Q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}.$$
In [],
In this section, we discuss the asymptotic approximation of  and  when  in the  norm.
Theorem 9. If , then
Proof. From the Taylor series expansion at q, we have
Hence
☐
Equivalently,  when . So in some cases, one information-theoretic divergence can be substituted for another. The asymptotic property also explains the boundedness of the triangular discrimination and, in turn, of the new metrics.
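The approximation can also be observed numerically. The sketch below assumes that the intended relation is the standard second-order approximation $\Delta(P, Q) \approx \tfrac{1}{2}\chi^2(P, Q)$ as $P \to Q$, which follows from the Taylor argument above; the helper names are illustrative.

```python
# Numerical illustration of the asymptotic behaviour: the ratio of the
# triangular discrimination to half the Chi-square divergence tends to 1
# as P approaches Q (assumed standard second-order approximation).
def triangular_discrimination(P, Q):
    return sum((p - q) ** 2 / (p + q) for p, q in zip(P, Q) if p + q > 0)

def chi_square(P, Q):
    return sum((p - q) ** 2 / q for p, q in zip(P, Q) if q > 0)

Q = [0.2, 0.3, 0.5]
for eps in (0.1, 0.01, 0.001):
    P = [0.2 + eps, 0.3 - eps, 0.5]  # small perturbation of Q, still a distribution
    ratio = triangular_discrimination(P, Q) / (chi_square(P, Q) / 2)
    print(eps, ratio)  # ratio tends to 1 as eps -> 0
```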
Acknowledgments
The authors would like to thank the editor and referees for their helpful suggestions and comments on the manuscript. This work was supported by the China Postdoctoral Science Foundation (2015M571255), the National Science Foundation of China (NSFC) Grant No. 71171119, the Fundamental Research Funds for the Central Universities (FRF-CU) Grant No. 2722013JC082, and the Fundamental Research Funds for the Central Universities under Grant No. NKZXTD1403.
Author Contributions
Wrote the paper: Guoxiang Lu and Bingqing Li. Both authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Basseville, M. Divergence measures for statistical data processing—An annotated bibliography. Signal Process. 2013, 93, 621–633.
- Csiszár, I.; Shields, P.C. Information theory and statistics: A tutorial. Found. Trends Commun. Inf. Theory 2004, 1, 417–528.
- Dragomir, S.S.; Gluščević, V. Some inequalities for the Kullback–Leibler and χ2-distances in information theory and applications. Tamsui Oxf. J. Math. Sci. 2001, 17, 97–111.
- Reid, M.D.; Williamson, R.C. Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 2011, 12, 731–817.
- Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 2006, 52, 4394–4412.
- Vajda, I. Theory of Statistical Inference and Information; Kluwer Academic Press: London, UK, 1989.
- Csiszár, I. Axiomatic characterizations of information measures. Entropy 2008, 10, 261–273.
- Cichocki, A.; Cruces, S.; Amari, S. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy 2011, 13, 134–170.
- Taneja, I.J. Seven means, generalized triangular discrimination, and generating divergence measures. Information 2013, 4, 198–239.
- Arndt, C. Information Measures: Information and its Description in Science and Engineering; Springer Verlag: Berlin, Germany, 2004.
- Brown, R.F. A Topological Introduction to Nonlinear Analysis; Birkhäuser: Basel, Switzerland, 1993.
- Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860.
- Topsøe, F. Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theory 2000, 46, 1602–1609.
- Csiszár, I. Information type measures of differences of probability distribution and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318.
- Taneja, I.J. Refinement inequalities among symmetric divergence measures. Austr. J. Math. Anal. Appl. 2005, 2. Available online: http://ajmaa.org/cgi-bin/paper.pl?string=v2n1/V2I1P8.tex (accessed on 14 July 2015).
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).