A Class of New Metrics Based on Triangular Discrimination

Information-theoretic divergences are widely used in information theory, statistics and many other application areas. To meet the requirement of metric properties, we introduce a class of new metrics based on triangular discrimination, which are bounded. Moreover, we obtain some sharp inequalities relating the triangular discrimination to other information-theoretic divergences. Their asymptotic approximation properties are also discussed.


Introduction
In many applications such as pattern recognition, machine learning, statistics, optimization and other applied branches of mathematics, it is beneficial to use information-theoretic divergences rather than the squared Euclidean distance to estimate the (dis)similarity of two probability distributions or positive arrays [1][2][3][4][5][6][7][8][9]. Among them, the Kullback-Leibler divergence (relative entropy), triangular discrimination, variation distance, Hellinger distance, Jensen-Shannon divergence, symmetric chi-square divergence, J-divergence and other important measures often play a critical role. Unfortunately, most of these divergences are neither metrics nor bounded [10]. As we know, metric properties are the precondition for numerous convergence properties of iterative algorithms [11]. Moreover, boundedness is also of great concern in numerical computations and simulations. In [12], Endres and Schindelin proved that the square root of twice the Jensen-Shannon divergence is a metric. Triangular discrimination, presented by Topsøe in [13], is a non-logarithmic measure and is simple to compute. Inspired by [12], we study the triangular discrimination. The main result of this paper is a class of new metrics derived from the triangular discrimination. Finally, some new relationships among the triangular discrimination, the Jensen-Shannon divergence, the square of the Hellinger distance and the variation distance are also obtained.

Definition and Auxiliary Results
Definition 1. Let Γ_n = {P = (p_1, p_2, ..., p_n) : p_i ≥ 0, Σ_{i=1}^{n} p_i = 1} be the set of all complete finite discrete probability distributions. For all P, Q ∈ Γ_n, the triangular discrimination is defined by

Δ(P, Q) = Σ_{i=1}^{n} (p_i − q_i)² / (p_i + q_i).    (1)

In the above definition, we use the convention, based on the corresponding limit, that 0/0 = 0. The triangular discrimination is obviously symmetric, nonnegative and vanishes for P = Q, but it does not fulfill the triangle inequality. In view of the foregoing, the concept of triangular discrimination should be generalized. For P, Q ∈ Γ_n, we study the generalized function Δ_α(P, Q) given in (2), where α ∈ (0, +∞).
In the following, the α-power of the summand of Δ(P, Q), namely L(p, q) = (p − q)²/(p + q) with the same convention L(0, 0) = 0, is discussed for all α ∈ (0, +∞).
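For readers who want to experiment numerically, the following minimal Python sketch (not part of the original derivation; the function names are illustrative) computes the summand L(p, q) and the triangular discrimination Δ(P, Q), using the convention 0/0 = 0.

```python
from typing import Sequence


def L(p: float, q: float) -> float:
    """Summand of the triangular discrimination: (p - q)^2 / (p + q), with 0/0 := 0."""
    if p + q == 0.0:
        return 0.0
    return (p - q) ** 2 / (p + q)


def triangular_discrimination(P: Sequence[float], Q: Sequence[float]) -> float:
    """Delta(P, Q) = sum_i (p_i - q_i)^2 / (p_i + q_i) for probability vectors P, Q."""
    assert len(P) == len(Q)
    return sum(L(p, q) for p, q in zip(P, Q))


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    Q = [0.1, 0.6, 0.3]
    print(triangular_discrimination(P, Q))  # symmetric, nonnegative, 0 iff P == Q
```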
Lemma 2. Let h(x) be the function defined above with parameter a > 0. Then h is monotonically increasing on [0, a) and monotonically decreasing on (a, +∞). Proof. The monotonicity follows by considering 0 ≤ x < a and x > a separately.
Lemma 3. The function R_pq(r) has two minima, one at r = p and the other at r = q.
Next, consider the monotonicity of R_pq(r) in the open interval (p, q). From Lemma 3 we have (5), and from Lemma 2 we have (6). Combining (5) and (6), the equality holds if and only if r = y. So, with respect to the variable r in the open interval (p, q), B(p, r) and B(q, r) are both monotonically decreasing, and hence B(p, r) + B(q, r) is also monotonically decreasing. Using (4), this shows lim_{r→p+} [B(p, r) + B(q, r)] > 0 and lim_{r→q−} [B(p, r) + B(q, r)] < 0. So B(p, r) + B(q, r) has exactly one zero in the open interval (p, q) with respect to the variable r. As a consequence, R′_pq(r) has exactly one zero x_0 in the open interval (p, q), which means R′_pq(r) > 0 on (p, x_0) and R′_pq(r) < 0 on (x_0, q). From the above we know that R_pq(r) has only one maximum and no minimum in the open interval (p, q).
As a result, the conclusion in the lemma is obtained.
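The displayed formula of R_pq(r) is not reproduced above; a natural candidate consistent with the surrounding discussion is the triangle-inequality defect R_pq(r) = (L(p, r))^α + (L(q, r))^α − (L(p, q))^α, and that assumption is what the following sketch (ours) evaluates. On a grid over (p, q) it exhibits the two endpoint minima of Lemma 3 and a single interior maximum.

```python
def L(p: float, q: float) -> float:
    """Summand of the triangular discrimination, with the convention 0/0 = 0."""
    return 0.0 if p + q == 0.0 else (p - q) ** 2 / (p + q)


def R(p: float, q: float, r: float, alpha: float) -> float:
    """Assumed form of R_pq(r): the triangle-inequality defect of (L(., .))^alpha at r."""
    return L(p, r) ** alpha + L(q, r) ** alpha - L(p, q) ** alpha


if __name__ == "__main__":
    p, q, alpha, steps = 0.2, 0.7, 0.5, 1000
    grid = [p + (q - p) * k / steps for k in range(steps + 1)]
    values = [R(p, q, r, alpha) for r in grid]
    print(f"values at r = p and r = q: {values[0]:.6f}, {values[-1]:.6f}")  # the two minima, both 0
    r_star = grid[values.index(max(values))]
    print(f"single interior maximum near r = {r_star:.3f} with value {max(values):.6f}")
```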
Theorem 1. Let p, q, r ∈ [0, +∞). Then the triangle inequality

(L(p, q))^{1/2} ≤ (L(p, r))^{1/2} + (L(q, r))^{1/2}    (7)

holds. Proof. If p = q, then L(p, q) = 0 and the triangle inequality (7) obviously holds. If p ≠ q and one of p, q is equal to 0, it is easy to verify that (7) holds.
Next we assume 0 < p < q without loss of generality and note that the formula used below is valid.
This shows the triangle inequality (8) does not hold.
Summing up the theorems and the corollary above, we obtain the main theorem of this section: Theorem 3. The function (L(p, q))^α satisfies the triangle inequality (8) if and only if 0 < α ≤ 1/2.
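As a quick numerical illustration of Theorem 3 (our sketch, not part of the paper), one can probe the pointwise triangle inequality for (L(p, q))^α by taking r slightly larger than p: for α ≤ 1/2 the defect stays nonnegative, while for every α > 1/2 it eventually becomes negative.

```python
def L(p: float, q: float) -> float:
    """Summand of the triangular discrimination, with the convention 0/0 = 0."""
    return 0.0 if p + q == 0.0 else (p - q) ** 2 / (p + q)


def defect(p: float, q: float, r: float, alpha: float) -> float:
    """(L(p,r))^a + (L(q,r))^a - (L(p,q))^a; a negative value violates the triangle inequality."""
    return L(p, r) ** alpha + L(q, r) ** alpha - L(p, q) ** alpha


if __name__ == "__main__":
    p, q = 0.2, 0.7
    for alpha in (0.3, 0.5, 0.51, 0.75, 1.0):
        # Take r just above p; for alpha > 1/2 the defect turns negative (the inequality fails).
        worst = min(defect(p, q, p + eps, alpha) for eps in (1e-2, 1e-3, 1e-4, 1e-5))
        print(f"alpha = {alpha}: worst defect = {worst:+.3e}")
```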
3. Metric Properties of Δ_α(P, Q)
In this section, we mainly prove the following theorem.
Theorem 4. Δ_α(P, Q) is a metric on Γ_n if and only if 0 < α ≤ 1/2.
Proof. From (2), it is easy to see that Δ_α(P, Q) ≥ 0 with equality only for P = Q, and that Δ_α(P, Q) = Δ_α(Q, P). So what concerns us is whether the triangle inequality holds for any P, Q, R ∈ Γ_n. When P = Q, Δ_α(P, Q) = 0 and the triangle inequality (9) holds apparently. So we assume P ≠ Q in the following.
Next we consider the value of α in two cases. (i) 0 < α ≤ 1/2. From Theorem 3, the inequality (L(p_i, q_i))^α ≤ (L(p_i, r_i))^α + (L(q_i, r_i))^α holds for every i. Applying Minkowski's inequality, we conclude that the triangle inequality (9) holds. (ii) α > 1/2. Next we prove that (p_1, ..., p_n) and (q_1, ..., q_n) are not extreme points of the auxiliary function F(x_1, ..., x_n). By symmetry, we only need to prove that (p_1, ..., p_n) is not an extreme point.
By taking partial derivatives, and since P ≠ Q, we may assume without loss of generality that p_1 ≠ q_1 and p_1 > 0.
Using the definition of an extreme point, there exists a point R ∈ Γ_n for which the resulting inequality is inconsistent with the triangle inequality (9).
From what has been discussed above, the conclusion in the theorem is obtained.
The generalization of this result to continuous probability distributions is straightforward. Consider a measurable space (X, A), and let P, Q be probability distributions with Radon-Nikodym densities p = dP/dµ, q = dQ/dµ with respect to a dominating σ-finite measure µ. Then the corresponding quantity Δ_α(P, Q), with the sum replaced by an integral with respect to µ, is a metric if and only if 0 < α ≤ 1/2. Next we discuss the maxima and minima of Δ_α(P, Q). It is obvious that the minimum Δ_α(P, Q) = 0 is attained if and only if P = Q. Because Δ(P, Q) can be rewritten in the form Δ(P, Q) = Σ_{i=1}^{n} [(p_i + q_i) − 4 p_i q_i/(p_i + q_i)] = 2 − 4 Σ_{i=1}^{n} p_i q_i/(p_i + q_i), Δ(P, Q) attains the maximum 2 when P, Q are two distinct deterministic distributions, namely when p_i q_i = 0 for every i.
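The following sketch (ours) illustrates the boundedness discussed above: Δ(P, Q) never exceeds 2 and reaches 2 for distributions with disjoint supports. It also spot-checks the triangle inequality for the square root of Δ, used here only as an α = 1/2 style aggregate for illustration, not as the paper's Equation (2).

```python
import random


def tri_disc(P, Q):
    """Triangular discrimination with the 0/0 = 0 convention."""
    return sum(0.0 if p + q == 0 else (p - q) ** 2 / (p + q) for p, q in zip(P, Q))


def random_dist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


if __name__ == "__main__":
    # Distinct deterministic distributions attain the maximum value 2.
    print(tri_disc([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # -> 2.0

    # Spot-check the triangle inequality for sqrt(Delta) on random triples.
    random.seed(0)
    for _ in range(10000):
        P, Q, R = (random_dist(4) for _ in range(3))
        assert tri_disc(P, Q) ** 0.5 <= tri_disc(P, R) ** 0.5 + tri_disc(Q, R) ** 0.5 + 1e-12
    print("no violation of the triangle inequality for sqrt(Delta) found")
```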
4. Some Inequalities among the Information-Theoretic Divergences
Definition 3. For all P, Q ∈ Γ_n, the Jensen-Shannon divergence is defined by

JS(P, Q) = (1/2) Σ_{i=1}^{n} [ p_i ln(2p_i/(p_i + q_i)) + q_i ln(2q_i/(p_i + q_i)) ].

The square of the Hellinger distance is defined by

h(P, Q) = (1/2) Σ_{i=1}^{n} (√p_i − √q_i)².

The variation distance is defined by

V(P, Q) = Σ_{i=1}^{n} |p_i − q_i|.

Next we introduce Csiszár's f-divergence [14].
The triangular discrimination, the Jensen-Shannon divergence, the square of the Hellinger distance and the variation distance are all f-divergences.
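Csiszár's f-divergence, Equation (15), is standardly given by C_f(P, Q) = Σ_i q_i f(p_i/q_i) for a convex generating function f with f(1) = 0. The sketch below (ours; the handling of q_i = 0 is simplified) checks that the generating function f_Δ(x) = (x − 1)²/(x + 1) reproduces the triangular discrimination.

```python
from typing import Callable, Sequence


def csiszar_f_divergence(f: Callable[[float], float],
                         P: Sequence[float], Q: Sequence[float]) -> float:
    """C_f(P, Q) = sum_i q_i * f(p_i / q_i); terms with q_i = 0 are skipped for simplicity
    (a full treatment uses the limiting value of q * f(p / q) as q -> 0)."""
    return sum(q * f(p / q) for p, q in zip(P, Q) if q > 0)


def f_delta(x: float) -> float:
    """Generating function of the triangular discrimination."""
    return (x - 1) ** 2 / (x + 1)


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    Q = [0.1, 0.6, 0.3]
    direct = sum((p - q) ** 2 / (p + q) for p, q in zip(P, Q))
    print(csiszar_f_divergence(f_delta, P, Q), direct)  # the two values coincide
```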

Example 1 (Triangular Discrimination). Let us consider f_Δ(x) = (x − 1)²/(x + 1) in (15). Then we can verify that this choice yields the triangular discrimination Δ(P, Q). In the same way, taking suitable generating functions f_JS, f_h and f_V in (15), we can verify that they yield the Jensen-Shannon divergence, the square of the Hellinger distance and the variation distance, respectively.

Theorem 5. Let f_1, f_2 be two nonnegative generating functions, and suppose there exist real constants k, K with k < K such that k ≤ f_1(x)/f_2(x) ≤ K. Then the corresponding inequalities hold between the associated f-divergences. Proof. The conditions can be rewritten as k f_2(x) ≤ f_1(x) ≤ K f_2(x). So the claimed inequalities follow from the formula (15). We have shown that f_Δ, f_JS, f_h, f_V are all nonnegative. In the following we derive some inequalities among these divergences.
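To see Theorem 5 in action numerically, the sketch below (ours) scans the ratio f_1(x)/f_2(x) on a log-spaced grid to estimate the constants k and K for the pair (f_Δ, f_h). The generator f_h(x) = (1/2)(√x − 1)², matching the convention h(P, Q) = (1/2) Σ_i (√p_i − √q_i)², is an assumption of this illustration.

```python
import math
from typing import Callable, Tuple


def f_delta(x: float) -> float:
    """Generating function of the triangular discrimination."""
    return (x - 1) ** 2 / (x + 1)


def f_hellinger(x: float) -> float:
    """Assumed convention: h(P, Q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * (math.sqrt(x) - 1) ** 2


def ratio_bounds(f1: Callable[[float], float],
                 f2: Callable[[float], float],
                 n: int = 200000) -> Tuple[float, float]:
    """Estimate inf and sup of f1(x)/f2(x) on a log-spaced grid of x > 0, x != 1."""
    ratios = []
    for k in range(n + 1):
        x = 10.0 ** (-8 + 16 * k / n)   # grid over [1e-8, 1e8]
        if abs(x - 1.0) > 1e-9:         # both generators vanish at x = 1
            ratios.append(f1(x) / f2(x))
    return min(ratios), max(ratios)


if __name__ == "__main__":
    k, K = ratio_bounds(f_delta, f_hellinger)
    print(f"approximately {k:.4f} * h <= Delta <= {K:.4f} * h")  # estimates close to 2 and 4
```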
Theorem 6. Proof. When x ≠ 1, both f_Δ(x) and f_JS(x) are not equal to 0. We consider the function φ(x) = f_Δ(x)/f_JS(x).
Proof. When x ≠ 1, both f_h(x) and f_JS(x) are not equal to 0. We consider the function φ(x) = f_h(x)/f_JS(x). The derivative of the function φ(x) is estimated by the standard inequality ln x ≤ x − 1 for x > 0.
When x = 1, f_h(1) = f_JS(1) = 0. As a consequence of Theorem 5, we obtain the result. Thus the theorem is proved.
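The same ratio scan can be applied to the pair (f_h, f_JS) treated in the last theorem. Here f_JS(x) = (1/2)[x ln x − (x + 1) ln((x + 1)/2)], the generator matching the Jensen-Shannon divergence with natural logarithms, is again an assumption of this illustration (our sketch, not the paper's derivation).

```python
import math


def f_js(x: float) -> float:
    """Assumed generator of the Jensen-Shannon divergence (natural logarithm)."""
    if x == 0.0:
        return 0.5 * math.log(2.0)  # limit value as x -> 0+
    return 0.5 * (x * math.log(x) - (x + 1) * math.log((x + 1) / 2))


def f_hellinger(x: float) -> float:
    """Assumed convention: h(P, Q) = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * (math.sqrt(x) - 1) ** 2


if __name__ == "__main__":
    n = 100000
    xs = [10.0 ** (-8 + 16 * k / n) for k in range(n + 1)]
    ratios = [f_js(x) / f_hellinger(x) for x in xs if abs(x - 1.0) > 1e-9]
    # The estimates come out close to ln 2 = 0.6931... and 1.
    print(f"approximately {min(ratios):.4f} * h <= JS <= {max(ratios):.4f} * h")
```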