Inequalities for Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal Type f–Divergences

In this paper, we introduce new divergences, called the Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal divergences, defined in relation to convex functions. Theorems giving lower and upper bounds for the two newly introduced divergences are provided. The obtained results imply new inequalities for several known divergences. Examples showing that these divergences generalize the Rényi, Tsallis, and Kullback–Leibler types of divergences are provided in order to illustrate a few applications of the new divergences.


Introduction
The Sharma–Mittal entropy was introduced as a new two-parameter measure of information [1]. It has previously been studied in the context of multi-dimensional harmonic oscillator systems [2]. This entropy can also be formulated in terms of exponential families, to which many common statistical distributions, including the Gaussians and discrete multinomials (that is, normalized histograms), belong. In physical applications it plays a major role in the field of thermo-statistics [3].
The Sharma–Mittal entropy is also applied in the analysis of the results of machine learning methods [4,5]. Additionally, the divergence based on this entropy can serve as a cost function in the context of so-called Twin Gaussian Processes [6].
It was originally shown in [7] that the Sharma–Mittal entropy generalizes both the Tsallis and Rényi entropies, which arise as its limiting cases. In [8], the authors suggested a physical meaning of the Sharma–Mittal entropy: it is the free energy difference between the equilibrium and the off-equilibrium distribution.
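As a quick numerical illustration of these limiting cases, the following sketch (our own addition, not part of the original development; it assumes the standard two-parameter form of the Sharma–Mittal entropy) checks the Rényi limit $\beta \to 1$ and the Tsallis case $\beta = \alpha$:

```python
import numpy as np

def sharma_mittal_entropy(p, alpha, beta):
    """Standard two-parameter Sharma-Mittal entropy (alpha, beta != 1)."""
    s = np.sum(p**alpha)
    return (s**((1.0 - beta) / (1.0 - alpha)) - 1.0) / (1.0 - beta)

def renyi_entropy(p, alpha):
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def tsallis_entropy(p, alpha):
    return (np.sum(p**alpha) - 1.0) / (1.0 - alpha)

p = np.array([0.2, 0.5, 0.3])
alpha = 2.0

# beta -> 1 recovers the Renyi entropy (approximated with beta = 1 + 1e-9) ...
print(sharma_mittal_entropy(p, alpha, beta=1.0 + 1e-9), renyi_entropy(p, alpha))
# ... and beta = alpha recovers the Tsallis entropy.
print(sharma_mittal_entropy(p, alpha, beta=alpha), tsallis_entropy(p, alpha))
```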
Recently, a manuscript was published showing, in opposition to the work [8], that beyond standard thermodynamic systems the Sharma–Mittal entropy does not reduce only to the Kullback–Leibler entropy. In [9], Verma and Merigó present the use of the Sharma–Mittal entropy in an intuitionistic fuzzy environment. Additionally, in [5], Koltcov et al. demonstrate that the Sharma–Mittal entropy is a tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do.
Other applications of this entropy include interesting results in the cosmological setting, such as black hole thermodynamics [10]. Namely, it helps to describe the current accelerated universe by using the vacuum energy in a suitable manner [11]. In addition, [12] established the relation between anomalous diffusion processes and the Sharma–Mittal entropy.
This paper builds on publications in which we introduced new types of f-divergences [13–16].
In this paper, we generalize Sharma–Mittal type divergences in order to obtain new types of divergences, and hence inequalities from which new results and generalizations for known divergences can be derived. These inequalities estimate the lower and upper bounds that determine the level of the uncertainty measure.
Let $g : I \to \mathbb{R}$ be a convex function on an interval $I \subset \mathbb{R}$. Let $x = (x_1, \ldots, x_n) \in I^n$ and $p_i \in [0, 1]$ for $i = 1, \ldots, n$ with $\sum_{i=1}^{n} p_i = 1$. Jensen's inequality is as follows (see [22]):
$$g\left(\sum_{i=1}^{n} p_i x_i\right) \leq \sum_{i=1}^{n} p_i g(x_i).$$
When the function $k : \mathbb{R}^+ \to \mathbb{R}$ is convex and increasing and the function $l : \mathbb{R}^+ \to \mathbb{R}$ is convex, then the composition of the functions $k \circ l : \mathbb{R}^+ \to \mathbb{R}$ is convex. We assume that the probabilities $p_i \geq 0$ and $q_i > 0$ for $i = 1, \ldots, n$.
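As a quick numerical sanity check of Jensen's inequality, the following sketch (illustrative only; the convex function $g(t) = t \log t$ and the random weights are our own choices) verifies the inequality for a set of sample points:

```python
import numpy as np

# Numerical check of Jensen's inequality:
#   g(sum_i p_i * x_i) <= sum_i p_i * g(x_i)
# for a convex g; here g(t) = t*log(t) (an illustrative choice).
g = lambda t: t * np.log(t)

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=6)   # points x_i in the interval I
p = rng.random(6)
p /= p.sum()                        # weights p_i >= 0 with sum 1

assert g(np.dot(p, x)) <= np.dot(p, g(x))
print(g(np.dot(p, x)), "<=", np.dot(p, g(x)))
```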

Remark 1.
If, additionally, $\alpha$ tends to 1, then the claim follows from the proof of Equation (11), where $\varphi : (0, +\infty) \times \mathbb{R}^+ \to \mathbb{R}^+$ is an increasing, non-negative and differentiable function for $\beta > 1$. We have $\sum_{i=1}^{n} q_i \varphi\left(\frac{p_i}{q_i}, \alpha\right) = 1$ for $\alpha = 1$, and these quantities are increasing and non-negative for $\alpha > 1$. According to [16], we may substitute the function $\varphi\left(\frac{p_i}{q_i}, \alpha\right) = f_\alpha\left(\frac{p_i}{q_i}\right)$; we assume that $h(1) = 1$. The function $\varphi$ is a generalization of the function $f_\alpha\left(\frac{p_i}{q_i}\right)$ which is used, for example, in the Csiszár f-divergence. The condition $\beta \to 1$ means that the limit of the generalized Sharma–Mittal divergence is equal to the generalized (h, F)-Rényi divergence. Hence, we have implications for the generalized forms of entropies.
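To make the role of $\varphi$ and the $\beta \to 1$ limit concrete, the following sketch may help; the functional form of gen_sm_div below is our assumption, chosen so that $h = \mathrm{id}$ and $\varphi(t, \alpha) = t^{\alpha}$ recover the classical Sharma–Mittal divergence, and both function names are ours:

```python
import numpy as np

def gen_sm_div(p, q, alpha, beta, h, phi):
    # Assumed form of the generalized (h, phi) Sharma-Mittal divergence.
    s = h(np.sum(q * phi(p / q, alpha)))
    return (s**((1.0 - beta) / (1.0 - alpha)) - 1.0) / (beta - 1.0)

def gen_renyi_div(p, q, alpha, h, phi):
    # The beta -> 1 limit: a generalized (h, F)-Renyi divergence.
    return np.log(h(np.sum(q * phi(p / q, alpha)))) / (alpha - 1.0)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
h = lambda t: t           # identity, so h(1) = 1
phi = lambda t, a: t**a   # recovers f_alpha(t) = t**alpha

print(gen_sm_div(p, q, 2.0, 1.0 + 1e-9, h, phi))  # ~ 0.1613
print(gen_renyi_div(p, q, 2.0, h, phi))           # matches the limit
```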

Remark 4.
In (6), when the parameter $\beta = \alpha$, the function $h = \mathrm{id}$, and $\varphi\left(\frac{p_i}{q_i}, \alpha\right) = \left(\frac{p_i}{q_i}\right)^{\alpha}$, the generalized divergence reduces to the Tsallis divergence. This work is more theoretical than practical; therefore, the implications are formulated in the mathematical domain, that is, by constructing a general model from which the known specific cases follow.
We similarly introduce a new generalized (h, φ) Jeffreys–Sharma–Mittal divergence as follows:
$$\mathrm{Jef}D^{h,\varphi}_{\alpha,\beta}(p, q) = D^{h,\varphi}_{\alpha,\beta}(p \,\|\, q) + D^{h,\varphi}_{\alpha,\beta}(q \,\|\, p).$$
Taking into account the inequality from [17] describing the relation between the Jensen–Shannon and Jeffreys divergences,
$$\mathrm{JS}(p, q) \leq \frac{1}{4} J(p, q), \qquad (10)$$
we can formulate the following. We define the Jensen–Sharma–Mittal h-divergence
$$\mathrm{Jen}D^{h}_{\alpha,\beta}(p, q) = \frac{1}{2}\left[D^{h}_{\alpha,\beta}\!\left(p \,\Big\|\, \frac{p+q}{2}\right) + D^{h}_{\alpha,\beta}\!\left(q \,\Big\|\, \frac{p+q}{2}\right)\right], \qquad (8)$$
where, in (8), the divergence is taken against the mixture $\frac{p+q}{2}$. In the same way, we define the Jeffreys–Sharma–Mittal h-divergence:
$$\mathrm{Jef}D^{h}_{\alpha,\beta}(p, q) = D^{h}_{\alpha,\beta}(p \,\|\, q) + D^{h}_{\alpha,\beta}(q \,\|\, p). \qquad (9)$$
Additionally, if the function $h(t) = t$, then we obtain the Jensen–Sharma–Mittal and the Jeffreys–Sharma–Mittal divergences of order α and degree β, respectively.
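A sketch of the two constructions, under the assumption that (8) and (9) follow the usual Jensen (midpoint) and Jeffreys (symmetrization) patterns applied to the generalized divergence; the function names are ours, and gen_sm_div repeats the assumed form from the previous sketch:

```python
import numpy as np

def gen_sm_div(p, q, alpha, beta, h, phi):
    # Assumed generalized (h, phi) Sharma-Mittal divergence (as above).
    s = h(np.sum(q * phi(p / q, alpha)))
    return (s**((1.0 - beta) / (1.0 - alpha)) - 1.0) / (beta - 1.0)

def jeffreys_sm(p, q, alpha, beta, h, phi):
    # Jeffreys-type symmetrization: D(p||q) + D(q||p).
    return (gen_sm_div(p, q, alpha, beta, h, phi)
            + gen_sm_div(q, p, alpha, beta, h, phi))

def jensen_sm(p, q, alpha, beta, h, phi):
    # Jensen-type construction against the midpoint m = (p + q)/2.
    m = 0.5 * (p + q)
    return 0.5 * (gen_sm_div(p, m, alpha, beta, h, phi)
                  + gen_sm_div(q, m, alpha, beta, h, phi))
```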
When, in (8) and (9), $\beta \to 1$ and we substitute $\varphi\left(\frac{p_i}{q_i}, \alpha\right) = f_\alpha\left(\frac{p_i}{q_i}\right)$, we obtain the generalized (h, F) Jensen–Rényi and Jeffreys–Rényi divergences defined in [16], respectively. The following theorem is a generalization and refinement of the inequalities for some known divergences; it provides lower and upper bounds for the generalized (h, φ) Jeffreys–Sharma–Mittal divergence, allowing a more accurate estimation of its uncertainty measure.

Theorem 1. Let $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ be two discrete probability distributions with $p_i > 0$, $q_i > 0$, $\frac{q_i}{p_i} \in I_0$, $\frac{p_i}{q_i} \in I_0$, $i = 1, \ldots, n$, where $I_0 \subset \mathbb{R}$ is an interval such that $1 \in I_0$. Let $\varphi : I_0 \times \mathbb{R}^+ \to \mathbb{R}^+$ be an increasing, non-negative and differentiable function for $\alpha \geq 1$, where $1 < \beta \leq \alpha$, $\alpha \in \mathbb{R}^+ \setminus \{1\}$, and let $h : I_0 \to \mathbb{R}$ be a convex and increasing function on $I_0$.
Then, the following inequalities (16) are valid.

Proof. Taking into account the assumptions, we can formulate the inequality (17). Since the function h is increasing and convex, from (4) and (17) we obtain the inequalities (18). In the same way, we obtain the inequalities (19). From (9), (18), (19), and the definition of the Jeffreys divergence, it follows that the inequality (20) holds. This inequality is the upper bound for the generalized (h, φ) Jeffreys–Sharma–Mittal divergence.
By using the convexity of the function h with h(1) = 1, the inequality (21) is valid for β > 1. From (7), the derivative appearing there can be computed explicitly (22). The function $f(t) = \log t$ is concave and increasing; hence, from (21) and (22) we obtain the inequality (23). Similarly, we obtain the second inequality (24). From (6), (23), and (24), and then by using the definition (9), we obtain (25). This result is the lower bound of the generalized (h, φ) Jeffreys–Sharma–Mittal divergence. Combining (20) and (25), we obtain the expected inequalities (16). □

Corollary 1.
When we substitute $\varphi\left(\frac{p_i}{q_i}, \alpha\right) = \left(\frac{p_i}{q_i}\right)^{\alpha}$, then from (16) we obtain the inequalities for the Jeffreys–Sharma–Mittal h-divergence. Hence, we have that (30) is greater than the expression in (31). Then, combining (27), (29)–(31), and using the definition (8), the inequality (32) follows; it is the lower bound of the generalized (h, φ) Jensen–Sharma–Mittal divergence. When we consider the function in (33) with $1 < \beta \leq \alpha$, then for the convex and increasing function h we have from (4) that (33) is smaller than the subsequent expression. In a similar way, we conclude the corresponding inequality for the second function. Then, combining (33)–(36) and the definition (8), after the proper transformations we obtain the inequality which is the upper bound of the generalized (h, φ) Jensen–Sharma–Mittal divergence.

Remark 5.
It can be seen that the lower bounds for both the Jeffreys (25) and the Jensen (32) Sharma–Mittal (h, φ) divergences are independent of the function h.

Remark 6.
Taking into account the inequality (10), we obtain an alternative upper bound for the Jensen–Sharma–Mittal divergence and an alternative lower bound for the Jeffreys–Sharma–Mittal generalized (h, φ) divergence, respectively.

Applications
In this section we show how our theory works.

Bounds for Sharma-Mittal Divergences
For the functions $h(t) = t$ and $\varphi(t, \alpha) = t^{\alpha}$, based on Theorems 1 and 3, we obtain the lower and upper bounds (38) and (39) for the Jeffreys–Sharma–Mittal and Jensen–Sharma–Mittal divergences, respectively.

Remark 7. The above lower bounds in (38) and (39) are the same as for Rényi-type divergences because they are independent of the parameter β, which in that case approaches 1.
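For illustration, the following sketch evaluates the Jeffreys- and Jensen-type quantities in the case $h(t) = t$, $\varphi(t, \alpha) = t^{\alpha}$, where the generalized divergence reduces to the plain Sharma–Mittal divergence; the sample distributions and parameter pairs are arbitrary choices, and the midpoint form of the Jensen variant is assumed as above:

```python
import numpy as np

def sm_div(p, q, alpha, beta):
    # Plain Sharma-Mittal divergence: h(t) = t, phi(t, alpha) = t**alpha.
    s = np.sum(p**alpha * q**(1.0 - alpha))
    return (s**((1.0 - beta) / (1.0 - alpha)) - 1.0) / (beta - 1.0)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
m = 0.5 * (p + q)

for alpha, beta in [(2.0, 1.5), (2.0, 2.0), (3.0, 2.0)]:  # 1 < beta <= alpha
    jef = sm_div(p, q, alpha, beta) + sm_div(q, p, alpha, beta)
    jen = 0.5 * (sm_div(p, m, alpha, beta) + sm_div(q, m, alpha, beta))
    print(f"alpha={alpha}, beta={beta}: Jef={jef:.4f}, Jen={jen:.4f}")
```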

Remark 8.
Substituting different values of the parameters α, β such that 1 < β ≤ α, and taking into account the assumptions of Theorems 1 and 3 on the functions h and φ, we can formulate new types of divergences and related inequalities based on the generalized (h, φ) Sharma–Mittal divergence.

Bounds for Tsallis Divergences
When we make the same assumptions as for the Sharma–Mittal divergences, with the additional condition that β = α, we obtain the bounds for Tsallis-type divergences as follows.

Bounds for Kullback-Leibler Divergences
When we have the same situation as in the case of the Tsallis divergence, that is, $h(t) = t$, $\varphi(t, \alpha) = t^{\alpha}$, $\alpha = \beta$, and additionally both α and β approach 1, we obtain new upper bounds for the Jeffreys and Jensen–Shannon divergences, respectively.
The last inequality is equivalent to $\mathrm{Jen}S(p, q) \geq 2 \log 2$.
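The following sketch illustrates this limit numerically: with $\alpha = \beta \to 1$ the Sharma–Mittal divergence tends to the Kullback–Leibler divergence, so the Jeffreys- and Jensen-type constructions tend to the classical Jeffreys and Jensen–Shannon divergences (the sample distributions are our choice, and the Jensen variant is assumed in the midpoint form used above):

```python
import numpy as np

def sm_div(p, q, alpha, beta):
    s = np.sum(p**alpha * q**(1.0 - alpha))
    return (s**((1.0 - beta) / (1.0 - alpha)) - 1.0) / (beta - 1.0)

def kl_div(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
m = 0.5 * (p + q)

a = b = 1.0 + 1e-6  # alpha = beta -> 1: the Kullback-Leibler limit
print(sm_div(p, q, a, b) + sm_div(q, p, a, b),
      kl_div(p, q) + kl_div(q, p))                    # Jeffreys divergence
print(0.5 * (sm_div(p, m, a, b) + sm_div(q, m, a, b)),
      0.5 * (kl_div(p, m) + kl_div(q, m)))            # Jensen-Shannon divergence
```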

Summary
In this paper, new types of divergences have been defined, which are generalizations of others known and used so far in information theory.
The manuscript deals mainly with issues in the field of pure mathematics; therefore, the standard axioms of entropy used in thermodynamics could, in this case, be extended by other assumptions and properties.
These divergences have been introduced with a view to the new physical interpretations that could be generated from them.
The generalized Sharma–Mittal divergence, and consequently the Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal divergences, have been defined in order to obtain better estimates for known entropies, which allows a more accurate determination of the dispersion measure of different distributions.
The derived inequalities provide both upper and lower bounds for the considered f-divergences. As a consequence, we obtain specific estimates for some new order measures. Hence, they offer much wider interpretation possibilities when comparing probability distributions in the sense of mutual distances in different spaces.
In the era of advancing quantum mechanics, scientists are striving to build quantum computers with very high computing power. The obtained results, despite their mathematical and analytical complexity, can quickly yield specific numerical intervals that estimate the newly introduced entropies. Therefore, results such as those in this paper will be useful in developing issues in information theory.
This work belongs to the area of pure mathematics; it is therefore more theoretical than practical, and it makes it possible to recover the existing known entropies by means of the newly defined generalizations. These generalizations can be used for interpreting various physical phenomena. The aim of this manuscript was to provide new theoretical solutions for physicists who, with their knowledge and experience, will be able to look for new applications.