1. Introduction
Research on classical inequalities, such as the Jensen and Hölder inequalities, has expanded greatly. These inequalities first appeared in discrete and integral forms, and many generalizations and improvements have since been proved (for instance, see [1,2]). Lately, they have proven to be very useful in information theory (for instance, see [3]).
Let $I$ be an interval in $\mathbb{R}$ and $f:I\to\mathbb{R}$ a convex function. If $\mathbf{x}=(x_1,\dots,x_n)$ is any $n$-tuple in $I^n$ and $\mathbf{p}=(p_1,\dots,p_n)$ a nonnegative $n$-tuple such that $P_n=\sum_{i=1}^{n}p_i>0$, then the well-known Jensen inequality
$$f\!\left(\frac{1}{P_n}\sum_{i=1}^{n}p_i x_i\right)\le\frac{1}{P_n}\sum_{i=1}^{n}p_i f(x_i)\qquad(1)$$
holds (see [4,5] or, for example, [6] (p. 43)). If $f$ is strictly convex, then (1) is strict unless all the $x_i$ with $p_i>0$ are equal.
Jensen’s inequality is one of the most famous inequalities in convex analysis, and many other well-known inequalities (such as Hölder’s inequality and the A–G–H inequality) arise as special cases of it. Besides mathematics, it has many applications in statistics, information theory, and engineering.
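Jensen’s inequality (1) is easy to verify numerically. The following sketch (the helper name is ours, for illustration only) checks it for the convex function $e^x$ with arbitrary positive weights:

```python
import math

def jensen_sides(f, x, p):
    """Return (f(weighted mean of x), weighted mean of f over x) for weights p."""
    P = sum(p)
    mean = sum(pi * xi for pi, xi in zip(p, x)) / P
    lhs = f(mean)                                       # f of the mean
    rhs = sum(pi * f(xi) for pi, xi in zip(p, x)) / P   # mean of f
    return lhs, rhs

x = [0.5, 1.0, 2.0, 4.0]
p = [1.0, 2.0, 3.0, 4.0]
lhs, rhs = jensen_sides(math.exp, x, p)  # exp is convex, so lhs <= rhs
assert lhs <= rhs
```

For a strictly convex function such as $e^x$ the inequality is strict whenever the $x_i$ are not all equal, which the sketch confirms on this data.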
Strongly related to Jensen’s inequality is the Lah–Ribarič inequality (see [7]),
$$\frac{1}{P_n}\sum_{i=1}^{n}p_i f(x_i)\le\frac{M-\bar{x}}{M-m}\,f(m)+\frac{\bar{x}-m}{M-m}\,f(M),\qquad(2)$$
which holds when $f$ is a convex function on $[m,M]$, $-\infty<m<M<\infty$, $\mathbf{p}$ is as in (1), $\mathbf{x}$ is any $n$-tuple in $[m,M]^n$, and $\bar{x}=\frac{1}{P_n}\sum_{i=1}^{n}p_i x_i$. If $f$ is strictly convex, then (2) is strict unless $x_i\in\{m,M\}$ for all $i$ with $p_i>0$.
The Lah–Ribarič inequality has been investigated extensively, and the interested reader can find many related results in the recent literature as well as in monographs such as [6,8,9]. It is therefore interesting to look for further refinements of the above inequality.
Our main result is a refinement of inequality (2). Using the same technique, we also give a refinement of inequality (1) (see [10]).
In addition, we deal with the notion of f-divergences, which measure the distance between two probability distributions. One of the most important is the Csiszár f-divergence; some of its special cases are the Shannon entropy, Jeffrey’s distance, the Kullback–Leibler divergence, the Hellinger distance, and the Bhattacharyya distance. We deduce relations for the mentioned f-divergences.
Let us say a few words about the organization of the paper. In the following section, we give a new refinement of the Lah–Ribarič inequality and state a known refinement of the Jensen inequality obtained by the same technique. Using the obtained results, we give a refinement of the famous Hölder inequality and some new refinements for the weighted power means and quasi-arithmetic means. In addition, we give a historical remark regarding the Jensen–Boas inequality. In Section 3, we give the results for various f-divergences. These are further examined for the Zipf–Mandelbrot law.
  2. New Refinements
The starting point of this consideration is the following lemma (see [11]).
Lemma 1. Let f be a convex function on an interval I. If  such that , then the inequality holds for any .  The main result is a refinement of the Lah–Ribarič inequality (2). As we will see, its proof is based on the idea from the proof of the Jensen–Boas inequality.
Theorem 1. Let  be a convex function on , , let  be as in (1), let  be any n-tuple in , and  Let  where  for , , , for  and , , for . Then the inequality holds, where  If f is concave on I, then the inequalities in (3) are reversed.  Proof.  Using the Lah–Ribarič inequality (2) for each of the subsets , we obtain
        
Using ,  and Lemma 1, we obtain
        
□
 Remark 1. If , the related term in the sum on the right-hand side of the first inequality in the proof of Theorem 1 remains unaltered (i.e., is equal to ).
Using the same technique, we obtain the following refinement of the Jensen inequality (1).
Theorem 2. Let I be an interval in  and  a convex function. Let  be any n-tuple in  and  a nonnegative n-tuple such that . Let  where  for ,  and . Then the inequality holds. If f is concave on I, then the inequalities in (4) are reversed.  Proof.  Using Jensen’s inequality (1), we obtain
        
which is (4).    □
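The refinement technique used in Theorems 1 and 2, splitting the index set into disjoint subsets and applying the base inequality on each, can be illustrated numerically. The sketch below is a simplified reading of Theorem 2 (assuming the interpolating term is the weighted combination of the subset means); the middle quantity is sandwiched between the two sides of (1):

```python
import math

def refined_jensen(f, x, p, blocks):
    """Return the chain f(mean) <= sum_k (P_k/P) f(mean_k) <= mean of f,
    where `blocks` is a partition of the index set {0, ..., n-1}."""
    P = sum(p)
    mean = sum(pi * xi for pi, xi in zip(p, x)) / P
    middle = 0.0
    for J in blocks:
        Pk = sum(p[i] for i in J)                    # block weight
        mk = sum(p[i] * x[i] for i in J) / Pk        # block mean
        middle += Pk * f(mk)
    middle /= P
    right = sum(pi * f(xi) for pi, xi in zip(p, x)) / P
    return f(mean), middle, right

x = [0.5, 1.0, 2.0, 4.0]
p = [1.0, 2.0, 3.0, 4.0]
lo, mid_, hi = refined_jensen(math.exp, x, p, blocks=[[0, 1], [2, 3]])
assert lo <= mid_ <= hi
```

The left inequality is Jensen's inequality applied to the block means, and the right one is Jensen's inequality inside each block, which is exactly the partitioning idea of the proof above.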
This idea for proving the refinement of our main result (and the refinement of the Jensen inequality) can also be found in another well-known result (see [6] (pp. 55–60)).
In Jensen’s inequality there is the condition “$\mathbf{p}$ a nonnegative $n$-tuple such that $P_n>0$”. In 1919, Steffensen proved the same inequality (1) under slightly relaxed conditions (see [12]).
Theorem 3 (Jensen–Steffensen). 
If $f$ is a convex function, $\mathbf{x}$ is a real monotonic n-tuple such that $x_i\in I$, and $\mathbf{p}$ is a real n-tuple such that
$$0\le P_k\le P_n\quad(k=1,\dots,n),\qquad P_n>0,\qquad\text{where }P_k=\sum_{i=1}^{k}p_i,$$
then (1) holds. If f is strictly convex, then inequality (1) is strict unless .  One of the many generalizations of the Jensen inequality is its Riemann–Stieltjes integral form.
Theorem 4 (the Riemann–Stieltjes form of Jensen’s inequality). 
Let $f$ be a continuous convex function, where I is the range of the continuous function $g:[a,b]\to\mathbb{R}$. The inequality
$$f\!\left(\frac{\int_a^b g(t)\,d\lambda(t)}{\int_a^b d\lambda(t)}\right)\le\frac{\int_a^b f(g(t))\,d\lambda(t)}{\int_a^b d\lambda(t)}\qquad(5)$$
holds, provided that λ is increasing, bounded and $\lambda(a)\neq\lambda(b)$. Analogously, an integral form of the Jensen–Steffensen inequality holds.
Theorem 5 (the Jensen–Steffensen inequality). 
If g is continuous and monotonic (either increasing or decreasing) and λ is either continuous or of bounded variation satisfying $\lambda(a)\le\lambda(x)\le\lambda(b)$ for all $x\in[a,b]$ and $\lambda(b)>\lambda(a)$, then (5) holds. In 1970, Boas gave the integral analogue of the Jensen–Steffensen inequality under slightly different conditions.
Theorem 6 (the Jensen–Boas inequality). 
If λ is continuous or of bounded variation satisfying $\lambda(a)\le\lambda(x_1)\le\lambda(y_1)\le\lambda(x_2)\le\cdots\le\lambda(y_{k-1})\le\lambda(x_k)\le\lambda(b)$ for all $x_i\in(y_{i-1},y_i)$, with $y_0=a$, $y_k=b$ and $\lambda(b)>\lambda(a)$, and if g is continuous and monotonic (either increasing or decreasing) in each of the k intervals $(y_{i-1},y_i)$, then inequality (5) holds. In 1982, J. Pečarić gave the following proof of the Jensen–Boas inequality.
Proof.  If 
 with the notation
        
        we have
        
Using Jensen’s inequality (1), we obtain
        
Using Jensen–Steffensen’s inequality (5) on each subinterval , we obtain
        
If  for some j, then  on , and the Jensen–Boas inequality follows easily.    □
Looking at the previous proof, we see that the technique is the same as the one used for our main result and for the refinement of the Jensen inequality.
By using Theorem 2, we obtain the following refinement of the discrete Hölder inequality (see [13,14]).
Corollary 1. Let  such that . Let ,  such that . Then:  Proof.  We use Theorem 2 with . Then  and from (4), we obtain
        
For the function  from (7), we obtain
        
Multiplying by  and raising to the power , we obtain
        
which is (6).    □
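The discrete Hölder inequality refined above can be sanity-checked in its basic form; a minimal sketch with the conjugate exponents $p=3$, $q=3/2$ (so that $1/p+1/q=1$):

```python
def holder_sides(a, b, p_exp, q_exp):
    """Return (sum a_i b_i, (sum a_i^p)^(1/p) * (sum b_i^q)^(1/q))
    for nonnegative sequences and conjugate exponents 1/p + 1/q = 1."""
    lhs = sum(ai * bi for ai, bi in zip(a, b))
    rhs = (sum(ai ** p_exp for ai in a) ** (1.0 / p_exp)
           * sum(bi ** q_exp for bi in b) ** (1.0 / q_exp))
    return lhs, rhs

a = [1.0, 2.0, 3.0]
b = [0.5, 0.25, 1.0]
lhs, rhs = holder_sides(a, b, 3, 1.5)  # 1/3 + 1/1.5 = 1
assert lhs <= rhs
```

Equality holds exactly when $a_i^p$ and $b_i^q$ are proportional, which is why a generic choice of sequences gives a strict gap.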
Corollary 2. Using the same conditions as in the previous corollary for , , , we obtain  Proof.  Consider first the case . We use Theorem 2 with . Then  and from (4), we obtain
        
For the function , we obtain
        
Multiplying by , and then by , we obtain
        
which is (8).
If , then , and the same result follows by symmetry (see the comments in Corollary 1).    □
 It is interesting to show how the previously obtained results impact the study of the weighted discrete power means and the weighted discrete quasi-arithmetic means.
Let $\mathbf{x}=(x_1,\dots,x_n)$ be a positive $n$-tuple, $\mathbf{p}=(p_1,\dots,p_n)$ a nonnegative $n$-tuple with $P_n=\sum_{i=1}^{n}p_i>0$, and $r\in\mathbb{R}$. The weighted discrete power means of order $r$ are defined as
$$M_n^{[r]}(\mathbf{x};\mathbf{p})=\begin{cases}\left(\dfrac{1}{P_n}\displaystyle\sum_{i=1}^{n}p_i x_i^{r}\right)^{1/r}, & r\neq 0,\\[2mm]\left(\displaystyle\prod_{i=1}^{n}x_i^{p_i}\right)^{1/P_n}, & r=0.\end{cases}$$
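A small sketch of the weighted power means (with the usual geometric-mean case at $r=0$), together with their well-known monotonicity in the order $r$; the function name is ours:

```python
import math

def power_mean(x, p, r):
    """Weighted power mean of order r; the geometric mean when r == 0."""
    P = sum(p)
    if r == 0:
        return math.exp(sum(pi * math.log(xi) for pi, xi in zip(p, x)) / P)
    return (sum(pi * xi ** r for pi, xi in zip(p, x)) / P) ** (1.0 / r)

x = [1.0, 2.0, 4.0]
p = [1.0, 1.0, 2.0]
# Power means are nondecreasing in the order r (harmonic <= geometric <= ...).
orders = [-1, 0, 1, 2]
means = [power_mean(x, p, r) for r in orders]
assert all(m1 <= m2 for m1, m2 in zip(means, means[1:]))
```

For this data the chain runs from the harmonic mean ($r=-1$) through the geometric ($r=0$) and arithmetic ($r=1$) means up to the quadratic mean ($r=2$).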
Using Theorem 2, we obtain the following inequalities for the weighted discrete power means. Notice that the left-hand and right-hand sides of both inequalities coincide; only the mixed means in the middle, which provide the refinement, differ.
Corollary 3. Let , , , . Let  such that . Then
where , , , , for .  Proof.  We use Theorem 2 with  for , , , , . From (4), we obtain
        
Substituting  with , and then raising to the power , we obtain
        
which is (9).
Similarly, we use Theorem 2 with  for , , , . We obtain
        
Substituting  with , and then raising to the power , inequality (10) easily follows. The other cases follow similarly.    □
Let $I$ be an interval in $\mathbb{R}$, $\mathbf{x}=(x_1,\dots,x_n)$ an $n$-tuple in $I^n$, and $\mathbf{p}=(p_1,\dots,p_n)$ a nonnegative $n$-tuple with $P_n=\sum_{i=1}^{n}p_i>0$. Then, for a strictly monotone continuous function $\varphi:I\to\mathbb{R}$, the discrete weighted quasi-arithmetic mean is defined as
$$M_\varphi(\mathbf{x};\mathbf{p})=\varphi^{-1}\!\left(\frac{1}{P_n}\sum_{i=1}^{n}p_i\,\varphi(x_i)\right).$$
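A minimal sketch of the quasi-arithmetic mean: with $\varphi(t)=\log t$ it reduces to the weighted geometric mean, and with $\varphi(t)=t$ to the ordinary weighted arithmetic mean (the helper name is ours):

```python
import math

def quasi_arithmetic_mean(x, p, phi, phi_inv):
    """Weighted quasi-arithmetic mean phi^{-1}((1/P) sum p_i phi(x_i))."""
    P = sum(p)
    return phi_inv(sum(pi * phi(xi) for pi, xi in zip(p, x)) / P)

x = [1.0, 2.0, 4.0]
p = [1.0, 1.0, 2.0]
# phi(t) = log t recovers the weighted geometric mean:
geo = quasi_arithmetic_mean(x, p, math.log, math.exp)
# phi(t) = t recovers the weighted arithmetic mean:
arith = quasi_arithmetic_mean(x, p, lambda t: t, lambda t: t)
assert geo <= arith  # A-G inequality, a special case of Jensen's inequality
```

The comparison `geo <= arith` is exactly the arithmetic–geometric mean inequality, one of the special cases of Jensen's inequality mentioned in the Introduction.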
Using Theorem 2, we obtain the following inequalities for quasi-arithmetic means.
Corollary 4. Let I be an interval in . Let , , , . Let  be a strictly monotone continuous function such that  is convex. Let  where  for ,  and . Then
where , , , , for .  Proof.  Theorem 2 with  and  gives
        
□
   3. Applications in Information Theory
In this section we give basic results concerning the discrete Csiszár f-divergence. In addition, bounds for the divergence of the Zipf–Mandelbrot law are obtained.
Let us denote the set of all probability densities by $\mathbb{P}$; i.e., $\mathbf{p}=(p_1,\dots,p_n)\in\mathbb{P}$ if $p_i\ge 0$ for $i=1,\dots,n$ and $\sum_{i=1}^{n}p_i=1$.
In [15], Csiszár introduced the f-divergence functional as
$$D_f(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}q_i\,f\!\left(\frac{p_i}{q_i}\right),$$
where $f:(0,\infty)\to(0,\infty)$ is a convex function, and it represents a “distance function” on the set of probability distributions $\mathbb{P}$.
In order to use nonnegative probability distributions in the f-divergence functional, we assume, as usual,
$$f(0)=\lim_{t\to 0^{+}}f(t),\qquad 0\,f\!\left(\frac{0}{0}\right)=0,\qquad 0\,f\!\left(\frac{a}{0}\right)=\lim_{t\to 0^{+}}t\,f\!\left(\frac{a}{t}\right)=a\lim_{u\to\infty}\frac{f(u)}{u},$$
and the following definition of a generalized f-divergence functional is given.
Definition 1 (the Csiszár 
f-divergence functional). 
Let  be an interval, and let  be a function. Let  be an n-tuple of real numbers and  an n-tuple of nonnegative real numbers such that  for every . The Csiszár f-divergence functional is defined as  Theorem 7. Let I be an interval in  and  a convex function. Let  be an n-tuple of real numbers and  an n-tuple of nonnegative real numbers such that  for every . Let  where  for , ,  and . Then the inequality holds.  Proof.  Using Theorem 2 with  and , we obtain
        
which is (13).    □
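The Csiszár functional itself is straightforward to implement. In the sketch below (the helper name and parameters are ours), the conventions for vanishing $q_i$ stated above are applied explicitly, and the choice $f(t)=t\log t$ recovers the Kullback–Leibler divergence:

```python
import math

def csiszar_divergence(f, p, q, slope_at_inf=None):
    """D_f(p, q) = sum_i q_i f(p_i / q_i), with the usual conventions
    0 * f(0/0) = 0 and 0 * f(a/0) = a * lim_{u -> inf} f(u)/u."""
    total = 0.0
    for pi, qi in zip(p, q):
        if qi > 0:
            total += qi * f(pi / qi)
        elif pi > 0:
            if slope_at_inf is None:
                raise ValueError("need lim f(u)/u to handle q_i = 0 < p_i")
            total += pi * slope_at_inf
        # the case p_i = q_i = 0 contributes nothing
    return total

# f(t) = t log t (convex) recovers the Kullback-Leibler divergence:
p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
kl = csiszar_divergence(lambda t: t * math.log(t), p, q)
assert kl >= 0
```

Another convex choice, $f(t)=|t-1|$ with $\lim_{u\to\infty}f(u)/u=1$, turns the same functional into the total variation distance $\sum_i|p_i-q_i|$.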
Corollary 5. If, in the previous theorem, we take  and  to be probability distributions, we directly obtain the following result:  Theorem 8. Let  be a convex function on , . Let  be an n-tuple of real numbers and  an n-tuple of nonnegative real numbers such that . Let  where  for , , , for  and , , for . Then the inequality holds.  Proof.  Using Theorem 1 with  and , we obtain
        
which is (15).    □
Corollary 6. If, in the previous theorem, we take  and  to be probability distributions, we directly obtain the following result:  If $\mathbf{p}$ and $\mathbf{q}$ are probability distributions, the Kullback–Leibler divergence, also called relative entropy or KL divergence, is defined as
$$D_{\mathrm{KL}}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}p_i\log\frac{p_i}{q_i}.$$
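A direct computation of the Kullback–Leibler divergence (with the usual convention $0\log 0=0$), checking the nonnegativity that follows from Jensen's inequality:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i), with 0 log 0 = 0.
    Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
assert kl_divergence(p, q) >= 0   # Gibbs' inequality, via Jensen
assert kl_divergence(p, p) == 0   # zero iff the distributions coincide
```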
The next corollary provides bounds for the Kullback–Leibler divergence of two probability distributions.
Corollary 7. Let  where  for ,  and .
- Let  and  be n-tuples of nonnegative real numbers. Then 
- Let  and  be probability distributions. Then 
Proof.  Let  and  be n-tuples of nonnegative real numbers. Since the function  is convex, the first inequality follows from Theorem 7 by setting .
The second inequality is a special case of the first inequality for probability distributions  and .    □
 Corollary 8. Let  where  for ,  and , for .
- Let  and  be n-tuples of nonnegative real numbers. Let , ,  and , for . Then 
- Let  and  be probability distributions. Let , ,  and , for . Then 
Proof.  Let  and  be n-tuples of nonnegative real numbers. Since the function  is convex, the first inequality follows from Theorem 8 by setting .
The second inequality is a special case of the first inequality for probability distributions  and .    □
 Now we deduce the relations for some more special cases of the Csiszár f-divergence.
Definition 2 (the Shannon entropy). 
For a $\mathbf{p}\in\mathbb{P}$, the discrete Shannon entropy is defined as
$$H(\mathbf{p})=-\sum_{i=1}^{n}p_i\log p_i.$$
Corollary 9. Let . Let  where  for ,  and . Then  Proof.  Using Theorem 7 with  and , we obtain
        
For , inequality (17) follows.    □
Corollary 10. Let , ,  such that . Let  where  for , , , for  and , , for . Then the inequality holds.  Proof.  Using Theorem 8 with ,  and , we obtain
        
and (17) easily follows.    □
 Definition 3 (Jeffrey’s distance). 
For $\mathbf{p},\mathbf{q}\in\mathbb{P}$, the discrete Jeffrey distance is defined as
$$D_J(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}(p_i-q_i)\log\frac{p_i}{q_i}.$$
Corollary 11. Let . Let  where  for ,  and . Then  Proof.  Using Corollary 5 with , we obtain
        
and (18) easily follows.    □
Corollary 12. Let , ,  such that . Let  where  for , , , for  and , , for . Then the inequality holds.  Proof.  Using Corollary 6 with , we obtain
        
and (19) easily follows.    □
 Definition 4 (the Hellinger distance). 
For $\mathbf{p},\mathbf{q}\in\mathbb{P}$, the discrete Hellinger distance is defined as
$$h^{2}(\mathbf{p},\mathbf{q})=\frac{1}{2}\sum_{i=1}^{n}\left(\sqrt{p_i}-\sqrt{q_i}\right)^{2}.$$
Corollary 13. Let . Let  where  for ,  and . Then  Proof.  Using Corollary 5 with , (20) follows.    □
Corollary 14. Let , ,  such that . Let  where  for , , , for  and , , for . Then the inequality holds.  Proof.  Using Corollary 6 with , (21) follows.    □
 Definition 5 (Bhattacharyya distance). 
For $\mathbf{p},\mathbf{q}\in\mathbb{P}$, the discrete Bhattacharyya distance is defined as
$$B(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}\sqrt{p_i q_i}.$$
Corollary 15. Let . Let  where  for ,  and . Then  Proof.  Using Corollary 5 with , (22) follows.    □
Corollary 16. Let , ,  such that . Let  where  for , , , for  and , , for . Then the inequality holds.  Proof.  Using Corollary 6 with , (23) follows.    □
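The special cases above can be computed side by side. The following sketch uses the definitions as reconstructed here (conventions for these distances vary slightly across the literature, e.g. the factor 1/2 in the Hellinger distance); the function names are ours:

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i log p_i, with 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def jeffrey_distance(p, q):
    """Symmetrized KL divergence: sum (p_i - q_i) log(p_i / q_i)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def hellinger_distance_sq(p, q):
    """(1/2) sum (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                     for pi, qi in zip(p, q))

def bhattacharyya_coefficient(p, q):
    """sum sqrt(p_i q_i); at most 1 by Cauchy-Schwarz, 1 iff p == q."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
assert shannon_entropy(p) <= math.log(3)   # uniform law maximizes entropy
assert jeffrey_distance(p, q) >= 0
assert hellinger_distance_sq(p, p) == 0
assert bhattacharyya_coefficient(p, q) <= 1
```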
Now we derive the results of Theorems 7 and 8 for the Zipf–Mandelbrot law.
The Zipf–Mandelbrot law is a discrete probability distribution defined by the following probability mass function:
$$f(i;N,q,s)=\frac{1}{(i+q)^{s}H_{N,q,s}},\qquad i=1,\dots,N,$$
where
$$H_{N,q,s}=\sum_{j=1}^{N}\frac{1}{(j+q)^{s}}$$
is a generalization of the harmonic number, and $N\in\{1,2,\dots\}$, $q\in[0,\infty)$ and $s>0$ are parameters.
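The Zipf–Mandelbrot probability mass function is straightforward to implement. This sketch (the function name is ours) checks the normalization and the strict decrease of the probabilities in the rank $i$:

```python
def zipf_mandelbrot_pmf(N, q, s):
    """Probability mass function f(i; N, q, s) = 1 / ((i + q)^s * H) for
    i = 1, ..., N, where H = sum_{j=1}^{N} 1 / (j + q)^s normalizes it."""
    H = sum(1.0 / (j + q) ** s for j in range(1, N + 1))
    return [1.0 / ((i + q) ** s * H) for i in range(1, N + 1)]

pmf = zipf_mandelbrot_pmf(N=10, q=1.5, s=1.2)
assert abs(sum(pmf) - 1.0) < 1e-12          # normalized
assert all(pmf[i] > pmf[i + 1] for i in range(len(pmf) - 1))  # decreasing in rank
```

With $q=0$ the law reduces to the classical (truncated) Zipf law.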
If we define 
 as a Zipf–Mandelbrot law M-tuple, we have
      
      where
      
      and the Csiszár functional becomes
      
      where 
, and the parameters 
 are such that 
.
If 
 and 
 are both defined as Zipf–Mandelbrot law M-tuples, then the Csiszár functional becomes
      
      where 
, and the parameters 
 are such that 
.
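As an illustration of the last functional, the Kullback–Leibler divergence between two Zipf–Mandelbrot laws on the same support can be computed directly; a self-contained sketch with arbitrary parameter choices:

```python
import math

def zm_pmf(N, q, s):
    """Zipf-Mandelbrot probabilities 1 / ((i + q)^s * H) for i = 1, ..., N."""
    H = sum(1.0 / (j + q) ** s for j in range(1, N + 1))
    return [1.0 / ((i + q) ** s * H) for i in range(1, N + 1)]

def kl(p, q):
    """KL divergence sum p_i log(p_i / q_i), with 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two Zipf-Mandelbrot laws on the same support with different (q, s):
p = zm_pmf(N=100, q=1.0, s=1.1)
r = zm_pmf(N=100, q=2.0, s=1.4)
assert kl(p, r) >= 0 and kl(r, p) >= 0  # nonnegative in both directions
assert kl(p, p) == 0
```

Note that the divergence is not symmetric: `kl(p, r)` and `kl(r, p)` generally differ.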
Now, from Theorem 7, we have the following result.
Corollary 17. Let I be an interval in  and  a convex function. Let  be an n-tuple of real numbers and  an n-tuple of nonnegative real numbers such that  for every . Let  where  for , . Suppose  are such that , . Then the inequality holds.  Proof.  If we define  as a Zipf–Mandelbrot law n-tuple with parameters , then from Theorem 7 it follows that
        
which is (24).    □
 From Theorem 8 we have the following result.
Corollary 18. Let  be a convex function on , . Let  be an n-tuple of real numbers. Suppose  are such that . Let  where  for , , ,  and , , for . Then the inequality holds.  Proof.  If we define  as a Zipf–Mandelbrot law n-tuple with parameters , then from Theorem 8 it follows that
        
which is (25).    □
 Now, from Theorem 7, we also have the following result.
Corollary 19. Let I be an interval in  and  a convex function. Let  where  for , . Suppose  are such that , . Then the inequality holds.  Proof.  If we define  as Zipf–Mandelbrot law n-tuples with parameters , then from Theorem 7, we obtain (26).    □
 From Theorem 8, we have the following result.
Corollary 20. Let  be a convex function on , . Suppose  are such that . Let  where  for , , ,  and , , for . Then the inequality holds.  Proof.  If we define  as Zipf–Mandelbrot law n-tuples with parameters , then from Theorem 8, we obtain (27).    □
Since the minimal value of  is  and its maximal value is , from the right-hand side of (24) and the left-hand side of (25) we obtain the following result.
Corollary 21. Let  be a convex function on , . Let  be an n-tuple of real numbers. Suppose  are such that . Let  where  for , , ,  and , , for . Then the inequality holds.  Proof.  Using  and  in the right-hand side of (24) and the left-hand side of (25), we obtain
        
and (28) follows.    □
   4. Conclusions
In this paper, we have obtained a refinement of the Lah–Ribarič inequality and a refinement of the Jensen inequality, which follow from applying the Lah–Ribarič inequality and the Jensen inequality on disjoint subsets of the index set.
Using these results, we find a refinement of the discrete Hölder inequality and a refinement of some inequalities for the discrete weighted power means and the discrete weighted quasi-arithmetic means. In addition, some interesting estimations for the discrete Csiszár divergence and for its important special cases are obtained.
It would be interesting to see whether this method can be used to obtain refinements of some other inequalities. In addition, one could try to use this method to refine the Jensen inequality and the Lah–Ribarič inequality for operators.