1. Introduction
The concept of information entropy was introduced by Claude Shannon in 1948 in his article [1]. It is used in information theory [2] to quantify the amount of information or uncertainty inherent in a system. We recall that Shannon’s entropy is defined in the context of a probabilistic model. Consider a measurable partition $\mathcal{A} = \{A_1, A_2, \dots, A_n\}$ of a probability space $(\Omega, S, P)$ (that is, a finite collection of measurable subsets $A_1, A_2, \dots, A_n \in S$ such that $\bigcup_{i=1}^{n} A_i = \Omega$ and $A_i \cap A_j = \emptyset$ whenever $i \neq j$) with probabilities $p_i = P(A_i)$, $i = 1, 2, \dots, n$. Then the Shannon entropy of $\mathcal{A}$ is defined as the number $H(\mathcal{A}) = -\sum_{i=1}^{n} p_i \log p_i$, with the convention that $0 \cdot \log 0 = 0$ (which is justified by the fact that $\lim_{x \to 0^{+}} x \log x = 0$). The base of the logarithm can be any positive real number; depending on the selected base of the logarithm, the entropy is expressed in bits ($\log_2$), nats ($\ln$), or dits ($\log_{10}$).
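For a concrete computational reading of this definition, the following short sketch evaluates the Shannon entropy of a finite probability vector in bits, nats, and dits; the partition and its probabilities are purely illustrative.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a probability vector, with the 0*log(0) = 0 convention."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

# A partition with probabilities (1/2, 1/4, 1/4):
p = [0.5, 0.25, 0.25]
print(shannon_entropy(p, base=2))        # bits -> 1.5
print(shannon_entropy(p, base=math.e))   # nats -> ~1.0397
print(shannon_entropy(p, base=10))       # dits -> ~0.4515
```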
The extensions of Shannon’s entropy have led to several alternative entropy measures, of which the Rényi entropy [3] is one of the most important. The classical logical entropy (cf. [4,5]) and the entropy measure called the R-norm entropy (cf. [6,7]) are other alternative entropy measures. In this article, we study the R-norm entropy. If $P = (p_1, p_2, \dots, p_n)$ is a probability distribution, then the R-norm entropy is defined, for every real number $R \in (0, \infty)$, $R \neq 1$, by the formula:
$$H_R(P) = \frac{R}{R-1}\left(1 - \left(\sum_{i=1}^{n} p_i^{R}\right)^{1/R}\right).$$
Some results regarding the R-norm entropy measure and its generalizations can be found in [8,9,10,11,12]. The above entropy measures have found many important applications, for example, in statistics, pattern recognition, and coding theory.
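To illustrate the formula above, the following sketch evaluates the R-norm entropy of a probability vector for several values of R and shows numerically that, as R approaches 1, it tends to the Shannon entropy expressed in nats; the distribution is arbitrary.

```python
import math

def r_norm_entropy(probs, R):
    """R-norm entropy H_R(P) = R/(R-1) * (1 - (sum_i p_i^R)^(1/R)) for R > 0, R != 1."""
    norm = sum(p ** R for p in probs) ** (1.0 / R)
    return R / (R - 1.0) * (1.0 - norm)

p = [0.5, 0.25, 0.25]
for R in (0.5, 2.0, 3.0):
    print(R, r_norm_entropy(p, R))

# As R approaches 1, H_R(P) approaches the Shannon entropy of P in nats (~1.0397 here).
print(r_norm_entropy(p, 1.000001), -sum(q * math.log(q) for q in p))
```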
In classical probability theory, partitions are defined in the context of Cantor set theory. In solving many real-life problems, however, partitions defined in terms of fuzzy set theory [13] are more appropriate. Therefore, many proposals have been made to generalize classical partitions into fuzzy partitions [14,15,16,17,18,19,20]. Fuzzy partitions represent a mathematical tool for modeling random experiments that lead to unclear, vague events. Naturally, there are also many results concerning the Shannon entropy of fuzzy partitions; see, e.g., [21,22,23,24,25,26,27,28,29,30,31]. We note that in [32] the results regarding the entropy of fuzzy partitions provided in [25] were employed to introduce the notions of mutual information and Kullback–Leibler divergence for the fuzzy case. The notion of Kullback–Leibler divergence was introduced in [33] as a distance measure between two probability distributions. It plays a significant role in information theory and in various disciplines such as statistics, machine learning, physics, neuroscience, computer science, and linguistics.
Since its inception in 1965, the theory of fuzzy sets has advanced in many mathematical disciplines and has found important applications in practice. Currently, algebraic systems based on the theory of fuzzy sets are the subject of intense study, for example, D-posets [34,35,36], MV-algebras [37,38,39,40,41], and effect algebras [42]. Some results regarding the above entropy measures and divergence on these structures can be found, e.g., in [43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58].
The aim of this article is to study the R-norm entropy of fuzzy partitions and the R-norm divergence in fuzzy probability spaces [59]. The organization of the paper is as follows. In Section 2, we provide the basic definitions, terminology, and known results used in the paper. The results of the article are presented in Section 3 and Section 4. In Section 3, we define the R-norm entropy and the conditional R-norm entropy of fuzzy partitions and examine their properties. In Section 4, the concept of the R-norm divergence for the case of fuzzy probability spaces is proposed and the properties of this distance measure are studied. The results presented in Section 3 and Section 4 are illustrated with numerical examples. The paper concludes in Section 5 with a brief summary.
  2. Preliminaries
We begin by recalling the basic concepts and the known results used in the paper.
It is known that the concept of a fuzzy set, introduced by Zadeh in 1965 [13], extends classical set theory. In classical set theory, the membership of elements in a set is assessed in binary terms: an element either belongs or does not belong to the considered set. By contrast, a fuzzy set is characterized by a membership function which assigns to every element a grade of membership ranging between zero and one. The mathematical model of a fuzzy set is as follows. Let X be a non-empty set. By a fuzzy subset of X we mean a mapping $f: X \to [0, 1]$ (where the considered fuzzy set is identified with its membership function). The value $f(x)$ is interpreted as the grade of membership of the element $x \in X$ in the considered fuzzy set $f$.
Definition 1. Let X be a non-empty set, and M be a family of fuzzy subsets of X. The pair (X, M) is called a fuzzy measurable space if the following conditions are satisfied: (i)  (ii)  (iii) if  then  The family M with the properties (i)–(iii) is said to be a fuzzy σ-algebra.
 Throughout the paper, the symbols $\bigvee_{n} f_n$ and $\bigwedge_{n} f_n$ denote the fuzzy union and the fuzzy intersection of a sequence $\{f_n\}$ of fuzzy subsets of X, respectively, in the sense of Zadeh [13], i.e., $\left(\bigvee_{n} f_n\right)(x) = \sup_{n} f_n(x)$ and $\left(\bigwedge_{n} f_n\right)(x) = \inf_{n} f_n(x)$ for every $x \in X$. The symbol $f'$ denotes the complement of a fuzzy set $f$, i.e., $f' = 1_X - f$. Here, $1_X$ denotes the constant function with the value 1; analogously, the symbols $(1/2)_X$ and $0_X$ denote the constant functions with the values 1/2 and 0, respectively. Additionally, the relation $\leq$ denotes the usual order relation of fuzzy subsets of X, i.e., $f \leq g$ if and only if $f(x) \leq g(x)$ for every $x \in X$. The complementation $f \mapsto f'$ satisfies, for all fuzzy subsets $f, g$ of X, the conditions: (i) $(f')' = f$, and (ii) $f \leq g$ implies $g' \leq f'$.
Fuzzy subsets  with the property  are said to be separated; fuzzy subsets  with the property  are said to be W-separated fuzzy sets. Any fuzzy subset  with the property  is said to be a W-universum; any fuzzy subset  with the property  is said to be a W-empty fuzzy set. A fuzzy set from the fuzzy σ-algebra M is interpreted as a fuzzy event. W-separated fuzzy events are considered to be mutually exclusive events. A W-universum is interpreted as a certain event, a W-empty set as an impossible event. It can be shown that a fuzzy subset  is a W-universum if and only if there exists a fuzzy subset  such that 
Naturally, the notion of a fuzzy measurable space generalizes the concept of a measurable space $(\Omega, S)$ from classical measure theory; it suffices to put $M = \{\chi_A;\ A \in S\}$, where $\chi_A$ is the characteristic function of the set $A \in S$. With this procedure, the classical model can be embedded into the fuzzy one.
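For readers who prefer a computational view, the following sketch models fuzzy subsets of a small finite universe as membership arrays and implements Zadeh’s connectives together with the embedding of a crisp set via its characteristic function; the universe and the membership values are purely illustrative.

```python
import numpy as np

X = ["x1", "x2", "x3", "x4"]           # a small finite universe (illustrative)

f = np.array([0.2, 0.7, 1.0, 0.4])     # membership functions of two fuzzy subsets of X
g = np.array([0.5, 0.3, 0.6, 0.4])

union        = np.maximum(f, g)        # Zadeh's fuzzy union: pointwise supremum
intersection = np.minimum(f, g)        # Zadeh's fuzzy intersection: pointwise infimum
complement_f = 1.0 - f                 # complement f' = 1_X - f

# The usual order of fuzzy subsets: f <= g iff f(x) <= g(x) for every x in X.
print(bool(np.all(intersection <= f) and np.all(intersection <= g)))   # True
print(bool(np.allclose(1.0 - complement_f, f)))                        # (f')' = f

# Embedding the crisp set A = {x2, x3} via its characteristic function chi_A:
chi_A = np.array([0.0, 1.0, 1.0, 0.0])
print(bool(np.all(np.maximum(chi_A, 1.0 - chi_A) == 1.0)))             # crisp: chi_A v chi_A' = 1_X
```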
Definition 2 ([59]). Let (X, M) be a fuzzy measurable space. A map  is said to be a fuzzy P-measure if the following conditions are satisfied: (i)  for every  (ii) if  is a sequence of pairwise W-separated fuzzy sets from M, then  The triplet  is called a fuzzy probability space. The fuzzy P-measure  has the properties that correspond to the properties of a classical probability measure; the proofs can be found in [59].
- (P1)  for every 
- (P2)  is non-decreasing, i.e., if  with  then 
- (P3)  for every 
- (P4) Let  Then  for all  if and only if 
- (P5) If  such that  then 
- (P6) If  such that  then 
Definition 3 ([14]). A fuzzy partition of a fuzzy probability space  is a collection  of W-separated fuzzy sets from M with the property  In the system of all fuzzy partitions of  we define the refinement partial order in the following way. If A and B are two fuzzy partitions of  then we say that B is a refinement of A (and write  if for every  there exists  such that  Furthermore, for every two fuzzy partitions  and  of  we put   One can easily verify that the family  is a family of pairwise W-separated fuzzy sets from M; moreover, by the property (P4), we have  Thus,  is a fuzzy partition of  It represents a combined experiment consisting of a realization of the experiments A and B. Evidently, it holds that  and  i.e., the fuzzy partition  is a common refinement of the fuzzy partitions A and B. If  are fuzzy partitions of  then we put 
Definition 4. Two fuzzy partitions  and  of a fuzzy probability space  are said to be statistically independent, if  for  
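As a computational illustration of Definition 3 and Definition 4 (a minimal sketch, assuming that the joint fuzzy partition A ∨ B consists of the Zadeh intersections a ∧ b for a ∈ A, b ∈ B, that a fuzzy P-measure can be obtained as the mean value of the membership function on a finite universe as in Example 1 below, and that statistical independence means s(a ∧ b) = s(a)·s(b) for all pairs):

```python
import numpy as np

# Illustrative fuzzy P-measure on a 4-point universe: mean value of the membership function.
def s(f):
    return float(np.mean(f))

# Fuzzy partition A = {a1, a2} and fuzzy partition B = {b1, b2} (crisp-valued, illustrative).
a1 = np.array([1.0, 1.0, 0.0, 0.0]); a2 = 1.0 - a1
b1 = np.array([1.0, 0.0, 1.0, 0.0]); b2 = 1.0 - b1
A, B = [a1, a2], [b1, b2]

# The joint fuzzy partition A v B = {a ^ b : a in A, b in B} via Zadeh's minimum.
joint = [np.minimum(a, b) for a in A for b in B]
print([s(f) for f in joint])                       # measures of the combined experiment

# Statistical independence check: s(a ^ b) = s(a) * s(b) for all pairs.
print(all(abs(s(np.minimum(a, b)) - s(a) * s(b)) < 1e-12 for a in A for b in B))   # True
```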
 Example 1. Let us consider a classical probability space  and put  It can be verified that the map  defined by  for every  is a fuzzy P-measure and the triplet  is a fuzzy probability space. A classical measurable partition  of a probability space  can then be regarded as a fuzzy partition of  considering  instead of 
 The Shannon entropy of a fuzzy partition of a fuzzy probability space  was introduced and examined in [23]; see also [25].
Definition 5 ([23]). We define the entropy of a fuzzy partition  of  by Shannon’s formula: If  and  are two fuzzy partitions of  then we define the conditional entropy of A given B by the formula: with the convention that  if   The symbol log denotes the base 2 logarithm, so the Shannon entropy of a fuzzy partition is expressed in bits. The entropy and the conditional entropy of fuzzy partitions have properties that correspond to the properties of Shannon’s entropy of classical measurable partitions: for all fuzzy partitions  of a fuzzy probability space  the following hold:
	  
- (S1)  implies 
- (S2) 
- (S3)  implies 
- (S4)  implies 
- (S5)  with the equality if and only if  are statistically independent;
- (S6) 
- (S7) 
The proofs can be found in [23,25]. We remark that in [15,16,17,18,19,20,21,22,26,27,28,29,30,31], other conceptions of fuzzy partitions and their entropy measures have been introduced. While our approach is based on Zadeh’s connectives, in the referenced papers Zadeh’s connectives are replaced by other fuzzy set operations.
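A small numerical illustration of Definition 5 follows (a sketch assuming the standard formulas H(A) = −Σ_i s(a_i) log s(a_i) and H(A/B) = −Σ_{i,j} s(a_i ∧ b_j) log(s(a_i ∧ b_j)/s(b_j)), since the displayed formulas are not reproduced above), using the same illustrative fuzzy partitions and fuzzy P-measure as in the sketch after Definition 4.

```python
import numpy as np

def s(f):                       # illustrative fuzzy P-measure: mean membership value
    return float(np.mean(f))

a1 = np.array([1.0, 1.0, 0.0, 0.0]); a2 = 1.0 - a1      # fuzzy partition A
b1 = np.array([1.0, 0.0, 1.0, 0.0]); b2 = 1.0 - b1      # fuzzy partition B
A, B = [a1, a2], [b1, b2]

# Entropy H(A) = -sum_i s(a_i) * log2 s(a_i)  (assumed form of Definition 5).
H_A = -sum(s(a) * np.log2(s(a)) for a in A if s(a) > 0)

# Conditional entropy H(A/B) = -sum_{i,j} s(a_i ^ b_j) * log2( s(a_i ^ b_j) / s(b_j) ).
H_A_given_B = -sum(
    s(np.minimum(a, b)) * np.log2(s(np.minimum(a, b)) / s(b))
    for a in A for b in B if s(np.minimum(a, b)) > 0
)
print(H_A, H_A_given_B)   # 1.0 and 1.0 here: A and B are statistically independent
```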
We note that in [32], the concept of the Kullback–Leibler divergence in a fuzzy probability space was introduced. Let $s, t$ be two fuzzy P-measures on a fuzzy measurable space $(X, M)$ and $A = \{a_1, a_2, \dots, a_n\}$ be a fuzzy partition of the fuzzy probability spaces $(X, M, s)$, $(X, M, t)$. Then the Kullback–Leibler divergence of the fuzzy P-measures $s, t$ with respect to $A$ is defined as the number:
$$\sum_{i=1}^{n} s(a_i) \log \frac{s(a_i)}{t(a_i)},$$
with the convention that $0 \cdot \log \frac{0}{x} = 0$ if $x \geq 0$, and $x \cdot \log \frac{x}{0} = \infty$ if $x > 0$.
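In computational terms, given the values s(a_i) and t(a_i) of the two fuzzy P-measures over a common fuzzy partition, the divergence is the usual discrete Kullback–Leibler sum; a minimal sketch with illustrative values:

```python
import math

def kl_divergence(s_vals, t_vals, base=2.0):
    """Kullback-Leibler divergence of s with respect to t over a common fuzzy partition.
    Conventions: 0 * log(0/x) = 0, and x * log(x/0) = +infinity for x > 0."""
    total = 0.0
    for si, ti in zip(s_vals, t_vals):
        if si == 0.0:
            continue
        if ti == 0.0:
            return math.inf
        total += si * math.log(si / ti, base)
    return total

s_vals = [0.5, 0.25, 0.25]    # s(a_1), s(a_2), s(a_3)  (illustrative)
t_vals = [1/3, 1/3, 1/3]      # t(a_1), t(a_2), t(a_3)
print(kl_divergence(s_vals, t_vals))     # >= 0 (Gibbs inequality)
print(kl_divergence(s_vals, s_vals))     # 0 when the two measures agree on A
```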
In the following sections, we will use the well-known Minkowski inequality: for non-negative real numbers $x_1, x_2, \dots, x_n$, $y_1, y_2, \dots, y_n$, it holds that
$$\left(\sum_{i=1}^{n} (x_i + y_i)^{R}\right)^{1/R} \leq \left(\sum_{i=1}^{n} x_i^{R}\right)^{1/R} + \left(\sum_{i=1}^{n} y_i^{R}\right)^{1/R} \quad \text{for } R > 1,$$
and
$$\left(\sum_{i=1}^{n} (x_i + y_i)^{R}\right)^{1/R} \geq \left(\sum_{i=1}^{n} x_i^{R}\right)^{1/R} + \left(\sum_{i=1}^{n} y_i^{R}\right)^{1/R} \quad \text{for } 0 < R < 1.$$
Furthermore, we will use the Jensen inequality, which states that for a real convex function $f$, real numbers $x_1, x_2, \dots, x_n$ in its domain, and non-negative real numbers $\lambda_1, \lambda_2, \dots, \lambda_n$ with $\sum_{i=1}^{n} \lambda_i = 1$, it holds that
$$f\left(\sum_{i=1}^{n} \lambda_i x_i\right) \leq \sum_{i=1}^{n} \lambda_i f(x_i),$$
and the inequality is reversed if $f$ is a real concave function. The equality holds if and only if $x_1 = x_2 = \dots = x_n$ or $f$ is linear.
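Both inequalities are easy to probe numerically; the following sketch checks the Minkowski inequality for R > 1 and for 0 < R < 1, and the Jensen inequality for the convex function f(x) = x², using arbitrary non-negative numbers.

```python
def p_norm(xs, R):
    return sum(x ** R for x in xs) ** (1.0 / R)

x = [0.3, 1.2, 0.5]
y = [0.7, 0.1, 0.9]
z = [a + b for a, b in zip(x, y)]

print(p_norm(z, 3.0) <= p_norm(x, 3.0) + p_norm(y, 3.0))   # True: R > 1
print(p_norm(z, 0.5) >= p_norm(x, 0.5) + p_norm(y, 0.5))   # True: 0 < R < 1

# Jensen's inequality for the convex function f(x) = x**2 and weights lambda_i:
lam = [0.2, 0.5, 0.3]
pts = [1.0, 4.0, 2.0]
lhs = sum(l * p for l, p in zip(lam, pts)) ** 2
rhs = sum(l * p ** 2 for l, p in zip(lam, pts))
print(lhs <= rhs)                                          # True
```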
In addition, we will use L’Hôpital’s rule: for functions $f$ and $g$ that are differentiable on an open interval $U$ except possibly at a point $c \in U$, if $\lim_{x \to c} f(x) = \lim_{x \to c} g(x) = 0$ or $\pm\infty$, $g'(x) \neq 0$ for every $x$ in $U$ with $x \neq c$, and $\lim_{x \to c} \frac{f'(x)}{g'(x)}$ exists, then:
$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}.$$
  3. The R-Norm Entropy of Fuzzy Partitions
In this part, we define the R-norm entropy of a fuzzy partition and its conditional version, and we study the properties of these entropy measures. It is shown that, in the limit as R goes to 1, the R-norm entropy and the conditional R-norm entropy of fuzzy partitions yield the Shannon entropy  and the conditional Shannon entropy  respectively, expressed in nats.
Definition 6. Let  be a fuzzy partition of a fuzzy probability space  The R-norm entropy of A with respect to  is defined, for a positive real number R not equal to 1, by the formula:  Remark 1. For simplicity, we write  instead of  In the following, we will write  instead of 
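As a computational illustration (a minimal sketch, assuming that the displayed formula of Definition 6 mirrors the classical R-norm entropy formula from the Introduction, with the probabilities p_i replaced by the values s(A_i) of the fuzzy P-measure on the fuzzy partition):

```python
def r_norm_entropy_partition(measure_values, R):
    """R-norm entropy of a fuzzy partition A = {A_1, ..., A_n} from the values s(A_i).
    Assumed form: H_R(A) = R/(R-1) * (1 - (sum_i s(A_i)^R)^(1/R))."""
    norm = sum(v ** R for v in measure_values) ** (1.0 / R)
    return R / (R - 1.0) * (1.0 - norm)

# Values s(A_i) of a fuzzy P-measure on a fuzzy partition (illustrative; they sum to 1).
s_A = [0.5, 0.3, 0.2]
print(r_norm_entropy_partition(s_A, 2.0))    # ~0.767
print(r_norm_entropy_partition([1.0], 2.0))  # 0.0: a certain experiment has zero entropy
```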
 Theorem 1. For arbitrary fuzzy partition  of a fuzzy probability space  the R-norm entropy  is non-negative.
 Proof.  Assume that  We will consider two cases: the case of  and the case of  If  then  for  hence  This implies that  Since  for  it follows that   On the other hand, for  it holds that  for  hence  It follows that  Since  for  we obtain  □
 Example 2. Let  and  be defined by  Consider a fuzzy measurable space  where  Then it can be easily verified that the mappings  and  defined by the equalities        are fuzzy P-measures and the systems   are fuzzy probability spaces. The sets    are fuzzy partitions of  and  such that  We can calculate their R-norm entropy. Evidently,  in accordance with the natural requirement, experiments resulting in a certain event have zero R-norm entropy. Furthermore, we have: If we put  then   for  we have   for  we have  and 
 Definition 7. Let  and  be two fuzzy partitions of a fuzzy probability space  Then the conditional R-norm entropy of A given B with respect to  is defined, for a positive real number R not equal to 1, by the formula:  Remark 2. Let A be a fuzzy partition of a given fuzzy probability space  Evidently, if we put  where  is a W-universum, then 
 The following theorem shows that the conditional R-norm entropy  is consistent, in the limit as R goes to 1, with the conditional Shannon entropy  defined by formula (2), up to a positive multiplicative constant.
Theorem 2. Let  and  be two fuzzy partitions of a given fuzzy probability space  Then  where  and  
 Proof.  In the proof, we use L’Hôpital’s rule 
 where in this case 
 For every 
 we can write:
     where 
 are continuous functions defined for 
 in the following way:
By continuity of the function 
 we get 
 Furthermore, by continuity of the function 
 and by the property (P4) of fuzzy P-measure 
 we get 
 Using L’Hôpital’s rule, this implies:
      under the assumption that the right-hand side exists. To find the derivative of the function 
 we use the identity 
 Let us calculate:
Since 
 we obtain:
 Theorem 3. Let  be a fuzzy partition of a fuzzy probability space  Then  where  and 
 Proof.  The claim is a direct consequence of the previous theorem; it suffices to put  □
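The limit behaviour stated in Theorems 2 and 3 is easy to check numerically in the unconditional case: as R approaches 1, the R-norm entropy computed from the measure values approaches the Shannon entropy expressed in nats (a sketch under the same assumption on the form of H_R as above).

```python
import math

def r_norm_entropy(values, R):
    return R / (R - 1.0) * (1.0 - sum(v ** R for v in values) ** (1.0 / R))

s_A = [0.5, 0.3, 0.2]                        # illustrative measure values of a fuzzy partition
shannon_nats = -sum(v * math.log(v) for v in s_A)

for R in (1.1, 1.01, 1.001, 1.0001):
    print(R, r_norm_entropy(s_A, R))         # approaches shannon_nats
print(shannon_nats)                          # ~1.0297
```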
 In the following, the properties of the R-norm entropy of fuzzy partitions are discussed.
Theorem 4. For arbitrary fuzzy partitions  and  of a fuzzy probability space  it holds that:  Proof.  Suppose that 
   Let us calculate:
 Theorem 5. For arbitrary fuzzy partitions  of a fuzzy probability space  it holds that:  Proof.  The claim is a direct consequence of the previous theorem; it suffices to put  □
 In the following theorem, using the notion of conditional R-norm entropy of fuzzy partitions, chain rules for the R-norm entropy of fuzzy partitions are established.
Theorem 6. Let  and C be fuzzy partitions of a fuzzy probability space  Then, for  the following equalities hold:
- (i) 
- (ii) 
 Proof.  The proof can be carried out by mathematical induction, using Theorems 4 and 5. □
 In the following, we prove that the R-norm entropy  is a concave function on the class of all fuzzy P-measures on a given fuzzy measurable space 
Proposition 1. Let   be two fuzzy P-measures on a given fuzzy measurable space  Then, for every real number  the map  is a fuzzy P-measure on 
 Proof.  It is straightforward. □
 Theorem 7. Let A be a fuzzy partition of fuzzy probability spaces   Then, for every real number  this inequality holds:  Proof.  Let 
 and 
 Putting 
 and 
 for 
 in the Minkowski inequality, we obtain for 
      and for 
:
This means that the function 
 is convex in 
 for 
 and concave in 
 for 
 Therefore, the function 
 is concave in 
 for 
 and convex in 
 for 
 Evidently, 
 for 
 and 
 for 
 According to the definition of the 
R-norm entropy 
 we obtain that for every 
 the 
R-norm entropy 
 is a concave function on the family of all fuzzy P-measures on a given fuzzy measurable space 
 Thus, for every 
 it holds that:
 Proposition 2. Let  be fuzzy partitions of a fuzzy probability space  such that  Then there exists a partition  of the set  such that  for 
 Proof.  By the assumption, for every  there exists  such that  Let us denote by  the subset of the set  such that for every  it holds that   Then the set  is a partition of the set  and  for  By monotonicity of fuzzy P-measure   for  Summing over  we get  Since  it follows that  for  □
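Returning to Theorem 7, the concavity of the R-norm entropy in the fuzzy P-measure can be probed numerically; in the sketch below, a convex combination of two fuzzy P-measures is represented by the corresponding combination of their values on a common fuzzy partition, and the form of H_R is assumed as before (all values are illustrative).

```python
def r_norm_entropy(values, R):
    return R / (R - 1.0) * (1.0 - sum(v ** R for v in values) ** (1.0 / R))

s_A = [0.7, 0.2, 0.1]       # values of the fuzzy P-measure s on a fuzzy partition A
t_A = [0.2, 0.5, 0.3]       # values of the fuzzy P-measure t on the same partition
R, lam = 2.0, 0.4

mix = [lam * a + (1 - lam) * b for a, b in zip(s_A, t_A)]   # values of lam*s + (1-lam)*t on A
lhs = r_norm_entropy(mix, R)
rhs = lam * r_norm_entropy(s_A, R) + (1 - lam) * r_norm_entropy(t_A, R)
print(lhs >= rhs)           # True: concavity of H_R with respect to the fuzzy P-measure
```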
 Theorem 8. Let  be fuzzy partitions of a fuzzy probability space  such that  Then:
- (i) 
- (ii) 
 Proof.  (i) Assume that 
   According to Proposition 2 there exists a partition 
 of the set 
 such that 
 for 
 For the case of 
 we obtain:
	  and consequently:
Since 
 for 
 we conclude that:
For the case of 
 we get:
     and consequently:
Since 
 for 
 we have:
(ii) By the assumption, for every 
 there exists 
 such that 
 Hence, for arbitrary element 
 of fuzzy partition 
 there exists 
 such that 
 This means that 
 Therefore, we get:
 Theorem 9. Let  be statistically independent fuzzy partitions of a fuzzy probability space  Then:  Proof.  Let 
 and 
 By the assumption, 
 for 
 Therefore we can write:
 In view of Theorems 5 and 9, the R-norm entropy does not have the property of additivity, but it satisfies the property that is called pseudo-additivity, as stated in the following theorem. 
Theorem 10. (Pseudo-additivity). 
Let  be statistically independent fuzzy partitions of a fuzzy probability space  Then: Proof.  The result follows by combining Theorems 5 and 9. □
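For statistically independent fuzzy partitions, the measure values of the joint partition A ∨ B are the products s(a_i)·s(b_j), and the pseudo-additivity relation can be verified numerically. The sketch below uses the relation H_R(A ∨ B) = H_R(A) + H_R(B) − ((R − 1)/R)·H_R(A)·H_R(B), which is the natural reading of Theorem 10 under the assumed form of H_R; the values are illustrative.

```python
def r_norm_entropy(values, R):
    return R / (R - 1.0) * (1.0 - sum(v ** R for v in values) ** (1.0 / R))

s_A = [0.6, 0.4]                 # s(a_1), s(a_2)            (illustrative)
s_B = [0.5, 0.3, 0.2]            # s(b_1), s(b_2), s(b_3)
R = 2.0

# Statistical independence: s(a_i ^ b_j) = s(a_i) * s(b_j) for all i, j.
s_joint = [a * b for a in s_A for b in s_B]

HA, HB, HAB = (r_norm_entropy(v, R) for v in (s_A, s_B, s_joint))
pseudo_sum = HA + HB - (R - 1.0) / R * HA * HB
print(abs(HAB - pseudo_sum) < 1e-9)     # True: pseudo-additivity
print(abs(HAB - (HA + HB)) > 1e-3)      # True: plain additivity fails in general
```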
   4. The R-Norm Divergence of Fuzzy P-Measures
In this part, the concept of the R-norm divergence of fuzzy P-measures is defined. In order to avoid expressions like  we will use the following simplification in this section: for any fuzzy partition  of a fuzzy probability space  we assume that the values involved are different from 0 for  Note that this is without loss of generality, because . We will prove the basic properties of this quantity. The results are illustrated with numerical examples.
Definition 8. Let   be two fuzzy P-measures on a fuzzy measurable space  and  be a fuzzy partition of fuzzy probability spaces   The R-norm divergence of fuzzy P-measures  with respect to  is defined, for a positive real number R not equal to 1
, as the number:  Remark 3. It is easy to see that, for any fuzzy partition  of a fuzzy probability space  we have 
 The following theorem states that the R-norm divergence  is consistent, in the limit as R goes to 1, with the Kullback–Leibler divergence  defined by formula (3), up to a positive multiplicative constant.
Theorem 11. Let  be a fuzzy partition of fuzzy probability spaces   Then  where  and 
 Proof.  For every 
 we can write:
      where 
 are continuous functions defined for 
 by the formulas:
By continuity of the functions 
 we get 
  and 
 Using L’Hôpital’s rule this implies that:
      under the assumption that the right-hand side exists. Let us calculate the derivative of the function 
:
Since 
 we get:
 Remark 4. Evidently, if the Kullback–Leibler divergence  is expressed in terms of the natural logarithm, then it is the limiting case of the R-norm divergence  for R going to 1.
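A quick numerical check of Theorem 11 and Remark 4, under the same assumed form of D_R as above: as R approaches 1, the R-norm divergence approaches the Kullback–Leibler divergence expressed with the natural logarithm.

```python
import math

def r_norm_divergence(s_vals, t_vals, R):
    g = sum(si ** R * ti ** (1.0 - R) for si, ti in zip(s_vals, t_vals))
    return R / (R - 1.0) * (g ** (1.0 / R) - 1.0)

s_vals = [0.5, 0.3, 0.2]        # illustrative values of s and t over one fuzzy partition
t_vals = [0.25, 0.25, 0.5]
kl_nats = sum(si * math.log(si / ti) for si, ti in zip(s_vals, t_vals))

for R in (1.1, 1.01, 1.001):
    print(R, r_norm_divergence(s_vals, t_vals, R))   # approaches kl_nats
print(kl_nats)                                       # ~0.2180
```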
 Let  be a fuzzy partition of fuzzy probability spaces   In [32], it has been shown that the Kullback–Leibler divergence  satisfies the Gibbs inequality  with equality if and only if  for  This result allows us to interpret the Kullback–Leibler divergence  as a distance measure between two fuzzy P-measures (over the same fuzzy partition). In the following theorem, we present an analogue of this result for the case of the R-norm divergence.
Theorem 12. Let  be a fuzzy partition of fuzzy probability spaces   Then  with the equality if and only if  for 
 Proof.  We shall consider two cases: the case of  and the case of 
Consider the case of 
 The inequality follows from Jensen’s inequality for the function 
 defined by 
 for every 
 and putting 
  for 
 The assumption that 
 implies 
 hence the function 
 is convex. Therefore, by Jensen’s inequality we obtain:
      and consequently:
Since 
 for 
 it follows that:
For 
 the function 
 defined by 
 for every 
 is concave. Hence, using the Jensen inequality, we obtain:
      and consequently:
Since 
 for 
 we conclude that:
The equality in (9) holds if and only if  is constant, for  i.e., if and only if  for  Taking the sum over all  we get  which implies that  Therefore,  for  This means that  if and only if  for  □
 In the example that follows, it is shown that the equality  is not necessarily true, which means that the R-norm divergence  is not symmetric. Therefore, it is not a metric in the true sense.
Example 3. Consider the fuzzy probability spaces   defined in Example 2 and the fuzzy partition  of   Let us calculate the R-norm divergences  and  Put  Elementary calculations show that  and  thus  For  we have  and  i.e.,  This means that  in general.
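Since the particular values in Example 3 depend on the fuzzy P-measures of Example 2, the following self-contained sketch makes the same point with arbitrary illustrative values: swapping the roles of s and t generally changes the R-norm divergence (the form of D_R is assumed as before).

```python
def r_norm_divergence(s_vals, t_vals, R):
    g = sum(si ** R * ti ** (1.0 - R) for si, ti in zip(s_vals, t_vals))
    return R / (R - 1.0) * (g ** (1.0 / R) - 1.0)

s_vals = [0.5, 0.3, 0.2]      # illustrative values s(a_i), t(a_i) over one fuzzy partition
t_vals = [0.25, 0.25, 0.5]

d_st = r_norm_divergence(s_vals, t_vals, 2.0)
d_ts = r_norm_divergence(t_vals, s_vals, 2.0)
print(d_st, d_ts, abs(d_st - d_ts) > 1e-6)   # D_R(s, t) != D_R(t, s) in general
```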
 Theorem 13. Let   be two fuzzy P-measures on a fuzzy measurable space  and  be a fuzzy partition of fuzzy probability spaces   In addition, let  be uniform over  i.e.,  for  Then, it holds that:  Example 4. Consider the fuzzy P-measures   from Example 2 and the fuzzy partition  of fuzzy probability spaces   The fuzzy P-measure  is uniform over A. Put  Based on previous results, we have  and  Let us calculate: Thus, the Equality (10) holds.
 As a direct consequence of Theorems 12 and 13, we obtain the following property of the R-norm entropy of fuzzy partitions:
Corollary 1. For arbitrary fuzzy partition  of a fuzzy probability space  it holds that:with the equality if and only if the fuzzy P-measure  is uniform over A.  Theorem 14. Let    be fuzzy P-measures on a fuzzy measurable space  and A be a fuzzy partition of fuzzy probability spaces   Then, for every real number  it holds that:  Proof.  Assume that 
 and 
 Putting 
 and 
  in the Minkowski inequality, we get for 
      and for 
This means that the function 
 is convex in 
 for 
 and concave in 
 for 
 The same applies to the function 
 Since 
 for 
 and 
 for 
 we conclude that the function 
 is convex on the family of all fuzzy P-measures on a given fuzzy measurable space 
 Thus, for every real number 
 it holds that:
 Theorem 15. Let  be any fuzzy partition of a fuzzy probability space  Then:
- (i) 
-  implies  
- (ii) 
-  implies  
 Proof.  In the proof we use the Jensen inequality for the concave function 
 defined by 
 for 
 and putting 
  for 
 Since the logarithm satisfies the condition 
 for all real numbers 
 we get:
Suppose that 
 Then 
 and using the inequality (11) and the Jensen inequality, we can write:
The case of  can be obtained in a similar way. □
 Example 5. Consider the fuzzy P-measures   from Example 2 and the fuzzy partition  of fuzzy probability spaces   Based on the results from Example 3, we have     By simple calculation we get that  and  Thus, for  we have   and for  we have  and  which is consistent with the statement in the previous theorem.
 We conclude our contribution with the formulation of a chain rule for the R-norm divergence in the fuzzy case. First, we define the conditional version of the R-norm divergence of fuzzy P-measures.
Definition 9. Let   be two fuzzy partitions of fuzzy probability spaces   Then, we define the conditional divergence of fuzzy P-measures   with respect to B, assuming a realization of A, for a positive real number R not equal to 1, as the number:  Theorem 16. Let  be two fuzzy partitions of fuzzy probability spaces   Then  Proof.  Assume that 
  Then we have:
   5. Conclusions
In this article, we have extended the study of entropy measures and distance measures in the fuzzy case. Our goal was to introduce the concepts of 
R-norm entropy and 
R-norm divergence for the case of fuzzy probability spaces and to derive basic properties of these measures. Our results are presented in 
Section 3 and 
Section 4.
In 
Section 3, we have defined the 
R-norm entropy and conditional 
R-norm entropy of fuzzy partitions of a given fuzzy probability space and have examined the properties of the proposed entropy measures. In particular, it has been shown that the 
R-norm entropy of fuzzy partitions does not have the property of additivity, but it satisfies the property called pseudo-additivity, as stated in Theorem 10. In Theorem 6, chain rules for the 
R-norm entropy of fuzzy partitions are provided. Moreover, it was shown that the Shannon entropy and the conditional Shannon entropy of fuzzy partitions can be derived from the 
R-norm entropy and conditional 
R-norm entropy of fuzzy partitions, respectively, as the limiting cases for 
In 
Section 4, the concept of 
R-norm divergence of fuzzy P-measures was introduced and the properties of this quantity have been proven. Specifically, it was shown that the Kullback–Leibler divergence defined and studied in [
32] can be derived from the 
R-norm divergence of fuzzy P-measures, as the limiting case for 
 The result of Theorem 12 allows us to interpret the 
R-norm divergence as a distance measure between two fuzzy P-measures. Theorem 13 provides a relationship between the 
R-norm divergence and the 
R-norm entropy of fuzzy partitions; Theorem 15 provides a relationship between the 
R-norm divergence and the Kullback–Leibler divergence of fuzzy P-measures. In addition, the concavity of 
R-norm entropy (Theorem 7) and convexity of 
R-norm divergence (Theorem 14) have been demonstrated. Finally, using the suggested concept of conditional 
R-norm divergence of fuzzy P-measures, the chain rule for the 
R-norm divergence of fuzzy P-measures was established. 
In the proofs, the Jensen inequality, L’Hôpital’s rule, and the Minkowski inequality were used. The results presented in 
Section 3 and 
Section 4 are illustrated with numerical examples.