Kullback–Leibler Divergence and Mutual Information of Partitions in Product MV Algebras

The purpose of the paper is to introduce, using the known results concerning the entropy in product MV algebras, the concepts of mutual information and Kullback–Leibler divergence for the case of product MV algebras, and to examine algebraic properties of the proposed measures. In particular, the convexity of the Kullback–Leibler divergence with respect to states in product MV algebras is proved, and chain rules for mutual information and Kullback–Leibler divergence are established. In addition, the data processing inequality for conditionally independent partitions in product MV algebras is proved.


Introduction
The notions of entropy and mutual information are fundamental concepts of information theory [1]; they are used as measures of the information obtained from a realization of the considered experiments. The standard approach in information theory is based on the Shannon entropy [2]. Consider a finite measurable partition A of a probability space (Ω, S, P) with probabilities p_1, ..., p_n of the corresponding elements of A. We recall that the Shannon entropy of A is the number H(A) = −∑_{i=1}^n F(p_i), where the function F : [0, ∞) → ℝ is defined by F(x) = x log x if x > 0, and F(0) = 0. Perhaps a crucial point for applications of the Shannon entropy in other scientific fields is the discovery of Kolmogorov and Sinai [3] (see also [4,5]). They proved the existence of non-isomorphic Bernoulli shifts describing independent repetitions of random experiments with a finite number of outcomes. If two dynamical systems are isomorphic, they have the same Kolmogorov–Sinai entropy; Kolmogorov and Sinai constructed two Bernoulli shifts with different entropies, which are therefore non-isomorphic. It is natural that the mentioned notion of entropy has been adapted to many mathematical structures. In [6], we generalized the notion of Kolmogorov–Sinai entropy to the case when the considered probability space is a fuzzy probability space (Ω, M, µ) defined by Piasecki [7]. This structure can serve as an alternative mathematical model of probability theory for situations where the observed events are described unclearly or vaguely (so-called fuzzy events). Other fuzzy generalizations of the Shannon and Kolmogorov–Sinai entropies can be found, e.g., in [8–17]. It is known that there are many possibilities for defining operations with fuzzy sets; an overview can be found in [18]. It should be noted that while the model presented in [6] was based on the Zadeh connectives [19], in our recently published paper [14] the Lukasiewicz connectives were used to define the fuzzy set operations. In [20], the mutual information of fuzzy partitions of a given fuzzy probability space (Ω, M, µ) has been defined. It was shown that the entropy of fuzzy partitions introduced and studied in [6] can be considered as a special case of their mutual information.
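For a computational view, the following minimal Python sketch (a hypothetical helper, not part of the paper) evaluates the Shannon entropy H(A) = −∑_{i=1}^n F(p_i) of a finite probability vector, using the base-2 convention adopted later in the text.

```python
import math

def shannon_entropy(probs, base=2.0):
    """H(A) = -sum_i F(p_i), where F(x) = x log x for x > 0 and F(0) = 0."""
    def F(x):
        return x * math.log(x, base) if x > 0 else 0.0
    return -sum(F(p) for p in probs)

# A partition with probabilities 1/2, 1/4, 1/4 carries 1.5 bits of information.
print(shannon_entropy([0.5, 0.25, 0.25]))  # 1.5
```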
In classical information theory, the mutual information is a special case of a more general quantity called the Kullback–Leibler divergence (K–L divergence, for short), which was originally introduced by Kullback and Leibler in 1951 [21] (see also [22]) as the divergence between two probability distributions. It plays an important role, as a mathematical tool, in the stability analysis of master equations [23] and Fokker–Planck equations [24], and in isothermal equilibrium fluctuations and transient non-equilibrium deviations [25] (see also [24,26]). In [27], we introduced the concept of K–L divergence for the case of fuzzy probability spaces.
A natural generalization of some families of fuzzy sets is the notion of an MV algebra introduced by Chang [28]. An MV algebra is an algebraic structure which models the Lukasiewicz multivalued logic, namely the fragment of that calculus dealing with the basic logical connectives "and", "or", and "not", but in a multivalued context. MV algebras play a similar role in multivalued logic as Boolean algebras do in classical two-valued logic. Recall also that families of fuzzy sets can be embedded into suitable MV algebras. MV algebras have been studied by many authors (see, e.g., [29–33]) and, of course, there are also many results about the entropy on this structure (cf. [34,35]). The theory of fuzzy sets is a rapidly and massively developing area of theoretical and applied mathematical research. In addition to MV algebras, generalizations of MV algebras such as D-posets (cf. [36–38]), effect algebras (cf. [39]), and A-posets (cf. [40,41]) are currently the subject of intensive research. Some results about the entropy on these structures can be found, e.g., in [42–44].
A special class of MV algebras is the class of product MV algebras. They were introduced independently in [45], from the point of view of probability theory, and in [46], from the point of view of mathematical logic. Product MV algebras have been studied, e.g., in [47,48]. A suitable theory of entropy of Kolmogorov type for the case of product MV algebras has been constructed in [35,49,50].
The purpose of this contribution is to define, using the results concerning the entropy in product MV algebras, the concepts of mutual information and Kullback–Leibler divergence for the case of product MV algebras, and to study the properties of the suggested measures. The main results of the contribution are presented in Sections 3 and 4. In Section 3, the notions of mutual information and conditional mutual information in product MV algebras are introduced and basic properties of the suggested measures are proved, inter alia the data processing inequality for conditionally independent partitions. In Section 4, we define the Kullback–Leibler divergence in product MV algebras and its conditional version and examine the algebraic properties of the proposed measures. Our results are summarized in the final section.

Basic Definitions, Notations and Facts
In this section, we recall some definitions and basic facts which will be used in the following ones. An MV algebra [30] is a system (M, ⊕, ⊗, *, 0, 1), where M is a non-empty set, ⊕, ⊗ are binary operations on M, * is a unary operation on M, and 0, 1 are fixed elements of M, such that the following conditions are satisfied:

An example of an MV algebra is the real interval [0, 1] equipped with the operations x ⊕ y = min(1, x + y), x ⊗ y = max(0, x + y − 1). It is interesting that any MV algebra has a similar structure. In fact, by the Mundici theorem [33], any MV algebra can be represented by means of a lattice-ordered Abelian group (an Abelian l-group, for short). Recall that an Abelian l-group is an algebraic system (G, +, ≤), where (G, +) is an Abelian group, (G, ≤) is a partially ordered set which is a lattice, and a ≤ b implies a + c ≤ b + c.
Let (G, +, ≤) be an Abelian l-group, 0 be the neutral element of (G, +), and u ∈ G, u > 0. On the interval [0, u] = {h ∈ G; 0 ≤ h ≤ u} we define the following operations: a ⊕ b = (a + b) ∧ u, a ⊗ b = (a + b − u) ∨ 0, a* = u − a. Then the system M_G = ([0, u], ⊕, ⊗, *, 0, u) becomes an MV algebra. The Mundici theorem states that for any MV algebra M there exists an Abelian l-group G with a strong unit u (i.e., for every a ∈ G there exists n ∈ N with the property a ≤ nu) such that M ≅ M_G.
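As a concrete illustration of the interval MV algebra (taking G = ℝ and u = 1), the following short Python sketch implements the operations ⊕, ⊗ and the complement a* = 1 − a, and spot-checks a few typical MV algebra identities; the function names are ours.

```python
def oplus(x, y):   # x ⊕ y = min(1, x + y)
    return min(1.0, x + y)

def otimes(x, y):  # x ⊗ y = max(0, x + y - 1)
    return max(0.0, x + y - 1.0)

def star(x):       # x* = 1 - x
    return 1.0 - x

# Spot checks: a ⊕ a* = 1, a ⊗ a* = 0, and the De Morgan-type law (a ⊕ b)* = a* ⊗ b*.
grid = [i / 10 for i in range(11)]
for a in grid:
    assert abs(oplus(a, star(a)) - 1.0) < 1e-12
    assert abs(otimes(a, star(a))) < 1e-12
    for b in grid:
        assert abs(star(oplus(a, b)) - otimes(star(a), star(b))) < 1e-12
print("all identities hold on the grid")
```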
In this contribution, we shall consider MV algebras with a product. We recall that the definition of a product MV algebra is based on Mundici's categorical representation of MV algebras by Abelian l-groups; i.e., the sum in the following definition of a product MV algebra, and subsequently in the rest of the text, means the sum in the Abelian l-group associated with the given MV algebra. Similarly, the element u is a strong unit of this group. More details can be found in [45,46].

Definition 1. A product MV algebra is a couple (M, •), where M is an MV algebra and • is a commutative and associative binary operation on M satisfying the following conditions:

In addition, we shall consider finitely additive states defined on a product MV algebra.
Definition 2 [30]. Let (M, •) be a product MV algebra. A map m : M → [0, 1] is said to be a state if the following properties are satisfied:

In product MV algebras, a suitable entropy theory has been provided in [35,49,50]. In the following, we present the main idea and some results of this theory which will be used in this contribution.

Definition 3. By a partition in a product MV algebra (M, •) we mean a finite collection A = {a_1, ..., a_n} ⊂ M such that ∑_{i=1}^n a_i = u.
Let m be a state on a product MV algebra (M, •). In the set of all partitions of (M, •), the relation ≺ is defined in the following way. Let A = {a_1, ..., a_n} and B = {b_1, ..., b_k} be two partitions of (M, •). We say that B is a refinement of A (with respect to m), and write A ≺ B, if there exists a partition I(1), I(2), ..., I(n) of the set {1, 2, ..., k} such that m(a_i) = ∑_{j∈I(i)} m(b_j) for every i = 1, 2, ..., n.
Given two partitions A = {a_1, ..., a_n} and B = {b_1, ..., b_k} of (M, •), their join A ∨ B is defined as the system A ∨ B = {a_i • b_j; i = 1, 2, ..., n, j = 1, 2, ..., k}. Let A = {a_1, ..., a_n} be a partition in a product MV algebra (M, •) and m be a state on (M, •). Then the entropy of A with respect to m is defined by Shannon's formula H_m(A) = −∑_{i=1}^n F(m(a_i)), where F(x) = x log x if x > 0 and F(0) = 0. If A = {a_1, ..., a_n} and B = {b_1, ..., b_k} are two partitions of (M, •), then the conditional entropy of A given B is defined by H_m(A/B) = −∑_{i=1}^n ∑_{j=1}^k m(a_i • b_j) · log (m(a_i • b_j)/m(b_j)). In accordance with the classical theory, the log is to the base 2 and the entropy is expressed in bits. Note that we use the convention (based on continuity arguments) that x log(x/0) = ∞ if x > 0, and 0 log(0/x) = 0 if x ≥ 0.

Example 1. Consider any product MV algebra (M, •) and a state m defined on M. Then the set E = {u} is a partition of (M, •) such that E ≺ A for any partition A of (M, •), and its entropy is H_m(E) = 0. Let a ∈ M such that m(a) = p, where p ∈ (0, 1). Evidently, m(u − a) = 1 − p, and the set A = {a, u − a} is a partition of (M, •) with the entropy H_m(A) = −p log p − (1 − p) log(1 − p).

The entropy and the conditional entropy of partitions in a product MV algebra satisfy all properties analogous to the properties of Shannon's entropy of measurable partitions in the classical case; the proofs can be found in [35,49,50]. We present those that will be further exploited. Let A, B, C be any partitions of a product MV algebra (M, •). Then the following properties hold:

Mutual Information of Partitions in Product MV Algebras
In this section, the results concerning the entropy in product MV algebras are used in developing information theory for the case of product MV algebras. We define the notions of mutual information and conditional mutual information of partitions in a product MV algebra and prove basic properties of the proposed measures.

Definition 4. Let A, B be partitions in a given product MV algebra (M, •). Then we define the mutual information of A and B by the formula I_m(A, B) = H_m(A) − H_m(A/B).

Remark 1. As a simple consequence of (E4) we get I_m(A, B) = H_m(A) + H_m(B) − H_m(A ∨ B). Subsequently, we see that I_m(A, A) = H_m(A), i.e., the entropy of partitions in product MV algebras can be considered as a special case of their mutual information. Moreover, we see that I_m(A, B) = I_m(B, A), and hence we can also write I_m(A, B) = H_m(B) − H_m(B/A).

Example 2. Consider the measurable space (Ω, S), where Ω is the unit interval [0, 1] and S is the σ-algebra of all Borel subsets of [0, 1]. Let F be the family of all S-measurable functions f : Ω → [0, 1]. F is the so-called full tribe of fuzzy sets [30] (see also [14,29]); it is closed also under the natural product of fuzzy sets and represents a special case of product MV algebras. On the product MV algebra F, we define a state m by the formula m(f) = ∫_0^1 f(x) dx, for every f ∈ F. Evidently, the sets A = {x, 1 − x} and B = {x², 1 − x²} are two partitions of F with the m-state values 1/2, 1/2 and 1/3, 2/3 of the corresponding elements of A and B, respectively. By simple calculations we obtain the entropy H_m(A) = log 2 = 1 bit and the entropy H_m(B) ≈ 0.9183 bit. The join A ∨ B is a partition of F with the m-state values 1/4, 1/12, 1/4, 5/12 of the corresponding elements. The entropy of A ∨ B is the number H_m(A ∨ B) = −(1/4 · log(1/4) + 1/12 · log(1/12) + 1/4 · log(1/4) + 5/12 · log(5/12)) ≈ 1.8250 bit. Since H_m(A/B) = H_m(A ∨ B) − H_m(B) ≈ 0.9067 bit, the mutual information of A and B is the number I_m(A, B) = H_m(A) − H_m(A/B) ≈ 1 − 0.9067 = 0.0933 bit. We can also see that Equation (3) is fulfilled: H_m(A) + H_m(B) − H_m(A ∨ B) ≈ 1 + 0.9183 − 1.8250 = 0.0933 bit = I_m(A, B). In the following, we will use the assertions of Propositions 1 and 2.
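The numbers in Example 2 are easy to reproduce. The following sketch (illustrative only) uses the exact values of the state m(f) = ∫_0^1 f(x) dx on the elements of A, B and A ∨ B computed above and recomputes the entropies and the mutual information.

```python
import math

def H(values):
    """Entropy from the state values of a partition (log base 2)."""
    return -sum(v * math.log2(v) for v in values if v > 0)

# State values of the partition elements in Example 2:
A  = [1/2, 1/2]                # {x, 1 - x}
B  = [1/3, 2/3]                # {x^2, 1 - x^2}
AB = [1/4, 1/12, 1/4, 5/12]    # the join A ∨ B

H_A, H_B, H_AB = H(A), H(B), H(AB)
print(round(H_A, 4), round(H_B, 4), round(H_AB, 4))  # 1.0 0.9183 1.825
print(round(H_A - (H_AB - H_B), 4))                  # I_m(A, B) via H_m(A) - H_m(A/B): 0.0933
print(round(H_A + H_B - H_AB, 4))                    # the same value via the identity in Remark 1
```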

Proposition 1.
If A = {a_1, ..., a_n} and B = {b_1, ..., b_k} are two partitions of (M, •), then we have: (i) ∑_{j=1}^k m(a_i • b_j) = m(a_i), for i = 1, 2, ..., n; (ii) ∑_{i=1}^n m(a_i • b_j) = m(b_j), for j = 1, 2, ..., k.

Proof. By the assumption, ∑_{j=1}^k b_j = u; therefore, according to Definitions 1 and 2, we get m(a_i) = m(a_i • u) = m(a_i • ∑_{j=1}^k b_j) = ∑_{j=1}^k m(a_i • b_j), for i = 1, 2, ..., n. The equality (ii) can be obtained in the same way.
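Proposition 1 states that the values m(a_i) and m(b_j) are recovered as the row and column sums of the joint values m(a_i • b_j). The sketch below (illustrative only) checks this on the joint values from Example 2 and also evaluates the conditional entropy H_m(A/B) via the double-sum formula recalled above.

```python
import math

joint = [[1/4, 1/4],       # m(a_1 • b_1), m(a_1 • b_2)   with a_1 = x
         [1/12, 5/12]]     # m(a_2 • b_1), m(a_2 • b_2)   with a_2 = 1 - x

a_vals = [sum(row) for row in joint]            # Proposition 1 (i): [0.5, 0.5]
b_vals = [sum(col) for col in zip(*joint)]      # Proposition 1 (ii): [1/3, 2/3]
print(a_vals, b_vals)

# Conditional entropy H_m(A/B) from the joint and marginal state values.
H_A_given_B = -sum(v * math.log2(v / b_vals[j])
                   for row in joint for j, v in enumerate(row) if v > 0)
print(round(H_A_given_B, 4))   # ≈ 0.9067, matching Example 2
```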
From the following proposition it follows that, for all partitions A, B of (M, •), the set A ∨ B is a common refinement of A and B.

Proposition 2. A ≺ A ∨ B for all partitions A, B of (M, •).
Theorem 1. For any partitions A, B and C in a product MV algebra (M, •), we have:

Proof. By Equation (2) and the properties (E3) and (E4), we get:

According to Proposition 2, A ≺ C ∨ A, and therefore, by (E2), H_m(B/A) ≥ H_m(B/C ∨ A). This yields the inequality:

If A = {a_1, ..., a_n} and B = {b_1, ..., b_k} are two partitions of (M, •), then:

Proof. Since, by Proposition 1, ∑_{j=1}^k m(a_i • b_j) = m(a_i) and ∑_{i=1}^n m(a_i • b_j) = m(b_j), we get:

Theorem 2. Let A, B be partitions in a product MV algebra (M, •). Then I_m(A, B) ≥ 0, with the equality if and only if the partitions A, B are statistically independent.
Proof. Assume that A = {a_1, ..., a_n} and B = {b_1, ..., b_k}. Then, using the inequality ln x ≤ x − 1, which is valid for all real numbers x > 0 with the equality if and only if x = 1, we get:

The equality holds if and only if m(a_i • b_j) = m(a_i) · m(b_j). Therefore, using Equation (5) and Proposition 1, we have:

It follows that I_m(A, B) ≥ 0, with the equality if and only if m(a_i • b_j) = m(a_i) · m(b_j), for i = 1, 2, ..., n, j = 1, 2, ..., k, i.e., when the partitions A, B are statistically independent.
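A quick numerical illustration of Theorem 2 (a sketch only; the double-sum expression for I_m used below is the one suggested by the proof): for statistically independent partitions, where m(a_i • b_j) = m(a_i) · m(b_j), the mutual information vanishes, and it is positive otherwise.

```python
import math

def mutual_information(joint, a_vals, b_vals):
    """I_m(A, B) = sum_ij m(a_i • b_j) * log( m(a_i • b_j) / (m(a_i) m(b_j)) )."""
    total = 0.0
    for i, row in enumerate(joint):
        for j, v in enumerate(row):
            if v > 0:
                total += v * math.log2(v / (a_vals[i] * b_vals[j]))
    return total

a_vals, b_vals = [0.5, 0.5], [1/3, 2/3]
independent = [[a * b for b in b_vals] for a in a_vals]
dependent   = [[1/4, 1/4], [1/12, 5/12]]   # the joint state values from Example 2

print(round(mutual_information(independent, a_vals, b_vals), 6))  # 0.0
print(round(mutual_information(dependent, a_vals, b_vals), 4))    # ≈ 0.0933 > 0
```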
From Theorem 2, the subadditivity and additivity of entropy in a product MV algebra follow, as shown by the following theorem.

Theorem 3 (Subadditivity and additivity of entropy). For arbitrary partitions A, B in a product MV algebra (M, •), it holds that H_m(A ∨ B) ≤ H_m(A) + H_m(B), with the equality if and only if the partitions A, B are statistically independent.

Proof. The assertion is a simple consequence of Equation (2) and Theorem 2.

Definition 6. Let A, B and C be partitions in a given product MV algebra (M, •). Then the conditional mutual information of A and B given C is defined by the formula I_m(A, B / C) = H_m(A/C) − H_m(A / B ∨ C).

Remark 2. Notice that the conditional mutual information is nonnegative, because by the property (E2) we have H_m(A/C) ≥ H_m(A / B ∨ C).

Theorem 5. For any partitions A, B and C in a product MV algebra (M, •), we have:

Proof. Let us calculate:

In a similar way we obtain also the second equality.
By (E3) and (E4) we get:

Now let us suppose that the result is true for a given n ∈ N. Then:

(ii) For n = 2, using (E3), we obtain:

Suppose that the result is true for a given n ∈ N. Then:

(iii) By Equation (2), the equalities (i) and (ii) of this theorem, and Equation (6), we obtain:

Let us calculate:

This means that C → B → A. The reverse implication is evident.
Proof. (i) By the assumption, we have I_m(A, C / B) = 0. Hence, using the chain rule for the mutual information (Theorem 6 (iii)), we obtain:

(ii) By the equality (i) of this theorem and Theorem 5, we can write:

(iii) From (ii), it follows that I_m(B, C) ≥ I_m(C, B / A). Interchanging A and C (we can do this based on Theorem 7), we obtain:

(iv) By the assumption, we have I_m(A, C / B) = 0. Therefore, by Theorem 5, we get:

Thus, by the same theorem, we can write:

In the following, the concavity of the entropy H_m(A) and the concavity of the mutual information I_m(A, B) as functions of m are studied. We recall, for the convenience of the reader, the definitions of convex and concave functions. A real-valued function f is said to be convex over an interval [a, b] if, for every x_1, x_2 ∈ [a, b] and any real number α ∈ [0, 1], f(αx_1 + (1 − α)x_2) ≤ αf(x_1) + (1 − α)f(x_2). A real-valued function f is said to be concave over an interval [a, b] if, for every x_1, x_2 ∈ [a, b] and any real number α ∈ [0, 1], f(αx_1 + (1 − α)x_2) ≥ αf(x_1) + (1 − α)f(x_2). In the following, we will use the symbol F to denote the family of all states on a given product MV algebra (M, •). It is easy to prove the following proposition: if m_1, m_2 ∈ F and α ∈ [0, 1], then αm_1 + (1 − α)m_2 ∈ F.

Theorem 9 (Concavity of entropy). Let A be a partition in a given product MV algebra (M, •). Then, for every m_1, m_2 ∈ F and every real number α ∈ [0, 1], the following inequality holds: H_{αm_1 + (1−α)m_2}(A) ≥ α H_{m_1}(A) + (1 − α) H_{m_2}(A).

Proof. Assume that A = {a_1, ..., a_n}. Since the function F is convex, we get:

which proves that the entropy m → H_m(A) is a concave function on the family F.
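As a numerical illustration of Theorem 9 (a sketch only; states are represented by their values on a fixed three-element partition, and the helper names are ours), the following script checks the concavity inequality on randomly generated state values.

```python
import math, random

def H(values):
    """Entropy from the state values of a partition (log base 2)."""
    return -sum(v * math.log2(v) for v in values if v > 0)

def random_state(n):
    """Random state values on an n-element partition (nonnegative, summing to 1)."""
    raw = [random.random() for _ in range(n)]
    s = sum(raw)
    return [v / s for v in raw]

random.seed(0)
for _ in range(1000):
    m1, m2 = random_state(3), random_state(3)
    alpha = random.random()
    mix = [alpha * x + (1 - alpha) * y for x, y in zip(m1, m2)]
    # Concavity of the entropy in the state (Theorem 9).
    assert H(mix) >= alpha * H(m1) + (1 - alpha) * H(m2) - 1e-12
print("concavity inequality verified on 1000 random instances")
```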
In the proof of the concavity of the mutual information I_m(A, B), we will need the assertion of Proposition 5. First, we introduce the following notation. Let m be a state on a product MV algebra (M, •) and a, b ∈ M. Then we denote:
Proof. Let us calculate:

In the last step, we used the implication

Remark 4. By Proposition 5, there exists

Theorem 10 (Concavity of mutual information). The mutual information m → I_m(A, B) is a concave function on the family K.
Proof. By Equation (4), we can write I_m(A, B) = H_m(B) − H_m(B/A). In view of Theorem 9 and Remark 4, the function m → I_m(A, B) is thus the sum of two functions concave on the family K: m → H_m(B) and m → −H_m(B/A). Since the sum of two concave functions is itself concave, the statement follows.

Kullback-Leibler Divergence in Product MV Algebras
In this section, we introduce the concept of Kullback–Leibler divergence in product MV algebras. We prove basic properties of this measure, in particular Gibbs' inequality. Finally, using the notion of conditional Kullback–Leibler divergence, we establish a chain rule for the Kullback–Leibler divergence with respect to additive states defined on a given product MV algebra. In the proofs we use the following known log-sum inequality: for non-negative real numbers x_1, x_2, ..., x_n and y_1, y_2, ..., y_n, it holds that ∑_{i=1}^n x_i · log(x_i/y_i) ≥ (∑_{i=1}^n x_i) · log((∑_{i=1}^n x_i)/(∑_{i=1}^n y_i)), with the equality if and only if x_i/y_i is constant. Recall that we use the convention that x log(x/0) = ∞ if x > 0, and 0 log(0/x) = 0 if x ≥ 0.
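The log-sum inequality itself is easy to check numerically; the following sketch (illustrative only) compares both sides for random positive vectors, skipping zero terms in accordance with the stated conventions.

```python
import math, random

def lhs(xs, ys):
    return sum(x * math.log2(x / y) for x, y in zip(xs, ys) if x > 0)

def rhs(xs, ys):
    X, Y = sum(xs), sum(ys)
    return X * math.log2(X / Y) if X > 0 else 0.0

random.seed(1)
for _ in range(1000):
    xs = [random.uniform(0.01, 1.0) for _ in range(4)]
    ys = [random.uniform(0.01, 1.0) for _ in range(4)]
    # sum_i x_i log(x_i / y_i) >= (sum_i x_i) log(sum_i x_i / sum_i y_i)
    assert lhs(xs, ys) >= rhs(xs, ys) - 1e-12
print("log-sum inequality verified on 1000 random instances")
```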

Definition 9.
Let m_1, m_2 be states defined on a given product MV algebra (M, •), and let A = {a_1, ..., a_n} be a partition of (M, •). Then we define the Kullback–Leibler divergence D_A(m_1 ∥ m_2) by D_A(m_1 ∥ m_2) = ∑_{i=1}^n m_1(a_i) · log (m_1(a_i)/m_2(a_i)).

Remark 5. It is obvious that D_A(m ∥ m) = 0. The Kullback–Leibler divergence is not a metric in the true sense, since it is not symmetric, i.e., the equality D_A(m_1 ∥ m_2) = D_A(m_2 ∥ m_1) is not necessarily true (as shown in the following example), and it does not satisfy the triangle inequality.
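Definition 9 translates directly into code. The following sketch (with illustrative state values chosen by us, not taken from the paper's examples) evaluates D_A(m_1 ∥ m_2) from the values of the two states on the elements of A and shows the asymmetry mentioned in Remark 5.

```python
import math

def kl_divergence(m1_vals, m2_vals):
    """D_A(m1 || m2) = sum_i m1(a_i) * log( m1(a_i) / m2(a_i) ), with the usual conventions."""
    total = 0.0
    for p, q in zip(m1_vals, m2_vals):
        if p > 0 and q == 0:
            return math.inf
        if p > 0:
            total += p * math.log2(p / q)
    return total

m1 = [0.5, 0.5]   # state values of m1 on a two-element partition A
m2 = [1/3, 2/3]   # state values of m2 on the same partition
print(round(kl_divergence(m1, m2), 4))  # ≈ 0.085
print(round(kl_divergence(m2, m1), 4))  # ≈ 0.0817, so D_A is not symmetric
```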
Theorem 11 (Gibbs' inequality). Let m_1, m_2 be states on a given product MV algebra (M, •), and let A = {a_1, ..., a_n} be a partition of (M, •). Then D_A(m_1 ∥ m_2) ≥ 0, with the equality if and only if m_1(a_i) = m_2(a_i) for i = 1, 2, ..., n.

Proof. If we put x_i = m_1(a_i) and y_i = m_2(a_i), for i = 1, 2, ..., n, then x_1, x_2, ..., x_n, y_1, y_2, ..., y_n are non-negative real numbers such that ∑_{i=1}^n x_i = 1 and, analogously, ∑_{i=1}^n y_i = 1. Thus, using the log-sum inequality, we can write:

with the equality if and only if m_1(a_i) = α · m_2(a_i) for i = 1, 2, ..., n, where α is a constant. Taking the sum over all i = 1, 2, ..., n, we obtain α = 1.

Theorem 12. Let A = {a_1, ..., a_n} be a partition of (M, •) and ν be a state on (M, •) uniform over A. Then, for the entropy of A with respect to any state m from F, we have H_m(A) = log n − D_A(m ∥ ν).

Proof. Since ν is uniform over A, we have ν(a_i) = 1/n, for i = 1, 2, ..., n. Let us calculate:

As a consequence we obtain the following property of the entropy of partitions in product MV algebras.

Corollary 1.
For any partition A of (M, •), it holds that H_m(A) ≤ log card A, with the equality if and only if m is uniform over the partition A.
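Theorem 12 and Corollary 1 can be verified numerically; in the sketch below (an illustration under the formulas reconstructed above), a state m on a four-element partition is compared with the uniform state ν.

```python
import math

def H(values):
    return -sum(v * math.log2(v) for v in values if v > 0)

def kl(p_vals, q_vals):
    return sum(p * math.log2(p / q) for p, q in zip(p_vals, q_vals) if p > 0)

m  = [0.5, 0.25, 0.125, 0.125]   # state values of m on a four-element partition A
nu = [0.25] * 4                  # the state uniform over A

print(round(H(m), 4))                        # 1.75
print(round(math.log2(4) - kl(m, nu), 4))    # 1.75  (Theorem 12)
print(H(m) <= math.log2(4))                  # True  (Corollary 1)
```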
The result of Theorem 13 is illustrated in the following example.
Theorem 14 (Chain rule for K–L divergence). Let m_1, m_2 be states on a given product MV algebra (M, •).
If A, B are two partitions of (M, •), then D_{A∨B}(m_1 ∥ m_2) = D_A(m_1 ∥ m_2) + D_{B/A}(m_1 ∥ m_2) (11).

Proof. Assume that A = {a_1, ..., a_n} and B = {b_1, ..., b_k}. We will consider the following two cases: (i) there exists i_0 ∈ {1, ..., n} such that m_2(a_{i_0}) = 0; (ii) m_2(a_i) > 0 for i = 1, 2, ..., n. In the first case, both sides of Equation (11) are equal to ∞; thus, the equality holds. Let us now assume that m_2(a_i) > 0 for i = 1, 2, ..., n. We get:

In the last step, analogously as in the proof of Proposition 5, we used the implication

In the following example, we illustrate the result of the previous theorem.
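The chain rule can also be illustrated numerically. In the following sketch, the conditional divergence D_{B/A}(m_1 ∥ m_2) is implemented in the standard information-theoretic form, which we adopt here as an assumption since the defining formula of the conditional Kullback–Leibler divergence is not reproduced above; with this form, the two sides of Equation (11) coincide.

```python
import math

def kl(p_vals, q_vals):
    """D(p || q) for matching lists of values, with the convention 0 * log(0/x) = 0."""
    return sum(p * math.log2(p / q) for p, q in zip(p_vals, q_vals) if p > 0)

# Joint state values m(a_i • b_j) under two states m1, m2 (rows = elements of A, columns = elements of B).
m1_joint = [[1/4, 1/4], [1/12, 5/12]]
m2_joint = [[1/8, 1/8], [3/8, 3/8]]

flat1 = [v for row in m1_joint for v in row]
flat2 = [v for row in m2_joint for v in row]
a1 = [sum(row) for row in m1_joint]   # m1(a_i), by Proposition 1
a2 = [sum(row) for row in m2_joint]   # m2(a_i)

# Conditional divergence D_{B/A}(m1 || m2), written in the standard form (our assumption).
d_cond = sum(v * math.log2((v / a1[i]) / (m2_joint[i][j] / a2[i]))
             for i, row in enumerate(m1_joint)
             for j, v in enumerate(row) if v > 0)

print(round(kl(flat1, flat2), 4))      # D_{A∨B}(m1 || m2) ≈ 0.3825
print(round(kl(a1, a2) + d_cond, 4))   # D_A(m1 || m2) + D_{B/A}(m1 || m2) ≈ 0.3825
```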

Discussion
In this paper, we have extended the study of entropy in product MV algebras. The main aim of the paper was to introduce, using known results concerning the entropy in product MV algebras, the concepts of mutual information and Kullback–Leibler divergence for the case of product MV algebras and to examine algebraic properties of the proposed measures. Our results have been presented in Sections 3 and 4.
In Section 3, we have introduced the notions of mutual information and conditional mutual information of partitions of product MV algebras and proved some basic properties of the suggested measures. It was shown that the entropy of partitions of product MV algebras can be considered as a special case of their mutual information. Specifically, it was proved that the subadditivity and additivity of entropy follow from the properties of mutual information (Theorem 3). Theorem 6 provides the chain rule for mutual information. In addition, the data processing inequality for conditionally independent partitions in product MV algebras was proved. Moreover, the concavity of mutual information has been studied.
In Section 4, the notion of Kullback–Leibler divergence in product MV algebras was introduced and the basic properties of this measure were shown. In particular, the convexity of the Kullback–Leibler divergence with respect to additive states defined on a given product MV algebra was proved. Theorem 11 admits an interpretation of the Kullback–Leibler divergence as a measure of how different two states on a common product MV algebra (over the same partition) are. The relationship between the Kullback–Leibler divergence and the entropy is provided in Theorem 12: the more a state m ∈ F diverges from the state ν ∈ F uniform over A (over the same partition A), the smaller the entropy H_m(A) is, and vice versa. Finally, a conditional version of the Kullback–Leibler divergence in product MV algebras has been defined and the chain rule for the Kullback–Leibler divergence with respect to additive states defined on a given product MV algebra has been established.
Notice that in [14] (see also [29,30]) the entropy on a full tribe F of fuzzy sets has been studied. The tribe F is closed also under the natural product of fuzzy sets, and it represents a special case of product MV algebras. Accordingly, the theory presented in this contribution can also be applied to the mentioned case of tribes of fuzzy sets.
In [51–55], a more general fuzzy theory, the theory of intuitionistic fuzzy sets (IF-sets, for short), has been developed. While a fuzzy set is a mapping µ_A : Ω → [0, 1] (where the considered fuzzy set is identified with its membership function µ_A), the Atanassov IF-set is a pair A = (µ_A, ν_A) of functions µ_A, ν_A : Ω → [0, 1] with µ_A + ν_A ≤ 1. The function µ_A is interpreted as the membership function of the IF-set A, and the function ν_A as the non-membership function of the IF-set A. Evidently, any fuzzy set µ_A : Ω → [0, 1] can be considered as an IF-set A = (µ_A, 1 − µ_A). Any result holding for IF-sets is applicable also to fuzzy sets. Of course, the opposite implication is not true; the theory of intuitionistic fuzzy sets presents a non-trivial generalization of the fuzzy set theory. So IF-sets provide possibilities for modeling a larger class of real situations. Note that some results about the entropy on IF-sets can be found, e.g., in [56–59]. These results could be used in developing information theory for the case of IF-sets.
To give a possibility of applying MV algebra results also to families of IF-experiments, one can use the Mundici characterization of MV algebras. In the family of IF-sets, it is natural to define the partial ordering relation ≤ in the following way: if A = (µ_A, ν_A) and B = (µ_B, ν_B) are two IF-sets, then A ≤ B if and only if µ_A ≤ µ_B and ν_A ≥ ν_B. Namely, in the fuzzy case, µ_A ≤ µ_B implies ν_A = 1 − µ_A ≥ 1 − µ_B = ν_B. Therefore, we can consider the Abelian l-group of such pairs with the operation A + B = (µ_A + µ_B, 1 − (1 − ν_A + 1 − ν_B)) = (µ_A + µ_B, ν_A + ν_B − 1) and with the zero element 0 = (0, 1). (In fact, A + 0 = (µ_A, ν_A) + (0, 1) = (µ_A, ν_A) = A.) The partial ordering ≤ in this l-group is defined by the prescription A ≤ B if and only if µ_A ≤ µ_B and ν_A ≥ ν_B. Then a suitable MV algebra is, e.g., the system M = {(µ_A, ν_A); (0, 1) ≤ (µ_A, ν_A) ≤ (1, 0)}. Moreover, this MV algebra is a product MV algebra with the product defined by A • B = (µ_A · µ_B, ν_A + ν_B − ν_A · ν_B). The presented MV algebra approach gives an elegant and practical way of obtaining new results also in the intuitionistic fuzzy case. We note that this approach was used to construct the Kolmogorov-type entropy theory for IF systems in [58], drawing on the entropy results for product MV algebras published in [35,49,50]. In this way, it is also possible to develop the theory of information and K–L divergence for IF-sets.
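To make the construction concrete, the following sketch (an illustration on a finite universe, with helper names of our choosing) represents an IF-set as a pair of value lists and implements the group addition, the zero element, and the partial order described above; the product is omitted here.

```python
# An IF-set on a finite universe is a pair (mu, nu) of value lists with mu + nu <= 1 pointwise.
def if_add(A, B):
    """Group addition from the text: A + B = (mu_A + mu_B, nu_A + nu_B - 1)."""
    (muA, nuA), (muB, nuB) = A, B
    return ([x + y for x, y in zip(muA, muB)], [x + y - 1 for x, y in zip(nuA, nuB)])

def if_le(A, B):
    """A <= B iff mu_A <= mu_B and nu_A >= nu_B (pointwise)."""
    (muA, nuA), (muB, nuB) = A, B
    return all(x <= y for x, y in zip(muA, muB)) and all(x >= y for x, y in zip(nuA, nuB))

# A fuzzy set mu on a three-point universe, embedded as the IF-set (mu, 1 - mu).
mu = [0.2, 0.5, 0.9]
A = (mu, [1 - x for x in mu])
zero = ([0.0] * 3, [1.0] * 3)   # 0 = (0, 1)
unit = ([1.0] * 3, [0.0] * 3)   # (1, 0)

close = lambda xs, ys: all(abs(x - y) < 1e-12 for x, y in zip(xs, ys))
S = if_add(A, zero)
print(close(S[0], A[0]) and close(S[1], A[1]))   # True: (0, 1) is the neutral element
print(if_le(zero, A) and if_le(A, unit))         # True: A belongs to the MV algebra M
```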

Theorem 4. For arbitrary partitions A, B in a product MV algebra (M, •), it holds that H_m(A/B) ≤ H_m(A), with the equality if and only if the partitions A, B are statistically independent.

Remark 3. According to the above theorem, we may say that A and C are conditionally independent given B, and write A ↔ B ↔ C instead of A → B → C.

Theorem 8. Let A, B and C be partitions in a given product MV algebra (M, •) such that A → B → C. Then we have:

Definition 7. Let A, B and C be partitions in a product MV algebra (M, •). We say that A is conditionally independent of C given B (and write A → B → C) if I_m(A, C / B) = 0.

Theorem 7. For partitions A, B and C in a product MV algebra (M, •), A → B → C if and only if C → B → A.

Proof. Let A → B → C. Then 0 = I_m(A, C / B) = H_m(A / B) − H_m(A / B ∨ C). Therefore, by (E4), we get:

Example 5. Consider the product MV algebra F and the partitions A = {x, 1 − x}, B = {x², 1 − x²} of F from Example 2. In addition, let m_1, m_2 be the states on F defined in Example 4. Then the partitions A and B have the m_1-state values of the corresponding elements, respectively. The join of the partitions A and B is the system