Kullback-Leibler Divergence and Mutual Information of Experiments in the Fuzzy Case

Abstract: The main aim of this contribution is to define the notions of Kullback-Leibler divergence and conditional mutual information in fuzzy probability spaces and to derive the basic properties of the suggested measures. In particular, chain rules for mutual information of fuzzy partitions and for Kullback-Leibler divergence with respect to fuzzy P-measures are established. In addition, the convexity of Kullback-Leibler divergence and mutual information with respect to fuzzy P-measures is studied.


Introduction
The notion of a σ-algebra S of random events and the concept of a probability space (Ω, S, P) are the basis of the classical Kolmogorov probability theory [1]. A probability P is a normalized measure defined on the σ-algebra S. An event in classical probability theory is understood as an exactly and clearly defined phenomenon and, from a mathematical point of view, it is a classical, ordinary set. Consider a finite measurable partition A of (Ω, S, P) with probabilities p_1, . . ., p_n of the corresponding elements of A. We recall that the Shannon entropy [2] of A is the number H(A) = ∑_{i=1}^n F(p_i), where the function F : [0, ∞) → ℝ is defined by F(x) = −x log x if x > 0 and F(0) = 0. In [3], we have generalized this notion to situations where the considered probability space is a fuzzy probability space (Ω, M, µ) defined by Piasecki [4]. Instead of a probability P, a fuzzy P-measure µ defined on a fuzzy σ-algebra M of fuzzy subsets of Ω is considered. Recall that by a fuzzy subset f of a non-empty set Ω we mean a mapping f : Ω → [0, 1] (Zadeh [5]). A fuzzy subset from the fuzzy σ-algebra M is interpreted as a fuzzy event; the value µ(f) is interpreted as the probability of the fuzzy event f. The structure (Ω, M, µ) can serve as an alternative mathematical model of probability theory for the case where the observed events are described unclearly or vaguely, i.e., as so-called fuzzy events. In [6], the mutual information of fuzzy partitions of a given fuzzy probability space (Ω, M, µ) has been defined. It was shown that the entropy of fuzzy partitions introduced and studied by the author in [3] (see also [7]) can be considered as a special case of their mutual information. The proposed measures are fuzzy analogies of the relevant terms of the classical theory and they can be used whenever it is necessary to know the quantity of information received by the realization of experiments whose results are fuzzy events. Note that in [3] (see also [8]), using the concept of entropy of fuzzy partitions,
we define the entropy of a fuzzy dynamical system (Ω, M, µ, U), where (Ω, M, µ) is a fuzzy probability space and U : M → M is a µ-preserving σ-homomorphism. Analogies of our results for the case of logical entropy (see, e.g., [9]) are provided in our recently published papers [10,11]. Recall that the logical entropy of a probability distribution P = (p_1, . . ., p_n) is defined as the number h(P) = ∑_{i=1}^n p_i (1 − p_i).
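Both notions are straightforward to evaluate for a concrete probability distribution. The following minimal sketch (the distribution P below is an arbitrary illustration, not taken from the paper) computes the Shannon entropy in bits and the logical entropy:

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(P) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def logical_entropy(p):
    """Logical entropy h(P) = sum_i p_i * (1 - p_i) = 1 - sum_i p_i**2."""
    return sum(x * (1 - x) for x in p)

P = [0.5, 0.25, 0.25]          # hypothetical probability distribution
H = shannon_entropy(P)         # 1.5 bits
h = logical_entropy(P)         # 0.625
```

Note that h(P) is bounded by 1, while H(P) can grow without bound as the number of outcomes increases.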
In [9] the author deals with historical aspects of the logical entropy formula h(P) and investigates the relationship between the logical entropy and Shannon's entropy. It should be noted that other fuzzy analogies of entropy are presented in [12-25]. It is known that there are many possibilities for defining operations with fuzzy sets; an overview can be found in [26]. While our approach is based on Zadeh's connectives [5], the authors of the cited papers used other connectives to define the fuzzy set operations.
In classical information theory [27], the mutual information is a special case of a more general quantity called the Kullback-Leibler divergence, which was originally introduced by Kullback and Leibler in 1951 [28] as the divergence between two probability distributions. It is discussed in Kullback's historic text [29]. The Kullback-Leibler divergence is also known under other names, such as K-L divergence, relative entropy, or information gain. It plays an important role, as a mathematical tool, in the stability analysis of master equations [30] and Fokker-Planck equations [31], and in isothermal equilibrium fluctuations and transient nonequilibrium deviations [32] (see also [31,33]).
The main aim of this contribution is to define, using our previous results on this issue, the notions of Kullback-Leibler divergence and conditional mutual information in fuzzy probability spaces and to study the properties of the suggested measures. The paper is organized as follows. In the next section, we give the basic definitions and some known facts used in this paper. Our results are presented in Sections 3 and 4. In Section 3 we extend our study concerning the mutual information of fuzzy partitions. The notion of conditional mutual information of fuzzy partitions is introduced and subsequently chain rules for mutual information of fuzzy partitions are established. We also derive some further properties of this measure, e.g., a data processing inequality. In Section 4 we define the Kullback-Leibler divergence and the conditional Kullback-Leibler divergence in fuzzy probability spaces. The basic properties of these measures are proved. Our results are summarized in Section 5.

Basic Definitions and Facts
We start by recalling some definitions and known results that will be used in this contribution.
A fuzzy measurable space (Dvurečenskij [34]) is a couple (Ω, M), where Ω is a non-empty set and M is a fuzzy σ-algebra of fuzzy subsets of Ω, i.e., M ⊂ [0, 1]^Ω contains 1_Ω, excludes (1/2)_Ω, and is closed under the operation ⊥ : f → 1_Ω − f (i.e., if f ∈ M, then f⊥ := 1_Ω − f ∈ M) and under countable suprema (i.e., if f_n ∈ M, n = 1, 2, . . ., then ∪_{n=1}^∞ f_n := sup_n f_n ∈ M). A fuzzy probability space (Piasecki [4]) is a triplet (Ω, M, µ), where (Ω, M) is a fuzzy measurable space and the mapping µ : M → [0, ∞) satisfies the following two conditions:

(i) µ(f ∪ f⊥) = 1 for every f ∈ M;
(ii) if {f_n}_{n=1}^∞ is a sequence of pairwise W-separated fuzzy subsets from M (i.e., f_i ≤ f_j⊥ whenever i ≠ j), then µ(∪_{n=1}^∞ f_n) = ∑_{n=1}^∞ µ(f_n).

Here the symbols ∪_{n=1}^∞ f_n and ∩_{n=1}^∞ f_n denote the fuzzy union and the fuzzy intersection of a sequence {f_n}_{n=1}^∞ ⊂ M, respectively, in the sense of Zadeh [5]. The complement of a fuzzy subset f of Ω is the fuzzy set f⊥ defined by f⊥(ω) = 1 − f(ω), ω ∈ Ω. The following notions were defined by Piasecki in [35]. A fuzzy set f ∈ M such that f ≥ f⊥ is called a W-universum; a fuzzy set f ∈ M such that f ≤ f⊥ is called a W-empty fuzzy set. It can be proved that a fuzzy event f ∈ M is a W-universum if and only if there exists a fuzzy event g ∈ M such that f = g ∪ g⊥. A W-universum is interpreted as a certain event and a W-empty set as an impossible event; W-separated fuzzy events are interpreted as mutually exclusive events. Each mapping µ : M → [0, ∞) having the properties (i) and (ii) is called, in the terminology of Piasecki, a fuzzy P-measure. Any fuzzy P-measure has properties analogous to those of a classical probability measure; the proofs and more details can be found in [4]. The monotonicity of a fuzzy P-measure µ implies that this measure maps M into the interval [0, 1]. In the following, we will use the following property of a fuzzy P-measure µ.
Let g ∈ M such that µ(g) > 0. Then the mapping µ(•/g) : M → [0, 1] defined by the formula

µ(f/g) = µ(f ∩ g)/µ(g), f ∈ M,

is a fuzzy P-measure on M; it is called a conditional probability.
Recall that by a fuzzy partition of (Ω, M, µ) we mean a finite collection ξ = {f_1, . . ., f_n} of pairwise W-separated fuzzy events from M such that µ(∪_{i=1}^n f_i) = 1. Every fuzzy partition ξ = {f_1, . . ., f_n} of (Ω, M, µ) represents, in the sense of the classical probability theory, a random experiment with a finite number of outcomes f_i, i = 1, 2, . . ., n (which are fuzzy events) with a probability distribution p_i = µ(f_i), i = 1, 2, . . ., n, since p_i ≥ 0 for i = 1, 2, . . ., n, and ∑_{i=1}^n p_i = 1. For that reason, we have defined in [3] the entropy of ξ = {f_1, . . ., f_n} by Shannon's formula:

H_µ(ξ) = ∑_{i=1}^n F(p_i),

where p_i = µ(f_i), i = 1, 2, . . ., n, and F is the function defined in the Introduction. In accordance with the classical theory, the log is to the base 2 and entropy is expressed in bits.
A conditional entropy of η = {g_1, . . ., g_m} assuming a realization of the experiment ξ = {f_1, . . ., f_n} is defined by the formula:

H_µ(η/ξ) = −∑_{i=1}^n ∑_{j=1}^m µ(f_i ∩ g_j) log ( µ(f_i ∩ g_j) / µ(f_i) ).    (2)

In the following, we will use the convention (based on continuity arguments) that x log (x/0) = ∞ if x > 0, and 0 log (0/x) = 0 if x ≥ 0. It is easy to see that we can rewrite Equation (2) in the following way:

H_µ(η/ξ) = ∑_{i=1}^n µ(f_i) ∑_{j=1}^m F(µ(g_j/f_i)).

As in [3], we define in the set of all fuzzy partitions of a fuzzy probability space (Ω, M, µ) the relation ≺: Let ξ, η be two fuzzy partitions of a fuzzy probability space (Ω, M, µ). Then we write ξ ≺ η (and we say that the fuzzy partition η is a refinement of the fuzzy partition ξ) iff for every g ∈ η there exists f ∈ ξ such that g ≤ f. Given two fuzzy partitions ξ = {f_1, . . ., f_n} and η = {g_1, . . ., g_m} of a fuzzy probability space (Ω, M, µ), their common refinement ξ ∨ η is defined as the system ξ ∨ η = {f_i ∩ g_j; i = 1, . . ., n, j = 1, . . ., m}. The fuzzy partition ξ ∨ η represents the joint experiment of the experiments ξ, η. Evidently, ξ ≺ ξ ∨ η and η ≺ ξ ∨ η. If ξ_1, ξ_2, . . ., ξ_n are fuzzy partitions of a fuzzy probability space (Ω, M, µ), then we put ∨_{i=1}^n ξ_i = ξ_1 ∨ ξ_2 ∨ . . . ∨ ξ_n.

Example 1. It is easy to verify that the mapping µ : M → [0, 1] defined by the equalities µ(. . .), where p ∈ (0, 1), is a fuzzy P-measure and the system (Ω, M, µ) is a fuzzy probability space. This makes sense, because the partitions η and ς represent experiments whose results are certain events. In particular, if p = 1/2, then H_µ(ξ) = log 2 = 1 bit.
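Once the values µ(f_i ∩ g_j) of a joint experiment are fixed, the entropy and conditional entropy above reduce to finite sums, and the classical additivity identity H_µ(ξ ∨ η) = H_µ(ξ) + H_µ(η/ξ), which carries over to fuzzy partitions, can be verified numerically. A minimal sketch; the joint table below is hypothetical, chosen only so that its entries sum to 1:

```python
import math

def F(x):
    """F(x) = -x * log2(x) for x > 0, F(0) = 0, as in the text."""
    return -x * math.log2(x) if x > 0 else 0.0

# Hypothetical joint probabilities mu(f_i ∩ g_j) for ξ = {f_1, f_2}, η = {g_1, g_2, g_3}
joint = [[0.10, 0.20, 0.10],
         [0.25, 0.05, 0.30]]

H_joint = sum(F(p) for row in joint for p in row)   # H_mu(ξ ∨ η)
p_xi = [sum(row) for row in joint]                  # marginals mu(f_i)
H_xi = sum(F(p) for p in p_xi)                      # H_mu(ξ)

# Conditional entropy, Equation (2): -sum_ij mu(f_i ∩ g_j) log2( mu(f_i ∩ g_j)/mu(f_i) )
H_eta_given_xi = -sum(joint[i][j] * math.log2(joint[i][j] / p_xi[i])
                      for i in range(2) for j in range(3) if joint[i][j] > 0)

# Additivity over the common refinement ξ ∨ η
assert abs(H_joint - (H_xi + H_eta_given_xi)) < 1e-9
```

The assertion holds for any joint table with positive entries summing to 1, since the identity is an algebraic consequence of Equation (2).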
The entropy and the conditional entropy of fuzzy partitions of a fuzzy probability space (Ω, M, µ) satisfy all properties analogous to the properties of Shannon's entropy of measurable partitions in the classical case; the proofs can be found in [3,6], respectively. We present some of them. If ξ, η, ς are fuzzy partitions of a fuzzy probability space (Ω, M, µ), then:

Definition 3 [6]. Let ξ, η be two fuzzy partitions of a given fuzzy probability space (Ω, M, µ). The mutual information of ξ and η is defined by the formula:

I_µ(ξ, η) = ∑_{i=1}^n ∑_{j=1}^m µ(f_i ∩ g_j) log ( µ(f_i ∩ g_j) / (µ(f_i) · µ(g_j)) ).

As a simple consequence of (2.8) we have: I_µ(ξ, η) = H_µ(ξ) − H_µ(ξ/η). It is evident that I_µ(ξ, ξ) = H_µ(ξ), and I_µ(ξ, η) = I_µ(η, ξ). Hence, we can write: I_µ(ξ, η) = H_µ(ξ) − H_µ(ξ/η) = H_µ(η) − H_µ(η/ξ). The proofs of the following two theorems can be found in [6].
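The two expressions for the mutual information, the double sum of Definition 3 and the entropy difference H_µ(ξ) − H_µ(ξ/η), can be checked against each other numerically. A sketch with a hypothetical joint table µ(f_i ∩ g_j):

```python
import math

# Hypothetical joint probabilities mu(f_i ∩ g_j); rows index ξ, columns index η
joint = [[0.10, 0.20, 0.10],
         [0.25, 0.05, 0.30]]
p = [sum(row) for row in joint]            # marginals mu(f_i)
q = [sum(col) for col in zip(*joint)]      # marginals mu(g_j)

# Definition 3: I(ξ, η) = sum_ij mu(f_i ∩ g_j) log2( mu(f_i ∩ g_j) / (mu(f_i) mu(g_j)) )
I = sum(joint[i][j] * math.log2(joint[i][j] / (p[i] * q[j]))
        for i in range(len(p)) for j in range(len(q)) if joint[i][j] > 0)

# Equivalent form: I(ξ, η) = H(ξ) − H(ξ/η)
H_xi = -sum(x * math.log2(x) for x in p if x > 0)
H_xi_given_eta = -sum(joint[i][j] * math.log2(joint[i][j] / q[j])
                      for i in range(len(p)) for j in range(len(q)) if joint[i][j] > 0)

assert abs(I - (H_xi - H_xi_given_eta)) < 1e-9
assert I >= 0   # non-negativity of mutual information
```

By symmetry of the double sum in i and j, the same value also equals H_µ(η) − H_µ(η/ξ).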

Mutual Information and Conditional Mutual Information in Fuzzy Probability Spaces
In this section, by using our previous results, we introduce the notion of conditional mutual information in fuzzy probability spaces. We derive chain rules for mutual information of fuzzy partitions and we prove some further properties concerning these measures.

Definition 4. Let ξ, η, ς be fuzzy partitions of a fuzzy probability space (Ω, M, µ). Then the conditional mutual information of ξ and η given ς is defined by the formula:

I_µ(ξ, η/ς) = H_µ(ξ/ς) − H_µ(ξ/η ∨ ς).

Remark 1. Since, according to (2.5), we have H_µ(ξ/ς) ≥ H_µ(ξ/η ∨ ς), the inequality I_µ(ξ, η/ς) ≥ 0 holds for the conditional mutual information.
Theorem 3. For fuzzy partitions ξ, η, ς of a fuzzy probability space (Ω, M, µ), it holds:

I_µ(ξ, η ∨ ς) = I_µ(ξ, ς) + I_µ(ξ, η/ς) = I_µ(ξ, η) + I_µ(ξ, ς/η).

Proof. By simple calculation we obtain:

I_µ(ξ, ς) + I_µ(ξ, η/ς) = H_µ(ξ) − H_µ(ξ/ς) + H_µ(ξ/ς) − H_µ(ξ/η ∨ ς) = H_µ(ξ) − H_µ(ξ/η ∨ ς) = I_µ(ξ, η ∨ ς).

Analogously we obtain the second equality.
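Since all quantities in the chain rule are expressible through entropies of common refinements, the first equality of Theorem 3 can be verified numerically. The sketch below uses a hypothetical joint table µ(f_i ∩ g_j ∩ h_k) over three two-element partitions and computes every term from marginal entropies:

```python
import math
from itertools import product

# Hypothetical joint probabilities mu(f_i ∩ g_j ∩ h_k) over ξ × η × ς (2×2×2, sum = 1)
vals = [0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.05, 0.30]
joint = dict(zip(product(range(2), repeat=3), vals))

def H(axes):
    """Entropy of the marginal distribution over the given axes (0 = ξ, 1 = η, 2 = ς)."""
    marg = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        marg[sub] = marg.get(sub, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def I(a, b):
    """I(a, b) = H(a) + H(b) − H(a ∨ b)."""
    return H(a) + H(b) - H(a + b)

def I_cond(a, b, c):
    """Definition 4: I(a, b / c) = H(a/c) − H(a/b ∨ c) = H(a∨c) − H(c) − H(a∨b∨c) + H(b∨c)."""
    return H(a + c) - H(c) - H(a + b + c) + H(b + c)

# Chain rule (Theorem 3): I(ξ, η ∨ ς) = I(ξ, ς) + I(ξ, η/ς)
lhs = I([0], [1, 2])
rhs = I([0], [2]) + I_cond([0], [1], [2])
assert abs(lhs - rhs) < 1e-9
```

The assertion mirrors the telescoping argument in the proof: the terms H_µ(ξ/ς) cancel when the two summands are added.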

Kullback-Leibler Divergence with Respect to Fuzzy P-Measures
In this part we define the Kullback-Leibler divergence in fuzzy probability spaces and its conditional version. We prove the basic properties of these measures, in particular, Gibbs' inequality. Finally, using the concept of conditional Kullback-Leibler divergence, we establish a chain rule for Kullback-Leibler divergence with respect to fuzzy P-measures. In the proofs we use the following known log-sum inequality: for non-negative real numbers a_1, a_2, . . ., a_n, b_1, b_2, . . ., b_n, it holds:

∑_{i=1}^n a_i log (a_i / b_i) ≥ (∑_{i=1}^n a_i) · log ( (∑_{i=1}^n a_i) / (∑_{i=1}^n b_i) ),    (8)

with the equality if and only if a_i / b_i is constant. Recall that we use the convention that x log (x/0) = ∞ if x > 0, and 0 log (0/x) = 0 if x ≥ 0.

Definition 6. Let µ, ν be fuzzy P-measures on a common fuzzy measurable space (Ω, M). Then, for any fuzzy partition ξ = {f_1, . . ., f_n} of the fuzzy probability spaces (Ω, M, µ), (Ω, M, ν), we define the Kullback-Leibler divergence D_ξ(µ‖ν) by:

D_ξ(µ‖ν) = ∑_{i=1}^n µ(f_i) log ( µ(f_i) / ν(f_i) ).

Remark 2. The Kullback-Leibler divergence is not a metric in the true sense, since it is not symmetric, i.e., the equality D_ξ(µ‖ν) = D_ξ(ν‖µ) is not necessarily true (as shown in the example that follows), and it does not satisfy the triangle inequality.
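The divergence of Definition 6 and the log-sum inequality (8) are both elementary to evaluate for concrete probability vectors. A minimal sketch (the vectors a, b below are hypothetical values of µ(f_i) and ν(f_i)):

```python
import math

def kl(mu, nu):
    """D_ξ(µ‖ν) = sum_i µ(f_i) log2( µ(f_i)/ν(f_i) ), with the convention 0·log(0/x) = 0."""
    return sum(m * math.log2(m / n) for m, n in zip(mu, nu) if m > 0)

a = [0.2, 0.5, 0.3]   # hypothetical values µ(f_i)
b = [0.4, 0.4, 0.2]   # hypothetical values ν(f_i)

# Log-sum inequality (8): sum a_i log(a_i/b_i) >= (sum a_i) log( sum a_i / sum b_i )
lhs = sum(x * math.log2(x / y) for x, y in zip(a, b))
rhs = sum(a) * math.log2(sum(a) / sum(b))
assert lhs >= rhs - 1e-9

# For probability vectors the right-hand side is 1·log2(1) = 0, i.e., Gibbs' inequality
assert kl(a, b) >= 0
```

When both vectors sum to 1, inequality (8) specializes exactly to the non-negativity of D_ξ(µ‖ν), which is how it is used in the proofs below.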
Example 2. Consider the fuzzy measurable space (Ω, M) from Example 1 and the following two fuzzy P-measures µ, ν defined on M: µ is defined as in Example 1 and ν is defined in a similar way, with a parameter q ∈ (0, 1). Then, for the fuzzy partition ξ = {f, f⊥}, we obtain:

D_ξ(µ‖ν) = p log (p/q) + (1 − p) log ((1 − p)/(1 − q)),
D_ξ(ν‖µ) = q log (q/p) + (1 − q) log ((1 − q)/(1 − p)).

This means that D_ξ(µ‖ν) ≠ D_ξ(ν‖µ), in general.
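The asymmetry claimed in Remark 2 is easy to exhibit numerically for a two-element partition of the form ξ = {f, f⊥}. A sketch with hypothetical parameter values p and q in the role of µ(f) and ν(f):

```python
import math

def kl(mu, nu):
    """Kullback-Leibler divergence of two finite probability vectors, base-2 logarithm."""
    return sum(m * math.log2(m / n) for m, n in zip(mu, nu) if m > 0)

# Hypothetical parameters: µ(f) = p, ν(f) = q, so µ(f⊥) = 1 − p, ν(f⊥) = 1 − q
p, q = 0.3, 0.6
d_mu_nu = kl([p, 1 - p], [q, 1 - q])   # D_ξ(µ‖ν)
d_nu_mu = kl([q, 1 - q], [p, 1 - p])   # D_ξ(ν‖µ)

# Both divergences are positive, yet they differ: the K-L divergence is not symmetric
assert d_mu_nu > 0 and d_nu_mu > 0
assert abs(d_mu_nu - d_nu_mu) > 1e-6
```

For p = q the two divergences coincide (both are 0), so asymmetry is a statement about general, not all, pairs of measures.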
Theorem 7 (Gibbs' inequality). Let µ, ν be fuzzy P-measures on a common fuzzy measurable space (Ω, M). Then, for any fuzzy partition ξ = {f_1, . . ., f_n} of the fuzzy probability spaces (Ω, M, µ), (Ω, M, ν), it holds D_ξ(µ‖ν) ≥ 0, with the equality if and only if µ(f_i) = ν(f_i), for i = 1, 2, . . ., n.

Proof. Putting a_i = µ(f_i) and b_i = ν(f_i), for i = 1, 2, . . ., n, in the log-sum inequality, we obtain ∑_{i=1}^n a_i = ∑_{i=1}^n µ(f_i) = 1; analogously we obtain ∑_{i=1}^n b_i = 1. Therefore, by Equation (8):

D_ξ(µ‖ν) = ∑_{i=1}^n a_i log (a_i / b_i) ≥ (∑_{i=1}^n a_i) · log ( (∑_{i=1}^n a_i) / (∑_{i=1}^n b_i) ) = 1 · log 1 = 0,

with the equality if and only if a_i = α · b_i, i = 1, 2, . . ., n, where α is a constant. Taking the sum over all i = 1, 2, . . ., n, we obtain α = 1. This means that D_ξ(µ‖ν) = 0 if and only if µ(f_i) = ν(f_i), for i = 1, 2, . . ., n.

Theorem 8. Let µ, ν be fuzzy P-measures on a common fuzzy measurable space (Ω, M). Then, for any fuzzy partition ξ = {f_1, . . ., f_n} of the fuzzy probability spaces (Ω, M, µ), (Ω, M, ν), it holds:

H_µ(ξ) = log n − D_ξ(µ‖ν),

where n is the number of members of ξ, and ν is the uniform probability distribution over ξ, i.e., ν(f_i) = 1/n, for i = 1, 2, . . ., n.

Proof. Calculate:

log n − D_ξ(µ‖ν) = log n − ∑_{i=1}^n µ(f_i) log (n · µ(f_i)) = log n − log n · ∑_{i=1}^n µ(f_i) + ∑_{i=1}^n F(µ(f_i)) = H_µ(ξ).

As a consequence of the previous two theorems, we obtain the following property of the entropy of fuzzy partitions:

Corollary 1. For any fuzzy partition ξ of a fuzzy probability space (Ω, M, µ), it holds H_µ(ξ) ≤ log n, where n denotes the number of members of ξ, with the equality if and only if µ is the uniform probability distribution over ξ.
Proof. Let ν be the uniform probability distribution over ξ = {f_1, . . ., f_n}. Then, according to the previous theorem and Gibbs' inequality (Theorem 7), we have:

H_µ(ξ) = log n − D_ξ(µ‖ν) ≤ log n,

which implies the inequality H_µ(ξ) ≤ log n, where n is the number of members of ξ. Since, by Theorem 7, D_ξ(µ‖ν) = 0 if and only if µ(f_i) = ν(f_i), for i = 1, 2, . . ., n, the equality H_µ(ξ) = log n holds if and only if µ is the uniform probability distribution over ξ.
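The identity of Theorem 8 and the bound of Corollary 1 can be verified directly. A sketch with a hypothetical four-element distribution in the role of (µ(f_1), . . ., µ(f_n)):

```python
import math

def H(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl(mu, nu):
    """Kullback-Leibler divergence, base-2 logarithm."""
    return sum(m * math.log2(m / n) for m, n in zip(mu, nu) if m > 0)

mu = [0.1, 0.2, 0.3, 0.4]       # hypothetical values µ(f_i)
n = len(mu)
uniform = [1 / n] * n           # ν(f_i) = 1/n, the uniform distribution over ξ

# Theorem 8: H_µ(ξ) = log n − D_ξ(µ‖ν) for uniform ν
assert abs(H(mu) - (math.log2(n) - kl(mu, uniform))) < 1e-9
# Corollary 1: H_µ(ξ) ≤ log n, with equality exactly at the uniform distribution
assert H(mu) <= math.log2(n)
assert abs(H(uniform) - math.log2(n)) < 1e-9
```

The gap log n − H_µ(ξ) is exactly the divergence from the uniform distribution, which quantifies how far the experiment ξ is from maximal uncertainty.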
In the following, the concavity of the entropy H_µ(ξ) as a function of µ and the convexity of the K-L divergence with respect to fuzzy P-measures are shown. We recall, for the convenience of the reader, the definitions of convex and concave functions: A real-valued function f is said to be convex over an interval [a, b] if, for every x_1, x_2 ∈ [a, b] and for any real number α ∈ [0, 1],

f(α x_1 + (1 − α) x_2) ≤ α f(x_1) + (1 − α) f(x_2).

A real-valued function f is said to be concave over an interval [a, b] if, for every x_1, x_2 ∈ [a, b] and for any real number α ∈ [0, 1],

f(α x_1 + (1 − α) x_2) ≥ α f(x_1) + (1 − α) f(x_2).

Remark 3. In the proofs of some of the following assertions, these known properties are used: (i) a function f is concave over an interval if and only if the function −f is convex over the interval; (ii) the sum of two concave functions is itself concave, and the sum of two convex functions is itself convex; (iii) every real-valued affine function, i.e., each function of the form f(x) = ax + b, a, b ∈ ℝ, is simultaneously convex and concave.
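The classical K-L divergence is jointly convex in the pair of distributions, which is the finite-dimensional fact mirrored by the convexity results of this section. A numerical sketch (the pairs (µ_1, ν_1) and (µ_2, ν_2) below are hypothetical two-point distributions):

```python
import math

def kl(mu, nu):
    """Kullback-Leibler divergence of two finite probability vectors, base-2 logarithm."""
    return sum(m * math.log2(m / n) for m, n in zip(mu, nu) if m > 0)

mu1, nu1 = [0.2, 0.8], [0.5, 0.5]   # hypothetical pair of distributions
mu2, nu2 = [0.7, 0.3], [0.4, 0.6]   # another hypothetical pair

for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    mix_mu = [alpha * a + (1 - alpha) * b for a, b in zip(mu1, mu2)]
    mix_nu = [alpha * a + (1 - alpha) * b for a, b in zip(nu1, nu2)]
    # Joint convexity: D(mix_mu ‖ mix_nu) ≤ α D(µ1‖ν1) + (1 − α) D(µ2‖ν2)
    assert kl(mix_mu, mix_nu) <= alpha * kl(mu1, nu1) + (1 - alpha) * kl(mu2, nu2) + 1e-9
```

Joint convexity follows from the log-sum inequality (8) applied term by term, which is also the mechanism behind the fuzzy-case proofs.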
It is easy to prove the following proposition.
Proof. Let us prove the first assertion. By Equations (6) and (2) we have:

I_µ(ξ, η) = H_µ(η) − H_µ(η/ξ).    (9)

Since the fuzzy sets f_k ∩ g_j and f_l ∩ g_j are W-separated whenever k ≠ l, the system f_1 ∩ g_j, . . ., f_n ∩ g_j is a system of pairwise W-separated fuzzy subsets from M. Due to the assumption µ(∪_{i=1}^n f_i) = 1, the property (2.1), and the additivity of the fuzzy P-measure, we get:

µ(g_j) = ∑_{i=1}^n µ(f_i ∩ g_j) = ∑_{i=1}^n µ(g_j/f_i) · µ(f_i).

This means that, for fixed µ(g_j/f_i), µ(g_j) is a linear function of µ(f_i). Therefore, H_µ(η), which is a concave function of µ(g_j), is a concave function of µ(f_i). The second term of the difference in Equation (9) is a linear function of µ(f_i). Hence (see Remark 3), the difference in Equation (9) is a concave function of µ(f_i). Thus, the first part of the theorem is proved.

Now, let us prove the second part. We fix µ over ξ and consider two different conditional fuzzy P-measures µ_1(g_j/f_i) and µ_2(g_j/f_i), defined for f_i ∈ ξ with µ(f_i) > 0, and j = 1, 2, . . ., m. According to Proposition 1 we can define, for every real number α ∈ [0, 1], the following conditional fuzzy P-measure:

µ_α(g_j/f_i) = α · µ_1(g_j/f_i) + (1 − α) · µ_2(g_j/f_i), i = 1, . . ., n, j = 1, 2, . . ., m.

Therefore, if we put ν_α(f_i ∩ g_j) = µ(f_i) · µ_α(g_j), we obtain:

According to Theorems 1 and 10 we can write:

Finally, we define a conditional version of the Kullback-Leibler divergence and, subsequently, we prove the chain rule for the K-L divergence.

Definition 7. Let ξ = {f_1, . . ., f_n}, η = {g_1, . . ., g_m} be two fuzzy partitions of the fuzzy probability spaces (Ω, M, µ), (Ω, M, ν). Then we define the conditional Kullback-Leibler divergence D_{η/ξ}(µ‖ν) by:

D_{η/ξ}(µ‖ν) = ∑_{i=1}^n ∑_{j=1}^m µ(f_i ∩ g_j) log ( µ(g_j/f_i) / ν(g_j/f_i) ).
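With Definition 7 in hand, the chain rule for the K-L divergence over a common refinement, D_{ξ∨η}(µ‖ν) = D_ξ(µ‖ν) + D_{η/ξ}(µ‖ν), can be checked numerically. A sketch; the joint values µ(f_i ∩ g_j) and ν(f_i ∩ g_j) below are hypothetical tables whose rows sum to the respective marginals:

```python
import math

# Hypothetical joint values µ(f_i ∩ g_j) and ν(f_i ∩ g_j) over ξ × η (2×2, each sums to 1)
mu = [[0.10, 0.30], [0.40, 0.20]]
nu = [[0.25, 0.25], [0.15, 0.35]]
mu_f = [sum(r) for r in mu]   # µ(f_i)
nu_f = [sum(r) for r in nu]   # ν(f_i)

# D_{ξ∨η}(µ‖ν): divergence over the common refinement
D_joint = sum(mu[i][j] * math.log2(mu[i][j] / nu[i][j])
              for i in range(2) for j in range(2))
# D_ξ(µ‖ν): divergence of the marginals over ξ (Definition 6)
D_xi = sum(m * math.log2(m / n) for m, n in zip(mu_f, nu_f))
# D_{η/ξ}(µ‖ν): conditional divergence (Definition 7),
# with µ(g_j/f_i) = µ(f_i ∩ g_j)/µ(f_i) and similarly for ν
D_cond = sum(mu[i][j] * math.log2((mu[i][j] / mu_f[i]) / (nu[i][j] / nu_f[i]))
             for i in range(2) for j in range(2))

# Chain rule: the joint divergence splits into a marginal and a conditional part
assert abs(D_joint - (D_xi + D_cond)) < 1e-9
```

The identity is purely algebraic: the logarithm of a product of ratios splits into the sum of the two logarithmic terms.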

Discussion
In this paper we extend our study concerning the mutual information of fuzzy partitions in fuzzy probability spaces. In Section 3, using our previous results, the notion of conditional mutual information of fuzzy partitions is defined and chain rules for mutual information of fuzzy partitions are established. Subsequently, using the notion of conditional mutual information of fuzzy partitions, we have defined the notion of conditional independence of fuzzy partitions and we have proved, inter alia, a data processing inequality for the studied situation. In Section 4 the notion of Kullback-Leibler divergence for fuzzy P-measures is introduced and the basic properties of this measure are shown. In particular, the convexity of Kullback-Leibler divergence with respect to fuzzy P-measures is proved and the convexity of mutual information of fuzzy partitions is studied. Finally, a conditional version of the Kullback-Leibler divergence is defined and chain rules for Kullback-Leibler divergence with respect to fuzzy P-measures are established.
As noted previously, logical versions of some results concerning the entropy and mutual information of fuzzy partitions presented in Sections 2 and 3 are given in [10]. In [9] Ellerman studied, in addition to the logical entropy and logical mutual information, the concept of logical Kullback-Leibler divergence. The aim of our next study will be to provide a logical version of the Kullback-Leibler divergence for fuzzy probability measures.