
# Kullback-Leibler Divergence and Mutual Information of Experiments in the Fuzzy Case

by
Dagmar Markechová
Department of Mathematics, Faculty of Natural Sciences, Constantine the Philosopher University in Nitra, A. Hlinku 1, SK-949 01 Nitra, Slovakia
Submission received: 30 January 2017 / Accepted: 1 March 2017 / Published: 3 March 2017

## Abstract

The main aim of this contribution is to define the notions of Kullback-Leibler divergence and conditional mutual information in fuzzy probability spaces and to derive the basic properties of the suggested measures. In particular, chain rules for mutual information of fuzzy partitions and for Kullback-Leibler divergence with respect to fuzzy P-measures are established. In addition, a convexity of Kullback-Leibler divergence and mutual information with respect to fuzzy P-measures is studied.

## 1. Introduction

The notion of a $\sigma$-algebra $S$ of random events and the concept of a probability space $(\Omega, S, P)$ are the basis of the classical Kolmogorov probability theory [1]. A probability $P$ is a normalized measure defined on the $\sigma$-algebra $S$. An event in classical probability theory is understood as an exactly and clearly defined phenomenon and, from a mathematical point of view, it is a classical, ordinary set. Consider a finite measurable partition $A$ of $\Omega$ with probabilities $p_1, \ldots, p_n$ of the corresponding elements of $A$. We recall that the Shannon entropy [2] of $A$ is the number $H(A) = -\sum_{i=1}^{n} F(p_i)$, where the function $F$ is defined by $F(x) = x \log x$ if $x > 0$ and $F(0) = 0$. In [3], we generalized this notion to situations where the considered probability space is a fuzzy probability space $(\Omega, M, \mu)$ as defined by Piasecki [4]. Instead of a probability $P$, a fuzzy P-measure $\mu$ defined on a fuzzy $\sigma$-algebra $M$ of fuzzy subsets of $\Omega$ is considered. Recall that by a fuzzy subset $f$ of a non-empty set $\Omega$ we mean a mapping $f: \Omega \to [0, 1]$ (Zadeh [5]). A fuzzy subset from the fuzzy $\sigma$-algebra $M$ is interpreted as a fuzzy event; the value $\mu(f)$ is interpreted as the probability of the fuzzy event $f$. The structure $(\Omega, M, \mu)$ can serve as an alternative mathematical model of probability theory for the case where the observed events are described vaguely (so-called fuzzy events). In [6], the mutual information of fuzzy partitions of a given fuzzy probability space was defined. It was shown that the entropy of fuzzy partitions introduced and studied by the author in [3] (see also [7]) can be considered a special case of their mutual information. The proposed measures are fuzzy analogues of the corresponding notions of the classical theory, and they can be used whenever it is necessary to know the quantity of information received by the realization of experiments whose results are fuzzy events.
Note that in [3] (see also [8]), using the concept of entropy of fuzzy partitions, we defined the entropy of a fuzzy dynamical system $(\Omega, M, \mu, U)$ (where $(\Omega, M, \mu)$ is a fuzzy probability space and $U: M \to M$ is a $\mu$-preserving $\sigma$-homomorphism). Analogues of our results for the case of logical entropy (see, e.g., [9]) are provided in our recently published papers [10,11]. Recall that the logical entropy of a probability distribution $P = (p_1, \ldots, p_n) \in \Re^n$ is defined as the number $h(P) = \sum_{i=1}^{n} p_i (1 - p_i)$. In [9] the author deals with historical aspects of the logical entropy formula $h(P)$ and investigates the relationship between logical entropy and Shannon's entropy. It should be noted that other fuzzy analogues of entropy are presented in [12,13,14,15,16,17,18,19,20,21,22,23,24,25]. It is known that there are many possibilities for defining operations with fuzzy sets; an overview can be found in [26]. While our approach is based on Zadeh's connectives [5], the authors of the cited papers used other connectives to define the fuzzy set operations.
In classical information theory [27] the mutual information is a special case of a more general quantity called the Kullback-Leibler divergence, which was originally introduced by Kullback and Leibler in 1951 [28] as the divergence between two probability distributions. It is discussed in Kullback's historic text [29]. The Kullback-Leibler divergence is also known by other names, such as K-L divergence, relative entropy, or information gain. It plays an important role, as a mathematical tool, in the stability analysis of master equations [30] and Fokker-Planck equations [31], and in isothermal equilibrium fluctuations and transient nonequilibrium deviations [32] (see also [31,33]).
The main aim of this contribution is to define, using our previous results on this issue, the notion of Kullback-Leibler divergence and conditional mutual information in fuzzy probability spaces and to study properties of the suggested measures. The paper is organized as follows. In the next section, we give the basic definitions and some known facts used in this paper. Our results are presented in Section 3 and Section 4. In Section 3 we extend our study concerning the mutual information of fuzzy partitions. The notion of conditional mutual information of fuzzy partitions is introduced and subsequently chain rules for mutual information of fuzzy partitions are established. We also derive some more properties of this measure, e.g., data processing inequality. In Section 4 we define the Kullback-Leibler divergence and the conditional Kullback-Leibler divergence in fuzzy probability spaces. The basic properties of these measures are proved. Our results are summarized in Section 5.

## 2. Basic Definitions and Facts

We start by recalling some definitions and some known results which will be used in this contribution.
A fuzzy measurable space (Dvurečenskij [34]) is a couple $(\Omega, M)$, where $\Omega$ is a non-empty set and $M$ is a fuzzy $\sigma$-algebra of fuzzy subsets of $\Omega$, i.e., containing $1_\Omega$, excluding $(1/2)_\Omega$, closed under the operation $\perp$ (i.e., if $f \in M$, then $f^\perp := 1_\Omega - f \in M$) and under countable suprema (i.e., if $f_n \in M$, $n = 1, 2, \ldots$, then $\bigcup_{n=1}^{\infty} f_n := \sup_n f_n \in M$). A fuzzy probability space (Piasecki [4]) is a triplet $(\Omega, M, \mu)$, where $(\Omega, M)$ is a fuzzy measurable space and the mapping $\mu: M \to [0, \infty)$ satisfies the following two conditions: (i) $\mu(f \cup f^\perp) = 1$ for all $f \in M$; (ii) if $\{f_n\}_{n=1}^{\infty}$ is a sequence of pairwise W-separated fuzzy subsets from $M$ (i.e., $f_i \le f_j^\perp$ pointwise whenever $i \ne j$), then $\mu(\bigcup_{n=1}^{\infty} f_n) = \sum_{n=1}^{\infty} \mu(f_n)$.
The symbols $\bigcup_{n=1}^{\infty} f_n := \sup_n f_n$ and $\bigcap_{n=1}^{\infty} f_n := \inf_n f_n$ denote the fuzzy union and the fuzzy intersection of a sequence $\{f_n\}_{n=1}^{\infty} \subset M$, respectively, in the sense of Zadeh [5]. The complement of a fuzzy subset $f$ of $\Omega$ is the fuzzy set $f^\perp$ defined by $f^\perp(\omega) = 1 - f(\omega)$ for all $\omega \in \Omega$. The following notions were defined by Piasecki in [35]. A fuzzy set $f \in M$ such that $f \ge f^\perp$ is called a W-universum; a fuzzy set $f \in M$ such that $f \le f^\perp$ is called a W-empty fuzzy set. It can be proved that a fuzzy event $f \in M$ is a W-universum if and only if there exists a fuzzy event $g \in M$ such that $f = g \cup g^\perp$. A W-universum is interpreted as a certain event and a W-empty set as an impossible event. W-separated fuzzy events are interpreted as mutually exclusive events. Each mapping $\mu: M \to [0, \infty)$ having the properties (i) and (ii) is called, in the terminology of Piasecki, a fuzzy P-measure. Any fuzzy P-measure has all properties analogous to the properties of a classical probability measure; the proofs and more details can be found in [4]. The monotonicity of a fuzzy P-measure $\mu$ implies that this measure maps $M$ into the interval $[0, 1]$. In the following we will use the following property of a fuzzy P-measure $\mu$.
(2.1) Let $g \in M$. Then $\mu(f \cap g) = \mu(f)$ for all $f \in M$ if and only if $\mu(g) = 1$.
Let $g \in M$ be such that $\mu(g) > 0$. Then the mapping $\mu(\cdot / g): M \to [0, 1]$ defined by the formula
$\mu(f / g) = \frac{\mu(f \cap g)}{\mu(g)}$
is a fuzzy P-measure on $M$; it is called a conditional probability.
Definition 1
[36]. By a fuzzy partition (of a space $(\Omega, M, \mu)$) we understand a finite collection $\xi = \{f_1, \ldots, f_n\}$ of pairwise W-separated fuzzy subsets from $M$ such that $\mu(\bigcup_{i=1}^{n} f_i) = 1$.
Every fuzzy partition $\xi = \{f_1, \ldots, f_n\}$ of $(\Omega, M, \mu)$ represents, in the sense of classical probability theory, a random experiment with a finite number of outcomes (which are fuzzy events) with the probability distribution $p_i = \mu(f_i)$, $i = 1, 2, \ldots, n$, since $p_i \ge 0$ for $i = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} \mu(f_i) = \mu(\bigcup_{i=1}^{n} f_i) = 1$. For that reason, we defined in [3] the entropy of $\xi = \{f_1, \ldots, f_n\}$ by Shannon's formula:
$H_\mu(\xi) = -\sum_{i=1}^{n} F(\mu(f_i)),$
where:
$F(x) = \begin{cases} x \log x, & \text{if } x > 0; \\ 0, & \text{if } x = 0. \end{cases}$
In accordance with the classical theory, the log is to the base 2 and entropy is expressed in bits.
Definition 2
[6]. Let $\xi = \{f_1, \ldots, f_n\}$ and $\eta = \{g_1, \ldots, g_m\}$ be two fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. A conditional entropy of $\eta$ given a fuzzy event $f_i \in \xi$ is defined by:
$H_\mu(\eta / f_i) = -\sum_{j=1}^{m} F(\dot{\mu}(g_j / f_i)),$
where:
$\dot{\mu}(g_j / f_i) = \begin{cases} \frac{\mu(f_i \cap g_j)}{\mu(f_i)}, & \text{if } \mu(f_i) > 0; \\ 0, & \text{if } \mu(f_i) = 0. \end{cases}$
A conditional entropy of $\eta$ assuming a realization of the experiment $\xi$ is defined by the formula:
$H_\mu(\eta / \xi) = \sum_{i=1}^{n} \mu(f_i) \cdot H_\mu(\eta / f_i) = -\sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i) \cdot F(\dot{\mu}(g_j / f_i)).$ (2)
In the following, we will use the convention (based on continuity arguments) that $x \log \frac{x}{0} = \infty$ if $x > 0$, and $0 \log \frac{0}{x} = 0$ if $x \ge 0$. It is easy to see that we can rewrite Equation (2) in the following way:
$H_\mu(\eta / \xi) = -\sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i \cap g_j) \cdot \log \frac{\mu(f_i \cap g_j)}{\mu(f_i)}.$
As in [3], we define on the set of all fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$ the relation $\prec$: Let $\xi, \eta$ be two fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. Then we write $\xi \prec \eta$ (and we say that the fuzzy partition $\eta$ is a refinement of the fuzzy partition $\xi$) iff for every $g \in \eta$ there exists $f \in \xi$ such that $g \le f$. Given two fuzzy partitions $\xi = \{f_1, \ldots, f_n\}$ and $\eta = \{g_1, \ldots, g_m\}$ of a fuzzy probability space $(\Omega, M, \mu)$, their common refinement $\xi \vee \eta$ is defined as the system $\xi \vee \eta := \{f_i \cap g_j;\ i = 1, \ldots, n,\ j = 1, \ldots, m\}$. The fuzzy partition $\xi \vee \eta$ represents a joint experiment of the experiments $\xi, \eta$. Evidently, $\xi \prec \xi \vee \eta$ and $\eta \prec \xi \vee \eta$. If $\xi_1, \xi_2, \ldots, \xi_n$ are fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$, then we put $\vee_{i=1}^{n} \xi_i = \xi_1 \vee \xi_2 \vee \ldots \vee \xi_n$. Two fuzzy partitions $\xi = \{f_1, \ldots, f_n\}$ and $\eta = \{g_1, \ldots, g_m\}$ of a given fuzzy probability space $(\Omega, M, \mu)$ are called statistically independent if $\mu(f_i \cap g_j) = \mu(f_i) \cdot \mu(g_j)$ for $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$.
Example 1.
Let and let be defined by . Consider a fuzzy measurable space $(\Omega, M)$, where $M = \{f, f^\perp, f \cup f^\perp, f \cap f^\perp, 0_\Omega, 1_\Omega\}$. Then it is easy to verify that the mapping $\mu: M \to [0, 1]$ defined by the equalities $\mu(1_\Omega) = \mu(f \cup f^\perp) = 1$, $\mu(0_\Omega) = \mu(f \cap f^\perp) = 0$, $\mu(f) = p$, $\mu(f^\perp) = 1 - p$, where $p \in (0, 1)$, is a fuzzy P-measure and the system $(\Omega, M, \mu)$ is a fuzzy probability space. The sets $\xi = \{f, f^\perp\}$, $\eta = \{f \cup f^\perp\}$, $\varsigma = \{1_\Omega\}$ are fuzzy partitions of $(\Omega, M, \mu)$ such that $\varsigma \prec \eta \prec \xi$. We can calculate their entropy: $H_\mu(\xi) = -p \log p - (1 - p) \log (1 - p)$, $H_\mu(\eta) = H_\mu(\varsigma) = 0$. This makes sense, because the partitions $\eta$ and $\varsigma$ represent experiments whose results are certain events. In particular, if $p = \frac{1}{2}$, then $H_\mu(\xi) = \log 2 = 1$ bit.
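The calculation in Example 1 depends only on the values $\mu(f_i)$, so it can be sketched numerically. The following is a minimal illustration, not part of the original paper; the names `F` and `entropy` are assumptions introduced here, and a fuzzy partition is represented simply by the list of its probabilities.

```python
import math

def F(x):
    """Shannon's function F(x) = x * log2(x), with F(0) = 0 by convention."""
    return x * math.log2(x) if x > 0 else 0.0

def entropy(probs):
    """Entropy in bits, H = -sum F(p_i), where p_i stands for mu(f_i)."""
    return -sum(F(p) for p in probs)

p = 0.5                          # mu(f) in Example 1, chosen as 1/2
h_xi = entropy([p, 1 - p])       # xi = {f, f_perp}
h_eta = entropy([1.0])           # eta = {f union f_perp}, a certain event
print(h_xi, h_eta)
```

For $p = 1/2$ the first value is one bit, and a partition consisting of a single certain event carries no information.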
The entropy and the conditional entropy of fuzzy partitions of a fuzzy probability space satisfy all properties analogous to the properties of Shannon's entropy of measurable partitions in the classical case; the proofs can be found in [3,6], respectively. We present some of them. If $\xi, \eta, \varsigma$ are fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$, then:
(2.2) $\xi \prec \eta$ implies $H_\mu(\xi) \le H_\mu(\eta)$;
(2.3) $H_\mu(\eta \vee \varsigma / \xi) = H_\mu(\varsigma / \xi \vee \eta) + H_\mu(\eta / \xi)$;
(2.4) $\xi \prec \eta$ implies $H_\mu(\xi / \varsigma) \le H_\mu(\eta / \varsigma)$;
(2.5) $\xi \prec \eta$ implies $H_\mu(\varsigma / \xi) \ge H_\mu(\varsigma / \eta)$;
(2.6) $H_\mu(\xi / \eta) \le H_\mu(\xi)$, with equality if and only if $\xi, \eta$ are statistically independent;
(2.7) $H_\mu(\eta \vee \varsigma / \xi) \le H_\mu(\eta / \xi) + H_\mu(\varsigma / \xi)$;
(2.8) $H_\mu(\xi \vee \eta) = H_\mu(\xi) + H_\mu(\eta / \xi)$.
Definition 3
[6]. Let $\xi, \eta$ be two fuzzy partitions of a given fuzzy probability space $(\Omega, M, \mu)$. The mutual information of $\xi$ and $\eta$ is defined by the formula:
$I_\mu(\xi, \eta) = H_\mu(\xi) + H_\mu(\eta) - H_\mu(\xi \vee \eta).$
As a simple consequence of (2.8) we have:
$I_\mu(\xi, \eta) = H_\mu(\xi) - H_\mu(\xi / \eta).$
It is evident that $I_\mu(\xi, \xi) = H_\mu(\xi)$ and $I_\mu(\xi, \eta) = I_\mu(\eta, \xi)$. Hence, we can write:
$I_\mu(\xi, \eta) = H_\mu(\xi) - H_\mu(\xi / \eta) = H_\mu(\eta) - H_\mu(\eta / \xi).$
The proofs of the following two theorems can be found in [6].
Theorem 1
[6]. Let $\xi = \{f_1, \ldots, f_n\}$ and $\eta = \{g_1, \ldots, g_m\}$ be two fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. Then:
$I_\mu(\xi, \eta) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i \cap g_j) \log \frac{\mu(f_i \cap g_j)}{\mu(f_i) \cdot \mu(g_j)}.$
Theorem 2
[6]. $I_\mu(\xi, \eta) \ge 0$, with equality if and only if $\xi, \eta$ are statistically independent.
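The relationship between the definition of mutual information, the double-sum formula of Theorem 1, and the non-negativity in Theorem 2 can be checked numerically. The sketch below is only an illustration under stated assumptions: the table `joint` holds hypothetical values standing for $\mu(f_i \cap g_j)$, the marginals $\mu(f_i)$ and $\mu(g_j)$ are obtained as row and column sums, and the function names are introduced here.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero terms are skipped (F(0) = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """Double-sum formula; joint[i][j] stands for mu(f_i intersect g_j)."""
    row = [sum(r) for r in joint]            # mu(f_i)
    col = [sum(c) for c in zip(*joint)]      # mu(g_j)
    return sum(
        pij * math.log2(pij / (row[i] * col[j]))
        for i, r in enumerate(joint)
        for j, pij in enumerate(r)
        if pij > 0
    )

# Hypothetical joint values for two 2-element fuzzy partitions.
joint = [[0.30, 0.10],
         [0.20, 0.40]]
row = [sum(r) for r in joint]
col = [sum(c) for c in zip(*joint)]
cells = [p for r in joint for p in r]

i_formula = mutual_information(joint)
i_definition = entropy(row) + entropy(col) - entropy(cells)

# A statistically independent pair: mu(f_i ∩ g_j) = mu(f_i) * mu(g_j).
independent = [[a * b for b in (0.25, 0.75)] for a in (0.4, 0.6)]
i_independent = mutual_information(independent)
print(i_formula, i_definition, i_independent)
```

Both expressions agree up to rounding, and the independent table gives a value that is zero up to floating-point error.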

## 3. Mutual Information and Conditional Mutual Information in Fuzzy Probability Spaces

In this section by using our previous results we introduce the notion of conditional mutual information in fuzzy probability spaces. We derive chain rules for mutual information of fuzzy partitions and we will prove some more properties concerning these measures.
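Definition 4.
Let $\xi, \eta, \varsigma$ be fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. Then the conditional mutual information of $\xi$ and $\eta$ given $\varsigma$ is defined by the formula:
$I_\mu(\xi, \eta / \varsigma) = H_\mu(\xi / \varsigma) - H_\mu(\xi / \eta \vee \varsigma).$
Remark 1.
Since $\varsigma \prec \eta \vee \varsigma$, according to (2.5) we have $H_\mu(\xi / \varsigma) \ge H_\mu(\xi / \eta \vee \varsigma)$, so for the conditional mutual information the inequality $I_\mu(\xi, \eta / \varsigma) \ge 0$ holds.
Theorem 3.
For fuzzy partitions $\xi, \eta, \varsigma$ of a fuzzy probability space $(\Omega, M, \mu)$ it holds:
$I_\mu(\xi, \eta \vee \varsigma) = I_\mu(\xi, \eta) + I_\mu(\xi, \varsigma / \eta) = I_\mu(\xi, \varsigma) + I_\mu(\xi, \eta / \varsigma).$
Proof.
By simple calculation we obtain:
$I_\mu(\xi, \eta) + I_\mu(\xi, \varsigma / \eta) = H_\mu(\xi) - H_\mu(\xi / \eta) + H_\mu(\xi / \eta) - H_\mu(\xi / \eta \vee \varsigma) = H_\mu(\xi) - H_\mu(\xi / \eta \vee \varsigma) = I_\mu(\xi, \eta \vee \varsigma).$
Analogously we obtain the second equality. □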
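The two decompositions in Theorem 3 can be verified on a small numerical example. The sketch below is an illustration under assumptions: three two-element partitions are represented by a hypothetical table of values standing for $\mu(f_i \cap g_j \cap h_k)$, conditional entropy is computed through property (2.8) as $H(a / b) = H(a \vee b) - H(b)$, and all function names are introduced here.

```python
import math
from itertools import product

def marginal_entropy(joint, axes):
    """Entropy (bits) of the join of the partitions selected by `axes`."""
    marg = {}
    for idx, p in joint.items():
        key = tuple(idx[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cond_entropy(joint, a, b):
    """H(a / b) = H(a v b) - H(b), as in property (2.8)."""
    return marginal_entropy(joint, a + b) - marginal_entropy(joint, b)

def mi(joint, a, b):
    """I(a, b) = H(a) - H(a / b)."""
    return marginal_entropy(joint, a) - cond_entropy(joint, a, b)

def cond_mi(joint, a, b, c):
    """I(a, b / c) = H(a / c) - H(a / b v c)."""
    return cond_entropy(joint, a, c) - cond_entropy(joint, a, b + c)

# Hypothetical joint values, normalised so that they sum to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
joint = {idx: w / 36 for idx, w in zip(product(range(2), repeat=3), weights)}

XI, ETA, ZETA = (0,), (1,), (2,)
lhs = mi(joint, XI, ETA + ZETA)
rhs1 = mi(joint, XI, ETA) + cond_mi(joint, XI, ZETA, ETA)
rhs2 = mi(joint, XI, ZETA) + cond_mi(joint, XI, ETA, ZETA)
print(lhs, rhs1, rhs2)
```

All three printed values coincide up to rounding, which mirrors the telescoping argument used in the proof.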
Let be fuzzy partitions of a fuzzy probability space . Denote $ξ 0 = { 1 Ω }$. Evidently, we have $ξ 0 ∨ ξ = ξ$, and $H μ ( ξ / ξ 0 ) = H μ ( ξ )$. Hence, we can write:
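Theorem 4 (Chain rules).
Let $\xi_1, \xi_2, \ldots, \xi_n$ and $\eta$ be fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. Then, for $n = 1, 2, \ldots$, the following equalities hold:
(i) $H_\mu(\xi_1 \vee \xi_2 \vee \ldots \vee \xi_n) = \sum_{i=1}^{n} H_\mu(\xi_i / \vee_{k=0}^{i-1} \xi_k)$;
(ii) $H_\mu(\vee_{i=1}^{n} \xi_i / \eta) = \sum_{i=1}^{n} H_\mu(\xi_i / (\vee_{k=0}^{i-1} \xi_k) \vee \eta)$;
(iii) $I_\mu(\vee_{i=1}^{n} \xi_i, \eta) = \sum_{i=1}^{n} I_\mu(\xi_i, \eta / \vee_{k=0}^{i-1} \xi_k)$.
Proof.
The first two equalities may be obtained using the principle of mathematical induction from the properties (2.8) and (2.3). By Equation (4), using the two previous equalities and Equation (7), we obtain:
$I_\mu(\vee_{i=1}^{n} \xi_i, \eta) = H_\mu(\vee_{i=1}^{n} \xi_i) - H_\mu(\vee_{i=1}^{n} \xi_i / \eta) = \sum_{i=1}^{n} \left( H_\mu(\xi_i / \vee_{k=0}^{i-1} \xi_k) - H_\mu(\xi_i / (\vee_{k=0}^{i-1} \xi_k) \vee \eta) \right) = \sum_{i=1}^{n} I_\mu(\xi_i, \eta / \vee_{k=0}^{i-1} \xi_k).$ □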
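Definition 5.
Let $\xi, \eta, \varsigma$ be fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. We say that $\xi$ is conditionally independent to $\varsigma$ given $\eta$ (and write $\xi \to \eta \to \varsigma$) if $I_\mu(\xi, \varsigma / \eta) = 0$.
Theorem 5.
For fuzzy partitions $\xi, \eta, \varsigma$ of a fuzzy probability space $(\Omega, M, \mu)$ it holds:
$\xi \to \eta \to \varsigma$ if and only if $\varsigma \to \eta \to \xi$.
Proof.
Let $\xi \to \eta \to \varsigma$. Then:
$0 = I_\mu(\xi, \varsigma / \eta) = H_\mu(\xi / \eta) - H_\mu(\xi / \eta \vee \varsigma),$
and according to the property (2.8) we can write:
$H_\mu(\xi / \eta) = H_\mu(\xi / \eta \vee \varsigma) = H_\mu(\xi \vee \eta \vee \varsigma) - H_\mu(\eta \vee \varsigma).$
Calculate:
$I_\mu(\varsigma, \xi / \eta) = H_\mu(\varsigma / \eta) - H_\mu(\varsigma / \xi \vee \eta) = H_\mu(\eta \vee \varsigma) - H_\mu(\eta) - H_\mu(\xi \vee \eta \vee \varsigma) + H_\mu(\xi \vee \eta) = H_\mu(\xi / \eta) - H_\mu(\xi / \eta \vee \varsigma) = 0.$
It follows that $\varsigma \to \eta \to \xi$. The opposite implication is obvious. □
According to Theorem 5, we may say that $\xi$ and $\varsigma$ are conditionally independent given $\eta$ and write $\xi \leftrightarrow \eta \leftrightarrow \varsigma$ instead of $\xi \to \eta \to \varsigma$.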
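Theorem 6.
For fuzzy partitions $\xi, \eta, \varsigma$ of a fuzzy probability space $(\Omega, M, \mu)$ such that $\xi \to \eta \to \varsigma$ we have:
(i) $I_\mu(\xi \vee \eta, \varsigma) = I_\mu(\eta, \varsigma)$;
(ii) $I_\mu(\eta, \varsigma) = I_\mu(\xi, \varsigma) + I_\mu(\varsigma, \eta / \xi)$;
(iii) $I_\mu(\xi, \eta / \varsigma) \le I_\mu(\xi, \eta)$;
(iv) $I_\mu(\xi, \eta) \ge I_\mu(\xi, \varsigma)$.
Proof.
(i) Since by the assumption $I_\mu(\xi, \varsigma / \eta) = 0$, using the chain rule for mutual information, we obtain:
$I_\mu(\xi \vee \eta, \varsigma) = I_\mu(\eta \vee \xi, \varsigma) = I_\mu(\eta, \varsigma) + I_\mu(\xi, \varsigma / \eta) = I_\mu(\eta, \varsigma).$
(ii) According to Theorem 3 we have $I_\mu(\xi \vee \eta, \varsigma) = I_\mu(\varsigma, \xi) + I_\mu(\varsigma, \eta / \xi)$. Hence, using the equality (i) of this theorem, we obtain:
$I_\mu(\eta, \varsigma) = I_\mu(\xi \vee \eta, \varsigma) = I_\mu(\varsigma, \xi) + I_\mu(\varsigma, \eta / \xi).$
(iii) Since $I_\mu(\xi, \varsigma) \ge 0$, from (ii) the following inequality follows:
$I_\mu(\varsigma, \eta / \xi) \le I_\mu(\varsigma, \eta).$
By Theorem 5 we can interchange $\xi$ and $\varsigma$. Doing so we obtain $I_\mu(\xi, \eta / \varsigma) \le I_\mu(\xi, \eta)$.
(iv) By Theorem 3, the mutual information $I_\mu(\xi, \eta \vee \varsigma)$ can be expressed in two different ways:
$I_\mu(\xi, \eta \vee \varsigma) = I_\mu(\xi, \eta) + I_\mu(\xi, \varsigma / \eta) = I_\mu(\xi, \varsigma) + I_\mu(\xi, \eta / \varsigma).$
Since $\xi \to \eta \to \varsigma$, we have $I_\mu(\xi, \varsigma / \eta) = 0$, and, therefore, $I_\mu(\xi, \eta) = I_\mu(\xi, \eta \vee \varsigma)$. Using the second equality, we obtain:
$I_\mu(\xi, \eta) = I_\mu(\xi, \varsigma) + I_\mu(\xi, \eta / \varsigma).$
Since $I_\mu(\xi, \eta / \varsigma) \ge 0$, we have $I_\mu(\xi, \eta) \ge I_\mu(\xi, \varsigma)$. □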
Note that, in the classical theory, the last assertion from the previous theorem is known as the data processing inequality.
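The data processing inequality can be illustrated with a classical stand-in for three conditionally independent partitions. In the sketch below, the joint values are built as a Markov chain, $p(i, j, k) = p(i)\,p(j \mid i)\,p(k \mid j)$, which is an assumption made here so that the conditional mutual information of the first and third partition given the second vanishes; all numbers and names are hypothetical.

```python
import math
from itertools import product

def H(joint, axes):
    """Entropy (bits) of the join of the partitions selected by `axes`."""
    marg = {}
    for idx, p in joint.items():
        key = tuple(idx[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def mi(joint, a, b):
    """I(a, b) = H(a) + H(b) - H(a v b)."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)

# Hypothetical chain: xi -> eta -> zeta.
p_xi = [0.3, 0.7]
p_eta_given_xi = [[0.9, 0.1], [0.2, 0.8]]
p_zeta_given_eta = [[0.6, 0.4], [0.3, 0.7]]
joint = {
    (i, j, k): p_xi[i] * p_eta_given_xi[i][j] * p_zeta_given_eta[j][k]
    for i, j, k in product(range(2), repeat=3)
}

XI, ETA, ZETA = (0,), (1,), (2,)
# I(xi, zeta / eta) = H(xi / eta) - H(xi / eta v zeta)
cond = (H(joint, XI + ETA) - H(joint, ETA)) - (
    H(joint, XI + ETA + ZETA) - H(joint, ETA + ZETA)
)
i_xi_eta = mi(joint, XI, ETA)
i_xi_zeta = mi(joint, XI, ZETA)
print(cond, i_xi_eta, i_xi_zeta)
```

The conditional term is zero up to rounding, and the information shared with the intermediate partition is at least the information shared with the final one, as assertion (iv) states.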

## 4. Kullback-Leibler Divergence with Respect to Fuzzy P-Measures

In this section we define the Kullback-Leibler divergence in fuzzy probability spaces and its conditional version. We prove basic properties of these measures, in particular Gibbs' inequality. Finally, using the concept of conditional Kullback-Leibler divergence, we establish a chain rule for Kullback-Leibler divergence with respect to fuzzy P-measures. In the proofs we use the following known log-sum inequality: for non-negative real numbers $a_1, a_2, \ldots, a_n$, $b_1, b_2, \ldots, b_n$, it holds:
$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$ (8)
with equality if and only if $\frac{a_i}{b_i}$ is constant. Recall that we use the convention that $x \log \frac{x}{0} = \infty$ if $x > 0$, and $0 \log \frac{0}{x} = 0$ if $x \ge 0$.
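Definition 6.
Let $\mu, \nu$ be fuzzy P-measures on a common fuzzy measurable space $(\Omega, M)$. Then, for any fuzzy partition $\xi = \{f_1, \ldots, f_n\}$ of the fuzzy probability spaces $(\Omega, M, \mu)$, $(\Omega, M, \nu)$, we define the Kullback-Leibler divergence $D_\xi(\mu \| \nu)$ by:
$D_\xi(\mu \| \nu) = \sum_{i=1}^{n} \mu(f_i) \log \frac{\mu(f_i)}{\nu(f_i)}.$
Remark 2.
The Kullback-Leibler divergence is not a metric in the true sense, since it is not symmetric, i.e., the equality $D_\xi(\mu \| \nu) = D_\xi(\nu \| \mu)$ is not necessarily true (as shown in the example that follows), and it does not satisfy the triangle inequality.
Example 2.
Consider the fuzzy measurable space $(\Omega, M)$ from Example 1 and the following two fuzzy P-measures defined on $M$: $\mu$ is defined as in Example 1 and $\nu$ is defined in a similar way: $\nu(1_\Omega) = \nu(f \cup f^\perp) = 1$, $\nu(0_\Omega) = \nu(f \cap f^\perp) = 0$, $\nu(f) = q$, $\nu(f^\perp) = 1 - q$, where $q \in (0, 1)$. Then, for the fuzzy partition $\xi = \{f, f^\perp\}$ we obtain:
$D_\xi(\mu \| \nu) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q}, \quad D_\xi(\nu \| \mu) = q \log \frac{q}{p} + (1 - q) \log \frac{1 - q}{1 - p}.$
If $p = q$, then $D_\xi(\mu \| \nu) = D_\xi(\nu \| \mu) = 0$. If $p = \frac{1}{2}$, $q = \frac{1}{3}$, then we have:
$D_\xi(\mu \| \nu) = \frac{1}{2} \log \frac{1/2}{1/3} + \frac{1}{2} \log \frac{1/2}{2/3} \approx 0.085$ bit,
and:
$D_\xi(\nu \| \mu) = \frac{1}{3} \log \frac{1/3}{1/2} + \frac{2}{3} \log \frac{2/3}{1/2} \approx 0.082$ bit.
This means that $D_\xi(\mu \| \nu) \ne D_\xi(\nu \| \mu)$, in general.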
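The asymmetry discussed in Example 2 can be reproduced with a few lines of code. This sketch assumes that the divergence has the classical form $\sum_i \mu(f_i) \log_2 \frac{\mu(f_i)}{\nu(f_i)}$, which is what the proof of Theorem 7 uses; the function name `kl_divergence` and the list representation of a fuzzy P-measure restricted to $\xi$ are assumptions introduced here.

```python
import math

def kl_divergence(mu, nu):
    """Sum of mu_i * log2(mu_i / nu_i), with the conventions
    0 * log(0 / x) = 0 and x * log(x / 0) = infinity for x > 0."""
    total = 0.0
    for a, b in zip(mu, nu):
        if a == 0:
            continue
        if b == 0:
            return math.inf
        total += a * math.log2(a / b)
    return total

p, q = 1 / 2, 1 / 3                       # values used in Example 2
d_mu_nu = kl_divergence([p, 1 - p], [q, 1 - q])
d_nu_mu = kl_divergence([q, 1 - q], [p, 1 - p])
d_same = kl_divergence([p, 1 - p], [p, 1 - p])
print(round(d_mu_nu, 4), round(d_nu_mu, 4), d_same)
```

The two directions differ in the third decimal place, while the divergence of a measure from itself is zero.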
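The following result suggests an interpretation of the Kullback-Leibler divergence as a measure of how different two fuzzy P-measures (over the same fuzzy partition) are.
Theorem 7.
Let $\xi = \{f_1, \ldots, f_n\}$ be a fuzzy partition of the fuzzy probability spaces $(\Omega, M, \mu)$, $(\Omega, M, \nu)$. Then $D_\xi(\mu \| \nu) \ge 0$ (Gibbs' inequality), with equality if and only if $\mu(f_i) = \nu(f_i)$ for $i = 1, 2, \ldots, n$.
Proof.
Putting $a_i = \mu(f_i)$ and $b_i = \nu(f_i)$, for $i = 1, 2, \ldots, n$, in the log-sum inequality, we obtain $\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} \mu(f_i) = \mu(\bigcup_{i=1}^{n} f_i) = 1$; analogously we obtain $\sum_{i=1}^{n} b_i = 1$. Therefore, by Equation (8):
$D_\xi(\mu \| \nu) = \sum_{i=1}^{n} \mu(f_i) \log \frac{\mu(f_i)}{\nu(f_i)} = \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} = 1 \cdot \log 1 = 0,$
with equality if and only if $\frac{\mu(f_i)}{\nu(f_i)} = \alpha$ for $i = 1, 2, \ldots, n$, where $\alpha$ is a constant. Writing $\mu(f_i) = \alpha \nu(f_i)$ and taking the sum over all $i = 1, 2, \ldots, n$, we obtain $\alpha = 1$. This means that $D_\xi(\mu \| \nu) = 0$ if and only if $\mu(f_i) = \nu(f_i)$ for $i = 1, 2, \ldots, n$. □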
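Theorem 8.
Let $\mu, \nu$ be fuzzy P-measures on a common fuzzy measurable space $(\Omega, M)$. Then, for any fuzzy partition $\xi = \{f_1, \ldots, f_n\}$ of the fuzzy probability spaces $(\Omega, M, \mu)$, $(\Omega, M, \nu)$, it holds:
$H_\mu(\xi) = \log n - D_\xi(\mu \| \nu),$
where $n$ is the number of members of $\xi$, and $\nu$ is the uniform probability distribution over $\xi$, i.e., $\nu(f_i) = \frac{1}{n}$ for $i = 1, 2, \ldots, n$.
Proof.
Calculate:
$D_\xi(\mu \| \nu) = \sum_{i=1}^{n} \mu(f_i) \log \frac{\mu(f_i)}{1/n} = \sum_{i=1}^{n} \mu(f_i) \log \mu(f_i) + \log n \sum_{i=1}^{n} \mu(f_i) = -H_\mu(\xi) + \log n.$ □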
As a consequence of the previous two theorems we obtain the following property of the entropy of fuzzy partitions:
Corollary 1.
For any fuzzy partition $\xi$ of a fuzzy probability space $(\Omega, M, \mu)$ it holds that $H_\mu(\xi) \le \log n$, where $n$ denotes the number of members of $\xi$, with equality if and only if $\mu$ is the uniform probability distribution over $\xi$.
Proof.
Let $\nu$ be the uniform probability distribution over $\xi = \{f_1, \ldots, f_n\}$. Then, according to the previous theorem and Gibbs' inequality (Theorem 7), we have:
$0 \le D_\xi(\mu \| \nu) = \log n - H_\mu(\xi),$
which implies the inequality $H_\mu(\xi) \le \log n$. Since by Theorem 7 $D_\xi(\mu \| \nu) = 0$ if and only if $\mu(f_i) = \nu(f_i)$ for $i = 1, 2, \ldots, n$, the equality $H_\mu(\xi) = \log n$ holds if and only if $\mu$ is the uniform probability distribution over $\xi$. □
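The identity of Theorem 8 and the bound of Corollary 1 are easy to confirm numerically. The following sketch uses a hypothetical four-element partition with dyadic probabilities; the function names are assumptions introduced for the illustration, and the divergence is taken in its classical form with base-2 logarithms.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def kl_divergence(mu, nu):
    """Sum of mu_i * log2(mu_i / nu_i); assumes nu_i > 0 wherever mu_i > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(mu, nu) if a > 0)

mu = [0.5, 0.25, 0.125, 0.125]      # hypothetical values mu(f_i)
n = len(mu)
uniform = [1 / n] * n               # nu(f_i) = 1/n

h = entropy(mu)
d = kl_divergence(mu, uniform)
print(h, d, math.log2(n))
```

Here the entropy and the divergence from the uniform distribution add up to $\log_2 n$, and the entropy of the uniform distribution itself attains the bound.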
In the following, the concavity of the entropy $H_\mu(\xi)$ as a function of $\mu$ and the convexity of the K-L divergence with respect to fuzzy P-measures are shown. We recall, for the convenience of the reader, the definitions of a convex and a concave function:
A real-valued function $f$ is said to be convex over an interval $[a, b]$ if for every $x_1, x_2 \in [a, b]$ and for any real number $\alpha \in [0, 1]$, $f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2)$.
A real-valued function $f$ is said to be concave over an interval $[a, b]$ if for every $x_1, x_2 \in [a, b]$ and for any real number $\alpha \in [0, 1]$, $f(\alpha x_1 + (1 - \alpha) x_2) \ge \alpha f(x_1) + (1 - \alpha) f(x_2)$.
Remark 3.
In the proofs of some of the next assertions, the following known properties are used:
(i) A function $f$ is concave over an interval if and only if the function $-f$ is convex over the interval.
(ii) The sum of two concave functions is itself concave; the sum of two convex functions is itself convex.
(iii) Every real-valued affine function, i.e., each function of the form $f(x) = ax + b$, $a, b \in \Re$, is simultaneously convex and concave.
It is easy to prove the following proposition.
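Proposition 1.
Let $\mu_1, \mu_2$ be fuzzy P-measures on a given fuzzy measurable space $(\Omega, M)$. Then, for every real number $\alpha \in [0, 1]$, the mapping $\alpha \mu_1 + (1 - \alpha) \mu_2$ is a fuzzy P-measure on $M$.
Theorem 9 (Concavity of entropy).
Let $\xi$ be a fuzzy partition of the fuzzy probability spaces $(\Omega, M, \mu_1)$, $(\Omega, M, \mu_2)$. Then, for any real number $\alpha \in [0, 1]$, the following inequality holds:
$\alpha H_{\mu_1}(\xi) + (1 - \alpha) H_{\mu_2}(\xi) \le H_{\alpha \mu_1 + (1 - \alpha) \mu_2}(\xi).$
Proof.
Assume that $\xi = \{f_1, f_2, \ldots, f_n\}$. Since the function $F$ is convex, we obtain:
$H_{\alpha \mu_1 + (1 - \alpha) \mu_2}(\xi) = -\sum_{i=1}^{n} F(\alpha \mu_1(f_i) + (1 - \alpha) \mu_2(f_i)) \ge -\sum_{i=1}^{n} \left( \alpha F(\mu_1(f_i)) + (1 - \alpha) F(\mu_2(f_i)) \right) = \alpha H_{\mu_1}(\xi) + (1 - \alpha) H_{\mu_2}(\xi),$
which proves the concavity of the entropy $H_\mu(\xi)$ as a function of the fuzzy P-measure $\mu$. □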
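Theorem 10 (Convexity of K-L divergence).
The K-L divergence $D_\xi(\mu \| \nu)$ is convex in the pair $(\mu, \nu)$, i.e., for fuzzy P-measures $\mu_1, \nu_1$ and $\mu_2, \nu_2$ on a common fuzzy measurable space $(\Omega, M)$ we have, for any fuzzy partition $\xi$ of the fuzzy probability spaces $(\Omega, M, \mu_1)$, $(\Omega, M, \mu_2)$, $(\Omega, M, \nu_1)$, $(\Omega, M, \nu_2)$ and for any real number $\alpha \in [0, 1]$, the following inequality:
$D_\xi(\alpha \mu_1 + (1 - \alpha) \mu_2 \| \alpha \nu_1 + (1 - \alpha) \nu_2) \le \alpha D_\xi(\mu_1 \| \nu_1) + (1 - \alpha) D_\xi(\mu_2 \| \nu_2).$
Proof.
Let $\xi = \{f_1, \ldots, f_n\}$ be a fuzzy partition of the fuzzy probability spaces $(\Omega, M, \mu_1)$, $(\Omega, M, \mu_2)$, $(\Omega, M, \nu_1)$, $(\Omega, M, \nu_2)$. Fix $i \in \{1, 2, \ldots, n\}$. Putting $a_1 = \alpha \mu_1(f_i)$, $a_2 = (1 - \alpha) \mu_2(f_i)$, $b_1 = \alpha \nu_1(f_i)$, $b_2 = (1 - \alpha) \nu_2(f_i)$ in the log-sum inequality, we obtain:
$(\alpha \mu_1(f_i) + (1 - \alpha) \mu_2(f_i)) \log \frac{\alpha \mu_1(f_i) + (1 - \alpha) \mu_2(f_i)}{\alpha \nu_1(f_i) + (1 - \alpha) \nu_2(f_i)} \le \alpha \mu_1(f_i) \log \frac{\alpha \mu_1(f_i)}{\alpha \nu_1(f_i)} + (1 - \alpha) \mu_2(f_i) \log \frac{(1 - \alpha) \mu_2(f_i)}{(1 - \alpha) \nu_2(f_i)} = \alpha \mu_1(f_i) \log \frac{\mu_1(f_i)}{\nu_1(f_i)} + (1 - \alpha) \mu_2(f_i) \log \frac{\mu_2(f_i)}{\nu_2(f_i)}.$
Summing these inequalities over all $i$, we obtain what was claimed. □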
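The joint convexity stated in Theorem 10 can be probed numerically on a grid of mixing weights. The sketch below is only a sanity check under assumptions, not a proof: the four lists are hypothetical values of the fuzzy P-measures on a two-element partition, the divergence is taken in its classical form, and the helper `mix` forms the convex combination $\alpha x + (1 - \alpha) y$ elementwise.

```python
import math

def kl_divergence(mu, nu):
    """Sum of mu_i * log2(mu_i / nu_i); assumes nu_i > 0 wherever mu_i > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(mu, nu) if a > 0)

def mix(alpha, x, y):
    """Elementwise convex combination alpha * x + (1 - alpha) * y."""
    return [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]

# Hypothetical measures on a two-element fuzzy partition.
mu1, nu1 = [0.2, 0.8], [0.5, 0.5]
mu2, nu2 = [0.7, 0.3], [0.4, 0.6]

gaps = []
for k in range(11):
    alpha = k / 10
    lhs = kl_divergence(mix(alpha, mu1, mu2), mix(alpha, nu1, nu2))
    rhs = alpha * kl_divergence(mu1, nu1) + (1 - alpha) * kl_divergence(mu2, nu2)
    gaps.append(rhs - lhs)          # should be non-negative by Theorem 10

print(min(gaps))
```

Every gap is non-negative up to rounding, and it vanishes at the endpoints $\alpha = 0$ and $\alpha = 1$, where the mixture reduces to one of the two pairs.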
Theorem 11.
Let $\xi = \{f_1, \ldots, f_n\}$ and $\eta = \{g_1, \ldots, g_m\}$ be two fuzzy partitions of a fuzzy probability space $(\Omega, M, \mu)$. Then the mutual information $I_\mu(\xi, \eta)$ of $\xi$ and $\eta$ is a concave function of $\mu(f_i)$ for fixed $\mu(g_j / f_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$. The mutual information $I_\mu(\xi, \eta)$ of $\xi$ and $\eta$ is a convex function of $\mu(g_j / f_i)$ for fixed $\mu(f_i)$, $i = 1, \ldots, n$.
Proof.
Let us prove the first assertion. By Equations (6) and (2) we have:
$I_\mu(\xi, \eta) = H_\mu(\eta) - H_\mu(\eta / \xi) = H_\mu(\eta) + \sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i) \cdot F(\dot{\mu}(g_j / f_i)).$ (9)
Let $g_j \in M$. Since $f_k \cap g_j \le f_k \le f_l^\perp \le f_l^\perp \cup g_j^\perp = (f_l \cap g_j)^\perp$ whenever $k \ne l$, the system $\{f_1 \cap g_j, \ldots, f_n \cap g_j\}$ is a system of pairwise W-separated fuzzy subsets from $M$. Due to the assumption $\mu(\bigcup_{i=1}^{n} f_i) = 1$, the property (2.1) and the additivity of the fuzzy P-measure, we get:
$\mu(g_j) = \mu((\bigcup_{i=1}^{n} f_i) \cap g_j) = \mu(\bigcup_{i=1}^{n} (f_i \cap g_j)) = \sum_{i=1}^{n} \mu(f_i \cap g_j) = \sum_{i:\, \mu(f_i) > 0} \mu(f_i) \mu(g_j / f_i).$
This means that, for fixed $\mu(g_j / f_i)$, $\mu(g_j)$ is a linear function of $\mu(f_i)$. Therefore, $H_\mu(\eta)$, which is a concave function of $\mu(g_j)$, is a concave function of $\mu(f_i)$. The second term on the right-hand side of Equation (9) is a linear function of $\mu(f_i)$. Hence (see Remark 3), the right-hand side of Equation (9) is a concave function of $\mu(f_i)$. Thus, the first part of the theorem is proved.
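Now, let us prove the second part. We fix $\mu$ over $\xi$ and consider two different conditional fuzzy P-measures $\mu_1(g_j / f_i)$ and $\mu_2(g_j / f_i)$, defined for $f_i \in \xi$, $\mu(f_i) > 0$, and $g_j \in \eta$. Then, for $i = 1, \ldots, n$, $j = 1, \ldots, m$, we have:
$\mu_1(f_i \cap g_j) = \mu(f_i) \cdot \mu_1(g_j / f_i), \quad \mu_2(f_i \cap g_j) = \mu(f_i) \cdot \mu_2(g_j / f_i).$
According to Proposition 1 we can define, for every real number $\alpha \in [0, 1]$, the following conditional fuzzy P-measure:
$\mu_\alpha(g_j / f_i) = \alpha \mu_1(g_j / f_i) + (1 - \alpha) \mu_2(g_j / f_i).$
Then we have:
$\mu_\alpha(f_i \cap g_j) = \alpha \mu_1(f_i \cap g_j) + (1 - \alpha) \mu_2(f_i \cap g_j), \quad \mu_\alpha(g_j) = \alpha \mu_1(g_j) + (1 - \alpha) \mu_2(g_j).$
Therefore, if we put $\nu_\alpha(f_i \cap g_j) = \mu(f_i) \cdot \mu_\alpha(g_j)$, we obtain:
$\nu_\alpha(f_i \cap g_j) = \alpha \mu(f_i) \cdot \mu_1(g_j) + (1 - \alpha) \mu(f_i) \cdot \mu_2(g_j).$
According to Theorem 1 and Theorem 10 we can write:
$I_{\mu_\alpha}(\xi, \eta) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mu_\alpha(f_i \cap g_j) \log \frac{\mu_\alpha(f_i \cap g_j)}{\nu_\alpha(f_i \cap g_j)} \le \alpha \sum_{i=1}^{n} \sum_{j=1}^{m} \mu_1(f_i \cap g_j) \log \frac{\mu_1(f_i \cap g_j)}{\mu(f_i) \cdot \mu_1(g_j)} + (1 - \alpha) \sum_{i=1}^{n} \sum_{j=1}^{m} \mu_2(f_i \cap g_j) \log \frac{\mu_2(f_i \cap g_j)}{\mu(f_i) \cdot \mu_2(g_j)} = \alpha I_{\mu_1}(\xi, \eta) + (1 - \alpha) I_{\mu_2}(\xi, \eta).$ □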
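Finally, we define a conditional version of the Kullback-Leibler divergence and, subsequently, we prove the chain rule for K-L divergence.
Definition 7.
Let $\xi = \{f_1, \ldots, f_n\}$, $\eta = \{g_1, \ldots, g_m\}$ be two fuzzy partitions of the fuzzy probability spaces $(\Omega, M, \mu)$, $(\Omega, M, \nu)$. Then we define the conditional Kullback-Leibler divergence $D_{\eta / \xi}(\mu \| \nu)$ by:
$D_{\eta / \xi}(\mu \| \nu) = \sum_{i=1}^{n} \mu(f_i) \sum_{j=1}^{m} \dot{\mu}(g_j / f_i) \log \frac{\dot{\mu}(g_j / f_i)}{\dot{\nu}(g_j / f_i)}.$
Theorem 12 (Chain rule for K-L divergence).
If $\xi, \eta$ are two fuzzy partitions of the fuzzy probability spaces $(\Omega, M, \mu)$, $(\Omega, M, \nu)$, then:
$D_{\xi \vee \eta}(\mu \| \nu) = D_\xi(\mu \| \nu) + D_{\eta / \xi}(\mu \| \nu).$
Proof.
Let $\xi = \{f_1, \ldots, f_n\}$, $\eta = \{g_1, \ldots, g_m\}$. Using the property (2.1), we obtain:
$\mu(f_i) = \sum_{j=1}^{m} \mu(f_i \cap g_j), \quad \nu(f_i) = \sum_{j=1}^{m} \nu(f_i \cap g_j), \quad i = 1, 2, \ldots, n.$
Therefore:
$D_\xi(\mu \| \nu) + D_{\eta / \xi}(\mu \| \nu) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i \cap g_j) \log \frac{\mu(f_i)}{\nu(f_i)} + \sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i \cap g_j) \log \frac{\mu(f_i \cap g_j) \cdot \nu(f_i)}{\mu(f_i) \cdot \nu(f_i \cap g_j)} = \sum_{i=1}^{n} \sum_{j=1}^{m} \mu(f_i \cap g_j) \log \frac{\mu(f_i \cap g_j)}{\nu(f_i \cap g_j)} = D_{\xi \vee \eta}(\mu \| \nu).$ □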
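A chain rule of this kind can be checked on small tables. The sketch below assumes the classical form of the statement, namely that the divergence over the common refinement equals the divergence over $\xi$ plus a conditional divergence $\sum_i \mu(f_i) \sum_j \mu(g_j / f_i) \log_2 \frac{\mu(g_j / f_i)}{\nu(g_j / f_i)}$; the tables hold hypothetical, strictly positive values standing for $\mu(f_i \cap g_j)$ and $\nu(f_i \cap g_j)$, and the function names are introduced here.

```python
import math

def kl_divergence(mu, nu):
    """Sum of mu_i * log2(mu_i / nu_i); assumes nu_i > 0 wherever mu_i > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(mu, nu) if a > 0)

def conditional_kl(mu_joint, nu_joint):
    """Conditional divergence of the column partition given the row partition.
    Assumes all entries of nu_joint are strictly positive."""
    total = 0.0
    for mu_row, nu_row in zip(mu_joint, nu_joint):
        mu_i, nu_i = sum(mu_row), sum(nu_row)      # mu(f_i), nu(f_i)
        for a, b in zip(mu_row, nu_row):
            if a > 0:
                # mu(f_i) * mu(g_j / f_i) equals mu(f_i ∩ g_j) = a
                total += a * math.log2((a / mu_i) / (b / nu_i))
    return total

# Hypothetical joint tables; rows index xi, columns index eta.
mu_joint = [[0.30, 0.10], [0.20, 0.40]]
nu_joint = [[0.25, 0.25], [0.10, 0.40]]

flat = lambda table: [p for row in table for p in row]
rows = lambda table: [sum(row) for row in table]

d_joint = kl_divergence(flat(mu_joint), flat(nu_joint))
d_xi = kl_divergence(rows(mu_joint), rows(nu_joint))
d_cond = conditional_kl(mu_joint, nu_joint)
print(d_joint, d_xi + d_cond)
```

The two printed numbers agree up to rounding, and both summands on the right are non-negative.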

## 5. Discussion

In this paper we extend our study concerning mutual information of fuzzy partitions in fuzzy probability spaces. In Section 3, using our previous results, the notion of conditional mutual information of fuzzy partitions is defined and chain rules for mutual information of fuzzy partitions are established. Subsequently, using the notion of conditional mutual information of fuzzy partitions we have defined the notion of conditional independence of fuzzy partitions and we have proved, inter alia, data processing inequality for the studied situation. In Section 4 the notion of Kullback-Leibler divergence for fuzzy P-measures is introduced and the basic properties of this measure are shown. In particular, a convexity of Kullback-Leibler divergence with respect to fuzzy P-measures is proved and a convexity of mutual information of fuzzy partitions is studied. Finally, a conditional version of the Kullback-Leibler divergence is defined and chain rules for Kullback-Leibler divergence with respect to fuzzy P-measures are established.
As noted previously, logical versions of some results concerning the entropy and mutual information of fuzzy partitions presented in Section 2 and Section 3 are given in [10]. In [9] Ellerman studied, in addition to the logical entropy and logical mutual information, the concept of logical Kullback-Leibler divergence. The aim of our next study will be to provide a logical version of Kullback-Leibler divergence for fuzzy probability measures.
Finally, let us mention that fuzzy set theory has been shown to be useful in many applications of mathematics and that it is continually developing. Currently, algebraic structures based on fuzzy set theory, such as MV-algebras (cf. [37,38]), D-posets (cf. [39,40]), effect algebras (cf. [41,42]), and IF-sets (cf. [43,44,45,46,47]), are the subject of intensive research and, of course, there are also many results about entropy on these structures. Some of them can be found, e.g., in [48,49,50,51,52,53,54,55,56,57,58].

## Acknowledgments

The author thanks the editor and the referees for their valuable comments and suggestions.

## Conflicts of Interest

The author declares no conflict of interest.

## References

1. Kolmogorov, A.N. Foundations of the Theory of Probability; Chelsea Press: New York, NY, USA, 1950. [Google Scholar]
2. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
3. Markechová, D. The entropy of fuzzy dynamical systems and generators. Fuzzy Sets Syst. 1992, 48, 351–363. [Google Scholar] [CrossRef]
4. Piasecki, K. Probability of fuzzy events defined as denumerable additive measure. Fuzzy Sets Syst. 1985, 17, 271–284. [Google Scholar] [CrossRef]
5. Zadeh, L.A. Fuzzy Sets. Inf. Control 1965, 8, 338–358. [Google Scholar] [CrossRef]
6. Markechová, D. Entropy and mutual information of experiments in the fuzzy case. Neural Netw. World 2013, 23, 339–349. [Google Scholar] [CrossRef]
7. Markechová, D. Entropy of complete fuzzy partitions. Math. Slovaca 1993, 43, 1–10. [Google Scholar]
8. Markechová, D. A note to the Kolmogorov-Sinai entropy of fuzzy dynamical systems. Fuzzy Sets Syst. 1994, 64, 87–90. [Google Scholar] [CrossRef]
9. Ellerman, D. An Introduction to Logical Entropy and Its Relation to Shannon Entropy. Int. J. Seman. Comput. 2013, 7, 121–145. [Google Scholar] [CrossRef]
10. Markechová, D.; Riečan, B. Logical Entropy of Fuzzy Dynamical Systems. Entropy 2016, 18, 157. [Google Scholar] [CrossRef]
11. Ebrahimzadeh, A.; Giski, Z.E.; Markechová, D. Logical Entropy of Dynamical Systems—A General Model. Mathematics 2017, 5, 4. [Google Scholar] [CrossRef]
12. Mesiar, R.; Rybárik, J. Entropy of Fuzzy Partitions—A General Model. Fuzzy Sets Syst. 1998, 99, 73–79. [Google Scholar] [CrossRef]
13. Mesiar, R. The Bayes formula and the entropy of fuzzy probability spaces. Int. J. Gen. Syst. 1990, 4, 67–71. [Google Scholar]
14. Dumitrescu, D. Fuzzy measures and the entropy of fuzzy partitions. J. Math. Anal. Appl. 1993, 176, 359–373. [Google Scholar] [CrossRef]
15. Dumitrescu, D. Entropy of a fuzzy dynamical system. Fuzzy Sets Syst. 1995, 70, 45–57. [Google Scholar] [CrossRef]
16. Riečan, B. An entropy construction inspired by fuzzy sets. Soft Comput. 2003, 7, 486–488. [Google Scholar]
17. Hu, Q.; Yu, D.; Xie, Z.; Liu, J. Fuzzy probabilistic approximation spaces and their information measures. IEEE Trans. Fuzzy Syst. 2006, 14, 191–201. [Google Scholar]
18. Yu, D.; Hu, Q.; Wu, C. Uncertainty measures for fuzzy relations and their applications. Appl. Soft Comput. 2007, 7, 1135–1143. [Google Scholar] [CrossRef]
19. Rahimi, M.; Riazi, A. On local entropy of fuzzy partitions. Fuzzy Sets Syst. 2014, 234, 97–108. [Google Scholar] [CrossRef]
20. Rahimi, M.; Assari, A.; Ramezani, F. A Local Approach to Yager Entropy of Dynamical Systems. Int. J. Fuzzy Syst. 2015, 1, 1–10. [Google Scholar] [CrossRef]
21. Markechová, D.; Riečan, B. Entropy of Fuzzy Partitions and Entropy of Fuzzy Dynamical Systems. Entropy 2016, 18, 19. [Google Scholar] [CrossRef]
22. Khare, M. Fuzzy σ-algebras and conditional entropy. Fuzzy Sets Syst. 1999, 102, 287–292. [Google Scholar] [CrossRef]
23. Khare, M.; Roy, S. Conditional entropy and the Rokhlin metric on an orthomodular lattice with Bayessian state. Int. J. Theor. Phys. 2008, 47, 1386–1396. [Google Scholar] [CrossRef]
24. Srivastava, P.; Khare, M.; Srivastava, Y.K. M-Equivalence, entropy and F-dynamical systems. Fuzzy Sets Syst. 2001, 121, 275–283. [Google Scholar] [CrossRef]
25. Criado, F.; Gachechiladze, T. Entropy of fuzzy events. Fuzzy Sets Syst. 1997, 88, 99–106. [Google Scholar] [CrossRef]
26. Dubois, D.; Prade, M. A review of fuzzy set aggregation connectives. Inf. Sci. 1985, 36, 85–121. [Google Scholar] [CrossRef]
27. Gray, R.M. Entropy and Information Theory; Springer: Berlin/Heidelberg, Germany, 2009.
28. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
29. Kullback, S. Information Theory and Statistics; John Wiley & Sons: New York, NY, USA, 1959.
30. Schnakenberg, J. Network theory of microscopic and macroscopic behavior of master equation systems. Rev. Mod. Phys. 1976, 48, 571–585.
31. Risken, H. The Fokker-Planck Equation, Methods of Solution and Applications; Springer: New York, NY, USA, 1984.
32. Qian, H. Relative entropy: Free energy associated with equilibrium fluctuations and nonequilibrium deviations. Phys. Rev. E 2001, 63, 042103.
33. Ellis, R.S. Entropy, Large Deviations, and Statistical Mechanics; Springer: New York, NY, USA, 1985.
34. Dvurečenskij, A. On the existence of probability measures on fuzzy measurable spaces. Fuzzy Sets Syst. 1991, 43, 173–181.
35. Piasecki, K. New concept of separated fuzzy subsets. In Proceedings of the Polish Symposium on Interval and Fuzzy Mathematics, Poznan, Poland, 26–29 August 1983; Albrycht, J., Wiśniewski, H., Eds.; Wydaw. Politechniki Poznańskiej: Poznan, Poland, 1985; pp. 193–195.
36. Piasecki, K. Fuzzy partitions of sets. BUSEFAL 1986, 25, 52–60.
37. Riečan, B.; Mundici, D. Probability on MV-algebras. In Handbook of Measure Theory; Pap, E., Ed.; Elsevier: Amsterdam, The Netherlands, 2002; pp. 869–910.
38. Di Nola, A.; Dvurečenskij, A. Product MV-algebras. Multiple-Valued Log. 2001, 6, 193–215.
39. Kôpka, F.; Chovanec, F. D-posets. Math. Slovaca 1994, 44, 21–34.
40. Frič, R. On D-posets of fuzzy sets. Math. Slovaca 2014, 64, 545–554.
41. Foulis, D.J.; Bennett, M.K. Effect algebras and unsharp quantum logics. Found. Phys. 1994, 24, 1325–1346.
42. Dvurečenskij, A.; Pulmannová, S. New Trends in Quantum Structures; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2000.
43. Atanassov, K. Intuitionistic Fuzzy Sets: Theory and Applications; Physica Verlag: New York, NY, USA, 1999.
44. Atanassov, K. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986, 20, 87–96.
45. Atanassov, K. More on intuitionistic fuzzy sets. Fuzzy Sets Syst. 1989, 33, 37–45.
46. Atanassov, K.; Riečan, B. On two operations over intuitionistic fuzzy sets. J. Appl. Math. Stat. Inform. 2006, 2, 145–148.
47. Riečan, B. Algebraic and Proof-Theoretic Aspects of Non-Classical Logics: Papers in Honor of Daniele Mundici on the Occasion of His 60th Birthday; Springer: Berlin/Heidelberg, Germany, 2007; pp. 290–308.
48. Petrovičová, J. On the entropy of partitions in product MV-algebras. Soft Comput. 2000, 4, 41–44.
49. Petrovičová, J. On the entropy of dynamical systems in product MV-algebras. Fuzzy Sets Syst. 2001, 121, 347–351.
50. Riečan, B. Kolmogorov–Sinaj entropy on MV-algebras. Int. J. Theor. Phys. 2005, 44, 1041–1052.
51. Ďurica, M. Entropy on IF-events. Notes Intuit. Fuzzy Sets 2007, 13, 30–40.
52. Di Nola, A.; Dvurečenskij, A.; Hyčko, M.; Manara, C. Entropy on Effect Algebras with the Riesz Decomposition Property I: Basic Properties. Kybernetika 2005, 41, 143–160.
53. Di Nola, A.; Dvurečenskij, A.; Hyčko, M.; Manara, C. Entropy on Effect Algebras with the Riesz Decomposition Property II: MV-Algebras. Kybernetika 2005, 41, 161–176.
54. Giski, Z.E.; Ebrahimi, M. Entropy of Countable Partitions on Effect Algebras with the Riesz Decomposition Property and Weak Sequential Effect Algebras. Cankaya Univ. J. Sci. Eng. 2015, 12, 20–39.
55. Farnoosh, R.; Rahimi, M.; Kumar, P. Removing noise in a digital image using a new entropy method based on intuitionistic fuzzy sets. In Proceedings of the International Conference on Fuzzy Systems, Vancouver, BC, Canada, 24–29 July 2016.
56. Burillo, P.; Bustince, H. Entropy on intuitionistic fuzzy sets and on interval-valued fuzzy sets. Fuzzy Sets Syst. 1996, 78, 305–316.
57. Szmidt, E.; Kacprzyk, J. Entropy for intuitionistic fuzzy sets. Fuzzy Sets Syst. 2001, 118, 467–477.
58. Ebrahimzadeh, A.; Giski, Z.E.; Markechová, D. Logical Entropy on Effect Algebras with the Riesz Decomposition Property. Commun. Theor. Phys. 2017, under review.

## Share and Cite

MDPI and ACS Style

Markechová, D. Kullback-Leibler Divergence and Mutual Information of Experiments in the Fuzzy Case. Axioms 2017, 6, 5. https://doi.org/10.3390/axioms6010005

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.