# Kullback-Leibler Divergence and Mutual Information of Experiments in the Fuzzy Case

## Abstract

## 1. Introduction

## 2. Basic Definitions and Facts

**Definition 1.** By a fuzzy partition of a fuzzy probability space $(\Omega ,M,\mu )$ we understand a finite collection $\xi =\left\{{f}_{1},\dots ,{f}_{n}\right\}$ of pairwise W-separated fuzzy subsets from $M$ such that $\mu ({\cup}_{i=1}^{n}{f}_{i})=1$.
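The W-separation requirement can be checked mechanically when $\Omega$ is finite and each fuzzy set is stored as a list of membership values. The Python sketch below is a minimal illustration under stated assumptions: it takes "$f$ and $g$ are W-separated" to mean $f\le 1-g$ pointwise (the usual convention in this setting, assumed here), takes the union to be the pointwise maximum, and uses hypothetical membership values. The condition $\mu ({\cup}_{i=1}^{n}{f}_{i})=1$ depends on the fuzzy P-measure $\mu$ and is not checked.

```python
from itertools import combinations

def w_separated(f, g, tol=1e-12):
    """Assumed meaning: f and g are W-separated iff f(w) <= 1 - g(w) for every w."""
    return all(a <= 1 - b + tol for a, b in zip(f, g))

def pairwise_w_separated(family):
    """True if every pair of distinct members of `family` is W-separated."""
    return all(w_separated(f, g) for f, g in combinations(family, 2))

def fuzzy_union(family):
    """Union of fuzzy sets, assumed to be the pointwise maximum of memberships."""
    return [max(values) for values in zip(*family)]

# Hypothetical membership functions on a four-point set Omega = {w1, w2, w3, w4}
f1 = [0.9, 0.6, 0.1, 0.0]
f2 = [0.1, 0.4, 0.8, 0.2]
f3 = [0.0, 0.3, 0.1, 0.7]

print(pairwise_w_separated([f1, f2, f3]))  # True
print(fuzzy_union([f1, f2, f3]))           # [0.9, 0.6, 0.8, 0.7]
```

Whether $\{f_1,f_2,f_3\}$ is a fuzzy partition then depends only on whether $\mu$ assigns measure 1 to this union.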

**Definition 2.** Let $\xi =\left\{{f}_{1},\dots ,{f}_{n}\right\}$ and $\eta =\left\{{g}_{1},\dots ,{g}_{m}\right\}$ be two fuzzy partitions of a fuzzy probability space $(\Omega ,M,\mu )$. The conditional entropy of $\eta$ given a fuzzy event ${f}_{i}\in \xi$ is defined by:

**Example 1.**

- (2.2) $\xi \prec \eta$ implies ${H}_{\mu}(\xi )\le {H}_{\mu}(\eta )$;
- (2.3) ${H}_{\mu}(\eta \vee \varsigma /\xi )={H}_{\mu}(\varsigma /\xi \vee \eta )+{H}_{\mu}(\eta /\xi )$;
- (2.4) $\xi \prec \eta$ implies ${H}_{\mu}(\xi /\varsigma )\le {H}_{\mu}(\eta /\varsigma )$;
- (2.5) $\xi \prec \eta$ implies ${H}_{\mu}(\varsigma /\xi )\ge {H}_{\mu}(\varsigma /\eta )$;
- (2.6) ${H}_{\mu}(\xi /\eta )\le {H}_{\mu}(\xi )$, with equality if and only if $\xi ,\eta$ are statistically independent;
- (2.7) ${H}_{\mu}(\eta \vee \varsigma /\xi )\le {H}_{\mu}(\eta /\xi )+{H}_{\mu}(\varsigma /\xi )$;
- (2.8) ${H}_{\mu}(\xi \vee \eta )={H}_{\mu}(\xi )+{H}_{\mu}(\eta /\xi )$.
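Properties (2.6) and (2.8) can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions, not the construction used in this paper: it works directly with a hypothetical table `P[i][j]` standing for $\mu ({f}_{i}\cap {g}_{j})$, assumes that $\mu ({f}_{i})$ and $\mu ({g}_{j})$ are the row and column sums of that table, and uses the Shannon-type forms $H_\mu(\xi)=-\sum_i \mu(f_i)\log\mu(f_i)$ and $H_\mu(\eta/\xi)=\sum_i \mu(f_i)\,H_\mu(\eta/f_i)$, with base-2 logarithms by default.

```python
import math

# Hypothetical table: P[i][j] stands for mu(f_i ∩ g_j), xi = {f_1, f_2}, eta = {g_1, g_2, g_3}.
# Assumption: mu(f_i) and mu(g_j) are the row and column sums of P.
P = [
    [0.10, 0.20, 0.10],
    [0.30, 0.05, 0.25],
]

def entropy(probs, base=2.0):
    """-sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def conditional_entropy(table, base=2.0):
    """H(column partition / row partition) = sum_i mu(row_i) * H(columns / row_i)."""
    total = 0.0
    for row in table:
        m = sum(row)  # mu(f_i), assumed to be the row sum
        if m > 0:
            total += m * entropy([x / m for x in row], base)
    return total

P_T = [list(col) for col in zip(*P)]             # same table, roles of xi and eta swapped
H_xi = entropy([sum(row) for row in P])
H_eta = entropy([sum(row) for row in P_T])
H_eta_given_xi = conditional_entropy(P)          # H(eta / xi)
H_xi_given_eta = conditional_entropy(P_T)        # H(xi / eta)
H_join = entropy([x for row in P for x in row])  # H(xi v eta)

# (2.8): H(xi v eta) = H(xi) + H(eta / xi)
print(f"H(xi v eta) = {H_join:.4f}, H(xi) + H(eta/xi) = {H_xi + H_eta_given_xi:.4f}")
# (2.6): H(xi / eta) <= H(xi)
print(f"H(xi/eta) = {H_xi_given_eta:.4f} <= H(xi) = {H_xi:.4f}")
```

Swapping rows and columns gives the same join entropy, so (2.8) also holds in the form $H_\mu(\xi\vee\eta)=H_\mu(\eta)+H_\mu(\xi/\eta)$.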

**Definition 3.** Let $\xi ,\eta$ be two fuzzy partitions of a given fuzzy probability space $(\Omega ,M,\mu )$. The mutual information of $\xi$ and $\eta$ is defined by the formula:

**Theorem 1.** Let $\xi =\left\{{f}_{1},\dots ,{f}_{n}\right\}$ and $\eta =\left\{{g}_{1},\dots ,{g}_{m}\right\}$ be two fuzzy partitions of a fuzzy probability space $(\Omega ,M,\mu )$. Then:

**Theorem 2.** ${I}_{\mu}(\xi ,\eta )\ge 0$, with equality if and only if $\xi ,\eta$ are statistically independent.
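Theorem 2 can be illustrated numerically. The Python sketch below assumes the form $I_\mu(\xi,\eta)=H_\mu(\eta)-H_\mu(\eta/\xi)$ (the formula in Definition 3 is not reproduced here, so this is an assumption), takes a hypothetical table of values $\mu(f_i\cap g_j)$ whose row and column sums are assumed to be $\mu(f_i)$ and $\mu(g_j)$, and models statistical independence as $\mu(f_i\cap g_j)=\mu(f_i)\,\mu(g_j)$.

```python
import math

def entropy(probs, base=2.0):
    """-sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def mutual_information(table, base=2.0):
    """Assumed form I(xi, eta) = H(eta) - H(eta / xi); rows of `table` index xi."""
    mu_eta = [sum(col) for col in zip(*table)]
    h_eta_given_xi = 0.0
    for row in table:
        m = sum(row)  # mu(f_i), assumed to be the row sum
        if m > 0:
            h_eta_given_xi += m * entropy([x / m for x in row], base)
    return entropy(mu_eta, base) - h_eta_given_xi

def transpose(table):
    """Swap the roles of the two partitions."""
    return [list(col) for col in zip(*table)]

# Hypothetical values mu(f_i ∩ g_j) for two dependent partitions
dependent = [[0.10, 0.20, 0.10], [0.30, 0.05, 0.25]]
# Product table mu(f_i ∩ g_j) = mu(f_i) * mu(g_j): independent partitions
a, b = [0.4, 0.6], [0.5, 0.3, 0.2]
independent = [[x * y for y in b] for x in a]

I_dep = mutual_information(dependent)
I_ind = mutual_information(independent)
print(f"I(dependent) = {I_dep:.4f}, I(independent) = {I_ind:.1e}")
```

The dependent table gives a strictly positive value, while the product table gives zero up to rounding; computing with the transposed table returns the same value, reflecting the symmetry of mutual information.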

## 3. Mutual Information and Conditional Mutual Information in Fuzzy Probability Spaces

**Definition 4.**

**Remark 1.**

**Theorem 3.**

**Proof.**

**Theorem 4 (Chain rules).**

- (i) ${H}_{\mu}({\xi}_{1}\vee {\xi}_{2}\vee \dots \vee {\xi}_{n})={\sum}_{i=1}^{n}{H}_{\mu}({\xi}_{i}/{\vee}_{k=0}^{i-1}{\xi}_{k})$;
- (ii) ${H}_{\mu}({\vee}_{i=1}^{n}{\xi}_{i}/\eta )={\sum}_{i=1}^{n}{H}_{\mu}({\xi}_{i}/({\vee}_{k=0}^{i-1}{\xi}_{k})\vee \eta )$;
- (iii) ${I}_{\mu}({\vee}_{i=1}^{n}{\xi}_{i},\eta )={\sum}_{i=1}^{n}{I}_{\mu}({\xi}_{i},\eta /{\vee}_{k=0}^{i-1}{\xi}_{k})$.
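Chain rules (i) and (iii) can be checked on a small example with $n=2$ or $n=3$. The Python sketch below rests on several assumptions: a hypothetical table of values $\mu(f\cap g\cap h)$ for three two-element fuzzy partitions indexed by $(i,j,k)$, marginals obtained by summing that table, $\xi_0$ read as the trivial partition (conditioning on nothing), and the conditional mutual information taken in the form $I_\mu(\xi,\eta/\varsigma)=H_\mu(\eta/\varsigma)-H_\mu(\eta/\xi\vee\varsigma)$. The helper names `marginal`, `cond_entropy` and `cond_mutual_info` are illustrative only.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(probs, base=2.0):
    """-sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def marginal(joint, axes):
    """Sum the joint table over every coordinate not listed in `axes`."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[a] for a in axes)] += p
    return out

def cond_entropy(joint, target, given=(), base=2.0):
    """H(target / given) = sum_c mu(c) H(target / c); given=() is the trivial partition."""
    pc = marginal(joint, given)
    pac = marginal(joint, target + given)
    total = 0.0
    for c, m in pc.items():
        if m > 0:
            cells = [p / m for key, p in pac.items() if key[len(target):] == c]
            total += m * entropy(cells, base)
    return total

def cond_mutual_info(joint, a, b, given=()):
    """Assumed form I(a, b / given) = H(b / given) - H(b / a v given)."""
    return cond_entropy(joint, b, given) - cond_entropy(joint, b, a + given)

# Hypothetical table mu(f_i ∩ g_j ∩ h_k); the numbers are illustrative only.
weights = [3, 1, 2, 2, 1, 4, 2, 1]
joint = {key: w / sum(weights) for key, w in zip(product(range(2), repeat=3), weights)}
X1, X2, X3 = (0,), (1,), (2,)

# (i), n = 3: H(xi1 v xi2 v xi3) = H(xi1) + H(xi2 / xi1) + H(xi3 / xi1 v xi2)
lhs_i = entropy(joint.values())
rhs_i = cond_entropy(joint, X1) + cond_entropy(joint, X2, X1) + cond_entropy(joint, X3, X1 + X2)

# (iii), n = 2, eta = X3: I(xi1 v xi2, eta) = I(xi1, eta) + I(xi2, eta / xi1)
lhs_iii = cond_mutual_info(joint, X1 + X2, X3)
rhs_iii = cond_mutual_info(joint, X1, X3) + cond_mutual_info(joint, X2, X3, X1)
print(f"(i): {lhs_i:.6f} vs {rhs_i:.6f}; (iii): {lhs_iii:.6f} vs {rhs_iii:.6f}")
```

Conditional entropies are computed directly as weighted averages of row entropies rather than as differences of joint entropies, so the agreement in (i) is a genuine check and not a rearrangement of the same terms.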

**Proof.**

**Definition 5.**

**Theorem 5.**

**Proof.**

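**Theorem 6.** Let $\xi ,\eta ,\varsigma$ be fuzzy partitions of a fuzzy probability space $(\Omega ,M,\mu )$ such that $\xi \to \eta \to \varsigma$. Then: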

- (i) ${I}_{\mu}(\xi \vee \eta ,\varsigma )={I}_{\mu}(\eta ,\varsigma )$;
- (ii) ${I}_{\mu}(\eta ,\varsigma )={I}_{\mu}(\xi ,\varsigma )+{I}_{\mu}(\varsigma ,\eta /\xi )$;
- (iii) ${I}_{\mu}(\xi ,\eta /\varsigma )\le {I}_{\mu}(\xi ,\eta )$;
- (iv) ${I}_{\mu}(\xi ,\eta )\ge {I}_{\mu}(\xi ,\varsigma )$.

**Proof.**

- (i) Since, by assumption, ${I}_{\mu}(\xi ,\varsigma /\eta )=0$, the chain rule for mutual information gives:
  $${I}_{\mu}(\xi \vee \eta ,\varsigma )={I}_{\mu}(\eta \vee \xi ,\varsigma )={I}_{\mu}(\eta ,\varsigma )+{I}_{\mu}(\xi ,\varsigma /\eta )={I}_{\mu}(\eta ,\varsigma ).$$
- (ii) According to Theorem 3, ${I}_{\mu}(\xi \vee \eta ,\varsigma )={I}_{\mu}(\varsigma ,\xi )+{I}_{\mu}(\varsigma ,\eta /\xi )$. Hence, using part (i) of this theorem, we obtain:
  $${I}_{\mu}(\eta ,\varsigma )={I}_{\mu}(\xi \vee \eta ,\varsigma )={I}_{\mu}(\varsigma ,\xi )+{I}_{\mu}(\varsigma ,\eta /\xi ).$$
- (iii) Since ${I}_{\mu}(\xi ,\varsigma )\ge 0$, equality (ii) yields the inequality:
  $${I}_{\mu}(\varsigma ,\eta /\xi )\le {I}_{\mu}(\varsigma ,\eta ).$$
  By Theorem 5 we can interchange $\xi$ and $\varsigma$; doing so, we obtain ${I}_{\mu}(\xi ,\eta /\varsigma )\le {I}_{\mu}(\xi ,\eta )$.
- (iv) By Theorem 3, the mutual information ${I}_{\mu}(\xi ,\eta \vee \varsigma )$ can be expressed in two different ways:
  $$\begin{array}{c}{I}_{\mu}(\xi ,\eta \vee \varsigma )={I}_{\mu}(\xi ,\eta )+{I}_{\mu}(\xi ,\varsigma /\eta )\\ ={I}_{\mu}(\xi ,\varsigma )+{I}_{\mu}(\xi ,\eta /\varsigma ).\end{array}$$
  Since $\xi \to \eta \to \varsigma$, we have ${I}_{\mu}(\xi ,\varsigma /\eta )=0$, and therefore ${I}_{\mu}(\xi ,\eta )={I}_{\mu}(\xi ,\eta \vee \varsigma )$. Using the second equality, we obtain:
  $${I}_{\mu}(\xi ,\eta )={I}_{\mu}(\xi ,\eta \vee \varsigma )={I}_{\mu}(\xi ,\varsigma )+{I}_{\mu}(\xi ,\eta /\varsigma ).$$
  Since ${I}_{\mu}(\xi ,\eta /\varsigma )\ge 0$, we have ${I}_{\mu}(\xi ,\eta )\ge {I}_{\mu}(\xi ,\varsigma )$. □
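The four statements of Theorem 6 can be observed on a concrete chain. The Python sketch below builds a hypothetical table of values $\mu(f_i\cap g_j\cap h_k)$ in the factorized form $\mu(f_i)\,p(g_j\mid f_i)\,p(h_k\mid g_j)$, a construction assumed here because it makes ${I}_{\mu}(\xi ,\varsigma /\eta )=0$, which is how the proof uses $\xi \to \eta \to \varsigma$. As before, marginals are assumed to be sums of the table, and the conditional mutual information is assumed to have the form $I_\mu(\xi,\eta/\varsigma)=H_\mu(\eta/\varsigma)-H_\mu(\eta/\xi\vee\varsigma)$; all helper names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(probs, base=2.0):
    """-sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def marginal(joint, axes):
    """Sum the joint table over every coordinate not listed in `axes`."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[a] for a in axes)] += p
    return out

def cond_entropy(joint, target, given=(), base=2.0):
    """H(target / given) = sum_c mu(c) H(target / c); given=() is the trivial partition."""
    pc = marginal(joint, given)
    pac = marginal(joint, target + given)
    total = 0.0
    for c, m in pc.items():
        if m > 0:
            cells = [p / m for key, p in pac.items() if key[len(target):] == c]
            total += m * entropy(cells, base)
    return total

def cond_mutual_info(joint, a, b, given=()):
    """Assumed form I(a, b / given) = H(b / given) - H(b / a v given)."""
    return cond_entropy(joint, b, given) - cond_entropy(joint, b, a + given)

# Hypothetical chain xi -> eta -> sigma built from transition tables
p_xi = [0.3, 0.7]
p_eta_given_xi = [[0.8, 0.2], [0.25, 0.75]]
p_sig_given_eta = [[0.6, 0.4], [0.1, 0.9]]
joint = {
    (i, j, k): p_xi[i] * p_eta_given_xi[i][j] * p_sig_given_eta[j][k]
    for i, j, k in product(range(2), repeat=3)
}
XI, ETA, SIG = (0,), (1,), (2,)

markov_gap = cond_mutual_info(joint, XI, SIG, ETA)  # I(xi, sigma / eta), zero up to rounding
I_xi_eta = cond_mutual_info(joint, XI, ETA)
I_xi_sig = cond_mutual_info(joint, XI, SIG)
I_xi_eta_given_sig = cond_mutual_info(joint, XI, ETA, SIG)
print(f"I(xi,sigma/eta) = {markov_gap:.1e}")
print(f"I(xi,eta) = {I_xi_eta:.4f} >= I(xi,sigma) = {I_xi_sig:.4f}")
print(f"I(xi,eta/sigma) = {I_xi_eta_given_sig:.4f} <= I(xi,eta)")
```

Part (iv) is the fuzzy analogue of the data-processing inequality: passing from $\eta$ to $\varsigma$ cannot increase the information carried about $\xi$.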

## 4. Kullback-Leibler Divergence with Respect to Fuzzy P-Measures

**Definition 6.**

**Remark 2.**

**Example 2.**

**Theorem 7.**

**Proof.**

**Theorem 8.**

**Proof.**

**Corollary 1.**

**Proof.**

**Remark 3.**

- (i) A function $f$ is concave over an interval if and only if the function $-f$ is convex over that interval.
- (ii) The sum of two concave functions is concave; the sum of two convex functions is convex.
- (iii) Every real-valued affine function, i.e., every function of the form $f(x)=ax+b$ with $a,b\in \mathbb{R}$, is simultaneously convex and concave.
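The convexity facts in Remark 3 feed into the concavity of entropy (Theorem 9) and the convexity of K-L divergence (Theorem 10). The Python sketch below only illustrates these properties numerically under assumptions that the surviving text does not confirm: it takes the K-L divergence on a fuzzy partition to have the usual form $D_\xi(\mu\|\nu)=\sum_i \mu(f_i)\log(\mu(f_i)/\nu(f_i))$, assumes that a convex combination $\lambda\mu_1+(1-\lambda)\mu_2$ of fuzzy P-measures is again a fuzzy P-measure, uses hypothetical values of four measures on one three-element partition, and checks joint convexity in the pair $(\mu,\nu)$, which may be stronger than what Theorem 10 states.

```python
import math

def entropy(p, base=2.0):
    """-sum p log p, with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

def kl_divergence(p, q, base=2.0):
    """Assumed form D(mu || nu) = sum_i mu(f_i) log(mu(f_i) / nu(f_i)) on one partition."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y == 0:
                return math.inf
            total += x * math.log(x / y, base)
    return total

def mix(p, q, lam):
    """Values of the assumed convex combination lam * mu_1 + (1 - lam) * mu_2."""
    return [lam * x + (1 - lam) * y for x, y in zip(p, q)]

# Hypothetical values mu(f_i), nu(f_i) of four measures on a three-element fuzzy partition
mu1, mu2 = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]
nu1, nu2 = [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]

checks = []
for k in range(11):
    lam = k / 10
    # Concavity of entropy: H(lam mu1 + (1 - lam) mu2) >= lam H(mu1) + (1 - lam) H(mu2)
    concave_ok = entropy(mix(mu1, mu2, lam)) >= (
        lam * entropy(mu1) + (1 - lam) * entropy(mu2) - 1e-12
    )
    # Joint convexity: D(mixed mu || mixed nu) <= lam D(mu1 || nu1) + (1 - lam) D(mu2 || nu2)
    convex_ok = kl_divergence(mix(mu1, mu2, lam), mix(nu1, nu2, lam)) <= (
        lam * kl_divergence(mu1, nu1) + (1 - lam) * kl_divergence(mu2, nu2) + 1e-12
    )
    checks.append(concave_ok and convex_ok)

print(all(checks))  # True
```

The small tolerance absorbs rounding at $\lambda\in\{0,1\}$, where both sides of each inequality coincide.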

**Proposition 1.**

**Theorem 9 (Concavity of entropy).**

**Proof.**

**Theorem 10 (Convexity of K-L divergence).**

**Proof.**

**Theorem 11.**

**Proof.**

**Definition 7.**

**Theorem 12 (Chain rule for K-L divergence).**

**Proof.**

## 5. Discussion

## Acknowledgments

## Conflicts of Interest

