# On Normalized Mutual Information: Measure Derivations and Properties

## Abstract

## 1. Introduction

Strictly speaking, $I$ and ${I}^{\ast}$ are not functions of $X$ or $Y$ but of the probability distribution $\{p({x}_{i},{y}_{j})\}$, with marginals $p({x}_{i})=\sum_{j=1}^{J}p({x}_{i},{y}_{j})$ and $p({y}_{j})=\sum_{i=1}^{I}p({x}_{i},{y}_{j})$. Similarly, $p$ denotes both the joint and the marginal probabilities, in place of ${p}_{XY}$, ${p}_{X}$, and ${p}_{Y}$. Where clarity requires it, the notation $I\left(\{p({x}_{i},{y}_{j})\}\right)$ and ${I}^{\ast}\left(\{p({x}_{i},{y}_{j})\}\right)$ is used.
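
The marginals are simply the row and column totals of the joint table. Below is a minimal Python sketch. It assumes the joint distribution is stored as a list of rows, with rows indexed by $i$ and columns by $j$. The function name `marginals` is illustrative only.

```python
# Marginal probabilities from a joint distribution {p(x_i, y_j)},
# stored as a list of I rows with J entries each.

def marginals(joint):
    """Return (p_x, p_y): the row sums p(x_i) and the column sums p(y_j)."""
    p_x = [sum(row) for row in joint]          # p(x_i) = sum over j of p(x_i, y_j)
    p_y = [sum(col) for col in zip(*joint)]    # p(y_j) = sum over i of p(x_i, y_j)
    return p_x, p_y

# Usage with the Senate proportions of Table 3 (rows x_1, x_2; columns y_1, y_2, y_3).
joint = [[0.39, 0.11, 0.04],
         [0.07, 0.12, 0.27]]
p_x, p_y = marginals(joint)
print([round(v, 2) for v in p_x])  # [0.54, 0.46]
print([round(v, 2) for v in p_y])  # [0.46, 0.23, 0.31]
```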

## 2. Mutual Information and Upper Bounds

#### 2.1. Pairwise Measure

- (i) $I({x}_{i};{y}_{j})\ge 0$.
- (ii) $I({x}_{i};{y}_{j})=0$ if, and only if, the events $X={x}_{i}$ and $Y={y}_{j}$ are independent.
- (iii) $I({x}_{i};{y}_{j})=I({y}_{j};{x}_{i})$, i.e., $I$ is symmetric in the events $X={x}_{i}$ and $Y={y}_{j}$.
- (iv) $\sum_{i=1}^{I}\sum_{j=1}^{J}p({x}_{i})p({y}_{j})I({x}_{i};{y}_{j})=I(X;Y)$, with $I(X;Y)$ as in (1).
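
The defining formula for $I({x}_{i};{y}_{j})$ does not appear in this excerpt, so properties (i) to (iv) cannot be checked against it directly. As an illustration only, consider the hypothetical candidate $r\ln r-r+1$ with $r={p({x}_{i},{y}_{j})}/[{p({x}_{i})p({y}_{j})}]$. This is not necessarily the paper's measure. It is nonnegative, vanishes only when $r=1$ (independent events), and is symmetric. Assuming (1) is the usual Shannon mutual information in nats, its $p({x}_{i})p({y}_{j})$-weighted sum equals $I(X;Y)$, because the $-r$ and $+1$ terms each sum to 1 and cancel. The sketch below checks (iv) numerically.

```python
import math

def pairwise_candidate(p_xy, p_x, p_y):
    """Hypothetical pairwise measure r*ln(r) - r + 1, r = p(x,y) / (p(x) p(y)).
    Requires p(x) p(y) > 0; uses the convention 0*ln(0) = 0."""
    r = p_xy / (p_x * p_y)
    r_log_r = r * math.log(r) if r > 0 else 0.0
    return r_log_r - r + 1.0

def mutual_information(joint):
    """Shannon mutual information I(X;Y) in nats (assumed form of (1))."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def weighted_pairwise_sum(joint):
    """Left-hand side of property (iv): sum of p(x_i) p(y_j) I(x_i; y_j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p_x[i] * p_y[j] * pairwise_candidate(p, p_x[i], p_y[j])
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p_x[i] * p_y[j] > 0)

joint = [[0.39, 0.11, 0.04],
         [0.07, 0.12, 0.27]]
print(round(weighted_pairwise_sum(joint), 3), round(mutual_information(joint), 3))  # 0.215 0.215
```

Any other nonnegative function of $r$ that is zero only at $r=1$ would satisfy (i) to (iii). The $-r+1$ correction is what makes (iv) hold exactly.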

#### 2.2. Mean Measures

#### 2.3. Conditional Measures

## 3. Normalizations

## 4. Weighted Mutual Information

## 5. Value Validity

#### 5.1. Value-Validity Consideration

#### 5.2. Value-Validity Corrections of ${I}^{\ast}$

#### 5.3. Numerical Example

## 6. Conclusions

## Conflicts of Interest

## References

**Table 1.** Values of the normalized forms of the measures in (1), (5), and (8) for the probability distribution $\{{p}_{ij}^{\alpha}\}$ in (42) with differing α-values.

| ${I}^{\ast}$ | α = 0.1 | α = 0.3 | α = 0.5 | α = 0.7 | α = 0.9 |
|---|---|---|---|---|---|
| ${I}^{\ast}({x}_{1};{y}_{1})={I}^{\ast}({x}_{2};{y}_{2})$ | 0.01 | 0.07 | 0.20 | 0.42 | 0.77 |
| ${I}^{\ast}({x}_{1};{y}_{2})={I}^{\ast}({x}_{2};{y}_{1})$ | 0.01 | 0.06 | 0.18 | 0.37 | 0.69 |
| ${I}^{\ast}(X;{y}_{1})={I}^{\ast}(X;{y}_{2})$ | 0.01 | 0.07 | 0.19 | 0.39 | 0.71 |
| ${I}^{\ast}(X;Y)$ | 0.01 | 0.07 | 0.19 | 0.39 | 0.71 |
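
Equation (42) is not reproduced in this excerpt, so the sketch below assumes a hypothetical stand-in. It is a 2×2 table with uniform marginals, ${p}_{11}={p}_{22}=(1+\alpha)/4$ and ${p}_{12}={p}_{21}=(1-\alpha)/4$. It computes ${I}^{\ast}(X;Y)=I(X;Y)/\min\{H(X),H(Y)\}$, the normalization named in the Table 2 caption. This assumed family reproduces the ${I}^{\ast}(X;Y)$ row above after rounding to two decimals. It also matches the ${I}^{\ast}$ column of Table 2 to within about 0.0001. Even so, it remains an assumption, not the paper's definition.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, with 0*ln(0) taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def normalized_mi_min(joint):
    """I*(X;Y) = I(X;Y) / min{H(X), H(Y)} for a joint table given as rows."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    mi = sum(p * math.log(p / (p_x[i] * p_y[j]))
             for i, row in enumerate(joint)
             for j, p in enumerate(row) if p > 0)
    return mi / min(entropy(p_x), entropy(p_y))

def alpha_family(alpha):
    """Hypothetical 2x2 family with uniform marginals (assumed, not Eq. (42))."""
    a, b = (1 + alpha) / 4, (1 - alpha) / 4
    return [[a, b], [b, a]]

for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(alpha, round(normalized_mi_min(alpha_family(alpha)), 2))
```

The logarithm base cancels in the ratio, so natural logarithms are used throughout.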

**Table 2.** Values of ${I}^{\ast}$, with ${I}^{\ast}(X;Y)=I(X;Y)/\min\{H(X),H(Y)\}$, for the distribution $\{{p}_{ij}^{\alpha}\}$ in (42) with differing α-values. The table also gives the corresponding values for two different functions h that satisfy (50) approximately.

| α | ${I}^{\ast}\left(\{{p}_{ij}^{\alpha}\}\right)$ | $\sqrt{{I}^{\ast}\left(\{{p}_{ij}^{\alpha}\}\right)}$ | $1-{\left(1-\sqrt{{I}^{\ast}\left(\{{p}_{ij}^{\alpha}\}\right)}\right)}^{11/9}$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0.1 | 0.0072 | 0.0849 | 0.1027 |
| 0.2 | 0.0291 | 0.1706 | 0.2044 |
| 0.3 | 0.0659 | 0.2567 | 0.3041 |
| 0.4 | 0.1187 | 0.3445 | 0.4033 |
| 0.5 | 0.1887 | 0.4344 | 0.5017 |
| 0.6 | 0.2781 | 0.5274 | 0.5999 |
| 0.7 | 0.3902 | 0.6247 | 0.6981 |
| 0.8 | 0.5310 | 0.7287 | 0.7970 |
| 0.9 | 0.7136 | 0.8447 | 0.8974 |
| 1 | 1 | 1 | 1 |
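
The third and fourth columns are transformations of the second. The sketch below recomputes them from the tabulated ${I}^{\ast}$ values. The helper names `h_sqrt` and `h_power` are illustrative only. Both maps send 0 to 0 and 1 to 1, which the endpoint rows of the table reflect.

```python
import math

def h_sqrt(u):
    """Third column of Table 2: the square root of I*."""
    return math.sqrt(u)

def h_power(u, k=11 / 9):
    """Fourth column of Table 2: 1 - (1 - sqrt(I*))**(11/9)."""
    return 1.0 - (1.0 - math.sqrt(u)) ** k

# I* values copied from the second column of Table 2.
table2 = {0.1: 0.0072, 0.3: 0.0659, 0.5: 0.1887, 0.7: 0.3902, 0.9: 0.7136}
for alpha, i_star in table2.items():
    print(alpha, round(h_sqrt(i_star), 4), round(h_power(i_star), 4))
```

Because the inputs are the four-decimal tabulated values, the recomputed columns can differ from the printed ones in the last digit.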

**Table 3.** United States (U.S.) Senate election results in terms of sample probabilities (proportions) $p({x}_{i},{y}_{j})$ for candidate vote (X) and voters’ party identification (Y) (sample size N = 2843). Source: Reynolds ([36] (p. 2)).

| Vote (X) / Party Identification (Y) | Democrat (${y}_{1}$) | Independent (${y}_{2}$) | Republican (${y}_{3}$) | Total |
|---|---|---|---|---|
| Democrat (${x}_{1}$) | 0.39 | 0.11 | 0.04 | 0.54 |
| Republican (${x}_{2}$) | 0.07 | 0.12 | 0.27 | 0.46 |
| Total | 0.46 | 0.23 | 0.31 | 1.00 |

Corresponding values of the normalized mutual information measures defined in the text (for j = 1, 2, 3):

- ${I}^{\ast}({x}_{1};{y}_{j})$ = 0.35, 0.01, 0.46; ${I}_{C}^{\ast}({x}_{1};{y}_{j})$ = 0.66, 0.12, 0.75
- ${I}^{\ast}({x}_{2};{y}_{j})$ = 0.33, 0.01, 0.55; ${I}_{C}^{\ast}({x}_{2};{y}_{j})$ = 0.65, 0.13, 0.81
- ${I}^{\ast}(X;{y}_{j})$ = 0.33, 0.01, 0.49; ${I}_{C}^{\ast}(X;{y}_{j})$ = 0.65, 0.13, 0.77
- ${I}^{\ast}(X;Y)$ = 0.31; ${I}_{C}^{\ast}(X;Y)$ = 0.63
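
Assume Table 3 uses the same normalization as Table 2, namely ${I}^{\ast}(X;Y)=I(X;Y)/\min\{H(X),H(Y)\}$. The overall value can then be recomputed from the proportions, as the sketch below does. It also applies the last-column transformation of Table 2, $1-(1-\sqrt{{I}^{\ast}})^{11/9}$, to the result, and this reproduces the reported ${I}_{C}^{\ast}(X;Y)$. That this transformation is the correction behind ${I}_{C}^{\ast}$ is an inference from the two tables, not something stated in this excerpt.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, with 0*ln(0) taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def normalized_mi_min(joint):
    """I*(X;Y) = I(X;Y) / min{H(X), H(Y)} (normalization assumed from Table 2)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    mi = sum(p * math.log(p / (p_x[i] * p_y[j]))
             for i, row in enumerate(joint)
             for j, p in enumerate(row) if p > 0)
    return mi / min(entropy(p_x), entropy(p_y))

def h_power(u, k=11 / 9):
    """Assumed correction h(u) = 1 - (1 - sqrt(u))**(11/9), taken from Table 2."""
    return 1.0 - (1.0 - math.sqrt(u)) ** k

# Proportions from Table 3: rows are votes x_1, x_2; columns are party IDs y_1..y_3.
senate = [[0.39, 0.11, 0.04],
          [0.07, 0.12, 0.27]]
i_star = normalized_mi_min(senate)
print(round(i_star, 2), round(h_power(i_star), 2))  # 0.31 0.63
```

Here $\min\{H(X),H(Y)\}=H(X)$, since the two-category vote variable has the smaller entropy.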

