# On Normalized Mutual Information: Measure Derivations and Properties


## Abstract


## 1. Introduction

Neither $I$ nor ${I}^{\ast}$ is strictly a function of $X$ or $Y$; each is a function of the probability distribution $\{p({x}_{i},{y}_{j})\}$ (with marginals $p({x}_{i})={\sum }_{j=1}^{J}p({x}_{i},{y}_{j})$ and $p({y}_{j})={\sum }_{i=1}^{I}p({x}_{i},{y}_{j})$). Similarly, $p$ is used for both the joint probability and the marginal probabilities instead of ${p}_{XY}$, ${p}_{X}$, and ${p}_{Y}$. When necessary for the sake of clarity, $I\left(\{p({x}_{i},{y}_{j})\}\right)$ and ${I}^{\ast}\left(\{p({x}_{i},{y}_{j})\}\right)$ are sometimes used.
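Concretely, once the joint distribution $\{p({x}_{i},{y}_{j})\}$ is fixed, the marginals, and hence $I(X;Y)$ itself, follow from it alone. A minimal sketch of this dependence (the joint distribution below is an arbitrary illustrative choice, not one from the paper):

```python
import math

# Illustrative joint distribution {p(x_i, y_j)} (an assumption, not from the paper).
p = [[0.25, 0.25],
     [0.05, 0.45]]

# Marginals are derived from the joint: p(x_i) = sum_j p(x_i, y_j), etc.
px = [sum(row) for row in p]
py = [sum(col) for col in zip(*p)]

# I(X;Y) is therefore a function of {p(x_i, y_j)} alone (natural logarithms).
I = sum(pij * math.log(pij / (px[i] * py[j]))
        for i, row in enumerate(p)
        for j, pij in enumerate(row) if pij > 0)
```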

## 2. Mutual Information and Upper Bounds

#### 2.1. Pairwise Measure

- (i) $I({x}_{i};{y}_{j})\ge 0$.
- (ii) $I({x}_{i};{y}_{j})=0$ if, and only if, the events $X={x}_{i}$ and $Y={y}_{j}$ are independent.
- (iii) $I({x}_{i};{y}_{j})=I({y}_{j};{x}_{i})$, i.e., $I$ is symmetric in the events $X={x}_{i}$ and $Y={y}_{j}$.
- (iv) $\sum _{i=1}^{I}\sum _{j=1}^{J}p({x}_{i})p({y}_{j})I({x}_{i};{y}_{j})=I(X;Y)$ in (1).
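This excerpt omits the explicit definition of the pairwise measure, but one form consistent with all four properties is $I({x}_{i};{y}_{j})=t\,\mathrm{ln}\,t-t+1$ with $t=p({x}_{i},{y}_{j})/\left(p({x}_{i})p({y}_{j})\right)$: the function $t\,\mathrm{ln}\,t-t+1$ is nonnegative with equality only at $t=1$ (independence), $t$ is symmetric in the two events, and the linear terms cancel in the weighted sum of (iv). A sketch checking the properties numerically, where both the closed form and the joint distribution are illustrative assumptions rather than formulas taken from the paper:

```python
import math

# Illustrative joint distribution (an assumption, not from the paper).
p = [[0.4, 0.1],
     [0.2, 0.3]]
px = [sum(row) for row in p]
py = [sum(col) for col in zip(*p)]

def pair_I(i, j):
    """Candidate pairwise measure: t*ln(t) - t + 1, t = p_ij / (p_i * p_j).
    Symmetric in the two events, since t itself is."""
    t = p[i][j] / (px[i] * py[j])
    return t * math.log(t) - t + 1

# (i) nonnegativity for every pair of events.
assert all(pair_I(i, j) >= 0 for i in range(2) for j in range(2))

# (iv) the (p_i * p_j)-weighted mean recovers the mean measure I(X;Y).
weighted = sum(px[i] * py[j] * pair_I(i, j) for i in range(2) for j in range(2))
I_XY = sum(p[i][j] * math.log(p[i][j] / (px[i] * py[j]))
           for i in range(2) for j in range(2))
assert abs(weighted - I_XY) < 1e-12
```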

#### 2.2. Mean Measures

#### 2.3. Conditional Measures

## 3. Normalizations

## 4. Weighted Mutual Information

## 5. Value Validity

#### 5.1. Value-Validity Consideration

#### 5.2. Value-Validity Corrections of ${I}^{\ast}$

#### 5.3. Numerical Example

## 6. Conclusions

## Conflicts of Interest

## References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
2. Reza, F.M. An Introduction to Information Theory; McGraw-Hill: New York, NY, USA, 1961.
3. Hamming, R.W. Coding and Information Theory; Prentice-Hall: Englewood Cliffs, NJ, USA, 1980.
4. Han, T.S.; Kobayashi, K. Mathematics of Information and Coding; American Mathematical Society: Providence, RI, USA, 2002.
5. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 2006.
6. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
7. MacKay, D.J.C. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
8. Horibe, Y. Entropy and correlation. IEEE Trans. Syst. Man Cybern. 1985, SMC-15, 641–642.
9. Kvålseth, T.O. Entropy and correlation: Some comments. IEEE Trans. Syst. Man Cybern. 1987, SMC-17, 517–519.
10. Wickens, T.D. Multiway Contingency Tables Analysis for the Social Sciences; Lawrence Erlbaum: Hillsdale, NJ, USA, 1989.
11. Tang, W.; He, H.; Tu, X.M. Applied Categorical and Count Data Analysis; CRC Press: Boca Raton, FL, USA, 2012.
12. Pfitzer, D.; Leibbrandt, R.; Powers, D. Characterization and evaluation of similarity measures of pairs of clusterings. Knowl. Inf. Syst. 2009, 19, 361–394.
13. Yang, Y.; Ma, Z.; Yang, Y.; Nie, F.; Shen, H.T. Multitask spectral clustering by exploring intertask correlation. IEEE Trans. Cybern. 2015, 45, 1069–1080.
14. Jain, N.; Murthy, C.A. A new estimate of mutual information based measure of dependence between two variables: Properties and fast implementation. Int. J. Mach. Learn. Cybern. 2015, 7, 857–875.
15. Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R.; McVean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.C. Detecting novel associations in large data sets. Science 2011, 334, 1518–1524.
16. Hu, B.-G. Information measure toolbox for classifier evaluation on open source software Scilab. In Proceedings of the 2009 IEEE International Workshop on Open-Source Software for Scientific Computing (OSSC-2009), Guiyang, China, 18–20 September 2009; pp. 179–184.
17. Hossny, M.; Nahavandi, S.; Creighton, D. Comments on ‘Information measure for performance of image fusion’. Electron. Lett. 2009, 44, 1066–1067.
18. Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: Cambridge, UK, 1934.
19. Beckenbach, E.F.; Bellman, R. Inequalities; Springer: Heidelberg, Germany, 1971.
20. Stolarsky, K.B. Generalizations of the logarithmic mean. Math. Mag. 1975, 48, 87–92.
21. Ebanks, B. Looking for a few good means. Am. Math. Mon. 2012, 119, 658–669.
22. Chen, S.; Ma, B.; Zhang, K. On the similarity metric and the distance metric. Theor. Comput. Sci. 2009, 410, 2365–2376.
23. Strehl, A.; Ghosh, J. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002, 3, 583–617.
24. Belis, M.; Guiasu, S. A quantitative-qualitative measure in cybernetic systems. IEEE Trans. Inf. Theory 1968, 14, 593–594.
25. Guiasu, S. Information Theory with Applications; McGraw-Hill: New York, NY, USA, 1977.
26. Taneja, H.C.; Tuteja, R.K. Characterization of a quantitative-qualitative measure of relative information. Inf. Sci. 1984, 33, 217–222.
27. Kapur, J.N. Measures of Information and Their Applications; Wiley Eastern: New Delhi, India, 1994.
28. Luan, H.; Qi, F.; Xue, Z.; Chen, L.; Shen, D. Multimodality image registration by maximization of quantitative-qualitative measures of mutual information. Pattern Recognit. 2008, 41, 285–298.
29. Schaffernicht, E.; Gross, H.-M. Weighted mutual information for feature selection. In Proceedings of the 21st International Conference on Artificial Neural Networks, Part II, Espoo, Finland, 14–17 June 2011; pp. 181–188.
30. Pocock, A.C. Feature Selection via Joint Likelihood. Ph.D. Thesis, School of Computer Science, University of Manchester, Manchester, UK, 2012.
31. Kvålseth, T.O. The relative useful information measure: Some comments. Inf. Sci. 1991, 56, 35–38.
32. Hand, D.J. Measurement Theory and Applications; Wiley: Chichester, UK, 2004.
33. Kvålseth, T.O. Entropy evaluation based on value validity. Entropy 2014, 16, 4855–4873.
34. Kvålseth, T.O. Association measures for nominal categorical variables. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; Part 1; pp. 61–64.
35. Kvålseth, T.O. Cautionary note about $R^2$. Am. Stat. 1985, 39, 279–285.
36. Reynolds, H.T. The Analysis of Cross-Classification; The Free Press: New York, NY, USA, 1977.
37. Kendall, M.; Stuart, A. The Advanced Theory of Statistics, Volume 2: Inference and Relationships, 4th ed.; Charles Griffin: London, UK, 1979.

**Table 1.** Values of the normalized forms of the measures in (1), (5), and (8) for the probability distribution $\{{p}_{ij}^{\alpha}\}$ in (42) with differing α-values.

| ${I}^{\ast}$ | $\alpha =0.1$ | $\alpha =0.3$ | $\alpha =0.5$ | $\alpha =0.7$ | $\alpha =0.9$ |
|---|---|---|---|---|---|
| ${I}^{\ast}({x}_{1};{y}_{1})={I}^{\ast}({x}_{2};{y}_{2})$ | 0.01 | 0.07 | 0.20 | 0.42 | 0.77 |
| ${I}^{\ast}({x}_{1};{y}_{2})={I}^{\ast}({x}_{2};{y}_{1})$ | 0.01 | 0.06 | 0.18 | 0.37 | 0.69 |
| ${I}^{\ast}(X;{y}_{1})={I}^{\ast}(X;{y}_{2})$ | 0.01 | 0.07 | 0.19 | 0.39 | 0.71 |
| ${I}^{\ast}(X;Y)$ | 0.01 | 0.07 | 0.19 | 0.39 | 0.71 |

**Table 2.** Values of ${I}^{\ast}$ for ${I}^{\ast}(X;Y)=I(X;Y)/\mathrm{min}\{H(X),H(Y)\}$ and the distribution $\{{p}_{ij}^{\alpha}\}$ in (42) with differing α-values, together with the corresponding values of two functions h that approximately satisfy (50).

| $\alpha$ | ${I}^{\ast}\left(\{{p}_{ij}^{\alpha}\}\right)$ | $\sqrt{{I}^{\ast}\left(\{{p}_{ij}^{\alpha}\}\right)}$ | $1-{\left(1-\sqrt{{I}^{\ast}\left(\{{p}_{ij}^{\alpha}\}\right)}\right)}^{11/9}$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0.1 | 0.0072 | 0.0849 | 0.1027 |
| 0.2 | 0.0291 | 0.1706 | 0.2044 |
| 0.3 | 0.0659 | 0.2567 | 0.3041 |
| 0.4 | 0.1187 | 0.3445 | 0.4033 |
| 0.5 | 0.1887 | 0.4344 | 0.5017 |
| 0.6 | 0.2781 | 0.5274 | 0.5999 |
| 0.7 | 0.3902 | 0.6247 | 0.6981 |
| 0.8 | 0.5310 | 0.7287 | 0.7970 |
| 0.9 | 0.7136 | 0.8447 | 0.8974 |
| 1 | 1 | 1 | 1 |
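Equation (42) is not reproduced in this excerpt, but the tabulated ${I}^{\ast}$ values are numerically consistent (to within rounding) with the symmetric 2×2 family ${p}_{11}^{\alpha}={p}_{22}^{\alpha}=(1+\alpha )/4$, ${p}_{12}^{\alpha}={p}_{21}^{\alpha}=(1-\alpha )/4$. Treating that family as an assumption, a sketch that regenerates a few rows of Table 2, with the two h-transformations applied to the rounded first column:

```python
import math

def I_star(a):
    """Normalized MI I(X;Y)/min{H(X),H(Y)} for the assumed family
    p11 = p22 = (1+a)/4, p12 = p21 = (1-a)/4 (uniform marginals, H = ln 2)."""
    cells = [(1 + a) / 4] * 2 + [(1 - a) / 4] * 2
    # p_i * p_j = 1/4 for every cell, so log(q / (1/4)) = log(4q).
    I = sum(q * math.log(4 * q) for q in cells if q > 0)
    return I / math.log(2)

# Selected rows of Table 2: alpha -> (I*, sqrt(I*), 1 - (1 - sqrt(I*))**(11/9)).
table = {0.1: (0.0072, 0.0849, 0.1027),
         0.5: (0.1887, 0.4344, 0.5017),
         0.9: (0.7136, 0.8447, 0.8974)}

for a, (s, h1, h2) in table.items():
    assert abs(I_star(a) - s) < 1e-4                             # column 1
    assert abs(math.sqrt(s) - h1) < 1e-4                         # column 2
    assert abs(1 - (1 - math.sqrt(s)) ** (11 / 9) - h2) < 1e-4   # column 3
```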

**Table 3.** United States (U.S.) Senate election results in terms of sample probabilities (proportions) $p({x}_{i},{y}_{j})$ for candidate vote (X) and voters’ party identification (Y) (sample size N = 2843). Source: Reynolds ([36] (p. 2)).

| Vote (X) | Democrat $({y}_{1})$ | Independent $({y}_{2})$ | Republican $({y}_{3})$ | Total |
|---|---|---|---|---|
| Democrat $({x}_{1})$ | 0.39 | 0.11 | 0.04 | 0.54 |
| Republican $({x}_{2})$ | 0.07 | 0.12 | 0.27 | 0.46 |
| Total | 0.46 | 0.23 | 0.31 | 1.00 |

Corresponding values for the normalized mutual information measures defined in the text:

- ${I}^{\ast}({x}_{1};{y}_{j})$ = 0.35, 0.01, 0.46; ${I}_{C}^{\ast}({x}_{1};{y}_{j})$ = 0.66, 0.12, 0.75 for j = 1, 2, 3
- ${I}^{\ast}({x}_{2};{y}_{j})$ = 0.33, 0.01, 0.55; ${I}_{C}^{\ast}({x}_{2};{y}_{j})$ = 0.65, 0.13, 0.81 for j = 1, 2, 3
- ${I}^{\ast}(X;{y}_{j})$ = 0.33, 0.01, 0.49; ${I}_{C}^{\ast}(X;{y}_{j})$ = 0.65, 0.13, 0.77 for j = 1, 2, 3
- ${I}^{\ast}(X;Y)$ = 0.31; ${I}_{C}^{\ast}(X;Y)$ = 0.63
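The overall values in the last row of Table 3 can be recomputed directly from the joint proportions. The sketch below assumes the normalization ${I}^{\ast}(X;Y)=I(X;Y)/\mathrm{min}\{H(X),H(Y)\}$ and the correction ${I}_{C}^{\ast}=1-{\left(1-\sqrt{{I}^{\ast}}\right)}^{11/9}$, as stated in the Table 2 caption and column headings; both reproduce the tabulated 0.31 and 0.63:

```python
import math

# Joint proportions p(x_i, y_j) from Table 3 (candidate vote x party identification).
p = [[0.39, 0.11, 0.04],
     [0.07, 0.12, 0.27]]
px = [sum(row) for row in p]           # marginals for X: 0.54, 0.46
py = [sum(col) for col in zip(*p)]     # marginals for Y: 0.46, 0.23, 0.31

H = lambda q: -sum(v * math.log(v) for v in q if v > 0)   # Shannon entropy (nats)
I = sum(p[i][j] * math.log(p[i][j] / (px[i] * py[j]))
        for i in range(2) for j in range(3) if p[i][j] > 0)

I_star = I / min(H(px), H(py))                 # normalization by the smaller entropy
I_C = 1 - (1 - math.sqrt(I_star)) ** (11 / 9)  # value-validity correction

assert round(I_star, 2) == 0.31                # matches I*(X;Y) in Table 3
assert round(I_C, 2) == 0.63                   # matches I_C*(X;Y) in Table 3
```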

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kvålseth, T.O. On Normalized Mutual Information: Measure Derivations and Properties. *Entropy* **2017**, *19*, 631.
https://doi.org/10.3390/e19110631

**AMA Style**

Kvålseth TO. On Normalized Mutual Information: Measure Derivations and Properties. *Entropy*. 2017; 19(11):631.
https://doi.org/10.3390/e19110631

**Chicago/Turabian Style**

Kvålseth, Tarald O. 2017. "On Normalized Mutual Information: Measure Derivations and Properties" *Entropy* 19, no. 11: 631.
https://doi.org/10.3390/e19110631