# Improvement of the k-nn Entropy Estimator with Applications in Systems Biology


## Abstract


## 1. Introduction

## 2. k-nn Estimators of Differential Entropy

**Figure 1.** Performance of the k-nn estimator in Equation (1) for increasing sample size: box plots of estimated entropy values for samples from a uniform distribution on the interval [0,1] (**top**); on the hypercube ${[0,1]}^{5}$ (**center**); and on the hypercube ${[0,1]}^{15}$ (**bottom**); with the real entropy value denoted by the red line.
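For reference, the Kozachenko–Leonenko k-nn estimator of Equation (1) can be sketched as follows. This is a minimal brute-force implementation in our own notation (function and variable names are not from the paper): the estimate combines digamma terms, the log-volume of the d-dimensional unit ball, and the average log-distance to each sample's k-th nearest neighbour.

```python
import math
import numpy as np

def digamma_int(n):
    # digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))

def knn_entropy(x, k=1):
    """Kozachenko-Leonenko k-nn differential entropy estimate (in nats).

    x: (n, d) array of i.i.d. samples.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # brute-force pairwise Euclidean distances (fine for illustration)
    dists = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)           # exclude each point itself
    eps = np.sort(dists, axis=1)[:, k - 1]    # distance to the k-th neighbour
    # log-volume of the d-dimensional unit ball
    log_c_d = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return digamma_int(n) - digamma_int(k) + log_c_d + d * np.mean(np.log(eps))
```

For uniform samples on [0,1] the true entropy is 0, and the estimate approaches it as the sample size grows; in higher dimensions the convergence is much slower, as Figure 1 illustrates.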

## 3. Bias Correction of the k-nn Entropy Estimator

**Theorem 1** (Lebesgue)**.** Let $p\left(x\right)\in {L}_{1}\left({\mathbb{R}}^{d}\right)$; then, for almost all $x\in {\mathbb{R}}^{d}$ and open balls $B(x,{r}_{n})$ with radius ${r}_{n}\to 0$:

$$\lim_{n\to \infty }\frac{1}{\lambda \left(B(x,{r}_{n})\right)}{\int }_{B(x,{r}_{n})}p(y)\,dy=p(x),$$

where $\lambda$ denotes the Lebesgue measure on ${\mathbb{R}}^{d}$.

**Figure 2.** Comparison of the k-nn entropy bias estimate by Sricharan et al. [8] (blue line) with the real bias based on the k-nn entropy estimation from sampled points (green line) and the k-nn entropy bias estimate obtained by our method (red line).

## 4. k-nn Estimator Performance for Different Distributions

#### 4.1. Independent Marginals Case

**Figure 3.** Bias of the entropy estimator for growing dimension, for the original k-nn entropy estimator in Equation (1) (red) and the corrected entropy estimator in Equation (6) (blue), for multivariate random variables with independent marginals sampled from a uniform distribution on the interval [0,1] (**left**) and from the Beta(3,1) distribution (**right**), for two different sample sizes (**top** and **bottom**).

**Figure 4.** Box plots, for growing sample size, of entropy values estimated with the k-nn entropy estimator in Equation (1) (**top**) and the corrected entropy estimator in Equation (6) (**center**), for four-dimensional random variables with independent marginals sampled from a uniform distribution on the interval [0,1] (**left**) and from the Beta(3,1) distribution (**right**), with the real entropy value denoted by the red line. The bottom panels show histograms of the marginal distributions.

#### 4.2. Dependent Marginals Case

**Figure 5.** Bivariate distributions with a Gaussian copula dependence structure with correlation coefficient $\rho =0.5$: marginals sampled from a uniform distribution on the interval [0,1] (**left**) vs. marginals sampled from the Beta(3,1) distribution (**right**).
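The construction behind such samples can be sketched as follows: draw correlated Gaussians, push each coordinate through the standard normal CDF to obtain uniform marginals coupled by a Gaussian copula, and optionally transform each marginal by an inverse CDF, e.g. that of Beta(3,1). This is our own illustrative sketch (function names are ours, not the paper's):

```python
import numpy as np
from math import erf, sqrt

def gaussian_copula_sample(n, rho, dim=2, seed=0):
    """Sample n points with Uniform(0,1) marginals whose dependence
    structure is a Gaussian copula with correlation coefficient rho."""
    rng = np.random.default_rng(seed)
    cov = np.full((dim, dim), rho)
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(np.zeros(dim), cov, size=n)
    # the standard normal CDF maps each Gaussian marginal to Uniform(0,1)
    phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
    return phi(z)

# Beta(3,1) has CDF F(x) = x^3 on [0,1], so its inverse is the cube root;
# applying it marginal-wise yields Beta(3,1) marginals with the same copula.
def to_beta31(u):
    return u ** (1.0 / 3.0)
```

Because the copula is applied before the marginal transform, the dependence structure is identical in both panels of Figure 5 even though the marginal densities differ.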

**Figure 6.** Bias of the entropy estimator for growing dimension, for the original k-nn entropy estimator in Equation (1) (red) and the corrected entropy estimator in Equation (6) (blue), for multivariate random variables with dependent marginals sampled from a uniform distribution on the interval [0,1] (**left**) and from the Beta(3,1) distribution (**right**), for two different sample sizes (**top** and **bottom**). The dependence structure is given by a Gaussian copula with correlation coefficient $\rho =0.5$ among marginals.

## 5. Sensitivity Indices Based on the k-nn Entropy Estimator

**Definition 2.** The mutual information between continuous random variables $X\sim p\left(x\right)$ and $Y\sim p\left(y\right)$ is defined by:

$$I(X;Y)=\int \int p(x,y)\log \frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy.$$

**Definition 3.** The conditional entropy of a random variable $X\sim p\left(x\right)$ given random variable $Y\sim p\left(y\right)$ is defined as:

$$H(X|Y)=-\int \int p(x,y)\log p(x|y)\,dx\,dy.$$

| information theory | set theory |
| --- | --- |
| $H(X,Y)$ | $X\cup Y$ |
| $I(X;Y)$ | $X\cap Y$ |
| $H(X\mid Y)$ | $X\setminus Y$ |
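This correspondence rests on the identity $I(X;Y)=H(X)+H(Y)-H(X,Y)$, which is also how mutual information is obtained from entropy estimates in practice. A minimal discrete illustration of the identity (our own sketch, not the paper's continuous estimator):

```python
import math
from collections import Counter

def discrete_entropy(symbols):
    """Plug-in Shannon entropy (in nats) of a sample of discrete symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), with the joint treated as pairs
    return discrete_entropy(x) + discrete_entropy(y) - discrete_entropy(list(zip(x, y)))
```

For perfectly dependent samples over a two-symbol alphabet the result is $H(X)=\log 2$; for independent samples it tends to zero, matching the set-theoretic picture of $I(X;Y)$ as the "intersection" of the two variables' information.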

**Definition 4.** Assume that ${X}_{i}$ are the parameters of the model and Y is the model output; then, the single sensitivity indices are defined as:

**Definition 5.** If ${X}_{i}$ are the parameters of the model and Y is the model output, then the interaction indices within a pair of parameters are defined by:

**Corollary 6.** The interaction index within the pair of parameters ${X}_{i}$ and ${X}_{j}$ can be expressed as the sum of the single sensitivity indices of parameter ${X}_{i}$ and parameter ${X}_{j}$ with respect to the output variable Y, minus the group sensitivity index for the pair of parameters:
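In information-theoretic notation, assuming the single index is the mutual information $I({X}_{i};Y)$ and the group index for a pair is $I({X}_{i},{X}_{j};Y)$ (consistent with Definitions 4 and 5), this is McGill's interaction information:

$$I({X}_{i};{X}_{j};Y)=I({X}_{i};Y)+I({X}_{j};Y)-I({X}_{i},{X}_{j};Y).$$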

#### 5.1. Case Study: Model of the p53-Mdm2 Feedback Loop

| initial condition | species |
| --- | --- |
| ${Y}_{1}\left({t}_{0}\right)=0$ | p53 protein |
| ${Y}_{2}\left({t}_{0}\right)=0.8$ | Mdm2 ligase |
| ${Y}_{3}\left({t}_{0}\right)=0.1$ | Mdm2 mRNA, precursor of the Mdm2 ligase |

| parameter | role | parameter | role |
| --- | --- | --- | --- |
| ${p}_{1}=0.9$ | p53 production | ${p}_{4}=0.8$ | Mdm2 transcription |
| ${p}_{2}=1.7$ | Mdm2-dependent p53 degradation | ${p}_{5}=0.8$ | Mdm2 degradation |
| ${p}_{3}=1.1$ | p53-dependent Mdm2 production | ${p}_{6}=0$ | independent p53 degradation |
| $k=0.0001$ | p53 threshold for degradation by Mdm2 | | |
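A qualitative simulation of such a negative feedback loop can be sketched from the initial conditions and parameter values above. The right-hand sides below are a hypothetical formulation consistent with the listed parameter roles, not necessarily the paper's exact equations; the Michaelis-like switch $y_1/(y_1+k)$ implements the threshold $k$ for Mdm2-dependent p53 degradation:

```python
import numpy as np

def p53_mdm2_rhs(y, p1=0.9, p2=1.7, p3=1.1, p4=0.8, p5=0.8, p6=0.0, k=1e-4):
    # y = (p53, Mdm2 ligase, Mdm2 mRNA); a plausible formulation only
    y1, y2, y3 = y
    dy1 = p1 - p2 * y2 * y1 / (y1 + k) - p6 * y1  # production minus Mdm2-dependent/independent degradation
    dy3 = p3 * y1 - p4 * y3                        # p53-dependent mRNA production, consumed by transcription
    dy2 = p4 * y3 - p5 * y2                        # Mdm2 produced from mRNA, degraded at rate p5
    return np.array([dy1, dy2, dy3])

def simulate(y0=(0.0, 0.8, 0.1), n_steps=50000, dt=0.001):
    """Forward-Euler integration; adequate for a qualitative sketch."""
    y = np.array(y0, dtype=float)
    traj = [y.copy()]
    for _ in range(n_steps):
        # small dt because the threshold term is stiff; clamp keeps
        # concentrations nonnegative despite Euler overshoot
        y = np.maximum(y + dt * p53_mdm2_rhs(y), 0.0)
        traj.append(y.copy())
    return np.array(traj)
```

With the sharp threshold $k=0.0001$, the Mdm2-dependent degradation acts almost as an on/off switch in p53, which is the mechanism behind the oscillatory behavior shown in Figure 7.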

**Figure 7.** Graphical scheme of the modelled system (**left**); and the oscillatory behavior for the considered parameter values (**right**).

**Figure 8.** Sensitivity indices based on mutual information (MI) (**left**); and local sensitivity analysis based on derivatives of the variables w.r.t. the parameters, averaged over time (**right**).

**Figure 9.** Sensitivity indices for pairs of parameters (**left**); and interaction indices within pairs of parameters for the model output (**right**).

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix

## A. Performance of the Corrected k-nn Estimator

**Figure A1.** Box plots, for growing sample size, of entropy values estimated with the k-nn entropy estimator in Equation (1) (**top**) and the corrected entropy estimator in Equation (6) (**middle**), for four-dimensional random variables with independent marginals sampled from a standard normal distribution (**left**) and from an Exponential(1) distribution (**right**), with the real entropy value denoted by the red line. The bottom panels show histograms of the marginal distributions.

**Figure A2.** Bias of the entropy estimator for growing dimension, for the original k-nn entropy estimator in Equation (1) (red) and the corrected entropy estimator in Equation (6) (blue), for multivariate random variables with independent marginals sampled from a standard normal distribution (**left**) and from an Exponential(1) distribution (**right**), for two different sample sizes (**top** and **bottom**).

**Figure A3.** Gaussian copula probability density function for two marginal variables with correlation coefficient $\rho =0.5$ (**left**); and the two-dimensional copula function with correlation coefficient $\rho =0.5$ (**right**).

## B. Sketch of the Proof of k-nn Estimator Convergence [7]

**Figure B1.** Venn diagrams for information-theoretic measures, including interactions between pairs of parameters.

**Figure B2.** Possible trajectories of the p53-Mdm2 negative feedback loop with perturbed parameters. The vertical axis denotes time; the horizontal axis denotes species concentration. The blue line corresponds to the p53 protein concentration, the green line to the Mdm2 mRNA concentration, and the red line to the Mdm2 ligase concentration.

## References

- Clausius, R. On the Motive Power of Heat, and on the Laws which can be deduced from it for the Theory of Heat. Poggendorff’s Annalen der Physik
**1850**, LXXIX. [Google Scholar] - Maxwell, J.C. Theory of Heat; Longmans, Green and Co.: London, UK, 1871. [Google Scholar]
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J.
**1948**, 27, 379–423. [Google Scholar] [CrossRef] - Lazo, A.; Rathie, P. On the entropy of continuous probability distributions. IEEE Trans. Inf. Theory
**1978**, 24, 120–122. [Google Scholar] [CrossRef] - Westerhoff, H.V.; Palsson, B.O. The evolution of molecular biology into systems biology. Nat. Biotechnol.
**2004**, 22, 1249–1252. [Google Scholar] [CrossRef] [PubMed] - Lüdtke, N.; Panzeri, S.; Brown, M.; Broomhead, D.; Knowles, J.; Montemurro, M.A.; Kell, D. Information-theoretic sensitivity analysis: A general method for credit assignment in complex networks. J. R. Soc. Interface
**2008**, 5, 223–235. [Google Scholar] [CrossRef] [PubMed] - Kozachenko, L.; Leonenko, N. Sample Estimate of the Entropy of a Random Vector. Probl. Peredachi Inf.
**1987**, 23, 9–16. [Google Scholar] - Sricharan, K.; Raich, R.; Hero, A.O. Boundary Compensated k-NN Graphs. In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, Kittilä, Finland, 29 August–1 September 2010.
- Dorval, A. Probability distributions of the logarithm of inter-spike intervals yield accurate entropy estimates from small datasets. J. Neurosci. Methods
**2008**, 173, 129–139. [Google Scholar] [CrossRef] [PubMed] - Raykar, V.C. Probability Density Function Estimation by different Methods; Report for Course assignment 1 of ENEE 739Q SPRING; University of Maryland: College Park, MD, USA, 2002. [Google Scholar]
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E
**2004**, 69, 066138. [Google Scholar] [CrossRef] [PubMed] - Pérez-Cruz, F. Estimation of Information Theoretic Measures for Continuous Random Variables. In Proceedings of the Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, BC, Canada, 2008; p. 1260.
- Leonenko, N.; Pronzato, L.; Savani, V. A class of Rényi information estimators for multidimensional densities. Ann. Statist.
**2008**, 36, 2153–2182. [Google Scholar] [CrossRef] [Green Version] - Penrose, M.D.; Yukich, J.E. Limit theory for point processes in manifolds. Ann. Appl. Probab.
**2013**, 23, 2161–2211. [Google Scholar] [CrossRef] - Leonenko, N.; Pronzato, L. Correction: A class of Rényi information estimators for multidimensional densities. Ann. Statist.
**2010**, 38, 3837–3838. [Google Scholar] [CrossRef] - Wang, Q.; Kulkarni, S.R.; Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Trans. Inform. Theory
**2009**, 55, 2392–2405. [Google Scholar] [CrossRef] - Miller, E.G. A new class of entropy estimators for multidimensional densities. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP ’03), Hong Kong, China, 6–10 April 2003.
- Learned-Miller, E.G. Hyperspacings and the estimation of information theoretic quantities; University of Massachusetts-Amherst Technical Report: Amherst, MA, USA, 2004. [Google Scholar]
- Van der Meulen, E.C.; Tsybakov, A.B. Root-n consistent estimators of entropy for densities with unbounded support. Scand. J. Stat.
**1996**, 23, 75–83. [Google Scholar] - Sricharan, K.; Raich, R.; Hero, A. Optimized intrinsic dimension estimation. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; pp. 5418–5421.
- Sricharan, K.; Raich, R.; Hero, A. Estimation of nonlinear functionals of densities with confidence. IEEE Trans. Inform. Theory
**2012**, 58, 4135–4159. [Google Scholar] [CrossRef] [Green Version] - Sricharan, K.; Wei, D.; Hero, A. Ensemble estimators for multivariate entropy estimation. IEEE Trans. Inform. Theory
**2013**, 59, 4374–4388. [Google Scholar] [CrossRef] [PubMed] - Ma, J.; Sun, Z. Mutual Information is Copula Entropy. Available online: http://arxiv.org/abs/0808.0845v1 (accessed on 23 December 2015).
- white McGill, W.J. Multivariate information transmission. Psychometrika
**2006**, 19, 97–116. [Google Scholar] [CrossRef] - Geva-Zatorsky, N.; Rosenfeld, N.; Itzkovitz, S.; Milo, R.; Sigal, A.; Dekel, E.; Yarnitzky, T.; Liron, Y.; Polak, P.; Lahav, G.; et al. Oscillations and variability in the p53 system. Mol. Syst. Biol.
**2006**, 2. [Google Scholar] [CrossRef] [PubMed] - Charzyńska, A.; Nałȩcz, A.; Rybiński, M.; Gambin, A. Sensitivity analysis of mathematical models of signalling pathways. BioTechnol. J. Biotechnol. Comput. Biol. Bionanotechnol.
**2012**, 93, 291–308. [Google Scholar]

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Charzyńska, A.; Gambin, A.
Improvement of the *k*-nn Entropy Estimator with Applications in Systems Biology. *Entropy* **2016**, *18*, 13.
https://doi.org/10.3390/e18010013
