# Estimation Bias in Maximum Entropy Models


## Abstract


## 1. Introduction

## 2. Results

#### 2.1. Bias in Maximum Entropy Models

**μ**, we can Taylor expand the bias around the true parameters, leading to:
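The equation this sentence introduces is not preserved in this extraction. A second-order expansion consistent with the results quoted below (bias $=-b/(2K)$, with $b$ the normalized bias and $K$ the number of samples) would take the following form; this is a reconstruction sketch, not the paper's exact equation, where $\Sigma_p$ denotes the covariance of the constrained statistics under the true distribution and $\Sigma_q$ their covariance under the maximum entropy fit:

$$
\mathbb{E}\!\left[S_q(\hat{\boldsymbol{\mu}})\right] - S_q(\boldsymbol{\mu})
\;\approx\; \tfrac{1}{2}\,\mathrm{tr}\!\left[\nabla^2_{\boldsymbol{\mu}} S_q(\boldsymbol{\mu})\,\mathrm{Cov}(\hat{\boldsymbol{\mu}})\right]
\;=\; -\frac{1}{2K}\,\mathrm{tr}\!\left[\Sigma_q^{-1}\Sigma_p\right]
\;\equiv\; -\frac{b}{2K},
$$

using $\mathrm{Cov}(\hat{\boldsymbol{\mu}}) = \Sigma_p/K$ and the exponential-family identity $\nabla^2_{\boldsymbol{\mu}} S_q = -\Sigma_q^{-1}$. Within the model class, $\Sigma_p = \Sigma_q$, so $b$ reduces to $m$, the number of constraints.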

**Figure 1.** Sampling bias in maximum entropy models. The equilateral triangle represents a D-dimensional probability space (for the binary model considered here, $D = 2^n - 1$, where n is the dimensionality of $\mathbf{x}$). The cyan lines are contour plots of entropy; the red lines represent the m linear constraints and, thus, lie in a $D-m$ dimensional linear manifold. (**a**) Maximum entropy occurs at the tangential intersection of the constraints with the entropy contours. (**b**) The light red region indicates the range of constraints arising from multiple experiments in which a finite number of samples is drawn in each. Maximum entropy estimates from multiple experiments would lie along the green line. (**c**) As the entropy is concave, averaging the maximum entropy over experiments leads to an estimate that is lower than the true maximum entropy: estimating maximum entropy is subject to downward bias.
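The downward bias sketched in panel (**c**) can be reproduced in a few lines. The following is a minimal sketch, not the paper's code: for a single binary variable, the maximum entropy model matching the mean is just a Bernoulli, and averaging the plug-in entropy over many simulated experiments falls below the true entropy by roughly $1/(2K)$ nats, i.e., normalized bias $b = m = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_entropy(p):
    """Entropy (nats) of a Bernoulli(p): the maximum entropy
    distribution for one binary variable with fixed mean."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

p_true = 0.1            # true mean of the binary variable
K = 100                 # samples per experiment
n_experiments = 200_000

# Each "experiment" estimates the mean from K samples and plugs it
# into the entropy formula (the maximum entropy fit to that mean).
p_hat = rng.binomial(K, p_true, size=n_experiments) / K
S_hat = bernoulli_entropy(p_hat)
S_true = bernoulli_entropy(p_true)

print(f"true entropy : {S_true:.5f} nats")
print(f"mean plug-in : {S_hat.mean():.5f} nats")
print(f"average bias : {S_hat.mean() - S_true:+.5f}  (about -1/(2K) = {-1/(2*K):+.5f})")
```

The concavity of the entropy guarantees the sign of the effect: by Jensen's inequality, the average of the plug-in entropies lies below the entropy at the average constraint values.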

**Figure 2.** Normalized bias, $b$, versus number of samples, $K$. Grey lines: $b$, computed from Equation (11). Colored curves: $-2K$ times the bias, computed numerically using the expression on the left-hand side of Equation (8). We used a homogeneous Dichotomized Gaussian distribution with $n=10$ and a mean of 0.1. Different curves correspond to different correlation coefficients [see Equation (19) below], as indicated in the legend.

#### 2.2. Is Bias Correction Important?

**Figure 3.** Scaling of the bias with population size for a homogeneous Dichotomized Gaussian model. (**a**) Bias, $b$, for $\nu\delta t = 0.1$ and a range of correlation coefficients, ρ. The bias is largest for strong correlations and large population sizes; (**b**) $\nu\delta t = 0.02$ and a range of (smaller) correlation coefficients. In both panels, the left axis is $b/m$, and the right axis is $(b/m)/\log(e/(\nu\delta t))$. The latter quantity is important for determining the minimum number of trials [Equation (17)] or the minimum runtime [Equation (18)] needed to reduce bias to an acceptable level.
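Equations (17) and (18) themselves are not reproduced in this extraction, but their role can be sketched. If the bias magnitude $b/(2K)$ is required to stay below a fraction $\epsilon$ of the entropy $S_q$, the number of samples must satisfy (a hedged reconstruction, not the paper's exact expression):

$$
K \;\gtrsim\; \frac{b}{2\,\epsilon\, S_q}.
$$

For a sparse homogeneous population, the entropy per cell is approximately $\nu\delta t \,\log\!\big(e/(\nu\delta t)\big)$ nats, which is why the caption's quantity $(b/m)/\log(e/(\nu\delta t))$ controls the minimum number of trials.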

**Figure 4.** Effect of heterogeneity on the normalized bias in a small population. (**a**) Normalized bias relative to the within-model-class case, $b/m$, of a heterogeneous Dichotomized Gaussian model with $n=5$ as a function of the median mean, $\nu\delta t$, and correlation coefficient, ρ. As with the homogeneous model, bias is largest for small means and strong correlations. (**b**) The same plot, but for a homogeneous Dichotomized Gaussian. The difference in bias between the heterogeneous and homogeneous models is largest for small means and small correlations, but overall, the two plots are very similar.
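The figures in this section all use Dichotomized Gaussian models. A minimal sampler sketch follows, assuming the standard construction (threshold a correlated Gaussian at zero); the function name and the choice to specify the latent rather than the binary correlation are illustrative, not from the paper.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)

def sample_dichotomized_gaussian(n, mean, latent_corr, K, rng):
    """Draw K samples of a homogeneous n-cell Dichotomized Gaussian:
    threshold a correlated Gaussian at zero, so x_i = 1 iff z_i > 0."""
    gamma = NormalDist().inv_cdf(mean)        # latent mean so that P(z_i > 0) = mean
    cov = np.full((n, n), latent_corr)        # equicorrelated latent covariance
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(gamma * np.ones(n), cov, size=K)
    return (z > 0).astype(int)

x = sample_dichotomized_gaussian(n=10, mean=0.1, latent_corr=0.3, K=50_000, rng=rng)
print("empirical mean P(x_i = 1)   :", round(float(x.mean()), 3))

c = np.corrcoef(x, rowvar=False)              # pairwise binary correlations
rho_binary = float(c[~np.eye(10, dtype=bool)].mean())
print("empirical binary correlation:", round(rho_binary, 3))
```

Note that the binary correlation is smaller than the latent one; matching a target ρ exactly requires numerically inverting the bivariate orthant probability (the problem addressed in Section 4.1.1), which this sketch omits.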

#### 2.3. Maximum and Minimum Bias When the True Model Is Not in the Model Class

**μ**, now they depend on both **μ** and β. However, the previous variables are closely related to the new ones: when $\beta = 0$, the constraint associated with β disappears, and we recover $q(\mathbf{x}|\boldsymbol{\mu})$; that is, $q(\mathbf{x}|\boldsymbol{\mu}, 0) = q(\mathbf{x}|\boldsymbol{\mu})$. Consequently, $\lambda_i(\boldsymbol{\mu}, 0) = \lambda_i(\boldsymbol{\mu})$, and $Z(\boldsymbol{\mu}, 0) = Z(\boldsymbol{\mu})$.

**Figure 5.** Relationship between $\Delta S$ and bias. (**a**) Maximum and minimum normalized bias relative to m versus $\Delta S/S_p$ (recall that $S_p$ is the entropy of $p(\mathbf{x})$) in a homogeneous population with size $n=5$, $\nu\delta t=0.1$, and correlation coefficients indicated by color. The crosses correspond to a set of homogeneous Dichotomized Gaussian models with $\nu\delta t=0.1$. (**b**) Same as (**a**), but for $n=100$. For $\rho=0.5$, the bias of the Dichotomized Gaussian model is off the right-hand side of the plot, at $(0.17, 3.4)$; for comparison, the maximum bias at $\rho=0.5$ and $\Delta S/S_p=0.17$ is 3.8. (**c**) Comparison between the normalized bias of the Dichotomized Gaussian model and the maximum normalized bias. As in panels (**a**) and (**b**), we used $\nu\delta t=0.1$. Because the ratio of the biases is trivially near one when b is near m, we plot $(b_{DG}-m)/(b_{max}-m)$, where $b_{DG}$ and $b_{max}$ are the normalized bias of the Dichotomized Gaussian and the maximum bias, respectively; this is the ratio of the "additional" bias. (**d**) Distribution of total spike count ($=\sum_i x_i$) for the Dichotomized Gaussian, maximum entropy (MaxEnt) and maximally biased (MaxBias) models with $n=100$, $\nu\delta t=0.1$ and $\rho=0.05$. The similarity between the distributions of the Dichotomized Gaussian and maximally biased models is consistent with the similarity in normalized biases shown in panel (**c**).

#### 2.4. Using a Plug-in Estimator to Reduce Bias

**Figure 6.** Bias correction. (**a**) Plug-in, $b_{\text{plugin}}$, and thresholded, $b_{\text{thresh}}$, estimators versus sample size for a homogeneous Dichotomized Gaussian model with $n=10$ and $\nu\delta t=0.1$. Correlations are color coded as in Figure 5. Gray lines indicate the true normalized bias as a function of sample size, computed numerically as for Figure 2. (**b**) Relative error without bias correction, $(S_q(\hat{\boldsymbol{\mu}}) - S_q(\boldsymbol{\mu}))/S_q(\boldsymbol{\mu})$; with the plug-in correction, $(S_q(\hat{\boldsymbol{\mu}}) + b_{\text{plugin}}/(2K) - S_q(\boldsymbol{\mu}))/S_q(\boldsymbol{\mu})$; and with the thresholded estimator, $(S_q(\hat{\boldsymbol{\mu}}) + b_{\text{thresh}}/(2K) - S_q(\boldsymbol{\mu}))/S_q(\boldsymbol{\mu})$.
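The plug-in idea can be illustrated in the simplest, within-model-class setting, where the normalized bias equals the number of constraints, $b = m$, so the correction adds $m/(2K)$ to the entropy estimate. The following is a sketch, not the paper's estimator (which plugs empirical covariances into the general expression for $b$); here the maximum entropy model with only mean constraints is a product of Bernoullis, so $m = n$.

```python
import numpy as np

rng = np.random.default_rng(2)

def independent_maxent_entropy(means):
    """Entropy (nats) of the maximum entropy model constrained only by
    the means: a product of independent Bernoullis."""
    p = np.clip(np.asarray(means, dtype=float), 1e-12, 1 - 1e-12)
    return float(np.sum(-p * np.log(p) - (1 - p) * np.log(1 - p)))

n, K, p_true = 10, 200, 0.1                  # m = n mean constraints
S_true = independent_maxent_entropy(np.full(n, p_true))

n_experiments = 20_000
raw_bias_samples = np.empty(n_experiments)
for t in range(n_experiments):
    x = rng.random((K, n)) < p_true          # K samples of n independent cells
    raw_bias_samples[t] = independent_maxent_entropy(x.mean(axis=0)) - S_true

raw_bias = raw_bias_samples.mean()
# Within the model class b = m, so the plug-in correction adds m/(2K).
corrected_bias = raw_bias + n / (2 * K)

print(f"raw bias       : {raw_bias:+.4f} nats  (prediction -m/(2K) = {-n/(2*K):+.4f})")
print(f"corrected bias : {corrected_bias:+.4f} nats")
```

Outside the model class, $b$ exceeds $m$, so a correction based on $b = m$ alone undershoots; that is the gap the plug-in and thresholded estimators of panel (**a**) are designed to close.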

## 3. Discussion

## 4. Methods

#### 4.1. Numerical Methods

#### 4.1.1. Parameters of the Heterogeneous Dichotomized Gaussian Distribution

#### 4.1.2. Fitting Maximum Entropy Models
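The original text of this subsection is not preserved in this extraction. As a stand-in illustration of the fitting problem, here is a minimal exact-enumeration sketch of fitting a pairwise (Ising-type) maximum entropy model by gradient descent on the convex dual; the function name and target moments are hypothetical, and exact enumeration only scales to small n.

```python
import itertools
import numpy as np

def fit_pairwise_maxent(mu, n, n_iter=20_000, lr=0.5):
    """Fit lambda so that q(x) ∝ exp(sum_i lam_i x_i + sum_{i<j} lam_ij x_i x_j)
    matches target moments mu = (<x_i>, then <x_i x_j> for i < j).
    Enumerates all 2^n states exactly, so small n only."""
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    pairs = list(itertools.combinations(range(n), 2))
    # One feature column per constraint: x_i, then x_i * x_j.
    F = np.column_stack([states[:, i] for i in range(n)]
                        + [states[:, i] * states[:, j] for i, j in pairs])
    lam = np.zeros(F.shape[1])
    for _ in range(n_iter):
        logq = F @ lam
        q = np.exp(logq - logq.max())
        q /= q.sum()
        lam += lr * (mu - q @ F)   # dual gradient: target minus model moments
    return lam, q, F

# Hypothetical homogeneous targets: means 0.3, pairwise moments 0.12
n = 3
mu = np.concatenate([np.full(n, 0.3), np.full(n * (n - 1) // 2, 0.12)])
lam, q, F = fit_pairwise_maxent(mu, n)
print("achieved moments:", np.round(q @ F, 4))
print("entropy (nats)  :", round(float(-(q * np.log(q)).sum()), 4))
```

Because the dual is convex, plain gradient descent with a modest step size converges to the unique moment-matching solution whenever the targets are realizable.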

#### 4.1.3. Bias Correction

#### 4.1.4. Sample Size

#### 4.2. The Normalized Bias Depends Only on the Covariance Matrices

#### 4.3. Dependence of $\Delta S$ and the Normalized Bias, b, on β

## Acknowledgements

## Conflict of Interest

## References


© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

Macke, J.H.; Murray, I.; Latham, P.E. Estimation Bias in Maximum Entropy Models. *Entropy* **2013**, *15*, 3109–3129. https://doi.org/10.3390/e15083109