# Maximum Entropy Approaches to Living Neural Networks


## Abstract


## 1. Introduction

## 2. Maximum Entropy for Spatial Correlations

In the Ising model approach, each neuron is represented as a spin, σ_i, that can be either up (+1) or down (–1). There is a rich history of using this binary representation of neural activity in network models [2,21]. To represent temporal activity as a sequence of numbers, the duration of the recording is broken into time bins. We will discuss the implications of the width of these time bins in section 5. A typical width is 20 ms, as in [5,12], but other widths have also been used. Thus, a 100 ms recording in which a single spike fired at 23 ms would be represented by the sequence (–1, +1, –1, –1, –1) if the data were binned at 20 ms. With this representation, the average firing activity of neuron i over T time bins is given by:

$$\langle \sigma_i \rangle = \frac{1}{T} \sum_{t=1}^{T} \sigma_i(t)$$
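The binning step can be sketched in a few lines of Python (the helper name `binarize` is ours, not from the original paper):

```python
import numpy as np

def binarize(spike_times_ms, duration_ms, bin_ms=20.0):
    """Convert one neuron's spike times (ms) into a +1/-1 binned sequence.

    A time bin is +1 if at least one spike falls inside it, else -1.
    """
    n_bins = int(np.ceil(duration_ms / bin_ms))
    sigma = -np.ones(n_bins, dtype=int)
    idx = (np.asarray(spike_times_ms) // bin_ms).astype(int)
    sigma[idx[idx < n_bins]] = +1
    return sigma

# The example from the text: one spike at 23 ms in a 100 ms recording.
print(binarize([23.0], 100.0))  # -> [-1  1 -1 -1 -1]
```

The average activity 〈σ_i〉 is then simply the mean of this sequence.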

The probability of observing the ensemble in a particular state, V_j (one of the 2^N possible states), at a particular time bin can be estimated directly from the data as the fraction of time bins in which that state occurs:

$$P_{data}(V_j) = \frac{\text{number of time bins in which state } V_j \text{ occurs}}{\text{total number of time bins}}$$
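Estimating this distribution from a binarized raster is a counting exercise; a minimal sketch with illustrative names:

```python
import numpy as np
from collections import Counter

def empirical_state_probs(raster):
    """raster: array of shape (N, T) with entries +1/-1.

    Returns a dict mapping each observed state (a tuple of N spins)
    to its fraction of time bins, i.e. P_data(V_j)."""
    N, T = raster.shape
    counts = Counter(tuple(raster[:, t]) for t in range(T))
    return {state: c / T for state, c in counts.items()}

# Two neurons over four time bins.
raster = np.array([[ 1, -1, -1,  1],
                   [-1, -1, -1,  1]])
probs = empirical_state_probs(raster)
```

Unobserved states simply receive probability zero, which is one source of the sampling problems discussed in section 5.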

In the Ising model, each spin tends to align with the local magnetic field, h_i, in which it is embedded. In addition, the state of each spin will depend on the interactions it has with its neighbors. When the coupling constant J_ij is positive, spins will tend to align in the same direction; when it is negative, they will tend to be anti-aligned. In this language, the "energy" of an ensemble of N neurons in state V_j can be given by:

$$E(V_j) = -\sum_{i=1}^{N} h_i \sigma_i - \sum_{i<j} J_{ij} \sigma_i \sigma_j \qquad (4)$$

The probability of the ensemble being in state V_j is then given by the Boltzmann distribution:

$$P(V_j) = \frac{e^{-E(V_j)}}{\sum_{k=1}^{2^N} e^{-E(V_k)}} \qquad (5)$$

where V_j again represents the jth state of the spins and where the denominator is the partition function, summed over all 2^N possible states of the ensemble. Note that this equation makes states with high energy less probable than states with low energy.
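For small N, the Boltzmann distribution can be computed exactly by brute-force enumeration; a minimal sketch (helper names are ours, and the symmetric-J convention with a 1/2 factor is equivalent to summing over i < j):

```python
import itertools
import numpy as np

def ising_probs(h, J):
    """Exact Boltzmann distribution over all 2^N spin states.

    E(V) = -sum_i h_i s_i - (1/2) sum_{i,j} J_ij s_i s_j, with J symmetric
    and zero diagonal (equivalent to summing J_ij s_i s_j over i < j).
    """
    N = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=N)))
    E = -states @ h - 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
    w = np.exp(-E)                 # unnormalized Boltzmann weights
    return states, w / w.sum()     # normalize by the partition function

# Two spins with a positive coupling: aligned states are favored.
h = np.zeros(2)
J = np.array([[0.0, 1.0],
              [1.0, 0.0]])
states, P = ising_probs(h, J)
# States are ordered (-1,-1), (-1,+1), (+1,-1), (+1,+1);
# the aligned states get higher probability than the anti-aligned ones.
```

This brute-force enumeration is exactly what makes large N intractable: the `states` array has 2^N rows.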

The expected firing rates 〈σ_i〉_m and pairwise correlations 〈σ_iσ_j〉_m of the maximum entropy model can be extracted by the following equations, where the subscript m denotes model:

$$\langle \sigma_i \rangle_m = \sum_{k=1}^{2^N} \sigma_i(V_k)\, P(V_k) \qquad (6)$$

$$\langle \sigma_i \sigma_j \rangle_m = \sum_{k=1}^{2^N} \sigma_i(V_k)\, \sigma_j(V_k)\, P(V_k) \qquad (7)$$

where σ_i(V_k) is the activity (either +1 or –1) of σ_i when the ensemble is in state V_k. The expected values from the model are then compared to 〈σ_i〉 and 〈σ_iσ_j〉 found in the data. Although the model parameters are initially selected to match the firing rates and pairwise interactions found in the data, a given set of parameters will in general not bring every neuron into agreement with its local field and interactions for every state. To improve agreement between the model moments 〈σ_i〉_m, 〈σ_iσ_j〉_m and the data moments 〈σ_i〉, 〈σ_iσ_j〉, the local magnetic fields h_i and the interactions J_ij are adjusted by one of many gradient ascent algorithms [26]. Several groups [12,18] have used iterative scaling [27]. In its simplest gradient form (the generalized iterative scaling step of [27] differs in detail), the update is:

$$h_i \leftarrow h_i + \eta \left( \langle \sigma_i \rangle - \langle \sigma_i \rangle_m \right)$$

$$J_{ij} \leftarrow J_{ij} + \eta \left( \langle \sigma_i \sigma_j \rangle - \langle \sigma_i \sigma_j \rangle_m \right)$$

where η is a small learning rate that controls the size of the adjustments to h_i and J_ij. Adjustments are made iteratively until the local fields and interactions are close to their asymptotic values. Because the entropy is convex everywhere in this formulation, there are no local extrema, and methods like simulated annealing are not necessary. After iterative scaling, the final values of h_i and J_ij are re-inserted into equation (4) to calculate the energy of each state, and this energy is inserted into equation (5) to calculate the probability of observing each state V. This process of adjustment can be time consuming because calculating the averages in equations (6) and (7) is computationally intensive, requiring sums over 2^N terms. Faster methods have been developed that allow larger numbers of neurons to be analyzed [26,28,29]; these methods exploit ways of approximating the averages in equations (6) and (7).
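As a concrete sketch of the fitting loop, the following Python uses simple moment-matching gradient updates — not the exact generalized iterative scaling of [27] — and brute-force enumeration of states, so it is only feasible for small N. The function name and arguments are illustrative, not from the original papers.

```python
import itertools
import numpy as np

def fit_ising(sig_mean, sig_corr, eta=0.1, steps=2000):
    """Fit fields h_i and couplings J_ij so that the model moments
    match the data moments sig_mean = <s_i> and sig_corr = <s_i s_j>.

    Gradient ascent on the log-likelihood (a sketch; iterative scaling
    as used in [12,18] takes a slightly different step)."""
    N = len(sig_mean)
    states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
    h = np.zeros(N)
    J = np.zeros((N, N))
    for _ in range(steps):
        # Exact model distribution over all 2^N states (equations 4, 5).
        E = -states @ h - 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
        w = np.exp(-E)
        P = w / w.sum()
        # Model moments (equations 6, 7).
        m = P @ states
        C = np.einsum('k,ki,kj->ij', P, states, states)
        # Move parameters toward agreement with the data moments.
        h += eta * (sig_mean - m)
        dJ = eta * (sig_corr - C)
        np.fill_diagonal(dJ, 0.0)  # no self-coupling
        J += dJ
    return h, J
```

For two silent-on-average spins with 〈s_1 s_2〉 = tanh(1), the fit recovers h = 0 and an effective coupling near 1, as expected analytically for this model.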

The simplest, first-order, model matches only the firing rates 〈σ_i〉 found in the data, but assumes that all higher-order interactions, like 〈σ_iσ_j〉, are independent and can be given by the product of first-order terms: 〈σ_iσ_j〉 = 〈σ_i〉·〈σ_j〉. Schneidman and colleagues [12] denoted the probability distribution produced by a first-order model by P_1. A second-order model, as described above, takes into account the firing rates and pairwise interactions, and produces a probability distribution denoted by P_2. For an ensemble of N neurons, an accurate Nth-order model would capture all the higher-order interactions found in the data, because its probability distribution, P_N, would be identical to the probability distribution found in the data. The entropy, S, of a distribution, P, is calculated in the standard way:

$$S(P) = -\sum_{k=1}^{2^N} P(V_k) \log_2 P(V_k)$$
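A minimal Python sketch (function names are ours) that builds the independent model P_1 from single-neuron marginals and evaluates the standard entropy S = −Σ P log₂ P:

```python
import itertools
import numpy as np

def entropy_bits(P):
    """S = -sum_V P(V) log2 P(V), skipping zero-probability states."""
    P = np.asarray(P, dtype=float)
    nz = P > 0
    return -np.sum(P[nz] * np.log2(P[nz]))

def independent_model(p_up):
    """P_1: product of single-neuron marginals; p_up[i] = P(sigma_i = +1).

    States are ordered as itertools.product([-1, 1], repeat=N)."""
    probs = []
    for state in itertools.product([-1, 1], repeat=len(p_up)):
        probs.append(np.prod([p if s == 1 else 1 - p
                              for s, p in zip(state, p_up)]))
    return np.array(probs)

P1 = independent_model([0.5, 0.5])  # two fair, independent neurons
print(entropy_bits(P1))             # -> 2.0 (bits)
```

Two independent fair spins carry one bit each, so S_1 = 2 bits, the maximum for this ensemble.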

The entropy of the first-order model, S_1, is always greater than the entropy of any higher-order model, S_2 … S_N, because added interactions always reduce entropy [12,30]. The multi-information, I_N, is the total amount of correlation in an ensemble, and is expressed as the difference between the entropy of the first-order model and the entropy of the actual data [12,31]:

$$I_N = S_1 - S_N$$

The fraction of this correlation captured by the second-order model is then the ratio I_2/I_N = (S_1 − S_2)/(S_1 − S_N).

When h_i and J_ij are computed exactly, the ratio can also be expressed as [5,30]:

$$\frac{I_2}{I_N} = 1 - \frac{D_2}{D_1}$$

where D_1 is the Kullback-Leibler divergence between P_1 and P_N, given by:

$$D_1 = \sum_{k} P_N(V_k) \log_2 \frac{P_N(V_k)}{P_1(V_k)}$$

and D_2 is the Kullback-Leibler divergence between P_2 and P_N:

$$D_2 = \sum_{k} P_N(V_k) \log_2 \frac{P_N(V_k)}{P_2(V_k)}$$
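These quantities are straightforward to compute once the three distributions are in hand; a sketch with illustrative names, using the identity I_2/I_N = 1 − D_2/D_1 (valid when the models are fit exactly):

```python
import numpy as np

def kl_bits(P, Q):
    """D_KL(P || Q) in bits; assumes Q > 0 wherever P > 0."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    nz = P > 0
    return np.sum(P[nz] * np.log2(P[nz] / Q[nz]))

def fraction_captured(P1, P2, PN):
    """Fraction of the multi-information captured by the pairwise model."""
    return 1.0 - kl_bits(PN, P2) / kl_bits(PN, P1)

# Sanity checks: a perfect second-order model captures everything,
# while a second-order model no better than P_1 captures nothing.
PN = np.array([0.4, 0.1, 0.1, 0.4])
P1 = np.full(4, 0.25)
```

A ratio near 0.9, as reported in [5,12], means the pairwise model accounts for ~90% of the correlation structure.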

## 3. The Issue of Temporal Correlations

**Figure 2.** Temporal correlations are important. (a) Activity from many neurons plotted over time. Boxes highlight an ensemble of four neurons over six time bins. (b) Within the boxes, there was activity for four consecutive time bins, bracketed by no activity at the beginning and the end. This is a sequence of length L = 4 (see text). (c) Sequence lengths from actual data were significantly longer than those produced by random concatenations of states from the model. This suggests that temporal correlations play an important part in determining activity in neuronal ensembles.
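The sequence-length statistic of Figure 2 can be computed directly from a binarized raster; a sketch (the function name is ours):

```python
import numpy as np

def sequence_lengths(raster):
    """Lengths L of runs of consecutive time bins in which at least one
    neuron in the ensemble is active (+1), bracketed by silent bins."""
    active = (np.asarray(raster) == 1).any(axis=0).astype(int)
    padded = np.concatenate(([0], active, [0]))   # silence at both ends
    d = np.diff(padded)
    starts = np.where(d == 1)[0]                  # silence -> activity
    ends = np.where(d == -1)[0]                   # activity -> silence
    return ends - starts

# Two neurons, six time bins: one run of length 3, then one of length 1.
raster = np.array([[-1,  1,  1, -1, -1,  1],
                   [-1, -1,  1,  1, -1, -1]])
print(sequence_lengths(raster))  # -> [3 1]
```

Comparing this distribution for real data against randomly concatenated model states is the test illustrated in Figure 2c.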

## 4. Incorporating Temporal Correlations

**Figure 3.** Incorporating temporal correlations. (a) Correlations between two spins, σ_1 and σ_3, are shown over space and time. The solid line represents spatial correlation; the dotted line represents temporal correlation one time step into the future; the dashed line represents temporal correlation two time steps into the future. (b) Matrices required for the model of spatial correlations only, for the four-spin system. The local magnetic field is represented by h, and the spatial coupling constants by J. (c) Composite matrix required for the model of spatial and temporal correlations up to two time steps for the four-spin system. The local magnetic field h is as before, but the matrix of coupling constants is considerably expanded. The matrix of spatial coupling constants J occurs wherever there are interactions among spins at the same time step. The matrices of temporal coupling constants, T′ and T′′, occur wherever there are interactions among spins at temporal delays of one and two time steps, respectively. Note that transposed matrices are used below the diagonal, indicating that delayed correlations are treated here as if they were symmetric in time, following [37].

When the ensemble state is extended to span three consecutive time bins, an ensemble of N neurons can take on 2^{3N} different configurations. As before, the probability distribution of these states from the data is compared to the distribution produced by the model (Figure 4C). But now it is possible to tease out the relative importance of spatial and temporal correlations. This can be done by creating three different types of models: one that accounts for spatial correlations only; one that accounts for spatial correlations and temporal correlations one time step ahead; and one that accounts for spatial correlations and temporal correlations up to two time steps ahead.
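Under this construction — one state is the joint activity of N neurons across T consecutive bins — the empirical spatio-temporal state distribution can be tallied as follows (illustrative code, our own names):

```python
import numpy as np
from collections import Counter

def spatiotemporal_state_probs(raster, T_steps=3):
    """Empirical distribution over spatio-temporal states.

    Each state is the activity of all N neurons across T_steps
    consecutive time bins, i.e. one of 2**(N * T_steps) possibilities,
    tallied over all sliding windows of the raster."""
    raster = np.asarray(raster)
    N, T = raster.shape
    windows = (tuple(raster[:, t:t + T_steps].ravel())
               for t in range(T - T_steps + 1))
    counts = Counter(windows)
    total = T - T_steps + 1
    return {s: c / total for s, c in counts.items()}

# Tiny example: 2 neurons over 4 bins gives 2 overlapping 3-bin windows.
tiny = np.array([[ 1, -1,  1, -1],
                 [-1, -1, -1, -1]])
dist = spatiotemporal_state_probs(tiny, T_steps=3)
# For N = 4 and T_steps = 3, the state space has 2**12 = 4096 entries.
```

The explosion of the state space from 2^N to 2^{3N} is immediately visible here, and drives the sampling concerns discussed in section 5.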

**Figure 4.** Distribution of states for a model with temporal correlations. (a) An ensemble of four neurons is selected from the raster plot. (b) Here, activity over a span of three time bins (t, t + 1, and t + 2) is considered one state. (c) The distribution of states is plotted for the model and for the data.

The model with two time steps of temporal correlations is defined directly over all 2^{3N} states and does not need to be concatenated to produce states of three time steps. Once temporal distributions for all three models are obtained, they can be compared as before by using a ratio of multi-information. The results of this comparison are shown in Figure 5.

For an ensemble of N = 4 neurons spanning three time bins, there are 2^{3N} = 4096 possible states. Note that this number is far greater than the number of states in the spatial task, where 2^N = 16. Because the dimensionality has increased dramatically, it is perhaps not surprising that the ratios are below 0.65, the value obtained when spatial models were applied to spatial correlations only between spiking neurons in cortical tissue for N = 4 [18]. Adding more temporal correlations to the model clearly improves its performance, but also reveals a fraction of temporal correlations that are not captured by the model. These preliminary results should be interpreted with caution, however, as they are obtained from a relatively small ensemble size. Calculations for larger ensemble sizes are challenging because of the dramatic increase in dimensionality.

Three features of these results are worth noting. First, the dimensionality of the spatio-temporal problem (2^{3N} = 4,096) is substantially greater than the dimensionality of the spatial problem (2^N = 16). If temporal correlations play any role, it is therefore reasonable to expect a somewhat lower ratio when only one or no time steps are included in the model. Second, including two time steps of temporal correlations nearly doubles the ratio obtained by the spatial model. This quantifies how important temporal correlations are: they account for roughly half of the correlation structure captured by the model. Third, even when two time steps are included in the model, a portion of the correlations in the data remains unaccounted for. For example, if we were to fit the three points in the plot with an exponential curve, it would asymptote somewhere near 0.75, suggesting that about 25% of the spatio-temporal correlation structure would not be captured by the model even if we were to include an infinite number of temporal terms in our correlation matrix. These conclusions are preliminary, though, as this example is taken from a small sample of neurons prepared in our lab. For example, it is presently unclear whether an exponential function, rather than a linear one, should be used here. We are in the process of analyzing larger samples of neurons to clarify this issue. Although the temporally extended maximum entropy model cannot account for some fraction of the correlations, it may still give us insight into how correlations are apportioned in ensembles of living neural networks. This issue was not previously appreciated, and represents a gap that must be filled by future generations of models [18,37,38].

## 5. Limitations and Criticisms

One limitation is computational: probability distributions over 2^N states are needed for the spatial model, and over 2^{3N} states for the temporal model with only three time steps. Here, two types of solutions are relevant: exact and approximate. When N ≥ 30, it becomes computationally unmanageable to solve these models exactly. If we are interested in approximations, however, it is possible to work with much larger values of N. Several groups have worked on ways to rapidly approximate the model for relatively large ensembles [44,45]. These approximations, however, have been performed only for the second-order model. In both the exact and the approximate cases, solving the model for large numbers of neurons is computationally challenging. It seems likely that we will be able to record from more neurons than we can analyze for many years to come.

A related limitation concerns sampling: the number of possible states grows as 2^{TN} for the spatio-temporal correlation problem, where T is the number of time steps included in the model. For an ensemble of ten neurons modeled over three time bins, it would take roughly 3.4 years of recording to ensure that each bin in the distribution of states was populated 100 times, even if the ensemble marched through each binary state at a rate of one per millisecond. This is of course a very conservative estimate, as it is unreasonable to assume that each state would be visited in such a manner. The criterion of 100 instances per bin was used by Shlens and colleagues for minimal entropy estimation bias [5]. Such long recordings are obviously unattainable, so entropy estimation errors are inevitable.
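The ~3.4-year figure can be checked directly with a back-of-envelope calculation:

```python
def recording_years(N=10, T=3, samples_per_state=100, state_rate_hz=1000.0):
    """Optimistic lower bound on recording time: every one of the
    2**(N*T) spatio-temporal states is visited samples_per_state times,
    at one state per millisecond (1000 states per second)."""
    n_states = 2 ** (N * T)
    seconds = n_states * samples_per_state / state_rate_hz
    return seconds / (365.25 * 24 * 3600)

print(round(recording_years(), 1))  # -> 3.4
```

Real ensembles dwell in a small subset of states, so actual recordings would need to be far longer still to sample the full distribution.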

A third criticism concerns extrapolation to larger networks. Roudi, Nirenberg, and Latham [33] argued that results from second-order maximum entropy models fit to small ensembles cannot simply be extrapolated, and identified a critical ensemble size, N_c, that scales inversely with the product of the average firing rate and the bin width, δt. For typical cortical firing rates and bin widths, N_c ≈ 33. When the number of neurons in the ensemble is below this critical size (N < N_c), results from the model are not predictive of the behavior of larger ensembles. But when the size of the ensemble is above this critical size (N > N_c), the model may accurately represent the behavior of larger networks. Thus, Roudi et al. [33] would argue that the results reported by [12] and [18] are not really surprising, as these ensemble sizes are below the critical size, where a good fit is to be expected. In the case of Shlens et al. [5], however, this is not true, as their neurons had very high firing rates and thus a relatively small critical ensemble size. Indeed, when Shlens and colleagues examined whether the model fit ensembles of up to 100 neurons in the retina, they found that it fit quite well [10]. All of this is consistent with the arguments presented in [33]. These results in the retina are promising, but the circuitry there is specialized, and it remains to be seen whether similar results can be obtained in cortical tissue. One way to overcome the critical size restriction would be to increase the bin width, δt, which could bring N_c down to a manageable size, particularly if the neurons are firing at a high rate. Roudi and colleagues point out that this could create other problems, though, as temporal correlations will be lost when large bin widths are used. We should note that these arguments about a critical size are based on the assumption that the model under consideration is second-order only [33]; they may not apply if higher-order models are used.
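As a rough illustration only — assuming the simplified scaling N_c ≈ 1/(ν̄ δt), which is our shorthand reading of the inverse dependence described above, not the exact expression derived in [33]:

```python
def critical_size(rate_hz, bin_width_s):
    """Rough critical ensemble size under the ASSUMED scaling
    N_c ~ 1 / (mean firing rate * bin width); see [33] for the
    exact perturbative analysis this caricatures."""
    return 1.0 / (rate_hz * bin_width_s)

# ~1.5 Hz firing with 20 ms bins gives N_c on the order of a few tens,
# consistent with the N_c ~ 33 quoted in the text; faster-firing retinal
# neurons or wider bins push N_c down.
print(critical_size(1.5, 0.020))
```

The qualitative point survives any constant prefactor: raising the firing rate or widening δt lowers N_c, which is why high-rate retinal data [5,10] sit in the favorable regime.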

## 6. Future Directions

## 7. Conclusions

## Acknowledgements

## References

1. Haykin, S.S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1999.
2. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U.S.A. 1982, 79, 2554–2558.
3. Steinbuch, K. The Learning Matrix. Kybernetik 1961, 1, 36–45.
4. Kanerva, P. Sparse Distributed Memory; MIT Press: Cambridge, MA, USA, 1988.
5. Shlens, J.; Field, G.D.; Gauthier, J.L.; Grivich, M.I.; Petrusca, D.; Sher, A.; Litke, A.M.; Chichilnisky, E.J. The structure of multi-neuron firing patterns in primate retina. J. Neurosci. 2006, 26, 8254–8266.
6. Ikegaya, Y.; Aaron, G.; Cossart, R.; Aronov, D.; Lampl, I.; Ferster, D.; Yuste, R. Synfire chains and cortical songs: temporal modules of cortical activity. Science 2004, 304, 559–564.
7. MacLean, J.N.; Watson, B.O.; Aaron, G.B.; Yuste, R. Internal dynamics determine the cortical response to thalamic stimulation. Neuron 2005, 48, 811–823.
8. Cossart, R.; Aronov, D.; Yuste, R. Attractor dynamics of network UP states in the neocortex. Nature 2003, 423, 283–288.
9. Kerr, J.N.; Denk, W. Imaging in vivo: Watching the brain in action. Nat. Rev. 2008, 9, 195–205.
10. Shlens, J.; Field, G.D.; Gauthier, J.L.; Greschner, M.; Sher, A.; Litke, A.M.; Chichilnisky, E.J. The Structure of Large-Scale Synchronized Firing in Primate Retina. J. Neurosci. 2009, 29, 5022–5031.
11. Litke, A.M.; Bezayiff, N.; Chichilnisky, E.J.; Cunningham, W.; Dabrowski, W.; Grillo, A.A.; Grivich, M.; Grybos, P.; Hottowy, P.; Kachiguine, S.; Kalmar, R.S.; Mathieson, K.; Petrusca, D.; Rahman, A.; Sher, A. What does the eye tell the brain?: Development of a system for the large-scale recording of retinal output activity. IEEE Trans. Nucl. Sci. 2004, 51, 1434–1440.
12. Schneidman, E.; Berry, M.J., 2nd; Segev, R.; Bialek, W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 2006, 440, 1007–1012.
13. Nirenberg, S.H.; Victor, J.D. Analyzing the activity of large populations of neurons: how tractable is the problem? Curr. Opin. Neurobiol. 2007, 17, 397–400.
14. Truccolo, W.; Eden, U.T.; Fellows, M.R.; Donoghue, J.P.; Brown, E.N. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J. Neurophysiol. 2005, 93, 1074–1089.
15. Pillow, J.W.; Shlens, J.; Paninski, L.; Sher, A.; Litke, A.M.; Chichilnisky, E.J.; Simoncelli, E.P. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 2008, 454, 995–999.
16. Schneidman, E.; Still, S.; Berry, M.J., 2nd; Bialek, W. Network information and connected correlations. Phys. Rev. Lett. 2003, 91, 238701.
17. Jaynes, E.T. Information Theory and Statistical Mechanics. II. Phys. Rev. 1957, 108, 171–190.
18. Tang, A.; Jackson, D.; Hobbs, J.; Chen, W.; Smith, J.L.; Patel, H.; Prieto, A.; Petrusca, D.; Grivich, M.I.; Sher, A.; Hottowy, P.; Dabrowski, W.; Litke, A.M.; Beggs, J.M. A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. J. Neurosci. 2008, 28, 505–518.
19. Martignon, L.; Deco, G.; Laskey, K.; Diamond, M.; Freiwald, W.; Vaadia, E. Neural coding: higher-order temporal patterns in the neurostatistics of cell assemblies. Neural Comput. 2000, 12, 2621–2653.
20. Amari, S. Information geometry on hierarchy of probability distributions. IEEE Trans. Inf. Theory 2001, 47, 1701–1711.
21. Van Vreeswijk, C.; Sompolinsky, H. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science 1996, 274, 1724–1726.
22. Brush, S.G. History of the Lenz-Ising Model. Rev. Mod. Phys. 1967, 39, 883–893.
23. Johnston, D.; Wu, S.M.-S. Foundations of Cellular Neurophysiology; MIT Press: Cambridge, MA, USA, 1995.
24. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
25. Jaynes, E.T. On the Rationale of Maximum-Entropy Methods. Proc. IEEE 1982, 70, 939–952.
26. Malouf, R. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th Conference on Natural Language Learning, Taipei, Taiwan, 31 August–1 September 2002; Volume 20, pp. 1–7.
27. Darroch, J.N.; Ratcliff, D. Generalized Iterative Scaling for Log-Linear Models. Ann. Math. Stat. 1972, 43, 1470–1480.
28. Broderick, T.; Dudik, M.; Tkacik, G.; Schapire, R.; Bialek, W. Faster Solutions of the Inverse Pairwise Ising Problem. arXiv preprint, 2008. http://arxiv.org/PS_cache/arxiv/pdf/0712/0712.2437v2.pdf (accessed on 12 January 2010).
29. Roudi, Y.; Aurell, E.; Hertz, J.A. Statistical physics of pairwise probability models. Front. Comput. Neurosci. 2009, 3, 22.
30. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
31. Schneidman, E.; Bialek, W.; Berry, M.J., 2nd. Synergy, redundancy, and independence in population codes. J. Neurosci. 2003, 23, 11539–11553.
32. Paninski, L. Estimation of entropy and mutual information. Neural Comput. 2003, 15, 1191–1254.
33. Roudi, Y.; Nirenberg, S.; Latham, P.E. Pairwise maximum entropy models for studying large biological systems: when they can work and when they can't. PLoS Comput. Biol. 2009, 5, e1000380.
34. Strong, S.P.; Koberle, R.; van Steveninck, R.R.D.; Bialek, W. Entropy and information in neural spike trains. Phys. Rev. Lett. 1998, 80, 197–200.
35. Shlens, J.; Kennel, M.B.; Abarbanel, H.D.; Chichilnisky, E.J. Estimating information rates with confidence intervals in neural spike trains. Neural Comput. 2007, 19, 1683–1719.
36. Nemenman, I.; Bialek, W.; de Ruyter van Steveninck, R. Entropy and information in neural spike trains: progress on the sampling problem. Phys. Rev. 2004, 69, 056111.
37. Marre, O.; El Boustani, S.; Fregnac, Y.; Destexhe, A. Prediction of spatiotemporal patterns of neural activity from pairwise correlations. Phys. Rev. Lett. 2009, 102, 138101.
38. Yu, S.; Huang, D.; Singer, W.; Nikolic, D. A small world of neuronal synchrony. Cereb. Cortex 2008, 18, 2891–2901.
39. Qin, Y.L.; McNaughton, B.L.; Skaggs, W.E.; Barnes, C.A. Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos. Trans. R. Soc. Lond. B Biol. Sci. 1997, 352, 1525–1533.
40. Song, S.; Sjostrom, P.J.; Reigl, M.; Nelson, S.; Chklovskii, D.B. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biol. 2005, 3, e68.
41. Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: New York, NY, USA, 2001.
42. Dewar, R.C. Maximum entropy production and the fluctuation theorem. J. Phys. A: Math. Gen. 2005, 38, L371–L381.
43. Nicolelis, M.A.; Dimitrov, D.; Carmena, J.M.; Crist, R.; Lehew, G.; Kralik, J.D.; Wise, S.P. Chronic, multisite, multielectrode recordings in macaque monkeys. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 11041–11046.
44. Tkacik, G.; Schneidman, E.; Berry, M.J., III; Bialek, W. Ising models for networks of real neurons. arXiv preprint, 2006. http://arxiv.org/PS_cache/q-bio/pdf/0611/0611072v1.pdf (accessed on 12 January 2010).
45. Roudi, Y.; Aurell, E.; Hertz, J. Statistical physics of pairwise probability models. Front. Comput. Neurosci. 2009, 3, 22.
46. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461–464.
47. Kaminski, M.; Ding, M.; Truccolo, W.A.; Bressler, S.L. Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biol. Cybern. 2001, 85, 145–157.
48. Cohen, I.B. The Birth of a New Physics; Rev. and updated ed.; W.W. Norton & Company: New York, NY, USA, 1985.

© 2010 by the authors. Licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Yeh, F.-C.; Tang, A.; Hobbs, J.P.; Hottowy, P.; Dabrowski, W.; Sher, A.; Litke, A.; Beggs, J.M.
Maximum Entropy Approaches to Living Neural Networks. *Entropy* **2010**, *12*, 89-106.
https://doi.org/10.3390/e12010089
