# A Moment-Based Maximum Entropy Model for Fitting Higher-Order Interactions in Neural Data


## Abstract


## 1. Introduction

## 2. Results

#### 2.1. The Reliable Moment Model

Moreover, higher-order features are more susceptible to overfitting, because they represent spiking features that occur less frequently in the data (and consequently have noisy empirical estimates). An alternative is to incorporate a limited subset of predetermined phenomenological features that increase the predictive power of the model, such as the spike count distribution [13] or the frequency of the quiescent state [11]. While these models have been able to capture the collective activity of populations of neurons (e.g., to determine whether neural activity operates at a critical point [17]), they are not able to dissect how the functional connectivity between specific subgroups of neurons contributes to population-level activity.
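The combinatorics behind this trade-off can be made concrete with a short sketch. For N neurons there are C(N, k) possible kth-order features. If we assume, purely for illustration, independent neurons that each spike with probability r per time bin, then a given kth-order feature is active in only a fraction r^k of bins. The spiking probability r = 0.1 is an assumption. The 10,000 bins correspond to 200 s of data in 20 ms bins, the recording length and bin size used in Figure 2. The function names are hypothetical.

```python
from math import comb


def num_features(n_neurons, max_order):
    """Number of distinct kth-order features (subsets of k neurons), for k = 1..max_order."""
    return {k: comb(n_neurons, k) for k in range(1, max_order + 1)}


def expected_feature_count(rate, order, n_bins):
    """Expected number of bins in which one kth-order feature is active (all k neurons spike).

    Assumes (for illustration only) independent neurons that each spike with
    probability `rate` in every time bin.
    """
    return n_bins * rate ** order


n_bins = 10_000  # e.g. 200 s of data in 20 ms bins
for k, n_feat in num_features(20, 4).items():
    count = expected_feature_count(0.1, k, n_bins)
    print(f"order {k}: {n_feat} features, about {count:.0f} occurrences each")
```

For N = 20 the number of features grows from 20 to 190, 1140 and 4845 as k runs from 1 to 4. Under these assumptions, the expected number of occurrences of each individual feature falls from about 1000 to 100, 10 and 1. The features become both more numerous and more poorly sampled.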

… $2^{N}$ probabilities. In this case, the partition function can be quickly estimated using other techniques, such as the Good–Turing estimate [23] (see Methods). As we shall see below, applying these approaches to the RI model strongly disrupts its predictions.
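The following is a minimal sketch of one simple missing-mass variant of the Good–Turing idea. It is not the authors' implementation and not necessarily the exact estimator of [23]. Good–Turing estimates the total probability of never-observed patterns as n₁/T, where n₁ is the number of distinct patterns seen exactly once in T time bins. Setting the model probability of the observed set equal to 1 − n₁/T gives Z ≈ Σ_observed P*(x) / (1 − n₁/T), where P*(x) is the unnormalized probability. The `{tuple_of_neuron_indices: h_S}` parameter format and the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def unnormalized_logp(pattern, interactions):
    """log P*(x): sum of h_S over every subset S whose neurons all spike in pattern x.

    `interactions` uses a hypothetical format: {tuple_of_neuron_indices: h_S}.
    """
    return sum(h for subset, h in interactions.items() if all(pattern[i] for i in subset))


def good_turing_partition(data, interactions):
    """Estimate Z from binary data (time bins x neurons) with a Good-Turing correction."""
    data = np.asarray(data, dtype=int)
    n_samples = data.shape[0]
    counts = Counter(tuple(int(v) for v in row) for row in data)
    # Good-Turing: mass of never-observed patterns ~ (# patterns seen exactly once) / T.
    missing_mass = sum(1 for c in counts.values() if c == 1) / n_samples
    if missing_mass >= 1.0:
        raise ValueError("every pattern is a singleton; the estimate is undefined")
    # Unnormalized model weight of the observed patterns, i.e. Z * P_model(observed set).
    observed_weight = sum(np.exp(unnormalized_logp(x, interactions)) for x in counts)
    # Match P_model(observed set) to the Good-Turing estimate 1 - missing_mass.
    return observed_weight / (1.0 - missing_mass)
```

Only patterns that actually occur in the data are ever evaluated. This is what makes the approach cheap compared with summing all $2^{N}$ terms.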

#### 2.2. Illustration with a Toy Example

… $2^{N}$ unnormalized probabilities, this can become prohibitively slow for large populations. We instead approximate the partition function, for example by the Good–Turing estimate [23]. Another alternative is to use Gibbs sampling [25] to generate spiking patterns from the inferred interaction parameters, and then to apply the RI estimate of the partition function (the inverse probability of the non-spiking state) to the Gibbs-sampled “data”. Regardless of which of these methods is used, our toy example shows the fundamental difference between the RM and RI models: the RM model can, in principle, be normalized without disrupting its predictions of spike pattern probabilities.
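Two of the routes to the partition function mentioned above can be compared on a small model. The sketch below assumes the same hypothetical `{tuple_of_neuron_indices: h_S}` parameter format, and all function names are illustrative rather than the authors' code:

- `exact_partition` sums all $2^{N}$ unnormalized probabilities, which is feasible only for small N.
- `gibbs_sample` draws patterns with single-site Gibbs updates [25]. The conditional log-odds of neuron i spiking is the sum of h_S over subsets S that contain i and whose other members are currently spiking.
- `partition_from_silence` applies the RI-style estimate Z ≈ 1/P(silence) to the sampled “data”.

The burn-in length is an arbitrary choice.

```python
import itertools

import numpy as np


def log_weight(x, interactions):
    """Unnormalized log-probability: sum of h_S over subsets S fully active in pattern x."""
    return sum(h for s, h in interactions.items() if all(x[i] for i in s))


def exact_partition(n, interactions):
    """Brute-force Z: sums 2**n unnormalized probabilities (only feasible for small n)."""
    return sum(np.exp(log_weight(x, interactions))
               for x in itertools.product((0, 1), repeat=n))


def gibbs_sample(n, interactions, n_samples, burn_in=500, rng=None):
    """Single-site Gibbs sampler for a binary maximum entropy model."""
    rng = np.random.default_rng() if rng is None else rng
    terms_by_neuron = {i: [(s, h) for s, h in interactions.items() if i in s]
                       for i in range(n)}
    x = np.zeros(n, dtype=int)
    samples = np.empty((n_samples, n), dtype=int)
    for t in range(burn_in + n_samples):
        for i in range(n):
            # Log-odds of x_i = 1 vs. x_i = 0 given all other neurons.
            field = sum(h for s, h in terms_by_neuron[i]
                        if all(x[j] for j in s if j != i))
            x[i] = rng.random() < 1.0 / (1.0 + np.exp(-field))
        if t >= burn_in:
            samples[t - burn_in] = x
    return samples


def partition_from_silence(samples):
    """RI-style estimate: Z = 1 / P(all neurons silent), read off from samples."""
    return 1.0 / np.mean(~samples.any(axis=1))


params = {(0,): -1.0, (1,): -1.0, (2,): -0.5, (0, 1): 0.8, (0, 1, 2): 0.5}
samples = gibbs_sample(3, params, n_samples=20_000, rng=np.random.default_rng(1))
print(exact_partition(3, params), partition_from_silence(samples))
```

Because every feature is a product of spike variables, the silent pattern activates no feature and has unnormalized probability 1. Its sampled frequency therefore estimates 1/Z, and the two printed values should agree up to sampling error.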

#### 2.3. The RM Model Infers Fewer Strong Spurious Higher-Order Interactions

#### 2.4. The RM Model Fits Rare Spiking Patterns

… assuming fixed recording lengths. We therefore tested this effect by generating a new test dataset for each ground-truth model and separating it into “old” spiking patterns (those that also occurred in the training dataset) and “new” spiking patterns (those that occurred only in the test dataset). To compare the RM and RI models, we must specify which threshold values to use for each model. Because the RM and RI thresholds use different “units” (the RI threshold is based on the frequencies of population spiking patterns, whereas the RM threshold is based on marginal probabilities, or moments), it is difficult to compare them directly. For a fair comparison of the model fits, it is therefore necessary to compare models that have the same number of fitted interaction parameters; otherwise, any difference in model performance might be attributed to one model having more parameters to fit. We therefore first chose the threshold parameters in this example so that the RM and RI models have exactly the same number of fitted interaction parameters (in this case, 395). Figure 4a shows an example of model vs. empirical frequencies (calculated from held-out test data) for old spiking patterns.
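The old/new split described above reduces to a set-membership test on the distinct test patterns. The following is a minimal sketch, assuming binary arrays of shape (time bins, neurons). `split_old_new` is a hypothetical helper. It returns empirical test-set frequencies, which model probabilities could then be compared against, as in Figure 4.

```python
from collections import Counter

import numpy as np


def _as_patterns(data):
    """Convert a (time bins x neurons) binary array into hashable tuples of Python ints."""
    return [tuple(int(v) for v in row) for row in np.asarray(data)]


def split_old_new(train, test):
    """Split distinct test patterns into 'old' (also in training data) and 'new' (test only).

    Returns two dicts mapping pattern -> empirical frequency in the test set.
    """
    train_patterns = set(_as_patterns(train))
    test_counts = Counter(_as_patterns(test))
    n_test = sum(test_counts.values())
    old, new = {}, {}
    for pattern, count in test_counts.items():
        target = old if pattern in train_patterns else new
        target[pattern] = count / n_test
    return old, new
```

The frequencies in the two dictionaries together sum to one, so the share of test-set probability mass carried by new patterns can be read off directly.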

#### 2.5. Fitting a Model with Cortical-Like Statistics and Dense Higher-Order Correlations

## 3. Discussion

## 4. Materials and Methods

#### 4.1. Ground Truth Models

#### 4.2. Identification of Reliable Moments

Because of the hierarchy of moments (a $k$th-order moment can never exceed any of its $\left(k-1\right)$th-order submoments), this search can be expedited by only considering the $k$th-order subsets $\left\{{s}_{1}\cdots {s}_{k}\right\}$ for which all of their $\left(k-1\right)$th-order subsets are elements of ${S}_{k-1}$. For each such candidate subset, we then check whether the corresponding moment is above threshold. This step is performed iteratively until ${S}_{k}=\varnothing $.
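The iterative search above can be sketched in a few lines, in the style of frequent-itemset mining. This is a minimal illustration, not the authors' released code, and it makes three assumptions:

- The data are a binary (time bins x neurons) array.
- “Above threshold” means an uncentered moment of at least `p_min`.
- Candidate kth-order subsets are enumerated from the neurons still present in ${S}_{k-1}$, which is simple but not the fastest possible join.

The name `reliable_moments` is hypothetical.

```python
import itertools

import numpy as np


def reliable_moments(data, p_min):
    """Return {k: S_k}, where S_k holds the k-tuples of neurons whose uncentered
    moment E[x_{s1} ... x_{sk}] (fraction of bins in which all k spike) is >= p_min."""
    data = np.asarray(data, dtype=bool)
    n_neurons = data.shape[1]
    rates = data.mean(axis=0)
    current = {(i,) for i in range(n_neurons) if rates[i] >= p_min}  # S_1
    result = {}
    k = 1
    while current:  # stop once S_k is empty
        result[k] = current
        k += 1
        neurons = sorted({i for s in current for i in s})
        # Keep only candidates whose (k-1)th-order subsets all lie in S_{k-1};
        # any other candidate cannot reach the threshold.
        candidates = [c for c in itertools.combinations(neurons, k)
                      if all(sub in current for sub in itertools.combinations(c, k - 1))]
        current = {c for c in candidates
                   if data[:, list(c)].all(axis=1).mean() >= p_min}
    return result
```

With `p_min = 1e-3`, the value used in Figure 2, the returned sets would be the moments that the RM model constrains during fitting.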

#### 4.3. Model Fitting and Sampling

#### 4.4. Dissimilarity Between Empirical Data and Models

#### 4.5. Code Availability

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## Appendix A

Although the RI model accurately (in this example, perfectly) fits the most common spiking pattern (silence), it is unnormalized. Furthermore, naive renormalization would result in a probability distribution that is inaccurate for every spiking pattern. This dilemma occurs because $\widehat{Z}$ is an accurate (in this example, perfect) estimate of the partition function of the underlying distribution, but not of the partition function of the model defined by the interaction parameters identified by the RI model.
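The normalization dilemma can be written out explicitly. The sketch below assumes, as in the main text, that every feature is a product of binary spike variables, so the silent pattern $\mathbf{0}$ activates no feature. The symbols $\mathcal{R}$ (the set of interactions retained by the RI model) and $Z_{\mathcal{R}}$ are notation introduced here for illustration.

```latex
% Maximum entropy model over binary patterns x in {0,1}^N
P(\mathbf{x}) = \frac{1}{Z}\exp\Big(\sum_{S} h_S \prod_{i\in S} x_i\Big),
\qquad
Z = \sum_{\mathbf{x}\in\{0,1\}^N} \exp\Big(\sum_{S} h_S \prod_{i\in S} x_i\Big)

% The silent pattern activates no feature, so
P(\mathbf{0}) = \frac{1}{Z}
\quad\Longrightarrow\quad
\widehat{Z} = \frac{1}{\hat{p}(\mathbf{0})}

% The RI model keeps only interactions S in R, but divides by \widehat{Z}:
\tilde{P}(\mathbf{x}) = \frac{1}{\widehat{Z}}\exp\Big(\sum_{S\in\mathcal{R}} h_S \prod_{i\in S} x_i\Big),
\qquad
\sum_{\mathbf{x}} \tilde{P}(\mathbf{x}) = \frac{Z_{\mathcal{R}}}{\widehat{Z}} \neq 1 \ \text{in general},
\qquad
Z_{\mathcal{R}} = \sum_{\mathbf{x}} \exp\Big(\sum_{S\in\mathcal{R}} h_S \prod_{i\in S} x_i\Big)

% Renormalizing by Z_R rescales every probability by \widehat{Z}/Z_R,
% so even the silent state, \tilde{P}(\mathbf{0}) = 1/Z_R, no longer matches \hat{p}(\mathbf{0}).
```

The same argument explains why a fitted model that reproduces all of its constrained statistics, such as the RM model, does not face this conflict. Its exact partition function is consistent with its own interaction parameters.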

## Appendix B

**Figure A1.** The Reliable Moment (RM) model infers smaller parameters for spurious higher-order interaction terms, relative to lower-order terms. (**a**) Average magnitude of all fitted higher-order interaction parameters normalized by the average magnitude of all pairwise interaction parameters, shown for both the Reliable Interaction (RI; magenta) and RM (blue) models (cf. Figure 3a). (**b**) Average magnitude of all pairwise interaction parameters (RI, magenta; RM, blue).

## Appendix C

**Figure A2.** The Reliable Moment (RM) model more accurately fits higher-order interaction parameters of a maximum entropy ground-truth model incorporating a sparse subset of triplet terms. Fitted interaction parameters for ground-truth triplets inferred by the RM model (blue) and the Reliable Interaction model (RI, magenta), plotted against the ground-truth values of the interaction parameters.

## References

1. Panzeri, S.; Schultz, S.R.; Treves, A.; Rolls, E.T. Correlations and the encoding of information in the nervous system. Proc. R. Soc. B Biol. Sci. **1999**, 266, 1001–1012.
2. Nirenberg, S.; Latham, P.E. Decoding neuronal spike trains: How important are correlations? Proc. Natl. Acad. Sci. USA **2003**, 100, 7348–7353.
3. Averbeck, B.B.; Latham, P.E.; Pouget, A. Neural correlations, population coding and computation. Nat. Rev. Neurosci. **2006**, 7, 358–366.
4. Schneidman, E.; Berry, M.J.; Segev, R.; Bialek, W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature **2006**, 440, 1007–1012.
5. Moreno-Bote, R.; Beck, J.; Kanitscheider, I.; Pitkow, X.; Latham, P.; Pouget, A. Information-limiting correlations. Nat. Neurosci. **2014**, 17, 1410–1417.
6. Hu, Y.; Zylberberg, J.; Shea-Brown, E. The Sign Rule and Beyond: Boundary Effects, Flexibility, and Noise Correlations in Neural Population Codes. PLoS Comput. Biol. **2014**, 10.
7. Ohiorhenuan, I.E.; Mechler, F.; Purpura, K.P.; Schmid, A.M.; Hu, Q.; Victor, J.D. Sparse coding and high-order correlations in fine-scale cortical networks. Nature **2010**, 466, 617–621.
8. Yu, S.; Yang, H.; Nakahara, H.; Santos, G.S.; Nikolic, D.; Plenz, D. Higher-Order Interactions Characterized in Cortical Activity. J. Neurosci. **2011**, 31, 17514–17526.
9. Shimazaki, H.; Amari, S.; Brown, E.N.; Grün, S. State-space analysis of time-varying higher-order spike correlation for multiple neural spike train data. PLoS Comput. Biol. **2012**, 8.
10. Köster, U.; Sohl-Dickstein, J.; Gray, C.M.; Olshausen, B.A. Modeling Higher-Order Correlations within Cortical Microcolumns. PLoS Comput. Biol. **2014**, 10.
11. Shimazaki, H.; Sadeghi, K.; Ishikawa, T.; Ikegaya, Y.; Toyoizumi, T. Simultaneous silence organizes structured higher-order interactions in neural populations. Sci. Rep. **2015**, 5, 9821.
12. Ganmor, E.; Segev, R.; Schneidman, E. Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proc. Natl. Acad. Sci. USA **2011**, 108, 9679–9684.
13. Tkačik, G.; Marre, O.; Amodei, D.; Schneidman, E.; Bialek, W.; Berry, M.J. Searching for Collective Behavior in a Large Network of Sensory Neurons. PLoS Comput. Biol. **2014**, 10.
14. Cayco-Gajic, N.A.; Zylberberg, J.; Shea-Brown, E. Triplet correlations among similarly tuned cells impact population coding. Front. Comput. Neurosci. **2015**, 9.
15. Zylberberg, J.; Shea-Brown, E. Input nonlinearities can shape beyond-pairwise correlations and improve information transmission by neural populations. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. **2015**, 92, 062707.
16. Ganmor, E.; Segev, R.; Schneidman, E. The Architecture of Functional Interaction Networks in the Retina. J. Neurosci. **2011**, 31, 3044–3054.
17. Tkacik, G.; Mora, T.; Marre, O.; Amodei, D.; Berry, M.J.; Bialek, W. Thermodynamics for a network of neurons: Signatures of criticality. **2014**, 112, 11508–11513.
18. Berger, A.L.; Pietra, V.J.D.; Pietra, S.A.D. A maximum entropy approach to natural language processing. Comput. Linguist. **1996**, 22, 39–71.
19. Tkacik, G.; Prentice, J.S.; Balasubramanian, V.; Schneidman, E. Optimal population coding by noisy spiking neurons. Proc. Natl. Acad. Sci. USA **2010**, 107, 14419–14424.
20. Meshulam, L.; Gauthier, J.L.; Brody, C.D.; Tank, D.W.; Bialek, W. Collective Behavior of Place and Non-place Neurons in the Hippocampal Network. Neuron **2017**, 96, 1178–1191.
21. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. **1957**, 106, 620–630.
22. Sohl-Dickstein, J.; Battaglino, P.B.; Deweese, M.R. New method for parameter estimation in probabilistic models: Minimum probability flow. Phys. Rev. Lett. **2011**, 107, 220601.
23. Haslinger, R.; Ba, D.; Galuske, R.; Williams, Z.; Pipa, G. Missing mass approximations for the partition function of stimulus driven Ising models. Front. Comput. Neurosci. **2013**, 7, 96.
24. Darroch, J.N.; Ratcliff, D. Generalized Iterative Scaling for Log-Linear Models. Ann. Math. Stat. **1972**, 43, 1470–1480.
25. Geman, S.; Geman, D. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Trans. Pattern Anal. Mach. Intell. **1984**, PAMI-6, 721–741.
26. Macke, J.H.; Opper, M.; Bethge, M. Common input explains higher-order correlations and entropy in a simple model of neural population activity. Phys. Rev. Lett. **2011**, 106, 208102.
27. Roxin, A.; Brunel, N.; Hansel, D.; Mongillo, G.; van Vreeswijk, C. On the Distribution of Firing Rates in Networks of Cortical Neurons. J. Neurosci. **2011**, 31, 16217–16226.
28. Buzsáki, G.; Mizuseki, K. The log-dynamic brain: How skewed distributions affect network operations. Nat. Rev. Neurosci. **2014**, 15, 264–278.
29. Cohen, M.R.; Kohn, A. Measuring and interpreting neuronal correlations. Nat. Neurosci. **2011**, 14, 811–819.
30. Ferrari, U. Learning maximum entropy models from finite-size data sets: A fast data-driven algorithm allows sampling from the posterior distribution. Phys. Rev. E **2016**, 94, 023301.
31. Malouf, R. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th Conference on Natural Language Learning, Taipei, Taiwan, 31 August–1 September 2002; Volume 20, pp. 1–7.
32. Broderick, T.; Dudik, M.; Tkacik, G.; Schapire, R.E.; Bialek, W. Faster solutions of the inverse pairwise Ising problem. arXiv, 2007; arXiv:0712.2437.
33. Bozdogan, H. Model selection and Akaike’s Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika **1987**, 52, 345–370.
34. Tang, A.; Jackson, D.; Hobbs, J.; Chen, W.; Smith, J.L.; Patel, H.; Prieto, A.; Petrusca, D.; Grivich, M.I.; Sher, A.; et al. A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. J. Neurosci. **2008**, 28, 505–518.
35. Marre, O.; El Boustani, S.; Frégnac, Y.; Destexhe, A. Prediction of spatiotemporal patterns of neural activity from pairwise correlations. Phys. Rev. Lett. **2009**, 102, 138101.
36. Vasquez, J.C.; Marre, O.; Palacios, A.G.; Berry, M.J.; Cessac, B. Gibbs distribution analysis of temporal correlations structure in retina ganglion cells. J. Physiol. Paris **2012**, 106, 120–127.
37. Nasser, H.; Cessac, B. Parameter estimation for spatio-temporal maximum entropy distributions application to neural spike trains. Entropy **2014**, 16, 2244–2277.
38. Herzog, R.; Escobar, M.-J.; Cofre, R.; Palacios, A.G.; Cessac, B. Dimensionality Reduction on Spatio-Temporal Maximum Entropy Models of Spiking Networks. bioRxiv **2018**.
39. Paninski, L.; Pillow, J.; Lewi, J. Statistical models for neural encoding, decoding, and optimal stimulus design. Prog. Brain Res. **2007**, 165, 493–507.
40. Pillow, J.W.; Shlens, J.; Paninski, L.; Sher, A.; Litke, A.M.; Chichilnisky, E.J.; Simoncelli, E.P. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature **2008**, 454, 995–999.
41. Granot-Atedgi, E.; Tkačik, G.; Segev, R.; Schneidman, E. Stimulus-dependent Maximum Entropy Models of Neural Population Codes. PLoS Comput. Biol. **2013**, 9.
42. Vidne, M.; Ahmadian, Y.; Shlens, J.; Pillow, J.W.; Kulkarni, J.; Litke, A.M.; Chichilnisky, E.J.; Simoncelli, E.; Paninski, L. Modeling the impact of common noise inputs on the network activity of retinal ganglion cells. J. Comput. Neurosci. **2012**, 33, 97–121.
43. Brillinger, D.R.; Bryant, H.L.; Segundo, J.P. Identification of synaptic interactions. Biol. Cybern. **1976**, 22, 213–228.
44. Krumin, M.; Reutsky, I.; Shoham, S. Correlation-Based Analysis and Generation of Multiple Spike Trains Using Hawkes Models with an Exogenous Input. Front. Comput. Neurosci. **2010**, 4, 147.
45. Bacry, E.; Muzy, J.F. First- and second-order statistics characterization of hawkes processes and non-parametric estimation. IEEE Trans. Inf. Theory **2016**, 62, 2184–2202.
46. Etesami, J.; Kiyavash, N.; Zhang, K.; Singhal, K. Learning Network of Multivariate Hawkes Processes: A Time Series Approach. arXiv, 2016; arXiv:1603.04319.
47. Macke, J.H.; Berens, P.; Ecker, A.S.; Tolias, A.S.; Bethge, M. Generating spike trains with specified correlation coefficients. Neural Comput. **2009**, 21, 397–423.

**Figure 1.** Toy model of N = 3 neurons with only first- and second-order interactions. Ground-truth probabilities are shown for each spiking pattern (black). Also shown are the frequencies predicted by the best-case (i.e., assuming infinite data and fitting time) Reliable Interaction (RI, magenta) and Reliable Moment (RM, blue) models (assuming a threshold of 0.1). Under these assumptions, the RM model would fit the ground-truth frequencies exactly. The RI model exactly fits the frequencies for spiking patterns above threshold, but is inaccurate for rare patterns. Note that the RI model cannot be normalized because the fitted partition function does not match the fitted interaction terms (see main text and Appendix A for a detailed explanation). Model parameters: $\alpha = 1$, $\beta = 1.2$ (see Equation (3)).

**Figure 2.** Fitting a ground-truth pairwise maximum entropy model (N = 20). (**a**,**b**) Distribution of (**a**) firing rates (assuming a time window of 20 ms) and (**b**) pairwise correlation coefficients generated by the ground-truth models. (**c**–**e**) Example of a Reliable Moment (RM) model fit to 200 s of data simulated from a pairwise ground-truth model (${p}_{\min}={10}^{-3}$). In this example, the RM model identified all 20 units, 154 pairs, 103 triplets, and 5 quadruplets. (**c**) Uncentered sample moments in the fitted RM model plotted against the empirical sample moments (estimated from training data) to show the quality of the model fit. Blue indicates all moments (single, pairwise, and higher-order) that were identified by the RM model. For comparison, red indicates the 36 pairs that were not identified by the RM model (and hence not fitted). (**d**) Cross-validated RM model probabilities versus ground-truth probability (i.e., estimated from held-out “test” data), for an example ground-truth model. Each point represents a different spiking pattern. (**e**) RM model correlations plotted against cross-validated empirical correlations (i.e., model sample correlations plotted against empirical sample correlations from test data). Again, red points indicate pairs whose corresponding interaction terms were not identified. The inset shows the same for firing rates.

**Figure 3.** The Reliable Moment (RM) model infers fewer strong, spurious higher-order interactions. (**a**) Average magnitude of all fitted higher-order interaction parameters as a function of the number of fitted higher-order interactions, shown for both the Reliable Interaction (RI; magenta) and RM (blue) models. Note that all higher-order interactions should have magnitude 0. Points represent 50 random ground-truth models (i.e., random interaction parameters), each of which is fitted 20 times with varying threshold parameters (see Methods). Solid lines indicate the RM and RI fits to a specific example ground-truth model. (**b**) Same as (**a**), but for the standard deviation.

**Figure 4.** The Reliable Moment (RM) model is able to predict the probabilities of new spiking patterns. (**a**) Reliable Interaction (RI; magenta) model frequencies and RM (blue) model probabilities of previously observed spiking patterns plotted against ground-truth probability, for an example ground-truth model. Each point represents a different “old” spiking pattern (i.e., occurring within both test and training datasets). For a fair comparison, we chose an example in which the RM and RI models had the same number of fitted interaction parameters (in this case, 395). (**b**) Dissimilarity (see Methods) between the ground-truth distribution and the model distribution of spiking patterns over different numbers of fitted higher-order interactions. Points represent 50 random ground-truth models (i.e., random interaction parameters), each of which is fitted 20 times with varying threshold parameters. Solid lines indicate the RM and RI fits to a specific example ground-truth model. (**c**,**d**) Same as (**a**,**b**), for new spiking patterns (i.e., those observed in the test data but not in the training data).

**Figure 5.** Fitting a Dichotomized Gaussian model with cortical-like statistics (N = 20). (**a**,**b**) Distribution of (**a**) firing rates (assuming a time window of 20 ms) and (**b**) pairwise correlation coefficients generated by the model. The Dichotomized Gaussian model is known to generate dense higher-order correlations [9,26]. (**c**) Cross-validated dissimilarity between the empirical and model distributions, for both Reliable Interaction (RI; magenta) and Reliable Moment (RM; blue) models. Points represent 50 random ground-truth models (i.e., random interaction parameters), each of which is fitted 20 times with varying threshold parameters. Solid lines indicate the RM and RI fits to a specific example ground-truth model. (**d**) Cross-validated model frequencies versus empirical probability, for an example ground-truth model. Each point represents a different spiking pattern. Only patterns occurring at least twice in the dataset are shown; the inset shows the same plot including spiking patterns that occur only once. For a fair comparison, we chose an example in which the RM and RI models had the same number of fitted interaction parameters (in this case, 239). (**e**) Time required for fitting the RM and RI models.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cayco-Gajic, N.A.; Zylberberg, J.; Shea-Brown, E. A Moment-Based Maximum Entropy Model for Fitting Higher-Order Interactions in Neural Data. *Entropy* **2018**, *20*, 489.
https://doi.org/10.3390/e20070489

**AMA Style**

Cayco-Gajic NA, Zylberberg J, Shea-Brown E. A Moment-Based Maximum Entropy Model for Fitting Higher-Order Interactions in Neural Data. *Entropy*. 2018; 20(7):489.
https://doi.org/10.3390/e20070489

**Chicago/Turabian Style**

Cayco-Gajic, N. Alex, Joel Zylberberg, and Eric Shea-Brown. 2018. "A Moment-Based Maximum Entropy Model for Fitting Higher-Order Interactions in Neural Data" *Entropy* 20, no. 7: 489.
https://doi.org/10.3390/e20070489