# Modules or Mean-Fields?


## Abstract


## 1. Introduction

## 2. Mean-Field Theory

The variational free energy (${F}_{q}$) is an upper bound on the Helmholtz free energy (F). The implication is that, by minimising the former, we should arrive at a good approximation of the latter. This converts the difficult integration problem of Equation (1) into a much easier optimisation problem. Variational approaches of this sort have a long history, perhaps most famously in the formulation of quantum mechanics in terms of distributions over alternative paths a particle might follow [21]. Crucially, the factorisation of the variational density means we can optimise each factor independently. It is this property that lends a modular aspect to particular kinds of random dynamical system.
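To make the bound explicit, here is the standard identity, stated with the Boltzmann form $p(x)={e}^{-\beta H(x)}/Z$ and $F=-{\beta}^{-1}\mathrm{ln}Z$ assumed from the setup of this section:

```latex
% Variational free energy F_q as an upper bound on the Helmholtz F.
% q is the variational density; S[q] is its entropy.
F_q \triangleq \langle H \rangle_q - \beta^{-1} S[q]
    = F + \beta^{-1} D_{\mathrm{KL}}\!\left[ q \,\Vert\, p \right]
    \geq F
```

Because the divergence is non-negative and vanishes only when q = p, minimising ${F}_{q}$ tightens the bound and recovers the target density in the limit.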

The Hamiltonian is built from potentials (${\varphi}_{K}$), whose argument (${x}_{K}$) is some subset of x from a region (K) of the graph involved in that potential. For example, region 6 includes (${x}_{1},{x}_{6}$), as these are the variables linked to the ${\varphi}_{6}$ node. Crucially, regions overlap, such that ${x}_{6}$ participates in regions 6 and 7. The Hamiltonian is given by the sum of these potentials:

## 3. Non-Equilibrium Stochastic Dynamics

Under the mean-field approximation, each marginal depends only on the mean-fields of its neighbouring ${x}_{i}$ and y, and on no other system components. There is a sense in which this can be interpreted as ‘information encapsulation’ [34], one of the key features ascribed to modular architectures. Substituting this into the Fokker–Planck equation, we have:

## 4. Factors and Modules

In the simulations that follow, we perturbed the upper right (${y}_{1}$ in Figure 1) sensory input, the lower left (${y}_{2}$ in Figure 1) input, or both.

Factors further from the perturbed input respond more slowly, and their responses persist for longer, than the factor adjacent to the stimulus (${y}_{1}$). This mimics the (slow and fast) temporal separation seen in neurobiological hierarchies [48,49,50,51]. It also implies a simple form of working memory, in the sense that the effects of the stimulus persist long after it has been removed. Finally, the more central factors respond to both sensory inputs, and show a greater response when both are presented simultaneously. Here, we have evidence in favour of multimodal factors analogous to those brain cells that respond to stimuli presented to different sensory modalities [52,53,54,55,56]. Multimodal properties of this sort speak to the importance of functional integration alongside modular segregation [2,57], heightened during cognitive processing [58].

## 5. Neuronal Message Passing

Under an inferential reading, each x can be thought of as a latent variable with consequences for several sensory channels (${y}_{1},\dots ,{y}_{4}$). For instance, the position of a cup of coffee has potential consequences for vision, gustation, olfaction, and somatosensation. It may be that the data-generating process is of a form that requires some transformation of the x variables, or even that the generative model is not an accurate description of the data-generating process [68]. Regardless of whether the model is a ‘good’ model, the inferential interpretation is useful in thinking about modularity. This is because it allows us to conceptualise a factor of the system as performing computations about something. If each factor is about something different, each can be thought of as a specialised module with a definitive role in relation to the external environment.

## 6. Discussion

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Software Note

## Appendix A

## Appendix B

## References

1. Fodor, J.A. The Modularity of Mind: An Essay on Faculty Psychology, reprint ed.; MIT Press: Cambridge, MA, USA, 1983.
2. Friston, K.J.; Price, C.J. Modules and brain mapping. Cogn. Neuropsychol. **2011**, 28, 241–250.
3. Clune, J.; Mouret, J.-B.; Lipson, H. The evolutionary origins of modularity. Biol. Sci. **2013**, 280, 20122863.
4. Hipolito, I.; Kirchhoff, M.D. The Predictive Brain: A Modular View of Brain and Cognitive Function? Preprints, 2019. Available online: https://www.preprints.org/manuscript/201911.0111/v1 (accessed on 13 May 2020).
5. Baltieri, M.; Buckley, C.L. The modularity of action and perception revisited using control theory and active inference. In Artificial Life Conference Proceedings; MIT Press: Cambridge, MA, USA, 2018; pp. 121–128.
6. Cosmides, L.; Tooby, J. Origins of domain specificity: The evolution of functional organization. In Mapping the Mind: Domain Specificity in Cognition and Culture; Cambridge University Press: New York, NY, USA, 1994; pp. 85–116.
7. Weiss, P. L’hypothèse du champ moléculaire et la propriété ferromagnétique. J. Phys. Theor. Appl. **1907**, 6, 661–690.
8. Kadanoff, L.P. More is the Same; Phase Transitions and Mean Field Theories. J. Stat. Phys. **2009**, 137, 777.
9. Cessac, B. Mean Field Methods in Neuroscience. 2015. Available online: https://core.ac.uk/download/pdf/52775181.pdf (accessed on 13 May 2020).
10. Fasoli, D. Attacking the Brain with Neuroscience: Mean-Field Theory, Finite Size Effects and Encoding Capability of Stochastic Neural Networks. Ph.D. Thesis, Université Nice Sophia Antipolis, Nice, France, 2013.
11. Winn, J.; Bishop, C.M. Variational message passing. J. Mach. Learn. Res. **2005**, 6, 661–694.
12. Gadomski, A.; Kruszewska, N.; Ausloos, M.; Tadych, J. On the Harmonic-Mean Property of Model Dispersive Systems Emerging Under Mononuclear, Mixed and Polynuclear Path Conditions. In Traffic and Granular Flow’05; Springer: Berlin/Heidelberg, Germany, 2007.
13. Hethcote, H.W. Three Basic Epidemiological Models. In Applied Mathematical Ecology; Levin, S.A., Hallam, T.G., Gross, L.J., Eds.; Springer: Berlin/Heidelberg, Germany, 1989; pp. 119–144.
14. Lasry, J.-M.; Lions, P.-L. Mean field games. Jpn. J. Math. **2007**, 2, 229–260.
15. Lelarge, M.; Bolot, J. A local mean field analysis of security investments in networks. In Proceedings of the 3rd International Workshop on Economics of Networked Systems, Seattle, WA, USA, 20–22 August 2008.
16. Friston, K. A free energy principle for a particular physics. arXiv **2019**, arXiv:1906.10184.
17. Yoshioka, D. The Partition Function and the Free Energy. In Statistical Physics: An Introduction; Yoshioka, D., Ed.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 35–44.
18. Hinton, G.E.; Zemel, R.S. Autoencoders, minimum description length and Helmholtz free energy. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1994.
19. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference; University of London: London, UK, 2003.
20. Bogolyubov, N.N. On model dynamical systems in statistical mechanics. Physica **1966**, 32, 933–944.
21. Feynman, R.P. Space-Time Approach to Non-Relativistic Quantum Mechanics. Rev. Mod. Phys. **1948**, 20, 367–387.
22. Loeliger, H. An introduction to factor graphs. IEEE Signal Process. Mag. **2004**, 21, 28–41.
23. Vontobel, P.O. A factor-graph approach to Lagrangian and Hamiltonian dynamics. In 2011 IEEE International Symposium on Information Theory Proceedings; IEEE: Piscataway, NJ, USA, 2011.
24. Loeliger, H.; Vontobel, P.O. Factor Graphs for Quantum Probabilities. IEEE Trans. Inf. Theory **2017**, 63, 5642–5665.
25. Parr, T.; Friston, K.J. The Anatomy of Inference: Generative Models and Brain Structure. Front. Comput. Neurosci. **2018**, 12, 90.
26. Friston, K.J.; Parr, T.; de Vries, B. The graphical brain: Belief propagation and active inference. Netw. Neurosci. **2017**, 1, 381–414.
27. Pelizzola, A. Cluster variation method in statistical physics and probabilistic graphical models. J. Phys. A Math. Gen. **2005**, 38, R309–R339.
28. Yedidia, J.S.; Freeman, W.T.; Weiss, Y. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. Inf. Theory **2005**, 51, 2282–2312.
29. Frey, B.J.; MacKay, D.J.C. A revolution: Belief propagation in graphs with cycles. In Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10; MIT Press: Denver, CO, USA, 1998; pp. 479–485.
30. Risken, H. Fokker-Planck Equation. In The Fokker-Planck Equation: Methods of Solution and Applications; Springer: Berlin/Heidelberg, Germany, 1996; pp. 63–95.
31. Ao, P. Potential in stochastic differential equations: Novel construction. J. Phys. A Math. Gen. **2004**, 37, L25–L30.
32. Kwon, C.; Ao, P.; Thouless, D.J. Structure of stochastic dynamics near fixed points. Proc. Natl. Acad. Sci. USA **2005**, 102, 13029–13033.
33. Ma, Y.-A.; Chen, T.; Fox, E. A complete recipe for stochastic gradient MCMC. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015.
34. Pylyshyn, Z. Is vision continuous with cognition? The case for cognitive impenetrability of visual perception. Behav. Brain Sci. **1999**, 22, 341–365.
35. Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. **2012**, 75, 126001.
36. Grzelczak, M.; Vermant, J.; Furst, E.M.; Liz-Marzán, L.M. Directed Self-Assembly of Nanoparticles. ACS Nano **2010**, 4, 3591–3605.
37. Cheng, J.Y.; Mayes, A.M.; Ross, C.A. Nanostructure engineering by templated self-assembly of block copolymers. Nat. Mater. **2004**, 3, 823–828.
38. Marreiros, A.C.; Kiebel, S.J.; Daunizeau, J.; Harrison, L.M.; Friston, K.J. Population dynamics under the Laplace assumption. Neuroimage **2009**, 44, 701–714.
39. Moran, R.; Pinotsis, D.A.; Friston, K. Neural masses and fields in dynamic causal modeling. Front. Comput. Neurosci. **2013**, 7, 57.
40. Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika **1970**, 57, 97–109.
41. Yildirim, I. Bayesian Inference: Gibbs Sampling; Technical Note; University of Rochester: Rochester, NY, USA, 2012.
42. Neal, R.M. Probabilistic Inference Using Markov Chain Monte Carlo Methods; Department of Computer Science, University of Toronto: Toronto, ON, Canada, 1993.
43. Girolami, M.; Calderhead, B. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B **2011**, 73, 123–214.
44. Ungerleider, L.G.; Haxby, J.V. ‘What’ and ‘where’ in the human brain. Curr. Opin. Neurobiol. **1994**, 4, 157–165.
45. Winkler, I.; Denham, S.; Mill, R.; Bőhm, T.M.; Bendixen, A. Multistability in auditory stream segregation: A predictive coding view. Philos. Trans. R. Soc. B Biol. Sci. **2012**, 367, 1001–1012.
46. Hickok, G.; Poeppel, D. Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition **2004**, 92, 67–99.
47. Friston, K.; Buzsaki, G. The Functional Anatomy of Time: What and When in the Brain. Trends Cogn. Sci. **2016**, 20, 500–511.
48. Kiebel, S.J.; Daunizeau, J.; Friston, K.J. A Hierarchy of Time-Scales and the Brain. PLoS Comput. Biol. **2008**, 4, e1000209.
49. Cocchi, L.; Sale, M.V.; Gollo, L.L.; Bell, P.T.; Nguyen, V.T.; Zalesky, A.; Breakspear, M.; Mattingley, J.B. A hierarchy of timescales explains distinct effects of local inhibition of primary visual cortex and frontal eye fields. eLife **2016**, 5, e15252.
50. Hasson, U.; Yang, E.; Vallines, I.; Heeger, D.J.; Rubin, N. A Hierarchy of Temporal Receptive Windows in Human Cortex. J. Neurosci. **2008**, 28, 2539–2550.
51. Murray, J.D.; Bernacchia, A.; Freedman, D.J.; Romo, R.; Wallis, J.D.; Cai, X.; Padoa-Schioppa, C.; Pasternak, T.; Seo, H.; Lee, D.; et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. **2014**, 17, 1661–1663.
52. Murata, A.; Fadiga, L.; Fogassi, L.; Gallese, V.; Raos, V.; Rizzolatti, G. Object representation in the ventral premotor cortex (area F5) of the monkey. J. Neurophysiol. **1997**, 78, 2226–2230.
53. Giard, M.H.; Peronnet, F. Auditory-Visual Integration during Multimodal Object Recognition in Humans: A Behavioral and Electrophysiological Study. J. Cogn. Neurosci. **1999**, 11, 473–490.
54. Wallace, M.T.; Meredith, M.A.; Stein, B.E. Multisensory Integration in the Superior Colliculus of the Alert Cat. J. Neurophysiol. **1998**, 80, 1006–1010.
55. Limanowski, J.; Blankenburg, F. Integration of Visual and Proprioceptive Limb Position Information in Human Posterior Parietal, Premotor, and Extrastriate Cortex. J. Neurosci. **2016**, 36, 2582–2589.
56. Stein, B.E.; Stanford, T.R. Multisensory integration: Current issues from the perspective of the single neuron. Nat. Rev. Neurosci. **2008**, 9, 255–266.
57. Tononi, G.; Sporns, O.; Edelman, G.M. A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. USA **1994**, 91, 5033–5037.
58. Fukushima, M.; Betzel, R.F.; He, Y.; van den Heuvel, M.P.; Zuo, X.N.; Sporns, O. Structure-function relationships during segregated and integrated network states of human brain functional connectivity. Brain Struct. Funct. **2018**, 223, 1091–1106.
59. Markov, N.T.; Ercsey-Ravasz, M.; Van Essen, D.C.; Knoblauch, K.; Toroczkai, Z.; Kennedy, H. Cortical high-density counterstream architectures. Science **2013**, 342, 1238406.
60. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann: San Francisco, CA, USA, 1988.
61. Friston, K.; Kiebel, S. Predictive coding under the free-energy principle. Philos. Trans. R. Soc. B Biol. Sci. **2009**, 364, 1211–1221.
62. Rao, R.P.; Ballard, D.H. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. **1999**, 2, 79–87.
63. David, O.; Kilner, J.M.; Friston, K.J. Mechanisms of evoked and induced responses in MEG/EEG. NeuroImage **2006**, 31, 1580–1591.
64. Knill, D.C.; Pouget, A. The Bayesian brain: The role of uncertainty in neural coding and computation. Trends Neurosci. **2004**, 27, 712–719.
65. Doya, K. Bayesian Brain: Probabilistic Approaches to Neural Coding; MIT Press: Cambridge, MA, USA, 2007.
66. Friston, K. The free-energy principle: A unified brain theory? Nat. Rev. Neurosci. **2010**, 11, 127–138.
67. O’Reilly, J.X.; Jbabdi, S.; Behrens, T.E.J. How can a Bayesian approach inform neuroscience? Eur. J. Neurosci. **2012**, 35, 1169–1179.
68. Tschantz, A.; Seth, A.K.; Buckley, C.L. Learning action-oriented models through active inference. bioRxiv **2019**.
69. George, D.; Hawkins, J. Towards a mathematical theory of cortical micro-circuits. PLoS Comput. Biol. **2009**, 5, e1000532.
70. Parr, T.; Markovic, D.; Kiebel, S.J.; Friston, K.J. Neuronal message passing using Mean-field, Bethe, and Marginal approximations. Sci. Rep. **2019**, 9, 1889.
71. Van de Laar, T.W.; de Vries, B. Simulating Active Inference Processes by Message Passing. Front. Robot. AI **2019**, 6, 20.
72. Parr, T.; Costa, L.D.; Friston, K. Markov blankets, information geometry and stochastic thermodynamics. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. **2020**, 378, 20190159.
73. Sajid, N.; Ball, P.J.; Friston, K.J. Demystifying active inference. arXiv **2019**, arXiv:1909.10863.
74. Da Costa, L.; Parr, T.; Sajid, N.; Veselic, S.; Neacsu, V.; Friston, K. Active inference on discrete state-spaces: A synthesis. arXiv **2020**, arXiv:2001.07203.
75. Harding, M.C.; Hausman, J. Using a Laplace Approximation to Estimate the Random Coefficients Logit Model by Nonlinear Least Squares. Int. Econ. Rev. **2007**, 48, 1311–1328.
76. Daunizeau, J.; Friston, K.J.; Kiebel, S.J. Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Phys. D Nonlinear Phenom. **2009**, 238, 2089–2118.
77. He, X.; Cai, D.; Shao, Y.; Bao, H.; Han, J. Laplacian regularized Gaussian mixture model for data clustering. IEEE Trans. Knowl. Data Eng. **2010**, 23, 1406–1418.
78. Parr, T.; Friston, K.J. The Discrete and Continuous Brain: From Decisions to Movement—And Back Again. Neural Comput. **2018**, 30, 2319–2347.
79. Parr, T.; Friston, K.J. The computational pharmacology of oculomotion. Psychopharmacology **2019**, 236, 2473–2484.
80. Tsujimoto, S.; Postle, B.R. The prefrontal cortex and oculomotor delayed response: A reconsideration of the “mnemonic scotoma”. J. Cogn. Neurosci. **2012**, 24, 627–635.
81. Funahashi, S. Functions of delay-period activity in the prefrontal cortex and mnemonic scotomas revisited. Front. Syst. Neurosci. **2015**, 9, 2.
82. Kojima, S.; Goldman-Rakic, P.S. Delay-related activity of prefrontal neurons in rhesus monkeys performing delayed response. Brain Res. **1982**, 248, 43–50.
83. Zarghami, T.S.; Friston, K.J. Dynamic effective connectivity. NeuroImage **2020**, 207, 116453.
84. Wu, C.-H.; Doerschuk, P.C. Tree approximations to Markov random fields. IEEE Trans. Pattern Anal. Mach. Intell. **1995**, 17, 391–402.
85. Wainwright, M.J.; Jaakkola, T.S.; Willsky, A.S. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Trans. Inf. Theory **2003**, 49, 1120–1146.
86. Friston, K. Life as we know it. J. R. Soc. Interface **2013**, 10, 20130475.
87. Rojas-Carulla, M.; Schölkopf, B.; Turner, R.; Peters, J. Invariant models for causal transfer learning. J. Mach. Learn. Res. **2018**, 19, 1309–1342.
88. Bengio, Y. Deep learning of representations for unsupervised and transfer learning. Workshop Conf. Proc. **2012**, 27, 17–37.
89. Maisto, D.; Donnarumma, F.; Pezzulo, G. Divide et impera: Subgoaling reduces the complexity of probabilistic inference and problem solving. J. R. Soc. Interface **2015**, 12, 20141335.
90. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. Ser. II **1957**, 106, 620–630.
91. Hohwy, J. The Self-Evidencing Brain. Noûs **2016**, 50, 259–285.

**Figure 1.** This schematic illustrates how the decomposition of a Hamiltonian into the sum of potentials may be represented graphically. This is a factor graph that represents each potential as a square node. The arguments of each potential are represented as circles connected to that square node. The y arguments of the potentials are represented as smaller squares. The arrows on some of the edges inherit from the interpretation of potentials as log conditional probabilities. If a random variable A is conditionally dependent on a variable B, the factor linking the two will include an arrow pointing towards A. The factor graph shown here is the (arbitrarily constructed) Hamiltonian that we will employ in the simulations in subsequent figures. This assumes a quadratic form for each potential. The details of these potentials are not important and could be replaced with any alternative quadratic functions. For readers interested in the precise formulation used in the simulations that follow, please see the Matlab routines referred to in the software note. In brief, each potential is centred upon a linear function of the mode of the neighbouring potential. An important feature of this structure is the sparsity of conditional dependencies. Each factor connects at most two variables. We assume ${x}_{i}\in {\mathbb{R}}^{2}$ in what follows. Uppercase subscripts are used to identify larger groups of x (i.e., ${x}_{K}\in {\mathbb{R}}^{\ge 2}$), corresponding to the argument of a given potential.
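The sum-of-potentials structure described in this caption can be sketched in a few lines. This is a hypothetical toy, in Python rather than the Matlab routines the software note refers to; the region memberships, precisions, and centres below are invented for illustration:

```python
import numpy as np

# Hypothetical sketch of a Hamiltonian built as a sum of overlapping
# quadratic potentials, H(x) = sum_K phi_K(x_K). The regions and
# parameters here are invented; the paper's own potentials live in
# the Matlab routines referenced in the software note.

def make_potential(indices, Pi, centre):
    """Return phi_K(x) = 0.5 * (x_K - centre) @ Pi @ (x_K - centre),
    where x_K gathers the components of x listed in `indices`."""
    Pi = np.asarray(Pi, dtype=float)
    centre = np.asarray(centre, dtype=float)
    def phi(x):
        x_K = np.asarray(x, dtype=float)[list(indices)]
        d = x_K - centre
        return 0.5 * d @ Pi @ d
    return phi

# Two overlapping regions sharing x_1 (cf. x_6 participating in both
# regions 6 and 7 in Figure 1):
potentials = [
    make_potential((0, 1), np.eye(2), np.zeros(2)),
    make_potential((1, 2), np.eye(2), np.ones(2)),
]

def hamiltonian(x):
    """H(x) = sum of all potentials; only overlaps couple the regions."""
    return sum(phi(x) for phi in potentials)
```

For example, `hamiltonian([0.0, 0.0, 1.0])` evaluates the first potential at its centre (contributing 0) and the second at unit distance along one coordinate (contributing 0.5).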

**Figure 2.** The plots in this figure illustrate the evolution of the random dynamical system whose Hamiltonian is shown in Figure 1. The plots on the left show the evolution of the system over time. This is a 34-dimensional system, which is shown on the right in terms of 17 particles, whose positions are described by two coordinates. The plots on the right show the final configuration at the end of the simulation. The first row shows a single realization of a stochastic trajectory. The second averages over 16 realizations of the trajectory. The third row shows the density dynamics under a Laplace approximation. The mean-field factorisation treats each particle independently (so each factor is a bivariate normal distribution). The filled pink circles in the plots on the right illustrate the values of the y variables (which are fixed). For ease of visibility, the intensity of each of the densities superimposed on this image has been normalised, such that their mode is the same intensity (regardless of the probability density at that mode).
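Single realizations like the one in the first row can be generated by Euler–Maruyama integration of a flow with dissipative (Γ) and solenoidal (Q) components, whose stationary density is proportional to exp(−H). The sketch below uses an invented quadratic H and arbitrary Γ and Q; it illustrates the scheme, not the paper's exact simulation:

```python
import numpy as np

# Euler-Maruyama sketch of a stochastic flow with dissipative (Gamma)
# and solenoidal (Q) parts, dx = (Q - Gamma) grad H(x) dt + noise,
# whose stationary density is proportional to exp(-H(x)). The
# quadratic H and the particular Gamma, Q are illustrative only.

def simulate(grad_H, Gamma, Q, x0, dt=1e-3, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    Gamma = np.asarray(Gamma, dtype=float)
    Q = np.asarray(Q, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    # Fluctuations have covariance 2 * Gamma * dt per step.
    L = np.linalg.cholesky(2.0 * Gamma * dt)
    traj = np.empty((steps + 1, x.size))
    traj[0] = x
    for t in range(steps):
        x = x + (Q - Gamma) @ grad_H(x) * dt + L @ rng.standard_normal(x.size)
        traj[t + 1] = x
    return traj

grad_H = lambda x: x  # H(x) = |x|^2 / 2, a single quadratic potential
Gamma = 0.1 * np.eye(2)                         # diffusion (dissipative)
Q = 0.5 * np.array([[0.0, 1.0], [-1.0, 0.0]])   # solenoidal (rotational)
traj = simulate(grad_H, Gamma, Q, x0=[2.0, 0.0])
```

Increasing the solenoidal gain relative to Γ yields the intrinsically driven oscillations discussed in relation to Figure 6.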

**Figure 3.** This figure decomposes the density dynamics of Figure 2 in line with the mean-field partition. The arrows here indicate the influence of each marginal density on another via their associated mean-fields. In other words, they represent the non-zero elements of the Jacobian for the vector μ, with elements ${\mu}_{i}$, whose rate of change is given in Equation (13). Each image shows the probability density at the end of the simulation (in blue) and the trajectory of the mode throughout the simulation (white). Note the highly precise distribution over the central factor, which is constrained by its four neighbours. The key message to take away from this figure is that the mean-field approximation separates the full system of Figure 2 into a series of smaller systems that influence one another only through their averages.

**Figure 4.** These plots show the consequences of perturbing the y variables on the density dynamics (depicted as the mode and surrounding 90% credible intervals) for each factor. The images on the left indicate which factor is shown in each row of the plots. Each column of plots shows a separate simulation in which different perturbations are applied to the y variables. In this figure, the plots show the central factor (first row) through to the upper right factor (fifth row). The lower two rows show the y variables in the upper right and lower left. These are perturbed by introducing a sinusoidal impulse. The first column of plots shows the response to the upper right perturbation. The second column shows the limited response to the lower left perturbation. The third column shows the increased recruitment of more central regions in the presence of both perturbations. The key point to take away from this figure is that a simple form of ‘information encapsulation’ or functional specialisation occurs in the extremities, with specific responses to, and only to, one sort of y variable. Moving towards the centre, responses unfold over a hierarchy of timescales, with progressively prolonged responses evocative of delay-period firing in working memory tasks, and the factors become progressively multimodal. Figure 5 shows the lower left factors in the same simulation.

**Figure 5.** These plots complement those of Figure 4, illustrating the same perturbations and their consequences for the lower left modules. Here, there is little effect of the upper right y perturbation until we reach more central regions. However, there is a response to the lower left y perturbation that was not seen in Figure 4. For details on the format of these plots, please see the legend of Figure 4.

**Figure 6.** The central panel in this figure shows an interpretation of Equations (13) and (14), applied to the Hamiltonian of Figure 1, as a neuronal network. This shows a reciprocal message passing in which more central and more peripheral regions communicate along a neural hierarchy. Each arm of this hierarchy connects central regions to sensory input (shown as squares, consistent with previous figures). Central regions therefore have multimodal properties, responding to any of the sensory perturbations. Peripheral regions are more specialised by virtue of their proximity to external input. The panel on the left unpacks the connections between two regions (modules, or factors of a mean-field density) in detail. This includes neural populations representing the (2-dimensional) mode (in red and blue), auxiliary variables (in lighter shades) playing the role of prediction errors (i.e., gradients of the local Hamiltonian), and connections between these. Blue connections are inhibitory, while red connections are excitatory. Note that, while some populations are shown as giving rise to both excitatory and inhibitory connections, we do not intend to imply a violation of Dale’s law. The assumption here is that there are intermediate inhibitory neurons that act to change the sign of the connection. The panel on the right highlights the importance of intrinsic (intra-modular) connectivity, and the role of the diffusion tensor (Γ) and solenoidal flow (Q) in determining neural activity. If the solenoidal component is large relative to the diffusion tensor, this leads to net excitation of the blue neuron by the red, and net inhibition of the red by the blue. This pattern of connectivity favours intrinsically driven oscillations. The circuit dominated by the diffusion tensor favours rapid convergence of neural activity to a fixed point.

Distribution | Support | Hamiltonian
---|---|---
Gaussian | $x\in \mathbb{R}$ | ${\scriptscriptstyle \frac{1}{2\beta}}(x-\mu )\cdot \Pi (x-\mu )$
Multinomial ^{1} | ${x}_{i}\in \{0,\dots ,N\}$; $i\in \{1,\dots ,K\}$; ${\sum}_{i}{x}_{i}=N$ | $-{\scriptscriptstyle \frac{1}{\beta}}{\sum}_{i}{x}_{i}\mathrm{ln}{d}_{i}$
Dirichlet ^{2} | ${x}_{i}\in (0,1)$; $i\in \{1,\dots ,K\}$; ${\sum}_{i}{x}_{i}=1$ | ${\scriptscriptstyle \frac{1}{\beta}}{\sum}_{i}(1-{\alpha}_{i})\mathrm{ln}{x}_{i}$
Gamma | $x\in (0,\infty )$ | ${\scriptscriptstyle \frac{1}{\beta}}\left(bx+(1-a)\mathrm{ln}x\right)$

^{1} Special cases include Categorical (K > 2, N = 1), Binomial (K = 2, N > 1), and Bernoulli (K = 2, N = 1) distributions.

^{2} A special case is the Beta distribution (K = 2).
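As a numerical sanity check (an illustration added here, not from the paper), the Gaussian row can be verified against the Boltzmann form $p(x)\propto {e}^{-\beta H(x)}$: differences of $-\beta H$ should match differences of the normal log-density exactly.

```python
import numpy as np

# Illustrative check: for the Gaussian row of the table,
# H(x) = (1/(2*beta)) * (x - mu) * Pi * (x - mu), the Boltzmann
# density exp(-beta * H(x)) is the usual normal density up to a
# normalisation constant, so log-density *differences* agree exactly.

beta, mu, Pi = 2.0, 1.0, 4.0   # arbitrary; Pi is a (scalar) precision

def H(x):
    return (1.0 / (2.0 * beta)) * (x - mu) * Pi * (x - mu)

def log_normal_pdf(x):
    var = 1.0 / Pi
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

xs = np.array([0.0, 0.5, 2.0])
boltzmann = -beta * (H(xs) - H(xs[0]))            # log-density differences
gaussian = log_normal_pdf(xs) - log_normal_pdf(xs[0])
assert np.allclose(boltzmann, gaussian)
```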

Name | Hamiltonian | Comments
---|---|---
Mean-field | ${\sum}_{i}{h}_{i}({x}_{i},y)$ | As in the main text, x is divided into non-overlapping subsets (${x}_{i}$), each of which is associated with its own Hamiltonian. The inference scheme associated with this approximation is known as variational message passing [11].
Bethe | ${\sum}_{ij}{h}_{ij}^{(2)}({x}_{i},{x}_{j},y)-{\sum}_{k}({c}_{k}^{(1)}-1){h}_{k}^{(1)}({x}_{k},y)$ | This expression uses a series of overlapping pairwise (superscript 2) Hamiltonians, which are then ‘corrected’ for these overlaps by subtracting singleton (superscript 1) Hamiltonians. Here, ${c}_{k}$ is the number of pairwise factors that include ${x}_{k}$ as an argument. The inference scheme associated with this approximation is known as (loopy) belief propagation [29].
Kikuchi | ${\sum}_{R}{c}_{R}^{(i)}{h}_{R}^{(i)}({x}_{R}^{(i)},y)$, where ${c}_{R}^{(i)}\triangleq 1-{\sum}_{\left\{K:R\subset K\right\}}{c}_{K}^{(i+1)}$ | This expression generalises the above approximations. Here, the subscripts index regions, while the superscript indexes the size of that region. In this expression, ${x}_{R}^{(i)}$ includes all elements of x in region R at scale i. Here, regions may overlap. If all regions are of size 1, this reduces to a mean-field approximation. If some are size 1 and others size 2, this is the Bethe approximation. Inference schemes based on the Kikuchi approximation are known as cluster variational methods or generalised belief propagation [27,28].
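The overlap-counting logic in the Bethe row can be made concrete with a few lines of Python (the edge list below is invented for illustration):

```python
# Illustrative computation of Bethe counting numbers on a small graph.
# c_k counts the pairwise factors (edges) that include x_k, and each
# singleton Hamiltonian is weighted by -(c_k - 1) to correct for the
# overlaps. On a tree, the counting numbers sum to 1, reflecting that
# the Bethe approximation is exact there.

edges = [(0, 1), (1, 2), (1, 3)]    # a small tree on four variables
n_vars = 4

c = [sum(k in e for e in edges) for k in range(n_vars)]
total = len(edges) - sum(c_k - 1 for c_k in c)  # sum of counting numbers
assert total == 1                               # Bethe is exact on trees
```

On a graph with cycles the same bookkeeping applies, but the approximation is no longer exact, which is why belief propagation on loopy graphs is only approximate.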

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Parr, T.; Sajid, N.; Friston, K.J. Modules or Mean-Fields? *Entropy* **2020**, *22*, 552.
https://doi.org/10.3390/e22050552
