The 2-D Cluster Variation Method: Initial Findings and Topography Illustrations

One of the biggest challenges in characterizing 2-D image topographies is finding a low-dimensional parameter set that can succinctly describe, not so much image patterns themselves, but the nature of these patterns. The 2-D Cluster Variation Method (CVM), introduced by Kikuchi in 1951, can characterize very local image pattern distributions using configuration variables, identifying nearest-neighbor, next-nearest-neighbor, and triplet configurations. Using the 2-D CVM, we can characterize 2-D topographies using just two parameters: the activation enthalpy and the interaction enthalpy. Initial investigations with two different representative topographies (“scale-free-like” and “rich club-like”) produce interesting results when brought to a CVM free energy minimum. Additional phase space investigations, where one of these two parameters has been set to zero, identify useful parameter ranges. Careful comparison of the analytically-predicted configuration variables with those obtained when performing computational free energy minimization on a 2-D grid shows that the computational results differ significantly from the analytic solution. The 2-D CVM can potentially function as a secondary free energy minimization within the hidden layer of a neural network, providing a basis for extending node activations over time and allowing temporal correlation of patterns.


Introduction
A wide range of natural phenomena, from natural topographies to brain activation patterns, are expressed as two-dimensional (2-D) topographies. One of the most important areas in which we observe interesting 2-D topographies is in cortical activation patterns. Over the past two decades, neural activation studies have led to characteristic neural pattern descriptions, such as "rich club," "small world," and "scale free." Despite the increase in our ability to observe and characterize activation patterns within the brain, we have not yet had a comprehensive framework that provides an overarching rationale for neural pattern evolutions. This has not been for lack of interest or attention. There have been multiple efforts to present neural activation patterns in the context of free energy minimization within the brain. While several researchers have each brought forth valuable arguments, we still do not have an overarching framework that would allow us to ground the formation of neural activation patterns within the context of free energy minimization.
One reason that this might be the case is that, up until now, we have not had an appropriate free energy function. However, the 2-D Cluster Variation Method (CVM) may provide a means for not only characterizing neural activation patterns, but also for couching their dynamics within a solid framework.
The 2-D CVM offers three key advantages: 1. Characterization of very local pattern distributions via configuration variables, 2. A continuous range of topographic types within a well-specified phase space, and 3. Two-parameter topographic pattern control.
The Cluster Variation Method (CVM) was originally devised by Kikuchi in 1951 [1], and then further advanced by Kikuchi and Brush (1967) [2]. Maren (2016) suggested that the CVM would be appropriate for modeling brain topographies [3], and presented initial computational results for the 1-D CVM. More recently, Maren (2019a) investigated how the free energy minimization process influenced resultant 2-D topographies represented by a 2-D CVM grid [4].  Up until now, the CVM approach has not received a great deal of attention. However, as Figure 1 illustrates, this approach is potentially useful for characterizing 2-D systems that would naturally gravitate to a free energy-minimized state. Examples of potential applications include:

• Modeling systems where the topography is an essential component of system description; this can range from urban/rural topographies to brain images,
• Providing an essential component of a new computational engine, where memories of prior system states can persist over time, and
• Providing the modeling component of a variational Bayes approach that uses a representational system to model an external system, where the two are separated by a Markov blanket, and both systems are presumed to come to separate free energy equilibrium states, such as the systems advocated by Friston and colleagues (2010, 2013, 2015, 2020) [5-8].

Illustration of a Key Result
The results shown in Figure 1 correspond to the case where the unit activation enthalpy was set to zero; ε 0 = 0. The only remaining parameter was the interaction enthalpy between a pair of neighboring units, which was here set to a fairly high value of ε 1 = 0.250, or (more useful in the actual computations) h = 1.65, where the h-value (or simply h) is given as h = exp(2ε 1 ). The manually-designed pattern used as a starting point in Figure 1 (a) was a "scale-free-like" design, in that it had masses of "on" (A) units embedded into a sea of "off" (B) nodes. The original design plan was to have masses of 16, 8, and 4 units, with increasing numbers of masses as the mass-size decreased. This original plan had to be adjusted to satisfy the constraint of keeping equal numbers of (A) and (B) units. This constraint kept the unit activation enthalpy at zero (ε 0 = 0), which allowed direct comparison of the computational configuration variable results with those obtained analytically. The configuration variables are fractional values for nearest-neighbor, next-nearest-neighbor, and triplet configurations. The cluster variation method rests on including these variables in the entropy term of the free energy formulation.
One of the interesting things about the 2-D CVM approach is that we can see immediate visual results corresponding to our selection of the interaction enthalpy parameter, ε 1 . The value used here, h = 1.65 (ε 1 = 0.25), strongly encouraged pairwise interactions. Had the free energy minimization process continued (using a more sophisticated node state adjustment algorithm), the resulting pattern could have moved much more strongly in the direction of a "rich club" configuration. Even with existing algorithmic constraints, we see that previously unconnected "islands" of (A) (black nodes) in a sea of (B) (white) are now connected, typically via extended spider-leg-like diagonal trails of black (A) nodes, connecting the previously-isolated A-masses. This is just one of several observations that we can make about topographies resulting from 2-D CVM free energy minimization.
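As a quick reference, the mapping from the interaction enthalpy parameter ε 1 to the h-value used above is a one-liner; the following is a minimal sketch (the function name is an assumption, not from the original code):

```python
import math

def h_value(eps1: float) -> float:
    """h-value for an interaction enthalpy parameter: h = exp(2 * eps1)."""
    return math.exp(2.0 * eps1)

# eps1 = 0.25 gives h = exp(0.5), approximately the h = 1.65 used in Figure 1
print(round(h_value(0.25), 2))  # prints 1.65
```

Note that h = 1 corresponds to the non-interacting case ε 1 = 0.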
The remaining sections will investigate this figure in more depth; this is presented here as an illustration of the kind of results obtained via free energy minimization in a 2-D CVM grid.
The following subsection briefly describes prior work on free energy minimization in the brain, and the next subsections describe different topographies ("rich club," "small world," and "scale-free"), both in general and as they occur in the brain. Following subsections introduce the 2-D CVM, with specific attention to the configuration variables, which provide very local topographic descriptive elements.

Free Energy Minimization in the Brain
The notion that the brain operates to minimize free energy is not new; there is a long and robust history of efforts to advance this approach. (See Maren (2016) for a substantial discussion [3].) There is vigorous discussion within the neural dynamics modeling community as to exactly how free energy minimization plays its role.
For example, one point of view is that the brain operates in a near-critical state. Early work includes notions of criticality in the brain induced by "mutual excitation," as suggested by Kozma et al. (2012) [9] and Kozma and Puljic (2015) [10], and previously by Kozma et al. (2007) [11]. This work suggests that the brain operates at a near-critical state, specifically that the brain operates at a "pseudo-equilibrium" [9]. They advance the notion of neuropercolation, suggesting that "populations of cortical neurons ... sustain their metastable state by mutual excitation, and that its stability is guaranteed by the neural refractory periods." Recent data-driven studies support the notion that the brain acts near criticality. Ezaki et al. (2020) have used functional magnetic resonance imaging (fMRI) of networks at a whole-brain level to posit that individuals with higher fluid intelligence have brains operating closer to criticality, as compared with others [12]. They address patterns of transition between discrete states.
Others, such as Tognoli and Kelso (2014), also suggest that the brain operates in a metastable state [13]. Wilting and Priesemann (2020) provide an excellent review and discussion of criticality in neural systems [14].
Beginning in the second decade of this millennium, Friston and colleagues began introducing a notion of how free energy minimization could work in the brain. One of the fundamental ideas that they have advocated (Friston et al., 2012) shows how Bayesian priors can be derived as outcomes of variational free energy minimization [15]. This specifically addresses notions of self-organization within the brain.
As noted by Friston et al. (2013), movement towards a new state will be influenced by the nature of the stimulus or perturbation. Friston describes this, saying: "In short, the internal states will appear to engage in Bayesian inference, effectively inferring the (external) causes of sensory states. Furthermore, the active states are complicit in this inference, sampling sensory states that maximize model evidence: in other words, selecting sensations that the system expects. This is active inference, ..." [6].
Friston and colleagues have taken this further to suggest mechanisms for neural communication strategies as an operative element of a "graphical brain" [16]. More recently, Demekas, Parr, and Friston (2020) have suggested that free energy minimization in the brain as a principle underlying active inference can facilitate emotion recognition [17].
While Friston et al. postulate how free energy minimization can be intimately tied to a range of important dynamic phenomena in the brain, the next step is to represent the free energy associated with spatio-temporal neural activity patterns. Thus, a means for explicitly representing pattern distributions can be most useful.
A detailed investigation into these different points of view is beyond the scope of this work. It suffices to say that various leading researchers are proponents for the idea that the brain does, indeed, perform some sort of free energy minimization. The most immediate question is thus: how does this process evidence in terms of activation patterns within the brain?

The Challenge: Efficiently Characterizing Two-Dimensional Topographies
The following Figure 2 illustrates a set of three easily-observed, naturally-occurring topographies; black lava rocks surrounded by white coral sand. Figure 2(a) shows relatively compact black lava rock elements, all of which can be connected with each other, although some connection routes will pass through neighboring masses. In Figure 2(b), there is a greater range of sizes in the black lava rock masses. Further, some are now disjunct from each other; they are now truly "islands." In Figure 2(c), the trend towards greater granularity continues, as the lava rocks are broken into smaller and smaller units, and relatively more are isolated from each other. Figure 2 gives us visual topographic images, representing a continuum of topography types. Similarly, there are several topography types that are commonly used to describe neural activation patterns. Most notably, three common topographic types have been recognized with their own labels, specifically: 1. Rich club, 2. Small world, and 3. Scale free.
Ultimately, we desire a means of connecting the morphological descriptions with various topology patterns produced via free energy minimization in the brain. Ideally, we will be able to map parameter regions in a phase space for free energy-minimized systems to corresponding topologies. To enable this goal, we briefly identify some salient work describing each of the three topographic types, with particular attention to how they appear in neural activation patterns.

Rich Club Topographies in Neural Activation Patterns
One of the most well-known neural topographic structures has been described as being a rich club, characterized by clusters in which every node is connected with every other node in that cluster. Recently, Kim and Min (2020) take a graph theoretical approach to describing the rich-club structure in the brain [18]. Csigi et al., in their studies of rich clubs in complex networks, note that "the literature contains only a very limited set of models capable of generating networks with realistic rich club structure" [19]. They propose a means for generating rich club structures. One of the future directions that can ensue from this present study will be to create a means for generating certain specific topology types, such as rich clubs, within the context of 2-D CVM free energy minimization. This topic is developed further in Section 3: Discussion.

Small World Topographies in Neural Activation Patterns
Watts and Strogatz (1998) were the first to introduce the notion of a "small-world" network, identifying that between completely regular and completely random topographies, there was a topographic type that was both highly clustered and, at the same time, had a short path length between nodes [20]. (For our discussion, this would be a short path length between active nodes, or those in state A.) They noted that this topography allowed networks to have enhanced signal-propagation speed, computational power, and synchronizability. Muldoon et al. have developed a metric to assess what they term "small world propensity," or the degree to which a system is likely to exhibit small-world-ness [21]. Liao et al. provide a concise illustrated summary of key measures useful in evaluating small-world-ness [22], and Zhu et al. (2020) take a graph theory-based approach to investigate the topological organization of the brain connectome, with particular attention to small-world properties [23].
Friston, describing how perturbations can influence a neural system, notes that "The emerging picture is that endogenous fluctuations are a consequence of dynamics on anatomical connectivity structures with particular scale-invariant and small-world characteristics," and that these models "... also speak to larger questions about how the brain maintains itself near phase-transitions (i.e., self-organized criticality and gain control)" [24].
The notion of "small-world-ness" rests on the notion of geodesic distance: between any pair of nodes in an unweighted network, the geodesic distance is given by the minimum number of edges that must be traversed to travel from the starting node to the destination node. In particular, a network is said to be a small-world network (or to satisfy the small-world property) if the mean geodesic distance between pairs of nodes is small relative to the total number of nodes in the network. Watts and Strogatz further introduced the notion of a global clustering coefficient, C, that is given as the ratio of three times the number of triangles over the number of connected triples.
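Both measures are straightforward to compute on small graphs; the sketch below is illustrative only (the function names and the adjacency-list representation are assumptions, and the network is taken to be connected and unweighted):

```python
from collections import deque
from itertools import combinations

def mean_geodesic_distance(adj):
    """Mean geodesic (shortest-path) distance over all ordered node pairs,
    computed by breadth-first search from each node; assumes a connected graph."""
    nodes = list(adj)
    total, pairs = 0, 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t in nodes:
            if t != s:
                total += dist[t]
                pairs += 1
    return total / pairs

def global_clustering(adj):
    """Global clustering coefficient C = 3 * (#triangles) / (#connected triples)."""
    closed, triples = 0, 0
    for u in adj:
        for v, w in combinations(adj[u], 2):
            triples += 1              # the path v-u-w is a connected triple
            if w in adj[v]:
                closed += 1           # the triple closes into a triangle
    # each triangle is seen once from each of its three centre nodes,
    # so 'closed' already equals 3 * (number of triangles)
    return closed / triples if triples else 0.0
```

On a triangle graph both measures equal 1.0, while on a three-node path the clustering coefficient drops to 0, matching the intuition that small-world networks combine high clustering with short path lengths.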
One direction of this work (elaborated further Section 3: Discussion) will be to correlate the global clustering coefficient with measures that can be obtained from nodes within a 2-D CVM grid.

Scale-Free Topographies in Neural Activation Patterns
The notion of scale-free behavior in neural systems is generally linked to power-law dynamics, rather than to a visual topography in neural activation patterns. The essential visual notion underlying a scale-free topography is that the distribution of patterns, in terms of their relative scale, is independent of the particular scale at which we examine the topography. A fractal pattern is scale-free, and so is a snowflake.
While initially, scale-free systems were thought to be ubiquitous, various researchers have more recently challenged that assertion. As an example, Broido and Clauset (2019) analyzed nearly 1000 networks, from across different domains, and found that relatively small proportions followed any real degree of power-law behavior [25]. Zhou et al. (2020), however, define a fundamental property, the degree-degree distance, which is possessed by each link in a system. The distribution of this property can follow a power law, and can be used to characterize whether or not a system follows power-law dynamics, and thus, whether it would be classed as "scale-free" [26]. They find that, using this metric, the preponderance of scale-free systems (following a power law) is actually substantive. They find that their model "... predicts that the degree-degree distance distribution exhibits stronger power-law behavior than the degree distribution of a finite-size network, especially when the network is dense." They further identify how distributions change as networks evolve into more dense stages. This will be an interesting avenue for exploration as we characterize how a network expressed as a 2-D CVM grid pattern can change its density as the two controlling parameters change.

Introducing the Cluster Variation Method
The essential notion of the CVM is that we work with a more complex entropy expression within the free energy formalism for a system.
In a simple Ising model, the entropy S can be computed based on only the relative fraction of active units in a bistate system. That is, there are only two kinds of units: active ones in state A, where the fraction of these units is denoted x 1 , and inactive ones in state B, where the fraction of these units is denoted x 2 .
In contrast to the simple entropy used in the basic Ising model, in the CVM approach, we expand the entropy term. The CVM entropy term considers not only the relative fractions of units in states A and B, but also a set of configuration variables. The configuration variables for a 2-D CVM system include, in addition to the usual activation of single units, the fractional values of different kinds of nearest-neighbor and next-nearest-neighbor pairs, along with six different kinds of triplets. (Section 1.6 describes the configuration variables in depth.) Our entropy term now involves what are essentially topographic variables. That is, the location of one unit in conjunction with another now makes a substantial difference.
As a result, when we do free energy minimization to find an equilibrium configuration for a given system, we arrive at a 2-D CVM with certain characteristic topographic properties. While the exact activation of a specific unit (whether it is in state A or state B) can vary, each time a given system is brought to equilibrium, the overall values for the configuration variables should be consistent for a given set of enthalpy parameters.
These topographic patterns are worth our attention and interest.

Background: The Cluster Variation Method (CVM)
The Cluster Variation Method (CVM) was originally devised by Kikuchi in 1951 [1], and then further advanced by Kikuchi and Brush (1967) [2]. Over the past two decades, various researchers have placed the CVM into a broader context, most notably within graph theory and belief propagation methods.
Although not yet fully exploited, this provides a valuable avenue for CVM applications. Yedidia (2000) [27] has provided a pleasant introduction to the CVM in the context of an overall discourse on Gibbs Free Energy, and (more specifically) the Ising equation. Shortly thereafter, Yedidia, Weiss, and Freeman described how the CVM method was a means to accomplish belief propagation [28]. Wainwright and Jordan (2008) were among the first to recognize that the CVM approach was encompassed within the larger scope of graphical models, exponential families, and variational methods [29].
The majority of CVM studies have been with 3-D systems, most notably in applications to alloys [30]. Albers et al. applied the CVM to efficient linkage analysis [31], and Barton and Cocco (2013) studied Ising models for neural activity inferred via selective cluster expansion [32].
There have been only a few studies of the 2-D CVM. Dominguez et al. (2015) have compared the Bethe and plaquette-CVM approximations for the random field Ising model in two dimensions, and showed that in both methods, it was difficult to find a robust critical line [33].
The work by Maren has used a CVM architecture where the fundamental units were nearest-neighbors, next-nearest-neighbors, and triplets; the base unit of the triplet has been deemed appropriate for neural modeling. (See discussion in Maren (2016) [3].) The following Table 1 presents a glossary of the thermodynamic terms used in this article.

The Configuration Variables
The key requirement to working with the cluster variation method is that we need to understand and use a set of configuration variables. In a simple Ising system (the simplest statistical mechanics model for a system in which only two states are possible for each unit), we only need to think about whether a given unit is "on" or "off," or in states A or B.
We really only need one variable to describe the distribution of units in the system, because if we let x 1 be the fraction of nodes in state A, and x 2 be the fraction of nodes in state B, then we also have that x 2 = 1 − x 1 . This gives us a system that is relatively easy to model and visualize.
In contrast, in the CVM, we expand the set of variables under consideration. We use a set of variables that are collectively referred to as the configuration variables.
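To make the contrast concrete, the entire entropy bookkeeping for the simple bistate case fits in one line, since the standard per-unit mixing entropy is S = −(x 1 ln x 1 + x 2 ln x 2 ); the following is a minimal sketch (the function name is an assumption):

```python
import math

def ising_entropy(x1: float) -> float:
    """Per-unit entropy for the simple bistate system,
    S = -(x1*ln(x1) + x2*ln(x2)) with x2 = 1 - x1,
    in units of Boltzmann's constant; requires 0 < x1 < 1."""
    x2 = 1.0 - x1
    return -(x1 * math.log(x1) + x2 * math.log(x2))

# the entropy is maximal at the equiprobable point x1 = 0.5, where S = ln(2)
```

The CVM entropy replaces this single-variable expression with one built from all of the configuration variables, which is what allows topography to enter the free energy.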
In this section, we address three key factors that are essential for working with the configuration variables: 1. Configuration variable definitions -how they show up in the 2-D CVM grid, 2. Counting the configuration variables -how each configuration variable is counted, and the V&V thereof (verification and validation), and 3. A brief interpretation -how to interpret configuration variable values.
Later in this paper, we will examine how configuration variables change with the enthalpy parameters. However, even before we connect the grid topographies (expressed via configuration variables) with the free energy equation, we can establish a foundation for understanding what the configuration variables mean (in practical terms), and how they interact with each other.

Introducing the Configuration Variables
A 2-D CVM is characterized by a set of configuration variables, which collectively represent single unit, pairwise combination, and triplet values. The configuration variables are denoted as: x i (the fraction of units in a given state), y i (the fraction of nearest-neighbor pairs), w i (the fraction of next-nearest-neighbor pairs), and z i (the fraction of triplets). These configuration variables are illustrated for a single zigzag chain in Figure 3. For a bistate system (one in which the units can be in either state A or state B), there are six different ways in which the triplet configuration variables (z i ) can be constructed, as shown in Figure 4.
Notice that within Figure 4, the triplets z 2 and z 5 have two possible configurations each: A-A-B and B-A-A for z 2 , and B-B-A and A-B-B for z 5 . This means that there is a degeneracy factor of 2 for each of the z 2 and z 5 triplets. This is shown in Figure 4.
The degeneracy factors β i and γ i (the number of ways of constructing a given configuration variable) are shown in Figure 5. For the pairwise combinations y 2 and w 2 , β 2 = 2, as y 2 can be constructed as either A-B or B-A, and w 2 as either B--A or A--B. Similarly, γ 2 = γ 5 = 2 (for the triplets), as there are two ways each of constructing the triplets z 2 and z 5 . All other degeneracy factors are set to 1. See the illustrations in Figures 4 and 5.
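As an illustrative sketch (not the article's actual counting code), the triplet bookkeeping along a single wrapped chain, with the degenerate orderings folded into z 2 and z 5 , might look like the following; the names and the single-chain simplification are assumptions:

```python
from collections import Counter

# pattern -> triplet label; the degenerate orderings A-A-B / B-A-A fold into z2,
# and B-B-A / A-B-B fold into z5 (degeneracy gamma = 2 for each)
TRIPLET = {
    ('A', 'A', 'A'): 'z1',
    ('A', 'A', 'B'): 'z2', ('B', 'A', 'A'): 'z2',
    ('A', 'B', 'A'): 'z3',
    ('B', 'A', 'B'): 'z4',
    ('B', 'B', 'A'): 'z5', ('A', 'B', 'B'): 'z5',
    ('B', 'B', 'B'): 'z6',
}

def triplet_fractions(chain):
    """Fractional triplet values along one wrapped chain of 'A'/'B' states."""
    n = len(chain)
    counts = Counter(
        TRIPLET[(chain[i], chain[(i + 1) % n], chain[(i + 2) % n])]
        for i in range(n)
    )
    fracs = {z: counts.get(z, 0) / n for z in ('z1', 'z2', 'z3', 'z4', 'z5', 'z6')}
    # divide the pooled degenerate counts by gamma = 2, so that the normalization
    # z1 + 2*z2 + z3 + z4 + 2*z5 + z6 = 1 holds
    fracs['z2'] /= 2
    fracs['z5'] /= 2
    return fracs
```

For example, the wrapped chain A-A-B-B yields z 2 = z 5 = 0.25 and all other z i = 0. The full 2-D counting spans all the zigzag chains of the grid, with wrap-around in both directions.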

CVM Topographies: Interpreting configuration variables
When we work with a typical Ising system model, we can easily interpret the results in terms of the simple dependence of various functions on a single variable, x 1 . The topographic organization of the system, in this simple case, is not important. In contrast, when we work with a 2-D CVM, the topographic organization is all-important. There are fourteen different configuration variables that collectively describe this topography. To assist our interpretation of how the configuration variable values correspond with a given 2-D grid, it helps to identify a few key variables that, taken together, give us insight into their associated topographies. We will call this reduced subset the interpretation variables.
It will be useful if we can mentally correlate this very small set of interpretation variable values with some notion of their corresponding topographies. We understand, of course, that a given set of values for the interpretation variables will not define a specific topography; rather, they will correlate with a specific kind of topology.
We nominate three configuration variables to form our set of interpretation variables: z 1 , z 3 , and y 2 . This reduces the number of configuration variables that we consider from fourteen down to three; that is, we go from the set of two x i , three each for the y i and w i , and six of the z i , down to just two z i and a single y i . As we will see in the remainder of this work, this subset is sufficient to give good insight into the corresponding topographies.
These three interpretation variables are selected because, taken together, they are reasonably descriptive of the grid topography:
• z 1 (A-A-A triplets): indicates the relative fraction of A units that are included within the interiors of the "islands" or "land masses"; z 1 also (indirectly) indicates the compactness of these masses,
• z 3 (A-B-A triplets): indicates the relative fraction of A units that are involved in a "jagged" border (one that involves irregular protrusions of A into a B space), or the presence of one or more "rivers" of B units extending into landmass(es) of A units, and
• y 2 (A-B nearest-neighbor pairs): indicates the relative extent to which the A units are distributed among the surrounding B units. A higher y 2 value indicates lots of boundary areas between A and B, and a smaller value indicates large "landmasses" of A units.
Clearly, the corresponding values for z 6 (B-B-B) and z 4 (B-A-B) will indicate similar topographic aspects for the B units.
In order to obtain the next simplifications, and also to obtain an analytic solution to the 2-D CVM free energy equation, we introduce the constraint that x 1 = x 2 = 0.5. It is only at this point of equiprobable distribution of A and B units that the analytic solution can be found.
A brief summary of the implications is provided in the following Section 2: Results, in which we compare the values found for the configuration variables at free energy equilibrium with those obtained analytically, and find that the values diverge from the analytically-predicted values as the interaction enthalpy ε 1 increases from zero.

The Configuration Variables under the Equiprobable Case
We can greatly simplify the complexity of our task by dealing with the equiprobable distribution case, where x 1 = x 2 = 0.5. We are doing this, not only because it makes it easier to think about the system, but because at this equiprobable distribution point, we can achieve an analytic solution for the configuration variables.
Further, at this equiprobable distribution point, we can also invoke certain equalities among the configuration variables, due to symmetry considerations. Thus, at this point, we would have x 1 = x 2 , y 1 = y 3 , w 1 = w 3 , z 1 = z 6 , z 2 = z 5 , and z 3 = z 4 . Thus, if we have the values for x 1 , y 1 , w 1 , z 1 , z 2 , and z 3 , then we also have the values for x 2 , y 3 , w 3 , z 6 , z 5 , and z 4 .
Further, our normalization constraints also mean that if we know y 1 (and thus also y 3 ), we also know y 2 , because we have y 1 + 2y 2 + y 3 = 1 (and similarly for the w i ). Also, we can rewrite these expressions; if we have y 2 , then we have y 1 = y 3 = 0.5 − y 2 , and similarly for w 1 and w 3 if we have w 2 .

Simplifying the Numbers of z i
Similarly, we have the normalization for the z i configuration variables given as z 1 + 2z 2 + z 3 + z 4 + 2z 5 + z 6 = 1. Thus, in the equiprobable case (where z 6 = z 1 , z 5 = z 2 , and z 4 = z 3 ), we have 2z 1 + 4z 2 + 2z 3 = 1. Thus, if we know z 1 and z 3 , we also know z 2 , given as z 2 = (1 − 2z 1 − 2z 3 )/4. In short, at the equiprobable distribution point (where x 1 = 0.5), if we know y 1 , w 1 , z 1 , and z 3 , then we also know the remaining configuration variables.
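A small consistency check on these relations, assuming the equiprobable symmetries z 6 = z 1 , z 5 = z 2 , and z 4 = z 3 (the function name is an assumption):

```python
def z2_from_z1_z3(z1: float, z3: float) -> float:
    """Equiprobable case: with z6 = z1, z5 = z2, z4 = z3, the normalization
    z1 + 2*z2 + z3 + z4 + 2*z5 + z6 = 1 reduces to 2*z1 + 4*z2 + 2*z3 = 1,
    so z2 = (1 - 2*z1 - 2*z3) / 4."""
    return (1.0 - 2.0 * z1 - 2.0 * z3) / 4.0

# example: z1 = z3 = 0.125 forces z2 = 0.125 as well,
# and the reduced normalization 2*z1 + 4*z2 + 2*z3 = 1 is satisfied exactly
```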
Since we will make the equiprobable distribution assumption in the next section, it is useful to consider the impact of this step on the relative values of the configuration variables. We identify the resulting simplifications in two areas:

Relationship between y i and w i
As one more step, we note that, most of the time, y i and the corresponding w i will be approximately equal to each other. In particular, the nearest-neighbor configuration variable y 2 indicates the extent of like-near-unlike boundaries. In the extreme case where the units are arranged in a "checkerboard"-like configuration, y 2 will approach its maximum of 0.5 (and 2y 2 → 1.0), while the values for y 1 and y 3 , indicating like-near-like nearest neighbors, will both approach 0.
(As a side note, it is worthwhile to make sure that we don't push the system so far that either y 1 or y 3 approaches 0, as the entropy term involves multiple instances of computing v ln(v) − v, where v refers to a configuration variable, and we don't want to go to the extreme of taking the logarithm of 0.)
At the other extreme, a very small value for y 2 indicates that the system is composed almost exclusively of a very large mass of A units, existing as a single "continent" in a sea of B units. There is no single minimum value for y 2 , as it really becomes a function of the overall number of units in the system; as we create a progressively larger and larger grid, the fraction of units on the boundary of a large A continent becomes smaller in comparison to the total number of units actually in the interior of this large mass (with progressively more A-A nearest neighbors).
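The guard described in the side note above can be sketched as follows; the function name and the floor value are arbitrary assumptions:

```python
import math

def v_ln_v_minus_v(v: float, floor: float = 1e-12) -> float:
    """Entropy building block v * ln(v) - v, with a small floor on v so that
    the logarithm never receives zero (the floor value is an arbitrary choice)."""
    v = max(v, floor)
    return v * math.log(v) - v
```

With this guard, a configuration variable that drifts to exactly 0 contributes a small finite value to the entropy sum rather than producing a numerical error.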
The w i values, representing next-nearest-neighbor pairs, will not exactly be the same as the y i . However, in the case of having large "landmasses" of A in a sea of B, then (similar to the y 1 and y 3 ), the values for w 1 and w 3 will each approach their limits of 0.5, and the value for w 2 will shrink (getting close to but not reaching 0).
As visual interpretations, we can think that y 2 reflects the relative proportion of like-near-unlike boundary pairs across the diagonals, and w 2 reflects the like-near-unlike boundary pairs along the horizontal and vertical connections. Thus, in Figure 14 (with a large landmass of A units, wrapped around from right-to-left, so it appears as two distinct halves in the figure), we have relatively small values for both y 2 and w 2 .
In contrast, in the earlier Figure 11, we have lots of small "islands" of A units scattered throughout the sea of B; both y 2 and w 2 are substantively increased.
Similar to y 1 , z 1 indicates the extent of large, contiguous "landmasses" of A units. In particular, it indicates the compactness of these landmasses.
The z 3 value reflects less the presence of a boundary between masses of A and B units, and more the complexity of the boundary. The z 3 value also increases when we have long, spider-leg-like connections between masses or islands of A units. (Similarly, the value for z 4 increases when we have "rivers" of B appearing within the A landmasses.)

Results
There are three key results: 1. Deviation of computational results from the analytic, 2. Identification of a useful parameter range, and 3. Initial topography characterization.

Overview of Experimental Trials
To investigate the potential use of 2-D CVM grid patterns for modeling brain activation patterns, the experiments used two different kinds of patterns:
• Scale-free-like, and
• Extreme rich club-like.
The experimental objectives were to: 1. Identify appropriate interaction enthalpy parameters that reasonably corresponded to the two different initial grid patterns, 2. Identify the actual configuration variable values for each pattern as each was brought to its respective free energy equilibrium, and 3. Characterize the resulting free energy-minimized topographies.
Each of the experimental trials began with one of the two manually-designed grid configurations shown in Figure 6. These two configurations corresponded (approximately) to the notions of (a) "scale-free" and (b) "rich club" topographies, as observed in various neural communities. (For references to the scale-free and rich club literature on brain topographies, please consult Maren (2016) [3].) As we will see in the following Subsection 2.4, these two initial grid patterns were each designed to have equal numbers of nodes in states A and B, that is, x 1 = x 2 = 0.5. This was because, when this equiprobability condition holds for the two states A and B, there is an analytic solution for the configuration variables as a function of the interaction enthalpy. To make this dependence easier to work with, we will use the term h-value, where the h-value is a simple exponential function of the interaction enthalpy parameter ε 1 . The analytic solution is achieved using the h-value, and so it is easier to couch all descriptions of analytic and computational results, as well as topographic behaviors, in terms of h-values.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 4 January 2021 doi:10.20944/preprints202101.0041.v1
As an indicator of experimental intention, Figure 7 shows the analytic solution for three of the configuration variables (z 1 , z 3 , and y 2 ) as functions of the h-value. It further identifies how each of the two initial grids corresponds to a candidate h-value: (a) the "scale-free-like" initial pattern has configuration variable values in close alignment with a relatively low h-value of 1.16, and (b) the "extreme rich club-like" initial pattern has configuration variable values indicating that an h-value of approximately 1.65 will be useful. Details of how these matches are made are presented in Section 4: Materials and Methods. The following Subsection 2.2 provides an overview of how the analytic solution is achieved.
From Figure 7, it is clear that we can readily characterize a 2-D grid pattern in terms of an h-value, even if it is not yet at free energy equilibrium. While the initial h-value may be approximate, there is a clear distinction between (a) the fragmented islands of A illustrated in Figure 6, with their correspondingly low h-value of approximately 1.16, and (b) the highly-massed continent of A (a single "continent," due to wrap-arounds of the grid in both horizontal and vertical directions), with its corresponding and significantly larger h-value of approximately 1.65.
Looking ahead to more detailed results, in which the activation enthalpy ε 0 is allowed to vary along with the interaction enthalpy ε 1 (or its corresponding h-value), it seems realistic that we can characterize a wide range of topographies using these two parameters.
Restricting the first-stage experiments to systems with equiprobable occurrences of nodes in A and B served two purposes. First, it made it possible to compare the actual configuration variable values achieved after free energy minimization with those predicted by the analytic solution. Second, it facilitated code verification and validation (V&V), briefly referenced in Appendix A. Thus, for the configurations shown in Figure 6, we have (1) a "scale-free-like" system (on the left-hand-side), where the values are mapped against h-value = 1.16, and (2) an "extreme-rich-club-like" system (on the right-hand-side), where the values are mapped against h-value = 1.65. In each case, the disparity between the actual (counted) configuration variable values versus any given h-value suggests that neither system is at equilibrium.
The following Subsection 2.2 overviews how the 2-D CVM free energy equation is achieved. Understanding this equation will make it possible to understand the three key results presented in this paper, namely:
1. Comparison of computational results with the analytic solution, where notable differences have been found,
2. Identification of a useful parameter range, which allows us to identify the extent of the phase space that we will ultimately desire to map, and
3. Initial topography characterization, which will allow us to correlate h-values with different kinds of topographies, as well as with their associated configuration variable values.

Result 1: Computational Configuration Variable Values Differ from Analytic
One of the most noteworthy computational results is that the actual configuration variable values, as computed for a free-energy-minimized grid at a given h-value, differ from the analytically predicted values. (The h-value, or simply h, is given as h = exp(2ε 1 ), and is an easier value for computational work than is ε 1 .) The analytic solution was initially presented, without details, in Kikuchi and Brush (1967) [2]; Maren (2019a) [4] gives the full derivation. In order to compare computational results with the analytic solution, the computational experiments used topographies for which ε 0 = 0, or x 1 = x 2 = 0.5.
The essential notion of the CVM is that we work with a more complex expression for the free energy in a system.
As a point of comparison, the basic free energy equation for an Ising system is F = H − TS, where F is the free energy, H is the enthalpy, and S is the entropy for the system. Here, N is the total number of units in the system, k β is Boltzmann's constant, and T is the temperature. For working with abstract systems, the factor Nk β T can be absorbed into a reduced energy formalism, as these values are constants during system operations; e.g., F̄ = F/(Nk β T). This leads to the reduced representations F̄, H̄, and S̄. We will work consistently with reduced representations throughout this work.
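The reduced formalism just described can be sketched numerically. This is a minimal illustration with assumed pieces: the two-term enthalpy H̄ = H 0 x 1 + cx 1 ² (activation plus interaction, discussed in the next subsection) and the ideal two-state mixing entropy; the function name and default parameter values are illustrative, not the paper's code.

```python
from math import log

def ising_reduced_free_energy(x1, h0=0.0, c=1.0):
    """Reduced free energy F-bar = H-bar - S-bar for a simple bistate
    (Ising-like) system, with the factor N*k_B*T absorbed into the units.

    Sketch only: H-bar = h0*x1 + c*x1**2 follows the text's two-term
    enthalpy (activation + interaction); S-bar is the ideal mixing
    entropy over the two unit states (an assumption of this sketch).
    """
    x2 = 1.0 - x1
    enthalpy = h0 * x1 + c * x1 ** 2
    entropy = -(x1 * log(x1) + x2 * log(x2))   # S-bar >= 0
    return enthalpy - entropy
```

With both enthalpy constants set to zero, the free energy at x 1 = 0.5 reduces to minus the mixing entropy, −ln 2.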
In a simple Ising model, both the enthalpy H and the entropy S can be computed based on only the relative fraction of active units in a bistate system. That is, there are only two kinds of computational units; active ones in state A, where the fraction of these units is denoted x 1 , and inactive ones in state B, where the fraction of these units is denoted x 2 . In contrast to the simple entropy used in the basic Ising model in statistical mechanics, in the CVM approach, we expand the entropy term. The CVM entropy term considers not only the relative fractions of units in states A and B, but also those associated with the configuration variables, as described earlier in Section 1.6. We can write the 2-D CVM free energy, using the formalism first introduced by Kikuchi in 1951 [1], and then further advanced by Kikuchi and Brush (1967) [2] (explicitly for the 2-D CVM), as F̄ = H̄ − S̄ plus constraint terms involving the Lagrange multipliers µ and λ, where we have set k β T = 1.

The 2-D CVM Enthalpy
The enthalpy in a simple Ising system is traditionally given as H̄ = H 0 x 1 + cx 1 ², where H 0 and c are constants. (For a useful discussion of the mean-field approach as compared with the Bethe approximation, belief propagation, and the cluster variation method, see Yedidia [27].) The first term on the RHS (Right-Hand-Side) corresponds to the activation enthalpy, or enthalpy associated with each active unit. The second term on the RHS corresponds to the interaction enthalpy, or energy associated with pairwise interactions between active units.
In contrast, the enthalpy for the 2-D CVM is given as H̄ = ε 1 (2y 2 − y 1 − y 3 ); that is, Kikuchi and Brush omit the term linear in x 1 , the activation enthalpy. We can infer that they omit the activation enthalpy from their enthalpy term because they move directly to the analytic solution for the 2-D CVM free energy, which is solvable only in the equiprobable distribution case of x 1 = x 2 = 0.5. The equiprobable distribution is achieved only when the activation enthalpy is zero; that is, when ε 0 = 0.
When the activation enthalpy ε 0 > 0, then the units in state A have an energy associated with them that is greater than that of the units in state B. Thus, an equilibrium solution will favor having fewer units in state A. There is no analytic solution for this case, other than that in which the interaction enthalpy is zero (ε 1 = 0). This latter case is trivial to solve, and is not particularly interesting, as the distribution of different kinds of nearest-neighbor pairs and triplets will be probabilistically random, excepting only that the proportions of units in states A and B, respectively, will be skewed by the activation enthalpy parameter ε 0 .
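This trivially solvable case (ε 1 = 0) can be made concrete. Under the reduced formalism with k β T = 1, minimizing F̄ = ε 0 x 1 + x 1 ln x 1 + x 2 ln x 2 with respect to x 1 gives the closed form x 1 = 1/(1 + exp(ε 0 )), which indeed skews the system toward fewer A units as ε 0 grows. The sketch below (function names are illustrative assumptions) checks the closed form against a brute-force search:

```python
from math import exp, log

def x1_at_equilibrium(eps0):
    """Closed-form fraction of A units when the interaction enthalpy is
    zero (eps1 = 0): setting dF/dx1 = 0 for
    F-bar = eps0*x1 + x1*ln(x1) + x2*ln(x2) gives x1 = 1/(1 + e^eps0).
    Sketch under the reduced (k_B*T = 1) formalism used in the text."""
    return 1.0 / (1.0 + exp(eps0))

def x1_by_search(eps0, steps=100_000):
    """Brute-force cross-check: grid-search the free energy minimum."""
    best_x, best_f = None, float("inf")
    for i in range(1, steps):          # exclude the endpoints 0 and 1
        x1 = i / steps
        f = eps0 * x1 + x1 * log(x1) + (1 - x1) * log(1 - x1)
        if f < best_f:
            best_x, best_f = x1, f
    return best_x
```

At ε 0 = 0 this recovers the equiprobable point x 1 = 0.5; at ε 0 = 1 the equilibrium fraction of A units drops to roughly 0.27.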
As just noted, the typical expression for the interaction enthalpy is a quadratic term in x 1 , that is, H 1 = cx 1 ². The parameter c encompasses both the actual interaction energy for each pairwise interaction, and a constant that expresses the distribution of pairwise interactions as a simple linear function of the fraction of active units (x 1 ) surrounding a given active unit. This is then multiplied by the total fraction of active units, giving the quadratic expression.
In the expression for the 2-D CVM interaction enthalpy, we have terms that expressly identify the total fraction of nearest-neighbor "unlike" pairs (y 2 ) and "like" pairs (y 1 and y 3 ). Thus, we can replace cx 1 ² with the fraction of "unlike" pairs (counted twice, to account for the degeneracy in how these pairs can be counted), and the fractions of "like" nearest neighbor pairs. We recognize that this is a simplification; we are not counting interaction energies due to next-nearest neighbor pairs, the w i , nor from the triplets z i . We are, effectively, subsuming these into the pairwise interactions that are being modeled with the y i .
We take the interaction enthalpy parameter ε 1 to be a positive constant. We envision a system in which the free energy is reduced by creating nearest-neighbors of like units, that is, A-A or B-B pairs. (Decreasing the interaction enthalpy leads to decreasing the free energy, which is desired as we go to a free energy minimum, or equilibrium state.) Similarly, the interaction enthalpy should increase with unlike pairs, or A-B pairs (or vice versa).
Thus, we write the interaction enthalpy equation, following the formalism introduced by Kikuchi and Brush, as H̄ = ε 1 (2y 2 − y 1 − y 3 ) (Eqn. 8). We interpret this equation by noting that as we increase the fraction of unlike unit pairings (2y 2 , encompassing both A-B and B-A pairs), we raise the interaction enthalpy. At the same time, if we're increasing unlike unit pairings, we are also decreasing like unit pairings (A-A and B-B, or y 1 and y 3 , respectively), so that we are again increasing the overall interaction enthalpy. (Note that the like unit pairings show up in Eqn. 8 with a negative sign in front of them.) Thus, if we create a 2-D CVM grid that is like a checkerboard (alternating unit types), then we are moving to a higher interaction enthalpy and higher free energy, and away from a free energy minimum. If we create a grid such as that shown in the RHS of Figure 6, with all of the units in each state grouped together, respectively (the "extreme rich club-like" pattern), then we lower the interaction enthalpy and correspondingly lower the free energy.
Clearly, if the interaction enthalpy were all that mattered, we would have a 2-D CVM pattern such as the one on the right in Figure 6. What keeps this from happening is the entropy term, which demands some distribution over different possible configurations.
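The trade-off just described can be illustrated with the pairwise interaction enthalpy H̄ = ε 1 (2y 2 − y 1 − y 3 ). In this minimal sketch, the ε 1 value and the specific pair fractions are illustrative assumptions (each set satisfies the normalization y 1 + 2y 2 + y 3 = 1); the function name is hypothetical.

```python
def cvm_interaction_enthalpy(y1, y2, y3, eps1=0.25):
    """Interaction enthalpy H-bar = eps1 * (2*y2 - y1 - y3):
    unlike nearest-neighbor pairs (2*y2) raise the enthalpy,
    like pairs (y1, y3) lower it."""
    return eps1 * (2.0 * y2 - y1 - y3)

# Checkerboard-like grid: every nearest-neighbor pair is unlike.
h_checker = cvm_interaction_enthalpy(y1=0.0, y2=0.5, y3=0.0)

# "Extreme rich club"-like grid: almost every pair is like.
h_cluster = cvm_interaction_enthalpy(y1=0.45, y2=0.05, y3=0.45)
```

As the text notes, the enthalpy alone would always favor the fully clustered grid (h_cluster < h_checker); it is the entropy term that prevents total collapse into a single mass.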
Before moving on to the entropy, we note that we can express the interaction enthalpy using the triplet configuration variables z i , instead of the nearest-neighbor pair variables y i . We can do this by drawing on equivalence relations between the y i and z i variables. Those for y 2 are given in Kikuchi and Brush (1967) [2], and also in Maren (2019a) [4]. Notice that there are two ways of expressing y 2 in terms of the z i , as shown in Eqn. 9. Since we want to work with the total 2y 2 , it is easy to express that as the sum of the two different equivalence expressions. This proves useful when deriving the analytical solution for the free energy minimum, or equilibrium point.
We also have equivalence relations for y 1 and y 3 . These let us write H̄ entirely in terms of the triplet variables z i . As a minor note, the enthalpy used in previous related work by Maren [3,34] differed slightly; the results given here are similar in form to the results presented in those two previous works, differing in the scaling of the interaction enthalpy term.

The 2-D CVM Entropy
The entropy for the 2-D CVM expands the simple Ising entropy to include terms for the pair, next-nearest-neighbor pair, and triplet configuration variables. A more detailed discussion of the entropy term is given in Kikuchi and Brush (1967) [2] and also in Maren (2019a) [4].

Free Energy Analytic Solution
Kikuchi and Brush (1967) [2] provided the results of an analytic solution for the 2-D CVM free energy, for the specific case where x 1 = x 2 = 0.5. Specifically, making certain assumptions about the Lagrange multipliers shown in Eqn. 4, we can then express each of the configuration variables in terms of ε 1 . More usefully, since the expression actually involves the term exp(2ε 1 ), and not ε 1 itself, it is much easier to use the substitution variable h = exp(2ε 1 ). We refer to h (or sometimes, the h-value) as the interaction enthalpy parameter throughout.
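The h-value substitution is simple to express in code; a minimal sketch (function names are illustrative):

```python
from math import exp, log

def h_from_eps1(eps1):
    """The substitution variable used throughout the analytic solution:
    h = exp(2 * eps1), the 'interaction enthalpy parameter'."""
    return exp(2.0 * eps1)

def eps1_from_h(h):
    """Inverse mapping: eps1 = ln(h) / 2."""
    return 0.5 * log(h)
```

Note that ε 1 = 0 corresponds to h = 1 (no interaction bias), and ε 1 = 0.25 corresponds to h = exp(0.5) ≈ 1.649, the boundary value quoted for the useful parameter region.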
The full derivation of the set of equations giving the configuration variable values at equilibrium (for the case x 1 = x 2 = 0.5) is given in Appendix A of Maren (2019a) [4].
The full set of the analytic expressions for the various configuration variables is couched in terms of a denominator ∆ involving h, which Kikuchi and Brush present as their Eqn. (I.24) [2]. We recall that at the equiprobable distribution point, where x 1 = x 2 , we have a number of other equivalence relations, e.g. z 1 = z 6 , etc.
We then (following Kikuchi and Brush, in their Eqn. (I.25)) identify each of the remaining configuration variables as functions of h and ∆.

Divergence in the Analytic Solution
As is evident from Eqns. 16 and 17, there will be a divergence in the analytic solution for the configuration variables when ∆ = 0, because the analytic solution contains a denominator term (a scalar factor of ∆) that is quadratic in h. Specifically, this quadratic term vanishes, and the solution diverges, for h = 0.172 or h = 5.828 (consistent with the roots 3 ∓ 2√2 of h² − 6h + 1).
We are interested in the latter case, where the value of h > 1 indicates that ε 1 > 0, which is the case where the interaction enthalpy favors like-near-like interactions, or some degree of gathering of similar units into clusters. This means that we expect that the computational results would differ from the analytic as h → 5.828. (Actually, the computational results diverge from the analytic substantially before this point.) This would possibly have some impact, but we find that the real point of interest comes from examining the impact of higher h-values on the free energy itself. Increasing h increases the overall value of the enthalpy term, and makes it overwhelmingly dominant with regard to the entropy term.
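The stated divergence points can be checked numerically. Note the hedge here: the quadratic factor h² − 6h + 1 is reconstructed from the two quoted divergence points (its roots are 3 ∓ 2√2 ≈ 0.172 and 5.828, which are reciprocals of one another), rather than copied from Eqn. (I.24) directly.

```python
from math import sqrt

def divergence_points():
    """Roots of the inferred quadratic factor h**2 - 6*h + 1, at which
    the analytic solution's denominator vanishes and the configuration
    variables diverge.  Sketch: the quadratic is reconstructed from
    the divergence points quoted in the text (0.172 and 5.828)."""
    return 3.0 - 2.0 * sqrt(2.0), 3.0 + 2.0 * sqrt(2.0)
```

That the two roots multiply to 1 reflects the h ↔ 1/h (i.e., ε 1 ↔ −ε 1 ) symmetry between the ferromagnetic and antiferromagnetic regimes.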

Significance of h in the Free Energy
Because it is the minimum in the negative entropy that creates the possibility for minimizing the free energy, we need to have the entropy term play a role that is at least on par with the enthalpy. If the enthalpy term is overall too large, then there will be an adverse effect on finding a useful free energy minimum.
The impact of the interaction enthalpy parameter h on the overall free energy is shown in Figure 8, which presents the results for x 1 = 0.5, the case in which all of the results should conform to the analytic solution. Figure 8 shows that beyond a certain value, approximately h > 1.2, the enthalpy term dominates the entropy, and thus further increasing h does not yield an improved model. Further, as we are primarily interested in ferromagnetic systems (where like nodes tend to cluster with like), we are not interested in the antiferromagnetic regime, where h ≤ 1.0 (ε 1 ≤ 0). Figure 9 identifies the reasonable phase space region for mapping out the configuration variable values for at-equilibrium patterns as a function of the activation and interaction enthalpies (ε 0 , ε 1 ). The useful region is bounded by the area where ε 0 < 3.0 and ε 1 < 0.25 (h < 1.649).

Result 2: Identification of a Useful Parameter Range
In this work, two plausible data points for configuration variable values have been identified along the left-hand-side axis, where ε 0 = 0. Values along the top horizontal axis are easy to compute (see Maren (2019a) [4]). It will be somewhat more challenging to compute configuration variable values for the interior of this phase space: the computed values for the configuration variables diverge from the analytic solution as ε 1 moves further away from zero, so the analytic solution serves as a useful reference point only in the neighborhood of ε 1 ≈ 0. Finding interior points will therefore require a greater number of trials per candidate set of (ε 0 , ε 1 ) values.
Once configuration variable values have been established for a suitable set of (ε 0 , ε 1 ) values, it will likely be possible to establish gradients throughout the phase space and produce approximate sets of configuration variables for any position within the (ε 0 , ε 1 ) phase space.
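The gradient-based fill-in of the phase space could start with something as simple as bilinear interpolation between measured corner points. This is a hypothetical sketch: the function, the corner-dictionary format, and the idea of interpolating a single configuration variable at a time are assumptions for illustration, not the repository's API.

```python
def interpolate_config_var(grid_points, eps0, eps1):
    """Bilinear interpolation of a single configuration variable over
    the (eps0, eps1) phase space, given values measured at the four
    corners of an enclosing rectangular cell.

    grid_points: dict mapping (eps0, eps1) corner tuples to the
    measured configuration variable value at that corner (in practice,
    these would come from free-energy-minimization trials).
    """
    e0s = sorted({p[0] for p in grid_points})
    e1s = sorted({p[1] for p in grid_points})
    (a0, b0), (a1, b1) = (e0s[0], e0s[-1]), (e1s[0], e1s[-1])
    t = (eps0 - a0) / (b0 - a0)   # fractional position along eps0
    u = (eps1 - a1) / (b1 - a1)   # fractional position along eps1
    return ((1 - t) * (1 - u) * grid_points[(a0, a1)]
            + t * (1 - u) * grid_points[(b0, a1)]
            + (1 - t) * u * grid_points[(a0, b1)]
            + t * u * grid_points[(b0, b1)])
```

With more trial points, the same idea extends to a full interpolation mesh over the bounded region ε 0 < 3.0, ε 1 < 0.25.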

Experimental Trial: Very Large h-value
As a very preliminary investigation, the "extreme rich club-like" pattern shown in Figure 6 (a) was brought to free energy equilibrium for three trials where h = 2.0. This meant that we were knowingly using an h-value that was larger than reasonable, based on the thermodynamic values shown in Figure 8. The objective was to bring the system to equilibrium at an h-value that would force the extreme clustering shown in Figure 6 (a), making the resulting system very close to the initial pattern. Even for this (relatively extreme) h-value, the equilibrium state moved substantially away from the original, introducing much more varied topographies. Details are presented in Appendix A, and also in Maren (2019c) [35].

Experimental Trial: Two Patterns at Same Large h-value
The following Table 2 presents specific data for a set of two experimental trials run for the case where the same h-value was used, but for two very different kinds of patterns; the "scale free-like" and the "extreme rich club-like" patterns illustrated previously in Figure 6. The h-value of 1.65 was known, a priori, to be a reasonably good fit for the "extreme rich club-like" pattern, but to be very far from a likely fit to the "scale free-like" pattern. (See Subsection 4.2 for details.) The objective was to see how close, within a limit of 100 swaps, the two patterns could each be brought to the analytically-predicted configuration variable values.
Table 2. Select configuration variable values and thermodynamic values when the h-value is set to 1.65 for each of two patterns brought to free energy minimization; the "extreme rich club-like" pattern and the "scale-free-like" pattern. For comparison, the analytic solution variable values are also given. Experimental details are in the slidedeck included in the GitHub repository for this work (Maren, 2019c) [35].
It is clear that the "extreme rich club-like" pattern, even though it changed topography significantly (see Result 3), still yielded an at-equilibrium set of configuration values that were reasonably close to the predicted analytic ones.

On the other hand, the "scale free-like" pattern did not morph to an at-equilibrium state after only 100 swaps; this is most likely due to the limitations of the "swap-and-test" nature of the free energy minimization algorithm. In order to bring various topographies closer to equilibrium, faster, it will be necessary to devise algorithms that are sensitive to the existing topographies. This will require an object-oriented approach, and is part of an ongoing investigation.
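The swap-and-test step itself can be sketched as follows. This is a simplified illustration, not the repository's code: it scores candidate swaps using only the pairwise interaction enthalpy counted over horizontal and vertical neighbors (the actual algorithm minimizes the complete CVM free energy, entropy included, on a staggered grid), and it preserves x 1 = x 2 by always swapping one A unit with one B unit.

```python
import random

def interaction_enthalpy(grid, eps1):
    """H-bar ~ eps1 * (unlike - like) pair fraction, counted over
    wrapped horizontal and vertical nearest-neighbor pairs.
    Simplification: the full CVM objective also has an entropy term."""
    n = len(grid)
    unlike = like = 0
    for i in range(n):
        for j in range(n):
            for di, dj in ((1, 0), (0, 1)):   # each pair counted once
                if grid[i][j] == grid[(i + di) % n][(j + dj) % n]:
                    like += 1
                else:
                    unlike += 1
    return eps1 * (unlike - like) / (2 * n * n)

def swap_and_test(grid, eps1=0.25, max_swaps=100, seed=0):
    """Sketch of swap-and-test: propose swapping a randomly chosen
    A unit with a B unit; keep the swap only if the (enthalpy-only)
    objective decreases.  Preserves x1 = x2 by construction."""
    rng = random.Random(seed)
    n = len(grid)
    current = interaction_enthalpy(grid, eps1)
    for _ in range(max_swaps):
        a = [(i, j) for i in range(n) for j in range(n) if grid[i][j] == 1]
        b = [(i, j) for i in range(n) for j in range(n) if grid[i][j] == 0]
        (ai, aj), (bi, bj) = rng.choice(a), rng.choice(b)
        grid[ai][aj], grid[bi][bj] = 0, 1
        trial = interaction_enthalpy(grid, eps1)
        if trial < current:
            current = trial                       # keep the swap
        else:
            grid[ai][aj], grid[bi][bj] = 1, 0     # revert
    return grid, current
```

Because accepted swaps must strictly lower the objective, this greedy scheme shares the limitation noted above: it has no awareness of the existing topography, and can stall far from equilibrium within a fixed swap budget.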
The actual resulting topographies are shown in Figure 10. Even for an h-value that is very high (h = 1.65), the free energy minimization produces results that introduce a greater distribution across the various configuration variables, rather than allowing tight clustering. Setting the h-value to 1.65 is at the outer edge of a useful parameter setting, according to the free energy computations shown in Figures 7 and 8. This is obvious for the case where the initial pattern is tightly clustered, as shown in Figure 6 (a). However, when the initial pattern does not have very high like-near-like connectivity, as was the case for the initial pattern in Figure 6 (b), free energy minimization at a high h-value substantially increases the like-near-like connections; i.e., increases z 1 . This will be investigated more fully in the following Subsection 2.4.

Result 3: Initial Topography Characterization
The early experiments with a 2-D CVM system (forming the bulk for the experimental work discussed here) were all done using various patterns of 256 (16 x 16) units each. Two of these patterns were illustrated previously in Figure 6; these were (a) the "extreme rich club-like" pattern and (b) the "scale-free-like" pattern.
Keeping initial experiments confined to this relatively small 16 x 16 (256-unit) grid allowed three things:
1. Sufficient variety in local patterns - this grid size was large enough to illustrate several distinct kinds of topographies (each corresponding to different h-values or enthalpy parameters; h-values will be discussed in a following section),
2. Sufficient nodes - there were enough nodes so that triplet-configuration extrema could be explored in some detail (e.g., for relatively small numbers of nodes in state A), and
3. Countability - the V&V effort required that several early versions of the grid be manually counted for all the configuration values for a given 2-D grid configuration, and matched against the results from the program.
One final advantage of the 16 x 16 grid layout was that the different grid configurations were both large enough to show diversity, but small enough so that it was possible to manually create figures illustrating the activation states (A or B) of each node, thus illustrating the detailed particulars of each configuration design. (The code at this point included only rudimentary print-outs of grid patterns.)
The experiments began with manually-designed pattern configurations, such as the two shown in Figure 6. These two configurations correspond (approximately) to the notions of "scale-free" and "rich club" topographies, as observed in various neural communities. (For references to the scale-free and rich club literature on brain topographies, please consult Maren (2016) [3].) These two grid pattern configurations are early attempts to characterize how h-values can be identified for grids with different total configuration variable values. The following sections will discuss h-values in the context of the free energy equation.
Both of these systems were created with the constraint of equiprobable occurrence of units in states A or B; that is, x 1 = x 2 = 0.5. This was done to facilitate the necessary code V&V step, which will be discussed later in this section. Thus, for the configurations shown in Figure 6, both the (a) and (b) grids have 128 nodes each in state A and in state B.
Experimental results for Trial 1, performing free energy minimization for the "extreme rich club-like" pattern, were given previously in Subsection 2.3.2. Thus, this subsection both illustrates the process and details how the experiments were conducted for Trial 2, working with the "scale-free-like" pattern shown in Figure 6 (b).
The configuration on the left of Figure 6 (pattern a) is an effort to build a "scale-free-like" system. The notion of a "scale-free" system is that the same kind of pattern replicates itself throughout various scales of observation in a system. Thus, for the "scale-free-like" configuration shown in Figure 6 (a), the design was originally intended to be 180-degree symmetrical around a central axis (dihedral group-2 symmetry). Specifically, the left and right sides were to be identical in a rotated-180-degree sense.
For ease in designing the "scale-free-like" system, a single larger pattern was created on one side and duplicated on the other. The basis for this pattern was a paisley-like shape of A units. This shape was designed to create greater dispersion of values across the z i triplets; that is, to minimize having tightly-clustered islands that would yield little in the way of A-B-A and B-A-B triplets (z 2 and z 5 , respectively).
The practical limitation of attempting to fit various "islands" of A nodes (black) into a surrounding "sea" of B nodes (white) meant that there were not quite enough B nodes to act as borders around the more compact sets of A nodes. Thus, the "scale-free-like" pattern shown in Figure 6 (a) is a bit more compressed than originally planned.
The original plan was that out of 256 nodes in the grid, half (of the designed pattern) would be on the right, and half on the left; 128 nodes on each side. Of these, for each side, 64 nodes were to be in state A (black). Of these nodes (per side), sixteen (16 nodes) would be used to create a large, paisley-shaped island. The remaining 64 -16 = 48 nodes would be used for smaller-sized islands; two islands of eight nodes each, etc. The plan is shown in Figure 11. The notation of "center" and "off-center" refers to the placement of the various islands; the largest (16-node) islands were to be placed more-or-less in the center of each of their respective (left or right) sides of the grid, and the remaining islands were to be "off-center"; situated around their primary respective large islands.
The resulting patterns (shown in Figure 11) were close to the original plan, although not exactly the same. Even though some changes had to be made to the original design plan, the original constraint, that the number of units in states A and B would be identical (128 nodes in each), was kept. After the pattern was designed, the configuration variable values were computed. The details for the z i are shown in Figure 12. We begin with the scale-free-like initial grid, introduced earlier in Figure 11. We will bring this initial grid to a free energy minimum point, for each of two different h-values:
• Trial (2.a): h = 1.65, which should push the system towards increased like-near-like clustering, and
• Trial (2.b): h = 1.16, which should push the system towards relatively more like-near-unlike pairs.

Note:
The exact details of the row counts are difficult to read in Figures 12 and 14; the original diagrams are in a corresponding slidedeck that is available in the GitHub repository for this work (Maren, 2019c) [35].

Trial 2.a
When we bring the system from Figure 11 to free energy equilibrium with h = 1.65, we get the result shown in Figure 13 (a). The pattern on the right, Figure 13 (b), illustrates a topography for the same initial pattern, but brought to a free energy minimum for h = 1.16.
We see in Figure 13 (a) that, as expected, the clusters are coalesced as compared with the original pattern. As is clear from an initial study to approximate a likely h-value candidate, h = 1.65 is very large compared with what is a reasonable target for this pattern. (See Figure 7.) This trial was conducted to find a point of comparison between the results here and those that would be obtained with a more reasonable h-value. When we bring the system from Figure 11 to free energy equilibrium with h = 1.16, we get the result shown in Figure 13 (b).
With a lower h-value than was previously used, we expect less clustering of like-with-like, and more spread-out connections. What we see is interesting, if not surprising: we get a number of "spider-leg" connections between what remains of the original clusters.
These "spider-legs" give us both like-near-like (extending the length of the spider-leg) as well as like-near-unlike (on either side of the spider-leg) connections for each unit in the leg. Thus, each unit in a spider-leg gives us approximately the ratio of y 1 and y 2 pairwise connections (two y 2 connections per y 1 , accounting for the factor of two degeneracy in the y 2 pairs) that we'd expect near h = 1.0.

Discussion
We can now envision being able to characterize various topographic regions within the two-parameter phase space used for the 2-D CVM. The process of filling in the actual phase space details, correlating configuration variable values (and their associated thermodynamic quantities) with (ε 0 , ε 1 ) pairs, will be relatively straightforward. It will, of course, be greatly aided by developing the next generation of code, preferably written in an object-oriented manner.
With the phase space mapped out, several enticing avenues for future work open up. Among the first tasks are:
• Devising a means to correlate known morphological cluster descriptions with the various topographies (both local and global) produced via free energy minimization,
• Devising a means to correlate phase space regions with various topographic descriptions, e.g., "rich club," and
• Devising strategies to efficiently transition from one topography to another as the parameters (ε 0 , ε 1 ) change.
These steps, however important, simply pave the way for the next generation of work. One of the most interesting notions advanced by Yedidia, Weiss, and Freeman (2002) is that the CVM provides a means to accomplish belief propagation [28]. Belief propagation, as we know, is essentially a symbolic-level activity, initially proposed by Pearl [36]. The belief propagation method is intrinsically associated with graphical models [29]. The challenge is that, in a traditional belief-propagation approach, each node in a graph is associated with a specific belief. In a CVM-based approach, a cluster or pattern of nodes might emerge in correspondence with a specific feature. Thus, we are interested not only in correlating a distributed representation with a more classically-symbolic one, but also in identifying how dynamic patterns in one can correlate with emerging beliefs.
Another interesting avenue for work will be the connection between modeling long-range order in the brain and in other (non-biological) systems with the connectivity between clusters of active nodes that emerge following 2-D CVM free energy minimization. As an example, the topographies resulting from free energy minimization on the scale-free-like initial pattern (in Figure 13) showed spiderleg-like connecting branches between what had previously been isolated masses. These spiderlegs, running diagonally across the grid, suggest something like long-range order. Clearly, many more experiments are needed to validate the exact conditions under which these spiderleg extensions appear.
Following initial investigations into how topologies in 2-D CVM grids may show apparent long-range connection paths between different activation clusters, it would be potentially fruitful to make some connections between these empirical results and other studies involving long-range order. As an interesting corollary to long-range order in neural systems, non-biological systems with repulsive long-range connections can exhibit long-range order in the form of stripes [37], using a combination of short-range and long-range interactions. It would be interesting, and potentially useful, to investigate introducing long-range interactions into a 2-D CVM, in combination with the existing entropy term.
As an evolution from studies on long-range order, the role of free energy minimization in a 2-D CVM grid can also potentially apply to message passing. Raymond and Ricci-Tersenghi (2013) report on message passing as a means for correcting beliefs [38]. More recently, Parr et al. discuss neuronal message passing [39], with a good review of relevant literature. Recent work on neural connectivity in the brain by Peraza-Goicolea et al. (2020) addresses susceptibility propagation (SP), belief propagation (BP), and critical state dynamics [40], and Friston, Parr, and de Vries (2017) investigate the connection between belief propagation and active inference [16]. Thus, the substantial work on message passing in neural systems may usefully inform further investigations on using a 2-D CVM as a means for modeling neural systems.
There is a philosophical connection between the notion of using a 2-D CVM to model a physical system and the ideas advanced by Friston and colleagues regarding the role of free energy minimization as a key function in brain processes (2010, 2013, 2015) [5][6][7][8]. Maren (2019b) [41] has illustrated how the 2-D CVM could play a role within a computational engine modeled on notions advanced by Friston and others for active inference, a topic that can be investigated once the avenues identified in this section have been developed to sufficient maturity.

Materials and Methods
The computational results presented here summarize work originally reported in Maren (2019a) [4]. These results were achieved through a series of computer programs, written in Python 3.6, which are publicly available in a GitHub repository [35]. Substantial documentation for the various codes is also within the same GitHub repository. The author of this work holds the copyright to these codes and to the supporting documentation. This work follows predecessor efforts by Maren (2019c) [42], which reported extensive manual verification and validation, ensuring that the code results corresponded to those of the analytic equations.
It will be relatively straightforward for an independent researcher to replicate all the results presented in this work. The codes are fully self-contained: all of the data patterns used here are embedded within the codes themselves, so there is no need to set up special file directories for data access. Further, these codes can be run with the simplest of Python installations, e.g., Anaconda's Spyder. The graphics included with the code are of the simplest sort, and do not require installation of special graphics packages. The code is written in a structured manner, with the key parameters (changeable by the user) easily viewed in the main module.

Method Overview
The same basic codes, with minor variations for specific experiments, have been used throughout this work. The delicate and time-consuming aspect, requiring the greatest attention to verification and validation, has been computing the configuration variables. Once the configuration variables are obtained, it is straightforward to compute the free energy and other thermodynamic variables.
The overall method for achieving a free energy-minimized pattern, starting with an initial pattern, has three steps. (This is the process for grid patterns where the activation enthalpy is zero (x1 = x2 = 0.5), and the analytic solution therefore provides a starting point.)
1. Devise an initial pattern, and compute the configuration variables associated with this initial pattern.
2. Using this initial set of (not-at-equilibrium) configuration variables (specifically, the set of interpretation variables z1, z3, and y2), identify an appropriate h-value that reasonably corresponds to that set of interpretation variables, using both graphical and tabular depictions of the configuration variables as a function of the h-value.
3. Use that h-value as the key parameter in the code that modifies the pattern until a free energy minimum has been achieved.
The new resulting pattern exemplifies how the initial pattern will appear after free energy minimization, and the corresponding set of configuration variable values identifies a position in the phase space, associating the selected h-value with a set of configuration variable values.
This procedure of estimating a candidate h-value is needed because we cannot modify the h-value itself during the course of the free energy minimization. The reason is that increasing the h-value always lowers the enthalpy (see Figure 8); thus, if the code allowed the h-value to be adjusted, the system would keep increasing the h-value in order to achieve progressively lower free energy values through h-value changes rather than through topography modifications.
Once an h-value has been selected, free energy minimization proceeds through a simple strategy of looping through a control sequence in which two nodes are randomly selected (one in state A and another in state B), and the activations of the two nodes are swapped. If the resulting free energy is lower than before, the swap is kept; otherwise, the prior activation states are restored to the nodes. This process continues until changes in the free energy value fall below a certain small threshold, or a limit on the total number of swaps is reached.
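The swap-and-test loop just described can be sketched as follows. This is an illustrative, simplified version, not the actual code in [35]: in particular, `free_energy_proxy` is a hypothetical stand-in (a simple count of unlike horizontal neighbor pairs) for the full 2-D CVM free energy computation, so that the sketch is self-contained and runnable.

```python
import random

def free_energy_proxy(grid):
    """Hypothetical stand-in for the 2-D CVM free energy: counts unlike
    horizontal neighbor pairs (with wrap-around), so lower values mean
    more clustering. The real code in [35] uses the full free energy."""
    rows, cols = len(grid), len(grid[0])
    return sum(grid[r][c] != grid[r][(c + 1) % cols]
               for r in range(rows) for c in range(cols))

def minimize_by_swaps(grid, max_swaps=100, seed=0):
    """Swap-and-test: randomly pick one state-A node (1) and one
    state-B node (0), swap their activations, and keep the swap only
    if the free energy decreases; otherwise restore the prior states."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    best = free_energy_proxy(grid)
    for _ in range(max_swaps):
        a_nodes = [(r, c) for r, c in coords if grid[r][c] == 1]
        b_nodes = [(r, c) for r, c in coords if grid[r][c] == 0]
        (ra, ca), (rb, cb) = rng.choice(a_nodes), rng.choice(b_nodes)
        grid[ra][ca], grid[rb][cb] = 0, 1      # trial swap
        trial = free_energy_proxy(grid)
        if trial < best:
            best = trial                        # keep the swap
        else:
            grid[ra][ca], grid[rb][cb] = 1, 0  # revert
    return grid, best
```

Note that each accepted swap exchanges exactly one state-A and one state-B node, so the equal A/B node counts used throughout this work are preserved during minimization.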
It should be noted that the code computing the configuration variable values is exact for the xi, yi, and wi, and approximate for the zi, in that the zi values are computed only for the triplets that can be counted horizontally (within a row above or below the node in question). The zi values that need to be counted vertically (that is, extending two rows above or below the node in question) were not included in this original code set. Detailed hand-calculations (on small-scale examples), as well as comparisons of theoretical versus computational results for larger samples, showed that this approximation yielded the same values as both the theoretical and the full (manually-counted) computational values.
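The horizontal triplet counting can be sketched as follows. This is a simplified illustration: the actual zigzag grid geometry used in [35] is reduced here to plain wrap-around rows, but the counting logic, with the z-variable naming convention used in this work (z2 and z5 each covering two mirror-image configurations), is the same.

```python
def count_horizontal_triplets(grid):
    """Count horizontal triplet configurations on a wrap-around grid,
    where state A = 1 and state B = 0. Naming follows the convention
    in the text: z1 = A-A-A, z2 = A-A-B (and its mirror B-A-A),
    z3 = A-B-A, z4 = B-A-B, z5 = B-B-A (and A-B-B), z6 = B-B-B."""
    counts = {f"z{i}": 0 for i in range(1, 7)}
    cols = len(grid[0])
    for row in grid:
        for c in range(cols):  # wrap-around: every node starts a triplet
            t = (row[c], row[(c + 1) % cols], row[(c + 2) % cols])
            if t == (1, 1, 1):
                counts["z1"] += 1
            elif t in ((1, 1, 0), (0, 1, 1)):
                counts["z2"] += 1
            elif t == (1, 0, 1):
                counts["z3"] += 1
            elif t == (0, 1, 0):
                counts["z4"] += 1
            elif t in ((0, 0, 1), (1, 0, 0)):
                counts["z5"] += 1
            else:
                counts["z6"] += 1
    return counts
```

With wrap-around, each row of length C contributes exactly C triplets, which makes manual verification on small grids straightforward.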
Ongoing work, transitioning the code from a straightforward structural approach to object-oriented Python, will incorporate the full computation of the zi.
Appendix A describes the codes, their documentation, and verification and validation (V&V) efforts.

Method Illustration
The three steps identified above are illustrated in the following example figures. Figure 14 shows an initial "extreme rich club-like" pattern, in which the "on" (state A) nodes are massed together. (Recall that the grid shown depicts a wrap-around envelope in both directions, so the apparent two masses of state A units are actually connected with each other "behind" the grid, and are also connected to themselves top and bottom.)

Step 1: Compute Initial Configuration Variable Values
As shown in this figure, the first step is to compute the total numbers of each of the configuration variables. This was done both manually and computationally, in order to provide further code verification. The values for the total Zi are shown in the columns on the right-hand side of Figure 14.
(Recall that the Zi count is done for the horizontally-oriented Zi's only, but that the values brought into the equations as zi's have been scaled appropriately.)

Figure 14. A 2-D CVM "extreme rich club-like" system with an equal number of state A and state B nodes (128 nodes each).
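The pairwise counting in Step 1 can be sketched in the same style. Again this is a simplified illustration: in the actual zigzag CVM grid of [35], nearest neighbors lie along the diagonals, whereas plain wrap-around rows are used here to keep the sketch self-contained.

```python
def count_horizontal_pairs(grid):
    """Count nearest-neighbor pair configurations along rows with
    wrap-around, where state A = 1 and state B = 0:
    y1 = A-A, y2 = A-B or B-A (degeneracy 2), y3 = B-B.
    (Simplified geometry; see [35] for the actual zigzag grid.)"""
    y1 = y2 = y3 = 0
    cols = len(grid[0])
    for row in grid:
        for c in range(cols):
            pair = (row[c], row[(c + 1) % cols])
            if pair == (1, 1):
                y1 += 1
            elif pair == (0, 0):
                y3 += 1
            else:
                y2 += 1
    return y1, y2, y3
```

As with the triplets, each row of length C contributes exactly C pairs under wrap-around, so the three counts always sum to the number of nodes in the grid, a useful sanity check during verification.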

Step 2: Identify Approximate h-value
For step 2, the initial values obtained for z1, z3, and y2 are compared with the analytically-derived values for these variables at a given h-value. This is shown in Figure 7, for two cases: (1) the "extreme rich club-like" grid (right-hand side) that is used as a process example in this section, and (2) a "scale-free-like" pattern (whose configuration variable values are depicted in the center of the figure). (Experiments on the "scale-free-like" pattern were described previously in the Results section.) As is evident from Figure 7, the configuration variable values for a given initial grid pattern typically do not correspond with a single h-value, meaning that the initial (manually-designed) patterns are not at equilibrium.
The implication of this initial not-at-equilibrium state is that we can choose an h-value that seems reasonable for a given initial state, and perform free energy minimization on that pattern.
At the beginning of this work, the working hypothesis was that the resulting configuration values, once a system had been brought to free energy equilibrium, would correspond to the analytically-predicted values. This turned out not to be the case. (See a more detailed discussion in Appendix A.) For example, if we take a nominal h-value of 1.65, the pattern's initial value for z1 is 0.418, which is slightly above the analytic prediction for z1 at h-value = 1.65. However, the y2, which for the initial pattern is 0.047, lies noticeably lower than the corresponding graph of the analytic value for y2. This behavior is characteristic of most manually-constructed initial patterns.
Even though the initial pattern values did not precisely correspond to an h-value of 1.65, this value was chosen as a reasonable target h-value for free energy minimization.

Step 3: Perform Free Energy Minimization
Surprisingly, the actual configuration variable values after free energy minimization differed strongly from the analytically-predicted values.
The resulting pattern (shown in Figure 15) was achieved after minimizing the free energy for an initial rich club-like grid, with h = 1.65. (For details, see [35].)

Figure 15. A 2-D CVM "rich club-like" system after being brought to a free energy minimum. Note that there are still an equal number of state A and state B nodes (128 nodes each).

As a contrast to the "scale-free-like" grid configuration used in Figures 11 and 12, the contrasting experiment used a second configuration that had only one large compact region of nodes in state A, wrapped around the grid envelope, as shown in Figure 14. This configuration was designed to maximize the number of pairwise and triplet configurations that put "like-near-like." The previous configuration, shown in Figure 11, was more in the direction of "like-near-unlike." The purpose of having patterns with such different dispersions among the configuration variable values was that they would putatively correspond to different h-values. This means that they would (ultimately) converge to different points on an equilibrium curve for the free energy equation (in the case of equiprobable units in states A and B).
We have the analytic results for the free energy minimum curve; specifically, the equilibrium point for the free energy at different h-values, or interaction enthalpy values. (The details for the free energy will be presented in the next section.) Thus, knowing what the anticipated h-value would be for a given pattern, it would be interesting to find the actual set of configuration variable values that results when the initial 2-D CVM grid is brought to free energy equilibrium, for the identified target h-value.

Method for the Case where the Interaction Enthalpy Is Zero
When the interaction enthalpy is zero (ε1 = 0), we can find the configuration variable values associated with any reasonable value of ε0. This is done very simply with an analytic solution. Details are presented in full in Maren (2019a) [4]. Relevant code is identified in Appendix A.
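As a rough sketch of the idea: with ε1 = 0 the units no longer interact, so the activation fractions follow a simple two-state Boltzmann form depending only on ε0. The specific parameterization below (state A weighted by e^(-2ε0), state B as the reference state) is an illustrative assumption, not necessarily the exact form derived in Maren (2019a) [4].

```python
import math

def x_fractions(eps0):
    """Illustrative two-state Boltzmann split for non-interacting units.
    NOTE: the exponent convention (factor of 2, sign of eps0) is an
    assumption for illustration only; see Maren (2019a) [4] for the
    exact analytic solution used in this work."""
    w_a = math.exp(-2.0 * eps0)  # weight for state A
    w_b = 1.0                    # weight for state B (reference)
    x1 = w_a / (w_a + w_b)
    return x1, 1.0 - x1
```

Under this convention, ε0 = 0 recovers the equiprobable case x1 = x2 = 0.5, and increasing ε0 monotonically depresses x1 below 0.5.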

Conclusions
Free energy minimization of a 2-D Cluster Variation Method (CVM) grid pattern induces a resulting pattern in which the characteristic configuration variable values (nearest-neighbor, next-nearest-neighbor, and triplets) are associated with specific values for the activation and interaction enthalpy parameters. In this work, we have identified various configuration variable value sets at points along each of the two axes: where the activation enthalpy is zero (and an analytic solution can be achieved), and where the interaction enthalpy is zero (and a straightforward analytic solution is possible). We present a computational means for performing free energy minimization, and thus obtaining the actual configuration variable values at an enthalpy-parameter-specified state.
The resulting topographies show interesting properties, such as connectivity between regions of node clusters. Future work will examine how the 2-D CVM topographies correlate with known neural activation pattern characterizations. These computations required the most diligent verification and validation (V&V) attention, as described in Maren (2019c) [42]. Initial work was done with small grids (8x8 or 16x16), so that the configuration variables could be manually counted and the code-generated values compared against the manual counts. This V&V process was done across several different grid sizes with different node configurations.
An example of such (manual and computational) counting for the "scale-free-like" grid shown in Figure 11 is given in Figure 12.
The following identifies both verification and validation (V&V) documents as well as specific codes, all available in a public-access GitHub repository [35].
• Initial Verification and Validation document (Maren (2019c) [42]) - presents the results of initial experiments, and also describes both the manually-generated and code-generated results for counting the configuration variables for small grids.
• 2D-CVM-perturb-expt-1-2b-2018-01-12.py - A free energy minimization code that includes perturbations after reaching the first free energy minimum. This was used for a short experimental series confirming that the free energy minimum is stable. The code allows user-specifiable (in **main**) values for x1, h, numTrials, and many other parameters. Detailed results of this experimental series are given in Maren (2019c) [42].
Note: To achieve the fractional variables shown in Figures 4 and 5, as well as in Figure 12, the following relations are used:
• xi = Xi/N,
• yi = Yi/2N for i = 1, 3, and y2 = Y2/4N, accounting for the degeneracy with which y2 occurs,
• wi = Wi/2N for i = 1, 3, and w2 = W2/4N, accounting for the degeneracy with which w2 occurs, and
• zi = Zi/2N for i = 1, 3, 4, 6, and z2 = Z2/4N, z5 = Z5/4N, accounting for the degeneracy with which z2 and z5 occur.
The division by factors of two throughout has to do with how the various pairs and triplets are counted within the code structure.
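The normalization relations above can be collected into a single helper. The dictionary-based interface here is illustrative (the codes in [35] use their own data layout), but the divisors follow the relations just listed.

```python
def fractional_variables(N, X, Y, W, Z):
    """Convert raw configuration-variable counts to fractional values,
    using the degeneracy relations from the text: the degenerate
    configurations (y2, w2, z2, z5) are divided by 4N, the remaining
    pair/triplet counts by 2N, and the single-unit counts by N.
    Inputs are dicts mapping index i to the raw count (Xi, Yi, etc.)."""
    x = {i: X[i] / N for i in X}
    y = {i: Y[i] / (4 * N if i == 2 else 2 * N) for i in Y}
    w = {i: W[i] / (4 * N if i == 2 else 2 * N) for i in W}
    z = {i: Z[i] / (4 * N if i in (2, 5) else 2 * N) for i in Z}
    return x, y, w, z
```

For example, with N = 8 nodes, a raw count Y2 = 8 yields y2 = 8/(4·8) = 0.25, while Y1 = 4 yields y1 = 4/(2·8) = 0.25.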

Appendix A.2 Experimental Trial: Very Large h-value -Details
The following Table A1 presents specific data for a set of three experimental trials run for the case where the pattern used was the "extreme rich club-like" pattern shown previously in this work, e.g., in Figure 6(b). The h-value was set to 2.0 for each trial. The objective was to see if, at a relatively high h-value, the "rich club" nature of the pattern would be preserved.

Table A1. Select configuration variable values and thermodynamic values before and after free energy minimization for three trials on the "extreme rich club-like" pattern with h-value set to 2.0. (* Values for y2 and z2 reported here have been divided by two from those given in the experimental results, to account for the degeneracy factor of two contributing to the total y2 and z2 counts.)

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 4 January 2021 doi:10.20944/preprints202101.0041.v1

These trials indicate that the free energy was indeed reduced in each case; typically the enthalpy term increased, but the entropy was substantially lowered, resulting in an overall lower free energy value. The maximal number of "swaps" allowed in each case was set to 100. Other trials showed that after about 100 swaps, the free energy value was not substantially diminished further, using the simple "swap and test" protocol employed here.
The results show that, even for a relatively high h-value, the integrity of the "extreme rich club-like" structure is not preserved; rather, there is a significant drop in the fraction of A-A-A (z1) triplets, with a concomitant increase in the relative number of the z2 and z3 triplets. Even though the enthalpy increases, the increase in the (negative) entropy is sufficient to decrease the overall free energy.
This is a surprising result, given that at a relatively high h-value, we would have expected the equilibrium system to show closer conformity with the analytic solution, shown in Figure 7. The failure of the analytic solution to accurately represent the actual at-equilibrium configuration variable values is likely due to some of the substitutions made in the (very lengthy) derivation of the analytic solution. The full derivation is presented in Maren (2019a) [4].