# Life on the Edge: Latching Dynamics in a Potts Neural Network


*Keywords:* neural network; Potts model; latching; recursion


Cognitive Neuroscience, SISSA—International School for Advanced Studies, Via Bonomea 265, 34136 Trieste, Italy

The Abdus Salam International Centre for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy

Department of Physics, La Sapienza Università di Roma, Piazzale Aldo Moro, 5, 00185 Roma, Italy

Centre for Neural Computation, Norwegian University of Science and Technology, 7491 Trondheim, Norway

Author to whom correspondence should be addressed.

Received: 2 August 2017 / Revised: 25 August 2017 / Accepted: 29 August 2017 / Published: 3 September 2017

(This article belongs to the Special Issue Information Theory in Neuroscience)

We study latching dynamics in the adaptive Potts model network, through numerical simulations with randomly and also weakly correlated patterns, and we focus on comparing its slowly and fast adapting regimes. A measure, Q, is used to quantify the quality of latching in the phase space spanned by the number of Potts states S, the number of connections per Potts unit C and the number of stored memory patterns p. We find narrow regions, or bands in phase space, where distinct pattern retrieval and duration of latching combine to yield the highest values of Q. The bands are confined by the storage capacity curve, for large p, and by the onset of finite latching, for low p. Inside the band, in the slowly adapting regime, we observe complex structured dynamics, with transitions at high crossover between correlated memory patterns; while away from the band latching, transitions lose complexity in different ways: below, they are clear-cut but last so few steps as to span a transition matrix between states with few asymmetrical entries and limited entropy; while above, they tend to become random, with large entropy and bi-directional transition frequencies, but indistinguishable from noise. Extrapolating from the simulations, the band appears to scale almost quadratically in the p–S plane, and sublinearly in p–C. In the fast adapting regime, the band scales similarly, and it can be made even wider and more robust, but transitions between anti-correlated patterns dominate latching dynamics. This suggests that slow and fast adaptation have to be integrated in a scenario for viable latching in a cortical system. The results for the slowly adapting regime, obtained with randomly correlated patterns, remain valid also for the case with correlated patterns, with just a simple shift in phase space.

How can the human brain produce creative behaviour? Systems neuroscience has mainly focused on the states induced, in particular in the cortex, by external inputs, be these states simple distributions of neuronal activity or more complex dynamical trajectories. It has largely eschewed the question of how such states can be combined into novel sequences that express, rather than the reaction to an external drive, spontaneous cortical dynamics. However, the generation of novel sequences of states drawn from even a finite set has been characterized as the infinitely recursive process deemed to underlie language productivity, as well as other forms of creative cognition [1]. If the individual states, whether fixed points or stereotyped trajectories, are conceptualized as dynamical attractors [2], the cortex can be thought of as engaging in a kind of chaotic saltatory dynamics between such attractors [3]. Attractor dynamics has indeed fascinated theorists, and a major body of work has shown how to make relevant for neuroscience the concepts and analytical tools developed within statistical physics, but the focus has been on compact, homogeneous neural networks [4,5,6,7]. These have been regarded as simplified models of local cortical networks (as well as, e.g., of the CA3 hippocampal field) and have not been analysed in their potential saltatory dynamics, given that it would make no sense to consider local cortical networks as isolated systems. Even in the case of a ground-breaking investigation of putative spatial trajectory planning [8], the hippocampal activity that expressed it was thought not to be entirely endogenous, but rather guided by external inputs, including those representing goals and path integration. Therefore, formal analyses of model networks endowed with attractor dynamics have been largely confined to the simple paradigm of cued retrieval from memory.
Attempts have been made to explore methodologies to study mechanisms beyond simple cued retrieval [9,10], for example those involved in drawing, confabulation, thought processes in general, and language, which are all considered to be largely independent of external stimuli, at their core, and to combine generativity with recursion [11,12,13,14,15,16].

Potts neural networks, on the other hand, originally studied merely as a variant of mathematical or potentially applied interest [17,18,19,20,21], offer one approach to model spontaneous dynamics in extended cortical systems, in particular if simple mechanisms of temporal adaptation are taken into account [22]. They can be subject to rigorous analyses of e.g., their storage capacity [23], of the mechanics of saltatory transitions between states [24] and are amenable to a description in terms of distinct “thermodynamic” phases [25,26]. The dynamic modification of thresholds with timescales separate from that of retrieval, i.e., temporal adaptation, together with the correlation between cortical states, are key features characterizing cortical operations, and Potts network models may contribute to elucidate their roles. Adaptation and its role in semantic priming [27] have been linked to the instability manifested in schizophrenia [28].

The Potts description is admittedly an oversimplified effective model for an underlying two-level auto-associative memory network [29]. The even more drastically simplified model of latching dynamics considered by the Tsodyks group [30,31], however, has afforded spectacular success in explaining the scaling laws obtained for free recall in experiments performed 50 years ago. The Potts model may be relevant to a wide set of behaviours and to related experimental measures, once the correspondence between model parameters and the quantities characterizing the underlying two-level network is elucidated. On this correspondence, we elaborate in a separate study [32]. Here, we ask: when does the Potts network latch?

We consider an attractor neural network model comprised of Potts units, as depicted in Figure 1. The rationale for the model is that each unit represents a local network of many neurons with its own attractor dynamics [4,6], but in a simplified/integrated manner, regardless of detailed local dynamics. Local attractor states are represented by $S+1$ Potts states: S active ones and one quiescent state (intended to describe a situation of no retrieval in the local network), ${\sigma}_{i}^{k}$, k = 0, 1, ⋯, S, with the constraint that ${\sum}_{k=0}^{S}{\sigma}_{i}^{k}\equiv 1$. We call this autoassociative network of Potts units a Potts network, and refer to our earlier studies of some of its properties [22,23,24,25,33].

The “synaptic” connection between two Potts units is in fact a tensor summarizing the effect of very many actual connections between neurons in the two local networks, but still following the Hebbian learning rule [34], we write the connection weight between unit i in state k and unit j in state l as [23]

$${J}_{ij}^{kl}=\frac{{c}_{ij}}{Ca(1-a/S)}\sum _{\mu =1}^{p}\left({\delta}_{{\xi}_{i}^{\mu},k}-\frac{a}{S}\right)\left({\delta}_{{\xi}_{j}^{\mu},l}-\frac{a}{S}\right)(1-{\delta}_{k0})(1-{\delta}_{l0}),$$

where ${c}_{ij}$ is 1 if two units i and j have a connection and 0 otherwise, C is the average number of connections per unit, a is the sparsity parameter, i.e., the fraction of active units in every stored global activity pattern ($\left\{{\xi}_{i}^{\mu}\right\}$, $\mu $ = 1, 2, ⋯, p) and p is the number of stored patterns. The last two delta functions imply that the learned connection matrix does not affect the quiescent states. We will use the indices i, j for units, k, l for states and $\mu $, $\nu $ for patterns. Units are updated in the following way:

$${\sigma}_{i}^{k}=\frac{\exp\left(\beta {r}_{i}^{k}\right)}{{\sum}_{l=1}^{S}\exp\left(\beta {r}_{i}^{l}\right)+\exp\left[\beta ({\theta}_{i}^{0}+U)\right]}$$

and

$${\sigma}_{i}^{0}=\frac{\exp\left[\beta ({\theta}_{i}^{0}+U)\right]}{{\sum}_{l=1}^{S}\exp\left(\beta {r}_{i}^{l}\right)+\exp\left[\beta ({\theta}_{i}^{0}+U)\right]},$$

where ${r}_{i}^{k}$ is the input to (active) state k of unit i integrated over a time scale ${\tau}_{1}$, while U and ${\theta}_{i}^{0}$ are, respectively, the constant and time-varying component of the effective overall threshold for unit i, which in practice act as inverse thresholds on its quiescent state. ${\theta}_{i}^{0}$ varies with time constant ${\tau}_{3}$, to describe local network adaptation and inhibitory effects. The stiffness of the local dynamics is parametrized by the inverse “temperature” $\beta $ (or ${T}^{-1}$), which is then distinct from the standard notion of thermodynamic noise. The input-output relations (2) and (3) ensure that

$$\sum _{k=0}^{S}{\sigma}_{i}^{k}=1.$$
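As an illustration of the two definitions above, the following minimal sketch evaluates one element of the coupling tensor and the softmax-like update of a single unit. It is not the authors' simulation code: the function names `coupling` and `unit_output`, the list-based data layout, and the max-subtraction used for numerical stability are all assumptions made here for the example.

```python
import math


def coupling(patterns, i, j, k, l, a, S, C, c_ij=1):
    """Hebbian weight J_ij^kl between unit i in state k and unit j in state l.

    patterns[mu][i] is the state (0 = quiescent, 1..S = active) of unit i
    in stored pattern mu. All names here are hypothetical.
    """
    # The factors (1 - delta_k0)(1 - delta_l0) remove the quiescent state.
    if k == 0 or l == 0 or c_ij == 0:
        return 0.0
    total = 0.0
    for xi in patterns:
        total += ((xi[i] == k) - a / S) * ((xi[j] == l) - a / S)
    return c_ij * total / (C * a * (1 - a / S))


def unit_output(r, theta0, U, beta):
    """Return [sigma^0, sigma^1, ..., sigma^S] for one unit.

    r holds the S integrated inputs r^k to the active states.
    """
    exponents = [beta * (theta0 + U)] + [beta * r_k for r_k in r]
    # Subtracting the maximum avoids overflow and leaves the ratios unchanged.
    shift = max(exponents)
    weights = [math.exp(x - shift) for x in exponents]
    norm = sum(weights)
    return [w / norm for w in weights]
```

By construction the list returned by `unit_output` sums to one, which is the normalization stated in the last equation above.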

In addition to the overall threshold, ${\theta}_{i}^{k}$ is the threshold for unit i specific to state k, and it varies with time constant ${\tau}_{2}$, representing adaptation of the individual neurons active in that state, i.e., their neural or even synaptic fatigue. The time evolution of the network is then governed by equations that include three distinct time constants:

$${\tau}_{1}\frac{d{r}_{i}^{k}\left(t\right)}{dt}={h}_{i}^{k}\left(t\right)-{\theta}_{i}^{k}\left(t\right)-{r}_{i}^{k}\left(t\right),$$

$${\tau}_{2}\frac{d{\theta}_{i}^{k}\left(t\right)}{dt}={\sigma}_{i}^{k}\left(t\right)-{\theta}_{i}^{k}\left(t\right),$$

$${\tau}_{3}\frac{d{\theta}_{i}^{0}\left(t\right)}{dt}=\sum _{k=1}^{S}{\sigma}_{i}^{k}\left(t\right)-{\theta}_{i}^{0}\left(t\right),$$

where the field that the unit i in state k experiences reads

$${h}_{i}^{k}=\sum _{j\ne i}^{N}\sum _{l=1}^{S}{J}_{ij}^{kl}{\sigma}_{j}^{l}+w\left({\sigma}_{i}^{k}-\frac{1}{S}\sum _{l=1}^{S}{\sigma}_{i}^{l}\right).$$
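The three relaxation equations can be integrated numerically in many ways. The sketch below uses a plain forward-Euler step for a single unit, which is an assumption of this example and not necessarily the update scheme used in the simulations; the function names, the step size `dt` and the argument layout are likewise hypothetical. A helper for the local feedback contribution to the field is included.

```python
def local_feedback(sigma_active, w):
    """Second term of the field h_i^k: w * (sigma^k - mean over active states)."""
    mean_activity = sum(sigma_active) / len(sigma_active)
    return [w * (s - mean_activity) for s in sigma_active]


def euler_step(r, theta, theta0, h, sigma_active, dt, tau1, tau2, tau3):
    """Advance r^k, theta^k and theta^0 of one unit by one Euler step.

    r, theta, h and sigma_active are lists of length S (active states only).
    """
    # tau1 * dr/dt = h - theta - r
    new_r = [r_k + dt / tau1 * (h_k - th_k - r_k)
             for r_k, h_k, th_k in zip(r, h, theta)]
    # tau2 * dtheta^k/dt = sigma^k - theta^k  (state-specific fatigue)
    new_theta = [th_k + dt / tau2 * (s_k - th_k)
                 for th_k, s_k in zip(theta, sigma_active)]
    # tau3 * dtheta^0/dt = sum_k sigma^k - theta^0  (unit-wide threshold)
    new_theta0 = theta0 + dt / tau3 * (sum(sigma_active) - theta0)
    return new_r, new_theta, new_theta0
```

With $\tau_3$ very large the last line barely changes $\theta_i^0$ within a run, whereas with $\tau_3$ small it tracks the total activity of the unit almost immediately.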

The “local feedback term” w is a parameter, first introduced in [25], that modulates the inherent stability of Potts states, i.e., that of local attractors in the underlying network model. It helps the network converge to an attractor faster by giving positive feedback to the most active states and so it effectively deepens their basins of attraction. Note that, in this formulation, feedback is effectively spread over (at least) three time scales: w is positive feedback mediated by collective attractor effects at the neural activity time scale ${\tau}_{1}$, ${\theta}_{i}^{k}$ is negative feedback mediated by fatigue at the slower time scale ${\tau}_{2}$, while ${\theta}_{i}^{0}$ is also negative, and it can be used to model both fast and slow inhibition; for analytical clarity, we consider the two options separately, as the “slowly adapting regime”, with ${\tau}_{3}>{\tau}_{2}$, and the “fast adapting regime”, with ${\tau}_{3}<{\tau}_{1}$. It would be easy, of course, to introduce additional time scales, for example by distinguishing a component of ${\theta}_{i}^{0}$ that varies rapidly from one that varies slowly, but it would greatly complicate the observations presented in the following.

The overlap or correlation of the activity state of the network with the global memory pattern $\mu $ can be measured as

$${m}_{\mu}=\frac{1}{Na\left(1-a/S\right)}\sum _{j=1}^{N}\sum _{l=1}^{S}\left({\delta}_{{\xi}_{j}^{\mu},l}-\frac{a}{S}\right){\sigma}_{j}^{l}.$$
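A minimal sketch of the overlap computation follows, assuming the sum runs over all N units and over the active states only. The function name `overlap` and the data layout (one list of S + 1 activities per unit) are assumptions of this example.

```python
def overlap(pattern, sigma, a, S):
    """Overlap m_mu of the network state with one stored pattern.

    pattern[j] is the state of unit j in the pattern (0 = quiescent);
    sigma[j] is the list [sigma^0, ..., sigma^S] of unit j.
    """
    N = len(pattern)
    total = 0.0
    for xi_j, sigma_j in zip(pattern, sigma):
        for l in range(1, S + 1):  # active states only
            total += ((xi_j == l) - a / S) * sigma_j[l]
    return total / (N * a * (1 - a / S))
```

For a pattern with exactly a fraction a of active units, perfect retrieval (each unit fully in its pattern state) gives an overlap of 1, and a fully quiescent network gives 0.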

Randomly correlated memory patterns are generated according to the following probability distribution

$$P({\xi}_{i}^{\mu}=k)=\frac{a}{S},$$

$$P({\xi}_{i}^{\mu}=0)=1-a,$$

while correlated patterns are generated by the multi-parent algorithm sketched in [22], which will be discussed in a separate study [35].
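Sampling from this distribution is straightforward. The sketch below draws each unit independently, so the fraction of active units equals a only on average; whether the simulations enforce it exactly is not stated here, and the function name and the explicit random generator argument are assumptions of the example. The multi-parent algorithm for correlated patterns is not reproduced.

```python
import random


def random_pattern(N, S, a, rng):
    """Draw one pattern: state 0 with probability 1 - a, else uniform in 1..S."""
    return [rng.randint(1, S) if rng.random() < a else 0 for _ in range(N)]


def random_patterns(p, N, S, a, seed=0):
    """Draw p independent patterns with a reproducible seed."""
    rng = random.Random(seed)
    return [random_pattern(N, S, a, rng) for _ in range(p)]
```

Each active state is then chosen with probability a/S, as in the first equation above.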

When does robust latching, as a model of spontaneous sequence generation, occur? We address this question with extensive computer simulations, mostly focused on latching between randomly correlated patterns. We consider first the slowly adapting regime (${\tau}_{1}\ll {\tau}_{2}\ll {\tau}_{3}$) in which active states (${\tau}_{2}$) adapt slower than activity propagation to other units (${\tau}_{1}$), while inhibitory feedback is restricted to an even slower timescale, ${\tau}_{3}$. Next, we contrast with it the fast adapting regime (${\tau}_{3}\ll {\tau}_{1}\ll {\tau}_{2}$) in which, instead, inhibitory feedback is immediate, relative to the other two time scales.

The critical parameters at play are the number of patterns, p, the number of active states, S, and the number of connections per unit, C, and we also look at the effect of the feedback term w. The other parameters, including T, ${\tau}_{1}$, ${\tau}_{2}$, and ${\tau}_{3}$, are kept fixed during simulations, after having chosen a priori values that can lead to robust latching dynamics in the two regimes.

In the slowly adapting regime, over a (short) time of order ${\tau}_{1}$ the network, if suitably cued, may reach one of the global attractors, and stay there for a while; whereupon, after an adaptation time of order ${\tau}_{2}$, it may latch to another attractor, or else activity may die [25]. However, how distinct is the convergence to the new attractor? One may assess this as the difference between the two highest overlaps the network activity has, at time t, with any of the memory patterns, ${m}_{1}\left(t\right)-{m}_{2}\left(t\right)$: ideally, ${m}_{1}\simeq 1$ and ${m}_{2}$ is small, so their difference approaches unity. A summary measure of memory pattern discrimination can be defined as ${d}_{12}\equiv {\langle \int dt({m}_{1}\left(t\right)-{m}_{2}\left(t\right))\rangle}_{\mathrm{initial}\phantom{\rule{0.277778em}{0ex}}\mathrm{cue}}$, where, of course, the identity of patterns 1 and 2 changes over the sequence.

As discussed in [25], by looking at the latching length, how long a simulation runs before, if ever, the network falls into the global quiescent state, one can distinguish several “phases”. Depending on the parameters, the dynamics exhibit finite or infinite latching behaviour, or no latching at all. Typically, when increasing the storage load p, the latching sequence is prolonged and eventually extends indefinitely, but, at the same time, its distinctiveness decreases, since memory patterns cannot be individually retrieved beyond the storage capacity; and, even before, each acquires neighbouring patterns, in the finite and more crowded pattern space, with which it is too correlated to be well discriminated.

In Figure 2, we see that, for each S = (2, 3, 4), as p is increased beyond a certain value, latching dynamics rapidly picks up and extends eventually through the whole simulation, but, in parallel, its discriminative ability decreases and almost vanishes—the p-range where ${d}_{12}$ is large is in fact when there is no latching, and ${d}_{12}$ only measures the quality of the initial cued retrieval. For $S=1$ no significant latching sequence is seen, whereas for higher values, at fixed p, its distinctiveness increases with S, but its length decreases from the peak value at $S=2$.

Since the latching length l is not itself sufficient to characterize latching and has to be complemented by discriminative ability, we find it convenient to quantify the overall quality of latching with a new quantity Q defined as

$$Q={d}_{12}\cdot l\cdot \eta ,$$

where $\eta $ is introduced to exclude cases in which the network gets stuck in the initial cued pattern, so that no latching occurs, however high ${d}_{12}$ and l are:

$$\eta =\left\{\begin{array}{ll}1: & \text{if at least one transition to a second memory pattern occurs},\\ 0: & \text{otherwise}.\end{array}\right.$$

Q is therefore a positive real number between 0 and 1, and we report its color-coded value to delineate the relevant phases in phase space.
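The quality measure can be sketched in a few lines. Two normalizations are assumed here because the text does not spell them out: the discrimination is taken as the time average of the gap between the two largest overlaps, and the latching length is expressed as a fraction of the maximum simulation length, so that Q stays between 0 and 1. Function and argument names are hypothetical, and the average over initial cues is left to the caller.

```python
def discrimination(overlaps_over_time):
    """Time-averaged gap m_1(t) - m_2(t) between the two largest overlaps.

    overlaps_over_time[t] lists the overlaps with all p patterns at time t.
    """
    gaps = []
    for m in overlaps_over_time:
        top_two = sorted(m, reverse=True)[:2]
        gaps.append(top_two[0] - top_two[1])
    return sum(gaps) / len(gaps)


def latching_quality(d12, latching_steps, max_steps, n_transitions):
    """Q = d12 * l * eta, with l normalized to [0, 1] (an assumption)."""
    eta = 1 if n_transitions >= 1 else 0  # no latching if stuck in the cue
    l = latching_steps / max_steps
    return d12 * l * eta
```

A run that retrieves the cued pattern perfectly but never leaves it therefore scores Q = 0, whatever its discrimination.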

Thus, low quality latching with small Q may result from either small ${d}_{12}$ or short l, or both. The parameters that determine Q which we focus on are S, C and p, after having suitably chosen all the other parameters, which are kept fixed. Their default values in the slowly adapting regime are $N=1000$, $a=0.25$, $U=0.1$, $T=0.09$, $w=0.8$, ${\tau}_{1}=3.3$, ${\tau}_{2}=100.0$, ${\tau}_{3}={10}^{6}$, unless explicitly noted otherwise. If activity does not die out before, simulations are terminated after ${N}_{\mathrm{update}}=6\times {10}^{5}$ steps, the total number of updates of the entire Potts network, and are repeated with different cued patterns. Regarding the values of S, C and p, we use the following notation, for simplicity:

$$Q=Q\left(S,C,p\right)=\left\{\begin{array}{cc}Q\left(S,p\right):\hfill & C=150\phantom{\rule{1.em}{0ex}}\mathrm{fixed},\hfill \\ Q\left(C,p\right):\hfill & S=5\phantom{\rule{2.em}{0ex}}\mathrm{fixed}.\hfill \end{array}\right.\phantom{\rule{0.166667em}{0ex}}$$

Figure 3 shows that there are narrow regions in the S–p and C–p planes, which we call *bands*, where relatively high quality latching occurs. The values of p with the “best” latching scale almost quadratically in S, and sublinearly in C. Moreover, one notices that, below certain values of S and C, no latching is seen, i.e., the band effectively ends at $S\sim 2$, $p\sim 90$ in Figure 3a and at $C\sim 50$, $p\sim 70$ in Figure 3b. Importantly, the band in Figure 3a is confined in the area delimited by the cyan solid and dashed curves above and below it. The dashed curve is for the onset of latching, i.e., the phase transition to finite latching [25], while the solid curve above is the storage capacity curve in a diluted network, given by the approximate relation

$${p}_{c}\simeq \frac{C{S}^{2}}{4a\ln \frac{2S}{a\sqrt{\ln \frac{S}{a}}}},$$

beyond which retrieval fails [25]. It should also be noted that overall Q values are not large, in fact well below 0.5 throughout both S–p and C–p planes. The reason is, again, in the conflicting requirements of persistent latching, favoured by dense storage, high p, and good retrieval, allowed instead only at low storage loads (in practice, relatively low $p/{S}^{2}$ and $p/C$ values).
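The approximate capacity relation is easy to evaluate directly. The sketch below simply transcribes the formula as written; the function name is an assumption, and no claim is made that the numbers it returns coincide with the curve drawn in the figures, which may involve additional prefactors.

```python
import math


def storage_capacity(C, S, a):
    """Approximate p_c = C S^2 / (4 a ln(2S / (a sqrt(ln(S / a)))))."""
    inner = 2 * S / (a * math.sqrt(math.log(S / a)))
    return C * S ** 2 / (4 * a * math.log(inner))
```

The expression is linear in C, and grows with S slightly more slowly than $S^2$ because of the logarithm in the denominator.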

In Figure 4, we show representative latching dynamics at three selected points in the $(S,p)$ plane, in terms of the time evolution of the overlap of the states with the stored activity patterns (see Equation (8)). The three points, marked in red, span across the band in Figure 3a, and we see that latching is indefinite but noisy in the example at (5, 250), which is apparently too close to storage capacity, while memory retrieval is good at (7, 150), but the sequence of states ends abruptly, as the network is in the phase of finite latching [25]. The two trends are representative of the two sides of the band, while in the middle, at (6, 200), one finds a reasonable trade-off, with relatively good retrieval combined with protracted latching.

We use two statistical measures, the asymmetry of the transition probability matrix and Shannon’s information entropy [33,36,37] to characterize the essential features of the dynamics in different parameter regions. For that, we take all five red points from Figure 3a, such that they cut across the latching band in the S–p plane, and extend further upwards. We first compile a transition probability (or rather, frequency) matrix M from all distinct transitions observed along many latching sequences generated with the same set of stored patterns, as in [33]. The dimension of the matrix M is $(p+1)\times (p+1)$, as it includes all possible transitions between p patterns *plus* the global quiescent state. M is constructed from the transitions between states having both overlaps above a given threshold value, e.g., 0.5, in a data set of 1000 latching sequences, by accumulating their frequency between any two patterns into each element of the matrix and then normalizing to 1 row by row, so that ${M}_{\mu ,\nu}$ reflects the probability of a transition from pattern $\mu $ to $\nu $. A, the degree of asymmetry of M, is defined as

$$A=\frac{\left\Vert M-{M}^{T}\right\Vert}{\left\Vert M\right\Vert},$$

where ${M}^{T}$ is the transpose matrix of M and $\left\Vert M\right\Vert ={\sum}_{\mu ,\nu}\left|{M}_{\mu ,\nu}\right|$. Note that A is small for unconstrained bi-directional dynamics and large for simpler stereotyped flows among global patterns, attaining its maximum value $A=2$ for strictly uni-directional transitions. Note also that if the average had been taken over different realizations of the memory patterns, given sufficient statistics A would obviously vanish.
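The construction of M and the asymmetry A can be sketched as follows. The sketch assumes that each latching sequence has already been reduced to a list of state indices (patterns that passed the overlap threshold, with one extra index for the global quiescent state); that preprocessing, the function names and the choice to leave rows with no observed transitions at zero are assumptions of this example.

```python
def transition_matrix(sequences, n_states):
    """Row-normalized transition frequencies between n_states = p + 1 states."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1.0
    for row in counts:
        total = sum(row)
        if total > 0:  # rows without any transition stay all zero
            for nu in range(n_states):
                row[nu] /= total
    return counts


def asymmetry(M):
    """A = ||M - M^T|| / ||M||, with ||M|| the sum of absolute entries."""
    n = len(M)
    num = sum(abs(M[i][j] - M[j][i]) for i in range(n) for j in range(n))
    den = sum(abs(M[i][j]) for i in range(n) for j in range(n))
    return num / den
```

A strictly cyclic flow such as 0 → 1 → 2 → 0 gives A = 2, while a symmetric matrix gives A = 0, matching the two limits described in the text.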

Another measure we apply to the transition matrix M is Shannon’s information entropy, defined as

$${I}_{\mu}={\left\langle \frac{1}{{\log}_{2}(p+1)}\sum _{\nu =1}^{p+1}{M}_{\mu ,\nu}{\log}_{2}\left(\frac{1}{{M}_{\mu ,\nu}}\right)\right\rangle}_{\mu}.$$

${I}_{\mu}$ takes positive real values from 0 (deterministic, all transitions from one state are to a single other state) to 1 (completely random), since it is normalized by ${\log}_{2}(p+1)$, which corresponds to a completely random case.
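A minimal sketch of the normalized entropy is given below, assuming M is a list of rows of length p + 1 and that terms with a vanishing entry contribute zero (the usual convention, not stated explicitly in the text). The function name is hypothetical.

```python
import math


def normalized_entropy(M):
    """Row entropy of M in units of log2(p + 1), averaged over rows."""
    n = len(M)  # n = p + 1 states
    norm = math.log2(n)
    row_values = []
    for row in M:
        # Entries equal to zero are skipped: x * log2(1/x) -> 0 as x -> 0.
        h = sum(x * math.log2(1.0 / x) for x in row if x > 0)
        row_values.append(h / norm)
    return sum(row_values) / len(row_values)
```

A matrix whose rows are all uniform gives 1, and a matrix in which each state has a single successor gives 0.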

We use these two measures, A and ${I}_{\mu}$, on the points

$$(3,350)-(4,300)-(5,250)-(6,200)-(7,150),$$

marked red in Figure 3a, that lie on a segment going through the latching band observed in the slowly adapting regime. If we focus on transitions between states reaching at least a threshold overlap of 0.5, Figure 5 appears to show two complementary, almost opposite U-shaped curves as the two measures, asymmetry and entropy, are applied to the five points along the segment. One branch of each U shape extends over the range that includes the high-Q latching band: these are the right branches of the two curves, in which asymmetry decreases from a large value $A\simeq 1.6$ at (7, 150) to a smaller one $A\simeq 0.6$ at (5, 250), while concurrently the entropy increases from ${I}_{\mu}<0.5$ at (7, 150) to ${I}_{\mu}>0.8$ at (5, 250). As Figure 4 indicates, at (7, 150), latching sequences are distinct but very short, and few entries are filled in the transition matrix: generally either ${M}_{\mu \nu}=0$ or ${M}_{\nu \mu}=0$, so that asymmetry is high and entropy relatively low. This holds irrespective of the number of sequences that are averaged over. The opposite happens at (5, 250), where many transitions are observed, and in filling the transition matrix they approach the random limit. The point with the highest Q-value, (6, 200), is characterized by intermediate values of asymmetry and entropy which, we have previously observed, may be seen as a signature of complex dynamics [33]. Extending the range upwards, it seems as if the asymmetry, with threshold 0.5, were to eventually increase again, reaching its maximum $A=2$ at (3, 350), with a decreasing entropy, vanishing at the same point (3, 350). These left branches are, however, dependent on the threshold values used, as Figure 5 shows, and do not imply that transitions become more deterministic because, in this region, there are simply fewer and fewer distinct transitions discernible above the noise (Figure 4). The left branches merely reflect the increasing arbitrariness with which one can identify significant correlations with memory states in the rambling dynamics observed at higher storage loads.

In Figure 6, we see that the effect of the local feedback term, w, is first to enable latching sequences of reasonable quality, and then to also shift the latching band to higher values of S, effectively pushing this behaviour away from the storage capacity curve representing the retrieval capability of the Potts associative network. Hence, if one were to regard S as a structural parameter of the network, and w as a parameter that can be tuned, there is an optimal range of w values that allows good quality latching for higher storage. This argument has to be revised, however, by considering also the threshold U, since increasing w can be shown to be functionally equivalent, in terms of storage capacity, to decreasing U [32]. Also for U, in fact, one can find an optimal range for associative retrieval to occur, in the simple Potts network with no adaptation and with $w=0$ [23]. This near equivalence between U and $-w$ does not hold anymore in the fast adapting regime, to which we turn next.

We characterize the fast adapting regime by the alternative ordering of time scales ${\tau}_{3}<{\tau}_{1}\ll {\tau}_{2}$, such that the mean activity in each Potts unit is rapidly regulated by fast inhibition, at the time scale ${\tau}_{3}$. Equation (6) stipulates that ${\sum}_{k=1}^{S}{\sigma}_{i}^{k}\left(t\right)$, the total activity of each unit, is followed almost immediately, or more precisely at speed ${\tau}_{3}^{-1}$, by the generic threshold ${\theta}_{i}^{0}\left(t\right)$. Extensive simulations, with the same parameters as for the slowly adapting regime, except for $w=1.37$, ${\tau}_{1}=20$, ${\tau}_{2}=200$ and ${\tau}_{3}=10$, show that, similarly to the slowly adapting regime, there are latching bands in the $Q(S,p)$ and $Q(C,p)$ planes (see Figure 7). With these parameters, in particular, the larger value chosen for the feedback term w, the bands occupy a similar position as in the slowly adapting regime. Again, they appear to vanish below certain values of S and C, more precisely around $S\sim 3$, $p\sim 120$ in Figure 7a and around $C\sim 50$, $p\sim 90$ in Figure 7b, and to scale subquadratically in S and sublinearly in C. The band in the S–p plane is again confined by the storage capacity (solid cyan curve) and by the onset of (finite) latching (dashed curve). The storage capacity curve, which is independent of threshold adaptation, follows the same Equation (13).

Examples of latching behaviour outside and inside the band are presented in Figure 8, at the same values for S but shifted by $\Delta p=100$, i.e., at the “red” points (5, 350), (6, 300), and (7, 250) in the S–p plane. Again, we see from Figure 7a that (5, 350) lies just above the band, while (6, 300) is right at its centre. To the right of the band, e.g., at (7, 250), the transitions are distinct but latching dies out very soon, while on the left, e.g., at (5, 350), the progressively reduced overlaps are a manifestation of increasingly noisy retrieval dynamics. In all three examples, we observe that latching steps proceed slowly, even slower than the doubled time scale ${\tau}_{2}=200$ would have led one to predict. This appears to be because often a significant time elapses between the decay of the overlap of the network with one pattern and the emergence of a new one.

Figure 9 shows the asymmetry and entropy measures, A and ${I}_{\mu}$, along the points

$$\left(4,400\right)-\left(5,350\right)-\left(6,300\right)-\left(7,250\right)-\left(8,200\right)$$

in Figure 7a, where, again, we have chosen a series shifted by $\Delta p=100$ upwards in order to centre it better on the high quality latching band. Only an overlap threshold of 0.5 is considered. What one can see, in contrast with the slowly adapting regime, is that now the two measures are not quite complementary. The point (6, 300) that lies inside the band, very much at its quality peak, shows again an intermediate value for the asymmetry, but the highest value, given the overlap threshold, for the entropy. The discrepancy may be ascribed to the different prevailing type of latching transition observed in the fast adapting regime, Figure 8. As discussed in [24], in a Potts network latching transitions with a high cross-over, which can only occur between memory patterns with a certain degree of correlation, can be distinguished from those with a vanishing cross-over, which are much more random. In the fast adapting regime, as indicated by the examples in Figure 8, all transitions tend to be of the latter type. A more careful analysis indicates, in fact, that they are quasi-random, in that they avoid a memory pattern in which largely the same Potts units are active as in the preceding pattern. In fact, the value of the entropy at (6, 300) implies that on average from each of the 300 memory patterns there are transitions to at least 190 other patterns (to 190 if they were equiprobable, in practice many more); therefore, only the few patterns that happen to be more (spatially) correlated are avoided.

Towards the left, the curves do not vary much depending on the threshold chosen for the overlaps, but the asymmetry eventually becomes maximal and the entropy vanishes simply because sequences of robustly retrieved patterns do not last long, so, in this particular case, it would take more than 1000 sequences to accumulate sufficient statistics.

The effects of increasing the w term in the fast adapting regime are shown in Figure 10, where one notices two main features. First, there is heightened sensitivity to the exact value of w, so that relatively close data points at w = 1.33, 1.37, 1.41, and 1.45 yield rather different pictures. Second, although again increasing w shifts the latching band rightward, by far the main effect is a widening of the band itself. This is because in the presence of rapid feedback inhibition a larger w term ceases to be functionally similar to a lower threshold, which in the slowly adapting regime was leading in turn to noisier dynamics and eventually indiscernible transitions. In the fast adapting regime, the increased positive feedback can be rapidly compensated by inhibitory feedback, so that in the high-storage region overlaps remain large, until they are suppressed by storage capacity constraints (the cyan curve, which remains at approximately the same distance from the larger and larger latching band).

We now turn to a more explicit comparison of the transition dynamics in the two regimes.

To look more closely at latching dynamics in the slowly and fast adapting regimes, we take the following points from Figure 3a and Figure 7a, which allow us to cut through the bands at two different storage levels:

$$\left\{\begin{array}{cc}p=200,\hfill & S=(4,5,6,7),\hfill \\ p=400,\hfill & S=(6,7,8,9).\hfill \end{array}\right.$$

Figure 11 shows in different colors the overlaps of the state of the network with the global patterns, for sample sequences along the points (16), in the slowly adapting regime. For both $p=200$ and 400, latching length is observed to decrease with S, unlike the discrimination between patterns, as measured by ${d}_{12}$, in agreement with Figure 2. Note that the two rows in the figure are similar, indicating that the shift $\Delta p=200$ is approximately compensated by the rightward shift $\Delta S=2$.

The fast adapting regime shows the same trends: again one sees in Figure 12 the approximate compensation between the two shifts $\Delta p=200$ and $\Delta S=2$, but latching appears in general less noisy.

The main difference between the two regimes, however, is in the distribution of crossover values, those when the network has equal overlap with the preceding and the following pattern: their distribution (PDF, or probability density function) is shown in Figure 13 and Figure 14.

We see that, in the fast adapting regime, most transitions occur at very low crossover, i.e., the correlation with the preceding memory has to decay almost to zero before the next memory pattern can be activated. Only in regions of the $(S,p)$ plane where latching sequences are very short, a few transitions only, do we begin to see a small fraction of them with crossover values above 0.2. In most cases, the inhibitory feedback conveyed by the variable ${\theta}_{i}^{0}$ is so fast as not to allow transitions to be carried through by positive correlations, i.e., by the subset of Potts units which are in the same active state in the preceding and successive pattern. The choice of the next pattern is not completely random, as indicated by the relative entropy values still below unity, but is determined essentially by negative selection, as mentioned above: the next pattern tends to have few active Potts units that coincide with those active in the preceding pattern.

In the slowly adapting regime, instead, due to the slow variation of the non-specific threshold, active Potts units can remain active, but they are encouraged by the variables ${\theta}_{i}^{k}$ to switch between active states if they have been in the same state for too long. This can produce, particularly in the center of the latching band, sequences of patterns succeeding each other at high crossover, as shown by the distribution in Figure 13c. Even when latching is very noisy and approaches randomness, as in panels Figure 13a,e, crossover values are consistently above 0.2, indicating a preference for patterns insisting on the same set of active Potts units, unlike the fast adapting regime. Finally, when the number of states S is too large or, equivalently, that of patterns p too low, we observe some transitions with minimal crossover and a majority with very large crossover, as if occurring only with those patterns that were already partially retrieved when the network still had the largest overlap with the preceding pattern. The main observation, however, is that there are very few transitions at all, so that to plot a probability density distribution we need to use wide bins, in panels Figure 13d,h (and in Figure 14d).
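As an illustration of how a crossover value can be read off a simulated transition, the following is a minimal sketch, not the analysis code used for the figures. It assumes that the overlaps with the preceding and the following pattern are available as two time series sampled at the same steps; the function name `crossover_value` and the linear interpolation between samples are choices made here for illustration only.

```python
import numpy as np

def crossover_value(m_prev, m_next):
    """Overlap at which two overlap time series cross, or None if they never do.

    m_prev, m_next: overlaps of the network state with the preceding and
    the following pattern, sampled at the same time steps (assumed inputs).
    """
    m_prev = np.asarray(m_prev, dtype=float)
    m_next = np.asarray(m_next, dtype=float)
    diff = m_prev - m_next
    # Steps after which the preceding pattern no longer dominates.
    idx = np.where((diff[:-1] > 0) & (diff[1:] <= 0))[0]
    if idx.size == 0:
        return None
    t = idx[0]
    # Linear interpolation between samples t and t+1 locates the crossing.
    frac = diff[t] / (diff[t] - diff[t + 1])
    return m_prev[t] + frac * (m_prev[t + 1] - m_prev[t])
```

Collecting this value over all transitions of a latching sequence and binning the results would give an empirical estimate of the distributions discussed above.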

This difference between the two regimes is confirmed by an analysis of the correlations between successive patterns in latching sequences. In the Potts network, at least two types of spatial correlation between patterns are relevant: how many active Potts units the two patterns share, and how many of these units are active and in the same state. We quantify them with ${C}_{1}$, the fraction of the units active in one pattern that are active also in the other, and in the same state; and with ${C}_{2}$, the fraction that are also active, but in a different state. In a large set of randomly determined patterns, the mean values are $\langle {C}_{1}\rangle =a/S$ and $\langle {C}_{2}\rangle =a(S-1)/S$. The full distribution, among all pairs, is scattered around these mean values. However, do transitions occur between any pair of patterns?

Figure 15 shows that, relative to the full distribution, in blue, transitions tend to occur, in the slowly adapting regime on the left, only between patterns with ${C}_{1}$ above and ${C}_{2}$ below (or at most around) their average values. Thus, when the network has retrieved a memory representation, it looks, as it were, for correlated ones to jump to. In the fast adapting regime, this is not the case: transitions are almost random, except there appears to be a slight tendency to avoid those with ${C}_{1}$ well above its mean value. Note that the values of p and w are different in the two panels, and are chosen so as to be in roughly equivalent positions within the respective latching bands.
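The two correlation measures ${C}_{1}$ and ${C}_{2}$ can be computed directly from a pair of patterns. The sketch below is illustrative only: it assumes that a pattern is stored as an integer array in which 0 denotes the quiescent state and 1 to S the active states, and the helper `random_pattern`, which draws a fixed number of active units with uniformly chosen states, is a hypothetical stand-in for the random patterns used in the simulations.

```python
import numpy as np

def pattern_correlations(xi_mu, xi_nu, null_state=0):
    """Return (C1, C2) for pattern xi_mu relative to pattern xi_nu.

    C1: fraction of units active in xi_mu that are active in xi_nu
        and in the same state.
    C2: fraction of units active in xi_mu that are active in xi_nu
        but in a different state.
    """
    active_mu = xi_mu != null_state
    both = active_mu & (xi_nu != null_state)
    same = both & (xi_mu == xi_nu)
    n_active = active_mu.sum()
    return same.sum() / n_active, (both.sum() - same.sum()) / n_active

def random_pattern(n_units, n_states, sparsity, rng):
    """Hypothetical helper: sparse random pattern with uniform active states."""
    xi = np.zeros(n_units, dtype=int)
    n_active = int(round(sparsity * n_units))
    active = rng.choice(n_units, size=n_active, replace=False)
    xi[active] = rng.integers(1, n_states + 1, size=n_active)
    return xi

# Compare empirical means over all ordered pairs with a/S and a(S-1)/S.
rng = np.random.default_rng(0)
S, a, N = 5, 0.25, 1000
patterns = [random_pattern(N, S, a, rng) for _ in range(20)]
pairs = [pattern_correlations(x, y)
         for i, x in enumerate(patterns)
         for j, y in enumerate(patterns) if i != j]
mean_c1, mean_c2 = np.mean(pairs, axis=0)
print(mean_c1, a / S)
print(mean_c2, a * (S - 1) / S)
```

Restricting the list of pairs to those actually visited in succession during a latching sequence would give the transition-conditioned distributions that are compared with the full distribution.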

The analysis of the crossover points, therefore, affords insight into the rather different transition dynamics prevailing in the fast and slowly adapting regimes, in particular in the center of their latching bands, suggesting that in a more realistic cortical model, which combines both types of activity regulation, there should still be a significant component of “slow adaptation” for interesting sequences of correlated patterns to emerge. The preceding simulations, however, were all carried out with randomly correlated patterns, in which the occasional high or low correlation of a pair is merely the result of a statistical fluctuation. Does the insight carry over to a more structured model of the correlations among memory patterns? This is what we ask next.

Correlated patterns were generated according to the algorithm mentioned in [22] and discussed in detail in [35]. The multi-parent pattern generation algorithm works in three stages. In the first stage, a total set of $\Pi $ random patterns is generated to act as parents. In the second stage, each parent is assigned to ${p}_{par}$ randomly chosen children. Then, a “child” pattern is generated: each pattern, receiving the influence of its parents with a probability ${a}_{p}$, aligns itself, unit by unit, in the direction of the largest field. In the third and final stage, the fraction a of the units with the highest fields is set to become active. In this way, child patterns with a sparsity a are generated. In addition, another parameter $\zeta $ can be defined, according to which the field received by a child pattern is weighted with a factor $\exp(-\zeta k)$, where the index k runs through all parents. This is meant to express a non-homogeneous input from parents.

It is clear that such patterns, however, cannot be considered as independent and identically distributed, as in Equation (9), because their activity is drawn from a common pool of parents. In fact, they are correlated, in the sense that those children receiving congruent input from a larger number of common parents will tend to be more similar. All of these observations are studied in more detail in [35], and here we only focus on how correlations affect the phase diagrams. In the following simulations, the parameters pertaining to the patterns are ${a}_{p}=0.4$, $\Pi =100$, $\zeta =0.1$, while ${p}_{par}/p$, the probability that a pattern be influenced by a parent, is kept constant at $0.277$.
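The three stages of the multi-parent algorithm described above can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure of [35], whose details are not reproduced here. In particular, it is assumed that parents are sparse random patterns with the same sparsity a as the children, that the influence probability ${a}_{p}$ applies independently to each unit, and that ties between fields are broken by a tiny random perturbation. The function name and the argument `f_par` (standing for ${p}_{par}/p$) are hypothetical.

```python
import numpy as np

def generate_correlated_patterns(p, n_units, n_states, a, n_parents=100,
                                 a_p=0.4, zeta=0.1, f_par=0.277, seed=0):
    """Sketch of multi-parent generation; 0 = quiescent, 1..S = active states."""
    rng = np.random.default_rng(seed)
    n_active = int(round(a * n_units))
    p_par = int(round(f_par * p))  # number of children per parent

    # Stage 1: random parents (assumed sparse, with uniform active states).
    parents = np.zeros((n_parents, n_units), dtype=int)
    for parent in parents:
        units = rng.choice(n_units, size=n_active, replace=False)
        parent[units] = rng.integers(1, n_states + 1, size=n_active)

    # Stage 2: each parent is assigned to p_par randomly chosen children and
    # adds a field, weighted by exp(-zeta * k), to the state it favours.
    fields = np.zeros((p, n_units, n_states))
    for k, parent in enumerate(parents):
        children = rng.choice(p, size=p_par, replace=False)
        weight = np.exp(-zeta * k)
        units = np.flatnonzero(parent)
        for mu in children:
            # Assumption: influence is transmitted unit by unit with prob. a_p.
            sel = units[rng.random(units.size) < a_p]
            fields[mu, sel, parent[sel] - 1] += weight
    # Assumption: tiny noise breaks ties among equal (e.g., zero) fields.
    fields += 1e-6 * rng.random(fields.shape)

    # Each unit aligns with the state receiving the largest field.
    best_state = fields.argmax(axis=2) + 1
    best_field = fields.max(axis=2)

    # Stage 3: the fraction a of units with the highest fields becomes active.
    patterns = np.zeros((p, n_units), dtype=int)
    top = np.argsort(best_field, axis=1)[:, -n_active:]
    rows = np.arange(p)[:, None]
    patterns[rows, top] = best_state[rows, top]
    return patterns
```

The default arguments mirror the parameter values quoted in the text. Children sharing many low-index parents receive similar dominant fields, which is one way the non-homogeneous weighting can induce correlations among them.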

Simulations with correlated patterns were carried out across the same S–p and C–p planes in phase space, in the slowly adapting regime, as shown in Figure 16. We focused on the slowly adapting regime based on the results of the crossover analysis. All other simulation parameters were kept at the values used with randomly correlated patterns.

We see from the figure that the presence of non-random correlations among the memory patterns, albeit weak, shifts the bands to the left and upward in phase space, keeping approximately the dependence of the viable storage load p on S and C, but at somewhat higher values. It is as if more memories could “fit”, if correlated, into the same latching dynamics.

Figure 17 shows the S–p plane cut along $p=200$, to better compare the cases with correlated (blue) and random (red) patterns. It is apparent that there is a leftward shift, in the case of correlated patterns, from the red curve applying to the random case, but the dependence on S remains very similar.

In this paper, we have found the region in the Potts network phase space spanned by the number of Potts states S, the number of connections per unit C and the storage load p, where latching dynamics occur, and we have described their character, comparing and contrasting the slowly and fast adapting regimes. In relation to our earlier paper [22], where the possibility of such a latching region was pointed out on the basis of limited simulations, we now have a firmer basis to extrapolate to regions of parameter space of relevance to the human cortex, possibly a step toward quantitatively studying human specific capacities, including creative behaviour. A common hallmark in both regimes is that good quality latching occupies a band which scales almost quadratically in the p–S plane, while it is sublinear in the p–C plane. These bands are bounded by the storage capacity line, above, and by the boundary between no latching and finite latching, below. If, as discussed elsewhere [32], we were to take $C\approx {10}^{2}$ and $S\approx {10}^{2}$ as the orders of magnitude of interest for the human brain, we would conclude that the relevant storage load, or semantic depth, is in the region $p\approx {10}^{5}$, in both regimes. At the center of the band in the slowly adapting regime, asymmetry and entropy take intermediate values, pointing at maximally complex and potentially useful dynamics, intermediate between the deterministic and the random extremes. High crossover values indicate that many transitions occur between highly correlated patterns. Using correlated patterns shifts the position of the band in phase space, while preserving the features observed with random patterns, still in the slowly adapting regime.

In the fast adapting regime, instead, in the center of the band, which can be made wider and more robust, the entropy is higher, and correspondingly only low crossover transitions are observed, indicating that the network latches most of the time from one pattern to any other among the many with which it is weakly or anti-correlated, avoiding only those few with which it is highly correlated.

Therefore, we can conclude that the fast adapting regime, modelling rapid inhibitory feedback, offers a robust framework for latching dynamics, but of an essentially random, not very useful nature; whereas in the slowly adapting regime, modelling slow inhibition or local fatigue, correlations can drive latching transitions, potentially enabling semantic content in a stream of thoughts or linguistic productions, but with fragile dynamics, living at the very edge between memory overload and sequence termination because of the inability of the network to jump forward. This suggests the opportunity of considering models that integrate both fast and slowly adapting dynamics in their non-specific thresholds, so as to combine the useful features of both regimes. It will be the object of future work.

We would like to note, in the end, the inherent limitation of considering a simple homogeneous Potts network, with no differentiation among its units and no internal structure. In order to make contact with cognitive processes, of any kind, this limitation has to be overcome, as perhaps attempted, with one first step among many possible ones, by arranging Potts units on a ring [38]. Nevertheless, even in its crudest form, the Potts network with its latching dynamics can be used to explore e.g., novel theories as to the evolutionary origin of complex cognition [39]. It establishes a quantitative framework to understand phase transitions [25], complementary to the perspective offered by other modelling approaches to sequence generation in cortical networks [40]. At the most abstract level, it can be considered an implementation of a fuzzy logic system [41,42], but with the critical advantage that its parameters can eventually be related to cortical parameters, as we begin to describe in a related study [32].

We are grateful to Leonardo Romor, who optimized the Potts network code, and to the Human Frontier Science Program RGP0057/2016 collaboration, which has also supported open access publication.

All authors conceived and designed the simulations, which were performed primarily by Chol Jun Kang, with major contributions by Michelangelo Naim and Vezha Boboeva. Alessandro Treves and Chol Jun Kang wrote the paper, with input from the other two authors. All authors have read and approved the final manuscript.

The authors declare no conflict of interest.

1. Hauser, M.D.; Chomsky, N.; Fitch, W.T. The Faculty of language: What is it, who has it, and how did it evolve? Science **2002**, 298.
2. Amit, D.J. The Hebbian paradigm reintegrated: Local reverberations as internal representations. Behav. Brain. Sci. **1995**, 18.
3. Kaneko, K.; Tsuda, I. Dynamic link of memory—Chaotic memory map in nonequilibrium neural networks. Chaos **2003**, 13.
4. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA **1982**, 79, 2554–2558.
5. Amit, D.J.; Gutfreund, H.; Sompolinsky, H. Statistical mechanics of neural networks near saturation. Ann. Phys. **1987**, 173.
6. Amit, D.J. Modeling Brain Function; Cambridge University Press: New York, NY, USA, 1992.
7. Rolls, E.T.; Treves, A. Neural Networks and Brain Function; Oxford University Press: New York, NY, USA, 1998.
8. Pfeiffer, B.E.; Foster, D.J. Hippocampal place cell sequences depict future paths to remembered goals. Nature **2013**, 497, 74–79.
9. Abeles, M. Local Cortical Circuits: An Electrophysiological Study; Springer Science & Business Media: Berlin, Germany, 2012.
10. Chossat, P.; Krupa, M.; Lavigne, F. Latching dynamics in neural networks with synaptic depression. arXiv **2016**, arXiv:1611.03645v2.
11. Burgess, P.W.; Shallice, T. Confabulation and the control of recollection. Memory **1996**, 4, 359–411.
12. Epstein, R. The neural-cognitive basis of the Jamesian stream of thought. Conscious. Cognit. **2000**, 9, 550–575.
13. Abeles, M. Time is precious. Science **2004**, 304, 523–524.
14. Pulvermüller, F. Brain mechanisms linking language and action. Nat. Rev. Neurosci. **2005**, 6, 576–582.
15. Shmiel, T.; Drori, R.; Shmiel, O.; Ben-Shaul, Y.; Nadasdy, Z.; Shemesh, M.; Teicher, M.; Abeles, M. Temporally precise cortical firing patterns are associated with distinct action segments. J. Neurophysiol. **2006**, 96, 2645–2652.
16. Sosnik, R.; Shemesh, M.; Abeles, M. The point of no return in planar hand movements: An indication of the existence of high level motion primitives. Cognit. Neurodyn. **2007**, 1, 341–358.
17. Kanter, I. Potts-glass models of neural networks. Phys. Rev. A **1988**, 37, 2739–2742.
18. Bollé, D.; Dupont, P.; Mourik, J.V. Stability properties of potts neural networks with biased patterns and low loading. J. Phys. A Math. Gen. **1991**, 24.
19. Bollé, D.; Dupont, P.; Huyghebaert, J. Thermodynamic properties of the Q-state potts-glass neural network. Phys. Rev. A **1992**, 45.
20. Bollé, D.; Inck, B.; Zagrebnov, V.A. On the parallel dynamics of the Q-state potts and Q-ising neural networks. J. Stat. Phys. **1993**, 70, 1099–1119.
21. Bollé, D.; Cools, R.; Dupont, P.; Huyghebaert, J. Mean-field theory for the Q-state potts-glass neural network with biased patterns. J. Phys. A Math. Gen. **1993**, 26.
22. Treves, A. Frontal latching networks: A possible neural basis for infinite recursion. Cogn. Neuropsychol. **2005**, 22, 276–291.
23. Kropff, E.; Treves, A. The storage capacity of potts models for semantic memory retrieval. J. Stat. Mech. Theor. Exp. **2005**.
24. Russo, E.; Namboodiri, V.M.K.; Treves, A.; Kropff, E. Free association transitions in models of cortical latching dynamics. New J. Phys. **2008**, 10.
25. Russo, E.; Treves, A. Cortical free-association dynamics: Distinct phases of a latching network. Phys. Rev. E **2012**, 85.
26. Abdollah-nia, M.F.; Saeedghalati, M.; Abbassian, A. Optimal region of latching activity in an adaptive Potts model for networks of neurons. J. Stat. Mech. Theory Exp. **2012**.
27. Lerner, I.; Bentin, S.; Shriki, O. Spreading activation in an attractor network with latching dynamics: Automatic semantic priming revisited. Cogn. Sci. **2012**, 36, 1339–1382.
28. Lerner, I.; Bentin, S.; Shriki, O. Excessive attractor instability accounts for semantic priming in schizophrenia. PLoS ONE **2012**, 7.
29. O’Kane, D.; Treves, A. Short-and long-range connections in autoassociative memory. J. Phys. A Math. Gen. **1992**, 25.
30. Romani, S.; Pinkoviezky, I.; Rubin, A.; Tsodyks, M. Scaling laws of associative memory retrieval. Neural Comput. **2013**, 25, 2523–2544.
31. Recanatesi, S.; Katkov, M.; Romani, S.; Tsodyks, M. Neural network model of memory retrieval. Front. Comput. Neurosci. **2015**, 9.
32. Naim, M.; Boboeva, V.; Kang, C.J.; Treves, A. From multi-modular Hopfield networks to the Potts network and its storage capacity. Unpublished work. 2017.
33. Kropff, E.; Treves, A. The complexity of latching transitions in large scale cortical networks. Nat. Comput. **2007**, 6, 169–185.
34. Hebb, D.O. The Organization of Behavior: A Neuropsychological Theory; John Wiley & Sons: New York, NY, USA, 2005.
35. Boboeva, V.; Treves, A. The storage capacity of the Potts network with correlated patterns. Unpublished work. 2017.
36. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2006.
37. Russo, E.; Pirmoradian, S.; Treves, A. Associative latching dynamics vs. syntax. In Advances in Cognitive Neurodynamics (II); Wang, R., Gu, F., Eds.; Springer: Dordrecht, The Netherlands, 2011; pp. 111–115.
38. Song, S.; Yao, H.; Treves, A. A modular latching chain. Cogn. Neurodyn. **2014**, 8, 37–46.
39. Amati, D.; Shallice, T. On the emergence of modern humans. Cognition **2007**, 103, 358–385.
40. Rajan, K.; Harvey, C.D.; Tank, D.W. Recurrent network models of sequence generation and memory. Neuron **2016**, 9, 128–142.
41. Jiang, Y.; Chung, F.L.; Deng, Z.; Wang, S. Multitask TSK fuzzy system modeling by mining intertask common hidden structure. IEEE Trans. Cybern. **2015**, 45, 548–561.
42. Tu, C.C.; Juang, C.F. Recurrent type-2 fuzzy neural network using Haar wavelet energy and entropy features for speech detection in noisy environments. Expert Syst. Appl. **2012**, 39, 2479–2488.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).