Neural Activity in Quarks Language: Lattice Field Theory for a Network of Real Neurons

Brain–computer interfaces have developed at an extraordinary pace in recent years, and a significant discrepancy now exists between the abundance of available data and the limited headway made toward a unified theoretical framework. This discrepancy becomes particularly pronounced when examining collective neural activity at the micro and meso scale, where a coherent formalization that adequately describes neural interactions is still lacking. Here, we introduce a mathematical framework to analyze systems of natural neurons and interpret the related empirical observations in terms of lattice field theory, an established paradigm from theoretical particle physics and statistical mechanics. Our methods are tailored to interpret data from chronic neural interfaces, especially spike rasters from measurements of single-neuron activity, and generalize the maximum entropy model for neural networks so that the time evolution of the system is also taken into account. This is obtained by bridging particle physics and neuroscience, paving the way for particle physics-inspired models of the neocortex.


Neural activity as a dynamical process on a lattice
It is widely accepted that the foundational computational units in the brain are the neurons. Neurons receive electro-chemical inputs from other neurons through dendrites, which are then integrated in the cell body. When the integration reaches a threshold, the neuron generates electrical impulses called action potentials, or spikes. If the threshold is not reached, no spike is generated. When recording neural activity, e.g. during a neurophysiology experiment, it is usual to collect the timing and occurrence of spikes of an arbitrary number N of individual neurons and align them, in an arbitrary time window T, with events or stimuli specific to the chosen experimental paradigm. The matrix with N rows (neurons) and T columns (time) that encodes this information is called a spike raster. Calling V the space and S the time in which our system "lives", we note that V is regularized by the intrinsic discretization of its units, i.e. the neurons are discrete objects, and that S is regularized by the physiological existence of an absolute refractory period and fixed by the natural temporal ordering of the observed dynamical evolution. This implies that when studying an ensemble of neurons we can represent the space-time with a set of discrete points, or sites of a lattice. Formally, we define a spatial mapping onto the following ordered set of vertices (see Supplementary Information, SI section 5), where the time window is regularized into sub-intervals α according to a hypothetical "clock time" τ, corresponding to the minimum time between two computational operations of the neuron. Considering that, after a spike, a neuron enters a refractory period during which it is temporarily unable to generate another one, a natural candidate for τ would be: charge time + discharge time + absolute refractory period + relative refractory period. However, τ is not necessarily this, and may not be uniquely defined. In fact, it could also be broadly distributed, or be very different for various processes and scales. Hence, without any loss of generality, it is convenient to consider its smallest possible value, i.e. the typical duration of a spike: τ ≈ 1 ms. Within a τ, the neuronal computational unit i can be either active or silent, which can be represented by a binary variable. The raster (or kernel) Ω can be explicitly written as Ω = (ϕ_α^i), in which ϕ_α^i is the binary variable representing the activity of the i-th neuron at time α. After these simple observations, one realizes that Ω naturally contains the entire information needed to describe the observed process. From now on we will refer to Ω as the (neural) activity kernel, or simply, kernel 25.
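As a minimal illustration of this construction, the kernel Ω can be assembled from sorted spike times by binning at the clock time τ ≈ 1 ms. The sketch below uses hypothetical names of our own choosing and assumes spike times are already aligned to the start of the analysis window:

```python
import numpy as np

def spike_kernel(spike_times_ms, n_neurons, window_ms, tau_ms=1.0):
    """Bin spike times into the binary activity kernel Omega (N x T).

    spike_times_ms: one array of spike times (in ms) per neuron, aligned
    to the start of the analysis window of length window_ms.
    """
    n_bins = int(np.ceil(window_ms / tau_ms))
    omega = np.zeros((n_neurons, n_bins), dtype=np.int8)
    for i, times in enumerate(spike_times_ms):
        idx = (np.asarray(times) // tau_ms).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]
        # at most one spike per bin of width tau (absolute refractory period)
        omega[i, idx] = 1
    return omega
```

With τ = 1 ms, a spike at 2.2 ms falls in bin α = 2; multiple spikes inside one bin still set the binary variable to 1, consistent with the active/silent description above.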

Neural activity in terms of Lattice Field Theory
The neural dynamics is expected to follow a causal evolution determined by the prior states, i.e., a dynamical process with memory. It is reasonable to assume that such dynamics can be described or approximated by a quantum time evolution, so that the formalism of quantum field theory [17][18][19][20][21][22][23][24][25] can be applied: this is a harmless assumption, since classical evolution can always be retrieved as a sub-case of quantum evolution. Then, let us assume that the evolution of ϕ_α^i in α can be characterized by a discrete process of interacting binary fields, or qubits 24,25. We can model the time evolution of ϕ_α^i by considering its statistical mechanics counterpart on a lattice [18][19][20][21][22][23]25: to do so, we only need to formally define a few quantities, familiar to neuroscientists, that can be obtained from the binary kernel Ω. We introduce the space correlation matrix Φ and the time correlation matrix Π (joint-spike matrix, JS),

Indicating the transpose operation with the symbol †, these are straightforwardly obtained from the kernel through the relations ΩΩ†/T = Φ and Ω†Ω/N = Π, which are often used automatically (and unconsciously) to calculate spatial and temporal correlations between experimental data. Just like a system of particles, we can now describe how the N neurons represented by Ω interact with their surroundings in the discrete lattice space-time by combining these quantities into a single expression enclosing static and dynamic properties. Namely, we can write the action A of the system (eq. (4)), which is the action of a lattice field theory on Ω. The matrix A of potential interactions and the matrix B of kinetic interactions control the theory, while I is the input kernel that collects the external influences.
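The two relations translate directly into code. The following sketch (the function name is ours, not the paper's) computes Φ and Π from a kernel and exploits the fact that, for a binary kernel, Tr(Φ)·T = Tr(Π)·N equals the total spike count:

```python
import numpy as np

def correlation_matrices(omega):
    """Space correlation Phi = Omega Omega^† / T (N x N) and
    time correlation Pi = Omega^† Omega / N (T x T, joint-spike matrix)."""
    omega = np.asarray(omega, dtype=float)
    n, t = omega.shape
    phi = omega @ omega.T / t   # spatial correlations between neurons
    pi = omega.T @ omega / n    # temporal correlations between time bins
    return phi, pi
```

The trace identity provides a cheap consistency check when processing experimental rasters.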
The full derivation of eq. (4), omitted here for the sake of conciseness, can be found in section 3 of SI. This is our first main result: an explicit expression for the action of a network of real neurons that depends on easily accessible experimental quantities. Although commonly used to derive and comment on empirical results in neuroscience research, the correlation matrices were, hitherto, not related to each other or ascribable to a precise physical meaning. The action A provides a recipe for interpreting them in a physically grounded way, setting them within a general theoretical framework that portrays the dynamics of a system in terms of kinetic and potential energies. In short, it allows one to derive the Lagrangian description of the system and, through the variational principle of stationary action 42, the corresponding statistical theory. Hence, letting O be a test function of Ω, we denote the Gibbs average with respect to A with angle brackets, and formally define it as follows. The classical (non-quantum) limit of the theory is obtained by taking the limit of infinite λ, corresponding to a zero Planck constant, or the zero-temperature limit of canonical statistical mechanics 43. This entails being able to put a plethora of experimental results under one theoretical hat, using a coherent physical theory. The ingredients of the recipe are the three observables Φ, Π and Ω, which encode all the information about the system, the parameters of the theory A and B, which control the fluctuations, and the boundary conditions I.
We will call these triplets the hypermatrix and the inverse hypermatrix, respectively, since each group of observables can be arranged into a single matrix as in Figure 1 of SI. Many properties of the inverse matrices can be inferred from simple self-consistency conditions: for example, causality implies that B is upper triangular (see SI). Our method represents a natural generalization of the maximum entropy principle proposed in the works of Schneidman, Tkacik and colleagues [30][31][32][33][34], where an Ising model with variable couplings is used to fit ex-vivo recordings of a salamander retina (see section 4). Moreover, it is remarkable that principal component analysis (PCA) and the maximum entropy principle can be linked in a natural way within the proposed LFT context. PCA is probably the most used numerical method in many scientific fields and, in neuroscience, a large amount of data from decades of research is already available in this form to feed machine learning methods. Given the operator relations between the kernel and the correlation matrices, it can be shown (see section 4) that both PCA in the space domain and the maximum entropy principle are special cases of the proposed LFT with zero kinetic term (B = 0), while the action associated to a PCA in the time domain is that of a purely kinetic LFT (A = 0), i.e., the dual of the maximum entropy principle in the so-called momentum space. This last argument has its own physical importance and will be discussed elsewhere.

A model for cortical recordings
Our formalism makes it possible to compare cortical recordings of neural activity with a renormalized field theory. The most used BCI to simultaneously record collective neural activity is the silicon-based multielectrode array Utah 96 (Blackrock Microsystems, Salt Lake City). To date there are about 20 years of recordings of neural activity made with the Utah 96, across different species and under hundreds of different experimental conditions, with thousands of kernels already available. The Utah array is in the form of a square grid with a 10x10 electrode pattern and a total of 96 channels (the vertices of the square have no recording site). Due to its planar geometry, the length of its electrodes (around 1.5 mm penetration into the cortex) and their pitch (400 µm), the Utah 96 is able to record from neurons belonging to horizontally separated cortical assemblies, sampled from the same superficial cortical layer z (an example is given in Figure 2 panel a). We will refer to such assemblies as minitubes [44][45][46][47][48][49][50][51][52][53][54]. As a comparison, a multi-electrode array with a linear geometry, like a single shank with multiple contact points arranged in a vertical fashion (e.g., Neuropixels 55), would sample from neurons across various layers within a single minitube 48,49. Given that each electrode of the Utah array records approximately the activity of an individual minitube, at a distance sufficient to avoid self-interaction terms, we can model any of its recordings as a decimated minitube lattice, and the dynamics evolving around each electrode tip with an on/off field ϕ_α^xy that identifies the state of the observed minitube (see section 7.8 of SI). Let us name xyz the coordinates of a lattice such that z represents the average height from the surface of the cortex at which a given layer is located. V_xyz represents the volume occupied by (all) the neurons present there and xy is the position of the minitube section in the horizontal plane. Each minitube
layer would have its kernel, where L² is a two-dimensional lattice with average lattice step around the diameter of the individual minitube. We can now define ϕ_α^xy as the indicator that any neuron in the cell V_xyz is active, which corresponds to assuming that the activation of any one of the neurons in the cell V_xyz corresponds to the activation of the entire cell, and with high probability to the activation of the whole minitube. Next, to model the spacing between the probing contacts of the array we apply what in physics is called renormalization, and consider a decimated lattice x′y′ whose step is much greater than the diameter of the individual minitube. Renormalization is a method to understand how the parameters of a theory change when "zooming in" or "zooming out" on the system, thus linking the features observed at different scales.
One way of renormalizing is the so-called renormalization by decimation 56,57, in which the details of the system at small scales are systematically simplified by integrating out most of the degrees of freedom (spins in magnetic systems, neurons in this context), leaving only those on the decimated lattice x′y′. In our case this leads to the decimated activity kernel. Since this kernel describes the on/off activity recorded by each electrode tip, we call it the "electrode" kernel (the full derivation is in sections 3.8 and 3.9 of SI). As the kernel Ω of eq. (2) yields the hypermatrix of supplementary Figure 1, the electrode kernel yields the corresponding empirical hypermatrix of Figure 2 (see also section 7.8 of SI). The space arrangement of the electrode kernel for square multielectrode arrays like the Utah 96 is shown in Figure 2 panel b. Explaining how a cortical recording modeled in these terms is comparable to a renormalized field theory is our second main result. In the supplementary information we show explicitly a very simple example of how to compute corrections for the effective theory in the semi-classical limit (near-zero Planck constant) 25. More accurate renormalization schemes could also be achieved by many other established methods [56][57][58][59][60][61][62].
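A toy version of this decimation step can be written down directly: given an on/off field sampled on a fine (x, y) grid over time, we coarse-grain into blocks and declare a decimated site active whenever any site of its block is active, mirroring the rule used above to define the minitube field. The function and block size below are our own illustrative choices:

```python
import numpy as np

def decimate_kernel(omega_grid, block):
    """Coarse-grain an (X, Y, T) on/off field onto a decimated lattice.

    A decimated site is 'on' at time alpha if any site in its block
    of size (block x block) is on at that time.
    """
    x, y, t = omega_grid.shape
    xb, yb = x // block, y // block
    trimmed = omega_grid[:xb * block, :yb * block]
    blocks = trimmed.reshape(xb, block, yb, block, t)
    # OR over each spatial block = max over the two block axes
    return blocks.max(axis=(1, 3))
```

A full decimation scheme would also renormalize the couplings A and B, not only the field; this sketch captures only the kinematic part of the construction.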
In general, the exact renormalized theory will depend on the details of the system, the recording interface, the experimental settings and other features that should be considered case by case. Indeed, modeling and calculating the non-linear corrections for the effective theory would be one of the core aspects in which a transfer of expertise from nuclear physics and statistical mechanics to neuroscience could be crucial.

An application to cognitive processes: movement generation
We now show how our framework is suited to describing cognitive (and other) processes. As a case study we consider single-neuron activities recorded in vivo from the dorsal premotor cortex (PMd) of non-human primates during a task requiring the execution of planned movements (Figure 2 panel c).
We chose this task as a use case to prioritize readability: it involves only two experimental conditions and part of its implications are known and therefore straightforwardly comparable with the existing literature. Nonetheless, our methods are readily applicable to more complex tasks in which the number of stimuli and/or experimental conditions increases. The analyzed recording sessions are part of a larger dataset used in previous work from our group 13,63. The most relevant epoch for the task is the one that starts around the Go signal and ends after the movement onset (M_on). By extracting the kernel ⟨Ω⟩ in this time interval, we obtained the hypermatrices shown in Fig. 2 panel d (see also Extended Data). Looking at the joint-spike matrix ⟨Π⟩ for both movement directions, we notice an extended block of coordinated activity approximately 200 ms wide between the Go signal and M_on, which is known to reflect the incoming motor plan 13,[63][64][65][66][67][68][69][70][71][72][73][74][75]. Specificity for one of the two directions is highlighted by the more intense patterns of synchronous activity for one direction of movement (D2) with respect to the other. These emerge at the end of the motor plan maturation (∼200 ms before M_on), in correspondence with movement execution, and 200 ms after the latter. This is in accordance with the view that the spiking activity of PMd neurons is crucial to process incoming signals from the thalamus and other cortical areas and to spread them to other cortical, subcortical, and spinal structures to eventually promote muscle activation 13,63,73,[76][77][78][79][80][81][82], in agreement with previous results 13,63,65,72. From these brief remarks we understand that by analyzing the hypermatrix alone, an ensemble of significant results can be straightforwardly derived without resorting to any specific numerical method (e.g. PCA or machine learning methods 1,13,65,72,83,84 among the most popular). Although the outcomes of these methods
have provided valuable insights into the neural processes under investigation, the interpretations of the intrinsic nature of such processes (such as the neural manifold hypothesis 85) are still strongly dependent on the chosen analysis pipeline and are not derived from the universal principles of a physical theory. In our framework it is instead possible to have a formal correspondence between the microscopic world of elementary particles, with its governing equations, and that of neurons, enabling the understanding of neuronal interactions in terms of well-known physical laws and observables. This is the case, for instance, of the repeated patterns of synchronous activity arising in correspondence with the Go signal and in the last part of the epoch after M_on, which are transverse spike waves in the JS matrix. It is also the case of the cross-shaped synchronous activity pattern preceding and following the Go signal that recurs at the time of M_on. It indicates that the neural pattern corresponding to a recent movement, i.e., the one the animal makes to reach and hold the central target, is stored in the network for later access at the appropriate time. Hence, memory effects too are straightforwardly deducible in this formalism. It would be of certain interest to reconstruct the whole parameter set A, B and I from experimental data (as Tkacik et al. 31 did for the salamander retina), but this would require a remarkable computational burden, mostly because the high sampling frequency of the recording implies a very high rank for the JS matrix. There are a number of methods to attack these so-called inverse Ising problems 86,87. In this respect, a viable route could be to properly bin (renormalize) the process according to a larger clock time τ to obtain a JS matrix of smaller rank, possibly without losing too much information. This route will certainly be explored in future works.
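The binning route can be sketched as follows: re-binning the kernel with a coarser clock time τ′ = k·τ shrinks the time axis to T/k bins, which bounds the rank of the resulting JS matrix by T/k. The helper below is a hypothetical illustration, not the paper's analysis pipeline:

```python
import numpy as np

def rebin_time(omega, factor):
    """Coarse-grain the time axis of the kernel: a neuron is active in a
    coarse bin of width factor*tau if it spiked at least once inside it.
    The JS matrix of the re-binned kernel has at most T // factor rows,
    hence a correspondingly smaller rank.
    """
    omega = np.asarray(omega)
    n, t = omega.shape
    t_new = t // factor
    coarse = omega[:, :t_new * factor].reshape(n, t_new, factor)
    return coarse.max(axis=2)
```

How much information such re-binning discards depends on the process; in practice one would compare hypermatrices across several values of the factor.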

Microscopic theory for movement generation
The example of the PMd recording allows for some comments. It has been proposed that movement control may be carried out by the suppression of some steady signal that ends the holding or "non-movement" state and triggers the movement 13,63,65,70,72. This idea is in line with the shared view whereby a command initiated in other regions is executed locally in the PMd, which is part of a larger network subserving motor control based on frontal, parietal, subcortical, cerebellar and spinal structures 76-78, 80-82, 90, 91. Thanks to our formalism we can state that the part of the brain deciding the movement sends the command to the PMd in the form of a spatially structured external field that is stationary throughout the execution of the computation: in analogy with magnetic systems, such an external field configures the phase toward which the population of neurons will try to balance. It can be hypothesized that the neural computation underlying the so-called motor plan is performed in the convergence to the system's equilibrium: at the time α in which the external input changes, the system converges to the phase (valley) selected by the new input. Such convergence necessarily has to start from the equilibrium preceding the input activation: this would essentially be memory. This mechanism would set the typical relaxation time of the process, which we can interpret as a computation time. In this scenario it is possible to construct analytically solvable models with locally stationary external input, similar to Curie-Weiss models 92,93, which can faithfully represent local circuitry (e.g. formally modeling the circuit sketched qualitatively in Pani et al. 2022 13).
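The relaxation mechanism described above can be illustrated with a toy Curie-Weiss model under Glauber dynamics: a stationary external field h selects a valley, and the magnetization relaxes toward it from the preceding equilibrium. All parameter values below are illustrative choices of ours, not fitted to any PMd recording:

```python
import numpy as np

def curie_weiss_relaxation(n=500, j=1.2, h=0.5, steps=5000, beta=1.0, seed=0):
    """Single-spin-flip Glauber dynamics of a Curie-Weiss model with a
    stationary external field h. Returns the final magnetization, which
    relaxes toward the valley selected by the sign of h."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=n)   # disordered initial condition
    for _ in range(steps):
        i = rng.integers(n)
        field = j * s.mean() + h          # mean-field local field
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[i] = 1.0 if rng.random() < p_up else -1.0
    return s.mean()
```

In the picture sketched in the text, the number of sweeps needed to approach the new valley plays the role of the computation time, and the dependence on the initial condition plays the role of memory.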

Cognitive processes and spin glasses
It is interesting to briefly compare these results with those obtained by Schneidman, Tkacik and others for the salamander retina. In their works [30][31][32] the authors showed that neurons recorded from a patch of ex-vivo retina, exposed to natural stimuli, may be capable of exhibiting a glassy phase, i.e. a phase in which the system shows multiple states that are local minima of the energy. It would be tempting to generalize this conclusion to other systems of interacting neurons expressing collective behavior. As we have shown, this is not always the case. Like the salamander retina, other cortical neurons (e.g., in the motor cortices) might also be structurally capable of exhibiting glassy phases. However, it is not necessary that these are physiological within the neural computation leading to cognitive processes, nor that they play a central role in sending the system out of equilibrium. Indeed, the retina is a structure strictly devoted to inputs to be passed to the central nervous system, and in the aforementioned works it was also separated from it. We instead recorded from an in-vivo system that is integrated in a wider circuit and handles incoming inputs, producing a refined output to other cortical, subcortical and spinal structures. If the system of neurons involved in motor control were in a glassy phase, it would not be able to prepare and deliver the command consistently, making a planned movement inaccurate and inefficient. Following the ideas of Toulouse 38, who sees learning in a network of neurons as emerging from a selection of possible states of the system, we would expect more of a glassy behavior at the beginning of the training, before the animal has reached optimal performance. This could be studied by calculating, for example, the overlap between hypermatrices of training sessions separated by large time intervals. It is also reasonable to think that this could apply to any time-locked cognitive process. Glassy activity may also arise in entirely different scenarios, e.g. under altered consciousness states 40 like deep sleep or anesthesia, or in conditions similar to those considered for the ex-vivo salamander retina. Expecting to test these conjectures in future works, we emphasize that the last hypothesis could be investigated by comparing the exceptionally rare recording of a dying brain 94 with its in-vivo recordings 13. In conclusion, we showed that applying lattice methods from elementary particle theory to natural neurons should be possible and fruitful. We have set a first stone, but the transition to systematic thinking in this theoretical framework will still require substantial work. However, given the advanced state of LFTs and their vast range of applicability, knowledge exchange with neuroscience would be beneficial for the theoretical development of the latter in the near future, and for both in the long run. The encounter opens exciting avenues for interdisciplinary research, facilitating connections between computational neuroscience and other fields of physics that utilize LFTs. Indeed, LFTs have become of crucial importance far beyond their traditional realms, encompassing cellular automata 95, quantum simulations 96, computational modeling 29,97 and machine learning 98 among others. Thus, it is reasonable to think that they could do the same in neuroscience, once the appropriate common theoretical frame is established. With our manipulations we showed that determining the effective theory describing the dynamics of the neocortex is possible in terms of LFT, at least in our formalism. Moreover, given the specific architecture of the neocortex [44][45][46], it is possible that the topology of such a theory is essentially two-dimensional, with the cortical layers behaving as interacting fields just as in elementary particle theory. Rethinking neural interactions in this way could greatly facilitate the analytical construction of effective theories, since in dimension two there are known schemes for analyzing,
simulating, renormalizing and, in some cases, solving 56,57 such theories. In truth, it is not possible to predict what a collision between particle physics and neuroscience might lead to, yet the arguments in this work strongly suggest that it would be worth finding out.

Theoretical methods
Here we collect further details on how to derive the connection with the maximum entropy principle [30][31][32][33][34] and PCA. The full derivation of the binary LFT in kernel representation 25 can be found in the Supplementary Information (SI).

Generalization of the maximum entropy principle
We show that our LFT is a generalization of the maximum entropy principle as presented in the work of Schneidman, Tkacik and others [30][31][32][33][34]. To transform into the spin system we use the map σ = 2ϕ − 1 (which is equivalent to replacing the zeros by −1), resulting in the kernel of magnetizations M. From a mathematical point of view, the two descriptions are equivalent. In fact, from M we obtain the spin-spin correlation matrix C that is usually found in statistical mechanics and the overlap matrix Q of spin glass theory 28. Between the kernel, correlation and overlap matrices, the relations M M† = C and M†M = Q still hold 25. With some algebra and a global rescaling of the quantities (see section 7 of SI) we can write the action in the magnetic representation, where h is linearly related to I. In the spin case the hypermatrix will therefore consist of M, C and Q. By expanding the definition of c_ij we immediately note that in the limit B → 0 a replicated version 28 of the Hamiltonian (i.e., the total energy of the system) used in the works of Schneidman and Tkacik is obtained. Notice that the limit T → 1, corresponding to a single replica of the system, is exactly equivalent to the max entropy model [30][31][32]. Thus, the max entropy principle is recovered as a specific case of a field theory with zero kinetic energy. From what we have shown, the authors approximate the activity with a LFT at equilibrium with B = 0, which corresponds to a field theory with zero kinetic energy.
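The map to the magnetic representation is a one-liner. The sketch below (names are ours) applies σ = 2ϕ − 1 and forms the unnormalized spin-spin correlation C = M M† and overlap Q = M†M:

```python
import numpy as np

def magnetization_kernel(omega):
    """Spin map sigma = 2*phi - 1: zeros of the binary kernel become -1."""
    return 2 * np.asarray(omega, dtype=float) - 1

def spin_matrices(m):
    """Spin-spin correlation C = M M^† and overlap matrix Q = M^† M."""
    return m @ m.T, m.T @ m
```

For ±1 spins the diagonal of C equals T and the diagonal of Q equals N, a quick sanity check on any empirical kernel.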

Scaling test for axonal connectivity
Let us show a simple possible application to the case of the salamander retina dataset shown in the papers of Schneidman, Tkacik and others [30][31][32]. Remarkably, they were able to reconstruct the A matrix for small groups of neurons in a salamander retina, thus obtaining both the correlation matrix and its dual in the parameter space. In Figure 1f of their paper 31 they show the distributions of the reconstructed couplings J_ij for some values of N. The distributions of J_ij are indeed approximately Gaussian, and it would be very interesting to verify the scaling of the variance of such distributions at various N. Concerning the space couplings, for mean-field models 25,26 the pairwise interaction is the sum of two terms; for the thermodynamic limit to exist it is necessary that the h_i are of order O(1) in the number of neurons, and that J_0 and J_ij scale correctly. This depends on the connectivity of the matrix of axonal adjacencies (axon matrix), whose average connectivity we denote g(Λ). Therefore, to normalize correctly, one must take J_0 and J_ij as Gaussian variables of unit variance and zero mean, rescaled according to g. For fully connected models we have g(1) = N; for models with large connectivity but sub-linear in the number of neurons, one can consider intermediate scalings, whereas for finite-dimensional models we have that g(Λ) = O(1). From preliminary analysis (not shown) we confirm that the couplings are approximately Gaussian, but the scaling is 1/N^(1/4) and not 1/N^(1/2) as in the Sherrington-Kirkpatrick (SK) model 25,26. This would be interesting, since the system would admit a thermodynamic limit and still have sufficient connectivity to manage the data with a mean-field theory [25][26][27][28]. Also, it would be very interesting to see the full hypermatrix to which the covariance matrix shown in Figure 1 of Tkacik et al. 2009 31 belongs, and even more interesting would be to fit such a hypermatrix with the field theory presented in this paper.
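The normalization requirement can be checked numerically: with the fully connected choice g(1) = N and couplings J_ij drawn with variance 1/N (the SK scaling), the local field on each spin stays of order O(1) as N grows. The sketch below is our own illustration of this scaling argument, not the retina fit:

```python
import numpy as np

def local_field_std(n, seed=0):
    """Std of the local field h_i = sum_j J_ij s_j for SK-normalized
    couplings J_ij ~ N(0, 1/N); by construction it stays ~1 for any N."""
    rng = np.random.default_rng(seed)
    j = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
    s = rng.choice([-1.0, 1.0], size=n)
    return (j @ s).std()
```

Under the empirical 1/N^(1/4) scaling reported above the same sum would instead grow like N^(1/4), which is why the observed scaling constrains the admissible connectivity g(Λ).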

Relation with the principal component analysis
Here we show that PCA too can be interpreted as a special case of our LFT. In particular, PCA can be understood as projecting the data into a subspace such that the projected data has minimum discrepancy with the original. In the following we explicitly consider only the projection onto the spatial domain, but the temporal projection is similar. Then, let V′ be a subset of V with size n < N, let X be a real-valued kernel with N rows and n columns, and let us introduce the set of kernels with orthonormal columns (hereafter we denote by I the identity matrix). The columns of X span an n-dimensional subspace of R^N, and the projection of Ω onto this subspace is XX†Ω.
PCA aims to identify the subspace such that the discrepancy between the operators XX†Ω and Ω is as small as possible in the Frobenius norm. The objective function is ∥XX†Ω − Ω∥_F, where ∥·∥_F denotes the Frobenius norm, and where we applied ∥D∥²_F = Tr(DD†), with Tr(·) indicating the trace of the operator. This and the orthonormality constraint constitute a constrained optimization problem; calling Y its solution, the minimum is found by choosing the columns of Y as the eigenvectors of ΩΩ† associated with the n largest eigenvalues (or the n largest left-singular vectors of Ω); see Goodfellow et al. 99 for a proof. In the end we conclude that the action of PCA in the space domain is that of a LFT with B = 0, and is therefore a special form of the maximum entropy principle described above.
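This optimality can be verified numerically: taking Y as the top left-singular vectors of Ω minimizes the Frobenius discrepancy, so no other orthonormal projector of the same rank does better. A sketch with our own naming:

```python
import numpy as np

def pca_projector(omega, n_components):
    """Columns of Y are the left-singular vectors of Omega associated with
    the n largest singular values; Y Y^† Omega is the PCA projection."""
    u, _, _ = np.linalg.svd(np.asarray(omega, dtype=float),
                            full_matrices=False)
    return u[:, :n_components]
```

Comparing the reconstruction error of this projector against a random orthonormal one (e.g. from a QR decomposition of a Gaussian matrix) confirms the Eckart-Young optimality used in the argument above.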

Subjects
Two male rhesus macaque monkeys (Macaca mulatta, Monkeys P and C), weighing 9 and 9.5 kg respectively, were employed for the task shown as the case study.

Apparatus and task
The monkeys were seated in front of a black isoluminant background (< 0.1 cd/m²) of a 17-inch touchscreen monitor (LCD, 800 x 600 resolution), inside a darkened, acoustically insulated room. A non-commercial software package, CORTEX (http://www.nimh.gov.it), was used to control the presentation of the stimuli and the behavioural responses. Fig. 2 panel a shows the scheme of the task: a go-signal reaching task. Each trial started with the appearance of a central target (CT) (red circle, diameter 1.9 cm). The monkeys had to reach and hold the CT. After a variable holding time (400-900 ms, 100 ms increments) a peripheral target (PT) (red circle, diameter 1.9 cm) appeared randomly in one of two possible locations (right/left) and the CT disappeared (Go signal). After the Go signal the subjects had to reach and hold the PT for a variable time (400-800 ms, 100 ms increments) to receive juice. White circles around the central target were provided to the animals as feedback for the touch. Reaction times (RTs) were defined as the time between the presentation of the Go signal and the onset of the hand movement.

Extraction and processing of neuronal data
A multielectrode array (Blackrock Microsystems, Salt Lake City) with 96 electrodes (spacing 0.4 mm) was surgically implanted in the left dorsal premotor cortex (PMd; arcuate sulcus and pre-central dimple used as references after the opening of the dura) to acquire unfiltered electric field potentials (UFP; i.e., the raw signal) sampled at 24.4 kHz (Tucker Davis Technologies, Alachua, FL). As described in previous work from our group 13, we extracted single-neuron activities from the raw signal by employing the spike sorting toolbox KiloSort3 100 with the following parameters: Thresholds: [9 9] (thresholds for template matching on spike detection); Lambda: 10 (bias factor of the individual spike amplitude towards the cluster mean); Area Under the Curve split: 0.9 (threshold for cluster splitting); Number of blocks: 5 (number of blocks the channels are divided into for estimating probe drift). The output was manually curated in Phy (v2.0; 17) to merge clusters mistakenly separated by the automated sorter. From this procedure we obtained a binary spike raster with a time resolution of 1 ms (1 for a spike, 0 for no spike) for each single trial. Each single-trial raster was then put into the form of the kernel Ω of eq. (8).

Fundamentals
Here we introduce the fundamental notation to describe a patch of cortical tissue as a dynamical system on a lattice. In these first sections we consider variables intended to model the whole part of the nervous system involved in the neural computation, of which the actually observed neurons (e.g., during a neurophysiology experiment) are typically a sparse subset. For the detailed derivation of the Lattice Field Theory (LFT) formalism 25 and its associated statistical theory, please refer to section 7.

Map on vertex
Let N be the number of neurons involved in the task; these are arbitrarily mapped onto the ordered set of vertices V. The indexing i ∈ V is defined up to a bijective map that shuffles the indices; this map constitutes a parameter of the representation, and ideally should be chosen so as to highlight the function of the various neurons observed during the task, i.e., to highlight any groups of neurons that belong to the same population, or "computational structure."

Identify the θ highlighting the space-time structures
The neurons involved in the task (all of them, not just those observed) could be partitioned into further subgroups based on function, location, average activity, and the times at which this activity varies. For example, it is possible to sort neurons by average activity, or by the earliest time at which they significantly change their initial state. In the case of multi-electrode array interfaces, it is also possible to define a unique sorting of the experimental rasters based on the position of the sensor channels. Given the sampling rate of neural interfaces, it is assumed that the temporal sampling of the data is much finer than any functional level of interest, while the spatial resolution could be severely limited. This issue will be addressed in subsequent sections; for now we assume θ is arbitrary unless otherwise stated.
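For instance, one admissible choice of θ is the permutation that sorts neurons by decreasing average activity. A minimal sketch (the toy raster and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy binary raster: 6 neurons x 200 time bins, heterogeneous firing rates
raster = (rng.random((6, 200)) < rng.random((6, 1)) * 0.3).astype(int)

rates = raster.mean(axis=1)        # average activity of each neuron
theta = np.argsort(rates)[::-1]    # permutation: decreasing average activity
sorted_raster = raster[theta]      # raster re-indexed through the map theta
```

Any other sorting criterion (earliest significant change, channel position, ...) only changes how `theta` is computed.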

Dynamical system
In general, we assume that the activity vector follows a causal evolution determined by the previous activity (a dynamic process with memory) according to a hypothetical law f, which in principle could be stochastic. We further assume that such a law can be described or approximated by a quantum time evolution 24 , so that the formalism of statistical field theory can be applied (notice that classical evolution is a subcase of quantum evolution).

Time regularization
Since the space is already regularized by the natural discretization of the computational units (minitubes/neurons), in order to switch to a fully discrete lattice system it remains to regularize the time window. The window is therefore quantized into T sub-intervals of size τ, in turn mapped onto the ordered set of vertices preserving the temporal ordering. Notice that, in contrast to V, the map between the time intervals and α ∈ S is naturally fixed by the temporal ordering of the process.

Clock time
The time interval is quantized according to a hypothetical "clock" time τ, corresponding to the time between two fundamental computational operations of the neuron. As described in the main text, it is convenient to consider τ = 1 ms.

Binary computation cell
Within a clock time the computational unit can be either active or silent: this can be encoded by a binary variable, which is assumed to be the actual variable supporting the computation. The natural association is obviously with the active/silent state of the neuron during an interval equal to the clock time. This leads to the definition of the (neural) activity kernel Ω, i.e., a binary array with N rows and T columns that encodes the entire activity of the observed neurons within the time window. Formally, the kernel is a function of the type Ω : V × S → {0, 1}. We can introduce the symbol ϕ_i^α to represent the activity of the i-th neuron at time α; in this way the kernel can be explicitly written as Ω := {ϕ_i^α ∈ {0, 1} : i ∈ V, α ∈ S}. The kernel is the general order parameter 25 , and is assumed to contain the entire information needed to describe the observed process. Given the presence of an absolute refractory period, we expect that the regularized dynamical system is the one that naturally describes the evolution of the affected cortex sector, and not the possible continuous theory that would be obtained by taking τ → 0.

Kernel of the magnetizations
To transform from the binary representation into the spin system we use the map σ = 2ϕ − 1 (which is equivalent to replacing the zeros by −1), which gives the kernel of magnetizations M := {σ_i^α ∈ {−1, 1} : i ∈ V, α ∈ S}. The two descriptions are mathematically equivalent, but note that some observables may have different meanings. For example, the correlation between binary spikes is 1 only when the neurons fire together and 0 otherwise, while the product between two spins is equal to 1 if the variables are equal, and negative if opposite. In practice, the first variable is sensitive only to the event in which the two neurons fire simultaneously, putting on the same rank (same null value) the events in which only one of the neurons fires and the one in which they are silent together. The second, on the other hand, distinguishes only whether or not they do the same thing, regardless of whether they fire or not.
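The semantic difference between the two products can be checked directly; a small illustration (the toy kernel is ours):

```python
import numpy as np

phi = np.array([[1, 0, 1, 0],
                [1, 0, 0, 1]])      # binary activity kernel (2 neurons, 4 bins)
sigma = 2 * phi - 1                 # spin kernel via sigma = 2*phi - 1

# binary product: 1 only in bins where BOTH neurons spike
joint_spikes = phi[0] * phi[1]      # -> [1, 0, 0, 0]
# spin product: +1 when the neurons do the same thing, -1 otherwise
agreement = sigma[0] * sigma[1]     # -> [1, 1, -1, -1]
```

The second and third bins show the difference: joint silence counts as agreement for spins but as a null event for spikes.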

Observables
In the following we define the observables of interest that can be obtained from the kernel.

Kernel offset
The zero-order observable is the global average, or "offset", of the kernel. This quantity is specific to the representation made with kernels and would be a kind of "ergodic" approximation of the activity in the space-time window considered. The offset is defined for both the activity and the magnetization kernels, respectively Ω̄ := (NT)⁻¹ ∑_{i∈V} ∑_{α∈S} ϕ_i^α and M̄ := (NT)⁻¹ ∑_{i∈V} ∑_{α∈S} σ_i^α. The two quantities are related by the linear relationship M̄ = 2Ω̄ − 1. From a mathematical point of view they are proportional to the Grand Sum (the sum of all the elements of a matrix). Such a kernel corresponds to the thermodynamic limit of a lattice gas with particle density exactly equal to Ω̄. For magnetic systems, in highly connected ones this can occur due to the presence of a constant external field, or from a uniform fully connected interaction of the Curie-Weiss type. We do not know whether there are other magnetic or gaseous systems that have this property and are substantially different from magnetization eigenstates.

Row and column averages
At this point we can move to the description of the observables of order one: the space averages, i.e. the average activity of the state at time α, and the time averages, i.e. the average firing rate of the i-th neuron in the time window considered. The average activity at time α is the space average ω^α := N⁻¹ ∑_{i∈V} ϕ_i^α, while the time average f_i := T⁻¹ ∑_{α∈S} ϕ_i^α is the time-averaged activity of the i-th neuron in the time interval S. Notice that the average of the averages is in both cases equal to the offset Ω̄. Similarly, starting from the kernel of magnetizations we obtain the vector of space averages m^α := N⁻¹ ∑_{i∈V} σ_i^α and the vector of time averages m_i := T⁻¹ ∑_{α∈S} σ_i^α, where m_i is the time average of the i-th spin in the time interval S. Again, the average of the averages is equal to the offset M̄, and between the averages made with spins and with spikes the same linear relationship holds, e.g. m_i = 2 f_i − 1: at this first level of description no particular differences yet emerge between the two representations.
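These first-order observables and their consistency relations can be verified numerically; a sketch with a random toy kernel (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
omega = (rng.random((5, 100)) < 0.2).astype(float)  # binary kernel, N=5, T=100
m_kernel = 2 * omega - 1                            # magnetization kernel

offset_phi = omega.mean()        # grand-sum offset of the activity kernel
offset_sigma = m_kernel.mean()   # offset of the magnetization kernel

omega_alpha = omega.mean(axis=0) # space average at each time alpha
f_i = omega.mean(axis=1)         # time-averaged activity of each neuron
m_i = m_kernel.mean(axis=1)      # time average of each spin
```

The average of the averages reproduces the offset in both directions, and m_i = 2 f_i − 1 holds entrywise.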

Correlation matrices
The variables of order two are the correlation matrices. It is possible to define a temporal correlation matrix, which in the case of the (neural) activity kernel would be the joint-spikes matrix, and a spatial correlation matrix. In general, these matrices are obtained from the kernel through bilinear relations. Note, however, that these observables are not completely independent from the averages, and converge to products of the averages in the free-field approximation. We therefore define the connected correlation matrices, which describe the correlations between the fluctuations: these are zero in the free-field limit, and nontrivial in case the correlations between fluctuations are significant. Again, we can define analogues in the magnetic description. We can associate the spin-spin correlation matrix typical of systems seen in statistical mechanics and the overlap matrix commonly used in spin glass theory 28 . The relationship between kernels, correlations and overlaps is still the same. As before, we can define the mean-field matrices; however, it is important to note that the matrices constructed from the magnetization kernel have a different meaning than those constructed from the binary kernel. To compare them, one must consider the connected matrices: the latter contain only the correlations between fluctuations, and are the same for both spin and binary neurons up to a scaling factor. Note that the distribution p Q * is exactly the distribution of the overlaps mentioned in the Replica Symmetry Breaking (RSB) theory 28 .
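The relation between the binary and spin connected correlations (equal up to a scaling factor) can be checked numerically. A sketch under our own naming and normalization conventions (spatial matrices normalized by T, overlap by N):

```python
import numpy as np

rng = np.random.default_rng(2)
omega = (rng.random((4, 50)) < 0.3).astype(float)  # binary kernel
sigma = 2 * omega - 1                              # spin kernel

C_phi = omega @ omega.T / omega.shape[1]    # joint-spikes (spatial) matrix
C_sig = sigma @ sigma.T / sigma.shape[1]    # spin-spin correlation matrix
Q = sigma.T @ sigma / sigma.shape[0]        # overlap matrix (temporal)

f = omega.mean(axis=1)                      # firing rates
m = sigma.mean(axis=1)                      # magnetizations
C_phi_conn = C_phi - np.outer(f, f)         # connected binary correlations
C_sig_conn = C_sig - np.outer(m, m)         # connected spin correlations
```

Since σ = 2ϕ − 1, the two connected matrices differ only by the factor 4.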

Lattice Field Theory
In the following section we provide the complete derivation of the expressions of the action A, both in the binary case and in the spin representation. Therefore, for the sake of completeness, some expressions and definitions given in the main text will be included again. The computation of neuron i is realized through the field, where Ω̄ is the global offset for the neuron in vivo, which could possibly be zero. The action of a lattice field theory A(Ω|F) can be described by an expansion of the kind: each term represents one-, two-, three-, and four-vertex interactions, etc., while the sequence of tensors F collects the parameters of the theory. However, if we want to consider the same approximation used
by Schneidman and colleagues 30 , interactions with more than two vertices can be neglected, and also two-vertex interactions with four different indices. Therefore, the proposed action reduces to a form that depends explicitly on the correlation and overlap matrices, and is controlled by the matrix of potential interactions A and by the matrix of kinetic interactions B. We can switch to the binary representation ϕ through the transformation ϕ = (σ + 1)/2; doing the algebra one finds that the structure of the theory is the same, because the connected part is identical. If the global offset Ω̄ is not zero one must take into account the appearance of additional currents which, however, transform linearly. Ultimately, the action in the binary form is obtained. Note that, normalizing the sums, the action can be rewritten using the first-order observables f, ω, which are obtained from the kernel Ω through linear transformations, and with those of second order, the matrices Φ and Π, which are nonlinear observables: that is eq. (4) of the main text. For a single realization of the process the matrices are binary and can be obtained from the kernel (in this case the hypermatrix is a redundant representation); however, as we shall see, this does not apply in general to the averaged hypermatrix, where the correlations contain information about the fluctuations.

Magnetic representation
To switch to the magnetic representation we apply the usual bit-spin transformation σ = 2ϕ − 1. The structure of the interaction is identical except for a global rescaling and an adjustment of the currents with an additional term. The action in the magnetic representation follows; in this case the hypermatrix consists of M, C and Q. We can recover the Ising Hamiltonian used in Schneidman et al. 30 in the limit T → 1 and B → 0: thus, the maximum entropy principle is recovered as a specific case of a field theory with zero kinetic energy. Since the theories are equivalent, in the coming manipulations we will mainly use the spin representation.
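A quadratic action of this kind can be sketched as follows. The function and sign conventions are illustrative (ours), but the structure matches the text: a potential term controlled by A on the spin-spin correlations, a kinetic term controlled by B on the overlaps, and optional currents; with T = 1 and B = 0 the potential term reduces to an Ising-type energy:

```python
import numpy as np

def action(sigma_kernel, A, B, h=None):
    """Quadratic lattice action: potential term (couplings A on the
    spin-spin correlation matrix C) plus kinetic term (couplings B on
    the temporal overlap matrix Q). Sign conventions are illustrative."""
    N, T = sigma_kernel.shape
    C = sigma_kernel @ sigma_kernel.T / T   # spin-spin correlation matrix
    Q = sigma_kernel.T @ sigma_kernel / N   # overlap matrix
    S = -np.sum(A * C) - np.sum(B * Q)      # potential + kinetic terms
    if h is not None:                       # optional external currents
        S -= h @ sigma_kernel.sum(axis=1)
    return S
```

For example, with T = 1, B = 0 and antialigned spins, only the potential term contributes.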

Lagrangian description
Assuming that the process evolves causally, it follows that the kinetic matrix B must be upper triangular, that is, the state of the system at instant α depends only on the states realized at the previous times β ≤ α − 1. Therefore, we can define the sequence of time windows S_α. In this way it is possible to define the Lagrangian of the system, where in the second line the definition of the overlap q_αβ is applied. We can isolate the potential term from the kinetic term (which depends on the overlap); thus, we can rewrite the Lagrangian in the canonical form, where q_{αS_{α−1}} is the α-th row of the matrix of overlaps up to time α − 1. This determines the dynamics of the system.

The kinetic term
That the overlap-dependent term is indeed interpretable as a kinetic term can be deduced by comparison with a simple Lagrangian system (see the work of Lee 18,19 for an overview). We introduce the pulse (momentum) kernel ∂M. The Lagrangian of the Markovian scalar field (without memory) is given below; with a few algebraic steps it can be shown that the Lagrangian can be rewritten accordingly.

In the case of our action, taking the kinetic matrix as below, where I(•) is the indicator function, the associated Lagrangian follows, and one can see immediately that the difference between the two Lagrangians is a constant that is irrelevant to the determination of the dynamics. Moreover, the sign of the kinetic term is reversed with respect to that of the overlap term. It follows that the free-field system is a sub-case of the general action described at the beginning, whose overlap term can be reduced to the kinetic term of the free Lagrangian.
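The identity behind this reduction — for spins, (σ^α − σ^{α−1})² = 2 − 2 σ^α σ^{α−1}, so the squared momentum equals a constant minus the nearest-time overlap — can be checked directly (the toy kernel is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = rng.choice([-1, 1], size=(4, 6))   # toy spin kernel, N=4, T=6
N = sigma.shape[0]

dM = np.diff(sigma, axis=1)                              # momentum (pulse) kernel
kinetic = 0.5 * (dM ** 2).sum(axis=0) / N                # (1/2N) sum_i (dsigma)^2
q_next = (sigma[:, 1:] * sigma[:, :-1]).sum(axis=0) / N  # nearest-time overlap

# the two terms differ by a constant, with the overlap sign reversed
assert np.allclose(kinetic, 1.0 - q_next)
```

This makes explicit why the overlap term plays the role of (minus) a kinetic energy.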

Statistical Theory
So far, our theory is equivalent to a quantum field theory on a lattice 17-19, 21, 24 , since the reversal of the sign of the kinetic term in the transition from momenta to overlaps can also be achieved by applying a Wick rotation to a quantum oscillator system [18][19][20] . In general, the evolution of a Lagrangian system is determined by the principle of stationary action 42 , which means that the kernel that makes the action stationary is not necessarily a minimum of the action: it can also be a maximum, or a saddle point. However, in quantum field theories, Wick's rotation transforms the Lagrangian into a Hamiltonian (the sum of kinetic and potential energy). Similarly to Wick's rotation, the expression of the kinetic contribution in terms of overlaps also transforms the Lagrangian into a Hamiltonian. We therefore consider a Gibbs principle 43 applied to the action, which is equivalent to the least action principle. We define the action partition function, where we interpret the action as a Hamiltonian and look for its minimum. Here λ is the inverse Planck constant, and plays the role of an inverse temperature; the classical limit is recovered for λ → ∞. We also define the free action, which would be the analogue of the free energy. We then apply the steps to obtain the Gibbs principle 25 : first we write the partition function by multiplying and dividing by a test measure to obtain the flat functional; then we apply Jensen's inequality to the average with respect to the test measure so as to obtain the free action functional. This functional is greater than or equal to the free action for any test measure, and one can see that the minimum is actually reached by the Gibbs measure. It can be verified that the measure satisfies the expected relationship with the free action. If the system is assumed to be classical (i.e., non-quantum) the dynamics is obtained in the zero temperature limit. However, it could also have an intrinsic minimum temperature (equivalent to a nonzero Planck constant).
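For very small N and T the Gibbs measure over kernels can be computed exactly by enumeration, which makes the variational statements above concrete. A toy sketch (the ferromagnetic action is our illustrative choice, not the model fitted in the paper):

```python
import itertools
import numpy as np

def gibbs_over_kernels(action_fn, N, T, lam):
    """Exact Gibbs measure over all 2^(N*T) spin kernels (tiny systems only)."""
    kernels, weights = [], []
    for bits in itertools.product([-1, 1], repeat=N * T):
        K = np.array(bits).reshape(N, T)
        kernels.append(K)
        weights.append(np.exp(-lam * action_fn(K)))  # Boltzmann weight of the action
    Z = sum(weights)                                 # action partition function
    return kernels, np.array(weights) / Z, Z

# toy action: ferromagnetic potential term only, no kinetic term (B = 0)
A = np.array([[0.0, 1.0], [1.0, 0.0]])
toy_action = lambda K: -np.sum(A * (K @ K.T / K.shape[1]))
kernels, p, Z = gibbs_over_kernels(toy_action, N=2, T=2, lam=5.0)
```

At large λ the measure concentrates on the minimum-action kernels, recovering the least-action dynamics as the zero-temperature limit.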

Connection with Replica Theory
The theory in the Lagrangian form allows us to establish a connection with replica theory 28 . Notice the form of the partition function: since a sum over the kernels is equivalent to a sum over T replicas of the system σ, the replicated system is obtained in the limit B → 0 (no kinetic term). With simple steps we arrive at relations where Z is the partition function associated with the Hamiltonian H. As one can see, the partition function of the action converges to the partition function of the Hamiltonian replicated T times. The interpretation of the replica trick 28 , log Z = lim_{T→0} (Z^T − 1)/T, is natural enough in this context: the formal limit T → 0 describes a situation in which the continuous limit of the theory (τ → 0) is observed for an infinitesimal time.

External input
So far we have only considered the evolution of an isolated system, but obviously in our case the input is crucial, so we must include it in the model. This can be done in a relatively simple way by introducing the input kernel, which describes the input signal to the network and should be added to the action to obtain the description of the full system. By introducing the input partition function and applying the Gibbs principle we find the distribution of the input. The partition function of the general action can be expressed in terms of the average of the isolated state with respect to ρ. Note also that the partition function can also be expressed as the average of the input over the measure of the isolated system, from which a relationship between partition functions and averages over states follows. For example, the input kernel may model the signal arriving to the observed cortical area after a stimulus. In the case of a motor task 13,63 it could be the thalamic input arriving to the boundary neurons (in a topological sense) of the recorded cortical region following the Go stimulus, which is expected to be steady-state almost everywhere except around the time interval at which the motor plan is realized. If axonal and synaptic connections are reasonably stable, then most of the observed variability could come from input noise from the rest of the network, slightly different initial conditions, etc. To include all possible effects one can introduce a "quenched" space-time noise term, i.e. a random field δ to be added to the input term I, which statistically mimics the input noise on the time scale of the entire session. In the case of the recordings described in the main text 13,63 we expect that quenched noise terms can be ignored.

Ground state of the action and order parameter
The variational principle identifies a distribution η called the ground state of the action (GS), which would be the one from which the multielectrode interface draws the states we observe at the single-trial level. Note that the GS of the action is a distribution in the space of kernels Ω^{V S}, and hence a natural order parameter would be the kernel of the GS. The kernel of the GS is of particular interest since one can directly obtain the average momentum kernel ⟨∂M⟩_η and all kernels derived from linear operations. It also allows one to determine and subtract the steady state of the "hold" phase before the motor plan is observed (see section 2.4 of the main text). Notice that averages of first-order observables, such as the offset or the row and column averages, can be computed directly from the average kernel, since they are related to it by a linear relationship, and for finite N and T have the same quantization properties. Notice that if the connected correlation matrices are negligible then the correlation matrices can be deduced from the average kernel, since C_0 and Q_0 depend on the averages. The amplitude of the kernel cell fluctuations satisfies a corresponding relation; however, this relationship does not apply in general. Thus, the average kernel may not be a sufficient order parameter to fully describe the GS η, and we should also look at second-order variables. To verify this we can compute the ensemble covariance matrices: if these matrices are nontrivial it means that the correlations cannot be reconstructed from the average kernel. Thus, the most general order parameter is the hypermatrix, composed of the average kernel ⟨M⟩_η and the average correlation matrices ⟨Q⟩_η and ⟨C⟩_η .

Renormalization
We conclude the general theoretical part with a simple renormalization scheme based on ref. 25 that will be useful to link the theory with experimental observations. For this subsection we switch again to the lattice gas representation. Consider a joint kernel partition as in Section 3 of Franchini 2023 25 , with two levels (equivalent to one-step Replica Symmetry Breaking: RSB1). Let N_1, N_2, T_1 and T_2 be numbers such that N = N_1 N_2 and T = T_1 T_2. The kernel can be rewritten according to the new multiscale index, where we introduce the sub-kernels. The field is renormalized according to a map chosen so as to regain a binary variable, i.e. φ̂_{i_1}^{α_1} will be one if within the cell V_{i_1} S_{α_1} the condition set by the renormalization map is verified, and zero otherwise. By construction, the relationship between the two variables is fixed, and we can define the renormalized kernel accordingly. Since the action structure is symmetrical between space and time, we can perform the calculations on the potential term alone. We apply the multiscale index and then the renormalization map. For example, for a bin renormalization (Kadanoff renormalization) the effective interaction will be given by the average over the cell, while for a renormalization by decimation (Wilsonian renormalization) it will be given by the decimated couplings. We separate the stationary term (if any): the stationary term corresponds to the renormalized coupling matrix; we can thus rewrite the potential term of the action by separating the renormalized part from the fluctuation. Doing the same with the kinetic term and separating the uniform term, the treatment is completely symmetrical. The action in the renormalized variables will therefore have a perturbation, where η̂ is the GS of the renormalized action. In general, this expression depends on the details of the couplings within the renormalized cell. The perturbation of the action is formally defined as a sum over those Ω that, if renormalized, are equal to Ω̂, i.e.
O_Ω̂ := {Ω ∈ {0, 1}^{V S} : R(Ω) = Ω̂}, (125)
and the function Γ(Ω̂, Ω | A, B) is defined as a sum over the corresponding coupling terms. Thus, renormalization operations can change the structure of the action. For example, consider the potential part: we can approximate the renormalized coupling fluctuations with a stationary Random Energy Model (REM universality, see Arous & Kuptsov 101 or Section VI of Franchini 2023 25 for a practical example in kernel language), δÂ^{α_1}_{i_1 j_1}(Ω) ≈ J_{i_1 j_1}(Ω) ∆_{i_1 j_1}, (127) and the partition function can be approximated accordingly, where in the second row we applied the PPP-REM 25 average and λ is the renormalized temperature. In essence, this type of approximation introduces a linear term in the renormalization map. More accurate renormalization schemes based on multi-scale analysis can be computed following the methods of Franchini 2023 25,26 and many other methods as well [56][57][58][59][60][61][62] , although in general the exact shape of the perturbations depends on the details of the system and on the instrumental limits and systematics; to push further it is therefore necessary to introduce more specific information about the couplings and the kinetic properties of the system, both of the neocortex and of the sensor.

Decimated kernel
Let L_3 be a cubic lattice and let xyz ∈ L_3 be such that z represents, for example, the average height from the surface of the cortex at which a given cortical layer is located, and xy the position of the "center of gravity" of the cortical minitube section in the horizontal plane. To model the minitube layers we define a partition of the space R^3 into volumes of equal size according to the lattice cells; for simplicity, we approximate the cortical minitubes with square-based minitubes (note that these approximations are topologically irrelevant). The layers of the minitubes are thus represented by the lattice cells. Now, calling v(i) ∈ R^3 the position of the nucleus of the i-th neuron, we can group the neurons by the volume in which they are located; each of these groups will have its own associated kernel. At this point one could further group the neurons, first by index z, so as to form the cortical minitubes. The vertices belonging to the minitube at position xy are the set of neurons that constitutes it, and the associated kernel describes the activity of the single cortical minitube in xy. Some interfaces, such as Neuropixels or deep multielectrode shanks, allow direct observations of this activity. The minitubes are finally grouped again to form the cortical structures and areas, and the original kernel can thus be expressed in terms of the minitubes, so that it represents a two-dimensional lattice of cortical minitubes [44][45][46] , that is, a system in 2+1+1 dimensions. From the above we can consider the experimental kernel for a specific tubular layer, where, again, xyz ∈ L_3 are the spatial coordinates in a cubic lattice such that z represents the average height from the surface of the cortex at which a given layer is located, and xy is the position of the minitube section in the horizontal plane. The probing points are organized on the tubular layer in a sub-lattice x′y′ ∈ L′_2 whose step is much greater than the diameter of the individual minitube,
so that the activities recorded at the various points belong with high probability to different and well-spaced minitubes. At this point, to model the spacing between the probing points, we apply a renormalization by decimation, and obtain the decimated activity kernel Ω̂ := {φ̂_{x′y′}^α ∈ {0, 1} : x′y′ ∈ L′_2, α ∈ S}: this is the electrode kernel of eq. (8) shown in the main text. This kernel is intended to model approximately the sensor recording, net of systematic errors and approximations. According to our arguments it should be comparable with a renormalized theory. Notice that this renormalization happens only in space, and hence the information coming from the digitization of the neuronal signals is largely preserved (as long as the signals inside a channel or multi-unit do not overlap too much in time). Ω̂ leads to the experimental hypermatrix of Figure 3. In the case of our example dataset from PMd, the stimulus markers would be the Go and movement-onset distributions reported in Figure 1 of the main text.
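The two renormalization maps used in this construction can be sketched on a raw kernel. An illustrative implementation (function names are ours): decimation acts on space only, as in the electrode kernel, while bin (Kadanoff) renormalization is shown on the time axis for comparison:

```python
import numpy as np

def decimate_space(kernel, step):
    """Wilsonian renormalization by decimation: keep one row (neuron or
    probing point) every `step`, leaving the time axis untouched."""
    return kernel[::step, :]

def kadanoff_time(kernel, bin_size):
    """Kadanoff (bin) renormalization of time: a coarse cell is active
    if any fine-grained cell inside it is active."""
    N, T = kernel.shape
    T1 = T // bin_size
    blocks = kernel[:, : T1 * bin_size].reshape(N, T1, bin_size)
    return blocks.max(axis=2)
```

Since `decimate_space` discards rows without touching columns, the temporal information of the surviving channels is fully preserved, consistent with the remark above.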

Figure 1 .
Figure 1. Case study: a) Cortical minitube sampling of the Utah array: each electrode pitch is ∼ 40 µm, so that the listening volume of each electrode can reasonably be assumed to be of the order of the diameter of a minitube (∼ 30-60 µm 88 ). In the case of PMd the Utah 96 samples activity from around the inner Baillarger band 49, 89 at around 1.5 mm penetration. b) Decimated lattice of the electrode kernel Ω̂ for the Utah 96. c) Behavioral task that required moving the arm toward a peripheral target (Go trials) that could appear in one of two directions of movement (D1 or D2). Monkeys had to reach and hold the peripheral target to get the reward. RT: reaction time; CT: central target; Go: Go signal appearance; M_on: movement onset. d) Experimental hypermatrix from the electrode kernel Ω̂ of eq. (8) for D1 and D2. Neural activity is aligned [−1, +1] s around M_on to include the distributions of the stimuli (the Go signal, orange distribution, and M_on, magenta). Here I of eq. (4) shows the time markers for the stimuli (see Figure 2 of SI). Purple traces are the observables computed in the first 250 ms. The neurons are sorted according to the activity in the first 250 ms of D1. Ticks are every 500 ms.

Figure 2 .
Figure 2. Hypermatrix: hypermatrix arrangement of Φ, Π and Ω. The hypermatrix provides all the necessary elements for analyzing the fundamental physical properties of the system of neurons. For a single realization of the process, e.g. single trials in a neuroscience experiment, the matrices can be obtained directly from the kernel. This does not apply in general to the averaged hypermatrix, where the correlations contain information about the fluctuations (see sections 6 and 7). See Figure 2 for the arrangement used in Figure 1 of the main text.

which results in a quadratic term added to the action:

∑_{i∈V} ∑_{j∈V} Â_{ij} ĉ_{ij} − (λ/2T) ∑_{i∈V} ∑_{j∈V} ∆_{ij} ĉ²_{ij} + ...

Figure 3 .
Figure 3. Experimental hypermatrix from the electrode kernel Ω̂ of eq. (139). The hat symbol (e.g., ω̂) denotes the experimentally measured observables. Î shows the time markers of the stimuli presented during the time window analyzed (see main text). In the case of our example dataset from PMd these would be the Go and movement-onset distributions reported in Figure 1 of the main text.