Thermodynamic Formalism in Neuronal Dynamics and Spike Train Statistics

The Thermodynamic Formalism provides a rigorous mathematical framework for studying quantitative and qualitative aspects of dynamical systems. At its core, there is a variational principle that corresponds, in its simplest form, to the Maximum Entropy principle. It is used as a statistical inference procedure to represent, by specific probability measures (Gibbs measures), the collective behaviour of complex systems. This framework has found applications in different domains of science. In particular, it has been fruitful and influential in neurosciences. In this article, we review how the Thermodynamic Formalism can be exploited in the field of theoretical neuroscience, as a conceptual and operational tool, in order to link the dynamics of interacting neurons and the statistics of action potentials from either experimental data or mathematical models. We comment on perspectives and open problems in theoretical neuroscience that could be addressed within this formalism.


Introduction
Initiated by Boltzmann [1,2], the goal of statistical physics was to establish a link between the microscopic mechanical description of interacting particles in a gas or a fluid, and the macroscopic description that is provided by thermodynamics [3,4].
Although this program is, even nowadays, far from being completed [1,5], the work of Boltzmann and his successors opened new avenues of research, not only in physics but also in mathematics. Especially the term "ergodic", which was coined by Boltzmann [1], inaugurated an important branch of mathematics that provides a rigorous link between the description of dynamical systems in terms of their trajectories and the description in terms of statistics of orbits and, more generally, between dynamical systems theory and probability theory. At the core of the ergodic theory, there is a set of "natural" dynamically invariant probability measures in the phase space, somewhat generalising the Liouville distribution for conservative systems with strong analogies with Gibbs distributions in statistical physics [6,7]. This strong connection, in particular, gave birth to the so-called Thermodynamic Formalism.
The introduction of Thermodynamic Formalism that occurred in the 1970s was primarily due to Yakov Sinai, David Ruelle, and Rufus Bowen [8][9][10]. The development of Thermodynamic Formalism initially served to derive rigorous criteria characterising the existence and uniqueness of the Gibbs In particular, if x = y then m = ∞ and d θ (x, x) = 0, and, by convention, if x 0 = y 0 then m = 0. When considering this distance, the metric space (A N , d θ ) is compact [10].
In order to consider time order, we introduce the time evolution in the form of a left-shift, σ : A N → A N , defined by (σx) i = x i+1 , for all i ∈ N and any x ∈ A N , where the index i refers to time. That is, the i-th symbol of σx (the image of x under σ), corresponds to the (i + 1)-th symbol of the original x, for all i.
Let us define continuous functions f : A N → R. We introduce the modulus of continuity of f : var k ( f ) := sup{| f (x) − f (y)| : x i = y i , for i = 0, . . . , k − 1.}, characterising the maximal variation of f on the set of infinite sequences agreeing on their first k symbols.
The function f is continuous if var k ( f ) → 0 when k → ∞, and the continuity is exponential if var k ( f ) → 0 exponentially fast. A function f : A N → R has range R if f (x) ≡ f (x 0 , . . . , x R−1 ), i.e., f only depends on the first R coordinates x 0 , . . . , x R−1 . Functions with finite range are obviously continuous. The notion of continuity, and, especially, exponential continuity is essential when studying the Thermodynamic Formalism of functions with infinite range. A useful concept for this formalism is that of the Markov partitions. Let Ω be a general (continuous) state space and a general dynamics (a flow or a discrete time mapping) G on Ω. Consider a finite covering of Ω, made by sets called rectangles, R = {R 1 , . . . , R l } with the property that, for every pair i = j, int(R i ) ∩ int(R j ) = ∅, and such that the closure of the image set G(R i ) equals a union of closures of sets in R. There is a specific construction for different dynamical systems (for details we refer the reader to [10]). This construction allows one to make a natural coding for the state space, where the trajectories of the dynamical systems correspond to the sequences of symbols of the labels identifying the sets R i . Rectangles allow for us to partition a continuous phase space into a discrete finite partition. For hyperbolic dynamical systems, each point in the phase space admits local stable or unstable manifolds with a non zero diameter. The edges of rectangles in this case are made of these local stable and unstable manifolds and the corresponding partition is called a "Markov partition", because the image of a rectangle under the dynamics can be represented by a Markov transition matrix [79].

Gibbs Measures
We now define the measurable sets on A N . Let us denote the sequence x 0 . . . , x n−1 by x 0,n−1 . The set [x 0,n−1 ] := {y ∈ A N : y 0 = x 0 , . . . , y n−1 = x n−1 } is called a cylinder. Cylinders define a Borel sigma algebra B on A N (see, for instance, Chapter 1, in [80]). We are interested in probability distributions on (A N , B) also referred as (macro) states in physics. In the sequel we will skip the prefix "macro" and deal with "states" as probability measures on (A N , B). There is a special class of states ν, such that, for any measurable set B ⊂ A N , ν(σ −1 (B)) = ν(B), where σ −1 (B) stands for the set of all the pre-images of σ of elements of B. These distributions are the set of shift-invariant probability measures denoted by M σ (A N ). In general, it is possible to consider systems where there exists some forbidden transitions between symbols. In that case, we need to consider a subset of A invariant under the dynamics, i.e., X ⊂ A N , such that σ(X) = X.
The analogy with the spin systems in statistical mechanics at the root of the terminology "Thermodynamic Formalism", goes, as follows. Let φ : A N → R be a continuous function, also called "energy" or "potential". Two important examples are finite range potentials, where φ(x) ≡ φ(x 0,R−1 ), or exponentially continuous potentials with infinite range. Subsequently, the "energy" of a configuration of n sites based on the sequence x ∈ X, is given by the "Birkhoff sums": We define the measures that assign probability to the cylinder sets based on the potential function, with the so called "Gibbs property". There exist constants C > 1 and F(φ), such that, for all x ∈ X and for all n ≥ 1: The measures satisfying the condition (1) are called "Gibbs measures". The quantity: for all y ∈ [x 0,n−1 ] [10], is called the "pressure" (or free energy) of the potential φ. Observe that it does not depend on the measure µ φ itself, only on the potential. Thus, two Gibbs measures that are associated with the same potential have the same pressure. Given a continuous potential φ, such that var k (φ) ≤ b θ k for some constants 0 < θ < 1 and, b > 0, there is a unique shift-invariant probability measure satisfying the Gibbs property (1), [10]. Furthermore, under the same assumption for the potential, the associated Gibbs measure is mixing i.e., lim n→∞ µ (A ∩ σ −n B) = µ(A)µ(B), for any measurable set A, B and thus ergodic (A measure µ is said to be ergodic for the dynamics G if for any measurable G-invariant set A, (G −1 (A) = A), its measure is either µ(A) = 0 or µ(A) = 1. See, for instance, Chapter 3 of [79] for a detailed introduction.) (see [10,80] for definitions and details). Moreover, two continuous potentials φ and ψ are called cohomologous with respect to the shift σ (and denoted by φ ∼ ψ), if there is a continuous function u and a constant K, such that: Cohomologous potentials have the same Gibbs measure, µ φ = µ ψ .

Entropy and the Variational Principle
The Shannon entropy of a probability distribution quantifies the average level of "uncertainty" in its possible outcomes.
More generally, let ν be a shift-invariant probability measure on X, we introduce the block entropy: For finite alphabets and because ν is shift-invariant, the following limit exists, h(ν) := lim n→∞ 1 n H n (ν), and is called the Kolmogorov-Sinai entropy or simply entropy of ν [76,81]. The key is that the sequence H n is sub-additive and the possible values lie in the compact set [0, log |A|] (see [76]).
Another important quantity is the Kullback-Leibler (KL) divergence, which quantifies the difference between probability distributions. Consider two probability distributions p and q on the same probability space X, the KL divergence is: This divergence is finite whenever p is absolutely continuous with respect to q, and it is only zero if p = q.
Following the analogy with systems in statistical mechanics, an equilibrium state is defined as the measure that satisfies the so-called variational principle, namely: where φdν represents the expected value of φ with respect to the measure ν (it would be noted E ν (φ) in a probabilistic context). Let us comment on this result. The first equality establishes that, among all shift-invariant probability measures, there is a unique one, µ φ , which maximises h(ν) + φdν. If φdν is fixed to some value, this corresponds to maximising the entropy under constraints. This is the Maximum Entropy Principle, but (4) is more general. The second equality states that the maximum, h(µ φ ) + φdµ φ is exactly the pressure defined in Equation (2). The equation F(φ) = h(µ φ ) + φdµ φ establishes a link with thermodynamics, as F(φ) is equivalent to the free energy (In contrast to thermodynamics where free energy or pressure refer to different thermodynamic ensembles, we will not make such distinction here as it is irrelevant. Also, note that we do not keep the minus sign and take the Boltzmann constant equal to one. Thus, (4) coincides with the principle of free energy minimisation in statistical mechanics, but it corresponds to a maximising measure in our formalism because of a sign convention. Gibbs measures associated to potentials satisfying that var k (φ) ≤ bθ k , as stated above, are equilibrium states. For more general potentials, the notion of Gibbs states and equilibrium states may not coincide.

Observables and Fluctuations of Their Time Averages
An observable f is a function f : X → R, such that | f | < ∞ (| · | denote the absolute value), and which is time-translation invariant, i.e., for any time An important observable, considered in the following sections, is the potential: that is a linear combination of K observables f k , where the parameters λ k 's ∈ R. Here, we want to make a few important remarks. First, this decomposition is similar to the definition of potentials in thermodynamics, which are written in terms of a linear combination of intensive variables (e.g., temperature, chemical potential, and so on) and extensive functions of the configurations of the system (physical energy, number of particles). In order to emphasise this analogy, we use the symbol U for the potential, instead of φ. The function U λ parametrically depends on the λ k 's; hence, we use the lower index λ, to denote U λ as well as the corresponding Gibbs measure, denoted here µ λ . Now, we denote A n ( f ) the empirical average of the observable f in a sample of size n, that is, This quantity is important for empirical purposes. If µ is ergodic, the convergence of the empirical average to the actual expected value is µ-almost-sure, that is A n ( f ) a.s.
−→ f dµ as n → ∞. For an example of application of this theorem in the context of spike train statistics, see Section 2.6.

Correlations
Let us consider a pair of observables f , g ∈ L 2 (µ), (square integrable functions with respect to µ). We define their correlation at time n by One also might be interested in the auto-correlation (or auto-covariance) of f at time n, Observe that these quantities might decay fast. The mixing property implies that the correlations go to zero with n → +∞.

Properties of the Pressure
From the pressure (2), important statistical information regarding the system can be obtained, in particular, correlations. A distinguished case corresponds to the potentials of the form (5). When the corresponding pressure is differentiable to any order, taking the successive derivatives of the pressure with respect to their conjugate parameters gives the average values of the observables, their correlations, and their high-order cumulants with respect to the Gibbs measure. That is, in general: where κ n is the cumulant of order n with respect to µ λ . In particular, κ 1 is the mean of f k , κ 2 is its variance, κ 3 the skewness, and κ 4 the kurtosis. Partial derivatives with respect to pairs of parameters can also be considered [82]: Observe, we differentiate C f k , f j from C f j , f k as the dynamics may be irreversible in time. For an example of application of these formulas in the context of spike train statistics, see Section 2.6. Remark 1. The last two equations are fundamental. They establish a link between the variations in the average of the observable f k , when slightly varying the parameter λ j and the sum of time correlations between the pair of observables f j , f k . This result is known, in statistical physics as the fluctuation-dissipation theorem [83,84]. It relates, for example, in a ferromagnetic model, to the variation of the magnetisation of the spin k to the variations of the local magnetic field h k , via the magnetic susceptibility, which is the second derivative of the free energy. This is also the context of the linear response theory, which quantifies how a small perturbation of a parameter affect the average values of observables in terms of the unperturbed measure. Equation (7) can also be used to extend results of Thermodynamic Formalism to non-stationary situations, as we discuss in Section 4.5. Now, in the classical formulation in statistical physics and the Maximum Entropy models, only the correlation C f k , f j (0) appears in (7), because successive times are independent (correlations C f k , f j (n), C f j , f k (n), n > 0 vanish). When handling memory (thus, potentials with range R > 1), there is an infinite sum (series) of correlations appearing in the linear response. This infinite sum converges whenever correlations are decaying sufficiently fast with time n (exponentially). In contrast, when they do not converge fast enough (e.g., power-law with a small exponent), the series diverges, leading to a divergence of the second derivative of the pressure, corresponding to a second-order phase transition. Here, follow the Ehrenfest classification of phase transitions [85]. There is a phase transition of order k if the pressure (free energy) is C k−1 but not C k . The known examples are first-order, second-order, or infinite order (Kosterlitz-Thouless) phase transitions. Note that phase transitions in memory-less models can also happen if the instantaneous correlation function C f k , f j (0) diverges, e.g., when the number of degrees of freedom in the system (number of spins, neurons) tends to infinity. Therefore, in the present setting, second-order phase transitions can arise either when the number of degrees of freedom tends to infinity, or (not exclusive), when time correlations decay slowly.
Thus, Equation (7) connects the second derivative of the pressure, variations in static average of observables, and dynamical correlations. In the next sections, we discuss theorems that relate the dynamical evolution to a criteria ensuring that time-correlations are exponentially decaying, preventing the possibility of second order phase transitions for systems with a finite number of degrees of freedom.

Ruelle-Perron-Frobenius Operator
Let C(X) be the set of continuous functions on X. Consider the potential φ and a continuous function f ∈ C(X), so one can define a bounded linear operator associated to φ (transfer operator), called the Ruelle-Perron-Frobenius (RPF), as follows (There is a close analogy between this operator and the propagator in quantum field theory or the Koopman operator in classical dynamics [5]. All of these operators characterise how measures or observables evolve ruled by the dynamics.): The spectral properties of (8) yields information to characterise the pressure and study the ergodic properties of the system, in particular, the rate of decay of their correlation functions [80]. For instance, if 1 is a simple eigenvalue and the modulus of each of the other eigenvalues is smaller than one, this is equivalent to be mixing [80]. When the potential considered is of finite range, then the transfer operator corresponds to a matrix and the whole formalism is equivalent to Markov chains defined on finite alphabets. A potential φ is called normalised if L φ (1) = 1. The log of a normalised potential of range R + 1, corresponds to the transition probabilities of a Markov chain with memory depth R. Moreover, in this case F(φ) = 0. For Lipschitz observables in the finite dimensional case, the Perron-Frobenius theorem assures a unique eigenvector associated to the maximal eigenvalue, from which the unique invariant measure (Markov) is obtained. This measure has mixing properties, exponential decay of correlations, central limit theorem, and a large deviations principle (see Section 2.3.4). When the operator acts on an infinite dimensional space (such as the space of continuous functions), then the spectrum of a bounded linear operator L is given by the set spec(L) = {λ ∈ C : such that (λI − L) has no bounded inverse}, this set may contain points that are not necessarily eigenvalues (see, for instance, [86]). In this case, the strategy is to find a proper subspace where the spectrum of L has a finite number of such complex numbers whose norm is the spectral radius, say ρ, and the rest of the spectrum has norm strictly less than ρ (spectral gap). In this scenario, it is known that there is exponential decay of correlations for a sufficiently regular class of observables (such as Lipschitz), and the central limit theorem holds. In the absence of the spectral gap, then one has sub-exponential decay of correlations, which breaks down the central limit theorem, and the phase transition phenomenon appears (for further details and precise definitions, see [80] and the references therein).
Note that, given a potential ψ, one can explicitly find a normalised potential φ cohomologous to ψ, as follows, where R is the right eigenvector (real and positive) associated to the unique maximum eigenvalue ρ that is associated to L ψ .

Remark 2.
Note that the normalisation of the potential ψ does not require a partition function. In fact, as discussed below, the classical normalisation by a partition function is a particular case of (9), holding for memory-less potentials that does not generalise to range R > 1 potentials.

Time Averages and Central Limit Theorem
We have seen that if the measure µ is ergodic the time averages A n ( f ) converge µ-almost surely to the expected value f dµ. Now, we can ask about fluctuations around the expected value. The observable f satisfies the central limit theorem (CLT), with respect to (σ, µ) if: where N 0,σ 2 f is the Gaussian distribution with zero mean and covariance σ 2 f , which is given by the following expression involving temporal correlations: that is a particular case of (7). We illustrate an application of this theorem in the context of spike train statistics later in Section 2.6. Strong properties of convergence and exponential decay of correlations are ensured for Hölder continuous potentials in finite dimension. These properties are associated with the spectral gap property and they do not (necessarily) hold for less regular potentials or in non-compact spaces [78,80].

Large Deviations
The central limit theorem describes small fluctuations in the limit when n goes to infinity. Rare events that are exponentially small are the object of study of the large deviations theory.
An empirical average A n ( f ) satisfies a large deviation principle (LDP) with rate function I f , if the following limit exists: for s ∈ R. The above condition for large n implies that P ({A n ( f ) > s}) ≈ e −nI f (s) . In particular, if s > f dµ, then P ({A n ( f ) > s}) should tend to zero as n increases. The rate function tells us precisely how fast this probability goes to zero. Computing the rate function from Equation (11), may be a laborious task. The Gärtner-Ellis theorem provides a way to compute I f more easily [63]. To this end, let us introduce the scaled cumulant generating function (SCGF) associated with the observable f , by whenever the limit exists. The name comes from the fact that the n-th cumulant of f can be obtained by successive differentiation operations over Γ f (k) with respect to k, and then evaluating the result at k = 0. If Γ f is differentiable, then the Gärtner-Ellis theorem ensures that the average A n ( f ) satisfies a LDP with rate function given by the Legendre transform of Γ f , that is Therefore, one can study the large deviations of empirical averages A n ( f ) by first computing their SCGF, characterise its differentiability, and then find the Legendre transform. We compute this function in the context of spike train statistics later in Section 2.6.
If Γ f (k) is differentiable then I f (s) is convex [87], thus has a unique global minimum s * such that I f (s * ) = 0, then I f (s * ) = 0. Assume that I f (s) admits a Taylor expansion around s * , then for s close to s * , Because I f (s * ) = 0 and I f (s * ) = 0, for large values of n we obtain from (11) Therefore, the small deviations of A t ( f ) around s * are Gaussian with variance 1/nI f (s * ). In this way, the LDP can be regarded as an extension of the CLT.
The large deviation principle plays an important role in statistical mechanics, in particular in spin glass dynamics [59]. A large deviation principle can be used in order to relate entropy and free energy (here pressure) through a Legendre transform and to explain why variational principles arise in statistical mechanics [63,88]. As mentioned in the introduction, large deviations is the common theoretical principle linking dynamic mean field theory, maximum entropy principle, maximum likelihood, and Thermodynamic Formalism, although this link has not been studied in detail, to our best knowledge.

Potentials of Range One
A specific case where the variational principle (4) holds, is when the potential has the form (5). Then, equilibrium states are probability distributions µ λ , that maximise the entropy (3), under the constraints of expected values of K observables E µ λ ( f k ) := ∑ x f k (x)µ λ (x) = C k for k = 1, . . . , K. This problem can be solved by introducing a Lagrange multipliers λ k in the potential (5): There exists a unique maximum entropy distribution µ λ (equilibrium state) satisfying the constraints. It turns out that the maximising distribution can be explicitly found for range one potentials and the distribution satisfies the Gibbs property (1), which, in this particular case, reduces to, for all x ∈ X. Equation (13) is obtained by considering F(U λ ) = log Z, where Z is the "partition function". From Equation (12), the expression for the entropy (3) and the Jensen inequality, one can obtain the formula for the pressure in this case: The constrained problem can be uniquely solved, because the map λ → E µ λ (U) maps the real line monotonically onto the interval (min U, max U) [76].
For range one potentials, the measure of a block becomes a product distribution, as given by: As the index n corresponds to time, having an interaction depending on one single coordinate implies that configurations at distinct times are independent.

Finite Range Potentials
Equation (9) can be used to find the unique Markov measure that is associated with a finite range potential. As an example, consider a potential U of range two representing the pairwise interactions in a graph with incidence matrix I. The entries I(y, x) = 1 represent the allowed transitions between symbols y → x and I(y, x) = 0 the forbidden. We introduce the finite | A | × | A | transfer matrix L U , which corresponds to the RPF "operator" (8) restricted to a finite space.
L U (y, x) = I(y, x)e U(y) , y, x ∈ A, y ∈ σ −1 x As anticipated in Section 2.3.2, calling ρ the unique maximal positive eigenvalue of L U guaranteed by the Perron-Frobenius theorem, and R(x) and L(x) the x-th entry in the right and left eigenvectors associated with ρ, respectively, we define a normalized potential φ(y, x) = U(y) + log R(y) − log R(x) − log ρ, such that the matrix is stochastic, i.e., ∑ x P(y, x) = 1, and represents the transition probabilities of a Markov chain P(y → x) = P(x | y). The invariant measure p associated to the matrix P satisfying pP = p is where R, L = ∑ x R(x)L(x). Note that normalisation is done without defining a partition function. The Markov measure µ(p, P) of a block is given by µ [x 0,n ] = p(x 0 )P (x 1 , x 2 ) · · · P (x n−1 , x n ) for x k ∈ A, k = 0, .., n. Here, we have a nice way to show that this measure satisfies the Gibbs property while using the Markov property µ [x 1,n | x 0 ] = e ∑ n k=1 log P(x k−1 ,x k ) where we see that the conditioning upon the first time is similar to left boundary conditions in statistical physics.
It follows from (9) and (17) that µ [x 0,n ] obeys the variational principle and satisfies Equation (1), where F(U) = log ρ. The Gibbs measure µ [x 0,n ] gives an exponential weight to each cylinder set, depending on the "energy" depending on smaller blocks.

Example
To illustrate the maximum entropy principle and the statistical analysis that can be performed while using tools and ideas from Thermodynamic Formalism, we include here a toy example. Consider the state space of all the binary blocks of size 2 × 2 and one step transitions between them. We associate to each block en integer (23), and index a matrix using this representation of blocks we built the RPF matrix (15). There are allowed and forbidden transitions as explained in Section 3.3.1 (see Figure 3). Assume that we obtain from data (T samples) the empirical average value of the observable A T (x 1 0 · x 2 1 ) = 0.1 and A T (x 1 1 · x 2 0 ) = 0.4 and we want to find the maximum entropy Markov chain compatible with these constraints. Using Equations (6), (15) and (16), and , we obtain the maximum entropy Markov chain, defined by the following Markov transition matrix: From this Markov transition matrix, we can compute the fluctuations that are associated to each observable either using numerical simulations or analytically. We illustrate in Figure 1, the limit theorems and fluctuations introduced in Section 2 applied to this example.
The entropy maximisation for this toy example can be explicitly solved, and the simulations can also be performed directly from the transition matrix. However, large scale networks require sophisticated Montecarlo sampling methods to fit maximum entropy models that include non-synchronous interactions [89]. In the first column of Figure 1, we directly sample from the Markov transition matrix for different sample sizes and average the empirical frequency of both observables considered in the toy example. In the second column we plot the fitted Gaussian distributions of the empirical averages for different sample sizes. The third row correspond to the large deviations rate function. As explained in Section 2.3.4, the second derivative at the minimum of I f characterise the Gaussian fluctuations around the expected value of f . The last column represents the auto-correlations (7).
The same analysis is done in the bottom row for the observable x 1 1 · x 2 0 . The first column represent the sample average for different sample sizes. We observe the convergence towards the theoretical value as predicted by the law of large numbers. The second column represent the fitted Gaussian's to the histograms of the averages that were obtained for different sample sizes in the legend (10). The third column represent the large deviations rate function for both observables. In the abscissa it is the parameter s in (11) and in the ordinate

Systems with Infinite Range Potentials, Chains with Infinite Memory and Gibbs Distributions
In this section, we somewhat depart from the strict setting of Thermodynamic Formalism, switching to the perspective of Markov chains and their extension to infinite memory. Although Thermodynamic Formalism allows for one to consider infinite memory (infinite range potentials) the advantage of the approach presented here is to allow considering non stationary dynamics, i.e., escape from the variational principle (4) constrained by the entropy definition, which requires stationarity.
A general class of stochastic processes to deal with infinite memory are called Chains with complete connections [90,91]. These chains define non-markovian processes. However, Markovian approximations are possible and useful [92]. This section follows closely from [90]. 1], such that the following conditions hold for every n ∈ Z:

Definition 1. A system of transition probabilities is a family
(a) For every x n ∈ A, the function P n (x n | ·) is measurable with respect to the filtration F ≤n−1 .
for all n ∈ Z and all F ≤n -measurable functions h. The probability measure µ, when it exists, is called a chain with complete connections consistent with the system of transition probabilities {P n } n∈Z . It is possible that multiple measures are consistent with the same system of transition probabilities.
We now give conditions ensuring the existence and uniqueness of a probability measure consistent with the system of transition probabilities [90]. Theorem 1. A system of continuous transition probabilities (var m [P n (x n | ·)] → 0 as m → +∞) on a compact space has at least one probability measure consistent with it. Definition 3. A system of transition probabilities is non-null, if, for all n ∈ Z and all x n −∞ ∈ A −∞,n :

Definition 4.
A normalized potential has bounded squared variations if, for all n ∈ Z and all x −∞,n ∈ A −∞,n : There exists a unique probability measure consistent with the system of transition probabilities if these are non-null and the associated normalised potential has bounded squared variations [90].
There is a mathematically well-founded correspondence between chains with complete connections and Gibbs distributions presented up to now [7,90,93]. Let us now discuss the formal analogy. Define and: Then: and: These last equations highlight the connection between chains with complete connections and Gibbs distributions in statistical physics. Indeed, the conditional probability P [ x m,n | x −∞,m−1 ] has a "Gibbs" form where φ acts as an "energy" [90]. The correspondence is obtained considering "time" as a one-dimensional "lattice" and the "boundary conditions" as the past of the stochastic process. In contrast to statistical physics, there is no need to define a partition function (the potential is defined via transition probabilities, and is thus normalised).
While, for chains with complete connections defined through transition probabilities, the present is conditioned upon the past, Gibbs distributions, in general, also allow for conditioning "upon the future". More generally, Gibbs distributions in statistical physics extend to probability distributions on Z d where the probability to observe a certain configuration of spins in a restricted region of space is constrained by the configuration at the boundaries of this region. Therefore, they are defined in terms of specifications [7,93], which determine finite-volume conditional probabilities when the exterior of the volume is known. In one spatial dimension (d = 1), identifying Z with a time axis, this corresponds to conditioning both in the past and in the future. In contrast, families of transition probabilities with an exponential continuity rate define the so-called left-interval specifications (LIS) [90,94]. This leads to nonequivalent notions of "Gibbsianness" [95].
In contrast to the potentials studied up to now, the potential (19) is defined from transition probabilities (18), which are not necessarily time-translation invariant. This is the reason why the potential is noted φ(n, x), as it depends explicitly on time n and the configuration x. This case is closer to the setting where potential or energy is not necessarily invariant when moving along a lattice in statistical physics, therefore not constrained by the stationarity assumption made up to now. As we discuss in the next section, this is quite helpful in the study of neuronal network dynamics.

Thermodynamic Formalism in Neuroscience
In this section, we make the connection between Thermodynamic Formalism and spiking neuronal dynamics. From the standpoint of mathematics, there are at least two ways to consider spiking neuronal networks. First, they can be considered as biological objects whose activity can be experimentally recorded while using Multi-Electrode Arrays (MEA), often requiring sophisticated mathematical methods and algorithms for data analysis [96,97]. Second, neuronal networks are characterised by dynamical models, more or less derived from biophysics [35,36].
Here, we begin considering the statistical analysis of spike trains recorded from neuronal networks. For this case, Thermodynamic Formalism provides a powerful and insightful method to analyse the spatio-temporal statistics from experimental spike trains. We briefly mention that this formalism has afforded us to develop algorithm for spike train analysis [49,89,98] leading to the software PRANAS [99] freely available at https://team.inria.fr/biovision/pranas-software/, although we do not develop along these lines in this paper. We focus then on a specific question. When dealing with a model of spiking neurons, how much of the intrinsic dynamics of neurons, their interaction via synapses, and the influence of stimuli, constrain the collective spatio-temporal spike statistics?
Neurons are (nonlinear) entities that evolve in a concerted way (as they interact via synapses) and responding to external stimuli. The theoretical analysis of this high dimensional systems can be made thanks to mathematical methods (dynamical systems, bifurcations theory, stochastic processes, partial differential equations) or theoretical physics (statistical physics, nonlinear physics). Here, one might be interested in what Thermodynamic Formalism can contribute when considering neuronal models dynamics. In this spirit, we consider two models, the Integrate and Fire model and the Galves Löcherbach model. Most of our presentation focuses on stationary situations that are characterised by equilibrium states. Nevertheless, we consider the extension of Thermodynamic Formalism to non-stationary situations.
At the end of the section, we address a couple of open questions.

1.
What is the natural alphabet for spiking neuron dynamics? As we shall see, although the binary representation of spikes is a good candidate, it is too naive, as the relevant alphabet can be constructed on time blocks of spikes. A subsidiary question is about the size (time depth) of these blocks.

2.
Under which conditions can Thermodynamic Formalism machinery be faithfully applied to a spiking neuronal network model? 3.
What are the limits when the main theorems of Thermodynamic Formalism can and cannot be applied and what are the consequences for neuronal dynamics and spike statistics?

Statistics of Spike Trains and Gibbs Distribution
The human brain is composed of about a hundred billion neurons that mostly communicate among themselves together using sequences of spikes that are binary events (Sub-threshold oscillations also play an important role [100] and in organs like the retina, where most neurons do not spike.) Although the action potentials can vary in duration, amplitude, and shape, depending, e.g., on the type of neuron, they have a stereotyped shape, so that they can be considered as identical events. The main physiological reason for spike occurrence is that they can propagate information on different scales in the nervous system (centimeters to one meter) essentially without attenuation (active conduction as opposed to passive, Ohmic, conduction). However, from a contemporary point of view, spikes are also considered as events constituting "bits" of information. In this paradigm, it is tempting to consider spike trains as objects containing a "neural code" [37], i.e., a language that neurons use to communicate and that one could decipher. This terminology should not be considered literally, because, as opposed to computer codes, spike trains have a wide variability (e.g., the repetition of the same stimulus, even under controlled experimental conditions does not induce the same sequence of spikes as a response). In addition, nothing guarantees that there is only one code. When considered from the perspective of Thermodynamic Formalism, the notion of neural codes can have several meanings. (1) Spike trains constitute a symbolic coding of voltage dynamics (which depends on neuronal interactions and stimuli); and, (2) the way how neuronal dynamical systems (especially, spike trains) are affected by stimuli, provides a way for downstream networks to infer the stimulus (e.g., the retina encodes a visual scene in spike trains which are decoded by the visual cortex, capable of reconstructing a representation of the visual scene). Here, we essentially want to address the following questions: how to use Thermodynamic Formalism to fit experimental (or numerically generated) spike train data and which Gibbs distribution is produced by a network of neurons whose dynamics is known.
It is useful to consider spikes as instantaneous events (while the duration is about 1ms) and identify the maximum in the action potential course as the "time of the spike" [101]. This implicitly assumes that one considers dynamics on time scales larger than one millisecond. The binary representation is obtained using a window of a constant "binning size" (of order 10-20 ms) over the continuous time course of membrane potentials and count how many spikes there are per neuron within each time bin. Two or more spikes may occur within the same time bin, in that case, the convention is to consider these events equivalent to just one spike. This procedure [102] transforms experimental data into sequences of binary patterns (see Figure 2) leading to the following symbolic description.
Denoting the discrete time index by n, the spike-state of neuron k is denoted by x k n ∈ {0, 1}, depending on whether the k-th neuron emits a spike during the n-th time bin or not. A spike pattern is the spike-state of all the neurons in a network of N neurons at a given time bin, and it is denoted by x n := x k n N k=1 . A spike block denoted by x n,r := x n x n+1 · · · x r is a sequence of spike patterns. The length of the spike block x n,r is r − n + 1. A spike train denoted by x is the spike block representing the whole sequence of spike patterns. We consider spike trains of finite and infinite length. The set of all possible spike blocks of length R in a network of N neurons is denoted by A N R . Thus, in comparison to the previous section, and especially Section 2.1, symbols here are spike blocks of length R. The alphabet, previously denoted A, is denoted here A N R , making explicit the dependence on the number of neurons, N, and the block depth R. It is important to make this dependence explicit, as we consider, later in this review, the effect of increasing N and R. In this illustration, the retina of a mammalian is put into the MEA and submitted to natural light stimuli. The membrane potential of retinal ganglion cells is recorded and analysed to extract the spikes using spike sorting algorithms [96,97]. (B) Mathematical models of biophysically inspired spiking networks can be used to study spike trains. Top. Neurons, considered here as points in a lattice, interact via synaptic connections on an oriented weighted graph where the matrix of weights is denoted W. Bottom. A prominent mathematical class of models is the Integrate and Fire model where the membrane potential is modelled by a stochastic differential equation (black trajectory) with threshold condition θ. The neuron is considered to spike whenever the membrane potential reaches the threshold. Then, it is reset to some constant value. Binning time using windows of a few ms length, one can associate the continuous-time trajectory of the membrane potential with a discrete-time sequence of 0s and 1s characterising the spike state of the neuron in each time window. (C) Spike trains. Using the binary representation at the bottom of (B) for each neuron in a network one obtains sequences of binary spike patterns (spike trains) symbolically representing the underlying neuronal dynamics.

Conditional Probabilities for Spike Trains
The probability that a biological neuron, embedded in a network, emits a spike in a given time bin depends on the history of all variables determining the evolution of the neural network (voltage, conductances, concentrations of ions, neurotransmitters, etc.). Most of these variables are not experimentally accessible. Even if they were, there would be no hope of predicting, from this huge amount of information, the statistics of spikes. Dealing with neuronal models, the situation is simpler as there are fewer variables to control and their dynamics are known explicitly. However, even in this case, it is generally not possible to access spike statistics from dynamics. A simplification is to consider that the probability of a spike pattern only depends on the spikes emitted in the past by the network. This way, one can ignore the hidden dynamics of inaccessible variables and compute the statistics from what can be measured. Still, characterising the probability of a spike pattern given the history of the system, is generally out of reach (with, at least, two exceptions described in the next section).
The idea is to characterise the spike train statistics through a family of transition probabilities of the form: where R is the memory of the spike sequence, i.e., the time horizon on which the present depends on the past. Having these transition probabilities and an initial condition (or initial distribution), one can define a Markov chain (or a chain with infinite memory if R → +∞). It is possible, for some models, to explicitly write these probabilities. In Equation (20), we have assumed that all transition probabilities are strictly positive. This is necessary to ensure the uniqueness of the corresponding Gibbs distribution. Subsequently, one can associate to (20) a range R potential φ ( x n−R,n ) > −∞. On experimental grounds, the problem is to estimate these probabilities from data. Since there are 2 NR possible spike blocks, for N and/or R big (e.g., N R > 20) it becomes rapidly impossible to estimate these transition probabilities from experimental data while using a frequentist approach, as most of these transitions do not even occur within the finite experimental sample. However, one can try to guess the form of these transition probabilities. One possibility is to start with an ad-hoc form, capturing the main features of neuronal dynamics. A canonical example is the Generalised Linear Model (GLM) [103], where the transition probabilities take the form: where f is a non-linear function. The term b i is a constant fixing the baseline firing rate of neuron i. H ij is the memory kernel, * is the convolution product, and r j (n) is the spike train of neuron j before time n.
In this case, the memory kernel considers the spikes between n − R and n, but R can go arbitrarily to the past. Here we do not consider the influence of a time-dependent external stimulus. As we show in Section 4.3 this form can be established for discrete-time Integrate and Fire models. Equation (21) gives an Ansatz for the marginal probability that neuron i spikes at time n + 1, given the history of the network. Equation (20), provides the joint probability of having the spike pattern at time n + 1, and is obtained assuming that neurons are conditionally independent. This can be justified if one assumes that, due to synaptic transmission and delays, neurons do not have the time to interact within one time bin. This means that time bins must not be too large and/or synapses must not be too fast (like gap junctions [104]).

Remark 3.
The GLM, instead of describing conditional probabilities, characterises the spike rate or conditional intensity of an auto-regressive Poisson process.
A second approach is based on the variational principle (14), maximising entropy under constraints. Both of the approaches can be addressed from the perspective of Thermodynamic Formalism.

The Hammersley-Clifford Theorem
In our representation spikes take a binary value 0 or 1. Thus, any potential of range R is a function taking a finite set of values. A general theorem from Hammersley and Clifford [105,106] states that any range-R observable, in particular, the potential φ ( x n−R,n ), can be written in the form where the coefficients φ l correspond to the decomposition of φ in the space of finite range R-observables. This is analogous to Equation (5), with two main differences. First, the linear combination in (5) is used as an example making a link with Thermodynamics and the Maximum Entropy principle. Here, the decomposition (22) is a systematic expansion of any potential of range R defined over spike sequences. Second, in contrast to (5), the observables, denoted f k in (5) only consider finitely many values. Equation (22) is a linear decomposition on a basis of eigenfunctions referred to from now on as monomials [107]. They have the form: where n k = 1, . . . , N is a neuron index, and i k = n − R, . . . , There are 2 NR monomials for N neurons and a given range R. One can index them by an integer l in one-to-one correspondence with the set of pairs (i k , n k ). The advantage of the monomial representation is that it focuses on spike events, which is natural for spiking neuronal dynamics. Thus, the Hammersley Clifford decomposition gives a canonical way to write any range R potentials as a linear combination of monomials of maximum degree R. This includes the GLM potential which can be embedded in the same framework [104].
As emphasised above, the Hammersley Clifford decomposition is analogous to the expression of thermodynamic potentials as a sum of products of an intensive quantity (e.g., temperature) with an extensive one (e.g., the energy). Depending on the physical constraints of the problem, one defines a thermodynamic ensemble, where the average value of extensive quantities (energy, number of particles, volume, magnetisation) is prescribed. Whereas, the first principles allow for guessing the form of the potential in thermodynamics, there is no such recipe in neuroscience. Moreover, one cannot use the complete expansion (22) on practical grounds, simply because large degree monomials have a vanishing empirical probability. More precisely, the average value of a monomial of degree d decays exponentially fast with d. This leads to two problems. (i) How to determine (from data) the constraints which are necessary to correctly characterise the spike train statistics? (ii) Are there constraints that are equivalent? (i) can be addressed in the context of information geometry [108] while (ii) can be approached using cohomology [107]. We do not further develop these aspects here, but rather refer the reader to the cited articles.
The variational principle (4) (or its finite version (12)) provides a systematic way of inferring Gibbs distributions from empirical average values of spike interactions (monomials). We make the construction explicit in the next subsections.

Finite Memory, Markov Chains and Gibbs Distributions
We now explicitly show how to build a Gibbs measure from a finite set of experimental averages as constraints of the maximum entropy variational problem. We assume that these constraints involve events (monomials) over a memory depth R. We build the corresponding Markov chain while using the material of Section 2.3.2. We associate to each spike block x n,n+R−1 an integer w n , we write w n ∼ x n,n+R−1 . In this way a sequence of spike patterns (spike block) can be encoded as sequences of integers, that define the alphabet. Next, we define the incidence matrix ("grammar") between symbols of the alphabet. Not all transitions between symbols are legal or allowed. A transition between the two symbols denoted by w n → w n+1 or w n , w n+1 is legal if the corresponding blocks overlap according to this pattern, i.e., they have the block x n+1 . . . x n+R−1 in common (see Figure 3). This defines an incidence matrix I(w , w) = 1 if the transition between symbols w and w is legal and 0 otherwise. This incidence matrix defines the grammar of allowed and forbidden words or sequences of symbols. From this incidence matrix, we define the Perron-Frobenius transfer matrix L ψ in the same way, as in (15). In order to obtain the unique Markov transition matrix of maximum entropy, we follow the procedure that is explained in Section 2.3.2.
Thus, for a given choice of monomials, we associate a potential of the form (22), where the λ l s that do not correspond to a chosen monomial are set to 0. Subsequently, one computes the empirical average of the chosen monomials from data. From the Perron-Frobenius theorem, there is a unique Gibbs measure, the Markov measure µ λ of the Markov chain, which solves the variational problem (14), giving a statistical model of data, minimizing the KL divergence between the empirical measure and µ λ . Here, λ is the set of parameters λ l that achieve the variational principle. These parameters can be numerically computed, either by using the explicit form of the measure (17) [49] or by while using the MonteCarlo methods [89].
The software PRANAS allows for the handling of spike train statistics by numerically computing the Gibbs distribution, solving (14) for up to 100 neurons [99].

Spectral Gap and Thermodynamic Limit
For a potential of finite range R and a finite number of neurons provided λ l > −∞, for all l in (22), the Perron-Frobenius theorem guarantees the uniqueness of the Gibbs measure µ λ solving the variational principle (14). Moreover, the pressure being a real analytic function for finite range potentials, is infinitely differentiable with respect to parameters and there is an exponential decay of correlations with respect to time. This last aspect is due to the gap in the spectrum of the transfer matrix (15). These properties may not hold if either R → +∞ or, if N → ∞ (corresponding to a thermodynamic limit) where the potential may lose regularity. Here, one has to consider Thermodynamic Formalism in infinite dimension, on the functional space of continuous functions. The case R → +∞ corresponds to a potential with infinite range associated, in our case, to spike statistics with infinite memory. This is discussed in the next section and in Section 4, where we show that neuronal models can have such an unbounded memory. More generally, the limits N → ∞ or R → +∞ can induce important effects, such as phase transitions, which will be commented upon further in the discussion section.

Spiking Neuronal Network Models: The Leaky Integrate-and-Fire and Beyond
There are many models for neuronal dynamics both at the level of individual neurons and neuronal networks [35,36,109]. Here, we consider a canonical example of such a model. The first model proposed to the scientific community was introduced by Lapicque in 1907 [110]. The main interest was to make a nice link between dynamics, coding, and spikes, paving the way to use Thermodynamic Formalism in order to analyse the spike train statistics.

Dynamics and Spikes
A fundamental equation in neuronal membrane potential dynamics is the conservation of electric charge, written in its most canonical form, as follows: where C is the membrane capacitance of the neuron and V is its membrane potential. The sum ∑ X holds on ionic currents of the form i X = −g X (V − V X ) involving specific ionic channels permeable to specific ions (e.g., Na + , K + , Cl − , Ca 2+ ). Here, we include the neuron's intrinsic currents' (e.g., sodium and potassium currents triggering a spike [111]) and synaptic currents [109]. The conductance, g X , of channels of type X depends, in general, non-linearly on activation variables, themselves dependent on the voltage. V X is the Nernst reversal potential, i.e., the value of the membrane potential at which the current i X reverses its direction. Finally, I(t) is an external current that can mimic, e.g., an injection by an electrode or an external stimulus. In its simplest form, for a single isolated neuron, this equation takes the form: where R is the membrane resistance and the term g = 1 R corresponds to a unique passive conductance. In this case, we consider only a leak current where the leak reversal potential is set to 0. This is the equation of an RC circuit, which is quite simple, but quite far from a real neuron, as this equation does not even produce spikes. To circumvent this problem, one introduces a threshold, θ, such that Equation (24) holds whenever V(t) < θ (sub-threshold dynamics), corresponding to the "ntegrate" phase. In contrast, for all times t (r) such that V(t (r) ) = θ, two effects take place: (i) The membrane potential of the neuron is reset instantaneously to a rest value, here 0, without a loss of generality; (ii) a spike is recorded at times t (r) called "spike times". This is the "fire" phase (see Figure 2B, bottom).
While this is a simple artificial way to generate spikes, there is a huge price to pay on mathematical grounds because the threshold introduces a singularity set in the phase space where the dynamic is not differentiable. We develop this aspect below.
The generalisation of (24) to a network of N neurons is straightforward. Adding the contribution of synaptic currents, I k (syn) (t), building the network interactions, we obtain: where C k is the membrane capacitance of neuron k, V k , its membrane potential. The resistance R is assumed to be the same for all neurons. In the synaptic current, the parameter W kj represents the synaptic strength ("weight") from the pre-synaptic neuron j to the post-synaptic neuron k (see Figure 2B). Synaptic weights can be negative (inhibition) or positive (excitation). By convention, W kj = 0 if there is no connection from j to k. This way, the sum in (26) holds for j = 1 . . . N. The function α represents the time profile of the postsynaptic current induced by a pre-synaptic spike [112]. It has been experimentally observed that the tail of this function is exponential. On mathematical grounds this is essential. The sum in Equation (26) considers all the spike times t j (r) emitted by all the pre-synaptic neurons j before time t. When considering the asymptotic regime t → −∞ (to get rid of initial conditions) this sum may contain an infinite number of terms. Thus, in order to ensure the sumability of this series one needs α to decay sufficiently fast (here exponentially fast).
Equation (25) holds in the sub-threshold regime. The term S k (t) represents an external stimulus, and ξ k (t) is white noise whose amplitude is modulated by σ B . When the membrane potential of neuron k reaches the firing threshold at a firing time t k (r) , for some r, i.e., V k (t k (r) ) ≥ θ, the neuron k fires an action potential and its membrane potential is reset to a fixed reset value instantaneously (see Figure 2 B).
While Equations (25) and (26) look rather simple, the right hand side of Equation (25) depends, via (26) on a possibly uncountable set of events (the spike times) corresponding to a possible infinite history of the voltage dynamics of the network. In this sense, these equations do not represent a classical dynamical system, where the knowledge of the variables at a given time allows one to compute the variables' value at a future time by integrating the flow. Here, we require knowledge of the spike history, back to the last time where the neuron was reset to make the integration. This history can go quite far into the past, with a dependence decaying like the tail of the function α.
In order to circumvent these problems, we need to first get rid of the fact that spike times belong a priori to an uncountable set. There are two alternatives. The first one, briefly explored in this section, consists of discretising time as done e.g., in [113]. This leads to important results related with Thermodynamic Formalism (see [23,114] for details). The second alternative, to keep time continuous, is commented on the discussion section.

A Discrete Time Version of the Leaky-Integrate and Fire Model
The time discretisation of the model (25) reads: For simplicity, we have assumed that all neurons have the same capacitance C k = C, and set γ = 1 − dt τ , where (τ is the characteristic time scale of the membrane response, dt is an integration time step which has to be much smaller than τ to preserve the physical relevance, whereas it has to be strictly positive to have a time-discretization scheme.) τ = R C, with 0 < γ < 1, and then taken dt = 1. We have assumed that synapses are instantaneous. Subsequently, the synaptic input is ∑ j W kj x j n , that correspond to the pre-synaptic neuron j that acts on the post-synaptic neuron k whenever j spikes, x j n = 1. If, at some discrete-time n, V k n exceeds the threshold θ, the membrane potential is reset at time n + 1 and a spike is recorded at n for neuron k, i.e., x k n = 1. Below the threshold, the random dynamical system is ruled by (27). S k n is the time discretization of the external stimulus. ξ k n are independent standard Gaussian random variables.
It is easy to integrate Equation (27) conditionally upon a fixed spike sequence x.
We discuss the compatibility condition in more detail in Section 4.4, when dealing with symbolic coding.
For the moment, assume that V and x are compatible. We note τ k (x, n) = max l, l < n | x k l = 1 the last time before n where neuron k has spiked, thus whose voltage was reset to 0. Then: where: integrates the influence of pre-synaptic neuron j on the time interval [ τ k (x, n) + 1, n ]. Each spike emitted by this neuron, at times l in this time interval, contributes with a weight γ n−l and there is no contribution at times where x j l = 0. The condition γ < 1 implies an exponential decay in the spike history dependence with characteristic time: Likewise, ∑ n l=τ k (x,n) γ n−l S k l integrates the stimulus influence on neuron k and σ B ∑ n l=τ k (x,n) γ n−l ξ k (l) is the integrated noise term. This is a Gaussian random variable, with mean zero and variance . In (28), the dependence on the initial condition does not appear because we assume the initial time to be n 0 → −∞. So, either neuron k has spiked in the time interval ] − ∞, n] and the voltage is reset to 0, or it has not spiked, but the initial condition dependence decays like γ n−n 0 which vanishes when n 0 → −∞.

Gibbs Distribution of the Discrete Lif Model
Thanks to the integrated Equation (28) and because the integrated noise is Gaussian it is now easy to compute the probability that neuron k spikes at time n + 1 given the history x, P x k dz. Here, we have used a small abuse of notation. The conditioning upon x −∞,n means, in fact, the conditioning upon the sequence We condition upon the spike history prior to n back to the last time where all neurons had been reset at least once. While, for Equation (29). we just need to consider the history back to τ k (x, n), the conditioning upon x n−1,R(x) is necessary when considering the conditional join probability of spiking patterns. The joint probability is conditionally independent given the past: (30) Let us comment this result. Equation (30) is the transition probability, of the form (20), where the normalised Gibbs potential φ can be explicitly written, in terms of the synaptic interactions and the parameter σ B controlling the noise amplitude. However, note that, in contrast to (20), where the memory of the spike sequence was fixed independently of x, here the memory depends of x, providing a variable length Markov chain [115,116]. Actually, R(x) is an unbounded function of x as one can find, for all r ∈ { −∞, n }, a sequence x, such that R(x) = r (take the sequences where all x k n = 0, k = 1 . . . N, n > r and x k r = 1 for at least one k). Thus, we have to deal with the extension of Markov chains, to chains with unbounded memory introduced in Section 2.7. The existence and uniqueness of a Gibbs distribution compatible with this chain is guaranteed by the exponential decay of the memory controlled by γ < 1 [23]. In this case, the potential fulfills the conditions described in Section 2.7. Finally, in (30), the transition probabilities explicitly depend on time because of the stimulus dependent term. They are, therefore, not translation invariant. While the extension of Gibbs distributions to non time-translation invariant chains can be rigorously done (upon the exponential decay of memory [90]), we restrict ourselves now to the case without stimulus (S k n = 0, ∀k = 1 . . . N, n ∈ Z) to apply Thermodynamic Formalism, until Section 4.5, where we discuss linear response.

Markov Partition and Symbolic Coding
In this section, we consider the deterministic discrete-time neuronal network model obtained considering (25) with σ B = 0, studied in detail in [114]. The threshold θ of the voltage in a network of N neurons induces a natural partition of R N , where the bound B depends on synaptic weights [114,117]. If V k n ∈ P 0 , it evolves according to the sub-threshold dynamics (25) and it does not emit a spike at time n. In contrast, if V k n ∈ P 1 , it emits a spike at time n and its trajectory is set back to P 0 at time n + 1. Thus, P is a natural partition in the sense that it informs about the spikes of each neuron. Therefore, to each trajectory V ≡ V k n , k = 1 . . . N, n ∈ Z , there is associated an infinite spike sequence x such that x k n = 0 ⇐⇒ V k n ∈ P 0 and x k n = 1 ⇐⇒ V k n ∈ P 0 . This partition can be used to generate a Markov partition [114], but, in general, the Markov partition is not P but a finite refinement Q of P. What ensures that a Markov partition exists is that (27) is contracting. More precisely, it contracts, in one step, at speed γ for directions (neurons) such that V k < θ, and it contracts, with an infinite speed, for directions such that V k ≥ θ (reset). However, this generates discontinuities in the mapping (27), and a singularity set S = V ∈ R N | ∃k ∈ { 1 . . . N } , V k = θ , where the map associated with (27), hereafter denoted by G, is discontinuous. Thus, G is piecewise continuous and piecewise contracting. Now, recall that Q is a Markov partition for the dynamics with contracting map G if its elements satisfy G(Q n ) ∩ Q n = ∅ ⇒ G(Q n ) ⊂ Q n . In other words, the image of Q n is included in Q n whenever the transition n → n is legal. Here, in general, the elements of P do not satisfy this condition. This is because the image of a domain of P usually intersects in several domains (in this case, the image intersects the singularity set). From the neural network's perspective this means that, in general, it is not possible to know the spiking pattern at time n + 1 knowing the spiking pattern at time n. There are several possibilities, depending on the membrane potential values and not only on the firing state of the neurons. Of course, if, say P n is such that G ( P n ) intersects several domains P n 1 , . . . , P n l one can take the preimages of these domains G −1 ( P n r ) to construct a refinement of P, such that the Markov partition requirement is satisfied in one iteration of the map. However, nothing guarantees that, at the second iteration, some elements of this new partition will not intersect the singularity set under G 2 .
Can we find a finite refinement of P, such that the trajectory of the partition elements never intersects several partition elements? It is shown in [114] that this property is satisfied for generic values of the synaptic weights W ij . Essentially, it is based on the fact that the distance between the Ω-limit set of (27) and the singularity set S, is generically positive. In other words, each point in the partition elements Q n has a local stable manifold with a finite diameter.
As a consequence, the deterministic discrete-time neuronal network model (27) admits a Markov partition, a refinement of the natural partition, providing a symbolic coding of the membrane potential trajectories in terms of spike sequences. In particular, once the initial condition dependence has been removed, the evolution (28) (without noise) is only constrained by the stimulus. Thus, (28) provides a coding scheme of the stimulus in terms of spike sequences. The Markov partition is made of spike blocks, with finite memory depth R, which can be used to apply Thermodynamic Formalism in the presence of noise. However, R-the memory depth of the corresponding Markov chain-depends on the parameters and, in particular, the synaptic weights.
In addition, note that the presence of a singularity set induces a weak form of initial conditions. Although the dynamic is contracting, a small perturbation of a trajectory can induce an evolution drastically different from the unperturbed trajectory, if the perturbation crosses the singularity set. In this case, e.g., there is a neuron, k, which does not spike, at time n in the unperturbed trajectory, and spikes at time n in the perturbed one, inducing a completely different evolution (cascade effect). This phenomenon has been exposed in the context of spiking neurons, where the coexistence of stable and unstable dynamics is investigated [118]. The singularity set also induces the existence of ghost orbits, ∃k ∈ { 1 . . . N } , ∀n > 0, V k n < θ and lim sup n→+∞ V k n = θ. However, ghost orbits are non-generic in a topological and a measure-theoretic sense. As a corollary, the Ω-set is generically composed of finitely many periodic orbits with a finite period (whose length depends on parameters of the model, in particular, synaptic weights).

Explicit Form of the Potential GLM vs. MaxEnt
Model (27) makes a link between the dynamics of a neuronal network and the transition probabilities (20), where the dependence on the model parameters (in particular, synaptic weights, and stimuli) is explicit. We have an explicit potential for this model, which, here, takes a GLM-like form (29), but is more general, as in contrast to the GLM, the effective interactions depend on time via powers of the leak term γ. This potential can also be written in terms of monomials using the Hammersley-Clifford decomposition (3.3), through a series expansion of the function f . This procedure generates a series of monomials with coefficients that can be explicitly computed (using the fact that, from the monomials , for any integer m > 0). These coefficients are proportional to powers of γ < 1, so their strength decays exponentially fast, allowing for truncating the potential to a finite number of terms, which produce canonical Markov approximations of different orders [92]. One obtains, to the lowest order, a Bernoulli potential, then pairwise terms, and so on.

Linear Response
Another interesting consequence of the analysis of this model is that the potential may depend on a time dependent stimulus. When considering that the stimulus is of small amplitude and additive, one can take a Taylor expansion of the potential as powers of the stimulus allowing one to go beyond the stationarity assumption central to equilibrium statistical mechanics and Thermodynamic Formalism. In this case, it is possible to show that the variations in the spike statistics induced by the stimulus, can be described in terms of a linear response theory [119][120][121][122]. The main result is that the variation, in the average of an observable f , resulting from the application of a stimulus reads: where µ (sp) is the Gibbs distribution in spontaneous activity (without stimulus), and µ S is the Gibbs distribution in the evoked activity regime (with stimulus), µ S [ f ] (n) means the average of f with respect to µ S , which depends on time (if the stimulus does), and µ (sp) [ f ] means the average of f with respect to µ (sp) , which does not depends on time. This variation in average is given by a convolution between a kernel K f , depending on f (which can be expressed in terms of time correlation functions between monomials) and of the stimulus. The coefficients in the expansion of K f depend on the parameters constraining dynamics (e.g., the synaptic weights in (27)). The correlations are computed with respect to the invariant Gibbs measure µ (sp) . In addition, the influence of monomials in the expansion decreases with their order, so that one can obtain a reasonable approximation of the convolution kernel only considering averages of order two monomials (time dependent pairwise correlations). Therefore, this is sa result in the form of a fluctuation-dissipation "theorem" in statistical physics, with the difference that one considers time dependent correlations. This formula has proven to give astonishingly good results when computing the response to a time dependent stimulus in the model (27) [122].

The Galves-Löcherbach Model
Here, we present a second example, where the Gibbs potential can be computed. This model is known as the Galves-Löcherbach model introduced by Antonio Galves and Eva Löcherbach in [91] (see also [123,124]). This model is a generalization of [22], but considering an infinite (countable) network of neurons interacting in time with memory of variable length.
The model is built when considering a stochastic chain (X t ) t∈Z taking values in {0, 1} I , where I is a countable set of neurons. The probability of a spike depends on the accumulated activity of the system since the last spike, thus, each spike depends on a variable length history, defining also a non-Markovian stochastic process. Extensions of this model have been made considering the hydrodynamic limit of the interacting neuronal system [125], classifying the collective behavior according to parameter values [126], and the generalization to the continuous time [127,128].
For each neuron i ∈ I and each time t ∈ Z, let τ i (x, t) denote the last time before t at which neuron i fired a spike in the spike train x: and suppose that the synaptic weights W ij have the uniform summability property: The joint probabilities are conditionally independent given the past: where the probability of neuron i having a spike at time t is given by: where g j (t − s) plays the role of the exponential α-kernel in (26). The transition probabilities (32) have the form of a GLM model. Under technical conditions of the functions h i and g j and W ij , there exists a unique probability measure consistent with (31) and (32) (see Theorem 1 of [91]). To prove this claim, they use a Kalikow-type decomposition of the infinite order transition probabilities. This type of decomposition has also been considered in Ref [91,129,130]. The setup considered in this work extends to infinite size and infinite memory.

Discussion and Perspectives
In this review, we introduce different ideas and tools from Thermodynamical Formalism and show how they can be applied in theoretical neuroscience. As a summary, we grouped these approaches depending on two main characteristics. The first one is the number of neurons N that affect the cardinality of the alphabet considered and the range R of the potential (memory in transition probabilities) associated to the equilibrium measure characterising the system. This is represented in Table 1. The infinite number of neurons and infinite range cases are further discussed in Section 5.2.
While there have been interesting applications of Thermodynamic Formalism to neuroscience, there are interesting ideas and developments still to come. In this concluding section we raise questions, challenges and new avenues for the application of Thermodynamic Formalism to theoretical neurosciences. In this review, we have considered rather academic models of neuronal networks, where, especially, time is considered to be discrete. There are good reasons for that. As we remarked at the beginning of Section 3, we were considering models, like the Integrate-and-Fire, where spikes arise instantaneously, in continuous time, thereby providing a possibly uncountable set of potential spike trains. The question is whether we are dealing here with a realistic property of biological neuronal networks or with an artifact created by the instantaneous reset. Real spikes have a duration (a few ms) and a refractory period (also a few ms), so, for a fixed initial condition, spike trains produced by a continuous-time neuronal network are countable. Now, it might be that the set of spike trains depend continuously on the initial condition, so that we are still left with an uncountable set of spikes.
It is out of the scope of this review to discuss from a biological perspective, whether or not neuronal networks have the cardinality of the continuum (see [101,117,131] for a discussion on this topic). Instead, we have considered a strategy consisting of discretizing time, avoiding the problem of potentially uncountable spikes. Here, we briefly mention another strategy, allowing for associating a countable set of spike trains to continuous time networks with a countable set of spike trains. We first make the remark that the instantaneous reset of voltage is physical and biological nonsense, inducing pathologies in the dynamics [132]. On this basis, we use a convenient mathematical trick that is explained in the next paragraph, which can certainly be criticized on phenomenological grounds [131], especially when the dynamical system representing the neuronal activity is deterministic.
After spiking, a biological neuron stays at rest a certain time (refractory period). Accordingly, the trick is the following. Fix δ > 0 and define a spiking variable x k n ∈ { 0, 1 }, where n is an integer, where x k n = 1 if neuron k emits a spike in the time interval [nδ, (n + 1)δ[ and x k n = 0 otherwise. Recall that t k (r) denotes the time at which neuron k emits its r-th spike . This reads: otherwise.
Spiking variables are therefore time-discrete events with a time resolution δ. When V k reaches the threshold at time t k (r) it is reset to 0, and stays there until time (n + 1)δ. After this, follows the sub-threshold evolution (25) until the next time where V k reaches the threshold. Note that, in this modelling, δ can be quite small when compared to the time scales of the dynamics. In this way, the set of spike trains x becomes at most countable.
This trick can be used to generalise the Integrate-and-Fire model into a conductance-based Integrate-and-Fire model, which was introduced by Rudolph and Destexhe in [133] and mathematically studied in [23,24,117], where the synaptic conductance depends on the spike history. One can still show that a unique Gibbs distribution with infinite range potential exist in this case, characterising the spike train statistics. The potential can be explicitly computed as a function of network parameters, even in the presence of a time-dependent stimulus. Now, would Thermodynamic Formalism apply to more realistic neuronal models, like Hodgkin-Huxley [111], FitzHugh-Nagumo [134,135], Morris-Lecar models [136]? (see [36,109,137] for a complete presentation of canonical neuronal models). In these models, closer to biology, the spikes have a time course and, thus, are not considered as point events. However, we do not know any result establishing, e.g., the existence of Markov partitions and symbolic coding for these models and this seems to be out of reach for the moment. Still, one can bin the time and proceed as done in experiments where voltage is a time-continuous signal. Thus, one can still use the approach used in (20) to characterise the spike train statistics.
A natural question in this context is what is the link between spike train statistics and the underlying dynamical model, with "hidden" dynamical variables, such as membrane potential, but also, e.g., activation/inactivation variables? If we think in terms of spike coding, the alternative is the following. Either spike trains contain all the necessary information to characterise the dynamics, e.g., the spike response to a stimulus, and then characterising (20) is sufficient. Or, there is additional information, not conveyed by spikes (e.g., sub-threshold oscillations [100,138,139]), and the "neural code" is not entirely contained in the spikes, somewhat ruining the hope of encoding neuronal messages purely in terms of spikes. This question is much more general than the validity of the Thermodynamic Formalism approach for these models.
What Thermodynamic Formalism brings to the analysis of these models is twofold: (1) a way to rigorously handle probabilistic representations of spikes (20); and, (2) to provide conceptual and mathematical tools to analyse spiking neuronal network models, like (27), where a dynamical system formulation of biophysical variables can be mathematically related to spike coding and spike train statistics.

Phase Transitions
Several studies have shown that the population of vertebrate retinal ganglion cells responding to naturalistic stimulus is poised near a "critical state" [73,74]. From the maximum entropy joint distribution (13), a family of Gibbs distributions can be built introducing a parameter 1/β (analogous to the inverse temperature), which scales all of the Lagrange multipliers of the inferred Hamiltonian. When β → 0 (infinite temperature), the uniform distribution is obtained, and when β → +∞ (zero temperature), the Dirac delta supported at the spike configuration(s) of minimal energy is obtained. 1/β = 1 corresponds to the inferred maximum entropy distribution from data. These studies have only analysed memoryless Gibbs distributions (13).
From this representation, it is possible to compute the fluctuations (variance) of the energy U λ as a function of the "temperature" parameter T. This quantity can be obtained as the second derivative of the pressure, Equation (6), which is, in thermodynamics, related to the heat capacity C T . On numerical grounds, this quantity can be computed while using MonteCarlo simulations and plotted as a function of the "temperature" T = 1 β , for different network sizes, (see Figure 4). The form of C T versus T plot, for maximum entropy models of Ising type obtained from the recording of retinal ganglion cells responding to naturalistic stimuli are shown in Figure 4 (redrawn from [74]). It can be observed that there is a clear, increasing peak at T = 1, which starts to manifest itself when larger and larger groups of neurons are considered. This presumable divergence of the heat capacity (or variance of U) when N → ∞ (thermodynamic limit) is interpreted as a second order phase transition (a so-called "critical regime" [140]).
The behavior of the specific heat that is observed in Figure 4 suggests that the heat capacity of a maximum entropy distribution, fitted over an increasingly large group of neurons in the retina, diverges. This phenomenon has been considered to be a "signature of criticality" (details of this study and a discussion about whether criticality is functional for retinal ganglion cells can be found in Ref. [74]). Some criticism regarding this approach to diagnostic criticality has appeared arguing that the maximum entropy principle is likely to yield models that are close to singular values of parameters, akin to critical points in physics where phase transitions occur. Statistically distinguishable models tend to accumulate close to critical points, where the susceptibility (inverse Fisher Information) diverges in infinite systems [141]. These ideas have also been applied to numerical simulations of a canonical feed-forward population model showing that the specific heat diverges whenever the average correlation strength is independent of the population size [75], as in the random subsampling of correlations used in [74]. Additionally, note that, for spike trains obtained from discrete Markov processes, binning generates a stochastic process with unbounded memory akin to inducing spurious phase transitions [102]. Signatures of criticality Generic plot of heat capacity C T versus temperature T for maximum entropy models built constraining firing rates and pairwise correlations of retinal ganglion cells responding to naturalistic stimuli [74]. A clear peak appears at T = 1 when groups of an increasingly large number of neurons are considered (thermodynamic limit).
This interesting approach leads, nevertheless, to several questions in the context of Thermodynamic Formalism.
• Does this signature of criticality extend to Gibbs distributions with potentials of range R > 1, i.e., with memory? How does it depend on R? We are not aware of any experimental results addressing this issue. This question is related to the following: • What is this signature of criticality from the point of view of Thermodynamic Formalism? The occurrence of a second-order phase transition mathematically means that the pressure is C 1 but not C 2 when some limit is taken. Here, we have two possible limits: the range of the potential R tends to infinity or the number of neurons N tends to infinity. These two limits could also be addressed simultaneously and they do not necessarily commute. For potentials associated to finite R and N the Perron-Frobenius theorem guarantees the existence and uniqueness of the Gibbs measure and the analyticity of the pressure can also be proved, preventing phase transitions. When R or N are infinite, the properties of the RPF operator Section 2.3.2 characterises the presence or absence of phase transitions. Indeed, there are conditions ensuring a spectral gap for this operator, ensuring the exponential decay of correlations. Now, Equation (7) characterise the second derivative of the pressure as a time series of correlations, which converge when the correlations decay exponentially.
On the opposite side, the non-summability of time correlation function implies the non-existence of the second derivative, and thus, of a second-order phase transition. Therefore, a possibility to have a second-order phase transition is when the spectral gap property for the RPF operator when R → +∞ or N → +∞ is absent. In statistical mechanics, second-order phase transitions can be characterised by how the zeros of the partition function, written as a polynomial, pinch the real axis (Lee-Yang phenomenon) [142][143][144]. In our case, when R > 1, the object of interest is not the partition function, but rather the largest eigenvalue of L φ , which has to stay analytic in the limit R, or N, → +∞. The absence of the spectral gap property presents an analogy with the Lee-Yang phenomenon, although we do not know about results establishing a deeper link. • Can we relate known examples of dynamical systems exhibiting phase transitions to models in neuroscience? Another possible example to be interpreted in neuroscience is the Dyson model [145], in which there exists a phase transition in the sense of spontaneous magnetisation when the temperature goes to zero, due to an infinite range potential whose correlation does not decay exponentially fast. In our case, the range of the potential should be taken in time, keeping (possibly) the number of neurons finite. Other examples exist of rigorous characterisations of phase transitions in the thermodynamic description of Pomeau-Manneville intermittent maps, passing from an integrable density function associated with the measure to heavy-tailed densities [146]. An interesting result may hint at the connection between the topological Markov map of the interval and stochastic chains of infinite order or chains with complete connections. Ref [147] presents how to build a topological Markov map of the interval whose invariant probability measure is the stationary law of a given stochastic chain of infinite order. This is interesting in this context because as we presented in (27), there are mathematical models of spiking neurons whose spike statistics are represented by chains of infinite order. This result or its inverse i.e, how to build a stochastic chain of infinite order from a topological Markov map may hint at conditions in the parameters or conditions of the mathematical models of spiking neurons to exhibit second order phase transitions. • What could the dynamical or mechanistic origins of a second-order phase transition be in a spiking neuronal network model? Handling experimental data is of course important, but for long experiments with living neuronal tissue, one cannot control the size of the sample, the stationarity of data, and so on. Accordingly, assume that we have been able to find an example of a Gibbs distribution exhibiting a second-order phase transition when R → +∞ or N → +∞. Can we build a spiking dynamical system, with finite R and N, which has this Gibbs distribution in the limit R, or N, → +∞, so that we observe a phase transition in the model? Then, what are the mechanistic origins (in the neuronal dynamics) of second-order phase transitions? It could be an interesting example to study the existence of a second-order phase transition in a simple neuronal model. Returning back to the discrete LIF model, the failure in the second-order differentiability of the pressure means the loss of exponential mixing, which, in the model (27) can arise in, at least, two cases. First, if γ = 1 − , → 0. This is a way to obtain a potential with increasing range as → 0 with loss the summability of correlations. The corresponding orbits (reminiscent of the ghost orbits discussed in Section 4.4) are such that it may take a long time for some neurons to be reset. Thereby, the memory to be considered is very long. However, this is a case hardly interpretable from the neuroscience perspective. A second possibility is to analyse how the pressure depends on the spectrum of the synaptic weights matrix and to check whether there are cases (e.g., small world or scale-free lattices), where the spectral gap of the RPF vanishes.
From the perspective of the maximum entropy distributions built from experimental data of spiking neurons, there have been interesting applications of the Gibbs distributions that were obtained to answer questions related to the retinal code that are not related to criticality [148]. From the maximum entropy joint distribution the conditional distributions can be computed, and questions about the redundancy of the neural code can be addressed such as how predictable is the activity of each neuron based on the knowledge of the activity of other neurons in the population. Can we find a subset of neurons J that together predict with high accuracy the spiking behaviour of the neuron i? Mathematically can be written in this way p(x i = 1 | {x j } j∈J ), where J is a subset of neurons in the population of spiking neurons. Other questions related to the neural coding and dimensionality reduction can be addressed studying the energy landscape U λ (x) of (13). For example, the the local minima of an energy landscape correspond to metastable states and several configurations may correspond to the same "valley" near each local minima. Transitions between valleys have be studied in the context of "retinal coding" (see details in [148]). Alternative methodologies using the maximum entropy principle to study network of sensory neurons have been used in order to classify intrinsic interactions from extrinsic correlations [46] and to reveal the excitatory and inhibitory correlations [45].

What Else Do Thermodynamic Formalism and Gibbs Distributions May Tell Us about Neuroscience?
The relationship between mathematics and physics has been historically symbiotic and Thermodynamic Formalism is an interesting example of how ideas from physics may help to solve problems and introduce ideas into the field of mathematics. The history of Thermodynamic Formalism also shows how purely mathematical results can be obtained as corollaries of physical laws, inverting the frequently assumed relationship between physics and mathematics [149].
However, in the case considered in this review-the link between Thermodynamic Formalism and neuroscience-the mathematical problem is motivated by biology, not by physics. While Eugene Wigner argues in favour of the "The unreasonable effectiveness of mathematics in the natural sciences" [150], Israel Gelfand, after spending several years working in mathematical problems related to biology, replied with "The unreasonable ineffectiveness of mathematics in biology." [151]. While there are reasons to argue that this is still the case, it is less clear if one can blame the field of mathematics or just the fact that we have not yet used the right tools or frameworks. In the quest for these "right tools", there is a long tradition of using ideas from statistical physics to study neural networks, and in particular, to represent the emergence of collective behaviour from microscopic interactions, with the hope that statistical aspects of the collective behaviour will be independent of the details in these systems. This gave rise to major branches of theoretical neuroscience, like dynamic mean-field methods [29,[152][153][154] or the Maximum Entropy approach [38,49,148,155], mainly coming from physicists. Although there are considerably less articles using mathematical methods to rigorously analyse the collective behaviour of neuronal networks some promising approach have been recently proposed based on large deviations [156,157], Kalikow-type decomposition [91,158], stochastic processes [159][160][161][162][163], dynamical systems [137,164], etc. As we have developed in this review, Thermodynamic Formalism could also be one of these tools, providing interesting connections between mathematics and physics, dynamics and statistics, applied to neuroscience. Especially, we have described how Thermodynamic Formalism: (1) provides a conceptual and operational (i.e., allowing to develop algorithms and software [99]) framework to analyse experimental spike train data; (2) allows to derive explicit expressions linking spike statistics to neuronal networks dynamics; (3) extends to non stationarity via linear response theory; and, (4) proposes a realm to address questions related to criticality.
Here, we would like to propose some other extensions, not yet explored so staying at the level of ideas, all based on the power of Thermodynamic Formalism to make explicit and operational links between dynamics, statistics and symbolic coding.
1. Geometry of the state space. A prominent aspect of Thermodynamic Formalism, that we haven't discussed yet in this review, is its link to the characterisation of the geometry of attractors and, especially, fractal sets [165,166]. For example, the composition of contracting mappings along symbolic orbits defines the so-called Iterated Function Systems (IFS) [167] generating fractal sets with tunable geometry and structure. Now, it is interesting to remark that Integrate and Fire models are actually piecewise contracting dynamical systems having a structure similar to IFS where the contracting pieces are symbolically encoded by spike blocks [114]. It would be interesting to investigate, along these lines, the structure of attractors in Integrate and Fire models, and how orbits, encoded by spike blocks, are related to the geometry of attractors (the Ω-limit set). 2. Transitions between attractors. The concept of attractor is actually central in describing brain dynamics [168,169]. Especially, a current trend in neuroscience is to associate to brain states attractors (or ghost attractors, see [170] and references therein). The transitions between these states corresponds to transition during tasks or spontaneous activity [171][172][173][174]. It is relatively natural to characterise such transitions by Markov chains [175], which is the first step toward the application of Thermodynamic Formalism and analysing these transitions from a statistical and statistical physics perspective. 3. Non-stationarity and link with generating functional formalism.
As we mentioned, Thermodynamic Formalism is constructed from a variational approach based on entropy and, thus, requiring time translation invariance. We have briefly described how we can depart from this constraint while using linear response theory. It would be interesting to explore beyond this point and consider general types of response to stimuli (not requiring a small perturbation, as in linear response). For this, one would have to construct a Thermodynamic Formalism based on the optimisation of a quantity, which is not the entropy. This is somehow what generating functional approaches like the dynamic mean-field theory does (see introduction), although using other constraining hypotheses (essentially to be able to describe the infinite size limit by a Gaussian process). It would be interesting to try to close the gap between these two approaches (e.g., via large deviations theory).
One of the biggest challenges in science of the XXI century is to understand the brain functions within a conceptual framework that are capable of unifying the multi-scale dynamics that take place in the brain. This framework should also make sense in the light of the overwhelming amount of experimental data capable of predicting macroscopic phenomena, such as motor behaviour or visual experience from the activity of billions of neurons.
Physicists have been able to make a deep connection between mechanics, statistical physics, and thermodynamics. A similar quest is presumably guiding the research of (some) theoretical and experimental neuroscientists. While there is still a long way to go before achieving this goal (as some argue we are still searching for principles [176]), during the last decades, mathematicians have been playing a relevant role in the rigorous description of neural phenomena, clarifying and raising conceptual problems in neuroscience.
We hope that theoretical tools and ideas from Thermodynamic Formalism and its current application in neuronal dynamics and spike train statistics will lead to a better and unified understanding of the neural phenomena. We also hope that the present review may serve as an encouragement for the mathematical community that is interested in applications of Thermodynamic Formalism in order to study these interesting and important problems. Entropy of the probability measure µ λ k Lagrange multiplier parameter U λ Potential or Energy function Pressure associated to the potential U λ µ ψ Equilibrium measure associated to the potential ψ S n φ Birkhoff sums associated to the potential φ Γ f Scaled cumulant generating function of the observable f I f Rate function of the observable f L φ Ruelle-Perron-Frobenius operator associated to the potential φ w n Integer associated to the spike block x n,n+R−1 C f ,g (n) Correlation function between the observables f and g at time n m l Monomial l C T Heat capacity V k Voltage of neuron k θ Threshold