Linear Response of General Observables in Spiking Neuronal Network Models

We establish a general linear response relation for spiking neuronal networks, based on chains with unbounded memory. This relation allow us to predict the influence of a weak amplitude time dependent external stimuli on spatio-temporal spike correlations, from the spontaneous statistics (without stimulus) in a general context where the memory in spike dynamics can extend arbitrarily far in the past. Using this approach, we show how the linear response is explicitly related to the collective effect of the stimuli, intrinsic neuronal dynamics, and network connectivity on spike train statistics. We illustrate our results with numerical simulations performed over a discrete time integrate and fire model.


Introduction
Neurons communicate by short-lasting electrical signals called action potentials or "spikes", allowing the rapid propagation of information throughout the nervous system, with a minimal energy dissipation [1].The spike shape is remarkably constant for a given neuron, and it is a contemporary view to consider spikes as quanta (bits) of information [2].As a result, information is presumably encoded in the spike timing [3].
The simplest quantitative way to characterize the spiking activity of a neuron is its firing rate r(t), where r(t)dt is the probability that this neuron spikes during a small interval [t, t + dt].Under the influence of an external stimulus the firing rate changes.A classical ansatz, coming from the Volterra expansions [2] is to write the variation in the firing rate of a neuron as the convolution form: where the exponent (1) recalls that we consider a first-order effect of the stimulus S, that is, the stimulus is weak enough so that higher-order terms in the Volterra expansion can be neglected.This is an example of linear response: the variation in the rate is proportional to the stimulus.Here, K is a convolution kernel constrained by the underlying network dynamics.For example, in sensory neurons, where S and K are functions of space and time, the convolution (1) takes the explicit form: where K decays sufficiently fast at infinity (in space and time) to ensure that the integral is well defined.K mimics the receptive field (RF) of the neuron.In general, the response of spiking neuronal networks to a time-dependent stimulus does not only affect rates, it also has an impact on higher-order correlations between neurons, because neurons are connected through synapses.This situation is sketched in Figure 1.
Figure 1.A time-dependent stimulus (A) is applied at time t 0 to some neurons of a spiking neuronal network (B).As a result, the spiking activity is modified as well as spike correlations between neurons (C), even for neurons not directly stimulated, because of direct or indirect synaptic interactions.These are represented in the figure by the weights W.
In particular, sensory neurons collectively convey to the brain information about external stimuli using correlated spike patterns resulting from the conjunction of stimulus influence, intrinsic neurons dynamics and neurons interactions via synapses [4][5][6][7][8].This correlated firing has been linked to stimulus encoding [9], stimulus discrimination [10,11] and to intrinsic properties of the network which remain in absence of stimulus [12].However, disentangling the biophysical origins of the correlations observed in spiking data is still a central and difficult problem in neuroscience [2, 13,14].As a consequence, correlations in spiking neuronal networks have attracted a considerable amount of attention in the last years, from experimental data analysis perspectives [4,6,7,15] as well as from the theoretical modeling viewpoint [1,[16][17][18][19][20].
On one hand, novel experimental recording techniques such as Multi-Electrode Arrays (MEA) permit the measurement of the collective spiking activity of larger and larger populations of interacting neurons responding to external stimuli [21,22].These recordings allow, in particular, for better characterizations of the link between stimuli and the correlated responses of a living neuronal network, paving the way to better understand "the neural-code" [2].Yet, neuronal responses are highly variable [13].Even at the single neuron level, when presenting repetitions of the same stimulus under controlled experimental conditions, the neural activity changes from trial to trial [23,24].Thus, researchers are seeking statistical regularities in order to unveil a probabilistic, causal, relationship between stimuli and spiking responses [4,5,7,[25][26][27].
On the other hand, mathematical models of spiking neuronal networks offer a complementary approach to biological experiments [13,14,28,29].Based on bio-physically plausible mechanisms describing the dynamics of neurons, mathematical modeling provides a framework to characterize the population spike train statistics in terms of biophysical parameters, synaptic connectivity, history of previous spikes and stimuli.The hope is that understanding these aspects in a model may help to better the processing and extraction of information from real data.
In this article, we consider a spiking neuronal network model where the non-linear dynamics and neurons interactions naturally produce spatio-temporal spikes correlations.We assume that these neurons reach a stationary state without stimuli and from a given time are submitted to a time-dependent stimulation (See Figure 1).How are the spatiotemporal spike correlations modified by this stimulus?We address this question in the context of linear response theory using methods from ergodic theory and so-called chains with complete connections [30], extending the notion of Markov chains to infinite memory, providing a generalized notion of Gibbs distribution.We show that spatio-temporal response is written in terms of a history-dependent convolution kernel applied to the stimuli, generalizing (2) to more general observables than rates.We compute explicitly this kernel in a specific model example and analyze the validity of our linear response formula by numerical means.
The linear response determines how the expectation value of an observable of a dynamical system changes upon weakly perturbing the dynamics.The response of a system, originally at equilibrium, to a time-dependent stimulus is proportional to the stimulus, with coefficients obtained via correlation functions computed at equilibrium.The theory we develop has its roots in non-equilibrium statistical physics (briefly reviewed in Section 2.1).A seminal result in this context is the fluctuation-dissipation theorem [31], where the linear response only depends on the correlation functions of the unperturbed system.Our approach proceeds along similar lines, meaning that the linear response can be predicted from the spikes correlations of the unperturbed system (stationary state).We derive a result of this type for a discrete-time integrate and fire model addressing a central question: which correlations matter and how they are related to synaptic interactions?Among the applications of our results is the characterization of population receptive fields in terms of the underlying synaptic connectivity, non-linear dynamics and background activity, the prediction of higher-order correlations in the population responses to nonstationary weak amplitude stimuli, assessing the role of the synaptic connectivity in motion processing from the spiking response to, etc.
The paper is organized as follows.In Section 2, we briefly review linear response in statistical physics and ergodic theory allowing us to make a link between neuronal networks, considered as dynamical systems, and the statistics of spikes.We introduce the formalism of chains with unbounded memory (which are, as we explain, equivalent to left-sided one-dimensional Gibbs distributions), allowing the handling of non-stationary spike distribution with unbounded memory.In Section 3, we derive the general linear response Formula (16) used throughout the paper.This equation expresses the timedependent variation in the average of an observable f as a time series of specific correlation functions computed with respect to spontaneous activity (without stimulus).This result, reminiscent of the fluctuation-dissipation theorem in statistical physics [32,33], is applied here to spike statistics.In Section 4 we introduce a spiking neuronal network model to instantiate our analysis.This model has been presented in [34].We associate the spiking activity to a discrete stochastic process defined from transition probabilities where memory is unbounded.These probabilities are written as a function of the parameters of the model.We explicitly wrote a discrete-time form of the convolution kernel (1) as an explicit function of the model parameters, especially synaptic weights.The expression relies on a Markovian approximation of the chain and on a decomposition theorem of spike observables, introduced in a more general context by Hammersley and Clifford in 1971 [35] (see Section 2.2).In Section 5 we illustrate through an example our linear response theory applied.In particular, we show how one can predict the variation in the firing rate and in the delayed pairwise correlation between two neurons from the mere knowledge of the stimulus and relevant spontaneous correlations.Finally, in Section 6 we discuss our results.

Linear Response, Gibbs Distributions and Probabilistic Chains with Unbounded Memory
Neuronal networks can be considered either as dynamical systems (when the dynamics is known) or as spike generating processes characterized by transition probabilities computed from spike train observations.In the first case, it is natural to seek a linear response from dynamics itself, using approximations (e.g., mean-field [36]).In the second case, one has to define a probability distribution on the spike trains in order to investigate the effect of a perturbation.In this section, we show how these two approaches are related, making a link between the classical statistical physics approach of linear response, dynamical systems and ergodic theory, and neuronal networks.We introduce then the general formalism of chains with unbounded memory allowing the handling of non-equilibrium linear response for spiking neuronal networks.All of the material in this section is known in different domains, statistical physics, ergodic theory, stochastic processes, neuronal networks, and is presented here for a better understanding of the next sections.

Linear Response in Statistical Physics
For simplicity, we consider in this introductory section a dynamical system taking a finite number of "configurations" denoted by ω.In statistical physics, the linear response theory can be addressed in these terms.In a system at thermodynamic equilibrium, the probability of a configuration ω is given by the Boltzmann-Gibbs distribution: where k B T is called the partition function, with k B , the Boltzmann constant and T the temperature.The function H(ω ) is called the energy The functions X α are extensive quantities (proportional to the number of particles) such as energy, electric charge, volume, number of particles, magnetic field, . . .The conjugated parameters λ α correspond to intensive quantities (not proportional to the number of particles), like temperature, electric potential, pressure, chemical potential, magnetic susceptibility, . . . .In general, they depend on the location in the physical space (e.g., the temperature depends on the position in a fluid).At equilibrium, they are uniform in space though.The form of H, i.e., the choice of the λ α and X α is constrained by the physical properties of the system.It is also constrained by boundary conditions.
In standard statistical physics courses, the Gibbs distribution form (3) is obtained as a consequence of the Maximum Entropy Principle [37].For a probability measure P on the set of configurations, the statistical entropy is: Denote E P [ ] the expectation with respect to P. The Maximum Entropy Principle seeks a probability distribution maximizing the statistical entropy under the constraint that the average energy is constant, i.e., E P [ H ] = C for any probability measure P on the set of configurations.This probability exists and is unique when the set of configurations is finite; this is (3).When this set is infinite (e.g., thermodynamic limit) additional summability conditions are required on H to ensure existence and uniqueness of P [38,39].A non-equilibrium situation arises when the λ α s are not uniform in space, generating gradients ∇λ α (temperature gradient, electric potential gradient ...).These gradients result in currents of X α (e.g., a temperature gradient induces a heat current).In general, the currents are nonlinear functions of gradients.The Onsager linear response theory assumes that currents are linear combinations of gradients (i.e., gradients are weak enough so that non-linear terms can be neglected).Known examples are Ohm's law where the electric current is proportional to the gradient of the electric potential, Fourier's law where the heat flux is proportional to the temperature gradient, Fick's law etc.Several gradients can be simultaneously involved like in the Peltier effect.The proportionality coefficients are called Onsager coefficients [40].Now, the property that interests us is that Onsager coefficients are obtained as correlations functions computed at equilibrium (Kubo relations [32]).Thus, the knowledge of correlations at equilibrium allows the inference of the non-equilibrium response of the system to weak perturbations.

Linear Response in Spiking Neuronal Networks
Neuronal networks are modeled by dynamical systems (possibly stochastic).Therefore, the linear response theory can be addressed using the tools briefly presented in the previous section.This has been done for discrete-time in the Amari-Wilson-Cowan model [41,42], where the convolution kernel K appearing in (2) can be explicitly computed [43,44].Notably, one can compute the response of a neuron to a weak harmonic perturbation of another neuron, exhibiting specific resonances and functional connectivity distinct from the synaptic graph.
When dealing with spiking models such as the Integrate and Fire, the dynamics are not differentiable anymore (because of the mechanism of reset at threshold).Still, the Markov chain formalism (and its extension to infinite memory) can be used.In particular, we exhibit an example where transition probabilities can be explicitly computed and directly related to the dynamics.
We consider a network of N neurons, labeled by the index k = 1 . . .N. We define a spike variable ω k (n) = 1 if neuron k has emitted a spike in the time interval [nδ, (n + 1)δ[, and ω k (n) = 0 otherwise.We denote by ω(n k=1 the spike-state of the entire network at time n, which we call a spiking pattern.We denote by A = { 0, 1 } N , the state space of spiking patterns in a network of N neurons; a spike block denoted by ω n m , n ≥ m, is the sequence of spike patterns ω(m), ω(m + 1), . . ., ω(n); blocks are elements of the product set A n−m also denoted A n m in the text.We use this last notation because we consider processes with infinite memory (m → −∞) and we want to have an explicit notation A n −∞ for the corresponding set of events.The time-range (or "range") of a block ω n m is n − m + 1, the number of time steps from m to n.We call a spike train an infinite sequence of spikes both in the past and in the future.The set of spike trains is thus Ω ≡ A Z .
To simplify notations we note a spike train ω ∈ Ω.The shift operator σ : Ω → Ω is σω = ω , with ω (n) = ω(n + 1).This allow us to go one step forward in time along the spike train ω.We note F ≤n the set of measurable events (filtration) before time n and F the filtration on Ω. P (Ω, F ) is the set of probability measures on (Ω, F ).
We use the notion of (spike) observable.This is a function f : Ω → R that associates a real number to a spike-train.We say that the observable f : . It follows from the Hammersley-Clifford theorem [35,45] that any range-R observable can be written in the form: where f l are real numbers, the coefficients of the decomposition of f in the finite space of range R-observables.The functions m l spanning this space are called monomials [8].They have the form: where i k = 1 . . .N is a neuron index, and t k = 0 . . .D. Thus, m l (ω) = 1 if and only if, in the spike train ω, neuron i 1 spikes at time t 1 , . . ., neuron i k spikes at time t k .Otherwise, m l (ω) = 0.The number n is the degree of the monomial; degree one monomials have the form ω i 1 (t 1 ), degree two monomials have the form ω i 1 (t 1 )ω i 2 (t 2 ), and so on.Thus, monomials are similar to what physicists call (spike) interactions; in our case these interactions involve a time delay between spikes.There are L = 2 NR monomials of N neurons and range R and one can index each of them by an integer l in one-to-one correspondence with the set of pairs (i k , t k ) (see Equation (A2) in the Appendix A).The advantage of monomial representation is to focus on spike events, which is natural for spiking neuronal dynamics.We now introduce time-dependent observables.These are functions f (t, ω) depending on time t (continuous or discrete) and on the spike train ω.The notation f (t, ω) stands here for f (t, ω [ t ] −∞ ) where [ t ] is the integer part of t: the function depends on the spike train ω via spikes preceding the current time t.This is an implementation of causality.A range-R time dependent observable is a function [ t ]−D ).The decomposition (6) also holds for a time dependent range-R observable where f l (t) are now functions of time, and σ [ t ] is the [ t ]-iterate of the time shift operator σ.

Homogeneous Markov Chains and Gibbs Distributions
A "natural" way to characterize the statistics of observed spike trains is to associate them to a Markov chain with transition probabilities P n ω(n) ω n−1 n−D , where the index n in P n indicates that the transition probabilities depend on time n.This approach is "natural" because it captures causality by conditioning on the past spikes.We call D the memory depth of the chain and set R = D + 1.

Invariant Probability
Let us start the discussion when transition probabilities are independent of time (homogeneous Markov chain).In this case, we drop the index n in the transition probabilities, Assuming that all transition probabilities are strictly positive, it follows from the Perron-Frobenius theorem [46,47] that the Markov chain has a unique invariant probability p on A D .From the Chapman-Kolmogorov equation [46] one constructs, from p and transition probabilities, a probability measure µ on P (Ω, F ). where: As we now discuss, there is a natural correspondence between µ and exponential distributions of the form (4) (Gibbs distributions).

Transfer Matrix
Let us now consider a range-R observable: where u ) .By extension, for two blocks (u) , (u ) of range D ≥ 1 we say that the transition (u) → (u ) is legal if there is a block ω D 0 ∼ (u) (u ) .On this basis, one can construct a transfer matrix with positive entries: It follows from the Perron-Frobenius theorem [46,47] that L has a unique real positive eigenvalue s, strictly larger in modulus than the other eigenvalues, and with positive right eigenvector LR = sR, and left eigenvector LL = sL.Moreover, the range-R observable: defines an homogeneous Markov chain [48] with transition probabilities

Invariant Probability and Gibbs Distribution
The unique invariant probability of this Markov chain is: Using the Chapman-Kolmogorov Equation ( 7) one extends p to a probability µ on Ω, where, for m + n > D : This emphasizes the Markovian nature of the process since the conditioning has a finite time horizon of depth D.
It follows therefore that the probability of observing a spike block is the energy of the block ω n m .Note that, in contrast to (4) we have removed the − sign which has no reason to be present in this context, and is a source of nuisance when doing computations.
This establishes a first relationship with Gibbs distributions of the form (3), with a strong difference though.More precisely one has, ∃ A, B > 0 such that, for any block ω n 0 , which defines, in ergodic theory, a Gibbs measure in the sense of Bowen [49].Whereas we assumed (3) to hold on a finite set of states characterizing the system at a given time, here ω is a trajectory of the system describing its time evolution.In addition, the probability of the block ω n m is conditioned upon the past, which, in statistical physics would correspond to determine the probability of a block of binary variables (say spins) ω n m with left boundary conditions ω m−1 m−D−1 .This analogy is further developed in the next section.In the case of a Markov chain, the entropy (5) extends to: where we have dropped the Boltzmann constant as it plays no role here.Then, it can be shown that µ satisfies a variational principle [48,49]; it maximizes where ν is an invariant probability on P (Ω, F ).If E ν [ H ] is fixed this amounts to maximizing the entropy under the constraint that the average energy E ν [ H ] is fixed.Finally, the supremum F = S[ν ] + E ν [ H ] corresponds to the free energy; the generating function of cumulants.
We have therefore shown that a potential of the form ( 8) is associated with a homogeneous Markov chain where the invariant probability, extends the notion of Gibbs distribution, introduced in Section 2.1, to systems with memory where the probability to be in a state depends on a finite history.The extension to infinite history is made in the next section.As an important remark, the detailed balance condition is not required to define Gibbs distributions from Markov chains.

Chains with Infinite Memory and Gibbs Distributions
In the previous section we have made two important assumptions: (i) memory is bounded (finite memory depth D); (ii) the correspondence between Markov chains and Gibbs form was established for homogeneous Markov chains.
However, when considering neural networks, the memory may neither constant nor bounded: consider for example an Integrate and Fire model where the memory goes back to the last time in the past when the neuron has fired; in general, it is not possible to bound this time.So, the most general formalism is to consider chains with unbounded memory [50][51][52].Of course, as we discuss below, Markovian approximations are possible and useful.Still, one needs to properly control these approximations.In addition, we want to consider here the case of a system submitted to a time-dependent stimulus, where the dynamic is not time-translation invariant.
Thus, we are now considering a family of transition probabilities of the form , which represent the probability, that at time n, one observes the spiking pattern ω(n) given the (unbounded) network spike history.Such a non-Markovian stochastic process is known as a "chain with complete connections" or a "chain with unbounded memory" [30] defined in more detail here.This section follows very closely from [53].
such that the following conditions hold for every n ∈ Z: Definition 2. A probability measure µ in P (Ω, F ) is consistent with a system of transition probabilities {P n } n∈Z if: for all n ∈ Z and all F ≤n -measurable functions h.The probability measure µ, when it exists, is called a chain with complete connections consistent with the system of transition probabilities {P n } n∈Z .It is possible that multiple measures are consistent with the same system of transition probabilities.
We now give conditions ensuring the existence of a probability measure consistent with the system of transition probabilities [53].
Definition 3. A system of transition probabilities is non-null on Ω if, for all n ∈ Z and all We note, for n ∈ Z, m ≥ 0, and r integer: The intuitive meaning of continuity is the following.The quantity var m [P n [ω(n) | • ]] corresponds to the maximum variation one can observe on the probability of the spike state at time n, given that the history is fixed up to time n − m.Thus, continuity implies that this variation tends to zero as m tends to infinity: the further in the past that the spike sequence is fixed, the smaller the probability that the past influences the present.
The following result holds (see [53]): Theorem 1.A system of continuous transition probabilities on a compact space has at least one probability measure consistent with it.
Uniqueness requires additional technical assumptions [53].These conditions hold in the discrete time integrate and fire model [34] considered in Section 4.
Let us now elaborate on the link with Gibbs distributions.First, we define φ(n, ω and: Then: and: These equations emphasize the connection with Gibbs distributions in statistical physics where φ acts as an "energy" [53,54].From now on we use the term "potential" instead.The correspondence in our case is to consider "time" as a 1-dimensional "lattice" and the "boundary conditions" as the past ω m−1 −∞ of the stochastic process.In contrast to statistical physics, and because the potential is defined via transition probabilities, the normalization factor (partition function) is equal to 1.For this reason, we call φ a normalized Gibbs potential.
Equations ( 14) and ( 15) are similar to (9) with an essential difference: the memory is now infinite, and the potential φ has an infinite range.As it is well-known in statistical physics [38,39], infinite range potentials require specific conditions to be associated with a unique Gibbs distribution.There is a mathematically well-founded correspondence between chains with complete connections and Gibbs distributions [38,39,53].However, while chains with complete connections define transition probabilities where the present is conditioned upon the past, Gibbs distributions allow conditioning "upon the future" as well.More generally, Gibbs distributions in statistical physics extend to probability distributions on Z d where the probability (3) to observe a certain configuration of spins in a restricted region of space is constrained by the configuration at the boundaries of this region.They are therefore defined in terms of specifications [38,39], which determine finite-volume conditional probabilities when the exterior of the volume is known.In one spatial dimension (d = 1), identifying Z with a time axis, this corresponds to conditioning both in the past and in the future.In contrast, families of transition probabilities with an exponential continuity rate define the so-called left-interval specifications (LIS) [53,55].This leads to not equivalent notions of "Gibbsianness" [56].We shall not develop on these distinctions here and call Gibbs distribution a chain with the complete connection.

Linear Response for Neuronal Networks with Unbounded Memory
We consider a neural system where spike statistics is characterized by a time-translation invariant Gibbs distribution (chain with unbounded memory) µ (sp) where "sp" stands for "spontaneous".That is, we suppose that, in the absence of a stimulus, the spontaneous dynamics is stationary.We assume that a stimulus S(t) is applied from time t = t 0 , and that conditions of existence and uniqueness of a chain with complete connection µ are fulfilled in the presence of the stimulus (an example is given in the next section).We note n 0 = [ t 0 ].For times anterior to n 0 , µ identifies with µ (sp) , that is, for any m < n ≤ n 0 , for any block The goal is to establish an explicit (formal) equation for δ[ f (t, .)], as a function of the stimulus.This is done via a Volterra-like expansion in powers of the stimulus, cut to the first order so as to obtain a linear response in terms of a convolution between the stimulus and a convolution kernel K f , depending on f , δ (1) [ f (t, .)] = K f * S (t).This way, we obtain a relationship between the proportionality coefficient K f in the linear response, and specific correlation functions computed at equilibrium (spontaneous activity).This provides a Kubo relation holding in the case of neuronal networks with unbounded memory and for an arbitrary range-R observable f .In contrast to Volterra expansion, our formalism allows to explicit the dependence of K f in the neuronal network characteristics (parameters fixing the individual neuron dynamics and connectivity-synaptic weights).An example is provided in the next section.
From the definition ( 12), e φ (sp) corresponds to the family of transition probabilities P (sp) defining the Gibbs distribution µ (sp) in the spontaneous regime, whereas e φ corresponds to the family of transition probabilities { P } defining the Gibbs distribution µ in time-dependent stimuli-evoked regime.For n > n 0 , we have: which, from Equation ( 13), gives: Taking the first-order approximation of the exponential, we obtain: However, while Φ(n 0 + 1, n, ω ) and Φ (sp) (n 0 + 1, n, ω ) are normalized potentials, i.e., the log of a conditional probability, the first order approximation of e Φ (sp) (n 0 +1,n,ω ) [1 + δΦ(n 0 + 1, n, ω) ] is not.Normalization is obtained formally by introducing the partition function: constrained by the past sequence ω n 0 −∞ , so that the quantity is the first order approximation of P ω n n 0 +1 ω n 0 −∞ .Setting: we have, to first order: However, as Φ (sp) is the log of a conditional probability, Z (sp) ω n 0 −∞ = 1.So, finally, we obtain, to first order: where E (sp) [ ] denotes the expectation with respect to µ (sp) .We use E (sp) [ ] instead of E µ (sp) [ ] to alleviate notations.

Time Dependent Average of an Observable
We now consider a time-dependent observable f with finite range R ≡ R f .We assume Here, a note of explanation is necessary.Functions f (t, ω) are random functions, where the randomness comes from ω. So, the law of f (t, ω) is determined by the probability µ.E n [ f (t, .)] is the average of the continuous time dependent observable f (t, ω), averaged over the discrete time spike train ω, up to the discrete time n = [ t ] (by definition f (t, .)does not depend on spike events occurring at times posterior to n).Note that this average cannot be defined by an ergodic time average procedure as, here, the probability is non stationary (see Section 5 for a numerical implementation).
Because f has finite range R f we may write: The last equality holds because f (t, ω ) is independent of ω n 0 −∞ .Thus, using (15): where the last equation holds because, on F ≤n 0 , µ = µ (sp) .
Thus, replacing −∞ , we obtain, up to first order: The first term is E (sp) [ f (t, .)] from (15).The second term is: from the consistency property (10), and because by assumption (n For the third term: However, by assumption f (t, ), whereas by definition of the conditional expectation E (sp) δΦ(n 0 + 1, n, ω) | ω n 0 −∞ is the projection on the sigma-algebra F ≤n 0 .As a consequence, we have: Summing up, we have, using ( 13): Using the correlation function: This equation expresses that the time-dependent variation in the average of an observable f is expressed, to the first order, as a time series of correlation functions, between f and the time-dependent variation of the normalized potential, computed with respect to the equilibrium distribution.This is our main result.
It is similar to the fluctuation-dissipation theorem in statistical physics [32,33].Here, it holds for Gibbs distributions with infinite range potential φ(t, ω).A crucial point is the convergence of the series when the initial time of perturbation, n 0 tends to −∞.This holds if correlations C (sp) [ f (t, .),δφ(r, .)] decay sufficiently fast, typically, exponentially.We consider only this case in this article.
One of the advantages of this relationship is that averages are taken with respect to µ (sp) .In the case of experimental data, these averages can be approximated by empirical averages on spontaneous activity.The monomial decomposition of ( 16) can be found in the Appendix B. The Leaky Integrate and Fire model (LIF) consider a point neuron k (without spatial extension), with membrane potential V k , membrane capacity C k , resistance R, submitted to a current I k (t).Call θ the spiking threshold.The sub-threshold dynamics is: If there is a time t k such that the membrane potential of neuron k reaches the firing threshold, V k (t k ) ≥ θ, the neuron k fires an action potential, i.e., it emits a spike and the membrane potential of neuron k is reset to a fixed reset value V res instantaneously.The neuron's membrane potential remains at this value during a time denoted by ∆ called "refractory period", i.e., V k To illustrate our results we use a "simple" model corresponding to a discrete-time LIF network [34].The main reason to choose this model is to facilitate numerical simulations as handling continuous-time dynamics with spikes event, although manageable mathematically [57] is difficult to handle numerically [58].To further simplify the analysis we fixing the sampling time dt = 1, the capacitance The evolution of the membrane potential V k is ruled by the following sub-threshold equation: Note that by setting C k = 1 we are not considering units of measurements anymore.We now explicitly consider an interconnected network of neurons.The network of synaptic connectivity is represented by a matrix of components W kj which can be positive or negative to characterize inhibition or excitation respectively.We consider random fluctuations by adding a standard Gaussian additive noise ξ k (n) controlled by the amplitude σ B .We also consider a constant stimulus I 0 and a time dependent stimulus S k (t).Thus I k (t) = ∑ j W kj ω j (n) + I 0 + S k (t) + σ B ξ k (n), and thus our equation finally reads: Note that γ, the decay term, is related to the leak characteristic time (18) by: The condition γ < 1 define the exponential decay in the spike history dependence via the characteristic time: This characteristic time can be interpreted as follows.Integrating Equation (18) up to the last time where voltage was reset in the past, τ k (n, ω) gives the following: where: The first term on the right-hand side of (20) corresponds to the synaptic contribution.The second corresponds to the integration of the constant term I 0 , used to fix the baseline activity.The third term corresponds to the integration of a time-dependent stimulus S k (t) and the fourth to the integrated noise term with intensity σ B .
For each n ∈ Z, conditionally to ω n−1 −∞ , V k (n) is Gaussian random variable.It can be decomposed in the following way: We now consider the "spontaneous" voltage V (sp) k (n, ω) and the evoked response due to the external time dependent stimulus

Transition Probabilities of the Discrete Time LIF Model
In the limit of small σ B , the family of transition probabilities can be written explicitly in terms of the parameters of the spiking neuronal network model.The discrete-time LIF model is conditionally independent i.e., it factorizes over neurons once the spike history has been fixed [34].The same property is held for the continuous-time version of this model [59].However, a more complete version of this model also includes electric synapses [57].In that case, the conditional independence is lost. where: and: where, corresponds to the variance of the noise integrated up to time n − 1. Combining ( 12) and (22), we obtain the normalized potential for the discrete time LIF model. (24)  which depends on all of the parameters of the network via the variable X k (n − 1, ω) (23).
Remark 1.The function Π(x) is a sigmoid which tends to 1 when x → −∞ and tends to 0 when x → ∞.This has two consequences: This expresses that the probability of having a spike at time n when X k (n − 1, ω) becomes large (neuron strongly hyper-polarized) tends to 0. The same argument holds mutatis mutandis for the limit X k (n | is large (neuron either strongly hyper-polarized or strongly depolarized) the effect of a variation of the membrane potential on the firing probability is negligible.Thus, we study the effect of a perturbation in bounded range for X k (n − 1, ω): uniformly in n, ω.This is ensured by natural assumptions on synaptic weights and on σ B , the mean-square deviation of the noise, which has to be bounded away from 0.

Expansion of the Normalized Potential
The normalized potential of the Discrete time LIF model can be separated into: (i) a "spontaneous" part φ (sp) (ω), which before time t 0 is independent of the stimuli and time and; (ii) a "perturbation" part δφ(n, ω) depending on a time-dependent stimuli, which is non-zero from time t 0 .Mathematically this is achieved by adding an extra term to the spontaneous potential after time t 0 .
From Section 3 this perturbation induces a time-dependent variations on the average of an observable f : (sp) .Thus, the term E (sp) [ f (n, •) ] refers to an average with respect to the unperturbed system and δ (1) [ f (n, •) ].As we show, the variation δ (1) can be explicitly written in terms of the variation on the normalized potential induced by the introduction of the stimulus.
Note that if the external stimuli are switched on at time t 0 , spike statistics are still constrained by the previous spontaneous activity, since transition probabilities have memory.This effect is especially salient in the discrete-time LIF model which has an unbounded memory.
We rewrite (23) in the form: where ω) is independent of the stimulus, and: where the last equality holds because S(n) = 0 for n < t 0 .
In the next computation we write X to alleviate notations.We make a series expansion of φ k (n, ω ) at X (sp) k , under the conditions of (25).We have: where a (u) and b (u) are the u-th derivative of log Π(x) and log(1 − Π(x)).In particular: . Therefore, where: δφ This expansion holds for any value of X (sp) k (n − 1, ω).However, when this quantity becomes large in absolute value, one has to consider more and more terms in the expansion to approach sufficiently well the function δφ k (n, ω ).This is a well-known property of the function Π (which can be written in terms of the error function): the Taylor expansion converges very slowly near infinity and other expansions are more efficient (e.g., Bürmann series [60]).Here, we consider the effect of a perturbation in a range where the function Π does not saturate.In addition, we restrict ourselves to cases where the first order of the Taylor expansion is sufficient to characterize the response.This is ensured by conditions of the form (the same holds mutatis mutandis for b): Applied to the second order this gives a condition: which becomes more and more restrictive as one gets away from away from the linear region of the sigmoid Π.Note that the condition X (sp) k (n − 1, ω) ∼ 0 corresponds to a voltage close to the firing threshold.We insist that this constraint is not a limitation of our approach, but instead, a limitation of linear response applied in neuronal systems where the response of a neuron is characterized by a saturating function.Away from the linear part of the sigmoid, nonlinear effects dominate the response.In the numerical simulations, we use the mean-field approximation where we replace τ k (r − 1, .) by its average value µ where the inverse 1 ν k of the average firing rate ν k of neuron k is the mean time between two spikes.
Under these conditions we obtain: where we have replaced n by r in view of ( 16).

The function δφ
(1) k (r, ω ) is the first order variation of normalized potential when neurons are submitted to a weak time dependent stimulus, under the approximation (24).There are two contributions.The integral includes the effect of the stimulus on the dynamics flow; the term H k (r, ω), given by Equation ( 26), contains the effect of the network via the terms a Equation (21).Note that the dependence on synaptic weights is non-linear because a (1)  and b (1) are non-linear.
In order to study the linear response theory in this model, one has to properly handle numerically: (1) The spike train statistics, especially finding a numerical way to perform a suitable averaging, not only for the spontaneous probability measure where time-ergodic average can be used but also for the non-stationary response where ergodicity does not take place; (2) The long memory tail in the dynamics; (3) Find an illustrative example with a good range of parameters showing convincing, original, non-trivial results while avoiding prohibitive computational times.
It is not evident to us that classical spiking neural network simulators such as BRIAN [61], although quite efficient, could easily handle (1) in conjunction with (2).The model presented below, has been studied both from the mathematical side and the numerical side [8,34].Additionally, we have designed a simulation tool, PRANAS, devoted to the analysis of population spike train statistics and allowing to properly handle Gibbs distributions from numerical simulations or experimental data, with a specific module dedicated to this model [62].This software is freely downloadable at https://team.inria.fr/biovision/pranas-software/on simple demand.The linear response C++ codes and instructions required to reproduce these numerical results can be found at https://github.com/brincolab/Linear-response.

Linear Response
We apply our main result (16) to the potential of the discrete-time model (24).Under the approximations made in the previous section, we examine two forms of linear response proposed in the paper.

1.
The first-order expansion of the potential, (16).In the present context it becomes: where we have set k (r,ω) σ k (r−1,ω) .In the numerical simulations we use a Markovian approximation with memory depth D such that the sum ∑ n r=−∞ is replaced by ∑ n r=n−D .D is typically determined by exponential decay of the terms ) ], which is controlled, on one hand, by γ (with a char- acteristic time τ γ = − 1 log γ ), and the correlation which is controlled by the spectral gap in the Perron-Frobenius matrix.In this approximation we have: Using the stationarity of µ (sp) we have Then, introducing the N × D matrix K (1) with entries : we may write δµ (1) [ f (n) ] in the form: The first order Hammersley-Clifford expansion of H k which corresponds to expand ζ k (0, ω) to the lowest order in the Hammersley-Clifford expansion, k ω k (0) with: γ (1) . This gives an approximation similar to the fluctuation-dissipation theorem where the linear response is a sum of pairwise correlation functions.In the discrete time model it reads: where : K We have used the superscript "HC 1 " to refer to the lowest-order Hammersley-Clifford approximation from (28).The interest of this approximation (and more generally, of the Hammersley-Clifford expansion is that it can be obtained without knowing explicitly the potential φ (by a mere fit of the coefficients γ (1) k ).We expect however (28) to give a better approximation of the linear response than (29).Note that the effective threshold corresponds to θ , is independent of k in this case.This explains why γ (1) gets out of the sum and lost its index k in (29).

Averaging Method
We need to compute numerically averages with respect to the invariant probability µ (sp) .As µ (sp) is ergodic it is in principle possible to get them by time averaging.However, we also want to compute averages in the presence of a stimulus, where ergodicity does not take place.In addition, a notation like E (sp) [ f (n, •) g(r, •) ] appearing all over the paper involves a subtlety: we are computing the average of time-dependent quantities (via the first argument in f (n, •), g(r, •)) with respect to an invariant probability on the second argument, ω.The mathematical meaning is the following: We numerically compute such quantities by generating M spike trains, denoted ω (m) , m = 1 . . .M, of length T, with the spontaneous dynamics, thus distributed according to µ (sp) .Then: The equality holds in the limit M → ∞.Here, we chose M = 10,000 or M = 100,000.
Fluctuations about the mean are ruled by the central limit theorem, so they decrease like with a proportionality factor depending on the observables f and g.

Results
We consider a one-dimensional lattice of N = 30 neurons, separated by a lattice spacing d x , with null boundary conditions.The connectivity is depicted in Figure 2.Each neuron excites its neighbours with a weight W k±1,k = w + > 0 and inhibits its second neighbours with a weight W k±2,k = w − < 0. We compare the spontaneous activity (with a noise term and a constant term I 0 to fix the baseline activity, as in ( 18)) to the dynamics in the presence of Gaussian pulse, with width ∆, propagating at speed v from left to right: where x, t are continuous space-time variables.The neuron k is located at x k = k δ and time is updated at each t = n b where b is a time bin.In the simulations d x = 1 mm, v = 2 mm/s, b = 10 ms, ∆ = 1 mm.The term A represent the amplitude and is variable to show the validity of the linear response theory as the amplitude of the stimulus increases.
We fix θ = 1, γ = 0.6, corresponding to a characteristic decay time (19) τ γ = − 1 log γ ∼ 2. The memory D appearing in the summations (28), ( 29) was taken to be D = 10 from the study of correlation decay rate.This is a good compromise between the convergence of these sums and the computational time.
The constant stimulus I 0 is fixed to have a good baseline activity.We consider the activity without and with stimulus with moderate excitatory connectivity and strong inhibitory connectivity (w + = 0.2, w − = 2).In Figure 3 we show the spike network activity in spontaneous activity (left) and in the presence of a moving stimulus (right).The strong inhibition is particularly visible in the presence of the stimulus.

Linear Response for Firing Rates
We first present the results of linear response for the observable f (n, ω) = ω k c (n), where k c = N 2 is the index of the neuron located at the center of the lattice.Thus, µ[ f (n, ω) ] ≡ r(k c , t), the firing rate of this neuron as a function of time.In spontaneous activity it is a constant; under stimulation it depends on time.In Figure 4 we show the effect of the stimulus on the average value of this observable for different amplitudes values.One observes the combined effect of the stimulus and of the connectivity.(30).Orange: Linear response computed from Equation (28).Green: Linear response computed from Equation (29).
We next studied how correlation functions in spontaneous activity depend on space and time.One observes that they decay relatively fast with the time delay of m (Figure 5).In addition, they are multiplied by γ m in ( 28), (29).Therefore the contribution to the linear response series decay exponentially fast and the series can be truncated to low order.Here we took a maximal order D = 10.Finally, as shown in Figure 4, we compute the linear response δµ (1) [ f (t) ], δµ (HC 1 ) [ f (t) ] and compare them to the response obtained by empirical averages.

Linear Response for Higher Order Observables
Here, we consider the pairwise observable 2 .This is an example of an observable with a time delay.Neurons k c − 2 and k c mutually inhibit each other so we expect that the state of neuron k c − 2 before n impact the state of neuron k c at time n.However, the correlation between those states depends as well on the state of the other neurons.
Similarly to the previous section we have plotted in Figure 6 the empirical estimation of the linear response under the two approximations (28), (29).
Here, we consider the same amplitudes and as Figure 4.

Validity of the Linear Response
The linear response is expected to hold when the stimulus amplitude is weak.What does that mean?A mathematical answer is given by Equation ( 27) although remaining at a rather abstract level.Here, we compute the distance: between the theoretical curves δµ (th) [ f (n) ] and the experimental curve δµ (exp) [ f (n) ], as a function of the stimulus amplitude A, in the two examples of observable investigated here.Note that distance here is not normalized: it does not take into account the amplitude of the response.This explains why the distance is larger in the case of firing rates than in the delayed pairwise case, as in the latter the norm of the curve is quite smaller.The result is presented in Figure 7.As expected, the error in both cases increases with the amplitude of the stimulus, and it increases slower for (28) than for the lowest order Hammersley-Clifford expansion (29).It is interesting to see how the curves differ when A increases (see Figures 4 and 6).The empirical average curves clearly exhibit a non-linear saturation (e.g., firing rate cannot exceed 1) that is not reproduced by the linear response theory.This is further commented on in the discussion section.32) between the curves δµ (1) [ f (n) ], δµ (HC 1 ) [ f (n) ] and the empirical curve, as a function of the stimulus amplitude A. The left panel shows the distance between rate curves (Section 5.3) and right panel distance between pairwise with delay (Section 5.4).

Comments on Numerical Results
With these simulations, we have been able to illustrate our results.We are able to predict the time variation of an observable under a non-stationary stimulation, from the mere knowledge of the stimulus and the spontaneous statistics.In particular, our results hold for observables with time delays, considerably enlarging the scope of linear response theories in neuronal networks.The comparison with classical fluctuation theory also emphasizes the role played by higher-order terms.Linear response requires that the stimulus has a weak enough amplitude.We see very well this effect numerically.As the amplitude A of the stimulus increases, we check the increasing discrepancy between the empirical averages and the prediction.

Discussion
In this paper, we have addressed the following question: How is the average of an observable f (t, ω) affected by a weak time-dependent stimulus.We studied this question in a theoretical setting, using the linear response theory and probability distributions with unbounded memory generalizing the usual definition (3) of Gibbs distributions in statistical physics courses.Our goal was to show a general mathematical formalism allowing one to handle spike correlations as a result of neuronal network activity in response to a stimulus.The most salient result of this work is that the difference of an observable average in response to a time-dependent external stimulus of weak amplitude can be computed from the knowledge of the spontaneous correlations, i.e., from the dynamics without the stimulus.This result is not surprising from a non-equilibrium statistical physics perspective (Kubo relations, fluctuation-dissipation relation [31,33]).However, to the best of our knowledge, this is the first time it has been established for spiking neuronal networks.The novelty of our approach is that it provides a consistent treatment of the expected perturbation of higher-order correlations, going in this way, beyond the known linear perturbation of firing rates and instantaneous pairwise correlations; in particular, it extends to time-dependent correlations.
In addition, we made explicit the linear response kernel in terms of the parameters determining individual networks dynamics and neuron connectivity.We have provided an explicit example of this for a well-known class of models, the LIF model.This makes explicit the role of the neuronal network structure (especially synaptic weights) in the spiking response.As we show, the stimulus-response and dynamics are entangled in a complex manner.For example, the response of a neuron k to a stimulus applied on neuron i does not only depends on the synaptic weight W ki but, in general, on all synaptic weights, because the dynamics create complex causality loops which build up the response of neuron k [8,43,44].Our linear response formula is written in terms of the parameters of a spiking neuronal network model and the spike history of the network.Although a linear treatment may seem a strong simplification, our results suggest that already, in this case, the connectivity architecture should not be neglected.In the presence of stimuli, the whole architecture of synaptic connectivity, history and the dynamical properties of the networks are playing a role in the correlations through the perturbed potential.This agrees well with results from a recent study exhibiting an exact analytical mapping between neuronal network models and maximum-entropy models, showing that, in order to accurately describe the statistical behavior of any observable in the Maximum Entropy model, all the synaptic weights are needed, even to predict firing rates of single neurons [8].
Higher order terms can not only play a significant role in spatial terms, as shown in [63,64], but also in temporal terms.Indeed, as argued throughout this paper, neuronal interactions involve delays that have to be integrated into a model attempting to explain spike statistics [21].Note, however, that contrary to what is usually believed, detailed balance is absolutely unnecessary in order to properly handle time correlations [65][66][67].Note also that binning-which can be convenient to remove short-range time-correlations from the analysis-dramatically changes the nature of the process under investigation, rendering it non-Markovian [68] We have also introduced the monomial expansion by providing a canonical way of decomposing the potential, describing the stationary dynamics.Moreover, the Hammersley-Clifford decomposition allows us to obtain the coefficients weighting the monomials in terms of the parameters constraining dynamics.In the case of the discrete-time LIF model, this allowed us to show the explicit dependence of the coefficients in terms of synaptic weights.Although the basis of monomials is quite huge, standard results in ergodic theory and transfer matrices/operators state that we can neglect high order terms because of the exponential correlation decay.
In the example we present we study the following question: What is the role of this lateral connectivity in motion processing in sensory neurons?Clearly, one may expect it to induce spatial and temporal correlations in spiking activity, as an echo, a trace, of the object's trajectory.These correlations cannot be read in the variations of firing rate; they also cannot be read in synchronous pairwise correlations as the propagation of information due to lateral connectivity necessarily involves delays.This example raises the question about what information can be extracted from spatio-temporal correlations in a network of connected neurons submitted to a transient stimulus.
Our results are written in terms of Kernels which can be found for any observable, generalizing the concept of receptive fields to general spatio-temporal observables beyond firing rates.This is the analogous of having "population receptive fields" which are an extension of the concept of receptive-field usually associated individual sensory neurons.potential) can also be approximated by a finite-range potential φ (R) in the sup norm where φ − φ (R) ≤ CΘ R , for some 0 < Θ < 1 [69].Equivalently the chain with infinite memory can be replaced by a Markov chain of memory depth D, D = R + 1.Therefore, δφ can be approximated by a range-R potential δφ (R) .Using the monomial decomposition (6), we have: δφ(r, ω) ∼ ∑ l δφ l (r)m l (ω r r−D ).
In this setting, ( 16) becomes: f l (t) C l,l (n − r) δφ l (r), so that the linear response is a linear decomposition of monomial correlations computed at equilibrium.We can write this in a more compact form introducing the Ldimension vectors f (t) = f l (t) where | denotes the standard scalar product.This has three important consequences.

Appendix B.2. The Convolution Kernel
For a time independent observable f (ω) we introduce K f (m) the L × L matrix with entries: K f ;l,l (m) = f l C l,l (m), so that: which is a discrete convolution.It is close to the form (1) with the difference that the time is discrete, coming from our spike trains discretization.The stimulus, explicit in ( 1) is here hidden in δφ(r).We give an illustration of this in the next section.However, the fundamental result is that the convolution kernel defined this way is a linear combination of monomial correlations functions.This has, therefore, the form of a Kubo equation where the monomials play the role of the physical quantities introduced in Section 2.1.
As mentioned above, the main and essential difference is that, in contrast to Physics, we have no a priori idea which monomials are the most important (except straightforward arguments on the decaying probability of high order monomials).There is no known principle to guide the choice.

4 .
An Example: Linear Response in a Conductance Based Integrate and Fire Model 4.1.Discrete Time Integrate and Fire Network Model

Figure 2 .
Figure 2. Connectivity pattern of the model.Neurons (black filled circles) are located into a onedimensional lattice with spacing d x .They are connected to nearest neighbors through excitatory connections (red) with weight w + and to second nearest neighbors through inhibitory connections (blue) with weight w − .These neurons are submitted to a space-time dependent stimulus S(x, t) (top line, blue trace) traveling at speed v from left to right.This modifies the average activity µ[ f (x, t) ] by a variation δµ[ f (x, t) ] (bottom trace).

Figure 3 .
Figure 3. Spiking activity of the network.Left panel shows the spontaneous activity and right panel in the presence of a moving stimulus (31).

Figure 5 .
Figure 5. Correlation functions corresponding to the firing rate of the neuron k c = N 2 as a function of the neuron index k (abscissa), for different values of the time delay m. (Left) correlations with stimulus.(Right) correlations in the spontaneous regime.Top.m = 0, middle m = 1, bottom m = 2, 3, 9.

Figure 7 .
Figure 7. d 2 distance(32) between the curves δµ(1) [ f (n) ], δµ (HC 1 ) [ f (n) ] and the empirical curve, as a function of the stimulus amplitude A. The left panel shows the distance between rate curves (Section 5.3) and right panel distance between pairwise with delay (Section 5.4).

Author
Contributions: conceptualization, B.C. and R.C.; methodology, B.C. and R.C.; software, B.C and I.A.; writing-original draft preparation, B.C. and R.C.; writing-review and editing, B.C. and R.C.; visualization, B.C., R.C. and I.A.; supervision, B.C. and R.C.; project administration, B.C. and R.C.; funding acquisition, B.C. and R.C.All authors have read and agreed to the published version of the manuscript.Funding: All authors have been supported by the MAGMA-EQA-041903 INRIA associated team.RC was supported by Fondecyt Iniciación 2018 Proyecto 11181072.BC was supported by the French government through the UCA-Jedi project managed by the National Research Agency (ANR-15-IDEX-01) and, in particular, by the interdisciplinary Institute for Modeling in Neuroscience and Cognition (NeuroMod) of the Université Côte d'Azur.
t) δφ l (r) C (sp) m l (ω n n−D ), m l (ω r r−D ) .However, µ(sp) is stationary by assumption.Hence the correlation C (sp) m l (ω n n−D ), m l (ω r r−D ) only depends on the two monomials m l , m l and on the time lag n − r, i.e., C(sp) m l (ω n n−D ), m l (ω r r−D ) = C (sp) [m l • σ n−r , m l ],that we write C l,l (n − r) to alleviate notations.Thus: