*Entropy* **2015**, *17*(7), 4701-4743; doi:10.3390/e17074701

## Abstract

We study the asymptotic law of a network of interacting neurons when the number of neurons becomes infinite. Given a completely connected network of neurons in which the synaptic weights are Gaussian correlated random variables, we describe the asymptotic law of the network when the number of neurons goes to infinity. We introduce the process-level empirical measure of the trajectories of the solutions to the equations of the finite network of neurons and the averaged law (with respect to the synaptic weights) of the trajectories of the solutions to the equations of the network of neurons. The main result of this article is that the image law through the empirical measure satisfies a large deviation principle with a good rate function which is shown to have a unique global minimum. Our analysis of the rate function allows us also to characterize the limit measure as the image of a stationary Gaussian measure defined on a transformed set of trajectories.

## 1. Introduction

The goal of this paper is to study the asymptotic behavior and large deviations of a network of interacting neurons when the number of neurons becomes infinite. Our network may be thought of as a network of weakly-interacting diffusions: thus, before we begin, we briefly overview other asymptotic analyses of such systems. In particular, a lot of work has been done on spin glass dynamics, including Ben Arous and Guionnet on the mathematical side [1–4] and Sompolinsky and his co-workers on the theoretical physics side [5–8]. Furthermore, the large deviations of weakly interacting diffusions have been extensively studied by Dawson and Gärtner [9,10] and, more recently, by Budhiraja, Dupuis and Fischer [11,12]. More references to previous work on this particular subject can be found in these references.

Because the dynamics of spin glasses is not too far from that of networks of interacting neurons, Sompolinsky also successfully explored this particular topic [13] for fully connected networks of rate neurons, i.e., neurons represented by the time variation of their firing rates (the number of spikes they emit per unit of time), as opposed to spiking neurons, i.e., neurons represented by the time variation of their membrane potential (including the individual spikes). For an introduction to these notions, the interested reader is referred to such textbooks as [14–16]. In their study of the continuous time dynamics of networks of rate neurons, Sompolinsky and his colleagues assumed, as in the work on spin glasses, that the coupling coefficients, called the synaptic weights in neuroscience, were random variables independent and identically distributed with zero mean Gaussian laws. The main result obtained by Ben Arous and Guionnet for spin glass networks using a large deviations approach (respectively by Sompolinsky and his colleagues for networks of rate neurons using the local chaos hypothesis) under the previous hypotheses is that the averaged law of Langevin spin glass (respectively rate neurons) dynamics is chaotic in the sense that the averaged law of a finite number of spins (respectively neurons) converges to a product measure as the system gets very large.

The next theoretical efforts in the direction of understanding the averaged law of rate neurons are those of Cessac, Moynot and Samuelides [17–21]. From the technical viewpoint, the study of the collective dynamics is done in discrete time, assuming no leak (this term is explained below) in the individual dynamics of each of the rate neurons. Moynot and Samuelides obtained a large deviation principle and were able to describe in detail the limit averaged law that had been obtained by Cessac using the local chaos hypothesis and to prove rigorously the propagation of chaos property. Moynot extended these results to the more general case where the neurons can belong to two populations, the synaptic weights are non-Gaussian (with some restrictions) but still independent and identically distributed, and the network is not fully connected (with some restrictions) [18].

The common thread to all of the above approaches is that, in the large network limit, the neurons are (probabilistically) independent of each other. This independence is desirable because it facilitates a reduction to the macroscopic level, since the net activity of the network can be accurately represented by the mean activity of any particular neuron. However, as our results further below demonstrate, complete independence between the neurons is not the only situation in which one may obtain an accurate reduction to the macroscopic level. We are therefore motivated to incorporate in the network model the fact that the synaptic weights are not independent and in effect often highly correlated. One of the reasons for this is the plasticity processes at work at the levels of the synaptic connections between neurons; see for example [22] for a biological viewpoint, and [14,16,23] for a more computational and mathematical account of these phenomena.

Our results imply that there are system-wide correlations between the neurons, even in the asymptotic limit. The key reason why we do not have propagation of chaos is that the Radon-Nikodym derivative $\frac{d{Q}^{N}}{d{P}^{N}}$ of the average laws in Proposition 8 cannot be tensored into N independent and identically distributed processes; whereas the simpler assumptions on the weight function Λ in Moynot and Samuelides allow the Radon-Nikodym derivative to be tensored. We remind the reader that the Radon-Nikodym derivative of a measure with respect to another measure is an extension to more general spaces of the following simple result from differential calculus: given two differentiable functions F(x) and G(x) defined on ℝ with derivatives f(x) and g(x), the ratio of the differentials dF(x) and dG(x) is equal to f(x)/g(x) whenever g(x) ≠ 0. In this example, the first measure is f(x) dx and the second g(x) dx. The interested reader may look at standard textbooks on real and complex analysis such as [24].
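The calculus analogy above can be written out as a short derivation; a minimal sketch in the notation of the paragraph:

```latex
% If dF = f(x)dx and dG = g(x)dx on the real line, then the
% Radon-Nikodym derivative dF/dG is the ratio of the densities.
\[
  F(x) = \int_{-\infty}^{x} f(y)\,dy, \qquad
  G(x) = \int_{-\infty}^{x} g(y)\,dy,
\]
\[
  \frac{dF}{dG}(x)
  = \lim_{h \to 0} \frac{F(x+h) - F(x)}{G(x+h) - G(x)}
  = \frac{f(x)}{g(x)}
  \quad \text{whenever } g(x) \neq 0.
\]
```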

A very important implication of our result is that the mean-field behavior is insufficient to characterize the behavior of a population. Our limit process µ_{e} is system-wide and ergodic. Our work challenges the assumption held by some that one cannot have a “concise” macroscopic description of a neural network without an assumption of asynchronicity at the local population level.

In more detail, the problem we solve in this paper is the following. Given a completely connected network of firing rate neurons in which the synaptic weights are Gaussian correlated random variables, we describe the asymptotic behavior of the network when the number of neurons goes to infinity. As in [18,19] we study a discrete time dynamics, but unlike these authors we cope with more complex intrinsic dynamics of the neurons; in particular we allow for a leak (to be explained in more detail below). In the large-size limit, the neurons are highly correlated. The probabilistic law is ergodic, which basically means that it is invariant under a shift of the indices. Despite the non-trivial correlations, we are able to obtain a macroscopic process µ_{e} which describes the large-size behavior of the system. Furthermore we are able to obtain various “reductions” to the macroscopic level, as outlined in Section 6.

To be complete, let us mention the fact that this problem has already been partially explored in physics by Sompolinsky and Zippelius [5,6] and in mathematics by Alice Guionnet [4], who analyzed symmetric spin glass dynamics, i.e., the case where the matrix of the coupling coefficients (the synaptic weights in our case) is symmetric. This is a very special case of correlation. The work in [25] is also an important step forward in the direction of understanding the spin glass dynamics when more general correlations are present.

Let us also mention very briefly another class of approaches toward the description of very large populations of neurons where the individual spikes generated by the neurons are considered. The model for individual neurons is usually of the class of Integrate and Fire (IF) neurons [26] and the underlying mathematical tools are those of the theory of point-processes [27]. Important results have been obtained in this framework by Gerstner and his collaborators, e.g., [28,29] in the case of deterministic synaptic weights. Related to this approach but from a more mathematical viewpoint, important results on the solutions of the mean-field equations have been obtained in [30]. In the case of spiking neurons but with a continuous dynamics (unlike that of IF neurons), the first author and collaborators have recently obtained some limit equations that describe the asymptotic dynamics of fully connected networks of neurons [31] with independent synaptic weights.

Because of the correlation of the synaptic weights, the natural space to work in is the infinite dimensional space of the trajectories, noted ${\mathcal{T}}^{\mathbb{Z}}$, of a countably-infinite set of neurons and the set of stationary probability measures defined on this set, noted ${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$.

We introduce the process-level empirical measure, noted
${\widehat{\mu}}_{N}$, of the N trajectories of the solutions to the equations of the network of N neurons and the averaged (with respect to the synaptic weights) law Q^{N} of the N trajectories of the solutions to the equations of the network of N neurons. The first result of this article (Theorem 1) is that the image law Π^{N} of Q^{N} through
${\widehat{\mu}}_{N}$ satisfies a large deviation principle (LDP) with a good rate function H which is shown to have a unique global minimum, µ_{e}. We remind the reader that the notion of an image law is simply an extension to more complicated objects than functions, i.e., probability measures, of the usual notion of a change of variables. The interested reader is referred to, e.g., the textbook [32]. Thus, with respect to the measure Π^{N} on ${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$, if the set X contains the measure
${\delta}_{{\mu}_{e}}$, then Π^{N}(X) → 1 as N → ∞, whereas if
${\delta}_{{\mu}_{e}}$ is not in the closure of X, Π^{N}(X) → 0 as N → ∞ exponentially fast and the constant in the exponential rate is determined by the rate function. Our analysis of the rate function allows us also, and this is our second result (Theorem 3), to characterize the limit measure µ_{e} as the image of a stationary Gaussian measure
$\underline{{\mu}_{e}}$ defined on a transformed set of trajectories
${\mathcal{T}}^{\mathbb{Z}}$. This is potentially very useful for applications since
$\underline{{\mu}_{e}}$ can be completely characterized by its mean and spectral density. Furthermore the rate function allows us to quantify the probability of finite-size effects. Theorems 1 and 3 allow us to characterize the average (over the synaptic weights) behavior of the network. We also derive, and this is our third result, some properties of the infinite-size network that are true for almost all realizations of the synaptic weights (Theorems 4 and 6).

The paper is organized as follows. In Section 2 we describe the equations of our network of neurons, the type of correlation between the synaptic weights, define the proper state spaces and introduce the different probability measures that are necessary for establishing our results, in particular the process-level empirical measure,
${\widehat{\mu}}_{N}$, Π^{N} and the image R^{N} through
${\widehat{\mu}}_{N}$ of the law of the uncoupled neurons. We state the principle result of this paper in Theorem 1.

In Section 3 we introduce a certain Gaussian process attached to a given measure in
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ and
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{N}\right)$ and motivate this introduction by showing that the Radon-Nikodym derivative of Q^{N} with respect to the law of the uncoupled neurons can be expressed by the Gaussian process corresponding to the empirical measure
${\widehat{\mu}}_{N}$. This allows us to compute the Radon-Nikodym derivative of Π^{N} with respect to R^{N} for any measure in
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$. Using these results, Section 4 is dedicated to the proof of the existence of a strong LDP for the measure Π^{N}. In Section 5 we show that the good rate function obtained in the previous section has a unique global minimum and we characterize it as the image of a stationary Gaussian measure. Section 6 is dedicated to drawing some important consequences of our first main theorem, in particular some quenched results. Section 7 explores some possible extensions of our work and we conclude with Section 8.

## 2. The Neural Network Model

We consider a fully connected network of N neurons. Not all sets of neurons are fully connected but many are, e.g., within the same cortical column. One of the major aims of this article is to quantify how quickly the system converges to its limit, so the rate function gives us a means of assessing whether the number of neurons in a cortical column is sufficiently high for the mean field equations to be accurate. For simplicity but without loss of generality, we assume N odd [33] and write N = 2n + 1, n ≥ 0. The state of the neurons is described by the variables $\left({U}_{t}^{j}\right),j=-n,\cdots ,n,t=0,\cdots ,T$ which represent the values of the neurons' membrane potentials.

#### 2.1. The Model Equations

The equation describing the time variation of the membrane potential U^{j} of the jth neuron writes

$${U}_{t}^{j}=\gamma {U}_{t-1}^{j}+\sum _{i=-n}^{n}{J}_{ji}^{N}f\left({U}_{t-1}^{i}\right)+{B}_{t-1}^{j},\quad j=-n,\cdots ,n,\quad t=1,\cdots ,T.\qquad (1)$$
$f:\mathbb{R}\to \left]0,1\right[$ is a monotonically increasing bijection which we assume to be Lipschitz continuous. Its Lipschitz constant is noted k_{f}. We could for example employ f(x) = (1 + tanh(gx))/2, where the parameter g can be used to control the slope of the “sigmoid” f at the origin x = 0.
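As an illustration, the monotonicity and the Lipschitz constant of this particular choice of f can be checked numerically; a minimal sketch (the grid and parameter values are ours):

```python
import numpy as np

def f(x, g=1.0):
    """Sigmoid f(x) = (1 + tanh(g*x)) / 2, a monotone bijection R -> (0, 1)."""
    return 0.5 * (1.0 + np.tanh(g * x))

g = 2.0
x = np.linspace(-4.0, 4.0, 8001)
y = f(x, g)

# f is increasing and takes values strictly inside (0, 1)
assert np.all(np.diff(y) > 0) and 0 < y.min() and y.max() < 1

# f'(x) = (g/2) * sech(g*x)^2 is maximal at x = 0, so the Lipschitz
# constant is k_f = g/2; every secant slope must respect this bound
slopes = np.diff(y) / np.diff(x)
assert slopes.max() <= g / 2 + 1e-9
```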

This equation involves the parameters $\gamma ,\phantom{\rule{0.2em}{0ex}}{J}_{ij}^{N}$, and the time processes ${B}_{t}^{j},\phantom{\rule{0.2em}{0ex}}i,j=-n,\dots ,n,t=0,\dots ,T-1$. The initial conditions are discussed at the beginning of Section 2.2.2.

γ is in [0, 1) and determines the time scale of the intrinsic dynamics, i.e., without interactions, of the neurons. If γ = 0 the dynamics is said to have no leak.

The ${B}_{t}^{j}\mathrm{s}$ represent random fluctuations of the membrane potential of neuron j. They are independent random processes with the same law. We assume that at each time instant t, the ${B}_{t}^{j}\mathrm{s}$ are independent and identically distributed random variables distributed as ${\mathcal{N}}_{1}\left(0,{\sigma}^{2}\right)$ [34].

The ${J}_{ij}^{N}\mathrm{s}$ are the synaptic weights. ${J}_{ij}^{N}$ represents the strength with which the “presynaptic” neuron j influences the “postsynaptic” neuron i. They are Gaussian random variables, independent of the membrane fluctuations, whose mean is given by

We note J^{N} the N × N matrix of the synaptic weights,
${J}^{N}={\left({J}_{ij}^{N}\right)}_{i,j=-n,\cdots ,n}$. Their covariance is assumed to satisfy the following shift invariance property:

$$\mathrm{cov}\left({J}_{ij}^{N},{J}_{kl}^{N}\right)=\mathrm{cov}\left({J}_{i+m,j+m}^{N},{J}_{k+m,l+m}^{N}\right)\quad \text{for all}\phantom{\rule{0.2em}{0ex}}m\in \mathbb{Z},$$

where the indexes are taken modulo N.
**Remark 1.** This shift invariance property is technically useful since it allows us to use the tools of Fourier analysis. In terms of the neural population it means that the neurons “live” on a circle. Therefore, unlike in the uncorrelated case studied in the papers cited in the introduction, we have to indirectly introduce a notion of space.
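The ingredients described so far (leak γ, random synaptic weights J^{N}, sigmoid f, noise B) can be combined in a small simulation; a minimal sketch, assuming dynamics of the standard rate-neuron form ${U}_{t+1}^{j}=\gamma {U}_{t}^{j}+{\sum}_{i}{J}_{ji}^{N}f({U}_{t}^{i})+{B}_{t}^{j}$ and, for simplicity only, independent weights (all sizes and parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and parameters (not taken from the paper)
n, T = 10, 50
N = 2 * n + 1                  # number of neurons, odd by convention
gamma, sigma, g = 0.5, 1.0, 1.0

def f(x):
    return 0.5 * (1.0 + np.tanh(g * x))

# Gaussian synaptic weights; taken independent here for simplicity,
# with the 1/N variance scaling described in the text
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))

# One sample path of the assumed dynamics
# U_{t+1}^j = gamma*U_t^j + sum_i J_{ji} f(U_t^i) + B_t^j
U = np.zeros((T + 1, N))
U[0] = rng.normal(0.0, 1.0, N)        # initial condition mu_I
for t in range(T):
    B = rng.normal(0.0, sigma, N)     # membrane fluctuations
    U[t + 1] = gamma * U[t] + J @ f(U[t]) + B

print(U.shape)   # (T+1, N) array of trajectories
```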

We stipulate the covariances through a covariance function $\mathrm{\Lambda}:{\mathbb{Z}}^{2}\to \mathbb{R}$ and assume that they scale as 1/N. We write

$$\mathrm{cov}\left({J}_{ij}^{N},{J}_{kl}^{N}\right)=\frac{1}{N}\mathrm{\Lambda}\left(i-k,j-l\right).$$
The function Λ is even:

$$\mathrm{\Lambda}(-k,-l)=\mathrm{\Lambda}(k,l)\quad \text{for all}\phantom{\rule{0.2em}{0ex}}k,l\in \mathbb{Z}.$$
This implies that the two-dimensional Fourier transform of Λ (also called the power spectral density) is positive, see also Proposition 1 below. Furthermore, for any $k,l\in {\mathbb{Z}}^{+},\mathrm{\Lambda}(0,0)\ge \left|\mathrm{\Lambda}(k,l)\right|$.

We must make further assumptions on Λ to ensure that the system is well-behaved as the number of neurons N tends to infinity. We assume that the series ${(\mathrm{\Lambda}(k,l))}_{k,l\in \mathbb{Z}}$ is absolutely convergent, i.e.,

$${\mathrm{\Lambda}}^{\mathrm{sum}}:=\sum _{k,l\in \mathbb{Z}}\left|\mathrm{\Lambda}(k,l)\right|<\infty .\qquad (4)$$
We let Λ^{N} be the restriction of Λ to [−n, n]^{2}, i.e., Λ^{N}(i, j) = Λ(i, j) for −n ≤ i, j ≤ n.

We next introduce the spectral properties of Λ that are crucial for the results in this paper. We use throughout the paper the notation that if x is some quantity,
$\tilde{x}$ represents its Fourier transform in a sense that depends on the particular space where x is defined. For example
$\tilde{\mathrm{\Lambda}}$ is the 2π doubly periodic Fourier transform of the function Λ whose properties are described in the next proposition. Similarly,
${\tilde{\mathrm{\Lambda}}}^{N}$ is the two-dimensional Discrete Fourier Transform (DFT) of the doubly periodic sequence Λ^{N}. The proof of the following proposition is obvious.

**Proposition 1.** The sum $\tilde{\mathrm{\Lambda}}\left({\theta}_{1},{\theta}_{2}\right)$ of the absolutely convergent series ${\left(\mathrm{\Lambda}\left(k,l\right){e}^{-i(k{\theta}_{1}+l{\theta}_{2})}\right)}_{k,l\in \mathbb{Z}}$ is continuous on [−π, π[^{2} and positive. The covariance function Λ is recovered from the inverse Fourier transform of $\tilde{\mathrm{\Lambda}}$:

$$\mathrm{\Lambda}(k,l)=\frac{1}{4{\pi}^{2}}{\int}_{{\left[-\pi ,\pi \right[}^{2}}{e}^{i(k{\theta}_{1}+l{\theta}_{2})}\tilde{\mathrm{\Lambda}}\left({\theta}_{1},{\theta}_{2}\right)\phantom{\rule{0.2em}{0ex}}d{\theta}_{1}\phantom{\rule{0.1em}{0ex}}d{\theta}_{2}.$$

Moreover there exists ${\tilde{\mathrm{\Lambda}}}^{\mathrm{min}}>0$ such that

$$\tilde{\mathrm{\Lambda}}\left({\theta}_{1},{\theta}_{2}\right)\ge {\tilde{\mathrm{\Lambda}}}^{\mathrm{min}}\quad \text{for all}\phantom{\rule{0.2em}{0ex}}\left({\theta}_{1},{\theta}_{2}\right)\in {\left[-\pi ,\pi \right[}^{2}.$$
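These spectral properties are easy to check numerically for a concrete covariance function; a minimal sketch with the hypothetical, separable choice Λ(k, l) = a^{|k|+|l|}, 0 < a < 1 (our choice, not the paper's):

```python
import numpy as np

a = 0.5
K = 30                                   # truncation order of the series
ks = np.arange(-K, K + 1)

def Lam(k, l):
    """Hypothetical covariance function Lambda(k, l) = a^(|k|+|l|)."""
    return a ** (abs(k) + abs(l))

# Lambda is even and Lambda(0,0) dominates |Lambda(k,l)|
assert Lam(-2, -3) == Lam(2, 3)
assert all(Lam(0, 0) >= abs(Lam(k, l)) for k in ks for l in ks)

# One-dimensional factor P(theta) = sum_k a^|k| e^{-ik theta}; by
# separability the truncated transform is Lambda~(th1,th2) = P(th1)*P(th2)
thetas = np.linspace(-np.pi, np.pi, 101)
P = np.array([np.sum(a ** np.abs(ks) * np.exp(-1j * ks * t)).real
              for t in thetas])
vals = np.outer(P, P)                    # Lambda~ on the grid

# Positivity, with a strictly positive lower bound Lambda~_min
assert vals.min() > 0
# For this choice the exact minimum is ((1-a)/(1+a))^2, attained at (pi, pi)
assert abs(vals.min() - ((1 - a) / (1 + a)) ** 2) < 1e-6
```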

#### 2.2. The Laws of the Uncoupled and Coupled Processes

#### 2.2.1. Preliminaries

Sets of Trajectories, Temporal and Spatial Projections

The time evolution of one membrane potential is represented by the set
${\mathbb{R}}^{\left[0\cdots T\right]}:=\mathcal{T}$ of finite sequences (u_{t})_{t=0,⋯,T} of length T + 1 of real numbers.
${\mathcal{T}}^{N}$ is the set of sequences (u^{−n},⋯,u^{n}) (N = 2n + 1) of elements of
$\mathcal{T}$ that we use to describe the solutions to Equation (1). Similarly we note
${\mathcal{T}}^{\mathbb{Z}}$ the set of doubly infinite sequences of elements of
$\mathcal{T}$. If u is in
${\mathcal{T}}^{\mathbb{Z}}$ we note
${u}^{i},i\in \mathbb{Z}$, its ith coordinate, an element of
$\mathcal{T}$. Hence
$u={\left({u}^{i}\right)}_{i=-\infty \cdots \infty}$.

Given the integers s and t, 0 ≤ s ≤ t ≤ T, we define ${\mathcal{T}}_{s,t}:={\mathbb{R}}^{\left[s\cdots t\right]}$, the set of finite sequences of length t − s + 1 of real numbers, and the temporal projection
${\pi}_{s,t}:\mathcal{T}\to {\mathcal{T}}_{s,t}$ such that
${\pi}_{s,t}(u)={\left({u}_{r}\right)}_{r=s\cdots t}:={u}_{s,t}$. When s = t we note π_{t} and
${\mathcal{T}}_{t}$ rather than π_{t,t} and
${\mathcal{T}}_{t,t}$. The temporal projection π_{s,t} extends in a natural way to
${\mathcal{T}}^{N}$ and
${\mathcal{T}}^{\mathbb{Z}}$: for example π_{s,t} maps
${\mathcal{T}}^{N}$ to
${\mathcal{T}}_{s,t}^{N}$. We define the spatial projection
${\pi}^{N}:{\mathcal{T}}^{\mathbb{Z}}\to {\mathcal{T}}^{N}(N=2n+1)$ to be
${\pi}^{N}(u)=\left({u}^{-n},\dots ,{u}^{n}\right)$. Temporal and spatial projections commute, i.e.,
${\pi}^{N}\circ {\pi}_{s,t}={\pi}_{s,t}\circ {\pi}^{N}$.
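The commutation of the two projections is easy to visualize on finite arrays; a minimal sketch (the array layout and function names are ours):

```python
import numpy as np

# A finite block of trajectories: N = 2n+1 neurons, times 0..T
n, T = 2, 5
N = 2 * n + 1
rng = np.random.default_rng(1)
u = rng.normal(size=(N, T + 1))     # row i holds the trajectory of neuron i-n

def pi_temporal(u, s, t):
    """Keep the time window [s..t] of every trajectory."""
    return u[..., s:t + 1]

def pi_spatial(u, n_small):
    """Keep the 2*n_small+1 central neurons (indices -n_small..n_small)."""
    c = (u.shape[0] - 1) // 2
    return u[c - n_small:c + n_small + 1]

# Temporal and spatial projections commute
lhs = pi_spatial(pi_temporal(u, 1, 3), 1)
rhs = pi_temporal(pi_spatial(u, 1), 1, 3)
assert np.array_equal(lhs, rhs)
```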

The shift operator $\mathcal{S}:{\mathcal{T}}^{\mathbb{Z}}\to {\mathcal{T}}^{\mathbb{Z}}$ is defined by

$${\left(\mathcal{S}u\right)}^{i}={u}^{i+1},\quad i\in \mathbb{Z}.\qquad (6)$$
Given the element u = (u^{−n}, …, u^{n}) of
${\mathcal{T}}^{N}$ we form the doubly infinite N-periodic sequence u_{N} defined by

$${u}_{N}^{i}={u}^{i\phantom{\rule{0.2em}{0ex}}\mathrm{mod}\phantom{\rule{0.2em}{0ex}}N},\quad i\in \mathbb{Z},\qquad (8)$$

where the representative of i mod N is taken in [−n, n].
Topologies on the Sets of Trajectories

We equip
${\mathcal{T}}^{\mathbb{Z}}$ with the projective topology, i.e., the topology generated by the following metric. For
$u,v\in {\mathcal{T}}^{N}$, we define their distance d_{N}(u, υ) to be

This allows us to define the following metric over ${\mathcal{T}}^{\mathbb{Z}}$, whereby if $u,v\in {\mathcal{T}}^{\mathbb{Z}}$ then

The metric d generates the Borelian σ-algebra $\mathcal{B}\left({\mathcal{T}}^{\mathbb{Z}}\right):=\mathcal{F}$. It is generated by the coordinate functions ${\left({u}_{t}^{i}\right)}_{i\in \mathbb{Z},t=0\cdots T}$. The spatial and temporal projections defined above can be used to define the corresponding σ-algebras on the sets ${\mathcal{T}}_{s,t}^{N}$, e.g., ${\mathcal{F}}_{s,t}^{N}={\pi}^{N}\left({\pi}_{s,t}(\mathcal{F})\right),0\le s\le t\le T$.

Probability Measures on the Sets of Trajectories

We note ${\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ (respectively ${\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{N}\right)$) the set of probability measures on $({\mathcal{T}}^{\mathbb{Z}},\mathcal{F})$ (respectively $({\mathcal{T}}^{N},{\mathcal{F}}^{N})$).

For $\mu \in {\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$, we denote its marginal distribution at time t by ${\mu}_{t}=\mu \circ {\pi}_{t}^{-1}$. Similarly, ${\mu}_{s,t}^{N}$ is its N-dimensional spatial, (t − s + 1)-dimensional temporal marginal $\mu \circ {\left({\pi}^{N}\right)}^{-1}\circ {\pi}_{s,t}^{-1}$.

We denote the conditional probability distribution of µ, given ${U}_{0}^{j}={u}_{0}^{j}$ (for all j), by ${\mu}_{{u}_{0}}$. This is understood to be a probability measure over $\mathcal{B}\left({\mathcal{T}}_{1,T}^{\mathbb{Z}}\right):={\mathcal{F}}_{1,T}$.

We note
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ the set of stationary probability measures on
${\mathcal{T}}^{\mathbb{Z}}$. Given a random variable u with values in ${\mathcal{T}}^{\mathbb{Z}}$ governed by µ in
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$, the random variable
$\mathcal{S}u$, with
$\mathcal{S}$ the shift operator defined by Equation (6), is governed by µ as well $\left(\text{equivalently}\phantom{\rule{0.2em}{0ex}}\mu \phantom{\rule{0.2em}{0ex}}\circ \phantom{\rule{0.2em}{0ex}}{\mathcal{S}}^{-1}=\mu \right)$. With a slight abuse of notation, we define
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{N}\right)$ to be the set of all
${\mu}^{N} \in {\mathcal{M}}_{1}^{+}({\mathcal{T}}^{N})$ satisfying the following property: if (u^{−n}, …, u^{n}) are random variables governed by µ^{N}, then for all |m| ≤ n, (u^{m−n}, …, u^{m+n}) has the same law as (u^{−n}, …, u^{n}) (recall that the indexing is taken modulo N), or equivalently
${\mu}^{N}\circ \phantom{\rule{0.2em}{0ex}}{\mathcal{S}}^{-1}={\mu}^{N}$ (remember Equation (8)).
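The modulo-N indexing and the shift acting on a periodized configuration can be illustrated concretely; a minimal sketch, assuming the shift acts as $(\mathcal{S}u)^{i}={u}^{i+1}$ (all names are ours):

```python
# N-periodic extension of a finite configuration and the shift operator
n = 2
N = 2 * n + 1
u = list(range(-n, n + 1))     # stand-in trajectories u^{-n}, ..., u^{n}

def periodic(u, i):
    """Coordinate i of the N-periodic extension; u[0] holds u^{-n}."""
    n = (len(u) - 1) // 2
    return u[(i + n) % len(u)]

def shifted(u, i):
    """(S u)^i = u^{i+1} on the periodic extension (assumed form)."""
    return periodic(u, i + 1)

# Periodicity and wrap-around under the modulo-N identification
assert all(periodic(u, i) == periodic(u, i + N) for i in range(-10, 10))
assert periodic(u, n + 1) == u[0]      # wraps around to u^{-n}

# Applying the shift N times is the identity on one period
cfg = u
for _ in range(N):
    cfg = [periodic(cfg, i + 1) for i in range(-n, n + 1)]
assert cfg == u
```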

**Remark 2.** Note that the stationarity discussed here is a spatial stationarity.

Process-Level Empirical Measure

We next introduce the following process-level empirical measure, see e.g., [35]. Given an element u = (u^{−n}, …, u^{n}) in
${\mathcal{T}}^{N}$ we associate with it the measure, noted
${\widehat{\mu}}_{N}\left({u}^{-n},\dots ,{u}^{n}\right)$, in
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ defined by

$${\widehat{\mu}}_{N}\left({u}^{-n},\dots ,{u}^{n}\right)=\frac{1}{N}\sum _{i=-n}^{n}{\delta}_{{\mathcal{S}}^{i}{u}_{N}},\qquad (10)$$

where u_{N} is the N-periodic sequence associated with u and $\mathcal{S}$ is the shift operator defined in Equation (6).

**Remark 3.** This is a significant difference with previous work dealing with uncorrelated weights (e.g., [19]) where the N processes are coupled through the “usual” empirical measure$d{\widehat{\mu}}_{N}\left({u}^{-n},\cdots ,{u}^{n}\right)(y)=\frac{1}{N}{\displaystyle \sum {}_{i=-n}^{n}{\delta}_{{u}^{i}}(y)}$ which is a measure on$\mathcal{T}$. In our case, because of the correlations and as shown in Section 3.4 the processes are coupled through the process-level empirical measure Equation (10) which is a probability measure on${\mathcal{T}}^{\mathbb{Z}}$. This makes our analysis more biologically realistic, since we know that correlations between the synaptic weights do exist, but technically more involved.
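The contrast drawn in Remark 3 can be made concrete on a toy configuration; a minimal sketch, representing each empirical measure by the multiset of its atoms and assuming the process-level measure averages Dirac masses over the N spatial shifts of the periodized configuration (names and values are ours):

```python
from collections import Counter

n = 2
N = 2 * n + 1
# Stand-in "trajectories": one number per neuron for readability
u = (10, 20, 30, 40, 50)            # u^{-n}, ..., u^{n}

def rotate(x, k):
    """Spatial shift of the N-periodic configuration, shown on one period."""
    k %= len(x)
    return x[k:] + x[:k]

# Process-level empirical measure: uniform over the N shifts S^i u_N
mu_hat = Counter(rotate(u, i) for i in range(N))

# It is stationary: applying a further shift leaves it unchanged
mu_hat_shifted = Counter(rotate(x, 1) for x in mu_hat.elements())
assert mu_hat == mu_hat_shifted

# The "usual" empirical measure of Remark 3 instead places Dirac masses
# at the individual coordinates u^i
usual = Counter(u)
assert sum(usual.values()) == N
```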

Topology on Sets of Measures

We next equip
${\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ with the topology of weak convergence, as follows. For
${\mu}^{N},{v}^{N}\in {\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{N}\right)$, we note ${D}_{N}\left({\mu}^{N},{v}^{N}\right)$ the Wasserstein distance induced by the metric k_{f}d_{N}(u, υ) ∧ 1,

$${D}_{N}\left({\mu}^{N},{v}^{N}\right)=\underset{\xi \in \mathcal{J}}{\mathrm{inf}}{\int}_{{\mathcal{T}}^{N}\times {\mathcal{T}}^{N}}\left({k}_{f}{d}_{N}(u,v)\wedge 1\right)\phantom{\rule{0.2em}{0ex}}\xi (du,dv),\qquad (11)$$

where k_{f} is the positive constant defined at the start of Section 2.1 and $\mathcal{J}$ is the set of all measures in ${\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{N}\times {\mathcal{T}}^{N}\right)$ with N-dimensional marginals µ^{N} and υ^{N}.

**Remark 4.** The use of k_{f} in Equation (11) is technical and used to simplify the proof of Proposition 5.

For $\mu ,v\in {\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$, we define

$$D(\mu ,v)=\sum _{n=0}^{\infty}{\kappa}_{n}{D}_{N}\left({\mu}^{N},{v}^{N}\right),\quad N=2n+1,$$

which converges since ${D}_{N}\left({\mu}^{N},{v}^{N}\right)\le 1$ and $\sum _{n=0}^{\infty}{\kappa}_{n}<\infty $. It can be shown that ${\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ equipped with this metric is Polish.

#### 2.2.2. Coupled and Uncoupled Processes

We specify the initial conditions for Equation (1) as N independent and identically distributed random variables
${\left({U}_{0}^{j}\right)}_{j=-n,\cdots ,n}$. Let µ_{I} be the individual law on
$\mathbb{R}$ of
${U}_{0}^{j}$; it follows that the joint law of the variables is
${\mu}_{I}^{\otimes N}$ on
${\mathbb{R}}^{N}$. We note P the law of the solution to one of the uncoupled equations (1) where we take
${J}_{ij}^{N}=0,i,j=-n,\cdots ,n$. P is the law of the solution to the following stochastic difference equation:

$${U}_{t+1}=\gamma {U}_{t}+{B}_{t},\quad t=0,\cdots ,T-1,\qquad (13)$$

where the law of U_{0} is µ_{I}. This process can be characterized exactly, as follows.

Let $\mathrm{\Psi}:\mathcal{T}\to \mathcal{T}$ be the following bicontinuous bijection. Writing υ = Ψ(u), we define

$${v}_{0}={u}_{0},\qquad {v}_{t}={u}_{t}-\gamma {u}_{t-1},\quad t=1,\cdots ,T.\qquad (14)$$
The following proposition is evident from Equations (13) and (14).

**Proposition 2.** The law P of the solution to Equation (13) writes

$$P=\left({\mu}_{I}\otimes {\mathcal{N}}_{T}\left({0}_{T},{\sigma}^{2}{\mathrm{Id}}_{T}\right)\right)\circ \mathrm{\Psi},$$

where 0_{T} is the T-dimensional vector with all coordinates equal to 0 and Id_{T} is the T-dimensional identity matrix.

We later employ the convention that if
$u=\left({u}^{-n},\dots ,{u}^{n}\right)\in {\mathcal{T}}^{N}$ then Ψ(u) = (Ψ(u^{−n}), …, Ψ(u^{n})). A similar convention applies if
$u\in {\mathcal{T}}^{\mathbb{Z}}$. We also use the notation Ψ_{1,T} for the mapping
$\mathcal{T}\to {\mathcal{T}}_{1,T}$ such that Ψ_{1,T} = π_{1,T} ∘ Ψ.
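Assuming Ψ takes the leak-removing form v_{0} = u_{0}, v_{t} = u_{t} − γu_{t−1} (consistent with the uncoupled dynamics discussed above), its bijectivity and its action on a solution path can be checked numerically; a minimal sketch:

```python
import numpy as np

gamma = 0.5
T = 6

def Psi(u):
    """v_0 = u_0, v_t = u_t - gamma*u_{t-1}: removes the leak (assumed form)."""
    v = u.copy()
    v[1:] = u[1:] - gamma * u[:-1]
    return v

def Psi_inv(v):
    """Inverse: rebuild u by re-accumulating the leak."""
    u = v.copy()
    for t in range(1, len(v)):
        u[t] = v[t] + gamma * u[t - 1]
    return u

rng = np.random.default_rng(2)
u = rng.normal(size=T + 1)
assert np.allclose(Psi_inv(Psi(u)), u)      # Psi is a bijection

# On a solution of U_{t+1} = gamma*U_t + B_t, Psi returns
# (U_0, B_0, ..., B_{T-1}): the image law is mu_I x N_T(0, sigma^2 Id)
B = rng.normal(size=T)
U = np.zeros(T + 1)
U[0] = rng.normal()
for t in range(T):
    U[t + 1] = gamma * U[t] + B[t]
assert np.allclose(Psi(U)[1:], B)
```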

Reintroducing the coupling between the neurons, we note Q^{N}(J^{N}) the element of
${\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{N}\right)$ which is the law of the solution to Equation (1) conditioned on J^{N}. We let
${Q}^{N}={\mathbb{E}}^{J}[{Q}^{N}({J}^{N})]$ be the law averaged with respect to the weights. The reason for this is as follows. We want to study the empirical measure
${\widehat{\mu}}_{N}$ on path space. There is no reason for this to be a simple problem since for a fixed interaction J^{N}, the variables (U^{−n}, ⋯, U^{n}) are not exchangeable. So we first study the law of
${\widehat{\mu}}_{N}$ averaged over the interaction before we prove in Section 6 some almost sure properties of this law. Q^{N} is a common construction in the physics of interacting particle systems and is known as the annealed law [36].

We may thus infer the following.

**Lemma 1.** P^{⊗N}, Q^{N} and${\widehat{\mu}}_{N}^{N}$ (the N-dimensional marginal of${\widehat{\mu}}_{N}$) are in${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{N}\right)$.

Since the map Ψ defined in Equation (14) plays a central role in the sequel, we introduce the following definition.

**Definition 1.** For each measure
$\mu \in {\mathcal{M}}_{1}^{+}\left({\mathcal{T}}^{N}\right)$ or
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ we define
$\underline{\mu}$ to be µ ∘ Ψ^{−1}.

In particular, note that

$$\underline{P}={\mu}_{I}\otimes {\mathcal{N}}_{T}\left({0}_{T},{\sigma}^{2}{\mathrm{Id}}_{T}\right).$$
Finally we introduce the image laws in terms of which the principal results of this paper are formulated.

**Definition 2.** Let Π^{N} (respectively R^{N}) be the image law of Q^{N} (respectively P^{⊗N}) through the function
${\widehat{\mu}}_{N}:{\mathcal{T}}^{N}\to {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ defined by Equation (10).

The central result of this paper is in the next theorem.

**Theorem 1.** Π^{N} is governed by a large deviation principle (LDP) with a good rate function H (to be found in Definition 5). That is, if F is a closed set in${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$, then

$$\overline{\underset{N\to \infty}{\mathrm{lim}}}\phantom{\rule{0.2em}{0ex}}\frac{1}{N}\mathrm{log}\phantom{\rule{0.2em}{0ex}}{\mathrm{\Pi}}^{N}(F)\le -\underset{\mu \in F}{\mathrm{inf}}H(\mu ).\qquad (16)$$

Conversely, for all open sets O in${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$,

$$\underset{N\to \infty}{\underline{\mathrm{lim}}}\phantom{\rule{0.2em}{0ex}}\frac{1}{N}\mathrm{log}\phantom{\rule{0.2em}{0ex}}{\mathrm{\Pi}}^{N}(O)\ge -\underset{\mu \in O}{\mathrm{inf}}H(\mu ).\qquad (17)$$

Here $\overline{\mathrm{lim}}$ denotes the limit superior and $\underline{\mathrm{lim}}$ the limit inferior. By “good rate function”, we mean that for all a ≥ 0, the following set is compact:

$$\left\{\mu \in {\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right):H(\mu )\le a\right\}.$$

**Remark 5.** We recall that the above LDP is also called a strong LDP.

Our proof of Theorem 1 will occur in several steps. We prove in Sections 4.1 and 4.3 that Π^{N} satisfies a weak LDP, i.e., that it satisfies Equation (16) when F is compact and Equation (17) for all open O. We also prove in Section 4.2 that {Π^{N}} is exponentially tight, and we prove in Section 4.4 that H is a good rate function. It directly follows from these results that Π^{N} satisfies a strong LDP with good rate function H [38]. Finally, in Section 5 we prove that H has a unique minimum µ_{e}, to which
${\widehat{\mu}}_{N}$ converges weakly as N → ∞. This minimum is a (stationary) Gaussian measure which we describe in detail in Theorem 3.

## 3. The Good Rate Function

In the sections to follow we will obtain an LDP for the process with correlations (Π^{N}) via the (simpler) process without correlations (R^{N}). However, to do this we require an expression for the Radon-Nikodym derivative of Π^{N} with respect to R^{N}, which is the main result of this section. The derivative will be expressed in terms of a function
${\mathrm{\Gamma}}_{[N]}:{\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{N}\right)\to \mathbb{R}$. We first define Γ_{[N]}(µ), demonstrating that it may be expressed in terms of a Gaussian process
${G}_{[N]}^{\mu}$ (to be defined below), and then use this to determine the Radon-Nikodym derivative of Π^{N} with respect to R^{N}.

#### 3.1. Gaussian Processes

Given µ in
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ we define a stationary Gaussian process G^{µ} with values in
${\mathcal{T}}_{1,T}^{\mathbb{Z}}$. For all i, the mean of
${G}_{t}^{\mu ,i}$ is given by
${c}_{t}^{\mu}$, where

We now define the covariance of G^{µ}. We first define the following matrix-valued process.

**Definition 3.** Let M^{µ,k}, k ∈ ℤ be the T × T matrix defined by (for s, t ∈ [1, T ]),

These matrices satisfy

$${M}^{\mu ,-k}={}^{\u2020}{M}^{\mu ,k},\qquad (20)$$

where † denotes the transpose. Furthermore, they feature a spectral representation, i.e., there exists a T × T matrix-valued measure ${\tilde{M}}^{\mu}={({\tilde{M}}_{st}^{\mu})}_{s,t=1,\cdots ,T}$ with the following properties. Each ${\tilde{M}}_{st}^{\mu}$ is a complex measure on [−π, π[ of finite total variation and such that

$${M}_{st}^{\mu ,k}={\int}_{-\pi}^{\pi}{e}^{ik\theta}\phantom{\rule{0.1em}{0ex}}{\tilde{M}}_{st}^{\mu}(d\theta ).\qquad (21)$$

Relations (20) and (21) imply the following relations, for all Borelian sets $\mathcal{A}\subset \left[-\pi ,\pi \right[$:

$${\tilde{M}}_{st}^{\mu}(-\mathcal{A})={\left({\tilde{M}}_{st}^{\mu}(\mathcal{A})\right)}^{*}\quad \text{and}\quad {\tilde{M}}_{ts}^{\mu}(\mathcal{A})={\left({\tilde{M}}_{st}^{\mu}(\mathcal{A})\right)}^{*},$$

where * indicates complex conjugation. We may infer from this that ${\tilde{M}}^{\mu}$ is Hermitian-valued. The spectral representation means that for all vectors $W\in {\mathbb{R}}^{T}$, ${}^{\u2020}W{\tilde{M}}^{\mu}(d\theta )W$ is a positive measure on [−π, π[.

The covariance between the Gaussian vectors G^{µ,i} and G^{µ,i+k} is defined to be

$${K}^{\mu ,k}=\sum _{l\in \mathbb{Z}}\mathrm{\Lambda}(k,l){M}^{\mu ,l}.\qquad (23)$$
We note that the above summation converges for all k ∈ ℤ since the series (Λ(k, l))_{k,l}_{∈ℤ} is absolutely convergent and the elements of M^{µ,l} are bounded by 1 for all l ∈ ℤ.

It follows immediately from the definition that for $\mu \in {\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ and k ∈ ℤ we have

$${K}^{\mu ,-k}={}^{\u2020}{K}^{\mu ,k}.\qquad (24)$$
This is necessary for the covariance function to be well-defined. The following proposition may be easily proved from the above definitions.

**Proposition 3.** The sequence (K^{µ,k})_{k∈ℤ} has spectral density${\tilde{K}}^{\mu}$ given by

$${\tilde{K}}^{\mu}(\theta )=\sum _{k\in \mathbb{Z}}{K}^{\mu ,k}{e}^{-ik\theta}.\qquad (25)$$

That is, ${\tilde{K}}^{\mu}$ is Hermitian positive and satisfies${\tilde{K}}^{\mu}(-\theta ){=}^{\u2020}{\tilde{K}}^{\mu}\left(\theta \right)$ and${K}^{\mu ,k}=\frac{1}{2\pi}{\displaystyle {\int}_{-\pi}^{\pi}{e}^{ik\theta}{\tilde{K}}^{\mu}(\theta )d\theta}$.

**Proof.** The proof essentially consists of demonstrating that the matrix function ${\tilde{K}}^{\mu}(\theta )$ is well-defined and enjoys the stated properties.
From Equation (23) we obtain that, for all s, t ∈ [1 ⋯ T],

Since, by Equation (4), the series (Λ(k, l))_{k,l∈ℤ} is absolutely convergent, this shows that
${\tilde{K}}^{\mu}(\theta )$ is well-defined on [−π, π[. The fact that
${\tilde{K}}^{\mu}(\theta )$ is Hermitian follows from Equations (24) and (25).

Combining Equations (21), (23) and (25) we write

This can be rewritten in terms of the spectral density $\tilde{\mathrm{\Lambda}}$ of Λ

We note that
${\tilde{K}}^{\mu}(\theta )$ is positive, because for all vectors W of ℝ^{T}

We may also define the N-dimensional Gaussian process ${G}_{[N]}^{\mu}$ with values in ${\mathcal{T}}_{1,T}^{N}$ as follows. The mean of ${G}_{[N]}^{\mu ,i},i=-n,\cdots ,n$ is given by Equation (18) (or rather its finite dimensional analog) and the covariance between ${G}_{[N]}^{\mu ,i}$ and ${G}_{[N]}^{\mu ,i+k}$ is given by

#### 3.2. Convergence of Gaussian Processes

The finite-dimensional system “converges” to the infinite-dimensional system in the following sense. In what follows, and throughout the paper, we use the Frobenius norm on the T × T matrices. We write ${\tilde{K}}_{[N]}^{\mu}(\theta )={\displaystyle \sum {}_{k=-n}^{n}{K}_{[N]}^{\mu ,k}\mathrm{exp}(-ik\theta )}$. Note that for $\left|j\right|\le n,{\tilde{K}}_{[N]}^{\mu}(2\pi j/N)={\tilde{K}}_{[N]}^{\mu ,j}$. The lemma below follows directly from the absolute convergence of $\sum {}_{j,k}\left|\mathrm{\Lambda}(j,k)\right|$.

**Lemma 2.** Fix$\mu \in {\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$. For all ε > 0, there exists an N such that for all M > N and all j such that$2\left|j\right|+1\le M,\Vert {K}_{[M]}^{\mu ,j}-{K}^{\mu ,j}\Vert <\epsilon $ and for all$\theta \in \left[-\pi ,\pi \right[,\phantom{\rule{0.2em}{0ex}}\Vert {\tilde{K}}_{[M]}^{\mu}(\theta )-{\tilde{K}}^{\mu}(\theta )\Vert \le \epsilon $.
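The mechanism behind Lemma 2 is that partial sums of an absolutely convergent (matrix-valued) Fourier series converge uniformly in θ, with the tail sum as an explicit error bound. A minimal sketch, using an arbitrary geometrically decaying block sequence of our own choosing as a stand-in for (K^{µ,k}):

```python
import numpy as np

# An absolutely summable sequence of 2x2 blocks (geometric decay), a stand-in
# for (K^{mu,k}); the "true" spectral density is approximated by the full sum.
T, kmax = 2, 50
K = {k: np.exp(-abs(k)) * np.eye(T) for k in range(-kmax, kmax + 1)}

def partial_sum(m, theta):
    """Partial Fourier sum sum_{|k| <= m} K^k exp(-i k theta)."""
    return sum(K[k] * np.exp(-1j * k * theta) for k in range(-m, m + 1))

thetas = np.linspace(-np.pi, np.pi, 200, endpoint=False)
full = [partial_sum(kmax, th) for th in thetas]

# sup_theta ||partial_sum(m) - full|| is bounded by the tail sum_{|k|>m} ||K^k||,
# so convergence is uniform in theta
for m in (5, 10, 20):
    tail = sum(np.linalg.norm(K[k]) for k in K if abs(k) > m)
    err = max(np.linalg.norm(partial_sum(m, th) - f)
              for th, f in zip(thetas, full))
    assert err <= tail + 1e-12
```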

**Lemma 3.** The eigenvalues of${\tilde{K}}_{[N]}^{\mu ,l}$ and${\tilde{K}}^{\mu}(\theta )$ are upperbounded by${\rho}_{K}\stackrel{\text{def}}{=}T{\mathrm{\Lambda}}^{\text{sum}}$, where Λ^{sum} is defined in Equation (4).

**Proof.** Let W ∈ ℝ^{T}. We find from Proposition 3 and Equation (4) that

The eigenvalues of M^{µ,0} are all nonnegative (since it is a correlation matrix), so each eigenvalue is upperbounded by the trace, which in turn is upperbounded by T. The proof in the finite-dimensional case follows similarly. □
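The trace argument at the end of this proof is easy to check numerically. The sketch below (sample size and dimension are our own arbitrary choices) builds an empirical correlation matrix as a stand-in for M^{µ,0} and verifies that its eigenvalues are nonnegative and each bounded by its trace, which equals T:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 4
X = rng.standard_normal((500, T))          # arbitrary sample (our choice)
M0 = np.corrcoef(X, rowvar=False)          # a T x T correlation matrix
ev = np.linalg.eigvalsh(M0)

assert ev.min() >= -1e-10                  # PSD: eigenvalues are nonnegative
assert abs(np.trace(M0) - T) < 1e-10       # unit diagonal: trace equals T
assert ev.max() <= np.trace(M0) + 1e-10    # hence every eigenvalue is <= T
```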

We denote by ${K}_{[N]}^{\mu}$ the (NT × NT) covariance matrix of the sequence of Gaussian random variables $\left({G}_{[N]}^{\mu ,-n},\dots ,{G}_{[N]}^{\mu ,n}\right)$. Because of the properties of the matrices ${K}_{[N]}^{\mu ,k},k=-n,\cdots ,n$, this is a symmetric block circulant matrix. It is also positive semidefinite, being a covariance matrix.

We let ${A}_{[N]}^{\mu}={K}_{[N]}^{\mu}{\left({\sigma}^{2}{\mathrm{Id}}_{NT}+{K}_{[N]}^{\mu}\right)}^{-1}$. This is well-defined because ${K}_{[N]}^{\mu}$ is diagonalizable (being real and symmetric) and has nonnegative eigenvalues (being a covariance matrix), so that ${\sigma}^{2}{\mathrm{Id}}_{NT}+{K}_{[N]}^{\mu}$ is invertible. It follows from Lemma 20 in Appendix A that ${A}_{[N]}^{\mu}$ is a symmetric block circulant matrix, with blocks ${A}_{[N]}^{\mu ,k}\ (k=-n,\cdots ,n)$ such that

In the limit N → ∞ we may define

The Fourier series of Ã^{µ} is absolutely convergent as a consequence of Wiener’s Theorem. We thus find that, for l ∈ ℤ,

**Lemma 4.** The map $B\to B{({\sigma}^{2}{\mathrm{Id}}_{T}+B)}^{-1}$ is Lipschitz continuous over the set$\mathrm{\Delta}=\{{\tilde{K}}_{[N]}^{\mu}(\theta ),{\tilde{K}}^{\mu}(\theta ):\mu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}}),N>0,\theta \in [-\pi ,\pi ]\}$.

**Proof.** The proof is straightforward using the boundedness of the eigenvalues of the matrices in Δ. □
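The block-circulant structure of ${A}_{[N]}^{\mu}$ and the blockwise spectral relation underlying Lemmas 4 and 5 can be illustrated on a small example. The sketch below is entirely our own construction (sizes, σ² and the random blocks are arbitrary; Lemma 20 itself is in Appendix A and is not reproduced here): it assembles a symmetric block circulant matrix K from blocks C^{k} with C^{−k} = ^{†}C^{k}, forms A = K(σ²Id + K)^{−1}, and checks that A is again block circulant with Fourier blocks Ã(θ_j) = K̃(θ_j)(σ²Id_T + K̃(θ_j))^{−1}.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 2, 5                      # small illustrative sizes (our choice)
n = N // 2
sigma2 = 1.0

# Blocks C[k] with C[-k] = transpose(C[k]); C[0] is shifted to keep K well-behaved
C = {}
M0 = rng.standard_normal((T, T))
C[0] = M0 @ M0.T + 3.0 * np.eye(T)
for k in range(1, n + 1):
    C[k] = 0.1 * rng.standard_normal((T, T))
    C[-k] = C[k].T

# Symmetric block circulant NT x NT matrix: block (i, j) = C[(j - i) mod N]
Kc = np.block([[C[((j - i + n) % N) - n] for j in range(N)] for i in range(N)])
assert np.allclose(Kc, Kc.T)

A = Kc @ np.linalg.inv(sigma2 * np.eye(N * T) + Kc)

def blk(X, i, j):
    return X[i * T:(i + 1) * T, j * T:(j + 1) * T]

# A is block circulant: block (i, j) only depends on (j - i) mod N
for i in range(N):
    for j in range(N):
        assert np.allclose(blk(A, i, j), blk(A, 0, (j - i) % N), atol=1e-10)

# Fourier blocks of A satisfy A~(th_m) = K~(th_m)(sigma^2 Id_T + K~(th_m))^{-1},
# since evaluating the symbol at th_m is an algebra homomorphism
for m in range(N):
    th = 2.0 * np.pi * m / N
    Kt = sum(C[k] * np.exp(-1j * k * th) for k in range(-n, n + 1))
    At = sum(blk(A, 0, k % N) * np.exp(-1j * k * th) for k in range(-n, n + 1))
    assert np.allclose(At, Kt @ np.linalg.inv(sigma2 * np.eye(T) + Kt), atol=1e-8)
```

The last loop is the finite-N analogue of the relation between ${\tilde{A}}_{[N]}^{\mu}$ and ${\tilde{K}}_{[N]}^{\mu}$ used throughout this section.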

The following lemma is a consequence of Lemmas 2 and 4.

**Lemma 5.** Fix$\mu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$. For all ε > 0, there exists an N such that for all M > N and all θ ∈ [−π, π[,
$\Vert {\tilde{A}}_{[M]}^{\mu}(\theta )-{\tilde{A}}^{\mu}(\theta )\Vert \le \epsilon $.

The above-defined matrices have the following “uniform convergence” properties.

**Proposition 4.** Fix$\nu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$. For all ε > 0, there exists an open neighbourhood V_{ε}(ν) of ν such that for all μ ∈ V_{ε}(ν), all s, t ∈ [1, T] and all θ ∈ [−π, π[,

**Proof.** The proof is found in Appendix B. □

Before we close this section we define a subset of ${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ which arises naturally: it is the subset on which the rate function (to be defined below) is finite; see Section 3.3.2 and Lemma 8.

**Definition 4.** Let $\mathcal{E}$_{2} be the subset of
${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ defined by

For this set of measures, we may define the stationary process ${({v}^{k})}_{k\in \mathbb{Z}}$ in
${\mathcal{T}}_{1,T}^{\mathbb{Z}}$, where
${v}_{s}^{k}={\mathrm{\Psi}}_{s}({u}^{k})$, s = 1, ⋯, T. This process has a finite mean ${\mathbb{E}}^{{\underset{\xaf}{\mu}}_{1,T}}[{v}^{0}]$, denoted
${\overline{v}}^{\mu}$, and admits a spectral density measure, denoted
${\tilde{v}}^{\mu}$, such that

#### 3.3. Definition of the Functional Γ

In this section we define and study a functional Γ_{[N]} = Γ_{[N],1} + Γ_{[N],2}, which will be used to characterize the Radon-Nikodym derivative of Π^{N} with respect to R^{N}. Let
$\mu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$, and let (μ^{N})_{N≥1} be the N-dimensional marginals of μ (for N = 2n + 1 odd).

#### 3.3.1. Γ_{1}

We define

Because of the remarks after Lemma 3, the spectrum of
${K}_{[N]}^{\mu}$ is nonnegative, so that the spectrum of
${\mathrm{Id}}_{NT}+\frac{1}{{\sigma}^{2}}{K}_{[N]}^{\mu}$ is strictly positive (in fact, every eigenvalue is at least 1) and the above expression is well-defined. Moreover, Γ_{[N],1}(µ) ≤ 0.

We now define Γ_{1}(µ) = lim_{N→∞} Γ_{[N],1}(µ). The following lemma indicates that this limit is well-defined.

**Lemma 6.** When N goes to infinity the limit of Equation (38) is given by

**Proof.** Through Lemma 20 in Appendix A, we have that

**Proposition 5.** Γ_{[N],1} and Γ_{1} are bounded below and continuous on${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$.

**Proof.** Applying Lemma 19 in the case of
$Z=\phantom{\rule{0.2em}{0ex}}({G}_{[N]}^{\mu ,-n}-{c}^{\mu},\cdots ,{G}_{[N]}^{\mu ,n}-{c}^{\mu})$, a = 0, b = σ^{−2}, we write

Using Jensen’s inequality we have

By definition of ${K}_{[N]}^{\mu ,0}$, the right-hand side is equal to $-\frac{1}{2{\sigma}^{2}}\text{Trace}({K}_{[N]}^{\mu ,0})$. From Equation (28), we find that

It follows from Equation (19) that 0 ≤ Trace(M^{μ,m}) ≤ T. Hence Γ_{[N],1}(μ) ≥ −β_{1}, where

It follows from Lemma 6 that −β_{1} is a lower bound for Γ_{1}(μ) as well.

The continuity of both Γ_{[N],1} and Γ_{1} follows from the expressions (38) and (39), the continuity of the maps
$\mu \to {\tilde{K}}_{[N]}^{\mu}$ and
$\mu \to {\tilde{K}}^{\mu}$ (Proposition 4) and the continuity of the determinant. □

#### 3.3.2. Γ_{2}

For $\mu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ we define

Γ_{[N],2}(μ) is finite on the subset $\mathcal{E}$_{2} of
${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ defined in Definition 4. If μ ∉ $\mathcal{E}$_{2}, then we set Γ_{[N],2}(μ) = ∞.

We define Γ_{2}(μ) = lim_{N→∞} Γ_{[N],2}(μ). The following proposition indicates that Γ_{2}(μ) is well-defined.

**Proposition 6.** If the measure μ is in $\mathcal{E}$_{2}, i.e., if${\mathbb{E}}^{{\underset{\xaf}{\mu}}_{1,T}}[\Vert {v}^{0}{\Vert}^{2}]<\infty $, then Γ_{2}(μ) is finite and writes

The “:” symbol indicates the double contraction on the indices. One also has

**Proof.** Using Equations (37) and (42), the stationarity of μ, and the fact that
${\sum}_{k=-n}^{n}{A}_{[N]}^{\mu ,k}={\tilde{A}}_{[N]}^{\mu}(0)$, we have

From the spectral representation of ${A}_{[N]}^{\mu}$ we find that

Since (according to Lemma 5)
${\tilde{A}}_{[N]}^{\mu}(\theta )$ converges uniformly to Ã^{μ}(θ) as N → ∞, it follows by dominated convergence that Γ_{[N],2}(μ) converges to the expression in the proposition.

The second expression for Γ_{2}(μ) follows analogously, although this time we make use of the fact that the partial sums of the Fourier series of Ã^{μ} converge uniformly to Ã^{μ} (because the Fourier series is absolutely convergent). □

We next obtain more information about the eigenvalues of the matrices
${\tilde{A}}_{[N]}^{\mu ,k}={\tilde{A}}_{[N]}^{\mu}\left(\frac{2k\pi}{N}\right)$ (where k = −n, …, n) and Ã^{μ}(θ).

**Lemma 7.** There exists 0 < α < 1, such that for all N and μ, the eigenvalues of${\tilde{A}}_{[N]}^{\mu ,k}$, Ã^{μ}(θ) and${A}_{[N]}^{\mu}$ are less than or equal to α.

**Proof.** By Lemma 3, the eigenvalues of
${\tilde{K}}^{\mu}(\theta )$ are positive and upperbounded by ρ_{K}. Since
${\tilde{K}}^{\mu}(\theta )$ and
${\left({\sigma}^{2}{\mathrm{Id}}_{T}+{\tilde{K}}^{\mu}(\theta )\right)}^{-1}$ are coaxial (because
${\tilde{K}}^{\mu}$ is Hermitian and therefore diagonalisable), we may take

This upperbound also holds for ${\tilde{A}}_{[N]}^{\mu ,k}$, and for the eigenvalues of ${A}_{[N]}^{\mu}$, because of Lemma 20. □
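The bound of Lemma 7 can be sanity-checked numerically: if K is a covariance matrix with largest eigenvalue ρ, then A = K(σ²Id + K)^{−1} has eigenvalues λ/(σ² + λ), all bounded by α = ρ/(σ² + ρ) < 1. A minimal sketch (the dimension and σ² below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
d, sigma2 = 6, 0.5                 # arbitrary size and noise variance
M = rng.standard_normal((d, d))
K = M @ M.T                        # a covariance (PSD) matrix
A = K @ np.linalg.inv(sigma2 * np.eye(d) + K)

lam = np.linalg.eigvalsh(K)        # eigenvalues of K, all >= 0
alpha = lam.max() / (sigma2 + lam.max())
eigA = np.linalg.eigvalsh((A + A.T) / 2.0)

# each eigenvalue of A equals lambda/(sigma^2 + lambda), hence <= alpha < 1
assert np.allclose(np.sort(eigA), np.sort(lam / (sigma2 + lam)), atol=1e-8)
assert eigA.max() <= alpha + 1e-10 and alpha < 1.0
```

The map λ → λ/(σ² + λ) is increasing, which is why the bound is attained at the largest eigenvalue of K.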

We wish to prove that Γ_{[N],2}(μ) is lower semicontinuous. A consequence of this will be that Γ_{[N],2}(μ) is measurable with respect to
$\mathcal{B}({\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}}))$. In effect, we prove in Appendix C that ϕ^{N}(μ, ν) defined by Equation (43) satisfies

with β_{2} defined in Equation (87) in Appendix C.

We then have the following proposition.

**Proposition 7.** Γ_{[N],2}(μ) is lower semicontinuous.

**Proof.** We define
${\varphi}^{N,M}(\mu ,\nu )={1}_{{B}_{M}}(\nu )({\varphi}^{N}(\mu ,\nu )+{\beta}_{2})$, where
${1}_{{B}_{M}}$ is the indicator of B_{M} and ν ∈ B_{M} if
${N}^{-1}{\displaystyle {\sum}_{j=-n}^{n}{\Vert {\nu}^{j}\Vert}^{2}}\le M$. We have just seen that ϕ^{N,M} ≥ 0. We also define

Suppose that ν_{n} → μ with respect to the weak topology in
${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$. Observe that

We may infer from the above expression that
${\mathrm{\Gamma}}_{[N],2}^{M}(\mu )$ is continuous (with respect to μ) for the following reasons. The first term on the right-hand side converges to zero because ϕ^{N,M} is continuous and bounded (with respect to ν). The second term converges to zero because ϕ^{N,M}(μ, ν) is a continuous function of μ, see Proposition 4.

Since ${\mathrm{\Gamma}}_{[N],2}^{M}(\mu )$ grows to Γ_{[N],2}(μ) as M → ∞, we may conclude that Γ_{[N],2}(μ) is lower semicontinuous with respect to μ. □

We define Γ_{[N]}(μ) = Γ_{[N],1}(μ) + Γ_{[N],2}(μ). We may conclude from Propositions 5 and 7 that Γ_{[N]} is measurable.

#### 3.4. The Radon-Nikodym Derivative

In this section we determine the Radon-Nikodym derivative of Π^{N} with respect to R^{N}. However, in order for us to do this, we must first compute the Radon-Nikodym derivative of Q^{N} with respect to P^{⊗}^{N}. We do this in the next proposition.

**Proposition 8.** The Radon-Nikodym derivative of Q^{N} with respect to P^{⊗}^{N} is given by the following expression.

with the Gaussian processes (G^{i}), i = −n, ⋯, n, given by

**Proof.** For fixed J^{N}, we let
${R}_{{J}^{N}}:{\mathbb{R}}^{N(T+1)}\to {\mathbb{R}}^{N(T+1)}$ be the mapping u → y, i.e.,
${R}_{{J}^{N}}({u}^{-n},\cdots ,{u}^{n})=({y}^{-n},\cdots ,{y}^{n})$, where for j = −n, ⋯, n,

The determinant of the Jacobian of
${R}_{{J}^{N}}$ is 1 for the following reasons. Since
$\frac{d{y}_{s}^{j}}{d{u}_{t}^{k}}=0$ if t > s, the determinant is
${\prod}_{s=0}^{T}{D}_{s}$, where D_{s} is the Jacobian of the map
$({u}_{s}^{-n},\dots ,{u}_{s}^{n})\to ({y}_{s}^{-n},\dots ,{y}_{s}^{n})$ induced by
${R}_{{J}^{N}}$. However, D_{s} is evidently 1. Similar reasoning implies that
${R}_{{J}^{N}}$ is a bijection.
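The change-of-variables argument above (a map that is "lower triangular in time" with unit diagonal has Jacobian determinant 1 and is a bijection) can be illustrated with a linear toy version of our own, where the dependence on the past is an arbitrary strictly lower triangular matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5
L = np.tril(rng.standard_normal((T, T)), k=-1)  # strictly lower triangular

# y_t = u_t + sum_{s<t} L[t, s] u_s: y_t depends only on the past of u,
# so the Jacobian dy/du = Id + L is lower triangular with unit diagonal
J = np.eye(T) + L
assert abs(np.linalg.det(J) - 1.0) < 1e-9

# such a map is also a bijection: Id + L is invertible
u = rng.standard_normal(T)
y = J @ u
assert np.allclose(np.linalg.solve(J, y), u)
```

In the proposition the map ${R}_{{J}^{N}}$ is nonlinear in u, but its Jacobian matrix has exactly this unit-diagonal triangular structure at every point.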

It may be seen that the random vector $Y={R}_{{J}^{N}}(U)$, U solution of Equation (1), is such that ${Y}_{0}^{j}={U}_{0}^{j}$ and ${Y}_{t}^{j}={B}_{t-1}^{j}$ where |j| ≤ n and t = 1, ⋯, T. Therefore

Since the determinant of the Jacobian of
${R}_{{J}^{N}}$ is one, we obtain the law of Q^{N}(J^{N}) by applying the inverse of
${R}_{{J}^{N}}$ to the above distribution, i.e.,

Note that, exceptionally, ‖ ‖ is the Euclidean norm in ${\mathbb{R}}^{N(T+1)}$ or ${\mathcal{T}}^{N}$.

Recalling that P^{⊗}^{N} = Q^{N}(0), we therefore find that

Taking the expectation of this with respect to J^{N} yields the result. □

In fact, as stated in the proposition below, the Gaussian system ${({G}_{s}^{i})}_{i=-n,\dots ,n,s=1,\dots ,T}$ has the same law as the system ${G}_{[N]}^{{\widehat{\mu}}_{N}}$, as defined in Equation (28) and afterwards.

**Proposition 9.** Fix$u\in {\mathcal{T}}^{N}$. The covariance of the Gaussian system$({G}_{s}^{i})$, where i = −n, …, n and s = 1, …, T writes${K}_{[N]}^{{\widehat{\mu}}_{N}(u)}$. For each i, the mean of G^{i} is${c}^{{\widehat{\mu}}_{N}(u)}$.

The proof of this proposition is an easy verification left to the reader. We obtain an alternative expression for the Radon-Nikodym derivative in Equation (46) by applying Lemma 19 in Appendix A. That is, we substitute Z = (G^{−}^{n}, ⋯, G^{n}),
$a=\frac{1}{{\sigma}^{2}}({\nu}^{-n},\cdots ,{\nu}^{n})$, and
$b=\frac{1}{{\sigma}^{2}}$ into the formula in Lemma 19. After noting Proposition 9 we thus find that

**Proposition 10.** The Radon-Nikodym derivatives write as

Here$\mu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$, Γ_{[N]}(μ) = Γ_{[N],1}(μ) + Γ_{[N],2}(μ), and the expressions for Γ_{[N],1} and Γ_{[N],2} have been defined in Equations (38) and (42).

The second expression in the above proposition follows from the first one because Γ_{[N]} is measurable.

**Remark 6.** Proposition 10 shows that the processes solving Equation (1) are coupled through the process-level empirical measure, unlike the case of independent weights, where they are coupled through the usual empirical measure. As mentioned in Remark 3, this significantly complicates the mathematical analysis.

## 4. The Large Deviation Principle

In this section we prove the principal result of this paper (Theorem 1): the image laws Π^{N} satisfy an LDP with good rate function H (to be defined below). We do this by first establishing an LDP for the image law with uncoupled weights (R^{N}), see Definition 2, and then using the Radon-Nikodym derivative of Proposition 10 to establish the full LDP for Π^{N}. Therefore our first task is to write the LDP governing R^{N}.

Let μ, ν be probability measures over a Polish space Ω equipped with its Borel σ-algebra. The Kullback-Leibler divergence of μ relative to ν (also called the relative entropy) is

and I^{(2)}(μ, ν) = ∞ otherwise. It is a standard result that

For
$\mu \in {M}_{1,S}^{+}({\Omega}^{\mathbb{Z}})$ and
$\nu \in {\mathcal{M}}_{1}^{+}(\Omega )$, the process-level entropy of μ with respect to ν^{ℤ} is defined to be

See Lemma IX.2.4 in [35] for a proof that this (possibly infinite) limit always exists (the superadditivity of the sequence N^{−1}I^{(2)}(μ^{N}) follows from Equation (49)).
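For intuition, the Kullback-Leibler divergence and its variational representation (we take the elided Equation (49) to be the usual Donsker-Varadhan form, KL(μ‖ν) = sup_f {E^{μ}[f] − log E^{ν}[e^{f}]}; this reading is an assumption on our part) can be checked on a finite state space:

```python
import numpy as np

rng = np.random.default_rng(5)
mu = rng.dirichlet(np.ones(5))     # two probability vectors on 5 points
nu = rng.dirichlet(np.ones(5))     # (strictly positive almost surely)

kl = float(np.sum(mu * np.log(mu / nu)))
assert kl >= 0.0                   # relative entropy is nonnegative

# Variational lower bound: any test function f gives E_mu[f] - log E_nu[e^f] <= KL
f = rng.standard_normal(5)
assert mu @ f - np.log(nu @ np.exp(f)) <= kl + 1e-12

# The supremum is attained at f = log(mu/nu), since E_nu[mu/nu] = 1
fstar = np.log(mu / nu)
assert abs((mu @ fstar - np.log(nu @ np.exp(fstar))) - kl) < 1e-12
```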

**Theorem 2.** R^{N} is governed by a large deviation principle with good rate function [39,40]

for u_{0} in ℝ^{ℤ}. In addition, the set of measures {R^{N}} is exponentially tight.

**Proof.** R^{N} satisfies an LDP with good rate function I^{(3)}(μ, P^{ℤ}) [35]. In turn, a sequence of probability measures (such as {R^{N}}) over a Polish Space satisfying a large deviations upper bound with a good rate function is exponentially tight [38].

The following identity is established in [41]:

It follows directly from the variational expression (49) that

Note that although our convention throughout this paper is for N to be odd, the limit in Equation (50) exists for any sequence of integers going to ∞. We divide Equation (52) by N and consider the subsequence of all N of the form N = 2^{k} for k ∈ ℤ^{+}. It follows from Equation (53) that
${N}^{-1}{I}^{(2)}({\mu}_{{u}_{0}}^{N},{P}_{{u}_{0}}^{\otimes N})$ is nondecreasing as N = 2^{k} → ∞ (for all u_{0}), so that Equation (51) follows by the monotone convergence theorem.

Because Ψ is bijective and bicontinuous, it may be easily shown that

Before we move to a statement of the LDP governing Π^{N}, we prove the following relationship between the set $\mathcal{E}$_{2} (see Definition 4) and the set of stationary measures which have a finite Kullback-Leibler divergence or process-level entropy with respect to P^{ℤ}.

**Lemma 8.**

See Lemma 10 in [42] for a proof. We are now in a position to define what will be the rate function of the LDP governing Π^{N}.

**Definition 5.** Let H be the function
${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})\to \mathbb{R}\cup \{+\infty \}$ defined by

Here Γ(μ) = Γ_{1}(μ) + Γ_{2}(μ) and the expressions for Γ_{1} and Γ_{2} have been defined in Lemma 6 and Proposition 6. Note that because of Proposition 6 and Lemma 8, whenever I^{(3)} (μ, P^{ℤ}) is finite, so is Γ(μ).

#### 4.1. Lower Bound on the Open Sets

We prove the second half of Proposition 1.

**Lemma 9.** For all open sets O, Equation (17) holds, i.e.,

**Proof.** From the expression for the Radon-Nikodym derivative in Proposition 10 we have

If μ ∈ O is such that I^{(3)}(μ, P^{ℤ}) = ∞, then H(μ) = ∞ and evidently Equation (17) holds. We now prove Equation (17) for all μ ∈ O such that I^{(3)}(μ, P^{ℤ}) < ∞. Let ε > 0 and
${Z}_{\epsilon}^{N}(\mu )\subset O$ be an open neighbourhood containing μ such that
${\mathrm{inf}}_{\nu \in {Z}_{\epsilon}^{N}(\mu )}{\mathrm{\Gamma}}_{[N]}(\nu )\ge {\mathrm{\Gamma}}_{[N]}(\mu )-\epsilon $. Such
$\{{Z}_{\epsilon}^{N}(\mu )\}$ exist for all N because of the lower semicontinuity of Γ_{[N]}(μ) (see Propositions 5 and 7). Then

The last equality follows from Lemma 6 and Proposition 6. Since ε is arbitrary, we may take the limit as ε → 0 to obtain Equation (17). Since Equation (17) is true for all μ ∈ O, the lemma is proved. □

#### 4.2. Exponential Tightness of Π^{N}

We begin with the following technical lemma, the proof of which can be found in Appendix D.

**Lemma 10.** There exist positive constants c > 0 and a > 1 such that, for all N,

where ϕ^{N} is defined in Equation (43).

This lemma allows us to prove the exponential tightness:

**Proposition 11.** The family {Π^{N}} is exponentially tight.

**Proof.** Let
$B\in \mathcal{B}({\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}}))$. We have from Proposition 10

Through Hölder’s Inequality, we find that for any a > 1:

Now it may be observed that

Since Γ_{1} ≤ 0, it follows from Lemma 10 that

By the exponential tightness of {R^{N}} (as stated in Theorem 2), for each L > 0, there exists a compact set K_{L} such that
$\overline{\underset{N\to \infty}{\mathrm{lim}}}\phantom{\rule{0.2em}{0ex}}{N}^{-1}\mathrm{log}({R}^{N}({K}_{L}^{c}))\le -L$. Thus if we choose the compact set
${K}_{\mathrm{\Pi},L}={K}_{\frac{a}{a-1}(L+\frac{c}{a})}$, then for all L > 0,
$\overline{\underset{N\to \infty}{\mathrm{lim}}}\phantom{\rule{0.2em}{0ex}}{N}^{-1}\mathrm{log}({\mathrm{\Pi}}^{N}({K}_{\mathrm{\Pi},L}^{c}))\le -L$. □

#### 4.3. Upper Bound on the Compact Sets

In this section we obtain an upper bound on the compact sets, i.e., the first half of Theorem 1 for F compact. Our method is to obtain an LDP for a simplified Gaussian system (with fixed A^{ν} and c^{ν}), and then prove that this converges to the required bound as ν → μ.

#### 4.3.1. An LDP for a Gaussian Measure

We linearise Γ in the following manner. Fix $\nu \in {\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$ and assume for the moment that μ ∈ $\mathcal{E}$_{2}. Let

where ${\varphi}_{\infty}^{N}:{\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})\times {\mathcal{T}}_{1,T}^{N}\to \mathbb{R}$ is defined by

**Remark 7.** Note the subtle difference with the definition of Φ^{N} in Equation (43): we use A^{ν,k} instead of${A}_{\left[N\right]}^{\nu ,k}$. When turning to spectral representations of${\Phi}_{\infty}^{N}$, this will bring in the matrices Ã^{ν,N,l}, l = −n, ⋯, n, defined at the beginning of Section 4.3.2.

Let us also define

where K^{ν,N} is the NT × NT matrix with T × T blocks noted K^{ν,N,l}. We define

${\mathrm{\Gamma}}_{2}^{\nu}(\mu )=\phantom{\rule{0.2em}{0ex}}{\mathrm{lim}}_{N\to \infty}{\mathrm{\Gamma}}_{[N],2}^{\nu}(\mu )$. We find, using the first identity in Proposition 6, that

where ${\overline{v}}^{\mu}={\mathbb{E}}^{{\underset{\xaf}{\mu}}_{1,T}}[{v}^{0}]$ and ${\tilde{v}}^{\mu}$ is the spectral measure defined in Equation (37). We recall that : denotes double contraction on the indices.

Similarly to Lemma 6, we find that

For μ ∈ $\mathcal{E}$_{2}, we define H^{ν}(μ) = I^{(3)}(μ, P^{ℤ}) − Γ^{ν}(μ); for μ ∉ $\mathcal{E}$_{2}, we define
${\mathrm{\Gamma}}_{2}^{\nu}(\mu )={\mathrm{\Gamma}}^{\nu}(\mu )=\infty $ and H^{ν} (μ) = ∞.

**Definition 6.** Let
${\underset{\xaf}{Q}}^{\nu}\in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ with N-dimensional marginals
${\underset{\xaf}{Q}}^{\nu ,N}$ given by

where $B\in \mathcal{B}\left({\mathcal{T}}^{N}\right)$. This defines a law ${Q}^{\nu}\in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ according to the correspondence in Definition 1.

We have the following lemma.

**Lemma 11.**${\underset{\xaf}{Q}}_{1,T}^{\nu}$ is a stationary Gaussian process of mean c^{ν}. Its N-dimensional spatial and T-dimensional temporal marginals${\underset{\xaf}{Q}}_{1,T}^{\nu ,N}$ are in${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}_{1,T}^{N}\right)$ and have covariance σ^{2}Id_{NT} + K^{ν,N}. The spectral density of${\underset{\xaf}{Q}}_{1,T}^{\nu}$ is${\sigma}^{2}{\mathrm{Id}}_{T}+{\tilde{K}}^{\nu}(\theta )$ and in addition,

**Proof.** In effect we find that

where c^{ν,N} denotes the NT-dimensional vector obtained by concatenating N times the vector c^{ν}. We also have that

Thus, through Proposition 2, we find that

It is seen that
${\underset{\xaf}{Q}}_{1,T}^{\nu ,N}$ is an NT-dimensional Gaussian measure with mean c^{ν,N}, inverse covariance matrix
$\frac{1}{{\sigma}^{2}}\left({\mathrm{Id}}_{NT}-{A}^{\nu ,N}\right)$, and covariance matrix σ^{2}Id_{NT}+K^{ν,N}. Hence
${\underset{\xaf}{Q}}_{1,T}^{\nu ,N}$ is in
${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}_{1,T}^{N}\right)$, and

${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{N}\right)$. It follows also that the spectral density of ${\underset{\xaf}{Q}}_{1,T}^{\nu}$ is ${\sigma}^{2}{\mathrm{Id}}_{T}+{\tilde{K}}^{\nu}\left(\theta \right)$. □
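The matrix identity behind this proof, namely that the inverse covariance $\frac{1}{{\sigma}^{2}}\left({\mathrm{Id}}-A\right)$ with A = K(σ²Id + K)^{−1} corresponds exactly to the covariance σ²Id + K, follows from $\frac{1}{{\sigma}^{2}}(\mathrm{Id}-A)=\frac{1}{{\sigma}^{2}}({\sigma}^{2}\mathrm{Id}+K-K){({\sigma}^{2}\mathrm{Id}+K)}^{-1}={({\sigma}^{2}\mathrm{Id}+K)}^{-1}$. A quick numerical check (sizes and values are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(6)
d, sigma2 = 5, 0.7                 # arbitrary size and noise variance
M = rng.standard_normal((d, d))
K = M @ M.T                        # a covariance matrix
I = np.eye(d)

A = K @ np.linalg.inv(sigma2 * I + K)
# (1/sigma^2)(I - A) = (1/sigma^2)(sigma^2 I + K - K)(sigma^2 I + K)^{-1}
#                    = (sigma^2 I + K)^{-1}
assert np.allclose((I - A) / sigma2, np.linalg.inv(sigma2 * I + K))
```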

We may thus define the measure ${\underset{\xaf}{Q}}^{\nu}$ of a stationary process over the variables ${\left\{{v}_{s}^{j}\right\}}_{j\in \mathbb{Z},s=0,\dots ,T}$, with N-dimensional marginals given by Equations (67) and (68).

**Definition 7.** Let
${\underset{\xaf}{\mathrm{\Pi}}}^{\nu ,N}$ be the image law of
${\underset{\xaf}{Q}}^{\nu ,N}$ under
${\widehat{\underset{\xaf}{\mu}}}_{N}$, i.e., for
$B\in \mathcal{B}\left({\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)\right)$,

The point is that it can be shown that the image
${\underset{\xaf}{\mathrm{\Pi}}}^{\nu ,N}$ of the measure
${\underset{\xaf}{Q}}^{\nu ,N}$ satisfies a strong LDP (see next lemma) and that this LDP can be transferred to Π^{N}, see Proposition 12. We begin with the following lemma which is a generalization of the result in [39].

**Lemma 12.** The image law${\underset{\xaf}{\mathrm{\Pi}}}^{\nu ,N}$ satisfies a strong LDP (in the manner of Theorem 1) with good rate function

**Proof.** We have found an LDP for a Gaussian process in [42]. Since Q^{ν} may be separated in the manner of Equation (68), we may use the expression in Theorem 2 to obtain the result. □

For ${B}_{N}\in \mathcal{B}\left({\mathcal{T}}_{1,T}^{N}\right)$, we define the image law

It follows from the contraction principle that if we write ${H}^{\nu}(\mu ):={\underset{\xaf}{H}}^{\nu}(\underset{\xaf}{\mu})$, then

**Corollary 1.** The image law Π^{ν,N} satisfies a strong LDP with good rate function

#### 4.3.2. An Upper Bound for Π^{N} over Compact Sets

In this section we derive an upper bound for Π^{N} over compact sets using the LDP of the previous section. Before we do this, we require two lemmas governing the ‘distance’ between Γ^{ν} and Γ. Let
${\tilde{K}}^{\mu ,N}$ be the DFT of
${\left({K}^{\mu ,j}\right)}_{j=-n}^{n}$, and similarly Ã^{µ,N} is the DFT of
${\left({A}^{\mu ,j}\right)}_{j=-n}^{n}$. We define

**Lemma 13.** For all$\nu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$,
${C}_{N}^{\nu}$ is finite and

**Proof.** We recall from Proposition 4 that
${\tilde{K}}_{[M],st}^{\nu}(\theta )$ converges uniformly (in θ) to
${\tilde{K}}_{st}^{\nu}(\theta )$. The same holds for
${\tilde{K}}_{st}^{\nu ,M,l}$, because this represents the partial summation of an absolutely convergent Fourier series. That is, with θ = 2πl_{M}/M fixed,
${\tilde{K}}_{st}^{\nu ,M,{l}_{M}}\to {\tilde{K}}_{st}^{\nu}(\theta )$ as M → ∞. The result then follows from the equivalence of matrix norms. The proof for Ã^{ν} is analogous. □

The second lemma, the proof of which can be found in Appendix E, goes as follows.

**Lemma 14.** There exists a constant C_{0} such that for all ν in${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$, all ε > 0 and all μ ∈ V_{ε}(ν) ∩ $\mathcal{E}$_{2},

Here V_{ε}(ν) is the open neighbourhood defined in Proposition 4, and $\underset{\xaf}{\mu}$ is given in Definition 1.

We are now ready to begin the proof of the upper bound on compact sets for which we follow the ideas in [4].

**Proposition 12.** Let$\mathcal{K}$ be a compact subset of${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$. Then

**Proof**. Fix ε > 0. Let V_{ε}(ν) be the open neighbourhood of ν defined in Proposition 4, and let
${\overline{V}}_{\epsilon}(\nu )$ be its closure. Since
$\mathcal{K}$ is compact and
${\{{V}_{\epsilon}(\nu )\}}_{\nu \in \mathcal{K}\phantom{\rule{0.2em}{0ex}}}$ is an open cover, there exists an r and
${\left\{{\nu}_{i}\right\}}_{i=1}^{r}$ such that
$\mathcal{K}\subset {\displaystyle {\cup}_{i=1}^{r}{V}_{\epsilon}({\nu}_{i})}$. We find that

It follows from the fact that ${\widehat{\mu}}_{N}\in {\mathcal{E}}_{2}$, Proposition 10 and Lemma 14 that

From the definition of Q^{ν,N} in Equation (64) and Hölder’s Inequality, for p, q such that
$\frac{1}{p}+\frac{1}{q}=1$, we have

We note from Lemma 3 that the eigenvalues of the covariance of
${\underset{\xaf}{Q}}_{1,T}^{{\nu}_{i},N}$ are upperbounded by σ^{2} + ρ_{K}. Thus for this integral to converge it is sufficient that

This condition will always be satisfied for sufficiently small ε and sufficiently large N (since ${C}_{N}^{{\nu}_{i}}\to 0$ as N → ∞ by Lemma 13). Considering Equation (73), by Corollary 1,

We next find an upper bound for the integral appearing in the definition of the quantity D. We apply Lemma 19 in Appendix A to find

where ${\mathrm{Id}}_{T}\otimes {1}_{N}$ is the NT × T block matrix with each block equal to ${\mathrm{Id}}_{T}$, and ${B}^{k}$, k = −n, ⋯, n, are the T × T blocks of B. We have

where ${({\tilde{B}}^{k})}_{k=-n,\cdots ,n}$ is the DFT of ${({B}^{k})}_{k=-n,\cdots ,n}$. Let v_{m} be the largest eigenvalue of B. Since (by Lemma 20) the eigenvalues of ${\tilde{B}}^{0}$ are a subset of the eigenvalues of B, we have

From the definition of B and through Lemma 3 we have ${v}_{m}\le \frac{{\sigma}^{2}+{\rho}_{K}}{1-2{C}_{0}q(\epsilon +{C}_{N}^{{\nu}_{i}})({\sigma}^{2}+{\rho}_{K})}$. Hence, since ${\Vert {c}^{{\nu}_{i}}\Vert}^{2}\le T{\overline{J}}^{2}$, we have

Since the determinant is the product of the eigenvalues, we similarly find that

Upon collecting the above inequalities, and noting that ${\Vert {c}^{\nu}\Vert}^{2}\le T{\overline{J}}^{2}$, we find that

We let $s(q,\epsilon )=\overline{\underset{N\to \infty}{\mathrm{lim}}}{s}_{N}^{{\nu}_{i}}(q,\epsilon )$, and find through Lemma 13 that

Notice that s(q, ε) is independent of ν_{i} and that s(q, ε) → 0 as ε → 0. Using Equations (73), (75) and (76) we thus find that

Recall that H^{ν}(μ) = ∞ for all μ ∉ $\mathcal{E}$_{2}. Thus if
$\mathcal{K}\cap {\mathcal{E}}_{2}=\varnothing $, we may infer that
$\overline{\underset{N\to \infty}{\mathrm{lim}}}{N}^{-1}\mathrm{log}\left({\mathrm{\Pi}}^{N}(\mathcal{K})\right)=-\infty $ and the proposition is evident. Thus we may assume without loss of generality that
${\mathrm{inf}}_{\mu \in \mathcal{K}}{H}^{{\nu}_{i}}(\mu )={\mathrm{inf}}_{\mu \in \mathcal{K}\cap {\mathcal{E}}_{2}}{H}^{{\nu}_{i}}(\mu )$. Furthermore it follows from Proposition 13 (below) that there exists a constant C_{I} such that for all
$\mu \in {\overline{V}}_{\epsilon}({\nu}_{i})\cap {\mathcal{E}}_{2}$,

We thus find that

We take ε → 0 and find, through the use of Lemma 15, that

The proof may thus be completed by taking p → 1. □

**Proposition 13.** There exists a positive constant C_{I} such that, for all ν in${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})\cap {\mathcal{E}}_{2}$, all ε > 0 and all$\mu \in {\overline{V}}_{\epsilon}(\nu )\cap {\mathcal{E}}_{2}$ (where${\overline{V}}_{\epsilon}(\nu )$ is the neighbourhood defined in Proposition 4),

The proof is very similar to that of Lemma 14 and we leave it to the reader. We end this section with Lemma 15, whose proof can be found in Appendix D.

**Lemma 15.** There exist constants a > 1 and c > 0 such that for all$\mu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})\cap {\mathcal{E}}_{2}$,

#### 4.4. End of the Proof of Theorem 1

**Lemma 16.** H(μ) is lower semicontinuous.

The proof is very similar to that in [42]. Because {Π^{N}} is exponentially tight and satisfies the weak LDP with rate function H(μ), the following corollary is immediate (Lemma 2.1.5 in [37]).

**Corollary 2.** H(μ) is a good rate function, i.e., the sets {μ: H(μ) ≤ δ} are compact for all δ ∈ ℝ^{+} and it satisfies the first condition of Theorem 1.

This allows us to complete the proof of Theorem 1:

**Proof**. By combining Lemmas 16 and 9, Proposition 11, and Corollary 2, we complete the proof of Theorem 1.

## 5. Characterization of the Unique Minimum of the Rate Function

We prove that there exists a unique minimum µ_{e} of the rate function, and provide explicit equations for µ_{e} which facilitate its numerical computation. We start with the following lemma.

**Lemma 17.** For$\mu ,\nu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$, H^{ν}(µ) = 0 if and only if µ = Q^{ν}.

**Proof**. This is a straightforward consequence of Theorem 1 in [42] and Theorem 2. □

**Proposition 14.** There is a unique distribution${\mu}_{e}\in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ which minimizes H. This distribution satisfies H(µ_{e}) = 0.

**Proof**. By the previous lemma, it suffices to prove that there is a unique µ_{e} such that

We define the mapping $L:{\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})\to {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ by

It follows from Equation (65) that

It may be inferred from the definitions in Section 3.1 that the marginal of L(µ) = Q^{µ} over
${\mathcal{F}}_{0,t}$ only depends upon the marginal of µ over
${\mathcal{F}}_{0,t-1},t\ge 1$. This follows from the fact that
${\underset{\xaf}{Q}}_{1,t}^{\mu}$ (which determines
${Q}_{0,t}^{\mu}$) is completely determined by the means
$\{{c}_{s}^{\mu};s=1,\dots ,t\}$ and covariances
$\{{K}_{uv}^{\mu ,j};j\in \mathbb{Z},u,v\in [1,t]\}$. In turn, it may be observed from Equations (18) and (23) that these variables are determined by µ_{0,t−1}. Thus for any
$\mu ,\nu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ and t ∈ [1, T], if

then

It follows from repeated application of the above identity that for any ν satisfying

Defining

µ_{e} satisfies Equation (78).

Conversely if µ = L(µ) for some µ, then we have that µ = L^{2}(ν) for any ν such that ν_{0,T−2} = µ_{0,T−2}. Continuing this reasoning, we find that µ = L^{T}(ν) for any ν such that ν_{0} = µ_{0}. But by Equation (79), since Q^{µ} = µ, we have
${\mu}_{0}={\mu}_{I}^{\mathbb{Z}}$. We have just seen that any µ satisfying µ = L^{T}(ν), where
${\nu}_{0}={\mu}_{I}^{\mathbb{Z}}$, is uniquely defined by Equation (81), which means that µ = µ_{e}. □

We may use the proof of Proposition 14 to characterize the unique measure µ_{e} such that
${\mu}_{e}={Q}^{{\mu}_{e}}$ in terms of its image
$\underset{\xaf}{{\mu}_{e}}$. This characterization allows one to directly numerically calculate µ_{e}. We characterize
$\underset{\xaf}{{\mu}_{e}}$ recursively (in time), by providing a method of determining
${\underset{\xaf}{\mu}}_{{e}_{0,t}}$ in terms of
${\underset{\xaf}{\mu}}_{{e}_{0,t-1}}$. However, we must first explicitly outline the bijective correspondence between ${\mu}_{{e}_{0,t}}$ and
${\underset{\xaf}{\mu}}_{{e}_{0,t}}$, as follows. For
$v\in \mathcal{T}$, we write Ψ^{−1}(v) = (Ψ^{−1}(v)_{0}, ⋯, Ψ^{−1}(v)_{T}). We recall from Equation (14) that Ψ^{−1}(v)_{0} = v_{0}. The coordinate Ψ^{−1}(v)_{t} is the affine function of v_{s}, s = 0, ⋯, t obtained from Equation (14)

Let ${K}_{(t-1,s-1)}^{{\mu}_{e},l}$ be the (t − 1) × (s − 1) submatrix of ${K}^{{\mu}_{e},l}$ composed of the rows from times 1 to (t − 1) and the columns from times 1 to (s − 1), and

Let the measures ${\underset{\xaf}{{\mu}_{e}}}_{0,t}^{1}$ and ${\underset{\xaf}{{\mu}_{e}}}_{t,s}^{(0,l)}$ be given by

The lemma below is evident from the definitions above.

**Lemma 18.** For any t ∈ [1, T], the variables $\{{c}_{s}^{{\mu}_{e}},{K}_{rs}^{{\mu}_{e},j}:1\le r,s\le t,j\in \mathbb{Z}\}$ are necessary and sufficient to completely characterize the measures $\{{\underset{\xaf}{{\mu}_{e}}}_{0,t}^{1},{\underset{\xaf}{{\mu}_{e}}}_{(0,t)}^{(0,l)}:l\in \mathbb{Z}\}$. In turn, these measures are necessary and sufficient to characterize ${\underset{\xaf}{\mu}}_{{e}_{0,t}}$.

The inductive method for calculating $\underset{\xaf}{{\mu}_{e}}$ is outlined in the theorem below.

**Theorem 3.** We may characterize $\underset{\xaf}{{\mu}_{e}}$ inductively as follows. Initially, ${\underset{\xaf}{{\mu}_{e}}}_{0}={\mu}_{I}^{\mathbb{Z}}$. Given that we have a complete characterization of

For 1 ≤ r, s ≤ t, ${K}_{rs}^{{\mu}_{e},k}={\displaystyle {\sum}_{l=-\infty}^{\infty}\mathrm{\Lambda}(k,l)}{M}_{rs}^{\mu ,l}$. Here, for p = max(r − 1, s − 1),

Of course the measure µ_{e} may be determined from
$\underset{\xaf}{{\mu}_{e}}$ since
${\mu}_{e}=\underset{\xaf}{{\mu}_{e}}\circ \mathrm{\Psi}$.

## 6. Some Important Consequences of Theorem 1

We state some important consequences of our results, including some which are valid J-almost surely (quenched results). We recall that Q^{N}(J^{N}) is the conditional law of the N neurons for a given J^{N}.

**Theorem 4**. Π^{N} converges weakly to ${\delta}_{{\mu}_{e}}$, i.e., for all $\Phi \in {\mathcal{C}}_{b}({\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}}))$,

Similarly,

**Proof**. The proof of the first result follows directly from the existence of an LDP for the measure Π^{N} (see Theorem 1), and is a straightforward adaptation of Theorem 2.5.1 in [18]. The proof of the second result uses the same method, making use of Theorem 5 below.

We can in fact obtain the following quenched convergence analogue of Equation (16).

**Theorem 5**. For each closed set F of ${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$, and for almost all J,

**Proof**. The proof is a combination of Tchebyshev’s inequality and the Borel-Cantelli lemma and is a straightforward adaptation of Theorem 2.5.4 and Corollary 2.5.6 in [18].

We define ${\stackrel{\u2323}{Q}}^{N}({J}^{N})=\frac{1}{N}{\displaystyle {\sum}_{j=-n}^{n}{Q}^{N}({J}^{N})}\circ {S}^{-j}$ where we recall the shift operator S defined by Equation (8). Clearly ${\stackrel{\u2323}{Q}}^{N}({J}^{N})$ is in ${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{N})$.

**Corollary 3.** Fix M and let N > M. For almost every J and all $h\in {\mathcal{C}}_{b}({\mathcal{T}}^{M})$,

That is, the M^{th} marginals${\stackrel{\u2323}{Q}}^{N,M}({J}^{N})$ and Q^{N,M} converge weakly to${\mu}_{e}^{M}$ as N → ∞.

**Proof**. It is sufficient to apply Theorem 4 in the case where Φ in
${\mathcal{C}}_{b}({\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}}))$ is defined by

noting that, for each N, ${\stackrel{\u2323}{Q}}^{N}(J)\in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{N})$ (Lemma 1 and the above remark).

We now prove the following ergodic-type theorem. We may represent the ambient probability space by
$\mathfrak{W}$, where
$\omega \in \mathfrak{W}$ is such that
$\omega =({J}_{ij},{B}_{t}^{j},{\mu}_{0}^{j})$, where i, j ∈ ℤ and 0 ≤ t ≤ T − 1, recall Equation (1). We denote the probability measure governing ω by
$\mathfrak{P}$. Let
${u}^{(N)}(\omega )\in {\mathcal{T}}^{N}$ be defined by Equation (1). As an aside, we may then understand Q^{N}(J^{N}) to be the conditional law of
$\mathfrak{P}$ on u^{(N)}(ω), for given J^{N}.

**Theorem 6.** Fix M > 0 and let $h\in {C}_{b}({\mathcal{T}}^{M})$. Let ${u}^{(N)}(\omega )\in {\mathcal{T}}^{N}$ (where N > M) and |j| ≤ n. Then $\mathfrak{P}$ almost surely,

Furthermore, ${\widehat{\mu}}_{N}({u}^{(N)}(\omega ))$ converges weakly to µ_{e}.

**Proof**. Our proof is an adaptation of [18]. We may suppose without loss of generality that
${\displaystyle {\int}_{{\mathcal{T}}^{M}}h(u)\,d{\mu}_{e}^{M}(u)=0}$. For p > 1 let

Since µ_{e} ∉ F_{p} and µ_{e} is the unique zero of H, it follows that
${\mathrm{inf}}_{{F}_{p}}H=m>0$. Thus by Theorem 1 there exists an N_{0}, such that for all N > N_{0},

However

Thus

We may thus conclude from the Borel-Cantelli Lemma that, $\mathfrak{P}$ almost surely, for every
$\omega \in \mathfrak{W}$, there exists N_{p} such that for all N ≥ N_{p},

This yields Equation (82) because p is arbitrary. The convergence of
${\widehat{\mu}}_{N}({u}^{(N)}(\omega ))$ is a direct consequence of Equation (82), since this means that each of the M^{th} marginals converges.

## 7. Possible Extensions

Our results hold true if we assume that Equation (1) is replaced by the more general equation

The additional random variables appearing in this equation, indexed by j, are independent and identically distributed, independent of the synaptic weights J and of the random processes B^{j}; they can be thought of as external stimuli imposed on the neurons. This equation accounts for a more complicated “intrinsic” dynamics of the neurons, i.e., when they are uncoupled. The parameters γ_{k}, k = 1, ⋯, l, must satisfy some conditions to ensure the stability of the uncoupled dynamics.

This result can be straightforwardly extended to the case where the noise is correlated but stationary Gaussian, that is, $\mathrm{cov}({B}_{s}^{j},{B}_{t}^{k})$ is some function of s, t and (k − j). It can also be easily extended to the case where the initial distribution is correlated but mixing, using the Large Deviation Principle in [43].
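As a toy illustration of such spatially stationary correlations (a sketch in numpy, not taken from the paper; the correlation profile is an assumption chosen for illustration), the snippet below builds a covariance on a ring of N neurons that depends only on (k − j), checks that it is circulant, and samples from it exactly through a Cholesky factor:

```python
import numpy as np

# Toy stationary spatial correlation on a ring of N neurons:
# cov(B^j, B^k) = rho((k - j) mod N), with rho an assumed geometric profile.
N = 9
dist = np.minimum(np.arange(N), N - np.arange(N))   # circular distances
rho = 0.5 ** dist                                   # assumed correlation profile
C = np.array([[rho[(k - j) % N] for k in range(N)] for j in range(N)])

# C depends on j, k only through (k - j): it is circulant, hence stationary.
assert np.allclose(C, np.roll(np.roll(C, 1, axis=0), 1, axis=1))

L = np.linalg.cholesky(C)        # requires C positive definite
rng = np.random.default_rng(2)
B = L @ rng.standard_normal(N)   # one time-slice of correlated noise

# The sampler's exact covariance is L @ L.T == C by construction.
assert np.allclose(L @ L.T, C)
```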

The hypothesis that the synaptic weights are Gaussian is somewhat unrealistic from the biological viewpoint. In his PhD thesis [18], Moynot has obtained some preliminary results in the case of uncorrelated weights. We think that this is also a promising avenue.

In his thesis, Moynot also extended the uncorrelated-weights case to include two populations with different (Gaussian) statistics for each population. This is also an important practical problem in neuroscience. Extending Moynot’s result to the correlated case is probably low-hanging fruit.

Last but not least, the equations derived in Section 5 for the mean and covariance operator of the measure minimizing the rate function are very much worth investigating, both analytically and numerically, and their predictions should be confronted with biological measurements.

## 8. Conclusions

In recent years there has been a lot of effort to mathematically justify neural-field models through some sort of asymptotic analysis of finite-size neural networks. Many, if not most, of these models assume or prove some sort of thermodynamic limit, whereby if one isolates a particular population of neurons in a localized area of space, they are found to fire increasingly asynchronously as the size of the population tends to infinity [44]. Indeed, this was the result of Moynot and Samuelides. However, our results imply that there are system-wide correlations between the neurons, even in the asymptotic limit. The key reason why we do not have propagation of chaos is that the Radon-Nikodym derivative
$\frac{d{Q}^{N}}{d{P}^{N}}$ of the averaged laws in Proposition 8 cannot be tensored into N independent and identically distributed processes, whereas the simpler assumptions on the weight function Λ in Moynot and Samuelides allow the Radon-Nikodym derivative to be tensored. A very important implication of our result is that the mean-field behavior is insufficient to characterize the behavior of a population. Our limit process µ_{e} is system-wide and ergodic. Our work challenges the assumption, held by some, that one cannot have a “concise” macroscopic description of a neural network without an assumption of asynchronicity at the local population level.

It would be of interest to compare our LDP with other analyses of the rate of convergence of neural networks to their limits as the size tends to infinity. This includes the system-size expansion of Bressloff [45], the path-integral formulation of Buice and Cowan [46] and the systematic expansion of the moments by (amongst others) [47–49].

## Appendix

## A. Two Useful Lemmas

The following lemma from Gaussian calculus [18,50], which we recall for completeness, is used several times throughout the paper.

**Lemma 19.** Let Z be a Gaussian vector of ℝ^{p} with mean c and covariance matrix K. If a ∈ ℝ^{p} and b ∈ ℝ are such that for all eigenvalues α of K the relation αb > − 1 holds, we have
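The displayed formula has not survived the conversion of this version. For reference, a standard form of the identity consistent with the stated eigenvalue condition is $\mathbb{E}\left[\mathrm{exp}\left({}^{t}aZ-\frac{b}{2}{}^{t}ZZ\right)\right]={\mathrm{det}(I+bK)}^{-1/2}\mathrm{exp}\left(\frac{1}{2}{}^{t}(a+{K}^{-1}c)K{(I+bK)}^{-1}(a+{K}^{-1}c)-\frac{1}{2}{}^{t}c{K}^{-1}c\right)$; the exact formulation in [18,50] may differ. The sketch below checks the one-dimensional case against a seeded Monte Carlo estimate:

```python
import numpy as np

# 1-D instance of the Gaussian identity: Z ~ N(c, K); the expectation
# E[exp(a*Z - (b/2)*Z^2)] is finite and explicit when 1 + b*K > 0
# (this is the condition "alpha*b > -1" on the eigenvalues alpha of K).
c, K, a, b = 0.7, 1.3, 0.4, 0.5
assert 1 + b * K > 0

# Completing the square gives the closed form
# (1 + bK)^{-1/2} * exp( (a + c/K)^2 * K / (2(1 + bK)) - c^2 / (2K) ).
closed = (1 + b * K) ** (-0.5) * np.exp(
    (a + c / K) ** 2 * K / (2 * (1 + b * K)) - c ** 2 / (2 * K)
)

# Deterministic (seeded) Monte Carlo estimate of the same expectation.
rng = np.random.default_rng(0)
z = rng.normal(c, np.sqrt(K), size=2_000_000)
mc = np.exp(a * z - 0.5 * b * z ** 2).mean()
assert abs(mc - closed) < 1e-2
```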

Block-circulant matrices may be diagonalised using DFTs as follows.

**Lemma 20.** Let B be a symmetric block-circulant matrix with the (j, k) T × T block given by (B^{(}^{j}^{−}^{k}^{) mod} ^{N}), j, k = −n, ⋯, n. Let W^{(N)} be the N × N unitary matrix with elements ${W}_{jk}^{(N)}=\frac{1}{\sqrt{N}}\mathrm{exp}\left(\frac{2\pi ijk}{N}\right)$, j, k = −n, ⋯, n. Then B may be ‘block’-diagonalised in the following manner (where ⊗ is the Kronecker product and ^{∗} the complex conjugate),

Here ${\tilde{B}}^{j}$ is a T × T Hermitian matrix and is the DFT defined in Equation (88). We observe also that λ is an eigenvalue of B if and only if λ is an eigenvalue of ${\tilde{B}}^{k}$ for some k.
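This block-diagonalisation is easy to check numerically. The following sketch (numpy; it uses one common sign convention for the DFT, which may differ from Equation (88) by a sign in the exponent) assembles a symmetric block-circulant matrix and verifies that its spectrum is the union of the spectra of the T × T DFT blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 3  # N blocks of size T x T

# Random blocks C[m], m = 0..N-1, with C[m]^T = C[N-m] so that B is symmetric.
C = [rng.standard_normal((T, T)) for _ in range(N)]
C[0] = C[0] + C[0].T
for m in range(1, (N + 1) // 2):
    C[N - m] = C[m].T

# Assemble the symmetric block-circulant matrix B with (j,k) block C[(j-k) % N].
B = np.block([[C[(j - k) % N] for k in range(N)] for j in range(N)])
assert np.allclose(B, B.T)

# 'Block'-DFT: B_tilde[j] = sum_m C[m] * exp(-2*pi*i*j*m/N) is Hermitian,
# and the spectrum of B is the union of the spectra of the B_tilde[j].
omega = np.exp(-2j * np.pi / N)
B_tilde = [sum(C[m] * omega ** (j * m) for m in range(N)) for j in range(N)]

ev_blocks = np.sort(np.concatenate([np.linalg.eigvalsh(Bt) for Bt in B_tilde]))
ev_full = np.sort(np.linalg.eigvalsh(B))
assert np.allclose(ev_blocks, ev_full)
```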

## B. Proof of Proposition 4

We first recall Proposition 4:

**Proposition 4.** Fix$\nu \in {\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$. For all ε > 0, there exists an open neighborhood V_{ε}(ν) such that for all µ ∈ V_{ε}(ν), all s, t ∈ [1, T] and all θ ∈ [−π, π[,

**Proof**. Let µ be in
${\mathcal{M}}_{1,S}^{+}({\mathcal{T}}^{\mathbb{Z}})$ and θ ∈ [−π, π[. We have

Using Equation (23) we have

µ^{L} and ν^{L}. Since $\left|f\left({u}_{s-1}^{0}\right)f\left({u}_{t-1}^{l}\right)-f\left({v}_{s-1}^{0}\right)f\left({v}_{t-1}^{l}\right)\right|\le 2({k}_{f}{d}_{L}({\pi}_{L}u,{\pi}_{L}v)\wedge 1)$, where ${k}_{f}$ is the Lipschitz constant of the function f, we find (through Equation (12)) that

Thus for Equation (32) to be satisfied, it suffices for us to stipulate that V_{ε}(ν) is a ball of radius less than
$\frac{\epsilon}{2}$ (with respect to the distance metric in Equation (12)). Similar reasoning dictates that Equation (35) is satisfied too.

However, in light of Lemma 4, it is evident that we may take the radius of V_{ε}(ν) to be sufficiently small that Equations (32), (35) and (36) are satisfied. In fact Equation (33) is also satisfied, as it may be obtained by taking the limit as N → ∞ of Equation (36). Since c^{µ} is determined by the one-dimensional spatial marginal of µ, it follows from the definition of the metric in Equation (12) that we may take the radius of V_{ε}(ν) to be sufficiently small that Equation (34) is satisfied too.

## C. Existence of a Lower Bound for Φ^{N}(µ, v)

In order to prove that φ^{N}(µ, v) defined in Equation (42) possesses a lower bound, we use a spectral representation and let
${\tilde{w}}^{j}={\tilde{v}}^{j}$ for all j, except that
${\tilde{w}}^{0}={\tilde{v}}^{0}-N{c}^{\mu}$. We may then write

Thus, in order that the integrand possesses a lower bound, it suffices to prove, since the matrices ${\tilde{A}}_{[N]}^{\mu ,l}$ are Hermitian positive, that there exists a lower bound for

We have made use of the fact that ${\tilde{w}}^{0}$ and ${\tilde{A}}_{[N]}^{\mu ,0}$ are real (since they are each a sum of real variables). Let ${\tilde{K}}_{[N]}^{\mu ,0}={O}_{[N]}^{\mu}{D}_{[N]}^{\mu}{}^{\dagger}{O}_{[N]}^{\mu}$, where ${D}_{[N]}^{\mu}$ is diagonal and ${O}_{[N]}^{\mu}$ is orthogonal. We define $X={}^{\dagger}{O}_{[N]}^{\mu}{\tilde{w}}^{0}$, so that Equation (84) is equal to

**Lemma 21.** For each 1 ≤ t ≤ T,

**Proof**. If $\overline{J}=0$ the conclusion is evident, thus we assume throughout this proof that $\overline{J}\ne 0$. Since ${D}_{[N],tt}^{\mu}={}^{\dagger}{\overline{O}}_{[N],t}^{\mu}{\tilde{K}}_{[N]}^{\mu ,0}{O}_{[N],t}^{\mu}$, we find from the definition that

We introduce the matrices (L^{µ,k}), k ∈ ℤ, where for 1 ≤ s, t ≤ T,

These matrices have the same properties as the matrices M^{µ,k}; in particular, the discrete Fourier transform
${\left({\tilde{L}}^{{\mu}^{N},l}\right)}_{l=-n,\dots ,n}$ is Hermitian positive. Using this spectral representation we write

We may use the previous lemma to obtain a lower bound for the quadratic form (85). We recall the easily-proved identity from the calculus of quadratics that, for all x ∈ ℝ,
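The corresponding display is missing from this version; presumably it is the standard completing-the-square bound, stated here as a reconstruction under that assumption:

```latex
\frac{\gamma}{2}\,x^{2} - \beta x \;\ge\; -\,\frac{\beta^{2}}{2\gamma},
\qquad \text{for all } x \in \mathbb{R},\ \gamma > 0,
```

with equality at $x = \beta /\gamma$.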

We therefore find, through Lemma 21, that Equation (85) is greater than or equal to

We have already noted in the proof of Proposition 5 that
$\text{Trace}({\tilde{K}}^{\mu ,0})\le T{\mathrm{\Lambda}}^{sum}$. Thus, pulling these results together, we find that ϕ^{N}(µ, v) is greater than −β_{2}, where

## D. Proof of Lemmas 10 and 15

For technical reasons, we need the following definition, which is also used in Appendix E. The motivation is that when we analyze the function Φ^{N}(µ, v) defined in Equation (43) we are led to use spectral representations and to introduce the Fourier transform
$\tilde{v}$ of v. Since
$\tilde{v}\in {(\mathbb{C}^{T})}^{N}$, the correspondence
$v\to \tilde{v}$ from
${\mathcal{T}}_{1,T}^{N}$ to ${(\mathbb{C}^{T})}^{N}$ is not a bijection. We need to take into account the symmetries of
$\tilde{v}$, hence the following definition.

**Definition 8.** For
$v={\left({v}^{j}\right)}_{j=-n\cdots n}\in {\mathcal{T}}_{1,T}^{N}$, we denote
${\mathscr{H}}^{N}(v)={v}_{\diamond}=\left({v}_{\diamond}^{-n},\dots ,{v}_{\diamond}^{n}\right)\in {\mathcal{T}}_{1,T}^{N}$, where v_{⋄} is defined from the discrete Fourier transform
$\tilde{v}=\left({\tilde{v}}^{-n},\cdots ,{\tilde{v}}^{n}\right)$ of v as follows

The inverse transform is given by ${v}^{j}=\frac{1}{N}{\displaystyle {\sum}_{k=-n}^{n}{\tilde{v}}^{k}\mathrm{exp}\left(-\frac{2\pi ijk}{N}\right)}$.
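As a quick numerical sanity check of this convention (a sketch in numpy, not part of the paper), the following verifies the forward/inverse pair, the even/odd symmetry of the real and imaginary parts for a real sequence v, and the Parseval relation used later in Appendix E:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 7                       # N = 2n + 1 odd; indices run over -n..n
n = N // 2
v = rng.standard_normal(N)  # real sequence (v^{-n}, ..., v^{n})

idx = np.arange(-n, n + 1)
# Forward DFT with the convention v_tilde^k = sum_j v^j exp(2*pi*i*j*k / N)
v_tilde = np.array([np.sum(v * np.exp(2j * np.pi * idx * k / N)) for k in idx])

# For real v the real part of the DFT is even and the imaginary part is odd.
for k in range(n + 1):
    assert np.isclose(v_tilde[n - k].real, v_tilde[n + k].real)
    assert np.isclose(v_tilde[n - k].imag, -v_tilde[n + k].imag)

# Inverse transform: v^j = (1/N) * sum_k v_tilde^k exp(-2*pi*i*j*k / N)
v_back = np.array(
    [np.sum(v_tilde * np.exp(-2j * np.pi * idx * jj / N)) / N for jj in idx]
)
assert np.allclose(v_back.real, v) and np.allclose(v_back.imag, 0)

# Parseval for this convention: sum_k |v_tilde^k|^2 = N * sum_j |v^j|^2
assert np.isclose(np.sum(np.abs(v_tilde) ** 2), N * np.sum(v ** 2))
```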

Because v is in ${\mathcal{T}}_{1,T}^{N}$ the real part of its DFT is even $(\mathrm{Re}\left({\tilde{v}}^{-k}\right)=\mathrm{Re}\left({\tilde{v}}^{k}\right),k=-n,\cdots ,n)$ and similarly its imaginary part is odd. As a consequence we define

It is easily verified that the mapping v → v_{⋄} = $\mathscr{H}$^{N}(v) is a bijection from
${\mathcal{T}}_{1,T}^{N}$ to itself, the inverse being given by

For a probability measure ${\mu}^{N}\in {\mathrm{M}}_{1}^{+}\left({\mathcal{T}}^{N}\right)$, we define ${\mu}_{\diamond}^{N}={\mu}_{1,T}^{N}\circ {({\mathscr{H}}^{N})}^{-1}$ to be the image law.

We also denote by ${\underset{\xaf}{\mu}}_{\diamond}^{N}$ the measure ${\underset{\xaf}{\mu}}_{1,T}^{N}\circ {({\mathscr{H}}^{N})}^{-1}$ (where ${\underset{\xaf}{\mu}}^{N}$ is given in Definition 1). We note that

We notice that ${\mathrm{\Gamma}}_{[N],2}(\mu )={\displaystyle {\int}_{{\mathcal{T}}_{1,T}^{N}}{\varphi}_{\diamond}^{N}\left(\mu ,{v}_{\diamond}\right){\underset{\xaf}{\mu}}_{\diamond}^{N}(d{v}_{\diamond})}$, where

**Lemma 10.** There exist positive constants c > 0 and a > 1 such that, for all N,

where ${\varphi}^{N}$ is defined in Equation (43).

**Proof.** We have from Equation (83) that
${\varphi}^{N}(\mu ,v)={\underset{\xaf}{\varphi}}_{\diamond}^{N}(\mu ,{w}_{\diamond})$, where
${w}_{\diamond}^{j}={v}_{\diamond}^{j}$ for all j, except that
${w}_{\diamond}^{0}={v}_{\diamond}^{0}-N{c}^{\mu}$. Since (by Equation (90)) the distribution of the variables
${v}_{\diamond}$ under
${\underset{\xaf}{P}}_{\diamond}^{\otimes N}$ is
${\mathcal{N}}_{T}{\left({0}_{T},N{\sigma}^{2}{\mathrm{Id}}_{T}\right)}^{\otimes N}$, the distribution of
${w}_{\diamond}$ under
${\underset{\xaf}{P}}_{\diamond}^{\otimes N}$ is
${\mathcal{N}}_{T}{\left(N{c}^{\mu},N{\sigma}^{2}{\mathrm{Id}}_{T}\right)}^{\otimes N}$. By Lemma 7, the eigenvalues of
${\tilde{A}}_{[N]}^{\mu ,j}$ are upperbounded by 0 < α < 1, for all j. Thus

Hence we find that

We note the dependency of
${\mathcal{G}}_{1}$ on (y^{j}) (for all |j| ≠ n) via
${c}^{\widehat{\mu}N}$. After diagonalisation, we find that

We assume that a > 1 is such that 1 − aα > 0. To bound this expression, we use the fact that if $\mathcal{A}:\mathbb{R}\to \mathbb{R}$ satisfies
$\left|\mathcal{A}\right|\le \mathcal{B}>0$ and γ_{c} > 0, then

Since $\left|{c}_{s}^{\widehat{\mu}N}\right|\le \left|\overline{J}\right|$ for s = 1, ⋯, T, and hence $\Vert {c}^{\widehat{\mu}N}{\Vert}^{2}\le T{\overline{J}}^{2}$, we find that ${\mathcal{G}}_{1}\le {\mathcal{G}}_{1}^{c}$, where

Thus

We include the proof of Lemma 15, which is used in the proof of the upper bound on compact sets in Section 4.3.2.

**Lemma 15.** There exist constants a > 1 and c > 0 such that for all$\mu \in {\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)\cap {\epsilon}_{2}$,

**Proof.** The constant a > 1 is chosen as in the proof of Lemma 10. We have (from Equation (50)) that

We recall that I^{(2)} may be expressed using the variational expression (49) as

where ${\varsigma}^{N}$ is a continuous, bounded function on ${\mathcal{T}}^{N}$. We let ${\varsigma}_{M}^{N}=a{1}_{{B}_{M}}{\varsigma}_{*}^{N}$, where ${\varsigma}_{*}^{N}(u)=N\left({\varphi}^{N}\left({\mu}^{N},{\mathrm{\Psi}}_{1,T}(u)\right)+{\mathrm{\Gamma}}_{[N],1}(\mu )\right)$, and u ∈ B_{M} only if either ||Ψ(u)|| ≤ NM or $\left({\varphi}^{N}\left({\mu}^{N},{\mathrm{\Psi}}_{1,T}(u)\right)+{\mathrm{\Gamma}}_{[N],1}(\mu )\right)\le 0$. We proved in Section 3.3.2 that ${\varphi}^{N}\left({\mu}^{N},{\mathrm{\Psi}}_{1,T}(u)\right)$ possesses a lower bound, which means that ${\varsigma}_{M}^{N}$ is continuous and bounded. Furthermore ${\varsigma}_{M}^{N}$ grows to ${\varsigma}_{*}^{N}$, so that after substituting ${\varsigma}_{M}^{N}$ into Equation (93) and taking M → ∞ (i.e., applying the dominated convergence theorem), we obtain

It can be easily shown, similarly to Lemma 10, that $\mathrm{log}{\displaystyle {\int}_{{\mathcal{T}}^{N}}\mathrm{exp}\left(a{\varsigma}_{*}^{N}(u)\right){P}^{\otimes N}(du)}\le Nc$. We may thus divide both sides by aN and let N → ∞ to obtain the required result.

## E. Proof of Lemma 14

We prove Lemma 14.

**Lemma 14.** There exists a constant C_{0} such that for all ν in${\mathcal{M}}_{1,S}^{+}\left({\mathcal{T}}^{\mathbb{Z}}\right)$, all ε > 0 and all µ ∈ V_{ε}(ν)∩ε_{2},

Here V_{ε}(ν) is the open neighborhood defined in Proposition 4, and ${\underset{\xaf}{\mu}}$ is given in Definition 1.

**Proof.** We first bound Γ_{1}. From Equations (60) and (61) and Lemma 20 we have

It thus follows from Proposition 4 and Lemma 13 that

We define
${\varphi}_{\infty ,\diamond}^{N}\left(\nu ,{v}_{\diamond}\right)={\varphi}_{\infty}^{N}\left(\nu ,{\left({\mathscr{H}}^{N}\right)}^{-1}({v}_{\diamond})\right)$, where ${\mathscr{H}}^{N}$ is given in Definition 8 and
${\varphi}_{\infty}^{N}$ is given in Equation (59), and find that

This means that

Upon expansion of the above expression, we find that


The lemma now follows after consideration of the fact that ${\displaystyle {\int}_{{\mathcal{T}}_{T}^{\mathbb{Z}}}\Vert {v}^{k}{\Vert}^{2}{\underset{\xaf}{\mu}}_{1,T}(dv)}={\mathbb{E}}^{{\underset{\xaf}{\mu}}_{1,T}}\left[\Vert {v}^{0}{\Vert}^{2}\right]$, $\Vert {\tilde{v}}^{0}{\Vert}^{2}\le N{\displaystyle {\sum}_{k=-n}^{n}\Vert {v}^{k}{\Vert}^{2}}$ (Cauchy-Schwarz) and, because of the properties of the DFT, ${\displaystyle {\sum}_{k=-n}^{n}\Vert {\tilde{v}}^{k}{\Vert}^{2}}=N{\displaystyle {\sum}_{l=-n}^{n}\Vert {v}^{l}{\Vert}^{2}}$. □

## Acknowledgments

Many thanks to Bruno Cessac whose suggestion to look at process-level empirical measures and entropies has been very useful and whose insights into the physical interpretations of our results have been very stimulating.

This work was partially supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 269921 (BrainScaleS), no. 318723 (Mathemacs), and by the ERC advanced grant NerVi no. 227747.

This work was supported by INRIA FRM, ERC-NERVI number 227747, European Union Project # FP7-269921 (BrainScales), and Mathemacs # FP7-ICT-2011.9.7.

## Author Contributions

Both authors contributed to all parts of the article. All authors have read and approved the final manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## References and Notes

1. Guionnet, A. Dynamique de Langevin d’un verre de spins. Ph.D. Thesis, Université de Paris Sud, Orsay, France, March 1995.
2. Ben-Arous, G.; Guionnet, A. Large deviations for Langevin spin glass dynamics. Probab. Theory Relat. Fields **1995**, 102, 455–509.
3. Ben-Arous, G.; Guionnet, A. Symmetric Langevin Spin Glass Dynamics. Ann. Probab. **1997**, 25, 1367–1422.
4. Guionnet, A. Averaged and quenched propagation of chaos for spin glass dynamics. Probab. Theory Relat. Fields **1997**, 109, 183–215.
5. Sompolinsky, H.; Zippelius, A. Dynamic theory of the spin-glass phase. Phys. Rev. Lett. **1981**, 47, 359–362.
6. Sompolinsky, H.; Zippelius, A. Relaxational dynamics of the Edwards-Anderson model and the mean-field theory of spin-glasses. Phys. Rev. B **1982**, 25, 6860–6875.
7. Crisanti, A.; Sompolinsky, H. Dynamics of spin systems with randomly asymmetric bonds: Langevin dynamics and a spherical model. Phys. Rev. A **1987**, 36, 4922–4939.
8. Crisanti, A.; Sompolinsky, H. Dynamics of spin systems with randomly asymmetric bonds: Ising spins and Glauber dynamics. Phys. Rev. A **1987**, 37, 4865–4874.
9. Dawson, D.; Gartner, J. Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics **1987**, 20, 247–308.
10. Dawson, D.; Gartner, J. Multilevel large deviations and interacting diffusions. Probab. Theory Relat. Fields **1994**, 98, 423–487.
11. Budhiraja, A.; Dupuis, P.; Fischer, M. Large deviation properties of weakly interacting processes via weak convergence methods. Ann. Probab. **2012**, 40, 74–102.
12. Fischer, M. On the form of the large deviation rate function for the empirical measures of weakly interacting systems. Bernoulli **2014**, 20, 1765–1801.
13. Sompolinsky, H.; Crisanti, A.; Sommers, H. Chaos in Random Neural Networks. Phys. Rev. Lett. **1988**, 61, 259–262.
14. Gerstner, W.; Kistler, W. Spiking Neuron Models; Cambridge University Press: Cambridge, UK, 2002.
15. Izhikevich, E. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting; MIT Press: Cambridge, MA, USA, 2007.
16. Ermentrout, G.B.; Terman, D. Foundations of Mathematical Neuroscience; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2010.
17. Cessac, B. Increase in complexity in random neural networks. J. Phys. I **1995**, 5, 409–432.
18. Moynot, O. Etude mathématique de la dynamique des réseaux neuronaux aléatoires récurrents. Ph.D. Thesis, Université Paul Sabatier, Toulouse, France, January 2000.
19. Moynot, O.; Samuelides, M. Large deviations and mean-field theory for asymmetric random recurrent neural networks. Probab. Theory Relat. Fields **2002**, 123, 41–75.
20. Cessac, B.; Samuelides, M. From neuron to neural networks dynamics. Eur. Phys. J. Spec. Top. **2007**, 142, 7–88.
21. Samuelides, M.; Cessac, B. Random Recurrent Neural Networks. Eur. Phys. J. Spec. Top. **2007**, 142, 7–88.
22. Kandel, E.; Schwartz, J.; Jessel, T. Principles of Neural Science, 4th ed.; McGraw-Hill: New York, NY, USA, 2000.
23. Dayan, P.; Abbott, L. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems; MIT Press: Cambridge, MA, USA, 2001.
24. Rudin, W. Real and Complex Analysis; McGraw-Hill: New York, NY, USA, 1987.
25. Cugliandolo, L.F.; Kurchan, J.; Le Doussal, P.; Peliti, L. Glassy behaviour in disordered systems with nonrelaxational dynamics. Phys. Rev. Lett. **1997**, 78, 350–353.
26. Lapicque, L. Recherches quantitatives sur l’excitation des nerfs traitée comme une polarisation. J. Physiol. Paris **1907**, 9, 620–635.
27. Daley, D.; Vere-Jones, D. An Introduction to the Theory of Point Processes: General Theory and Structure; Springer: New York, NY, USA, 2007; Volume 2.
28. Gerstner, W.; van Hemmen, J. Coherence and incoherence in a globally coupled ensemble of pulse-emitting units. Phys. Rev. Lett. **1993**, 71, 312–315.
29. Gerstner, W. Time structure of the activity in neural network models. Phys. Rev. E **1995**, 51, 738–758.
30. Cáceres, M.J.; Carillo, J.A.; Perthame, B. Analysis of nonlinear noisy integrate and fire neuron models: Blow-up and steady states. J. Math. Neurosci. **2011**, 1.
31. Baladron, J.; Fasoli, D.; Faugeras, O.; Touboul, J. Mean-field description and propagation of chaos in networks of Hodgkin-Huxley and FitzHugh-Nagumo neurons. J. Math. Neurosci. **2012**, 2.
32. Bogachev, V. Measure Theory, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 1.
33. When N is even the formulae are slightly more complicated, but all the results we prove below in the case N odd are still valid.
34. We denote by ${\mathcal{N}}_{p}(m,\Sigma )$ the law of the p-dimensional Gaussian variable with mean m and covariance matrix Σ.
35. Ellis, R. Entropy, Large Deviations and Statistical Mechanics; Springer: New York, NY, USA, 1985.
36. Liggett, T.M. Interacting Particle Systems; Springer: Berlin/Heidelberg, Germany, 2005.
37. Deuschel, J.D.; Stroock, D.W. Large Deviations; Pure and Applied Mathematics; Academic Press: Waltham, MA, USA, 1989; Volume 137.
38. Dembo, A.; Zeitouni, O. Large Deviations Techniques, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1997.
39. Donsker, M.; Varadhan, S. Large deviations for stationary Gaussian processes. Commun. Math. Phys. **1985**, 97, 187–210.
40. Baxter, J.R.; Jain, N.C. An Approximation Condition for Large Deviations and Some Applications. In Convergence in Ergodic Theory and Probability; Bergelson, V., Ed.; De Gruyter: Boston, MA, USA, 1993.
41. Donsker, M.; Varadhan, S. Asymptotic Evaluation of Certain Markov Process Expectations for Large Time, IV. Commun. Pure Appl. Math. **1983**, XXXVI, 183–212.
42. Faugeras, O.; MacLaurin, J. A Large Deviation Principle and an Analytical Expression of the Rate Function for a Discrete Stationary Gaussian Process. arXiv **2013**, arXiv:1311.4400.
43. Chiyonobu, T.; Kusuoka, S. The Large Deviation Principle for Hypermixing Processes. Probab. Theory Relat. Fields **1988**, 78, 627–649.
44. We noted in the introduction that this is termed propagation of chaos by some.
45. Bressloff, P. Stochastic neural field theory and the system-size expansion. SIAM J. Appl. Math. **2009**, 70, 1488–1521.
46. Buice, M.; Cowan, J. Field-theoretic approach to fluctuation effects in neural networks. Phys. Rev. E **2007**, 75.
47. Ginzburg, I.; Sompolinsky, H. Theory of correlations in stochastic neural networks. Phys. Rev. E **1994**, 50, 3171–3191.
48. ElBoustani, S.; Destexhe, A. A master equation formalism for macroscopic modeling of asynchronous irregular activity states. Neural Comput. **2009**, 21, 46–100.
49. Buice, M.; Cowan, J.; Chow, C. Systematic fluctuation expansion for neural network activity equations. Neural Comput. **2010**, 22, 377–426.
50. Neveu, J. Processus aléatoires gaussiens; Presses de l’Université de Montréal: Montréal, QC, Canada, 1968; Volume 34.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).