2. The SK Model
The SK model was introduced in the 1970s by D. Sherrington and S. Kirkpatrick [17] and stands as an explicitly solvable mean-field spin glass. In their work, the authors discovered that the solution obtained through the replica symmetric (RS) approximation was not correct at low temperature. With a groundbreaking approach, Parisi identified a new type of solution, nowadays called replica symmetry breaking (RSB), which proved to be correct at any temperature, thereby revealing a novel mathematical and physical structure [18].
The SK model is defined by its Hamiltonian, which is a function of N spins $\sigma=(\sigma_1,\dots,\sigma_N)\in\{-1,+1\}^N$:
\[
H_N(\sigma)=\frac{1}{\sqrt N}\sum_{1\le i<j\le N}J_{ij}\,\sigma_i\sigma_j\,,\tag{1}
\]
where $(J_{ij})_{1\le i<j\le N}$ is a collection of i.i.d. standard Gaussian random variables. In physical terms, the couplings between pairs of spins can be ferromagnetic or antiferromagnetic with equal probability. Consider also a random variable $h$ with $\mathbb{E}h^2<\infty$ and a collection $(h_i)_{i\le N}$ of i.i.d. copies of $h$ representing random external fields acting on the spins. The Parisi formula is a representation for the large-$N$ limit of the pressure $p_N$ defined by
\[
p_N=\frac{1}{N}\log\sum_{\sigma\in\{-1,+1\}^N}\exp\Big(\beta H_N(\sigma)+\sum_{i\le N}h_i\sigma_i\Big).\tag{2}
\]
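For concreteness, the definitions (1) and (2) can be probed numerically at small $N$ by exact enumeration. The following sketch is ours, not the paper's; it assumes the standard normalization above and, for illustration only, Gaussian external fields of mean `h_mean`. Any value it produces is a single-disorder sample of $p_N$, not the quenched limit.

```python
import itertools
import numpy as np

def sk_pressure(N, beta, h_mean, rng):
    """Finite-N pressure p_N from (2), by exact enumeration of the 2^N
    configurations for one realization of the disorder (small N only)."""
    J = np.triu(rng.standard_normal((N, N)), k=1)   # i.i.d. standard Gaussians J_ij, i < j
    fields = h_mean + rng.standard_normal(N)        # i.i.d. external fields h_i
    log_terms = []
    for sigma in itertools.product([-1, 1], repeat=N):
        s = np.array(sigma)
        H = (s @ J @ s) / np.sqrt(N)                # H_N(sigma) as in (1)
        log_terms.append(beta * H + fields @ s)
    m = max(log_terms)                              # log-sum-exp for numerical stability
    return (m + np.log(np.sum(np.exp(np.array(log_terms) - m)))) / N

p = sk_pressure(N=10, beta=1.0, h_mean=0.1, rng=np.random.default_rng(0))
# By Jensen's inequality over the uniform measure on configurations, p_N >= log 2
assert p >= np.log(2)
```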
In the definition (2), $\beta\ge 0$ and the law of $h$ are fixed parameters, and the dependence on the realization of the random collections $(J_{ij})$ and $(h_i)$ is kept implicit. One can prove [5] that $p_N$ converges, for almost all realizations of the disorder, to its average $\mathbb{E}\,p_N$. Notice that $\mathbb{E}$, taken after the logarithm, averages both the collections $(J_{ij})$ and $(h_i)$, which are called quenched variables. The Hamiltonian (1) can also be regarded as a centered Gaussian process with covariance
\[
\mathbb{E}\,H_N(\sigma^1)H_N(\sigma^2)=\frac{N}{2}\,q_{12}^2-\frac{1}{2}\,,
\]
where
\[
q_{12}:=\frac{1}{N}\sum_{i=1}^N\sigma_i^1\sigma_i^2
\]
is the overlap between two spin configurations $\sigma^1$ and $\sigma^2$.
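Once the two configurations are fixed, the covariance is a purely combinatorial identity in the overlap, and can be verified directly (our sketch, standard conventions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
s1, s2 = rng.choice([-1, 1], size=(2, N))   # two fixed spin configurations

# Covariance over the couplings: E[H(s1) H(s2)] = (1/N) sum_{i<j} s1_i s1_j s2_i s2_j
cov = sum(s1[i] * s1[j] * s2[i] * s2[j]
          for i in range(N) for j in range(i + 1, N)) / N

# Closed form through the overlap q12 = (1/N) sum_i s1_i s2_i
q12 = (s1 @ s2) / N
assert abs(cov - (N * q12**2 / 2 - 0.5)) < 1e-10
```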
The Parisi variational principle for the limiting pressure per particle of this model was proved after almost three decades of effort, and it is mainly due to the works of Guerra [8] and Talagrand [19]. We hereby summarize these milestones in a single theorem.
Theorem 1 (Parisi Formula [8,19]). Let $\mathcal{M}$ be the space of probability measures on $[0,1]$, $\zeta\in\mathcal{M}$ and $x(q):=\zeta([0,q])$. Consider the Parisi functional, which is defined as
\[
\mathcal{P}(\zeta):=\log 2+\mathbb{E}\,\Phi_\zeta(0,h)-\frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq\,,\tag{5}
\]
where $\Phi_\zeta$ solves the PDE
\[
\partial_q\Phi_\zeta=-\frac{\beta^2}{2}\Big(\partial^2_y\Phi_\zeta+x(q)\,\big(\partial_y\Phi_\zeta\big)^2\Big),\qquad \Phi_\zeta(1,y)=\log\cosh y\,.
\]
The following holds:
\[
\lim_{N\to\infty}\mathbb{E}\,p_N=\inf_{\zeta\in\mathcal{M}}\mathcal{P}(\zeta)\,.\tag{7}
\]
The key tool for the proof is the (Gaussian) interpolation method, which was introduced in [9] in order to prove the existence of the large-$N$ limit of $\mathbb{E}\,p_N$.
The thermodynamic equilibrium induced by the pressure $p_N$ is called quenched equilibrium and is defined as follows. Physical quantities (e.g., the energy) are functions of the disorder variables $(J,h)$ and the spin configurations $\sigma$. Given a function $f=f(\sigma;J,h)$, its equilibrium value is defined as
\[
\mathbb{E}\sum_{\sigma\in\{-1,+1\}^N}f(\sigma;J,h)\,\mathcal{G}_N(\sigma)\,,
\]
where $\mathcal{G}_N$ is the (random) Boltzmann–Gibbs distribution
\[
\mathcal{G}_N(\sigma):=\frac{\exp\big(\beta H_N(\sigma)+\sum_{i\le N}h_i\sigma_i\big)}{\sum_{\tau\in\{-1,+1\}^N}\exp\big(\beta H_N(\tau)+\sum_{i\le N}h_i\tau_i\big)}\,.
\]
The measure $\mathbb{E}\,\mathcal{G}_N$ is called a quenched measure and can be viewed as a two-step measuring process. Initially, for a given realization of the disorder variables $(J,h)$, one assumes that the system equilibrates according to the canonical Boltzmann–Gibbs distribution $\mathcal{G}_N$, defining a (random) measure on the space of spin configurations. The expectation with respect to $\mathcal{G}_N$ is denoted by $\langle\cdot\rangle$, namely
\[
\langle f\rangle:=\sum_{\sigma\in\{-1,+1\}^N}f(\sigma;J,h)\,\mathcal{G}_N(\sigma)\,.
\]
In probabilistic terms, $\langle\cdot\rangle$ defines a conditional measure given $(J_{ij})$ and $(h_i)$. The remaining degrees of freedom $(J_{ij})$, $(h_i)$ are then averaged according to their a priori distribution.
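The two-step structure of the quenched average can be mimicked numerically: an inner exact Gibbs average at fixed disorder, followed by an outer Monte Carlo average over the couplings. This sketch is ours; it uses zero external field, in which case $\mathbb{E}\langle\sigma_1\sigma_2\rangle=0$ by the gauge symmetry $\sigma_1\to-\sigma_1$, $J_{1j}\to-J_{1j}$.

```python
import itertools
import numpy as np

def gibbs_average(f, J, beta, N):
    """Inner step: the (random) Boltzmann-Gibbs average <f> at fixed disorder J."""
    num = den = 0.0
    for sigma in itertools.product([-1, 1], repeat=N):
        s = np.array(sigma)
        w = np.exp(beta * (s @ J @ s) / np.sqrt(N))
        num += f(s) * w
        den += w
    return num / den

# Outer step: average the random Gibbs expectation over the disorder realizations.
rng = np.random.default_rng(2)
N, beta, samples = 6, 1.0, 1000
vals = [gibbs_average(lambda s: s[0] * s[1],
                      np.triu(rng.standard_normal((N, N)), k=1), beta, N)
        for _ in range(samples)]
quenched = float(np.mean(vals))
assert all(-1.0 <= v <= 1.0 for v in vals)   # each <sigma_1 sigma_2> is a correlation
assert abs(quenched) < 0.15                  # E<sigma_1 sigma_2> = 0 at zero field
```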
An important role is played by the concept of replicas. Replicas are i.i.d. samples from $\mathcal{G}_N$ at fixed disorder. Hence, the equilibrium value of a function $f$ of $n$ replicas and the quenched variables $(J,h)$ is defined by
\[
\mathbb{E}\langle f\rangle:=\mathbb{E}\sum_{\sigma^1,\dots,\sigma^n}f(\sigma^1,\dots,\sigma^n;J,h)\,\mathcal{G}_N(\sigma^1)\cdots\mathcal{G}_N(\sigma^n)\,.\tag{11}
\]
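As an illustration of replica averages, the second overlap moment $\langle q_{12}^2\rangle$ at fixed disorder can be computed in two equivalent ways: by enumerating pairs of independent replicas, or through the correlation matrix, since $\langle q_{12}^2\rangle = N^{-2}\sum_{ij}\langle\sigma_i\sigma_j\rangle^2$. A small sketch (ours, zero field for brevity):

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, beta = 5, 1.2
J = np.triu(rng.standard_normal((N, N)), k=1)

configs = np.array(list(itertools.product([-1, 1], repeat=N)))
w = np.exp(beta * np.einsum('ci,ij,cj->c', configs, J, configs) / np.sqrt(N))
G = w / w.sum()                          # Boltzmann-Gibbs weights at fixed disorder

# Two i.i.d. replicas: <q_12^2> by direct enumeration over pairs of configurations
q = configs @ configs.T / N              # overlap between every pair of configurations
direct = G @ (q**2) @ G

# The same moment from one-replica correlations: <q_12^2> = N^{-2} sum_{ij} <s_i s_j>^2
C = (configs * G[:, None]).T @ configs   # C_ij = <s_i s_j>
assert abs(direct - (C**2).sum() / N**2) < 1e-12
```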
The computation of derivatives of $p_N$ shows, using Gaussian integration by parts, that the SK model is fully characterized by the (joint) distribution of the overlap array $(q_{lm})_{l,m\le n}$, namely the overlaps between any finite number $n$ of replicas with respect to the measure (11). The main feature of the Parisi theory is the characterization of the mentioned joint measure by means of two structural properties:
- (i) It is uniquely determined by a one-dimensional marginal, namely the distribution of $q_{12}$;
- (ii) The distribution of three replicas has, with probability one, an ultrametric support:
\[
q_{13}\ge\min(q_{12},q_{23})\quad\text{almost surely.}
\]
Despite having a mathematical proof of the Parisi Formula (7) for the SK model, (i) and (ii) have been rigorously proved only in the mixed p-spin model [6,20,21], an extension of the SK model whose Hamiltonian also contains higher-order interactions (three-body, four-body, etc.).
One of the crucial instruments used to achieve rigorous control of the model is the so-called Ruelle Probability Cascades (RPCs), defined by Ruelle [22] when formalizing the properties of the Generalized Random Energy Model of Derrida [23]. See also the characterization of RPCs in terms of coalescent processes given in [24]. The first direct link between RPCs and the SK model appeared in the work of Aizenman–Sims–Starr [25], where the authors found a representation of the thermodynamic limit of the quenched pressure per particle in terms of the cavity fields distribution. This representation strongly suggested that if the thermodynamic limit of the overlap distribution is described by an RPC, then the Parisi formula is correct.
The first signal that the overlap array is described by an RPC was originally found by Aizenman and Contucci in [10] with the identification of stochastic stability, and by Ghirlanda and Guerra [26]. Both papers show an (infinite) set of identities for the moments of the overlap array distribution. It turns out that these identities actually imply that the support of the joint distribution of the overlaps is ultrametric, as proved by Panchenko [27]. It should be noticed that Panchenko's theorem requires identities for the overlap moments of all orders. The latter do not hold for the bare SK model, but it can be shown that there exists a perturbation of the Hamiltonian that forces the SK model to satisfy them without affecting the limit of the quenched pressure [28].
Once the validity of the Parisi Formula (7) is established, it is natural to ask for the properties of its solution. The uniqueness of the minimizer of (7) has been established by Auffinger and Chen [29], and its properties have been investigated, for example, in [30,31].
A relevant question about the minimizer is the following: for which values of the parameters $(\beta,h)$ is the solution of (7) a Dirac delta $\delta_{\bar q}$ for some $\bar q\in[0,1]$? In this case, we say that the model is replica symmetric, and the Parisi Formula (7) reads
\[
\lim_{N\to\infty}\mathbb{E}\,p_N=\inf_{q\in[0,1]}\Big[\log 2+\mathbb{E}\log\cosh\big(\beta z\sqrt{q}+h\big)+\frac{\beta^2}{4}(1-q)^2\Big],\qquad z\sim\mathcal{N}(0,1)\,.\tag{13}
\]
The replica symmetric region can be identified [6,32] with the region of parameters $(\beta,h)$ where the overlap is a self-averaging quantity, namely
\[
\lim_{N\to\infty}\mathbb{E}\big\langle(q_{12}-\bar q)^2\big\rangle=0\,,
\]
where $\bar q$ is exactly the value that realizes the infimum in (13). The physics conjecture is that the replica symmetric region can be identified by the so-called Almeida–Thouless condition [33]
\[
\beta^2\,\mathbb{E}\,\cosh^{-4}\big(\beta z\sqrt{\bar q}+h\big)\le 1\,.
\]
The above conjecture has been proved only in the case of a Gaussian external field $h$ [34]. An alternative characterization of the replica symmetric region has been obtained in [6,35]. If the minimizer corresponds to a non-trivial distribution (i.e., with non-zero variance), we say that replica symmetry breaking occurs, and the overlap is not a self-averaging quantity.
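Numerically, the replica symmetric value $\bar q$ can be obtained by iterating the consistency equation $q=\mathbb{E}\tanh^2(\beta z\sqrt q+h)$, and the Almeida–Thouless condition then decides local stability. The following sketch is ours; it assumes the conventions of (13) with a deterministic field $h$, and the two parameter points are only illustrative.

```python
import numpy as np

# Gauss-Hermite quadrature for E_z f(z), z ~ N(0,1)
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / weights.sum()

def E(f):
    return weights @ f(nodes)

def rs_fixed_point(beta, h, iters=500):
    """Iterate the RS consistency equation q = E tanh^2(beta*sqrt(q)*z + h)."""
    q = 0.5
    for _ in range(iters):
        q = E(lambda z: np.tanh(beta * np.sqrt(q) * z + h) ** 2)
    return q

def at_stable(beta, h):
    """Almeida-Thouless condition: beta^2 * E sech^4(beta*sqrt(q)*z + h) <= 1."""
    q = rs_fixed_point(beta, h)
    return bool(beta**2 * E(lambda z: np.cosh(beta * np.sqrt(q) * z + h) ** -4) <= 1)

assert at_stable(0.5, 0.3)        # high temperature: RS stable
assert not at_stable(2.0, 0.1)    # low temperature, weak field: AT condition violated
```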
The Parisi formula has been extended to other mean-field models with centered Gaussian interactions: vector spins [36], multispecies models [11,37,38], and multiscale models [39,40]. Finally, we mention that the SK model fulfills a remarkable universality property: as long as the $J_{ij}$'s are independent, centered, and have unit variance, the thermodynamic limit is still described by the Parisi solution [41].
In this work, we show that a class of non-centered Gaussian spin glasses admits a high-dimensional inference interpretation that extends the celebrated correspondence between the spiked Wigner model and the SK model on the Nishimori line, where replica symmetry is always fulfilled [3]. We show that the addition of an SK Hamiltonian to a Hopfield model with a finite number of patterns can be mapped into a high-dimensional mismatched inference problem, where the statistician ignores the correct a priori distribution of the signal components they have to reconstruct. We shall see that even this slight mismatch may lead to the emergence of complexity, namely to the breakdown of replica symmetry, which is instead guaranteed under very mild hypotheses for optimal statisticians.
  3. High-Dimensional Inference and Statistical Physics
High-dimensional inference aims at recovering a ground-truth signal, $\mathbf{X}^*$ in the following, which is usually a vector with a very large number of components, from some noisy observations of it, denoted by $\mathbf{Y}$. The main feature of this setting is that the dimension of the signal, i.e., the number of real parameters to reconstruct, and the number of observations at disposal are a function of one another, typically a polynomial. For instance, for our purposes, $\mathbf{X}^*$ will be an $N$-dimensional vector and $\mathbf{Y}$ an $N\times N$ matrix, for a total of $O(N^2)$ noisy observations. Hence, if the number of observations becomes large, so does the number of parameters to retrieve. Contrary to what happens in typical low-dimensional settings, where maximum likelihood or Maximum A Posteriori (MAP) approaches yield provably satisfactory reconstruction performance, in a high-dimensional setting this is not always the case. In particular, one needs to devise another kind of more refined estimator that exploits the marginal posterior probabilities of each signal component.
Both approaches described above are Bayesian, and knowledge of a prior distribution on the signal components can play a key role, especially for high-dimensional problems. Furthermore, to compose the posterior measure for the entire signal, one needs the likelihood of the data, which is the probability of an outcome $\mathbf{Y}$ of the observations given a certain ground-truth realization $\mathbf{X}^*$. As we shall discuss soon, under certain hypotheses, the Bayesian approach highlights the correspondence of relevant information-theoretic quantities with thermodynamic ones. Among others, a key quantity is the mutual information between the signal $\mathbf{X}^*$ and the observations $\mathbf{Y}$, which quantifies the residual amount of information left in $\mathbf{Y}$ about $\mathbf{X}^*$ after the noise corruption. As intuition may suggest, the mutual information gives access to the best reconstruction error that is information-theoretically achievable.
Finally, we stress that the high dimensionality of the problem can induce phase transitions in some parameters of the model, like the so-called signal-to-noise ratio (SNR), which tunes the strength of the signal with respect to that of the noise in the observations.
  3.1. Bayes-Optimality and Nishimori Identities
For the sake of simplicity, we start by considering a signal $\mathbf{X}^*$ of i.i.d. (independent and identically distributed) components $X_i^*\sim P_X$, where $P_X$ has a finite fourth moment. The observations at the disposal of a statistician can be modeled as a stochastic function of the ground-truth signal: $\mathbf{Y}=\varphi(\mathbf{X}^*,\mathbf{Z})$, where $\mathbf{Z}$ is the source of randomness, or simply the noise. Knowing the function $\varphi$, from a Bayesian perspective, translates directly into having the likelihood of the model, namely the conditional distribution of $\mathbf{Y}$ given $\mathbf{X}^*$, which we assume to have a density $P_Y(\mathbf{Y}\mid\mathbf{X}^*)$ with respect to the Lebesgue measure. Observe that the likelihood is strongly affected by the nature of the noise.
According to Bayes' rule, the posterior distribution of $\mathbf{X}^*$ given the data is
\[
dP(\mathbf{x}\mid\mathbf{Y})=\frac{P_Y(\mathbf{Y}\mid\mathbf{x})\,dP_X(\mathbf{x})}{P(\mathbf{Y})}\,,\tag{16}
\]
where $dP_X(\mathbf{x})=\prod_{i\le N}dP_X(x_i)$, and $P(\mathbf{Y})=\int P_Y(\mathbf{Y}\mid\mathbf{x})\,dP_X(\mathbf{x})$ is the probability of a given realization of the data, which is sometimes also called evidence. In practice, the above posterior, which would be ideal to perform inference, is rarely available: the statistician may be unaware of the likelihood, of the correct prior distribution of the signal, or of both. This motivates the following definition of a special inference setting:
Definition 1 (Bayes optimality). The statistician is said to be Bayes optimal, or in the Bayes-optimal setting, if they are aware of both $P_X$ and $P_Y$; namely, they have access to the posterior (16).

The above is saying that an optimal statistician knows everything about the model except for the ground truth $\mathbf{X}^*$ itself. The Bayes-optimal setting is thus often used as a theoretical framework to establish the information-theoretic limits. Indeed, it is known that the mean square error between the ground truth and an estimator $\hat{\mathbf{x}}(\mathbf{Y})$,
\[
{\rm MSE}(\hat{\mathbf{x}}):=\mathbb{E}\,\big\|\mathbf{X}^*-\hat{\mathbf{x}}(\mathbf{Y})\big\|^2\,,
\]
is minimized by an optimal statistician, who can use the posterior mean as an estimator, yielding the minimum mean square error (MMSE)
\[
{\rm MMSE}:=\mathbb{E}\,\big\|\mathbf{X}^*-\mathbb{E}[\mathbf{X}^*\mid\mathbf{Y}]\big\|^2\,.
\]
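A one-component toy example of the posterior (16) and of the posterior-mean estimator, for the Gaussian channel with Rademacher prior used later in Section 3.2 (the scalar setup and the function names are ours):

```python
import numpy as np

def posterior_rademacher(y, lam):
    """Posterior of x in {-1,+1} given y = sqrt(lam)*x + z, z ~ N(0,1),
    under the uniform (Rademacher) prior, via Bayes' rule (16)."""
    like = {x: np.exp(-0.5 * (y - np.sqrt(lam) * x) ** 2) for x in (+1, -1)}
    evidence = 0.5 * like[+1] + 0.5 * like[-1]     # P(y), the evidence
    return {x: 0.5 * like[x] / evidence for x in (+1, -1)}

post = posterior_rademacher(y=0.7, lam=1.5)
posterior_mean = post[+1] - post[-1]               # the MMSE estimator of x
assert abs(post[+1] + post[-1] - 1) < 1e-12        # the posterior is normalized
# For this channel the posterior mean has the closed form tanh(sqrt(lam)*y)
assert abs(posterior_mean - np.tanh(np.sqrt(1.5) * 0.7)) < 1e-12
```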
In the following, we shall denote averages with respect to the posterior by $\langle\cdot\rangle$.
Another important consequence of this setting is the so-called Nishimori identities, which can be stated as follows. Given any continuous bounded function $f$ of the data $\mathbf{Y}$, the ground truth $\mathbf{X}^*$ and $n$ i.i.d. samples $\mathbf{x}^1,\dots,\mathbf{x}^n$ from the posterior (16), one has
\[
\mathbb{E}\big\langle f(\mathbf{Y},\mathbf{x}^1,\dots,\mathbf{x}^n)\big\rangle=\mathbb{E}\big\langle f(\mathbf{Y},\mathbf{x}^1,\dots,\mathbf{x}^{n-1},\mathbf{X}^*)\big\rangle\,,
\]
where $\langle\cdot\rangle$ denotes the average of the replicas over the posterior. An elementary proof can be found in [42]. These identities enforce a symmetry between replicas drawn from the posterior and the ground truth. For instance, a direct application of the Nishimori identities yields
\[
{\rm MMSE}=\mathbb{E}\,\|\mathbf{X}^*\|^2-\mathbb{E}\,\|\langle\mathbf{x}\rangle\|^2\,.
\]
It is important to stress that, as can be seen from the above equation, an optimal statistician is actually able to compute the minimum mean square error using their posterior.
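In the scalar Gaussian channel of the previous example, the Nishimori identity $\mathbb{E}[\langle x\rangle X^*]=\mathbb{E}[\langle x\rangle^2]$ reduces to $\mathbb{E}_z\tanh(\lambda+\sqrt\lambda z)=\mathbb{E}_z\tanh^2(\lambda+\sqrt\lambda z)$, which can be checked by quadrature (our sketch):

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(120)
weights = weights / weights.sum()       # quadrature rule for E_z, z ~ N(0,1)

lam = 0.8
# Condition on X* = +1 (allowed by symmetry): <x> = tanh(lam + sqrt(lam)*z)
t = np.tanh(lam + np.sqrt(lam) * nodes)

overlap_with_truth = weights @ t        # E[<x> X*]: one replica against the ground truth
overlap_two_replicas = weights @ t**2   # E[<x>^2]: two independent posterior samples

# Nishimori: the ground truth behaves as an extra replica
assert abs(overlap_with_truth - overlap_two_replicas) < 1e-6
```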
At this point, the reader will have noticed a similarity with the Statistical Mechanics formalism. In fact, it is possible to interpret the evidence $P(\mathbf{Y})$ as the partition function of a model with Hamiltonian $-\log P_Y(\mathbf{Y}\mid\mathbf{x})$ (with the prior as reference measure) and unit inverse absolute temperature. The pressure per particle of such a model would thus be
\[
\frac{1}{N}\,\mathbb{E}\log P(\mathbf{Y})=-\frac{H(\mathbf{Y})}{N}\,,
\]
namely minus the Shannon entropy of the data per signal component, which is related to the mutual information by
\[
I(\mathbf{X}^*;\mathbf{Y})=H(\mathbf{Y})-H(\mathbf{Y}\mid\mathbf{X}^*)\,.
\]
The contribution coming from the conditional entropy $H(\mathbf{Y}\mid\mathbf{X}^*)$ can be regarded as due only to the noise, since for fixed $\mathbf{X}^*$, the only randomness in $\mathbf{Y}$ is due to $\mathbf{Z}$.
We stress here that Bayes optimality and the Nishimori identities, under rather mild hypotheses [43], are enough to grant replica symmetry in the model, i.e., concentration of the order parameters of the model. For the models we are interested in, the latter can be shown to imply finite-dimensional variational principles for the limiting mutual information.
  3.2. The Spiked Wigner Model
The spiked Wigner model (SWM) was first introduced in [44] as a model for Principal Component Analysis (PCA), and it has since been widely studied in the recent literature. Without any pretension of being exhaustive, we refer the interested reader to [42,45,46,47,48,49,50,51]. For our purposes, we restrict ourselves to the case where the signal is an $N$-dimensional vector of $\pm1$'s, drawn from a Rademacher distribution $P_X=\frac12(\delta_{+1}+\delta_{-1})$. The function $\varphi$ is a Gaussian channel, namely
\[
Y_{ij}=\sqrt{\frac{\lambda}{N}}\,X_i^*X_j^*+Z_{ij}\,,\qquad 1\le i<j\le N\,,
\]
where the $(Z_{ij})_{i<j}$ are i.i.d. standard Gaussian random variables, and $\lambda>0$ is a positive parameter called the signal-to-noise ratio. The statistician is tasked with the recovery of $\mathbf{X}^*$ given the observations $\mathbf{Y}$. The Bayes-optimal posterior measure for this inference problem can be written directly as a Boltzmann–Gibbs random measure thanks to the Gaussian nature of the likelihood:
\[
dP(\mathbf{x}\mid\mathbf{Y})\propto\exp\Big(\sum_{i<j}\sqrt{\frac{\lambda}{N}}\,Y_{ij}\,x_ix_j\Big)\,dP_X(\mathbf{x})\,,\tag{26}
\]
where we have already exploited the fact that $x_i^2=1$. We are denoting the posterior samples by $\mathbf{x}$. Since the quantity we are interested in is the quenched pressure of this model,
\[
p_N=\frac{1}{N}\,\mathbb{E}\log\sum_{\mathbf{x}\in\{-1,+1\}^N}\exp\Big(\sum_{i<j}\sqrt{\frac{\lambda}{N}}\,Y_{ij}\,x_ix_j\Big)\,,
\]
which is connected to the mutual information $I(\mathbf{X}^*;\mathbf{Y})$ by a simple shift with an additive constant, we are allowed to perform a gauge transformation $x_i\mapsto x_iX_i^*$ without altering its value. This results in a Hamiltonian that is now independent of the original ground-truth signal,
\[
-H_N(\mathbf{x})=\sum_{i<j}\Big(\sqrt{\frac{\lambda}{N}}\,Z_{ij}+\frac{\lambda}{N}\Big)x_ix_j\,,\tag{27}
\]
and the couplings between spins are Gaussian random variables with mean equal to their variance. This condition identifies a peculiar region of the phase space of a spin-glass model, which is called the Nishimori line. In fact, the Nishimori identities were first discovered and studied in the context of gauge spin glasses. Despite looking simpler, the above model retains most of the features we need for our study.
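The gauge transformation is a change of variables $x_i\to x_iX_i^*$, a bijection of $\{-1,+1\}^N$, so the partition function is exactly invariant, while the couplings acquire mean $\lambda/N$ equal to their variance. A small enumeration check (ours):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
N, lam = 6, 1.3

x_star = rng.choice([-1, 1], size=N)               # ground-truth signal
Z = np.triu(rng.standard_normal((N, N)), k=1)      # Gaussian noise
Y = np.sqrt(lam / N) * np.triu(np.outer(x_star, x_star), k=1) + Z

def log_partition(C):
    """log of sum_x exp(sum_{i<j} C_ij x_i x_j), by enumeration (small N)."""
    vals = [np.einsum('i,ij,j->', np.array(s), C, np.array(s))
            for s in itertools.product([-1, 1], repeat=N)]
    return float(np.log(np.sum(np.exp(vals))))

C = np.sqrt(lam / N) * Y                    # couplings of the posterior (26)
# Gauging x -> x * x_star is equivalent to C_ij -> C_ij * x*_i * x*_j,
# which only reorders the sum over configurations: Z is unchanged.
C_gauged = C * np.outer(x_star, x_star)
assert abs(log_partition(C) - log_partition(C_gauged)) < 1e-10
```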
For inference models with additive Gaussian noise, like the one above, it is possible to prove the so-called I-MMSE relation:
\[
\frac{d}{d\lambda}\,\frac{I(\mathbf{X}^*;\mathbf{Y})}{N}=\frac{1}{4N^2}\,\mathbb{E}\,\big\|\mathbf{X}^*(\mathbf{X}^*)^\intercal-\big\langle\mathbf{x}\mathbf{x}^\intercal\big\rangle\big\|_F^2\,,
\]
where $\|\cdot\|_F$ is the Frobenius norm and $\langle\cdot\rangle$ denotes the expectation with respect to the Boltzmann–Gibbs measure induced by (27). Hence, once the mutual information is known, the MMSE can be accessed through a derivative with respect to the signal-to-noise ratio. A clarification is in order here: the above is the MMSE for the reconstruction of the rank-one matrix $\mathbf{X}^*(\mathbf{X}^*)^\intercal$ because, due to flip symmetry, here we do not have any actual information on the single vector $\mathbf{X}^*$, but only on the spike $\mathbf{X}^*(\mathbf{X}^*)^\intercal$.
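The I-MMSE mechanism is easiest to see in the scalar analogue of the channel above, where $I(\lambda)=\lambda-\mathbb{E}\log\cosh(\lambda+\sqrt\lambda z)$ and the Guo–Shamai–Verdú relation reads $dI/d\lambda={\rm MMSE}/2$. A numerical check (ours, scalar case only):

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(120)
weights = weights / weights.sum()     # quadrature rule for E_z, z ~ N(0,1)

def mutual_info(lam):
    """I(x;y) for y = sqrt(lam)*x + z with Rademacher x:
    I(lam) = lam - E log cosh(lam + sqrt(lam)*z)."""
    u = lam + np.sqrt(lam) * nodes
    log_cosh = np.logaddexp(u, -u) - np.log(2)
    return lam - weights @ log_cosh

def mmse(lam):
    """MMSE = 1 - E tanh(lam + sqrt(lam)*z), using the Nishimori identities."""
    return 1 - weights @ np.tanh(lam + np.sqrt(lam) * nodes)

lam, eps = 0.9, 1e-5
dI = (mutual_info(lam + eps) - mutual_info(lam - eps)) / (2 * eps)
assert abs(dI - mmse(lam) / 2) < 1e-4     # I-MMSE: dI/dlam = MMSE/2
```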
  3.3. Sub-Optimality and Replica Symmetry Breaking
There are several ways to break Bayes optimality. Some examples: the statistician does not know the signal-to-noise ratio [13,52]; the statistician adopts a likelihood different from that of the true model [14]; the statistician adopts a wrong prior [12,53]; combinations of the previous, and many others. We will focus on the case of mismatching priors, where the statistician not only adopts a wrong prior on the ground-truth elements, but is also unaware of the rank of the spiked matrix hidden inside the noise, which is denoted by M. The rest is assumed to be known. The channel of the inference problem is
        
If the statistician assumes a Rademacher prior for the signal components and a rank-one hidden matrix, they will write a posterior of the form
        
        where
        
The slash on quantities emphasizes that they are not the Bayes-optimal ones. In this setting, one can no longer rely on the Nishimori identities, and in principle, replica symmetry is no longer guaranteed. On the contrary, as we shall argue later on, a mismatch in the prior alone is already sufficient to cause replica symmetry breaking.
  4. The Model
Let M be a fixed integer and . Consider two independent random collections  and , where  is such that . The above random collections play the role of quenched disorder in the model. Consider N Ising spins  and the Hamiltonian function

with . Here,  is the interacting part, while
      
denotes the random external field acting on the spins. The Hamiltonian (32) is determined by the choice of  and . For , the interaction term  coincides with the Hamiltonian (31). Note that for some special choices of the parameters, we recover some well-known spin glass models:
-  gives the SK model (1) at  and random external field .
-  gives the Hopfield model [6,7,18] with a finite number of patterns .
-  and  gives the SK model on the Nishimori line (27). As we have seen in Section 3, the latter can also be viewed as a spiked Wigner model in the Bayes-optimal setting.
Notice that the entire model can be interpreted as a Hopfield model where the traditional Hebbian matrix is corrupted by Gaussian noise. Furthermore, if the Hebbian coupling is replaced by a constant matrix, the model reduces to an SK model with the addition of a ferromagnetic interaction, which was studied in [54].
Our main result is the computation of the thermodynamic limit of the pressure per particle

whose variance can be shown to converge to 0 as $N\to\infty$, namely:
Lemma 1. Assume . Then, for any ,

where K is a suitable positive constant.

We thus focus on the quenched average of the pressure. The proof of this lemma makes use of the Efron–Stein concentration inequality to bound the variance; it is simple but tedious, and follows closely that of ([12], Lemma 9). We are now in a position to state our main theorem:
Theorem 2 (Variational solution). If , then

where

and  is the Parisi functional (5) with a random external field

and  denotes the expectation with respect to . The consistency equations are

Moreover, there exists  such that for any , one has  and the supremum in (36) can be restricted to .

The proof of the theorem is based on the concentration of the Mattis magnetization, which is the normalized scalar product between a spin configuration (or a sample from the wrong posterior measure) and one of the patterns:
\[
m_k(\sigma):=\frac{1}{N}\sum_{i=1}^N\xi_i^k\sigma_i\,.\tag{40}
\]
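Rewriting the Hebbian interaction through the Mattis magnetizations rests on the elementary identity $\frac{1}{N}\sum_{i<j}\sum_{k}\xi_i^k\xi_j^k\sigma_i\sigma_j=\frac{N}{2}\sum_k m_k^2-\frac{M}{2}$, valid for $\pm1$ patterns. It can be checked directly (our sketch; binary patterns are assumed here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
N, M = 40, 3
xi = rng.choice([-1, 1], size=(M, N))    # patterns (assumed binary here)
s = rng.choice([-1, 1], size=N)          # a spin configuration

# Mattis magnetizations (40): m_k = (1/N) sum_i xi^k_i s_i
m = xi @ s / N

# Hebbian pair interaction expressed through the Mattis magnetizations
hebb = sum(xi[k, i] * xi[k, j] * s[i] * s[j]
           for k in range(M) for i in range(N) for j in range(i + 1, N)) / N
assert abs(hebb - (N / 2 * (m**2).sum() - M / 2)) < 1e-10
```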
The Hamiltonian can thus be rewritten using (40) in the following form:
The Mattis magnetization, in fact, plays the role of an order parameter for this model. The concentration we can prove is only an integral average over some suitably small magnetic fields, which is still sufficient for our purposes:
Proposition 1 (Concentration of Mattis Magnetizations). Consider a k such that . Let  with ,  for all . For any , we denote by  the Boltzmann–Gibbs measure induced by the Hamiltonian . Then

for all  and .

We shall omit the proof of the above result, as it is completely analogous to the one in [12]. We will need an intermediate lemma that leads to it (see Lemma 2 below), together with a second key ingredient: the adaptive interpolation technique [48], combined with Guerra's replica symmetry-breaking upper bound for the quenched pressure of the SK model [8].
Proof of Theorem 2. Here, we outline the main steps of the proof of the variational principle for the thermodynamic limit. The proof is achieved via two bounds that match in the $N\to\infty$ limit. Let us start by defining the interpolating Hamiltonian
        
where  and

with , and where the interpolating functions , which must be continuously differentiable in  and non-negative, will be suitably chosen. With this interpolation, one is able to prove the following sum rule:
Proposition 2. The following sum rule holds:

where .

The proof consists of the computation of the derivative of the interpolating pressure related to the model (43). It follows closely that of ([12], Proposition 7), to which we refer the interested reader. Since the remainder  is non-negative, the above proposition already yields a lower bound for the quenched pressure of our model when we choose  constant:

where we used the Lipschitz continuity of the SK pressure in the magnetic fields.
The upper bound requires more attention. First, we notice that  is convex in the magnetic fields and that . Hence, we can use Jensen's inequality and the Lipschitz continuity of  to obtain:

Now, we use Guerra's bound for the SK pressure, which, importantly, is uniform in N, and we average over  on both sides:
What remains to be done is to prove that  for a proper choice of the interpolating functions . The choice is made through a system of coupled ODEs

One can easily check that the above system is regular enough to admit a unique solution on the interval . In this case, the remainder to be pushed to 0 takes the form
The goal is now to apply a concentration lemma here:

Lemma 2. Let  and denote by  the Boltzmann–Gibbs expectation associated to the Hamiltonian , where  and  is the k-th canonical basis vector of . Then

with K a positive constant.

Notice that the integral in (51) is over  and not over the effective magnetic field of the model, which is instead . Nevertheless, we can integrate over the magnetic fields  with a change of variables. This involves a Jacobian that is larger than 1. In fact, thanks to Liouville's theorem ([55], Corollary 3.1, Chapter V), one can prove that

when .
This allows us to bound the thermal fluctuations in (51) using (52) and then Liouville's theorem:

Since  has a bounded second moment, using the Cauchy–Schwarz inequality, one can show that  is uniformly bounded by a constant C. Hence,  for any  by construction (recall (44) and (50)). Therefore, .
The fluctuations induced by the disorder can be bounded in a very similar fashion using (53):

Hence, (51), which equals , is overall a . Moreover,  can be chosen as a function of N in order to optimize the convergence rate: . Using Fubini's theorem in (49) to exchange the t and  averages, and then dominated convergence, one concludes the proof.    □
From the variational problem (36), we can also deduce the differentiability properties of the limiting pressure, obtaining the average values of the relevant thermodynamic quantities of the model:

Corollary 1. Let , and . Then

More generally, let y be one of the variables . Then the function  is convex. By Danskin's theorem (see [56]),  is differentiable if and only if the set  is a singleton.