2. The SK Model
The SK model was introduced in the 1970s by D. Sherrington and S. Kirkpatrick [17] and stands as an explicitly solvable mean-field spin glass. In their work, the authors discovered that the solution obtained through the replica symmetric (RS) approximation was not correct at low temperature. With a groundbreaking approach, Parisi identified a new type of solution, nowadays called replica symmetry breaking (RSB), which proved to be correct at any temperature, thereby revealing a novel mathematical and physical structure [18].
The SK model is defined by its Hamiltonian, which is a function of N spins $\sigma=(\sigma_1,\dots,\sigma_N)\in\{-1,+1\}^N$:
\[
H_N(\sigma)=\frac{1}{\sqrt N}\sum_{1\le i<j\le N}J_{ij}\,\sigma_i\sigma_j\,,\tag{1}
\]
where $(J_{ij})_{1\le i<j\le N}$ is a collection of i.i.d. standard Gaussian random variables. In physical terms, the couplings between pairs of spins can be ferromagnetic or antiferromagnetic with equal probability. Consider also a random variable $h$ with $\mathbb{E}h^2<\infty$ and a collection $(h_i)_{i\le N}$ of i.i.d. copies of $h$ representing random external fields acting on the spins. The Parisi formula is a representation for the large-$N$ limit of the pressure $p_N$ defined by
\[
p_N=\frac{1}{N}\log\sum_{\sigma\in\{-1,+1\}^N}\exp\Big(\beta H_N(\sigma)+\sum_{i\le N}h_i\sigma_i\Big).\tag{2}
\]
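For concreteness, the definitions (1) and (2) can be probed numerically at small $N$ by exact enumeration. The following sketch is ours, not the paper's; it assumes the standard normalization above and, for illustration only, Gaussian external fields of mean `h_mean`. Any value it produces is a single-disorder sample of $p_N$, not the quenched limit.

```python
import itertools
import numpy as np

def sk_pressure(N, beta, h_mean, rng):
    """Finite-N pressure p_N from (2), by exact enumeration of the 2^N
    configurations for one realization of the disorder (small N only)."""
    J = np.triu(rng.standard_normal((N, N)), k=1)   # i.i.d. standard Gaussians J_ij, i < j
    fields = h_mean + rng.standard_normal(N)        # i.i.d. external fields h_i
    log_terms = []
    for sigma in itertools.product([-1, 1], repeat=N):
        s = np.array(sigma)
        H = (s @ J @ s) / np.sqrt(N)                # H_N(sigma) as in (1)
        log_terms.append(beta * H + fields @ s)
    m = max(log_terms)                              # log-sum-exp for numerical stability
    return (m + np.log(np.sum(np.exp(np.array(log_terms) - m)))) / N

p = sk_pressure(N=10, beta=1.0, h_mean=0.1, rng=np.random.default_rng(0))
# By Jensen's inequality over the uniform measure on configurations, p_N >= log 2
assert p >= np.log(2)
```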
In the definition (2), $\beta\ge 0$ and the law of $h$ are fixed parameters, and the dependence on the realization of the random collections $(J_{ij})$ and $(h_i)$ is kept implicit. One can prove [5] that $p_N$ converges, for almost all realizations of the disorder, to its average $\mathbb{E}\,p_N$. Notice that $\mathbb{E}$, taken after the logarithm, averages both the collections $(J_{ij})$ and $(h_i)$, which are called quenched variables. The Hamiltonian (1) can also be regarded as a centered Gaussian process with covariance
\[
\mathbb{E}\,H_N(\sigma^1)H_N(\sigma^2)=\frac{N}{2}\,q_{12}^2-\frac{1}{2}\,,
\]
where
\[
q_{12}:=\frac{1}{N}\sum_{i=1}^N\sigma_i^1\sigma_i^2
\]
is the overlap between two spin configurations $\sigma^1$ and $\sigma^2$.
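Once the two configurations are fixed, the covariance is a purely combinatorial identity in the overlap, and can be verified directly (our sketch, standard conventions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
s1, s2 = rng.choice([-1, 1], size=(2, N))   # two fixed spin configurations

# Covariance over the couplings: E[H(s1) H(s2)] = (1/N) sum_{i<j} s1_i s1_j s2_i s2_j
cov = sum(s1[i] * s1[j] * s2[i] * s2[j]
          for i in range(N) for j in range(i + 1, N)) / N

# Closed form through the overlap q12 = (1/N) sum_i s1_i s2_i
q12 = (s1 @ s2) / N
assert abs(cov - (N * q12**2 / 2 - 0.5)) < 1e-10
```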
The Parisi variational principle for the limiting pressure per particle of this model was proved after almost three decades of effort, and it is mainly due to the works of Guerra [8] and Talagrand [19]. We hereby summarize these milestones in a single theorem.
Theorem 1 (Parisi Formula [8,19]). Let $\mathcal{M}$ be the space of probability measures on $[0,1]$, $\zeta\in\mathcal{M}$ and $x(q):=\zeta([0,q])$. Consider the Parisi functional, which is defined as
\[
\mathcal{P}(\zeta):=\log 2+\mathbb{E}\,\Phi_\zeta(0,h)-\frac{\beta^2}{2}\int_0^1 q\,x(q)\,dq\,,\tag{5}
\]
where $\Phi_\zeta$ solves the PDE
\[
\partial_q\Phi_\zeta=-\frac{\beta^2}{2}\Big(\partial^2_y\Phi_\zeta+x(q)\,\big(\partial_y\Phi_\zeta\big)^2\Big),\qquad \Phi_\zeta(1,y)=\log\cosh y\,.
\]
The following holds:
\[
\lim_{N\to\infty}\mathbb{E}\,p_N=\inf_{\zeta\in\mathcal{M}}\mathcal{P}(\zeta)\,.\tag{7}
\]
The key tool for the proof is the (Gaussian) interpolation method, which was introduced in [9] in order to prove the existence of the large-$N$ limit of $\mathbb{E}\,p_N$.
The thermodynamic equilibrium induced by the pressure $p_N$ is called quenched equilibrium and is defined as follows. Physical quantities (e.g., the energy) are functions of the disorder variables $(J,h)$ and the spin configurations $\sigma$. Given a function $f=f(\sigma;J,h)$, its equilibrium value is defined as
\[
\mathbb{E}\sum_{\sigma\in\{-1,+1\}^N}f(\sigma;J,h)\,\mathcal{G}_N(\sigma)\,,
\]
where $\mathcal{G}_N$ is the (random) Boltzmann–Gibbs distribution
\[
\mathcal{G}_N(\sigma):=\frac{\exp\big(\beta H_N(\sigma)+\sum_{i\le N}h_i\sigma_i\big)}{\sum_{\tau\in\{-1,+1\}^N}\exp\big(\beta H_N(\tau)+\sum_{i\le N}h_i\tau_i\big)}\,.
\]
The measure $\mathbb{E}\,\mathcal{G}_N$ is called a quenched measure and can be viewed as a two-step measuring process. Initially, for a given realization of the disorder variables $(J,h)$, one assumes that the system equilibrates according to the canonical Boltzmann–Gibbs distribution $\mathcal{G}_N$, defining a (random) measure on the space of spin configurations. The expectation with respect to $\mathcal{G}_N$ is denoted by $\langle\cdot\rangle$, namely
\[
\langle f\rangle:=\sum_{\sigma\in\{-1,+1\}^N}f(\sigma;J,h)\,\mathcal{G}_N(\sigma)\,.
\]
In probabilistic terms, $\langle\cdot\rangle$ defines a conditional measure given $(J_{ij})$ and $(h_i)$. The remaining degrees of freedom $(J_{ij})$, $(h_i)$ are then averaged according to their a priori distribution.
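The two-step structure of the quenched average can be mimicked numerically: an inner exact Gibbs average at fixed disorder, followed by an outer Monte Carlo average over the couplings. This sketch is ours; it uses zero external field, in which case $\mathbb{E}\langle\sigma_1\sigma_2\rangle=0$ by the gauge symmetry $\sigma_1\to-\sigma_1$, $J_{1j}\to-J_{1j}$.

```python
import itertools
import numpy as np

def gibbs_average(f, J, beta, N):
    """Inner step: the (random) Boltzmann-Gibbs average <f> at fixed disorder J."""
    num = den = 0.0
    for sigma in itertools.product([-1, 1], repeat=N):
        s = np.array(sigma)
        w = np.exp(beta * (s @ J @ s) / np.sqrt(N))
        num += f(s) * w
        den += w
    return num / den

# Outer step: average the random Gibbs expectation over the disorder realizations.
rng = np.random.default_rng(2)
N, beta, samples = 6, 1.0, 1000
vals = [gibbs_average(lambda s: s[0] * s[1],
                      np.triu(rng.standard_normal((N, N)), k=1), beta, N)
        for _ in range(samples)]
quenched = float(np.mean(vals))
assert all(-1.0 <= v <= 1.0 for v in vals)   # each <sigma_1 sigma_2> is a correlation
assert abs(quenched) < 0.15                  # E<sigma_1 sigma_2> = 0 at zero field
```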
An important role is played by the concept of replicas. Replicas are i.i.d. samples from $\mathcal{G}_N$ at fixed disorder. Hence, the equilibrium value of a function $f$ of $n$ replicas and the quenched variables $(J,h)$ is defined by
\[
\mathbb{E}\langle f\rangle:=\mathbb{E}\sum_{\sigma^1,\dots,\sigma^n}f(\sigma^1,\dots,\sigma^n;J,h)\,\mathcal{G}_N(\sigma^1)\cdots\mathcal{G}_N(\sigma^n)\,.\tag{11}
\]
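As an illustration of replica averages, the second overlap moment $\langle q_{12}^2\rangle$ at fixed disorder can be computed in two equivalent ways: by enumerating pairs of independent replicas, or through the correlation matrix, since $\langle q_{12}^2\rangle = N^{-2}\sum_{ij}\langle\sigma_i\sigma_j\rangle^2$. A small sketch (ours, zero field for brevity):

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, beta = 5, 1.2
J = np.triu(rng.standard_normal((N, N)), k=1)

configs = np.array(list(itertools.product([-1, 1], repeat=N)))
w = np.exp(beta * np.einsum('ci,ij,cj->c', configs, J, configs) / np.sqrt(N))
G = w / w.sum()                          # Boltzmann-Gibbs weights at fixed disorder

# Two i.i.d. replicas: <q_12^2> by direct enumeration over pairs of configurations
q = configs @ configs.T / N              # overlap between every pair of configurations
direct = G @ (q**2) @ G

# The same moment from one-replica correlations: <q_12^2> = N^{-2} sum_{ij} <s_i s_j>^2
C = (configs * G[:, None]).T @ configs   # C_ij = <s_i s_j>
assert abs(direct - (C**2).sum() / N**2) < 1e-12
```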
The computation of derivatives of $p_N$ shows, using Gaussian integration by parts, that the SK model is fully characterized by the (joint) distribution of the overlap array $(q_{lm})_{l,m\le n}$, namely the overlaps between any finite number $n$ of replicas with respect to the measure (11). The main feature of the Parisi theory is the characterization of the mentioned joint measure by means of two structural properties:
- (i) It is uniquely determined by a one-dimensional marginal, namely the distribution of $q_{12}$;
- (ii) The distribution of three replicas has, with probability one, an ultrametric support:
\[
q_{13}\ge\min(q_{12},q_{23})\quad\text{almost surely.}
\]
Despite having a mathematical proof of the Parisi Formula (7) for the SK model, (i) and (ii) have been rigorously proved only in the mixed p-spin model [6,20,21], an extension of the SK model whose Hamiltonian also contains higher-order interactions (three-body, four-body, etc.).
One of the crucial instruments used to achieve rigorous control of the model is the so-called Ruelle Probability Cascades (RPCs), defined by Ruelle [22] when formalizing the properties of the Generalized Random Energy Model of Derrida [23]. See also the characterization of RPCs in terms of coalescent processes given in [24]. The first direct link between RPCs and the SK model appeared in the work of Aizenman–Sims–Starr [25], where the authors found a representation of the thermodynamic limit of the quenched pressure per particle in terms of the cavity fields distribution. This representation strongly suggested that if the thermodynamic limit of the overlap distribution is described by an RPC, then the Parisi formula is correct.
The first signal that the overlap array is described by an RPC was originally found by Aizenman and Contucci in [10] with the identification of stochastic stability, and by Ghirlanda and Guerra [26]. Both papers show an (infinite) set of identities for the moments of the overlap array distribution. It turns out that these identities actually imply that the support of the joint distribution of the overlaps is ultrametric, as proved by Panchenko [27]. It should be noticed that Panchenko's theorem requires identities for the overlap moments of all orders. The latter do not hold for the bare SK model, but it can be shown that there exists a perturbation of the Hamiltonian that forces the SK model to satisfy them without affecting the limit of the quenched pressure [28].
Once the validity of the Parisi Formula (7) is established, it is natural to ask for the properties of its solution. The uniqueness of the minimizer of (7) has been established by Auffinger and Chen [29], and its properties have been investigated, for example, in [30,31].
A relevant question about the minimizer is the following: for which values of the parameters $(\beta,h)$ is the solution of (7) a Dirac delta $\delta_{\bar q}$ for some $\bar q\in[0,1]$? In this case, we say that the model is replica symmetric, and the Parisi Formula (7) reads
\[
\lim_{N\to\infty}\mathbb{E}\,p_N=\inf_{q\in[0,1]}\Big[\log 2+\mathbb{E}\log\cosh\big(\beta z\sqrt{q}+h\big)+\frac{\beta^2}{4}(1-q)^2\Big],\qquad z\sim\mathcal{N}(0,1)\,.\tag{13}
\]
The replica symmetric region can be identified [6,32] with the region of parameters $(\beta,h)$ where the overlap is a self-averaging quantity, namely
\[
\lim_{N\to\infty}\mathbb{E}\big\langle(q_{12}-\bar q)^2\big\rangle=0\,,
\]
where $\bar q$ is exactly the value that realizes the infimum in (13). The physics conjecture is that the replica symmetric region can be identified by the so-called Almeida–Thouless condition [33]
\[
\beta^2\,\mathbb{E}\,\cosh^{-4}\big(\beta z\sqrt{\bar q}+h\big)\le 1\,.
\]
The above conjecture has been proved only in the case of a Gaussian external field $h$ [34]. An alternative characterization of the replica symmetric region has been obtained in [6,35]. If the minimizer corresponds to a non-trivial distribution (i.e., with non-zero variance), we say that replica symmetry breaking occurs, and the overlap is not a self-averaging quantity.
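Numerically, the replica symmetric value $\bar q$ can be obtained by iterating the consistency equation $q=\mathbb{E}\tanh^2(\beta z\sqrt q+h)$, and the Almeida–Thouless condition then decides local stability. The following sketch is ours; it assumes the conventions of (13) with a deterministic field $h$, and the two parameter points are only illustrative.

```python
import numpy as np

# Gauss-Hermite quadrature for E_z f(z), z ~ N(0,1)
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / weights.sum()

def E(f):
    return weights @ f(nodes)

def rs_fixed_point(beta, h, iters=500):
    """Iterate the RS consistency equation q = E tanh^2(beta*sqrt(q)*z + h)."""
    q = 0.5
    for _ in range(iters):
        q = E(lambda z: np.tanh(beta * np.sqrt(q) * z + h) ** 2)
    return q

def at_stable(beta, h):
    """Almeida-Thouless condition: beta^2 * E sech^4(beta*sqrt(q)*z + h) <= 1."""
    q = rs_fixed_point(beta, h)
    return bool(beta**2 * E(lambda z: np.cosh(beta * np.sqrt(q) * z + h) ** -4) <= 1)

assert at_stable(0.5, 0.3)        # high temperature: RS stable
assert not at_stable(2.0, 0.1)    # low temperature, weak field: AT condition violated
```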
The Parisi formula has been extended to other mean-field models with centered Gaussian interactions: vector spins [36], multispecies models [11,37,38], and multiscale models [39,40]. Finally, we mention that the SK model fulfills a remarkable universality property: as long as the $J_{ij}$'s are independent, centered, and have unit variance, the thermodynamic limit is still described by the Parisi solution [41].
In this work, we show that a class of non-centered Gaussian spin glasses admits a high-dimensional inference interpretation that extends the celebrated correspondence between the spiked Wigner model and the SK model on the Nishimori line, where replica symmetry is always fulfilled [3]. We show that the addition of an SK Hamiltonian to a Hopfield model with a finite number of patterns can be mapped into a high-dimensional mismatched inference problem, where the statistician ignores the correct a priori distribution of the signal components they have to reconstruct. We shall see that even this slight mismatch may lead to the emergence of complexity, namely to the breakdown of replica symmetry, which is instead guaranteed under very mild hypotheses for optimal statisticians.
  3. High-Dimensional Inference and Statistical Physics
High-dimensional inference aims at recovering a ground-truth signal, $\mathbf{X}^*$ in the following, which is usually a vector with a very large number of components, from some noisy observations of it, denoted by $\mathbf{Y}$. The main feature of this setting is that the dimension of the signal, i.e., the number of real parameters to reconstruct, and the number of observations at disposal are a function of one another, typically a polynomial. For instance, for our purposes, $\mathbf{X}^*$ will be an $N$-dimensional vector and $\mathbf{Y}$ an $N\times N$ matrix, for a total of $O(N^2)$ noisy observations. Hence, if the number of observations becomes large, so does the number of parameters to retrieve. Contrary to what happens in typical low-dimensional settings, where maximum likelihood or Maximum A Posteriori (MAP) approaches yield provably satisfactory reconstruction performance, in a high-dimensional setting this is not always the case. In particular, one needs to devise another kind of more refined estimator that exploits the marginal posterior probabilities of each signal component.
Both approaches described above are Bayesian, and knowledge of a prior distribution on the signal components can play a key role, especially for high-dimensional problems. Furthermore, to compose the posterior measure for the entire signal, one needs the likelihood of the data, which is the probability of an outcome $\mathbf{Y}$ of the observations given a certain ground-truth realization $\mathbf{X}^*$. As we shall discuss soon, under certain hypotheses, the Bayesian approach highlights the correspondence of relevant information-theoretic quantities with thermodynamic ones. Among others, a key quantity is the mutual information between the signal $\mathbf{X}^*$ and the observations $\mathbf{Y}$, which quantifies the residual amount of information left in $\mathbf{Y}$ about $\mathbf{X}^*$ after the noise corruption. As intuition may suggest, the mutual information gives access to the best reconstruction error that is information-theoretically achievable.
Finally, we stress that the high dimensionality of the problem can induce phase transitions in some parameters of the model, like the so-called signal-to-noise ratio (SNR), which tunes the strength of the signal with respect to that of the noise in the observations.
  3.1. Bayes-Optimality and Nishimori Identities
For the sake of simplicity, we start by considering a signal $\mathbf{X}^*$ of i.i.d. (independent and identically distributed) components $X_i^*\sim P_X$, where $P_X$ has a finite fourth moment. The observations at the disposal of a statistician can be modeled as a stochastic function of the ground-truth signal: $\mathbf{Y}=\varphi(\mathbf{X}^*,\mathbf{Z})$, where $\mathbf{Z}$ is the source of randomness, or simply the noise. Knowing the function $\varphi$, from a Bayesian perspective, translates directly into having the likelihood of the model, namely the conditional distribution of $\mathbf{Y}$ given $\mathbf{X}^*$, which we assume to have a density $P_Y(\mathbf{Y}\mid\mathbf{X}^*)$ with respect to the Lebesgue measure. Observe that the likelihood is strongly affected by the nature of the noise.
According to Bayes' rule, the posterior distribution of $\mathbf{X}^*$ given the data is
\[
dP(\mathbf{x}\mid\mathbf{Y})=\frac{P_Y(\mathbf{Y}\mid\mathbf{x})\,dP_X(\mathbf{x})}{P(\mathbf{Y})}\,,\tag{16}
\]
where $dP_X(\mathbf{x})=\prod_{i\le N}dP_X(x_i)$, and $P(\mathbf{Y})=\int P_Y(\mathbf{Y}\mid\mathbf{x})\,dP_X(\mathbf{x})$ is the probability of a given realization of the data, which is sometimes also called evidence. In practice, the above posterior, which would be ideal to perform inference, is rarely available: the statistician may be unaware of the likelihood, of the correct prior distribution of the signal, or of both. This motivates the following definition of a special inference setting:
Definition 1 (Bayes optimality). The statistician is said to be Bayes optimal, or in the Bayes-optimal setting, if they are aware of both $P_X$ and $P_Y$; namely, they have access to the posterior (16).

The above is saying that an optimal statistician knows everything about the model except for the ground truth $\mathbf{X}^*$ itself. The Bayes-optimal setting is thus often used as a theoretical framework to establish the information-theoretic limits. Indeed, it is known that the mean square error between the ground truth and an estimator $\hat{\mathbf{x}}(\mathbf{Y})$,
\[
{\rm MSE}(\hat{\mathbf{x}}):=\mathbb{E}\,\big\|\mathbf{X}^*-\hat{\mathbf{x}}(\mathbf{Y})\big\|^2\,,
\]
is minimized by an optimal statistician, who can use the posterior mean as an estimator, yielding the minimum mean square error (MMSE)
\[
{\rm MMSE}:=\mathbb{E}\,\big\|\mathbf{X}^*-\mathbb{E}[\mathbf{X}^*\mid\mathbf{Y}]\big\|^2\,.
\]
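A one-component toy example of the posterior (16) and of the posterior-mean estimator, for the Gaussian channel with Rademacher prior used later in Section 3.2 (the scalar setup and the function names are ours):

```python
import numpy as np

def posterior_rademacher(y, lam):
    """Posterior of x in {-1,+1} given y = sqrt(lam)*x + z, z ~ N(0,1),
    under the uniform (Rademacher) prior, via Bayes' rule (16)."""
    like = {x: np.exp(-0.5 * (y - np.sqrt(lam) * x) ** 2) for x in (+1, -1)}
    evidence = 0.5 * like[+1] + 0.5 * like[-1]     # P(y), the evidence
    return {x: 0.5 * like[x] / evidence for x in (+1, -1)}

post = posterior_rademacher(y=0.7, lam=1.5)
posterior_mean = post[+1] - post[-1]               # the MMSE estimator of x
assert abs(post[+1] + post[-1] - 1) < 1e-12        # the posterior is normalized
# For this channel the posterior mean has the closed form tanh(sqrt(lam)*y)
assert abs(posterior_mean - np.tanh(np.sqrt(1.5) * 0.7)) < 1e-12
```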
In the following, we shall denote averages with respect to the posterior by $\langle\cdot\rangle$.
Another important consequence of this setting is the so-called Nishimori identities, which can be stated as follows. Given any continuous bounded function $f$ of the data $\mathbf{Y}$, the ground truth $\mathbf{X}^*$ and $n$ i.i.d. samples $\mathbf{x}^1,\dots,\mathbf{x}^n$ from the posterior (16), one has
\[
\mathbb{E}\big\langle f(\mathbf{Y},\mathbf{x}^1,\dots,\mathbf{x}^n)\big\rangle=\mathbb{E}\big\langle f(\mathbf{Y},\mathbf{x}^1,\dots,\mathbf{x}^{n-1},\mathbf{X}^*)\big\rangle\,,
\]
where $\langle\cdot\rangle$ denotes the average of the replicas over the posterior. An elementary proof can be found in [42]. These identities enforce a symmetry between replicas drawn from the posterior and the ground truth. For instance, a direct application of the Nishimori identities yields
\[
{\rm MMSE}=\mathbb{E}\,\|\mathbf{X}^*\|^2-\mathbb{E}\,\|\langle\mathbf{x}\rangle\|^2\,.
\]
It is important to stress that, as can be seen from the above equation, an optimal statistician is actually able to compute the minimum mean square error using their posterior.
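In the scalar Gaussian channel of the previous example, the Nishimori identity $\mathbb{E}[\langle x\rangle X^*]=\mathbb{E}[\langle x\rangle^2]$ reduces to $\mathbb{E}_z\tanh(\lambda+\sqrt\lambda z)=\mathbb{E}_z\tanh^2(\lambda+\sqrt\lambda z)$, which can be checked by quadrature (our sketch):

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(120)
weights = weights / weights.sum()       # quadrature rule for E_z, z ~ N(0,1)

lam = 0.8
# Condition on X* = +1 (allowed by symmetry): <x> = tanh(lam + sqrt(lam)*z)
t = np.tanh(lam + np.sqrt(lam) * nodes)

overlap_with_truth = weights @ t        # E[<x> X*]: one replica against the ground truth
overlap_two_replicas = weights @ t**2   # E[<x>^2]: two independent posterior samples

# Nishimori: the ground truth behaves as an extra replica
assert abs(overlap_with_truth - overlap_two_replicas) < 1e-6
```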
At this point, the reader will have noticed a similarity with the Statistical Mechanics formalism. In fact, it is possible to interpret the evidence $P(\mathbf{Y})$ as the partition function of a model with Hamiltonian $-\log P_Y(\mathbf{Y}\mid\mathbf{x})$ (with the prior as reference measure) and unit inverse absolute temperature. The pressure per particle of such a model would thus be
\[
\frac{1}{N}\,\mathbb{E}\log P(\mathbf{Y})=-\frac{H(\mathbf{Y})}{N}\,,
\]
namely minus the Shannon entropy of the data per signal component, which is related to the mutual information by
\[
I(\mathbf{X}^*;\mathbf{Y})=H(\mathbf{Y})-H(\mathbf{Y}\mid\mathbf{X}^*)\,.
\]
The contribution coming from the conditional entropy $H(\mathbf{Y}\mid\mathbf{X}^*)$ can be regarded as due only to the noise, since for fixed $\mathbf{X}^*$, the only randomness in $\mathbf{Y}$ is due to $\mathbf{Z}$.
We stress here that Bayes optimality and the Nishimori identities, under rather mild hypotheses [43], are enough to grant replica symmetry in the model, i.e., concentration of the order parameters of the model. For the models we are interested in, the latter can be shown to imply finite-dimensional variational principles for the limiting mutual information.
  3.2. The Spiked Wigner Model
The spiked Wigner model (SWM) was first introduced in [44] as a model for Principal Component Analysis (PCA), and it has since been widely studied in the recent literature. Without any pretension of being exhaustive, we refer the interested reader to [42,45,46,47,48,49,50,51]. For our purposes, we restrict ourselves to the case where the signal is an $N$-dimensional vector of $\pm1$'s, drawn from a Rademacher distribution $P_X=\frac12(\delta_{+1}+\delta_{-1})$. The function $\varphi$ is a Gaussian channel, namely
\[
Y_{ij}=\sqrt{\frac{\lambda}{N}}\,X_i^*X_j^*+Z_{ij}\,,\qquad 1\le i<j\le N\,,
\]
where the $(Z_{ij})_{i<j}$ are i.i.d. standard Gaussian random variables, and $\lambda>0$ is a positive parameter called the signal-to-noise ratio. The statistician is tasked with the recovery of $\mathbf{X}^*$ given the observations $\mathbf{Y}$. The Bayes-optimal posterior measure for this inference problem can be written directly as a Boltzmann–Gibbs random measure thanks to the Gaussian nature of the likelihood:
\[
dP(\mathbf{x}\mid\mathbf{Y})\propto\exp\Big(\sum_{i<j}\sqrt{\frac{\lambda}{N}}\,Y_{ij}\,x_ix_j\Big)\,dP_X(\mathbf{x})\,,\tag{26}
\]
where we have already exploited the fact that $x_i^2=1$. We are denoting the posterior samples by $\mathbf{x}$. Since the quantity we are interested in is the quenched pressure of this model,
\[
p_N=\frac{1}{N}\,\mathbb{E}\log\sum_{\mathbf{x}\in\{-1,+1\}^N}\exp\Big(\sum_{i<j}\sqrt{\frac{\lambda}{N}}\,Y_{ij}\,x_ix_j\Big)\,,
\]
which is connected to the mutual information $I(\mathbf{X}^*;\mathbf{Y})$ by a simple shift with an additive constant, we are allowed to perform a gauge transformation $x_i\mapsto x_iX_i^*$ without altering its value. This results in a Hamiltonian that is now independent of the original ground-truth signal,
\[
-H_N(\mathbf{x})=\sum_{i<j}\Big(\sqrt{\frac{\lambda}{N}}\,Z_{ij}+\frac{\lambda}{N}\Big)x_ix_j\,,\tag{27}
\]
and the couplings between spins are Gaussian random variables with mean equal to their variance. This condition identifies a peculiar region of the phase space of a spin-glass model, which is called the Nishimori line. In fact, the Nishimori identities were first discovered and studied in the context of gauge spin glasses. Despite looking simpler, the above model retains most of the features we need for our study.
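The gauge transformation is a change of variables $x_i\to x_iX_i^*$, a bijection of $\{-1,+1\}^N$, so the partition function is exactly invariant, while the couplings acquire mean $\lambda/N$ equal to their variance. A small enumeration check (ours):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
N, lam = 6, 1.3

x_star = rng.choice([-1, 1], size=N)               # ground-truth signal
Z = np.triu(rng.standard_normal((N, N)), k=1)      # Gaussian noise
Y = np.sqrt(lam / N) * np.triu(np.outer(x_star, x_star), k=1) + Z

def log_partition(C):
    """log of sum_x exp(sum_{i<j} C_ij x_i x_j), by enumeration (small N)."""
    vals = [np.einsum('i,ij,j->', np.array(s), C, np.array(s))
            for s in itertools.product([-1, 1], repeat=N)]
    return float(np.log(np.sum(np.exp(vals))))

C = np.sqrt(lam / N) * Y                    # couplings of the posterior (26)
# Gauging x -> x * x_star is equivalent to C_ij -> C_ij * x*_i * x*_j,
# which only reorders the sum over configurations: Z is unchanged.
C_gauged = C * np.outer(x_star, x_star)
assert abs(log_partition(C) - log_partition(C_gauged)) < 1e-10
```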
For inference models with additive Gaussian noise, like the one above, it is possible to prove the so-called I-MMSE relation:
\[
\frac{d}{d\lambda}\,\frac{I(\mathbf{X}^*;\mathbf{Y})}{N}=\frac{1}{4N^2}\,\mathbb{E}\,\big\|\mathbf{X}^*(\mathbf{X}^*)^\intercal-\big\langle\mathbf{x}\mathbf{x}^\intercal\big\rangle\big\|_F^2\,,
\]
where $\|\cdot\|_F$ is the Frobenius norm and $\langle\cdot\rangle$ denotes the expectation with respect to the Boltzmann–Gibbs measure induced by (27). Hence, once the mutual information is known, the MMSE can be accessed through a derivative with respect to the signal-to-noise ratio. A clarification is in order here: the above is the MMSE for the reconstruction of the rank-one matrix $\mathbf{X}^*(\mathbf{X}^*)^\intercal$ because, due to flip symmetry, here we do not have any actual information on the single vector $\mathbf{X}^*$, but only on the spike $\mathbf{X}^*(\mathbf{X}^*)^\intercal$.
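The I-MMSE mechanism is easiest to see in the scalar analogue of the channel above, where $I(\lambda)=\lambda-\mathbb{E}\log\cosh(\lambda+\sqrt\lambda z)$ and the Guo–Shamai–Verdú relation reads $dI/d\lambda={\rm MMSE}/2$. A numerical check (ours, scalar case only):

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(120)
weights = weights / weights.sum()     # quadrature rule for E_z, z ~ N(0,1)

def mutual_info(lam):
    """I(x;y) for y = sqrt(lam)*x + z with Rademacher x:
    I(lam) = lam - E log cosh(lam + sqrt(lam)*z)."""
    u = lam + np.sqrt(lam) * nodes
    log_cosh = np.logaddexp(u, -u) - np.log(2)
    return lam - weights @ log_cosh

def mmse(lam):
    """MMSE = 1 - E tanh(lam + sqrt(lam)*z), using the Nishimori identities."""
    return 1 - weights @ np.tanh(lam + np.sqrt(lam) * nodes)

lam, eps = 0.9, 1e-5
dI = (mutual_info(lam + eps) - mutual_info(lam - eps)) / (2 * eps)
assert abs(dI - mmse(lam) / 2) < 1e-4     # I-MMSE: dI/dlam = MMSE/2
```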
  3.3. Sub-Optimality and Replica Symmetry Breaking
There are several ways to break Bayes optimality. Some examples: the statistician does not know the signal-to-noise ratio [13,52]; the statistician adopts a likelihood different from that of the true model [14]; the statistician adopts a wrong prior [12,53]; combinations of the previous, and many others. We will focus on the case of mismatching priors, where the statistician not only adopts a wrong prior on the ground-truth elements, but is also unaware of the rank of the spiked matrix hidden inside the noise, which is denoted by M. The rest is assumed to be known. The channel of the inference problem is
        
If the statistician assumes a Rademacher prior for the signal components and a rank-one hidden matrix, they will write a posterior of the form
        
        where
        
The slash on quantities emphasizes that they are not the Bayes-optimal ones. In this setting, one can no longer rely on the Nishimori identities, and in principle, replica symmetry is no longer guaranteed. On the contrary, as we shall argue later on, a mismatch in the prior alone is already sufficient to cause replica symmetry breaking.
  4. The Model
Let M be a fixed integer and . Consider two independent random collections  and , where  is such that . The above random collections play the role of quenched disorder in the model. Consider N Ising spins  and the Hamiltonian function

with . Here,  is the interacting part, while
      
denotes the random external field acting on the spins. The Hamiltonian (32) is determined by the choice of  and . For , the interaction term  coincides with the Hamiltonian (31). Note that for some special choices of the parameters, we recover some well-known spin glass models:
-  gives the SK model (1) at  and random external field .
-  gives the Hopfield model [6,7,18] with a finite number of patterns .
-  and  gives the SK model on the Nishimori line (27). As we have seen in Section 3, the latter can also be viewed as a spiked Wigner model in the Bayes-optimal setting.
Notice that the entire model can be interpreted as a Hopfield model where the traditional Hebbian matrix is corrupted by Gaussian noise. Furthermore, if the Hebbian coupling is replaced by a constant matrix, the model reduces to an SK model with the addition of a ferromagnetic interaction, which was studied in [54].
Our main result is the computation of the thermodynamic limit of the pressure per particle

whose variance can be shown to converge to 0 as $N\to\infty$, namely:
Lemma 1. Assume . Then, for any ,

where K is a suitable positive constant.

We thus focus on the quenched average of the pressure. The proof of this lemma makes use of the Efron–Stein concentration inequality to bound the variance; it is simple but tedious, and follows closely that of ([12], Lemma 9). We are now in a position to state our main theorem:
Theorem 2 (Variational solution). If , then

where

and  is the Parisi functional (5) with a random external field

and  denotes the expectation with respect to . The consistency equations are

Moreover, there exists  such that for any , one has  and the supremum in (36) can be restricted to .

The proof of the theorem is based on the concentration of the Mattis magnetization, which is the normalized scalar product between a spin configuration (or a sample from the wrong posterior measure) and one of the patterns:
\[
m_k(\sigma):=\frac{1}{N}\sum_{i=1}^N\xi_i^k\sigma_i\,.\tag{40}
\]
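Rewriting the Hebbian interaction through the Mattis magnetizations rests on the elementary identity $\frac{1}{N}\sum_{i<j}\sum_{k}\xi_i^k\xi_j^k\sigma_i\sigma_j=\frac{N}{2}\sum_k m_k^2-\frac{M}{2}$, valid for $\pm1$ patterns. It can be checked directly (our sketch; binary patterns are assumed here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
N, M = 40, 3
xi = rng.choice([-1, 1], size=(M, N))    # patterns (assumed binary here)
s = rng.choice([-1, 1], size=N)          # a spin configuration

# Mattis magnetizations (40): m_k = (1/N) sum_i xi^k_i s_i
m = xi @ s / N

# Hebbian pair interaction expressed through the Mattis magnetizations
hebb = sum(xi[k, i] * xi[k, j] * s[i] * s[j]
           for k in range(M) for i in range(N) for j in range(i + 1, N)) / N
assert abs(hebb - (N / 2 * (m**2).sum() - M / 2)) < 1e-10
```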
The Hamiltonian can thus be rewritten using (40) in the following form:
The Mattis magnetization, in fact, plays the role of an order parameter for this model. The concentration we can prove is only an integral average over some suitably small magnetic fields, which is still sufficient for our purposes:
Proposition 1 (Concentration of Mattis Magnetizations). Consider a k such that . Let  with ,  for all . For any , we denote by  the Boltzmann–Gibbs measure induced by the Hamiltonian . Then

for all  and .

We shall omit the proof of the above result, as it is completely analogous to the one in [12]. We will need an intermediate lemma that leads to it (see Lemma 2 below), together with a second key ingredient: the adaptive interpolation technique [48], combined with Guerra's replica symmetry-breaking upper bound for the quenched pressure of the SK model [8].
Proof of Theorem 2. Here, we outline the main steps of the proof of the variational principle for the thermodynamic limit. The proof is achieved via two bounds that match in the $N\to\infty$ limit. Let us start by defining the interpolating Hamiltonian
        
where  and

with , and where the interpolating functions , which must be continuously differentiable in  and non-negative, will be suitably chosen. With this interpolation, one is able to prove the following sum rule:
Proposition 2. The following sum rule holds:

where .

The proof consists of the computation of the derivative of the interpolating pressure related to the model (43). It follows closely that of ([12], Proposition 7), to which we refer the interested reader. Since the remainder  is non-negative, the above proposition already yields a lower bound for the quenched pressure of our model when we choose  constant:

where we used the Lipschitz continuity of the SK pressure in the magnetic fields.
The upper bound requires more attention. First, we notice that  is convex in the magnetic fields and that . Hence, we can use Jensen's inequality and the Lipschitz continuity of  to obtain:

Now, we use Guerra's bound for the SK pressure, which, importantly, is uniform in N, and we average over  on both sides:
What remains to be done is to prove that  for a proper choice of the interpolating functions . The choice is made through a system of coupled ODEs

One can easily check that the above system is regular enough to admit a unique solution on the interval . In this case, the remainder to be pushed to 0 takes the form
The goal is now to apply a concentration lemma here:

Lemma 2. Let  and denote by  the Boltzmann–Gibbs expectation associated to the Hamiltonian , where  and  is the k-th canonical basis vector of . Then

with K a positive constant.

Notice that the integral in (51) is over  and not over the effective magnetic field of the model, which is instead . Nevertheless, we can integrate over the magnetic fields  with a change of variables. This involves a Jacobian that is larger than 1. In fact, thanks to Liouville's theorem ([55], Corollary 3.1, Chapter V), one can prove that

when .
This allows us to bound the thermal fluctuations in (51) using (52) and then Liouville's theorem:

Since  has a bounded second moment, using the Cauchy–Schwarz inequality, one can show that  is uniformly bounded by a constant C. Hence,  for any  by construction (recall (44) and (50)). Therefore, .
The fluctuations induced by the disorder can be bounded in a very similar fashion using (53):

Hence, (51), which equals , is overall a . Moreover,  can be chosen as a function of N in order to optimize the convergence rate: . Using Fubini's theorem in (49) to exchange the t and  averages, and then dominated convergence, one concludes the proof.    □
From the variational problem (36), we can also deduce the differentiability properties of the limiting pressure, obtaining the average values of the relevant thermodynamic quantities of the model:

Corollary 1. Let , and . Then

More generally, let y be one of the variables . Then the function  is convex. By Danskin's theorem (see [56]),  is differentiable if and only if the set  is a singleton.