A Maximum Entropy Approach to the Realizability of Spin Correlation Matrices

Deriving the form of the optimal solution of a maximum entropy problem, we obtain an infinite family of linear inequalities characterizing the polytope of spin correlation matrices. For n ≤ 6, the facet description of such a polytope is provided through a minimal system of Bell-type inequalities.


Introduction
Moment problems are fairly common in many areas of applied mathematics, statistics and probability, economics, engineering, physics and operations research. Historically, moment problems came into focus with Stieltjes in 1894 [1], in the context of studying the analytic behavior of continued fractions. The term "moment" was borrowed from mechanics: the moments could represent the total mass of an unknown mass density, the torque necessary to support the mass on a beam, etc. Over time, however, moment problems took the shape of an important field in their own right. A deep connection with convex geometry was discovered by Krein in the mid 1930s and developed by the Russian school; see, e.g., [2,3]. Another fundamental connection with the work of Carathéodory, Toeplitz, Schur, Nevanlinna and Pick on analytic interpolation was investigated in the first half of the twentieth century [4]. This led to important developments in operator theory; see, e.g., [5,6]. In more recent times, a rather impressive application and generalization of this mathematics has been developed by the Byrnes-Georgiou-Lindquist school for signal and control engineering applications. In their approach, one seeks a multivariate, stationary stochastic process as the input of a bank of rational filters whose output covariance has been estimated. This turns into a Nevanlinna-Pick interpolation problem with a bounded degree [7,8]. The latter can be viewed as a generalized moment problem (namely, a moment problem with complexity constraints), which is advantageously cast in the frame of various convex optimization problems, often featuring entropic-type criteria. An example is provided by the covariance extension problem and its generalization; see [9-14]. These problems pose a number of theoretical and computational challenges, especially in the multivariable framework, for which we also refer the reader to [15-22]. Besides signal processing, significant applications of this theory are found in modeling and identification [23-25], H∞ robust control [26,27], and biomedical engineering [28].
A general moment problem can be stated as follows. Suppose we are given a measurable space, (Ω, A), a family, F, of measurable functions from Ω to R and a corresponding family of real numbers, {c_f : f ∈ F}. One wants to determine whether there exists a probability, P, on (Ω, A), such that E_P(f) = c_f for every f ∈ F and, if so, characterize all probabilities having this property. Among the various instances of this inverse problem, the covariance realization/completion problem has raised wide interest, in part because of its important applications in mathematical statistics [29] and in theoretical engineering [30]. It is well known that an n × n matrix is the covariance matrix of some R^n-valued random vector if and only if it belongs to the convex cone of symmetric and positive semidefinite matrices. A relevant problem for applications considers the situation in which only some entries of the covariance matrix are given, for example, those that have been estimated from the data. In this context, one aims at characterizing all possible completions of the partially given covariance matrix, or completions that possess certain desirable properties; see, e.g., [29,31-37] and references therein. Another, more theoretical, problem investigates the geometry of correlation matrices, namely, covariances of standardized random variables. Clearly, correlation matrices form a compact, convex subset of the vector space of symmetric matrices of dimension n, since this set is determined by a family of linear inequalities, namely, the positivity constraints. A natural question is to determine the extreme points of this convex set. This problem was solved by Parthasarathy in [38], who could parametrize the uncountable family of extremals, which turn out to be all singular.
The geometry of correlation matrices may change dramatically if one adds constraints on the values of the random vector realizing the covariance matrix. In this paper, we consider the case in which the components of the random vector are required to be {−1, 1}-valued. We call spin systems random vectors of this type or, by abuse of language, their distributions. Although spin systems have been extensively studied in statistical mechanics and probability, various questions are still open concerning their covariance matrices. In [39], J. C. Gupta proved that covariance matrices of a system of n spins form a polytope and exhibited its 2^{n−1} extremals. Apparently, his result is contained in some form in Pitowsky's previous work [40] and in even earlier, but not easily accessible, work by Assouad (1980) and Assouad and Deza (1982); cf. [41] (Section 5.3) for more information.
A more delicate problem is to characterize covariance matrices of spin systems by a system of linear inequalities. This problem was tackled in [40] (see, however, [41] (p. 54) for a thorough description of preceding contributions on "correlation polytopes" coming from such different fields as mathematical physics, quantum logic and analysis). There, the dual description, in the sense of linear programming, was considered, and the high complexity of the problem of determining the extremals of this dual, i.e., the facets of the polytope, was discussed. Moreover, in [42], it has been shown that this dual is generated by the Bell's inequalities (see [43] for an overview of the role played by these inequalities in quantum mechanics) for n = 3, 4, but not for n ≥ 5. Finally, we mention the paper [44], where the problem of realizability of correlations has been extended to the more general setting of random points in a topological space.
The aim of this paper is two-fold.
• We derive an infinite family of linear inequalities characterizing covariances of spin systems, via the solution of a maximum entropy problem. Besides its intrinsic interest, this method has the advantage of describing, in terms of certain Lagrangian multipliers, an explicit probability realizing the covariances, whenever they are realizable. The search for the Lagrange multipliers is an interesting computational problem, which will be addressed in a forthcoming paper.
• Via a computer-aided proof, we determine the facets of the polytope of covariance matrices of spin systems for n ≤ 6. In particular, we show that for these values of n, Bell's inequalities are actually facets of the polytope, but generate the whole polytope only for n = 3, 4. For n = 5 and 6, the remaining facets are given by suitable generalizations of Bell's inequalities. Although the problem is computationally feasible also for some larger values of n, the number of extremal inequalities increases dramatically, and we have not been able to describe them synthetically. We mention the fact that the case n = 3 is peculiar, since it is the only case in which the polytope is a simplex. A more detailed description of this case is contained in the note [45].
Our work here inevitably overlaps with some previous research on linear descriptions of polytopes in combinatorial geometry, such as [46]; see also (Section 30.6 in [41]) and, in particular, the footnote on p. 503 of the latter reference (the book [41] by M. Deza and M. Laurent is a general, comprehensive reference on discrete geometry). We remark that our arguments go through even when the covariance matrix is only partially given, a case important for applications, but typically not considered in the discrete geometry literature.
Summing up, we obtain necessary and sufficient conditions for the existence of a covariance completion, as well as a "canonical" (maximum entropy) probability realizing the given covariances.

Spin Systems and Spin Correlation Matrices
Let us define a spin system first. Let Ω_n = {−1, 1}^n be the space of length-n sequences, which are denoted by σ = (σ_1, σ_2, . . ., σ_n), where σ_i ∈ {−1, 1}. Define the spin random variables, ξ_i : Ω_n → {−1, 1}, for 1 ≤ i ≤ n as ξ_i(σ) = σ_i. For a probability, P, on Ω_n, we denote by E_P the expectation with respect to P. The finite probability space (Ω_n, P) is called a spin system. As in [39], for simplicity, we only consider symmetric probabilities, i.e., those for which E_P(ξ_i) = 0 for all 1 ≤ i ≤ n. Note that, in this case, the covariance matrix, C = (E_P(ξ_iξ_j))_{i,j=1}^n, has all diagonal elements equal to 1. We will refer to this matrix as the spin correlation matrix associated to P.
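As an illustration (a minimal Python sketch, not part of the paper; the function names are our own), Ω_n and the spin correlation matrix of a given probability can be computed by direct enumeration:

```python
import itertools
import numpy as np

def spin_configurations(n):
    """All 2^n configurations sigma in Omega_n = {-1, 1}^n."""
    return [np.array(s) for s in itertools.product([-1, 1], repeat=n)]

def correlation_matrix(n, P):
    """Spin correlation matrix C = (E_P[xi_i xi_j]) for a probability P,
    given as a dict mapping configuration tuples to their probabilities."""
    C = np.zeros((n, n))
    for sigma, p in P.items():
        s = np.array(sigma)
        C += p * np.outer(s, s)
    return C

# The uniform probability on Omega_3 is symmetric; its correlation matrix is I.
n = 3
P_unif = {tuple(s): 1.0 / 2**n for s in spin_configurations(n)}
C = correlation_matrix(n, P_unif)
```

Under the uniform probability the spins are independent, so all off-diagonal correlations vanish.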
Suppose that we are given the spin-spin correlations, c_ij's, and look for a probability, P, on Ω_n, such that c_ij = E_P(ξ_iξ_j) for every 1 ≤ i, j ≤ n or for all pairs (i, j) for which c_ij is given. We consider the following questions:
• Under what conditions does a distribution with those correlations exist?
• If one such distribution exists, that is, if the given correlations are realizable, then how does one characterize the maximum entropy probability measure?
The spin correlation/covariance matrices form a convex polytope whose description in terms of vertices is known. Let us denote the convex polytope of the spin correlation matrices of order n by Cov_n. J. C. Gupta proved that Cov_n is the convex hull of 2^{n−1} matrices (it turns out that these matrices are exactly the extremal vertices of this polytope) that can be found explicitly.
Theorem 1 (J. C. Gupta, 1999): The class of realizable correlation matrices of n spin variables is given by

Cov_n = conv{Σ_S : S ⊆ {1, 2, . . ., n}},

where the matrices Σ_S = (c^S_ij) are defined as follows: c^S_ii = 1 for all i and c^S_ij = (−1)^{|S ∩ {i,j}|} for i ≠ j. (Since S and its complement S^c give the same matrix, there are 2^{n−1} distinct matrices Σ_S.)
These Σ_S's are rank-1 matrices obtained by considering probability measures on Ω_n supported at two points (configurations), such that each of these two configurations has probability 1/2. The proof of the above theorem can be found in [39].
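The extremals of Theorem 1 are easy to generate and check numerically. The following sketch (our own illustration; the function names are hypothetical) builds the 2^{n−1} distinct matrices Σ_S and verifies that they are rank 1:

```python
import itertools
import numpy as np

def sigma_S(n, S):
    """Extremal matrix Sigma_S with entries (-1)^{|S ∩ {i,j}|} and unit diagonal.
    It equals e e^T for the configuration e with e_i = -1 iff i in S,
    so it is realized by the symmetric two-point probability on {e, -e}."""
    e = np.array([-1 if i in S else 1 for i in range(n)])
    return np.outer(e, e)  # rank 1; diagonal entries e_i^2 = 1

def gupta_extremals(n):
    """The 2^(n-1) distinct extremals: S and its complement give the same
    matrix, so enumerate only subsets not containing index 0."""
    mats = []
    for r in range(n):
        for S in itertools.combinations(range(1, n), r):
            mats.append(sigma_S(n, frozenset(S)))
    return mats

exts = gupta_extremals(4)  # 2^3 = 8 distinct rank-1 extremal matrices
```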
It is interesting to note that the description of the extremals of spin correlation matrices is rather simple when compared with the description of the extremals of the convex set of correlation matrices in general (see [38]). As mentioned in the Introduction, it is more difficult to obtain the dual representation in terms of linear inequalities. One simple observation is that every spin correlation matrix, C = (c_ij), must satisfy the following Bell's inequalities: for every ε ∈ Ω_n and 1 ≤ i < j < k ≤ n,

ε_iε_j c_ij + ε_jε_k c_jk + ε_iε_k c_ik ≥ −1.    (1)

The necessity of these inequalities is easy to show:

ε_iε_j c_ij + ε_jε_k c_jk + ε_iε_k c_ik = (E_P[(ε_iξ_i + ε_jξ_j + ε_kξ_k)²] − 3)/2 ≥ (1 − 3)/2 = −1,

where, in the last step, we have observed that (ε_iσ_i + ε_jσ_j + ε_kσ_k)² ≥ 1 for every σ ∈ Ω_n, being the square of an odd integer. One immediate consequence is that not all positive semidefinite matrices with diagonal elements equal to 1 are spin correlation matrices.
Example 1 Consider a symmetric matrix

C = [[1, c_1, c_1], [c_1, 1, c_2], [c_1, c_2, 1]].

Then, the condition for positive semi-definiteness is

|c_1| ≤ 1, |c_2| ≤ 1, (1 − c_2)(1 + c_2 − 2c_1²) ≥ 0,

and the Bell's inequalities are given by

c_2 ≥ 2|c_1| − 1, c_2 ≤ 1.

Taking c_1 ∈ (1/2, 1/√2) and c_2 = 0, we get a matrix, C, that is symmetric and positive semi-definite (hence, a correlation matrix), but it does not satisfy the Bell's inequalities, so it cannot be a spin correlation matrix.
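Assuming the example matrix has off-diagonal entries c_1, c_1, c_2 (as reconstructed from the surviving fragments), the claim can be checked numerically; this Python sketch is ours, not the paper's:

```python
import numpy as np
from itertools import combinations, product

def bell_holds(C, tol=1e-12):
    """Check Bell's inequalities (1): for all triples i < j < k and all signs,
    eps_i eps_j c_ij + eps_j eps_k c_jk + eps_i eps_k c_ik >= -1."""
    n = C.shape[0]
    for i, j, k in combinations(range(n), 3):
        for ei, ej, ek in product([-1, 1], repeat=3):
            if ei*ej*C[i, j] + ej*ek*C[j, k] + ei*ek*C[i, k] < -1 - tol:
                return False
    return True

c1, c2 = 0.6, 0.0  # c1 in (1/2, 1/sqrt(2))
C = np.array([[1.0, c1, c1],
              [c1, 1.0, c2],
              [c1, c2, 1.0]])
psd = bool(np.all(np.linalg.eigvalsh(C) >= -1e-12))  # True: a correlation matrix
violates_bell = not bell_holds(C)                    # True: not a spin one
```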

The Dual Representation for n ≤ 6
We have seen that the spin correlation matrices form a convex polytope with extreme points given by Theorem 1. Every convex polytope has two representations: one as the convex hull of finitely many extreme points (known as the V-representation) and another in terms of the inequalities defining the faces of the polytope (known as the H-representation). These inequalities provide necessary and sufficient conditions for a point to lie inside the convex hull. Thus, finding necessary and sufficient conditions for a matrix, M, to lie in Cov_n is equivalent to finding the H-representation of Cov_n. The problem of obtaining the H-representation from the V-representation is called the facet enumeration problem, while the dual one is called the vertex enumeration or the convex hull problem. These are well-known problems in the theory of linear programming.
The program, cdd+ (cdd, respectively), is a C++ (ANSI C) program that performs both tasks. Given the equations of the faces of the polytope, it returns the set of vertices and extreme rays and vice versa [47]. This program is a computer implementation of the double description method (see, for instance, [48]). It works with exact integer arithmetic; in particular, when the data in the input are integers, it does not make any rounding.
We executed the cdd+ program to find the necessary and sufficient conditions for 3 ≤ n ≤ 6. We know the extremals in each case from Theorem 1. We summarize below the results obtained. We remark that the facets of this correlation polytope have been previously computed for n ≤ 8 (Section 30.6 in [41]). Our point here is to connect these facets with Bell's inequalities and their generalizations (see Section 4).
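For n = 3, the output of such a facet enumeration can be reproduced by hand; the following sketch (ours, independent of cdd+) verifies that each Bell inequality is valid on Cov_3 and tight on three of the four vertices, hence a facet of this 3-dimensional simplex:

```python
import numpy as np
from itertools import product

# The four vertices of Cov_3, written as points (c12, c13, c23) in R^3
# (configurations e and -e give the same vertex, so fix e_1 = 1).
vertices = [(e1*e2, e1*e3, e2*e3)
            for e1, e2, e3 in product([1], [-1, 1], [-1, 1])]

# Bell's inequalities: a*c12 + b*c13 + (a*b)*c23 >= -1 for (a, b) in {-1, 1}^2.
facets = 0
for a, b in product([-1, 1], repeat=2):
    vals = [a*c12 + b*c13 + a*b*c23 for c12, c13, c23 in vertices]
    if min(vals) >= -1 and sum(abs(v + 1) < 1e-12 for v in vals) == 3:
        facets += 1  # valid, and tight on 3 of the 4 vertices: a facet
```

Each of the four sign patterns yields one facet of the tetrahedron, in agreement with the case n = 3 below.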

Cases n = 3, 4
These are the simplest cases, already covered in [42]. The program returns exactly the Bell's inequalities in Equation (1). In particular, the following nontrivial facts follow:
• Bell's inequalities imply the positivity of the matrix;
• Bell's inequalities correspond to the facets of the polytope of spin correlation matrices in dimensions three and four; in particular, they provide the "minimal" description in terms of linear inequalities.

Case n = 5
The polytope of spin-correlation matrices has 56 facets. Forty of these are given by the Bell's inequalities, corresponding to the 10 possible choices of three indexes out of five and the 2² = 4 sign patterns for ε ∈ {−1, 1}³ (modulo a global change of sign). The remaining 16 facets are given by the generalized Bell's inequalities with |T| = 5 introduced in the next section, one for each of the 2⁴ = 16 sign patterns (again modulo a global change of sign).
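This facet count can be verified numerically without cdd+: an inequality defines a facet of the 10-dimensional polytope Cov_5 precisely when its set of tight vertices has affine dimension 9. The sketch below (our own check, assuming the generalized Bell form Σ_{i<j∈T} ε_iε_j c_ij ≥ (1 − |T|)/2 discussed in Section 4) recovers all 56 facets:

```python
import numpy as np
from itertools import combinations, product

n = 5
pairs = list(combinations(range(n), 2))
# The 2^(n-1) = 16 vertices of Cov_5, as points (e_i e_j)_{i<j} in R^10.
verts = np.array([[e[i] * e[j] for i, j in pairs]
                  for e in product([1], *([[-1, 1]] * (n - 1)))])

def facet_count():
    """Count, among the generalized Bell candidates with |T| = 3 and |T| = 5,
    those whose tight vertex sets have affine dimension 9 (i.e., facets)."""
    count = 0
    for t in (3, 5):
        for T in combinations(range(n), t):
            for tail in product([-1, 1], repeat=t - 1):
                eps = (1,) + tail  # modulo a global change of sign
                coef = np.zeros(len(pairs))
                for (a, i), (b, j) in combinations(list(zip(eps, T)), 2):
                    coef[pairs.index((i, j))] = a * b
                vals = verts @ coef
                rhs = (1 - t) / 2.0
                assert vals.min() >= rhs - 1e-9  # valid on all vertices
                tight = verts[np.abs(vals - rhs) < 1e-9]
                if np.linalg.matrix_rank(tight - tight[0]) == 9:
                    count += 1
    return count
```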

Case n = 6
There are 368 facets. We can group the corresponding inequalities into three groups: (1) the 80 Bell's inequalities, corresponding to the 20 choices of three indexes out of six and 2² = 4 sign patterns; (2) the 96 generalized Bell's inequalities with |T| = 5, corresponding to the six choices of five indexes and 2⁴ = 16 sign patterns; (3) the remaining 192 inequalities, which involve all six indexes and are described in the next section.
We will see in the next section that inequalities of the types above hold for spin-correlation matrices also in higher dimensions, where, however, facets of different types appear.

Maximum Entropy Method
Our aim now is to find an explicit measure that realizes the given covariances. One of the most natural and popular approaches to this kind of problem is the maximum entropy method. The rationale underlying this approach has been discussed over the years by various "deep thinkers", such as Jaynes [49-51] (physics), Dempster [29] (statistics) and Csiszár [52] (information theory). We refer the reader to these references for a full motivation of this approach.
We want to find a probability measure that realizes the given covariances and that also maximizes the entropy of the system. In other words, we want to solve the following optimization problem: maximize the entropy

S(P) = −Σ_{σ∈Ω_n} P(σ) log P(σ)

subject to the constraints

Σ_{σ∈Ω_n} P(σ) = 1,  Σ_{σ∈Ω_n} σ_hσ_k P(σ) = c_hk.

Consider the Lagrangian function

L(P) = S(P) + μ(Σ_σ P(σ) − 1) + Σ_{h,k} λ_hk (Σ_σ σ_hσ_k P(σ) − c_hk).

Notice that L(P) coincides with S(P) on the set of P satisfying the constraints. Here, μ ∈ R and the n × n matrix Λ = (λ_hk) are the Lagrange multipliers. L(P) is a strictly concave function of P on the convex cone, P, of positive measures on Ω_n. Thus, if we can find an internal point, P* ∈ P, such that

δL(P*; δP) = 0

for all δP : Ω_n → R, then, necessarily, P* is the unique maximum point for L over P. Since

δL(P*; δP) = Σ_σ [−log P*(σ) − 1 + μ + Σ_{h,k} λ_hk σ_hσ_k] δP(σ),

we get the optimality condition

log P*(σ) = μ − 1 + Σ_{h,k} λ_hk σ_hσ_k.

Namely, P* has the form

P*(σ) = (1/Z) exp{Σ_{h,k} λ_hk σ_hσ_k},    (3)

where Z = exp{1 − μ}. Such a P* is, in fact, an internal point of P. Note that any probability of this form is such that P*(σ) = P*(−σ). In particular, this implies that each spin has a mean of zero with respect to P*. Also note that this last formula simply specifies a class of probability measures on Ω_n, parametrized by the matrix of Lagrange multipliers, Λ = (λ_ij). It remains to establish whether the given correlations are realized by any such probability and, if so, to determine the corresponding values of the multipliers. To this aim, we consider the so-called dual functional. Let us denote by P*_Λ the probability in Equation (3), with Z = Z(Λ) = Σ_σ exp{Σ_{h,k} λ_hk σ_hσ_k}, so that P*_Λ is a probability. Then, the dual functional, J, which is a real-valued function of the Lagrange multipliers, is defined by

J(Λ) = L(P*_Λ).

Observing that

S(P*_Λ) = log Z(Λ) − Σ_{i,j} λ_ij E_{P*_Λ}(ξ_iξ_j)

and

L(P*_Λ) = S(P*_Λ) + Σ_{i,j} λ_ij (E_{P*_Λ}(ξ_iξ_j) − c_ij),

we obtain the convex function

J(Λ) = log Σ_{σ∈Ω_n} exp{Σ_{i,j} λ_ij σ_iσ_j} − Σ_{i,j} λ_ij c_ij.    (5)

If we denote by ∇J the gradient of J with respect to the variables, λ_ij, it is immediately seen that the following statements are equivalent:
• Λ is a critical point for J, i.e., ∇J(Λ) = 0;
• P*_Λ realizes the assigned correlations, i.e., E_{P*_Λ}(ξ_iξ_j) = c_ij for every i, j.

A critical point exists if J is proper, which means

J(Λ) → +∞ as ‖Λ‖ → +∞.

Let us denote by M(Λ) the minimum given by M(Λ) = min{Σ_{i,j} λ_ij σ_iσ_j : σ ∈ Ω_n}. It is also clear that the following set of inequalities ensures the properness of J:

Σ_{i,j} λ_ij c_ij > M(Λ) for every Λ ≠ 0.

We denote by ∆_n
the set of matrices defined by

∆_n = {C = (c_ij) symmetric, c_ii = 1 : Σ_{i,j} λ_ij c_ij ≥ M(Λ) for every n × n matrix Λ}.    (6)

We can now state the main result of this section.
Theorem 2 Let ∆_n be as defined in Equation (6). Then,

Cov_n = ∆_n.

Proof. Denote by int ∆_n the interior of ∆_n. For C ∈ int ∆_n, the strict inequalities Σ_{i,j} λ_ij c_ij > M(Λ), Λ ≠ 0, hold, so the dual functional, J(Λ), is proper. This implies feasibility. Thus, there exists a probability, P, that realizes C as a correlation matrix of spin variables. Hence, C ∈ Cov_n, so that int ∆_n ⊆ Cov_n; since Cov_n is closed and ∆_n is the closure of its interior, it follows that ∆_n ⊆ Cov_n. Now, to show Cov_n ⊆ ∆_n, let C = (c_ij) ∈ Cov_n, realized by a probability P. Then, for every Λ, we have

Σ_{i,j} λ_ij c_ij = E_P(Σ_{i,j} λ_ij ξ_iξ_j) ≥ M(Λ).

This implies C ∈ ∆_n. As pointed out by one reviewer, Theorem 2 can also be proven by contradiction using the hyperplane separation theorem, using the knowledge of the extremal points of the polytope. Our proof, however, does not rely on this knowledge and holds, with minimal modifications, for random variables taking values in general subsets of R. The theorem above provides a (non-minimal) dual description of the polytope of spin correlation matrices. Its main consequence is that it guarantees that whenever C is in the interior of Cov_n, it can be realized by some probability of the form of Equation (3), for a Λ which minimizes the dual functional, J. The search for a probability that realizes a given correlation matrix is therefore reduced to finding the minimum of a function that, as we will see shortly, is convex. Before giving some details on this point, we observe that various classes of inequalities are obtained from Equation (6) by a suitable choice of Λ.
Let T ⊆ {1, 2, . . ., n} with |T| odd. Then, let ε ∈ {−1, 1}^T. We set

λ_ij = ε_iε_j for i, j ∈ T, i ≠ j, and λ_ij = 0 otherwise.

Then

Σ_{i,j} λ_ij σ_iσ_j = (Σ_{i∈T} ε_iσ_i)² − |T|;

since |T| is odd, we have (Σ_{i∈T} ε_iσ_i)² ≥ 1. Thus, M(Λ) = 1 − |T|. As a result, we obtain the inequality

Σ_{i,j∈T, i≠j} ε_iε_j c_ij ≥ 1 − |T|.

We call these the generalized Bell's inequalities. These, as kindly pointed out by one anonymous reviewer, are special instances of the hypermetric inequalities introduced by M. Deza in the 1960s (Section 6.1 in [41]). They reduce to Bell's inequalities for |T| = 3. We have seen in the previous section that these inequalities, for |T| = 3 and |T| = 5, give the facets of the polytope of spin correlation matrices for n = 5.
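The identity Σ_{i,j} λ_ij σ_iσ_j = (Σ_{i∈T} ε_iσ_i)² − |T| and the value M(Λ) = 1 − |T| can be confirmed by brute force on a small instance (a sketch of ours, not from the paper):

```python
import numpy as np
from itertools import product

def M_of(Lam, n):
    """M(Lambda) = min over sigma in Omega_n of sum_{i,j} lam_ij sigma_i sigma_j."""
    return min(np.array(s) @ Lam @ np.array(s)
               for s in product([-1, 1], repeat=n))

n = 6
T = [0, 2, 3, 4, 5]                      # |T| = 5, odd
eps = {0: 1, 2: -1, 3: 1, 4: 1, 5: -1}   # an arbitrary sign pattern on T
Lam = np.zeros((n, n))
for i in T:
    for j in T:
        if i != j:
            Lam[i, j] = eps[i] * eps[j]
# The quadratic form equals (sum_{i in T} eps_i s_i)^2 - |T|, an odd square
# minus 5, so its minimum over Omega_6 is M(Lam) = 1 - |T| = -4.
```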
Many other variants of the Bell's inequalities could be obtained with other choices of the λ_ij. For instance, we can generalize to all even dimensions the inequalities of the third type for the case n = 6. Let n ≥ 6 be even, and consider T ⊂ {1, 2, . . ., n}, such that |T| = n − 1. Then, for ε ∈ {−1, 1}^n, choose

λ_ij = ε_iε_j for i, j ∈ T, i ≠ j; λ_{i j_T} = λ_{j_T i} = −2ε_iε_{j_T} for i ∈ T,

where j_T is the only element of T^c. We obtain

Σ_{i,j} λ_ij σ_iσ_j = (Σ_{i∈T} ε_iσ_i)² − |T| − 4ε_{j_T}σ_{j_T} Σ_{i∈T} ε_iσ_i.

It is easy to check that this expression, as a function of σ ∈ Ω_n, attains its minimum at Σ_{i∈T} ε_iσ_i = kε_{j_T}σ_{j_T} with k equal to one or three, and the minimum of (Σ_{i∈T} ε_iσ_i)² − 4ε_{j_T}σ_{j_T} Σ_{i∈T} ε_iσ_i is −3, which gives M(Λ) = −(|T| + 3) = −(n + 2), and the family of inequalities

Σ_{i,j∈T, i≠j} ε_iε_j c_ij − 4 Σ_{i∈T} ε_iε_{j_T} c_{i j_T} ≥ −(n + 2).

These, for n = 6, reduce to the inequality of the third type.
Remark 1 It is important to note that nowhere in the process of obtaining the maximum entropy measure have we assumed that we are given all the c_ij's. Suppose we are only given a partial matrix. Then, ∆_n can be interpreted as the set of conditions under which the given partial matrix can be extended to a spin correlation matrix. Once we have feasibility, we know that the maximum entropy measure, P*, exists and can be used to complete the given matrix to a spin correlation matrix.
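For small n, the maximum entropy probability of Equation (3) and the dual functional of Equation (5) can be evaluated by brute-force enumeration. The following Python sketch is our own illustration (function names are hypothetical):

```python
import numpy as np
from itertools import product

def maxent_probability(Lam):
    """P*_Lambda(sigma) = exp(sum_{h,k} lam_hk s_h s_k) / Z(Lambda), Equation (3)."""
    n = Lam.shape[0]
    sigmas = np.array(list(product([-1, 1], repeat=n)))
    w = np.exp(np.einsum('si,ij,sj->s', sigmas, Lam, sigmas))
    return sigmas, w / w.sum()

def dual_J(Lam, C):
    """Dual functional of Equation (5), by enumeration of Omega_n."""
    sigmas, _ = maxent_probability(Lam)
    energies = np.einsum('si,ij,sj->s', sigmas, Lam, sigmas)
    return np.log(np.exp(energies).sum()) - (Lam * C).sum()

# With Lambda = 0, P* is the uniform probability and realizes C = I.
sigmas, p = maxent_probability(np.zeros((3, 3)))
C_real = np.einsum('s,si,sj->ij', p, sigmas, sigmas)
```

Note that, as in the text, every P*_Λ produced this way satisfies P*(σ) = P*(−σ), so all spins have zero mean.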

Finding the Minimum of the Dual Functional
We have observed that the maximum entropy method allows us to reduce the problem of realizing a given spin correlation matrix to finding the minimum of the function, J, defined in Equation (5). In this section, we show that this minimum can be obtained by an explicit gradient descent algorithm. Note first that J has some obvious symmetry properties: J(Λ) = J(Λ') if λ_ij = λ'_ij for all i ≠ j, and J is indifferent to symmetrization:

J(Λ) = J((Λ + Λ^T)/2),

where Λ^T is the transpose of the matrix, Λ. It is, therefore, enough to deal with the minimization problem within the set of symmetric matrices with zero diagonal elements. These matrices can be identified with elements of R^I, where

I = {(i, j) : 1 ≤ i < j ≤ n}.

In what follows, we use the usual vector notation for elements of R^I: for v, w ∈ R^I, v^T denotes its transpose, v^T w is the scalar product in R^I and vw^T is an element of R^{I×I}.
Proposition 1 Consider the discrete time dynamical system in R^I, defined by

λ(k + 1) = λ(k) − (1/K) ∇J(λ(k)).

For every K > n(n − 1)/2, this system has a unique fixed point, λ*, which is a global attractor, and it is the unique minimum of J.
Proof. Let G : {−1, 1}^n → R^I be defined by G_ij(σ) := σ_iσ_j for (i, j) ∈ I. Moreover, the matrices C ∈ Cov_n are also obviously identified with elements of R^I. In particular, in what follows, C^T denotes the transpose of C as a vector in R^I, rather than as a matrix in M_n(R). With these notations, J can be rewritten as

J(λ) = log Σ_{σ∈{−1,1}^n} e^{G^T(σ)λ} − C^Tλ.

By elementary computations, we can compute the gradient, ∇J, and the Hessian, ∇²J:

∇J(λ) = Σ_σ G(σ) e^{G^T(σ)λ} / Σ_σ e^{G^T(σ)λ} − C = E_{π_λ}(G) − C,

∇²J(λ) = E_{π_λ}(GG^T) − E_{π_λ}(G) E_{π_λ}(G)^T.

It follows, in particular, that ∇²J(λ) is the covariance matrix of the vector, G, with respect to the probability

π_λ(σ) = e^{G^T(σ)λ} / Σ_{σ'} e^{G^T(σ')λ},    (9)

and it is, therefore, nonnegative. Thus, J is convex. In the next steps, we establish more detailed properties of J, including its strict convexity.
Step 1: the elements of G are linearly independent functions. Suppose α_0 and α_ij for (i, j) ∈ I are such that

α_0 + Σ_{(i,j)∈I} α_ij σ_iσ_j = 0    (10)

for every σ ∈ {−1, 1}^n. We show that α_0 = 0 and α_ij = 0 for every (i, j) ∈ I. We proceed by induction on n. There is nothing to prove for n = 1. We can write, assuming Equation (10),

σ_1 Σ_{j=2}^n α_1j σ_j + (α_0 + Σ_{2≤i<j≤n} α_ij σ_iσ_j) = 0.    (12)

Since the second summand in Equation (12) does not contain σ_1, this implies

Σ_{j=2}^n α_1j σ_j = 0    (13)

and

α_0 + Σ_{2≤i<j≤n} α_ij σ_iσ_j = 0.    (14)

The identity in Equation (13) implies α_1j = 0 for j = 2, . . ., n, as can be shown, for instance, again by induction on n. The identity in Equation (14) implies α_0 = α_ij = 0 for 2 ≤ i < j ≤ n by the inductive assumption.
Step 2: for every λ ∈ R^I, ∇²J(λ) is strictly positive definite. Denote by V(G) the covariance matrix of G with respect to the probability in Equation (9). For v ∈ R^I, v ≠ 0, we have

v^T V(G) v = Var_{π_λ}(v^T G) > 0,

since, by Step 1, v^T G is not a constant function of σ and π_λ has full support.

Step 3: for every λ ∈ R^I, the largest eigenvalue of ∇²J(λ) is less than or equal to n(n − 1)/2. Let δ denote this largest eigenvalue and v ∈ R^I, a corresponding eigenvector with v^T v = 1. We have

δ = v^T ∇²J(λ) v = Var_{π_λ}(v^T G) ≤ E_{π_λ}[(v^T G)²] ≤ E_{π_λ}[G^T G] = n(n − 1)/2,

where we have used the Cauchy-Schwarz inequality and the fact that G^T(σ)G(σ) = |I| = n(n − 1)/2 for every σ.

Step 4: for K > n(n − 1)/2, the map, λ → λ − (1/K)∇J(λ), is a strict contraction, and therefore, it has a unique fixed point. Let

F(λ) := λ − (1/K)∇J(λ).

We have

F(λ) − F(μ) = (I − (1/K)∇²J(ξ))(λ − μ)

for some ξ in the segment joining λ and μ. Thus, setting ‖v‖² = v^T v,

‖F(λ) − F(μ)‖ ≤ ‖I − (1/K)∇²J(ξ)‖ ‖λ − μ‖.

The conclusion now follows from the fact that, by Steps 2 and 3, the symmetric matrix, I − (1/K)∇²J(ξ), has all eigenvalues in (0, 1).
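The iteration of Proposition 1 is straightforward to implement by brute-force enumeration for small n; the sketch below is ours (the target matrix is an arbitrary interior point of Cov_3, chosen so that all Bell's inequalities hold strictly):

```python
import numpy as np
from itertools import product

def solve_multipliers(C, iters=3000):
    """Iterate lam <- lam - (1/K) grad J(lam) over R^I, I = {(i,j): i < j},
    with K > n(n-1)/2, as in Proposition 1 (gradient by enumeration)."""
    n = C.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    sigmas = np.array(list(product([-1, 1], repeat=n)))
    G = np.array([[s[i] * s[j] for i, j in pairs] for s in sigmas])
    c = np.array([C[i, j] for i, j in pairs])
    K = n * (n - 1) / 2 + 1.0
    lam = np.zeros(len(pairs))
    for _ in range(iters):
        w = np.exp(G @ lam)
        grad = (w / w.sum()) @ G - c   # grad J(lam) = E_{P*_lam}[G] - C
        lam -= grad / K
    return lam, G, c

# A target correlation matrix in the interior of Cov_3.
C = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, -0.3],
              [0.1, -0.3, 1.0]])
lam, G, c = solve_multipliers(C)
w = np.exp(G @ lam)
realized = (w / w.sum()) @ G   # pair correlations of P*_lam, close to c
```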
It should be observed that the computation of J(λ) and of its gradient involves computing a sum over σ ∈ {−1, 1}^n. This may be hard, or even practically infeasible, for large n; this difficulty may be made less severe by the use of Monte Carlo methods. This and other computational aspects of this algorithm will be discussed in a forthcoming paper.
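One possible Monte Carlo variant, sketched below under our own assumptions (a single-site Gibbs sampler for P*_Λ; this specific scheme is not prescribed by the paper), replaces the sum over {−1, 1}^n by sampling:

```python
import numpy as np

def gibbs_correlations(Lam, n_sweeps=5000, seed=0):
    """Estimate E_{P*_Lambda}[sigma_i sigma_j] by single-site Gibbs sampling,
    avoiding the sum over all 2^n configurations. Lam: symmetric, zero diagonal."""
    rng = np.random.default_rng(seed)
    n = Lam.shape[0]
    s = rng.choice([-1, 1], size=n)
    acc = np.zeros((n, n))
    for _ in range(n_sweeps):
        for i in range(n):
            # P*(sigma_i = +1 | rest) = 1 / (1 + exp(-4 sum_{k != i} lam_ik s_k))
            h = Lam[i] @ s - Lam[i, i] * s[i]
            s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-4.0 * h)) else -1
        acc += np.outer(s, s)
    return acc / n_sweeps

est = gibbs_correlations(np.zeros((4, 4)))  # Lambda = 0: independent spins
```

With Λ = 0, the sampler draws independent spins, and the estimated correlation matrix approaches the identity as the number of sweeps grows.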