The Stochastic complexity of spin models: Are pairwise models really simple?

Models can be simple for different reasons: because they yield a simple and computationally efficient interpretation of a generic dataset (e.g. in terms of pairwise dependences) - as in statistical learning - or because they capture the essential ingredients of a specific phenomenon - as e.g. in physics - leading to non-trivial falsifiable predictions. In information theory and Bayesian inference, the simplicity of a model is precisely quantified in the stochastic complexity, which measures the number of bits needed to encode its parameters. In order to understand how simple models look like, we study the stochastic complexity of spin models with interactions of arbitrary order. We highlight the existence of invariances with respect to bijections within the space of operators, which allow us to partition the space of all models into equivalence classes, in which models share the same complexity. We thus found that the complexity (or simplicity) of a model is not determined by the order of the interactions, but rather by their mutual arrangements. Models where statistical dependencies are localized on non-overlapping groups of few variables (and that afford predictions on independencies that are easy to falsify) are simple. On the contrary, fully connected pairwise models, which are often used in statistical learning, appear to be highly complex, because of their extended set of interactions.

which is always equal to zero, unless all s i are equal to s i . In this latter case the sum yields the 2 n factor that allows to recover (S.3). The orthogonality and completeness properties (S.3) allow us to express any function F : S → R [3] as a linear combination of operators [2] φ µ (s), i.e.,

Generating Set of Ω n and Independent Operators
In the following, we will call generating set of Ω n a set of n spin operators that can fully generate Ω n , such as the set {s 1 , . . . , s n }. We also define the notion of set of independent operators [4] as a set I verifying that the product of all the operators of any subset of I is always different from φ 0 (s) = 1. Formally, any set of n independent operators of Ω n is a generating set of Ω n . By definition, a generating set of Ω n cannot include the identity operator φ 0 .
Mathematically, the set Ω n , associated with the multiplication operation (Ω n , ·), forms a finite Abelian group with identity element φ 0 (s) = 1 generated by a minimal set of n generators of order 2 (Ω n = Z 2 n ).

SM-0.3. Spin Models
A model M is defined in terms of a subset M ⊆ Ω n \{φ 0 } of operators [5]. These define a probability distribution of the vector s = (s 1 , . . . , s n ) of spin variables: where the vector g = {g µ , µ ∈ M} are the conjugate parameters: each parameter g µ is a real variable that modulates the strength of the interaction associated with the operator φ µ (s). We shall refer to the modelM = Ω n \{φ 0 } with all operators as the complete model. Models can be degenerate (several operators are mapped to the same parameter) or not.

Degenerate Models
For completeness we also define degenerate models, which are discussed in section SM-7. In a degenerate model, each parameter can be associated to one or more interactions. For example, the mean field Ising model is a degenerate model with only 2 parameters, h and J; the connection with the g µ notation reads: for all µ = 2 k with k ∈ [0, n − 1] , J for all µ = 2 k + 2 k with k > k and (k, k ) ∈ [0, n − 1] 2 , 0 otherwise , (S. 9) where we used the binary representation of the set µ [1]. To work with degenerate models, it is convenient to introduce a more general notation, in which a model M is defined by a set of |M| operators, φ = {φ µ } µ∈M , a set of m parameters g = {g i } i∈{0,...,m} , and a rectangular (mapping) matrix U of size |M| × m that maps each operator of φ to one parameter of g: otherwise . (S.10) For non-degenerate models, U is simply the |M| × |M| identity matrix. By definition, each column of U contains a single 1, whereas the sum of each line i gives the degeneracy α i = ∑ j U ij of the parameter g i . Note that ∑ ij U ij = |M|. The extension to such degenerate models is natural when operators µ ∈ V are of the same order [6]. The number of possible degenerate models, where interactions of the same order may be assigned the same parameter, grows much faster than N n with n:

SM-0.4. The Stochastic Complexity and Bayesian Model Selection
In this section we recall the relation between the stochastic complexity defined in the context of Minimum Description Length and the geometric complexity obtained from a Bayesian approach. We refer to References [7,8] for a more complete treatment.
Bayesian model selection dictates that, given a datasetŝ = (s (1) , . . . , s (N) ) of N observed configurations s (i) ∈ S, each model should be assigned a posterior probability according to Bayes' rule. Here P 0 (M) is the prior probability on the model M and the sum in the denominator runs on all models M that are considered. In Equation (S.11), P(ŝ|M) is the so-called evidence that is computed by integrating the likelihood P(ŝ | g, M) over the parameters g. In the case where s (i) are i.i.d., this reads: where P 0 (g|M) is the prior distribution on the parameters g of model M. For spin models, the probability P(s | g, M) is given by Equation (S.7) and the evidence becomes: 13) whereφ(ŝ) is a vector with |M| elements, containing the empirical averages of the operators φ µ over the measured dataŝ:φ The log-likelihood, log , is a convex function of g and it has a unique maximum for the values of the parameters g =ĝ that are the solution of the set of equations: denotes the ensemble average of the operator φ µ (s) under the model specified by g. In other words, at g =ĝ, the ensemble average of each operator φ µ of M is equal to its empirical averageφ µ over the measured dataŝ. For large N, the integral is sharply dominated by the maximum and it can be estimated by the Saddle-point method, expandingφ(ŝ) · g − log Z M (g) to second order aboutĝ. This shows that, for large N, Equation (S.13) is well approximated by: where c BMS M is a geometric complexity term [8] arising from the Gaussian integration: and J(g) is the Hessian of the log-likelihood, which in this case coincides with the Fisher Information matrix defined in Equation (5) of the main text. Minimum Description Length instead approaches the problem of model complexity from an apparently different angle. Imagine we run a series of experiments that generate a sampleŝ of N 1 observations of a system. We model the outcome of the experiment as N i.i.d. drawn from a model P(s | g, M) for unknown parameters g (imagine the situation where we run the experiment precisely because we want to infer the parameters g). How much memory storage should be set aside before running the experiment? If we knew the parametersĝ the solution is given by (minus) the log-likelihood − log P(ŝ |ĝ, M) where P(ŝ| . . . In the absence of this information, the problem can be cast as a minimax problem (we refer to [9] for details), i.e., to find the best possible codingP(ŝ) in the case where Nature choses the worst possible sampleŝ. The solution is the normalised maximum likelihoodP From this, it is clear that the additional memory space that is needed to describe the model and the parameters is given by the log of the denominator of Equation (S. 19), which is the l.h.s. in Equation (3) of the main text. In order to derive the r.h.s. of Equation (3) of the main text, consider the expansion: that arises from performing the integral by saddle point around the maximum likelihood parameterŝ g(ŝ) which depend on the dataŝ. In Equation (S.20), the matrix J is, in general, the Hessian of the likelihood atĝ. Yet, for exponential models, the Hessian J does not depend on the data, and it coincides with the Fisher Information matrix. Taking f (g) = det J(g), summing over all samplesŝ in Equation (S.20) and taking the limit N → ∞, one finds which is Equation (3) of the main text, and where c M is given by Equation (4) of the main text. As observed in Reference [8], the choice of Jeffreys' priors [10] in Equation (S.18) makes the geometric complexity c BMS M of the Bayesian approach coincide with the stochastic complexity c M (see Equation (4) of the main text) prescribed by Minimum Description Length [7,11]. This choice for the prior seems natural (in absence of any information on the values of g), as it corresponds to assuming an a priori uniform distribution in the space of samples [8]. We will see that this choice of prior has also an interesting property, as it is invariant under re-parametrisation, which will lead to the definition of class of complexity.

SM-1.1. Definition
Any generating set σ = {φ ν 1 , . . . , φ ν n } of Ω n induces a bijection s → σ(s) on the set of configurations S and on the set Ω n of operators. Indeed where ⊕ i∈µ ν i is the bitwise XOR of the binary representation of the integers ν i for all i ∈ µ. We call such a bijection a gauge transformation [12]. In other words, these are transformations that map the set of n generators {s 1 , . . . , s n } of Ω n to another set of generators of Ω n , i.e. a set of n independent operators of Ω n (see definitions in SM-0). A GT preserves the structure of Ω n in the sense that any operator in the old basis is mapped into a distinct operator in the new one. A transformation that maps (s 1 , . . . , s n ) to a set of n non-independent operators will not preserve its structure. Indeed it maps Ω n to a strict subset of Ω n , with n < n independent generators. Combining them can generate only 2 n operators, which means that some operators of Ω n will not occur in Ω n . Note also that the operator φ 0 (s) = 1 is invariant under GTs. Mathematically, these transformations are the automorphisms of the group (Ω n , ·).

SM-1.2. Number of Gauge Transformations for A System with n Spins
The total number of these transformations corresponds to the number of possible sets of generators of Ω n . There are exactly possible ways to sample a set of n operators among Ω n \{1} [13] . However, only a few of them correspond to a set of n independent operators. Consider that you have chosen i independent operators, {σ 1 , . . . , σ i }, in Ω n \{1}: with these operators you can generate a subset of 2 i operators of Ω n . The number of operators left in Ω n that are independent of the family {σ 1 , . . . , σ i } is thus |Ω n | − 2 i = 2 n − 2 i , which corresponds to the number of possibilities for choosing another independent operator σ i+1 . As a consequence, the number of different ways to sample n independent operators from Ω n , i.e., the total number of GTs, is In this equation, one can recognise the q-Pochhammer symbol, 1 2 , 1 2 n , a (strictly) decreasing function of n, converging rapidly (n > 5) to its asymptotic value known as the Euler φ-function, φ 1 2 0.2887880950. For example, N GT (3) = 168 and N GT (4) = 20160; for n > 5, the number of gauge transformations grows as N GT (n) ∼ 0.289 × (2 n ) n . Finally, the probability of getting a GT by drawing at random n operators {σ 1 , . . . , σ n } of Ω n converges asymptotically to a non-zero constant:

SM-2.1. Partition Function and Loops of M
In order to compute the complexity c M of a model M from Equation (4) of the main text, one has first to compute the Fisher Information matrix J(g), and, by extension, the partition function Z M (g) given in Equation (S.7). As each operator φ µ (s) only takes values in {−1, 1}, the exponential terms in Equation (S.7) can be expanded as [14,15] e g µ φ µ (s) = cosh(g µ ) + φ µ (s) sinh(g µ ) = cosh(g µ ) [1 + φ µ (s) tanh(g µ )] , (S.27) which successively leads to the expressions for the partition function: where the sum over M ⊆ M runs over all possible sub-models (i.e., subsets) of M and the product is then taken over every operator of the sub-model M . The "empty model" M = {∅}, with no interactions, is also included in the sum, considering that ∏ µ∈{∅} φ µ tanh(g µ ) = 1. In order to compute the sum over all configurations S, one can exploit Equations (S.2) and (S.4), that lead to: where ⊕ µ∈M denotes the bitwise XOR operation between all the operators µ ∈ M . Here, the key observation is that ⊕ µ∈M = 0 if and only if each spin occurs an even number of times (or none) among the operators of M . In this latter case, the operators of M form a loop, such that ∏ µ∈M φ µ (s) = 1 is equal to the identity operator. Let us name any sub-model M that forms a loop and call L the set of all the loops of a given model M (including the empty loop {∅}), allowing us to obtain the expression in Equation (7) of the main text. The expansion of the partition function in loops is in the same spirit of cluster expansions methods in statistical physics (for a review see [16]).

SM-2.2. Invariance of Z M under Gauge Transformation
In Equation (7) of the main text, the structure of the partition function depends only on few characteristics of the model M: i) the total number of operators |M|, as they all appear in the product ∏ µ∈M cosh(g µ ); ii) the structure of its set of loops L: the number |L| of loops in the model (through the sum over L); the number | | of operators involved in each loop, named the length of the loop (through the product over each operator µ of ); and finally which operators are involved in each loop.
These properties are invariant under GTs, such that the structure of the partition function in Equation (7)    transformations define an equivalence relation between models, for which the structure (previously described, see i) and ii)) and the complexity c M are invariant. Gauge transformations thus allow us to partition all models into equivalence classes, that we call complexity classes. For instance, Figure 1 of the main text displays several models for n = 4 that belong to the same complexity class (highlighted in bold font in Table S2) for which c M 2.8. Note that, conversely, c M = c M does not imply that M and M belong to the same complexity class. For example, the models M = {s 1 , s 2 } and M = {s 1 , s 2 , s 1 s 2 } have both c M = c M = 2 log π (see Table S1), but their structures are clearly different.

SM-3. Complexity Classes and Loop Structure of Spin Models
Let us highlight several interesting properties of models belonging to the same class of complexity. First, the number of independent operators n M in a model M is invariant under GT and is thus a property of each complexity class (see SM-3.1). It can besides be (strictly) smaller than the system size n: any model with n M < n is equivalent to a model involving only n M spins. Second, the set of loops L of a model (including the empty loop {∅}) has the structure of a finite group, from which we show that the total number of loops of a given model M is of the form (see SM-3.2): The structure of the group L is an invariant of the class. Table S1 gives, for instance, a description of the loop structure for each class of complexity of models with n = 3. Finally, GTs also imply a duality relation between complexity classes of complementary models: each class of models with |M| operators corresponds to a complementary class of models with (2 n − 1) − |M| operators that contains the same number of models (see SM-3.3 and Table S1).

SM-3.1. Number of Independent Operators
We define n M as the maximum number of independent operators of a model M, i.e., the maximum number of operators that can be taken in M without forming any loop. Necessarily n M ≤ n, because all operators can be generated by products of the n spins. Furthermore, the number of spin operators [17] generated by n M independent operators is 2 n M − 1, which implies the following relations between |M| and n M (see Figure S1):  Table S2 in SM-6). The dash lines corresponds to (S.28) for different values of n M ; the black squares, to the value of λ max given in (S.31). Models with λ = 0 are models with only independent operators, whereas models with λ = λ max are equivalent to sub-complete models, i.e. complete models on a subset n M of the n spins.
By definition, n M is invariant under GT and is thus a property of each complexity class. Table S1, for instance, reports the value of n M for each class of models with n = 3. As an important consequence, any model with n M < n can be mapped (through a GT) to a model involving only n M spins. See for instance the class displayed in Figure 1 of the main text: each model involves |M| = 7 operators for n = 4 spins, but only n M = 3 are independent; consequently any of these models can be mapped through a GT to a model that involves only 3 spins (see model a) in Figure 1).

SM-3.2. Loop Structure of A Model
For any model M, the corresponding set of loops L has a finite cardinality [18]. Let us define the disjunctive union (or symmetric difference) of two loops 1 and 2 as the set that contains the operators that occur in 1 but not in 2 and viceversa: . The set L is closed under disjunctive union ⊕: indeed if 1 and 2 are two elements of L, then 1 ⊕ 2 is also in L [19]. We can thus find a minimal generating set of loops, of cardinality λ ≤ |L|, that can generate the whole set L. Finally, we note that the operation ⊕ is commutative and that any element of L is of order 2: for any loop , ⊕ = {∅}. This way, the total number of loops that can be formed with λ generating loops is |L| = 2 λ , including the empty loop {∅}, and consequently, the total number of non-empty loops of any model is of the form 2 λ − 1.
Mathematically, for any model M, the corresponding set of loop L forms a finite Abelian group associated with the operator of disjunctive union ⊕, the neutral element being the empty loop {∅}. Each element of this group is of order two, which implies that the cardinality of the group is of the form 2 λ , where λ is then the cardinality of the minimal set of generators of L.
Let us now prove the relation in (S.28). Consider the model M with |M| operators, of which at most n M are independent. Let us take one maximal subset of independent operators of M and call it I M : by construction, the number of operators in this subset is |I M | = n M and the set of loops that can be formed with these operators is necessarily empty. If |M| = n M , then the set of loops of M is empty, which is consistent with λ = 0 in (S.28). If instead |M| > n M , then, for any other operators φ ν ∈ M\I M , we can find a subset P ν ⊆ I M such that the set ν = {φ ν } ∪ P ν is a loop of M, i.e., This procedure, for each different operator φ ν ∈ M\I M , produces a different loop, so the set of |M| − n M loops ν = {φ ν } ∪ P ν built in this way is a minimal generating set of L M . Note indeed that each loop ∈ L M can be decomposed in the loops ν in an unique way. Therefore λ = |M| − n M . Table S1, reports the values of |M|, n M and λ for n ≤ 4 and (S.28) can be verified for each class of complexity. For a fixed number n M of independent operators, λ thus grows linearly as a function of the number of parameters of M up to a maximum value (see Figure S1), where Ω n M is the set of operators that can be generated from n M spins. The value of λ max is associated with the maximum set of loops that can be generated from n M spins, that corresponds to the class of complexity of sub-complete models on n M spins, i.e., complete models on a subset n M of the n spins.

SM-3.3. Complexity Classes of Complementary Models
Consider a model M with |M| operators, we define the complementary model M c as the unique model that contains all the operators that are not in M, i.e., M c = Ω n \{1}\M, which can also be written as: By definition, M c has exactly |M c | = 2 n − 1 − |M| operators. Using the properties of gauge transformations, it can be shown that, if two models belong to the same complexity class C, then their respective complementary models also belong to the same class C c (see proof below). As a consequence, the two corresponding classes of complexity (named complementary classes) have the same cardinality and the number of complexity classes with |M| parameters is equal to the number of classes with 2 n − 1 − |M| parameters. Observe, for instance, in Table S1  We thus obtain that, if two models belong to the same class of complexity C, then their respective complementary models also belong to the same class C c , thus called complementary class of C.

SM-4. A General Argument for the Calculation of c M
In this section we provide a general argument for the calculation of the complexity c M : in the first part we compute the complexity of models with only independent operators and show that any independent operator (that doesn't enter in a loop) contributes as log π to c M ; while in the second part we suggest that this value constitutes an upper bound for the complexity of any operator.

SM-4.1. Models with Only Independent Operators
At fixed number of spins n, every model with |M| = n M independent operators and non-degenerated parameters belongs to the same class of complexity (see SM-3), which is the class of models with L = {{∅}}. The number of models in such a class is which is the number of possible ways [20] of choosing n M independent operators in Ω n \{1} divided by their permutations. Notice that 1 ≤ n M ≤ n, as n + 1 operators are necessarily forming at least one loop. In particular, for models with |M| = 1 operator, we recover that N ind n (1) = |Ω n | (any operator can be chosen); and, for models with n M = n independent operators, we obtain that N ind n (n) = N GT /n! . For instance, with n = 4, Equation (S.33) gives: N ind 4 (2) = 105 for n M = 2, N ind 4 (3) = 420 and N ind 4 (4) = 840, that match with the results in Table S2. The partition function of a model with n M = |M| independent operators contains only the first term of Equation (7) of the main text, Z(g) = 2 n ∏ µ∈M cosh(g µ ). The Fisher information matrix (FIM), in Equation (5) of the main text, is therefore diagonal and reads J ind µν (g) = δ µν [1 − tanh 2 (g µ )], which finally leads to the complexity term: (S.34) For the same reason, any operator of a model, that doesn't enter in any loop of the model, contributes with a term log π to the complexity c M . Indeed, let us consider a model M with |M| operators, including K independent φ 1 , . . . φ K . We can then introduce a model M formed by the set of non-independent operators of M, and a model M ind = {φ 1 , . . . φ K }, so that M = M ∪ M ind , |M| = |M | + K and L M = L M . As a consequence the partition function in Equation (7) of the main text can be factorized in the two models: Using the previous argument, we obtain that the FIM of M is a block matrix: where J ind is the diagonal matrix previously introduced (for models with only independent operators). Finally, using the property of the determinant, Notice that, in (S.34), the stochastic complexity for |M| independent operators is linear in the number of operators |M|, which is of the same form than the first penalty term in the BIC (penalty due to the number of operators -see Equation (3) of the main text).

SM-4.2. General Argument
Given a datasetŝ = (s (1) , ..., s (N) ) where s (i) are n-spins configurations, the maximum likelihood of exponential models defined in (S.7), is achieved when the empirical averagesφ µ (S.14) of the operators φ µ match the population averages ϕ µ (S. 16), and takes the value: where S(φ) is the entropy of P(s|ĝ). So we can introduce a delta function enforcing Equation (S.15) in Equation (S.21) for each operator µ ∈ M, resulting in The sum over samples is zero unless ϕ can be realized in at least one sample. This means that where Q(ϕ ) is the number of samplesŝ for whichφ µ (ŝ) = (ϕ ) µ for all µ ∈ M. Q(ϕ) can be estimated in the weak dependence limit where it is given by where we have turned the sum over ϕ µ into an integral, observing that dϕ µ = 2/N. In the limit N → ∞ we finally get This is a quite interesting result. It tells us that the complexity of a model is related to how operators of the model constrain the values that the expected values of other operators can take. All integrals in Equation (S.43) are over a subset F of the hypercube [−1, 1] |M| , with the same integrand. Then the complexity uniquely depends on the volume of F under the measure p(ϕ) The approximation used becomes exact in the case of independent operators and is likely an upper bound otherwise.
As a corollary, we find that the most complex models are those where all operators are independent and F = [−1, 1] |M| . In this case the integral takes the value π |M| (see SM-4.1). The least complex models, instead, are those where operators constrain themselves as much as possible. This correspond to models where all operators depend on the same subset of spins.

SM-5. The Complexity of Complete Models
In this section we derive an analytic expression for the complexity of complete models exploiting the invariance under reparametrization of Jeffreys prior [10] distribution over the parameters.

SM-5.1. Properties of the Complete Model
The complete model M involves all the 2 n − 1 operators of Ω n \{1}. This model presents the peculiarity that the number of parameters |M| equals the number of independent parameters that are needed to specify a generic distribution p(s) on the spin configurations. In order to make this more explicit let us label by an integer i ∈ {0, 1, ..., 2 n − 1} all configurations s i ∈ S and let p i = p(s i ). We can take p 0 = p(s 0 ) to be constrained by normalisation: ∑ 2 n −1 i=0 p i = 1, which leaves 2 n − 1 = |M| free parameters p i , i ∈ {1, ..., 2 n − 1}. Re-writing (S.7) with this notation, taking the log, multiplying by φ ν (s i ) and summing over i with the relation in (S.3), leads to We call this model complete in the sense that (S.44) and (S.45) define a bijection between the sets of parameters g = {g µ } µ∈{1,...,2 n −1} and the sets of 2 n − 1 probabilities p = {p i } i∈{1,...,2 n −1} , provided that one includes the values ±∞ as legitimate values for the individual parameters g µ . This shows that this basis of models is complete in the sense that any probability distribution can be represented within this class of models.

SM-5.2. Complexity of the Complete Model
The bijection between parameters {g µ } and probabilities {p i } indicates that p constitutes a suitable parametrization for the complete model. The Fisher Information matrix in p reads: Using that the volume element is invariant under re-parametrisation [22], we can express the complexity in the new set of parameters p: c M = log dp det J(p) . (S.48) In the p-parameters, the determinant of J(p) can be more easily worked out by rewriting the FIM as: where D is a diagonal matrix with entries D ii = 1/p i , 1 is the identity matrix, and, v and w are two vectors with elements v i = p i and w j = 1 p 0 . (S.50) Finally by using the properties of the determinant, det(D 1 + vw t ) = det D det(1 + vw t ) and So the complexity of the complete model is

SM-6. Complexity: Numerical Estimates
In this Section we deal with the computation of the complexity penalty c M , defined in Equation (5) of the main text. We start by considering the generic model with an arbitrary loop structure and we derive the expression of the complexity integral for a model with a single loop. Finally we assess numerically the complexity for all the models on systems of n ≤ 4 spins. Here we will focus on models with non-degenerated parameters, leaving the discussion on degeneracy to SM-7.

SM-6.1. Generic Model
In this section we will derive an expression for the complexity integral of a generic model suitable for numerical integration. Since the contribution to the complexity of independent operators has been derived in SM-4, here we will focus on models in which there are no independent operators, i.e., every operator participates in at least into one loop.
The elements of the FIM in Equation (5) of the main text, obtained by taking the derivatives of the logarithm of the partition function Equation (7) of the main text, are for a generic model M: where we have defined and " |µ" (" |ν, µ") refers to the loops in L M in which g µ (g µ and g ν ) enters. In light of (S.53) the FIM can be expressed as where A(g) is a diagonal matrix with A µµ = J µµ − W µµ and W(g) is defined as: where we have defined the matrix B(g) = A −1 (g)W(g). Replacing the determinant of the FIM ( Equation (S.59)) in the complexity integral (Equation (5) in the main text) and performing the change of variables g µ → γ µ , defined in (S.54), yields Now (S.60) is prone for standard Monte Carlo integration by random sampling γ according to the pdf q(γ) on its bounded support. As one could easily check, q(γ) is the measure induced by a set of |M| independent operators on the hypercube [−1, 1] |M| (see SM-4).

SM-6.2. Models with A Single Loop
We consider models with all their operators involved in a single loop. The contribution to the complexity c M of any supplementary independent operator (not involved in the loop) was studied in SM-4 (contribution of log π for each supplementary operator), and will not be considered here. Single loop models M are such that L = {{∅}, }, where = {φ 1 , . . . , φ |M| } = M. All such models with fixed number of operators belong to the same class of complexity. The cardinality of this class can be assessed thinking at the single loop model with |M| operators as a set of |M| − 1 independent operators plus an operator being the product of the independent ones. Following this reasoning the number of single loop models on a n spins system is: where the previous formula was derived in full analogy with the number of models with only independent operators (see SM-4).

Complexity of Single Loop Models
Expression (S.60) for the complexity integral can be notably simplified in case of single loop models. The single loop of length |M| reduces (S.54) to χ µ,ν = χ µ = χ and χ = ∏ µ∈ γ µ . Enforcing these relations into the determinant of the FIM (S.59) yields Entropy 2018, 20, 739; doi:10.3390/e20100739 S16 of S23 The matrix B(g) in (S.63) can be rewritten as the outer product of two vectors: The resulting expression for the complexity, obtained by replacing (S.65) and (S.63) in Equation (5) of the main text, is then suited for numerical integration using Monte Carlo methods as explained in section SM-6.1. Figure 3 of the main text displays the complexity of models with a fixed number |M| of operators and a single loop of length k, for different values of k. Such a model has |M| − k free operators (operators not involved in any loop). It can be, for instance, a model with (|M| − 1) fields (considering that the number of spins n ≥ |M| − 1) and one (k − 1)-body interaction, or a model formed with a closed chain of k pairwise interactions and |M| − k free fields. The complexity of such a model is c M (k) = c (k) + (|M| − k) log π, where (|M| − k) log π is the contribution of the free operators and c (k) is the one of the single loop of length k. For k = 3, this latter complexity can be obtained analytically from (S.66), (1 + xyz) 2 dxdydz = log(π 2 ) , (S.67) or directly by setting n = 2 in (S.52) (as the complete model for n = 2 spins is a single loop of length k = 3). We thus obtain that c M (3) = (|M| − 1) log π. For larger values of k, the complexity c (k) is obtained numerically by integrating (S.66) with a Monte Carlo method. Figure 3 of the main text shows that the complexity c M (k) of such models increases with the length k of the loop, and saturates for large k at c M (k) → c M (3) + log π = |M| log π, which corresponds to the complexity of a model with |M| independent operators. This can be re-written in terms of the complexity of the single loop: the complexity of the single loop increases with the length k of the loop, starting from c (3) = 2 log π to finally grow for large k as c (k) → k log π, as if the k operators of the loop were independent.
SM-6.3. All Models for n ≤ 4 In Table S1 and Table S2, we classified all the models for, respectively, n = 3 and n = 4. In the 4-spin system, there are 2 15 = 32768 distinct non-degenerate models. We counted 20160 possible gauge transformations, which is in agreement with Equation (S.25). By applying all gauge transformations to all models, we find that models can be classified in 46 complexity classes (see Table S2). The number of different values of complexity c M to be estimated numerically with Equation (S.66) is thus drastically reduced, from 32768 to only 46. In the 3-spin system, there are 2 7 = 128 non-degenerate models spread over 10 complexity classes (see Table S1). Note that the 3-spin system is a sub-case of a 4-spin system (see all classes with n M ≤ 3 in Table S2): every complexity class of the 3-spin system is also a class of the 4-spin system, with all characteristics preserved, except for the number of elements in the class (which takes into account the additional spin).
The comprehensive study of classes for n = 3 and n = 4 allows comparing with the results of the previous sections. First, the relation between |M|, λ and n M in (S.28) is always verified for each class of the two tables. We can also observe, in both tables, the "symmetry" in the cardinality between classes of models with |M| operators and their respective complementary classes of models with 2 n − 1 − |M| operators (see SM-3.3). Equation (S.33) for the cardinality of classes with only independent operators (such that |M| = n M ) is also verified here. Finally, let us remark, in Table S1, that two different classes, with different structures (even different number of operators), may have the same value of c M . For instance, the model M = {s 1 , s 2 , s 3 } (in the class of the 4th row) and the model M = {s 1 , s 2 , s 3 , s 2 s 3 } (class in the 7th row) have both c M = 3 log π even though their structure are clearly different.
Remarkably, the complexity of models is not monotonic in the number |M| of operators, as it can be verified in Table S1 for n = 3 and in Figure 4 in the main text for n = 4. Observe for instance in Table S1 that the complete model (with |M| = 7) is much less complex then any model with |M| = 6 operators. At equal number of operators |M|, the maximum of the complexity is achieved by models with only independent operators (see SM-4.2 and Figure 4 ) in the main text; on the other hand, sub-complete models, i.e. models that contain a complete model (see SM-5) on a subset of spins, are the simplest. We also notice that complexity decreases when turning an independent interaction into an operator that enters a loop (compare for instance the two complexity classes with |M| = 3 operators in Table S1). In summary, we found that, adding a new operator to a model: if the new operator enter in several loops, the complexity may increase (from always less than log π) or decrease; although it is no trivial to predict what will happen, we observe that, at fix number of operators |M|, the closest the model is to a sub-complete model, the less complex it is. Table S1. Summary table of all non-degenerate models of a 3-spin system; models are partitioned in 10 classes. Each line gives the characteristics of one class: the common structure of the models of the class (number of operators |M|, number of independent operators n M , λ = log 2 |L|, and lengths | | of each loop ∈ L), the number of models in the class, and finally, the corresponding value of the complexity c M . The notation a b means that the element a is repeated b times. The last row corresponds to the complete 3-spin model.  Table S2. Summary table of all non-degenerate models of a 4-spin system; models are partitioned in 46 classes. Each line corresponds to one or more classes with the same number of operators |M| and gives, for each class, the following characteristics: number of independent operators n M , cardinality of the minimal generating set of loops λ = log 2 |L|, and number of models in the class (column "cardinality").
The notation a b means that the element a is repeated b times. The complexity class taken as an example in Figure 1 and Figure 4 Left is highlighted in bold in this To conclude, our close analysis of the n = 4 spin case suggests that the simplest models are those where operators concentrate their support on a subset of spins (and their equivalent models), as opposite to "sparse" models with many independent parameters. More precisely, for fixed value of |M|, classes with lower value of n M (i.e., with less independent operators) are less complex (see colors in Figure 4 in the main text). They are the classes that contain at least one model whose interactions are grouped on a subset n M of the n spins. Finally, among these models, the least complex are the ones equivalent to the model that is as close as possible to a sub-complete model (exactly a sub-complete model for |M| = 2 n M − 1, see |M| = 1, 3, 7 and 15 in Figure 4 in the main text).

SM-7. Degenerate Models
In degenerate models at least two operators, say φ µ and φ ν , have the same parameter, i.e., g µ = g ν (see SM-0.3). Since the mapping between parameters and operators is no longer bijective, for a degenerate model M, together with the |M| operators φ and m parameters g, one requires to specify the matrix U (S.10), that maps operators into parameters [23]. The degeneracy coefficient α i = ∑ j U ij defines the number of operators parametrized by g i , implying that, if the number of parameters is m , while the number of operators is |M|, then ∑ m i=1 α i = |M|. The partition function (see SM-2) of a generic degenerate spin model M, parametrized by g is This expression extends Equation (7) of the main text to the case of degenerated parameters. Here i ∈ means that there is at least one operator φ µ , parametrized by g i , such that φ µ enters the loop (µ ∈ following the notation of SM-1.2). Finally β i ( ) [24] denotes the degeneracy coefficient of g i in loop (how many operators parametrized by g i enter the loop ).

SM-7.1. Independent Operators and Parameters
The partition function of a model with n M = |M| independent operators and m parameters with degeneracy coefficients (α 1 , . . . , α m ) contains only the first term of (S.68), Z M (g) = 2 n ∏ m i=1 cosh(g i ) α i . As a consequence its complexity is: such that any parameter g i contributes with a term log( √ α i π) to the complexity c M . Consider now a generic degenerate model M with |M| operators, including K independent φ 1 , . . . φ K . We can then introduce a model M formed by the set of non-independent operators of M (and relative parameters), and a model M ind = {φ 1 , . . . φ K } (and relative parameters). Suppose now that the set of parameters of the two models M and M ind is disjoint, meaning that there is no parameter parametrizing both operators in M and M ind . It follows that M = M ∪ M ind , such that, analogously to the non-degenerated case (see SM-4.1), the partition function (S.68) can be factorized in the two models, and so does the complexity: where m ind is the number of parameters in M ind (that possibly differs from the number of operators in M ind , K).
Notice that if the sets of operators of M and M ind are disjoint while the sets of parameters aren't, the complexity doesn't factorize in the two models. By comparing this result with the non degenerate case (see SM-4) degenerating parameters reduces the number of independent operators of a model, decreasing the complexity of the model itself. For models with only independent parameters this statement can be easily checked, as the complexity of a model with |M| independent operators and |M| parameters is larger than the complexity of a model |M| independent operators and m ≤ |M| The fact that degeneracy of parameters reduces the complexity of a model holds also in loopy models, as we show in section SM-7.3.

Complexity Classes
In case of degenerated parameters the complexity of models with n M = |M| independent operators depends only on the number of parameters and the degeneracy coefficients α i , as shown in SM-7.1. Specifically each class of complexity is identified by the number of parameters m ≤ |M|, the set of degeneracy coefficients and their multiplicities (α j 1 , r j 1 ), ..., (α j K , r j K ) , with α j 1 < ... < α j K and K total number of distinct values that the degeneracy coefficients α i take, such that ∑ K i=1 α j i r j i = |M| and ∑ K i=1 r j i = m. Then the cardinality of such a class of complexity is the number of possible ways of choosing n M independent operators N ind (S.33) times the number of partitions of a set of n M elements in exactly m subsets of which r j 1 of cardinality α j 1 , ..., and r j K of cardinality α j K .

SM-7.2. Generic Model
The derivation of the complexity for the generic model in case of non-degenerated parameters (see SM-6.1) can be straightforwardly extended to the case of degenerate models. In particular the FIM (hessian of logarithm of the partion function (S.68)) is and |i and |{i, j} refer respectively to the loops in L M in which g i enters and in which both g i and g j enter. The derivation then follows the non degenerated case, by decomposing the FIM (S.72) and factorizing out a diagonal matrix ( see section SM-6.1).
Finally the complexity of the generic degenerate model reads: where the matrix B is defined as Now (S.74) is prone for standard Monte Carlo integration by random sampling γ according to the pdf q(γ).

SM-7.3. Models with A Single Loop
We now focus on the complexity of models that contain only one loop involving all |M| operators, L = {{∅}, }, where = {φ 1 , . . . , φ |M| }, parametrized by m ≤ |M| parameters. All such models with fixed number of operators, parameters and degeneracy coefficient belong to the same class of complexity. Specifically the class of complexity is identified by the number of operators |M|, the number of parameters m and the set of degeneracy coefficients and their multiplicities (α j 1 , r j 1 ), ..., (α j K , r j K ) , with α j 1 < ... < α j K ( K is the total number of distinct values that the degeneracy coefficients α i take), such that ∑ K i=1 α j i r j i = |M| and ∑ K i=1 r j i = m. The cardinality of such a class of complexity is N deg 1 loop = N 1 loop |M|! (α j 1 !) r j 1 · ... · (α j K !) r j K 1 r j 1 ! · ... · r j K ! (S.77) given by the number of possible ways of choosing |M| operators constituting a single loop N 1 loop (S.62), times the number of partitions of a set of |M| elements in exactly m subsets of which r j 1 of cardinality α j 1 , ..., and r j K of cardinality α j K . The complexity of this class of models can be derived from (S.74) by enforcing the single loop constraints on (S.73), namely α i = β i and χ i = α i χ, χ i,j = α i α j χ, while χ = ∏ m i=1 γ α i i : where q(γ) is defined in (S.76).
The expression (S.78) for the complexity was obtained by replacing the relation (that only holds for single loop models) det(1 + B) = 1 + χ in the complexity of the generic degenerate model (S.74).  Figure S2. Degeneracy and complexity. Ratio (data points + solid line to guide the eye) between the complexity of two single loop models of length |M| (number of operators) parametrised respectively by 2 parameters (one of them |M| − 1 times degenerated) and |M| parameters, versus loop length |M|. Complexities here are averages of 10 3 numerical estimates of the integrals (S.66) and (S.78) using 10 6 MC samples each and error bars result from error propagation of one standard deviation of these estimates. The larger the loop the more independent are the operators, as shown in Figure 3 of the main text, such that the ratio between the complexities of single loop degenerated and non degenerated models is approaching the ratio (dashed line) between the complexities of independent operators degenerated (log |M| − 1π 2 ) and non-degenerated (log π |M| ) models (see SM-4.1 and SM-7.1).
The simple case of single loop models constitutes a suitable platform to gain some insights on how the degeneracy affects the complexity of models with loops. The degeneracy coefficients of parameters enter the expression of the complexity (S.78) in a non trivial way, such that numerical exploration is required. In Figure S2 we compare the complexity of a single loop model of length |M| (number of operators) parametrised by 2 parameters and the corresponding non degenerated single loop model. By increasing the length of the loop the degeneracy increases while the complexity decreases (relatively to the non degenerated model). The fact that-analogously to the independent operators model (see SM-7.1)-degeneracy in the single loop model reduces the complexity can be intuitively understood through our general argument in SM-4. By degenerating the parameters g µ one indirectly constrains