The Impact of Entropy and Solution Density on Selected SAT Heuristics

We present a new characterization of propositional formulas called entropy, which approximates the freedom we have in assigning the variables. Like several other such measures (e.g., back-door and back-door-key variables), it is expensive to compute. Nevertheless, for small and medium-size satisfiable formulas, it enables us to study the effect of this freedom on the impact of various SAT heuristics, following up on a recent study by C. Oh (Oh, SAT’15, LNCS 9340, 307–323). Oh found that the expected success of various heuristics depends on whether the input formula is satisfiable or not. With entropy, and also with the measure of solution density, we are able to refine these findings for the case of satisfiable formulas. Specifically, we found empirically that satisfiable formulas with small entropy “behave” similarly to unsatisfiable formulas.


Introduction
In a recent article [4], Oh examined the impact of various key heuristics in competitive SAT solvers. His key finding is that the average success of those heuristics depends on whether the input formula is satisfiable or not. In particular, the effect of the deletion strategy, restart policy, decay factor, and database reduction differs, on average, between satisfiable and unsatisfiable formulas. This observation can be used for designing solvers that specialize in one of them, and for designing a hybrid solver that alternates between SAT and UNSAT 'modes'. Indeed, certain variants of COMiniSatPS [4] work this way.
We do not see an a priori reason to believe that the SAT/UNSAT divide (corresponding to the distinction between zero or more solutions) explains best the differences in the effect of the various heuristics. In this work we investigate his findings further, and show empirically that there are more refined measures (properties) than the satisfiability of the formula that better predict the effectiveness of these heuristics. In particular, we checked how this effectiveness correlates with two measures of satisfiable formulas: the entropy of the formula (to be defined below), which approximates the freedom we have in assigning the variables, and the solution density (henceforth density), which is the number of solutions divided by the size of the search space. Our experiments show that both are strongly correlated with the effectiveness of the heuristics, but the entropy measure seems to be a better predictor. Generally, our findings confirm Oh's observations regarding which heuristic works better with satisfiable formulas. But we also found that satisfiable formulas with small entropy 'behave' similarly to unsatisfiable formulas.

Entropy
Let ϕ be a propositional CNF formula, var(ϕ) its set of variables and lit(ϕ) its set of literals. In the following we will use v, v̄ to denote the literals corresponding to a variable v when the distinction between variables and literals is clear from the context. If ϕ is satisfiable, we denote by r(l), for l ∈ lit(ϕ), the ratio of solutions to ϕ that satisfy l. Hence for all v ∈ var(ϕ), it holds that r(v) + r(v̄) = 1. We now define:
Definition 1 (variable entropy). For a satisfiable formula ϕ, the entropy of a variable v ∈ var(ϕ) is defined by
$$e(v) = -r(v)\log_2 r(v) - r(\bar{v})\log_2 r(\bar{v}) \qquad (1)$$
where 0 · log₂ 0 is taken as being equal to 0.
This definition is inspired by Shannon's definition of entropy in the context of information theory [6]. Figure 1 (left) depicts (1). Intuitively, entropy reflects how 'balanced' a variable is with respect to the solution space of the formula. In particular, e(v) = 0 when r(v) = 0 or r(v) = 1, which means that ϕ ⇒ v̄ or ϕ ⇒ v, respectively. In other words, e(v) = 0 implies that v is a backbone variable, since its value is implied by the formula. The other extreme is e(v) = 1; this happens when r(v) = r(v̄) = 0.5, which means that v and v̄ appear an equal number of times in the solution space.
Definition 2 (formula entropy). The entropy of a satisfiable formula ϕ, denoted e(ϕ), is the average entropy of its variables, i.e., e(ϕ) = (1/|var(ϕ)|) · Σ_{v ∈ var(ϕ)} e(v).
As an example, Fig. 1 (right) is a histogram of e(v) for a particular formula ϕ, where for 24 out of the 100 variables r(v) = 0.
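To make the two definitions concrete, here is a small worked example of our own (it is not taken from the benchmark set), assuming the two-variable formula ϕ = a ∨ b, which is satisfied by 3 of the 4 possible assignments:

```latex
\begin{align*}
\varphi &= a \lor b, \qquad \text{solutions: } \{a\,b,\; a\,\bar{b},\; \bar{a}\,b\} \\
r(a) &= \tfrac{2}{3}, \qquad r(\bar{a}) = \tfrac{1}{3} \\
e(a) &= -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3}
      = \log_2 3 - \tfrac{2}{3} \approx 0.918 \\
e(b) &= e(a) \quad \text{(by symmetry)} \\
e(\varphi) &= \tfrac{1}{2}\bigl(e(a) + e(b)\bigr) \approx 0.918
\end{align*}
```

Neither variable is a backbone variable here, so both entropies are well above 0; the solution density of this toy formula is 3/4.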
Entropy is hard to compute: Let #SAT(ϕ) denote the number of solutions a formula ϕ has. Then it is easy to see that
$$r(v) = \frac{\#SAT(\varphi \land v)}{\#SAT(\varphi)}$$
Hence computing e(v) amounts to two calls to a model counter. But since the denominator #SAT(ϕ) is fixed for ϕ, computing e(ϕ) amounts to |var(ϕ)| + 1 calls to a model counter. Since model counting is a #P-complete problem, we can only compute this value for small formulas.
The benchmark set: Using the model counter Cachet [5], we computed the precise entropy of 5000 random 3-SAT formulas with 100 variables and 400 clauses. These are formulas taken from SAT-lib, in which the number of backbone variables is known. Specifically, there is an equal number of formulas in this set with 10, 30, 50, 70 and 90 backbone variables (i.e., 1000 formulas for each number of backbone variables), which gave us a near-uniform distribution of entropy among the formulas.
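The counting argument above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a brute-force enumerator stands in for a real model counter such as Cachet, clauses are lists of DIMACS-style integer literals, and the function names (`count_models`, `formula_entropy`, `density`) are hypothetical.

```python
from itertools import product
from math import log2


def count_models(clauses, n_vars, assumption=None):
    """Brute-force #SAT over variables 1..n_vars (exponential; toy sizes only).

    `assumption` is an optional literal that must be true, i.e. we count
    the models of (phi AND assumption).
    """
    count = 0
    for bits in product([False, True], repeat=n_vars):
        def holds(lit):
            return bits[abs(lit) - 1] == (lit > 0)

        if assumption is not None and not holds(assumption):
            continue
        if all(any(holds(lit) for lit in clause) for clause in clauses):
            count += 1
    return count


def binary_entropy(r):
    """e(v) as a function of r(v), with 0 * log2(0) taken as 0."""
    if r in (0.0, 1.0):
        return 0.0
    return -r * log2(r) - (1 - r) * log2(1 - r)


def formula_entropy(clauses, n_vars):
    """Average variable entropy, using |var| + 1 model-counter calls."""
    total = count_models(clauses, n_vars)  # one call for the denominator
    if total == 0:
        raise ValueError("entropy is only defined for satisfiable formulas")
    ratios = [count_models(clauses, n_vars, assumption=v) / total
              for v in range(1, n_vars + 1)]  # one call per variable
    return sum(binary_entropy(r) for r in ratios) / n_vars


def density(clauses, n_vars):
    """Number of solutions divided by the size of the search space."""
    return count_models(clauses, n_vars) / 2 ** n_vars
```

For instance, `formula_entropy([[1, 2], [-1, 3]], 3)` evaluates the formula (x1 ∨ x2) ∧ (¬x1 ∨ x3), which has 4 solutions out of 8 assignments.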

A preliminary: standardized linear regression
We assume the reader is somewhat familiar with linear regression. It is a standard technique for building a linear model ŷ = β0 + β1 x, where ŷ in our case is a predictor of the number of conflicts, and x is either the entropy or the density of the formula. We will focus on two results of linear regression: the value of β1 and the p-value. The latter is computed with respect to a null hypothesis, denoted H0, that β1 = 0, and an alternative hypothesis H1. H1 can be either the complement of H0 (β1 ≠ 0) or a 'one-sided hypothesis', e.g., H1: β1 > 0. In the former case, p = 2·Pr(Z ≤ −|z| | H0), where Z ∼ N(0, 1) and z = (β1 − 0)/std(β1). The '0' in the numerator comes from the specific value in H0. In other words, assuming H0 is correct, the p-value is the probability that a random value from a standard normal distribution N(0, 1) is at least as extreme as z, the standardized value of β1. In the latter case, p = Pr(Z ≥ z | H0) for H1: β1 > 0 (and, symmetrically, p = Pr(Z ≤ z | H0) for H1: β1 < 0).
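As a rough illustration of the quantities just described, the following Python sketch fits ŷ = β0 + β1 x by ordinary least squares and derives p-values from the standardized slope z under a normal approximation. The function names are our own, the two-sided value is computed as 2·Pr(Z ≤ −|z|), and this is not the statistical software used for the experiments.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Pr(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def fit_line(xs, ys):
    """Ordinary least squares; returns (beta0, beta1, std_err_of_beta1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    std_err = sqrt(rss / (n - 2) / sxx)
    return beta0, beta1, std_err


def p_values(beta1, std_err):
    """Returns (z, two-sided p, p for H1: beta1 > 0, p for H1: beta1 < 0)."""
    z = (beta1 - 0.0) / std_err  # the 0 is the value of beta1 under H0
    two_sided = 2.0 * normal_cdf(-abs(z))
    one_sided_positive = 1.0 - normal_cdf(z)
    one_sided_negative = normal_cdf(z)
    return z, two_sided, one_sided_positive, one_sided_negative
```

With only a handful of points a t-distribution would be the more careful choice; the normal approximation is reasonable for samples as large as the ones used here.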
We list below several important points about the analysis that we applied.
- Standardization of the data: given data points X = x_1, ..., x_n, their standardization X′ = x′_1, ..., x′_n is defined for 1 ≤ i ≤ n by
$$x'_i = \frac{x_i - \bar{x}}{\sigma}$$
where x̄ is the average value of X and σ is its standard deviation. Now X′ has no units, and hence two standardized sets of data are comparable even if they originated from different types of measures (in our case, entropy and density). All the data in our experiments was standardized.
- Bootstrapping: Bootstrapping, parameterized by a value k, is a well-known technique for improving the precision of various statistics, such as the confidence interval. Technically, bootstrap is applied as follows: given the original n samples, uniformly sample them n times with replacement (i.e., without taking the sampled points out, which implies that the same point can be selected more than once); repeat this process k times. Hence we now have n · k data points. For our experiments we took k = 1000, which is a rather standard value when using this technique. Hence, we have 5 · 10^6 data points.
• The ∆ test: A linear regression test over the series of gaps (e_1, c_1[1] − c_1[2]), ..., (e_n, c_n[1] − c_n[2]), and over the corresponding density series (d_1, c_1[1] − c_1[2]), ..., (d_n, c_n[1] − c_n[2]), where e_j and d_j are the entropy and density of the j-th formula and c_j[i] is its number of conflicts with heuristic i ∈ {1, 2}. Intuitively, the two models (the ∆ test and the ∆β1 test) tell us slightly different things: the first tells us whether the gap between the two heuristics is correlated with the measure, and the second tells us whether there is a significant difference in the value of β1 (the slope of the linear model) between the two heuristics. As we will see in the results, the p-values obtained by these models can be very different.
- Plots: The plots are based on the original (non-standardized) data. To reduce the clutter (from 5000 points), we rounded all values to 2 decimal places and then aggregated them. Aggregation means that points (x, y_1), ..., (x, y_n) (i.e., n points with an equal x-value) are replaced with a single point (x, avg(y_1, ..., y_n)). However, the trend-lines in the various plots are depicted according to the original data, before rounding and aggregation. The statistical significance of these trend-lines appears in Appendix B.
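The data-preparation steps in the list above (standardization, bootstrap resampling, and the rounding and aggregation used for the plots) can be sketched as follows. This is a minimal illustration with hypothetical function names; in particular, using the population standard deviation for σ and returning the bootstrap as k separate resamples are our assumptions.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def standardize(xs):
    """Map each x_i to (x_i - mean) / sigma, giving unit-free values."""
    mu = mean(xs)
    sigma = pstdev(xs)  # assumption: population standard deviation
    return [(x - mu) / sigma for x in xs]


def bootstrap(points, k, rng):
    """k resamples, each drawing len(points) points with replacement."""
    n = len(points)
    return [[rng.choice(points) for _ in range(n)] for _ in range(k)]


def aggregate(points, ndigits=2):
    """Round values, then replace points sharing an x by their average y."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[round(x, ndigits)].append(round(y, ndigits))
    return sorted((x, mean(ys)) for x, ys in buckets.items())
```

For example, `bootstrap(points, k=1000, rng=random.Random(0))` on n = 5000 points yields 1000 resamples of 5000 points each, i.e., the n · k points mentioned above; passing an explicit `random.Random` keeps the resampling reproducible.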

Entropy and density predict hardness
We checked the correlation between hardness, as measured by the number of conflicts, and the two measures described above, namely entropy and density. We use the number of conflicts as a proxy for the run-time, because these are all easy formulas for SAT solvers, and hence the differences in run-time are rather meaningless. The two plots in Fig. 2 depict this data based on our experiments with the solver MiniSat-HACK-999ED. It is apparent that higher entropy and higher density imply a smaller number of conflicts. A detailed regression analysis appears in Appendix A, for seven solvers. We also checked the correlation between the two measures themselves: perhaps formulas with higher entropy also have a higher density (each variable v with high entropy, e.g., e(v) = 1, nearly doubles the number of solutions). It turns out that in our benchmarks these two measures are only weakly correlated: the confidence interval for β1 is [0.144, 0.156], with a p-value which is practically 0.
Fig. 2. Entropy (left) and density (right) as predictors of the number of conflicts (based on MiniSat-HACK-999ED). It is apparent that higher entropy and higher density imply a smaller number of conflicts.

Empirical findings
In this section we describe each of the experiments of Oh [4], and our own version of the experiment based on entropy and density, when applied to the benchmarks mentioned above. We omit the details of one experiment, in which Oh examined the effect of canceling database reduction, the reason being that this heuristic is only activated after 2000 conflicts, and most of our benchmarks are solved before that point. Raw data as well as charts and regression analysis of our full set of experiments can be found online in [1].
1. Deletion strategy: Different solvers use different criteria for selecting the learned clauses for deletion. It was shown in [4] that for SAT instances learned clauses with a low Literal Block Distance (LBD) [2] value can help, whereas others have no apparent effect. In one of the experiments, whose results are copied here at the top part of Fig. 3, Oh compared the criteria of 'core LBD-cut' 5 and clause size 12. In other words, either save (i.e., do not delete) clauses with an LBD of 5 or lower, or clauses with size 12 or lower. It shows that for UNSAT instances the former is better, whereas the opposite conclusion is reached for the SAT instances. The results of our own experiments are depicted at the bottom of the figure. They show that the latter is indeed slightly better with our benchmarks (all satisfiable, recall). But what is more important is that the difference becomes smaller with lower entropy, hence the decline of the trend-line (recall that the trend-lines are based on the raw data, whereas the diagram itself is computed after rounding and aggregation to improve visibility). Hence it is evident that formulas with small entropy 'behave' more similarly to unsat formulas. The ascending trend-line in the right figure shows, surprisingly, an opposite effect of density.
2. Deletion with different LBD-cut value: Related to the previous heuristic, in [4] it was found that deletion based on larger LBD-cut values, up to a point, improves the performance of the solver with unsat formulas, but not with SAT ones. Fig. 4 (top) is an excerpt from his results for various LBD-cut values. We repeated his experiment with LBD-cut 1 and LBD-cut 5. The plots show that lower values of entropy and (independently) lower values of density yield a bigger advantage to LBD-cut 5, which again demonstrates that satisfiable formulas with these values 'behave' similarly to unsat formulas.
Fig. 3. The effect of the deletion criterion. The results of [4] appear in the table at the top of the figure (the numbers indicate the solved instances). It shows that for SAT instances keeping everything with clause size 12 is better than keeping everything with LBD 5, whereas the result is opposite for the UNSAT instances. Our own experiments (bottom left) show that within SAT instances, the clause size criterion becomes better with higher entropy, but not with higher density. Note that the y-axis corresponds to the difference in the # of conflicts. On average on these instances the # of conflicts across both methods was ≈ 290.
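The two retention criteria compared in the deletion experiments can be sketched as below. This is a simplified illustration, not code from any of the solvers: it assumes the usual definition of LBD as the number of distinct decision levels among a clause's literals, and `level_of` (a map from variable to decision level) and the function names are hypothetical.

```python
def lbd(clause, level_of):
    """Literal Block Distance: number of distinct decision levels in a clause.

    `clause` is a list of DIMACS-style integer literals and `level_of`
    maps each variable to the decision level at which it was assigned.
    """
    return len({level_of[abs(lit)] for lit in clause})


def keep_by_lbd(clause, level_of, lbd_cut=5):
    """'Core LBD-cut' criterion: never delete clauses with LBD <= lbd_cut."""
    return lbd(clause, level_of) <= lbd_cut


def keep_by_size(clause, max_size=12):
    """Clause-size criterion: never delete clauses with at most max_size literals."""
    return len(clause) <= max_size
```

In a real solver the LBD is typically computed once, when the clause is learned, and stored with the clause; the sketch recomputes it only to keep the example self-contained.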

3. Restarts policy:
The Luby restart strategy [3] is based on a fixed sequence of time intervals, whereas the Glucose restarts are more rapid and dynamic. Glucose initiates a restart when the solver identifies that recently learned clauses have a higher LBD than average. According to the competitions' results, Glucose-style restarts are generally better on unsat instances. Oh confirmed the hypothesis that this is related to the restart strategy: indeed, his results show that for satisfiable instances Luby restarts are better.
Our own results can be seen in Fig. 5 and in Appendix B. The fact that the gap in the number of conflicts between Luby and Glucose-style restarts is negative implies that the former is generally better, which is consistent with Oh's results for satisfiable formulas. Observe that the trend-line slightly declines with entropy (β1 = −15), which implies that Glucose restarts are slightly better with low entropy. So again we observe that low-entropy formulas 'behave' more similarly to UNSAT formulas than those that have high entropy. The table in Appendix B shows that this result has a relatively high p-value. We speculate that with high-entropy instances, the solver hits more branches that can be extended to a solution, hence Glucose's rapid restarts can be detrimental. Density seems to have an opposite effect, although again only with low statistical confidence.
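To illustrate the difference between the two restart policies, here is a small Python sketch. The Luby part generates the standard sequence 1, 1, 2, 1, 1, 2, 4, ...; the Glucose-style part is a simplified trigger of our own, in which the window size and the scaling factor are illustrative parameters rather than values taken from the experiments.

```python
from collections import deque


def luby(i):
    """i-th term (1-indexed) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)


class GlucoseStyleRestart:
    """Restart when recent learned clauses have a higher LBD than average."""

    def __init__(self, window=50, factor=0.8):
        self.recent = deque(maxlen=window)  # LBDs of the latest clauses
        self.factor = factor
        self.total = 0
        self.count = 0

    def on_conflict(self, clause_lbd):
        """Record a learned clause's LBD; return True if a restart is due."""
        self.recent.append(clause_lbd)
        self.total += clause_lbd
        self.count += 1
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent data yet
        recent_avg = sum(self.recent) / len(self.recent)
        global_avg = self.total / self.count
        if recent_avg * self.factor > global_avg:
            self.recent.clear()
            return True
        return False
```

A Luby-based solver would run for `unit * luby(i)` conflicts before its i-th restart, for some fixed `unit`, regardless of what the search is doing, whereas the dynamic trigger reacts to the quality of the clauses being learned.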

4. The variable decay factor:
The well-known VSIDS branching heuristic is based on an activity score of literals, which decays over time, hence giving higher priority to literals that appear in recently-learned clauses. In the solver MiniSat HACK 999ED, there is a different decay factor for each of the two restart phases: this solver alternates between a Glucose-style (G) restart policy phase and a no-restart (NR) phase (these two phases correspond to good heuristics for UNSAT and SAT formulas, respectively). In [4] Oh compares different decay factors for each of these restart phases, on top of MiniSat HACK 999ED. His results show that for UNSAT instances slower decay gives better performance, while for SAT instances it is unclear. His results appear at the top of Fig. 6. We experimented with the two extreme decay factors in that table: 0.95 and 0.6. Note that since our benchmarks are relatively easy, the solver never reaches the NR phase. The plot at the bottom of the figure shows the gap in the number of conflicts between these two values. A higher value means that with strong decay (0.6) the results are worse. We can see that the results are worse with strong decay when the entropy is low, which demonstrates again that the effect of the variable decay factor is similar for unsat formulas and satisfiable formulas with low entropy. A similar phenomenon happens with small density.
Conclusions: We defined the entropy property of satisfiable formulas, and used it, together with solution density, to further investigate the results achieved by Oh in [4]. We showed that both are strongly correlated with the difficulty of solving the formula (as measured by the number of conflicts). Furthermore, we showed that they predict the effect of various SAT heuristics better than Oh's sat/unsat divide, and that satisfiable formulas with small entropy 'behave' similarly to unsatisfiable formulas. Since both measures are hard to compute we do not expect these results to be applied directly (e.g., in a portfolio), but perhaps future research will find ways to cheaply approximate them. For example, a high backbone count (variables with a value at decision level 0) may be correlated with low entropy, because such variables contribute 0 to the formula's entropy.
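The role of the decay factor can be illustrated with a minimal VSIDS-style sketch. It is a simplification under our own assumptions: scores are kept per variable rather than per literal, the class and method names are hypothetical, and the periodic rescaling that real solvers use to avoid floating-point overflow is omitted.

```python
class Vsids:
    """Toy VSIDS: bump variables of learned clauses, decay older activity."""

    def __init__(self, n_vars, decay):
        self.activity = [0.0] * (n_vars + 1)  # index 0 is unused
        self.increment = 1.0
        self.decay = decay  # e.g. 0.95 (slow decay) or 0.6 (strong decay)

    def on_conflict(self, learned_vars):
        """Bump the variables of the newly learned clause."""
        for v in learned_vars:
            self.activity[v] += self.increment
        # Growing the increment is equivalent to multiplying all older
        # scores by `decay`, but avoids touching every variable.
        self.increment /= self.decay

    def pick(self, unassigned):
        """Branch on the unassigned variable with the highest activity."""
        return max(unassigned, key=lambda v: self.activity[v])
```

With conflicts involving variable 1, then 1, then 2, a decay of 0.6 already prefers the most recent variable 2, while a decay of 0.95 still prefers variable 1, which was bumped twice. This is the sense in which a smaller factor means stronger decay.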

Fig. 1. (left) The entropy function (1), for a satisfiable formula with 11 solutions. (right) The distribution of e(v) for a formula with 100 variables.

Fig. 4. The results of [4] (top) show that unsat formulas are solved faster with high LBD-cut.Our results (bottom) show that low-entropy and low-density formulas behave more similarly to unsat formulas.

Fig. 5. The effect of the restart strategy, comparing Luby and Glucose-style restarts. The results of [4] (top) show that the Glucose strategy (rapid restarts) has an advantage in unsat formulas. Our results (bottom) show that the same phenomenon is apparent in formulas with low entropy. Indeed, observe that the number of conflicts with Glucose becomes smaller than it is with Luby (hence the negative gap), in satisfiable formulas with low entropy.

Fig. 6. The effect of variable decay: the results of [4] (top) generally show that unsat formulas are better solved with a high decay factor. The restart policy in his solver is hybrid: it alternates between a 'no-restart' (NR) phase and a 'Glucose' (G) phase. The 'NR' and 'G' columns hold the decay factor during these phases. The plots at the bottom show the gap in the number of conflicts between G = 0.6 and G = 0.95. It shows that with low entropy, strong decay (i.e., G = 0.6) is worse, similar to the effect that it has on unsat formulas. With low density (right) a similar effect is visible.
Two regression tests: The entropy and density data consists of pairs of the form ⟨entropy, conflicts[i]⟩ and ⟨density, conflicts[i]⟩, respectively, where i ∈ {1, 2} is the index of the heuristic. Hence the corresponding data is four series of points (e_1, c_1[i]), ..., (e_n, c_n[i]) and (d_1, c_1[i]), ..., (d_n, c_n[i]), where i ∈ {1, 2}. In order to compare the predictive power of entropy, density and Oh's criterion of SAT/UNSAT, we performed two statistical tests (recall that the data is standardized, and hence comparable):
• The ∆β1 test: A linear regression test over the series (e_1, c_1[1]), ..., (e_n, c_n[1]) and (e_1, c_1[2]), ..., (e_n, c_n[2]), and similarly for density (i.e., four tests all together). We then checked the significance of β1 for each of these 4 tests (in all such tests the significance was clear). In addition, we checked the hypothesis H0: β1[1] − β1[2] = 0 for each of the measures. The result of this last test is what we list in the results table in Appendix B.