This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
In this work we provide a statistical empirical analysis of decision methods for classical propositional logic, known as SAT solvers. This work is conceived as an empirical counterpart of a theoretical movement, called
In an article entitled
The term
There is broad support for the idea that, due to undecidability, deductive inference does have the power to extend our knowledge, and thus is generally perceived as highly valuable in epistemic terms [
Since that statement of the triviality of propositional logic, a full theory of computational complexity has been developed, in which the complexity of deciding classical propositional inference plays a central role [
The fact that this is still an open question does not imply that propositional logic
The latter proposal contains a theoretical description of fragments of a family of subclassical entailment relations, called
Furthermore, it provides a decision algorithm according to which each of the entailment relations ⊢_{k} was shown to be decidable in time
Other approximations of classical entailment have been proposed in the literature [
(i) it is weaker than classical entailment;
(ii) it is based on informational notions; and
(iii) it treats as “uninformative” exactly those inferences that are “analytical” in the strict informational sense, that is, which do not appeal to virtual information.
This “appeal to virtual information” is what, in terms of Natural Deduction inferences, is known as introducing hypotheses that will later be discharged in the proof. In that respect, the first element of the intelim approximation, ⊢_{0}, is the only analytical deduction relation that employs no external, virtual hypotheses. All other entailment relations do employ some form of hypothesis generation and are thus considered non-trivial.
This, of course, is just a theoretical view of the complexity of Boolean inference. In this work, we take a more practical view, exploring the complexity in terms of the
In fact, we propose to study the predictability of classical propositional logic, which we take to be the possibility of having a clear distribution of some random variables associated with it. In our case, we study the distribution of the runtime of existing implementations of Boolean solvers.
We start by noting the existence of very efficient Boolean solvers, called SAT solvers, and a biannual competition that reveals the best and fastest solvers [
So we take the view that the informational content of a Boolean formula is higher the harder it is to predict how long it will take to reach a decision on the satisfiability of a formula. Note that this is different from simply taking the average time to reach a conclusion. The curve that shows, for a fixed number
Here, we go one step further in the empirical analysis of SAT solvers and, instead of analysing average execution times, we analyse the probability distribution of this time. Each formula is associated with a set of parameters (namely, the ratio between the number of clauses and the number of variables), and the information content will be the same for all formulas sharing the same parameters. We consider as “less informative” those classes of formulas with a “well-behaved statistical distribution” of decision time, and as “more informative” those classes with an “undetermined or chaotic distribution” of the time to reach a decision. We then compare the easy and hard regions according to this new view with those of the complexity profile, and investigate what new insights can be derived from this analysis with respect to the process of SAT solving and the empirical complexity of SAT formulas.
The paper develops as follows. The behaviour of SAT solvers, their empirical complexity profile and the phase transition phenomenon are presented in
Very efficient SAT solvers appeared at the beginning of the century, with performance several times faster than previous implementations, such as zChaff [
Cheeseman
SAT Empirical Complexity Profile,
Two curves are simultaneously plotted. For each point, a number of randomly generated 3SAT formulas is submitted to the SAT solver. In all cases, the number of propositional symbols
What is surprising is that, independently of the algorithm, the machine, and the value of
We now study the statistical distribution of these values, checking whether they can be approximated by known statistical distributions and, when they can, obtaining the distribution parameters.
With the aim of studying the statistical behaviour of the SAT solvers output, we perform a series of experiments.
The running time of computer programs depends on the processor, the processes currently running, and various other factors. Therefore, instead of measuring the runtime of zChaff, we use as a parameter the number of operations performed, defined as the sum of the number of implications (or, in the terminology of the resolution method, unit propagation steps) and the number of decisions (branching points) that the solver performs to solve a particular problem.
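As a concrete illustration of what is being counted, the following minimal DPLL sketch tallies decisions and unit propagations in the same way. This is a toy illustration, not zChaff's actual (conflict-driven) implementation:

```python
def simplify(clauses, lit):
    """Assign literal `lit` true: drop satisfied clauses, shrink the rest.
    Returns None if an empty clause (a conflict) is produced."""
    out = []
    for c in clauses:
        if lit in c:
            continue  # clause satisfied, drop it
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None  # conflict
        out.append(reduced)
    return out

def dpll(clauses, stats):
    """Decide satisfiability, counting 'implications' (unit propagations)
    and 'decisions' (branching points), the two quantities summed above."""
    # Unit propagation: repeatedly satisfy unit clauses.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        stats["implications"] += 1
        clauses = simplify(clauses, units[0])
        if clauses is None:
            return False
    if not clauses:
        return True
    # Branch on the first literal of the first clause (a "decision").
    stats["decisions"] += 1
    lit = clauses[0][0]
    branch = simplify(clauses, lit)
    if branch is not None and dpll(branch, stats):
        return True
    branch = simplify(clauses, -lit)
    return branch is not None and dpll(branch, stats)

# (p ∨ q) ∧ (¬p ∨ q) ∧ (¬q ∨ r), encoded with signed integers:
stats = {"decisions": 0, "implications": 0}
sat = dpll([[1, 2], [-1, 2], [-2, 3]], stats)
# sat is True; stats is {'decisions': 1, 'implications': 2}
```

The "number of operations" of the text would then be `stats["decisions"] + stats["implications"]`.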
A total of 910,000 problems were generated and used as input to zChaff; initially, only 360,000 problems were generated, with the following characteristics:
All clauses were generated with exactly three literals (
The number of atoms (
For each value of
So, we had:
After running all these problems, we collected the output data of each case and plotted graphs showing the number of examples versus number of operations.
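The instance-generation step described above can be sketched as follows. This is a hypothetical generator for illustration; the exact sampling procedure used in the experiments may differ:

```python
import random

def random_3sat(n_vars, n_clauses, rng=random):
    """Random 3SAT instance: each clause samples three distinct atoms
    uniformly and negates each with probability 1/2.
    Literals are signed integers; negative means negated."""
    formula = []
    for _ in range(n_clauses):
        atoms = rng.sample(range(1, n_vars + 1), 3)
        formula.append([a if rng.random() < 0.5 else -a for a in atoms])
    return formula

# Example: instances near the phase-transition point, at a
# clause/variable ratio of 4.3 with 150 atoms (645 = 4.3 * 150):
instance = random_3sat(150, 645)
```

Feeding batches of such instances to the solver and histogramming the resulting operation counts yields the empirical distributions studied below.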
Thus, we can see that most graphs resembled a probability distribution.
To check which distribution represented each data set, we performed various Goodness-of-Fit (GoF) tests [
The Anderson–Darling tests were performed using the tool Minitab [
This resulted in a total of 550,000 instances = 50,000 (problems) × 11 (cases).
The purpose of this project was to verify the existence of a default behaviour of the SAT solver in accordance with the input parameters of the formulas. The full set of experiments is presented in [
Initially we tested 360,000 samples. For 110,000 of these instances, the test of compliance of the runtime with a statistical distribution was fully inconclusive. It was not clear whether that undefined behaviour was due to zChaff requiring a larger number of samples, or whether zChaff did behave according to a standard distribution for some set of values of
Thus it was possible to identify the behaviour of the SAT solver as the 3SAT formula input parameters varied.
For most cases, the distribution of the SAT solver runtime follows the pattern shown in the following figures, independent of the value of
The Anderson–Darling test measures how well the data follow a particular distribution. For a given data set and distribution, the lower the test statistic, the better the distribution fits the data.
The test parameters of Anderson–Darling are presented as follows:
The statistical distributions known to the tool used are the normal, lognormal, exponential, Weibull, extreme value type I, and logistic distributions.
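To make the test concrete, the following sketch computes the Anderson–Darling statistic against a fitted normal distribution using only the Python standard library (for a lognormal hypothesis, the same test can be applied to the logarithms of the data). This is an illustration of the statistic, not the Minitab implementation:

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling_normal(data):
    """Anderson-Darling A^2 statistic against a normal distribution with
    parameters estimated from the data; lower values mean a better fit."""
    xs = sorted(data)
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    total = 0.0
    for i, x in enumerate(xs, start=1):
        f_i = normal_cdf(x, mu, sigma)
        f_rev = normal_cdf(xs[n - i], mu, sigma)
        total += (2 * i - 1) * (math.log(f_i) + math.log(1.0 - f_rev))
    return -n - total / n

random.seed(0)
# A roughly normal sample (sum of 12 uniforms, Irwin-Hall construction)
# versus a strongly skewed (exponential) sample:
normal_ish = [sum(random.random() for _ in range(12)) - 6.0
              for _ in range(200)]
skewed = [random.expovariate(1.0) for _ in range(200)]
a_normal = anderson_darling_normal(normal_ish)
a_skewed = anderson_darling_normal(skewed)
# a_normal is small; a_skewed is much larger, rejecting normality.
```

The same statistic is available off the shelf as `scipy.stats.anderson`, which also reports critical values for several of the distributions listed above.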
Through graphical analysis of all distributions, we found some patterns of behaviour related to the ratio
Distribution at the output of zChaff running problems with
Distribution at the output of zChaff running problems with
Distribution at the output of zChaff running problems with
Distribution for
In
Anderson–Darling tests of compliance with distributions. (
At the phase transition point (
By inspecting the empirical data distributions grouped in
Unstable behaviour at the phase transition point (
This non-uniform behaviour is not seen slightly before the phase-transition point, when
Unstable behaviour near the phase transition (
For a fixed value of
Unstable behaviour near the phase-transition point (
A visual inspection of
The observed behaviour with respect to the runtime distribution is summarized in
Runtime distribution results.
Clauses/variables ratio   Known distribution?   Distribution
≤ 2                       yes                   binomial/normal
∈ [3, 4]                  yes                   gamma
∈ [4.3, 5]                no                    –
≥ 6                       yes                   lognormal
We can interpret the results in
On the other hand, when
The intermediate values of
At the phase transition point and a little after it, we have formulas that are not independently satisfiable; the sets of jointly unsatisfiable formulas overlap in subtle, unpredictable and near-chaotic ways, generating non-uniform distributions with
In this paper we have provided empirical evidence that a significant subset of propositional formulas is far from trivial, and that any implication that classical propositional logic is trivial is rightfully labelled as scandalous.
If we understand the predictability of classical logic as the possibility of having a clear distribution of some random variables associated with it, then this work has demonstrated that the runtime of existing SAT solvers is a random variable whose distribution is anything but trivial.
Previous work has concentrated on pointing out the existence of an easy-hard-easy separation in the empirical complexity of Boolean formulas, mostly based on the average runtime of a SAT solver. Our work went one step further, identifying varying runtime distributions along the same classes of easy-hard-easy problems. The two extremities of easy problems were separated, with the left extremity having a discrete binomial distribution and the right extremity a lognormal distribution, possibly the multiplicative product of those distributions. In the middle, a landscape of distributions was detected, going from left to right: normal, gamma, non-uniform/chaotic/inconclusive, and lognormal. The non-uniform distribution coincides with the vicinity of the phase-transition point.
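The conjectured multiplicative origin of the right-extremity lognormal can be illustrated with a toy simulation (hypothetical factors, not the paper's data): a product of many independent positive factors has an approximately normal logarithm, by the central limit theorem applied to the sum of logs.

```python
import math
import random

random.seed(1)

def product_sample(n_factors=50, rng=random):
    """Product of many independent positive factors; its logarithm is a
    sum, so the product is approximately lognormally distributed."""
    p = 1.0
    for _ in range(n_factors):
        p *= rng.uniform(0.5, 1.5)  # hypothetical positive factor
    return p

samples = [product_sample() for _ in range(2000)]
logs = [math.log(s) for s in samples]

# The log-transformed sample should look roughly symmetric (skew near 0),
# which is exactly the lognormal signature.
mean = sum(logs) / len(logs)
var = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
skew = (sum((x - mean) ** 3 for x in logs) / len(logs)) / var ** 1.5
```

Decomposing which solver quantities play the role of those factors is part of the further work proposed below.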
Further work should contemplate a deeper analysis of these results. First, one should try to decompose the processes that seem to be occurring at the area with nonuniform distribution. We could try to identify the existence of clusters and their distributions, for example. Also, the multiplicative nature of the mostly unsatisfiable cases should be related with that of the satisfiable cases, and we should try to analyse the factors in that multiplication.
We are aware that the kind of empirical analysis presented here is only the beginning of the types of analysis that can be done, and we hope that the empirical and theoretical analyses of NP-complete problems may eventually cross-fertilise.
This work was supported by Fapesp Thematic Project 2008/039955 (LOGPROB). Marcelo Finger was partially supported by CNPq grant PQ 302553/20100.
These very large problems arise from industrial applications or may be randomly generated.