The Phase Transition Analysis for the Random Regular Exact 2-(d, k)-SAT Problem

In a regular (d, k)-CNF formula, each clause has length k and each variable appears d times. Such a regular structure is symmetric, and the satisfiability problem for formulas with this structure is called the (d, k)-SAT problem for short. The regular exact 2-(d, k)-SAT problem asks whether a (d, k)-CNF formula F has a truth assignment T under which exactly two literals of each clause of F are true. If F contains only positive or only negative literals, then there is a satisfying assignment T of size 2n/k under which F is 2-exactly satisfiable. This paper introduces a generation model for (d, k)-SAT instances, constructs the solution space, and uses the first and second moment methods to derive the phase transition point $d^*$ of the 2-(d, k)-SAT problem with only positive literals. When $d < d^*$, a random 2-(d, k)-SAT instance is satisfiable with high probability; when $d > d^*$, it is unsatisfiable with high probability. Finally, experiments show that the theoretical results are consistent with the experimental results.


Introduction
The satisfiability (SAT) problem for Boolean formulas asks whether a given Conjunctive Normal Form (CNF) formula has a satisfying truth assignment. As the first problem proven to be NP-complete, SAT has been studied in depth for a long time. SAT problems with regular structure, obtained for example by limiting the length of clauses or the number of occurrences of variables, have attracted increasing attention. The SAT problem in which every clause has length k is called the k-SAT problem, and it has been proven to be hard to solve in the worst case. In recent years, some researchers have begun to study SAT problems with a symmetric structure, in which each variable occurs exactly d times and each clause has length exactly k. It has been shown that any k-SAT instance can be reduced to a (d, k)-SAT instance with this symmetric structure [1,2]. For a (d, k)-SAT instance, if there is an assignment under which exactly two literals in each clause are true, the instance is called 2-exactly satisfiable, and deciding this property is called the regular exact 2-(d, k)-SAT problem. This problem is still hard in the worst case, and studying it helps in analyzing other SAT problems with symmetric structure.
Studying SAT problems in specific settings helps reveal general mathematical phenomena of SAT and helps in designing more effective SAT algorithms. Most current studies focus on the random k-SAT problem. Experiments and theoretical analysis have shown that the clause constraint density α (the ratio of the number of clauses m to the number of variables n) of a CNF formula is an important parameter affecting both its satisfiability and the difficulty of solving it [3][4][5][6]. There is a critical value $\alpha_k$, depending on k, such that random k-SAT instances with $\alpha < \alpha_k$ are satisfiable with high probability and those with $\alpha > \alpha_k$ are unsatisfiable with high probability [3]. This abrupt change from satisfiable to unsatisfiable is called the phase transition of the k-SAT problem, and $\alpha_k$ is called the phase transition point. It was also found that random k-SAT instances are easy to solve when α is far from $\alpha_k$ and hard to solve when α is close to $\alpha_k$. For random 3-SAT, $\alpha_3$ has been shown to be at least 3.52 [7] and at most 4.4898 [8]. In addition, the range of instances that the best SAT solvers can handle is closely related to α.
For instance, the DPLL algorithm with the pure literal rule is applicable to random 3-SAT instances with constraint density α < 1.6; the Zchaff algorithm is suited to random 3-SAT instances with α < 3.52 that have the most positive or negative literals; WalkSAT is suited to random 3-SAT instances with α slightly higher than 3.95 [9]; the Belief Propagation algorithm is suited to random 3-SAT instances with α < 3.95; and the Survey Propagation algorithm is suited to random 3-SAT instances with α < 4.26 [10]. However, all (d, k)-SAT instances with the same d and k have the same clause constraint density α = m/n = d/k, because in such formulas each variable is copied d times to form dn variable nodes, each clause is copied k times to form km clause nodes, and dn = km. Hence the clause constraint density alone cannot characterize the satisfiability and difficulty of such regular SAT problems; to study how hard they are to solve, new parameters are needed that capture the structural features of the formulas.
The phase transition of SAT problems with symmetric structure has been studied extensively. Boufkhad et al. first proposed the regular (d, k)-SAT problem with this symmetric structure and showed that the satisfiability phase transition point of the random regular (3, k)-SAT problem lies in $2.46 \le \alpha_{k,3} \le 3.78$ [11]. Building on this, Rathi et al. obtained upper and lower bounds on the phase transition point of the problem using the first and second moment methods [12]. The authors of [13] gave the phase transition point of the random regular NAE-SAT problem and obtained a phase transition control parameter in terms of d. Achlioptas et al. studied the phase transitions of the 1-in-k-SAT and NAE-3-SAT problems and found that the threshold of NAE-3-SAT satisfies $1.514 < c_{3,\mathrm{NAE}} < 2.215$ [14]. The phase transition of the NAE-k-SAT problem was then given in [15]. The random regular exact cover problem (exactly one literal in every clause is true) was proposed in [16], whose author studied its satisfiability phase transition and proved it using the subgraph method. The phase transition of the exact problem was further derived using the differential equation method [17]. Inspired by the exact problem, we study the phase transition of the regular exact 2-(d, k)-SAT problem and identify its phase transition control parameter.
This article introduces s-exact satisfiability. For a (d, k)-CNF formula F that contains only positive or only negative literals, there is a satisfying assignment T of size sn/k under which F is s-exactly satisfiable. We then introduce a generation model for (d, k)-SAT instances, construct the solution space, and use the first and second moment methods to derive the phase transition point $d^*$ of the 2-(d, k)-SAT problem. Specifically, for k ≥ 3 and a positive integer d, let
$$d^* = \frac{2\ln(2/k) + (k-2)\ln(1-2/k)}{\ln(1-1/k) + (k-2)\ln(1-2/k)}.$$
When $d < d^*$, the problem is satisfiable with high probability; when $d > d^*$, it is unsatisfiable with high probability. Since $d^*$ is a function of the clause length k, we consider the three cases k = 4, k = 6, and k = 7, and run experiments on (d, k)-SAT instances with n = 10 and n = 20 variables. The results show that the theoretical results agree with the experimental results.

Random Regular Exact 2-(d, k)-SAT Problem
A literal is a propositional variable x or its negation ¬x. A clause C is a disjunction of literals, $C = (L_1 \lor \cdots \lor L_k)$, and can be regarded as the set of its literals, $C = \{L_1, \ldots, L_k\}$; k is the length of the clause, and $\mathrm{var}(C_j)$ denotes the set of variables in clause $C_j$. A Conjunctive Normal Form (CNF) formula F is a conjunction of clauses, $F = (C_1 \land \cdots \land C_m)$, and can be regarded as a set of clauses $F = \{C_1, \ldots, C_m\}$ or as a clause list $F = [C_1, \ldots, C_m]$, where m is the size of the formula and var(F) denotes the set of variables appearing in F. #var(F) and #cl(F) denote the numbers of variables and clauses of F, respectively. The number of occurrences of a variable x is the total number of its positive and negative literals: writing pos(x, F) and neg(x, F) for the numbers of positive and negative occurrences of x in F, we set occ(x, F) = pos(x, F) + neg(x, F). A regular (d, k)-CNF formula is a CNF formula F in which each variable occurs exactly d times and each clause has length exactly k. This regular structure is symmetric. Counting literal occurrences once by clauses and once by variables shows that every formula in this class satisfies km = dn.
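To make the notation concrete, the following Python sketch encodes a CNF formula as a list of clauses of signed integers (literal +i for $x_i$ and −i for ¬$x_i$, a DIMACS-style convention assumed here rather than taken from the paper) and implements pos, neg, occ, var(F), and a check for the regular (d, k) property. The function names mirror the notation above but are otherwise illustrative.

```python
from collections import Counter

# Assumed encoding: a clause is a list of nonzero ints, +i for x_i and -i for ¬x_i.

def pos(x: int, clauses: list[list[int]]) -> int:
    """pos(x, F): number of positive occurrences of variable x."""
    return sum(lit == x for c in clauses for lit in c)

def neg(x: int, clauses: list[list[int]]) -> int:
    """neg(x, F): number of negative occurrences of variable x."""
    return sum(lit == -x for c in clauses for lit in c)

def occ(x: int, clauses: list[list[int]]) -> int:
    """occ(x, F) = pos(x, F) + neg(x, F)."""
    return pos(x, clauses) + neg(x, clauses)

def var(clauses: list[list[int]]) -> set[int]:
    """var(F): the set of variables appearing in F."""
    return {abs(lit) for c in clauses for lit in c}

def is_regular(clauses: list[list[int]], d: int, k: int) -> bool:
    """True if every clause has length k and every variable occurs exactly d times."""
    if any(len(c) != k for c in clauses):
        return False
    counts = Counter(abs(lit) for c in clauses for lit in c)
    return all(v == d for v in counts.values())

F = [[1, 2, 3], [-1, 2, -3]]
print(occ(1, F), is_regular(F, 2, 3), len(F) * 3 == 2 * len(var(F)))  # 2 True True
```

Here F is a regular (2, 3)-CNF formula with n = 3 and m = 2, so km = dn = 6, as the counting argument requires.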
A CNF formula F is satisfiable (SAT) if some truth assignment to its variables $\{x_1, x_2, \ldots, x_n\}$ makes F TRUE, and unsatisfiable (UNSAT) if F evaluates to FALSE under all $2^n$ assignments of its n Boolean variables. The satisfiability problem for formulas with this symmetric structure is called the (d, k)-SAT problem.
Definition 1. Let F be a (d, k)-CNF formula with n variables $\{x_1, x_2, \ldots, x_n\}$ and m clauses $\{C_1, C_2, \ldots, C_m\}$. F is said to be s-exactly satisfiable if there is a truth assignment T under which each clause $C_j$ has exactly s true literals, where $1 \le s < k$.
Note that a satisfiable formula need not be s-exactly satisfiable. Suppose F is a (d, k)-CNF formula with only positive literals (or only negative literals), and let T be an s-exact satisfying assignment, identified with the set of variables whose literals it makes true. Every clause $C_j$ then satisfies $|T \cap C_j| = s$. Counting the true literal occurrences once by variables and once by clauses, and using m = dn/k, gives
$$d|T| = mk - m(k-s) = sm = sd \cdot \frac{n}{k},$$
so $|T| = sn/k$. Hence any (d, k)-CNF formula F with only positive or only negative literals that is s-exactly satisfiable has a satisfying assignment of size sn/k. Formulas that mix positive and negative literals do not have this property.
Theorem 1. Let F be a regular (d, k)-CNF formula with n variables and m clauses. If F contains only positive or only negative literals, then there is a satisfying assignment T of size sn/k under which F is s-exactly satisfiable.
Proof. Since F is a regular (d, k)-CNF formula, each variable appears d times and each clause has length k. Let $T = \{x_1, x_2, \ldots, x_{sn/k}\}$. Since km = dn, $d|T| = d \cdot (sn/k) = s \cdot (dn/k) = s \cdot (mk/k) = sm$. Therefore, taking all sn/k variables of T as positive literals (or all as negative literals) and distributing their sm occurrences evenly over the m clauses, s per clause, makes F s-exactly satisfiable.
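The size argument in the note before Theorem 1 (every s-exact assignment of a regular formula with only positive literals has size sn/k) can be checked by brute force on small formulas. The sketch below enumerates all subsets of variables of a hand-made regular (2, 3)-CNF formula with positive literals only; the formula and the function name are illustrative assumptions, not the instance of Figure 1.

```python
from itertools import combinations

def exact_solutions(clauses, n, s):
    """Yield every assignment (as the set of TRUE variables) under which each
    clause of a positive-literal formula has exactly s true literals.
    Brute force over all 2^n subsets of {1, ..., n}."""
    for size in range(n + 1):
        for subset in combinations(range(1, n + 1), size):
            true_vars = set(subset)
            if all(sum(v in true_vars for v in c) == s for c in clauses):
                yield true_vars

# A regular (2, 3)-CNF formula: n = 6 variables, m = 4 clauses, each variable occurs d = 2 times.
F = [[1, 2, 3], [4, 5, 6], [1, 2, 4], [3, 5, 6]]
n, k = 6, 3
for s in (1, 2):
    sizes = {len(t) for t in exact_solutions(F, n, s)}
    print(s, sizes, s * n // k)  # prints "1 {2} 2" then "2 {4} 4"
```

For both s = 1 and s = 2 every s-exact assignment found has size sn/k, for example {x_1, x_5} for s = 1 and {x_1, x_2, x_5, x_6} for s = 2.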
Figure 1 shows a regular (4, 3)-CNF formula with n = 6 and m = 8. The formula is 2-exactly satisfiable, and $T = \{x_1, x_2, x_3, x_5\}$ is a satisfying assignment, in agreement with $|T| = 2n/k = 4$. In the rest of this paper, we study the 2-exact satisfiability of (d, k)-CNF formulas with only positive literals, in order to better understand the difficulty and the phase transition behavior of the s-(d, k)-SAT problem.

Random Regular (d, k)-SAT Instance Generation Model
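Let the set of variables be $V = \{x_1, x_2, \ldots, x_n\}$. Copying each variable in V d times produces the following variable matrix:
$$\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{pmatrix}$$
Let the CNF formula be $F = (C_1 \land \cdots \land C_m)$. Copying each clause in F k times produces the following matrix, called the position matrix of variable placement (position matrix for short):
$$\begin{pmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,k} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ C_{m,1} & C_{m,2} & \cdots & C_{m,k} \end{pmatrix}$$
For a regular (d, k)-CNF formula, km = dn, so both matrices have mk entries. All permutations of the set $\{1, 2, \ldots, mk\}$ form the sample space, which contains (mk)! permutations. A permutation π drawn at random from it generates a regular (d, k) bipartite graph G = (U, V, E) between the clause nodes U and the variable nodes V, whose edges are determined by
$$x_i \in C_j \iff \pi(x_{i,p}) = C_{j,q} \text{ for some } p \text{ and } q.$$
Since each permutation determines an instance, the instance space has size (mk)!.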
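A minimal Python sketch of this permutation model, restricted to positive literals as in the rest of the paper, is given below. It numbers the copies $x_{i,p}$ and the positions $C_{j,q}$ row by row starting from 0 (an indexing convention assumed here) and uses `random.shuffle` to draw π uniformly; as in the model, nothing prevents a variable from landing twice in the same clause. The function name is hypothetical.

```python
import random

def generate_instance(n: int, d: int, k: int, rng: random.Random) -> list[list[int]]:
    """Draw a regular (d, k)-CNF formula with positive literals only.

    Copies x_{i,p} (i = 1..n, p = 1..d) are numbered a = (i-1)*d + (p-1) and
    positions C_{j,q} (j = 1..m, q = 1..k) are numbered (j-1)*k + (q-1),
    both starting at 0. The random permutation pi sends copy a to position
    pi[a], so x_i belongs to every clause C_j that receives one of its copies.
    """
    if (d * n) % k:
        raise ValueError("k must divide d*n so that m = dn/k is an integer")
    m = d * n // k
    pi = list(range(m * k))
    rng.shuffle(pi)  # Fisher-Yates shuffle of the mk positions
    clauses = [[0] * k for _ in range(m)]
    for a, position in enumerate(pi):
        j, q = divmod(position, k)   # clause row j, slot q (0-based)
        clauses[j][q] = a // d + 1   # copy a belongs to variable x_{a // d + 1}
    return clauses

F = generate_instance(n=12, d=3, k=4, rng=random.Random(0))
print(len(F), F[0])  # 9 clauses of length 4; the contents depend on the seed
```

Passing an explicit `random.Random` instance keeps experiments reproducible for a fixed seed.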

Phase Transition Analysis Techniques
This section introduces phase transitions and the main methods used to analyze them.
Let M be a random model with parameter p′(n), and let Q be a property. A function p(n) is called a critical function (threshold) for Q if (1) when $\lim_{n\to\infty} p'(n)/p(n) = 0$, M almost surely does not have property Q, and (2) when $\lim_{n\to\infty} p'(n)/p(n) = \infty$, M almost surely has property Q. In this case, property Q is said to have a phase transition.
Suppose property Q has a phase transition. A function $p: \mathbb{N} \to [0, 1]$ is called a strict critical function (sharp threshold) for Q if, for every c < 1, model M with parameter c·p(n) does not have property Q for almost all n, and, for every c > 1, model M with parameter c·p(n) has property Q for almost all n.
In proofs of phase transitions for SAT problems, the first moment method is usually used to establish an upper bound on the critical value, and the second moment method is used to establish a lower bound.
The first moment method is a simple application of Markov's inequality.
Let Z be a random variable taking only non-negative integer values. By Markov's inequality, the probability that Z ≥ 1 (i.e., Z > 0) is at most E(Z), that is,
$$\Pr(Z > 0) \le E(Z).$$
The second moment method is an inequality derived directly from Chebyshev's inequality.
Theorem 3 (The second moment method [19]). Let Z be a random variable taking only non-negative integer values, with E(Z) > 0. Then
$$\Pr(Z = 0) \le \frac{\operatorname{Var}(Z)}{E(Z)^2} = \frac{E(Z^2)}{E(Z)^2} - 1.$$
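The two inequalities can be seen side by side on a toy random variable whose distribution is known exactly. The sketch below takes Z ~ Binomial(N, p), an example chosen purely for illustration, and evaluates in exact rational arithmetic the first moment upper bound E(Z), the exact value Pr(Z > 0) = 1 − (1 − p)^N, and the Chebyshev-based lower bound Pr(Z > 0) ≥ 1 − Var(Z)/E(Z)^2.

```python
from fractions import Fraction

def moment_bounds(N: int, p: Fraction):
    """For Z ~ Binomial(N, p), return (lower, exact, upper) where
    lower = 1 - Var(Z)/E(Z)^2   (second moment method, via Chebyshev),
    exact = Pr(Z > 0) = 1 - (1 - p)^N,
    upper = E(Z)                (first moment method, via Markov)."""
    mean = N * p
    var = N * p * (1 - p)
    exact = 1 - (1 - p) ** N
    return 1 - var / mean ** 2, exact, mean

for N, p in [(10, Fraction(1, 20)), (1000, Fraction(1, 10))]:
    lower, exact, upper = moment_bounds(N, p)
    assert lower <= exact <= upper
    print(N, float(lower), float(exact), float(upper))
```

When E(Z) → 0, the upper bound forces Pr(Z > 0) → 0, which is how the first moment is used in Lemma 1; when Var(Z)/E(Z)^2 → 0, the lower bound forces Pr(Z > 0) → 1, which is the role of the second moment in Lemma 2.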

Main Results
The phase transition phenomenon is one of the important characteristics of SAT problems, and studying it helps to further understand the properties of SAT problems and to design more efficient algorithms. In this section, the phase transition point of the 2-(d, k)-SAT problem with symmetric structure is derived using the first and second moment methods.
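Theorem 4. Let F be a random regular exact 2-(d, k)-SAT instance with only positive literals, and let
$$d^* = \frac{2\ln(2/k) + (k-2)\ln(1-2/k)}{\ln(1-1/k) + (k-2)\ln(1-2/k)}.$$
When k ≥ 3, for any positive integer d:
$$\lim_{n\to\infty} \Pr(F \text{ is 2-exactly satisfiable}) = \begin{cases} 1, & d < d^*, \\ 0, & d > d^*. \end{cases}$$
Theorem 4 states that, for k ≥ 3, a 2-(d, k)-SAT instance F is satisfiable with high probability if $d < d^*$ and unsatisfiable with high probability if $d > d^*$. It follows from the two lemmas below.
Lemma 1. Let F be a random regular exact 2-(d, k)-SAT instance. When k ≥ 3 and $d > d^*$, F is unsatisfiable with high probability.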
Proof. Let $F = \{C_1, C_2, \ldots, C_m\}$ be a regular (d, k)-SAT instance, let $V = \{x_1, x_2, \ldots, x_n\}$ be its set of variables, and let T be a truth assignment on V under which every clause satisfies $|T \cap C_j| = 2$. Then each clause $C_j$ contains k − 2 variables that are not in T. Since each variable occurs d times and km = dn,
$$d|T| = mk - m(k-2) = 2m = \frac{2dn}{k},$$
so $|T| = 2n/k$. Let $T = (x_1, x_2, \ldots, x_{2n/k})$ and copy each of its variables d times to obtain the matrix
$$\overline{T} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{2n/k,1} & x_{2n/k,2} & \cdots & x_{2n/k,d} \end{pmatrix}.$$
The matrix $\overline{T}$ contains $(2n/k) \cdot d = 2m$ variable copies, which are divided into m groups of 2. The number of such divisions is
$$\frac{\binom{2m}{2}\binom{2m-2}{2}\cdots\binom{4}{2}\binom{2}{2}}{m!} = \frac{(2m)!}{m! \cdot 2^m}.$$
Since each row of the position matrix of F has k positions, filling two positions of every row with the copies in $\overline{T}$ can be done in $\frac{(2m)!}{m! \cdot 2^m} \cdot m! \cdot \frac{(k(k-1))^m}{2^m}$ ways. After this, each row has k − 2 empty positions, which are filled with the copies of the remaining variables of V in $[m(k-2)]!$ ways. Therefore, the number of formulas that T satisfies is
$$N_T = \frac{(2m)!}{m! \cdot 2^m} \cdot m! \cdot \frac{(k(k-1))^m}{2^m} \cdot [m(k-2)]! = (2m)!\left(\frac{k(k-1)}{4}\right)^m [m(k-2)]!.$$
Since T can be any of the $\binom{n}{2n/k}$ sets of size 2n/k, the number of (assignment, formula) pairs in the instance space in which the assignment 2-exactly satisfies the formula is $\binom{n}{2n/k} N_T$. Assume that F is drawn uniformly at random from the instance space of size (mk)!, and let the random variable Z(n) denote the number of 2-exact satisfying assignments of F. Then
$$E[Z(n)] = \binom{n}{2n/k}\frac{(2m)!\left(\frac{k(k-1)}{4}\right)^m [m(k-2)]!}{(mk)!}.$$
By the first moment method, it remains to prove that $\lim_{n\to\infty} E[Z(n)] = 0$ when $d > d^*$. With m = dn/k, Stirling's formula gives $E[Z(n)] = e^{n\phi(d) + o(n)}$, where
$$\phi(d) = \frac{1}{k}\Big[d\big(\ln(1-1/k) + (k-2)\ln(1-2/k)\big) - \big(2\ln(2/k) + (k-2)\ln(1-2/k)\big)\Big].$$
Since $\ln(1-1/k) + (k-2)\ln(1-2/k) < 0$ for k ≥ 3, $\phi(d)$ is a monotonically decreasing function of d with the unique zero
$$d^* = \frac{2\ln(2/k) + (k-2)\ln(1-2/k)}{\ln(1-1/k) + (k-2)\ln(1-2/k)}.$$
Therefore, when k ≥ 3 and $d > d^*$, $\lim_{n\to\infty} E[Z(n)] = 0$, and hence
$$\Pr(F \text{ is 2-exactly satisfiable}) = \Pr(Z(n) > 0) \le E[Z(n)] \to 0.$$
In other words, when k ≥ 3 and $d > d^*$, random regular exact 2-(d, k)-SAT instances are unsatisfiable with high probability.
Lemma 2.
Let F be a random regular exact 2-(d, k)-SAT instance. When k ≥ 3 and $d < d^*$, F is satisfiable with high probability.
Proof. We mainly compute $E[Z^2(n)]$. Suppose F is satisfiable and has the set of satisfying assignments $\{T_1, T_2, \ldots, T_r\}$ with r elements. We say that a pair of assignments $(T_p, T_q)$ satisfies F if and only if both $T_p$ and $T_q$ satisfy F. Consider the set of assignment pairs $\{(T_p, T_q) : 1 \le p, q \le r\}$, which has $r^2$ elements. Writing $T_{pq} = (T_p, T_q)$, the assignment pair can be expressed as the following matrix:
For the (d, k)-CNF formula, every 2-exact satisfying assignment T has size 2n/k. The matrix $\overline{T}$ is obtained by copying the variables of T d times; we write it as the set $\overline{T} = \{x_{i,j} : 1 \le i \le 2n/k,\ 1 \le j \le d\}$. Pairing the $(2n/k) \cdot d = 2m$ elements of $\overline{T}$ yields a set of m element pairs, and there are
$$\frac{\binom{2m}{2}\binom{2m-2}{2}\cdots\binom{4}{2}\binom{2}{2}}{m!} = \frac{(2m)!}{m! \cdot 2^m}$$
such sets. For any assignment pair $(T_p, T_q)$, the relationship is as follows:
Therefore, the size of the set difference can be used to partition $\{(T_p, T_q) \mid 1 \le p, q \le r\}$: two pairs $(T_p, T_q)$ and $(T_l, T_h)$ are in the same class when $|T_p - T_q| = |T_l - T_h| = \lambda$, and this class is denoted
$$P_\lambda = \{(T_p, T_q) : |T_p - T_q| = \lambda\}.$$
Then $P_0, P_1, \ldots, P_{2n/k}$ form a partition of $\{(T_p, T_q) \mid 1 \le p, q \le r\}$, and
$$|P_0| + |P_1| + \cdots + |P_{2n/k}| = r^2.$$
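The partition by λ = |T_p − T_q| can be sanity-checked on all subsets of a fixed size, ignoring the satisfiability constraint. For two subsets of size t = 2n/k of an n-element set, the number of ordered pairs with |T_p − T_q| = λ is $\binom{n}{t}\binom{t}{\lambda}\binom{n-t}{\lambda}$ (choose T_p, the λ elements of T_p that T_q drops, and the λ outside elements that T_q adds), and these classes sum to $\binom{n}{t}^2$. The sketch compares this closed form with brute force; since the classes $P_\lambda$ in the proof contain only satisfying pairs, these counts are upper bounds on $|P_\lambda|$. The function names are illustrative.

```python
from itertools import combinations
from math import comb

def classes_by_bruteforce(n: int, t: int) -> dict[int, int]:
    """Count ordered pairs (S1, S2) of t-subsets of {1..n} by lam = |S1 - S2|."""
    subsets = [frozenset(c) for c in combinations(range(1, n + 1), t)]
    counts: dict[int, int] = {}
    for s1 in subsets:
        for s2 in subsets:
            lam = len(s1 - s2)
            counts[lam] = counts.get(lam, 0) + 1
    return counts

def classes_by_formula(n: int, t: int) -> dict[int, int]:
    """C(n, t) * C(t, lam) * C(n - t, lam): pick S1, the lam elements S2 drops,
    and the lam elements S2 adds from outside S1."""
    return {lam: comb(n, t) * comb(t, lam) * comb(n - t, lam)
            for lam in range(t + 1) if lam <= n - t}

n, k = 10, 4
t = 2 * n // k  # assignment size 2n/k
assert classes_by_bruteforce(n, t) == classes_by_formula(n, t)
print(sum(classes_by_formula(n, t).values()) == comb(n, t) ** 2)  # True
```

The last line is Vandermonde's identity in disguise: summing $\binom{t}{\lambda}\binom{n-t}{\lambda}$ over λ gives $\binom{n}{t}$.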
(2) Matrix B matches (1 − ω)m of the m clauses: it corresponds to (1 − ω)m rows of the position matrix and occupies two positions in each of these rows.
(3) Matrix C matches (ωn/k) · d = ωm of the m clauses: it corresponds to ωm rows of the position matrix and occupies two positions in each of these rows.
For cases (1) and (3), which are allocated jointly to each of the ωm rows and occupy four positions in each of these rows of the position matrix, there are a total of $(\omega m)! \cdot \binom{k}{2}^{\omega m} \cdot (\omega m)! \cdot$
For each satisfying assignment $T_p$, (2m)!/(m! · 2^m) sets of element pairs can be constructed. Thus, the number of formulas that can be satisfied is as follows:
Therefore, for a given assignment pair $(T_p, T_q)$, the total number of formulas that $(T_p, T_q)$ satisfies is:
So, the maximum number of formulas satisfied by an assignment pair in $Q_\omega$ is:
Specify a set S of size 2n/k, and let S and S′ satisfy the following conditions:
Such subsets S have:
Then, the maximum number of formulas satisfied by all assignment pairs in $Q_\omega$ is:
Let $Z_\omega(n)$ denote the number of randomly selected formulas satisfied by the assignment pairs in $Q_\omega$; then:
Let $\omega_{\max}$ be the maximum point of the function ψ(ω). From Reference [12], it can be obtained that:
Finally,
When k = 6 and $0 < \omega_{\max} < 1$, the graph of the function φ(d) is shown in Figure 2b. Therefore, it can be proven that when k ≥ 3 and $d < d^*$, φ(d) is smooth and non-positive on the interval (0, 1) and has a unique maximum point. Thereby, $E[Z^2(n)] \le (1 + o(1))\,E[Z(n)]^2$.
In conclusion, for $d < d^*$, $\lim_{n\to\infty} \Pr(Z(n) > 0) = 1$. In other words, when $d < d^*$, a 2-(d, k)-SAT instance is satisfiable with high probability, and when $d > d^*$, it is unsatisfiable with high probability. Thus, Theorem 4 is proven.
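The closed form for $d^*$ and the numerical values used in the experiments can be reproduced with a few lines of Python. As an additional check, the sketch evaluates the growth rate $\phi(d) = \lim_{n\to\infty} \frac{1}{n}\ln E[Z(n)]$ obtained by applying Stirling's formula to the first moment count of Lemma 1, $E[Z(n)] = \binom{n}{2n/k}(2m)!\,(k(k-1)/4)^m\,[m(k-2)]!/(mk)!$ with m = dn/k; this Stirling step is an assumption of the sketch. Bisection then confirms that the zero of $\phi$ coincides with the closed form.

```python
import math

def d_star(k: int) -> float:
    """Closed-form phase transition point d* of Theorem 4 (k >= 3)."""
    t = (k - 2) * math.log(1 - 2 / k)
    return (2 * math.log(2 / k) + t) / (math.log(1 - 1 / k) + t)

def phi(d: float, k: int) -> float:
    """Assumed growth rate of ln E[Z(n)] / n with m = d*n/k (Stirling's formula)."""
    p = 2 / k
    entropy = -p * math.log(p) - (1 - p) * math.log(1 - p)
    # +entropy comes from C(n, 2n/k); -d*entropy from (2m)! [m(k-2)]! / (mk)!
    return (1 - d) * entropy + (d / k) * math.log(k * (k - 1) / 4)

def bisect_root(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Zero of a continuous f with f(lo) > 0 > f(hi), by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for k in (4, 6, 7):
    root = bisect_root(lambda d: phi(d, k), 1.0, 10.0)
    print(k, round(d_star(k), 4), round(root, 4))
# prints: 4 1.6563 1.6563 / 6 2.1168 2.1168 / 7 2.2803 2.2803
```

Because $\phi$ is linear in d with a negative slope, its unique zero can also be found directly; bisection is used here only as an independent numerical check of the algebra.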

Numerical Experiment and Analysis
This section presents experiments that check the derivation of Theorem 4. Since the phase transition point $d^*$ is a function of the clause length k, we consider six cases: clause lengths k = 4, k = 6, and k = 7, each with n = 10 and n = 20 variables. (d, k)-SAT instances are generated randomly with the generation model introduced above. For k = 4, $d^* = 1.6563$; for k = 6, $d^* = 2.1168$; and for k = 7, $d^* = 2.2803$. Figures 3–5 show the phase transition of 2-(d, k)-SAT instances with clause lengths k = 4, k = 6, and k = 7, respectively, for n = 10 and n = 20. In each figure, the horizontal axis is the phase transition control parameter d and the vertical axis is the probability that a 2-(d, k)-SAT instance is satisfiable. Because the number of occurrences d of each variable and the number of clauses m must be integers and are rounded in the experimental design, a small error occurs; however, this error decreases as the problem size increases. The figures show that, for a positive integer d, the problem is satisfiable with high probability below the experimentally observed transition point and unsatisfiable with high probability above it, and that the observed transition point is close to the theoretical value $d^*$.
In the experiments, we used a brute-force method: the algorithm enumerates all assignments, checks each one, and thereby determines whether each formula is 2-exactly satisfiable. This algorithm is suitable only for small instances, since its running time grows exponentially with the problem size. When the problem size increased from 10 to 20, the experimental results moved closer to the theoretical results, so we expect the agreement to improve further as the problem size grows.
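A minimal sketch of such a brute-force experiment is given below, assuming positive-literal instances drawn with the permutation model. Unlike the exhaustive search over all assignments described above, it enumerates only assignments of size 2n/k, which by the counting argument in the note before Theorem 1 are the only candidates for formulas with positive literals. It handles only parameter triples with k dividing dn; how d and m were rounded in the reported experiments is not specified, so that step is not reproduced. Function names, seeds, and trial counts are illustrative.

```python
import random
from itertools import combinations

def generate_instance(n: int, d: int, k: int, rng: random.Random) -> list[list[int]]:
    """Positive-literal regular (d, k)-CNF formula drawn with the permutation model."""
    if (d * n) % k:
        raise ValueError("k must divide d*n so that m = dn/k is an integer")
    m = d * n // k
    pi = list(range(m * k))
    rng.shuffle(pi)                      # random permutation of the mk positions
    clauses = [[0] * k for _ in range(m)]
    for a, position in enumerate(pi):    # copy a belongs to variable x_{a // d + 1}
        clauses[position // k][position % k] = a // d + 1
    return clauses

def is_two_exact_satisfiable(clauses: list[list[int]], n: int, k: int) -> bool:
    """Brute force restricted to assignments with |T| = 2n/k, the only possible size."""
    if (2 * n) % k:
        return False
    for subset in combinations(range(1, n + 1), 2 * n // k):
        true_vars = set(subset)
        if all(sum(v in true_vars for v in c) == 2 for c in clauses):
            return True
    return False

def estimate_sat_probability(n: int, d: int, k: int, trials: int, seed: int = 0) -> float:
    """Fraction of random instances that are 2-exactly satisfiable."""
    rng = random.Random(seed)
    hits = sum(is_two_exact_satisfiable(generate_instance(n, d, k, rng), n, k)
               for _ in range(trials))
    return hits / trials
```

For example, `estimate_sat_probability(10, 2, 4, trials=100)` estimates the satisfiability probability for k = 4, n = 10 and d = 2; sweeping d for fixed n and k produces one curve of the kind plotted in Figures 3–5.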

Conclusions
This article described s-exact satisfiability: for a (d, k)-CNF formula F that contains only positive or only negative literals, there is a satisfying assignment T of size sn/k under which F is s-exactly satisfiable. A random regular (d, k)-SAT instance generation model was introduced. By constructing the solution space and using the first and second moment methods, the satisfiability phase transition point $d^*$ of the 2-(d, k)-SAT problem with only positive literals was derived; $d^*$ is a function of k (k ≥ 3). For a positive integer d, if $d < d^*$, the problem is satisfiable with high probability; if $d > d^*$, it is unsatisfiable with high probability. The 2-(d, k)-SAT problem is a special case of the s-(d, k)-SAT problem, and studying its phase transition helps in understanding and analyzing the phase transition of the s-(d, k)-SAT problem and similar problems. In future work, we will study the phase transition of the s-(d, k)-SAT problem in other cases, such as formulas whose clauses contain both positive and negative literals, using this method or new ones.