A Meeting Point of Probability, Graphs, and Algorithms: The Lovász Local Lemma and Related Results—A Survey

Abstract: A classic and fundamental result, known as the Lovász Local Lemma, is a gem in the probabilistic method of combinatorics. At a high level, its core message can be described by the claim that weakly dependent events behave similarly to independent ones. A fascinating feature of this result is that even though it is a purely probabilistic statement, it provides a valuable and versatile tool for proving completely deterministic theorems. The Lovász Local Lemma has found many applications; despite being originally published in 1973, it still attracts active novel research. In this survey paper, we review various forms of the Lemma, as well as some related results and applications.

"One of the most remarkable developments in Computer Science over the past 50 years has been the realization that allowing computers to toss coins can lead to algorithms that are more efficient, conceptually simpler and more elegant than their best known deterministic counterparts."


Introduction
The probabilistic method in combinatorics is a very useful and powerful family of proof techniques; for a standard reference book, see Alon and Spencer [1]. As an important tool in this area, the result known as Lovász Local Lemma (LLL), which was first published in 1973 by Erdős and Lovász [2], plays a fundamental role and is often referred to as a "gem" in this field.
A key (and quite fascinating) feature of the probabilistic method is that it allows proving completely deterministic claims using probabilistic arguments. This is also the case for the Lovász Local Lemma. Specifically, it allows us to prove that a certain structure or property exists with positive probability in a random setting, which implies that the structure must indeed exist or the property must hold deterministically. In a sense, randomness and probability play the role of a catalyst here: they make the progress toward the goal possible, but disappear from the end result. Quite surprisingly, in a number of cases no other way is known to achieve the same result.
In the next section, we describe a specific motivating problem and solve it with the Lovász Local Lemma, illustrating its strength.

A Motivating Problem and the Original LLL
Hypergraph 2-coloring. Let H be a hypergraph in which every hyperedge has at least k ≥ 3 vertices, and each hyperedge intersects at most d others. We ask: for what values of k and d can it be guaranteed that such a hypergraph is 2-colorable? That is, when can we color each vertex with one of two colors, say red and blue, such that no hyperedge becomes monochromatic?
To describe a solution attempt, let E_1, ..., E_n be the hyperedges in H. Consider a random coloring: for each vertex, flip a coin independently and, depending on the outcome, color the vertex red or blue. Let A_i denote the event that E_i is monochromatic, and let Ā_i denote its complement. What we aim to achieve is that no hyperedge is monochromatic, which is expressed by the event C = Ā_1 ∩ ⋯ ∩ Ā_n. If we can show Pr(C) > 0, a 2-coloring of H must exist.
To find the probability of C, let us first estimate the probability of A_i. If E_i has k_i vertices, then out of its 2^{k_i} possible colorings, only two are monochromatic (the all-red and the all-blue coloring). Therefore,

Pr(A_i) = 2^{1−k_i} ≤ 2^{1−k},    (1)

where the inequality follows from k_i ≥ k. Then, (1) implies Pr(Ā_i) ≥ 1 − 2^{1−k}. Now, if the events A_i were independent, we could argue that

Pr(Ā_1 ∩ ⋯ ∩ Ā_n) = Pr(Ā_1) ⋯ Pr(Ā_n) ≥ (1 − 2^{1−k})^n > 0    (2)

holds. It would mean that with positive probability no edge is monochromatic, so a 2-coloring of H must indeed exist. The problem with this argument is that the events A_1, ..., A_n are not independent whenever the hyperedges overlap. Consequently, the equality Pr(Ā_1 ∩ ⋯ ∩ Ā_n) = Pr(Ā_1) ⋯ Pr(Ā_n) that we used in (2) generally does not hold. This is the critical point where the LLL provides invaluable help. Before going into formal details, let us informally state the core message of the lemma:

Key message of LLL. For any system A_1, ..., A_n of events, in order for the conclusion Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0 to hold, it is enough if the events are "almost" independent, in the sense that each one of them depends only on a limited number of others, provided that Pr(A_i) is small enough.
Specifically, the historically first version of the lemma, proved in the paper [2] published by Erdős and Lovász in 1973, is the following claim:

Lemma 1 (LLL-symmetric version, original form). Let A_1, ..., A_n be events such that each event is mutually independent of all the others, except at most d ≥ 1 of them. If

Pr(A_i) ≤ 1/(4d) for every i = 1, ..., n    (3)

holds, then Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0.
Let us apply it to the above-described problem of hypergraph 2-coloring. Each event A_i in the example depends only on those events A_j for which E_i ∩ E_j ≠ ∅, since we pick the color of each vertex independently. As each hyperedge intersects at most d others, A_i may depend on at most d other events. We know from (1) that Pr(A_i) ≤ 2^{1−k}. Therefore, if

2^{1−k} ≤ 1/(4d)    (4)

holds, then Pr(A_i) ≤ 2^{1−k} ≤ 1/(4d) follows, so the inequality (3) is satisfied. As a result, Lemma 1 implies Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0, which means that the desired 2-coloring must exist. Rearranging (4), we obtain the condition 4d ≤ 2^{k−1}, or equivalently, d ≤ 2^{k−3}. Thus, by means of the LLL, we have proved the following:

Theorem 1. Let H be a hypergraph in which each hyperedge has at least k ≥ 3 vertices, and any hyperedge intersects at most d others. Then, the condition d ≤ 2^{k−3} implies that H is 2-colorable.
For example, if each hyperedge of H has at least 11 vertices, and intersects with at most 256 other hyperedges, then H is always 2-colorable, regardless of its structure. The reason is that with d = 256 and k = 11 the inequality d ≤ 2^{k−3} becomes 256 ≤ 2^8, which indeed holds.

Remark 1. The factor 4 in the condition 4d ≤ 2^{k−1} can be reduced to e = 2.718... (the base of the natural logarithm), using stronger versions of the LLL (see Section 4), yielding the somewhat less demanding condition d ≤ 2^{k−1}/e, instead of d ≤ 2^{k−3}. This, however, does not change the principle of the proof.
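The random-coloring experiment behind the proof is easy to carry out in code. The following minimal Python sketch draws uniform random 2-colorings of a small hypothetical hypergraph until none of its hyperedges is monochromatic; the hypergraph and vertex names are our own illustration, not taken from the survey.

```python
import random

def is_proper_2_coloring(edges, coloring):
    """True if no hyperedge is monochromatic under the coloring."""
    return all(len({coloring[v] for v in e}) > 1 for e in edges)

# Hypothetical hypergraph: 3-element hyperedges over 6 vertices.
# (Small enough that random trials succeed quickly; Theorem 1 only
# *guarantees* 2-colorability when d <= 2^(k-3).)
edges = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (0, 4, 5)]
vertices = range(6)

random.seed(0)
while True:
    coloring = {v: random.choice("RB") for v in vertices}
    if is_proper_2_coloring(edges, coloring):
        break
print(coloring)  # a 2-coloring with no monochromatic hyperedge
```

Note that this naive sampling is only practical for tiny instances; Section 7 discusses why, and how the Moser-Tardos algorithm does better.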
It is fascinating to note that applying the probabilistic claim of the LLL, we have obtained the completely deterministic result of Theorem 1. Furthermore, to the author's best knowledge, no proof is known of Theorem 1 without the use of the Lovász Local Lemma.

Naming Conventions
Before looking at stronger versions of the LLL, as well as related results, let us summarize some naming conventions that have become commonplace over the years.

• Why "local?" The "local" adjective in the name refers to the situation that each event is typically dependent only on a small number of others. This can be visualized by a dependency graph, in which each event is represented by a vertex, and the event is mutually independent of all events to which its vertex is not connected. In this graph representation, each event is dependent only on its neighborhood, that is, subject to local dependencies only. This explains the "local" in the name.

Remark 2. The dependency graph is not the same as the graph that we could obtain by connecting two nodes whenever they are dependent. For example, if all the events are pairwise independent, but not mutually independent, then the latter graph would have no edge, wrongly suggesting that all the events are mutually independent.
• Naming the lemma after Lovász. The first version of the LLL was published in the paper [2], jointly authored by Paul Erdős and László Lovász. Then, why is the lemma not called "Erdős-Lovász Local Lemma?" The reason is that Erdős insisted in every lecture he gave about the subject that this result was created by Lovász alone, even though they applied it together in their joint paper. This is mentioned in an interview with Lovász, see [3] (in Hungarian).

• Symmetric vs. asymmetric versions. The original version of the LLL (Lemma 1) is symmetric in the sense that each event is treated equally: they all have to satisfy the same probability bound (3). In the asymmetric variant (see Section 4.2), the events may satisfy different probability bounds.

Strengthening the Symmetric Version
A natural question in connection with the original symmetric LLL (Lemma 1) is this: how large can Pr(A_i) be, such that the conclusion Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0 is still guaranteed to hold? Note that a larger value of Pr(A_i) makes Pr(Ā_i) = 1 − Pr(A_i) smaller, so the nonemptiness of Ā_1 ∩ ⋯ ∩ Ā_n becomes less likely. To formalize this, let us introduce a parameter p > 0 and replace the condition (3) with

Pr(A_i) ≤ p for every i = 1, ..., n.    (5)

Then, the largest value of p for which the conclusion Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0 still remains valid provides us with the strongest form of the symmetric LLL. Note that the value of d is kept fixed.
Shearer [4] investigated the above question and computed a specific function f(d), which is the supremum of p values in (5) for which the conclusion Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0 remains valid, for a fixed d:

f(d) = (d − 1)^{d−1} / d^d for d ≥ 2, and f(1) = 1/2.    (6)

Since f(d) is the supremum (not necessarily the maximum) of p values in (5) for which the conclusion Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0 holds, we can formulate the strongest version of the symmetric LLL this way:

Lemma 3 (Strongest symmetric LLL). Let A_1, ..., A_n be events, such that each event is mutually independent of all the others, except at most d ≥ 1 of them. Let f(d) be given by (6). If Pr(A_i) < f(d) holds for every i, then Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0.
For example, for d = 3 the condition (3) requires Pr(A_i) ≤ 1/12 ≈ 0.083, while f(3) = 4/27 ≈ 0.148, so in this case the latter allows an about 77.7% larger value for Pr(A_i) than (3). Since, however, the formula (6) for f(d) is relatively complicated, one may look for a simpler bound. Spencer [5] proved that p = 1/(e(d+1)) suffices in place of 1/(4d) in (3), where e = 2.718... is the base of the natural logarithm. Harvey and Vondrák [6] further improved it to 1/(ed).
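The gap between these thresholds is easy to tabulate. The short Python sketch below compares, for a few values of d, the original bound 1/(4d), Spencer's 1/(e(d+1)), Harvey and Vondrák's 1/(ed), and Shearer's optimal f(d); the function name is our own.

```python
# Compare symmetric-LLL probability thresholds for a given degree d:
# the original 1/(4d), Spencer's 1/(e(d+1)), Harvey-Vondrak's 1/(e*d),
# and Shearer's optimal f(d) = (d-1)^(d-1) / d^d (for d >= 2).
import math

def shearer_threshold(d):
    return 0.5 if d == 1 else (d - 1) ** (d - 1) / d ** d

for d in (2, 3, 5, 10):
    original = 1 / (4 * d)
    spencer = 1 / (math.e * (d + 1))
    harvey_vondrak = 1 / (math.e * d)
    print(f"d={d}: 1/(4d)={original:.4f}  1/(e(d+1))={spencer:.4f}  "
          f"1/(ed)={harvey_vondrak:.4f}  f(d)={shearer_threshold(d):.4f}")
```

For d = 3 this reproduces the comparison above: f(3) = 4/27 ≈ 0.148 versus 1/12 ≈ 0.083.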

Asymmetric LLL
This is the more general version of the LLL, published by Spencer [5], in which different events are allowed to have different probability bounds. We again consider arbitrary events A_1, ..., A_n, and express their dependencies by a dependency graph G = (V, E). In this graph, V = {1, ..., n}, and each event A_i is assumed mutually independent of the set of events {A_j | (i, j) ∉ E}. In other words, each event is mutually independent of all its non-neighbors in the dependency graph. The asymmetric LLL states the following: if there exist real numbers x_1, ..., x_n ∈ [0, 1) such that

Pr(A_i) ≤ x_i ∏_{(i,j)∈E} (1 − x_j) for every i = 1, ..., n,

then Pr(Ā_1 ∩ ⋯ ∩ Ā_n) ≥ ∏_{i=1}^{n} (1 − x_i) > 0.

Further Application Examples
We have already presented an application (hypergraph 2-coloring) in Section 2. In this section, we present further interesting applications.

Disjoint Paths
Let G be a graph with two distinguished vertices u ≠ v. Further, let Q_1, ..., Q_n be given sets of u − v paths, such that each set contains m different paths. We would like to select a path P_i ∈ Q_i for each i, such that the selected paths are edge-disjoint. Under what conditions can this be done?
Assume the family of path systems is diffuse in the following sense: any P ∈ Q_i is edge-disjoint from all but at most k paths in Q_j, for every j ≠ i. Let us now pick a path P_i ∈ Q_i uniformly at random from each Q_i. Let E_{i,j} denote the event that P_i and P_j are not edge-disjoint. Then, we have Pr(E_{i,j}) ≤ k/m, since, for any P_i ∈ Q_i, the other path P_j ∈ Q_j can be chosen m different ways, and among these at most k can share an edge with P_i, by assumption. Further, any event E_{i,j} is mutually independent of all events E_{s,t} for which {i, j} ∩ {s, t} = ∅ holds. The reason is that E_{i,j} involves paths selected from Q_i and Q_j, and these are selected independently from the paths picked from Q_s and Q_t whenever there is no common index, i.e., {i, j} ∩ {s, t} = ∅. This implies that d < 2n, where d is the maximum degree in the dependency graph of the E_{i,j} events. Therefore, if we satisfy k/m ≤ 1/(8n), then the upper bound

Pr(E_{i,j}) ≤ k/m ≤ 1/(8n) < 1/(4d)

holds, due to d < 2n. It means that the E_{i,j} events satisfy the condition of the LLL (Lemma 1), yielding

Pr(⋂_{i≠j} Ē_{i,j}) > 0.

Consequently, there must exist a path system P_i ∈ Q_i, i = 1, ..., n, such that all the paths are edge-disjoint. Thus, by means of the LLL, we have proved the following result:

Theorem 2. Let G be a graph with two distinguished vertices u ≠ v, and let Q_1, ..., Q_n be sets of u − v paths in G, such that each set contains m different paths. Assume that any P ∈ Q_i is edge-disjoint from all but at most k paths in Q_j, for every j ≠ i. Then, whenever k ≤ m/(8n) holds, it is possible to select a path from each Q_i such that all the selected paths are edge-disjoint.
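The random experiment in the proof — pick one path per set, check pairwise edge-disjointness — can be sketched directly. The instance below (paths as vertex sequences) is a hypothetical toy example of our own, not from the survey.

```python
import random
from itertools import combinations

def edges_of(path):
    """Edge set of a path given as a vertex sequence."""
    return {frozenset(e) for e in zip(path, path[1:])}

def select_disjoint_paths(path_sets, trials=1000, seed=0):
    """Pick one path per set uniformly at random until all picks are
    pairwise edge-disjoint (the random experiment in the proof above)."""
    rng = random.Random(seed)
    for _ in range(trials):
        picks = [rng.choice(q) for q in path_sets]
        if all(edges_of(p).isdisjoint(edges_of(q))
               for p, q in combinations(picks, 2)):
            return picks
    return None

# Hypothetical instance: two sets of u-v paths with u = 0, v = 3.
Q1 = [(0, 1, 3), (0, 4, 3)]
Q2 = [(0, 2, 3), (0, 5, 3)]
picks = select_disjoint_paths([Q1, Q2])
```

Theorem 2 guarantees that under the stated condition such a selection exists; the sketch merely searches for one by repeated sampling.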

k-SAT
Let us recall some well-known concepts about Boolean formulas. Such a formula is a CNF (Conjunctive Normal Form) formula if it is the conjunction (logical AND) of clauses, where each clause is the disjunction (logical OR) of literals. A literal is either a Boolean variable or its negation. A formula is a k-CNF formula if every clause contains k literals. We assume that the same variable cannot occur multiple times in a clause. The problem called k-SAT is this: given a k-CNF formula, is it satisfiable? This is a well-known NP-complete problem, but in some cases, the LLL allows us to quickly show that certain k-CNF formulas are satisfiable.
Let us say that two clauses in a k-CNF formula overlap, if there is a variable occurring in both (regardless of whether the variable is negated or not in the clauses). We can show via the LLL that if any clause overlaps with at most 2 k−2 other clauses in a k-CNF formula Φ, then Φ is satisfiable.
Assign random truth values to the variables independently. Let C_i, i = 1, ..., n, be the clauses and let A_i denote the event that C_i is not satisfied by the random truth assignment. There are 2^k possible truth assignments to the k literals in C_i, and only one of them makes C_i false: the one in which all literals of C_i are false. Therefore, Pr(A_i) = 2^{−k}. Furthermore, observe that each event A_i is mutually independent of the set of those A_j that correspond to clauses which do not overlap with C_i. Consequently, the maximum degree d in the dependency graph of the events A_1, ..., A_n satisfies d ≤ 2^{k−2}. Therefore, we can write

Pr(A_i) = 2^{−k} = 1/(4 · 2^{k−2}) ≤ 1/(4d).

Then, we can conclude from the LLL that Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0 holds, which means that at least one of the random truth assignments satisfies all clauses, that is, Φ is satisfiable. It is interesting to note that this does not depend on n, the number of clauses, and it is also independent of the number of variables. Thus, by means of the LLL, we have proved the following result:

Theorem 3. If in a k-CNF formula Φ any clause overlaps with at most 2^{k−2} other clauses, then Φ is satisfiable.
For example, if each clause contains k = 12 literals, and each clause overlaps with at most 2^{k−2} = 1024 other clauses, then the formula is satisfiable, no matter how many clauses and variables it has.
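Checking the overlap condition of Theorem 3 for a concrete formula is mechanical. The sketch below uses a DIMACS-style clause encoding (variable v as the integer v, its negation as −v); the function names and the sample formula are our own illustration.

```python
def overlap_degrees(clauses):
    """For each clause, count how many other clauses share a variable
    (negated or not).  Clauses are tuples of nonzero ints: variable v
    appears as v, its negation as -v."""
    var_sets = [{abs(lit) for lit in c} for c in clauses]
    return [sum(1 for j, other in enumerate(var_sets)
                if j != i and not vs.isdisjoint(other))
            for i, vs in enumerate(var_sets)]

def lll_guarantees_satisfiable(clauses, k):
    """Theorem 3: a k-CNF is satisfiable if every clause overlaps
    with at most 2^(k-2) others."""
    return max(overlap_degrees(clauses)) <= 2 ** (k - 2)

# Hypothetical 3-CNF: each clause overlaps with at most 2^(3-2) = 2 others.
clauses = [(1, 2, 3), (-1, 2, 4), (5, 6, 7), (-5, -6, 8)]
print(lll_guarantees_satisfiable(clauses, k=3))
```

When the check returns True, satisfiability is guaranteed without ever running a SAT solver — though, as discussed in Section 7, finding an actual assignment takes more work.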

Independent Sets in Graphs
Consider a graph with vertex set V and maximum degree ∆. Assume V is partitioned as V = V_1 ∪ ... ∪ V_r, where each V_i satisfies |V_i| ≥ ℓ∆, for some positive integer ℓ. We ask: what value of ℓ guarantees that one can always select a vertex from each V_i, such that the selected vertices form an independent set, i.e., a set in which no two vertices are connected?
First, observe that we can assume |V_i| = ℓ∆, since otherwise we can just remove vertices from V_i to achieve the equality. The reason is that if the claim holds with fewer vertices in V_i, then it certainly must hold in the original graph. Set k = ℓ∆; this will be the size of each V_i, after possibly removing some vertices. Now, pick a random vertex from each V_i. For each edge e, let A_e denote the event that e connects two of the randomly selected vertices. This can only happen if e connects vertices in two different sets V_i, V_j. Since |V_i| = |V_j| = k, Pr(A_e) = 1/k² holds for any such e, due to the random vertex selection (and Pr(A_e) = 0 for edges inside a single V_i).
Observe now that if e has its endpoints in the sets V_i, V_j, then for another edge f, the event A_f can only depend on A_e if f has an endpoint in V_i ∪ V_j. How many such edges f can exist? The size of V_i ∪ V_j is 2k, and each vertex is adjacent to at most ∆ edges. Therefore, the maximum degree in the dependency graph of the A_e events is at most d = 2k∆.
Let us now apply the LLL, in its simplest form, presented in Lemma 1. We need to satisfy the condition Pr(A_e) ≤ 1/(4d). Due to Pr(A_e) ≤ 1/k², it is satisfied if

1/k² ≤ 1/(4d) = 1/(8k∆)

holds. From this, and from k = ℓ∆, we obtain

8k∆ ≤ k², that is, 8∆ ≤ k = ℓ∆.

After simplification, this yields ℓ ≥ 8. Thus, we have proved: if each V_i has at least 8∆ vertices, then one can always select a vertex from each V_i such that the selected vertices form an independent set.
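The proof's random experiment can again be run directly: pick one vertex per part uniformly and accept if no edge joins two picks. The tiny instance below is a hypothetical example of our own.

```python
import random
from itertools import combinations

def random_independent_transversal(parts, edges, trials=10000, seed=1):
    """Repeat the random experiment from the proof: pick one vertex
    from each part uniformly, and accept if no edge joins two picks."""
    edge_set = {frozenset(e) for e in edges}
    rng = random.Random(seed)
    for _ in range(trials):
        picks = [rng.choice(part) for part in parts]
        if not any(frozenset(pair) in edge_set
                   for pair in combinations(picks, 2)):
            return picks
    return None

# Hypothetical instance: two parts, one edge between them, so the only
# "bad" selection is picking both endpoints of the edge (0, 2).
parts = [[0, 1], [2, 3]]
edges = [(0, 2)]
picks = random_independent_transversal(parts, edges)
```

The LLL-based result above guarantees that for parts of size at least 8∆ such an independent transversal always exists; the sketch merely searches for one.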

Graph Coloring with Additional Constraints
The well-known task of graph coloring means that we need to assign a color to each vertex of the graph such that neighboring vertices have different colors. If a coloring satisfies this requirement, it is called a proper coloring.
The standard question regarding graph coloring is: how many colors are needed for a proper coloring? A simple basic fact is that in a graph with maximum degree ∆, it is always enough to use at most ∆ + 1 colors: index the colors by 1, 2, . . . and color the vertices one by one, always using the lowest indexed color that has not occurred among the already colored neighbors of the considered vertex. It is easy to see that this leads to the use of at most ∆ + 1 colors.
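The greedy procedure just described translates to a few lines of code. In the Python sketch below the adjacency-list encoding and function name are our own; colors are indexed from 0 rather than 1.

```python
def greedy_coloring(adj):
    """Color vertices one by one, always using the lowest color not yet
    used by an already-colored neighbor; uses at most Delta + 1 colors."""
    color = {}
    for v in adj:
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

# A 5-cycle: maximum degree 2, so at most 3 colors (0, 1, 2) are used.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_coloring(adj)
```

Since each vertex has at most ∆ colored neighbors when its turn comes, some color among the first ∆ + 1 is always free, which is exactly the argument in the text.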
What if, however, we want a coloring that also satisfies some extra requirements? Such an extra requirement is that the frequency of color occurrences in the neighborhood of any vertex is limited, as defined below.

Definition 1. Let β ≥ 1 be an integer. A proper coloring is called β-frugal if no color appears more than β times in the neighborhood of any vertex.
Hind, Molloy, and Reed [7] analyzed the number of colors needed for β-frugal colorings. They found a relationship between the maximum degree and the number of colors that suffice, but this is much more complex than the simple ∆ + 1 bound mentioned above for conventional coloring. Using the asymmetric Lovász Local Lemma, they proved the theorem below (we present a simplified version given by Sinclair [8]):

Theorem 5. Let G be a graph. If the maximum degree ∆ of G satisfies ∆ ≥ β^β, then G has a β-frugal proper coloring using at most 16∆^{1+1/β} colors.

Packet Scheduling in Networks
Consider a communication network, which is modeled by a directed graph. We have a given system P 1 , . . . , P r of directed paths in the network, and we want to send packets along these paths. The paths are edge-simple, which means that no path repeats any edge. Let us consider a synchronous communication model, in which for each time unit, one packet can traverse an edge. Assume a set of packets is given for each path. We would like to achieve that each packet is delivered on its respective path in the shortest possible time. Since the paths and packet sets are given, our only degree of freedom is to decide, whenever several packets compete for an edge, which one goes first, i.e., in which order they will traverse the edge (recall that only one can do it in one time unit). In other words, we are looking for the optimal packet scheduling policy. Furthermore, we would like to achieve it with a constant buffer size at every node.
Let us define some key parameters for this problem: congestion c is the maximum number of paths that share a directed edge; dilation d is the maximum path length that occurs in the path system. Let T be the smallest time in which each packet can be delivered, no matter how many packets wait for each path. The time for each individual packet is measured from the instant when it starts traversing the first edge on its path. A lower bound on T in the worst case is T ≥ max{c, d}.
The justification for this lower bound is the following:

• A path can be d edges long, and traversing such a path clearly takes d time units, implying T ≥ d.

• An edge can be included in c paths. If a packet arrives at the edge on each of these c paths at the same time, then one of them will suffer c − 1 time units of delay, plus its own traversal time, resulting in T ≥ c in the worst case.

• Then, T ≥ d and T ≥ c together yield the worst-case lower bound T ≥ max{c, d}.
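Computing the two parameters for a concrete path system is straightforward. The Python sketch below does so for a small hypothetical system of directed paths given as vertex sequences; the encoding is our own.

```python
def congestion_and_dilation(paths):
    """Congestion c: max number of paths sharing a directed edge;
    dilation d: maximum path length.  Paths are vertex sequences."""
    load = {}
    for p in paths:
        for e in zip(p, p[1:]):
            load[e] = load.get(e, 0) + 1
    c = max(load.values())
    d = max(len(p) - 1 for p in paths)
    return c, d

# Hypothetical path system: the directed edge (1, 2) is shared by
# two paths, and the longest path has 3 edges.
paths = [(0, 1, 2, 3), (4, 1, 2, 5), (0, 1, 6)]
c, d = congestion_and_dilation(paths)
```

Here c = 2 and d = 3, so the trivial lower bound is T ≥ max{c, d} = 3, and Theorem 6 promises a schedule with O(c + d) delivery time.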
A remarkable thing is that this essentially trivial lower bound can be achieved, up to a constant factor, no matter how many packets wait for each route. Furthermore, a constant size buffer suffices at every node. Specifically, by a sophisticated use of the Lovász Local Lemma, Leighton, Maggs, and Richa [9] proved the following theorem: Theorem 6. In the above model, there is always a packet schedule that achieves O(c + d) delivery time for each packet, with a constant queue length at every node.
Observe that c + d ≤ 2 max{c, d}; so O(c + d) is indeed optimal up to a constant factor.

The Algorithmic Lovász Local Lemma
The original LLL (Lemma 1) and its variants presented in Section 4 are all existence theorems: they prove that there exists a realization in which none of the events A_1, ..., A_n occurs, but do not tell anything about how to find such a realization algorithmically.
In many cases, it is very natural to look for an algorithmic solution. For example, in the case presented in Example 3 (see Section 5.2), we have shown that in a k-CNF formula Φ, if every clause overlaps with at most 2^{k−2} other clauses, then Φ is satisfiable. In most applications, however, it is not enough just to know that a formula is satisfiable; we also want to find an actual satisfying truth assignment. The original LLL, as applied to this problem, does not provide it.
A naive approach would be to try rejection sampling to solve the above problem: repeatedly pick random truth assignments until one of them eventually satisfies all clauses. Since the LLL guarantees Pr(Ā_1 ∩ ⋯ ∩ Ā_n) > 0, we must find a satisfying truth assignment after a finite number of trials, with probability 1. Unfortunately, this approach can be very inefficient, because the probability Pr(Ā_1 ∩ ⋯ ∩ Ā_n), although guaranteed to be positive by the LLL, is typically exponentially small. This would therefore lead to an exponential time algorithm, essentially not better than exhaustive search.
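For completeness, the naive rejection-sampling approach looks as follows in Python (DIMACS-style literals; the function name and sample formula are our own). It works with probability 1, but the trial count can grow exponentially with the formula size.

```python
import random

def rejection_sample_sat(clauses, n_vars, max_trials=100000, seed=0):
    """Naive rejection sampling: draw uniform assignments until one
    satisfies every clause.  Succeeds eventually with probability 1,
    but the expected number of trials can be exponential."""
    rng = random.Random(seed)
    for trials in range(1, max_trials + 1):
        assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused
        if all(any(assign[l] if l > 0 else not assign[-l] for l in c)
               for c in clauses):
            return assign, trials
    return None, max_trials

# Hypothetical small 3-CNF (variable v as v, its negation as -v).
clauses = [(1, 2, 3), (-1, 2, 4), (-2, -3, 4)]
assign, trials = rejection_sample_sat(clauses, n_vars=4)
```

On such a tiny formula a satisfying assignment is found almost immediately; the point of the Moser-Tardos algorithm below is to avoid the exponential blow-up on large instances.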
The first proof of the possibility of an efficient algorithmic LLL was published by Beck [10] in 1991. His algorithm, however, was somewhat technical, and required a stronger condition on neighborhoods than the original LLL. After a sequence of improvements, the most elegant solution was published by Moser and Tardos [11]. It is worth mentioning that the authors received the prestigious Gödel Prize in 2020 for this work.
The Moser-Tardos algorithm can be best presented via a formulation called the variable version of the LLL. Let X_1, ..., X_m be mutually independent random variables. Denote the "bad" events that we want to avoid by A_1, ..., A_n and let p_i = Pr(A_i) be the probability of A_i, i = 1, ..., n. Let vbl(A_i) ⊆ {X_1, ..., X_m} denote the subset of variables on which A_i depends. In particular, if vbl(A_i) ∩ vbl(A_j) = ∅, then A_i and A_j are independent. The dependency graph G = (V, E) is defined by V = {1, ..., n} and

E = {(i, j) | i ≠ j and vbl(A_i) ∩ vbl(A_j) ≠ ∅}.

With the above framework, the Moser-Tardos algorithm is surprisingly simple and elegant, see Algorithm 1. We present the most concise version, found in Szegedy [12].
The correctness and complexity of the algorithm is captured by the following theorem, which implies that this is a probabilistic polynomial time algorithm.
• Assign a random value to each variable X_1, ..., X_m independently.

• While there is an i, 1 ≤ i ≤ n, such that the bad event A_i holds under the current assignment, do: choose the smallest such i and resample all variables X_j ∈ vbl(A_i). That is, choose new random values independently for each X_j ∈ vbl(A_i).

• Return the current variable assignment.
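The algorithm above, specialized to k-CNF formulas, can be sketched in a few lines of Python. Here the bad event A_i is "clause C_i is unsatisfied" and vbl(A_i) is the clause's variable set; the DIMACS-style encoding and the sample formula are our own illustration, not from [11] or [12].

```python
import random

def moser_tardos_sat(clauses, n_vars, seed=0):
    """RESAMPLE specialized to k-CNF: while some clause is violated,
    resample its variables with fresh independent random values."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused

    def violated(c):
        return not any(assign[l] if l > 0 else not assign[-l] for l in c)

    resamplings = 0
    while True:
        # The smallest index i with A_i holding, or None if none is bad.
        bad = next((c for c in clauses if violated(c)), None)
        if bad is None:
            return assign, resamplings
        for v in {abs(l) for l in bad}:
            assign[v] = rng.random() < 0.5
        resamplings += 1

# Hypothetical small 3-CNF (variable v as v, its negation as -v).
clauses = [(1, 2, 3), (-1, 2, 4), (-2, -3, 4), (1, -4, 5)]
assign, resamplings = moser_tardos_sat(clauses, n_vars=5)
```

Under the LLL conditions, Theorem 7 below bounds the expected number of resampling steps, which makes this a probabilistic polynomial time algorithm.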

Theorem 7. If the conditions of the asymmetric LLL hold, i.e., there are real numbers x_1, ..., x_n ∈ [0, 1) such that

Pr(A_i) ≤ x_i ∏_{(i,j)∈E} (1 − x_j) for every i = 1, ..., n,

then the algorithm resamples each vbl(A_i) at most an expected number of x_i/(1 − x_i) times before finding an assignment that avoids all the A_i. Therefore, the expected overall number of resamplings is at most

∑_{i=1}^{n} x_i/(1 − x_i).

Furthermore, if the maximum degree of the dependency graph is D, and p_i ≤ 1/(e(D+1)) (∀i) holds, then vbl(A_i) is resampled at most an expected number of 1/D times, and the total expected number of resamplings is bounded by n/D.
While the algorithm itself is surprisingly straightforward and elegant, the proof of the above Theorem is far from simple, see Moser and Tardos [11]. It is interesting to note that the RESAMPLE algorithm is similar to the WalkSAT heuristic, which is used to solve general Boolean satisfiability problems, see, e.g., Hoos and Stützle [13].

Outlook
Let us briefly list some other recent research directions that are being pursued in connection with the Lovász Local Lemma and its algorithmic version, without going into details.

• Derandomization of the Moser-Tardos algorithm is possible, leading to a deterministic polynomial time algorithm. This was already addressed by Moser and Tardos in [11], and further developed by Chandrasekaran, Goyal, and Haeupler [14].

• Some recent papers deal with sampling and counting problems related to the LLL, see, e.g., Jain, Pham, and Vuong [15].

• Approximate counting is also considered in this context, such as approximately counting the satisfying truth assignments of a CNF formula, see Moitra [16].

• A quantum version of the LLL has also been introduced, referred to as the Quantum Lovász Local Lemma, see Ambainis, Kempe, and Sattath [17] and He, Li, Sun, and Zhang [18]. The key difference between the Quantum Lovász Local Lemma and the classical LLL is that in the quantum version, the events are substituted with subspaces, and the event probabilities are substituted with subspace dimensions. This makes it more suitable for quantum computing applications.

• The LLL is also being used to analyze parallel and distributed algorithms, see, e.g., Chang, He, Li, Pettie, and Uitto [19].

• There are attempts to extend the LLL to an infinite setting, see Bernshteyn [20].

Conclusions
We have surveyed a classic and fundamental result, known as the Lovász Local Lemma. We reviewed some variants, related results, and applications, as well as the algorithmic version. We were delighted with the intriguing feature that a purely probabilistic claim can be used to prove completely deterministic statements.
As a closing conclusion, we can claim that in a high-level, abstract sense, the Lovász Local Lemma also provides efficiency in a way that is somewhat reminiscent of the fact that general graph problems often become more tractable in bounded degree graphs. Of course, the LLL case is more complicated, since it involves not only neighborhood sizes but also probabilities; still, it deals with bounded-degree-style structures. As an example, let us refer to the k-SAT problem (see Section 5.2) that becomes solvable in polynomial time whenever the clause overlaps are limited.