2. Short History of Large Gap Results
Starting with the papers [4,22] of Erdős, all the results on large gaps between primes are based on modifications of the Erdős–Rankin method. Its basic features are as follows:
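Let $x$ be a large parameter. All steps are considered for $x \to \infty$. Let
$$P(x) = \prod_{p < x} p .$$
By the prime number theorem we have
$$P(x) = e^{(1+o(1))x} .$$
A system of congruence classes
$$a_p \bmod p, \qquad p < x, \tag{2}$$
(with $p$ running through the primes less than $x$) is constructed, such that the congruence classes $a_p \bmod p$ cover the interval $[1, y]$.
Associated with the system (2) is the system of congruences
$$m \equiv -a_p \pmod p, \qquad p < x. \tag{3}$$
By the Chinese Remainder Theorem the system (3) has a unique solution $m$ modulo $P(x)$; we choose the representative with $P(x) < m \le 2P(x)$.
Let $k \in \mathbb{Z}$, $1 \le k \le y$. Then, there is a $j$, $p_j < x$, such that
$$k \equiv a_{p_j} \pmod{p_j} .$$
From (2) and (3)
$$m + k \equiv 0 \pmod{p_j} .$$
Since $m + k > m > x > p_j$, all integers $m + k$, $1 \le k \le y$, are composite. Hence, if $p_n$ denotes the largest prime $\le m$, then $p_{n+1} - p_n > y$, and by Bertrand's postulate $p_{n+1} \le 2(m + y) = e^{(1+o(1))x}$. It follows that $G(X) \ge y$ for some $X = e^{(1+o(1))x}$, a large gap result.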
The large gap problem has thus been reduced to a covering problem: find a system of congruence classes $a_p \bmod p$, $p < x$, that covers the interval $[1, y]$, where $y$ is as large as possible.
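The reduction just described can be checked on a toy scale. The following Python sketch is illustrative only. It assumes tiny parameters ($x = 12$, i.e., the primes 2, 3, 5, 7, 11). Instead of the sieving steps of the Erdős–Rankin method, it searches all systems (2) by brute force. The helper names (`best_covering`, `erdos_rankin_start`, and so on) are hypothetical. The sketch solves (3) with the Chinese Remainder Theorem and confirms that $m + 1, \ldots, m + y$ are composite.

```python
from itertools import product


def primes_below(x):
    """Primes p < x by trial division (adequate for toy sizes)."""
    return [p for p in range(2, x) if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def covered_length(classes):
    """Largest y such that every k in [1, y] lies in some class a_p mod p."""
    k = 1
    while any(k % p == a for p, a in classes.items()):
        k += 1
    return k - 1


def best_covering(x):
    """Brute-force search over all systems (a_p mod p)_{p < x}."""
    ps = primes_below(x)
    best_y, best = -1, None
    for choice in product(*(range(p) for p in ps)):
        classes = dict(zip(ps, choice))
        y = covered_length(classes)
        if y > best_y:
            best_y, best = y, classes
    return best_y, best


def crt(residues):
    """Solve m = r_p (mod p) for pairwise coprime moduli; return (m mod M, M)."""
    m, M = 0, 1
    for p, r in residues.items():
        t = ((r - m) * pow(M, -1, p)) % p  # ensures m + M*t = r (mod p)
        m, M = m + M * t, M * p
    return m % M, M


def erdos_rankin_start(classes):
    """Solve system (3), m = -a_p (mod p), and shift m into (P(x), 2P(x)]."""
    m0, P = crt({p: (-a) % p for p, a in classes.items()})
    return m0 + P, P


y, classes = best_covering(12)
m, P = erdos_rankin_start(classes)
print(y, classes, m)
for k in range(1, y + 1):
    # m + k is divisible by some p < x and exceeds it, hence composite
    assert any((m + k) % p == 0 and m + k > p for p in classes)
```

Brute force is feasible here only because $P(12) = 2310$ is tiny. For large $x$, the sieving steps described next construct such coverings without any search.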
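In all papers since Erdős [4,22], the covering system (2) has been constructed by a sequence of sieving steps. The set of primes $p < x$ is partitioned into a disjoint union of subsets:
$$\{ p : p < x \} = \mathcal{P}_1 \cup \mathcal{P}_2 \cup \cdots \cup \mathcal{P}_h .$$
Associated with each sieving step $i$, $1 \le i \le h$, is a choice of congruence classes $a_p \bmod p$ for $p \in \mathcal{P}_i$. We also consider the sequence $R_0 \supseteq R_1 \supseteq \cdots \supseteq R_h$ of residual sets. It is recursively defined as follows:
The 0-th residual set $R_0$ covers the entire interval $[1, y]$. Thus,
$$R_0 = [1, y] \cap \mathbb{Z} .$$
The $i$-th residual set $R_i$ is obtained by removing from $R_{i-1}$ all the integers congruent to $a_p \bmod p$ for some $p \in \mathcal{P}_i$. The sequence $(R_i)_{0 \le i \le h}$ is complete if $R_h = \emptyset$, that is, if all integers in $[1, y]$ have been removed. For a complete sequence of sieving steps the union
$$\bigcup_{i=1}^{h} \bigcup_{p \in \mathcal{P}_i} \{ n \in [1, y] : n \equiv a_p \bmod p \}$$
thus covers all of $[1, y]$, and the choice of the classes $a_p \bmod p$ in (2) gives a covering system of the desired kind.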
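The recursive definition of the residual sets translates directly into code. In this minimal sketch the sets $\mathcal{P}_i$ and the classes $a_p$ are supplied by hand. They are a hypothetical toy choice that covers $[1, 9]$ with the primes 2, 3, 5, 7. Completeness is checked by testing whether the last residual set is empty.

```python
def residual_sets(y, steps):
    """Return [R_0, R_1, ..., R_h]; each sieving step is a dict {p: a_p}."""
    R = list(range(1, y + 1))  # R_0 = [1, y] ∩ Z
    seq = [R]
    for classes in steps:  # step i uses the primes of P_i
        R = [n for n in R if all(n % p != a % p for p, a in classes.items())]
        seq.append(R)
    return seq


def is_complete(seq):
    """A sequence of sieving steps is complete if the last residual set is empty."""
    return not seq[-1]


# Hypothetical toy steps: P_1 = {2}, P_2 = {3}, P_3 = {5, 7}
seq = residual_sets(9, [{2: 1}, {3: 2}, {5: 1, 7: 4}])
for i, R in enumerate(seq):
    print(i, R)
print("complete:", is_complete(seq))
```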
In all versions of the Erdős–Rankin method, the first sieving steps have been very similar.
We describe, with minor modifications adjusting to our notation, the construction of the covering system (2) in Erdős [4,22].
One fixes auxiliary parameters depending on $x$, among them a smoothness bound $Z$, and defines the sets $\mathcal{P}_i$ of primes according to their size. For the first two sieving steps, one chooses the congruence classes $a_p \bmod p$, $p \in \mathcal{P}_1 \cup \mathcal{P}_2$, explicitly (for the precise choices we refer to [4,22]). A simple consideration shows that the second residual set $R_2$ is the union of a set $Q$ of prime numbers with a set of $Z$-smooth integers, i.e., integers whose largest prime factor is $\le Z$. A crucial fact in all variants of the Erdős–Rankin method is that the number of smooth integers is very small. This fact was established by Rankin [5] and de Bruijn [23].
A central idea of Rankin’s method is “Rankin’s trick”. Let us write $P^+(m)$ for the largest prime factor of $m$. Let $\sum'$ mean summation over all integers $n$ with $P^+(n) \le Z$. Then, one has for $0 < \eta < 1$:
$$\#\{ n \le y : P^+(n) \le Z \} \le {\sum_{n}}' \left( \frac{y}{n} \right)^{1-\eta} = y^{1-\eta} \prod_{p \le Z} \left( 1 - \frac{1}{p^{1-\eta}} \right)^{-1} .$$
The bound needed follows by evaluating the product by the prime number theorem and by choosing $\eta$ optimally.
Thus, the second residual set consists essentially of prime numbers only; the number of $Z$-smooth numbers in it is negligible.
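Rankin's trick can be compared numerically with a direct count of smooth integers. The sketch below uses small illustrative values ($y = 5000$, $Z = 7$). It picks $\eta$ from a simple grid rather than optimizing analytically. It therefore checks the inequality itself, not the asymptotic bound used in the method.

```python
from math import prod


def largest_prime_factor(n):
    """P^+(n), with the convention P^+(1) = 1."""
    best, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            best, n = d, n // d
        d += 1
    return max(best, n) if n > 1 else best


def smooth_count(y, Z):
    """#{n <= y : P^+(n) <= Z}, counted by brute force."""
    return sum(1 for n in range(1, y + 1) if largest_prime_factor(n) <= Z)


def rankin_bound(y, Z, eta):
    """y^(1-eta) * prod_{p <= Z} (1 - p^(eta-1))^(-1), valid for 0 < eta < 1."""
    primes = [p for p in range(2, Z + 1) if largest_prime_factor(p) == p]
    return y ** (1 - eta) * prod(1 / (1 - p ** (eta - 1)) for p in primes)


y, Z = 5000, 7
best_eta = min((k / 100 for k in range(1, 100)), key=lambda e: rankin_bound(y, Z, e))
print(smooth_count(y, Z), rankin_bound(y, Z, best_eta), best_eta)
```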
In the third sieving step in Erdős [4], the classes $a_p \bmod p$, $p \in \mathcal{P}_3$, are chosen via a greedy algorithm. The primes of $\mathcal{P}_3$ are treated one after another. In each step, the congruence class modulo the next prime that contains the most elements of the current residual set is chosen, and these elements are removed.
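A minimal sketch of such a greedy step, assuming the residual set is given explicitly as a finite set of integers. Ties between equally popular classes are broken towards the smallest residue, an arbitrary choice made here. Some class modulo $p$ always contains at least a proportion $1/p$ of the current residual set, so each step shrinks it by at least the factor $1 - 1/p$.

```python
from collections import Counter


def greedy_sieve(residual, primes):
    """Greedy sieving: for each prime p in turn, take for a_p the class
    containing the most current residual elements, then remove them."""
    R = set(residual)
    classes = {}
    for p in primes:
        counts = Counter(n % p for n in R)
        # most popular class mod p; ties broken towards the smallest residue
        a = max(range(p), key=lambda c: (counts[c], -c))
        classes[p] = a
        R = {n for n in R if n % p != a}
    return classes, sorted(R)


classes, rest = greedy_sieve(range(1, 31), [2, 3, 5])
print(classes, rest)
```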
In each version of the Erdős–Rankin method, there is a weak sieving step. We do not number it, since its position in the sequence of sieving steps differs between versions. We call it the weak sieving step because only a few elements of the residual set are removed in it.
In the first paper [4] of Erdős, which is being discussed here, the weak sieving step is the fourth sieving step: one uses the primes $p \in \mathcal{P}_4$ to remove the elements of the set $R_3$.
An important quantity is the hitting number of the weak sieving step. The hitting number of a prime $p$ used in this step is defined as the number of elements of the residual set belonging to the congruence class $a_p \bmod p$. In all papers prior to [6], this hitting number was bounded below by 1. Thus, for each element $u$ of the residual set $R_3$ a prime $p \in \mathcal{P}_4$ could be found such that
$$a_p \equiv u \pmod p,$$
and thus the removal of a single element by the congruence class $a_p \bmod p$ could be guaranteed. The progress in these papers was achieved not by changing the estimate for the hitting number, but by better estimates for the number of smooth integers.
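The hitting number, and the trivial way of guaranteeing hitting number 1, can be illustrated as follows. The sketch assumes that at least as many primes as residual elements are available, as in the weak sieving step. The function names and the toy residual set are hypothetical.

```python
def hitting_number(residual, p, a):
    """Number of elements of the residual set in the class a mod p."""
    return sum(1 for n in residual if (n - a) % p == 0)


def weak_sieve_single(residual, primes):
    """Guarantee hitting number >= 1: match each residual element u to its
    own prime p and put a_p = u mod p."""
    residual = list(residual)
    if len(primes) < len(residual):
        raise ValueError("need at least as many primes as residual elements")
    return {p: u % p for u, p in zip(residual, primes)}


R = [3, 10, 17, 40]
classes = weak_sieve_single(R, [101, 103, 107, 109])
print({p: hitting_number(R, p, a) for p, a in classes.items()})
print(hitting_number(R, 7, 3))  # 3, 10 and 17 all lie in the class 3 mod 7
```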
In the paper [6] by Maier and Pomerance, the hitting number in the weak sieving step was at least 2 for a positive proportion of the primes used in this step.
A further improvement was obtained in the paper [9] by Pintz, where the hitting number was at least 2 for almost all primes used in the weak sieving step. We give a short sketch of these two papers.
The paper [6] consists of an arithmetic part and a graph-theoretic part, combined with a modification of the Erdős–Rankin method. The arithmetic information needed concerns the distribution of generalized twin primes in arithmetic progressions on average.
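We recall definitions and theorems from [6] in outline only, since the precise statements involve several auxiliary parameters. One fixes some arbitrary positive numbers and, for a given large number N, a range of moduli depending on N. For positive integers n, and for pairs of positive integers, one introduces counting functions for generalized twin primes in arithmetic progressions (where, as usual, p denotes a prime) together with the corresponding error terms. The main arithmetic result (5) is an estimate, with a fixed constant, for these error terms on average over the moduli. The result (5) is proven by application of the Hardy–Littlewood circle method. We now come to the graph-theoretic part: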
We have the following definitions:
Definition 1 ([6], Definition 4.1′). Say that a graph G is N-colored if there is a function χ from the edge set of G to $\{1, \ldots, N\}$.
In the paper [6], a graph is discussed first whose properties are idealized, and thus simpler to formulate than the properties really needed for the applications. For it, a proof of the existence of certain colored subgraphs (partial matchings) is given. Then, the graphs with the properties needed for the applications are discussed. For these, the existence of certain colored subgraphs is stated without proof, since the proof is an easy modification of the proof for the idealized graphs. For the sketch of the details, we cite ([6], Definition 4.2):
Say an N-colored graph G is K-uniform if  and there are integers  such that
(i) Each color in $\{1, \ldots, N\}$ is assigned to exactly S edges of G.
(ii) For each  and each vertex V in G, there are exactly  edges E coincident at V with color in . Thus, each vertex of G has valence T.
One has
Theorem 1 ([6], Theorem 4.1). Say G is a K-uniform, N-colored graph with N vertices, where . Then, there is a set of B mutually noncoincident edges with distinct colors such that 
We describe the construction of these edges:
Let  be as in ([6], Definition 4.2). Let , B, be the largest collection of mutually noncoincident edges with distinct colors in . After  have been chosen and , let  be the largest collection of edges of G with distinct colors in  such that the members of  are mutually noncoincident. Let  be such that  and let 
It can be shown that 
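The iterative construction can be sketched in code, under simplifying assumptions. The colors are split into blocks supplied by the caller, standing in for the color sets of ([6], Definition 4.2). Within each block, edges are taken greedily rather than as a largest collection. The sketch therefore only illustrates the notion of mutually noncoincident edges with distinct colors; the toy graph is hypothetical.

```python
def greedy_rainbow_matching(edges, color_blocks):
    """edges: list of (u, v, color); color_blocks: list of sets of colors.
    Returns mutually noncoincident edges with pairwise distinct colors,
    chosen greedily block after block (not the largest collection)."""
    used_vertices, used_colors, chosen = set(), set(), []
    for block in color_blocks:
        for u, v, c in edges:
            if c in block and c not in used_colors and not {u, v} & used_vertices:
                chosen.append((u, v, c))
                used_vertices |= {u, v}
                used_colors.add(c)
    return chosen


edges = [(1, 2, "a"), (2, 3, "b"), (3, 4, "a"), (4, 5, "b"), (5, 6, "c")]
print(greedy_rainbow_matching(edges, [{"a", "b"}, {"c"}]))
```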
We now describe the modifications suited for applications.
Definition 2 ([6], Definition 4.2′). Let K be a positive integer and let ,  be arbitrary. Say an N-colored graph G with N vertices is -uniform if there are numbers  such that
(i) For at most  exceptions, each color in  is assigned to between  and  edges of G;
(ii) If we let  denote the number of edges coincident at the vertex V with color in , then for each , but for at most  exceptional vertices V, we have  for each .
Then, we have the following result:
Theorem 2 ([6], Theorem 4.1′). Let ,  be arbitrary. There is a number  such that for each integer  there is some  with the property that each -uniform, N-colored graph with  vertices, where , has a set of B mutually noncoincident edges with distinct colors, where 
We now describe the application of the Erdős–Rankin method in the paper [6] and its combination with the arithmetic and graph-theoretic results just mentioned.
Let 
The first two sieving steps are as follows. For the system of congruence classes $a_p \bmod p$ as described in (2), we choose 
The first residual set is the disjoint union of two sets: the set of integers in $[1, y]$ divisible by some prime , and the set of v-smooth integers in $[1, y]$. Let  be the members of the second residual set that are in the first of these sets, and let  be the members of the second residual set that are in the second. Then
 where
 It is again important that the number of smooth integers is small, and it easily follows that 
For the weak sieving step, one now applies the graph-theoretic results (Theorem 2).
One defines a graph whose vertex set is . Let  Define  Let  denote the set of primes q in the interval  Let  be the graph with vertex set  such that two vertices u, v are connected by an edge if and only if $q \mid u - v$ for some prime q of this set.
Define the “color” of an edge by the prime q, so that  is a -colored graph. From the arithmetic information, combined with standard sieves, it can easily be deduced that the graphs  satisfy the conditions of the graph-theoretic result ([6], Definition 4.2). Thus, the graphs  contain sufficiently many mutually noncoincident edges with distinct colors, and thus pairs $(u, v)$ of vertices with $u \equiv v \pmod q$ for distinct primes q. We consider the system  for 
If we determine $a_q$ by $a_q \equiv u \equiv v \pmod q$, then the hitting number for the prime q is 2. Thus, in the weak sieving step, two members of the residual set are removed for each such prime q. The weak sieving step is completed by removing one member of the residual set for each of the remaining primes.
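The mechanism of the Maier–Pomerance weak sieving step can be illustrated on a toy residual set. The sketch below uses a hypothetical residual set and prime list. It builds the graph in which $u$ and $v$ are joined by an edge of color $q$ when $q \mid v - u$. It then picks mutually noncoincident edges with distinct colors greedily, not via Theorem 2. For each chosen edge it sets $a_q \equiv u \pmod q$, so that the class contains both endpoints.

```python
from itertools import combinations


def difference_graph(residual, primes):
    """Edges (u, v, q) with u < v in the residual set and q | v - u, colored by q."""
    return [(u, v, q) for u, v in combinations(sorted(residual), 2)
            for q in primes if (v - u) % q == 0]


def weak_sieve_pairs(residual, primes):
    """Greedy choice of mutually noncoincident edges with distinct colors;
    each chosen edge (u, v, q) gives a_q = u mod q, a class containing both
    u and v, so the hitting number of q is at least 2."""
    used, classes = set(), {}
    for u, v, q in difference_graph(residual, primes):
        if q not in classes and u not in used and v not in used:
            classes[q] = u % q
            used |= {u, v}
    return classes, sorted(set(residual) - used)


R = [1, 4, 7, 12, 19, 26]
classes, leftover = weak_sieve_pairs(R, [3, 5, 7])
print(classes, leftover)
```

Elements left over by the pairing would be removed one at a time with further primes, as in the last sentence above.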
The paper [9] by Pintz contains exactly the same arithmetic information as the paper [6] by Maier and Pomerance, whereas the graph-theoretic construction is different. The edges of the graphs are obtained by a random construction, and a hitting number of 2 for almost all primes in the weak sieving step is achieved.
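The order of magnitude of $G(X)$ could finally be improved in the paper [10]. The result is:
$$G(X) \ge f(X)\, \frac{\log X \log_2 X \log_4 X}{(\log_3 X)^2}$$
with $f(X) \to \infty$ for $X \to \infty$, where $\log_k$ denotes the $k$-fold iterated logarithm.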
The paper is related to the work on long arithmetic progressions consisting of primes by Green and Tao [12,13] and to work by Green, Tao and Ziegler [14] on linear equations in primes. The authors manage to remove long arithmetic progressions of primes in the weak sieving step and thus are able to obtain a hitting number tending to infinity with X. We shall not describe any more details of this paper. Simultaneously and independently, James Maynard [15] achieved progress based on multidimensional sieve methods. The authors of the paper [10] and Maynard joined their efforts in [19] to prove
$$G(X) \ge c\, \frac{\log X \log_2 X \log_4 X}{\log_3 X}$$
for a constant $c > 0$ and all sufficiently large X.
Again the hitting number in the weak sieving step tends to infinity for $X \to \infty$. In the papers [6,9] by Maier and Pomerance and by Pintz, the pairs of integers removed in the weak sieving step were interpreted as edges of a graph. Now the tuples of integers removed are seen as edges of a hypergraph. One uses a hypergraph covering theorem generalizing a result of Pippenger and Spencer [24] using the Rödl nibble method [25].
The choice of sieve weights is related to the great breakthrough results on small gaps between consecutive primes, based on the Goldston–Pintz–Yildirim (GPY) sieve and Maynard’s improvement of it. We give a short overview.
  3. Small Gaps, GPY Sieve and Maynard’s Improvement
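The first non-trivial bound was proved by Erdős [4,22], who showed that
$$\liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} < 1 - c$$
for some constant $c > 0$. By applying Selberg’s sieve, he showed that pairs of primes $p, p + h$ with a fixed difference h cannot appear too often.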
The first major breakthrough was achieved by Bombieri and Davenport [26], who showed that
$$\liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} \le \frac{2 + \sqrt{3}}{8} .$$
Let  Then,  with 
One now considers the integral 
By orthogonality, one obtains: 
One now tries to establish a lower bound for I(x). This bound can be combined with upper bounds for  for large values of m to obtain estimates  for small values of m. Thus, gaps of size  exist.
These estimates became possible by application of the Bombieri–Vinogradov theorem, proven one year before [27].
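For its formulation, the following definition will be useful:
Definition 3. Let
$$E(N, q) := \max_{(a, q) = 1} \left| \pi(N; q, a) - \frac{\pi(N)}{\varphi(q)} \right| ,$$
where $\pi(N; q, a)$ denotes the number of primes $p \le N$ with $p \equiv a \pmod q$. We say that the primes have an admissible level of distribution θ if  holds for any  and any .
The Bombieri–Vinogradov theorem now states: for any $A > 0$, there is a $B = B(A)$ such that, for $Q = N^{1/2} (\log N)^{-B}$,
$$\sum_{q \le Q} E(N, q) \ll_A \frac{N}{(\log N)^{A}} . \tag{11}$$
This implies that the primes have an admissible level of distribution $\theta = 1/2$.
Definition 4. We say that the primes have an admissible level of distribution ϑ if (11) holds for any  and any  with .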
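The error term $E(N, q) = \max_{(a,q)=1} |\pi(N; q, a) - \pi(N)/\varphi(q)|$ can be computed directly for small N. The following sketch evaluates the sum of $E(N, q)$ over $q \le Q$ for the illustrative choices $N = 10^5$ and $Q = 20$. A finite computation of this kind only shows the size of the error terms; it cannot verify an asymptotic statement such as the Bombieri–Vinogradov theorem.

```python
from math import gcd


def prime_sieve(n):
    """Primes <= n via the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_p[i]]


def phi(q):
    """Euler's totient function (direct count; fine for small q)."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)


def error_term(primes, q):
    """E(N, q): max over reduced classes a mod q of |pi(N; q, a) - pi(N)/phi(q)|."""
    counts = [0] * q
    for p in primes:
        counts[p % q] += 1
    main = len(primes) / phi(q)
    return max(abs(counts[a] - main) for a in range(q) if gcd(a, q) == 1)


N, Q = 10 ** 5, 20
primes = prime_sieve(N)
total = sum(error_term(primes, q) for q in range(1, Q + 1))
print(len(primes), total)
```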
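A great breakthrough was achieved by Goldston, Pintz and Yildirim in the paper [16]. They consider admissible k-tuples, for which we reproduce the definition:
Definition 5. A k-tuple $\mathcal{H} = \{h_1, \ldots, h_k\}$ of distinct integers is called admissible if for each prime p the number $\nu_p(\mathcal{H})$ of distinct residue classes modulo p occupied by elements of $\mathcal{H}$ satisfies $\nu_p(\mathcal{H}) < p$.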
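Admissibility can be tested mechanically. For a k-tuple only the primes $p \le k$ need to be checked, since $k$ integers can never occupy all $p > k$ classes. A short sketch:

```python
def primes_up_to(n):
    """Primes p <= n by trial division."""
    return [p for p in range(2, n + 1) if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def is_admissible(h):
    """True iff for every prime p the entries occupy fewer than p classes mod p."""
    h = list(h)
    if len(set(h)) != len(h):
        raise ValueError("entries must be distinct")
    return all(len({x % p for x in h}) < p for p in primes_up_to(len(h)))


print(is_admissible((0, 2, 6)), is_admissible((0, 2, 4)))  # True False
```

The tuple $(0, 2, 4)$ fails because it meets all three classes modulo 3.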
The two main results in the paper [16] of Goldston, Pintz and Yildirim are
Theorem 3 ([16], Theorem 3.3). Suppose the primes have a level of distribution $\vartheta > 1/2$. Then, there exists an explicitly calculable constant $C(\vartheta)$ depending only on ϑ such that any admissible k-tuple with $k \ge C(\vartheta)$ contains at least two primes infinitely often. Specifically, if $\vartheta \ge 0.971$, then this is true for $k \ge 6$.
Theorem 4 ([16], Theorem 3.4). We have
$$\liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} = 0 .$$
The method of Goldston, Pintz and Yildirim has also become known as the GPY sieve.
There are several overview articles on the history of the GPY method (cf. [18,28]).
The overview article most relevant for this paper is due to Maynard [29], whose improvements of the GPY sieve are of crucial importance for the large gap results described in this paper.
Before we recall Maynard’s description, we should mention another milestone, which, however, is not relevant for large gap results: the result obtained by Yitang Zhang [30] in 2014. He proves the existence of infinitely many bounded gaps between primes. He does not establish an admissible level of distribution $\vartheta > 1/2$, which would imply the result. Instead, he succeeds in replacing the sum over all moduli $q \le N^{\vartheta}$ in the level of distribution estimate by a sum over smooth moduli.
We now come to the short description of the GPY method and its improvement by Maynard, closely following the paper “Small gaps between primes” by Maynard [29]. One of the main results of [29] is:
Theorem 5 (of [29]). Let $m \in \mathbb{N}$. We have
$$\liminf_{n \to \infty} (p_{n+m} - p_n) \ll m^3 e^{4m} .$$
Tao (in private communication to Maynard) independently proved Theorem 5, with a slightly weaker bound, at much the same time.
Theorem 5 implies that for every $H \in \mathbb{N}$ there exist intervals, with arbitrarily large initial point and of length depending only on H, that contain at least H primes.
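Now, we follow [29] for a short description of the GPY sieve and its improvement.
Let $\mathcal{H} = \{h_1, \ldots, h_k\}$ be an admissible k-tuple. One considers the sum
$$S(N, \rho) = \sum_{N \le n < 2N} \left( \sum_{i=1}^{k} \chi_{\mathbb{P}}(n + h_i) - \rho \right) w_n .$$
Here, $\chi_{\mathbb{P}}$ is the characteristic function of the primes, $\rho > 0$, and the $w_n$ are non-negative weights. If one can show that $S(N, \rho) > 0$, then at least one term in the sum over n must have a positive contribution. By the non-negativity of $w_n$, this means that there must be some integer $n \in [N, 2N)$ such that at least $\lfloor \rho \rfloor + 1$ of the $n + h_i$ are prime.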
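The positivity argument can be made concrete. The sketch below evaluates the sum for a given range, admissible tuple, $\rho$ and weights, and returns the first $n$ with a positive term. By the argument above, such an $n$ has more than $\rho$ primes among the $n + h_i$. The Selberg-type weight uses $\lambda_d = \mu(d)(\log(R/d))^k$ for $d < R$, with an illustrative, hypothetical choice of R. At these sizes the sign of the sum carries no arithmetic meaning.

```python
from math import log, prod


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def mobius(n):
    """Möbius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # square factor
            result = -result
        d += 1
    return -result if n > 1 else result


def selberg_weight(n, h, R):
    """w_n = (sum over d | prod(n + h_i), d < R, of mu(d) log(R/d)^k)^2, k = len(h)."""
    k, P = len(h), prod(n + hi for hi in h)
    s = sum(mobius(d) * log(R / d) ** k for d in range(1, R) if P % d == 0)
    return s * s


def gpy_sum(N1, N2, h, rho, weight):
    """S = sum over N1 <= n < N2 of (#{i : n + h_i prime} - rho) * w_n, together
    with the first n whose term is positive (more than rho primes among n + h_i)."""
    S, witness = 0.0, None
    for n in range(N1, N2):
        term = (sum(is_prime(n + hi) for hi in h) - rho) * weight(n)
        S += term
        if term > 0 and witness is None:
            witness = n
    return S, witness


print(gpy_sum(3, 6, (0, 2, 6), 1, lambda n: 1.0))  # trivial weights w_n = 1
print(gpy_sum(100, 200, (0, 2, 6), 1, lambda n: selberg_weight(n, (0, 2, 6), 20)))
```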
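The weights $w_n$ are typically chosen to mimic Selberg sieve weights. With a parameter R (the level of the sieve), the standard Selberg k-dimensional weights are
$$w_n = \Bigg( \sum_{\substack{d \mid \prod_{i=1}^{k} (n + h_i) \\ d < R}} \lambda_d \Bigg)^2, \qquad \lambda_d = \mu(d) \left( \log \frac{R}{d} \right)^{k} .$$
The key new idea in the paper [16] of Goldston, Pintz and Yildirim was to consider more general sieve weights of the form
$$\lambda_d = \mu(d)\, F\!\left( \log \frac{R}{d} \right)$$
for a suitable smooth function F.
Goldston, Pintz and Yildirim chose $F(x) = x^{k+\ell}$ for suitable $\ell \in \mathbb{N}$, which has been shown to be essentially optimal when k is large.
The new ingredient in Maynard’s method is to consider a more general form of the sieve weights
$$w_n = \Bigg( \sum_{\substack{d_1, \ldots, d_k \\ d_i \mid n + h_i \ \forall i}} \lambda_{d_1, \ldots, d_k} \Bigg)^2 .$$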
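A minimal sketch of weights of this multidimensional shape. It assumes the simplified form $\lambda_{d_1,\ldots,d_k} = \big(\prod_i \mu(d_i)\big) F(\log d_1/\log R, \ldots, \log d_k/\log R)$, restricted to $\prod_i d_i < R$, which is often used to motivate Maynard's construction. The rigorous construction in [29] is more involved. The function $F(t) = 1 - \sum_i t_i$ on the simplex is a hypothetical example.

```python
from itertools import product
from math import log, prod


def mobius(n):
    """Möbius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def maynard_weight(n, h, R, F):
    """w_n = (sum over d_i | n + h_i with prod d_i < R of lambda_{d_1..d_k})^2,
    lambda_{d_1..d_k} = prod mu(d_i) * F(log d_1/log R, ..., log d_k/log R)."""
    total = 0.0
    for ds in product(*(divisors(n + hi) for hi in h)):
        if prod(ds) < R:
            total += prod(mobius(d) for d in ds) * F([log(d) / log(R) for d in ds])
    return total * total


def F_simplex(t):
    """Hypothetical F(t) = 1 - sum(t_i) on the simplex sum(t_i) <= 1, else 0."""
    s = sum(t)
    return 1 - s if s <= 1 else 0.0


print(maynard_weight(7, (0,), 10, F_simplex), maynard_weight(5, (0, 2), 100, F_simplex))
```

For $k = 1$ and this F, the inner sum equals $\Lambda(n)/\log R$ whenever $1 < n < R$, by the identity $\sum_{d \mid n} \mu(d) \log d = -\Lambda(n)$. The test below uses this as a check.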
The results of [29] were modified and extended in the paper [15], “Dense clusters of primes in subsets”, of Maynard. Some of his results and their applications will be described later in this paper.
  4. Large Gaps with Improved Order of Magnitude and Its K-Version, Part I
Here, we state the theorems from [19,20] and sketch their proofs.
We number definitions and theorems in the following manner: Definition (resp. Theorem) X of paper [n] (in the list of references) is referred to as ([n], Definition (resp. Theorem) X).
We start with a list of the theorems from [19] and the definitions relevant for them:
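Theorem 6 ([19], Theorem 1, large prime gaps). For any sufficiently large X, one has
$$G(X) \gg \frac{\log X \log_2 X \log_4 X}{\log_3 X} .$$
The implied constant is effective.
Definition 6 ([19], Definition (3.1)). Let x be a large real number and define
$$y := c\, \frac{x \log x \log_3 x}{\log_2 x},$$
where c is a certain (small) fixed positive constant.
Definition 7 ([19], Definition (3.2)).
$$z := x^{\log_3 x / (4 \log_2 x)} .$$
Definition 8 ([19], Definitions (3.3)–(3.5)).
$$S := \{ s \text{ prime} : \log^{20} x < s \le z \}, \qquad P := \{ p \text{ prime} : x/2 < p \le x \}, \qquad Q := \{ q \text{ prime} : x < q \le y \} .$$
For congruence classes $\mathbf{a} = (a_s \bmod s)_{s \in S}$ and $\mathbf{b} = (b_p \bmod p)_{p \in P}$ define the sifted sets
$$S(\mathbf{a}) := \{ n \in \mathbb{Z} : n \not\equiv a_s \pmod s \ \text{for all } s \in S \}$$
and likewise
$$S(\mathbf{b}) := \{ n \in \mathbb{Z} : n \not\equiv b_p \pmod p \ \text{for all } p \in P \} .$$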
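Theorem 7 ([19], Theorem 2, sieving primes). Let x be sufficiently large and suppose that y obeys (7). Then, there are vectors $\mathbf{a} = (a_s \bmod s)_{s \in S}$ and $\mathbf{b} = (b_p \bmod p)_{p \in P}$ such that
$$\#\big( Q \cap S(\mathbf{a}) \cap S(\mathbf{b}) \big) \ll \frac{x}{\log x} .$$
Theorem 8 ([19], Theorem 3, probabilistic covering). There exists a constant $C_0 \ge 1$ such that the following holds. Let $D, r, A \ge 1$, $0 < \kappa \le 1/2$, and let $m \ge 0$ be an integer. Let $\delta > 0$ satisfy the smallness bound  Let $I_1, \ldots, I_m$ be disjoint finite non-empty sets and let V be a finite set. For each $1 \le j \le m$ and $i \in I_j$, let $\mathbf{e}_i$ be a random finite subset of V. Assume the following: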
Corollary 1 ([19], Corollary 4). Let . Let  be sets with  and . For each , let  be a random subset of  satisfying the size bound:  Assume the following:
- (Sparsity) For all  and  
- (Small codegrees) For any distinct  
- (Elements covered more than once in expectation) For all but at most  elements , we have  for some quantity C, independent of q, satisfying 
Then, for any positive integer m with  we can find random sets  for each  such that  is either empty or a subset of  which  attains with positive probability, and that  with probability $1 - o(1)$. More generally, for any  with cardinality at least , one has  with probability $1 - o(1)$. The decay rates in the $o(1)$ and ∼ notation are uniform in .
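Theorem 9 ([19], Theorem 4, random construction). Let x be a sufficiently large real number and define y by (7). Then, there is a quantity C with
$$C \asymp \frac{1}{c},$$
with the implied constants independent of c, a tuple of positive integers $(h_1, \ldots, h_r)$ with $r \le \sqrt{\log x}$, and some way to choose random vectors $\mathbf{a} = (\mathbf{a}_s \bmod s)_{s \in S}$ and $\mathbf{n} = (\mathbf{n}_p)_{p \in P}$ of congruence classes $\mathbf{a}_s \bmod s$ and integers $\mathbf{n}_p$, respectively, obeying the following:
- For every $a$ in the essential range of $\mathbf{a}$, one has
$$\mathbb{P}\big( q \in \mathbf{e}_p(a) \mid \mathbf{a} = a \big) \le x^{-1/2 - 1/10} \qquad (p \in P,\ q \in Q),$$
where $\mathbf{e}_p(a) := \{ \mathbf{n}_p + h_i p : 1 \le i \le r \} \cap Q \cap S(a)$.
- With probability $1 - o(1)$, we have that
$$\#\big( Q \cap S(\mathbf{a}) \big) \sim 80 c\, \frac{x}{\log x} \log_2 x .$$
- Call an element $a$ in the essential range of $\mathbf{a}$ good if, for all but at most  elements $q \in Q \cap S(a)$, one has
$$\sum_{p \in P} \mathbb{P}\big( q \in \mathbf{e}_p(a) \mid \mathbf{a} = a \big) = C + O\!\left( \frac{1}{(\log_2 x)^2} \right) .$$
Then, $\mathbf{a}$ is good with probability $1 - o(1)$.
The following theorems and definitions are from [20].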
Theorem 10 ([20], Theorem 1.1). There is a constant  and infinitely many n, such that  and the interval  contains the K-th power of a prime.
Definition 9 ([20], Definitions (3.1)–(3.5)).  where c is a fixed positive constant. Let  and introduce the three disjoint sets of primes  For residue classes  and  define the sifted sets  and likewise  We set 
Theorem 11 ([20], Theorem 3.1, sieving primes). Let x be sufficiently large and suppose that y obeys (7). Then, there are vectors  and  such that 
Theorem 12 ([20], Theorem 4.1). (Has wording identical to [19], Theorem 3.)
Corollary 2 ([20], Corollary 4.2). (Has wording identical to [19], Corollary 4.)
Theorem 13 ([20], Theorem 4.3, random construction). (Has wording identical to [19], Theorem 4.)
Definition 10 ([20], Definition 6.1). An admissible r-tuple is a tuple  of distinct integers that do not cover all residue classes modulo p for any prime p. For , we define  For , let  We set  For an admissible r-tuple to be specified later and for primes p with , we set 
Theorem 14 ([19], Theorem 5, existence of good sieve weights). Let x be a sufficiently large real number and let y be any quantity obeying (7). Let  be defined by Definitions 7 and 8. Let r be a positive integer with  for some sufficiently large absolute constant  and some sufficiently small . Let  be an admissible r-tuple contained in . Then, one can find a positive quantity  and a positive quantity  depending only on r, and a non-negative function  supported on , with the following properties:
- Uniformly for every , one has 
- Uniformly for every  and , one has 
- Uniformly for every  that is not equal to any of the , one has 
- One has  uniformly for all  and .
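In [19], we have the following dependency graph for the proof of ([19], Theorem 1):
$$\text{Theorem 5} \Longrightarrow \text{Theorem 4} \Longrightarrow \text{Theorem 2} \Longrightarrow \text{Theorem 1} \tag{13}$$
Replacing these theorems by their K-versions, we obtain the following dependency graph for the K-version ([20], Theorem 1.1):
$$\text{Theorem 6.2} \Longrightarrow \text{Theorem 4.3} \Longrightarrow \text{Theorem 3.1} \Longrightarrow \text{Theorem 1.1} \tag{14}$$
The graphs (13) and (14) can be combined in the graph:
$$\begin{array}{ccccccc} \text{Theorem 5} & \Longrightarrow & \text{Theorem 4} & \Longrightarrow & \text{Theorem 2} & \Longrightarrow & \text{Theorem 1} \\ \Downarrow & & \Downarrow & & \Downarrow & & \Downarrow \\ \text{Theorem 6.2} & \Longrightarrow & \text{Theorem 4.3} & \Longrightarrow & \text{Theorem 3.1} & \Longrightarrow & \text{Theorem 1.1} \end{array} \tag{15}$$
(with Theorems 1, 2, 4, 5 corresponding to [19] and Theorems 1.1, 3.1, 4.3, 6.2 corresponding to [20]).
A horizontal arrow from Theorem A to Theorem B indicates the deduction of Theorem B from Theorem A. A vertical arrow indicates the transition from Theorem A to its K-version Theorem A′.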
Part I of “Large gaps with improved order of magnitude and its K-version” (Section 4) deals with the graph (15). The end of the graph, Theorem 5 and its K-version Theorem 6.2, is deduced from results of Maynard’s paper [15], “Dense clusters of primes in subsets”. For the K-version, Theorem 6.2, one uses K-versions of these results. These deductions make up Part II and are the contents of Section 5.
The graph (15) consists of segments, the last one being
$$\begin{array}{ccc} \text{Theorem 2} & \Longrightarrow & \text{Theorem 1} \\ \Downarrow & & \Downarrow \\ \text{Theorem 3.1} & \Longrightarrow & \text{Theorem 1.1} \end{array} \tag{16}$$
(with Theorems 1, 2 corresponding to [19] and Theorems 1.1, 3.1 corresponding to [20]).
We shall proceed segment by segment, starting with (16). In this way, the transition from a theorem to its K-version should become more transparent.
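We start with the upper string in (16):
Let $\mathbf{a}$ and $\mathbf{b}$ be as in ([19], Theorem 2), with S, P and Q as in ([19], Definitions (3.3)–(3.5)). We extend the tuple $\mathbf{a}$ to a tuple $(a_p)_{p \le x}$ of congruence classes $a_p \bmod p$ for all primes $p \le x$. We set $a_p := b_p$ for $p \in P$ and $a_p := 0$ for $p \notin S \cup P$, and consider the sifted set
$$T := \{ n \in [1, y] : n \not\equiv a_p \pmod p \ \text{for all } p \le x \} .$$
As in previous versions, one shows that this residual set consists of a negligible set of smooth numbers and of primes from Q. Thus, we find that
$$\# T \ll \frac{x}{\log x} .$$
Next, let C be a sufficiently large constant such that $\# T$ is less than the number of primes in $(x, Cx]$. We match each of these surviving elements u to a distinct prime $p \in (x, Cx]$ and put $a_p \equiv u \pmod p$. This gives congruence classes $a_p \bmod p$ for $x < p \le Cx$ which cover all of the integers in T. Applying the argument of Section 2 with x replaced by Cx, one obtains $G(X) \ge y$ for some X with $\log X \asymp x$, which by Definition 6 is the bound of Theorem 1. This finishes the deduction of Theorem 1 from Theorem 2.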
K-version: deduction of ([20], Theorem 1.1) from ([20], Theorem 3.1).
The first two sieving steps are the same as in the “upper string” ([19], Theorem 2 ⇒ Theorem 1). Thus, the second residual set is again Q, apart from a negligible set of smooth integers. The random choice in the remaining sieving steps now has to be modified.
Theorem 15 ([20], Theorem 3.1). Let x be sufficiently large and suppose that y obeys Definition 9. Then, there are vectors  and  such that 
We now sketch the deduction of ([20], Theorem 1.1) from ([20], Theorem 3.1).
Let  and  be as in ([20], Theorem 3.1). We extend the tuple  to a tuple  of congruence classes  for all primes  by setting  for  and  for . Again the sifted set  differs from the set  only by a negligible set of z-smooth integers. We find ([20], Lemma 3.2) 
As in the “upper string” deduction ([19], Theorem 2) ⇒ ([19], Theorem 1), we now further reduce the sifted set  by using the prime numbers from the interval ,  being a sufficiently large constant.
One follows, with some modification in the notation, the papers [20,21]. One distinguishes the cases K odd and K even. We recall the following definition:
Definition 11 ([20], Definition 3.3). Let  For K even and , we set 
Lemma 1. .
Lemma 2. There are pairs  with , , such that all  satisfy a congruence  with the possible exceptions of u from an exceptional set V with 
Proof. If K is odd, the congruence  is solvable whenever . If K is even, the congruence is solvable whenever  and . The claim now follows from Lemma 1. □
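The proof of Lemma 2 rests on the solvability of congruences involving K-th powers modulo primes; their precise form is not reproduced above. As background only, the standard criterion can be checked numerically. For a prime p with $p \nmid a$, the congruence $n^K \equiv a \pmod p$ is solvable if and only if $a^{(p-1)/g} \equiv 1 \pmod p$, where $g = \gcd(K, p - 1)$. In particular, it is solvable for every a when $\gcd(K, p - 1) = 1$. The sketch compares this criterion with brute force.

```python
from math import gcd


def kth_power_residues(K, p):
    """Nonzero K-th power residues modulo the prime p."""
    return {pow(n, K, p) for n in range(1, p)}


def solvable(K, a, p):
    """For a prime p not dividing a: n^K = a (mod p) is solvable iff
    a^((p - 1)/g) = 1 (mod p), where g = gcd(K, p - 1)."""
    g = gcd(K, p - 1)
    return pow(a, (p - 1) // g, p) == 1


primes = [p for p in range(3, 60) if all(p % d for d in range(2, int(p ** 0.5) + 1))]
for K in range(1, 7):
    for p in primes:
        R = kth_power_residues(K, p)
        assert len(R) == (p - 1) // gcd(K, p - 1)  # size of the subgroup of K-th powers
        assert all(solvable(K, a, p) == (a in R) for a in range(1, p))
print("criterion confirmed for K <= 6 and 3 <= p < 60")
```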
We now conclude the deduction of Theorem 1.1 by the application of the matrix method. The following definition is borrowed from [31].
Definition 12. Let us call an integer  a “good” modulus if  for all characters  and all  with  This definition depends on the size of .
Lemma 3. There is a constant  such that, in terms of , there exist arbitrarily large values of x for which the modulus  is good.
Lemma 4. Let q be a good modulus. Then,  where  denotes Euler’s totient function, uniformly for  and . Here, the constant D depends only on the value of  in Lemma 3.
Remark 2. This result, which is due to Gallagher [32], is Lemma 2 from [31].
We now define the matrix .
Definition 13. Choose x such that  is a good modulus. Let  and  be given. From the definition of  and , there are  such that  We now determine  by  and the congruences  By the Chinese Remainder Theorem,  is uniquely determined. We let  with 
For , we denote by  the r-th row of , and for , we denote by  the u-th column of .
Lemma 5. We have that ,  is composite unless .
Proof. From the congruences  in (21), it follows that for  we have  □
Remark 3. We observe that each row  with  has as its first element  the K-th power of the prime . If ,  is the K-th power of a prime of the desired kind. To deduce Theorem 10 from Theorem 15, it thus remains to show that  is nonempty.
Proof. This follows from Lemma 4. □
We obtain an upper estimate for  by the observation that, if  contains a prime number, then  are primes for some .
The number  is estimated by standard sieves as in Lemma 6.1 of [21].
This concludes the deduction of Theorem 10 from Theorem 15. We now come to the next segment of graph (15).
We first state a hypergraph covering theorem (Theorem 3 of [19]) of a purely combinatorial nature, generalizing a result of Pippenger and Spencer [24] using the Rödl nibble method [25]. We also state a corollary.
Both the deduction of Theorem 2 of [19] (our Theorem 7) from Theorem 4 of [19] (our Theorem 17) and its K-version, the deduction of Theorem 15 from Theorem 18, are based on Theorem 3 of [19].
Theorem 16 (Theorem 3 of [19], probabilistic covering). There exists a constant $C_0 \ge 1$ such that the following holds. Let $D, r, A \ge 1$, $0 < \kappa \le 1/2$, and let $m \ge 0$ be an integer. Let $\delta > 0$ satisfy the smallness bound  Let $I_1, \ldots, I_m$ be disjoint finite non-empty sets and let V be a finite set. For each $1 \le j \le m$ and $i \in I_j$, let $\mathbf{e}_i$ be a random finite subset of V. Assume the following:
We have the following:
Corollary 3 (Corollary 4 of [19]). Let . Let  be sets with  and . For each , let  be a random subset of  satisfying the size bound:  Assume the following:
- (Sparsity) For all  and  
- (Uniform covering) For all but at most  elements , we have  for some quantity C, independent of q, satisfying 
- (Small codegrees) For any distinct  
Then, for any positive integer m with  we can find random sets  for each  such that  with probability $1 - o(1)$. More generally, for any  with cardinality at least , one has  with probability $1 - o(1)$. The decay rates in the $o(1)$ and ∼ notation are uniform in .
Proof. For the proof, we refer to [19]. □
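The role of the uniform covering hypothesis can be illustrated by a first-moment heuristic. Suppose every vertex lies in the random sets $e_p$ with total expected multiplicity C, and the sets behaved independently. Then about a proportion $e^{-C}$ of the vertices would remain uncovered. The simulation below shows only this independent, one-round model, with hypothetical sizes and a fixed seed. It does not implement the Rödl nibble, which is what handles dependent random sets over several rounds in the corollary.

```python
import math
import random


def uncovered_fraction(num_q, num_p, C, seed=0):
    """Each of num_p random sets e_p contains each of num_q vertices independently
    with probability C / num_p, so each vertex is covered C times in expectation.
    Returns the fraction of vertices lying in no e_p."""
    rng = random.Random(seed)
    prob = C / num_p
    uncovered = sum(1 for _ in range(num_q)
                    if all(rng.random() >= prob for _ in range(num_p)))
    return uncovered / num_q


frac = uncovered_fraction(num_q=4000, num_p=200, C=1.0)
print(frac, math.exp(-1.0))
```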
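Theorem 17 ([19], Theorem 4, random construction). Let x be a sufficiently large real number and define y by Definition 6. Then, there is a quantity C with
$$C \asymp \frac{1}{c}, \tag{38}$$
with the implied constants independent of c, a tuple of positive integers $(h_1, \ldots, h_r)$ with $r \le \sqrt{\log x}$, and some way to choose random vectors $\mathbf{a} = (\mathbf{a}_s \bmod s)_{s \in S}$ and $\mathbf{n} = (\mathbf{n}_p)_{p \in P}$ of congruence classes $\mathbf{a}_s \bmod s$ and integers $\mathbf{n}_p$, respectively, obeying the following:
- For every $a$ in the essential range of $\mathbf{a}$, one has
$$\mathbb{P}\big( q \in \mathbf{e}_p(a) \mid \mathbf{a} = a \big) \le x^{-1/2 - 1/10} \qquad (p \in P,\ q \in Q), \tag{39}$$
where $\mathbf{e}_p(a) := \{ \mathbf{n}_p + h_i p : 1 \le i \le r \} \cap Q \cap S(a)$.
- With probability $1 - o(1)$, we have that
$$\#\big( Q \cap S(\mathbf{a}) \big) \sim 80 c\, \frac{x}{\log x} \log_2 x . \tag{40}$$
- Call an element $a$ in the essential range of $\mathbf{a}$ good if, for all but at most  elements $q \in Q \cap S(a)$, one has
$$\sum_{p \in P} \mathbb{P}\big( q \in \mathbf{e}_p(a) \mid \mathbf{a} = a \big) = C + O\!\left( \frac{1}{(\log_2 x)^2} \right) . \tag{41}$$
Then, $\mathbf{a}$ is good with probability $1 - o(1)$.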
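We now show that Theorem 17 implies Theorem 7. By (38), we may choose c small enough so that (35) holds. Take a suitable positive integer m, chosen as in [19].
Now, let $\mathbf{a}$ and $\mathbf{n}$ be the random vectors guaranteed by Theorem 17. Suppose that we are in the probability $1 - o(1)$ event that $\mathbf{a}$ takes a value $a$ which is good and such that (40) holds. Fix some $a$ within this event. We may apply Corollary 3 with $P' := P$ and $Q' := Q \cap S(a)$ for the random variables $\mathbf{e}_p(a)$ conditioned to $\mathbf{a} = a$. A few hypotheses of the corollary must be verified. First, (34) follows easily. The small codegree condition (36) is also quickly checked. Indeed, let $q_1, q_2 \in Q'$ be distinct. If $q_1, q_2 \in \mathbf{e}_p(a)$, then $p \mid q_1 - q_2$. But $q_1 - q_2$ is a nonzero integer of size at most $y \ll x \log x$ and is thus divisible by at most one prime $p_0 \in P$. Hence
$$\sum_{p \in P} \mathbb{P}\big( q_1, q_2 \in \mathbf{e}_p(a) \mid \mathbf{a} = a \big) \le \mathbb{P}\big( q_1 \in \mathbf{e}_{p_0}(a) \mid \mathbf{a} = a \big) \le x^{-1/2 - 1/10},$$
the sum on the left side being zero if $p_0$ does not exist.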
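The key arithmetic fact in the codegree check is the following. A nonzero integer n with $|n| < (x/2)^2$ has at most one prime divisor in $(x/2, x]$, since the product of two such primes exceeds $x^2/4$. For large x this covers differences of primes from $(x, y]$, because $y \ll x \log x$. A brute-force check for the illustrative value $x = 200$:

```python
def primes_in(lo, hi):
    """Primes p with lo < p <= hi (trial division; fine for toy sizes)."""
    start = max(2, int(lo) + 1)
    return [p for p in range(start, int(hi) + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def large_prime_divisors(n, P):
    """Primes from the list P that divide the nonzero integer n."""
    return [p for p in P if n % p == 0]


x = 200
P = primes_in(x / 2, x)  # the primes in (x/2, x]
bound = (x // 2) ** 2    # (x/2)^2 for even x
worst = max(len(large_prime_divisors(n, P)) for n in range(1, bound))
print(len(P), worst)     # every 0 < n < (x/2)^2 has at most one such divisor
```

The bound $(x/2)^2$ is sharp in spirit: $101 \cdot 103 = 10403$ just exceeds it and has two such divisors.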
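By Corollary 3, there exist random variables $\mathbf{e}'_p(a)$, whose essential range is contained in the essential range of $\mathbf{e}_p(a)$ together with ∅, and satisfying
$$\#\{ q \in Q \cap S(a) : q \notin \mathbf{e}'_p(a) \ \text{for all } p \in P \} \ll \frac{x}{\log x}$$
with probability $1 - o(1)$, where we have used (40). Since
$$\mathbf{e}'_p(a) \subseteq \{ \mathbf{n}'_p + h_i p : 1 \le i \le r \}$$
for some random integer $\mathbf{n}'_p$, it follows that
$$\#\big( Q \cap S(a) \cap S(\mathbf{b}') \big) \ll \frac{x}{\log x}, \qquad \mathbf{b}' := (\mathbf{n}'_p \bmod p)_{p \in P},$$
with probability $1 - o(1)$. Taking a specific value $(n'_p)_{p \in P}$ for which this relation holds and setting $b_p := n'_p$ for all p concludes the proof of claim (17) and establishes Theorem 7 (Theorem 2 of [19]).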
We now come to the K-version of the deduction Theorem 4 ⇒ Theorem 2, that is, to the “lower string” Theorem 4.3 ⇒ Theorem 3.1 of graph (15).
Theorem 18 ([20], Theorem 4.3, random construction). Let x be a sufficiently large real number and define y by Definition 9. Then, there is a quantity C with  with the implied constants independent of c, a tuple of positive integers  with , and some way to choose random vectors  and  of congruence classes  and integers , respectively, obeying the following:
- For every  in the essential range of , one has  where .
- With probability $1 - o(1)$, we have that 
- Call an element  in the essential range of  good if, for all but at most  elements , one has  Then,  is good with probability $1 - o(1)$.
Remark 4. The wording of Theorem 18 is the same as the wording of ([19], Theorem 4). However, the contents of these two theorems are different, since the term essential range has a different meaning. In Theorem 17,  and  assume values of the form  and , whereas in Theorem 18 they are of the form 
Also, the wording of the deduction of Theorem 15 from Theorem 18 is the same as that of the deduction of Theorem 7 (Theorem 2 of [19]) from Theorem 17 (Theorem 4 of [19]).
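We come to the segment
$$\begin{array}{ccc} \text{Theorem 5} & \Longrightarrow & \text{Theorem 4} \\ \Downarrow & & \Downarrow \\ \text{Theorem 6.2} & \Longrightarrow & \text{Theorem 4.3} \end{array}$$
of graph (15).
The proof of Theorem 5 of [19] relies on the estimates for multidimensional prime-detecting sieves established by Maynard in [15].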
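We show now that Theorem 14 implies Theorem 17.
Let x, y, P and Q be as in Theorem 17. We set r to be the maximum value permitted by Theorem 14, namely
$$r := \lfloor \log^{c_0} x \rfloor ,$$
where $c_0$ is the small constant from Theorem 14. Let $(h_1, \ldots, h_r)$ be the admissible r-tuple consisting of the first r primes larger than r; thus, $h_i = p_{\pi(r) + i}$ for $1 \le i \le r$. From the prime number theorem, we have $h_i = O(r \log r)$ for $1 \le i \le r$, and so we have $h_i \le 2r^2$ for $1 \le i \le r$ if x is large enough. We now invoke Theorem 14 to obtain the quantities and the weight w with the stated properties.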
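For each $p \in P$, let $\tilde{\mathbf{n}}_p$ denote the random integer with probability density
$$\mathbb{P}(\tilde{\mathbf{n}}_p = n) := \frac{w(p, n)}{\sum_{n' \in \mathbb{Z}} w(p, n')}$$
for all $n \in \mathbb{Z}$ (we will not need to impose any independence condition on the $\tilde{\mathbf{n}}_p$). We have 
Also, one has  for all  and .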
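We choose the random vector $\mathbf{a} = (\mathbf{a}_s \bmod s)_{s \in S}$ by selecting each $\mathbf{a}_s \bmod s$ uniformly at random from $\mathbb{Z}/s\mathbb{Z}$, independently in s and independently of the $\tilde{\mathbf{n}}_p$.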
The resulting sifted set $S(\mathbf{a})$ is a random periodic subset of $\mathbb{Z}$ with density
$$\sigma := \prod_{s \in S} \left( 1 - \frac{1}{s} \right) .$$
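By Mertens' theorem, a product $\prod_{A < s \le B} (1 - 1/s)$ over primes is approximately $\log A/\log B$; this is the source of the asymptotic formula for σ. The sketch below checks this numerically for the illustrative, hypothetical range $A = 100$, $B = 10^5$. This is far below the ranges $\log^{20} x < s \le z$ that occur in the proof.

```python
from math import log


def prime_sieve(n):
    """Primes <= n via the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_p[i]]


def sifted_density(lo, hi):
    """sigma = product over primes lo < s <= hi of (1 - 1/s)."""
    sigma = 1.0
    for s in prime_sieve(hi):
        if s > lo:
            sigma *= 1 - 1 / s
    return sigma


lo, hi = 100, 10 ** 5
sigma = sifted_density(lo, hi)
ratio = sigma / (log(lo) / log(hi))
print(sigma, log(lo) / log(hi), ratio)  # Mertens: the ratio should be close to 1
```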
From the prime number theorem (with sufficiently strong error term),
$$\sigma \sim \frac{\log(\log^{20} x)}{\log z} = \frac{80 (\log_2 x)^2}{\log x \, \log_3 x} ,$$
so in particular we see that  We also see from (43) that 
We have a useful correlation bound:
Lemma 7. Let  be a natural number and let  be distinct integers of magnitude . Then, one has 
Proof. For each $s \in S$, the integers $n_1, \ldots, n_t$ occupy t distinct residue classes modulo s, unless s divides one of the differences $n_i - n_j$ for $1 \le i < j \le t$. Since  and  are of size , the latter possibility occurs at most  times. Thus, the probability that $\mathbf{a}_s \bmod s$ avoids all of the classes $n_i \bmod s$ is equal to $1 - t/s$, except for  values of s, where it is instead $1 - t'/s$ for some $1 \le t' < t$. Thus, 
□
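The computation in the proof of Lemma 7 can be checked exactly on small examples. Take distinct integers $n_1, \ldots, n_t$ and independent uniform classes $a_s \bmod s$. The probability that no $n_j$ lies in any class $a_s \bmod s$ equals $\prod_s (1 - \nu_s/s)$, where $\nu_s$ is the number of distinct classes of the $n_j$ modulo s; for the typical s, $\nu_s = t$. The sketch enumerates all vectors $(a_s)$ for a few small, hypothetical moduli. It compares the result with the product formula using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def survival_exact(ns, moduli):
    """Exact probability that no n in ns satisfies n = a_s (mod s) for any s,
    with each a_s uniform on Z/sZ and independent; computed by enumeration."""
    good = total = 0
    for a in product(*(range(s) for s in moduli)):
        total += 1
        if all(n % s != a_s for s, a_s in zip(moduli, a) for n in ns):
            good += 1
    return Fraction(good, total)


def survival_formula(ns, moduli):
    """prod_s (1 - nu_s / s), nu_s = number of distinct classes of the n mod s."""
    result = Fraction(1)
    for s in moduli:
        result *= 1 - Fraction(len({n % s for n in ns}), s)
    return result


print(survival_exact((0, 2, 6), (5, 7, 11)), survival_formula((0, 2, 6), (5, 7, 11)))
```

In the second test case below, $0$ and $5$ collide modulo 5, so the factor for $s = 5$ is $1 - 2/5$ instead of $1 - 3/5$. These are the exceptional moduli in the proof.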
Among other things, this gives claim (40):
Corollary 4. With probability $1 - o(1)$, we have
$$\#\big( Q \cap S(\mathbf{a}) \big) \sim 80 c\, \frac{x}{\log x} \log_2 x .$$
Proof. By Lemma 7, we have  and , and so by the prime number theorem we see that the random variable $\#(Q \cap S(\mathbf{a}))$ has mean  and variance  The claim then follows from Chebyshev’s inequality (with plenty of room to spare). □
For each , we consider the quantity  and let  denote the set of all the primes  such that 
In light of Lemma 7, we expect most primes in P to lie in , and this will be confirmed below in Lemma 9. We now define the random variables  as follows. Suppose we are in the event  for some  in the range of . If , we set . Otherwise, if , we define  to be the random integer with conditional probability distribution  with the  jointly independent, conditionally on the event . From (47), we see that these random variables are well defined.
Lemma 8. With probability , we have  for all but at most  of the primes .
Let  be good and . Substituting definition (49) into the left-hand side of (50), using (48), and observing that  is only possible if , we find that  where  is as defined in Theorem 17 (Theorem 4 of [19]). Relation (41), that is, the fact that $\mathbf{a}$ is good with probability $1 - o(1)$, follows upon noting that, by (43) and (46), 
 Before proving Lemma 8, we first confirm that 
 is small with high probability.
Lemma 9. With probability ,  contains all but  of the primes . In particular,  Proof.  By linearity of expectation and Markov’s inequality, it suffices to show that for each 
, we have 
 with probability 
. It suffices to show that
        and
        where 
, 
 are independent copies of 
 that are also independent of 
.
 The claim (50) follows from Lemma 7 (performing the conditional expectation over 
 first). A similar application of Lemma 7 allows one to write the left-hand side of (52) as
 From (44), we see that the quantity 
 is equal to 
 with probability 
 and is less than 
 otherwise. The claim now follows from (46).    □
(Proof of Lemma 8).  We first show that replacing 
 with 
P has negligible effect on the sum, with probability 
. Fix 
i and substitute 
. By Markov’s inequality, it suffices to show that
        By Lemma 7, we have
 Next, by (47) and Lemma 9 we have
        Subtracting, we conclude that the left-hand side of (53) is 
. The claim then follows from (42). By (53), it suffices to show that with probability 
, for all but at most 
 primes 
, one has
 Call a prime 
 bad if 
 but (55) fails. Using Lemma 7 and (44), we have
        and
        where 
 and 
 are independent copies of 
 over 
. In the last step, we used the fact that the terms with 
 contribute negligibly.
By Chebyshev’s inequality, it follows that the number of bad 
q is
        with probability 
.    □
 We now come to the K-version, the “lower string” Theorem 6.2 ⇒ Theorem 4.3 of section (42).
As in the “upper string” (Theorem 5 of [19]), a certain weight function w plays an important role. Its construction will be modelled on that of the function w in [19], Theorem 5.
The restrictions ,  bring some additional complications. The function  will be different from zero only if n belongs to a set  of p-good integers. The definition of  is based on the set  of good integers.
Definition 15. For , we define  For , let  We set  For an admissible r-tuple to be specified later and for primes p with , we set  Theorem 19 (Theorem 6.2 of [
20], Existence of good sieve weights). 
Let x be a sufficiently large real number and let y be any quantity obeying Definition 9. Let  be defined by Definition 9. Let r be a positive integer with  for some sufficiently large absolute constant  and some sufficiently small . Let  be an admissible r-tuple contained in . Then, one can find a positive quantity  and a positive quantity  depending only on r with  and a non-negative function  supported on  with the following properties:  unless  for some ,  and . Uniformly for every , one has  Uniformly for every  and , one has  Uniformly for every  that is not equal to any of the , one has  Uniformly for all  and   We now show how Theorem 19 implies Theorem 18.
Let 
 be as in Theorem 16. We set
 We now invoke Theorem 19 to obtain quantities 
 and weight 
 with the stated properties.
For each 
, let 
 denote the random integer with probability density
      for all 
. From (59), (60), we have
 Also, from (57), (59), (63), one has
      for all 
 and 
.
We choose the random vector  by selecting each  uniformly at random from  independently in s.
Lemma 10. Let  be a natural number and let  be distinct integers from . Then, one has  Proof.  For 
, let 
 be the set of 
 for which 
, for 
. Then, since
        we have
 Let 
, 
, 
. We write
        where
 We set
 We have
 We now use certain well-known facts from the theory of 
K-th power residues.
There are
      possible choices for the 
. From these, for each 
h, 
 there are 
 choices such that
 Thus, the total number of choices for 
 for which not all 
, 
 is
 Since the choices for the components 
 are independent, we have
 We have
 Since 
 for 
, we have by the definition for 
:
 From (65) and (66), we thus obtain
□
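The proof of Lemma 10 uses standard facts about K-th power residues modulo a prime p. The multiplicative group mod p is cyclic of order p - 1. Hence exactly (p - 1)/gcd(K, p - 1) nonzero classes are K-th powers, and a nonzero class a is one of them if and only if a^((p - 1)/gcd(K, p - 1)) ≡ 1 (mod p). The short Python check below confirms both facts by brute force for small p and K. The helper names are hypothetical, and the check illustrates only these general facts, not the particular counts in the proof above.

```python
from math import gcd


def kth_power_residues(K, p):
    """Set of nonzero K-th power residues modulo the prime p (brute force)."""
    return {pow(x, K, p) for x in range(1, p)}


def is_kth_power_residue(a, K, p):
    """Euler-type criterion: a (coprime to p) is a K-th power mod p
    iff a^((p-1)/d) == 1 mod p, where d = gcd(K, p-1)."""
    d = gcd(K, p - 1)
    return pow(a, (p - 1) // d, p) == 1


if __name__ == "__main__":
    print(sorted(kth_power_residues(3, 13)))  # [1, 5, 8, 12]
    for p in [7, 11, 13, 31, 101]:
        for K in range(1, 9):
            residues = kth_power_residues(K, p)
            # the number of K-th powers is (p - 1) / gcd(K, p - 1)
            assert len(residues) == (p - 1) // gcd(K, p - 1)
            # the criterion picks out exactly the same classes
            assert residues == {a for a in range(1, p) if is_kth_power_residue(a, K, p)}
```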
Corollary 5 (to Lemma 10). 
With probability , we have: Proof.  From Lemma 10, we have
        and
        and so by the prime number theorem we see that the random variable 
 has mean
        and variance
 The claim then follows from Chebyshev’s inequality.    □
 For each 
, we consider the quantity
      and let 
 denote the set of primes 
, such that
 We now define the random variables 
 as follows. Suppose we are in the event 
 for some 
 in the range of 
. If 
, we set 
. Otherwise, if 
, we define 
 to be the random integer with conditional probability distribution
      where
      with the 
 jointly independent, conditionally on the event 
.
Lemma 11. With probability , we havefor all but at most  of the primes .  Before proving Lemma 11, we first confirm that  is small with high probability.
Lemma 12. With probability ,  contains all but  of the primes . In particular,  Proof.  By linearity of expectation and Markov’s inequality, it suffices to show that for each 
 we have 
 with probability
 By Chebyshev’s inequality it suffices to show that
        and
        where 
 are independent copies of 
 that are also independent of 
.
To prove claim (69), we first select the value 
n for 
 according to the distribution (63):
 Because of the property 
, if 
 we have with probability 1:
 Relation (69) now follows from Lemma 10 with 
, applying the law of total probability
 A similar application of Lemma 10 allows one to write the left-hand side of (70) as
 From (69), we see that the quantity
        is equal to 
 with probability
        and is less than 
 otherwise. 
The claim now follows from .    □
 (Proof of Lemma 11).  We first show that replacing  with P has a negligible effect on the sum, with probability . Fix i and substitute .
By Lemma 10, we have
 Next, by
        and Lemma 12, we have
 Subtracting, we conclude that the difference of the two expectations above is 
. The claim then follows from (56).
By this, it suffices to show that
        for all but at most 
 primes 
, one has
 We call a prime 
 “bad” if 
, but (71) fails. Using Lemma 12 and (63) we have
 By the definition of 
, we have
        unless 
. By Definition 15 this means that 
.
We may thus apply Lemma 12 with
        and obtain for all 
i:
 With (71), we thus obtain
 Next, we obtain
        where 
 and 
 are independent copies of 
 over 
. In the last step, we used the fact that the terms with 
 contribute negligibly.
By Chebyshev’s inequality, it follows that the number of bad 
q’s is
 We may now prove Theorem 16.
Relation (40) is precisely Corollary 5 (to Lemma 10). In order to prove (14), we assume that  is good and .
Substituting (67) into the left-hand side of (68), using 
 and observing that 
 is only possible if 
, we find that
        where
        is as defined in Theorem 16. The fact that 
 is good with probability 
 follows upon noticing that
 This concludes the proof of Theorem 16.    □
   5. Large Gaps with Improved Order of Magnitude and Its K-Version, Part II
We first state definitions and results from “Dense clusters of primes in subsets” by Maynard [
15].
We make use of the notation given in 
Section 7: “Multidimensional Sieve Estimates” of [
15].
Definition 16. A linear form is a function  of the form  with integer coefficients  and . Let  be a set of integers. Given a linear form , we define the sets  for any  and congruence class , and define the quantity  where ϕ is the Euler totient function. A finite set  of linear forms is said to be admissible if  has no fixed prime divisor; that is, for every prime p there exists an integer  such that  is not divisible by p.
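Admissibility in Definition 16 can be tested using only finitely many primes. Suppose L_1, ..., L_k have nonzero leading coefficients, and let p > k be a prime dividing none of them. Then each form vanishes at exactly one class mod p, so at most k < p classes are excluded and p cannot be a fixed prime divisor. The Python sketch below, with hypothetical helper names, therefore checks only primes p ≤ k and the prime divisors of the leading coefficients. An admissible r-tuple h_1, ..., h_r corresponds to the forms n + h_i.

```python
from math import prod


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_factors(n):
    """Set of prime divisors of |n|, by trial division."""
    n, out, d = abs(n), set(), 2
    while d * d <= n:
        while n % d == 0:
            out.add(d)
            n //= d
        d += 1
    if n > 1:
        out.add(n)
    return out


def is_admissible(forms):
    """forms: list of pairs (l1, l2) encoding L(n) = l1*n + l2, with l1 != 0."""
    if any(l1 == 0 for l1, _ in forms):
        raise ValueError("leading coefficients must be nonzero")
    k = len(forms)
    candidates = {p for p in range(2, k + 1) if is_prime(p)}
    for l1, _ in forms:
        candidates |= prime_factors(l1)
    # p is not a fixed prime divisor iff some n mod p makes prod L(n) nonzero mod p
    return all(any(prod(l1 * n + l2 for l1, l2 in forms) % p for n in range(p))
               for p in candidates)


if __name__ == "__main__":
    print(is_admissible([(1, 0), (1, 2)]))                    # True: n, n + 2
    print(is_admissible([(1, 0), (1, 2), (1, 4)]))            # False: 3 always divides
    print(is_admissible([(1, h) for h in (0, 2, 6, 8, 12)]))  # True
```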
 Definition 17. Let x be a large quantity, let  be a set of integers,  a finite set of linear forms and B a natural number. We allow  to vary with x. Let  be a quantity independent of . Let  be a subset of . We say that the tuple  obeys Hypothesis 1 at  if we have the following three estimates:
- (1) ( is well-distributed in arithmetic progressions). We have 
- (2) ( is well-distributed in arithmetic progressions). For any , we have 
- (3) ( not too concentrated). For any  and , we have 
 In [
15], this definition was only given in the case 
, but we will need the (mild) generalization to the case in which 
 is a (possibly empty) subset of 
.
As is common in analytic number theory, we will have to address the possibility of a Siegel zero. As we want to keep all our estimates effective, we will not rely on Siegel’s theorem or its consequences. Instead, we will rely on the Landau–Page theorem, which we now recall. Throughout,  denotes a Dirichlet character.
Lemma 13 (Landau–Page Theorem). 
Let . Suppose that  for some primitive character χ of modulus at most Q and some . Then, either  or else  and χ is a quadratic character , which is unique. Furthermore, if  exists, then its conductor  is square-free apart from a factor of at most 4 and obeys the lower bound  Proof.  See, e.g., ([
27], Chapter 14). The final estimate follows from the bound
        for a real zero 
 of 
 with 
 of modulus 
q, which can also be found in ([
27], Chapter 14).
We can then eliminate the exceptional character by deleting at most one prime factor of .    □
 Corollary 6. Let . Then, there exists a quantity  which is either equal to 1 or is a prime of size  with the property that  whenever  and χ is a character of modulus at most Q and coprime to .  Proof.  If the exceptional character  from Lemma 13 does not exist, then take ; otherwise, we take  to be the largest prime factor of . As  is square-free apart from a factor of at most 4, we have  by the prime number theorem and the claim follows.    □
 Lemma 14. Let x be a large quantity. Then, there exists a natural number , which is either 1 or a prime, such that the following holds.
Let , let  and  be a finite set of linear forms  (which may depend on x) with ,  and .
Let  and let  be a subset of  such that  is non-negative on  and  is coprime to B for all . Then,  obeys Hypothesis 1 at  with absolute implied constants (i.e., the bounds in Hypothesis 1 are uniform over all such choices of  and y).
 Proof.  Parts (1) and (3) of Hypothesis 1 are easy to see; the only difficult verification is (2). We apply Corollary 6 with
        for some small absolute constant 
 to obtain a quantity 
 with the stated properties. By the Landau–Page theorem (see [
27], Chapter 20), we have that if 
 is sufficiently small then we have the effective bound
        for all 
 with 
 and all 
. Here, the summation is over all primitive 
 and
 Following a standard proof of the Bombieri–Vinogradov Theorem (cf. [
27], Chapter 28), we have (for a suitable constant 
):
 Combining these two statements and using the triangle inequality gives the bound required for (2).    □
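Consider the simplest toy case, where A is the set of all integers and the single linear form is L(n) = n. Here the quantity controlled in part (2) of Hypothesis 1 is, assuming the normalisation of [15], the deviation of the number of primes up to x in a reduced class a mod q from the expected share π(x)/φ(q). The Python sketch below computes max over reduced classes a of |π(x; q, a) - π(x)/φ(q)| for small q by direct counting. Its helper names and parameters are hypothetical. It only illustrates the quantity being averaged: the effective bound in Lemma 14 comes from the Landau–Page theorem and the Bombieri–Vinogradov argument, not from computation.

```python
from math import gcd


def primes_upto(x):
    """List of primes <= x via the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (x + 1)
    is_p[0:2] = b"\x00\x00"
    for d in range(2, int(x ** 0.5) + 1):
        if is_p[d]:
            is_p[d * d :: d] = bytearray(len(range(d * d, x + 1, d)))
    return [n for n in range(x + 1) if is_p[n]]


def max_discrepancy(primes, q):
    """max over a coprime to q of |#{p in primes : p = a mod q} - #primes/phi(q)|."""
    reduced = [a for a in range(q) if gcd(a, q) == 1]
    counts = dict.fromkeys(reduced, 0)
    for p in primes:
        if p % q in counts:
            counts[p % q] += 1
    expected = len(primes) / len(reduced)  # len(reduced) == phi(q)
    return max(abs(c - expected) for c in counts.values()), expected


if __name__ == "__main__":
    P = primes_upto(10 ** 5)
    print(len(P))  # 9592
    for q in range(3, 11):
        err, expected = max_discrepancy(P, q)
        print(q, round(err, 1), round(expected, 1))
```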
 We now recall the construction of sieve weights from ([
15], 
Section 7).
Let
 For each prime 
p not dividing 
B, let
      be the elements 
n of 
 for which
 If 
p is also coprime to 
w, then for each 
, let 
 denote the least element of 
 such that
 Let 
 denote the set
 Define the singular series
      the function
      and let 
R be a quantity of size
 Let 
 be a smooth function supported on the simplex
 For any 
, define
 For any 
, define
      and then define the function 
 by
 We then have the following slightly modified form of Proposition 6.1 of [
15].
Theorem 20. Fix θ, . Then, there exists a constant C depending only on  such that the following holds. Suppose that  obeys Hypothesis 1 at some subset  of . Write  and suppose that ,  and . Moreover, assume that the coefficients  of the linear forms  in  obey the size bound  and . Furthermore, assume that the coefficients ,  of the linear forms  in  obey the size bound  for all . Then, there exists a smooth function  depending only on k and supported on the simplex  and quantities ,  depending only on k with  and  such that, for  given in terms of F as above, the following assertions hold uniformly for . - For any linear form  in  with  coprime to B and  on , we have 
- Let  be a linear form such that the discriminantis non-zero (in particular L is not in ). Then, 
- We have the crude upper bound  for all n.
 Proof.  The first estimate (78) is given by [
15], Proposition 9.1, (79) follows from [
15], Proposition 9.2, in the case of 
, (80) is given by [
15], Proposition 9.4, (taking 
 and 
) and the final statement (81) is given by part (iii) of [
15], Lemma 8.5. The bounds for 
 and 
 are given by [
15], Lemma 8.6.
We can now prove Theorem 20. Let 
 be as in that theorem. We set
        and let 
 be the quantity from Lemma 14.
We define the function 
 by setting
        for 
 and 
, where 
 is the (ordered) collection of linear forms 
 for 
 and 
 was defined in (76). Note that the admissibility of the 
r-tuple 
 implies the admissibility of the linear forms 
.
An important point is that many of the key components of 
 are essentially uniform in 
p. Indeed, for any prime s, the polynomial
        is divisible by 
s only at the residue classes - 
. From this, we see that
 In particular, 
 is independent of 
p as long as 
s is distinct from 
p; therefore,
        for some 
 independent of 
p, with the error terms uniform in 
p. Moreover, if 
 then 
, so all the 
 are distinct 
 (since the 
 are less than 
). Therefore, if 
 we have 
 and
 Since all 
 are at least 
, we have 
 whenever 
. From this, we see that
        is independent of 
p and where the error term is independent of 
.
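The uniformity in p established above rests on one fact: the local factor of the singular series at a prime s depends only on the number ω(s) of classes n mod s at which s divides the product of the linear forms. Assume the standard form S_B(L) = ∏_{s ∤ B} (1 - ω(s)/s)(1 - 1/s)^(-k) used in [15]. Under that assumption, the Python sketch below computes ω(s) from the roots of the individual forms and evaluates a truncation of the product. The truncation point and the helper names are hypothetical choices for illustration. For the pair n, n + 2 the value approaches twice the twin prime constant, about 1.3203.

```python
from math import prod


def primes_upto(x):
    """List of primes <= x via the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (x + 1)
    is_p[0:2] = b"\x00\x00"
    for d in range(2, int(x ** 0.5) + 1):
        if is_p[d]:
            is_p[d * d :: d] = bytearray(len(range(d * d, x + 1, d)))
    return [n for n in range(x + 1) if is_p[n]]


def omega(forms, s):
    """Number of classes n mod s with s | prod (l1*n + l2), for a prime s."""
    roots = set()
    for l1, l2 in forms:
        if l1 % s:
            roots.add(-l2 * pow(l1, -1, s) % s)  # the unique root of l1*n + l2 mod s
        elif l2 % s == 0:
            return s  # this form vanishes identically mod s
    return len(roots)


def singular_series(forms, B=1, cutoff=10_000):
    """Truncation at primes s <= cutoff, s not dividing B, of
    prod (1 - omega(s)/s) * (1 - 1/s)^(-k)."""
    k = len(forms)
    return prod((1 - omega(forms, s) / s) * (1 - 1 / s) ** (-k)
                for s in primes_upto(cutoff) if B % s)


if __name__ == "__main__":
    print(round(singular_series([(1, 0), (1, 2)]), 4))  # 1.3203
```

A non-admissible set of forms produces a vanishing local factor and hence the value 0, which is why admissibility is needed for the main terms in Theorem 20 to be non-trivial.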
It is clear that 
w is non-negative and supported on 
 and from (81) we have (57). We set
        and
 Since 
B is either 1 or prime, we have
        and from the definition of 
R we also have
 From (77), we thus obtain (57). From [
15], Lemma 8.1(i), we have
        and from [
15], Lemma 8.6, we have
        and so we have the lower bound (56a). (In fact, we also have a matching upper bound 
, but we will not need this.)
It remains to verify the estimates (59) and (60). We begin with (59). Let 
p be an element of 
. We shift the 
n variable by 
 and rewrite
        where 
 denotes the set of linear forms 
 for 
. (The 
 error arises from (61) and from round-off effects if 
y is not an integer.) This set of linear forms remains admissible and
 The claim (59) now follows from (75) and the first conclusion (78) of Theorem 20 (with 
x replaced by 
, 
 and 
), using Lemma 14 to obtain Hypothesis 1.
Now, we prove (60). Fix 
 and 
. We introduce the set 
 of linear forms 
, where
        and
 We claim that this set of linear forms is admissible. Indeed, for any prime 
, the solutions of
        are 
 and 
 the number of which is equal to 
. Thus
        as before. Again, for 
 we have that the 
 are distinct 
 and so if 
 and 
 we have 
 and
 In particular
        is independent of 
 and so
        where again the 
 error is independent of 
. From this, since 
 takes values in 
, we have that
        whenever 
 (note that the 
 summation variable implicit on both sides of this equation is necessarily equal to 1). Thus, recalling that 
 we can write the left-hand side of (60) as
 Applying the second conclusion (79) of Theorem 20 (with 
x replaced by 
, 
 and 
) and using Lemma 14 to obtain Hypothesis 1, this expression becomes
 Clearly 
 and from the prime number theorem, one has
        for any fixed 
. Using (83), we can thus write the left-hand side of (79) as
 From (42) and (56a), the second error term may be absorbed into the first and (59) follows.
Finally, we prove (60). Fix 
 not equal to any of the 
 and fix 
. By the prime number theorem, it suffices to show that
 By construction, the left-hand side is the same as
        which we can shift as
        where again the 
 error is a generous upper bound for round-off errors. This error is acceptable and may be discarded. Applying (80), we may then bound the main term by
        where
 Applying (83), we may simplify the above upper bound as
 Now, 
 for each 
i; hence, 
 and it follows from (82) and (56), observing 
 This concludes the proof of Theorem 20 and hence of Theorem 4.    □
 The K-version deduction of Theorem 19 (of [20]). We now modify the weights 
 to incorporate (for fixed primes 
p) the conditions
      and 
We carry out the modification in two steps. In the first step, we replace  by . Here, p is a fixed prime with .
Here, we have to be more specific about the set . We set .
Definition 18. Let  be as in (76), , p a fixed prime with . Let also . We set  We first express the solvability of (86) in terms of Dirichlet characters.
Lemma 15. Let p be a prime number. Let , and  be the principal character . There are  non-principal characters , such that for all  we have  Proof.  Let 
 be a primitive root 
,
 Setting
        we see that the congruence
        is solvable if and only if
        has a solution 
y. 
By the theory of linear congruences, this is equivalent to 
. We have
 We now define the Dirichlet character 
, (
),
        and obtain the claim of Lemma 15.    □
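The argument in the proof of Lemma 15 can be mirrored directly. Let g be a primitive root mod p and d = gcd(K, p - 1), and write e(x) = exp(2πix). A class a coprime to p is a K-th power residue exactly when d divides ind_g(a). The d characters χ_m(a) = e(m · ind_g(a)/d), 0 ≤ m < d, detect this through orthogonality: (1/d) Σ_m χ_m(a) equals 1 or 0. Here χ_0 is principal and the other d - 1 are non-principal. The Python sketch below checks this against brute force. The helper names are hypothetical, and we assume the characters of Lemma 15 are normalised in this standard way.

```python
from cmath import exp, pi
from math import gcd


def primitive_root(p):
    """Smallest primitive root modulo the prime p."""
    if p == 2:
        return 1
    n, factors, d = p - 1, set(), 2
    while d * d <= n:  # prime factors of p - 1 by trial division
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return next(g for g in range(2, p)
                if all(pow(g, (p - 1) // q, p) != 1 for q in factors))


def kth_power_indicator(a, K, p):
    """(1/d) * sum_{m<d} chi_m(a) with chi_m(a) = e(m * ind_g(a) / d); needs p not dividing a."""
    g = primitive_root(p)
    ind = {pow(g, k, p): k for k in range(p - 1)}  # discrete logarithm table
    d = gcd(K, p - 1)
    return sum(exp(2j * pi * m * ind[a % p] / d) for m in range(d)) / d


if __name__ == "__main__":
    p, K = 13, 3
    cubes = {pow(x, K, p) for x in range(1, p)}
    for a in range(1, p):
        assert abs(kth_power_indicator(a, K, p) - (1 if a in cubes else 0)) < 1e-9
    print(sorted(cubes))  # [1, 5, 8, 12]
```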
 Theorem 21. Let , as in the Definition of , . Then, we have  Proof.  By Lemma 15, we have
 The sum belonging to the principal character
        differs from the sum
        only by 
, since there are only 
 terms with 
, each of which has size at most 
. We therefore have
 Now let 
. Here, we closely follow the proof of Proposition 9.1 of [
15]. We split the sum into residue classes 
. We recall that
 If
        then we have 
 and so we restrict our attention to 
 with
 We substitute the definition of 
, expand the square and swap the order of summation. This gives
 The congruence conditions in the inner sum may be combined via the Chinese Remainder Theorem into a single congruence condition
        where 
 stands for the least common multiple.
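The combination step above rests on the generalised Chinese Remainder Theorem. The same is true of the solvability conditions for the systems of congruences in the proofs of Lemmas 19 and 21. The congruences n ≡ r_1 (mod m_1) and n ≡ r_2 (mod m_2) are simultaneously solvable if and only if r_1 ≡ r_2 (mod gcd(m_1, m_2)). The solutions then form a single class modulo lcm(m_1, m_2). Below is a minimal Python sketch, with a hypothetical helper name `combine`, that also folds a list of congruences into one.

```python
from functools import reduce
from math import gcd


def combine(c1, c2):
    """Merge (r1, m1) and (r2, m2), i.e. n = r1 mod m1 and n = r2 mod m2,
    into (r, lcm(m1, m2)); return None if the system has no solution."""
    if c1 is None or c2 is None:
        return None
    (r1, m1), (r2, m2) = c1, c2
    g = gcd(m1, m2)
    if (r2 - r1) % g:
        return None  # incompatible modulo gcd(m1, m2)
    lcm, m = m1 // g * m2, m2 // g
    # Solve r1 + m1*t = r2 (mod m2), i.e. (m1/g)*t = (r2 - r1)/g (mod m2/g).
    t = 0 if m == 1 else (r2 - r1) // g * pow(m1 // g, -1, m) % m
    return (r1 + m1 * t) % lcm, lcm


if __name__ == "__main__":
    print(combine((2, 3), (3, 5)))                    # (8, 15)
    print(combine((1, 4), (2, 6)))                    # None: 1 and 2 differ mod 2
    print(reduce(combine, [(1, 4), (3, 6), (4, 5)]))  # (9, 60)
```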
There are 
 Dirichlet characters 
 such that
 We thus may write
        with a suitable absolute constant 
A, an interval 
I of length
        and the 
 non-principal Dirichlet characters 
 of conductor 
 and modulus 
.
By the Pólya–Vinogradov bound, we obtain:
 The claim of Theorem 21 now follows from (89) and (90).    □
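The Pólya–Vinogradov inequality bounds the sum of a non-principal character χ mod q over any interval by O(√q log q). For primitive characters, the classical explicit form is |Σ_{M<n≤M+N} χ(n)| < √q log q. As a toy check, unrelated to the specific moduli above, the Python sketch below computes the largest interval sum of the Legendre symbol modulo an odd prime p and compares it with √p log p. Helper names are hypothetical. The Legendre symbol is p-periodic and sums to zero over a period, so the maximum over all intervals equals the spread of its partial sums over one period.

```python
from math import log, sqrt


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


def max_interval_sum(p):
    """max over all intervals (M, M+N] of |sum of (n/p)|, as max minus min
    of the partial sums S(0), ..., S(p) over one period."""
    partial, lo, hi = 0, 0, 0
    for n in range(1, p + 1):
        partial += legendre(n, p)
        lo, hi = min(lo, partial), max(hi, partial)
    return hi - lo


if __name__ == "__main__":
    for p in [101, 499, 997]:
        print(p, max_interval_sum(p), round(sqrt(p) * log(p), 1))
```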
 To prepare for the proof of Theorem 22, which is a modification of Proposition 9.2 of [
15], we state a lemma on character sums over shifted primes.
Lemma 16. Let χ be a Dirichlet character . Then, for  we have  Proof.  This is Theorem 1 of [
33].    □
 Theorem 22. Let ,  satisfy  for  and  Then, for sufficiently small θ, we have  Proof.  By Lemma 15, we have
 The sum belonging to the principal character 
 differs from the sum
        only by 
 and thus, as in [
15], Proposition 9.2, we have
 For 
, we follow closely the proof of Proposition 9.2 in [
15]. We again split the sum into residue classes 
 If
        then we have 
 and so we restrict our attention to 
 with
 We substitute the definition of 
, expand the square and swap the order of summation. Setting 
, we obtain
 If 
 runs through the arithmetic progression
        then also 
 runs through an arithmetic progression
 Thus, we have
 Also, the condition 
 may be expressed with the help of Dirichlet characters
        using orthogonality relations.
Theorem 22 thus follows from (91) and Lemma 16.    □
 For the definition of the weight  whose existence is claimed in Theorem 19, we now have to be more specific about the set  of linear forms.
Definition 19. Let the tuple  be given. For  and , let  be the (ordered) collection of linear forms  for  and set  In what follows, we show that in the sums
      appearing in (58) and (59) of Theorem 19, the function 
 may be replaced by the function 
 with a negligible error.
Since these sums have been treated in Theorem 21 and Theorem 22, this will essentially conclude the proof of Theorem 19 and thus of Theorem 5. □
Definition 20. Let  be an admissible r-tuple, . For , , let  Proof.  This follows immediately from Definitions 5 and 20.    □
 Lemma 19. Let ,  be as in (76),  as in Definition 19. Let . Then  Proof.  We only give the proof for the hardest case 
 and briefly indicate the proof for 
.
 In the inner sum, we only deal with the case 
; the case 
 has a negligible contribution. The inner sum is non-empty if and only if the system
      is solvable. In this case, (93) is equivalent to a single congruence
      where 
 is uniquely determined by the system (93) and
 We apply Theorem 20 with 
B independent of 
 and with
 We have
      and obtain
 This proves the claim for 
. The proof of the case 
 is analogous but simpler, since there is only the single variable of summation 
. □
Lemma 20. Let the conditions be as in Lemma 19. Then, we have  Theorem 23. Let the conditions be as in the previous lemmas. For sufficiently small , we have  Proof.  Let 
. By Definition 20, we have
        which yields
 Thus,
        and therefore
 The claim of Theorem 23 follows by summation over all pairs 
 if 
 is sufficiently small.    □
 We now investigate the sum (60) of Theorem 19.
Definition 21. Let , . Let : . Then, we define  Lemma 21. Let  be as in Definition 20. Let . Then, we have  Proof.  We only give the proof for the hardest case 
. The case 
 is analogous but simpler. We have
We deal only with the case 
 for the inner sum, the case 
 having a negligible contribution. The inner sum is non-empty if and only if the system
        is solvable.
In this case, the system is equivalent to a single congruence 
 uniquely determined by the system (95) and 
. The inner sum then takes the form
 By the substitution 
, we obtain
 We set 
, where 
 is replaced by the set 
, where
 We thus have
 We apply Theorem 22 with 
, 
 instead of 
x, 
, 
. We have
 From Bombieri’s Theorem, it can easily be seen that conditions (78) are satisfied for all 
s with the possible exception of 
, 
 being an exceptional set, satisfying
 For 
, we use the trivial bound 
. Thus, we obtain the claim of Lemma 21 for the case 
.
The proof for  is analogous but simpler, since we have only to sum over the single variable .    □
 Lemma 22. Let  be as in Definition 20. We have  Proof.  By Definition 21, we have
□
 Theorem 24. Let  be as in Definition 21. Then, we have  Proof.  Let 
. By Definition 20, we have
 It follows that
 Thus
 The second term is absorbed into the first, since by definition
        and thus
 Therefore
 The claim of Theorem 24 now follows by summing over all pairs 
.    □
 We can now conclude the proof of Theorem 19 and therefore also the proof of Theorem 1.1.
By Theorems 21–24, we have
      and
Equations (58) and (59) of Theorem 19 can thus be deduced from the results on the sums on the right-hand sides of Equations (96) and (97).