Abstract
In this work, the Sieve of Eratosthenes (hereafter the Sieve procedure) is approached from a novel point of view, which gives a heuristic justification of the Prime Number Theorem (P.N.T.). Moreover, an extension of this procedure to the case of twin primes is formulated. The proposed approach, named Limited INtervals into PEriodical Sequences (LINPES), relies on a set of binary periodic sequences that are evaluated in limited intervals of the prime characteristic function. These sequences are built by considering the deleted (0) and undeleted (1) integers in a modified version of the Sieve procedure, so that a symmetric succession of runs of zeroes appears in correspondence with the gaps between the undeleted integers in each period. This formulation estimates the prime number function in a way that is equivalent to the logarithmic integral function Li(x). The analysis is then extended to the twin primes by taking into account only the runs whose size is two. In this case, the proposed procedure gives an estimation of the twin prime function that is equivalent to that of the corresponding logarithmic integral function for twin primes. As a consequence, the possibility of counting the twin primes in the same intervals found for the primes is investigated. Since the bounds of these intervals are squares of primes, if such an inference were actually proved, then the twin primes could be estimated up to infinity, which would strengthen the conjecture that there are infinitely many of them.
1. Introduction
The Sieve procedure can provide heuristic justifications of the Prime Number Theorem (P.N.T.) []. This theorem gives the asymptotic trend of the prime number function $\pi(x)$, where $\pi(x)$ denotes the number of primes less than or equal to $x$, that is,
$$\pi(x) = \#\{\, p \le x : p \text{ prime} \,\}. \tag{1}$$
Let $\ln x$ be the natural logarithm of $x$. If the real functions $f(x)$ and $g(x)$ are asymptotically equal, that is, $\lim_{x \to \infty} f(x)/g(x) = 1$, then we say that $f$ and $g$ are equivalent as $x \to \infty$, and we write $f(x) \sim g(x)$. Consequently, the P.N.T. can be written as
$$\pi(x) \sim \frac{x}{\ln x}. \tag{2}$$
Although the infinitude of primes has been known since ancient times, the estimation (2) was conjectured by Gauss [] and Legendre [] at the end of the 18th century. Gauss himself improved Equation (2) by considering the logarithmic integral function Li(x), which is defined as
$$\mathrm{Li}(x) = \int_{2}^{x} \frac{dt}{\ln t}. \tag{3}$$
Again, the function (3) is such that
$$\pi(x) \sim \mathrm{Li}(x), \tag{4}$$
but the approximation (4) is much more precise than (2). In fact, it can be shown that $x/\ln x$ is only the first term of the asymptotic series expansion of (3). The aim of this work is to introduce a novel heuristic procedure (LINPES, Limited INtervals into PEriodical Sequences) that is equivalent to the Li(x) approximation, in the sense of Equation (4), apart from a simple multiplicative constant, by exploiting some binary periodic sequences and their symmetrical runs. Pieces of these sequences compose limited intervals of the prime characteristic function (5), which takes the value 1 when its argument is a prime and the value 0 otherwise.
Indeed, a topic that is much discussed in the recent literature concerns the possible discovery of regularities and periodicities in the distribution of the primes in certain intervals of the integer sequence []. In this work, the implications of the LINPES procedure are also investigated, in particular with an extension to the twin primes, whose distribution is described by the twin prime function $\pi_2(x)$, which is defined similarly to (1), that is,
$$\pi_2(x) = \#\{\, p \le x : p \text{ and } p + 2 \text{ both prime} \,\}. \tag{6}$$
Unlike the case of primes, the infinitude of twin primes is still unproved. However, analogously to the P.N.T., the density of the twin primes has been conjectured [], by considering that the probability that an integer $x$ is prime is about $1/\ln x$. Consequently, the probability that $x$ and $x + 2$ are both prime can be computed, so that the strong twin prime conjecture [] gives an equivalence between the twin prime function $\pi_2(x)$ and the logarithmic integral function $\mathrm{Li}_2(x)$, that is,
$$\pi_2(x) \sim 2 C_2 \, \mathrm{Li}_2(x), \tag{7}$$
where $\mathrm{Li}_2(x)$ is defined as
$$\mathrm{Li}_2(x) = \int_{2}^{x} \frac{dt}{(\ln t)^2}, \tag{8}$$
and $2C_2$ is a multiplicative constant that takes into account the statistical dependence of the primes $x$ and $x + 2$ []. The related constant $C_2$ is named the twin prime constant, that is,
$$C_2 = \prod_{p \ge 3} \left( 1 - \frac{1}{(p-1)^2} \right) \approx 0.6602. \tag{9}$$
As will be shown later, the proposed LINPES procedure estimates the twin prime function in a way that is equivalent to the $\mathrm{Li}_2(x)$ function, apart from a multiplicative constant. However, this is done by assuming that a basic relation, which is true for the primes, is also valid for the twin primes. Under this assumption, the contribution of the present work is additional heuristic support for the infinitude of twin primes.
Before starting our discussion, we list the variables used in this paper:
- : prime number function (1)
- : logarithmic integral function (3), which leads to an estimation of
- : twin prime number function (6)
- : logarithmic integral function (8), which leads to an estimation of
- : prime number function computed in the fixed integer
- p: generic prime number
- : arithmetic function that gives the succession of primes
- : arithmetic function that gives the characteristic function of primes (5)
- : number of residual integers in the step of the Sieve procedure
- : estimation of given by the heuristic method of Section 2
- : approximation of after the step of the Sieve procedure
- : periodic binary sequence obtained in the step of the modified Sieve procedure
- : period of the periodic binary sequence
- : sliding interval whose size is the same of and whose initial point is given by
- : size of the interval
- : number of residual runs of zeroes in each period
- : size of the run of zeroes in each period
- : interval of where a piece of is stored
- : local density of the residual runs of zeroes by moving a sliding interval in
- : average density of the residual runs of zeroes in the period
- : estimated number of primes in the interval by using the proposed procedure
- : estimated number of primes in by using the logarithmic integral function
- : real number of primes in the interval
- : estimation of by using the proposed procedure
- : estimation of by using the logarithmic integral function
- : corrected version of the estimation
- : number of runs sized in each period
- : average density of the residual runs in the period
- : estimated number of twin primes in the interval by using the proposed procedure
- : estimation of by using the proposed procedure
- : estimation of by using the logarithmic integral function
- : estimated number of twin primes in by using the logarithmic integral function
- : corrected version of the estimation
- : real number of twin primes in the interval .
This paper is organized as follows. Section 2 reports a well-known heuristic method that estimates the prime number function $\pi(x)$ in the sense of (2), apart from a multiplicative constant. Section 3 shows how the LINPES procedure obtains an estimation of $\pi(x)$ that is equivalent to the logarithmic integral function Li(x). Section 4 extends the proposed procedure to the case of twin primes. Finally, future research and concluding remarks are provided in Section 5.
2. A Heuristic Estimation of π(x) Equivalent to the x/ln x Function
In this section, a well-known heuristic method to justify the P.N.T. in a probabilistic way is briefly summarized, starting from the Sieve procedure, which separates the primes from the composites in a list of integers up to a given number $N$. The Sieve procedure is the most common way to obtain the primes, and improving its efficiency is still an active research topic []. Let $p_n$ denote the arithmetic function whose n-th element is the n-th prime [,]. The Sieve procedure can be summarized by the following steps:
- Step 1: List the integers in the interval , with , then put and start from the lowest prime .
- Step 2: Cancel all the multiples of not yet struck out, by starting from up to .
- Step 3: Go to the next remaining integer in the list. If , the procedure ends, otherwise increase to .
- Step 4: Put and return to Step 2.
In order to directly compute the characteristic function of primes, we can store the status of each integer in a binary vector of size $N$. In practice, we associate the value 0 to an integer that has been struck out by the procedure, and the value 1 otherwise. The vector is initialized with all 1 values, because no integer is deleted when the procedure starts. Then, in each iteration of the Sieve procedure, the value 0 is assigned to the cells that identify the deleted integers (that is, the composite integers). At the end of the procedure, only the cells related to the prime numbers retain the initial value 1.
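As an illustration of the binary-vector bookkeeping described above, the following minimal Python sketch runs the Sieve procedure and returns the resulting 0/1 vector. The function name `sieve_characteristic` and the choice of marking 0 and 1 as deleted from the start are assumptions of this sketch, not notation from the paper.

```python
def sieve_characteristic(n_max):
    """Return a list chi of length n_max + 1 with chi[n] = 1 if n is prime, else 0."""
    # 1 means "undeleted", 0 means "deleted". The integers 0 and 1 are marked
    # as deleted up front because they are not prime (a choice of this sketch).
    chi = [1] * (n_max + 1)
    chi[0] = 0
    if n_max >= 1:
        chi[1] = 0
    p = 2
    while p * p <= n_max:
        if chi[p] == 1:
            # Strike out the multiples of p, starting from p^2; p itself is kept.
            for m in range(p * p, n_max + 1, p):
                chi[m] = 0
        p += 1
    return chi


chi = sieve_characteristic(30)
primes = [n for n, bit in enumerate(chi) if bit == 1]
print(primes)
```

Starting the deletions at the square of each prime is what makes the output at a given step already final below the next prime square, which is the property exploited later in the paper.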
The Sieve procedure can provide heuristic justifications of the relation (2) by purely probabilistic considerations []. To show this, let $N$ be an integer whose order of magnitude is large enough to allow sufficiently robust statistics. In the first step ($k = 1$), the multiples of 2 are struck out, starting from $2^2 = 4$, and the number of deleted integers is approximately given by
$$\frac{N}{2}. \tag{10}$$
Therefore, the quantity of residual integers is about $N/2$. In the following step ($k = 2$), the multiples of 3 are struck out. Given the independence of the congruences modulo $p$, where $p$ is a prime, about $1/3$ of the residual integers will be deleted (by the Chinese Remainder Theorem []). The updated number of the residual integers will be given by
$$N \left( 1 - \frac{1}{2} \right) \left( 1 - \frac{1}{3} \right). \tag{11}$$
In general, about $1/p_k$ of the residual integers will be struck out in the k-th step of the Sieve procedure. The procedure ends when the greatest prime number not exceeding $\sqrt{N}$ is reached, that is, the prime $p_k$ such that $p_k^2$ is the greatest prime square lower than $N$. At this point, we obtain an estimation of the number of residual integers, and consequently of the quantity of primes $\pi(N)$, that is,
$$N \prod_{p \le \sqrt{N}} \left( 1 - \frac{1}{p} \right). \tag{12}$$
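Let us apply Mertens' Third Theorem [] to the reciprocal of the product structure (12), by taking the limit as $N \to \infty$. We obtain
$$\lim_{x \to \infty} \frac{1}{\ln x} \prod_{p \le x} \left( 1 - \frac{1}{p} \right)^{-1} = e^{\gamma}, \tag{13}$$
where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. Consequently, we can get the limit of the estimation (12) as $N \to \infty$, that is, an approximation of the limit of $\pi(N)$, by considering
$$N \prod_{p \le \sqrt{N}} \left( 1 - \frac{1}{p} \right) \sim \frac{e^{-\gamma} N}{\ln \sqrt{N}} = 2 e^{-\gamma} \, \frac{N}{\ln N}, \tag{14}$$
that is, the estimation is equivalent to $c \, N / \ln N$, with $c = 2e^{-\gamma} \approx 1.1229$, since $\ln \sqrt{N} = \tfrac{1}{2} \ln N$. Noticeably, from the relations (2) and (14), the actual quantity of prime numbers up to $N$ is overestimated, as $N \to \infty$, by the factor $2e^{-\gamma}$, due to the previous approximations.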
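The size of this overestimation can be explored numerically. The sketch below, under the assumption that the heuristic estimate is the product over the primes not exceeding the square root of N, compares it with the true prime count for a few values of N. The helper names `primes_up_to` and `heuristic_estimate` are hypothetical. The ratio is expected to approach 2·exp(−γ) only slowly, because the true prime count itself converges to N/ln N at a logarithmic rate.

```python
import math


def primes_up_to(n_max):
    """Plain Sieve of Eratosthenes returning the list of primes <= n_max."""
    flags = [True] * (n_max + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n_max) + 1):
        if flags[p]:
            for m in range(p * p, n_max + 1, p):
                flags[m] = False
    return [n for n, is_p in enumerate(flags) if is_p]


def heuristic_estimate(n_max):
    """Heuristic count: n_max times the product of (1 - 1/p) over primes p <= sqrt(n_max)."""
    estimate = float(n_max)
    for p in primes_up_to(math.isqrt(n_max)):
        estimate *= 1.0 - 1.0 / p
    return estimate


EULER_GAMMA = 0.5772156649015329
limit_factor = 2.0 * math.exp(-EULER_GAMMA)  # asymptotic overestimation factor

for n_max in (10**4, 10**5, 10**6):
    actual = len(primes_up_to(n_max))
    estimate = heuristic_estimate(n_max)
    print(n_max, actual, round(estimate), round(estimate / actual, 4))
print("2*exp(-gamma) =", round(limit_factor, 4))
```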
In conclusion, this heuristic procedure gives a justification of the P.N.T. that is equivalent to the relation (2), except for the constant $2e^{-\gamma}$ [,]. In Section 3, the proposed LINPES procedure will be described, which gives a justification of the P.N.T. that is instead equivalent to the more precise estimation (4), by means of a procedure that is not purely probabilistic but also involves analytic considerations, which may be shared with other scientific fields.
3. The LINPES Estimation of π(x) Equivalent to the Li(x) Function
In this section, the novel heuristic LINPES procedure is described, showing that it can give an estimation of the prime number function $\pi(x)$. To this end, an ensemble of periodic binary sequences will be considered in limited intervals of the prime characteristic function. This topic is of great interest because the distribution of primes in short intervals has been deeply investigated in the literature up to the present [,]. The proposed procedure also provides useful insights into the trend of the twin prime number function $\pi_2(x)$. In this analysis, we set $p_0 = 1$ in the following for convenience, even though the integer 1 is not considered to be a prime.
3.1. Periodic Binary Sequences Inside the Prime Characteristic Function
The occurrence of pieces of periodic binary sequences inside the prime characteristic function is discussed here. To this end, both the Sieve procedure and a modified version of it are investigated step by step, where each step is labelled with the progressive index $k$, with $k = 0$ denoting the beginning of the two procedures. The only difference between the two is that, in each step $k$ of the Sieve procedure, only the multiples of the prime $p_k$ are struck out, but not the prime itself, whereas in the modified Sieve procedure the prime itself is also deleted. As previously stated, the status of each integer (0→deleted, 1→undeleted) is stored in an N-size vector, which is initialized with all 1 values. For each step $k$, the Sieve procedure and its modified version each produce an output binary sequence. Consequently, the deletion of an integer in either procedure simply means that a 0 replaces a 1 in the corresponding sequence. In the case of the Sieve procedure, the output sequence at step $k$ is an approximation of the prime characteristic function.
At the beginning of the procedures (), we have two equal periodic sequences of all values, that is, and , whose period is . In the first step of the modified Sieve procedure (), the multiples of are struck out, including itself. Consequently, we obtain a sequence , which is still periodic, with alternating and symbols. The period of is given by the prime value itself, that is, . In the following, will denote the period of the sequence . Conversely, in the Sieve procedure, the prime is not deleted. In this case, the output sequence is not periodic, but includes a piece of the periodic sequence , by starting from the square . Before such a value, the previous sequence is preserved, which coincides with . It follows that is a mixed sequence, being composed by pieces of both and , that is,
Similarly, in the second step of the modified Sieve procedure ($k = 2$), every multiple of 3 that is not yet struck out is deleted, including the prime 3 itself, to give a new sequence. Therefore, this sequence comes from the deletion of all the multiples of the primes 2 and 3, including the primes themselves. It follows that this sequence is periodic, with a period equal to the product of 2 and 3, as will be demonstrated in Theorem 1. If we consider the second step of the Sieve procedure, where the primes 2 and 3 have not been deleted, we obtain a different output sequence. This is again a mixed sequence, where a piece of the periodic sequence is introduced starting from the square $3^2 = 9$, whereas the previous binary values are kept before this square. Consequently, we have
In general, the multiples of the prime , which are not yet struck out in the previous steps, are deleted in the k-th step of the modified Sieve procedure, including the prime itself. Consequently, after performing all the first steps, we obtain the periodic sequence , as shown in Theorem 1. In the case of the original Sieve procedure, after the k-th step, we obtain the sequence , which is an approximation of the prime characteristic function until the prime . Such an approximation differs from the previous one , only by starting from the square . In fact, after this point, a piece of the periodic sequence is recognizable. It follows that can be eventually written as a mixed sequence, which is a generalization of Equations (15) and (16), that is,
By evaluating the expression (17), we can recognize that subsets of the periodic binary sequences are present, for each , in the related intervals of the prime characteristic function. This happens until the end of the Sieve procedure, because each interval is not influenced by the deletions done in the following steps. We now show that the sequences are periodic and that their periods are given by the product of all the primes up to .
Theorem 1.
Consider the binary sequences generated by the deletion of the multiples of all the primes up to $p_k$, including the primes themselves. Then these sequences are periodic, and their periods are given by the product of all the primes up to $p_k$, that is,
$$\prod_{i=1}^{k} p_i. \tag{18}$$
Proof.
The deletion of the multiples of all the primes up to $p_k$ leaves, for each $k$, the reduced residue system modulo the product given by Equation (18). Each such set is composed of all the positive integers relatively prime to this modulus, that is, of all the numbers whose greatest common divisor with the modulus is 1. The quantity of integers in each set is given by the Euler phi function of the modulus, which counts the positive integers less than the modulus and relatively prime to it. Moreover, each reduced residue system forms an abelian group under multiplication, so that it is associated with a principal Dirichlet character. This is an arithmetical function that coincides with the binary sequence under consideration, being defined as
In [], it is proven that is a periodic sequence, and in particular that
This completes the proof. ☐
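The statement of Theorem 1 can be checked numerically for small cases. The sketch below builds the modified-sieve sequence for the primes 2, 3 and 5 by testing coprimality with their product, then verifies that the product is a period and that no shorter shift is. Indexing the sequence from n = 1 and the function name `modified_sieve_sequence` are assumptions of this sketch. The last lines also check that the undeleted integers between 5² and 7² are primes, as described in Section 3.1.

```python
from math import gcd, prod


def modified_sieve_sequence(primes, length):
    """Bit for n = 1..length: 1 if no prime in `primes` divides n, else 0."""
    modulus = prod(primes)
    return [1 if gcd(n, modulus) == 1 else 0 for n in range(1, length + 1)]


def has_period(seq, shift):
    """True if shifting the sampled sequence by `shift` leaves it unchanged."""
    return all(seq[i] == seq[i + shift] for i in range(len(seq) - shift))


primes = [2, 3, 5]
period = prod(primes)
seq = modified_sieve_sequence(primes, 4 * period)

is_periodic = has_period(seq, period)
shorter_periods = [t for t in range(1, period) if has_period(seq, t)]
print(period, is_periodic, shorter_periods)

# Undeleted integers in [5^2, 7^2): these should all be primes.
survivors = [n for n in range(5**2, 7**2) if seq[n - 1] == 1]
print(survivors)
```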
Table 1 reports the periods of the sequences , , in comparison with the sizes of the intervals , where subsets of each are recognizable. The pseudo-prime is put in brackets.
Table 1.
Periods of the sequences , for primes , in comparison with the sizes of the intervals . The ratios are rapidly decreasing as the prime grows.
By considering the ratios between the interval sizes and the periods, it is evident that the periods increase much faster than the widths of the intervals. This explains why the periodicity of the sequences is hardly recognizable when only the subsets of each sequence lying in these intervals are inspected.
3.2. The Symmetric Sequences of the Runs of Zeroes in the Periods
In Section 3.1, the prime distribution has been represented as the intersection of an endless number of periodic binary sequences, whose periods grow rapidly, and such that subsets of these sequences can be found in limited intervals of the prime characteristic function. In particular, each of these intervals ranges between the square of a prime $p_k$ and that of its successor $p_{k+1}$. Consequently, the actual primes in each interval are given by the 1 values of the corresponding sequence. In order to complete this analysis, we now consider the gaps between these primes, following an established trend in the literature. In particular, we are interested in the distribution of the runs of zeroes in each period, since the binary sequences are composed of isolated ones followed by strings of zeroes of varying length. It follows that the number of runs also gives the number of undeleted integers (i.e., isolated ones) in each period, because the period, for $k \ge 1$, is an even number, so that the last digit of each period is a zero.
Let us consider the Sieve procedure described step by step in Section 3.1 and the number of runs of zeroes in each period of the binary sequences. For $k = 0, 1$, we have only one run, whose sizes are 1 and 2, respectively. For $k = 2$, the deletion of the multiples of both 2 and 3 gives two runs in the period 6, whose sizes are 4 and 2, respectively, and so on. Table 2 reports the number of runs and their sizes, where one index identifies the specific run and $k$ gives the step of the Sieve procedure. Noticeably, the runs of each period are symmetrical around a symmetry center given by a run sized 4, except for a final run that is sized 2. Such a trend is expected to be a rule also for the successive steps.
Table 2.
Runs of zeroes in the periods of the sequences , for primes . For each , the number of runs and their sizes are reported, with . Let us notice the symmetry of the runs in each period . By starting from , the symmetry center is given by a run of length , whereas the final run of length is out of symmetry.
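The symmetry of the runs can be reproduced for the first few periods. The following sketch assumes that the size of a run is measured as the difference between two consecutive undeleted integers (so that a run of size two corresponds to a twin pair), and that the last run of a period closes on the first undeleted integer of the next period. The function name `run_sizes` is hypothetical.

```python
from math import gcd, prod


def run_sizes(primes):
    """Differences between consecutive undeleted integers over one period."""
    period = prod(primes)
    # Undeleted integers of one period, plus the first one of the next period
    # (period + 1 is always coprime to the period), so the final run is closed.
    survivors = [n for n in range(1, period + 2) if gcd(n, period) == 1]
    return [b - a for a, b in zip(survivors, survivors[1:])]


for primes in ([2], [2, 3], [2, 3, 5], [2, 3, 5, 7]):
    sizes = run_sizes(primes)
    body, final_run = sizes[:-1], sizes[-1]
    is_symmetric = body == body[::-1]
    print(primes, len(sizes), is_symmetric, final_run)
```

The symmetry follows from the fact that n is coprime to the period exactly when the period minus n is, so the undeleted integers are mirrored around the middle of each period.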
3.3. The Relation Between the Primes in an Interval and the Runs in a Period
For evidencing the relation between each period and the correspondent number of runs of zeroes , we report in Table 3 the scores of for .
Table 3.
Periods and related runs of zeroes for the primes . The special prime is put in round brackets.
Such scores also give the number of the integers that have not been struck out by the modified Sieve procedure in each period, which in turn can be related to the number of undeleted integers (and consequently of the primes) in the corresponding interval. We will show in Theorem 2 that these two quantities are correlated, so that the number of primes in each interval can be inferred. According to the theory of congruences, Theorem 2 gives the quantity of the integers that have not been struck out in each period, that is,
Theorem 2.
Consider the periodic binary sequences defined in Theorem 1, whose periods are given by Equation (18). Then the number of undeleted integers in a period, that is, the number of runs of zeroes, is given by
Proof.
The number of undeleted integers in each period is given by the number of integers in the reduced residue systems modulo , that is, the number of positive integers less than and relatively prime to . Such a value is given by the Euler phi function , once computed in , that is []
where , are the primes dividing . ☐
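As a quick check of Theorem 2, the sketch below counts the undeleted integers in one period by brute force and compares the result with the Euler phi function of the period, which for a product of distinct primes reduces to the product of (p − 1). The function names are hypothetical and only small cases are tested.

```python
from math import gcd, prod


def count_undeleted(primes):
    """Brute-force count of integers in one period that are coprime to the period."""
    period = prod(primes)
    return sum(1 for n in range(1, period + 1) if gcd(n, period) == 1)


def phi_of_period(primes):
    """Euler phi of a product of distinct primes: the product of (p - 1)."""
    return prod(p - 1 for p in primes)


for primes in ([2], [2, 3], [2, 3, 5], [2, 3, 5, 7], [2, 3, 5, 7, 11]):
    print(prod(primes), count_undeleted(primes), phi_of_period(primes))
```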
By starting from , Table 1 shows that the interval is included in the first period of the sequence . Consequently, a subset of the undeleted integers in each period lies in the correspondent interval , where they are just primes. Therefore, we can infer the quantity of primes in each , by starting from the quantity in the correspondent period . As a first approximation, a simple proportional relationship is investigated. Let us consider the local density of the undeleted integers in the period , where is computed in sliding intervals whose size is the same of , that is, . In this context, the index represents the starting point of each . If such intervals span the whole period , we assume that the density is not a function of . In this case, it is equal to the average density over , and we have
It is noteworthy that the product structure in Equation (23) is the same as in Equation (12). Let us suppose that the previous assumption holds. Then, an estimation of the local density in each interval (that is, for ), will be just the average density over the period . Consequently, we can write
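Therefore, starting from Equation (23), we can estimate the quantity of primes in each interval between the consecutive prime squares $p_k^2$ and $p_{k+1}^2$. To this end, the average density is multiplied by the interval size, that is,
$$\left( p_{k+1}^2 - p_k^2 \right) \prod_{i=1}^{k} \left( 1 - \frac{1}{p_i} \right). \tag{25}$$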
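The per-interval estimation can be sketched as follows: for each pair of consecutive primes, the average density (the running product of 1 − 1/p) is multiplied by the distance between their squares and compared with the actual number of primes found there. Treating each interval as half-open, from the lower square up to but excluding the upper one, is an assumption of this sketch. It does not affect the counts, since prime squares are composite. The helper names are hypothetical.

```python
import math


def primes_up_to(n_max):
    """Plain Sieve of Eratosthenes returning the list of primes <= n_max."""
    flags = [True] * (n_max + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n_max) + 1):
        if flags[p]:
            for m in range(p * p, n_max + 1, p):
                flags[m] = False
    return [n for n, is_p in enumerate(flags) if is_p]


def interval_estimates(primes):
    """Rows (p, interval size, estimated primes, actual primes) for consecutive primes."""
    prime_set = set(primes_up_to(primes[-1] ** 2))
    rows = []
    density = 1.0
    for p, p_next in zip(primes, primes[1:]):
        density *= 1.0 - 1.0 / p            # average density after sieving by primes up to p
        lo, hi = p * p, p_next * p_next      # interval [p^2, p_next^2)
        estimate = density * (hi - lo)
        actual = sum(1 for n in range(lo, hi) if n in prime_set)
        rows.append((p, hi - lo, estimate, actual))
    return rows


rows = interval_estimates(primes_up_to(50))
for row in rows:
    print(row)
```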
3.4. The Novel LINPES Estimation of the Prime Number Function
Equation (25) gives a succession of estimations of the real number of primes in each interval . Therefore, the next step will be to blend all these scores to compute a global estimation of the quantity of the primes up to , where , analogously to Equation (12). In theory, is simply computable by adding all the contributions of Equation (25), for , where is the greatest prime number not exceeding . However, such a procedure includes the term , which is unknown. In order to overcome this issue, the computation of has to involve only the terms up to , plus a final term , where the interval is only partially considered. Consequently, we obtain
where , and . Let us notice that Equation (26) includes as many contributions as the primes are, where each term is given by a relation similar to Equation (12), with the global size that is replaced by the size of the interval . Each contribution includes an average number of primes that is given by , so that the average distance between two consecutive primes is , which is of the order of magnitude of . For the Cramér conjecture [], this average distance is . Another conjecture by Cramér, by starting from the Riemann’s hypothesis, was [,]. Consequently, the error given by neglecting the partial term is smaller than the loading term of the Cramér conjectures, so that the partial term could be omitted.
3.5. The Corrected LINPES Estimation by Using the Equivalence with the Li(x) Function
We want now to show that Equations (3) and (26) are related. To this end, we write the logarithmic integral function as a summation of integrals, each of them is computed in the interval , that is,
where the first term starts from to cope with a possible improper integral, and is the greatest square of a prime less than . Consequently, the function is expressed by Equation (27) as a succession of estimations , in a similar way to Equation (26), that is,
where , , and
We now apply the Mean Value Theorem to each interval in Equation (27), that is,
where , , , and , . In order to show the equivalence between the Equations (26) and (30), we also consider the lower bound of the interval . By taking, in the two summations, the ratio between the two terms multiplying the interval size , we can write
From Equation (13), we have
where , so that its maximum distance from is . However, we know that the prime is given asymptotically by []. Therefore, and , so that for each point we have . It follows that
and consequently Equation (31) gives, for each fixed ,
It follows that the trends of the two estimations (26) and (30) are the same as , apart from the constant coefficient . Due to this multiplicative factor, the proposed estimation (26) overestimates the prime number function with respect to Equation (30), and in this sense it is similar to the heuristic procedure described in Section 2. However, it has to be noticed that this last one is completely probabilistic, whereas the proposed method is also based on an analytical procedure, that is, the recognition of an infinite number of binary periodical sequences and related intervals of the prime characteristic function. In order to correct this discrepancy, we relax the conjecture of Section 3.3, in such a way the trend of the local density becomes a function of . Experimentally, the values of the local density in the interval are lower than those of the average density . The following conjecture is then proposed, which links and by means of the constant of the Third Mertens’ Theorem [].
Conjecture 1.
The local density of the undeleted integers in the period , if computed in sliding intervals whose size is the same of , is a function of the starting point of the sliding interval. In particular, the average density is greater than the local density in the interval , in such a way the succession of their ratios exceeds the unity. Moreover, the limit value as of is equal to the constant of the Third Mertens’ Theorem, that is,
The typical trend of , for and varying , is plotted in Figure 1, together with the average density in the period . Let us notice that, as it will be discussed in the following, such a trend is less appreciable for small values of the primes.
Figure 1.
Typical trend (in black), with and , of the local density of the non-deleted integers by varying in sliding intervals whose size is . Notice that it is shown only the initial part of the period , whose order of magnitude is , in such a way the symmetrical trend of the period falls outside the figure. The red line reports a polynomial fitting of the density , whereas the blue line concerns the average density in the period . The minimum value of the local density is just reached at the lower bound of the interval , that is, .
Figure 1 can be explained as follows. Let us consider the sequences defined in Section 3.1, where the multiples of the primes up to $p_k$ have been struck out, including the primes themselves. In each of these sequences, all the undeleted integers in the range between $p_k$ and $p_{k+1}^2$ are primes, whereas the undeleted integers greater than $p_{k+1}^2$ can be either primes or composites, because the multiples of the primes greater than $p_k$ have not yet been struck out.
At the beginning of the modified Sieve procedure (), the local density of the undeleted integers is not a function of , because no integer has been still struck out. In the first step (), only the even integers (i.e., the multiplies of ) have been struck out, so that is still a constant value up to infinity. Noticeably, the multipliers (i.e. the integers multiplying to give the deleted multiplies) are equal to the undeleted integers when the procedure starts (i.e., all the integers). This rule also holds for the following steps, that is, the multipliers of the prime in the step of the modified Sieve procedure are equal to the undeleted integers in the previous step. It follows that the multipliers of are all the odd integers, whose distribution is again uniform. Some of these multipliers (that is, ) are just primes in the interval , but they can also be composites beyond . In this case, the distribution of the composite multipliers exactly compensate the decreasing trend of the distribution of the multipliers that are also prime numbers. If the primes are sufficiently small, such a compensation happens quickly, because it starts from . In these cases, the distribution of the local density is still approximately uniform. However, as grows, a transient state is noticeable, because, for such values of and small values of , the local density is greater than the average density . In fact, for such values, only a portion of the multiplies of the primes , have been struck out, because the deletion of the multiplies of the prime , starts only from , apart from the prime itself. This means that the deletion of the multiplies of , is completed only at the lower bound of the interval , that is, . Consequently, after this point, the transient state ends and the stationary state begins, where the local density fluctuates around the average density .
Figure 1 shows the trend of the local density in the case of . Starting approximately from this value of , we can notice a minimum value for the distribution of , which is located immediately after the transient state, that is, at the lower bound of the interval . Such a minimum value is about a 10 percent lower than the average density . In fact, as previously explained, the multipliers of the prime are just primes up to , whereupon they can be even composites. It follows that the distribution of the composite multipliers compensate the decreasing distribution of the multipliers that are prime numbers only starting from the multiple . Therefore, as , such a compensation is delaying, in such a way the ratio between and more and more grows up to the value of Equation (35). As a matter of fact, if all the multipliers were primes, their distribution would decrease by following a logarithmic trend, so that would augment with the same trend, by starting from the minimum value in the interval . In the real case, however, the compensation given by the composite multipliers has the effect that the local density does not grow indefinitely, but tends to the limit value . Let us notice that, if we stop the procedure to a finite value of , the ratio between and is , where the succession is increasing and tends to the limit value as .
In order to evaluate the effect of the compensation delay for the small primes , , in comparison with the case of , Table 4 reports: a) the multipliers such that the multiples lie in the interval , and b) the first multiplier that is a composite number, that is, , whose correspondent multiple is . Evidently, as grows, the difference between the upper bound of and becomes so large that the compensation effect of the composite multipliers is no longer noticeable in the interval itself.
Table 4.
Prime numbers , and , and the related intervals , together with: a) the multipliers such that the multiples lie inside the intervals ; b) the first multiplier that is a composite number. Let us notice that the difference between and the multipliers rapidly grows, so that the distance between the multiple and the upper bound of the interval becomes larger and larger.
Figure 2 shows the trend of the succession , as approaches infinity. Evidently, such a succession tends to the constant value . The x-axis is in a logarithmic scale, in such a way the values of can be visualized up to .
Figure 2.
Trend of the succession whose elements are the ratios between the average densities in the period and the local densities in the correspondent interval . For , such a succession asymptotically approximates the constant . In the x-axis, a base-10 logarithmic scale has been chosen for a better visualization.
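The succession of ratios described in this caption can be approximated with a short computation. The sketch below assumes that the local density in each interval equals the number of primes between two consecutive prime squares divided by the interval size, and that the average density is the running product of 1 − 1/p. For comparison it prints 2·exp(−γ), the value suggested by combining Mertens' Third Theorem with a local prime density of about 1/ln(p²). Individual ratios fluctuate, so only the general trend is meaningful. The helper names are hypothetical.

```python
import math


def primes_up_to(n_max):
    """Plain Sieve of Eratosthenes returning the list of primes <= n_max."""
    flags = [True] * (n_max + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n_max) + 1):
        if flags[p]:
            for m in range(p * p, n_max + 1, p):
                flags[m] = False
    return [n for n, is_p in enumerate(flags) if is_p]


def density_ratios(primes):
    """Pairs (p, average density / local density) for consecutive primes."""
    prime_set = set(primes_up_to(primes[-1] ** 2))
    ratios = []
    average = 1.0
    for p, p_next in zip(primes, primes[1:]):
        average *= 1.0 - 1.0 / p
        lo, hi = p * p, p_next * p_next
        local = sum(1 for n in range(lo, hi) if n in prime_set) / (hi - lo)
        ratios.append((p, average / local))
    return ratios


EULER_GAMMA = 0.5772156649015329
ratios = density_ratios(primes_up_to(1000))
for p, ratio in ratios[-5:]:
    print(p, round(ratio, 4))
print("2*exp(-gamma) =", round(2.0 * math.exp(-EULER_GAMMA), 4))
```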
Finally, Table 5 highlights the equivalence between the proposed estimation (26) and the logarithmic-integral one (3). To this end, a number of linear regressions have been computed between the occurrences (25) of the proposed estimation in each interval and the corresponding ones (29) of the logarithmic-integral function. Each row of Table 5 refers to the prime squares ranging from one power of ten to the next, except the first row, which groups the lowest ranges together in order to process a sufficient number of points. For each of these ranges, we report the slope and the intercept of the linear regressions, together with the coefficient of determination $R^2$, which measures the agreement between the two estimations. Evidently, the coefficient of determination tends very fast to its optimal value, that is, 1, even as the number of observations increases. Let us notice that the intercept is practically negligible with respect to the full-scale level, whereas the slope approaches the constant value $2e^{-\gamma}$.
Table 5.
Parameters and coefficients of determination of the linear regressions of the proposed estimations versus the logarithmic-integral ones , together with the parameters and coefficients of determination of the linear regressions of versus the true number of primes . Each point is computed in an interval .
For comparison, Table 5 also reports the parameters and the coefficient of determination in the case of the linear regressions concerning the occurrences versus the targets . These scores are defined as the number of primes in each interval . Even in this case, the fitting between and is impressive, as shown by the coefficient of determination . Noticeably, the slope still approaches the value , because the P.N.T. guarantees that the logarithmic-integral function and the prime number function go to infinity in the same way.
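The regressions described above can be reproduced on a small scale. The following is a minimal sketch, not the authors' code: it assumes that the proposed per-interval estimate is the interval size multiplied by the average density $\prod_{i \le k}(1 - 1/p_i)$, that the intervals are $[p_k^2, p_{k+1}^2)$, and that the logarithmic-integral increment can be approximated by a Simpson rule. The function names (`sieve_flags`, `li_increment`, `linear_fit`, `interval_scores`) and the bound `prime_limit` are hypothetical.

```python
import math
from itertools import accumulate


def sieve_flags(n):
    """Plain Sieve of Eratosthenes: flags[i] == 1 exactly when i is prime."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def li_increment(a, b, steps=200):
    """Composite Simpson approximation of the integral of 1/ln(t) on [a, b]."""
    h = (b - a) / steps
    total = 1 / math.log(a) + 1 / math.log(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) / math.log(a + j * h)
    return total * h / 3


def linear_fit(xs, ys):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


def interval_scores(prime_limit=1000):
    """Per-interval scores on [p_k^2, p_{k+1}^2) for consecutive primes up to prime_limit."""
    flags = sieve_flags(prime_limit ** 2)
    pi = list(accumulate(flags))  # pi[n] = number of primes <= n
    primes = [p for p in range(2, prime_limit + 1) if flags[p]]
    proposed, log_integral, true_counts = [], [], []
    density = 1.0
    for p, q in zip(primes, primes[1:]):
        density *= 1 - 1 / p  # assumed average density after sieving by primes <= p
        lo, hi = p * p, q * q
        proposed.append(density * (hi - lo))
        log_integral.append(li_increment(lo, hi))
        true_counts.append(pi[hi - 1] - pi[lo - 1])
    return proposed, log_integral, true_counts


proposed, log_integral, true_counts = interval_scores()
print("proposed vs Li-increment:", linear_fit(log_integral, proposed))
print("proposed vs true counts: ", linear_fit(true_counts, proposed))
```

Under these assumptions, the slope of the first regression should sit slightly above 1, since the average density overestimates the local one, while both coefficients of determination should be close to 1.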
From the previous analysis, it follows that, for a given , the proposed approximation overestimates the prime number function by a factor , which can be computed by considering that we have an overestimation for each interval that can be computed by considering a factor in the finite set , where is such that (see Equation (34)). If , the overestimation factor tends to the constant . Being unknown, an adjusted version (36) of (26) can be defined by means of the correction factor , that is,
Clearly, the corrected version is able to give better estimations than as approaches infinity. In order to give a quantitative assessment, Table 6 reports the scores of (26) and of its adjusted version (36), in comparison with the logarithmic integral estimation (27), and with the prime number function . The range of each row of Table 6 starts at a power of ten and ends at the next one, up to .
Table 6.
The proposed estimation and its adjusted version in comparison with the logarithmic integral estimation , and the prime number function . The scores of have been computed by using the MATLAB® toolbox. The scores of and have been rounded to the nearest integer.
It can be noticed that the scores of slightly underestimate both the true number of primes and the logarithmic integral function , which, in turn, is such that the sign of its difference with changes infinitely many times [,], by showing some irregularities in the distribution of the primes [], which have been investigated by considering differences in some subsets of the primes themselves []. Concerning the previous underestimation, this is due to the fact that the limit value is an upper bound for the succession . Evidently, would be perfectly accurate if the terms were available for the computation of (36), by considering the real number of primes in each interval .
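The effect of the correction factor can be checked numerically on a small range. The sketch below is illustrative only and rests on two assumptions that are not confirmed by the text of this section: that the uncorrected estimate is the sum over the intervals $[p_k^2, p_{k+1}^2)$ of the interval size times $\prod_{i \le k}(1 - 1/p_i)$, and that the limiting overestimation factor is $2e^{-\gamma}$, where $\gamma$ is the Euler-Mascheroni constant. The names `prime_estimates` and `prime_limit` are hypothetical.

```python
import math
from itertools import accumulate

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sieve_flags(n):
    """Plain Sieve of Eratosthenes: flags[i] == 1 exactly when i is prime."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def prime_estimates(prime_limit=1000):
    """Return (x, raw, corrected, true) for primes in [4, x), with x the largest prime square used."""
    flags = sieve_flags(prime_limit ** 2)
    pi = list(accumulate(flags))  # pi[n] = number of primes <= n
    primes = [p for p in range(2, prime_limit + 1) if flags[p]]
    raw, density = 0.0, 1.0
    for p, q in zip(primes, primes[1:]):
        density *= 1 - 1 / p             # assumed average density in the period
        raw += density * (q * q - p * p)  # contribution of [p^2, q^2)
    x = primes[-1] ** 2
    true = pi[x - 1] - pi[3]              # primes 2 and 3 lie below the first interval
    corrected = raw * math.exp(EULER_GAMMA) / 2  # assumed correction: divide by 2*exp(-gamma)
    return x, raw, corrected, true


x, raw, corrected, true = prime_estimates()
print(f"x = {x}: raw = {raw:.0f}, corrected = {corrected:.0f}, true = {true}")
```

The raw sum is expected to exceed the true count, and the corrected value to land much closer to it; how close depends on the range, since the ratio only approaches its limit asymptotically.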
4. An Extension of the Procedure to the Twin Prime Numbers
4.1. Preliminary Concepts
Two prime numbers $p$ and $q > p$ are twin primes if $q - p = 2$, which is the lowest possible distance between primes, apart from $2$ and $3$, where $3 - 2 = 1$. Let us note that two consecutive pairs of twin primes never occur, apart from the case $(3, 5)$ and $(5, 7)$. In fact, one number in the sequence $p$, $p + 2$, $p + 4$ is certainly a multiple of 3. The gaps between consecutive primes have been extensively investigated in the literature [,,]. However, differently from the primes, it is presently unknown whether there are infinitely many pairs of twin primes. In any case, a preliminary counting shows that the twin primes are relatively abundant in the sequence of primes, and, consequently, it is reasonable to infer the so-called twin prime conjecture, which states that there are infinitely many pairs of twin primes. This conjecture is strengthened by the fact that the distribution of the primes does not change abruptly. Recently, significant progress has been made by showing that $\liminf_{n \to \infty} (p_{n+1} - p_n) < \infty$, that is, a finite upper bound exists for the limit inferior of the difference between consecutive primes. In particular, Zhang found that $\liminf_{n \to \infty} (p_{n+1} - p_n) < 7 \times 10^7$ [], and this bound has been successively improved by Maynard to $600$ []. Finally, the Polymath project, whose aim is to collect all the various efforts that try to lower the bound as much as possible, has reached the value of $246$ []. Evidently, in order to demonstrate the twin prime conjecture, a bound of $2$ should be obtained. In this work, we try to give a contribution to the discussion of this conjecture, by following a different strategy, that is, by exploiting the concepts previously introduced for the primes. Consequently, as for the primes, the approach is not merely probabilistic, but also analytic, so constituting a possible significant step for further advancements, as in the case of approaches based on periodic functions []. The distribution of the twin primes is commonly characterized by using the twin prime function (6). Such a distribution decays more rapidly than the distribution of the primes.
In fact, Brun demonstrated in 1919 [] that the series of the reciprocals of the twin primes converges to a finite limit, known as Brun's constant $B_2 \approx 1.902$ [], that is,
$$\sum_{p,\; p+2 \text{ prime}} \left( \frac{1}{p} + \frac{1}{p+2} \right) = B_2 \qquad (37)$$
regardless of whether the number of summation terms is infinite or not, whereas the same summation instead diverges for the primes.
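The slow convergence of Brun's series can be illustrated with partial sums. This is a minimal sketch, assuming the usual convention in which every twin pair $(p, p+2)$ contributes $1/p + 1/(p+2)$, so that $5$ is counted in both $(3, 5)$ and $(5, 7)$; the function name `brun_partial_sum` is hypothetical.

```python
import math


def sieve_flags(n):
    """Plain Sieve of Eratosthenes: flags[i] == 1 exactly when i is prime."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p+2) over twin pairs (p, p+2) with p + 2 <= limit."""
    flags = sieve_flags(limit)
    return sum(
        1 / p + 1 / (p + 2)
        for p in range(3, limit - 1, 2)
        if flags[p] and flags[p + 2]
    )


for limit in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(limit, brun_partial_sum(limit))
```

The partial sums grow monotonically but remain well below the limit even at $10^6$, which shows how sparse the twin primes are compared with the primes, whose reciprocal sum diverges.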
Analogously to the P.N.T., a possible function for approximating the twin prime function has been proposed [] as the logarithmic integral function (8). As for the primes, we want to obtain an equivalent procedure and investigate possible consequences.
4.2. A Possible Relation Between the Twin Primes in the Intervals and the Undeleted Integers in the Periods
In Section 3.2, the distribution of the runs within each period has been investigated. In the present analysis, the same investigation can be made for the particular case in which the size of the runs is $2$. Evidently, such an investigation can potentially give an estimation of the quantity of twin primes, similarly to the one given by Equation (26) for the primes. In fact, we will suggest that the number of the runs of size $2$ in the interval is equal to the quantity of twin primes in the same interval. Such a number is equal to the number of $101$ sequences, if the sequence is completely included in the interval. However, such a sequence cannot occur across two intervals, because each interval, apart from the first one, ends with an even number (that is, a $0$) and is followed by the square of an odd prime (that is, another $0$), which is an odd number. For the sake of clarity, in the following we denote the runs sized $2$ as runs . Let us notice that this procedure can be extended to run-lengths of whatever size, by following the Hardy-Littlewood conjecture B []. Such a topic will be the object of future explorations.
Table 7 reports the number of the runs in each period for . As for the total number of runs (21) in the same period, a correlation can be found between and the prime number . In particular, the scores of Table 7 suggest the following conjecture for
Table 7.
Number of runs , denoted as , that are included in the periods , for . These scores are compared with the total number of runs . The special prime is put in round brackets.
Equation (38) can be investigated by following the modified Sieve procedure. At the start of the procedure, there is no run of size $2$. In the first step, the multiples of $2$ are struck out, so that the sequence is made of runs of size $2$ only. In particular, a single such run is included in the period of length $2$. In the second step, we delete the multiples of $3$, so that the period becomes three times greater, that is, $6$. This implies that the number of runs of size $2$ could increase from $1$ to $3$, but the deletion at the point $3$ removes two of these runs. Let us notice that the cancellation of one multiple removes two runs only in this step, because all the runs are consecutive; this does not happen in the following steps, where only one run of size $2$, or even none, is deleted at a time. It follows that a single run of size $2$ remains, as in the previous step. On the whole, we obtain that the deleted runs in the period of length $6$ are a fraction $2/3$ of the total number of runs that the same period would contain if no cancellations were made.
Similarly, in the third step, the multiples of $5$ are struck out, so that the period becomes five times greater, that is, $30$. It follows that the number of runs of size $2$ would grow from $1$ to $5$, but two cancellations (at the points $5$ and $25$) remove two of the five runs. Consequently, three runs of size $2$ remain, and the fraction of the deleted runs is $2/5$ of the total runs that this period would contain if no cancellation were made. In this step, each cancellation deletes exactly one run of size $2$, but this is not a general rule for the following steps. In fact, for the prime $7$, we have eight cancellations in the period of length $210$, but only six of them strike out a run of size $2$. However, the fraction of the deleted runs in this period is still given by $2/7$ of the pre-existing ones before the cancellations, being $6/21 = 2/7$.
In the case of primes, it follows from relation (21) that we strike out, in each step, a fraction $1/p$ of the total number of runs that the period would contain if no cancellations were made, where $p$ is the prime whose multiples are deleted in that step; this total is given by the product of $p$ and the actual number of runs in the previous period. By considering the scores of Table 7, a similar relation can be conjectured for the runs of size $2$ in the case of twin primes, in order to link the number of cancelled runs and the total number of runs in the period if no cancellations were made. Unfortunately, in general, the actual number of the deleted runs is not easily computable by starting from the total number of cancellations in the period. However, as for the primes, our conjecture is that the deletion of the multiples of $p$ exactly cancels a fraction $2/p$ of the runs of size $2$ in the period.
If this conjecture holds, Equation (38) follows by induction. In fact, it is true for . Let us suppose that Equation (38) holds for and show that it is also true for . By the induction hypothesis, the number of runs in the period is given by . We must show that the number of runs in the period is . Given , the number of runs in the new period becomes , because is times greater than . By taking the previous conjecture, a fraction of the runs is struck out, in such a way we have a fraction of residual runs given by .
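The count of runs of size 2 in a period can be checked by brute force for the first few primes. The sketch below makes two assumptions that are not spelled out in this section: the undeleted integers of the modified Sieve procedure in a period are taken to be the integers coprime to the product of the primes used so far, and the closed form of Equation (38) is taken to be the product of $(p - 2)$ over the odd primes used. The function names `runs_in_period` and `conjectured_size_two_runs` are hypothetical.

```python
from math import gcd, prod


def runs_in_period(primes):
    """Count, in one period, the undeleted integers and the runs of size 2.

    Assumption: an integer n is undeleted when it is coprime to the period,
    and a run of size 2 starts at n when both n and n + 2 are undeleted.
    """
    period = prod(primes)
    undeleted = [n for n in range(1, period + 1) if gcd(n, period) == 1]
    size_two = sum(1 for n in undeleted if gcd(n + 2, period) == 1)
    return period, len(undeleted), size_two


def conjectured_size_two_runs(primes):
    """Assumed closed form: product of (p - 2) over the odd primes in the list."""
    return prod(p - 2 for p in primes if p > 2)


first_primes = [2, 3, 5, 7, 11]
for k in range(1, len(first_primes) + 1):
    used = first_primes[:k]
    period, total_runs, size_two = runs_in_period(used)
    print(used[-1], period, total_runs, size_two, conjectured_size_two_runs(used))
```

For these small periods the brute-force count and the assumed product agree, which is the pattern the induction argument above relies on; the loop can be extended to 13 (period 30030) at a modest cost.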
4.3. A Heuristic Estimation of Equivalent to the Approximation
From Equation (38), we can give an estimation of the twin prime function , which is equivalent to the approximation given by the function (8). Such an estimation can be viewed as a generalization of Equation (26) to the case of the twin primes. To this end, analogously to Equation (23) for the primes, we compute the average density of the number of runs in a period . By starting from the total number of runs in the period , the average density is given by the relation
As for the primes, we can initially approximate the local density in the interval as the average density , that is, . In this case, the estimated number of twin primes in , for , is given by
The total estimation is then obtained by adding all the contributions , that is,
where , , , and is the greatest prime number not exceeding . As for the primes, Equation (41) overestimates the true scores, because the local density is not actually constant in the period , but it is a function of . However, the offset of the local density in the interval with respect to the average density is greater than for the primes. Experimentally, each value (40) exceeds the true quantity of twin primes computed in by about , that is, roughly double the percentage previously found for the primes and reported in Figure 1, even if the trends of the local densities are similar. Quantitatively, the ratio between the average density and the local density seems to approximate the constant as , that is, the square of .
To support this statement, let us consider the estimation given by the function, that is, , for , from Equation (8), that is, , as a summation of integrals, each of which is computed in the interval
being the greatest square of a prime less than . Similarly to Equation (28), we can write Equation (42) as a succession of estimations in each interval , that is,
where , and
Then, we apply the Mean Value Theorem for Integrals to Equation (42) in each interval
where the point belongs to the interval , belongs to the interval , and belongs to the interval . As for the primes, we have to consider the lower bound of the interval . Let us take the ratio between the two terms multiplying the size , in the summations of the Equations (41) and (45), so that we obtain
If we consider the lower bound of the interval , we have
Let us notice that the ratio can be split as
Consequently, we obtain
Then, we define
From Equation (49) and considering that (see Section 3.5), the limit, as , of the ratio (47) is given by
We noticed in the Equation (33) that
For a given , the proposed approximation overestimates the twin prime number function by a factor , which can be computed by considering that we have an overestimation for each interval that can be computed by considering a factor in the finite set , where is such that . Equations (55) and (56) show that the succession tends to the constant as . Consequently, we can define a corrected version (57) of the proposed estimation , by multiplying Equation (41) by the factor , that is,
As for the primes, Equation (57) is expected to improve the estimation of as approaches infinity. This is evidenced in the scores of Table 8, where a comparison is made between the proposed estimation and its adjusted version with the estimation given by the logarithmic integral function (8) and the twin prime number function . The ranges of are the same as Table 6.
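The behaviour of the twin prime estimate and of its corrected version can be explored on a small range. The following sketch is illustrative and assumption-laden: the average density of runs of size 2 is taken as $\tfrac{1}{2}\prod_{3 \le p_i \le p_k}(1 - 2/p_i)$, the intervals are taken as $[p_k^2, p_{k+1}^2)$, and the correction is taken as division by $(2e^{-\gamma})^2$, the square of the constant assumed for the primes. The names `twin_estimates` and `prime_limit` are hypothetical.

```python
import math
from itertools import accumulate

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sieve_flags(n):
    """Plain Sieve of Eratosthenes: flags[i] == 1 exactly when i is prime."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def twin_estimates(prime_limit=1000):
    """Return (x, raw, corrected, true) for twin pairs lying entirely in [4, x)."""
    limit = prime_limit ** 2
    flags = sieve_flags(limit + 2)
    # twin_cum[n] = number of m <= n such that m and m + 2 are both prime
    twin_cum = list(accumulate(flags[m] & flags[m + 2] for m in range(limit + 1)))
    primes = [p for p in range(2, prime_limit + 1) if flags[p]]
    raw, density = 0.0, 1.0
    for p, q in zip(primes, primes[1:]):
        # assumed density of runs of size 2: 1/2 after sieving by 2, then (1 - 2/p) per odd prime
        density *= 0.5 if p == 2 else 1 - 2 / p
        raw += density * (q * q - p * p)
    x = primes[-1] ** 2
    true = twin_cum[x - 3] - twin_cum[3]  # pairs (m, m + 2) with 4 <= m and m + 2 < x
    corrected = raw * (math.exp(EULER_GAMMA) / 2) ** 2
    return x, raw, corrected, true


x, raw, corrected, true = twin_estimates()
print(f"x = {x}: raw = {raw:.0f}, corrected = {corrected:.0f}, true = {true}")
print(f"raw / true = {raw / true:.4f}")
```

Under these assumptions the ratio between the raw estimate and the true count should be noticeably larger than the corresponding ratio for the primes, consistent with the squared constant discussed above.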
Table 8.
The proposed estimation and its adjusted version in comparison with the logarithmic integral estimation and the twin prime number function . The scores of the logarithmic integral function have been computed by using the MATLAB® toolbox. The scores of and have been rounded to the nearest integer.
The connection between the estimation (41) and the estimation (42) is investigated in Table 9, by considering the parameters and the coefficient of determination of the linear regressions between the occurrences of (40) versus those of , where is given by (44), in each interval . As for the primes, an excellent fitting is given by the linear relationship between and . This is confirmed by the coefficient of determination , which rapidly tends to as grows. On the other hand, the intercept is negligible, whilst the slope approaches the limit value .
Table 9.
Parameters and coefficients of determination of the linear regressions of the proposed estimations for the twin primes versus the logarithmic-integral ones , together with the parameters and coefficients of determination of the linear regressions of versus the true number of twin primes . Each point is computed in an interval .
The fitting of the linear regressions between the occurrences of (40) versus those of the twin prime number function , if computed in the same interval , is also reported in Table 9. Even if less impressive than in the case of Table 5 for the primes, the goodness of the fitting is clearly shown by the coefficient of determination , which is practically at its best value. As for , the slope seems to approximate the limit value .
In summary, the proposed approach estimates the true number of twin primes by considering the number of runs in each interval , in such a way that each estimation fits the corresponding one given by . Consequently, if conjecture (38) holds, we can infer that the distribution of the twin primes follows the same trend in all the intervals . Because these intervals are a function of the squares of both a prime and its successor, and because the primes are a never-ending succession, the unproved hypothesis of the infinitude of the twin primes would be further strengthened.
5. Conclusions and Future Developments
In this work, an original heuristic procedure for obtaining the distribution of the prime number function is proposed and investigated, which gives estimations of the scores of equivalently to the logarithmic integral function . However, this approach is not fully probabilistic, but it is also based on analytical concepts: a set of infinitely many binary periodic sequences is found by means of a modified Sieve procedure, whose periods have a subset that is included in limited and disjoint intervals of the prime characteristic function. In each period , these binary sequences define a succession of values, which are separated by runs of consecutive zeroes. Starting from the number of runs of zeroes in a period , an estimation of the total number of primes can be found, which is linked to the logarithmic integral estimation by the constant of Mertens' Third Theorem. Noticeably, the succession of the runs of zeroes, whose elements are the gaps between two consecutive primes, is symmetric in each period . As a result, the proposed LINPES procedure estimates the prime number function in each interval , whose bounds are the squares of a prime number and of the successive one. As a particular case, this procedure is also specialized to the case of the twin primes, in such a way that only the runs sized are considered in each period. Consequently, a heuristic relation for the number of these runs in a period is formulated, whose trend is linked to the relation previously found for the total number of runs in the case of primes. Therefore, such a relation gives an estimation of the twin prime number function in each interval , which is equivalent to the estimation of the logarithmic integral function , by means of the square of the constant . Since the bounds of these intervals are given by squares of primes, their number is infinite. As a consequence, the proposed procedure could contribute to the discussion on the presumed infinitude of the twin primes.
Future developments will further investigate the relation of the number of runs in a period , together with the symmetry of the succession of the runs of zeroes.
Author Contributions
Conceptualization, B.A.; Methodology, B.A., S.B., L.S. and M.S.; Formal Analysis, L.S.; Investigation, B.A., S.B., L.S. and M.S.; Data Curation, B.A., L.S. and M.S.; Writing—Original Draft Preparation, B.A.; Writing—Review & Editing, B.A. and M.S.; Visualization, M.S.; Supervision, S.B.
Funding
This research received no external funding.
Acknowledgments
The authors would like to thank the site https://primes.utm.edu/lists/small/millions/ for providing the prime numbers that have been used for the computations in this work.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Crandall, R.; Pomerance, C. Prime Numbers: A Computational Perspective; Springer: New York, NY, USA, 2001.
- Gauss, C.F. Letter to Encke, dated 24 December (1849). Werke Kng. Ges. Wiss. Göttingen 1863, 2, 444–447.
- Legendre, A.M. Essai sur la théorie des Nombres; Duprat: Paris, France, 1798.
- Torquato, S.; Zhang, G.; de Courcy-Ireland, M. Uncovering multiscale order in the prime numbers via scattering. J. Stat. Mech. 2018.
- Goldston, D.A. Are There Infinitely Many Twin Primes? Available online: http://arxiv.org/pdf/0710.2123.pdf (accessed on 17 April 2019).
- Hardy, G.H.; Littlewood, J.E. Some problems of “Partitio Numerorum”, III: On the expression of a number as a sum of primes. Acta Math. 1923, 44, 1–70.
- Helfgott, H.A. An improved Sieve of Eratosthenes. Available online: https://arxiv.org/abs/1712.09130 (accessed on 17 April 2019).
- Apostol, T.M. Introduction to Analytic Number Theory; Springer: New York, NY, USA, 1976.
- Fine, B.; Rosenberger, G. Number Theory: An Introduction via the Distribution of Primes; Birkhäuser: Boston, MA, USA, 2007.
- Montgomery, H. A heuristic for the Prime Number Theorem. Math. Intell. 2006, 28, 6–9.
- Mertens, F. Ein Beitrag zur analytischen Zahlentheorie. J. Reine Angew. Math. 1874, 78, 46–62.
- Granville, A. Harald Cramér and the Distribution of Prime Numbers. Available online: https://www.dartmouth.edu/~chance/chance_news/for_chance_news/Riemann/cramer.pdf (accessed on 17 April 2019).
- Selberg, A. On the normal density of primes in small intervals, and the difference between consecutive primes. Arch. Math. Naturvid. 1943, 47, 87–105.
- Languasco, A.; Zaccagnini, A. Short intervals asymptotic formulae for binary problems with prime powers. J. Théorie Nombres Bordeaux 2018, 30, 609–635.
- Cramer, H. On the order of magnitude of the difference between consecutive prime numbers. Acta Arith. 1936, 2, 23–46.
- Cramer, H. On the distribution of primes. Proc. Camb. Phil. Soc. 1920, 20, 272–280.
- Bays, C.; Hudson, R.H. A new bound for the smallest x with π(x) > Li(x). Math. Comput. 2000, 69, 43–56.
- Saouter, Y.; Demichel, P. A sharp region where π(x) − Li(x) is positive. Math. Comput. 2010, 79, 2395–2405.
- Bays, C.; Hudson, R.H. Zeroes of the Dirichlet L-functions and irregularities in the distribution of primes. Math. Comput. 1999, 69, 861–866.
- Granville, A.; Martin, G. Prime number races. Am. Math. Mon. 2006, 113, 1–33.
- Pintz, J. Very large gaps between consecutive primes. J. Num. Theor. 1997, 63, 286–301.
- Zhang, Y. Bounded gaps between primes. Ann. Math. 2014, 179, 1121–1174.
- Maynard, J. Small gaps between primes. Ann. Math. 2015, 181, 383–413.
- Polymath, D.H.J. The “Bounded Gaps between Primes” Polymath Project: A Retrospective. Available online: https://arxiv.org/abs/1409.8361 (accessed on 17 April 2019).
- Bagchi, B. A promising approach to the twin prime problem. Reson 2003, 8, 26–31.
- Brun, V. La série 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 + 1/43 + 1/59 + 1/61 + … où les dénominateurs sont “nombres premiers jumeaux” est convergent ou finie. Bull. Sci. Math. 1919, 43, 100–104.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).