1. Introduction and Main Results
The subject of this study is upper and lower bounds for probabilities of the type , where  are independent identically distributed Bernoulli random variables. In other words, we estimate tail probabilities of the binomial distribution. To this end we use the Poisson approximation.
It should be noted that although the binomial distribution is formally a very special case, it is of great importance in applications. Moreover, due to its simplicity, sharper bounds are attainable for the binomial distribution than in the general case.
Let us start with the well-known Hoeffding inequality. Assuming that the independent random variables  satisfy the condition , , W. Hoeffding [1] deduced the inequality

where , . In the case of identically distributed random variables  we have , and inequality (1) remains the same. Making in (1) the change of variable , we get
      
In turn this inequality can be written in the following form,
      
      where
      
is the so-called relative entropy, or Kullback–Leibler distance, between two two-point distributions  and  concentrated at the same pair of points.
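In the Bernoulli case these quantities admit a direct numerical check. The sketch below uses the standard entropy form of Hoeffding's bound for the binomial tail; the function names `kl_bernoulli` and `binom_upper_tail` are ours, not the paper's notation.

```python
from math import comb, log, exp

def kl_bernoulli(x, p):
    """Relative entropy (Kullback-Leibler distance) H(x | p) between
    two-point distributions with success probabilities x and p."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

def binom_upper_tail(n, p, k):
    """P(S_n >= k) for S_n ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Hoeffding's bound in entropy form: P(S_n >= n*x) <= exp(-n * H(x | p)), x > p.
n, p = 100, 0.1
for k in (15, 20, 30):
    x = k / n
    assert binom_upper_tail(n, p, k) <= exp(-n * kl_bernoulli(x, p))
```

The assertions confirm the entropy form of the bound for a few tail points of the Binomial(100, 0.1) law.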
Apparently, I. Sanov [2] was the first to state probability inequalities in terms of functions of the type , where , , .
The starting point in proving (1) and many other probability inequalities for independent random variables is the following bound.
Let there exist  such that

where  are the distribution functions of , . Then for every , we have

where

In the case of i.i.d. random variables, inequality (5) can be written in the following form,

where , and G is the distribution of . On the other hand, for each  the following identity holds,

where

is the Esscher transformation of the distribution function  (see [3]). Note that, starting with the classic work of Cramér [4], Esscher's transform has been repeatedly used in the theory of large deviations.
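For a two-point law the Esscher transformation can be written in closed form: the tilted law is again Bernoulli. The sketch below (function names ours; the closed-form tilt and the choice of h shifting the mean to a prescribed x are the standard formulas) illustrates this.

```python
from math import exp, log

def esscher_bernoulli(p, h):
    """Esscher (exponential) tilt of a Bernoulli(p) law: the tilted law is
    again Bernoulli, with success probability p*e^h / (1 - p + p*e^h)."""
    return p * exp(h) / (1 - p + p * exp(h))

def tilt_to_mean(p, x):
    """Tilting parameter h for which the tilted Bernoulli mean equals x
    (requires 0 < p < x < 1)."""
    return log(x * (1 - p) / (p * (1 - x)))

p, x = 0.1, 0.3
h = tilt_to_mean(p, x)
# The tilted mean is exactly x, which is how the tilt is used in
# large-deviation arguments: the rare event becomes typical.
assert abs(esscher_bernoulli(p, h) - x) < 1e-12
```

Choosing h in this way is the step that makes the tail event typical under the transformed law, which is the essence of Cramér's method.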
Let  be such that

and denote

It follows from (7) and (10) that

Notice that although the method used in this work essentially coincides with the method of our previous article on estimates of large deviations in the case of the normal approximation [5], the function  differs from the function  from [5] by the absence of the factor . The nuance is that in this work we deal with one-sided distributions, and directly copying the previous approach would make the reasoning unnecessarily cumbersome.
Taking into account (9), it is easily seen that  satisfies the equality , where

For any nondegenerate random variable  we have . Therefore, . Estimating , we can sharpen the Hoeffding inequality.
Note that in the case  the asymptotics of  was found in [4] (p. 172) under the condition , , namely,

where , the restriction  being imposed (see also [6]), and

is the so-called Mills ratio (here  and  are the distribution function and the density, respectively, of the standard normal law).
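As a quick numerical illustration of the classical Mills ratio of the standard normal law (the two-sided bounds x/(1+x²) ≤ R(x) ≤ 1/x for x > 0 are classical; the function name is ours):

```python
from math import erfc, exp, pi, sqrt

def mills_ratio(x):
    """Mills ratio R(x) = (1 - Phi(x)) / phi(x) of the standard normal law,
    computed via the complementary error function."""
    phi = exp(-x * x / 2) / sqrt(2 * pi)       # standard normal density
    tail = 0.5 * erfc(x / sqrt(2))             # 1 - Phi(x)
    return tail / phi

# Classical sandwich bounds: x/(1+x^2) <= R(x) <= 1/x for all x > 0.
for x in (0.5, 1.0, 3.0, 10.0):
    r = mills_ratio(x)
    assert x / (1 + x * x) <= r <= 1 / x
```

The bounds show that R(x) behaves like 1/x for large x, which is what makes it a convenient correction factor in tail asymptotics.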
Let  be an arbitrary positive number, and let  denote the distribution function of the Poisson law with mean . We will also use the notation . Note that we consider distribution functions to be left-continuous.
In connection with (12), note that in the present work we define and use the following analogue of the Mills ratio for the Poisson distribution with an arbitrary parameter : for every integer ,

M. Talagrand [7] sharpened the Hoeffding inequalities for , where K is a constant about which it is known only that it exists. The bounds obtained in [7] are stated in terms of K as well.
Note that Talagrand, like Hoeffding, considers the case of non-identically distributed random variables.
In the present work we estimate  in the case of Bernoulli trials with explicit values of the constants, imposing no restriction on y.
In what follows we use the following notation: F is the distribution function of the Bernoulli random variable with parameter p, , and  is the n-fold convolution of F.
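The tails of the n-fold convolution F*ⁿ (the binomial law) can be computed directly and compared with a Poisson law of mean np. The sketch below (function names ours; the tolerance 0.01 is only a sanity threshold, justified by the classical Le Cam total-variation bound np²) illustrates how good the approximation is for small p.

```python
from math import comb, exp, factorial

def binom_tail(n, p, k):
    """P(S_n >= k), the upper tail of the n-fold convolution F^{*n}
    of a Bernoulli(p) distribution function F."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def poisson_tail(mu, k):
    """P(N >= k) for N ~ Poisson(mu)."""
    return 1 - sum(exp(-mu) * mu**j / factorial(j) for j in range(k))

# For small p, the Poisson law with mean n*p approximates the binomial
# tail well (Le Cam: total-variation distance <= n*p^2 = 0.004 here).
n, p = 1000, 0.002
for k in (3, 5, 8):
    assert abs(binom_tail(n, p, k) - poisson_tail(n * p, k)) < 0.01
```

This is the elementary form of the Poisson approximation; the body of the paper refines it with explicit correction factors and a variable Poisson parameter.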
In what follows we assume that x satisfies the following condition,

It is not hard to verify that  satisfying (9) in the case  and  has the following form,

Notice that  under condition (15). In what follows, .
We get from (14) and (16) that

Denote by  the distribution function of the Poisson law with parameter . If the variable x from (18) approaches 0, it is natural to take  with  as the approximating distribution for . Precisely this distribution is used in Theorem 2. However, we first need another approximating Poisson distribution with the mean

depending not only on the parameters n and p, but also on the variable x from formula (15). We shall call this distribution the variable Poisson distribution.
Let us formulate the first statement about the connection between the behavior of the tails  and . First we introduce the function

We have

where . The function  can be represented as a series:

Note that the series  converges since, by condition (15), we have .
Proposition 1. If condition (15) is fulfilled, then 

The following theorem gives one more form of the dependence of the tails of the binomial distribution on the tails  of the variable Poisson distribution. It is a consequence of Proposition 1, but is by no means trivial and requires proving a number of additional statements, which are given in Section 3.
Theorem 1. If condition (15) is fulfilled, then 

Example 1. Let , , , . Table 1 shows the corresponding values of the function . In accordance with Theorem 1, Table 1 shows that the approximation deteriorates as x increases.

Remark 1. It is known that the binomial distribution with parameters n, p is well approximated by the Poisson distribution with parameter  if p is small enough [8]. The Poisson distribution from equalities (24) and (26) has a different parameter. However, we have  when x is close to 0 and . In the next claims we consider the Poisson approximation with parameter . Note also that the Poisson distribution with parameter  degenerates when x is close to 1. See also Table 2.

Remark 2. A necessary condition for good approximation in (26) is the smallness of x, namely, . This agrees with the result of Yu.V. Prokhorov [9], according to which in the case  () the Poisson approximation to the binomial distribution is more precise than the normal approximation. However, when x is close to 0, we have  (). In this case, λ can be both large and small. This also applies to the values of . Note that  for any . Indeed, it is easy to see that . Therefore, . This means that  for all .

Theorem 2. If condition (15) is fulfilled, then the following equality holds, where  is the function from Theorem 1.

Remark 3. It follows from Remark 2 that if in the representation (26) the difference  is replaced by , where , then instead of the function  it will be necessary to insert another correction factor, which will be less than . The form of this factor is indicated in Theorem 2. In this connection, we note that the exponential function on the right-hand side of (28) has a negative exponent, in contrast to the exponential function in (26).

The following table gives an idea of the relationship between the tails of the approximating distributions under consideration:  and .
By  we will denote quantities, possibly different in different places, satisfying the bound .
Rewrite (28) in the form:

where .
Let us give a table of values of the functions ,  and . Let , , , . The calculations yield the following table (Table 3).
Taking into account that  does not differ much from 1 (see Table 3), we write . We will use the elementary identity . Putting , , we obtain

and

Note that equality (31) is another form of Theorem 2.
The following inequalities hold:  and . Hence, by (30),  if , and  if .
In the next theorem, an estimate of  is obtained.
Theorem 3. If condition (15) is fulfilled, then 

1. The closeness of  and  to 0 ensures the closeness of  to zero. Moreover, as was said in Remark 2, the closeness of  to 0 agrees with  [9].
2. Under the condition  the quantity  may not tend to zero.
Remark 5. Let us discuss the relation between x, n and p under which the function  approaches zero. Obviously,  can tend to zero only if  and .
Let the parameters n and p be fixed, and let us find . For brevity we write  as . Obviously, . Therefore, the minimum of  is attained at the point , with  for  and  for .
A calculation shows that  if . This condition can be considered fulfilled. From here,

where . Thus,  if and only if . Indeed, let  and  be the left and right branches of the function  with respect to the line . In this case, the domain of  is , and that of  is . On the other hand, for each  one can specify an interval  containing  such that the inequality  holds for .
These functions are strictly monotone and therefore have inverses,  and , respectively. Then the required interval has the form . Note that the domain of these inverse functions is the same: .
Example 2. Let , . Then , , . The graph of the function  is shown in Figure 1. Take , for example. Finding the roots of the equation , we get , , . Note that ε can be chosen arbitrarily small only if  is sufficiently close to 0.

The following table (Table 4) shows the behavior of the interval  as ε decreases. Note that near the point  both functions  and  that form  make approximately the same contribution to . For instance, , where , .
Corollary 1. Let condition (15) be fulfilled and . Then 

Remark 6. Note that the behavior of the series  is determined by its first summand, in contrast to the Cramér series in the case of Gaussian approximation [4].

6. Supplement
In this section, we offer the reader some conjectures regarding the behavior of .
To avoid making the table too cumbersome, we did not include columns corresponding to .
Nevertheless, we verified that for each  the equality

holds. Our conjecture is as follows: for every ,

Note that the sequence  decreases monotonically for . This property also holds for the sequences  for every fixed  and  for every fixed . According to the CLT,  converges in the uniform metric to the normal law . On the other hand,  approaches . This means that

Using formula (83), we get the elements of the last row of Table 5 and, hence, the elements of the last row of Table 6.
The next conjecture concerns the existence and the value of the limit of  as . Calculations for  suggest that

In this connection we remark that the behavior of the differences  under the condition  is investigated in [10].
Note that (84) is equivalent to the assumption

i.e.,  is realized at . This fact is fairly easy to prove in the case , using the results of the paper [10], in which this case is considered. In the case , a proof is more difficult to find, but it certainly exists.
After that, we can assert that formula (84) is valid for all k and, moreover, that there exists . Indeed, according to [10],

whence

According to Table 6, the constant  in the inequality

cannot be less than .
If we impose the constraint , then the lower bound for  is not less than  (see Table 6). As for the upper bound for , it is equal to  if in (85) the supremum with respect to p is taken over all p such that , .
If we adhere to the principle of incomplete induction, then the available information is sufficient to assert that .
Note that in the case , it is sufficient to swap the roles of p and .
Table 6 demonstrates the following remarkable property:  Therefore, it is highly plausible that

Moreover, the following equality is highly plausible,

Equality (86) is another of our conjectures. If this assumption is true, then instead of (39) we have the more precise estimate

If the hypothetical estimate (87) is correct, the main statements of the present work can be sharpened.
Since , on the right-hand side of inequality (25) in Proposition 1 the product  can be replaced by .
Taking into account the inequality , in all places the constant  can be replaced by . In particular, in the formulations of Theorems 1–3,  can be replaced by .
Taking into account that , and using Table 6, we conclude that in the case  and  (see the row “”, the column “”),

As k grows, the coefficient of p in (88) decreases, but it cannot become less than .