On Approximation of the Tails of the Binomial Distribution with Those of the Poisson Law

The subject of this study is the behavior of the tail of the binomial distribution under the Poisson approximation. The deviation from unity of the ratio of the tail of the binomial distribution to that of the Poisson distribution, multiplied by a correction factor, is estimated. A new type of approximation is introduced, in which the parameter of the approximating Poisson law depends on the point at which the approximation is performed. The transition to approximation by the Poisson law with parameter equal to the mathematical expectation of the approximated binomial law is then carried out. In both cases error estimates are obtained. A number of conjectures are made about refinements of the known estimates for the Kolmogorov distance between the binomial and Poisson distributions.


Introduction and Main Results
The subject of this study is upper and lower bounds for probabilities of the type P( ∑_{i=1}^n X_i ≥ nx ), where X_1, ..., X_n are independent identically distributed Bernoulli random variables. In other words, we estimate tail probabilities of the binomial distribution. To this end we use the Poisson approximation.
It should be noted that although the binomial distribution is very special from the formal point of view, it is of great importance in applications. Moreover, due to its simplicity, more exact bounds are attainable for the binomial distribution than in the general case.
Let us start with the well-known Hoeffding inequality. Assuming that the independent random variables X_1, ..., X_n satisfy the condition 0 ≤ X_i ≤ 1, i = 1, ..., n, W. Hoeffding [1] deduced the inequality
P( ∑_{i=1}^n X_i ≥ n(µ + t) ) ≤ ( (µ/(µ + t))^{µ+t} ((1 − µ)/(1 − µ − t))^{1−µ−t} )^n.   (1)
In the case of identically distributed random variables X_j we have µ = EX_1, and inequality (1) remains the same. Making in (1) the change of variable n(µ + t) = y, we get
P( ∑_{i=1}^n X_i ≥ y ) ≤ (nµ/y)^y ( (1 − µ)/(1 − y/n) )^{n(1−y/n)}.
In turn, this inequality can be written in the following form,
P( ∑_{i=1}^n X_i ≥ y ) ≤ e^{−nH(y/n, µ)},
where
H(t, p) = t ln(t/p) + (1 − t) ln((1 − t)/(1 − p))
is the so-called relative entropy, or Kullback-Leibler distance, between the two-point distributions (t, 1 − t) and (p, 1 − p) concentrated at the same pair of points. Apparently, I. Sanov [2] was the first to state probability inequalities in terms of functions of the type ∑_{j=1}^m t_j ln(t_j/p_j). The starting point for proving (1) and many other probability inequalities for independent random variables is the following bound.
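As a numerical sanity check of the two equivalent forms of the bound, the following sketch (plain Python; the values n = 100, µ = 0.1, y = 20 are our illustrative choice, not taken from the text) compares the exact binomial tail with the right-hand side of the Hoeffding bound and with exp(−nH(y/n, µ)):

```python
import math

def H(t, p):
    # relative entropy between the two-point laws (t, 1 - t) and (p, 1 - p)
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

def binom_tail(n, p, k):
    # exact P(S_n >= k) for S_n ~ Binomial(n, p)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def hoeffding_bound(n, mu, y):
    # right-hand side of the Hoeffding inequality after the change of variable
    return (n * mu / y) ** y * ((1 - mu) / (1 - y / n)) ** (n * (1 - y / n))

n, mu, y = 100, 0.1, 20
exact = binom_tail(n, mu, y)
bound = hoeffding_bound(n, mu, y)
entropy_form = math.exp(-n * H(y / n, mu))
# the two expressions of the bound agree, and both dominate the exact tail
```

The algebraic identity bound = exp(−nH(y/n, µ)) holds term by term, so the two computed values differ only by floating-point rounding.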
Let there exist H_0 > 0 such that
R(h; V_j) = ∫ e^{hu} dV_j(u) < ∞, 0 < h ≤ H_0, j = 1, ..., n,
where V_j are the distribution functions of X_j, j = 1, ..., n. Then for every 0 < h ≤ H_0 we have
P( ∑_{j=1}^n X_j ≥ y ) ≤ e^{−hy} ∏_{j=1}^n R(h; V_j).   (5)
Thus, in the case of i.i.d. random variables inequality (5) can be written in the following form,
1 − G_n(y) ≤ min_{h>0} e^{−hy} R^n(h; G),   (6)
where G_n(y) = P( ∑_{j=1}^n X_j < y ) and G is the distribution of X_1. On the other hand, for each 0 < h ≤ H_0 the following identity holds,
1 − G_n(y) = R^n(h; G) ∫_y^∞ e^{−hu} dG_n^{(h)}(u),   (7)
where
G_n^{(h)}(y) = R^{−n}(h; G) ∫_{−∞}^y e^{hu} dG_n(u)   (8)
is the Esscher transform of the distribution function G_n(y) (see [3]). Note that, starting with the classic work of Cramér [4], the Esscher transform has been repeatedly used in the theory of large deviations. Let h_0 be such that
h_0 y − n ln R(h_0; G) = max_{h>0} ( hy − n ln R(h; G) ),   (9)
and denote
ω_n(y; G) = ∫_y^∞ e^{−h_0 u} dG_n^{(h_0)}(u).   (10)
It follows from (7) and (10) that
1 − G_n(y) = R^n(h_0; G) ω_n(y; G).   (11)
Notice that although the method used in this work essentially coincides with the method of our previous article on estimates of large deviations in the case of normal approximation [5], the function ω_n(y; G) differs from the function ω_n(y; G) of [5] by the absence of the factor e^{h_0 y}. The nuance is that in this work we are dealing with one-sided distributions, and a direct copy of the previous approach would make the reasoning unnecessarily cumbersome.
Note that in the case EX_1 = 0 the asymptotics of ω_n(y; G) was found in [4], where σ² = EX_1², under the restriction y = o(n) (see also [6]); here M(t) = (1 − Φ(t))/ϕ(t) is the so-called Mills ratio (Φ(t) and ϕ(t) are the distribution function and the density, respectively, of the standard normal law). Let λ be an arbitrary positive number and Π_λ(y) the distribution function of the Poisson law with mean λ. We will also use the notation π_λ(j) = (λ^j/j!) e^{−λ}. Note that we consider distribution functions to be continuous from the left.
In connection with (12), note that in the present work we introduce and use an analogue of the Mills ratio for the Poisson distribution with an arbitrary parameter λ, defined for every integer k ≥ 0. M. Talagrand [7] sharpened the Hoeffding inequalities for y < nσ²/(Kb), where K is a constant about which it is known only that it exists. The bounds obtained in [7] are stated in terms of K as well.
Remark that Talagrand, like Hoeffding, considers the case of non-identically distributed random variables.
In the present work we estimate ω_n(y; G) in the case of Bernoulli trials with explicit values of the constants, without imposing any restriction on y.
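To give the Poisson analogue of the Mills ratio a concrete shape, the sketch below adopts the natural candidate M_λ(k) = (1 − Π_λ(k))/π_λ(k) = P(Y ≥ k)/π_λ(k), with Π_λ left-continuous as stipulated above; this exact form is our assumption for illustration, not necessarily the paper's displayed definition:

```python
import math

def poisson_pmf(lam, j):
    return lam**j * math.exp(-lam) / math.factorial(j)

def poisson_tail(lam, k):
    # P(Y >= k) = 1 - Pi_lambda(k) for a left-continuous CDF; series truncated
    total, term, j = 0.0, poisson_pmf(lam, k), k
    while term > 1e-20:
        total += term
        j += 1
        term *= lam / j
    return total

def mills_poisson(lam, k):
    # assumed analogue of the Mills ratio: tail mass over the probability at k
    return poisson_tail(lam, k) / poisson_pmf(lam, k)

# for k = 0 the tail equals 1, so the ratio reduces to e^lambda
ratio = mills_poisson(2.0, 0)
```

Since P(Y ≥ k) ≥ π_λ(k), this ratio is always at least 1, mirroring the behavior of the classical Mills ratio near the origin.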
In what follows we use the following notation: F is the distribution function of the Bernoulli random variable with parameter p, 0 < p ≤ 1/2, and F_{n,p} = F^{*n} is the n-fold convolution of F. Obviously,
R(h; F) = q + p e^h, where q = 1 − p.   (14)
In what follows we will assume x to satisfy the condition
p < x < 1.   (15)
It is not hard to verify that h_0 satisfying (9) in the case G = F and y = nx has the following form,
h_0 = ln( xq / (p(1 − x)) ).   (16)
Notice that h_0 > 0 under condition (15). In what follows h = h_0. We get from (14) and (16) that
R(h_0; F) = q / (1 − x).   (17)
Hence, by (11),
1 − F_{n,p}(nx) = ( q/(1 − x) )^n ω_n(nx; F),   (18)
where ω_n is defined in (10). Denote by Π_λ(t) the distribution function of the Poisson law with a parameter λ > 0. If the variable x from (18) approaches 0, it is natural to take Π_λ with λ = np as the approximating distribution for F_{n,p}. Just this distribution is used in Theorem 2. However, first we need another approximating Poisson distribution, with mean λ_1 = np(1 − x)/q depending not only on the parameters n and p but also on the variable x from formula (15). We shall call this distribution the variable Poisson distribution. Let us formulate the first statement about the connection between the behaviors of the tails 1 − F_{n,p}(nx) and 1 − Π_{λ_1}(nx). First introduce the function A(x, n, p) from formula (21). We have u = (x − p)/q, and the function −ln(1 − u) − u can be expanded in the series ∑_{k=2}^∞ u^k/k.
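To make the variable Poisson law concrete, the following sketch (reusing the data n = 500, p = 0.002 of Example 1; the comparison itself is ours) sets λ_1 = np(1 − x)/q and compares the exact binomial tail with the tails of Π_{λ_1} and Π_{np}:

```python
import math

def binom_tail(n, p, k):
    # exact P(S_n >= k) for S_n ~ Binomial(n, p)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def poisson_tail(lam, k):
    # P(Y >= k) = 1 minus the first k Poisson probabilities
    return 1.0 - sum(lam**j * math.exp(-lam) / math.factorial(j) for j in range(k))

n, p = 500, 0.002
q = 1 - p
for k in (2, 3, 4, 5):
    x = k * p                      # the grid of Example 1: x = kp
    lam1 = n * p * (1 - x) / q     # parameter of the variable Poisson law
    t_bin = binom_tail(n, p, k)    # here nx = k is an integer
    t_var = poisson_tail(lam1, k)  # tail of the variable Poisson law
    t_cls = poisson_tail(n * p, k) # tail of the classical Poisson law, lambda = np
```

For these parameters all three tails agree to within a few percent, in line with Remark 1 (λ_1 ≈ np when x is close to 0).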

Proposition 1.
If condition (15) is fulfilled, then the equality (24) holds. The following theorem gives one more form of the dependence of the tails of the binomial distribution on the tails 1 − Π_{λ_1}(nx) of the variable Poisson distribution. It is a consequence of Proposition 1, but by no means a trivial one, and it requires a number of additional statements, which are proved in Section 3.

Theorem 1. If condition (15) is fulfilled, then the equality (26) holds, where u = (x − p)/q.

Example 1. Let n = 500, p = 0.002, x = kp, k = 2, ..., 5. Table 1 shows the corresponding values of the function c_1 √(nx³).

Remark 1.
It is known that the binomial distribution with parameters n, p is well approximated by the Poisson distribution with parameter np if p is small enough [8]. The Poisson distribution in the equalities (24) and (26) has a different parameter. However, λ_1 = np(1 − x)/q ≈ np when x is close to 0 and p < x. In the next claims we consider the Poisson approximation with parameter np. Note also that the Poisson distribution with parameter λ_1 degenerates when x is close to 1. See also Table 2.

Remark 2.
A necessary condition for good approximation in (26) is the smallness of x, namely x < θn^{−1/3}. This agrees with the result of Yu. V. Prokhorov [9], according to which, in the case x < θn^{−1/3} (θ = 0.637), the Poisson approximation to the binomial distribution is more precise than the normal approximation. However, when x is close to 0, λ_1/λ ≈ 1 (λ = np). In this case λ can be both large and small. This also applies to the values of nx.

Theorem 2.
If condition (15) is fulfilled, then the equality (28) holds, where the correction factor is expressed through the function from Theorem 1.

Remark 3. It follows from Remark 2 that if in the representation the Poisson law Π_λ with λ = np is used, then instead of the function A(x, n, p) it will be necessary to insert another correction factor, which will be less than A(x, n, p). The form of this factor is indicated in Theorem 2. In this connection, we note that the exponential function on the right-hand side of (28) has a negative exponent, in contrast to the exponential function in (26).
The following table gives an idea of the relationship between the tails of the approximating distributions under consideration, Π_{λ_1} and Π_λ. Table 2. Values of the ratio. By θ we denote quantities, possibly different in different places, satisfying the bound |θ| ≤ 1.
Remark 4.
1. The closeness of xn^{1/3} and p/x to 0 ensures the closeness of r_2(x) to zero. Moreover, as was said in Remark 2, the closeness of xn^{1/3} to 0 agrees with [9].
2. Under the condition x = o(n^{−1/3}) the quantity nx² need not tend to zero.

Remark 5.
Let us discuss the connection between x, n, and p under which the function r(x) = c_1 √(nx³) + p/x approaches zero. Obviously, r(x) can tend to zero only if x → 0 and p/x → 0. Let the parameters n and p be fixed, and let us find min_{p<x<1} r(x). For brevity we write r(x) as
r(x) = a x^{3/2} + p/x, a = c_1 √n.
Indeed, let f(x) and g(x) be the left and right branches of the function r(x) with respect to the line x = x_0. In this case the domain of f(x) is (0, x_0] and that of g(x) is [x_0, ∞). On the other hand, for each ε > 0 one can specify an interval (x_1, x_2) containing x_0 such that for x ∈ (x_1, x_2) the inequality r(x) − r(x_0) < ε holds.
The graph of the function r(x) = ax 3/2 + p x is shown in Figure 1. Take ε = 0.05 for example. Finding the roots of the equation r(x) − r(x 0 ) = ε, we get: Note that ε can be chosen arbitrarily small only if np 3 is sufficiently close to 0.
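With r(x) written as a·x^{3/2} + p/x, a = c_1 √n, the stationary point is available in closed form: solving r′(x) = (3/2) a x^{1/2} − p x^{−2} = 0 gives x_0 = (2p/(3a))^{2/5}. A minimal sketch (the values n = 500, p = 0.002 are our illustrative choice):

```python
import math

c1 = 5.4489                 # the constant from the error estimates
n, p = 500, 0.002
a = c1 * math.sqrt(n)

def r(x):
    # r(x) = c1 * sqrt(n * x**3) + p / x, written as a*x**1.5 + p/x
    return a * x ** 1.5 + p / x

# stationary point of r: solving 1.5*a*sqrt(x) = p/x**2 for x
x0 = (2 * p / (3 * a)) ** 0.4
```

Since r is strictly convex on (0, ∞) (its second derivative (3/4)a·x^{−1/2} + 2p·x^{−3} is positive), x_0 is the unique minimizer.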
The following table (Table 4) shows the behavior of the interval (x_1, x_2).

Remark 6. Note that the behavior of the series nqΛ_3((x − p)/q) is determined by its first summand, in contrast to the Cramér series in the case of Gaussian approximation [4].


Corollary 1.
Let condition (15) be fulfilled and c_1 √(nx³) < 1. Then

Proof of Proposition 1
We will use one result from [8]. The latter is formulated as follows.
Let X_1, ..., X_n be independent Bernoulli random variables. Denote S_n = ∑_{j=1}^n X_j, F_{S_n} the distribution of the sum S_n, p_j = P(X_j = 1), λ = ∑_{j=1}^n p_j, and Π_λ the Poisson distribution with parameter λ. In the paper [8] the following estimate for the total variation distance d_TV(F_{S_n}, Π_λ) between F_{S_n} and Π_λ is obtained,
d_TV(F_{S_n}, Π_λ) ≤ λ^{−1}(1 − e^{−λ}) ∑_{j=1}^n p_j².   (34)
In the particular case when
p_1 = ... = p_n = p,   (35)
we have λ = np, and it follows from (34) that
d_TV(F_{n,p}, Π_{np}) ≤ p (1 − e^{−np}).   (37)
In the case (35) we will use the notation d_{n,p} for the distance between F_{n,p} and Π_{np}. It follows from (37) that
d_{n,p} ≤ p.   (39)
In what follows we will use the estimate (39), although there is reason to believe that on the right-hand side of (39) the coefficient 1 in front of p can be replaced by a smaller number, namely 2(e^{−1} − 1/4) ≈ 0.236 (see Section 6, Supplement). Denote by F_{n,p}^{(h)}(t) the binomial distribution function with the parameters n and p_h = p e^h/(q + p e^h). It is easily seen that, by (16), p_{h_0} = x. Therefore, instead of F_{n,p}^{(h_0)} one may work with the binomial distribution F_{n,x}. It is easily seen that, with λ(h) = nx e^{−h}, it follows from (16) and (20) that λ(h_0) = λ_1. Moreover, by (16) and (42), integrating by parts we arrive at the bounds (44) and (45). It follows from (41), (44) and (45) that the estimate (46) holds. Using (16) and the definition of H(t, p), we get (47). It follows from (18), (21) and (46) that the representation (48) holds, where R satisfies the inequality in (46). Now, applying (39) and (46)-(48) one after the other, we obtain the statement of Proposition 1.
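The estimate (39) used above is easy to confirm numerically: the sketch below (the pair n = 10, p = 0.1 is our illustrative choice) computes the total variation distance between F_{n,p} and Π_{np} by brute force and checks it against the i.i.d. specialization p(1 − e^{−np}) ≤ p of the estimate from [8]:

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

def poisson_pmf(lam, j):
    return lam**j * math.exp(-lam) / math.factorial(j)

def d_tv(n, p):
    # total variation distance: half the l1-distance between the two pmfs
    lam = n * p
    m = n + 10 * int(lam + 1) + 50      # truncation point for the Poisson tail
    s = sum(abs(binom_pmf(n, p, j) - poisson_pmf(lam, j)) for j in range(n + 1))
    s += sum(poisson_pmf(lam, j) for j in range(n + 1, m))  # Poisson mass above n
    return s / 2

n, p = 10, 0.1
d = d_tv(n, p)
# d <= p*(1 - exp(-n*p)) <= p, in agreement with (37) and (39)
```

The truncation at m is harmless here because the Poisson mass beyond it is far below floating-point precision.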

Proof of Theorems 1 and 2
Throughout Section 3, condition (15) is assumed to be fulfilled.

Proof of Theorem 3 and Corollary 1
Lemma 2. Let nx be an integer. Then

Lemma 4.
The following inequality holds, The last equality is true because of and, by (43),

Lemma 5. The following equality holds,

Proof. By Lemmas 3 and 4,
This implies the statement of Lemma 5.
Taking into account that c_1 + c_3 < 5.74, we can conclude that the required bound holds. Corollary 1 is proved.

Numerical Experiments
Further we will use the following Tables 5 and 6. Table 6. Values of the product k·d_{n,1/k} for k = 2, ..., 10, n = 1, ..., 20; its columns are n, 2d_{n,1/2}, 3d_{n,1/3}, 4d_{n,1/4}, 5d_{n,1/5}, 6d_{n,1/6}, 7d_{n,1/7}, 8d_{n,1/8}, 9d_{n,1/9}, 10d_{n,1/10}. In view of (27) the following estimate holds, and the question arises: how accurate is this estimate? In other words, we need to find out how much c_1 differs from c_0. Let us rewrite (26) in the form (80), where the parameter λ_1 is defined by formula (43), and A(x, n, p) by formula (21). Let us carry out numerical experiments to get closer to understanding the answer to the question posed.
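The entries d_{n,1/k} can be recomputed by brute force. Assuming that d_{n,p} denotes the Kolmogorov distance d_K(F_{n,p}, Π_{np}) discussed in the Supplement (our reading of the notation), a sketch is:

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

def poisson_pmf(lam, j):
    return lam**j * math.exp(-lam) / math.factorial(j)

def d_K(n, p):
    # Kolmogorov distance between F_{n,p} and Pi_{np}: both laws live on the
    # integers, so the sup of the CDF difference is attained at integer points
    lam = n * p
    Fb = Fp = 0.0
    best = 0.0
    for j in range(n + 50):
        if j <= n:
            Fb += binom_pmf(n, p, j)
        Fp += poisson_pmf(lam, j)
        best = max(best, abs(Fb - Fp))
    return best

# the entry k = 2, n = 2 of Table 6 in this reading: 2 * d_{2,1/2}
value = 2 * d_K(2, 1 / 2)
```

Under this reading, 2·d_{2,1/2} = 2(e^{−1} − 1/4) ≈ 0.23576, which matches the constant quoted in the Supplement.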

Illustration of Theorem 1
Let n = 10, p = 0.1. Table 7 contains values of the function g(x, n, p) found for x = j/n, j = 2, ..., 9, as well as intermediate results.
In the last row, the maximum value is shown in bold. The third row from the bottom of Table 7 shows the real value of the remainder in (26) for various x. These values are acceptable for x ≤ 0.6, taking into account the small value of the parameter n. The last line contains the values of the coefficient of √(nx³) in r_1 = r_1(x) for which the latter coincides with the real remainder for the given value of x. For example, for x = 0.2 the coefficient of √(nx³) must equal 0.26764 (column "x = 0.2", last line), i.e., r_1(0.2) = 0.26764·√0.08 = 0.0757. If for d_K(F_{10,1/5}, Π_2) we take the bound 0.150981·x with x = 0.2, found with the help of Table 6 (column "5d_{n,1/5}", row "n = 10"), we obtain 0.822685 = 0.150981·c_1 as the coefficient of √(nx³) in r_1(0.2). The latter is more than twice the number 0.370456 from Table 7. Accordingly, we obtain for r_1(0.2) the estimate 0.822685·√0.08 = 0.232689. At the same time, the real value of r_1(0.2) is 0.104781 (see Table 7 or Table 8).

Supplement
In this section, we offer the reader some conjectures regarding the behavior of d K (F n,p , Π λ ).
Due to the cumbersomeness of the table, we did not include the columns corresponding to 11 ≤ k ≤ 20.
Nevertheless, we have made sure that for each 2 ≤ k ≤ 20 the equality max_{1≤n≤20} d_{n,1/k} = d_{k,1/k} holds. Our conjecture is as follows: for every k ≥ 2, max_{n≥1} d_{n,1/k} = d_{k,1/k}.
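The conjecture can be probed numerically. Under our assumption that d_{n,p} is the Kolmogorov distance d_K(F_{n,p}, Π_{np}), the sketch below locates, for each small k, the n ≤ 20 maximizing d_{n,1/k}:

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

def poisson_pmf(lam, j):
    return lam**j * math.exp(-lam) / math.factorial(j)

def d_K(n, p):
    # Kolmogorov distance between F_{n,p} and Pi_{np} over integer points
    lam = n * p
    Fb = Fp = 0.0
    best = 0.0
    for j in range(n + 50):
        if j <= n:
            Fb += binom_pmf(n, p, j)
        Fp += poisson_pmf(lam, j)
        best = max(best, abs(Fb - Fp))
    return best

# argmax over 1 <= n <= 20 of d_{n,1/k}; the conjecture says it is n = k
argmax = {k: max(range(1, 21), key=lambda n: d_K(n, 1 / k)) for k in range(2, 7)}
```

For these small values of k the computed maximizer is indeed n = k, consistent with the table-based check described above.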
Note that the sequence d_{k,1/k} decreases monotonically for 2 ≤ k ≤ 20. This property also holds for the sequences d_{k+j,1/k} for every fixed j ≥ 1 and d_{jk,1/k} for every fixed j ≥ 2. According to the CLT, F_{n,p}(np + x√n) converges in the uniform metric to the normal law Φ(x/√(pq)).
On the other hand, Π_{np}(np + x√n) approaches Φ(x/√p). It means that
lim_{n→∞} d_{n,p} = max_x | Φ(x/√(pq)) − Φ(x/√p) |.
Hence, lim_{p→0} lim_{n→∞} d_{n,p} = 0.
Using formula (83), we get the elements of the last row of Table 5 and, hence, the elements of the last row of Table 6.
Taking into account the inequality 0.23576·c_1 < 1.285, in all places the constant c_1 = 5.4489... can be replaced by the smaller constant 1.285. In particular, in the formulations of Theorems 1-3, c_1 can be replaced by 1.285.
As k grows, the coefficient of p in (88) decreases, but it cannot become less than 1/(2e) = 0.18... .

Conclusions
The main results of this paper are Theorems 1-3, in which estimates are obtained for the errors arising when the tails of binomial distributions are replaced by the tails of the corresponding Poisson distributions. The constant c_1 ≈ 5.5 entering the error estimates is too large for the estimates to be of practical use. However, a further improvement of the estimates would make it possible to apply our results, in particular, to the construction of confidence intervals for the parameter p. The basis for the hope of obtaining such an improvement is the inequality (88).