An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory

This paper introduces an upper bound on the absolute difference between: (a) the cumulative distribution function (CDF) of the sum of a finite number of independent and identically distributed random variables with finite absolute third moment; and (b) a saddlepoint approximation of that CDF. This upper bound, which is particularly precise in the regime of large deviations, is used to study the dependence testing (DT) bound and the meta converse (MC) bound on the decoding error probability (DEP) in point-to-point memoryless channels. Often, these bounds cannot be analytically calculated, and thus lower and upper bounds become particularly useful. Within this context, the main results include, respectively, new upper and lower bounds on the DT and MC bounds. A numerical experimentation of these bounds is presented in the case of the binary symmetric channel, the additive white Gaussian noise channel, and the additive symmetric α-stable noise channel.


Introduction
This paper focuses on approximating the cumulative distribution function (CDF) of sums of a finite number of real-valued independent and identically distributed (i.i.d.) random variables with finite absolute third moment. More specifically, let Y_1, Y_2, ..., Y_n, with n an integer and 2 ≤ n < ∞, be real-valued random variables with probability distribution P_Y. Denote by F_Y the CDF associated with P_Y, and, if it exists, denote by f_Y the corresponding probability density function (PDF). Let also

X_n = Y_1 + Y_2 + ... + Y_n (1)

be a random variable with distribution P_{X_n}. Denote by F_{X_n} the CDF and, if it exists, denote by f_{X_n} the PDF associated with P_{X_n}. The objective is to provide a positive function that approximates F_{X_n} and an upper bound on the resulting approximation error. In the following, a positive function g : R → R_+ is said to approximate F_{X_n} with an approximation error that is upper bounded by a function ε : R → R_+ if, for all x ∈ R,

|F_{X_n}(x) − g(x)| ≤ ε(x). (2)

The case in which Y_1, Y_2, ..., Y_n in (1) are stable random variables with F_Y analytically expressible is trivial. This is essentially because the sum X_n follows the same distribution as a random variable a_n Y + b_n, where (a_n, b_n) ∈ R² and Y is a random variable whose CDF is F_Y. Examples of this case are random variables following the Gaussian, Cauchy, or Lévy distributions [3].
In general, the problem of calculating the CDF of X_n boils down to calculating n − 1 convolutions. More specifically, it holds that

f_{X_n}(x) = ∫ f_{X_{n−1}}(x − y) f_Y(y) dy, (3)

where f_{X_1} = f_Y. Even for discrete random variables and small values of n, the integral in (3) often requires excessive computation resources [4].
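As an illustration of the convolution approach above, the following Python sketch computes the PMF and CDF of a sum of n discrete random variables by n − 1 explicit convolutions. The Bernoulli(0.3) example and all variable names are illustrative choices, not taken from the paper:

```python
# PMF of a single Y on {0, 1}: Bernoulli(0.3) (illustrative choice).
p_y = [0.7, 0.3]

def convolve(p, q):
    """Discrete convolution of two PMFs given as lists over {0, 1, ...}."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def pmf_of_sum(p_y, n):
    """PMF of X_n = Y_1 + ... + Y_n via n - 1 convolutions."""
    p = list(p_y)
    for _ in range(n - 1):
        p = convolve(p, p_y)
    return p

p_x10 = pmf_of_sum(p_y, 10)       # Binomial(10, 0.3) PMF
cdf_x10 = [sum(p_x10[:k + 1]) for k in range(len(p_x10))]
```

For n = 10 this reproduces the Binomial(10, 0.3) distribution; the cost grows with both n and the support size, which is the computational burden the saddlepoint method avoids.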
When the PDF of the random variable X_n cannot be conveniently obtained but only the first r moments are known, with r ∈ N, an approximation of the PDF can be obtained by using an Edgeworth expansion. Nonetheless, the resulting relative error in the large deviation regime makes these approximations inaccurate [5].
When the cumulant generating function (CGF) associated with F_Y, denoted by K_Y : R → R, is known, the PDF f_{X_n} can be obtained via the Laplace inversion lemma [4]. That is, given two reals α_− < 0 and α_+ > 0, if K_Y is analytic for all z ∈ {a + ib ∈ C : (a, b) ∈ R² and α_− ≤ a ≤ α_+} ⊂ C, then

f_{X_n}(x) = (1/(2πi)) ∫_{γ−i∞}^{γ+i∞} exp(nK_Y(z) − zx) dz, (4)

with i = √−1 and γ ∈ (α_−, α_+). Note that the domain of K_Y in (4) has been extended to the complex plane, and thus it is often referred to as the complex CGF. With an abuse of notation, both the CGF and the complex CGF are identically denoted.
In the case in which n is sufficiently large, an approximation to the Bromwich integral in (4) can be obtained by choosing the contour to include the unique saddlepoint of the integrand, as suggested in [6]. The intuition behind this lies in the following observations: (i) the saddlepoint, denoted by z_0, is unique, real, and z_0 ∈ (α_−, α_+); (ii) within a neighborhood around the saddlepoint of the form |z − z_0| < ε, with z ∈ C and ε > 0 sufficiently small, Im[nK_Y(z) − zx] = 0 and Re[nK_Y(z) − zx] can be assumed constant; and (iii) outside such a neighborhood, the integrand is negligible.
From (i), it follows that the derivative of nK_Y(t) − tx with respect to t, with t ∈ R, is equal to zero when it is evaluated at the saddlepoint z_0. More specifically, for all t ∈ R,

d/dt (nK_Y(t) − tx) = nK_Y^{(1)}(t) − x,

and thus

nK_Y^{(1)}(z_0) = x, (6)

which shows the dependence of z_0 on both x and n.
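For a concrete instance of the saddlepoint equation (6), the following sketch solves nK_Y^{(1)}(z_0) = x by bisection for a Bernoulli CGF. The parameter p = 0.3, the evaluation point, and the bracketing interval are illustrative assumptions:

```python
import math

# Solving the saddlepoint equation (6), n * K_Y^(1)(z0) = x, for Bernoulli(p).
p  = 0.3
K1 = lambda t: p * math.exp(t) / (1 - p + p * math.exp(t))  # K_Y^(1)

def saddlepoint(x, n, lo=-50.0, hi=50.0, iters=200):
    """Bisection on the increasing function t -> n * K_Y^(1)(t) - x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n * K1(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z0 = saddlepoint(x=8.0, n=10)   # saddlepoint for x = 8 with n = 10 (a tail point)
```

Since K_Y^{(1)} is strictly increasing on the interior of its domain, bisection is safe whenever x/n lies in the interior of the convex hull of the support of P_Y; Newton's method converges faster but requires K_Y^{(2)}.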
A Taylor series expansion of the exponent nK_Y(z) − zx in the neighborhood of z_0 leads to the following asymptotic expansion in powers of 1/n of the Bromwich integral in (4):

f_{X_n}(x) = f̂_{X_n}(x) (1 + O(1/n)),

where f̂_{X_n} : R → R_+ is

f̂_{X_n}(x) = (2πnK_Y^{(2)}(z_0))^{−1/2} exp(nK_Y(z_0) − z_0 x), (8)

and, for all k ∈ N and t ∈ R, the notation K_Y^{(k)}(t) denotes the k-th derivative of K_Y evaluated at t. The first two derivatives play a central role, and thus it is worth providing explicit expressions. That is,

K_Y^{(1)}(t) = E[Y exp(tY)] / E[exp(tY)], (9)
K_Y^{(2)}(t) = E[Y² exp(tY)] / E[exp(tY)] − (K_Y^{(1)}(t))². (10)

The function f̂_{X_n} in (8) is referred to as the saddlepoint approximation of the PDF f_{X_n} and was first introduced in [6]. Nonetheless, f̂_{X_n} is not necessarily a PDF, as often its integral on R is not equal to one. A particular exception is observed only in three cases [7]. First, when f_Y is the PDF of a Gaussian random variable, the saddlepoint approximation f̂_{X_n} is identical to f_{X_n} for all n > 0. Second and third, when f_Y is the PDF associated with a Gamma distribution and an inverse normal distribution, respectively, the saddlepoint approximation f̂_{X_n} is exact up to a normalization constant for all n > 0.
An approximation to the CDF F_{X_n} can be obtained by integrating the PDF in (4), cf. [8,9,10]. In particular, the result reported in [8] leads to an asymptotic expansion of the CDF of X_n, for all x ∈ R, of the form given in (11), whose leading term is the function F̂_{X_n} : R → R referred to as the saddlepoint approximation of F_{X_n}. That is, for all x ∈ R,

F̂_{X_n}(x) = 1{z_0 > 0} + (−1)^{1{z_0 > 0}} exp(nK_Y(z_0) − z_0 x + (1/2) n z_0² K_Y^{(2)}(z_0)) Q(|z_0| √(nK_Y^{(2)}(z_0))), (12)

where the function Q : R → [0, 1] is the complementary CDF of a Gaussian random variable with zero mean and unit variance. That is, for all t ∈ R,

Q(t) = (1/√(2π)) ∫_t^∞ exp(−u²/2) du. (13)

Finally, from the central limit theorem [5], for large values of n and for all x ∈ R, a reasonable approximation to F_{X_n}(x) is 1 − Q((x − nK_Y^{(1)}(0)) / √(nK_Y^{(2)}(0))). In the following, this approximation is referred to as the normal approximation of F_{X_n}.
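The saddlepoint and normal approximations can be compared against an exact CDF when Y is Bernoulli, so that X_n is binomial. The sketch below implements a tilted-Gaussian form of the saddlepoint CDF approximation consistent with the expression discussed above; the Bernoulli parameter and evaluation points are illustrative assumptions, and no lattice (continuity) correction is applied for this integer-valued sum:

```python
import math

# Saddlepoint vs. normal approximation for X_n = sum of n Bernoulli(p) variables.
p  = 0.3
K  = lambda t: math.log(1 - p + p * math.exp(t))            # CGF of Y
K1 = lambda t: p * math.exp(t) / (1 - p + p * math.exp(t))  # K_Y^(1)
K2 = lambda t: K1(t) * (1 - K1(t))                          # K_Y^(2): tilted variance
Q  = lambda t: 0.5 * math.erfc(t / math.sqrt(2))            # Gaussian tail, cf. (13)

def F_saddle(x, n):
    """Tilted-Gaussian saddlepoint approximation of P[X_n <= x]."""
    lo, hi = -50.0, 50.0
    for _ in range(200):                    # bisection for n * K^(1)(z0) = x
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if n * K1(mid) < x else (lo, mid)
    z0 = 0.5 * (lo + hi)
    s2 = n * K2(z0)
    e  = math.exp(n * K(z0) - z0 * x + 0.5 * z0 ** 2 * s2)
    tail = e * Q(abs(z0) * math.sqrt(s2))
    return 1.0 - tail if z0 > 0 else tail

def F_normal(x, n):
    """Normal approximation: Gaussian with mean n*p and variance n*p*(1-p)."""
    return 1.0 - Q((x - n * p) / math.sqrt(n * p * (1 - p)))

def F_exact(x, n):
    """Exact Binomial(n, p) CDF, for reference."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(int(x) + 1))
```

For n = 100 and x = 10 (roughly 4.4 standard deviations below the mean of 30), F_exact is of order 10⁻⁶; the saddlepoint value stays within a small constant factor of it, while the normal approximation overshoots by a noticeably larger factor, a gap that widens deeper into the tail. Part of the residual saddlepoint error is due to the missing lattice correction.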

Contributions
The main contribution of this work is an upper bound on the error induced by the saddlepoint approximation F̂_{X_n} in (12) (Theorem 3 in Section 2.2). This result builds upon two observations. The first observation is that the CDF F_{X_n} can be written, for all x ∈ R, in the form of the expectation in (14), where the random variable S_n, with probability distribution P_{S_n}, is the sum in (15) of n random variables that are independent with probability distribution P_{Y^{(z_0)}}. The distribution P_{Y^{(z_0)}} is an exponentially tilted distribution [11] with respect to the distribution P_Y at the saddlepoint z_0. More specifically, the Radon-Nikodym derivative of the distribution P_{Y^{(z_0)}} with respect to the distribution P_Y is given in (16). The second observation is that the saddlepoint approximation F̂_{X_n} in (12) can be written, for all x ∈ R, in the form of the expectation in (17), where Z_n is a Gaussian random variable with mean x, variance nK_Y^{(2)}(z_0), and probability distribution P_{Z_n}. Note that the means of the random variables S_n in (14) and Z_n in (17) are both equal to nK_Y^{(1)}(z_0). Note also that, from (6), it holds that x = nK_Y^{(1)}(z_0). Using these observations, it holds that the absolute difference between F_{X_n} in (14) and F̂_{X_n} in (17) satisfies, for all x ∈ R, the inequality in (18). A step forward (Lemma A1 in Appendix A) is to note that, when x is such that z_0 ≤ 0, the bound in (19) holds, whereas, when x is such that z_0 > 0, the bound in (20) holds, where F_{S_n} and F_{Z_n} are the CDFs of the random variables S_n and Z_n, respectively. The final result is obtained by observing that sup_{a∈R} |F_{S_n}(a) − F_{Z_n}(a)| can be upper bounded using the Berry-Esseen theorem (Theorem 1 in Section 2.1). This is essentially due to the facts that the random variable S_n is the sum of the n independent random variables in (15), that Z_n is a Gaussian random variable, and that both S_n and Z_n possess identical means and variances. Thus, the main result (Theorem 3 in Section 2.2) is the bound in (21), valid for all x ∈ R, in which the moments of the tilted random variable Y^{(z_0)} enter through a Berry-Esseen factor with the constants c_1 ≈ 0.33554 and c_2 given in (23). Finally, note that (21) holds for any finite value of n and admits the asymptotic scaling law with respect to n suggested in (11).
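The exponential tilting used in the first observation can be made concrete for a discrete distribution. The sketch below, with an arbitrary three-point distribution standing in for P_Y, builds the tilted distribution P_{Y^{(θ)}} and checks that its mean equals K_Y^{(1)}(θ):

```python
import math

# Three-point distribution for Y (illustrative): support and PMF.
support = [0.0, 1.0, 2.0]
pmf     = [0.5, 0.3, 0.2]

def tilt(pmf, support, theta):
    """Exponentially tilted PMF: dP^(theta)/dP(y) = exp(theta*y - K_Y(theta))."""
    w = [pi * math.exp(theta * y) for pi, y in zip(pmf, support)]
    z = sum(w)                       # z = exp(K_Y(theta)), the normalizer
    return [wi / z for wi in w]

theta  = 0.8
tilted = tilt(pmf, support, theta)

# The mean of the tilted distribution equals K_Y^(1)(theta); compare it with a
# numerical derivative of the CGF K_Y(t) = ln E[exp(t*Y)].
K = lambda t: math.log(sum(pi * math.exp(t * y) for pi, y in zip(pmf, support)))
h = 1e-6
K1_num      = (K(theta + h) - K(theta - h)) / (2 * h)
tilted_mean = sum(pi * y for pi, y in zip(tilted, support))
```

A positive tilt θ shifts mass toward larger values of Y, which is exactly how the saddlepoint z_0 recenters the sum S_n at the point x of interest.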

Applications
In the realm of information theory, the normal approximation has played a central role in the calculation of bounds on the minimum decoding error probability (DEP) in point-to-point memoryless channels, cf. [12,13]. Thanks to the normal approximation, simple approximations for the dependence testing (DT) bound, the random coding union (RCU) bound, and the meta converse (MC) bound have been obtained in [12,14]. The success of these approximations stems from the fact that they are easy to calculate. Nonetheless, easy computation comes at the expense of loose upper and lower bounds, and thus uncontrolled approximation errors.
On the other hand, saddlepoint techniques have been extensively used to approximate existing lower and upper bounds on the minimum DEP. See, for instance, [15,16] in the case of the RCU bound and the MC bound. Nonetheless, the errors induced by saddlepoint approximations are often neglected, due to the fact that calculating them involves a large number of optimizations and numerical integrations. Currently, the validation of saddlepoint approximations is carried out through Monte Carlo simulations. Within this context, the main objectives of this paper are twofold: (a) to analytically assess the tightness of the approximation of the DT and MC bounds based on the saddlepoint approximation of the CDFs of sums of i.i.d. random variables; and (b) to provide new lower and upper bounds on the minimum DEP by providing a lower bound on the MC bound and an upper bound on the DT bound. Numerical experimentation of these bounds is presented for the binary symmetric channel (BSC), the additive white Gaussian noise (AWGN) channel, and the additive symmetric α-stable noise (SαS) channel, where the new bounds are tight and obtained at low computational cost.

Sums of Independent and Identically Distributed Random Variables
In this section, upper bounds on the absolute error of approximating F X n by the normal approximation and the saddlepoint approximation are presented.

Error Induced by the Normal Approximation
Given a random variable Y, let the function ξ_Y : R → R be defined, for all t ∈ R, by (24), where c_1 and c_2 are defined in (23).
The following theorem, known as the Berry-Esseen theorem [5], introduces an upper bound on the approximation error induced by the normal approximation.
Theorem 1 (Berry-Esseen [17]). Let Y_1, Y_2, ..., Y_n be i.i.d. random variables with probability distribution P_Y. Let also Z_n be a Gaussian random variable with mean nK_Y^{(1)}(0), variance nK_Y^{(2)}(0), and CDF denoted by F_{Z_n}. Then, the CDF of the random variable X_n = Y_1 + Y_2 + ... + Y_n, denoted by F_{X_n}, satisfies

sup_{a∈R} |F_{X_n}(a) − F_{Z_n}(a)| ≤ min(1, ξ_Y(0)/√n), (25)

where the functions K_Y^{(1)}, K_Y^{(2)}, and ξ_Y are defined in (9), (10), and (24), respectively.
An immediate result from Theorem 1 gives the following upper and lower bounds on F_{X_n}(a), for all a ∈ R:

F_{X_n}(a) ≤ F_{Z_n}(a) + min(1, ξ_Y(0)/√n), (26)
F_{X_n}(a) ≥ F_{Z_n}(a) − min(1, ξ_Y(0)/√n). (27)

The main drawback of Theorem 1 is that the upper bound on the approximation error does not depend on the exact value of a. More importantly, for some values of a and n, the upper bound on the approximation error might be particularly large, which leads to irrelevant results.
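To see how uninformative the uniform bound can be in the tails, the following sketch evaluates a Berry-Esseen-type bound for Bernoulli sums. Since the exact expression of ξ_Y in (24) is not reproduced in this excerpt, the sketch assumes a refined bound of the form c_1(ρ + c_2 σ³)/(σ³√n), with ρ the absolute third central moment and with the constants c_1 = 0.33554 and c_2 = 0.415; treat it as a stand-in for (25):

```python
import math

# Berry-Esseen-type uniform bound for a sum of n i.i.d. Bernoulli(p) variables.
# Assumed form: c1 * (rho + c2 * sigma^3) / (sigma^3 * sqrt(n)).
p   = 0.11
s2  = p * (1 - p)                               # variance of Y
rho = p * (1 - p) * (p ** 2 + (1 - p) ** 2)     # E|Y - E[Y]|^3 for Bernoulli

def be_bound(n, c1=0.33554, c2=0.415):
    """Uniform bound on |F_Xn(a) - F_Zn(a)|; note it does not depend on a."""
    return min(1.0, c1 * (rho + c2 * s2 ** 1.5) / (s2 ** 1.5 * math.sqrt(n)))

bound_100 = be_bound(100)   # about 0.1: useless for tail probabilities ~ 1e-6
```

The bound decays only like 1/√n and holds uniformly in a, so for tail probabilities of order 10⁻⁶ it exceeds the quantity of interest by several orders of magnitude; this is precisely the drawback that the a-dependent bound of Theorem 2 addresses.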

Error Induced by the Saddlepoint Approximation
The following theorem introduces an upper bound on the approximation error induced by approximating the CDF F_{X_n} of X_n in (1) by the function η_Y in (28), where the function Q : R → [0, 1] is the complementary CDF of the standard Gaussian distribution defined in (13). Note that η_Y(θ, n, a) is identical to F̂_{X_n}(a) when θ is chosen to satisfy the saddlepoint condition nK_Y^{(1)}(θ) = a. Note also that η_Y(0, n, a) is the CDF of a Gaussian random variable with mean nK_Y^{(1)}(0) and variance nK_Y^{(2)}(0), which are the mean and the variance of X_n in (1), respectively.
Theorem 2. Let Y_1, Y_2, ..., Y_n be i.i.d. random variables with probability distribution P_Y and CGF K_Y. Let also F_{X_n} be the CDF of the random variable X_n = Y_1 + Y_2 + ... + Y_n. Hence, for all a ∈ R and for all θ ∈ Θ_Y, the inequality in (29) holds, where Θ_Y is defined in (30), and the functions ξ_Y and η_Y are defined in (24) and (28), respectively.
Proof.The proof of Theorem 2 is presented in Appendix A.
This result leads to the upper and lower bounds on F_{X_n}(a) in (31) and (32), for all a ∈ R, with θ ∈ Θ_Y. The advantages of approximating F_{X_n} by using Theorem 2 instead of Theorem 1 are twofold. First, both the approximation η_Y and the corresponding approximation error depend on the exact value of a. In particular, the approximation can be optimized for each value of a via the parameter θ. Second, the parameter θ in (29) can be optimized to improve either the upper bound in (31) or the lower bound in (32) for some a ∈ R. Nonetheless, such optimizations are not necessarily simple.
An alternative to the optimization on θ in (31) and (32) is to choose θ such that it minimizes nK_Y(θ) − θa. This follows the intuition that, for some values of a and n, the term exp(nK_Y(θ) − θa) is the one that most strongly influences the value of the right-hand side of (29). To build upon this idea, consider the following lemma.
Lemma 1. Consider a random variable Y with probability distribution P_Y and CGF K_Y. Given n ∈ N, let the function h : R → R be defined, for all a ∈ R satisfying a/n ∈ int C_Y, with int C_Y denoting the interior of the convex hull of supp P_Y, by

h(a) = inf_{θ ∈ Θ_Y} (nK_Y(θ) − θa), (33)

where Θ_Y is defined in (30). Then, the function h is concave and, for all such a,

h(a) = nK_Y(θ) − θa,

where θ is the unique solution in θ to nK_Y^{(1)}(θ) = a, and K_Y^{(1)} is defined in (9).
Proof.The proof of Lemma 1 is presented in Appendix B.
Given (a, n) ∈ R × N, the value of h(a) in (33) is the minimum of the exponent of the exponential term in (29). An interesting observation from Lemma 1 is that the maximum of h is zero, and it is reached when a = nK_Y^{(1)}(0). In this case, θ = 0, and thus the bounds that follow from (31) and (32) involve F_{Z_n}, the CDF defined in Theorem 1, with θ defined in (36). Hence, in this case, the right-hand side of (29) is always smaller than the right-hand side of (25). That is, for such values of a and n, the upper and lower bounds in (31) and (32) are better than those in (26) and (27), respectively. The following theorem leverages this observation.
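The function h can be evaluated numerically for a Bernoulli CGF. The sketch below (all parameters illustrative) confirms that h peaks at zero when a = nK_Y^{(1)}(0) and is strictly negative elsewhere, which is the large-deviation exponent controlling the bound:

```python
import math

# h(a) = n * K_Y(theta) - theta * a at the minimizing theta, for Bernoulli(p).
p  = 0.3
K  = lambda t: math.log(1 - p + p * math.exp(t))            # CGF of Y
K1 = lambda t: p * math.exp(t) / (1 - p + p * math.exp(t))  # first derivative

def h(a, n):
    """Exponent of the dominant term, with theta the unique solution of
    n * K_Y^(1)(theta) = a, found by bisection (K^(1) is increasing)."""
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if n * K1(mid) < a else (lo, mid)
    theta = 0.5 * (lo + hi)
    return n * K(theta) - theta * a

n = 100
h_at_mean = h(n * K1(0.0), n)   # maximum of h: zero at a = n * K_Y^(1)(0)
h_in_tail = h(10.0, n)          # strictly negative away from the mean
```

Since exp(h(a)) multiplies the Berry-Esseen factor in the saddlepoint error bound, the bound shrinks exponentially as a moves away from the mean, unlike the uniform bound of Theorem 1.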
Theorem 3. Let Y_1, Y_2, ..., Y_n be i.i.d. random variables with probability distribution P_Y and CGF K_Y. Let also F_{X_n} be the CDF of the random variable X_n = Y_1 + Y_2 + ... + Y_n. Hence, for all a ∈ int C_{X_n}, with int C_{X_n} the interior of the convex hull of supp P_{X_n}, it holds that

|F_{X_n}(a) − F̂_{X_n}(a)| ≤ exp(nK_Y(θ) − θa) min(1, 2ξ_Y(θ)/√n),

where θ is defined in (36), and the functions F̂_{X_n} and ξ_Y are defined in (12) and (24), respectively.
Proof.The proof of Theorem 3 is presented in Appendix C.
An immediate result from Theorem 3 gives upper and lower bounds on F_{X_n}(a), for all a ∈ int C_{X_n}, obtained by adding the error bound to, and subtracting it from, F̂_{X_n}(a). The following section presents two examples that highlight the observations mentioned above.

Examples

Application to Information Theory: Channel Coding
This section focuses on the study of the DEP in point-to-point memoryless channels. The problem is formulated in Section 3.1. The main results presented in this section consist of bounds on the DEP: those obtained building upon the existing DT bound [12] are presented in Section 3.2, and those obtained from the MC bound [12] are presented in Section 3.3.

System Model
Consider a point-to-point communication in which a transmitter aims at sending information to one receiver through a noisy memoryless channel. Such a channel can be modeled by a random transformation

(X^n, Y^n, P_{Y|X}), (43)

where n ∈ N is the blocklength, and X and Y are the channel input and channel output sets. Given the channel inputs x = (x_1, x_2, ..., x_n) ∈ X^n, the outputs y = (y_1, y_2, ..., y_n) ∈ Y^n are observed at the receiver with probability

P_{Y|X}(y|x) = ∏_{t=1}^n P_{Y|X}(y_t|x_t), (44)

where, for all x ∈ X, P_{Y|X=x} ∈ Δ(Y), with Δ(Y) the set of all possible probability distributions whose support is a subset of Y. The objective of the communication is to transmit a message index i, which is a realization of a random variable W that is uniformly distributed over the set W = {1, 2, ..., M}, with 1 < M < ∞. To achieve this objective, the transmitter uses an (n, M, λ)-code, where λ ∈ [0, 1].
Definition 1 ((n, M, λ)-code). Given a tuple (n, M, λ) ∈ N² × [0, 1], an (n, M, λ)-code for the random transformation in (43) is a system

{(u(1), D(1)), (u(2), D(2)), ..., (u(M), D(M))}, (46)

where, for all (j, ℓ) ∈ W², with j ≠ ℓ: u(j) ∈ X^n, D(j) ⊆ Y^n, and D(j) ∩ D(ℓ) = ∅. To transmit message index i ∈ W, the transmitter uses the codeword u(i) = (u_1(i), u_2(i), ..., u_n(i)). For all t ∈ {1, 2, ..., n}, at channel use t, the transmitter inputs the symbol u_t(i) into the channel. Assume that, at the end of channel use t, the receiver observes the output y_t. After n channel uses, the receiver uses the vector y = (y_1, y_2, ..., y_n) and determines that the message index j was transmitted if y ∈ D(j), with j ∈ W.
Given the (n, M, λ)-code described by the system in (46), the DEP of the message index i can be computed as E_{P_{Y|X=u(i)}}[1{Y ∉ D(i)}]. As a consequence, the average DEP is

(1/M) Σ_{i=1}^M E_{P_{Y|X=u(i)}}[1{Y ∉ D(i)}].

Note that, from (47d), the average DEP of such an (n, M, λ)-code is upper bounded by λ. Given a fixed pair (n, M) ∈ N², the minimum λ for which an (n, M, λ)-code exists is defined hereunder.
Definition 2. Given a pair (n, M) ∈ N², the minimum average DEP for the random transformation in (43), denoted by λ*(n, M), is given by the infimum of λ over all (n, M, λ)-codes, cf. (49). When λ is chosen in accordance with the reliability constraints, an (n, M, λ)-code is said to transmit at an information rate R = log_2(M)/n bits per channel use. The remainder of this section introduces the DT and MC bounds. The DT bound is one of the tightest existing upper bounds on λ*(n, M) in (49), whereas the MC bound is one of the tightest lower bounds.
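For a toy code, the average DEP defined above can be computed by direct enumeration of all channel outputs. The sketch below does this for a length-3 repetition code over a BSC with majority-rule decoding regions; the code, the decoder, and all parameters are illustrative, not from the paper:

```python
import itertools

delta = 0.11                     # BSC cross-over probability (illustrative)
n, M = 3, 2
codewords = {1: (0, 0, 0), 2: (1, 1, 1)}        # u(1), u(2)
decode = lambda y: 1 if sum(y) <= 1 else 2       # majority rule: D(1), D(2)

def p_y_given_x(y, x):
    """Memoryless BSC: product of per-symbol transition probabilities."""
    flips = sum(yi != xi for yi, xi in zip(y, x))
    return delta ** flips * (1 - delta) ** (n - flips)

# Average DEP: (1/M) * sum_i P[Y not in D(i) | X = u(i)], by enumeration.
avg_dep = sum(
    (1 / M) * p_y_given_x(y, codewords[i])
    for i in (1, 2)
    for y in itertools.product((0, 1), repeat=n)
    if decode(y) != i
)
```

For this repetition code the enumeration reproduces the closed form 3δ²(1 − δ) + δ³; the point of the DT and MC bounds is precisely that such exhaustive enumeration is infeasible at the blocklengths of interest.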

Dependence Testing Bound
This section describes an upper bound on λ*(n, M), for a fixed pair (n, M) ∈ N². Given a probability distribution P_X ∈ Δ(X^n), let the random variable ι(X; Y) satisfy

ι(X; Y) = ln( dP_XY / d(P_X P_Y) (X, Y) ), (50)

where the function dP_XY/d(P_X P_Y) : X^n × Y^n → R denotes the Radon-Nikodym derivative of the joint probability measure P_XY with respect to the product of probability measures P_X P_Y, with P_XY = P_X P_{Y|X} and P_Y the corresponding marginal. Let the function T : N² × Δ(X^n) → R_+ be defined, for all (n, M) ∈ N² and for all probability distributions P_X ∈ Δ(X^n), by

T(n, M, P_X) = E_{P_X P_{Y|X}}[1{ι(X;Y) ≤ ln((M−1)/2)}] + ((M−1)/2) E_{P_X P_Y}[1{ι(X;Y) > ln((M−1)/2)}]. (51)

Using this notation, the following lemma states the DT bound.
Lemma 2 (Dependence testing bound [12]). Given a pair (n, M) ∈ N², the following holds for all P_X ∈ Δ(X^n), with respect to the random transformation in (43):

λ*(n, M) ≤ T(n, M, P_X), (52)

with the function T defined in (51).
Note that the input probability distribution P_X in Lemma 2 can be chosen among all possible probability distributions in Δ(X^n) to minimize the right-hand side of (52), which improves the bound. Note also that, with some loss of optimality, the optimization domain can be restricted to the set of product probability distributions for which, for all x ∈ X^n,

P_X(x) = ∏_{t=1}^n P_X(x_t), (53)

with P_X ∈ Δ(X). Hence, subject to (44), the random variable ι(X; Y) in (50) can be written as the sum of i.i.d. random variables, i.e.,

ι(X; Y) = Σ_{t=1}^n ι(X_t; Y_t). (54)

This observation motivates the application of the results of Section 2 to provide upper and lower bounds on the function T in (51), for some given values (n, M) ∈ N² and a given distribution P_X ∈ Δ(X^n), for the random transformation in (43) subject to (44). These bounds become significantly relevant when the exact value of T(n, M, P_X) cannot be calculated with respect to the random transformation in (43). In such a case, providing upper and lower bounds on T(n, M, P_X) helps in approximating its exact value subject to an error sufficiently small such that the approximation is relevant.
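The decomposition in (54) makes a Monte Carlo evaluation of the two expectations in T straightforward for the BSC with uniform inputs, for which the per-symbol information density takes only the two values ln(2(1 − δ)) and ln(2δ). The sketch below assumes the threshold ln((M − 1)/2) as in the text; sample sizes and parameters are illustrative:

```python
import math, random

random.seed(0)
delta, n, R = 0.11, 200, 0.32
M = 2 ** int(n * R)
thr = math.log((M - 1) / 2)                     # ln((M-1)/2)

# Per-symbol information densities for the BSC with uniform inputs:
i_eq, i_neq = math.log(2 * (1 - delta)), math.log(2 * delta)

def iota_joint():
    """One sample of iota(X; Y) under P_X P_{Y|X} (Y is X through the BSC)."""
    return sum(i_neq if random.random() < delta else i_eq for _ in range(n))

def iota_prod():
    """One sample of iota(X; Y) under P_X P_Y (Y independent of X, uniform)."""
    return sum(i_neq if random.random() < 0.5 else i_eq for _ in range(n))

K = 10000
term1 = sum(iota_joint() <= thr for _ in range(K)) / K
term2 = sum(iota_prod() > thr for _ in range(K)) / K
dt_estimate = term1 + (M - 1) / 2 * term2       # Monte Carlo estimate of T
```

The second term is a rare-event probability, so plain Monte Carlo returns zero for it at these blocklengths; reliable evaluation needs importance sampling, which is exactly the computational burden that the saddlepoint bounds of Theorem 5 avoid.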

Normal Approximation
This section describes the normal approximation of the function T in (51). That is, the random variable ι(X; Y) is assumed to satisfy (54) and to follow a Gaussian distribution. More specifically, for all P_X ∈ Δ(X), let

μ(P_X) = E_{P_X P_{Y|X}}[ι(X; Y)] and (55)
σ(P_X) = E_{P_X P_{Y|X}}[(ι(X; Y) − μ(P_X))²], (56)

and let ξ(P_X) in (57), with c_1 and c_2 defined in (23), be functions of the input distribution P_X. In particular, μ(P_X) and σ(P_X) are respectively the first moment and the second central moment of the random variables ι(X_1; Y_1), ι(X_2; Y_2), ..., ι(X_n; Y_n). Using this notation, consider the functions D : N² × Δ(X) → R_+ and N : N² × Δ(X) → R_+ defined in (58) and (59), respectively, for all (n, M) ∈ N² and for all P_X ∈ Δ(X), where α(n, M, P_X) in (60) is the Gaussian approximation of T(n, M, P_X). Using this notation, the following theorem introduces lower and upper bounds on the function T in (51).
Theorem 4. Given a pair (n, M) ∈ N², for all input distributions P_X ∈ Δ(X^n) subject to (53), the following holds with respect to the random transformation in (43) subject to (44):

D(n, M, P_X) ≤ T(n, M, P_X) ≤ N(n, M, P_X),

where the functions T, D, and N are defined in (51), (58), and (59), respectively.
In [14], the function α(n, M, P_X) in (60) is often referred to as the normal approximation of T(n, M, P_X), which is indeed an abuse of language. In Section 2.1, a comment is given on the fact that the lower and upper bounds, i.e., the functions D in (58) and N in (59), are often too far from the normal approximation α in (60).

Saddlepoint Approximation
This section describes an approximation of the function T in (51) obtained by using the saddlepoint approximation of the CDF of the random variable ι(X; Y), as suggested in Section 2.2. Given a distribution P_X ∈ Δ(X), the moment generating function of ι(X; Y) is denoted by φ(P_X, θ), with θ ∈ R. For all P_X ∈ Δ(X) and for all θ ∈ R, consider the function μ(P_X, θ) in (63),

V(P_X, θ) = E_{P_X P_{Y|X}}[(ι(X; Y) − μ(P_X, θ))² exp(θ ι(X; Y))] / φ(P_X, θ), (64)

and ξ(P_X, θ) in (65), where c_1 and c_2 are defined in (23). Using this notation, consider the functions β_1 in (66) and β_2 in (67). Note that β_1 is the saddlepoint approximation of the CDF of the random variable ι(X; Y) in (54) when X and Y follow the distribution P_X P_{Y|X}. Note also that β_2 is the saddlepoint approximation of the complementary CDF of the random variable ι(X; Y) in (54) when X and Y follow the distribution P_X P_Y. Consider also the functions G in (70), S in (71), and β in (72), where β is expressed in terms of β_1 in (66) and β_2 in (67). Often, the function β in (72) is referred to as the saddlepoint approximation of the function T in (51), which is indeed an abuse of language. The following theorem introduces new lower and upper bounds on the function T in (51).
Theorem 5. Given a pair (n, M) ∈ N², for all input distributions P_X ∈ Δ(X^n) subject to (53), the following holds with respect to the random transformation in (43) subject to (44):

G(n, M, θ, P_X) ≤ T(n, M, P_X) ≤ S(n, M, θ, P_X),

where θ is the unique solution in t to

nμ(P_X, t) = ln((M−1)/2), (74)

and the functions T, G, and S are defined in (51), (70), and (71), respectively.
Proof. The proof of Theorem 5 is provided in Appendix F. In a nutshell, the proof relies on Theorem 3 for independently bounding the terms E_{P_X P_{Y|X}}[1{ι(X;Y) ≤ ln((M−1)/2)}] and E_{P_X P_Y}[1{ι(X;Y) > ln((M−1)/2)}] in (51).

Meta Converse Bound
This section describes a lower bound on λ*(n, M), for a fixed pair (n, M) ∈ N². Given two probability distributions P_X ∈ Δ(X^n) and Q_Y ∈ Δ(Y^n), let the random variable ι(X; Y|Q_Y) be the logarithm of the Radon-Nikodym derivative of P_{Y|X} with respect to Q_Y, evaluated at (X, Y), cf. (75). For all (n, M, γ) ∈ N² × R and for all probability distributions P_X ∈ Δ(X^n) and Q_Y ∈ Δ(Y^n), let the function C be defined as in (76). Using this notation, the following lemma describes the MC bound.
Lemma 3 (MC bound [12,15]). Given a pair (n, M) ∈ N², the following holds for all Q_Y ∈ Δ(Y^n), with respect to the random transformation in (43):

λ*(n, M) ≥ max_{γ ≥ 0} inf_{P_X ∈ Δ(X^n)} C(n, M, P_X, Q_Y, γ), (77)

where the function C is defined in (76).
Note that the output probability distribution Q_Y in Lemma 3 can be chosen among all possible probability distributions in Δ(Y^n) to maximize the right-hand side of (77), which improves the bound. Note also that, with some loss of optimality, the optimization domain can be restricted to the set of product probability distributions for which, for all y ∈ Y^n,

Q_Y(y) = ∏_{t=1}^n Q_Y(y_t), (78)

with Q_Y ∈ Δ(Y). Hence, subject to (44), for all x ∈ X^n, the random variable ι(x; Y|Q_Y) in (75) can be written as the sum of independent random variables, i.e.,

ι(x; Y|Q_Y) = Σ_{t=1}^n ι(x_t; Y_t|Q_Y). (79)

With some loss of generality, the focus is on channel transformations of the form in (43) for which the infimum in (77) is achieved by a product distribution, i.e., P_X of the form in (53), when the probability distribution Q_Y satisfies (78). Note that this condition is met by memoryless channels such as the BSC, the AWGN channel, and the SαS channel with binary antipodal inputs, i.e., input alphabets of the form X = {a, −a}, with a ∈ R. This follows from the fact that the random variable ι(x; Y|Q_Y) is invariant with respect to the choice of x ∈ X^n when the probability distribution Q_Y satisfies (78) and, for all y ∈ Y, Q_Y(y) = Q_Y(−y). Under these conditions, the random variable ι(X; Y|Q_Y) in (75) can be written as the sum of i.i.d. random variables, i.e.,

ι(X; Y|Q_Y) = Σ_{t=1}^n ι(X_t; Y_t|Q_Y). (81)

This observation motivates the application of the results of Section 2 to provide upper and lower bounds on the function C in (76), for some given values (n, M) ∈ N² and given distributions P_X ∈ Δ(X^n) and Q_Y ∈ Δ(Y^n). These bounds become significantly relevant when the exact value of C(n, M, P_X, Q_Y, γ) cannot be calculated with respect to the random transformation in (43). In such a case, providing upper and lower bounds on C(n, M, P_X, Q_Y, γ) helps in approximating its exact value subject to an error sufficiently small such that the approximation is relevant.

Normal Approximation
This section describes the normal approximation of the function C in (76). That is, the random variable ι(X; Y|Q_Y) is assumed to satisfy (81) and to follow a Gaussian distribution. More specifically, for all (P_X, Q_Y) ∈ Δ(X) × Δ(Y), let μ̃(P_X, Q_Y) in (82), σ̃(P_X, Q_Y) in (83), and ξ̃(P_X, Q_Y) in (84), with c_1 and c_2 defined in (23), be functions of the input and output distributions P_X and Q_Y, respectively. In particular, μ̃(P_X, Q_Y) and σ̃(P_X, Q_Y) are respectively the first moment and the second central moment of the random variables ι(X_1; Y_1|Q_Y), ι(X_2; Y_2|Q_Y), ..., ι(X_n; Y_n|Q_Y). Using this notation, consider the functions D̃ in (85), Ñ in (86), and α̃ in (87). The following theorem introduces lower and upper bounds on the function C in (76).
Theorem 6. Given a pair (n, M) ∈ N², for all input distributions P_X ∈ Δ(X^n) subject to (53), for all output distributions Q_Y ∈ Δ(Y^n) subject to (78), and for all γ ≥ 0, the following holds with respect to the random transformation in (43) subject to (44):

D̃(n, M, P_X, Q_Y, γ) ≤ C(n, M, P_X, Q_Y, γ) ≤ Ñ(n, M, P_X, Q_Y, γ),

where the functions C, D̃, and Ñ are defined in (76), (85), and (86), respectively.
Proof. The proof of Theorem 6 is partially presented in [12]. Essentially, it relies on Theorem 1 for upper and lower bounding the term E_{P_X P_{Y|X}}[1{ι(X;Y|Q_Y) ≤ ln(γ)}] in (76), and on Lemma 47 in [12] for upper bounding the remaining term in (76).

The function α̃(n, M, P_X, Q_Y, γ) in (87) is often referred to as the normal approximation of C(n, M, P_X, Q_Y, γ), which is indeed an abuse of language. In Section 2.1, a comment is given on the fact that the lower and upper bounds on the normal approximation, i.e., the functions D̃ in (85) and Ñ in (86), are often too far from the normal approximation α̃ in (87).

Saddlepoint Approximation
This section describes an approximation of the function C in (76) obtained by using the saddlepoint approximation of the CDF of the random variable ι(X; Y|Q_Y), as suggested in Section 2.2. Given two distributions P_X ∈ Δ(X) and Q_Y ∈ Δ(Y), consider the random variable ι(X; Y|Q_Y) in (81), where P_{Y|X} is in (44), and denote its moment generating function by φ̃(P_X, Q_Y, θ), with θ ∈ R. For all P_X ∈ Δ(X) and Q_Y ∈ Δ(Y), and for all θ ∈ R, consider the functions μ̃(P_X, Q_Y, θ), Ṽ(P_X, Q_Y, θ), and ξ̃(P_X, Q_Y, θ), where c_1 and c_2 are defined in (23). Using this notation, consider the functions β̃_1 and β̃_2, which are the saddlepoint approximations of the CDF and the complementary CDF of the random variable ι(X; Y|Q_Y) in (81) when (X, Y) follows the distribution P_X P_{Y|X} and P_X Q_Y, respectively. Consider also the functions G̃ in (98), S̃ in (99), and β̃ in (100). The function β̃(n, γ, θ, P_X, Q_Y, M) in (100) is referred to as the saddlepoint approximation of the function C in (76), which is indeed an abuse of language.
The following theorem introduces new lower and upper bounds on the function C in (76).
Theorem 7. Given a pair (n, M) ∈ N², for all input distributions P_X ∈ Δ(X^n) subject to (53), for all output distributions Q_Y ∈ Δ(Y^n) subject to (78) such that, for all x ∈ X, P_{Y|X=x} is absolutely continuous with respect to Q_Y, and for all γ ≥ 0, the following holds with respect to the random transformation in (43) subject to (44):

G̃(n, γ, θ, P_X, Q_Y, M) ≤ C(n, M, P_X, Q_Y, γ) ≤ S̃(n, γ, θ, P_X, Q_Y, M), (101)

where θ is the unique solution in t to

nμ(P_X, t) = ln(γ), (102)

and the functions C, G̃, and S̃ are defined in (76), (98), and (99), respectively.
Proof.The proof of Theorem 7 is provided in Appendix G.
Note that, in (101), the parameter γ can be optimized as in (77).

Numerical Experimentation
The normal and the saddlepoint approximations of the DT and MC bounds, as well as their corresponding upper and lower bounds presented from Section 3.2.1 to Section 3.3.2, are studied in the cases of the BSC, the AWGN channel, and the SαS channel. The latter is defined by the random transformation in (43) subject to (44), where, for all (x, y) ∈ X × Y, the channel output is obtained by adding to the input x a noise random variable Z whose probability distribution P_Z satisfies, for all t ∈ R,

E_{P_Z}[exp(itZ)] = exp(−σ^α |t|^α), (104)

with i = √−1. The reals α ∈ (0, 2] and σ ∈ R_+ in (104) are parameters of the SαS channel. In the following figures, Figures 3-5, the channel inputs are discrete, X = {−1, 1}, P_X is the uniform distribution, and θ is chosen to be the unique solution in t to (74) or (102), depending on whether the DT or the MC bound is considered. For the results relative to the MC bound, Q_Y is chosen to be equal to the distribution P_Y, i.e., the marginal of P_X P_{Y|X}. The parameter γ is chosen to maximize the function C(n, 2^{nR}, P_X, Q_Y, γ) in (76). The plots in Figures 3a-5a illustrate the function T(n, 2^{nR}, P_X) in (51) as well as the bounds in Theorems 4 and 5.
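Samples of the SαS noise in (104) can be drawn with the Chambers-Mallows-Stuck method, which avoids numerical inversion of the characteristic function. The sketch below assumes the standard parametrization in which σ enters as a scale factor; it is an illustration, not the procedure used to produce the figures:

```python
import math, random

random.seed(1)
alpha, sigma = 1.4, 0.6      # S-alpha-S parameters used in the experiments

def sas_sample(alpha, sigma):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise.

    Returns Z with characteristic function exp(-sigma^alpha * |t|^alpha),
    assuming the standard parametrization; alpha = 2 gives a Gaussian and
    alpha = 1 a Cauchy variable (up to scale). Valid for alpha != 1.
    """
    u = math.pi * (random.random() - 0.5)      # Uniform(-pi/2, pi/2)
    w = -math.log(random.random())             # Exp(1)
    x = (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
         * (math.cos(u - alpha * u) / w) ** ((1 - alpha) / alpha))
    return sigma * x

# One channel use with antipodal input x in {-1, 1}: Y = x + Z.
y = -1.0 + sas_sample(alpha, sigma)
```

Because the noise has infinite variance for α < 2, empirical checks should rely on order statistics (e.g., the sample median, which is close to zero by symmetry) rather than on sample moments.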
Figures 3b-5b illustrate the function C in (76) and the bounds in Theorems 6 and 7. The normal approximations, i.e., α(n, 2^{nR}, P_X) in (60) and α̃(n, 2^{nR}, P_X, Q_Y, γ) in (87), of the DT and MC bounds, respectively, are plotted in black diamonds. The upper bounds, i.e., N(n, 2^{nR}, P_X) in (59) and Ñ(n, 2^{nR}, P_X, Q_Y, γ) in (86), are plotted in blue squares. The lower bounds of the DT and MC bounds, i.e., D(n, M, P_X) in (58) and D̃(n, 2^{nR}, P_X, Q_Y, γ) in (85), are non-positive in these cases, and thus do not appear in the figures. The saddlepoint approximations of the DT and MC bounds, i.e., β(n, 2^{nR}, θ, P_X) in (72) and β̃(n, γ, θ, P_X, Q_Y, 2^{nR}) in (100), respectively, are plotted in black stars. The upper bounds, i.e., S(n, 2^{nR}, θ, P_X) in (71) and S̃(n, γ, θ, P_X, Q_Y, 2^{nR}) in (99), are plotted in blue upward-pointing triangles. The lower bounds, i.e., G(n, 2^{nR}, θ, P_X) in (70) and G̃(n, γ, θ, P_X, Q_Y, 2^{nR}) in (98), are plotted in red downward-pointing triangles.
Figure 3 illustrates the case of a BSC with cross-over probability δ = 0.11. The information rates are chosen to be R = 0.32 and R = 0.42 bits per channel use in Figure 3a,b, respectively. The functions T and C can be calculated exactly, and thus they are plotted in magenta asterisks in Figure 3a,b, respectively. In these figures, it can be observed that the saddlepoint approximations of the DT and MC bounds, i.e., β and β̃, respectively, overlap with the functions T and C. These observations are in line with those reported in [15], where the saddlepoint approximations of the RCU bound and the MC bound are both shown to be precise approximations. Alternatively, the normal approximations of the DT and MC bounds, i.e., α and α̃, do not overlap with T and C, respectively.
In Figure 3, it can be observed that the new bounds on the DT and MC bounds provided in Theorems 5 and 7, respectively, are tighter than those in Theorems 4 and 6. Indeed, the upper bounds N and Ñ on the DT and MC bounds derived from the normal approximations α and α̃ are several orders of magnitude above T and C, respectively. This observation remains valid for the AWGN channel in Figure 4 and the SαS channel in Figure 5. Note that, in Figure 3a, for n > 1000, the normal approximation α is below the lower bound G, showing that approximating T by α is too optimistic. These results show that the use of the Berry-Esseen theorem to approximate the DT and MC bounds may lead to erroneous conclusions due to the uncontrolled error made in the approximation.
Figures 4 and 5 illustrate the cases of a real-valued AWGN channel and an SαS channel, respectively. The signal-to-noise ratio (SNR) is SNR = 1 for the AWGN channel. The information rate is R = 0.425 bits per channel use for the AWGN channel and R = 0.38 bits per channel use for the SαS channel with (α, σ) = (1.4, 0.6). In both cases, the functions T in (51) and C in (76) cannot be computed explicitly and hence do not appear in Figures 4 and 5. In addition, the lower bounds D(n, M, P_X) and D̃(n, 2^{nR}, P_X, Q_Y, γ) obtained from Theorems 4 and 6 are non-positive in these cases, and thus do not appear in these figures.
In Figure 4, note that the saddlepoint approximations β and β̃ are well bounded by Theorems 5 and 7 for a large range of blocklengths. Alternatively, the lower bounds D and D̃ based on the normal approximation do not even exist in this case.
In Figure 5, note that the upper bounds S and S̃ on the DT and MC bounds, respectively, are relatively tight compared to those in the AWGN channel case. This characteristic is of particular importance in a channel such as the SαS channel, where the DT and MC bounds remain computable only through Monte Carlo simulations.

Figure 3. Normal and saddlepoint approximations to the functions T (Figure 3a) in (51) and C (Figure 3b) in (76) as functions of the blocklength n for the case of a BSC with cross-over probability δ = 0.11. The information rate is R = 0.32 and R = 0.42 bits per channel use for Figure 3a,b, respectively. The channel input distribution P X is chosen to be the uniform distribution, the output distribution Q Y is chosen to be the channel output distribution P Y, and the parameter γ is chosen to maximize C in (76). The parameter θ is chosen to be the unique solution in t to (74) in Figure 3a and to (102) in Figure 3b.

Figure 4. Normal and saddlepoint approximations to the functions T (Figure 4a) in (51) and C (Figure 4b) in (76) as functions of the blocklength n for the case of a real-valued AWGN channel with discrete channel inputs X = {−1, 1}, signal-to-noise ratio SNR = 1, and information rate R = 0.425 bits per channel use. The channel input distribution P X is chosen to be the uniform distribution, the output distribution Q Y is chosen to be the channel output distribution P Y, and the parameter γ is chosen to maximize C in (76). The parameter θ is chosen to be the unique solution in t to (74) in Figure 4a and to (102) in Figure 4b.

Figure 5. Normal and saddlepoint approximations to the functions T (Figure 5a) in (51) and C (Figure 5b) in (76) as functions of the blocklength n for the case of a real-valued symmetric α-stable channel with discrete channel inputs X = {−1, 1}, shape parameter α = 1.4, dispersion parameter σ = 0.6, and information rate R = 0.38 bits per channel use. The channel input distribution P X is chosen to be the uniform distribution, the output distribution Q Y is chosen to be the channel output distribution P Y, and the parameter γ is chosen to maximize C in (76). The parameter θ is chosen to be the unique solution in t to (74) in Figure 5a and to (102) in Figure 5b.
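When the DT bound must be evaluated by Monte Carlo, as in the SαS case above, the estimate averages the DT integrand over channel noise realizations. A minimal sketch for the BSC case, where the information density for uniform inputs has the closed form i = n log 2 + k log δ + (n − k) log(1 − δ) with k the number of bit flips, and where the DT bound takes the form E[exp(−max{0, i − log((M − 1)/2)})]; the substitution log((M − 1)/2) ≈ (nR − 1) log 2 and all parameter choices are illustrative assumptions:

```python
import math
import random

def dt_bound_mc(n, R, delta, trials=10_000, seed=1):
    """Monte Carlo estimate of E[exp(-max(0, i - log((M-1)/2)))] for a BSC(delta)."""
    rng = random.Random(seed)
    log_m_half = (n * R - 1) * math.log(2)   # log((M-1)/2) approximated with M = 2^(nR)
    acc = 0.0
    for _ in range(trials):
        k = sum(rng.random() < delta for _ in range(n))   # number of bit flips
        i = n * math.log(2) + k * math.log(delta) + (n - k) * math.log(1 - delta)
        acc += math.exp(-max(0.0, i - log_m_half))
    return acc / trials

print(dt_bound_mc(200, 0.32, 0.11))
```

As expected, raising the rate R toward capacity increases the estimated decoding error probability.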
distribution. The following lemma provides an upper bound on the absolute difference in (A5) in terms of the Kolmogorov–Smirnov distance between the distributions P S n,θ and P Z n,θ, denoted by

∆(P S n,θ , P Z n,θ) = sup over x ∈ R of |F S n,θ (x) − F Z n,θ (x)|,

where F S n,θ and F Z n,θ are the CDFs of the random variables S n,θ and Z n,θ, respectively.
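The Kolmogorov–Smirnov distance is the supremum of the absolute difference between two CDFs. A minimal sketch (with illustrative function names) estimating it from samples via empirical CDFs:

```python
import bisect
import random

def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of sample points <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_distance(sample_p, sample_q):
    """sup_x |F_P(x) - F_Q(x)| between two empirical CDFs."""
    sp, sq = sorted(sample_p), sorted(sample_q)
    # The supremum of the difference of two step functions is attained at a sample point.
    return max(abs(ecdf(sp, x) - ecdf(sq, x)) for x in sp + sq)

random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(2000)]
b = [random.gauss(0.5, 1.0) for _ in range(2000)]
print(ks_distance(a, a), ks_distance(a, b))
```

For the two Gaussian samples above, the true distance sup_x |Φ(x) − Φ(x − 0.5)| ≈ 0.197, and the empirical estimate is close to it.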
If at least one of the above conditions is satisfied, then the absolute difference in (A5) satisfies the upper bound in (A7).

Proof. The proof of Lemma A1 is presented in Appendix D.
The proof continues by providing an upper bound on ∆(P S n,θ , P Z n,θ) in (A7), leveraging the observation that S n,θ is the sum of n independent and identically distributed random variables. This follows immediately from the assumptions of Theorem 2; nonetheless, for the sake of completeness, the following lemma provides a proof of this statement.

Proof. The proof of Lemma A2 is presented in Appendix E.
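The statement that tilting the sum is equivalent to summing tilted variables can be checked numerically in a simple special case. The sketch below (an illustrative Bernoulli choice of P Y, not part of the proof) verifies that exponentially tilting a Binomial(n, p) by θ yields the distribution of a sum of n i.i.d. tilted Bernoulli variables, i.e., Binomial(n, p θ) with p θ = p e^θ / ϕ Y (θ):

```python
import math

def tilted_sum_pmf(n, p, theta, k):
    """Exponential tilt of the pmf of X_n ~ Binomial(n, p), cf. (A8) applied to the sum."""
    phi = 1 - p + p * math.exp(theta)                     # phi_Y(theta) = E[exp(theta*Y)]
    binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
    return math.exp(theta * k) * binom / phi**n

def sum_of_tilted_pmf(n, p, theta, k):
    """pmf of the sum of n i.i.d. tilted Bernoulli variables: Binomial(n, p_theta)."""
    phi = 1 - p + p * math.exp(theta)
    pt = p * math.exp(theta) / phi                        # tilted success probability
    return math.comb(n, k) * pt**k * (1 - pt)**(n - k)

n, p, theta = 100, 0.2, 0.7
diff = max(abs(tilted_sum_pmf(n, p, theta, k) - sum_of_tilted_pmf(n, p, theta, k))
           for k in range(n + 1))
print(diff)
```

The maximum pointwise difference is numerically zero, as predicted by Lemma A2.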
Lemma A2 paves the way for obtaining an upper bound on ∆(P S n,θ , P Z n,θ) in (A7) via the Berry–Esseen Theorem (Theorem 1). Let µ θ, V θ, and T θ be, respectively, the mean, the variance, and the third absolute central moment of the random variable Y (θ), whose probability distribution is P Y (θ) in (A8); see (A10). Let also ξ θ be the corresponding Berry–Esseen term, with c 1 and c 2 defined in (23).
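The moments µ θ, V θ, and T θ can be computed directly from the tilted distribution P Y (θ). A sketch for a Bernoulli(p) source (an illustrative choice of P Y, for which ϕ Y (θ) = 1 − p + p e^θ):

```python
import math

def tilted_moments(p, theta):
    """Mean, variance, and third absolute central moment of the tilted Bernoulli(p)."""
    phi = (1 - p) + p * math.exp(theta)                          # phi_Y(theta)
    probs = {0: (1 - p) / phi, 1: p * math.exp(theta) / phi}     # tilted pmf, cf. (A8)
    mu = sum(y * q for y, q in probs.items())                    # mu_theta
    var = sum((y - mu) ** 2 * q for y, q in probs.items())       # V_theta
    t3 = sum(abs(y - mu) ** 3 * q for y, q in probs.items())     # T_theta
    return mu, var, t3

print(tilted_moments(0.2, 0.0))   # theta = 0 recovers the moments of P_Y itself
```

At θ = 0 the tilted distribution coincides with P Y, and increasing θ shifts the mean upward, which is the mechanism exploited by the saddlepoint change of measure.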
From Theorem 1, it follows that ∆(P S n,θ , P Z n,θ) in (A7) satisfies the inequality in (A14), under the assumption that at least one of the conditions of Lemma A1 is met. The proof ends by obtaining a closed-form expression for the term E P Z n,θ [exp(−θ Z n,θ) 1{Z n,θ ∈ A}] in (A14), under the same assumption. First, assuming that condition (i) in Lemma A1 holds yields the expression in (A15f); assuming instead that condition (ii) holds yields (A16c). The function Q in (A15f) and (A16c) is the complementary CDF of the standard Gaussian distribution defined in (13).
The expressions in (A15f) and (A16c) can be written jointly, under the assumption that at least one of the conditions (i) or (ii) in Lemma A1 holds.
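Closed-form expressions of this type rest on a standard Gaussian identity: for Z ~ N(µ, σ²), E[exp(−θZ) 1{Z ≥ a}] = exp(−θµ + θ²σ²/2) Q((a − µ + θσ²)/σ). The check below (a sketch, not the paper's expressions (A15f)/(A16c) themselves) verifies the identity by Monte Carlo:

```python
import math
import random

def Q(x):
    """Complementary CDF of the standard Gaussian distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def closed_form(mu, sigma, theta, a):
    """E[exp(-theta*Z) 1{Z >= a}] for Z ~ N(mu, sigma^2), via completing the square."""
    return math.exp(-theta * mu + 0.5 * theta**2 * sigma**2) \
        * Q((a - mu + theta * sigma**2) / sigma)

def monte_carlo(mu, sigma, theta, a, trials=100_000, seed=0):
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        z = rng.gauss(mu, sigma)
        if z >= a:
            acc += math.exp(-theta * z)
    return acc / trials

mu, sigma, theta, a = 1.0, 2.0, 0.3, 1.5
print(closed_form(mu, sigma, theta, a), monte_carlo(mu, sigma, theta, a))
```

The two values agree to within Monte Carlo error, confirming the identity used to evaluate the exponentially weighted Gaussian tail.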
Third, the next step of the proof consists of proving the equality in (35). To this end, let θ : R × N → R be defined, for all (a, n) ∈ R × N, by

θ(a, n) = arg inf over θ ∈ Θ Y of g(θ, a, n). (A25)

Note that the function g is convex in θ. This follows by verifying that its second derivative with respect to θ is positive.
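The characterization of θ(a, n) as the root of the first derivative of g can be checked numerically. The sketch below solves d/dθ g(θ, a, n) = (n/ϕ Y (θ)) dϕ Y (θ)/dθ − a = 0, as in (A27), by bisection for a Bernoulli(p) source (an illustrative choice, for which ϕ Y (θ) = 1 − p + p e^θ and the root has the closed form θ = log(a(1 − p)/(p(n − a)))):

```python
import math

def dg(theta, a, n, p):
    """d/dtheta g(theta, a, n) = n*phi'(theta)/phi(theta) - a for a Bernoulli(p) source."""
    phi = 1 - p + p * math.exp(theta)
    dphi = p * math.exp(theta)
    return n * dphi / phi - a

def solve_theta(a, n, p, lo=-30.0, hi=30.0, iters=100):
    """Bisection: convexity of g in theta makes the root of dg unique."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dg(mid, a, n, p) < 0:
            lo = mid          # tilted mean still below a/n: increase theta
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, p, a = 100, 0.2, 35.0
theta = solve_theta(a, n, p)
closed = math.log(a * (1 - p) / (p * (n - a)))   # closed form for the Bernoulli case
print(theta, closed)
```

Note also that a = n E[Y] = np gives θ(a, n) = 0, consistent with the remark at the end of the proof that h(n E[Y]) = 0.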

where K Y^(k)(t) represents the k-th real derivative of the CGF K Y evaluated at t. The first two derivatives, K Y^(1)(t) and K Y^(2)(t), are, respectively, the mean and the variance of the random variable whose distribution is the exponential tilting of P Y by t.


Lemma A2. For all θ ∈ Θ Y, the random variable S n,θ is the sum of n independent and identically distributed random variables with probability distribution P Y (θ). Moreover, P Y (θ) is an exponentially tilted distribution with respect to P Y. That is, P Y (θ) satisfies, for all y ∈ R,

dP Y (θ)/dP Y (y) = exp(θ y)/ϕ Y (θ). (A8)

If the first derivative of g with respect to θ (see (A26a)) admits a zero in Θ Y, then θ(a, n) is the unique solution in θ to the following equality:

d/dθ g(θ, a, n) = (n/ϕ Y (θ)) d/dθ ϕ Y (θ) − a = 0. (A27)

Example 1 (Discrete random variable). Let the random variables Y 1, Y 2, ..., Y n in (1) be i.i.d. Bernoulli random variables with parameter p = 0.2 and n = 100. In this case, E P Xn [X n] = n E P Y [Y] = 20. Figure 1 depicts the CDF F X 100 of X 100 in (1); the normal approximation F Z 100 in (25); and the saddlepoint approximation F̂ X 100 in (12). It also depicts the upper and lower bounds due to the normal approximation, Σ in (26) and Σ in (27), respectively; and the upper and lower bounds due to the saddlepoint approximation, Ω in (41) and Ω in (42), respectively. These functions are plotted as functions of a, with a ∈ [5, 35].

Example 2 (Continuous random variable). Let the random variables Y 1, Y 2, ..., Y n in (1) be i.i.d. chi-squared random variables with parameter k = 1 and n = 50. In this case, E P Xn [X n] = n E P Y [Y] = 50. Figure 2 depicts the CDF F X 50 of X 50 in (1); the normal approximation F Z 50 in (25); and the saddlepoint approximation F̂ X 50 in (12). It also depicts the upper and lower bounds due to the normal approximation, Σ in (26) and Σ in (27), respectively; and the upper and lower bounds due to the saddlepoint approximation, Ω in (41) and Ω in (42), respectively. These functions are plotted as functions of a, with a ∈ [0, 100].
From (A28d), it follows that a/n is the mean of a random variable that follows an exponentially tilted distribution with respect to P Y. Thus, there exists a solution in θ to (A28d) if and only if a/n ∈ int C Y, hence the equality in (35). Finally, from (A28d), a = n E P Y [Y] implies that θ(a, n) = 0. Hence, from (35), h(n E P Y [Y]) = 0, which completes the proof.