Comparing Distributions of Sums of Random Variables by Deficiency: Discrete Case

In the paper, we consider a new approach to the comparison of the distributions of sums of random variables. Unlike preceding works, for this purpose we use the notion of deficiency, which is well known in mathematical statistics. This approach is used, first, to determine the distribution of a separate random variable in the sum that provides the least possible number of summands guaranteeing the prescribed value of the $(1-\alpha)$-quantile of the normalized sum for a given $\alpha\in(0,1)$, and second, to determine the distribution of a separate random variable in the sum that provides the least possible number of summands guaranteeing the prescribed value of the probability for the normalized sum to fall into a given interval. Both problems are solved under the condition that the possible distributions of random summands have the same first three moments. In both settings, the best distribution is the one that requires the smallest number of summands. Along with sums of a non-random number of summands, we consider the case of random summation and introduce an analog of deficiency that can be used to compare the distributions of sums with random and non-random numbers of summands. The main mathematical tools used in the paper are asymptotic expansions for the distributions of $\mathbb{R}$-valued functions of random vectors, in particular, of normalized sums of independent identically distributed random variables (r.v.s), and for their quantiles. Besides the general case, particular attention is paid to the situation where the summed random variables are independent and identically distributed. The approach is applied to determining the distribution of insurance payments that provides the least insurance portfolio size under a prescribed Value-at-Risk or non-ruin probability.


The Problem under Consideration and the Structure of the Paper
The problem considered in the paper is very close to the problem of stochastic ordering and may even be regarded as a version of this problem. In probability theory and statistics, a stochastic order quantifies the concept of one random variable being "bigger" or "smaller" than another. Many different orders exist, with different applications; see, e.g., the book [1]. Here we propose an approach to establishing a stochastic order for the distributions of sums of independent random variables (r.v.s) based on the notion of deficiency, which is well known in asymptotic statistics; see, e.g., [2] and the later publications [3][4][5]. Roughly speaking, in statistics the deficiency of a statistical procedure with respect to an 'optimal' procedure is the number of additional observations required to attain the same quality of inference as is guaranteed by the 'optimal' procedure.
In this paper the deficiency is measured in discrete units taking natural values (the number of 'additional' summands); this is why we call this setting the discrete case. The notion of deficiency can be extended to the case of a continuous parameter, say, time. This case will be considered in another work.
Besides the general case, the paper pays main attention to the situation where the r.v.s being summed are independent and identically distributed.
The first problem considered below consists in determining the distribution of a separate random variable in the sum that provides the least possible number of summands guaranteeing the prescribed value of the $(1-\alpha)$-quantile of the normalized sum for a given $\alpha\in(0,1)$. The second problem consists in determining the distribution of a separate random variable in the sum that provides the least possible number of summands guaranteeing the prescribed value of the probability for the normalized sum to fall into a given interval. In both problems we actually deal with 'fine tuning' of the distribution of a separate summand, since we assume that the possible distributions of random summands have the same first three moments, so that, within the accuracy of the asymptotic expansions used below, they differ only in their kurtosis. In both settings, the best distribution is the one that requires the smallest number of summands.
We also consider the problem where some additional randomization is introduced, so that the number of summands in the sum can itself be random. This randomization need not be artificially induced: it may also arise when the exact number of summands is unknown a priori and only its 'expected' value is available as a parameter of the problem. For this case we introduce an analog of deficiency that can be used to compare the distributions of sums with random and non-random numbers of summands.
Both problems are closely related to the problem of quantifying the accuracy of the approximations provided by limit theorems of probability theory. The main mathematical tools used in the paper are asymptotic expansions for the distributions of normalized sums of independent identically distributed r.v.s and for their quantiles.
The formal settings mentioned above can be applied to solving practical problems where the models of the observed statistical regularities have the form of distributions of sums of r.v.s and the number of summands plays a substantial role. For example, consider an insurance company whose portfolio consists of a finite number of insurance contracts. Formally, the portfolio is assumed to be a finite set of r.v.s each of which characterizes the income of the company related to a separate contract. Instead of income we can speak of loss assuming that income is a negative loss or that loss is a negative income.
In these terms, the first setting concerns the problem of determining the distribution of a possible loss within a separate insurance contract (say, the distribution of an insurance payment) that provides the least possible portfolio size and guarantees the prescribed Value-at-Risk for the average losses. The approach considered in the paper can be used when the distributions of the summands (possible losses) are known only up to their first three moments and the exact Value-at-Risk is unknown. In the second setting the latter requirement is replaced by that of guaranteeing the prescribed 'non-ruin' probability. Within the framework of this example, in both settings the problem consists in describing the best strategy of the insurance company, if by a strategy we mean the choice of the terms of a contract (e.g., the amount of insurance payment related to each possible insurance event), that is, the choice of the distribution of possible loss within a separate contract. Briefly, the problem is to choose, among the distributions that have the same first three moments, an optimal distribution of a separate loss, so that the portfolio size is the least possible.
The paper is organized as follows. Section 1.2 contains a short overview of the properties of statistical deficiency. In Section 2 we outline some results concerning the asymptotic expansions for the distributions of R-valued measurable functions of r.v.s and, in particular, for the distributions of normalized sums of r.v.s, as well as for their quantiles. In Section 3 the problem of comparison of the distributions of two sums of independent r.v.s by their deficiency is considered. The notion of asymptotic deficiency is introduced and some formulas for the calculation of asymptotic deficiency are presented. Section 3.1 contains the solution of this problem for these distributions providing a prescribed value of the (1 − α)-quantile for a given α ∈ (0, 1). In Section 3.2 this problem is considered for the distributions of sums of independent r.v.s guaranteeing a prescribed probability for an R-valued measurable function of r.v.s, in particular, for a normalized sum of r.v.s, to fall into a given interval. Section 4 contains an example of extension of the results of Section 3 to the case of a random number of summands in the sum (random portfolio size, in terms of the example dealing with an insurance company). In Section 4.1 asymptotic expansions for the asymptotic (1 − α)-quantile (called α-reserve here) under a random portfolio size are presented and an analog of deficiency of the sum of a random number of summands (or the strategy with a random portfolio size) with respect to the distribution of the sum of a non-random number of summands (or a strategy with a non-random portfolio size) is considered. In Section 4.2 the problem of comparison of these distributions by an analog of deficiency is considered in a special case of three-point distribution of portfolio size.
Everywhere in what follows the set of real numbers is denoted by $\mathbb{R}$ and the set of natural numbers by $\mathbb{N}$. The distribution function of the standard normal law is denoted by $\Phi(x)$ and its density by $\varphi(x)$. The distribution of a random vector $(X_1,\dots,X_n)$ is denoted $\mathcal{L}(X_1,\dots,X_n)$.

Asymptotic Deficiency
Following the classical terminology of [6], consider two decision rules (say, two statistical procedures) $D^*_n$ and $D_n$ whose quality is characterized by the quantities $\pi^*_n$ and $\pi_n$, respectively. Here $n$ is the number of observations $X_1,\dots,X_n$ on which the decision rules are based. Assume that the rule $D^*_n$ is in some sense optimal, whereas the rule $D_n$ is a competing one. For example, in estimation problems $\pi^*_n$ and $\pi_n$ are usually mean square deviations, so that $\pi^*_n\le\pi_n$; in hypothesis testing $\pi^*_n$ and $\pi_n$ are usually powers of tests, so that $\pi^*_n\ge\pi_n$. Denote by $m(n)$ the number of observations required for the decision rule $D_{m(n)}$ based on $m(n)$ observations $X_1,\dots,X_{m(n)}$ to attain the same quality as the 'best' rule $D^*_n$ based on $n$ observations $X_1,\dots,X_n$. In what follows we keep to the asymptotic approach, assuming that $n\to\infty$. Following [7], by the asymptotic relative efficiency (a.r.e.) of the rule $D_n$ with respect to the rule $D^*_n$ we mean the limit $e\equiv\lim_{n\to\infty}n/m(n)$ (if it exists and does not depend on the choice of the sequence $m(n)$).
Instead of the ratio of the required numbers of observations, one can consider the difference $m(n)-n$, which directly shows the number of additional observations required by the decision rule $D_n$. However, many authors have considered the ratio $n/m(n)$, possibly because the asymptotic analysis of its properties is simpler.
The systematic analysis of the asymptotic behavior of the difference $m(n)-n$ was first carried out by Hodges and Lehmann in 1970 [2]. They suggested calling this difference the deficiency of the competing decision rule $D_n$ with respect to the rule $D^*_n$ and introduced the notation
$$d_n=m(n)-n. \tag{1}$$
If the limit $\lim_{n\to\infty}d_n$ exists, then it is called the asymptotic deficiency of the competing decision rule $D_n$ with respect to the rule $D^*_n$ and is denoted $d$. The number $d$ is often simply called the deficiency of $D_n$ with respect to $D^*_n$. Note that if the a.r.e. $e<1$, then $m(n)-n\sim(1/e-1)\,n$ and hence $d=\infty$, so that this case is not very interesting. In [2] it was also noticed that for many decision rules (statistical procedures) the case $e=1$ is typical (see, e.g., the book [8]); in such cases the a.r.e. cannot answer the question which rule is better, whereas the deficiency can clarify the situation because, generally speaking, the asymptotic deficiency can then take any value.
So, the deficiency of $D_n$ with respect to $D^*_n$ shows how many additional observations (that is, how much extra information) are required to attain the desired quality if the decision rule $D_n$ is used instead of the 'optimal' decision rule $D^*_n$. Therefore, the notion of deficiency provides natural grounds for the asymptotic comparison of $D_n$ and $D^*_n$ in the case $e=1$. The study of the asymptotic behavior of the deficiency $d_n$ requires more sophisticated techniques than those used to find the limit $e$. As a rule, these techniques involve the construction of asymptotic expansions (a.e.s) for the corresponding functions characterizing the quality of decision rules (see, e.g., the books [7][8][9]).
Since the rules $D^*_n$ and $D_n$ have the quality characteristics $\pi^*_n$ and $\pi_n$, respectively, by the definition of the deficiency $d_n=m(n)-n$ we have, for every $n$,
$$\pi_{m(n)}=\pi_{n+d_n}=\pi^*_n. \tag{2}$$
To solve Equation (2), the integer-valued quantity $m(n)$ should be treated as a variable taking arbitrary real values. For this purpose, the function $\pi_m$ can be defined for noninteger $m$ by linear interpolation,
$$\pi_m=\pi_{[m]}+(m-[m])\big(\pi_{[m]+1}-\pi_{[m]}\big),$$
where $[m]$ denotes the integer part of $m$ (see [2]).
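The functions $\pi^*_n$ and $\pi_n$ are usually unknown, so in practice their approximations are used. Assume that the a.e.s
$$\pi^*_n=\frac{a}{n^r}+\frac{b}{n^{r+s}}+o\Big(\frac{1}{n^{r+s}}\Big), \tag{3}$$
$$\pi_n=\frac{a}{n^r}+\frac{c}{n^{r+s}}+o\Big(\frac{1}{n^{r+s}}\Big) \tag{4}$$
hold, where $a\neq0$, $b$ and $c$ are some numbers that do not depend on $n$, and $r>0$ and $s>0$ are constants determining the rate of decrease of these quality criteria in $n$. The leading terms in these expansions coincide, which means that the a.r.e. of the corresponding rules equals one. It can easily be obtained from relations (1)-(4) that
$$d_n=\frac{c-b}{ra}\,n^{1-s}+o(n^{1-s}) \tag{5}$$
(see [2] or [7]). Thus, the asymptotic deficiency has the form
$$d=\lim_{n\to\infty}d_n=\begin{cases}0, & s>1,\\[2pt] \dfrac{c-b}{ra}, & s=1,\\[2pt] \pm\infty, & s<1,\ c\neq b,\end{cases} \tag{6}$$
where in the last case the sign coincides with that of $(c-b)/a$.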
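Equation (2) can also be solved numerically, which makes the convergence of $d_n$ in (5)-(6) visible. The Python sketch below extends an integer-indexed quality characteristic to real arguments by linear interpolation (the interpolation rule is an assumption of this sketch), solves $\pi_m=\pi^*_n$ for real $m$ by bisection, and compares $d_n=m-n$ with the limit $(c-b)/(ra)$ for $s=1$. The coefficients $a$, $b$, $c$, $r$ and the helper names `pi_interp` and `deficiency` are hypothetical and serve only as an illustration.

```python
import math

def pi_interp(pi, m):
    """Extend an integer-indexed quality characteristic pi(k) to real m
    by linear interpolation between pi([m]) and pi([m] + 1)."""
    k = math.floor(m)
    return pi(k) + (m - k) * (pi(k + 1) - pi(k))

def deficiency(pi_star, pi_comp, n, iters=200):
    """Solve pi_interp(pi_comp, m) = pi_star(n) for real m; return d_n = m - n.
    Assumes pi_comp(k) decreases in k (e.g. a mean square deviation)
    and that the root lies in [n/2, 4n]."""
    target = pi_star(n)
    lo, hi = n / 2, 4 * n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pi_interp(pi_comp, mid) > target:
            lo = mid      # quality still worse than that of D*_n: more observations needed
        else:
            hi = mid
    return 0.5 * (lo + hi) - n

# Hypothetical expansions of the form (3)-(4) with r = 1, s = 1
a, b, c, r = 1.0, 2.0, 5.0, 1.0
pi_star = lambda k: a / k**r + b / k**(r + 1)
pi_comp = lambda k: a / k**r + c / k**(r + 1)

for n in (10, 100, 1000):
    print(n, round(deficiency(pi_star, pi_comp, n), 3))
print("(c - b)/(r a) =", (c - b) / (r * a))
```

For quality characteristics that increase with the number of observations (for example, powers of tests), the comparison inside the bisection loop has to be reversed.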
The asymptotic deficiency possesses the following obvious transitivity property: if there is a third decision rule $\widetilde D_n$ whose quality characteristic $\widetilde\pi_n$ admits an a.e. of the form (4) (with some constant $\widetilde c$ in place of $c$), then the deficiency $\widetilde d_n$ of the rule $\widetilde D_n$ with respect to the rule $D^*_n$ satisfies the equality
$$\widetilde d_n=d'_n+d_n+o(n^{1-s}),$$
where $d'_n$ is the deficiency of $\widetilde D_n$ with respect to $D_n$ and $d_n$ is the deficiency of $D_n$ with respect to $D^*_n$. The case $s=1$ is the most interesting one, because then the asymptotic deficiency is finite and, in general, nonzero. In [2], some simple examples are given illustrating that this case is quite natural in mathematical statistics (see also the book [8]).

Asymptotic Expansions for the Distributions of Normalized Sums of Random Variables
We begin with the most general case. Let $n\in\mathbb{N}$. Consider a finite set of r.v.s $X_1,\dots,X_n$. For the time being, we do not assume that the r.v.s $X_1,\dots,X_n$ are independent or identically distributed. Let $L_n=L_n(X_1,\dots,X_n)$ be an $\mathbb{R}$-valued measurable function of $X_1,\dots,X_n$. (In what follows, when dealing with the example of the portfolio of an insurance company, we will call this function the generalized loss.) In particular, $L_n$ may be of the form $L_n=\sqrt{n}\,T_n$, where $T_n$ is the arithmetic mean,
$$T_n=\frac1n\sum_{i=1}^nX_i. \tag{7}$$
As has already been said, the problem consists in describing the distribution of the r.v.s $X_i$ that provides the least possible number of summands $n$ while guaranteeing the prescribed value of the $(1-\alpha)$-quantile of the function $L_n$ for a given $\alpha\in(0,1)$.
Let $\alpha\in(0,1)$ be a small number. Consider the quantity $c_\alpha(n)$ defined by the asymptotic relation
$$P\big(L_n(X_1,\dots,X_n)\ge c_\alpha(n)\big)=\alpha+o(n^{-1}),\quad n\to\infty. \tag{8}$$
The quantity $c_\alpha(n)$ is the asymptotic $(1-\alpha)$-quantile of $L_n$. If $L_n=\sqrt{n}\,T_n$, then $c_\alpha(n)$ can be interpreted as a threshold whose exceedance by $L_n$ is undesirable and is assumed to have the prescribed small probability $\alpha$. In terms of an insurance company, $c_\alpha(n)$ is the asymptotic Value-at-Risk.
By applying the Taylor formula it is not difficult to obtain the following result.

Lemma 1.
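Assume that there exist a distribution function $G(x)$ and functions $g_1(x)$ and $g_2(x)$ such that
$$\sup_{x\in\mathbb{R}}\Big|P\big(L_n(X_1,\dots,X_n)<x\big)-G(x)-\frac{g_1(x)}{\sqrt n}-\frac{g_2(x)}{n}\Big|=o(n^{-1}),\quad n\to\infty,$$
where the functions $G(x)$, $g_1(x)$ and $g_2(x)$ are smooth enough and $G'(c_\alpha)>0$ for the number $c_\alpha$ satisfying $G(c_\alpha)=1-\alpha$. Then the asymptotic $(1-\alpha)$-quantile $c_\alpha(n)$ of $L_n$ admits the a.e.
$$c_\alpha(n)=c_\alpha-\frac{g_1(c_\alpha)}{G'(c_\alpha)\sqrt n}+\frac{1}{nG'(c_\alpha)}\Big[\frac{g_1(c_\alpha)g_1'(c_\alpha)}{G'(c_\alpha)}-g_2(c_\alpha)-\frac{G''(c_\alpha)g_1^2(c_\alpha)}{2\,(G'(c_\alpha))^2}\Big]+o(n^{-1}).$$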
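The Taylor argument behind Lemma 1 can be checked numerically. Substituting $c_\alpha(n)=c_\alpha+a_1n^{-1/2}+a_2n^{-1}+o(n^{-1})$ into $G(x)+g_1(x)n^{-1/2}+g_2(x)n^{-1}=1-\alpha$ and equating the coefficients of $n^{-1/2}$ and $n^{-1}$ gives $a_1=-g_1/G'$ and $a_2=\big[g_1g_1'/G'-g_2-G''g_1^2/(2G'^2)\big]/G'$, all evaluated at $c_\alpha$. In the Python sketch below, $G$ (the logistic distribution function), $g_1$ and $g_2$ are hypothetical choices made only for illustration, and the remainder term is dropped so that the exact root can be found by bisection.

```python
import math

# HYPOTHETICAL ingredients of Lemma 1, chosen only for illustration:
# G is the logistic distribution function, g1 and g2 are smooth corrections.
def G(x):
    return 1.0 / (1.0 + math.exp(-x))

def dG(x):                      # G'
    return G(x) * (1.0 - G(x))

def d2G(x):                     # G''
    return dG(x) * (1.0 - 2.0 * G(x))

def g1(x):
    return 0.5 * dG(x)

def dg1(x):                     # g1'
    return 0.5 * d2G(x)

def g2(x):
    return 0.2 * x * dG(x)

def expansion_coefficients(alpha):
    """c_alpha, a1, a2 in c_alpha(n) = c_alpha + a1/sqrt(n) + a2/n + o(1/n)."""
    c = math.log((1.0 - alpha) / alpha)   # logistic quantile: G(c) = 1 - alpha
    p, dp = dG(c), d2G(c)
    a1 = -g1(c) / p
    a2 = (g1(c) * dg1(c) / p - g2(c) - dp * g1(c) ** 2 / (2.0 * p ** 2)) / p
    return c, a1, a2

def quantile_numeric(alpha, n, lo, hi, iters=100):
    """Root of G(x) + g1(x)/sqrt(n) + g2(x)/n = 1 - alpha by bisection."""
    F = lambda x: G(x) + g1(x) / math.sqrt(n) + g2(x) / n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, n = 0.05, 400
c, a1, a2 = expansion_coefficients(alpha)
exact = quantile_numeric(alpha, n, c - 3.0, c + 3.0)
print("one-term error:", exact - (c + a1 / math.sqrt(n)))
print("two-term error:", exact - (c + a1 / math.sqrt(n) + a2 / n))
```

With these choices, the two-term expansion is markedly closer to the numerical root than the one-term expansion, in line with the $o(n^{-1})$ accuracy claimed by the lemma.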
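Consider the application of this lemma to the case where $X_1,X_2,\dots$ are independent identically distributed r.v.s such that
$$EX_1=0,\qquad EX_1^2=1,\qquad E|X_1|^{k+\delta}<\infty\ \text{ for some integer } k\ge3 \text{ and } \delta\in(0,1], \tag{9}$$
and the function $L_n$ has the form $L_n=\sqrt{n}\,T_n$ with $T_n$ defined by (7). Here the condition $EX_1=0$ means that the separate losses are centered by their expectations. Assume that the characteristic function $f(t)$ of the r.v. $X_1$ satisfies the Cramér condition (C)
$$\limsup_{|t|\to\infty}|f(t)|<1. \tag{10}$$
Under conditions (9) and (10), from Theorem 6.3.2 of [10] (see also [9]) it follows that there exist functions $Q_1(x),\dots,Q_{k-2}(x)$ and a constant $C_{k,\delta}\in(0,\infty)$ such that
$$\sup_{x\in\mathbb{R}}\Big|P\big(\sqrt n\,T_n<x\big)-\Phi(x)-\sum_{j=1}^{k-2}\frac{Q_j(x)}{n^{j/2}}\Big|\le\frac{C_{k,\delta}}{n^{(k-2+\delta)/2}}. \tag{11}$$
For the definition of the functions $Q_1(x),\dots,Q_{k-2}(x)$ see the book [10]. In particular,
$$Q_1(x)=\frac{EX_1^3}{6}(1-x^2)\varphi(x),\qquad Q_2(x)=-\Big[\frac{EX_1^4-3}{24}(x^3-3x)+\frac{(EX_1^3)^2}{72}(x^5-10x^3+15x)\Big]\varphi(x). \tag{12}$$
Relations (11) and (12) and Lemma 1 directly imply the a.e. for the asymptotic $(1-\alpha)$-quantile $c_\alpha(n)$ of $L_n$ presented in the following lemma.

Lemma 2. Let conditions (9) and (10) hold with $k=4$, $\delta>0$. Then the asymptotic $(1-\alpha)$-quantile $c_\alpha(n)$ of $L_n$ admits the a.e.
$$c_\alpha(n)=u_\alpha+\frac{EX_1^3}{6\sqrt n}(u_\alpha^2-1)+\frac1n\Big[\frac{EX_1^4-3}{24}(u_\alpha^3-3u_\alpha)-\frac{(EX_1^3)^2}{36}(2u_\alpha^3-5u_\alpha)\Big]+o(n^{-1}),$$
where $u_\alpha$ satisfies the equation $\Phi(u_\alpha)=1-\alpha$.

In this section we present an approach to the comparison of the distributions of two sums of r.v.s in terms of the number of summands. As before, the distribution of the random vector $(X_1,\dots,X_n)$ is denoted $\mathcal{L}(X_1,\dots,X_n)$. Consider an $\mathbb{R}$-valued measurable function of $X_1,\dots,X_n$.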

The Comparison of the Distributions of Two Normalized Sums of Random Variables
From Lemma 1 we can easily obtain the following result.
Lemma 3. Consider a sequence $\{\varepsilon_n\}_{n\ge1}$ such that $\varepsilon_n\to0$ as $n\to\infty$. Under the conditions of Lemma 1 we have $\sup_{x\in\mathbb{R}}\big|P\big(L_n(X_1,\dots,X_n)<x+\varepsilon_n\big)-P\big(L_n(X_1,\dots,$ …

Along with the r.v.s $X_1,\dots,X_n$ resulting in the value $L_n(X_1,\dots,X_n)$ of the function $L_n$, consider another set of r.v.s $Y_1,\dots,Y_n$, for which the value of the function $L_n$ is $L_n(Y_1,\dots,Y_n)$. For example, $L_n(X_1,\dots,X_n)$ may have the form $L_n(X_1,\dots,X_n)=\sqrt n\,T_n$ with $T_n$ defined by (7), and $L_n(Y_1,\dots,Y_n)$ may have the form $L_n(Y_1,\dots,Y_n)=\sqrt n\,U_n$, where
$$U_n=\frac1n\sum_{i=1}^nY_i. \tag{13}$$
Let $\tilde c_\alpha(n)$ denote the asymptotic $(1-\alpha)$-quantile of $L_n$ corresponding to the distribution $\mathcal{L}(Y_1,\dots,Y_n)$:
$$P\big(L_n(Y_1,\dots,Y_n)\ge\tilde c_\alpha(n)\big)=\alpha+o(n^{-1}). \tag{14}$$
Assume that the a.e. for the distribution function of $L_n(Y_1,\dots,Y_n)$ has the form
$$\sup_{x\in\mathbb{R}}\Big|P\big(L_n(Y_1,\dots,Y_n)<x\big)-G(x)-\frac{g_1(x)}{\sqrt n}-\frac{\tilde g_2(x)}{n}\Big|=o(n^{-1}), \tag{15}$$
where the functions $G(x)$, $g_1(x)$ and $\tilde g_2(x)$ are smooth enough. The a.e. (15) differs from the a.e. for the distribution function of $L_n(X_1,\dots,X_n)$ assumed in Lemma 1 only in the term of order $n^{-1}$, which means that the two distributions are rather close. Define the sequence of natural numbers $\{m(n)\}_{n\ge1}$ by equality (16). If $m(n)-n=d+o(1)$, $d\in\mathbb{R}$, $n\to\infty$, then $d$ is the asymptotic deficiency of the distribution $\mathcal{L}(Y_1,\dots,Y_n)$ with respect to the distribution $\mathcal{L}(X_1,\dots,X_n)$. In other words, $d$ is the asymptotic number of 'additional' r.v.s that must be included in the set $Y_1,\dots,Y_n$ in order that the distribution $\mathcal{L}(Y_1,\dots,Y_{m(n)})$ provide the same quality as the distribution $\mathcal{L}(X_1,\dots,X_n)$.

Theorem 1. Assume that the conditions of Lemma 1 and (15) hold and $G'(c_\alpha)c_\alpha\neq0$. Then the asymptotic deficiency $d$ of the distribution $\mathcal{L}(Y_1,\dots,Y_n)$ with respect to the distribution $\mathcal{L}(X_1,\dots,X_n)$ has the form

Proof. From Lemma 1 and condition (15) it directly follows that
and therefore

Further, taking into account the definitions of $m(n)$ (see (16)) and $\varepsilon_n$, we have

Applying Lemma 3 to the right-hand side of (19), we obtain
$$\alpha+o(n^{-1})=P\big(L_{m(n)}(Y_1,\dots,Y_{m(n)})\ge c_\alpha(m(n))\big)-\varepsilon_nG'(c_\alpha)+o(n^{-1}).$$

Now from (16) and (18) it follows that
The theorem is proved.
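Write $\tilde c_\alpha(n)$ for the asymptotic $(1-\alpha)$-quantile (14) under $\mathcal{L}(Y_1,\dots,Y_n)$ and $\tilde g_2$ for the coefficient of $n^{-1}$ in (15). Since (15) keeps the same $G$ and $g_1$, Lemma 1 gives expansions $c_\alpha(n)=c_\alpha+a_1n^{-1/2}+a_2n^{-1}$ and $\tilde c_\alpha(n)=c_\alpha+a_1n^{-1/2}+\tilde a_2n^{-1}$ (up to $o(n^{-1})$) with $\tilde a_2-a_2=\big(g_2(c_\alpha)-\tilde g_2(c_\alpha)\big)/G'(c_\alpha)$. The Python sketch below assumes (this matching rule is an assumption made here, standing in for (16)) that $m(n)$ equalizes the Value-at-Risk of the average losses, $\tilde c_\alpha(m)/\sqrt m=c_\alpha(n)/\sqrt n$. Expanding in $m=n+d$ then gives $d=2(\tilde a_2-a_2)/c_\alpha+o(1)=2\big(g_2(c_\alpha)-\tilde g_2(c_\alpha)\big)/\big(G'(c_\alpha)c_\alpha\big)+o(1)$, which shows where a nondegeneracy condition such as $G'(c_\alpha)c_\alpha\neq0$ enters. The coefficient values and the helper name `matched_size` are hypothetical.

```python
import math

def matched_size(c, a1, a2, a2_tilde, n, iters=200):
    """Real m with c~(m)/sqrt(m) = c(n)/sqrt(n), where
    c(n)  = c + a1/sqrt(n) + a2/n        (quantile under X, Lemma 1),
    c~(m) = c + a1/sqrt(m) + a2_tilde/m  (quantile under Y, Lemma 1 with (15)).
    ASSUMPTION: this matching rule stands in for the definition (16) of m(n)."""
    target = (c + a1 / math.sqrt(n) + a2 / n) / math.sqrt(n)
    f = lambda m: (c + a1 / math.sqrt(m) + a2_tilde / m) / math.sqrt(m)
    lo, hi = n / 2, 2 * n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:   # f decreases in m for these inputs
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical coefficients; a2_tilde - a2 plays the role of (g2(c) - g2~(c)) / G'(c)
c, a1, a2, a2_tilde = 2.326, 0.5, 0.1, 0.3
n = 10**4
d_numeric = matched_size(c, a1, a2, a2_tilde, n) - n
d_formula = 2 * (a2_tilde - a2) / c
print(d_numeric, d_formula)
```

The residual difference between the two printed values is of order $n^{-1/2}$, coming from the $a_1$ terms.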
Now consider an example of the application of Theorem 1 to the optimization of the portfolio size of an insurance company. Let the possible losses $X_1,X_2,\dots$ related to the insurance contracts in the portfolio be independent identically distributed r.v.s satisfying conditions (9) and (10). Consider another distribution, under which the possible losses $Y_1,Y_2,\dots$ are assumed to be independent identically distributed r.v.s such that
$$EY_1=0,\qquad EY_1^2=1,\qquad E|Y_1|^{k+\delta}<\infty. \tag{20}$$
Assume that the characteristic function $p(t)$ of the r.v. $Y_1$ satisfies the Cramér condition (C)
$$\limsup_{|t|\to\infty}|p(t)|<1. \tag{21}$$
For each $n$ consider the average losses $U_n$ defined by (13). Assume that
$$EX_1^3=EY_1^3 \tag{22}$$
(for example, this holds if the r.v.s $X_i$ and $Y_i$ are centered by their expectations and the distributions of these centered r.v.s are symmetric). From Lemma 2 and Theorem 1 we directly obtain the following statement.

Lemma 4 illustrates that if the distributions are close (here, have the same first three moments), then the deficiency is determined by the difference of their kurtoses.
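Lemma 2 and the kurtosis effect described by Lemma 4 can be explored numerically. The Python sketch below assumes standardized summands ($EX_1=0$, $EX_1^2=1$) and uses the standard Edgeworth terms $Q_1(x)=\frac{EX_1^3}{6}(1-x^2)\varphi(x)$ and $Q_2(x)=-\big[\frac{EX_1^4-3}{24}(x^3-3x)+\frac{(EX_1^3)^2}{72}(x^5-10x^3+15x)\big]\varphi(x)$. First, it compares the two-term quantile expansion of Lemma 2 with the root of the truncated expansion $\Phi+Q_1n^{-1/2}+Q_2n^{-1}$. Second, if $m(n)$ is assumed to satisfy $\tilde c_\alpha(m(n))/\sqrt{m(n)}=c_\alpha(n)/\sqrt n$ (an assumption made here in place of (16)), the deficiency $2\big(g_2(u_\alpha)-\tilde g_2(u_\alpha)\big)/\big(\varphi(u_\alpha)u_\alpha\big)$, with $g_2,\tilde g_2$ given by $Q_2$ for $X_1$ and $Y_1$, reduces under $EX_1^3=EY_1^3$ to $(u_\alpha^2-3)(EY_1^4-EX_1^4)/12$. This closed form is derived here under that assumption and is not quoted from Lemma 4. The function names and the example distributions are illustrative choices.

```python
import math
from statistics import NormalDist

PHI = NormalDist()   # standard normal law: cdf, pdf, inv_cdf

def Q1(x, m3):
    return m3 / 6 * (1 - x**2) * PHI.pdf(x)

def Q2(x, m3, m4):
    return -((m4 - 3) / 24 * (x**3 - 3 * x)
             + m3**2 / 72 * (x**5 - 10 * x**3 + 15 * x)) * PHI.pdf(x)

def lemma2_quantile(alpha, n, m3, m4):
    """Two-term expansion of c_alpha(n) from Lemma 2."""
    u = PHI.inv_cdf(1 - alpha)
    return (u + m3 / 6 * (u**2 - 1) / math.sqrt(n)
            + ((m4 - 3) / 24 * (u**3 - 3 * u) - m3**2 / 36 * (2 * u**3 - 5 * u)) / n)

def edgeworth_root(alpha, n, m3, m4, iters=100):
    """Solve Phi(x) + Q1(x)/sqrt(n) + Q2(x)/n = 1 - alpha by bisection near u_alpha."""
    u = PHI.inv_cdf(1 - alpha)
    lo, hi = u - 1.0, u + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if PHI.cdf(mid) + Q1(mid, m3) / math.sqrt(n) + Q2(mid, m3, m4) / n < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kurtosis_deficiency(alpha, m3, m4_x, m4_y):
    """2 (g2(u) - g2~(u)) / (phi(u) u) with g2 = Q2 for X and g2~ = Q2 for Y,
    under the matching rule ASSUMED for (16); E X^3 = E Y^3 = m3."""
    u = PHI.inv_cdf(1 - alpha)
    return 2 * (Q2(u, m3, m4_x) - Q2(u, m3, m4_y)) / (PHI.pdf(u) * u)

# Lemma 2 check: centered exponential summands with unit variance (E X^3 = 2, E X^4 = 9)
print(edgeworth_root(0.05, 100, 2.0, 9.0), lemma2_quantile(0.05, 100, 2.0, 9.0))

# X uniform on [-sqrt(3), sqrt(3)] (E X^4 = 9/5) versus Y Laplace with unit variance (E Y^4 = 6)
for alpha in (0.1, 0.05, 0.01):
    u = PHI.inv_cdf(1 - alpha)
    print(alpha, kurtosis_deficiency(alpha, 0.0, 1.8, 6.0), (u**2 - 3) * (6.0 - 1.8) / 12)
```

For the uniform/Laplace pair, the sign of $d$ changes at $u_\alpha=\sqrt3$, that is, at $\alpha\approx0.042$: for smaller $\alpha$ the heavier-tailed Laplace summands require more terms.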

The Asymptotic Deficiency of the Distributions of Summands Providing a Given Probability for the Normalized Sum to Fall into a Given Interval
To begin with, in this section we again consider the values $L_n(X_1,\dots,X_n)$ and $L_n(Y_1,\dots,Y_n)$ of a measurable $\mathbb{R}$-valued function $L_n$ on random vectors $(X_1,\dots,X_n)$ and $(Y_1,\dots,Y_n)$ with the distributions $\mathcal{L}(X_1,\dots,X_n)$ and $\mathcal{L}(Y_1,\dots,Y_n)$, respectively. The goal is to ensure that the value of $L_n$ falls into the interval $[S_1,S_2)$ for some given numbers $S_1<S_2$. As quality characteristics consider the probabilities
$$\pi_n=P\big(S_1\le L_n(X_1,\dots,X_n)<S_2\big),\qquad\tilde\pi_n=P\big(S_1\le L_n(Y_1,\dots,Y_n)<S_2\big). \tag{23}$$
If $L_n(X_1,\dots,X_n)=\sqrt n\,T_n$ (see (7)) and $L_n(Y_1,\dots,Y_n)=\sqrt n\,U_n$ (see (13)), that is, normalized sums of r.v.s are considered, then relation (23) means that $\pi_n$ and $\tilde\pi_n$ are the probabilities that the normalized sums fall into the interval $[S_1,S_2)$.
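For normalized sums, the quality characteristics (23), with $\pi_n$ for $X$ and $\tilde\pi_n$ for $Y$, can be approximated by the truncated expansion (11) with $k=4$. The Python sketch below assumes standardized summands satisfying (9) and (10) and uses the standard Edgeworth terms $Q_1$, $Q_2$; the interval and the moment values are hypothetical. When $EX_1^3=EY_1^3$, the terms of order $n^{-1/2}$ cancel, so within this approximation $n(\pi_n-\tilde\pi_n)$ does not depend on $n$. Both approximations then have the form $A+a\,n^{-1/2}+c\,n^{-1}$ with a constant $A$ that cancels in (2), that is, $r=s=1/2$ in the notation of (3)-(4), so (5) suggests that for a fixed interval $[S_1,S_2)$ the deficiency grows like $n^{1/2}$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def Q1(x, m3):
    return m3 / 6 * (1 - x**2) * PHI.pdf(x)

def Q2(x, m3, m4):
    return -((m4 - 3) / 24 * (x**3 - 3 * x)
             + m3**2 / 72 * (x**5 - 10 * x**3 + 15 * x)) * PHI.pdf(x)

def interval_prob(n, s1, s2, m3, m4):
    """Edgeworth approximation of P(S1 <= sqrt(n) T_n < S2), terms up to 1/n."""
    F = lambda x: PHI.cdf(x) + Q1(x, m3) / math.sqrt(n) + Q2(x, m3, m4) / n
    return F(s2) - F(s1)

# Hypothetical example: equal third moments 0.5, fourth moments 3.5 (X) and 5.0 (Y)
for n in (50, 200, 800):
    pi_x = interval_prob(n, -1.0, 2.0, 0.5, 3.5)
    pi_y = interval_prob(n, -1.0, 2.0, 0.5, 5.0)
    print(n, pi_x, pi_y, n * (pi_x - pi_y))
```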
From the definition of π n we directly obtain the following result.

Lemma 5.
Assume that for some $r>0$ and $s>0$ there exist a distribution function $H(x)$ and functions $h_1(x)$, $h_2(x)$ and $\tilde h_2(x)$ such that $\sup_{x\in\mathbb{R}}\big|P\big(L_n(X_1,\dots,$ … and, moreover, the functions $h_1(x)$, $h_2(x)$ and $\tilde h_2(x)$ are measurable. Then $\pi_n$ and $\tilde\pi_n$ admit the a.e.s

Lemma 5, Corollary 1 and formula (6) directly imply the expression for the asymptotic deficiency with the quality characteristics (23).

Theorem 2. Let the conditions of Lemma 5 hold with $s=1$. Then the deficiency $d_n$ of the distribution $\mathcal{L}(Y_1,\dots,Y_n)$ with the quality characteristic $\tilde\pi_n$ with respect to the distribution $\mathcal{L}(X_1,\dots,X_n)$ with the quality characteristic $\pi_n$ has the form

If $S_2=S_1+\varepsilon_n$ with $\varepsilon_n\downarrow0$ as $n\to\infty$ and $h_1'(S_1)\neq0$, then the formal passage to the limit in the formula of Theorem 2 yields the formula

Consider an example of the application of Theorem 2 to the optimization of the portfolio size of an insurance company. Let the possible losses $X_1,X_2,\dots$ related to the insurance contracts in the portfolio be independent identically distributed r.v.s satisfying conditions (9) and (10). Consider another distribution, under which the possible losses $Y_1,Y_2,\dots$ are assumed to be independent identically distributed r.v.s satisfying conditions (20) and (21). Assume that $k=3$ in (9) and (20). We are interested in the asymptotic behavior of the average losses $T_n$ (see (7)) and $U_n$ (see (13)). Taking Lemma 5 into account, we obtain the following statement.

Theorem 3.
Let, in addition to the conditions of Lemma 5, $EX_1^3=EY_1^3$. Then the deficiency $d_n$ of the distribution $\mathcal{L}(Y_1,\dots,Y_n)$ with the quality characteristic $\tilde\pi_n$ with respect to the distribution $\mathcal{L}(X_1,\dots,X_n)$ with the quality characteristic $\pi_n$ (the 'additional number of contracts') has the form

Consider an example where the asymptotic deficiency is finite.
Corollary 3. Let $\varepsilon_n=1/n$ and $S_2=S_1+1/n$, $EX_1^3=EY_1^3=0$. Then under the conditions of Lemma 5 we have

Moreover, the deficiency $d_n$ has the form

Asymptotic Expansions for the Asymptotic (1 − α)-Quantile of R-Valued Measurable Functions of a Random Number of Random Variables
In this section we consider the case where an additional randomization is introduced into the problem, so that the number of summands in the sum is random. This randomization need not be artificially induced: it may also arise when the exact portfolio size is unknown beforehand and only the 'expected' number of summands is available as a parameter of the problem.
Let natural-valued r.v.s $N_1,N_2,\dots$ and r.v.s $X_1,X_2,\dots$ be defined on one and the same probability space $(\Omega,\mathcal{A},P)$. In what follows we assume that $n$ is the expected value of $N_n$: $EN_n=n$.
Below we will assume that the following condition holds.
Condition A. There exist $k\in\mathbb{N}\setminus\{1\}$, $\alpha_{i,n}\in\mathbb{R}$, $i=1,\dots,k$, $\beta_n>0$, $C_k>0$, a differentiable distribution function $G(x)$ and measurable functions $g_j(x)$, $j=1,\dots,k$, such that

Lemma 7. Let the function $L_n=L_n(X_1,\dots,X_n)$ satisfy Condition A. Then

The elementary proof of this lemma follows directly from the formula of total probability. Consider an example of the application of Lemma 7. Let $X_1,X_2,\dots$ be independent identically distributed r.v.s satisfying conditions (9) and (10). Assume that the function $L_n$ is the normalized arithmetic mean (or, which is the same, the normalized sum) $L_n=\sqrt n\,T_n$ with $T_n$ defined in (7). Then, in accordance with what has been said in Section 2, relation (11) holds, which implies the validity of Condition A. From (11), playing the role of Condition A, and Lemma 7 we obtain the following statement.
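The formula of total probability behind Lemma 7 can be illustrated in the simplest setting. The Python sketch below mixes the truncated expansion (11) with $k=4$ over a hypothetical three-point law of $N_n$ with values $n-h$, $n$, $n+h$ and probabilities $p$, $1-2p$, $p$ (so that $EN_n=n$), in the spirit of the three-point portfolio-size example of Section 4.2, and compares the resulting $(1-\alpha)$-quantile of $\sqrt{N_n}\,T_{N_n}$ with that of $\sqrt n\,T_n$. It assumes standardized summands and that $N_n$ is independent of $X_1,X_2,\dots$; these are assumptions of the sketch, the parameters $n$, $h$, $p$ and the helper names are hypothetical, and Condition A itself is not reproduced.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def edgeworth_cdf(x, k, m3, m4):
    """Truncated expansion (11)-(12), k = 4, for P(sqrt(k) T_k < x)."""
    q1 = m3 / 6 * (1 - x**2) * PHI.pdf(x)
    q2 = -((m4 - 3) / 24 * (x**3 - 3 * x)
           + m3**2 / 72 * (x**5 - 10 * x**3 + 15 * x)) * PHI.pdf(x)
    return PHI.cdf(x) + q1 / math.sqrt(k) + q2 / k

def three_point_law(n, h, p):
    """HYPOTHETICAL law of N_n: n - h, n, n + h with probabilities p, 1 - 2p, p."""
    return {n - h: p, n: 1 - 2 * p, n + h: p}

def random_size_cdf(x, law, m3, m4):
    """Total probability: sum_k P(N_n = k) P(sqrt(k) T_k < x).
    ASSUMES N_n is independent of X_1, X_2, ..."""
    return sum(prob * edgeworth_cdf(x, k, m3, m4) for k, prob in law.items())

def quantile(cdf, alpha, lo=-6.0, hi=6.0, iters=100):
    """(1 - alpha)-quantile of an increasing cdf by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, h, p, alpha, m3, m4 = 100, 30, 0.25, 0.05, 2.0, 9.0
law = three_point_law(n, h, p)
c_fixed = quantile(lambda x: edgeworth_cdf(x, n, m3, m4), alpha)
c_random = quantile(lambda x: random_size_cdf(x, law, m3, m4), alpha)
print(c_fixed, c_random)
```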

Lemma 8.
Assume that $L_n=\sqrt n\,T_n$ with $T_n$ defined in (7) and that conditions (9) and (10) hold. Then

where the functions $Q_i(x)$ are defined in Theorem 6.3.2 of [10].
Relation (11) and Lemma 8 imply the following statement.
Now define the sequence $m(n)$, $n=1,2,\dots$, of natural numbers by relation (28). If $m(n)-n=d+o(1)$ as $n\to\infty$, then $d$ can be interpreted as the expected number of additional summands that must be included in the sum in order that the function $L_{N_n}$ exceed the level $c_\alpha(n)$ corresponding to the loss under a non-random number $n$ of summands. The quantity $d$ will be called the asymptotic deficiency.
In the same way as Theorem 1, we can establish the following statement.

Theorem 4. Assume that
and there exist $\delta>0$, a differentiable distribution function $G(x)$ and measurable functions $g_1(x)$ and $g_2(x)$ such that

and $G'(c_\alpha)c_\alpha\neq0$. Then the expected number $d$ of additional summands (see (28) and (29)) in the normalized random sum $L_{N_n}$ with respect to the normalized sum $L_n$ has the form

where $c_\alpha$ satisfies the equation $G(c_\alpha)=1-\alpha$.
Theorem 4 implies the following statement: the expected number $d$ of additional summands (see (28) and (29)) corresponding to the normalized sum $\sqrt{N_n}\,T_{N_n}$ with a random number of summands with respect to the normalized sum $\sqrt n\,T_n$ has the form

Lemma 12. Assume that conditions (9) and (10) hold with $k=4$ and $0<\delta\le1$. Let conditions (30) and (31) hold. Then

Corollary 5. Let the conditions of Lemma 12 hold and $h_n=n^{3/4}$. Then

Relations (12) and Lemmas 10 and 11 yield the following theorem.
Theorem 6. Let the conditions of Corollary 5 hold. Then the asymptotic α-reserves $c_\alpha(n)$ and $\tilde c_\alpha(n)$ related to the normalized average losses $\sqrt n\,T_n$ and $\sqrt{N_n}\,T_{N_n}$ have the form

where $u_\alpha$ satisfies the equation $\Phi(u_\alpha)=1-\alpha$. The corresponding expected additional number $d$ of contracts has the form
$$d=\frac{Q_1(u_\alpha)}{2\varphi(u_\alpha)u_\alpha}+o(1)=\frac{(1-u_\alpha^2)\,EX_1^3}{12\,u_\alpha}+o(1).$$
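The two expressions for $d$ in Theorem 6 can be checked against each other numerically. The Python sketch below assumes standardized summands ($EX_1=0$, $EX_1^2=1$), so that $Q_1(x)=\frac{EX_1^3}{6}(1-x^2)\varphi(x)$; the function names are introduced here for illustration only, and the $o(1)$ term is ignored.

```python
from statistics import NormalDist

PHI = NormalDist()   # standard normal law: cdf, pdf, inv_cdf

def random_size_deficiency(alpha, m3):
    """Leading term of d in Theorem 6: (1 - u^2) E X_1^3 / (12 u), Phi(u) = 1 - alpha."""
    u = PHI.inv_cdf(1 - alpha)
    return (1 - u**2) * m3 / (12 * u)

def via_q1(alpha, m3):
    """The same quantity computed as Q_1(u) / (2 phi(u) u)."""
    u = PHI.inv_cdf(1 - alpha)
    q1 = m3 / 6 * (1 - u**2) * PHI.pdf(u)
    return q1 / (2 * PHI.pdf(u) * u)

# Centered exponential losses with unit variance have E X_1^3 = 2
for alpha in (0.1, 0.05, 0.01):
    print(alpha, random_size_deficiency(alpha, 2.0), via_q1(alpha, 2.0))
```

For positively skewed losses ($EX_1^3>0$) and $\alpha<1-\Phi(1)\approx0.159$, the leading term of $d$ is negative; for symmetric losses it vanishes, so that $d=o(1)$.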

Conclusions
The paper deals with an approach to the comparison of the distributions of sums of a finite number of independent random variables by deficiency. The notion of the asymptotic deficiency of the distribution of a measurable $\mathbb{R}$-valued function of a random vector with respect to the distribution of the same function of another random vector was introduced. Some formulas for the calculation of the asymptotic deficiency were presented for the case where the function has the form of a normalized sum of independent identically distributed r.v.s. These formulas were obtained as solutions of two problems. The first deals with the description of the distribution of a separate summand that minimizes the number of summands while providing a prescribed value of the $(1-\alpha)$-quantile of the normalized sum for a given $\alpha\in(0,1)$. The second deals with minimizing the number of summands while guaranteeing a prescribed probability for a normalized sum of r.v.s to fall into a given interval. These results were extended to the case of a random number of summands in the sum (or random portfolio size, in terms of the example dealing with an insurance company). For this case, an analog of the deficiency of the sum of a random number of summands with respect to the distribution of the sum of a non-random number of summands was introduced. The problem of comparing these distributions by this analog of deficiency was considered in the special case of a three-point distribution of the portfolio size. The main mathematical tools used in the paper were asymptotic expansions for the distributions of average losses and their quantiles.