
Stats 2019, 2(1), 121-147; https://doi.org/10.3390/stats2010010

Article
Saddlepoint Approximation for Data in Simplices: A Review with New Applications
Institute of Mathematical Statistics and Actuarial Science, University of Bern, 3012 Bern, Switzerland
Received: 23 January 2019 / Accepted: 14 February 2019 / Published: 18 February 2019

Abstract

This article provides a review of the saddlepoint approximation for an M-statistic of a sample of nonnegative random variables with fixed sum. The sample vector follows the multinomial, the multivariate hypergeometric, the multivariate Polya or the Dirichlet distribution. The main objective is to provide a complete presentation, in a single and unambiguous notation, of the mathematical framework common to these four situations: the simplex sample space and the underlying general urn model. Some important applications are reviewed, and special attention is given to recent applications to models of circular data. Some novel applications are developed and studied numerically.
Keywords:
bootstrap; circular data; Dirichlet distribution; entropy; likelihood ratio test; multinomial distribution; multivariate hypergeometric distribution; multivariate Polya distribution; spacings; spacing-frequencies; urn model
MSC:
41A60; 60C05

1. Introduction

The topic of this article is a saddlepoint approximation to the distribution of the M-statistic $T_n$, precisely $T_n(Y_1,\ldots,Y_n)$, which is the implicit solution with respect to (w.r.t.) $t$ of
$\sum_{j=1}^{n} \xi_j(Y_j; t) = 0,$ (1)
where the function $\xi_j: \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ is continuous (thus measurable) and decreasing in its second argument, for $j = 1,\ldots,n$, $\mathbb{R}_+ = [0,\infty)$, and where the random variables $Y_1,\ldots,Y_n$ are nonnegative, dependent and satisfy $\sum_{j=1}^{n} Y_j = k$, for some fixed $k > 0$. Decreasing is meant in the strict sense. The sample vector $(Y_1,\ldots,Y_n)$ takes values in a simplex. It is often referred to as compositional data, by reference to the situation where $Y_j$ represents the number of units of the $j$th category, for $j = 1,\ldots,n$, given $n$ possible categories (see e.g., ). When $(Y_1,\ldots,Y_n)$ follows the multinomial distribution, it is also referred to as categorical data. We consider three discrete joint distributions and one continuous joint distribution for $(Y_1,\ldots,Y_n)$ and relate these multivariate distributions to three general urn sampling schemes that are given, e.g., in .
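Because every $\xi_j$ is strictly decreasing in $t$, the left-hand side of Equation (1) is strictly decreasing in $t$ and has at most one root, which can be bracketed and located by bisection. The following Python sketch assumes that the caller supplies a bracket with a sign change; the function name `m_statistic` and its signature are illustrative and not taken from the article.

```python
from typing import Callable, Sequence


def m_statistic(y: Sequence[float],
                xi: Callable[[int, float, float], float],
                lo: float, hi: float, tol: float = 1e-12) -> float:
    """Solve sum_j xi(j, y_j, t) = 0 for t by bisection.

    Assumes every xi(j, y, t) is continuous and strictly decreasing in t,
    so the sum is strictly decreasing; [lo, hi] must bracket the root,
    i.e. the sum is >= 0 at lo and <= 0 at hi.
    """
    def total(t: float) -> float:
        return sum(xi(j, yj, t) for j, yj in enumerate(y))

    if total(lo) < 0.0 or total(hi) > 0.0:
        raise ValueError("[lo, hi] does not bracket the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) > 0.0:   # sum still positive: the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# xi_j(y; t) = y - t gives the sample mean as the M-statistic
print(m_statistic([1.0, 2.0, 3.0, 6.0], lambda j, yj, t: yj - t, 0.0, 12.0))
```

Bisection is chosen here only for robustness; any monotone root finder would do, since monotonicity is the sole property of $\xi_j$ being exploited.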
The derivation of the saddlepoint approximation to the distribution of $T_n$ relies on the distributional equivalence
$(Y_1,\ldots,Y_n) \sim (X_1,\ldots,X_n) \,\Big|\, \sum_{j=1}^{n} X_j = k,$ (2)
which means that $(Y_1,\ldots,Y_n)$ has the conditional distribution of $(X_1,\ldots,X_n)$ given $\sum_{j=1}^{n} X_j = k$. The nonnegative random variables $X_1,\ldots,X_n$ form a conditional triangular array in the sense that they are independent, their individual distributions may depend on $n$, and they are considered conditionally on their sum. We refer to Equation (2) as the conditional representation of $(Y_1,\ldots,Y_n)$ in terms of $(X_1,\ldots,X_n)$. The computation of the distribution of $T_n$, as a function of the dependent random variables $Y_1,\ldots,Y_n$, is generally difficult. It is, however, simplified by replacing these dependent random variables by the independent triangular array random variables $X_1,\ldots,X_n$, in the same order, conditioned on their sum. Gatto and Jammalamadaka  extended the saddlepoint approximation for tail probabilities of Skovgaard  to M-statistics and used the conditional representation in Equation (2) to derive saddlepoint approximations for important classes of nonparametric tests, such as tests based on spacings, two-sample tests based on spacing-frequencies and various tests based on ranks. The application of this conditional saddlepoint approximation to the computation of quantiles can be found in . Further applications can be found in [6,7].
This article presents the conditional saddlepoint approximation from the general perspective of the urn sampling model. Four cases of the conditional representation in Equation (2) are related to the urn model: the joint multinomial in terms of Poisson random variables conditional on their sum (M-P), the joint multivariate hypergeometric in terms of binomial random variables conditional on their sum (MH-B), the joint multivariate Polya in terms of negative binomial random variables conditional on their sum (MP-NB) and the joint Dirichlet in terms of gamma random variables conditional on their sum (D-G). New applications and examples are given and tested numerically. Various previous applications of the conditional saddlepoint approximation are reviewed. Two other general references on conditional saddlepoint approximations are found in  (Chapter 4 and Section 12.5) and . This article complements these references in various ways. It provides a concise and complete presentation of the conditional saddlepoint approximation for M-statistics, including an approximation to quantiles. It updates the previous reviews with important recent examples. It gives a general reformulation with a consistent and homogeneous notation that corresponds to a single underlying mathematical model, viz., the urn model and the simplex sample space. It includes new important examples and new numerical comparisons. The numerical illustrations are given for: the distribution of an estimator of the entropy that relates to the urn model, the power of the likelihood ratio test, the distribution of the insurer’s total claim amount and the null distribution of a test for symmetry of Dirichlet’s distribution.
Mirakhmedov et al.  used the three well-known conditional representations M-P, MH-B and MP-NB with the Edgeworth approximation. The Edgeworth approximation is, however, not a large deviations approximation, and Edgeworth approximations to small tail probabilities are usually less accurate than saddlepoint approximations. Butler and Sutton  proposed a particular saddlepoint approximation that exploits the conditional representation in Equation (2). This representation implies that, for all intervals $I_1,\ldots,I_n \subset \mathbb{R}_+$,
$P[Y_1 \in I_1,\ldots,Y_n \in I_n] = \dfrac{P\big[\sum_{j=1}^{n} X_j = k \,\big|\, X_1 \in I_1,\ldots,X_n \in I_n\big]\, P[X_1 \in I_1,\ldots,X_n \in I_n]}{P\big[\sum_{j=1}^{n} X_j = k\big]}.$
Then, the conditional probability on the right-hand side is approximated by a saddlepoint approximation for independent and truncated random variables. This method allows approximating, for example, the distribution of $M_n = \max_{j=1,\ldots,n} Y_j$, but it does not allow approximating the distribution of the M-statistic in Equation (1). Note that, for the case where $(Y_1,\ldots,Y_n)$ follows the multinomial distribution given in Equation (3), Good  proposed a specific saddlepoint approximation for $M_n$.
This article has the following structure. Section 2 presents the four conditional representations in Section 2.1 and Section 2.3, and relates them to urn sampling schemes in Section 2.2. The first three conditional representations, namely M-P, MH-B and MP-NB, are for counting random variables. The fourth conditional representation, D-G, holds for positive random variables. Section 3 summarizes the conditional saddlepoint approximation for an M-statistic given another one: Section 3.1 and Section 3.2 are for tail probabilities and Section 3.3 is for quantiles. Then, Section 4 provides new applications and numerical studies for this saddlepoint approximation and briefly reviews other important existing applications. Some final remarks are given in Section 5.
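Regarding notation, we define $\mathbb{N} = \{0,1,\ldots\}$, $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$, $\mathbb{R}_+ = [0,\infty)$ as already defined and $\mathbb{R}_+^* = \mathbb{R}_+ \setminus \{0\}$. The Pochhammer symbol is defined by
$(x)_k = x (x-1) \cdots (x-k+1), \quad \forall x \in \mathbb{R},\ k \in \mathbb{N}^*.$
The binomial coefficient is defined by
$\binom{x}{k} = \begin{cases} 0, & \text{if } k = -1,-2,\ldots, \\ 1, & \text{if } k = 0, \\ \dfrac{(x)_k}{k!}, & \text{if } k = 1,2,\ldots, \end{cases} \quad \forall x \in \mathbb{R}.$
The indicator function of the statement $A$ is defined by
$\mathbb{I}\{A\} = \begin{cases} 0, & \text{if } A \text{ is false}, \\ 1, & \text{if } A \text{ is true}. \end{cases}$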
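The binomial coefficient defined above for real $x$ is what gives a meaning to the negative binomial probabilities of Section 2.1 when $m_j$ is not an integer. A direct Python transcription of the two definitions might look as follows; the function names are ours, and the last lines evaluate one negative binomial probability with arbitrary parameter values.

```python
import math


def pochhammer(x: float, k: int) -> float:
    """(x)_k = x (x - 1) ... (x - k + 1); the empty product (k = 0) is 1."""
    out = 1.0
    for i in range(k):
        out *= x - i
    return out


def binom(x: float, k: int) -> float:
    """Binomial coefficient for real x and integer k, following the definition above."""
    if k < 0:
        return 0.0
    if k == 0:
        return 1.0
    return pochhammer(x, k) / math.factorial(k)


# negative binomial probability P[X = l] = C(l + m - 1, l) q^m (1 - q)^l, non-integer m
m, q, l = 2.5, 0.4, 3
print(binom(l + m - 1, l) * q**m * (1 - q)**l)
```

For large $k$ the product form overflows in floating point; log-gamma functions are then preferable, as in the later sketches.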
Let $n \in \{2,3,\ldots\}$. An $(n-1)$-simplex is the $(n-1)$-dimensional polytope given by the convex hull of its $n$ vertices. We consider only the symmetric simplex. It is obtained by defining the $j$th vertex $v_j = (v_{j,0},\ldots,v_{j,n-1})$ by
$v_{j,i} = \begin{cases} x, & \text{if } i = j, \\ 0, & \text{otherwise}, \end{cases} \quad \text{for } i = 0,\ldots,n-1,$
for any desired size $x \in \mathbb{R}_+^*$ and for $j = 0,\ldots,n-1$. This representation corresponds to the set
$\Delta_x^{n-1} = \{(x_1,\ldots,x_n) \in \mathbb{R}_+^n \mid x_1 + \cdots + x_n = x\}.$
We also define
$\ddot{\Delta}_k^{n-1} = \Delta_k^{n-1} \cap \mathbb{N}^n = \{(k_1,\ldots,k_n) \in \mathbb{N}^n \mid k_1 + \cdots + k_n = k\},$
the integer $(n-1)$-simplex of size $k \in \mathbb{N}^*$.
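For small $k$ and $n$, the integer simplex can be enumerated directly, which is convenient for checking the discrete distributions of Section 2 by brute force. The sketch below uses the classical stars-and-bars bijection, which also shows that $\ddot{\Delta}_k^{n-1}$ contains $\binom{k+n-1}{n-1}$ points; the function name is ours.

```python
from itertools import combinations
from math import comb
from typing import Iterator, Tuple


def integer_simplex(k: int, n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every (k_1, ..., k_n) in N^n with k_1 + ... + k_n = k.

    Stars and bars: place n - 1 bars among k + n - 1 slots; the numbers of
    free slots between consecutive bars are the coordinates.
    """
    slots = k + n - 1
    for bars in combinations(range(slots), n - 1):
        point, prev = [], -1
        for b in bars:
            point.append(b - prev - 1)
            prev = b
        point.append(slots - prev - 1)
        yield tuple(point)


pts = list(integer_simplex(4, 3))
print(len(pts), comb(4 + 3 - 1, 3 - 1))   # both equal 15
```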
We denote by $X \sim Y$ the fact that the two random elements $X$ and $Y$ have the same distribution. The same symbol is used for asymptotic equivalence.

2. Four Conditional Representations and Their Urn Sampling Interpretations

This section reviews four multivariate distributions for which the conditional representation in Equation (2) holds and relates them to a common urn model. Although these results are classical and can perhaps be found separately in the literature, the contribution of this section lies in a single and unambiguous mathematical reformulation of the multivariate distributions, of their conditional representations and of their urn model. The same notation is used for the saddlepoint approximation in Section 3 and for the examples in Section 4. The first three models are presented in Section 2.1 and are related to the three urn sampling schemes in Section 2.2. In these three models, $(Y_1,\ldots,Y_n)$ takes values in $\ddot{\Delta}_k^{n-1}$, for $k \in \mathbb{N}^*$. Section 2.3 presents a fourth multivariate model where $(Y_1,\ldots,Y_n)$ takes values in $\Delta_k^{n-1}$, for $k \in \mathbb{R}_+^*$, and for which an asymptotic relation with one of the urn sampling models holds.

2.1. Three Conditional Representations for Counting Random Variables

The next three multivariate distributions allow for the conditional representation in Equation (2) and relate to the three urn sampling schemes of Section 2.2.
• Multinomial—conditional Poisson (M-P)
Let $X_j \sim$ Poisson$(q p_j)$, i.e., Poisson distributed with parameter $q p_j$, for $j = 1,\ldots,n$, be independent, where $(p_1,\ldots,p_n) \in \Delta_1^{n-1}$ and $q \in \mathbb{R}_+^*$. Then, the conditional representation in Equation (2) holds with $(Y_1,\ldots,Y_n) \sim \text{Multinomial}(k; p_1,\ldots,p_n)$, for $k \in \mathbb{N}^*$, that is, with
$P[Y_1 = k_1,\ldots,Y_n = k_n] = \dfrac{k!}{k_1! \cdots k_n!}\, p_1^{k_1} \cdots p_n^{k_n},$ (3)
$\forall (k_1,\ldots,k_n) \in \ddot{\Delta}_k^{n-1}$, which is the multinomial distribution. Thus, $k = \sum_{j=1}^{n} k_j$.
• Multivariate hypergeometric—conditional binomial (MH-B)
Let $X_j \sim$ Binomial$(m_j, q)$, i.e., binomial distributed with $m_j$ trials and elementary probability $q$, for $j = 1,\ldots,n$, be independent, where $(m_1,\ldots,m_n) \in \ddot{\Delta}_z^{n-1}$, $z = \sum_{j=1}^{n} m_j$ and $q \in (0,1)$. Then, the conditional representation in Equation (2) holds with $(Y_1,\ldots,Y_n) \sim$ Multi-Hypergeometric$(k; m_1,\ldots,m_n)$, for $k \in \mathbb{N}^*$, that is, with
$P[Y_1 = k_1,\ldots,Y_n = k_n] = \dfrac{\prod_{j=1}^{n} \binom{m_j}{k_j}}{\binom{z}{k}},$ (4)
for $k_j = 0,\ldots,m_j$, for $j = 1,\ldots,n$, and $k = \sum_{j=1}^{n} k_j \le z$, which is the multivariate hypergeometric distribution. Thus, $(k_1,\ldots,k_n) \in \ddot{\Delta}_k^{n-1} \cap ([0,m_1] \times \cdots \times [0,m_n])$.
• Multivariate Polya—conditional negative binomial (MP-NB)
Let $X_j \sim$ Negative-Binomial$(m_j, q)$, i.e.,
$P[X_j = l] = \binom{l + m_j - 1}{l} q^{m_j} (1-q)^l, \quad \text{for } l = 0,1,\ldots,$
for $j = 1,\ldots,n$, be independent, where $(m_1,\ldots,m_n) \in \Delta_u^{n-1}$, for some $u \in \mathbb{R}_+^*$, and $q \in (0,1)$. Thus, $u = \sum_{j=1}^{n} m_j$. Then, the conditional representation in Equation (2) holds with $(Y_1,\ldots,Y_n) \sim$ Multi-Polya$(k; m_1,\ldots,m_n)$, for $k \in \mathbb{N}^*$, that is, with
$P[Y_1 = k_1,\ldots,Y_n = k_n] = \dfrac{\prod_{j=1}^{n} \binom{m_j + k_j - 1}{k_j}}{\binom{u + k - 1}{k}},$ (5)
$\forall (k_1,\ldots,k_n) \in \ddot{\Delta}_k^{n-1}$, which is the multivariate Polya distribution. Thus, $k = \sum_{j=1}^{n} k_j$.
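The three representations can be checked numerically for a single configuration: divide the product of the independent marginal probabilities by the probability of their sum, which is Poisson$(q)$, Binomial$(z, q)$ and Negative-Binomial$(u, q)$, respectively, by closure under convolution, and compare the result with Equations (3) to (5). The Python sketch below does this with log-gamma arithmetic, so that non-integer $m_j$ are allowed in the MP-NB case; the configuration and parameter values are arbitrary choices of ours.

```python
import math


def pois(x, lam):
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def binom_pmf(x, m, q):
    return math.comb(m, x) * q**x * (1 - q)**(m - x) if 0 <= x <= m else 0.0


def negbin_pmf(x, m, q):
    # P[X = x] = C(x + m - 1, x) q^m (1 - q)^x, valid for real m > 0
    return math.exp(math.lgamma(x + m) - math.lgamma(m) - math.lgamma(x + 1)
                    + m * math.log(q) + x * math.log(1 - q))


def conditional_pmf(counts, marginals, total_pmf):
    """P[X_1 = k_1, ..., X_n = k_n | X_1 + ... + X_n = k] for independent X_j."""
    joint = math.prod(f(c) for f, c in zip(marginals, counts))
    return joint / total_pmf(sum(counts))


def multinomial(counts, p):                      # Equation (3)
    coef = math.factorial(sum(counts)) / math.prod(math.factorial(c) for c in counts)
    return coef * math.prod(pj**c for pj, c in zip(p, counts))


def multi_hypergeometric(counts, m):             # Equation (4)
    num = math.prod(math.comb(mj, c) for mj, c in zip(m, counts))
    return num / math.comb(sum(m), sum(counts))


def multi_polya(counts, m):                      # Equation (5)
    def log_c(a, b):                             # log C(a + b - 1, b), real a > 0
        return math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b + 1)
    logp = sum(log_c(mj, c) for mj, c in zip(m, counts)) - log_c(sum(m), sum(counts))
    return math.exp(logp)


counts, q = (2, 0, 3), 0.7
p, m_int, m_real = (0.2, 0.5, 0.3), (3, 2, 4), (0.5, 1.5, 2.0)
print(conditional_pmf(counts, [lambda x, a=q * pj: pois(x, a) for pj in p],
                      lambda x: pois(x, q)), multinomial(counts, p))
print(conditional_pmf(counts, [lambda x, a=mj: binom_pmf(x, a, q) for mj in m_int],
                      lambda x: binom_pmf(x, sum(m_int), q)), multi_hypergeometric(counts, m_int))
print(conditional_pmf(counts, [lambda x, a=mj: negbin_pmf(x, a, q) for mj in m_real],
                      lambda x: negbin_pmf(x, sum(m_real), q)), multi_polya(counts, m_real))
```

Changing `q` leaves each pair of printed values equal, which illustrates the remark below that the representations do not depend on $q$.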
We end this section with three remarks of general interest. First, in these three situations the conditional representation in Equation (2) holds regardless of the choice of $q$, in $\mathbb{R}_+^*$ for the M-P representation and in $(0,1)$ for the MH-B and MP-NB representations. This independence follows from the fact that, in all three cases, $\sum_{j=1}^{n} X_j$ is a sufficient statistic for $q$, which is a consequence of the factorization theorem of sufficient statistics.
Second, each of the three conditional representations has an interpretation in terms of mixture models. For example, consider the independent random variables $X_j \sim$ Poisson$(q p_j)$, for $j = 1,\ldots,n$, where $(p_1,\ldots,p_n) \in \Delta_1^{n-1}$ and $q \in \mathbb{R}_+^*$. Then, $\forall k_1,\ldots,k_n \in \mathbb{N}$, for $k = \sum_{j=1}^{n} k_j$ and $K = \sum_{j=1}^{n} X_j$,
$P[X_1 = k_1,\ldots,X_{n-1} = k_{n-1}] = \sum_{k_n=0}^{\infty} P[X_1 = k_1,\ldots,X_{n-1} = k_{n-1}, X_n = k_n] = \sum_{k_n=0}^{\infty} \dfrac{k!}{k_1! \cdots k_{n-1}!\, k_n!}\, p_1^{k_1} \cdots p_{n-1}^{k_{n-1}} p_n^{k_n}\, \dfrac{e^{-q} q^k}{k!}.$ (6)
Thus, $(X_1,\ldots,X_{n-1})$ follows the countable mixture distribution given by multinomial probabilities with Poisson mixing probabilities. Moreover,
$\sum_{k_n=0}^{\infty} P[X_1 = k_1,\ldots,X_{n-1} = k_{n-1}, X_n = k_n] = \sum_{k_n=0}^{\infty} P[X_1 = k_1,\ldots,X_{n-1} = k_{n-1}, X_n = k_n \mid K = k]\, P[K = k] = \sum_{k_n=0}^{\infty} P[X_1 = k_1,\ldots,X_{n-1} = k_{n-1} \mid K = k]\, P[K = k].$ (7)
Since $K \sim$ Poisson$(q)$, the factor $e^{-q} q^k / k!$ in Equation (6) equals $P[K = k]$. By equating, for each summand, the multinomial and the Poisson probabilities of Equation (6) to the two probabilities of Equation (7), we obtain the M-P conditional representation.
Third, the three distributions of $X_1,\ldots,X_n$ (before conditioning) correspond to the three distributions of the $(a,b,0)$ class. A probability distribution $\{p_l\}_{l \ge 0}$ belongs to the $(a,b,0)$ class if it satisfies the recurrence relation $p_l = (a + b/l)\, p_{l-1}$, for $l = 1,2,\ldots$ and for some $a, b \in \mathbb{R}$ (see, e.g., Section 6.5 of ).
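The $(a,b,0)$ recurrence can be verified numerically for the three distributions used before conditioning. The $(a,b)$ pairs in the sketch are standard values that we add for illustration (each follows from forming the ratio $p_l/p_{l-1}$): $(0, \lambda)$ for Poisson$(\lambda)$, $(-q/(1-q), (m+1)q/(1-q))$ for Binomial$(m, q)$, and $(1-q, (m-1)(1-q))$ for the negative binomial parametrized as in Section 2.1.

```python
import math


def max_ab0_gap(pmf, a, b, upto):
    """Largest |p_l / p_{l-1} - (a + b / l)| over l = 1, ..., upto."""
    return max(abs(pmf(l) / pmf(l - 1) - (a + b / l)) for l in range(1, upto + 1))


lam, m_bin, m_nb, q = 2.5, 7, 2.5, 0.3

cases = {
    # name: (pmf, a, b, largest l checked); the binomial stops at m so p_{l-1} > 0
    "Poisson": (lambda l: math.exp(l * math.log(lam) - lam - math.lgamma(l + 1)),
                0.0, lam, 30),
    "Binomial": (lambda l: math.comb(m_bin, l) * q**l * (1 - q)**(m_bin - l),
                 -q / (1 - q), (m_bin + 1) * q / (1 - q), m_bin),
    "NegBinomial": (lambda l: math.exp(math.lgamma(l + m_nb) - math.lgamma(m_nb)
                                       - math.lgamma(l + 1)
                                       + m_nb * math.log(q) + l * math.log(1 - q)),
                    1 - q, (m_nb - 1) * (1 - q), 30),
}
for name, (pmf, a, b, upto) in cases.items():
    print(name, max_ab0_gap(pmf, a, b, upto))
```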

2.2. Three Associated Urn Sampling Schemes

The three multivariate distributions presented in the previous section provide probability models for three sampling schemes: sampling with replacement, sampling without replacement and Polya’s sampling. These three sampling schemes are unified in a single general urn sampling model by Ivchenko and Ivanov  (see also ). Consider an urn containing balls of $n$ different colors $C_1,\ldots,C_n$. At the beginning, the urn contains $a_{j,0} \in \mathbb{N}$ balls of color $C_j$, for $j = 1,\ldots,n$. Each single ball is drawn equiprobably from the urn. Immediately after the $l$th draw of a ball of color $C_j$, $a_{j,l-1} \in \mathbb{N}^*$ is updated to $a_{j,l} \in \mathbb{N}$; this holds for $l = 1,2,\ldots$ and $j = 1,\ldots,n$. Three updating mechanisms are presented below. Thus, immediately after drawing $k_j$ balls of color $C_j$, for $j = 1,\ldots,n$, and therefore just after a total of $k = \sum_{j=1}^{n} k_j$ draws, the urn contains $a_{j,k_j}$ balls of color $C_j$, for $j = 1,\ldots,n$. The updated sampling probability of color $C_j$ is thus
$p_j(k_1,\ldots,k_n) = \dfrac{a_{j,k_j}}{\sum_{i=1}^{n} a_{i,k_i}}, \quad \text{for } j = 1,\ldots,n,$
provided $\sum_{i=1}^{n} a_{i,k_i} > 0$. The random count $Y_j$ represents the number of randomly drawn balls of color $C_j$, for $j = 1,\ldots,n$, after a fixed total number of draws $k = \sum_{j=1}^{n} Y_j \in \mathbb{N}^*$. Define by $z = \sum_{j=1}^{n} a_{j,0}$ the initial total number of balls in the urn.
We are interested in the distribution of the M-statistic $T_n$, viz. $T_n(Y_1,\ldots,Y_n)$, defined in Equation (1), under the following three sampling schemes.
• Sampling with replacement and M-P representation
Sampling with replacement from the urn is obtained by setting
$a_{j,l} = a_{j,0}, \quad \text{for } l = 1,2,\ldots \text{ and } j = 1,\ldots,n.$
Thus, $p_j(k_1,\ldots,k_n)$, for $j = 1,\ldots,n$, does not depend on $k_1,\ldots,k_n$, and the multinomial distribution in Equation (3) holds with rational $p_j = p_j(k_1,\ldots,k_n) = a_{j,0}/z$, for $j = 1,\ldots,n$. Thus, $(Y_1,\ldots,Y_n)$ takes values in $\ddot{\Delta}_k^{n-1}$ and the M-P representation holds.
• Sampling without replacement and MH-B representation
Sampling without replacement from the urn is obtained by setting
$a_{j,l} = \begin{cases} a_{j,l-1} - 1 = a_{j,0} - l, & \text{if } l \le a_{j,0}, \\ 0, & \text{if } l > a_{j,0}, \end{cases} \quad \text{for } l = 1,2,\ldots \text{ and } j = 1,\ldots,n.$
Assume that $k_j \le a_{j,0}$ balls of color $C_j$ have been drawn, for $j = 1,\ldots,n$. The probability of drawing a ball of color $C_j$ in the next draw is $p_j(k_1,\ldots,k_n) = (a_{j,0} - k_j)/(z - k)$ if $k < z$, and it is undefined if $k = z$, for $j = 1,\ldots,n$. The multivariate hypergeometric distribution in Equation (4) holds with $m_j = a_{j,0}$, for $j = 1,\ldots,n$, and with $z$ equal to the parameter $z$ of the present section. Thus, $(Y_1,\ldots,Y_n)$ takes values in $\ddot{\Delta}_k^{n-1} \cap ([0,a_{1,0}] \times \cdots \times [0,a_{n,0}])$ and the MH-B representation holds.
• Polya’s sampling and MP-NB representation
Polya’s sampling scheme is obtained by setting
$a_{j,l} = a_{j,l-1} + r = a_{j,0} + l r, \quad \text{for } l = 1,2,\ldots \text{ and } j = 1,\ldots,n,$ (8)
where $r \in \mathbb{N}^*$. (Allowing $r = 0$ would give sampling with replacement and allowing $r = -1$ would give sampling without replacement, both already presented.) Assume that $k_j$ balls of color $C_j$ have been drawn, for $j = 1,\ldots,n$. The probability of drawing a ball of color $C_j$ in the next draw is $p_j(k_1,\ldots,k_n) = (a_{j,0} + k_j r)/(z + k r)$, for $j = 1,\ldots,n$. In this case, the multivariate Polya distribution in Equation (5) holds with rational $m_j = a_{j,0}/r$, for $j = 1,\ldots,n$, and rational $u = z/r$. Thus, $(Y_1,\ldots,Y_n)$ takes values in $\ddot{\Delta}_k^{n-1}$ and the MP-NB representation holds.
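The general urn model can be simulated with a single update parameter: in the Python sketch below, `r = 0`, `r = -1` and `r >= 1` reproduce the three updating rules above. The function name and the use of `random.choices` with the current ball counts as weights are our choices. The usage lines compare simulated Polya frequencies with Equation (5), taking $m_j = a_{j,0}/r$; the urn composition is arbitrary.

```python
import math
import random
from collections import Counter


def draw_urn(a0, k, r, rng):
    """Draw k balls from the urn and return the colour counts (Y_1, ..., Y_n).

    After each draw of colour j, the number of balls of that colour changes by r:
    r = 0 (with replacement), r = -1 (without replacement, needs k <= sum(a0)),
    r >= 1 (Polya's scheme).
    """
    a = list(a0)
    counts = [0] * len(a)
    for _ in range(k):
        # every ball is equally likely, so colour j has probability a_j / sum(a)
        j = rng.choices(range(len(a)), weights=a)[0]
        counts[j] += 1
        a[j] += r
    return tuple(counts)


def multi_polya(counts, m):
    """Equation (5), computed with log-gamma functions."""
    def log_c(x, y):                     # log C(x + y - 1, y)
        return math.lgamma(x + y) - math.lgamma(x) - math.lgamma(y + 1)
    return math.exp(sum(log_c(mj, c) for mj, c in zip(m, counts))
                    - log_c(sum(m), sum(counts)))


rng = random.Random(1)
a0, r, k, reps = (2, 1, 3), 2, 4, 20000
freq = Counter(draw_urn(a0, k, r, rng) for _ in range(reps))
m = [a / r for a in a0]
for c in sorted(freq):
    print(c, freq[c] / reps, round(multi_polya(c, m), 4))
```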

2.3. A Conditional Representation for Positive Random Variables and Its Urn Sampling Interpretation

This section presents a fourth model that allows for the conditional representation in Equation (2). It is the Dirichlet distribution, and it has a steady-state interpretation in terms of Polya’s urn. Now, the dependent random variables $Y_1,\ldots,Y_n$ take values in $\mathbb{R}_+$ and cannot yet be considered as counts of the urn model of Section 2.2.
• Dirichlet—conditional gamma (D-G)
Let $X_j \sim \text{Gamma}(a_j, q)$, with density $q^{a_j} e^{-q x} x^{a_j - 1}/\Gamma(a_j)$, $\forall x > 0$, for $j = 1,\ldots,n$, be independent, where $a_1,\ldots,a_n, q \in \mathbb{R}_+^*$. Then, the conditional representation in Equation (2) holds with $(Y_1,\ldots,Y_n) \sim k(\bar{Y}_1,\ldots,\bar{Y}_n)$, where $(\bar{Y}_1,\ldots,\bar{Y}_n)$ is Dirichlet distributed with density
$P[\bar{Y}_1 \in (y_1, y_1 + dy_1),\ldots,\bar{Y}_n \in (y_n, y_n + dy_n)] = \dfrac{\Gamma(a_1 + \cdots + a_n)}{\Gamma(a_1) \cdots \Gamma(a_n)}\, y_1^{a_1 - 1} \cdots y_n^{a_n - 1}\, dy_1 \cdots dy_n,$ (9)
$\forall (y_1,\ldots,y_n) \in \operatorname{int} \Delta_1^{n-1}$ and for $dy_n = -(dy_1 + \cdots + dy_{n-1})$, which is denoted $(\bar{Y}_1,\ldots,\bar{Y}_n) \sim \text{Dirichlet}(a_1,\ldots,a_n)$.
The validity of Equation (2) does not depend on the parameter $q \in \mathbb{R}_+^*$ of the gamma distribution. This independence again follows from the factorization theorem of sufficient statistics.
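For gamma variables with a common rate $q$, the normalized vector $(X_1,\ldots,X_n)/\sum_{j=1}^{n} X_j$ is independent of the sum (a classical property of the gamma family), so conditioning on $\sum_{j=1}^{n} X_j = k$ amounts to rescaling the normalized vector by $k$. The Python sketch below uses this to sample $k(\bar{Y}_1,\ldots,\bar{Y}_n)$ and illustrates that the rate $q$ plays no role. The helper names are ours; note that `random.gammavariate(alpha, beta)` takes a scale parameter, hence `1 / q`.

```python
import random


def scaled_dirichlet(a, k, q, rng):
    """One draw of k * Dirichlet(a) via independent Gamma(a_j, rate q) variables."""
    x = [rng.gammavariate(aj, 1.0 / q) for aj in a]   # gammavariate uses the scale 1/q
    total = sum(x)
    return [k * xj / total for xj in x]


def mean_vector(a, k, q, reps, seed):
    """Monte Carlo mean of k * Dirichlet(a) from reps draws."""
    rng = random.Random(seed)
    sums = [0.0] * len(a)
    for _ in range(reps):
        for j, yj in enumerate(scaled_dirichlet(a, k, q, rng)):
            sums[j] += yj
    return [s / reps for s in sums]


a, k = (1.0, 2.0, 3.0), 2.0
# both should be close to k * a_j / sum(a) = (1/3, 2/3, 1), whatever the rate q
print(mean_vector(a, k, 0.5, 20000, seed=1))
print(mean_vector(a, k, 5.0, 20000, seed=2))
```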
The Dirichlet distribution represents the steady state of Polya’s urn sampling scheme of Section 2.2, viz. the limit of the multivariate Polya distribution in Equation (5).
• Polya’s sampling and D-G representation
Precisely, immediately after a ball of color $C_j$ is drawn, it is replaced together with $r \in \mathbb{N}^*$ new balls of the same color $C_j$, for $j = 1,\ldots,n$, cf. Equation (8). If we let the total number of draws $k$ go to infinity, then the vector of the proportions of the $n$ drawn colors converges in distribution to the Dirichlet$(a_{1,0}/r,\ldots,a_{n,0}/r)$ distribution, viz.
$\dfrac{1}{k}(Y_1,\ldots,Y_n) \xrightarrow{d} (\bar{Y}_1,\ldots,\bar{Y}_n), \quad \text{as } k \to \infty,$ (10)
where $(\bar{Y}_1,\ldots,\bar{Y}_n)$ has the Dirichlet distribution in Equation (9) with $a_j = a_{j,0}/r$, for $j = 1,\ldots,n$. Thus, if $(Y_1,\ldots,Y_n)$ follows the multivariate Polya distribution in Equation (5), taking values in $\ddot{\Delta}_k^{n-1}$, then it is approximately distributed as $k(\bar{Y}_1,\ldots,\bar{Y}_n)$, taking values in $\Delta_k^{n-1}$.
To see Equation (10), let $(k_1,\ldots,k_n) \in \ddot{\Delta}_k^{n-1}$. The multivariate Polya probability in Equation (5) can be re-expressed as
$P[Y_1 = k_1,\ldots,Y_n = k_n] = \dfrac{\Gamma(u)}{\prod_{j=1}^{n} \Gamma(m_j)}\, \dfrac{\Gamma(1 + k)}{\Gamma(u + k)} \prod_{j=1}^{n} \dfrac{\Gamma(m_j + k_j)}{\Gamma(1 + k_j)}.$
It follows from Stirling’s formula that $\Gamma(x + z_1)/\Gamma(x + z_2) \sim x^{z_1 - z_2}$, as $x \to \infty$, $\forall z_1, z_2 \in \mathbb{R}$. Consequently,
$P[Y_1 = k_1,\ldots,Y_n = k_n] \sim c_1(k) \prod_{j=1}^{n} k_j^{m_j - 1} \sim c_2(k) \prod_{j=1}^{n} y_j^{m_j - 1} = c_2(k) \prod_{j=1}^{n} y_j^{a_{j,0}/r - 1}, \quad \text{as } k \to \infty,$
for some positive constants $c_1(k)$ and $c_2(k)$ depending on $k$ and for $y_j = \lim_{k \to \infty} k_j/k$, for $j = 1,\ldots,n$.
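Carrying the same Stirling argument through all Gamma ratios of the display above suggests the explicit choice $c_2(k) = k^{1-n}\,\Gamma(u)/\prod_{j=1}^{n} \Gamma(m_j)$, so that $k^{n-1}$ times the multivariate Polya probability should approach the Dirichlet density of Equation (9) at $y = (y_1,\ldots,y_n)$. The Python sketch below checks this numerically with log-gamma arithmetic; the values of $m_j$ and $y$ are arbitrary, and the helper assumes that the $k y_j$ are integers summing to $k$.

```python
import math


def log_polya(counts, m):
    """log of the multivariate Polya probability, written with Gamma functions."""
    u, k = sum(m), sum(counts)
    out = math.lgamma(u) - math.lgamma(u + k) + math.lgamma(k + 1)
    for mj, kj in zip(m, counts):
        out += math.lgamma(mj + kj) - math.lgamma(mj) - math.lgamma(kj + 1)
    return out


def log_dirichlet(y, a):
    """log of the Dirichlet(a) density at an interior point y of the unit simplex."""
    out = math.lgamma(sum(a)) - sum(math.lgamma(aj) for aj in a)
    return out + sum((aj - 1) * math.log(yj) for aj, yj in zip(a, y))


def scaled_ratio(k, m, y):
    """k^(n-1) P[Y = k y] divided by the Dirichlet density at y."""
    counts = [round(k * yj) for yj in y]   # assumes k * y_j are integers summing to k
    n = len(m)
    return math.exp((n - 1) * math.log(k) + log_polya(counts, m) - log_dirichlet(y, m))


m, y = (0.5, 1.5, 2.0), (0.2, 0.3, 0.5)
for k in (10, 100, 1000, 10000):
    print(k, scaled_ratio(k, m, y))   # approaches 1 as k grows
```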

3. Conditional Saddlepoint Approximation for M-Statistics

The saddlepoint method, viz. the method of steepest descent, allows approximating integrals of the form $\int_{\rho} f(z) e^{\nu g(z)}\, dz$, for large values of $\nu > 0$, where $f: \mathbb{C} \to \mathbb{C}$ and $g: \mathbb{C} \to \mathbb{C}$ are analytic functions in a domain containing the path $\rho$ and its deformations. Let $z_0$ be a point where $g'(z_0) = 0$ and where the real part of $g$ is highest along the deformed path. It is a saddlepoint of the surface given by the real part of $g$. For large values of $\nu$, the value of the integral is accurately approximated as follows. First, deform $\rho$ so that it crosses $z_0$ and so that the real part of $g$ decreases as fast as possible when descending from $z_0$ toward the endpoints of the deformed $\rho$. This is the path of steepest descent. Second, restrict the integration to a small neighborhood of $z_0$ on this path. The final step is the term-by-term integration, within this neighborhood, of an asymptotic expansion of the integrand around $z_0$. Two references are [15,16].
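On the real line, the same idea reduces to Laplace's method. As an illustration we add (it is not taken from the article), write $\Gamma(\nu+1) = \nu^{\nu+1}\int_0^\infty e^{\nu g(w)}\,dw$ with $g(w) = \log w - w$, whose maximum is at $w_0 = 1$. The leading term $\nu^{\nu+1} e^{\nu g(w_0)}\{2\pi/(\nu |g''(w_0)|)\}^{1/2}$ is Stirling's formula, and its relative error decays like $-1/(12\nu)$. The Python sketch works on the log scale to avoid overflow.

```python
import math


def log_laplace_gamma(nu: float) -> float:
    """Log of the leading-order Laplace approximation to Gamma(nu + 1)."""
    w0 = 1.0                       # solves g'(w) = 1/w - 1 = 0
    g0 = math.log(w0) - w0         # g(w0) = -1
    g2 = -1.0 / w0**2              # g''(w0) = -1 < 0, so w0 is a maximum
    return (nu + 1) * math.log(nu) + nu * g0 + 0.5 * math.log(2 * math.pi / (nu * -g2))


for nu in (5, 50, 500):
    rel = math.exp(log_laplace_gamma(nu) - math.lgamma(nu + 1)) - 1
    print(nu, rel, -1 / (12 * nu))   # observed relative error vs. leading error term
```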
This method yields approximations to densities or tail probabilities of various random variables, such as estimators or test statistics. The sample size $n$ takes the role of the asymptotic parameter $\nu$, and the relative error of the saddlepoint approximation vanishes at rate $n^{-1}$, as $n \to \infty$. Unlike normal or Edgeworth approximations, saddlepoint approximations retain this relative accuracy at any fixed point (not depending on $n$) of the support of the distribution; they are thus large deviations techniques. For these two reasons, they provide accurate approximations to small tail probabilities, in fact even for small values of $n$. The saddlepoint approximation was introduced into statistics by Daniels , for approximating density functions. Lugannani and Rice  provided a formula for tail probabilities (see also ).
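To make the tail-probability formula concrete, here is a Python sketch of the Lugannani and Rice approximation for the mean $\bar{X}$ of $n$ i.i.d. Exp(1) variables, a textbook case we choose because the exact tail is an Erlang tail. With c.g.f. $K(t) = -\log(1-t)$, the saddlepoint solves $K'(\hat{t}) = \bar{x}$, i.e., $\hat{t} = 1 - 1/\bar{x}$; then $w = \operatorname{sgn}(\hat{t})[2n\{\hat{t}\bar{x} - K(\hat{t})\}]^{1/2}$, $u = \hat{t}\{n K''(\hat{t})\}^{1/2}$ and $P[\bar{X} \ge \bar{x}] \approx 1 - \Phi(w) + \phi(w)(1/u - 1/w)$, for $\bar{x} \neq 1$ (where $w = 0$).

```python
import math


def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def lr_tail_exp_mean(xbar: float, n: int) -> float:
    """Lugannani-Rice approximation to P[mean of n Exp(1) >= xbar]; needs xbar > 0, xbar != 1."""
    t = 1.0 - 1.0 / xbar                       # saddlepoint: K'(t) = 1/(1 - t) = xbar
    K = -math.log(1.0 - t)
    w = math.copysign(math.sqrt(2.0 * n * (t * xbar - K)), t)
    u = t * math.sqrt(n * xbar**2)             # K''(t) = 1/(1 - t)^2 = xbar^2
    return 1.0 - Phi(w) + phi(w) * (1.0 / u - 1.0 / w)


def exact_tail_exp_mean(xbar: float, n: int) -> float:
    """Exact Erlang tail: P[Gamma(n, 1) >= n xbar] = exp(-n xbar) sum_{i<n} (n xbar)^i / i!."""
    y, term, total = n * xbar, 1.0, 1.0
    for i in range(1, n):
        term *= y / i
        total += term
    return math.exp(-y) * total


for n, xbar in ((5, 2.0), (10, 1.5), (3, 3.0)):
    approx, exact = lr_tail_exp_mean(xbar, n), exact_tail_exp_mean(xbar, n)
    print(n, xbar, approx, exact, abs(approx / exact - 1))
```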
Saddlepoint approximations for conditional distributions were proposed by: Skovgaard  for the distribution of a sample mean given another mean; Wang  for the distribution of a mean given a nonlinear function of another mean; and Jing and Robinson  for the distribution of a nonlinear function of a mean given a nonlinear function of another mean. Kolassa  derived higher-order terms for the conditional saddlepoint approximation of a sample mean given another mean, by using a different expansion of an integral appearing in . DiCiccio  provided a different approximation, which is, however, restricted to the exponential class of distributions.
Some survey articles are [24,25,26,27]. General references are [8,28,29,30].
The saddlepoint approximation to the conditional distribution of Skovgaard  is re-expressed for the M-statistic defined in Equation (1) by . This is summarized in Section 3.1. A modification for the lattice case is given in Section 3.2. A method for computing quantiles is given in Section 3.3.

3.1. Approximation to the Distribution

Consider $n$ absolutely continuous and independent random variables $X_1,\ldots,X_n$ and the M-statistic $(S_{1,n}, S_{2,n})$, viz. $(S_{1,n}(X_1,\ldots,X_n), S_{2,n}(X_1,\ldots,X_n))$, which is the solution w.r.t. $(s_1, s_2)$ of
$\sum_{j=1}^{n} \begin{pmatrix} \psi_{1,j}(X_j; s_1, s_2) \\ \psi_{2,j}(X_j; s_2) \end{pmatrix} = 0,$ (11)
where $\psi_{1,j}: \mathbb{R}^3 \to \mathbb{R}$ is a continuous function that is decreasing in its second argument and $\psi_{2,j}: \mathbb{R}^2 \to \mathbb{R}$ is a continuous function that is decreasing in its second argument, for $j = 1,\ldots,n$. The joint cumulant generating function (c.g.f.) of the summands in Equation (11) is given by
$K_n(v; s) = \sum_{j=1}^{n} \log E\big[\exp\{v_1 \psi_{1,j}(X_j; s_1, s_2) + v_2 \psi_{2,j}(X_j; s_2)\}\big],$ (12)
where $v = (v_1, v_2) \in \mathbb{R}^2$ and $s = (s_1, s_2) \in \mathbb{R}^2$. Define also $K_{2n}(v_2; s_2) = K_n((0, v_2); (0, s_2))$. The first computational step is to find the saddlepoint $\alpha = (\alpha_1, \alpha_2) \in \mathbb{R}^2$, which is the solution w.r.t. $v = (v_1, v_2)$ of
$\dfrac{\partial}{\partial v} K_n(v; s) = 0,$ (13)
and the “marginal saddlepoint” $\beta \in \mathbb{R}$, which is the solution w.r.t. $v_2$ of
$\dfrac{\partial}{\partial v_2} K_{2n}(v_2; s_2) = 0.$ (14)
Next, define
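$K_n''(v; s) = \dfrac{\partial^2}{\partial v\, \partial v^{\mathsf{T}}} K_n(v; s), \qquad K_{2n}''(v_2; s_2) = \dfrac{\partial^2}{\partial v_2^2} K_{2n}(v_2; s_2),$
$\rho(s) = \operatorname{sgn}(\alpha_1) \big\{2[K_{2n}(\beta; s_2) - K_n(\alpha; s)]\big\}^{1/2} \quad \text{and} \quad \sigma(s) = \alpha_1 \left\{\dfrac{\det K_n''(\alpha; s)}{K_{2n}''(\beta; s_2)}\right\}^{1/2}.$ (15)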
With these quantities, we obtain the saddlepoint approximation
$P_n(s_1 \mid s_2) = 1 - \Phi(\rho(s)) + \phi(\rho(s)) \left\{\dfrac{1}{\sigma(s)} - \dfrac{1}{\rho(s)}\right\},$ (16)
where $\phi$ and $\Phi$ are the standard normal density and distribution function, respectively. Then,
$P[S_{1,n} \ge s_1 \mid S_{2,n} = s_2] = P_n(s_1 \mid s_2)\{1 + O(n^{-1})\}, \quad \text{as } n \to \infty.$ (17)
Thus, the saddlepoint approximation in Equation (16) has a vanishing relative error at any value of the argument $s_1$, that is, also over large deviations regions.
By selecting $X_1,\ldots,X_n$ from any one of the four conditional representations, M-P, MH-B or MP-NB of Section 2.1 or D-G of Section 2.3, and by setting $\psi_{1,j}(x; s_1, s_2) = \xi_j(x; s_1)$ and $\psi_{2,j}(x; s_2) = x - s_2$, for $j = 1,\ldots,n$, we obtain
$P[T_n \ge t] = P_n\!\left(t \,\middle|\, \dfrac{k}{n}\right)\{1 + O(n^{-1})\}, \quad \text{as } n \to \infty,$ (18)
for $T_n$ defined in Equation (1).
Precisely, it follows from the conditional representation in Equation (2) that
$T_n(Y_1,\ldots,Y_n) \sim S_{1,n}(X_1,\ldots,X_n) \,\Big|\, S_{2,n}(X_1,\ldots,X_n) = \dfrac{k}{n}.$
This equivalence and Equation (17) give Equation (18).
The argument $s_2$ of $\psi_{1,j}(x; s_1, s_2)$ is not used here, but it is useful in one example in .
As mentioned, the justification of this saddlepoint approximation can be found in , and it would be too long to reproduce here. However, we can give a few general ideas. Let us consider $(U_1, V_1),\ldots,(U_n, V_n)$ independent and identically distributed (i.i.d.), absolutely continuous and with joint c.g.f. $K$. Let $(\bar{U}, \bar{V})$ denote their sample mean. Then, the Fourier inversion and integration of the joint density give
$f_{\bar{U}}(u)\, P[\bar{V} \ge v \mid \bar{U} = u] = \left(\dfrac{n}{2\pi i}\right)^2 \int_{c - i\infty}^{c + i\infty} \int_{-i\infty}^{i\infty} \exp\{n[K(s, t) - s u - t v]\}\, ds\, \dfrac{dt}{n t},$
for $u, v \in \mathbb{R}$ and $c > 0$, where $f_{\bar{U}}$ denotes the density of $\bar{U}$. For the integral w.r.t. $s$, a standard saddlepoint approximation is used. The result is an integral w.r.t. $t$ which, due to the singularity at $t = 0$, must be approximated by a modified saddlepoint approximation similar to the one in . The generalization from the sample mean to the M-statistic in Equation (11) follows directly from
$P[S_{1,n} \ge s_1 \mid S_{2,n} = s_2] = P\left[\sum_{j=1}^{n} \psi_{1,j}(X_j; s_1, s_2) \ge 0 \,\middle|\, \sum_{j=1}^{n} \psi_{2,j}(X_j; s_2) = 0\right],$
for $s_1, s_2 \in \mathbb{R}$, which is due to the fact that $\psi_{1,j}$ and $\psi_{2,j}$ are decreasing in their second argument.
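As a worked instance of Equations (13) to (16) with the D-G representation, the Python sketch below takes $a_1 = \cdots = a_n = 1$ and $q = 1$ (so $X_j \sim$ Gamma$(1,1)$ and $(Y_1,\ldots,Y_n)$ is $k$ times a Dirichlet$(1,\ldots,1)$ vector) and the illustrative statistic $T_n = Y_1 + \cdots + Y_m$, obtained from $\xi_j(y; t) = w_j y - t/n$ with $w_j = \mathbb{I}\{j \le m\}$. This example is ours, not the article's. It is chosen because $K_n(v; s) = -m\log(1 - v_1 - v_2) - (n-m)\log(1 - v_2) - v_1 s_1 - n v_2 s_2$ makes the saddlepoint equations solvable in closed form, and because $T_n/k$ is Beta$(m, n-m)$ distributed, so that the exact tail follows from the identity $P[\text{Beta}(m, n-m) \ge x] = P[\text{Binomial}(n-1, x) \le m-1]$.

```python
import math


def Phi(x): return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
def phi(x): return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def dg_tail(t, n, m, k=1.0):
    """Approximation (16) to P[Y_1 + ... + Y_m >= t], Y = k * Dirichlet(1, ..., 1).

    Illustrative toy model (our choice): X_j ~ Gamma(1, 1),
    psi_1j(x; s1) = w_j x - s1 / n with w_j = 1 for j <= m and 0 otherwise,
    psi_2j(x; s2) = x - s2, conditioning on s2 = k / n.
    Requires 0 < t < k and t != m k / n (where rho = 0).
    """
    s1, s2 = t, k / n
    # saddlepoint alpha solving Equation (13), closed form for this model
    a2 = 1.0 - (n - m) / (n * s2 - s1)
    a1 = (n - m) / (n * s2 - s1) - m / s1
    Kn = (-m * math.log(1.0 - a1 - a2) - (n - m) * math.log(1.0 - a2)
          - a1 * s1 - n * a2 * s2)
    # marginal saddlepoint beta solving Equation (14)
    b = 1.0 - 1.0 / s2
    K2n = -n * math.log(1.0 - b) - n * b * s2
    det = (s1 * (n * s2 - s1)) ** 2 / (m * (n - m))   # det K_n''(alpha; s)
    K2n_dd = n * s2 ** 2                               # K_2n''(beta; s2)
    rho = math.copysign(math.sqrt(2.0 * (K2n - Kn)), a1)
    sigma = a1 * math.sqrt(det / K2n_dd)
    return 1.0 - Phi(rho) + phi(rho) * (1.0 / sigma - 1.0 / rho)


def exact_tail(t, n, m, k=1.0):
    """Exact P[Beta(m, n - m) >= t / k] via the binomial identity (integer m, n)."""
    x = t / k
    return sum(math.comb(n - 1, i) * x**i * (1 - x)**(n - 1 - i) for i in range(m))


for t in (0.1, 0.5, 0.7):
    print(t, dg_tail(t, 10, 3), exact_tail(t, 10, 3))
```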

3.2. Modifications for Discrete Statistics

A slight modification of this saddlepoint approximation for the case where $T_n$ takes values in the lattice $\{j\delta/n\}_{j \in \mathbb{Z}}$, for some $\delta > 0$, is obtained by replacing $\sigma(s)$ in Equation (16) by
$\ddot{\sigma}(s) = (1 - \exp\{-\delta \alpha_1\}) \left\{\dfrac{\det K_n''(\alpha; s)}{K_{2n}''(\beta; s_2)}\right\}^{1/2}.$
Moreover, the following continuity correction can be considered. For the lattice point $s_1$, define $\tilde{s}_1 = s_1 - \delta/(2n)$, $\tilde{s} = (\tilde{s}_1, s_2)$ and $\tilde{\alpha} = (\tilde{\alpha}_1, \tilde{\alpha}_2)$ as the solution w.r.t. $v$ of
$\dfrac{\partial}{\partial v} K_n(v; \tilde{s}) = 0.$
Then, replace $\rho(s)$ and $\sigma(s)$ in Equation (16) by
$\tilde{\rho}(\tilde{s}) = \operatorname{sgn}(\tilde{\alpha}_1) \big\{2[K_{2n}(\beta; s_2) - K_n(\tilde{\alpha}; \tilde{s})]\big\}^{1/2} \quad \text{and} \quad \tilde{\sigma}(\tilde{s}) = 2 \sinh\!\left(\dfrac{\delta}{2} \tilde{\alpha}_1\right) \left\{\dfrac{\det K_n''(\tilde{\alpha}; \tilde{s})}{K_{2n}''(\beta; s_2)}\right\}^{1/2},$
respectively. The justifications can be found in [4,19]. The relative error of these modified approximations remains $O(n^{-1})$.
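To illustrate the continuity-corrected version, the Python sketch below uses the M-P representation with a toy statistic of our choosing, $T_n = Y_1/n$, obtained from $\psi_{1,j}(x; s_1) = w_j x - s_1$ with $w_1 = 1$ and $w_j = 0$ for $j \ge 2$. Then $T_n$ lives on $\{j/n\}_{j \in \mathbb{Z}}$ ($\delta = 1$) and, given $\sum_{j=1}^{n} Y_j = k$, $Y_1$ is Binomial$(k, p_1)$, which provides an exact reference. Writing $x = n s_1$ and $k = n s_2$, one finds $K_n(v; s) = q p_1(e^{v_1+v_2} - 1) + q(1-p_1)(e^{v_2} - 1) - v_1 x - v_2 k$; all saddlepoint quantities then turn out not to depend on $q$, in line with Section 2.1.

```python
import math


def Phi(x): return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
def phi(x): return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mp_tail_cc(x, k, p1):
    """Continuity-corrected approximation to P[Y_1 >= x], Y ~ Multinomial(k; p_1, ...).

    Illustrative toy model (our choice), on the count scale x = n s1, k = n s2,
    delta = 1. Requires 1 <= x <= k and x - 1/2 != k p1 (where rho = 0).
    """
    xt = x - 0.5                                        # corrected lattice point
    a1 = math.log(xt * (1.0 - p1) / (p1 * (k - xt)))     # first component of alpha-tilde
    # 2 [K_2n(beta; s2) - K_n(alpha-tilde; s-tilde)] in closed form
    r2 = 2.0 * (xt * math.log(xt / (k * p1))
                + (k - xt) * math.log((k - xt) / (k * (1.0 - p1))))
    rho = math.copysign(math.sqrt(r2), a1)
    # det K_n'' = xt (k - xt) and K_2n'' = k at the saddlepoints
    sigma = 2.0 * math.sinh(0.5 * a1) * math.sqrt(xt * (k - xt) / k)
    return 1.0 - Phi(rho) + phi(rho) * (1.0 / sigma - 1.0 / rho)


def binomial_tail(x, k, p1):
    """Exact P[Binomial(k, p1) >= x]."""
    return sum(math.comb(k, i) * p1**i * (1 - p1)**(k - i) for i in range(x, k + 1))


for x in (3, 10, 14):
    print(x, mp_tail_cc(x, 20, 0.3), binomial_tail(x, 20, 0.3))
```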

3.3. Approximation to Quantiles

Define $\zeta(s) = \rho(s) + \log\{\sigma(s)/\rho(s)\}/\rho(s)$, for $\rho$ and $\sigma$ defined in Equation (15). An asymptotically equivalent version of the saddlepoint approximation in Equation (16) is given by $P_n^*(s_1 \mid s_2) = 1 - \Phi(\zeta(s))$. This formula leads to a fast algorithm for approximating quantiles, with the same asymptotic error as the one entailed by exact inversion of the saddlepoint approximation. The general idea of Wang  was adapted to the present situation by Gatto .
Let $\varepsilon \in (0,1)$. One starts with any reasonable approximation to the desired $\varepsilon$-quantile, for example the normal one, given by
$s_1^{(0)}(\varepsilon) = \dfrac{\tau(s_2)}{\sqrt{n}}\, \Phi^{-1}(\varepsilon) + \mu(s_2),$
where $\mu(s_2) \simeq E[S_{1,n} \mid S_{2,n} = s_2]$ and $\tau^2(s_2) \simeq n \operatorname{var}(S_{1,n} \mid S_{2,n} = s_2)$.
Denote by $\alpha(s)$ the saddlepoint at $s$, viz. the solution of Equation (13) w.r.t. $v$, and denote $\dot{K}_n(v; s) = \partial K_n(v; s)/\partial s$. One computes, for $j = 0, 1$,
$s_1^{(j+1)}(\varepsilon) = s_1^{(j)}(\varepsilon) + \dfrac{\{\Phi^{-1}(\varepsilon)\}^2 - \zeta^2(s^{(j)}(\varepsilon))}{-2\{\dot{K}_n(\alpha(s^{(j)}(\varepsilon)); s^{(j)}(\varepsilon))\}_1},$ (20)
where $s^{(j)}(\varepsilon) = (s_1^{(j)}(\varepsilon), s_2)$ and $\{\cdot\}_1$ denotes the first component of a vector. If $s_1(\varepsilon)$ denotes the exact $\varepsilon$-quantile, then
$s_1^{(2)}(\varepsilon) = s_1(\varepsilon)\{1 + O(n^{-3/2})\}, \quad \text{as } n \to \infty.$
Moreover, if $\tilde{s}_1(\varepsilon)$ denotes the $\varepsilon$-quantile obtained by exact inversion of the saddlepoint approximation, then $s_1^{(2)}(\varepsilon) = \tilde{s}_1(\varepsilon)\{1 + O(n^{-3/2})\}$, as $n \to \infty$. Therefore, stopping the iteration of Equation (20) at $j = 1$ is sufficient in terms of asymptotic accuracy.
Consider the simple case $\psi_{1,j}(x; s_1, s_2) = g(x) - s_1$, for some continuous function $g: \mathbb{R} \to \mathbb{R}$. Then, Equation (11) yields $S_{1,n}(X_1,\ldots,X_n) = n^{-1} \sum_{j=1}^{n} g(X_j)$. In this situation, $\partial K_n(v; s)/\partial s_1 = -n v_1$, so that the denominator of the ratio in Equation (20) simplifies to $2n\{\alpha(s^{(j)}(\varepsilon))\}_1$.
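The Python sketch below applies the two-step iteration (20) to an illustrative model of our choosing: the D-G representation with $a_1 = \cdots = a_n = 1$ and $T_n = Y_1 + \cdots + Y_m$, obtained from $\xi_j(y; t) = w_j y - t/n$ with $w_j = \mathbb{I}\{j \le m\}$. For this model, $T_n/k$ is Beta$(m, n-m)$ distributed and the saddlepoint equations have closed-form solutions. Because $t$ enters $\xi_j$ as $t/n$ here, $\partial K_n/\partial s_1 = -v_1$ and the denominator of (20) is $2\alpha_1$ rather than $2n\alpha_1$. The normal starting value uses the exact conditional mean and standard deviation, and an exact quantile obtained by bisection is included for comparison. The iteration should be used for $\varepsilon$ away from $1/2$, where $\rho(s)$ vanishes.

```python
import math
from statistics import NormalDist


def dg_quantities(s1, n, m, k=1.0):
    """rho(s), sigma(s) and alpha_1(s) for the toy D-G model, with s2 = k / n."""
    a2 = 1.0 - (n - m) / (k - s1)
    a1 = (n - m) / (k - s1) - m / s1
    Kn = -m * math.log(m / s1) - (n - m) * math.log((n - m) / (k - s1)) - a1 * s1 - a2 * k
    K2n = n * math.log(k / n) - k + n          # K_2n at its saddlepoint beta = 1 - n / k
    rho = math.copysign(math.sqrt(2.0 * (K2n - Kn)), a1)
    sigma = a1 * math.sqrt((s1 * (k - s1)) ** 2 / (m * (n - m)) / (k * k / n))
    return rho, sigma, a1


def dg_quantile(eps, n, m, k=1.0, steps=2):
    """Approximate eps-quantile of Y_1 + ... + Y_m by iteration (20), j = 0, ..., steps - 1."""
    z = NormalDist().inv_cdf(eps)
    mean = m * k / n
    sd = k * math.sqrt(m * (n - m) / (n ** 2 * (n + 1)))   # exact conditional s.d.
    s1 = mean + sd * z                                      # normal starting value
    for _ in range(steps):
        rho, sigma, a1 = dg_quantities(s1, n, m, k)
        zeta = rho + math.log(sigma / rho) / rho
        s1 += (z * z - zeta * zeta) / (2.0 * a1)            # denominator -2 dK_n/ds1 = 2 alpha_1
    return s1


def exact_quantile(eps, n, m, k=1.0, tol=1e-12):
    """Exact eps-quantile of k * Beta(m, n - m) by bisection (integer m, n)."""
    def upper(x):                                           # P[Beta(m, n - m) >= x]
        return sum(math.comb(n - 1, i) * x**i * (1 - x)**(n - 1 - i) for i in range(m))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - upper(mid) < eps:
            lo = mid
        else:
            hi = mid
    return k * 0.5 * (lo + hi)


for eps in (0.05, 0.95):
    print(eps, dg_quantile(eps, 10, 3), exact_quantile(eps, 10, 3))
```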

4. Applications

This section presents various examples that illustrate the relevance and accuracy of the conditional saddlepoint approximation for M-statistics of Section 3 with the M-P, MH-B, MP-NB and D-G representations of Section 2, in Section 4.1, Section 4.2, Section 4.3 and Section 4.4, respectively. Important applications or examples from previous articles are summarized and novel examples are developed. The common urn sampling model of all examples is always put in the forefront. Many examples are studied numerically. The values obtained by the saddlepoint approximation are always very close to the ones obtained by Monte Carlo simulation. This section is, however, not a complete list of applications: further examples can be found, e.g., in [8,9] (Chapter 4 and Section 12.5).
As mentioned, the accuracy of the saddlepoint approximation is assessed through comparisons with simple Monte Carlo simulation. The following measures of accuracy for approximating the distribution of the statistic $T_n$ are considered. Let $t > 0$. The probabilities obtained by simulation are treated as exact and denoted $P_E[T_n < t]$. The probabilities obtained by the saddlepoint approximation are denoted $P_S[T_n < t]$. Then,
$\operatorname{ae}(t) = \big|P_S[T_n < t] - P_E[T_n < t]\big| = \big|P_S[T_n \ge t] - P_E[T_n \ge t]\big|$
denotes the absolute error and
$\operatorname{re}(t) = \dfrac{|P_S[T_n < t] - P_E[T_n < t]|}{\min\{P_E[T_n < t],\, 1 - P_E[T_n < t]\}} = \dfrac{|P_S[T_n \ge t] - P_E[T_n \ge t]|}{\min\{P_E[T_n \ge t],\, 1 - P_E[T_n \ge t]\}}$
denotes the absolute relative error.
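Both measures are straightforward to compute. The Python helper below (its name and the probability values in the usage lines are hypothetical) returns them from the two probabilities of the same tail event. Because $\operatorname{ae}(t)$ and the denominator $\min\{P, 1-P\}$ are unchanged when $P[T_n < t]$ is replaced by $P[T_n \ge t]$, either tail can be passed.

```python
def accuracy(p_saddle: float, p_exact: float) -> tuple:
    """Return (ae, re) for one value of t from P_S and P_E of the same tail event."""
    if not 0.0 < p_exact < 1.0:
        raise ValueError("p_exact must lie strictly between 0 and 1")
    abs_err = abs(p_saddle - p_exact)
    rel_err = abs_err / min(p_exact, 1.0 - p_exact)
    return abs_err, rel_err


# hypothetical upper-tail probabilities P[T_n >= t], then the same via P[T_n < t]
print(accuracy(0.0902, 0.0898))
print(accuracy(1 - 0.0902, 1 - 0.0898))
```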

4.1. Sampling with Replacement and M-P Representation

Three new illustrations of the saddlepoint approximation with the M-P representation are presented. Example 1 considers the entropy of the coloration probabilities of the balls of the urn. Numerical evaluations of the saddlepoint approximation to the distribution of the estimator of the entropy are given. Example 2 concerns the likelihood ratio test for the null hypothesis of equality of the coloration probabilities. The power under a particular alternative hypothesis is computed numerically. Example 3 considers the insurer’s total claim amount when individual claim settlements are delayed. The saddlepoint approximation to the distribution of the total claim amount is analyzed numerically. Example 4 reviews the application of the saddlepoint approximation to the bootstrap distribution of the M-statistic in Equation (1).
Example 1 (Entropy’s estimator under sampling with replacement).
The mathematical study of entropy began with Shannon , for the construction of a model for the transmission of information. In sampling with replacement from the urn, the probability of drawing a ball of color $C_j$ is fixed and given by $p_j = a_{j,0}/z$, for $j = 1, \ldots, n$. Define $p = (p_1, \ldots, p_n) \in \Delta_1^{n-1}$. The entropy of the coloration is given by
$\varepsilon_n(p) = -\sum_{j=1}^{n} p_j \log p_j,$
where $0 \log 0 = 0$ is assumed. The entropy $\varepsilon_n(p)$ is an appropriate measure of the uncertainty about the colors of the drawn balls. Indeed, it satisfies the following properties. First, $\varepsilon_n(p)$ takes its largest value, $\log n$, at $p_1 = \ldots = p_n = n^{-1}$. Second, if we consider the equivalent coloration $C_1, \ldots, C_n, C_{n+1}$ with probabilities $p_1, \ldots, p_n$ and $p_{n+1} = 0$, respectively, then $\varepsilon_n(p_1, \ldots, p_n) = \varepsilon_{n+1}(p_1, \ldots, p_n, p_{n+1})$. Theorem 1 on pp. 9–10 of  states that the only continuous function satisfying these two properties, together with a third one related to conditional entropy, has the form given in Equation (23) multiplied by a positive constant.
As in Section 2.2, $Y_1, \ldots, Y_n$ denote the numbers of drawn balls of the n colors $C_1, \ldots, C_n$, respectively, after $k \in \mathbb{N}^*$ draws. Define
$T_n(Y_1, \ldots, Y_n) = \varepsilon_n\left(\frac{Y_1}{k}, \ldots, \frac{Y_n}{k}\right) = -\sum_{j=1}^{n} \frac{Y_j}{k} \log \frac{Y_j}{k} = -\frac{1}{k} \sum_{j=1}^{n} Y_j \log Y_j + \log k$
and $P_n(Y_1, \ldots, Y_n) = \binom{k}{Y_1, \ldots, Y_n} n^{-Y_1} \cdots n^{-Y_n}$, that is, the multinomial probability of the configuration $(Y_1, \ldots, Y_n)$ under uniformity. It follows directly from Stirling's formula that $k^{-1} \log P_n = T_n - \log n + o(1)$, as $k \to \infty$, a.s. Asymptotically, the entropy of the configuration is thus an increasing transform of the probability of the configuration under uniformity. The probability $P_n$ is maximized by the constant configuration, and so is the entropy $T_n$.
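The following Python sketch gives a quick numerical check of Equation (24) and of its link with $P_n$. The function names are hypothetical. The printed gap is $k^{-1} \log P_n - (T_n - \log n)$. The $-\log n$ term comes from the factor $n^{-k}$ in $P_n$, and Stirling's formula makes the gap of order $(\log k)/k$.

```python
import math


def entropy_estimator(counts):
    """T_n of Equation (24): entropy of the frequencies Y_j / k, with 0 log 0 = 0."""
    k = sum(counts)
    return -sum((y / k) * math.log(y / k) for y in counts if y > 0)


def log_prob_uniform(counts):
    """log P_n: log multinomial probability of the counts when p_j = 1/n."""
    n, k = len(counts), sum(counts)
    log_coef = math.lgamma(k + 1) - sum(math.lgamma(y + 1) for y in counts)
    return log_coef - k * math.log(n)


counts = [1000, 2000, 3000, 4000]
k, n = sum(counts), len(counts)
t_n = entropy_estimator(counts)
# Second expression in Equation (24): -(1/k) sum_j Y_j log Y_j + log k
t_n_alt = -sum(y * math.log(y) for y in counts if y > 0) / k + math.log(k)
# Stirling: k^{-1} log P_n - (T_n - log n) is of order (log k) / k
gap = log_prob_uniform(counts) / k - (t_n - math.log(n))
print(t_n, t_n_alt, gap)
```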
Consider now the multinomial model in Equation (3) with unknown vector of probabilities $p$. The frequency $Y_j/k$ is an unbiased estimator of $p_j$, for $j = 1, \ldots, n$. Thus, $T_n$ is an estimator of the entropy $\varepsilon_n(p)$. It takes the form of the M-statistic in Equation (1) with $\xi_j(y; t) = -y \log y + n^{-1} k \log k - n^{-1} k t$. Using the M-P representation and some algebraic manipulations, the c.g.f. in Equation (12) takes the form
$K_n(v; s) = k(\log k - s_1) v_1 - n s_2 v_2 - q + \sum_{j=1}^{n} \log\left\{1 + \sum_{l=1}^{\infty} \frac{1}{l!} \left(q p_j e^{v_2}\right)^{l} l^{-v_1 l}\right\},$
with $q \in \mathbb{R}_+^*$ arbitrary. We set $s_2 = k/n$ and select q such that $E[S_{2,n}] = k/n$, i.e., $q = k$. With this choice of q, the marginal saddlepoint equation, cf. Equation (14), has the trivial solution $\beta = 0$. This yields
$K_n\left(v; s_1, \frac{k}{n}\right) = k\left\{(\log k - s_1) v_1 - v_2 - 1\right\} + \sum_{j=1}^{n} \log\left\{1 + \sum_{l=1}^{\infty} \frac{1}{l!} \left(k p_j e^{v_2}\right)^{l} l^{-v_1 l}\right\}.$
Computing the second-order derivatives is lengthy but elementary. We only give the simple result $K''_{2,n}(0; (s_1, k/n)) = k$, which can be used to check the formula of the second derivative.
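The check suggested above is easy to automate. The Python sketch below uses a hypothetical function name and assumes NumPy. It evaluates the c.g.f. with the inner series truncated and summed in log space, then takes finite differences in $v_2$ at $v = 0$. The setting is $p_j = 2j/\{n(n+1)\}$, $n = 6$, $k = 32$, as in the numerical illustration. At $v_1 = 0$ the inner series is exponential, so $K_n(0, v_2; s_1, k/n) = k(e^{v_2} - 1 - v_2)$, whose first derivative at 0 vanishes and whose second derivative at 0 equals k.

```python
import math

import numpy as np


def cgf_entropy_mp(v1, v2, s1, p, k, n_terms=400):
    """K_n(v; s_1, k/n) of Example 1 (M-P representation, q = k).

    The inner series is truncated after n_terms terms; it converges for v1 > -1.
    """
    l = np.arange(n_terms + 1, dtype=float)
    # l log l with the convention 0 log 0 = 0, so the l = 0 term of the series equals 1
    l_log_l = np.where(l > 0, l * np.log(np.maximum(l, 1.0)), 0.0)
    log_fact = np.array([math.lgamma(x + 1.0) for x in l])
    total = k * ((math.log(k) - s1) * v1 - v2 - 1.0)
    for pj in p:
        # log of (k p_j e^{v2})^l l^{-v1 l} / l!
        a = l * (math.log(k * pj) + v2) - v1 * l_log_l - log_fact
        a_max = a.max()
        total += a_max + math.log(np.exp(a - a_max).sum())  # stable log-sum-exp
    return total


n, k, s1 = 6, 32, 1.5
p = [2 * j / (n * (n + 1)) for j in range(1, n + 1)]
h = 1e-4
k0 = cgf_entropy_mp(0.0, 0.0, s1, p, k)
k_plus = cgf_entropy_mp(0.0, h, s1, p, k)
k_minus = cgf_entropy_mp(0.0, -h, s1, p, k)
d1 = (k_plus - k_minus) / (2 * h)          # close to 0, since q = k centers S_{2,n}
d2 = (k_plus - 2 * k0 + k_minus) / h**2    # close to k
print(k0, d1, d2)
```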
We now apply the saddlepoint approximation of Section 3 to the case $p_j = 2j/\{n(n+1)\}$, for $j = 1, \ldots, n$, with $n = 6$ and $k = 32$. The saddlepoint approximation is compared with the Monte Carlo distribution of $T_6$ based on $10^6$ simulations. The numerical results are displayed in Figure 1 and Table 1. The probabilities obtained by simulation are denoted $P_E[T_6 < t]$, the probabilities obtained by the saddlepoint approximation are denoted $P_S[T_6 < t]$, $\mathrm{ae}(t)$ denotes the absolute error defined in Equation (21) and $\mathrm{re}(t)$ denotes the absolute relative error defined in Equation (22), for $t \in [1.20, 1.77]$. Most relative errors are very small. The largest one occurs in the extreme right tail and is around 31%.
Example 2 (Power of likelihood ratio test).
The estimator of entropy in Equation (24) is closely related to the likelihood ratio test. Consider a sample of k i.i.d. random variables and any partition of their range into n intervals of positive length. Denote by $p_j$ the probability that any one of the sample values belongs to the jth interval, and by $Y_j$ the number of sample values that belong to the jth interval, for $j = 1, \ldots, n$. Then, $(Y_1, \ldots, Y_n)$ takes values in $\ddot{\Delta}_k^{n-1}$ and follows the multinomial distribution in Equation (3). Consider the null hypothesis $H_0: p \in \Pi_0$, where $\Pi_0 \subset \Delta_1^{n-1}$. The likelihood ratio test statistic for $H_0$ against the general alternative is given by
$L_n(Y_1, \ldots, Y_n) = \frac{\sup_{p \in \Pi_0} \frac{k!}{Y_1! \cdots Y_n!} p_1^{Y_1} \cdots p_n^{Y_n}}{\sup_{p \in \Delta_1^{n-1}} \frac{k!}{Y_1! \cdots Y_n!} p_1^{Y_1} \cdots p_n^{Y_n}}.$
By restricting to $\Pi_0 = \{p_0\}$, for some $p_0 = (p_{0,1}, \ldots, p_{0,n}) \in \Delta_1^{n-1}$, we obtain
$T_n^*(Y_1, \ldots, Y_n) = -2 \log L_n(Y_1, \ldots, Y_n) = 2 \sum_{j=1}^{n} Y_j \log Y_j - 2 \sum_{j=1}^{n} Y_j \log p_{0,j} - 2k \log k.$
In the case $p_{0,1} = \ldots = p_{0,n} = n^{-1}$, which can be obtained without loss of generality by the probability integral transform, $T_n^*(Y_1, \ldots, Y_n)$ is equal to $2 \sum_{j=1}^{n} Y_j \log Y_j$ plus a constant term. Then, $T_n^* \xrightarrow{d} \chi^2_{n-1}$, as $k \to \infty$. In addition, if $k, n \to \infty$, with $k/n \to l$, for some $l \in (1, \infty)$, then $T_n^*$ is asymptotically normal.
The numerical evaluation of the saddlepoint approximation to the distribution of $T_n^*$, with $n = 4$, $k = 12$ and under $H_0$, is given in Table 1 in . We now extend the numerical study to the evaluation of the power function at any point of the alternative, viz. at any $p \in \Delta_1^{n-1} \setminus \{(n^{-1}, \ldots, n^{-1})\}$. Under the uniform null, $T_n^* = 2k(\log n - T_n)$ is a decreasing affine transform of the entropy estimator $T_n$ given in Equation (24), so that large values of $T_n^*$ correspond to small values of $T_n$. We therefore use $T_n$ as the test statistic instead, and the c.g.f. for the saddlepoint approximation is already given in Equation (25). Consider the power function at the point of the alternative hypothesis $p_j = 2j/\{n(n+1)\}$, for $j = 1, \ldots, n$. We select $n = 6$ and $k = 32$. The saddlepoint approximation to the distribution of $T_6$ under $H_0$ gives
$P_S[T_6 < 1.6060] = 0.0495.$
The saddlepoint approximation to the distribution of $T_6$ under the chosen alternative gives
$P_S[T_6 < 1.6060] = 0.5691.$
This is the distribution computed in Example 1. Thus, $0.5691$ is the saddlepoint approximation to the power of the test with size $0.0495$ at the given alternative.
When $\Pi_0$ is the singleton containing a vector of unequal probabilities $p_{0,1}, \ldots, p_{0,n}$, the saddlepoint approximation can be obtained in a similar way. An important application is language identification, where these probabilities represent the frequencies of the n letters of the alphabet of a language and $Y_1/k, \ldots, Y_n/k$ are the frequencies of these n letters within a text of k letters. Whether the text belongs to the language can be tested with the statistic $T_n^*$, which is in fact proportional to the Kullback–Leibler information. Precisely, denote by
$\iota_n(v \mid w) = \sum_{j=1}^{n} v_j \log \frac{v_j}{w_j}$
the Kullback–Leibler information, or discrepancy, between the two probability distributions $v = (v_1, \ldots, v_n) \in \Delta_1^{n-1}$ and $w = (w_1, \ldots, w_n) \in \Delta_1^{n-1}$, which satisfy the absolute continuity condition $w_j = 0 \Rightarrow v_j = 0$, for $j = 1, \ldots, n$. Then, $T_n^* = 2k\, \iota_n(Y_1/k, \ldots, Y_n/k \mid p_{0,1}, \ldots, p_{0,n})$.
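The following Python sketch checks two identities numerically: $T_n^* = 2k\,\iota_n(Y/k \mid p_0)$ for a non-uniform null, and $T_n^* = 2k(\log n - T_n)$ for the uniform null. The second identity turns the size-0.0495 event $\{T_6 < 1.6060\}$ into an equivalent threshold for $T_6^*$. The counts and the non-uniform $p_0$ are hypothetical values chosen only for illustration.

```python
import math


def lr_statistic(counts, p0):
    """T_n^* = -2 log L_n for H_0: p = p0, with 0 log 0 = 0."""
    k = sum(counts)
    return (2 * sum(y * (math.log(y) - math.log(q)) for y, q in zip(counts, p0) if y > 0)
            - 2 * k * math.log(k))


def kl_information(v, w):
    """iota_n(v | w) = sum_j v_j log(v_j / w_j), assuming w_j = 0 implies v_j = 0."""
    return sum(a * math.log(a / b) for a, b in zip(v, w) if a > 0)


def entropy_estimator(counts):
    """T_n of Equation (24)."""
    k = sum(counts)
    return -sum((y / k) * math.log(y / k) for y in counts if y > 0)


counts = [3, 5, 7, 2, 9, 6]                      # hypothetical sample, k = 32, n = 6
k, n = sum(counts), len(counts)
p_lang = [0.10, 0.20, 0.15, 0.25, 0.20, 0.10]    # hypothetical non-uniform p_0
freq = [y / k for y in counts]

print(lr_statistic(counts, p_lang), 2 * k * kl_information(freq, p_lang))
# Uniform null: {T_n^* > c} is the same event as {T_n < log n - c / (2k)}
print(lr_statistic(counts, [1 / n] * n), 2 * k * (math.log(n) - entropy_estimator(counts)))
c = 2 * k * (math.log(n) - 1.6060)               # T_6^* threshold matching {T_6 < 1.6060}
print(c)
```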
Example 3 (Total claim amount under delayed settlement).
We are interested in the distribution of the total claim amount of an insurance company over a fixed time horizon. We assume that the delay of claim settlement increases with the individual claim amount. This can happen in actuarial practice, partly because large claim amounts require longer checks. Precisely, the individual claim amounts are i.i.d. random variables taking the n values $r_1 < \ldots < r_n$, all in $\mathbb{R}_+^*$, for $n = 2, 3, \ldots$. Let $j \in \{1, \ldots, n\}$. Claims of amount $r_j$ are settled exactly after the jth unit of time (e.g., months). During a given time horizon (e.g., a year), $Y_j$ claims of amount $r_j$ occur. We assume that $k \in \mathbb{N}^*$ claims have occurred during the time horizon under consideration and that $(Y_1, \ldots, Y_n)$, which takes values in $\ddot{\Delta}_k^{n-1}$, follows the multinomial distribution in Equation (3). The total amount of these claims is thus $\sum_{j=1}^{n} r_j Y_j$. We are interested in the distribution of the proportion of this total claim amount that is settled within m units of time, viz. of
$T_n = T_n(Y_1, \ldots, Y_n) = \frac{\sum_{j=1}^{m} r_j Y_j}{\sum_{j=1}^{n} r_j Y_j},$
for some $m \in \{1, \ldots, n-1\}$. It can be re-expressed as the M-statistic in Equation (1) with
$\xi_j(y; t) = r_j\left(I\{j \leq m\} - t\right) y, \quad \text{for } j = 1, \ldots, n.$
The M-P representation states that the multinomial claim counts have the distribution of independent Poisson counts, given a total of k claims. Thus, after some algebraic manipulations, the c.g.f. in Equation (12) becomes
$K_n(v; s) = -n s_2 v_2 - q + \sum_{j=1}^{n} \log\left(1 + \sum_{l=1}^{\infty} \frac{1}{l!} \exp\left\{\left[v_1 r_j\left(I\{j \leq m\} - s_1\right) + v_2 + \log(q p_j)\right] l\right\}\right),$
with arbitrary $q \in \mathbb{R}_+^*$. We set $s_2 = k/n$ and select q such that $E[S_{2,n}] = k/n$, i.e., $q = k$. Thus, the marginal saddlepoint equation, cf. Equation (14), is solved by $\beta = 0$. This leads to
$K_n\left(v; s_1, \frac{k}{n}\right) = -k(1 + v_2) + \sum_{j=1}^{n} \log\left(1 + \sum_{l=1}^{\infty} \frac{1}{l!} \exp\left\{\left[v_1 r_j\left(I\{j \leq m\} - s_1\right) + v_2 + \log(k p_j)\right] l\right\}\right).$
By computing the second-order derivatives, we find $K''_{2,n}(0; (s_1, k/n)) = k$.
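Because $\xi_j$ is linear in y, the inner series of this c.g.f. is an exponential series: $1 + \sum_{l \geq 1} e^{xl}/l! = \exp(e^x)$. The c.g.f. therefore reduces to $-k(1 + v_2) + k \sum_{j=1}^{n} p_j \exp\{v_1 r_j(I\{j \leq m\} - s_1) + v_2\}$, which makes the value $K''_{2,n}(0; (s_1, k/n)) = k$ immediate. The Python sketch below checks this closed form against the truncated series and checks the second derivative numerically, with the parameters of the numerical illustration. The function names are hypothetical.

```python
import math


def cgf_claims_series(v1, v2, s1, p, r, m, k, n_terms=200):
    """K_n(v; s_1, k/n) of Example 3 with the inner series truncated after n_terms terms."""
    total = -k * (1.0 + v2)
    for j, (pj, rj) in enumerate(zip(p, r), start=1):
        x = v1 * rj * ((1.0 if j <= m else 0.0) - s1) + v2 + math.log(k * pj)
        series = 1.0 + sum(math.exp(x * l - math.lgamma(l + 1)) for l in range(1, n_terms + 1))
        total += math.log(series)
    return total


def cgf_claims_closed(v1, v2, s1, p, r, m, k):
    """Same c.g.f., using 1 + sum_{l>=1} e^{x l} / l! = exp(e^x)."""
    return -k * (1.0 + v2) + sum(
        k * pj * math.exp(v1 * rj * ((1.0 if j <= m else 0.0) - s1) + v2)
        for j, (pj, rj) in enumerate(zip(p, r), start=1))


p = [0.15, 0.23, 0.16, 0.14, 0.12, 0.10, 0.06, 0.04]
r = [10, 15, 20, 30, 50, 70, 100, 120]
m, k = 4, 30
series_val = cgf_claims_series(0.002, 0.05, 0.4, p, r, m, k)
closed_val = cgf_claims_closed(0.002, 0.05, 0.4, p, r, m, k)
print(series_val, closed_val)

h = 1e-4
d2 = (cgf_claims_closed(0.0, h, 0.4, p, r, m, k) - 2 * cgf_claims_closed(0.0, 0.0, 0.4, p, r, m, k)
      + cgf_claims_closed(0.0, -h, 0.4, p, r, m, k)) / h**2
print(d2)  # close to k
```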
For the numerical illustration, consider the multinomial distribution in Equation (3) with probabilities $p_1 = 0.15$, $p_2 = 0.23$, $p_3 = 0.16$, $p_4 = 0.14$, $p_5 = 0.12$, $p_6 = 0.1$, $p_7 = 0.06$ and $p_8 = 0.04$, and a total of $k = 30$ claims. Thus, $n = 8$ and the possible claim amounts are $r_1 = 10$, $r_2 = 15$, $r_3 = 20$, $r_4 = 30$, $r_5 = 50$, $r_6 = 70$, $r_7 = 100$ and $r_8 = 120$. The number of units of time for the proportion of settled total claim amount, cf. Equation (27), is $m = 4$. To assess the accuracy of the saddlepoint approximation, we compute the Monte Carlo distribution of $T_8$ based on $10^6$ simulations. The numerical results are displayed in Table 2. The probabilities obtained by simulation are denoted $P_E[T_8 < t]$, the probabilities obtained by the saddlepoint approximation are denoted $P_S[T_8 < t]$ and $\mathrm{re}(t)$ denotes the relative error, cf. Equation (22), for $t \in [0.12, 0.72]$. Most relative errors are below 5%. The largest one occurs in the extreme left tail and is approximately 12%.
A practical question is the following: which value t bounds the proportion of total claim amount $T_8$ from above with probability 0.99? The saddlepoint approximation gives $P_S[T_8 < 0.63] = 0.9897$, and thus $t = 0.63$, approximately.
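The Monte Carlo reference distribution for this example can be sketched in Python with NumPy, as below. The sketch uses $10^5$ draws rather than the $10^6$ of the study, and the function name is hypothetical. It estimates $P[T_8 < 0.63]$ and the empirical 0.99 quantile, which can be compared with the saddlepoint answer above. No particular output is claimed.

```python
import numpy as np

p = np.array([0.15, 0.23, 0.16, 0.14, 0.12, 0.10, 0.06, 0.04])
r = np.array([10, 15, 20, 30, 50, 70, 100, 120], dtype=float)
k, m = 30, 4


def simulate_t8(n_sim, seed=0):
    """Monte Carlo draws of T_8 = sum_{j<=m} r_j Y_j / sum_j r_j Y_j under the multinomial model."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(k, p, size=n_sim)   # shape (n_sim, 8); each row sums to k
    settled = counts[:, :m] @ r[:m]              # claim amount settled within m units of time
    total = counts @ r                           # total claim amount, positive since k > 0
    return settled / total


t_sim = simulate_t8(100_000)
p_hat = np.mean(t_sim < 0.63)                    # Monte Carlo estimate of P[T_8 < 0.63]
t_99 = np.quantile(t_sim, 0.99)                  # empirical 0.99 quantile of T_8
print(p_hat, t_99)
```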
Example 4 (Bootstrap distribution of M-statistic).
Let $R_1, \ldots, R_n$ be a sample of i.i.d. random variables taking values in $\mathbb{R}$, for $n = 2, 3, \ldots$. Absolute continuity is assumed, in order to avoid repeated values a.s. Consider the M-statistic $U_n$, or $U_n(R_1, \ldots, R_n)$, defined as the root in t of
$\sum_{j=1}^{n} \zeta(R_j; t) = 0,$
where $\zeta: \mathbb{R}^2 \to \mathbb{R}$ is continuous and decreasing in its second argument. Let $r_1, \ldots, r_n$ be a realization of the sample and let $R_1^*, \ldots, R_n^*$ be the random variables obtained by sampling with replacement from the values $r_1, \ldots, r_n$ with respective probabilities $p_1, \ldots, p_n$, for $(p_1, \ldots, p_n) \in \Delta_1^{n-1}$. The distribution of $U_n(R_1^*, \ldots, R_n^*)$, or simply $U_n^*$, is the bootstrap distribution of $U_n$.
This coincides with sampling with replacement from the general urn model of Section 2.2, if the color $C_j$ is associated with the value $r_j$, for $j = 1, \ldots, n$, and if the number of draws from the urn is $k = n$. Define $\xi_j(y; t) = y\, \zeta(r_j; t)$, for $t \in \mathbb{R}$, $y \in \mathbb{N}$ and $j = 1, \ldots, n$. Then, $U_n^*$ can be represented as the solution w.r.t. t of Equation (1), denoted $T_n$, in which $Y_j$ is the number of times that $r_j$ has been sampled, for $j = 1, \ldots, n$. The conditional saddlepoint approximation of Section 3 yields the distribution of $T_n$, i.e., of $U_n^*$, which is the bootstrap distribution of $U_n$. In most practical cases, $p_1 = \ldots = p_n = n^{-1}$, i.e., $a_{1,0} = \ldots = a_{n,0}$.
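The correspondence between resampling and the counts $Y_j$ can be illustrated with a short Python sketch, assuming NumPy and SciPy (`scipy.optimize.brentq`). The sketch solves Equation (1) with $\xi_j(y; t) = y\, \zeta(r_j; t)$ directly from the counts. It then compares the result with the same M-statistic computed on the explicitly resampled values. The function names and the choice $\zeta(r; t) = \tanh(r - t)$ are hypothetical illustrations. The sketch computes a single bootstrap replicate, not the saddlepoint approximation.

```python
import numpy as np
from scipy.optimize import brentq


def m_estimate_from_counts(counts, r, zeta):
    """Root in t of sum_j counts_j * zeta(r_j; t) = 0, i.e. Equation (1) with xi_j(y; t) = y * zeta(r_j; t).

    Assumes zeta(r; t) = g(r - t) with g increasing and g(0) = 0, so that the
    root lies between the smallest and the largest resampled value.
    """
    sampled = r[counts > 0]
    lo, hi = sampled.min(), sampled.max()
    if lo == hi:                                   # only one distinct value was resampled
        return float(lo)
    return brentq(lambda t: np.sum(counts * zeta(r, t)), lo, hi)


def zeta_mean(x, t):
    """zeta for the sample mean."""
    return x - t


def zeta_tanh(x, t):
    """A smooth, bounded zeta (hypothetical choice for illustration)."""
    return np.tanh(x - t)


rng = np.random.default_rng(1)
n = 20
r = rng.normal(size=n)                             # observed sample r_1, ..., r_n
counts = rng.multinomial(n, np.full(n, 1.0 / n))   # Y_j = number of times r_j is resampled
resample = np.repeat(r, counts)                    # the same bootstrap resample, written out

print(m_estimate_from_counts(counts, r, zeta_mean), resample.mean())
print(m_estimate_from_counts(counts, r, zeta_tanh),
      m_estimate_from_counts(np.ones(n, dtype=int), resample, zeta_tanh))
```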
The saddlepoint approximation for bootstrap distributions was introduced by [34,35,36] and for M-estimators by . A review can be found in  (Section 9.5). Thus, the conditional saddlepoint approximation of Section 3 provides an alternative saddlepoint approximation to the bootstrap distribution of M-estimators.
Other applications of this saddlepoint approximation with the M-P representation can be found in the literature. Saddlepoint approximations for the likelihood ratio test and for chi-square tests for grouped data, under the null hypotheses, are given in . For the numerical evaluation of the saddlepoint approximation for the likelihood ratio statistic, refer to .

4.2. Sampling without Replacement and MH-B Representation

The saddlepoint approximation combined with the MH-B representation can be applied for approximating the distribution of the M-statistic in Equation (1) in finite population sampling, viz. under sampling without replacement. Example 5 analyzes the numerical accuracy of the saddlepoint approximation to the distribution of the coloration entropy when sampling is without replacement.
Example 5 (Entropy’s estimator under sampling without replacement).
We consider the entropy estimation of Example 1 in the context of sampling without replacement. We are interested in the coloration entropy $\varepsilon_n(a_{1,0}/z, \ldots, a_{n,0}/z)$, as given by Equation (23), with $a_{1,0}, \ldots, a_{n,0}$ unknown. It is the entropy of the initial state of the urn. In the multivariate hypergeometric model in Equation (4), $Y_j/k$ is an unbiased estimator of $a_{j,0}/z$, for $j = 1, \ldots, n$, where $(Y_1, \ldots, Y_n)$ takes values in $\ddot{\Delta}_k^{n-1} \cap ([0, m_1] \times \ldots \times [0, m_n])$. Thus, an estimator of this entropy is given by Equation (24). The unknown parameters of the multivariate hypergeometric distribution in Equation (4) are $m_j = a_{j,0}$, for $j = 1, \ldots, n$.
With the MH-B representation and some algebraic manipulations, the c.g.f. in Equation (12) becomes
$K_n(v; s) = k(\log k - s_1) v_1 - n s_2 v_2 + z \log(1 - q) + \sum_{j=1}^{n} \log\left\{1 + \sum_{l=1}^{m_j} \frac{(m_j)_l}{l!} \left(\frac{q}{1-q} e^{v_2}\right)^{l} l^{-v_1 l}\right\},$
with $q \in (0, 1)$ arbitrary, where $(m_j)_l = m_j(m_j - 1)\cdots(m_j - l + 1)$ denotes the falling factorial. We set $s_2 = k/n$ and select q such that $E[S_{2,n}] = k/n$, i.e., $q = k/z$. For this purpose, we assume $k < z$. With this choice, the marginal saddlepoint equation, cf. Equation (14), has the trivial solution $\beta = 0$ and the c.g.f. in Equation (28) becomes
$K_n\left(v; s_1, \frac{k}{n}\right) = k\left\{(\log k - s_1) v_1 - v_2\right\} + z\left\{\log(z - k) - \log z\right\} + \sum_{j=1}^{n} \log\left\{1 + \sum_{l=1}^{m_j} \frac{(m_j)_l}{l!} \left(\frac{k}{z-k} e^{v_2}\right)^{l} l^{-v_1 l}\right\}.$
The second-order derivatives of $K_n$ can be obtained through lengthy but simple algebraic manipulations. In particular, we find $K''_{2,n}(0; (s_1, k/n)) = k(z - k)/z$.
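The value $k(z - k)/z$ can be checked numerically from the finite-sum c.g.f. The Python sketch below uses a hypothetical function name and the parameters of the numerical illustration. It relies on $(m_j)_l / l! = \binom{m_j}{l}$ and evaluates the c.g.f. at and around $v = 0$.

```python
import math


def cgf_entropy_mhb(v1, v2, s1, m, k):
    """K_n(v; s_1, k/n) of Example 5 (MH-B representation, q = k/z, k < z)."""
    z = sum(m)
    c = k * math.exp(v2) / (z - k)          # q e^{v2} / (1 - q) with q = k / z
    total = k * ((math.log(k) - s1) * v1 - v2) + z * (math.log(z - k) - math.log(z))
    for mj in m:
        # (m_j)_l / l! = binom(m_j, l); the l = 0 term of the finite sum equals 1
        series = 1.0 + sum(math.comb(mj, l) * c**l * l ** (-v1 * l) for l in range(1, mj + 1))
        total += math.log(series)
    return total


m = [2, 4, 6, 8, 10, 12, 14]
k, s1, h = 25, 1.5, 1e-4
z = sum(m)
k0 = cgf_entropy_mhb(0.0, 0.0, s1, m, k)
d2 = (cgf_entropy_mhb(0.0, h, s1, m, k) - 2 * k0 + cgf_entropy_mhb(0.0, -h, s1, m, k)) / h**2
print(k0, d2, k * (z - k) / z)
```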
For the numerical illustration, we consider the multivariate hypergeometric distribution with $n = 7$, $m_1 = 2$, $m_2 = 4$, $m_3 = 6$, $m_4 = 8$, $m_5 = 10$, $m_6 = 12$, $m_7 = 14$ and $k = 25$. We compute the Monte Carlo distribution of $T_7$ based on $10^6$ simulations. The saddlepoint approximation is obtained by following the steps of Section 3. The results are given in Table 3. The saddlepoint probabilities are obtained instantaneously, and the relative errors are below 15%, with the exception of one extreme left tail point, for which the relative error is 25%.
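A Monte Carlo sketch of the reference distribution of $T_7$ is given below in Python. It assumes NumPy 1.18 or later for `Generator.multivariate_hypergeometric`, and the function name is hypothetical. It also checks empirically that $Y_j/k$ is unbiased for $a_{j,0}/z$, as stated above. The sketch uses $5 \times 10^4$ draws rather than $10^6$.

```python
import numpy as np

m = np.array([2, 4, 6, 8, 10, 12, 14])   # initial numbers of balls, a_{j,0} = m_j
k = 25                                     # draws without replacement


def simulate_entropy_mh(n_sim, seed=0):
    """Monte Carlo draws of the counts (Y_1..Y_7) and of T_7 under the multivariate hypergeometric model."""
    rng = np.random.default_rng(seed)
    counts = rng.multivariate_hypergeometric(m, k, size=n_sim)   # shape (n_sim, 7)
    freq = counts / k
    # 0 log 0 = 0: zero frequencies contribute log(1) = 0
    t = -(freq * np.log(np.where(freq > 0, freq, 1.0))).sum(axis=1)
    return counts, t


counts, t_sim = simulate_entropy_mh(50_000)
print(counts.mean(axis=0) / k, m / m.sum())   # unbiasedness of Y_j / k for a_{j,0} / z
print(np.quantile(t_sim, [0.01, 0.5, 0.99]))
```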
We now summarize two practical applications of the conditional saddlepoint approximation with the MH-B representation. The first one can be found in  and concerns a permutation test comparing two groups. The jth individual belongs to the control group when $Y_j = 0$ and to the treatment group when $Y_j = 1$, for $j = 1, \ldots, n$. We have $(Y_1, \ldots, Y_n) \sim \text{Multi-Hypergeometric}(k; 1, \ldots, 1)$, where k is the number of individuals in the treatment group. The realizations of $(Y_1, \ldots, Y_n)$ represent the permutations of the individuals and the test statistic $T_n$ is a linear combination of the elements of $(Y_1, \ldots, Y_n)$. The permutation distribution of $T_n$ is obtained from Equation (2), where $X_1, \ldots, X_n$ are i.i.d. Bernoulli random variables.
The second application can be found in  and concerns the jackknife distribution of a ratio. Consider the fixed sample $z_1, \ldots, z_n$, delete d of its values, for some $1 \leq d < n$, i.e., sample without replacement the $k = n - d$ retained values, and define $Y_j = 0$ if $z_j$ is deleted and $Y_j = 1$ if $z_j$ is retained, for $j = 1, \ldots, n$. This procedure is repeated many times and, at each iteration, a statistic of interest is computed from the k retained values. In the terminology of B. Efron, this is called the delete-d jackknife. We have $(Y_1, \ldots, Y_n) \sim \text{Multi-Hypergeometric}(k; 1, \ldots, 1)$, where k is the sample size of the jackknife samples. The realizations of $(Y_1, \ldots, Y_n)$ represent the subsets of size k of $(z_1, \ldots, z_n)$. The statistic considered in  is $T_n = \sum_{j=1}^{n} v_j Y_j / \sum_{j=1}^{n} u_j Y_j$, for $u_j, v_j \in \mathbb{R}$, for $j = 1, \ldots, n$. The permutation, viz. delete-d jackknife, distribution of $T_n$ is obtained from Equation (2), where $X_1, \ldots, X_n$ are independent Bernoulli random variables with parameter 1/2, together with the saddlepoint approximation for M-statistics of Section 3.

4.3. Polya’s Sampling and MP-NB Representation

This section provides various applications of the saddlepoint approximation with the MP-NB representation. Example 6 considers the estimator of the entropy of the initial coloration probabilities of the urn, in the setting of Polya’s sampling. Example 7 considers the Bayesian analysis of this entropy under a multivariate Polya prior and sampling without replacement; the saddlepoint approximation to the posterior distribution of the entropy can be obtained with the MP-NB representation. Example 8 concerns the saddlepoint approximation with the MP-NB representation for many two-sample tests based on spacing-frequencies.
Example 6 (Entropy’s estimator under Polya’s sampling).
We consider the entropy estimation problem introduced in Example 1, now in the context of Polya’s sampling. We are interested in the entropy of the initial coloration probabilities $\varepsilon_n(a_{1,0}/z, \ldots, a_{n,0}/z)$, given in Equation (23), where $a_{1,0}, \ldots, a_{n,0}$ are unknown. In the multivariate Polya model in Equation (5), $Y_j/k$ is an unbiased estimator of $a_{j,0}/z$, for $j = 1, \ldots, n$, and so an estimator of the entropy is given by Equation (24). The parameters of the multivariate Polya distribution in Equation (5) are k, the number of draws of the urn model, and $m_j = a_{j,0}/r$, for $j = 1, \ldots, n$. Using the MP-NB representation, the c.g.f. in Equation (12) becomes
$K_n(v; s) = u \log q - k(s_1 - \log k) v_1 - n s_2 v_2 + \sum_{j=1}^{n} \log\left\{1 + \sum_{l=1}^{\infty} \frac{(l + m_j - 1)_l}{l!} \left\{(1 - q) e^{v_2}\right\}^{l} l^{-v_1 l}\right\},$
with $q \in (0, 1)$ arbitrary. This formula allows for the direct evaluation of the conditional saddlepoint approximation of Section 3.
Example 7 (Bayesian Entropy’s estimator under multivariate Polya a priori and sampling without replacement).
The multivariate Polya distribution is often used as a prior distribution in Bayesian statistics, because it constitutes a conjugate class when associated with the multivariate hypergeometric likelihood. Precisely, consider the prior
$(M_1, \ldots, M_n) \sim \text{Multi-Polya}(z; \alpha_1, \ldots, \alpha_n)$
taking values in $\ddot{\Delta}_z^{n-1}$, for $z \in \mathbb{N}^*$, $(\alpha_1, \ldots, \alpha_n) \in \Delta_u^{n-1}$ and $u \in \mathbb{R}_+^*$, and consider the likelihood
$(Y_1, \ldots, Y_n) \mid \{(M_1, \ldots, M_n) = (m_1, \ldots, m_n)\} \sim \text{Multi-Hypergeometric}(k; m_1, \ldots, m_n),$
for $(m_1, \ldots, m_n) \in \ddot{\Delta}_z^{n-1}$, $k \in \{0, \ldots, z\}$ and $(Y_1, \ldots, Y_n)$ taking values in $\ddot{\Delta}_k^{n-1} \cap ([0, m_1] \times \ldots \times [0, m_n])$. Then, the posterior is given by
$\{(M_1, \ldots, M_n) \mid (Y_1, \ldots, Y_n) = (k_1, \ldots, k_n)\} \sim (k_1, \ldots, k_n) + \text{Multi-Polya}(z - k; \alpha_1 + k_1, \ldots, \alpha_n + k_n),$
for $(k_1, \ldots, k_n) \in \ddot{\Delta}_k^{n-1} \cap ([0, m_1] \times \ldots \times [0, m_n])$. Indeed,
$P[M_1 = m_1, \ldots, M_n = m_n \mid Y_1 = k_1, \ldots, Y_n = k_n] \propto \prod_{j=1}^{n} \binom{m_j}{k_j} \binom{\alpha_j + m_j - 1}{m_j} \propto \prod_{j=1}^{n} \frac{(\alpha_j + m_j - 1)!}{(m_j - k_j)!} \propto \frac{\prod_{j=1}^{n} \binom{(\alpha_j + k_j) + (m_j - k_j) - 1}{m_j - k_j}}{\binom{(u + k) + (z - k) - 1}{z - k}},$
where the last expression is in fact equal to the posterior probability. Thus, Equation (31) holds.
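The conjugacy in Equation (31) can be checked by brute force on a small case. The Python sketch below enumerates every $(m_1, \ldots, m_n) \in \ddot{\Delta}_z^{n-1}$, applies Bayes' rule with the multivariate hypergeometric likelihood and compares the result with the shifted multivariate Polya probabilities. All names and parameter values are hypothetical. The sketch assumes the multivariate Polya probability $\prod_j \binom{\alpha_j + m_j - 1}{m_j} / \binom{u + z - 1}{z}$, which is consistent with the proportionality steps above.

```python
import itertools
import math


def log_gen_binom(a, x):
    """log binom(a + x - 1, x) for real a > 0 and integer x >= 0."""
    return math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)


def multi_polya_pmf(x, alpha):
    """Multi-Polya(sum(x); alpha) probability of x: prod_j binom(alpha_j + x_j - 1, x_j) / binom(u + N - 1, N)."""
    return math.exp(sum(log_gen_binom(a, xj) for a, xj in zip(alpha, x))
                    - log_gen_binom(sum(alpha), sum(x)))


def compositions(total, parts):
    """All vectors of `parts` nonnegative integers summing to `total` (stars and bars)."""
    slots = total + parts - 1
    for bars in itertools.combinations(range(slots), parts - 1):
        edges = (-1,) + bars + (slots,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(parts))


z, alpha, kvec = 6, (0.5, 1.5, 2.0), (1, 0, 2)   # hypothetical prior and observed counts
k = sum(kvec)

# Bayes rule: prior(m) * hypergeometric likelihood(kvec | m), then normalize
post = {}
for mvec in compositions(z, len(alpha)):
    if all(kj <= mj for kj, mj in zip(kvec, mvec)):
        lik = math.prod(math.comb(mj, kj) for mj, kj in zip(mvec, kvec)) / math.comb(z, k)
        post[mvec] = multi_polya_pmf(mvec, alpha) * lik
norm = sum(post.values())
post = {mv: w / norm for mv, w in post.items()}

# Conjugacy (Equation (31)): M - kvec ~ Multi-Polya(z - k; alpha + kvec)
alpha_post = tuple(a + kj for a, kj in zip(alpha, kvec))
max_diff = max(abs(w - multi_polya_pmf(tuple(mj - kj for mj, kj in zip(mv, kvec)), alpha_post))
               for mv, w in post.items())
print(len(post), max_diff)
```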
The underlying urn model is the sampling without replacement described in Section 2.2, where the initial numbers of balls of the colors $C_1, \ldots, C_n$, viz. $m_j = a_{j,0}$, for $j = 1, \ldots, n$, in the same order, are unknown. Only $z = \sum_{j=1}^{n} a_{j,0}$ is known. These initial counts are the elements of the random vector $(M_1, \ldots, M_n)$ with prior distribution in Equation (30). Sampling without replacement has led to the counts $(Y_1, \ldots, Y_n) = (k_1, \ldots, k_n)$ for the colors $C_1, \ldots, C_n$, in the same order. The updated, or posterior, distribution of $(M_1, \ldots, M_n)$ is given by Equation (31).
Assume that we are interested in the entropy of the probabilities of the initial coloration. The a priori entropy is thus $T_n(M_1, \ldots, M_n) = \varepsilon_n(M_1/z, \ldots, M_n/z)$, cf. Equation (23). According to Equation (31), the a posteriori entropy is $T_n(k_1 + L_1, \ldots, k_n + L_n)$, where $(L_1, \ldots, L_n) \sim \text{Multi-Polya}(z - k; \alpha_1 + k_1, \ldots, \alpha_n + k_n)$.
The saddlepoint approximations to the distributions of the a priori and a posteriori entropies can be obtained by the saddlepoint approximation of Section 3 with MP-NB representation, as in Example 6. The a priori and a posteriori c.g.f. can be obtained by minor adaptations of the c.g.f. in Equation (29).
Example 8 (Two-sample tests based on spacing frequencies).
Consider two independent samples: the first consisting of k independent random variables $U_1, \ldots, U_k$ with common absolutely continuous distribution $P_U$ and the second consisting of l independent random variables $V_1, \ldots, V_l$ with common absolutely continuous distribution $P_V$. All these random variables have common range given by the real interval $I$. We wish to test the null hypothesis $H_0: P_U = P_V$. Define $V_{(0)} = \inf I$, $V_{(l+1)} = \sup I$ and let $V_{(1)} \leq \ldots \leq V_{(l)}$ be the ordered $V_1, \ldots, V_l$. Let $n = l + 1$. The random counts
$Y_j = \sum_{i=1}^{k} I\{U_i \in [V_{(j-1)}, V_{(j)})\}, \quad \text{for } j = 1, \ldots, n,$
are called spacing-frequencies: they give the numbers of the random variables $U_1, \ldots, U_k$ that fall in the gaps between consecutive values of $V_{(0)}, \ldots, V_{(l+1)}$. Thus, $(Y_1, \ldots, Y_n)$ takes values in $\ddot{\Delta}_k^{n-1}$ and has exchangeable components under $H_0$.
Denote by $R_j$ the rank in the combined sample of $V_{(j)}$, the jth smallest of $V_1, \ldots, V_l$, for $j = 1, \ldots, l$. It is easily seen that $R_j = \sum_{i=1}^{j} (Y_i + 1)$, or $Y_j = R_j - R_{j-1} - 1$, with $R_0 = 0$, for $j = 1, \ldots, l$. Consequently, many two-sample test statistics based on ranks can be re-expressed in terms of spacing-frequencies. Besides this, spacing-frequencies are essential for the analysis of circular data, because they are invariant w.r.t. changes of null direction and sense of rotation (clockwise or anti-clockwise) (for a review, see, e.g., ). Circular data are planar directions and can be re-expressed as angles in radians, so that $I = [0, 2\pi)$, or any other interval of length $2\pi$.
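The definition of spacing-frequencies and the rank identity $R_j = \sum_{i=1}^{j} (Y_i + 1)$ can be checked on simulated data with the Python sketch below, which assumes NumPy. The function name is hypothetical. The sketch takes the range of the samples as the whole real line, so $V_{(0)}$ and $V_{(l+1)}$ play the role of $\mp\infty$.

```python
import numpy as np


def spacing_frequencies(u, v):
    """Y_j = #{i : V_(j-1) <= U_i < V_(j)}, j = 1..l+1, with V_(0), V_(l+1) the ends of the range."""
    v_sorted = np.sort(v)
    # Number of V's not larger than U_i = index (j - 1) of the spacing containing U_i
    spacing = np.searchsorted(v_sorted, u, side="right")
    return np.bincount(spacing, minlength=len(v) + 1)


rng = np.random.default_rng(2)
k, l = 12, 5
u, v = rng.uniform(size=k), rng.uniform(size=l)   # continuous draws, so no ties (a.s.)
y = spacing_frequencies(u, v)

# Rank of V_(j) in the combined sample: 1 + number of observations smaller than V_(j)
combined = np.sort(np.concatenate([u, v]))
ranks = np.searchsorted(combined, np.sort(v), side="left") + 1
print(y, ranks, np.cumsum(y[:l] + 1))   # R_j = sum_{i<=j} (Y_i + 1)
```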
Holst and Rao  consider nonparametric test statistics of the form of
$T_n(Y_1, \ldots, Y_n) = \sum_{j=1}^{n} h_j(Y_j),$
for some Borel functions $h_1, \ldots, h_n$. If $h_1 = \ldots = h_n = h$, then the test statistic $T_n$ is called symmetric. Under $H_0$, the multivariate Polya distribution in Equation (5) holds with $m_1 = \ldots = m_n = 1$. Consequently, $u = \sum_{j=1}^{n} m_j = n$ and all Polya probabilities in Equation (5) are equal to $\binom{n+k-1}{k}^{-1}$. This agrees with the combinatorial result that the number of solutions $(k_1, \ldots, k_n) \in \mathbb{N}^n$ of the equation $k_1 + \ldots + k_n = k$, i.e., $\operatorname{card} \ddot{\Delta}_k^{n-1}$, is given by $\binom{n+k-1}{k}$. Thus, the equivalence in Equation (2) holds with the MP-NB representation, where the negative binomial reduces to the geometric distribution. Clearly, Equation (33) takes the form of the M-statistic in Equation (1), with $\xi_j(y; t) = h_j(y) - t/n$, and the saddlepoint approximation of Section 3 can be applied.
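Under $H_0$, the order of the combined sample is a uniformly random arrangement of k U's and l V's. The Python sketch below (hypothetical function name) enumerates these $\binom{k+l}{l}$ arrangements for a small case. It checks that each point of $\ddot{\Delta}_k^{n-1}$ arises from exactly one arrangement, which gives the uniform probabilities stated above.

```python
import itertools
import math
from collections import Counter


def spacing_frequency_vectors(k, l):
    """Spacing-frequency vector (Y_1..Y_{l+1}) of each arrangement of k U's and l V's."""
    vectors = Counter()
    for v_positions in itertools.combinations(range(k + l), l):
        v_set = set(v_positions)
        y, run = [], 0
        for pos in range(k + l):
            if pos in v_set:       # a V closes the current spacing
                y.append(run)
                run = 0
            else:                  # a U falls in the current spacing
                run += 1
        y.append(run)              # U's above the largest V
        vectors[tuple(y)] += 1
    return vectors


k, l = 4, 3
n = l + 1
vectors = spacing_frequency_vectors(k, l)
# Each configuration in the simplex arises from exactly one equally likely arrangement
print(len(vectors), math.comb(n + k - 1, k), set(vectors.values()))
```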
We now summarize the examples presented in [3,5]. In the classical Wald–Wolfowitz run test, $T_n$ takes the symmetric form of Equation (33) with $h(x) = I\{x > 0\}$. We define a U-run in the combined sample as a maximal non-empty set of adjacent $U_1, \ldots, U_k$. Since each positive $Y_j$ corresponds to a different U-run and conversely, $T_n$ is the number of U-runs and takes values in $\{1, \ldots, n\}$. Large values of $T_n$ show evidence for equal spread, i.e., for $H_0$.  provides the numerical evaluation of the saddlepoint approximation to the distribution of $T_n$ under $H_0$. The saddlepoint approximations to the distributions of the Wilcoxon (viz. Mann–Whitney), the van der Waerden (viz. normal score) and the Savage (viz. exponential score) tests are developed in . The numerical study of Savage’s test appears in . In the context of directional data, a generalization of Rao’s spacings tests (see Section 4.4) to spacing-frequencies, together with its saddlepoint approximation, is given in .
The so-called multispacing-frequencies are obtained from gaps of order larger than one made by $V_{(0)}, \ldots, V_{(l+1)}$. Let $g \in \mathbb{N}^*$ denote the gap order, such that $n = (l+1)/g$ is an integer. Then, the multispacing-frequencies are defined by
$Y_j = \sum_{i=1}^{k} I\{U_i \in [V_{((j-1)g)}, V_{(jg)})\}, \quad \text{for } j = 1, \ldots, n.$
In the case $g = 1$, Equation (34) coincides with the spacing-frequencies in Equation (32). As before with $g = 1$, $(Y_1, \ldots, Y_n)$ takes values in $\ddot{\Delta}_k^{n-1}$. We reconsider the null hypothesis $H_0: P_U = P_V$ and the general test statistics in Equation (33), now with the multispacing-frequencies in Equation (34). Under $H_0$, the multivariate Polya distribution in Equation (5) holds with $m_1 = \ldots = m_n = g$ and $u = \sum_{j=1}^{n} m_j = ng$, and the MP-NB representation applies.
The saddlepoint approximation with MP-NB representation was analyzed by Gatto and Jammalamadaka  in the context of the asymptotically most powerful multispacing-frequencies test against a specific sequence of alternative distributions and also in the context of the test statistic defined by the sum of squared multispacing-frequencies.
It seems difficult to formulate an arbitrary alternative hypothesis in terms of a particular multivariate Polya distribution, for the multispacing-frequencies. In this sense, the conditional saddlepoint approximation with the MP-NB representation may not be easily applied to power computations.

4.4. D-G Representation

Example 9 of this section analyzes the most powerful test of symmetry of the Dirichlet distribution. The saddlepoint approximation based on the D-G representation to the distribution of the test statistic under an asymmetric alternative is developed and its numerical accuracy is studied. The Dirichlet distribution, associated with the multinomial distribution, forms an important conjugate class in Bayesian statistics. This is illustrated in Example 10, which presents a Bayesian bootstrap test on the entropy. The D-G representation with the conditional saddlepoint approximation allows one to compute the Bayes factor of the test without resampling. Another important application of the saddlepoint approximation with the D-G representation is to the class of one-sample tests based on spacings. This class of nonparametric tests is presented in Example 11 and has some similarities with the two-sample tests based on spacing-frequencies of Example 8. Example 11 also summarizes the applications of this saddlepoint approximation to tests based on spacings that can be found in the literature.
Example 9 (Test for Dirichlet’s symmetry).
The symmetric Dirichlet distribution is obtained by setting $a_1 = \ldots = a_n = a$ in Equation (9), for any $a \in \mathbb{R}_+^*$. In Bayesian statistics, symmetric priors are of particular interest in the absence of prior knowledge about the individual elements, because these elements then become exchangeable random variables. The single parameter a becomes a concentration parameter: $a = 1$ yields the uniform distribution over $\Delta_1^{n-1}$ (thus, the noninformative prior); $a > 1$ yields a concave density over $\Delta_1^{n-1}$ (thus, promoting similarity of elements); and $a < 1$ yields a convex density over $\Delta_1^{n-1}$ (thus, promoting dissimilarity of elements). For $(\bar{Y}_1, \ldots, \bar{Y}_n) \sim \text{Dirichlet}(a_1, \ldots, a_n)$, consider testing a particular symmetric distribution against a particular asymmetric alternative. Precisely, given $a, \alpha_1, \ldots, \alpha_n \in \mathbb{R}_+^*$, where at least one of the values $\alpha_1, \ldots, \alpha_n$ differs from the other ones, consider $H_0: a_1 = \ldots = a_n = a$ against $H_1: (a_1, \ldots, a_n) = (\alpha_1, \ldots, \alpha_n)$. The test of uniformity is obtained with $a = 1$. The Neyman–Pearson lemma implies that the most powerful test rejects $H_0$ when $T_n > t$, where $T_n$, viz. $T_n(\bar{Y}_1, \ldots, \bar{Y}_n)$, is given by
$T_n(\bar{Y}_1, \ldots, \bar{Y}_n) = \sum_{j=1}^{n} (\alpha_j - a) \log \bar{Y}_j.$
It is the M-statistic in Equation (1) with $\xi_j(y; t) = (\alpha_j - a) \log y - t/n$, for $j = 1, \ldots, n$. From the D-G representation and some algebraic manipulations, the c.g.f. in Equation (12) becomes
$K_n(v; s) = -s_1 v_1 - n s_2 v_2 + \tilde{\alpha} \log q - \log(q - v_2)\left\{\tilde{\alpha} + (\tilde{\alpha} - na) v_1\right\} + \sum_{j=1}^{n} \left\{\log \Gamma\left(\alpha_j + [\alpha_j - a] v_1\right) - \log \Gamma(\alpha_j)\right\},$
where $\tilde{\alpha} = \sum_{j=1}^{n} \alpha_j$ and $q \in \mathbb{R}_+^*$ is arbitrary. We set $s_2 = 1/n$ and select q such that $E[S_{2,n}] = 1/n$, i.e., $q = \tilde{\alpha}$. The marginal saddlepoint equation, cf. Equation (14), then has $\beta = 0$ as its solution. This leads to
$K_n\left(v; s_1, \frac{1}{n}\right) = -s_1 v_1 - v_2 + \tilde{\alpha} \log \tilde{\alpha} - \log(\tilde{\alpha} - v_2)\left\{\tilde{\alpha} + (\tilde{\alpha} - na) v_1\right\} + \sum_{j=1}^{n} \left\{\log \Gamma\left(\alpha_j + [\alpha_j - a] v_1\right) - \log \Gamma(\alpha_j)\right\}.$
The second-order derivatives of $K_n$ can be expressed in terms of the polygamma functions $\psi^{(i)}(z) = (d/dz)^{i+1} \log \Gamma(z)$, for $i = 0, 1$. We skip the details but note that $K''_{2,n}(0; (s_1, 1/n)) = \tilde{\alpha}^{-1}$.
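The value $\tilde{\alpha}^{-1}$ follows because, at $v_1 = 0$, the c.g.f. reduces to $-v_2 + \tilde{\alpha} \log \tilde{\alpha} - \tilde{\alpha} \log(\tilde{\alpha} - v_2)$. The Python sketch below (hypothetical function name) evaluates the full c.g.f. with `math.lgamma` for the setting $a = 1$, $\alpha_j = j$, $n = 5$ of the numerical illustration. It then checks $K_n(0) = 0$, the vanishing first derivative in $v_2$ and the second derivative $1/\tilde{\alpha} = 1/15$ by finite differences.

```python
import math


def cgf_dirichlet(v1, v2, s1, alpha, a):
    """K_n(v; s_1, 1/n) of Example 9 (D-G representation, q = alpha_tilde).

    Requires v2 < alpha_tilde and alpha_j + (alpha_j - a) v1 > 0 for all j.
    """
    n, at = len(alpha), sum(alpha)
    total = -s1 * v1 - v2 + at * math.log(at)
    total -= math.log(at - v2) * (at + (at - n * a) * v1)
    total += sum(math.lgamma(aj + (aj - a) * v1) - math.lgamma(aj) for aj in alpha)
    return total


alpha, a, s1, h = [1.0, 2.0, 3.0, 4.0, 5.0], 1.0, 0.0, 1e-3
k0 = cgf_dirichlet(0.0, 0.0, s1, alpha, a)
d1 = (cgf_dirichlet(0.0, h, s1, alpha, a) - cgf_dirichlet(0.0, -h, s1, alpha, a)) / (2 * h)
d2 = (cgf_dirichlet(0.0, h, s1, alpha, a) - 2 * k0 + cgf_dirichlet(0.0, -h, s1, alpha, a)) / h**2
print(k0, d1, d2, 1 / sum(alpha))   # d2 should be close to 1 / alpha_tilde
```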
In the following numerical illustration, $a = 1$ and $\alpha_j = j$, for $j = 1, \ldots, 5$, so $n = 5$. The saddlepoint approximation is computed under $H_1$, so it gives the power of the test. It is compared with the Monte Carlo distribution of $T_5$ based on $10^6$ simulations. The numerical results are displayed in Table 4. The probabilities obtained by simulation are denoted $P_E[T_5 < t]$, the probabilities obtained by the saddlepoint approximation are denoted $P_S[T_5 < t]$ and $\mathrm{re}(t)$ denotes the absolute relative error given in Equation (22), for t in the lower and in the upper tails of the distribution. The relative errors in both tails do not exceed 7%.
Example 10 (Bayesian bootstrap and Bayesian entropy test).
In Bayesian statistics, Dirichlet and multinomial distributions are conjugate: a Dirichlet prior and a multinomial likelihood lead to a Dirichlet posterior. Precisely, if
$(\bar{Y}_1, \ldots, \bar{Y}_n) \sim \text{Dirichlet}(a_1, \ldots, a_n)$
and
$\{(Y_1, \ldots, Y_n) \mid (\bar{Y}_1, \ldots, \bar{Y}_n) = (\bar{y}_1, \ldots, \bar{y}_n)\} \sim \text{Multinomial}(k; \bar{y}_1, \ldots, \bar{y}_n),$
then
$\{(\bar{Y}_1, \ldots, \bar{Y}_n) \mid (Y_1, \ldots, Y_n) = (y_1, \ldots, y_n)\} \sim \text{Dirichlet}(a_1 + y_1, \ldots, a_n + y_n),$
$\forall a_1, \ldots, a_n \in \mathbb{R}_+^*$ and $(y_1, \ldots, y_n) \in \ddot{\Delta}_k^{n-1}$.
The Bayesian bootstrap was introduced by Rubin  as a method for approximating the posterior distribution of a random parameter; precisely the distribution of a function of $Y ¯ 1 , … , Y ¯ n$, given the observed data $( Y 1 , … , Y n ) = ( y 1 , … , y n )$. It consists in sampling of $( Y ¯ 1 , … , Y ¯ n )$ from Equation (38). This can be done by generating $Z j ∼$Gamma$( a j + y j , q )$, for $j = 1 , … , n$, independently, and by setting
$\bar{Y}_j = \frac{Z_j}{\sum_{i=1}^{n} Z_i}, \quad \text{for } j = 1, \dots, n.$
The value of $q \in \mathbb{R}_+^*$ is irrelevant, because a common scale factor of $Z_1, \dots, Z_n$ cancels in the ratio. Details can be found in Section 10.5 of . If the parameter of interest $T_n(\bar{Y}_1, \dots, \bar{Y}_n)$ admits the M-statistic representation in Equation (1), then the saddlepoint approximation with the D-G representation can be used instead of this sampling method.
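A minimal NumPy sketch of this Bayesian bootstrap draw via normalized gamma variables follows. The function name and the example values of $a_j$ and $y_j$ are hypothetical. NumPy parametrizes the gamma distribution by shape and scale, and $q$ is taken here as the scale; as stated above, it does not affect $\bar{Y}$ because it cancels in the normalization.

```python
import numpy as np


def bayesian_bootstrap(a, y, size, rng, q=1.0):
    """Draw `size` vectors from Dirichlet(a_1 + y_1, ..., a_n + y_n).

    Z_j ~ Gamma(shape = a_j + y_j, scale = q), independently, and
    Ybar_j = Z_j / sum_i Z_i (the value of q cancels).
    """
    shape = np.asarray(a, dtype=float) + np.asarray(y, dtype=float)
    z = rng.gamma(shape, q, size=(size, shape.size))
    return z / z.sum(axis=1, keepdims=True)


a = np.array([1.0, 1.0, 1.0])  # hypothetical prior parameters
y = np.array([6, 2, 1])        # hypothetical observed counts, k = 9
draws = bayesian_bootstrap(a, y, size=100_000, rng=np.random.default_rng(1))
print(draws.mean(axis=0))      # close to (a + y) / sum(a + y) = (7, 3, 2) / 12
```

Any function of the rows of `draws`, for example an M-statistic $T_n$, then yields a Monte Carlo approximation to its posterior distribution.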
Consider now the urn model of Section 2.2 with sampling with replacement, where the probability of drawing a ball of color $C_j$ is given by the random variable $\bar{Y}_j$, for $j = 1, \dots, n$. We are interested in the entropy $\varepsilon_n(\bar{Y})$, viz. Equation (23) as a function of $\bar{Y} = (\bar{Y}_1, \dots, \bar{Y}_n)$. According to Equation (10), $\varepsilon_n(\bar{Y})$ is the entropy of the sample proportions of colors $C_1, \dots, C_n$ under Polya’s sampling at steady state. Thus, $a_j = a_{j,0}/r$, for $j = 1, \dots, n$; cf. Section 2.3. We consider the Bayesian testing problem $H_0: \{\varepsilon_n(\bar{Y}) \in [\varepsilon_0, \log n]\}$ against $H_1: \{\varepsilon_n(\bar{Y}) \in [0, \varepsilon_0)\}$, for some $\varepsilon_0 \in (0, \log n)$. Then, $\rho_0 = P[\varepsilon_n(\bar{Y}) \ge \varepsilon_0]$ and $\rho_1 = P[\varepsilon_n(\bar{Y}) < \varepsilon_0]$ are the prior probabilities of $H_0$ and $H_1$, respectively. The corresponding posterior probabilities are $r_0(y) = P[\varepsilon_n(\bar{Y}) \ge \varepsilon_0 \mid Y = y]$ and $r_1(y) = P[\varepsilon_n(\bar{Y}) < \varepsilon_0 \mid Y = y]$, where $Y = (Y_1, \dots, Y_n)$ and $y = (y_1, \dots, y_n) \in \ddot{\Delta}_k^{n-1}$. The Bayes factor of $H_0$ to $H_1$ is the posterior odds ratio $r_0(y)/r_1(y)$ divided by the prior odds ratio $\rho_0/\rho_1$, namely $\varphi(y) = \rho_1 r_0(y)/\{\rho_0 r_1(y)\}$. The Monte Carlo solution consists of sampling $(\bar{Y}_1, \dots, \bar{Y}_n)$ from the prior in Equation (37) and then from the posterior in Equation (38), at both levels by means of Equation (39). The proportions of prior draws with $\varepsilon_n(\bar{Y}) \ge \varepsilon_0$ and $\varepsilon_n(\bar{Y}) < \varepsilon_0$ estimate $\rho_0$ and $\rho_1$, the corresponding proportions of posterior draws are Bayesian bootstrap estimators of $r_0(y)$ and $r_1(y)$, and together they allow for the evaluation of $\varphi(y)$. Alternatively, these values can be obtained without repeated sampling by using the conditional saddlepoint approximation of Section 3 with the D-G representation.
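The Monte Carlo solution can be sketched as follows. This is a hedged illustration, not the author's program: it assumes that the entropy of Equation (23) is the Shannon entropy $-\sum_j \bar{Y}_j \log \bar{Y}_j$ with the natural logarithm (consistent with the range $[0, \log n]$), and the prior parameters, counts and threshold $\varepsilon_0$ are hypothetical. Prior and posterior draws both use normalized gamma variables.

```python
import numpy as np
from scipy.special import entr  # entr(p) = -p * log(p), with entr(0) = 0


def dirichlet_draws(shape, size, rng):
    """Normalized independent gamma variables, i.e. Dirichlet(shape) draws."""
    shape = np.asarray(shape, dtype=float)
    z = rng.gamma(shape, 1.0, size=(size, shape.size))
    return z / z.sum(axis=1, keepdims=True)


def entropy(p):
    """Shannon entropy of each row (assumed form of Equation (23))."""
    return entr(p).sum(axis=1)


def bayes_factor(a, y, eps0, size=100_000, seed=0):
    """Monte Carlo estimate of phi(y) = rho1 * r0(y) / (rho0 * r1(y))."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    prior = entropy(dirichlet_draws(a, size, rng))                 # prior level
    post = entropy(dirichlet_draws(a + np.asarray(y), size, rng))  # posterior level
    rho0 = np.mean(prior >= eps0)
    rho1 = 1.0 - rho0
    r0 = np.mean(post >= eps0)
    r1 = 1.0 - r0
    if min(rho0, rho1, r1) == 0.0:
        raise ValueError("no draws in one region; increase size")
    return rho1 * r0 / (rho0 * r1)


# Hypothetical example: n = 4 colors, observed counts concentrated on C_1.
phi = bayes_factor(a=[1, 1, 1, 1], y=[20, 0, 0, 0], eps0=1.0)
print(phi)  # well below 1: these data support the low-entropy hypothesis H_1
```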
Example 11 (Tests based on spacings).
The so-called spacings are the first order differences, or gaps, between successive values of the ordered sample. Let $U_1, \dots, U_l$ be absolutely continuous and i.i.d. over $[0, 1]$ (without loss of generality, by the probability integral transform), let $0 \le U_{(1)} \le \dots \le U_{(l)} \le 1$ denote the ordered sample and let $U_{(0)} = 0$ and $U_{(l+1)} = 1$. For $n = l + 1$, the spacings are defined by
$\bar{Y}_j = U_{(j)} - U_{(j-1)}, \quad \text{for } j = 1, \dots, n.$
Thus, $(\bar{Y}_1, \dots, \bar{Y}_n)$ takes values in $\Delta_1^{n-1}$. Statistics defined as functions of spacings are used in various statistical problems, goodness-of-fit testing being the most important (see, e.g., ). Spacings are essential in the analysis of circular data, because they form a maximal invariant w.r.t. changes of null direction and sense of rotation. For Borel functions $h_j$, $j = 1, \dots, n$, important spacings statistics have the form
$\sum_{j=1}^{n} h_j(n\bar{Y}_j).$
If $h_1 = \dots = h_n = h$, then the test statistic is called symmetric. Under the null hypothesis $H_0$ of uniformity of $U_1, \dots, U_l$, the D-G representation holds with $a_1 = \dots = a_n = 1$, so that the $n$ spacings are equal in distribution to $n$ i.i.d. exponential random variables conditioned on their sum. Since Equation (41) takes the form of the M-statistic in Equation (1), the saddlepoint approximation of Section 3 can be applied directly.
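A small Monte Carlo check of the D-G representation under $H_0$ follows. It compares the spacings of $l$ ordered uniforms with $n = l + 1$ normalized i.i.d. standard exponentials through the mean of Greenwood's symmetric statistic $\sum_j (n\bar{Y}_j)^2$. Each spacing has a Beta$(1, n-1)$ marginal with $E[\bar{Y}_j^2] = 2/\{n(n+1)\}$, so the exact mean is $2n^2/(n+1)$. The helper names, sample sizes and seed are arbitrary choices for this sketch.

```python
import numpy as np


def spacings(u):
    """Spacings U_(j) - U_(j-1), j = 1..n, with U_(0) = 0 and U_(l+1) = 1 (last axis)."""
    u = np.sort(np.asarray(u, dtype=float), axis=-1)
    zeros = np.zeros(u.shape[:-1] + (1,))
    ones = np.ones(u.shape[:-1] + (1,))
    return np.diff(np.concatenate([zeros, u, ones], axis=-1), axis=-1)


def symmetric_statistic(ybar, h):
    """sum_j h(n * Ybar_j) for each row of ybar."""
    n = ybar.shape[-1]
    return h(n * ybar).sum(axis=-1)


def greenwood(x):
    return x ** 2


rng = np.random.default_rng(2019)
l, reps = 9, 200_000
n = l + 1

# Spacings of ordered uniforms under H0.
g_unif = symmetric_statistic(spacings(rng.uniform(size=(reps, l))), greenwood)
# D-G representation with a_1 = ... = a_n = 1: normalized exponentials.
e = rng.exponential(size=(reps, n))
g_expo = symmetric_statistic(e / e.sum(axis=1, keepdims=True), greenwood)

print(g_unif.mean(), g_expo.mean(), 2 * n ** 2 / (n + 1))
```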
The conditional saddlepoint approximation with the D-G representation under $H_0$ is analyzed numerically by  in the following cases: Rao’s spacings test (viz., $h_j(x) = |x - 1|/2$), the logarithmic spacings test (viz., $h_j(x) = \log x$), Greenwood’s test (viz., $h_j(x) = x^2$) and a locally most powerful spacings test (viz., $h_j(x) = \Phi^{-1}(j/(n+1))\,x$), for $j = 1, \dots, n$ in each case. In the context of reliability, Gatto and Jammalamadaka  re-expressed a uniformly most powerful test of exponentiality, against alternatives with increasing failure rate, in terms of spacings. They obtained its saddlepoint approximation and showed some numerical comparisons.
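The four statistics above are all of the form $\sum_j h_j(n\bar{Y}_j)$ and can be computed directly. In this sketch `scipy.stats.norm.ppf` plays the role of $\Phi^{-1}$, and the helper names are hypothetical. For equally spaced data, where every $n\bar{Y}_j = 1$, Rao’s and the logarithmic statistics vanish, Greenwood’s statistic equals $n$, and the locally most powerful statistic is zero by symmetry of the normal quantiles.

```python
import numpy as np
from scipy.stats import norm


def spacings(u):
    """Spacings of a sample on [0, 1], including both end gaps."""
    u = np.sort(np.asarray(u, dtype=float))
    return np.diff(np.concatenate(([0.0], u, [1.0])))


def spacings_statistics(u):
    """Return sum_j h_j(n * Ybar_j) for the four choices of h_j listed above."""
    y = spacings(u)
    n = y.size
    x = n * y                     # normalized spacings n * Ybar_j
    j = np.arange(1, n + 1)
    return {
        "rao": np.sum(np.abs(x - 1.0) / 2.0),
        "log": np.sum(np.log(x)),
        "greenwood": np.sum(x ** 2),
        "lmp": np.sum(norm.ppf(j / (n + 1)) * x),
    }


l = 7
equal = np.arange(1, l + 1) / (l + 1)  # U_(j) = j / (l + 1), so all n * Ybar_j = 1
print(spacings_statistics(equal))
```

Because $\sum_j n\bar{Y}_j = n$, Jensen’s and the Cauchy–Schwarz inequalities give $\sum_j \log(n\bar{Y}_j) \le 0$ and $\sum_j (n\bar{Y}_j)^2 \ge n$ for any sample, which the equally spaced case attains.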
These spacings can be generalized to higher order differences or gaps. Let $g \in \mathbb{N}^*$ denote the gap order, selected such that $n = (l + 1)/g \in \mathbb{N}^*$. The so-called multispacings are defined as
$\bar{Y}_j = U_{(jg)} - U_{((j-1)g)}, \quad \text{for } j = 1, \dots, n.$
As previously, $(\bar{Y}_1, \dots, \bar{Y}_n)$ takes values in $\Delta_1^{n-1}$. When $g = 1$, the random variables in Equation (42) coincide with the spacings in Equation (40). Under $H_0$, the D-G representation holds with $a_1 = \dots = a_n = g$.
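Multispacings can be computed from the extended ordered sample $U_{(0)} = 0 \le U_{(1)} \le \dots \le U_{(l)} \le U_{(l+1)} = 1$ by keeping every $g$th point. The helper below is hypothetical. It checks that $g$ divides $l + 1$ and reduces to ordinary spacings when $g = 1$. A Monte Carlo check then compares the variance of $\bar{Y}_1$ with that of the Beta$(g, (n-1)g)$ marginal implied by the Dirichlet$(g, \dots, g)$ representation under $H_0$, namely $(n-1)/\{n^2(ng+1)\}$.

```python
import numpy as np


def multispacings(u, g):
    """Ybar_j = U_(jg) - U_((j-1)g), j = 1..n, with n = (l + 1) / g (last axis)."""
    u = np.sort(np.asarray(u, dtype=float), axis=-1)
    l = u.shape[-1]
    if (l + 1) % g != 0:
        raise ValueError("the gap order g must divide l + 1")
    pad0 = np.zeros(u.shape[:-1] + (1,))
    pad1 = np.ones(u.shape[:-1] + (1,))
    ext = np.concatenate([pad0, u, pad1], axis=-1)  # U_(0), ..., U_(l+1)
    return np.diff(ext[..., ::g], axis=-1)          # keep indices 0, g, ..., ng


rng = np.random.default_rng(3)
l, g, reps = 11, 3, 200_000   # n = (l + 1) / g = 4
n = (l + 1) // g
ybar = multispacings(rng.uniform(size=(reps, l)), g)
print(ybar.var(axis=0)[0], (n - 1) / (n ** 2 * (n * g + 1)))
```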
Gatto and Jammalamadaka  provided explicit formulae for the saddlepoint approximations of Rao’s multispacings test and of the logarithmic multispacings test, together with a numerical study.
A further problem is the computation of the distribution of a spacings or multispacings test statistic under a non-uniform alternative distribution. This can be done by saddlepoint approximation with the D-G representation whenever one can find parameters $a_1, \dots, a_n \in \mathbb{R}_+^*$ such that, under the alternative distribution, the spacings or multispacings satisfy $(\bar{Y}_1, \dots, \bar{Y}_n) \sim \mathrm{Dirichlet}(a_1, \dots, a_n)$. This would give the power of the test. However, re-expressing a non-uniform distribution in terms of a particular Dirichlet distribution does not appear practical in general.

5. Final Remarks

This article presents the saddlepoint approximation for M-statistics of dependent random variables taking values in a simplex. Four conditional representations that allow re-expressing these dependent random variables as independent ones are presented. A detailed presentation of the underlying urn sampling model that is common to all four conditional representations is given. Important applications are reviewed. New applications are presented with some numerical comparisons between this saddlepoint approximation and Monte Carlo simulation. The numerical accuracy of the saddlepoint approximation appears very good.
A practical question concerns the relative advantages and disadvantages of the conditional saddlepoint approximation presented in this article, given that tail probabilities can also be computed quickly and more simply by Monte Carlo simulation. There is no single answer to this general question, because several aspects must be considered.
First, when very small tail probabilities (e.g., $10^{-4}$) or extreme quantiles are required, the simple Monte Carlo method used in this article may not lead to accurate results. The reason is that the saddlepoint approximation is a large deviation technique, with bounded relative error everywhere in the tails, whereas simple Monte Carlo has unbounded relative error in the tails. In fact, simple Monte Carlo is not even logarithmically efficient. This is well explained in  (pp. 158–160). Obtaining bounded relative error requires importance sampling, and the mathematical complexity then becomes close to that of the saddlepoint approximation. Moreover, computing quantiles by importance sampling may not be straightforward, whereas, as shown above, this is quite simple with the saddlepoint approximation.
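A toy illustration of this relative-error argument follows; it is not taken from the article. For $X \sim$ Exp(1) and $p = P[X > x] = e^{-x}$, the simple Monte Carlo estimator based on $N$ draws has relative standard error $\sqrt{(1-p)/(Np)}$, which grows like $e^{x/2}$. The importance sampling estimator sketched here draws from an exponential density with mean $x$, an exponential tilting choice made only for simplicity. Its relative error grows only like $\sqrt{x}$: this is logarithmic efficiency, not bounded relative error.

```python
import numpy as np


def crude_relative_error(p, n_sim):
    """Relative standard error of the simple Monte Carlo estimator of p."""
    return np.sqrt((1.0 - p) / (n_sim * p))


def is_tail_estimate(x, n_sim, rng):
    """Importance sampling estimate of P[X > x] for X ~ Exp(1).

    Proposal: exponential with rate theta = 1/x (mean x), so that the
    event {X > x} is no longer rare under the proposal.
    """
    theta = 1.0 / x
    w = rng.exponential(scale=x, size=n_sim)
    weights = np.exp(-(1.0 - theta) * w) / theta   # density ratio f(w) / g(w)
    terms = (w > x) * weights
    est = terms.mean()
    rel_se = terms.std(ddof=1) / (np.sqrt(n_sim) * est)
    return est, rel_se


rng = np.random.default_rng(0)
for x in (5.0, 10.0, 20.0):
    p = np.exp(-x)
    est, rel_se = is_tail_estimate(x, 100_000, rng)
    print(f"x={x:4.1f} p={p:.3e} crude rel. SE={crude_relative_error(p, 100_000):.2e} "
          f"IS est={est:.3e} IS rel. SE={rel_se:.2e}")
```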
The computations required for this article were done with Matlab (R2017b, The MathWorks, Natick, MA, USA). The minimization program fminsearch was used to obtain the saddlepoint defined in Equation (13). All Matlab programs are available at http://www.stat.unibe.ch. They should be easy to use and to modify for new related applications.
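For readers without Matlab, a rough Python counterpart of this step uses `scipy.optimize.minimize` with `method="Nelder-Mead"`, a simplex search of the same family as `fminsearch`. The sketch rests on one assumption: that the saddlepoint of Equation (13) is the point where the gradient of $v \mapsto K_n(v;(s_1,k/n))$ vanishes, i.e. the minimizer of this convex function. It reuses the c.g.f. of the Dirichlet symmetry example (Table 4: $a = 1$, $\alpha_j = j$, $n = 5$, $\tilde{\alpha} = \sum_j \alpha_j$) with the arbitrary value $s_1 = -17$. The gradient formulas are our own differentiation of that c.g.f. This is not the author’s program, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import digamma, gammaln

A = 1.0
ALPHA = np.arange(1.0, 6.0)
AT = ALPHA.sum()   # tilde(alpha)
N = ALPHA.size


def cgf(v, s1):
    """K_n(v; (s1, k/n)) of the Dirichlet example; +inf outside its domain."""
    v1, v2 = v
    shape = ALPHA + (ALPHA - A) * v1
    total = AT + (AT - N * A) * v1
    if np.any(shape <= 0) or v2 >= AT or total <= 0:
        return np.inf
    return (-s1 * v1 - v2 + AT * np.log(AT) - np.log(AT - v2) * total
            + np.sum(gammaln(shape) - gammaln(ALPHA)))


def cgf_gradient(v, s1):
    """Analytic gradient of K_n, used to check the saddlepoint equation."""
    v1, v2 = v
    shape = ALPHA + (ALPHA - A) * v1
    g1 = -s1 - (AT - N * A) * np.log(AT - v2) + np.sum((ALPHA - A) * digamma(shape))
    g2 = -1.0 + (AT + (AT - N * A) * v1) / (AT - v2)
    return np.array([g1, g2])


s1 = -17.0
res = minimize(cgf, x0=np.zeros(2), args=(s1,), method="Nelder-Mead",
               options={"initial_simplex": np.array([[0.0, 0.0], [-0.2, 0.0], [0.0, 1.0]]),
                        "xatol": 1e-10, "fatol": 1e-12,
                        "maxiter": 5000, "maxfev": 20000})
print(res.x, cgf_gradient(res.x, s1))  # the gradient should be close to zero
```

A derivative-free simplex search avoids coding the gradient. Once the gradient is available, a root finder applied to it would be a natural alternative.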
It should also be mentioned that having an analytical expression, such as a saddlepoint approximation, for a quantity of interest can be advantageous. Monte Carlo and other purely numerical methods often do not provide such an expression. For example, the saddlepoint approximation can be used for computing the sensitivity of the upper tail probability, viz. the derivative of the tail probability w.r.t. a parameter of the model. Gatto and Peeters  proposed evaluating, with the saddlepoint approximation, the sensitivity of the tail probability of the random sum w.r.t. the parameter of the summation index distribution (which is either Poisson or geometric). They showed numerically that the sensitivities obtained by the saddlepoint approximation and by simulation with importance sampling are very close, but that this is no longer true when simulation is done without importance sampling. For computing sensitivities, importance sampling is significantly more computationally intensive than the saddlepoint approximation.
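A toy illustration of a sensitivity obtained from an analytical approximation follows. It does not reproduce the random-sum setting of Gatto and Peeters. For $S$ a sum of $m$ i.i.d. exponential variables with rate $\lambda$, the Lugannani–Rice formula gives $P[S > x]$ in closed form, and here its derivative w.r.t. $\lambda$ is taken numerically. The exact sensitivity, $-x\,g_m(\lambda x)$ with $g_m$ the standard gamma density of shape $m$, is used only for comparison. The parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import gamma, norm


def lr_tail(x, lam, m):
    """Lugannani-Rice approximation to P[S > x], S ~ Gamma(shape m, rate lam).

    Uses K(s) = -m log(1 - s/lam); requires x != m / lam (nonzero saddlepoint).
    """
    s_hat = lam - m / x                       # solves K'(s) = m / (lam - s) = x
    k_hat = -m * np.log(m / (lam * x))        # K(s_hat)
    w = np.sign(s_hat) * np.sqrt(2.0 * (s_hat * x - k_hat))
    u = s_hat * x / np.sqrt(m)                # s_hat * sqrt(K''(s_hat)), K'' = x^2 / m
    return norm.sf(w) + norm.pdf(w) * (1.0 / u - 1.0 / w)


def lr_sensitivity(x, lam, m, h=1e-6):
    """Derivative of the Lugannani-Rice tail w.r.t. lam, by central differences."""
    return (lr_tail(x, lam + h, m) - lr_tail(x, lam - h, m)) / (2.0 * h)


m, lam, x = 10, 1.0, 20.0
exact_tail = gamma.sf(x, a=m, scale=1.0 / lam)
exact_sens = -x * gamma.pdf(lam * x, a=m)     # d/d(lam) of P[S > x]
print(lr_tail(x, lam, m), exact_tail)
print(lr_sensitivity(x, lam, m), exact_sens)
```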
An application of the saddlepoint approximation that exploits a different conditional representation concerns the distribution of the inhomogeneous compound Poisson total claim amount under force of interest, in the context of insurance. It was suggested by  and the main idea is the following. The occurrence times of the individual claims, $0 \le T_1 \le T_2 \le \dots$, form an inhomogeneous Poisson process. Let $N_t$ denote the number of occurrences during the time interval $[0, t]$, for some $t > 0$. Then, $\forall n \in \mathbb{N}^*$,
$\{(T_1, \dots, T_{N_t}) \mid N_t = n\} \sim (Y_{(1)}, \dots, Y_{(n)}),$
where $Y_{(1)} \le \dots \le Y_{(n)}$ are the ordered values of some random variables $Y_1, \dots, Y_n$ that are nonnegative, i.i.d. and independent of $\{N_t\}_{t \ge 0}$. The individual claim amounts are represented by the random variables $X_1, X_2, \dots$ that are nonnegative, i.i.d. and independent of $\{N_t\}_{t \ge 0}$. Let $r \in \mathbb{R}$ denote the force of interest. The discounted total claim amount is $Z_t = \sum_{j=0}^{N_t} e^{r(t - T_j)} X_j$, for $T_0 = X_0 = 0$. Because this sum is invariant under permutations of its terms and the $X_j$ are i.i.d. and independent of the claim times, Equation (43) implies $Z_t \sim \sum_{j=0}^{N_t} e^{r(t - Y_j)} X_j$, for $Y_0 = 0$. The last random sum has a simple structure and its distribution can be computed by the saddlepoint approximation of .
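A Monte Carlo check of this representation follows, under hypothetical choices that do not come from the article: horizon $t = 1$, intensity $\lambda(s) = 20s$ (so $E[N_t] = 10$), exponential claims with mean 1 and force of interest $r = 0.05$. By the standard order-statistics property of Poisson processes, the $Y_j$ then have the normalized intensity $2s$ on $[0, 1]$ as density. One sampler simulates the claim times by thinning a homogeneous Poisson process; the other uses the i.i.d. $Y_j$ of the representation. Both should match the exact mean $E[Z_t] = 10 \cdot 2(e^{r} - 1 - r)/r^2$.

```python
import numpy as np

T, R = 1.0, 0.05          # horizon t (the formulas below assume T = 1) and force of interest r
LAM_MAX = 20.0            # upper bound of the intensity on [0, T], needed for thinning


def intensity(s):
    """Hypothetical inhomogeneous intensity lambda(s) = 20 s."""
    return 20.0 * s


def by_thinning(n_rep, rng):
    """Z_t from claim times simulated by thinning (sorting is not needed for the sum)."""
    m = rng.poisson(LAM_MAX * T, size=n_rep)          # candidate points per path
    path = np.repeat(np.arange(n_rep), m)
    s = rng.uniform(0.0, T, size=m.sum())
    keep = rng.uniform(size=s.size) < intensity(s) / LAM_MAX
    x = rng.exponential(1.0, size=s.size)             # claim amounts X_j
    terms = np.where(keep, np.exp(R * (T - s)) * x, 0.0)
    return np.bincount(path, weights=terms, minlength=n_rep)


def by_representation(n_rep, rng):
    """Z_t from N_t ~ Poisson(10) and i.i.d. Y_j with density 2 s on [0, 1]."""
    n = rng.poisson(10.0, size=n_rep)                 # Lambda(T) = 10 for T = 1
    path = np.repeat(np.arange(n_rep), n)
    y = T * np.sqrt(rng.uniform(size=n.sum()))        # inverse c.d.f. of density 2 s
    x = rng.exponential(1.0, size=n.sum())
    return np.bincount(path, weights=np.exp(R * (T - y)) * x, minlength=n_rep)


rng = np.random.default_rng(42)
exact = 10.0 * 2.0 * (np.exp(R) - 1.0 - R) / R ** 2
print(by_thinning(200_000, rng).mean(), by_representation(200_000, rng).mean(), exact)
```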
A technique that could exploit the four conditional representations of Section 2 for computing the conditional c.g.f. (and not the conditional saddlepoint approximation) can be found in . It is tentatively applied, with the MP-NB representation, to the symmetric spacing-frequencies test statistic in Equation (33) in  (Section 6.3.2). However, this approach seems impractical.
Another extension of the proposed approximation would concern neutrosophic statistics. In standard statistics, observations and parameters are represented by precise values, whereas in neutrosophic statistics they remain indeterminate (see, e.g., ).

Funding

This research received no external funding.

Acknowledgments

The author is thankful to three anonymous reviewers, to Sreenivasa Rao Jammalamadaka and to Ilya Molchanov for various discussions, remarks and suggestions that improved the quality of this article.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Aitchison, J. The Statistical Analysis of Compositional Data; Chapman & Hall: London, UK, 1986.
2. Kotz, S.; Balakrishnan, N. Advances in urn models during the past two decades. In Advances in Combinatorial Methods and Applications to Probability and Statistics; Birkhäuser, Statistics for Industry and Technology: Boston, MA, USA, 1997; pp. 203–257.
3. Gatto, R.; Jammalamadaka, S.R. A conditional saddlepoint approximation for testing problems. J. Am. Stat. Assoc. 1999, 94, 533–541.
4. Skovgaard, I.M. Saddlepoint expansions for conditional distributions. J. Appl. Prob. 1987, 24, 875–887.
5. Gatto, R. Symbolic computation for approximating the distributions of some families of one and two-sample nonparametric test statistics. Stat. Comput. 2000, 11, 449–455.
6. Gatto, R.; Jammalamadaka, S.R. A saddlepoint approximation for testing exponentiality against some increasing failure rate alternatives. Stat. Prob. Lett. 2002, 58, 71–81.
7. Gatto, R.; Jammalamadaka, S.R. Small sample asymptotics for higher order spacings. In Advances in Distribution Theory, Order Statistics and Inference Part III: Order Statistics and Applications; Birkhäuser, Statistics for Industry and Technology: Boston, MA, USA, 2006; pp. 239–252.
8. Butler, R.W. Saddlepoint Approximations with Applications; Cambridge University Press: Cambridge, UK, 2007.
9. Reid, N. The roles of conditioning in inference. Stat. Sci. 1995, 10, 138–157.
10. Mirakhmedov, S.M.; Jammalamadaka, S.R.; Ibrahim, B.M. On Edgeworth expansions in generalized urn models. J. Theor. Prob. 2014, 27, 725–753.
11. Butler, R.W.; Sutton, R.K. Saddlepoint approximation for multivariate cumulative distribution functions and probability computations in sampling theory and outlier testing. J. Am. Stat. Assoc. 1998, 93, 596–604.
12. Good, I.J. Saddlepoint methods for the multinomial distribution. Ann. Math. Stat. 1957, 28, 861–881.
13. Klugman, S.A.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions, 3rd ed.; Wiley & Sons: New York, NY, USA, 2008.
14. Ivchenko, G.I.; Ivanov, A.V. Decomposable statistics in inverse urn problems. Discr. Math. Appl. 1995, 5, 159–172.
15. Copson, E.T. Asymptotic Expansions; Cambridge University Press: Cambridge, UK, 1965.
16. De Bruijn, N.G. Asymptotic Methods in Analysis; Dover Publications: New York, NY, USA, 1981.
17. Daniels, H.E. Saddlepoint approximations in statistics. Ann. Math. Stat. 1954, 25, 631–650.
18. Lugannani, R.; Rice, S. Saddle point approximation for the distribution of the sum of independent random variables. Adv. Appl. Prob. 1980, 12, 475–490.
19. Daniels, H.E. Tail probability approximations. Int. Stat. Rev. 1987, 55, 37–48.
20. Wang, S. Saddlepoint approximations in conditional inference. J. Appl. Prob. 1993, 30, 397–404.
21. Jing, B.; Robinson, J. Saddlepoint Approximations for Marginal and Conditional Probabilities of Transformed Variables. Ann. Stat. 1994, 22, 1115–1132.
22. Kolassa, J.E. Higher-order approximations to conditional distribution functions. Ann. Stat. 1996, 24, 353–365.
23. DiCiccio, T.J.; Martin, M.A.; Young, G.A. Analytical approximations to conditional distribution functions. Biometrika 1993, 80, 781–790.
24. Field, C.A.; Tingley, M.A. Small sample asymptotics: Applications in robustness. In Handbook of Statistics; North-Holland: Amsterdam, The Netherlands, 1997; Volume 15, pp. 513–536.
25. Gatto, R. Saddlepoint approximations. In StatsRef: Statistics Reference Online; Wiley & Sons: New York, NY, USA, 2015; pp. 1–7.
26. Goutis, C.; Casella, G. Explaining the saddlepoint approximation. Am. Stat. 1999, 53, 216–224.
27. Reid, N. Saddlepoint methods and statistical inference. Stat. Sci. 1988, 3, 213–238.
28. Field, C.A.; Ronchetti, E. Small Sample Asymptotics; Institute of Mathematical Statistics Lecture Notes-Monograph Series: Hayward, CA, USA, 1990.
29. Jensen, J.L. Saddlepoint Approximations; Oxford University Press: Oxford, UK, 1995.
30. Kolassa, J.E. Series Approximation Methods in Statistics, 3rd ed.; Springer Lecture Notes in Statistics; Springer: New York, NY, USA, 2006.
31. Wang, S. One-step saddlepoint approximations for quantiles. Comput. Stat. Data Anal. 1995, 20, 65–74.
32. Shannon, C.E. The mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
33. Khinchin, A.I. Mathematical Foundations of Information Theory; English Translation of Two Original Articles in Russian; Dover Publications: New York, NY, USA, 1957.
34. Davison, A.C.; Hinkley, D.V. Saddlepoint approximations in resampling methods. Biometrika 1988, 75, 417–431.
35. Feuerverger, A. On the empirical saddlepoint approximation. Biometrika 1989, 76, 457–464.
36. Wang, S. Saddlepoint approximations in resampling analysis. Ann. Inst. Stat. Math. 1990, 42, 115–131.
37. Ronchetti, E.; Welsh, A.H. Empirical saddlepoint approximations for multivariate M-estimators. J. R. Stat. Soc. B 1994, 56, 313–326.
38. Davison, A.C.; Hinkley, D.V. Bootstrap Methods and Their Application; Cambridge University Press: Cambridge, UK, 1997.
39. Abd-Elfattah, E.; Butler, R. Saddlepoint approximations for rank-invariant permutation tests and confidence intervals with interval-censoring. Can. J. Stat. 2014, 42, 308–324.
40. Booth, J.G.; Butler, R.W. Randomization distributions and saddlepoint approximations in generalized linear models. Biometrika 1990, 77, 787–796.
41. Gatto, R.; Jammalamadaka, S.R. On two-sample tests for circular data based on spacing-frequencies. In Geometry Driven Statistics; Wiley & Sons: New York, NY, USA, 2015; pp. 129–145.
42. Holst, L.; Rao, J.S. Asymptotic theory for some families of two-sample nonparametric statistics. Sankhyā Ser. A 1980, 42, 19–52.
43. Rubin, D.B. The Bayesian bootstrap. Ann. Stat. 1981, 9, 130–134.
44. Pyke, R. Spacings. J. R. Stat. Soc. B 1965, 27, 395–449.
45. Asmussen, S.; Glynn, P.W. Stochastic Simulation. Algorithms and Analysis; Springer: New York, NY, USA, 2007.
46. Gatto, R.; Peeters, C. Saddlepoint approximations to sensitivities of tail probabilities of random sums and comparisons with Monte Carlo estimators. J. Stat. Comput. Simul. 2015, 85, 641–659.
47. Gatto, R. A saddlepoint approximation to the distribution of inhomogeneous discounted compound Poisson processes. Methodol. Comput. Appl. Prob. 2010, 12, 533–551.
48. Bartlett, M.S. The characteristic function of a conditional statistic. J. Lond. Math. Soc. 1938, 13, 62–67.
49. Aslam, M. Design of sampling plan for exponential distribution under neutrosophic statistical interval method. IEEE Access 2018, 6, 64153–64158.
Figure 1. Estimator of coloration’s entropy under sampling with replacement ($T_6$). First graph: saddlepoint approximation to the distribution function, $P_S[T_6 < t]$. Second graph: absolute error, $\mathrm{ae}(t)$. Third graph: absolute relative error, $\mathrm{re}(t)$.
Table 1. Estimator of coloration’s entropy under sampling with replacement ($T_6$), selected lower and upper tail points: Monte Carlo probability ($P_E$), saddlepoint probability ($P_S$) and absolute relative error ($\mathrm{re}$).
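| $t$ | $P_E[T_6 < t]$ | $P_S[T_6 < t]$ | $\mathrm{re}(t)$ |
|---|---|---|---|
| 1.20 | 0.00109 | 0.00103 | 0.058 |
| 1.31 | 0.01034 | 0.00972 | 0.060 |
| 1.36 | 0.02648 | 0.02422 | 0.085 |
| 1.40 | 0.04755 | 0.04753 | 0.000 |
| 1.45 | 0.10116 | 0.10205 | 0.009 |
| 1.69 | 0.89118 | 0.88959 | 0.014 |
| 1.72 | 0.95691 | 0.95823 | 0.032 |
| 1.73 | 0.97360 | 0.97303 | 0.023 |
| 1.75 | 0.99114 | 0.99155 | 0.048 |
| 1.77 | 0.99838 | 0.99876 | 0.306 |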
Table 2. Proportion of total claim amount ($T_8$): Monte Carlo probability ($P_E$), saddlepoint probability ($P_S$) and absolute relative error ($\mathrm{re}$).
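| $t$ | $P_E[T_8 < t]$ | $P_S[T_8 < t]$ | $\mathrm{re}(t)$ |
|---|---|---|---|
| 0.12 | 0.00035 | 0.00040 | 0.124 |
| 0.16 | 0.00523 | 0.00555 | 0.059 |
| 0.20 | 0.03119 | 0.03261 | 0.044 |
| 0.24 | 0.10347 | 0.10508 | 0.016 |
| 0.28 | 0.23178 | 0.23278 | 0.004 |
| 0.32 | 0.39024 | 0.39775 | 0.019 |
| 0.38 | 0.62988 | 0.64044 | 0.029 |
| 0.42 | 0.76566 | 0.76905 | 0.015 |
| 0.46 | 0.85601 | 0.85930 | 0.023 |
| 0.48 | 0.89281 | 0.89184 | 0.009 |
| 0.52 | 0.93960 | 0.93993 | 0.006 |
| 0.56 | 0.96733 | 0.96784 | 0.016 |
| 0.60 | 0.98294 | 0.98352 | 0.035 |
| 0.64 | 0.99137 | 0.99166 | 0.035 |
| 0.68 | 0.99577 | 0.99578 | 0.003 |
| 0.72 | 0.99799 | 0.99791 | 0.037 |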
Table 3. Estimator of coloration’s entropy under sampling without replacement ($T_7$): Monte Carlo probability ($P_E$), saddlepoint probability ($P_S$) and absolute relative error ($\mathrm{re}$).
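| $t$ | $P_E[T_7 < t]$ | $P_S[T_7 < t]$ | $\mathrm{re}(t)$ |
|---|---|---|---|
| 1.30 | 0.00010 | 0.00008 | 0.247 |
| 1.35 | 0.00034 | 0.00031 | 0.075 |
| 1.40 | 0.00124 | 0.00119 | 0.042 |
| 1.45 | 0.00377 | 0.00405 | 0.074 |
| 1.50 | 0.01271 | 0.01240 | 0.024 |
| 1.55 | 0.03340 | 0.03393 | 0.015 |
| 1.60 | 0.08396 | 0.08250 | 0.017 |
| 1.65 | 0.17648 | 0.17680 | 0.002 |
| 1.70 | 0.33407 | 0.33088 | 0.010 |
| 1.75 | 0.53940 | 0.53896 | 0.001 |
| 1.80 | 0.75566 | 0.75979 | 0.017 |
| 1.85 | 0.94083 | 0.93226 | 0.145 |
| 1.90 | 0.99594 | 0.99638 | 0.109 |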
Table 4. Most powerful test statistic for Dirichlet’s symmetry ($T_5$), selected lower and upper tail points: Monte Carlo probability ($P_E$), saddlepoint probability ($P_S$) and absolute relative error ($\mathrm{re}$).
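| $t$ | $P_E[T_5 < t]$ | $P_S[T_5 < t]$ | $\mathrm{re}(t)$ |
|---|---|---|---|
| −20.5 | 0.00105 | 0.00110 | 0.044 |
| −18.5 | 0.00976 | 0.01017 | 0.042 |
| −17.6 | 0.02593 | 0.02661 | 0.026 |
| −17.0 | 0.04814 | 0.04952 | 0.029 |
| −16.3 | 0.09709 | 0.09951 | 0.025 |
| −13.4 | 0.90156 | 0.90255 | 0.010 |
| −13.2 | 0.95632 | 0.95753 | 0.029 |
| −13.1 | 0.97661 | 0.97727 | 0.029 |
| −13.0 | 0.99104 | 0.99090 | 0.015 |
| −12.9 | 0.99833 | 0.99820 | 0.070 |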