Saddlepoint Approximation for Data in Simplices: A Review with New Applications

Gatto, Riccardo

doi:10.3390/stats2010010


by

Riccardo Gatto

Institute of Mathematical Statistics and Actuarial Science, University of Bern, 3012 Bern, Switzerland

Stats 2019, 2(1), 121-147; https://doi.org/10.3390/stats2010010

Submission received: 23 January 2019 / Revised: 8 February 2019 / Accepted: 14 February 2019 / Published: 18 February 2019


Abstract

This article provides a review of the saddlepoint approximation for an M-statistic of a sample of nonnegative random variables with fixed sum. The sample vector follows the multinomial, the multivariate hypergeometric, the multivariate Polya or the Dirichlet distribution. The main objective is to provide a complete presentation, in a single and unambiguous notation, of the common mathematical framework of these four situations: the simplex sample space and the underlying general urn model. Some important applications are reviewed and special attention is given to recent applications to models of circular data. Some novel applications are developed and studied numerically.

Keywords:

bootstrap; circular data; Dirichlet distribution; entropy; likelihood ratio test; multinomial distribution; multivariate hypergeometric distribution; multivariate Polya distribution; spacings; spacing-frequencies; urn model

MSC:

41A60; 60C05

1. Introduction

The topic of this article is a saddlepoint approximation to the distribution of the M-statistic

T_{n}

, precisely

T_{n} (Y_{1}, \dots, Y_{n})

, which is the implicit solution with respect to (w.r.t.) t of

\sum_{j = 1}^{n} ξ_{j} (Y_{j}; t) = 0,

(1)

where the function

ξ_{j}

:

R_{+} \times R \to R

is continuous (thus measurable), decreasing in its second argument, for

j = 1, \dots, n

,

R_{+} = [0, \infty)

, and where the random variables

Y_{1}, \dots, Y_{n}

are nonnegative, dependent and satisfy

\sum_{j = 1}^{n} Y_{j} = k

, for some fixed

k > 0

. Decreasing is meant in the strict sense. The sample vector

(Y_{1}, \dots, Y_{n})

takes values in a simplex. It is often referred to as compositional data, in reference to the situation where

Y_{j}

represents the number of units of the jth category, for

j = 1, \dots, n

, given n possible categories (see e.g., [1]). When

(Y_{1}, \dots, Y_{n})

follows the multinomial distribution, it is also referred to as categorical data. We consider three discrete and one continuous joint distributions for

(Y_{1}, \dots, Y_{n})

and relate these multivariate distributions to three general urn sampling schemes that are given, e.g., in [2].

The derivation of the saddlepoint approximation to the distribution of

T_{n}

relies on the distributional equivalence

\begin{matrix} (Y_{1}, \dots, Y_{n}) & \sim ((X_{1}, \dots, X_{n}) | \sum_{j = 1}^{n} X_{j} = k), \end{matrix}

(2)

which means that

(Y_{1}, \dots, Y_{n})

has the conditional distribution of

(X_{1}, \dots, X_{n})

given

\sum_{j = 1}^{n} X_{j} = k

. The nonnegative random variables

X_{1}, \dots, X_{n}

form a conditional triangular array in the sense that, conditionally on their sum, they are independent and their individual distributions may depend on n. We refer to Equation (2) as the conditional representation of

(Y_{1}, \dots, Y_{n})

in terms of

(X_{1}, \dots, X_{n})

. The computation of the distribution of

T_{n}

, as a function of the dependent random variables

Y_{1}, \dots, Y_{n}

, is generally difficult. It is however simplified by replacing these dependent random variables by the triangular array random variables

X_{1}, \dots, X_{n}

, in the same order, conditional on their sum. Gatto and Jammalamadaka [3] extended the saddlepoint approximation for tail probabilities of Skovgaard [4] to M-statistics and used the conditional representation in Equation (2) to derive saddlepoint approximations for important classes of nonparametric tests, such as tests based on spacings, two-sample tests based on spacing-frequencies and various tests based on ranks. The application of this conditional saddlepoint approximation to the computation of quantiles can be found in [5]. Further applications can be found in [6,7].

This article presents the conditional saddlepoint approximation from the general perspective of the urn sampling model. Four cases of the conditional representation given in Equation (2) are related to the urn model: the joint multinomial in terms of Poisson random variables conditional on their sum (M-P), the joint multivariate hypergeometric in terms of binomial random variables conditional on their sum (MH-B), the joint multivariate Polya in terms of negative binomial random variables conditional on their sum (MP-NB) and the joint Dirichlet in terms of gamma random variables conditional on their sum (D-G). New applications or examples are given and tested numerically. Various previous applications of the conditional saddlepoint approximation are reviewed. Two other general references on conditional saddlepoint approximations are [8] (Chapter 4 and Section 12.5) and [9]. This article complements these references in various ways. It provides a concise and complete presentation of the conditional saddlepoint approximation for M-statistics (including an approximation to quantiles). It updates the previous reviews by presenting additional recent important examples. It gives a general reformulation with a consistent and homogeneous notation, which corresponds to a single underlying mathematical model (viz., the urn model and the simplex sample space). It includes new important examples and new numerical comparisons. The numerical illustrations are given for: the distribution of an estimator of the entropy that relates to the urn model, the power of the likelihood ratio test, the distribution of the insurer’s total claim amount and the null distribution of a test for symmetry of Dirichlet’s distribution.

Mirakhmedov et al. [10] used the three well-known conditional representations M-P, MH-B and MP-NB with the Edgeworth approximation. The Edgeworth approximation is, however, not a large deviations approximation: Edgeworth approximations to small tail probabilities are usually less accurate than saddlepoint approximations. Butler and Sutton [11] proposed a particular saddlepoint approximation that exploits the conditional representation in Equation (2). This representation implies that, for all intervals

I_{1}, \dots, I_{n} \subset R_{+}

,

\begin{matrix} P [Y_{1} \in I_{1}, \dots, Y_{n} \in I_{n}] & = P [\sum_{j = 1}^{n} X_{j} = k | X_{1} \in I_{1}, \dots, X_{n} \in I_{n}] \frac{P [X_{1} \in I_{1}, \dots, X_{n} \in I_{n}]}{P [\sum_{j = 1}^{n} X_{j} = k]} . \end{matrix}

Then, the conditional probability above is approximated by a saddlepoint approximation for independent and truncated random variables. This method allows approximating the distribution of

M_{n} = {max}_{j = 1, \dots, n} Y_{j}

, for example, but does not allow approximating the distribution of the M-statistic in Equation (1). Note that, for the case where

(Y_{1}, \dots, Y_{n})

follows the multinomial distribution, given in Equation (3), Good [12] proposed a specific saddlepoint approximation for

M_{n}

.

This article has the following structure. Section 2 presents the four conditional representations, in Section 2.1 and Section 2.3. They are related to urn sampling schemes in Section 2.2. The first three conditional representations, namely M-P, MH-B and MP-NB, are for counting random variables. The fourth conditional representation is D-G and holds for positive random variables. Section 3 summarizes the conditional saddlepoint approximation for an M-statistic given another one: Section 3.1 and Section 3.2 are for tail probabilities and Section 3.3 for quantiles. Then, Section 4 provides new applications and numerical studies for this saddlepoint approximation and briefly reviews other important existing applications. Some final remarks are given in Section 5.

Regarding notation, we define

N = {0, 1, \dots}

,

N^{*} = N \ {0}

,

R_{+} = [0, \infty)

as already defined and

R_{+}^{*} = R_{+} \ {0}

. The Pochhammer symbol is defined by

\begin{matrix} {(x)}_{k} & = x \cdot \dots \cdot (x - k + 1), \forall x \in R, k \in N^{*} . \end{matrix}

The binomial coefficient is defined by

\begin{matrix} (\binom{x}{k}) = \{\begin{matrix} 0, & if k = - 1, - 2, \dots, \\ 1, & if k = 0, \\ \frac{{(x)}_{k}}{k!}, & if k = 1, 2, \dots, \end{matrix} \forall x \in R . \end{matrix}
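The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `falling_factorial` and `gen_binom` are hypothetical, and exact rational arithmetic is an arbitrary choice made here so that non-integer arguments x (as needed later for the multivariate Polya distribution) are handled without rounding.

```python
from fractions import Fraction
from math import factorial


def falling_factorial(x, k):
    """(x)_k = x (x - 1) ... (x - k + 1), for an integer k >= 1."""
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(x) - i
    return result


def gen_binom(x, k):
    """Binomial coefficient of a real x and an integer k, as defined above."""
    if k < 0:
        return Fraction(0)
    if k == 0:
        return Fraction(1)
    return falling_factorial(x, k) / factorial(k)
```

For a nonnegative integer x this reduces to the usual binomial coefficient, and it vanishes when k exceeds x because one factor of the product is zero.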

The indicator function of the statement A is defined by

\begin{matrix} I {A} = \{\begin{matrix} 0, & if A is false, \\ 1, & if A is true . \end{matrix} \end{matrix}

Let

n \in {2, 3, \dots}

. A

(n - 1)

-simplex is the

(n - 1)

-dimensional polytope determined by the convex hull of its n vertices. We consider only the symmetric simplex. It is obtained by defining the jth vertex

v_{j} = (v_{0}, \dots, v_{n - 1})

by

\begin{matrix} v_{i} & = \{\begin{matrix} x, & if i = j, \\ 0, & otherwise, \end{matrix} for i = 0, \dots, n - 1, \end{matrix}

for any desired size

x \in R_{+}^{*}

and for

j = 0, \dots, n - 1

. This representation corresponds to the set

\begin{matrix} Δ_{x}^{n - 1} & = {(x_{1}, \dots, x_{n}) \in R_{+}^{n} | x_{1} + \dots + x_{n} = x} . \end{matrix}

We also denote by

\begin{matrix} {\ddot{Δ}}_{k}^{n - 1} & = Δ_{k}^{n - 1} \cap N^{n} = {(k_{1}, \dots, k_{n}) \in N^{n} | k_{1} + \dots + k_{n} = k} \end{matrix}

the integer

(n - 1)

-simplex of size

k \in N^{*}

.

We denote by

X \sim Y

the fact that the two random elements X and Y have the same distribution. The same symbol is used for asymptotic equivalence.

2. Four Conditional Representations and Their Urn Sampling Interpretations

This section reviews four multivariate distributions for which the conditional representation in Equation (2) holds and relates them to a common urn model. Although these results are classical and can be found, mostly separately, in the literature, the contribution of this section lies in the single and unambiguous mathematical reformulation of the multivariate distributions, of their conditional representations and of their urn model. The same notation is used for the saddlepoint approximation in Section 3 and for the examples in Section 4. The first three models are presented in Section 2.1 and are related to the three urn sampling schemes in Section 2.2. In these three models,

(Y_{1}, \dots, Y_{n})

takes values in

{\ddot{Δ}}_{k}^{n - 1}

, for

k \in N^{*}

. Section 2.3 presents a fourth multivariate model where

(Y_{1}, \dots, Y_{n})

takes values in

Δ_{k}^{n - 1}

, for

k \in R_{+}^{*}

, and for which an asymptotic relation with one of the urn sampling models holds.

2.1. Three Conditional Representations for Counting Random Variables

The next three multivariate distributions allow for the conditional representation in Equation (2) and relate to the three urn sampling schemes of Section 2.2.

Multinomial—conditional Poisson (M-P)
Let $X_{j} \sim$ Poisson $(q p_{j})$ , i.e., Poisson distributed with parameter $q p_{j}$ , for $j = 1, \dots, n$ , be independent, where $(p_{1}, \dots, p_{n}) \in Δ_{1}^{n - 1}$ and $q \in R_{+}^{*}$ . Then, the conditional representation in Equation (2) holds with $(Y_{1}, \dots, Y_{n}) \sim$ $Multinomial (k; p_{1}, \dots, p_{n})$ , for $k \in N^{*}$ , that is, with

$P [Y_{1} = k_{1}, \dots, Y_{n} = k_{n}] = (\binom{k}{k_{1} \dots k_{n}}) p_{1}^{k_{1}} \dots p_{n}^{k_{n}},$

(3)

$\forall (k_{1}, \dots, k_{n}) \in {\ddot{Δ}}_{k}^{n - 1}$ , which is the multinomial distribution. Thus, $k = \sum_{j = 1}^{n} k_{j}$ .
Multivariate hypergeometric—conditional binomial (MH-B)
Let $X_{j} \sim$ Binomial $(m_{j}, q)$ , i.e., binomial distributed with $m_{j}$ trials and elementary probability q, for $j = 1, \dots, n$ , be independent, where $(m_{1}, \dots, m_{n}) \in {\ddot{Δ}}_{z}^{n - 1}$ , $z = \sum_{j = 1}^{n} m_{j}$ and $q \in (0, 1)$ . Then, the conditional representation in Equation (2) holds with $(Y_{1}, \dots, Y_{n}) \sim$ Multi-Hypergeometric $(k; m_{1}, \dots, m_{n})$ , for $k \in N^{*}$ , that is with

$P [Y_{1} = k_{1}, \dots, Y_{n} = k_{n}] = \frac{\prod_{j = 1}^{n} (\binom{m_{j}}{k_{j}})}{(\binom{z}{k})},$

(4)

for $k_{j} = 0, \dots, m_{j}$ , for $j = 1, \dots, n$ , and $k = \sum_{j = 1}^{n} k_{j}$ $\leq z$ , which is the multivariate hypergeometric distribution. Thus, $(k_{1}, \dots, k_{n}) \in {\ddot{Δ}}_{k}^{n - 1} \cap ([0, m_{1}] \times \dots \times [0, m_{n}])$ .
Multivariate Polya—conditional negative binomial (MP-NB)
Let $X_{j} \sim$ Negative-Binomial $(m_{j}, q)$ , i.e.

$P [X_{j} = l] = (\binom{l + m_{j} - 1}{l}) q^{m_{j}} {(1 - q)}^{l}, for l = 0, 1, \dots,$

for $j = 1, \dots, n$ , be independent, where $(m_{1}, \dots, m_{n}) \in Δ_{u}^{n - 1}$ , for some $u \in R_{+}^{*}$ , and $q \in (0, 1)$ . Thus, $u = \sum_{j = 1}^{n} m_{j}$ . Then, the conditional representation in Equation (2) holds with $(Y_{1}, \dots, Y_{n}) \sim$ Multi-Polya $(k; m_{1}, \dots, m_{n})$ , for $k \in N^{*}$ , that is with

$P [Y_{1} = k_{1}, \dots, Y_{n} = k_{n}] = \frac{\prod_{j = 1}^{n} (\binom{m_{j} + k_{j} - 1}{k_{j}})}{(\binom{u + k - 1}{k})},$

(5)

$\forall (k_{1}, \dots, k_{n}) \in {\ddot{Δ}}_{k}^{n - 1}$ , which is the multivariate Polya distribution. Thus, $k = \sum_{j = 1}^{n} k_{j}$ .
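The three representations M-P, MH-B and MP-NB can be checked numerically on a small example. The sketch below is not taken from the article: it conditions independent Poisson, binomial or negative binomial variables on their sum by brute-force enumeration of the integer simplex and compares the result with Equations (3), (4) and (5). All function names are hypothetical, and the enumeration is only practical for small n and k.

```python
from itertools import product
from math import comb, exp, factorial, lgamma


def poisson_pmf(l, lam):
    return exp(-lam) * lam**l / factorial(l)


def binomial_pmf(l, m, q):
    # comb(m, l) is zero for l > m, so the pmf vanishes outside 0, ..., m.
    return comb(m, l) * q**l * (1.0 - q) ** (m - l)


def negative_binomial_pmf(l, m, q):
    # Real m > 0; the coefficient binom(l + m - 1, l) is written with gamma functions.
    return exp(lgamma(l + m) - lgamma(m) - lgamma(l + 1)) * q**m * (1.0 - q) ** l


def conditional_pmf(counts, pmfs):
    """P[X_1 = k_1, ..., X_n = k_n | X_1 + ... + X_n = k] for independent X_j."""
    k, n = sum(counts), len(counts)

    def joint(point):
        value = 1.0
        for kj, pmf in zip(point, pmfs):
            value *= pmf(kj)
        return value

    # P[sum = k]: sum the joint pmf over the integer simplex of size k.
    total = sum(joint(p) for p in product(range(k + 1), repeat=n) if sum(p) == k)
    return joint(counts) / total


def multinomial_pmf(counts, p):  # Equation (3)
    value = float(factorial(sum(counts)))
    for kj, pj in zip(counts, p):
        value *= pj**kj / factorial(kj)
    return value


def multi_hypergeometric_pmf(counts, m):  # Equation (4)
    value = 1.0 / comb(sum(m), sum(counts))
    for kj, mj in zip(counts, m):
        value *= comb(mj, kj)
    return value


def multi_polya_pmf(counts, m):  # Equation (5), real m_j > 0
    k, u = sum(counts), sum(m)
    log_value = -(lgamma(u + k) - lgamma(u) - lgamma(k + 1))
    for kj, mj in zip(counts, m):
        log_value += lgamma(mj + kj) - lgamma(mj) - lgamma(kj + 1)
    return exp(log_value)
```

Running `conditional_pmf` with different values of q gives the same conditional probability, which illustrates that q plays no role after conditioning.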

We end this section with three remarks of general interest. We first note that in these three situations the conditional representation in Equation (2) holds independently of the choice of q, in

R_{+}^{*}

for the M-P and in

(0, 1)

for the MH-B and MP-NB representations. This independence can be understood from the fact that, in all three cases,

\sum_{j = 1}^{n} X_{j}

is a sufficient statistic for q. This is a consequence of the factorization theorem of sufficient statistics.

Each of the three conditional representations has an interpretation in terms of mixture models. For example, consider the independent random variables

X_{j} \sim

Poisson

(q p_{j})

, for

j = 1, \dots, n

, where

(p_{1}, \dots, p_{n}) \in Δ_{1}^{n - 1}

and

q \in R_{+}^{*}

. Then,

\forall k_{1}, \dots, k_{n} \in N

, for

k = \sum_{j = 1}^{n} k_{j}

and

K = \sum_{j = 1}^{n} X_{j}

,

\begin{matrix} P [X_{1} = k_{1}, \dots, X_{n - 1} = k_{n - 1}] & = \sum_{k_{n} = 0}^{\infty} P [X_{1} = k_{1}, \dots, X_{n - 1} = k_{n - 1}, X_{n} = k_{n}] \\ = \sum_{k_{n} = 0}^{\infty} (\binom{k}{k_{1} \dots k_{n - 1} k_{n}}) p_{1}^{k_{1}} \dots p_{n - 1}^{k_{n - 1}} p_{n}^{k_{n}} e^{- q} \frac{q^{k}}{k!} . \end{matrix}

(6)

Thus,

(X_{1}, \dots, X_{n - 1})

follows the countable mixture distribution given by multinomial probabilities with Poisson mixing probabilities. Moreover,

\begin{matrix} \sum_{k_{n} = 0}^{\infty} P [X_{1} = k_{1}, \dots, X_{n - 1} = k_{n - 1}, X_{n} = k_{n}] & = \sum_{k_{n} = 0}^{\infty} P [X_{1} = k_{1}, \dots, X_{n - 1} = k_{n - 1}, X_{n} = k_{n} ∣ K = k] P [K = k] \\ = \sum_{k_{n} = 0}^{\infty} P [X_{1} = k_{1}, \dots, X_{n - 1} = k_{n - 1} ∣ K = k] P [K = k] . \end{matrix}

(7)

By equating the multinomial and the Poisson probabilities of Equation (6) to the two probabilities of Equation (7), for any summand, we obtain the M-P conditional representation.

We also note that the three distributions of

X_{1}, \dots, X_{n}

(before conditioning) correspond to the three distributions of the

(a, b, 0)

class. The probability distribution

{p_{n}}_{n \geq 0}

belongs to the

(a, b, 0)

class, if it satisfies the recurrence relation

p_{n} = (a + b / n) p_{n - 1}

, for

n = 1, 2, \dots

and for some

a, b \in R

(see, e.g., Section 6.5 of [13]).
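As a small illustration, the recurrence can be verified numerically for the three distributions. The constants a and b used below are the standard ones for the parametrizations of Section 2.1 (they are not stated in the text above and should be read as an assumption of this sketch): a = 0, b = λ for Poisson(λ); a = -q/(1-q), b = (m+1)q/(1-q) for Binomial(m, q); and a = 1-q, b = (m-1)(1-q) for Negative-Binomial(m, q). The function name `satisfies_ab0` is hypothetical.

```python
from math import comb, exp, factorial, isclose, lgamma


def poisson_pmf(l, lam):
    return exp(-lam) * lam**l / factorial(l)


def binomial_pmf(l, m, q):
    return comb(m, l) * q**l * (1.0 - q) ** (m - l)


def negative_binomial_pmf(l, m, q):
    # Parametrization of Section 2.1: mass q^m (1 - q)^l times binom(l + m - 1, l).
    return exp(lgamma(l + m) - lgamma(m) - lgamma(l + 1)) * q**m * (1.0 - q) ** l


def satisfies_ab0(pmf, a, b, upto=15):
    """Check p_l = (a + b / l) p_{l-1} for l = 1, ..., upto."""
    return all(
        isclose(pmf(l), (a + b / l) * pmf(l - 1), rel_tol=1e-9, abs_tol=1e-15)
        for l in range(1, upto + 1)
    )
```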

2.2. Three Associated Urn Sampling Schemes

The three multivariate distributions presented in the previous section provide probability models for three sampling schemes: sampling with replacement, sampling without replacement and Polya’s sampling. These three sampling schemes are unified in a single general urn sampling model by Ivchenko and Ivanov [14] (see also [2]). Consider an urn containing balls with the n different colors

C_{1}, \dots, C_{n} .

At the beginning, the urn contains:

a_{j, 0} \in N

balls of color

C_{j}

, for

j = 1, \dots, n

. Each single ball is drawn equiprobably from the urn. Immediately after the lth draw of a ball of color

C_{j}

,

a_{j, l - 1} \in N^{*}

is updated by

a_{j, l} \in N

; this holds for

l = 1, 2, \dots

and

j = 1, \dots, n

. Three updating mechanisms are presented in the next paragraph. Thus, immediately after drawing

k_{j}

balls of color

C_{j}

, for

j = 1, \dots, n

, and therefore just after a total of

k = \sum_{j = 1}^{n} k_{j}

draws, the urn contains

a_{j, k_{j}}

balls of color

C_{j}

, for

j = 1, \dots, n

. The updated sampling probability of color

C_{j}

is thus

\begin{matrix} p_{j}^{(k_{1}, \dots, k_{n})} & = {(\sum_{i = 1}^{n} a_{i, k_{i}})}^{- 1} a_{j, k_{j}}, for j = 1, \dots, n, \end{matrix}

provided

\sum_{j = 1}^{n} a_{j, k_{j}} > 0

. The random count

Y_{j}

represents the number of randomly drawn balls of color

C_{j}

, for

j = 1, \dots, n

, after a fixed total number of draws

k = \sum_{j = 1}^{n} Y_{j} \in N^{*}

. Define by

z = \sum_{j = 1}^{n} a_{j, 0}

the initial total number of balls in the urn.

We are interested in the distribution of the M-statistic

T_{n}

viz.

T_{n} (Y_{1}, \dots, Y_{n})

defined in Equation (1), under the three following sampling schemes.

Sampling with replacement and M-P representation
Sampling with replacement from the urn is obtained by setting

$a_{j, l} = a_{j, 0}, for l = 1, 2, \dots and j = 1, \dots, n .$

Thus, $p_{j}^{(k_{1}, \dots, k_{n})}$ , for $j = 1, \dots, n$ , do not depend on $k_{1}, \dots, k_{n}$ and the multinomial distribution in Equation (3) holds with rational $p_{j} = p_{j}^{(k_{1}, \dots, k_{n})} = a_{j, 0} / z$ , for $j = 1, \dots, n$ . Thus, $(Y_{1}, \dots, Y_{n})$ takes values in ${\ddot{Δ}}_{k}^{n - 1}$ and the M-P representation holds.
Sampling without replacement and MH-B representation
Sampling without replacement from the urn is obtained by setting

$a_{j, l} = \{\begin{matrix} a_{j, l - 1} - 1 = a_{j, 0} - l, & if l \leq a_{j, 0}, \\ 0, & if l > a_{j, 0}, \end{matrix} for l = 1, 2, \dots and j = 1, \dots, n .$

Assume that $k_{j} \leq a_{j, 0}$ balls of color $C_{j}$ have been drawn, for $j = 1, \dots, n$ . The probability of drawing a ball of color $C_{j}$ in the next draw is $p_{j}^{(k_{1}, \dots, k_{n})} = (a_{j, 0} - k_{j}) / (z - k)$ , if $k < z$ , and it is undefined, if $k = z$ , for $j = 1, \dots, n$ . The multivariate hypergeometric distribution in Equation (4) holds with $m_{j} = a_{j, 0}$ , for $j = 1, \dots, n$ , and z equal to the parameter z of the present section. Thus, $(Y_{1}, \dots, Y_{n})$ takes values in ${\ddot{Δ}}_{k}^{n - 1} \cap ([0, a_{1, 0}] \times \dots \times [0, a_{n, 0}])$ and the MH-B representation holds.
Polya’s sampling and MP-NB representation
Polya’s sampling scheme is obtained by setting

$\begin{matrix} a_{j, l} & = a_{j, l - 1} + r = a_{j, 0} + l r, for l = 1, 2, \dots and j = 1, \dots, n, \end{matrix}$

(8)

where $r \in N^{*}$ . (Allowing for $r = 0$ would result in sampling with replacement and allowing for $r = - 1$ would result in sampling without replacement, which are already presented.) Assume that $k_{j}$ balls of color $C_{j}$ have been drawn, for $j = 1, \dots, n$ . The probability of drawing a ball of color $C_{j}$ in the next draw is $p_{j}^{(k_{1}, \dots, k_{n})} = (a_{j, 0} + k_{j} r) / (z + k r)$ , for $j = 1, \dots, n$ . In this case, the multivariate Polya distribution in Equation (5) holds with rational $m_{j} = a_{j, 0} / r$ , for $j = 1, \dots, n$ , and rational $u = z / r$ . Thus, $(Y_{1}, \dots, Y_{n})$ takes values in ${\ddot{Δ}}_{k}^{n - 1}$ and the MP-NB representation holds.
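The three updating mechanisms can be put into one small routine, indexed by the increment r applied to the drawn color (r = 0 for sampling with replacement, r = -1 without replacement, r ≥ 1 for Polya's scheme). The sketch below is an illustration under that encoding, with hypothetical function names. It multiplies the updated sampling probabilities along one fixed order of draws, then multiplies by the multinomial coefficient, which relies on the fact that in these three schemes every ordering of the same counts has the same probability. Exact fractions are used so that the results can be compared with Equations (3), (4) and (5) without rounding.

```python
from fractions import Fraction
from math import comb, factorial


def ordered_draw_probability(a0, r, counts):
    """Probability of drawing counts[j] balls of color j, colors taken in a fixed order."""
    a = list(a0)
    prob = Fraction(1)
    for j, kj in enumerate(counts):
        for _ in range(kj):
            if a[j] <= 0:  # color exhausted (only possible when r = -1)
                return Fraction(0)
            prob *= Fraction(a[j], sum(a))  # updated sampling probability
            a[j] += r  # urn update after the draw
    return prob


def count_probability(a0, r, counts):
    """P[Y_1 = k_1, ..., Y_n = k_n] after k = sum(counts) draws."""
    coefficient = factorial(sum(counts))
    for kj in counts:
        coefficient //= factorial(kj)
    return coefficient * ordered_draw_probability(a0, r, counts)


def gen_binom(x, k):
    """Binomial coefficient of a rational x and an integer k >= 0."""
    value = Fraction(1)
    for i in range(k):
        value *= (Fraction(x) - i) / (i + 1)
    return value


def multi_polya_pmf(a0, r, counts):
    """Equation (5) with m_j = a_{j,0} / r and u = z / r."""
    k, z = sum(counts), sum(a0)
    value = 1 / gen_binom(Fraction(z, r) + k - 1, k)
    for aj, kj in zip(a0, counts):
        value *= gen_binom(Fraction(aj, r) + kj - 1, kj)
    return value
```

For example, with initial composition (3, 2, 4) and counts (2, 1, 2), `count_probability` with r = -1 returns 2/7, the multivariate hypergeometric probability of Equation (4).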

2.3. A Conditional Representation for Positive Random Variables and Its Urn Sampling Interpretation

This section presents a fourth model that allows for the conditional representation in Equation (2). It is the Dirichlet distribution and it has a steady state interpretation in terms of Polya’s urn. Now, the dependent random variables

Y_{1}, \dots, Y_{n}

take values in

R_{+}

and cannot yet be considered as counts of the urn model of Section 2.2.

Dirichlet—conditional gamma (D-G)
Let $X_{j} \sim Gamma (a_{j}, q)$ , with density $q^{a_{j}} e^{- q x} x^{a_{j} - 1} / Γ (a_{j})$ , $\forall x > 0$ , for $j = 1, \dots, n$ , be independent, where $a_{1}, \dots, a_{n}$ and $q \in R_{+}^{*}$ . Then, the conditional representation in Equation (2) holds with $(Y_{1}, \dots, Y_{n}) \sim k ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})$ , where $({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})$ is Dirichlet distributed with density

$P [{\bar{Y}}_{1} \in (y_{1}, y_{1} + d y_{1}), \dots, {\bar{Y}}_{n} \in (y_{n}, y_{n} + d y_{n})] = \frac{Γ (a_{1} + \dots + a_{n})}{Γ (a_{1}) \dots Γ (a_{n})} y_{1}^{a_{1} - 1} \dots y_{n}^{a_{n} - 1} d y_{1} \dots d y_{n},$

(9)

$\forall (y_{1}, \dots, y_{n}) \in int Δ_{1}^{n - 1}$ and for $d y_{n} = - (d y_{1} + \dots + d y_{n - 1})$ , which is denoted $({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) \sim Dirichlet (a_{1}, \dots, a_{n})$ .

The validity of Equation (2) does not depend on the parameter

q \in R_{+}^{*}

of the gamma distribution. This independence follows from the factorization theorem of sufficient statistics.

The Dirichlet distribution represents the steady state of Polya’s urn sampling scheme, viz. of the multivariate Polya distribution given in Section 2.2.

Polya’s sampling and D-G representation
Precisely, immediately after drawing a ball of color $C_{j}$ , it is replaced together with $r \in N^{*}$ new balls of same color $C_{j}$ , this for $j = 1, \dots, n$ , cf. Equation (8). If we let the total number of draws k go to infinity, then the vector of the proportions of the n drawn colors follows the Dirichlet $(a_{1, 0} / r, \dots, a_{n, 0} / r)$ distribution, viz.

$\begin{matrix} \frac{1}{k} (Y_{1}, \dots, Y_{n}) & \overset{d}{⟶} ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}), as k \to \infty, \end{matrix}$

(10)

where $({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})$ has the Dirichlet distribution in Equation (9) with $a_{j} = a_{j, 0} / r$ , for $j = 1, \dots, n$ . Thus, if $(Y_{1}, \dots, Y_{n})$ follows the multivariate Polya distribution in Equation (5), taking values in ${\ddot{Δ}}_{k}^{n - 1}$ , then it is approximately distributed as $k ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})$ , taking values in $Δ_{k}^{n - 1}$ .
To see Equation (10), let $(k_{1}, \dots, k_{n}) \in {\ddot{Δ}}_{k}^{n - 1}$ . The multivariate Polya probability in Equation (5) can be re-expressed as

$P [Y_{1} = k_{1}, \dots, Y_{n} = k_{n}] = \frac{Γ (u)}{\prod_{j = 1}^{n} Γ (m_{j})} \frac{Γ (1 + k)}{Γ (u + k)} \prod_{j = 1}^{n} \frac{Γ (m_{j} + k_{j})}{Γ (1 + k_{j})} .$

It follows from Stirling’s formula that $Γ (x + z_{1}) / Γ (x + z_{2}) \sim x^{z_{1} - z_{2}}$ , as $x \to \infty$ , $\forall z_{1}, z_{2} \in R$ . Consequently,

$\begin{matrix} P [Y_{1} = k_{1}, \dots, Y_{n} = k_{n}] \sim c_{1} (k) \prod_{j = 1}^{n} k_{j}^{m_{j} - 1} \sim c_{2} (k) \prod_{j = 1}^{n} y_{j}^{m_{j} - 1} = c_{2} (k) \prod_{j = 1}^{n} y_{j}^{\frac{a_{j, 0}}{r} - 1}, as k \to \infty, \end{matrix}$

for some positive constants $c_{1} (k)$ and $c_{2} (k)$ depending on k and for $y_{j} = {lim}_{k \to \infty} k_{j} / k$ , for $j = 1, \dots, n$ .
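The limit in Equation (10) can be illustrated numerically. In the sketch below (hypothetical function names, parameter values chosen only for illustration), the multivariate Polya probability is computed from its gamma-function form and rescaled by k^{n-1}, the number of lattice points of the integer simplex per unit volume in the first n - 1 coordinates. For large k, the rescaled probability should be close to the Dirichlet density evaluated at the proportions k_j / k.

```python
from math import exp, lgamma, log


def multi_polya_log_pmf(counts, m):
    """Logarithm of Equation (5), written with gamma functions."""
    k, u = sum(counts), sum(m)
    value = lgamma(u) + lgamma(1 + k) - lgamma(u + k)
    for kj, mj in zip(counts, m):
        value += lgamma(mj + kj) - lgamma(mj) - lgamma(1 + kj)
    return value


def dirichlet_log_density(y, a):
    """Logarithm of the Dirichlet density of Equation (9) at an interior point y."""
    value = lgamma(sum(a))
    for yj, aj in zip(y, a):
        value += (aj - 1.0) * log(yj) - lgamma(aj)
    return value


def rescaled_ratio(counts, m):
    """k^(n-1) P[Y = counts] divided by the Dirichlet density at counts / k."""
    k, n = sum(counts), len(counts)
    y = [kj / k for kj in counts]
    return exp(multi_polya_log_pmf(counts, m) + (n - 1) * log(k) - dirichlet_log_density(y, m))
```

With m = (2, 3, 1.5) and counts (9000, 15000, 6000), the ratio is within a fraction of a percent of 1, and it moves closer to 1 as all counts are scaled up.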

3. Conditional Saddlepoint Approximation for M-Statistics

The saddlepoint method, viz. method of steepest descent, allows approximating integrals of the form

\int_{ρ} f (z) e^{ν g (z)} d z

, for large values of

ν > 0

, where f:

C \to C

and g:

C \to C

are analytic functions in a domain containing the path

ρ

and its deformations. Let

z_{0}

be the point where the real part of g is highest. It is a saddlepoint of the surface given by the real part of g. For large values of

ν

, the value of the integral is accurately approximated as follows. First, restrict

ρ

to a small neighborhood of

z_{0}

. Second, deform

ρ

such that it crosses

z_{0}

and so that the real part of g decreases fast to

- \infty

, when descending from

z_{0}

down to the endpoints of the deformed

ρ

. This is the path of steepest descent. The final step is the term-by-term integration, within the neighborhood of

z_{0}

, of an asymptotic expansion of the integrand around

z_{0}

. Two references are [15,16].

This method yields approximations to densities or tail probabilities of various random variables such as estimators or test statistics. The sample size n takes the role of the asymptotic parameter

ν

and the relative error of the saddlepoint approximation vanishes at rate

n^{- 1}

, as

n \to \infty

. Unlike normal or Edgeworth approximations, saddlepoint approximations are valid at any fixed point (not depending on n) of the support of the distribution. They are thus large deviations techniques. For these two reasons, they provide accurate approximations to small tail probabilities, in fact even for small values of n. The saddlepoint approximation was introduced into statistics by Daniels [17], for approximating density functions. Lugannani and Rice [18] provided a formula for tail probabilities (see also [19]).
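To make the tail formula concrete, here is a minimal sketch of the Lugannani and Rice approximation in the simplest unconditional setting, the mean of n i.i.d. standard exponential random variables. This example is not from the article. It assumes the c.g.f. K(v) = -log(1 - v), for which the saddlepoint equation K'(v) = x has the closed-form solution v = 1 - 1/x. The exact tail is available through the gamma (Erlang) distribution, so the accuracy can be inspected even for small n. The function names are hypothetical.

```python
from math import copysign, erfc, exp, factorial, log, pi, sqrt


def normal_sf(x):
    return 0.5 * erfc(x / sqrt(2.0))


def normal_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def lugannani_rice_exponential_mean(x, n):
    """Approximate P[mean of n i.i.d. Exp(1) >= x], for x > 0 and x != 1."""
    v = 1.0 - 1.0 / x  # saddlepoint: K'(v) = 1 / (1 - v) = x
    cgf = -log(1.0 - v)  # K(v) at the saddlepoint
    cgf_second = x * x  # K''(v) = 1 / (1 - v)^2
    w = copysign(sqrt(2.0 * n * (v * x - cgf)), v)
    u = v * sqrt(n * cgf_second)
    return normal_sf(w) + normal_pdf(w) * (1.0 / u - 1.0 / w)


def exact_exponential_mean_sf(x, n):
    """Exact P[Gamma(n, 1) >= n x], for an integer n."""
    lam = n * x
    return exp(-lam) * sum(lam**i / factorial(i) for i in range(n))
```

With n = 5 and x = 2, the approximation and the exact value agree to about three significant digits. The formula is singular at x = 1 (the mean), where a separate limiting expression would be needed.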

Saddlepoint approximations for conditional distributions were proposed by: Skovgaard [4] for the distribution of a sample mean given another mean; Wang [20] for the distribution of a mean given a nonlinear function of another mean; and Jing and Robinson [21] for the distribution of a nonlinear function of a mean given a nonlinear function of another mean. Kolassa [22] derived higher order terms to the conditional saddlepoint approximation of a sample mean given another mean, by using a different expansion to an integral appearing in [4]. DiCiccio [23] provided a different approximation, which is however restricted to the exponential class of distributions.

Some survey articles are [24,25,26,27]. General references are [8,28,29,30].

The saddlepoint approximation to conditional distributions of Skovgaard [4] is re-expressed for the M-statistic defined in Equation (1) in [3]. This is summarized in Section 3.1. A modification for the lattice case is given in Section 3.2. A method for computing quantiles is given in Section 3.3.

3.1. Approximation to the Distribution

Consider n absolutely continuous and independent random variables

X_{1}, \dots, X_{n}

and the M-statistic

(S_{1, n}, S_{2, n})

viz.

(S_{1, n} (X_{1}, \dots, X_{n}), S_{2, n} (X_{1}, \dots, X_{n}))

, which is the solution w.r.t.

(s_{1}, s_{2})

of

\sum_{j = 1}^{n} (\begin{matrix} ψ_{1, j} (X_{j}; s_{1}, s_{2}) \\ ψ_{2, j} (X_{j}; s_{2}) \end{matrix}) = 0,

(11)

where

ψ_{1, j} : R^{3} \to R

is a continuous function that is decreasing in its second argument and

ψ_{2, j} : R^{2} \to R

is a continuous function that is decreasing in its second argument, for

j = 1, \dots, n

. The joint cumulant generating function (c.g.f.) of the summands in Equation (11) is given by

K_{n} (v; s) = \sum_{j = 1}^{n} log E [exp {v_{1} ψ_{1, j} (X_{j}; s_{1}, s_{2}) + v_{2} ψ_{2, j} (X_{j}; s_{2})}],

(12)

where

v = (v_{1}, v_{2}) \in R^{2}

and

s = (s_{1}, s_{2}) \in R^{2}

. Define also

K_{2 n} (v_{2}; s_{2}) = K_{n} ((0, v_{2}); (0, s_{2}))

. The first computational step is to find the saddlepoint

α = (α_{1}, α_{2}) \in R^{2}

, which is the solution w.r.t.

v = (v_{1}, v_{2})

of

\frac{\partial}{\partial v} K_{n} (v; s) = 0,

(13)

and the “marginal saddlepoint”

β \in R

, which is the solution w.r.t.

v_{2}

of

\frac{\partial}{\partial v_{2}} K_{2 n} (v_{2}; s_{2}) = 0 .

(14)

Next, define

K_{n}^{″} (v; s) = \frac{\partial^{2}}{\partial v \partial v^{T}} K_{n} (v; s), K_{2 n}^{″} (v_{2}; s_{2}) = \frac{\partial^{2}}{\partial v_{2}^{2}} K_{2 n} (v_{2}; s_{2}),

\begin{matrix} ρ (s) & = sgn (α_{1}) {2 [K_{2 n} (β; s_{2}) - K_{n} (α; s)]}^{\frac{1}{2}} and σ (s) = α_{1} {(\frac{det K_{n}^{″} (α; s)}{K_{2 n}^{″} (β; s_{2})})}^{\frac{1}{2}} . \end{matrix}

(15)

With these quantities, we obtain the saddlepoint approximation

P_{n} (s_{1} ∣ s_{2}) = 1 - Φ \circ ρ (s) + ϕ \circ ρ (s) (\frac{1}{σ (s)} - \frac{1}{ρ (s)}),

(16)

where

ϕ

and

Φ

are the standard normal density and distribution function. Then,

P [S_{1, n} \geq s_{1} ∣ S_{2, n} = s_{2}] = P_{n} (s_{1} ∣ s_{2}) {1 + O (n^{- 1})}, as n \to \infty .

(17)

Thus, the saddlepoint approximation in Equation (16) possesses a vanishing relative error at any value of the argument

s_{1}

, that is, over large deviations regions.

By selecting

X_{1}, \dots, X_{n}

from any one of the four conditional representations, M-P, MH-B or MP-NB of Section 2.1 or D-G of Section 2.3, and by setting

ψ_{1, j} (x; s_{1}, s_{2}) = ξ_{j} (x; s_{1})

and

ψ_{2, j} (x; s_{2}) = x - s_{2}

, for

j = 1, \dots, n

, we obtain

\begin{matrix} P [T_{n} \geq t] & = P_{n} (t | \frac{k}{n}) {1 + O (n^{- 1})}, as n \to \infty, \end{matrix}

(18)

for

T_{n}

defined in Equation (1).

Precisely, it follows from the conditional representation in Equation (2) that

T_{n} (Y_{1}, \dots, Y_{n}) \sim (S_{1, n} (X_{1}, \dots, X_{n}) | S_{2, n} (X_{1}, \dots, X_{n}) = \frac{k}{n}) .

This equivalence and Equation (17) give Equation (18).
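The computational steps of Equations (12) to (16) can be sketched for one concrete case. The example below is an illustration under stated assumptions, not the author's implementation. It uses the D-G representation with q = 1 and the linear choice ξ_j(y; t) = c_j y - t, so that T_n = n^{-1} Σ c_j Y_j. For gamma summands the joint c.g.f. is K_n(v; s) = -Σ a_j log(1 - v_1 c_j - v_2) - n(v_1 s_1 + v_2 s_2), and the marginal saddlepoint is β = 1 - (a_1 + ... + a_n)/k. The saddlepoint α is found by a damped Newton iteration, which is a solver chosen here for simplicity. The function name `conditional_tail` and the coefficients c_j are hypothetical.

```python
from math import copysign, erfc, exp, log, pi, sqrt


def normal_sf(x):
    return 0.5 * erfc(x / sqrt(2.0))


def normal_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def conditional_tail(t, c, a, k):
    """Equation (16) for P[(1/n) sum_j c_j Y_j >= t], with Y = k * Dirichlet(a)."""
    n, total_a = len(c), sum(a)
    s1, s2 = t, k / n

    def cgf(v1, v2):  # K_n(v; s) of Equation (12) for Gamma(a_j, 1) summands
        return sum(-aj * log(1.0 - v1 * cj - v2) for cj, aj in zip(c, a)) - n * (v1 * s1 + v2 * s2)

    def derivatives(v1, v2):  # gradient and Hessian of K_n w.r.t. v
        g1, g2, h11, h12, h22 = -n * s1, -n * s2, 0.0, 0.0, 0.0
        for cj, aj in zip(c, a):
            d = 1.0 - v1 * cj - v2
            g1 += aj * cj / d
            g2 += aj / d
            h11 += aj * cj * cj / d**2
            h12 += aj * cj / d**2
            h22 += aj / d**2
        return g1, g2, h11, h12, h22

    def feasible(v1, v2):  # the c.g.f. is finite only if all 1 - v1 c_j - v2 > 0
        return all(1.0 - v1 * cj - v2 > 0.0 for cj in c)

    v1 = v2 = 0.0
    for _ in range(100):  # damped Newton iteration for Equation (13)
        g1, g2, h11, h12, h22 = derivatives(v1, v2)
        if max(abs(g1), abs(g2)) < 1e-10:
            break
        det = h11 * h22 - h12 * h12
        p1 = -(h22 * g1 - h12 * g2) / det
        p2 = -(h11 * g2 - h12 * g1) / det
        size, base = 1.0, cgf(v1, v2)
        while size > 1e-12:  # backtrack until feasible with sufficient decrease
            w1, w2 = v1 + size * p1, v2 + size * p2
            if feasible(w1, w2) and cgf(w1, w2) <= base + 1e-4 * size * (g1 * p1 + g2 * p2):
                break
            size *= 0.5
        v1, v2 = w1, w2

    _, _, h11, h12, h22 = derivatives(v1, v2)
    beta = 1.0 - total_a / k  # solution of Equation (14)
    k2 = -total_a * log(1.0 - beta) - k * beta
    k2_second = total_a / (1.0 - beta) ** 2
    rho = copysign(sqrt(max(0.0, 2.0 * (k2 - cgf(v1, v2)))), v1)  # Equation (15)
    sigma = v1 * sqrt((h11 * h22 - h12 * h12) / k2_second)
    return normal_sf(rho) + normal_pdf(rho) * (1.0 / sigma - 1.0 / rho)
```

A simple check uses a_j = 1 and c = (1, 1, 1, 0, ..., 0) with n = 10 and k = 10: then T_n follows the Beta(3, 7) distribution, and P[T_n ≥ 0.5] = 46/512 exactly. The sketch does not handle t equal to the conditional mean, where α_1 = 0 and Equation (16) is singular, nor coefficients c_j that are all equal, for which the Hessian is singular.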

The argument

s_{2}

of

ψ_{1, j} (x; s_{1}, s_{2})

is not considered here, but it is useful in one example in [3].

As mentioned, the justification of this saddlepoint approximation can be found in [4] and it would be too long to reproduce it here. However, we can give a few general ideas. Let us consider

(U_{1}, V_{1}), \dots, (U_{n}, V_{n})

independent and identically distributed (i.i.d.), absolutely continuous and with joint c.g.f. K. Let

(\bar{U}, \bar{V})

denote their sample mean. Then, the Fourier inversion and integration of the joint density gives

\begin{matrix} P [\bar{V} \geq v ∣ \bar{U} = u] & = {(\frac{n}{2 π i})}^{2} \int_{c - i \infty}^{c + i \infty} \int_{- i \infty}^{i \infty} exp {n [K (s, t) - s u - t v]} d s \frac{d t}{n t}, \end{matrix}

for

u, v \in R

and

c > 0

. For the integral w.r.t. s, a standard saddlepoint approximation is used. The resulting saddlepoint approximation is an integral w.r.t. t and, due to a singularity, a modified saddlepoint approximation similar to the one in [18] must be used to approximate this integral. The generalization from the sample mean to the M-statistic in Equation (11) follows directly from

\begin{matrix} P [S_{1, n} \geq s_{1} ∣ S_{2, n} = s_{2}] & = P [\sum_{j = 1}^{n} ψ_{1, j} (X_{j}; s_{1}, s_{2}) \geq 0 | \sum_{j = 1}^{n} ψ_{2, j} (X_{j}; s_{2}) = 0], \end{matrix}

for

s_{1}, s_{2} \in R

, which is due to the fact that

ψ_{1, j}

and

ψ_{2, j}

are decreasing in their second argument.

3.2. Modifications for Discrete Statistics

A slight modification of this saddlepoint approximation for the case where

T_{n}

takes values in the lattice

{j δ / n}_{j \in Z}

, for some

δ > 0

, is obtained by replacing

σ (s)

in Equation (16) by

\begin{matrix} \ddot{σ} (s) & = (1 - exp {- δ α_{1}}) {(\frac{det K_{n}^{″} (α; s)}{K_{2 n}^{″} (β; s_{2})})}^{\frac{1}{2}} . \end{matrix}

(19)

Moreover, the following continuity correction can be considered. For the lattice point

s_{1}

, define

{\tilde{s}}_{1} = s_{1} - δ / (2 n)

,

\tilde{s} = ({\tilde{s}}_{1}, s_{2})

and

\tilde{α} = ({\tilde{α}}_{1}, {\tilde{α}}_{2})

as the solution w.r.t.

v

of

\begin{matrix} \frac{\partial}{\partial v} K_{n} (v; \tilde{s}) & = 0 . \end{matrix}

Then, replace

ρ (s)

and

σ (s)

in Equation (16) by

\begin{matrix} \tilde{ρ} (\tilde{s}) = sgn ({\tilde{α}}_{1}) {2 [K_{2 n} (β; s_{2}) - K_{n} (\tilde{α}; \tilde{s})]}^{\frac{1}{2}} and \tilde{σ} (\tilde{s}) = 2 \sinh (\frac{δ}{2} {\tilde{α}}_{1}) {(\frac{det K_{n}^{″} (\tilde{α}; \tilde{s})}{K_{2 n}^{″} (β; s_{2})})}^{\frac{1}{2}}, \end{matrix}

respectively. The justifications can be found in [4,19]. The relative error of these modified approximations remains

O (n^{- 1})

.

3.3. Approximation to Quantiles

Define

ζ (s) = ρ (s) + log {σ (s) / ρ (s)} / ρ (s)

, for

ρ

and

σ

defined in Equation (15). An asymptotically equivalent version of the saddlepoint approximation in Equation (16) is given by

P_{n}^{*} (s_{1} ∣ s_{2}) = 1 - Φ \circ ζ (s)

. This formula leads to a fast algorithm for approximating quantiles, with the same asymptotic error as the one entailed by exact inversion of the saddlepoint approximation. The general idea of Wang [31] was adapted to the present situation by Gatto [5].

Let

ε \in (0, 1)

. One starts with any reasonable approximation to the desired

ε

-quantile, for example the normal one, given by

s_{1}^{(0)} (ε) = \frac{τ (s_{2})}{\sqrt{n}} Φ^{(- 1)} (ε) + μ (s_{2}),

where

μ (s_{2}) ≃ E [S_{1, n} ∣ S_{2, n} = s_{2}]

and

τ^{2} (s_{2}) ≃ n var (S_{1, n} ∣ S_{2, n} = s_{2})

.

Re-denote by

α (s)

the saddlepoint at

s

, viz. the solution of Equation (13) w.r.t.

v

. Denote

{\dot{K}}_{n} (v; s) = \partial / \partial s K_{n} (v; s)

. One computes, for

j = 0, 1

,

\begin{matrix} s_{1}^{(j + 1)} (ε) & = s_{1}^{(j)} (ε) + \frac{{Φ^{(- 1)} (ε)}^{2} - ζ^{2} (s^{(j)} (ε))}{- 2 {{\dot{K}}_{n} (α (s^{(j)} (ε)); s^{(j)} (ε))}_{1}}, \end{matrix}

(20)

where

s^{(j)} (ε) = (s_{1}^{(j)} (ε), s_{2})

. If

s_{1} (ε)

denotes the exact

ε

-quantile, then

\begin{matrix} s_{1}^{(2)} (ε) & = s_{1} (ε) {1 + O (n^{- \frac{3}{2}})}, as n \to \infty . \end{matrix}

Moreover, if

{\tilde{s}}_{1} (ε)

denotes the

ε

-quantile obtained by exact inversion of the saddlepoint distribution, then

s_{1}^{(2)} (ε) = {\tilde{s}}_{1} (ε) {1 + O (n^{- 3 / 2})}

, as

n \to \infty

. Therefore, stopping the iteration of Equation (20) at

j = 1

is sufficient in terms of asymptotic accuracy.
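The iteration of Equation (20) can be sketched as follows. This is only an illustration under stated assumptions: the callables zeta and kdot1 are hypothetical placeholders for the function ζ and for the first component of the derivative of K_n w.r.t. s evaluated at the saddlepoint, both with s_2 held fixed, and they must be supplied by the user. The toy check (mean of n i.i.d. standard normal variables, for which ζ(s) = √n s and the derivative equals −n s) is not taken from the article; it is chosen because the exact quantile is known in that case.

```python
from statistics import NormalDist


def normal_start(eps, mu, tau, n):
    """Normal starting value s_1^(0)(eps) = tau / sqrt(n) * Phi^{-1}(eps) + mu."""
    return tau / n ** 0.5 * NormalDist().inv_cdf(eps) + mu


def quantile_update(s1, eps, zeta, kdot1):
    """One step of Equation (20).

    zeta(s1) and kdot1(s1) are user-supplied callables (assumptions):
    kdot1 returns the first component of the s-derivative of K_n,
    evaluated at the saddlepoint alpha(s).
    """
    z = NormalDist().inv_cdf(eps)
    return s1 + (z ** 2 - zeta(s1) ** 2) / (-2.0 * kdot1(s1))


# Toy check (assumption, not from the article): mean of n i.i.d. N(0, 1).
n, eps = 10, 0.9
zeta = lambda s: n ** 0.5 * s
kdot1 = lambda s: -n * s

s = 0.5  # deliberately poor starting value
for _ in range(2):  # j = 0, 1 as in the text
    s = quantile_update(s, eps, zeta, kdot1)

exact = normal_start(eps, 0.0, 1.0, n)  # exact quantile in the toy case
```

In this toy case, each step coincides with a Newton step for the equation ζ²(s) = {Φ^(−1)(ε)}², which is consistent with the fast convergence stated above.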

Consider the simple case

ψ_{1, j} (x; s_{1}, s_{2}) = g (x) - s_{1}

, for some continuous function

g : R \to R

. Then, Equation (11) yields

S_{1, n} (X_{1}, \dots, X_{n}) = n^{- 1} \sum_{j = 1}^{n} g (X_{j})

. In this situation, the denominator of the ratio in Equation (20) simplifies to

2 {α (s^{(j)} (ε))}_{1}

.

4. Applications

This section presents various examples that illustrate the relevance and accuracy of the conditional saddlepoint approximation for M-statistics of Section 3 with the M-P, MH-B, MP-NB and D-G representations of Section 2, respectively, in Section 4.1, Section 4.2, Section 4.3 and Section 4.4. Important applications or examples from previous articles are summarized and novel examples are developed. The common urn sampling model of all examples is always put in the forefront. Many examples are studied numerically. The values obtained by the saddlepoint approximation are always very close to the ones obtained by Monte Carlo simulation. This section is however not a complete list of applications: further examples can be found, e.g., in [8,9] (Chapter 4 and Section 12.5).

As mentioned, the accuracy of the saddlepoint approximation is assessed through comparisons with simple Monte Carlo simulation. The following measures of accuracy for approximating the distribution of the statistic

T_{n}

are considered. Let

t > 0

. The probabilities obtained by simulation are considered as exact and denoted

P_{E} [T_{n} < t]

. The probabilities obtained by the saddlepoint approximation are denoted

P_{S} [T_{n} < t]

. Then,

ae (t) = | P_{S} [T_{n} < t] - P_{E} [T_{n} < t] | = | P_{S} [T_{n} \geq t] - P_{E} [T_{n} \geq t] |

(21)

denotes the absolute error and

re (t) = \frac{| P_{S} [T_{n} < t] - P_{E} [T_{n} < t] |}{min {P_{E} [T_{n} < t], 1 - P_{E} [T_{n} < t]}} = \frac{| P_{S} [T_{n} \geq t] - P_{E} [T_{n} \geq t] |}{min {P_{E} [T_{n} \geq t], 1 - P_{E} [T_{n} \geq t]}}

(22)

denotes the absolute relative error.
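The two measures in Equations (21) and (22) translate directly into code. The short sketch below uses hypothetical function names and hypothetical input probabilities, which are not values from the tables of this article.

```python
def absolute_error(p_s, p_e):
    """Equation (21): |P_S - P_E|."""
    return abs(p_s - p_e)


def relative_error(p_s, p_e):
    """Equation (22): absolute error divided by the smaller tail probability."""
    return abs(p_s - p_e) / min(p_e, 1.0 - p_e)


# Hypothetical probabilities, for illustration only.
p_s, p_e = 0.012, 0.010
ae = absolute_error(p_s, p_e)
re = relative_error(p_s, p_e)
```

Dividing by the smaller of the two tail probabilities makes the measure identical for lower and upper tail probabilities, as expressed by the two equalities in Equation (22).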

4.1. Sampling with Replacement and M-P Representation

Three new illustrations of the saddlepoint approximation with the M-P representation are presented. Example 1 considers the entropy of the coloration probabilities of the balls of the urn. Numerical evaluations of the saddlepoint approximation to the distribution of the estimator of the entropy are given. Example 2 concerns the likelihood ratio test for the null hypothesis of equality of the coloration probabilities. The power under a particular alternative hypothesis is computed numerically. Example 3 considers the insurer total claim amount when the individual claim settlement is delayed. The saddlepoint approximation to the distribution of the total claim amount is analyzed numerically. Example 4 reviews the application of the saddlepoint approximation to the bootstrap distribution of the M-statistic in Equation (1).

Example 1 (Entropy’s estimator under sampling with replacement).

The mathematical study of entropy began with Shannon [32], for the construction of a model for the transmission of information. In sampling with replacement from the urn, the probability of drawing a ball of color

C_{j}

is fixed and given by

p_{j} = a_{j, 0} / z

, for

j = 1, \dots, n

. Define

p = (p_{1}, \dots, p_{n}) \in Δ_{1}^{n - 1}

. The entropy of the coloration is given by

ε_{n} (p) = - \sum_{j = 1}^{n} p_{j} log p_{j},

(23)

where

0 log 0 = 0

is assumed. The entropy

ε_{n} (p)

is an appropriate measure of the uncertainty about the colors of the drawn balls. Indeed, it satisfies the following properties. First,

ε_{n} (p)

takes its largest value

log n

for

p_{1} = \dots = p_{n} = n^{- 1}

. Second, if we consider the equivalent coloration

C_{1}, \dots, C_{n}, C_{n + 1}

with probabilities

p_{1}, \dots, p_{n}

and

p_{n + 1} = 0

, respectively, then

ε_{n} (p_{1}, \dots, p_{n}) = ε_{n + 1} (p_{1}, \dots, p_{n}, p_{n + 1})

. Theorem 1 on pp. 9–10 of [33] states that the only continuous function that satisfies these two properties plus another one related to conditional entropy, has the form given in Equation (23) multiplied by a positive constant.
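The two stated properties of the entropy in Equation (23) can be checked numerically. The following minimal sketch uses a hypothetical function name and the probabilities p_j = 2 j / {n (n + 1)} with n = 6 as an arbitrary non-uniform example.

```python
import math


def entropy(p):
    """Equation (23), with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


n = 6
uniform = [1.0 / n] * n
p = [2 * j / (n * (n + 1)) for j in range(1, n + 1)]

h_uniform = entropy(uniform)      # largest value, log n
h_p = entropy(p)                  # smaller than log n
h_extended = entropy(p + [0.0])   # unchanged by an additional color with probability 0
```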

As in Section 2.2,

Y_{1}, \dots, Y_{n}

denotes the number of drawn balls for each of the n colors

C_{1}, \dots, C_{n}

, respectively, after

k \in N^{*}

draws. Define

\begin{matrix} T_{n} (Y_{1}, \dots, Y_{n}) & = ε_{n} (\frac{Y_{1}}{k}, \dots, \frac{Y_{n}}{k}) = - \sum_{j = 1}^{n} \frac{Y_{j}}{k} log \frac{Y_{j}}{k} = - \frac{1}{k} \sum_{j = 1}^{n} Y_{j} log Y_{j} + log k \end{matrix}

(24)

and

P_{n} (Y_{1}, \dots, Y_{n}) = (\binom{k}{Y_{1} \dots Y_{n}}) n^{- Y_{1}} \dots n^{- Y_{n}},

that is, the multinomial probability of the configuration

(Y_{1}, \dots, Y_{n})

under uniformity. It is directly shown that

k^{- 1} log P_{n} = T_{n} + o (1), as k \to \infty and a.s.

Asymptotically, the entropy of the configuration is thus an increasing transform of the probability of the configuration under uniformity. The probability

P_{n}

is maximized by the constant configuration and so is the entropy

T_{n}

.

Consider now the multinomial model in Equation (3) with unknown vector of probabilities

p

. The frequency

Y_{j} / k

is an unbiased estimator of

p_{j}

, for

j = 1, \dots, n

. Thus,

T_{n}

is an estimator of the entropy

ε_{n} (p)

. It takes the form of the M-statistic in Equation (1) with

ξ_{j} (y; t) = - y log y + n^{- 1} k log k - n^{- 1} k t

. Using the M-P representation and some algebraic manipulations, the c.g.f. in Equation (12) takes the form

K_{n} (v; s) = k (log k - s_{1}) v_{1} - n s_{2} v_{2} - q + \sum_{j = 1}^{n} log \{1 + \sum_{l = 1}^{\infty} \frac{1}{l!} {(q p_{j} e^{v_{2}} l^{- v_{1}})}^{l}\},

with

q \in R_{+}^{*}

arbitrary. We set

s_{2} = k / n

and select q such that

E [S_{2, n}] = k / n

, i.e.,

q = k

. With this choice of q, the marginal saddlepoint equation, cf. Equation (14), has the trivial solution

β = 0

. This yields

K_{n} (v; (s_{1}, \frac{k}{n})) = k \{(log k - s_{1}) v_{1} - v_{2} - 1\} + \sum_{j = 1}^{n} log \{1 + \sum_{l = 1}^{\infty} \frac{1}{l!} {(k p_{j} e^{v_{2}} l^{- v_{1}})}^{l}\} .

(25)
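Equation (25) can be evaluated numerically by truncating the inner series. The sketch below is an illustration under assumptions: the function name and the truncation point lmax are hypothetical choices, all p_j are assumed positive, and the series is summed on the log scale to avoid overflow. Two sanity checks are available: the c.g.f. vanishes at v = 0, and the second derivative w.r.t. v_2 at v = 0 is equal to k.

```python
import math


def cgf_entropy_mp(v1, v2, s1, p, k, lmax=400):
    """Equation (25), with the inner series truncated at lmax (assumption)."""
    total = k * ((math.log(k) - s1) * v1 - v2 - 1.0)
    for pj in p:
        # Logarithms of the series terms; the leading 0.0 is log(1).
        logs = [0.0]
        for l in range(1, lmax + 1):
            logs.append(l * (math.log(k * pj) + v2 - v1 * math.log(l))
                        - math.lgamma(l + 1))
        top = max(logs)
        total += top + math.log(sum(math.exp(a - top) for a in logs))
    return total


n, k = 6, 32
p = [2 * j / (n * (n + 1)) for j in range(1, n + 1)]

value_at_zero = cgf_entropy_mp(0.0, 0.0, 1.5, p, k)

# Central second difference in v2 at v = 0; it should be close to k.
h = 1e-3
second_derivative = (cgf_entropy_mp(0.0, h, 1.5, p, k)
                     - 2.0 * value_at_zero
                     + cgf_entropy_mp(0.0, -h, 1.5, p, k)) / h ** 2
```

The truncation is adequate near v = 0; for strongly negative v_1 the terms decay slowly or not at all, so lmax would need to be reconsidered.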

Computing the second order derivatives is long but basic. We only give the simple result

K_{2, n}^{″} (0; (s_{1}, k / n)) = k

; it can be used for controlling the formula of the second derivative.

We can now apply the saddlepoint approximation of Section 3 to the following case:

p_{j} = 2 j / {n (n + 1)}

, for

j = 1, \dots, n

,

n = 6

and

k = 32

. The saddlepoint approximation is compared with the Monte Carlo distribution of

T_{6}

based on

10^{6}

simulations. The numerical results are displayed in Figure 1 and Table 1. The probabilities obtained by simulation are denoted

P_{E} [T_{6} < t]

, the probabilities obtained by the saddlepoint approximation are denoted

P_{S} [T_{6} < t]

,

ae (t)

denotes the absolute error defined in Equation (21) and

re (t)

denotes the absolute relative error defined in Equation (22), for

t \in [1.20, 1.77]

. We see that the relative errors are mostly very small. The largest one occurs in the extreme right tail and it is around 31%.
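The Monte Carlo side of this comparison can be reproduced along the following lines. This is a sketch only: the function names, the seed and the reduced number of simulations (far fewer than the 10^6 used for the reported results) are assumptions made to keep the example fast.

```python
import math
import random
from collections import Counter


def entropy_estimator(counts, k):
    """Equation (24): T_n = -(1/k) sum_j Y_j log Y_j + log k, with 0 log 0 = 0."""
    return -sum(y * math.log(y) for y in counts if y > 0) / k + math.log(k)


def simulate_cdf(p, k, t, n_sim=20000, seed=1):
    """Monte Carlo estimate of P[T_n < t] under the multinomial model."""
    rng = random.Random(seed)
    colors = range(len(p))
    hits = 0
    for _ in range(n_sim):
        # Counts of the drawn colors; colors never drawn contribute 0 log 0 = 0.
        counts = Counter(rng.choices(colors, weights=p, k=k)).values()
        hits += entropy_estimator(counts, k) < t
    return hits / n_sim


n, k = 6, 32
p = [2 * j / (n * (n + 1)) for j in range(1, n + 1)]
p_e = simulate_cdf(p, k, t=1.6060)
```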

Example 2 (Power of likelihood ratio test).

The estimator of entropy in Equation (24) is closely related to the likelihood ratio test. Consider a sample of k i.i.d. random variables and consider any partition of their range that is made by n intervals of positive length. Denote by

p_{j}

the probability that any one of the sample values belongs to the jth interval, for

j = 1, \dots, n

. Denote by

Y_{j}

the number of sample values that belong to the jth interval, for

j = 1, \dots, n

. Then,

(Y_{1}, \dots, Y_{n})

takes values in

{\ddot{Δ}}_{k}^{n - 1}

and follows the multinomial distribution in Equation (3). Consider the null hypothesis

H_{0} : p \in Π_{0}

, where

Π_{0} \subset Δ_{1}^{n - 1}

. The likelihood ratio test statistic for

H_{0}

against the general alternative is given by

\begin{matrix} L_{n} (Y_{1}, \dots, Y_{n}) & = \frac{\sup_{p \in Π_{0}} \{\frac{k!}{Y_{1}! \dots Y_{n}!} p_{1}^{Y_{1}} \dots p_{n}^{Y_{n}}\}}{\sup_{p \in Δ_{1}^{n - 1}} \{\frac{k!}{Y_{1}! \dots Y_{n}!} p_{1}^{Y_{1}} \dots p_{n}^{Y_{n}}\}} . \end{matrix}

By restricting to

Π_{0} = {p_{0}}

, for some

p_{0} \in Δ_{1}^{n - 1}

, we obtain

T_{n}^{*} (Y_{1}, \dots, Y_{n}) = - 2 log L_{n} (Y_{1}, \dots, Y_{n}) = 2 \sum_{j = 1}^{n} Y_{j} log Y_{j} - 2 \sum_{j = 1}^{n} Y_{j} log p_{0, j} - 2 k log k .

(26)

In the case

p_{0, 1} = \dots = p_{0, n} = n^{- 1}

, which can be obtained without loss of generality by the probability integral transform,

T_{n}^{*} (Y_{1}, \dots, Y_{n})

is equal to

2 \sum_{j = 1}^{n} Y_{j} log Y_{j}

plus a constant term. Then,

T_{n}^{*} \overset{d}{⟶} χ_{n - 1}^{2}

, as

k \to \infty

. In addition, if

k, n \to \infty

, with

k / n \to l

, for some

l \in (1, \infty)

, then

T_{n}^{*}

is asymptotically normal.

The numerical evaluation of the saddlepoint approximation to the distribution of

T_{n}^{*}

, with

n = 4

,

k = 12

and under

H_{0}

, is given in Table 1 in [5]. We now extend the numerical study to the evaluation of the power function at any point of the alternative, viz. at any

p \in Δ_{1}^{n - 1} \setminus \{(n^{- 1}, \dots, n^{- 1})\}

. Because

T_{n}^{*}

is an affine transform of the entropy estimator

T_{n}

given in Equation (24), we rather consider

T_{n}

as test statistic. Thus, the c.g.f. for the saddlepoint approximation is already given in Equation (25). Consider the power function at the point of alternative hypothesis

p_{j} = 2 j / {n (n + 1)}

, for

j = 1, \dots, n

. We select

n = 6

and

k = 32

. The saddlepoint approximation to the distribution of

T_{6}

under

H_{0}

gives

P_{S} [T_{6} < 1.6060] = 0.0495 .

The saddlepoint approximation to the distribution of

T_{6}

under the chosen alternative point gives

P_{S} [T_{6} < 1.6060] = 0.5691 .

This distribution is computed in Example 1. Thus,

0.5691

is the saddlepoint approximation to the power of the test with size 0.0495 at the given alternative.

In situations where

Π_{0}

is the singleton containing the vector of unequal elements

p_{0, 1}, \dots, p_{0, n}

, the saddlepoint approximation can be obtained in a similar way. An important application is language identification, where these probabilities represent the frequencies of the n letters of the alphabet of a language and

Y_{1} / k, \dots, Y_{n} / k

are the frequencies of these n letters within a text of k letters. The belonging of the text to the language can be tested with the statistic

T_{n}^{*}

, which is in fact proportional to the Kullback–Leibler information. Precisely, denote

ι_{n} (v | w) = \sum_{j = 1}^{n} v_{j} log \frac{v_{j}}{w_{j}}

the Kullback–Leibler information or discrepancy between the two probability distributions

v = (v_{1}, \dots, v_{n}) \in Δ_{1}^{n - 1}

and

w = (w_{1}, \dots, w_{n}) \in Δ_{1}^{n - 1}

, that satisfy the absolute continuity condition

w_{j} = 0 \Rightarrow v_{j} = 0

, for

j = 1, \dots, n

. Then,

T_{n}^{*} = 2 k ι_{n} (Y_{1} / k, \dots, Y_{n} / k | p_{0, 1}, \dots, p_{0, n})

.
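The identities of this example can be verified numerically. The sketch below uses hypothetical function names and hypothetical counts. It checks that the statistic in Equation (26) equals 2 k times the Kullback–Leibler information and that, for equal null probabilities, it is the affine transform 2 k (log n − T_n) of the entropy estimator in Equation (24).

```python
import math


def xlogx(y):
    """y log y with the convention 0 log 0 = 0."""
    return y * math.log(y) if y > 0 else 0.0


def lr_statistic(counts, p0):
    """Equation (26): T*_n = 2 sum Y log Y - 2 sum Y log p0 - 2 k log k."""
    k = sum(counts)
    return (2 * sum(xlogx(y) for y in counts)
            - 2 * sum(y * math.log(q) for y, q in zip(counts, p0))
            - 2 * k * math.log(k))


def kullback_leibler(v, w):
    """Kullback-Leibler information between v and w (terms with v_j = 0 vanish)."""
    return sum(a * math.log(a / b) for a, b in zip(v, w) if a > 0)


counts = [3, 0, 7, 5, 9, 8]  # hypothetical counts, k = 32
k, n = sum(counts), len(counts)
p0 = [1.0 / n] * n

t_star = lr_statistic(counts, p0)
t_kl = 2 * k * kullback_leibler([y / k for y in counts], p0)
t_n = -sum(xlogx(y) for y in counts) / k + math.log(k)  # Equation (24)
```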

Example 3 (Total claim amount under delayed settlement).

We are interested in the distribution of the total claim amount of an insurance company over a fixed time horizon. We assume that the delay of claim settlement increases as the individual claim amount increases. This can happen in actuarial practice, partially because large claim amounts require longer controls. Precisely, the individual claim amounts are i.i.d. random variables taking the n values

r_{1} < \dots < r_{n}

, all in

R_{+}^{*}

, for

n = 2, 3, \dots

. Let

j \in {1, \dots, n}

. Claims of amount

r_{j}

are settled exactly after the jth unit of time (e.g., months). During a given time horizon (e.g., a year),

Y_{j}

claims of amount

r_{j}

occur. We assume that

k \in N^{*}

claims have occurred during the time horizon under consideration and that

(Y_{1}, \dots, Y_{n})

, which takes values in

{\ddot{Δ}}_{k}^{n - 1}

, follows the multinomial distribution in Equation (3). The total claim amount settled during the time horizon is thus

\sum_{j = 1}^{n} r_{j} Y_{j}

. We are interested in the distribution of the proportion of total claim amount that is settled exactly after the mth unit of time, viz. of

T_{n} = T_{n} (Y_{1}, \dots, Y_{n}) = \frac{\sum_{j = 1}^{m} r_{j} Y_{j}}{\sum_{j = 1}^{n} r_{j} Y_{j}},

(27)

for some

m \in {1, \dots, n - 1}

. It can be re-expressed as the M-statistic in Equation (1) with

ξ_{j} (y; t) = r_{j} (I {j \leq m} - t) y, for j = 1, \dots, n .

The M-P representation tells that the multinomial claim counts have the distribution of independent Poisson occurrences, given a total of k claim occurrences. Thus, with some algebraic manipulations, the c.g.f. in Equation (12) becomes

\begin{matrix} K_{n} (v; s) & = & - n s_{2} v_{2} - q \\ + \sum_{j = 1}^{n} log (1 + \sum_{l = 1}^{\infty} \frac{1}{l!} exp {[v_{1} r_{j} (I {j \leq m} - s_{1}) + v_{2} + log (q p_{j})] l}), \end{matrix}

with arbitrary

q \in R_{+}^{*}

. We set

s_{2} = k / n

and select q such that

E [S_{2, n}] = k / n

, i.e.,

q = k

. Thus, the marginal saddlepoint equation, cf. Equation (14), is solved by

β = 0

. This leads to

\begin{matrix} K_{n} (v; (s_{1}, \frac{k}{n})) & = & - k (1 + v_{2}) \\ + \sum_{j = 1}^{n} log (1 + \sum_{l = 1}^{\infty} \frac{1}{l!} exp {[v_{1} r_{j} (I {j \leq m} - s_{1}) + v_{2} + log (k p_{j})] l}) . \end{matrix}

By computing the second order derivatives, we find

K_{2, n}^{″} (0; (s_{1}, k / n)) = k

.

For the numerical illustration, consider the multinomial distribution in Equation (3) with probabilities

p_{1} = 0.15

,

p_{2} = 0.23

,

p_{3} = 0.16

,

p_{4} = 0.14

,

p_{5} = 0.12

,

p_{6} = 0.1

,

p_{7} = 0.06

, and

p_{8} = 0.04

and the total number of

k = 30

claims. Thus,

n = 8

and the possible claim amounts are

r_{1} = 10

,

r_{2} = 15

,

r_{3} = 20

,

r_{4} = 30

,

r_{5} = 50

,

r_{6} = 70

,

r_{7} = 100

and

r_{8} = 120

. The number of units of time for the proportion of settled total claim amount, cf. Equation (27), is

m = 4

. To assess the accuracy of the saddlepoint approximation, we compute the Monte Carlo distribution of

T_{8}

, based on

10^{6}

simulations. The numerical results are displayed in Table 2. The probabilities obtained by simulation are denoted

P_{E} [T_{8} < t]

, the probabilities obtained by the saddlepoint approximation are denoted

P_{S} [T_{8} < t]

and

re (t)

denotes the relative error, cf. Equation (22), for

t \in [0.12, 0.72]

. Most relative errors are below 5%. The largest one occurs in the extreme left tail and is approximately

12 %

.

A practical question would be the following: Which value of t bounds from above the proportion of total claim amount

T_{8}

with probability 0.99? One computes directly

P_{S} [T_{8} < 0.63] = 0.9897

and thus

t = 0.63

, approximately.
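The setting of this example can be simulated as follows. This is a sketch with hypothetical function names, an arbitrary seed and a reduced number of simulations. It uses the claim amounts and probabilities given above, computes the proportion in Equation (27), checks that this proportion is the root of the estimating equation, and estimates P[T_8 < t] by Monte Carlo.

```python
import random

r = [10, 15, 20, 30, 50, 70, 100, 120]
p = [0.15, 0.23, 0.16, 0.14, 0.12, 0.10, 0.06, 0.04]
k, m = 30, 4


def proportion_settled(counts):
    """Equation (27): share of the total claim amount due to the first m amounts."""
    total = sum(rj * yj for rj, yj in zip(r, counts))
    return sum(rj * yj for rj, yj in zip(r[:m], counts[:m])) / total


def estimating_sum(counts, t):
    """sum_j xi_j(Y_j; t) with xi_j(y; t) = r_j (I{j <= m} - t) y (0-based j here)."""
    return sum(rj * ((j < m) - t) * yj
               for j, (rj, yj) in enumerate(zip(r, counts)))


def simulate_cdf(t, n_sim=20000, seed=1):
    """Monte Carlo estimate of P[T_8 < t] under the multinomial model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        counts = [0] * len(p)
        for j in rng.choices(range(len(p)), weights=p, k=k):
            counts[j] += 1
        hits += proportion_settled(counts) < t
    return hits / n_sim


p_e = simulate_cdf(0.63)
```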

Example 4 (Bootstrap distribution of M-statistic).

Let

R_{1}, \dots, R_{n}

be a sample of i.i.d. random variables taking values in

R

, for

n = 2, 3, \dots

. Absolute continuity is assumed, in order to avoid repeated values a.s. Consider the M-statistic

U_{n}

or

U_{n} (R_{1}, \dots, R_{n})

defined as the root in t of

\begin{matrix} \sum_{j = 1}^{n} ζ (R_{j}; t) = 0, \end{matrix}

where

ζ : R^{2} \to R

is continuous and decreasing in its second argument. Let

r_{1}, \dots, r_{n}

be a realization of the sample and let

R_{1}^{*}, \dots, R_{n}^{*}

be the random variables obtained by sampling with replacement from the values

r_{1}, \dots, r_{n}

with respective probabilities

p_{1}, \dots, p_{n}

, for

(p_{1}, \dots, p_{n}) \in Δ_{1}^{n - 1}

. The distribution of

U_{n} (R_{1}^{*}, \dots, R_{n}^{*})

, or simply

U_{n}^{*}

, is the bootstrap distribution of

U_{n}

.

This coincides with sampling with replacement from the general urn model of Section 2.2, if the color

C_{j}

is associated to the value

r_{j}

, for

j = 1, \dots, n

, and if the number of draws from the urn is

k = n

. Define

ξ_{j} (y; t) = y ζ (r_{j}; t)

, for

t \in R

,

y \in N

and for

j = 1, \dots, n

. Then,

U_{n}^{*}

can be represented as the solution w.r.t. t of Equation (1), denoted

T_{n}

, in which

Y_{j}

is the number of times that

r_{j}

has been sampled, for

j = 1, \dots, n

. The conditional saddlepoint approximation of Section 3 yields the distribution of

T_{n}

, i.e., of

U_{n}^{*}

, i.e., of the bootstrap distribution of

U_{n}

. In most practical cases,

p_{1} = \dots = p_{n} = n^{- 1}

, i.e.,

a_{1, 0} = \dots = a_{n, 0}

.
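The correspondence between bootstrap resampling and the counts Y_1, …, Y_n can be illustrated with the sample mean, i.e., with ζ(r; t) = r − t. In the sketch below, the function names, the seed and the sample values are hypothetical.

```python
import random


def bootstrap_counts(n, rng):
    """Y_j = number of times the j-th value is drawn in n draws with replacement."""
    counts = [0] * n
    for j in rng.choices(range(n), k=n):
        counts[j] += 1
    return counts


def m_statistic_mean(r, counts):
    """Root in t of sum_j Y_j * zeta(r_j; t) = 0 for zeta(r; t) = r - t."""
    return sum(y * x for x, y in zip(r, counts)) / sum(counts)


rng = random.Random(0)
r = [2.1, -0.4, 1.3, 0.7, 3.5]  # hypothetical realized sample
counts = bootstrap_counts(len(r), rng)

# Bootstrap sample rebuilt from the counts, and its mean.
resample = [x for x, y in zip(r, counts) for _ in range(y)]
u_star = sum(resample) / len(resample)
```

The mean of the resample and the root computed from the counts agree, which is the identification of U_n^* with T_n used in this example.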

The saddlepoint approximation for bootstrap distributions was introduced by [34,35,36] and for M-estimators by [37]. A review can be found in [38] (Section 9.5). Thus, the conditional saddlepoint approximation of Section 3 provides an alternative saddlepoint approximation to the bootstrap distribution of M-estimators.

Other applications of this saddlepoint approximation with the M-P representation that can be found in the literature are the following. Saddlepoint approximations for the likelihood ratio test and for chi-square tests for grouped data, under the null hypotheses, are given in [3]. For the numerical evaluation of the saddlepoint approximation for the likelihood ratio statistic, refer to [5].

4.2. Sampling without Replacement and MH-B Representation

The saddlepoint approximation combined with the MH-B representation can be applied for approximating the distribution of the M-statistic in Equation (1) in finite population sampling, viz. under sampling without replacement. Example 5 analyzes the numerical accuracy of the saddlepoint approximation to the distribution of the coloration entropy when sampling is without replacement.

Example 5 (Entropy’s estimator under sampling without replacement).

We consider the entropy estimation of Example 1 in the context of sampling without replacement. We are interested in the coloration entropy

ε_{n} (a_{1, 0} / z, \dots, a_{n, 0} / z)

, as given by Equation (23), with

a_{1, 0}, \dots, a_{n, 0}

unknown. It is the entropy of the initial state of the urn. In the multivariate hypergeometric model in Equation (4),

Y_{j} / k

is an unbiased estimator of

a_{j, 0} / z

, for

j = 1, \dots, n

, where

(Y_{1}, \dots, Y_{n})

takes values in

{\ddot{Δ}}_{k}^{n - 1} \cap ([0, m_{1}] \times \dots \times [0, m_{n}])

. Thus, an estimator of this entropy is given by Equation (24). The unknown parameters of the multivariate hypergeometric distribution in Equation (4) are

m_{j} = a_{j, 0}

, for

j = 1, \dots, n

.

With the MH-B representation and some algebraic manipulations, the c.g.f. in Equation (12) becomes

\begin{matrix} K_{n} (v; s) & = k (log k - s_{1}) v_{1} - n s_{2} v_{2} + z log (1 - q) + \sum_{j = 1}^{n} log \{1 + \sum_{l = 1}^{m_{j}} \frac{{(m_{j})}_{l}}{l!} {(\frac{q}{1 - q} e^{v_{2}} l^{- v_{1}})}^{l}\}, \end{matrix}

(28)

with

q \in (0, 1)

arbitrary. We set

s_{2} = k / n

and select q such that

E [S_{2, n}] = k / n

, i.e.,

q = k / z

. For this purpose, we assume

k < z

. With this choice, the marginal saddlepoint equation, cf. Equation (14), has the trivial solution

β = 0

and the c.g.f. in Equation (28) becomes

\begin{matrix} K_{n} (v; (s_{1}, \frac{k}{n})) & = & k \{(log k - s_{1}) v_{1} - v_{2}\} + z {log (z - k) - log z} \\ + \sum_{j = 1}^{n} log \{1 + \sum_{l = 1}^{m_{j}} \frac{{(m_{j})}_{l}}{l!} {(\frac{k}{z - k} e^{v_{2}} l^{- v_{1}})}^{l}\} . \end{matrix}

The second order derivatives of

K_{n}

can be obtained through long but simple algebraic manipulations. In particular, we find

K_{2, n}^{″} (0; (s_{1}, k / n)) = k (z - k) / z

.
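The c.g.f. obtained with q = k / z can be checked numerically in the same way as in Example 1. The sketch below rests on assumptions: the function name is hypothetical and (m_j)_l is read as the falling factorial, so that (m_j)_l / l! is the binomial coefficient of m_j and l. The values m_1, …, m_7 and k are the ones of the numerical illustration that follows.

```python
import math


def cgf_entropy_mhb(v1, v2, s1, m, k):
    """C.g.f. of Equation (28) with q = k / z, i.e., q / (1 - q) = k / (z - k)."""
    z = sum(m)
    w = k / (z - k)
    total = (k * ((math.log(k) - s1) * v1 - v2)
             + z * (math.log(z - k) - math.log(z)))
    for mj in m:
        series = 1.0
        for l in range(1, mj + 1):
            # (m_j)_l / l! is taken as the binomial coefficient (assumption).
            series += math.comb(mj, l) * (w * math.exp(v2) * l ** (-v1)) ** l
        total += math.log(series)
    return total


m = [2, 4, 6, 8, 10, 12, 14]
k = 25
z = sum(m)

value_at_zero = cgf_entropy_mhb(0.0, 0.0, 1.5, m, k)

# Central second difference in v2 at v = 0; it should be close to k (z - k) / z.
h = 1e-3
second_derivative = (cgf_entropy_mhb(0.0, h, 1.5, m, k)
                     - 2.0 * value_at_zero
                     + cgf_entropy_mhb(0.0, -h, 1.5, m, k)) / h ** 2
```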

For the numerical illustration, we consider the multivariate hypergeometric distribution with

n = 7

,

m_{1} = 2

,

m_{2} = 4

,

m_{3} = 6

,

m_{4} = 8

,

m_{5} = 10

,

m_{6} = 12

,

m_{7} = 14

and

k = 25

. We compute the Monte Carlo distribution of

T_{7}

based on

10^{6}

simulations. The saddlepoint approximation is obtained by following the steps of Section 3. The results are given in Table 3. The saddlepoint probabilities are obtained instantaneously and we see that the relative errors are below 15%, with the exception of an extreme left tail point, for which the relative error is

25 %

.

We now summarize two practical applications of the conditional saddlepoint approximation with the MH-B representation. The first one can be found in [39] and concerns a permutation test of comparison of two groups. The jth individual belongs to the control group, when

Y_{j} = 0

, and to the treatment group, when

Y_{j} = 1

, for

j = 1, \dots, n

. We have

(Y_{1}, \dots, Y_{n}) \sim Multi - Hypergeometric (k; 1, \dots, 1)

, where k is the number of individuals of the treatment group. The realizations of

(Y_{1}, \dots, Y_{n})

represent the permutations of the individuals and the test statistic

T_{n}

is a linear combination of the elements of

(Y_{1}, \dots, Y_{n})

. The permutation distribution of

T_{n}

is obtained from Equation (2), where

X_{1}, \dots, X_{n}

are i.i.d. Bernoulli random variables.

The second application can be found in [40] and concerns the jackknife distribution of a ratio. Consider the fixed sample

z_{1}, \dots, z_{n}

, sample without replacement

1 \leq d < n

values and define

Y_{j} = 0

, if

z_{j}

is not sampled, and

Y_{j} = 1

, if

z_{j}

is sampled, for

j = 1, \dots, n

. This procedure is repeated many times and a statistic of interest is computed

k = n - d

times, from the k sampled values of each iteration. In the terminology of B. Efron, this is called the delete-d jackknife. We have

(Y_{1}, \dots, Y_{n}) \sim Multi - Hypergeometric (k; 1, \dots, 1)

, where k is the sample size of the jackknife samples. The realizations of

(Y_{1}, \dots, Y_{n})

represent the permutations of

(z_{1}, \dots, z_{n})

. The statistic considered in [40] is

T_{n} = \sum_{j = 1}^{n} v_{j} Y_{j} / \sum_{j = 1}^{n} u_{j} Y_{j}

, for

u_{j}, v_{j} \in R

, for

j = 1, \dots, n

. The permutation, viz. delete-d jackknife, distribution of

T_{n}

is obtained from Equation (2), where

X_{1}, \dots, X_{n}

are independent Bernoulli random variables with parameter

1 / 2

, together with the saddlepoint approximation for M-statistics of Section 3.

4.3. Polya’s Sampling and MP-NB Representation

This section provides various applications of the saddlepoint approximation with the MP-NB representation. Example 6 considers the estimator of the entropy of the initial coloration probabilities of the urn, in the setting of Polya’s sampling. Example 7 considers the Bayesian analysis of this entropy. The Bayesian entropy estimator under a multivariate Polya prior and sampling without replacement is considered. The saddlepoint approximation to the posterior distribution of the entropy can be obtained with the MP-NB representation. Example 8 concerns the saddlepoint approximation with the MP-NB representation for many two-sample tests based on spacing-frequencies.

Example 6 (Entropy’s estimator under Polya’s sampling).

We consider the entropy estimation problem introduced in Example 1, now in the context of Polya’s sampling. We are interested in the entropy of the initial coloration probabilities

ε_{n} (a_{1, 0} / z, \dots, a_{n, 0} / z)

, given in Equation (23), where

a_{1, 0}, \dots, a_{n, 0}

are unknown. In the multivariate Polya model in Equation (5),

Y_{j} / k

is an unbiased estimator of

a_{j, 0} / z

, for

j = 1, \dots, n

, and so an estimator of the entropy is given by Equation (24). The parameters of the multivariate Polya distribution in Equation (5) are k, the number of draws in the urn model, and

m_{j} = a_{j, 0} / r

, for

j = 1, \dots, n

. Using the MP-NB representation, the c.g.f. in Equation (12) becomes

\begin{matrix} K_{n} (v; s) & = & u log q - k (s_{1} - log k) v_{1} - n s_{2} v_{2} \\ + \sum_{j = 1}^{n} log \{1 + \sum_{l = 1}^{\infty} \frac{{(l + m_{j} - 1)}_{l}}{l!} {[(1 - q) e^{v_{2}} l^{- v_{1}}]}^{l}\}, \end{matrix}

(29)

with

q \in (0, 1)

arbitrary. This formula allows for the direct evaluation of the conditional saddlepoint approximation of Section 3.

Example 7 (Bayesian Entropy’s estimator under multivariate Polya a priori and sampling without replacement).

The multivariate Polya distribution is often used as a prior distribution in Bayesian statistics, because it constitutes a conjugate class when associated to the multivariate hypergeometric likelihood. Precisely, consider the prior

\begin{matrix} (M_{1}, \dots, M_{n}) & \sim Multi - Polya (z; α_{1}, \dots, α_{n}) \end{matrix}

(30)

taking value in

{\ddot{Δ}}_{z}^{n - 1}

, for

z \in N^{*}

,

(α_{1}, \dots, α_{n}) \in Δ_{u}^{n - 1}

and

u \in R_{+}^{*}

, and consider the likelihood

(Y_{1}, \dots, Y_{n}) ∣ {(M_{1}, \dots, M_{n}) = (m_{1}, \dots, m_{n})} \sim Multi - Hypergeometric (k; m_{1}, \dots, m_{n}),

for

(m_{1}, \dots, m_{n}) \in {\ddot{Δ}}_{z}^{n - 1}

,

k \in {0, \dots, z}

and

(Y_{1}, \dots, Y_{n})

taking values in

{\ddot{Δ}}_{k}^{n - 1} \cap ([0, m_{1}] \times \dots \times [0, m_{n}])

. Then, the posterior is given by

\begin{matrix} {(M_{1}, \dots, M_{n}) | (Y_{1}, \dots, Y_{n}) = (k_{1}, \dots, k_{n})} \sim \\ (k_{1}, \dots, k_{n}) + Multi - Polya (z - k; α_{1} + k_{1}, \dots, α_{n} + k_{n}), \end{matrix}

(31)

for

(k_{1}, \dots, k_{n}) \in {\ddot{Δ}}_{k}^{n - 1} \cap ([0, m_{1}] \times \dots \times [0, m_{n}])

. Indeed,

\begin{matrix} P [M_{1} = m_{1}, \dots, M_{n} = m_{n} | Y_{1} = k_{1}, \dots, Y_{n} = k_{n}] & \propto \prod_{j = 1}^{n} (\binom{m_{j}}{k_{j}}) (\binom{α_{j} + m_{j} - 1}{m_{j}}) \\ \propto \prod_{j = 1}^{n} \frac{(α_{j} + m_{j} - 1)!}{(m_{j} - k_{j})!} \\ \propto \frac{\prod_{j = 1}^{n} (\binom{(α_{j} + k_{j}) + (m_{j} - k_{j}) - 1}{m_{j} - k_{j}})}{(\binom{(u + k) + (z - k) - 1}{z - k})}, \end{matrix}

where the last result is in fact equal to the posterior probability. Thus, Equation (31) holds.
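The conjugacy in Equation (31) can be verified by brute force on a small case. The sketch below rests on assumptions: the function names and the numerical values of α, z and the observed counts are hypothetical, and the multivariate Polya probabilities are taken in the form used in the display above, i.e., a product of generalized binomial coefficients divided by the coefficient of the totals.

```python
import math
from itertools import product


def log_gbinom(a, l):
    """log of binom(a + l - 1, l) = Gamma(a + l) / (Gamma(a) l!), for real a > 0."""
    return math.lgamma(a + l) - math.lgamma(a) - math.lgamma(l + 1)


def polya_pmf(m, alpha):
    """Multivariate Polya probability of m (form assumed from the display above)."""
    z, u = sum(m), sum(alpha)
    return math.exp(sum(log_gbinom(a, x) for a, x in zip(alpha, m))
                    - log_gbinom(u, z))


def hypergeometric_pmf(y, m):
    """Multivariate hypergeometric probability of the counts y given m."""
    return (math.prod(math.comb(a, b) for a, b in zip(m, y))
            / math.comb(sum(m), sum(y)))


def simplex(z, n):
    """All vectors of n nonnegative integers with sum z."""
    return [m for m in product(range(z + 1), repeat=n) if sum(m) == z]


alpha, z, y = (1.5, 2.0, 0.5), 6, (1, 0, 2)  # hypothetical values
support = simplex(z, len(alpha))

# Posterior by Bayes' formula: prior times likelihood, then normalization.
joint = {m: polya_pmf(m, alpha) * hypergeometric_pmf(y, m) for m in support}
norm = sum(joint.values())

# Posterior according to Equation (31): y plus a multivariate Polya vector.
post_alpha = tuple(a + b for a, b in zip(alpha, y))


def shifted_polya(m):
    if any(a < b for a, b in zip(m, y)):
        return 0.0
    return polya_pmf(tuple(a - b for a, b in zip(m, y)), post_alpha)


max_gap = max(abs(joint[m] / norm - shifted_polya(m)) for m in support)
```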

The underlying urn model is the sampling without replacement described in Section 2.2, where the initial number of balls of each one of the colors

C_{1}, \dots, C_{n}

, viz.

m_{j} = a_{j, 0}

, for

j = 1, \dots, n

, in the same order, is unknown. Only

z = \sum_{j = 1}^{n} a_{j, 0}

is known. These initial counts are the elements of the random vector

(M_{1}, \dots, M_{n})

with prior distribution in Equation (30). Sampling without replacement has led to the counts

(Y_{1}, \dots, Y_{n}) = (k_{1}, \dots, k_{n})

, for the colors

C_{1}, \dots, C_{n}

, in the same order. The updated or posterior distribution of

(M_{1}, \dots, M_{n})

is given by Equation (31).

Assume that we are interested in the entropy of the probabilities of the initial coloration. The a priori entropy is thus

T_{n} (M_{1}, \dots, M_{n}) = ε_{n} (M_{1} / z, \dots, M_{n} / z)

, cf. Equation (23). According to Equation (31), the a posteriori entropy is

T_{n} (k_{1} + L_{1}, \dots, k_{n} + L_{n})

, where

(L_{1}, \dots, L_{n}) \sim Multi - Polya (z - k; α_{1} + k_{1}, \dots, α_{n} + k_{n})

.

The saddlepoint approximations to the distributions of the a priori and a posteriori entropies can be obtained by the saddlepoint approximation of Section 3 with MP-NB representation, as in Example 6. The a priori and a posteriori c.g.f. can be obtained by minor adaptations of the c.g.f. in Equation (29).

Example 8 (Two-sample tests based on spacing frequencies).

Consider two independent samples: the first consisting of k independent random variables

U_{1}, \dots, U_{k}

with common absolutely continuous distribution

P_{U}

and the second sample consisting of l independent random variables

V_{1}, \dots, V_{l}

with common absolutely continuous distribution

P_{V}

. All these random variables have common range given by the real interval

I

. We wish to test the null hypothesis

H_{0} : P_{U} = P_{V}

. Define

V_{(0)} = \inf I

,

V_{(l + 1)} = \sup I

and

V_{(1)} \leq \dots \leq V_{(l)}

the ordered

V_{1}, \dots, V_{l}

. Let

n = l + 1

. The random counts

Y_{j} = \sum_{i = 1}^{k} I {U_{i} \in [V_{(j - 1)}, V_{(j)})}, for j = 1, \dots, n,

(32)

are called spacing-frequencies: they provide the number of random variables

U_{1}, \dots, U_{k}

that lie between gaps made by

V_{(0)}, \dots, V_{(l + 1)}

. Thus,

(Y_{1}, \dots, Y_{n})

takes values in

{\ddot{Δ}}_{k}^{n - 1}

and possesses exchangeable components under

H_{0}

.

Denote by

R_{j}

the rank of the jth largest

V_{1}, \dots, V_{l}

in the combined sample, for

j = 1, \dots, l

. It is easily seen that

R_{j} = \sum_{i = 1}^{j} (Y_{i} + 1)

, or,

Y_{j} = R_{j} - R_{j - 1} - 1

, for

j = 1, \dots, l

. Consequently, many two-sample test statistics based on ranks can be re-expressed in terms of spacing-frequencies. Besides this, spacing-frequencies are essential for the analysis of circular data, because they are invariant w.r.t. changes of null direction and sense of rotation (clockwise or anti-clockwise) (for a review, see, e.g., [41]). Circular data are planar directions and can be re-expressed as angles in radians, so that

I = [0, 2 π)

, or any other interval of length

2 π

.
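The spacing-frequencies in Equation (32) and their relation with the ranks can be computed as follows. The sketch uses hypothetical function names and simulated uniform samples on I = [0, 1), which are assumptions made only for illustration.

```python
import random
from bisect import bisect_left


def spacing_frequencies(u, v):
    """Y_j = #{U_i in [V_(j-1), V_(j))}, j = 1, ..., l + 1, cf. Equation (32)."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    # Number of U's strictly below each ordered V, then below sup I.
    below = [bisect_left(u_sorted, x) for x in v_sorted] + [len(u)]
    return [b - a for a, b in zip([0] + below[:-1], below)]


rng = random.Random(0)
u = [rng.random() for _ in range(12)]  # first sample, k = 12
v = [rng.random() for _ in range(5)]   # second sample, l = 5

y = spacing_frequencies(u, v)

# Ranks of the ordered V's in the combined sample.
combined = sorted(u + v)
ranks = [combined.index(x) + 1 for x in sorted(v)]
```

The ranks can be recovered from the spacing-frequencies through R_j = \sum_{i = 1}^{j} (Y_{i} + 1), which is the relation stated above.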

Holst and Rao [42] consider nonparametric test statistics of the form of

\begin{matrix} T_{n} (Y_{1}, \dots, Y_{n}) & = \sum_{j = 1}^{n} h_{j} (Y_{j}), \end{matrix}

(33)

for some Borel functions

h_{1}, \dots, h_{n}

. If

h_{1} = \dots = h_{n} = h

, then the test statistic

T_{n}

is called symmetric. Under

H_{0}

, the multivariate Polya distribution in Equation (5) holds with

m_{1} = \dots = m_{n} = 1

. Consequently,

u = \sum_{j = 1}^{n} m_{j} = n

and all Polya’s probabilities in Equation (5) are equal to

{(\binom{n + k - 1}{k})}^{- 1}

. This is in accordance with the result of combinatorics that the number of solutions

(k_{1}, \dots, k_{n}) \in N^{n}

of the equation

k_{1} + \dots + k_{n} = k

, i.e., card

{\ddot{Δ}}_{k}^{n - 1}

, is given by

(\binom{n + k - 1}{k})

. Thus, the equivalence in Equation (2) holds with the MP-NB representation, where the negative binomial reduces to the geometric distribution. Clearly, Equation (33) takes the form of the M-statistic in Equation (1) and the saddlepoint approximation of Section 3 can be applied.
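The combinatorial result used here can be confirmed by direct enumeration for small n and k. The function name and the values n = 4 and k = 5 in this short sketch are hypothetical.

```python
import math
from itertools import product


def simplex_cardinality(n, k):
    """Number of (k_1, ..., k_n) in N^n with k_1 + ... + k_n = k, by enumeration."""
    return sum(1 for y in product(range(k + 1), repeat=n) if sum(y) == k)


n, k = 4, 5
card = simplex_cardinality(n, k)
uniform_probability = 1.0 / math.comb(n + k - 1, k)  # common Polya probability under H_0
```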

We now summarize the examples presented in [3,5]. In the classical Wald–Wolfowitz run test,

T_{n}

takes the symmetric form of Equation (33) with

h (x) = I \{x > 0\}

. We define a U-run in the combined sample as a maximal non-empty set of adjacent

U_{1}, \dots, U_{k}

. Since each positive

Y_{1}, \dots, Y_{n}

is mapped to a different U-run and conversely,

T_{n}

yields the number of U-runs and it takes values in

\{1, \dots, n\}

. Large values of

T_{n}

show evidence for equal spread, i.e., for

H_{0}

. Reference [5] provides a numerical evaluation of the saddlepoint approximation to the distribution of

T_{n}

under

H_{0}

. The saddlepoint approximations to the distributions of the Wilcoxon (viz. Mann–Whitney), the van der Waerden (viz. normal score) and the Savage (viz. exponential score) test statistics are developed in [3]. The numerical study of Savage’s test appears in [5]. In the context of directional data, a generalization of Rao’s spacings test (see Section 4.4) to spacing-frequencies, together with its saddlepoint approximation, is given in [41].
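
For small k and n, the exact null distribution of the run statistic can be obtained by brute force and used as a reference for any approximation. The sketch below relies only on the fact, recalled above, that all points of the lattice simplex are equiprobable under the null hypothesis. The closed form used for comparison is the elementary count of compositions of k with exactly r positive parts. The function names and the values k = 5, n = 4 are ours and purely illustrative.

```python
from itertools import product
from math import comb


def compositions(k, n):
    """All (k_1, ..., k_n) of nonnegative integers with k_1 + ... + k_n = k."""
    return [c for c in product(range(k + 1), repeat=n) if sum(c) == k]


def run_count_distribution(k, n):
    """Exact null distribution of T_n = sum_j I{Y_j > 0}, by enumeration."""
    points = compositions(k, n)
    counts = {}
    for c in points:
        t = sum(1 for yj in c if yj > 0)  # number of non-empty cells (U-runs)
        counts[t] = counts.get(t, 0) + 1
    return {t: cnt / len(points) for t, cnt in sorted(counts.items())}


k, n = 5, 4
dist = run_count_distribution(k, n)
# Choose the r positive cells, then split k into r positive parts.
closed_form = {
    r: comb(n, r) * comb(k - 1, r - 1) / comb(n + k - 1, k)
    for r in range(1, min(n, k) + 1)
}
print(dist)
```

Enumeration becomes infeasible quickly as k and n grow, which is where an analytical approximation becomes useful.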

The so-called multispacing-frequencies are obtained by gaps of order larger than one made by

V_{(0)}, \dots, V_{(l + 1)}

. Let

g \in N^{*}

denote the differentiation gap order, such that

n = (l + 1) / g

is an integer. Then, the multispacing-frequencies are defined by

Y_{j} = \sum_{i = 1}^{k} I \{U_{i} \in [V_{((j - 1) g)}, V_{(j g)})\}, for j = 1, \dots, n .

(34)

In the case

g = 1

, Equation (34) coincides with the spacing-frequencies in Equation (32). As before with

g = 1

,

(Y_{1}, \dots, Y_{n})

takes values in

{\ddot{Δ}}_{k}^{n - 1}

. We reconsider the null hypothesis

H_{0} : P_{U} = P_{V}

and the general test statistics in Equation (33), however with the multispacing-frequencies in Equation (34). Under

H_{0}

, the multivariate Polya distribution in Equation (5) holds with

m_{1} = \dots = m_{n} = g

,

u = \sum_{j = 1}^{n} m_{j} = n g

and the MP-NB representation applies.
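
The construction in Equation (34) can be sketched as follows, assuming samples in [0, 1) without ties. The function name and the sample sizes are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
import random


def multispacing_frequencies(u, v, g, lower=0.0, upper=1.0):
    """Count the U's in each cell [V_((j-1)g), V_(jg)), j = 1, ..., n."""
    if (len(v) + 1) % g != 0:
        raise ValueError("the gap order g must divide l + 1")
    order_stats = [lower] + sorted(v) + [upper]  # V_(0), ..., V_(l+1)
    edges = order_stats[::g]                     # V_(0), V_(g), ..., V_(n g)
    counts = [0] * (len(edges) - 1)
    for x in u:
        # Zero-based index of the cell containing x.
        counts[bisect_right(edges, x) - 1] += 1
    return counts


rng = random.Random(2)
u = [rng.random() for _ in range(10)]  # k = 10 (illustrative)
v = [rng.random() for _ in range(5)]   # l = 5, so g = 2 gives n = 3
y_g1 = multispacing_frequencies(u, v, 1)
y_g2 = multispacing_frequencies(u, v, 2)
print(y_g1, y_g2)
```

With g = 1 the function returns the ordinary spacing-frequencies, and each multispacing-frequency of order g is the sum of g consecutive spacing-frequencies.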

The saddlepoint approximation with MP-NB representation was analyzed by Gatto and Jammalamadaka [7] in the context of the asymptotically most powerful multispacing-frequencies test against a specific sequence of alternative distributions and also in the context of the test statistic defined by the sum of squared multispacing-frequencies.

It seems difficult to formulate an arbitrary alternative hypothesis in terms of a particular multivariate Polya distribution, for the multispacing-frequencies. In this sense, the conditional saddlepoint approximation with the MP-NB representation may not be easily applied to power computations.

4.4. D-G Representation

Example 9 of this section analyzes the most powerful test of symmetry of the Dirichlet distribution. The saddlepoint approximation based on the D-G representation to the distribution of the test statistic under an asymmetric alternative is developed and its numerical accuracy is studied. The Dirichlet distributions form an important conjugate class for the multinomial distribution in Bayesian statistics. This is illustrated in Example 10, which presents a Bayesian bootstrap test on the entropy. The D-G representation with the conditional saddlepoint approximation allows computing the Bayes factor of the test without resampling. Another important application of the saddlepoint approximation with the D-G representation is for the class of one-sample tests based on spacings. This class of nonparametric tests is presented in Example 11 and has some similarities with the two-sample tests based on spacing-frequencies of Example 8. Example 11 provides a summary of the applications of this saddlepoint approximation to tests based on spacings that can be found in the literature.

Example 9 (Test for Dirichlet’s symmetry).

The symmetric Dirichlet distribution is obtained by setting

a_{1} = \dots = a_{n} = a

in Equation (9), for any

a \in R_{+}^{*}

. In Bayesian statistics, symmetric priors are of particular interest in the absence of prior knowledge on the individual elements, because these elements become exchangeable random variables. The single parameter a becomes a concentration parameter:

a = 1

yields the uniform distribution over

Δ_{1}^{n - 1}

(thus, the noninformative prior);

a > 1

yields a concave density over

Δ_{1}^{n - 1}

(thus, promoting similarity of elements); and

a < 1

yields a convex density over

Δ_{1}^{n - 1}

(thus, promoting dissimilarity of elements). For

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) \sim Dirichlet (a_{1}, \dots, a_{n})

, consider the testing problem of a particular symmetry against any particular asymmetric alternative. Precisely, given

a, α_{1}, \dots, α_{n} \in R_{+}^{*}

, where at least one of the values

α_{1}, \dots, α_{n}

differs from the other ones, consider

H_{0} : a_{1} = \dots = a_{n} = a

, against

H_{1} : (a_{1}, \dots, a_{n}) = (α_{1}, \dots, α_{n})

. The test of uniformity is obtained with

a = 1

. The Neyman–Pearson lemma states that the most powerful test has the form

T_{n} > t

, where

T_{n}

viz.

T_{n} ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

is given by

T_{n} ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) = \sum_{j = 1}^{n} (α_{j} - a) log {\bar{Y}}_{j} .

(35)

It is the M-statistic in Equation (1) with

ξ_{j} (y; t) = (α_{j} - a) log y - t / n

, for

j = 1, \dots, n

. From the D-G representation and some algebraic manipulations, the c.g.f. in Equation (12) becomes

\begin{aligned} K_{n} (v; s) = {} & - s_{1} v_{1} - n s_{2} v_{2} + \tilde{α} \log q - \log (q - v_{2}) \{\tilde{α} + (\tilde{α} - n a) v_{1}\} \\ & + \sum_{j = 1}^{n} \{\log Γ (α_{j} + [α_{j} - a] v_{1}) - \log Γ (α_{j})\}, \end{aligned}

where

\tilde{α} = \sum_{j = 1}^{n} α_{j}

and

q \in R_{+}^{*}

arbitrary. We set

s_{2} = 1 / n

and select q such that

E [S_{2, n}] = 1 / n

, i.e.,

q = \tilde{α}

. The marginal saddlepoint equation, cf. Equation (14), then has

β = 0

as solution. This leads to

\begin{aligned} K_{n} (v; (s_{1}, \frac{k}{n})) = {} & - s_{1} v_{1} - v_{2} + \tilde{α} \log \tilde{α} - \log (\tilde{α} - v_{2}) \{\tilde{α} + (\tilde{α} - n a) v_{1}\} \\ & + \sum_{j = 1}^{n} \{\log Γ (α_{j} + [α_{j} - a] v_{1}) - \log Γ (α_{j})\} . \end{aligned}

(36)

The second order derivatives of

K_{n}

can be expressed in terms of polygamma functions

ψ^{(m)} (z) = {(d / d z)}^{m + 1} \log Γ (z)

, for

m = 0, 1

. We skip the details but note that

K_{2, n}^{″} (0; (s_{1}, k / n)) = {\tilde{α}}^{- 1}

.
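
Equation (36) is straightforward to evaluate with a log-gamma routine. The following sketch, whose function names are hypothetical, codes Equation (36) term by term and checks two of its properties numerically: the value zero at the origin and the second derivative w.r.t. the second argument, approximated here by a central finite difference. The parameter values are those of the numerical illustration of this example.

```python
from math import lgamma, log


def cgf_dirichlet_symmetry(v1, v2, s1, alpha, a):
    """K_n(v; (s_1, 1/n)) of Equation (36), where q equals the sum of the alpha_j."""
    n = len(alpha)
    alpha_sum = sum(alpha)
    value = -s1 * v1 - v2 + alpha_sum * log(alpha_sum)
    value -= log(alpha_sum - v2) * (alpha_sum + (alpha_sum - n * a) * v1)
    value += sum(lgamma(aj + (aj - a) * v1) - lgamma(aj) for aj in alpha)
    return value


def second_derivative_v2(alpha, a, s1=0.0, h=1e-3):
    """Central finite-difference approximation at v = (0, 0) (a numerical check only)."""
    def f(v2):
        return cgf_dirichlet_symmetry(0.0, v2, s1, alpha, a)
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2


alpha = [1.0, 2.0, 3.0, 4.0, 5.0]  # alpha_j = j, j = 1, ..., 5
a = 1.0
value_at_origin = cgf_dirichlet_symmetry(0.0, 0.0, 0.0, alpha, a)
curvature = second_derivative_v2(alpha, a)
print(value_at_origin, curvature, 1.0 / sum(alpha))
```

The finite difference is used here only as a sanity check; in an actual implementation the derivatives would be coded analytically with digamma and trigamma functions.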

In the following numerical illustration,

a = 1

and

α_{j} = j

, for

j = 1, \dots, 5

, so

n = 5

. The saddlepoint approximation is computed under

H_{1}

, so it gives the power of the test. It is compared with the Monte Carlo distribution of

T_{5}

with

10^{6}

simulations. The numerical results are displayed in Table 4. The probabilities obtained by simulation are denoted

P_{E} [T_{5} < t]

, the probabilities obtained by the saddlepoint approximation are denoted

P_{S} [T_{5} < t]

and

re (t)

denotes the absolute relative error given in Equation (22), for t in the lower and in the upper tails of the distribution. The relative errors of both lower and upper tails do not exceed

7 %

.
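
The Monte Carlo side of this comparison can be sketched as follows. Dirichlet vectors are generated by normalizing independent gamma variates, and the statistic in Equation (35) is evaluated on each vector. This is only an illustrative reimplementation with hypothetical function names, a fixed seed and a smaller number of simulations than in Table 4; it is not the Matlab program used for the article.

```python
import random
from math import log


def sample_dirichlet(alpha, rng):
    """One Dirichlet(alpha) vector obtained by normalizing independent gamma variates."""
    z = [rng.gammavariate(aj, 1.0) for aj in alpha]
    total = sum(z)
    return [zj / total for zj in z]


def symmetry_statistic(ybar, alpha, a):
    """T_n of Equation (35); terms with alpha_j = a vanish and are skipped."""
    return sum((aj - a) * log(yj) for aj, yj in zip(alpha, ybar) if aj != a)


def empirical_cdf_at(t, alpha, a, n_sim, seed=0):
    """Monte Carlo estimate of P[T_n < t] under the alternative Dirichlet(alpha)."""
    rng = random.Random(seed)
    hits = sum(
        symmetry_statistic(sample_dirichlet(alpha, rng), alpha, a) < t
        for _ in range(n_sim)
    )
    return hits / n_sim


alpha = [1.0, 2.0, 3.0, 4.0, 5.0]
p_lower = empirical_cdf_at(-16.3, alpha, 1.0, 20000)
p_upper = empirical_cdf_at(-13.4, alpha, 1.0, 20000)
print(p_lower, p_upper)
```

The two estimates can be compared with the corresponding rows of Table 4, keeping in mind that the simulation size used here is much smaller.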

Example 10 (Bayesian bootstrap and Bayesian entropy test).

In Bayesian statistics, Dirichlet and multinomial distributions are conjugate: Dirichlet prior and multinomial likelihood lead to Dirichlet posterior. Precisely, if

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) \sim Dirichlet (a_{1}, \dots, a_{n})

(37)

and

\{(Y_{1}, \dots, Y_{n}) ∣ ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) = ({\bar{y}}_{1}, \dots, {\bar{y}}_{n})\} \sim Multinomial (k; {\bar{y}}_{1}, \dots, {\bar{y}}_{n}),

then

\{({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) ∣ (Y_{1}, \dots, Y_{n}) = (y_{1}, \dots, y_{n})\} \sim Dirichlet (a_{1} + y_{1}, \dots, a_{n} + y_{n}),

(38)

\forall a_{1}, \dots, a_{n} \in R_{+}^{*}

and

(y_{1}, \dots, y_{n}) \in {\ddot{Δ}}_{k}^{n - 1}

.

The Bayesian bootstrap was introduced by Rubin [43] as a method for approximating the posterior distribution of a random parameter; precisely the distribution of a function of

{\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}

, given the observed data

(Y_{1}, \dots, Y_{n}) = (y_{1}, \dots, y_{n})

. It consists of sampling

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

from Equation (38). This can be done by generating

Z_{j} \sim Gamma (a_{j} + y_{j}, q)

, for

j = 1, \dots, n

, independently, and by setting

{\bar{Y}}_{j} = \frac{Z_{j}}{\sum_{i = 1}^{n} Z_{i}}, for j = 1, \dots, n .

(39)

The value of

q \in R_{+}^{*}

is irrelevant. Details can be found in Section 10.5 of [38]. If the parameter of interest is

T_{n} ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

and it admits the M-statistic representation in Equation (1), then the saddlepoint approximation with the D-G representation can be used instead of the described sampling method.
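
The sampling step in Equation (39) can be sketched as follows. The prior parameters and the observed counts are hypothetical, and q is treated as a rate parameter (an assumption on the parametrization; the standard-library generator takes a scale, hence the reciprocal). Drawing twice from the same random stream with two different values of q illustrates that q cancels in the normalization.

```python
import random


def posterior_draw(a, y, q, rng):
    """One draw from Dirichlet(a_1 + y_1, ..., a_n + y_n) via Equation (39)."""
    # random.gammavariate takes (shape, scale); a rate q corresponds to scale 1 / q.
    z = [rng.gammavariate(aj + yj, 1.0 / q) for aj, yj in zip(a, y)]
    total = sum(z)
    return [zj / total for zj in z]


a = [1.0, 1.0, 1.0]  # hypothetical prior parameters
y = [4, 0, 2]        # hypothetical observed counts
draw_q1 = posterior_draw(a, y, 1.0, random.Random(42))
draw_q7 = posterior_draw(a, y, 7.0, random.Random(42))
print(draw_q1)
print(draw_q7)
```

Both draws agree up to rounding error, because changing q only rescales every gamma variate by the same factor.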

Consider now the urn model of Section 2.2 with sampling with replacement, where the probability of drawing a ball of color

C_{j}

is given by the random variable

{\bar{Y}}_{j}

, for

j = 1, \dots, n

. We are interested in the entropy

ε_{n} (\bar{Y})

, viz. Equation (23) as a function of

\bar{Y} = ({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

. According to Equation (10),

ε_{n} (\bar{Y})

is the entropy of the sample proportions of colors

C_{1}, \dots, C_{n}

under Polya’s sampling at steady state. Thus,

a_{j} = a_{j, 0} / r

, for

j = 1, \dots, n

; cf. Section 2.3. We consider the Bayesian testing problem

H_{0} : \{ε_{n} (\bar{Y}) \in [ε_{0}, \log n]\}

, against

H_{1} : \{ε_{n} (\bar{Y}) \in [0, ε_{0})\}

, for some

ε_{0} \in (0, log n)

. Then,

ρ_{0} = P [ε_{n} (\bar{Y}) \geq ε_{0}]

and

ρ_{1} = P [ε_{n} (\bar{Y}) < ε_{0}]

are the prior probabilities of

H_{0}

and

H_{1}

, respectively. The analogous posterior probabilities are

r_{0} (y) = P [ε_{n} (\bar{Y}) \geq ε_{0} | Y = y]

and

r_{1} (y) = P [ε_{n} (\bar{Y}) < ε_{0} | Y = y]

, where

Y = (Y_{1}, \dots, Y_{n})

and

y = (y_{1}, \dots, y_{n}) \in {\ddot{Δ}}_{k}^{n - 1}

. The Bayes factor of

H_{0}

to

H_{1}

is the posterior odds ratio

r_{0} (y) / r_{1} (y)

over the prior odds ratio

ρ_{0} / ρ_{1}

, namely

φ (y) = ρ_{1} r_{0} (y) / \{ρ_{0} r_{1} (y)\}

. The Monte Carlo solution consists of sampling

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

from the prior in Equation (37) and then from the posterior in Equation (38), in both cases by means of Equation (39). Thus,

r_{0} (y)

and

r_{1} (y)

are Bayesian bootstrap estimators of

ρ_{0}

and

ρ_{1}

, respectively, and they allow for the evaluation of

φ (y)

. Alternatively, these values can be obtained without repeated sampling by using the conditional saddlepoint approximation of Section 3 with the D-G representation.
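
The Monte Carlo evaluation of the Bayes factor can be sketched as follows. We assume that the entropy of Equation (23) is the Shannon entropy with natural logarithm, which is consistent with the range [0, log n] used above. The prior parameters, the observed counts, the threshold and the function names are hypothetical and serve only as illustration.

```python
import random
from math import log


def entropy(p):
    """Shannon entropy with natural logarithm (assumed form of Equation (23))."""
    return -sum(pj * log(pj) for pj in p if pj > 0.0)


def sample_dirichlet(params, rng):
    """One Dirichlet vector obtained by normalizing independent gamma variates."""
    z = [rng.gammavariate(c, 1.0) for c in params]
    total = sum(z)
    return [zj / total for zj in z]


def prob_entropy_at_least(eps0, params, n_sim, rng):
    """Monte Carlo estimate of P[entropy >= eps0] under Dirichlet(params)."""
    hits = sum(entropy(sample_dirichlet(params, rng)) >= eps0 for _ in range(n_sim))
    return hits / n_sim


def bayes_factor(a, y, eps0, n_sim=20000, seed=0):
    """Estimates of rho_0, r_0(y) and of the Bayes factor of H_0 to H_1."""
    rng = random.Random(seed)
    rho0 = prob_entropy_at_least(eps0, a, n_sim, rng)                                # prior
    r0 = prob_entropy_at_least(eps0, [aj + yj for aj, yj in zip(a, y)], n_sim, rng)  # posterior
    rho1, r1 = 1.0 - rho0, 1.0 - r0
    return rho0, r0, (rho1 * r0) / (rho0 * r1)


# Uniform prior on three colors and counts concentrated on the first color.
rho0, r0, phi = bayes_factor([1.0, 1.0, 1.0], [18, 1, 1], eps0=0.9)
print(rho0, r0, phi)
```

In this illustrative setting the counts are concentrated on one color, so the posterior probability of high entropy is small and the estimated Bayes factor is below one, i.e., the data speak against the high-entropy hypothesis.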

Example 11 (Tests based on spacings).

The so-called spacings are the first order differences or gaps between successive values of the ordered sample. Let

U_{1}, \dots, U_{l}

be absolutely continuous and i.i.d. over

[0, 1]

(without loss of generality, by the probability integral transform), let

0 \leq U_{(1)} \leq \dots \leq U_{(l)} \leq 1

denote the ordered sample and let

U_{(0)} = 0

and

U_{(l + 1)} = 1

. For

n = l + 1

, the spacings are defined by

{\bar{Y}}_{j} = U_{(j)} - U_{(j - 1)}, for j = 1, \dots, n .

(40)

Thus,

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

takes values in

Δ_{1}^{n - 1}

. Statistics that are defined as functions of spacings are used in various statistical problems, goodness-of-fit testing representing the most important (see, e.g., [44]). Spacings are essential in the analysis of circular data, because they form a maximal invariant w.r.t. changes of null direction and sense of rotation. For Borel functions

h_{j}

, for

j = 1, \dots, n

, important spacings statistics have the form

\begin{matrix} \sum_{j = 1}^{n} h_{j} (n {\bar{Y}}_{j}) . \end{matrix}

(41)

If

h_{1} = \dots = h_{n} = h

, then the test statistic is called symmetric. Under the null hypothesis

H_{0}

of uniformity of

U_{1}, \dots, U_{l}

, the D-G representation holds with

a_{1} = \dots = a_{n} = 1

, so that the n spacings are equivalent in distribution to n i.i.d. exponential random variables conditioned on their sum. As Equation (41) takes the form of the M-statistic in Equation (1), the saddlepoint approximation of Section 3 can be directly applied.

The conditional saddlepoint approximation with the D-G representation under

H_{0}

is analyzed numerically by [3] in the following cases: Rao’s spacings test (viz.,

h_{j} (x) = | x - 1 | / 2

, for

j = 1, \dots, n

), the logarithm spacings test (viz.,

h_{j} (x) = log x

, for

j = 1, \dots, n

), Greenwood’s test (viz.,

h_{j} (x) = x^{2}

, for

j = 1, \dots, n

) and a locally most powerful spacings test (viz.,

h_{j} (x) = Φ^{(- 1)} (j / (n + 1)) x

, for

j = 1, \dots, n

). In the context of reliability, Gatto and Jammalamadaka [6] re-expressed a uniformly most powerful test of exponentiality, against alternatives with increasing failure rate, in terms of spacings. They obtained the saddlepoint approximation and showed some numerical comparisons.
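
The symmetric spacings statistics listed above are simple to compute from a sample. The following minimal sketch, with hypothetical function names, evaluates Equation (41) for Rao's, the logarithmic and Greenwood's choices of h, for a sample in [0, 1].

```python
import random
from math import log


def spacings(u):
    """Spacings of Equation (40): gaps between successive order statistics on [0, 1]."""
    points = [0.0] + sorted(u) + [1.0]  # U_(0), U_(1), ..., U_(l+1)
    return [b - a for a, b in zip(points, points[1:])]


def spacings_statistic(u, h):
    """Symmetric statistic of Equation (41): sum over j of h(n * spacing_j)."""
    ybar = spacings(u)
    n = len(ybar)
    return sum(h(n * yj) for yj in ybar)


def rao(x):
    return abs(x - 1.0) / 2.0


def log_spacing(x):
    return log(x)


def greenwood(x):
    return x * x


rng = random.Random(3)
sample = [rng.random() for _ in range(9)]  # l = 9, hence n = 10 spacings
for h in (rao, log_spacing, greenwood):
    print(h.__name__, spacings_statistic(sample, h))
```

For a perfectly equispaced sample all normalized spacings equal one, so Rao's and the logarithmic statistics vanish and Greenwood's statistic equals n; departures from these values indicate uneven spacing.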

These spacings can be generalized to higher order differences or gaps. Let

g \in N^{*}

denote the gap order, selected such that

n = (l + 1) / g \in N^{*}

. The so-called multispacings are defined as

{\bar{Y}}_{j} = U_{(j g)} - U_{((j - 1) g)}, for j = 1, \dots, n .

(42)

As previously,

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n})

takes values in

Δ_{1}^{n - 1}

. When

g = 1

, the random variables in Equation (42) coincide with the spacings in Equation (40). Under

H_{0}

, the D-G representation holds with

a_{1} = \dots = a_{n} = g

.

Gatto and Jammalamadaka [7] provided explicit formulae of the saddlepoint approximations for Rao’s multispacings test and for the logarithmic multispacings test, together with a numerical study.

The next problem would be the computation of the distribution of a spacings or multispacings test statistic under a non-uniform alternative distribution. This can be done by saddlepoint approximation with the D-G representation whenever one can find the parameters

a_{1}, \dots, a_{n} \in R_{+}^{*}

such that, under the alternative distribution, the spacings or multispacings satisfy

({\bar{Y}}_{1}, \dots, {\bar{Y}}_{n}) \sim Dirichlet (a_{1}, \dots, a_{n})

. This would give the power of the test. However, re-expressing a non-uniform distribution in terms of a particular Dirichlet distribution does not appear practical, in general.

5. Final Remarks

This article presents the saddlepoint approximation for M-statistics of dependent random variables taking values in a simplex. Four conditional representations that allow re-expressing these dependent random variables as independent ones are presented. A detailed presentation of the underlying urn sampling model that is common to all four conditional representations is given. Important applications are reviewed. New applications are presented with some numerical comparisons between this saddlepoint approximation and Monte Carlo simulation. The numerical accuracy of the saddlepoint approximation appears very good.

A practical question concerns the relative advantages and disadvantages of using the conditional saddlepoint approximation presented in this article. Indeed, tail probabilities can be computed rapidly and more easily by means of Monte Carlo simulation. However, there is no unique answer to this general question, because several aspects should be considered.

First, when very small tail probabilities, e.g.,

10^{- 4}

, or extreme quantiles are desired, then the simple Monte Carlo used in this article may not always lead to accurate results. The reason is that the saddlepoint approximation is a large deviation technique, with bounded relative error everywhere in the tails, whereas simple Monte Carlo has unbounded relative error in the tails. In fact, simple Monte Carlo is not even logarithmically efficient. This is well explained in [45] (pp. 158–160). To obtain bounded relative error, importance sampling is required. Then, the mathematical complexity would become close to that of the saddlepoint approximation. Moreover, computing quantiles by importance sampling may not be straightforward. As shown above, this is quite simple with the saddlepoint approximation.
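
The unbounded relative error of simple Monte Carlo can be made explicit with a short calculation. The notation below (N for the number of replications and a hat for the empirical estimator) is ours; the numerical values combine the tail probability mentioned above with the number of simulations used in Example 9.

```latex
\begin{aligned}
\hat{p}_{N} &= \frac{1}{N} \sum_{i = 1}^{N} I\{T_{n}^{(i)} > t\},
\qquad p = P[T_{n} > t], \\
\frac{\sqrt{\operatorname{var}(\hat{p}_{N})}}{p}
&= \sqrt{\frac{1 - p}{N p}} \longrightarrow \infty
\quad \text{as } p \to 0 \text{ with } N \text{ fixed}, \\
p = 10^{-4},\ N = 10^{6}: \quad
\sqrt{\frac{1 - 10^{-4}}{10^{6} \cdot 10^{-4}}}
&= \sqrt{\frac{0.9999}{100}} \approx 0.100 .
\end{aligned}
```

Thus, even with one million replications, a tail probability of this order is estimated with a relative standard error of about 10%.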

The computations required for this article were done with Matlab (R2017b, The MathWorks, Natick, MA, USA). The minimization program fminsearch was used for obtaining the saddlepoint defined in Equation (13). All Matlab programs are available at http://www.stat.unibe.ch. They should be easy to use and to modify for new related applications.

One should also mention that having an analytical expression, such as a saddlepoint approximation, for computing a quantity of interest may have advantages. Monte Carlo and other purely numerical methods often do not provide such an expression. For example, the saddlepoint approximation can be used for computing the sensitivity of the upper tail probability, viz. the derivative of the tail probability w.r.t. a parameter of the model. Gatto and Peeters [46] proposed evaluating the sensitivity of the tail probability of the random sum w.r.t. the parameter of the summation index distribution (which is either Poisson or geometric) with the saddlepoint approximation. They showed numerically that the sensitivities obtained by the saddlepoint approximation and by simulation with importance sampling are very close, but this is no longer true when simulation is without importance sampling. In the case of computing sensitivity, importance sampling is significantly more computationally intensive than the saddlepoint approximation.

An application of the saddlepoint approximation that exploits a different conditional representation concerns the distribution of the inhomogeneous compound Poisson total claim amount under force of interest, in the context of insurance. It was suggested by [47] and the main idea is the following. The inhomogeneous Poisson process of occurrence times of individual claims is given by

0 \leq T_{1} \leq T_{2} \leq \dots

. Let

N_{t}

denote the number of occurrences during the time interval

[0, t]

, for some

t > 0

. Then,

\forall n \in N^{*}

,

\{(T_{1}, \dots, T_{N_{t}}) ∣ N_{t} = n\} \sim (Y_{(1)}, \dots, Y_{(n)}),

(43)

where

Y_{(1)} \leq \dots \leq Y_{(n)}

are the ordered values of some random variables

Y_{1}, \dots, Y_{n}

that are nonnegative, i.i.d. and independent of

\{N_{t}\}_{t \geq 0}

. The individual claim amounts are represented by the random variables

X_{1}, X_{2}, \dots

that are nonnegative, i.i.d. and independent of

\{N_{t}\}_{t \geq 0}

. Let

r \in R

denote the force of interest. The discounted total claim amount is

Z_{t} = \sum_{j = 0}^{N_{t}} e^{r (t - T_{j})} X_{j}

, for

T_{0} = X_{0} = 0

, and Equation (43) implies

Z_{t} \sim \sum_{j = 0}^{N_{t}} e^{r (t - Y_{j})} X_{j}

, for

Y_{0} = 0

. The last random sum has a simple structure and its distribution can be computed by the saddlepoint approximation of [18].
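
The representation in Equation (43) also gives a direct way of simulating the total claim amount, which can serve as a check for any approximation. The sketch below makes simplifying assumptions that are ours and not taken from [47]: the Poisson process is homogeneous, so that the unordered occurrence times are uniform over [0, t], and the claim amounts are exponential. Function names and parameter values are hypothetical. In this special case the mean of the total is available in closed form and is used for comparison.

```python
import random
from math import exp


def poisson_variate(mean, rng):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit, count, prod = exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def total_claim_amount(t, r, rate, claim_mean, rng):
    """One draw of the sum over claims of exp(r (t - Y_j)) X_j, homogeneous case."""
    n_claims = poisson_variate(rate * t, rng)
    total = 0.0
    for _ in range(n_claims):
        y = rng.uniform(0.0, t)                # unordered occurrence time
        x = rng.expovariate(1.0 / claim_mean)  # individual claim amount
        total += exp(r * (t - y)) * x
    return total


t, r, rate, claim_mean = 1.0, 0.05, 2.0, 1.0
rng = random.Random(3)
n_sim = 20000
estimate = sum(total_claim_amount(t, r, rate, claim_mean, rng) for _ in range(n_sim)) / n_sim
# Closed-form mean in the homogeneous case: rate * claim_mean * (exp(r t) - 1) / r.
expected = rate * claim_mean * (exp(r * t) - 1.0) / r
print(estimate, expected)
```

For an inhomogeneous process, the uniform draw would be replaced by a draw from the distribution of the occurrence times determined by the intensity function.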

A technique that could exploit the four conditional representations of Section 2 for computing the conditional c.g.f. (and not the conditional saddlepoint approximation) can be found in [48]. It is tentatively applied, with the MP-NB representation, to the symmetric spacing-frequencies test statistic in Equation (33) in [41] (Section 6.3.2). However, this approach seems impractical.

Another extension of the proposed approximation would concern neutrosophic statistics. In standard statistics, observations and parameters are represented by precise values, whereas in neutrosophic statistics they remain indeterminate (see, e.g., [49]).

Funding

This research received no external funding.

Acknowledgments

The author is thankful to three anonymous reviewers, to Sreenivasa Rao Jammalamadaka and to Ilya Molchanov for various discussions, remarks and suggestions that improved the quality of this article.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Aitchison, J. The Statistical Analysis of Compositional Data; Chapman & Hall: London, UK, 1986.
2. Kotz, S.; Balakrishnan, N. Advances in urn models during the past two decades. In Advances in Combinatorial Methods and Applications to Probability and Statistics; Birkhäuser, Statistics for Industry and Technology: Boston, MA, USA, 1997; pp. 203–257.
3. Gatto, R.; Jammalamadaka, S.R. A conditional saddlepoint approximation for testing problems. J. Am. Stat. Assoc. 1999, 94, 533–541.
4. Skovgaard, I.M. Saddlepoint expansions for conditional distributions. J. Appl. Prob. 1987, 24, 875–887.
5. Gatto, R. Symbolic computation for approximating the distributions of some families of one and two-sample nonparametric test statistics. Stat. Comput. 2000, 11, 449–455.
6. Gatto, R.; Jammalamadaka, S.R. A saddlepoint approximation for testing exponentiality against some increasing failure rate alternatives. Stat. Prob. Lett. 2002, 58, 71–81.
7. Gatto, R.; Jammalamadaka, S.R. Small sample asymptotics for higher order spacings. In Advances in Distribution Theory, Order Statistics and Inference Part III: Order Statistics and Applications; Birkhäuser, Statistics for Industry and Technology: Boston, MA, USA, 2006; pp. 239–252.
8. Butler, R.W. Saddlepoint Approximations with Applications; Cambridge University Press: Cambridge, UK, 2007.
9. Reid, N. The roles of conditioning in inference. Stat. Sci. 1995, 10, 138–157.
10. Mirakhmedov, S.M.; Jammalamadaka, S.R.; Ibrahim, B.M. On Edgeworth expansions in generalized urn models. J. Theor. Prob. 2014, 27, 725–753.
11. Butler, R.W.; Sutton, R.K. Saddlepoint approximation for multivariate cumulative distribution functions and probability computations in sampling theory and outlier testing. J. Am. Stat. Assoc. 1998, 93, 596–604.
12. Good, I.J. Saddlepoint methods for the multinomial distribution. Ann. Math. Stat. 1957, 28, 861–881.
13. Klugman, S.A.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions, 3rd ed.; Wiley & Sons: New York, NY, USA, 2008.
14. Ivchenko, G.I.; Ivanov, A.V. Decomposable statistics in inverse urn problems. Discr. Math. Appl. 1995, 5, 159–172.
15. Copson, E.T. Asymptotic Expansions; Cambridge University Press: Cambridge, UK, 1965.
16. De Bruijn, N.G. Asymptotic Methods in Analysis; Dover Publications: New York, NY, USA, 1981.
17. Daniels, H.E. Saddlepoint approximations in statistics. Ann. Math. Stat. 1954, 25, 631–650.
18. Lugannani, R.; Rice, S. Saddle point approximation for the distribution of the sum of independent random variables. Adv. Appl. Prob. 1980, 12, 475–490.
19. Daniels, H.E. Tail probability approximations. Int. Stat. Rev. 1987, 55, 37–48.
20. Wang, S. Saddlepoint approximations in conditional inference. J. Appl. Prob. 1993, 30, 397–404.
21. Jing, B.; Robinson, J. Saddlepoint Approximations for Marginal and Conditional Probabilities of Transformed Variables. Ann. Stat. 1994, 22, 1115–1132.
22. Kolassa, J.E. Higher-order approximations to conditional distribution functions. Ann. Stat. 1996, 24, 353–365.
23. DiCiccio, T.J.; Martin, M.A.; Young, G.A. Analytical approximations to conditional distribution functions. Biometrika 1993, 80, 781–790.
24. Field, C.A.; Tingley, M.A. Small sample asymptotics: Applications in robustness. In Handbook of Statistics; North-Holland: Amsterdam, The Netherlands, 1997; Volume 15, pp. 513–536.
25. Gatto, R. Saddlepoint approximations. In StatsRef: Statistics Reference Online; Wiley & Sons: New York, NY, USA, 2015; pp. 1–7.
26. Goutis, C.; Casella, G. Explaining the saddlepoint approximation. Am. Stat. 1999, 53, 216–224.
27. Reid, N. Saddlepoint methods and statistical inference. Stat. Sci. 1988, 3, 213–238.
28. Field, C.A.; Ronchetti, E. Small Sample Asymptotics; Institute of Mathematical Statistics Lecture Notes-Monograph Series: Hayward, CA, USA, 1990.
29. Jensen, J.L. Saddlepoint Approximations; Oxford University Press: Oxford, UK, 1995.
30. Kolassa, J.E. Series Approximation Methods in Statistics, 3rd ed.; Springer Lecture Notes in Statistics; Springer: New York, NY, USA, 2006.
31. Wang, S. One-step saddlepoint approximations for quantiles. Comput. Stat. Data Anal. 1995, 20, 65–74.
32. Shannon, C.E. The mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
33. Khinchin, A.I. Mathematical Foundations of Information Theory; English Translation of Two Original Articles in Russian; Dover Publications: New York, NY, USA, 1957.
34. Davison, A.C.; Hinkley, D.V. Saddlepoint approximations in resampling methods. Biometrika 1988, 75, 417–431.
35. Feuerverger, A. On the empirical saddlepoint approximation. Biometrika 1989, 76, 457–464.
36. Wang, S. Saddlepoint approximations in resampling analysis. Ann. Inst. Stat. Math. 1990, 42, 115–131.
37. Ronchetti, E.; Welsh, A.H. Empirical saddlepoint approximations for multivariate M-estimators. J. R. Stat. Soc. B 1994, 56, 313–326.
38. Davison, A.C.; Hinkley, D.V. Bootstrap Methods and Their Application; Cambridge University Press: Cambridge, UK, 1997.
39. Abd-Elfattah, E.; Butler, R. Saddlepoint approximations for rank-invariant permutation tests and confidence intervals with interval-censoring. Can. J. Stat. 2014, 42, 308–324.
40. Booth, J.G.; Butler, R.W. Randomization distributions and saddlepoint approximations in generalized linear models. Biometrika 1990, 77, 787–796.
41. Gatto, R.; Jammalamadaka, S.R. On two-sample tests for circular data based on spacing-frequencies. In Geometry Driven Statistics; Wiley & Sons: New York, NY, USA, 2015; pp. 129–145.
42. Holst, L.; Rao, J.S. Asymptotic theory for some families of two-sample nonparametric statistics. Sankhyā Ser. A 1980, 42, 19–52.
43. Rubin, D.B. The Bayesian bootstrap. Ann. Stat. 1981, 9, 130–134.
44. Pyke, R. Spacings. J. R. Stat. Soc. B 1965, 27, 395–449.
45. Asmussen, S.; Glynn, P.W. Stochastic Simulation. Algorithms and Analysis; Springer: New York, NY, USA, 2007.
46. Gatto, R.; Peeters, C. Saddlepoint approximations to sensitivities of tail probabilities of random sums and comparisons with Monte Carlo estimators. J. Stat. Comput. Simul. 2015, 85, 641–659.
47. Gatto, R. A saddlepoint approximation to the distribution of inhomogeneous discounted compound Poisson processes. Methodol. Comput. Appl. Prob. 2010, 12, 533–551.
48. Bartlett, M.S. The characteristic function of a conditional statistic. J. Lond. Math. Soc. 1938, 13, 62–67.
49. Aslam, M. Design of sampling plan for exponential distribution under neutrosophic statistical interval method. IEEE Access 2018, 6, 64153–64158.

Figure 1. Estimator of coloration’s entropy under sampling with replacement (

T_{6}

). First graph: saddlepoint approximation to the distribution function,

P_{S} [T_{6} < t]

. Second graph: absolute error,

ae (t)

. Third graph: absolute relative error,

re (t)

.

Table 1. Estimator of coloration’s entropy under sampling with replacement (

T_{6}

), selected lower and upper tail points; Monte Carlo probability (

P_{E}

), saddlepoint probability (

P_{S}

), absolute relative error (

re

).

t	$P_{E} [T_{6} < t]$	$P_{S} [T_{6} < t]$	$re (t)$
1.20	0.00109	0.00103	0.058
1.31	0.01034	0.00972	0.060
1.36	0.02648	0.02422	0.085
1.40	0.04755	0.04753	0.000
1.45	0.10116	0.10205	0.009
1.69	0.89118	0.88959	0.014
1.72	0.95691	0.95823	0.032
1.73	0.97360	0.97303	0.023
1.75	0.99114	0.99155	0.048
1.77	0.99838	0.99876	0.306

Table 2. Proportion of total claim amount (

T_{8}

): Monte Carlo probability (

P_{E}

), saddlepoint probability (

P_{S}

), and absolute relative error (

re

).

t	$P_{E} [T_{8} < t]$	$P_{S} [T_{8} < t]$	$re (t)$
0.12	0.00035	0.00040	0.124
0.16	0.00523	0.00555	0.059
0.20	0.03119	0.03261	0.044
0.24	0.10347	0.10508	0.016
0.28	0.23178	0.23278	0.004
0.32	0.39024	0.39775	0.019
0.38	0.62988	0.64044	0.029
0.42	0.76566	0.76905	0.015
0.46	0.85601	0.85930	0.023
0.48	0.89281	0.89184	0.009
0.52	0.93960	0.93993	0.006
0.56	0.96733	0.96784	0.016
0.60	0.98294	0.98352	0.035
0.64	0.99137	0.99166	0.035
0.68	0.99577	0.99578	0.003
0.72	0.99799	0.99791	0.037

Table 3. Estimator of coloration’s entropy under sampling without replacement (

T_{7}

); Monte Carlo probability (

P_{E}

), saddlepoint probability (

P_{S}

), and absolute relative error (

re

).

t	$P_{E} [T_{7} < t]$	$P_{S} [T_{7} < t]$	$re (t)$
1.30	0.00010	0.00008	0.247
1.35	0.00034	0.00031	0.075
1.40	0.00124	0.00119	0.042
1.45	0.00377	0.00405	0.074
1.50	0.01271	0.01240	0.024
1.55	0.03340	0.03393	0.015
1.60	0.08396	0.08250	0.017
1.65	0.17648	0.17680	0.002
1.70	0.33407	0.33088	0.010
1.75	0.53940	0.53896	0.001
1.80	0.75566	0.75979	0.017
1.85	0.94083	0.93226	0.145
1.90	0.99594	0.99638	0.109

Table 4. Most powerful test statistic for Dirichlet’s symmetry (

T_{5}

), selected lower and upper tail points: Monte Carlo probability (

P_{E}

), saddlepoint probability (

P_{S}

), and absolute relative error (

re

).

t	$P_{E} [T_{5} < t]$	$P_{S} [T_{5} < t]$	$re (t)$
−20.5	0.00105	0.00110	0.044
−18.5	0.00976	0.01017	0.042
−17.6	0.02593	0.02661	0.026
−17.0	0.04814	0.04952	0.029
−16.3	0.09709	0.09951	0.025
−13.4	0.90156	0.90255	0.010
−13.2	0.95632	0.95753	0.029
−13.1	0.97661	0.97727	0.029
−13.0	0.99104	0.99090	0.015
−12.9	0.99833	0.99820	0.070

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

