1. Introduction
In [1], Theorem 6.3, we derived a general Karhunen–Loève (KL) representation valid for Jacobi, Laguerre, and Hermite polynomials (see below for some basic definitions concerning the KL representation of a centred Gaussian process and the associated KL expansion of its covariance function with respect to a positive measure). In the particular case of Jacobi polynomials, it implies that, given
and
, the KL expansion
holds, with
. Function
is the weight associated with Jacobi polynomials (see [2], pp. 8–9),
and
its integrals, these three functions being given by
The eigenvalues appearing in (1) are given by
the squared norms by
and
is the
th Jacobi polynomial. Jacobi polynomials satisfy orthogonality relations
where the Kronecker delta is defined by δ_{m,n} = 1 if m = n, and 0 otherwise.
Jacobi polynomials also satisfy the differential equations
For these standard identities, see, e.g., formulae (1.2.1), p. 3, (1.3.4), p. 8, §1.4.1.1, p. 9, and Table 1.1, p. 11 in [2], or formulae (9.8.2), (9.8.6), and (9.8.10), pp. 217–218 in [3].
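These orthogonality relations are straightforward to check numerically. The sketch below is only an illustration: the parameter values α = 1, β = 2 and the tolerances are our own choices, and the closed-form squared norm used is the standard one, h_n = 2^{α+β+1} Γ(n+α+1) Γ(n+β+1) / ((2n+α+β+1) Γ(n+α+β+1) n!).

```python
# Numerical check of Jacobi orthogonality; alpha = 1, beta = 2 chosen for illustration.
from math import gamma

from scipy.integrate import quad
from scipy.special import eval_jacobi

alpha, beta = 1.0, 2.0
w = lambda x: (1 - x) ** alpha * (1 + x) ** beta  # Jacobi weight on (-1, 1)

def inner(m, n):
    # <P_m, P_n> with respect to the Jacobi weight
    return quad(lambda x: w(x) * eval_jacobi(m, alpha, beta, x)
                          * eval_jacobi(n, alpha, beta, x), -1, 1)[0]

def h(n):
    # standard squared norm of the n-th Jacobi polynomial
    s = alpha + beta
    return (2 ** (s + 1) / (2 * n + s + 1)
            * gamma(n + alpha + 1) * gamma(n + beta + 1)
            / (gamma(n + s + 1) * gamma(n + 1)))

off_diag = inner(1, 2)   # should vanish (orthogonality)
norm_sq = inner(1, 1)    # should match h(1) = 4/3 for these parameters
```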
In view of the equalities
these functional and numerical elements associated with Jacobi polynomials enable us to express the probability density function (p.d.f.)
, the distribution function
, as well as its tail
, associated with Jacobi polynomials as
In the particular cases
and
, development (1) reduces, up to a change of variables (see details below in
Section 8.1), to expansions
for
, where the standard notation
is used for the Legendre polynomial. These two expansions are familiar to statisticians as they play a key role in the study of Cramér–von Mises (CvM) and Anderson–Darling (AD) statistics; see Proposition 1, pp. 213–214, and Theorem 1, p. 225, in [4].
This remark motivates the contents of our paper, which is organised as follows:
In Section 2, we recall some basic facts about Cramér–von Mises statistics, highlighting those we wish to extend to some of their discrete analogues involving Hahn polynomials and the hypergeometric distribution. In particular, Proposition 1 states a new result concerning the optimal local asymptotic Bahadur efficiency of a sub-family of these statistics, including the cases of the Cramér–von Mises and Anderson–Darling statistics.
In Section 3, we introduce the classical hypergeometric distribution
and the associated Hahn polynomials. We define a weighted discrete Brownian bridge process
associated with this distribution.
In Section 4, Proposition 3 gives the KL expansion of the covariance function of this process in terms of Hahn polynomials, a discrete analogue of (1) and (6).
In Section 5, we introduce our new statistic
, defined either as a discrete weighted Cramér–von Mises statistic or as a degenerate V-statistic, with a kernel whose KL expansion is given.
In Section 6, we provide in Theorem 1 some of the properties of
. In particular, Proposition 5 states a result about the probability of a large deviation, a key result for the subsequent study of Bahadur efficiency.
In Section 7, we study some properties of our statistic under a general alternative, and Theorem 2 states its local asymptotic Bahadur optimality under a more particular alternative hypothesis, the latter appearing as a perturbation of the hypergeometric distribution by the first non-constant Hahn polynomial. This result is a discrete analogue, for the hypergeometric distribution, of Proposition 1 for some Beta distributions.
Some proofs and various required formulas are postponed to Section 8. For the sake of simplicity, we will in some places omit superscripts in proofs.
2. Cramér–Von Mises and Anderson–Darling Statistics Revisited
The usefulness of orthogonal expansions such as (5)–(6) is well known in the field of statistics; see [4], Chapter 5, and the numerous references therein about this topic. In particular, recall that KL expansions (5)–(6) yield the equalities in law, or KL representations,
where
is a Brownian bridge process, i.e., a centred Gaussian process with covariance function min(s, t) − st, and
are independent standard normal random variables. Let us now introduce some terminology about KL expansions and representations. Let
be a measure space and
be the associated Hilbert space of real, square-integrable functions endowed with the inner product
By a
-KL expansion of the bivariate symmetric kernel
, we mean a pointwise convergent series of the form
where the sequences of eigenfunctions
, eigenvalues
, and squared norms
satisfy the integral and orthogonality relations
Recall that the entire law of a centred Gaussian process is determined by its covariance function (see [5], p. 2). Therefore, if
is a centred Gaussian process with covariance function
then a corollary of (7) are the equalities in law
and
In view of the orthogonality relations (8), we call (9) the
KL representation associated with the
KL expansion (7).
Now, assume that
and let
be endowed with a positive continuous p.d.f.
. Consider a kernel K admitting the expression and the
-KL expansion
for some weight function
. Then, K is the covariance function of the centred Gaussian process
, which, consequently, admits the
-KL representation
In this framework, by setting
and using the equality
, then in view of (2) and (4), expansion (1) can be seen as an
-KL expansion associated with the
-KL representation
In this case, the corollary of representation (15) is the equality in law
These developments and identities appear as asymptotic statistical properties of some goodness-of-fit tests in the following way:
Assume the sequence of independent and identically distributed (i.i.d.) random variables taking values in is drawn from a population with positive continuous p.d.f. . Assume we wish to test the null hypothesis against the alternative . Let us use the -KL expansion (11) to build a test suited to this aim, based on a certain statistic associated with , say .
The
-empirical process associated with our sample
is
where
is the Heaviside function
so that
is nothing else but the empirical distribution function.
Note for further use that under
, we have for
,
Under
, the convergence in law (denoted by the sign ⇒) and the equality
hold. Note for further reference that our statistic
equals
defined by [6], (2.1.9), p. 41. Also recall that given the square summable bivariate kernel
, the statistic
is called a von Mises functional statistic, or a V-statistic (see [7], p. 39, [8], Exercise 10, p. 172, or [9], §5.1.2, pp. 174–175). If, furthermore, the so-called degeneracy condition
holds, then the V-statistic is said to be degenerate with respect to
(see [8], §12.3, or [7], §4.3). Now, the second and first equalities, in (16) and in (17), respectively, enable us to write
in other words
is a V-statistic with kernel
In view of the equality
Fubini’s theorem implies that
which means that
is a degenerate V-statistic, with respect to
.
The asymptotic distribution of
under
being given by (17), we know from the theory of degenerate V-statistics that the
-KL expansion of L, involving the eigenvalues appearing in (17), will be of the form
The statistical interest of this spectral expansion of L stems from the limit theorem satisfied by degenerate V-statistics, which implies in our case that if
whereas if
Therefore, only if
does not obtain, will
tend to infinity almost surely, and heuristically, the speed of divergence will increase with the value of
. If
is the maximal eigenvalue, the statistic
is, therefore, expected to be suited for detecting alternatives of the form
where
is small enough to ensure that
is a well-defined p.d.f.
The statistic appearing in (17) is referred to as a (continuous) weighted Cramér–von Mises statistic. The original Cramér–von Mises statistic as well as the Anderson–Darling statistic correspond to the uniform p.d.f. for , and weight functions and , respectively. Therefore, they are usually discussed as tests for uniformity over .
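As an illustration of the original (unweighted) Cramér–von Mises statistic just mentioned, here is a minimal sketch (our own code, not the paper's) using its standard computing formula ω² = 1/(12n) + Σᵢ (U₍ᵢ₎ − (2i − 1)/(2n))², where U₍₁₎ ≤ … ≤ U₍ₙ₎ are the ordered observations in [0, 1].

```python
# Classical Cramer-von Mises statistic via its standard computing formula.
import numpy as np

def cramer_von_mises(u):
    """omega^2 = 1/(12n) + sum_i (u_(i) - (2i-1)/(2n))^2 for a sample u in [0, 1]."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)

# A sample placed exactly at the "ideal" uniform positions attains the minimum 1/(12n).
stat = cramer_von_mises([1 / 8, 3 / 8, 5 / 8, 7 / 8])   # n = 4, minimum value 1/48
```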
Now, our KL expansions (1) enable us to interpret the CvM and AD statistics in another way. To this end, consider again the family of weighted CvM statistics with p.d.f and weight function given for
by
and
denoting the
-empirical process defined above by (16). The corresponding statistic is
CvM and AD statistics corresponding to the cases
and
, respectively. They are now associated with the arcsine law
and the uniform distribution
, respectively.
The latter association raises the following question: Has
some optimal efficiency for some goodness-of-fit test of the null hypothesis
against a specific alternative, and if so, can Jacobi polynomials play a key role in the mathematical study of the efficiency of this test? Among the different measures of efficiency (see [6], Introduction), we choose the Bahadur efficiency, so let us recall some facts before addressing our question.
In the theory of Bahadur efficiency, the central result is [10], Theorem 7.5. Recall its underlying general principles. Assume
is a sequence of i.i.d. random variables following a distribution determined by a parameter, say
. Let
. The efficiency of a test based on the rejection of the simple hypothesis
, against the alternative
, for large values of the statistic
, is measured by the magnitude of a positive coefficient called the slope of the sequence
, denoted by
. High values of
correspond to a good efficiency of the test. The main way to compute
is provided by [10], Theorem 7.2, p. 29. Furthermore, an upper bound for
is
, where
is the Kullback–Leibler information number ([10], Theorem 7.5, p. 29). The test is asymptotically optimal whenever this upper bound is reached, or locally asymptotically optimal if
The difficult part in the determination of Bahadur efficiency is always the determination of some large deviation probability under
. In this respect, KL expansions can play a determinant role in some cases, including the cases we will deal with. The large deviation probability to be determined is
under
.
KL expansions make it possible to compute the exact slope in the case where it coincides with the so-called approximate exact slope, which means that the large deviation problem associated with
may be replaced by the large deviation problem associated with
given by (21). The solution of the latter problem is provided by a classical result from Zolotarev (see [6], (1.2.17), p. 10, and pp. 75–76, in particular, Table 2, p. 76) stating that
Heuristically, we may expect that if
then the asymptotic relation
will hold.
This is true in the particular case of the Anderson–Darling statistic
, and also in the case of
, provided that
is summable. The latter condition is
When
is not summable, no result is available. These facts justify the hypotheses about
and
in the following:
Proposition 1. - (i)
Assume that and that we wish to test Then, the goodness-of-fit test based on the Cramér–von Mises statistic is locally asymptotically Bahadur optimal, as . - (ii)
Assume that , or , and that we wish to test Then, the goodness-of-fit test based on the rejection of for large values of is locally asymptotically optimal in the sense of Bahadur, as .
Proof. We will use [6], Table 2, p. 76, with
from this reference identified with
from (11).
(i) One has, for
,
On the one hand, the exact slope is given by
On the other hand, the Kullback–Leibler information number is given by
so that
, and the result is established.
(ii) First, note that
so that under our assumptions
is a well-defined p.d.f. We will use (4) repeatedly. On the one hand, we have, using [11], 22.13.1, p. 785,
The exact slope is given by
On the other hand, the Kullback–Leibler number is given by
Therefore, as
,
and the result is established. □
Let us now show that these results involving Jacobi polynomials have analogues for Hahn polynomials.
3. A Discrete Brownian Bridge Associated with the Hypergeometric Distribution
The discrete analogues of (17) are called discrete Cramér–von Mises statistics and were introduced and discussed only quite recently by [12,13].
The elements appearing in (1), as well as (3), have their counterpart in all other families of classical orthogonal polynomials, continuous (Laguerre and Hermite), or discrete (Charlier, Hahn, Krawtchouk, and Meixner). It is, therefore, tempting to introduce the formal counterpart of (1) for each of these families, and the associated weighted Cramér–von Mises statistic.
Such statistics were discussed in [14], but only in the continuous case (Jacobi, Laguerre, and Hermite polynomials).
Let N be a positive integer and let be a probability mass function (p.m.f.) supported by .
Assume the sequence of independent and identically distributed random variables
taking values in
is drawn from a population with positive p.m.f.
. Assume we wish to test the simple null hypothesis
where
is the classical hypergeometric distribution given below by (25), against the alternative
.
Following [15],
, p. 259, consider for
, the
classical hypergeometric probability mass function (p.m.f.) given by
The associated cumulative distribution function (c.d.f.) and its tail are given by
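The hypergeometric p.m.f., c.d.f., and tail introduced above are available in standard software. The following sketch uses scipy.stats.hypergeom; note that scipy's parametrization (M = population size, K = marked items, n = number of draws) need not match the notation of (25).

```python
# The classical hypergeometric p.m.f., c.d.f., and tail via scipy.stats.hypergeom.
# Illustrative parameter values; scipy's parametrization may differ from the paper's.
from scipy.stats import hypergeom

M, K, n = 20, 7, 12          # population size, marked items, draws
rv = hypergeom(M, K, n)

total = sum(rv.pmf(k) for k in range(n + 1))   # p.m.f. sums to 1 over the support
mean = rv.mean()                               # equals n*K/M = 4.2 here
tail = rv.sf(4)                                # P(X > 4), the tail at 4
```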
In the present paper, Hahn polynomials, denoted by
for
, will be those denoted by
in [3], §9.5 (see formulas (1.4.1), p. 5, and (9.5.1), p. 204) with
, or also
in [2] (see the first equality, p. 53, and Table 2.4, p. 54), so that
where the Pochhammer symbol
is defined to be (a)_k = a(a + 1)⋯(a + k − 1) for k ≥ 1, with (a)_0 = 1.
These Hahn polynomials satisfy, for
, the orthogonality relations
See a proof in Section 8.
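These orthogonality relations can be checked numerically straight from the ₃F₂ definition of Hahn polynomials. In the sketch below, the parameters α = β = 1 and N = 5 are illustrative choices of ours (the paper's parameters are tied to the hypergeometric distribution), and C(α + x, x) C(β + N − x, N − x) is the standard Hahn weight.

```python
# Numerical check of Hahn polynomial orthogonality:
# Q_n(x; a, b, N) = 3F2(-n, n+a+b+1, -x; a+1, -N; 1), with weight
# w(x) = C(a+x, x) * C(b+N-x, N-x) on {0, ..., N}.
from math import comb, factorial

def poch(z, k):
    # rising factorial (z)_k = z (z+1) ... (z+k-1)
    r = 1
    for i in range(k):
        r *= z + i
    return r

def hahn(n, x, a, b, N):
    # terminating 3F2 sum defining the Hahn polynomial Q_n
    return sum(poch(-n, k) * poch(n + a + b + 1, k) * poch(-x, k)
               / (poch(a + 1, k) * poch(-N, k) * factorial(k))
               for k in range(n + 1))

a, b, N = 1, 1, 5
w = [comb(a + x, x) * comb(b + N - x, N - x) for x in range(N + 1)]

dot = lambda m, n: sum(w[x] * hahn(m, x, a, b, N) * hahn(n, x, a, b, N)
                       for x in range(N + 1))

off = dot(1, 2)   # should vanish by orthogonality
nrm = dot(2, 2)   # strictly positive squared norm
```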
Hahn polynomials also satisfy difference equations
for
, with
(see [2], (2.1.18), p. 21), and where for any function
, the forward and backward shift operators are defined by
For a two-variable function, a subscript will indicate the variable upon which these operators act, e.g.,
The first few Hahn polynomials with their (discrete) derivatives are given by
In order to avoid problems at the endpoints 0 and N when dealing with difference operators, any function
will be extended, over
, to a function also denoted f, in a way that will be specified when needed. Note, for such a non-null function f, the fundamental property, valid for
,
since the sequence
forms a complete set of solutions to the eigenvalue problem
Let us introduce the scalar products
Note the identity
In this setting, noticing that
, the orthogonality relations (28) take the form
The orthonormalized sequence of Hahn polynomials with respect to
is, therefore, given by
Given a Brownian bridge process
, we define a discrete Brownian bridge process
by setting
The process is Gaussian centred, with covariance kernel
We will now give the
-KL representation and expansion of this process and its covariance function.
5. A Family of Discrete Cramér–Von Mises Statistics
Let
be a sample of size
from a population whose distribution, with support
, has p.m.f. and c.d.f. denoted by
and
. The observed frequencies associated with our sample are
the empirical p.m.f. and c.d.f. being denoted and given by
respectively. Let E denote the expectation operator under the null hypothesis
For
, one can associate with
the random
vector
, whose components are the random variables
For
, we clearly have
These relations imply, in turn,
If
, then
and
are independent, so that
By analogy with the uniform empirical process in the continuous case
, consider the
-empirical process
and the weighted empirical process defined over
by
Proposition 4. One has, under , for , Proof. The first equality is straightforward. The second equality follows, keeping (45) in mind, from (56)–(60), combined with definition (62). □
The discrete Cramér–von Mises statistic, associated with our empirical process, is the non-negative number, say
, defined by
The statistic
has to be thought of as a test statistic, large values of
being significant, i.e., leading to the rejection of
. For computations, one can use the equality
to be compared with the widely used chi-square statistic
Remark 1. It is well known that the chi-square statistic provides a formula for testing the fit of a sample to any distribution, say ω, over . One has only to replace by ω in . The same property holds for our statistic defined by formula , in which and should be replaced by ω and Ω. The difference is that our weight remains unchanged and the choice of μ is arbitrary.
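To make the comparison between the two statistics concrete, here is an illustrative sketch of ours (it does not reproduce the paper's exact weighted statistic): it implements the generic discrete Cramér–von Mises form W² = n Σⱼ pⱼ (F̂(j) − F(j))², in the spirit of [12,13], alongside the chi-square statistic.

```python
# Generic discrete Cramer-von Mises statistic W^2 = n * sum_j p_j (Fhat_j - F_j)^2
# versus the chi-square statistic, for a hypothesized p.m.f. p on {0, ..., N}.
import numpy as np

def discrete_cvm(sample, p):
    sample = np.asarray(sample)
    n = len(sample)
    F = np.cumsum(p)                                          # hypothesized c.d.f.
    Fhat = np.array([(sample <= j).mean() for j in range(len(p))])  # empirical c.d.f.
    return n * np.sum(p * (Fhat - F) ** 2)

def chi_square(sample, p):
    sample = np.asarray(sample)
    n = len(sample)
    phat = np.array([(sample == j).mean() for j in range(len(p))])  # empirical p.m.f.
    return n * np.sum((phat - p) ** 2 / p)

p = np.array([0.25, 0.5, 0.25])
perfect = [0, 1, 1, 2]   # empirical p.m.f. equals p exactly: both statistics vanish
skewed = [0, 0, 0, 2]    # a clearly discrepant sample: both statistics are positive
```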
Just as in the continuous case, by using (62), our statistic can be seen as the degenerate V-statistic
with kernel
the degeneracy with respect to the p.m.f.
being obtained by linearity and the equality
7. Exact Slope under and Local Asymptotic Bahadur Optimality under the Alternative
Let us apply Bahadur's fundamental result ([10], §7) to the sequence of statistics
.
Given the null hypothesis (55), we will first consider an alternative hypothesis under which the distribution is a p.m.f., supported, as under , by , and denoted by , the associated c.d.f. being with , .
Let us state a first result, recalling that function f was defined above in Proposition 5. Note that (77) ensures that is consistent against all alternatives. A row vector will be denoted by , the zero vector by . Note that (43) implies that all p.m.f. over can be written in the form (76) below.
Proposition 6. If the alternative hypothesis
holds, then the convergence in probability
takes place. Furthermore, the exact slope of satisfies
Proof. We will use the abbreviation
. The first equality in (77) follows from the law of large numbers applied to (65). As for the second equality, let us use the definition of
as a
V-statistic with kernel
L. Setting
, we can rewrite
as
First, note the identities
Then, the law of large numbers applied to
V-statistics implies that, as
,
Then, (74) allows us to use [10], Theorem 7.2, and conclude that (78) holds. □
Let us use the notation
whenever
and
, and in this case
Recall that the Kullback–Leibler information number (introduced by [19]) of
and
is defined in the discrete case by
and that function
f was introduced in Proposition 5.
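The quadratic small-θ behaviour of the Kullback–Leibler number under a perturbation ρ_θ = ρ(1 + θq), with q centred under ρ, is easy to check numerically. In the sketch below, the base p.m.f. and the perturbation direction are illustrative choices of ours, not the paper's hypergeometric/Hahn pair.

```python
# For a perturbed p.m.f. rho_theta = rho * (1 + theta * q) with E_rho[q] = 0,
# K(rho_theta, rho) = sum_x rho_theta(x) * log(rho_theta(x) / rho(x))
# behaves like theta^2 * E_rho[q^2] / 2 as theta -> 0.
import numpy as np

rho = np.array([0.2, 0.5, 0.3])   # illustrative base p.m.f.
x = np.arange(3)
q = x - np.dot(rho, x)            # centred direction: E_rho[q] = 0

theta = 0.01
rho_t = rho * (1 + theta * q)     # still a p.m.f. for small enough theta

kl = np.sum(rho_t * np.log(rho_t / rho))
quad_approx = 0.5 * theta ** 2 * np.dot(rho, q ** 2)
```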
Theorem 2. If the alternative hypothesis
holds for some , then
Therefore, the exact slope of satisfies
so that the statistic is locally asymptotically optimal in the sense of Bahadur, with respect to the statistical problem . Proof. The first result (80) is a particular case of (77). Then, (78), rewritten with replaced by given by (80), leads to the first equality and equivalence in (81).
Finally, Fisher information being at
given by
(see, e.g., Theorem 9.17 and Example 9.20 in [20]), the Kullback–Leibler divergence satisfies
(see
in [19]), and the last equivalence in (81) readily follows. □
9. Discussion and Future Research Directions
Hypergeometric distributions, some aspects of which are discussed in our paper, have various applications. See the already classical treatise [15], §6.9, for a basic list of references.
The fact that the use of hypergeometric distributions to model various problems remains an active field of research is illustrated by more recent references, such as [21,22] for theoretical aspects, as well as [23,24,25,26,27,28] for practical applications.
We have proved that most features of some Cramér–von Mises statistics can be derived from their connection with classical orthogonal polynomials. This approach enabled us, on the one hand, to state new results about the local asymptotic Bahadur optimality of the well-known Cramér–von Mises and Anderson–Darling statistics. On the other hand, similar properties were proved for the discrete case associated with Hahn polynomials and hypergeometric distributions.
A first natural direction for future work is to extend our results to the whole family of classical orthogonal polynomials, discrete or continuous, as well as their
q-analogues (see [2], Chapter 3, and [3], Part II, about this topic). The main difficulty lies in the issue of large deviation probabilities.
A second direction should be the use of simulation to check the power of such tests, compared to that of standard tests used by practitioners, such as the chi-square test. In this respect, one should identify the alternatives of interest in practice in different fields. Whereas common alternatives are mean shifts, our approach privileges alternatives associated with the first non-constant polynomial of the family associated with a distribution. In most cases, such an alternative may not reduce to the mean shift, but be of interest in a way that has to be determined.
A third, more difficult, direction of research would be to extend our results to the wider family of polynomials of the Askey–Wilson scheme. The random variables associated with these polynomials, as well as their various mutual relationships, via limit relations, were discussed in [29], where basic references concerning the Askey–Wilson scheme are given. Since only classical orthogonal polynomials satisfy a second-order differential equation, another property of orthogonal polynomials, such as, for instance, the three-term recurrence relation, should be used to carry out such a task.