1. Introduction: Common Mean and Unknown Heterogeneous Uncertainties
This paper arose from the statistical analysis of a sequence, $x_1, \ldots, x_n$, in the setting of heterogeneous research synthesis, with $x_j$ representing the estimate of the common mean (say, a treatment effect) as reported by the j-th study. No conditions are imposed on the unknown accuracies, which cannot be assumed equal. The main statistical challenge is the estimation of the common mean, treated as a shift parameter, when the standard deviations are unknown nuisance scale parameters.
In some applications, the uncertainty appraisals are either missing or utterly unreliable. The difficulty of accurately assessing the variances of systematic errors, whether due to specific laboratory conditions or hospital protocols, is well acknowledged by data scientists.
The issue of underreported uncertainties, particularly those that stem from asymptotic normal theory, which presupposes large data sets, is prevalent in metrology. Furthermore, the challenge of reproducibility within individual centers may be exacerbated by the nature of the employed measuring instruments.
Another source of artificially small uncertainties may be the removal of outliers for purely mathematical reasons. By eliminating “unrepresentative” or “spurious” data points, one is typically left with a part of the sample that is unrealistically accurate. See [1] for further motivation.
Unlike classical statistical models, the scenario suggested here does not require accompanying estimates of the uncertainty of the reported values. Our investigation focuses on the special “self-dual” weights that define the discrete posterior distribution for the unknown mean, set against a non-informative objective prior for both the mean and the independent variances.
This line of inquiry, initiated in [2] under the assumption of normality, grapples with the lack of variance information, which causes several statistical complications. For instance, the classical maximum likelihood estimator cannot be determined uniquely, since the likelihood function is unbounded: it tends to infinity when the mean is placed at any data point and the corresponding variance shrinks to zero. Nevertheless, the problem is well-defined. Indeed, estimating the common mean requires determining at most
n parameters, the mean itself and, say,
, which belong to the unit simplex of dimension
. Statistical practice needs to determine only
weights to form the common mean estimator, say,
. In some applications, there is additional information that allows further dimension reduction.
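The unbounded likelihood mentioned above can be observed numerically. The following minimal sketch assumes independent normal observations with a common mean and separate standard deviations; the function name `normal_loglik` and the sample values are hypothetical and serve only as an illustration.

```python
import math

def normal_loglik(x, mu, sigmas):
    """Log-likelihood of independent observations x_j ~ N(mu, sigma_j^2)."""
    total = 0.0
    for xj, sj in zip(x, sigmas):
        total += (-0.5 * math.log(2.0 * math.pi) - math.log(sj)
                  - 0.5 * ((xj - mu) / sj) ** 2)
    return total

x = [1.2, 0.7, 2.5, 1.9]   # hypothetical study estimates
mu = x[0]                  # place the mean exactly at the first data point

values = []
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    sigmas = [eps, 1.0, 1.0, 1.0]   # only the first standard deviation shrinks
    values.append(normal_loglik(x, mu, sigmas))

# The log-likelihood grows like -log(eps) as eps -> 0.
print(values)
```

The same growth occurs at every data point, which is why the maximum likelihood estimator of the mean is not unique in this setting.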
This paper is motivated by mathematical statistics, but its goal is to explore the mathematical aspects of the issues arising in the statistical problem. Indeed, it is addressed to a general mathematical audience. The main contribution is the construction of a unified probabilistic framework for rank-induced parity distributions and the derivation of related moment and combinatorial formulas, which show the link to the Gauss hypergeometric function.
More specifically, Section 2 examines the polynomial approximation problem over the set $\{x_1, \ldots, x_n\}$ and the orthogonal polynomials that deviate least from zero on this set. The required formulas for the distributions of parity-based sums are established in Section 3. The deep connection between these distributions and the hypergeometric function is demonstrated in Section 4. We present a self-contained proof of formulas for a specific value of this classical function. These formulas are surely known to specialists, but the author could not find a convenient reference. Our approach yields seemingly new combinatorial identities, (39), (48), (53)–(55), and (57), in Section 4 and Section 5. Some useful expressions for partial derivatives are given in Section 6.
2. Self-Dual Probabilities and Orthogonal Polynomials Least Deviating from Zero
We begin with a problem that, at first glance, appears unrelated to the main focus. Namely, assuming that all x’s are distinct, the best polynomial approximation of a function f over the finite set $\{x_1, \ldots, x_n\}$ is sought.
The optimal uniform approximation by a polynomial of degree $n-1$ (or higher) is achieved by the classical Lagrange interpolation polynomial, which coincides with f on this set, i.e., takes the value $f(x_i)$ at $x_i$ for $i = 1, \ldots, n$. If the polynomial’s degree is $n-2$, the best approximation is obtained by subtracting from the interpolation polynomial a specific multiple of the oscillating polynomial that alternates its sign at each successive $x_i$, attaining the same absolute value at these points.
The probabilities,
are related to the Lagrange formula and to the
parities of
,
,
It is known (see Theorem 1.15 in [3]) that the approximation error coincides with the absolute value of the average, taken under (1), of the products $s_i f(x_i)$. Thus,
where
runs through all polynomials of degree not exceeding
. For instance, the approximation error of the oscillating function
is independent of
n,
The barycentric form of the Lagrange interpolation formula,
provides numerous advantages [4].
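The barycentric form lends itself to a short numerical illustration. The sketch below uses the standard barycentric weights, the reciprocals of $\prod_{k \neq j}(x_j - x_k)$; the function names and the test data are hypothetical and not taken from [4].

```python
import math

def barycentric_weights(nodes):
    """Barycentric weights 1 / prod_{k != j} (x_j - x_k) for distinct nodes."""
    return [
        1.0 / math.prod(xj - xk for k, xk in enumerate(nodes) if k != j)
        for j, xj in enumerate(nodes)
    ]

def barycentric_eval(nodes, values, t):
    """Evaluate the Lagrange interpolation polynomial at t in barycentric form."""
    lam = barycentric_weights(nodes)
    num = 0.0
    den = 0.0
    for xj, fj, lj in zip(nodes, values, lam):
        if t == xj:            # at a node the interpolant equals the data value
            return fj
        c = lj / (t - xj)
        num += c * fj
        den += c
    return num / den

# A cubic is reproduced exactly from four nodes.
nodes = [0.0, 1.0, 3.0, 4.5]
values = [t ** 3 - 2.0 * t + 1.0 for t in nodes]
print(barycentric_eval(nodes, values, 2.0))
```

Once the weights are stored, each evaluation costs a number of operations linear in the number of nodes, which is one of the practical reasons for preferring this form.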
Probabilities (1) originate in random matrix theory, where they offer alternative descriptions of a physical ensemble in terms of particles or holes. Many optimization problems involving the discriminant function through electrostatic equilibrium are underpinned by them [5]. From the mathematical perspective, these probabilities are self-dual under the duality definition given in [6] and developed in [7,8,9].
In mathematical statistics, probabilities (1) define the discrete posterior distribution for the location parameter against a non-informative prior for this parameter (the mean) and the independent variances [10]. The generalized Bayes estimator of the mean under the quadratic loss is
This statistic serves as a semiparametric estimator of the symmetry center in a heterogeneous sample, meaning that it does not depend on the particular distribution of the x’s within a broad class.
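The following sketch illustrates the weights, the parities, and the resulting estimator. It rests on an assumption that should be stated loudly: the probabilities (1) are taken to be proportional to $\prod_{k \neq i} |x_i - x_k|^{-1}$, and the parity $s_i$ to be the sign of $\prod_{k \neq i} (x_i - x_k)$. This form is inferred from the stated connection with the Lagrange formula and with the approximation error; all function names are hypothetical.

```python
import math

def self_dual_weights(x):
    """Assumed form of (1): weights proportional to 1 / prod_{k != i} |x_i - x_k|."""
    raw = [
        1.0 / math.prod(abs(xi - xk) for k, xk in enumerate(x) if k != i)
        for i, xi in enumerate(x)
    ]
    total = sum(raw)
    return [r / total for r in raw]

def parities(x):
    """Assumed parities: s_i = (-1)^(number of points larger than x_i)."""
    return [(-1) ** sum(xk > xi for xk in x) for xi in x]

def weighted_mean(x):
    """Posterior mean of the discrete distribution placing weight w_i at x_i."""
    return sum(w * xi for w, xi in zip(self_dual_weights(x), x))

def approximation_error(x, f_values):
    """Absolute value of the weighted average of the products s_i * f(x_i)."""
    w, s = self_dual_weights(x), parities(x)
    return abs(sum(wi * si * fi for wi, si, fi in zip(w, s, f_values)))

x = [0.3, 1.1, 1.4, 2.9, 3.0]                                 # hypothetical data
print(weighted_mean(x))
print(approximation_error(x, parities(x)))                    # oscillating data
print(approximation_error(x, [xi ** 3 + 1.0 for xi in x]))    # cubic, n = 5
```

Under this assumption the error for the oscillating data equals 1 for every n, and it vanishes for any polynomial of degree at most $n-2$, in agreement with the statements above. The estimator is shift equivariant because the weights depend only on differences of the x’s.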
Orthogonal polynomials with respect to (1) exhibit a striking symmetry. To see it, let
represent the moments of the self-dual weights (1). The monic polynomials
, are orthogonal in
,
With
the sequence
enjoys the mentioned symmetry as
.
The orthogonal polynomials
are known to satisfy the three-term recurrence,
where the coefficients
, and
also possess a symmetry property:
.
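A three-term recurrence of this kind can be generated numerically by the discrete Stieltjes procedure. The sketch below is only an illustration under two assumptions: the weights (1) are taken proportional to $\prod_{k \neq i} |x_i - x_k|^{-1}$, and the recurrence is written as $P_{k+1}(x) = (x - a_k) P_k(x) - b_k P_{k-1}(x)$, where the names $a_k$, $b_k$ are hypothetical and need not match the notation of (5).

```python
import math

def self_dual_weights(x):
    """Assumed form of (1): weights proportional to 1 / prod_{k != i} |x_i - x_k|."""
    raw = [
        1.0 / math.prod(abs(xi - xk) for k, xk in enumerate(x) if k != i)
        for i, xi in enumerate(x)
    ]
    total = sum(raw)
    return [r / total for r in raw]

def stieltjes(x, w):
    """Recurrence coefficients and node values of the monic orthogonal
    polynomials for the discrete measure with mass w_i at x_i."""
    n = len(x)
    prev = [0.0] * n          # values of P_{-1} on the nodes
    cur = [1.0] * n           # values of P_0 on the nodes
    norm_prev = 1.0
    a, b, polys = [], [], [cur]
    for k in range(n):
        norm = sum(wi * p * p for wi, p in zip(w, cur))
        ak = sum(wi * xi * p * p for wi, xi, p in zip(w, x, cur)) / norm
        bk = norm / norm_prev if k > 0 else 0.0
        a.append(ak)
        b.append(bk)
        nxt = [(xi - ak) * p - bk * q for xi, p, q in zip(x, cur, prev)]
        prev, cur, norm_prev = cur, nxt, norm
        polys.append(cur)
    return a, b, polys

x = [0.0, 0.4, 1.0, 1.7, 2.1, 3.0]      # hypothetical distinct points
a, b, polys = stieltjes(x, self_dual_weights(x))
print(a)        # a_0, ..., a_{n-1}
print(b[1:])    # b_1, ..., b_{n-1}
```

In this sketch the symmetry appears as $a_k = a_{n-1-k}$ and $b_k = b_{n-k}$, and the last polynomial $P_n$ vanishes at every node.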
Mathematical induction applied to (5) shows that for
,
Identity (6) can also be obtained from the original duality definition [6], according to which
is a multiple of
.
The polynomial,
deviates least from zero in
:
, i.e., for any monic polynomial
R of degree not exceeding
,
The comparison of the extremes of
and those of the classical monic, degree
, Chebyshev polynomial on the interval,
, provides a sharp inequality,
which holds for all distinct
. The factor
in (8) is given incorrectly in [10].
Equality in (8) is attained if and only if the x’s are the extreme points of the mentioned Chebyshev polynomial on this interval, i.e., when for
,
Then the probabilities (1) have a remarkably simple form,
Since
, this provides the closest resemblance of (1) to the uniform distribution.
In addition to
, the polynomial
has
real roots (which interlace those of
), so that, with
denoting the monic polynomial of degree
having these roots,
Therefore,
is the associated polynomial to
,
The associated with
orthogonal monic polynomial
of degree
satisfies the same recurrence (5), but the initial conditions are different:
, so that
,
.
The coefficients
admit an explicit form: for even
n,
If
n is odd, then
, and
With
other central orthogonal polynomials are of the form
and
One can represent
as a product of two monic polynomials (with real roots) of degrees
and
, respectively. Then with
and
and
. If
,
when
,
. Thus,
and
Central associated polynomials are
and
We summarize the main results as a theorem whose detailed proof can be found in [10].
Theorem 1. Polynomials , which are orthogonal with regard to probabilities (1), satisfy (5) and (6), with their central versions in (12), (14)–(16). Their associated polynomials satisfy (9) and (10), with the central versions given in (19)–(21). Central coefficients are given in (11) and (13). Inequality (8) is valid.
Coefficients are completely determined by . Two other coefficients involve quadratic forms involving . We refer to these functions of observations as parity-based statistics and embark on their study.
3. Parity-Based Distributions
The main object of interest in this section is the parity-based sums of the form, .
The finite set $\{x_1, \ldots, x_n\}$ in Section 2 can be considered as the representative points of a univariate statistical distribution [11]. Thus, we assume that it is a realization of n independent random variables with a common continuous distribution function $F$, whose density $f$ has all finite moments,
, which determine
F uniquely.
Denote by
the parity sequence corresponding to
. To start exploring the behavior of the parity-based sums, notice that the distribution of a random parity
can be written as
, which does not depend on
F.
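That the distribution of a single parity is free of F follows from the fact that, for a continuous F, every ordering of the sample is equally likely. The sketch below enumerates all orderings exactly; it assumes the convention that the largest observation has parity $+1$, i.e., $s_i = (-1)^{\#\{k:\, x_k > x_i\}}$, under which the probability of a positive parity equals $\lceil n/2 \rceil / n$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def parity_of_first(sample):
    """Parity of the first observation: (-1)^(number of larger observations)."""
    first = sample[0]
    return (-1) ** sum(value > first for value in sample[1:])

def prob_parity_plus(n):
    """Exact probability that the parity of one observation equals +1,
    obtained by enumerating all n! equally likely orderings."""
    orderings = list(permutations(range(n)))
    hits = sum(parity_of_first(p) == 1 for p in orderings)
    return Fraction(hits, len(orderings))

for n in range(2, 7):
    print(n, prob_parity_plus(n))
```

For even n the two parities are equally likely, while for odd n the positive parity has probability $(n+1)/(2n)$, so the expected parity is $1/n$.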
The joint density of
and its parity
is
Therefore, for
,
If
f is symmetric, the distribution function of
is
These formulas can be derived from the distribution of order statistics whose rank has the same parity as the largest observation. For example, the conditional density of
and
for given
, has the form
where
and
,
, are relative ranks for
and
, say,
if
.
The next result provides the joint density of m-sub-vectors and , the relevant conditional distributions, as well as the form of the moments of parity sums. Here, m is a fixed integer, . Thus, .
Theorem 2. The exchangeable distribution of and , is provided by (28). The conditional density of , for a given value of the product satisfies (33). The moments of the parity sum, can be found from (37).
Proof. Let be the order statistics corresponding to . The joint distribution of and the parities can be represented as a mixture of the conditional densities for given ranks. A particular density enters this mixture if and only if , where , is the rank of among the total sample.
Let denote the rank of within our subsample. Then , and is the parity of in this subsample.
Since the probability of any rank combination is
, the classical formula for the distribution of several order statistics [12] implies that the joint density of
and the corresponding parities is
Here, under the convention that
,
,
are familiar
spacings,
, which are known to have a Dirichlet distribution $\mathrm{Dir}_{m+1}$ with positive concentration parameters
[13]. Therefore,
Integration of (26) over
gives our first combinatorial identity,
By replacing the summation variables in (26) with
and using the multinomial theorem, we arrive at the form of the joint distribution of
and
,
Here, for
,
is the parity of
in the subsample
, where
k is the cardinality of
K. Thus, this joint density is a linear function of parity products,
.
The joint Dirichlet distribution of spacings implies that
is beta-distributed with parameters
where for any subset
K of
M,
Clearly,
. If
k is even,
for odd
k,
so that
and
.
Thus, density (28) is related to the classical hypergeometric function,
(actually a polynomial in z of degree
). Our argument shows the special role of the specific value
which appears as the factor at
in (28).
Indeed, by using the fundamental integral representation of the hypergeometric function [14], and setting
, one obtains the following expression:
Observe that for
,
functions,
, are linearly independent. Indeed, they are orthogonal under the natural inner product,
where
K and
L are fixed subsets of
M. Also,
Simplifying the notation from
to
, we see that the joint density of
, and of the product,
,
, has the form
When
, one obtains
so that by (31) with
given in (30),
To derive formulas for the moments of the parity-based sum,
we need an extension of (32) for positive integers
. For this purpose, the form of the joint density of
, and the product of the corresponding parities,
, is needed.
This density can be obtained from (32), since
, if and only if with
odd }, one has
Thus, with
d denoting the cardinality of
D and
still denoting the rank of
within the subsample,
and
where
.
To prove (35), we evaluate the following conditional expectation:
Identity (35) is valid when some
vanish. Thus, for any non-negative integers
,
odd},
A more general formula involves the functions
,
According to (36),
where d is as above and
is even; (37) presents the correct version of Formula (34) in [10]. □
By using (28), one can find the joint (symmetric) density of
. For example, when
f is assumed symmetric, the distribution of
and
is exchangeable, so that it suffices to determine its density when
. Then
if
otherwise. Similarly,
when
otherwise. Thus,
and
Since
it follows that the joint density of
when
is
In the symmetric case, for odd n and p. Indeed, up to multiple coincides with the parity sum derived from the sample , which is equidistributed with x’s.
If
, the first moment can be derived from (23); the form of the second moment follows from (25),
When
,
4. Parities and Hypergeometric Function
For fixed
, the joint distribution of parities
is obtained in Theorem 2, whose notation we follow. According to (27) and (28), if
Here, for
,
is the
k-th elementary symmetric function whose values depend only on
d, the number of
’s equal to
. Indeed
so that
Now we give explicit formulas for
involving double factorials. See [15] for a survey of related combinatorial identities, and [16] for further instances of closed-form expressions for this function at specific arguments.
Theorem 3. If , are positive integers, the following identities hold for the hypergeometric function:
For any positive integers and p,
and (46) is valid.
Proof. To prove Formulas (40)–(43), we use well-known facts about the hypergeometric function. According to 15.8.13 in [14]
If
is even, this identity means that
Here, for any real
a and non-negative integer
j,
is the ascending factorial.
The only term of the finite series on the right-hand side of (45) without a positive power of
corresponds to
. Its coefficient equals
which is seen to coincide with (40). This coefficient vanishes when
is odd, implying (41).
One has
15.5.16 in [14], leading to (42) and (43).
Identity (39) means that for any positive integers,
,
In the last two formulas, K is any k-element subset of M. □
The values of the hypergeometric function entering (39) and other formulas with
,
, can be summarized as follows:
When
, this function takes its largest value,
. If
,
.
It is immediate that when
and
m is fixed,
Now by using (39), one gets the first-order approximation for fixed
,
which indicates the deviation from uniformity of the distribution of
.
Another proof of Formulas (40)–(43) in Theorem 3 uses the following generating function:
which is an even function of z when p is odd. If p is even,
These facts can be found in 15.15.1 of [14].
It is well known that the probabilities defining the classical hypergeometric distribution with parameters , can be determined from its probability generating function, which is the (finite) hypergeometric series .
Therefore, the probability that such a random variable takes an even value (under any positive integers
and
) is
If
this probability coincides with
whose expression through
is given in Theorem 3.
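The relation between the even-value probability and the generating function evaluated at $-1$ can be verified exactly. In the sketch below the parameter names `N` (population size), `K` (marked items), and `n` (draws) are hypothetical, as is the restriction $n \le N - K$, which keeps the lower parameter of the terminating Gauss series positive. The probability generating function is written as $P(X = 0)\, {}_2F_1(-n, -K; N - K - n + 1; z)$, which follows from the ratio of successive hypergeometric probabilities.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(X = k) for the classical hypergeometric distribution."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def terminating_2f1(a, b, c, z):
    """Finite Gauss series 2F1(a, b; c; z) for a non-positive integer a."""
    term = Fraction(1)
    total = Fraction(1)
    for j in range(-a):
        term *= Fraction((a + j) * (b + j), (c + j) * (j + 1)) * z
        total += term
    return total

def prob_even_direct(N, K, n):
    """Sum of the probabilities of the even values."""
    return sum(hypergeom_pmf(N, K, n, k) for k in range(0, min(K, n) + 1, 2))

def prob_even_via_pgf(N, K, n):
    """(1 + G(-1)) / 2, with G the probability generating function."""
    g_at_minus_one = (hypergeom_pmf(N, K, n, 0)
                      * terminating_2f1(-n, -K, N - K - n + 1, -1))
    return (1 + g_at_minus_one) / 2

print(prob_even_direct(12, 5, 4), prob_even_via_pgf(12, 5, 4))
```

Exact rational arithmetic is used so that the two expressions can be compared without rounding error.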
The joint distribution of
, which define traditional hypergeometric random variable
,
differs from (39). Indeed, the parities
, in (39) are special because of their association with the ranks of the subsample.
For two disjoint subsets
K and
of
M and any
L,
so that one obtains
It follows that
with similar formulas for the joint distribution of products of
s’s over several disjoint subsets.
As in Theorem 2, all joint probabilities are linear functions of
and
,
with some coefficients
. The degree of dependence of
s’s (or of their Bernoulli versions
) can be measured via the correlation coefficient between
and
,
In our examples,
, and the dependence is negative.
In (39), if n is even,
when n is odd,
More generally, in (50)
. Then if
n is even, and
are odd,
, which means that
and
are independent,
.
When
, one obtains in (49),
.
These formulas may find further use in probability modeling and estimation of entropy of binary sequences [17].
5. Parity-Based Sums and Dirichlet Distribution
We start here with an identity similar to Formulas (40)–(43) in Theorem 3. Namely, for any positive integers p and n with
forming a weak decomposition of p into n (non-negative) parts, one has
Indeed,
, so that with
defined by (30), the sum on the left-hand side of (51) can be written as
which is the coefficient at
in the series expansion of
, cf. Section 1.3 in [18].
One has,
A formula similar to (52) for the generating function,
, holds for the partition of p (strictly positive
’s,
.
Theorem 4. For given in (52), , one has
as well as (54) and (55).
Proof. Equality (53) holds because of (40)–(43). Indeed, its left-hand side satisfies (51), which corresponds to the weak decomposition of p. By comparing this equality with the partition of p into m (positive size) blocks, one obtains a representation of
,
Here,
, denotes the cardinality of
, so that
is the multiplicity of 1 among
.
The number of possible choices of the set
K with the given multiplicity
j of 1 in the set
, is
. Therefore,
An alternative representation of
results from the binomial theorem applied to
,
□
The coefficients
also appear in the conditional cumulative distribution function corresponding to (32). For positive N, put
where
denotes the rank of
,
.
Then, the mentioned distribution function can be expressed through
as
which implies that
.
Thus, under the Dirichlet distribution $\mathrm{Dir}_{m+1}$ of
, with all concentration parameters 1, one has for even
m,
Now we use the facts that the marginal distribution of has a beta-density with parameters , and that the conditional distribution of is .
Thus, when
m is even,
Here we took advantage of the binomial theorem, (31) and (53); in the last equality
denotes the incomplete beta function.
Identity (57) also holds for odd values of m. For example,
and
Interest in U is due to the multivariate integration by parts, which also motivates the study of its derivatives in the next section.
6. Partial Derivatives
For
, we look at the properties of the symmetric function
, which so far is defined almost everywhere (for pairwise different
x’s),
.
If the data consists of clusters,
, then using the definition by continuity put
which allows possibly equal
x’s. Notice that functions
are discontinuous.
If all x’s coincide, then under this definition, , if m is even; , if m is odd. If there are points equal to , then even; odd. If of x’s are equal to , then .
Thus, becomes a continuous function whose absolute value is bounded by 1. Actually, it possesses the Lipschitz property if the density f is bounded.
Our goal here is to determine the generalized derivative of order ,
, so that for any smooth compactly supported
, the multivariate integration by parts formula holds
Theorem 5. With , denoting the descending factorial, one has
Proof. For
differentiation over
shows that
where
For
,
is a symmetric function of its arguments,
For
, these functions can be characterized by the following recursion with
,
,
The proof is by induction. When
,
Indeed, shift invariance of
means that for any
i,
Thus,
and for
,
which indeed is a symmetric function of
.
The subsequent induction steps are straightforward, so that (58) follows. □
For
, so that according to (58),
Let
where summation is over all proper subsets
K of
M,
. Then
is
grounded, i.e., it vanishes if
for at least one
j. If one of the
x’s is equal to
, then
so that for even
n,
vanishes if
for at least one
j. When
,
,
.
Thus,
For
and even
n it follows that for any integrable function
,
which is useful to determine the covariance structure.
7. Conclusions
Interesting properties of self-dual probabilities demonstrate their potential in statistical estimation without any additional variance information. The polynomial approximation over a finite set, as well as the Gauss hypergeometric function, are intimately related to data parities and parity-based distributions.
Several presented combinatorial identities may find wider use in probability theory applications, in particular, in random matrices and in statistical physics.