1. Introduction
Generalizations and modifications of standard statistical distributions, such as chi-square and Student distributions, play a useful role because of their numerous possible applications in different areas of statistics. However, the modifications introduced here, chi-square and Student bridge distributions, are only considered from the subsequent application.
If the normalized chi-square distributed random variable from the denominator of the common ratio representation of a Student distributed random variable is replaced with a mixture of independent normalized chi-square distributed variables, then the resulting ratio follows a distribution that is called here a Student bridge distribution. The possibly most prominent example of this type of random variables is the Behrens–Fisher statistic. The mixing coefficient in the corresponding representation of this statistic depends on the variances of the underlying two Gaussian sample distributions only through their ratio. The variance ratio thus plays the role of a nuisance parameter when deriving the distribution of the Behrens–Fisher statistic. As described in [
1], it may happen that the variance ratio is known although the individual variances are not when two instruments of equal precision average different numbers of replicates arriving at a response. Another situation where one of the two variances is known and the other one is not is dealt with in [
2].
The well known Behrens–Fisher statistic was introduced already in [
3,
4]. Several authors provided approximations of its distribution. To mention only some of the earlier contributions, first of all we refer to the well known approximation in [
5]. The approximative distribution whose percentage points are dealt with in [
6] is often called the Behrens–Fisher distribution. Convolutions of weighted chi-squares are used for an evaluation of the Welch approximation to the distribution of the Behrens–Fisher statistic in [
7]. In [
8], the exact distribution of the Behrens–Fisher statistic is derived for the case of two unknown variances, and depends on two unknown parameters, which brings with it the need for additional approximations for statistical applications.
The exact distribution of a modified Behrens–Fisher statistic considered in [
9] is very closely related to the distribution derived in [
8]. Authors of [
9] emphasize that there are (at that time) not many computer programs for computing the special functions that appear as components of the exact distributions and replace these functions mostly with suitable elementary ones.
The alternative aim of the present brief report is to take up again and continue earlier structural considerations on weighted chi-square distributions and their convolutions and on accordingly generalized Student distributions. Knowing the symmetry properties of the generalized Student densities considered here, numerical results obtained in [
9] can be taken over to dealing with asymmetric statistical problems in a common way. To be more specific, we prove that the distribution derived in [
8] actually depends on the unknown variances only through their ratio, thus allowing to perform exact statistical decisions in case of known variance ratios without additional approximations. Our proof follows a different line than that presented in [
8]. In particular, we make more visible the influence the mixture coefficient of the chi-square distributed variables from the denominator of the Behrens–Fisher statistic has on the resulting distribution of the Behrens–Fisher statistic itself. Although the densities of this statistic are visually quite close to each other with varying variance ratios, for many choices of the two sample sizes there are more or less exceptional situations of smaller closeness not mentioned in [
8]. From a general structural point of view, our consideration makes the particularly high precision of known approximations to the exact distribution of the Behrens–Fisher statistic more understandable, but it also confirms their limitations, as pointed out in [
9] for selected cases from a numerical point of view.
The more general problem of finding an optimal expectation test in the Gaussian two-sample scheme is called the Behrens–Fisher problem. It is dealt with in [
10] as a problem in the presence of three nuisance parameters. Reviews on numerous papers dealing with the Behrens–Fisher problem and the distribution of the Behrens–Fisher statistic can be found, e.g., in [
11] and in [
12]. The connections between the different classical approaches to statistics and the Behrens–Fisher problem are emphasized in [
11], while in [
12] there is an emphasis on three procedures that are in a certain suitably defined sense exact solutions to the Behrens–Fisher problem. The multivariate Behrens–Fisher distribution is considered, e.g., in [
13,
14]; for the nonparametric approach to the Behrens–Fisher problem see [
15] and the references given there.
The present paper does not deal with the general Behrens–Fisher problem but is devoted to the study of the probability density function of the Behrens–Fisher statistic with a focus on a function of the mixing parameter as a nuisance parameter. We explicitly describe the influence the single nuisance parameter has on the Student bridge distribution.
The two-sample
t-test with a known ratio of variances where the pooled empirical variance is used instead of individual sample variances is dealt with in [
1,
2]. What these papers have in common is that, unlike here, Student distributions with estimated d.f. are used for performing statistical tests. A test statistic conditional on the value of the variance ratio is studied in [
16].
We derive here exact representations of the pdf of the Behrens–Fisher statistic allowing heteroscedasticity and unbalancedness, i.e., different variances and sample sizes, respectively. These representations can be considered as heteroscedasticity-unbalancedness generalizations of Student’s density.
The paper is organized as follows. The chi-square bridge distribution is introduced and its moments are described in
Section 2.
Section 3 deals with the Student bridge distribution and
Section 4 with its application to the Behrens–Fisher statistic. A discussion including a three pillar bridges explanation for the choice of degrees of freedom in the Welch approximation to the exact distribution of the Behrens–Fisher statistic is presented in
Section 5. Figures were drawn using Matlab.
2. Chi-Square Bridge Distribution
Let
and
be independent random variables, where
is chi-square distributed with
d d.f.,
, and
. For
we consider the mixture of normalized chi-squares or
weighted
sum of
Chi-
squares
The first and second order moments of
W are
respectively. Minimal variance with respect to the mixing coefficient is attained for
, where
Let indicate that the random variable X follows the probability distribution h. If and , then holds for and statistic follows in this case a chi-square distribution with d.f., . Moreover, and . That is why we say that the distribution of WSCS has a three pillar bridges property.
In what follows we assume that
. The density of
W can immediately be derived then from its convolution integral representation
and allows according to commutativity of summands the following two representations:
and
where
denotes the hypergeometric function of order (1,1), see, e.g., Formula 13.2.1 in [
17] and Formula 9.210 in [
18]. The Beta function can be expressed in terms of the Gamma function
as
. In the case that
, we have that
and
. Choosing
in (
2) avoids unboundedness of
in the integrand of
and might motivate favoring Formula (
2) over Formula (
3), in this case.
Definition 1. The probability distribution having density (2) (or (3)) will be called chi-square bridge distribution with d.f. and mixing parameter γ, or -chi-square distribution , for short.
Figure 1 and
Figure 2 show the density
of the distribution
for four different pairs
, and
or
, respectively.
4. Behrens–Fisher Statistic
Let
and
be jointly independent Gaussian samples with expectations
and variances
We consider the statistic
where
and
are common sample means and unbiased sample variances, respectively. By
we mean that two random variables
Z and
U have the same probability distribution.
Lemma 1. The Behrens–Fisher statistic allows the representationwithand where is a standard Gaussian distributed random vector taking values in ,andare independent. Proof. Let us further denote the orthogonal projection onto the linear space
by
and let the matrix
P be defined such that
;
then
where
is a
-matrix. Here and below, missing off-diagonal matrix elements are zero. If
is the subspace of
being orthogonally to
then
. The statistic
can be written as:
where the functional
is defined for all
and
as:
We note that
and
, and put
The random vector
takes its values in
and follows a singular Gaussian distribution of rank
n,
As a consequence, the nominator and denominator of the ratio statistic
are stochastically independent. Let
be orthogonal
and
matrices, respectively. The random vector
with
follows a centered Gaussian distribution with the covariance matrix
The vector
allows almost surely the representation
where the random variables
are independent and centered normally distributed with variances
and
for
and
, respectively. Let
The Kronecker product matrix
describes then a mapping from
to
and, a.s.,
where the norm
is defined in
. The variance of the nominator of the Behrens–Fisher statistic is
Hence,
may be represented as
where the standard Gaussian distributed random variable
is independent of
. □
The constants and from Lemma 1 depend on and only through the variance ratio , which itself plays the role of a nuisance parameter.
If
where
is the sample size ratio then the constants
and
may be expressed in terms of the parameter triple
. The constants
and
depend on the variance ratio
, but in a different way for the different sample size ratio
. For
given, the inverse mapping
is defined by:
Other parameter triples could be introduced, e.g.,
, being closely related to but nevertheless different from the parameter triple in [
6].
The first and second order moments of
are
Finally, it turns out that under the hypothesis
the statistic
allows the representation
where the independent random variables
N and
are as in
Section 2,
, and the mixing coefficient is
Thus, the Behrens–Fisher statistic follows the Student bridge distribution with d.f.
and mixing coefficient
, or
-Student bridge distribution, for short,
In case of a known variance ratio, standard statistical significance testing and confidence estimation methods are based therefore upon the
-Student bridge distribution in the common way. Here, assumption
from
Section 2 means that
Without going here into technical details, the unrestricted distribution of is a non-central Student bridge distribution in a suitably defined sense.
5. Discussion
5.1. Reflection of the Three Pillar Bridges Property
We now consider four examples from [
8] for demonstrating the role the three pillar bridges property discussed in this paper may play in practical statistical work. In each example, we chose a pair of sample sizes
from the set
, and a variance ratio
from the positive real line that we assume to be known. In any case, we then determine the mixture coefficient
by a one-to-one calculation from
. This way,
Examples 1–4 are described (with some redundancy) as:
,
,
and
.
Figure 1,
Figure 2,
Figure 3 and
Figure 4 show the densities
and
for more parameter combinations
than required in Examples E1 to E4.
A value of close to 1 corresponds to a value of close to zero, , meaning that the sample size in the second population compared to that in the first is disproportionately large compared to the corresponding quotient of variances; in other words, the first population is under represented.
A value of close to 0 corresponds to a very large value of , , meaning that the sample size in the first population compared to that in the second is disproportionately large compared to the corresponding quotient of variances; in other words, the second population is under represented.
Unlike these two cases of imbalance, a value of in the order of speaks for an approximately achieved balance. The latter can be observed close to the middle pillar of the three pillar Student bridge distribution and is suitable to explain some effect when choosing the degree of freedom in the Welch approximation to the exact density of T.
The chi-square bridge densities shown in
Figure 1 correspond tho those of the Student bridge densities in
Figure 3. It is shown in [
8] that for such cases the Welch approximation seems to be the best that were found so far. Welch’s approximate degrees of freedom, see Formula (1.2) in [
8] with
and
replaced with
, are
for Example 1 and
for Example 2. This corresponds very well to the three pillar property of the Student bridge distribution.
If , as is approximately the case in Example 1, then the denominator of T can be written as Because the numbers 11 and 14 are of comparable size, a reasonable approximation is finally leading for Example 1 to .
In Example 2, the denominator of T allows the representation , which is reasonably approximated by . Thus, .
Figure 2 shows a broader variability between the densities when the mixing coefficient
is varied compared to
Figure 1. This is reflected in more visible variation of the corresponding Student bridge densities in
Figure 4, both in their distribution centers and their distribution tails. This should be taken into account if applications of the Student bridge distribution are required, in particular in the areas of the distributions just mentioned.
The Welch approximation is known to perform better when both k and m are sufficiently large. Our Figures show what may happen for small sample sizes.
Because the consideration in [
8] is even for higher dimensions it might be of some interest to extend the present work to this case, too.
5.2. Examples Where the Student Bridge Distribution Should Be Preferred
The aim of this section is to give a complementary structural argumentation confirming the numerical discoveries in [
9] with respect to the question of when Welch’s approximation is not sufficiently precise. To this end, we present, for two cases of sample sizes and variance ratios, the exact Student bridge density and Welch’s approximation to it in a joint figure.
Example 5. Assume that as in
Figure 1,
Figure 2 and
Figure 3 in [
9], sample sizes
are
and
and that the estimated variance ratio is always equal to 0.25. If we assume that the exact variance ratio in (10) is equal to 0.25, then the mixing coefficient
of the Student bridge distribution
is accordingly equal to 0.706, 0.840, 0.750, 0.889, 0.8 and 0.8.
Figure 5 shows the density of
and the density of its Welch approximation
that can hardly be visually distinguished from each other if considered on the whole line, but differ locally.
Figure 6 shows the densities of the Student bridge distribution
and Welch’s Student approximation to it,
. In this case, preference for the Student bridge density can even be seen globally.