Chi-Square and Student Bridge Distributions and the Behrens–Fisher Statistic

Richter, Wolf-Dieter

doi:10.3390/stats3030021

Open Access Brief Report

by

Wolf-Dieter Richter

Institute of Mathematics, University of Rostock, 18057 Rostock, Germany

Stats 2020, 3(3), 330-342; https://doi.org/10.3390/stats3030021

Submission received: 23 June 2020 / Revised: 19 August 2020 / Accepted: 20 August 2020 / Published: 25 August 2020

(This article belongs to the Section Statistical Methods)

Abstract

We prove that the Behrens–Fisher statistic follows a Student bridge distribution, the mixing coefficient of which depends on the two sample variances only through their ratio. To this end, it is first shown that a weighted sum of two independent normalized chi-square distributed random variables is chi-square bridge distributed, and secondly that the Behrens–Fisher statistic is based on such a variable and a standard normally distributed one that is independent of the former. In case of a known variance ratio, exact standard statistical testing and confidence estimation methods apply without the need for any additional approximations. In addition, a three pillar bridges explanation is given for the choice of degrees of freedom in Welch’s approximation to the exact distribution of the Behrens–Fisher statistic.

Keywords:

heteroscedasticity; unbalancedness; sums of weighted chi-squares; variance ratio; Welch approximation; three pillar bridges property

MSC:

62 E 15; 62 F 03; 62 F 25; 28 A 50

1. Introduction

Generalizations and modifications of standard statistical distributions, such as chi-square and Student distributions, play a useful role because of their numerous possible applications in different areas of statistics. However, the modifications introduced here, chi-square and Student bridge distributions, are considered only with a view to the subsequent application.

If the normalized chi-square distributed random variable in the denominator of the common ratio representation of a Student distributed random variable is replaced with a mixture of independent normalized chi-square distributed variables, then the resulting ratio follows a distribution that is called here a Student bridge distribution. Possibly the most prominent example of this type of random variable is the Behrens–Fisher statistic. The mixing coefficient in the corresponding representation of this statistic depends on the variances of the two underlying Gaussian sample distributions only through their ratio. The variance ratio thus plays the role of a nuisance parameter when deriving the distribution of the Behrens–Fisher statistic. As described in [1], it may happen that the variance ratio is known although the individual variances are not, for instance when two instruments of equal precision average different numbers of replicates to arrive at a response. Another situation, where one of the two variances is known and the other is not, is dealt with in [2].

The well-known Behrens–Fisher statistic was already introduced in [3,4]. Several authors have provided approximations of its distribution. To mention only some of the earlier contributions, we refer first of all to the well-known approximation in [5]. The approximate distribution whose percentage points are dealt with in [6] is often called the Behrens–Fisher distribution. Convolutions of weighted chi-squares are used for an evaluation of the Welch approximation to the distribution of the Behrens–Fisher statistic in [7]. In [8], the exact distribution of the Behrens–Fisher statistic is derived for the case of two unknown variances; it depends on two unknown parameters, which brings with it the need for additional approximations in statistical applications.

The exact distribution of a modified Behrens–Fisher statistic considered in [9] is very closely related to the distribution derived in [8]. The authors of [9] emphasize that there were (at that time) not many computer programs for computing the special functions that appear as components of the exact distributions, and they mostly replace these functions with suitable elementary ones.

The alternative aim of the present brief report is to take up again and continue earlier structural considerations on weighted chi-square distributions and their convolutions and on correspondingly generalized Student distributions. Knowing the symmetry properties of the generalized Student densities considered here, numerical results obtained in [9] can be carried over to asymmetric statistical problems in the usual way. To be more specific, we prove that the distribution derived in [8] actually depends on the unknown variances only through their ratio, thus allowing exact statistical decisions in the case of a known variance ratio without additional approximations. Our proof follows a different line than that presented in [8]. In particular, we make more visible the influence that the mixing coefficient of the chi-square distributed variables in the denominator of the Behrens–Fisher statistic has on the resulting distribution of the statistic itself. Although the densities of this statistic are visually quite close to each other for varying variance ratios, for many choices of the two sample sizes there are more or less exceptional situations of lesser closeness not mentioned in [8]. From a general structural point of view, our consideration makes the particularly high precision of known approximations to the exact distribution of the Behrens–Fisher statistic more understandable, but it also confirms their limitations, as pointed out in [9] for selected cases from a numerical point of view.

The more general problem of finding an optimal expectation test in the Gaussian two-sample scheme is called the Behrens–Fisher problem. It is dealt with in [10] as a problem in the presence of three nuisance parameters. Reviews on numerous papers dealing with the Behrens–Fisher problem and the distribution of the Behrens–Fisher statistic can be found, e.g., in [11] and in [12]. The connections between the different classical approaches to statistics and the Behrens–Fisher problem are emphasized in [11], while in [12] there is an emphasis on three procedures that are in a certain suitably defined sense exact solutions to the Behrens–Fisher problem. The multivariate Behrens–Fisher distribution is considered, e.g., in [13,14]; for the nonparametric approach to the Behrens–Fisher problem see [15] and the references given there.

The present paper does not deal with the general Behrens–Fisher problem but is devoted to the study of the probability density function of the Behrens–Fisher statistic with a focus on a function of the mixing parameter as a nuisance parameter. We explicitly describe the influence the single nuisance parameter has on the Student bridge distribution.

The two-sample t-test with a known ratio of variances where the pooled empirical variance is used instead of individual sample variances is dealt with in [1,2]. What these papers have in common is that, unlike here, Student distributions with estimated d.f. are used for performing statistical tests. A test statistic conditional on the value of the variance ratio is studied in [16].

We derive here exact representations of the pdf of the Behrens–Fisher statistic allowing heteroscedasticity and unbalancedness, i.e., different variances and sample sizes, respectively. These representations can be considered as heteroscedasticity-unbalancedness generalizations of Student’s density.

The paper is organized as follows. The chi-square bridge distribution is introduced and its moments are described in Section 2. Section 3 deals with the Student bridge distribution and Section 4 with its application to the Behrens–Fisher statistic. A discussion including a three pillar bridges explanation for the choice of degrees of freedom in the Welch approximation to the exact distribution of the Behrens–Fisher statistic is presented in Section 5. Figures were drawn using Matlab.

2. Chi-Square Bridge Distribution

Let

C Q_{1, k}

and

C Q_{2, m}

be independent random variables, where

C Q_{i, d}

is chi-square distributed with d d.f.,

d \in {k, m}

, and

i = 1, 2

. For

γ \in [0, 1],

we consider the mixture of normalized chi-squares, or weighted sum of chi-squares,

W = W S C S (k, m; γ) = γ \frac{C Q_{1, k}}{k} + (1 - γ) \frac{C Q_{2, m}}{m} .

The first and second order moments of W are

E (W) = 1 and V (W) = 2 (\frac{γ^{2}}{k} + \frac{{(1 - γ)}^{2}}{m}),

respectively. Minimal variance with respect to the mixing coefficient is attained for

γ = γ_{0}

, where

γ_{0} = \frac{k}{k + m} .

(1)
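The moment formulas and the minimizer (1) can be checked numerically. The following Python sketch is an illustration only; the function names `wscs_variance` and `gamma0` are hypothetical and not taken from the paper.

```python
def wscs_variance(k: int, m: int, gamma: float) -> float:
    """V(W) = 2 * (gamma^2 / k + (1 - gamma)^2 / m) for W = WSCS(k, m; gamma)."""
    return 2.0 * (gamma ** 2 / k + (1.0 - gamma) ** 2 / m)


def gamma0(k: int, m: int) -> float:
    """Variance-minimizing mixing coefficient from Formula (1)."""
    return k / (k + m)


if __name__ == "__main__":
    k, m = 11, 14  # illustrative degrees of freedom
    g0 = gamma0(k, m)
    # At gamma0 the variance equals 2 / (k + m), the variance of a
    # normalized chi-square variable with k + m d.f.
    print(g0, wscs_variance(k, m, g0), 2.0 / (k + m))
    # A coarse grid search over (0, 1) should return the same minimizer.
    grid = [i / 1000 for i in range(1, 1000)]
    print(min(grid, key=lambda g: wscs_variance(k, m, g)))
```

The grid search is only a sanity check; the closed form (1) follows from setting the derivative of V(W) with respect to γ to zero.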

Let

X \sim h

indicate that the random variable X follows the probability distribution h. If

A = γ / k

and

B = (1 - γ) / m

, then

A = B

holds for

γ = γ_{0}

and statistic

W / A

follows in this case a chi-square distribution with

k + m

d.f.,

(k + m) W S C S (k, m; γ_{0}) \sim χ_{k + m}^{2}

. Moreover,

k \cdot W S C S (k, m; 1) \sim χ_{k}^{2}

and

m \cdot W S C S (k, m; 0) \sim χ_{m}^{2}

. That is why we say that the distribution of WSCS has a three pillar bridges property.

In what follows we assume that

γ \in (0, 1)

. The density of W can immediately be derived then from its convolution integral representation

\begin{matrix} f_{W} (x) = \int_{- \infty}^{\infty} f_{\frac{γ}{k} C Q_{1, k}} (x - z) f_{\frac{1 - γ}{m} C Q_{2, m}} (z) d z \\ = \frac{{(\frac{k}{γ})}^{\frac{k}{2}} {(\frac{m}{1 - γ})}^{\frac{m}{2}}}{2^{\frac{k + m}{2}} Γ (\frac{k}{2}) Γ (\frac{m}{2})} \int_{0}^{x} {(x - z)}^{\frac{k}{2} - 1} e^{- \frac{k (x - z)}{2 γ}} z^{\frac{m}{2} - 1} e^{- \frac{m z}{2 (1 - γ)}} d z \end{matrix}

and allows according to commutativity of summands the following two representations:

f_{W} (x) = {(\frac{k}{γ})}^{k / 2} {(\frac{m}{1 - γ})}^{m / 2} \frac{x^{(k + m) / 2 - 1} \, {}_{1} F_{1} (\frac{m}{2}, \frac{k + m}{2}; \frac{x}{2} (\frac{k}{γ} - \frac{m}{1 - γ}))}{2^{(k + m) / 2} Γ (\frac{k + m}{2}) \exp \{\frac{k x}{2 γ}\}}, x > 0

(2)

and

f_{W} (x) = {(\frac{k}{γ})}^{k / 2} {(\frac{m}{1 - γ})}^{m / 2} \frac{x^{(k + m) / 2 - 1} \, {}_{1} F_{1} (\frac{k}{2}, \frac{k + m}{2}; \frac{x}{2} (\frac{m}{1 - γ} - \frac{k}{γ}))}{2^{(k + m) / 2} Γ (\frac{k + m}{2}) \exp \{\frac{m x}{2 (1 - γ)}\}}, x > 0

(3)

where

_{1} F_{1} (a, b; z) = \frac{1}{B (a, b - a)} \int_{0}^{1} e^{z t} t^{a - 1} {(1 - t)}^{b - a - 1} d t

denotes the hypergeometric function of order (1,1), see, e.g., Formula 13.2.1 in [17] and Formula 9.210 in [18]. The Beta function can be expressed in terms of the Gamma function

x \to Γ (x) = \int_{0}^{\infty} t^{x - 1} e^{- t} d t

as

B (a, b) = Γ (a) Γ (b) / Γ (a + b)

. In the case that

γ = γ_{0}

, we have that

_{1} F_{1} (a, b; \frac{x}{2} (\frac{k}{γ} - \frac{m}{1 - γ})) = 1

and

f_{W} (x) = (1 / A) f_{χ_{k + m}^{2}} (x / A)

. Choosing

\frac{k}{γ} < \frac{m}{1 - γ}

in (2) avoids unboundedness of

e^{z t}

in the integrand of

_{1} F_{1} (a, b; z)

and might motivate favoring Formula (2) over Formula (3), in this case.
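Formulas (2) and (3) can be evaluated with a few lines of code. The sketch below is a minimal illustration, assuming that a plain power series for the confluent hypergeometric function is accurate enough for moderate arguments; all function names are hypothetical. It lets one check that (2) and (3) agree and that the density reduces to a rescaled chi-square density at γ = γ_0.

```python
import math


def hyp1f1(a: float, b: float, z: float, tol: float = 1e-16, max_terms: int = 2000) -> float:
    """Confluent hypergeometric function 1F1(a; b; z) via its power series.

    Adequate for moderate |z| only; for large negative z the alternating
    series loses accuracy through cancellation.
    """
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def _front(x: float, k: int, m: int, gamma: float) -> float:
    """Factor shared by (2) and (3), including the power of x."""
    c = (k / gamma) ** (k / 2) * (m / (1 - gamma)) ** (m / 2)
    return c * x ** ((k + m) / 2 - 1) / (2 ** ((k + m) / 2) * math.gamma((k + m) / 2))


def f_w_formula2(x: float, k: int, m: int, gamma: float) -> float:
    """Density of WSCS(k, m; gamma) according to Formula (2)."""
    if x <= 0:
        return 0.0
    z = 0.5 * x * (k / gamma - m / (1 - gamma))
    return _front(x, k, m, gamma) * hyp1f1(m / 2, (k + m) / 2, z) * math.exp(-k * x / (2 * gamma))


def f_w_formula3(x: float, k: int, m: int, gamma: float) -> float:
    """Density of WSCS(k, m; gamma) according to Formula (3)."""
    if x <= 0:
        return 0.0
    z = 0.5 * x * (m / (1 - gamma) - k / gamma)
    return _front(x, k, m, gamma) * hyp1f1(k / 2, (k + m) / 2, z) * math.exp(-m * x / (2 * (1 - gamma)))


def chi2_pdf(y: float, n: int) -> float:
    """Density of the chi-square distribution with n d.f. at y > 0."""
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))


if __name__ == "__main__":
    print(f_w_formula2(1.2, 4, 6, 0.3), f_w_formula3(1.2, 4, 6, 0.3))
    # Middle pillar: gamma0 = 4 / (4 + 6) = 0.4, so f_W(x) = 10 * f_chi2_10(10 x).
    print(f_w_formula2(0.8, 4, 6, 0.4), 10 * chi2_pdf(8.0, 10))
```

The agreement of the two representations reflects Kummer's transformation of the confluent hypergeometric function; a production implementation would likely rely on a tested special-function library instead of the series used here.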

Definition 1.

The probability distribution having density (2) (or (3)) will be called chi-square bridge distribution with

(k, m)

d.f. and mixing parameter γ, or

(k, m; γ)

-chi-square distribution

χ_{k, m; γ}^{2}

, for short.

Figure 1 and Figure 2 show the density

f_{W}

of the distribution

χ_{k, m; γ}^{2}

for four different pairs

(k, m)

, and

γ \in {0.1, 0.505, 0.962}

or

γ \in {0.01, 0.505, 0.7, 0.99}

, respectively.

3. Student Bridge Distribution

If N denotes a standard Gaussian distributed random variable that is independent of

C Q_{1, k}

and

C Q_{2, m}

, and

t_{l}

Student’s t-distribution with l d.f., then the statistic

T = T_{k, m; γ} = \frac{N}{\sqrt{W S C S (k, m; γ)}}

satisfies

T_{k, m; 1} \sim t_{k}, T_{k, m; γ_{0}} \sim t_{k + m} and T_{k, m; 0} \sim t_{m} .

(4)

By the general integral representation of the density of the ratio of two independent continuous random variables,

f_{T} (x) = \int_{- \infty}^{\infty} f_{N} (t x) f_{\sqrt{W S C S (k, m; γ)}} (t) | t | d t

where

f_{N}

denotes the standard Gaussian density and

f_{\sqrt{W}} (t) = 2 t f_{W} (t^{2})

can easily be derived from (2) or (3). Making use of (2) and changing the order of integration gives us:

\begin{matrix} f_{T} (x) = \frac{2 {(\frac{k}{γ})}^{k / 2} {(\frac{m}{1 - γ})}^{m / 2}}{\sqrt{2 π} \, 2^{(k + m) / 2} Γ (k / 2) Γ (m / 2)} \\ \cdot \int_{0}^{1} v^{m / 2 - 1} {(1 - v)}^{k / 2 - 1} \int_{0}^{\infty} t^{k + m - 1} \exp \{- \frac{t^{2} x^{2}}{2} - \frac{k t^{2}}{2 γ} - \frac{t^{2} v (\frac{m}{1 - γ} - \frac{k}{γ})}{2}\} t d t d v, \end{matrix}

and the change of variables

s = \frac{t^{2}}{2} (x^{2} + \frac{k}{γ} + v (\frac{m}{1 - γ} - \frac{k}{γ}))

leads to the first of the following two alternative representations of the density

f_{T}

,

f_{T} (x) = {(\frac{m γ}{k (1 - γ)})}^{m / 2} \frac{Γ (\frac{k + m + 1}{2}) \, {}_{2} F_{1} (\frac{k + m + 1}{2}, \frac{m}{2}; \frac{k + m}{2}; \frac{\frac{k}{γ} - \frac{m}{1 - γ}}{\frac{k}{γ} + x^{2}})}{Γ (\frac{k + m}{2}) \sqrt{π \frac{k}{γ}} {(1 + \frac{γ}{k} x^{2})}^{(k + m + 1) / 2}}, x \in R

(5)

and

f_{T} (x) = {(\frac{k (1 - γ)}{m γ})}^{k / 2} \frac{Γ (\frac{k + m + 1}{2}) \, {}_{2} F_{1} (\frac{k + m + 1}{2}, \frac{k}{2}; \frac{k + m}{2}; \frac{\frac{m}{1 - γ} - \frac{k}{γ}}{\frac{m}{1 - γ} + x^{2}})}{Γ (\frac{k + m}{2}) \sqrt{π \frac{m}{1 - γ}} {(1 + \frac{1 - γ}{m} x^{2})}^{(k + m + 1) / 2}}, x \in R .

(6)

Making use of (3) instead of (2) proves (6). Here,

_{2} F_{1}

denotes the hypergeometric function being defined for

δ > β > 0

by

_{2} F_{1} (γ, β; δ; y) = \frac{1}{B (β, δ - β)} \int_{0}^{1} {(1 - z)}^{δ - β - 1} z^{β - 1} {(1 - y z)}^{- γ} d z,

see 15.3.1 in [17] and Formula 9.111 in [18]. Choosing

y < 0

avoids a zero of

1 - y z

within the range of integration and might motivate favoring Formula (5) or (6) if

\frac{k}{γ} < \frac{m}{1 - γ}

or

\frac{k}{γ} > \frac{m}{1 - γ},

respectively. The following definition is motivated by the three pillar bridges property (4).

Definition 2.

The probability distribution corresponding to density (5) (or (6)) will be called a Student bridge distribution with

(k, m)

d.f. and mixing coefficient γ, or

(k, m; γ)

-Student bridge distribution

t_{k, m; γ}

, for short.
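The densities (5) and (6) can be evaluated numerically as sketched below. This is an illustration under stated assumptions, not the author's implementation: the Gauss hypergeometric function is summed by its power series, which converges only for |y| < 1, so the hypothetical helper `student_bridge_pdf` picks the representation whose argument lies in [0, 1). This differs from the remark above, which concerns the integral representation. All function names are hypothetical.

```python
import math


def hyp2f1(a: float, b: float, c: float, y: float, tol: float = 1e-16, max_terms: int = 100000) -> float:
    """Gauss hypergeometric function 2F1(a, b; c; y) via its power series (|y| < 1)."""
    if abs(y) >= 1.0:
        raise ValueError("the power series requires |y| < 1")
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * y
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def pdf_formula5(x: float, k: int, m: int, gamma: float) -> float:
    """Student bridge density according to Formula (5)."""
    big_k, big_m = k / gamma, m / (1.0 - gamma)
    a = (k + m + 1) / 2
    y = (big_k - big_m) / (big_k + x * x)
    front = (big_m / big_k) ** (m / 2) * math.gamma(a)
    front /= math.gamma((k + m) / 2) * math.sqrt(math.pi * big_k)
    return front * hyp2f1(a, m / 2, (k + m) / 2, y) * (1.0 + x * x / big_k) ** (-a)


def pdf_formula6(x: float, k: int, m: int, gamma: float) -> float:
    """Formula (6): Formula (5) with (k, gamma) and (m, 1 - gamma) interchanged."""
    return pdf_formula5(x, m, k, 1.0 - gamma)


def student_bridge_pdf(x: float, k: int, m: int, gamma: float) -> float:
    """Choose the representation whose hypergeometric argument lies in [0, 1)."""
    if k / gamma >= m / (1.0 - gamma):
        return pdf_formula5(x, k, m, gamma)
    return pdf_formula6(x, k, m, gamma)


def student_t_pdf(x: float, n: int) -> float:
    """Density of Student's t-distribution with n d.f."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1.0 + x * x / n) ** (-(n + 1) / 2)


if __name__ == "__main__":
    # Middle pillar: gamma0 = 4 / (4 + 6) = 0.4 gives Student's t with 10 d.f.
    print(student_bridge_pdf(0.7, 4, 6, 0.4), student_t_pdf(0.7, 10))
    # Both representations describe the same density.
    print(pdf_formula5(0.5, 4, 2, 0.706), pdf_formula6(0.5, 4, 2, 0.706))
```

For parameter values where the series argument is close to 1, the series converges slowly and a library routine for the hypergeometric function would be preferable.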

Figure 3 and Figure 4 show the density

f_{T}

of the distribution

t_{k, m; γ}

for the same choice of parameters as for

f_{W} .

We remark that for

l = 1, 2, \dots,

E T^{2 l - 1} = 0

and

\begin{matrix} E T^{2 l} = & \frac{(2 l - 1)!! {(\frac{k}{γ})}^{k / 2} {(\frac{m}{1 - γ})}^{m / 2}}{2^{(k + m) / 2} Γ ((k + m) / 2)} \\ & \cdot \int_{0}^{\infty} x^{\frac{k + m}{2} - 1 - l} e^{- \frac{k x}{2 γ}} \, {}_{1} F_{1} (\frac{m}{2}, \frac{k + m}{2}; \frac{x}{2} (\frac{k}{γ} - \frac{m}{1 - γ})) d x, \end{matrix}

that is, provided that 2 l < k + m,

\begin{matrix} E T^{2 l} = & (2 l - 1)!! \frac{Γ (\frac{k + m}{2} - l)}{Γ (\frac{k + m}{2})} {(\frac{k}{2 γ})}^{l} {(\frac{γ m}{(1 - γ) k})}^{m / 2} \\ & \cdot {}_{2} F_{1} (\frac{k + m}{2} - l, \frac{m}{2}; \frac{k + m}{2}; 1 - \frac{γ m}{(1 - γ) k}) . \end{matrix}
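As a plausibility check of the moment formula, consider the middle pillar γ = γ_0 with l = 1, a worked special case added here for illustration under the assumption k + m > 2. The argument of the hypergeometric function then vanishes, so that this function equals 1:

```latex
\gamma = \gamma_{0} = \frac{k}{k+m}
\;\Longrightarrow\;
\frac{\gamma m}{(1-\gamma) k} = 1,
\qquad
\frac{k}{2\gamma} = \frac{k+m}{2},
\\[4pt]
E\,T^{2}
= \frac{\Gamma\!\left(\frac{k+m}{2}-1\right)}{\Gamma\!\left(\frac{k+m}{2}\right)} \cdot \frac{k+m}{2}
= \frac{(k+m)/2}{(k+m)/2-1}
= \frac{k+m}{k+m-2}.
```

This is the variance of Student's t-distribution with k + m d.f., in agreement with the middle relation in (4).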

4. Behrens–Fisher Statistic

Let

X_{1}, \dots, X_{n_{1}}

and

Y_{1}, \dots, Y_{n_{2}}

be jointly independent Gaussian samples with expectations

μ_{i}

and variances

σ_{i}^{2}, i = 1, 2 .

We consider the statistic

T^{B F} = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_{X}^{2}}{n_{1}} + \frac{S_{Y}^{2}}{n_{2}}}}

where

\bar{X} = \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} X_{i}, \bar{Y} = \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} Y_{i}

and

S_{X}^{2} = \frac{1}{n_{1} - 1} \sum_{i = 1}^{n_{1}} {(X_{i} - \bar{X})}^{2}, S_{Y}^{2} = \frac{1}{n_{2} - 1} \sum_{i = 1}^{n_{2}} {(Y_{i} - \bar{Y})}^{2}

are common sample means and unbiased sample variances, respectively. By

Z \overset{d}{=} U

we mean that two random variables Z and U have the same probability distribution.

Lemma 1.

The Behrens–Fisher statistic allows the representation

T^{B F} \overset{d}{=} \frac{Z_{n_{1} + n_{2} - 1}}{\sqrt{S^{2}}}

with

S^{2} = A^{*} \sum_{i = 1}^{n_{1} - 1} Z_{i}^{2} + B^{*} \sum_{i = n_{1}}^{n_{1} + n_{2} - 2} Z_{i}^{2}

and where

{(Z_{1}, \dots, Z_{n_{1} + n_{2} - 1})}^{T}

is a standard Gaussian distributed random vector taking values in

R^{n_{1} + n_{2} - 1}

,

A^{*} = \frac{1}{1 + \frac{n_{1}}{n_{2}} {(\frac{σ_{2}}{σ_{1}})}^{2}} \cdot \frac{1}{n_{1} - 1}, B^{*} = \frac{1}{1 + {[\frac{n_{1}}{n_{2}} {(\frac{σ_{2}}{σ_{1}})}^{2}]}^{- 1}} \cdot \frac{1}{n_{2} - 1}

and

Z_{n_{1} + n_{2} - 1} \overset{d}{=} \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} and S^{2} \overset{d}{=} \frac{\frac{S_{X}^{2}}{n_{1}} + \frac{S_{Y}^{2}}{n_{2}}}{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}

are independent.

Proof.

We put

1_{k} = {(1, \dots, 1)}^{T} \in R^{k}, 0_{k} = {(0, \dots, 0)}^{T} \in R^{k}, 1^{+ 0} = {(1_{n_{1}}^{T} 0_{n_{2}}^{T})}^{T} and 1^{0 +} = {(0_{n_{1}}^{T} 1_{n_{2}}^{T})}^{T} .

Let us further denote the orthogonal projection onto the linear space

L = L (1^{+ 0}, 1^{0 +})

by

Π_{L}

and let the matrix P be defined such that

Π_{L} x = P x, \forall x \in R^{n}

;

n = n_{1} + n_{2},

then

P = (\begin{matrix} \frac{1}{n_{1}} I I_{n_{1}} \\ \frac{1}{n_{2}} I I_{n_{2}} \end{matrix})

where

I I_{k} = 1_{k} 1_{k}^{T}

is a

k \times k

-matrix. Here and below, missing off-diagonal matrix elements are zero. If

L^{⊥}

is the subspace of

R^{n}

that is orthogonal to

L

then

I_{n} - P = Π_{L^{⊥}}

. The statistic

T^{B F}

can be written as:

T^{B F} = \frac{(Π_{L} (\begin{matrix} X \\ Y \end{matrix}), \frac{1}{n_{1}} 1^{+ 0} - \frac{1}{n_{2}} 1^{0 +})}{| | (\begin{matrix} X \\ Y \end{matrix}) - Π_{L} (\begin{matrix} X \\ Y \end{matrix}) {| |}_{(n)}}

(7)

where the functional

{| | . | |}_{(n)}

is defined for all

x = {(x_{1}, \dots, x_{n_{1}})}^{T} \in R^{n_{1}}

and

y = {(y_{1}, \dots, y_{n_{2}})}^{T} \in R^{n_{2}}

as:

| | (\begin{matrix} x \\ y \end{matrix}) {| |}_{(n)} = {(\frac{1}{n_{1} (n_{1} - 1)} \sum_{1}^{n_{1}} x_{i}^{2} + \frac{1}{n_{2} (n_{2} - 1)} \sum_{1}^{n_{2}} y_{i}^{2})}^{1 / 2} .

We note that

P μ = μ

and

(I_{n} - P) μ = 0_{n}

, and put

κ = {(μ_{1} 1_{n_{1}}^{T} 0_{n_{2}}^{T})}^{T} .

The random vector

ζ = (\begin{matrix} P (\begin{matrix} X \\ Y \end{matrix}) \\ (I_{n} - P) (\begin{matrix} X \\ Y \end{matrix}) \end{matrix})

takes its values in

R^{2 n}

and follows a singular Gaussian distribution of rank n,

ζ \sim Φ_{κ, Ψ} with Ψ = (\begin{matrix} \frac{σ_{1}^{2}}{n_{1}} I I_{n_{1}} \\ \frac{σ_{2}^{2}}{n_{2}} I I_{n_{2}} \\ σ_{1}^{2} (I_{n_{1}} - \frac{1}{n_{1}} I I_{n_{1}}) \\ σ_{2}^{2} (I_{n_{2}} - \frac{1}{n_{2}} I I_{n_{2}}) \end{matrix}) .

As a consequence, the numerator and denominator of the ratio statistic

T^{B F}

are stochastically independent. Let

B_{1}^{T} = (\begin{matrix} b_{1} & \dots & b_{n_{1} - 1} & \frac{1}{\sqrt{n_{1}}} 1_{n_{1}} \end{matrix}), B_{2}^{T} = (\begin{matrix} c_{1} & \dots & c_{n_{2} - 1} & \frac{1}{\sqrt{n_{2}}} 1_{n_{2}} \end{matrix})

be orthogonal

n_{1} \times n_{1}

and

n_{2} \times n_{2}

matrices, respectively. The random vector

η = B (I_{n} - P) (\begin{matrix} X \\ Y \end{matrix})

with

B = (\begin{matrix} B_{1} \\ B_{2} \end{matrix})

follows a centered Gaussian distribution with the covariance matrix

B (\begin{matrix} σ_{1}^{2} (I_{n_{1}} - \frac{1}{n_{1}} I I_{n_{1}}) \\ σ_{2}^{2} (I_{n_{2}} - \frac{1}{n_{2}} I I_{n_{2}}) \end{matrix}) B^{T} = (\begin{matrix} σ_{1}^{2} (\begin{matrix} I_{n_{1} - 1} & 0 \\ 0 & 0 \end{matrix}) \\ σ_{2}^{2} (\begin{matrix} I_{n_{2} - 1} & 0 \\ 0 & 0 \end{matrix}) \end{matrix}) .

The vector

(I_{n} - P) (\begin{matrix} X \\ Y \end{matrix}) = B^{T} η

allows almost surely the representation

\begin{matrix} B^{T} η & = B^{T} {(N_{1}, \dots, N_{n_{1} - 1}, 0, N_{n_{1} + 1}, \dots, N_{n - 1}, 0)}^{T} \\ = \sum_{i = 1}^{n_{1} - 1} N_{i} (\begin{matrix} b_{i} \\ 0_{n_{2}} \end{matrix}) + \sum_{j = 1}^{n_{2} - 1} N_{n_{1} + j} (\begin{matrix} 0_{n_{1}} \\ c_{j} \end{matrix}) \\ = (\begin{matrix} Σ_{i = 1}^{n_{1} - 1} N_{i} b_{i} \\ Σ_{j = 1}^{n_{2} - 1} N_{n_{1} + j} c_{j} \end{matrix}) \end{matrix}

where the random variables

N_{i}

are independent and centered normally distributed with variances

σ_{1}^{2}

and

σ_{2}^{2}

for

i = 1, \dots, n_{1} - 1

and

i = n_{1} + 1, \dots, n - 1

, respectively. Let

{B_{1}^{*}}^{T} = (\begin{matrix} b_{1} & \dots & b_{n_{1} - 1} \end{matrix}) and {B_{2}^{*}}^{T} = (\begin{matrix} c_{1} & \dots & c_{n_{2} - 1} \end{matrix}) .

The direct sum (block-diagonal) matrix

{B^{*}}^{T} = {B_{1}^{*}}^{T} \oplus {B_{2}^{*}}^{T} = (\begin{matrix} {B_{1}^{*}}^{T} \\ {B_{2}^{*}}^{T} \end{matrix})

then describes a mapping from

R^{n - 2}

to

R^{n}

and, a.s.,

\begin{matrix} | | B^{T} η {| |}_{(n)} & = | | (\begin{matrix} {B_{1}^{*}}^{T} (\begin{matrix} N_{1} \\ ⋮ \\ N_{n_{1} - 1} \end{matrix}) \\ {B_{2}^{*}}^{T} (\begin{matrix} N_{n_{1} + 1} \\ ⋮ \\ N_{n - 1} \end{matrix}) \end{matrix}) {| |}_{(n)}^{*} \\ & = (\frac{1}{n_{1} (n_{1} - 1)} | | {B_{1}^{*}}^{T} (\begin{matrix} N_{1} \\ ⋮ \\ N_{n_{1} - 1} \end{matrix}) {| |}^{2} + \frac{1}{n_{2} (n_{2} - 1)} | | {B_{2}^{*}}^{T} (\begin{matrix} N_{n_{1} + 1} \\ ⋮ \\ N_{n - 1} \end{matrix}) {| |}^{2})^{1 / 2} \\ & = (\frac{1}{n_{1} (n_{1} - 1)} | | \sum_{i = 1}^{n_{1} - 1} N_{i} b_{i} {| |}^{2} + \frac{1}{n_{2} (n_{2} - 1)} | | \sum_{j = 1}^{n_{2} - 1} N_{n_{1} + j} c_{j} {| |}^{2})^{1 / 2} \\ & = (\frac{1}{n_{1} (n_{1} - 1)} \sum_{i = 1}^{n_{1} - 1} N_{i}^{2} + \frac{1}{n_{2} (n_{2} - 1)} \sum_{j = 1}^{n_{2} - 1} N_{n_{1} + j}^{2})^{1 / 2} \end{matrix}

where the norm

{| | . | |}_{(n)}^{*}

is defined in

R^{n - 1} \times R^{n - 1}

. The variance of the numerator of the Behrens–Fisher statistic is

V (\bar{X} - \bar{Y}) = \frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}} .

Hence,

T^{B F}

may be represented as

T^{B F} \overset{d}{=} \frac{(\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}})^{1 / 2} N_{n}}{(\frac{σ_{1}^{2}}{n_{1} (n_{1} - 1)} Σ_{i = 1}^{n_{1} - 1} N_{i}^{2} + \frac{σ_{2}^{2}}{n_{2} (n_{2} - 1)} Σ_{j = 1}^{n_{2} - 1} N_{n_{1} + j}^{2})^{1 / 2}}

where the standard Gaussian distributed random variable

N_{n}

is independent of

N_{1}, \dots, N_{n_{1} - 1}, N_{n_{1} + 1}, \dots, N_{n - 1}

. □

The constants

A^{*}

and

B^{*}

from Lemma 1 depend on

σ_{1}

and

σ_{2}

only through the variance ratio

V R = σ_{2}^{2} / σ_{1}^{2}

, which itself plays the role of a nuisance parameter.

If

θ = V R / S R

where

S R = \frac{n_{2}}{n_{1}}

is the sample size ratio then the constants

A^{*}, B^{*}, V R

and

n_{2}

may be expressed in terms of the parameter triple

(θ, n_{1}, S R)

. The constants

A^{*}

and

B^{*}

depend on the variance ratio

V R

, but in a different way for the different sample size ratio

S R

. For

n_{1}

given, the inverse mapping

(A^{*}, B^{*}) \to (θ, S R)

is defined by:

θ = \frac{1}{A^{*} (n_{1} - 1)} - 1, S R = \frac{1 + A^{*} + B^{*} - A^{*} n_{1}}{n_{1} B^{*}} .
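The forward and inverse mappings between (θ, S R) and (A^{*}, B^{*}) can be verified with a short round trip. The sketch below is illustrative only; the function names and the numerical values are hypothetical choices.

```python
def lemma_constants(n1: int, n2: int, vr: float) -> tuple:
    """A* and B* from Lemma 1, written in terms of theta = VR / SR."""
    theta = vr * n1 / n2  # theta = VR / SR with SR = n2 / n1
    a_star = 1.0 / (1.0 + theta) / (n1 - 1)
    b_star = 1.0 / (1.0 + 1.0 / theta) / (n2 - 1)
    return a_star, b_star


def inverse_mapping(a_star: float, b_star: float, n1: int) -> tuple:
    """Recover (theta, SR) from (A*, B*) for given n1."""
    theta = 1.0 / (a_star * (n1 - 1)) - 1.0
    sr = (1.0 + a_star + b_star - a_star * n1) / (n1 * b_star)
    return theta, sr


if __name__ == "__main__":
    n1, n2, vr = 5, 3, 0.25  # illustrative values
    a_star, b_star = lemma_constants(n1, n2, vr)
    # The weights of the two normalized chi-squares sum to one.
    print(a_star * (n1 - 1) + b_star * (n2 - 1))
    # Round trip: should return theta = vr * n1 / n2 and SR = n2 / n1.
    print(inverse_mapping(a_star, b_star, n1), (vr * n1 / n2, n2 / n1))
```

The identity A^{*} (n_{1} - 1) + B^{*} (n_{2} - 1) = 1 used in the check is what makes S^{2} a mixture of normalized chi-squares in the sense of Section 2.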

Other parameter triples could be introduced, e.g.,

(k, m, θ)

, being closely related to but nevertheless different from the parameter triple in [6].

The first and second order moments of

S^{2}

are

E S^{2} = 1 and V (S^{2}) = \frac{2}{{(1 + θ)}^{2}} [\frac{1}{k} + \frac{θ^{2}}{m}] .

Finally, it turns out that under the hypothesis

H_{0} : μ_{1} = μ_{2}

the statistic

T^{B F}

allows the representation

T^{B F} |_{H_{0}} \overset{d}{=} \frac{N}{\sqrt{γ \frac{C Q_{1, n_{1} - 1}}{n_{1} - 1} + (1 - γ) \frac{C Q_{2, n_{2} - 1}}{n_{2} - 1}}}

(8)

where the independent random variables N and

C Q_{i, n_{i} - 1}, i = 1, 2

are as in Section 2,

k = n_{1} - 1, m = n_{2} - 1

, and the mixing coefficient is

γ = (1 + \frac{V R}{S R})^{- 1} .

(9)

Thus, the Behrens–Fisher statistic follows the Student bridge distribution with d.f.

(n_{1} - 1, n_{2} - 1)

and mixing coefficient

γ

, or

(n_{1} - 1, n_{2} - 1; γ)

-Student bridge distribution, for short,

T^{B F} |_{H_{0}} \sim t_{n_{1} - 1, n_{2} - 1; (1 + \frac{V R}{S R})^{- 1}} .

(10)
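Representation (8) together with (9) lends itself to a quick simulation check. The Python sketch below is an illustration under stated assumptions, not part of the paper: the function names, the seed, the sample sizes and the standard deviations are hypothetical choices. It compares a tail frequency of the Behrens–Fisher statistic computed from simulated Gaussian samples under H_0 with that of draws from N / \sqrt{W S C S (k, m; γ)}.

```python
import math
import random


def mixing_coefficient(n1: int, n2: int, vr: float) -> float:
    """gamma = (1 + VR / SR)^(-1) with SR = n2 / n1, see (9)."""
    return 1.0 / (1.0 + vr * n1 / n2)


def simulate_bf(n1: int, n2: int, sigma1: float, sigma2: float, rng: random.Random) -> float:
    """One draw of T^BF from two Gaussian samples with equal means (H0)."""
    x = [rng.gauss(0.0, sigma1) for _ in range(n1)]
    y = [rng.gauss(0.0, sigma2) for _ in range(n2)]
    xbar, ybar = sum(x) / n1, sum(y) / n2
    s2x = sum((v - xbar) ** 2 for v in x) / (n1 - 1)
    s2y = sum((v - ybar) ** 2 for v in y) / (n2 - 1)
    return (xbar - ybar) / math.sqrt(s2x / n1 + s2y / n2)


def simulate_bridge(k: int, m: int, gamma: float, rng: random.Random) -> float:
    """One draw of N / sqrt(WSCS(k, m; gamma)) as in representation (8)."""
    cq1 = rng.gammavariate(k / 2, 2.0)  # chi-square with k d.f.
    cq2 = rng.gammavariate(m / 2, 2.0)  # chi-square with m d.f.
    w = gamma * cq1 / k + (1.0 - gamma) * cq2 / m
    return rng.gauss(0.0, 1.0) / math.sqrt(w)


def tail_fraction(draws: list, c: float) -> float:
    """Empirical frequency of |T| > c."""
    return sum(abs(t) > c for t in draws) / len(draws)


if __name__ == "__main__":
    rng = random.Random(2020)
    n1, n2, sigma1, sigma2 = 5, 3, 2.0, 1.0
    gamma = mixing_coefficient(n1, n2, (sigma2 / sigma1) ** 2)
    bf = [simulate_bf(n1, n2, sigma1, sigma2, rng) for _ in range(20000)]
    br = [simulate_bridge(n1 - 1, n2 - 1, gamma, rng) for _ in range(20000)]
    print(gamma, tail_fraction(bf, 2.0), tail_fraction(br, 2.0))
```

The two tail frequencies should agree up to Monte Carlo error; such a comparison is a sanity check and does not replace the proof.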

In case of a known variance ratio, standard statistical significance testing and confidence estimation methods are based therefore upon the

(n_{1} - 1, n_{2} - 1; γ)

-Student bridge distribution in the common way. Here, assumption

\frac{k}{γ} < \frac{m}{1 - γ}

from Section 2 means that

\frac{n_{1} (n_{1} - 1)}{n_{2} (n_{2} - 1)} < \frac{σ_{1}^{2}}{σ_{2}^{2}} .
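The equivalence of the two inequalities can be spelled out as follows; this short derivation is added for illustration and uses only (9), θ = V R / S R, k = n_{1} - 1 and m = n_{2} - 1.

```latex
\gamma = \frac{1}{1+\theta}, \qquad 1-\gamma = \frac{\theta}{1+\theta}
\;\Longrightarrow\;
\frac{k}{\gamma} = (n_{1}-1)(1+\theta), \qquad
\frac{m}{1-\gamma} = \frac{(n_{2}-1)(1+\theta)}{\theta},
\\[4pt]
\frac{k}{\gamma} < \frac{m}{1-\gamma}
\;\Longleftrightarrow\;
\theta < \frac{n_{2}-1}{n_{1}-1}
\;\Longleftrightarrow\;
\frac{\sigma_{2}^{2}}{\sigma_{1}^{2}} \cdot \frac{n_{1}}{n_{2}} < \frac{n_{2}-1}{n_{1}-1}
\;\Longleftrightarrow\;
\frac{n_{1}(n_{1}-1)}{n_{2}(n_{2}-1)} < \frac{\sigma_{1}^{2}}{\sigma_{2}^{2}}.
```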

Without going here into technical details, the unrestricted distribution of

T^{B F}

is a non-central Student bridge distribution in a suitably defined sense.

5. Discussion

5.1. Reflection of the Three Pillar Bridges Property

We now consider four examples from [8] to demonstrate the role that the three pillar bridges property discussed in this paper may play in practical statistical work. In each example, we choose a pair of sample sizes

(k, m)

from the set

{(11, 14), (14, 14), (6, 2), (3, 9)}

, and a variance ratio

V R

from the positive real line that we assume to be known. In any case, we then determine the mixture coefficient

γ

by a one-to-one calculation from

V R

. This way, Examples 1–4 are described (with some redundancy) as:

E 1 (11, 14 | 1.25 | 0.505)

,

E 2 (14, 14 | 0.04 | 0.962)

,

E 3 (6, 2 | 1.25 | 0.505)

and

E 4 (3, 9 | 0.04 | 0.962)

. Figure 1, Figure 2, Figure 3 and Figure 4 show the densities

f_{W} = f_{W S C S (k, m; γ)}

and

f_{T} = f_{T_{k, m; γ}}

for more parameter combinations

(k, m; γ)

than required in Examples E1 to E4.

A value of

γ

close to 1 corresponds to a value of

θ = V R / S R

close to zero,

V R < < S R

, meaning that the sample size in the second population relative to that in the first is disproportionately large compared to the corresponding quotient of variances; in other words, the first population is underrepresented.

A value of

γ

close to 0 corresponds to a very large value of

θ = V R / S R

,

S R < < V R

, meaning that the sample size in the first population relative to that in the second is disproportionately large compared to the corresponding quotient of variances; in other words, the second population is underrepresented.

Unlike these two cases of imbalance, a value of

γ

in the order of

γ_{0}

speaks for an approximately achieved balance. The latter can be observed close to the middle pillar of the three pillar Student bridge distribution and is suitable to explain some effect when choosing the degree of freedom in the Welch approximation to the exact density of T.

The chi-square bridge densities shown in Figure 1 correspond to the Student bridge densities in Figure 3. It is shown in [8] that for such cases the Welch approximation seems to be the best found so far. Welch’s approximate degrees of freedom, see Formula (1.2) in [8] with

N_{1} - 1 = k, N_{2} - 1 = m

and

σ_{i}^{2}

replaced with

S_{i}^{2}, i = 1, 2

, are

f = 25

for Example 1 and

f = 15

for Example 2. This corresponds very well to the three pillar property of the Student bridge distribution.

If

γ = 1 / 2

, as is approximately the case in Example 1, then the denominator

N

of T can be written as

N^{2} = (C Q_{1, 11} / 11 + C Q_{2, 14} / 14) / 2 .

Because the numbers 11 and 14 are of comparable size, a reasonable approximation is

N^{2} \approx (C Q_{1, 11} + C Q_{2, 14}) / (2 \times 12.5) \sim C Q_{25} / 25

finally leading for Example 1 to

T \approx t_{25}

.

In Example 2, the denominator of T allows the representation

N^{2} = 0.962 C Q_{1, 14} / 14 + 0.038 C Q_{2, 14} / 14

, which is reasonably approximated by

N^{2} \approx C Q_{15} / 15

. Thus,

T \approx t_{15}

.
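The heuristic above can be turned into a small computation. Formula (1.2) of [8] is not reproduced in this report; the sketch below assumes the usual Welch–Satterthwaite form of the approximate degrees of freedom, rewritten in terms of the mixing coefficient as f = 2 / V (W) = 1 / (γ^{2} / k + (1 - γ)^{2} / m), with the known variance ratio used in place of estimated variances. The function name `welch_df` is hypothetical.

```python
def welch_df(k: int, m: int, gamma: float) -> float:
    """Moment-matching degrees of freedom f = 1 / (gamma^2 / k + (1 - gamma)^2 / m).

    Assumed Welch-Satterthwaite form: a normalized chi-square variable with
    f d.f. has the same mean (1) and variance (2 / f) as WSCS(k, m; gamma).
    """
    return 1.0 / (gamma ** 2 / k + (1.0 - gamma) ** 2 / m)


if __name__ == "__main__":
    k, m = 11, 14
    # The three pillars: f = m at gamma = 0, f = k + m at gamma0, f = k at gamma = 1.
    print(welch_df(k, m, 0.0), welch_df(k, m, k / (k + m)), welch_df(k, m, 1.0))
    # Examples 1 and 2: rounding gives the values of f quoted in the text.
    print(round(welch_df(11, 14, 0.505)), round(welch_df(14, 14, 0.962)))
```

Under this assumption the approximate degrees of freedom interpolate between the three pillars m, k + m and k as γ moves from 0 through γ_0 to 1, which is one way to read the three pillar bridges explanation.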

Figure 2 shows a broader variability between the densities when the mixing coefficient

γ

is varied compared to Figure 1. This is reflected in more visible variation of the corresponding Student bridge densities in Figure 4, both in their distribution centers and their distribution tails. This should be taken into account if applications of the Student bridge distribution are required, in particular in the areas of the distributions just mentioned.

The Welch approximation is known to perform better when both k and m are sufficiently large. Our Figures show what may happen for small sample sizes.

Because the considerations in [8] also cover higher dimensions, it might be of some interest to extend the present work to that case, too.

5.2. Examples Where the Student Bridge Distribution Should Be Preferred

The aim of this section is to give a complementary structural argumentation confirming the numerical discoveries in [9] with respect to the question of when Welch’s approximation is not sufficiently precise. To this end, we present, for two cases of sample sizes and variance ratios, the exact Student bridge density and Welch’s approximation to it in a joint figure.

Example 5. Assume that as in Figure 1, Figure 2 and Figure 3 in [9], sample sizes

(n_{1}, n_{2})

are

(5, 3), (3, 4), (4, 3), (2, 4), (3, 3)

and

(2, 2),

and that the estimated variance ratio is always equal to 0.25. If we assume that the exact variance ratio in (10) is equal to 0.25, then the mixing coefficient

γ

of the Student bridge distribution

t_{n_{1} - 1, n_{2} - 1; γ}

is accordingly equal to 0.706, 0.840, 0.750, 0.889, 0.8 and 0.8. Figure 5 shows the density of

t_{4, 2; 0.706}

and the density of its Welch approximation

t_{6}

that can hardly be visually distinguished from each other if considered on the whole line, but differ locally. Figure 6 shows the densities of the Student bridge distribution

t_{1, 3; 0.889}

and Welch’s Student approximation to it,

t_{1}

. In this case, preference for the Student bridge density can even be seen globally.
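For the two cases shown in Figure 5 and Figure 6, the mixing coefficient and the rounded Welch degrees of freedom can be recomputed as sketched below. The sketch assumes the classical Welch–Satterthwaite expression in terms of the two variances, and normalizes the first variance to 1 so that the second equals the variance ratio 0.25; the function names are hypothetical.

```python
def mixing_coefficient(n1: int, n2: int, vr: float) -> float:
    """gamma = (1 + VR / SR)^(-1) with SR = n2 / n1, see (9)."""
    return 1.0 / (1.0 + vr * n1 / n2)


def welch_satterthwaite_df(n1: int, n2: int, var1: float, var2: float) -> float:
    """Classical Welch-Satterthwaite degrees of freedom (assumed form)."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


if __name__ == "__main__":
    vr = 0.25  # variance ratio sigma_2^2 / sigma_1^2
    for n1, n2 in [(5, 3), (2, 4)]:
        gamma = mixing_coefficient(n1, n2, vr)
        f = welch_satterthwaite_df(n1, n2, 1.0, vr)
        # Student bridge parameters (n1 - 1, n2 - 1; gamma) and Welch's d.f.
        print((n1 - 1, n2 - 1, round(gamma, 3)), round(f, 2), round(f))
```

Rounding the Welch degrees of freedom to the nearest integer is an assumption made here to match the Student distributions named in the text.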

Funding

This research received no external funding.

Acknowledgments

The author is grateful to the reviewers and the academic editor for their valuable comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Schechtman, E.; Sherman, M. The two-sample t-test with a known ratio of variances. Stat. Methodol. 2007, 4, 508–514.
2. Maity, A.; Sherman, M. The two-sample test with one variance unknown. Am. Stat. 2006, 60, 163–166.
3. Behrens, W. Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen. Landwirtsch. Jahrbuecher 1929, 68, 807–837.
4. Fisher, R. The fiducial argument in statistical inference. Ann. Eugen. 1935, 6, 391–398.
5. Welch, B. The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 1947, 34, 28–35.
6. Fisher, R.; Yates, F. Statistical Tables for Biological, Agricultural and Medical Research; Oliver and Boyd: Edinburgh, UK, 1957.
7. Feiveson, A.; Delaney, F. The Distribution and Properties of a Weighted Sum of Chi Squares; NASA Technical Note TN D-4575; National Aeronautics and Space Administration: Washington, DC, USA, 1968.
8. Nel, D.G.; van der Merwe, C.A.; Moser, B.K. The exact distributions of the univariate and multivariate Behrens–Fisher statistics with a comparison of several solutions in the univariate case. Commun. Statist. Theory Meth. 1990, 19, 279–298.
9. Nadarajah, S.; Li, R. Exact distribution of a modified Behrens–Fisher statistic. Commun. Stat. Simul. Comput. 2017, 46, 6845–6864.
10. Linnik, J. Statistical Problems with Nuisance Parameters; American Mathematical Society: Providence, RI, USA, 1968.
11. Kim, S.H.; Cohen, A.S. On the Behrens-Fisher problem: A review. J. Educ. Behav. Stat. 1998, 23, 356–377.
12. Dudewicz, E.J.; Ma, Y.; Mai, E.S.; Su, H. Exact solutions to the Behrens-Fisher problem: Asymptotically optimal and finite sample efficient choice among. J. Stat. Plan. Inference 2007, 137, 1584–1605.
13. Giron, F.J.; Castillo, C. The multivariate Behrens-Fisher distribution. J. Multiv. Anal. 2010, 101, 2091–2102.
14. Anderson, M.J.; Walsh, D.C.I.; Clarke, K.R.; Gorley, R.N.; Guerra-Castro, E. Some solutions to the multivariate Behrens–Fisher problem for dissimilarity-based analyses. Aust. N. Z. J. Stat. 2017, 59, 57–79.
15. Shirke, D.T.; Khorate, S.D. Two-sample nonparametric test for testing equality of locations based on data depth. J. Indian Soc. Probab. Stat. 2018, 19, 9–23.
16. Sprott, D.A.; Farewell, V.T. The difference between two normal means. Am. Stat. 1993, 47, 126–128.
17. Abramowitz, M.; Stegun, I. Handbook of Mathematical Functions; US Government Printing Office: Dover, NY, USA, 1965.
18. Gradshteyn, I.; Ryzhik, I. Table of Integrals, Series and Products; Academic Press: New York, NY, USA, 1980.

Figure 1. (a) Example 1; (b) Example 2.

Figure 2. (a) Example 3; (b) Example 4.

Figure 3. (a) Example 1; (b) Example 2.

Figure 4. (a) Example 3; (b) Example 4.

Figure 5. Densities of

t_{4, 2; 0.706}

and

t_{6}

around the origin and in the distribution tails.

Figure 6. Densities of

t_{1, 3; 0.889} and t_{1}

.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
