Chi-Square and Student Bridge Distributions and the Behrens–Fisher Statistic

Abstract: We prove that the Behrens–Fisher statistic follows a Student bridge distribution, the mixing coefficient of which depends on the two sample variances only through their ratio. To this end, it is first shown that a weighted sum of two independent normalized chi-square distributed random variables is chi-square bridge distributed, and secondly that the Behrens–Fisher statistic is based on such a variable and a standard normally distributed one that is independent of the former. In case of a known variance ratio, exact standard statistical testing and confidence estimation methods apply without the need for any additional approximations. In addition, a three pillar bridges explanation is given for the choice of degrees of freedom in Welch's approximation to the exact distribution of the Behrens–Fisher statistic.


Introduction
Generalizations and modifications of standard statistical distributions, such as the chi-square and Student distributions, play a useful role because of their numerous possible applications in different areas of statistics. The modifications introduced here, the chi-square and Student bridge distributions, are however considered only with a view to the subsequent application.
If the normalized chi-square distributed random variable in the denominator of the common ratio representation of a Student distributed random variable is replaced with a mixture of independent normalized chi-square distributed variables, then the resulting ratio follows a distribution that is called here a Student bridge distribution. Possibly the most prominent example of this type of random variable is the Behrens-Fisher statistic. The mixing coefficient in the corresponding representation of this statistic depends on the variances of the two underlying Gaussian sample distributions only through their ratio. The variance ratio thus plays the role of a nuisance parameter when deriving the distribution of the Behrens-Fisher statistic. As described in [1], the variance ratio may be known although the individual variances are not, for example when two instruments of equal precision average different numbers of replicates to arrive at a response. Another situation, where one of the two variances is known and the other one is not, is dealt with in [2].
The well-known Behrens-Fisher statistic was introduced in [3,4]. Several authors have provided approximations of its distribution. To mention only some of the earlier contributions, we refer first of all to the well-known approximation in [5]. The approximate distribution whose percentage points are dealt with in [6] is often called the Behrens-Fisher distribution. Convolutions of weighted chi-squares are used in [7] to evaluate the Welch approximation to the distribution of the Behrens-Fisher statistic. In [8], the exact distribution of the Behrens-Fisher statistic is derived for the case of

Chi-Square Bridge Distribution
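Let $CQ_{1,k}$ and $CQ_{2,m}$ be independent random variables, where $CQ_{i,d}$ is chi-square distributed with $d$ d.f., $d \in \{k, m\}$, and $i = 1, 2$. For $\gamma \in [0, 1]$, we consider the mixture of normalized chi-squares, or weighted sum of chi-squares,

$$W = \gamma \frac{CQ_{1,k}}{k} + (1 - \gamma) \frac{CQ_{2,m}}{m}.$$

The expectation and variance of $W$ are

$$E(W) = 1 \quad \text{and} \quad V(W) = 2\left(\frac{\gamma^2}{k} + \frac{(1-\gamma)^2}{m}\right),$$

respectively. Minimal variance with respect to the mixing coefficient is attained for $\gamma = \gamma_0$, where

$$\gamma_0 = \frac{k}{k+m}.$$

Let $X \sim h$ indicate that the random variable $X$ follows the probability distribution $h$. If $A = \gamma/k$ and $B = (1 - \gamma)/m$, then $A = B$ holds for $\gamma = \gamma_0$, and the statistic $W/A = CQ_{1,k} + CQ_{2,m}$ follows in this case a chi-square distribution with $k + m$ d.f. Equivalently, $W \sim CQ_{k+m}/(k+m)$ for $\gamma = \gamma_0$, while $W = CQ_{2,m}/m$ for $\gamma = 0$ and $W = CQ_{1,k}/k$ for $\gamma = 1$. That is why we say that the distribution of the weighted sum of chi-squares (WSCS) has a three pillar bridges property.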
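The moments of the weighted sum and the location of the variance-minimizing coefficient can be checked numerically. The following is a minimal sketch, assuming the mixture has the form W = γ CQ_{1,k}/k + (1 − γ) CQ_{2,m}/m; the function names (`gamma0`, `w_variance`, `sample_w`, `empirical_moments`) are hypothetical and only serve this illustration.

```python
import random


def gamma0(k, m):
    """Mixing coefficient that minimizes Var(W): gamma_0 = k / (k + m)."""
    return k / (k + m)


def w_variance(k, m, gamma):
    """Var(W) = 2 * (gamma^2 / k + (1 - gamma)^2 / m), using Var(chi2_d) = 2d."""
    return 2.0 * (gamma ** 2 / k + (1.0 - gamma) ** 2 / m)


def sample_w(k, m, gamma, rng):
    """Draw one value of W = gamma * CQ1/k + (1 - gamma) * CQ2/m.

    A chi-square variable with d d.f. is Gamma(shape=d/2, scale=2).
    """
    cq1 = rng.gammavariate(k / 2.0, 2.0)
    cq2 = rng.gammavariate(m / 2.0, 2.0)
    return gamma * cq1 / k + (1.0 - gamma) * cq2 / m


def empirical_moments(k, m, gamma, n_draws=20000, seed=1):
    """Monte Carlo estimates of E(W) and Var(W)."""
    rng = random.Random(seed)
    draws = [sample_w(k, m, gamma, rng) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((w - mean) ** 2 for w in draws) / (n_draws - 1)
    return mean, var


if __name__ == "__main__":
    k, m = 4, 2
    print("gamma_0:", gamma0(k, m))
    print("Var(W) at gamma_0:", w_variance(k, m, gamma0(k, m)), "vs 2/(k+m):", 2 / (k + m))
    print("simulated (mean, var) at gamma = 0.5:", empirical_moments(k, m, 0.5))
```

At γ = γ_0 the variance equals 2/(k + m), which is the variance of a normalized chi-square variable with k + m d.f., in line with the middle pillar.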

Student Bridge Distribution
If $N$ denotes a standard Gaussian distributed random variable that is independent of $CQ_{1,k}$ and $CQ_{2,m}$, and $t_l$ denotes Student's t-distribution with $l$ d.f., then the statistic

$$T = \frac{N}{\sqrt{W}}$$

satisfies $T \sim t_m$ for $\gamma = 0$, $T \sim t_{k+m}$ for $\gamma = \gamma_0$, and $T \sim t_k$ for $\gamma = 1$.

By the general integral representation of the density of the ratio of two independent continuous random variables,

$$f_T(t) = \int_0^\infty s \, f_N(ts) \, f_{\sqrt{W}}(s) \, ds,$$

where $f_N$ denotes the standard Gaussian density and $f_{\sqrt{W}}(t) = 2t f_W(t^2)$ can easily be derived from (2) or (3). Making use of (2), changing the order of integration and then changing the variables leads to the first of the two alternative representations (5) and (6) of the density $f_T$. Making use of (3) instead of (2) proves (6). Here, ${}_2F_1$ denotes the hypergeometric function, defined for $\delta > \beta > 0$ by

$${}_2F_1(\alpha, \beta; \delta; y) = \frac{\Gamma(\delta)}{\Gamma(\beta)\Gamma(\delta - \beta)} \int_0^1 z^{\beta - 1} (1 - z)^{\delta - \beta - 1} (1 - yz)^{-\alpha} \, dz,$$

see [17] and Formula 9.111 in [18]. Choosing $y < 0$ avoids a zero of $1 - yz$ within the range of integration and might motivate a preference for using Formula (5) or (6).

The following definition is motivated by the three pillar bridges property (4).
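The three pillar property of the ratio T = N/√W can be illustrated without the closed-form densities. The sketch below does not use the hypergeometric representations; instead it relies on the conditioning identity f_T(t) = E[√W φ(t√W)], where φ is the standard Gaussian density, and estimates the expectation by simulation. This estimator and all function names are assumptions made for illustration only.

```python
import math
import random


def student_t_pdf(t, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    c = math.gamma((dof + 1) / 2.0) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2.0))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2.0)


def sample_w(k, m, gamma, n_draws, seed=0):
    """Draw W = gamma * CQ1/k + (1 - gamma) * CQ2/m, with chi2_d = Gamma(d/2, scale 2)."""
    rng = random.Random(seed)
    return [
        gamma * rng.gammavariate(k / 2.0, 2.0) / k
        + (1.0 - gamma) * rng.gammavariate(m / 2.0, 2.0) / m
        for _ in range(n_draws)
    ]


def bridge_pdf_mc(t, w_samples):
    """Monte Carlo estimate of f_T(t) = E[sqrt(W) * phi(t * sqrt(W))].

    Given W = w, T = N / sqrt(w) is Gaussian with variance 1/w,
    so its conditional density at t is sqrt(w) * phi(t * sqrt(w)).
    """
    total = 0.0
    for w in w_samples:
        total += math.sqrt(w) * math.exp(-0.5 * t * t * w)
    return total / (len(w_samples) * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    k, m, t = 4, 2, 0.5
    pillars = [(0.0, m), (k / (k + m), k + m), (1.0, k)]
    for gamma, dof in pillars:
        estimate = bridge_pdf_mc(t, sample_w(k, m, gamma, 40000))
        print(f"gamma={gamma:.3f}: simulated {estimate:.4f}, t_{dof} density {student_t_pdf(t, dof):.4f}")
```

For intermediate values of γ the same estimator gives a density between the pillars, which is what the bridge terminology refers to.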

Definition 2.
The probability distribution corresponding to density (5) (or (6)) will be called a Student bridge distribution with $(k, m)$ d.f. and mixing coefficient $\gamma$, or $(k, m; \gamma)$-Student bridge distribution $t_{k,m;\gamma}$, for short. Figures 3 and 4 show the density $f_T$ of the distribution $t_{k,m;\gamma}$ for the same choice of parameters as for $f_W$.

Behrens-Fisher Statistic
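Let $X_1, ..., X_{n_1}$ and $Y_1, ..., Y_{n_2}$ be jointly independent Gaussian samples with expectations $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$. We consider the statistic

$$T_{BF} = \frac{\bar{X} - \bar{Y}}{\sqrt{S_1^2/n_1 + S_2^2/n_2}},$$

where

$$\bar{X} = \frac{1}{n_1} \sum_{i=1}^{n_1} X_i, \quad \bar{Y} = \frac{1}{n_2} \sum_{i=1}^{n_2} Y_i, \quad S_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (X_i - \bar{X})^2, \quad S_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (Y_i - \bar{Y})^2$$

are the common sample means and unbiased sample variances, respectively. By $Z \overset{d}{=} U$ we mean that two random variables $Z$ and $U$ have the same probability distribution.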
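For concreteness, the statistic can be computed from two samples as follows. This is a minimal sketch, assuming the usual form of the Behrens-Fisher statistic, that is, the difference of the sample means divided by the square root of S_1^2/n_1 + S_2^2/n_2; the function name `behrens_fisher_statistic` is hypothetical.

```python
import math
import statistics


def behrens_fisher_statistic(x, y):
    """Return (mean(x) - mean(y)) / sqrt(S1^2/n1 + S2^2/n2).

    S1^2 and S2^2 are the unbiased sample variances (divisor n - 1),
    which is what statistics.variance computes.
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    mean_difference = statistics.mean(x) - statistics.mean(y)
    squared_scale = statistics.variance(x) / n1 + statistics.variance(y) / n2
    return mean_difference / math.sqrt(squared_scale)


if __name__ == "__main__":
    # Sample means 2 and 5, sample variances 1 and 20/3,
    # so the squared scale is 1/3 + 5/3 = 2.
    print(behrens_fisher_statistic([1, 2, 3], [2, 4, 6, 8]))
```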

Lemma 1. The Behrens-Fisher statistic allows the representation
and where $(Z_1, ..., Z_{n_1+n_2-1})^T$ is a standard Gaussian distributed random vector taking values in $\mathbb{R}^{n_1+n_2-1}$,

Proof. We put Let us further denote the orthogonal projection onto the linear space $L = L(1_{+0}, 1_{0+})$ by $\Pi_L$ and let the matrix $P$ be defined such that $\Pi_L x = Px$ for all $x \in \mathbb{R}^n$, $n = n_1 + n_2$. Then

$$P = \begin{pmatrix} \frac{1}{n_1} II_{n_1} & \\ & \frac{1}{n_2} II_{n_2} \end{pmatrix},$$

where $II_k = 1_k 1_k^T$ is a $k \times k$ matrix. Here and below, missing off-diagonal matrix elements are zero. If $L^\perp$ is the subspace of $\mathbb{R}^n$ that is orthogonal to $L$, then $I_n - P = \Pi_{L^\perp}$. The statistic $T_{BF}$ can be written as: where the functional $\|\cdot\|_{(n)}$ is defined for all $x = (x_1, ..., x_{n_1})^T \in \mathbb{R}^{n_1}$ and $y = (y_1, ..., y_{n_2})^T \in \mathbb{R}^{n_2}$ as: We note that $P\mu = \mu$ and $(I_n - P)\mu = 0_n$, and put $\kappa = (\mu_1 1_{n_1}^T, 0_{n_2}^T)^T$. The random vector takes its values in $\mathbb{R}^{2n}$ and follows a singular Gaussian distribution of rank $n$, As a consequence, the numerator and denominator of the ratio statistic $T_{BF}$ are stochastically independent.

Let $B_1$ and $B_2$ be orthogonal $n_1 \times n_1$ and $n_2 \times n_2$ matrices, respectively. The random vector $\eta = B(I_n - P)(X^T, Y^T)^T$ with

$$B = \begin{pmatrix} B_1 & \\ & B_2 \end{pmatrix}$$

follows a centered Gaussian distribution with the covariance matrix where the random variables $N_i$ are independent and centered normally distributed with variances $\sigma_1^2$ and $\sigma_2^2$ for $i = 1, ..., n_1 - 1$ and $i = n_1 + 1, ..., n - 1$, respectively. Let The Kronecker product matrix then describes a mapping from $\mathbb{R}^{2n-2}$ to $\mathbb{R}^{2n}$ and, a.s., where the norm $\|\cdot\|^*_{(n)}$ is defined in $\mathbb{R}^{n-1} \times \mathbb{R}^{n-1}$. The variance of the numerator of the Behrens-Fisher statistic is $\sigma_1^2/n_1 + \sigma_2^2/n_2$. Hence, $T_{BF}$ may be represented as where the standard Gaussian distributed random variable $N_n$ is independent of $N_1, ..., N_{n_1-1}, N_{n_1+1}, ..., N_{n-1}$.
The constants $A^*$ and $B^*$ from Lemma 1 depend on $\sigma_1$ and $\sigma_2$ only through the variance ratio $VR = \sigma_2^2/\sigma_1^2$, which itself plays the role of a nuisance parameter. If $\theta = VR/SR$, where $SR = n_2/n_1$ is the sample size ratio, then the constants $A^*$, $B^*$, $VR$ and $n_2$ may be expressed in terms of the parameter triple $(\theta, n_1, SR)$. The constants $A^*$ and $B^*$ depend on the variance ratio $VR$, but in a different way for different sample size ratios $SR$. For $n_1$ given, the inverse mapping $(A^*, B^*) \to (\theta, SR)$ is defined by: Other parameter triples could be introduced, e.g., $(k, m, \theta)$, being closely related to but nevertheless different from the parameter triple in [6].
The first and second order moments of $S^2$ are

Finally, it turns out that under the hypothesis $H_0: \mu_1 = \mu_2$ the statistic $T_{BF}$ allows the representation

$$T_{BF} \overset{d}{=} \frac{N}{\sqrt{\gamma \, CQ_{1,n_1-1}/(n_1 - 1) + (1 - \gamma) \, CQ_{2,n_2-1}/(n_2 - 1)}},$$

where the independent random variables $N$ and $CQ_{i,n_i-1}$, $i = 1, 2$, are as in Section 2, $k = n_1 - 1$, $m = n_2 - 1$, and the mixing coefficient is

$$\gamma = \frac{1}{1 + \theta}.$$

Thus, the Behrens-Fisher statistic follows the Student bridge distribution with d.f. $(n_1 - 1, n_2 - 1)$ and mixing coefficient $\gamma$, or $(n_1 - 1, n_2 - 1; \gamma)$-Student bridge distribution, for short,

$$T_{BF} \sim t_{n_1-1, n_2-1; \gamma}.$$

In case of a known variance ratio, standard statistical significance testing and confidence estimation methods are therefore based upon the $(n_1 - 1, n_2 - 1; \gamma)$-Student bridge distribution in the common way. Here, the assumption $\frac{k}{\gamma} < \frac{m}{1 - \gamma}$ from Section 2 means that

Without going into technical details here, the unrestricted distribution of $T_{BF}$ is a non-central Student bridge distribution in a suitably defined sense.
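When the variance ratio is known, a critical value for a two-sided test can be approximated by simulating the Student bridge distribution directly. The sketch below is one hedged way to do this, not the exact method based on the density: it assumes the mixing coefficient γ = 1/(1 + θ) with θ = VR/SR, VR = σ_2^2/σ_1^2 and SR = n_2/n_1, which follows from dividing S_1^2/n_1 + S_2^2/n_2 by the variance σ_1^2/n_1 + σ_2^2/n_2 of the numerator. The function names and the Monte Carlo approach are assumptions for illustration.

```python
import math
import random


def mixing_coefficient(n1, n2, variance_ratio):
    """gamma = 1 / (1 + theta), theta = VR / SR, with SR = n2 / n1 (assumed form)."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    theta = variance_ratio / (n2 / n1)
    return 1.0 / (1.0 + theta)


def bridge_critical_value_mc(n1, n2, variance_ratio, alpha=0.05, n_draws=200000, seed=0):
    """Monte Carlo estimate of c with P(|T| > c) = alpha under H0.

    T = N / sqrt(gamma * CQ1/k + (1 - gamma) * CQ2/m), k = n1 - 1, m = n2 - 1,
    where a chi-square variable with d d.f. is drawn as Gamma(d/2, scale 2).
    """
    k, m = n1 - 1, n2 - 1
    gamma = mixing_coefficient(n1, n2, variance_ratio)
    rng = random.Random(seed)
    abs_values = []
    for _ in range(n_draws):
        w = (gamma * rng.gammavariate(k / 2.0, 2.0) / k
             + (1.0 - gamma) * rng.gammavariate(m / 2.0, 2.0) / m)
        abs_values.append(abs(rng.gauss(0.0, 1.0)) / math.sqrt(w))
    abs_values.sort()
    index = min(n_draws - 1, math.ceil((1.0 - alpha) * n_draws) - 1)
    return abs_values[index]


if __name__ == "__main__":
    print("gamma for n1=5, n2=3, VR=0.25:", mixing_coefficient(5, 3, 0.25))
    # Balanced case n1 = n2 = 6, VR = 1: gamma = 1/2 = gamma_0, so T follows t_10.
    print("simulated 5% two-sided critical value:", bridge_critical_value_mc(6, 6, 1.0))
```

In the balanced case used in the demo the simulated value should be close to the usual two-sided 5% critical value of the t-distribution with 10 d.f., about 2.228.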
A value of $\gamma$ close to 1 corresponds to a value of $\theta = VR/SR$ close to zero, $VR \ll SR$, meaning that the sample size in the second population relative to that in the first is disproportionately large compared to the corresponding quotient of variances; in other words, the first population is underrepresented.
A value of $\gamma$ close to 0 corresponds to a very large value of $\theta = VR/SR$, $SR \ll VR$, meaning that the sample size in the first population relative to that in the second is disproportionately large compared to the corresponding quotient of variances; in other words, the second population is underrepresented.
Unlike these two cases of imbalance, a value of $\gamma$ of the order of $\gamma_0$ indicates that balance is approximately achieved. The latter can be observed close to the middle pillar of the three pillar Student bridge distribution and helps to explain the choice of the degrees of freedom in the Welch approximation to the exact density of $T$.
The chi-square bridge densities shown in Figure 1 correspond to the Student bridge densities in Figure 3. It is shown in [8] that for such cases the Welch approximation seems to be the best found so far. Welch's approximate degrees of freedom, see Formula (1.2) in [8] with $N_1 - 1 = k$, $N_2 - 1 = m$ and $\sigma_i^2$ replaced with $S_i^2$, $i = 1, 2$, are $f = 25$ for Example 1 and $f = 15$ for Example 2. This corresponds very well to the three pillar property of the Student bridge distribution.
In Example 2, the squared denominator of $T$ allows the representation $N^2 = 0.962 \, CQ_{1,14}/14 + 0.038 \, CQ_{2,14}/14$, which is reasonably approximated by $N^2 \approx CQ_{15}/15$. Thus, $T \approx t_{15}$. Figure 2 shows a broader variability between the densities when the mixing coefficient $\gamma$ is varied compared to Figure 1. This is reflected in a more visible variation of the corresponding Student bridge densities. Because the considerations in [8] also cover higher dimensions, it might be of some interest to extend the present work to this case, too.
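The link between Welch's degrees of freedom and the three pillars can be made explicit by writing the Welch-Satterthwaite formula in terms of the mixing coefficient. The sketch below assumes f = 1/(γ^2/k + (1 − γ)^2/m), which is the usual formula (g_1 + g_2)^2 / (g_1^2/k + g_2^2/m) with g_i = σ_i^2/n_i and γ = g_1/(g_1 + g_2); it uses the true rather than the estimated variances, and the function name `welch_dof` is hypothetical.

```python
def welch_dof(k, m, gamma):
    """Welch-Satterthwaite degrees of freedom expressed through gamma.

    Chooses f such that CQ_f / f has the same variance 2/f as the mixture
    W = gamma * CQ1/k + (1 - gamma) * CQ2/m, whose variance is
    2 * (gamma^2 / k + (1 - gamma)^2 / m).
    """
    return 1.0 / (gamma ** 2 / k + (1.0 - gamma) ** 2 / m)


if __name__ == "__main__":
    # Mixing coefficient and degrees of freedom quoted for Example 2 in the text.
    print("Example 2:", welch_dof(14, 14, 0.962))
    # At the three pillars the formula returns m, k + m and k exactly.
    k, m = 4, 2
    for gamma in (0.0, k / (k + m), 1.0):
        print(f"gamma={gamma:.3f}: f={welch_dof(k, m, gamma):.3f}")
```

Under this assumption the formula is exact at γ = 0, γ = γ_0 and γ = 1, and only interpolates between these values elsewhere.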

Examples Where the Student Bridge Distribution Should Be Preferred
The aim of this section is to give a complementary structural argument confirming the numerical findings in [9] with respect to the question of when Welch's approximation is not sufficiently precise. To this end, we present, for two cases of sample sizes and variance ratios, the exact Student bridge density and Welch's approximation to it in a joint figure.
Example 5. Assume that, as in Figures 1-3 in [9], the sample sizes $(n_1, n_2)$ are (5, 3), (3, 4), (4, 3), (2, 4), (3, 3) and (2, 2), and that the estimated variance ratio is always equal to 0.25. If we assume that the exact variance ratio in (10) is equal to 0.25, then the mixing coefficient $\gamma$ of the Student bridge distribution $t_{n_1-1,n_2-1;\gamma}$ is equal to 0.706, 0.840, 0.750, 0.889, 0.8 and 0.8, respectively. Figure 5 shows the density of $t_{4,2;0.706}$ and the density of its Welch approximation $t_6$. The two can hardly be distinguished visually when considered on the whole line, but they differ locally. Figure 6 shows the densities of the Student bridge distribution $t_{1,3;0.889}$ and of Welch's Student approximation to it, $t_1$. In this case, the preference for the Student bridge density can even be seen globally.

Funding: This research received no external funding.