1. Introduction
Let us suppose we have two independent random samples $X_{11},\dots,X_{1n_1}$ and $X_{21},\dots,X_{2n_2}$, which are drawn from two normal distributions, $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, respectively. The Behrens–Fisher problem occurs when testing the equality of the two means $\mu_1$ and $\mu_2$ based on random samples like these, without the assumption that the two variances, $\sigma_1^2$ and $\sigma_2^2$, are equal. [1] showed that a uniformly most powerful test does not exist in this case, and the Behrens–Fisher problem remains one of the unsolved problems of statistics. Many different approaches have been tried to solve this problem. Among those approaches are the fiducial approach proposed by [2,3], which in a way opened the path to the Bayesian approach proposed by [4,5], based on setting independent and locally uniform prior distributions for the model parameters, and the frequentist approach proposed by [6,7], which uses Student's $t$ distribution with estimated degrees of freedom as the approximate distribution of the Behrens–Fisher statistic.
In this paper, we will obtain the exact and near-exact distributions, both the probability density function and the cumulative distribution function, of the Behrens–Fisher statistic,
$$T=\frac{\bar X_1-\bar X_2}{\sqrt{S_1^2/n_1+S_2^2/n_2}},\qquad(1)$$
under $H_0:\mu_1=\mu_2$, where $\bar X_i=\frac{1}{n_i}\sum_{j=1}^{n_i}X_{ij}$ and $S_i^2=\frac{1}{n_i-1}\sum_{j=1}^{n_i}(X_{ij}-\bar X_i)^2$ for $i=1,2$, in the form of mixtures of Student's $t$ distributions multiplied by constants. In particular, for the case when both sample sizes are odd, the exact distribution will be derived in a finite closed form, without any unsolved integrals or infinite sums, by using the GIG (generalized integer gamma) distribution in [8], which is the distribution of the sum of independent gamma variables with integer shape parameters and unequal rate parameters. For the other cases, that is, when both sample sizes are even or one of them is even and the other one is odd, a near-exact distribution will be obtained by approximating the exact distribution with a finite mixture of GIG distributions, in order to obtain a more manageable cumulative distribution function. These exact and near-exact distributions include $\sigma_1^2$ and $\sigma_2^2$ as unknown parameters, which have to be estimated from the observed samples, the $p$-values then being obtained from the exact or near-exact distributions with estimated parameters. The results will be compared with Welch's $t$-test, one of the most widely used solutions to the problem, through Monte Carlo simulations for relatively small sample sizes. We will see that the tests based on the exact or near-exact distributions can attain higher power than Welch's $t$-test, especially when sample sizes are small and unbalanced and the variances are also unbalanced.
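As a quick illustration, the statistic in (1) can be computed directly from two samples; it is the same statistic used by Welch's test, which differs only in the reference distribution. A minimal sketch (ours, not part of the paper), using scipy:

```python
import numpy as np
from scipy import stats

def behrens_fisher_statistic(x1, x2):
    """Behrens-Fisher statistic T = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    s1, s2 = x1.var(ddof=1), x2.var(ddof=1)
    return (x1.mean() - x2.mean()) / np.sqrt(s1 / n1 + s2 / n2)

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=7)
x2 = rng.normal(0.0, 3.0, size=13)
t_obs = behrens_fisher_statistic(x1, x2)

# Welch's t-test uses the very same statistic, paired with an approximate
# Student's t reference distribution with estimated degrees of freedom.
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)
assert np.isclose(t_obs, t_welch)
```

The final assertion confirms that scipy's Welch statistic coincides with (1); only the distribution used to convert it into a $p$-value differs between the approaches compared in this paper.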
This paper is organized as follows: 
Section 2 presents the exact distribution when both sample sizes are odd; 
Section 3 provides the near-exact distribution when both sample sizes are even, and 
Section 4 presents the near-exact distribution of the test statistic when one sample size is even and the other one odd. Numerical studies are provided in 
Section 5 to compare the exact and the near-exact distribution approaches with Welch’s 
t-test, and concluding remarks are presented in 
Section 6.
  2. The Exact Distribution of the Behrens–Fisher Statistic
for Odd-Numbered Sample Sizes
In this section, we present the exact distribution of the Behrens–Fisher statistic in (1) when both $n_1$ and $n_2$ are odd. Since $(n_i-1)S_i^2/\sigma_i^2\sim\chi^2_{n_i-1}$, we have
$$\frac{S_i^2}{n_i}\sim\Gamma\!\left(\frac{n_i-1}{2},\,\frac{n_i(n_i-1)}{2\sigma_i^2}\right),\qquad i=1,2,$$
so that
$$W=\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}$$
is the sum of two independent gamma variables, which follow the distributions $\Gamma(r_1,\lambda_1)$ and $\Gamma(r_2,\lambda_2)$, with
$$r_i=\frac{n_i-1}{2}\qquad\text{and}\qquad \lambda_i=\frac{n_i(n_i-1)}{2\sigma_i^2},\qquad i=1,2,$$
where $\Gamma(r,\lambda)$ indicates a gamma distribution with shape parameter $r$ and rate parameter $\lambda$.
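The gamma form of $S_i^2/n_i$ can be checked numerically. The following sketch (ours, not the paper's) verifies by simulation that the first two moments of $S^2/n$ match those of $\Gamma\!\left(\frac{n-1}{2},\frac{n(n-1)}{2\sigma^2}\right)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 7, 2.5
shape = (n - 1) / 2                  # = 3, an integer since n is odd
rate = n * (n - 1) / (2 * sigma2)

# Simulate S^2/n for many normal samples of size n.
samples = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))
w = samples.var(axis=1, ddof=1) / n

# Gamma(shape, rate) has mean shape/rate and variance shape/rate^2.
assert abs(w.mean() - shape / rate) < 3e-3
assert abs(w.var() - shape / rate**2) < 5e-3
```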
Now, we can divide the problem into two cases: one is the case $\lambda_1\neq\lambda_2$, and the other is the case $\lambda_1=\lambda_2$. When $\lambda_1\neq\lambda_2$, $W$ follows a GIG distribution of depth 2 with shape parameters $r_1$ and $r_2$ and rate parameters $\lambda_1$ and $\lambda_2$, which are different. Notice that $r_1=(n_1-1)/2$ and $r_2=(n_2-1)/2$ are both integers because of the odd-numbered sample sizes. The probability density function of this distribution is
$$f_W(w)=\sum_{j=1}^{2}\sum_{k=1}^{r_j}c_{j,k}\,w^{k-1}e^{-\lambda_j w},\qquad w>0,$$
where the coefficients $c_{j,k}$ are given by (11)–(13) in [8], evaluated for depth 2 with the shape and rate parameters above.
As a matter of fact, this probability density function can be rewritten as
$$f_W(w)=\sum_{j=1}^{2}\sum_{k=1}^{r_j}\pi_{j,k}\,\frac{\lambda_j^{k}}{\Gamma(k)}\,w^{k-1}e^{-\lambda_j w},\qquad \pi_{j,k}=c_{j,k}\,\frac{\Gamma(k)}{\lambda_j^{k}},$$
which is a finite mixture of integer gamma distributions in [9]. Now that we have obtained the exact distribution of $W$, we can easily get the joint distribution of $W$ and $Y=\bar X_1-\bar X_2$, under $H_0$. $Y$ follows the normal distribution $N(0,\sigma_Y^2)$, with $\sigma_Y^2=\sigma_1^2/n_1+\sigma_2^2/n_2$, under $H_0$, and so, given the independence of $W$ and $Y$, the joint probability density function of these two random variables is
$$f_{W,Y}(w,y)=f_W(w)\,\frac{1}{\sigma_Y}\,\phi\!\left(\frac{y}{\sigma_Y}\right),$$
where $\phi(\cdot)$ is the probability density function of a standard normal distribution.
Since we want to derive the distribution of $T=Y/\sqrt{W}$, we need to go further and obtain the joint distribution of $W$ and $T$ from $f_{W,Y}(w,y)$ by a simple change of variables. From this process, we obtain
$$f_{W,T}(w,t)=\sqrt{w}\,f_W(w)\,\frac{1}{\sigma_Y}\,\phi\!\left(\frac{t\sqrt{w}}{\sigma_Y}\right),$$
for $w>0$ and $t\in\mathbb{R}$, and, integrating out $w$,
$$f_T(t)=\sum_{j=1}^{2}\sum_{k=1}^{r_j}\pi_{j,k}\,\frac{1}{\theta_{j,k}}\,f_{t_{2k}}\!\left(\frac{t}{\theta_{j,k}}\right),$$
where
$$\theta_{j,k}=\sigma_Y\sqrt{\frac{\lambda_j}{k}}$$
and each term yields the probability density function of a $\theta_{j,k}\,t_{2k}$ random variable, with $t_n$ denoting a Student's $t$ distribution with $n$ degrees of freedom.
Hence, $f_T(t)$ is a mixture of probability density functions of the scaled Student's $t$ random variables $\theta_{j,k}\,t_{2k}$, with weights $\pi_{j,k}$. Given that the probability density function of a Student's $t$ variable with $n$ degrees of freedom is
$$f_{t_n}(x)=\frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\left(1+\frac{x^2}{n}\right)^{-\frac{n+1}{2}},\qquad x\in\mathbb{R},$$
the probability density function of $T$, under $H_0$, can be rewritten as
$$f_T(t)=\sum_{j=1}^{2}\sum_{k=1}^{r_j}\frac{\pi_{j,k}}{\theta_{j,k}}\,\frac{\Gamma\!\left(\frac{2k+1}{2}\right)}{\sqrt{2k\pi}\,\Gamma(k)}\left(1+\frac{t^2}{2k\,\theta_{j,k}^2}\right)^{-\frac{2k+1}{2}},$$
for $t\in\mathbb{R}$.
Then, the cumulative distribution function of $T$ would also be a mixture of the cumulative distribution functions of the $\theta_{j,k}\,t_{2k}$ random variables, with weights $\pi_{j,k}$. For Student's $t$ distribution with even degrees of freedom $n$, the cumulative distribution function is given by
$$F_{t_n}(x)=\frac{1}{2}+\frac{x}{2\sqrt{n+x^2}}\sum_{j=0}^{n/2-1}\binom{2j}{j}\frac{1}{4^{j}}\left(\frac{n}{n+x^2}\right)^{j},\qquad x\in\mathbb{R},$$
which is obtained by applying the reduction formulas for the Gaussian hypergeometric function ${}_2F_1$ from [10].
Thus, the cumulative distribution function of $T$, under $H_0$, can be expressed as
$$F_T(t)=\sum_{j=1}^{2}\sum_{k=1}^{r_j}\pi_{j,k}\,F_{t_{2k}}\!\left(\frac{t}{\theta_{j,k}}\right),\qquad t\in\mathbb{R}.\qquad(4)$$
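Once the weights, degrees of freedom, and scale constants of the mixture are computed, evaluating this cumulative distribution function and the corresponding right-tail $p$-value reduces to a weighted sum of Student's $t$ tail probabilities. A generic sketch (ours, with purely illustrative weights, degrees of freedom, and scales rather than values derived from any particular sample):

```python
import numpy as np
from scipy import stats

def mixture_t_sf(t_obs, weights, dfs, scales):
    """Right-tail probability of a finite mixture of scaled Student's t
    distributions: 1 - F(t), with F(t) = sum_k w_k * F_{t_{df_k}}(t / scale_k)."""
    weights, dfs, scales = map(np.asarray, (weights, dfs, scales))
    return float(np.sum(weights * stats.t.sf(t_obs / scales, df=dfs)))

# Illustrative (hypothetical) mixture components.
weights = [0.6, 0.3, 0.1]
dfs     = [4, 6, 8]          # even degrees of freedom, as in the odd-odd case
scales  = [1.0, 1.1, 1.2]
p = mixture_t_sf(2.0, weights, dfs, scales)
assert 0.0 < p < 1.0
```

By symmetry of each component about zero, `mixture_t_sf(0.0, ...)` returns 0.5 for any valid set of weights, a handy sanity check.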
The distribution of $T$ is much simpler when $\lambda_1=\lambda_2=\lambda$. Since the rate parameters of the two independent gamma variables $S_1^2/n_1$ and $S_2^2/n_2$ are, in this case, equal, $W$ simply follows the gamma distribution $\Gamma(r_1+r_2,\lambda)$. Therefore, in this case, $T\sim t_{n_1+n_2-2}$. This means that the probability density function and cumulative distribution function of $T$ are the same as those of Student's $t$ distribution with $n_1+n_2-2$ degrees of freedom. The probability density function for this distribution is written as
$$f_T(t)=\frac{\Gamma\!\left(\frac{n_1+n_2-1}{2}\right)}{\sqrt{(n_1+n_2-2)\pi}\,\Gamma\!\left(\frac{n_1+n_2-2}{2}\right)}\left(1+\frac{t^2}{n_1+n_2-2}\right)^{-\frac{n_1+n_2-1}{2}},\qquad t\in\mathbb{R}.$$
Additionally, as $n_1+n_2-2$ is an even number when the sample sizes are odd-numbered, the cumulative distribution function can be written as
$$F_T(t)=\frac{1}{2}+\frac{t}{2\sqrt{n_1+n_2-2+t^2}}\sum_{j=0}^{\frac{n_1+n_2-2}{2}-1}\binom{2j}{j}\frac{1}{4^{j}}\left(\frac{n_1+n_2-2}{n_1+n_2-2+t^2}\right)^{j},\qquad t\in\mathbb{R}.$$
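The equal-rate case is easy to confirm by simulation: choosing variances so that $\lambda_1=\lambda_2$, which amounts to $\sigma_1^2/\sigma_2^2=n_1(n_1-1)/(n_2(n_2-1))$, the simulated statistic should follow Student's $t$ with $n_1+n_2-2$ degrees of freedom exactly. A small numerical check (ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n1, n2 = 5, 7
# Rates coincide when sigma1^2 / sigma2^2 = n1(n1-1) / (n2(n2-1)).
sigma1_sq = n1 * (n1 - 1)        # 20
sigma2_sq = n2 * (n2 - 1)        # 42
reps = 100_000
x1 = rng.normal(0.0, np.sqrt(sigma1_sq), size=(reps, n1))
x2 = rng.normal(0.0, np.sqrt(sigma2_sq), size=(reps, n2))
t = (x1.mean(axis=1) - x2.mean(axis=1)) / np.sqrt(
    x1.var(axis=1, ddof=1) / n1 + x2.var(axis=1, ddof=1) / n2)

# Under H0 and equal rates, T should follow t with n1 + n2 - 2 = 10 df.
grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
emp = np.array([(t <= g).mean() for g in grid])
assert np.max(np.abs(emp - stats.t.cdf(grid, df=n1 + n2 - 2))) < 0.01
```

Indeed, with these variances, $S_1^2/n_1\sim\chi^2_4$ and $S_2^2/n_2\sim\chi^2_6$, so $W\sim\chi^2_{10}$ and $T$ is exactly a $t_{10}$ variable.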
  3. The Exact and Near-Exact Distribution of the Behrens–Fisher Statistic
for Even-Numbered Sample Sizes
In this section, we present the exact distribution of $T$ when both sample sizes are even. The exact distribution consists of an infinite series, so we also provide a near-exact distribution, based on a finite mixture of GIG distributions, which yields an approximation to the exact distribution.
  3.1. The Exact Distribution
Unlike the case where both sample sizes are odd, $W$ does not follow a GIG distribution when both sample sizes are even. This is so because the shape parameters for $S_1^2/n_1$ and $S_2^2/n_2$ are not integers when the sample sizes are even. However, we can use the Kummer confluent hypergeometric function to obtain the exact distribution of $W$ in this case.
Given the integral definition of the Kummer confluent hypergeometric function ${}_1F_1$, the probability density function of $W$, which is the sum of two independent gamma variables with shape parameters
$$r_i=\frac{n_i-1}{2},\qquad i=1,2,\qquad(2)$$
and rate parameters
$$\lambda_i=\frac{n_i(n_i-1)}{2\sigma_i^2},\qquad i=1,2,\qquad(3)$$
can be expressed as
$$f_W(w)=\frac{\lambda_1^{r_1}\lambda_2^{r_2}}{\Gamma(r_1+r_2)}\,w^{r_1+r_2-1}\,e^{-\lambda_2 w}\;{}_1F_1\!\big(r_1;\,r_1+r_2;\,(\lambda_2-\lambda_1)\,w\big),\qquad w>0,$$
where, without loss of generality, we label the samples so that $\lambda_2\geq\lambda_1$. Since
$${}_1F_1(a;b;z)=\sum_{i=0}^{\infty}\frac{(a)_i}{(b)_i}\frac{z^i}{i!},$$
where $(a)_i=a(a+1)\cdots(a+i-1)$ denotes the Pochhammer symbol, the probability density function of $W$ can be further written as
$$f_W(w)=\sum_{i=0}^{\infty}\pi_i\,\frac{\lambda_2^{\,r_1+r_2+i}}{\Gamma(r_1+r_2+i)}\,w^{r_1+r_2+i-1}e^{-\lambda_2 w},\qquad
\pi_i=\left(\frac{\lambda_1}{\lambda_2}\right)^{r_1}\frac{(r_1)_i}{i!}\left(\frac{\lambda_2-\lambda_1}{\lambda_2}\right)^{i},$$
for $w>0$, which is the probability density function of an infinite mixture of gamma distributions $\Gamma(r_1+r_2+i,\lambda_2)$ with weights $\pi_i$.
Now, using a similar approach to the one used in Section 2 for the case of odd-numbered sample sizes, we can obtain the exact probability density function of $T$ under $H_0$ in the form of an infinite mixture of probability density functions of the scaled Student's $t$ distributions $\theta_i\,t_{\nu_i}$, with weights $\pi_i$, for $i=0,1,2,\dots$, where
$$\nu_i=n_1+n_2-2+2i\qquad\text{and}\qquad \theta_i=\sigma_Y\sqrt{\frac{2\lambda_2}{\nu_i}}.$$
The probability density function of $T$ may then be stated as
$$f_T(t)=\sum_{i=0}^{\infty}\pi_i\,\frac{1}{\theta_i}\,f_{t_{\nu_i}}\!\left(\frac{t}{\theta_i}\right),\qquad t\in\mathbb{R}.$$
Regarding the exact cumulative distribution function of $T$, under $H_0$, this cumulative distribution function would also be an infinite mixture of cumulative distribution functions of $\theta_i\,t_{\nu_i}$ distributions, with weights $\pi_i$, for $i=0,1,2,\dots$. It can be written as
$$F_T(t)=\sum_{i=0}^{\infty}\pi_i\,F_{t_{\nu_i}}\!\left(\frac{t}{\theta_i}\right),\qquad t\in\mathbb{R},$$
where each $\nu_i$ is even, by applying the expression for $F_{t_n}$ with even $n$ obtained in Section 2.
As the exact cumulative distribution function of $T$ is expressed as an infinite sum when $\lambda_1\neq\lambda_2$, it is not a very manageable cumulative distribution function. In order to obtain numerical values of this cumulative distribution function in a reasonable amount of time, an integer upper bound for the summation in $i$ is required. However, the number of terms required to obtain a small enough truncation error is often very large. Hence, a near-exact distribution of $T$ with a manageable cumulative distribution function needs to be obtained for the case $\lambda_1\neq\lambda_2$, so that quantiles and $p$-values can be computed in a faster and more practical way.
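To see why a truncation-based evaluation can be slow, consider the defining series of the Kummer function: for moderately large arguments, many terms are needed before the partial sums stabilize. A small illustration (ours, with arbitrary parameter values), comparing truncated partial sums with scipy's `hyp1f1`:

```python
from scipy import special

def hyp1f1_partial_sum(a, b, z, n_terms):
    """Truncated series 1F1(a; b; z) = sum_k (a)_k z^k / ((b)_k k!)."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= (a + k) * z / ((b + k) * (k + 1))
    return total

a, b, z = 2.0, 5.0, 30.0
exact = special.hyp1f1(a, b, z)
# Count how many terms are needed for a relative error below 1e-10.
needed = next(n for n in range(1, 500)
              if abs(hyp1f1_partial_sum(a, b, z, n) - exact) < 1e-10 * exact)
print(needed)
```

The term magnitudes grow until roughly $k\approx z$ before decaying, so the number of required terms grows with the argument, which in the exact distribution grows with $|\lambda_2-\lambda_1|\,w$.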
  3.2. Near-Exact Distribution
In order to obtain a near-exact distribution for $T$ based on a finite mixture, we will first obtain a near-exact distribution for $W$ and then derive the distribution of $T$ from this distribution of $W$.
The exact characteristic function of $W$ is given by
$$\Phi_W(t)=\left(1-\frac{\mathrm{i}t}{\lambda_1}\right)^{-r_1}\left(1-\frac{\mathrm{i}t}{\lambda_2}\right)^{-r_2},$$
where $\mathrm{i}$ is the imaginary unit, with $\mathrm{i}^2=-1$. Since $\Phi_W(t)$ is the characteristic function of the sum of the two independent gamma variables $S_1^2/n_1$ and $S_2^2/n_2$, we can make use of the probability density function of $W$ expressed as an infinite mixture of probability density functions of gamma distributions, obtained in Section 3.1, to write
$$\Phi_W(t)=\sum_{i=0}^{\infty}\pi_i\left(1-\frac{\mathrm{i}t}{\lambda_2}\right)^{-(r_1+r_2+i)},$$
where the $\pi_i$ are the weights of that infinite mixture.
Now, for a given integer $m^{*}$, we propose to approximate $\Phi_W(t)$ by a finite mixture
$$\Phi_W^{*}(t)=\sum_{i=0}^{m^{*}}p_i\,\Phi_i(t),$$
where the $\Phi_i(t)$ are characteristic functions of GIG distributions of depth 2 with integer shape parameters and rate parameters $\lambda_1$ and $\lambda_2$, and where the weights $p_i$ are determined in such a way that
$$\left.\frac{d^{h}}{dt^{h}}\Phi_W^{*}(t)\right|_{t=0}=\left.\frac{d^{h}}{dt^{h}}\Phi_W(t)\right|_{t=0},\qquad h=1,\dots,m^{*},$$
with $\sum_{i=0}^{m^{*}}p_i=1$. This approximate characteristic function $\Phi_W^{*}(t)$ is, in fact, the characteristic function of a finite mixture (with weights $p_i$) of $m^{*}+1$ GIG distributions of depth 2 with integer shape parameters and rate parameters $\lambda_1$ and $\lambda_2$, the first $m^{*}$ moments of which are the same as those of $W$.
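The moment-matching step amounts to a small linear system: the raw moments of $W$ (a sum of two gammas) are equated to those of a finite mixture. The component family below (single-rate gamma components with shapes $s+i$) is purely illustrative; the paper uses mixtures of GIG distributions, but the linear-algebra step is the same:

```python
import numpy as np
from math import comb
from scipy import special

def gamma_moment(shape, rate, h):
    """h-th raw moment of Gamma(shape, rate): (shape)_h / rate^h."""
    return special.poch(shape, h) / rate**h

def sum_moments(r1, l1, r2, l2, m):
    """Raw moments of W = Gamma(r1, l1) + Gamma(r2, l2), h = 0..m,
    via the binomial expansion of E[(X + Y)^h]."""
    return [sum(comb(h, j) * gamma_moment(r1, l1, j) * gamma_moment(r2, l2, h - j)
                for j in range(h + 1)) for h in range(m + 1)]

# Match the first m moments of W with a mixture of Gamma(s + i, lam), i = 0..m
# (hypothetical component family and parameter values).
r1, l1, r2, l2 = 2.5, 1.3, 3.5, 0.7
m, s, lam = 4, r1 + r2, 0.7
target = sum_moments(r1, l1, r2, l2, m)
A = np.array([[gamma_moment(s + i, lam, h) for i in range(m + 1)]
              for h in range(m + 1)])   # row h = 0 is all ones: weights sum to 1
p = np.linalg.solve(A, target)
assert np.allclose(A @ p, target)
```

Row $h=0$ of the system encodes the constraint $\sum_i p_i=1$, and rows $h=1,\dots,m^{*}$ encode the moment equalities.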
Hence, a finite mixture of the probability density functions of these $m^{*}+1$ GIG distributions, with weights $p_i$ for $i=0,\dots,m^{*}$, will then be a near-exact probability density function of $W$. As the GIG distribution itself is a finite mixture of gamma distributions, as shown in Section 2, this probability density function can again be written as a finite mixture of gamma densities, with coefficients defined in the same way as the $c_{j,k}$ in Section 2, except that here they are computed for each of the $m^{*}+1$ component GIG distributions.
From this near-exact probability density function of $W$, using the same logic as in Section 2 and Section 3.1, we can obtain a near-exact probability density function of $T$ in the form of a finite mixture of probability density functions of scaled Student's $t$ distributions with even degrees of freedom, with weights $p_i$. Naturally, the corresponding near-exact cumulative distribution function of $T$ is a finite mixture of the cumulative distribution functions of those scaled Student's $t$ distributions, with weights $p_i$, which can be written by applying the expression for $F_{t_n}$ with even $n$ obtained in Section 2.
  4. One of the Sample Sizes Is Even and the Other Is Odd
The exact distribution of the Behrens–Fisher statistic $T$ for this case is given by the same expressions used in Section 3.1. Just as in the case of even-numbered sample sizes, the exact cumulative distribution function is not very manageable when $\lambda_1\neq\lambda_2$ in this case either. Hence, there is a need to obtain a near-exact distribution of $T$ with a manageable cumulative distribution function for faster and more practical computation of quantiles and $p$-values.
As we did in Section 3.2, we will first obtain a near-exact distribution for $W$, and then we will derive the near-exact distribution for $T$ from the distribution of $W$. Let, without any loss of generality, $n_1$ be even and $n_2$ be odd. Then, for the shape and rate parameters defined in (2) and (3), the exact characteristic function of $W$ can be written as
$$\Phi_W(t)=\underbrace{\left(1-\frac{\mathrm{i}t}{\lambda_1}\right)^{-\frac{n_1-2}{2}}\left(1-\frac{\mathrm{i}t}{\lambda_2}\right)^{-\frac{n_2-1}{2}}}_{\Phi_1(t)}\;\underbrace{\left(1-\frac{\mathrm{i}t}{\lambda_1}\right)^{-\frac{1}{2}}}_{\Phi_2(t)},$$
where $\Phi_1(t)$ is the characteristic function of a GIG distribution of depth 2 with integer shape parameters $\frac{n_1-2}{2}$ and $\frac{n_2-1}{2}$ and rate parameters $\lambda_1$ and $\lambda_2$, while $\Phi_2(t)$ is the characteristic function of a $\Gamma\!\left(\frac{1}{2},\lambda_1\right)$ distribution. Now, for a given $m^{*}$, we propose to approximate $\Phi_2(t)$ by the characteristic function $\Phi_2^{*}(t)$ of a finite mixture of gamma distributions, with weights $p_i$ determined in such a way that
$$\left.\frac{d^{h}}{dt^{h}}\Phi_2^{*}(t)\right|_{t=0}=\left.\frac{d^{h}}{dt^{h}}\Phi_2(t)\right|_{t=0},\qquad h=1,\dots,m^{*},$$
with $\sum_{i=0}^{m^{*}}p_i=1$.
The characteristic function $\Phi_2^{*}(t)$ is thus the characteristic function of a finite mixture of $m^{*}+1$ gamma distributions with rate parameter $\lambda_1$, the first $m^{*}$ moments of which are the same as those of the $\Gamma\!\left(\frac{1}{2},\lambda_1\right)$ distribution.
We should note that the weights $p_i$ do not depend on $\lambda_1$. The $h$-th non-central moment of the $\Gamma\!\left(\frac{1}{2},\lambda_1\right)$ distribution and of the mixture of gamma distributions $\Gamma(a_i,\lambda_1)$ are, respectively, given by
$$\frac{(1/2)_h}{\lambda_1^{h}}\qquad\text{and}\qquad \sum_{i=0}^{m^{*}}p_i\,\frac{(a_i)_h}{\lambda_1^{h}},$$
where $(x)_h=x(x+1)\cdots(x+h-1)$, which means the $p_i$ are determined in such a way that they satisfy
$$\sum_{i=0}^{m^{*}}p_i\,(a_i)_h=(1/2)_h,\qquad h=1,\dots,m^{*},$$
definitely not depending on $\lambda_1$.
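The rate-cancellation argument above is easy to confirm numerically: matching the first $m^{*}$ moments of a $\Gamma(1/2,\lambda_1)$ distribution by a mixture of gamma components sharing the rate $\lambda_1$ yields weights that do not change with $\lambda_1$. A check (ours, with illustrative integer component shapes $1+i$):

```python
import numpy as np
from scipy import special

def match_weights(lam, m=3):
    """Weights matching the first m moments of Gamma(1/2, lam) by a mixture of
    Gamma(1 + i, lam), i = 0..m (illustrative component shapes); row h = 0
    enforces that the weights sum to one."""
    A = np.array([[special.poch(1.0 + i, h) / lam**h for i in range(m + 1)]
                  for h in range(m + 1)])
    b = np.array([special.poch(0.5, h) / lam**h for h in range(m + 1)])
    return np.linalg.solve(A, b)

# The rate lam cancels from every moment equation, so the weights coincide:
assert np.allclose(match_weights(0.4), match_weights(7.0))
```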
By using $\Phi_2^{*}(t)$ instead of $\Phi_2(t)$, we can then approximate $\Phi_W(t)$ with
$$\Phi_W^{*}(t)=\Phi_1(t)\,\Phi_2^{*}(t),$$
which is the characteristic function of a finite mixture of $m^{*}+1$ distributions of depth 3, whose shape and rate parameters are those of $\Phi_1(t)$ together with those of the corresponding component of $\Phi_2^{*}(t)$.
Thus, a finite mixture of the $m^{*}+1$ probability density functions of these depth-3 distributions, with weights $p_i$, can be taken as a near-exact probability density function of $W$. As the GIG distribution itself is a finite mixture of gamma distributions, this near-exact probability density function of $W$ can again be written as a mixture of gamma densities, with coefficients given by (11)–(13) in [8], computed from the shape and rate parameters of each component.
From this near-exact probability density function of $W$, using the same logic as before, we can once again obtain a near-exact probability density function of $T$ in the form of a finite mixture of probability density functions of scaled Student's $t$ distributions, with weights $p_i$. Hence, the corresponding near-exact cumulative distribution function of $T$, under $H_0$, is also a finite mixture of the cumulative distribution functions of those scaled Student's $t$ distributions, with weights $p_i$, which can be written by applying the expressions for $F_{t_n}$ obtained in Section 2.
  5. Comparison of the Exact or Near-Exact Distribution and Welch’s t Test
When it is plausible to assume the normal model stated in Section 1, we can make use of the exact and near-exact distributions of $T$ to solve the Behrens–Fisher problem. The exact cumulative distribution function of $T$ will be used for the computation of $p$-values when both sample sizes are odd, while the near-exact cumulative distribution functions obtained in Section 3 and Section 4 will be used for computing $p$-values when both sample sizes are even or when one of the sample sizes is even and the other is odd. Because the cumulative distribution functions of $T$ include the unknown parameters $\sigma_1^2$ and $\sigma_2^2$, these will be estimated by the sample variances $S_1^2$ and $S_2^2$.
We will compare the exact or near-exact distributions and Welch's $t$-test by their actual sizes and powers for testing $H_0:\mu_1=\mu_2$ versus $H_1:\mu_1>\mu_2$. Since the alternative corresponds to a right-tailed test, the $p$-value is computed as one minus the value of the cumulative distribution function at the observed statistic, using $F_T$ in (4) or the near-exact cumulative distribution functions of Section 3 and Section 4, depending on the parity of the sample sizes, according to the derivations in the previous sections, and with $\sigma_1^2$ and $\sigma_2^2$ estimated by $S_1^2$ and $S_2^2$. We used different type I error rates $\alpha$, and we conducted Monte Carlo experiments under a range of different sets of parameters.
Simulations were conducted for a range of variance and sample size pairs. For the mean difference, we covered a range of values of $\mu_1-\mu_2$, with $\mu_1-\mu_2=0$ corresponding to the null hypothesis $H_0$.
For each combination of parameters, the number of replications was 50,000, and in each subsection we provide two scenarios: one in which the sample sizes were balanced and another in which they were unbalanced. For each generated sample, we computed the $p$-value $p_m$, for $m=1,\dots,50{,}000$, and then obtained the type I error rate under $H_0$ and the power under $H_1$ from
$$\frac{1}{50{,}000}\sum_{m=1}^{50{,}000}\mathbb{1}\{p_m\leq\alpha\},$$
where $\mathbb{1}\{\cdot\}$ is the indicator function and the $p$-value $p_m$ is computed using $F_T$ in (4) for the case of both sample sizes odd, or using the near-exact cumulative distribution functions of Section 3 or Section 4 when at least one of the sample sizes is even.
All computations were done with the software R, version 4.1.0.
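The simulation loop can be sketched as follows. This minimal version (ours, in Python rather than the R code used in the paper) estimates only the type I error rate of Welch's test via scipy's implementation, since the exact and near-exact cumulative distribution functions are not reproduced here; the parameter values are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n1, n2, sigma1, sigma2 = 5, 15, 3.0, 1.0   # unbalanced sizes and variances
alpha, reps = 0.05, 50_000

# Under H0 (equal means), estimate the type I error rate of Welch's t-test
# for the right-tailed alternative mu1 > mu2.
x1 = rng.normal(0.0, sigma1, size=(reps, n1))
x2 = rng.normal(0.0, sigma2, size=(reps, n2))
res = stats.ttest_ind(x1, x2, axis=1, equal_var=False, alternative="greater")
size_hat = (res.pvalue <= alpha).mean()
assert 0.03 < size_hat < 0.07
```

Replacing the Welch $p$-values with those computed from the exact or near-exact cumulative distribution functions, with $\sigma_1^2$ and $\sigma_2^2$ replaced by $S_1^2$ and $S_2^2$, gives the comparisons reported in the tables below.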
  5.1. Odd $n_1$ and $n_2$
In Table 1 and Table 2, we present the power values for the exact distribution and Welch's test, represented, respectively, by E and W. We considered for the variances and the sample sizes the sets of values indicated above.
We may see that when the sample sizes are unbalanced, as in Table 2, the exact distribution gives larger values of power than Welch's $t$-test, namely when the smaller sample is associated with the larger variance, while the two tests tend to give similar results when the two sample sizes are homogeneous, still with the exact distribution giving larger power values when the variances are unbalanced. Namely, for one of the unbalanced cases considered, the exact distribution shows a gain of over 30% in power in relation to Welch's $t$-test.
  5.2. Even $n_1$ and $n_2$
Table 3 and Table 4 provide numerical results for the type I error rates and powers of Welch's $t$-test and of the near-exact distributions for the cases where both sample sizes are even. In these tables, the near-exact distributions and Welch's test are, respectively, denoted by NE and W.
Similar to the case of odd sample sizes, we see that the differences in power between the near-exact distributions and Welch's $t$-test are quite small when the sample sizes are equal, as shown in Table 3, although with a tendency for the near-exact distributions to exhibit larger power when the variances are unbalanced, while in the unbalanced case shown in Table 4, the power displayed by the near-exact distributions is considerably larger, particularly when the variances are also unbalanced. Namely, for one of the unbalanced cases considered, the near-exact distribution shows a gain of over 20% in power.
  5.3. $n_1$ Is Even and $n_2$ Is Odd
Table 5 displays the case of similar sample sizes, and it shows that, once again, the near-exact distribution and Welch's test are fairly similar to each other in terms of type I error rates and values of power. On the other hand, Table 6 presents the results for the unbalanced sample size case, and it shows that both the near-exact distribution and Welch's test control the type I error rates well, but that the near-exact distribution can attain larger power than Welch's test. In particular, for one of the unbalanced cases considered, the near-exact distribution shows a gain of almost 30% in power in relation to Welch's $t$-test. The near-exact distributions and Welch's test are, respectively, denoted by NE and W in these tables.
   5.4. Brief Study of Power Evolution for Increasing Sample Sizes
With the aim of showing the evolution of the values of power for increasing sample sizes, plots of power curves for increasing values of the sample sizes are shown in Figure 1, Figure 2 and Figure 3, for given values of the variances and of the mean difference. As expected, the power curves increase with increasing values of the sample sizes. Figure 1, Figure 2 and Figure 3 represent, respectively, the cases of both sample sizes odd, both sample sizes even, and one sample size even and the other odd. It is also clear from these figures that, also as expected, although the use of the exact or near-exact distributions leads to an increase in the values of power, the power values from Welch's test asymptotically approach those obtained when using the exact or near-exact distributions as the sample sizes increase. The parameter values used are presented in each figure.
  6. Conclusions
Over the years since the Behrens–Fisher problem was first introduced, many different solutions have been presented for it. In this paper, we propose another approach to the Behrens–Fisher problem, based on the exact and near-exact distributions of its statistic, which are in turn based on GIG (generalized integer gamma) distributions. Overall, the differences between the sizes of the tests based on the near-exact distributions and of Welch's t-test are negligible, while the use of the exact or near-exact distributions provides powers that are larger than those provided by Welch's t-test, mainly in the cases where the sample sizes and/or the variances are unbalanced, and namely when the smaller sample sizes are associated with the larger variances. The results thus show that, mainly for the cases of unbalanced sample sizes and/or unbalanced variances, the use of Welch's t-test leads to some loss in power when compared with the use of the exact or near-exact distributions developed here, thus advising the use of these latter ones.
The computation of the exact or near-exact distributions poses no difficulties, even for large sample sizes, with computation times remaining in the hundredths of a second for sample sizes on the order of a few hundred.
In order to decide which distribution to use, the user may want to test the hypothesis $\lambda_1=\lambda_2$. We may note that, given the definition of $\lambda_1$ and $\lambda_2$ in (3), testing that hypothesis is indeed equivalent to testing the hypothesis
$$H_0^{*}:\ \frac{\sigma_1^2}{\sigma_2^2}=\frac{n_1(n_1-1)}{n_2(n_2-1)},$$
which may be tested in much the same way as a test of equality of two variances based on two independent samples.
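A hypothesis of the form $\sigma_1^2/\sigma_2^2=c$, for a known constant $c$, can be tested by rescaling the usual variance-ratio $F$-test. A sketch (ours; the function name and the choice of a two-sided $p$-value are illustrative):

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x1, x2, c=1.0):
    """Two-sided F-test of H0: sigma1^2 / sigma2^2 = c for two independent
    normal samples; under H0, (S1^2 / S2^2) / c ~ F(n1 - 1, n2 - 1)."""
    n1, n2 = len(x1), len(x2)
    f = (np.var(x1, ddof=1) / np.var(x2, ddof=1)) / c
    p_one = stats.f.sf(f, n1 - 1, n2 - 1)
    return f, 2 * min(p_one, 1 - p_one)

rng = np.random.default_rng(4)
n1, n2 = 9, 12
c = n1 * (n1 - 1) / (n2 * (n2 - 1))              # ratio implied by equal rates
x1 = rng.normal(0.0, np.sqrt(2.0 * c), size=n1)  # so sigma1^2 / sigma2^2 = c
x2 = rng.normal(0.0, np.sqrt(2.0), size=n2)
f_stat, p = variance_ratio_test(x1, x2, c)
assert 0.0 < p <= 1.0
```

A large $p$-value here would support using the simpler equal-rate (Student's $t$) form of the distribution; a small one would point to the GIG-based exact or near-exact forms.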