An Objective and Robust Bayes Factor for Hypothesis Tests of One-Sample and Two-Population Means

It has been over 100 years since the discovery of one of the most fundamental statistical tests: Student's t test. However, reliable conventional and objective Bayesian procedures are still essential for routine practice. In this work, we proposed an objective and robust Bayesian approach to hypothesis testing for one-sample and two-sample mean comparisons when the assumption of equal variances holds. The newly proposed Bayes factors are based on the intrinsic and Berger robust priors. Additionally, we introduced a corrected version of the Bayesian Information Criterion (BIC), denoted BIC-TESS, which is based on the effective sample size (TESS), for comparing two population means. We studied our developed Bayes factors in several simulation experiments for hypothesis testing. Our methodologies consistently provided strong evidence in favor of the null hypothesis in the case of equal means and variances. Finally, we applied the methodology to the original Gosset sleep data, concluding that there is strong evidence that the average sleep hours differed between the two treatments. These methodologies exhibit finite sample consistency and demonstrate consistent qualitative behavior, proving reasonably close to each other in practice, particularly for moderate to large sample sizes.


Introduction
One of the fundamental topics in statistics revolves around the one-sample population mean and the comparison of two-sample means. The go-to method for addressing this question is typically Student's t test [1]. Conducting a hypothesis test for the population mean holds significant importance in the scientific research community and in various fields where making inferences about population parameters is pivotal. Frequentists heavily rely on p-values to determine whether or not to reject the null hypothesis [2]. However, p-values, along with significance testing based on fixed α-levels, tend to exaggerate the evidence against null hypotheses for large sample sizes and lack the operational meaning of a probability [3-5]. While the Bayesian approach has gained attention in hypothesis testing and model selection [6,7], its application to essential statistics topics remains somewhat limited [8]. This raises the question: why is a Bayesian Student's t test necessary? We argue for two main reasons. First, Bayesian tests provide evidence for a hypothesis of interest that naturally adapts to any sample size. Second, the Bayes factor can easily be converted to posterior model probabilities that quantify support for one of the hypotheses under test. Another crucial consideration is that scientific questions often have a Bayesian nature, such as, "What is the probability that these two treatments differ?".
Bayesian hypothesis testing and model selection have been undergoing extensive development because of recent advances in the creation of "default" Bayes factors that can be used in the absence of substantial subjective prior information [5,9-11]. The study in [12] proposed some desiderata for the choice of the prior: (i) it is located around zero, (ii) it is scaled by the parameter σ, (iii) it is symmetric, and (iv) it should have no moments. Bayes factors are attractive in terms of their interpretation as odds, and the resulting posterior model probability is readily understandable by general users of statistics [13]. Methods based on conjugate priors for Student's t test have a long history. Perhaps the most transparent approach for the two-sample Student's t test is in [14]. However, natural conjugate priors do not lead to robust procedures; they have tails that are typically of the same form as the likelihood function and hence remain influential when the likelihood function is concentrated in the prior tails, which can lead to inconsistency [15]. This conjugate Bayes factor for comparing two samples based on the Student's t statistic is finite sample-inconsistent, i.e., it does not go to zero when the estimates go to infinity.
In this work, we proposed an objective and robust Bayes factor for testing hypotheses about one-sample and two-sample means based on the t-statistic. Our Bayes factors can be easily implemented, allowing researchers to determine support for a particular hypothesis. This manuscript proceeds as follows. In Section 2, we derive these objective and robust Bayes factors for the one-sample and two-sample scenarios and demonstrate their finite sample consistency. In Section 3, we compare our Bayes factors with existing methodologies under several experimental frameworks. In Section 4, we apply our methodologies to real-life datasets such as the original Gosset sleep data and to comparisons of changes in blood pressure in rats according to their assigned diet. We conclude this work with a discussion in Section 5.

Methodology
Statistical inference for the mean (one or two samples) plays an important role in statistics and in several fields. For instance, it is very common to state tests in terms of the average or population mean. Suppose that we are comparing two hypotheses, H0: θ ∈ Θ0 vs. H1: θ ∈ Θ1. Suppose that prior densities πi, i = 0, 1, are available for each hypothesis, and let fi(x|θi) be the probability density function under the ith hypothesis. Define the marginal or predictive densities mi(x) = ∫ fi(x|θi) πi(θi) dθi for each hypothesis of interest (or model), which are sometimes called the evidence of the ith hypothesis or model. The Bayes factor for comparing H0 to H1 is then given by B01 = m0(x)/m1(x). The interpretation of the Bayes factor proceeds as follows: if B01 > 1, then the evidence favors the null hypothesis, while B01 < 1 gives evidence in favor of the alternative hypothesis. If prior probabilities P(Hi), i = 0, 1, of the hypotheses are available, then one can compute their posterior probabilities from the Bayes factor. The posterior probability of H0, given the data x, is P(H0|x) = m0(x)P(H0) / [m0(x)P(H0) + m1(x)P(H1)] = 1 / (1 + [P(H1)/P(H0)] B10), where B10 = 1/B01.
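The conversion above from a Bayes factor to a posterior probability is direct; a minimal sketch (the function name is ours, not the paper's):

```python
def posterior_prob_h0(b01, prior_h0=0.5):
    """P(H0 | x) computed from the Bayes factor B01 = m0(x)/m1(x)."""
    prior_odds_10 = (1.0 - prior_h0) / prior_h0   # P(H1)/P(H0)
    b10 = 1.0 / b01                               # B10 = 1/B01
    return 1.0 / (1.0 + prior_odds_10 * b10)
```

With equal prior probabilities, for example, B01 = 3 yields P(H0|x) = 0.75.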

One-Sample Mean Hypothesis Testing
A one-sample hypothesis test for the population mean is one of the most fundamental statistics topics, both as an introductory topic and for addressing research questions. Suppose we have a random sample from a normal distribution, i.e., X1, ..., Xn ∼ N(µ, σ²), with an unknown standard deviation σ > 0. We are interested in testing for the population mean µ:
H0: µ = µ0 vs. H1: µ ≠ µ0. (4)

A Bayesian approach to testing this hypothesis is based on the theory of intrinsic priors [16,17]. The authors begin with the noninformative priors for the null and alternative hypotheses, π^N_0(σ) = 1/σ and π^N_1(µ, σ) = 1/σ². After some calculations, the authors showed that the conditional proper intrinsic prior under the alternative hypothesis H1 can be expressed as π^I(µ, σ) = π(µ|σ)π(σ), which defines the resulting intrinsic prior under H1. The approximate Bayes factor based on the intrinsic prior (B^IP_01) for a one-sample population mean follows, where µ̂ and σ̂ are the maximum likelihood estimators (MLEs) under H1. The resulting Bayes factor for the hypothesis in (4) is a function of t = (x̄ − µ0)/(s/√n), where x̄ is the sample mean and s is the sample standard deviation. Large values of B01 give evidence in favor of the null hypothesis. We can also transform these Bayes factors to the natural logarithm scale (2 log B01), on which values above 3 give some evidence in favor of the null hypothesis, while values above 10 give stronger evidence in favor of the null hypothesis; see [13].
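The t-statistic entering this Bayes factor and the 2 log B01 evidence scale cited from [13] can be sketched as follows (helper names are ours, and the thresholds follow the reading given above):

```python
import math

def one_sample_t(x, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return (xbar - mu0) / (s / math.sqrt(n))

def evidence_label(two_log_b01):
    """Rough reading of 2*log(B01) on the scale discussed in the text."""
    if two_log_b01 > 10:
        return "strong evidence for H0"
    if two_log_b01 > 3:
        return "some evidence for H0"
    return "little or no evidence for H0"

t = one_sample_t([4.9, 5.1, 5.3, 4.7, 5.0, 5.2], mu0=5.0)  # t ~= 0.378
```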
This Bayes factor satisfies the finite sample consistency principle. Suppose that we are comparing the alternative hypothesis with the null hypothesis H0: β = 0. As the least squares estimate β̂ (and the noncentrality parameter) goes to infinity, so that one becomes sure that H0 is wrong, the Bayes factor of H0 to H1 goes to zero.

Theorem 1. For a fixed sample size n ≥ 2, the Bayes factor based on the intrinsic prior (B^IP_01) for the one-sample mean µ is finite sample-consistent.
Proof. For a fixed sample size n ≥ 2, letting t² → ∞, or equivalently |t| → ∞, the Bayes factor based on the intrinsic prior goes to 0, i.e., lim_{|t|→∞} B^IP_01 → 0, or equivalently lim_{|t|→∞} P(H0|x) → 0. Hence, the Bayes factor based on the intrinsic prior B^IP_01 is finite sample-consistent.
Robust Bayes Factor for the One-Sample Test for the Mean

Even though the Bayes factor constructed using the intrinsic prior is finite sample-consistent, it is only an approximation. Evidence has been found that priors with flatter tails than those of the likelihood function tend to be fairly robust [18,19]. The robust prior proposed here was developed by [20]; we call it the Berger robust prior. This prior is hierarchical; by such a choice, we can obtain robustness while keeping the calculations relatively simple, and the computations are exact. The robust prior, denoted π^R(ξ), is defined hierarchically through a mixing parameter λ with density π(λ) = (1/2)λ^{−1/2} on (0, 1), where p is the rank of the design matrix. Recall that we are interested in testing (4); under the null hypothesis, the likelihood takes the usual normal form. The noninformative prior under the null hypothesis is π^N(σ) = 1/σ, and the marginal density m(x) under the null hypothesis follows, where SS²_0 is the sum of squares under H0 and Γ(·) is the gamma function. Similarly, we can obtain the likelihood under the alternative hypothesis H1. Here, x̄ = n^{−1} ∑ xi is the sample mean and SS²_1 is the sum of squares under the alternative. The Berger robust prior is considered under the alternative hypothesis, π_1(µ, σ) = π^R(µ|σ)/σ, which yields the marginal density under H1. Computing the ratio of the marginals in (6) and (7), the Bayes factor based on the Berger robust prior is obtained. Here, t = (x̄ − µ0)/(s/√n) is the usual t-statistic with n − 1 degrees of freedom, where x̄ and s are the sample mean and sample standard deviation, respectively. The Bayes factor based on the Berger robust prior B^R_01 is finite sample-consistent.
Unlike the Bayes factor derived from the intrinsic prior, this robust Bayes factor has a closed form. This concludes the derivations of the Bayesian approaches based on the intrinsic and robust priors, both of which are finite sample-consistent. We now extend the objective and robust Bayesian approach to the two-sample scenario.

Two-Sample Mean Hypothesis Test
Another fundamental research question of interest is whether or not two groups are similar. This problem is usually addressed with the two-sample Student's t test to compare whether the groups differ in their means. Let X1, ..., Xn1 ∼ N(µ1, σ²) and let Y1, ..., Yn2 ∼ N(µ2, σ²), independent of X, with σ > 0 unknown. Note that we assume these two samples arise from normal distributions with possibly different means but equal variances. It is of common interest to determine whether these two samples are equal, or at least whether they do not differ in location. To answer this, a hypothesis test for comparing two-sample means is performed, i.e., H0: µ1 = µ2 vs. H1: µ1 ≠ µ2. To answer this question, Ref. [14] proposed the conjugate Bayes factor, which is based on a conjugate prior on the standardized mean difference. Centering the prior assessment on the null hypothesis, i.e., setting λ = 0, is usually a very reasonable choice; the conjugate Bayes factor then simplifies to an expression involving the factor 1 + n_δ σ²_δ. However, this Bayes factor is not finite sample-consistent: as |t| → ∞, or t² → ∞, B^C_01 does not go to zero, or equivalently, the posterior probability of the null hypothesis P(H0|data) does not go to zero. Here, ν = n1 + n2 − 2 is the degrees of freedom.
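A numerical sketch of this inconsistency, written as the density-ratio form of a conjugate Bayes factor with the prior centered on H0 (λ = 0). The parameterization below, with n_δ taken as (1/n1 + 1/n2)^{−1}, is our assumption for illustration; conventions for n_δ differ across papers:

```python
import math
from scipy.stats import t as tdist

def conjugate_bf01(t_stat, n1, n2, sigma_delta_sq=1.0):
    """Conjugate B01 as a ratio of a central t density to a scaled t density
    (prior on the standardized mean difference centered at zero)."""
    nu = n1 + n2 - 2
    n_delta = 1.0 / (1.0 / n1 + 1.0 / n2)          # assumed convention
    scale = math.sqrt(1.0 + n_delta * sigma_delta_sq)
    return tdist.pdf(t_stat, nu) / tdist.pdf(t_stat, nu, scale=scale)

# As |t| grows, B01 decreases toward a *nonzero* limit instead of 0:
vals = [conjugate_bf01(t, 10, 10) for t in (5.0, 50.0, 500.0)]
```

In this parameterization the limit as |t| → ∞ is scale^{−ν} > 0, which is exactly the finite sample inconsistency described above.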

Intrinsic Bayes Factor for Two-Sample Means
To address the limitation of the conjugate prior, our first approach is based on the theory of intrinsic priors introduced in [16,17]. Similar to the one-sample case, the method is to derive a prior that yields, for moderate to large sample sizes, results equivalent to an established method for scaling the intrinsic Bayes factors. The resulting set of equations typically has solutions, at least in the nested hypothesis scenario, which is our case, and has been successfully applied in conjunction with the intrinsic Bayes factor method. Consider the hypothesis test for the comparison of two population means with unknown and equal variance σ² > 0, H0: µ1 = µ2 vs. H1: µ1 ≠ µ2. This transformation leads us to a design matrix X based on the training samples, built from 1_{k×1} and 0_{k×1}, vectors of 1's and 0's of length k. The noncentrality parameter can then be computed, which becomes, in the comparison of two means, λ(l) = (µ1 − µ2)²/σ². Following the general theory of the intrinsic Bayes factor for linear models [16,17], an intrinsic prior is obtained. Substituting λ(l) using the noncentrality parameter of (9), and then transforming to the conditional prior of the parameter, the simple test is H0: δ1 = 0 vs. H1: δ1 ≠ 0. The conditional intrinsic prior for this hypothesis test is proper, i.e., ∫ π^I_1(δ1|σ) dδ1 = 1, and it satisfies the condition discussed by [12]. The intrinsic prior under the alternative hypothesis H1 is of the form π^I_1(δ1, δ0, σ) = π^I_1(δ1|δ0, σ) · π^I_1(δ0, σ), where π^I_1(δ0, σ) = 1/σ. With this framework in place, we can derive the intrinsic Bayes factor for comparing two-sample means. We first obtain the marginal density under the null hypothesis, m0(x, y). Consider the joint likelihood function of the two samples under the null hypothesis H0. Here, x̄ is the sample mean of the first group, ȳ is the sample mean of the second group, and SS²_0 is the sum of squares under the null hypothesis. The marginal density using the noninformative prior π^N(δ0, σ) = 1/σ is then computed, where t = (x̄ − ȳ)/(S_p √n_δ), n = n1 + n2, and S_p is the pooled standard deviation. Similarly, the joint likelihood function of the two samples under the alternative hypothesis H1 is obtained, and the corresponding marginal density uses the intrinsic prior π^I(δ0, δ1, σ) defined in (11). As in the one-sample framework, this Bayes factor can be approximated using the noninformative prior π^N(δ0, δ1, σ) = 1/σ² in an asymptotic result. Using (12) and (14), we can compute B^N_01, where t² = (x̄ − ȳ)²/(S²_p n_δ), S²_p = (S²_x + S²_y)/(n − 2) is the pooled estimate of the variance, and n_δ = 1/n1 + 1/n2. Let δ̂1 and σ̂² be the corresponding maximum likelihood estimators (MLEs): δ̂1 = (x̄ − ȳ)/2 and σ̂² = (S²_x + S²_y)/n = [(n − 2)/n] S²_p, where S²_p is the pooled variance estimate and n = n1 + n2. We can express δ̂²_1/σ̂² = n_δ n t²/(n − 2) in terms of the t-statistic. Then, the approximate intrinsic Bayes factor B^IP_01 can be obtained. Here, t = (x̄ − ȳ)/(S_p √n_δ) and coth(·) is the hyperbolic cotangent function, coth(x) = (e^{2x} + 1)/(e^{2x} − 1).
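The ingredients of these expressions (the pooled variance, n_δ, and the two-sample t-statistic) can be coded directly; a minimal sketch with our own function name:

```python
import math

def pooled_two_sample_t(x, y):
    """Two-sample t in the paper's notation:
    t = (xbar - ybar) / (S_p * sqrt(n_delta)), with
    n_delta = 1/n1 + 1/n2 and S_p^2 = (SS_x + SS_y)/(n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    xbar, ybar = sum(x) / n1, sum(y) / n2
    ssx = sum((v - xbar) ** 2 for v in x)
    ssy = sum((v - ybar) ** 2 for v in y)
    sp = math.sqrt((ssx + ssy) / (n1 + n2 - 2))
    n_delta = 1.0 / n1 + 1.0 / n2
    return (xbar - ybar) / (sp * math.sqrt(n_delta))
```

This coincides with the classical equal-variance t-statistic (e.g., scipy.stats.ttest_ind with equal_var=True).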
Theorem 3.For a fixed sample size n ≥ 4, the Bayes factor based on the intrinsic prior (B IP 01 ) for the comparison of two population means is finite sample-consistent.
Proof. For a fixed sample size n ≥ 4, letting t² → ∞, or equivalently |t| → ∞, the Bayes factor based on the intrinsic prior goes to 0, i.e., lim_{|t|→∞} B^IP_01 → 0, or equivalently lim_{|t|→∞} P(H0|x, y) → 0. Hence, the Bayes factor based on the intrinsic prior B^IP_01 is finite sample-consistent.

Robust Bayes Factor for the Comparison of Two-Sample Means
Consider observations of a random sample from group 1 and group 2 of sizes n1 and n2, respectively. We assume these groups have a common variance (σ²_1 = σ²_2 = σ²). The model of interest in this case is a linear model in the group effects. Further, consider the constraint α1 + α2 = 0; the design matrix X can then be written accordingly. This leads us to consider the hypotheses H0: α1 = 0 vs. H1: α1 ≠ 0, with reference priors placed under the null hypothesis H0 and the alternative hypothesis H1. First, we find the marginal density under H0. Consider the joint likelihood function under the null hypothesis H0. Here, y1 = (y11, ..., y1n1) and y2 = (y21, ..., y2n2), ȳi. is the sample mean of the ith group, and SS²_0 is the sum of squares under the null hypothesis. The marginal density under the null hypothesis m0(y1, y2) follows, where t² = (ȳ1. − ȳ2.)²/(S²_p n_δ). Here, S²_p = (S²_1 + S²_2)/(n − 2) is the pooled estimate of the variance and n_δ = 1/n1 + 1/n2. For the alternative hypothesis H1, we first consider the joint likelihood of group 1 and group 2, from which the marginal density m1(y1, y2) is obtained. Here, α̂ = (ȳ1. − ȳ2.)/2. The robust Bayes factor is obtained by computing the ratio of the marginal densities in (17) and (18). To finish the calculation of the robust Bayes factor B^R_01, the term b + d has to be defined. Therefore, we propose using the effective sample size (TESS) n^e_o of [21]. The first factor is d = 0.25 n_δ · σ̂², and the second factor b follows.

Theorem 4. For a fixed sample size n ≥ 4, the Bayes factor based on the robust prior (B^R_01) for the comparison of two population means is finite sample-consistent.
Proof. For a fixed sample size n ≥ 4, letting t² → ∞, or equivalently |t| → ∞, the Bayes factor based on the Berger robust prior goes to 0, i.e., lim_{|t|→∞} B^R_01 → 0, or equivalently lim_{|t|→∞} P(H0|y1, y2) → 0. Hence, the Bayes factor based on the Berger robust prior B^R_01 is finite sample-consistent.
The Berger robust prior yields an exact expression for the correction of the main term for group i.

Making a suitable change of variables, the conditional intrinsic prior of Equation (10) is exactly recovered. This establishes a correspondence between the intrinsic and Berger robust priors for the Student's t test.

The Effective Sample Size Bayesian Information Criterion (BIC-TESS)
Our final Bayes factor for comparing two-sample means is a variation of the Bayesian Information Criterion (BIC) of [22]. The BIC is a popular method for determining the best model in a set of competing models. However, in comparing two-sample means, the BIC does not consider the information available in each group but rather only the entire sample. Here, we propose replacing the sample size n with TESS, forming what may be called the corrected BIC, or BIC-TESS. The BIC with TESS can be written in terms of the effective sample size n^e_o defined by [21]; the derivation of TESS is in Appendix A.1. In a balanced situation, where n1 = n2, the BIC-TESS is similar to the regular BIC. In an unbalanced situation, the BIC-TESS is stabilized.

Theorem 5. For a fixed sample size n ≥ 3, the corrected BIC (B^TESS_01) for the two-sample mean comparison is finite sample-consistent.
Proof. For a fixed sample size n ≥ 3, letting t² → ∞, or equivalently |t| → ∞, the Bayesian Information Criterion constructed with TESS goes to 0, i.e., lim_{|t|→∞} B^TESS_01 → 0.

In Figure 1, we compare the asymptotic behavior of the Bayes factors and the posterior probability of the null hypothesis when the samples are balanced (n1 ≈ n2) and unbalanced (n1 << n2). The Bayes factor based on the Berger robust prior (dark red) is very close in its range of evidence to the intrinsic Bayes factor (green) and the BIC-TESS (light orange). The robust Bayes factor, the intrinsic Bayes factor, and the BIC-TESS are relatively close when the situation is balanced. In the unbalanced scenario, the robust Bayes factor and the BIC-TESS remain relatively close, while the intrinsic Bayes factor slightly increases. The conjugate Bayes factor (blue) is represented for different values of the prior variance σ²_δ; a darker color means a higher value of the prior variance. Recall that the conjugate Bayes factor is not finite sample-consistent, and its behavior depends on the choice of σ²_δ.
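To make the BIC-TESS idea concrete, here is a sketch of the familiar Schwarz/BIC approximation to 2 log B01 for nested normal models differing by one parameter, with an optional effective sample size substituted into the penalty term. The substitution mimics BIC-TESS; the paper's exact expression may differ in detail, so treat this as an illustration under that assumption:

```python
import math

def two_log_bf01_bic(t_stat, n1, n2, n_eff=None):
    """Schwarz approximation: 2*log(B01) ~= log(n) - n*log(1 + t^2/nu).
    If n_eff is given, it replaces n inside the log-penalty, mimicking
    the BIC-TESS idea of penalizing by an effective sample size."""
    nu = n1 + n2 - 2
    n = n1 + n2
    penalty_n = n_eff if n_eff is not None else n
    # -2 * log(likelihood ratio) for the two-sample mean test:
    lr = n * math.log(1.0 + t_stat ** 2 / nu)
    return math.log(penalty_n) - lr
```

As |t| → ∞ the expression goes to −∞, i.e., B01 → 0, in agreement with the finite sample consistency of Theorem 5.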

Simulation Experiments for the One- and Two-Sample Mean Comparisons
We generated 500 datasets from random samples taken from a normal distribution, a Student's t distribution with one degree of freedom, and a gamma distribution. For each of these distributions, the mean and standard deviation of the first group were set to µ1 = 5 and σ1 = 3. The second group was created with a combination of several parameters: the mean values were µ2 ∈ {µ1, 1.5µ1, 2µ1}, and the standard deviations were σ2 ∈ {σ1, 2σ1, 3σ1}. In the case of the Student's t distribution, both groups were simulated with ν = 1 degree of freedom. The simulated gamma samples were obtained using the method of moments, with shape parameter αi = µ²i/σ²i and scale parameter βi = σ²i/µi, for i = 1, 2. We compared our methodologies with several Bayes factors used for comparing two population means, displayed in Table 1: B^S_01 is the classical Bayesian Information Criterion (BIC) of [22]; B^ZS_01 is based on the Zellner and Siow prior [23]; the two-sample Student's t Bayes factor of [14] is based on the conjugate prior with σ²_δ = 1/3; the arithmetic Bayes factor B^EIA_10 is from [24]; and B^J_01 is the Bayes factor of [12] for the comparison of two-sample means with equal variances. One set of these Bayes factors (the Schwarz BIC and the Zellner and Siow Bayes factor) depends only on the sample size n. The other set, based on the conjugate, intrinsic, Berger (here called robust), and modified Jeffreys priors, depends on the term n_δ = 1/n1 + 1/n2. In our experiments, we did not consider the constant 2/5 for B^J_01, since we believe the condition that the samples arise from the same distribution is satisfied; for more details about the use of the constant 2/5, see [12]. We also studied these Bayes factors in unbalanced situations. Heavily unbalanced samples are interesting not only from a theoretical point of view but also because they are often observed in practice in observational studies; the results are displayed in Figures A1-A3.
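The gamma samples described above can be generated directly from the stated method-of-moments parameterization; a minimal sketch (the function name is ours):

```python
import numpy as np

def gamma_mom_sample(mu, sigma, size, rng):
    """Gamma draws with shape = mu^2/sigma^2 and scale = sigma^2/mu,
    so that the population mean is mu and the standard deviation is sigma."""
    shape = mu ** 2 / sigma ** 2
    scale = sigma ** 2 / mu
    return rng.gamma(shape, scale, size)

rng = np.random.default_rng(1)
x = gamma_mom_sample(5.0, 3.0, 200_000, rng)   # sample mean ~= 5, sd ~= 3
```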
Performance was compared using twice the natural logarithm of the Bayes factors (2 log(B01)) for comparing the null hypothesis (µ1 = µ2) against the alternative (µ1 ≠ µ2). This transformation puts the interpretation on the same scale as the deviance and likelihood ratio test statistics, as discussed in [13].
Table 1. Bayes factors for the one- and two-sample mean comparisons based on the Student's t test. The third column applies only to the two-sample comparison and gives the limit when t² → 0 and n2 → ∞.
Figure 2 displays the results for the evidence based on the normal distributions when testing whether the two-sample means are equal (µ1 = µ2). The red line represents the cut-off of 10 (strong evidence), the yellow line 6 (positive evidence), and the green line 2 (weak evidence). In the case when the means are actually equal, the Bayes factors based on the intrinsic prior and the robust prior show strong evidence in favor of the null hypothesis. The average 2 log(B01) based on the intrinsic prior shows strong evidence in favor of the null hypothesis (11.3 ± 1.59), while the Bayes factor based on the robust prior gives strong evidence in favor of the true case (10.61 ± 1.59), all above the red line. The BIC-TESS also strongly supports the true case (9.9 ± 1.61). The other Bayes factors provide positive evidence for the true case, with averages ranging from 2.54 to 3.93. Even when the means were equal and the samples had larger variance (σ2 = 3σ1), our objective and robust Bayes factors provided strong evidence in favor of the true case, with averages above 10. The posterior probabilities based on the intrinsic prior and the robust prior were above 90%, showing either strong or very strong evidence in favor of the null hypothesis when the means were equal; see Table A2 for a detailed comparison.
For the Student's t random samples, Figure 3 shows the results when testing whether the two-sample means are equal (µ1 = µ2). When the means are equal, the Bayes factors based on the intrinsic prior and the robust prior show strong evidence in favor of the null hypothesis. The averages of 2 log(B01) based on the intrinsic prior, the robust prior, and the BIC-TESS all show strong evidence in favor of the null hypothesis (averages above 10), with values of 11.41, 10.72, and 10.02; the dispersion was relatively low, ranging from 1.01 to 1.03. The competing Bayes factors provide slightly positive evidence for the true case, with averages ranging from 2.65 to 4.04. Interestingly, in the case µ2 = 2µ1, our Bayes factors gave positive evidence above the yellow line but were very variable; the sample standard deviation ranged from 4.98 to 5.03. Finally, in the case of the gamma samples, our Bayes factors gave strong evidence only when both the means and the variances were equal. Departing from either of these conditions gave strong evidence that the means were unequal; see Figure 4. For more details on the numerical performance in the simulations, see Table A1.

Application in Real Dataset
In this section, we apply the proposed one- and two-sample Bayes factors based on the intrinsic and Berger robust priors, along with the BIC-TESS, all based on the Student's t statistic.

Gosset Original Dataset
We first consider the century-old original Student's t sleep data from [1,25], which still raises interesting discussion; see [26,27]. In this study, the number of hours of sleep under each drug (Dextro and Laevo) was recorded for each patient. The difference in hours was used to determine effectiveness, measuring the average number of hours of sleep gained by using each drug. The authors concluded that, in usual doses, Laevo was soporific, but Dextro was not. This analysis is treated as a paired sample, since it compares the sleep hours between treatments for the same patients; paired samples reduce to the one-sample framework. The hypothesis of interest is H0: µd = 0 versus H1: µd ≠ 0; the test statistic is t = −4.06 with a p-value of 0.002. At the 5% significance level, we conclude that there is a difference in the average sleep hours between Laevo and Dextro. However, the original Gosset dataset has not previously been addressed using an objective and robust Bayesian approach. The value of the test statistic is the same as before, with n = 10. We computed 2 log(B10) for the intrinsic and robust Bayes factors, along with the associated posterior probabilities P(H1|data). Both 2 log(B^IP_10) = 5.858 and 2 log(B^R_10) = 5.988 are positive, indicating strong evidence that the average sleep hours are different. Further, the posterior probability based on the intrinsic prior is 0.949, and the posterior probability based on the Berger robust prior is 0.952. Both posterior probabilities are above 90%, suggesting strong evidence in favor of an average sleep difference.
This dataset is considered a paired sample, since the recorded numbers of sleep hours belong to the same participant. However, the treatments, Dextro and Laevo, might instead be considered independent, in which case a two-sample framework arises: we are interested in comparing the sleep hours when receiving Laevo versus when receiving Dextro. Assuming equal variances between Laevo and Dextro, the hypothesis of interest is H0: µL = µD vs. H1: µL ≠ µD, where µL is the average sleep hours when receiving Laevo and µD is the average sleep hours when receiving Dextro. The two-sample test statistic is t = −1.86 with a p-value of 0.079. At the 5% significance level, we conclude that there is no difference in the average sleep hours when using Laevo versus Dextro. In our Bayesian approach, 2 log(B^IP_10) = −4.33 and 2 log(B^R_10) = −3.57, indicating weak evidence that the average number of sleep hours differs between Laevo and Dextro. Both posterior probabilities are below 15%, suggesting weak evidence that the average number of sleep hours differs when using Laevo and Dextro.
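Both frequentist statistics quoted above can be reproduced from the classical sleep data (the values below are as distributed in R's built-in sleep dataset; the assignment of drug labels to the two columns follows the usual convention and is our assumption here):

```python
from scipy.stats import ttest_rel, ttest_ind

# Extra hours of sleep for the same ten patients under each drug.
dextro = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
laevo = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

paired = ttest_rel(dextro, laevo)        # t ~= -4.06, p ~= 0.003
two_sample = ttest_ind(dextro, laevo)    # t ~= -1.86, p ~= 0.079
```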

Induced Hypertension in Rats According to Diet
Our second application consists of data from [28], which were analyzed in a Bayesian framework using intrinsic priors by [24]. In this study, the researchers were interested in how intermittent feeding affected the blood pressure of rats. The treatment group consisted of eight rats fed intermittently for several weeks, after which the rats' blood pressure measurements were taken. The blood pressure measurements of a second group of seven rats, fed the usual way, defined the control group. The hypothesis of interest is that the average blood pressure differs when the rats undergo intermittent fasting compared to their usual diet, i.e., H0: µ1 = µ2 vs. H1: µ1 ≠ µ2. At the 5% significance level, with a p-value of 0.044, one can conclude that there is a difference in the mean blood pressure level according to feeding style.
The study in Ref. [24] computed the expected arithmetic intrinsic Bayes factor, which favors the alternative hypothesis. When the extreme observation in the intermittent group (115) was removed, 2 log(B^IP*_10) = 2.347, suggesting evidence in favor of H1, while the posterior probability P(H1|(x*, y*)) = 0.764 indicates a moderate level of confidence in this conclusion. The Bayes factor constructed with the Berger robust prior exhibits a higher 2 log(B^R*_10) = 3.535, with a posterior probability of P(H1|(x*, y*)) = 0.854, indicating stronger support that the average blood pressure differs by type of fasting. The BIC-TESS presents an even higher 2 log(B^TESS*_10) = 5.041, with a corresponding posterior probability of 0.926, indicating substantial evidence for H1.

Discussion
In this work, we proposed objective and robust Bayes factors for the one-sample and two-sample mean comparisons. These newly proposed Bayes factors are finite sample-consistent. Both the exact and approximate forms of the Bayes factors can be easily implemented using any open-source or commercial software. Another advantage of using Bayes factors is that the posterior probabilities of the hypotheses are easily interpretable. We reanalyzed the original study by [1] and the comparison of blood pressure in rats according to different feeding types. Our objective and robust Bayes factors showed strong evidence that the average number of sleep hours differed between Laevo and Dextro. In the rat application, when removing potential extreme values, we concluded that there is strong evidence that the means differed; however, with the complete dataset, we reported weak evidence that these averages differed according to diet. This might occur because the assumption of equal variances might not hold. Even though the samples might have equal means, departing from the assumption of equal variances can lead to favoring the wrong hypothesis. Although we have made a significant contribution, an aspect that might alleviate this issue is deriving an objective and robust Bayes factor for the Behrens-Fisher problem, i.e., unequal variances in the two groups. Also, the Bayes factor based on the intrinsic prior depends on the maximum likelihood estimate (MLE); perhaps robust estimates could be considered, although a modified test statistic might arise. Another possible extension is to develop an objective Bayes factor for the hypothesis of several equal means; this corresponds to an analysis of variance (ANOVA) in the frequentist approach.
Appendix A.1. Derivation of TESS

Here, n = n1 + n2. Let X1 be the second column of the design matrix, and let In×n be an identity matrix of size n. It follows from the definition of the effective sample size for the original α that the final expression of the effective sample size is obtained. The unit information is then defined, and the factors b and d follow. This last calculation defines the factors b and d for the robust Bayesian Student's t test, B^R_01.

Appendix A.2. Simulation Experiments
In the two-sample framework, we generated 500 datasets from random samples from a normal distribution with parameters µ1 = 5 and σ1 = 3; the second group was created with a combination of several parameters for the location. The mean values were µ2 ∈ {µ1, 1.5µ1, 2µ1} and the standard deviations were σ2 ∈ {σ1, 2σ1, 3σ1}. In the case of the Student's t distribution, ν = 1 degree of freedom was used; the gamma distribution was simulated using the method of moments, with shape parameter αi = µ²i/σ²i and scale parameter βi = σ²i/µi for i = 1, 2. We report twice the natural logarithm of the Bayes factors (2 log(B01)) for comparing the null hypothesis (µ1 = µ2) against the alternative (µ1 ≠ µ2). This transformation puts the interpretation on the same scale as the deviance and likelihood ratio test statistics; see Ref. [13] for a deeper discussion.

Theorem 2. For a fixed sample size n ≥ 3, the Bayes factor based on the Berger robust prior (B^R_01) for the one-sample mean µ is finite sample-consistent.

Proof. For a fixed sample size n ≥ 3, letting t² → ∞, or equivalently |t| → ∞, the Bayes factor based on the Berger robust prior goes to 0, i.e., lim_{|t|→∞} B^R_01 → 0, or equivalently lim_{|t|→∞} P(H0|x) → 0.

Figure 1. Results in terms of 2 log(B01) and the posterior probability for the finite sample consistency.

Figure 2. Evidence on the 2 log(B01) scale when comparing the population means of two samples arising from a normal distribution with several means and variances, with equal sizes n1 = n2 = 50.

For the complete dataset, the expected arithmetic intrinsic Bayes factor is B^EAI_10 = 2.035 with P(H1|(x, y)) = 0.671, providing support that the average blood pressure measurements differ based on diet. Notably, the Bayes factors based on the intrinsic and robust priors yield negative values, 2 log(B^IP_10) = −2.412 and 2 log(B^R_10) = −1.414, respectively, indicating evidence against H1. The corresponding posterior probabilities P(H1|x, y) are 0.23 and 0.33, suggesting weak evidence for the alternative hypothesis that the means are different. The corrected BIC-TESS suggests weaker evidence against H1, with 2 log(B^TESS_10) = −0.3634 and a posterior probability of P(H1|x, y) = 0.455. In contrast, the conjugate Bayes factor gives 2 log(B^C_10) = 1.517 and 2 log(B^EAI_10) = 1.421, indicating very weak evidence in favor of H1; the associated posterior probability is 0.681.

Figure A1. Evidence on the 2 log(B01) scale when comparing the population means of two samples arising from normal distributions with several means and variances, with sizes n1 = 50 and n2 = 500. Intrinsic prior (IP), Berger robust prior (Robust), BIC based on the effective sample size (TESS), conjugate, Jeffreys, Schwarz, Zellner and Siow (ZS), and the expected arithmetic intrinsic prior of [24].