Proceeding Paper

On Two Measure-Theoretic Aspects of the Full Bayesian Significance Test for Precise Bayesian Hypothesis Testing

Department of Mathematics, University of Siegen, 57072 Siegen, Germany
Presented at the 40th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, online, 4–9 July 2021.
Current address: University of Siegen, Department of Mathematics, Walter-Flex-Street 3, 57072 Siegen, Germany.
Phys. Sci. Forum 2021, 3(1), 10; https://doi.org/10.3390/psf2021003010
Published: 17 December 2021

Abstract

The Full Bayesian Significance Test (FBST) has been proposed as a convenient method to replace frequentist p-values for testing a precise hypothesis. Although the FBST enjoys various appealing properties, the purpose of this paper is to investigate two aspects of the FBST which are sometimes observed as measure-theoretic inconsistencies of the procedure and have not been discussed rigorously in the literature. First, the FBST uses the posterior density as a reference for judging the Bayesian statistical evidence against a precise hypothesis. However, under absolutely continuous prior distributions, the posterior density is defined only up to Lebesgue null sets which renders the reference criterion arbitrary. Second, the FBST statistical evidence seems to have no valid prior probability. It is shown that the former aspect can be circumvented by fixing a version of the posterior density before using the FBST, and the latter aspect is based on its measure-theoretic premises. An illustrative example demonstrates the two aspects and their solution. Together, the results in this paper show that both of the two aspects which are sometimes observed as measure-theoretic inconsistencies of the FBST are not tenable. The FBST thus provides a measure-theoretically coherent Bayesian alternative for testing a precise hypothesis.

1. Introduction

Statistical hypothesis testing is an important method in a broad range of sciences [1]. However, the recent problems with the validity of research results have been termed a scientific replication crisis [2,3], at the core of which lie some fundamental flaws in the statistical analysis of data [4]. Various papers have discussed the reproducibility of research, and the inadequate use of null hypothesis significance tests (NHST) is often identified as a major cause of the replication crisis [5]. This holds in particular in the biomedical and cognitive sciences [6,7], where the p-value is the gold standard for quantifying the evidence against a precise null hypothesis.
Bayesian hypothesis testing has become increasingly popular in the biomedical and cognitive sciences due to the above problems [8,9,10]. It is well known that Bayesian data analysis solves some of the problems of NHST by allowing researchers to make use of optional stopping [11,12] and by simplifying the interpretation of censored data [13]. Together, these aspects are a consequence of Bayesian inference being consistent with the likelihood principle [13]. An appealing proposal for a Bayesian test of a precise hypothesis is the Full Bayesian Significance Test (FBST), which has been applied in a wide range of domains [8,14,15,16,17,18]. The FBST advocates the e-value as a Bayesian replacement of the frequentist p-value for quantifying the statistical evidence against a precise hypothesis [19]. The FBST is a fully Bayesian procedure [19], accords with the likelihood principle [15], and enjoys attractive asymptotic properties [20] next to transformation invariance [16]. However, the FBST seems to suffer from two aspects which are studied in detail in this paper. First, the reference criterion in the FBST is only defined up to Lebesgue null sets, which seems to make the evidential threshold arbitrary. Thus, it seems that the FBST statistical evidence, the e-value, lacks a calibration. Second, the statistical evidence in the FBST seems to have no prior probability, which contradicts common Bayesian reasoning. For other criticisms of the FBST see Ly & Wagenmakers [21] and for a more optimistic perspective Kelter [22]. In this paper it is shown that both aspects can be solved by fixing a version of the posterior distribution for statistical inference, and assigning one of two possible interpretations to the prior probability of the statistical evidence in the FBST.
These aspects have not yet been discussed extensively in the literature and present a further justification of the FBST as an attractive replacement of frequentist p-values to remedy the ongoing problems with the replication of scientific results. The plan of the paper is as follows: The next section outlines the theory behind the FBST. After that, the two problematic aspects mentioned above are detailed and illustrated by an example from medical research. The following section elaborates on the problems and provides solutions to them. After that, a conclusion is provided.

2. The Full Bayesian Significance Test

This section outlines the theory behind the FBST. First, the required notation is introduced.

2.1. Notation

In contrast to the frequentist approach, in the Bayesian approach the parameter $\theta \in \Theta$ is modelled as a random variable, and the data $y \in \mathcal{Y}$ are fixed. Denote by $\Theta$ the parameter space and by $\mathcal{G}$ the $\sigma$-algebra on $\Theta$, and let $P_\vartheta$ be the prior probability measure on $\mathcal{G}$, leading to the triple $(\Theta, \mathcal{G}, P_\vartheta)$. The observed sample is modelled by the random variable $Y: \Omega \to \mathcal{Y}$ which takes values in the measurable space $\mathcal{Y}$, where $\mathcal{Y}$ is endowed with a $\sigma$-algebra $\mathcal{B}$. The uncertainty in the data generating mechanism producing a sample $Y(\omega) = y$ for $\omega \in \Omega$ is modelled via the assumption of a statistical model $\mathcal{P} := \{P_\theta : \theta \in \Theta\}$ which is dominated by a $\sigma$-finite measure $\nu$. In practice, $\nu$ often is the Lebesgue measure $\lambda$. The latter requirement guarantees the existence of Radon-Nikodým derivatives $dP_\theta / d\lambda = f(y|\theta)$. Let $(\Omega, \mathcal{A}, P^*)$ be the product space defined as $\Omega := \Theta \times \mathcal{Y}$, $\mathcal{A} := \mathcal{G} \times \mathcal{B}$ and $P^*$ the product measure induced by the selection of $P_\vartheta$ and $\mathcal{P}$, where $P_\theta$ must be a measurable function on $\mathcal{B}$ for every $y$ on $\mathcal{Y}$. Thus, $P_\vartheta$ is the marginal distribution of $P^*$ with respect to the parameter $\theta$, and the marginal distribution with respect to $Y$ is the prior predictive $P_\vartheta(B) := \int_\Theta P_\theta(B)\, dP_\vartheta$ for any $B \in \mathcal{B}$. The parameter, as noted above, is modelled mathematically as a random variable $\vartheta: \Omega \to \Theta$. The resulting operational models from a Bayesian point of view are thus given as
(1) the prior model $(\Theta, \mathcal{G}, P_\vartheta)$,
(2) the statistical model $\mathcal{P}$ on $(\mathcal{Y}, \mathcal{B})$, leading to $(\mathcal{Y}, \mathcal{B}, \{P_\theta : \theta \in \Theta\})$, and
(3) the posterior model $(\Theta, \mathcal{G}, \{P_{\vartheta|Y} : Y \in \mathcal{Y}\})$.
The existence of the posterior distribution $P_{\vartheta|Y}$ is guaranteed on Polish spaces [23] and inference about $\theta$ is conducted with respect to the posterior distribution $P_{\vartheta|Y}$ with density $p(\theta|y) := dP_{\vartheta|Y} / d\lambda$, which exists under the assumption that $P_\vartheta \ll \lambda$, where $\ll$ denotes absolute continuity of $P_\vartheta$ with respect to the measure $\lambda$.

2.2. Theory behind the Full Bayesian Significance Test (FBST)

The Full Bayesian Significance Test (FBST) was originally developed by Pereira and Stern [14] as an alternative to frequentist null hypothesis significance tests based on the p-value. It was created under the assumption that a significance test of a sharp hypothesis had to be conducted, where a sharp hypothesis refers to any submanifold of the parameter space of interest [20]. This includes, in particular, precise hypotheses like $H_0: \theta = \theta_0$ for $\theta_0 \in \Theta$ [15]. The FBST assumes a standard parametric statistical model, where $\theta \in \Theta \subseteq \mathbb{R}^p$ is a (possibly vector-valued) parameter of interest, $f(y|\theta)$ is the density corresponding to the model distribution $P_{Y|\vartheta}$ and $p(\theta)$ is the prior density corresponding to the prior distribution $P_\vartheta$, where we again assume a dominating measure $\nu$ to guarantee the existence of Radon-Nikodým densities. A hypothesis $H$ makes the statement that the parameter $\theta$ lies in the corresponding null set $\Theta_H$, where for simple (or precise) hypotheses $\Theta_H := \{\theta_0\}$, with $\theta_0$ the value specified in $H: \theta = \theta_0$. The FBST then defines two quantities: $\mathrm{ev}(H)$, which is the e-value supporting (or in favour of) the hypothesis $H$, and $\overline{\mathrm{ev}}(H)$, the e-value against $H$, also called the Bayesian evidence value against $H$ [14]. First, the posterior surprise function $s(\theta)$ and its maximum $s^*$ restricted to the null set $\Theta_H$ are introduced:
Definition 1
(Posterior surprise function). The posterior surprise function $s(\theta)$ for a reference function $r: \Theta \to (T, \mathcal{C})$ from $\Theta$ to a measurable space $(T, \mathcal{C})$ is defined as
$$s(\theta) := \frac{p(\theta|y)}{r(\theta)} \tag{1}$$
In the definition of the posterior surprise function $s(\theta)$, the denominator $r(\theta)$ serves as a reference density, and often the measurable space $(T, \mathcal{C})$ is equal to $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. When the improper flat reference function $r(\theta) = 1$ is used, the surprise function becomes the posterior density $p(\theta|y)$. Otherwise, a weakly informative prior density can be used as a reference function, see Pereira and Stern [16]. Then,
$$s^* := s(\theta^*) = \sup_{\theta \in \Theta_H} s(\theta) \tag{2}$$
is defined as the supremum of the surprise function $s(\theta)$ over the null hypothesis support. For a precise null hypothesis, $s^*$ is simply $s(\theta_0)$. Next, the tangential set is introduced:
Definition 2
(Tangential set). The tangential set $\overline{T}(\nu)$ is defined as
$$\overline{T}(\nu) := \Theta \setminus T(\nu) \tag{3}$$
where
$$T(\nu) := \{\theta \in \Theta \mid s(\theta) \le \nu\} \tag{4}$$
Thus, $T(\nu)$ includes all parameter values $\theta \in \Theta$ which attain a surprise function value $s(\theta)$ smaller than or equal to the threshold $\nu$. The tangential set $\overline{T}(\nu)$ is then the set complement and includes all parameter values $\theta \in \Theta$ which yield a surprise function value $s(\theta)$ larger than $\nu$. Fixing $\nu = s^*$ yields $\overline{T}(s^*)$, which is called the tangential set to the hypothesis $H$. This set $\overline{T}(s^*)$ contains the points $\theta$ of the parameter space $\Theta$ with higher surprise (or corroboration relative to the reference function $r(\theta)$) than the point $\theta_0$ in the null set $\Theta_H$. Then, the cumulative surprise function is introduced, which is required to compute the e-value in the final step:
Definition 3
(Cumulative surprise function). The map $W: \Theta \to [0, 1]$ given by
$$W(\nu) := \int_{T(\nu)} p(\theta|y)\, d\theta \tag{5}$$
is called the complementary cumulative surprise function, and
$$\overline{W}(\nu) := 1 - W(\nu) \tag{6}$$
is called the cumulative surprise function.
Thus, the complementary cumulative surprise function $W(\nu)$ is the integral of the posterior density $p(\theta|y)$ over the set $T(\nu)$, and the cumulative surprise function $\overline{W}(\nu)$ is simply the integral of the posterior density over the tangential set $\overline{T}(\nu)$. The final step towards the e-value is to integrate the posterior density $p(\theta|y)$ over this set:
Definition 4
(e-value). The e-value against a sharp null hypothesis $H_0: \theta = \theta_0$ is defined as
$$\overline{\mathrm{ev}}(H_0) := \overline{W}(s^*) \tag{7}$$
and can be interpreted as the Bayesian evidence against $H_0$.
Clearly, $\overline{\mathrm{ev}}(H_0) := \overline{W}(s^*)$ is the integral of the density $p(\theta|y)$ over the tangential set $\overline{T}(s^*)$, that is, the integral of the posterior density $p(\theta|y)$ over all parameter values $\theta$ which fulfill the condition $s(\theta) > s^*$. The e-value $\mathrm{ev}(H_0)$ supporting $H_0$ is obtained as $\mathrm{ev}(H_0) := 1 - \overline{\mathrm{ev}}(H_0)$. Under $r(\theta) := 1$, large values of $\overline{\mathrm{ev}}(H_0)$ thus indicate that the hypothesis $H_0$ traverses low-density regions (or equivalently, that the alternative hypothesis traverses high-density regions) so that the evidence against $H_0$ is large. For $r(\theta) \neq 1$ the argument is identical, as $H_0$ then traverses low posterior-surprise regions.
For theoretical properties of the FBST and the e-value see Pereira and Stern [16] and Kelter [18]. The FBST then uses $\mathrm{ev}(H)$ to reject $H$ if $\mathrm{ev}(H)$ is sufficiently small (or when $\overline{\mathrm{ev}}(H)$ is large) [14,15].
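As a minimal numerical sketch of Definitions 1 to 4, the following Python snippet approximates the e-value against a precise hypothesis on a grid. The normal posterior N(0.1, 0.2²), the integration bounds, and the function names `normal_pdf` and `e_value_against` are illustrative assumptions and are not taken from the paper or its accompanying R code. For a normal posterior and a flat reference function the tangential set is an interval, so the result can be checked against a closed form.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2); stands in for a fixed version of p(theta | y)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def e_value_against(posterior, theta0, lo, hi, reference=lambda t: 1.0, n=40_000):
    """Approximate the e-value against H0: theta = theta0.

    Integrates the posterior density over the tangential set
    {theta : s(theta) > s(theta0)} with s = posterior / reference,
    using the midpoint rule on [lo, hi].
    """
    s_star = posterior(theta0) / reference(theta0)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        if posterior(theta) / reference(theta) > s_star:
            total += posterior(theta) * h
    return total


# Hypothetical posterior for an effect size: N(0.1, 0.2^2), flat reference.
posterior = lambda t: normal_pdf(t, 0.1, 0.2)
ev_bar = e_value_against(posterior, theta0=0.0, lo=-2.0, hi=2.0)
ev = 1.0 - ev_bar  # e-value in favour of H0

# Closed form for this special case: the tangential set is (0, 0.2),
# so the e-value against H0 equals 2 * Phi(0.5) - 1 = erf(0.5 / sqrt(2)).
closed_form = math.erf(0.5 / math.sqrt(2.0))
print(ev_bar, closed_form, ev)
```

Passing a different `reference` callable, for example a Cauchy density, changes the surprise function but not the density that is integrated, in line with Equations (1) and (5).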

3. On Two Aspects of the FBST

Now, this section demonstrates the two aspects briefly mentioned in the introduction based on an illustrative example.

3.1. The Reference Criterion

To illustrate the first problem, data of Rosenman et al. [24] of the Western Collaborative Group Study about coronary heart disease is used.
Example 1
(Coronary heart disease data). The Western Collaborative Group Study began in 1960 with 3524 male volunteers who were 39 to 59 years old and free of heart disease as determined by electrocardiogram. After the initial screening, the study population dropped to 3154 because of various exclusions. Multiple endpoints were studied and average follow-up continued for 8.5 years with repeat examinations. As an illustrative example, suppose interest lies in testing for differences in systolic blood pressure between light smokers and heavy smokers. Thus, we test the hypothesis $H_0: \delta = 0$ against the alternative $H_1: \delta \neq 0$, where we classify participants who smoke more than 5 cigarettes per day as heavy smokers. A Bayesian two-sample t-test using the model of Rouder et al. [25] is conducted, and the left plot in Figure 1 shows the results of the FBST using a flat reference function $r(\delta) := 1$. The model is parameterized in the effect size $\delta$ of Cohen [26], and the e-value is $\overline{\mathrm{ev}}(H_0) = 0.4362$, which equals the posterior probability mass visualized as the blue area in the left plot of Figure 1. Thus, 43.62% of the posterior probability indicates evidence against the null hypothesis, and the situation is inconclusive. The right plot in Figure 1 shows the result of the FBST when replacing the flat reference function $r(\delta) := 1$ with a Cauchy $C(0, 2)$ density (note the different scaling on the y-axis), which is also used as the prior on $\delta$ in the two-sample t-test. In this case, the e-value $\overline{\mathrm{ev}}(H_0) = 0.4367$ indicates a similarly inconclusive situation and barely changes the result.
The above example shows that calculation of the e-value is straightforward and universally applicable. However, the parameter space $\Theta$ is continuous in the example (the effect size $\delta \in \mathbb{R}$ is a continuous quantity) and any usual prior distribution $P_\vartheta$ assigned to $\theta$ is absolutely continuous with respect to the Lebesgue measure $\lambda$. It is well known that the posterior distribution $P_{\vartheta|Y}$ is absolutely continuous with respect to the prior distribution [27], and thus any $P_\vartheta$-null-set $N \subset \Theta$ with $P_\vartheta(N) = 0$ is also a $P_{\vartheta|Y}$-null-set with $P_{\vartheta|Y}(N) = 0$. Problematically, the set $\Theta_0 := \{\delta_0\} = \{0\}$ which is used in the precise null hypothesis $H_0: \delta = 0$ is a $P_\vartheta$-null-set under both the improper flat and the Cauchy prior, as both of these are absolutely continuous with respect to the Lebesgue measure $\lambda$, and submanifolds are Lebesgue-null-sets [28]. Thus, $\lambda(\{\delta_0\}) = \lambda(\{0\}) = 0$ implies $P_\vartheta(\{0\}) = 0$ due to $P_\vartheta \ll \lambda$, which in turn implies that $\{0\}$ is a $P_{\vartheta|Y}$-null-set, that is, $P_{\vartheta|Y}(\{0\}) = 0$, due to $P_{\vartheta|Y} \ll P_\vartheta$. As a consequence, the value of the posterior density $p(0|y) = 9.4693$, which is shown as the blue point in the left plot of Figure 1, could be chosen arbitrarily. Problematically, this value is used as the reference criterion in the calculation of the e-value $\overline{\mathrm{ev}}(H_0)$, namely in the computation of the tangential set $\overline{T}(\nu)$. Thus, one could assign $p(0|y)$ an entirely different value, say, $c \in \mathbb{R}$, and obtain a different e-value $\overline{\mathrm{ev}}(H_0)$ than the one calculated from the value $p(0|y) = 9.4693$. This seems to render the calculation of the statistical evidence $\overline{\mathrm{ev}}(H_0)$ in the FBST arbitrary, questioning the use of the procedure.
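The arbitrariness described above can be made concrete with a small sketch. It assumes a hypothetical normal posterior N(0.1, 0.2²) rather than the posterior of Example 1, and the helper `mass_above_threshold` is an illustrative name. Two versions of the density are compared: the usual continuous one, and one that is redefined at the single point 0 to take the value c = 0.5. Both are densities of the same distribution, yet they lead to different thresholds and hence different e-values.

```python
import math


def normal_pdf(x, mean=0.1, sd=0.2):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean=0.1, sd=0.2):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mass_above_threshold(threshold, mean=0.1, sd=0.2):
    """Posterior mass of {theta : p(theta | y) > threshold} for N(mean, sd^2).

    Removing or adding a single point to this set does not change its mass,
    so the result also applies to a version of the density altered at one point.
    """
    peak = normal_pdf(mean, mean, sd)
    if threshold >= peak:
        return 0.0
    if threshold <= 0.0:
        return 1.0
    # p(theta | y) > threshold  <=>  |theta - mean| < half_width
    half_width = sd * math.sqrt(2.0 * math.log(peak / threshold))
    return normal_cdf(mean + half_width, mean, sd) - normal_cdf(mean - half_width, mean, sd)


# Version 1: the continuous density, threshold p(0 | y).
ev_continuous = mass_above_threshold(normal_pdf(0.0))

# Version 2: same density except at theta = 0, where it is set to c = 0.5.
c = 0.5
ev_modified = mass_above_threshold(c)

print(ev_continuous, ev_modified)  # same distribution, different e-values
```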

3.2. Prior Probability of the e-Value

The second issue with the FBST may be phrased as the e-value having no valid prior probability. In fact, the e-value in Equation (7) is based on the cumulative surprise function $\overline{W}(s^*)$, which itself depends on the tangential set $\overline{T}(s^*)$ and the posterior density $p(\theta|y)$. Before data $y \in \mathcal{Y}$ are observed, the posterior $P_{\vartheta|Y}$ has not been realized as $P_{\vartheta|Y=y}$ and thus there exists no prior probability $P_\vartheta$ which is associated with the e-value. Even the tangential set $\overline{T}(s^*) := \{\theta \in \Theta \mid s(\theta) > s^*\}$, which is a subset of $\Theta$, seems to have no prior probability, because it depends on the surprise function $s(\theta)$ which itself depends on the posterior density $p(\theta|y)$, compare Equation (1). Thus, the statistical evidence in the FBST seems to escape the natural Bayesian transition from prior to posterior probability.

4. Solutions to the Two Aspects

4.1. The Reference Criterion

If the above criticism that the reference criterion in the FBST is arbitrary held, the procedure would be of little use in practice. However, the solution to the problem is given by fixing a specific version of the posterior density and performing all calculations conditional on fixing such a version. It is well known that the densities of probability distributions (which are probability measures corresponding to a random variable) are defined only up to Lebesgue-null-sets (when the distributions are dominated by the Lebesgue measure). The values on null-sets do not influence these probability measures, and therefore densities are identified with each other whenever they differ only on Lebesgue-null-sets [28]. Technically, this corresponds to the shift from the vector space $\mathcal{L}^p$
$$\mathcal{L}^p(\Omega, \mathcal{A}, \mu) := \left\{ f: \Omega \to \mathbb{K} \;\middle|\; f \text{ is measurable},\ \int_\Omega |f(x)|^p \, d\mu(x) < \infty \right\}$$
on a probability space $(\Omega, \mathcal{A}, \mu)$, $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$ for $0 < p < \infty$, to the quotient space $L^p$, see Bauer [28]. The latter space is defined as $L^p := \mathcal{L}^p / \mathcal{N}$, where
$$\mathcal{N} := \{ f \in \mathcal{L}^p \mid f = 0 \ \mu\text{-almost everywhere} \}$$
and the elements in $L^p$ are equivalence classes. Thus, two elements $[f], [g] \in L^p$ are equal if and only if $f$ and $g$ differ only on $\mu$-null-sets, that is, $f - g \in \mathcal{N}$. The arbitrariness of the reference criterion in the FBST therefore persists only as long as no specific representative of the equivalence class containing the posterior density $p(\theta|y)$ has been selected. In the context of Example 1, this implies that a specific version of the posterior density $p(\delta|y)$ needs to be chosen, which fixes the density's value at $\delta_0 = 0$ (and at all other values $\delta \in \Theta$). Thus, setting $p(\delta_0|y) := p(0|y) := 9.4693$ explicitly by definition fixes one representative of the equivalence class of densities of $P_{\vartheta|Y}$ and bypasses the problem that the reference threshold $p(\delta_0|y)$ in the FBST is arbitrary. Whenever the posterior is obtainable as a closed-form solution, that is, follows a well-known probability distribution $\tilde{P}_{\vartheta|Y}$ with Lebesgue-density $\tilde{p}(\theta|y)$, setting $p(\theta|y) := \tilde{p}(\theta|y)$ by definition, so that the known density $\tilde{p}$ is used as the posterior density $p$ in the FBST, solves the first problem. Whenever numerical techniques like Markov-Chain-Monte-Carlo (MCMC) are used to produce the posterior, the resulting posterior distribution $P_{\vartheta|Y}^{\mathrm{MCMC}}$ and the posterior density $p^{\mathrm{MCMC}}(\theta|y)$ approximate the true posterior distribution $P_{\vartheta|Y}$ and the posterior Lebesgue-density $p(\theta|y)$. Thus, setting $p(\theta|y) := p^{\mathrm{MCMC}}(\theta|y)$ by definition for a fixed numerical technique like MCMC with a given random number generator seed fixes a version of the posterior density and renders the reference threshold in the FBST unique. In Example 1 this equals the choice of $p(\delta_0|y) := 9.4693$ by definition (as MCMC sampling was used), and $p(\delta|y) := p^{\mathrm{MCMC}}(\delta|y)$ for all $\delta \in \mathbb{R}$. In summary, the above considerations provide the following result:
Theorem 1.
Let $s^* := s(\theta^*) = \sup_{\theta \in \Theta_H} s(\theta)$ be the supremum of the surprise function in the Full Bayesian Significance Test, and $\mathcal{L}^p$ and $L^p$ the corresponding vector spaces on $(\Theta, \mathcal{G}, P_{\vartheta|Y})$ with quotient space $L^p = \mathcal{L}^p / \mathcal{N}$ for $\mathcal{N} := \{ f \in \mathcal{L}^p \mid f = 0 \ \mu\text{-almost everywhere} \}$. Whenever $P_{\vartheta|Y}$ is a known probability distribution $\tilde{P}_{\vartheta|Y}$ with Lebesgue-density $\tilde{p}(\theta|y)$, defining $p(\theta|y) := \tilde{p}(\theta|y)$ pointwise for all $\theta \in \Theta$ renders the e-value $\overline{\mathrm{ev}}(H_0)$ against $H_0: \theta = \theta_0$ for $\theta_0 \in \Theta$ well-defined and unique for the choice of $p(\theta|y)$.
Proof. 
See Appendix A. □
Note that when using numerical methods such as MCMC, ergodic theory ensures that $P_{\vartheta|Y}^{\mathrm{MCMC}} \to P_{\vartheta|Y}$ in distribution and $p_{\vartheta|Y}^{\mathrm{MCMC}} \to p_{\vartheta|Y}$, that is, the MCMC posterior density approximates the posterior Lebesgue-density pointwise with increasing precision for an increasing number of MCMC samples [29]. Thus, after fixing a version of the posterior, Theorem 1 also extends to situations where numerical techniques such as MCMC are required.
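The role of the fixed seed can be illustrated with a short sketch. It is not the procedure used in Example 1: independent draws from a hypothetical N(0.1, 0.2²) posterior stand in for MCMC output, the density is estimated with a hand-written Gaussian kernel density estimator using a normal-reference rule-of-thumb bandwidth, and all function names are assumptions made for illustration. Fixing the seed fixes the draws, hence the estimated density and its value at the null value, hence the e-value.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate built from the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def e_value_from_samples(samples, theta0):
    """Estimate the e-value against H0: theta = theta0 under a flat reference.

    The estimated density evaluated at theta0 is the threshold; the e-value is
    the fraction of draws whose estimated density exceeds that threshold.
    """
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    bandwidth = 1.06 * sd * n ** (-0.2)  # normal-reference rule of thumb
    density = gaussian_kde(samples, bandwidth)
    threshold = density(theta0)
    return sum(1 for s in samples if density(s) > threshold) / n


def draw_posterior(seed, n=1000):
    """Stand-in for MCMC output: i.i.d. draws from N(0.1, 0.2^2) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.1, 0.2) for _ in range(n)]


ev_a = e_value_from_samples(draw_posterior(seed=42), theta0=0.0)
ev_b = e_value_from_samples(draw_posterior(seed=42), theta0=0.0)
print(ev_a == ev_b)  # the same seed fixes the same version of the density
```

A different seed generally gives a slightly different estimate, which is why the seed belongs to the definition of the version $p^{\mathrm{MCMC}}(\theta|y)$.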

4.2. Prior Probability of the e-Value

The solution to the second problem is more involved and less technical. Conceptually, from the above line of thought it is immediate that under priors $P_\vartheta$ which are absolutely continuous with respect to the Lebesgue measure $\lambda$, the prior probability $P_\vartheta(\Theta_0)$ will be zero for any precise null hypothesis $H_0 := \Theta_0$ with $\Theta_0 := \{\theta_0\}$ for $\theta_0 \in \Theta$. The posterior $P_{\vartheta|Y}$ is absolutely continuous with respect to the prior $P_\vartheta$, so $P_{\vartheta|Y}(\Theta_0) = 0$. Thus, it is simply not possible to use a natural Bayesian workflow which assigns positive probability mass to a Lebesgue-null-set $\Theta_0$ whenever the statistician uses a prior distribution $P_\vartheta$ which is absolutely continuous with respect to $\lambda$. Traditional Bayesian hypothesis testing and model selection bypasses this inconvenience by introducing an arbitrary mixture prior structure $P_\vartheta := \varrho\, \mathbb{1}_{\Theta_0} + (1 - \varrho)\, \tilde{P}_\vartheta$ which assigns positive probability mass $\varrho > 0$ to the null set $\Theta_0$, and distributes the rest of the probability mass $(1 - \varrho) \in [0, 1]$ by means of a probability distribution $\tilde{P}_\vartheta$ on the alternative hypothesis space $\Theta_1 = \Theta \setminus \Theta_0$. Early proposals of such a mixture prior structure include Jeffreys [30] and Haldane [31], see also Robert [29] and Kleijn [23]. Such a prior allows computation of a Bayes factor, and furthermore, the Bayes factor itself also has no prior probability which is naturally associated with it. Importantly, this mixture prior structure imposes a dichotomy between hypothesis testing and parameter estimation, because such a mixture prior structure is reasonable only from a hypothesis testing perspective. Whenever parameter estimation is the goal, the assignment of probability mass $\varrho > 0$ to a specific value is highly questionable and often contradicts reasonable a priori beliefs. In these cases, prior beliefs are expressed better through a prior which is absolutely continuous with respect to the Lebesgue measure $\lambda$.
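To make the mixture prior structure tangible, the following sketch computes the Bayes factor and the posterior probability of a point null in a simple conjugate setting. The setting is an assumption chosen for illustration and does not appear in the paper: a sample mean `ybar` with known standard error `se`, a point mass `rho` at `theta0`, and a N(theta0, tau²) distribution on the alternative. The function name `point_null_posterior` is likewise hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def point_null_posterior(ybar, se, theta0=0.0, rho=0.5, tau=1.0):
    """Bayes factor BF01 and posterior probability of H0: theta = theta0.

    Prior: rho * (point mass at theta0) + (1 - rho) * N(theta0, tau^2).
    Data model: ybar | theta ~ N(theta, se^2).
    """
    m0 = normal_pdf(ybar, theta0, se)  # marginal likelihood under H0
    # Under H1, integrating theta out gives ybar ~ N(theta0, se^2 + tau^2).
    m1 = normal_pdf(ybar, theta0, math.sqrt(se ** 2 + tau ** 2))
    bf01 = m0 / m1
    post_h0 = rho * m0 / (rho * m0 + (1.0 - rho) * m1)
    return bf01, post_h0


bf01, post_h0 = point_null_posterior(ybar=0.0, se=0.2)
# With rho = 0 the prior is absolutely continuous and H0 keeps probability zero.
_, post_h0_continuous = point_null_posterior(ybar=0.0, se=0.2, rho=0.0)
print(bf01, post_h0, post_h0_continuous)
```

The posterior probability of the null is positive only because `rho` was set to a positive value beforehand, which is the dichotomy between testing and estimation discussed above.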
The FBST avoids the introduction of such a mixture structure and thus allows for a unified prior elicitation which is coherent both from a Bayesian hypothesis testing and Bayesian parameter estimation stance. Importantly, the e-value is intended to be a Bayesian replacement of the frequentist p-value, which measures the statistical discrepancy between the observed data and an assumed precise hypothesis. Thus, the e-value provides the Bayesian evidence against such a precise hypothesis. From a measure-theoretic point of view, every precise null hypothesis is assumed to be false and the FBST thus aligns with the empirical rationalism of Popper [32]. For the use of testing a precise hypothesis as an approximation of a small interval hypothesis see Berger [33], Rousseau [34], Rao & Lovric [35] as well as Kelter [36]: Often, the approximation of a small interval hypothesis via a precise point null hypothesis will be bad, and thus the e-value does not assign positive probability mass to such a precise null hypothesis. Instead, the FBST quantifies the discrepancy between the observed data and the hypothetical precise null value, while simultaneously implementing I.J. Good’s principle of least surprise [37,38,39]. Note further that the mathematical introduction of positive prior probability $\varrho > 0$ to a precise value $\theta_0 \in \Theta$ when using a mixture prior does not render such a precise hypothesis $H_0: \theta = \theta_0$ more realistic in practice.
Furthermore, next to its measure-theoretic premises, there exists another argument which weakens the criticism that there is no prior probability of the e-value: When a prior distribution $P_\vartheta$ is selected and no data $y \in \mathcal{Y}$ has been observed, the posterior distribution can be identified conceptually as the prior distribution. Thus, replacing the posterior density $p(\theta|y)$ with the $\lambda$-density $p(\theta)$ of the prior $P_\vartheta$ yields $s(\theta) := p(\theta) / r(\theta)$, which implies that the tangential set $\overline{T}(\nu) := \Theta \setminus T(\nu)$ for $T(\nu) := \{\theta \in \Theta \mid s(\theta) \le \nu\}$ includes those parameter values $\theta \in \Theta$ for which $p(\theta) / r(\theta) > \nu$. Using the fact that $s^* = p(\theta_0) / r(\theta_0)$ for a precise hypothesis $H_0: \theta = \theta_0$ then yields $\overline{T}(s^*) = \{\theta \in \Theta \mid p(\theta) / r(\theta) > p(\theta_0) / r(\theta_0)\}$. Plugging this tangential set into Equation (6) yields the e-value
$$\overline{\mathrm{ev}}(H_0) := \overline{W}(s^*) = \int_{\overline{T}(s^*)} p(\theta)\, d\theta \tag{8}$$
which is the integral of the prior density $p(\theta)$ over $\overline{T}(s^*)$. When the reference function $r(\theta)$ is chosen as a flat improper prior $r(\theta) := 1$, this becomes
$$\overline{\mathrm{ev}}(H_0) = \int_{\{\theta \in \Theta \mid p(\theta) > p(\theta_0)\}} p(\theta)\, d\theta \tag{9}$$
which is the integral of the prior density $p(\theta)$ over all values which attain higher prior density values than the null value $\theta_0$ in $H_0: \theta = \theta_0$. Thus, the e-value in such a case quantifies the discrepancy of the precise hypothesis $H_0: \theta = \theta_0$ with the prior beliefs $P_\vartheta$. The above line of thought provides the following result:
Theorem 2.
Let $r(\theta) := 1$. In case no data $y \in \mathcal{Y}$ has been observed, the e-value quantifies the discrepancy between the precise hypothesis $H_0 := \Theta_0$ for $\Theta_0 := \{\theta_0\}$ and $\theta_0 \in \Theta$ and the prior distribution $P_\vartheta$, that is,
$$\overline{\mathrm{ev}}(H_0) = P_\vartheta(\{\theta \in \Theta \mid p(\theta) > p(\theta_0)\}) \tag{10}$$
Proof. 
See Appendix A. □
Whenever $r(\theta) \neq 1$, the interpretation is more complicated because such a reference function incorporates a surprise element into the tangential set, but the conclusions remain the same. The e-value then quantifies the discrepancy between the precise hypothesis and the prior surprise.
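As a worked illustration of Theorem 2 under a flat reference function, consider a Cauchy prior. The choice of prior, its location and scale, and the function names below are assumptions made for this sketch. Because the Cauchy density is symmetric and strictly decreasing in the distance from its location, the set of values with higher prior density than the null value is an open interval, and its prior probability follows from the Cauchy distribution function.

```python
import math


def cauchy_pdf(x, loc=0.0, scale=1.0):
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def prior_e_value_cauchy(theta0, loc=0.0, scale=1.0):
    """Prior probability of {theta : p(theta) > p(theta0)} for a Cauchy prior.

    The set equals {theta : |theta - loc| < |theta0 - loc|}, whose probability is
    (2 / pi) * arctan(|theta0 - loc| / scale) by the Cauchy distribution function.
    """
    return (2.0 / math.pi) * math.atan(abs(theta0 - loc) / scale)


def prior_e_value_numeric(theta0, loc=0.0, scale=1.0, n=20_000):
    """Midpoint-rule check: integrate the prior density over the same interval."""
    d = abs(theta0 - loc)
    h = 2.0 * d / n
    return sum(cauchy_pdf(loc - d + (i + 0.5) * h, loc, scale) * h for i in range(n))


at_mode = prior_e_value_cauchy(0.0)      # null value at the prior mode
one_scale = prior_e_value_cauchy(1.0)    # null value one scale unit away: 0.5
check = prior_e_value_numeric(1.0)
print(at_mode, one_scale, check)
```

A null value at the prior mode has prior e-value zero, meaning it is in no conflict with the prior beliefs, while null values further in the tails yield prior e-values approaching one.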

5. Discussion

The Full Bayesian Significance Test (FBST) has been proposed as a convenient method to replace frequentist p-values for testing a precise hypothesis [14,15,16]. Although the FBST enjoys various appealing properties [8,19,20,40], two aspects of the FBST are sometimes observed as measure-theoretic inconsistencies of the procedure and have not been discussed rigorously in the literature. First, the FBST uses the posterior density as a reference for judging the Bayesian statistical evidence against a precise hypothesis. However, under absolutely continuous prior distributions, the posterior density is defined only up to Lebesgue null sets which renders the reference criterion arbitrary. Second, the FBST statistical evidence seems to have no valid prior probability. In this paper, it was shown that the former problem can be circumvented by fixing a version of the posterior density before using the FBST. Theorem 1 demonstrated that then, the e-value is well-defined and unique after observing the data $y \in \mathcal{Y}$.
The latter aspect is based on the measure-theoretic premises of the FBST. As shown in this paper, the FBST avoids the use of a mixture prior structure which imposes a dichotomy between Bayesian hypothesis testing and parameter estimation. Thus, the FBST is compatible with priors which are absolutely continuous with respect to the Lebesgue measure $\lambda$ (the Bayes factor, for example, is not). As a consequence, there exists no prior probability of the e-value and a precise hypothesis $H_0: \theta = \theta_0$ under an absolutely continuous prior $P_\vartheta$. Theorem 2 showed that even then, the e-value has a proper interpretation from a prior perspective: It quantifies the a priori discrepancy of the hypothesis $H_0$ with the prior beliefs which are expressed by $P_\vartheta$ whenever the reference function $r(\theta)$ is flat. When $r(\theta) \neq 1$, the interpretation is more difficult but the conclusion remains the same.
Together, the results in this paper show that both of the two aspects which are sometimes observed as measure-theoretic inconsistencies of the FBST are not tenable. The FBST thus provides a measure-theoretically coherent Bayesian alternative for testing a precise hypothesis.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The R code to recreate all analyses and plots can be found at the Open Science Framework at https://osf.io/25vsw/?view_only=e1e243c1e2a44646969fb75cc4c34d57.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FBST    Full Bayesian Significance Test
NHST    Null Hypothesis Significance Testing

Appendix A

Proof of Theorem 1. 
From Definition 1 and Equation (2) it follows that the tangential set $\overline{T}(\nu) := \Theta \setminus T(\nu)$ becomes $\overline{T}(s^*) := \Theta \setminus T(s^*)$, which equals the set
$$\{\theta \in \Theta : s(\theta) > s(\theta^*)\} = \left\{\theta \in \Theta : \frac{p(\theta|y)}{r(\theta)} > \frac{p(\theta^*|y)}{r(\theta^*)}\right\} = \left\{\theta \in \Theta : \frac{p(\theta|y)}{r(\theta)} > \frac{p(\theta_0|y)}{r(\theta_0)}\right\} \tag{A1}$$
where the first equality uses Definition 2 and the second equality uses $\theta^* = \theta_0$ for a precise hypothesis $H_0: \theta = \theta_0$ for $\theta_0 \in \Theta$. By assumption, the posterior distribution $P_{\vartheta|Y}$ is known to take the form $\tilde{P}_{\vartheta|Y}$ with Lebesgue-density $\tilde{p}(\theta|y)$. Defining the posterior density $p: \Theta \to \mathbb{R}^d$ pointwise as $p(\theta|y) := \tilde{p}(\theta|y)$ implies that the value $p(\theta_0|y)$ is equal to $\tilde{p}(\theta_0|y)$. Thus, the tangential set $\overline{T}(s^*)$ in Equation (A1) is well-defined and unique for this fixed value $p(\theta_0|y) := \tilde{p}(\theta_0|y)$. From Definition 3 and Equation (7) it follows that the e-value $\overline{\mathrm{ev}}(H_0)$ is well-defined and unique for the choice of $p(\theta|y)$. □
Proof of Theorem 2. 
Let $P_\vartheta$ be the prior distribution and $r(\theta) := 1$. Suppose no data $y \in \mathcal{Y}$ has been observed; then the posterior distribution $P_{\vartheta|Y}$ can be identified as the prior distribution $P_\vartheta$. Thus, replacing the posterior density $p(\theta|y)$ with the $\lambda$-density $p(\theta)$ of the prior $P_\vartheta$ yields $s(\theta) := p(\theta) / r(\theta)$, which implies that the tangential set $\overline{T}(\nu) := \Theta \setminus T(\nu)$ for $T(\nu) := \{\theta \in \Theta \mid s(\theta) \le \nu\}$ includes the parameter values $\theta \in \Theta$ which fulfill the condition $p(\theta) / r(\theta) > \nu$. It follows that $s^* = p(\theta_0) / r(\theta_0)$ for a precise hypothesis $H_0: \theta = \theta_0$, and this yields $\overline{T}(s^*) = \{\theta \in \Theta \mid p(\theta) / r(\theta) > p(\theta_0) / r(\theta_0)\}$ for the tangential set to $H_0$. Using the latter in Equation (6) yields the e-value
$$\overline{\mathrm{ev}}(H_0) := \overline{W}(s^*) = \int_{\overline{T}(s^*)} p(\theta)\, d\theta$$
which is the integral of the prior density $p(\theta)$ over $\overline{T}(s^*)$. By assumption, $r(\theta) := 1$, so this becomes
$$\overline{\mathrm{ev}}(H_0) = \int_{\{\theta \in \Theta \mid p(\theta) > p(\theta_0)\}} p(\theta)\, d\theta = P_\vartheta(\{\theta \in \Theta \mid p(\theta) > p(\theta_0)\})$$
which is the statement in Equation (10). □

References

  1. Gigerenzer, G. Mindless statistics. J.-Socio-Econ. 2004, 33, 587–606. [Google Scholar] [CrossRef]
  2. Pashler, H.; Harris, C.R. Is the Replicability Crisis Overblown? Three Arguments Examined. Perspect. Psychol. Sci. 2012, 7, 531–536. [Google Scholar] [CrossRef]
  3. Baker, M.; Penny, D. Is there a reproducibility crisis? Nature 2016, 533, 452–454. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. McElreath, R.; Smaldino, P.E. Replication, communication, and the population dynamics of scientific discovery. PLoS ONE 2015, 10, 1–16. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Ioannidis, J.P.A. What Have We (Not) Learnt from Millions of Scientific Papers with p-Values? Am. Stat. 2019, 73, 20–25. [Google Scholar] [CrossRef] [Green Version]
  6. Button, K.S.; Ioannidis, J.P.; Mokrysz, C.; Nosek, B.A.; Flint, J.; Robinson, E.S.; Munafò, M.R. Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 2013, 14, 365–376. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Kelter, R. Bayesian alternatives to null hypothesis significance testing in biomedical research: A non-technical introduction to Bayesian inference with JASP. BMC Med. Res. Methodol. 2020, 20, 1–12. [Google Scholar] [CrossRef]
  8. Kelter, R. Analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research. BMC Med. Res. Methodol. 2020, 20, 1–18. [Google Scholar] [CrossRef] [Green Version]
  9. Kelter, R. Bayesian survival analysis in STAN for improved measuring of uncertainty in parameter estimates. Meas. Interdiscip. Res. Perspect. 2020, 18, 101–119. [Google Scholar] [CrossRef]
  10. Wagenmakers, E.J.; Morey, R.D.; Lee, M.D. Bayesian Benefits for the Pragmatic Researcher. Curr. Dir. Psychol. Sci. 2016, 25, 169–176. [Google Scholar] [CrossRef]
  11. Edwards, W.; Lindman, H.; Savage, L.J. Bayesian statistical inference for psychological research. Psychol. Rev. 1963, 70, 193–242. [Google Scholar] [CrossRef]
  12. Hendriksen, A.; de Heide, R.; Grünwald, P. Optional Stopping with Bayes Factors: A Categorization and Extension of Folklore Results, with an Application to Invariant Situations. Bayesian Anal. 2020, in press. [Google Scholar] [CrossRef]
  13. Berger, J.; Wolpert, R.L. The Likelihood Principle; Institute of Mathematical Statistics: Hayward, CA, USA, 1988. [Google Scholar]
  14. Pereira, C.A.d.B.; Stern, J.M. Evidence and credibility: Full Bayesian significance test for precise hypotheses. Entropy 1999, 1, 99–110. [Google Scholar] [CrossRef]
  15. Pereira, C.A.d.B.; Stern, J.M.; Wechsler, S. Can a Significance Test be genuinely Bayesian? Bayesian Anal. 2008, 3, 79–100. [Google Scholar] [CrossRef]
  16. Pereira, C.A.d.B.; Stern, J.M. The e-value: A fully Bayesian significance measure for precise statistical hypotheses and its research program. São Paulo J. Math. Sci. 2020, 1–19. [Google Scholar] [CrossRef]
  17. Kelter, R. Simulation data for the analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research. BMC Res. Notes 2020, 13, 1–3. [Google Scholar] [CrossRef]
  18. Kelter, R. fbst: An R package for the Full Bayesian Significance Test for testing a sharp null hypothesis against its alternative via the e-value. Behav. Res. Methods 2021, in press. [Google Scholar] [CrossRef]
  19. Madruga, M.R.; Esteves, L.G.; Wechsler, S. On the Bayesianity of Pereira-Stern tests. Test 2001, 10, 291–299. [Google Scholar] [CrossRef]
  20. Diniz, M.; Pereira, C.A.B.; Polpo, A.; Stern, J.M.; Wechsler, S. Relationship between Bayesian and frequentist significance indices. Int. J. Uncertain. Quantif. 2012, 2, 161–172. [Google Scholar] [CrossRef]
  21. Ly, A.; Wagenmakers, E.J. A Critical Evaluation of the FBST ev for Bayesian Hypothesis Testing. Comput. Brain Behav. 2021, 1–8. [Google Scholar] [CrossRef]
  22. Kelter, R. On the Measure-Theoretic Premises of Bayes Factor and Full Bayesian Significance Tests: A Critical Reevaluation. Comput. Brain Behav. 2021, 1–11. [Google Scholar] [CrossRef]
  23. Kleijn, B. The Frequentist Theory of Bayesian Statistics; Springer: Amsterdam, The Netherlands, 2020. [Google Scholar]
  24. Rosenman, R.H.; Brand, R.J.; Jenkins, D.; Friedman, M.; Straus, R.; Wurm, M. Coronary heart disease in Western Collaborative Group Study. Final follow-up experience of 8 1/2 years. JAMA 1975, 233, 872–877. [Google Scholar] [CrossRef]
  25. Rouder, J.N.; Speckman, P.L.; Sun, D.; Morey, R.D.; Iverson, G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev. 2009, 16, 225–237. [Google Scholar] [CrossRef]
  26. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Routledge: Hillsdale, NJ, USA, 1988. [Google Scholar]
  27. Schervish, M.J. Theory of Statistics; Springer Verlag: New York, NY, USA, 1995. [Google Scholar]
  28. Bauer, H. Measure and Integration Theory; De Gruyter: Berlin, Germany, 2001. [Google Scholar]
  29. Robert, C.P. The Bayesian Choice, 2nd ed.; Springer New York: Paris, France, 2007. [Google Scholar] [CrossRef]
  30. Jeffreys, H. Theory of Probability, 1st ed.; The Clarendon Press: Oxford, UK, 1939. [Google Scholar]
  31. Haldane, J.B.S. A note on inverse probability. Math. Proc. Camb. Philos. Soc. 1932, 28, 55–61. [Google Scholar] [CrossRef]
  32. Popper, K. The Logic of Scientific Discovery; Routledge: London, UK; New York, NY, USA, 1959. [Google Scholar] [CrossRef]
  33. Berger, J. Statistical Decision Theory and Bayesian Analysis; Springer: New York, NY, USA, 1985. [Google Scholar]
  34. Rousseau, J. Approximating Interval hypothesis: P-values and Bayes factors. In Bayesian Statistics; Bernado, J., Berger, J., Dawid, A., Smith, A., Eds.; Oxford University Press: Valencia, Spain, 2007; Volume 8, pp. 417–452. [Google Scholar]
  35. Rao, C.R.; Lovric, M.M. Testing point null hypothesis of a normal mean and the truth: 21st Century perspective. J. Mod. Appl. Stat. Methods 2016, 15, 2–21. [Google Scholar] [CrossRef]
  36. Kelter, R. Bayesian and frequentist testing for differences between two groups with parametric and nonparametric two-sample tests. Wires Comput. Stat. 2020, 13, e1523. [Google Scholar] [CrossRef]
  37. Good, I. Surprise index. In Encyclopedia of Statistical Sciences; Kotz, S., Johnson, N., Reid, C., Eds.; John Wiley & Sons: New York, NY, USA, 1988; Volume 7. [Google Scholar]
  38. Good, I. C332. Surprise indexes and p-values. J. Stat. Comput. Simul. 1989, 32, 90–92. [Google Scholar] [CrossRef]
  39. Good, I.J. C420. The existence of sharp null hypotheses. J. Stat. Comput. Simul. 1994, 49, 241–242. [Google Scholar] [CrossRef]
  40. Stern, J.M. Significance tests, Belief Calculi, and Burden of Proof in legal and Scientific Discourse. Front. Artif. Intell. Its Appl. 2003, 101, 139–147. [Google Scholar]
Figure 1. Results of the Full Bayesian Significance Test using a flat reference function (left) and a $C(0, 2)$ Cauchy density as reference function (right) for testing the hypothesis of no difference $H_0: \delta = 0$ in terms of systolic blood pressure between smokers and non-smokers.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
