Abstract
The main objective of this paper is to find the relation between the adaptive significance level presented here and the sample size. We statisticians know of the inconsistency, or paradox, in the current classical tests of significance that are based on p-value statistics that are compared to the canonical significance levels (10%, 5%, and 1%): “Raise the sample to reject the null hypothesis” is the recommendation of some ill-advised scientists! This paper will show that it is possible to eliminate this problem of significance tests. We present here the beginning of a larger research project. The intention is to extend its use to more complex applications such as survival analysis, reliability tests, and other areas. The main tools used here are the Bayes factor and the extended Neyman–Pearson Lemma.
1. Introduction
Recently, the use of p-values in tests of significance has been criticized. The question posed in [] and discussed in [,,] concerns the misuse of canonical values of significance level (0.10, 0.05, 0.01, and 0.005). More recently, a publication by the American Statistical Association [] makes recommendations for scientists to be concerned with choosing the appropriate level of significance. Pericchi and Pereira [] consider the calculation of adaptive levels of significance in an apparently successful solution for the correction of significance level choices. This suggestion eliminates the risk of a breach of the likelihood principle. However, that article deals only with simple null hypotheses, although the alternative may be composite. Another constraint is the dimensionality of the parameter space; the article was only about one-dimensional spaces. More recent is the article by 72 prominent scientists [], as described on the website of Nature Human Behavior []. In a genuinely Bayesian context, the authors of [] introduced the index e (e-value, e for evidence) as an alternative to the classical p-value, which we write with a lower-case “p”. A correction to make the null hypothesis invariant under transformations was presented in [], and a more theoretical review can be seen in [,]. The e-value was the basis of the solution of an astrophysical problem described in []. The relationship between p-values and e-values is discussed in []. However, while the e-value works for hypotheses of any dimensionality without needing assignment of “point mass” probabilities to hypotheses of lower dimensionality than the parameter space, setting its significance level is not an easy task. This has made us look for a way to obtain a significance index that allows us to better understand how to obtain the optimal (in the sense we explain later) significance level of a problem of any finite dimensionality. This work is based on four previous works [,,,].
It has taken a long time to see the possibility of using them in combination and with reasonable adjustments: the Bayes factor takes the place of the likelihood ratio and the average value of the likelihood function replaces its maximum value. The mean of the likelihood function under the null hypothesis will be the density used in the calculation of the new index, the P-value, which we represent with a capital “P” to differentiate it from the classical p-value. The basis of all our work is the extended Neyman–Pearson Lemma in its Bayesian form (see [], sections “Optimal Tests” (Theorem 1) and “Bayes Test Procedures” (pp. 451–452)).
We present here a new hypothesis testing procedure that can eliminate some of the major problems associated with currently used hypothesis tests. For example, the new tests do not tend to reject all hypotheses in the many-data limit like Neyman–Pearson tests do, nor do they tend to fail to reject all hypotheses in the same limit, like Jeffreys’s Bayesian (Bayes factor) hypothesis tests do.
2. Blending Bayesian and Classical Concepts
2.1. Statistical Model
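As usual, let $x$ and $\theta$ be random vectors (could be scalars), with $x \in \mathcal{X} \subset \mathbb{R}^s$, $\mathcal{X}$ being the sample space, and $\theta \in \Theta \subset \mathbb{R}^k$, $\Theta$ being the parameter space, and $s$ and $k$ being positive integers. To state the relation between the two random vectors, the statistician considers the following: a family of probability density functions indexed by the conditioning parameter, $\{f(x \mid \theta);\ \theta \in \Theta\}$; a prior probability density function $g(\theta)$ on the entire parameter space; and the posterior density function $g(\theta \mid x)$. In order to be appropriate, the family of likelihood functions indexed by $x$, $\{L(\theta \mid x) = f(x \mid \theta);\ x \in \mathcal{X}\}$, must be measurable in the prior σ-algebra.
With the statistical model defined, a partition of the parameter space is defined by the consideration of a null hypothesis that is to be compared to its alternative:
$$H: \theta \in \Theta_H \quad \text{versus} \quad A: \theta \in \Theta_A, \qquad \Theta = \Theta_H \cup \Theta_A, \quad \Theta_H \cap \Theta_A = \emptyset. \tag{1}$$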
In the case of composite hypotheses with the partition elements having the same dimensionality, the model would then be complete. Such cases would not involve partitions for which there are components of zero Lebesgue measure. In the case of precise or “sharp” hypotheses, that is, the partition components having different dimensionalities, other elements must be added:
- positive probabilities of the hypotheses, and
- a density on the subset that has the smaller dimension. The choice of this density should be coherent with the original prior density over the global parameter space.
Consider the common case for which the null hypothesis is the one defined by a subset of lower dimensionality. In this case, we use a surface integral to normalize the values of the prior density in the null set so that the sum or integral of these values is equal to unity. Figure 1 illustrates how this procedure is carried out. Recall that a prior density can be seen as a preference system in the parameter space. That preference system must be kept even within the null hypothesis; coherence in access to prior distributions is crucial. Further details on this procedure can be found in [,,]. Later, Dawid and Lauritzen [] considered multiple ways of obtaining compatible priors under alternative models (hypotheses). The “conditioning” approach described by Dawid and Lauritzen is equivalent to the technique presented here. Dickey [] used a similar approach previously, but in a more parameterization-dependent way.
Figure 1.
A prior made of independent and distributions in a two-dimensional parameter space is cut along the line and one of the pieces moved away to show the resulting prior on the lower-dimensional set.
2.2. Significance Index
By “significance index”, we mean a real function over the sample space that is used as an evidence measure for decision-making with respect to accepting or rejecting the null hypothesis, H. We begin this section by stating a generalization of the Neyman–Pearson Lemma, as presented by DeGroot []. Cox [,] also considers the classical p-value as an evidence measure, and Evans [] considers evidence measures in general, outlines the relative belief theory developed in the references of that paper, and suggests that the associated evidence measure could have advantages over other measures of evidence and be the basis of a complete approach to estimation and hypothesis-assessment problems. The classical p-value is the most widely used significance index across diverse fields of study, including almost all scientific areas. In the present work, we present a replacement for the classical p-value that has a number of advantages, which will be described here and in future work. The conceptual and operational similarity between classical hypothesis tests as currently used and the new tests could potentially help researchers accept and use the new tests.
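Let $f_H(x)$ and $f_A(x)$ be probability density functions over the sample space $\mathcal{X}$. The decision problem is to choose one of these densities as being the true generator of the observed data. Consider now a binary function $\varphi: \mathcal{X} \to \{0, 1\}$ used to define the decision procedure. Define a partition of the sample space, $\mathcal{X} = \Psi \cup \Psi^c$ with $\Psi \cap \Psi^c = \emptyset$, where $\Psi^c$ is the non-rejection region for H. The test function is
$$\varphi(x) = \begin{cases} 1, & x \in \Psi \ (\text{reject } H), \\ 0, & x \in \Psi^c \ (\text{do not reject } H). \end{cases} \tag{2}$$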
To choose between a hypothesis and its alternative, one should first choose two positive real numbers, say $a$ and $b$, with $a > b$, $a = b$, and $a < b$ meaning, respectively, preference for the null hypothesis, indifference, and preference for the alternative. The decision rule is then to reject the null hypothesis, H, whenever $a f_H(x) < b f_A(x)$, and not to reject otherwise. The following theorem, a generalized version of the Neyman–Pearson Lemma presented in the textbook by DeGroot [], provides a test that is optimal in the sense of minimizing a linear combination of the probabilities of the two types of errors: Type I, which is the rejection of a true hypothesis, and Type II, the non-rejection of a false hypothesis. With $\Psi$ denoting the rejection region of a test $\varphi$, these probabilities are
$$\alpha(\varphi) = \int_{\Psi} f_H(x)\, dx \tag{3}$$
and
$$\beta(\varphi) = \int_{\Psi^c} f_A(x)\, dx. \tag{4}$$
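Generalized Neyman–Pearson Lemma:
Let $\varphi^*$ be a test that rejects H in favor of A if $a f_H(x) < b f_A(x)$, does not reject H if $a f_H(x) > b f_A(x)$, and is indifferent if $a f_H(x) = b f_A(x)$. Then, for any other test $\varphi$,
$$a\,\alpha(\varphi^*) + b\,\beta(\varphi^*) \le a\,\alpha(\varphi) + b\,\beta(\varphi). \tag{5}$$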
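The lemma can be checked by brute force on a small finite sample space. The following is a minimal sketch, not taken from the original article: the two five-point densities and the weights `a = b = 1` are hypothetical values chosen only for illustration, and a test is represented as a tuple of 0/1 rejection flags.

```python
from itertools import product

# Hypothetical toy densities over a 5-point sample space (illustrative only).
f_H = [0.40, 0.30, 0.15, 0.10, 0.05]
f_A = [0.05, 0.10, 0.15, 0.30, 0.40]
a, b = 1.0, 1.0  # weights of the Type-I and Type-II error probabilities


def weighted_error(reject, f_H, f_A, a, b):
    """Return a*alpha + b*beta for a test given as a tuple of 0/1 flags."""
    alpha = sum(p for p, r in zip(f_H, reject) if r)      # reject H when H is true
    beta = sum(q for q, r in zip(f_A, reject) if not r)   # keep H when A is true
    return a * alpha + b * beta


def lemma_test(f_H, f_A, a, b):
    """Reject H exactly where a*f_H(x) < b*f_A(x); ties are not rejected."""
    return tuple(int(a * p < b * q) for p, q in zip(f_H, f_A))


best = lemma_test(f_H, f_A, a, b)
best_value = weighted_error(best, f_H, f_A, a, b)

# Enumerate all 2**5 deterministic tests and find the smallest weighted error.
all_values = [weighted_error(t, f_H, f_A, a, b)
              for t in product((0, 1), repeat=len(f_H))]
print(best, best_value, min(all_values))
```

No test in the enumeration achieves a smaller weighted error than the one built from the comparison of $a f_H(x)$ with $b f_A(x)$, which is what the inequality in the lemma states.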
In 1957, both Lindley [] and Bartlett [] recognized that fixing a significance level was a major cause of problems with hypothesis tests. In 1966, Cornfield [] advocated hypothesis tests that minimize a linear combination of error probabilities like Equation (5) rather than fixing a canonical $\alpha$ and minimizing $\beta$ as in the Neyman–Pearson approach [].
To see that Bayesian hypothesis tests minimize a linear combination of error probabilities of the form $a\alpha + b\beta$, consider a loss function that is zero if the decision is correct and $w_H$ ($w_A$) if the decision favors A (H) when H (A) is the true state of nature. In addition, if $\pi$ is the prior probability of H and $\varphi$ the test function, the risk function is
$$r(\varphi) = \pi\, w_H\, \alpha(\varphi) + (1 - \pi)\, w_A\, \beta(\varphi). \tag{6}$$
Consequently, simply identifying $a = \pi w_H$ and $b = (1 - \pi) w_A$, respectively, and recalling that risk functions are to be minimized, Bayesian tests should minimize a linear combination of the form $a\alpha + b\beta$. Both the classical and the Bayesian applications of the theorem are stated in terms of the comparison of the ratio $f_H(x)/f_A(x)$ to the constant K, given by
$$K = \frac{b}{a} = \frac{(1 - \pi)\, w_A}{\pi\, w_H}. \tag{7}$$
It is important to remember that this generalized version of the Neyman–Pearson Lemma, from the classical point of view, will only apply to simple-versus-simple hypotheses. It is not common in classical inference to consider a density function under a composite hypothesis. However, some classical methods use optimization by considering the maximum of the likelihood function both under H and under A. Recall that the likelihood function can be represented as $L(\theta \mid x) = f(x \mid \theta)$.
In the Bayesian paradigm, the likelihood function plays an important role, which is not at all surprising, because it is the only mathematical object considered that defines an association between a sample and a parameter. Rather than optimization, integration is the Bayesian tool applied here. With the prior densities $g_H$ and $g_A$ under the two hypotheses defined, the following conditional expectations are calculated:
$$f_H(x) = \int_{\Theta_H} f(x \mid \theta)\, g_H(\theta)\, d\theta \quad \text{and} \quad f_A(x) = \int_{\Theta_A} f(x \mid \theta)\, g_A(\theta)\, d\theta. \tag{8}$$
These functions are the Bayesian predictive densities under the respective hypotheses. Both are probability density functions over the sample space $\mathcal{X}$. The ratio between the two functions is known as the Bayes factor,
$$BF(x) = \frac{f_H(x)}{f_A(x)}. \tag{9}$$
To define a confidence index, an alternative to the usual p-value, it is necessary to establish an ordering of all the points in the sample space. Montoya-Delgado et al. [] suggest the use of the Bayes factor values of all sample points to induce the necessary order. García-Donato and Chen [] use a similar ordering of the sample space on the way to calculating Type-I and Type-II error probabilities for Bayes factor tests like those of Jeffreys [] under a specific symmetry condition on the sampling distribution of the Bayes factor. Gu, Hoijtink, and Mulder [] apply a similar condition, essentially holding the probabilities of the two types of error to be equal via tuning of the Bayes factor for a “Bayesian t-test” using a specific kind of prior. Both of these approaches continue to use the comparison of a Bayes factor to fixed values, such as those in the table presented by Jeffreys [] and the updated table presented by Kass and Raftery [], to choose from competing hypotheses. The new hypothesis tests presented here adopt a criterion for choosing which hypothesis to reject that is more like the one used in familiar Neyman–Pearson testing, but with the advantage that the significance level is adaptive, that is, depends on the sample size.
The steps to perform a hypothesis test are as follows:
- Define a prior density over the entire parameter space. This function can be chosen either objectively or subjectively.
- Clearly define the hypotheses to be tested, H and A.
- Obtain the predictive functions under the two alternative hypotheses. In the case for which the parametric subspaces defined by the hypotheses are of different dimensionalities, the definition of a prior density under the subset of smaller dimension, say H, is obtained from the following expression, subject to the condition (on the parameter space as a whole and the hypotheses) that the integral in the denominator can be defined:
$$g_H(\theta) = \frac{g(\theta)}{\int_{\Theta_H} g(\vartheta)\, ds(\vartheta)}, \qquad \theta \in \Theta_H. \tag{10}$$
The denominator is the surface integral over the subspace $\Theta_H$. When $\Theta_H$ consists of a single point, there is no need to perform the integral. In the case of hypotheses of different dimensionalities, define an additional positive probability $\pi$ that H is the true hypothesis. Figure 1 illustrates how $g_H$ is obtained from the prior over the full parameter space.
- Define the loss function, considering mainly the relative importance of the hypotheses and of the two types of error. Consider, for example, a governor who is concerned more with the budget than with public health and who will strongly prefer the hypothesis that the apparent wave of meningitis cases in his state does not represent an epidemic.
- Use the Bayes factor to order the sample space: $BF(x)$ establishes the order of each $x \in \mathcal{X}$. This ordering can be used independently of the dimensionalities of the spaces involved.
- Using the theorem above, compute the optimal averaged error probabilities $\alpha^*$ and $\beta^*$ and use the value of $\alpha^*$ as the adaptive level of significance, which will depend on the loss function, the probability densities, the prior probability $\pi$, and especially on the sample size.
- Calculate the significance index, the P-value, as follows: if $x_{\mathrm{obs}}$ is the observed value of the statistic and $\Psi_{\mathrm{obs}} = \{x : BF(x) \le BF(x_{\mathrm{obs}})\}$ is the observed tail under the new ordering, the P-value is calculated using the expression $P = \int_{\Psi_{\mathrm{obs}}} f_H(x)\, dx$. Clearly, this may be a single or a multiple integral or sum.
- Compare the P-value with the value of $\alpha^*$. Reject (do not reject) H if $P < \alpha^*$ ($P > \alpha^*$). In the case of equality, take either decision without prejudice to optimality.
- Finally, if a value of $\alpha$ is specified a priori, calculate the sample size needed to make this fixed value as close as possible to optimal according to the generalized Neyman–Pearson Lemma.
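The steps above can be sketched for a finite sample space as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `adaptive_test`, the dictionary representation of the predictive functions, and the one-proportion example (H: θ = 1/2 against A: θ ≠ 1/2, 20 binomial trials, uniform prior under A, 5 observed successes) are all hypothetical choices made here. The weights `a` and `b` stand for the coefficients of the Type-I and Type-II error probabilities, and the rejection set uses Bayes factors smaller than or equal to the cutoff.

```python
from math import comb


def adaptive_test(f_H, f_A, x_obs, a=1.0, b=1.0):
    """Run the Bayes-factor ordering procedure on a finite sample space.

    f_H, f_A: dicts mapping each sample point to its predictive probability
    under H and under A. a, b: weights of the Type-I and Type-II errors.
    """
    K = b / a                                     # cutoff for the Bayes factor
    bf = {x: f_H[x] / f_A[x] for x in f_H}        # ordering of the sample space
    alpha = sum(f_H[x] for x in bf if bf[x] <= K)           # adaptive level
    beta = sum(f_A[x] for x in bf if bf[x] > K)             # Type-II error
    p_value = sum(f_H[x] for x in bf if bf[x] <= bf[x_obs])  # observed tail
    return {"bf_obs": bf[x_obs], "K": K, "alpha": alpha, "beta": beta,
            "p_value": p_value, "reject": p_value <= alpha}


# Hypothetical illustration: binomial count with n = 20 trials,
# H: theta = 1/2 against A: theta != 1/2, uniform prior on theta under A.
n = 20
f_H = {x: comb(n, x) / 2**n for x in range(n + 1)}   # binomial(n, 1/2)
f_A = {x: 1 / (n + 1) for x in range(n + 1)}         # uniform predictive
result = adaptive_test(f_H, f_A, x_obs=5)
print(result)
```

Because the P-value and the adaptive level are both sums of the predictive under H over sets defined by the same ordering, comparing them amounts to comparing the observed Bayes factor with the cutoff.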
We emphasize that it does not matter how the prior over the entire parameter space is chosen. The present work is concerned with how to perform the new hypothesis tests once an overall prior has been chosen.
3. Illustrative Examples
This section introduces four simple examples to illustrate the use of the new P-value and how the adaptive significance level varies with sample sizes.
3.1. Example 1—Comparing Two Proportions
A doctor wants to show that the incorporation of a new technology in a treatment can produce better results than the conventional treatment. He plans a clinical trial with two arms, case and control, each with eight patients. The case arm receives the new treatment and the control arm receives the conventional one. Details of a clinical trial of this kind are shown in []. The observed results in this example are that only one of the patients in the control arm responded positively, but in the case arm there were four positive outcomes.
The most common classical significance tests result in the following p-values: the Pearson χ² p-value is 0.106, changed to 0.281 with the Yates continuity correction applied, and Fisher’s exact p-value is 0.282. Traditional analysts would conclude that there were no statistically significant differences between the two treatments, using any of the canonical significance levels. Note that these procedures were for testing a sharp hypothesis against a composite alternative: comparing the proportion of success of the two treatments. In what follows, we calculate the proposed P-value and use the optimal significance level to make the decision of choosing one of the hypotheses.
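To be fair in our comparisons, we consider independent uniform (non-informative) prior distributions for the two success probabilities, $\theta_1$ and $\theta_2$. With these suppositions and the likelihoods being binomials with sample sizes n = 8, the predictive probability functions under the two hypotheses, $H: \theta_1 = \theta_2$ and $A: \theta_1 \ne \theta_2$, are
$$f_H(x, y) = \binom{8}{x}\binom{8}{y} \int_0^1 \theta^{x+y} (1 - \theta)^{16 - x - y}\, d\theta = \frac{\binom{8}{x}\binom{8}{y}}{17 \binom{16}{x+y}} \quad \text{and} \quad f_A(x, y) = \frac{1}{81}. \tag{11}$$
The variables $x$ and $y$ represent the possible observed values of the number of positive outcomes in the two arms. Table 1 and Figure 2 present the Bayes factors for all possible results.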
Table 1.
Bayes factor for all possible results in a clinical trial with arms size of n = 8.
Figure 2.
Bayes factors of all possible results in a clinical trial with arms size of n = 8 each.
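To obtain the proposed P-value, define the set of sample points for which the Bayes factors are smaller than or equal to the Bayes factor of the observed sample point; i.e.,
$$\Psi_{\mathrm{obs}} = \{(x, y) : BF(x, y) \le BF(x_{\mathrm{obs}}, y_{\mathrm{obs}})\}. \tag{12}$$
Thus, the significance index, the P-value, is the sum of all predictive probabilities (under H) in $\Psi_{\mathrm{obs}}$:
$$P = \sum_{(x, y) \in \Psi_{\mathrm{obs}}} f_H(x, y). \tag{13}$$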
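Recalling the observed result of the clinical trial, one and four positive outcomes in the two arms, the observed Bayes factor is $BF \approx 0.611$. The italic-bold cells in Table 1 identify the set $\Psi_{\mathrm{obs}}$ of sample points whose Bayes factors do not exceed the observed one. Thus, according to Equation (13), the P-value is $P \approx 0.0923$.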
To obtain the optimal solution we minimize the sum of the error probabilities, $\alpha + \beta$. The two error types are considered to be of the same severity in this example. The optimal solution is the result of comparing the Bayes factor with the constant $K$ as defined in Equation (7) to make the choice according to the extended Neyman–Pearson Lemma. Defining the set of sample space points with Bayes factors smaller than or equal to K, i.e., $\Psi^* = \{(x, y) : BF(x, y) \le K\}$, the optimal Type I and Type II errors are given by
$$\alpha^* = \sum_{(x, y) \in \Psi^*} f_H(x, y) \tag{14}$$
and
$$\beta^* = \sum_{(x, y) \notin \Psi^*} f_A(x, y). \tag{15}$$
In this example, we consider the two hypotheses to be equally probable a priori, and represent the equal severity of Type-I and Type-II errors by $a = b$, resulting in $K = 1$. The set $\Psi^*$ is identified by the red cells in Table 1. From Equations (14) and (15), we obtain the optimal adaptive level of significance $\alpha^* \approx 0.1245$ and the probability of a Type-II error $\beta^* \approx 0.4815$. The high value of the probability of the second kind of error is expected whenever the sample sizes are small. Contrary to the classical results, the conclusion now is the most intuitive one; the null hypothesis is rejected since $P < \alpha^*$.
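The calculations of this example can be reproduced with a short script. The sketch below is an illustration under assumptions rather than the authors' code: the function name `two_proportion_test` is hypothetical, and the closed forms used for the predictive functions (a beta-binomial-type expression under H and a constant under A) are derived here from the stated assumptions of independent uniform priors and binomial likelihoods. Exact rational arithmetic is used so that ties between Bayes factors are handled exactly.

```python
from fractions import Fraction
from math import comb


def two_proportion_test(n1, n2, x_obs, y_obs, a=1, b=1):
    """Bayes-factor ordering for H: theta1 = theta2 under uniform priors."""
    K = Fraction(b) / Fraction(a)
    total = n1 + n2

    def f_H(x, y):
        # Predictive under H: common theta integrated against a uniform prior.
        return Fraction(comb(n1, x) * comb(n2, y),
                        (total + 1) * comb(total, x + y))

    f_A = Fraction(1, (n1 + 1) * (n2 + 1))   # predictive under A is constant
    bf_obs = f_H(x_obs, y_obs) / f_A
    alpha = beta = p_value = Fraction(0)
    for x in range(n1 + 1):
        for y in range(n2 + 1):
            bf = f_H(x, y) / f_A
            if bf <= K:
                alpha += f_H(x, y)      # point in the rejection set
            else:
                beta += f_A             # point in the non-rejection set
            if bf <= bf_obs:
                p_value += f_H(x, y)    # point in the observed tail
    return {"bf_obs": float(bf_obs), "alpha": float(alpha),
            "beta": float(beta), "p_value": float(p_value),
            "reject": p_value <= alpha}


# Two arms of eight patients; one and four positive outcomes were observed.
example_1 = two_proportion_test(8, 8, 1, 4)
print(example_1)
```

The same function accepts other arm sizes and observed counts, so it can also be used for the exercise of repeating the calculations with different samples.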
The physician, owner of the data in Example 1, looking at our analysis, asked about the sample size needed to obtain at most a level of significance for our procedure. The answer could be obtained from the next example, which shows the case of two arms with 20 patients each.
3.2. Example 2—Two Proportions, Varying Sample Sizes
Consider now a clinical trial as in Example 1, but with an arm size of . The observed result is. We leave to the reader the simple exercise of repeating the calculations of Example 1 with different samples. Consider independent uniform (non-informative) prior distributions for and take the two hypotheses to have equal prior probabilities and the two types of error to have the same relative severity, . The predictive probability functions under hypotheses are
and the observed Bayes factor is , which leads to the following results: significance index ; optimal adaptive level of significance ; and the probability of a Type-II error . The classical χ2 -value is, indicating rejection of the null hypothesis at the canonical level of significance. This agrees with our decision of rejecting the null hypothesis since again. It is interesting to see the relative distance between the index and the level of significance. For the χ2 test, we have and the adaptive case obtains .
Figure 3 presents the optimal adaptive level of significance and the Type-II error by sample size. As expected, the probabilities of both kinds of errors decrease when the sample size increases.
Figure 3.
Type-I and Type-II error probabilities as functions of the sample size n in each arm.
The response to the question about the sample size needed to obtain a significance level of at most is in each arm. For a level of at most, we need a sample size of in each arm.
Optimal adaptive significance levels and Type-II error probabilities for different arm sizes, and are presented in Table 2. With a fixed total sample size, an unbalanced sample can have larger (both Type-I and Type-II) errors than a balanced sample. The greater the imbalance of the sample, the greater the averaged error probabilities is. For example, the error probabilities of an unbalanced sample with and is larger than a balanced sample with n1 = n2 = 20 (Table 2), despite the unbalanced sample having a total size of and the balanced sample just 40.
Table 2.
Optimal levels of significance ($\alpha^*$) and Type-II error probabilities ($\beta^*$) for two proportions: Two independent binomial likelihoods and various sample sizes.
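The dependence of the optimal error probabilities on the arm sizes can be explored numerically. The following sketch assumes, as in the examples, independent uniform priors, equal prior probabilities, and equal severity of the two error types (cutoff K = 1 by default); the function name `optimal_errors` and the particular arm sizes in the loop, including the unbalanced pair (10, 30), are hypothetical choices for illustration and are not the entries of Table 2.

```python
from fractions import Fraction
from math import comb


def optimal_errors(n1, n2, a=1, b=1):
    """Return (alpha*, beta*) for H: theta1 = theta2 with uniform priors."""
    K = Fraction(b) / Fraction(a)
    total = n1 + n2
    f_A = Fraction(1, (n1 + 1) * (n2 + 1))   # predictive under A is constant
    alpha = beta = Fraction(0)
    for x in range(n1 + 1):
        for y in range(n2 + 1):
            f_H = Fraction(comb(n1, x) * comb(n2, y),
                           (total + 1) * comb(total, x + y))
            if f_H / f_A <= K:
                alpha += f_H     # Bayes factor at or below the cutoff: reject
            else:
                beta += f_A      # Bayes factor above the cutoff: do not reject
    return float(alpha), float(beta)


for n1, n2 in [(8, 8), (20, 20), (50, 50), (10, 30)]:
    alpha, beta = optimal_errors(n1, n2)
    print(f"n1={n1:3d}  n2={n2:3d}  alpha*={alpha:.4f}  beta*={beta:.4f}")
```

Scanning a range of arm sizes with this function is one way to look for the smallest sample whose adaptive level falls below a prescribed value.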
Pericchi and Pereira [] present a closed asymptotic formula that relates sample size and significance level in the simple case of testing in a binomial with parameters. A natural future project is to find this type of relation in other complex statistical problems such as the one presented in the above examples.
The following example is an attempt to show that our P-value should not violate the likelihood principle. Recall that violation of this principle has produced some of the Bayesian community’s main criticisms of the classical p-values.
3.3. Example 3—Test for One Proportion and the Likelihood Principle
A common example in which the likelihood principle can be violated is the case of binomials compared to negative binomials. For the same values of x, the number of successes in n independent Bernoulli trials, the two distributions produce different -values that can lead to different decisions if compared with the same level of significance. The present example shows that the new test introduced here will produce identical decisions if the observed sample size and the number of successes are the same. The proof that this is the case in general for the new tests is presented as Appendix A to this article. The reason the decisions end up being the same for different models is that, although the -values for the different models are different from each other, they are compared to different significance levels. The decision about the null hypothesis ends up being the same, so there is no violation of the likelihood principle. Changing the notation, let the sample vector be composed of the number of success and the number of failures, and the corresponding vector of probabilities be Take as the hypotheses to be tested. Taking a uniform (non-informative) prior distribution for and taking the two hypotheses to be equally probable a priori and the two types of error to have equal relative severity, , the predictive densities needed for the significance tests are as follows:
- for a (positive) binomial,
- for a negative binomial,
Clearly, the Bayes factors, as defined by Equation (9), are equal for the two models, and since using the lemma will lead to comparing them to the same constant, the decisions about the null hypothesis end up being the same. Note that both the -values and the significance levels are different for the two models. For instance, if we consider the observations for a positive binomial, we obtain the same results for both samples; . For the negative binomial, the two observed points will produce different significance levels and probabilities of both kinds of errors. For the first (second) sample, one stops observing whenever the number of successes reaches 3 Equation (11). For the first result, we have, and; for the second. Therefore, the decisions based on positive binomials are the same as the ones based on negative binomials for the same.
Table 3 presents the predictive densities under several kinds of hypotheses for one proportion. For all kinds of hypotheses, positive and negative binomial models, for the same , produce equal Bayes factors.
Table 3.
Predictive densities under several hypotheses for one proportion.
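The equality of the Bayes factors under the two sampling models can be checked directly. The sketch below makes assumptions that are not taken from the article: a sharp null hypothesis θ = 1/2 against a uniform prior on θ under the alternative, and two hypothetical data sets (3 and 10 successes in 13 trials). The function names are also hypothetical. The point it illustrates is that the combinatorial constant of each model appears in both predictive functions and cancels in the ratio.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(p, q):
    """Beta function B(p, q) for positive integers p and q."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))


def bf_binomial(x, n, theta0):
    """Bayes factor when n is fixed in advance and x successes are observed."""
    c = comb(n, x)
    f_H = c * theta0**x * (1 - theta0)**(n - x)
    f_A = c * beta_int(x + 1, n - x + 1)      # uniform prior integrated out
    return f_H / f_A


def bf_negative_binomial(x, n, theta0):
    """Bayes factor when sampling stops at the x-th success, after n trials."""
    c = comb(n - 1, x - 1)                    # the last trial is a success
    f_H = c * theta0**x * (1 - theta0)**(n - x)
    f_A = c * beta_int(x + 1, n - x + 1)
    return f_H / f_A


theta0 = Fraction(1, 2)                       # assumed sharp null value
for x, n in [(3, 13), (10, 13)]:              # hypothetical observations
    print(x, n, bf_binomial(x, n, theta0), bf_negative_binomial(x, n, theta0))
```

Since both models lead to the same Bayes factor and the lemma compares it with the same constant, the decision about the null hypothesis is the same, even though the P-values and adaptive levels differ between the models.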
3.4. Example 4
This is an example used by Pereira and Wechsler [], showing that the critical region is not always the tails of the null distribution; it can be a union of disjoint intervals. In such cases, it can be impossible to calculate a classical p-value, but the ordering of the entire sample space by Bayes factors allows for an unambiguous definition and calculation of the new index, a P-value.
Let x be a normal random variable with zero mean and unknown variance $\sigma^2$. The hypotheses are $H: \sigma^2 = 2$ versus $A: \sigma^2 \ne 2$. A $\chi^2_1$ (chi-squared distribution with one degree of freedom) is taken as a prior density for the precision $1/\sigma^2$. After an integration exercise, we can establish the predictive densities for our significance test as
$$f_A(x) = \frac{1}{\pi (1 + x^2)} \quad \text{and} \quad f_H(x) = \frac{1}{2\sqrt{\pi}} \exp\left(-\frac{x^2}{4}\right).$$
These are, respectively, a Cauchy density and a normal density with zero mean and variance 2. Figure 4 shows the Bayes factor for all sample points, using the constant 1.1 as a cutoff for the decision about the null hypothesis. The sample points that do not favor the null hypothesis are a central region together with the heavy tails of the Cauchy density. The set that favors H, $\{x : BF(x) > 1.1\}$, does not include the central region.
Figure 4.
Bayes factor for N(0, 2) vs. Cauchy.
The set favoring the alternate hypothesis includes the interval a considerable central region.
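The shape of the two regions can be located numerically. The sketch below takes the predictive under H to be the normal density with zero mean and variance 2 and the predictive under A to be the standard Cauchy density, with the cutoff 1.1 quoted in the text; the helper `bisect` and the search brackets are choices made here for illustration. It uses the fact that the ratio of the two densities is symmetric about zero and, on the positive half-line, increases up to $x = \sqrt{3}$ and decreases afterwards.

```python
from math import exp, pi, sqrt


def f_H(x):
    """Normal density with zero mean and variance 2."""
    return exp(-x * x / 4) / sqrt(4 * pi)


def f_A(x):
    """Standard Cauchy density."""
    return 1 / (pi * (1 + x * x))


def bayes_factor(x):
    return f_H(x) / f_A(x)


def bisect(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo_positive = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


K = 1.1
peak = sqrt(3)   # maximizer of the Bayes factor on the positive half-line
inner = bisect(lambda x: bayes_factor(x) - K, 0.0, peak)
outer = bisect(lambda x: bayes_factor(x) - K, peak, 10.0)
print(f"BF(0) = {bayes_factor(0):.4f}, BF(sqrt(3)) = {bayes_factor(peak):.4f}")
print(f"H is favored for {inner:.4f} < |x| < {outer:.4f}")
```

The set favoring A is then the union of the central interval $|x| < $ `inner` and the two tails $|x| > $ `outer`, a critical region that is not made of tails alone.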
4. Final Remarks
It is worth noting that there are multiple ways to understand our new test, and we would like to present a specific vision. Consider a statistical model, with a family of probability functions indexed by denoted by with all necessary conditions imposed for all relevant mathematical objects to be well-defined. If λ is a function of, one can simply write, because the sub-σ-algebra defined by the new parameter λ is contained in the one defined by the original parameter. Given a prior density for the original parameter
If the new parameter λ is a binary function (produces only values 0 and 1), then the two predictive probability functions are . These functions are averages, weighted by of the likelihood function. The original parameter has been removed as a “nuisance”, leaving only the new parameter representing the decision. Because the new parameter is binary, hypotheses involving it are simple-versus-simple, so the generalized Neyman–Pearson Lemma applies. Our procedure can be seen as elimination of a nuisance parameter for the application of optimization. We refer to Basu [] for elimination of nuisance parameters when the parameter spaces are variation dependent.
For decades, and increasingly in recent years, users of statistics have been questioning the logic of using the canonical significance levels, or indeed, any fixed significance level, for hypothesis testing. We believe that there are no formal reasons for using the established numbers, and that there are in fact good reasons not to fix significance levels a priori. We use the natural logic of optimization to define an adaptive significance level, that is, one that depends on the sample size. Our test using the new index (P-value) and the adaptive significance level is compatible with the likelihood principle, as proved in Appendix A of the present article.
There is still much work to be done, testing different kinds of hypotheses in the parameter spaces of different models, including multivariate problems. We are not aware of any complex model that prevents the use of the hypothesis tests discussed in the present paper. It is hoped that the similarity of the apparatus of the new tests to that of existing Neyman–Pearson tests, plus favorable characteristics of the new tests, will make the new testing procedure useful and popular among investigators in the many fields in which statistical hypothesis testing can be useful.
There is certainly a one-to-one relation between the Bayes factor and the P-value! Hence, after a cut-off for one of them is defined automatically, we have a corresponding cut-off for the other, and there is then a one-to-one correspondence of the pair of error type probabilities between the two methods. Those who prefer to use Bayes factors directly can certainly do so, but they can also take advantage of the cut-off provided by our method.
Acknowledgments
The first and sixth authors are grateful to the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for financial support. CABP grant number 308776/2014-3; AP grant number 304025/2013-5. Our research group, GIS—group of inductive statistics, contributed to this work by discussing and making suggestions. We are very grateful for all the collaboration from these colleagues, especially Fernando Corrêa Filho, Julio Michael Stern, and Sergio Wechsler. The editor and four reviewers of this article engaged in lengthy discussion that helped in sharpening our work. This work is dedicated to the memory of the late Oscar Kempthorne.
Author Contributions
The authors contributed equally to this work. It would be difficult for us to identify what any one author did not contribute.
Conflicts of Interest
The six authors declare no conflict of interest.
Appendix A
It is proved here that the new tests are compatible with the likelihood principle in general.
Imagine two different possible experiments and , where is the discrete sample space for the observable in experiment and is a parametric family of probability functions indexed by the common parameter that is, , . Let be a prior for
Consider the hypotheses , and , with and Let the risks for the two types of errors in making a decision be .
For and let
be the prior predictive probability function for under where is the conditional measure of given i.e., given
In the same way,
is the prior predictive under the alternative hypothesis Define the Bayes factor in favor of by
For let
where is the probability measure associated with the probability mass function
Define
and if the set in this expression is empty, take . Note that
and that, for
Finally, define the test function by
where is the “P-value”, the significance index used in the new test, at sample point :
The conditions for rejection of in each experiment can be rewritten:
Now consider a single observation that could be produced by either experiment, expressed in the respective sample spaces as such that with That is, the likelihood generated by data in experiment differs by a constant (not a function of ) multiplicative factor from the likelihood generated by data in experiment We will prove that that is, that the decision whether or not to reject the hypothesis is the same, regardless of the details of the experiment that produced the observation and considering .
Thus, it has been proven that. The proof of is analogous and is omitted.
References
- Johnson, V.E. Revised standards for statistical evidence. Proc. Natl. Acad. Sci. USA 2013, 110, 19313–19317.
- Gaudart, J.; Huiart, L.; Milligan, P.J.; Thiebaut, R.; Giorgi, R. Reproducibility issues in science, is P value really the only answer? Proc. Natl. Acad. Sci. USA 2014, 111, E1934.
- Gelman, A.; Robert, C.P. Revised evidence for statistical standards. Proc. Natl. Acad. Sci. USA 2014, 111, E1933.
- Pericchi, L.; Pereira, C.A.B.; Pérez, M.E. Adaptive revised evidence for statistical standards. Proc. Natl. Acad. Sci. USA 2014, 111, E1935.
- Wasserstein, R.L.; Lazar, N.A. The ASA’s statement on p-values: Context, process, and purpose. Am. Stat. 2016, 70, 129–133.
- Pericchi, L.R.; Pereira, C.A.B. Adaptive significance levels using optimal decision rules: Balancing by weighting the error probabilities. Braz. J. Probab. Stat. 2016, 30, 70–90.
- Benjamin, D.; Berger, J.; Johannesson, M.; Nosek, B.A.; Wagenmakers, E.-J.; Berk, R.; Bollen, K.A.; Brembs, B.; Brown, L.; Camerer, C.; et al. Redefine statistical significance. Nat. Hum. Behav. 2017.
- Nature News. Big Names in Statistics Want to Shake up Much-Maligned P Value. Available online: https://www.nature.com/articles/d41586-017-02190-5?WT.mc_id=TWT_NatureNews&sf101140733=1 (accessed on 28 August 2017).
- Pereira, C.A.B.; Stern, J.M. Evidence and credibility: A full Bayesian test of precise hypotheses. Entropy 1999, 1, 104–115.
- Madruga, M.R.; Pereira, C.A.B.; Stern, J.M. Bayesian evidence test for precise hypotheses. J. Stat. Plan. Inference 2002, 117, 185–198.
- Pereira, C.A.B.; Stern, J.M.; Wechsler, S. Can a significance test be genuinely Bayesian? Bayesian Anal. 2008, 3, 79–100.
- Stern, J.M.; Pereira, C.A.B. Bayesian epistemic values: Focus on surprise, measure probability! Log. J. IGPL 2013, 22, 236–254.
- Chakrabarty, D. A New Bayesian Test to Test for the Intractability-Countering Hypothesis. J. Am. Stat. Assoc. 2017, 112, 561–577.
- Diniz, M.A.; Pereira, C.A.B.; Polpo, A.; Stern, J.M.; Wechsler, S. Relationship between Bayesian and frequentist significance indices. Int. J. Uncertain. Quantif. 2012, 2, 161–172.
- Pereira, C.A.B.; Wechsler, S. On the concept of p-value. Braz. J. Probab. Stat. 1993, 7, 159–177.
- Pereira, C.A.B. Testing Hypotheses of Different Dimensions: Bayesian View and Classical Interpretation. Professor Thesis, Institute Mathematics & Statistics, USP, Sao Paulo, Brazil, 1985. (In Portuguese).
- Irony, T.Z.; Pereira, C.A.B. Bayesian hypothesis test: Using surface integrals to distribute prior information among the hypotheses. Resenhas 1995, 2, 27–46.
- Montoya-Delgado, L.E.; Irony, T.Z.; Pereira, C.A.B.; Whittle, M.R. An unconditional exact test for the Hardy-Weinberg equilibrium law: Sample space ordering using the Bayes factor. Genetics 2001, 158, 875–883.
- DeGroot, M.H. Probability and Statistics; Addison-Wesley: Boston, MA, USA, 1986.
- Dawid, A.P.; Lauritzen, S.L. Compatible Prior Distributions. In Bayesian Methods with Applications to Science Policy and Official Statistics; Monographs of Official Statistics; EUROSTAT: Luxembourg, 2001; pp. 109–118.
- Dickey, J.M. The weighted likelihood ratio, linear hypotheses on normal location parameters. Ann. Math. Stat. 1971, 42, 204–223.
- Cox, D.R. The role of significance tests (with discussions). Scand. J. Stat. 1977, 4, 49–70. [Google Scholar]
- Cox, D.R. Principles of Statistical Inference; Cambridge University Press: New York, NY, USA, 2006. [Google Scholar]
- Evans, M. Measuring statistical evidence using relative belief. Comput. Struct. Biotechnol. J. 2016, 14, 91–96. [Google Scholar] [CrossRef] [PubMed]
- Lindley, D.V. A Statistical Paradox. Biometrika 1957, 44, 187–192. [Google Scholar] [CrossRef]
- Bartlett, M.S. A comment on D.V. Lindley’s statistical paradox. Biometrika 1957, 44, 533–534. [Google Scholar] [CrossRef]
- Cornfield, J. Sequential trials, sequential analysis and the likelihood principle. Am. Stat. 1966, 20, 18–23. [Google Scholar]
- Neyman, J.; Pearson, E.S. On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. A Math. Phys. Charact. 1933, 231, 289–337. [Google Scholar] [CrossRef]
- García-Donato, G.; Chen, M.-H. Calibrating Bayes factor under prior predictive distributions. Stat. Sin. 2005, 15, 359–380. [Google Scholar]
- Jeffreys, H. The Theory of Probability; The Clarendon Press: Oxford, UK, 1935. [Google Scholar]
- Gu, X.; Hoijtink, H.; Mulder, J. Error probabilities in default Bayesian hypothesis testing. J. Math. Psychol. 2016, 72, 140–143. [Google Scholar] [CrossRef]
- Kass, R.E.; Raftery, A.E. Bayes Factors. JASA 1995, 90, 773–795. [Google Scholar] [CrossRef]
- Lopes, A.C.; Greenberg, B.D.; Canteras, M.M.; Batistuzzo, M.C.; Hoexter, M.Q.; Gentil, A.F.; Pereira, C.A.B.; Joaquim, M.A.; de Mathis, M.E.; D’Alcante, C.C.; et al. Gamma Ventral Capsulotomy for Obsessive-Compulsive Disorder: A Randomized Clinical Trial. JAMA Psych. 2014, 71, 1066–1076. [Google Scholar] [CrossRef] [PubMed]
- Basu, D. On the elimination of nuisance parameters. JASA 1977, 72, 355–366. [Google Scholar] [CrossRef]
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).