Abstract
The Non-Informative Nuisance Parameter Principle concerns the problem of how inferences about a parameter of interest should be made in the presence of nuisance parameters. The principle is examined in the context of the hypothesis testing problem. We prove that the mixed test obeys the principle for discrete sample spaces. We also show how adherence of the mixed test to the principle can greatly simplify the performance of the test. These findings are illustrated with new solutions to well-known problems of testing hypotheses for count data.
1. Introduction
Principles of Statistical Inference (or Data Reduction) constitute important guidelines on how to draw conclusions from data, especially when performing standard inferential procedures for unknown parameters of interest, such as estimation and hypothesis testing. For instance, the Sufficiency Principle (SP) states that any sufficient statistic retains all relevant information about the unknown parameters that should be used to make inferences about them. More precisely, it recommends that if T is a sufficient statistic for the statistical model under consideration and x and y are sample points such that T(x) = T(y), then the observation of either of these points should lead to the same conclusions regarding the parameters of interest.
Besides the place of sufficiency in Statistical Inference, these recommendations cover several issues such as the contrast between post-experimental and pre-experimental reasoning and the roles of non-informative stopping rules, censoring mechanisms and nuisance parameters in data analysis. Among the main principles, the Sufficiency Principle is generally recognized as a cornerstone of Statistical Inference. On the other hand, the Likelihood Principle (LP) and its profound consequences are still subjects of intense debate. The reader will find a detailed discussion of the Likelihood Principle in [1,2,3,4,5,6].
In this work, we examine the Non-Informative Nuisance Parameter Principle (NNPP), introduced by Berger and Wolpert in their remarkable 1988 book [1]; the principle concerns how inferences about a parameter of interest should be made in the presence of nuisance parameters. Nuisance parameters usually affect inferences about the parameter of interest, as in the estimation of the mean of a normal distribution with unknown variance, the estimation of the parameters of a linear regression model in the presence of unknown variance, and the determination of p-values for specific hypotheses in the analysis of contingency tables ([7]). In a few words, the NNPP states that, under suitable conditions, it is irrelevant whether the value of a non-informative nuisance parameter is known or not when drawing conclusions about the parameter of interest. Despite the importance of the problem of eliminating nuisance parameters in data analysis, this principle and its consequences have not been explored in much depth, as far as we have reviewed the literature. For this reason, we revisit the NNPP by formally stating it for the problem of hypothesis testing, present decision rules that meet the principle and show how the performance of a particular test in line with the NNPP can then be simplified.
This work is organized as follows: in Section 2, the NNPP for hypothesis testing is stated, discussed and illustrated under a Bayesian perspective. In Section 3, the Bayesian test procedure based on the concept of adaptive significance level and on an alternative p-value introduced by Pericchi and Pereira in [8], henceforth named the mixed test, is reviewed and proven to satisfy the NNPP for discrete sample data when the (marginal) null hypothesis regarding the parameter of interest is a singleton (as a matter of fact, the result also holds when such a null hypothesis is specified by a hyperplane). In that section, we also define conditional versions of the adaptive significance level and the p-value based on suitable statistics and prove that, under those conditions, the performance of the mixed test reduces to the comparison of these new conditional quantities. These results considerably simplify the use of the mixed test in various situations. In Section 4, we exemplify the main results by presenting new mixed-test solutions to well-known problems of tests of hypotheses for count data under suitable reparametrizations of the corresponding models: we revisit the problems of comparison of Poisson population means and of testing the hypotheses of independence and symmetry in contingency tables. We make our final comments in Section 5. The proofs of the theorems and the calculations for one example in Section 4 are found in Appendix A.
2. The Non-Informative Nuisance Parameter Principle for Hypothesis Testing
The problem of the elimination of nuisance parameters in statistical inference has a long history and remains a major issue. Proposals to deal with it include the marginalization of the likelihood function by integrating out the nuisance parameter ([9,10,11]), the construction of partial likelihood functions ([12,13,14], among others) and the consideration of conditional likelihood functions based on different notions of non-informativeness, sufficiency and ancillarity. Elimination of nuisance parameters and different notions of non-information have also been studied in more detail in [15,16,17,18], where, based on suitable statistics, the concepts of B, S and G non-information are presented. The generalized Sufficiency and Conditionality Principles are also discussed in [17]. On the other hand, Bayesian methods for eliminating nuisance parameters based on a suitable statistic T involve different definitions of sufficiency: for instance, K-Sufficiency, Q-Sufficiency and L-Sufficiency (see for example [17] and references therein).
In this section, the Non-Informative Nuisance Parameter Principle (NNPP) by Berger and Wolpert is discussed and formally defined for the problem of hypothesis testing. As we will see, on the one hand, the NNPP seems reasonable under both the partial-likelihood and the conditional non-Bayesian approaches mentioned in the previous paragraph; on the other hand, it is quite natural from the Bayesian standpoint. Despite the relevance of the problem of the elimination of nuisance parameters in data analysis, Berger and Wolpert [1] presented the NNPP but did not explore the principle in depth, as far as we have examined the literature.
Some notation is needed to continue. We denote by θ the unknown parameter and by X the sample to be observed; Θ and 𝒳 represent the parameter and the sample spaces, respectively. The family of discrete probability distributions for X is denoted by 𝒫. In addition, for x ∈ 𝒳, L_x denotes the likelihood function for θ generated by the sample point x. By an experiment E, we mean, as in [1], a triplet E = (X, Θ, 𝒫), with X, Θ and 𝒫 as defined earlier. Finally, for a subset Θ_H of Θ, we formulate the null hypothesis H: θ ∈ Θ_H and the alternative one A: θ ∉ Θ_H. We recall that a test function (procedure) φ for the hypotheses H versus A is a function φ: 𝒳 → {0, 1} that takes the value 1 (φ(x) = 1) if H is rejected when x is observed and takes the value 0 (φ(x) = 0) if H is not rejected when x is observed. Under the Bayesian perspective, we also consider a continuous prior density function π for θ that induces, when combined with the likelihood function L_x, a continuous posterior density function for θ given x, π(· | x).
In [1], Berger and Wolpert presented the following principle on how to make inferences about an unknown parameter of interest, θ₁, in the presence of a nuisance parameter, θ₂: when a sample observation, say x, separates information concerning θ₁ from information on θ₂, it is irrelevant whether the value of θ₂ is known or unknown in order to make inferences about θ₁ based on the observation of x. In other terms, if the conclusions on θ₁ would be the same for every possible value of the nuisance parameter, were θ₂ known, then the same conclusions on θ₁ should be reached even if θ₂ is unknown. These authors then consider the following mathematical setup to formalize these ideas.
Let θ = (θ₁, θ₂), with X and Θ defined as in the previous paragraph. Consider Θ = Θ₁ × Θ₂; that is, the parameter space is variation independent, where Θᵢ is the set of values of θᵢ, i = 1, 2. Suppose the experiment E is carried out to learn about θ₁. Let E* be the “thought” experiment in which the pair (X, θ₂) is to be observed (instead of observing only X), where the corresponding family of distributions for (X, θ₂) is indexed by θ. Suppose also that under experiment E*, the likelihood function generated by a specific (x, θ₂) for θ has the following factored form:
where the first factor depends only on θ₁ and the second only on θ₂; that is, the likelihood depends on θ₁ only through the first factor.
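In generic notation (ours, not necessarily that of Equation (1)), writing θ = (θ₁, θ₂) for the parameter of interest θ₁ and the nuisance parameter θ₂, a factorization of the kind required by (1) can be sketched as follows; the superscripted factor labels below are introduced only for this sketch:

```latex
% Sketch of a likelihood with separable parameters (generic notation):
L_{x}(\theta_1, \theta_2) \;=\; L^{(1)}_{x}(\theta_1)\, L^{(2)}_{x}(\theta_2),
\qquad (\theta_1, \theta_2) \in \Theta_1 \times \Theta_2 .
```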
Berger and Wolpert then state the Non-Informative Nuisance Parameter Principle (NNPP): if x and θ₂ are such that (1) holds, and if the inference about θ₁ drawn from the observation of (x, θ₂) when E* is performed does not depend on θ₂, then the inferential statements made for θ₁ from E and x should be the same as (should coincide with) the inferential statements made from E* and (x, θ₂), for every such θ₂.
The authors named such a parameter θ₂ a Non-Informative Nuisance Parameter (NNP), as the conclusions or decisions regarding θ₁ drawn from E* and (x, θ₂) do not depend on θ₂.
A likelihood function that satisfies (1) is named a likelihood function with separable parameters ([19]). The factored form of the likelihood function in (1) seems to capture the notion of “absence of information about one parameter, say θ₁, from the other, θ₂, and vice versa” under both Bayesian and non-Bayesian reasoning. Indeed, under the Bayesian paradigm, posterior independence between θ₁ and θ₂ given x reflects the fact that one's opinion about the parameter θ₁ after observing x is not altered by any information about θ₂, and consequently, decisions regarding θ₁ should not depend on θ₂. Since, under prior independence, posterior independence between θ₁ and θ₂ given x is equivalent to the factored form of the likelihood function generated by x, condition (1) is indeed a reasonable mathematical description of separate information about the parameters. Thus, if a Bayesian statistician is to make inferences regarding a parameter θ₁ in the presence of a nuisance parameter θ₂, it would be ideal for these parameters to be independent a posteriori; that is, for the factored form of the likelihood function to hold. This equivalence is proven in the theorem below.
Theorem 1.
Let E = (X, Θ, 𝒫) be an experiment and π be the prior probability density function for θ = (θ₁, θ₂). Suppose θ₁ is independent of θ₂ a priori. Then, for each x ∈ 𝒳, θ₁ and θ₂ are independent a posteriori given x if, and only if, the likelihood function generated by x has the factored form (1).
On the other hand, condition (1) also seems to be a fair representation of the non-informativeness of one parameter about another under a non-Bayesian perspective. In fact, such a factored form of the likelihood function arises, for instance, when the sample X is conditioned on particular types of statistics that are simple to interpret under non-Bayesian paradigms. Note that for any statistic T, one can write P_θ(X = x) = P_θ(T = T(x)) P_θ(X = x | T = T(x)).
If, in addition, T is a statistic such that its distribution depends on θ only through θ₁ and the conditional distribution of X given T = T(x) depends on θ only through θ₂, the factored form in (1) is easily obtained (such a statistic was named p-sufficient for θ₁ by Basu ([17])). In this situation, all the relevant information on θ₁ is summarized in T, and one can fully make inferences on θ₁ taking into account only the distribution of T, which depends on θ₁ but does not involve θ₂. Similarly, if T is a statistic such that its distribution depends on θ only through θ₂ and the conditional distribution of X given T = T(x) depends on θ only through θ₁, the factored form in (1) also holds. Such a statistic was named s-ancillary for θ₁ by Basu ([17]), and it is somewhat evident that in this case, conclusions on θ₁ should be drawn exclusively from the conditional distribution of X given T = T(x), which does not depend on θ₂. Such a conditional approach to the problem of the elimination of nuisance parameters had already been proposed by Basu ([17]) and, in a sense, is closely related to the NNPP by Berger and Wolpert. The next theorem formally presents these results.
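To make the two cases just described concrete, the following sketch (in our generic notation, with θ = (θ₁, θ₂) and P_θ denoting the sampling distribution of X) spells out how conditioning on T yields the factored form (1):

```latex
% Basic conditioning identity for a discrete model and any statistic T:
P_{\theta}(X = x) \;=\; P_{\theta}\bigl(T = T(x)\bigr)\, P_{\theta}\bigl(X = x \mid T = T(x)\bigr).

% If T is p-sufficient for \theta_1 (marginal of T depends only on \theta_1,
% conditional of X given T depends only on \theta_2):
P_{\theta}(X = x) \;=\; P_{\theta_1}\bigl(T = T(x)\bigr)\, P_{\theta_2}\bigl(X = x \mid T = T(x)\bigr).

% If T is s-ancillary for \theta_1 (marginal of T depends only on \theta_2,
% conditional of X given T depends only on \theta_1), the roles are exchanged:
P_{\theta}(X = x) \;=\; P_{\theta_2}\bigl(T = T(x)\bigr)\, P_{\theta_1}\bigl(X = x \mid T = T(x)\bigr).
```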
Theorem 2.
Let be an experiment in which and Θ is variation independent. Then, if such that T is either p-sufficient or s-ancillary for , then for each , the likelihood function generated by x, can be factored as (1).
In summary, it seems reasonable that inferences about θ₁ and θ₂ can be performed separately under condition (1). Thus, if only θ₁ is of interest, then it seems sensible under (1) to reach the same conclusions on θ₁ when x is observed either by using the whole likelihood function or only the factor that depends on θ₁. That is, it makes sense to disregard the information contained in the other factor and to focus on the factor associated with θ₁. As mentioned in [19], examples of likelihood functions with separable parameters like (1) are rare, but when (1) holds, it is a useful property for Bayesian and non-Bayesian statisticians analyzing statistical data, especially in the presence of nuisance parameters. This fact will be illustrated in Section 3 and Section 4.
We end this section by formally adapting the general NNPP to the special problem of hypothesis testing, in which inference about an unknown parameter consists of deciding whether a statement about the parameter (a statistical hypothesis) should be rejected or accepted by using the observable quantity X.
As before, let E be an experiment, with Θ = Θ₁ × Θ₂. Let E* be the “thought” experiment in which, in addition to X, the nuisance parameter θ₂ is observed. Then, consider the following definition.
Definition 1.
Non-Informative Nuisance Parameter (NNP): Let and be a test for the hypotheses
Then, we say that θ₂ is a Non-Informative Nuisance Parameter (NNP) for testing the above hypotheses by using this test if, for every (x, θ₂) such that (1) holds, the value of the test at (x, θ₂) does not depend on θ₂; that is, it depends only on x.
In a nutshell, Definition 1 tells us something that appears intuitive: if the decision between H and A never depends on θ₂, then θ₂ does not provide any information relevant to that decision. In the following example, we illustrate this idea.
Example 1.
Consider that and the experiment . Let and be the test for the hypotheses
such that the null hypothesis is rejected when the conditional probability of B given x and is small; that is,
where . Suppose, in addition, that and are independent a priori. Let us verify that is an NNP for testing these hypotheses by means of . Let be such that for specific functions and . Then,
where is the prior of , . Thus, we have that
Note from Equation (7) that does not depend on . Thus, is an NNP for testing versus by using .
After defining an NNP, we formally state the Non-Informative Nuisance Parameter Principle (NNPP) for hypothesis testing.
Definition 2.
Non-Informative Nuisance Parameter Principle (NNPP): Let the parameter space be variation independent; that is, Θ = Θ₁ × Θ₂. Consider the experiments E and E*. Let Θ_H ⊆ Θ₁ be the subset of Θ₁ of interest. In addition, let φ and φ* be tests for the hypotheses
respectively.
If θ₂ is an NNP for testing the null hypothesis versus the alternative by using φ*, and (x, θ₂) is such that condition (1) holds, then the decisions should coincide; that is, φ(x) = φ*(x, θ₂).
The NNPP for statistical hypothesis testing says that if one intends to test a hypothesis regarding only the parameter of interest θ₁, it is irrelevant whether θ₂ is known or unknown, provided it is non-informative for such a decision-making problem. More formally, if one wants to test a hypothesis concerning only θ₁ and observes a sample point that separates information on θ₁ from information on θ₂ (that is, (1) holds), then the tests performed under the original experiment E and under the “thought” experiment E* should yield the same decision on the hypothesis whenever θ₂ is non-informative for that purpose.
We should mention that the NNPP can be adapted to any other inferential procedure. However, in this work, we focus on the principle for the problem of hypothesis testing. We conclude this section by proving that tests based on the posterior probabilities of the hypotheses satisfy the NNPP under prior independence.
Example 2
(continuation of Example 1). Consider the conditions of Example 1. Consider and let be the test for the hypotheses
that rejects the null hypothesis H if its posterior probability is small; that is,
Let be such that . We can write the posterior probability on the right-hand side of (11) as
where the last equality follows from Fubini’s Theorem. Hence,
In the next section, we examine a second test procedure that is in line with the NNPP. We review the mixed test introduced by Pericchi and Pereira ([8]) and prove that such a test meets the NNPP for simple hypotheses concerning the parameter of interest. We also show how the adherence of the mixed test to the NNPP can then simplify its use.
3. The Mixed Test Procedure
The mixed test formally introduced in ([8]) is a test procedure that combines elements from both Bayesian and frequentist views. On the one hand, it considers an (intrinsically Bayesian) prior distribution for the parameter from which predictive distributions for the data under the competing hypotheses and Bayes factors are derived. On the other hand, the performance of the test depends on ordering the sample space by the Bayes factor and on the integration of these predictive distributions over specific subsets of the sample space in a frequentist-like manner. The mixed test is an optimal procedure in the sense that it minimizes linear combinations of averaged (weighted) probabilities of errors of decision. It also meets a few logical requirements for multiple-hypothesis testing and obeys the Likelihood Principle for discrete sample spaces despite the integration over the sample space it involves. In addition, the test overcomes several of the drawbacks fixed-level tests have. However, a difficulty with the mixed test procedure is the need to evaluate the Bayes factor for every sample point to order the sample space, which may involve intensive calculations. Properties of the mixed test and examples of application are examined in detail in [8,20,21,22,23,24,25].
Next, we review the general procedure for the performance of the mixed test and then show the test satisfies the NNPP when the hypothesis regarding the parameter of interest is a singleton.
First, we determine the predictive distributions for X under the competing hypotheses H and A, and , respectively. For the null hypothesis , , is determined as follows: for each ,
where denotes the conditional distribution of given . That is, for each , is the expected value of the likelihood function generated by x against . Similarly, for the alternative hypothesis we define
where denotes the conditional distribution of given . From (14) and (15), we obtain the Bayes factor of for the hypothesis H over A as
Finally, the mixed test for the hypotheses H versus A consists in rejecting H when is observed if and only if the Bayes factor is small. That is, for each ,
where the positive constants a and b reflect the decision maker’s evaluation of the impact of the errors of the two types or, equivalently, his prior preferences for the competing hypotheses. A detailed discussion on the specification of such constants is found in [8,20,21,22,23,24,25].
The mixed test can also be defined as a function of a new significance index. That is, (17) can be rewritten as a comparison between such a significance index and a specific cut-off value. These quantities are defined below.
For the mixed test defined in (17), the p-value of the observation is the significance index given by
where . Also, we define the adaptive type I error probability of as
where . Alternatively, is also known as the adaptive significance level of .
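As we understand the construction in [8,20,21], the two quantities just defined can be written, in generic notation for a discrete sample space, roughly as follows; the symbols below are ours, with P(x | H) and P(x | A) denoting the predictive probabilities of (14) and (15), BF the corresponding Bayes factor and a, b the error weights of (17):

```latex
% Sketch of the p-value of an observed point x0, ordering the sample space
% by the Bayes factor BF(x) = P(x | H) / P(x | A):
\mathrm{pv}(x_0) \;=\; \sum_{x \,:\, \mathrm{BF}(x) \le \mathrm{BF}(x_0)} P(x \mid H).

% Sketch of the adaptive significance level: the predictive probability,
% under H, of the rejection region of the mixed test (17):
\alpha \;=\; \sum_{x \,:\, a\,P(x \mid H) \,<\, b\,P(x \mid A)} P(x \mid H).
```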
Pereira et al. [21] proved that the mixed test for the hypotheses
can be written as
Note that the test consists of comparing the p-value with a cut-off value that depends on the specific statistical model under consideration and on the sample size, as opposed to a standard test with a fixed significance level that does not depend on the sample size.
The former does not have some of the disadvantages of the latter, such as inconsistency ([8,26]), lack of correspondence between practical significance and statistical significance ([8,27]) and absence of logical coherence under multiple-hypothesis testing. We continue with the main results of the manuscript.
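For discrete sample spaces, the whole procedure can be sketched in a few lines of code. The function below is our own illustration (not the authors' software): it takes the predictive probabilities P(x | H) and P(x | A), computed beforehand as in (14) and (15) for every sample point carrying non-negligible mass, and returns the adaptive significance level, the p-value of the observed point and the resulting decision, following our reading of (17)-(19).

```python
def mixed_test(pred_H, pred_A, x_obs, a=1.0, b=1.0):
    """Sketch of the mixed test on a finite (or truncated) discrete sample space.

    pred_H, pred_A : dicts mapping each sample point x to P(x | H) and P(x | A).
    a, b           : weights attached to the type I and type II error probabilities.
    Returns (adaptive significance level, p-value of x_obs, reject H?).
    """
    # Bayes factor of each sample point for H against A.
    bf = {x: pred_H[x] / pred_A[x] for x in pred_H}

    # Adaptive significance level: predictive probability, under H, of the
    # region where a * P(x | H) < b * P(x | A), i.e., where BF(x) < b / a.
    alpha = sum(pred_H[x] for x in pred_H if a * pred_H[x] < b * pred_A[x])

    # p-value: predictive probability, under H, of the sample points whose
    # Bayes factor does not exceed the Bayes factor of the observed point.
    pvalue = sum(pred_H[x] for x in pred_H if bf[x] <= bf[x_obs])

    # For points with positive predictive mass under H, rejecting H when the
    # p-value does not exceed the adaptive significance level corresponds to
    # rejecting H when the Bayes factor of the observed point is below b / a.
    return alpha, pvalue, pvalue <= alpha
```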
The Mixed Test Obeys the NNPP
In this subsection, we prove that the mixed test meets the NNPP when the hypothesis about the parameter of interest is simple. Next, we examine further the case in which there is a statistic that is s-ancillary for the parameter of interest and show how the introduction of the concepts of a conditional p-value and a conditional adaptive significance level can make the performance of the mixed test much easier.
Theorem 3.
Let and (that is, Θ is variation independent). Let and be two experiments as defined in Section 2. Let . In addition, let and be the mixed tests for the hypotheses
respectively. Assume is absolutely continuous with prior density function π, with . Then, is a Non-Informative Nuisance Parameter for testing versus by using , and for every such that (1) holds,
Theorem 3 tells us that when the likelihood function can be factored as in (1), the mixed test obeys the NNPP. That is to say, if one aims to test a simple hypothesis about the parameter of interest in the presence of a non-informative nuisance parameter by means of the mixed test, then the nuisance parameter can be disregarded in the analysis. From a purely mathematical viewpoint, when a sample point satisfying (1) is observed, the decision between rejecting and accepting the null hypothesis regarding the parameter of interest depends on that point only through the factor of the likelihood associated with the parameter of interest, which is not a function of the nuisance parameter, as we can see from Equation (A16) in Appendix A. It should be emphasized that Theorem 3 holds for null hypotheses more general than simple ones. For instance, the theorem is still valid when the null hypothesis H is specified by a hyperplane of the parameter space of the parameter of interest. The proof of this result is quite similar to the proof of Theorem 3 in Appendix A and for this reason is omitted.
The adherence to the NNPP is indeed an advantage of the mixed test. It may considerably reduce the calculations involved in carrying out the procedure, especially under statistical models for which a statistic that is s-ancillary for the parameter of interest can be found. Such cases are examined after Corollary 1, which follows straightforwardly from Theorems 2 and 3.
Corollary 1.
Assume the same conditions of Theorem 3 and suppose that such that T is p-sufficient for and s-ancillary for . Then, for all , .
Now, let us suppose that under experiment , there is a statistic such that T is s-ancillary for . Let be the hypothesis of interest. From the predictive distribution for X, we can define for each value the conditional probability function for X given , by
if , and , otherwise.
Finally, from the conditional distribution in (24), we define two conditional statistics: the conditional p-value and the conditional adaptive significance level. These quantities will be of great importance for the performance of the mixed test, as we will see in the next section.
Definition 3.
Conditional p-value: Let be an experiment for which the statistic is s-ancillary for . Let be the hypothesis of interest, and , , as in (24). We define the p-value conditional on T, for each by
where and . From Equation (A14), the may be rewritten as
where and since T is s-ancillary for . It follows that
that is,
Definition 4.
Conditional adaptive significance level: Let be an experiment for which the statistic is s-ancillary for . Let be the hypothesis of interest and , be as in (24). We define the conditional adaptive significance level given T, , for each by
The conditional adaptive significance level may be rewritten as
Definitions 3 and 4 are conditional versions of the definitions in (18) and (19), respectively. While the calculation of the unconditional quantities involves the evaluation of the Bayes factor for every point of the sample space, the determination of the conditional statistics at a specific sample point depends only on the values of the Bayes factor at the sample points x sharing the observed value of T, which may be much easier to accomplish. Note also that the conditional p-value and the conditional adaptive significance level can be seen, respectively, as an alternative (conditional) measure of evidence in favor of the null hypothesis H and an alternative threshold value for testing the competing hypotheses. As a matter of fact, one can substitute the p-value and the adaptive significance level with their conditional versions in order to perform the mixed test. This is exactly what the next theorem states.
Theorem 4.
Assume the same conditions as in Corollary 1 and Theorem 3. Then, for all ,
The results of Theorems 3 and 4 and Corollary 1 suggest a way in which the mixed test may be used with far fewer calculations: when an ancillary statistic for the parameter of interest, T, is available, one can perform the test by comparing the conditional p-value with the conditional adaptive significance level instead of comparing the unconditional quantities in Definitions (18) and (19) (a schematic implementation is sketched below). This possibility is illustrated in the next section.
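Continuing the illustration (again ours, and assuming the mixed_test sketch given earlier in this section is in scope), the conditional quantities of Definitions 3 and 4 only require the predictive probabilities of the sample points that share the observed value of the statistic T, renormalized as in (24):

```python
def conditional_mixed_test(pred_H, pred_A, T, x_obs, a=1.0, b=1.0):
    """Mixed test via the conditional p-value and the conditional adaptive
    significance level given T (a sketch; T is assumed s-ancillary for the
    parameter of interest, as in Corollary 1 and Theorem 4)."""
    t_obs = T(x_obs)
    # Keep only the sample points in the observed slice {x : T(x) = T(x_obs)}.
    slice_H = {x: p for x, p in pred_H.items() if T(x) == t_obs}
    slice_A = {x: p for x, p in pred_A.items() if T(x) == t_obs}
    # Renormalize to obtain the conditional predictive distributions, as in (24).
    total_H, total_A = sum(slice_H.values()), sum(slice_A.values())
    cond_H = {x: p / total_H for x, p in slice_H.items()}
    cond_A = {x: p / total_A for x, p in slice_A.items()}
    # By Theorem 4, comparing the conditional quantities yields the same
    # decision as the unconditional mixed test.
    return mixed_test(cond_H, cond_A, x_obs, a, b)
```

Only the points sharing the observed value of T need their Bayes factors evaluated, which is the computational gain discussed above.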
4. Examples
We now revisit three well-known problems of hypothesis testing for count data and present new solutions to them by means of the mixed test. In each problem, we consider a suitable reparametrization of the standard model in order to ensure that
- There exists a statistic T that is ancillary to the new parameter of interest;
- The hypothesis about the new parameter of interest under the reparametrization is a singleton (or a hyperplane);
- The new parameter of interest is independent of the new nuisance parameter a priori;
- The distribution of the data X given any value of the statistic T is simple enough to render the calculations of the conditional p-value and the conditional adaptive significance level easy.
4.1. Comparison of Poisson Means
Suppose we are interested in testing the equality of two Poisson means, say λ₁ and λ₂. For this purpose, let X = (X₁, X₂) be a random vector to be observed such that, given (λ₁, λ₂), X₁ and X₂ are independent Poisson random variables with parameters proportional to λ₁ and λ₂, respectively, with a known integer constant of proportionality. The hypotheses to be tested are
where . The likelihood function for generated by is
Suppose also that and are independent a priori and that is distributed as a Gamma random variable with parameters and , . That is, the prior density function of is
Although one can determine an exact expression for the Bayes factor in this case (as a matter of fact, in [25], the authors first presented a solution to the problem of testing the equality of Poisson means by using weighted likelihoods in the context of a production process monitoring procedure), the use of the mixed test under the above parametrization may be computationally disadvantageous, as the sample space is infinite and one would have to determine infinitely many Bayes factors to perform the test. To overcome this difficulty, we next consider a reparametrization of the model in which the new parameter of interest is the proportion λ₁/(λ₁ + λ₂) and the new nuisance parameter is the total mean.
The new parameter space is then . Now, the hypotheses (30) can be rewritten as
with . Note that the likelihood function (31) can be rewritten by conditioning on the statistic as follows:
Hence, the induced likelihood function for generated by may be factored as
Note that T is an ancillary statistic for , as it is distributed as a Poisson random variable with mean , and the conditional distribution of X, given , , depends on only through . The prior distribution for is given by
Now, as (34), (36) and (37) hold, it follows from Theorem 3 that the new nuisance parameter is an NNP and that the performance of the mixed test for the original hypotheses based on the full likelihood and prior is equivalent to the performance of the mixed test for the simple hypothesis about the proportion λ₁/(λ₁ + λ₂) based on the binomial-like factor of the likelihood, which depends only on that proportion, and on its marginal Beta prior density, ignoring the NNP. In addition, Theorem 4 implies that the test reduces to the comparison of the conditional p-value and the conditional adaptive significance level at the observed sample point. Note that in this case, one does not need to evaluate the Bayes factor for every point of the sample space but only for those points whose components add up to the observed total; that is, when x is observed, one needs to evaluate the Bayes factor only for the finitely many points sharing the observed value of T.
From Equations (A14) and (A15), one gets the following predictive functions under and under for X:
and
where . Consequently, the Bayes factor is
Finally, it follows from (28) and (30) that for ,
and
Note that in this case, the conditional p-value resembles the frequentist p-value for the simple hypothesis under simple random sampling from the Bernoulli model with the corresponding null parameter value (however, for the calculation of the conditional p-value, the sample space is ordered by the Bayes factor instead of the likelihood ratio).
Example 3
(Comparison of Poisson means). In [25], the authors consider that a methodology to detect a shift in a production process is to compare the quality index of the current rating period P with the quality index of the previous rating period. Suppose that we want to test if the process is under control; that is, if the two quality indices coincide. For this purpose, two audit samples of the same size are collected at the previous rating period and at rating period P, respectively. Let X₁ represent the number of defects found in the first sample and X₂ represent the number of defects found in the second sample. Also suppose that X₁ and X₂ are Poisson random variables with parameters proportional to the respective quality indices. For simplicity, we consider fixed values for the hyperparameters in (32). Hence, the predictive functions under the competing hypotheses are given by:
and
Consequently, the Bayes factor at can be expressed by
Now, suppose that two defects are found at the previous rating period and nine defects are found at period P; that is, suppose that x = (2, 9) is observed, so that the observed value of T is 11. In this case, the conditional adaptive significance level and the conditional p-value at the observed point are, respectively,
Since the conditional p-value is smaller than the conditional adaptive significance level, the decision is to reject the null hypothesis in (34).
Note that although the sample size is small, the null hypothesis can be rejected with a conditional p-value of 0.065. Such a value is not compared with standard (fixed) cut-off values such as 0.01 or 0.05 but rather with the conditional adaptive significance level of 0.227. Note also that the performance of the mixed test by means of the conditional p-value and the conditional adaptive significance level requires the calculation of only finitely many Bayes factors (twelve, precisely) even though the sample space is infinite.
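Purely as an illustration of Theorem 4, the following self-contained script computes the conditional quantities of this subsection under assumptions that are ours and that we make explicit: a Beta(1, 1) (uniform) prior induced on the reparametrized proportion, null value 1/2 for that proportion, and equal error weights a = b = 1. Under these assumptions the script returns a conditional adaptive significance level of about 0.227 and a conditional p-value of about 0.065 for the counts of Example 3, in agreement with the values reported above; with other hyperparameters or weights the numbers would change.

```python
from math import comb, exp, lgamma

def log_beta(p, q):
    """Logarithm of the Beta function, via log-gamma."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def poisson_means_conditional_mixed_test(x1, x2, theta0=0.5, c1=1.0, c2=1.0,
                                          a=1.0, b=1.0):
    """Conditional mixed test for the equality of two Poisson means through
    the binomial reduction given T = X1 + X2 (a sketch under assumed choices:
    Beta(c1, c2) prior induced on the proportion, null value theta0)."""
    t = x1 + x2

    def p_H(k):  # predictive under H given T = t: Binomial(t, theta0)
        return comb(t, k) * theta0 ** k * (1 - theta0) ** (t - k)

    def p_A(k):  # predictive under A given T = t: Beta-Binomial(t, c1, c2)
        return comb(t, k) * exp(log_beta(k + c1, t - k + c2) - log_beta(c1, c2))

    points = range(t + 1)                       # conditional sample space given T = t
    bf = {k: p_H(k) / p_A(k) for k in points}   # conditional Bayes factors
    alpha = sum(p_H(k) for k in points if a * p_H(k) < b * p_A(k))
    pvalue = sum(p_H(k) for k in points if bf[k] <= bf[x1])
    return alpha, pvalue, pvalue <= alpha

# Hypothetical check of Example 3 under the stated assumptions: x = (2, 9).
print(poisson_means_conditional_mixed_test(2, 9))   # approx. (0.227, 0.065, True)
```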
4.2. Test of Symmetry
Suppose we want to test the hypothesis of symmetry in a two-way contingency table. Several methods have been proposed for testing diagonal symmetry: see, for example, [28,29,30,31,32,33] and references therein. Here we propose a solution to this problem by using the mixed test and its properties. We present the simplest case, the 2 × 2 table; for the sake of readability, the reader will find the general case in Appendix A.
Suppose each element (individual) of a sample of size n is classified into four mutually exclusive combinations of the two-valued variables and . Let Table 1 represent the observed frequencies of the cross-classifications, where is the number of individuals classified into the category of and the category of , . Let , with and , where denotes the probability of classification into the category of and the category of , .
Table 1.
Observed frequencies of the cross-classification in the 2 × 2 case.
The hypotheses for testing diagonal symmetry are
We assume that the vector is, given , a multinomial random vector with parameters n and . The likelihood function generated by is then given by
where and . Assume also a prior Dirichlet distribution with parameter vector , for . That is,
where .
We should note that the determination of the predictive functions is much easier under the following reparametrization of the model: let us define
Let . Thus, the new parameter space is , where .
Then, we can reformulate the hypotheses (46) as
where . Note that the likelihood function for generated by can be rewritten by conditioning on the statistic as
Hence, the induced likelihood function for generated by may be factored as
Note that T is an ancillary statistic for as it is a multinomial random vector with parameters n and , and the conditional distribution of X given , depends on only through . The prior distribution for is given by
where is the prior Dirichlet distribution for with parameter vector .
As in the example of the previous subsection, the results from Section 3 imply that the new nuisance parameter is an NNP for testing the hypotheses in (50) by using the mixed test. In addition, we only need to compare the conditional p-value with the conditional adaptive significance level to decide between the hypotheses.
From Equations (A14) and (A15), one gets the following predictive functions under and under for X:
and
where
Consequently, the Bayes factor is
Finally, it follows from (28) and (30) that for ,
and
In this example (as in the previous subsection), the conditional p-value looks like the frequentist p-value for a simple hypothesis regarding an unknown proportion. We should emphasize that for the calculation of the conditional p-value, the sample space is ordered by the Bayes factor in place of the likelihood ratio. Note also that the evaluation of this conditional statistic involves ordering only the points of the sample space for which the statistic T takes its observed value. On the other hand, if one performs the mixed test without using these conditional quantities, all elements of the sample space must be ordered.
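To spell out the ingredients in our own (hedged) notation: conditionally on the off-diagonal total s = x₁₂ + x₂₁, only one off-diagonal count varies; under the symmetry hypothesis it is Binomial with success probability 1/2, and under the alternative its conditional predictive is Beta-Binomial when, as we assume here, the Dirichlet prior induces a Beta(c₁, c₂) prior on the corresponding proportion (c₁ and c₂ standing for the relevant Dirichlet components):

```latex
% Conditional predictive under H (symmetry), given the off-diagonal total s:
P(X_{12} = k \mid s, H) \;=\; \binom{s}{k} \left(\tfrac{1}{2}\right)^{s},
\qquad k = 0, 1, \dots, s;

% Conditional predictive under A, with an induced Beta(c_1, c_2) prior on
% p_{12} / (p_{12} + p_{21}):
P(X_{12} = k \mid s, A) \;=\; \binom{s}{k}\,
\frac{B(k + c_1,\, s - k + c_2)}{B(c_1, c_2)}.
```

With these two conditional predictives, the comparison of the conditional p-value and the conditional adaptive significance level proceeds exactly as in the Poisson sketch above.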
Example 4
(Analysis of opinion swing). Suppose it is of interest to evaluate whether the proportion of individuals that did not support the US President before the State of the Union Address remained unchanged after his address. For this purpose, individuals are surveyed with regard to their support for the President before and after his annual message. The survey results are displayed in the following contingency Table 2:
Table 2.
Survey results.
Let () be the support—“No” or “Yes”—for the President before (after) the State of the Union Address. Let be the probability that an individual is classified into the i-th category of and j-th category of (for instance, is the probability that an individual does not support the President both before and after his address). The hypothesis that the support for the President remains unchanged is . This is equivalent to the hypothesis that the proportion of swings from “Yes” to “No” is equal to the proportion of swings from “No” to “Yes”; that is, this is equivalent to the symmetry hypothesis . Thus, we can test such a hypothesis by means of the mixed test considering the mathematical setup of this subsection. Suppose . Then, the Bayes factor is given by
For the observed data, we obtain the corresponding Bayes factor. With the chosen error weights, we do not reject the null hypothesis, since the conditional p-value is not smaller than the conditional adaptive significance level. In this case, the conditional adaptive significance level and the conditional p-value at the observed point are
Note that the conditional p-value exceeds the conditional adaptive significance level, as expected. Note also that we ordered only 28 elements of the sample space by the Bayes factor to determine the above conditional quantities. To calculate the unconditional ones, we would have had to order all 176,851 points of the sample space.
4.3. Test of Independence
Consider the same statistical model as in the previous subsection. However, now we want to evaluate whether or not there is association between the two classification variables. For this purpose, we may test the hypothesis of independence between these variables. Consider the joint distribution displayed in Table 3 below:
Table 3.
Joint distribution of and given .
The hypotheses to be tested are
It is easy to check that hypotheses H and A can be rewritten as
Let us define
and consider the new parameter , which takes value in . Let . Proceeding as in the previous subsections, we obtain the following induced likelihood function for generated by
Note that T is an ancillary statistic for , as it is a binomial random variable with parameters n and . In addition, for each possible value t of the statistic T, the conditional distribution of X given depends on only through .
The prior distribution for is such that , and are independent Beta random variables with respective parameters and , and , and and .
Finally, note that under the new parametrization, the independence hypothesis is
where .
From the results of Section 3, it follows that the new nuisance parameter is an NNP for testing the hypotheses above by using the mixed test. In addition, we only need to compare the conditional p-value with the conditional adaptive significance level to decide between these hypotheses. In a sense, the test for the hypothesis of independence by means of the conditional statistics resembles the test for the hypothesis of homogeneity that would be carried out if the marginal counts were fixed beforehand.
Considering the same specification as in the previous subsection, we obtain the following expression for the Bayes factor:
The conditional predictive probability function for X given , is given by
where .
From the above distribution, one may obtain the conditional p-value and the conditional adaptive significance level at each point in the sample space.
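As a final illustration (ours; whether the paper integrates the prior over the hyperplane of equal proportions in exactly this way cannot be confirmed from the text, so the Beta prior placed on the common proportion under H below is a placeholder assumption, as are the function name and the default unit hyperparameters), the following self-contained sketch enumerates the conditional sample space given the first-row total and carries out the conditional mixed test for independence, read as homogeneity of two proportions:

```python
from math import comb, exp, lgamma

def log_beta(p, q):
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def beta_binom(k, m, c1, c2):
    """Beta-Binomial(m, c1, c2) probability mass at k."""
    return comb(m, k) * exp(log_beta(k + c1, m - k + c2) - log_beta(c1, c2))

def independence_conditional_mixed_test(x11, x12, x21, x22,
                                         prior_A=(1.0, 1.0, 1.0, 1.0),
                                         prior_H=(1.0, 1.0),
                                         a=1.0, b=1.0):
    """Conditional mixed test for independence in a 2 x 2 table, given the
    first-row total (a sketch: independent Beta priors for the two column-1
    proportions under A; a Beta prior for the common proportion under H)."""
    n = x11 + x12 + x21 + x22
    t = x11 + x12                        # first-row total, held fixed
    # Conditional sample space: tables sharing the total n and the row total t.
    points = [(k1, k2) for k1 in range(t + 1) for k2 in range(n - t + 1)]

    def p_A(k1, k2):   # product of two Beta-Binomial predictives under A
        return (beta_binom(k1, t, prior_A[0], prior_A[1]) *
                beta_binom(k2, n - t, prior_A[2], prior_A[3]))

    def p_H(k1, k2):   # common proportion integrated against a Beta prior under H
        return (comb(t, k1) * comb(n - t, k2) *
                exp(log_beta(k1 + k2 + prior_H[0], n - k1 - k2 + prior_H[1])
                    - log_beta(*prior_H)))

    bf = {pt: p_H(*pt) / p_A(*pt) for pt in points}
    alpha = sum(p_H(*pt) for pt in points if a * p_H(*pt) < b * p_A(*pt))
    pvalue = sum(p_H(*pt) for pt in points if bf[pt] <= bf[(x11, x21)])
    return alpha, pvalue, pvalue <= alpha

# Illustrative usage with hypothetical counts (not the paper's data):
print(independence_conditional_mixed_test(3, 7, 9, 1))
```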
Example 5
(Market’s directional change). In [34] it is argued that the directional change of the stock market in January signals the directional change of the market for the remainder of the year. Suppose the following Table 4 summarizes the directional changes of the prices of a few stocks in both periods.
Table 4.
Survey results.
In this case, the Bayes factor is given by
For the observed data, we obtain the corresponding Bayes factor. With the chosen error weights, we reject the null hypothesis by using the mixed test, since the conditional p-value is smaller than the conditional adaptive significance level. That is, the data from only a few stocks reveal that the directional change for the remainder of the year depends on the directional change in January. Note that although the sample size is small and a cell count is equal to zero, the mixed test can be fully performed, as opposed to standard tests for the hypothesis of independence that rely on asymptotic results. In this case, the conditional adaptive significance level and the conditional p-value at the observed point are
5. Discussion
Statistical hypothesis testing is an important quantitative method that may help the daily activity of scientists from different areas of knowledge. However, with recent computational advances, the misuse of standard tests has come to light. Thus, problems with tests of significance and fixed-level tests have brought a growing need for alternative approaches to hypothesis testing that do not have such drawbacks. Among these alternatives, we revisit in this manuscript the mixed test by Pericchi and Pereira, which combines aspects from two opposing viewpoints: the frequentist and the Bayesian. The mixed test satisfies various reasonable properties one desires when performing a test of hypotheses. Here we prove that the mixed test also meets the Non-Informative Nuisance Parameter Principle (NNPP) for simple hypotheses regarding the parameter of interest. The NNPP concerns the question of how to make inferences about a parameter in the presence of Non-Informative Nuisance Parameters: it states that it is irrelevant whether a Non-Informative Nuisance Parameter is known or unknown in order to draw conclusions about a quantity of interest from data. This principle, though important, has not been explored in much depth, and for this reason, we studied it further in hypothesis testing problems. Nuisance parameters typically affect inferences about a parameter of interest: estimation of the mean of a normal distribution and estimation of the parameters of a linear regression model, both with unknown variance, are examples of this.
Adherence of the mixed test to the NNPP allowed for a much easier performance of the test, as the calculations involved were significantly reduced. Indeed, decision making between the competing statistical hypotheses was simplified in the three examples we examined: in each situation, conditioning on a suitable statistic and considering conditional versions of the p-value and the adaptive significance level proved to be an advantageous way to use the mixed test. The extent to which the mixed test adheres to the NNPP in more general settings, and the extent to which its use can then be simplified, remain open questions. These issues are the goal of future investigation.
Author Contributions
All authors have contributed to the conceptualization, formal analysis and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico [grant 141161/2018-3].
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Proof of Theorem 2.
Suppose there exists a statistic such that it is p-sufficient for and s-ancillary for . Then,
The result is immediate: as the conditional distribution of X given and depends only on , and the marginal distribution of T given depends only on , one can write for each , and . The proof when T is p-sufficient for and s-ancillary for is analogous. □
Proof of Theorem 1.
We first prove the part of the theorem. Suppose that is such that ; that is, the posterior distribution of given can be factored as . Then,
Due to the fact that , it follows from the last equality in (A2) that
The result follows considering, for instance,
□
Next, we prove the converse. Suppose that the likelihood can be factored as . Then,
The posterior marginal density of is obtained from (A5) by integrating out the other component. Thus,
and therefore, ; that is, . □
Proof of Theorem 3.
We first verify that is an NNP for testing versus by means of . Recall that
where () is the predictive distribution for obtained under (). In this case, the likelihood function generated by for with such that (1) holds and is
Then, the predictive function under the null hypothesis can be calculated as
where is degenerate at conditional distribution of . Thus,
In addition, the predictive function under the alternative hypothesis is given by
Thus, the Bayes factor can be expressed by
Note that Equation (A12) does not depend on . As a result, the test in (A7) does not depend on , and consequently, is an NNP for testing versus by means of . Now, we shall determine the test for H versus A. The predictive distribution for X at under the null hypothesis is
It is not difficult to verify that for fixed , the conditional distribution of given is such that is degenerate at , and is independent of with density .
Then,
For the alternative hypothesis, we have that
Finally,
Hence,
and consequently,
□
Proof of Corollary 1.
The corollary follows directly from Theorems 2 and 3. □
Proof of Theorem 4.
From Theorem 3 and Corollary 1, we have that for each ,
Then,
Thus,
The converse is proven by the contrapositive.
As if , we obtain that
Since and , it follows that . Thus,
and consequently,
Mixed test for the symmetry hypothesis in general contingency tables
In this case, Table A1 represents the observed frequencies of the cross-classification of n units by the variables and .
Table A1.
Observed frequencies of the cross-classification in the general case.
Let be the -dimensional vector of cell counts and be the -dimensional vector of cell probabilities, where and are self explanatory. Suppose that X is a multinomial random vector with parameters n and . The likelihood function generated by for is given by
The hypotheses for testing diagonal symmetry are
We also assume a prior Dirichlet distribution with parameter for . That is,
To perform the mixed test for the symmetry hypothesis, we consider the following reparametrization of the model: we define
Let , where is the -dimensional vector for which the components are ’s such that , and is the -dimensional vector for which the components are ’s such that . The new parameter space is then .
As in previous sections, we consider a statistic T that is s-ancillary for : T is the -dimensional vector for which the components are the sums for and for . The induced likelihood function for generated by x is
We can easily see that the likelihood function in (A28) can be factored as . In addition, the prior distribution for is such that and are independent: being a Dirichlet random vector and a vector of independent Beta random variables. That is,
From Theorem 3, we have that the new nuisance parameter is an NNP for testing the symmetry hypothesis versus its alternative by means of the mixed test. In addition, the mixed test for this hypothesis reduces to the mixed test for the corresponding simple hypothesis as if the nuisance parameter were known. From Theorem 4, it follows that we only need to compare the conditional p-value with the conditional adaptive significance level to test the null hypothesis against the alternative. From (28), (30) and (A28), we obtain the following expressions:
and
In this case, these conditional quantities are simply determined by the products of binomial-type probabilities.
References
- Berger, J.; Wolpert, R. The Likelihood Principle; Institute of Mathematical Statistics: Hayward, CA, USA, 1988. [Google Scholar]
- Mayo, D. On the Birnbaum Argument for the Strong Likelihood Principle. Stat. Sci. 2014, 29, 227–239. [Google Scholar] [CrossRef]
- Dawid, A. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Stat. Sci. 2014, 29, 240–241. [Google Scholar] [CrossRef]
- Evans, M. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Stat. Sci. 2014, 29, 242–246. [Google Scholar] [CrossRef]
- Hannig, J. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Stat. Sci. 2014, 29, 254–258. [Google Scholar] [CrossRef]
- Bjørnstad, J. Discussion of “On the Birnbaum Argument for the Strong Likelihood Principle”. Stat. Sci. 2014, 29, 259–260. [Google Scholar] [CrossRef]
- Shan, G. Exact Statistical Inference for Categorical Data; Academic Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Pericchi, L.; Pereira, C. Adaptive significance levels using optimal decision rules: Balancing by weighting the error probabilities. Braz. J. Probab. Stat. 2016, 30, 70–90. [Google Scholar] [CrossRef]
- Butler, R.W. Predictive Likelihood Inference with Applications. J. R. Stat. Soc. Ser. B 1986, 48, 1–38. [Google Scholar] [CrossRef]
- Severini, T. Integrated likelihoods for functions of a parameter. Stat 2018, 7, e212. [Google Scholar] [CrossRef]
- Berger, J.; Liseo, B.; Wolpert, R. Integrated Likelihood Methods for Eliminating Nuisance Parameters. Stat. Sci. 1999, 14, 1–28. [Google Scholar] [CrossRef]
- Cox, D.R. Partial likelihood. Biometrika 1975, 62, 269–276. [Google Scholar] [CrossRef]
- Dawid, A.P. On the concepts of sufficiency and ancillarity in the presence of nuisance parameters. J. R. Stat. Soc. Ser. B 1975, 37, 248–258. [Google Scholar] [CrossRef]
- Sprott, D.A. Marginal and conditional sufficiency. Biometrika 1975, 62, 599–605. [Google Scholar] [CrossRef]
- Barndorff-Nielsen, O. Nonformation. Biometrika 1976, 63, 567–571. [Google Scholar] [CrossRef]
- Barndorff-Nielsen, O. Information and Exponential Families: In Statistical Theory; Wiley: Chichester, UK, 1978. [Google Scholar]
- Basu, D. On the Elimination of Nuisance Parameters. J. Am. Stat. Assoc. 1977, 72, 355–366. [Google Scholar] [CrossRef]
- Jørgensen, B. The rules of conditional inference: Is there a universal definition of nonformation? J. Ital. Stat. Soc. 1994, 3, 355. [Google Scholar] [CrossRef]
- Pace, L.; Salvan, A. Principles of Statistical Inference: From a Neo-Fisherian Perspective; World Scientific Publishing Company Pte Limited: Singapore, 1997. [Google Scholar]
- Gannon, M.; Pereira, C.; Polpo, A. Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels. Am. Stat. 2019, 73, 213–222. [Google Scholar] [CrossRef]
- Pereira, C.; Nakano, E.; Fossaluza, V.; Esteves, L.; Gannon, M.; Polpo, A. Hypothesis Tests for Bernoulli Experiments: Ordering the Sample Space by Bayes Factors and Using Adaptive Significance Levels for Decisions. Entropy 2017, 19, 696. [Google Scholar] [CrossRef]
- Olivera, M. Definição do nível de significância em função do tamanho amostral. Master’s Thesis, IME, Universidade de São Paulo, São Paulo, Brazil, 2014. [Google Scholar]
- Pereira, B.; Pereira, C. A Likelihood approach to diagnostic test in clinical medicine. Stat. J. 2005, 3, 77–98. [Google Scholar]
- Montoya, D.; Irony, T.; Pereira, C.; Whittle, M. An unconditional exact test for the Hardy-Weinberg equilibrium law: Sample space ordering using the Bayes factor. Genet. Soc. Am. 2001, 158, 875–883. [Google Scholar]
- Irony, T.; Pereira, C. Bayesian hypothesis test: Using surface integrals to distribute prior information among the hypotheses. Resenhas IME-USP 1995, 2, 27–46. [Google Scholar]
- DeGroot, M. Probability and Statistics; Addison-Wesley: Boston, MA, USA, 1986. [Google Scholar]
- Freeman, P. The role of p-values in analysing trial results. Stat. Med. 1993, 12, 15–16. [Google Scholar] [CrossRef]
- Bowker, A. A Test for Symmetry in Contingency Tables. J. Am. Stat. Assoc. 1948, 43, 572–574. [Google Scholar] [CrossRef]
- Ireland, C.; Ku, H.; Kullback, S. Symmetry and marginal homogeneity of an r × r contingency table. J. Am. Stat. Assoc. 1969, 64, 1323–1341. [Google Scholar] [CrossRef]
- Kullback, S. Marginal Homogeneity of Multidimensional Contingency Tables. Ann. Math. Stat. 1971, 42, 594–606. [Google Scholar] [CrossRef]
- Bernardo, G.; Lauretto, M.; Stern, J. The full Bayesian significance test for symmetry in contingency tables. AIP Conf. Proc. 2012, 1443, 198–205. [Google Scholar]
- Agresti, A. Categorical Data Analysis, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
- Tahata, K.; Tomizawa, S. Symmetry and asymmetry models and decompositions of models for contingency tables. SUT J. Math. 2014, 50, 131–165. [Google Scholar] [CrossRef]
- McClave, J.T.; Benson, P.G.; Sincich, T.T. Statistics for Business and Economics; Number 519.5; Pearson: London, UK, 2001. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).