1. Introduction
Principles of Statistical Inference (or Data Reduction) constitute important guidelines on how to draw conclusions from data, especially when performing standard inferential procedures for unknown parameters of interest, such as estimation and hypothesis testing. For instance, the Sufficiency Principle (SP) states that a sufficient statistic retains all the information about the unknown parameters that is relevant for making inferences about them. More precisely, it recommends that if T is a sufficient statistic for the statistical model under consideration and x and y are sample points such that T(x) = T(y), then the observation of either of these points should lead to the same conclusions regarding the parameters of interest.
Besides the place of sufficiency in Statistical Inference, these recommendations cover several issues, such as the contrast between post-experimental and pre-experimental reasoning and the roles of non-informative stopping rules, censoring mechanisms and nuisance parameters in data analysis. Among the main principles, the Sufficiency Principle is generally recognized as a cornerstone of Statistical Inference. On the other hand, the Likelihood Principle (LP) and its profound consequences are still subjects of intense debate. The reader will find a detailed discussion of the Likelihood Principle in [1,2,3,4,5,6].
In this work, we examine the Non-Informative Nuisance Parameter Principle (NNPP), introduced by Berger and Wolpert in their remarkable 1988 book, which concerns how inferences about a parameter of interest should be made in the presence of nuisance parameters. Nuisance parameters usually affect inferences about the parameter of interest, as in the estimation of the mean of a normal distribution with unknown variance, the estimation of the parameters of a linear regression model with unknown variance, and the determination of p-values for specific hypotheses in the analysis of contingency tables ([7]). In short, the NNPP states that, under suitable conditions, it is irrelevant for drawing conclusions about the parameter of interest whether the value of a non-informative nuisance parameter is known or not. Despite the importance of the problem of eliminating nuisance parameters in data analysis, this principle and its consequences have not been explored in depth, as far as we have reviewed the literature. For this reason, we revisit the NNPP by formally stating it for the problem of hypothesis testing, present decision rules that meet the principle and show how the performance of a particular test in line with the NNPP can then be simplified.
This work is organized as follows: in Section 2, the NNPP for hypothesis testing is stated, discussed and illustrated under a Bayesian perspective. In Section 3, the Bayesian test procedure based on the concept of adaptive significance level and on an alternative p-value introduced by Pericchi and Pereira in [8], henceforth named the mixed test, is reviewed and is proven to satisfy the NNPP for discrete sample data when the (marginal) null hypothesis regarding the parameter of interest is a singleton (as a matter of fact, the result also holds when such a null hypothesis is specified by a hyperplane). In that section, we also define conditional versions of the adaptive significance level and p-value based on suitable statistics and prove that, under those conditions, the performance of the mixed test reduces to the comparison between these new conditional quantities. These results are of great importance for making the mixed test easier to use in various situations. In Section 4, we exemplify the main results by presenting new solutions, by means of the mixed test, to well-known problems of tests of hypotheses for count data under suitable reparametrizations of the corresponding models: we revisit the problems of the comparison of Poisson population means and of testing the hypotheses of independence and symmetry in contingency tables. We make our final comments in Section 5. The proofs of the theorems and the calculations for one example in Section 4 are found in Appendix A.
2. The Non-Informative Nuisance Parameter Principle for Hypothesis Testing
The problem of the elimination of nuisance parameters in statistical inference has a long history and remains a major issue. Proposals to deal with it include the marginalization of the likelihood function by integrating out the nuisance parameter ([9,10,11]), the construction of partial likelihood functions ([12,13,14], among others) and the consideration of conditional likelihood functions based on different notions of non-informativeness, sufficiency and ancillarity. The elimination of nuisance parameters and different notions of non-information have also been studied in more detail in [15,16,17,18], where, based on suitable statistics, the concepts of B, S and G non-information are presented. The generalized Sufficiency and Conditionality Principles are also discussed in [17]. On the other hand, Bayesian methods for eliminating nuisance parameters based on a suitable statistic T involve different definitions of sufficiency: for instance, K-Sufficiency, Q-Sufficiency and L-Sufficiency (see, for example, [17] and references therein).
In this section, the Non-Informative Nuisance Parameter Principle (NNPP) by Berger and Wolpert is discussed and formally defined for the problem of hypothesis testing. As we will see, the NNPP seems justifiable under both the partial and the conditional non-Bayesian approaches mentioned in the previous paragraph, and it is particularly natural from the Bayesian standpoint. Despite the relevance of the problem of the elimination of nuisance parameters in data analysis, Berger and Wolpert [1] presented the NNPP but did not explore the principle in depth, as far as we have examined the literature.
Some notation is needed to continue. We denote by θ the unknown parameter and by X the sample to be observed. Θ and 𝒳 represent the parameter and the sample spaces, respectively. The family of discrete probability distributions for X is denoted by {P_θ : θ ∈ Θ}. In addition, for x ∈ 𝒳, L_x(θ) denotes the likelihood function for θ generated by the sample point x. By an experiment E, we mean, as in [1], a triplet E = (X, θ, {P_θ}), with X, θ and {P_θ} as defined earlier. Finally, for a subset Θ_H of Θ, we formulate the null hypothesis H: θ ∈ Θ_H and the alternative one A: θ ∉ Θ_H. We recall that a test function (procedure) for the hypotheses H versus A is a function φ: 𝒳 → {0, 1} that takes the value 1 (φ(x) = 1) if H is rejected when x ∈ 𝒳 is observed and takes the value 0 (φ(x) = 0) if H is not rejected when x is observed. Under the Bayesian perspective, we also consider a continuous prior density function π for θ that induces, when combined with the likelihood function L_x(θ), a continuous posterior density function for θ given x, π(θ | x).
In [1], Berger and Wolpert presented the following principle on how to make inferences about an unknown parameter of interest γ in the presence of a nuisance parameter ξ: when a sample observation, say x, separates information concerning γ from information on ξ, it is irrelevant whether the value of ξ is known or unknown in order to make inferences about γ based on the observation of x. In other terms, if the conclusions on γ were to be the same for every possible value of the nuisance parameter, were ξ known, then the same conclusions on γ should be reached even if ξ is unknown. These authors then consider the following mathematical setup to formalize these ideas.
Let θ = (γ, ξ), with γ and ξ defined as in the previous paragraph. Consider Θ = Γ × Ξ; that is, the parameter space is variation independent, where Γ is the set of values for γ and Ξ is the set of values for ξ. Suppose the experiment E = (X, (γ, ξ), {P_(γ,ξ)}) is carried out to learn about γ. Let E′ = ((X, ξ), γ, {P′_γ}) be the “thought” experiment in which the pair (X, ξ) is to be observed (instead of observing only X), where {P′_γ} is the family of distributions for (X, ξ) indexed by γ. Suppose also that under experiment E, the likelihood function generated by a specific x ∈ 𝒳 for (γ, ξ) has the following factored form:

L_x(γ, ξ) = f_x(γ) h_x(ξ), (1)

where f_x and h_x are non-negative functions on Γ and Ξ, respectively; that is, L_x depends on γ only through the factor f_x(γ) and on ξ only through the factor h_x(ξ).
Berger and Wolpert then state the Non-Informative Nuisance Parameter Principle (NNPP): if x and ξ are such that (1) holds, and if the inference about γ from the observation of (x, ξ) when E′ is performed does not depend on ξ, then the inferential statements made for γ from E and x should be the same as (should coincide with) the inferential statements made from E′ and (x, ξ) for every ξ ∈ Ξ. The authors named such a parameter a Non-Informative Nuisance Parameter (NNP), as the conclusions or decisions regarding γ from E′ and (x, ξ) do not depend on ξ.
A likelihood function that satisfies (1) is named a likelihood function with separable parameters ([19]). The factored form of the likelihood function in (1) seems to capture the notion of “absence of information about one parameter, say γ, from the other, ξ, and vice versa” under both Bayesian and non-Bayesian reasoning. Indeed, under the Bayesian paradigm, posterior independence between γ and ξ (say, given x) reflects the fact that one's opinion about the parameter γ after observing x is not altered by any information about ξ, and consequently, decisions regarding γ should not depend on ξ. Since posterior independence between γ and ξ given x is equivalent, under prior independence, to the factored form of the likelihood function generated by x, condition (1) sounds really reasonable as a mathematical description of separate information about the parameters. Thus, if a Bayesian statistician is to make inferences regarding a parameter γ in the presence of a nuisance parameter ξ, it would be ideal for these parameters to be independent a posteriori; that is, for the factored form of the likelihood function to hold. This last equivalence is proven in the theorem below.
Theorem 1. Let E = (X, (γ, ξ), {P_(γ,ξ)}) be an experiment and π be the prior probability density function for (γ, ξ). Suppose γ is independent of ξ a priori. Then, for each x ∈ 𝒳, γ and ξ are independent a posteriori given x if and only if there exist functions f_x and h_x such that L_x(γ, ξ) = f_x(γ) h_x(ξ).
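Theorem 1 can be checked numerically on a discrete grid. The sketch below is our own illustration (the grids, priors and likelihood factors are arbitrary choices, with `g` standing for the interest parameter and `s` for the nuisance): under prior independence, a likelihood with separable parameters yields a joint posterior that factors into the product of its marginals.

```python
# Numerical check of Theorem 1 on a grid (illustrative choices only):
# independent priors + separable likelihood => posterior independence.

grid = [0.1 * i for i in range(1, 10)]        # grid for each parameter
prior_g = {a: 1.0 for a in grid}              # (unnormalized) prior for g
prior_s = {b: 2.0 * b for b in grid}          # (unnormalized) prior for s

# likelihood with separable parameters, L_x(g, s) = f_x(g) * h_x(s)
f_x = lambda a: a**3 * (1 - a)**7
h_x = lambda b: b**2 * (1 - b)**3

post = {(a, b): prior_g[a] * f_x(a) * prior_s[b] * h_x(b)
        for a in grid for b in grid}
z = sum(post.values())
post = {k: v / z for k, v in post.items()}    # normalized joint posterior

marg_g = {a: sum(post[(a, b)] for b in grid) for a in grid}
marg_s = {b: sum(post[(a, b)] for a in grid) for b in grid}

# posterior independence: the joint equals the product of the marginals
assert all(abs(post[(a, b)] - marg_g[a] * marg_s[b]) < 1e-12
           for a in grid for b in grid)
print("posterior factorizes: g and s are independent a posteriori")
```

The check is grid-based only; the continuous statement of the theorem follows the same cancellation of the ξ-factor in the normalizing constant.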
On the other hand, condition (1) also seems to be a fair representation of the non-informativeness of one parameter about another under a non-Bayesian perspective. In fact, such a factored form of the likelihood function arises, for instance, when the sample X is conditioned on particular types of statistics that are simple to interpret under non-Bayesian paradigms. Note that for any statistic T, one can write

L_x(γ, ξ) = P(T = T(x) | γ, ξ) P(X = x | T = T(x), γ, ξ).

If, in addition, T is a statistic such that its distribution given (γ, ξ) depends only on γ, and the conditional distribution of X given T, γ and ξ depends only on ξ, the factored form in (1) is easily obtained (such a statistic was named p-sufficient for γ by Basu ([17])). In this situation, all the relevant information on γ is summarized in T, and one can fully make inferences on γ by taking into account only the distribution of T given (γ, ξ), which does not involve ξ. Similarly, if T is a statistic such that its distribution given (γ, ξ) depends only on ξ and the conditional distribution of X given T, γ and ξ depends only on γ, the factored form in (1) holds. Such a statistic was named s-ancillary for γ by Basu ([17]), and it is somewhat evident that in this case, conclusions on γ should be drawn exclusively from the distribution of X given T, γ and ξ, which does not depend on ξ. Such a conditional approach to the problem of the elimination of nuisance parameters had already been proposed by Basu ([17]) and is, in a sense, closely related to the NNPP of Berger and Wolpert. The next theorem formally presents these results.
Theorem 2. Let E = (X, (γ, ξ), {P_(γ,ξ)}) be an experiment in which θ = (γ, ξ) and Θ = Γ × Ξ is variation independent. If T is a statistic that is either p-sufficient or s-ancillary for γ, then, for each x ∈ 𝒳, the likelihood function generated by x, L_x(γ, ξ), can be factored as in (1).
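A classical instance of Theorem 2 (our example, not one taken from the paper) is a pair of independent Poisson counts. Writing λ1 = ψμ and λ2 = (1 − ψ)μ, the statistic T = X1 + X2 is s-ancillary for ψ: T ~ Poisson(μ) depends only on the nuisance μ, while X1 | T = t ~ Binomial(t, ψ) depends only on ψ, so the likelihood separates exactly as in (1). The sketch below verifies this factorization numerically.

```python
# Two independent Poisson counts, reparametrized as psi = l1/(l1+l2) and
# mu = l1 + l2; the joint likelihood then separates into a Binomial factor
# in psi (from X1 given T) and a Poisson factor in mu (from T = X1 + X2).
from math import exp, factorial, comb

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def likelihood(x1, x2, psi, mu):
    """Joint likelihood of (x1, x2) in the (psi, mu) parametrization."""
    l1, l2 = psi * mu, (1 - psi) * mu
    return poisson_pmf(x1, l1) * poisson_pmf(x2, l2)

x1, x2 = 4, 7
t = x1 + x2
for psi in (0.2, 0.5, 0.8):
    for mu in (5.0, 11.0, 20.0):
        separable = (comb(t, x1) * psi**x1 * (1 - psi)**x2) * poisson_pmf(t, mu)
        assert abs(likelihood(x1, x2, psi, mu) - separable) < 1e-12
print("likelihood separates: Binomial factor in psi times Poisson factor in mu")
```

This is the same reparametrization used later for the comparison of Poisson population means in Section 4.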
In summary, it seems reasonable that inferences about γ and ξ can be performed independently under condition (1). Thus, if only γ is of interest, then it seems sensible under (1) to reach the same conclusions on γ when x is observed either by using the whole likelihood function L_x(γ, ξ) or only the factor involving γ. That is, it makes sense to disregard the information contained in the factor involving ξ and focus on the one involving γ. As mentioned in [19], examples of likelihood functions with separable parameters as in (1) are rare; when (1) does hold, however, it is a useful property for both Bayesian and non-Bayesian statisticians in the analysis of statistical data, especially in the presence of nuisance parameters. This fact will be illustrated in Section 3 and Section 4.
We end this section by formally adapting the general NNPP to the special problem of hypothesis testing, in which inference about an unknown parameter consists in deciding whether a statement about the parameter (a statistical hypothesis) should be rejected or accepted by using the observable quantity X.

As before, let E = (X, (γ, ξ), {P_(γ,ξ)}) be an experiment, with Θ = Γ × Ξ. Let E′ be the “thought” experiment in which, in addition to X, ξ is observed. Then, consider the following definition.
Definition 1. Non-Informative Nuisance Parameter (NNP): Let Γ_H ⊂ Γ and φ′ be a test for the hypotheses H: γ ∈ Γ_H versus A: γ ∉ Γ_H under the experiment E′. Then, we say that ξ is a Non-Informative Nuisance Parameter (NNP) for testing H versus A by using φ′ if, for every (x, ξ) such that (1) holds, φ′(x, ξ) does not depend on ξ; that is, it depends only on x.

In a nutshell, Definition 1 tells us something that appears intuitive: if the decision between H and A does not depend on ξ, then ξ does not provide any information about γ. In the following example, we illustrate this idea.
Example 1. Consider θ = (γ, ξ) and the “thought” experiment E′ in which (X, ξ) is observed. Let B ⊂ Γ and φ′ be the test for the hypotheses H: γ ∈ B versus A: γ ∉ B such that the null hypothesis is rejected when the conditional probability of B given x and ξ is small; that is,

φ′(x, ξ) = 1 if and only if P(γ ∈ B | x, ξ) < c,

where c ∈ (0, 1). Suppose, in addition, that γ and ξ are independent a priori. Let us verify that ξ is an NNP for testing these hypotheses by means of φ′. Let (x, ξ) be such that L_x(γ, ξ) = f_x(γ) h_x(ξ) for specific functions f_x and h_x. Then,

P(γ ∈ B | x, ξ) = ∫_B f_x(γ) h_x(ξ) π_γ(γ) dγ / ∫_Γ f_x(γ) h_x(ξ) π_γ(γ) dγ,

where π_γ is the prior density of γ. Thus, we have that

P(γ ∈ B | x, ξ) = ∫_B f_x(γ) π_γ(γ) dγ / ∫_Γ f_x(γ) π_γ(γ) dγ. (7)

Note from Equation (7) that P(γ ∈ B | x, ξ) does not depend on ξ. Thus, ξ is an NNP for testing H versus A by using φ′.
After defining an NNP, we formally state the Non-Informative Nuisance Parameter Principle (NNPP) for hypothesis testing.
Definition 2. Non-Informative Nuisance Parameter Principle (NNPP): Let the parameter space be variation independent; that is, Θ = Γ × Ξ. Consider the experiments E and E′. Let Γ_H be the subset of Γ of interest. In addition, let φ and φ′ be tests for the hypotheses H: γ ∈ Γ_H versus A: γ ∉ Γ_H, under E and E′, respectively. If ξ is an NNP for testing H versus A by using φ′ and (x, ξ) is such that condition (1) holds, then φ(x) = φ′(x, ξ).

The NNPP for statistical hypothesis testing says that if one intends to test a hypothesis regarding only the parameter γ, it is irrelevant whether ξ is known or unknown if it is non-informative for such a decision-making problem. More formally, if one wants to test a hypothesis concerning only γ and observes a sample point x that separates information on γ from information on ξ (that is, (1) holds), then the tests φ under the original experiment E and φ′ under the “thought” experiment E′ should yield the same decision on the hypothesis if ξ is non-informative for that purpose.
We should mention that the NNPP can be adapted to any other inferential procedure. However, in this work, we focus on the principle for the problem of hypothesis testing. We conclude this section by proving that tests based on the posterior probabilities of the hypotheses satisfy the NNPP under prior independence.
Example 2 (continuation of Example 1). Consider the conditions of Example 1. Consider the original experiment E, in which only X is observed, and let φ be the test for the hypotheses H: γ ∈ B versus A: γ ∉ B that rejects the null hypothesis H if its posterior probability is small; that is,

φ(x) = 1 if and only if P(γ ∈ B | x) < c. (11)

Let x be such that L_x(γ, ξ) = f_x(γ) h_x(ξ). We can write the posterior probability on the right-hand side of (11) as

P(γ ∈ B | x) = ∫_B ∫_Ξ f_x(γ) h_x(ξ) π_γ(γ) π_ξ(ξ) dξ dγ / ∫_Γ ∫_Ξ f_x(γ) h_x(ξ) π_γ(γ) π_ξ(ξ) dξ dγ = [∫_B f_x(γ) π_γ(γ) dγ] [∫_Ξ h_x(ξ) π_ξ(ξ) dξ] / ([∫_Γ f_x(γ) π_γ(γ) dγ] [∫_Ξ h_x(ξ) π_ξ(ξ) dξ]),

where the last equality follows from Fubini's Theorem. Hence,

P(γ ∈ B | x) = ∫_B f_x(γ) π_γ(γ) dγ / ∫_Γ f_x(γ) π_γ(γ) dγ. (13)

From Equations (7) and (13), we have that φ(x) = φ′(x, ξ) for every ξ ∈ Ξ. Thus, the NNP Principle is met by tests based on posterior probabilities, as in Example 1. This result also holds in more general settings.
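The computation in Example 2 can be mimicked on a discrete grid. In the sketch below (all concrete choices are ours, with `g` the interest parameter and `s` the nuisance), the posterior probability of a hypothesis about g alone is the same whether the nuisance is integrated out or fixed at any known value, which is exactly why tests based on posterior probabilities meet the NNPP.

```python
# Discretized version of Example 2: under prior independence and a separable
# likelihood, P(g in H-region | x) does not change if the nuisance s is known.

grid = [0.05 * i for i in range(1, 20)]
prior_g = {a: 1.0 for a in grid}             # prior for the interest parameter
prior_s = {b: 2.0 * b for b in grid}         # prior for the nuisance parameter
f_x = lambda a: a**3 * (1 - a)**7            # interest factor of the likelihood
h_x = lambda b: b**2 * (1 - b)**3            # nuisance factor of the likelihood

H = [a for a in grid if a <= 0.5]            # hypothesis region for g

# nuisance unknown: integrate it out of the joint posterior
num = sum(prior_g[a] * f_x(a) * prior_s[b] * h_x(b) for a in H for b in grid)
den = sum(prior_g[a] * f_x(a) * prior_s[b] * h_x(b) for a in grid for b in grid)
p_unknown = num / den

# nuisance known to equal b0: the factor h_x(b0) cancels in the ratio
for b0 in grid:
    num_k = sum(prior_g[a] * f_x(a) * h_x(b0) for a in H)
    den_k = sum(prior_g[a] * f_x(a) * h_x(b0) for a in grid)
    assert abs(num_k / den_k - p_unknown) < 1e-12
print("P(H | x) is the same whether the nuisance is known or not:",
      round(p_unknown, 4))
```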
In the next section, we examine a second test procedure that is in line with the NNPP. We review the mixed test introduced by Pericchi and Pereira ([8]) and prove that such a test meets the NNPP for simple hypotheses concerning the parameter of interest. We also show how the adherence of the mixed test to the NNPP can then simplify its use.
3. The Mixed Test Procedure
The mixed test, formally introduced in [8], is a test procedure that combines elements from both Bayesian and frequentist views. On the one hand, it considers an (intrinsically Bayesian) prior distribution for the parameter, from which predictive distributions for the data under the competing hypotheses and Bayes factors are derived. On the other hand, the performance of the test depends on ordering the sample space by the Bayes factor and on the integration of these predictive distributions over specific subsets of the sample space in a frequentist-like manner. The mixed test is optimal in the sense that it minimizes linear combinations of averaged (weighted) probabilities of decision errors. It also meets a few logical requirements for multiple-hypothesis testing and obeys the Likelihood Principle for discrete sample spaces, despite the integration over the sample space it involves. In addition, the test overcomes several of the drawbacks of fixed-level tests. However, a difficulty with the mixed test procedure is the need to evaluate the Bayes factor for every sample point in order to order the sample space, which may involve intensive calculations. Properties of the mixed test and examples of its application are examined in detail in [8,20,21,22,23,24,25].
Next, we review the general procedure for the performance of the mixed test and then show the test satisfies the NNPP when the hypothesis regarding the parameter of interest is a singleton.
First, we determine the predictive distributions for X under the competing hypotheses H and A, f_H and f_A, respectively. For the null hypothesis H: θ ∈ Θ_H, f_H is determined as follows: for each x ∈ 𝒳,

f_H(x) = ∫ L_x(θ) dπ_H(θ), (14)

where π_H denotes the conditional prior distribution of θ given H. That is, for each x ∈ 𝒳, f_H(x) is the expected value, against π_H, of the likelihood function generated by x. Similarly, for the alternative hypothesis A, we define

f_A(x) = ∫ L_x(θ) dπ_A(θ), (15)

where π_A denotes the conditional prior distribution of θ given A. From (14) and (15), we obtain the Bayes factor of x ∈ 𝒳 for the hypothesis H over A as

BF(x) = f_H(x) / f_A(x). (16)

Finally, the mixed test φ* for the hypotheses H versus A consists in rejecting H when x ∈ 𝒳 is observed if and only if the Bayes factor BF(x) is small. That is, for each x ∈ 𝒳,

φ*(x) = 1 if and only if BF(x) < b/a, (17)

where the positive constants a and b reflect the decision maker's evaluation of the impact of the errors of the two types or, equivalently, his prior preferences for the competing hypotheses. A detailed discussion of the specification of such constants is found in [8,20,21,22,23,24,25].
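To make the construction above concrete, the sketch below instantiates the predictive distributions (14) and (15) and the Bayes factor (16) for a simple case of our own choosing: a Binomial(n, θ) observation with point null H: θ = 1/2 and a uniform Beta(1, 1) prior under A, so that f_A is Beta-Binomial. The rejection rule BF(x) < b/a is our reading of (17), and the weights a = b = 1 are arbitrary.

```python
# Mixed test for X ~ Binomial(n, theta), H: theta = 1/2 vs A: theta != 1/2,
# with a uniform prior on theta under A (all choices illustrative).
from math import comb, factorial
from fractions import Fraction

n, theta0 = 10, Fraction(1, 2)

def f_H(x):
    """Predictive pmf of X under H: the Binomial(n, theta0) pmf, as in (14)."""
    return comb(n, x) * theta0**x * (1 - theta0)**(n - x)

def f_A(x):
    """Predictive pmf of X under A: Beta-Binomial(n, 1, 1), i.e. the Binomial
    likelihood integrated against a uniform Beta(1, 1) prior, as in (15)."""
    return comb(n, x) * Fraction(factorial(x) * factorial(n - x), factorial(n + 1))

def bayes_factor(x):                  # Equation (16): BF(x) = f_H(x) / f_A(x)
    return f_H(x) / f_A(x)

# With a = b = 1 (equal weights on the two error types), reject when BF < 1.
a_w, b_w = 1, 1
for x in range(n + 1):
    decision = "reject" if bayes_factor(x) < Fraction(b_w, a_w) else "accept"
    print(x, float(bayes_factor(x)), decision)
```

Exact `Fraction` arithmetic avoids any floating-point ambiguity in the comparisons with the cut-off.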
The mixed test can also be defined as a function of a new significance index; that is, (17) can be rewritten as a comparison between such a significance index and a specific cut-off value. These quantities are defined below.

For the mixed test defined in (17), the p-value of the observation x ∈ 𝒳 is the significance index given by

p(x) = P_H(BF(X) ≤ BF(x)), (18)

where P_H denotes probability computed under the predictive distribution f_H. Also, we define the adaptive type I error probability of φ* as

α* = P_H(φ*(X) = 1), (19)

that is, the probability, under f_H, that the test rejects H. Alternatively, α* is also known as the adaptive significance level of φ*.
Pereira et al. [21] proved that the mixed test φ* for the hypotheses H versus A can be written as

φ*(x) = 1 if and only if p(x) ≤ α*.

Note that φ* consists of comparing the p-value p(x) with the cut-off α*, which depends on the specific statistical model under consideration and on the sample size, as opposed to a standard test with a fixed significance level that does not depend on the sample size. The former does not have some of the disadvantages of the latter, such as inconsistency ([8,26]), the lack of correspondence between practical significance and statistical significance ([8,27]) and the absence of logical coherence under multiple-hypothesis testing. We continue with the main results of the manuscript.
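The sketch below illustrates the p-value in (18), the adaptive significance level in (19) and the rewritten form of the mixed test for a binomial point-null problem. The model, the uniform prior under A and the cut-off b/a = 1 are our own illustrative choices, and "reject H when the p-value does not exceed the adaptive significance level" is our reading of the result of Pereira et al. [21]; exact Fraction arithmetic keeps the boundary comparisons unambiguous.

```python
# p-value (18), adaptive significance level (19) and the equivalent form of
# the mixed test for X ~ Binomial(n, theta), H: theta = 1/2 (choices ours).
from math import comb, factorial
from fractions import Fraction

n, theta0, k = 10, Fraction(1, 2), Fraction(1, 1)   # k = b/a cut-off on the BF

def f_H(x):
    """Predictive pmf under H: Binomial(n, theta0)."""
    return comb(n, x) * theta0**x * (1 - theta0)**(n - x)

def f_A(x):
    """Predictive pmf under A: Beta-Binomial(n, 1, 1), i.e. uniform prior."""
    return comb(n, x) * Fraction(factorial(x) * factorial(n - x), factorial(n + 1))

def BF(x):
    return f_H(x) / f_A(x)

def p_value(x):
    """P_H(BF(X) <= BF(x)), as in (18)."""
    return sum(f_H(y) for y in range(n + 1) if BF(y) <= BF(x))

# adaptive significance level (19): predictive probability of rejection under H
alpha = sum(f_H(y) for y in range(n + 1) if BF(y) < k)

# equivalent form: reject H exactly when the p-value does not exceed alpha
for x in range(n + 1):
    assert (BF(x) < k) == (p_value(x) <= alpha)
print("adaptive significance level:", float(alpha))
```

Note how the cut-off α* comes out of the model and the sample size n rather than being fixed in advance, in line with the discussion above.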
The Mixed Test Obeys the NNPP
In this subsection, we prove that the mixed test meets the NNPP when the hypothesis about the parameter of interest is simple. We then examine further the case in which there is a statistic that is s-ancillary for the parameter of interest and show how the introduction of the concepts of a conditional p-value and a conditional adaptive significance level can make the performance of the mixed test much easier.
Theorem 3. Let θ = (γ, ξ) and Θ = Γ × Ξ (that is, Θ is variation independent). Let E and E′ be the two experiments defined in Section 2. Let γ0 ∈ Γ. In addition, let φ* and φ*′ be the mixed tests for the hypotheses H: γ = γ0 versus A: γ ≠ γ0, under E and E′, respectively. Assume ξ is absolutely continuous with prior density function π. Then, ξ is a Non-Informative Nuisance Parameter for testing H versus A by using φ*′, and for every (x, ξ) such that (1) holds, φ*(x) = φ*′(x, ξ).
Theorem 3 tells us that when the likelihood function may be factored as in (1), the mixed test obeys the NNPP. That is to say, if one aims to test a simple hypothesis about the parameter of interest γ in the presence of a non-informative nuisance parameter ξ by means of the mixed test, then he can disregard ξ in the analysis. From a purely mathematical viewpoint, when a point x satisfying (1) is observed, the decision between rejecting and accepting the null hypothesis regarding γ depends on x only through the factor f_x(γ), which is not a function of ξ, as we can see from Equation (A16) in Appendix A. It should be emphasized that Theorem 3 holds for null hypotheses more general than simple ones. For instance, the Theorem is still valid when the null hypothesis H states that γ lies in a hyperplane of Γ. The proof of this result is quite similar to the proof of Theorem 3 in Appendix A and for this reason is omitted.
The adherence to the NNPP is indeed an advantage of the mixed test. It may bring a considerable reduction in the calculations involved in the procedure of the mixed test, especially under statistical models for which a statistic s-ancillary for the parameter of interest can be found. Such cases are examined after Corollary 1, which follows straightforwardly from Theorems 2 and 3.
Corollary 1. Assume the same conditions of Theorem 3 and suppose that there is a statistic T = T(X) such that T is p-sufficient for ξ and s-ancillary for γ. Then, for all x ∈ 𝒳, φ*(x) = φ*′(x, ξ) for every ξ ∈ Ξ.
Now, let us suppose that under experiment E, there is a statistic T = T(X) such that T is s-ancillary for γ. Let H: γ = γ0 be the hypothesis of interest. From the predictive distribution f_H for X, we can define, for each value t of T, the conditional probability function for X given T = t, f_H(· | t), by

f_H(x | t) = f_H(x) / Σ_{y: T(y) = t} f_H(y), (24)

if T(x) = t, and f_H(x | t) = 0, otherwise.
Finally, from the conditional distribution in (24), we define two conditional statistics: the conditional p-value and the conditional adaptive significance level. Such quantities will be of great importance for the performance of the mixed test, as we will see in the next section.
Definition 3. Conditional p-value: Let E be an experiment for which the statistic T = T(X) is s-ancillary for γ. Let H: γ = γ0 be the hypothesis of interest, and f_H(· | t) be as in (24). We define the p-value conditional on T, for each x ∈ 𝒳 with T(x) = t, by

p(x | t) = Σ_{y: T(y) = t, BF(y) ≤ BF(x)} f_H(y | t).

From Equation (A14) in Appendix A, since T is s-ancillary for γ, the conditional p-value may be rewritten in terms of the conditional distribution of X given T = t alone, which does not involve the nuisance parameter ξ.
Definition 4. Conditional adaptive significance level: Let E be an experiment for which the statistic T = T(X) is s-ancillary for γ. Let H: γ = γ0 be the hypothesis of interest and f_H(· | t) be as in (24). We define the conditional adaptive significance level given T, α*(t), for each value t of T, as the probability, under f_H(· | t), that the mixed test rejects H. As with the conditional p-value, α*(t) may be rewritten in terms of the conditional distribution of X given T = t alone.
Definitions 3 and 4 are conditional versions of the quantities in (18) and (19), respectively. While the calculation of the unconditional quantities involves the evaluation of the Bayes factor at every point of the sample space, the determination of the conditional statistics at a specific sample point depends only on the values of the Bayes factor at the sample points x such that T(x) = t, which may be much easier to accomplish. Note also that the conditional p-value and the conditional adaptive significance level can be seen, respectively, as an alternative (conditional) measure of evidence in favor of the null hypothesis H and an alternative threshold value for testing the competing hypotheses. As a matter of fact, one can substitute the p-value and the adaptive significance level with their conditional versions in order to perform the mixed test. This is exactly what the next theorem states.
Theorem 4. Assume the same conditions as in Corollary 1 and Theorem 3. Then, for all x ∈ 𝒳,

φ*(x) = 1 if and only if p(x | T(x)) ≤ α*(T(x)).
The results of Theorems 3 and 4 and Corollary 1 suggest a way in which the mixed test may be used without so many calculations: when a statistic T that is s-ancillary for the parameter of interest is available, one can perform the test by comparing the conditional statistics p(x | T(x)) and α*(T(x)) instead of the unconditional quantities in (18) and (19). This possibility is illustrated in the next section.
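To illustrate the simplification just described, the sketch below performs the mixed test for the two-Poisson problem entirely through conditional quantities in the spirit of Definitions 3 and 4. Since T = X1 + X2 is s-ancillary for ψ = λ1/(λ1 + λ2), given T = t the model reduces to X1 | T = t ~ Binomial(t, ψ). The point null ψ0 = 1/2 (equal Poisson means), the uniform prior on ψ under A and the cut-off b/a = 1 are our own choices; the point is that no Poisson-mean nuisance appears anywhere in the computation.

```python
# Mixed test via conditional p-value and conditional adaptive significance
# level for X1, X2 independent Poisson, H: psi = 1/2 with psi = l1/(l1+l2).
# Given T = X1 + X2 = t, the model is X1 | t ~ Binomial(t, psi): the nuisance
# mu = l1 + l2 drops out entirely (a hedged sketch in the spirit of Theorem 4).
from math import comb
from fractions import Fraction

def conditional_mixed_test(x1, x2, psi0=Fraction(1, 2), k=Fraction(1, 1)):
    """Return (conditional p-value, conditional adaptive level, reject?)."""
    t = x1 + x2
    f_H = lambda y: comb(t, y) * psi0**y * (1 - psi0)**(t - y)  # Binomial(t, psi0)
    f_A = lambda y: Fraction(1, t + 1)       # Beta-Binomial(t, 1, 1): uniform
    BF = lambda y: f_H(y) / f_A(y)
    p_cond = sum(f_H(y) for y in range(t + 1) if BF(y) <= BF(x1))
    alpha_cond = sum(f_H(y) for y in range(t + 1) if BF(y) < k)
    return p_cond, alpha_cond, p_cond <= alpha_cond

# only x1 and t = x1 + x2 matter; the Poisson means never enter the sums
p, alpha, reject = conditional_mixed_test(9, 2)
print("conditional p-value:", float(p), "| adaptive level:", float(alpha),
      "| reject H:", reject)
```

Conditioning on T = t replaces sums over the infinite Poisson sample space with finite sums over {0, 1, …, t}, which is the practical gain promised by Theorem 4.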