The Logical Consistency of Simultaneous Agnostic Hypothesis Tests

Esteves, Luís G.; Izbicki, Rafael; Stern, Julio M.; Stern, Rafael B.

doi:10.3390/e18070256


by Luís G. Esteves ¹, Rafael Izbicki ², Julio M. Stern ¹ and Rafael B. Stern ²,*

¹ Institute of Mathematics and Statistics, University of São Paulo, São Paulo 13565-905, Brazil

² Department of Statistics, Federal University of São Carlos, São Carlos 05508-090, Brazil

* Author to whom correspondence should be addressed.

Entropy 2016, 18(7), 256; https://doi.org/10.3390/e18070256

Submission received: 30 May 2016 / Revised: 6 July 2016 / Accepted: 7 July 2016 / Published: 13 July 2016

(This article belongs to the Special Issue Statistical Significance and the Logic of Hypothesis Testing)


Abstract: Simultaneous hypothesis tests can fail to provide results that meet logical requirements. For example, if A and B are two statements such that A implies B, there exist tests that, based on the same data, reject B but not A. Such outcomes are generally inconvenient to statisticians (who want to communicate the results to practitioners in a simple fashion) and non-statisticians (confused by conflicting pieces of information). Based on this inconvenience, one might want to use tests that satisfy logical requirements. However, Izbicki and Esteves show that the only tests that are in accordance with three logical requirements (monotonicity, invertibility and consonance) are trivial tests based on point estimation, which generally lack statistical optimality. As a possible solution to this dilemma, this paper adapts the above logical requirements to agnostic tests, in which one can accept, reject or remain agnostic with respect to a given hypothesis. Each of the logical requirements is characterized in terms of a Bayesian decision theoretic perspective. Contrary to the results obtained for regular hypothesis tests, there exist agnostic tests that satisfy all logical requirements and also perform well statistically. In particular, agnostic tests that fulfill all logical requirements are characterized as region estimator-based tests. Examples of such tests are provided.

Keywords:

agnostic tests; multiple hypothesis testing; logical consistency; decision theory; loss functions


1. Introduction

One of the practical shortcomings of simultaneous test procedures is that they can lack logical consistency [1,2]. As a result, recent papers have discussed minimum logical requirements and methods that achieve these requirements [3,4,5,6,7]. For example, it has been argued that simultaneous tests ought to be in agreement with the following criterion: if hypothesis A implies hypothesis B, a procedure that rejects B should also reject A.

In particular, Izbicki and Esteves [3] and da Silva et al. [7] examine classical and Bayesian simultaneous tests with respect to four consistency properties:

Monotonicity: If A implies B, then a test that does not reject A should not reject B.
Invertibility: A test should reject A if and only if it does not reject not-A.
Union consonance: If a test rejects A and B, then it should reject $A \cup B$.
Intersection consonance: If a test does not reject A and does not reject B, then it should not reject $A \cap B$.

Izbicki and Esteves [3] prove that the only tests that are fully coherent are trivial tests based on point estimation, which are generally void of statistical optimality. This finding suggests that alternatives to the standard “reject versus accept” tests should be explored.

One such alternative is the class of agnostic tests [8], which can take one of three decisions: (i) accept a hypothesis (decision 0); (ii) reject it (decision 1); or (iii) noncommittally neither accept nor reject it, thus abstaining or remaining agnostic about the other two actions (decision $\frac{1}{2}$). Decision (iii) is also called a no-decision classification. The set of samples, $x \in X$, for which one abstains from making a decision about a given hypothesis is called a no-decision region [8]. An agnostic test enables one to explicitly deal with the difference between “accepting a hypothesis H” and “not rejecting H (remaining agnostic)”. This distinction will be made clearer in Section 5, which derives agnostic tests under a Bayesian decision-theoretic standpoint by means of specific penalties for false rejection, false acceptance and excessive abstinence.

We use the above framework to revisit the logical consistency of simultaneous hypothesis tests. Section 2 defines the agnostic testing scheme (ATS), a transformation that assigns to each statistical hypothesis an agnostic test function. This definition is illustrated with Bayesian and frequentist examples, using both existing and novel agnostic tests. Section 3 generalizes the logical requirements in [3] to agnostic testing schemes. Section 4 presents tests that satisfy all of these logical requirements. Section 5 obtains, under the Bayesian decision-theoretic paradigm, necessary and sufficient conditions on loss functions to ensure that Bayes tests meet each of the logical requirements. All theorems are proved in the Appendix.

2. Agnostic Testing Schemes

This section describes the mathematical setup for agnostic testing schemes. Let $X$ denote the sample space, Θ the parameter space and $L_{x}(θ)$ the likelihood function at the point $θ \in Θ$ generated by the data $x \in X$. We denote by $D = \{0, \frac{1}{2}, 1\}$ the set of all decisions that can be taken when testing a hypothesis: accept (0), remain agnostic ($\frac{1}{2}$) and reject (1). By an agnostic hypothesis test (or simply agnostic test) we mean a decision function from $X$ to $D$ [8,9]. Similar tests are commonly used in machine learning in the context of classification [1,2]. Moreover, let $Φ = \{ϕ : X \to D\}$ be the set of all (agnostic) hypothesis tests. The following definition adapts testing schemes [3] to agnostic tests.

Definition 1 (Agnostic Testing Scheme; ATS).

Let $σ(Θ)$, a σ-field of subsets of the parameter space Θ, be the set of hypotheses to be tested. An ATS is a function $L : σ(Θ) \to Φ$ that, for each hypothesis $A \in σ(Θ)$, assigns the test $L(A) \in Φ$ for testing A.

A way of creating an agnostic testing scheme is to find a collection of statistics and to compare them to thresholds:

Example 1.

For every $A \in σ(Θ)$, let $s_{A} : X \to \mathbb{R}$ be a statistic. Let $c_{1}, c_{2} \in \mathbb{R}$, with $c_{1} \geq c_{2}$, be fixed thresholds. For each $A \in σ(Θ)$, one can define $L(A) : X \to D$ by

L(A)(x) = \begin{cases} 0 & \text{if } s_{A}(x) > c_{1} \\ \frac{1}{2} & \text{if } c_{1} \geq s_{A}(x) > c_{2} \\ 1 & \text{if } c_{2} \geq s_{A}(x) \end{cases}

The ATS in Example 1 rejects a hypothesis if the value of the statistic $s_{A}$ is small, accepts it if this value is large, and remains agnostic otherwise. If $s_{A}(x)$ measures how much evidence x brings in favor of A, then this ATS rejects a hypothesis if the evidence brought by the data is small, accepts it if this evidence is large, and remains agnostic otherwise. The next examples present particular cases of this ATS. These examples will be explored in the following sections.
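The three-way rule of Example 1 can be sketched in a few lines of Python. This is only an illustration: the function name `threshold_test` and the numeric values are assumptions made for the sketch, not part of the paper.

```python
def threshold_test(s_value, c1, c2):
    """Agnostic decision of Example 1 for an observed statistic value s_A(x)."""
    if c1 < c2:
        raise ValueError("thresholds must satisfy c1 >= c2")
    if s_value > c1:
        return 0.0   # accept: s_A(x) > c1
    if s_value > c2:
        return 0.5   # remain agnostic: c1 >= s_A(x) > c2
    return 1.0       # reject: c2 >= s_A(x)

# Hypothetical statistic values and thresholds, chosen only for illustration.
decisions = [threshold_test(s, c1=0.8, c2=0.2) for s in (0.9, 0.5, 0.1)]
```

Setting `c1 == c2` removes the agnostic band, which recovers a standard two-decision test.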

Example 2 (ATS based on posterior probabilities).

Let $Θ = \mathbb{R}^{d}$ and $σ(Θ) = B(Θ)$, the Borel σ-field of $\mathbb{R}^{d}$. Assume that a prior probability $P$ on $σ(Θ)$ is fixed, and let $c_{1}, c_{2} \in (0, 1)$, with $c_{1} \geq c_{2}$, be fixed thresholds. For each $A \in σ(Θ)$, let $L(A) : X \to D$ be defined by

L(A)(x) = \begin{cases} 0 & \text{if } P(A | x) > c_{1} \\ \frac{1}{2} & \text{if } c_{1} \geq P(A | x) > c_{2} \\ 1 & \text{if } c_{2} \geq P(A | x) \end{cases}

where $P(\cdot | x)$ is the posterior distribution of θ, given x. This is essentially the test that Ripley [10] proposed in the context of classification, which was also investigated by Babb et al. [9]. When $c_{1} = c_{2}$, this ATS is a standard (non-agnostic) Bayesian testing scheme.

Example 3 (Likelihood Ratio Tests with fixed threshold).

Let $Θ = \mathbb{R}^{d}$ and $σ(Θ) = P(Θ)$, the power set of $\mathbb{R}^{d}$. Let $c_{1}, c_{2} \in (0, 1)$, with $c_{1} \geq c_{2}$, be fixed thresholds. For each $A \in σ(Θ)$, let

λ_{x}(A) = \frac{\sup_{θ \in A} L_{x}(θ)}{\sup_{θ \in Θ} L_{x}(θ)}

be the likelihood ratio statistic for sample $x \in X$. Define $L$ by

L(A)(x) = \begin{cases} 0 & \text{if } λ_{x}(A) > c_{1} \\ \frac{1}{2} & \text{if } c_{1} \geq λ_{x}(A) > c_{2} \\ 1 & \text{if } c_{2} \geq λ_{x}(A) \end{cases}

When $c_{1} = c_{2}$, this is the standard (non-agnostic) likelihood ratio testing scheme with fixed threshold [3].

A test similar to that of Example 3 is developed by Berg [8]; however, the values of the cutoffs $c_{1}$ and $c_{2}$ are allowed to change with the hypothesis of interest, and they are chosen so as to control the level of significance and the power of each of the tests.

Example 4 (FBST ATS).

Let $Θ = \mathbb{R}^{d}$, $σ(Θ) = B(\mathbb{R}^{d})$, and $f(θ)$ be the prior probability density function (p.d.f.) for θ. Suppose that, for each $x \in X$, there exists $f(θ | x)$, the p.d.f. of the posterior distribution of θ, given x. For each hypothesis $A \in σ(Θ)$, let

T_{x}^{A} = \{θ \in Θ : f(θ | x) > \sup_{θ' \in A} f(θ' | x)\}

be the set tangent to the null hypothesis and let

ev_{x}(A) = 1 - P(θ \in T_{x}^{A} | x)

be the Pereira–Stern evidence value for A [11]. Let $c_{1}, c_{2} \in (0, 1)$, with $c_{1} \geq c_{2}$, be fixed thresholds. One can define an ATS $L$ by

L(A)(x) = \begin{cases} 0 & \text{if } ev_{x}(A) > c_{1} \\ \frac{1}{2} & \text{if } c_{1} \geq ev_{x}(A) > c_{2} \\ 1 & \text{if } c_{2} \geq ev_{x}(A) \end{cases}

When $c_{1} = c_{2}$, this ATS reduces to the standard (non-agnostic) FBST testing scheme [3].
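To make the evidence value concrete, the sketch below computes it in one special case that is assumed here purely for illustration: a one-dimensional Normal posterior and a one-sided hypothesis A = [a, ∞). In that case the tangent set is the interval of points whose posterior density exceeds the density at a. The function name `ev_one_sided` and the numbers are hypothetical.

```python
from statistics import NormalDist

def ev_one_sided(a, post_mean, post_sd):
    """e-value of A = [a, inf) under a Normal(post_mean, post_sd**2) posterior."""
    if a <= post_mean:
        return 1.0  # the posterior mode lies in A, so the tangent set is empty
    posterior = NormalDist(post_mean, post_sd)
    # Tangent set: points with density above f(a | x), i.e. (2*post_mean - a, a).
    tangent_prob = posterior.cdf(a) - posterior.cdf(2 * post_mean - a)
    return 1.0 - tangent_prob

# Assumed posterior N(0, 1) and hypothesis theta >= 1.
ev = ev_one_sided(a=1.0, post_mean=0.0, post_sd=1.0)
```

The resulting value can then be compared with the thresholds c1 and c2 exactly as in the displayed decision rule.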

The following example presents a novel ATS based on region estimators.

Example 5 (Region Estimator-based ATS).

Let $R : X \to P(Θ)$ be a region estimator of θ. For every $A \in σ(Θ)$ and $x \in X$, one can define an ATS $L$ via

L(A)(x) = \begin{cases} 0 & \text{if } R(x) \subseteq A \\ 1 & \text{if } R(x) \subseteq A^{c} \\ \frac{1}{2} & \text{otherwise} \end{cases}

Hence, $L(A)(x) = \frac{I(R(x) \subseteq A^{c}) + I(R(x) \nsubseteq A)}{2}$. See Figure 1 for an illustration of this procedure.
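A minimal sketch of the region estimator-based decision follows, under the simplifying assumption (made only for this illustration) that both the region estimate and the hypothesis are closed intervals on the real line, each given as a `(lo, hi)` pair. The function name `region_test` is hypothetical.

```python
import math

def region_test(region, hypothesis):
    """Example 5 decision when R(x) and A are closed intervals (lo, hi)."""
    r_lo, r_hi = region
    a_lo, a_hi = hypothesis
    if a_lo <= r_lo and r_hi <= a_hi:
        return 0.0   # R(x) is contained in A: accept
    if r_hi < a_lo or a_hi < r_lo:
        return 1.0   # R(x) is contained in the complement of A: reject
    return 0.5       # R(x) meets both A and its complement: remain agnostic

region = (0.2, 0.9)                                   # hypothetical region estimate
accepted = region_test(region, (0.0, math.inf))       # hypothesis theta >= 0
rejected = region_test(region, (-math.inf, 0.0))      # hypothesis theta <= 0
undecided = region_test(region, (0.5, 0.5))           # precise hypothesis theta = 0.5
```

The last call shows a precise hypothesis inside a non-degenerate region estimate: it can be rejected or left undecided, but not accepted.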

Notice that for continuous Θ, Example 5 does not accept precise (i.e., null Lebesgue measure) hypotheses, yielding either rejection or abstinence (unless region estimates are themselves precise). Therefore, the behavior of region estimator-based ATSs is in agreement with the prevailing position among both Bayesian and frequentist statisticians: to accept a precise hypothesis is inappropriate. From a Bayesian perspective, precise null hypotheses usually have zero posterior probability, and thus should not be accepted. From a frequentist perspective, not rejecting a hypothesis is not the same as accepting it. See Berger and Delampady [12] and references therein for a detailed account of the controversial problem of testing precise hypotheses.

In principle, R can be any region estimator. However, some choices of R lead to better statistical performance. For example, from a frequentist perspective, one might choose R to be a confidence region. This choice is explored in the next example.

Example 6.

From a frequentist perspective, one might choose R in Example 5 to be a confidence region: if the region estimator has confidence at least $1 - α$, then the type I error probability, $\sup_{θ \in A} P(L(A)(X) = 1 | θ)$, is at most α for each of the hypothesis tests. Indeed,

\sup_{θ \in A} P(L(A)(X) = 1 | θ) = \sup_{θ \in A} P(θ' \notin R(X) \text{ for every } θ' \in A | θ) \leq \sup_{θ \in A} P(θ \notin R(X) | θ) \leq α .

If R is a confidence region, then this ATS also controls the Family Wise Error Rate (FWER, [13]), as shown in Section 3.1.

Consider $X_{1}, \dots, X_{20} | μ \overset{i.i.d.}{\sim} \text{Normal}(μ, 1)$. In Figure 2, we illustrate how the probability of each decision, $P(L(A)(X) = d | μ)$ for $d \in \{0, \frac{1}{2}, 1\}$, varies as a function of μ for three hypotheses: (i) $μ < 0$; (ii) $μ = 0$; and (iii) $0 < μ < 1$. We consider the standard region estimator for μ,

R(X) = \left[\bar{X} - z_{1 - α/2} \frac{1}{\sqrt{n}}, \; \bar{X} + z_{1 - α/2} \frac{1}{\sqrt{n}}\right]

with $α = 5\%$. These curves represent the generalization of the standard power function to agnostic hypothesis tests. Notice that $μ = 0$ is never accepted, and that, whenever the hypothesis being tested is true, each test rejects it with probability at most 5%.
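The decision probabilities described above can be computed in closed form, because the sample mean is Normal with mean μ and standard deviation 1/√n. The sketch below does this for hypotheses (i) and (ii) only; the function name `decision_probs` and the string labels are assumptions of the sketch, and it is not the code used to draw Figure 2.

```python
from math import sqrt
from statistics import NormalDist

n, alpha = 20, 0.05
half_width = NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)

def decision_probs(mu, hypothesis):
    """Return (P(accept), P(agnostic), P(reject)) at the true mean mu."""
    xbar = NormalDist(mu, 1 / sqrt(n))      # sampling distribution of the mean
    below = xbar.cdf(-half_width)           # P(upper end of R(X) lies below 0)
    above = 1 - xbar.cdf(half_width)        # P(lower end of R(X) lies above 0)
    if hypothesis == "mu < 0":
        accept, reject = below, above
    elif hypothesis == "mu = 0":
        accept, reject = 0.0, below + above  # an interval never fits inside {0}
    else:
        raise ValueError("unsupported hypothesis")
    return accept, 1 - accept - reject, reject

probs_point_null = decision_probs(0.0, "mu = 0")
```

Evaluating `decision_probs` on a grid of μ values gives the three curves for each hypothesis.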

The next two examples show other cases of the ATS in Example 5 that use region estimators based on the measures of evidence in Examples 3 and 4.

Example 7 (Region Likelihood Ratio ATS).

For a fixed value $c \in (0, 1)$, define the region estimate $R_{c}(x) = \{θ \in Θ : λ_{x}(\{θ\}) \geq c\}$, where $λ_{x}$ is the likelihood ratio statistic from Example 3. For every $A \in σ(Θ)$ and $x \in X$, the ATS based on this region estimator (Example 5) satisfies $L(A)(x) = 1 \Leftrightarrow A \cap R_{c}(x) = \emptyset$, and $L(A)(x) = 0 \Leftrightarrow A^{c} \cap R_{c}(x) = \emptyset$. It follows that this ATS can be written as

L(A)(x) = \begin{cases} 0 & \text{if } λ_{x}(A^{c}) < c \\ 1 & \text{if } λ_{x}(A) < c \\ \frac{1}{2} & \text{otherwise} \end{cases}

Example 8 (Region FBST ATS).

For a fixed value of $c \in (0, 1)$, let $HPD_{c}^{x}$ be the Highest Posterior Probability Density region with probability $1 - c$, based on observation x [3,14]. For every $A \in σ(Θ)$ and $x \in X$, the ATS based on this region estimator (Example 5) satisfies $L(A)(x) = 1 \Leftrightarrow A \cap HPD_{c}^{x} = \emptyset$, and $L(A)(x) = 0 \Leftrightarrow A^{c} \cap HPD_{c}^{x} = \emptyset$. It follows that this ATS can be written as

L(A)(x) = \begin{cases} 0 & \text{if } ev_{x}(A^{c}) < c \\ 1 & \text{if } ev_{x}(A) < c \\ \frac{1}{2} & \text{otherwise} \end{cases}

Next, we introduce four logical coherence properties for agnostic testing schemes and investigate which tests satisfy them.

3. Coherence Properties

3.1. Monotonicity

Monotonicity restricts the decisions that are available for nested hypotheses. If hypothesis A implies hypothesis B (i.e., $A \subseteq B$), then a testing scheme that rejects B should also reject A. Monotonicity has received a lot of attention in the literature (e.g., [5,6,15,16,17,18,19]). It can be extended to ATSs in the following way.

Definition 2 (Monotonicity).

An ATS $L : σ(Θ) \to Φ$ is monotonic if, for every $A, B \in σ(Θ)$, $A \subset B$ implies that $L(A) \geq L(B)$.

In words, $L$ is monotonic if, for every pair of hypotheses $A \subset B$,

if $L$ accepts A, then it also accepts B.
if $L$ remains agnostic about A, then it either remains agnostic about B or accepts B.

Next, we illustrate some monotonic agnostic testing schemes.

Example 9 (Tests based on posterior probabilities).

The ATS from Example 2 is monotonic. Indeed, $A \subset B$ implies that $P(A | x) \leq P(B | x)$ $\forall x \in X$, and hence $P(A | x) > c_{i}$ implies that $P(B | x) > c_{i}$ for $i = 1, 2$.
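The monotonicity argument for posterior-probability tests can be checked by brute force on a small example. The sketch below assumes a finite parameter space with four points and a hypothetical posterior vector, enumerates every hypothesis (subset), and verifies that L(A) ≥ L(B) whenever A ⊆ B. Exact fractions are used to avoid rounding issues; all names and numbers are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations

def posterior_test(prob, c1, c2):
    """Example 2 decision for a hypothesis with posterior probability prob."""
    if prob > c1:
        return Fraction(0)
    return Fraction(1, 2) if prob > c2 else Fraction(1)

def is_monotonic(posterior, c1, c2):
    """Check L(A) >= L(B) whenever A is a subset of B, over all subsets."""
    points = range(len(posterior))
    subsets = [frozenset(s) for k in range(len(posterior) + 1)
               for s in combinations(points, k)]
    decision = {a: posterior_test(sum(posterior[i] for i in a), c1, c2)
                for a in subsets}
    return all(decision[a] >= decision[b]
               for a in subsets for b in subsets if a <= b)

posterior = [Fraction(k, 10) for k in (1, 2, 3, 4)]   # hypothetical P(theta | x)
monotone = is_monotonic(posterior, c1=Fraction(7, 10), c2=Fraction(3, 10))
```

Such a check is no substitute for the proof, but it is a quick way to test a candidate statistic before attempting one.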

Example 10 (Likelihood Ratio Tests with fixed threshold).

The ATS from Example 3 is monotonic. This is because if $A, B \in σ(Θ)$ are such that $A \subset B$, then $\sup_{θ \in A} L_{x}(θ) \leq \sup_{θ \in B} L_{x}(θ)$, $\forall x \in X$, which implies that $λ_{x}(A) \leq λ_{x}(B)$. It follows that $λ_{x}(A) > c_{i}$ implies that $λ_{x}(B) > c_{i}$ for $i = 1, 2$.

Example 11 (FBST).

The ATS from Example 4 is monotonic. In fact, let $A, B \in σ(Θ)$ be such that $A \subset B$. We have $\sup_{B} f(θ | x) \geq \sup_{A} f(θ | x)$ $\forall x \in X$. Hence, $T_{x}^{B} \subseteq T_{x}^{A}$, and, therefore, $ev_{x}(A) \leq ev_{x}(B)$. It follows that $ev_{x}(A) > c_{i}$ implies that $ev_{x}(B) > c_{i}$ for $i = 1, 2$.

Notice that p-values and Bayes factors are not (coherent) measures of support for hypotheses [19,20], and therefore using them in a similar fashion as in Examples 2–4 would not lead to monotonic agnostic testing schemes. On the other hand, any monotonic statistic $s_{A}$ provides a monotonic ATS, because, if $A \subseteq B$, $s_{A}(x) > c_{i}$ implies that $s_{B}(x) > c_{i}$ for $i = 1, 2$. Another example of such a statistic is the s-value defined by Patriota [6]. As a matter of fact, every monotonic ATS is, in a sense, associated with monotonic statistics, as shown in the next theorem.

Theorem 1.

Let $L$ be an agnostic testing scheme. $L$ is monotonic if, and only if, there exist a collection of test statistics $(s_{A})_{A \in σ(Θ)}$, $s_{A} : X \to I \subseteq \mathbb{R}$, with $s_{A} \leq s_{B}$ whenever $A \subset B$, $A, B \in σ(Θ)$, and cutoffs $c_{1}, c_{2} \in I$, $c_{1} \geq c_{2}$, such that for every $A \in σ(Θ)$ and $x \in X$,

L(A)(x) = \begin{cases} 0 & \text{if } s_{A}(x) > c_{1} \\ \frac{1}{2} & \text{if } c_{1} \geq s_{A}(x) > c_{2} \\ 1 & \text{if } c_{2} \geq s_{A}(x) \end{cases} \qquad (1)

Example 12 (Region Estimator).

The ATS from Example 5 is monotonic, because if $A \subseteq B$, $A, B \in σ(Θ)$, then

L(A)(x) = \frac{I(R(x) \subseteq A^{c}) + I(R(x) \nsubseteq A)}{2} \geq \frac{I(R(x) \subseteq B^{c}) + I(R(x) \nsubseteq B)}{2} = L(B)(x),

as $I(R(x) \subseteq B^{c}) \leq I(R(x) \subseteq A^{c})$ and $I(R(x) \nsubseteq B) \leq I(R(x) \nsubseteq A)$. Because this ATS is monotonic, it also controls the Family Wise Error Rate when R is a confidence region as in Example 6 [21].

3.2. Union Consonance

Finner and Strassburger [4] and Izbicki and Esteves [3] investigated the following logical property, named union consonance: if a (non-agnostic) testing scheme rejects each of the hypotheses A and B, it should also reject their union $A \cup B$. In other words, a TS cannot accept the union while rejecting its components. In this section, we adapt the concept of union consonance to the framework of agnostic testing schemes by considering two extensions of this desideratum: weak and strong union consonance.

Definition 3 (Weak Union Consonance).

An ATS $L : σ(Θ) \to Φ$ is weakly consonant with the union if, for every $A, B \in σ(Θ)$, and for every $x \in X$,

L(A)(x) = 1 \text{ and } L(B)(x) = 1 \text{ implies } L(A \cup B)(x) \neq 0

This is exactly the definition of union consonance for non-agnostic testing schemes. Notice that, according to this definition, it is possible to remain agnostic about $A \cup B$ while rejecting A and B.

Remark 1.

Izbicki and Esteves [3] show that if a non-agnostic testing scheme $L$ satisfies union consonance, then for every finite set of indices I and for every $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$, $\min \{L(A_{i})\}_{i \in I} = 1$ implies that $L(\cup_{i \in I} A_{i}) \neq 0$. This is not the case for weakly union consonant agnostic testing schemes; we leave further details to Section 4.3.

The second definition of union consonance is more stringent than the first one:

Definition 4 (Strong Union Consonance).

An ATS $L : σ(Θ) \to Φ$ is strongly consonant with the union if, for every arbitrary set of indices I and for every $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$ such that $\cup_{i \in I} A_{i} \in σ(Θ)$, and for every $x \in X$,

\min \{L(A_{i})(x)\}_{i \in I} = 1 \text{ implies } L(\cup_{i \in I} A_{i})(x) = 1

Definition 3 is less stringent than Definition 4 in two senses: (i) the latter imposes the (strict) rejection of a union of hypotheses whenever each of them is rejected, while the former imposes just non-acceptance (rejection or abstention) of the union in such circumstances; and (ii) in Definition 4 consonance is required to hold for every set (possibly infinite) of hypotheses, as opposed to Definition 3, which applies only to pairs of hypotheses. Notice that if an ATS is strongly consonant with the union, it is also weakly consonant with the union, and that both definitions are indeed extensions of the concept presented by Izbicki and Esteves [3].

The following examples show ATSs that are consonant with union.

Example 13 (Tests based on posterior probabilities).

Consider again Example 2 with the restriction $c_{1} \geq 2 c_{2}$. If A and B are rejected after observing $x \in X$, then

P(A \cup B | x) \leq P(A | x) + P(B | x) \leq 2 c_{2} \leq c_{1},

and therefore $A \cup B$ cannot be accepted. Thus, with this restriction, this ATS is weakly consonant with the union. The restriction $c_{1} \geq 2 c_{2}$ is not only sufficient to ensure weak union consonance, but it is actually necessary to ensure it holds for every prior distribution (see Theorem 2). Notice, however, that this ATS is not strongly consonant with the union in general.
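The role of the restriction on the two thresholds can be seen on a small, assumed example: a three-point parameter space with hypothetical posterior probabilities 0.3, 0.3 and 0.4. The sketch below lists every pair of rejected hypotheses whose union is accepted, once with thresholds that satisfy the restriction and once with thresholds that violate it. Names and numbers are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def decide(prob, c1, c2):
    """Example 2 decision: 0 accept, 1/2 agnostic, 1 reject."""
    return 0 if prob > c1 else (Fraction(1, 2) if prob > c2 else 1)

def union_violations(posterior, c1, c2):
    """Pairs (A, B) that are both rejected while their union is accepted."""
    points = range(len(posterior))
    subsets = [frozenset(s) for k in range(len(posterior) + 1)
               for s in combinations(points, k)]
    prob = lambda a: sum((posterior[i] for i in a), Fraction(0))
    return [(a, b) for a in subsets for b in subsets
            if decide(prob(a), c1, c2) == 1 and decide(prob(b), c1, c2) == 1
            and decide(prob(a | b), c1, c2) == 0]

posterior = [Fraction(3, 10), Fraction(3, 10), Fraction(4, 10)]
ok = union_violations(posterior, c1=Fraction(6, 10), c2=Fraction(3, 10))   # c1 >= 2*c2
bad = union_violations(posterior, c1=Fraction(5, 10), c2=Fraction(3, 10))  # c1 < 2*c2
```

With the second pair of thresholds, the two singletons of probability 0.3 are rejected while their union, of probability 0.6, is accepted.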

Example 14 (Likelihood Ratio Tests with fixed threshold).

The ATS of Example 3 is strongly consonant with the union. Indeed, let I be an arbitrary set of indices and $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$ be such that $\cup_{i \in I} A_{i} \in σ(Θ)$. For every $x \in X$, $λ_{x}(\cup_{i \in I} A_{i}) = \sup_{i \in I} \{λ_{x}(A_{i})\}$ [3]. It follows that if $λ_{x}(A_{i}) \leq c_{2}$ for every $i \in I$, then $λ_{x}(\cup_{i \in I} A_{i}) \leq c_{2}$. Thus, if $L$ rejects all hypotheses $A_{i}$ after x is observed, it also rejects $\cup_{i \in I} A_{i}$. In addition, $L$ is also weakly consonant with the union.

Example 15 (FBST).

The ATS from Example 4 is also strongly consonant with the union. Indeed, let I be an arbitrary set of indices and $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$ be such that $\cup_{i \in I} A_{i} \in σ(Θ)$. For every $x \in X$, $ev_{x}(\cup_{i \in I} A_{i}) = \sup_{i \in I} \{ev_{x}(A_{i})\}$ [22]. Strong union consonance holds by the same argument as in Example 14. It follows that $L$ is also weakly consonant with the union.

Example 16 (Region Estimator).

The ATS from Example 5 satisfies strong union consonance. Indeed, let I be an arbitrary set of indices and $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$ be such that $\cup_{i \in I} A_{i} \in σ(Θ)$. If $L(A_{i})(x) = 1$, then $R(x) \subseteq A_{i}^{c}$. Hence, if $L(A_{i})(x) = 1$ for every $i \in I$, $R(x) \subseteq \cap_{i \in I} A_{i}^{c} = (\cup_{i \in I} A_{i})^{c}$, and, therefore, $\cup_{i \in I} A_{i}$ is rejected. It follows that $L$ is also weakly consonant with the union.

3.3. Intersection Consonance

The third property we investigate, named intersection consonance [3], states that a (non-agnostic) testing scheme cannot accept hypotheses A and B while rejecting their intersection. We consider two extensions of this definition to agnostic testing schemes.

Definition 5 (Weak Intersection Consonance).

An ATS $L : σ(Θ) \to Φ$ is weakly consonant with the intersection if, for every $A, B \in σ(Θ)$ and $x \in X$,

L(A)(x) = 0 \text{ and } L(B)(x) = 0 \text{ implies } L(A \cap B)(x) \neq 1 .

This is exactly the definition of intersection consonance for non-agnostic testing schemes. Notice that it is possible to accept A and B while being agnostic about $A \cap B$.

The second definition of intersection consonance is more stringent:

Definition 6 (Strong Intersection Consonance).

An ATS $L : σ(Θ) \to Φ$ is strongly consonant with the intersection if, for every arbitrary set of indices I and for every $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$ such that $\cap_{i \in I} A_{i} \in σ(Θ)$, and for every $x \in X$,

\max \{L(A_{i})(x)\}_{i \in I} = 0 \text{ implies } L(\cap_{i \in I} A_{i})(x) = 0 .

As in the case of union consonance, Definition 5 is less stringent than Definition 6 in two senses: (i) the latter imposes the (strict) acceptance of an intersection of hypotheses whenever each of them is accepted, while the former imposes just non-rejection (acceptance or abstention) of the intersection in such circumstances; and (ii) in Definition 6 consonance is required to hold for every set (possibly infinite) of hypotheses, as opposed to Definition 5, which applies only to pairs of hypotheses. Notice that if an ATS is strongly consonant with the intersection, it is also weakly consonant with the intersection, and that both definitions are indeed extensions of the concept presented by Izbicki and Esteves [3].

Example 17 (Tests based on posterior probabilities).

Consider Example 2 with the restriction $c_{2} \leq 2 c_{1} - 1$. If A and B are accepted when $x \in X$ is sampled, then $P(A | x) > c_{1}$ and $P(B | x) > c_{1}$. By the Fréchet inequality, it follows that

P(A \cap B | x) \geq P(A | x) + P(B | x) - 1 > 2 c_{1} - 1 \geq c_{2}

and, therefore, $A \cap B$ cannot be rejected. It follows that weak intersection consonance holds. The restriction $c_{2} \leq 2 c_{1} - 1$ is not only sufficient to ensure weak intersection consonance, but it is actually necessary to ensure this property holds for every prior distribution; see Theorem 2. Notice, however, that this ATS is not strongly consonant with the intersection in general (take, for example, $Θ = [0, 1]$, $P = λ$ (the Lebesgue measure), $A = [0, 2/3]$, $B = [1/3, 1]$, and $c_{1} = 3/5$).
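The counterexample in parentheses can be worked out explicitly. Reading λ as the distribution under which the probabilities of the hypotheses are evaluated (an interpretation assumed here), the three relevant probabilities are:

```latex
\begin{align*}
P(A) &= \lambda([0, 2/3]) = 2/3 > 3/5 = c_1, \\
P(B) &= \lambda([1/3, 1]) = 2/3 > 3/5 = c_1, \\
P(A \cap B) &= \lambda([1/3, 2/3]) = 1/3 \leq 3/5 = c_1 .
\end{align*}
```

Both A and B are therefore accepted, while their intersection is not, which is what strong intersection consonance would require.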

The ATS based on the likelihood ratio statistic from Example 3 does not satisfy intersection consonance, because there are examples in which $λ_{x}(A \cap B) = 0$, while $λ_{x}(A) > 0$ and $λ_{x}(B) > 0$ (consider, for example, that every $θ \in Θ$ has the same likelihood and $A \cap B = \emptyset$). Similarly, the ATS based on FBST from Example 4 is not consonant with the intersection, because there are examples such that $ev_{x}(A \cap B) = 0$, while $ev_{x}(A) > 0$ and $ev_{x}(B) > 0$. In contrast, ATSs based on region estimators are consonant with the intersection.

Example 18 (Region Estimator).

The ATS from Example 5 satisfies both strong and weak intersection consonance. Indeed, let I be an arbitrary set of indices and $\{A_{i}\}_{i \in I} \subseteq σ(Θ)$ be such that $\cap_{i \in I} A_{i} \in σ(Θ)$. If $L(A_{i})(x) = 0$ for every $i \in I$, then $R(x) \subseteq A_{i}$ for every $i \in I$. It follows that $R(x) \subseteq \cap_{i \in I} A_{i}$, and hence $\cap_{i \in I} A_{i}$ is accepted.

It follows that the ATSs from Examples 7 and 8 are also consonant with the intersection. Hence, it is possible to use e-values and likelihood ratio statistics to define ATSs that are consonant with the intersection.

Example 19 (ANOVA).

In [3], the authors present an example which we now revisit. Suppose that $X_{1}, \dots, X_{20}$ are i.i.d. $N(μ_{1}, σ^{2})$; $X_{21}, \dots, X_{40}$ are i.i.d. $N(μ_{2}, σ^{2})$; and $X_{41}, \dots, X_{60}$ are i.i.d. $N(μ_{3}, σ^{2})$. Consider the following hypotheses:

H_{0}^{(1,2,3)} : μ_{1} = μ_{2} = μ_{3}, \quad H_{0}^{(1,2)} : μ_{1} = μ_{2}, \quad H_{0}^{(1,3)} : μ_{1} = μ_{3}

and suppose that we observe the following sample means and standard deviations: $\bar{X}_{1} = 0.15$, $S_{1} = 1.09$; $\bar{X}_{2} = -0.13$, $S_{2} = 0.5$; $\bar{X}_{3} = -0.38$, $S_{3} = 0.79$. Using the likelihood ratio statistics, we have the following p-values for these hypotheses:

p_{H_{0}^{(1,2,3)}} = 0.0498, \quad p_{H_{0}^{(1,2)}} = 0.2564, \quad p_{H_{0}^{(1,3)}} = 0.0920

Therefore, the testing scheme given by the likelihood ratio tests with common level of significance $α = 5\%$ rejects $H_{0}^{(1,2,3)}$ but does not reject either $H_{0}^{(1,2)}$ or $H_{0}^{(1,3)}$. It follows that intersection consonance does not hold. Now, consider the region estimator ATS based on the region estimate given by [23] for this setting,

R(x) = \{(μ_{1}, μ_{2}, μ_{3}) \in \mathbb{R}^{3} : μ_{1} - μ_{2} \in [-1.65, 2.21], \; μ_{2} - μ_{3} \in [-1.68, 2.18], \; μ_{1} - μ_{3} \in [-1.40, 2.46]\}

The region $R(x)$ intersects each of the hypotheses $H_{0}^{(1,2,3)}$, $H_{0}^{(1,2)}$ and $H_{0}^{(1,3)}$ as well as each of their complements, so that one remains agnostic about all of them. As expected, intersection consonance holds using this ATS.
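By Example 5, one remains agnostic about a hypothesis exactly when R(x) is contained neither in the hypothesis nor in its complement, that is, when R(x) contains at least one point of each. The sketch below checks this for the region quoted above using two witness points; the points and all names are chosen here for illustration.

```python
def in_region(mu):
    """Membership in the region estimate R(x) quoted in Example 19."""
    m1, m2, m3 = mu
    return (-1.65 <= m1 - m2 <= 2.21 and
            -1.68 <= m2 - m3 <= 2.18 and
            -1.40 <= m1 - m3 <= 2.46)

hypotheses = {
    "H0(1,2,3)": lambda mu: mu[0] == mu[1] == mu[2],
    "H0(1,2)":   lambda mu: mu[0] == mu[1],
    "H0(1,3)":   lambda mu: mu[0] == mu[2],
}

inside_all = (0.0, 0.0, 0.0)    # in R(x) and in every hypothesis
inside_none = (1.0, 0.0, 0.5)   # in R(x) and in the complement of every hypothesis

# Agnostic about H when R(x) holds a point of H and a point of its complement.
agnostic = {name: in_region(inside_all) and h(inside_all)
                  and in_region(inside_none) and not h(inside_none)
            for name, h in hypotheses.items()}
```

Both witness points lie in R(x), so none of the three hypotheses is accepted or rejected.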

3.4. Invertibility

Invertibility formalizes the notion of simultaneous tests free from the labels “null” and “alternative” for the hypotheses of interest and has been suggested by several authors, especially under a Bayesian perspective [3,24,25].

Definition 7 (Invertibility).

An ATS $L : σ(Θ) \to Φ$ is invertible if, for every $A \in σ(Θ)$,

L(A^{c}) = 1 - L(A)

Example 20 (Tests based on posterior probabilities).

The ATS from Example 2 is invertible for every prior distribution if and only if $c_{2} = 1 - c_{1}$.

Example 21 (Region Estimator).

The ATS from Example 5 is invertible. Indeed,

L(A)(x) = \frac{I(R(x) \subseteq A^{c}) + I(R(x) \nsubseteq A)}{2} = \frac{(1 - I(R(x) \nsubseteq A^{c})) + (1 - I(R(x) \subseteq A))}{2} = 1 - L(A^{c})(x)

It follows that the ATSs from Examples 7 and 8 are also invertible.

4. Satisfying All Properties

Is it possible to construct non-trivial agnostic testing schemes that satisfy all consistency properties simultaneously? Contrary to the case of non-agnostic testing schemes [3], the answer is yes. We next examine this question considering three sets of desiderata: the weak desiderata (Section 4.1), the strong desiderata (Section 4.2), and the n-weak desiderata (Section 4.3).

4.1. Weak Desiderata

Definition 8 (Weakly Consistent ATS).

An ATS, $L$, is said to be weakly consistent if $L$ is monotonic (Definition 2), invertible (Definition 7), weakly consonant with the union (Definition 3), and weakly consonant with the intersection (Definition 5).

Example 22 (Region Estimator).

The ATS from Example 5 was already shown to satisfy all consistency properties from Definition 8 (Examples 12, 16, 18 and 21). Thus, it is a weakly consistent ATS.

It follows that the ATSs from Examples 7 and 8, based on measures of support (likelihood ratio statistics and e-values), are weakly consistent ATSs.
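The four properties of Definition 8 can also be verified exhaustively for a region estimator-based ATS on a small finite parameter space. The sketch below assumes Θ = {0, 1, 2, 3}, takes every subset as a hypothesis, and uses a fixed non-empty region estimate; function names and the chosen region are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def region_ats(region, hypothesis, theta):
    """Example 5 decision for finite sets (region is assumed non-empty)."""
    if region <= hypothesis:
        return Fraction(0)
    if region <= theta - hypothesis:
        return Fraction(1)
    return Fraction(1, 2)

def weakly_consistent(region, theta):
    """Check monotonicity, invertibility and both weak consonance properties."""
    subsets = [frozenset(s) for k in range(len(theta) + 1)
               for s in combinations(sorted(theta), k)]
    L = {a: region_ats(region, a, theta) for a in subsets}
    monotonic = all(L[a] >= L[b] for a in subsets for b in subsets if a <= b)
    invertible = all(L[theta - a] == 1 - L[a] for a in subsets)
    union_ok = all(L[a | b] != 0 for a in subsets for b in subsets
                   if L[a] == 1 and L[b] == 1)
    inter_ok = all(L[a & b] != 1 for a in subsets for b in subsets
                   if L[a] == 0 and L[b] == 0)
    return monotonic and invertible and union_ok and inter_ok

theta = frozenset(range(4))
result = weakly_consistent(frozenset({1, 2}), theta)
```

The non-emptiness assumption matters: an empty region would be contained in every hypothesis and in its complement at once, so invertibility would fail.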

Example 23 (Tests based on posterior probabilities).

Consider Example 2. We have seen that the following restrictions are sufficient to guarantee weak union consonance (Example 13), weak intersection consonance (Example 17) and invertibility (Example 20), respectively: $c_{1} \geq 2 c_{2}$, $2 c_{1} - 1 \geq c_{2}$ and $c_{2} = 1 - c_{1}$. It follows from these relations and the fact that this ATS is monotonic (Example 9) that if $c_{1} \geq 2/3$ and $c_{2} = 1 - c_{1}$, then it is weakly consistent, whatever the prior distribution for θ is.
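To see where the value 2/3 comes from, substitute the invertibility constraint $c_{2} = 1 - c_{1}$ into the two consonance constraints of this example:

```latex
\begin{align*}
c_1 \geq 2 c_2 = 2 (1 - c_1) &\iff 3 c_1 \geq 2 \iff c_1 \geq 2/3, \\
2 c_1 - 1 \geq c_2 = 1 - c_1 &\iff 3 c_1 \geq 2 \iff c_1 \geq 2/3 .
\end{align*}
```

Both constraints reduce to the same bound, so every $c_{1} > 2/3$ satisfies them, and the requirement $c_{1} \geq c_{2}$ of Example 2 then holds automatically because $c_{2} = 1 - c_{1} \leq 1/3$.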

The next theorem gives necessary and sufficient conditions for agnostic tests based on posterior probabilities (with possibly different thresholds $c_{1}$ and $c_{2}$ for each hypothesis of interest) to satisfy each of the coherence properties.

Theorem 2.

Let $Θ = \mathbb{R}^{d}$ and $σ(Θ) = B(Θ)$, the Borel σ-field of $\mathbb{R}^{d}$. Let $P$ be a prior probability measure on $σ(Θ)$. For each $A \in σ(Θ)$, let $L(A) : X \to D$ be defined by

L(A)(x) = \begin{cases} 0 & \text{if } P(A | x) > c_{1}^{A} \\ \frac{1}{2} & \text{if } c_{1}^{A} \geq P(A | x) > c_{2}^{A} \\ 1 & \text{if } c_{2}^{A} \geq P(A | x) \end{cases}

where $P(\cdot | x)$ is the posterior distribution of θ, given x, and $0 \leq c_{2}^{A} \leq c_{1}^{A} \leq 1$. This is a generalization of the ATS of Example 2. Assume that the likelihood function is positive for every $x \in X$ and $θ \in Θ$. Such an ATS satisfies:

1. Monotonicity for every prior distribution if, and only if, for every $A, B \in σ(Θ)$ with $A \subseteq B$, $c_{2}^{A} \geq c_{2}^{B}$ and $c_{1}^{A} \geq c_{1}^{B}$.
2. Weak union consonance for every prior distribution if, and only if, for every $A, B \in σ(Θ)$ such that $A \neq B$, $c_{2}^{A} + c_{2}^{B} \leq c_{1}^{A \cup B}$.
3. Weak intersection consonance for every prior distribution if, and only if, for every $A, B \in σ(Θ)$ such that $A \neq B$, $c_{1}^{A} + c_{1}^{B} - 1 \geq c_{2}^{A \cap B}$.
4. Invertibility for every prior distribution if, and only if, for every $A \in σ(Θ)$, $c_{1}^{A} = 1 - c_{2}^{A^{c}}$.

It follows from Theorem 2 that if the cutoffs used in each of the tests ($c_{1}$ and $c_{2}$) are required to be the same for all hypotheses of interest, then the conditions in Example 23 are not only sufficient, but they are also necessary to ensure that all (weak) consistency properties hold for every prior distribution for θ.

4.2. Strong Desiderata

Definition 9 (Fully Consistent ATS).

An ATS,

L

, is said to be fully consistent if

L

is monotonic (Definition 2), invertible (Definition 7), strongly consonant with the union (Definition 4), and strongly consonant with the intersection (Definition 6).

The following theorem shows that, under mild assumptions, the only ATSs that are fully consistent are those based on region estimators.

Theorem 3.

Assume that for every

θ \in Θ

,

{θ} \in σ (Θ)

. An ATS is fully consistent if, and only if, it is a region estimator-based ATS (Example 5).

Hence, the only way to create a fully consistent ATS is by designing an appropriate region estimator and using Example 5. In particular, ATSs based on posterior probabilities (Example 2) are typically not fully consistent. It should be emphasized that when the region estimator that characterizes a fully consistent ATS

L

maps

X

to singletons of Θ, no sample point will lead to abstention, as either

R (x) \subseteq A

or

R (x) \subseteq A^{c}

, for every

A \in σ (Θ)

. In such situations, region estimators reduce to point estimators, which characterize fully consistent non-agnostic TSs [3].
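The behaviour described above can be checked by brute force on a small parameter space. The sketch below is only an illustration under stated assumptions: Θ is taken to be a hypothetical four-point set, every subset is treated as a hypothesis, the region estimate is assumed non-empty, and the rule used (accept A if R(x) ⊆ A, reject A if R(x) ⊆ A^c, abstain otherwise) is the one implied by the discussion of Theorem 3 and Lemma A1. All function names are hypothetical.

```python
from itertools import combinations


def powerset(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = sorted(theta)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def region_ats(region, hypothesis, theta):
    """0 = accept, 0.5 = agnostic, 1 = reject, for a non-empty region R(x)."""
    if region <= hypothesis:
        return 0
    if region <= theta - hypothesis:
        return 1
    return 0.5


theta = frozenset(range(4))          # hypothetical finite parameter space
hypotheses = powerset(theta)


def is_monotonic(region):
    # A inside B must imply L(A)(x) >= L(B)(x).
    return all(region_ats(region, a, theta) >= region_ats(region, b, theta)
               for a in hypotheses for b in hypotheses if a <= b)


def is_invertible(region):
    # L(A^c)(x) must equal 1 - L(A)(x).
    return all(region_ats(region, theta - a, theta)
               == 1 - region_ats(region, a, theta)
               for a in hypotheses)


def abstains_somewhere(region):
    return any(region_ats(region, a, theta) == 0.5 for a in hypotheses)
```

In this toy setting, every non-empty region yields a monotonic and invertible scheme, a singleton region never abstains, and a region with two or more points abstains on any hypothesis that splits it.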

In the next section, we consider desiderata for simultaneous tests that are not as strong as those of Definition 9, but that are more stringent than those of Definition 8.

4.3. n-Weak Desiderata

In Section 3.2 and Section 3.3, weak consonance was defined for two hypotheses only. It is however possible to define it for

n < \infty

hypotheses:

Definition 10 (Weak n-union Consonance)

An ATS

L : σ (Θ) \to Φ

is weak n-union consonant if, for every finite set of indices I, with

| I | \leq n

, for every

{A_{i}}_{i \in I} \subseteq σ (Θ)

, and for every

x \in X

\begin{matrix} min {L (A_{i}) (x)}_{i \in I} = 1 implies L (\cup_{i \in I} A_{i}) (x) \neq 0 . \end{matrix}

Definition 11 (Weak n-intersection Consonance)

An ATS

L : σ (Θ) \to Φ

is weak n-intersection consonant if, for every finite set of indices I, with

| I | \leq n

, for every

{A_{i}}_{i \in I} \subseteq σ (Θ)

, and for every

x \in X

\begin{matrix} max {L (A_{i}) (x)}_{i \in I} = 0 implies L (\cap_{i \in I} A_{i}) (x) \neq 1 . \end{matrix}

Although in the context of non-agnostic testing schemes (union or intersection) consonance holds for

n = 2

if, and only if, it holds for every

n \in N

[3], this is not the case in the agnostic setting. We hence define:

Definition 12 (n-Weakly Consistent ATS)

An ATS,

L

, is said to be n-weakly consistent if

L

is monotonic (Definition 2), invertible (Definition 7), n-weakly consonant with the union (Definition 10), and n-weakly consonant with the intersection (Definition 11).

Example 24 (Region Estimator).

The ATS from Example 5 satisfies weak n-union and weak n-intersection consonance. The argument is the same as that presented in Examples 16 and 18. It follows that this is an n-weakly consistent ATS.

Example 25 (Tests based on posterior probabilities).

Consider Example 2. In order to guarantee weak n-union consonance for every prior, it is necessary and sufficient to have

c_{1} \geq n c_{2}

. Moreover, to guarantee weak n-intersection consonance for every prior, it is necessary and sufficient to have

c_{2} \leq n c_{1} - (n - 1)

. It follows from these conditions and Example 20 that the following restrictions are necessary and sufficient to guarantee monotonicity, n-union consonance, n-intersection consonance and invertibility:

c_{1} > n / (n + 1)

and

c_{2} = 1 - c_{1}

. Hence, these conditions are sufficient to guarantee this ATS is n-weakly consistent. Now, because these conditions are also necessary, it follows that this ATS is n-weakly consistent for every

n > 1

if, and only if, it remains agnostic about every hypothesis which has probability in

(0, 1)

.
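The two inequalities of Example 25 can be explored numerically. The Python sketch below is an illustration only, with hypothetical function names: it evaluates the stated conditions for given cutoffs, builds the natural counterexample to weak n-union consonance (n disjoint hypotheses, each with posterior mass equal to the rejection cutoff), and searches for the largest n that a given pair of cutoffs supports. The cutoffs 7/9 and 2/9 are an arbitrary choice made to stay away from boundary cases.

```python
from fractions import Fraction


def n_weak_conditions(c1, c2, n):
    """Inequalities of Example 25, plus invertibility with common cutoffs."""
    return {
        "n_union": c1 >= n * c2,
        "n_intersection": c2 <= n * c1 - (n - 1),
        "invertibility": c1 == 1 - c2,
    }


def union_counterexample(c1, c2, n):
    """Posterior masses of n disjoint hypotheses that are all rejected while
    their union is accepted, or None if this construction does not apply."""
    if n * c2 > c1 and n * c2 <= 1:
        return [c2] * n          # each mass is <= c2, the total exceeds c1
    return None


def largest_consistent_n(c1, max_n=1000):
    """Largest n <= max_n meeting all conditions when c2 = 1 - c1."""
    c2 = 1 - c1
    best = None
    for n in range(2, max_n + 1):
        if all(n_weak_conditions(c1, c2, n).values()):
            best = n
    return best


c1, c2 = Fraction(7, 9), Fraction(2, 9)   # hypothetical cutoffs
```

For these cutoffs the conditions hold for n = 2 and n = 3 but fail for n = 4, where four disjoint hypotheses of posterior mass 2/9 each are rejected although their union, with mass 8/9, is accepted.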

5. Decision-Theoretic Perspective

In this section, we investigate agnostic testing schemes from a Bayesian decision-theoretic perspective. First, we define an ATS generated by a family of loss functions. Note that, in the context of agnostic tests, a loss function is a function

L : D \times Θ \to R

that assigns to each

θ \in Θ

the loss

L (d, θ)

for making the decision

d \in {0, \frac{1}{2}, 1}

.

Definition 13 (ATS generated by a family of loss functions).

Let

(X \times Θ, σ (X \times Θ), P)

be a Bayesian statistical model. Let

{(L_{A})}_{A \in σ (Θ)}

be a family of loss functions, where

L_{A} : D \times Θ \to R

is the loss function to be used to test

A \in σ (Θ)

. An ATS generated by the family of loss functions

{(L_{A})}_{A \in σ (Θ)}

is any ATS

L

defined over the elements of

σ (Θ)

such that,

\forall A \in σ (Θ)

,

L (A)

is a Bayes test for hypothesis A against

P

.

Example 26 (Bayesian ATS generated by a family of error-wise constant loss functions).

For

A \in σ (Θ)

, consider the loss function

L_{A}

of the form of Table 1, where all entries are assumed to be non-negative. This is a generalization of standard

0 - 1 - c

loss functions to agnostic tests in the sense that it penalizes not only false acceptance and false rejection with constant losses

b_{A}

and

d_{A}

, respectively, but also a possible abstention from deciding between accepting and rejecting A with the values

a_{A}

and

c_{A}

. If

b_{A} d_{A} > a_{A} b_{A} + c_{A} d_{A}

, then the Bayes test against

L_{A}

consists in rejecting A if

P (A | x) < \frac{c_{A}}{d_{A} + c_{A} - a_{A}}

, accepting A if

P (A | x) > \frac{b_{A} - c_{A}}{a_{A} + b_{A} - c_{A}}

, and remaining agnostic otherwise. It follows that the following ATS is generated by the family of loss functions

{(L_{A})}_{A \in σ (Θ)}

:

L (A) (x) = \{\begin{matrix} 0 & if P (A | x) > \frac{b_{A} - c_{A}}{a_{A} + b_{A} - c_{A}} \\ \frac{1}{2} & if \frac{b_{A} - c_{A}}{a_{A} + b_{A} - c_{A}} \geq P (A | x) > \frac{c_{A}}{d_{A} + c_{A} - a_{A}} \\ 1 & if \frac{c_{A}}{d_{A} + c_{A} - a_{A}} \geq P (A | x) \end{matrix}

Notice that if, for every

A, B \in σ (Θ)

,

a_{A} = a_{B}

,

b_{A} = b_{B}

,

c_{A} = c_{B}

, and

d_{A} = d_{B}

, this ATS matches that from Example 2 for a particular value of

c_{1}

and

c_{2}

.
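The cutoffs of Example 26 can be compared with a direct minimization of the posterior expected loss. The sketch below is an illustration under an explicit assumption: since Table 1 is not reproduced here, its layout is taken to be the one consistent with the cutoffs above, namely accepting costs 0 when θ ∈ A and b_A otherwise, abstaining costs a_A when θ ∈ A and c_A otherwise, and rejecting costs d_A when θ ∈ A and 0 otherwise. The function names and the constants (1, 4, 1, 4) are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def expected_losses(p, a, b, c, d):
    """Posterior expected loss of each decision when P(A | x) = p.

    Assumed Table 1 layout: accept costs (0, b), agnostic costs (a, c) and
    reject costs (d, 0), listed as (theta in A, theta not in A).
    """
    return {0: b * (1 - p), HALF: a * p + c * (1 - p), 1: d * p}


def cutoffs(a, b, c, d):
    """(acceptance cutoff, rejection cutoff) as given in Example 26."""
    return Fraction(b - c, a + b - c), Fraction(c, d + c - a)


def threshold_decision(p, a, b, c, d):
    accept_above, reject_at_or_below = cutoffs(a, b, c, d)
    if p > accept_above:
        return 0
    if p > reject_at_or_below:
        return HALF
    return 1


def bayes_decision(p, a, b, c, d):
    """Decision with the smallest posterior expected loss."""
    losses = expected_losses(p, a, b, c, d)
    return min(losses, key=losses.get)


params = (1, 4, 1, 4)   # hypothetical constants (a, b, c, d) with bd > ab + cd
```

With these constants the cutoffs are 3/4 and 1/4, and the two decision rules agree at every posterior probability other than the cutoffs themselves, where two decisions have the same expected loss and either one is a Bayes test.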

We restrict our attention to ATSs generated by proper losses, a concept we adapt from [3] to agnostic tests:

Definition 14 (Proper losses).

A family of loss functions

{(L_{A})}_{A \in σ (Θ)}

has proper losses if

\begin{matrix} \{\begin{matrix} L_{A} (0, θ) < L_{A} (\frac{1}{2}, θ) < L_{A} (1, θ) & , if θ \in A \\ L_{A} (0, θ) > L_{A} (\frac{1}{2}, θ) > L_{A} (1, θ) & , if θ \notin A \\ L_{A} (\frac{1}{2}, θ) < \frac{L_{A} (0, θ) + L_{A} (1, θ)}{2} & , for all θ \end{matrix} \end{matrix}

Definition 14 states that (i) by taking a correct decision we lose less than by taking a wrong decision; (ii) by remaining agnostic we do not lose as much as when taking a wrong decision, but we lose more than by taking a correct decision; and (iii) it is better to remain agnostic about A than to flip a coin to decide if we reject or accept this hypothesis.

Example 27 (Bayesian ATS generated by a family of error-wise constant loss functions).

In order to ensure that the loss in Example 26 is proper, the following restrictions must be satisfied:

0 < a_{A} < d_{A} / 2 and 0 < c_{A} < b_{A} / 2 .

In particular, these conditions imply those stated in Example 26.

5.1. Monotonicity

We now turn our attention towards characterizing Bayesian monotonic ATS using a decision-theoretic framework. In order to do this, we first adapt the concept of relative losses [3] to the context of agnostic testing schemes.

Definition 15 (Relative Loss).

Let

L_{A}

be a loss function for testing hypothesis A. The relative losses

r_{A}^{(1, \frac{1}{2})} : Θ \to R

and

r_{A}^{(\frac{1}{2}, 0)} : Θ \to R

are defined by

\begin{matrix} \{\begin{matrix} r_{A}^{(1, \frac{1}{2})} (θ) & = L_{A} (1, θ) - L_{A} (\frac{1}{2}, θ) \\ r_{A}^{(\frac{1}{2}, 0)} (θ) & = L_{A} (\frac{1}{2}, θ) - L_{A} (0, θ) \end{matrix} \end{matrix}

The relative losses thus measure the difference between the losses of rejecting a given hypothesis and remaining agnostic about it, as well as the difference between the losses of remaining agnostic and accepting it. In order to guarantee that a Bayesian ATS is monotonic, certain constraints on the relative losses must be imposed. The next definition presents one such assumption, which we interpret afterwards.

Definition 16 (Monotonic Relative Loss).

Let

D_{>}^{2} = {(1, \frac{1}{2}), (\frac{1}{2}, 0)}

.

{(L_{A})}_{A \in σ (Θ)}

has monotonic relative losses if the family

{(L_{A})}_{A \in σ (Θ)}

is proper and, for all

A, B \in σ (Θ)

such that

A \subset B

and for all

(i, j) \in D_{>}^{2}

,

r_{B}^{(i, j)} (θ) \geq r_{A}^{(i, j)} (θ) \forall θ \in Θ

Let

A, B \in σ (Θ)

with

A \subseteq B

. If

θ \in A

, both A and B are true, so

{(L_{A})}_{A \in σ (Θ)}

having monotonic relative losses reflects the situation in which the rougher error of rejecting B compared to rejecting A (with respect to remaining agnostic about these hypotheses) should be assigned a larger relative loss. Similarly, the rougher error of remaining agnostic about B should be assigned a larger relative loss than remaining agnostic about A (with respect to correctly accepting these hypotheses). If

θ \in B

but

θ \notin A

, these conditions are a consequence of the assumption that the family

{(L_{A})}_{A \in σ (Θ)}

is proper. The case

θ \notin B

can be interpreted in a similar fashion as the case

θ \in A

.

The following example presents necessary and sufficient conditions to ensure that the loss functions from Example 26 yield monotonic relative losses.

Example 28.

Consider the losses presented in Example 26. Assuming the losses are proper (see Example 27), the conditions required to ensure

{(L_{A})}_{A \in σ (Θ)}

has monotonic relative losses are

a_{A} \leq a_{B}, c_{B} \leq c_{A}, c_{B} - b_{B} \geq c_{A} - b_{A} and d_{B} - a_{B} \geq d_{A} - a_{A}

Notice that these restrictions imply that

b_{A} \geq b_{B}

.

As a particular example, let

k > 2

and λ be a finite measure in

σ (Θ)

with

λ (Θ) > 0

. The following assignments yield a proper and monotonic loss: for every

A \in σ (Θ)

,

b_{A} = λ (A^{c}),

a_{A} = λ (A) / k,

c_{A} = λ (A^{c}) / k,

and

d_{A} = λ (A)

. Another particular case is when

a_{A} = a_{B}, b_{A} = b_{B}, c_{A} = c_{B},

and

d_{A} = d_{B}

for every

A, B \in σ (Θ)

.
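The measure-based assignment above can be verified by enumeration on a finite parameter space. The following sketch is only an illustration: the measure λ is a hypothetical set of point weights, k = 3 is an arbitrary value greater than 2, and the check is restricted to non-empty proper subsets so that every constant is strictly positive. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def losses_from_measure(weights, hypothesis, k):
    """Constants (a, b, c, d) of Example 28 for a measure given by weights."""
    lam_a = sum(w for point, w in weights.items() if point in hypothesis)
    lam_ac = sum(weights.values()) - lam_a
    return Fraction(lam_a, k), lam_ac, Fraction(lam_ac, k), lam_a


def is_proper(a, b, c, d):
    """Restrictions of Example 27."""
    return 0 < a < Fraction(d, 2) and 0 < c < Fraction(b, 2)


def is_monotonic_pair(small, large):
    """Restrictions of Example 28 for hypotheses A (small) inside B (large)."""
    a_s, b_s, c_s, d_s = small
    a_l, b_l, c_l, d_l = large
    return (a_s <= a_l and c_l <= c_s
            and c_l - b_l >= c_s - b_s and d_l - a_l >= d_s - a_s)


weights = {0: 1, 1: 2, 2: 3, 3: 1}   # hypothetical finite measure on 4 points
k = 3
points = sorted(weights)
# Non-empty proper subsets only, so that lambda(A) and lambda(A^c) are positive.
subsets = [frozenset(c) for r in range(1, len(points))
           for c in combinations(points, r)]
table = {s: losses_from_measure(weights, s, k) for s in subsets}

all_proper = all(is_proper(*table[s]) for s in subsets)
all_monotonic = all(is_monotonic_pair(table[s], table[t])
                    for s in subsets for t in subsets if s < t)
```

For this choice of weights both flags evaluate to true, and the false-acceptance constant b_A is never smaller for the nested hypothesis A than for B, in line with the remark that the restrictions imply b_A ≥ b_B.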

Another concept that helps us characterize Bayesian monotonic agnostic testing schemes is that of balanced relative losses, which we adapt from [7].

Definition 17 (Balanced Relative Loss).

{(L_{A})}_{A \in σ (Θ)}

has balanced relative losses if, for all

A, B \in σ (Θ)

such that

A \subset B

, for all

θ_{1} \in A

and

θ_{2} \in B^{c}

, and for all

(i, j) \in D_{>}^{2}

,

\begin{matrix} \frac{r_{A}^{(i, j)} (θ_{1})}{r_{A}^{(i, j)} (θ_{2})} \geq \frac{r_{B}^{(i, j)} (θ_{1})}{r_{B}^{(i, j)} (θ_{2})} \end{matrix}

Lemma 1.

If

{(L_{A})}_{A \in σ (Θ)}

has monotonic relative losses, then

{(L_{A})}_{A \in σ (Θ)}

has balanced relative losses.
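For error-wise constant losses, the implication in Lemma 1 can be illustrated with a small numerical check. The sketch below assumes the Table 1 layout in which accepting costs (0, b), abstaining costs (a, c) and rejecting costs (d, 0) for θ inside and outside the hypothesis, respectively; the two sets of constants are hypothetical and were chosen to satisfy the restrictions of Examples 27 and 28 for a pair A ⊂ B.

```python
from fractions import Fraction


def relative_losses(a, b, c, d):
    """Relative losses as pairs (value for theta in A, value for theta not in A)."""
    return {
        "reject_vs_agnostic": (d - a, -c),     # r^(1, 1/2)
        "agnostic_vs_accept": (a, c - b),      # r^(1/2, 0)
    }


def balanced_pair(small, large):
    """Inequality of Definition 17 for A (small) strictly inside B (large),
    evaluated at theta_1 in A and theta_2 outside B."""
    r_small = relative_losses(*small)
    r_large = relative_losses(*large)
    return all(
        Fraction(r_small[key][0], r_small[key][1])
        >= Fraction(r_large[key][0], r_large[key][1])
        for key in r_small
    )


loss_a = (1, 6, 2, 3)   # hypothetical (a, b, c, d) for the smaller hypothesis A
loss_b = (2, 4, 1, 5)   # hypothetical (a, b, c, d) for the larger hypothesis B
```

Here `balanced_pair(loss_a, loss_b)` holds, with ratios -1 against -3 and -1/4 against -2/3, while exchanging the roles of the two losses breaks the inequality.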

The following result shows that balanced relative losses characterize Bayesian monotonic ATS.

Theorem 4.

Let

{(L_{A})}_{A \in σ (Θ)}

be a family of proper loss functions. Assume that for every

θ \in Θ

and

x \in X

,

L_{x} (θ) > 0

. For every prior π for θ, let

L^{π}

denote a Bayesian ATS generated by

{(L_{A})}_{A \in σ (Θ)}

. There exists a monotonic

L^{π}

for every prior π if, and only if,

{(L_{A})}_{A \in σ (Θ)}

has balanced relative losses.

Example 29.

In Example 28, we obtained conditions on the loss functions

{(L_{A})}_{A \in σ (Θ)}

from Example 26 in order to guarantee that the family has monotonic relative losses. From Lemma 1 and Theorem 4, it follows that such a family of loss functions yields monotonic Bayesian ATSs whatever the prior for θ is. In other words, there are families of loss functions that induce monotonic tests based on posterior probabilities.

5.2. Union Consonance

We now turn our attention towards characterizing union consonant Bayesian ATSs using a decision-theoretic framework.

Definition 18.

{(L_{A})}_{A \in σ (Θ)}

is compatible with weak union consonance if there exist no

A, B \in σ (Θ)

,

θ_{1}, θ_{2}, θ_{3} \in Θ

and

p_{1}, p_{2}, p_{3} \geq 0

such that

p_{1} + p_{2} + p_{3} = 1

and

\begin{matrix} \{\begin{matrix} p_{1} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{1}) + p_{2} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{2}) + p_{3} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{3}) & < 0 \\ p_{1} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{1}) + p_{2} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{2}) + p_{3} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{3}) & < 0 \\ p_{1} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1}) + p_{2} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2}) + p_{3} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{3}) & > 0 \end{matrix} \end{matrix}

Definition 18 states that the family of loss functions

{(L_{A})}_{A \in σ (Θ)}

being compatible with weak union consonance cannot induce any Bayesian ATS on the basis of which one may prefer rejecting both hypotheses A and B over remaining agnostic about them while accepting

A \cup B

rather than abstaining.

As we will see in the next theorem, proper loss functions compatible with weak union consonance characterize Bayesian ATSs that are weakly consonant with the union.

Theorem 5.

Let

{(L_{A})}_{A \in σ (Θ)}

be a family of proper loss functions. Assume that for every

θ \in Θ

and

x \in X

,

L_{x} (θ) > 0

. For every prior π for θ, let

L^{π}

denote a Bayesian ATS generated by

{(L_{A})}_{A \in σ (Θ)}

. There exists an ATS

L^{π}

that is weakly consonant with the union for every prior π if, and only if,

{(L_{A})}_{A \in σ (Θ)}

is compatible with weak union consonance.

Example 30.

We saw that the ATS from Example 2 is a Bayes test against a particular proper loss (Examples 26 and 27) and that it is weakly consonant with the union (Example 13). It follows from Theorem 5 that the family of loss functions that leads to this ATS is compatible with weak union consonance.

Definition 19 (Union consonance-balanced relative losses [7]).

{(L_{A})}_{A \in σ (Θ)}

has union consonance-balanced relative losses if, for every

A, B \in σ (Θ)

,

θ_{1} \in A \cup B

and

θ_{2} \in {(A \cup B)}^{c}

,

\begin{matrix} \frac{r_{A}^{(1, \frac{1}{2})} (θ_{1})}{r_{A}^{(1, \frac{1}{2})} (θ_{2})} \leq \frac{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1})}{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2})}, or \\ \frac{r_{B}^{(1, \frac{1}{2})} (θ_{1})}{r_{B}^{(1, \frac{1}{2})} (θ_{2})} \leq \frac{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1})}{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2})} \end{matrix}

Corollary 1.

Let

{(L_{A})}_{A \in σ (Θ)}

be a family of proper loss functions. Assume that for every

θ \in Θ

and

x \in X

,

L_{x} (θ) > 0

. If

{(L_{A})}_{A \in σ (Θ)}

does not have union consonance-balanced relative losses, then there exists a prior π such that every Bayesian ATS,

L^{π}

, is not weakly consonant with the union.

5.3. Intersection Consonance

Next, we characterize intersection consonant Bayesian ATSs from a decision-theoretic perspective.

Definition 20.

{(L_{A})}_{A \in σ (Θ)}

is compatible with weak intersection consonance if there exist no

A, B \in σ (Θ)

,

θ_{1}, θ_{2}, θ_{3} \in Θ

and

p_{1}, p_{2}, p_{3} \geq 0

such that

p_{1} + p_{2} + p_{3} = 1

and

\begin{matrix} \{\begin{matrix} p_{1} \cdot r_{A}^{(\frac{1}{2}, 0)} (θ_{1}) + p_{2} \cdot r_{A}^{(\frac{1}{2}, 0)} (θ_{2}) + p_{3} \cdot r_{A}^{(\frac{1}{2}, 0)} (θ_{3}) & > 0 \\ p_{1} \cdot r_{B}^{(\frac{1}{2}, 0)} (θ_{1}) + p_{2} \cdot r_{B}^{(\frac{1}{2}, 0)} (θ_{2}) + p_{3} \cdot r_{B}^{(\frac{1}{2}, 0)} (θ_{3}) & > 0 \\ p_{1} \cdot r_{A \cap B}^{(1, \frac{1}{2})} (θ_{1}) + p_{2} \cdot r_{A \cap B}^{(1, \frac{1}{2})} (θ_{2}) + p_{3} \cdot r_{A \cap B}^{(1, \frac{1}{2})} (θ_{3}) & < 0 \end{matrix} \end{matrix}

Definition 20 states that the family of loss functions

{(L_{A})}_{A \in σ (Θ)}

being compatible with weak intersection consonance cannot induce any Bayesian ATS on the basis of which one may prefer accepting both hypotheses A and B to remaining agnostic about them while rejecting

A \cap B

rather than abstaining.

As we will see in the next theorem, proper loss functions compatible with weak intersection consonance characterize Bayesian ATSs that are weakly consonant with the intersection.

Theorem 6.

Let

{(L_{A})}_{A \in σ (Θ)}

be a family of proper loss functions. Assume that for every

θ \in Θ

and

x \in X

,

L_{x} (θ) > 0

. For every prior π for θ, let

L^{π}

denote a Bayesian ATS generated by

{(L_{A})}_{A \in σ (Θ)}

. There exists an ATS

L^{π}

that is weakly consonant with the intersection for every prior π if, and only if,

{(L_{A})}_{A \in σ (Θ)}

is compatible with weak intersection consonance.

Example 31.

We saw that the ATS from Example 2 is a Bayes test against a particular proper loss (Examples 26 and 27) and that it is weakly consonant with the intersection (Example 17). It follows from Theorem 6 that the family of loss functions that leads to this ATS is compatible with weak intersection consonance.

Definition 21 (Intersection consonance-balanced relative losses [7]).

{(L_{A})}_{A \in σ (Θ)}

has intersection consonance-balanced relative losses if, for every

A, B \in σ (Θ)

,

θ_{1} \in A \cap B

and

θ_{2} \in {(A \cap B)}^{c}

,

\begin{matrix} \frac{r_{A \cap B}^{(1, \frac{1}{2})} (θ_{1})}{r_{A \cap B}^{(1, \frac{1}{2})} (θ_{2})} \leq \frac{r_{A}^{(\frac{1}{2}, 0)} (θ_{1})}{r_{A}^{(\frac{1}{2}, 0)} (θ_{2})}, or \\ \frac{r_{A \cap B}^{(1, \frac{1}{2})} (θ_{1})}{r_{A \cap B}^{(1, \frac{1}{2})} (θ_{2})} \leq \frac{r_{B}^{(\frac{1}{2}, 0)} (θ_{1})}{r_{B}^{(\frac{1}{2}, 0)} (θ_{2})} \end{matrix}

Corollary 2.

Let

{(L_{A})}_{A \in σ (Θ)}

be a family of proper loss functions. Assume that for every

θ \in Θ

and

x \in X

,

L_{x} (θ) > 0

. If

{(L_{A})}_{A \in σ (Θ)}

does not have intersection consonance-balanced relative losses, then there exists a prior π such that every Bayesian ATS,

L^{π}

, is not weakly consonant with the intersection.

We end this section by noting that although we focused our results on weak consonance, they can be extended to strong consonance using the same techniques presented in the Appendix.

5.4. Invertibility

Finally, we examine invertible Bayesian ATSs from a decision-theoretic standpoint.

Definition 22 (Invertible Relative Losses).

{(L_{A})}_{A \in σ (Θ)}

has invertible relative losses if, for every

A \in σ (Θ)

, for all

θ_{1} \in A

,

θ_{2} \in A^{c}

and

(i, j) \in D_{>}^{2}

,

\begin{matrix} \frac{r_{A}^{(i, j)} (θ_{1})}{r_{A}^{(i, j)} (θ_{2})} & = \frac{r_{A^{c}}^{(i, j)} (θ_{1})}{r_{A^{c}}^{(i, j)} (θ_{2})} \end{matrix}

We end this section by showing that invertible Bayesian ATSs are determined by families of loss functions that fulfill the conditions of Definition 22.

Theorem 7.

Let

{(L_{A})}_{A \in σ (Θ)}

be a family of proper loss functions. Assume that for every

θ \in Θ

and

x \in X

,

L_{x} (θ) > 0

. For every prior π for θ, let

L^{π}

denote a Bayesian ATS generated by

{(L_{A})}_{A \in σ (Θ)}

. There exists an ATS

L^{π}

that is invertible for every prior π if, and only if,

{(L_{A})}_{A \in σ (Θ)}

has invertible relative losses.

Example 32.

For every

A \in σ (Θ)

, let

{(L_{A})}_{A \in σ (Θ)}

be such that

L_{A} (1, θ) = L_{A^{c}} (0, θ)

and

L_{A} (\frac{1}{2}, θ) = L_{A^{c}} (\frac{1}{2}, θ)

. It is easily seen that the conditions from Definition 22 hold. Theorem 7 then implies that any Bayesian ATS generated by

{(L_{A})}_{A \in σ (Θ)}

is invertible.
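The construction of Example 32 can be illustrated on a finite parameter space. The sketch below is only an illustration with hypothetical names and numbers: Θ has three points, A = {0}, the loss for A is an arbitrary proper loss, and the loss for A^c is obtained by exchanging the accept and reject rows while keeping the agnostic row, as the two identities above prescribe. Posteriors are chosen so that no two decisions tie in expected loss.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def bayes_decision(loss, posterior):
    """Decision in {0, 1/2, 1} minimizing the posterior expected loss.

    loss[d][theta] is the loss of decision d at theta and posterior[theta]
    is the posterior probability of theta.
    """
    expected = {d: sum(loss[d][t] * posterior[t] for t in posterior)
                for d in loss}
    return min(expected, key=expected.get)


def complement_loss(loss):
    """Loss for A^c as in Example 32: swap accept and reject, keep agnostic."""
    return {0: loss[1], HALF: loss[HALF], 1: loss[0]}


# Hypothetical proper loss for A = {0} on the parameter space {0, 1, 2}.
loss_a = {
    0: {0: 0, 1: 4, 2: 6},        # accept A
    HALF: {0: 1, 1: 1, 2: 2},     # remain agnostic about A
    1: {0: 5, 1: 0, 2: 0},        # reject A
}

posteriors = [
    {0: Fraction(9, 10), 1: Fraction(1, 20), 2: Fraction(1, 20)},
    {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)},
    {0: Fraction(1, 10), 1: Fraction(1, 2), 2: Fraction(2, 5)},
]

decisions = [(bayes_decision(loss_a, p),
              bayes_decision(complement_loss(loss_a), p))
             for p in posteriors]
```

For the three posteriors the pairs of decisions for A and A^c are (accept, reject), (agnostic, agnostic) and (reject, accept), so the decision for A^c equals one minus the decision for A in each case.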

6. Final Remarks

Agnostic tests allow one to explicitly capture the difference between “not rejecting” and “accepting” a null hypothesis. When the agnostic decision is chosen, the null hypothesis is neither rejected nor accepted. This possibility aligns with the idea that although precise null hypotheses can be tested, they should not be accepted. This idea is followed by the region-based agnostic tests derived in this paper, which can either remain agnostic about or reject precise null hypotheses.

This distinction provides a solution to the problem raised by Izbicki and Esteves [3], in which all (non-agnostic) logically coherent tests were shown to be based on point estimators which lack statistical optimality. We show that agnostic tests based on region estimators satisfy logical consistency and also allow statistical optimality. For example, agnostic tests based on frequentist confidence intervals control family wise error. Similarly, agnostic tests based on posterior density regions are shown to be an extension of the Full Bayesian Significance Test [11].

Future research includes investigating the consequences and generalizations of the logical requirements in this paper. For example, one could study what kinds of trivariate logic derive from the different definitions of logical consistency studied in this paper. One could also generalize these logical requirements to generalized agnostic tests, in which one can decide among different degrees of agnosticism. The scale of such degrees can be either discrete or continuous. One could also investigate region estimator-based ATSs with respect to other optimality criteria such as statistical power.

The results of this paper can also be tied to the philosophical literature that studies the consequences and importance of precise hypotheses. Agnostic tests can be used to revisit the role of testing precise hypotheses in science. Agnostic tests also provide a framework to interpret the scientific meaning of measures of possibility or significance of precise hypotheses.

Acknowledgments

Julio M. Stern is grateful for the support of IME-USP, the Institute of Mathematics and Statistics of the University of São Paulo; FAPESP—the State of São Paulo Research Foundation (grants CEPID 2013/07375-0 and 2014/50279-4); and CNPq—the Brazilian National Counsel of Technological and Scientific Development (grant PQ 301206/2011-2). Rafael Izbicki is grateful for the support of FAPESP (grant 2014/25302-2).

Author Contributions

The manuscript has come to fruition by the substantial contributions of all authors. All authors have also been involved in either writing the article or carefully revising it. All authors have read and approved the submitted version of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1.

The sufficiency is immediate. Let

I = {0, \frac{1}{2}, 1} \subseteq R

,

c_{1} = \frac{1}{2}

and

c_{2} = 0 .

For

A \in σ (Θ)

, let

s_{A} = 1 - L (A)

. The family

{(s_{A})}_{A \in σ (Θ)}

is such that

s_{A} (x) \leq s_{B} (x)

,

\forall x \in X

, if

A \subset B

and it is straightforward to verify Equation (1).

Now, let

A, B \in σ (Θ)

, with

A \subset B

. If

L

is given by Equation (1), it follows that:

(1) $L_{A} (x) = 0 \Rightarrow s_{A} (x) > c_{1} \Rightarrow s_{B} (x) > c_{1} \Rightarrow L_{B} (x) = 0$.
(2) $L_{A} (x) = \frac{1}{2} \Rightarrow c_{1} \geq s_{A} (x) > c_{2} \Rightarrow s_{B} (x) > c_{2} \Rightarrow L_{B} (x) \in {0, \frac{1}{2}}$.
(3) $L_{A} (x) = 1 \Rightarrow L_{B} (x) \leq L_{A} (x) = 1$.

From

(1), (2), (3)

it follows that

L_{A} (x) \geq L_{B} (x)

, thus

L

is monotonic. ☐

Proof of Theorem 2.

We start by proving the sufficiency of these conditions.

1. If for every $A, B \in σ (Θ)$ with $A \subseteq B$, $c_{2}^{A} \geq c_{2}^{B}$ and $c_{1}^{A} \geq c_{1}^{B}$, then for every $x \in X$, $P (A | x) > c_{2}^{A} \Rightarrow P (B | x) > c_{2}^{B}$, and $P (A | x) > c_{1}^{A} \Rightarrow P (B | x) > c_{1}^{B}$. It follows that monotonicity holds.
2. If for every $A, B \in σ (Θ)$ such that $A \neq B$, $c_{2}^{A} + c_{2}^{B} \leq c_{1}^{A \cup B}$, then for every $x \in X$, $P (A | x) \leq c_{2}^{A}$ and $P (B | x) \leq c_{2}^{B}$ imply that $P (A \cup B | x) \leq P (A | x) + P (B | x) \leq c_{2}^{A} + c_{2}^{B} \leq c_{1}^{A \cup B}$. It follows that weak union consonance holds.
3. If for every $A, B \in σ (Θ)$ such that $A \neq B$, $c_{1}^{A} + c_{1}^{B} - 1 \geq c_{2}^{A \cap B}$, then for every $x \in X$, $P (A | x) > c_{1}^{A}$ and $P (B | x) > c_{1}^{B}$ imply that $P (A \cap B | x) \geq P (A | x) + P (B | x) - 1 > c_{1}^{A} + c_{1}^{B} - 1 \geq c_{2}^{A \cap B}$. It follows that weak intersection consonance holds.
4. If for every $A \in σ (Θ)$, $c_{1}^{A} = 1 - c_{2}^{A^{c}}$, then for every $x \in X$, $P (A | x) \leq c_{1}^{A}$ if, and only if, $P (A^{c} | x) \geq 1 - c_{1}^{A} = c_{2}^{A^{c}}$. Similarly, $P (A | x) \leq c_{2}^{A}$ if, and only if, $P (A^{c} | x) \geq 1 - c_{2}^{A} = c_{1}^{A^{c}}$. It follows that invertibility holds.

We prove necessity only for union consonance; the other statements have similar proofs. Suppose there are

A, B \in σ (Θ)

,

A \neq B

, such that

c_{2}^{A} + c_{2}^{B} > c_{1}^{A \cup B}

. Let

θ_{1} \in A \cap B^{c}

and

θ_{2} \in B \cap A^{c}

.

First, assume

c_{2}^{A} + c_{2}^{B} \leq 1

. Let

ϵ > 0

be such that

ϵ < (c_{2}^{A} + c_{2}^{B} - c_{1}^{A \cup B}) / 2

and

ϵ < min {c_{2}^{A}, c_{2}^{B}}

. Assume that the posterior distribution on Θ given x is such that

P (A | x) = P ({θ_{1}} | x) = c_{2}^{A} - ϵ and P (B | x) = P ({θ_{2}} | x) = c_{2}^{B} - ϵ

(see the Appendix of [3] for a prior distribution that leads to such posterior). It follows that

P (A | x) = P ({θ_{1}} | x) \leq c_{2}^{A}

,

P (B | x) = P ({θ_{2}} | x) \leq c_{2}^{B}

, and

P (A \cup B | x) = P ({θ_{1}} | x) + P ({θ_{2}} | x) = c_{2}^{A} + c_{2}^{B} - 2 ϵ > c_{2}^{A} + c_{2}^{B} - (c_{2}^{A} + c_{2}^{B} - c_{1}^{A \cup B}) = c_{1}^{A \cup B}

Hence A and B are rejected, but

A \cup B

is accepted.

Now, assume

c_{2}^{A} + c_{2}^{B} > 1

. Let

b_{2}^{A} < c_{2}^{A}

and

b_{2}^{B} < c_{2}^{B}

be such that

b_{2}^{A} + b_{2}^{B} = 1

. Assume that the posterior distribution on Θ is such that

P (A | x) = P ({θ_{1}} | x) = b_{2}^{A} and P (B | x) = P ({θ_{2}} | x) = b_{2}^{B}

It follows that

P (A | x) = P ({θ_{1}} | x) = b_{2}^{A} < c_{2}^{A}

,

P (B | x) = P ({θ_{2}} | x) = b_{2}^{B} < c_{2}^{B}

, and

P (A \cup B | x) = P ({θ_{1}, θ_{2}} | x) = b_{2}^{A} + b_{2}^{B} = 1 \geq c_{1}^{A \cup B}

Hence A and B are rejected, but

A \cup B

is not. ☐

Lemma A1.

Let

L

be an invertible ATS. If, for every x, there exists

R (x) \subset Θ

such that

\forall A \in σ (Θ)

L (A) (x) = 0

if and only if

R (x) \subset A

, then

L

is a region estimator-based ATS (Example 5).

Proof of Lemma A1.

It follows from definition that, for every

A \in σ (Θ)

such that

R (x) \subset A

,

L (A) (x) = 0

. Furthermore, for every

A \in σ (Θ)

such that

R (x) \subset A^{c}

,

L (A^{c}) (x) = 0

. Therefore, it follows from invertibility that

L (A) (x) = 1

. Finally, let

A \in σ (Θ)

be such that

A \cap R (x) \neq \emptyset

and

A^{c} \cap R (x) \neq \emptyset

. Since

A \cap R (x) \neq R (x)

and

A^{c} \cap R (x) \neq R (x)

it follows that

L (A) (x) \geq \frac{1}{2}

and

L (A^{c}) (x) \geq \frac{1}{2}

. Conclude from invertibility that

L (A) (x) = \frac{1}{2}

. ☐

Proof of Theorem 3.

It follows from definition that a region estimator ATS is fully consistent. In order to prove the reverse implication, consider the following notation. For every

x \in X

and

θ \in Θ

, let

A_{θ} = Θ - {θ}

. Let

R (x) = ⋂ {A_{θ} : L (A_{θ}) (x) = 0}

.

Next, we prove that, for every

B \in σ (Θ)

,

L (B) (x) = 0

if and only if

R (x) \subset B

. Let

B \in σ (Θ)

be such that

R (x) \subset B

. Therefore, it follows from the definition of

R (x)

that, for every

θ \in B^{c}

,

L (A_{θ}) (x) = 0

. Since

B = ⋂ {A_{θ} : θ \in B^{c}}

, it follows from strong intersection consonance (Definition 6) that

L (B) (x) = 0

. Let

B \in σ (Θ)

be such that

L (B) (x) = 0

. It follows from the monotonicity of

L

(Definition 2) that, for every

θ \in B^{c}

,

L (A_{θ}) (x) = 0

as

B \subseteq A_{θ}

. Therefore,

R (x) = ⋂ {A_{θ} : L (A_{θ}) (x) = 0} \subset ⋂ {A_{θ} : θ \in B^{c}} = B

Conclude that for every

B \in σ (Θ)

,

L (B) (x) = 0

if and only if

R (x) \subset B

.

Since

L

is invertible (Definition 7), it follows from the above conclusion and Lemma A1 that

L

is a region estimator-based ATS. ☐

Lemma A2.

Let, for

i \in Θ

,

{i} \in σ (Θ)

and

f_{1}, \dots, f_{m}

be

σ (Θ) / R

-measurable bounded functions. If there exists a probability

P

on

σ (Θ)

such that, for all

1 \leq i \leq m

,

\int f_{i} d P > 0

, then there are

A \in σ (Θ)

, A finite, and a probability

P^{*}

with a finite support such that

P^{*} (A) = 1

and such that, for all

1 \leq i \leq m

,

\int f_{i} d P^{*} > 0

.

Proof.

Let

ϵ_{i} > 0

be such that,

\begin{matrix} \int f_{i} d P > ϵ_{i} \end{matrix}

(A1)

Since

f_{i}

are bounded, there exist simple measurable functions,

g_{i}

, such that

\begin{matrix} sup_{x \in Θ} | g_{i} (x) - f_{i} (x) | < \frac{{min}_{i} ϵ_{i}}{2} \end{matrix}

(A2)

Therefore,

\begin{matrix} \int g_{i} d P > \frac{ϵ_{i}}{2} \end{matrix}

(A3)

Let

G_{i} = {g_{i}^{- 1} ({j}) : j \in g_{i} (Θ)}

. Observe that

G_{i}

is a finite partition of Θ. Let

G^{*}

be the coarsest partition that is finer than every

G_{i}

. Since every

G_{i}

is finite,

G^{*}

is finite. Let

h : G^{*} \to Θ

be such that

h (G) \in G

. Define

P^{*} : σ (Θ) \to R_{*}

by

P^{*} (A) = \sum_{G \in G^{*}} P (G) I_{A} (h (G))

. The measure

P^{*}

is such that

P^{*} ({h (G)}) = P (G), \forall G \in G^{*}

, and that

P^{*} (h (G^{*})) = 1,

where

h (G^{*})

is a finite subset of Θ. Also, conclude from the definition of

G^{*}

and Equation (A3) that

\begin{matrix} \int g_{i} d P^{*} = \int g_{i} d P > \frac{ϵ_{i}}{2} \end{matrix}

(A4)

Conclude from Equations (A2) and (A4) that

\begin{matrix} \int f_{i} d P^{*} > 0, i = 1, \dots, m \end{matrix}

☐

Lemma A3.

Let, for

i \in Θ

,

{i} \in σ (Θ)

and

f_{1}, \dots, f_{m}

be

σ (Θ) / R

-measurable bounded functions. If there exists a probability

P

on

σ (Θ)

such that

P

has a finite support and, for all

1 \leq i \leq m

,

\int f_{i} d P > 0

, then there exists a probability

P^{*}

with a support of size smaller than or equal to m such that, for all

1 \leq i \leq m

,

\int f_{i} d P^{*} > 0

.

Proof.

Let

ϵ_{i} > 0

be such that

\begin{matrix} \int f_{i} d P \geq ϵ_{i} \end{matrix}

Let

Θ_{P}

denote the support of

P

. Let

θ_{1}, \dots, θ_{| Θ_{P} |}

be an ordering of the elements of

Θ_{P}

. Let F be a

m \times | Θ_{P} |

matrix such that

F_{i, j} = f_{i} (θ_{j})

. Let

p \in R^{| Θ_{P} |}

be such that

p_{j} = P ({θ_{j}})

,

j = 1, \dots, | Θ_{P} |

. Observe that

\begin{matrix} F p \geq ϵ; p \geq 0 \end{matrix}

Therefore, the set

C = {p^{*} \in R^{| Θ_{P} |} : p^{*} \geq 0, F p^{*} \geq ϵ}

is a non-empty polyhedron. Conclude that there exists a vertex

p^{*} \in C

such that

| {i : p_{i}^{*} = 0} | \geq | Θ_{P} | - m

. Define

P^{*} ({θ_{i}}) = \frac{p_{i}^{*}}{∥ p^{*} ∥_{1}}

. ☐

Theorem A1.

Let, for

i \in Θ

,

{i} \in σ (Θ)

and

f_{1}, \dots, f_{m}

be

σ (Θ) / R

-measurable bounded functions. There exists a probability

P

on

σ (Θ)

such that, for all

1 \leq i \leq m

,

\int f_{i} d P > 0

, if and only if there exists a probability

P^{*}

with a support of size smaller than or equal to m such that, for all

1 \leq i \leq m

,

\int f_{i} d P^{*} > 0

.

Proof.

Follows directly from Lemmas A2 and A3. ☐

Lemma A4.

Let

{(L_{A})}_{A \in σ (Θ)}

have proper losses. For every

x \in X

,

If $E [L_{A} (1, θ) | x] < E [L_{A} (\frac{1}{2}, θ) | x]$ , then $E [L_{A} (1, θ) | x] < E [L_{A} (0, θ) | x]$ .
If $E [L_{A} (0, θ) | x] < E [L_{A} (\frac{1}{2}, θ) | x]$ , then $E [L_{A} (0, θ) | x] < E [L_{A} (1, θ) | x]$ .

Proof of Lemma A4.

The proof follows directly from the monotonicity of conditional expectation. ☐

Proof of Lemma 1.

Let $A \subset B$, $θ_{1} \in A$, $θ_{2} \in B^{c}$ and $(i, j) \in D_{>}^{2}$. Since $(L_{A})_{A \in σ (Θ)}$ has proper and monotonic relative losses,

$$\begin{aligned} r_{B}^{(i, j)} (θ_{1}) &\geq r_{A}^{(i, j)} (θ_{1}) > 0 \\ r_{A}^{(i, j)} (θ_{2}) &\leq r_{B}^{(i, j)} (θ_{2}) < 0 \end{aligned}$$

Conclude that $(L_{A})_{A \in σ (Θ)}$ has balanced relative losses. ☐

Lemma A5.

Let $(L_{A})_{A \in σ (Θ)}$ have proper losses, $L_{A}$ be bounded for every $A \in σ (Θ)$ and $L_{x} (θ) > 0$ for every $θ \in Θ$ and $x \in X$. There exists a prior for θ such that, for some $A \subset B$, $(i, j) \in D_{>}^{2}$ and $x \in X$, $E [L_{B} (i, θ) | x] < E [L_{B} (j, θ) | x]$ and $E [L_{A} (i, θ) | x] > E [L_{A} (j, θ) | x]$ if and only if $(L_{A})_{A \in σ (Θ)}$ does not have balanced relative losses.

Proof of Lemma A5.

Since $L_{x} (θ) > 0$, the space of posteriors is exactly the space of priors over $σ (Θ)$ [3]. Therefore, there exists a prior such that $E [L_{B} (i, θ) | x] < E [L_{B} (j, θ) | x]$ and $E [L_{A} (i, θ) | x] > E [L_{A} (j, θ) | x]$ if and only if there exists $P$ such that

$$\int - r_{B}^{(i, j)} d P > 0 \quad \text{and} \quad \int r_{A}^{(i, j)} d P > 0$$

It follows from Theorem A1 that there exists such a $P$ if and only if there exist $θ_{1}, θ_{2} \in Θ$ and $p \in [0, 1]$ such that

$$\begin{cases} p \cdot r_{B}^{(i, j)} (θ_{1}) + (1 - p) \cdot r_{B}^{(i, j)} (θ_{2}) < 0 \\ p \cdot r_{A}^{(i, j)} (θ_{1}) + (1 - p) \cdot r_{A}^{(i, j)} (θ_{2}) > 0 \end{cases}$$

Since $(L_{A})_{A \in σ (Θ)}$ has proper losses, the above condition is satisfied if and only if $p \in (0, 1)$, that is, if and only if $(L_{A})_{A \in σ (Θ)}$ does not have balanced relative losses. ☐
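The two-point system in the proof of Lemma A5 is linear in $p$, so whether it has a solution can be checked exactly. The sketch below is only an illustration: the function name `two_point_witness` and the relative-loss values in the examples are hypothetical. It returns some $p \in [0, 1]$ satisfying both inequalities, or `None` when no such weight exists.

```python
from fractions import Fraction


def two_point_witness(rA1, rA2, rB1, rB2):
    """Search for p in [0, 1] such that
         p * rB1 + (1 - p) * rB2 < 0   and   p * rA1 + (1 - p) * rA2 > 0,
    where rA1 = r_A(theta_1), rA2 = r_A(theta_2), and likewise for B.
    Returns one such p as a Fraction, or None if the system is infeasible.
    """
    rA1, rA2, rB1, rB2 = (Fraction(v) for v in (rA1, rA2, rB1, rB2))

    def satisfies(p):
        return (p * rB1 + (1 - p) * rB2 < 0) and (p * rA1 + (1 - p) * rA2 > 0)

    # Each inequality is linear in p, so feasibility can only change at the
    # endpoints 0 and 1 or at a root of one of the two linear expressions.
    points = {Fraction(0), Fraction(1)}
    for at_theta1, at_theta2 in ((rA1, rA2), (rB1, rB2)):
        if at_theta1 != at_theta2:
            root = at_theta2 / (at_theta2 - at_theta1)
            if 0 < root < 1:
                points.add(root)
    points = sorted(points)
    # Testing the breakpoints and the midpoints between them is exhaustive.
    candidates = sorted(points + [(u + v) / 2 for u, v in zip(points, points[1:])])
    return next((p for p in candidates if satisfies(p)), None)


# Hypothetical values for which a two-point prior exists.
print(two_point_witness(rA1=2, rA2=-1, rB1=1, rB2=-1))
# Hypothetical values for which no weight p works.
print(two_point_witness(rA1=1, rA2=-1, rB1=2, rB2=-1))
```

In the first call the two inequalities hold for every $p$ strictly between $1/3$ and $1/2$, so any witness returned is interior to $[0, 1]$, in line with the remark that proper losses rule out $p = 0$ and $p = 1$.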

Proof of Theorem 4.

Assume that $(L_{A})_{A \in σ (Θ)}$ has balanced relative losses. Let $P_{θ}$ be an arbitrary prior and $A, B$ be arbitrary sets such that $A \subset B$. It follows from Lemma A5 that, for every $(i, j) \in D_{>}^{2}$, it cannot be the case that $E [L_{B} (i, θ) | x] < E [L_{B} (j, θ) | x]$ and $E [L_{A} (i, θ) | x] > E [L_{A} (j, θ) | x]$. Conclude from Lemma A4 that there exists a monotonic Bayesian ATS.

Assume that $(L_{A})_{A \in σ (Θ)}$ does not have balanced relative losses. It follows from Lemma A5 that there exist a prior $P_{θ}$, $A \subset B$, $(i, j) \in D_{>}^{2}$ and $x \in X$ such that $E [L_{B} (i, θ) | x] < E [L_{B} (j, θ) | x]$ and $E [L_{A} (i, θ) | x] > E [L_{A} (j, θ) | x]$. Conclude from Lemma A4 that, for every Bayesian ATS, $L^{P_{θ}} (A) (x) \leq j < i \leq L^{P_{θ}} (B) (x)$. Therefore, there exists no monotonic Bayesian ATS against $P_{θ}$. ☐
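The violation used in this proof, a decision for $A$ strictly below the decision for $B$ with $A \subset B$, can be searched for by brute force on a small finite parameter space. The sketch below is illustrative only. It assumes a region-based rule of the form "accept $A$ (decision 0) when the region estimate is contained in $A$, reject (decision 1) when it is disjoint from $A$, remain agnostic otherwise", and all names (`region_test`, `monotonicity_violations`) and example sets are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)


def subsets(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = sorted(theta)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def region_test(region):
    """Assumed region-based agnostic rule for a fixed non-empty region estimate."""
    region = frozenset(region)

    def decide(A):
        if region <= A:
            return Fraction(0)      # accept A
        if not (region & A):
            return Fraction(1)      # reject A
        return HALF                 # remain agnostic about A
    return decide


def monotonicity_violations(decide, theta):
    """Pairs (A, B) with A a proper subset of B and decide(A) < decide(B)."""
    hyps = subsets(theta)
    return [(A, B) for A in hyps for B in hyps
            if A < B and decide(A) < decide(B)]


theta = {0, 1, 2, 3}

# The assumed region-based rule produces no violation on this example.
print(monotonicity_violations(region_test({1, 2}), theta))


# A deliberately incoherent hypothetical rule: reject any hypothesis that
# contains two or more points, accept all others.
def size_rule(A):
    return Fraction(1) if len(A) >= 2 else Fraction(0)


print(len(monotonicity_violations(size_rule, theta)) > 0)
```

The second rule rejects $\{0, 1\}$ while accepting $\{0\}$, which is exactly the pattern $L (A) (x) < L (B) (x)$ with $A \subset B$ that the proof rules out for a monotonic scheme.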

Proof of Theorem 5.

The proof follows directly from Theorem A1 and Lemma A4. ☐

Proof of Corollary 1.

Assume that $(L_{A})_{A \in σ (Θ)}$ does not satisfy Definition 19. Therefore, there exist $A, B \in σ (Θ)$, $θ_{1} \in A \cup B$ and $θ_{2} \in (A \cup B)^{c}$ such that

$$\begin{cases} \frac{r_{A}^{(1, \frac{1}{2})} (θ_{1})}{r_{A}^{(1, \frac{1}{2})} (θ_{2})} > \frac{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1})}{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2})} \\ \frac{r_{B}^{(1, \frac{1}{2})} (θ_{1})}{r_{B}^{(1, \frac{1}{2})} (θ_{2})} > \frac{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1})}{r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2})} \end{cases}$$

Since $(L_{A})_{A \in σ (Θ)}$ has proper losses, there exist $q_{1}, q_{2} \in (0, 1)$ such that

$$\begin{cases} q_{1} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{1}) + (1 - q_{1}) \cdot r_{A}^{(1, \frac{1}{2})} (θ_{2}) < 0 \\ q_{1} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1}) + (1 - q_{1}) \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2}) > 0 \\ q_{2} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{1}) + (1 - q_{2}) \cdot r_{B}^{(1, \frac{1}{2})} (θ_{2}) < 0 \\ q_{2} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1}) + (1 - q_{2}) \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2}) > 0 \end{cases} \tag{A5}$$

Let $p_{1} = \min (q_{1}, q_{2})$, $p_{2} = 1 - p_{1}$, $p_{3} = 0$ and $θ_{3} \in Θ$. Since $(L_{A})_{A \in σ (Θ)}$ has proper losses, it follows directly from Equation (A5) that

$$\begin{cases} p_{1} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{1}) + p_{2} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{2}) + p_{3} \cdot r_{A}^{(1, \frac{1}{2})} (θ_{3}) < 0 \\ p_{1} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{1}) + p_{2} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{2}) + p_{3} \cdot r_{B}^{(1, \frac{1}{2})} (θ_{3}) < 0 \\ p_{1} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{1}) + p_{2} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{2}) + p_{3} \cdot r_{A \cup B}^{(\frac{1}{2}, 0)} (θ_{3}) > 0 \end{cases} \tag{A6}$$

Equation (A6) shows that $(L_{A})_{A \in σ (Θ)}$ is not compatible with finite union consonance. Therefore, since $(L_{A})_{A \in σ (Θ)}$ has proper losses, it follows from Theorem 5 that there exists a prior $P_{θ}$ such that, against $P_{θ}$, no Bayesian ATS is consonant with pairwise union. ☐

Proof of Theorem 6.

The proof follows directly from Theorem A1 and Lemma A4. ☐

Proof of Corollary 2.

The proof follows the same steps as in Corollary 1. ☐

Proof of Theorem 7.

It follows from Theorem A1 and $(L_{A})_{A \in σ (Θ)}$ being proper that $(L_{A})_{A \in σ (Θ)}$ has invertible relative losses (Definition 22) if and only if there exist no $A \in σ (Θ)$, $(i, j) \in D_{>}^{2}$, $θ_{1} \in A$, $θ_{2} \in A^{c}$ and $p \in [0, 1]$ such that

$$\begin{cases} p \cdot r_{A}^{(i, j)} (θ_{1}) + (1 - p) r_{A}^{(i, j)} (θ_{2}) > 0 \\ p \cdot r_{A^{c}}^{(i, j)} (θ_{1}) + (1 - p) r_{A^{c}}^{(i, j)} (θ_{2}) > 0 \end{cases} \quad \text{or} \quad \begin{cases} p \cdot r_{A}^{(i, j)} (θ_{1}) + (1 - p) r_{A}^{(i, j)} (θ_{2}) < 0 \\ p \cdot r_{A^{c}}^{(i, j)} (θ_{1}) + (1 - p) r_{A^{c}}^{(i, j)} (θ_{2}) < 0 \end{cases} \tag{A7}$$

Furthermore, Equation (A7) having no solution is equivalent to there existing no $k > 0$ such that

$$\begin{cases} k > - \frac{r_{A}^{(i, j)} (θ_{2})}{r_{A}^{(i, j)} (θ_{1})} \\ k^{- 1} < - \frac{r_{A^{c}}^{(i, j)} (θ_{1})}{r_{A^{c}}^{(i, j)} (θ_{2})} \end{cases} \quad \text{or} \quad \begin{cases} k < - \frac{r_{A}^{(i, j)} (θ_{2})}{r_{A}^{(i, j)} (θ_{1})} \\ k^{- 1} > - \frac{r_{A^{c}}^{(i, j)} (θ_{1})}{r_{A^{c}}^{(i, j)} (θ_{2})} \end{cases}$$

Conclude that $(L_{A})_{A \in σ (Θ)}$ is compatible with invertibility if and only if, for every $(i, j) \in D_{>}^{2}$, $A \in σ (Θ)$, $θ_{1} \in A$ and $θ_{2} \in A^{c}$,

$$\frac{r_{A}^{(i, j)} (θ_{2})}{r_{A}^{(i, j)} (θ_{1})} = \frac{r_{A^{c}}^{(i, j)} (θ_{1})}{r_{A^{c}}^{(i, j)} (θ_{2})}$$

☐

References

1. Wiener, Y.; El-Yaniv, R. Agnostic selective classification. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2011; pp. 1665–1673.
2. Balsubramani, A. Learning to abstain from binary prediction. 2016; arXiv:1602.08151.
3. Izbicki, R.; Esteves, L.G. Logical consistency in simultaneous statistical test procedures. Logic J. IGPL 2015, 23, 732–758.
4. Finner, H.; Strassburger, K. The partitioning principle: A powerful tool in multiple decision theory. Ann. Stat. 2002, 30, 1194–1213.
5. Sonnemann, E. General solutions to multiple testing problems. Biom. J. 2008, 50, 641–656.
6. Patriota, A.G. S-value: An alternative measure of evidence for testing general null hypotheses. Cienc. Nat. 2014, 36, 14–22.
7. Da Silva, G.M.; Esteves, L.G.; Fossaluza, V.; Izbicki, R.; Wechsler, S. A bayesian decision-theoretic approach to logically-consistent hypothesis testing. Entropy 2015, 17, 6534–6559.
8. Berg, N. No-decision classification: An alternative to testing for statistical significance. J. Socio-Econ. 2004, 33, 631–650.
9. Babb, J.; Rogatko, A.; Zacks, S. Bayesian sequential and fixed sample testing of multihypothesis. In Asymptotic Methods in Probability and Statistics; Elsevier: Amsterdam, The Netherlands, 1998; pp. 801–809.
10. Ripley, B.D. Pattern Recognition and Neural Networks; Cambridge University Press: Cambridge, UK, 1996.
11. De Bragança Pereira, C.A.; Stern, J.M. Evidence and credibility: Full bayesian significance test for precise hypotheses. Entropy 1999, 1, 99–110.
12. Berger, J.O.; Delampady, M. Testing precise hypotheses. Stat. Sci. 1987, 2, 317–335.
13. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. 1995, 57, 289–300.
14. Jaynes, E.T. Confidence intervals vs. Bayesian intervals. In Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science; Springer: Dordrecht, The Netherlands, 1976.
15. Gabriel, K.R. Simultaneous test procedures—Some theory of multiple comparisons. Ann. Math. Stat. 1969, 41, 224–250.
16. Fossaluza, V.; Izbicki, R.; da Silva, G.M.; Esteves, L.G. Coherent hypothesis testing. Am. Stat. 2016. submitted for publication.
17. Sonnemann, E.; Finner, H. Vollständigkeitssätze für multiple testprobleme. In Multiple Hypothesenprüfung; Bauer, P., Hommel, G., Sonnemann, E., Eds.; Springer: Berlin, Germany, 1988; pp. 121–135. (In German)
18. Lavine, M.; Schervish, M. Bayes factors: What they are and what they are not. Am. Stat. 1999, 53, 119–122.
19. Izbicki, R.; Fossaluza, V.; Hounie, A.G.; Nakano, E.Y.; Pereira, C.A.B. Testing allele homogeneity: The problem of nested hypotheses. BMC Genet. 2012, 13.
20. Schervish, M.J. p values: What they are and what they are not. Am. Stat. 1996, 50, 203–206.
21. Hochberg, Y.; Tamhane, A.C. Multiple Comparison Procedures; John Wiley & Sons: New York, NY, USA, 1987.
22. Borges, W.; Stern, J.M. The rules of logic composition for the bayesian epistemic e-values. Logic J. IGPL 2007, 15, 401–420.
23. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson: Upper Saddle River, NJ, USA, 2007.
24. Schervish, M.J. Theory of Statistics; Springer: New York, NY, USA, 1997.
25. Robert, C. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd ed.; Springer: New York, NY, USA, 2007.

Figure 1. Agnostic test based on the region estimate $R (x)$ from Example 5.

Figure 2. Illustrations of the performance of the agnostic region testing scheme (Example 5) for three different hypotheses (specified on the top of each picture). The pictures present the probability of each decision, $P (L (A) (X) = d | μ)$ for $d \in \{0, \frac{1}{2}, 1\}$, as a function of the mean, μ.

**Table 1.** The loss function for the hypothesis $θ \in A$ used in Example 26.

| Decision | State of nature: $θ \in A$ | State of nature: $θ \notin A$ |
|---|---|---|
| 0 (accept A) | 0 | $b_{A}$ |
| $\frac{1}{2}$ (remain agnostic about A) | $a_{A}$ | $c_{A}$ |
| 1 (reject A) | $d_{A}$ | 0 |
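Under the loss in Table 1, the posterior expected loss of each decision depends on the data only through $q = P (θ \in A | x)$, and a Bayes decision minimizes it. The sketch below is a minimal illustration of that calculation; the function names, the tie-breaking rule (smaller decision first) and the numerical values of $a_{A}$, $b_{A}$, $c_{A}$, $d_{A}$ are assumptions made for the example, not values taken from the paper.

```python
from fractions import Fraction

ACCEPT, AGNOSTIC, REJECT = Fraction(0), Fraction(1, 2), Fraction(1)


def expected_losses(q, a, b, c, d):
    """Posterior expected loss of each decision under the Table 1 loss,
    where q is the posterior probability that theta lies in A."""
    q = Fraction(q)
    return {
        ACCEPT: (1 - q) * b,             # loss b_A only when theta is not in A
        AGNOSTIC: q * a + (1 - q) * c,   # loss a_A in A, c_A outside A
        REJECT: q * d,                   # loss d_A only when theta is in A
    }


def bayes_decision(q, a, b, c, d):
    """Decision with the smallest expected loss (ties go to the smaller decision,
    an arbitrary convention adopted for this sketch)."""
    losses = expected_losses(q, a, b, c, d)
    return min(losses, key=lambda decision: (losses[decision], decision))


# Hypothetical loss constants: a_A = c_A = 1 and b_A = d_A = 4.
for q in (Fraction(9, 10), Fraction(1, 2), Fraction(1, 10)):
    print(q, bayes_decision(q, a=1, b=4, c=1, d=4))
```

With these constants the hypothesis is accepted when $q = 9/10$, rejected when $q = 1/10$, and left undecided when $q = 1/2$, since at $q = 1/2$ the agnostic decision has expected loss 1 against 2 for either of the other two.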

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

