A Bayesian Decision-Theoretic Approach to Logically-Consistent Hypothesis Testing

Da Silva, Gustavo Miranda; Esteves, Luis Gustavo; Fossaluza, Victor; Izbicki, Rafael; Wechsler, Sergio

doi:10.3390/e17106534


by Gustavo Miranda Da Silva ¹, Luis Gustavo Esteves ¹,*, Victor Fossaluza ¹, Rafael Izbicki ² and Sergio Wechsler ¹

¹ Institute of Mathematics and Statistics, University of São Paulo, São Paulo, 05508-090, Brazil

² Department of Statistics, Federal University of São Carlos, São Carlos, 13565-905, Brazil

\* Author to whom correspondence should be addressed.

Entropy 2015, 17(10), 6534-6559; https://doi.org/10.3390/e17106534

Submission received: 27 May 2015 / Revised: 1 September 2015 / Accepted: 9 September 2015 / Published: 24 September 2015

(This article belongs to the Special Issue Inductive Statistical Methods)


Abstract:

This work addresses an important issue regarding the performance of simultaneous test procedures: the construction of multiple tests that at the same time are optimal from a statistical perspective and that also yield logically-consistent results that are easy to communicate to practitioners of statistical methods. For instance, if hypothesis A implies hypothesis B, is it possible to create optimal testing procedures that reject A whenever they reject B? Unfortunately, several standard testing procedures fail in having such logical consistency. Although this has been deeply investigated under a frequentist perspective, the literature lacks analyses under a Bayesian paradigm. In this work, we contribute to the discussion by investigating three rational relationships under a Bayesian decision-theoretic standpoint: coherence, invertibility and union consonance. We characterize and illustrate through simple examples optimal Bayes tests that fulfill each of these requisites separately. We also explore how far one can go by putting these requirements together. We show that although fairly intuitive tests satisfy both coherence and invertibility, no Bayesian testing scheme meets the desiderata as a whole, strengthening the understanding that logical consistency cannot be combined with statistical optimality in general. Finally, we associate Bayesian hypothesis testing with Bayes point estimation procedures. We prove the performance of logically-consistent hypothesis testing by means of a Bayes point estimator to be optimal only under very restrictive conditions.

Keywords:

Bayes tests; decision theory; logical consistency; loss functions; multiple hypothesis testing

1. Introduction

One could (...) argue that ‘power is not everything’. In particular for multiple test procedures one can formulate additional requirements, such as, for example, that the decision patterns should be logical, conceivable to other persons, and, as far as possible, simple to communicate to non-statisticians.
—G. Hommel and F. Bretz [1]

Multiple hypothesis testing, a formal quantitative method that consists of testing several hypotheses simultaneously [2], has gained considerable ground in the last few decades with the aim of drawing conclusions from data in scientific experiments regarding unknown quantities of interest. Most of the development of multiple hypothesis testing has been focused on the construction of test procedures satisfying statistical optimality criteria, such as the minimization of posterior expected loss functions or the control of various error rates. These advances are detailed, for instance, in [2], [3] (p. 7), [4] and the references therein. However, another important issue concerning multiple hypothesis testing, namely the construction of simultaneous tests that yield coherent results easier to communicate to practitioners of statistical methods, has not been so deeply investigated yet, especially under the Bayesian paradigm. As a matter of fact, most traditional multiple hypothesis testing schemes do not combine statistical optimality with logical consistency. For example, [5] (p. 250) presents a situation regarding the parameter, θ, of a single exponential random variable, X, in which uniformly most powerful (UMP) tests of level 0.05 for the one-sided hypothesis $H_{0}^{(1)} : θ \leq 1$ and the two-sided hypothesis $H_{0}^{(2)} : θ \leq 1 \cup θ \geq 2$, say $φ_{1}$ and $φ_{2}$, respectively, lead to puzzling decisions. In fact, for the sample outcome $X = 0.7$, the test $φ_{2}$ rejects $H_{0}^{(2)}$, and because $H_{0}^{(1)}$ implies $H_{0}^{(2)}$, one may decide to reject $H_{0}^{(1)}$, as well. On the other hand, the test $φ_{1}$ does not reject $H_{0}^{(1)}$, a fact that makes a practitioner confused given these conflicting results. In this example, an inconsistency related to nested hypotheses named coherence [6] takes place. Frequently, other logical relationships one may expect from the conclusions drawn from multiple hypothesis testing, such as consonance [6] and compatibility [7], are not met either.

Although several of these properties have been deeply investigated under a frequentist hypothesis-testing framework, the Bayesian literature lacks such analyses. In this work, we contribute to this discussion by examining three rational requirements in simultaneous tests under a Bayesian decision-theoretic perspective. In short, we characterize the families of loss functions that induce multiple Bayesian tests that partially satisfy such desiderata. In Section 2, we review and illustrate the concept of a testing scheme (TS), a mathematical object that assigns to each statistical hypothesis of interest a test function. In Section 3, we formalize three consistency relations one may find important to hold in simultaneous tests: coherence, union consonance and invertibility. In Section 4, we provide necessary and sufficient conditions on loss functions for Bayesian tests to meet each desideratum separately, whatever the prior distribution for the relevant parameters is. In Section 5, we prove, under quite general conditions, the impossibility of creating multiple tests under a Bayesian decision-theoretic framework that fulfill the triplet of requisites simultaneously with respect to all prior distributions. We also explore the connection between logically-consistent Bayes tests and Bayes point estimation procedures. Final remarks and suggestions for future inquiries are presented in Section 6. All theorems are proven in the Appendix.

2. Testing Schemes

We start by formulating the mathematical setup for multiple Bayesian tests. For the remainder of the manuscript, the parameter space is denoted by Θ and the sample space by $\mathcal{X}$. Furthermore, $σ(Θ)$ and $σ(\mathcal{X})$ represent σ-fields of subsets of Θ and $\mathcal{X}$, respectively. We consider the Bayesian statistical model $(\mathcal{X} \times Θ, σ(\mathcal{X} \times Θ), \mathbb{P})$. The $\mathbb{P}$-marginal distribution of θ, namely the prior distribution for θ, is denoted by π, while $π_{x}(\cdot)$ represents the posterior distribution for θ given $X = x$, $x \in \mathcal{X}$. Moreover, $P(\cdot \mid θ)$ stands for the conditional distribution of the observable X given θ, and $L_{x}(θ)$ represents the likelihood function at the point $θ \in Θ$ generated by the sample observation $x \in \mathcal{X}$. Finally, let Ψ be the set of all test functions, that is the set of all $\{0, 1\}$-valued measurable functions defined on $\mathcal{X}$. As usual, “1” denotes the decision of rejecting the null hypothesis and “0” the decision of not rejecting or accepting it.

Next, we review the definition of a TS, a mathematical device that formally describes the idea that a test function is assigned to each hypothesis of interest. Although the specification of the hypotheses of interest most of the time depends on the scientific problem under consideration, here, we assume that a decision-maker has to assign a test to each element of $σ(Θ)$. This assumption not only enables us to precisely define the relevant consistency properties, but it also allows multiple Bayesian testing based on posterior probabilities of the hypotheses (a deeper discussion on this issue may be found in [3] (p. 5) and [8]).

Definition 1. (Testing scheme (TS)) Let the σ-field of subsets of the parameter space $σ(Θ)$ be the set of hypotheses to be tested. Moreover, let Ψ be the set of all test functions defined on $\mathcal{X}$. A TS is a function $φ : σ(Θ) \to Ψ$ that assigns to each hypothesis $A \in σ(Θ)$ the test $φ_{A} \in Ψ$ for testing A.

Thus, for $A \in σ(Θ)$ and $x \in \mathcal{X}$, $φ_{A}(x) = 1$ represents the decision of rejecting the hypothesis A when the datum x is observed. Similarly, $φ_{A}(x) = 0$ represents the decision of not rejecting A. We now present examples of testing schemes.

Example 1. (Tests based on posterior probabilities) Assume $Θ = \mathbb{R}^{d}$ and $σ(Θ) = \mathcal{B}(\mathbb{R}^{d})$, the Borelians of $\mathbb{R}^{d}$. Let π be the prior probability distribution for θ. For each $A \in σ(Θ)$, let $φ_{A} : \mathcal{X} \to \{0, 1\}$ be defined by:

$$φ_{A}(x) = \mathbb{I}\left(π_{x}(A) < \tfrac{1}{2}\right),$$

where $π_{x}(\cdot)$ is the posterior distribution of θ, given x. This is the TS that assigns to each hypothesis $A \in \mathcal{B}(\mathbb{R}^{d})$ the test that rejects it when its posterior probability is smaller than $1/2$.
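As a rough illustration of Example 1, the sketch below evaluates such a test in one dimension. The normal model, the standard normal prior and the function names are assumptions made here for illustration only; they are not part of the example itself.

```python
from statistics import NormalDist


def posterior(x, prior_mean=0.0, prior_var=1.0):
    """Posterior of theta after one observation x ~ N(theta, 1).

    Assumes a conjugate N(prior_mean, prior_var) prior (an illustrative choice).
    """
    post_var = 1.0 / (1.0 / prior_var + 1.0)
    post_mean = post_var * (prior_mean / prior_var + x)
    return NormalDist(post_mean, post_var ** 0.5)


def phi_interval(x, lower, upper):
    """Test for A = (lower, upper]: 1 (reject) iff the posterior probability of A is below 1/2."""
    post = posterior(x)
    prob_a = post.cdf(upper) - post.cdf(lower)
    return int(prob_a < 0.5)


# With x = 2 the posterior is N(1, 1/2): the hypothesis theta <= 0 is rejected,
# whereas the larger hypothesis theta <= 2 is not.
reject_small = phi_interval(2.0, float("-inf"), 0.0)
reject_large = phi_interval(2.0, float("-inf"), 2.0)
```

Because the decision depends on the hypothesis only through its posterior probability, enlarging the hypothesis can never turn a non-rejection into a rejection.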

Recall that, under a Bayesian decision-theoretic perspective, a hypothesis test for the hypothesis $Θ_{0} \subseteq Θ$ [5] (p. 214) is a decision problem in which the action space is $\{0, 1\}$ and the loss function $L : \{0, 1\} \times Θ \to \mathbb{R}$ satisfies:

$$L(1, θ) \geq L(0, θ) \ \text{for } θ \in Θ_{0} \quad \text{and} \quad L(1, θ) \leq L(0, θ) \ \text{for } θ \in Θ_{0}^{c}, \tag{1}$$

that is, L is such that the wrong decision ought to be assigned a loss at least as large as that assigned to a correct decision (many authors consider strict inequalities in Equation (1)). We call such a loss function a (strict) hypothesis testing loss function.

A solution of this decision problem, named a Bayes test, is a test function $φ^{*} \in Ψ$ derived, for each sample point $x \in \mathcal{X}$, by minimizing the expectation of the loss function L over $\{0, 1\}$ with respect to the posterior distribution. That is, for each $x \in \mathcal{X}$,

$$φ^{*}(x) = 1 \Leftrightarrow E[L(1, θ) \mid X = x] < E[L(0, θ) \mid X = x],$$

where $E[L(d, θ) \mid X = x] = \int_{Θ} L(d, θ) \, dπ_{x}(θ)$, $d \in \{0, 1\}$. In the case of the equality of the posterior expectations, both zero and one are optimal decisions, and either of them can be chosen as $φ^{*}(x)$.
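The decision rule above can be sketched for a finite parameter space, where the posterior expectations become finite sums. The dictionary-based representation and the names `bayes_test`, `loss` and `posterior` are hypothetical conveniences, and ties are resolved here by not rejecting, which is one of the two choices the text allows.

```python
def bayes_test(loss, posterior):
    """Bayes test on a finite parameter space.

    loss[d][theta] is the loss of decision d in {0, 1} at theta;
    posterior[theta] is the posterior probability of theta.
    Returns 1 (reject) iff E[L(1, theta) | x] < E[L(0, theta) | x].
    """
    expected = {
        d: sum(loss[d][theta] * prob for theta, prob in posterior.items())
        for d in (0, 1)
    }
    return int(expected[1] < expected[0])


# Hypothetical two-point parameter space; the null hypothesis is {"a"}.
zero_one_loss = {
    0: {"a": 0.0, "b": 1.0},  # not rejecting is wrong when theta = "b"
    1: {"a": 1.0, "b": 0.0},  # rejecting is wrong when theta = "a"
}
decision = bayes_test(zero_one_loss, {"a": 0.3, "b": 0.7})
tie_decision = bayes_test(zero_one_loss, {"a": 0.5, "b": 0.5})
```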

When dealing with multiple tests, one can use the above procedure for each hypothesis of interest. Hence, one can derive a Bayes test for each null hypothesis $A \in σ(Θ)$ considering a specified loss function $L_{A} : \{0, 1\} \times Θ \to \mathbb{R}$ satisfying Equation (1). This is formally described in the following definition.

Definition 2. (TS generated by a family of loss functions) Let $(\mathcal{X} \times Θ, σ(\mathcal{X} \times Θ), \mathbb{P})$ be a Bayesian statistical model. Let $\{(L_{A})\}_{A \in σ(Θ)}$ be a family of hypothesis testing loss functions, where $L_{A} : \{0, 1\} \times Θ \to \mathbb{R}$ is the loss function for testing $A \in σ(Θ)$. A TS generated by the family of loss functions $\{(L_{A})\}_{A \in σ(Θ)}$ is any TS φ defined over $σ(Θ)$, such that, $\forall A \in σ(Θ)$, $φ_{A}$ is a Bayes test for hypothesis A with respect to π considering the loss $L_{A}$.

The following example illustrates this concept.

Example 2. (Tests based on posterior probabilities) Assume the same scenario as Example 1 and that $\{(L_{A})\}_{A \in σ(Θ)}$ is a family of loss functions, such that $\forall A \in σ(Θ)$ and $\forall θ \in Θ$,

$$L_{A}(0, θ) = \mathbb{I}(θ \notin A) \quad \text{and} \quad L_{A}(1, θ) = \mathbb{I}(θ \in A),$$

that is, $L_{A}$ is the 0–1 loss for A ([5] (p. 215)). The testing scheme introduced in Example 1 is a TS generated by the family of 0–1 loss functions.

The next example shows a TS of Bayesian tests motivated by different epistemological considerations (see [9,10] for details), the full Bayesian significance tests (FBST).

Example 3. (FBST testing scheme) Let $Θ = \mathbb{R}^{d}$, $σ(Θ) = \mathcal{B}(\mathbb{R}^{d})$ and $f(\cdot)$ be the prior probability density function (pdf) for θ. Suppose that, for each $x \in \mathcal{X}$, there exists $f(\cdot \mid x)$, the pdf of the posterior distribution of θ, given x. For each hypothesis $A \in σ(Θ)$, let:

$$T_{x}^{A} = \left\{θ \in Θ : f(θ \mid x) > \sup_{θ \in A} f(θ \mid x)\right\}$$

be the set tangent to the null hypothesis, and let $ev_{x}(A) = 1 - π_{x}(T_{x}^{A})$ be the Pereira–Stern evidence value for A (see [11] for a geometric motivation). One can define a TS φ by:

$$φ_{A}(x) = \mathbb{I}(ev_{x}(A) \leq c), \quad \forall A \in σ(Θ) \ \text{and} \ \forall x \in \mathcal{X},$$

in which $c \in [0, 1]$ is fixed. In other words, one does not reject the null hypothesis when its evidence is larger than c.
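For intuition, the evidence value has a closed form in one simple special case, sketched below: a one-dimensional normal posterior and an interval hypothesis. The normal posterior, the interval form of A, the cut-off `c = 0.05` and the function names are all assumptions of this sketch rather than part of Example 3. Under them, the supremum of the posterior density over A is attained at the point of A closest to the posterior mean, so the tangent set is a symmetric interval around that mean.

```python
from statistics import NormalDist


def evidence_value(post_mean, post_sd, lo, hi):
    """Pereira-Stern evidence for A = [lo, hi] under a N(post_mean, post_sd^2) posterior."""
    # Distance from the posterior mean to A (zero when the mean lies inside A).
    dist = max(lo - post_mean, post_mean - hi, 0.0)
    # Tangent set: points with higher density than anywhere in A, i.e. |theta - mean| < dist.
    prob_tangent = 2.0 * NormalDist().cdf(dist / post_sd) - 1.0
    return 1.0 - prob_tangent


def fbst_test(post_mean, post_sd, lo, hi, c=0.05):
    """Reject A (return 1) iff its evidence value is at most c (c is a hypothetical choice)."""
    return int(evidence_value(post_mean, post_sd, lo, hi) <= c)


ev_inside = evidence_value(0.0, 1.0, -1.0, 1.0)   # posterior mean lies in A
ev_small = evidence_value(0.0, 1.0, 2.5, 3.0)     # A = [2.5, 3]
ev_large = evidence_value(0.0, 1.0, 1.0, 3.0)     # B = [1, 3], which contains A
```

In this special case the evidence value can only grow when the interval is enlarged, since the distance from the posterior mean to the interval can only shrink.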

We end this section by defining a TS generated by a point estimation procedure, an intuitive concept that plays an important role in characterizing logically-consistent simultaneous tests.

Definition 3. (TS generated by a point estimation procedure) Let $δ : \mathcal{X} \to Θ$ be a point estimator for θ ([5] (p. 296)). The TS generated by δ is defined by:

$$φ_{A}(x) = \mathbb{I}(δ(x) \notin A).$$

Hence, the TS generated by the point estimator δ rejects hypothesis A after observing x if, and only if, the point estimate for θ, $δ(x)$, is not in A.

Example 4. (TS generated by a point estimation procedure) Let $Θ = \mathbb{R}$, $σ(Θ) = \mathcal{P}(Θ)$ and $X_{1}, \dots, X_{n} \mid θ$ i.i.d. $N(θ, 1)$. The TS generated by the sample mean, $\bar{X}$, rejects $A \in σ(Θ)$ when x is observed if $\bar{x} \notin A$.
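A testing scheme generated by a point estimator is straightforward to sketch in code. Below, hypotheses are represented as membership predicates on the real line, which is a representation chosen here for convenience, and the data values are hypothetical.

```python
from statistics import mean


def ts_from_estimator(estimator):
    """Build the testing scheme generated by a point estimator.

    A hypothesis is passed as a predicate in_hypothesis(theta) -> bool.
    """
    def phi(in_hypothesis, sample):
        # Reject (1) iff the point estimate falls outside the hypothesis.
        return int(not in_hypothesis(estimator(sample)))
    return phi


phi = ts_from_estimator(mean)
sample = [0.8, 1.1, 1.4]  # hypothetical observations; the sample mean is about 1.1

reject_at_most_one = phi(lambda theta: theta <= 1.0, sample)
reject_above_one = phi(lambda theta: theta > 1.0, sample)
```

Exactly one of a hypothesis and its complement contains the estimate, which is why the two decisions above are complementary.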

3. The Desiderata

In this section, we review three properties one may expect from simultaneous test procedures: coherence, invertibility and union consonance.

3.1. Coherence

When a hypothesis is tested by a significance test and is not rejected, it is generally agreed that all hypotheses implied by that hypothesis (its “components”) must also be considered as non-rejected.
—K. R. Gabriel [6]

The first property concerns nested hypotheses and was originally defined by [6]. It states that if hypothesis $H_{0}^{(1)}$ implies hypothesis $H_{0}^{(2)}$, that is $H_{0}^{(1)} \subseteq H_{0}^{(2)}$, then the rejection of $H_{0}^{(2)}$ implies the rejection of $H_{0}^{(1)}$. In the context of TSs, we have the following definition.

Definition 4. (Coherence) A testing scheme φ is coherent if:

$$\forall A, B \in σ(Θ), \quad A \subseteq B \Rightarrow φ_{A} \geq φ_{B}, \quad \text{i.e., } \forall x \in \mathcal{X}, \ φ_{A}(x) \geq φ_{B}(x).$$

In other words, if after observing x, a hypothesis is rejected, any hypothesis that implies it has to be rejected, as well.

The testing schemes introduced in Examples 1, 3 and 4 are coherent. Indeed, in Example 1, coherence is a consequence of the monotonicity of probability measures, while in Example 3, it follows from the fact that if $A \subseteq B$, then $T_{x}^{B} \subseteq T_{x}^{A}$ and, therefore, $ev_{x}(A) \leq ev_{x}(B)$. In Example 4, coherence is immediate. On the other hand, testing schemes based on UMP tests or generalized likelihood ratio tests with a common fixed level of significance are not coherent in general. Neither are TSs generated by some families of loss functions (see Section 4). Next, we illustrate that even test procedures based on p-values or Bayes factors may be incoherent.

Example 5. Suppose that in a case-control study, one measures the genotype in a certain locus for each individual of a sample. Results are shown in Table 1. These numbers were taken from a study presented by [12] that had the aim of verifying the hypothesis that subunits of the gene $GABA_{A}$ contribute to a condition known as methamphetamine use disorder. Here, the set of all possible genotypes is $G = \{AA, AB, BB\}$. Let $γ = (γ_{AA}, γ_{AB}, γ_{BB})$, where $γ_{i}$ is the probability that an individual from the case group has genotype i. Similarly, let $π = (π_{AA}, π_{AB}, π_{BB})$, where $π_{i}$ is the probability that an individual from the control group has genotype i.

In this context, two hypotheses are of interest: the hypothesis that the genotypic proportions are the same in both groups, $H_{0}^{G} : γ = π$, and the hypothesis that the allelic proportions are the same in both groups, $H_{0}^{A} : γ_{AA} + \frac{1}{2} γ_{AB} = π_{AA} + \frac{1}{2} π_{AB}$. The p-values obtained using chi-square tests for these hypotheses are, respectively, 0.152 and 0.069. Hence, at the level of significance $α = 10\%$, the TS given by chi-square tests rejects $H_{0}^{A}$, but does not reject $H_{0}^{G}$. That is, the TS leads a practitioner to believe that the allelic proportions are different in both groups, but it does not suggest any difference between the genotypic proportions. This is absurd! If the allelic proportions are not the same in both groups, the genotypic proportions cannot be the same either. Indeed, if the latter were the same, then $γ_{i} = π_{i}$, $\forall i \in G$, and hence, $θ \in H_{0}^{A}$. This example is further discussed in [8,13].

**Table 1.** Genotypic sample frequencies.
	AA	AB	BB	Total
Case	55	83	50	188
Control	24	42	39	105
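The genotypic p-value can be checked from Table 1 with a short sketch. It assumes the usual Pearson chi-square test of homogeneity on the 2 × 3 table, which has two degrees of freedom, so the survival function reduces to an exponential; the function name is hypothetical. The allelic p-value depends on which allelic test is used, which the text does not specify, so it is not reproduced here.

```python
from math import exp


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: case, control. Columns: AA, AB, BB (counts from Table 1).
genotypes = [[55, 83, 50], [24, 42, 39]]
stat = chi_square_statistic(genotypes)

# For 2 degrees of freedom the chi-square survival function is exp(-stat / 2).
p_value = exp(-stat / 2.0)
```

Under these assumptions the p-value comes out at about 0.152, matching the genotypic value quoted in Example 5.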

Several other (in)coherent testing schemes are explored by [8,14].

Coherence is by far the most emphasized logical requisite for simultaneous test procedures in the literature. It is often regarded as a sensible property by both theorists and practitioners of statistical methods who perceive a hypothesis test as a two-fold (accept/reject) decision problem. On the other hand, adherents to evidence-based approaches to hypothesis testing [15] do not see the need for coherence. Under the frequentist approach to hypothesis testing, the construction of coherent procedures is closely associated with the so-called closure methods [16,17]. Many results on coherent classical tests are shown in [6,17], among others. On the other hand, coherence has not been deeply investigated from a Bayesian standpoint yet, except for [18], who relate coherence with admissibility and Bayesian optimality in certain situations of finitely many hypotheses of interest. In Section 4, we provide a characterization of coherent testing schemes under a decision-theoretic framework.

3.2. Invertibility

There is a duality between hypotheses and alternatives which is not respected in most of the classical hypothesis-testing literature. (...) suppose that we decide to switch the names of alternative and hypothesis, so that $Ω_{H}$ becomes $Ω_{A}$ , and vice versa. Then we can switch tests from ϕ to $ψ = 1 - ϕ$ and the “actions” accept and reject become switched.
—M. J. Schervish [5] (p. 216)

The duality mentioned in the quotation above is formally described in the next definition.

Definition 5. (Invertibility) A testing scheme φ satisfies invertibility if:

$$\forall A \in σ(Θ), \quad φ_{A^{c}} = 1 - φ_{A}.$$

In other words, it is irrelevant to decision-making which hypothesis is labeled as null and which is labeled as alternative.

Unlike coherence, there is no consensus among statisticians on how reasonable invertibility is. While it is supported by many decision-theorists, invertibility is usually discredited by advocates of the frequentist theory owing to the difference between the interpretations of “not reject a hypothesis” and “accept a hypothesis” under various epistemological viewpoints (the reader is referred to [7] for a discussion on this distinction). As a matter of fact, invertibility can also be seen, from a logic perspective, as a version of the law of the excluded middle, which itself represents a gap between schools of logic ([19] (p. 32)). In spite of the controversies on invertibility, it seems to be beyond any argument the fact that the absence of invertibility in multiple tests may lead a decision-maker to be puzzled by senseless conclusions, such as the simultaneous rejections of both a hypothesis and its alternative. The following example illustrates this point.

Example 6. Suppose that $X \mid θ \sim \text{Normal}(θ, 1)$, and consider that the parameter space is $Θ = \{-3, 3\}$. Assume one wants to test the following null hypotheses:

$$H_{0}^{A} : θ = 3 \quad \text{and} \quad H_{0}^{B} : θ = -3.$$

The Neyman–Pearson tests for these hypotheses have the following critical regions, at the level 5%, respectively:

$$\{x \in \mathbb{R} : x < 1.35\} \quad \text{and} \quad \{x \in \mathbb{R} : x > -1.35\}.$$

Hence, if we observe $x = -0.5$, we reject both $H_{0}^{A}$ and $H_{0}^{B}$, even though $H_{0}^{A} \cup H_{0}^{B} = Θ$!
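The critical regions of Example 6 can be recovered numerically. The sketch below assumes the standard one-sided Neyman–Pearson construction for a normal mean with unit variance; the variable and function names are illustrative.

```python
from statistics import NormalDist

# Upper 5% point of the standard normal distribution (about 1.645).
z = NormalDist().inv_cdf(0.95)

upper_cut = 3.0 - z    # reject H0^A: theta = 3 when x < upper_cut
lower_cut = -3.0 + z   # reject H0^B: theta = -3 when x > lower_cut


def reject_a(x):
    return int(x < upper_cut)


def reject_b(x):
    return int(x > lower_cut)


both_rejected = reject_a(-0.5) == 1 and reject_b(-0.5) == 1
```

The cut-offs come out at roughly ±1.355, reported in the example as ±1.35, so every observation strictly between them leads to the rejection of both hypotheses.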

The testing schemes of Examples 2 and 4 satisfy invertibility. In Example 4, it is straightforward to verify this. In Example 2, it follows essentially from the equivalence $π_{x}(A) < 1/2 \Leftrightarrow π_{x}(A^{c}) > 1/2$. If $π_{x}(A) \neq 1/2$ for each sample x and for all $A \in σ(Θ)$, the unique TS generated by the 0–1 loss functions satisfies invertibility. Otherwise, there is a testing scheme generated by such losses that is still in line with this property. Indeed, for any $A \in σ(Θ)$ and $x_{0} \in \mathcal{X}$, such that $π_{x_{0}}(A) = 1/2$, the decision of rejecting A (not rejecting $A^{c}$) after observing $x_{0}$ has the same expected loss as the decision of not rejecting (rejecting) it. Thus, among all testing schemes generated by the 0–1 loss functions, which are all equivalent from a decision-theoretic point of view ([20] (p. 123)), a decision-maker can always choose a TS $φ'$, such that $φ'_{A^{c}}(x_{0}) = 1 - φ'_{A}(x_{0})$ for all $A \in σ(Θ)$, and $x_{0} \in \mathcal{X}$, such that $π_{x_{0}}(A) = 1/2$. Such a TS $φ'$ meets invertibility.

3.3. Consonance

... a test for ${(\cup_{i \in I} H_{i})}^{c}$ versus $(\cup_{i \in I} H_{i})$ may result in rejection which then indicates that at least one of the hypotheses $H_{i}$ , $i \in I$ , may be true.
—H. Finner and K. Strassburger [21]

The third property concerns two hypotheses, say A and B, and their union, $A \cup B$. It is motivated by the fact that in many cases, it seems reasonable that a testing scheme that retains the union of these hypotheses should also retain at least one of them. This idea is generalized in Definition 6.

Definition 6. (Union Consonance) A TS φ satisfies the finite (countable) union consonance if for every finite (countable) set of indices I,

$$\forall \{A_{i}\}_{i \in I} \subseteq σ(Θ), \quad φ_{\cup_{i \in I} A_{i}} \geq \min \{φ_{A_{i}}\}_{i \in I}.$$

In other words, if we retain the union of the hypotheses $\cup_{i \in I} A_{i}$, we should not reject at least one of the $A_{i}$’s.

There are several testing schemes that meet union consonance. For instance, TSs generated by point estimation procedures, TSs of Aitchison’s confidence-region tests [22] and FBST TSs (under quite general conditions; see [8]) satisfy both finite and countable union consonance.

Although union consonance may not be considered as appealing as coherence for simultaneous test procedures, it was hinted at in a few relevant works. For instance, the interpretation given by [21] on the final joint decisions derived from partial decisions implicitly suggests that union consonance is reasonable: they suggest one should consider $B := \cup_{A : φ_{A}(x) = 1} A$ to be the set of all parameter values rejected by the simultaneous procedure at hand when x is observed. Under this reading, it seems natural to expect that $φ_{B}(x) = 1$, which is exactly what the union consonance principle states. As a matter of fact, the general partitioning principle proposed by these authors satisfies union consonance. It should also be mentioned that union consonance, together with coherence, plays a key role in the possibilistic abstract belief calculus [23]. In addition, an evidence-based approach detailed in [24] satisfies both consonance and invertibility.

We end this section by stating a result derived from putting these logical requirements together.

Theorem 1. Let Θ be a countable parameter space and $σ(Θ) = \mathcal{P}(Θ)$. Let φ be a testing scheme defined on $σ(Θ)$. The TS φ satisfies coherence, invertibility and countable union consonance if, and only if, there is a point estimator $δ : \mathcal{X} \to Θ$, such that φ is generated by δ.

Theorem 1 is also valid for finite union consonance with the obvious adaptation.
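The "if" direction of Theorem 1 can be checked by brute force on a small finite parameter space, as sketched below. The three-point space and the function names are hypothetical. Because all three properties are stated pointwise in x, it suffices to check them for each possible value of the point estimate, and checking unions of two hypotheses is enough for finite union consonance by induction.

```python
from itertools import combinations

THETA = (0, 1, 2)  # hypothetical finite parameter space
FULL = frozenset(THETA)
HYPOTHESES = [
    frozenset(c) for r in range(len(THETA) + 1) for c in combinations(THETA, r)
]


def phi(hypothesis, estimate):
    """Test generated by a point estimate: reject iff the estimate is outside the hypothesis."""
    return int(estimate not in hypothesis)


def is_coherent(estimate):
    return all(
        phi(a, estimate) >= phi(b, estimate)
        for a in HYPOTHESES for b in HYPOTHESES if a <= b
    )


def is_invertible(estimate):
    return all(phi(FULL - a, estimate) == 1 - phi(a, estimate) for a in HYPOTHESES)


def is_union_consonant(estimate):
    return all(
        phi(a | b, estimate) >= min(phi(a, estimate), phi(b, estimate))
        for a in HYPOTHESES for b in HYPOTHESES
    )


all_hold = all(
    is_coherent(e) and is_invertible(e) and is_union_consonant(e) for e in THETA
)
```

This only illustrates one direction on one small space; the converse, that every TS with the three properties arises from some estimator, is the substantive part of the theorem and is not checked here.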

4. A Bayesian Look at Each Desideratum

In the previous section, we provided several examples of testing schemes satisfying some of the logical properties reviewed therein. In particular, a testing scheme generated by the family of 0–1 loss functions (Example 2) was shown to fulfill both coherence and invertibility. However, not all families of loss functions generate a TS meeting any of these requisites, as is shown in the examples below.

Example 7. Suppose that $X \mid θ \sim \text{Bernoulli}(θ)$ and that one is interested in testing the null hypotheses:

$$H_{0}^{A} : θ \leq 0.4 \quad \text{and} \quad H_{0}^{B} : θ \leq 0.5.$$

Furthermore, assume $θ \sim \text{Uniform}(0, 1)$ a priori and that one uses the loss functions from Table 2 to perform the tests.

Thus, Bayes tests for testing $H_{0}^{A}$ and $H_{0}^{B}$ are, respectively,

$$φ_{A}(x) = \mathbb{I}(\mathbb{P}(θ \leq 0.4 \mid x) \leq 1/7) \quad \text{and} \quad φ_{B}(x) = \mathbb{I}(\mathbb{P}(θ \leq 0.5 \mid x) \leq 1/3).$$

As $θ \mid x \sim \text{Beta}(2, 1)$ if $x = 1$ is observed, then $\mathbb{P}(θ \leq 0.4 \mid x) = 0.16$ and $\mathbb{P}(θ \leq 0.5 \mid x) = 0.25$, so that one does not reject $H_{0}^{A}$, but rejects $H_{0}^{B}$. Since $H_{0}^{A} \subset H_{0}^{B}$, we conclude that coherence does not hold.

**Table 2.** Loss functions for tests of Example 7.


	State of Nature
Decision	$θ \in H_{0}^{A}$	$θ \notin H_{0}^{A}$
0	0	1
1	6	0


	State of Nature
Decision	$θ \in H_{0}^{B}$	$θ \notin H_{0}^{B}$
0	0	1
1	2	0

Intuitively, incoherence takes place because the loss of falsely rejecting $H_{0}^{A}$ is three times as large as the loss of falsely rejecting $H_{0}^{B}$, while the corresponding errors of Type II are of the same magnitude. Hence, these loss functions reveal that the decision-maker is more reluctant to reject $H_{0}^{A}$ than to reject $H_{0}^{B}$, in such a way that only a little evidence is needed to accept $H_{0}^{A}$ (posterior probability greater than 1/7) when compared to the amount of evidence needed to accept $H_{0}^{B}$ (posterior probability greater than 1/3). Thus, it is not surprising at all that in this case, the tests do not cohere for some priors.
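The arithmetic of Example 7 can be replayed in a few lines. The sketch below uses the fact that the Beta(2, 1) posterior has CDF $t^{2}$; the function names are illustrative, and ties are resolved by not rejecting, which does not matter for these numbers.

```python
def posterior_cdf(t, x):
    """P(theta <= t | x) under a Uniform(0, 1) prior and one Bernoulli observation x."""
    # x = 1 gives a Beta(2, 1) posterior with CDF t^2;
    # x = 0 gives a Beta(1, 2) posterior with CDF 1 - (1 - t)^2.
    return t ** 2 if x == 1 else 1.0 - (1.0 - t) ** 2


def bayes_reject(prob_null, loss_false_reject, loss_false_accept):
    """Reject iff the expected loss of rejecting is below that of not rejecting."""
    return int(loss_false_reject * prob_null < loss_false_accept * (1.0 - prob_null))


prob_a = posterior_cdf(0.4, 1)  # about 0.16
prob_b = posterior_cdf(0.5, 1)  # 0.25

phi_a = bayes_reject(prob_a, 6.0, 1.0)  # losses for H0^A: threshold 1/7
phi_b = bayes_reject(prob_b, 2.0, 1.0)  # losses for H0^B: threshold 1/3

coherent_here = phi_a >= phi_b  # coherence requires phi_A >= phi_B since A is inside B
```

Here `phi_a` is 0 and `phi_b` is 1, so the larger hypothesis is rejected while the smaller one is retained.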

Example 8. In the setup of Example 7, suppose one also needs to test the null hypothesis $H_{0}^{B^{c}} : θ > 0.5$ by taking into account the loss function in Table 3.

The Bayes test for $H_{0}^{B^{c}}$ is then to reject it if $\mathbb{P}(θ > 0.5 \mid x) < 4/5$. For $x = 1$, $\mathbb{P}(θ > 0.5 \mid x) = 0.75$, and consequently, $H_{0}^{B^{c}}$ is rejected. As both $H_{0}^{B}$ and $H_{0}^{B^{c}}$ are rejected when $x = 1$ is observed, these tests do not satisfy invertibility.

**Table 3.** Loss function for Example 8.
	State of Nature
Decision	$θ \in H_{0}^{B^{c}}$	$θ \notin H_{0}^{B^{c}}$
0	0	4
1	1	0

The absence of invertibility is somewhat expected here, because the degree to which the decision-maker believes an incorrect decision of choosing $H_{0}^{B}$ to be more serious than an incorrect decision of choosing $H_{0}^{B^{c}}$ is not the same whether $H_{0}^{B}$ is regarded as the “null” or the “alternative” hypothesis. More precisely, while the decision-maker assigns a loss to the error of Type I that is double the one assigned to the error of Type II when testing the null hypothesis $H_{0}^{B}$, the same decision-maker evaluates the loss of falsely accepting $H_{0}^{B^{c}}$ to be four times (not twice!) as large as that of falsely rejecting it when $H_{0}^{B^{c}}$ is the null hypothesis.
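The failure of invertibility in Example 8 can be verified directly from the posterior probabilities and the losses of Tables 2 and 3. The helper name below is illustrative.

```python
def bayes_reject(prob_null, loss_false_reject, loss_false_accept):
    """Reject iff the expected loss of rejecting is below that of not rejecting."""
    return int(loss_false_reject * prob_null < loss_false_accept * (1.0 - prob_null))


# Posterior is Beta(2, 1) after observing x = 1, so P(theta <= 0.5 | x) = 0.5^2.
prob_b = 0.5 ** 2
prob_b_complement = 1.0 - prob_b

phi_b = bayes_reject(prob_b, 2.0, 1.0)                        # Table 2 losses for H0^B
phi_b_complement = bayes_reject(prob_b_complement, 1.0, 4.0)  # Table 3 losses for its complement

invertible_here = phi_b_complement == 1 - phi_b
```

Both decisions equal 1, so the hypothesis and its complement are rejected together.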

The examples we have examined so far give rise to the question: from a decision-theoretic perspective, what conditions must be imposed on a family of loss functions so that the resultant Bayesian testing scheme meets coherence (invertibility)? Next, we offer a solution to this question. We first give a definition in order to simplify the statement of the main results of this section.

Definition 7. (Relative loss) Let $L_{A}$ be a loss function for testing the hypothesis $A \in σ(Θ)$. The function $Δ_{A} : Θ \to \mathbb{R}$ defined by:

$$Δ_{A}(θ) = \begin{cases} L_{A}(1, θ) - L_{A}(0, θ), & \text{if } θ \in A \\ L_{A}(0, θ) - L_{A}(1, θ), & \text{if } θ \notin A \end{cases}$$

is named the relative loss of $L_{A}$ for testing A.

In short, the relative loss measures the difference between losses of taking the wrong and the correct decisions. Thus, the relative loss of any hypothesis testing loss function is always non-negative.
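A short worked derivation, not stated in this form in the text, shows why the relative loss is the natural quantity here: the Bayes test depends on the loss function only through it. The constants a and b in the last line are illustrative symbols for the special case of a relative loss that is constant on A and on its complement.

```latex
% Split the difference of posterior expected losses over A and A^c
% and apply the definition of the relative loss on each piece.
\begin{align*}
E[L_A(1,\theta) \mid x] - E[L_A(0,\theta) \mid x]
  &= \int_{A} \bigl(L_A(1,\theta) - L_A(0,\theta)\bigr)\, d\pi_x(\theta)
   + \int_{A^c} \bigl(L_A(1,\theta) - L_A(0,\theta)\bigr)\, d\pi_x(\theta) \\
  &= \int_{A} \Delta_A(\theta)\, d\pi_x(\theta)
   - \int_{A^c} \Delta_A(\theta)\, d\pi_x(\theta).
\end{align*}
% Hence the Bayes test rejects A exactly when
\[
\varphi_A(x) = 1 \iff
\int_{A} \Delta_A(\theta)\, d\pi_x(\theta) < \int_{A^c} \Delta_A(\theta)\, d\pi_x(\theta).
\]
% Special case: \Delta_A = a > 0 on A and \Delta_A = b > 0 on A^c gives
\[
\varphi_A(x) = 1 \iff \pi_x(A) < \frac{b}{a + b}.
\]
```

With a = 6 and b = 1 this gives the threshold 1/7 of Example 7, and with a = 2 and b = 1 the threshold 1/3.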

A careful examination of Example 7 hints that in order to obtain coherent tests, the “larger” (the “smaller”) the null hypothesis of interest is, the more cautious about falsely rejecting (accepting) it the decision-maker ought to be. This can be quantified as follows: for hypotheses A and B, such that $A \subseteq B$ and with corresponding hypothesis testing loss functions $L_{A}$ and $L_{B}$, if $θ_{1} \in A$, then $Δ_{B}(θ_{1})$ should be at least as large as $Δ_{A}(θ_{1})$. Similarly, if $θ_{2} \in B^{c}$, then $Δ_{B}(θ_{2})$ should be at most $Δ_{A}(θ_{2})$. Such conditions are also appealing, since it seems reasonable that greater relative losses should be assigned to greater “distances” between the parameter and the wrong decision. For instance, if $θ \in A$ (and consequently, $θ \in B$), the rougher error of rejecting B should be penalized more heavily than the error of rejecting A; Figure 1 illustrates this idea.

Figure 1. Interpretation of sensible relative losses: rougher errors of decisions should be assigned larger relative losses.

These conditions, namely:

$$Δ_{A}(θ_{1}) \leq Δ_{B}(θ_{1}), \ \forall θ_{1} \in A \quad \text{and} \quad Δ_{A}(θ_{2}) \geq Δ_{B}(θ_{2}), \ \forall θ_{2} \in B^{c},$$

are sufficient for coherence. As a matter of fact, Theorem 2 states that the weaker condition:

$$Δ_{A}(θ_{1}) Δ_{B}(θ_{2}) \leq Δ_{A}(θ_{2}) Δ_{B}(θ_{1}), \quad \forall θ_{1} \in A, \ \forall θ_{2} \in B^{c},$$

is necessary and sufficient for a family of hypothesis testing loss functions to induce a coherent testing scheme with respect to each prior distribution for θ. Henceforward, we assume that $E(L_{A}(d, θ) \mid x) < \infty$, for all $A \in σ(Θ)$, $d \in \{0, 1\}$ and $x \in \mathcal{X}$.

Theorem 2. Let $\{(L_{A})\}_{A \in σ(Θ)}$ be a family of hypothesis testing loss functions. Suppose that for all $θ_{1}, θ_{2} \in Θ$, there is $x \in \mathcal{X}$, such that $L_{x}(θ_{1}), L_{x}(θ_{2}) > 0$. Then, for all prior distributions π for θ, there exists a testing scheme generated by $\{(L_{A})\}_{A \in σ(Θ)}$ with respect to π that is coherent if, and only if, $\{(L_{A})\}_{A \in σ(Θ)}$ is such that for all $A, B \in σ(Θ)$ with $A \subseteq B$:

$$Δ_{A}(θ_{1}) Δ_{B}(θ_{2}) \leq Δ_{A}(θ_{2}) Δ_{B}(θ_{1}), \quad \forall θ_{1} \in A, \ \forall θ_{2} \in B^{c}. \tag{2}$$

Notice that the “if” part of Theorem 2 still holds for families of hypothesis testing loss functions that depend also on the sample. Theorem 2 characterizes, under certain conditions, all families of loss functions that induce coherent tests, no matter what the decision-maker’s opinion (prior) on the unknown parameter is. Although the result of Theorem 2 is not properly normative, any Bayesian decision-maker can make use of it to prevent himself from drawing incoherent conclusions from multiple hypothesis testing by checking whether his personal losses satisfy the condition in Equation (2).
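A decision-maker with finitely many parameter values could check the condition in Equation (2) mechanically. The sketch below is one hypothetical way to do so: hypotheses are frozensets, each loss is a function `loss(d, theta)`, and the two families checked are a counting-measure loss of the form "relative loss equals the measure of the hypothesis" and a two-hypothesis analogue of the constant losses of Example 7. All names and the four-point parameter space are assumptions of this sketch.

```python
from itertools import combinations


def relative_loss(loss, hypothesis, theta):
    """Relative loss of Definition 7 for a loss function loss(d, theta)."""
    if theta in hypothesis:
        return loss(1, theta) - loss(0, theta)
    return loss(0, theta) - loss(1, theta)


def satisfies_condition_2(losses, theta_space):
    """Check Equation (2) for every nested pair A <= B in the family."""
    for hyp_a, loss_a in losses.items():
        for hyp_b, loss_b in losses.items():
            if not hyp_a <= hyp_b:
                continue
            for t1 in hyp_a:
                for t2 in theta_space - hyp_b:
                    lhs = relative_loss(loss_a, hyp_a, t1) * relative_loss(loss_b, hyp_b, t2)
                    rhs = relative_loss(loss_a, hyp_a, t2) * relative_loss(loss_b, hyp_b, t1)
                    if lhs > rhs:
                        return False
    return True


def measure_loss(hypothesis, theta_space):
    """Loss with relative loss |A| on A and |A^c| off A (counting measure)."""
    def loss(d, theta):
        if d == 1:
            return float(len(hypothesis)) if theta in hypothesis else 0.0
        return 0.0 if theta in hypothesis else float(len(theta_space - hypothesis))
    return loss


def constant_loss(hypothesis, false_reject, false_accept):
    """Loss with constant penalties for false rejection and false acceptance."""
    def loss(d, theta):
        if d == 1:
            return false_reject if theta in hypothesis else 0.0
        return 0.0 if theta in hypothesis else false_accept
    return loss


theta_space = frozenset(range(4))
subsets = [frozenset(c) for r in range(5) for c in combinations(sorted(theta_space), r)]

measure_family = {a: measure_loss(a, theta_space) for a in subsets}
small, large = frozenset({0}), frozenset({0, 1})
constant_family = {
    small: constant_loss(small, 6.0, 1.0),
    large: constant_loss(large, 2.0, 1.0),
}

measure_ok = satisfies_condition_2(measure_family, theta_space)
constant_ok = satisfies_condition_2(constant_family, theta_space)
```

The measure-based family passes the check, while the constant losses fail it because 6 × 1 exceeds 1 × 2, which mirrors the incoherence seen in Example 7.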

Many simple families of loss functions generate coherent tests, as we illustrate in Examples 9 and 10.

Example 9. Consider, for each $A \in σ(Θ)$, the loss function $L_{A}$ in Table 4 to test the null hypothesis A, in which $λ : σ(Θ) \to \mathbb{R}_{+}$ is any finite measure, such that $λ(Θ) > 0$. This family of loss functions satisfies the condition in Equation (2) for coherence as for all $A, B \in σ(Θ)$, such that $A \subseteq B$, and for all $θ_{1} \in A$ and $θ_{2} \in B^{c}$, $Δ_{A}(θ_{1}) = λ(A)$, $Δ_{B}(θ_{2}) = λ(B^{c})$, $Δ_{A}(θ_{2}) = λ(A^{c})$ and $Δ_{B}(θ_{1}) = λ(B)$. Equation (2) then reads $λ(A) λ(B^{c}) \leq λ(A^{c}) λ(B)$, which holds because $λ(A) \leq λ(B)$ and $λ(B^{c}) \leq λ(A^{c})$ by the monotonicity of measures.

**Table 4.** Loss function $L_{A}$ for testing A.
	State of Nature
Decision	$θ \in A$	$θ \notin A$
0	0	$λ (A^{c})$
1	$λ (A)$	0

As a matter of fact, if for each

A \in σ (Θ)

,

L_{A}

is a

0 - 1 - c_{A}

loss function ([5] (p. 215)), with

0 < c_{A} \leq c_{B}

if

A \subseteq B

, then the family

{(L_{A})}_{A \in σ (Θ)}

will induce a coherent TS for each prior for θ.
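The condition in Equation (2) can be checked mechanically on a small finite parameter space. The sketch below does this for the losses of Table 4. It rests on assumptions that are not stated in this section: Δ_A(θ) is read as the absolute difference |L_A(0, θ) − L_A(1, θ)|, λ is taken to be the counting measure, and the names `powerset`, `delta_table4` and `satisfies_coherence_condition` are purely illustrative.

```python
from itertools import chain, combinations


def powerset(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = list(theta)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def delta_table4(A, t, theta):
    """Relative loss for Table 4 with the counting measure (assumption):
    lambda(A) if t lies in A, lambda(A^c) otherwise."""
    return len(A) if t in A else len(theta) - len(A)


def satisfies_coherence_condition(delta, theta):
    """Check Equation (2) for every pair A, B with A a subset of B."""
    theta = frozenset(theta)
    for A in powerset(theta):
        for B in powerset(theta):
            if not A <= B:
                continue
            for t1 in A:
                for t2 in theta - B:
                    lhs = delta(A, t1, theta) * delta(B, t2, theta)
                    rhs = delta(A, t2, theta) * delta(B, t1, theta)
                    if lhs > rhs:
                        return False
    return True


THETA = frozenset(range(4))  # hypothetical four-point parameter space
print(satisfies_coherence_condition(delta_table4, THETA))
```

The same checker accepts any other relative-loss function with the signature `delta(A, t, theta)`, so a decision-maker's own losses could be screened in the same way on a finite Θ.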

Example 10. Assume Θ is equipped with a distance, say d. Define, for each

A \in σ (Θ)

the loss function

L_{A}

for testing A by:

L_{A} (0, θ) = d^{*} (θ, A) and L_{A} (1, θ) = d^{*} (θ, A^{c}),

where

d^{*} (θ, A) = {inf}_{a \in A} d (θ, a)

is the distance between

θ \in Θ

and A. For

A, B \in σ (Θ)

, such that

A \subseteq B

, and for

θ_{1} \in A

and

θ_{2} \in B^{c}

,

Δ_{A} (θ_{1}) = d^{*} (θ_{1}, A^{c})

,

Δ_{B} (θ_{2}) = d^{*} (θ_{2}, B)

,

Δ_{A} (θ_{2}) = d^{*} (θ_{2}, A)

and

Δ_{B} (θ_{1}) = d^{*} (θ_{1}, B^{c})

. These values satisfy Equation (2) from Theorem 2. Hence, families of loss functions based on distances as the above generate Bayesian coherent tests.
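A small numerical check of Example 10 is sketched below for a finite set of real numbers with the absolute-value distance. The choice of distance, the parameter points and all function names are assumptions made for illustration only; Δ_A(θ) is again read as |L_A(0, θ) − L_A(1, θ)|.

```python
from itertools import chain, combinations


def powerset(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = list(theta)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def dist_to_set(t, S):
    """d*(t, S): smallest absolute-value distance from t to a nonempty set S."""
    return min(abs(t - s) for s in S)


def delta_distance(A, t, theta):
    """Relative loss for the distance-based losses of Example 10.
    If t is in A, accepting costs 0 and rejecting costs d*(t, A^c);
    if t is outside A, rejecting costs 0 and accepting costs d*(t, A)."""
    return dist_to_set(t, theta - A) if t in A else dist_to_set(t, A)


def example10_is_coherent(theta):
    """Check Equation (2) for all nested pairs A, B over a finite theta."""
    theta = frozenset(theta)
    for A in powerset(theta):
        for B in powerset(theta):
            if not A <= B:
                continue
            # The loops are empty unless A is nonempty and B is a proper
            # subset of theta, so every distance below is well defined.
            for t1 in A:
                for t2 in theta - B:
                    lhs = delta_distance(A, t1, theta) * delta_distance(B, t2, theta)
                    rhs = delta_distance(A, t2, theta) * delta_distance(B, t1, theta)
                    if lhs > rhs:
                        return False
    return True


print(example10_is_coherent([0, 1, 3, 7, 8]))
```

The check succeeds because enlarging a set can only shrink the distance to it: d*(θ₂, B) ≤ d*(θ₂, A) and d*(θ₁, A^c) ≤ d*(θ₁, B^c) whenever A ⊆ B.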

Next, we characterize Bayesian tests with respect to invertibility. In order to obtain TSs that meet invertibility, it seems reasonable that when the null and alternative hypotheses are switched, the relative losses ought to remain the same. That is to say, when testing the null hypothesis A, the relative loss at each point

θ \in Θ

,

Δ_{A} (θ)

, should be equal to the relative loss

Δ_{A^{c}} (θ)

when

A^{c}

is the null hypothesis instead. This condition is sufficient, but not necessary for a family of loss functions to induce tests fulfilling this logical requisite with respect to all prior distributions. In Theorem 3, however, we provide necessary and sufficient conditions for invertibility.

Theorem 3. Let

{(L_{A})}_{A \in σ (Θ)}

be a family of hypothesis testing loss functions. Suppose that for all

θ_{1}, θ_{2} \in Θ

, there is

x \in X

, such that

L_{x} (θ_{1}), L_{x} (θ_{2}) > 0

. Then, for all prior distributions π for θ, there exists a testing scheme generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to π that satisfies invertibility if, and only if,

{(L_{A})}_{A \in σ (Θ)}

is such that for all

A \in σ (Θ)

:

\begin{matrix} Δ_{A} (θ_{1}) Δ_{A^{c}} (θ_{2}) = Δ_{A^{c}} (θ_{1}) Δ_{A} (θ_{2}), \forall θ_{1} \in A, \forall θ_{2} \in A^{c} . \end{matrix}

(3)

The condition in Equation (3) is equivalent (for strict hypothesis testing loss functions) to requiring, for each

A \in σ (Θ)

, that the function

\frac{Δ_{A} (.)}{Δ_{A^{c}} (.)}

be constant over Θ. We should mention that the “if” part of Theorem 3 still holds for hypothesis testing loss functions satisfying Equation (3) that also depend on the sample x.

The families of loss functions introduced in Examples 9 and 10 satisfy Equation (3). Thus, such families of losses ensure the construction of simultaneous Bayes tests that conform to both coherence and invertibility for all prior distributions on

σ (Θ)

. Hence, if one believes these two logical requirements to be of primary importance in multiple hypothesis testing, one can use any of these families of loss functions to perform the tests. Other simple loss functions also lead to TSs that meet invertibility: for instance, any family of 0–1–c loss functions for which

c_{A^{c}} = 1 / c_{A}

for all

A \in σ (Θ)

leads to invertible TSs.
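Equation (3) can be screened on a finite parameter space in the same spirit. The sketch below tests the Table 4 losses and a 0–1–c family. Several ingredients are assumptions for illustration: λ is the counting measure, c_A is read as the loss of falsely rejecting A (with loss 1 for false acceptance), the particular choice c_A = 2^(|A| − |A^c|) is hypothetical, and Δ_A(θ) is read as |L_A(0, θ) − L_A(1, θ)|.

```python
from fractions import Fraction
from itertools import chain, combinations

THETA = frozenset(range(4))  # hypothetical finite parameter space


def powerset(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = list(theta)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def delta_table4(A, t):
    """Table 4 with the counting measure: lambda(A) on A, lambda(A^c) off A."""
    return len(A) if t in A else len(THETA) - len(A)


def delta_01c(A, t):
    """0-1-c loss with c_A = 2**(|A| - |A^c|), so that c_{A^c} = 1 / c_A."""
    c_A = Fraction(2) ** (2 * len(A) - len(THETA))
    return c_A if t in A else Fraction(1)


def delta_constant_c(A, t):
    """0-1-c loss with the same c = 2 for every hypothesis (violates c_{A^c} = 1 / c_A)."""
    return Fraction(2) if t in A else Fraction(1)


def satisfies_invertibility_condition(delta):
    """Check Equation (3) for every A and every theta_1 in A, theta_2 in A^c."""
    for A in powerset(THETA):
        Ac = THETA - A
        for t1 in A:
            for t2 in Ac:
                if delta(A, t1) * delta(Ac, t2) != delta(Ac, t1) * delta(A, t2):
                    return False
    return True


for name, delta in [("Table 4", delta_table4),
                    ("0-1-c, c_{A^c} = 1/c_A", delta_01c),
                    ("0-1-c, constant c = 2", delta_constant_c)]:
    print(name, satisfies_invertibility_condition(delta))
```

Exact rational arithmetic (`Fraction`) is used so that the equality in Equation (3) is not blurred by floating-point rounding.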

We end this section by examining union consonance under a decision-theoretic point of view. From Definition 6, it appears that a necessary condition for the derivation of consonant tests is that “smaller” (“larger”) null hypotheses ought to be assigned greater losses for false rejection (acceptance). More precisely, for

A, B \in σ (Θ)

, if

θ_{1} \in A \cup B

, then it seems that either

Δ_{A \cup B} (θ_{1}) \leq Δ_{A} (θ_{1})

or

Δ_{A \cup B} (θ_{1}) \leq Δ_{B} (θ_{1})

should hold. If

θ_{2} \in {(A \cup B)}^{c}

, then it is reasonable that either

Δ_{A \cup B} (θ_{2}) \geq Δ_{A} (θ_{2})

or

Δ_{A \cup B} (θ_{2}) \geq Δ_{B} (θ_{2})

. The next theorem shows that this is nearly the case. However, it is still unknown whether sufficient conditions for union consonance are determinable.

Theorem 4. Let

{(L_{A})}_{A \in σ (Θ)}

be a family of hypothesis testing loss functions. Suppose that for all

θ_{1}, θ_{2} \in Θ

, there is

x \in X

, such that

L_{x} (θ_{1}), L_{x} (θ_{2}) > 0

. If, for every prior distribution π for θ, there exists a testing scheme generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to π that satisfies finite union consonance, then

{(L_{A})}_{A \in σ (Θ)}

is such that for all

A, B \in σ (Θ)

and for all

θ_{1} \in A \cup B, θ_{2} \in {(A \cup B)}^{c}

,

\begin{matrix} either Δ_{A \cup B} (θ_{1}) Δ_{A} (θ_{2}) \leq Δ_{A \cup B} (θ_{2}) Δ_{A} (θ_{1}) or Δ_{A \cup B} (θ_{1}) Δ_{B} (θ_{2}) \leq Δ_{A \cup B} (θ_{2}) Δ_{B} (θ_{1}) . \end{matrix}
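The necessary condition of Theorem 4 can also be evaluated by brute force on a finite Θ. The sketch below applies it to the Table 4 losses with the counting measure on a three-point space; the measure, the parameter space and the function names are illustrative assumptions, and Δ_A(θ) is read as |L_A(0, θ) − L_A(1, θ)|.

```python
from itertools import chain, combinations

THETA = frozenset(range(3))  # hypothetical three-point parameter space


def powerset(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = list(theta)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def delta_table4(A, t):
    """Table 4 with the counting measure: lambda(A) on A, lambda(A^c) off A."""
    return len(A) if t in A else len(THETA) - len(A)


def satisfies_union_condition(delta):
    """Check the 'either ... or ...' inequality of Theorem 4 for all A, B."""
    for A in powerset(THETA):
        for B in powerset(THETA):
            U = A | B
            for t1 in U:
                for t2 in THETA - U:
                    ok_A = delta(U, t1) * delta(A, t2) <= delta(U, t2) * delta(A, t1)
                    ok_B = delta(U, t1) * delta(B, t2) <= delta(U, t2) * delta(B, t1)
                    if not (ok_A or ok_B):
                        return False
    return True


print(satisfies_union_condition(delta_table4))
```

For A = {0}, B = {1}, θ₁ = 0 and θ₂ = 2, both inequalities fail (4 ≤ 1 and 4 ≤ 2), so this family cannot yield finite union consonance for every prior. Because the condition is only necessary, passing the check would not by itself establish union consonance.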

5. Putting the Desiderata Together

In Section 4, we showed that there are infinitely many families of loss functions that induce, for each prior distribution for θ, a TS that satisfies both coherence and invertibility (Examples 9 and 10). However, requiring the three logical consistency properties we presented to hold simultaneously with respect to all priors is too restrictive: under mild conditions, no TS constructed under a Bayesian decision-theoretic approach to hypothesis testing fulfills this, as stated in the next theorem.

Theorem 5. Assume that Θ and

σ (Θ)

are such that

| Θ | \geq 3

and that there is a partition of Θ composed of three nonempty measurable sets. Assume also that for every triplet

θ_{1}, θ_{2}, θ_{3} \in Θ

, there is

x \in X

, such that

L_{x} (θ_{i}) > 0

for

i = 1, 2, 3

. Then, there is no family of strict hypothesis testing loss functions that induces, for each prior distribution for θ, a testing scheme satisfying coherence, invertibility and finite union consonance.
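A concrete instance of the conflict can be computed directly. The sketch below is not a proof of Theorem 5; it only illustrates one failure for one family, namely the Table 4 losses with the counting measure on a three-point Θ. It works with a posterior distribution directly (the uniform one), and the tie-breaking rule toward acceptance is an assumption that plays no role here because no ties occur.

```python
from fractions import Fraction

THETA = (0, 1, 2)                                 # hypothetical parameter space
POSTERIOR = {t: Fraction(1, 3) for t in THETA}    # uniform posterior (assumption)


def bayes_test(A):
    """Return 0 (accept A) or 1 (reject A) by comparing posterior risks
    under the Table 4 losses with the counting measure."""
    A = frozenset(A)
    post_A = sum(POSTERIOR[t] for t in A)
    # Accepting A costs lambda(A^c) when theta is outside A.
    risk_accept = (1 - post_A) * (len(THETA) - len(A))
    # Rejecting A costs lambda(A) when theta is inside A.
    risk_reject = post_A * len(A)
    return 0 if risk_accept <= risk_reject else 1


print("phi_{0}   =", bayes_test({0}))
print("phi_{1}   =", bayes_test({1}))
print("phi_{0,1} =", bayes_test({0, 1}))
```

Here the posterior risks for {0} are 4/3 (accept) against 1/3 (reject), while for {0, 1} they are 1/3 (accept) against 4/3 (reject). Both singletons are rejected while their union is accepted, so finite union consonance fails even though this family meets the coherence and invertibility conditions.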

Theorem 5 states that Bayesian optimality (based on standard loss functions that do not depend on the sample) cannot be combined with complete logical consistency. This fact can lead one to wonder whether such properties are indeed sensible in multiple hypothesis testing. The following result shows us that the desiderata are in fact reasonable in the sense that a TS meeting these requirements does correspond to the optimal tests of some Bayesian decision-makers. We return to this point in the concluding remarks.

Theorem 6. Let Θ be a countable (finite) parameter space,

σ (Θ) = P (Θ)

, and

X

be a countable sample space. Let φ be a testing scheme that satisfies coherence, invertibility and countable (finite) union consonance. Then, there exist a probability measure μ over

P (Θ \times X)

and a family of strict hypothesis testing loss functions

{(L_{A})}_{A \in σ (Θ)}

, such that φ is generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to the μ-marginal distribution of θ.

We end this section by associating logically-consistent Bayesian hypothesis testing with Bayes point estimation procedures in case both Θ and

X

are finite. This relationship is characterized in Theorem 7.

Theorem 7. Let Θ and

X

be finite sets and

σ (Θ) = P (Θ)

. Let φ be the testing scheme generated by the point estimator

δ : X \to Θ

. Suppose that for all

x \in X

,

L_{x} (δ (x)) > 0

.

(a): If there exist a probability measure $π : σ (Θ) \to [0, 1]$ for θ, with $π (δ (x)) > 0$ for all $x \in X$ , and a loss function $L : Θ \times Θ \to R_{+}$ , satisfying $L (θ, θ) = 0$ and $L (d, θ) > 0$ for $d \neq θ$ , such that δ is a Bayes estimator for θ generated by L with respect to π, then there is a family of hypothesis testing loss functions ${(L_{A})}_{A \in σ (Θ)}$ , $L_{A} : {0, 1} \times (Θ \times X) \to R_{+}$ for each $A \in σ (Θ)$ , such that φ is generated by ${(L_{A})}_{A \in σ (Θ)}$ with respect to π.
(b): If there exist a probability measure $π : σ (Θ) \to [0, 1]$ for θ, with $π (δ (x)) > 0$ for all $x \in X$ , and a family of strict hypothesis testing loss functions ${(L_{A})}_{A \in σ (Θ)}$ , $L_{A} : {0, 1} \times Θ \to R_{+}$ for each $A \in σ (Θ)$ , such that φ is generated by ${(L_{A})}_{A \in σ (Θ)}$ with respect to π, then there is a loss function $L : Θ \times Θ \to R_{+}$ , with $L (θ, θ) = 0$ and $L (d, θ) > 0$ for $d \neq θ$ , such that δ is a Bayes estimator for θ generated by L with respect to π.

Theorem 7 ensures that multiple Bayesian tests that fulfill the desiderata cannot be separated from Bayes point estimation procedures. One may find in Theorem 7, Part (a), a decision-theoretic justification for performing simultaneous tests by means of a Bayes point estimator. However, the optimality of such tests is derived under very restrictive conditions, as the underlying loss functions depend both on the sample and on a point estimator. This fact reinforces that one can reconcile statistical optimality and logical consistency in multiple tests only in very particular cases. We should also emphasize that, under the conditions of Part (a), if, in addition,

π (θ) > 0

for all

θ \in Θ

, then, for all

A \in σ (Θ)

,

φ_{A}

is an admissible test for A with regard to

L_{A}

(the standard proof of this result developed for losses that do not depend on the sample also works here). The second part of Theorem 7 states that if a Bayesian testing scheme meets coherence, invertibility and finite union consonance, then the point estimator that generates it cannot be devoid of optimality: it must be a Bayes estimator for specific loss functions. Example 11 illustrates the first part of this theorem.

Example 11. Assume that

Θ = {θ_{1}, θ_{2}, \dots, θ_{k}}

and

X

is finite. Assume also that there is a maximum likelihood estimator (MLE) for θ,

δ_{M L} : X \to Θ

, such that

L_{x} (δ_{M L} (x)) > 0

, for all

x \in X

. Then, the testing scheme generated by

δ_{M L}

is a TS of Bayes tests. Indeed, when Θ is finite, an MLE for θ is a Bayes estimator generated by the loss function

L (d, θ) = I (d \neq θ)

,

d, θ \in Θ

, with respect to the uniform prior over Θ (that is,

δ_{M L} (x)

corresponds to a mode of the posterior distribution

π_{x}

, for each

x \in X

). Consequently (recall that

| Θ | = k

),

π_{x} (δ_{M L} (x)) \geq 1 / k

and

E [L (δ_{M L} (x), θ) | x] = 1 - π_{x} (δ_{M L} (x))

, for each

x \in X

. Thus,

max_{x \in X} \frac{E [L (δ_{M L} (x), θ) | x]}{π_{x} (δ_{M L} (x))} = max_{x \in X} \frac{1 - π_{x} (δ_{M L} (x))}{π_{x} (δ_{M L} (x))} \leq \frac{1 - \frac{1}{k}}{\frac{1}{k}} = k - 1,

as

g : (0, 1] \to R_{+}

given by

g (t) = (1 - t) / t

is strictly decreasing.

By Theorem 7, it follows that the TS generated by the MLE

δ_{M L}

is a Bayesian TS generated by (for instance) the family of loss functions

{(L_{A})}_{A \in σ (Θ)}

given, for each

A \in σ (Θ)

, by

L_{A} (1, (θ, x)) = 0

and

L_{A} (0, (θ, x)) = 1

, for

θ \in A^{c}

, and

L_{A} (0, (θ, x)) = 0

and

L_{A} (1, (θ, x)) = k I_{A} (δ_{M L} (x)) + (1 / k) I_{A^{c}} (δ_{M L} (x))

, for

θ \in A

.
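The claim of Example 11 can be verified numerically on a toy model. In the sketch below, the likelihood table `LIK`, the three-point parameter space and all function names are hypothetical; the prior is uniform and the losses are the sample-dependent ones given above. The check confirms that, for every x and every A, the decision "reject A if and only if the MLE falls outside A" has posterior risk no larger than the alternative decision.

```python
from fractions import Fraction
from itertools import chain, combinations

K = 3  # |Theta|; parameter points are indexed 0, 1, 2

# Hypothetical likelihoods: LIK[x][j] = L_x(theta_j); each column sums to 1.
LIK = {
    "x1": [Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)],
    "x2": [Fraction(3, 10), Fraction(4, 10), Fraction(3, 10)],
    "x3": [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10)],
}


def subsets_of_theta():
    idx = range(K)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(idx, r) for r in range(K + 1))]


def mle(x):
    """Index of a maximizer of the likelihood at x."""
    return max(range(K), key=lambda j: LIK[x][j])


def posterior(x):
    """Posterior under the uniform prior: the normalized likelihood row."""
    total = sum(LIK[x])
    return [v / total for v in LIK[x]]


def posterior_risks(A, x):
    """(risk of accepting A, risk of rejecting A) under the Example 11 losses."""
    post_A = sum(posterior(x)[j] for j in A)
    cost_reject = Fraction(K) if mle(x) in A else Fraction(1, K)
    return 1 - post_A, post_A * cost_reject


def mle_scheme_is_bayes():
    for x in LIK:
        for A in subsets_of_theta():
            accept, reject = posterior_risks(A, x)
            decision = 0 if mle(x) in A else 1
            chosen, other = (accept, reject) if decision == 0 else (reject, accept)
            if chosen > other:
                return False
    return True


print(mle_scheme_is_bayes())
```

The argument behind the check is the bound π_x(δ_ML(x)) ≥ 1/k: it makes rejection cost at least 1 when the MLE lies in A, and at most (1/k) ≤ π_x(A^c) when it does not.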

It is worth mentioning that the development of Theorem 7(a) and Example 11 is in a sense related to the optimality of least relative surprise estimators under prior-based loss functions [24] (Section 2).

6. Conclusions

While several studies on frequentist multiple tests seek a balance between statistical optimality and logical consistency, this issue has not yet been addressed from a decision-theoretic standpoint. For this reason, in this work, we examine simultaneous Bayesian hypothesis testing with respect to three rational properties: coherence, invertibility and union consonance. Briefly, we characterize the families of loss functions that yield Bayes tests meeting each of these requisites separately, whatever the prior distribution for the relevant parameter is. These results not only shed some light on when each of these relationships may be considered sensible for a given scientific problem, but they also serve as a guide for a Bayesian decision-maker who aims to perform tests in line with the requirement he finds most important. In particular, this can be done by using the loss functions described in the paper.

We also explore how far one can go by putting these properties together. We provide examples of fairly intuitive loss functions that induce testing schemes satisfying both coherence and invertibility, no matter what one’s prior opinion on the parameter is. On the other hand, we prove that no family of reasonable loss functions generates Bayes tests that respect the logical properties as a whole with respect to all priors, although any testing scheme meeting the desiderata corresponds to the optimal tests of several Bayesian decision-makers.

Finally, we discuss the relationship between logically-consistent Bayesian hypothesis testing and Bayes point estimation procedures when both the parameter space and the sample space are finite. We conclude that the point estimator generating a testing scheme that fulfills the rational properties is necessarily a Bayes estimator for certain loss functions. Furthermore, performing logically-consistent procedures by means of a Bayes estimator is one’s best approach towards multiple hypothesis testing only under very restrictive conditions in which the underlying loss functions depend not only on the decision to be made and the parameter as usual, but also on the observed sample. See [24,25,26] for some examples of such loss functions. That is, a more complex framework is needed to combine Bayesian optimality with logical consistency. This fact and the impossibility result of Theorem 5 corroborate the thesis that full rationality and statistical optimality can rarely be combined in simultaneous tests. In practice, this suggests that when testing several hypotheses at once, a practitioner may partly abandon the desiderata so as to preserve statistical optimality. This is further discussed in [8].

Several issues remain open, among which we mention three. First, the extent to which the results derived in this work can be generalized to infinite (continuous) parameter spaces is an important problem from both theoretical and practical aspects. Furthermore, the consideration of different decision-theoretic approaches to hypothesis testing, such as the “agnostic” tests with three-fold action spaces proposed by [27], may bring new insight into which logical properties may be expected, not only in the current, but also in alternative frameworks. In epistemological terms, one may be concerned with the question of whether multiple hypothesis testing is the most adequate way to draw inferences about a parameter of interest from data given the incompatibility between full logical consistency and the achievement of statistical optimality. As a matter of fact, many Bayesians regard the whole posterior distribution as the most complete inference one can make about the unknown parameter. These analyses may contribute to better decision-making.

Acknowledgments

The authors are thankful to Carlos Alberto de Bragança Pereira, José Carlos Simon de Miranda, José Galvão Leite, Julio Michael Stern, Marcelo Esteban Coniglio, Márcio Alves Diniz and Paulo Cilas Marques Filho for fruitful discussions and important comments and suggestions, which improved the manuscript. We are also grateful to the referees for all of the detailed comments that helped improve the paper. This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (2009/03385-5, 2014/25302-2) Brazil and Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico (131982/2009-5) Brazil.

Author Contributions

The manuscript has come to fruition by the substantial contributions of all authors from conceiving the idea of examining Bayes tests with respect to logical consistency to obtaining the main theorems and providing several examples. All authors have also been involved in either writing the article or carefully revising it. All authors have read and approved the submitted version of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix

A. Proof of Theorem 1

That a testing scheme generated by a point estimation procedure δ satisfies the desiderata follows from Theorem 4.3 from [8] and the fact that for all

x \in X

and every countable partition

{(A_{n})}_{n \geq 1}

of Θ, there is a unique

i^{*} \in N^{*}

, such that

δ (x) \in A_{i^{*}}

and, consequently,

\sum_{i = 1}^{\infty} [1 - I (δ (x) \notin A_{i})] = 1

. For the converse, Theorem 4.3 from [8] implies that

\forall x \in X

,

\exists! θ_{0} = θ_{0} (x) \in Θ

, such that

φ_{{θ_{0}}} (x) = 0

. Thus, for

A \in σ (Θ)

,

θ_{0} \in A \Rightarrow {θ_{0} (x)} \subseteq A

and, as coherence holds,

φ_{A} (x) = 0

. On the other hand,

θ_{0} \notin A \Rightarrow {θ_{0} (x)} \subseteq A^{c}

. Coherence and invertibility yield

φ_{A} (x) = 1

. Hence, for each

A \in σ (Θ)

,

φ_{A} (x) = 1 \Leftrightarrow θ_{0} (x) \notin A

. We conclude the proof by defining

δ : X \to Θ

by

δ (x) = θ_{0} (x)

.
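The easy direction of this proof can be checked exhaustively on a finite Θ. The sketch below defines the test generated by a point estimate as "reject A if and only if the estimate lies outside A" and verifies coherence (φ_B ≤ φ_A whenever A ⊆ B), invertibility (φ_{A^c} = 1 − φ_A) and finite union consonance (rejecting A and B implies rejecting A ∪ B). The parameter space and function names are illustrative assumptions.

```python
from itertools import chain, combinations


def powerset(theta):
    """All subsets of a finite parameter space, as frozensets."""
    items = list(theta)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def phi(A, estimate):
    """Test generated by a point estimate: 1 (reject A) iff the estimate is outside A."""
    return 0 if estimate in A else 1


def scheme_satisfies_desiderata(theta):
    theta = frozenset(theta)
    subsets = powerset(theta)
    # The scheme depends on x only through delta(x), so it suffices
    # to loop over the possible values of the estimate.
    for estimate in theta:
        for A in subsets:
            if phi(theta - A, estimate) != 1 - phi(A, estimate):
                return False  # invertibility fails
            for B in subsets:
                if A <= B and phi(B, estimate) > phi(A, estimate):
                    return False  # coherence fails
                if (phi(A, estimate) == 1 and phi(B, estimate) == 1
                        and phi(A | B, estimate) == 0):
                    return False  # finite union consonance fails
    return True


print(scheme_satisfies_desiderata(range(4)))
```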

B. Proof of Theorem 2

First, we prove the necessary condition by the contrapositive. Thus, let us suppose there are

A, B \in σ (Θ)

with

A \subseteq B

and

θ_{1} \in A

and

θ_{2} \in B^{c}

, such that:

Δ_{A} (θ_{1}) Δ_{B} (θ_{2}) > Δ_{A} (θ_{2}) Δ_{B} (θ_{1}),

which implies that

Δ_{A} (θ_{1}) > 0

and

Δ_{B} (θ_{2}) > 0

.

Adding

Δ_{A} (θ_{2}) Δ_{B} (θ_{2})

to both sides of the inequality above, straightforward manipulations yield:

0 \leq \frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{2}) + Δ_{A} (θ_{1})} < \frac{Δ_{B} (θ_{2})}{Δ_{B} (θ_{2}) + Δ_{B} (θ_{1})} \leq 1 .

Thus, there is

α_{0} \in (0, 1)

, such that:

\begin{matrix} \frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{2}) + Δ_{A} (θ_{1})} < α_{0} < \frac{Δ_{B} (θ_{2})}{Δ_{B} (θ_{2}) + Δ_{B} (θ_{1})} . \end{matrix}

(4)

Furthermore, there is

x^{'} \in X

, such that

L_{x^{'}} (θ_{1}), L_{x^{'}} (θ_{2}) > 0

. Considering the prior distribution

π^{*}

for θ given by:

π^{*} (θ_{1}) = \frac{α_{0} L_{x^{'}} (θ_{2})}{α_{0} L_{x^{'}} (θ_{2}) + (1 - α_{0}) L_{x^{'}} (θ_{1})} and π^{*} (θ_{2}) = 1 - π^{*} (θ_{1}),

the corresponding posterior distribution given

x^{'}

is

π_{x^{'}}^{*} (θ_{1}) = α_{0}

and

π_{x^{'}}^{*} (θ_{2}) = 1 - α_{0}

. Let

φ^{*}

be any TS generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to

π^{*}

. Thus,

φ_{A}^{*} (x^{'}) = 0, if α_{0} > \frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{2}) + Δ_{A} (θ_{1})} and φ_{B}^{*} (x^{'}) = 0, if α_{0} > \frac{Δ_{B} (θ_{2})}{Δ_{B} (θ_{2}) + Δ_{B} (θ_{1})} .

From Equation (4), we have

φ_{A}^{*} (x^{'}) = 0

and

φ_{B}^{*} (x^{'}) = 1

. Therefore, there is a prior distribution

π^{*}

for θ with respect to which any TS generated by

{(L_{A})}_{A \in σ (Θ)}

is not coherent.
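The two-point prior used in this argument can be checked by a short computation. The sketch below builds π* from a target posterior mass α₀ and two positive likelihood values, then applies Bayes' rule to confirm that the posterior mass on θ₁ equals α₀. The numerical values and function names are hypothetical.

```python
from fractions import Fraction


def prior_for_target_posterior(alpha0, lik1, lik2):
    """Prior mass pi*(theta_1) that makes the posterior mass on theta_1 equal alpha0.
    lik1 and lik2 are the (positive) likelihoods L_x'(theta_1) and L_x'(theta_2)."""
    numerator = alpha0 * lik2
    return numerator / (numerator + (1 - alpha0) * lik1)


def posterior_theta1(prior1, lik1, lik2):
    """Bayes' rule for a prior concentrated on {theta_1, theta_2}."""
    return prior1 * lik1 / (prior1 * lik1 + (1 - prior1) * lik2)


alpha0 = Fraction(2, 5)                          # hypothetical target posterior mass
lik1, lik2 = Fraction(1, 3), Fraction(3, 4)      # hypothetical likelihood values

prior1 = prior_for_target_posterior(alpha0, lik1, lik2)
print(prior1, posterior_theta1(prior1, lik1, lik2))
```

With these values the prior mass is 3/5 and the recovered posterior mass is 2/5, matching α₀. The construction only needs both likelihoods to be strictly positive, which is exactly the assumption placed on x′.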

We now prove the “if” part. We suppose that the family

{(L_{A})}_{A \in σ (Θ)}

satisfies the condition that for all

A, B \in σ (Θ)

with

A \subseteq B

,

Δ_{A} (θ_{1}) Δ_{B} (θ_{2}) \leq Δ_{A} (θ_{2}) Δ_{B} (θ_{1})

,

\forall θ_{1} \in A, \forall θ_{2} \in B^{c}

. Integrating (with respect to

θ_{1}

) over A with respect to any probability measure P, we obtain:

Δ_{B} (θ_{2}) \int_{A} Δ_{A} (θ_{1}) d P (θ_{1}) \leq Δ_{A} (θ_{2}) \int_{A} Δ_{B} (θ_{1}) d P (θ_{1}), \forall θ_{2} \in B^{c} .

Similarly, integration (with respect to

θ_{2}

) over

B^{c}

with respect to the same measure P yields:

\begin{matrix} \int_{B^{c}} Δ_{B} (θ_{2}) d P (θ_{2}) \int_{A} - Δ_{A} (θ_{1}) d P (θ_{1}) \geq \int_{B^{c}} Δ_{A} (θ_{2}) d P (θ_{2}) \int_{A} - Δ_{B} (θ_{1}) d P (θ_{1}) . \end{matrix}

(5)

Now, let φ be a testing scheme generated by the family

{(L_{A})}_{A \in σ (Θ)}

. For

A, B \in σ (Θ)

with

A \subseteq B

and

x \in X

,

φ_{A} (x) = 0 \Rightarrow \int_{Θ} [L_{A} (0, θ) - L_{A} (1, θ)] d π_{x} (θ) \leq 0,

where

π_{x} (.)

denotes the posterior distribution of θ given

X = x

. Thus,

\int_{A} - Δ_{A} (θ) d π_{x} (θ) + \int_{A^{c} \cap B} Δ_{A} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0 .

Multiplying the last inequality by

\int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \geq 0

, we get:

\int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{A} - Δ_{A} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{A^{c} \cap B} Δ_{A} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0 .

From inequality Equation (5), it follows that:

\int_{A} - Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{A^{c} \cap B} Δ_{A} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0 .

As

\int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{A^{c} \cap B} Δ_{A} (θ) d π_{x} (θ) \geq 0

and

\int_{A^{c} \cap B} - Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0

, we have that:

\int_{A} - Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) + \int_{A^{c} \cap B} - Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{B} (θ) d π_{x} (θ) \int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0,

and, consequently,

\int_{B^{c}} Δ_{A} (θ) d π_{x} (θ) \{\int_{A} - Δ_{B} (θ) d π_{x} (θ) + \int_{A^{c} \cap B} - Δ_{B} (θ) d π_{x} (θ) + \int_{B^{c}} Δ_{B} (θ) d π_{x} (θ)\} \leq 0 .

Finally,

\int_{Θ} [L_{B} (0, θ) - L_{B} (1, θ)] d π_{x} (θ) \leq 0 .

If

\int_{Θ} [L_{B} (0, θ) - L_{B} (1, θ)] d π_{x} (θ) < 0

, then

φ_{B} (x) = 0

. If this integral is equal to zero, then both zero and one are optimal solutions, and we can choose the decision zero as

φ_{B} (x)

in order to ensure that

φ_{B} (x) \leq φ_{A} (x)

. Hence, with respect to each prior π, there is a TS generated by

{(L_{A})}_{A \in σ (Θ)}

that is coherent.

C. Proof of Theorem 3

The proof is analogous to that of Theorem 2. First, we prove the necessary condition by the contrapositive. Suppose that there are

A \in σ (Θ)

and

θ_{1} \in A

and

θ_{2} \in A^{c}

, such that:

Δ_{A} (θ_{1}) Δ_{A^{c}} (θ_{2}) \neq Δ_{A^{c}} (θ_{1}) Δ_{A} (θ_{2}) .

Assume

Δ_{A} (θ_{1}) Δ_{A^{c}} (θ_{2}) < Δ_{A^{c}} (θ_{1}) Δ_{A} (θ_{2})

(the other case is developed in the same way), which implies that

Δ_{A^{c}} (θ_{1}) > 0

and

Δ_{A} (θ_{2}) > 0

. Adding

Δ_{A^{c}} (θ_{2}) Δ_{A} (θ_{2})

to both sides of the inequality, we easily obtain that:

0 \leq \frac{Δ_{A^{c}} (θ_{2})}{Δ_{A^{c}} (θ_{1}) + Δ_{A^{c}} (θ_{2})} < \frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{1}) + Δ_{A} (θ_{2})} \leq 1 .

Thus, there is

α_{0} \in (0, 1)

, such that:

\begin{matrix} 0 \leq \frac{Δ_{A^{c}} (θ_{2})}{Δ_{A^{c}} (θ_{1}) + Δ_{A^{c}} (θ_{2})} < α_{0} < \frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{1}) + Δ_{A} (θ_{2})} \leq 1 . \end{matrix}

(6)

In addition, there is

x^{'} \in X

, such that

L_{x^{'}} (θ_{1}), L_{x^{'}} (θ_{2}) > 0

. For the prior distribution

π^{*}

for θ given by:

π^{*} (θ_{1}) = \frac{α_{0} L_{x^{'}} (θ_{2})}{α_{0} L_{x^{'}} (θ_{2}) + (1 - α_{0}) L_{x^{'}} (θ_{1})} and π^{*} (θ_{2}) = 1 - π^{*} (θ_{1}),

the posterior distribution given

x^{'}

is

π_{x^{'}}^{*} (θ_{1}) = α_{0}

and

π_{x^{'}}^{*} (θ_{2}) = 1 - α_{0}

. Let

φ^{*}

be any TS generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to

π^{*}

. Thus,

φ_{A}^{*} (x^{'}) = 0, if α_{0} > \frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{1}) + Δ_{A} (θ_{2})} and φ_{A^{c}}^{*} (x^{'}) = 0, if α_{0} < \frac{Δ_{A^{c}} (θ_{2})}{Δ_{A^{c}} (θ_{1}) + Δ_{A^{c}} (θ_{2})} .

From Equation (6), we have

φ_{A}^{*} (x^{'}) = 1

and

φ_{A^{c}}^{*} (x^{'}) = 1

. Therefore, there is a prior distribution

π^{*}

for θ with respect to which any TS generated by

{(L_{A})}_{A \in σ (Θ)}

does not meet invertibility.

Now, we prove the sufficiency. Suppose that for all

A \in σ (Θ)

:

Δ_{A} (θ_{1}) Δ_{A^{c}} (θ_{2}) = Δ_{A^{c}} (θ_{1}) Δ_{A} (θ_{2}), \forall θ_{1} \in A, θ_{2} \in A^{c} .

Integrating (with respect to

θ_{2}

) over the set

A^{c}

with respect to any probability measure P defined on

σ (Θ)

, we have:

\int_{A^{c}} Δ_{A} (θ_{2}) Δ_{A^{c}} (θ_{1}) d P (θ_{2}) = \int_{A^{c}} Δ_{A} (θ_{1}) Δ_{A^{c}} (θ_{2}) d P (θ_{2}), for all θ_{1} \in A .

Similarly, integrating (with respect to

θ_{1}

) over A, we get:

\begin{matrix} \int_{A} Δ_{A^{c}} (θ_{1}) d P (θ_{1}) \int_{A^{c}} Δ_{A} (θ_{2}) d P (θ_{2}) = \int_{A} Δ_{A} (θ_{1}) d P (θ_{1}) \int_{A^{c}} Δ_{A^{c}} (θ_{2}) d P (θ_{2}) . \end{matrix}

(7)

Let φ be a TS generated by

{(L_{A})}_{A \in σ (Θ)}

. If

φ_{A} (x) = 0

, then:

\int_{Θ} [L_{A} (0, θ) - L_{A} (1, θ)] d π_{x} (θ) = \int_{A} - Δ_{A} (θ) d π_{x} (θ) + \int_{A^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0 .

Multiplying both sides by

\int_{A} Δ_{A^{c}} (θ) d π_{x} (θ) \geq 0

, we get:

\int_{A} Δ_{A^{c}} (θ) d π_{x} (θ) \int_{A} - Δ_{A} (θ) d π_{x} (θ) + \int_{A} Δ_{A^{c}} (θ) d π_{x} (θ) \int_{A^{c}} Δ_{A} (θ) d π_{x} (θ) \leq 0 .

From Equation (7), it follows that:

\int_{A} - Δ_{A} (θ) d π_{x} (θ) \{\int_{A} Δ_{A^{c}} (θ) d π_{x} (θ) + \int_{A^{c}} - Δ_{A^{c}} (θ) d π_{x} (θ)\} \leq 0 .

Thus,

\int_{A} Δ_{A^{c}} (θ) d π_{x} (θ) + \int_{A^{c}} - Δ_{A^{c}} (θ) d π_{x} (θ) \geq 0,

since

\int_{A} - Δ_{A} (θ) d π_{x} (θ) \leq 0

. In this way,

\int_{Θ} [L_{A^{c}} (0, θ) - L_{A^{c}} (1, θ)] d π_{x} (θ) \geq 0 .

If

\int_{Θ} [L_{A^{c}} (0, θ) - L_{A^{c}} (1, θ)] d π_{x} (θ) > 0

, then

φ_{A^{c}} (x) = 1

. If the integral is zero, then we can choose

φ_{A^{c}} (x) = 1

, so as to obtain

φ_{A^{c}} (x) = 1 - φ_{A} (x)

. Similarly, we prove that if

φ_{A} (x) = 1

, then there is a Bayes test for

A^{c}

,

φ_{A^{c}}

, generated by

L_{A^{c}}

, such that

φ_{A^{c}} (x) = 0

. Consequently, there is a TS generated by

{(L_{A})}_{A \in σ (Θ)}

that satisfies invertibility.

D. Proof of Theorem 4

Suppose that there are

A, B \in σ (Θ)

,

θ_{1} \in A \cup B

and

θ_{2} \in {(A \cup B)}^{c}

such that both:

Δ_{A \cup B} (θ_{1}) Δ_{A} (θ_{2}) > Δ_{A \cup B} (θ_{2}) Δ_{A} (θ_{1}) and Δ_{A \cup B} (θ_{1}) Δ_{B} (θ_{2}) > Δ_{A \cup B} (θ_{2}) Δ_{B} (θ_{1})

hold, from which it follows that

Δ_{A \cup B} (θ_{1}) > 0

,

Δ_{A} (θ_{2}) > 0

and

Δ_{B} (θ_{2}) > 0

. Proceeding as in the previous proofs, we obtain that:

0 \leq \frac{Δ_{A \cup B} (θ_{2})}{Δ_{A \cup B} (θ_{1}) + Δ_{A \cup B} (θ_{2})} < min \{\frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{1}) + Δ_{A} (θ_{2})}, \frac{Δ_{B} (θ_{2})}{Δ_{B} (θ_{1}) + Δ_{B} (θ_{2})}\} \leq 1 .

Thus, there is

α_{0} \in (0, 1)

such that:

0 \leq \frac{Δ_{A \cup B} (θ_{2})}{Δ_{A \cup B} (θ_{1}) + Δ_{A \cup B} (θ_{2})} < α_{0} < min \{\frac{Δ_{A} (θ_{2})}{Δ_{A} (θ_{1}) + Δ_{A} (θ_{2})}, \frac{Δ_{B} (θ_{2})}{Δ_{B} (θ_{1}) + Δ_{B} (θ_{2})}\} \leq 1 .

In addition, there is

x^{'} \in X

such that

L_{x^{'}} (θ_{1}), L_{x^{'}} (θ_{2}) > 0

. For the prior distribution

π^{*}

for θ given by:

π^{*} (θ_{1}) = \frac{α_{0} L_{x^{'}} (θ_{2})}{α_{0} L_{x^{'}} (θ_{2}) + (1 - α_{0}) L_{x^{'}} (θ_{1})} and π^{*} (θ_{2}) = 1 - π^{*} (θ_{1}),

the posterior distribution is

π_{x^{'}}^{*} (θ_{1}) = α_{0}

and

π_{x^{'}}^{*} (θ_{2}) = 1 - α_{0}

. Let

φ^{*}

be any TS generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to

π^{*}

. Next, we consider three cases:

(i): if $θ_{1} \in A \cap B$ , then:

$φ_{C}^{*} (x^{'}) = 0, if α_{0} > \frac{Δ_{C} (θ_{2})}{Δ_{C} (θ_{1}) + Δ_{C} (θ_{2})},$

for any $C \in {A, B, A \cup B}$ . Thus, we have $φ_{A}^{*} (x^{'}) = 1$ , $φ_{B}^{*} (x^{'}) = 1$ and $φ_{A \cup B}^{*} (x^{'}) = 0$ ;
(ii): if $θ_{1} \notin A$ , then:

$φ_{B}^{*} (x^{'}) = 0, if α_{0} > \frac{Δ_{B} (θ_{2})}{Δ_{B} (θ_{1}) + Δ_{B} (θ_{2})} and φ_{A \cup B}^{*} (x^{'}) = 0, if α_{0} > \frac{Δ_{A \cup B} (θ_{2})}{Δ_{A \cup B} (θ_{1}) + Δ_{A \cup B} (θ_{2})},$

and:

$\int_{Θ} [L_{A} (0, θ) - L_{A} (1, θ)] d π_{x^{'}}^{*} (θ) = Δ_{A} (θ_{1}) α_{0} + Δ_{A} (θ_{2}) (1 - α_{0}) > 0 .$

Thus, $φ_{A}^{*} (x^{'}) = 1$ , $φ_{B}^{*} (x^{'}) = 1$ and $φ_{A \cup B}^{*} (x^{'}) = 0$ ;
(iii): if $θ_{1} \notin B$ , a development similar to that of Case (ii) yields the same results: $φ_{A}^{*} (x^{'}) = 1$ , $φ_{B}^{*} (x^{'}) = 1$ and $φ_{A \cup B}^{*} (x^{'}) = 0$ .

Therefore, in any case, there is a prior distribution

π^{*}

for θ with respect to which no TS generated by

{(L_{A})}_{A \in σ (Θ)}

meets finite union consonance, concluding the proof.

E. Proof of Theorem 5

The proof of Theorem 5 consists of verifying that no such family of loss functions generates Bayes tests satisfying the desiderata with respect to all priors concentrated on three points in Θ (consequently, no such family satisfies these requisites with respect to all priors over

σ (Θ)

).

Let

{A_{1}, A_{2}, A_{3}}

be a measurable partition of Θ and

θ_{1}, θ_{2}, θ_{3} \in Θ

, such that

θ_{i} \in A_{i}

,

i = 1, 2, 3

. First, notice that for all

x \in X

, such that

L_{x} (θ_{i}) > 0

for

i = 1, 2, 3

, there is a one-to-one correspondence between prior and posterior distributions concentrated on

{θ_{1}, θ_{2}, θ_{3}}

. Indeed, for all

(α_{1}, α_{2}, α_{3}) \in A = {(a, b, c) \in R_{+}^{3} : a + b + c = 1}

and

x \in X

, such that

L_{x} (θ_{i}) > 0

for

i = 1, 2, 3

, there is a unique prior distribution for θ, π, such that the corresponding posterior distribution given x,

π_{x}

, satisfies

π_{x} (θ_{i}) = α_{i}

,

i = 1, 2, 3

, namely:

π (θ_{i}) = \frac{\frac{α_{i}}{L_{x} (θ_{i})}}{\frac{α_{1}}{L_{x} (θ_{1})} + \frac{α_{2}}{L_{x} (θ_{2})} + \frac{α_{3}}{L_{x} (θ_{3})}}, i = 1, 2, 3.

Henceforth, we will refer to the above posterior by

(α_{1}, α_{2}, α_{3})

for short. Let

{(L_{A})}_{A \in σ (Θ)}

be any family of strict hypothesis testing loss functions. For each

(α_{1}, α_{2}, α_{3}) \in A

, the difference between the posterior risk of accepting

H_{0}^{(i)} : θ \in A_{i}

and that of rejecting it is given by:

\int_{Θ} [L_{A_{i}} (0, θ) - L_{A_{i}} (1, θ)] d π_{x} (θ) = Δ_{A_{i}} (θ_{1}) α_{1} + Δ_{A_{i}} (θ_{2}) α_{2} + Δ_{A_{i}} (θ_{3}) α_{3},

where

Δ_{A_{i}} (θ_{j}) = L_{A_{i}} (0, θ_{j}) - L_{A_{i}} (1, θ_{j})

(note that

Δ_{A_{i}} (θ_{j}) > 0

, if

i \neq j

, while

Δ_{A_{i}} (θ_{i}) < 0

). In order to evaluate the tests for the hypotheses

H_{0}^{(1)}

,

H_{0}^{(2)}

and

H_{0}^{(3)}

with respect to all posterior distributions concentrated on

{θ_{1}, θ_{2}, θ_{3}}

, we consider the transformation

T : A \to R^{3}

defined by:

T (α_{1}, α_{2}, α_{3}) = (\int_{Θ} Δ_{A_{1}} (θ) d π_{x} (θ), \int_{Θ} Δ_{A_{2}} (θ) d π_{x} (θ), \int_{Θ} Δ_{A_{3}} (θ) d π_{x} (θ)),

where

\int_{Θ} Δ_{A_{i}} (θ) d π_{x} (θ) = Δ_{A_{i}} (θ_{1}) α_{1} + Δ_{A_{i}} (θ_{2}) α_{2} + Δ_{A_{i}} (θ_{3}) α_{3}

. Thus, T assigns to each posterior

(α_{1}, α_{2}, α_{3}) \in A

the differences between the risks of accepting

H_{0}^{(i)}

and of rejecting it,

i = 1, 2, 3

. It is easy to verify that

B = T (A) = {T (α_{1}, α_{2}, α_{3}) : (α_{1}, α_{2}, α_{3}) \in A}

is a convex set. Indeed,

B

is a triangle (see Figure 2) with vertices

P_{1} = T (1, 0, 0) = (Δ_{A_{1}} (θ_{1}), Δ_{A_{2}} (θ_{1}), Δ_{A_{3}} (θ_{1}))

,

P_{2} = T (0, 1, 0) = (Δ_{A_{1}} (θ_{2}), Δ_{A_{2}} (θ_{2}), Δ_{A_{3}} (θ_{2}))

and

P_{3} = T (0, 0, 1) = (Δ_{A_{1}} (θ_{3}), Δ_{A_{2}} (θ_{3}), Δ_{A_{3}} (θ_{3}))

(these points are not aligned owing to the restrictions on the quantities

Δ_{A_{i}} (θ_{j})

).

Figure 2. Set

B

.

Now, we turn to the main argument of the proof. By Theorem 4.3 from [8], it is necessary for a Bayesian testing scheme to satisfy the logical requirements with respect to all priors over

σ (Θ)

that exactly one of the sets

A_{i}

is accepted for each vector of probabilities

(α_{1}, α_{2}, α_{3})

. Geometrically, such a necessary condition is equivalent to the triangle

B

being contained in the union of the octants whose triplets have exactly one negative coordinate, namely

R_{-} \times R_{+} \times R_{+}

,

R_{+} \times R_{-} \times R_{+}

and

R_{+} \times R_{+} \times R_{-}

. However, this is impossible. To verify this fact, we consider three cases (Figure 3 illustrates the projection of

B

over the plane

w = \{(u, v, 0) : u, v \in R\}

in each of these cases):

(i): if $Δ_{A_{1}} (θ_{1}) Δ_{A_{2}} (θ_{2}) > Δ_{A_{1}} (θ_{2}) Δ_{A_{2}} (θ_{1})$, then the projection of the line segment joining $P_{1}$ and $P_{2}$ over the plane $w$ intersects the (third) quadrant $R_{-} \times R_{-} \times \{0\}$ (see the first graphic in Figure 3). Thus, there is $γ \in (0, 1)$ such that $γ Δ_{A_{i}} (θ_{1}) + (1 - γ) Δ_{A_{i}} (θ_{2}) < 0$, $i = 1, 2$. As $γ P_{1} + (1 - γ) P_{2} \in B$, there is a posterior $(α_{1}, α_{2}, α_{3})$ concentrated on $\{θ_{1}, θ_{2}, θ_{3}\}$ with respect to which any TS generated by ${(L_{A})}_{A \in σ (Θ)}$ rejects neither $A_{1}$ nor $A_{2}$ and, therefore, does not satisfy coherence, invertibility and finite union consonance together;
(ii): if $Δ_{A_{1}} (θ_{1}) Δ_{A_{2}} (θ_{2}) = Δ_{A_{1}} (θ_{2}) Δ_{A_{2}} (θ_{1})$, then the projection of the line segment joining $P_{1}$ and $P_{2}$ over $w$ passes through the origin $(0, 0, 0)$ (see the second graphic in Figure 3). Thus, there is $t_{0} > 0$ such that the point $P_{0} = (0, 0, t_{0}) \in B$. Considering now the line segment joining $P_{0}$ and $P_{3}$, it is easily seen that for any $γ \in (\frac{- Δ_{A_{3}} (θ_{3})}{t_{0} - Δ_{A_{3}} (θ_{3})}, 1)$, $γ \cdot 0 + (1 - γ) Δ_{A_{1}} (θ_{3}) > 0$, $γ \cdot 0 + (1 - γ) Δ_{A_{2}} (θ_{3}) > 0$ and $γ t_{0} + (1 - γ) Δ_{A_{3}} (θ_{3}) > 0$. As $γ P_{0} + (1 - γ) P_{3} \in B$, there is a posterior distribution with respect to which any TS generated by ${(L_{A})}_{A \in σ (Θ)}$ rejects $A_{1}$, $A_{2}$ and $A_{3}$ and, therefore, does not satisfy the logical consistency properties all together;
(iii): if $Δ_{A_{1}} (θ_{1}) Δ_{A_{2}} (θ_{2}) < Δ_{A_{1}} (θ_{2}) Δ_{A_{2}} (θ_{1})$, then the projection of the segment joining $P_{1}$ and $P_{2}$ over $w$ intersects the (first) quadrant $R_{+} \times R_{+} \times \{0\}$ (third graphic in Figure 3). Thus, there is $γ \in (0, 1)$ such that $γ Δ_{A_{i}} (θ_{1}) + (1 - γ) Δ_{A_{i}} (θ_{2}) > 0$, $i = 1, 2$. Since $Δ_{A_{3}} (θ_{1})$ and $Δ_{A_{3}} (θ_{2})$ are both positive, the third coordinate of $γ P_{1} + (1 - γ) P_{2}$ is positive as well. As $γ P_{1} + (1 - γ) P_{2} \in B$, there is a posterior $(α_{1}, α_{2}, α_{3})$ concentrated on $\{θ_{1}, θ_{2}, θ_{3}\}$ with respect to which any TS generated by ${(L_{A})}_{A \in σ (Θ)}$ rejects $A_{1}$, $A_{2}$ and $A_{3}$ and, consequently, does not meet the desiderata.

From (i)–(iii), the result follows.
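
The three cases can also be explored numerically. The following Python sketch is only an illustration and not part of the proof: the matrices `delta_case_i`, `delta_case_ii` and `delta_case_iii` hold hypothetical values of Δ_{A_i}(θ_j) chosen to respect the sign restrictions (negative on the diagonal, positive elsewhere), the function `risk_differences` plays the role of T, and `find_violation` searches a grid on the simplex for a posterior at which the number of accepted hypotheses differs from one. Exact rational arithmetic is used so that ties are detected exactly; posteriors that produce a tie are skipped.

```python
from fractions import Fraction
from itertools import product


def risk_differences(delta, alpha):
    """T(alpha): posterior expected risk difference for each hypothesis A_i."""
    return [sum(delta[i][j] * alpha[j] for j in range(3)) for i in range(3)]


def find_violation(delta, steps=40):
    """Return a posterior (and its image under T) at which the number of
    accepted hypotheses is not exactly one, or None if the grid has none."""
    for a, b in product(range(steps + 1), repeat=2):
        if a + b > steps:
            continue
        alpha = (Fraction(a, steps), Fraction(b, steps),
                 Fraction(steps - a - b, steps))
        t = risk_differences(delta, alpha)
        if any(v == 0 for v in t):
            continue  # tie: either decision is Bayes, so skip this posterior
        accepted = sum(1 for v in t if v < 0)  # negative difference = accept
        if accepted != 1:
            return alpha, t
    return None


# Hypothetical values; delta[i][j] stands for Delta_{A_{i+1}}(theta_{j+1}).
delta_case_i = [[-3, 1, 1], [1, -3, 1], [1, 1, -1]]    # 9 > 1, case (i)
delta_case_ii = [[-1, 1, 1], [1, -1, 1], [1, 1, -1]]   # 1 = 1, case (ii)
delta_case_iii = [[-1, 3, 1], [3, -1, 1], [1, 1, -1]]  # 1 < 9, case (iii)

violation_i = find_violation(delta_case_i)
violation_ii = find_violation(delta_case_ii)
violation_iii = find_violation(delta_case_iii)

for name, v in [("(i)", violation_i), ("(ii)", violation_ii),
                ("(iii)", violation_iii)]:
    print(name, v)
```

For each of these hypothetical matrices the search returns a posterior in the simplex whose image under T has either no negative coordinate or more than one, in line with the conclusion of the proof.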

Figure 3. Projection of

B

in

u \times v

.

F. Proof of Theorem 6

Let φ be a TS satisfying coherence, invertibility and countable union consonance. From Theorem 1, there is a unique point estimator

δ : X \to Θ

, such that for all

A \in σ (Θ)

and

x \in X

,

φ_{A} (x) = I (δ (x) \notin A)

. For each

x \in X

, define

μ_{x} : σ (Θ) \to R_{+}

by:

μ_{x} (A) = 1 - φ_{A} (x) = I (δ (x) \in A),

that is,

μ_{x}

is the probability measure degenerate at the point

δ (x)

[14]. Furthermore, let

μ_{0}

be any probability measure defined on

P (X)

. Defining

μ : P (Θ \times X) \to R_{+}

by:

μ (B) = \sum_{(θ, x) \in B} μ_{0} (\{x\}) μ_{x} (\{θ\}), B \in P (Θ \times X),

it is immediate that μ is a probability measure and that

μ_{x}

is the conditional distribution of θ given

X = x

, for each

x \in X

. Next, let

{(L_{A})}_{A \in σ (Θ)}

be any family of strict hypothesis testing loss functions. Let

φ^{*}

be a testing scheme generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to the μ-marginal distribution of θ. Let us verify that

φ^{*}

coincides with φ. Indeed, for

x \in X

and

A \in σ (Θ)

, we have:

φ_{A} (x) = 0 \Rightarrow μ_{x} (A) = 1 \Rightarrow \sum_{θ \in Θ} [L_{A} (0, θ) - L_{A} (1, θ)] μ_{x} (θ) = \sum_{θ \in A} [L_{A} (0, θ) - L_{A} (1, θ)] μ_{x} (θ) < 0 \Rightarrow φ_{A}^{*} (x) = 0 .

Similarly,

φ_{A} (x) = 1 \Rightarrow μ_{x} (A) = 0 \Rightarrow \sum_{θ \in Θ} [L_{A} (0, θ) - L_{A} (1, θ)] μ_{x} (θ) = \sum_{θ \in A^{c}} [L_{A} (0, θ) - L_{A} (1, θ)] μ_{x} (θ) > 0 \Rightarrow φ_{A}^{*} (x) = 1,

concluding the proof. It should be emphasized that there are many other probability measures over

P (Θ \times X)

and families of strict hypothesis testing loss functions that yield the result of Theorem 6. For instance, considering for each

x \in X

, a conditional probability measure

μ_{x}'

, such that

μ_{x}' (δ (x)) > 1 / 2

and

μ_{x}' (θ) > 0

, for all

θ \in Θ

, together with the family of 0–1 loss functions, one will obtain a Bayesian TS that coincides with φ, as well (see [14] for the details).
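
The closing remark can be illustrated on a small finite parameter space. The Python sketch below is an assumption-laden example rather than the construction in [14]: the space `theta_space`, the value `estimate` standing for δ(x) and the numbers in `posterior` are all hypothetical. Under the 0–1 loss, the Bayes test rejects A when its posterior probability is below 1/2, and the code checks that this decision equals I(δ(x) ∉ A) for every subset A.

```python
from itertools import combinations


def subsets(items):
    """All subsets of a finite parameter space, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def bayes_test_01(posterior, hypothesis):
    """Bayes test under 0-1 loss: 1 (reject) if the posterior probability
    of the hypothesis is below 1/2, and 0 (accept) otherwise."""
    prob = sum(p for theta, p in posterior.items() if theta in hypothesis)
    return 1 if prob < 0.5 else 0


theta_space = ["a", "b", "c"]               # hypothetical parameter space
estimate = "b"                              # hypothetical value of delta(x)
posterior = {"a": 0.2, "b": 0.6, "c": 0.2}  # puts mass > 1/2 on the estimate

# Compare the Bayes decision with I(delta(x) not in A) for every subset A.
agreement = all(
    bayes_test_01(posterior, A) == (0 if estimate in A else 1)
    for A in subsets(theta_space)
)
print(agreement)
```

Because the estimate carries more than half of the posterior mass, every hypothesis containing it has probability above 1/2 and every hypothesis excluding it has probability below 1/2, so no ties arise in this example.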

G. Proof of Theorem 7

To prove Part (a), we define a family of loss functions that generates Bayesian testing schemes satisfying both coherence and invertibility with respect to all prior distributions for θ, which implies, by Theorem 3.1 from [28], that, for each sample point, at most one hypothesis of each partition of Θ is not rejected. Next, we prove that, for each

x \in X

, there is a singleton that is not rejected with respect to the prior π. Combining these facts, we prove that, for each sample point, exactly one hypothesis of each partition of Θ is accepted, which is equivalent (Theorem 4.3 from [8]) to asserting that the TS generated by that family of losses with respect to π meets the desiderata.

Thus, for

A \in σ (Θ)

, let

L_{A} : \{0, 1\} \times (Θ \times X) \to R_{+}

be given, for

θ \in A^{c}

and

x \in X

, by

L_{A} (1, (θ, x)) = 0

and:

L_{A} (0, (θ, x)) = min \{min \{L (d, θ); \frac{1}{L (d, θ)}\} I_{A} (δ (x)) + max \{L (d, θ); \frac{1}{L (d, θ)}\} I_{A^{c}} (δ (x)) : d \in A\},

and, for

θ \in A

and

x \in X

, by

L_{A} (0, (θ, x)) = 0

and:

L_{A} (1, (θ, x)) = min \{\frac{1}{C} min \{L (d, θ); \frac{1}{L (d, θ)}\} I_{A^{c}} (δ (x)) + C max \{L (d, θ); \frac{1}{L (d, θ)}\} I_{A} (δ (x)) : d \in A^{c}\},

where

C > 1

is any constant greater than

max \{\frac{E [L (δ (x), θ) | x]}{π_{x} (δ (x))} : x \in X\}

.

These hypothesis testing loss functions do not penalize correct decisions. They also reflect the decision-maker’s tendency not to reject the hypotheses that contain the best estimate for θ,

δ (x)

. For instance, if

θ \in A

and one decides to reject A on the basis of the sample x, the loss of falsely rejecting A is:

\begin{cases} C min \{max \{L (d, θ); \frac{1}{L (d, θ)}\} : d \in A^{c}\}, & \text{if } δ (x) \in A \\ \frac{1}{C} min \{min \{L (d, θ); \frac{1}{L (d, θ)}\} : d \in A^{c}\}, & \text{otherwise.} \end{cases}

If C is large, the decision-maker is more reluctant to reject A when

δ (x) \in A

than when the estimate lies outside A.

This family of loss functions satisfies the condition in Equation (2) of Theorem 2. In fact, for all

A, B \in σ (Θ)

, with

A \subseteq B

, for all

θ_{1} \in A

,

θ_{2} \in B^{c}

and

x \in X

, we have:

(i): if $δ (x) \in A$ , then:

$\frac{Δ_{A} (θ_{1})}{Δ_{B} (θ_{1})} = \frac{C min \{max \{L (d, θ_{1}); \frac{1}{L (d, θ_{1})}\} : d \in A^{c}\}}{C min \{max \{L (d, θ_{1}); \frac{1}{L (d, θ_{1})}\} : d \in B^{c}\}} \leq 1 \leq \frac{min \{min \{L (d, θ_{2}); \frac{1}{L (d, θ_{2})}\} : d \in A\}}{min \{min \{L (d, θ_{2}); \frac{1}{L (d, θ_{2})}\} : d \in B\}} = \frac{Δ_{A} (θ_{2})}{Δ_{B} (θ_{2})} .$
(ii): if $δ (x) \in B \cap A^{c}$ (recall $C \geq 1$ ), it follows that:

$\frac{Δ_{A} (θ_{1})}{Δ_{B} (θ_{1})} = \frac{\frac{1}{C} min \{min \{L (d, θ_{1}); \frac{1}{L (d, θ_{1})}\} : d \in A^{c}\}}{C min \{max \{L (d, θ_{1}); \frac{1}{L (d, θ_{1})}\} : d \in B^{c}\}} \leq 1 \leq \frac{min \{max \{L (d, θ_{2}); \frac{1}{L (d, θ_{2})}\} : d \in A\}}{min \{min \{L (d, θ_{2}); \frac{1}{L (d, θ_{2})}\} : d \in B\}} = \frac{Δ_{A} (θ_{2})}{Δ_{B} (θ_{2})} .$
(iii): if $δ (x) \in B^{c}$ ,

$\frac{Δ_{A} (θ_{1})}{Δ_{B} (θ_{1})} = \frac{\frac{1}{C} min \{min \{L (d, θ_{1}); \frac{1}{L (d, θ_{1})}\} : d \in A^{c}\}}{\frac{1}{C} min \{min \{L (d, θ_{1}); \frac{1}{L (d, θ_{1})}\} : d \in B^{c}\}} \leq 1 \leq \frac{min \{max \{L (d, θ_{2}); \frac{1}{L (d, θ_{2})}\} : d \in A\}}{min \{max \{L (d, θ_{2}); \frac{1}{L (d, θ_{2})}\} : d \in B\}} = \frac{Δ_{A} (θ_{2})}{Δ_{B} (θ_{2})} .$

Therefore,

{(L_{A})}_{A \in σ (Θ)}

generates coherent testing schemes with respect to all prior distributions for θ, if

C \geq 1

. Furthermore, for all

A \in σ (Θ)

,

θ_{1} \in A

and

θ_{2} \in A^{c}

, we have, if

δ (x) \in A

, that:

\frac{Δ_{A} (θ_{2})}{Δ_{A^{c}} (θ_{2})} = \frac{min \{min \{L (d, θ_{2}), \frac{1}{L (d, θ_{2})}\} : d \in A\}}{\frac{1}{C} min \{min \{L (d, θ_{2}), \frac{1}{L (d, θ_{2})}\} : d \in A\}} = \frac{1}{\frac{1}{C}} = C = \frac{C min \{max \{L (d, θ_{1}), \frac{1}{L (d, θ_{1})}\} : d \in A^{c}\}}{min \{max \{L (d, θ_{1}), \frac{1}{L (d, θ_{1})}\} : d \in A^{c}\}} = \frac{Δ_{A} (θ_{1})}{Δ_{A^{c}} (θ_{1})} .

Analogously, we prove that the condition in Equation (3) of Theorem 3 is fulfilled if

δ (x) \notin A

. Thus, there are testing schemes generated by

{(L_{A})}_{A \in σ (Θ)}

that respect invertibility with respect to all priors. Finally, let us prove that a TS φ generated by

{(L_{A})}_{A \in σ (Θ)}

is such that

φ_{\{δ (x)\}} (x) = 0

, for all

x \in X

. Indeed,

\begin{aligned} \sum_{θ \in Θ} L_{\{δ (x)\}} (0, (θ, x)) π_{x} (θ) &= \sum_{θ \neq δ (x)} L_{\{δ (x)\}} (0, (θ, x)) π_{x} (θ) \\ &= \sum_{θ \neq δ (x)} min \{L (δ (x), θ); \frac{1}{L (δ (x), θ)}\} π_{x} (θ) \leq \sum_{θ \neq δ (x)} L (δ (x), θ) π_{x} (θ) \end{aligned} \qquad (8)

and:

\begin{aligned} \sum_{θ \in Θ} L_{\{δ (x)\}} (1, (θ, x)) π_{x} (θ) &= L_{\{δ (x)\}} (1, (δ (x), x)) π_{x} (δ (x)) \\ &= C min \{max \{L (d, δ (x)); \frac{1}{L (d, δ (x))}\} : d \neq δ (x)\} π_{x} (δ (x)) \\ &> min \{max \{L (d, δ (x)); \frac{1}{L (d, δ (x))}\} : d \neq δ (x)\} \sum_{θ \neq δ (x)} L (δ (x), θ) π_{x} (θ), \end{aligned} \qquad (9)

since

C > max \{\frac{E [L (δ (x), θ) | x]}{π_{x} (δ (x))} : x \in X\} \geq \frac{\sum_{θ \neq δ (x_{0})} L (δ (x_{0}), θ) π_{x_{0}} (θ)}{π_{x_{0}} (δ (x_{0}))}

, for any

x_{0} \in X

.

From Equations (8) and (9), it follows that:

\sum_{θ \in Θ} L_{\{δ (x)\}} (1, (θ, x)) π_{x} (θ) > min \{max \{L (d, δ (x)); \frac{1}{L (d, δ (x))}\} : d \neq δ (x)\} \sum_{θ \in Θ} L_{\{δ (x)\}} (0, (θ, x)) π_{x} (θ) .

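The minimum on the right-hand side is at least 1, because each of its terms is the larger of a positive number and its reciprocal. Therefore,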

\sum_{θ \in Θ} L_{\{δ (x)\}} (1, (θ, x)) π_{x} (θ) > \sum_{θ \in Θ} L_{\{δ (x)\}} (0, (θ, x)) π_{x} (θ)

and, consequently,

φ_{\{δ (x)\}} (x) = 0

, concluding the proof of Part (a).
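
The construction in Part (a) can be traced on a toy example. The Python sketch below is a hedged illustration under stated assumptions, not the authors' implementation: the three-point parameter space `THETA`, the posterior `POSTERIOR` for a single sample point, the squared-error loss `L` and the constant `C = 2.0` are all hypothetical choices. The functions `accept_loss` and `reject_loss` follow the two displayed definitions of L_A, and the final check compares the resulting Bayes test with I(δ(x) ∉ A) over all nonempty proper subsets A.

```python
from itertools import combinations

THETA = [0, 1, 2]                      # hypothetical finite parameter space
POSTERIOR = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical pi_x for one sample x
C = 2.0                                # hypothetical constant, checked below


def L(d, theta):
    """Hypothetical estimation loss; below it is inverted only for d != theta."""
    return float((d - theta) ** 2)


def lo(d, theta):
    return min(L(d, theta), 1.0 / L(d, theta))


def hi(d, theta):
    return max(L(d, theta), 1.0 / L(d, theta))


def posterior_risk(d):
    return sum(L(d, t) * p for t, p in POSTERIOR.items())


delta = min(THETA, key=posterior_risk)            # Bayes estimate delta(x)
assert C > posterior_risk(delta) / POSTERIOR[delta]


def accept_loss(A, theta):
    """L_A(0, (theta, x)) for theta outside A."""
    pick = lo if delta in A else hi
    return min(pick(d, theta) for d in A)


def reject_loss(A, theta):
    """L_A(1, (theta, x)) for theta inside A."""
    complement = [d for d in THETA if d not in A]
    if delta in A:
        return C * min(hi(d, theta) for d in complement)
    return min(lo(d, theta) for d in complement) / C


def bayes_test(A):
    """1 (reject) if accepting A has the larger posterior expected loss."""
    accept_risk = sum(accept_loss(A, t) * POSTERIOR[t]
                      for t in THETA if t not in A)
    reject_risk = sum(reject_loss(A, t) * POSTERIOR[t]
                      for t in THETA if t in A)
    return 1 if accept_risk > reject_risk else 0


proper_subsets = [frozenset(c) for r in (1, 2) for c in combinations(THETA, r)]
decisions = {tuple(sorted(A)): bayes_test(A) for A in proper_subsets}
consistent = all(bayes_test(A) == (0 if delta in A else 1)
                 for A in proper_subsets)
print(delta, decisions, consistent)
```

In this toy setting the only accepted singleton is the one containing the Bayes estimate, and a hypothesis is accepted exactly when it contains that estimate, which is the behaviour the proof establishes in general.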

For Part (b), suppose φ is generated by

{(L_{A})}_{A \in σ (Θ)}

with respect to π. From Theorem 4.3 from [8] and Theorem 1, it follows that for all

x \in X

,

φ_{\{δ (x)\}} (x) = 0

and

φ_{\{d\}} (x) = 1

, for all

d \neq δ (x)

. Thus,

\sum_{θ \in Θ} [L_{\{δ (x)\}} (0, θ) - L_{\{δ (x)\}} (1, θ)] π_{x} (θ) \leq 0 \leq \sum_{θ \in Θ} [L_{\{d\}} (0, θ) - L_{\{d\}} (1, θ)] π_{x} (θ),

for

d \neq δ (x)

, where

π_{x}

is the posterior distribution for θ given x. Defining

L : Θ \times Θ \to R_{+}

by:

\begin{aligned} L (d, θ) &= [L_{\{d\}} (0, θ) - L_{\{d\}} (1, θ)] - min \{L_{\{d'\}} (0, θ) - L_{\{d'\}} (1, θ) : d' \in Θ\} \\ &= [L_{\{d\}} (0, θ) - L_{\{d\}} (1, θ)] - [L_{\{θ\}} (0, θ) - L_{\{θ\}} (1, θ)], \end{aligned}

it follows that:

\sum_{θ \in Θ} L (δ (x), θ) π_{x} (θ) \leq \sum_{θ \in Θ} L (d, θ) π_{x} (θ),

for

d \neq δ (x)

, for each

x \in X

. Therefore, δ is a Bayes estimator for θ generated by L with respect to π. Notice that L (essentially) assigns to the estimate d for the parameter the difference between the loss of not rejecting the hypothesis

\{d\}

and that of rejecting it when the state of nature is θ. It seems reasonable that the greater the “distance” between d and θ, the greater this difference (and, consequently,

L (d, θ)

) should be.
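
A small worked example may help to fix the definition of L in Part (b). The Python sketch below assumes, purely for illustration, that the singleton hypotheses are tested under 0–1 losses and that the posterior is the hypothetical `POSTERIOR` given in the code; neither choice comes from the paper. It builds the induced loss L(d, θ) from the risk differences of the singleton tests and checks that the only accepted singleton coincides with the Bayes estimate under L.

```python
THETA = [0, 1, 2]                      # hypothetical finite parameter space
POSTERIOR = {0: 0.2, 1: 0.6, 2: 0.2}   # hypothetical posterior pi_x


def delta_singleton(d, theta):
    """L_{{d}}(0, theta) - L_{{d}}(1, theta) under an assumed 0-1 loss:
    accepting {d} costs 1 if theta != d, rejecting it costs 1 if theta == d."""
    return -1.0 if theta == d else 1.0


def induced_loss(d, theta):
    """L(d, theta): the risk difference for {d} minus its minimum over d'."""
    smallest = min(delta_singleton(e, theta) for e in THETA)
    return delta_singleton(d, theta) - smallest


def expected(f, d):
    """Posterior expectation of f(d, theta)."""
    return sum(f(d, t) * p for t, p in POSTERIOR.items())


# Singletons that are not rejected: non-positive expected risk difference.
accepted = [d for d in THETA if expected(delta_singleton, d) <= 0]

# Bayes estimate under the induced loss L.
bayes_estimate = min(THETA, key=lambda d: expected(induced_loss, d))

print(accepted, bayes_estimate)
```

The subtracted minimum does not depend on d, so minimizing the posterior expectation of L over d is the same as minimizing the expected risk difference of the singleton tests; this is why the accepted singleton and the Bayes estimate agree here.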

References

1. Hommel, G.; Bretz, F. Aesthetics and power considerations in multiple testing—A contradiction? Biom. J. 2008, 50, 657–666.
2. Shaffer, J.P. Multiple hypothesis testing. Ann. Rev. Psychol. 1995, 46, 561–584.
3. Hochberg, Y.; Tamhane, A.C. Multiple Comparison Procedures; Wiley: New York, NY, USA, 1987.
4. Farcomeni, A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat. Methods Med. Res. 2008, 17, 347–388.
5. Schervish, M.J. Theory of Statistics; Springer: New York, NY, USA, 1997.
6. Gabriel, K.R. Simultaneous test procedures—Some theory of multiple comparisons. Ann. Math. Stat. 1969, 40, 224–250.
7. Lehmann, E.L. A theory of some multiple decision problems, II. Ann. Math. Stat. 1957, 28, 547–572.
8. Izbicki, R.; Esteves, L.G. Logical Consistency in Simultaneous Statistical Test Procedures. Log. J. IGPL 2015.
9. Pereira, C.A.B.; Stern, J.M.; Wechsler, S. Can a significance test be genuinely Bayesian? Bayesian Anal. 2008, 3, 79–100.
10. Stern, J.M. Constructive verification, empirical induction and falibilist deduction: A threefold contrast. Information 2011, 2, 635–650.
11. Pereira, C.A.B.; Stern, J.M. Evidence and credibility: Full Bayesian significance test for precise hypotheses. Entropy 1999, 1, 99–110.
12. Lin, S.K.; Chen, C.K.; Ball, D.; Liu, H.C.; Loh, E.W. Gender-specific contribution of the GABAA subunit genes on 5q33 in methamphetamine use disorder. Pharm. J. 2003, 3, 349–355.
13. Izbicki, R.; Fossaluza, V.; Hounie, A.G.; Nakano, E.Y.; Pereira, C.A. Testing allele homogeneity: The problem of nested hypotheses. BMC Genet. 2012, 13.
14. Silva, G.M. Propriedades Lógicas de Classes de Testes de Hipóteses. Ph.D. Thesis, University of São Paulo, São Paulo, Brazil, 2014.
15. Evans, M. Measuring Statistical Evidence Using Relative Belief; Chapman & Hall/CRC: London, UK, 2015.
16. Marcus, R.; Peritz, E.; Gabriel, K.R. On closed testing procedures with special reference to ordered analysis of variance. Biometrika 1976, 63, 655–660.
17. Sonnemann, E. General solutions to multiple testing problems. Biom. J. 2008, 50, 641–656.
18. Lavine, M.; Schervish, M.J. Bayes Factors: What they are and what they are not. Am. Stat. 1999, 53, 119–122.
19. Kneale, W.; Kneale, M. The Development of Logic; Oxford University Press: Oxford, UK, 1962.
20. DeGroot, M.H. Optimal Statistical Decisions; McGraw-Hill: New York, NY, USA, 1970.
21. Finner, H.; Strassburger, K. The partitioning principle: A powerful tool in multiple decision theory. Ann. Stat. 2002, 30, 1194–1213.
22. Aitchison, J. Confidence-region tests. J. R. Stat. Soc. Ser. B 1964, 26, 462–476.
23. Darwiche, A.Y.; Ginsberg, M.L. A Symbolic Generalization of Probability Theory. In Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI-92, San Jose, CA, USA, 12–16 July 1992.
24. Evans, M.; Jang, G.H. Inferences from Prior-Based Loss Functions. 2011; arXiv:1104.3258.
25. Berger, J.O. In defense of the likelihood principle: Axiomatics and coherency. Bayesian Stat. 1985, 2, 33–66.
26. Madruga, M.R.; Esteves, L.G.; Wechsler, S. On the Bayesianity of Pereira-Stern tests. Test 2001, 10, 291–299.
27. Ripley, B.D. Pattern Recognition and Neural Networks; Cambridge University Press: Cambridge, UK, 1996.
28. Izbicki, R. Classes de Testes de Hipóteses. Ph.D. Thesis, University of São Paulo, São Paulo, Brazil, 2010.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
