The Logical Consistency of Simultaneous Agnostic Hypothesis Tests

Luís G. Esteves 1, Rafael Izbicki 2, Julio M. Stern 1 and Rafael B. Stern 2,* 1 Institute of Mathematics and Statistics, University of São Paulo, São Paulo 05508-090, Brazil; lesteves@ime.usp.br (L.G.E.); jstern@ime.usp.br (J.M.S.) 2 Department of Statistics, Federal University of São Carlos, São Carlos 13565-905, Brazil; rafaelizbicki@gmail.com * Correspondence: rbstern@gmail.com; Tel.: +55-16-3351-8241


Introduction
One of the practical shortcomings of simultaneous test procedures is that they can lack logical consistency [1,2]. As a result, recent papers have discussed minimum logical requirements and methods that achieve them [3-7]. For example, it has been argued that simultaneous tests ought to satisfy the following criterion: if hypothesis A implies hypothesis B, a procedure that rejects B should also reject A.
In particular, Izbicki and Esteves [3] and da Silva et al. [7] examine classical and Bayesian simultaneous tests with respect to four consistency properties. Izbicki and Esteves [3] prove that the only tests that are fully coherent are trivial tests based on point estimation, which are generally void of statistical optimality. This finding suggests that alternatives to the standard "reject versus accept" tests should be explored. One such alternative is the agnostic test [8], which can take the following decisions: (i) accept a hypothesis (decision 0); (ii) reject it (decision 1); or (iii) noncommittally neither accept nor reject it, thus abstaining or remaining agnostic about the other two actions (decision 1/2). Decision (iii) is also called a no-decision classification. The set of samples x ∈ X for which one abstains from making a decision about a given hypothesis is called a no-decision region [8]. An agnostic test enables one to explicitly deal with the difference between "accepting a hypothesis H" and "not rejecting H (remaining agnostic)". This distinction will be made clearer in Section 5, which derives agnostic tests from a Bayesian decision-theoretic standpoint by means of specific penalties for false rejection, false acceptance and excessive abstinence.
We use the above framework to revisit the logical consistency of simultaneous hypothesis tests. Section 2 defines the agnostic testing scheme (ATS), a transformation that assigns to each statistical hypothesis an agnostic test function. This definition is illustrated with Bayesian and frequentist examples, using both existing and novel agnostic tests. Section 3 generalizes the logical requirements in [3] to agnostic testing schemes. Section 4 presents tests that satisfy all of these logical requirements. Section 5 obtains, under the Bayesian decision-theoretic paradigm, necessary and sufficient conditions on loss functions to ensure that Bayes tests meet each of the logical requirements. All theorems are proved in the Appendix.

Agnostic Testing Schemes
This section describes the mathematical setup for agnostic testing schemes. Let X denote the sample space, Θ the parameter space and L_x(θ) the likelihood function at the point θ ∈ Θ generated by the data x ∈ X. We denote by D = {0, 1/2, 1} the set of all decisions that can be taken when testing a hypothesis: accept (0), remain agnostic (1/2) and reject (1). By an agnostic hypothesis test (or simply agnostic test) we mean a decision function from X to D [8,9]. Similar tests are commonly used in machine learning in the context of classification [1,2]. Moreover, let Φ = {φ | φ : X → D} be the set of all (agnostic) hypothesis tests. The following definition adapts testing schemes [3] to agnostic tests.

Definition 1 (Agnostic Testing Scheme; ATS). Let σ(Θ), a σ-field of subsets of the parameter space Θ, be the set of hypotheses to be tested. An ATS is a function L : σ(Θ) → Φ that, for each hypothesis A ∈ σ(Θ), assigns the test L(A) ∈ Φ for testing A.
A way of creating an agnostic testing scheme is to find a collection of statistics and to compare them to thresholds:

Example 1. For every A ∈ σ(Θ), let s_A : X → R be a statistic. Let c_1, c_2 ∈ R, with c_1 ≥ c_2, be fixed thresholds. For each A ∈ σ(Θ), one can define L(A) : X → D by

  L(A)(x) = 0 if s_A(x) > c_1;  1/2 if c_2 < s_A(x) ≤ c_1;  1 if s_A(x) ≤ c_2.

The ATS in Example 1 rejects a hypothesis if the value of the statistic s_A is small, accepts it if this value is large, and remains agnostic otherwise. If s_A(x) is a measure of how much evidence x brings about A, then this ATS rejects a hypothesis if the evidence brought by the data is small, accepts it if this evidence is large, and remains agnostic otherwise. The next examples present particular cases of this ATS; they will be explored in the following sections.
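The three-way decision rule of Example 1 can be sketched as follows; the function and variable names are ours, and s_A_x stands for the observed value of the statistic s_A:

```python
# Threshold-based agnostic test (sketch of Example 1): accept A when
# s_A(x) > c_1, reject when s_A(x) <= c_2, and abstain in between.

def agnostic_test(s_A_x, c1, c2):
    """Return 0 (accept), 0.5 (abstain) or 1 (reject)."""
    assert c1 >= c2, "thresholds must satisfy c1 >= c2"
    if s_A_x > c1:
        return 0      # strong support: accept A
    if s_A_x <= c2:
        return 1      # weak support: reject A
    return 0.5        # inconclusive: remain agnostic
```

With c_1 = c_2 the no-decision region is empty and the rule reduces to a standard accept/reject test.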
Example 2 (ATS based on posterior probabilities). Let Θ = R^d and σ(Θ) = B(Θ), the Borelians of R^d. Assume that a prior probability P on σ(Θ) is fixed, and let c_1, c_2 ∈ (0, 1), with c_1 ≥ c_2, be fixed thresholds. For each A ∈ σ(Θ), let L(A) : X → D be defined by

  L(A)(x) = 0 if P(A|x) > c_1;  1/2 if c_2 < P(A|x) ≤ c_1;  1 if P(A|x) ≤ c_2,

where P(·|x) is the posterior distribution of θ, given x. This is essentially the test that Ripley [10] proposed in the context of classification, which was also investigated by Babb et al. [9]. When c_1 = c_2, this ATS is a standard (non-agnostic) Bayesian testing scheme.
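A minimal sketch of Example 2 on a finite parameter space (the toy model below is our own choice, not from the paper): θ ranges over {0.2, 0.5, 0.8} with a uniform prior, data are Bernoulli(θ), and the hypothesis is A = {θ : θ ≤ 0.5}.

```python
# Posterior-probability ATS (sketch of Example 2) on a discrete Theta.

def posterior(thetas, prior, x):
    """Posterior over a finite parameter space given Bernoulli data x."""
    lik = []
    for t in thetas:
        l = 1.0
        for xi in x:
            l *= t if xi == 1 else (1 - t)
        lik.append(l)
    joint = [p * l for p, l in zip(prior, lik)]
    z = sum(joint)
    return [j / z for j in joint]

def posterior_ats(thetas, prior, x, A, c1, c2):
    """Accept A (0) if P(A|x) > c1, reject (1) if P(A|x) <= c2, else abstain."""
    post = posterior(thetas, prior, x)
    pA = sum(p for t, p in zip(thetas, post) if t in A)
    if pA > c1:
        return 0
    if pA <= c2:
        return 1
    return 0.5

thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]
A = {0.2, 0.5}
```

With c_1 = 0.95 and c_2 = 0.05, eight successes in a row reject A, eight failures accept it, and a balanced sample leaves the test agnostic.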
Example 3 (Likelihood Ratio Tests with fixed threshold). Let Θ = R^d and σ(Θ) = P(Θ), the power set of R^d. Let c_1, c_2 ∈ (0, 1), with c_1 ≥ c_2, be fixed thresholds. For each A ∈ σ(Θ), let

  λ_x(A) = sup_{θ∈A} L_x(θ) / sup_{θ∈Θ} L_x(θ)

be the likelihood ratio statistic for sample x ∈ X. Define L by

  L(A)(x) = 0 if λ_x(A) > c_1;  1/2 if c_2 < λ_x(A) ≤ c_1;  1 if λ_x(A) ≤ c_2.

When c_1 = c_2, this is the standard likelihood ratio with fixed threshold (non-agnostic) testing scheme [3].
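On a finite parameter space the statistic λ_x(A) is a ratio of maxima, which makes Example 3 easy to sketch (the likelihood values below are our own toy numbers):

```python
# Likelihood-ratio ATS (sketch of Example 3) on a finite parameter space.

def lr_ats(lik, A, c1, c2):
    """lik: dict theta -> L_x(theta); A: subset of its keys; c1 >= c2."""
    lam = max(lik[t] for t in A) / max(lik.values())
    if lam > c1:
        return 0      # accept A
    if lam <= c2:
        return 1      # reject A
    return 0.5        # abstain

lik = {1: 0.1, 2: 0.5, 3: 0.4}   # hypothetical likelihood values at x
```

For instance, with c_1 = 0.9 and c_2 = 0.3, the hypothesis {2} attains λ_x = 1 and is accepted, {1} attains λ_x = 0.2 and is rejected, and {3} attains λ_x = 0.8, leaving the test agnostic.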
A test similar to that of Example 3 is developed by Berg [8]; however, there the cutoffs c_1 and c_2 are allowed to change with the hypothesis of interest, and they are chosen so as to control the level of significance and the power of each of the tests.

Example 4 (FBST ATS).
Let Θ " R d , σpΘq " BpR d q, and f pθq be the prior probability density function (p.d.f.) for θ.Suppose that, for each x P X , there exists f pθ|xq, the p.d.f. of the posterior distribution of θ, given x.For each hypothesis A P σpΘq, let T A x " # θ P Θ : f pθ|xq ą sup θPA f pθ|xq + be the set tangent to the null hypothesis and let ev x pAq " 1 ´Ppθ P T A x |xq be the Pereira-Stern evidence value for A [11].Let c 1 , c 2 P p0, 1q, with c 1 ě c 2 , be fixed thresholds.One can define an ATS L by When c 1 " c 2 , this ATS reduces to the standard (non-agnostic) FBST testing scheme [3].
The following example presents a novel ATS based on region estimators.

Example 5 (Region Estimator-based ATS).
Let R : X → P(Θ) be a region estimator of θ. For every A ∈ σ(Θ) and x ∈ X, one can define an ATS L via

  L(A)(x) = 1 if R(x) ⊆ A^c;  0 if R(x) ⊆ A;  1/2 otherwise.

Hence, L(A)(x) = [I(R(x) ⊆ A^c) + I(R(x) ⊈ A)] / 2. See Figure 1 for an illustration of this procedure. Notice that, for continuous Θ, Example 5 does not accept precise (i.e., null Lebesgue measure) hypotheses, yielding either rejection or abstinence (unless region estimates are themselves precise). Therefore, the behavior of region estimator-based ATSs is in agreement with the prevailing position among both Bayesian and frequentist statisticians: accepting a precise hypothesis is inappropriate. From a Bayesian perspective, precise null hypotheses usually have zero posterior probability, and thus should not be accepted. From a frequentist perspective, not rejecting a hypothesis is not the same as accepting it. See Berger and Delampady [12] and references therein for a detailed account of the controversial problem of testing precise hypotheses.
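For finite sets, the region-estimator rule of Example 5 amounts to two subset checks (a sketch with our own helper name; R plays the role of R(x)):

```python
# Region-estimator ATS (sketch of Example 5): reject A when R(x) and A are
# disjoint (R(x) ⊆ A^c), accept when R(x) ⊆ A, abstain otherwise.

def region_ats(R, A):
    R, A = set(R), set(A)
    if R.isdisjoint(A):   # R(x) ⊆ A^c
        return 1
    if R <= A:            # R(x) ⊆ A
        return 0
    return 0.5            # R(x) overlaps both A and A^c
```

Note that a singleton hypothesis A = {θ_0} is accepted only if R(x) = {θ_0}, mirroring the remark above about precise hypotheses.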

In principle, R can be any region estimator. However, some choices of R lead to better statistical performance. For example, from a frequentist perspective, one might choose R to be a confidence region. This choice is explored in the next example. If R is a confidence region, then this ATS also controls the Family Wise Error Rate (FWER, [13]), as shown in Section 3.1.
Example 6 (Confidence-interval ATS). Let X_1, ..., X_n be a random sample from a Normal(µ, 1) distribution. In Figure 2, we illustrate how the probability of each decision, P(L(A)(X) = d | µ) for d ∈ {0, 1/2, 1}, varies as a function of µ for three hypotheses: (i) µ < 0; (ii) µ = 0; and (iii) 0 < µ < 1. We consider the standard region estimator for µ, R(X) = [X̄ ± z_{1−α/2}/√n], with α = 5%. These curves generalize the standard power function to agnostic hypothesis tests. Notice that µ = 0 is never accepted and that, under the null hypothesis, all tests have at most 5% probability of rejecting H.

Example 7 (Region Likelihood Ratio ATS). For a fixed value c ∈ (0, 1), define the region estimate R_c(x) = {θ ∈ Θ : λ_x({θ}) ≥ c}, where λ_x is the likelihood ratio statistic from Example 3. For every A ∈ σ(Θ) and x ∈ X, the ATS based on this region estimator (Example 5) satisfies L(A)(x) = 1 ⟺ A ∩ R_c(x) = ∅, and L(A)(x) = 0 ⟺ A^c ∩ R_c(x) = ∅.

Example 8 (Region FBST ATS). For a fixed value of c ∈ (0, 1), let HPD_c^x be the Highest Posterior Density region with probability 1 − c, based on observation x [3,14]. For every A ∈ σ(Θ) and x ∈ X, the ATS based on this region estimator (Example 5) satisfies L(A)(x) = 1 ⟺ A ∩ HPD_c^x = ∅, and L(A)(x) = 0 ⟺ A^c ∩ HPD_c^x = ∅.

In the sequel, we introduce four logical coherence properties for agnostic testing schemes and investigate which tests satisfy them.
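The confidence-interval construction of Example 6 can be sketched for the hypothesis H: µ < 0 as follows (the quantile z_{0.975} is hard-coded so the sketch stays stdlib-only; xbar denotes the sample mean):

```python
import math

# Region-estimator ATS for H: mu < 0 under X_i ~ N(mu, 1) (sketch of
# Example 6), using R(X) = [xbar - z/sqrt(n), xbar + z/sqrt(n)], alpha = 5%.
Z975 = 1.959963984540054  # z_{1 - 0.05/2}

def ats_mu_negative(xbar, n):
    half = Z975 / math.sqrt(n)
    lo, hi = xbar - half, xbar + half
    if lo >= 0:     # R(x) ⊆ {mu >= 0} = H^c: reject H
        return 1
    if hi < 0:      # R(x) ⊆ {mu < 0} = H: accept H
        return 0
    return 0.5      # interval straddles 0: abstain
```

For the precise hypothesis µ = 0, acceptance would require R(x) ⊆ {0}, which never happens for an interval of positive length, matching the observation above that µ = 0 is never accepted.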

Monotonicity
Monotonicity restricts the decisions that are available for nested hypotheses. If hypothesis A implies hypothesis B (i.e., A ⊆ B), then a testing scheme that rejects B should also reject A. Monotonicity has received a lot of attention in the literature (e.g., [5,6,15-19]). It can be extended to ATSs in the following way.

Definition 2 (Monotonicity). An ATS L : σ(Θ) → Φ is monotonic if, for every A, B ∈ σ(Θ) with A ⊆ B and every x ∈ X, L(A)(x) ≥ L(B)(x).

Next, we illustrate some monotonic agnostic testing schemes.
Example 9 (Tests based on posterior probabilities). The ATS from Example 2 is monotonic. Indeed, A ⊆ B implies that P(A|x) ≤ P(B|x) for all x ∈ X, and hence P(A|x) > c_i implies that P(B|x) > c_i for i = 1, 2.
Example 10 (Likelihood Ratio Tests with fixed threshold). The ATS from Example 3 is monotonic. This is because if A, B ∈ σ(Θ) are such that A ⊆ B, then sup_{θ∈A} L_x(θ) ≤ sup_{θ∈B} L_x(θ) for all x ∈ X, which implies that λ_x(A) ≤ λ_x(B). It follows that λ_x(A) > c_i implies that λ_x(B) > c_i for i = 1, 2.
Example 11 (FBST). The ATS from Example 4 is monotonic. In fact, let A, B ∈ σ(Θ) be such that A ⊆ B. We have sup_B f(θ|x) ≥ sup_A f(θ|x) for all x ∈ X. Hence, T_x^B ⊆ T_x^A and, therefore, ev_x(A) ≤ ev_x(B). It follows that ev_x(A) > c_i implies that ev_x(B) > c_i for i = 1, 2.
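The monotonicity of, say, Example 10 can be spot-checked numerically on a toy finite parameter space (our own numbers; with decisions ordered 0 < 1/2 < 1, monotonicity reads: A ⊆ B implies decision(A) ≥ decision(B)):

```python
# Numerical spot-check of monotonicity for the likelihood-ratio ATS.

def lr_ats(lik, A, c1=0.6, c2=0.3):
    lam = max(lik[t] for t in A) / max(lik.values())
    return 0 if lam > c1 else (1 if lam <= c2 else 0.5)

lik = {1: 0.05, 2: 0.25, 3: 0.7}   # hypothetical likelihood values at x
A, B = {1}, {1, 2}                 # A ⊆ B
# Here A is rejected (decision 1) while the larger B earns an abstention
# (decision 0.5), so the nested pair respects monotonicity.
```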
Notice that p-values and Bayes factors are not (coherent) measures of support for hypotheses [19,20], and therefore using them in the same fashion as in Examples 2-4 would not lead to monotonic agnostic testing schemes. On the other hand, any monotonic family of statistics (s_A) does provide a monotonic ATS because, if A ⊆ B, s_A(x) > c_i implies that s_B(x) > c_i for i = 1, 2. Another example of such a statistic is the s-value defined by Patriota [6]. As a matter of fact, every monotonic ATS is, in a sense, associated with monotonic statistics, as shown in the next theorem.
Theorem 1. Let L be an agnostic testing scheme. L is monotonic if, and only if, there exist a family of test statistics (s_A)_{A∈σ(Θ)}, s_A : X → I ⊆ R, with s_A ≤ s_B whenever A ⊆ B, A, B ∈ σ(Θ), and cutoffs c_1, c_2 ∈ I, c_1 ≥ c_2, such that, for every A ∈ σ(Θ) and x ∈ X,

  L(A)(x) = 0 if s_A(x) > c_1;  1/2 if c_2 < s_A(x) ≤ c_1;  1 if s_A(x) ≤ c_2.

Example 12 (Region Estimator). The ATS from Example 5 is monotonic: if A ⊆ B, then L(A)(x) ≥ L(B)(x), as I(R(x) ⊆ B^c) ≤ I(R(x) ⊆ A^c) and I(R(x) ⊈ B) ≤ I(R(x) ⊈ A). Because this ATS is monotonic, it also controls the Family Wise Error Rate [21].

Union Consonance
Finner and Strassburger [4] and Izbicki and Esteves [3] investigated the following logical property, named union consonance: if a (non-agnostic) testing scheme rejects each of the hypotheses A and B, it should also reject their union A ∪ B. In other words, a TS cannot accept the union while rejecting its components. In this section, we adapt the concept of union consonance to the framework of agnostic testing schemes by considering two extensions of this desideratum: weak and strong union consonance.

Definition 3 (Weak Union Consonance). An ATS L : σ(Θ) → Φ is weakly consonant with the union if, for every A, B ∈ σ(Θ) and every x ∈ X,

  L(A)(x) = 1 and L(B)(x) = 1 implies L(A ∪ B)(x) ≠ 0.

This is exactly the definition of union consonance for non-agnostic testing schemes. Notice that, according to this definition, it is possible to remain agnostic about A ∪ B while rejecting A and B.

Remark 1. Izbicki and Esteves [3] show that if a non-agnostic testing scheme L satisfies union consonance, then for every finite set of indices I and for every {A_i}_{i∈I} ⊆ σ(Θ), min{L(A_i)}_{i∈I} = 1 implies that L(⋃_{i∈I} A_i) ≠ 0. This is not the case for weakly union consonant agnostic testing schemes; we leave further details to Section 4.3.

The second definition of union consonance is more stringent than the first one:

Definition 4 (Strong Union Consonance). An ATS L : σ(Θ) → Φ is strongly consonant with the union if, for every arbitrary set of indices I, for every {A_i}_{i∈I} ⊆ σ(Θ) such that ⋃_{i∈I} A_i ∈ σ(Θ), and for every x ∈ X,

  min{L(A_i)(x)}_{i∈I} = 1 implies L(⋃_{i∈I} A_i)(x) = 1.

Definition 3 is less stringent than Definition 4 in two senses: (i) the latter imposes the (strict) rejection of a union of hypotheses whenever each of them is rejected, while the former imposes just non-acceptance (rejection or abstention) of the union in such circumstances; and (ii) in Definition 4, consonance is required to hold for every set (possibly infinite) of hypotheses, as opposed to Definition 3, which applies only to pairs of hypotheses. Notice that if an ATS is strongly consonant with the union, it is also weakly consonant with the union, and that both definitions are indeed extensions of the concept presented by Izbicki and Esteves [3].
The following examples show ATSs that are consonant with the union.

Example 13 (Tests based on posterior probabilities). Consider again Example 2 with the restriction c_1 ≥ 2c_2. If A and B are rejected after observing x ∈ X, then

  P(A ∪ B | x) ≤ P(A|x) + P(B|x) ≤ 2c_2 ≤ c_1,

and therefore A ∪ B cannot be accepted. Thus, with this restriction, that ATS is weakly consonant with the union. The restriction c_1 ≥ 2c_2 is not only sufficient to ensure weak union consonance, but it is actually necessary to ensure it holds for every prior distribution (see Theorem 2). Notice, however, that this ATS is not strongly consonant with the union in general.
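The sufficiency argument of Example 13 is a one-line union bound; a quick arithmetic check (the probabilities below are our own numbers):

```python
# If both P(A|x) and P(B|x) are at most c2, the union bound gives
# P(A ∪ B | x) <= 2*c2, so c1 >= 2*c2 rules out accepting A ∪ B.
c1, c2 = 0.6, 0.3
pA, pB = 0.25, 0.28          # both hypotheses rejected: <= c2
union_bound = pA + pB        # P(A ∪ B | x) can be at most this
assert union_bound <= 2 * c2 <= c1   # so P(A ∪ B | x) <= c1: not accepted
```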
Example 14 (Likelihood Ratio Tests with fixed threshold). The ATS of Example 3 is strongly consonant with the union. Indeed, let I be an arbitrary set of indices and {A_i}_{i∈I} ⊆ σ(Θ) be such that ⋃_{i∈I} A_i ∈ σ(Θ). For every x ∈ X, λ_x(⋃_{i∈I} A_i) = sup_{i∈I} λ_x(A_i) [3]. It follows that if λ_x(A_i) ≤ c_2 for every i ∈ I, then λ_x(⋃_{i∈I} A_i) ≤ c_2. Thus, if L rejects all hypotheses A_i after x is observed, it also rejects ⋃_{i∈I} A_i. In addition, L is also weakly consonant with the union.
Example 15 (FBST). The ATS from Example 4 is also strongly consonant with the union. Indeed, let I be an arbitrary set of indices and {A_i}_{i∈I} ⊆ σ(Θ) be such that ⋃_{i∈I} A_i ∈ σ(Θ). For every x ∈ X, ev_x(⋃_{i∈I} A_i) = sup_{i∈I} ev_x(A_i) [22]. Strong union consonance holds by the same argument as in Example 14. It follows that L is also weakly consonant with the union.
Example 16 (Region Estimator). The ATS from Example 5 satisfies strong union consonance. Indeed, let I be an arbitrary set of indices and {A_i}_{i∈I} ⊆ σ(Θ) be such that ⋃_{i∈I} A_i ∈ σ(Θ). If L(A_i)(x) = 1 for every i ∈ I, then R(x) ⊆ A_i^c for every i ∈ I. It follows that R(x) ⊆ ⋂_{i∈I} A_i^c = (⋃_{i∈I} A_i)^c and, therefore, ⋃_{i∈I} A_i is rejected. It follows that L is also weakly consonant with the union.

Intersection Consonance
The third property we investigate, named intersection consonance [3], states that a (non-agnostic) testing scheme cannot accept hypotheses A and B while rejecting their intersection. We consider two extensions of this definition to agnostic testing schemes.

Definition 5 (Weak Intersection Consonance). An ATS L : σ(Θ) → Φ is weakly consonant with the intersection if, for every A, B ∈ σ(Θ) and x ∈ X, L(A)(x) = 0 and L(B)(x) = 0 implies L(A ∩ B)(x) ≠ 1. This is exactly the definition of intersection consonance for non-agnostic testing schemes. Notice that it is possible to accept A and B while being agnostic about A ∩ B.
The second definition of intersection consonance is more stringent:

Definition 6 (Strong Intersection Consonance). An ATS L : σ(Θ) → Φ is strongly consonant with the intersection if, for every arbitrary set of indices I, for every {A_i}_{i∈I} ⊆ σ(Θ) such that ⋂_{i∈I} A_i ∈ σ(Θ), and for every x ∈ X, max{L(A_i)(x)}_{i∈I} = 0 implies L(⋂_{i∈I} A_i)(x) = 0.

As in the case of union consonance, Definition 5 is less stringent than Definition 6 in two senses: (i) the latter imposes the (strict) acceptance of an intersection of hypotheses whenever each of them is accepted, while the former imposes just non-rejection (acceptance or abstention) of the intersection in such circumstances; and (ii) in Definition 6, consonance is required to hold for every set (possibly infinite) of hypotheses, as opposed to Definition 5, which applies only to pairs of hypotheses. Notice that if an ATS is strongly consonant with the intersection, it is also weakly consonant with the intersection, and that both definitions are indeed extensions of the concept presented by Izbicki and Esteves [3].
Example 17 (Tests based on posterior probabilities). Consider Example 2 with the restriction c_2 ≤ 2c_1 − 1. If A and B are accepted when x ∈ X is sampled, then P(A|x) > c_1 and P(B|x) > c_1. By the Fréchet inequality, it follows that

  P(A ∩ B | x) ≥ P(A|x) + P(B|x) − 1 > 2c_1 − 1 ≥ c_2,

and, therefore, A ∩ B cannot be rejected. It follows that weak intersection consonance holds. The restriction c_2 ≤ 2c_1 − 1 is not only sufficient to ensure weak intersection consonance, but it is actually necessary to ensure this property holds for every prior distribution; see Theorem 2.
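The Fréchet-inequality step of Example 17 can likewise be checked arithmetically (the probabilities below are our own numbers):

```python
# With c2 <= 2*c1 - 1, accepting A and B forces
# P(A ∩ B | x) >= P(A|x) + P(B|x) - 1 > 2*c1 - 1 >= c2, so A ∩ B survives.
c1, c2 = 0.9, 0.75             # satisfies c2 <= 2*c1 - 1 = 0.8
pA, pB = 0.92, 0.95            # both hypotheses accepted: > c1
frechet_bound = pA + pB - 1    # lower bound on P(A ∩ B | x)
assert frechet_bound > 2 * c1 - 1 >= c2   # so A ∩ B cannot be rejected
```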
The ATS based on the likelihood ratio statistic from Example 3 does not satisfy intersection consonance, because there are examples in which λ_x(A ∩ B) = 0 while λ_x(A) > 0 and λ_x(B) > 0 (consider, for instance, the case in which every θ ∈ Θ has the same likelihood and A ∩ B = ∅). Similarly, the ATS based on the FBST from Example 4 is not consonant with the intersection, because there are examples in which ev_x(A ∩ B) = 0 while ev_x(A) > 0 and ev_x(B) > 0. ATSs based on region estimators, on the other hand, are consonant with the intersection.
Example 18 (Region Estimator). The ATS from Example 5 satisfies both strong and weak intersection consonance. Indeed, let I be an arbitrary set of indices and {A_i}_{i∈I} ⊆ σ(Θ) be such that ⋂_{i∈I} A_i ∈ σ(Θ). If L(A_i)(x) = 0 for every i ∈ I, then R(x) ⊆ A_i for every i ∈ I. It follows that R(x) ⊆ ⋂_{i∈I} A_i, and hence ⋂_{i∈I} A_i is accepted.
It follows that the ATSs from Examples 7 and 8 are also consonant with the intersection. Hence, it is possible to use e-values and likelihood ratio statistics to define ATSs that are consonant with the intersection.

Invertibility
Invertibility formalizes the notion of simultaneous tests free from the labels "null" and "alternative" for the hypotheses of interest; it has been suggested by several authors, especially under a Bayesian perspective [3,24,25].

Definition 7 (Invertibility). An ATS L : σ(Θ) → Φ is invertible if, for every A ∈ σ(Θ) and every x ∈ X, L(A^c)(x) = 1 − L(A)(x); that is, accepting A amounts to rejecting A^c, and one is agnostic about A if, and only if, one is agnostic about A^c.

It follows that the ATSs from Examples 7 and 8 are also invertible.

Satisfying All Properties
Is it possible to construct non-trivial agnostic testing schemes that satisfy all consistency properties simultaneously? Contrary to the case of non-agnostic testing schemes [3], the answer is yes. We next examine this question considering three desiderata: the weak desiderata (Section 4.1), the strong desiderata (Section 4.2), and the n-weak desiderata (Section 4.3).

Weak Desiderata
Definition 8 (Weakly Consistent ATS). An ATS L is said to be weakly consistent if L is monotonic (Definition 2), invertible (Definition 7), weakly consonant with the union (Definition 3), and weakly consonant with the intersection (Definition 5).

Example 23 (Tests based on posterior probabilities). Consider Example 2. Weak union consonance requires c_1 ≥ 2c_2 (Example 13), weak intersection consonance requires c_2 ≤ 2c_1 − 1 (Example 17), and invertibility requires c_2 = 1 − c_1. It follows from these relations and the fact that this ATS is monotonic (Example 9) that if c_1 > 2/3 and c_2 = 1 − c_1, then it is weakly consistent, whatever the prior distribution for θ is.
The next theorem gives necessary and sufficient conditions for agnostic tests based on posterior probabilities (with possibly different thresholds c_1^A and c_2^A for each hypothesis of interest) to satisfy each of the coherence properties.

Theorem 2. Let Θ = R^d and σ(Θ) = B(Θ), the Borelians of R^d. Let P be a prior probability measure on σ(Θ). For each A ∈ σ(Θ), let L(A) : X → D be defined by

  L(A)(x) = 0 if P(A|x) > c_1^A;  1/2 if c_2^A < P(A|x) ≤ c_1^A;  1 if P(A|x) ≤ c_2^A,

where P(·|x) is the posterior distribution of θ given x, and 0 ≤ c_2^A ≤ c_1^A ≤ 1. This is a generalization of the ATS of Example 2. Assume that the likelihood function is positive for every x ∈ X and θ ∈ Θ. Such an ATS satisfies:

It follows from Theorem 2 that if the cutoffs used in each of the tests (c_1 and c_2) are required to be the same for all hypotheses of interest, then the conditions in Example 23 are not only sufficient, but also necessary to ensure that all (weak) consistency properties hold for every prior distribution for θ.

Strong Desiderata
Definition 9 (Fully Consistent ATS). An ATS L is said to be fully consistent if L is monotonic (Definition 2), invertible (Definition 7), strongly consonant with the union (Definition 4), and strongly consonant with the intersection (Definition 6).
The following theorem shows that, under mild assumptions, the only ATSs that are fully consistent are those based on region estimators.

Theorem 3. Assume that, for every θ ∈ Θ, {θ} ∈ σ(Θ). An ATS is fully consistent if, and only if, it is a region estimator-based ATS (Example 5).
Hence, the only way to create a fully consistent ATS is to design an appropriate region estimator and apply Example 5. In particular, ATSs based on posterior probabilities (Example 2) are typically not fully consistent. It should be emphasized that when the region estimator that characterizes a fully consistent ATS L maps X to singletons of Θ, no sample point leads to abstention, as either R(x) ⊆ A or R(x) ⊆ A^c for every A ∈ σ(Θ). In such situations, region estimators reduce to point estimators, which characterize fully consistent non-agnostic TSs [3].
In the next section, we consider a desideratum for simultaneous tests that is not as strong as that of Definition 9, but is more stringent than that of Definition 8.

n-Weak Desiderata
In Sections 3.2 and 3.3, weak consonance was defined for two hypotheses only. It is, however, possible to define it for n < ∞ hypotheses:

Definition 10 (Weak n-union Consonance). An ATS L : σ(Θ) → Φ satisfies weak n-union consonance if, for every finite set of indices I with |I| ≤ n, for every {A_i}_{i∈I} ⊆ σ(Θ), and for every x ∈ X,

  min{L(A_i)(x)}_{i∈I} = 1 implies L(⋃_{i∈I} A_i)(x) ≠ 0.

Definition 11 (Weak n-intersection Consonance). An ATS L : σ(Θ) → Φ satisfies weak n-intersection consonance if, for every finite set of indices I with |I| ≤ n, for every {A_i}_{i∈I} ⊆ σ(Θ), and for every x ∈ X,

  max{L(A_i)(x)}_{i∈I} = 0 implies L(⋂_{i∈I} A_i)(x) ≠ 1.
Although in the context of non-agnostic testing schemes (union or intersection) consonance holds for n = 2 if, and only if, it holds for every n ∈ N [3], this is not the case in the agnostic setting. We hence define:

Definition 12 (n-Weakly Consistent ATS). An ATS L is said to be n-weakly consistent if L is monotonic (Definition 2), invertible (Definition 7), weakly n-union consonant (Definition 10), and weakly n-intersection consonant (Definition 11).
Example 24 (Region Estimator). The ATS from Example 5 satisfies weak n-union and weak n-intersection consonance; the argument is the same as that presented in Examples 16 and 18. It follows that this is an n-weakly consistent ATS.
Example 25 (Tests based on posterior probabilities). Consider Example 2. In order to guarantee weak n-union consonance for every prior, it is necessary and sufficient to have c_1 ≥ n·c_2. Moreover, to guarantee weak n-intersection consonance for every prior, it is necessary and sufficient to have c_2 ≤ n·c_1 − (n − 1). It follows from these conditions and Example 20 that the following restrictions are necessary and sufficient to guarantee monotonicity, n-union consonance, n-intersection consonance and invertibility: c_1 > n/(n + 1) and c_2 = 1 − c_1. Hence, these conditions are sufficient to guarantee that this ATS is n-weakly consistent. Because these conditions are also necessary, it follows that this ATS is n-weakly consistent for every n > 1 if, and only if, it remains agnostic about every hypothesis whose probability lies in (0, 1).
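The threshold conditions of Example 25 can be checked for a concrete n (the case n = 3 below is our own choice):

```python
# c1 > n/(n+1) together with c2 = 1 - c1 (invertibility) implies both
# c1 >= n*c2 (weak n-union consonance) and c2 <= n*c1 - (n - 1)
# (weak n-intersection consonance).
n = 3
c1 = 0.8                # > n/(n+1) = 0.75
c2 = 1 - c1             # = 0.2
assert c1 > n / (n + 1)
assert c1 >= n * c2                  # weak n-union consonance condition
assert c2 <= n * c1 - (n - 1)        # weak n-intersection consonance condition
```

As n grows, c_1 → 1 and c_2 → 0, which is the "agnostic about every hypothesis with probability in (0, 1)" limit described above.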

Decision-Theoretic Perspective
In this section, we investigate agnostic testing schemes from a Bayesian decision-theoretic perspective. First, we define an ATS generated by a family of loss functions. Note that, in the context of agnostic tests, a loss function is a function L : D × Θ → R that assigns to each decision d ∈ {0, 1/2, 1} and each θ ∈ Θ the loss L(d, θ).
Definition 13 (ATS generated by a family of loss functions). Let (X × Θ, σ(X × Θ), P) be a Bayesian statistical model. Let (L_A)_{A∈σ(Θ)} be a family of loss functions, where L_A : D × Θ → R is the loss function to be used to test A ∈ σ(Θ). An ATS generated by the family of loss functions (L_A)_{A∈σ(Θ)} is any ATS L defined over the elements of σ(Θ) such that, for every A ∈ σ(Θ), L(A) is a Bayes test for hypothesis A against L_A.
Example 26 (Bayesian ATS generated by a family of error-wise constant loss functions). For A ∈ σ(Θ), consider a loss function L_A of the form of Table 1, where all entries are assumed to be non-negative. This is a generalization of the standard 0-1-c loss functions to agnostic tests, in the sense that it penalizes not only false acceptance and false rejection with constant losses b_A and d_A, respectively, but also an eventual abstention from deciding between accepting and rejecting A, with the values a_A and c_A.
A Bayes test under this loss accepts A when the posterior risk of acceptance is smallest, rejects A when the posterior risk of rejection is smallest, and remains agnostic otherwise. It follows that an ATS generated by the family of loss functions (L_A)_{A∈σ(Θ)} compares P(A|x) to thresholds determined by a_A, b_A, c_A and d_A. In particular, when a_A = a_B, b_A = b_B, c_A = c_B and d_A = d_B for all A, B ∈ σ(Θ), this ATS matches that of Example 2 for particular values of c_1 and c_2.
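A sketch of this Bayes test; we read off from the description of Example 26 that a is the abstention loss when A is true, b the false-acceptance loss, c the abstention loss when A is false, and d the false-rejection loss (this role assignment is our reading of Table 1):

```python
# Bayes agnostic test under error-wise constant losses: given P(A|x) = pA,
# compare the posterior risks of the three decisions and pick the smallest.

def bayes_agnostic_test(pA, a, b, c, d):
    risk = {
        0:   b * (1 - pA),            # accept A: loses b if A is false
        0.5: a * pA + c * (1 - pA),   # abstain: loses a or c
        1:   d * pA,                  # reject A: loses d if A is true
    }
    return min(risk, key=risk.get)
```

With proper losses (0 < a < d/2 and 0 < c < b/2, Example 27), high posterior probabilities accept, low ones reject, and middling ones abstain, recovering the threshold structure of Example 2.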

We restrict our attention to ATSs generated by proper losses, a concept we adapt from [3] to agnostic tests:

Definition 14 (Proper losses). A family of loss functions (L_A)_{A∈σ(Θ)} is proper if, for every A ∈ σ(Θ), the loss of a correct decision is smaller than that of remaining agnostic, the loss of remaining agnostic is smaller than that of a wrong decision, and L_A(1/2, θ) < [L_A(0, θ) + L_A(1, θ)]/2, for all θ.

Definition 14 states that: (i) by taking a correct decision we lose less than by taking a wrong decision; (ii) by remaining agnostic we do not lose as much as when taking a wrong decision, but we lose more than by taking a correct decision; and (iii) it is better to remain agnostic about A than to flip a coin to decide whether to reject or accept this hypothesis.
Example 27 (Bayesian ATS generated by a family of error-wise constant loss functions). In order to ensure that the losses in Example 26 are proper, the following restrictions must be satisfied: 0 < a_A < d_A/2 and 0 < c_A < b_A/2. In particular, these conditions imply those stated in Example 26.

Monotonicity
We now turn our attention to characterizing Bayesian monotonic ATSs using a decision-theoretic framework. In order to do so, we first adapt the concept of relative losses [3] to the context of agnostic testing schemes.

Definition 15 (Relative Loss). Let L_A be a loss function for testing hypothesis A. The relative losses r_A^(1,1/2) : Θ → R and r_A^(1/2,0) : Θ → R are defined by r_A^(1,1/2)(θ) = L_A(1, θ) − L_A(1/2, θ) and r_A^(1/2,0)(θ) = L_A(1/2, θ) − L_A(0, θ).

The relative losses thus measure the difference between the losses of rejecting a given hypothesis and remaining agnostic about it, as well as the difference between the losses of remaining agnostic and accepting it. In order to guarantee that a Bayesian ATS is monotonic, certain constraints on the relative losses must be imposed. The next definition presents one such assumption, which we interpret in the sequel.

Definition 16 (Monotonic Relative Loss). Let D²_> = {(1, 1/2), (1/2, 0)}. (L_A)_{A∈σ(Θ)} has monotonic relative losses if the family (L_A)_{A∈σ(Θ)} is proper and, for all A, B ∈ σ(Θ) such that A ⊂ B and for all (i, j) ∈ D²_>,

  r_B^(i,j)(θ) ≥ r_A^(i,j)(θ), ∀θ ∈ Θ.

Let A, B ∈ σ(Θ) with A ⊆ B. If θ ∈ A, both A and B are true, so (L_A)_{A∈σ(Θ)} having monotonic relative losses reflects the situation in which the rougher error of rejecting B, compared to rejecting A (with respect to remaining agnostic about these hypotheses), should be assigned a larger relative loss. Similarly, the rougher error of remaining agnostic about B should be assigned a larger relative loss than that of remaining agnostic about A (with respect to correctly accepting these hypotheses). If θ ∈ B but θ ∉ A, these conditions are a consequence of the assumption that the family (L_A)_{A∈σ(Θ)} is proper. The case θ ∉ B can be interpreted in a similar fashion to the case θ ∈ A.
The following example presents necessary and sufficient conditions to ensure that the loss functions from Example 26 yield monotonic relative losses.
Example 28. Consider the losses presented in Example 26. Assuming the losses are proper (see Example 27), the conditions required to ensure that (L_A)_{A∈σ(Θ)} has monotonic relative losses are, for A ⊂ B, a_A ≤ a_B, c_B ≤ c_A, d_B − a_B ≥ d_A − a_A, and b_A − c_A ≥ b_B − c_B. Notice that these restrictions imply that b_A ≥ b_B. As a particular example, let k > 2 and λ be a finite measure on σ(Θ) with λ(Θ) > 0. The following assignments yield a proper and monotonic loss: for every A ∈ σ(Θ), b_A = λ(A^c), a_A = λ(A)/k, c_A = λ(A^c)/k, and d_A = λ(A). Another particular case is when a_A = a_B, b_A = b_B, c_A = c_B, and d_A = d_B for every A, B ∈ σ(Θ).
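The particular assignment of Example 28 can be spot-checked numerically; below, λ is taken to be the counting measure on a finite Θ with k = 3 (our choices), and the relative-loss formulas for error-wise constant losses follow from Definition 15:

```python
# Spot-check of Example 28: b_A = |A^c|, a_A = |A|/k, c_A = |A^c|/k, d_A = |A|.
k = 3
Theta = set(range(10))

def losses(A):
    Ac = Theta - A
    return {"a": len(A) / k, "b": len(Ac), "c": len(Ac) / k, "d": len(A)}

def relative_losses(L, theta, A):
    """(r^(1,1/2)(theta), r^(1/2,0)(theta)) for error-wise constant losses."""
    if theta in A:                       # A true: reject = d, abstain = a, accept = 0
        return (L["d"] - L["a"], L["a"])
    return (-L["c"], L["c"] - L["b"])    # A false: reject = 0, abstain = c, accept = b

A, B = set(range(3)), set(range(6))      # A ⊂ B
LA, LB = losses(A), losses(B)
for theta in Theta:
    rA, rB = relative_losses(LA, theta, A), relative_losses(LB, theta, B)
    assert rB[0] >= rA[0] and rB[1] >= rA[1]   # monotonic relative losses hold
```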
Another concept that helps us characterize Bayesian monotonic agnostic testing schemes is that of balanced relative losses, which we adapt from [7].
Definition 17 (Balanced Relative Loss). (L_A)_{A∈σ(Θ)} has balanced relative losses if, for all A, B ∈ σ(Θ) such that A ⊂ B, for all θ_1 ∈ A and θ_2 ∈ B^c, and for all (i, j) ∈ D²_>, the relative losses r_A^(i,j) and r_B^(i,j) at θ_1 and θ_2 satisfy the balance condition of [7].

Lemma 1. If (L_A)_{A∈σ(Θ)} has monotonic relative losses, then (L_A)_{A∈σ(Θ)} has balanced relative losses.
The following result shows that balanced relative losses characterize Bayesian monotonic ATSs.

Theorem 4. Let (L_A)_{A∈σ(Θ)} be a family of proper loss functions. Assume that, for every θ ∈ Θ and x ∈ X, L_x(θ) > 0. For every prior π for θ, let L_π denote a Bayesian ATS generated by (L_A)_{A∈σ(Θ)}. There exists a monotonic L_π for every prior π if, and only if, (L_A)_{A∈σ(Θ)} has balanced relative losses.
Example 29. In Example 28, we obtained conditions on the loss functions (L_A)_{A∈σ(Θ)} from Example 26 that guarantee the family has monotonic relative losses. From Lemma 1 and Theorem 4, it follows that such families of loss functions yield monotonic Bayesian ATSs whatever the prior for θ is. In other words, there are families of loss functions that induce monotonic tests based on posterior probabilities.

Union Consonance
We now turn our attention to characterizing union consonant Bayesian ATSs using a decision-theoretic framework.

Definition 18. (L_A)_{A∈σ(Θ)} is compatible with weak union consonance if there exist no A, B ∈ σ(Θ), θ_1, θ_2, θ_3 ∈ Θ and p_1, p_2, p_3 ≥ 0 with p_1 + p_2 + p_3 = 1 such that

  Σ_{i=1}^{3} p_i · r_A^(1,1/2)(θ_i) < 0,  Σ_{i=1}^{3} p_i · r_B^(1,1/2)(θ_i) < 0, and  Σ_{i=1}^{3} p_i · r_{A∪B}^(1/2,0)(θ_i) > 0.

Definition 18 states that a family of loss functions (L_A)_{A∈σ(Θ)} compatible with weak union consonance cannot induce a Bayesian ATS under which one may prefer rejecting both hypotheses A and B over remaining agnostic about them, while accepting A ∪ B rather than abstaining.
As the next theorem shows, proper loss functions compatible with weak union consonance characterize Bayesian ATSs that are weakly consonant with the union.

Theorem 5. Let (L_A)_{A∈σ(Θ)} be a family of proper loss functions. Assume that, for every θ ∈ Θ and x ∈ X, L_x(θ) > 0. For every prior π for θ, let L_π denote a Bayesian ATS generated by (L_A)_{A∈σ(Θ)}. There exists an ATS L_π that is weakly consonant with the union for every prior π if, and only if, (L_A)_{A∈σ(Θ)} is compatible with weak union consonance.
Example 30. We saw that the ATS from Example 2 is a Bayes test against a particular proper loss (Examples 26 and 27) and that it is weakly consonant with the union (Example 13). It follows from Theorem 5 that the family of loss functions that leads to this ATS is compatible with weak union consonance.
Definition 19 (Union consonance-balanced relative losses [7]). $(L_A)_{A\in\sigma(\Theta)}$ has union consonance-balanced relative losses if, for every $A, B\in\sigma(\Theta)$, $\theta_1\in A\cup B$ and $\theta_2\in (A\cup B)^c$, there exists $C\in\{A,B\}$ such that
$$\frac{r^{(\frac{1}{2},0)}_{A\cup B}(\theta_1)}{r^{(\frac{1}{2},0)}_{A\cup B}(\theta_2)} \;\ge\; \frac{r^{(1,\frac{1}{2})}_{C}(\theta_1)}{r^{(1,\frac{1}{2})}_{C}(\theta_2)}$$

Corollary 1. Let $(L_A)_{A\in\sigma(\Theta)}$ be a family of proper loss functions. Assume that, for every $\theta\in\Theta$ and $x\in\mathcal{X}$, $L_x(\theta) > 0$. If $(L_A)_{A\in\sigma(\Theta)}$ does not have union consonance-balanced relative losses, then there exists a prior $\pi$ such that every Bayesian ATS, $L_\pi$, is not weakly consonant with the union.

Intersection Consonance
Next, we characterize intersection consonant Bayesian ATSs under a Bayesian perspective.

Definition 20. $(L_A)_{A\in\sigma(\Theta)}$ is compatible with weak intersection consonance if there exist no $A, B\in\sigma(\Theta)$, $\theta_1,\theta_2,\theta_3\in\Theta$ and $p_1,p_2,p_3\ge 0$ such that $p_1+p_2+p_3 = 1$ and
$$\begin{cases} p_1\, r^{(\frac{1}{2},0)}_{A}(\theta_1) + p_2\, r^{(\frac{1}{2},0)}_{A}(\theta_2) + p_3\, r^{(\frac{1}{2},0)}_{A}(\theta_3) > 0\\ p_1\, r^{(\frac{1}{2},0)}_{B}(\theta_1) + p_2\, r^{(\frac{1}{2},0)}_{B}(\theta_2) + p_3\, r^{(\frac{1}{2},0)}_{B}(\theta_3) > 0\\ p_1\, r^{(1,\frac{1}{2})}_{A\cap B}(\theta_1) + p_2\, r^{(1,\frac{1}{2})}_{A\cap B}(\theta_2) + p_3\, r^{(1,\frac{1}{2})}_{A\cap B}(\theta_3) < 0 \end{cases}$$
Definition 20 states that a family of loss functions $(L_A)_{A\in\sigma(\Theta)}$ that is compatible with weak intersection consonance cannot induce any Bayesian ATS on the basis of which one may prefer accepting both hypotheses $A$ and $B$ to remaining agnostic about them, while rejecting $A\cap B$ rather than abstaining.
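The non-existence condition above can be checked numerically on a finite parameter space. The Python sketch below uses a hypothetical loss family (0–1 losses with a constant abstention cost of 0.3, an assumption for illustration only) and searches for three parameter points and mixture weights under which accepting both $A$ and $B$ beats abstaining while rejecting $A\cap B$ also beats abstaining.

```python
from itertools import product

# Sketch (illustrative, finite parameter space): search for a violation of
# weak-intersection-consonance compatibility. We look for theta_1, theta_2,
# theta_3 and weights p1 + p2 + p3 = 1 under which accepting A and accepting
# B both beat abstaining, while rejecting A-intersect-B also beats
# abstaining. The losses below are assumed for illustration.

Theta = [0, 1, 2, 3]
A, B = {0, 1}, {1, 2}
AB = A & B

def loss(H, d, theta):
    """Assumed loss for 'theta in H': 0/1 errors, abstaining costs 0.3."""
    in_H = theta in H
    if d == 0.5:
        return 0.3
    if d == 0:                        # accept
        return 0.0 if in_H else 1.0
    return 1.0 if in_H else 0.0       # reject

def rel(H, i, j, mix):  # expected relative loss E[L_H(i) - L_H(j)] under mix
    return sum(p * (loss(H, i, t) - loss(H, j, t)) for p, t in mix)

grid = [k / 10 for k in range(11)]
violation = False
for t1, t2, t3 in product(Theta, repeat=3):
    for p1, p2 in product(grid, repeat=2):
        p3 = 1 - p1 - p2
        if p3 < 0:
            continue
        mix = [(p1, t1), (p2, t2), (p3, t3)]
        if (rel(A, 0.5, 0, mix) > 0 and rel(B, 0.5, 0, mix) > 0
                and rel(AB, 1, 0.5, mix) < 0):
            violation = True
print("violation found:", violation)
```

For this particular loss family no violation can occur: the three preferences would require $P(A) > 0.7$, $P(B) > 0.7$ and $P(A\cap B) < 0.3$ under the mixture, contradicting $P(A\cap B)\ge P(A)+P(B)-1$.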
As we will see in the next theorem, proper loss functions compatible with weak intersection consonance characterize Bayesian ATSs that are weakly consonant with the intersection.

Theorem 6. Let $(L_A)_{A\in\sigma(\Theta)}$ be a family of proper loss functions. Assume that, for every $\theta\in\Theta$ and $x\in\mathcal{X}$, $L_x(\theta) > 0$. For every prior $\pi$ for $\theta$, let $L_\pi$ denote a Bayesian ATS generated by $(L_A)_{A\in\sigma(\Theta)}$. There exists an ATS $L_\pi$ that is weakly consonant with the intersection for every prior $\pi$ if, and only if, $(L_A)_{A\in\sigma(\Theta)}$ is compatible with weak intersection consonance.
Example 31. We saw that the ATS from Example 2 is a Bayes test against a particular proper loss (Examples 26 and 27) and that it is weakly consonant with the intersection (Example 17). It follows from Theorem 6 that the family of loss functions that leads to this ATS is compatible with weak intersection consonance.
Definition 21 (Intersection consonance-balanced relative losses [7]). $(L_A)_{A\in\sigma(\Theta)}$ has intersection consonance-balanced relative losses if, for every $A, B\in\sigma(\Theta)$, $\theta_1\in A\cap B$ and $\theta_2\in (A\cap B)^c$, there exists $C\in\{A,B\}$ such that
$$\frac{r^{(1,\frac{1}{2})}_{A\cap B}(\theta_1)}{r^{(1,\frac{1}{2})}_{A\cap B}(\theta_2)} \;\le\; \frac{r^{(\frac{1}{2},0)}_{C}(\theta_1)}{r^{(\frac{1}{2},0)}_{C}(\theta_2)}$$

Corollary 2. Let $(L_A)_{A\in\sigma(\Theta)}$ be a family of proper loss functions. Assume that, for every $\theta\in\Theta$ and $x\in\mathcal{X}$, $L_x(\theta) > 0$. If $(L_A)_{A\in\sigma(\Theta)}$ does not have intersection consonance-balanced relative losses, then there exists a prior $\pi$ such that every Bayesian ATS, $L_\pi$, is not weakly consonant with the intersection.
We end this section by noting that although we focused our results on weak consonance, they can be extended to strong consonance using the same techniques presented in the Appendix.

Invertibility
Finally, we examine invertible Bayesian ATSs from a decision-theoretic standpoint.
Definition 22 (Invertible Relative Losses). $(L_A)_{A\in\sigma(\Theta)}$ has invertible relative losses if, for every $A\in\sigma(\Theta)$, all $\theta_1\in A$, $\theta_2\in A^c$ and $(i,j)\in D^2_>$,
$$\frac{r^{(i,j)}_{A}(\theta_1)}{r^{(i,j)}_{A}(\theta_2)} \;=\; \frac{r^{(1-j,1-i)}_{A^c}(\theta_1)}{r^{(1-j,1-i)}_{A^c}(\theta_2)}$$
We end this section by showing that invertible Bayesian ATSs are determined by families of loss functions that fulfill the conditions of Definition 22.
Theorem 7. Let $(L_A)_{A\in\sigma(\Theta)}$ be a family of proper loss functions. Assume that, for every $\theta\in\Theta$ and $x\in\mathcal{X}$, $L_x(\theta) > 0$. For every prior $\pi$ for $\theta$, let $L_\pi$ denote a Bayesian ATS generated by $(L_A)_{A\in\sigma(\Theta)}$. There exists an ATS $L_\pi$ that is invertible for every prior $\pi$ if, and only if, $(L_A)_{A\in\sigma(\Theta)}$ has invertible relative losses.
Example 32. For every $A\in\sigma(\Theta)$, let $(L_A)_{A\in\sigma(\Theta)}$ be such that $L_A(1,\theta) = L_{A^c}(0,\theta)$ and $L_A(\frac{1}{2},\theta) = L_{A^c}(\frac{1}{2},\theta)$. It is easily seen that the conditions from Definition 22 hold. Theorem 7 then implies that any Bayesian ATS generated by $(L_A)_{A\in\sigma(\Theta)}$ is invertible.
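The mirrored-loss construction of Example 32 can be checked numerically. In the Python sketch below, the loss tables are assumed (and deliberately asymmetric, so the check is non-trivial); under the mirroring $L_{A^c}(0,\cdot) = L_A(1,\cdot)$ and $L_{A^c}(\frac12,\cdot) = L_A(\frac12,\cdot)$, the Bayes decision for $A^c$ always equals one minus the Bayes decision for $A$.

```python
# Sketch: with mirrored losses L_{A^c}(0, .) = L_A(1, .) and matching
# agnostic losses, the Bayes decision for A^c is one minus the Bayes
# decision for A. The loss values here are assumed for illustration.

def bayes_decision(post_in_H, losses):
    """losses: dict decision -> (loss if theta in H, loss if theta not in H)."""
    def expected(d):
        lin, lout = losses[d]
        return post_in_H * lin + (1 - post_in_H) * lout
    return min([0, 0.5, 1], key=expected)

# Assumed loss for hypothesis A (asymmetric on purpose).
L_A = {0: (0.0, 2.0), 0.5: (0.4, 0.4), 1: (1.0, 0.0)}
# Mirrored loss for A^c: L_{A^c}(0, theta) = L_A(1, theta), etc. Note that
# "theta in A^c" means "theta not in A", so the tuple entries swap sides.
L_Ac = {0: (0.0, 1.0), 0.5: (0.4, 0.4), 1: (2.0, 0.0)}

results = []
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:          # posterior probability of A
    dA = bayes_decision(p, L_A)
    dAc = bayes_decision(1 - p, L_Ac)        # posterior probability of A^c
    results.append(dAc == 1 - dA)
print("invertible at all checked posteriors:", all(results))
```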

Final Remarks
Agnostic tests allow one to explicitly capture the difference between "not rejecting" and "accepting" a null hypothesis. When the agnostic decision is chosen, the null hypothesis is neither rejected nor accepted. This possibility aligns with the idea that, although precise null hypotheses can be tested, they should not be accepted. This idea is embodied by the region-based agnostic tests derived in this paper, which can either remain agnostic about or reject precise null hypotheses.
This distinction provides a solution to the problem raised by Izbicki and Esteves [3], in which all (non-agnostic) logically coherent tests were shown to be based on point estimators, which lack statistical optimality. We show that agnostic tests based on region estimators satisfy logical consistency and also allow statistical optimality. For example, agnostic tests based on frequentist confidence intervals control the familywise error rate. Similarly, agnostic tests based on posterior density regions are shown to be an extension of the Full Bayesian Significance Test [11].
Future research includes investigating the consequences and generalizations of the logical requirements in this paper. For example, one could study what kinds of trivariate logic derive from the different definitions of logical consistency studied in this paper. One could also generalize these logical requirements to generalized agnostic tests, in which one can decide among different degrees of agnosticism; the scale of such degrees can be either discrete or continuous. One could also investigate region estimator-based ATSs with respect to other optimality criteria, such as statistical power.
The results of this paper can also be tied to the philosophical literature that studies the consequences and importance of precise hypotheses. Agnostic tests can be used to revisit the role of testing precise hypotheses in science. They also provide a framework for interpreting the scientific meaning of measures of possibility or significance of precise hypotheses.
Let $\mathcal{G}_i = \{g_i^{-1}(\{j\}) : j\in g_i(\Theta)\}$. Observe that $\mathcal{G}_i$ is a finite partition of $\Theta$. Let $\mathcal{G}^*$ be the coarsest partition that is finer than every $\mathcal{G}_i$. Since every $\mathcal{G}_i$ is finite, $\mathcal{G}^*$ is finite. Let $h:\mathcal{G}^*\to\Theta$ be such that $h(G)\in G$. Define $P^*:\sigma(\Theta)\to\mathbb{R}$ by $P^*(A) = \sum_{G\in\mathcal{G}^*} P(G)\,\mathbb{I}_A(h(G))$. $P^*$ is such that $P^*(\{h(G)\}) = P(G)$ for every $G\in\mathcal{G}^*$ and $P^*(h(\mathcal{G}^*)) = 1$, where $h(\mathcal{G}^*)$ is a finite subset of $\Theta$. Also, conclude from the definition of $\mathcal{G}^*$ and Equation (A3) that Equation (A4) holds. Conclude from Equations (A2) and (A4) that
$$\int f_i\, dP^* > 0, \quad i = 1,\ldots,m$$

Lemma A3. Let $\{i\}\in\sigma(\Theta)$ for every $i\in\Theta$, and let $f_1,\ldots,f_m$ be bounded $\sigma(\Theta)/\mathbb{R}$-measurable functions. If there exists a probability $P$ on $\sigma(\Theta)$ such that $P$ has finite support and, for all $1\le i\le m$, $\int f_i\,dP > 0$, then there exists a probability $P^*$ with support of size at most $m$ such that, for all $1\le i\le m$, $\int f_i\,dP^* > 0$.
Proof of Lemma A3. Let $\epsilon_i > 0$ be such that $\int f_i\,dP \ge \epsilon_i$, $i = 1,\ldots,m$. Let $\Theta_P$ denote the support of $P$, and let $\theta_1,\ldots,\theta_{|\Theta_P|}$ be an ordering of the elements of $\Theta_P$. Let $F$ be an $m\times|\Theta_P|$ matrix such that $F_{i,j} = f_i(\theta_j)$, and let $p\in\mathbb{R}^{|\Theta_P|}$ be such that $p_j = P(\{\theta_j\})$, $j = 1,\ldots,|\Theta_P|$. Observe that
$$Fp \ge \epsilon; \quad p \ge 0$$
Therefore, the set $C = \{p^*\in\mathbb{R}^{|\Theta_P|} : p^*\ge 0,\; Fp^*\ge\epsilon\}$ is a non-empty polyhedron. Conclude that there exists a vertex $p^*\in C$ such that $|\{i : p^*_i = 0\}| \ge |\Theta_P| - m$. Define $P^*(\{\theta_i\}) = p^*_i\big/\sum_j p^*_j$.

Theorem A1. Let $\{i\}\in\sigma(\Theta)$ for every $i\in\Theta$, and let $f_1,\ldots,f_m$ be bounded $\sigma(\Theta)/\mathbb{R}$-measurable functions. There exists a probability $P$ on $\sigma(\Theta)$ such that, for all $1\le i\le m$, $\int f_i\,dP > 0$ if, and only if, there exists a probability $P^*$ with support of size at most $m$ such that, for all $1\le i\le m$, $\int f_i\,dP^* > 0$.
Proof. The result follows directly from Lemmas A2 and A3.
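The support-reduction step behind Lemma A3 and Theorem A1 can be illustrated numerically. Starting from a five-point probability $P$ with $\int f_i\,dP > 0$ for $m = 2$ functions, the brute-force search below (a toy stand-in for taking a vertex of the polyhedron) finds a mixture over at most two support points that preserves the strict inequalities. The support points and test functions are assumed for illustration.

```python
from itertools import combinations_with_replacement

# Sketch (numerical illustration): if a finitely supported P satisfies
# integral(f_i dP) > 0 for m functions, there is a P* supported on at most
# m points with the same strict inequalities. Here m = 2, and we search
# over pairs of support points and mixture weights by brute force.

support = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.2] * 5                               # the original P
fs = [lambda t: t + 0.1, lambda t: t * t - 1.0]   # m = 2 test functions

# The original integrals are strictly positive:
orig = [sum(w * f(t) for w, t in zip(weights, support)) for f in fs]
print("original integrals:", orig)

found = None
for a, b in combinations_with_replacement(support, 2):
    for k in range(101):
        p = k / 100
        if all(p * f(a) + (1 - p) * f(b) > 0 for f in fs):
            found = (a, b, p)                     # P*({a}) = p, P*({b}) = 1 - p
            break
    if found:
        break
print("reduced support:", found)
```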
Proof of Lemma A4. The proof follows directly from the monotonicity of conditional expectation.
Proof of Lemma 1. Let $A\subset B$, $\theta_1\in A$, $\theta_2\in B^c$ and $(i,j)\in D^2_>$. Since $(L_A)_{A\in\sigma(\Theta)}$ has proper and monotonic relative losses,
$$r^{(i,j)}_B(\theta_1) \ge r^{(i,j)}_A(\theta_1) > 0, \qquad r^{(i,j)}_A(\theta_2) \le r^{(i,j)}_B(\theta_2) < 0$$
Conclude that $(L_A)_{A\in\sigma(\Theta)}$ has balanced relative losses.

Lemma A5. Let $(L_A)_{A\in\sigma(\Theta)}$ have proper losses, let $L_A$ be bounded for every $A\in\sigma(\Theta)$, and let $L_x(\theta) > 0$ for every $\theta\in\Theta$ and $x\in\mathcal{X}$. There exists a prior for $\theta$ such that, for some $A\subset B$, $(i,j)\in D^2_>$ and $x\in\mathcal{X}$, $E[L_B(i,\theta)|x] < E[L_B(j,\theta)|x]$ and $E[L_A(i,\theta)|x] > E[L_A(j,\theta)|x]$ if, and only if, $(L_A)_{A\in\sigma(\Theta)}$ does not have balanced relative losses.
Proof of Lemma A5. Since $L_x(\theta) > 0$, the space of posteriors is exactly the space of priors over $\sigma(\Theta)$ [3]. Therefore, there exists a prior such that $E[L_B(i,\theta)|x] < E[L_B(j,\theta)|x]$ and $E[L_A(i,\theta)|x] > E[L_A(j,\theta)|x]$ if, and only if, there exists $P$ such that
$$\int -r^{(i,j)}_B\, dP > 0 \quad\text{and}\quad \int r^{(i,j)}_A\, dP > 0$$
It follows from Theorem A1 that there exists such a $P$ if, and only if, there exist $\theta_1,\theta_2\in\Theta$ and $p\in[0,1]$ such that
$$\begin{cases} p\, r^{(i,j)}_B(\theta_1) + (1-p)\, r^{(i,j)}_B(\theta_2) < 0\\ p\, r^{(i,j)}_A(\theta_1) + (1-p)\, r^{(i,j)}_A(\theta_2) > 0 \end{cases}$$
Since $(L_A)_{A\in\sigma(\Theta)}$ has proper losses, the above condition is satisfied if, and only if, $p\in(0,1)$, that is, if, and only if, $(L_A)_{A\in\sigma(\Theta)}$ does not have balanced relative losses.
Proof of Theorem 4. Assume that $(L_A)_{A\in\sigma(\Theta)}$ has balanced relative losses. Let $P_\theta$ be an arbitrary prior and $A, B$ be arbitrary sets such that $A\subset B$. It follows from Lemma A5 that, for every $(i,j)\in D^2_>$, it cannot be the case that $E[L_B(i,\theta)|x] < E[L_B(j,\theta)|x]$ and $E[L_A(i,\theta)|x] > E[L_A(j,\theta)|x]$. Conclude from Lemma A4 that there exists a monotonic Bayesian ATS.

Assume that $(L_A)_{A\in\sigma(\Theta)}$ does not have balanced relative losses. It follows from Lemma A5 that there exist a prior $P_\theta$, sets $A\subset B$, $(i,j)\in D^2_>$ and $x\in\mathcal{X}$ such that $E[L_B(i,\theta)|x] < E[L_B(j,\theta)|x]$ and $E[L_A(i,\theta)|x] > E[L_A(j,\theta)|x]$. Conclude from Lemma A4 that, for every Bayesian ATS, $L_{P_\theta}(A)(x) \le j < i \le L_{P_\theta}(B)(x)$. Therefore, there exists no monotonic Bayesian ATS against $P_\theta$.
Proof of Theorem 5. The proof follows directly from Theorem A1 and Lemma A4.

Figure 1. Agnostic test based on the region estimate $R(x)$ from Example 5.

Example 6. From a frequentist perspective, one might choose $R$ in Example 5 to be a confidence region: if the region estimator has confidence at least $1-\alpha$, then the type I error probability, $\sup_{\theta\in A} P(L(A)(X) = 1\mid\theta)$, is smaller than $\alpha$ for each of the hypothesis tests. Indeed,
$$\sup_{\theta\in A} P(L(A)(X) = 1\mid\theta) = \sup_{\theta\in A} P(\theta_1\notin R(X) \text{ for every } \theta_1\in A\mid\theta) \le \sup_{\theta\in A} P(\theta\notin R(X)\mid\theta) \le \alpha$$
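This error bound can be checked by simulation. The Python sketch below uses an assumed setting (observations from $N(\mu, 1)$, the one-sided hypothesis $\mu \le 0$, and a standard 95% confidence interval as the region estimate) and estimates the rejection probability at the boundary point $\mu = 0$; the estimate stays well below $\alpha = 0.05$.

```python
import random
import statistics

# Sketch: Monte Carlo check (with assumed settings) that the region-based
# agnostic test rejects a true hypothesis with probability at most alpha.
# Hypothesis A: mu <= 0. Region estimate: a 95% confidence interval for mu.

random.seed(0)
alpha, n, reps = 0.05, 25, 20000
z = 1.96                      # approximate 97.5% standard normal quantile
mu_true = 0.0                 # boundary point of A, worst case for type I error

decisions = {0: 0, 0.5: 0, 1: 0}
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu_true, 1.0) for _ in range(n))
    lo, hi = xbar - z / n ** 0.5, xbar + z / n ** 0.5
    if lo > 0:                # interval inside A^c: reject A
        decisions[1] += 1
    elif hi <= 0:             # interval inside A: accept A
        decisions[0] += 1
    else:                     # interval straddles 0: remain agnostic
        decisions[0.5] += 1

print("rejection rate:", decisions[1] / reps)   # should be at most ~alpha
```

At the boundary, a rejection requires the whole interval to fall above zero, which happens with probability about $\alpha/2$ here; the bound $\alpha$ from the display above is therefore comfortably respected.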

Figure 2. Illustrations of the performance of the agnostic region testing scheme (Example 5) for three different hypotheses (specified at the top of each picture). The pictures present the probability of each decision, $P(L(A)(X) = d\mid\mu)$ for $d\in\{0, \frac{1}{2}, 1\}$, as a function of the mean $\mu$.

Example 22 (Region Estimator). The ATS from Example 5 was already shown to satisfy all consistency properties from Definition 8 (Examples 12, 16, 18 and 21). Thus, it is a weakly consistent ATS. It follows that the ATSs from Examples 7 and 8, based on measures of support (likelihood ratio statistics and e-values), are also weakly consistent ATSs.

Example 23 (Tests based on posterior probabilities). Consider Example 2. We have seen that the following restrictions are sufficient to guarantee weak union consonance (Example 13), weak intersection consonance (Example 17) and invertibility (Example 20), respectively: $c_1 \ge 2c_2$, $2c_1 - 1 \ge c_2$ and $c_2 = 1 - c_1$.
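The invertibility condition $c_2 = 1 - c_1$ from Example 20 can be verified mechanically for the two-threshold test based on posterior probabilities. The sketch below uses exact rational arithmetic to avoid floating-point boundary artifacts; the threshold $c_1 = 3/4$ is an assumed value for illustration.

```python
from fractions import Fraction

# Sketch: the threshold test based on posterior probabilities (accept when
# P(A|x) >= c1, reject when P(A|x) <= c2, abstain otherwise) satisfies
# L(A^c) = 1 - L(A) when c2 = 1 - c1. Exact rationals keep boundary
# comparisons trustworthy.

def threshold_test(post, c1, c2):
    if post >= c1:
        return Fraction(0)        # accept
    if post <= c2:
        return Fraction(1)        # reject
    return Fraction(1, 2)         # agnostic

c1 = Fraction(3, 4)               # assumed threshold
c2 = 1 - c1                       # the invertibility condition from the text

grid = [Fraction(k, 100) for k in range(101)]
ok = all(
    threshold_test(1 - p, c1, c2) == 1 - threshold_test(p, c1, c2)
    for p in grid
)
print("invertible on the grid:", ok)
```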

• Monotonicity: if A implies B, then a test that does not reject A should not reject B.
• Invertibility: a test should reject A if and only if it does not reject not-A.
• Union consonance: if a test rejects A and B, then it should reject $A\cup B$.
• Intersection consonance: if a test does not reject A and does not reject B, then it should not reject $A\cap B$.
Definition 2 (Monotonicity). $L:\sigma(\Theta)\to\Phi$ is monotonic if, for every $A, B\in\sigma(\Theta)$, $A\subset B$ implies that $L(A)\ge L(B)$. Equivalently, $L$ is monotonic if, for every pair of hypotheses $A\subset B$: (1) if $L$ accepts $A$, then it also accepts $B$; (2) if $L$ remains agnostic about $A$, then it either remains agnostic about $B$ or accepts $B$.
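Monotonicity can be verified exhaustively for the region-based agnostic test on a small finite parameter space. The Python sketch below is an illustrative check with an assumed region estimate: it confirms that $A\subseteq B$ implies $L(A)\ge L(B)$ over all pairs of hypotheses.

```python
from itertools import combinations

# Sketch: the region-based agnostic test accepts A when the region estimate
# R is contained in A, rejects A when R is disjoint from A, and abstains
# otherwise. A brute-force check over all subsets of a small finite
# parameter space illustrates monotonicity: A subset of B => L(A) >= L(B).

Theta = frozenset(range(5))

def region_test(R, A):
    if R <= A:
        return 0          # accept
    if not (R & A):
        return 1          # reject
    return 0.5            # agnostic

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

R = frozenset({1, 2})     # an assumed region estimate
monotone = all(
    region_test(R, A) >= region_test(R, B)
    for A in subsets(Theta) for B in subsets(Theta) if A <= B
)
print("monotonic:", monotone)
```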
Definition (Invertibility). An ATS $L:\sigma(\Theta)\to\Phi$ is invertible if, for every $A\in\sigma(\Theta)$, $L(A^c) = 1 - L(A)$.

Example 20 (Tests based on posterior probabilities). The ATS from Example 2 is invertible for every prior distribution if and only if $c_2 = 1 - c_1$.

Example 21 (Region Estimator). The ATS from Example 5 is invertible.

Table 1. The loss function for the hypothesis $\theta\in A$ used in Example 26.