1. Introduction
Hypothesis testing is a central method in scientific research [1]. Traditional null hypothesis testing (henceforth called precise hypothesis testing) is concerned with a precise null hypothesis and its set complement in the parameter space Θ, which, in the Bayesian approach, is typically a Borel space with σ-algebra ℬ(Θ), and in the frequentist approach, it is simply a set [2]:

H₀: θ = θ₀  versus  H₁: θ ≠ θ₀,

where θ₀ ∈ Θ. However, various authors have criticized precise null hypothesis testing as inadequate in a variety of research situations, questioning the usefulness of this approach. One of the main critiques of precise hypothesis testing based on equality constraints is that such constraints are often not plausible in the context of research: small violations of these equality constraints nearly always exist [3,4]. Concerning the questionable scientific standard of precise hypothesis testing in fields like medicine or cognitive science, Berger, Boukai, and Wang [5] noted the following:
“The decision whether or not to formulate an inference problem as one of testing a precise null hypothesis centers on assessing the plausibility of such an hypothesis. Sometimes this is easy, as in testing for the presence of extrasensory perception, or testing that a proposed law of physics holds. Often it is less clear. In medical testing scenarios, for instance, it is often argued that any treatment will have some effect, even if only a very small effect, and so exact equality of effects (between, say, a treatment and a placebo) will never occur.” Berger et al. [6] (p. 145)
Examples where the appropriateness of precise hypotheses may be questioned include exploratory research, measurements that include a non-negligible amount of error, and generally complex phenomena in which simple statistical models can at best be interpreted as approximations to reality [5,7,8]. As exact equality of effects is implausible in a variety of research contexts, interval hypotheses present an appealing alternative to precise hypotheses, at least in a variety of research settings in the medical, cognitive, and social sciences [7]. In contrast to precise null hypotheses, an interval hypothesis and its alternative are formalized as follows:
H₀: θ ∈ [θ₀ − ε, θ₀ + ε]  versus  H₁: θ ∉ [θ₀ − ε, θ₀ + ε]

for some ε > 0, where often θ₀ = 0. In the above, θ is the parameter of interest and θ₀ is the null value of the precise null hypothesis H₀: θ = θ₀. In contrast to precise hypotheses, an interval hypothesis consists of an interval of parameter values, where ε is determined from the available research context, domain-specific knowledge, or the available measurement precision, which is always finite. As a sidenote, in the vast majority of clinical phase II trials that aim to demonstrate the efficacy of a novel drug or treatment, a one-sided test for the binary variable success is of interest (where a success can have different interpretations, e.g., reduction of tumor volume [9,10]); see Zhou et al. [11] and Kelter and Schnurr [12]. While such a hypothesis often is formulated as H₀: p ≤ p₀ versus H₁: p > p₀ for an efficacy threshold p₀, in the majority of cases, values of p close to 1 are deemed unrealistic a priori. Note that such cases also represent an interval hypothesis, which can be modeled with the region of practical equivalence (see next section).
The core idea behind an interval hypothesis is, therefore, that equality constraints can be tested only up to a limited precision.
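As a concrete illustration of the phase II setting above, the posterior probability that the response rate lies in a given interval can be obtained from a conjugate Beta-Binomial model. The sketch below is a minimal Python illustration with hypothetical numbers (Beta(1, 1) prior, efficacy threshold p₀ = 0.2, 12 responses in 30 patients); these are illustrative assumptions, not the analyses of Zhou et al. or Kelter and Schnurr.

```python
import random

def posterior_prob_in_interval(successes, n, lo, hi, a=1.0, b=1.0,
                               draws=100_000, seed=1):
    """Monte Carlo estimate of P(lo <= p <= hi | data) under a
    Beta(a, b) prior for the success probability p (conjugate model)."""
    rng = random.Random(seed)
    post_a, post_b = a + successes, b + (n - successes)
    hits = sum(lo <= rng.betavariate(post_a, post_b) <= hi
               for _ in range(draws))
    return hits / draws

# Hypothetical phase II setting: efficacy threshold p0 = 0.2,
# one-sided interval hypothesis H1: p > 0.2, 12 responses in 30 patients.
prob_efficacious = posterior_prob_in_interval(12, 30, 0.2, 1.0)
```

The same function evaluates any interval hypothesis for p, so an upper bound below 1 (excluding a priori unrealistic values) is a one-line change.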
Research Problem and Outline
In the Bayesian paradigm, two popular approaches exist: The first is the region of practical equivalence (ROPE), which has become increasingly popular in the cognitive sciences. The second is the Bayes factor for interval null hypotheses, which was proposed by Morey et al. [13]. However, while the ROPE is conceptually appealing, it lacks a clear decision-theoretic foundation like the Bayes factor. In this paper, a decision-theoretic justification for the ROPE procedure is derived for the first time, which shows that the Bayes risk of a decision rule is asymptotically minimized for increasing sample size.
Therefore, details on Bayesian approaches to interval hypothesis testing, including the ROPE and interval Bayes factors, are provided in Section 2. In Section 3, a specific loss function is introduced for the ROPE, and the main result is then derived using this loss function; it provides an important decision-theoretic justification for testing interval hypotheses in the Bayesian approach via the ROPE. Section 4 reports the results of a simulation study, which yields recommendations for applying the results in practical settings. Section 5 provides a discussion of the results and concludes this paper.
3. Decision-Theoretic Foundation of the ROPE
In this section, the main result is presented. Although the ROPE has appealing practical properties, such as its robustness to prior selection and its wide applicability, it lacks a decision-theoretic justification. In contrast, the Bayes factor can be motivated from a decision-theoretic perspective as a formal Bayes rule, as shown in the previous section. The main result of this paper is given in Theorem 1 and shows that the decision rule based on the ROPE and HPD asymptotically minimizes the Bayes risk and can thus be called an asymptotic Bayes rule.
The following notation is used: A decision rule δ maps the observed sample data x, where x is an element of a measure space X with σ-algebra 𝒜, onto an action space 𝔸, which is a measure space with σ-algebra 𝒮:

δ: X → 𝔸.
A loss function L is then introduced, where L: Θ × 𝔸 → [0, ∞), which takes a parameter θ ∈ Θ and a decision (or action) a ∈ 𝔸 and returns the incurred loss L(θ, a) when deciding for a when θ is true. In the Bayesian interpretation, Θ is another measure space, the parameter space. The loss function L depends on the decision made, which itself depends on the decision rule δ, which itself depends on the observed data x. To account for the uncertainty in the decision rule δ, the risk function R is introduced, where Δ is the set of all (randomized) decision functions δ:

R(θ, δ) = ∫_X ∫_𝔸 L(θ, a) δ(x)(da) dP_θ(x).    (14)
In the above, the inner integral ∫_𝔸 L(θ, a) δ(x)(da) is the loss incurred when using the decision rule δ over all possible decisions a for the fixed parameter value θ and fixed sample x. The outer integral over X with respect to the measure P_θ accounts for the variability when observing x. That is, P_θ is a probability measure on the sample space X of the (identifiably parameterized, compare [39]) statistical model {P_θ : θ ∈ Θ}, which is assumed to represent the true data-generating process (which is a family of probability measures in most situations). The outer integral averages the incurred loss over the sample space X in the sense that it computes the incurred loss for all observable x ∈ X.
A Bayesian will argue that he does not know which value θ has, so although Equation (14) accounts for the uncertainty in observing x, it remains unclear for which parameter value (14) should be computed. Phrased differently, Equation (14) accounts for the randomness of the observed data x and the randomness of the action δ(x), but it does not account for the randomness of the parameter θ. Therefore, the Bayesian statistician selects a prior distribution π ∈ 𝒫(Θ), where 𝒫(Θ) is the set of all probability measures on the parameter space Θ, and averages the risk over the prior distribution:

r(π, δ) := ∫_Θ R(θ, δ) dπ(θ),    (15)

which is called the Bayes risk of the decision rule δ with respect to the prior π. Substituting (14) into (15), r(π, δ) can also be written as follows:

r(π, δ) = ∫_Θ ∫_X ∫_𝔸 L(θ, a) δ(x)(da) dP_θ(x) dπ(θ).
A decision rule δ* is called a Bayes rule if it minimizes the Bayes risk, which is formalized as follows:

r(π, δ*) = inf_{δ ∈ Δ} r(π, δ).

A decision rule δ* is called an asymptotic Bayes rule if it minimizes the Bayes risk asymptotically, which is formalized as follows:

lim_{n→∞} r(π, δ*) = lim_{n→∞} inf_{δ ∈ Δ} r(π, δ),

where n denotes the sample size of the observed data x.
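The nested expectations in the Bayes risk can be approximated by plain Monte Carlo: draw a parameter from the prior, draw data from the model given that parameter, apply the decision rule, and average the incurred loss. The sketch below illustrates this for a toy normal-mean model with a naive interval test and 0–1 loss; the rule, the loss, and all numbers are illustrative placeholders, not the ROPE loss analyzed in this paper.

```python
import random
import statistics

def bayes_risk(decision_rule, loss, n, prior_draw, draws=4000, seed=7):
    """Monte Carlo estimate of the Bayes risk
    r(pi, delta) = E_pi E_{x|theta} [ L(theta, delta(x)) ]:
    draw theta from the prior, draw a sample of size n given theta,
    apply the decision rule, and average the incurred loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        theta = prior_draw(rng)                        # theta ~ pi
        x = [rng.gauss(theta, 1.0) for _ in range(n)]  # x ~ P_theta (toy model)
        total += loss(theta, decision_rule(x))
    return total / draws

# Toy interval test H0: theta in [-0.1, 0.1] with a naive rule and 0-1 loss
# (illustrative placeholders, not the ROPE loss of the paper).
rule = lambda x: "accept" if abs(statistics.fmean(x)) < 0.1 else "reject"
zero_one = lambda theta, a: float((abs(theta) <= 0.1) != (a == "accept"))
risk_small = bayes_risk(rule, zero_one, n=10, prior_draw=lambda r: r.gauss(0, 1))
risk_large = bayes_risk(rule, zero_one, n=400, prior_draw=lambda r: r.gauss(0, 1))
```

Comparing the two estimates shows the risk shrinking with sample size, which is the pattern the asymptotic definitions above formalize.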
In the following, a decision rule and a loss function are introduced for which the asymptotic Bayes risk is zero; that is, lim_{n→∞} r(π, δ) = 0. This presents a decision-theoretic justification for the ROPE + HPD procedure for testing interval hypotheses.
The ROPE + HPD decision rule allows us only to accept or reject the null value or to make no decision. This implies that 𝔸 = {a₀, a₁, a₂}, where a₀ means accept for practical purposes, a₁ means reject for practical purposes, and a₂ means make no decision. Acceptance of the null value θ₀ is interpreted as acceptance of the interval hypothesis specified by the ROPE R, as noted above, and rejection of the null value as rejection of the interval hypothesis H₀: θ ∈ R.
The decision rule based on the HPD and ROPE is now defined as follows:
Definition 1 (ROPE decision rule).
Let R ⊆ Θ with θ₀ ∈ R, and let 𝔸 = {a₀, a₁, a₂}. The ROPE decision rule is given as

δ(x) = a₀ (accept) if [a, b] ⊆ R,  δ(x) = a₁ (reject) if [a, b] ∩ R = ∅,  δ(x) = a₂ (no decision) otherwise,

where R is the ROPE (e.g., R = [θ₀ − ε, θ₀ + ε] for some ε > 0, where often, θ₀ = 0) and [a, b] is the α% HPD interval, where a and b are the boundaries of the α% HPD based on the posterior density given the observed data x.
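Given posterior draws, the decision rule of Definition 1 can be sketched in a few lines. The HPD computation below uses the common shortest-interval approximation from sorted posterior samples (appropriate for unimodal posteriors); the function names and the sample-based approximation are illustrative choices, not part of the formal definition.

```python
def hpd_interval(samples, alpha=0.95):
    """Shortest interval containing a fraction alpha of the posterior draws
    (a common sample-based HPD approximation for unimodal posteriors)."""
    s = sorted(samples)
    k = int(round(alpha * len(s)))
    # among all windows of k consecutive order statistics, pick the narrowest
    _, i = min((s[i + k - 1] - s[i], i) for i in range(len(s) - k + 1))
    return s[i], s[i + k - 1]

def rope_decision(samples, rope, alpha=0.95):
    """ROPE + HPD decision rule of Definition 1: accept if the HPD lies
    entirely inside the ROPE, reject if it lies entirely outside,
    and make no decision otherwise."""
    a, b = hpd_interval(samples, alpha)
    lo, hi = rope
    if lo <= a and b <= hi:
        return "accept"
    if b < lo or a > hi:
        return "reject"
    return "no decision"
```

For example, posterior draws concentrated far outside a ROPE of (−0.1, 0.1) yield "reject", while draws concentrated well inside it yield "accept".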
The loss function is then defined as follows:
Definition 2 (ROPE loss function).
Let c₁, c₂, c₃, c₄ > 0 with 𝔸 = {a₀, a₁, a₂}. The ROPE loss function is given as follows:

L(θ, δ(x)) = (b − a) + c₁·1(θ ∉ [a, b]) + c₂·1([a, b] ⊄ R)  if δ(x) = a₀ (accept),
L(θ, δ(x)) = (b − a) + c₁·1(θ ∉ [a, b]) + c₃·1([a, b] ⊆ R)  if δ(x) = a₁ (reject),
L(θ, δ(x)) = (b − a) + c₁·1(θ ∉ [a, b]) + c₄  if δ(x) = a₂ (no decision),

where c₁, c₂, c₃, and c₄ are penalties associated with the parameter not being located inside the HPD, with accepting the null hypothesis when the HPD is not located entirely inside the ROPE R, with rejecting the null hypothesis when the HPD is located entirely inside the ROPE R, and with making no decision, respectively. In all cases, the HPD width b − a and the cost c₁ of the parameter not being inside the HPD contribute to the loss. If δ(x) = a₁, the null value is rejected, and the additionally incurred loss is c₃·1([a, b] ⊆ R); that is, the constant c₃ if the HPD is located entirely inside the ROPE R (then, 1([a, b] ⊆ R) = 1). This situation occurs when the consistency of the posterior distribution causes the HPD to be already located inside the ROPE, but the parameter is still excluded from the HPD, possibly due to a not sufficiently large sample size.
If δ(x) = a₀, one accepts the null value θ₀. The additional costs are c₂·1([a, b] ⊄ R); that is, the constant c₂ if the HPD is not located entirely inside the ROPE R (then, 1([a, b] ⊄ R) = 1). Such a case could occur when the posterior concentrates outside the ROPE R, but one still accepts the null value θ₀ included in the ROPE (or the interval hypothesis described by the ROPE; see the first section).
If δ(x) = a₂, no decision is made, and the additional costs are c₄, which are incurred when the HPD partially overlaps with both the ROPE and the complement of the ROPE (then, δ(x) = a₂).
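The case distinction above can be summarized in a small sketch. The penalty names c1–c4 follow the notation used here, and the function signature is an illustrative choice; with include_width=False, the same sketch also covers the simplified loss discussed later in this section.

```python
def rope_loss(theta, action, hpd, rope, c1, c2, c3, c4, include_width=True):
    """ROPE loss of Definition 2 (set include_width=False for the simplified
    variant): every case pays the HPD width and c1 if the HPD misses the true
    parameter; in addition, accepting pays c2 if the HPD is not entirely
    inside the ROPE, rejecting pays c3 if the HPD is entirely inside the
    ROPE, and making no decision pays c4."""
    a, b = hpd
    lo, hi = rope
    loss = (b - a) if include_width else 0.0
    loss += c1 * (not (a <= theta <= b))          # parameter missed by the HPD
    if action == "accept":
        loss += c2 * (not (lo <= a and b <= hi))  # HPD not entirely inside R
    elif action == "reject":
        loss += c3 * (lo <= a and b <= hi)        # HPD entirely inside R
    else:                                         # "no decision"
        loss += c4
    return loss
```

For instance, with unit penalties and an HPD of (−0.05, 0.05) inside a ROPE of (−0.1, 0.1), accepting incurs no extra penalty, while rejecting incurs c3.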
Thus, the proposed loss function consists of three main components: A loss that occurs when accepting the null value (or the associated interval hypothesis) although the HPD is not in the ROPE R, a loss of rejecting it when the HPD is in R, and a loss for making no decision. All three components seem reasonable.
From a second perspective, the loss function increases the loss gradually: First, the HPD width b − a is smaller for larger sample sizes and thus shrinks to zero due to the posterior's consistency [39]. Second, the cost c₁ of the parameter not being inside the HPD indicates that the HPD is off in some way, meaning that, due to randomness in the data, the HPD has not sufficiently captured the characteristics of the true data-generating process. The last and third step is the one described in the previous paragraph.
The main result in this paper shows that the decision rule δ specified in Definition 1 minimizes the Bayes risk r(π, δ) in Equation (15) asymptotically under the loss function L specified in Definition 2.
Theorem 1. The decision rule δ, as specified in Definition 1, minimizes the asymptotic Bayes risk with respect to any proper prior π under the loss function in Definition 2. That is, lim_{n→∞} r(π, δ) = 0, and consequently, for any other decision rule δ′ ∈ Δ, we have r(π, δ) ≤ r(π, δ′) asymptotically in probability for n → ∞.
The above result has a drawback, however. The loss function of Definition 2 includes the width of the HPD interval as a constant loss in any case, whether the action a₀, a₁, or a₂ is taken (accept, reject, or stay neutral about the hypothesis specified by the ROPE R). However, for small to moderate amounts of data, the width of the HPD interval can be substantial, depending on the statistical model and its parameter space of interest. Thus, a less debatable loss function would consist solely of the following:
- ▸ the loss c₁ for the parameter not being inside the HPD due to randomness in the observed sample x;
- ▸ the losses c₂, c₃, and c₄ associated with accepting a hypothesis when the HPD is not inside R, with rejecting a hypothesis when the HPD is inside R, and with making no decision at all, respectively.
The first point is an inherent problem that can occur in small to moderately sized samples. The second point is the natural cost one would associate with making an obviously wrong decision when the (consistent) posterior distribution shows that the ROPE does not include the true parameter θ, but one rejects or accepts the hypothesis circumscribed by the ROPE R. Note that these two losses are the second and third summands in the loss function of Definition 2.
We will now extend Theorem 1 and show that even when omitting the width b − a of the α% HPD interval from the loss function, the ROPE decision rule still remains an asymptotic Bayes rule. The important consequence is that, as the width b − a vanishes only for unrealistically large sample sizes n, this extension yields a much more practically relevant result than Theorem 1 with the loss function of Definition 2. This is due to the fact that while the width b − a vanishes only for extremely large n, the convergence speed of the remaining terms (Equation (A5) in the appendix) is much faster. Therefore, even in situations where the HPD width might be substantial due to using only a moderate or small amount of data, the decision-theoretic justification of the ROPE, as stated in Theorem 1, holds.
Definition 3 (Simplified ROPE loss function).
Let c₁, c₂, c₃, c₄ > 0 with 𝔸 = {a₀, a₁, a₂}. The simplified ROPE loss function is given as follows:

L(θ, δ(x)) = c₁·1(θ ∉ [a, b]) + c₂·1([a, b] ⊄ R)  if δ(x) = a₀ (accept),
L(θ, δ(x)) = c₁·1(θ ∉ [a, b]) + c₃·1([a, b] ⊆ R)  if δ(x) = a₁ (reject),
L(θ, δ(x)) = c₁·1(θ ∉ [a, b]) + c₄  if δ(x) = a₂ (no decision),

where c₁, c₂, c₃, and c₄ are penalties associated with the parameter not being located inside the HPD, with accepting the null hypothesis when the HPD is not located entirely inside the ROPE R, with rejecting the null hypothesis when the HPD is located entirely inside the ROPE R, and with making no decision, respectively.
Now, given this simplified loss function, Theorem 1 still holds.
Theorem 2. The decision rule δ, as specified in Definition 1, minimizes the asymptotic Bayes risk with respect to any proper prior π under the loss function in Definition 3. That is, lim_{n→∞} r(π, δ) = 0, and consequently, for any other decision rule δ′ ∈ Δ, we have r(π, δ) ≤ r(π, δ′) asymptotically in probability for n → ∞.
In closing this section, note that Theorems 1 and 2 do not contradict each other. Theorem 1 holds, but for the loss function in Definition 3, the inequality r(π, δ) ≤ r(π, δ′) asymptotically in probability for n → ∞, as ascertained in Theorem 1, becomes an equality. For a finite sample size n, the Bayes risk under the simplified ROPE loss function in Definition 3 is smaller than under the one given in Definition 2, particularly when the sample size of the observed data is small (and the HPD width, in turn, might be substantial).
4. Simulation Study
The result presented above provides a decision-theoretic justification of the ROPE for interval hypothesis testing, but it offers little guidance regarding how to make use of it in practical settings. While the result holds for any parametric statistical model and therefore is quite general, when the sample size is moderate, the key question is which of the costs c₁, c₂, c₃, c₄ dominates the total costs. In most realistic settings, it will be difficult, if not impossible, to determine precise or even vague values for these costs. Still, some comments can be made even without simulations: First, the costs c₁ associated with the parameter not being located inside the HPD are independent of the decision made and indicate that the sample size is too small to capture the true parameter; the latter must happen eventually for large enough sample sizes due to the consistency of the posterior distribution [2]. As a consequence, the cost parameter c₁ will influence the total costs when the sample size is moderate or small. Second, it is well known from other research that the ROPE is slow, in the sense that it requires more samples than other Bayesian testing approaches to accept or reject a hypothesis [23,24,40]. As a consequence, for moderate sample sizes, the costs c₄ of no decision will dominate the total costs. In that case, the HPD partially overlaps with the ROPE but is located neither entirely inside nor entirely outside of it.
An illustration of this phenomenon is given by simulating one of the most common testing problems, the two-sample t-test. We make use of the Bayesian t-test given by Rouder et al. [41] and four simulation scenarios: (1) an effect size of δ = 0, so the null hypothesis H₀ holds. Here, we draw on the results of a large-scale meta-analysis of effect size magnitudes in medical research (compare [42]) to determine the lower and upper ROPE boundaries. (2) δ = 0.2, which equals a small effect size according to Cohen [43], (3) δ = 0.5, which equals a medium effect size according to Cohen [43], and (4) δ = 0.8, which equals a large effect size according to Cohen [43]. Most relevant here are the results under no effect, a small effect, and a medium effect size, because these are much more realistic in applied research; compare [42]. We provide the percentage of the posterior probability inside the ROPE R for a range of sample sizes that are common per group in the cognitive and medical sciences. For each scenario, 10,000 datasets were simulated to compute (1) the mean posterior probability inside the ROPE and (2) the probability of the parameter being inside the HPD.
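A stripped-down version of such a simulation can be sketched as follows. To stay self-contained, the sketch replaces the actual Bayesian t-test of Rouder et al. [41] with a crude large-sample normal approximation to the posterior of the effect size (posterior standard deviation √(2/n)); the ROPE boundaries (−0.1, 0.1), the replication count, and all other settings are illustrative assumptions, not the study's actual configuration.

```python
import math
import random

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_rope_probs(true_delta, n, rope=(-0.1, 0.1), reps=2000, seed=3):
    """For each simulated two-group dataset, approximate the posterior of the
    effect size delta by N(d, 2/n) (a large-sample stand-in for the Bayesian
    t-test posterior) and record the posterior probability inside the ROPE
    and the ROPE + HPD decision."""
    rng = random.Random(seed)
    prob_in_rope, decisions = [], []
    lo, hi = rope
    for _ in range(reps):
        g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g2 = [rng.gauss(true_delta, 1.0) for _ in range(n)]
        d = sum(g2) / n - sum(g1) / n        # observed effect (unit variances)
        sd = math.sqrt(2.0 / n)              # approximate posterior sd
        prob_in_rope.append(phi((hi - d) / sd) - phi((lo - d) / sd))
        a, b = d - 1.96 * sd, d + 1.96 * sd  # 95% interval (HPD under normality)
        if lo <= a and b <= hi:
            decisions.append("accept")
        elif b < lo or a > hi:
            decisions.append("reject")
        else:
            decisions.append("no decision")
    return sum(prob_in_rope) / reps, decisions
```

Under δ = 0 this reproduces the qualitative pattern reported below for Figure 1a: almost all replications end in "no decision" at realistic sample sizes, while a large effect is reliably rejected.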
Figure 1 shows the results; Figure 1a provides the results under δ = 0, so H₀ holds. The dashed horizontal lines show the threshold values of 95% to accept H₀ and 5% to reject it (in the former case, 95% of the posterior probability is located inside R, indicating to accept H₀). As can be seen, for all sample sizes, the situation remains indecisive, and there is a partial overlap between the ROPE and the 95% HPD. As a consequence, the costs c₄ for making no decision fully dominate the entire costs in that scenario, that is, when H₀ holds.
Figure 1b shows the results under a small effect size, and although H₁ is true now, the same situation holds. Even for the largest sample size considered, the sample size is not large enough to reduce the posterior probability inside the ROPE R to below 5%. The costs c₄ entirely dominate the resulting costs. For a medium effect size, Figure 1c shows that for small sample sizes, the costs c₄ of no decision again drive the entire costs; from moderate sample sizes onward, the correct decision is made, reducing the costs to zero. Figure 1d shows the situation for a large effect size, and here, even small numbers of samples per group suffice to avoid any costs c₄ of making no decision.
Figure 2 shows the probability of the HPD containing the true parameter δ for varying sample and effect sizes; thus, the cost parameter c₁ is studied. In Figure 2a, the results under δ = 0 show that the true parameter is captured by the majority of HPDs independently of the sample size. This, most probably, is due to the Cauchy prior on δ, which is used in the model proposed by Rouder et al. [41] and which is centered around 0. Figure 2b–d show that the same phenomenon holds for increasing effect size magnitudes. Thus, the costs c₁ are much less relevant for realistically attainable sample sizes than the costs c₄ of staying indecisive.
We close this section with two comments. First, the values of c₁ and c₄ clearly influence which costs are more relevant in a given situation. However, the probabilities in Figure 1 and Figure 2 show that the probability of staying indecisive is much larger than the probability of obtaining an HPD that does not cover the true parameter. Second, the costs c₂ and c₃ further influence the total costs. However, assigning explicit costs to the acceptance or rejection of a hypothesis seems unrealistic for almost all applied research; attempting to quantify the costs of a trial without a result amounts to quantifying the cost c₄ of no decision. We therefore recommend studying the probability of obtaining an indecisive result for the statistical model at hand under the relevant scenarios, as shown in Figure 1. Additionally, we recommend selecting, on decision-theoretic grounds, a sample size that yields a probability large enough to include the true parameter inside the HPD; compare Figure 2. Otherwise, the whole inference can be misleading.
5. Discussion
This section discusses the main result and its limitations. First and foremost, Theorem 1 provides a decision-theoretic justification of the ROPE procedure for testing interval hypotheses in the Bayesian paradigm. The latter has gained widespread attention in areas like statistical research [22], psychological research [21,23,44], and preclinical animal research [42]. Its simplicity and ease of application are appealing, but until now, the approach lacked a clear decision-theoretic justification. In this paper, such a justification is provided for the first time based on a decision rule that seems quite acceptable for most Bayesians.
The decision rule given in Definition 1 can be interpreted as a formalization of the ROPE method, as proposed by Kruschke [21]. The form of the selected loss function specified in Definition 2 needs to be justified further; for example, one could easily choose the trivial loss function L(θ, δ(x)) = b − a, which is independent of the selected decision rule δ and equals the width of the α% HPD. This motivates Definition 3 and the important extension of Theorem 1 to this loss function; see Theorem 2.
The loss function given in Definition 3 incorporates four important desiderata: First, it contains an explicit penalty whenever the hypothesis is accepted although the α% HPD is not located entirely inside the ROPE R. Second, it contains an explicit penalty whenever the hypothesis is rejected although the α% HPD is located entirely inside the ROPE R. Third, it contains an explicit penalty whenever no decision is made, which is the case when the ROPE R and the α% HPD partially overlap. Fourth, in all three cases (accept, reject, make no decision), a penalty is given if the α% HPD does not include the true parameter θ. In such cases, the decision is based on an HPD that does not include the true parameter value, and the incurred loss naturally should be larger. Consequently, the loss function given in Definition 3 is not trivial and is justified from a practical point of view. Theorem 2 now guarantees that, asymptotically, decisions made based on the decision rule in Definition 1 and the loss function in Definition 3 will minimize the Bayes risk.
An important limitation is that in small sample situations, little can be said about the incurred loss when following the decision rule δ under the loss function in Definition 2. Theorem 1 only guarantees that for large sample sizes n, the Bayes risk is minimized, but in a variety of research areas, only small to moderate samples can be acquired. Theorem 2 improves this situation drastically by using the simpler loss function in Definition 3, which excludes the HPD width b − a.
However, in small sample situations, the penalty terms influence the resulting loss and risk even more substantially. Suppose, for example, that the α% HPD is located entirely outside the ROPE R for a small sample size n. Suppose further that the true parameter θ is located inside the ROPE R, so the posterior will concentrate inside the ROPE R for increasing sample size n. In this case, the resulting loss for the small sample will be based on the rejection case δ(x) = a₁ in the loss function. Suppose now that the sample size is increased to m > n. Then, the ROPE and the α% HPD may overlap, and the resulting loss for the sample based on m observations will be based on the no-decision case δ(x) = a₂ in the loss function. For growing sample size, eventually, the acceptance case δ(x) = a₀ is reached, on which the resulting loss will then be based, but the speed of this process is not known. Consequently, when successively applying the decision rule (e.g., in the growing sample scenario outlined above), the total loss depends on the magnitudes of the penalties, in particular the cost c₄ of making no decision. For example, if the costs c₄ associated with making no decision are tiny, it matters little if a large sample size n is required until the posterior is located entirely inside the ROPE R. On the contrary, in situations where a decision is mandatory and cannot be deferred (the costs c₄ are large), the resulting loss may be substantial when the posterior is located inside the ROPE R only after observing a sample of very large size n. However, asymptotically, Theorem 2 guarantees that the Bayes risk will be minimized for any choice of the penalties, and a decision-theoretic justification of the popular decision rule based on the ROPE and HPD given in Definition 1 is provided.
Keeping this in mind, the result presented in this paper—as is the case with every asymptotic result, such as the laws of large numbers, the central limit theorem, or the Bernstein–von Mises theorem—cannot provide a boundary n₀ such that for sample sizes n ≥ n₀ one can be certain that the Bayes risk is sufficiently minimized. Still, it provides a decision-theoretic justification of statistical hypothesis testing based on the ROPE and HPD, the relevance of which should not be understated. The reason is that the loss function in Definition 3 should be acceptable to most Bayesians, although the concrete values of the penalties will clearly differ from situation to situation.
Our simulation study results indicate that the costs c₄ could substantially drive the total incurred loss when using the ROPE. This is in line with previous research [23,24,40,45], and our results provide two important recommendations for applied research. First, when the sample size is moderate, we highly recommend simulating the probability inside the ROPE for varying sample sizes and different scenarios under both H₀ and H₁, as in Figure 1. Then, the costs associated with no decision can be estimated for a given sample size. Second, the probability that the HPD contains the true parameter is crucial for reliable inference. We recommend simulating the latter as shown in Figure 2, also for varying sample sizes and relevant scenarios under H₀ and H₁, to select a minimum sample size that yields a large enough probability. This should safeguard against overconfidence and yield a Bayesian test for which Theorem 2 holds in an appealing way; that is, the total loss should be small.
Besides its robustness to the prior specification in large samples, this shows that the ROPE + HPD approach for testing interval hypotheses asymptotically minimizes the Bayes risk for what seems like a natural loss function in a variety of situations. This result strengthens the justification of researchers applying interval hypothesis tests via the ROPE + HPD procedure in the Bayesian paradigm.