ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio

Al-Labadi, Luai; Evans, Michael; Liang, Qiaoyu

doi:10.3390/e24121710

Open AccessArticle

ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio

by

Luai Al-Labadi

^1,†

,

Michael Evans

^2,*,†

and

Qiaoyu Liang

^2,†

¹

Department of Mathematical and Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada

²

Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada

^*

Author to whom correspondence should be addressed.

^†

Authors Al-Labadi, Evans and Liang contributed equally to this work.

Entropy 2022, 24(12), 1710; https://doi.org/10.3390/e24121710

Submission received: 26 September 2022 / Revised: 18 November 2022 / Accepted: 19 November 2022 / Published: 23 November 2022

(This article belongs to the Special Issue Information Theoretic Criteria: New Theoretical Developments and Applications)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

ROC (Receiver Operating Characteristic) analyses are considered under a variety of assumptions concerning the distributions of a measurement X in two populations. These include the binormal model as well as nonparametric models where little is assumed about the form of distributions. The methodology is based on a characterization of statistical evidence which is dependent on the specification of prior distributions for the unknown population distributions as well as for the relevant prevalence w of the disease in a given population. In all cases, elicitation algorithms are provided to guide the selection of the priors. Inferences are derived for the AUC (Area Under the Curve), the cutoff c used for classification as well as the error characteristics used to assess the quality of the classification.

Keywords:

ROC; AUC; optimal cutoff; statistical evidence; relative belief; binormal; mixture Dirichlet process

1. Introduction

An ROC (Receiver Operating Characteristic) analysis is used in medical science to determine whether or not a real-valued diagnostic variable X for a disease or condition is useful. If the diagnostic indicates that an individual has the condition, then this will typically mean that a more expensive or invasive medical procedure is undertaken. So it is important to assess the accuracy of the diagnostic variable

X .

These methods have a wider class of applications but our terminology will focus on the medical context.

An approach to such analyses is presented here that is based on a characterization of statistical evidence and which incorporates all available information as expressed via prior probability distributions. For example, while p-values are often used in such analyses, there are questions concerning the validity of these quantities as characterizations of statistical evidence. As will be discussed, there are many advantages to the framework adopted here.

A common approach to the assessment of the diagnostic variable X is to estimate its AUC (Area Under the Curve), namely, the probability that an individual sampled from the diseased population will have a higher value of diagnostic variable X than an individual independently sampled from the nondiseased population. A good diagnostic should give a value of the AUC near 1 while a value near 1/2 indicates a poor diagnostic test (if the AUC is near 0, then the classification is reversed). It is possible, however, that a diagnostic with AUC

\approx 1

may not be suitable (see Examples 1 and 6). In particular, a cutoff value c needs to be selected so that if

X > c,

then an individual is classified as requiring the more invasive procedure. Inferences about the error characteristics for the combination

(X, c),

such as the false positive rate, etc., are also required.

This paper is concerned with inferences about the AUC, the cutoff c and the error characteristics of the classification based on a valid measure of evidence. A key aspect of the analysis is the relevant prevalence w. The phrase “relevant prevalence” means that X will be applied to a certain population, such as those patients who exhibit certain symptoms, and w represents the proportion of this subpopulation who are diseased. The value of w may vary by geography, medical unit, time, etc. To make a valid assessment of X in an application, it is necessary that the information available concerning w be incorporated. This information is expressed here via an elicited prior probability distribution for w, which may be degenerate at a single value if w is assumed known, or be quite diffuse when little is known about w. In fact, all unknown population quantities are given elicited priors. There are many contexts where data are available relevant to the value of w and this leads to a full posterior analysis for w as well as for the other quantities of interest. Even when such data are not available, however, it is still possible to take the prior for w into account so the uncertainties concerning w always play a role in the analysis and this is a unique aspect of the approach taken here.

While there are some methods available for the choice of c, these often do not depend on the prevalence w which is a key factor in determining the true error characteristics of

(X, c)

in an application, see [1,2,3,4,5]. So it is preferable to take w into account when considering the value of a diagnostic in a particular context. One approach to choosing c is to minimize some error criterion that depends on w to obtain

c_{o p t} .

As will be demonstrated in the examples, however, sometimes

c_{o p t}

results in a classification that is useless. In such a situation a suboptimal choice of c is required but the error characteristics can still be based on what is known about w so that these are directly relevant to the application.

Others have pointed out deficiencies in the AUC statistic and proposed alternatives. For example, it can be argued that taking into account the costs associated with various misclassification errors is necessary and that using the AUC is implicitly making unrealistic assumptions concerning these costs, see [6]. While costs are relevant, costs are not incorporated here as these are often difficult to quantify. Our goal is to express clearly what the evidence is saying about how good

(X, c)

is via an assessment of its error characteristics. With the error characteristics in hand, a user can decide whether or not the costs of misclassifications are such that the diagnostic is usable. This may be a qualitative assessment although, if numerical costs are available, these could be subsequently incorporated. The principle here is that economic or social factors be considered separately from what the evidence in the data says, as it is a goal of statistics to clearly state the latter.

The framework for the analysis is Bayesian as proper priors are placed on the unknown distribution

F_{N D}

(the distribution of X in the nondiseased population), on

F_{D}

(the distribution of X in the diseased population) and the prevalence

w .

In all the problems considered, elicitation algorithms are presented for how to choose these priors. Moreover, all inferences are based on the relative belief characterization of statistical evidence where, for a given quantity, evidence in favor (against) is obtained when posterior beliefs are greater (less) than prior beliefs, see Section 2.2 for discussion and [7]. So evidence is determined by how the data change beliefs. Section 2 discusses the general framework, defines relevant quantities and provides an outline for how specific relative belief inferences are determined. Section 3 develops the inferences for the quantities of interest for three contexts (1) X is an ordered discrete variable with and without constraints on

(F_{N D}, F_{D})

(2) X is a continuous variable and

(F_{N D}, F_{D})

are normal distributions (the binormal model) (3) X is a continuous variable and no constraints are placed on

(F_{N D}, F_{D}) .

There is previous work on using Bayesian methods in ROC analyses. For example, a Bayesian analysis for the binormal model when there are covariates present is developed in [8]. An estimate of the ROC using the Bayesian bootstrap is discussed in [9]. A Bayesian semiparametric analysis using a Dirichlet mixture process prior is developed in [10,11]. The sampling regime where the data can be used for inference about the relevant prevalence and where a gold standard classifier is not assumed to exist is presented in [12]. Considerable discussion concerning the case where the diagnostic test is binary, covering the cases where there is and is not a gold standard test, as well as the situation where the goal is to compare diagnostic tests and to make inference about the prevalence distribution can be found in [13] and also see [14]. Application of an ROC analysis to a comparison of linear and nonlinear approaches to a problem in medical physics is in [15]. Further discussion of nonlinear methodology can be found in [16,17].

The contributions of this paper, that have not been covered by previous published work in this area, are as follows:

(i): The primary contribution is to base all the inferences associated with an ROC analysis on a clear and unambiguous characterization of statistical evidence via the principle of evidence and the relative belief ratio. While Bayes factors are also used to measure statistical evidence, there are serious limitations on their usage with continuous parameters as priors are restricted to be of a particular form. The approach via relative belief removes such restrictions on priors and provides a unified treatment of estimation and hypothesis assessment problems. In particular, this leads directly to estimates of all the quantities of interest, together with assessments of the accuracy of the estimates, and a characterization of the evidence, whether in favor of or against a hypothesis, together with a measure of the strength of the evidence. Moreover, no loss functions are required to develop these inferences. The merits of the relative belief approach over others are more fully discussed in Section 2.2.
(ii): A prior on the relevant prevalence is always used to determine inferences even when the posterior distribution of this quantity is not available. As such the prevalence always plays a role in the inferences derived here.
(iii): The error in the estimate of the cut-off is always quantified as well as the errors in the estimates of the characteristics evaluated at the chosen cut-off. It is these characteristics, such as the sensitivity and specificity, that ultimately determine the value of the diagnostic test.
(iv): The hypothesis $H_{0} :$ AUC $> 1 / 2$ is first assessed and if evidence is found in favor of this, the prior is then conditioned on this event being true for inferences about the remaining quantities. Note that this is equivalent to conditioning the posterior on the event AUC > $1 / 2$ when inferences are determined by the posterior but with relative belief inferences both the conditioned prior and conditioned posterior are needed to determine the inferences.
(v): Precise conditions are developed for the existence of an optimal cutoff with the binormal model.
(vi): In the discrete context (1), it is shown how to develop a prior and the analysis under the assumption that the probabilities describing the outcomes from the diagnostic variable X are monotone.

The relative belief ratio, as a measure of evidence, is seen to have a connection to relative entropy. For example, it is equivalent, in the sense that the inferences are the same, to use the logarithm of the relative belief ratio as the measure of evidence. The relative entropy is then the posterior expectation of this quantity and so can be considered as a measure of the overall evidence provided by the model, prior and data concerning a quantity of interest.

The methods used for all the computations in the paper are simulation based and represent fairly standard Bayesian computational methods. In each context considered, sufficient detail is provided so that these can be implemented by a user.

2. The Problem

Consider the formulation of the problem as presented in [18,19] but with somewhat different notation. There is a measurement

X : Ω \to R^{1}

defined on a population

Ω = Ω_{D} \cup Ω_{N D},

with

Ω_{D} \cap Ω_{N D} = ϕ,

where

Ω_{D}

is comprised of those with a particular disease, and

Ω_{N D}

represents those without the disease. So

F_{N D} (c) = # ({ω \in Ω_{N D} : X (ω) \leq c}) / # (Ω_{N D})

is the conditional cdf of X in the nondiseased population, and

F_{D} (x) = # ({ω \in Ω_{D} : X (ω) \leq x}) / # (Ω_{D})

is the conditional cdf of X in the diseased population. It is assumed that there is a gold standard classifier, typically much more difficult to use than

X,

such that for any

ω \in Ω

it can be determined definitively if

ω \in Ω_{D}

or

ω \in Ω_{N D} .

There are two ways in which one can sample from

Ω,

namely,

(i): take samples from each of $Ω_{D}$ and $Ω_{N D}$ separately or
(ii): take a sample from $Ω .$

The sampling method used affects the inferences that can be drawn. For many studies (i) is the relevant sampling mode, as in case-control studies, while (ii) is relevant in cross-sectional studies.

It supposed that the greater the value

X (ω)

is for individual

ω,

the more likely it is that

ω \in Ω_{D} .

For the classification, a cutoff value c is required such that, if

X (ω) > c

, then

ω

is classified as being in

Ω_{D}

and otherwise is classified as being in

Ω_{N D} .

However, X is an imperfect classifier for any c and it is necessary to assess the performance of

(X, c)

. It seems natural that a value of c be used that is optimal in some sense related to the error characteristics of this classification. Table 1 gives the relevant probabilities for classification into

Ω_{D}

and

Ω_{N D}

, together with some common terminology, in a confusion matrix.

Another key ingredient is the prevalence

w = # (Ω_{D}) / # (Ω)

of the disease in

Ω .

In practical situations, it is necessary to also take w into account in assessing the error in

(X, c) .

The following error characteristics depend on

w,

\begin{matrix} Error (c) & = misclassification rate = w FNR (c) + (1 - w) FPR (c), \\ FDR (c) & = false discovery rate = \frac{(1 - w) FPR (c)}{w (1 - FNR (c)) + (1 - w) FPR (c)}, \\ FNDR (c) & = false nondiscovery rate = \frac{w FNR (c)}{w FNR (c) + (1 - w) (1 - FPR (c))} . \end{matrix}

Under sampling regime (ii) and cutoff c, Error(c) is the probability of making an error, FDR(c) is the conditional probability of a subject being misclassified as positive given that it has been classified as positive and FNDR(c) is the conditional probability of a subject being misclassified as negative given that it has been classified as negative. In other words, FDR(c) is the proportion of those individuals in the population consisting of those who have been classified by the diagnostic test as having the disease, but in fact do not have it. It is often observed that when w is very small and FNR(c) and FPR(c) are small, then FDR(c) can be big. This is sometimes referred to as the base rate fallacy as, even though the test appears to be a good one, there is a high probability that an individual classified as having the disease will be misclassified. For example, if

w =

FNR(c) = FPR

(c) = 0.05,

then Error

(c) = 0.05

, FDR

(c) = 0.50,

FNDR

(c) = 2.76 \times 10^{- 3}

and when

w = 0.01,

then Error

(c) = 0.05

, FDR

(c) = 0.84,

FNDR

(c) = 5.31 \times 10^{- 4} .

In these cases the false nondiscovery rate is quite small while the false discovery rate is large. If the disease is highly contagious, then these probabilities may be considered acceptable but indeed they need to be estimated. Similarly, FNDR

(c)

may be small when FNR

(c)

is large and w is very small.

It is naturally desirable to make inference about an optimal cutoff

c_{o p t}

and its associated error quantities. For a given value of

w,

the optimal cutoff will be defined here as

c_{o p t} = arg inf

Error

(c)

, the value which minimizes the probability of making an error. Other choices for determining a

c_{o p t}

can be made, and the analysis and computations will be similar, but our thesis is that, when possible, any such criterion should involve the prior distribution of the relevant prevalence

w .

As demonstrated in Example 6 this can sometimes lead to useless values of

c_{o p t}

even when the AUC is large. While this situation calls into question the value of the diagnostic, a suboptimal choice of c can still be made according to some alternative methodology. For example, sometimes Youden’s index, which maximizes

1 - 2

Error

(c)

over c with

w = 1 / 2

, is recommended, or the closest-to-(0,1) criterion which minimizes

FPR {(c)}^{2} + {(1 - TPR (c))}^{2}

, see [2] for discussion. Youden’s index and the closest-to-(0,1) criterion do not depend on the prevalence and have geometrical interpretations in terms of the ROC curve, but as we will see, the ROC curve does not exist in full generality and this is particularly relevant in the discrete case. The methodology developed here provides an estimate of the c to be used, together with an exact assessment of the error in this estimate, as well as providing estimates of the associated error characteristics of the classification.

Letting

{\hat{c}}_{o p t}

denote the estimate of

c_{o p t}

, the values of Error

({\hat{c}}_{o p t}), TPR ({\hat{c}}_{o p t}), FPR ({\hat{c}}_{o p t}), FNR ({\hat{c}}_{o p t})

and

TNR ({\hat{c}}_{o p t})

are also estimated and the recorded values used to assess the value of the diagnostic test. There are also other characteristics that may prove useful in this regard such as the positive predictive value (PPV)

PPV (c) = \frac{w TPR (c)}{w TPR (c) + (1 - w) FPR (c)},

namely, the conditional probability a subject is positive given that they have tested positive, which plays a role similar to FDR

(c)

. See [14] for discussion of the PPV and the similarly defined negative predictive value (NPV). The value of

PPV ({\hat{c}}_{o p t})

can be estimated in the same way as the other quantities as is subsequently discussed.

2.1. The AUC and ROC

Consider two situations where

F_{N D}, F_{D}

are either both absolutely continuous or both discrete. In the discrete case, suppose that these distributions are concentrated on a set of points

c_{1} < c_{2} < \dots < c_{m} .

When

ω_{D}, ω_{N D}

are selected using sampling scheme (i), then the probability that a higher score is received on diagnostic X by a diseased individual than a nondiseased individual is

AUC = \{\begin{matrix} \int_{- \infty}^{\infty} (1 - F_{D} (c)) f_{N D} (c) d c & abs . cont . \\ \sum_{i = 1}^{m} (1 - F_{D} (c_{i})) (F_{N D} (c_{i}) - F_{N D} (c_{i - 1})) & discrete . \end{matrix}

(1)

Under the assumption that

F_{D} (c)

is constant on

{c : F_{N D} (c) = p}

for every

p \in [0, 1],

there is a function ROC (receiver operating characteristic) such that

1 - F_{D} (c) =

ROC

(1 - F_{N D} (c))

so AUC

= \int_{- \infty}^{\infty}

ROC

(1 - F_{N D} (c)) F_{N D} (d x) .

Putting

p = 1 - F_{N D} (c),

then ROC

(p) = 1 - F_{D} (F_{N D}^{- 1} (1 - p)) .

In the absolutely continuous case, AUC

= \int_{0}^{1}

ROC

(p) d p

which is the area under the curve given by the ROC function. The area under the curve interpretation is geometrically evocative but is not necessary for (1) to be meaningful.

It is commonly suggested that a good diagnostic variable X will have an AUC close to 1 while a value close to 1/2 suggests a poor diagnostic test. It is surely the case, however, that the utility of X in practice will depend on the cutoff c chosen and the various error characteristics associated with this choice. So while the AUC can be used to screen diagnostics, it is only part of the analysis and inferences about the error characteristics are required to truly assess the performance of a diagnostic. Consider an example.

Example 1.

Suppose that

F_{D} = F_{N D}^{q}

for some

q > 1,

where

F_{N D}

is continuous, strictly increasing with associated density

f_{N D} .

Then using (1), AUC

= 1 - 1 / (q + 1)

which is approximately 1 when q is large. The optimal c minimizes Error

(c) = w F_{N D}^{q} (c) + (1 - w) (1 - F_{N D} (c))

which implies c satisfies

F_{N D} (c) = {(1 - w) / q w}^{1 / (q - 1)}

when

q > (1 - w) / w

and the optimal c is otherwise

c = \infty

. If

q = 99,

then AUC

= 0.99

and with

w = 0.025, (1 - w) / w = 39 < q

so FNR

(c_{o p t}) = 0.390,

FPR

(c_{o p t}) = 0.009,

Error

(c_{o p t}) = 0.019,

FDR

(c_{o p t}) = 0.009

and FNDR

(c_{o p t}) = 0.010

. So X seems like a good diagnostic via the AUC and the error characteristics that depend on the prevalence although within the diseased population the probability is

0.39

of not detecting the disease. If instead

w = 0.01,

then the AUC is the same but

q = 99 = (1 - w) / w

and the optimal classification always classifies an individual as non-diseased which is useless. So the AUC does not indicate enough about the characteristics of the diagnostic to determine if it is useful or not. It is necessary to look at the error characteristics of the classification at the cutoff value that will actually be used, to determine if a diagnostic is suitable and this implies that information about w is necessary in an application.

2.2. Relative Belief Inferences

Suppose there is a model

{f_{θ} : θ \in Θ}

for data

x

together with a prior probability measure

Π,

with density

π,

on

Θ .

These ingredients lead, via the principle of conditional probability, to beliefs about the true value of

θ,

as initially expressed by

Π,

being replaced by the posterior probability measure

Π (\cdot | x)

with density

π (\cdot | x) .

Note that if interest is instead in a quantity

ψ = Ψ (θ),

where

Ψ : Θ \to Ψ

and we use the same notation for the function and its range, then the model is replaced by

{m_{ψ} : ψ \in Ψ},

where

m_{ψ} (x) = \int_{Ψ^{- 1} {ψ}} f_{θ} (x) π (θ | ψ) d θ

is obtained by integrating out the nuisance parameters, and the prior is replaced by the marginal prior

π_{Ψ} (ψ) = \int_{Ψ^{- 1} {ψ}} π (θ) d θ .

This leads to the marginal posterior

Π_{Ψ} (\cdot | x)

with density

π_{Ψ} (\cdot | x) .

For the moment suppose that all the distributions are discrete. The principle of evidence then says that there is evidence in favor of the value

ψ

if

π_{Ψ} (ψ | x) > π_{Ψ} (ψ),

evidence against the value

ψ

if

π_{Ψ} (ψ | x) < π_{Ψ} (ψ),

and no evidence either way if

π_{Ψ} (ψ | x) = π_{Ψ} (ψ) .

So, for example, there is evidence in favor of

ψ

if the probability of

ψ

increases after seeing the data. To order the possible values with respect to the evidence, we use the relative belief ratio

R B_{Ψ} (ψ | x) = \frac{π_{Ψ} (ψ | x)}{π_{Ψ} (ψ)} .

Note that

R B_{Ψ} (ψ | x) > (<) 1

indicates whether there is evidence in favor of (against) the value

ψ .

If there is evidence in favor of both

ψ_{1}

and

ψ_{2},

then there is more evidence in favor of

ψ_{1}

than

ψ_{2}

whenever

R B_{Ψ} (ψ_{1} | x) > R B_{Ψ} (ψ_{2} | x)

and, if there is evidence against both

ψ_{1}

and

ψ_{2},

then there is more evidence against

ψ_{1}

than

ψ_{2}

whenever

R B_{Ψ} (ψ_{1} | x) < R B_{Ψ} (ψ_{2} | x) .

For the continuous case consider a sequence of neighborhoods

N_{ϵ} (ψ) ↓ {ψ}

as

ϵ \to 0

and then

R B_{Ψ} (N_{ϵ} (ψ) | x) = \frac{Π_{Ψ} (N_{ϵ} (ψ) | x)}{Π_{Ψ} (N_{ϵ} (ψ))} \to \frac{π_{Ψ} (ψ | x)}{π_{Ψ} (ψ)}

(2)

under very weak conditions such as

π_{Ψ} (ψ) > 0

and

π_{Ψ}

being continuous at

ψ .

All the inferences about quantities considered in the paper are derived based upon the principle of evidence as expressed via the relative belief ratio. For example, it is immediate that the value

R B_{Ψ} (ψ_{0} | x)

indicates whether or not there is evidence in favor of or against the hypothesis

H_{0} : Ψ (θ) = ψ_{0} .

Furthermore, the posterior probability

Π_{Ψ} (R B_{Ψ} (ψ | x) \leq R B_{Ψ} (ψ_{0} | x) | x)

measures the strength of this evidence for, if

R B_{Ψ} (ψ_{0} | x) > 1

and this probability is large, then there is strong evidence in favor of

H_{0}

as there is a small belief that the true value has a larger relative belief ratio and if

R B_{Ψ} (ψ_{0} | x) < 1

and this probability is small, then there is strong evidence against

H_{0}

as there is high belief that the true value has a larger relative belief ratio. For estimation it is natural to estimate

ψ

by the relative belief estimate

ψ (x) = arg {sup}_{ψ \in Ψ} R B_{Ψ} (ψ | x)

as this value has the maximum evidence in its favor. Furthermore, the accuracy of this estimate can be assessed by looking at the plausible region

P l_{Ψ} (x) = {ψ : R B_{Ψ} (ψ | x) > 1},

consisting of all those values for which there is evidence in favor, together with its size and posterior content which measures how strongly it is believed the true value lies in this set. Rather than using the plausible region to assess the accuracy of

ψ (x)

, one could quote a

γ -

relative belief credible region

C_{Ψ, γ} (x) = {ψ : R B_{Ψ} (ψ | x) > c_{γ}}

where the constant

c_{γ}

is the largest value such that

Π_{Ψ} (C_{Ψ, γ} (x) | x) \geq γ

. It is necessary, however, that

γ \leq Π_{Ψ} (P l_{Ψ} (x) | x)

as otherwise

C_{Ψ, γ} (x)

will contain values for which there is evidence against, and this is only known after the data have been seen.

It is established in [7], and in papers referenced there, that these inferences possess a number of good properties such as consistency, satisfy various optimality criteria and clearly they are based on a direct measure of the evidence. Perhaps most significant is the fact that all the inferences are invariant under reparameterizations. For if

λ = Λ (ψ)

, where

Λ

is a smooth bijection, then

R B_{Λ} (λ | x) = \frac{π_{Λ} (λ | x)}{π_{Λ} (λ)} = \frac{π_{Ψ} (Λ^{- 1} (λ) | x) J_{Λ} (Λ^{- 1} (λ))}{π_{Ψ} (Λ^{- 1} (λ)) J_{Λ} (Λ^{- 1} (λ))} = \frac{π_{Ψ} ((ψ) | x)}{π_{Ψ} (ψ)}

and so, for example,

λ (x) = Λ (ψ (x)) .

This invariance property is not possessed by the most common inference methods employed such as MAP estimation or using posterior means and this invariance holds no matter what the dimension of

ψ

is. Moreover, it is proved in [20] that relative belief inferences are optimally robust among all Bayesian inferences for

ψ

, to linear contaminations of the prior on

ψ

.

An analysis, using relative belief, of the data obtained in several physics experiments that were all concerned with examining whether there was evidence in favor of or against the quantum model versus hidden variables is available in [21]. Furthermore, an approach to checking models used for quantum mechanics via relative belief is discussed in [22]. Other applications of relative belief inferences to common problems of statistical practice can be found in [7].

The Bayes factor is an alternative measure of evidence and is commonly used for hypothesis assessment in Bayesian inference. To see why the relative belief ratio has advantages over the Bayes factor for evidence-based inferences consider first assessing the hypothesis

H_{0} : Ψ (θ) = ψ_{0} .

When the prior probability of

ψ_{0}

satisfies

0 < Π_{Ψ} ({ψ_{0}}) < 1,

then the Bayes factor is defined as the ratio of the posterior odds in favor of

H_{0}

to the prior odds in favor of

H_{0},

namely,

B F_{Ψ} (ψ_{0} | x) = \{\frac{Π_{Ψ} ({ψ_{0}} | x)}{Π_{Ψ} ({ψ_{0}}^{c} | x)}\} {\{\frac{Π_{Ψ} ({ψ_{0}})}{Π_{Ψ} ({ψ_{0}}^{c})}\}}^{- 1} .

It is easily shown that the Bayes factor satisfies the principle of evidence and

B F_{Ψ} (ψ_{0} | x) > (<) 1

is evidence in favor (against)

H_{0},

so in this context it is a valid measure of evidence.

One might wonder why it is necessary to consider a ratio of odds as opposed to the simpler ratio of probabilities, as specified by the relative belief ratio, for the purpose of measuring evidence but in fact there is a more serious issue with the Bayes factor. For suppose, as commonly arises in applications, that

Π_{Ψ}

is a continuous probability measure so that

Π_{Ψ} ({ψ_{0}}) = 0

as then the Bayes factor for

H_{0}

is not defined. The common recommendation in this context is to require the specification of the following ingredients: a prior probability

p > 0,

a prior distribution

Π_{H_{0}}

concentrated on

Ψ^{- 1} {ψ_{0}}

which provides the prior predictive density

m_{H_{0}} (x)

, a prior distribution

Π_{H_{0}^{c}}

concentrated on

Ψ^{- 1} {ψ_{0}}^{c}

which provides the prior predictive density

m_{H_{0}^{c}} (x)

and then the full prior is taken to be the mixture

Π = p Π_{H_{0}} + (1 - p) Π_{H_{0}^{c}} .

With this prior the Bayes factor for

H_{0}

is defined, as now the prior probability of

ψ_{0}

equals

p,

and an easy calculation shows that

B F_{Ψ} (ψ_{0} | x) = m_{H_{0}} (x) / m_{H_{0}^{c}} (x) .

Typically the prior

Π_{H_{0}^{c}}

is taken to be the prior that we might place on

θ

when interest is in estimating

ψ

.

Now consider the problem of estimating

ψ

and the prior is such that

Π_{Ψ} ({ψ}) = 0

for every value of

ψ

as with a continuous prior. The Bayes factor is then not defined for any value of

ψ

and, if we wished to use the Bayes factor for estimation purposes, it would be necessary to modify the prior to be a different mixture for each value of

ψ

so that there would be in effect multiple different priors. This does not correspond to the logic underlying Bayesian inference. When using the relative belief ratio for inference only one prior is required and the same measure of evidence is used for both hypothesis assessment and estimation purposes.

Another approach to dealing with the problem that arises with the Bayes factor and continuous priors is to take a limit as in (2) and, when this is done, we obtain the result

B F_{Ψ} (N_{ϵ} (ψ) | x) \to R B_{Ψ} (ψ | x)

as

ϵ \to 0

whenever the prior density of

Ψ

is continuous and positive at

ψ .

In other words the relative belief ratio can be also considered as a natural definition of the Bayes factor in continuous contexts.

3. Inferences for an ROC Analysis

Suppose we have a sample of

n_{D}

from

Ω_{D}

, namely,

x_{D} = (x_{D 1}, \dots, x_{D n_{D}})

and a sample of

n_{N D}

from

Ω_{N D}

, namely,

x_{N D} = (x_{N D 1}, \dots, x_{N D n_{N D}})

and the goal is to make inference about the AUC, the cutoff c and the error characteristics FNR

(c),

FPR

(c),

Error

(c),

FDR

(c)

and FNDR

(c)

. For the AUC it makes sense to first assess the hypothesis

H_{0} :

AUC

> 1 / 2

via stating whether there is evidence for or against

H_{0}

together with an assessment of the strength of this evidence. Estimates are required for all of these quantities, together with an assessment of the accuracy of the estimate.

3.1. The Prevalence

Consider first inferences for the relevant prevalence

w .

If w is known, or at least assumed known, then nothing further needs to be done but otherwise this quantity needs to be estimated when assessing the value of the diagnostic and so uncertainty about w needs to be addressed.

If the full data set is based on sampling scheme (ii), then

n_{D} \sim

binomial

(n, w) .

A natural prior

π_{W}

to place on w is a beta

(α_{1 w}, α_{2 w})

distribution. The hyperparameters are chosen based on the elicitation algorithm discussed in [23] where interval

[l, u]

is chosen such that it is believed that

w \in [l, u]

with prior probability

γ .

Here

[l, u]

is chosen so that we are virtually certain that

w \in [l, u]

and

γ = 0.99

then seems like a reasonable choice. Note that choosing

l = u

corresponds to w being known and so

γ = 1

in that case. Next pick a point

ξ_{w} \in [l, u]

for the mode of the prior and a reasonable choice might be

ξ_{w} = (l + u) / 2 .

Then putting

τ_{w} = α_{1 w} + α_{2 w} - 2

leads to the parameterization beta

(α_{1 w}, α_{2 w}) =

beta

(1 + τ_{w} ξ_{w}, 1 + τ_{w} (1 - ξ_{w}))

where

ξ_{w}

locates the mode and

τ_{w}

controls the spread of the distribution about

ξ_{w} .

Here

τ_{w} = 0

gives the uniform distribution and

τ_{w} = \infty

gives the distribution degenerate at

ξ_{w} .

With

ξ_{w}

specified,

τ_{w}

is the smallest value of

τ_{w}

such that the probability content of

[l, u]

is

γ

and this is found iteratively. For example, if

[l, u] = [0.60, 0.70]

and

γ = 0.99,

so w is known reasonably well, then

ξ_{w} = (l + u) / 2 = 0.65

and

τ_{w} = 601.1,

so the prior is beta

(391.72, 211.39)

and the posterior is beta

(391.72 + n_{D}, 211.39 + n_{N D}) .

The estimate of w is then

w (n_{D}, n_{N D}) = arg sup_{w \in [0, 1]} R B (w | n_{D}, n_{N D}) = arg sup_{w \in [0, 1]} \frac{π_{W} (w | n_{D}, n_{N D})}{π_{W} (w)} .

In this case the estimate is the MLE, namely,

w (n_{D}, n_{N D}) = n_{D} / (n_{D} + n_{N D}) .

The accuracy of this estimate is measured by the size of the plausible region

P l (n_{D}, n_{N D}) = {w : R B (w | n_{D}, n_{N D}) > 1}

. For example, if

n = 100

and

n_{D} = 68,

then

w (68, 32) = 0.68

and

P l (68, 32) = [0.647, 0.712]

which has posterior content

0.651 .

So the data suggest that the upper bound of

u = 0.70

is too strong although the posterior belief in this interval is not very high.

The prior and posterior distributions of w play a role in inferences about all the quantities that depend on the prevalence. In the case where the cutoff is determined by minimizing the probability of a misclassification, then

c_{o p t},

FNR

(c_{o p t}),

FPR

(c_{o p t}),

Error

(c_{o p t}),

FDR

(c_{o p t})

and FNDR

(c_{o p t})

all depend on the prevalence. Under sampling scheme (i), however, only the prior on w has any influence when considering the effectiveness of

X .

Inference for these quantities is now discussed in both cases.

3.2. Ordered Discrete Diagnostic

Suppose X takes values on the finite ordered scale

c_{1} < c_{2} < \dots < c_{m}

and let

p_{N D i} = P (X (ω_{N D}) = c_{i}), p_{D i} = P (X (ω_{D}) = c_{i})

so

F_{N D} (c_{i}) = \sum_{j = 1}^{i} p_{N D j}

and

F_{D} (c_{i}) = \sum_{j = 1}^{i} p_{D j}

. These imply that FPR

(c_{i}) = 1 - \sum_{j = 1}^{i} p_{N D i},

FNR

(c_{i}) = \sum_{j = 1}^{i} p_{D i},

AUC (p_{N D}, p_{D}) = \sum_{i = 1}^{m} (1 - FNR (c_{i})) p_{N D i}

with the remaining quantities defined similarly. Ref. [23] can be used to obtain independent elicited Dirichlet priors

p_{N D} \sim Dirichlet (α_{N D 1}, \dots, α_{N D m}), p_{D} \sim Dirichlet (α_{D 1}, \dots, α_{D m})

(3)

on these probabilities by placing either upper or lower bounds on each cell probability that hold with virtual certainty

γ,

as discussed for the beta prior on the prevalence. If little information is available, it is reasonable to use uniform (Dirichlet

(1, \dots, 1)

) priors on

p_{N D}

and

p_{D} .

This together with the independent prior on w leads to prior distributions for the AUC,

c_{o p t}

and all the quantities associated with error assessment such as FNR

(c_{o p t}),

etc.

Data

(x_{D}, x_{N D})

lead to counts

f_{N D} = (f_{N D 1}, \dots, f_{N D m})

and

f_{D} = (f_{D 1}, \dots, f_{D m})

which in turn lead to the independent posteriors

p_{N D} | f_{N D} \sim Dirichlet (α_{N D} + f_{N D}), p_{D} | f_{D} \sim Dirichlet (α_{D} + f_{D}) .

(4)

Under sampling regime (ii) this, together with the independent posterior on

w,

leads to posterior distributions for all the quantities of interest. Under sampling regime (i), however, the logical thing to do, so the inferences reflect the uncertainty about

w,

is to only use the prior on w when deriving inferences about any quantities that depend on this such as

c_{o p t}

and the various error assessments.

Consider inferences for the AUC. The first inference should be to assess the hypothesis

H_{0} :

AUC

> 1 / 2

for, if

H_{0}

is false, then X would seem to have no value as a diagnostic (the possibility that the directionality is wrong is ignored here). The relative belief ratio

R B (H_{0} | f_{N D}, f_{N D}) = Π (H_{0} | f_{N D}, f_{N D}) / Π (H_{0})

is computed and compared to 1. If it is concluded that

H_{0}

is true, then perhaps the next inference of interest is to estimate the AUC via the relative belief estimate. The prior and posterior densities of the AUC are not available in closed form so estimates are required and density histograms are employed here for this. The set

(0, 1]

is discretized into L subintervals

(0, 1] = \cup_{i = 1}^{L} ((i - 1) / L, i / L],

and putting

a_{i} = (i - 1 / 2) / L,

the value of the prior density

p_{AUC} (a_{i})

is estimated by

L (

proportion of prior simulated values of AUC in

(i - 1, i] / L)

and similarly for the posterior density

p_{AUC} (a_{i} | f_{N D}, f_{D}) .

Then

R B_{AUC} (a | f_{N D}, f_{N D})

is maximized to obtain the relative belief estimate AUC

(f_{N D}, f_{D})

together with the plausible region and its posterior content.

These quantities are also obtained for

c_{o p t}

in a similar fashion, although

c_{o p t}

has prior and posterior distribution concentrated on

{c_{1}, c_{2}, \dots, c_{m}}

so there is no need to discretize. Estimates of the quantities FNR

(c_{o p t} (f_{N D}, f_{D})),

FPR

(c_{o p t} (f_{N D}, f_{D})),

Error

(c_{o p t} (f_{N D}, f_{D})),

FDR

(c_{o p t} (f_{N D}, f_{D}))

and FNDR

(c_{o p t} (f_{N D}, f_{D}))

are also obtained as these indicate the performance of the diagnostic in practice. The relative belief estimates of these quantities are easily obtained in a second simulation where

c_{o p t} (f_{N D}, f_{D})

is fixed.

Consider now an example.

Example 2.

Simulated example.

For

k = 5

and

c_{i} = i

, data were generated as

\begin{matrix} f_{N D} & \sim multinomial (50, 0.5, 0.2, 0.1, 0.1, 0.1) obtaining f_{N D} = (29, 7, 4, 5, 5), \\ f_{D} & \sim multinomial (100, 0.1, 0.1, 0.2, 0.3, 0.3) obtaining f_{D} = (14, 7, 25, 33, 21) . \end{matrix}

With these choices for

p_{N D}, p_{D}

the true values are AUC

= 0.65

, and with

w = 0.65

,

c_{o p t} = 2,

FNR

(c_{o p t}) = 0.200,

FPR

(c_{o p t}) = 0.300,

Error

_{w} (c_{o p t}) = 0.235,

FDR

(c_{o p t}) = 0.168

and FNDR

(c_{o p t}) = 0.347

. So X is not an outstanding diagnostic but with these error characteristics it may prove suitable for a given application. Uniform, namely, Dirichlet

(1, 1, 1, 1, 1),

priors were placed on

p_{N D}

and

p_{D},

reflecting little knowledge about these quantities.

Simulations based on Monte Carlo sample sizes of

N = 10^{5}

from the prior and posterior distributions of

p_{N D}

and

p_{D}

were conducted and the prior and posterior distributions of the quantities of interest obtained. The hypothesis

H_{0} :

AUC

> 0.5

is assessed by

R B_{AUC} ((0.50, 1.00] | f_{N D}, f_{D}) = 3.15 .

So there is evidence in favor of

H_{0}

and the strength of this evidence is measured by the posterior probability content of

(0.50, 1.00]

which equals

1.0

to machine accuracy and so this is categorical evidence in favor of

H_{0} .

For the continuous quantities a grid based on

L + 1 = 25

equispaced points

{0, 0.04, 0.08, \dots, 1.00}

was used and all the mass in the interval

(i - 1, i] / L

assigned to the midpoint

(i - 1 / 2) / L .

Figure 1 contains plots of the prior and posterior densities and relative belief ratio of the AUC. The relative belief estimate of the AUC is AUC

(f_{N D}, f_{D}) = 0.66

with

P l_{AUC} (f_{N D}, f_{D}) = [0.60, 0.72]

having posterior content

0.97 .

Certainly a finer partition of

[0, 1]

than just 24 intervals is possible, but even in this relatively coarse case the results are quite accurate.

Supposing that the relevant prevalence is known to be

w = 0.65,

Figure 2 contains plots of the prior and posterior densities and relative belief ratio of

c_{o p t} .

The relative belief estimate is

c_{o p t} (f_{N D}, f_{D}) = 2

with

P l_{c_{o p t}} (f_{N D}, f_{D}) = {2}

with posterior probability content

0.53

so the correct optimal cut-off has been identified but there is a degree of uncertainty concerning this. The error characteristics that tell us about the utility of X as a diagnostic are given by the relative belief estimates (column (a)) in Table 2. It is interesting to note that the estimate of Error

(c_{o p t})

is determined by the prior and posterior distributions of a convex combination of FPR

(c_{o p t})

and FNR

(c_{o p t})

and the estimate is not the same convex combination of the estimates of FPR

(c_{o p t})

and FNR

(c_{o p t})

. So, in this case Error

(c_{o p t})

seems like a much better assessment of the performance of the diagnostic.

Suppose now that the prevalence is not known but there is a beta

(1 + τ_{w} ξ_{w}, 1 + τ_{w} (1 - ξ_{w}))

prior specified for w and consider the choice discussed in Section 3.1 where

ξ_{w} = 0.65

and

τ_{w} = 601.1 .

When the data are produced according to sampling regime (i), then there is no posterior for w but this prior can still be used in determining the prior and posterior distributions of

c_{o p t}

and the associated error characteristics. When this simulation was carried out

c_{o p t} (f_{N D}, f_{D}) = 2

with

P l_{c_{o p t}} (f_{N D}, f_{D}) = {2}

with posterior probability content

0.53 .

and column (b) of Table 2 gives the estimates of the error characteristics. So other than the estimate of the FPR, the results are similar. Finally, assuming that the data arose under sampling scheme (ii), then w has a posterior distribution and using this gives

c_{o p t} (f_{N D}, f_{D}) = 2

with

P l_{c_{o p t}} (f_{N D}, f_{D}) = {2}

with posterior probability content

0.52

and error characteristics as in column (c) of Table 2. These results are the same as if the prevalence is known which is sensible as the posterior concentrates about the true value more than the prior.

Another somewhat anomalous feature of this example is the fact that uniform priors on

p_{D}

and

p_{N D}

do not lead to a prior on the AUC that is even close to uniform. In fact one could say that this prior has a built-in bias against a diagnostic with AUC

> 1 / 2

and indeed most choices of

p_{D}

and

p_{N D}

will not satisfy this. Another possibility is to require

p_{N D 1} \geq \dots \geq p_{N D m}

and

p_{D 1} \leq \dots \leq p_{D m},

namely, require monotonicity of the probabilities. A result in [22] implies that

p_{N D}

satisfies this iff

p_{N D} = A_{k} ω_{N D}

where

ω_{N D} \in S_{k},

the standard

(k - 1)

-dimensional simplex, and

A_{k} \in R^{k \times k}

with i-ith row equal to

(0, \dots, 0, 1 / i, 1 / (i + 1), \dots, 1 / k)

and

p_{D}

satisfies this iff

p_{D} = B_{k} ω_{D}

where

ω_{D} \in S_{k}

and

B_{k} = I_{k}^{*} A_{k}

where

I_{k}^{*}

\in R^{k \times k}

contains all 0’s except for 1’s on the crossdiagonal. If

ω_{N D}

and

ω_{D}

are independent and uniform on

S_{k},

then

p_{D}

and

p_{N D}

are independent and uniform on the sets of probabilities satisfying the corresponding monotonicities and Figure 3 has a plot of the prior of the AUC when this is the case. It is seen that this prior is biased in favor of AUC

> 1 / 2 .

Figure 3 also has a plot of the prior of the AUC when

p_{D}

is uniform on the set of all nondecreasing probabilities and

p_{N D}

is uniform on

S_{k} .

This reflects a much more modest belief that X will satisfy AUC

> 1 / 2

and indeed this may be a more appropriate prior than using uniform distributions on

S_{k} .

Ref. [22] also provides elicitation algorithms for choosing alternative Dirichlet distributions for

ω_{N D}

and

ω_{D} .

When

H_{0} :

AUC

> 0.5

is accepted, it makes sense to use the conditional prior, given that this event is true, in the inferences. As such it is necessary to condition the prior on the event

\sum_{i = 1}^{m} (\sum_{j = 1}^{i} p_{D j}) p_{N D i} \leq 1 / 2 .

In general, it is not clear how to generate from this conditional prior but depending on the size of m and the prior, a brute force approach is to simply generate from the unconditional prior and select those samples for which the condition is satisfied and the same approach works with the posterior.

Here

m = 5,

and using uniform priors for

p_{N D}

and

p_{D}

, the prior probability of AUC

> 0.5

is

0.281

while the posterior probability is

0.998

so the posterior sampling is much more efficient. Choosing priors that are more favorable to AUC

> 0.5

will improve the efficiency of the prior sampling. Using the conditional priors led to AUC

(f_{N D}, f_{D}) = 0.66

with

P l_{AUC} (f_{N D}, f_{D}) = [0.60, 0.76]

with posterior content

0.85

. This is similar to the results obtained using the unconditional prior but the conditional prior puts more mass on larger values of the AUC hence the wider plausible region with lower posterior content. Moreover,

c_{o p t} (f_{N D}, f_{D}) = 2

with

P l_{c_{o p t}} (f_{N D}, f_{D}) = {1, 2}

with posterior probability content approximately

1.00

(actually

0.99999

) which reflects virtual certainty that the true optimal value is in

{1, 2} .

3.3. Binormal Diagnostic

Suppose now that X is a continuous diagnostic variable and it is assumed that the distributions

F_{D}

and

F_{N D}

are normal distributions. The assumption of normality should be checked by an appropriate test and it will be assumed here that this has been carried out and normality was not rejected. While the normality assumption may seem somewhat unrealistic, many aspects of the analysis can be expressed in closed form and this allows for a deeper understanding of ROC analyses more generally.

With

Φ

denoting the

N (0, 1)

cdf, then FNR

(c) = Φ ((c - μ_{D}) / σ_{D}),

FPR

(c) = 1 - Φ ((c - μ_{N D}) / σ_{N D})

so

c = μ_{N D} + σ_{N D} Φ^{- 1} (1 - (1 - F_{N D} (c)))

and

AUC = \int_{- \infty}^{\infty} Φ (\frac{μ_{D} - μ_{N D}}{σ_{D}} + \frac{σ_{N D}}{σ_{D}} z) φ (z) d z .

For given

(μ_{D}, σ_{D}, μ_{N D}, σ_{N D})

and

c,

all these values can be computed using

Φ

except the AUC and for that quadrature or simulation via generating

z \sim N (0, 1)

is required.

The following results hold for the AUC with the proofs in the Appendix A.

Lemma 1.

AUC

> 1 / 2

iff

μ_{D} > μ_{N D}

and when

μ_{D} > μ_{N D},

the AUC is a strictly increasing function of

σ_{N D} / σ_{D} .

From Lemma 1 it is clear that it makes sense to restrict the parameterization so that

μ_{D} > μ_{N D}

but we need to test the hypothesis

H_{0} : μ_{D} > μ_{N D}

first. Clearly Error

(c) = w

FNR

(c) + (1 - w)

FPR

(c) \to 1 - w

as

c \to - \infty

and Error

(c) \to w

as

c \to \infty

so, if Error

(c)

does not achieve a minimum at a finite value of

c,

then the optimal cut-off is infinite and the optimal error is

min {w, 1 - w} .

It is possible to give conditions under which a finite cutoff exists and express

c_{o p t}

in closed form when the parameters and the relevant prevalence w are all known.

Lemma 2.

(i) When

σ_{D}^{2} = σ_{N D}^{2} = σ^{2},

then a finite optimal cut-off minimizing Error

(c)

exists iff

μ_{D} > μ_{N D}

and in that case

c_{o p t} = \frac{μ_{D} + μ_{N D}}{2} + \frac{σ^{2}}{μ_{D} - μ_{N D}} log (\frac{1 - w}{w}) .

(5)

(ii) When

σ_{D}^{2} \neq σ_{N D}^{2},

then a finite optimal cut-off exists iff

{(μ_{D} - μ_{N D})}^{2} + 2 (σ_{D}^{2} - σ_{N D}^{2}) log (\frac{1 - w}{w} \frac{σ_{D}}{σ_{N D}}) \geq 0

(6)

and in that case

c_{o p t} = \frac{σ_{N D}^{2} μ_{D} - σ_{D}^{2} μ_{N D}}{σ_{N D}^{2} - σ_{D}^{2}} - \frac{σ_{N D} σ_{D}}{σ_{N D}^{2} - σ_{D}^{2}} {\{\begin{matrix} {(μ_{D} - μ_{N D})}^{2} + \\ 2 (σ_{D}^{2} - σ_{N D}^{2}) log (\frac{1 - w}{w} \frac{σ_{D}}{σ_{N D}}) \end{matrix}\}}^{1 / 2} .

(7)

Note that when

w = 1 / 2,

then in (i)

c_{o p t} = (μ_{D} + μ_{N D}) / 2

as one might expect. In the case of unequal variances there is an additional restriction beyond

μ_{D} \geq μ_{N D}

required to hold if the diagnostic is to serve as a reasonable classifier. The following shows that these can be combined in a natural way.

Corollary 1.

The restrictions

μ_{D} \geq μ_{N D}

and (6) hold iff

μ_{D} - μ_{N D} - {\{max [0, - 2 (σ_{D}^{2} - σ_{N D}^{2}) log (\frac{1 - w}{w} \frac{σ_{D}}{σ_{N D}})]\}}^{1 / 2} \geq 0 .

(8)

So, if one is unwilling to assume constant variance, then the hypothesis

H_{0} :

(8) holds, needs to be assessed. There is some importance to these results as they demonstrate that a finite optimal cutoff may in fact not exist at least when considering both types of error. For example, when

μ_{N D} = 1, μ_{D} = 2, σ_{D} = 1, σ_{N D} = 1.5,

then for any

w \leq 0.30885,

the optimal cutoff is

c_{o p t} = \infty

with Error

(\infty) = w .

When

c_{o p t}

is infinite, then one may need to consider various cutoffs c and find one that is acceptable at least with respect to some of the error characteristics FNR

(c),

FPR

(c),

Error

(c)

, FDR

(c)

and FNDR

(c) .

Consider now examples with equal and unequal variances.

Example 3.

Binormal with

σ_{N D}^{2} = σ_{D}^{2} .

There may be reasons why the assumption of equal variance is believed to hold but this needs to be assessed and evidence in favor found. If evidence against the assumption is found, then the approach of Example 4 can be used. A possible prior is given by

π_{1} (μ_{N D}, σ^{2}) π_{2} (μ_{D} | σ^{2})

where

μ_{N D} | σ^{2} \sim N (μ_{0}, τ_{0}^{2} σ^{2}), μ_{D} | σ^{2} \sim N (μ_{0}, τ_{0}^{2} σ^{2}), 1 / σ^{2} \sim gamma (λ_{1}, λ_{2})

which is a conjugate prior. The hyperparameters to be elicited are

(μ_{0}, τ_{0}^{2}, λ_{1}, λ_{2}) .

Consider first eliciting the prior for

(μ_{N D}, σ^{2}) .

For this an interval

(m_{1}, m_{2})

is specified such that is it believed that

μ_{N D} \in (m_{1}, m_{2})

with virtual certainty (say with probability

γ = 0.99) .

Then putting

μ_{0} = (m_{1} + m_{2}) / 2

implies

γ \leq Φ ((m_{2} - μ_{0}) / τ_{0} σ) - Φ ((m_{1} - μ_{0}) / τ_{0} σ) = 2 Φ ((m_{2} - m_{1}) / 2 τ_{0} σ) - 1

which implies

σ \leq (m_{2} - m_{1}) / 2 τ_{0} z_{(1 + γ) / 2}

where

z_{(1 + γ) / 2} = Φ^{- 1} ((1 + γ) / 2) .

The interval

μ_{N D} \pm σ z_{(1 + γ) / 2}

will contain an observation from

F_{N D}

with virtual certainty and let

(l_{0}, u_{0})

be lower and upper bounds on the half-length of this interval so

l_{0} / z_{(1 + γ) / 2} \leq σ \leq u_{0} / z_{(1 + γ) / 2}

with virtual certainty. This implies

τ_{0} = (m_{2} - m_{1}) / 2 u_{0} .

This leaves specifying the hyperparameters

(λ_{1}, λ_{2}),

and letting

G (\cdot, λ_{1}, λ_{2})

denote the cdf of the gamma

(λ_{1}, λ_{2})

distribution, then

(λ_{1}, λ_{2})

satisfying

G (z_{(1 + γ) / 2}^{2} / l_{0}^{2}, λ_{1}, λ_{2}) = (1 + γ) / 2, G (z_{(1 + γ) / 2}^{2} / u_{0}^{2}, λ_{1}, λ_{2}) = (1 - γ) / 2

(9)

will give the specified γ coverage. Noting that

G (x, λ_{1}, λ_{2}) = G (λ_{2} x, λ_{1}, 1),

first specify

λ_{1}

and solve the first equation in (9) for

λ_{2}

and then solve the second equation in (9) for

λ_{1}

and continue this iteration until the probability content of

(l_{0} / z_{(1 + γ) / 2}, u_{0} / z_{(1 + γ) / 2})

is sufficiently close to γ. Using

s_{D}^{2} = | | x_{D} - {\bar{x}}_{D} {1 | |}^{2}, s_{N D}^{2} = | | x_{N D} - {\bar{x}}_{N D} 1 {| |}^{2},

the posterior is then

\begin{matrix} μ_{N D} | σ^{2}, x_{N D} \sim N ({(n_{N D} + 1 / τ_{0}^{2})}^{- 1} (n_{N D} {\bar{x}}_{N D} + μ_{0} / τ_{0}^{2}), {(n_{N D} + 1 / τ_{0}^{2})}^{- 1} σ^{2}), \\ μ_{D} | σ^{2}, x_{D} \sim N ({(n_{D} + 1 / τ_{0}^{2})}^{- 1} (n_{D} {\bar{x}}_{D} + μ_{0} / τ_{0}^{2}), {(n_{D} + 1 / τ_{0}^{2})}^{- 1} σ^{2}), \\ 1 / σ^{2} | (x_{N D}, x_{D}) \sim gamma (λ_{1} + (n_{D} + n_{N D}) / 2, λ_{x}) \end{matrix}

where

\begin{matrix} λ_{x} & = & λ_{2} + (s_{D}^{2} + s_{N D}^{2}) / 2 + {(n_{D} + 1 / τ_{0}^{2})}^{- 1} (n_{D} / τ_{0}^{2}) {({\bar{x}}_{D} - μ_{0})}^{2} / 2 + \\ {(n_{N D} + 1 / τ_{0}^{2})}^{- 1} (n_{N D} / τ_{0}^{2}) {({\bar{x}}_{N D} - μ_{0})}^{2} / 2 . \end{matrix}

Suppose the following values of the mss were obtained based on samples of

n_{N D} = 25

from

F_{N D} = N (0, 1)

and

n_{D} = 20

from

F_{D} = N (1, 1)

({\bar{x}}_{N D}, s_{N D}^{2}) = (- 0.072, 19.638), ({\bar{x}}_{D}, s_{D}^{2}) = (0.976, 16.778) .

So the true values of the parameters are

μ_{N D} = 0, μ_{D} = 1, σ^{2} = 1 .

In this case AUC

= \int_{- \infty}^{\infty} Φ (1 + z) φ (z) d z = 0.760 .

Supposing that the relevant prevalence is

w = 0.4, c_{o p t} = 0.5 + log (0.6 / 0.4) = 0.905,

FNR

(c_{o p t}) = Φ (0.905 - 1) = 0.46

, FPR

(c_{o p t}) = 1 - Φ (0.905) = 0.18

, Error

(c_{o p t}) = 0.30

, FDR

(c_{o p t}) = 0.34

, FNDR

(c_{o p t}) = 0.27

,

For the prior elicitation, suppose it is known with virtual certainty that both means lie in

(- 5, 5)

and

(l_{0}, u_{0}) = (1, 10)

so we take

μ_{0} = (- 5 + 5) / 2 = 0, τ_{0} = (m_{2} - m_{1}) / 2 u_{0} = 0.5

and the iterative process leads to

(λ_{1}, λ_{2}) = (1.787, 1.056) .

For inference about

c_{o p t}

it is necessary to specify a prior distribution for the prevalence

w .

This can range from w being completely known to being completely unknown whence a uniform(0,1) (beta

(1, 1)

) would be appropriate. Following the developments of Section 3.1, suppose it is known that

w \in [l, u] = [0.2, 0.6]

with prior probability

γ = 0.99,

so in this case

ξ_{w} = (l + u) / 2 = 0.4

and

τ_{w} = 35.89725

and the prior is

w \sim

beta

(15.3589, 22.53835) .

The first inference step is to assess the hypothesis

H_{0} :

AUC

> 1 / 2

which is equivalent to

H_{0} :

μ_{N D} < μ_{D}

by computing the prior and posterior probabilities of this event to obtain the relative belief ratio. The prior probability of

H_{0}

given

σ^{2}

is

\int_{- \infty}^{\infty} Φ ((μ_{D} - μ_{0}) / τ_{0} σ) {(τ_{0} σ)}^{- 1} φ ((μ_{D} - μ_{0}) / τ_{0} σ) d μ_{D} = 1 / 2

and averaging this quantity over the prior for

σ^{2}

we get

1 / 2 .

The posterior probability of this event can be easily obtained via simulating from the joint posterior. When this is done in the specific numerical example, the relative belief ratio of this event is

2.011

with posterior content

0.999

so there is strong evidence that

H_{0} :

AUC

> 1 / 2

is true.

If evidence is found against

H_{0},

then this would indicate a poor diagnostic. If evidence is found in favor, then we can proceed conditionally given that

H_{0}

holds and so condition the joint prior and joint posterior on this event being true when making inferences about AUC,

c_{o p t},

etc. So for the prior it is necessary to generate

1 / σ^{2} \sim

gamma

(α_{0}, β_{0})

and then generate

(μ_{D}, μ_{N D})

from the joint conditional prior given

σ^{2}

and that

μ_{D} > μ_{N D} .

Denoting the conditional priors given

σ^{2}

by

π_{D} (μ_{D} | σ^{2})

and

π_{N D} (μ_{N D} | σ^{2}),

we see that this joint conditional prior is proportional to

π_{N D} (μ_{N D} | σ^{2}) π_{D} (μ_{D} | σ^{2}) = Π_{N D} (μ_{N D} < μ_{D} | μ_{D}, σ^{2}) \frac{π_{N D} (μ_{N D})}{Π_{N D} (μ_{N D} < μ_{D} | σ^{2})} π_{D} (μ_{D} | σ^{2}) .

While generally it is not possible to generate efficiently from this distribution we can use importance sampling to calculate any expectations by generating

μ_{D} \sim μ_{D} | σ^{2} \sim N (μ_{0}, τ_{0}^{2} σ^{2}), μ_{N D} \sim N (μ_{0}, τ_{0}^{2} σ^{2} | (- \infty, μ_{D}])

with

Π_{N D} (μ_{N D} < μ_{D} | μ_{D}, σ^{2}) = Φ ((μ_{D} - μ_{0}) / τ_{0} σ)

serving as the importance sampling weight and where

N (μ_{0}, τ_{0}^{2} σ^{2} | (- \infty, μ_{D}])

denotes the

N (μ_{0}, τ_{0}^{2} σ^{2})

distribution conditioned to

(- \infty, μ_{D}]

with density

Φ^{- 1} ((μ_{D} - μ_{0}) / τ_{0} σ) {(2 π τ_{0}^{2} σ^{2})}^{- 1 / 2} φ ((μ_{N D} - μ_{0}) / τ_{0} σ)

for

μ_{N D} \leq μ_{D}

and 0 otherwise. Generating from this distribution via inversion is easy since the cdf is

Φ ((μ_{N D} - μ_{0}) / τ_{0} σ) / Φ ((μ_{D} - μ_{0}) / τ_{0} σ) .

Note that, if we take the posterior from the unconditioned prior and condition that, we will get the same conditioned posterior as when we use the conditioned prior to obtain the posterior. This implies that in the joint posterior for

(μ_{N D}, μ_{D}, σ^{2})

it is only necessary to adjust the posterior for

μ_{N D}

as was done with the prior and this is also easy to generate from. Note that Lemma 2 (i) implies that it is necessary to use the conditional prior and posterior to guarantee that

c_{o p t}

exists finitely.

Since

H_{0}

was accepted, the conditional sampling was implemented and the estimate of the AUC is

0.795

with plausible region

[0.670, 0.880]

which has posterior content

0.856 .

So the estimate is close to the true value but there is substantial uncertainty. Figure 4 is a plot of the conditioned prior, the conditioned posterior and relative belief ratio for this data.

With the specified prior for

w,

the posterior is beta

(35.3589, 47.53835)

which leads to estimate

0.444

for w with plausible interval

(0.374, 0.516)

having posterior probability content

0.782 .

Using this prior and posterior for w and the conditioned prior and posterior for

(μ_{D}, μ_{N D}, σ^{2}),

we proceed to an inference about

c_{o p t}

and the error characteristics associated with this classification. A computational problem arises when obtaining the prior and posterior distributions of

c_{o p t}

as it is clear from (5) that these distributions can be extremely long-tailed. As such, we transform to

c_{mod} = 0.5 + arctan (c_{o p t}) / π \in [0, 1]

(the Cauchy cdf), obtain the estimate

c_{mod} (d)

where

d = (n_{N D}, {\bar{x}}_{N D}, s_{N D}^{2}, n_{D}, {\bar{x}}_{D}, s_{D}^{2})

and its plausible region and then, applying the inverse transform, obtain

c_{o p t} (d) = tan (π (c_{mod} (d) - 0.5))

and its plausible region. It is notable that relative belief inferences are invariant under 1-1 smooth transformations, so it does not matter which parameterization is used, but it is much easier computationally to work with a bounded quantity. Furthermore, if a shorter tailed cdf is used rather than a Cauchy, e.g., a

N (0, 1)

cdf, then errors can arise due to extreme negative values being always transformed to 0 and very extreme positive values always transformed to 1. Figure 5 is a plot of the prior density, posterior density and relative belief ratio of

c_{mod} .

For these data

c_{o p t} (d) = 0.715

with plausible interval

(0.316, 1.228)

having posterior content

0.860 .

Large Monte Carlo samples were used to get smooth estimates of the densities and relative belief ratio but these only required a few minutes of computer time on a desktop. The estimated error characteristics at this value of

c_{o p t}

are as follows: FNR

(0.715) = 0.41

, FPR

(0.715) = 0.22

, Error

(0.715) = 0.27

, FDR

(0.715) = 0.30

, FNDR

(0.715) = 0.24

which are close to the true values.

Example 4.

Binormal with

σ_{N D}^{2} \neq σ_{D}^{2} .

In this case the prior is given by

π_{1} (μ_{N D}, σ_{N D}^{2}) π_{2} (μ_{D}, σ_{D}^{2})

where

\begin{matrix} μ_{N D} | σ_{N D}^{2} \sim N (μ_{0}, τ_{0}^{2} σ_{N D}^{2}), 1 / σ_{N D}^{2} \sim gamma (λ_{1}, λ_{2}) \\ μ_{D} | σ_{D}^{2} \sim N (μ_{0}, τ_{0}^{2} σ_{D}^{2}), 1 / σ_{D}^{2} \sim gamma (λ_{1}, λ_{2}) . \end{matrix}

(10)

Although this specifies the same prior for the two populations, this is easily modified to use different priors and, in any case, the posteriors are different. Again it is necessary to check that the AUC

> 1 / 2

but also to check that

c_{o p t}

exists using the full posterior based on this prior and for this we have the hypothesis

H_{0}

given by Corollary 1. If evidence in favor of

H_{0}

is found, the prior is replaced by the conditional prior given this event for inference about

c_{o p t} .

This can be implemented via importance sampling as was done in Example 3 and similarly for the posterior.

Using the same data and hyperparameters as in Example 3 the relative belief ratio of

H_{0}

is

3.748

with posterior content

0.828

so there is reasonably strong evidence in favor of

H_{0} .

Estimating the value of the AUC is then based on conditioning on

H_{0}

being true. Using the conditional prior given that

H_{0}

is true, the relative belief estimate of the AUC is

0.793

with plausible interval

(0.683, 0.857)

with posterior content

0.839 .

The optimal cutoff is estimated as

c_{o p t} (d) = 0.739

with plausible interval

(0.316, 1.228)

having posterior content

0.875 .

Figure 6 is a plot of the prior density, posterior density and relative belief ratio of

c_{mod} .

The estimates of the error characteristics at

c_{o p t} (d)

are as follows: FNR

(0.739) = 0.43

, FPR

(0.739) = 0.19

, Error

(0.739) = 0.28

, FDR

(0.739) = 0.28

, FNDR

(0.624) = 0.264 .

It is notable that these inferences are very similar to those in Example 3. It is also noted that the sample sizes are not big and so the only situation where it might be expected that the inferences will be quite different between the two analyses is when the variances are substantially different.

3.4. Nonparametric Bayes Model

Suppose that X is a continuous variable, of course still measured to some finite accuracy, and available information is such that no particular finite dimensional family of distributions is considered feasible. The situation is considered where a normal distribution

N (μ, σ^{2}),

perhaps after transforming the data, is considered as a possible base distribution for X but we want to allow for deviation from this form. Alternative choices can also be made for the base distribution. The statistical model is then to assume that the

x_{N D}

and

x_{D}

are generated as samples from

F_{N D}

and

F_{D},

where these are independent values from a DP

(a, H)

(Dirichlet) process with base

H = N (μ, σ^{2})

for some

(μ, σ^{2})

and concentration parameter

a .

Actually, since it is difficult to argue for some particular choice of

(μ, σ^{2}),

it is supposed that

(μ, σ^{2})

also has a prior

π (μ, σ^{2}) .

The prior on

(F_{N D}, F_{D})

is then specified hierarchically as a mixture Dirichlet process,

\begin{matrix} (μ_{N D}, σ_{N D}^{2}) \sim π independent of (μ_{D}, σ_{D}^{2}) \sim π, \\ F_{N D} | (μ_{N D}, σ_{N D}^{2}) \sim DP (a_{N D}, N (μ_{N D}, σ_{N D}^{2})) independent of \\ F_{D} | (μ_{D}, σ_{D}^{2}) \sim DP (a_{D}, N (μ_{D}, σ_{D}^{2})) . \end{matrix}

To complete the prior it is necessary to specify

π

and the concentration parameters

a_{N D}

and

a_{D} .

For

π

the prior is taken to be a normal distribution elicited as discussed in Section 3.3 although other choices are possible. For eliciting the concentration parameters, consider how strongly it is believed that normality holds and for convenience suppose

a = a_{N D} = a_{D} .

If

F \sim

DP

(a, H)

with H a probability measure, then

E (F (A)) = H (A)

and

V a r (F (A)) = H (A) (1 - H (A)) / (1 + a) .

When F a random measure from

P,

then

{sup}_{A} P (| F (A) - H (A) | \geq ε) = {sup}_{A} {1 - P (max (0, H (A) - ε) < F (A) < min (1, H (A) + ε))}

which, when

P \sim

DP

(a, H),

equals

sup_{r \in [0, 1]} {1 - B ([max (0, r - ε), min (1, r + ε)], a r, a (1 - r))}

(11)

where

B (\cdot, β_{1}, β_{2})

denotes the beta

(β_{1}, β_{2})

measure. This upper bound on the probability that the random F differs from H by at least

ε

on an event can be made as small as desirable by choosing a large enough. For example, if

ε = 0.25

and it is required that this upper bound be less than

0.1,

then this satisfied when

a \geq 9.8

and if instead

ε = 0.1

, then

a \geq 66.8

is necessary. Note that, since this bound holds for every continuous probability measure

H,

it also holds when H is random, as considered here. So a is controlling how close it is believed that the true distribution is to H. Alternative methods for eliciting a can be found in [24,25].

Generating

(F_{N D}, F_{D})

from the prior for given

(a, H)

can only be done approximately and the approach of [26] is adopted. For this, integer

n^{*}

is specified and measure

P_{n^{*}} = \sum_{i = 1}^{n^{*}} p_{i, n^{*}} I_{{c_{i}}}

is generated where

(p_{1, n^{*}}, \dots, p_{n^{*}, n^{*}}) \sim D i r i c h l e t (a / n^{*}, \dots, . a / n^{*})

independent of

c_{1}, \dots, c_{n^{*}} \overset{i i d}{\sim} H,

since

P_{n^{*}} \overset{w}{\to}

DP

(a, H)

as

n^{*} \to \infty .

So to carry out a priori calculations proceed as follows. Generate

\begin{matrix} (p_{N D 1, n^{*}}, \dots, p_{N D n^{*}, n^{*}}) \sim Dirichlet ((a / n^{*}) 1_{n^{*}}), (μ_{N D}, σ_{N D}^{2}) \sim π, \\ (c_{N D 1}, \dots, c_{N D n^{*}}) | (μ_{N D}, σ_{N D}^{2}) \overset{i . i . d .}{\sim} N (μ_{N D}, σ_{N D}^{2}), w \sim beta (α_{1 w}, α_{2 w}) \end{matrix}

and similarly for

(p_{D 1, n^{*}}, \dots, p_{D n^{*}, n^{*}}), (μ_{D}, σ_{D}^{2}),

and

(c_{D 1}, \dots, c_{D n^{*}}) .

Then

F_{N D, n^{*}} (c) = \sum_{{i : c_{N D i} \leq c}} p_{N D i n^{*}}

is the random cdf at

c \in R^{1}

and similarly for

F_{D, n^{*}}

, so AUC

= \sum_{i = 1}^{n^{*}} (1 - F_{D, n^{*}} (c_{N D i})) p_{N D i, n^{*}}

is a value from the prior distribution of the AUC. This is done repeatedly to get the prior distribution of the AUC as in our previous discussions and we proceed similarly for the other quantities of interest.

Now

F_{N D} | x_{N D}, (μ_{N D}, σ_{N D}^{2}, μ_{D}, σ_{D}^{2}) \sim DP (a + n_{N D}, H_{N D})

independent of

F_{D} | x_{D}, (μ_{N D}, σ_{N D}^{2}, μ_{D}, σ_{D}^{2}) \sim DP (a + n_{D}, H_{D})

with

H_{N D} (c) = a Φ ((c - μ_{N D}) / σ_{N D}) / (a + n_{N D}) + n_{N D} {\hat{F}}_{N D} (c) / (a + n_{N D})

and

{\hat{F}}_{N D} (c) = \sum_{i = 1}^{n_{N D}} I_{(- \infty, c]} (x_{N D i}) / n_{N D}

is the empirical cdf (ecdf) based on

x_{N D}

and similarly for

H_{D} .

The posteriors of

(μ_{N D}, σ_{N D}^{2})

and

(μ_{D}, σ_{D}^{2})

are obtained via results in [27,28]. The posterior density of

(μ_{N D}, σ_{N D}^{2})

given

x_{N D}

is proportional to

π (μ_{N D}, σ_{N D}^{2}) \prod_{i = 1}^{{\tilde{n}}_{N D}} σ_{N D}^{- 1} φ (({\tilde{x}}_{N D i} - μ_{N D}) / μ_{N D})

where

{\tilde{n}}_{N D}

is the number of unique values in

x_{N D}

and

{{\tilde{x}}_{N D 1}, \dots, {\tilde{x}}_{N D {\tilde{n}}_{N D}}}

is the set of unique values with mean

{\tilde{x}}_{N D}

and sum of squared deviations

{\tilde{s}}_{N D}^{2}

. From this it is immediate that

\begin{matrix} μ_{N D} | σ_{N D}^{2}, x_{N D} \sim N ({({\tilde{n}}_{N D} + 1 / τ_{0}^{2})}^{- 1} ({\tilde{n}}_{N D} {\tilde{x}}_{N D} + μ_{0} / τ_{0}^{2}), {({\tilde{n}}_{N D N D} + 1 / τ_{0}^{2})}^{- 1} σ_{N D}^{2}), \\ 1 / σ_{N D}^{2} | x_{N D} \sim gamma (α_{0} + {\tilde{n}}_{N D} / 2, {\tilde{λ}}_{x_{N D}}) \end{matrix}

where

{\tilde{λ}}_{x_{N D}} = λ_{0} + {\tilde{s}}_{N D}^{2} / 2 + {({\tilde{n}}_{N D} + 1 / τ_{0}^{2})}^{- 1} ({\tilde{n}}_{N D} / τ_{0}^{2}) {({\tilde{x}}_{N D} - μ_{0})}^{2} / 2 .

A similar result holds for the posterior of

(μ_{D}, σ_{D}^{2}) .

To approximately generate from the full posterior specify some

n^{* *},

put

p_{a, n_{N D}} = a / (a + n_{N D}), q_{a, n_{N D}} = 1 - p_{a, n_{N D}}

and generate

\begin{matrix} (p_{N D 1, n^{* *}}, \dots, p_{N D n^{* *}, n^{* *}}) | x_{N D} \sim Dirichlet (((a + n_{N D}) / n^{* *}) 1_{n^{* *}}), \\ (μ_{N D}, σ_{N D}^{2}) | x_{N D} \sim π (\cdot | x_{N D}), \\ (c_{N D 1}, \dots, c_{N D n^{* *}}) | (μ_{N D}, σ_{N D}^{2}), x_{N D} \overset{i . i . d .}{\sim} p_{a, n_{N D}} N (μ_{N D}, σ_{N D}^{2}) + q_{a, n_{N D}} {\hat{F}}_{N D}, \\ w | x_{N D} \sim beta (α_{1 w} + n_{D}, α_{2 w} + n_{N D}) \end{matrix}

and similarly for

(p_{D 1, n^{* *}}, \dots, p_{D n^{* *}, n^{* *}}), (μ_{D}, σ_{D}^{2})

and

(c_{D 1}, \dots, c_{D n^{* *}}) .

If the data does not comprise a sample from the full population, then the posterior for w is replaced by its prior.

There is an issue that arises when making inference about

c_{o p t},

namely, the distributions for

c_{o p t}

that arises from this approach can be very irregular and particularly the posterior distribution. In part this is due to the discreteness of the posterior distributions of

F_{N D}

and

F_{D}

. This does not affect the prior distribution because the points on which the generated distributions are concentrated vary quite continuously among the realizations and this leads to a relatively smooth prior density for

c_{o p t} .

For the posterior, however, the sampling from the ecdf leads to a very irregular, multimodal density for

c_{o p t} .

So some smoothing is necessary in this case.

Consider now applying such an analysis to the dataset of Example 3, where we know the true values of the quantities of interest and then to a dataset concerned with the COVID-19 epidemic.

Example 5.

Binormal data (Examples 3 and 4)

The data used in Example 3 are now analyzed but using the methods of this section. The prior on

(μ_{N D}, σ_{N D}^{2}), (μ_{D}, σ_{D}^{2})

and w is taken to be the same as that used in Example 4 so the variances are not assumed to be the same. The value

ε = 0.25

is used and requiring (11) to be less than

0.018

leads to

a = 20 .

So the true distributions are allowed to differ quite substantially from a normal distribution. Testing the hypothesis

H_{0} :

AUC

> 1 / 2

led to the relative belief ratio

1.992

(maximum possible value is 2) and the strength of the evidence is

0.997

so there is strong evidence that

H_{0}

is true. The AUC, based on the prior conditioned on

H_{0}

being true, is estimated to be equal to

0.839

with plausible interval

(0.691, 0.929)

having posterior content

0.814 .

For these data

c_{o p t} (d) = 0.850

with plausible interval

(0.45, 1.75)

having posterior content

0.835

. The true value of the AUC is

0.760

and the true value of

c_{o p t}

is

0.905

so these inferences are certainly reasonable although, as one might expect, when the length of the plausible intervals are taken into account, they are not as accurate as those when binormality is assumed as this is correct for this data. So the DP approach worked here although the posterior density for

c_{o p t}

was quite multimodal and required some smoothing (averaging 3 consecutive values).

Example 6.

COVID-19 data.

A dataset was downloaded from https://github.com/YasinKhc/Covid-19 containing data on 3397 individuals diagnosed with COVID-19 and includes whether or not the patient survived the disease, their gender and their age. There are 1136 complete cases on these variables of which 646 are male, with 52 having died, and 490 are female, with 25 having died. Our interest is in the use of a patient’s age X to predict whether or not they will survive. More detail on this dataset can be found in [29]. The goal is to determine a cutoff age so that extra medical attention can be paid to patients beyond that age. Furthermore, it is desirable to see whether or not gender leads to differences so separate analyses can be carried out by gender. So, for example, in the male group ND refers to those males with COVID-19 that will not die and D refers to the population that will. Looking at histograms of the data, it is quite clear that binormality is not a suitable assumption and no transformation of the age variable seems to be available to make a normality assumption more suitable. Table 3 gives summary statistics for the subgroups. Of some note is that condition (8), when using standard estimates for population quantities such as

w = 52 / 646 = 0.08

for Males and

w = 25 / 490 = 0.05

for females, is not satisfied which suggests that in a binormal analysis no finite optimal cutoff exists.

For the prior, it is assumed that

(μ_{N D}, σ_{N D}^{2})

and

(μ_{D}, σ_{D}^{2})

are independent values from the same prior distribution as in (10). For the prior elicitation, as discussed in Example 3, suppose it is known with virtual certainty that both means lie in

(20, 70)

and

(l_{0}, u_{0}) = (20, 50)

so we take

μ_{0} = 45, τ_{0} = (m_{2} - m_{1}) / 2 u_{0} = 0.75

and the iterative process leads to

(λ_{1}, λ_{2}) = (8.545, 1080.596)

which implies a prior on the σ’s with mode at

10.932

and the interval

(7.764, 19.411)

containing

0.99

of the prior probability. Here the relevant prevalence refers to the proportion of COVID-19 patients that will die and it is supposed that

w \in [0.00, 0.15]

with virtual certainty which implies

w \sim

beta

(9.81, 109.66) .

So the prior probability that someone with COVID-19 will die is assumed to be less than 15% with virtual certainty. Since normality is not an appropriate assumption for the distribution of

X,

the choice

ε = 0.25

with the upper bound (11) equal to

0.1

seems reasonable and so

a = 9.8 .

This specifies the prior that is used for the analysis with both genders and it is to be noted that it is not highly informative.

For males the hypothesis AUC

> 1 / 2

is assessed and

R B = 1.991

(maximum value 2) with strength effectively equal to

1.00

was obtained, so there is extremely strong evidence that this is true. The unconditional estimate of the AUC is

0.808

with plausible region

[0.698, 0.888]

having posterior content

0.959,

so there is a fair bit of uncertainty concerning the true value. For the conditional analysis, given that AUC

> 1 / 2,

the estimate of the AUC is

0.806

with plausible region

[0.731, 0.861]

having posterior content

0.932 .

So the conditional analysis gives a similar estimate for the AUC with a small increase in accuracy. In either case it seems that the AUC is indicating that age should be a reasonable diagnostic. Note that the standard nonparametric estimate of the AUC is

0.810

so the two approaches agree here. For females the hypothesis AUC

> 1 / 2

is assessed and

R B = 1.994

with strength effectively equal to 1 was obtained, so there is extremely strong evidence that this is true. The unconditional estimate of the AUC is

0.873

with plausible region

(0.742, 0.948)

having posterior content

0.968

. For the conditional analysis, given that AUC

> 1 / 2,

the estimate of the AUC is

0.874

with plausible region

(0.791, 0.936)

having posterior content

0.956 .

The traditional estimate of the AUC is

0.902

so the two approaches are again in close agreement.

Inferences for

c_{o p t}

are more problematical in both genders. Consider the male data. The data set is very discrete as there are many repeats and the approach samples from the ecdf about 84% of the time for the males that died and 98% of the time for the males that did not die. The result is a plausible region that is not contiguous even with smoothing. Without smoothing the estimate is

c_{o p t} (d) = 85.5

for males, which is a very dominant peak for the relative belief ratio. The plausible region contains

0.928

of the posterior probability and, although it is not a contiguous interval, the subinterval

[85.2, 85.8]

is a

0.58

-credible interval for

c_{o p t}

that is in agreement with the evidence. If we make the data continuous by adding a uniform(0,1) random error to each age in the data set, then

c_{o p t} (d) = 86.1

and plausible interval

[75.9, 86.7]

with posterior content

0.968

is obtained. These cutoffs are both greater than the maximum value in the ND data, so there is ample protection against false positives but it is undoubtedly false negatives that are of most concern in this context. If instead the FNDR is used as the error criterion to minimize, then

c_{o p t} (d) = 35.7

and plausible interval

[26.1, 35.7]

with posterior content

0.826

is obtained and so in this case there will be too many false positives. So a useful optimal cutoff incorporating the relevant prevalence does not seem to exist with these data.

If the relevant prevalence is ignored and

w_{0}

FNR

+ (1 - w_{0})

FPR is used for some fixed weight

w_{0}

to determine

c_{o p t} (d)

, then more reasonable values are obtained. Table 4 gives the estimates for various

w_{0}

values. With

w_{0} = 0.5

(corresponding to using Youden’s index)

c_{o p t} (d) = 65.7

while if

w_{0} = 0.7,

then

c_{o p t} (d) = 56.7 .

When

w_{0}

is too small or too large then the value of

c_{o p t} (d)

is not useful. While these estimates do not depend on the relevant prevalence, the error characteristics that do depend on this prevalence (as expressed via its prior and posterior distributions) can still be quoted and a decision made as to whether or not to use the diagnostic. Table 5 contains the estimates of the error characteristics at

c_{o p t} (d)

for various values of

w_{0}

where these are determined using the prior and posterior on the relevant prevalence

w .

Note that these estimates are determined as the values that maximize the corresponding relative belief ratios and take into account the posterior of

w .

So, for example, the estimate of the Error is not the convex combination of the estimates of FNR and FPR based on the

w_{0}

weight. Another approach is to simply set the cutoff Age at a value at a value

c_{0}

and then investigate the error characteristics at that value. For example, with

c_{0} = 60,

then the estimated values are given by FNR

(c_{0}) = 0.238,

FPR

(c_{0}) = 0.308,

Error

(c_{0}) = 0.328,

FDR

(c_{0}) =

0.818

and FNDR

(c_{0}) = 0.028 .

Similar results are obtained for the cutoff with female data although with different values. Overall, Age by itself does not seem to be useful classifier although that is a decision for medical practitioners. Perhaps it is more important to treat those who stand a significant chance of dying more extensively and not worry too much that some treatments are not necessary. The clear message from this data, however, is that a relatively high AUC does not immediately imply that a diagnostic is useful and the relevant prevalence is a key aspect of this determination.

4. Conclusions

ROC analyses represent a significant practical application of statistical methodology. While previous work has considered such analyses within a Bayesian framework, this has typically required the specification of loss functions. Losses are not required in the approach taken here which is entirely based on a natural characterization of statistical evidence via the principle of evidence and the relative belief ratio. As discussed in Section 2.2 this results in a number of good properties for the inferences that are not possessed by inferences derived by other approaches. While the Bayes factor is also a valid measure of evidence, its usage is far more restricted than the relative belief ratio which can be applied with any prior, without the need for any modifications, for both hypothesis assessment and estimation problems. This paper has demonstrated the application of relative belief to ROC analyses under a number of model assumptions. In addition, as documented in points (ii)–(vi) of the Introduction, a number of new results have been developed for ROC analyses more generally.

Author Contributions

Methodology, L.A.-L. and M.E.; Investigation, Q.L.; Writing—original draft, M.E.; Supervision, M.E. All authors have read and agreed to the published version of the manuscript.

Funding

Evans was supported by grant 10671 from the Natural Sciences and Engineering Research Council of Canada.

Data Availability Statement

The data and R code used for the examples in Section 3.2, Section 3.3 and Section 3.4 can be obtained at https://utstat.utoronto.ca/mikevans/software/ROCcodeforexamples.zip (accessed on 15 November 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1.

Consider

\int_{- \infty}^{\infty} Φ (a + b z) φ (z) d z

as a function of

b,

so

\begin{matrix} \frac{d}{d b} \int_{- \infty}^{\infty} Φ (a + b z) φ (z) d z = \int_{- \infty}^{\infty} z φ (a + b z) φ (z) d z \\ = \frac{1}{\sqrt{2 π} \sqrt{1 + b^{2}}} exp (- \frac{a^{2}}{2 (1 + b^{2})}) \int_{- \infty}^{\infty} z \sqrt{1 + b^{2}} φ (\sqrt{1 + b^{2}} (z - {(1 + b^{2})}^{- 1} a b)) d z \\ = \frac{1}{\sqrt{2 π} \sqrt{1 + b^{2}}} exp (- \frac{a^{2}}{2 (1 + b^{2})}) \frac{a b}{1 + b^{2}} . \end{matrix}

When

a > 0,

then

\int_{- \infty}^{\infty} Φ (a + b z) φ (z) d z

is increasing in b for

b > 0

, decreasing in b for

b < 0,

equals 0 when

b = 0

and when

a < 0

it is decreasing in b for

b > 0

, increasing in b for

b < 0 .

Therefore, when

a > 0, b > 0,

then

\int_{- \infty}^{\infty} Φ (a + b z) φ (z) d z \geq

Φ (a) > 1 / 2

and when

a \leq 0, b > 0

then

\int_{- \infty}^{\infty} Φ (a + b z) φ (z) d z \leq

Φ (a) \leq 1 / 2 .

□

Proof of Lemma 2.

Note that

c_{o p t}

will satisfy

\frac{d}{d c} Error (c) = \frac{w}{σ_{D}} φ (\frac{c - μ_{D}}{σ_{D}}) - \frac{1 - w}{σ_{N D}} φ (\frac{c - μ_{N D}}{σ_{N D}}) = 0

which implies

φ (\frac{c - μ_{D}}{σ_{D}}) / φ (\frac{c - μ_{N D}}{σ_{N D}}) = \frac{1 - w}{w} \frac{σ_{D}}{σ_{N D}}

(A1)

So

c_{o p t}

is a root of the quadratic

(1 / σ_{D}^{2} - 1 / σ_{N D}^{2}) c^{2} - 2 (μ_{D} / σ_{D}^{2} - μ_{N D} / σ_{N D}^{2}) c + (μ_{D}^{2} / σ_{D}^{2} - μ_{N D}^{2} / σ_{N D}^{2} + 2 log ((1 - w) σ_{D} / w σ_{N D}))

. A single real root exists when

σ_{D}^{2} = σ_{N D}^{2} = σ^{2}

and is given by (5).

If

σ_{D}^{2} \neq σ_{N D}^{2}

, then there are two real roots when the discriminant

\begin{matrix} 4 {(μ_{D} / σ_{D}^{2} - μ_{N D} / σ_{N D}^{2})}^{2} - 4 (1 / σ_{D}^{2} - 1 / σ_{N D}^{2}) (μ_{D}^{2} / σ_{D}^{2} - μ_{N D}^{2} / σ_{N D}^{2} + \\ 2 log ((1 - w) σ_{D} / w σ_{N D})) \geq 0 \end{matrix}

establishing (6). To be a minimum the root c has to satisfy

0 < \frac{d^{2} {Error}_{w} (c)}{d c^{2}} = - \frac{w}{σ_{D}^{2}} (\frac{c - μ_{D}}{σ_{D}}) φ (\frac{c - μ_{D}}{σ_{D}}) + \frac{1 - w}{σ_{N D}^{2}} (\frac{c - μ_{N D}}{σ_{N D}}) φ (\frac{c - μ_{N D}}{σ_{N D}})

and by (A1), this holds iff

0 < - \frac{w}{σ_{D}^{2}} (\frac{c - μ_{D}}{σ_{D}}) \frac{1 - w}{w} \frac{σ_{D}}{σ_{N D}} + \frac{1 - w}{σ_{N D}^{2}} (\frac{c - μ_{N D}}{σ_{N D}}) = \frac{1 - w}{σ_{N D}} \{\frac{c - μ_{N D}}{σ_{N D}^{2}} - \frac{c - μ_{D}}{σ_{D}^{2}}\}

which is true iff

(1 / σ_{D}^{2} - 1 / σ_{N D}^{2}) c < μ_{D} / σ_{D}^{2} - μ_{N D} / σ_{N D}^{2} .

When

σ_{D}^{2} = σ_{N D}^{2}

this is true iff

μ_{D} > μ_{N D}

which completes the proof of (i). When

σ_{D}^{2} \neq σ_{N D}^{2}

this, together with the formula for the roots of a quadratic establishes (7). □

Proof of Corollary 1.

Suppose

μ_{D} \geq μ_{N D}

and (6) hold. Then putting

a = 2 (σ_{D}^{2} - σ_{N D}^{2}) log ((1 - w) w^{- 1} σ_{D} σ_{N D}^{- 1})

we have that, for fixed

μ_{D}, σ_{D}^{2}, σ_{N D}^{2}

and

w,

then

{(μ_{D} - μ_{N D})}^{2} + a

is a quadratic in

μ_{N D} .

This quadratic has discriminant

- 4 a

and so has no real roots whenever

a > 0

and, noting a does not depend on

μ_{D},

the only restriction on

μ_{N D}

is

μ_{N D} \leq μ_{D} .

When

a \leq 0

the roots of the quadratic are given by

μ_{D} \pm \sqrt{- a}

and so, since the quadratic is negative between the roots and

μ_{D} - \sqrt{- a} \leq μ_{D} \leq μ_{D} + \sqrt{- a}

the two restrictions imply

μ_{N D} \leq μ_{D} - \sqrt{- a} .

Combining the two cases gives (8).

Now suppose (8) holds. Then

μ_{N D} \leq μ_{D} - {max (0, - a)}^{1 / 2} \leq μ_{D}

which gives the first restriction and also

μ_{N D} - μ_{D} \leq - {max (0, - a)}^{1 / 2} \leq 0

which implies

{(μ_{N D} - μ_{D})}^{2} \geq max (0, - a)

and so

{(μ_{N D} - μ_{D})}^{2} + a \geq max (0, - a) + a

and by examining the cases

a \leq 0

and

a > 0

we conclude that (6) holds. □

References

Metz, C.; Pan, X. “Proper” binormal ROC curves: Theory and maximum-likelihood estimation. Math. Psychol. 1999, 43, 1–33. [Google Scholar] [CrossRef] [PubMed]
Perkins, N.J.; Schisterman, E.F. The inconsistency of “optimal” cutpoints obtained using two criteria based on the Receiver Operating Characteristic Curve. Am. J. Epidemiol. 2006, 163, 670–675. [Google Scholar] [CrossRef] [PubMed] [Green Version]
López-Ratón, M.; Rodríguez-Álvarez, M.X.; Cadarso-Suárez, C.; Gude-Sampedro, F. OptimalCutpoints: An R package for selecting optimal cutpoints in diagnostic tests. J. Stat. 2014, 61, 8. [Google Scholar] [CrossRef] [Green Version]
Unal, I. Defining an optimal cut-point value in ROC analysis: An alternative approach. Comput. Math. Model. Med. 2017, 2017, 3762651. [Google Scholar] [CrossRef]
Verbakel, J.Y.; Steyerberg, E.W.; Uno, H.; De Cock, B.; Wynants, L.; Collins, G.S.; Van Calster, B. ROC plots showed no added value above the AUC when evaluating the performance of clinical prediction models. Clin. Epidemiol. 2020, in press. [Google Scholar]
Hand, D. Measuring classifier performance: A coherent alternative to the area under the ROC curve. Mach. Learn. 2009, 99, 103–123. [Google Scholar] [CrossRef] [Green Version]
Evans, M. Measuring Statistical Evidence Using Relative Belief. In Monographs on Statistics and Applied Probability; CRC Press: Boca Raton, FL, USA; Taylor & Francis: Abingdon, UK, 2015; Volume 144. [Google Scholar]
O’Malley, A.J.; Zou, K.H.; Fielding, J.R.; Tempany, C.M.C. Bayesian regression methodology for estimating a receiver operating characteristic curve with two radiologic applications: Prostate biopsy and spiral CT of ureteral stiones. Acad. Radiol. 2001, 8, 5407–5420. [Google Scholar]
Gu, J.; Ghosal, S.; Roy, A. Bayesian bootstrap estimation of ROC curve. Stat. Med. 2008, 27, 5407–5420. [Google Scholar] [CrossRef]
Erkanli, A.; Sung, M.; Costello, E.J.; Angold, A. Bayesian semi-parametric ROC analysis. Stat. Med. 2006, 25, 3905–3928. [Google Scholar] [CrossRef] [PubMed]
De Carvalho, V.; Jara, A.; Hanson, E.; de Carvalho, M. Bayesian nonparametric ROC regression modeling. Bayesian Anal. 2013, 3, 623–646. [Google Scholar] [CrossRef]
Ladouceur, M.; Rahme, E.; Belisle, P.; Scott, A.; Schwartzman, K.; Joseph, L. Modeling continuous diagnostic test data using approximate Dirichlet process distributions. Stat. Med. 2011, 30, 2648–2662. [Google Scholar] [CrossRef] [PubMed]
Christensen, R.; Johnson, W.; Branscum, A.; Hanson, T.E. Bayesian Ideas and Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 2011. [Google Scholar]
Rosner, G.L.; Laud, P.W.; Johnson, W.O. Bayesian Thinking in Biostatistics; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021. [Google Scholar]
Diab, A.; Hassan, M.; Marquea, C.; Karlsson, B. Performance analysis of four nonlinearity analysis methods using a model with variable complexity and application to uterine EMG signals. Med. Eng. Phys. 2014, 36, 761–767. [Google Scholar] [CrossRef] [PubMed]
Gao, X.-Y.; Guo, Y.-J.; Shan, W.-R. Regarding the shallow water in an ocean via a Whitham-Broer-Kaup-like system: Hetero-Bäcklund transformations, bilinear forms and M solitons. Chaos Solitons Fractals 2022, 162, 112486. [Google Scholar] [CrossRef]
Gao, X.-T.; Tian, B. Water-wave studies on a (2+1)-dimensional generalized variable-coefficient Boiti–Leon–Pempinelli system. Appl. Math. Lett. 2022, 128, 107858. [Google Scholar] [CrossRef]
Obuchowski, N.; Bullen, J. Receiver operating characteristic (ROC) curves: Review of methods with applications in diagnostic medicine. Phys. Med. Biol. 2018, 63, 07TR01. [Google Scholar] [CrossRef]
Zhou, X.; Obuchowski, N.; McClish, D. Statistical Methods in Diagnostic Medicine, 2nd ed.; Wiley: Hoboken, NJ, USA, 2011. [Google Scholar]
Al-Labadi, L.; Evans, M. Optimal robustness results for some Bayesian procedures and the relationship to prior-data conflict. Bayesian Anal. 2017, 12, 702–728. [Google Scholar] [CrossRef]
Gu, Y.; Li, W.; Evans, M.; Englert, B.-G. Very strong evidence in favor of quantum mechanics and against local hidden variables from a Bayesian analysis. Phys. Rev. A 2019, 99, 022112. [Google Scholar] [CrossRef] [Green Version]
Englert, B.-G.; Evans, M.; Jang, G.-H.; Ng, H.-K.; Nott, D.; Seah, Y.-L. Checking the model and the prior for the constrained multinomial. arXiv 2018, arXiv:1804.06906. [Google Scholar]
Evans, M.; Guttman, I.; Li, P. Prior elicitation, assessment and inference with a Dirichlet prior. Entropy 2017, 19, 564. [Google Scholar] [CrossRef] [Green Version]
Swartz, T. Subjective priors for the Dirichlet process. Commun. Stat. Theory Methods 1993, 28, 2821–2841. [Google Scholar] [CrossRef]
Swartz, T. Nonparametric goodness-of-fit. Commun. Stat. Theory Methods 1999, 22, 2999–3011. [Google Scholar] [CrossRef]
Ishwaran, H.; Zarepour, M. Exact and approximate sum representations for the Dirichlet process. Can. J. Stat. 2002, 30, 269–283. [Google Scholar] [CrossRef]
Antoniak, C.E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 1974, 2, 1152–1174. [Google Scholar] [CrossRef]
Doss, H. Bayesian Nonparametric Estimation for Incomplete Data Via Successive Substitution Sampling. Ann. Stat. 1994, 22, 1763–1786. [Google Scholar] [CrossRef]
Charvadeh, Y.K.; Yi, G.Y. Data visualization and descriptive analysis for understanding epidemiological characteristics of COVID-19: A case study of a dataset from January 22, 2020 to March 29, 2020. J. Data Sci. 2020, 18, 526–535. [Google Scholar] [CrossRef]

Figure 1. In Example 2, plots of the prior (- - -), the posterior (—) and the RB ratio of the AUC.

Figure 2. In Example 2, plots of the the prior (+), the posterior (×) and the RB ratio of

c_{o p t}

.

Figure 2. In Example 2, plots of the the prior (+), the posterior (×) and the RB ratio of

c_{o p t}

.

Figure 3. Prior density of the AUC when

p_{D}

is uniform on the set of nondecreasing probabilities independent of

p_{N D}

uniform on the set of nonincreasing probabilities (–) as well as when

p_{D}

is uniformly distributed on the set of nondecreasing probabilities independent of

p_{N D}

uniform on

S_{k}

(- -).

Figure 3. Prior density of the AUC when

p_{D}

is uniform on the set of nondecreasing probabilities independent of

p_{N D}

uniform on the set of nonincreasing probabilities (–) as well as when

p_{D}

is uniformly distributed on the set of nondecreasing probabilities independent of

p_{N D}

uniform on

S_{k}

(- -).

Figure 4. The conditioned prior (- -) and posterior (–) densities (left panel) and the relative belief ratio (right panel) of the AUC in Example 3.

Figure 5. Plots of the prior (- -), posterior (left panel) and relative belief ratio (right panel) of

c_{o p t}

in Example 3.

Figure 5. Plots of the prior (- -), posterior (left panel) and relative belief ratio (right panel) of

c_{o p t}

in Example 3.

Figure 6. Plots of the prior (- -), posterior (left panel) and relative belief ratio (right panel) of

c_{o p t}

in Example 4.

Figure 6. Plots of the prior (- -), posterior (left panel) and relative belief ratio (right panel) of

c_{o p t}

in Example 4.

Table 1. Error probabilities when

X > c

indicates a positive.

Table 1. Error probabilities when

X > c

indicates a positive.

	$Ω_{D}$	$Ω_{N D}$
$X > c$	$\begin{matrix} TPR (c) = 1 - F_{D} (c) \\ sensitivity (recall) or \\ true positive rate \end{matrix}$	$\begin{matrix} FPR (c) = 1 - F_{N D} (c) \\ false positive rate \end{matrix}$
$X \leq c$	$\begin{matrix} FNR (c) = F_{D} (c) \\ false negative rate \end{matrix}$	$\begin{matrix} TNR (c) = F_{N D} (c) \\ specificity or \\ true negative rate \end{matrix}$

Table 2. The estimates of the error characteristcs of X at

c_{o p t} = 2

in Example 2 where (a) w is assumed known, (b) only the prior for w is available, (c) the posterior for w is also available.

Table 2. The estimates of the error characteristcs of X at

c_{o p t} = 2

in Example 2 where (a) w is assumed known, (b) only the prior for w is available, (c) the posterior for w is also available.

Quantity	Estimate (a)	Estimate (b)	Estimate (c)
$FPR (c_{o p t})$	$0.30$	$0.26$	$0.30$
$FNR (c_{o p t})$	$0.22$	$0.22$	$0.22$
$Error (c_{o p t})$	$0.22$	$0.22$	$0.22$
$FDR (c_{o p t})$	$0.14$	$0.14$	$0.14$
$FNDR (c_{o p t})$	$0.34$	$0.34$	$0.34$

Table 3. Summary statistics for the data in Example 6.

Group	Number	Mean	std. dev.	Min	Max
ND males	594	$48.81$	$17.72$	$0.50$	$85.00$
D males	52	$68.46$	$13.66$	$36.00$	$89.00$
ND females	465	$48.69$	$18.73$	$2.00$	$96.00$
D females	25	$77.36$	$12.12$	$48.00$	$95.00$

Table 4. Weighted error

w_{0}

FNR+(

1 - w_{0})

FPR determining

c_{o p t} (d)

for Males in Example 6.

Table 4. Weighted error

w_{0}

FNR+(

1 - w_{0})

FPR determining

c_{o p t} (d)

for Males in Example 6.

$w_{0} =$ Weight of FNR	$c_{opt} (d)$	Plausible Range (post. prob.)
$0.1$	$85.5$	75.3–118.5 $(0.945)$
$0.3$	$65.1$	64.5–85.5 $(0.868)$
$0.5$	$65.1$	55.5–72.3 $(0.939)$
$0.7$	$56.7$	35.7–58.5 $(0.919)$
$0.9$	$35.7$	33.3–52.5 $(0.875)$

Table 5. Error characteristics for Males in Example 6 at various weights.

$w_{0} =$ Weight of FNR	FNR	FPR	Error	FDR	FNDR
$0.1$	$0.918$	$0.008$	$0.008$	$0.458$	$0.073$
$0.3$	$0.368$	$0.183$	$0.213$	$0.733$	$0.043$
$0.5$	$0.368$	$0.183$	$0.213$	$0.733$	$0.038$
$0.7$	$0.158$	$0.358$	$0.363$	$0.823$	$0.018$
$0.9$	$0.003$	$0.753$	$0.688$	$0.893$	$0.003$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Al-Labadi, L.; Evans, M.; Liang, Q. ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio. Entropy 2022, 24, 1710. https://doi.org/10.3390/e24121710

AMA Style

Al-Labadi L, Evans M, Liang Q. ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio. Entropy. 2022; 24(12):1710. https://doi.org/10.3390/e24121710

Chicago/Turabian Style

Al-Labadi, Luai, Michael Evans, and Qiaoyu Liang. 2022. "ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio" Entropy 24, no. 12: 1710. https://doi.org/10.3390/e24121710

APA Style

Al-Labadi, L., Evans, M., & Liang, Q. (2022). ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio. Entropy, 24(12), 1710. https://doi.org/10.3390/e24121710

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio

Abstract

1. Introduction

2. The Problem

2.1. The AUC and ROC

2.2. Relative Belief Inferences

3. Inferences for an ROC Analysis

3.1. The Prevalence

3.2. Ordered Discrete Diagnostic

3.3. Binormal Diagnostic

3.4. Nonparametric Bayes Model

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI