This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This presentation will emphasize the estimation of the combined accuracy of two or more tests when verification bias is present. Verification bias occurs when some of the subjects are not subject to the gold standard. The approach is Bayesian where the estimation of test accuracy is based on the posterior distribution of the relevant parameter. Accuracy of two combined binary tests is estimated employing either “believe the positive” or “believe the negative” rule, then the true and false positive fractions for each rule are computed for two tests. In order to perform the analysis, the missing at random assumption is imposed, and an interesting example is provided by estimating the combined accuracy of CT and MRI to diagnose lung cancer. The Bayesian approach is extended to two ordinal tests when verification bias is present, and the accuracy of the combined tests is based on the ROC area of the risk function. An example involving mammography with two readers with extreme verification bias illustrates the estimation of the combined test accuracy for ordinal tests.

This article introduces the reader to the methodology of measuring the accuracy of several medical tests that are administered to the patient. Our main focus is on measuring the accuracy of a combination of two or more tests. For example, to diagnose type 2 diabetes, the patient is given a fasting blood glucose test, which is followed by an oral glucose tolerance test. What is the accuracy (true and false positive fractions) of this combination of two tests? Or, in order to diagnose coronary artery disease, the subject's history of chest pain is followed by an exercise stress test. Still another example is for the diagnosis of prostate cancer, where a digital rectal exam is followed by measuring PSA (prostate specific antigen). The reader is referred to Johnson, Sandmire, and Klein [

In many scenarios, it is common practice to administer one or more tests to diagnose a given condition and we will explore two avenues. One case where it is common to administer several tests is in standard medical practice, and the other is an experimental situation where one test is compared to a standard medical test. An example of the latter is that MRI is now being studied as an alternative to standard mammography, as a means to diagnose breast cancer.

There are many studies that assess the accuracy of the combination of two or more tests. Two tests for the diagnosis of a disease measure different aspects or characteristics of the same disease. In the case of diagnostic imaging, two modalities have different qualities (resolution, contrast, and noise), thus, although they are imaging the same scene, the information is not the same from the two sources. When this is the case, the accuracy of the combination of two modalities is of paramount importance. For example, the accuracy of the combination of mammography and scintimammography for suspected breast cancer has been reported by Buscombe, Cwikla, Holloway, and Hilson [^{18}F-FET PET and ^{18}F-FDG PET to assess the extent of the disease and estimated the accuracy of each and combined. On the other hand, Schaffler, Wolf, Schoelinast ^{18}F-FDG PET and the combination of the two.

What is the optimal way to measure the accuracy for combination of two binary tests? Pepe ([

The BP rule increases sensitivity relative to the two binary tests, but increases the FPF (false positive fraction), but by no more than the sum of the two false positive fractions FPF1 + FPF2. The false positive fraction of test 1 and test 2 are denoted by FPF1 and FPF2 respectively;

The BN rule decreases the false positive rate relative to the false positive rates of the two tests, but at the same time, decreases the sensitivity, however, the sensitivity remains above TPF1 + TPF2 − 1. Note that the true positive fractions for the two tests are designated by TPF1 and TPF2.

Thus, with the BP rule the combined test is scored positive if one or the other or both of the two are scored positive, but, on the other hand, with the BN rule the combined test is scored positive if both tests are positive.

Verification bias is present when some of the subjects are not subject to the gold standard, thus, the disease status of some of the patients is not known. Consider an example involving the exercise stress test to diagnose heart disease. Among those that test positive, some will undergo coronary angiography to confirm the diagnosis. On the other hand, among those that test negative, very few will be subject to the gold standard. Using only those cases that are verified will lead to biased estimates of test accuracy both of the true and false positive rates, thus, alternative methods based on the missing at random assumptions will be derived from a Bayesian viewpoint.

When assessing the accuracy of two tests, the design in many cases is paired. When two tests are used to assess the patient's condition, true and false positive fractions will be estimated assuming the BP (believe the positive) and BN (believe the negative) rules. For example, two modalities (e.g., CT and MRI) are imaging the same patients and the two images would be expected to be quite similar. Another case of a paired design is for two readers who are imaging the same set of patients with the same imaging device. One expects the information gained from the two paired sources to be highly correlated, and in the case of two paired readers, agreement between the two is also of interest. The experimental layout for a paired study when verification bias is present, namely:

A Good introduction to statistical methods for estimating test accuracy with verification bias is Zhou

The two binary tests are _{1} and _{2}, where 1 and 0 designate positive and negative tests respectively. For those subjects who are verified for disease V = 1, while those who are not verified are denoted with V = 0.

Note that the number of subjects verified under the gold standard, when both tests are positive, is _{11} + _{11}, among which _{11} had the disease and _{11} did not have the disease, and also the number that were not verified under the gold standard when both tests are positive is _{11}

The following derivation is based on the MAR assumption, namely

Thus, the probability a subject's disease status is verified depends only on the outcomes of the two tests, and not on other considerations. Derivations below follow to some extent Chapter 10 of Zhou, Obuchowski, and McClish^{8}.

Suppose the unknown parameters are defined as follows:

Also let

The likelihood for the parameters is

Assuming an improper prior distribution for the parameters, the posterior distributions are
_{ij}_{00}, _{01}, _{10}, _{11}).

The improper prior imposed on the parameters is given by the density

Note that if a uniform prior is assumed, one should adjust the posterior distributions for the phis and thetas by adding a one to the beta and Dirichlet hyper parameters given in formulas

The main parameters of interest are the true positive fraction and the false positive fraction for the two tests, thus for the first test
_{i.}

The main focus of this section is on measuring the accuracy of the combined test in the presence of verification bias of both tests using the BN (believe the negative) and BP (believe the positive) principles.

Assume the BP principle is in effect, then the true positive fraction for the combined test is

Formulas _{ij}_{ij}

Consider the hypothetical results of two correlated binary tests when verification bias is present as given below. The first test Y_{1} give the results for a CT determination of lung cancer risk, where a 0 indicates a small risk and a 1 a high risk of lung cancer, while the second test Y_{2} is a determination of lung cancer risk using MRI. The patients where D = 0 do not have lung cancer.

Using

model; {

# two binary tests verification bias

# accuracy of combined tests

g00∼dgamma(m00,2)

g01∼dgamma(m01,2)

g10∼dgamma(m10,2)

g11∼dgamma(m11,2)

h<-g00+g01+g10+g11

th00<-g00/h

th01<-g01/h

th10<-g10/h

th11<-g11/h

ph00∼dbeta(s00,r00)

ph01∼dbeta(s01,r01)

ph10∼dbeta(s10,r10)

ph11∼dbeta(s11,r11)

s1.<-s11+s10

r1.<-r11+r10

s.1<-s01+s11

r.1<-r01+r11

r0.<-r00+r01

s0.<-s00+s01

s.0<-s00+s10

r.0<-r00+r10

ph1.∼dbeta(s1.,r1.)

ph.1∼dbeta(s.1,r.1)

ph0.∼dbeta(s0.,r0.)

ph.0∼dbeta(s.0,r.0)

th1.<-th11+th10

th.1<-th01+th11

th0.<-th01+th00

th.0<-th00+th10

# accuracy for test 1

tpf1<-ph1.*th1./pd1

fpf1<-(1-ph1.)*th1./(1-pd1)

# p[D-1]

pd1<-ph1.*th1.+ph0.*th0.

# accuracy for test 2

tpf2<-ph.1*th.1/pd2

fpf2<-(1-ph.1)*th.1/(1-pd2)

pd2<-ph.1*th.1+ph.0*th.0

# accuracy combined tests

# believe the positive, BP

tpfbp<-(ph11*th11+ph01*th01+ph10*th10)/pd

# P[d=1]

pd<-ph11*th11+ph10*th10+ph01*th01+ph00*th00

fpfbp<-((1-ph11)*th11+(1-ph10)*th10+(1-ph01)*th01)/(1-pd)

# believe the negative, BN

tpfbn<-ph11*th11/pd

fpfbn<- (1-ph11)*th11/(1-pd)}

# CT and MRI for lung cancer risk with improper prior

# for a uniform prior, add a one to the values in the list statement

list(s00=3,r00=18,s01=9,r01=13,s10=12,r10=9,s11=14,r11=4, m00=31,m01=31,m10=29,m11=25)

# activate initial values from the specification tool with the gen inits button

Note the above code closely follows the derivation given in formulas

The results of the analysis show that the false positive fractions for the two modalities are fairly high being 0.286 and 0.38 for CT and MRI respectively, but on the other hand, the true positive fractions for the two modalities are somewhat low at 0.6755 and 0.60 respectively for CT and MRI. It is also observed that the MCMC errors for all parameters are less than 0.0001 and the posterior distributions of all parameters appear to be symmetric about the posterior mean. Assuming the BN rule, the true and false positive fractions are estimated as 0.3663 and 0.0880, but as for the BP rule, the corresponding estimates (based on the posterior mean) are 0.9165 and 0.5766! The fact that both modalities are not very accurate is reflected in the estimated combined accuracies for the BN and BP rules. At first glance, the BP rule is encouraging in that the true positive fraction is 0.9165, but the false positive fraction is also large as 0.5766. The BN rule gives a low estimate of 0.0880 for the false positive rate, but also a low estimate of 0.3663 for the true positive fraction. The posterior density of the true positive fraction for the BP rule is depicted below. Note what rule the user adopts depends on their personal preference.

The reader is referred to Pepe ([_{2}. Note when the subject is verified with D = 0, the patient does not have lung cancer.

The results of

When both tests are negative, 21 patients are not subject to the gold standard. Since they were not verified, one does not know their disease status (D = 0 or D = 1), thus one does not know the fraction of patients with disease, and the true and false positive fractions cannot be estimated. This is a case where it is not possible to estimate the true and false positive fractions for either test. However, not all is lost because other measures of test accuracies for both tests can be estimated.

Consider the detection probability for CT namely P[_{1}= 1, D = 1], then the estimated detection probability is 26/82 = 0.317, while the detection probability of MRI is estimated as 23/82 = 0.280. In lieu of the true and false positive fractions, the detection probability and the false referral probability for both tests aid in the estimation of the test accuracy of the combined tests. For example, the false referral probability for CT is P[_{1} = 1, D = 0] which is estimated as 13/82 = 0.158, and for MRI the false referral probability is estimated as 17/82 = 0.2073.

It is interesting to note that the detection probability of a test is expressed as
_{1} is the detection probability of the first test and _{2} the detection probability for the second test. In a similar way, the false positive fractions can be compared by the ratio:
_{1} is the false referral probability for the first test and _{2} the false referral probability of the second. Of course, the Bayesian approach will determine the posterior distribution of these quantities.

It is interesting to observe that the individual true and false positive fractions cannot be estimated, but that the true and false positive fractions of two tests can be compared for studies with extreme verification bias.

How is the accuracy of the combined tests estimated? Can one employ the BP (believe the positive) and BN rules to estimate the accuracy of the combined tests? The answer is no!

Consider the BP rule and refer to _{1} =1_{2} =1|D = 1], however, the disease frequency cannot be estimated. On the other hand, the ratio of the true positive fraction for the BP rule relative to the true positive fraction of the BN rule can be measured by

From

In a similar way the false positive fraction for the BP rule relative to the false positive fraction of the BN rule can be measured as

Referring to

Extreme verification bias does not provide sufficient information to estimate the usual measures of test accuracy, however, in lieu of those measures, it is possible to assess the accuracy of two binary tests (with extreme verification bias) by: (1) the detection probabilities; (2) the false referral probabilities and (3) the ratio of the true positive fraction for the BP rule relative to that of the BN rule, and (4) the ratio of the false positive fraction of the BP rule relative to that of the BN rule.

The foundation of the Bayesian analysis given by formulas

With regard to estimating the combined accuracy of the two tests in the presence of extreme verification bias, the posterior distributions of the ratio of the true positive fraction of the BP relative to that of the BN rule is determined as is the ratio of the false positive fraction of the BP rule relative to that of the BN rule.

For the detection probability of the first test
_{ij}_{ij}

For the detection probabilities, the statements are:

Refer to formula

The analysis for the CT-MRI determination of lung cancer risk with extreme verification bias is based on the amended

One notices the skewness of the posterior distribution of the ratio of the false positive fraction of the BP rule relative to the BN rule and the skewness is evident from

The other posterior distributions appear to be symmetric about the posterior mean. The detection probability of the two tests are similar, but the false referral probability of CT is somewhat less than that of MRI and the TPF of the BP rule appears to be 2.61 times that of the BN rules. On the other hand, the false positive fraction of the BP rule is 6.9 times that of the BN rule.

Recall that the true and false positive fractions of the two rules are not known, thus, it is difficult to interpret these ratios. The detection probability of a test is somewhat related to the true positive fraction in that one would favor a test with a higher detection probability. Also, of course, one would favor a test with lower false referral probability, thus, the overall conclusion about test comparison is to favor the CT determination of lung cancer risk. As for the accuracy of the combined test, the two ratios one for the true positive fraction and one for the false positive fraction provide the relevant information. Should the accuracy be based on the BP or the BN rule? The BP rule gives a larger true positive fraction, but unfortunately a much larger false positive fraction.

When analyzing the accuracy of combined tests when the tests are ordinal, a somewhat different approach is taken, where the overall accuracy is measured by the area of the ROC curve of the risk score. The risk scores are the probability of disease of each patient, usually determined. An informative presentation of the risk score is given by Pepe ([

Our general methodology is based on inverse probability weighting which transforms the original table of observations with verification (See Table), to an imputed table. Such ideas will be explained in a later section, but the reader is referred to Pepe ([

The analysis of assessing the accuracy of two tests is now expanded to include two ordinal tests _{1} and _{2}, where the general layout is given by _{i}_{i}

In what is to follow, the posterior distribution of the ROC area of the two ordinal tests is developed, which is to be followed by an explanation of inverse probability weighting, and lastly the use of the risk score to asses the accuracy of the combined tests is explained.

Recall from formulas

The improper prior density used here is the reciprocal of the parameters over the relevant region of the parameter space.

If uniform prior is appropriate, add a 1 to the hyper parameters of the _{i}_{i}

Note that

Of course, a similar expression holds for the ROC area of test 2. After determining the posterior distribution of the ROC areas of the two tests, the accuracy of the combined tests will be measured by the ROC area of the risk score. In order to illustrate Bayesian inference for estimating the accuracy of two ordinal tests, consider the following example, two readers are diagnosing breast cancer. They are both employed by a university cancer center and they both use a three–point scale to diagnose the disease, where a 1 indicates definitely no lesion is seen in the mammogram, a 2 denotes there is a possibility that a lesion is present, and a 3 signifies that a lesion is definitely present The diseased patients in fact have breast cancer and the non-diseased definitely do not have breast cancer.

The total number of patients is
_{1} = _{2} = 1, 109 are referred to the gold standard and 2 are not, while if _{1} = _{2} = 2, 83 are not referred, while 153 are subject the gold standard. What are the ROC areas of the two tests? The following code is based on the previous development of the posterior distributions of the relevant parameters (given by formulas

model;

# hypothetical data set

# two tests for staging melanoma

# one rater is a surgeon the other a dermatologist

# ratings are: stage 1, stage 2, stage 3

# similar to Zhou([8],p347) on CT and MRI

{

for (i in 1:9) {ph[i]∼dbeta(s[i],r[i])}

for (i in 1:9) {g[i]∼dgamma(m[i],2)}

ms<-sum(g[])

for (i in 1:9) {theta[i]<-g[i]/ms }

theta1.<- theta[1]+theta[2]+theta[3]

theta2.<- theta[4]+theta[5]+theta[6]

theta3.<- theta[7]+theta[8]+theta[9]

theta.1<- theta[1]+theta[4]+theta[7]

theta.2<- theta[2]+theta[5]+theta[8]

theta.3<- theta[3]+theta[6]+theta[9]

s1.<-s[1]+s[2]+s[3]

s2.<-s[4]+s[5]+s[6]

s3.<-s[7]+s[8]+s[9]

s.1<-s[1]+s[4]+s[7]

s.2<-s[2]+s[5]+s[8]

s.3<-s[3]+s[6]+s[9]

r1.<-r[1]+r[2]+r[3]

r2.<-r[4]+r[5]+r[6]

r3.<-r[7]+r[8]+r[9]

r.1<-r[1]+r[4]+r[7]

r.2<-r[2]+r[5]+r[8]

r.3<-r[3]+r[6]+r[9]

# the prob D=1 given Y1=1

ph1.∼dbeta(s1.,r1.)

ph2.∼dbeta(s2.,r2.)

ph3.∼dbeta(s3.,r3.)

ph.1∼dbeta(s.1,r.1)

ph.2∼dbeta(s.2,r.2)

ph.3∼dbeta(s.3,r.3)

# the prob the first test =1 given d=1

alpha1[1]<- ph1.*theta1./dalpha1

alpha1[2]<- ph2.*theta2./dalpha1

alpha1[3]<- ph3.*theta3./dalpha1

dalpha1<- ph1.*theta1.+ph2.*theta2.+ph3.*theta3.

# the prob the first test =1 given D=0

beta1[1]<-((1-ph1.)*theta1.)/dbeta1

beta1[2]<-((1-ph2.)*theta2.)/dbeta1

beta1[3]<-((1-ph3.)*theta3.)/dbeta1

dbeta1<-(1-ph1.)*theta1.+(1-ph2.)*theta2.+(1-ph3.)*theta3.

# the prob that the second test =1 given d=1

alpha2[1]<- ph.1*theta.1/dalpha2

alpha2[2]<- ph.2*theta.2/dalpha2

alpha2[3]<- ph.3*theta.3/dalpha2

dalpha2<- ph.1*theta.1+ph.2*theta.2+ph.3*theta.3

beta2[1]<-((1-ph.1)*theta.1)/dbeta2

beta2[2]<-((1-ph.2)*theta.2)/dbeta2

beta2[3]<-((1-ph.3)*theta.3)/dbeta2

dbeta2<-(1-ph.1)*theta.1+(1-ph.2)*theta.2+(1-ph.3)*theta.3

# area of test 1

A1<- A11+A12

A11<- alpha1[2]*beta1[1]+alpha1[3]*(beta1[1]+beta1[2])

A12<- (alpha1[1]*beta1[1]+alpha1[2]*beta1[2]+alpha1[3]*beta1[3])/2

# area of test 2

A2<- A21+A22

A21<- alpha2[2]*beta2[1]+alpha2[3]*(beta2[1]+beta2[2])

A22<- (alpha2[1]*beta2[1]+alpha2[2]*beta2[2]+alpha2[3]*beta2[3])/2

d<-A1-A2

}

# two readers

# assume an improper prior

# see

list(r = c(101,105,83,67,72,40,41,30,4), s = c(8,26,51,43,81,94,117,140,208), m= c(111,149,196,124,236,201,221,210,320))

# the initial values are activated from the specification tool by clicking on the gen inits button.

The code closely follows formulas

Reader 1 appears more accurate that reader 2. It should be noted that the gold standard is biopsy of the tissue from the suspected lesion. The MCMC errors are quite “small” and give one confidence that the simulation is providing accurate estimates of the accuracy.

The main goal of the analysis is to estimate the combined accuracy of the two readers nd compare it the individual estimated ROC areas given by

Pepe ([_{1},_{2}), the cell entries for _{i}_{i}_{1},_{2}) = (1,1) the verification rate is 109/111, thus for _{1}, 8 is replaced 8(111/109) = 8.146 and for _{1}, the number 101 is replaced by 101(111/109) = 102.85. If this is done for the remaining 8 events, the observed

The imputed table has no verification bias and the ROC areas of the two tests based on this table are the same as those based on

The risk score is the probability that D = 1, which is the value assigned to all patients in the study. For example, for each of the 1,768 patients in the melanoma staging study portrayed in

The risk score is defined as

The risk score

Suppose the risk score is expressed as

the parameter

the function g is optimal for determining the ROC curve of the risk function.

From a practical point of view, logistic regression can be used to determine the ROC curve of the risk function, but it should be noted that finding a suitable function g can be a challenge. After all, g can be a complicated non-linear function of λ and/ or Y, but it would be convenient if g is linear in the test scores Y. In order to estimate the logistic regression function or risk score, Broemeling ([

For the example of

model;

# logistic regression

{

for(i in 1:N){d[i]∼dbern(theta[i])}

for(i in 1:N){logit(theta[i])<-b[1]+b[2]*T1[i]+b[3]*T2[i]}

# prior distributions

for(i in 1:3){b[i]∼dnorm(0.000,.0001)}

}

In order to accommodate the data, a list statement must be added, which consists of three vectors, each of dimension 1,768, corresponding to the number of patients in the study: The vectors _{1} and _{2} will consist of the values 1,2, or 3 corresponding to

The posterior distributions indicate the coefficients are not zero and are important in determining the risk score of each patient, but the main interest is in the vector theta (of dimension 1,768) where the median of each component is used as the risk score. There were 9 distinct risk scores: 0.086, 0.187, 0.36, 0.336, 0.5542, 0.733, 0.7542, 0.8704, and 0.9426. Therefore, I computed the ROC using the basic formula from Broemeling ([

Based on

# Area under the curve

# Ordinal values

# Nine values

model;

{

# generate Dirichlet distribution

g11∼dgamma(a11,2)

g12∼dgamma(a12,2)

g13∼dgamma(a13,2)

g14∼dgamma(a14,2)

g15∼dgamma(a15,2)

g16∼dgamma(a16,2)

g17∼dgamma(a17,2)

g18∼dgamma(a18,2)

g19∼dgamma(a19,2)

g01∼dgamma(a01,2)

g02∼dgamma(a02,2)

g03∼dgamma(a03,2)

g04∼dgamma(a04,2)

g05∼dgamma(a05,2)

g06∼dgamma(a06,2)

g07∼dgamma(a07,2)

g08∼dgamma(a08,2)

g09∼dgamma(a09,2)

g1<-g11+g12+g13+g14+g15+g16+g17+g18+g19

g0<-g01+g02+g03+g04+g05+g06+g07+g08+g09

# posterior distribution of probabilities for response of diseased patients

theta1<-g11/g1

theta2<-g12/g1

theta3<-g13/g1

theta4<-g14/g1

theta5<-g15/g1

theta6<-g16/g1

theta7<-g17/g1

theta8<-g18/g1

theta9<-g19/g1

# posterior distribution for probabilities of response of non-diseased patients

ph1<-g01/g0

ph2<-g02/g0

ph3<-g03/g0

ph4<-g04/g0

ph5<-g05/g0

ph6<-g06/g0

ph7<-g07/g0

ph8<-g08/g0

ph9<-g09/g0

# auc is area under ROC curve

#A1 is the P[Y>X]

#A2 is the P[Y=X]

auc<- A1+A2/2

A1<-theta2*ph1+theta3*(ph1+ph2)+theta4*(ph1+ph2+ph3)+

theta5*(ph1+ph2+ph3+ph4)+theta6*(ph1+ph2+ph3+ph4+ph5)+

theta7*(ph1+ph2+ph3+ph4+ph5+ph6)+

theta8*(ph1+ph2+ph3+ph4+ph5+ph6+ph7)+

theta9*(ph1+ph2+ph3+ph4+ph5+ph6+ph7+ph8)

A2<- theta1*ph1+theta2*ph2+theta3*ph3+theta4*ph4

+theta5*ph5+theta6*ph6+theta7*ph7+theta8*ph8+theta9*ph9

}

#see

# diagnosing breast cancer with two readers

# verification bias

# improper prior

list(a11=8,a12=30,a13=75,a14=48,a15=.125,a16=164,a17=141,a18=173,a19=314,a01=103,a02=119,a 03=121,a04=76,a05=111,a06=57,a07=60,a08=37,a09=6)

# initial values

list(g11=1,g12=1,g13=1,g14=1,g15=1,g16=1,g17=1,g18=1,g19=1,g01=1,g02=1,g03=1, g04=1,g05=1, g06=1,g07=1,g08=1,g09=1)

This presentation has reviewed the Bayesian methodology available for estimating the test accuracy when two or more tests are used to diagnose disease. The main focus is on tests that are subject to verification bias, that is, when some of the patients are not subject to the gold standard (are not verified for disease status). When verification bias occurs certain techniques are employed to correct for bias. If the usual estimators are calculated only for those patients that are verified for disease, the estimators are biased, thus, this article develops Bayesian procedures that “correct” for bias. By imposing the missing at random assumption, Bayesian estimators of the usual measures of test accuracy are developed. For two binary tests, the true and false positive fractions estimate the test accuracy, while for two ordinal tests the ROC area of the score function estimates the test accuracy of the combined tests. An interesting variation of verification bias is extreme verification bias, which is present when all of the patients that test negative with both tests are not verified for disease. In such a scenario, the true and false positive fractions cannot be estimated. However, other measures of the combined accuracy are available and easily estimated with Bayesian inference. Bayesian inference is illustrated with examples involving the diagnosis of breast and lung cancer.

There is a large literature on the subject of verification bias, and for additional recent information about the subject see [

Posterior density of true positive fraction BP rule.

Posterior density of the ratio of the false positive fraction. BP relative to BN.

Two binary scores with verification bias.

Y_{1} = 1, 0 | ||||

V = 1 | Y_{2} = 1 |
Y_{2} = 0 |
Y_{2} = 1 |
Y_{2} = 0 |

D = 1 | _{11} |
_{10} |
_{01} |
_{00} |

D = 0 | _{11} |
_{10} |
_{01} |
_{00} |

V = 0 | _{11} |
_{10} |
_{01} |
_{00} |

Total | _{11} |
_{10} |
_{01} |
_{00} |

CT and lung cancer risk.

Y_{1} = 1, 0 | ||||

V = 1 | Y_{2} = 1 |
Y_{2} = 0 |
Y_{2} = 1 |
Y_{2} = 0 |

D = 1 | _{11} = 14 |
_{10} = 12 |
_{01} = 9 |
_{00} = 3 |

D = 0 | _{11} = 4 |
_{10} = 9 |
_{01} = 13 |
_{00} = 18 |

V = 0 | _{11} = 7 |
_{10} = 8 |
_{01} = 9 |
_{00} = 10 |

Total | _{11} = 25 |
_{10} = 29 |
_{01} = 31 |
_{00} = 31 |

Posterior analysis for the CT and MRI determination of lung cancer risk with verification bias.

fpf1 | 0.2864 | 0.06201 | <0.0001 | 0.1725 | 0.284 | 0.4141 |

fpf2 | 0.3809 | 0.0668 | <0.0001 | 0.253 | 0.3799 | 0.5144 |

fpfbn | 0.0880 | 0.0697 | <0.0001 | 0.0264 | 0.0829 | 0.1795 |

fpfbp | 0.5766 | 0.0653 | <0.0001 | 0.4559 | 0.5777 | 0.7021 |

tpf1 | 0.6755 | 0.0712 | <0.0001 | 0.5305 | 0.6779 | 0.8074 |

tpf2 | 0.6007 | 0.0737 | <0.0001 | 0.4538 | 0.6018 | 0.7414 |

tpfbn | 0.3663 | 0.0694 | <0.0001 | 0.2367 | 0.3645 | 0.5081 |

tpfbp | 0.9165 | 0.0439 | <0.0001 | 0.8131 | 0.9233 | 0.9812 |

CT and MRI for lung cancer risk with extreme verification bias.

Y_{1} = 1, 0 | ||||

V = 1 | Y_{2} = 1 |
Y_{2} = 0 |
Y_{2} = 1 |
Y_{2} = 0 |

D = 1 | _{11} = 14 |
_{10} = 12 |
_{01} = 9 |
_{00} = 0 |

D = 0 | _{11} = 4 |
_{10} = 9 |
_{01} = 13 |
_{00} = 0 |

V = 0 | _{11} = 0 |
_{10} = 0 |
_{01} = 0 |
_{00} = 21 |

Total | _{11} = 18 |
_{10} = 21 |
_{01} = 22 |
_{00} = 21 |

Bayesian analysis for extreme verification bias (callout).

DP1 | 0.3169 | 0.0508 | <0.0001 | 0.22 | 0.3155 | 0.4206 |

DP2 | 0.2805 | 0.0493 | <0.0001 | 0.1891 | 0.2787 | 0.3823 |

FRP1 | 0.1586 | 0.0400 | <0.0001 | 0.0882 | 0.1559 | 0.2411 |

FRP2 | 0.2074 | 0.0446 | <0.0001 | 0.127 | 0.2051 | 0.3016 |

Ratio TPF | 2.618 | 0.5922 | 0.0019 | 1.774 | 2.515 | 4.503 |

Ratio FPF | 8.346 | 5.629 | 0.0168 | 3.187 | 6.902 | 22.01 |

Diagnosing breast cancer with two readers.

_{1} = 1, 2, 3 | |||||||||

_{2} |
1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |

D = 1 | _{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

D = 0 | _{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

V = 0 | _{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

Total | _{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

Bayesian analysis for ROC areas of two readers.

Parameter | Mean | SD | Error | 2 1/2 | Median | 97 1/2 |
---|---|---|---|---|---|---|

A1(reader 1) | 0.7867 | 0.0119 | <0.00001 | 0.763 | 0.787 | 0.8097 |

A2(reader 2) | 0.6351 | 0.0145 | <0.00001 | 0.6062 | 0.6352 | 0.6633 |

Diagnosing breast cancer with two readers imputed table via inverse probability weighting.

_{1} = 1, 2, 3 | ||||||||

1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |

_{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

_{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

_{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

_{1} |
_{2} |
_{3} |
_{4} |
_{5} |
_{6} |
_{7} |
_{8} |
_{9} |

Posterior distribution of the logistic regression the risk score.

b[1] | −4.947 | 0.2859 | 0.00173 | −5,515 | −4.943 | −4.395 |

b[2] | 1.688 | 0.0851 | <0.0001 | 1.524 | 1.687 | 1.857 |

b[3] | 0.8946 | 0.0804 | <0.0001 | 0.7373 | 0.894 | 1.053 |

Posterior analysis of combined accuracy breast cancer example with two readers.

A1 | 0.8099 | 0.0106 | <0.00001 | 0.7885 | 0.8101 | 0.8308 |

A2 | 0.0657 | 0.0030 | <0.00001 | 0.0599 | 0.0656 | 0.0716 |

auc | 0.8428 | 0.0094 | <0.00001 | 0.8238 | 0.843 | 0.8608 |

^{18}F-FET PET compared with

^{18}F-FEG PET ad CT in patients with head and neck cancer

^{18}F-FET PET