Global Hypothesis Test to Compare the Predictive Values of Diagnostic Tests Subject to a Case-Control Design

Abstract: Use of a case-control design to compare the accuracy of two binary diagnostic tests is frequent in clinical practice. This design consists of applying the two diagnostic tests to all of the individuals in a sample of those who have the disease and in another sample of those who do not have the disease. This manuscript studies the comparison of the predictive values of two diagnostic tests subject to a case-control design. A global hypothesis test, based on the chi-square distribution, is proposed to compare the predictive values simultaneously, as well as other alternative methods. The hypothesis tests studied require knowing the prevalence of the disease. Simulation experiments were carried out to study the type I errors and the powers of the hypothesis tests proposed, as well as to study the effect of a misspecification of the prevalence on the asymptotic behavior of the hypothesis tests and on the estimators of the predictive values. The proposed global hypothesis test was extended to the situation in which there are more than two diagnostic tests. The results have been applied to the diagnosis of coronary disease.


Introduction
The main parameters to assess and compare the accuracy of binary diagnostic tests (BDTs) are sensitivity and specificity. The sensitivity (Se) is the probability of the result of the BDT being positive when the individual has the disease, and the specificity (Sp) is the probability of the result of the BDT being negative when the individual does not have the disease. Other parameters that are used to assess and compare two BDTs are the predictive values (PVs). The positive predictive value (PPV) is the probability of an individual having the disease when the result of the BDT is positive, and the negative predictive value (NPV) is the probability of an individual not having the disease when the result of the BDT is negative. The PVs represent the accuracy of the diagnostic test when it is applied to a cohort of individuals, and they are measures of the clinical accuracy of the BDT. The PVs depend on Se, Sp and on the disease prevalence (p), and are easily calculated applying Bayes' theorem, i.e.,

$$PPV = \frac{p\,Se}{p\,Se + q\,(1 - Sp)} \quad \text{and} \quad NPV = \frac{q\,Sp}{p\,(1 - Se) + q\,Sp}, \qquad (1)$$

where $q = 1 - p$. Whereas the Se and the Sp quantify how well the BDT reflects the true disease status (present or absent), the PVs quantify the clinical value of the BDT, since both the individual and the clinician are more interested in knowing how probable it is to have the disease given a BDT result. The comparison of the performance of two binary diagnostic tests is a topic of special importance in the study of statistical methods for the diagnosis of diseases. This comparison is made through a paired design or through a case-control design. The paired design consists of applying the two BDTs and the gold standard to all of the individuals in a single sample. The case-control design consists of applying the two BDTs to all of the individuals in two samples, one made up of individuals who have the disease (case sample) and another made up of individuals who do not have the disease (control sample).
The advantages and disadvantages of the case-control design over the paired design can be seen in the book by Pepe [1]. In summary, the case-control design has some advantages over the paired design: (a) the case-control design is more efficient in terms of sample size requirements, and (b) case-control studies allow for the exploration of subject-related characteristics of the test. Nevertheless, the case-control design has the disadvantage that the prevalence of the disease cannot be estimated from it.
In paired designs, the comparison of PVs has been the subject of several studies. Bennett [2,3], Leisenring et al. [4], Wang et al. [5] and Kosinski [6] studied hypothesis tests to independently compare the PPVs and the NPVs of two BDTs. Moskowitz and Pepe [7] studied the estimation of the PVs through a confidence region. Roldán-Nofuentes et al. [8] studied the joint comparison of the PPVs and NPVs of two BDTs, and proposed a global hypothesis test based on the chi-square distribution to simultaneously compare the PVs of two BDTs.
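In a case-control design, Mercaldo et al. [9] have studied the estimation of the PVs of a BDT, assuming that the prevalence of the disease (p) is known. The prevalence can be known from other studies, such as population studies of health services, cohort studies, etc. Mercaldo et al. have verified through simulation experiments that the confidence interval with the best asymptotic behavior is the logit interval, whose equations are:

$$\frac{\exp\left\{\mathrm{logit}(\widehat{PPV}) \pm z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\widehat{PPV})]}\right\}}{1 + \exp\left\{\mathrm{logit}(\widehat{PPV}) \pm z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\widehat{PPV})]}\right\}} \quad \text{and} \quad \frac{\exp\left\{\mathrm{logit}(\widehat{NPV}) \pm z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\widehat{NPV})]}\right\}}{1 + \exp\left\{\mathrm{logit}(\widehat{NPV}) \pm z_{1-\alpha/2}\sqrt{\widehat{Var}[\mathrm{logit}(\widehat{NPV})]}\right\}},$$

where $\mathrm{logit}(\widehat{PPV}) = \ln\{p\,\widehat{Se}/[q\,(1-\widehat{Sp})]\}$, $\mathrm{logit}(\widehat{NPV}) = \ln\{q\,\widehat{Sp}/[p\,(1-\widehat{Se})]\}$, $q = 1 - p$, $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution, and the variances are:

$$\widehat{Var}[\mathrm{logit}(\widehat{PPV})] = \frac{1-\widehat{Se}}{n_1\widehat{Se}} + \frac{\widehat{Sp}}{n_2(1-\widehat{Sp})} \quad \text{and} \quad \widehat{Var}[\mathrm{logit}(\widehat{NPV})] = \frac{\widehat{Se}}{n_1(1-\widehat{Se})} + \frac{1-\widehat{Sp}}{n_2\widehat{Sp}},$$

where $\widehat{Se}$ and $\widehat{Sp}$ are the estimators of sensitivity and specificity, $n_1$ is the size of the case sample and $n_2$ is the size of the control sample.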
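The logit interval described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `logit_intervals`, its argument names and the counts used in the call (87 positives among 105 cases, 114 negatives among 120 controls, prevalence 5%) are chosen here for illustration, and the sketch requires estimates strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def logit_intervals(positives_cases, n1, negatives_controls, n2, p, alpha=0.05):
    """Logit confidence intervals for PPV and NPV with a known prevalence p.

    Returns, for each predictive value, (estimate, lower limit, upper limit).
    """
    se = positives_cases / n1        # estimated sensitivity
    sp = negatives_controls / n2     # estimated specificity
    if not (0.0 < se < 1.0 and 0.0 < sp < 1.0 and 0.0 < p < 1.0):
        raise ValueError("the logit scale needs 0 < Se, Sp, p < 1")
    q = 1.0 - p
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    def expit(x):
        # Inverse of the logit transformation.
        return 1.0 / (1.0 + math.exp(-x))

    centres = {
        "PPV": math.log(p * se / (q * (1.0 - sp))),
        "NPV": math.log(q * sp / (p * (1.0 - se))),
    }
    variances = {
        "PPV": (1.0 - se) / (n1 * se) + sp / (n2 * (1.0 - sp)),
        "NPV": se / (n1 * (1.0 - se)) + (1.0 - sp) / (n2 * sp),
    }
    result = {}
    for name in ("PPV", "NPV"):
        half = z * math.sqrt(variances[name])
        result[name] = (expit(centres[name]),
                        expit(centres[name] - half),
                        expit(centres[name] + half))
    return result


# Illustrative counts (assumed for this sketch).
intervals = logit_intervals(87, 105, 114, 120, p=0.05)
for name, (estimate, lower, upper) in intervals.items():
    print(f"{name}: {estimate:.3f} ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the logit scale and then transformed back, its limits always stay inside (0, 1), which is one practical reason for preferring it to a Wald interval on the probability scale.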
In this article, we extend the study of Mercaldo et al. [9] to the case of two BDTs, studying different hypothesis tests to compare the PVs of the two BDTs subject to a case-control design. Subject to a case-control design, the two BDTs are applied to all of the individuals in two samples, one of $n_1$ individuals who have the disease (case sample) and another of $n_2$ individuals who do not have the disease (control sample). In this design, the sample sizes $n_1$ and $n_2$ are set by the researcher. The sample of individuals that have the disease is extracted from a population of individuals that have the disease (e.g., registers of diseases), and the control sample is extracted from a population of individuals who are known not to have the disease. As the PVs depend on the disease prevalence and, subject to a case-control design, the quotient $n_1/(n_1 + n_2)$ is not an estimator of the prevalence, in order to estimate and compare the PVs subject to this design it is necessary to know the value of the prevalence of the disease. This value can be obtained from health surveys or from previous studies. Consequently, the methods of comparison of the PVs subject to a paired design cannot be applied when there is a case-control design. In Section 2, we study hypothesis tests to simultaneously compare the PVs of two BDTs subject to a case-control design. A global hypothesis test is studied to simultaneously compare the PVs of the two BDTs, i.e., $H_0: (PPV_1 = PPV_2 \text{ and } NPV_1 = NPV_2)$, and the simultaneous comparison is also studied from individual hypothesis tests, i.e., $H_0: PPV_1 = PPV_2$ and $H_0: NPV_1 = NPV_2$, each of them to an α error and also applying multiple comparison methods.
In Section 3, simulation experiments are carried out to study the type I errors and the powers of the hypothesis tests proposed in Section 2, and we study the effect of the misspecification of the prevalence on the asymptotic behavior of the hypothesis tests proposed in Section 2 and on the estimators of the PVs. In Section 4, the results are applied to a real example on the diagnosis of coronary heart disease. In Section 5, the model proposed in Section 2 is extended to the situation in which we compare the PVs of more than two BDTs, and in Section 6 the results are discussed.

Global Hypothesis Test
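Let us consider two BDTs, Test 1 and Test 2, which are applied to all of the individuals in two samples, one of $n_1$ individuals who have the disease (case sample) and another of $n_2$ individuals who do not have it (control sample). Let $T_1$ and $T_2$ be two binary variables that model the results of each BDT, in such a way that $T_i = 1$ when the result of the corresponding BDT is positive and $T_i = 0$ when it is negative. Table 1 shows the probabilities associated with the application of both BDTs to both types of individuals (cases and controls), as well as the observed frequencies.

Table 1. Probabilities and observed frequencies subject to a case-control design.

Probabilities
| | Case: $T_2 = 1$ | Case: $T_2 = 0$ | Case: Total | Control: $T_2 = 1$ | Control: $T_2 = 0$ | Control: Total |
| $T_1 = 1$ | $\xi_{111}$ | $\xi_{110}$ | $Se_1$ | $\xi_{211}$ | $\xi_{210}$ | $1 - Sp_1$ |
| $T_1 = 0$ | $\xi_{101}$ | $\xi_{100}$ | $1 - Se_1$ | $\xi_{201}$ | $\xi_{200}$ | $Sp_1$ |
| Total | $Se_2$ | $1 - Se_2$ | 1 | $1 - Sp_2$ | $Sp_2$ | 1 |

Observed frequencies
| | Case: $T_2 = 1$ | Case: $T_2 = 0$ | Case: Total | Control: $T_2 = 1$ | Control: $T_2 = 0$ | Control: Total |
| $T_1 = 1$ | $n_{111}$ | $n_{110}$ | $n_{11\cdot}$ | $n_{211}$ | $n_{210}$ | $n_{21\cdot}$ |
| $T_1 = 0$ | $n_{101}$ | $n_{100}$ | $n_{10\cdot}$ | $n_{201}$ | $n_{200}$ | $n_{20\cdot}$ |
| Total | $n_{1\cdot 1}$ | $n_{1\cdot 0}$ | $n_1$ | $n_{2\cdot 1}$ | $n_{2\cdot 0}$ | $n_2$ |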
Using the conditional dependence model of Vacek [10], the probabilities given in the table are written as:

$$\xi_{1jk} = Se_1^{\,j}(1-Se_1)^{1-j}\,Se_2^{\,k}(1-Se_2)^{1-k} + \delta_{jk}\,\varepsilon_1 \quad \text{and} \quad \xi_{2jk} = Sp_1^{\,1-j}(1-Sp_1)^{j}\,Sp_2^{\,1-k}(1-Sp_2)^{k} + \delta_{jk}\,\varepsilon_2,$$

with $j, k = 0, 1$, where $\delta_{jk} = 1$ if $j = k$ and $\delta_{jk} = -1$ if $j \neq k$. The parameter $\varepsilon_1$ ($\varepsilon_2$) is the covariance between the two BDTs in cases (controls), and it is verified that $0 \leq \varepsilon_1 \leq \min\{Se_1(1-Se_2),\, Se_2(1-Se_1)\}$ and $0 \leq \varepsilon_2 \leq \min\{Sp_1(1-Sp_2),\, Sp_2(1-Sp_1)\}$. If $\varepsilon_1 = \varepsilon_2 = 0$, then the two BDTs are conditionally independent on the disease status. In practice, the assumption of conditional independence is not realistic, and therefore $\varepsilon_1 > 0$ and/or $\varepsilon_2 > 0$. In terms of the probabilities $\xi_{ijk}$, the sensitivities are written as:

$$Se_1 = \xi_{111} + \xi_{110} \quad \text{and} \quad Se_2 = \xi_{111} + \xi_{101},$$

and the specificities are written as:

$$Sp_1 = \xi_{200} + \xi_{201} \quad \text{and} \quad Sp_2 = \xi_{200} + \xi_{210}.$$

The estimators of the sensitivities are $\widehat{Se}_1 = n_{11\cdot}/n_1$ and $\widehat{Se}_2 = n_{1\cdot 1}/n_1$, and the estimators of the specificities are $\widehat{Sp}_1 = n_{20\cdot}/n_2$ and $\widehat{Sp}_2 = n_{2\cdot 0}/n_2$. The estimators of their variances are $\widehat{Var}(\widehat{Se}_1) = \widehat{Se}_1(1-\widehat{Se}_1)/n_1$, $\widehat{Var}(\widehat{Se}_2) = \widehat{Se}_2(1-\widehat{Se}_2)/n_1$, $\widehat{Var}(\widehat{Sp}_1) = \widehat{Sp}_1(1-\widehat{Sp}_1)/n_2$ and $\widehat{Var}(\widehat{Sp}_2) = \widehat{Sp}_2(1-\widehat{Sp}_2)/n_2$. Therefore, the sensitivities and the specificities are estimated as proportions of marginal totals. In this way, in the case sample we are interested in the marginal frequencies $n_{11\cdot}$ and $n_{1\cdot 1}$, and therefore these frequencies are the product of a type I bivariate binomial distribution [11]. In an analogous way, from the control sample, the marginal frequencies $n_{20\cdot}$ and $n_{2\cdot 0}$ are the product of a type I bivariate binomial distribution. In the individuals with the disease, the type I bivariate binomial distribution is characterized [11] by $Se_1$, $Se_2$ and the correlation coefficient ($\rho_1$) between $T_1$ and $T_2$. In an analogous way, in the individuals who do not have the disease, the type I bivariate binomial distribution is characterized by $Sp_1$, $Sp_2$ and the correlation coefficient ($\rho_2$) between $T_1$ and $T_2$. Therefore, the proposed model is a parametric model based on the distribution of the marginal frequencies in each 2 × 2 table.
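In the individuals with the disease (cases), the correlation coefficient between the two BDTs is:

$$\rho_1 = \frac{\varepsilon_1}{\sqrt{Se_1(1-Se_1)\,Se_2(1-Se_2)}},$$

and in the individuals who do not have the disease (controls), the correlation coefficient between the two BDTs is:

$$\rho_2 = \frac{\varepsilon_2}{\sqrt{Sp_1(1-Sp_1)\,Sp_2(1-Sp_2)}}.$$

It is easy to show that $\hat{\varepsilon}_1 = (n_1 n_{111} - n_{11\cdot}\,n_{1\cdot 1})/n_1^2$, $\hat{\varepsilon}_2 = (n_2 n_{200} - n_{20\cdot}\,n_{2\cdot 0})/n_2^2$, $\widehat{Cov}(\widehat{Se}_1, \widehat{Se}_2) = \hat{\varepsilon}_1/n_1$ and $\widehat{Cov}(\widehat{Sp}_1, \widehat{Sp}_2) = \hat{\varepsilon}_2/n_2$. All of the other covariances are zero, since the two samples are independent. The estimators of $\rho_1$ and $\rho_2$ are obtained by substituting the parameters in the previous expressions with their estimators, i.e., $\hat{\rho}_1 = \hat{\varepsilon}_1/\sqrt{\widehat{Se}_1(1-\widehat{Se}_1)\,\widehat{Se}_2(1-\widehat{Se}_2)}$ and $\hat{\rho}_2 = \hat{\varepsilon}_2/\sqrt{\widehat{Sp}_1(1-\widehat{Sp}_1)\,\widehat{Sp}_2(1-\widehat{Sp}_2)}$. Assuming that the disease prevalence p is known, the estimators of the predictive values are:

$$\widehat{PPV}_1 = \frac{p\,n_2\,n_{11\cdot}}{p\,n_2\,n_{11\cdot} + q\,n_1\,(n_2 - n_{20\cdot})} \quad \text{and} \quad \widehat{NPV}_1 = \frac{q\,n_1\,n_{20\cdot}}{p\,n_2\,(n_1 - n_{11\cdot}) + q\,n_1\,n_{20\cdot}}$$

for Test 1, and

$$\widehat{PPV}_2 = \frac{p\,n_2\,n_{1\cdot 1}}{p\,n_2\,n_{1\cdot 1} + q\,n_1\,(n_2 - n_{2\cdot 0})} \quad \text{and} \quad \widehat{NPV}_2 = \frac{q\,n_1\,n_{2\cdot 0}}{p\,n_2\,(n_1 - n_{1\cdot 1}) + q\,n_1\,n_{2\cdot 0}}$$

for Test 2, where $q = 1 - p$. Let the variance-covariance matrixes be defined as:

$$\hat{\Sigma}_{\widehat{Se}} = \begin{pmatrix} \widehat{Var}(\widehat{Se}_1) & \widehat{Cov}(\widehat{Se}_1, \widehat{Se}_2) \\ \widehat{Cov}(\widehat{Se}_1, \widehat{Se}_2) & \widehat{Var}(\widehat{Se}_2) \end{pmatrix} \quad \text{and} \quad \hat{\Sigma}_{\widehat{Sp}} = \begin{pmatrix} \widehat{Var}(\widehat{Sp}_1) & \widehat{Cov}(\widehat{Sp}_1, \widehat{Sp}_2) \\ \widehat{Cov}(\widehat{Sp}_1, \widehat{Sp}_2) & \widehat{Var}(\widehat{Sp}_2) \end{pmatrix}.$$

Let $\theta = (Se_1, Se_2, Sp_1, Sp_2)^T$ be a vector whose components are the sensitivities and the specificities, and let $\omega = (PPV_1, PPV_2, NPV_1, NPV_2)^T$ be a vector whose components are the PVs. The variance-covariance matrix of $\hat{\theta}$ is:

$$\hat{\Sigma}_{\hat{\theta}} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \hat{\Sigma}_{\widehat{Se}} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \hat{\Sigma}_{\widehat{Sp}}, \qquad (2)$$

where ⊗ is the Kronecker product. Applying the delta method, the variance-covariance matrix of $\hat{\omega}$ is:

$$\hat{\Sigma}_{\hat{\omega}} = \left(\frac{\partial \omega}{\partial \theta}\right)\hat{\Sigma}_{\hat{\theta}}\left(\frac{\partial \omega}{\partial \theta}\right)^{T}\Bigg|_{\theta = \hat{\theta}}. \qquad (3)$$

Expressions of the variances-covariances of the PVs can be seen in Appendix A. The PVs of each BDT depend on the same parameters (the sensitivity and the specificity of the test and the disease prevalence), and therefore they are related parameters. Consequently, the PVs of the two BDTs can be compared simultaneously.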
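As a worked step for the delta method mentioned above, the partial derivatives of the PVs of the ith BDT with respect to its own sensitivity and specificity follow directly from Bayes' theorem. The block below is a sketch derived here from the definitions of the PVs, not a transcription of Appendix A; since the PVs of Test i do not depend on the accuracy of the other test, all remaining entries of the Jacobian are zero.

```latex
% Partial derivatives of the PVs of Test i (i = 1, 2), with q = 1 - p
\frac{\partial PPV_i}{\partial Se_i} = \frac{p\,q\,(1 - Sp_i)}{\left[p\,Se_i + q\,(1 - Sp_i)\right]^2},
\qquad
\frac{\partial PPV_i}{\partial Sp_i} = \frac{p\,q\,Se_i}{\left[p\,Se_i + q\,(1 - Sp_i)\right]^2},

\frac{\partial NPV_i}{\partial Se_i} = \frac{p\,q\,Sp_i}{\left[p\,(1 - Se_i) + q\,Sp_i\right]^2},
\qquad
\frac{\partial NPV_i}{\partial Sp_i} = \frac{p\,q\,(1 - Se_i)}{\left[p\,(1 - Se_i) + q\,Sp_i\right]^2}.

% Resulting delta-method variance and covariance for the PPVs
\widehat{Var}(\widehat{PPV}_i) \approx
\left(\frac{\partial PPV_i}{\partial Se_i}\right)^{2}\widehat{Var}(\widehat{Se}_i)
+ \left(\frac{\partial PPV_i}{\partial Sp_i}\right)^{2}\widehat{Var}(\widehat{Sp}_i),

\widehat{Cov}(\widehat{PPV}_1, \widehat{PPV}_2) \approx
\frac{\partial PPV_1}{\partial Se_1}\frac{\partial PPV_2}{\partial Se_2}\,\frac{\hat{\varepsilon}_1}{n_1}
+ \frac{\partial PPV_1}{\partial Sp_1}\frac{\partial PPV_2}{\partial Sp_2}\,\frac{\hat{\varepsilon}_2}{n_2}.
```

The expressions for the NPVs, and for the covariances between a PPV and an NPV, have the same structure with the corresponding derivatives, all evaluated at the estimates.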
The global hypothesis test to simultaneously compare the PVs of the two BDTs is:

$$H_0: PPV_1 = PPV_2 \text{ and } NPV_1 = NPV_2 \quad \text{vs.} \quad H_1: \text{at least one equality is not true},$$

which is equivalent to the hypothesis test:

$$H_0: A\omega = 0 \quad \text{vs.} \quad H_1: A\omega \neq 0, \qquad (4)$$

where A is a full rank matrix sized 2 × 4 whose elements are known constants, i.e.:

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}.$$

As the vector $\hat{\omega}$ is distributed asymptotically according to a multivariate normal distribution, i.e., $\sqrt{n_1 + n_2}\,(\hat{\omega} - \omega) \rightarrow N(0, \Sigma_{\omega})$ when $n_1 + n_2 \rightarrow \infty$, the test statistic for the global hypothesis test (4) is:

$$Q^2 = \hat{\omega}^T A^T \left(A\,\hat{\Sigma}_{\hat{\omega}}\,A^T\right)^{-1} A\,\hat{\omega}, \qquad (5)$$

which is distributed asymptotically according to Hotelling's T-squared distribution with dimension 2 and $n_1 + n_2$ degrees of freedom, where 2 is the dimension of the vector $A\omega$. When $n_1 + n_2$ is large, the statistic $Q^2$ is approximately distributed according to a central chi-square distribution with 2 degrees of freedom when the null hypothesis is true. On the other hand, the individual comparison of the positive (negative) predictive values is solved with the hypothesis test:

$$H_0: PV_1 = PV_2 \quad \text{vs.} \quad H_1: PV_1 \neq PV_2,$$

where PV is PPV or NPV. Based on the asymptotic normality of the estimators, the test statistic for this hypothesis test is:

$$z = \frac{\widehat{PV}_1 - \widehat{PV}_2}{\sqrt{\widehat{Var}(\widehat{PV}_1) + \widehat{Var}(\widehat{PV}_2) - 2\,\widehat{Cov}(\widehat{PV}_1, \widehat{PV}_2)}}, \qquad (6)$$

which is distributed asymptotically according to a standard normal distribution, and where the variances-covariances are obtained from Equation (3) (see Appendix A). The global hypothesis test $H_0: A\omega = 0$ simultaneously compares the PPVs and the NPVs of the two BDTs. Some alternative methods to this global hypothesis test, based on the individual hypothesis tests, are: (1) testing the hypotheses $H_0: PPV_1 = PPV_2$ and $H_0: NPV_1 = NPV_2$ (Equation (6)), each one to an α error; (2) testing the hypotheses $H_0: PPV_1 = PPV_2$ and $H_0: NPV_1 = NPV_2$ (Equation (6)) and applying a multiple comparison method such as the Bonferroni method [12] or the Holm method [13], which are methods that are very easy to apply based on the p-values. The Bonferroni method [12] consists of solving each individual hypothesis test to an error equal to α/2.
The Holm method is a step-down method which is based on the Bonferroni method but is less conservative. In Appendix B, the Holm method [13] is summarized.
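The two multiple comparison rules can be sketched as decisions taken on a list of p-values. This is a minimal sketch, not the summary given in Appendix B: the function names `bonferroni_reject` and `holm_reject` are hypothetical, and the p-values in the usage lines are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [pv <= alpha / m for pv in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down rule.

    The p-values are visited from smallest to largest; the p-value in
    position r (r = 0, 1, ...) is compared with alpha / (m - r), and the
    procedure stops at the first p-value that is not significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break
    return reject


# Two individual tests (PPVs and NPVs), illustrative p-values.
print(bonferroni_reject([0.011, 0.142]))  # [True, False]
print(holm_reject([0.011, 0.142]))        # [True, False]
print(bonferroni_reject([0.02, 0.04]))    # [True, False]
print(holm_reject([0.02, 0.04]))          # [True, True]
```

With only two hypotheses, the two rules differ only in the second step: once the smaller p-value is significant at α/2, Holm compares the larger one with α instead of α/2.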

Simulation Experiments
Simulation experiments were carried out to study the type I errors and the powers of the four methods proposed to simultaneously compare the predictive values: the global hypothesis test based on the chi-square distribution (Equation (5)), the individual hypothesis tests each one to an α error (Equation (6)), the individual hypothesis tests (Equation (6)) applying the Bonferroni method and the individual hypothesis tests (Equation (6)) applying the Holm method. We have also studied the effect of a misspecification of the prevalence on the asymptotic behavior of these methods and on the estimators of the PVs.
The experiments were designed setting the values of the PVs. For each BDT, we took as PVs the values 0.60, 0.65, . . . , 0.90, 0.95, and as disease prevalence we took the values 10%, 25% and 50%. Based on the PVs and the prevalence, Se and Sp of each BDT were calculated from Equation (1), only considering those cases in which the solutions are between 0 and 1. As values of the correlation coefficients $\rho_1$ and $\rho_2$ we took low values (25% of the maximum value), intermediate values (50% of the maximum value) and high values (75% of the maximum value), where the maximum value of each correlation coefficient is:

$$\max(\rho_1) = \min\left\{\sqrt{\frac{Se_1(1-Se_2)}{Se_2(1-Se_1)}},\ \sqrt{\frac{Se_2(1-Se_1)}{Se_1(1-Se_2)}}\right\} \quad \text{and} \quad \max(\rho_2) = \min\left\{\sqrt{\frac{Sp_1(1-Sp_2)}{Sp_2(1-Sp_1)}},\ \sqrt{\frac{Sp_2(1-Sp_1)}{Sp_1(1-Sp_2)}}\right\},$$

respectively.
As sample sizes, we took the values $n_i \in \{50, 75, 100, 200, 500\}$. The simulation experiments were carried out with R [14], using the "bindata" package [15] to generate the samples of each type I bivariate binomial distribution. The random samples were generated in the following way. Firstly, once the values of the PVs and of the prevalence were set, we calculated the sensitivities, the specificities and the maximum values of the coefficients $\rho_1$ and $\rho_2$. We then generated 10,000 random samples from a type I bivariate binomial distribution with sample size $n_1$, probabilities $Se_1$ and $Se_2$, and correlation coefficient $\rho_1$. Similarly, we generated another 10,000 random samples from a type I bivariate binomial distribution with sample size $n_2$, probabilities $Sp_1$ and $Sp_2$, and correlation coefficient $\rho_2$. In this way, we obtained the marginal frequencies $n_{11\cdot}$ and $n_{1\cdot 1}$ ($n_{20\cdot}$ and $n_{2\cdot 0}$) of each one of the 10,000 case (control) samples. The rest of the marginal frequencies were easily calculated: $n_{10\cdot} = n_1 - n_{11\cdot}$, $n_{1\cdot 0} = n_1 - n_{1\cdot 1}$, $n_{21\cdot} = n_2 - n_{20\cdot}$ and $n_{2\cdot 1} = n_2 - n_{2\cdot 0}$. In order to construct the 2 × 2 table of each case sample, we generated a random value $n_{111}$ from a doubly truncated binomial distribution of parameters $n_1$ and $\xi_{111} = Se_1 Se_2 + \varepsilon_1$, with $\max(0,\, n_{11\cdot} + n_{1\cdot 1} - n_1) \leq n_{111} \leq \min(n_{11\cdot},\, n_{1\cdot 1})$. This is necessary so that the sum of the frequencies leads to the marginal totals randomly generated through the type I bivariate binomial distribution. In the same way, in order to construct the 2 × 2 table of each control sample, we generated a random value $n_{200}$ from a doubly truncated binomial distribution of parameters $n_2$ and $\xi_{200} = Sp_1 Sp_2 + \varepsilon_2$, with $\max(0,\, n_{20\cdot} + n_{2\cdot 0} - n_2) \leq n_{200} \leq \min(n_{20\cdot},\, n_{2\cdot 0})$. For each one of the 10,000 case (control) samples, once we have generated the values $n_{11\cdot}$, $n_{1\cdot 1}$ and $n_{111}$ ($n_{20\cdot}$, $n_{2\cdot 0}$ and $n_{200}$), it is easy to construct the complete 2 × 2 table.
Thus, $n_{110} = n_{11\cdot} - n_{111}$, $n_{101} = n_{1\cdot 1} - n_{111}$ and $n_{100} = n_{10\cdot} - n_{101}$ for the case samples, and $n_{201} = n_{20\cdot} - n_{200}$, $n_{210} = n_{2\cdot 0} - n_{200}$ and $n_{211} = n_{21\cdot} - n_{210}$ for the control samples. For the experiments α = 5% was set. Moreover, all of the samples were generated in such a way that in all of them the parameters and the variances-covariances can be estimated. If in a random sample it is obtained that $n_{i10} = n_{i01} = 0$, with $i = 1, 2$, then $\widehat{Se}_i = \widehat{Sp}_i = 1$ and $\widehat{Var}(\widehat{Se}_i) = \widehat{Var}(\widehat{Sp}_i) = 0$, and therefore the test statistic $Q^2 = \hat{\omega}^T A^T (A\,\hat{\Sigma}_{\hat{\omega}}\,A^T)^{-1} A\,\hat{\omega}$ cannot be calculated since $A\,\hat{\Sigma}_{\hat{\omega}}\,A^T$ is a singular matrix. This problem occurs mainly when the sample size is small or moderate. In this situation, the sample was discarded and another was generated in its place until the 10,000 samples were obtained.
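The generation of one case table and one control table can be illustrated with a simplified Python sketch. Note that it swaps in a different technique from the one described above: instead of drawing the margins from a type I bivariate binomial distribution and then $n_{111}$ ($n_{200}$) from a doubly truncated binomial, it draws the four cells of each 2 × 2 table directly from a multinomial distribution with the cell probabilities of the dependence model. The function names, the seed and the sample sizes are assumptions made for this illustration.

```python
import math
import random


def cell_probabilities(a1, a2, rho):
    """Probabilities of (T1, T2) = (1,1), (1,0), (0,1), (0,0).

    a1 and a2 are P(T1 = 1) and P(T2 = 1) in the sample considered and
    rho is the correlation coefficient between T1 and T2.
    """
    eps = rho * math.sqrt(a1 * (1 - a1) * a2 * (1 - a2))  # covariance
    probs = [a1 * a2 + eps, a1 * (1 - a2) - eps,
             (1 - a1) * a2 - eps, (1 - a1) * (1 - a2) + eps]
    if min(probs) < 0:
        raise ValueError("rho exceeds its maximum for these marginals")
    return probs


def draw_table(n, probs, rng):
    """Multinomial draw of the four cell counts of a 2 x 2 table."""
    counts = [0, 0, 0, 0]
    for cell in rng.choices(range(4), weights=probs, k=n):
        counts[cell] += 1
    return counts


rng = random.Random(2021)                         # arbitrary seed
se1, se2, sp1, sp2 = 0.5357, 0.5455, 0.9802, 0.9596
# Intermediate correlations: 50% of the maximum values 0.9805 and 0.6933.
case_probs = cell_probabilities(se1, se2, 0.5 * 0.9805)
# For controls, a positive result has probability 1 - Sp.
control_probs = cell_probabilities(1 - sp1, 1 - sp2, 0.5 * 0.6933)
case_table = draw_table(100, case_probs, rng)        # n111, n110, n101, n100
control_table = draw_table(100, control_probs, rng)  # n211, n210, n201, n200
print(case_table, control_table)
```

A full experiment would repeat these draws 10,000 times per scenario and redraw any sample in which the variances-covariances cannot be estimated, as described in the text.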

Type I Errors and Powers
Tables 2 and 3 show some of the results obtained for the type I errors of the global test and of the alternative methods proposed in Section 2. These tables only show the results for the global test, the individual comparisons with α = 5% and the Bonferroni method. The results obtained with the Holm method are not shown as they are practically the same as those obtained with the Bonferroni method. From the results obtained we can draw the following conclusions. In general terms, the type I error of the global hypothesis test fluctuates around the nominal error, especially in the case of samples sized $n_i \geq 100$, depending on the prevalence and the correlations between the two BDTs. For samples with smaller sizes ($n_i \leq 75$), the type I error of the global test is lower than α = 5%. The correlations between the two BDTs have an important effect on the type I error of the global test, with a decrease in the type I error when there is an increase in the correlation coefficients.

Table 2. Type I errors for $PPV_1 = PPV_2 = 0.70$ and $NPV_1 = NPV_2 = 0.95$.

Regarding the method based on the individual hypothesis tests $H_0: PPV_1 = PPV_2$ and $H_0: NPV_1 = NPV_2$, each one of them to an error α = 5%, the type I error may clearly exceed the nominal error (which we consider to occur when the type I error is greater than 7%), especially when the correlations are not high. Consequently, this method may lead to erroneous results (false significances) and, therefore, should not be used. As for solving the global test from the individual tests applying the Bonferroni (Holm) method, the type I error has a very similar behavior to that of the global hypothesis test.

Table 3. Type I errors for $PPV_1 = PPV_2 = 0.85$ and $NPV_1 = NPV_2 = 0.95$.

Regarding the powers of the hypothesis tests, Tables 4 and 5 show some of the results obtained for the global test and the other alternative methods.
The results obtained with the Holm method are not shown as they are practically the same as those obtained with the Bonferroni method. The power of the global hypothesis test is calculated as the proportion of samples in which it is concluded that $PPV_1 \neq PPV_2$ or $NPV_1 \neq NPV_2$ (it being true that $PPV_1 \neq PPV_2$ or $NPV_1 \neq NPV_2$). From the results, the following conclusions are obtained. The disease prevalence has an important effect on the power of each one of the methods to solve the global test, and the power increases with an increase in the prevalence. Regarding the correlations $\rho_1$ and $\rho_2$, these do not have a clear effect on the power, and the power increases sometimes and decreases other times when the correlations increase. In general terms, when the prevalence is small (p = 10%) we need large samples ($n_i > 500$) so that the power of the global hypothesis test is greater than 80%; for a prevalence of 25%, with sample sizes $n_i \geq 200$ we obtain a power greater than 80%; and for a very large prevalence (p = 50%), with sample sizes $n_i \geq 50$ we obtain a very high power, greater than 80%-90%, depending on the difference between the PVs.

Table 4. Powers for $PPV_1 = 0.75$, $NPV_1 = 0.95$, $PPV_2 = 0.60$ and $NPV_2 = 0.95$ ($Se_1 = 0.5357$, $Sp_1 = 0.9802$, $Se_2 = 0.5455$, $Sp_2 = 0.9596$, p = 10%, $0 \leq \rho_1 \leq 0.9805$, $0 \leq \rho_2 \leq 0.6933$).

The power of the method based on the individual hypothesis tests to an error α = 5% is greater than that of the global test based on the chi-square distribution due to the fact that its type I error is also greater. Regarding the hypothesis test based on the individual tests with the Bonferroni method, in general terms, its power is very similar to that of the global test when the sample sizes are large. When the sample sizes are small or moderate, in general terms and depending on the prevalence and the correlations, the power of the global test is slightly greater than that of the individual tests with the Bonferroni method.
The same conclusions are obtained when the Holm method is applied (whose results are almost identical to those of the Bonferroni method).

Table 5. Powers for $PPV_1 = 0.95$, $NPV_1 = 0.95$, $PPV_2 = 0.75$ and $NPV_2 = 0.95$ ($Se_1 = 0.5278$, $Sp_1 = 0.9969$, $Se_2 = 0.5357$, $Sp_2 = 0.9802$, p = 10%, $0 \leq \rho_1 \leq 0.9841$, $0 \leq \rho_2 \leq 0.3910$).

The graphs in Figure 1 show the powers of the three methods when $PPV_1 = 0.90$, $NPV_1 = (0.80, 0.85, 0.90, 0.95)$, $PPV_2 = 0.85$ and $NPV_2 = 0.90$, for different sample sizes $n_1 = n_2 = (50, 100, 200)$, p = (25%, 50%) and intermediate values of the correlation coefficients. These graphs show that when $NPV_1$ varies and the rest of the PVs are constant, the powers decrease when the prevalence increases. Similarly, the graphs in Figure 2 show the powers of the three methods when $PPV_1 = (0.80, 0.85, 0.90, 0.95)$, $NPV_1 = 0.95$, $PPV_2 = 0.60$ and $NPV_2 = 0.95$, for different sample sizes $n_1 = n_2 = (50, 100, 200)$, p = (10%, 25%) and intermediate values of the correlation coefficients. These graphs show that when $PPV_1$ varies and the rest of the PVs are constant, the power of each method increases when the prevalence increases.
In conclusion, the simulation experiments show that the global hypothesis test based on the chi-square distribution behaves well in terms of the type I error (it does not exceed the nominal error of 5%), the same as the individual tests along with the Bonferroni (Holm) method. The method based on the individual tests, each to an error α = 5%, should not be used as it may clearly exceed the nominal error.
In the simulation experiments, the proportion of times that $PPV_1 \neq PPV_2$ and that $NPV_1 \neq NPV_2$ are correctly concluded has also been studied. This issue is of special interest when the alternative hypothesis of the global test is true, as it can be a valid method to investigate the causes of significance. The study was carried out by applying the individual hypothesis tests together with the Bonferroni (Holm) method. Individual tests to an α error have not been considered as they have a type I error that can exceed the nominal error. If it is verified that $PV_1 \neq PV_2$, then this study is equivalent to studying the power of the individual test $H_0: PV_1 = PV_2$ to an α/2 error (since the Bonferroni method has been applied), where $PV_i$ is $PPV_i$ or $NPV_i$. If it is verified that $PV_1 = PV_2$, then this study is equivalent to studying the type I error of the individual test $H_0: PV_1 = PV_2$ to an α/2 error. In the scenarios considered in Tables 4 and 5 it is verified that $PPV_1 \neq PPV_2$ and that $NPV_1 = NPV_2$. Therefore, for these two scenarios, the power of the test $H_0: PPV_1 = PPV_2$ and the type I error of the test $H_0: NPV_1 = NPV_2$ have been studied, each with an error equal to α/2 = 2.5%. Tables 6 and 7 show the results obtained applying the Bonferroni method. The results obtained with the Holm method are not shown as they are practically the same as those obtained with the Bonferroni method.
In general terms, the hypothesis test $H_0: PPV_1 = PPV_2$ has a high power when the sample sizes are moderate or large, depending on the prevalence and the correlation coefficients. Its behavior is very similar to that of the global hypothesis test. With respect to the test $H_0: NPV_1 = NPV_2$, its type I error fluctuates around the nominal error (2.5%) when the sample sizes are moderate or large, depending on the prevalence and the correlation coefficients. In general terms, the hypothesis tests $H_0: PPV_1 = PPV_2$ and $H_0: NPV_1 = NPV_2$ have a good asymptotic behavior, both in terms of power and type I error.

Table 6. Power of the test $H_0: PPV_1 = PPV_2$ and type I error of the test $H_0: NPV_1 = NPV_2$ when $PPV_1 = 0.75$, $NPV_1 = 0.95$, $PPV_2 = 0.60$ and $NPV_2 = 0.95$ ($Se_1 = 0.5357$, $Sp_1 = 0.9802$, $Se_2 = 0.5455$, $Sp_2 = 0.9596$, p = 10%, $0 \leq \rho_1 \leq 0.9805$, $0 \leq \rho_2 \leq 0.6933$).

From the results obtained in the simulation experiments, we propose the following method to compare the PVs of two BDTs subject to a case-control design: (1) apply the global hypothesis test based on the chi-square distribution (Equation (5)) to an α error; (2) if the global hypothesis test is not significant, the equality hypothesis of the PVs is not rejected; if the global hypothesis test is significant to an α error, the causes of the significance are investigated by solving the individual tests (Equation (6)) and applying the Bonferroni method or the Holm method to an α error.
Therefore, if the global test is significant, the investigation of the significance consists of solving the individual hypothesis tests $H_0: PPV_1 = PPV_2$ and $H_0: NPV_1 = NPV_2$, each of them to an α/2 error (Bonferroni method) or applying the Holm method.
This method to simultaneously compare the PVs is very similar to methods used in other statistical models, such as the analysis of variance: first the global test is solved to an α error, and if it is significant then the causes of significance are investigated from pairwise comparisons and the application of a multiple comparison method.

Table 7. Power of the test $H_0: PPV_1 = PPV_2$ and type I error of the test $H_0: NPV_1 = NPV_2$ when $PPV_1 = 0.95$, $NPV_1 = 0.95$, $PPV_2 = 0.75$ and $NPV_2 = 0.95$ ($Se_1 = 0.5278$, $Sp_1 = 0.9969$, $Se_2 = 0.5357$, $Sp_2 = 0.9802$, p = 10%, $0 \leq \rho_1 \leq 0.9841$, $0 \leq \rho_2 \leq 0.3910$).

Effect of the Prevalence
The estimation and comparison of the PVs of two BDTs subject to a case-control design requires knowledge of the disease prevalence. To study the effect of a misspecification of the prevalence on the comparison of the PVs and on the estimators of the PVs, we carried out simulation experiments similar to those made to study the type I errors and the powers. For this purpose, the prevalence used for the inference deviated by 10% and by 20% from the value of the prevalence set, and we studied the type I errors and the powers of the global test and of the Bonferroni and Holm methods, as well as the relative root mean square error (RRMSE) of the estimator of each PV, i.e.,

$$RRMSE(\widehat{PV}_i) = \frac{\sqrt{\dfrac{1}{N}\displaystyle\sum_{k=1}^{N}\left(\widehat{PV}_{ik} - PV_i\right)^2}}{PV_i},$$

where $PV_i$ is the PPV or the NPV of the ith BDT ($i = 1, 2$), $\widehat{PV}_{ik}$ is its estimator calculated from the kth sample ($k = 1, \ldots, N$), and $N = 10{,}000$. For the values of the parameters we took as prevalence p = (10%, 25%, 50%), and to estimate the PVs we took as prevalence $p' = p \pm d \times p$ with d = (10%, 20%). A value d = 10% (d = 20%) can be considered as a small (moderate) value of the relative deviation. Table 8 shows some of the results obtained for the type I errors and the powers of the global test and the Bonferroni method (the results of the Holm method are not shown as they are practically identical to those obtained with the Bonferroni method). This table shows the results when there is no misspecification of the prevalence ($p' = p$) and when there is a misspecification of the prevalence ($p' < p$ or $p' > p$). From the results of these experiments, it is verified that the type I errors of the methods studied do not exceed the nominal error (α = 5%). In general terms there are no important differences between the type I errors when there is a misspecification of the prevalence and when there is not.
Regarding the powers, the conclusions are very similar: there are no important differences between the powers when there is a misspecification of the prevalence and when there is not.
Regarding the estimators, Table 8 shows some of the results obtained for the RRMSEs (in %) of the estimators of the PVs of Test 1 (the results for Test 2 are identical). In general terms, the difference between the RRMSEs is small (around 5% or less, in absolute value) when the two sample sizes are moderate ($n_i = 100$) or large ($n_i \geq 200$) and the relative deviation is small (10%) or moderate (20%). Therefore, a small or moderate misspecification of the prevalence ($p' < p$ or $p' > p$) does not have an important effect on the estimators of the PVs when the samples are moderate or large. Additionally, there is not an important difference between the RRMSEs when the sample sizes are small ($n_i \leq 75$) and the relative deviation is small. However, the difference between the RRMSEs is larger when the sample sizes are small and the relative deviation is moderate. In this situation, a misspecification of the prevalence has an important effect on the estimators of the PVs.

Table 8. Effect of a misspecification of the prevalence.
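To make the misspecification study more concrete, the following Python sketch computes an RRMSE expressed as a percentage and shows how the PVs implied by fixed values of Se and Sp move when the prevalence used for the inference deviates by ±10% and ±20% from the true one. It is a hedged illustration: the function names are hypothetical, the multiplication by 100 reflects the assumption that the RRMSE is reported in %, and the values of Se, Sp and p are those of the scenario with $PPV_1 = 0.75$ and $NPV_1 = 0.95$.

```python
import math


def predictive_values(se, sp, p):
    """PPV and NPV from Bayes' theorem for a given prevalence p."""
    q = 1.0 - p
    ppv = p * se / (p * se + q * (1.0 - sp))
    npv = q * sp / (p * (1.0 - se) + q * sp)
    return ppv, npv


def rrmse(estimates, true_value):
    """Relative root mean square error, in % of the true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / true_value


se, sp, p = 0.5357, 0.9802, 0.10
true_ppv, true_npv = predictive_values(se, sp, p)
print(f"true prevalence: PPV={true_ppv:.4f}  NPV={true_npv:.4f}")
for d in (-0.20, -0.10, 0.10, 0.20):
    # Prevalence used for the inference: p' = p + d * p.
    ppv_mis, npv_mis = predictive_values(se, sp, p * (1.0 + d))
    print(f"d={d:+.0%}  PPV={ppv_mis:.4f}  NPV={npv_mis:.4f}")
```

In a simulation, `rrmse` would be applied to the 10,000 estimates of each PV obtained with the misspecified prevalence, using the true PV as `true_value`.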

Example
The results obtained have been applied to the diagnosis of coronary heart disease, using an electrocardiogram and an echocardiography as diagnostic tests. Both tests have been applied to a sample of 105 older men with coronary heart disease (case sample) and to another sample of 120 older men without this disease (control sample). In Table 9 we can see the frequencies obtained, where the random variable T 1 models the result of the electrocardiogram and the variable T 2 models the result of the echocardiography. In order to illustrate the method proposed, we are going to consider that the prevalence of the disease in older men is 5%. The objective is to compare the clinical accuracy (PVs) of both BDTs in the population whose prevalence of coronary heart disease is 5%. The comparison of the PVs will be made with α = 5%. Table 9. Diagnosis of coronary heart disease.

| | Case: $T_2 = 1$ | Case: $T_2 = 0$ | Case: Total | Control: $T_2 = 1$ | Control: $T_2 = 0$ | Control: Total |
| $T_1 = 1$ | 77 | 10 | 87 | 4 | 2 | 6 |
| $T_1 = 0$ | 6 | 12 | 18 | 13 | 101 | 114 |
| Total | 83 | 22 | 105 | 17 | 103 | 120 |

From the case sample, the estimates of the two sensitivities (and their standard errors, SE) and of the correlation coefficient between them are $\widehat{Se}_1 \pm SE = 0.829 \pm 0.037$, $\widehat{Se}_2 \pm SE = 0.790 \pm 0.040$ and $\hat{\rho}_1 = 0.511$. From the control sample, the estimates of the two specificities and of the correlation coefficient between them are $\widehat{Sp}_1 \pm SE = 0.950 \pm 0.020$, $\widehat{Sp}_2 \pm SE = 0.858 \pm 0.032$ and $\hat{\rho}_2 = 0.345$. Assuming that the prevalence of coronary heart disease is 5%, the estimates of the PVs obtained from these values are $\widehat{PPV}_1 = 0.466$, $\widehat{NPV}_1 = 0.991$, $\widehat{PPV}_2 = 0.227$ and $\widehat{NPV}_2 = 0.987$. Applying Equation (5), the value of the test statistic for the global test

$$H_0: PPV_1 = PPV_2 \text{ and } NPV_1 = NPV_2 \quad \text{vs.} \quad H_1: \text{at least one equality is not true}$$

is $Q^2 = 7.516$ and the p-value is 0.023, and therefore the null hypothesis of the global test is rejected. Solving the individual hypothesis tests, it is found that the value of the test statistic for $H_0: PPV_1 = PPV_2$ is equal to 2.552 (two-sided p-value 0.011) and that the value of the test statistic for $H_0: NPV_1 = NPV_2$ is equal to 1.469 (two-sided p-value 0.142). Applying the Bonferroni (or Holm) method, the hypothesis of equality of the positive predictive values is rejected and the hypothesis of equality of the negative predictive values is not rejected. Therefore, in a population in which the prevalence of coronary heart disease is 5%, the positive predictive value of the electrocardiogram is significantly greater than that of the echocardiography (95% confidence interval for the difference: 0.056 to 0.422), while there are no significant differences between the two negative predictive values.
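The calculations of this example can be retraced with a short Python sketch of the whole procedure: estimation of Se, Sp and their covariances, delta-method covariance matrix of the PVs, global statistic $Q^2$ and individual z statistics. The function name `compare_pvs` and the ordering of the cell counts are assumptions of this sketch. The cell counts are those implied by the marginal totals, the reported estimates of Se and Sp and the reported correlations (for example, $n_{111} = 77$ is the value that yields $\hat{\rho}_1 = 0.511$). The chi-square p-value with 2 degrees of freedom is computed as $\exp(-Q^2/2)$.

```python
import math
from statistics import NormalDist


def compare_pvs(case, control, p):
    """case = (n111, n110, n101, n100); control = (n211, n210, n201, n200)."""
    n1, n2, q = sum(case), sum(control), 1.0 - p
    se = [(case[0] + case[1]) / n1, (case[0] + case[2]) / n1]
    sp = [(control[3] + control[2]) / n2, (control[3] + control[1]) / n2]
    eps1 = case[0] / n1 - se[0] * se[1]        # covariance in cases
    eps2 = control[3] / n2 - sp[0] * sp[1]     # covariance in controls
    cov_se = [[se[0] * (1 - se[0]) / n1, eps1 / n1],
              [eps1 / n1, se[1] * (1 - se[1]) / n1]]
    cov_sp = [[sp[0] * (1 - sp[0]) / n2, eps2 / n2],
              [eps2 / n2, sp[1] * (1 - sp[1]) / n2]]
    omega, grads = [], []    # order: PPV1, PPV2, NPV1, NPV2
    for kind in ("PPV", "NPV"):
        for i in (0, 1):
            if kind == "PPV":
                d = p * se[i] + q * (1 - sp[i])
                omega.append(p * se[i] / d)
                grads.append((i, p * q * (1 - sp[i]) / d ** 2,
                              p * q * se[i] / d ** 2))
            else:
                d = p * (1 - se[i]) + q * sp[i]
                omega.append(q * sp[i] / d)
                grads.append((i, p * q * sp[i] / d ** 2,
                              p * q * (1 - se[i]) / d ** 2))
    # Delta method: each PV depends only on the Se and Sp of its own test.
    sigma = [[ga[1] * gb[1] * cov_se[ga[0]][gb[0]]
              + ga[2] * gb[2] * cov_sp[ga[0]][gb[0]]
              for gb in grads] for ga in grads]
    d_ppv, d_npv = omega[0] - omega[1], omega[2] - omega[3]
    # Entries of the 2 x 2 matrix A * Sigma * A^T.
    m11 = sigma[0][0] + sigma[1][1] - 2 * sigma[0][1]
    m22 = sigma[2][2] + sigma[3][3] - 2 * sigma[2][3]
    m12 = sigma[0][2] - sigma[0][3] - sigma[1][2] + sigma[1][3]
    det = m11 * m22 - m12 ** 2
    q2 = (d_ppv ** 2 * m22 - 2 * d_ppv * d_npv * m12 + d_npv ** 2 * m11) / det
    norm = NormalDist()
    z_ppv, z_npv = d_ppv / math.sqrt(m11), d_npv / math.sqrt(m22)
    return {"omega": omega, "Q2": q2, "p_global": math.exp(-q2 / 2),
            "z_ppv": z_ppv, "p_ppv": 2 * (1 - norm.cdf(abs(z_ppv))),
            "z_npv": z_npv, "p_npv": 2 * (1 - norm.cdf(abs(z_npv)))}


result = compare_pvs(case=(77, 10, 6, 12), control=(4, 2, 13, 101), p=0.05)
for key in ("Q2", "p_global", "z_ppv", "p_ppv", "z_npv", "p_npv"):
    print(key, round(result[key], 3))
```

Under these assumptions the sketch should reproduce, up to rounding, the values reported in the text for $Q^2$, the two z statistics and their p-values.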

More Than Two BDTs
Let us consider that J BDTs (J ≥ 3) are applied to all of the individuals in the case sample and in the control sample. For each BDT we define the random variable $T_j$ in a similar way to how this was done in Section 2. Let $Se_j$ and $Sp_j$ be the sensitivity and the specificity of the jth BDT, with $j = 1, \ldots, J$. Let $n_{1 i_1 \ldots i_J}$ be the number of individuals with the disease for whom $T_1 = i_1, \ldots, T_J = i_J$, with $i_j = 1$ when the result of the jth BDT is positive and $i_j = 0$ when it is negative. In a similar way, $n_{2 i_1 \ldots i_J}$ is the number of individuals without the disease for whom $T_1 = i_1, \ldots, T_J = i_J$. Let us consider the probabilities $\xi_{h i_1 \ldots i_J} = P(T_1 = i_1, T_2 = i_2, \ldots, T_J = i_J)$ among the individuals with the disease ($h = 1$) and without the disease ($h = 2$). For example, for three BDTs, using the dependence model of Torrance-Rynard and Walter [16], these probabilities are written in terms of the sensitivities (specificities) and of the covariances between the BDTs, with $i_j = 0, 1$, $i_k = 0, 1$ and $j, k = 1, 2, 3$, where $\varepsilon_{1jk}$ ($\varepsilon_{2jk}$) is the covariance between the jth BDT and the kth BDT for individuals with the disease (without the disease). The estimators of these probabilities are $\hat{\xi}_{h i_1 \ldots i_J} = n_{h i_1 \ldots i_J}/n_h$, with $h = 1, 2$. The sensitivity and the specificity of the jth BDT are

$$Se_j = \sum_{i_1, \ldots, i_J:\; i_j = 1} \xi_{1 i_1 \ldots i_J} \quad \text{and} \quad Sp_j = \sum_{i_1, \ldots, i_J:\; i_j = 0} \xi_{2 i_1 \ldots i_J}.$$

The estimators of the variances-covariances of these estimators are $\widehat{Var}(\hat{Se}_j) = \hat{Se}_j(1 - \hat{Se}_j)/n_1$, $\widehat{Var}(\hat{Sp}_j) = \hat{Sp}_j(1 - \hat{Sp}_j)/n_2$, $\widehat{Cov}(\hat{Se}_j, \hat{Se}_k) = \hat{\varepsilon}_{1jk}/n_1$ and $\widehat{Cov}(\hat{Sp}_j, \hat{Sp}_k) = \hat{\varepsilon}_{2jk}/n_2$, and the rest of the covariances are equal to zero. The estimators of the PVs of the jth BDT are

$$\widehat{PPV}_j = \frac{p\,\hat{Se}_j}{p\,\hat{Se}_j + q\,(1 - \hat{Sp}_j)} \quad \text{and} \quad \widehat{NPV}_j = \frac{q\,\hat{Sp}_j}{p\,(1 - \hat{Se}_j) + q\,\hat{Sp}_j},$$

where p is the disease prevalence and q = 1 − p.
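As an illustration of these estimators, the following Python sketch computes the sensitivities, the specificities, their estimated variances-covariances and the PVs for J = 3 from the frequencies of each result pattern. The frequencies, the prevalence and the function names are hypothetical and are not taken from the article.

```python
def marginal_accuracy(counts, positive):
    """Estimate Se_j (positive=True, case sample) or Sp_j (positive=False,
    control sample) and the variance-covariance matrix of the estimators.

    counts maps each result pattern (i_1, ..., i_J) to its frequency.
    """
    n = sum(counts.values())
    J = len(next(iter(counts)))
    target = 1 if positive else 0
    acc = [sum(c for pat, c in counts.items() if pat[j] == target) / n
           for j in range(J)]
    cov = [[0.0] * J for _ in range(J)]
    for j in range(J):
        for k in range(J):
            joint = sum(c for pat, c in counts.items()
                        if pat[j] == target and pat[k] == target) / n
            # (joint - product) is the estimated covariance epsilon_jk;
            # for j == k it reduces to acc_j * (1 - acc_j).
            cov[j][k] = (joint - acc[j] * acc[k]) / n
    return acc, cov

def predictive_values(se, sp, p):
    """PPV_j and NPV_j for each BDT, given the prevalence p."""
    q = 1.0 - p
    ppv = [p * s / (p * s + q * (1.0 - t)) for s, t in zip(se, sp)]
    npv = [q * t / (p * (1.0 - s) + q * t) for s, t in zip(se, sp)]
    return ppv, npv

# Hypothetical frequencies for three BDTs (n1 = n2 = 100).
case = {(1, 1, 1): 60, (1, 1, 0): 10, (1, 0, 1): 8, (0, 1, 1): 5,
        (1, 0, 0): 5, (0, 1, 0): 4, (0, 0, 1): 3, (0, 0, 0): 5}
control = {(0, 0, 0): 70, (0, 0, 1): 8, (0, 1, 0): 6, (1, 0, 0): 4,
           (0, 1, 1): 4, (1, 0, 1): 3, (1, 1, 0): 2, (1, 1, 1): 3}

se, cov_se = marginal_accuracy(case, positive=True)
sp, cov_sp = marginal_accuracy(control, positive=False)
ppv, npv = predictive_values(se, sp, p=0.10)  # hypothetical prevalence

print("Se:", se, "Sp:", sp)
print("PPV:", ppv, "NPV:", npv)
```

The covariance between two specificity estimators is computed from the joint frequency of negative results, which gives the same value as the covariance between the two test results themselves.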
Let $\theta = (Se_1, \ldots, Se_J, Sp_1, \ldots, Sp_J)^T$ be the vector whose components are the sensitivities and the specificities, and let $\omega = (PPV_1, \ldots, PPV_J, NPV_1, \ldots, NPV_J)^T$ be the vector whose components are the PVs. The variance-covariance matrix of the vector $\hat{\theta}$, with dimension 2J × 2J, is similar to that given in Equation (2), where $\Sigma_{\hat{Se}}$ and $\Sigma_{\hat{Sp}}$ are matrices with dimension J × J. Applying the delta method, the variance-covariance matrix of $\hat{\omega}$, with dimension 2J × 2J, has an expression similar to that given in Equation (3). The PVs of each one of the J BDTs depend on the same type of parameters (the sensitivity and the specificity of the jth diagnostic test) and, therefore, the PVs can be compared simultaneously.
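The delta-method step can be sketched numerically. The Python code below (using NumPy) builds the Jacobian of $\omega$ with respect to $\theta$ from the Bayes expressions of the PVs and propagates a block-diagonal variance-covariance matrix of $\hat{\theta}$. It is a minimal sketch under stated assumptions: the function names and all numerical values are hypothetical, and it is not claimed to reproduce the exact form of Equation (3).

```python
import numpy as np

def pv_vector(theta, p):
    """omega = (PPV_1..PPV_J, NPV_1..NPV_J) from theta = (Se_1..Se_J, Sp_1..Sp_J)."""
    J = len(theta) // 2
    se, sp = theta[:J], theta[J:]
    q = 1.0 - p
    ppv = p * se / (p * se + q * (1.0 - sp))
    npv = q * sp / (p * (1.0 - se) + q * sp)
    return np.concatenate([ppv, npv])

def pv_jacobian(theta, p):
    """Matrix of partial derivatives of omega with respect to theta."""
    J = len(theta) // 2
    se, sp = theta[:J], theta[J:]
    q = 1.0 - p
    d_pos = (p * se + q * (1.0 - sp)) ** 2
    d_neg = (p * (1.0 - se) + q * sp) ** 2
    G = np.zeros((2 * J, 2 * J))
    G[:J, :J] = np.diag(p * q * (1.0 - sp) / d_pos)  # dPPV_j / dSe_j
    G[:J, J:] = np.diag(p * q * se / d_pos)          # dPPV_j / dSp_j
    G[J:, :J] = np.diag(p * q * sp / d_neg)          # dNPV_j / dSe_j
    G[J:, J:] = np.diag(p * q * (1.0 - se) / d_neg)  # dNPV_j / dSp_j
    return G

def pv_covariance(theta, sigma_se, sigma_sp, p):
    """Delta method: G * Sigma_theta * G^T with Sigma_theta block diagonal."""
    J = len(theta) // 2
    sigma_theta = np.zeros((2 * J, 2 * J))
    sigma_theta[:J, :J] = sigma_se
    sigma_theta[J:, J:] = sigma_sp
    G = pv_jacobian(theta, p)
    return G @ sigma_theta @ G.T

# Hypothetical estimates for J = 3 and n1 = n2 = 100.
p = 0.10
theta = np.array([0.83, 0.79, 0.76, 0.88, 0.85, 0.82])
sigma_se = np.full((3, 3), 0.0005)
np.fill_diagonal(sigma_se, theta[:3] * (1.0 - theta[:3]) / 100)
sigma_sp = np.full((3, 3), 0.0003)
np.fill_diagonal(sigma_sp, theta[3:] * (1.0 - theta[3:]) / 100)

sigma_omega = pv_covariance(theta, sigma_se, sigma_sp, p)
print(np.round(np.sqrt(np.diag(sigma_omega)), 4))  # standard errors of the PVs
```

Each PV depends only on the sensitivity and the specificity of its own BDT, which is why every block of the Jacobian is diagonal.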
The global hypothesis test to simultaneously compare the PVs of the J BDTs is

$$H_0: PPV_1 = PPV_2 = \ldots = PPV_J \text{ and } NPV_1 = NPV_2 = \ldots = NPV_J \quad \text{vs.} \quad H_1: \text{at least one equality is not true},$$

which is equivalent to the hypothesis test

$$H_0: A\omega = 0 \quad \text{vs.} \quad H_1: A\omega \neq 0,$$

where A is a matrix with dimension 2(J − 1) × 2J, i.e.,

$$A = \begin{pmatrix} A_1 & A_0 \\ A_0 & A_1 \end{pmatrix},$$

$A_0$ is a matrix with dimension (J − 1) × J whose elements are all equal to 0, and $A_1$ is a matrix with dimension (J − 1) × J where each element (i, i) is equal to 1, each element (i, i + 1) is equal to −1 for i = 1, . . . , J − 1, and the rest of the elements are equal to 0. Applying the multivariate central limit theorem, it is verified that

$$\sqrt{n_1 + n_2}\,(\hat{\omega} - \omega) \xrightarrow[n_1 + n_2 \to \infty]{} N_{2J}(0, \Sigma_\omega).$$

Then, the statistic

$$Q^2 = \hat{\omega}^T A^T \left( A \hat{\Sigma}_{\hat{\omega}} A^T \right)^{-1} A \hat{\omega}$$

is distributed according to Hotelling's T-squared distribution with dimension 2(J − 1) and $n_1 + n_2$ degrees of freedom, where 2(J − 1) is the dimension of the vector $A\hat{\omega}$. When $n_1 + n_2$ is large, the statistic $Q^2$ is distributed according to a central chi-squared distribution with 2(J − 1) degrees of freedom when the null hypothesis is true, i.e.,

$$Q^2 \xrightarrow[n_1 + n_2 \to \infty]{} \chi^2_{2(J-1)}.$$

Finally, the method to compare the PVs of the J BDTs consists of the following steps: (1) solve the global hypothesis test to an α error, calculating the statistic $Q^2$ and using the chi-squared distribution; (2) if the global test is not significant to an α error, then we do not reject the homogeneity of the J PVs, but if the global test is significant, then the causes of the significance are investigated by comparing the PPVs (NPVs) in pairs (Equation (6)) and applying a multiple comparison method (e.g., Bonferroni or Holm).
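The construction of A and the calculation of the global statistic can be sketched as follows in Python with NumPy. This is a hedged illustration: the function names are hypothetical, the vector of estimated PVs and its variance-covariance matrix are made-up inputs, and the p-value uses the closed form of the chi-squared upper tail, which is available here because the degrees of freedom 2(J − 1) are always even.

```python
import math
import numpy as np

def contrast_matrix(J):
    """Block matrix A = [[A1, A0], [A0, A1]] with dimension 2(J-1) x 2J."""
    A1 = np.zeros((J - 1, J))
    for i in range(J - 1):
        A1[i, i] = 1.0        # element (i, i)
        A1[i, i + 1] = -1.0   # element (i, i + 1)
    A0 = np.zeros((J - 1, J))
    return np.block([[A1, A0], [A0, A1]])

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-squared variable with an even df."""
    m = df // 2
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))

def global_test(omega_hat, sigma_omega_hat):
    """Q^2 statistic, degrees of freedom and asymptotic p-value."""
    J = len(omega_hat) // 2
    A = contrast_matrix(J)
    d = A @ omega_hat
    q2 = float(d @ np.linalg.solve(A @ sigma_omega_hat @ A.T, d))
    df = 2 * (J - 1)
    return q2, df, chi2_sf_even(q2, df)

# Hypothetical estimates for J = 3: (PPV_1, PPV_2, PPV_3, NPV_1, NPV_2, NPV_3).
omega_hat = np.array([0.45, 0.30, 0.28, 0.980, 0.975, 0.970])
sigma_omega_hat = np.diag([4e-3, 3e-3, 3e-3, 2e-5, 2e-5, 3e-5])

q2, df, p_value = global_test(omega_hat, sigma_omega_hat)
print(f"Q^2 = {q2:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $A\hat{\Sigma}_{\hat{\omega}}A^T$ explicitly; the call fails if that matrix is singular.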

Discussion
The comparison of the positive and negative predictive values of two binary diagnostic tests is an important topic in the study of statistical methods in diagnostic medicine. Under a paired design, this topic has been the subject of several studies. In this article we studied the simultaneous comparison of the predictive values of two diagnostic tests subject to a case-control design, analyzing and comparing several methods. These methods consisted of a global test based on the chi-square distribution, a method based on individual comparisons, each one carried out to an α error, and two other methods based on individual comparisons along with a multiple comparison method. The multiple comparison methods used were the Bonferroni method and the Holm method, which are based on the p-values of the individual hypothesis tests and are very easy to apply. The methods studied to compare the predictive values require knowing the prevalence of the disease. The prevalence can be known from other studies, such as population studies of health services, cohort studies, etc. If the researcher is very uncertain about the value of the prevalence, the problem can be addressed by using several values for the prevalence and then analyzing and comparing the results obtained.
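The suggestion of using several values for the prevalence can be illustrated with a small Python sketch. It recomputes the point estimates of the PVs of the two BDTs of the coronary heart disease example over a grid of prevalences; the grid and the function name are hypothetical choices, and only the point estimates are shown (the hypothesis tests would have to be solved again for each prevalence).

```python
def predictive_values(se, sp, p):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    q = 1.0 - p
    ppv = p * se / (p * se + q * (1.0 - sp))
    npv = q * sp / (p * (1.0 - se) + q * sp)
    return ppv, npv

# Rounded estimates reported in the example.
se1, sp1 = 0.829, 0.950  # electrocardiogram
se2, sp2 = 0.790, 0.858  # echocardiography

prevalences = [0.01, 0.05, 0.10, 0.20]  # hypothetical grid of plausible values
rows = []
for p in prevalences:
    ppv1, npv1 = predictive_values(se1, sp1, p)
    ppv2, npv2 = predictive_values(se2, sp2, p)
    rows.append((p, ppv1 - ppv2, npv1 - npv2))

for p, d_ppv, d_npv in rows:
    print(f"p = {p:.2f}: PPV1 - PPV2 = {d_ppv:.3f}, NPV1 - NPV2 = {d_npv:.4f}")
```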
Simulation experiments were carried out to study the type I errors and the powers of the four methods proposed. These experiments were based on the generation of random samples from type I bivariate binomial distributions, which are the distributions inherent to the case-control design, since proportions of marginal totals are estimated from these samples. The results have shown that the global hypothesis test based on the chi-square distribution behaves well in terms of type I error, and does not exceed the nominal error. Its power depends strongly on the disease prevalence: for the power to be high, very large samples are needed when the prevalence is small, whereas relatively small samples suffice when the prevalence is high.
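A sample of this kind can be generated by drawing each individual's pair of results from the four cells of a 2 × 2 table; the marginal totals of positive results then follow a type I bivariate binomial distribution. The Python sketch below is an assumption-laden illustration, not the simulation code of the article: it parametrizes the cell probabilities through the two marginal probabilities of a positive result and the correlation coefficient between the two BDTs, and the function names and the seed are hypothetical.

```python
import math
import random

def cell_probabilities(a1, a2, rho):
    """Probabilities of (T1, T2) given P(T1 = 1) = a1, P(T2 = 1) = a2 and
    correlation rho between the two results (assumed parametrization)."""
    eps = rho * math.sqrt(a1 * (1 - a1) * a2 * (1 - a2))  # covariance
    return {
        (1, 1): a1 * a2 + eps,
        (1, 0): a1 * (1 - a2) - eps,
        (0, 1): (1 - a1) * a2 - eps,
        (0, 0): (1 - a1) * (1 - a2) + eps,
    }

def draw_sample(probs, n, rng):
    """Draw n individuals and return the frequency of each result pattern."""
    cells = list(probs)
    draws = rng.choices(cells, weights=[probs[c] for c in cells], k=n)
    counts = {c: 0 for c in cells}
    for c in draws:
        counts[c] += 1
    return counts

rng = random.Random(2021)  # hypothetical seed, for reproducibility

# Case sample: positive-result probabilities are the sensitivities.
case_probs = cell_probabilities(0.829, 0.790, 0.511)
case_counts = draw_sample(case_probs, 105, rng)

# Control sample: positive-result probabilities are 1 - specificity.
control_probs = cell_probabilities(1 - 0.950, 1 - 0.858, 0.345)
control_counts = draw_sample(control_probs, 120, rng)

print(case_counts)
print(control_counts)
```

The values of the sensitivities, specificities and correlations used here are the estimates of the coronary heart disease example, taken only as plausible inputs.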
Based on the results of the simulation experiments, a method has been proposed to compare the predictive values of two diagnostic tests subject to a case-control design. This method, which is similar to that proposed by Roldán-Nofuentes et al. [8], consists of the following steps: (1) Simultaneously comparing the predictive values applying the global hypothesis test based on the chi-square distribution to an α error; (2) if the global hypothesis test is not significant, then the equality hypothesis of the PVs is not rejected. If the global hypothesis test is significant to an α error, then the causes of the significance are studied solving the individual hypothesis tests and applying the Bonferroni method or the Holm method to an α error. This procedure that we propose is similar to the analysis of variance: firstly, the global test is solved and, if this is significant, then the causes of the significance are studied starting with paired comparisons along with some multiple comparison method.
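The two-step procedure can be sketched in Python as follows. The function names are hypothetical; the p-values in the usage lines are those reported in the coronary heart disease example.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def holm(pvalues, alpha=0.05):
    """Holm step-down method: compare the ordered p-values with
    alpha / m, alpha / (m - 1), ..., and stop at the first non-rejection."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

def compare_predictive_values(p_global, p_individual, alpha=0.05, method=holm):
    """Step 1: global test. Step 2: individual tests only if step 1 is significant."""
    if p_global > alpha:
        return [False] * len(p_individual)
    return method(p_individual, alpha)

# p-values of the example: global test, then H0: PPV1 = PPV2 and H0: NPV1 = NPV2.
decisions_holm = compare_predictive_values(0.023, [0.011, 0.142], method=holm)
decisions_bonf = compare_predictive_values(0.023, [0.011, 0.142], method=bonferroni)
print(decisions_holm, decisions_bonf)
```

With two individual hypotheses, both methods compare the smallest p-value with α/2, so they can only differ in the decision on the larger p-value.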
Simulation experiments were carried out to study the effect of a misspecification of the prevalence on the asymptotic behavior of the global hypothesis test based on the chi-square distribution and of the methods based on multiple comparisons. In general terms, we can conclude that a small or moderate misspecification of the prevalence does not have an important effect on the behavior of these hypothesis tests, especially when the sample sizes are moderate or large.
The global hypothesis test was extended to the situation in which we simultaneously compare the PVs of more than two BDTs, and for this we propose a method which is similar to that proposed for two BDTs. To be able to calculate the global test statistic $Q^2$, the matrix $A\hat{\Sigma}_{\hat{\omega}}A^T$ must be non-singular. For two BDTs, this matrix is non-singular when $n_{i10} + n_{i01} > 0$, with $i = 1, 2$.
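The condition on the discordant frequencies can be checked directly. As a related illustration of why these frequencies matter, the Python sketch below also computes the estimated variance of the difference between the two sensitivity (or specificity) estimators of one sample, which is exactly zero when the sample has no discordant results. The function names and the frequencies are hypothetical.

```python
def has_discordant_results(n10, n01):
    """Condition n_i10 + n_i01 > 0 for one sample."""
    return n10 + n01 > 0

def variance_of_difference(n11, n10, n01, n00):
    """Estimated variance of the difference between the two accuracy
    estimators of one sample, written with the discordant proportions:
    [(p10 + p01) - (p10 - p01)^2] / n."""
    n = n11 + n10 + n01 + n00
    p10, p01 = n10 / n, n01 / n
    return (p10 + p01 - (p10 - p01) ** 2) / n

# Hypothetical sample with discordant results.
print(has_discordant_results(12, 8), variance_of_difference(70, 12, 8, 15))

# Hypothetical sample without discordant results: both estimators coincide
# and the variance of their difference is zero.
print(has_discordant_results(0, 0), variance_of_difference(77, 0, 0, 28))
```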
If $n_{i10} = n_{i01} = 0$, then the method proposed to compare the PVs cannot be applied. A solution to this problem consists of adding the value 0.5 to all the observed frequencies of that sample (the sample size increases by two units), a very frequent solution in the analysis of 2 × 2 tables. Simulation experiments have been carried out to study the type I errors and the powers of the hypothesis tests proposed in Section 2, using this solution when a sample verifies that $n_{i10} = n_{i01} = 0$. These experiments were designed in a similar way to those performed in Section 3. Table 10 shows some results (type I errors and powers) for some of the scenarios considered in Section 3, as well as the average proportions (over the three correlation scenarios) of case (control) samples in which the value 0.5 has been added. As expected, the proportion of samples in which the value 0.5 has been added is greater when the sample size is small. In general terms, the conclusions are the same as those obtained in the simulation experiments presented in Section 3, although the powers of all the hypothesis tests are slightly lower when the sample sizes are small. Therefore, adding 0.5 to all the observed frequencies of a sample in which $n_{i10} = n_{i01} = 0$ is an adequate solution to be able to apply the method of comparison of the PVs.

Table 10. Type I errors and powers when 0.5 is added to the samples in which $n_{i10} = n_{i01} = 0$.