Simultaneous Comparison of Sensitivities and Specificities of Two Diagnostic Tests Adjusting for Discrete Covariates

Abstract: Adjusting for covariates is important in the study of the performance of diagnostic tests. In this manuscript, the simultaneous comparison of the sensitivities and specificities of two binary diagnostic tests is studied when discrete covariates are observed in all of the individuals in the sample. Four methods are presented to simultaneously compare the two sensitivities and the two specificities: a global hypothesis test and three other methods based on individual comparisons. The maximum likelihood method was applied to adjust the overall estimators of sensitivities and specificities. Simulation experiments were carried out to study the asymptotic behaviors of the four proposed methods when the covariate is binary, giving general rules of application. The results were applied to a real example.


Introduction
A diagnostic test is a medical test that is applied to a patient to determine the presence or absence of a certain disease. When the result of a diagnostic test may be either positive or negative, the diagnostic test is called a binary diagnostic test (BDT). The exercise test for the diagnosis of coronary artery disease is an example of a BDT. The fundamental parameters to measure the effectiveness of a BDT are its sensitivity and the specificity. The sensitivity (Se) is the probability that the BDT result is positive when the individual has the disease, and the specificity (Sp) is the probability that the BDT result is negative when the individual does not have the disease. Both parameters depend only on the intrinsic properties (physical, biological, chemical, etc.) of the BDT. The effectiveness of a BDT is assessed in relation to a gold standard. A gold standard (GS) is a medical test used to objectively diagnose the presence (or absence) of a certain disease. Therefore, a GS is an error-free test. An angiography for diagnosis of coronary artery disease is an example of a GS.
The comparison of the sensitivities (specificities) of two BDTs is an important topic in the study of statistical methods for diagnosis in medicine. The most common sample design for comparing these parameters is the paired design, which consists of applying the two BDTs to a random sample of n patients whose disease status is known by applying a GS. When the sensitivities and specificities of two BDTs are compared under a paired design, the problem is traditionally solved by conditioning on the disease status and applying a test for the comparison of two paired binomial proportions (e.g., the McNemar test). Thus, the two sensitivities are compared by conditioning on the diseased individuals and solving the test $H_0: Se_1 = Se_2$ vs. $H_1: Se_1 \neq Se_2$ with the McNemar test at an α error [1]. Similarly, the two specificities are compared by conditioning on the non-diseased individuals and solving the test $H_0: Sp_1 = Sp_2$ vs. $H_1: Sp_1 \neq Sp_2$ with the same method. Sensitivities and specificities are therefore compared independently, by solving the hypothesis tests $H_0: Se_1 = Se_2$ and $H_0: Sp_1 = Sp_2$, each at the same α error. Roldán-Nofuentes and Sidaty-Regad [2] studied the simultaneous comparison of sensitivities and specificities, and showed that comparing the two sensitivities and the two specificities independently can give rise to global type I errors that greatly exceed the nominal error (and can therefore lead to wrong conclusions).
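The inflation of the global type I error can be illustrated with a short calculation. The sketch below assumes, as a simplification that is not part of the manuscript, that the two individual tests are independent; the function name `global_type1_error` is illustrative only.

```python
def global_type1_error(alpha: float, k: int = 2) -> float:
    """Probability of at least one false rejection among k independent tests,
    each carried out at level alpha (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** k


# Two individual tests, each at alpha = 5%
unadjusted = global_type1_error(0.05)
# Two individual tests, each at alpha / 2 (Bonferroni)
bonferroni = global_type1_error(0.05 / 2)

print(round(unadjusted, 4))  # 0.0975
print(bonferroni < 0.05)     # True
```

Under this assumption, the unadjusted procedure almost doubles the nominal 5% error, whereas splitting α between the two tests keeps the global error just below 5%.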
In clinical practice, when evaluating the effectiveness of a BDT, covariates are frequently observed in all patients in the sample. When the covariate is related to the disease and to the diagnostic test, it is necessary to adjust for covariates [3]. For example, in the diagnosis of coronary disease, smoking is a risk factor for the disease. Because smoking speeds up the heart rate, constricts the main arteries, and can cause disturbances in the rhythm of the heartbeat, if an exercise test is used, adjustment for smoking is needed to properly describe the diagnostic effectiveness of the exercise test. Another topical example is the diagnosis of COVID-19. Lahner et al. [4] studied the diagnosis of this disease in health workers using IgG serology as a diagnostic test (among other tests). Lahner et al. showed that the diagnostic performance of IgG serology is associated with the number of days elapsed (at least 14 or 20 days) after the nasopharyngeal swab. Therefore, adjusting for elapsed days is necessary to evaluate the diagnostic effectiveness of IgG serology. This problem also arises when comparing the effectiveness of two BDTs [3]. Therefore, when two BDTs are compared, it is necessary to eliminate the effect that the covariates have on the estimation of sensitivities and specificities, and on the comparison of these parameters.
This manuscript is an extension of the study by Roldán-Nofuentes and Sidaty-Regad [2], to the situation in which a discrete covariate is observed in all patients in the sample. Therefore, a global hypothesis test was studied to simultaneously compare the sensitivities and specificities of two BDTs when discrete covariates are observed in all patients in the sample. Other alternatives to the global hypothesis test were also studied. Adjusting for covariates in this situation eliminates the effect of covariates in the simultaneous comparison of the two sensitivities and specificities. This problem is approached by applying the maximum likelihood method to the estimation of the parameters and the delta method to the estimation of the variances-covariances. This manuscript is structured as follows. In Section 2, the model to simultaneously compare the sensitivities and specificities of two BDTs in the presence of a discrete covariate is described, in addition to other alternative methods. In Section 3, simulation experiments are carried out to study the sizes and the powers of the methods proposed in Section 2. In Section 4, a function written in R [5] is presented that allows the problem studied in this manuscript to be solved. In Section 5, the results are applied to the diagnosis of coronary heart disease, and in Section 6 the results are discussed.

Global Hypothesis Test
The objective is to study the simultaneous comparison of the overall sensitivities and overall specificities of the two BDTs, i.e., to solve the global hypothesis test:

$$H_0: (Se_1 = Se_2 \text{ and } Sp_1 = Sp_2) \quad \text{vs.} \quad H_1: (Se_1 \neq Se_2 \text{ and/or } Sp_1 \neq Sp_2) \qquad (1)$$

when the two BDTs are applied to all individuals in a sample of size n and a discrete covariate is observed in all of them. Therefore, let us consider two BDTs, Test 1 and Test 2, that are applied to all n individuals in a random sample. The disease status (disease present or disease absent) of all of the individuals in the sample is known by applying a GS. Let $T_h$ be the binary random variable that models the result of the hth BDT: $T_h = 1$ when the result of the BDT is positive and $T_h = 0$ when it is negative. Let D be the binary random variable that models the result of the GS: D = 1 when the individual is diseased and D = 0 when the individual is non-diseased. Moreover, let us consider that for all of the n individuals in the sample we observe a discrete covariate X with patterns $(X_1, X_2, \ldots, X_M)$, where $X_m$ is each of the different values or patterns that the covariate can take, with m = 1, ..., M.
Let us suppose that the number of individuals that verify $X = X_m$ is $n_m$, and therefore $n = \sum_{m=1}^{M} n_m$. Table 1 shows the observed frequencies for $X = X_m$, where $n_{ijm} = s_{ijm} + r_{ijm}$.
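The sample of n individuals is the product of a multinomial distribution whose probabilities are:

$$\tau_{mij} = P(D = 1, T_1 = i, T_2 = j, X = X_m)$$

and:

$$\upsilon_{mij} = P(D = 0, T_1 = i, T_2 = j, X = X_m)$$

with:

$$\sum_{m=1}^{M} \sum_{i,j=0}^{1} (\tau_{mij} + \upsilon_{mij}) = 1.$$

From the multinomial distribution of size n and probabilities $\tau_{mij}$ and $\upsilon_{mij}$, 8M − 1 parameters can be estimated, because in total there are 8M probabilities that are subject to this constraint. If the covariate is binary, then 15 parameters can be estimated.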
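Let $\psi_m = P(X = X_m)$ be the probability that an individual has the covariate pattern $X_m$, and let $\psi = (\psi_1, \ldots, \psi_M)^T$, with $\sum_{m=1}^{M} \psi_m = 1$. Let $\varphi_{ijm}$ and $\phi_{ijm}$ be the probabilities defined as:

$$\varphi_{ijm} = P(D = 1, T_1 = i, T_2 = j \mid X = X_m), \qquad \phi_{ijm} = P(D = 0, T_1 = i, T_2 = j \mid X = X_m);$$

then the probabilities $\tau_{mij}$ and $\upsilon_{mij}$ can be written as:

$$\tau_{mij} = \psi_m \varphi_{ijm}, \qquad \upsilon_{mij} = \psi_m \phi_{ijm}.$$

The sample of n individuals can be seen as a sample from a mixture of M independent multinomial 2 × 4 tables. By conditioning on the mth table, i.e., conditioning on $X = X_m$, and applying the conditional dependence model of Berry et al. [6], the probabilities $\varphi_{ijm}$ and $\phi_{ijm}$ are written in terms of the sensitivities, the specificities, the prevalence, and the covariances in $X = X_m$, where $p_m = \sum_{i,j=0}^{1} \varphi_{ijm}$ is the disease prevalence for the individuals with $X = X_m$, $q_m = 1 - p_m$, $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = -1$ if $i \neq j$, and the parameter $\alpha_{1m}$ ($\alpha_{0m}$) is the covariance [6] between both BDTs when D = 1 (D = 0) and $X = X_m$. The covariances verify [6] that $1 \le \alpha_{1m} \le 1/\max\{Se_{1m}, Se_{2m}\}$ and $1 \le \alpha_{0m} \le 1/\max\{(1 - Sp_{1m}), (1 - Sp_{2m})\}$. If $\alpha_{1m} = \alpha_{0m} = 1$, then both BDTs are conditionally independent on the disease when $X = X_m$, an assumption that is not realistic, so in practice $\alpha_{1m} > 1$ and/or $\alpha_{0m} > 1$.

For the mth table (i.e., $X = X_m$), let $\omega_m = (\varphi_{11m}, \varphi_{10m}, \varphi_{01m}, \varphi_{00m}, \phi_{11m}, \phi_{10m}, \phi_{01m}, \phi_{00m})^T$ be the vector whose components are the probabilities $\varphi_{ijm}$ and $\phi_{ijm}$. Therefore, conditioning on $X = X_m$, $\omega_m$ is the probability vector of a multinomial distribution. Let $\omega = (\omega_1, \ldots, \omega_M)^T$ be the vector whose components are the vectors $\omega_m$. In $X = X_m$, the sensitivities of the BDTs are $Se_{1m} = P(T_1 = 1 \mid D = 1, X = X_m)$ and $Se_{2m} = P(T_2 = 1 \mid D = 1, X = X_m)$, and the specificities are $Sp_{1m} = P(T_1 = 0 \mid D = 0, X = X_m)$ and $Sp_{2m} = P(T_2 = 0 \mid D = 0, X = X_m)$. The overall sensitivity and the overall specificity of each BDT are:

$$Se_h = \frac{\sum_{m=1}^{M} \psi_m p_m Se_{hm}}{\sum_{m=1}^{M} \psi_m p_m}, \qquad Sp_h = \frac{\sum_{m=1}^{M} \psi_m q_m Sp_{hm}}{\sum_{m=1}^{M} \psi_m q_m},$$

with h = 1, 2, and where:

$$Se_{1m} = \frac{\varphi_{11m} + \varphi_{10m}}{p_m}, \qquad Sp_{1m} = \frac{\phi_{01m} + \phi_{00m}}{q_m}$$

are the sensitivity and specificity of Test 1 in $X = X_m$, and:

$$Se_{2m} = \frac{\varphi_{11m} + \varphi_{01m}}{p_m}, \qquad Sp_{2m} = \frac{\phi_{10m} + \phi_{00m}}{q_m}$$

are the sensitivity and specificity of Test 2 in $X = X_m$. The overall sensitivity and the overall specificity of each BDT are written in terms of $\psi_m$, $\varphi_{ijm}$ and $\phi_{ijm}$ as:

$$Se_1 = \frac{\sum_{m=1}^{M} \psi_m (\varphi_{11m} + \varphi_{10m})}{\sum_{m=1}^{M} \psi_m \sum_{i,j=0}^{1} \varphi_{ijm}}, \qquad Sp_1 = \frac{\sum_{m=1}^{M} \psi_m (\phi_{01m} + \phi_{00m})}{\sum_{m=1}^{M} \psi_m \sum_{i,j=0}^{1} \phi_{ijm}}$$

for Test 1, and:

$$Se_2 = \frac{\sum_{m=1}^{M} \psi_m (\varphi_{11m} + \varphi_{01m})}{\sum_{m=1}^{M} \psi_m \sum_{i,j=0}^{1} \varphi_{ijm}}, \qquad Sp_2 = \frac{\sum_{m=1}^{M} \psi_m (\phi_{10m} + \phi_{00m})}{\sum_{m=1}^{M} \psi_m \sum_{i,j=0}^{1} \phi_{ijm}}$$

for Test 2.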
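By the law of total probability, the overall sensitivity of a BDT is an average of its pattern-specific sensitivities weighted by ψ_m p_m, and the overall specificity is an average of its pattern-specific specificities weighted by ψ_m (1 − p_m). The following minimal sketch computes both for one BDT; the function name and the numerical values are hypothetical and serve only as an illustration.

```python
def overall_accuracy(psi, p, se, sp):
    """Overall sensitivity and specificity of one test.

    psi: P(X = X_m) for each pattern, p: prevalence in each pattern,
    se, sp: sensitivity and specificity of the test in each pattern.
    """
    weight_dis = [w * pm for w, pm in zip(psi, p)]        # P(X = X_m, D = 1)
    weight_non = [w * (1 - pm) for w, pm in zip(psi, p)]  # P(X = X_m, D = 0)
    se_overall = sum(w * s for w, s in zip(weight_dis, se)) / sum(weight_dis)
    sp_overall = sum(w * s for w, s in zip(weight_non, sp)) / sum(weight_non)
    return se_overall, sp_overall


# Hypothetical binary covariate: two patterns with different prevalences
se_all, sp_all = overall_accuracy(
    psi=[0.5, 0.5], p=[0.10, 0.25], se=[0.90, 0.80], sp=[0.90, 0.80]
)
print(round(se_all, 4), round(sp_all, 4))
```

Because the pattern with the higher prevalence contributes more diseased individuals, the overall sensitivity is pulled toward that pattern's value, while the overall specificity is pulled toward the pattern with the lower prevalence.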
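The parameters of the model are estimated by applying the maximum likelihood method. If the covariate has M patterns, then 8M − 1 parameters must be estimated: 2M sensitivities, 2M specificities, 2M covariances, M prevalences, and M − 1 probabilities $\psi_m$ (since $\sum_{m=1}^{M} \psi_m = 1$). If the covariate is binary (M = 2), then 15 parameters must be estimated: four sensitivities ($Se_{11}$, $Se_{21}$, $Se_{12}$, and $Se_{22}$), four specificities ($Sp_{11}$, $Sp_{21}$, $Sp_{12}$, and $Sp_{22}$), four covariances ($\alpha_{11}$, $\alpha_{01}$, $\alpha_{12}$, and $\alpha_{02}$), two prevalences ($p_1$ and $p_2$), and the probability $\psi_1$ (since $\psi_2 = 1 - \psi_1$). Therefore, all the parameters of the model can be estimated from the sample of n individuals, since the number of parameters that must be estimated is equal to the number of parameters that can be estimated from the initial multinomial distribution. The log-likelihood function based on n individuals is:

$$l(\psi, \omega) = \sum_{m=1}^{M} \sum_{i,j=0}^{1} \left[ s_{ijm} \log \tau_{mij} + r_{ijm} \log \upsilon_{mij} \right]. \qquad (4)$$

This function can be written as:

$$l(\psi, \omega) = l_1(\psi) + l_2(\omega),$$

where:

$$l_1(\psi) = \sum_{m=1}^{M} n_m \log \psi_m \qquad (5)$$

and:

$$l_2(\omega) = \sum_{m=1}^{M} \sum_{i,j=0}^{1} \left[ s_{ijm} \log \varphi_{ijm} + r_{ijm} \log \phi_{ijm} \right]. \qquad (6)$$

Maximum likelihood estimators of ψ and ω are easily obtained from Functions (5) and (6), i.e., $\hat{\psi}_m = n_m / n$, $\hat{\varphi}_{ijm} = s_{ijm} / n_m$, and $\hat{\phi}_{ijm} = r_{ijm} / n_m$. The estimators of the sensitivities and specificities in $X = X_m$, the estimator of the overall prevalence, and the estimators of the overall sensitivities and overall specificities are obtained by substituting the parameters with their estimators in the respective equations. The Fisher information matrix of Function (4) is:

$$I(\psi, \omega) = \begin{pmatrix} I_1 & 0 \\ 0 & I_2 \end{pmatrix},$$

where $I_1 = I(\psi)$ and $I_2 = I(\omega)$ are the Fisher information matrices of Functions (5) and (6), respectively, verifying that:

$$\frac{\partial^2 l(\psi, \omega)}{\partial \psi \, \partial \omega} = 0,$$

and, therefore, the covariances between $\hat{\psi}$ and $\hat{\omega}$ are zero. Because vector ψ is the probability vector of a multinomial distribution, the variance-covariance matrix of $\hat{\psi}$ is:

$$\Sigma_{\hat{\psi}} = \frac{\mathrm{diag}(\psi) - \psi \psi^T}{n}.$$

The variance-covariance matrix of $\hat{\omega}_m$ is:

$$\Sigma_{\hat{\omega}_m} = \frac{\mathrm{diag}(\omega_m) - \omega_m \omega_m^T}{n \psi_m},$$

and the variance-covariance matrix of $\hat{\omega}$ is:

$$\Sigma_{\hat{\omega}} = \mathrm{diag}\left( \Sigma_{\hat{\omega}_1}, \ldots, \Sigma_{\hat{\omega}_M} \right).$$

The proof can be seen in Appendix A.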
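Since the model is a saturated multinomial model, the maximum likelihood estimators reduce to relative frequencies. The sketch below computes them for one covariate pattern; the function name `ml_estimates` and the ordering of the counts are conventions chosen here, and the example counts are those of the men in the command reported in the Example section. The covariances are omitted.

```python
def ml_estimates(s, r, n_total):
    """Relative-frequency estimates for one covariate pattern.

    s, r: frequencies of diseased and non-diseased individuals, ordered as
    (T1, T2) = (1, 1), (1, 0), (0, 1), (0, 0); n_total: total sample size n.
    """
    s_m, r_m = sum(s), sum(r)
    n_m = s_m + r_m
    if s_m == 0 or r_m == 0:
        raise ValueError("sensitivity or specificity cannot be estimated")
    return {
        "psi": n_m / n_total,        # estimate of P(X = X_m)
        "prevalence": s_m / n_m,     # estimate of p_m
        "Se1": (s[0] + s[1]) / s_m,  # Test 1 positive among diseased
        "Se2": (s[0] + s[2]) / s_m,  # Test 2 positive among diseased
        "Sp1": (r[2] + r[3]) / r_m,  # Test 1 negative among non-diseased
        "Sp2": (r[1] + r[3]) / r_m,  # Test 2 negative among non-diseased
    }


men = ml_estimates(s=(786, 29, 183, 25), r=(69, 46, 176, 151), n_total=2045)
for name, value in men.items():
    print(f"{name}: {value:.4f}")
```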
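Let $\theta = (Se_1, Sp_1, Se_2, Sp_2)^T$ be the vector whose components are the overall sensitivities and the overall specificities; then, by applying the delta method [7], the variance-covariance matrix of $\hat{\theta}$ is:

$$\Sigma_{\hat{\theta}} = \left( \frac{\partial \theta}{\partial \psi} \right) \Sigma_{\hat{\psi}} \left( \frac{\partial \theta}{\partial \psi} \right)^T + \left( \frac{\partial \theta}{\partial \omega} \right) \Sigma_{\hat{\omega}} \left( \frac{\partial \theta}{\partial \omega} \right)^T.$$

The estimated variance-covariance matrix $\hat{\Sigma}_{\hat{\theta}}$ is obtained by substituting the parameters in this expression with their estimators.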
The global hypothesis test (1) is equivalent to the hypothesis test:

$$H_0: A\theta = 0 \quad \text{vs.} \quad H_1: A\theta \neq 0,$$

where A is a full-rank matrix of size 2 × 4, i.e.,

$$A = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}.$$

By applying the multivariate central limit theorem, it is verified that $\sqrt{n}\,(\hat{\theta} - \theta) \to N(0, \Sigma_{\theta})$ when n is large. Then, the statistic:

$$Q^2 = \hat{\theta}^T A^T \left( A \hat{\Sigma}_{\hat{\theta}} A^T \right)^{-1} A \hat{\theta} \qquad (7)$$

is distributed according to a Hotelling T-squared distribution. This distribution has 2 and n degrees of freedom, where 2 is the dimension of the vector Aθ. When n is large, $Q^2$ is distributed according to a central chi-squared distribution with 2 degrees of freedom when the null hypothesis is true, i.e., $Q^2 \to \chi^2_2$. To calculate this test statistic, it is necessary to verify that $s_{10m} + s_{01m} + r_{10m} + r_{01m} > 0$.

The global hypothesis test (1) can also be solved from the individual hypothesis tests, i.e., $H_0: Se_1 = Se_2$ and $H_0: Sp_1 = Sp_2$, each of which is solved independently at an α error. In this situation, the corresponding test statistics are:

$$z = \frac{\hat{Se}_1 - \hat{Se}_2}{\sqrt{\hat{Var}(\hat{Se}_1) + \hat{Var}(\hat{Se}_2) - 2\,\hat{Cov}(\hat{Se}_1, \hat{Se}_2)}} \qquad (8)$$

and:

$$z = \frac{\hat{Sp}_1 - \hat{Sp}_2}{\sqrt{\hat{Var}(\hat{Sp}_1) + \hat{Var}(\hat{Sp}_2) - 2\,\hat{Cov}(\hat{Sp}_1, \hat{Sp}_2)}}. \qquad (9)$$

Both test statistics have a standard normal distribution when the sample size n is large. Another method used to solve the global test consists of solving each of the individual tests along with a method of multiple comparisons, such as the Bonferroni method [8] or the Holm method [9]. The Bonferroni and Holm methods are very easy to apply and are based on the p-values of the individual hypothesis tests. In the situation studied here, the Bonferroni method consists of solving each individual hypothesis test at an α/2 error. The Holm method is less conservative than the Bonferroni method. Let $p_1$ and $p_2$ be the p-values obtained in the individual hypothesis tests and let us suppose that $p_1 \le p_2$; then, the Holm method [9] consists of the following two steps:
(1) If $p_1 > \alpha/2$, then neither of the two null hypotheses $H_0: Se_1 = Se_2$ and $H_0: Sp_1 = Sp_2$ is rejected. If $p_1 \le \alpha/2$, then the null hypothesis corresponding to that hypothesis test is rejected and we go on to the next step.
(2) If $p_2 > \alpha$, then the corresponding null hypothesis is not rejected. If $p_2 \le \alpha$, then the null hypothesis is rejected and the process ends.
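The Bonferroni and Holm decisions described above can be sketched for any number of p-values; with two p-values the Holm function reduces to the two steps listed. The function names are illustrative, and the p-values in the usage lines are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each null hypothesis whose p-value is at most alpha / k."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down method: compare the ordered p-values with
    alpha / k, alpha / (k - 1), ..., alpha and stop at the first failure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda idx: p_values[idx])
    reject = [False] * k
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (k - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are not rejected
    return reject


# Hypothetical p-values for H0: Se1 = Se2 and H0: Sp1 = Sp2
p_values = [0.03, 0.02]
print(bonferroni(p_values))  # [False, True]
print(holm(p_values))        # [True, True]
```

In this hypothetical case the Holm method rejects both null hypotheses while the Bonferroni method rejects only one, which illustrates why the Holm method is the less conservative of the two.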
In the proposed model, it is assumed that a discrete covariate is observed in all of the individuals in the sample. If several discrete covariates are observed, the problem is solved in a similar manner. In this situation, a single discrete covariate is considered, whose number of patterns is the product of the numbers of patterns of the observed covariates [10]. For example, if two covariates with two and three patterns, respectively, are observed, such as sex and age group (young, adult, and older), then a single covariate with six patterns is considered (young man, adult man, older man, young woman, adult woman, and older woman).
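The combination of several discrete covariates into a single covariate is a Cartesian product of their patterns. A minimal sketch, with an illustrative function name, for the sex and age group example:

```python
from itertools import product


def combine_patterns(*covariates):
    """Return every combination of patterns of the given discrete covariates."""
    return list(product(*covariates))


sex = ["man", "woman"]
age_group = ["young", "adult", "older"]

patterns = combine_patterns(sex, age_group)
print(len(patterns))  # 6
for pattern in patterns:
    print(pattern)
```

Each tuple plays the role of one pattern of the combined covariate, so M = 6 in this example.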

Simulation Experiments
Monte Carlo simulation experiments were carried out to study the sizes and the powers of the four methods described in Section 2: the global hypothesis test with α = 5%; the individual hypothesis tests, each with α = 5%; the individual hypothesis tests along with the Bonferroni method and α = 5%; and the individual hypothesis tests along with the Holm method and α = 5%. For the global hypothesis test with α = 5%, the global type I error is the error that is committed when the alternative hypothesis is accepted ($Se_1 \neq Se_2$ and/or $Sp_1 \neq Sp_2$) when the null hypothesis is true ($Se_1 = Se_2$ and $Sp_1 = Sp_2$). Regarding the individual hypothesis tests with α = 5% (with or without a multiple comparison method), the objective is to study the magnitude and behavior of the global type I error and of the global power. The global type I error is the error made when we reject $H_0: Se_1 = Se_2$ and/or $H_0: Sp_1 = Sp_2$ when both are true, whether each test is solved with α = 5% or with the Bonferroni (or Holm) method. The argument for the global power is similar.
These experiments consisted of generating N = 10,000 random samples from multinomial distributions with sizes n = {50, 100, 200, 500, 1000, 2000}, whose probabilities were calculated from Equation (2). It was considered that the discrete covariate X is binary (M = 2) with patterns $X_1$ and $X_2$, such as the presence of a risk factor (Yes or No), family history of the disease (Yes or No), or sex; this situation is very frequent in clinical practice. As values for $\psi_1$ ($\psi_2 = 1 - \psi_1$), we considered 0.25 and 0.50, and for the prevalence $p_m$, we considered the values 10%, 25%, and 50%. As values of the sensitivities ($Se_{11}$, $Se_{12}$, $Se_{21}$, and $Se_{22}$) and specificities ($Sp_{11}$, $Sp_{12}$, $Sp_{21}$, and $Sp_{22}$) in each pattern of the covariate, we took the values {0.70, 0.80, 0.90}. Then, from the values $Se_{hm}$ and $Sp_{hm}$, we calculated the maximum values of the covariances $\alpha_{1m}$ and $\alpha_{0m}$, and as values of $\alpha_{1m}$ and $\alpha_{0m}$, we took intermediate and high values, i.e., 90}. From all of the above values, the overall sensitivities and overall specificities were calculated by applying Equation (3). The simulation experiments were designed in such a manner that, if a parameter cannot be estimated in a sample (for example, if $\hat{Se}_{hm} = 0$), then that sample is discarded and another is generated in its place until N samples are obtained.

Type I Errors
Tables 2 and 3 show the type I errors obtained for the four methods proposed in Section 2, considering different scenarios. From the results, the following general conclusions are obtained:
(a). Global hypothesis test. The global hypothesis test is a conservative test when the sample size is small (n = 50) or moderate (n = 100-200), and its global type I error approaches the nominal error (without exceeding it) when the sample size is large (n = 500-1000) or very large (n = 2000).
(b). Individual tests with α = 5%. The type I error of the individual tests with α = 5% is less than the nominal error when the sample size is small and increases as the sample size increases. The type I error can clearly exceed the nominal error when the sample size is large. Therefore, the method based on individual tests with α = 5% can give false significance when the sample size is large and should not be used.
(c). Individual tests combined with the Bonferroni method. The type I error of the method based on the individual tests combined with the Bonferroni method behaves very similarly to the type I error of the global hypothesis test, and there is no important difference between the two type I errors.
(d). Individual tests combined with the Holm method. The type I error of the method based on the individual tests combined with the Holm method is very similar (even the same in many cases) to the type I error of the individual tests combined with the Bonferroni method.
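The generation of one simulated sample can be sketched as follows. This is not the simulation code of the study: for brevity it assumes that the two BDTs are conditionally independent given the disease and the covariate (so the covariances of the dependence model are ignored), and the function names, the seed, and the parameter values are illustrative.

```python
import random
from itertools import product


def cell_probabilities(psi, p, se1, se2, sp1, sp2):
    """Probabilities of the cells (m, D, T1, T2) under conditional independence."""
    probs = {}
    for m in range(len(psi)):
        for i, j in product((1, 0), repeat=2):
            # Diseased individuals: a positive result has probability Se
            dis = (se1[m] if i else 1 - se1[m]) * (se2[m] if j else 1 - se2[m])
            # Non-diseased individuals: a positive result has probability 1 - Sp
            non = (1 - sp1[m] if i else sp1[m]) * (1 - sp2[m] if j else sp2[m])
            probs[(m, 1, i, j)] = psi[m] * p[m] * dis
            probs[(m, 0, i, j)] = psi[m] * (1 - p[m]) * non
    return probs


def draw_sample(probs, n, rng):
    """Draw one multinomial sample of size n and return the cell frequencies."""
    cells = list(probs)
    counts = dict.fromkeys(cells, 0)
    for cell in rng.choices(cells, weights=[probs[c] for c in cells], k=n):
        counts[cell] += 1
    return counts


rng = random.Random(2021)
probs = cell_probabilities(
    psi=[0.5, 0.5], p=[0.10, 0.25],
    se1=[0.9, 0.9], se2=[0.9, 0.9], sp1=[0.8, 0.8], sp2=[0.8, 0.8],
)
sample = draw_sample(probs, n=500, rng=rng)
print(len(probs), sum(sample.values()))  # 16 500
```

Repeating `draw_sample` N times, discarding samples in which a parameter cannot be estimated, and recording how often each method rejects gives a Monte Carlo estimate of its type I error, since the null hypothesis holds for these parameter values.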

Powers
Tables 4 and 5 show the powers obtained for the four methods proposed in Section 2, considering different scenarios. The covariances $\alpha_{1m}$ and $\alpha_{0m}$ have an important effect on the powers of the methods: the powers increase when the values of the covariances increase. From the results, the following general conclusions are obtained:
(a). The power of the method based on the individual tests with α = 5% is greater than the powers of the other methods, because its global type I error is also greater than that of the other methods (clearly exceeding the nominal error when the sample size is large).
(b). The power of the method based on individual tests combined with the Bonferroni method and the power of the method based on individual tests combined with the Holm method are practically equal. Therefore, both methods show an asymptotic behavior, in terms of type I error and power, that is practically identical.
(c). In very general terms, the power of the method based on the individual tests combined with Bonferroni (Holm) is slightly greater than the power of the global hypothesis test when the sample size is small or moderate. When the sample size is large or very large, the power of the global hypothesis test is, in very general terms, slightly higher than that of the method based on individual tests with Bonferroni (Holm). In these situations, all of these methods have a very similar type I error.

[Table scenario: $Se_1 = 0.90$, $Se_2 = 0.80$, $Sp_1 = 0.90$, $Sp_2 = 0.80$, $p_1 = 10\%$, $p_2 = 25\%$, $\psi_1 = 50\%$, $\psi_2 = 50\%$; covariances $\alpha_{11} = \alpha_{01} = \alpha_{12} = \alpha_{02} = 0.008$ and $\alpha_{11} = \alpha_{01} = \alpha_{12} = \alpha_{02} = 0.040$.]

Application Rules
Based on the conclusions obtained from the simulation experiments, the following general application rules can be given when simultaneously comparing the accuracies of two BDTs in the presence of a binary covariate: (a). When the sample size is small or moderate, solve the individual hypothesis tests H 0 : Se 1 = Se 2 (Equation (8)) and H 0 : Sp 1 = Sp 2 (Equation (9)) combined with the Bonferroni (or Holm) method using an error α = 5%.
(b). When the sample size is large or very large, solve the global test H 0 : (Se 1 = Se 2 and Sp 1 = Sp 2 ) (Equation (7)) using an error α = 5%. If the global hypothesis test is not significant, then the equality of the accuracy of the two BDTs is not rejected. If the global hypothesis test is significant, then the causes of the significance will be investigated via testing H 0 : Se 1 = Se 2 and H 0 : Sp 1 = Sp 2 by individually applying Equations (8) and (9) combined with the Bonferroni (Holm) method using an error α = 5%. The global hypothesis test is initially applied because it is a somewhat more powerful method than the individual tests combined with the Bonferroni (Holm) method when the sample size is large or very large.
These application rules are given solely based on the sample size n because it is the only parameter of the study whose value was set by the researcher.
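The application rules can be summarized as a simple decision function. The cut-off of n = 500 used below is an assumption made for illustration: the simulations label n = 100-200 as moderate and n = 500-1000 as large, but no exact threshold between them is stated. The function name is also illustrative.

```python
def choose_method(n: int, large_threshold: int = 500) -> str:
    """Suggest a method according to the sample size n.

    large_threshold is a hypothetical cut-off between "moderate" and "large".
    """
    if n >= large_threshold:
        return "global test; if significant, individual tests with Bonferroni or Holm"
    return "individual tests with Bonferroni or Holm"


for n in (50, 200, 500, 2045):
    print(n, "->", choose_method(n))
```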

The "scapbc" Function
A function was written in R [5] that allows the accuracies of two BDTs subject to a paired design to be compared simultaneously in the presence of a binary covariate. The function is called "scapbc" (simultaneous accuracy comparison in the presence of a binary covariate) and is executed with the command:

scapbc(s111, s101, s011, s001, r111, r101, r011, r001, s112, s102, s012, s002, r112, r102, r012, r002, α)

where (s111, s101, ..., r012, r002) are the observed frequencies $s_{ijm}$ and $r_{ijm}$, and "α" is the α error. The function checks that the values of the arguments are valid and solves the problem by applying the rules given in Section 3.3, using the Bonferroni method. The results obtained are recorded in the file "results_scapbc.txt" in the same folder from which the function is run. The "scapbc" function is available in the Supplementary Materials of this manuscript.

Example
The results were applied to the diagnosis of coronary artery disease [11]. Weiner et al. [11] applied two BDTs (exercise test and clinical history) and a GS (coronary angiography) to a sample of 2045 patients (1465 men and 580 women). The observed frequencies of the study are shown in Table 6, where the variable $T_1$ models the result of the exercise test, $T_2$ models the result of the clinical history, and D models the result of the coronary angiography.

Table 6. Observed frequencies in the study of Weiner et al.

Men | T1 = 1, T2 = 1 | T1 = 1, T2 = 0 | T1 = 0, T2 = 1 | T1 = 0, T2 = 0 | Total
D = 1 | 786 | 29 | 183 | 25 | 1023
D = 0 | 69 | 46 | 176 | 151 | 442
Total | 855 | 75 | 359 | 176 | 1465

Women | T1 = 1, T2 = 1 | T1 = 1, T2 = 0 | T1 = 0, T2 = 1 | T1 = 0, T2 = 0 | Total
D = 1 | 124 | 4 | 32 | 9 | 169
D = 0 | 81 | 68 | 101 | 161 | 411
Total | 205 | 72 | 133 | 170 | 580

In this study, the risk of coronary heart disease is 2.4 times higher in men than in women [11]. The estimated value of the odds ratio is 5.63 (95% confidence interval: 4.56 to 6.95). Therefore, sex is a covariate that is related to the disease. In the exercise test, ST segment depression is less sensitive in women than in men, so sex is a covariate that can influence the test result. Therefore, adjusting for sex is necessary to simultaneously compare the two sensitivities and the two specificities. Executing the command scapbc(786, 29, 183, 25, 69, 46, 176, 151, 124, 4, 32, 9, 81, 68, 101, 161, 0.05) generates the results shown in Table 7.

Table 7. Results obtained in the study of Weiner et al.

Because the sample size is very large, the global hypothesis test is solved (application rules of Section 3.3). The test statistic for the global hypothesis test is $Q^2 = 224.252$ (p-value < 0.001). Therefore, the null hypothesis of the global hypothesis test (equality of the two sensitivities and of the two specificities) is rejected. To investigate the causes of the significance, it is necessary to solve the individual tests and apply the Bonferroni (or Holm) method. The test statistic for $H_0: Se_1 = Se_2$ vs. $H_1: Se_1 \neq Se_2$ is 12.265 (p-value < 0.001), and the test statistic for $H_0: Sp_1 = Sp_2$ vs. $H_1: Sp_1 \neq Sp_2$ is 8.593 (p-value < 0.001). Applying the Bonferroni method with α = 5%, the two null hypotheses are rejected. Therefore, the sensitivity of the clinical history is significantly greater than the sensitivity of the exercise test (95% confidence interval for the difference: 0.128 to 0.177), and the specificity of the exercise test is significantly greater than the specificity of the clinical history (95% confidence interval for the difference: 0.148 to 0.235). The same conclusions are obtained if the Holm method is applied.
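The reported statistics can be checked from the frequencies passed to the command. The sketch below is not the "scapbc" function: it relies on two observations that are assumptions of this illustration, namely that the relative-frequency estimators of the overall sensitivities and specificities reduce to proportions pooled over the two patterns, and that the estimated covariance between the sensitivity difference and the specificity difference is zero, so that the Wald statistic is the sum of the two squared z statistics. All names are illustrative.

```python
import math
from statistics import NormalDist

# Frequencies ordered as (T1, T2) = (1, 1), (1, 0), (0, 1), (0, 0)
diseased = [(786, 29, 183, 25), (124, 4, 32, 9)]         # men, women
non_diseased = [(69, 46, 176, 151), (81, 68, 101, 161)]  # men, women


def pooled(tables):
    """Sum the frequencies over the covariate patterns."""
    return [sum(column) for column in zip(*tables)]


def paired_wald(n_a, n_b, total, level=0.95):
    """Difference (n_a - n_b) / total of two paired proportions,
    its Wald z statistic, and a Wald confidence interval."""
    diff = (n_a - n_b) / total
    se = math.sqrt(((n_a + n_b) / total - diff ** 2) / total)
    half = NormalDist().inv_cdf(0.5 + level / 2) * se
    return diff, diff / se, (diff - half, diff + half)


s11, s10, s01, s00 = pooled(diseased)
r11, r10, r01, r00 = pooled(non_diseased)

# Se2 - Se1 = (s01 - s10) / s  and  Sp1 - Sp2 = (r01 - r10) / r
d_se, z_se, ci_se = paired_wald(s01, s10, s11 + s10 + s01 + s00)
d_sp, z_sp, ci_sp = paired_wald(r01, r10, r11 + r10 + r01 + r00)

q2 = z_se ** 2 + z_sp ** 2    # Wald statistic under the zero-covariance assumption
p_global = math.exp(-q2 / 2)  # survival function of a chi-squared with 2 df

print(f"Q2 = {q2:.3f}, p-value = {p_global:.3g}")
print(f"z (Se) = {z_se:.3f}, 95% CI = ({ci_se[0]:.3f}, {ci_se[1]:.3f})")
print(f"z (Sp) = {z_sp:.3f}, 95% CI = ({ci_sp[0]:.3f}, {ci_sp[1]:.3f})")
```

Under these assumptions, the computed statistics and confidence intervals agree with the values reported above to three decimal places.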

Discussion
Comparison of the sensitivities and specificities of two BDTs is a topic of great interest in the study of statistical methods applied to diagnosis and has been the subject of numerous studies in the statistical literature. When two BDTs are compared, it is common to observe discrete covariates in all of the individuals in the sample. In this situation, if the covariates are related to the disease and to either of the two BDTs, then it is necessary to adjust for covariates. This adjustment has the purpose of eliminating the effect of the covariate in the estimation of the global sensitivity and specificity of each BDT, and consequently eliminating its effect in the comparison of the parameters. Therefore, adjustment for covariates is important because the comparison of two diagnostic tests may be biased when an adjustment is not made. This manuscript makes a contribution to this topic, by simultaneously comparing the accuracies of two BDTs by adjusting for discrete covariates. Therefore, in this manuscript the simultaneous comparison of the sensitivities and the specificities of two BDTs was studied when discrete covariates are observed in all of the individuals in the sample. The overall estimators of the sensitivities and specificities were obtained by applying the maximum likelihood method and the variances-covariances were estimated by applying the delta method. In this situation, simultaneous comparison of sensitivities and specificities of two BDTs was resolved by four methods: the global hypothesis test H 0 : (Se 1 = Se 2 and Sp 1 = Sp 2 ) with an α error; individual tests H 0 : Se 1 = Se 2 and H 0 : Sp 1 = Sp 2 , each with an α error; individual tests H 0 : Se 1 = Se 2 and H 0 : Sp 1 = Sp 2 and application of the Bonferroni method with an α error; and individual tests H 0 : Se 1 = Se 2 and H 0 : Sp 1 = Sp 2 and application of the Holm method with an α error.
Simulation experiments were carried out to study the behaviors of the different methods when the covariate is binary. The results showed that the method based on the individual tests H 0 : Se 1 = Se 2 and H 0 : Sp 1 = Sp 2 , each with an α error, can give rise to type I errors that far exceed the nominal error, and therefore this method gives rise to too many false significances. Furthermore, the method based on the global hypothesis test has better asymptotic behavior when the sample size is large or very large than the methods based on individual tests and the application of the Bonferroni or Holm methods. However, when the sample size is small or moderate, these latter two methods perform better than the method based on the global hypothesis test. Therefore, based on the results of the simulation experiments, some rules of application of the methods can be given according to the sample size (which is the only value set by the researcher). These rules are: (a) When the sample size is small or moderate, solve the individual hypothesis tests H 0 : Se 1 = Se 2 and H 0 : Sp 1 = Sp 2 combined with the Bonferroni (or Holm) method with an error α = 5%; (b) When the sample size is large or very large, solve the global test H 0 : (Se 1 = Se 2 and Sp 1 = Sp 2 ) with an error α = 5%. If the global hypothesis test is not significant, then it is not rejected that the two sensitivities are equal and that the two specificities are equal. If the global hypothesis test is significant, then the causes of significance are investigated by solving the individual tests combined with the Bonferroni (Holm) method with an error α = 5%. The method based on the global hypothesis test is very similar to the analysis of variance: first the global test is solved and, if it is significant, then the individual tests are solved and a multiple comparisons method is applied.
Simulation experiments have shown that the covariances between the two BDTs have an important effect on type I errors and powers. Type I errors are greater when the two BDTs are conditionally independent of the disease than when the two BDTs are conditionally dependent on the disease. Regarding the powers, for a fixed sample size, the power of each method is greater when the two BDTs are conditionally dependent on the disease than when they are conditionally independent of the disease. In practice, the only parameter that the researcher can control is the sample size. Therefore, although the effect of the covariances is important, the increase in power can only be achieved by increasing the sample size (the researcher cannot increase the values of the covariances, because these depend on the intrinsic properties of both diagnostic tests).
Simulation experiments have also shown that the global hypothesis test, whose test statistic is a Wald-type test statistic, has a good asymptotic performance in terms of type I error and power. The type I error of the global test is close to the nominal error when the sample size is large or very large. Regarding the power, in general terms and depending on the covariances between the two BDTs, a large sample size is needed for the power to be large. Therefore, the global test performance when the covariate is binary is very similar to that obtained in other studies [2].
The proposed method is based on the fact that the covariate is discrete. A future study should address the problem that occurs when the covariate is quantitative.
Finally, a function was written in R that allows us to solve the problem posed when the covariate is binary. The function is easy to use and provides all of the results so that the researcher can easily solve the problem. The function is available as Supplementary Materials to this manuscript.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/math9172029/s1. The "scapbc" function is a function written in R that allows simultaneous comparison of the accuracies of two BDTs subject to a paired design in the presence of a binary covariate.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.