Abstract
The comparison of two paired binomial proportions is a topic of interest in statistics, with important applications in medicine. There are different methods in the statistical literature to solve this problem, and the McNemar test is the best known of all of them. The problem has been solved from a conditioned perspective, considering only the discordant pairs, and from an unconditioned perspective, considering all of the observed values. This manuscript reviews the existing methods for testing the equality of two paired proportions and proposes new methods. Monte Carlo simulation experiments were carried out to study the asymptotic behaviour of the methods, and general rules of application are given depending on the sample size. In general terms, the Wald test, the likelihood-ratio test, and two tests based on association measures in 2 × 2 tables can always be applied, whatever the sample size, and if the sample size is large, then the McNemar test without a continuity correction and the modified Wald test can also be applied. The results have been applied to a real example on the diagnosis of coronary heart disease.
MSC:
62P10; 62F05
1. Introduction
The comparison of two proportions is a topic of special interest in statistics [1], with important applications in medicine and in the health sciences in general. Of special interest is the case in which the two proportions are paired, as happens when, in a sample of n individuals, a binary variable is observed before and after a certain treatment, or when the sensitivities (specificities) of two binary diagnostic tests are compared with respect to the same gold standard [2,3]. This problem also frequently arises in clinical trials [4], such as when assessing the effectiveness of a new drug or treatment. These situations give rise to the analysis of a 2 × 2 table in which the only value set by the researcher is the sample size n. There are numerous methods in the statistical literature to solve this problem. Classically, the problem has been solved by conditioning on the discordant pairs and thus disregarding the frequencies of the concordant pairs. This approach has given rise to different methods, of which the McNemar test [5] is the best known [6,7,8]. The problem can be solved with exact tests (conditioned and unconditioned) and with approximate tests (conditioned and unconditioned). All test statistics of the approximate methods are approximately distributed as a chi-square distribution with one degree of freedom.
In the statistical literature, there are numerous methods to solve the hypothesis test comparing two paired proportions. May and Johnson [9], Park [10], and, more recently, Fagerland et al. [11,12,13] have compared different methods for this problem. However, these works studied only some of the existing methods. This is one of the main motivations for our study, together with the proposal of new methods: to compare a large number of different methods for testing the equality of two paired binomial proportions.
An alternative to the hypothesis test, and one directly related to it, consists of comparing the two paired proportions using confidence intervals for the difference (or ratio) of the two paired proportions. A review of the most common confidence intervals can be found in Pradhan et al. [4] and Tan et al. [14]. In addition, new intervals have been proposed by Pradhan et al. [15] and, more recently, by Fay et al. [16] and Chan et al. [17]. A review of different methods to solve the hypothesis test, as well as of confidence intervals for the difference and the ratio of two paired proportions, can be found in Fagerland et al. [13].
Therefore, the purpose of this manuscript is to compare the asymptotic behaviour in terms of type I error rates and powers of different methods to solve the hypothesis test to compare two paired binomial proportions and to provide general rules of application for the methods. The rest of the article is structured as follows. Section 2 describes 24 methods to solve the hypothesis test for comparing two paired binomial proportions. Section 3 describes the criteria used to compare the asymptotic behaviour of the 24 methods. Section 4 carries out extensive Monte Carlo simulation experiments to study the type I error rates and the powers of the methods. Section 5 presents general rules of application for the methods to solve the problem posed. In Section 6, the results are applied to a real example on the diagnosis of coronary heart disease, and Section 7 discusses the conclusions obtained.
2. Notation and Methods
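In general terms, and focusing on common problems in the field of medicine, let us consider a binary random variable with the categories ‘success’ and ‘failure’, which is observed in a random sample of n individuals before and after a treatment. This situation gives rise to Table 1, where the only value set by the researcher is the sample size n. This table also shows the theoretical probabilities of each cell. Let $n_{ij}$ and $p_{ij}$ ($i, j = 1, 2$) denote the observed frequency and the theoretical probability of the cell with result $i$ before and result $j$ after the treatment (1 = ‘success’, 2 = ‘failure’). The observed data, $\mathbf{n} = (n_{11}, n_{12}, n_{21}, n_{22})^T$, are the product of a multinomial distribution with probability vector $\boldsymbol{\pi} = (p_{11}, p_{12}, p_{21}, p_{22})^T$, verifying that $p_{11} + p_{12} + p_{21} + p_{22} = 1$. The variance–covariance matrix of $\mathbf{n}$ is
$$\boldsymbol{\Sigma}_{\mathbf{n}} = n\left[\operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}^T\right],$$
and the estimator of $\boldsymbol{\pi}$ is $\hat{\boldsymbol{\pi}} = \mathbf{n}/n$.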
Table 1.
Observed frequencies and theoretical probabilities.
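In this situation, the comparison of two paired binomial proportions consists of solving the hypothesis test
$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_1: p_1 \neq p_2, \tag{1}$$
where $p_1 = p_{11} + p_{12}$ and $p_2 = p_{11} + p_{21}$ are the probabilities of ‘success’ before and after the treatment, respectively. Since $p_1 - p_2 = p_{12} - p_{21}$, this is equivalent to solving the test
$$H_0: p_{12} = p_{21} \quad \text{vs.} \quad H_1: p_{12} \neq p_{21}. \tag{2}$$
The estimators of $p_1$ and $p_2$ are as follows:
$$\hat{p}_1 = \frac{n_{11} + n_{12}}{n}, \qquad \hat{p}_2 = \frac{n_{11} + n_{21}}{n}.$$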
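As a minimal sketch (not the authors' code), the estimators above can be computed directly from the four cell counts. The function name `estimate_paired_table` is hypothetical, and the cell order $(n_{11}, n_{12}, n_{21}, n_{22})$, with rows for ‘before’ and columns for ‘after’, is an assumed layout for Table 1.

```python
def estimate_paired_table(n11: int, n12: int, n21: int, n22: int):
    """Estimate pi, its covariance matrix, and the marginal proportions p1, p2."""
    counts = [n11, n12, n21, n22]
    n = sum(counts)
    if n <= 0:
        raise ValueError("the sample size n must be positive")
    # Maximum likelihood estimator of the multinomial probability vector
    pi_hat = [c / n for c in counts]
    # Estimated covariance matrix of pi_hat: (diag(pi) - pi pi^T) / n
    cov = [
        [((pi_hat[i] if i == j else 0.0) - pi_hat[i] * pi_hat[j]) / n for j in range(4)]
        for i in range(4)
    ]
    p1_hat = pi_hat[0] + pi_hat[1]  # 'success' before (first row)
    p2_hat = pi_hat[0] + pi_hat[2]  # 'success' after (first column)
    return pi_hat, cov, p1_hat, p2_hat


# Hypothetical counts, for illustration only
pi_hat, cov, p1_hat, p2_hat = estimate_paired_table(30, 10, 5, 55)
print(p1_hat, p2_hat)
```

Each row of the covariance matrix sums to zero, which reflects the constraint that the four probabilities add up to one.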
The following describes 24 statistical methods to solve this hypothesis test. Of these 24 methods, two are exact, one is quasi-exact, and 21 are approximate (five of which are new).
- 1.
- Conditional exact test (CET)
The probabilities $p_{11}$ and $p_{22}$ do not intervene in hypothesis test (1), since they cancel out in $p_1 - p_2 = p_{12} - p_{21}$, so they can be ignored, as can the concordant frequencies $n_{11}$ and $n_{22}$, which do not influence the result of the test. Conditioning on the number of discordant pairs, $n_d = n_{12} + n_{21}$, an exact test is obtained using the binomial distribution [13,18]. Conditioning on $n_d$, it is verified that $n_{12} \mid n_d \sim B(n_d, q)$, with $q = p_{12}/(p_{12} + p_{21})$. If $H_0$ is true, then $q = 1/2$, and hypothesis test (1) is equivalent to the test
$$H_0: q = 1/2 \quad \text{vs.} \quad H_1: q \neq 1/2.$$
The p-value can be calculated directly from the binomial distribution. Assuming that $n_{12} \leq n_{21}$, the following is derived:
$$P(X \leq n_{12}) = \sum_{x=0}^{n_{12}} \binom{n_d}{x}\left(\frac{1}{2}\right)^{n_d},$$
where $X \sim B(n_d, 1/2)$. Finally, the two-sided exact p-value for the comparison of the two paired binomial proportions is as follows:
$$p = \min\left\{1,\; 2P(X \leq n_{12})\right\}. \tag{3}$$
The conditional exact test is conservative; that is, when $H_0$ is true, it rejects $H_0$ in less than $100\alpha\%$ of the cases (its type I error rate is below $\alpha$), where $\alpha$ is the nominal error level.
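The CET p-value can be sketched in Python with exact integer arithmetic, assuming, as described above, that the two-sided p-value is twice the smaller binomial tail, capped at 1. The function name is hypothetical; this is an illustration, not the authors' R implementation.

```python
from math import comb


def conditional_exact_p(n12: int, n21: int) -> float:
    """Two-sided conditional exact (binomial) p-value for H0: p12 = p21."""
    nd = n12 + n21  # discordant pairs: the test conditions on this total
    if nd == 0:
        return 1.0  # no discordant pairs, so no evidence against H0
    k = min(n12, n21)  # the smaller discordant count defines the tail
    # P(X <= k) for X ~ B(nd, 1/2), computed with exact integers
    tail = sum(comb(nd, x) for x in range(k + 1)) / 2**nd
    return min(1.0, 2.0 * tail)


print(conditional_exact_p(2, 8))  # 0.109375 (= 112/1024)
```

The concordant counts $n_{11}$ and $n_{22}$ are deliberately not arguments, which mirrors the conditioning on $n_d$.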
- 2.
- Conditional exact mid-p test (MidpT)
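The conditional exact mid-p test [19] is a modification of the CET that consists of subtracting the probability of the observed outcome from the doubled tail probability in (3). Assuming, as before, that $n_{12} \leq n_{21}$, the mid-p value to compare the two proportions is as follows:
$$p_{\text{mid}} = 2P(X \leq n_{12}) - P(X = n_{12}) = 2\left[P(X \leq n_{12}) - \tfrac{1}{2}P(X = n_{12})\right].$$
The conditional exact mid-p test is less conservative than the CET.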
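A companion sketch for the mid-p value subtracts the point probability of the observed outcome from the doubled binomial tail. The function name is hypothetical, and the arguments are assumed to be the two discordant counts $n_{12}$ and $n_{21}$.

```python
from math import comb


def mid_p_value(n12: int, n21: int) -> float:
    """Two-sided conditional mid-p value: doubled tail minus P(X = observed)."""
    nd = n12 + n21
    if nd == 0:
        return 1.0
    k = min(n12, n21)
    denom = 2**nd
    cdf = sum(comb(nd, x) for x in range(k + 1)) / denom  # P(X <= k)
    point = comb(nd, k) / denom  # P(X = k), probability of the observed outcome
    return 2.0 * cdf - point


print(mid_p_value(2, 8))  # 0.0654296875 (= 67/1024)
```

For the same hypothetical counts, the exact p-value is 112/1024, so the mid-p value is smaller, in line with MidpT being less conservative than CET.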
- 3.
- McNemar Test (MT)
The McNemar test [4,13,18] is the asymptotic version of the CET. Conditioning on $n_d$ and applying the central limit theorem, the test statistic for hypothesis test (1) is as follows:
$$z = \frac{n_{12} - E(n_{12})}{\sqrt{\operatorname{Var}(n_{12})}},$$
whose distribution is approximately standard normal. Since the test is conditioned on $n_d$ (the frequencies $n_{11}$ and $n_{22}$ are disregarded), $E(n_{12}) = n_d q$ and $\operatorname{Var}(n_{12}) = n_d q(1 - q)$, with $q = p_{12}/(p_{12} + p_{21})$. If $H_0$ is true, then $q = 1/2$, and the following are derived:
$$E(n_{12}) = \frac{n_d}{2} \quad \text{and} \quad \operatorname{Var}(n_{12}) = \frac{n_d}{4}.$$
Substituting these expressions into $z$, the test statistic of the McNemar test (without continuity correction) is as follows:
$$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}},$$
whose distribution is approximately standard normal (traditionally, a minimum number of discordant pairs $n_{12} + n_{21}$ is required). Very often, the test statistic is expressed in terms of the chi-square distribution:
$$\chi^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}},$$
whose distribution is approximately chi-square with one degree of freedom. MT has good asymptotic behaviour in terms of type I error rate and power.
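A minimal sketch of the McNemar statistic without continuity correction follows. The two-sided p-value uses the identity $2[1 - \Phi(|z|)] = \operatorname{erfc}(|z|/\sqrt{2})$, so only the Python standard library is needed; the function name is hypothetical.

```python
from math import erfc, sqrt


def mcnemar_test(n12: int, n21: int):
    """McNemar test without continuity correction: returns (z, chi2, p-value)."""
    nd = n12 + n21
    if nd == 0:
        raise ValueError("the statistic is undefined when there are no discordant pairs")
    z = (n12 - n21) / sqrt(nd)
    chi2 = z * z
    # Two-sided normal p-value 2 * (1 - Phi(|z|)), which equals P(chi2_1 >= chi2)
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, chi2, p_value


z, chi2, p = mcnemar_test(2, 8)
print(chi2, p)  # chi2 = 3.6, p ≈ 0.058
```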
- 4.
- McNemar test with Yates continuity correction (MTYcc)
The McNemar test approximates the binomial distribution by the normal distribution. In this situation, it is common to apply a continuity correction (cc), whose objective is to improve the approximation to the normal distribution. Edwards [20] proposed the following test statistic with Yates cc [21]:
$$z = \frac{|n_{12} - n_{21}| - 1}{\sqrt{n_{12} + n_{21}}},$$
whose distribution is approximately standard normal. It is also common to express this test statistic in terms of the chi-square distribution [13,18]:
$$\chi^2 = \frac{\left(|n_{12} - n_{21}| - 1\right)^2}{n_{12} + n_{21}}.$$
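The Edwards-corrected statistic differs from MT only in its numerator. The sketch below (hypothetical function name) applies the formula literally, without truncating $|n_{12} - n_{21}| - 1$ at zero, and uses $P(\chi^2_1 \geq x) = \operatorname{erfc}(\sqrt{x/2})$ for the p-value.

```python
from math import erfc, sqrt


def mcnemar_edwards(n12: int, n21: int):
    """McNemar test with Edwards' (Yates-type) continuity correction: (chi2, p-value)."""
    nd = n12 + n21
    if nd == 0:
        raise ValueError("the statistic is undefined when there are no discordant pairs")
    chi2 = (abs(n12 - n21) - 1) ** 2 / nd
    # Upper tail of the chi-square distribution with 1 df: P(chi2_1 >= x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


print(mcnemar_edwards(2, 8))  # chi2 = 2.5, p ≈ 0.114
```

For these hypothetical counts the correction lowers the statistic from 3.6 to 2.5, which illustrates why MTYcc is less powerful than MT.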
- 5.
- McNemar test with continuity correction (MTcc1)
Conditioning on the number of discordant pairs, the random variable jumps in steps of one unit, so a cc of 1/2 (half the jump) can be applied [22]. Therefore, another test statistic of the McNemar test with cc is as follows:
or, equivalently:
This cc has been used by Chang et al. [17] to estimate the difference between two paired binomial proportions using confidence intervals. These authors also proposed other continuity corrections, 0.125 and 0.25. We propose applying these continuity corrections to the McNemar test statistic, obtaining the following new test statistics (called MTcc2 and MTcc3, respectively):
- 6.
- Modified McNemar test (MMT)
Bennett and Underwood [23] proposed a modification of the McNemar test statistic by adding to the observed frequencies, with the aim of improving the approximation to the chi-square distribution. Thus, the test statistic is as follows:
- 7.
- Wald test (WT)
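The hypothesis test (1) can be solved by applying the Wald method [24,25]. Since $\boldsymbol{\pi}$ is the probability vector of a multinomial distribution, the variance–covariance matrix of $\hat{\boldsymbol{\pi}}$ is as follows:
$$\boldsymbol{\Sigma}_{\hat{\boldsymbol{\pi}}} = \frac{1}{n}\left[\operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}^T\right].$$
The hypothesis test (2) is equivalent to testing
$$H_0: \delta = 0 \quad \text{vs.} \quad H_1: \delta \neq 0,$$
where $\delta = p_{12} - p_{21} = \mathbf{a}^T\boldsymbol{\pi}$, with $\mathbf{a} = (0, 1, -1, 0)^T$. It is easy to verify that the estimated variance of $\hat{\delta} = \hat{p}_{12} - \hat{p}_{21}$ is as follows:
$$\widehat{\operatorname{Var}}(\hat{\delta}) = \mathbf{a}^T\hat{\boldsymbol{\Sigma}}_{\hat{\boldsymbol{\pi}}}\mathbf{a} = \frac{\hat{p}_{12} + \hat{p}_{21} - (\hat{p}_{12} - \hat{p}_{21})^2}{n}.$$
Applying the central limit theorem, $\hat{\delta}^2/\widehat{\operatorname{Var}}(\hat{\delta})$ is approximately distributed as a chi-square distribution with one degree of freedom under $H_0$. After some algebra, the Wald test statistic for test (2) is as follows:
$$\chi^2_W = \frac{n(n_{12} - n_{21})^2}{n(n_{12} + n_{21}) - (n_{12} - n_{21})^2},$$
whose distribution is approximately chi-square with one degree of freedom.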
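A sketch of the Wald statistic in its closed form in terms of the counts follows (hypothetical function name). It raises an error when the estimated variance is zero, for example when there are no discordant pairs.

```python
from math import erfc, sqrt


def wald_test(n11: int, n12: int, n21: int, n22: int):
    """Wald test for H0: p12 = p21 (equivalently p1 = p2): returns (chi2_W, p-value)."""
    n = n11 + n12 + n21 + n22
    d = n12 - n21
    denom = n * (n12 + n21) - d * d  # n^3 times the estimated variance of (p12_hat - p21_hat)
    if denom <= 0:
        raise ValueError("estimated variance is zero; the Wald statistic is undefined")
    chi2 = n * d * d / denom
    p_value = erfc(sqrt(chi2 / 2.0))  # chi-square upper tail, 1 degree of freedom
    return chi2, p_value


chi2_w, p_w = wald_test(30, 2, 8, 60)  # hypothetical counts
print(chi2_w, p_w)  # chi2 = 3600/964, about 3.734
```

Because the variance is estimated without imposing $H_0$, the denominator is smaller than $n(n_{12} + n_{21})$, so $\chi^2_W$ is never below the McNemar statistic computed from the same table.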
- 8.
- Modified Wald test (MWT)
May and Johnson [9] proposed modifying the Wald test statistic by adding to and to . Thus, the modified Wald test statistic is as follows:
This method has good asymptotic behaviour and is recommended as one of the best methods to solve the hypothesis test [9].
- 9.
- Likelihood-ratio test (LRT)
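The hypothesis test (1) can also be solved by applying the likelihood-ratio test [26]. The likelihood function of the data is as follows:
$$L(\boldsymbol{\pi}) = \frac{n!}{n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}\,p_{11}^{n_{11}}p_{12}^{n_{12}}p_{21}^{n_{21}}p_{22}^{n_{22}},$$
where $p_{11} + p_{12} + p_{21} + p_{22} = 1$. If $H_0$ is true, then $p_{12} = p_{21} = \phi$, and the likelihood function of the data is as follows:
$$L_0 = \frac{n!}{n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}\,p_{11}^{n_{11}}\phi^{n_{12} + n_{21}}p_{22}^{n_{22}},$$
so that the maximum likelihood estimators under $H_0$ are $\hat{p}_{11} = n_{11}/n$, $\hat{p}_{22} = n_{22}/n$, and $\hat{\phi} = (n_{12} + n_{21})/(2n)$. Applying the likelihood-ratio test [25,26], the test statistic to compare the two proportions is as follows:
$$G^2 = 2\left[n_{12}\ln\frac{2n_{12}}{n_{12} + n_{21}} + n_{21}\ln\frac{2n_{21}}{n_{12} + n_{21}}\right],$$
with the convention $0 \ln 0 = 0$, whose distribution is approximately chi-square with one degree of freedom. Therefore, the test statistic of the LRT method only contains the frequencies of the discordant pairs.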
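A sketch of the likelihood-ratio statistic follows, using the convention $0 \ln 0 = 0$ for empty discordant cells; the function name is hypothetical.

```python
from math import erfc, log, sqrt


def likelihood_ratio_test(n12: int, n21: int):
    """Likelihood-ratio test for H0: p12 = p21: returns (G2, p-value)."""
    nd = n12 + n21
    if nd == 0:
        return 0.0, 1.0  # no discordant pairs: G2 = 0 by the convention 0 * log(0) = 0
    g2 = 0.0
    for x in (n12, n21):
        if x > 0:  # terms with x = 0 contribute nothing
            g2 += x * log(2.0 * x / nd)
    g2 = max(2.0 * g2, 0.0)  # guard against tiny negative rounding errors
    return g2, erfc(sqrt(g2 / 2.0))


print(likelihood_ratio_test(2, 8))  # G2 ≈ 3.855, p ≈ 0.0496
```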
- 10.
- Unconditional exact test (UET)
The CET method conditions on the number of discordant pairs. Suissa and Shuster [27] proposed, from the McNemar test statistic, an exact test that uses all the observed frequencies and therefore does not condition on that number. When the two proportions are compared, the power function of the test is as follows:
where and , with and as the calculated value of the McNemar statistic. If was true, then the distribution of was a trinomial distribution with parameters and probability vector , and the power function was as follows:
where is the nuisance parameter. The nuisance parameter is eliminated by maximizing this function over its range. The function can be simplified as follows:
where , , was the integer function and was the cumulative binomial distribution function with parameters and . Finally, the two-sided exact p-value was calculated as follows:
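The simplified expression of the UET is not reproduced here. As an assumption-labelled alternative, the sketch below computes an unconditional exact p-value in the same spirit by brute force: it enumerates all trinomial outcomes $(n_{12}, n_{21}, n - n_{12} - n_{21})$ whose McNemar $|z|$ is at least the observed one and maximizes their null probability over a grid of values of the nuisance parameter $\psi = p_{12} + p_{21}$. The grid only approximates the supremum; the function names, the grid size, and setting $z = 0$ when there are no discordant pairs are illustrative choices, and the enumeration is only practical for small n.

```python
from math import comb, sqrt


def mcnemar_z(a: int, b: int) -> float:
    """McNemar statistic (a - b) / sqrt(a + b); taken as 0 when a + b = 0 (assumption)."""
    return 0.0 if a + b == 0 else (a - b) / sqrt(a + b)


def unconditional_exact_p(n12: int, n21: int, n: int, grid: int = 1000) -> float:
    """Brute-force unconditional exact p-value in the spirit of Suissa and Shuster (sketch)."""
    z_obs = abs(mcnemar_z(n12, n21))
    # Outcomes (a, b, n - a - b) at least as extreme as observed, with their
    # multinomial coefficients n! / (a! b! (n - a - b)!)
    extreme = [
        (a, b, comb(n, a) * comb(n - a, b))
        for a in range(n + 1)
        for b in range(n + 1 - a)
        if abs(mcnemar_z(a, b)) >= z_obs - 1e-12
    ]
    best = 0.0
    for i in range(1, grid + 1):
        psi = i / grid  # nuisance parameter; under H0, p12 = p21 = psi / 2
        prob = sum(
            coef * (psi / 2) ** (a + b) * (1 - psi) ** (n - a - b)
            for a, b, coef in extreme
        )
        best = max(best, prob)  # the p-value is the supremum over psi
    return min(best, 1.0)


print(unconditional_exact_p(1, 9, n=20))  # hypothetical table with n = 20
```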
- 11.
- Unconditional McNemar test (UMT)
Lu [28] proposed a test statistic for the McNemar test that does not condition on the number of discordant pairs. Hypothesis test (1) is equivalent to the following hypothesis test:
If was true, then (or ) was the product of a binomial distribution with parameters and , that is to say, . The mean and variance of the estimators of this binomial distribution were as follows:
respectively. Using the normal approximation and applying the central limit theorem, the unconditional test statistic is as follows:
or, equivalently,
whose distribution was approximately a chi-square distribution with one degree of freedom. In order to apply this method, it was required that , and its asymptotic behaviour was very similar to that of the CET [28].
- 12.
- Unconditional likelihood-ratio test (ULRT)
Lu [29] also proposed a likelihood-ratio test statistic, using all the frequencies, to compare two paired binomial proportions. The statistic is obtained in two phases: (I) the likelihood-ratio test statistic is calculated when the four frequencies are combined into two groups; (II) the likelihood-ratio test statistic is calculated when the four frequencies are combined into another two groups. The corresponding test statistics are as follows:
and
Finally, the likelihood-ratio test statistic was calculated as the mean of both likelihood-ratio test statistics:
and its distribution is approximately chi-square with one degree of freedom. The ULRT can be applied in most cases, although the test statistic does not fit the chi-square distribution well when the difference between and is large, especially when is also large; in this situation, it is a better method than the LRT [29].
- 13.
- New revised version of the McNemar test (NMT)
Lu et al. [30] revised the unconditional McNemar test [28]. Under the hypothesis that there is no difference in the number of “success” and “failure” results between “before” and “after”, the estimated probability of obtaining a “success” is as follows:
and the estimated probability of obtaining a “failure” is as follows:
Frequencies and correspond to “success” and “failure” in “before” measurements. The estimated mean is as follows:
and the estimated standard deviation is as follows:
Applying the central limit theorem, the statistic test was as follows:
and its distribution was approximately a standard normal distribution when and . Alternatively, the following was derived:
This method has an asymptotic behaviour that improves on that of the UMT [30].
- 14.
- New revised version of the McNemar test with cc (NMTcc)
Lu et al. [30] revised the unconditional McNemar test and proposed the following unconditional test statistic with cc:
- 15.
- Haber test (HT)
Haber [31] studied the use of continuity corrections in hypothesis testing, particularizing the results to 2 × 2 tables. Haber proposed a McNemar test statistic with a cc based on the McNemar test statistic:
where is the McNemar test statistic and m is the number of different values z may attain. The number of different achievable values of is very close to , and since the range of values is , the cc based on the average difference of the successive values gives rise to the test statistic:
and its distribution is approximately a chi-square with one degree of freedom.
- 16.
- Irony et al. test (IT)
Irony et al. [32] studied the comparison of two binomial proportions from a Bayesian perspective. The Dirichlet distribution is the natural conjugate prior for π. Therefore, the prior distribution for π is taken to be a Dirichlet distribution, and its posterior distribution is also Dirichlet, with parameter equal to the prior parameter plus the vector of observed frequencies. The objective is to solve the hypothesis test:
where . This hypothesis test is equivalent to the following:
where . Hence, the only parameters of interest are and , and only the trinomial data are considered. The likelihood function is written as a product of two factors: one depending only on the parameter of interest and the other depending only on the nuisance parameter . The distribution of is as follows:
and the distribution of is as follows:
The parameters and are independent. An interval for is constructed by generating a large number of observations from the posterior distribution of , that is, a Dirichlet distribution with parameter . Irony et al. [32] have shown that the posterior mean of is as follows:
and the posterior variance of is as follows:
A confidence interval for is as follows:
where
and q is the quantile of the standard normal distribution. From the previous equations, the test statistic for the hypothesis test (1) is as follows:
whose distribution is approximately chi-square with one degree of freedom.
- 17.
- RR test (RRT)
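The hypothesis test (1) is equivalent to the hypothesis test
$$H_0: RR = 1 \quad \text{vs.} \quad H_1: RR \neq 1, \tag{4}$$
where $RR = p_1/p_2$.
Lui [33] solved this hypothesis test by applying weighted least squares. The estimator of RR is $\widehat{RR} = \hat{p}_1/\hat{p}_2 = (n_{11} + n_{12})/(n_{11} + n_{21})$,
and, applying the delta method, the estimated variance of is as follows: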
where
Applying the central limit theorem, the test statistic for hypothesis test (4) was as follows:
or equivalently
whose distribution was approximately a chi-square distribution with one degree of freedom.
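Lui's exact expressions are not reproduced here. As an assumption, the sketch below uses the generic delta-method variance for paired data, $\widehat{\operatorname{Var}}(\ln\widehat{RR}) = (n_{12} + n_{21})/[(n_{11} + n_{12})(n_{11} + n_{21})]$, with $RR = p_1/p_2$; this may differ in detail from the weighted-least-squares form in [33]. The function name is hypothetical, and $n_{22}$ does not enter the statistic.

```python
from math import erfc, log, sqrt


def log_rr_wald_test(n11: int, n12: int, n21: int):
    """Wald-type test of H0: RR = p1 / p2 = 1 on the log scale (delta method)."""
    a = n11 + n12  # 'success' before
    b = n11 + n21  # 'success' after
    nd = n12 + n21  # discordant pairs
    if a == 0 or b == 0 or nd == 0:
        raise ValueError("log RR or its variance is undefined for these counts")
    log_rr = log(a / b)  # log of RR_hat = p1_hat / p2_hat (n cancels)
    var_log_rr = nd / (a * b)  # delta-method estimate of Var(log RR_hat)
    z = log_rr / sqrt(var_log_rr)
    return z * z, erfc(abs(z) / sqrt(2.0))  # (chi2 with 1 df, two-sided p-value)


chi2_rr, p_rr = log_rr_wald_test(30, 2, 8)  # hypothetical counts
print(chi2_rr, p_rr)
```

The log scale is used because $\ln\widehat{RR}$ is closer to normal than $\widehat{RR}$ itself; the variance follows from $\operatorname{Var}(\ln\widehat{RR}) \approx (p_{12} + p_{21})/(n p_1 p_2)$.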
- 18.
- OD test (ODT)
The hypothesis test (1) was also equivalent to the following:
where
Lui [33] solved this hypothesis test by applying the same method as the one used in the RR test. Following an analogous procedure, the test statistic for the hypothesis test (5) is as follows:
and where
The distribution of the test statistic is the same as the one in the previous case.
- 19.
- ODM test (ODMT)
The hypothesis test (1) is also equivalent to the following:
where
Applying the same method as in the two previous cases, Lui [33] proposed the following test statistic:
where
The distribution of the test statistic is the same as in the previous cases.
- 20.
- RR, OD, and ODM test with cc (RRTcc, ODTcc, and ODMTcc)
The previous three methods can also be applied with a cc. We propose adding a continuity correction to each of the observed frequencies, i.e., as follows:
Thus, the expressions of test statistics , , and were replaced by , , and as follows:
respectively. In this way, new test statistics , , and were obtained, and their distributions were the same as in previous cases.
3. Criteria for Comparing Methods
The asymptotic behaviour of the methods presented in the previous section was compared through their type I error rates and their powers, taking α as the nominal error level. Based on the type I error rates and the powers, the criteria used to choose the methods with the best asymptotic behaviour were as follows:
- 1.
- The type I error rate fluctuates around α without being much higher than this value, which is considered to be the case when the type I error rate is lower than 7%.
- 2.
- The power is the highest among the methods whose type I error rate is lower than 7%.
The first step of this procedure for choosing the method with the best asymptotic behaviour establishes that the type I error rate must be lower than 7%. Let $\alpha^*$ denote the type I error rate of a method. If a test statistic has an associated confidence interval (CI), then $\alpha^* = 1 - \gamma$, where $\gamma$ is the coverage probability of the CI (whose nominal confidence is $1 - \alpha$). Under this criterion, a test statistic is too liberal if $\alpha^*$ exceeds 7% or, equivalently, if $\gamma$ is below 93%, in which case the CI is said to fail [34,35,36]. If a CI fails, then the type I error rate of the corresponding hypothesis test exceeds 7%; the hypothesis test is then very liberal and leads to too many false significances.
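The two selection criteria amount to a filter-then-rank rule. In the sketch below, the 7% threshold comes from the text, while the function name and the numbers in the usage example are hypothetical and are not results of this study.

```python
def select_methods(results: dict, max_type1: float = 0.07) -> list:
    """Keep methods whose type I error rate is below max_type1, ordered by decreasing power.

    results maps a method name to a (type_I_error_rate, power) pair, both as proportions.
    """
    admissible = {name: r for name, r in results.items() if r[0] < max_type1}
    return sorted(admissible, key=lambda name: admissible[name][1], reverse=True)


# Hypothetical simulation output, for illustration only
toy = {"method A": (0.048, 0.80), "method B": (0.091, 0.90), "method C": (0.055, 0.85)}
print(select_methods(toy))  # ['method C', 'method A']
```

Method B is discarded despite having the highest power, because its type I error rate is above the threshold.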
4. Simulation Experiments
Extensive Monte Carlo simulation experiments were carried out to study the asymptotic behaviour, measured in terms of type I error rates and powers, of the test statistics presented in Section 2. These experiments, performed with the R program [37], consisted of generating random samples of different sizes n from a multinomial distribution with the probabilities given in Table 1. Following the idea of Fagerland et al. [12], the probabilities were re-parameterized in terms of the two proportions and an odds ratio. To study the type I error rates, the two proportions were set equal, and to study the powers, they were set different. Several values were taken for the two proportions and for the odds ratio, so that a wide range of scenarios was considered to reveal the asymptotic behaviour of each test statistic. The type I error rates and the powers were calculated for the nominal error level α. Initial simulation experiments, generating a large number of random samples for several scenarios, showed that the results were already stable with a smaller number of samples, which was finally used to save computing time.
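A scaled-down sketch of one simulation scenario, written in Python although the study itself used R [37], estimates the type I error rate of MT by drawing multinomial samples under $H_0$ ($p_{12} = p_{21}$). The cell probabilities, n, number of replications, α = 0.05, seed, and function name are hypothetical choices, not the settings of the study, and samples without discordant pairs are counted as non-rejections.

```python
import random
from math import erfc, sqrt


def simulate_type1_mcnemar(probs, n: int, reps: int, alpha: float = 0.05, seed: int = 1) -> float:
    """Monte Carlo estimate of the type I error rate of the McNemar test (no cc).

    probs = (p11, p12, p21, p22) must satisfy p12 == p21 so that H0 holds.
    """
    rng = random.Random(seed)  # seeded generator for reproducibility
    rejections = 0
    for _ in range(reps):
        # One multinomial sample of size n, drawn as n categorical draws over the 4 cells
        cells = rng.choices(range(4), weights=probs, k=n)
        n12, n21 = cells.count(1), cells.count(2)
        nd = n12 + n21
        if nd == 0:
            continue  # statistic undefined: counted as a non-rejection
        chi2 = (n12 - n21) ** 2 / nd
        if erfc(sqrt(chi2 / 2.0)) < alpha:  # chi-square (1 df) upper-tail p-value
            rejections += 1
    return rejections / reps


# Hypothetical scenario: n = 100, p12 = p21 = 0.1, 2000 replications
print(simulate_type1_mcnemar((0.4, 0.1, 0.1, 0.4), n=100, reps=2000))
```

The same loop, with $p_{12} \neq p_{21}$, would estimate power instead of the type I error rate.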
4.1. Type I Error Rates
Table 2, Table 3, Table 4 and Table 5 show some of the results obtained for the type I error rates of the test statistics in different scenarios. For each scenario, basic descriptive statistics (mean and standard deviation) are also shown. Analyzing the results, the following conclusions can be drawn:
Table 2.
Type I error rates (in %) for and different scenarios.
Table 3.
Type I error rates (in %) for and different scenarios.
Table 4.
Type I error rates (in %) for and different scenarios.
Table 5.
Type I error rates (in %) for and different scenarios.
- Both exact tests (CET and UET) and the quasi-exact test (MidpT) are conservative methods, and their type I error rates never exceed the nominal error level α.
- All of the McNemar test statistics (MT, MTYcc, MTcc1, MTcc2, MTcc3, and MMT) are conservative when, in general terms, is not high. The value of decreases as the value of the odds ratio increases, so if , all four methods are conservative when (rounding up to the nearest whole value), and when , all four methods are conservative when . In each scenario, in general terms, the type I error rates of these methods fluctuate around when is higher than each one of the previous values. Likewise, continuity corrections do not improve the asymptotic behaviour of the type I error rates, especially when is high. When is small or moderate and , continuity corrections do not have a clear effect on the type I error rate, which sometimes improves and sometimes worsens.
- MidpT, MT, and UET have practically the same type I error rates when .
- Test statistics ODT and ODTcc lead to many false significances, since their type I error rates greatly exceed α. Therefore, neither method should be used.
- The other approximate methods (which are unconditioned methods) are conservative when , and, in very general terms, their type I error rates fluctuate around (without being too much higher) when . Some of these methods (WT, LRT, RRT, and ODMT) have type I error rates that fluctuate around (without being too much higher) when . Regarding the continuity corrections of the RRT and ODMT methods, they do not improve the asymptotic behaviour of their type I error rates.
4.2. Powers
Table 6, Table 7, Table 8 and Table 9 show some of the results obtained for the powers of the test statistics in different scenarios. These tables do not show the results for the test statistics ODT and ODTcc, since their type I error rates are clearly much higher than α. For each scenario, basic descriptive statistics are also shown. From the analysis of the results, the following conclusions are obtained:
Table 6.
Powers (in %) for , , and .
Table 7.
Powers (in %) for , , and .
Table 8.
Powers (in %) for , , and .
Table 9.
Powers (in %) for , , and .
- UET and MidpT have very similar powers, and both are slightly more powerful than CET, especially when the sample size is small.
- The classic McNemar test statistic without cc (MT) has the same power as the three McNemar test statistics with cc (MTcc1, MTcc2, and MTcc3), and all of them are more powerful than the McNemar test statistic with Yates cc (MTYcc).
- Methods MT, MTcc1, MTcc2, and MTcc3 have the same power as UET and as MidpT when .
- Regarding the approximate tests, in general terms, the WT, LRT, RRT, and ODMT methods have more power than the other approximate tests, especially when . When , if the difference between and is small (for example, ), then the WT, LRT, RRT, and ODMT methods have more power than the rest of the approximate methods; if the difference between and is greater (for example, ), then all of the approximate methods have very similar powers. The continuity corrections in the RRT and ODMT methods do not improve their powers.
5. General Rules of Application
From the results obtained in the simulation experiments and only considering sample size (as it is the only parameter set by the researcher), one can provide the following general rules of application for the test statistics:
- When the sample size is small, use the WT, LRT, RRT, or ODMT methods: they are the least conservative and the most powerful methods, and their powers are similar.
- When the sample size is moderate, use the WT, LRT, RRT, or ODMT methods: their type I error rates fluctuate around α, they are the most powerful, and their powers are similar.
- When the sample size is large, use the MT, WT, MWT, LRT, RRT, or ODMT methods: their type I error rates fluctuate around α, they are the most powerful, and their powers are very similar.
The graphs in Figure 1 show the type I error rates of the selected methods, and the graphs in Figure 2 show the powers of these methods for different scenarios. The graphs in Figure 1 show how the WT, LRT, RRT, and ODMT methods have a type I error rate with better behaviour than the MT and MWT methods when the sample size is small or moderate, with their values being very similar when the sample is large. In the graphs in Figure 2, it can be seen that the power of MT is a little lower than that of the other methods when the sample size is small. Likewise, the powers of these methods are very similar when the sample size is moderate or large.
Figure 1.
Type I error rates of the methods for .
Figure 2.
Powers of the methods for different scenarios.
6. Example
The results have been applied to the diagnosis of coronary artery disease using dobutamine echocardiography (DE, test 1) and myocardial perfusion scintigraphy (MPS, test 2) as diagnostic tests and coronary angiography (CA) as the gold standard. The objective of this study is to compare the sensitivities (specificities) of the two diagnostic tests. Table 10 shows the observed frequencies, the estimate of each sensitivity (Se) and of each specificity (Sp), and the results of each method for the respective comparisons. The two sensitivities (specificities) have been compared using the function “pairedProp”, an R function that compares two paired binomial proportions using the methods recommended in Section 5. This function is attached to the manuscript as Supplementary Material. The call to compare the two sensitivities is as follows:
and the call to compare the two specificities is as follows:
Table 10.
Diagnosis of coronary artery disease: frequencies and results of comparisons of sensitivities and specificities.
In this example, the numbers of patients with and without coronary artery disease are large, and therefore, all the methods recommended in Section 5 for large samples can be applied. The estimates of the sensitivities and specificities of the two diagnostic tests are given in Table 10. At the nominal error level α, the equality of the two sensitivities is rejected, and the equality of the two specificities is not rejected. It is concluded that the sensitivity of the DE test is significantly greater than that of the MPS test.
In this example, it can be seen that the p-values of all the methods to compare the two sensitivities (specificities) are very similar to each other, and therefore, the conclusions are the same.
7. Discussion
The comparison of two paired binomial proportions is a problem that appears frequently in medical and clinical studies. In the statistical literature, there are diverse methods proposed to solve this hypothesis test, and therefore, it is necessary to determine which methods have the best asymptotic behaviour in terms of the type I error rate and power. We reviewed 19 existing methods and proposed 5 new ones, and we carried out broad simulation experiments to study their asymptotic behaviour. From the results obtained, we have given some general rules of application for the methods studied.
May and Johnson [9] compared through simulation experiments the asymptotic behaviour of eight methods (CET, MidpT, MT, MTYcc, MMT, WT, MWT, and LRT) and recommended using the MidpT, MWT, and MT methods when it is verified that . May and Johnson used the criterion that the type I error rate must not be higher than .
Park [10] compared, using the same criterion as May and Johnson, the asymptotic behaviour of the CET, MT, LRT, and WT methods, concluding that the method with the best behaviour is the MT.
Fagerland et al. [11,12,13] also compared through simulation experiments the asymptotic behaviour of five methods: CET, MidpT, UET, MT, and MTYcc. These authors used the same criterion as May and Johnson and recommended using the MidpT and MT methods.
The studies of May and Johnson [9] and Fagerland et al. [11,12,13] used the same criterion to assess the type I error rates, and both studies recommended the MidpT and MT methods. Park [10] recommends the MT method.
Our criterion to assess the type I error rate of each method is more flexible, allowing the type I error rate of a method to be slightly higher than α without the method being considered too liberal. The type I error rate of an approximate test is expected to fluctuate around the nominal error level when the sample size is large, and therefore, it can be higher than the nominal error level; under our criterion, it can be slightly higher. The type I error rate of an exact test, in contrast, must not be higher than the nominal error level, as happens with the results obtained for CET and UET (Table 2, Table 3, Table 4 and Table 5).
The simulation experiments carried out allowed us to establish some general rules of application for the methods. The WT, LRT, RRT, and ODMT methods can be used whatever the sample size, and if the sample size is large, then the MT and MWT methods can also be applied. Of these six methods, two are conditioned methods (MT and LRT) and four are unconditioned (WT, MWT, RRT, and ODMT); therefore, the problem can be addressed without difficulty from both perspectives (conditioned and unconditioned), obtaining very similar results. Another important conclusion obtained from the simulation experiments is that continuity corrections do not improve the asymptotic behaviour of the studied methods. Therefore, although there are different methods in the statistical literature that incorporate continuity corrections, their application is not justified.
In this manuscript, we have studied the comparison of two paired proportions using hypothesis tests. An alternative is to carry out this comparison using confidence intervals instead of hypothesis tests. In this context, there are also numerous intervals (exact and approximate) that can be used [4,13,14,15,16]. In Fagerland et al. [12,13], the behaviour of some of the most commonly used intervals is compared, but this comparison may now be somewhat incomplete. Therefore, given that new confidence intervals have been proposed in recent years [14,15,16], it is of great practical interest to determine which intervals have the best asymptotic behaviour.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math12020190/s1.
Author Contributions
J.A.R.-N., T.S.S., and J.F.V.-V. have collaborated equally in the realization of this work. All authors have read and agreed to the published version of the manuscript.
Funding
This study was funded by Spanish Ministry of Science and Innovation, Grant “PID2021-126095NB-100” funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
The authors thank the anonymous referees and the Academic Editors (Elvira Di Nardo and José Luis Vicente Villardón) for their helpful comments that improved the quality of the manuscript.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Fay, M.P.; Hunsberger, S.A. Practical valid inferences for the two-sample binomial problem. Stat. Surv. 2021, 15, 72–110.
- Pepe, M.S. The Statistical Evaluation of Medical Tests for Classification and Prediction, 1st ed.; Oxford University Press: New York, NY, USA, 2003.
- Zhou, X.H.; Obuchowski, N.A.; McClish, D.K. Statistical Methods in Diagnostic Medicine, 2nd ed.; Wiley: Hoboken, NJ, USA, 2011.
- Pradhan, V.; Gangopadhyay, A.K.; Menon, S.M.; Basu, C.; Banerjee, T. Confidence Intervals for Discrete Data in Clinical Research, 1st ed.; Chapman & Hall/CRC: New York, NY, USA, 2021.
- McNemar, Q. Note on the sampling error of the differences between correlated proportions or percentages. Psychometrika 1947, 12, 153–157.
- Davis, C.S. Matched pairs with categorical data. In Encyclopedia of Biostatistics; Armitage, P., Colton, T., Eds.; Wiley: New York, NY, USA, 1998; Volume 3, pp. 2437–2441.
- Lachenbruch, P.A. McNemar test. In Encyclopedia of Biostatistics; Armitage, P., Colton, T., Eds.; Wiley: New York, NY, USA, 1998; Volume 3, pp. 2486–2487.
- Pembury Smith, M.Q.R.; Ruxton, G.D. Effective use of the McNemar test. Behav. Ecol. Sociobiol. 2020, 74, 133.
- May, W.L.; Johnson, W.D. The validity and power of tests for equality of two correlated proportions. Stat. Med. 1997, 16, 1081–1096.
- Park, T. Is the exact test better than the asymptotic test for testing marginal homogeneity in 2 × 2 tables? Biom. J. 2002, 44, 571–583.
- Fagerland, M.W.; Lydersen, S.; Laake, P. The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. BMC Med. Res. Methodol. 2013, 13, 91.
- Fagerland, M.W.; Lydersen, S.; Laake, P. Recommended tests and confidence intervals for paired binomial proportions. Stat. Med. 2014, 33, 2850–2875.
- Fagerland, M.W.; Lydersen, S.; Laake, P. Statistical Analysis of Contingency Tables; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017.
- Tang, M.L.; Ling, M.H.; Ling, L.; Tian, G. Confidence intervals for a difference between proportions based on paired data. Stat. Med. 2010, 29, 86–96.
- Pradhan, V.; Saha, K.K.; Banerjee, T.; Evans, J.C. Weighted profile likelihood-based confidence interval for the difference between two proportions with paired binomial data. Stat. Med. 2014, 33, 2984–2997.
- Fay, M.P.; Lumbard, K. Confidence intervals for difference in proportions for matched pairs compatible with exact McNemar’s or sign tests. Stat. Med. 2021, 40, 1147–1159.
- Chang, P.; Liu, R.; Hou, T.; Yan, X.; Shan, G. Continuity corrected score confidence interval for the difference in proportions in paired data. J. Appl. Stat. 2024, 51, 139–152.
- Agresti, A. Categorical Data Analysis, 3rd ed.; Wiley: New York, NY, USA, 2013; pp. 416–417.
- Lancaster, H.O. Significance tests in discrete distributions. J. Am. Stat. Assoc. 1961, 56, 223–234.
- Edwards, A.L. Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika 1948, 13, 185–187.
- Yates, F. Contingency tables involving small numbers and the χ² test. J. R. Stat. Soc. 1934, 1, 217–235.
- Martín-Andrés, A.; de Dios Luna del Castillo, J. 40 ± 10 Horas de Bioestadística; Norma-Capitel: Madrid, Spain, 2013.
- Bennett, B.M.; Underwood, R.E. On McNemar’s test for the 2 × 2 table and its power function. Biometrics 1970, 26, 339–343.
- Wald, A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. Soc. 1943, 54, 426–482.
- Lehmann, E.L.; Romano, J.P. Testing Statistical Hypotheses, 4th ed.; Springer: Cham, Switzerland, 2022; Chapter 14.
- Wilks, S.S. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 1938, 9, 60–62.
- Suissa, S.; Shuster, J.J. The 2 × 2 matched-pairs trial: Exact unconditional design and analysis. Biometrics 1991, 47, 361–372.
- Lu, Y. A revised version of McNemar’s test for paired binary data. Commun. Stat.-Theory Methods 2010, 39, 3525–3539.
- Lu, Y. Considering the concordant observations in likelihood ratio test for paired binary data. Commun. Stat.-Theory Methods 2011, 39, 4214–4232.
- Lu, Y.; Wang, M.; Zhang, G. A new revised version of McNemar’s test for paired binary data. Commun. Stat.-Theory Methods 2017, 46, 10010–10024.
- Haber, M. The continuity correction and statistical testing. Int. Stat. Rev. 1982, 50, 135–144.
- Irony, T.Z.; Pereira, C.A.; Tiwari, R.C. Analysis of opinion swing: Comparison of two correlated proportions. Am. Stat. 2000, 54, 57–62.
- Lui, K.J. Notes on testing equality in dichotomous data with matched pairs. Biom. J. 2001, 43, 313–321.
- Price, R.M.; Bonett, D.G. An improved confidence interval for a linear function of binomial proportions. Comput. Stat. Data Anal. 2004, 45, 449–456.
- Martín-Andrés, A.; Álvarez-Hernández, M. Two-tailed asymptotic inferences for a proportion. J. Appl. Stat. 2014, 41, 1516–1529.
- Martín-Andrés, A.; Álvarez-Hernández, M. Two-tailed approximate confidence intervals for the ratio of proportions. Stat. Comput. 2014, 24, 65–75.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 4 December 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).