Review

Recent Extensions to the Cochran–Mantel–Haenszel Tests

1
National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong 2522, Australia
2
School of Mathematical and Physical Sciences, University of Newcastle, Newcastle 2308, Australia
*
Author to whom correspondence should be addressed.
Stats 2018, 1(1), 98-111; https://doi.org/10.3390/stats1010008
Submission received: 28 June 2018 / Revised: 19 September 2018 / Accepted: 19 September 2018 / Published: 26 September 2018

Abstract

The Cochran–Mantel–Haenszel (CMH) methodology is a suite of tests applicable to particular tables of count data. The inference is conditional on the treatment and outcome totals on each stratum being known before sighting the data. The CMH tests are important for analysing randomised blocks data when the responses are categorical rather than continuous. This overview of some recent extensions to CMH testing first describes the traditional CMH tests and then explores new alternative presentations of the ordinal CMH tests. Next, the ordinal CMH tests will be extended so they can be used to test for higher moment effects. Finally, unconditional analogues of the extended CMH tests will be developed.

1. Introduction

The Cochran–Mantel–Haenszel (CMH) methodology is a suite of tests applicable to particular tables of count data. The inference is conditional on the treatment and outcome totals on each stratum being known before sighting the data. The CMH tests are applicable to more complex designs than randomised blocks but the analysis of randomised blocks data when the responses are categorical rather than continuous is certainly an important application of the CMH tests.
This paper is not intended to be a general review of the CMH methodology or even of recent CMH methodology. The principal aim is to present particular extensions introduced by the first author. An extensive literature survey of CMH-related topics would give the paper a different focus. We suggest that readers interested in CMH testing more broadly use their preferred search engines.
This overview will first describe the traditional CMH tests and then explore new alternative presentations of the ordinal CMH tests. These could be used to calculate the test statistics but their main use will perhaps be to develop intuition about CMH testing. Next, the ordinal CMH tests will be extended so they can be used to test for higher moment effects. One rationale for developing these extensions was to enable a comparison with the nonparametric ANOVA tests introduced in References [1,2,3]. These tests permit univariate moment assessments beyond the mean and bivariate assessments beyond the (order (1, 1)) correlation. The nonparametric ANOVA tests are therefore briefly reviewed before considering the CMH extensions. Finally, unconditional analogues of the extended CMH tests will be developed.

2. The CMH Tests

The CMH tests are a class of nonparametric tests used to analyse tables $\{N_{ihj}\}$ of count data of a particular structure. Specifically, $N_{ihj}$ counts the number of times treatment i is classified into outcome category h in the jth stratum, i = 1, …, t, h = 1, …, c and j = 1, …, b. Strata are independent and the treatments present in each stratum are fixed by design. As is usual, $N_{ihj}$ is a random variable and $n_{ihj}$ is a particular value of that random variable.
Dot notation is used to reflect summation over a subscript. In the traditional CMH tests (see, for example, References [4,5,6,7]),
  • the strata totals $\sum_{i,h} N_{ihj} = n_{\cdot\cdot j}$,
  • the treatment totals within strata, $\sum_{h} N_{ihj} = n_{i\cdot j}$, and
  • the outcome totals within strata, $\sum_{i} N_{ihj} = n_{\cdot hj}$
are all assumed to be known prior to sighting the data; they are not random variables. The inference is conditional.
The traditional CMH tests assess
  • overall partial association (OPA) ($S_{\mathrm{OPA}}$ is asymptotically distributed as $\chi^2_{b(c-1)(t-1)}$)
  • general association (GA) ($S_{\mathrm{GA}}$ is asymptotically distributed as $\chi^2_{(c-1)(t-1)}$)
  • mean scores (MS) ($S_{\mathrm{MS}}$ is asymptotically distributed as $\chi^2_{t-1}$) and
  • correlation (C) ($S_{\mathrm{C}}$ is asymptotically distributed as $\chi^2_{1}$).
In parentheses are the symbols used subsequently for each test statistic and the asymptotic null distributions of the test statistics. The test statistics are quadratic forms with vectors using the table counts. Some of the covariance matrices involve Kronecker products.
This design is appropriate for categorical randomised block data. While the CMH methodology is appropriate more generally, it won’t accommodate more complex designs such as Latin square, multifactor and many other designs. However, it is an extremely important analysis tool for randomised blocks when the responses are categorical rather than continuous. In consumer studies with just about right (JAR) responses such as in the Jams Example following, when c is small the randomised block F test will often be invalid in spite of the well-known ANOVA robustness.
Two examples will now be introduced. They will be reconsidered throughout this article.
Homosexual Marriage Example. Scores of 1, 2 and 3 are assigned to the responses agree, neutral and disagree respectively to the proposition “Homosexuals should be able to marry” and scores of 1, 2 and 3 are assigned for the religious categories fundamentalist, moderate and liberal respectively. See Table 1. Agresti [8] finds $S_{\mathrm{GA}}$ takes the value 19.76 with $\chi^2_4$ p-value 0.0006, $S_{\mathrm{MS}}$ takes the value 17.94 with $\chi^2_2$ p-value 0.0001, and $S_{\mathrm{C}}$ takes the value 16.83 with $\chi^2_1$ p-value less than 0.0001. From these tests, we conclude there is strong evidence of an association between the proposition responses and religion. In particular, there is evidence of mean differences in the responses and of a (linear-linear) correlation between responses and religion.
Jams Example. Three plum jams, A, B and C, are given JAR sweetness codes by eight judges as in Table 2. Here, 1 denotes not sweet enough, 2 not quite sweet enough, 3 just about right, 4 a little too sweet and 5 too sweet. The treatment sums for jams A, B and C are 18, 30 and 23 respectively. We find $S_{\mathrm{MS}}$ = 9.6177 with $\chi^2_2$ p-value 0.0082 and $S_{\mathrm{C}}$ = 1.1029 with $\chi^2_1$ p-value 0.2936. There is evidence of a mean effect; on average the jams are different. However, there is no evidence of a correlation effect: as we pass from jam A to B and then to C there is no evidence of an increasing (or decreasing) response.

3. The Nominal CMH Tests

The CMH MS test assumes that response categories are ordered while the CMH C test assumes that both treatment categories and response categories are ordered. It is therefore appropriate to label these as ordinal CMH tests, while the OPA and GA CMH tests, that make no assumption about ordering, can be called nominal CMH tests. Scores for the ordered categories are needed to apply the ordinal tests, while no scores are required for the nominal tests.
It is important to note that the traditional CMH tests are conditional tests, conditional on the treatment totals within strata, $\sum_{h} N_{ihj} = n_{i\cdot j}$, and the outcome totals within strata, $\sum_{i} N_{ihj} = n_{\cdot hj}$, being known prior to collecting or sighting the data. For randomised block data in which the responses are untied ranks, the marginal totals are known before collecting the data. This is because each treatment is applied once in every block and each response is a rank that is assigned once. This is not the case for data such as in the homosexual marriage example.
For conditional testing, the distribution theory involved in determining means and covariance matrices uses a product extended hypergeometric distribution. For each stratum, since the row and column totals are known, the counts Nihj follow an extended hypergeometric distribution. Moreover, since the strata are mutually independent, a product distribution is appropriate for the collection of these strata counts.
To define the CMH GA test statistic $S_{\mathrm{GA}}$, first define the vector of counts on the jth stratum, $U_j = (N_{11j}, \ldots, N_{1cj}, \ldots, N_{t1j}, \ldots, N_{tcj})^T$. Summing over strata gives
$$U = (N_{11\cdot}, \ldots, N_{1c\cdot}, \ldots, N_{t1\cdot}, \ldots, N_{tc\cdot})^T.$$
Using the extended hypergeometric distribution, it may be shown, as in Reference [4], that U has mean
$$E[U] = \left( \sum_{j=1}^{b} \frac{n_{i\cdot j}\, n_{\cdot hj}}{n_{\cdot\cdot j}} \right)$$
and covariance matrix
$$\operatorname{cov}(U) = \sum_{j=1}^{b} \operatorname{cov}(U_j) = \sum_{j=1}^{b} \frac{n_{\cdot\cdot j}^2}{n_{\cdot\cdot j} - 1}\, V_{Tj} \otimes V_{Cj}$$
in which, after writing $p_{i\cdot j} = n_{i\cdot j}/n_{\cdot\cdot j}$ and $p_{\cdot hj} = n_{\cdot hj}/n_{\cdot\cdot j}$,
$$V_{Tj} = \operatorname{diag}(p_{i\cdot j}) - (p_{i\cdot j})(p_{i\cdot j})^T \quad\text{and}\quad V_{Cj} = \operatorname{diag}(p_{\cdot hj}) - (p_{\cdot hj})(p_{\cdot hj})^T.$$
Here, ⊗ is the Kronecker or direct product.
The CMH GA test statistic is a quadratic form using $U - E[U]$ and an inverse of $\operatorname{cov}(U)$, namely $S_{\mathrm{GA}} = (U - E[U])^T \operatorname{cov}^{-}(U)\, (U - E[U])$. As $\operatorname{cov}(U)$ is not of full rank, either a generalised inverse $\operatorname{cov}^{-}(U)$ can be used or dependent variables can be chosen and omitted to produce a covariance matrix of full rank. All unknown parameters are estimated using maximum likelihood under the null hypothesis. Asymptotically, as the total count $n_{\cdot\cdot\cdot}$ becomes large, $S_{\mathrm{GA}}$ has the $\chi^2_{(c-1)(t-1)}$ distribution. The test statistic is symmetric in the treatments and outcome categories and independent of the choice of the dependent variables.
The test statistic SGA is too complicated for routine hand calculation; it is almost always applied using software in packages such as R.
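As a rough illustration of the second route mentioned above (omitting dependent variables), the following Python sketch computes the general association statistic after dropping the last treatment and the last outcome category, so that the reduced covariance matrix has full rank. The function names `solve` and `general_association`, the nested-list layout `strata[j][i][h]` and the toy counts are assumptions made for illustration only; they are not taken from the paper or from any statistical package.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    m = len(rhs)
    aug = [list(row) + [rhs[k]] for k, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def general_association(strata):
    """S_GA for strata[j][i][h] = count of treatment i, outcome h, stratum j."""
    t, c = len(strata[0]), len(strata[0][0])
    m = (t - 1) * (c - 1)                      # degrees of freedom (c - 1)(t - 1)
    diff = [0.0] * m                           # reduced U - E[U]
    cov = [[0.0] * m for _ in range(m)]        # reduced cov(U)
    for table in strata:
        row = [sum(r) for r in table]          # treatment totals in the stratum
        col = [sum(x) for x in zip(*table)]    # outcome totals in the stratum
        n = sum(row)                           # stratum total
        if n < 2:
            continue                           # such a stratum adds nothing
        scale = n * n / (n - 1)
        p = [r / n for r in row]
        q = [x / n for x in col]
        for i in range(t - 1):
            for h in range(c - 1):
                u = i * (c - 1) + h
                diff[u] += table[i][h] - row[i] * col[h] / n
                for i2 in range(t - 1):
                    for h2 in range(c - 1):
                        v = i2 * (c - 1) + h2
                        vt = (p[i] if i == i2 else 0.0) - p[i] * p[i2]
                        vc = (q[h] if h == h2 else 0.0) - q[h] * q[h2]
                        cov[u][v] += scale * vt * vc   # Kronecker product entry
    weights = solve(cov, diff)
    return sum(d * w for d, w in zip(diff, weights))


# Two made-up 2 x 2 strata: one with perfect association, one with none.
toy = [[[10, 0], [0, 10]], [[5, 5], [5, 5]]]
s_ga = general_association(toy)
```

For 2 × 2 strata the reduced problem is scalar, so the toy value can be checked by hand. For larger tables, production software would normally be preferred to a hand-rolled solver such as this one.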
To calculate $S_{\mathrm{OPA}}$, the vector of the quadratic form involves the aggregation of the $U_j$ via $U = (U_1^T, \ldots, U_b^T)^T$. Again, the covariance matrix is calculated using the product extended hypergeometric distribution. The overall partial association statistic so derived is given by
$$S_{\mathrm{OPA}} = \sum_{j=1}^{b} \frac{n_{\cdot\cdot j} - 1}{n_{\cdot\cdot j}} \sum_{i=1}^{t} \sum_{h=1}^{c} \frac{(N_{ihj} - n_{i\cdot j}\, n_{\cdot hj}/n_{\cdot\cdot j})^2}{n_{i\cdot j}\, n_{\cdot hj}/n_{\cdot\cdot j}}.$$
Asymptotically this has the $\chi^2_{b(c-1)(t-1)}$ distribution.
One difference between the two tests is that the CMH GA test is seeking to detect the average partial association while the CMH OPA test is seeking to detect the overall partial association. The former is more focused, with (c − 1)(t − 1) degrees of freedom, compared with b(c − 1)(t − 1) for the CMH OPA statistic. The degrees of freedom are the dimension of the alternative hypothesis space. Thus, the CMH OPA test is seeking to detect very general alternatives to the null hypothesis and will have a relatively low power for these alternatives. The CMH GA test seeks to detect fewer alternatives and will have more power than the CMH OPA test for these alternatives. However, the CMH OPA test will have some power for alternatives to which the CMH GA is insensitive. In other contexts, more focused tests have been constructed using components of an omnibus test statistic. This idea will be taken up subsequently.
An alternative test for the overall partial association is the Pearson test, with statistic
$$T_{\mathrm{OPA}} = \sum_{j=1}^{b} \sum_{i=1}^{t} \sum_{h=1}^{c} \frac{(N_{ihj} - n_{i\cdot j}\, n_{\cdot hj}/n_{\cdot\cdot j})^2}{n_{i\cdot j}\, n_{\cdot hj}/n_{\cdot\cdot j}}.$$
This is an unconditional test that does not assume all treatment and outcome totals are known before sighting the data. The difference between $S_{\mathrm{OPA}}$ and $T_{\mathrm{OPA}}$ is merely the factor $(n_{\cdot\cdot j} - 1)/n_{\cdot\cdot j}$ applied to each stratum. For large stratum counts, this will make little difference in the values of $S_{\mathrm{OPA}}$ and $T_{\mathrm{OPA}}$.
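The relationship between the two overall partial association statistics can be sketched in a few lines of Python. The function names `pearson_stratum` and `opa_statistics`, the nested-list layout `strata[j][i][h]` and the toy counts are hypothetical and chosen only for illustration.

```python
def pearson_stratum(table):
    """Return (X^2, n) for one non-empty stratum; table[i][h] is a count."""
    row = [sum(r) for r in table]            # treatment totals in the stratum
    col = [sum(x) for x in zip(*table)]      # outcome totals in the stratum
    n = sum(row)                             # stratum total
    x2 = 0.0
    for i, counts in enumerate(table):
        for h, observed in enumerate(counts):
            expected = row[i] * col[h] / n
            if expected > 0:                 # skip empty rows or columns
                x2 += (observed - expected) ** 2 / expected
    return x2, n


def opa_statistics(strata):
    """Return (S_OPA, T_OPA) for a list of stratum tables."""
    s_opa = t_opa = 0.0
    for table in strata:
        x2, n = pearson_stratum(table)
        t_opa += x2                          # unconditional Pearson contribution
        s_opa += (n - 1) / n * x2            # conditional CMH contribution
    return s_opa, t_opa


# Two made-up 2 x 2 strata of 20 observations each.
toy = [[[10, 0], [0, 10]], [[5, 5], [5, 5]]]
s_opa, t_opa = opa_statistics(toy)
```

In this toy case only the first stratum contributes, with a Pearson value of 20, so the conditional statistic is 19/20 of the unconditional one.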
Likewise general association can be tested for using a Pearson test for the two-way table of counts {Nih}, TGA say. The Pearson test statistics TOPA and TGA will have the same asymptotic distributions as the corresponding conditional tests. Most users will have more familiarity with the unconditional tests and most packages will have routines for their calculation even if they don’t have routines for the CMH tests.
Subsequently, the main focus will be the CMH ordinal tests.

4. The Ordinal CMH Tests

4.1. CMH Mean Scores Test

Suppose, as before, that $N_{ihj}$ counts the number of times treatment i is classified into outcome category h in the jth stratum, i = 1, …, t, h = 1, …, c and j = 1, …, b. Assume that outcomes are ordinal and assign the score $b_{hj}$ to the hth response on the jth stratum. All marginal totals are assumed to be known, so the product extended hypergeometric model is assumed. The score sum for treatment i in stratum j is $M_{ij} = \sum_{h=1}^{c} b_{hj} N_{ihj}$. Taking expectations, $E[M_{ij}] = n_{i\cdot j} \sum_{h=1}^{c} b_{hj}\, n_{\cdot hj}/n_{\cdot\cdot j}$ since $E[N_{ihj}] = n_{i\cdot j}\, n_{\cdot hj}/n_{\cdot\cdot j}$. If $M_j = (M_{ij})$ then the inference is based on $M = \sum_{j=1}^{b} M_j$ through the quadratic form
$$S_{\mathrm{MS}} = (M - E[M])^T \operatorname{cov}^{-}(M)\, (M - E[M])$$
in which all unknown parameters are estimated by maximum likelihood under the null hypothesis. Now, $E[M] = \sum_{j=1}^{b} E[M_j]$ and, as strata are independent, $\operatorname{cov}(M) = \sum_{j=1}^{b} \operatorname{cov}(M_j)$. To give $\operatorname{cov}(M_j)$, first define
$$S_j^2 = \frac{n_{\cdot\cdot j} \sum_{h=1}^{c} b_{hj}^2\, n_{\cdot hj} - \left( \sum_{h=1}^{c} b_{hj}\, n_{\cdot hj} \right)^2}{n_{\cdot\cdot j} - 1}.$$
Then it may be shown that $\operatorname{cov}(M_j) = S_j^2\, V_{Tj}$. Since $\operatorname{cov}(M_j)$ is not of full rank, the usual approaches, such as dropping appropriate treatment and/or outcome categories, or using a generalised inverse, can be used.
Note. The mean scores statistic depends on the scores assigned to the response categories. Thus, the statistic could be written SMS({bhj}) to emphasise this dependence.
Aside. The derivation of $\operatorname{cov}(M_j)$ requires routine but tedious algebra. If $\delta_{uv}$ is the Kronecker delta, equal to 1 if u = v and zero otherwise, then using standard distribution theory for the product extended hypergeometric distribution, $E[N_{ihj}] = n_{i\cdot j}\, n_{\cdot hj}/n_{\cdot\cdot j}$ and the covariance between $N_{ihj}$ and $N_{i'h'j}$ is
$$n_{i\cdot j}\, n_{\cdot hj}\, (\delta_{ii'}\, n_{\cdot\cdot j} - n_{i'\cdot j})(\delta_{hh'}\, n_{\cdot\cdot j} - n_{\cdot h'j}) / \{ n_{\cdot\cdot j}^2 (n_{\cdot\cdot j} - 1) \}.$$
It follows that
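$$\operatorname{var}\Big( \sum_{h} b_{hj} N_{ihj} \Big) = n_{i\cdot j}\, (n_{\cdot\cdot j} - n_{i\cdot j})\, S_j^2 / n_{\cdot\cdot j}^2 \quad\text{and}$$
$$\operatorname{cov}\Big( \sum_{h} b_{hj} N_{ihj},\ \sum_{h} b_{hj} N_{i'hj} \Big) = -\, n_{i\cdot j}\, n_{i'\cdot j}\, S_j^2 / n_{\cdot\cdot j}^2 \quad (i \neq i'),$$
in which the factor $n_{\cdot\cdot j} - 1$ in the covariance of the counts has been absorbed into the divisor of $S_j^2$.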
These lead to the stated covariance matrix.
Under the null hypothesis of no treatment effects, the distribution of $S_{\mathrm{MS}}$ can be shown to be asymptotically $\chi^2_{t-1}$; see Reference [4].
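The construction of the mean scores statistic can be sketched as follows, again dropping the last treatment to obtain a covariance matrix of full rank. This is a minimal sketch that assumes the same response scores on every stratum (the text allows stratum-specific scores); the names `solve` and `mean_scores`, the data layout `strata[j][i][h]` and the toy data are hypothetical.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    m = len(rhs)
    aug = [list(row) + [rhs[k]] for k, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def mean_scores(strata, scores):
    """S_MS for strata[j][i][h]; scores[h] is the score of outcome h."""
    t = len(strata[0])
    diff = [0.0] * (t - 1)                         # reduced M - E[M]
    cov = [[0.0] * (t - 1) for _ in range(t - 1)]  # reduced cov(M)
    for table in strata:
        row = [sum(r) for r in table]              # treatment totals
        col = [sum(x) for x in zip(*table)]        # outcome totals
        n = sum(row)                               # stratum total
        if n < 2:
            continue
        s1 = sum(b * nh for b, nh in zip(scores, col))
        s2 = sum(b * b * nh for b, nh in zip(scores, col))
        sj2 = (n * s2 - s1 ** 2) / (n - 1)         # S_j^2 as defined above
        for i in range(t - 1):
            m_ij = sum(b * x for b, x in zip(scores, table[i]))
            diff[i] += m_ij - row[i] * s1 / n
            for i2 in range(t - 1):
                vt = (row[i] / n if i == i2 else 0.0) - row[i] * row[i2] / n ** 2
                cov[i][i2] += sj2 * vt             # entry of S_j^2 V_Tj
    weights = solve(cov, diff)
    return sum(d * w for d, w in zip(diff, weights))


# Three made-up blocks in which treatments A, B, C always receive ranks 1, 2, 3.
blocks = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]]] * 3
s_ms = mean_scores(blocks, [1, 2, 3])
```

For this toy layout of untied ranks the result can be checked by hand against the Friedman statistic, which is 12 × (9 + 0 + 9)/(3 × 3 × 4) = 6.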

4.2. The CMH Correlation Test

The CMH correlation test assumes that the treatment and response variables are both measured on either an ordinal or an interval scale. On the jth stratum, j = 1, …, b, the treatment scores are $a_{ij}$, i = 1, …, t, and the response scores are $b_{hj}$, h = 1, …, c.
The null hypothesis of no association between the treatment and response variables, having adjusted for the b strata, is tested against the alternative that across strata there is a consistent association, positive or negative, between the treatment scores and response scores.
Take
  • $C_j = \sum_{i} \sum_{h} a_{ij}\, b_{hj}\, \{ N_{ihj} - E[N_{ihj}] \}$ and
  • $C = \sum_{j} C_j$.
The CMH correlation (CMH C) statistic is $C^2/\operatorname{var}(C) = S_{\mathrm{C}}$, say. The derivation of $\operatorname{var}(C)$ is relatively complex if scalars are used but is routine using Kronecker products.
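To derive $\operatorname{var}(C)$, first define $a_j = (a_{1j}, \ldots, a_{tj})^T$, $b_j = (b_{1j}, \ldots, b_{cj})^T$ and $N_j = (N_{11j}, \ldots, N_{1cj}, \ldots, N_{t1j}, \ldots, N_{tcj})^T$. Then $C_j = (a_j \otimes b_j)^T (N_j - E[N_j])$ and
$$\begin{aligned} \operatorname{var}(C_j) &= E\big[ (a_j \otimes b_j)^T (N_j - E[N_j]) (N_j - E[N_j])^T (a_j \otimes b_j) \big] \\ &= (a_j \otimes b_j)^T\, E\big[ (N_j - E[N_j]) (N_j - E[N_j])^T \big]\, (a_j \otimes b_j) \\ &= (a_j \otimes b_j)^T \operatorname{cov}(N_j)\, (a_j \otimes b_j). \end{aligned}$$
Recall from Section 3 that with $p_{i\cdot j} = n_{i\cdot j}/n_{\cdot\cdot j}$ and $p_{\cdot hj} = n_{\cdot hj}/n_{\cdot\cdot j}$, we have $V_{Tj} = \operatorname{diag}(p_{i\cdot j}) - (p_{i\cdot j})(p_{i\cdot j})^T$ and $V_{Cj} = \operatorname{diag}(p_{\cdot hj}) - (p_{\cdot hj})(p_{\cdot hj})^T$. Now, from Reference [4],
$$\operatorname{cov}(N_j) = \frac{n_{\cdot\cdot j}^2}{n_{\cdot\cdot j} - 1}\, (V_{Tj} \otimes V_{Cj}).$$
Hence,
$$\operatorname{var}(C_j) = \frac{n_{\cdot\cdot j}^2}{n_{\cdot\cdot j} - 1}\, (a_j^T V_{Tj}\, a_j \otimes b_j^T V_{Cj}\, b_j) = \frac{n_{\cdot\cdot j}^2}{n_{\cdot\cdot j} - 1}\, (a_j^T V_{Tj}\, a_j)(b_j^T V_{Cj}\, b_j)$$
because both factors in the Kronecker product are scalars. Finally, $\operatorname{var}(C) = \sum_{j} \operatorname{var}(C_j)$ because counts in different strata are mutually independent. The CMH correlation statistic $S_{\mathrm{C}}$ is now fully specified.
The Central Limit Theorem assures the asymptotic normality of C, so as the total sample size $n = n_{\cdot\cdot 1} + \cdots + n_{\cdot\cdot b}$ approaches infinity, $S_{\mathrm{C}}$ has asymptotic distribution $\chi^2_1$. Again, see Reference [4].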

5. Alternative Presentations of the Ordinal CMH Test Statistics

In this section, the focus is on alternative expressions for the ordinal CMH test statistics. In the case of randomised block data, simple expressions are given for the ready calculation of the test statistics. The expression for the correlation statistic is more general and quite insightful, so that will be considered first.

5.1. The CMH Correlation Statistic

Using the definitions previously given for VTj and VCj we have
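$$n_{\cdot\cdot j}\, a_j^T V_{Tj}\, a_j = \sum_{i} a_{ij}^2\, n_{i\cdot j} - \Big( \sum_{i} a_{ij}\, n_{i\cdot j} \Big)^2 / n_{\cdot\cdot j} \quad\text{and}$$
$$n_{\cdot\cdot j}\, b_j^T V_{Cj}\, b_j = \sum_{h} b_{hj}^2\, n_{\cdot hj} - \Big( \sum_{h} b_{hj}\, n_{\cdot hj} \Big)^2 / n_{\cdot\cdot j}.$$
On stratum j now define
$$S_{XXj} = \sum_{i} a_{ij}^2\, n_{i\cdot j} - \Big( \sum_{i} a_{ij}\, n_{i\cdot j} \Big)^2 / n_{\cdot\cdot j},$$
$$S_{XYj} = \sum_{i} \sum_{h} a_{ij}\, b_{hj}\, N_{ihj} - \Big( \sum_{i} a_{ij}\, n_{i\cdot j} \Big) \Big( \sum_{h} b_{hj}\, n_{\cdot hj} \Big) / n_{\cdot\cdot j} \quad\text{and}$$
$$S_{YYj} = \sum_{h} b_{hj}^2\, n_{\cdot hj} - \Big( \sum_{h} b_{hj}\, n_{\cdot hj} \Big)^2 / n_{\cdot\cdot j}.$$
With appropriate divisors the $S_{XXj}$, $S_{XYj}$ and $S_{YYj}$ give unbiased estimators of the stratum variances and covariances. With these definitions,
$$C_j = S_{XYj}, \quad n_{\cdot\cdot j}\, a_j^T V_{Tj}\, a_j = S_{XXj} \quad\text{and}\quad n_{\cdot\cdot j}\, b_j^T V_{Cj}\, b_j = S_{YYj}.$$
Finally, since $C = \sum_{j} C_j$ and $\operatorname{var}(C) = \sum_{j} \operatorname{var}(C_j)$, with $\operatorname{var}(C_j) = S_{XXj}\, S_{YYj}/(n_{\cdot\cdot j} - 1)$,
$$S_{\mathrm{C}} = C^2 / \operatorname{var}(C) = \frac{\big\{ \sum_{j} S_{XYj} \big\}^2}{\sum_{j} \big\{ S_{XXj}\, S_{YYj} / (n_{\cdot\cdot j} - 1) \big\}}.$$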
The expression here uses the SXXj, SXYj and SYYj familiar in formulae for regression coefficients.
Three special cases will be considered:
(1)
The data consists of one stratum only;
(2)
the treatment scores are independent of the strata: aij = ai for all i and j;
(3)
the randomised block design (in Section 5.2).
In the second case, SXX is constant over strata. This gives a slight simplification of the SC formula. See the Jams Example below. If the data come from a randomised block design, a considerable simplification is possible if the same treatment and response scores are used on each stratum or block. See Section 5.2.
In the first case, the CMH correlation statistic simplifies to
$$S_{\mathrm{C}} = \frac{(n_{\cdot\cdot 1} - 1)\, S_{XY1}^2}{S_{XX1}\, S_{YY1}} = (n_{\cdot\cdot 1} - 1)\, r_P^2,$$
in which $r_P$ is the Pearson correlation coefficient. This is well known. See, for example, Reference [6], p. 253.
If we now write $r_{Pj}$ for the Pearson correlation in the jth stratum, it follows that, since $S_{XYj} = r_{Pj} \sqrt{S_{XXj}\, S_{YYj}}$,
$$S_{\mathrm{C}} = \frac{\big\{ \sum_{j} r_{Pj} \sqrt{S_{XXj}\, S_{YYj}} \big\}^2}{\sum_{j} \big\{ S_{XXj}\, S_{YYj} / (n_{\cdot\cdot j} - 1) \big\}}.$$
$S_{\mathrm{C}}$ is proportional to the square of a linear combination of the Pearson correlations in each stratum. The proportionality factor ensures $S_{\mathrm{C}}$ has the asymptotic $\chi^2_1$ distribution. This formula demonstrates how the Pearson correlations in each stratum contribute to the overall correlation measure.
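The stratum sums of squares and products, the stratum Pearson correlations and the resulting correlation statistic can be sketched in Python as follows. The sketch assumes the same treatment and response scores on every stratum and non-degenerate strata (non-zero sums of squares and at least two observations); the function names, the layout `strata[j][i][h]` and the toy data are hypothetical.

```python
import math


def stratum_sums(table, a, b):
    """Return (n, S_XX, S_XY, S_YY) for one stratum with scores a and b."""
    row = [sum(r) for r in table]            # treatment totals
    col = [sum(x) for x in zip(*table)]      # outcome totals
    n = sum(row)                             # stratum total
    sum_a = sum(ai * ni for ai, ni in zip(a, row))
    sum_b = sum(bh * nh for bh, nh in zip(b, col))
    s_xx = sum(ai * ai * ni for ai, ni in zip(a, row)) - sum_a ** 2 / n
    s_yy = sum(bh * bh * nh for bh, nh in zip(b, col)) - sum_b ** 2 / n
    cross = sum(a[i] * b[h] * table[i][h]
                for i in range(len(a)) for h in range(len(b)))
    s_xy = cross - sum_a * sum_b / n
    return n, s_xx, s_xy, s_yy


def cmh_correlation(strata, a, b):
    """Return (S_C, list of stratum Pearson correlations)."""
    numerator = 0.0
    denominator = 0.0
    pearson = []
    for table in strata:
        n, s_xx, s_xy, s_yy = stratum_sums(table, a, b)
        numerator += s_xy                        # C is the sum of the S_XYj
        denominator += s_xx * s_yy / (n - 1)     # var(C_j)
        pearson.append(s_xy / math.sqrt(s_xx * s_yy))
    return numerator ** 2 / denominator, pearson


# One made-up 2 x 2 stratum with perfect association (n = 20).
s_c_single, r_single = cmh_correlation([[[10, 0], [0, 10]]], [1, 2], [1, 2])

# Three made-up blocks in which treatments A, B, C always receive scores 1, 2, 3.
blocks = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]]] * 3
s_c_blocks, r_blocks = cmh_correlation(blocks, [1, 2, 3], [1, 2, 3])
```

In the single-stratum case the statistic reduces to (n − 1) times the squared Pearson correlation, here 19 × 1.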
Homosexual Marriage Example. These data were introduced in Section 2; they are given in Table 1. Noting that stratum 1 is school and stratum 2 is college, we find $S_{XX1}$ = 50, $S_{XY1}$ = −9 and $S_{YY1}$ = 39.7333, so that $r_{P1}$ = −0.2019, and the CMH C statistic for school takes the value 2.4055 with $\chi^2_1$ p-value 0.1209. Similarly, $S_{XX2}$ = 51.6712, $S_{XY2}$ = −23.8904 and $S_{YY2}$ = 42.6301, so that $r_{P2}$ = −0.5090, and the CMH C statistic for college takes the value 18.6558 with $\chi^2_1$ p-value less than 0.0001. From these, the value of the CMH C statistic and its $\chi^2$ p-value are confirmed: it was previously noted that $S_{\mathrm{C}}$ takes the value 16.8328 with $\chi^2_1$ p-value less than 0.0001. Clearly, the Pearson correlation is not significant for school and is highly significant for college. The latter dominates the former so that overall there is strong evidence of a correlation effect: as religion becomes increasingly liberal there is greater agreement with the proposition that homosexuals should be able to marry. This is due mainly to the stratum college.
Whiskey Example. O’Mahony [9], p. 363 gave the data in Table 3, which were analysed in Reference [10]. They use mid-rank scores and find the Spearman correlation, which takes the value 0.73. In testing if this is zero against two-sided alternatives they give a Monte Carlo p-value of 0.09 and an asymptotic p-value of 0.04.
Using scores 1, 2 and 3 for grade and 1, 5 and 7 for years of maturity, we find $S_{XX}$ = 6, $S_{XY}$ = −12 and $S_{YY}$ = 43.5. It follows that the Pearson correlation is −0.7428 and the CMH C statistic takes the value 3.8621 with $\chi^2_1$ p-value 0.0494. There is some evidence that as maturity increases so does the grade of the whiskey.
Jams Example. As there are eight strata, hand calculation is possible if a little tedious. As SXX is constant over strata, it is not too much extra work to calculate SXY, SYY, the Pearson correlation, the CMH correlation statistic and its p-value on each stratum (See Table 4).
We find $S_{XX}$ = 2 on all strata and $S_{\mathrm{C}}$ = 1.1029 with $\chi^2_1$ p-value 0.2936. There is no significant correlation effect, which would, if present, indicate that as we pass from jam A to B to C there is increasing (or decreasing) sweetness. It could be that overall there is no correlation effect with the contrary being the case in a minority of strata. That is not the case here; no stratum shows evidence of even a slight correlation effect. Here this is hardly surprising; with only three observations in each stratum, there can be little power in testing for a correlation.
If there is interest in the contributions to the correlation from individual strata (as in the homosexual marriage example) this is a reasonable approach. However, if there is not, then for the randomised blocks design with the same treatment scores in each stratum, a considerable simplification is possible. This is now considered.

5.2. The Randomised Blocks Design

As developed, the CMH methodology does not apply to Latin square, multifactor and many other designs. However, it is an extremely important analysis tool for randomised blocks when the responses are categorical rather than continuous. See, for example, Reference [6], Chapter 8, who use CMH methods to analyse repeated measures categorical data.
In the case of the randomised block design, a considerable simplification of the CMH MS statistic is possible. In this case, $n_{i\cdot j} = 1$ for all i and j. Consequently, $n_{\cdot\cdot j} = t$, $n_{i\cdot\cdot} = b$, $n_{\cdot\cdot\cdot} = bt$ and $p_{i\cdot j} = n_{i\cdot j}/n_{\cdot\cdot j} = 1/t$. Substituting in the definitions given in Section 4 ultimately gives
$$S_{\mathrm{MS}} = \frac{b\,(t - 1)\, F}{b - 1 + F}$$
in which F is the ANOVA F test statistic. A derivation of this relationship is given in the Supplementary Materials. The significance of this result is that the CMH MS and ANOVA F tests are formally equivalent.
Coincidentally, the given relationship is the same as that relating the Friedman test statistic and the F test statistic using the ranks as data.
p-Values may be obtained by referring F to its $F_{(t-1),\,(b-1)(t-1)}$ distribution or $S_{\mathrm{MS}}$ to its $\chi^2_{t-1}$ distribution, or otherwise. An empirical study would be required to assess which is the more reliable in the sense of closeness to, say, permutation test p-values. In the examples we have analysed, there has been little difference in these methods.
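The formal equivalence of the two tests rests on the relationship being a monotone one-to-one map between F and the mean scores statistic. A small Python sketch of the map and of its inverse (obtained here by rearranging the stated relationship, not quoted from the paper) is given below; the function names and the illustrative values of F are hypothetical.

```python
def ms_from_f(f, b, t):
    """Mean scores statistic from the ANOVA F for b blocks and t treatments."""
    return b * (t - 1) * f / (b - 1 + f)


def f_from_ms(s_ms, b, t):
    """Inverse map; requires s_ms to be below its upper bound b(t - 1)."""
    return (b - 1) * s_ms / (b * (t - 1) - s_ms)


b, t = 8, 3                                  # e.g. eight judges and three products
f_values = [0.5, 1.0, 2.0, 4.0, 8.0]         # arbitrary illustrative F values
ms_values = [ms_from_f(f, b, t) for f in f_values]
round_trip = [f_from_ms(s, b, t) for s in ms_values]
```

Because the map is increasing in F, ordering outcomes by F or by the mean scores statistic gives the same ordering, although the two reference distributions can give different p-values.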
The CMH C test statistic with the same scores in every stratum is
$$S_{\mathrm{C}} = \frac{(t - 1)\, C^2}{S_{XX} \sum_{j} S_{YYj}}$$
in which C is the sum of the products of the treatment sums and the centred treatment scores, $S_{XX}$ is the sum of the squares of the centred treatment scores and $\sum_{j} S_{YYj}$ may be read from the output for a one-way ANOVA. Again, the derivation is given in the Supplementary Materials.
Jams Example. For the jams data, $a_i$ = 1, 2 and 3, so $S_{XX} = 2$. From a one-way ANOVA with judges as treatments, $\sum_{j} S_{YYj} = 22\tfrac{2}{3}$. This can be confirmed by summing across the $S_{YY}$ row in Table 4. By summing the columns in Table 2, the treatment score sums are found to be 18, 30 and 23, and since the centred treatment scores are −1, 0 and 1, C = −18 + 0 + 23 = 5. Substituting gives $S_{\mathrm{C}}$ = 2 × 25/(2 × 68/3) = 75/68 = 1.1029 with $\chi^2_1$ p-value 0.2936, as previously.
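The arithmetic in the Jams Example can be reproduced with a few lines of Python, using only the summary values quoted in the text. The variable names are hypothetical, and the p-value uses the fact that the upper tail of a chi-squared distribution with one degree of freedom can be written with the complementary error function.

```python
import math

t = 3                                   # number of jams
treatment_sums = [18, 30, 23]           # jams A, B and C, from the text
centred_scores = [-1, 0, 1]             # treatment scores 1, 2, 3 centred

# C is the sum of products of treatment sums and centred treatment scores.
c_stat = sum(s * a for s, a in zip(treatment_sums, centred_scores))

s_xx = sum(a * a for a in centred_scores)    # sum of squared centred scores
sum_s_yy = 68.0 / 3.0                        # 22 2/3, as quoted in the text

s_c = (t - 1) * c_stat ** 2 / (s_xx * sum_s_yy)

# Upper tail of chi-squared with 1 degree of freedom: P(Z^2 > s_c).
p_value = math.erfc(math.sqrt(s_c / 2.0))
```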

6. Nonparametric ANOVA

A suite of tests that can analyse the same data as the CMH tests is the nonparametric ANOVA tests of References [1,2,3]. However, these tests are applicable for a wider range of designs and are more flexible than the CMH tests, in that they assess both moment effects and generalised correlation (GC) effects of all orders, although only lower order tests would be applied in practice. The extension of the CMH tests considered in the next section makes the CMH tests as flexible as the nonparametric ANOVA tests.
The nonparametric ANOVA tests are nonparametric in that they are based on weak multinomial models incorporating smooth alternatives. The sampling distributions of the test statistics are not known, but the usual F distributions provide approximations that agree very well with permutation test results and, of course, are very convenient. Responses for both the CMH and nonparametric ANOVA are categorical, although the use of the term ANOVA suggests continuous responses. In fact, the issue of whether the data are categorical or continuous is of little practical importance. The nonparametric ANOVA tests are robust to even highly categorical data and, while the sampling distributions for the CMH tests are asymptotic, they are quite good even for small sample sizes. A simulation study in Reference [11] showed that for randomised blocks data both suites of tests showed good agreement between their nominal significance levels and the test sizes achieved.
It is interesting to note that if the data are ranked then the first order nonparametric ANOVA F tests are equivalent to the Kruskal-Wallis test when the design is the completely randomised design and to the Friedman test for the randomised blocks design. See, for example, Reference [12], Section 2.5.
Nonparametric ANOVA may be applied to designs consistent with the general linear model (see Reference [3]) while, of course, the design for the CMH extensions is much more restrictive. There are two types of nonparametric ANOVA: ordered and unordered. For the unordered analysis, orthonormal polynomials on the response variable up to a given level, typically three, are constructed. For any given ANOVA, the analysis with the responses transformed by the order r orthonormal polynomial is called the order r analysis. Analyses of different orders are uncorrelated, so significance or not at any given order doesn’t affect the significance or not at any other order.
For a given design there will usually be several effects, such as main effects, interactions and so on. The order r analysis tests null hypotheses that the responses transformed by the rth orthonormal polynomial are consistent across levels of these effects. Suppose, for example, that one effect is the application of treatment A. Analyses of successive orders seek to assess whether or not the responses transformed by the first, second and higher order orthonormal polynomials are consistent across levels of treatment A.
The ordered analysis assumes that at least one of the independent variables is ordered in the sense that the levels of the effects corresponding to that variable are ordered. Orthonormal polynomials are constructed on the response variable and on each ordered independent variable. The responses are transformed by an orthonormal polynomial of a particular order and each independent variable is also transformed by an orthonormal polynomial of a particular order. Then the product of the transformed variables is taken. A new, reduced design is formed from the original. A new response, the product of the orthonormal polynomials, replaces the original response and the ordered independent variables are removed. ANOVAs of interest on the reduced design are then performed. These analyses assess generalised correlations, for which see Reference [13]. As with the unordered analysis, the conclusions from one ANOVA do not affect the conclusions from other ANOVAs. Of most interest is the usual order (1, 1) correlation that assesses linear-linear effects and order (1, 2) and (2, 1) correlations that assess umbrella effects.
Both the extended CMH tests and the unordered and ordered nonparametric ANOVAs assume the existence of the orthonormal polynomials. An orthonormal polynomial of order r requires the existence of moments up to order 2r while the corresponding orthogonal polynomial requires the existence of moments up to order r + 1. However, as responses are classified into, say, c classes, it is only possible to calculate moments of the responses up to order c − 1. Thus, moments, orthonormal polynomials and hence tests of certain orders are not available for some parameter choices.

7. Extensions of the CMH Mean Scores and Correlation Tests

To construct extensions to $S_{\mathrm{MS}}$, consider the scores $b_{hj}$ on the jth stratum. A common choice of scores to give a ‘mean’ assessment would be the ‘natural’ scores 1, 2, …, c on all strata. A ‘dispersion’ assessment could be given by choosing scores $1^2, 2^2, \ldots, c^2$ and similarly higher order powers might be of interest: $b_{hj} = h^r$. One problem with using the scores $b_{hj} = h^r$ is that the test statistics are correlated. Thus, the significance or not of the test for any order may affect the significance or not of tests at other orders. We now look at using more general scores with the objective of having uncorrelated test statistics. To this end, now denote order r scores that are not stratum-specific by $b_{hr}$, h = 1, …, c. Define the order r score sum for treatment i by $\sum_{j=1}^{b} \sum_{h=1}^{c} b_{hr} N_{ihj} = \sum_{h=1}^{c} b_{hr} N_{ih\cdot} = M_{ir}$. Now suppose that $\{b_{hr}\}$ are orthonormal using the weight function $\{n_{\cdot h\cdot}/n_{\cdot\cdot\cdot}\}$. Then for $r \neq s$ = 1, 2, …, t it can be shown that $\operatorname{cov}(M_{ir}, M_{is}) = 0$: the ith score sums of different orders, $M_{ir}$ and $M_{is}$, are uncorrelated. In this sense, the information provided by the score sums of different orders for the same treatment is not related.
To construct extensions to $S_{\mathrm{C}}$, suppose that instead of a single set of outcome scores $\{b_{hj}\}$, we consider c sets of scores, $\{b_{hj}^{(s)}\}$ for s = 0, 1, …, c − 1. Moreover, suppose the scores are orthonormal in the sense that $\sum_{h} b_{hj}^{(r)} b_{hj}^{(s)}\, n_{\cdot hj}/n_{\cdot\cdot j} = \delta_{rs}$ with r, s = 0, 1, …, c − 1 and with $b_{hj}^{(0)} = 1$ for h = 1, …, c. Similarly, instead of a single set of treatment scores $\{a_{ij}\}$, consider t sets of scores, $\{a_{ij}^{(r)}\}$ for r = 0, 1, …, t − 1 and suppose these scores are orthonormal in the sense that $\sum_{i} a_{ij}^{(r)} a_{ij}^{(s)}\, n_{i\cdot j}/n_{\cdot\cdot j} = \delta_{rs}$ with r, s = 0, 1, …, t − 1 and with $a_{ij}^{(0)} = 1$ for i = 1, …, t. Both sets of scores may be stratum-specific. Define $S_{\mathrm{C}rs}$ as before but using $a_{ij}^{(r)}$ and $b_{hj}^{(s)}$. It can be shown that the $S_{\mathrm{C}rs}$ are uncorrelated. See Reference [11] for details.
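One way to obtain scores that are orthonormal with respect to a set of category weights is Gram-Schmidt orthogonalisation of the powers 1, h, h², … of the natural scores. The Python sketch below takes that route; it is only one possible construction and is not claimed to be the one used by the authors. The function name `orthonormal_scores`, the equal weights and the tolerance are assumptions for illustration.

```python
def orthonormal_scores(weights, max_order):
    """Scores b[r][h], r = 0..max_order, with sum_h w_h b[r][h] b[s][h] = delta_rs.

    weights[h] plays the role of the category proportions (summing to 1) and
    the natural scores 1, 2, ..., c are used as the starting point.
    """
    c = len(weights)
    basis = []
    for r in range(max_order + 1):
        v = [float(h ** r) for h in range(1, c + 1)]      # scores h^r
        for u in basis:                                   # remove lower orders
            proj = sum(w * x * y for w, x, y in zip(weights, v, u))
            v = [x - proj * y for x, y in zip(v, u)]
        norm = sum(w * x * x for w, x in zip(weights, v)) ** 0.5
        if norm < 1e-12:
            raise ValueError("order too high for the number of occupied categories")
        basis.append([x / norm for x in v])
    return basis


# Three equally weighted categories (made-up weights), orders 0, 1 and 2.
weights = [1 / 3, 1 / 3, 1 / 3]
basis = orthonormal_scores(weights, 2)

# Weighted inner products between all pairs of orders; should be the identity.
gram = [[sum(w * x * y for w, x, y in zip(weights, u, v)) for v in basis]
        for u in basis]
```

The guard on the norm reflects the restriction noted in the text: with c occupied categories, orthonormal scores exist only up to order c − 1.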
Jams Example. For the jams data, the CMH MS and unordered nonparametric ANOVA p-values are exactly the same. This is because for a randomised block design the CMH MS tests using the ANOVA F statistic use the same orthonormal functions, based on the category weights, as the unordered nonparametric ANOVA.
For the jams data, the p-values using the F distribution are 0.0278, 0.2435 and 0.5554 and those using the $\chi^2_{t-1}$ distribution are 0.0082, 0.1116 and 0.3802 respectively. At the 0.05 level, there is evidence of mean differences in the scores for jams but not of higher moment effects.
We may test for generalised correlation effects using both CMH GC and nonparametric ANOVA. In Table 5, the nonparametric ANOVA cells give p-values, first using the t-test and second using the Wilcoxon signed ranks test.
Only the (1, 2) p-value using the t-test is less than 0.05. The treatment sums for jams A, B and C are 18, 30 and 23 respectively. It seems that in passing from jam A to B and then C there is evidence that the sweetness is assessed to increase and then decrease.
Here, and in other examples, we find that the CMH extensions and the corresponding nonparametric ANOVA tests give fairly similar conclusions.
Homosexual Marriage Example. For the homosexual marriage data, the CMH MS extensions of first and second order yield p-values of 0.0000 and 0.2732 respectively.
The CMH C extensions of orders (1, 1), (1, 2), (2, 1) and (2, 2) were calculated. An intermediate step gave the generalised correlations of these orders for each stratum. Again, the (1, 1) correlation gave very strong evidence of being non-zero. The others were not significant at the commonly used levels of significance. See Table 6.
The unordered nonparametric ANOVA gave p-values of less than 0.0001 for first order and 0.3719 for second order. As with the CMH MS extensions there is strong evidence of a mean difference between religions but no evidence of a second order effect.
The ordered nonparametric ANOVA may be used to test for non-zero generalised correlations. For orders (1, 1), (1, 2), (2, 1) and (2, 2) the Wilcoxon signed rank test two-tailed p-values were 0.0003, 0.0634, 0.0710 and 0.2540. In all cases, the t-test was invalid due to non-normality. Nonparametric ANOVA confirms that there is evidence that the order (1, 1) generalised correlation is non-zero, while possibly reserving judgement on the generalised correlations of order (1, 2) and (2, 1).

8. Development of Unconditional CMH Tests

It was noted in Section 3 that the nominal CMH test statistics SOPA and SGA had analogues denoted by TOPA and TGA. The latter are based on Pearson tests. The nominal CMH tests were identified as conditional tests, while the analogues, since they are not based on the product extended hypergeometric distribution, are unconditional tests. Since the unconditional Pearson tests are very familiar, they would be the tests of preference for most users. It is of interest to develop unconditional analogues of the ordinal CMH tests. This was done in Reference [14].
For singly ordered two-way tables, an unconditional model is explored in Reference [10], Chapter 4. On the jth stratum define orthonormal polynomials $\{\omega_{uj}(h)\}$ on $(\hat{p}_{\cdot 1j}, \ldots, \hat{p}_{\cdot cj})$:
$$\sum_{h=1}^{c} \omega_{uj}(h)\,\omega_{vj}(h)\,\hat{p}_{\cdot hj} = \delta_{uv}.$$
Next, define
$$V_{uij} = \sum_{h=1}^{c} N_{ihj}\,\omega_{uj}(h)/\sqrt{n_{i\cdot j}};$$
$V_{uij}$ reflects the contribution of treatment i to the order u effect on stratum j. Rayner and Best [10], Chapter 4, show that
$$\sum_{u=1}^{c-1}\sum_{i=1}^{t} V_{uij}^2 = X_{Pj}^2,$$
the Pearson statistic on the jth stratum. The statistic $V_{u1j}^2 + \cdots + V_{utj}^2$ gives an order u assessment of the consistency of treatments in the jth stratum. It has asymptotic distribution $\chi^2_{t-1}$. Summing over strata, $\sum_{i,j} V_{uij}^2$ gives an order u assessment of the consistency of treatments over all b strata and has asymptotic distribution $\chi^2_{b(t-1)}$.
In particular, $\sum_{i,j} V_{1ij}^2$ gives an overall mean assessment of the consistency of the treatments, $\sum_{i,j} V_{2ij}^2$ gives an overall dispersion assessment of the consistency of the treatments, and so on. The $\sum_{i,j} V_{uij}^2$ partition $\sum_j X_{Pj}^2$ = TOPA.
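The partition described above can be checked numerically on the homosexual marriage data of Table 1. The sketch below is illustrative only: it assumes natural response scores 1, 2, 3, a divisor of the square root of the treatment total in each component, and the hypothetical helper names `orthonormal_polys`, `pearson` and `moment_components`. It computes the order 1 and order 2 components on each stratum and compares their sum with the Pearson statistic.

```python
from math import sqrt

def orthonormal_polys(scores, weights, max_order):
    """Polynomials in `scores`, orthonormal with respect to `weights`."""
    polys = []
    for order in range(max_order + 1):
        g = [float(x) ** order for x in scores]
        for q in polys:
            proj = sum(gi * qi * w for gi, qi, w in zip(g, q, weights))
            g = [gi - proj * qi for gi, qi in zip(g, q)]
        norm = sqrt(sum(gi * gi * w for gi, w in zip(g, weights)))
        polys.append([gi / norm for gi in g])
    return polys

def pearson(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][h] - rows[i] * cols[h] / n) ** 2 / (rows[i] * cols[h] / n)
               for i in range(len(rows)) for h in range(len(cols)))

def moment_components(table, scores):
    """V[u-1][i] = sum_h N_ih * omega_u(h) / sqrt(n_i.) for u = 1, ..., c - 1."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    omega = orthonormal_polys(scores, [c / n for c in cols], len(cols) - 1)
    return [[sum(N * w for N, w in zip(table[i], omega[u])) / sqrt(rows[i])
             for i in range(len(rows))] for u in range(1, len(cols))]

# Table 1 counts: rows are religions, columns are Agree, Neutral, Disagree.
strata = {
    "School": [[6, 2, 10], [8, 3, 9], [11, 5, 6]],
    "College": [[4, 2, 11], [21, 3, 5], [22, 4, 1]],
}
scores = [1, 2, 3]  # assumed natural scores for the ordered responses

order_totals = [0.0, 0.0]
T_OPA = 0.0
for name, table in strata.items():
    V = moment_components(table, scores)
    by_order = [sum(v * v for v in Vu) for Vu in V]
    X2 = pearson(table)
    T_OPA += X2
    order_totals = [a + b for a, b in zip(order_totals, by_order)]
    print(name, [round(x, 4) for x in by_order], round(X2, 4))

print("order 1 and order 2 totals:", [round(x, 4) for x in order_totals])
print("T_OPA:", round(T_OPA, 2))
```

Under these assumptions the order totals add up to the sum of the stratum Pearson statistics, which agrees with the TOPA value of 27.09 reported in Table 7.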
Average assessments may be obtained by summing over strata. Put $f_j = (\sqrt{n_{1\cdot j}}, \ldots, \sqrt{n_{t\cdot j}})^T$ and $\sum_j (I_t - f_j f_j^T/n_{\cdot\cdot j}) = \Sigma$, say. An average order u assessment of the consistency of treatments can be based on the quadratic form
$$(V_{u1\cdot}, \ldots, V_{ut\cdot})\,\Sigma^{-}\,(V_{u1\cdot}, \ldots, V_{ut\cdot})^T = T_{Mu}, \text{ say},$$
in which $V_{ui\cdot} = \sum_j V_{uij}$ and $\Sigma^{-}$ is a generalised inverse of Σ. Of course, TM1 gives an unconditional assessment of mean score differences in treatments, in contrast with SMS, which gives a conditional assessment of mean score differences in treatments. The TMu are all asymptotically $\chi^2_{t-1}$ distributed.
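The quadratic form above can be sketched for the homosexual marriage data of Table 1. Several things here are assumptions rather than statements from the paper: the vector f_j is taken to hold the square roots of the treatment totals on stratum j, the components are summed over strata before forming the quadratic form, natural scores 1, 2, 3 are used, and the helper names are hypothetical. For these data the matrix Σ happens to be nonsingular, so an ordinary linear solve stands in for the generalised inverse; in a balanced design Σ is singular and a pseudo-inverse (for example `numpy.linalg.pinv`) would be needed instead.

```python
from math import sqrt

def orthonormal_polys(scores, weights, max_order):
    """Polynomials in `scores`, orthonormal with respect to `weights`."""
    polys = []
    for order in range(max_order + 1):
        g = [float(x) ** order for x in scores]
        for q in polys:
            proj = sum(gi * qi * w for gi, qi, w in zip(g, q, weights))
            g = [gi - proj * qi for gi, qi in zip(g, q)]
        norm = sqrt(sum(gi * gi * w for gi, w in zip(g, weights)))
        polys.append([gi / norm for gi in g])
    return polys

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            M[r] = [x - m * y for x, y in zip(M[r], M[k])]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def average_moment_statistic(strata, scores, u):
    """Quadratic form V' Sigma^{-1} V for the order u components summed over strata."""
    t = len(strata[0])
    V_sum = [0.0] * t
    Sigma = [[0.0] * t for _ in range(t)]
    for table in strata:
        rows = [sum(r) for r in table]
        cols = [sum(c) for c in zip(*table)]
        n = sum(rows)
        omega_u = orthonormal_polys(scores, [c / n for c in cols], u)[u]
        for i in range(t):
            V_sum[i] += sum(N * w for N, w in zip(table[i], omega_u)) / sqrt(rows[i])
            for k in range(t):
                # Accumulate I_t - f_j f_j' / n_j with f_j = sqrt(treatment totals).
                Sigma[i][k] += (1.0 if i == k else 0.0) - sqrt(rows[i] * rows[k]) / n
    x = solve(Sigma, V_sum)  # valid only because Sigma is nonsingular here
    return sum(v * xi for v, xi in zip(V_sum, x))

# Table 1 counts: School stratum, then College stratum.
strata = [
    [[6, 2, 10], [8, 3, 9], [11, 5, 6]],
    [[4, 2, 11], [21, 3, 5], [22, 4, 1]],
]
T_M1 = average_moment_statistic(strata, [1, 2, 3], u=1)
T_M2 = average_moment_statistic(strata, [1, 2, 3], u=2)
print("T_M1:", round(T_M1, 2), "T_M2:", round(T_M2, 2))
```

Under these assumptions the first order value is close to the TM1 entry of 23.71 in Table 7, which lends some support to the reading of f_j used here.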
Note that the tests for overall and even average association may have quite large degrees of freedom. This means they are seeking to detect very general alternatives to the null hypothesis of no association. The test for the average partial association may well be more focused than that for the overall association but may still have low power when (t − 1)(c − 1) is quite large: the alternative to the null is too general. The moment tests here, and the generalised correlation tests following, offer focused tests for alternatives that are important and easy to understand.
For correlation tests, results analogous to those in Reference [10], Section 8.2 for two-way tables are needed. There, a single multinomial is assumed: $\{N_{ij}\}$ follows a multinomial distribution with total count $n = N_{\cdot\cdot}$ and cell probabilities $\{p_{ij}\}$ with $p_{\cdot\cdot} = 1$. Instead, we need to assume that $\{N_{ij}\}$ follows a multinomial distribution with total count $n_{i\cdot}$ and cell probabilities $\{p_{ij}\}$ with $p_{i\cdot} = 1$. Nevertheless, it may be shown that similar results apply.
In particular, if $\{\pi_r(i)\}$ are orthonormal on the $\{p_{i\cdot}\}$ with $\pi_0(i) = 1$ for all i, if $\{\omega_s(j)\}$ are orthonormal on the $\{p_{\cdot j}\}$ with $\omega_0(j) = 1$ for all j, and if $V_{rs} = \sum_i \sum_j N_{ij}\,\pi_r(i)\,\omega_s(j)/\sqrt{n}$, then the $\{V_{rs}\}$ numerically partition the Pearson statistic and are asymptotically standard normal score test statistics. In particular, $V_{11}/\sqrt{n}$ is the Pearson product moment correlation for grouped data and, if the scores are ranks, $V_{11}/\sqrt{n}$ is the Spearman correlation.
To use the results immediately above in the context here, first define, on the jth stratum, orthonormal polynomials $\{\pi_{rj}(i)\}$ on $(\hat{p}_{1\cdot j}, \ldots, \hat{p}_{t\cdot j})$ with $\pi_{0j}(i) = 1$ for all i, and $\{\omega_{sj}(h)\}$ on $(\hat{p}_{\cdot 1j}, \ldots, \hat{p}_{\cdot cj})$ with $\omega_{0j}(h) = 1$ for all h. Next define, for r = 1, …, t − 1 and s = 1, …, c − 1,
$$V_{rsj} = \sum_{i=1}^{t}\sum_{h=1}^{c} N_{ihj}\,\pi_{rj}(i)\,\omega_{sj}(h)/\sqrt{n_{\cdot\cdot j}}.$$
These assess order (r, s) generalised correlations on stratum j. Results in Reference [10], Section 8.2, translated to the current context, show that
$$\sum_{r=1}^{t-1}\sum_{s=1}^{c-1} V_{rsj}^2 = X_{Pj}^2,$$
the Pearson statistic on the jth stratum. The $V_{rsj}^2$ may be summed over strata to give an overall order (r, s) correlation assessment. As strata are independent, $\sum_j V_{rsj}^2$ will have an asymptotic $\chi^2_b$ distribution. In particular, $\sum_j V_{11j}^2$ will give an overall unconditional assessment of the usual (that is, order (1, 1) generalised) correlation. Since the sum of the strata Pearson statistics is the overall Pearson statistic, the $\sum_j V_{rsj}^2$ numerically partition the overall Pearson statistic.
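The generalised correlation components can be illustrated on the homosexual marriage data of Table 1. The sketch below assumes natural scores 1, 2, 3 for both the ordered religions and the ordered responses, a divisor of the square root of the stratum total, and hypothetical helper names. It checks the partition of each stratum Pearson statistic and then forms two order (1, 1) summaries: the sum of squares over strata described in the text, and the squared sum over strata divided by b, which is how the label of the unconditional correlation statistic in Table 7 is read here.

```python
from math import sqrt

def orthonormal_polys(scores, weights, max_order):
    """Polynomials in `scores`, orthonormal with respect to `weights`."""
    polys = []
    for order in range(max_order + 1):
        g = [float(x) ** order for x in scores]
        for q in polys:
            proj = sum(gi * qi * w for gi, qi, w in zip(g, q, weights))
            g = [gi - proj * qi for gi, qi in zip(g, q)]
        norm = sqrt(sum(gi * gi * w for gi, w in zip(g, weights)))
        polys.append([gi / norm for gi in g])
    return polys

def pearson(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][h] - rows[i] * cols[h] / n) ** 2 / (rows[i] * cols[h] / n)
               for i in range(len(rows)) for h in range(len(cols)))

def correlation_components(table, row_scores, col_scores):
    """V[(r, s)] = sum_i sum_h N_ih * pi_r(i) * omega_s(h) / sqrt(n)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    pi = orthonormal_polys(row_scores, [r / n for r in rows], len(rows) - 1)
    omega = orthonormal_polys(col_scores, [c / n for c in cols], len(cols) - 1)
    return {(r, s): sum(table[i][h] * pi[r][i] * omega[s][h]
                        for i in range(len(rows)) for h in range(len(cols))) / sqrt(n)
            for r in range(1, len(rows)) for s in range(1, len(cols))}

# Table 1 counts: rows are religions, columns are Agree, Neutral, Disagree.
strata = {
    "School": [[6, 2, 10], [8, 3, 9], [11, 5, 6]],
    "College": [[4, 2, 11], [21, 3, 5], [22, 4, 1]],
}
V = {name: correlation_components(table, [1, 2, 3], [1, 2, 3])
     for name, table in strata.items()}

# The squared components on each stratum should add to the Pearson statistic.
partition_gap = max(abs(sum(v * v for v in V[name].values()) - pearson(table))
                    for name, table in strata.items())

b = len(strata)
overall_11 = sum(V[name][(1, 1)] ** 2 for name in strata)  # refer to chi-squared, b df
T_C = sum(V[name][(1, 1)] for name in strata) ** 2 / b     # refer to chi-squared, 1 df
print("partition gap:", partition_gap)
print("sum of squares:", round(overall_11, 2), "squared sum / b:", round(T_C, 2))
```

With these assumptions the squared sum divided by b agrees with the value 17.48 shown for the order (1, 1) unconditional statistic in Table 7. The sum of squares is a different statistic with b degrees of freedom and should not be compared with that entry.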
For a comparison of the conditional and unconditional analyses for the homosexual marriage example, see Table 7.
Rayner and Best [14] note that care must be exercised in the application of these tests. Any response category that included no observations would normally be deleted. For the jams data, see Table 2 to confirm that overall there are no null categories; however, that is not the case within each stratum. For example, the first judge gives scores of 2, 3 and 3. The first, fourth and fifth categories include no responses. That means $\sum_i N_{ihj} = n_{\cdot hj} = 0$ for h = 1, 4 and 5 when j = 1. Thus, $X_{P1}^2$ is not defined, as is the case for both SOPA and TOPA. It is some consolation that TGA is defined.
What is happening with TOPA is that in order to sum the Pearson statistics from each stratum the same treatment and responses need to be included in all statistics and with sparse data that need not be the case. The same applies to SOPA, which is using a weighted sum of Pearson statistics.
The sparseness of the data also affects the definition of the unconditional moment and correlation tests. Both require the definition of orthonormal functions on each stratum. If there are three distinct responses, then the orthonormal functions up to order two can be defined; but for the jams data this only happens on strata 4 and 7. All other strata have two distinct responses and so only orthonormal functions of order zero and one can be defined. It is feasible that a judge not be able to distinguish between the treatments and would therefore give the same score to all treatments. There is then no information about the differences between the treatments and such a judge should be excluded from the analysis. For these data, there are no non-informative judges to be removed but only first order orthonormal functions can be constructed on all strata. Second order orthonormal functions are not defined on all strata, so no second order analysis can be given.
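The counting argument in the paragraph above is easy to reproduce from Table 2. The short sketch below, with variable names chosen only for illustration, records each judge's scores for jams A, B and C and reports the highest order of orthonormal function that can be defined on each stratum, namely the number of distinct responses minus one.

```python
# JAR sweetness codes from Table 2: judge -> (jam A, jam B, jam C).
jams = {
    1: (3, 2, 3), 2: (4, 5, 4), 3: (3, 2, 3), 4: (1, 4, 2),
    5: (2, 4, 2), 6: (1, 3, 3), 7: (2, 5, 4), 8: (2, 5, 2),
}

# With k distinct responses on a stratum, orders 0, ..., k - 1 can be defined.
max_order = {judge: len(set(scores)) - 1 for judge, scores in jams.items()}

second_order_strata = [judge for judge, m in max_order.items() if m >= 2]
non_informative = [judge for judge, m in max_order.items() if m == 0]
treatment_sums = tuple(sum(scores[k] for scores in jams.values()) for k in range(3))

print("highest definable order per judge:", max_order)
print("strata supporting second order functions:", second_order_strata)
print("non-informative judges:", non_informative)
print("treatment sums for A, B, C:", treatment_sums)
```

Only judges 4 and 7 support second order functions and no judge is non-informative, which matches the description in the text. The treatment sums agree with the values 18, 30 and 23 quoted for the jams example.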
It seems necessary to accept that sparse data yield sparse information. Here, the data give information about the usual (order (1, 1) generalised) correlation but not enough information about, for example, the order (1, 2) generalised correlation, to be able to say anything further.

Supplementary Materials

The following are available online at https://www.mdpi.com/2571-905X/1/1/8/s1: derivation of the relationship between SMS and F; and derivation of the CMH C test statistic with the same scores in every stratum.

Author Contributions

J.C.W.R. wrote the manuscript with P.R. adding critical and constructive comment as well as editorial advice. P.R. assisted with analysis of the data in the examples.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rayner, J.C.W.; Best, D.J. Extended ANOVA and rank transform procedures. Aust. N. Z. J. Stat. 2013, 55, 305–319.
  2. Rayner, J.C.W.; Best, D.J.; Thas, O. Extended analysis of at least partially ordered multi-factor ANOVA. Aust. N. Z. J. Stat. 2015, 57, 211–224.
  3. Rayner, J.C.W. Extended ANOVA. J. Stat. Theory Pract. 2017, 11, 208–219.
  4. Landis, J.R.; Heyman, E.R.; Koch, G.G. Average partial association in three-way contingency tables: A review and discussion of alternative tests. Int. Statist. Rev. 1978, 46, 237–254.
  5. Landis, J.R.; Cooper, M.M.; Kennedy, T.; Koch, G.G. A computer program for testing average partial association in three-way contingency tables (PARCAT). Comput. Programs Biomed. 1979, 9, 223–246.
  6. Davis, C.S. Statistical Methods for the Analysis of Repeated Measurements; Springer: New York, NY, USA, 2002.
  7. Kuritz, S.J.; Landis, J.R.; Koch, G.G. A general overview of Mantel-Haenszel methods: Applications and recent developments. Ann. Rev. Public Health 1988, 9, 123–160.
  8. Agresti, A. Categorical Data Analysis, 2nd ed.; Wiley: New York, NY, USA, 2002.
  9. O’Mahony, M. Sensory Evaluation of Food—Statistical Methods and Procedures; Marcel Dekker: New York, NY, USA, 1986.
  10. Rayner, J.C.W.; Best, D.J. A Contingency Table Approach to Nonparametric Testing; Chapman & Hall/CRC: Boca Raton, FL, USA, 2001.
  11. Rayner, J.C.W.; Best, D.J. Extensions to the Cochran–Mantel–Haenszel mean scores and correlation tests. J. Stat. Theory Pract. 2018, 12, 561–574.
  12. Rayner, J.C.W. Introductory Nonparametrics; Bookboon: Copenhagen, Denmark, 2016.
  13. Rayner, J.C.W.; Beh, E.J. Towards a Better Understanding of Correlation. Stat. Neerl. 2009, 63, 324–333.
  14. Rayner, J.C.W.; Best, D.J. Unconditional analogues of Cochran–Mantel–Haenszel tests. Aust. N. Z. J. Stat. 2017, 59, 485–494.
Table 1. Opinions on homosexual marriage by religious beliefs and education levels for ages 18 to 25.
Response columns: Homosexuals Should Be Able to Marry.

| Education | Religion | Agree | Neutral | Disagree |
|---|---|---|---|---|
| School | Fundamentalist | 6 | 2 | 10 |
| School | Moderate | 8 | 3 | 9 |
| School | Liberal | 11 | 5 | 6 |
| College | Fundamentalist | 4 | 2 | 11 |
| College | Moderate | 21 | 3 | 5 |
| College | Liberal | 22 | 4 | 1 |
Table 2. JAR codes for the sweetness of three plum jams.

| Judge | Jam A | Jam B | Jam C |
|---|---|---|---|
| 1 | 3 | 2 | 3 |
| 2 | 4 | 5 | 4 |
| 3 | 3 | 2 | 3 |
| 4 | 1 | 4 | 2 |
| 5 | 2 | 4 | 2 |
| 6 | 1 | 3 | 3 |
| 7 | 2 | 5 | 4 |
| 8 | 2 | 5 | 2 |
Table 3. Cross-classification of age and whiskey.

| Years of Maturing | Grade: First | Grade: Second | Grade: Third | Total |
|---|---|---|---|---|
| One | 0 | 0 | 2 | 2 |
| Five | 1 | 1 | 1 | 3 |
| Seven | 2 | 1 | 0 | 3 |
| Total | 3 | 2 | 3 | 8 |
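Table 4. Analysis of Jams data.

| Stratum | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| SXY | 0 | 0 | 0 | 1 | 0 | 2 | 2 | 0 |
| SYY | 0.6667 | 0.6667 | 0.6667 | 4.6667 | 2.6667 | 2.6667 | 4.6667 | 6 |
| rP | 0 | 0 | 0 | 0.3273 | 0 | 0.8660 | 0.6547 | 0 |
| SC | 0 | 0 | 0 | 0.2143 | 0 | 1.5 | 0.8571 | 0 |
| p-value | 1 | 1 | 1 | 0.6434 | 1 | 0.2207 | 0.3545 | 1 |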
Table 5. CMH and nonparametric ANOVA GC p-values when testing for zero generalised correlations for the jam data.

| Category Order | CMH GC Tests, Treatment Order 1 | CMH GC Tests, Treatment Order 2 | Nonparametric ANOVA, Treatment Order 1 (t-test; Wilcoxon) | Nonparametric ANOVA, Treatment Order 2 (t-test; Wilcoxon) |
|---|---|---|---|---|
| 1 | 0.2936 | 0.0212 | 0.2249; 0.2403 | 0.0333; 0.0853 |
| 2 | 0.1862 | 0.2780 | 0.1410; 0.2058 | 0.2640; 0.1558 |
| 3 | 0.6098 | 0.3104 | 0.6041; 0.7229 | 0.3069; 0.2601 |
Table 6. CMH p-values for each stratum and overall when testing for zero generalised correlations for the homosexual marriage data.

| CMH C Extensions | (1, 1) | (1, 2) | (2, 1) | (2, 2) |
|---|---|---|---|---|
| School | 0.1209 | 0.8352 | 0.3246 | 0.8523 |
| College | 0.0866 | 0.8183 | 0.2766 | 0.8371 |
| Overall | 0.0000 | 0.2328 | 0.1272 | 0.8777 |
Table 7. Homosexual marriage opinions p-values.

| Conditional Statistic | Value | p-Value | Unconditional Statistic | Value | p-Value |
|---|---|---|---|---|---|
| SGA | 19.76 | 0.0006 | TGA | 20.68 | 0.0004 |
| SOPA | 26.71 | 0.0008 | TOPA | 27.09 | 0.0007 |
| SMS | 17.94 | 0.0001 | | | |
| SM1 | 22.32 | 0.0000 | TM1 | 23.71 | 0.0000 |
| SM2 | 2.59 | 0.2732 | TM2 | 2.44 | 0.2954 |
| SC11 | 17.98 | 0.0000 | $T_C = V_{11\cdot}^2/2$ | 17.48 | 0.0000 |
| SC12 | 1.42 | 0.2328 | $V_{12\cdot}^2/2$ | 2.35 | 0.1254 |
| SC21 | 2.33 | 0.1272 | $V_{21\cdot}^2/2$ | 1.28 | 0.2570 |
| SC22 | 0.02 | 0.8777 | $V_{22\cdot}^2/2$ | 0.03 | 0.8726 |

Rayner, J.C.W.; Rippon, P. Recent Extensions to the Cochran–Mantel–Haenszel Tests. Stats 2018, 1, 98-111. https://doi.org/10.3390/stats1010008
