1. Introduction
Testing for approximate row-column independence in two-way contingency tables is a common task in statistical practice. The first publications on this topic go back to Hodges and Lehmann [1] and Diaconis and Efron [2]. More recently, Liu and Lindsay [3] applied semi-parametric tubular tolerance regions to the row-column independence model in two-way contingency tables. Their method relies on the analytical properties of the LRT statistic to obtain a closed-form estimator of boundary points. Wellek [4] develops a test for independence in multi-way contingency tables in Section 9.2. For this purpose, he applies a test for consistency with a fully specified multinomial distribution as follows. First, the marginal distributions of the contingency table are calculated. The test statistic is the Euclidean distance between the contingency table and the product measure of its marginal distributions. The critical value is calculated asymptotically.
Ostrovski [5] proposes a general method to test equivalence to families of multinomial distributions, which is based on the minimum distance
$d(p, \mathcal{M}) = \inf_{q \in \mathcal{M}} d(p, q)$ (1)
to a family $\mathcal{M}$ of multinomial distributions. If $d$ is the Euclidean distance and $\mathcal{M}$ is the independence model, then the calculation of the minimum distance in Equation (1) requires numerical optimization. Generally, the method relies on the existence of a continuous minimizer of Equation (1). Unfortunately, it is not known whether a continuous minimizer exists for the independence model. Instead, Ostrovski [5] assumes the existence of a continuous minimizer at all points and then applies the method to test for approximate independence. Additionally, the numerical calculation of the minimum distance in Equation (1) makes the bootstrap test computationally intensive.
We follow the lines of [5], but avoid the numerical evaluation of the minimum distance in Equation (1) in the special case of independence testing. We also propose an efficient bootstrap test, which is based on a randomized estimator of the boundary points.
Any two-way contingency table of size $k \times m$ corresponds to a probability matrix from the set of $k \times m$ probability matrices. Let $p$ denote the probability matrix. Let $\mathcal{M}$ be the independence model, which contains all product measures of the corresponding dimensions. Approximate row-column independence can be shown by testing
$H_0: d(p, \mathcal{M}) \ge \varepsilon$ against $H_1: d(p, \mathcal{M}) < \varepsilon,$ (2)
where $\varepsilon > 0$ is a tolerance parameter.
Let $r$ and $c$ denote the probability vectors of the marginal distributions, which are defined by $r_i = \sum_j p_{ij}$ and $c_j = \sum_i p_{ij}$. A probability matrix $p$ belongs to $\mathcal{M}$ iff the equality $p_{ij} = r_i c_j$ is fulfilled for all $i, j$. We consider the transformations $t_a$ and $t_r$ of the matrix $p$, which are defined by $t_a(p)_{ij} = p_{ij} - r_i c_j$ and $t_r(p)_{ij} = p_{ij} / (r_i c_j)$.
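In code, the marginals and the two transformations read as follows (a minimal numpy sketch; the function names `marginals`, `t_abs` and `t_rel` are ours, not the paper's):

```python
import numpy as np

def marginals(p):
    """Row and column marginal distributions r and c of a probability matrix p."""
    return p.sum(axis=1), p.sum(axis=0)

def t_abs(p):
    """Absolute-deviation transform t_a(p) = p - r c^T (zero matrix iff p is a product measure)."""
    r, c = marginals(p)
    return p - np.outer(r, c)

def t_rel(p):
    """Relative-deviation transform t_r(p) = p / (r c^T) (matrix of ones iff p is a product measure)."""
    r, c = marginals(p)
    return p / np.outer(r, c)
```

For a product measure, `t_abs` returns the zero matrix and `t_rel` the matrix of ones, which is exactly the characterization of the independence model used below.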
For any differentiable distance $l$ on the space of $k \times m$ matrices we define two new distances $d_a(p, q) = l(t_a(p), t_a(q))$ and $d_r(p, q) = l(t_r(p), t_r(q))$. It should be noted that $d_a$ and $d_r$ are only pseudo-metrics because $d_a(p, q) = 0$ or $d_r(p, q) = 0$ does not imply $p = q$. We put these distances in Equation (1) and obtain $d_a(p, \mathcal{M}) = l(t_a(p), 0)$ and $d_r(p, \mathcal{M}) = l(t_r(p), 1)$, where $0$ denotes the zero matrix and $1$ is the matrix of ones. The distances $d_a$ and $d_r$ can be interpreted respectively as the absolute deviation and the relative deviation between $p$ and the product measure of the marginal distributions. The distances $d_a$ and $d_r$ are easy to calculate without optimization. Therefore, $d_a$ and $d_r$ are good candidates for the general distance $d$ in Definition (1), and we will use only these two specific distances in the remainder of the paper.
We observe a contingency table $\hat{p}_n$ of relative frequencies, where $n$ is the sample size and $p$ is the true underlying probability matrix. Then the test statistic for Equation (2) is $d_a(\hat{p}_n, \mathcal{M})$ or $d_r(\hat{p}_n, \mathcal{M})$, depending on user preference. Below we write $d_*$ instead of $d_a$ and $d_r$ if the statements are correct for both distances. We use the subscript $*$ instead of $a$ and $r$, if appropriate.
2. Asymptotic Tests
In this section, we derive the asymptotic distribution of the test statistic and give a detailed description of the asymptotic test.
Let $\mathrm{vec}$ be the usual bijection between $k \times m$ matrices and vectors in $\mathbb{R}^{km}$. Let $f'_*$ denote the derivative of the function $f_*(x) = d_*(\mathrm{vec}^{-1}(x), \mathcal{M})$, which can be easily calculated using the chain rule.
Proposition 1. Let $p$ be a boundary point of $H_0$, i.e., $d_*(p, \mathcal{M}) = \varepsilon$, and let $f'_*(\mathrm{vec}(p)) \neq 0$. Let $D(p)$ denote a square diagonal matrix, whose diagonal entries are the entries of $\mathrm{vec}(p)$. Then the asymptotic distribution of $\sqrt{n}\,(d_*(\hat{p}_n, \mathcal{M}) - \varepsilon)$ is Gaussian with mean zero and variance $\sigma^2_* = f'_*(\mathrm{vec}(p))\,\Sigma\,f'_*(\mathrm{vec}(p))^T$, where $\Sigma = D(p) - \mathrm{vec}(p)\,\mathrm{vec}(p)^T$ is a covariance matrix.
Proof. Let $\Sigma = D(p) - \mathrm{vec}(p)\,\mathrm{vec}(p)^T$. The normalized vector $\sqrt{n}\,(\mathrm{vec}(\hat{p}_n) - \mathrm{vec}(p))$ converges weakly to a random variable, which is Gaussian with mean zero and covariance matrix $\Sigma$; see [6], Theorem 14.3-4 for details. The assertion follows by the delta method; see [7], p. 26, Theorem 3.1. □
The asymptotic variance is unknown and can be estimated by the plug-in estimator $\hat{\sigma}^2_* = \sigma^2_*(\hat{p}_n)$. The estimator is consistent by the continuous mapping theorem because $\sigma^2_*$ is a continuous function of $p$. Let $u_\alpha$ denote the lower $\alpha$-quantile of the standard normal distribution. Then the critical value of the asymptotic test is $\varepsilon + u_\alpha \hat{\sigma}_* / \sqrt{n}$. Now we have all components of the asymptotic test, which can be carried out as follows:
1. Given are the contingency table $\hat{p}_n$ of relative frequencies, the tolerance parameter $\varepsilon$ and the significance level $\alpha$.
2. Calculate the test statistic $d_*(\hat{p}_n, \mathcal{M})$.
3. Calculate the estimate $\hat{\sigma}^2_*$ of the asymptotic variance.
4. Reject $H_0$ if $d_*(\hat{p}_n, \mathcal{M}) < \varepsilon + u_\alpha \hat{\sigma}_* / \sqrt{n}$.
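The asymptotic test just outlined can be sketched in a few lines of code. The sketch below uses the absolute deviation $d_a$ with an unscaled Euclidean norm and replaces the chain-rule derivative by a numerical finite-difference gradient; the function names are ours and the sketch is illustrative rather than the paper's implementation:

```python
import numpy as np
from statistics import NormalDist

def d_abs(p):
    """d_a(p, M): Euclidean norm of p - r c^T (scale factor omitted)."""
    r, c = p.sum(axis=1), p.sum(axis=0)
    return np.linalg.norm(p - np.outer(r, c))

def num_grad(f, x, h=1e-6):
    """Central finite differences as a stand-in for the chain-rule derivative f'_*."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def asymptotic_test(counts, eps, alpha=0.05):
    """Asymptotic equivalence test of H0: d_a(p, M) >= eps against H1: d_a(p, M) < eps."""
    n = counts.sum()
    p_hat = counts / n
    stat = d_abs(p_hat)
    v = p_hat.ravel()
    g = num_grad(lambda x: d_abs(x.reshape(p_hat.shape)), v)
    sigma = np.diag(v) - np.outer(v, v)      # multinomial covariance matrix
    var = g @ sigma @ g                      # delta-method variance
    crit = eps + NormalDist().inv_cdf(alpha) * np.sqrt(var / n)
    return stat, crit, stat < crit
```

For the 2x2 table with counts [[30, 10], [10, 30]] the statistic equals 0.25; $H_0$ with $\varepsilon = 0.5$ is rejected (approximate independence at that tolerance is accepted), whereas $H_0$ with $\varepsilon = 0.05$ is not. Note that the finite-difference gradient should not be evaluated exactly at a product measure, where the norm is not differentiable.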
The outlined test is locally asymptotically most powerful, see [8], Proposition 3.
Remark 1. The minimum tolerance parameter $\varepsilon$, for which the asymptotic test rejects $H_0$, can be calculated as $\varepsilon_{\min} = d_*(\hat{p}_n, \mathcal{M}) - u_\alpha \hat{\sigma}_* / \sqrt{n}$.
Remark 2. The asymptotic test can be straightforwardly generalized to multi-way contingency tables.
3. Bootstrap Tests
The parametric bootstrap is an efficient method to improve the finite sample performance of the proposed tests. Let $\partial H_0 = \{p : d_*(p, \mathcal{M}) = \varepsilon\}$ denote the boundary of $H_0$. Let $\tilde{p}_n$ denote an estimator of $p$, which fulfills the condition $\tilde{p}_n \in \partial H_0$. The critical value can be estimated by the lower $\alpha$-quantile of the distribution of the test statistic under sampling from $\tilde{p}_n$, because the critical value should be estimated as if a boundary point of $H_0$ were the true parameter. The estimator can be computed by the Monte Carlo method to any degree of accuracy.
The minimum distance estimator of $p$ would be difficult to compute because the boundary $\partial H_0$ cannot be parameterized in a way that allows common optimization techniques to be applied. Therefore, we propose a computationally feasible estimator of $p$, which is based on a randomized approximation to the minimum distance estimator.
Let $q$ be some probability matrix such that $d_*(q, \mathcal{M}) > \varepsilon$. If $d_*(\hat{p}_n, \mathcal{M}) < \varepsilon$, then let $\lambda_n$ be the largest number from $[0, 1]$ such that $d_*(\lambda_n q + (1 - \lambda_n)\hat{p}_n, \mathcal{M}) = \varepsilon$. Otherwise let $\lambda_n = 0$. The linear combination $\lambda_n q + (1 - \lambda_n)\hat{p}_n$ is a consistent estimator of the boundary point $p$ under additional requirements as shown below.
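The defining equation for $\lambda_n$ can be solved by bisection, since the distance is below $\varepsilon$ at one end of the mixture path and above $\varepsilon$ at the other. A minimal sketch for $d_a$ with the unscaled Euclidean norm (names and the tolerance are ours; plain bisection finds a boundary crossing, which is the largest root whenever the distance is monotone along the path):

```python
import numpy as np

def d_abs(p):
    r, c = p.sum(axis=1), p.sum(axis=0)
    return np.linalg.norm(p - np.outer(r, c))

def boundary_lambda(p_hat, q, eps, tol=1e-10):
    """Bisection for lambda in [0, 1] with d_a(lambda*q + (1-lambda)*p_hat, M) = eps.

    Assumes d_a(q, M) > eps; returns 0.0 if d_a(p_hat, M) >= eps already."""
    if d_abs(p_hat) >= eps:
        return 0.0
    lo, hi = 0.0, 1.0               # distance below eps at lo, above eps at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_abs(mid * q + (1 - mid) * p_hat) > eps:
            hi = mid
        else:
            lo = mid
    return hi
```

The resulting mixture $\lambda q + (1 - \lambda)\hat{p}_n$ then lies (up to the bisection tolerance) on the boundary $\partial H_0$.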
Proposition 2. Assume that $d_*(\lambda q + (1 - \lambda) p, \mathcal{M}) > \varepsilon$ for all $\lambda \in (0, 1]$. Then $\lambda_n q + (1 - \lambda_n)\hat{p}_n \to p$ a.e. for $n \to \infty$.
Proof. We show that $\lambda_n \to 0$ for $n \to \infty$. Let $d_*(\hat{p}_n, \mathcal{M}) < \varepsilon$ because $\lambda_n = 0$ otherwise. The function $\lambda \mapsto d_*(\lambda q + (1 - \lambda)\hat{p}_n, \mathcal{M})$ is continuous on $[0, 1]$ and takes a value below $\varepsilon$ at $\lambda = 0$ as well as a value above $\varepsilon$ at $\lambda = 1$. Therefore, there exists a largest number $\lambda_n$ such that $d_*(\lambda_n q + (1 - \lambda_n)\hat{p}_n, \mathcal{M}) = \varepsilon$. It is worth mentioning that $\lambda_n$ is a function of $\hat{p}_n$.
By the strong law of large numbers, $\hat{p}_n$ converges to $p$ a.e. for $n \to \infty$. Let $\omega$ be an arbitrary point of the underlying probability space at which this convergence holds. The sequence of $\lambda_n(\omega)$ is bounded. Hence, there exists a convergent sub-sequence $\lambda_{n_k}(\omega) \to \bar{\lambda}$ for $k \to \infty$. We obtain $d_*(\lambda_{n_k} q + (1 - \lambda_{n_k})\hat{p}_{n_k}, \mathcal{M}) = \varepsilon$ for all $k$ and consequently $d_*(\bar{\lambda} q + (1 - \bar{\lambda}) p, \mathcal{M}) = \varepsilon$ by continuity. We conclude $\bar{\lambda} = 0$ due to the assumption $d_*(\lambda q + (1 - \lambda) p, \mathcal{M}) > \varepsilon$ for all $\lambda \in (0, 1]$. Overall, we have shown that $\lambda_n(\omega) \to 0$, and hence $\lambda_n q + (1 - \lambda_n)\hat{p}_n \to p$, for almost every $\omega$. □
Let $Q$ be a finite set of probability matrices, such that any $q \in Q$ fulfills $d_*(q, \mathcal{M}) > \varepsilon$. We define the estimator $\tilde{p}_n$ as a minimum distance estimator among the linear combinations $\lambda_n(q) q + (1 - \lambda_n(q))\hat{p}_n$ for all $q \in Q$; that is, $\tilde{p}_n$ is the linear combination that minimizes the distance $l$ to $\hat{p}_n$ over $q \in Q$. Note that the distance $l$ is used to define the estimator because $d_*$ is a pseudo-metric only.
Corollary 1. Let at least one $q \in Q$ satisfy $d_*(\lambda q + (1 - \lambda) p, \mathcal{M}) > \varepsilon$ for all $\lambda \in (0, 1]$. Then $\tilde{p}_n \to p$ a.e. for $n \to \infty$.
Proof. By definition of $\tilde{p}_n$, we obtain $l(\tilde{p}_n, \hat{p}_n) \le l(\lambda_n(q) q + (1 - \lambda_n(q))\hat{p}_n, \hat{p}_n)$ for the point $q$ satisfying the assumption, where $\lambda_n(q) q + (1 - \lambda_n(q))\hat{p}_n \to p$ a.e. by Proposition 2 and $\hat{p}_n \to p$ a.e. by the strong law of large numbers. Hence $\tilde{p}_n \to p$ a.e. □
The bootstrap test can be carried out as follows:
1. Given are the contingency table $\hat{p}_n$ of relative frequencies, the tolerance parameter $\varepsilon$, the number of exterior points $m$ and the significance level $\alpha$.
2. Calculate the test statistic $d_*(\hat{p}_n, \mathcal{M})$.
3. If $d_*(\hat{p}_n, \mathcal{M}) \ge \varepsilon$ then set $\tilde{p}_n = \hat{p}_n$ and go to step 7.
4. Find $m$ different points $q$ such that $d_*(q, \mathcal{M}) > \varepsilon$. The following rejection algorithm can be applied for the search:
- (a) Simulate a random matrix $w$ whose entries are independently uniformly distributed on $(0, 1)$.
- (b) Normalize $w$ to a probability matrix $q$.
- (c) Add $q$ to $Q$ if $d_*(q, \mathcal{M}) > \varepsilon$ or reject $q$ otherwise.
- (d) Repeat the previous steps until all $m$ exterior points are found.
5. Solve the equation $d_*(\lambda q + (1 - \lambda)\hat{p}_n, \mathcal{M}) = \varepsilon$ for $\lambda$ using some root finding method. Repeat for all $q \in Q$.
6. Find the minimum distance estimator $\tilde{p}_n$ among all linear combinations $\lambda_n(q) q + (1 - \lambda_n(q))\hat{p}_n$, where $q \in Q$.
7. Estimate the critical value as the lower $\alpha$-quantile of the distribution of the test statistic under sampling from $\tilde{p}_n$, using Monte Carlo simulation.
8. Reject $H_0$ if $d_*(\hat{p}_n, \mathcal{M})$ is smaller than the estimated critical value.
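The bootstrap test outlined above can be combined into a compact simulation sketch. We again use the absolute deviation $d_a$ with an unscaled Euclidean norm; the rejection sampling of exterior points is feasible only when $\varepsilon$ lies below the maximal attainable distance, and all names and default values ($m$, the number of bootstrap replications $B$, the fixed seed) are illustrative choices of ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def d_abs(p):
    r, c = p.sum(axis=1), p.sum(axis=0)
    return np.linalg.norm(p - np.outer(r, c))

def boundary_lambda(p_hat, q, eps, tol=1e-10):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_abs(mid * q + (1 - mid) * p_hat) > eps:
            hi = mid
        else:
            lo = mid
    return hi

def bootstrap_test(counts, eps, m=10, B=500, alpha=0.05):
    n = counts.sum()
    shape = counts.shape
    p_hat = counts / n
    stat = d_abs(p_hat)                              # step 2
    if stat >= eps:                                  # step 3
        p_tilde = p_hat
    else:
        best = None
        for _ in range(m):                           # step 4: rejection sampling
            while True:
                q = rng.uniform(size=shape)
                q /= q.sum()
                if d_abs(q) > eps:
                    break
            lam = boundary_lambda(p_hat, q, eps)     # step 5: root finding
            cand = lam * q + (1 - lam) * p_hat
            if best is None or np.linalg.norm(cand - p_hat) < np.linalg.norm(best - p_hat):
                best = cand                          # step 6: closest boundary point in l
        p_tilde = best
    boot = np.empty(B)                               # step 7: Monte Carlo critical value
    for b in range(B):
        sample = rng.multinomial(n, p_tilde.ravel()).reshape(shape) / n
        boot[b] = d_abs(sample)
    crit = np.quantile(boot, alpha)
    return stat, crit, stat < crit                   # step 8
```

Each bootstrap replication only evaluates the closed-form distance once, which is the computational advantage over a bootstrap that solves an optimization problem per replication.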
Remark 3. The bootstrap test is asymptotically consistent, see [9], Theorem 15.6.1. Consequently, the test is also locally asymptotically most powerful, see [8], Proposition 3.
Remark 4. The appropriate number of exterior points $m$ can be found empirically. We found that a moderate number of exterior points is sufficient and scales well with the table size.
Remark 5. The minimum tolerance parameter $\varepsilon$, for which the bootstrap test rejects $H_0$, can be found numerically. For this purpose, the equation stating that the test statistic equals the estimated critical value should be solved for the tolerance parameter $\varepsilon$ using some root finding algorithm. The exterior points and bootstrap samples should remain unchanged during the optimization.
4. Simulation Study of Finite Sample Performance
The distance $l$ is the scaled Euclidean distance $l(x, y) = s\,\lVert x - y \rVert_2$, where the scale factor $s$ is necessary to obtain comparable test results for different table sizes. We use different scale factors in the case of $d_a$ and in the case of $d_r$. Alternatively, the smoothed total variation distance would be a good choice, see [8].
The minimum $\varepsilon$, for which the test power reaches a prescribed level, is calculated for different table sizes and sample sizes at the uniform probability matrices, in order to throw some light on the appropriate values of $\varepsilon$ and the effective sample sizes. Table 1 shows the minimum $\varepsilon$ for one of the two distances; the minimum $\varepsilon$ for the other distance can be found in Table A1 because the results are very similar for $d_a$ and $d_r$. The minimum $\varepsilon$ decreases with the increasing sample size. The minimum parameter $\varepsilon$ climbs slowly with the increasing table size. Thus, the test power falls slowly with the increasing table size. The bootstrap tests have a smaller minimum $\varepsilon$ than the asymptotic tests, and the difference increases considerably with the table size.
We study the type I error rates at 100 randomly selected points from $\partial H_0$ because the boundary of $H_0$ is a very complex set and it is difficult to identify particularly interesting boundary points. The points are found using steps 4 and 5 of the algorithm at the end of Section 3. The sample size $n$ is increased with the table size to maintain similar test power for the different table sizes, because the test power falls with the increasing table size. The simulation results are summarized in Table 2. The power of all tests varies considerably from point to point. The averaged power of the asymptotic tests decreases quickly with the table size. The asymptotic tests are not conservative for the small tables and become very conservative for the larger tables. The averaged power of the bootstrap tests is very close to the nominal level for all table sizes. However, the bootstrap tests are not conservative for all table sizes. In particular, the $d_r$-based bootstrap test shows a strong anti-conservative tendency.
A detailed analysis of the boundary points shows that the test power is far above the nominal level at the points where the product $r_i c_j$ of the marginal probabilities is close to zero for some $i$ and $j$. Therefore, the test results should be treated with caution if the marginal probability is close to zero for at least one category.
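The effect of small marginals can be made concrete with a toy example (constructed by us for illustration; the matrices are not taken from the paper's simulations). A comparable absolute deviation produces a far larger relative deviation when it involves a cell with a tiny marginal product $r_i c_j$:

```python
import numpy as np

def d_abs(p):
    r, c = p.sum(axis=1), p.sum(axis=0)
    return np.linalg.norm(p - np.outer(r, c))

def d_rel(p):
    r, c = p.sum(axis=1), p.sum(axis=0)
    return np.linalg.norm(p / np.outer(r, c) - 1.0)

# Mild dependence with well-populated marginals ...
p_balanced = np.array([[0.28, 0.22], [0.22, 0.28]])
# ... versus a comparable dependence involving a sparsely populated row.
p_sparse = np.array([[0.47, 0.47], [0.01, 0.05]])

print(d_abs(p_balanced), d_abs(p_sparse))   # absolute deviations of similar (small) size
print(d_rel(p_balanced), d_rel(p_sparse))   # relative deviation is much larger for p_sparse
```

The entry $p_{ij}/(r_i c_j)$ is unstable when $r_i c_j$ is near zero, which is consistent with the anti-conservative behaviour of the relative-deviation tests at such boundary points.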
Conservative tests can be obtained by shrinking the tolerance parameter $\varepsilon$. Table A5 summarizes the simulation results for a shrunken tolerance parameter, where the test power is calculated at the same points as in Table 2. Then the $d_a$-based tests are conservative at all points, whereas the $d_r$-based tests are still non-conservative at some points.
The type II error rates are studied at 100 randomly selected product measures for each table size, see Table 3. It should be noted that Table 3 contains the test power, and the type II error rate equals 1 minus the test power. The sample size is chosen as in the type I error analysis in order to be comparable. The power of the tests based on one of the distances changes very strongly from point to point, whereas, for a fixed table size, the power of the tests based on the other distance is almost constant at all considered points. The averaged power of the asymptotic tests decreases slightly with the increasing table size. The averaged power of the bootstrap tests does not change with the table size.
5. Real Data Sets
To demonstrate the application of the proposed tests, three examples with real data sets are considered: gender and nitrendipine therapy (Nitrendipine); eye color and hair color (Color); number of children and income (Children). The corresponding two-way contingency tables are given in Appendix A, Table A2, Table A3 and Table A4.
Table 4 displays the minimum tolerance parameter $\varepsilon$, for which $H_0$ can be rejected at the nominal level $\alpha$. The three examples are also used in [5], such that a direct comparison is possible. The results are similar to those presented in [5] after appropriate re-scaling. However, we avoid the unproven assumptions and the extensive use of numerical optimization, which are necessary in [5].
The first example concerns the question of whether the treatment outcome of nitrendipine mono-therapy in patients suffering from mild arterial hypertension depends on gender. The data set is also used as an example of approximate independence in [4]. The asymptotic and bootstrap test results are very close to each other for one of the distances, whereas they differ considerably for the other. Given the small sample size, the treatment outcome and gender can be considered approximately independent.
A common example for independence testing is the cross-classification of eye color and hair color, see [2,3]. The test results in Table 4 reflect the well-known fact that eye color and hair color are not independently distributed. All tests behave very similarly and can reject $H_0$ only for very large values of $\varepsilon$.
The cross-classification of the number of children by the annual income has a large sample size. However, the category where the number of children is larger than or equal to 4 is sparsely populated. Therefore, the $d_r$-based tests can reject $H_0$ only for comparatively large values of $\varepsilon$ and the test power is low. The $d_a$-based tests show that the number of children and the annual income may be considered approximately independent, but the approximation is very inaccurate.