New Equivalence Tests for Approximate Independence in Contingency Tables

We introduce new equivalence tests for approximate independence in two-way contingency tables. The critical values are calculated asymptotically. The finite sample performance of the tests is improved by means of the bootstrap. An estimator of boundary points is developed to make the bootstrap based tests statistically efficient and computationally feasible. We compare the performance of the proposed tests for different table sizes by simulation. Then we apply the tests to real data sets. The tests are implemented in R and available online, see https://github.com/TestingEquivalence/EquivalenceTestIndependenceR.


Introduction
Testing for approximate row-column independence in two-way contingency tables is a common task in statistical practice. The first publications on this topic go back to Hodges and Lehmann [1], Diaconis and Efron [2]. More recently, Liu and Lindsay [3] applied the semi-parametric tubular tolerance regions to the row-column independence model in two-way contingency tables. The method relies on the analytical properties of the LRT statistic to obtain a closed form estimator of boundary points. Wellek [4] develops a test for independence in multi-way contingency tables in Section 9.2. For this purpose, he applies a test for consistency with a fully specified multinomial distribution as follows. First, the marginal distributions of the contingency table are calculated. The test statistic is the Euclidean distance between the product measure of the marginal distributions and the contingency table. The critical value is calculated asymptotically.
Ostrovski [5] proposes a general method to test equivalence to families of multinomial distributions, which is based on the minimum distance to a family M of multinomial distributions. If d is Euclidean distance and M is the independence model then the calculation of minimum distance Equation (1) requires numerical optimization. Generally, the method relies on the existence of a continuous minimizer of Equation (1). Unfortunately, it could not be shown if a continuous minimizer exists for the independence model. Instead, Ostrovski [5] assumes the existence of a continuous minimizer at all points and then applies the method to test for approximate independence. Additionally, numerical calculation of minimum distance Equation (1) makes the bootstrap test computationally intensive. We follow the lines of [5], but avoid the numerical valuation of the minimum distance Equation (1) in the special case of independence testing. We also propose an efficient bootstrap test, which is based on the randomized estimator of the boundary points. Any two-way contingency table of the size k 1 × k 2 corresponds to a probability matrix from R k 1 × R k 2 . Let p = p ij denote the probability matrix. Let M be the independence model, which contains all product measures of the corresponding dimensions. The approximate row-column independence can be shown by testing where ε > 0 is a tolerance parameter. Let r and c denote the probability vectors of the marginal distributions, which are defined by A probability matrix p belongs to M iff the equality p ij = r i c j is fulfilled for all p ij . We consider the transformations h a and h r of the matrix p, which are defined by For any differentiable distance l on R k 1 × R k 2 we define two new distances d a (p, q) = l (h a (p) , h a (q)) and d r (p, q) = l (h r (p) , h r (q)). It should be noted that d a and d r are only pseudo-metrics because d r (p, q) = 0 or d a (p, q) = 0 does not imply p = q. We put these distances in Equation (1) and obtain and d r (p, M) = l (h r (p) , 1), where 0 denotes the zero matrix and 1 is the matrix of ones. The distances d a (p, M) and d r (p, M) can be interpreted respectively as the absolute deviation and the relative deviation between p and the product measure of the marginal distributions. The distances d a (p, M) and d r (p, M) are easy to calculate without optimization.
Therefore, d a and d r are good candidates for the general distance d in Definition (1) and we will use only these two specific distances in remainder of the paper.
We observe a contingency table p n of relative frequencies, where n is the sample size and p is the true underlying probability matrix. Then the test statistic for Equation (2) is T a (p n ) = √ n (d a (p n , M) − ε) or T r (p n ) = √ n (d r (p n , M) − ε) depending on user preference. Below we write d * instead of d a and d r if the statements are correct for both distances. We use the subscript * instead of a and r, if appropriate.

Asymptotic Tests
In this section, we derive the asymptotic distribution of the test statistic and give a detailed description of the asymptotic test.
Let v : R k 1 × R k 2 → R k 1 +k 2 be the usual bijection v(p) = p 11 , p 12 , . . . , p k 1 k 2 . Letd * denote the derivative of the function q → d * v −1 (q) , M , which can be easily calculated using the chain rule. Proposition 1. Let p be a boundary point of H 0 and q = v (p). Let D q denote a square diagonal matrix, whose diagonal entries are q 1 , . . . , q k 1 +k 2 . Then the asymptotic distribution of T * (p n ) is Gaussian with mean zero and variance σ * (p) =d * (q) Σ (q)d * (q) t , where Σ (q) = D q − qq t is a covariance matrix.
Proof. Let q n = v (p n ). The normalized vector √ n (q n − q) converges weakly to a random variable, which is Gaussian with mean zero and covariance matrix Σ (q), see [6], Theorem 14.3-4 for details. The assertion follows by the delta method, see [7], p. 26, Theorem 3.1.
The asymptotic variance σ * (p) is unknown and can be estimated by σ * (p n ). The estimator σ * (p n ) is consistent by the continuous mapping theorem because p → σ * (p) is a continuous function. Let l α denote the lower α-quantile of the normal distribution. Then the critical value of the asymptotic test is l α σ * (p n ). Now we have all components of the asymptotic test, which can be carried out as follows: 1. Given are the contingency table p n of relative frequencies, the tolerance parameter ε and the significance level α.

Calculate the test statistic
The outlined test is locally asymptotically most powerful, see [8], Proposition 3.

Remark 1.
The minimum tolerance parameter ε, for which the asymptotic test rejects H 0 , can be calculated as d * (p n , M) − 1 √ n l α σ * (p n ).

Remark 2.
The asymptotic test can be straightforward generalized for the multi-way contingency tables.

Bootstrap Tests
The parametric bootstrap is an efficient method to improve the finite sample performance of the proposed tests. Let ∂H 0 denote the boundary of H 0 . Letp n denote an estimator of p, which fulfills the conditionp n ∈ ∂H 0 . The critical value c (α, p) can be estimated by c (α,p n ) = sup x∈R {x : P (T * (p n ) ≤ x) ≤ α} because the critical value should be estimated so as if H 0 were true. The estimator c (α,p n ) can be computed by the Monte Carlo method to any degree of accuracy.
The minimum distance estimator of p would be difficult to compute because the boundary ∂H 0 cannot be parameterized to apply common optimization techniques. Therefore, we propose a computationally feasible estimator of p, which is based on the randomized approximation to the minimum distance estimator. Let q be some probability matrix such that d * (q, M) > ε. If d * (p n , M) ≤ ε, then let a n be the largest number from [0, 1] such that d * (a n p n + (1 − a n ) q, M) = ε. Otherwise let a n = 1. The linear combination c (p n , q) = a n p n + (1 − a n ) q is a consistent estimator of the boundary point p under additional requirements as shown below.
Proof. We show that a n → 1 for n → ∞. Let d * (p n , M) < ε because a n = 1 otherwise. The function Therefore, there exists a largest number a n ∈ [0, 1] such that f (a n ) = ε. It is worth mentioning that a n is a function of p n .
Let E = {lim n→∞ p n = p}. By the strong law of large numbers, p n converges to p a.e. for n → ∞ and therefore P(E) = 1. Let ω ∈ E be an arbitrary point and let a n (ω) denote a n ( f n (ω)). The sequence of a n (ω) is bounded. Hence, there exists a convergent sub-sequence a n j (ω) → a 0 (ω) for j → ∞. We obtain a n j (ω) p n j (ω) We conclude a 0 (ω) = 1 due to the assumption d * (ap + (1 − a) q, M) > ε for all a ∈ [0, 1). Overall, we have shown that a n (ω) → 1 for all ω ∈ E.
Let Q = {q 1 , . . . , q m } be a finite set of probability matrices, such that any q i fulfills d * (q i , M) > ε. We define the estimatorp n as a minimum distance estimator among the linear combinations c (p n , q) for all q ∈ Q. Formally, the estimatorp n equals c (p n , q i ), which fulfills the conditions q i ∈ Q and l (c (p n , q i ) , p n ) = min q∈Q l (c (p n , q) , p n ). Note that the distance l is used to define the estimatorp n because d * is a pseudo-metric only. Corollary 1. Let at least one q ∈ Q satisfy d * (ap + (1 − a) q, M) > ε for all a ∈ [0, 1). Thenp n → p a.e. for n → ∞.
Proof. By definition ofp n , we obtain where l (c (p n , q) , p) → 0 a.e. by Proposition 2 and l (p, p n ) → 0 a.e. by the strong law of large numbers.
The bootstrap test can be carried out as follows: 1. Given are the contingency table p n of relative frequencies, the tolerance parameter ε, the number of exterior points m and the significance level α. 5. Solve the equation d * (a n p n + (1 − a n )q, M) = ε for a n using some root finding method. Repeat for all q ∈ Q. 6. Find the minimum distance estimatorp n among all linear combinations c (p n , q), where q ∈ Q. 7. Estimate the critical value c (α,p n ) using Monte Carlo simulation. 8. Reject H 0 if T * (p n ) ≤ c (α,p n ).

Remark 4.
The appropriate number of exterior points m can be found empirically. We found that m = (k 1 + k 2 ) * 50 is sufficient and scales well with the table size.

Remark 5.
The minimum tolerance parameter ε, for which the bootstrap test rejects H 0 , can be found numerically. For this purpose, the equation T * (p n ) = c (α,p n ) should be solved for the tolerance parameter ε using some root finding algorithm. The exterior points and bootstrap samples should remain unchanged during optimization.

Simulation Study of Finite Sample Performance
We study the finite sample performance of the proposed tests by the Monte Carlo simulation for different sample sizes and table sizes. The tests are implemented in VB.NET and available online, see https://github.com/TestingEquivalence/TestingApproximateIndependence.
The distance l is scaled Euclidean distance l 2 , where the scale factor is necessary to obtain comparable test results for different table sizes. We use l = 1 √ k 1 k 2 l 2 in case of d r and l = √ k 1 k 2 l 2 in case of d a . Alternatively the smoothed total variation distance would be a good choice, see [8].
The minimum ε, for which the test power equals 0.9, is calculated for different table sizes and sample sizes at the uniform probability matrices for the purpose of throwing some light on the appropriate values of ε and the effective sample sizes. Table 1 shows the minimum ε for the distance d r . The minimum ε for d a can be found in Table A1 because the results are very similar for d a and d r . The minimum ε decreases with the increasing sample size at the rate n − 1 2 . The minimum parameter ε climbs with the increasing table size at the rate k 1 + k 2 . Thus, the test power falls slowly with the increasing table size. The bootstrap tests have a smaller minimum ε than the asymptotic tests and the difference increases considerably with the table size. Table 1. Minimum tolerance parameter ε, for which the test power equals 0.9 at nominal level α = 0.05, is calculated at the uniform probability matrices using the distance d r . The sample size is 100, . . . , 10,000. table size  100  200  500  1000  2000  We study the type I error rates at 100 randomly selected points from ∂H 0 because the boundary of H 0 is a very complex set and it is difficult to identify particularly interesting boundary points. The points are found using steps 4 and 5 of the algorithm at the end of Section 3. The sample size n equals 100 * (k 1 + k 2 ) to maintain similar test power for different A detailed analysis of the boundary points shows that the test power is far above the nominal level at the points, where r i c j is close to zero for some i and j. Therefore, the test results should be treated with caution, if the marginal probability is close to zero for at least one category. Table 2. Summary of the simulated exact rejection probability of the equivalence tests at nominal level α = 0.05 and tolerance parameter ε = 0.2. The rejection probability is simulated at 100 randomly selected boundary points. The sample size is (k 1 + k 2 ) * 100 and the number of replications is 10.000 for each experiment.

Asymptotic Test Based on d r
Asymptotic The conservative tests can be obtained by shrinking the tolerance parameter ε. Table A5 summarizes the simulation results for ε = 0.18, where the test power is calculated at the same points as in Table 2, i.e., d * (p, M) = 0.2 at all considered points. Then the d a based tests are conservative at all points and the d r based tests are still non conservative at some points.
The type II error rates are studied at 100 randomly selected product measures for each table size, see Table 3. It should be noted that Table 3 contains test power and the type II error rate equals 1 minus test power. The sample size equals 100 * (k 1 + k 2 ) to be comparable to the type I error analysis. The power of d r -based tests changes very strongly from point to point. Given the fixed table size, the power of the d a -based tests is almost constant at all considered points. The averaged power of the asymptotic tests decreases slightly with the increasing table size. The averaged power of the bootstrap tests does not change with the table size. Table 3. Summary of the simulated test power at nominal level α = 0.05 and tolerance parameter ε = 0.2. The rejection probability is simulated at 100 randomly selected product measures. The sample size is (k 1 + k 2 ) * 100 and the number of replications is 10.000 for each experiment.

Asymptotic Test Based on d r
Asymptotic

Real Data Sets
To demonstrate the application of the proposed tests, three examples with real data sets are considered: gender and nitrendipine therapy (Nitrendipine); eye color and hair color (Color); children number and income (Children). The corresponding two way contingency tables are given in Appendix A, Tables A2, A3 and A4. Table 4 displays the minimum tolerance parameter ε, for which H 0 can be rejected at the nominal level α = 0.05. The three examples are also used in [5], such that a direct comparison is possible. The results for distance d a are similar to those presented in [5] after appropriate re-scaling. However, we avoid the unproven assumptions and the extensive use of the numeric optimization, which are necessary in [5].
The first example concerns with the question if the treatment outcome on nitrendipine mono-therapy in patients suffering from mild arterial hypertension depends on gender. The data set is also an example for approximate independence in [4]. The asymptotic and bootstrap test results for d r are very close to each other. The results for d a differ considerably for the asymptotic and bootstrap test. Given the small sample size, the treatment outcome and gender can be considered approximately independent.
A common example for independence testing is the cross-classification of eye color and hair color, see [2,3]. The test results in Table 4 reflect the well known fact that eye color and hair color are not independently distributed. All tests behave very similarly and can reject H 0 only for very large values of ε.
The cross-classification of the number of children by the annual income has a large sample size. However, the category, where the number of children is larger than or equal 4, is sparsely populated. Therefore, the d r based tests can reject H 0 only for comparatively large values of ε and the test power is low. The d a based tests show that the number of children and annual income may be considered approximately independent, but the approximation is very inaccurate.

Conflicts of Interest:
The author declares no conflict of interest.
Appendix A Table A1. Minimum tolerance parameter ε, for which the test power equals 0.9 at nominal level α = 0.05, is calculated at the uniform probability matrices using the distance d a . The sample size is 100, . . . , 10,000.  Table A5. Summary of the simulated exact rejection probability of the equivalence tests at nominal level α = 0.05 and shrunk tolerance parameter ε = 0.18. The rejection probability is simulated at 100 randomly selected boundary points of H 0 = {d * (p, M) ≥ 0.2}. The sample size is (k 1 + k 2 ) * 100 and the number of replications is 10.000 for each experiment.