Quasi Association Models for Square Contingency Tables with Ordinal Categories

Abstract: When the row and column variables of a contingency table are not independent, the analysis turns to statistical models other than independence. Many association models have been proposed to describe the structure of the odds ratios. Additionally, symmetry and asymmetry models have been proposed to analyze the cell probabilities of square contingency tables with symmetric or asymmetric structures. This paper proposes an asymmetry plus association model for square contingency tables with ordinal categories and gives a partition of the goodness-of-fit test statistic based on the proposed model.


Introduction
A categorical variable takes values in a set of categories. Such variables are used in diverse fields such as the social sciences, medical sciences, engineering, and education. Here, we consider a categorical variable with r categories and another one with c categories. The two variables have rc possible combinations of outcomes, which can be displayed in a rectangular table with r rows and c columns, where the cells represent the rc possible outcomes. This is called a contingency table (for more details, see [1,2]). A contingency table shows the joint frequencies for each combination of the two categorical variables. When analyzing a contingency table, only the observed frequencies are available; the true distribution is unknown. One of the aims of analyzing a contingency table is to estimate this unknown probability distribution from the observed frequencies. The estimated distribution is more reliable when fewer parameters are used to describe the data, so a parsimonious model is often desirable. Traditionally, a contingency table is used to evaluate whether the classifications are associated. That is, the analysis determines whether the two variables are statistically independent.
If the two variables have the same categories, the table is called a "square" contingency table. When the observed frequencies are concentrated in the main diagonal cells, the two variables are dependent. Even if the observations are not concentrated on the main diagonal, one large frequency and several small frequencies in each row and each column indicate a strong association between the categories of one variable and those of the other, and hence a strong dependence. This situation is common in real-world data. Since independence rarely holds in practice, a suitable model for representing dependent data is important. Consequently, many statisticians consider statistical models other than the independence model and study methods of estimation and hypothesis testing based on them.
When statistical independence between two variables does not hold, association models, which describe the structure of the odds ratios, have been used to analyze contingency tables. On the other hand, symmetry or asymmetry models, which describe the structure of the ratios of cell probabilities in symmetric positions, are often used to analyze square contingency tables.
This study proposes a model with characteristics of both an association model and an asymmetry model. Our model is more parsimonious than many association or asymmetry models. Hence, our model may estimate the distribution better than conventional association models and asymmetry models. This paper is organized as follows. Section 2 reviews previous research and proposes an asymmetry plus association model. Section 3 derives a necessary and sufficient condition, expressed with our model, for the symmetry plus quasi-uniform association model. Section 4 provides methods to evaluate model fit based on goodness-of-fit. Section 5 concludes this paper.

Models
For an $r \times r$ square contingency table with ordinal categories, let $\pi_{ij}$ denote the probability that an observation falls in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). Goodman [3][4][5] considered many association models for contingency tables. For example, the quasi-uniform association (QU) model is defined as
$$\pi_{ij} = \begin{cases} \mu \alpha_i \beta_j \theta^{ij} & (i \neq j), \\ \psi_{ii} & (i = j). \end{cases}$$
Without loss of generality, we impose $\alpha_r = \beta_r = 1$. The odds ratio for rows $i$ and $j$ ($> i$) and columns $s$ and $t$ ($> s$) is denoted by $\phi_{(ij;st)}$. That is,
$$\phi_{(ij;st)} = \frac{\pi_{is}\pi_{jt}}{\pi_{js}\pi_{it}}.$$
Using the odds ratios, the QU model can be expressed as
$$\phi_{(ij;st)} = \theta^{(j-i)(t-s)} \quad (i < j;\ s < t;\ i \neq s,\ i \neq t,\ j \neq s,\ j \neq t).$$
The QU model with $\theta = 1$ is the quasi-independence (QI) model (see p. 426 in Agresti [6]). That is,
$$\pi_{ij} = \begin{cases} \mu \alpha_i \beta_j & (i \neq j), \\ \psi_{ii} & (i = j). \end{cases}$$
On the other hand, many statisticians have analyzed square contingency tables using a symmetric structure or an asymmetric structure for the cell probabilities. Bowker [7] proposed the symmetry (S) model, which is defined as $\pi_{ij} = \psi_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, r$), where $\psi_{ij} = \psi_{ji}$. This model indicates the symmetric structure of the cell probabilities. Stuart [8] proposed the marginal homogeneity (MH) model, which is defined as
$$\pi_{i+} = \pi_{+i} \quad (i = 1, \ldots, r),$$
where $\pi_{i+} = \sum_{j=1}^{r} \pi_{ij}$ and $\pi_{+i} = \sum_{j=1}^{r} \pi_{ji}$. The MH model indicates that the row marginal distribution is identical to the column marginal distribution.
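As a minimal numerical sketch of the link between the two forms of the QU model, the following code assumes that the model is $\pi_{ij} = \mu\alpha_i\beta_j\theta^{ij}$ off the main diagonal with free diagonal cells, and checks that every odds ratio built from off-diagonal cells equals $\theta^{(j-i)(t-s)}$. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def qu_table(mu, alpha, beta, theta, diag):
    """Cell probabilities under the assumed QU form (categories are 1-based)."""
    r = len(alpha)
    cells = [[0.0] * r for _ in range(r)]
    for i in range(1, r + 1):
        for j in range(1, r + 1):
            if i == j:
                cells[i - 1][j - 1] = diag[i - 1]  # free diagonal parameter psi_ii
            else:
                cells[i - 1][j - 1] = mu * alpha[i - 1] * beta[j - 1] * theta ** (i * j)
    total = sum(map(sum, cells))
    return [[c / total for c in row] for row in cells]  # rescale so the cells sum to 1

def odds_ratio(p, i, j, s, t):
    """phi_(ij;st) = pi_is * pi_jt / (pi_js * pi_it)."""
    return p[i - 1][s - 1] * p[j - 1][t - 1] / (p[j - 1][s - 1] * p[i - 1][t - 1])

def max_relative_gap(p, theta):
    """Largest relative gap between phi_(ij;st) and theta**((j-i)*(t-s))."""
    r = len(p)
    worst = 0.0
    for i, j in combinations(range(1, r + 1), 2):
        for s, t in combinations(range(1, r + 1), 2):
            if {i, j} & {s, t}:
                continue  # a diagonal cell would be involved, so skip
            target = theta ** ((j - i) * (t - s))
            worst = max(worst, abs(odds_ratio(p, i, j, s, t) / target - 1.0))
    return worst

# Hypothetical parameter values for a 4 x 4 table (alpha_r = beta_r = 1).
p = qu_table(mu=1.0, alpha=[0.5, 1.2, 0.8, 1.0], beta=[1.5, 0.7, 1.1, 1.0],
             theta=1.3, diag=[2.0, 3.0, 1.0, 4.0])
print(max_relative_gap(p, theta=1.3))  # close to zero, up to rounding error
```

The rescaling step only changes $\mu$ and the diagonal parameters, so it leaves the odds ratios unchanged.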
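Caussinus [9] proposed the quasi-symmetry (QS) model, which is defined as
$$\pi_{ij} = \alpha_i \beta_j \psi_{ij} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r),$$
where $\psi_{ij} = \psi_{ji}$. This model is identical to the S model when $\alpha_i = \beta_i$. The QS model can be expressed as
$$\phi_{(ij;st)} = \phi_{(st;ij)} \quad (i < j;\ s < t).$$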
The QS model indicates the symmetric structure of the odds ratios. The QU model implies the QS model; that is, the QS model holds whenever the QU model holds. When the S model does not hold, asymmetry models, which impose weaker restrictions than the S model, can be applied. For example, Tahata and Tomizawa [10] proposed the $k$th linear asymmetry (LS$_k$) model, which is defined for a fixed $k$ ($k = 1, \ldots, r-1$) as
$$\pi_{ij} = \left( \prod_{l=1}^{k} \alpha_l^{i^l} \beta_l^{j^l} \right) \psi_{ij} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r),$$
where $\psi_{ij} = \psi_{ji}$. Note that when $\alpha_l = \beta_l$, this model is the S model. As $k$ increases, the LS$_k$ model becomes less restrictive, and the LS$_{r-1}$ model is the QS model. Namely, the LS$_k$ model is an intermediate model between the S model and the QS model. The LS$_k$ model can be expressed as
$$\frac{\pi_{ij}}{\pi_{ji}} = \prod_{l=1}^{k} \gamma_l^{\,j^l - i^l} \quad (i < j).$$
The LS$_k$ model includes the linear diagonals-parameter symmetry model [11] and the extended linear diagonals-parameter symmetry model [12].
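As a brief check of the ratio form, suppose the LS$_k$ model is parametrized as $\pi_{ij} = (\prod_{l=1}^{k} \alpha_l^{i^l} \beta_l^{j^l})\psi_{ij}$ with $\psi_{ij} = \psi_{ji}$. This parametrization and the identification $\gamma_l = \beta_l/\alpha_l$ are assumptions made here for illustration. The symmetric factor then cancels in the ratio of cells in symmetric positions:

```latex
\frac{\pi_{ij}}{\pi_{ji}}
  = \frac{\prod_{l=1}^{k} \alpha_l^{i^l} \beta_l^{j^l}\,\psi_{ij}}
         {\prod_{l=1}^{k} \alpha_l^{j^l} \beta_l^{i^l}\,\psi_{ji}}
  = \prod_{l=1}^{k} \left( \frac{\beta_l}{\alpha_l} \right)^{j^l - i^l}
  = \prod_{l=1}^{k} \gamma_l^{\,j^l - i^l},
  \qquad \gamma_l = \frac{\beta_l}{\alpha_l}.
```

For $k = 1$ the ratio reduces to $\gamma_1^{\,j-i}$, which depends only on the distance $j - i$ from the main diagonal.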
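Goodman [4] introduced the symmetry plus quasi-independence (SQI) model, which is defined as
$$\pi_{ij} = \begin{cases} \mu \alpha_i \alpha_j & (i \neq j), \\ \psi_{ii} & (i = j). \end{cases}$$
This model is a special case of the S model with $\psi_{ij} = \mu \alpha_i \alpha_j$ ($i \neq j$) and of the QI model with $\alpha_i = \beta_i$.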
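Yamamoto and Tomizawa [13] proposed the symmetry plus quasi-uniform association (SQU) model, which is defined as
$$\pi_{ij} = \begin{cases} \mu \alpha_i \alpha_j \theta^{ij} & (i \neq j), \\ \psi_{ii} & (i = j). \end{cases}$$
The SQU model implies the S model and the QU model. Note that the SQU model is identical to the SQI model when $\theta = 1$.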
Association models and asymmetry models have been proposed independently. However, an asymmetry plus association model, which considers both the asymmetric structure of the cell probabilities and the structure of the odds ratios, has rarely been considered.
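Here, we propose a new model defined for a fixed $k$ ($k = 1, \ldots, r-1$) as
$$\pi_{ij} = \begin{cases} \mu \alpha_i \alpha_j \left( \prod_{l=1}^{k} \delta_l^{\,j^l - i^l} \right) \theta^{ij} & (i \neq j), \\ \psi_{ii} & (i = j). \end{cases}$$
Without loss of generality, we set $\alpha_r = 1$. This model is called the $k$th linear asymmetry plus quasi-uniform association (LSQU$_k$) model. When $\theta = 1$, it is called the $k$th linear asymmetry plus quasi-independence (LSQI$_k$) model.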
If the LSQU$_k$ model holds, then
$$\frac{\pi_{ij}}{\pi_{ji}} = \prod_{l=1}^{k} \delta_l^{\,2(j^l - i^l)} \quad (i < j).$$
Hence, the LS$_k$ model holds with $\gamma_l = \delta_l^2$ in Equation (14). Additionally,
$$\phi_{(ij;st)} = \theta^{(j-i)(t-s)} \quad (i < j;\ s < t;\ i \neq s,\ i \neq t,\ j \neq s,\ j \neq t).$$
Therefore, the LSQU$_k$ model shows characteristics of both the LS$_k$ model and the QU model. This model with $\delta_l = 1$ for $l = 1, \ldots, k$ is the SQU model. When $k = r - 1$, the LSQU$_k$ model implies On the other hand, the QU model implies where l provides a one-to-one relation between $\{\lambda_1, \ldots, \lambda_{r-1}\}$ and $\{\gamma_1, \ldots, \gamma_{r-1}\}$. This means that the LSQU$_{r-1}$ model is equivalent to the QU model. The LSQU$_k$ ($k < r - 1$) model is a special case of the QU model since the LSQU$_{r-1}$ model with $\delta_l = 1$ for $l = k+1, \ldots, r-1$ is the LSQU$_k$ model. Hence, the LSQU$_k$ model is an intermediate model between the SQU and QU models. Similarly, the LSQI$_{r-1}$ model is equivalent to the QI model. That is, the LSQI$_k$ model is an intermediate model between the SQI and QI models (for more details, see Figure 1).
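The asymmetry implied by the proposed model can be checked numerically. The sketch below assumes the LSQU$_k$ form $\pi_{ij} = \mu\alpha_i\alpha_j(\prod_{l}\delta_l^{\,j^l-i^l})\theta^{ij}$ for $i \neq j$, which is a reconstruction from the ratio $\pi_{ij}/\pi_{ji}$ used later in the Example. The function names and parameter values are hypothetical.

```python
def lsqu_table(mu, alpha, delta, theta, diag):
    """Cell probabilities under the assumed LSQU_k form, with k = len(delta)."""
    r, k = len(alpha), len(delta)
    cells = [[0.0] * r for _ in range(r)]
    for i in range(1, r + 1):
        for j in range(1, r + 1):
            if i == j:
                cells[i - 1][j - 1] = diag[i - 1]  # free diagonal parameter psi_ii
                continue
            asym = 1.0
            for l in range(1, k + 1):
                asym *= delta[l - 1] ** (j ** l - i ** l)  # asymmetry factor
            cells[i - 1][j - 1] = mu * alpha[i - 1] * alpha[j - 1] * asym * theta ** (i * j)
    total = sum(map(sum, cells))
    return [[c / total for c in row] for row in cells]

def ratio_matches(p, delta, tol=1e-9):
    """Check pi_ij / pi_ji == prod_l delta_l ** (2 * (j**l - i**l)) for all i < j."""
    r, k = len(p), len(delta)
    for i in range(1, r + 1):
        for j in range(i + 1, r + 1):
            expected = 1.0
            for l in range(1, k + 1):
                expected *= delta[l - 1] ** (2 * (j ** l - i ** l))
            if abs(p[i - 1][j - 1] / p[j - 1][i - 1] / expected - 1.0) > tol:
                return False
    return True

# Hypothetical values: r = 4, k = 2, alpha_r = 1.
alpha, diag = [0.6, 1.1, 0.9, 1.0], [1.0, 1.0, 1.0, 1.0]
p_asym = lsqu_table(1.0, alpha, [0.9, 1.05], 1.2, diag)
p_sym = lsqu_table(1.0, alpha, [1.0, 1.0], 1.2, diag)  # delta_l = 1 gives a symmetric table
print(ratio_matches(p_asym, [0.9, 1.05]))
```

Setting every $\delta_l$ to 1 removes the asymmetry factor, which is the sense in which the SQU model is the special case $\delta_l = 1$.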

Necessary and Sufficient Condition for the SQU Model
Caussinus [9] introduced a necessary and sufficient condition for the S model. This condition separates the S model into multiple models with weaker restrictions than the S model. If model $M_1$ holds if and only if both models $M_2$ and $M_3$ hold, then analyzing models $M_2$ and $M_3$ reveals the structure of the cell probabilities in more detail. Here, we are interested in deriving a necessary and sufficient condition for the SQU model using the LSQU$_k$ model. Yamamoto and Tomizawa [13] provided the following necessary and sufficient condition for the SQU model.

Theorem 1.
The SQU model holds if and only if both the QU model and the MH model hold.

Let $X$ and $Y$ denote the row and column variables, respectively, and consider a model defined for a fixed $k$ ($k = 1, \ldots, r-1$), which is given as
$$E(X^l) = E(Y^l) \quad (l = 1, \ldots, k),$$
where $E(X^l) = \sum_i \sum_j i^l \pi_{ij}$ and $E(Y^l) = \sum_i \sum_j j^l \pi_{ij}$. This model is referred to as the marginal $k$th moment equality (ME$_k$) model. This leads to the following theorem.

Theorem 2.
For any $k$ ($k = 1, \ldots, r-1$), the SQU model holds if and only if both the LSQU$_k$ model and the ME$_k$ model hold.
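Proof. If the SQU model holds, the LSQU$_k$ model holds because the LSQU$_k$ model with $\delta_l = 1$ ($l = 1, \ldots, k$) is the SQU model. Since the SQU model implies the S model, we can see that
$$E(X^l) = \sum_i \sum_j i^l \pi_{ij} = \sum_i \sum_j i^l \pi_{ji} = E(Y^l) \quad (l = 1, \ldots, k).$$
Hence, the ME$_k$ model also holds, and the necessity is proved. Conversely, suppose that both the LSQU$_k$ model and the ME$_k$ model hold; we prove that the SQU model holds. If the LSQU$_k$ model holds, from Equation (14), we obtain
$$\log \pi_{ij} - \log \pi_{ji} = 2 \sum_{l=1}^{k} (j^l - i^l) \log \delta_l \quad (i < j).$$
The ME$_k$ model can also be expressed as
$$\sum_{i<j} (j^l - i^l)(\pi_{ij} - \pi_{ji}) = 0 \quad (l = 1, \ldots, k).$$
From the LSQU$_k$ model and the ME$_k$ model, we obtain
$$\sum_{i<j} (\pi_{ij} - \pi_{ji})(\log \pi_{ij} - \log \pi_{ji}) = 2 \sum_{l=1}^{k} \log \delta_l \sum_{i<j} (j^l - i^l)(\pi_{ij} - \pi_{ji}) = 0.$$
Since the logarithmic function is strictly increasing, for any $i \neq j$,
$$(\pi_{ij} - \pi_{ji})(\log \pi_{ij} - \log \pi_{ji}) \geq 0,$$
with equality only when $\pi_{ij} = \pi_{ji}$. Hence, each term of the sum must vanish, so Equation (22) holds only with $\pi_{ij} = \pi_{ji}$; that is, the S model holds. When the S model holds, the MH model holds. Additionally, the LSQU$_k$ model is a special case of the QU model. From Theorem 1, the SQU model holds. The proof is complete.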
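The ME$_k$ condition used in Theorem 2 is straightforward to evaluate for a given table. The following sketch assumes that the ME$_k$ model means $E(X^l) = E(Y^l)$ for $l = 1, \ldots, k$ and computes the moment gaps $E(Y^l) - E(X^l)$. The two 3 × 3 tables are made-up probabilities for illustration, not data from this paper.

```python
def marginal_moment_gaps(p, k):
    """Return [E(Y^l) - E(X^l) for l = 1..k] for a table p of cell probabilities."""
    r = len(p)
    gaps = []
    for l in range(1, k + 1):
        ex = sum(i ** l * p[i - 1][j - 1] for i in range(1, r + 1) for j in range(1, r + 1))
        ey = sum(j ** l * p[i - 1][j - 1] for i in range(1, r + 1) for j in range(1, r + 1))
        gaps.append(ey - ex)
    return gaps

def satisfies_me(p, k, tol=1e-12):
    """True when the first k marginal moments of rows and columns agree."""
    return all(abs(g) <= tol for g in marginal_moment_gaps(p, k))

# Made-up symmetric table: satisfies ME_k for every k.
sym = [[0.20, 0.10, 0.05],
       [0.10, 0.20, 0.05],
       [0.05, 0.05, 0.20]]

# Made-up asymmetric table: mass moved from cell (2, 1) to cell (1, 2).
asym = [[0.20, 0.15, 0.05],
        [0.05, 0.20, 0.05],
        [0.05, 0.05, 0.20]]

print(satisfies_me(sym, 2), satisfies_me(asym, 1))
```

A symmetric table always satisfies the ME$_k$ model, which is the necessity part of the proof; the asymmetric table shifts the column mean above the row mean.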
Theorem 2 is a generalization of Yamamoto and Tomizawa's result because the LSQU$_{r-1}$ model is equivalent to the QU model and the ME$_{r-1}$ model is equivalent to the MH model (see [14]). This leads to the following corollary.

Corollary 1.
For any $k$ ($k = 1, \ldots, r-1$), the SQI model holds if and only if both the LSQI$_k$ model and the ME$_k$ model hold.
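To illustrate why adding the ME$_k$ model forces symmetry, consider the case $k = 1$. The display below is a sketch that assumes the LSQU$_1$ (or LSQI$_1$) model gives $\pi_{ij}/\pi_{ji} = \delta_1^{2(j-i)}$ for $i < j$, the form used in the Example, and that all off-diagonal cell probabilities are positive:

```latex
E(Y) - E(X)
  = \sum_{i}\sum_{j} (j - i)\,\pi_{ij}
  = \sum_{i<j} (j - i)\,(\pi_{ij} - \pi_{ji})
  = \sum_{i<j} (j - i)\,\pi_{ji}\left( \delta_1^{2(j-i)} - 1 \right).
```

Every term of the last sum has the same sign as $\delta_1 - 1$, so the marginal means agree only when $\delta_1 = 1$, in which case the table is symmetric.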

Partition of Test Statistics
Here, we describe a method to evaluate model fit. We consider a hypothesis test in which the null hypothesis is that model $M$ holds and the alternative hypothesis is that model $M$ does not hold. Let $n_{ij}$ denote the observed frequency in the $(i, j)$th cell of the table and $m_{ij}$ the corresponding expected frequency, with $n = \sum_i \sum_j n_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, r$). Assume that $\{n_{ij}\}$ has a multinomial distribution, and let $\hat{m}_{ij}$ denote the maximum likelihood estimate (MLE) of $m_{ij}$ under a model. The likelihood ratio chi-squared statistic for the goodness-of-fit of model $M$ is defined as
$$G^2(M) = 2 \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij} \log \frac{n_{ij}}{\hat{m}_{ij}}.$$
The numbers of degrees of freedom (df) for testing the goodness-of-fit under the SQU, LSQU$_k$, and ME$_k$ models are $r^2 - 2r - 1$, $r^2 - 2r - 1 - k$, and $k$, respectively. The number of df for the SQU model is equal to the sum of those for the LSQU$_k$ and ME$_k$ models.
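As a small illustration of the statistic, the sketch below computes $G^2$ for the S model, whose MLE has the closed form $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$, and evaluates the df values stated above. Fitting the LSQU$_k$ model itself requires iterative estimation and is not attempted here. The counts are made up for illustration, and the function names are hypothetical.

```python
import math

def g_squared(observed, fitted):
    """Likelihood ratio statistic 2 * sum n_ij * log(n_ij / m_ij); empty cells contribute 0."""
    total = 0.0
    for n_row, m_row in zip(observed, fitted):
        for n, m in zip(n_row, m_row):
            if n > 0:
                total += n * math.log(n / m)
    return 2.0 * total

def symmetry_fit(observed):
    """Closed-form MLE of the expected frequencies under the symmetry (S) model."""
    r = len(observed)
    return [[(observed[i][j] + observed[j][i]) / 2.0 for j in range(r)] for i in range(r)]

def df_partition(r, k):
    """Degrees of freedom for the SQU, LSQU_k, and ME_k models, as stated in the text."""
    return r * r - 2 * r - 1, r * r - 2 * r - 1 - k, k

# Made-up 3 x 3 counts, used only to exercise the functions.
counts = [[10, 4, 2],
          [6, 12, 3],
          [2, 5, 9]]
g2_s = g_squared(counts, symmetry_fit(counts))
print(g2_s, df_partition(4, 1))
```

For $r = 4$ and $k = 1$, the three df values returned by `df_partition` satisfy the additivity noted above.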
Previous studies have discussed the separability of a model [15][16][17][18][19]. Separability means that a test statistic for the goodness-of-fit of model $M_1$ is asymptotically equivalent to the sum of the test statistics for models $M_2$ and $M_3$.

Theorem 3.
For any $k$ ($k = 1, \ldots, r-1$), the test statistic $G^2(\mathrm{SQU})$ is asymptotically equivalent to the sum of $G^2(\mathrm{LSQU}_k)$ and $G^2(\mathrm{ME}_k)$.
Proof. The LSQU$_k$ model can also be expressed as $\log \pi = X\beta = (1_{r^2}, X_1, X_2, X_{12})\beta$, where $\log \pi = (\log \pi_{11}, \ldots, \log \pi_{rr})^T$, $X$ is the $r^2 \times (2r + 1 + k)$ matrix, and $1_s$ is the $s \times 1$ vector of ones. Additionally, where and $X_{12}$ is the $r^2 \times (r + 1)$ matrix determined from Equation (25). Note that $O_{st}$ is the $s \times t$ zero matrix, $0_s$ is the $s \times 1$ zero vector, $J_r^l = (1^l, \ldots, r^l)^T$, and "$\otimes$" represents the Kronecker product. The matrix $X$ has full column rank, which is $K = 2r + 1 + k$.
We denote the linear space spanned by the columns of the matrix $X$ by $S(X)$, which has dimension $K$. Let $U$ be an $r^2 \times d_1$ full column rank matrix, where $d_1 = r^2 - 2r - 1 - k$, such that $S(U)$ is the orthogonal complement of $S(X)$. Hence, $U^T X = O_{d_1,K}$.
Let $h_1(\pi)$ be a vector of functions defined by $h_1(\pi) = U^T \log \pi$. Moreover, let $h_2(\pi)$ be a vector of functions defined by $h_2(\pi) = X_2^T \pi$, and note that $X_2^T U = O_{d_2,d_1}$, where $d_2 = k$, because the columns of $X_2$ belong to $S(X)$.
Since H 1 (π)π = U T 1 r 2 = 0 d 1 , H 1 (π)diag(π) = U T , and H 2 (π) = X T 2 , we obtain Under each hypothesis, h s (π) = 0 d s (s = 1, 2, 3), we see where The Wald statistic W s has an asymptotic chi-squared distribution with d s df. That is, (i) W 1 is the Wald statistic for the LSQU k model, (ii) W 2 is that for the ME k model, and (iii) W 3 is that for the SQU model. The proof is completed using the asymptotic equivalence of the Wald statistic and the likelihood ratio statistic as proved by Rao [21].
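The dimension count used in the proof can be checked with exact arithmetic. The sketch below builds one possible design matrix for the log-linear form of the LSQU$_k$ model and verifies that its rank equals $2r + 1 + k$ for $r = 4$. The column layout (intercept, $\log\alpha_a$, $\log\delta_l$, $\log\theta$, diagonal parameters) is an assumption made here and need not coincide with the blocks $X_1$, $X_2$, $X_{12}$ of the proof; only the column space matters for the rank.

```python
from fractions import Fraction

def lsqu_design(r, k):
    """Rows are cells (i, j) in row-major order; columns follow the assumed layout."""
    rows = []
    for i in range(1, r + 1):
        for j in range(1, r + 1):
            off = i != j
            row = [1]                                                       # intercept (log mu)
            row += [(i == a) + (j == a) if off else 0 for a in range(1, r)]  # log alpha_a, a < r
            row += [j ** l - i ** l for l in range(1, k + 1)]               # log delta_l (zero on the diagonal)
            row += [i * j if off else 0]                                    # log theta
            row += [1 if i == j == d else 0 for d in range(1, r + 1)]       # free diagonal cells
            rows.append([Fraction(v) for v in row])
    return rows

def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    m = [row[:] for row in matrix]
    found = 0
    for col in range(len(m[0])):
        pivot = next((q for q in range(found, len(m)) if m[q][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for q in range(found + 1, len(m)):
            factor = m[q][col] / m[found][col]
            if factor != 0:
                m[q] = [a - factor * b for a, b in zip(m[q], m[found])]
        found += 1
    return found

r = 4
for k in (1, 2, 3):
    X = lsqu_design(r, k)
    K = rank(X)
    print(k, len(X[0]), K, r * r - K)  # number of columns, rank K, and d_1 = r^2 - K
```

The last printed value is the dimension of the orthogonal complement, which corresponds to $d_1 = r^2 - 2r - 1 - k$ in the text.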
Theorem 3 is also a generalization of Yamamoto and Tomizawa's result since this theorem is identical to Yamamoto and Tomizawa's result when k = r − 1. Moreover, we obtain the following corollary.

Corollary 2.
For any $k$ ($k = 1, \ldots, r-1$), the test statistic $G^2(\mathrm{SQI})$ is asymptotically equivalent to the sum of $G^2(\mathrm{LSQI}_k)$ and $G^2(\mathrm{ME}_k)$.

Example
Table 1 shows the data cited by [22]. These data describe 59 matched pairs using 4 dose levels of conjugated estrogen. The models described herein are used to analyze these data. Table 2 shows the value of $G^2(M)$ for each model applied to the data in Table 1. That is, for each model $M$, the null hypothesis is that model $M$ holds, and the alternative hypothesis is that model $M$ does not hold. From Table 2, the SQI, SQU, S, and ME$_k$ models do not fit well, whereas the LSQI$_k$, LSQU$_k$, and LS$_k$ models are accepted at the 0.05 significance level ($k = 1, 2, 3$). We choose the most appropriate model among these models. If model $M_1$ is a special case of model $M_2$, a test based on the difference between the likelihood ratio chi-squared statistics can compare the fit of the two nested models. Let $d_1$ and $d_2$ denote the degrees of freedom for models $M_1$ and $M_2$, respectively. Assuming that model $M_2$ holds, the likelihood ratio chi-squared statistic for model $M_1$ is given as
$$G^2(M_1 \mid M_2) = G^2(M_1) - G^2(M_2).$$
This statistic has an asymptotic chi-squared distribution with $d_1 - d_2$ degrees of freedom. Using this test at the 0.05 significance level, the LSQI$_1$ model is the most appropriate model.
Table 3 shows the estimated expected frequencies from the LSQI$_1$ model for the data in Table 1. The maximum likelihood estimate of $\delta_1$ under the LSQI$_1$ model is 0.71. We estimate the ratio between two probabilities as $\hat{\pi}_{ij}/\hat{\pi}_{ji} = 0.71^{2(j-i)}$ for $i < j$. Therefore, the probability distribution for the average dose for a case tends to be stochastically higher than the probability distribution for the average dose for a control because $\hat{\delta}_1 < 1$. Finally, we are interested in inferring the reason for the poor fit of the SQI model. According to Corollary 1, the SQI model is separated into the LSQI$_1$ model and the ME$_1$ model. Since the LSQI$_1$ model fits very well but the ME$_1$ model fits very poorly, we deduce that the poor fit of the SQI model is caused by the lack of the ME$_1$ structure, that is, by the inequality of the marginal means.
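The estimated asymmetry can be tabulated directly from the reported estimate. The sketch below evaluates $\hat{\pi}_{ij}/\hat{\pi}_{ji} = 0.71^{2(j-i)}$ for the four dose levels and also provides a helper for the nested-model comparison described above. The helper name and its argument order are hypothetical, and no values from Table 2 are used.

```python
def estimated_ratio(delta_hat, i, j):
    """Estimated pi_ij / pi_ji under the LSQI_1 model for i < j."""
    return delta_hat ** (2 * (j - i))

def conditional_g2(g2_restricted, df_restricted, g2_general, df_general):
    """Difference statistic G2(M1 | M2) and its df, where M1 is a special case of M2."""
    return g2_restricted - g2_general, df_restricted - df_general

# Reported estimate for the data in Table 1: delta_1 = 0.71, with r = 4 dose levels.
ratios = {(i, j): estimated_ratio(0.71, i, j)
          for i in range(1, 5) for j in range(i + 1, 5)}
for (i, j), value in sorted(ratios.items()):
    print(f"pi_{i}{j} / pi_{j}{i} = {value:.3f}")
```

The ratio depends only on $j - i$: it is about 0.50 for adjacent categories, 0.25 for categories two steps apart, and 0.13 for the two extreme categories, so cells below the main diagonal are estimated to be more probable than their mirror images.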

Conclusions
Herein, we describe an asymmetry plus association model. This model describes both the asymmetric structure of the cell probabilities in symmetric positions and the structure of the odds ratios. Our model is an intermediate model between the SQU model and the QU model. If the QU (LSQU$_{r-1}$) model holds but the SQU model does not, the LSQU$_k$ model for $k < r - 1$ may hold. In this case, the QU model may overfit the data; that is, our model may be a more appropriate choice than the QU model under these conditions. Indeed, for the data in Table 1, the LSQI$_1$ model fits well, whereas the SQU model fits poorly and the QU model fits. Additionally, a theorem giving a necessary and sufficient condition for the SQU model is stated using our model. Using this theorem, we show the asymptotic separability of the SQU model. Namely, the likelihood ratio chi-squared statistic for the SQU model is asymptotically equivalent to the sum of those for the separated models, which helps to identify the reason why the SQU model does not hold.

Acknowledgments:
We would like to thank the three anonymous referees for their helpful comments and suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

QU        Quasi-uniform association
QI        Quasi-independence
S         Symmetry
MH        Marginal homogeneity
QS        Quasi-symmetry
LS$_k$    kth linear asymmetry
SQI       Symmetry plus quasi-independence
SQU       Symmetry plus quasi-uniform association
LSQU$_k$  kth linear asymmetry plus quasi-uniform association
LSQI$_k$  kth linear asymmetry plus quasi-independence
ME$_k$    Marginal kth moment equality
df        Degrees of freedom