Diagonals-parameter symmetry model and its property for square contingency tables with ordinal categories

Previously, the diagonals-parameter symmetry model based on $f$-divergence (denoted by DPS[$f$]) was reported to be equivalent to the diagonals-parameter symmetry model regardless of the function $f$, but the proof was omitted. Here, we derive the DPS[$f$] model and prove the relation between the two models. The result yields various interpretations of the diagonals-parameter symmetry model. Additionally, necessary and sufficient conditions for symmetry and a property of the goodness-of-fit test statistics are discussed.


Introduction
A square contingency table with the same ordinal row and column categories may arise when a categorical variable is measured repeatedly. In such a table, observations tend to concentrate on the main diagonal cells. Our interest therefore lies in symmetry rather than in independence between the row and column categories. Many studies have treated symmetry issues, for example, Bowker (1948), Kateri and Papaioannou (1997), Kateri and Agresti (2007), Tahata and Tomizawa (2011), and Tahata (2020).
Let $X$ and $Y$ respectively denote the row and column variables for an $r \times r$ contingency table with ordinal categories. Also, let $\pi_{ij}$ denote the probability that an observation falls in the $(i, j)$th cell ($i = 1, \ldots, r$; $j = 1, \ldots, r$). Goodman (1979) proposed the diagonals-parameter symmetry (DPS) model, which is defined by

$$\pi_{ij} = \begin{cases} d_k \psi_{ij} & (i < j), \\ \psi_{ij} & (i \ge j), \end{cases} \qquad (1)$$

where $\psi_{ij} = \psi_{ji}$ and $k = j - i$. The parameter $d_k$ in the DPS model is simply the odds that an observation will fall in one of the cells $(i, j)$ with $j - i = k$, $i < j$, rather than in one of the cells $(j, i)$ with $j - i = k$, $i < j$, for $k = 1, \ldots, r - 1$.
Additionally, for $j - i = k$, $i < j$, the ratio between $\pi_{ij}$ and $\pi_{ji}$ is expressed by the constant $d_k$. That is, the ratio depends only on the distance from the main diagonal cells.
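The odds interpretation of $d_k$ can be illustrated with a small sketch. It computes, for each distance $k$, the sample analogue of the odds: the total count in the cells $(i, j)$ with $j - i = k$ divided by the total count in the mirror cells $(j, i)$. The function name `estimate_dps_odds` and the 3 x 3 counts are hypothetical and serve only as an illustration; they are not data from this paper.

```python
def estimate_dps_odds(counts):
    """Return {k: sample odds} for k = 1, ..., r - 1.

    The sample odds for distance k is
    (sum of n_ij over j - i = k) / (sum of n_ji over j - i = k).
    """
    r = len(counts)
    odds = {}
    for k in range(1, r):
        upper = sum(counts[i][i + k] for i in range(r - k))  # cells (i, j), j - i = k
        lower = sum(counts[i + k][i] for i in range(r - k))  # mirror cells (j, i)
        odds[k] = upper / lower  # assumes the lower sum is positive
    return odds


# Hypothetical counts, made up for illustration only.
example_counts = [
    [20, 10, 6],
    [5, 30, 8],
    [2, 4, 25],
]
d_hat = estimate_dps_odds(example_counts)
```

With these made-up counts, cells one step above the diagonal hold 18 observations against 9 below, so the sample odds for $k = 1$ is 2.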
When $d_1 = \cdots = d_{r-1} = 1$ in equation (1), the DPS model reduces to the symmetry (S) model proposed by Bowker (1948). When $d_1 = \cdots = d_{r-1}$ in equation (1), so that $d_k$ does not depend on $k$, the DPS model reduces to the conditional symmetry (CS) model proposed by McCullagh (1978). Kateri and Papaioannou (1997) described the DPS model based on the $f$-divergence (the DPS[$f$] model), which is defined by equation (2). The function $f$ is twice differentiable and strictly convex. The derivation of this model is omitted in Kateri and Papaioannou (1997).

2 Properties of the DPS[$f$] model

Kateri and Papaioannou (1997) noted that the DPS[$f$] model is the closest model to the S model in terms of the $f$-divergence under the conditions where the sums $\sum_{j-i=k} \pi_{ij}$ (and $\sum_{i-j=k} \pi_{ij}$) for $k = 1, \ldots, r - 1$, as well as the sums $\pi_{ij} + \pi_{ji}$ for $i = 1, \ldots, r$; $j = 1, \ldots, r$, are given. Ireland et al. (1969), Kateri and Agresti (2007), and Tahata (2020), for example, mentioned a similar property for the symmetry (or asymmetry) model. This section derives the DPS[$f$] model and describes its properties.
We obtain the following theorem; the proof of Theorem 1 is given in the Appendix.
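The DPS[$f$] model can be expressed as

$$F(2\pi^c_{ij}) = \begin{cases} a_k + \gamma_{ij} & (i < j), \\ \gamma_{ij} & (i \ge j), \end{cases} \qquad (3)$$

where $F = f'$, $k = j - i$, $\gamma_{ij} = \gamma_{ji}$ and $\pi^c_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$. Note that $\pi^c_{ij}$ is the conditional probability that an observation falls in the $(i, j)$th cell given that it falls in either the $(i, j)$th cell or the $(j, i)$th cell. Namely, the DPS[$f$] model indicates that

$$F(2\pi^c_{ij}) - F(2\pi^c_{ji}) = a_k \quad (i < j). \qquad (4)$$

If $f(x) = x \log(x)$, $x > 0$, then the $f$-divergence reduces to the KL divergence. When we set $f(x) = x \log(x)$, so that $F(x) = \log(x) + 1$, equation (3) reduces to

$$\log(2\pi^c_{ij}) + 1 = \begin{cases} a_k + \gamma_{ij} & (i < j), \\ \gamma_{ij} & (i \ge j), \end{cases}$$

where $k = j - i$ and $\gamma_{ij} = \gamma_{ji}$. We shall refer to this model as the DPS$_{\mathrm{KL}}$ model.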
Under the DPS$_{\mathrm{KL}}$ model, the ratios of $\pi_{ij}$ and $\pi_{ji}$ for $i < j$ are expressed as

$$\frac{\pi_{ij}}{\pi_{ji}} = d^{\mathrm{KL}}_k, \qquad (5)$$

where $d^{\mathrm{KL}}_k = \exp(a_k)$ and $k = j - i$. Since equation (5) indicates that the ratio of $\pi_{ij}$ and $\pi_{ji}$ depends only on the distance $k = j - i$, the DPS$_{\mathrm{KL}}$ model is equivalent to the DPS model proposed by Goodman (1979). Namely, the DPS model is the closest model to the S model in terms of the KL divergence under the conditions where the sums $\sum_{i-j=k} \pi_{ij}$, $k \neq 0$, and the sums $\pi_{ij} + \pi_{ji}$ for $i = 1, \ldots, r$; $j = 1, \ldots, r$ are given. This is a special case of Theorem 1.
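The step from the KL choice of $f$ to the constant ratio can be written out as a short worked derivation. It assumes that the DPS[$f$] relation takes the form $F(2\pi^c_{ij}) - F(2\pi^c_{ji}) = a_k$ with $F = f'$ and $\pi^c_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$; this form is a reading of the surrounding definitions rather than a quotation.

```latex
\begin{align*}
f(x) = x \log x \;&\Longrightarrow\; F(x) = f'(x) = \log x + 1, \\
F(2\pi^c_{ij}) - F(2\pi^c_{ji})
  &= \log \frac{\pi^c_{ij}}{\pi^c_{ji}}
   = \log \frac{\pi_{ij}}{\pi_{ji}} = a_k, \\
\frac{\pi_{ij}}{\pi_{ji}} &= \exp(a_k) = d^{\mathrm{KL}}_k
  \qquad (i < j,\; k = j - i).
\end{align*}
```

The additive constant in $F$ cancels in the difference, which is why only the ratio $\pi_{ij}/\pi_{ji}$ remains.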
We shall refer to the model based on the reverse KL divergence as the DPS$_{\mathrm{RKL}}$ model.
The DPS$_{\mathrm{RKL}}$ model is the closest to the S model when the divergence is measured by the reverse KL divergence. Similarly, we shall refer to the model based on the $\chi^2$-divergence as the DPS$_{\mathrm{P}}$ model.
This model is the closest to the S model when the divergence is measured by the $\chi^2$-divergence. When $f$ is chosen so that the $f$-divergence reduces to the power-divergence with a real-valued parameter $\lambda$ (Read and Cressie 1988), we shall refer to the resulting DPS[$f$] model as the DPS$_{\mathrm{PD}(\lambda)}$ model. This model is the closest to the S model when the divergence is measured by the power-divergence. It indicates that the difference between the symmetric conditional probabilities to the power of $\lambda$ depends on the distance $k = j - i$. When we apply the DPS$_{\mathrm{PD}(\lambda)}$ model, we should set the value of $\lambda$.

Kateri and Papaioannou (1997) reported that the DPS[$f$] model is equivalent to the DPS model regardless of $f$. That is, all the models described above (i.e., DPS$_{\mathrm{KL}}$, DPS$_{\mathrm{RKL}}$, DPS$_{\mathrm{P}}$, and DPS$_{\mathrm{PD}(\lambda)}$) are, surprisingly, equivalent to the DPS model. However, the proof was not given. We prove the following theorem.

We consider the distance global symmetry (DGS) model defined as

$$\sum_{j-i=k} \pi_{ij} = \sum_{i-j=k} \pi_{ij} \quad (k = 1, \ldots, r - 1).$$

Next, we consider the global symmetry (GS) model, which is defined as

$$\sum_{i<j} \pi_{ij} = \sum_{i>j} \pi_{ij}.$$

It should be noted that the DGS model implies the GS model. Read (1977) noted that the S model holds if and only if both the CS model and the GS model hold. Fujisawa and Tahata (2020) proved that the S model holds if and only if the CS[$f$] model and the GS model hold. These statements are the same as those from Corollary 1.

This section proves the separation of the test statistics for the S model into those for the DPS[$f$] model and the DGS model. Let $n_{ij}$ denote the observed frequency in the $(i, j)$th cell in the $r \times r$ square contingency table. Assume that a multinomial distribution applies to the $r \times r$ contingency table. Let $m_{ij}$ and $\hat{m}_{ij}$ denote the expected frequency in the $(i, j)$th cell and the corresponding maximum likelihood estimate under a model, respectively. Each model can be tested for goodness of fit by, for example, the likelihood ratio chi-square statistic of model M, which is given as

$$G^2(\mathrm{M}) = 2 \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij} \log \frac{n_{ij}}{\hat{m}_{ij}},$$

with the corresponding degrees of freedom (df).
Suppose that model $M_3$ holds if and only if both models $M_1$ and $M_2$ hold.
For these three models, Aitchison (1962) discussed the properties of the Wald test statistics, and Darroch and Silvey (1963) described the properties of the likelihood ratio chi-square statistics. Assume that the following equivalence holds:

$$T(M_3) = T(M_1) + T(M_2), \qquad (7)$$

where $T$ is the goodness-of-fit test statistic and the number of df for $M_3$ is equal to the sum of the numbers of df for $M_1$ and $M_2$. If both $M_1$ and $M_2$ are accepted with a high probability (at the $\alpha$ significance level), then $M_3$ is accepted. However, when (7) does not hold, an incompatible situation may arise in which both $M_1$ and $M_2$ are accepted with a high probability but $M_3$ is rejected; see Darroch and Silvey (1963). The MLEs under the DPS model are given by Goodman (1979). Next, we consider the MLEs under the DGS model using the Lagrange function.
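Since the kernel of the log likelihood is $\sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij} \log \pi_{ij}$, the Lagrange function $L$ is written as

$$L = \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij} \log \pi_{ij} - \lambda \left( \sum_{i=1}^{r} \sum_{j=1}^{r} \pi_{ij} - 1 \right) - \sum_{k=1}^{r-1} \lambda_k \left( \sum_{j-i=k} \pi_{ij} - \sum_{i-j=k} \pi_{ij} \right).$$

Equating the derivatives of $L$ with respect to $\pi_{ij}$, $\lambda$, and $\lambda_k$ to 0 gives

$$\hat{m}_{ij} = \begin{cases} \dfrac{n_{ij} \left( \sum_{t-s=k} n_{st} + \sum_{s-t=k} n_{st} \right)}{2 \sum_{t-s=k} n_{st}} & (i < j), \\ n_{ii} & (i = j), \\ \dfrac{n_{ij} \left( \sum_{t-s=k} n_{st} + \sum_{s-t=k} n_{st} \right)}{2 \sum_{s-t=k} n_{st}} & (i > j), \end{cases} \qquad (9)$$

where $k = |j - i|$.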
We obtain the following equivalence from equations (8) and (9):

$$G^2(\mathrm{S}) = G^2(\mathrm{DPS}[f]) + G^2(\mathrm{DGS}).$$
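The additive relation among the likelihood ratio statistics, $G^2$ for the S model equal to the sum of $G^2$ for the DPS and DGS models, can be checked numerically. The sketch below is not the authors' code. The closed-form fitted values are assumptions of this sketch: the DPS fit splits each pair total $n_{ij} + n_{ji}$ in proportion to the diagonal sums at distance $k$, and the DGS fit rescales the cells at distance $k$ so that the two diagonal sums become equal. The counts are hypothetical.

```python
import math


def g2(counts, fitted):
    """Likelihood ratio chi-square: 2 * sum of n * log(n / m), skipping empty cells."""
    total = 0.0
    for row_n, row_m in zip(counts, fitted):
        for n, m in zip(row_n, row_m):
            if n > 0:
                total += n * math.log(n / m)
    return 2.0 * total


def diagonal_sums(counts, k):
    """Sums of counts at distance k above and below the main diagonal."""
    r = len(counts)
    upper = sum(counts[i][i + k] for i in range(r - k))
    lower = sum(counts[i + k][i] for i in range(r - k))
    return upper, lower


def fit_s(counts):
    """Fitted values under symmetry: the average of each mirror pair."""
    r = len(counts)
    return [[(counts[i][j] + counts[j][i]) / 2 for j in range(r)] for i in range(r)]


def fit_dps(counts):
    """Assumed closed form: split each pair total by the odds at distance k."""
    r = len(counts)
    fitted = [[float(v) for v in row] for row in counts]
    for k in range(1, r):
        upper, lower = diagonal_sums(counts, k)
        for i in range(r - k):
            j = i + k
            pair = counts[i][j] + counts[j][i]
            fitted[i][j] = pair * upper / (upper + lower)
            fitted[j][i] = pair * lower / (upper + lower)
    return fitted


def fit_dgs(counts):
    """Assumed closed form: rescale so both diagonal sums at distance k are equal."""
    r = len(counts)
    fitted = [[float(v) for v in row] for row in counts]
    for k in range(1, r):
        upper, lower = diagonal_sums(counts, k)
        half = (upper + lower) / 2
        for i in range(r - k):
            j = i + k
            fitted[i][j] = counts[i][j] * half / upper
            fitted[j][i] = counts[j][i] * half / lower
    return fitted


# Hypothetical counts, made up for illustration only.
example_counts = [
    [20, 10, 6],
    [5, 30, 8],
    [2, 4, 25],
]
g2_s = g2(example_counts, fit_s(example_counts))
g2_dps = g2(example_counts, fit_dps(example_counts))
g2_dgs = g2(example_counts, fit_dgs(example_counts))
```

Under these assumed fitted values the identity holds cell by cell, because the product of the DPS and DGS ratios $n_{ij}/\hat{m}_{ij}$ equals the S ratio $2 n_{ij}/(n_{ij} + n_{ji})$.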

Numerical example
Table 1, which is taken from Smith et al. (2006), describes how much influence religious leaders and medical leaders should have in government funding decisions on stem cell research, based on a survey of 871 people. The influence levels are divided into four categories: (1) Great influence, (2) Some influence, (3) A little influence, and (4) No influence.
Table 1: How much influence should religious leaders and medical leaders have in government funding for decisions on stem cell research? (Smith et al. 2006).

(Rows: religious leaders; columns: medical leaders; the cell counts are not reproduced here.)

Table 2 gives the values of the likelihood ratio chi-square statistics $G^2$ and the $p$ values for the models applied to these data. Table 2 indicates that the sum of the test statistics for the DPS (i.e., DPS[$f$]) model and the DGS model is equal to that for the S model. The S model fits the data very poorly. We can infer that the marginal distribution for religious leaders is not equal to that for medical leaders. On the other hand, the DPS model fits the data very well. Additionally, the DGS model fits the data poorly. From Theorem 3, the poor fit of the S model is caused by the poor fit of the DGS model rather than the DPS model.

Consider the estimates $\hat{d}_k$ ($k = 1, 2, 3$) in the DPS$_{\mathrm{KL}}$ model. Let $(i, j)$ denote the pair in which the level of influence for religious leaders is the $i$th level and that for medical leaders is the $j$th level. When $k = j - i$ ($k = 1, 2, 3$), a pair $(i, j)$ is $\hat{d}_k$ times as likely as a pair $(j, i)$, given that the pair is $(i, j)$ or $(j, i)$.

Proof of Theorem 2
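Let function $G$ be defined as

$$G(x) = F\left( \frac{2x}{1 + x} \right) - F\left( \frac{2}{1 + x} \right) \quad (x > 0),$$

where $F = f'$. Then, the derivative of $G$ is

$$G'(x) = \frac{2}{(1 + x)^2} \left\{ f''\left( \frac{2x}{1 + x} \right) + f''\left( \frac{2}{1 + x} \right) \right\}.$$

Since the function $f$ is twice differentiable and strictly convex, $G'(x) > 0$ for $x > 0$. Hence, $G$ is a strictly increasing function, and $G^{-1}$ exists.
If the DPS model holds, $\pi_{ij}/\pi_{ji} = d_k$ holds for $i < j$ from equation (1), where $k = j - i$. Since $\pi^c_{ij} = d_k/(1 + d_k)$ and $\pi^c_{ji} = 1/(1 + d_k)$ in this case, we can see that for $i < j$,

$$F(2\pi^c_{ij}) - F(2\pi^c_{ji}) = G(d_k).$$

This is equivalent to equation (4) with $a_k = G(d_k)$. Namely, the DPS[$f$] model holds.
On the other hand, if the DPS[$f$] model holds, equation (4) holds. We can see

$$G\left( \frac{\pi_{ij}}{\pi_{ji}} \right) = a_k \quad (i < j).$$

Since $G^{-1}$ exists, we obtain

$$\frac{\pi_{ij}}{\pi_{ji}} = G^{-1}(a_k) \quad (i < j).$$

Namely, the DPS model holds. The proof is complete.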
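The monotonicity argument in this proof can be probed numerically. The sketch below assumes that $G(x) = F(2x/(1+x)) - F(2/(1+x))$ with $F = f'$, which is one reading of the definition used in the proof, and evaluates it on a grid of odds values for three illustrative choices of $f$. The dictionary `DERIVATIVES` and the helper names are hypothetical.

```python
import math

# F = f' for three illustrative choices of f (assumptions of this sketch).
DERIVATIVES = {
    "KL: f(x) = x log x": lambda x: math.log(x) + 1.0,
    "reverse KL: f(x) = -log x": lambda x: -1.0 / x,
    "Pearson: f(x) = (x - 1)^2": lambda x: 2.0 * (x - 1.0),
}


def g_value(F, x):
    """G(x) = F(2x / (1 + x)) - F(2 / (1 + x)) for an odds value x > 0."""
    return F(2.0 * x / (1.0 + x)) - F(2.0 / (1.0 + x))


def is_strictly_increasing(values):
    """True when every value is smaller than the next one."""
    return all(a < b for a, b in zip(values, values[1:]))


grid = [0.1 * t for t in range(1, 101)]  # odds from 0.1 to 10
report = {
    name: is_strictly_increasing([g_value(F, x) for x in grid])
    for name, F in DERIVATIVES.items()
}
```

For the KL choice, $G(x) = \log x$, so $G^{-1}(a_k) = \exp(a_k)$, in agreement with $d^{\mathrm{KL}}_k = \exp(a_k)$. In each of the three cases $G(1) = 0$, which corresponds to symmetry.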
We denote the linear space spanned by the columns of matrix $X$ by $S(X)$, with dimension $K$. $S(X)$ is a subspace of $\mathbb{R}^{r^2}$. Let $U$ be an $r^2 \times d_1$ full column rank matrix such that the linear space $S(U)$ spanned by the columns of $U$ is the orthogonal complement of the space $S(X)$. Note that $d_1 = r^2 - ((r - 1) + r(r + 1)/2) = (r - 1)(r - 2)/2$. Since $U^t X = O_{d_1, K}$, where $O_{d_1, K}$ is the $d_1 \times K$ zero matrix, the DPS model can be expressed as $h_1(\pi) = U^t \log \pi = 0_{d_1}$, where $0_s$ is the $s \times 1$ zero vector.
Let $p$ denote $\pi$ with $\pi_{ij}$ replaced by $p_{ij}$, where $p_{ij} = n_{ij}/n$ with $n = \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij}$.
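The dimension count $d_1 = (r - 1)(r - 2)/2$ can be checked with a small sketch. It assumes that $X$ is the log-linear design matrix of the DPS model, with one indicator column per $\log d_k$ and one per symmetric parameter $\log \psi_{ij}$ ($i \le j$). This construction and the helper names are hypothetical, since the text does not spell out $X$ at this point.

```python
from fractions import Fraction


def dps_design_matrix(r):
    """Rows are cells (i, j) in row-major order; columns are indicators
    for log d_k (k = 1, ..., r - 1) and log psi_ij (i <= j)."""
    pairs = [(i, j) for i in range(r) for j in range(i, r)]  # r(r + 1)/2 pairs
    rows = []
    for i in range(r):
        for j in range(r):
            d_part = [1 if j - i == k else 0 for k in range(1, r)]  # upper cells only
            psi_part = [1 if (min(i, j), max(i, j)) == p else 0 for p in pairs]
            rows.append(d_part + psi_part)
    return rows


def matrix_rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((t for t in range(rank, len(m)) if m[t][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for t in range(len(m)):
            if t != rank and m[t][col] != 0:
                factor = m[t][col] / m[rank][col]
                m[t] = [a - factor * b for a, b in zip(m[t], m[rank])]
        rank += 1
    return rank


r = 4
K = matrix_rank(dps_design_matrix(r))  # dimension of S(X)
d1 = r * r - K                         # dimension of the orthogonal complement
```

For $r = 4$ this gives $K = 13$ and $d_1 = 3$, matching $(r - 1) + r(r + 1)/2$ and $(r - 1)(r - 2)/2$.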
Kateri and Papaioannou (1997) also noted that (i) the DPS model is the closest model to symmetry in terms of the Kullback-Leibler (KL) distance and (ii) the DPS[$f$] model is equivalent to the DPS model. In this study, we derive the DPS[$f$] model and prove the relation between the two models. The result yields various interpretations of the DPS model. Additionally, the necessary and sufficient condition for the S model and a property of the goodness-of-fit test statistics are discussed.

The rest of this paper is organized as follows. Section 2 derives equation (2) and interprets the model from an information theory viewpoint. Additionally, the proof that the DPS[$f$] model is equivalent to the DPS model regardless of the function $f$ is given. Section 3 discusses the necessary and sufficient condition for the S model and highlights the relationships between the goodness-of-fit test statistics for the S model and the partitioned models. Section 4 gives a numerical example. Section 5 summarizes this paper.

Theorem 2. The DPS[$f$] model is equivalent to the DPS model regardless of $f$.

The proof is given in the Appendix. Theorem 2 states that the DPS model holds if and only if the DPS[$f$] model holds. That is, if the DPS model fits the given dataset, then we obtain various interpretations for the data. When $a_1 = \cdots = a_{r-1}$, the DPS[$f$] model reduces to the conditional symmetry model based on the $f$-divergence (the CS[$f$] model). The CS[$f$] model was described previously by Kateri and Papaioannou (1997). Additionally, Fujisawa and Tahata (2020) proposed a generalization of the CS[$f$] model. Similarly, when $d_1 = \cdots = d_{r-1}$, the DPS model reduces to the conditional symmetry (CS) model proposed by McCullagh (1978). The CS[$f$] model is equivalent to the CS model regardless of $f$ (Kateri and Papaioannou 1997). Hence, Theorem 2 leads to the following corollary.

Corollary 1. The CS[$f$] model is equivalent to the CS model regardless of $f$.

3 Equivalence conditions for symmetry

Here, the equivalence conditions of the S model are discussed. If the S model holds, then the DPS[$f$] model with $a_1 = \cdots = a_{r-1} = 0$ holds. Conversely, if the DPS[$f$] model holds, then the S model does not hold in general. Therefore, we are interested in an additional condition that yields the S model when the DPS[$f$] model holds. Other studies have discussed such conditions, for example, Read (1977) and Tahata et al. (2016).
For $k = 1, \ldots, r - 1$, the DGS model indicates that the sum of the probabilities in the cells at distance $k = j - i$ from the main diagonal is equal to the sum of the probabilities in the cells at distance $k = i - j$ from the main diagonal. We obtain the following theorem. (The proof is given in the Appendix.)

Theorem 3. The S model holds if and only if both the DPS[$f$] model and the DGS model hold.
Darroch and Silvey (1963) showed an interesting example of such an incompatible situation. From Theorem 3, the S model holds if and only if the DPS[$f$] model and the DGS model hold. In addition, the number of df for the DPS[$f$] model is $(r - 1)(r - 2)/2$ and that for the DGS model is $r - 1$. Note that the number of df for the S model, $r(r - 1)/2$, is equal to the sum of the numbers of df for the DPS[$f$] and the DGS models. Thus, we consider partitioning test statistics. Theorem 2 confirms that the DPS[$f$] model is equivalent to the DPS model. Therefore, the maximum likelihood estimates (MLEs) under the DPS[$f$] model coincide with those under the DPS model. The MLEs under the S model are $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$. Therefore, the DPS[$f$] model and the DGS model are separable and exhibit independence. Let $W(\mathrm{M})$ denote the Wald statistic for model M. We obtain the following theorem and prove it in the Appendix.

Theorem 4. $W(\mathrm{S})$ is equal to the sum of $W(\mathrm{DPS}[f])$ and $W(\mathrm{DGS})$.
where $\zeta_{ij} = \zeta_{ji}$ and $\Delta_k + \Delta_{-k} = 0$. The minimum value of $I^C(\pi : \pi^S)$ is attained for $\pi^*_{ij}$, where $\zeta_{ij}$ and $\Delta_l$ are determined so that $\pi^*_{ij}$ satisfies constraints (10) and (12). Therefore, the DPS[$f$] model is the closest model to the S model in terms of the $f$-divergence under these conditions.
It is obvious that if the S model holds, the DPS[$f$] model and the DGS model simultaneously hold. Conversely, assuming that both the DPS[$f$] model and the DGS model hold, we show that the S model holds. From Theorem 2, the DPS[$f$] model is equivalent to $\pi_{ij}/\pi_{ji} = d_k$ for $i < j$ with $k = j - i$. Since the DGS model holds, we obtain $\sum_{j-i=k} (d_k - 1) \pi_{ji} = 0$ ($k = 1, \ldots, r - 1$). Provided that the $\pi_{ji}$ are positive, this implies $d_k = 1$ for every $k$, that is, the S model holds.

Table 2: Likelihood ratio chi-square values $G^2$ for the models applied to Table 1.