Two-Dimensional Index of Departure from the Symmetry Model for Square Contingency Tables with Nominal Categories

Abstract: In the analysis of two-way contingency tables, the degree of departure from independence is measured using measures of association between the row and column variables (e.g., Yule’s coefficients of association and of colligation, Cramér’s coefficient, and Goodman and Kruskal’s coefficient). In the analysis of square contingency tables with the same row and column classifications, by contrast, the interest lies in measuring the degree of departure from symmetry rather than from independence. Over past years, many studies have proposed various types of indexes based on power divergence (or the diversity index) to represent the degree of departure from symmetry. This study proposes a two-dimensional index that measures the degree of departure from symmetry in terms of the log odds of each symmetric cell with respect to the main diagonal of the table. Because the departure is expressed in terms of these log odds, the analysis results are easier to interpret than those obtained with existing indexes. Numerical experiments and an application to real data show the usefulness of the proposed two-dimensional index.

editing, T.M., T.N., A.I., Y.S. and S.T.; visualization, T.M., T.N., A.I., Y.S. and S.T.; supervision, T.M., T.N., A.I., Y.S. and S.T.; project administration, T.M., T.N., A.I., Y.S. and S.T.; funding acquisition, T.N. and S.T.


Introduction
For two-way contingency tables, an analysis is generally performed to see whether independence between the row and column classifications holds. For square contingency tables with the same row and column classifications, in contrast, many issues concern symmetry rather than independence, because the row and column classifications in such tables are strongly associated. Consider an $r \times r$ square contingency table. Let $\pi_{ij}$ denote the probability that an observation falls in the $i$th row and $j$th column of the table ($i, j = 1, \ldots, r$). Ref. [1] proposed the symmetry model defined by $\pi_{ij} = \pi_{ji}$ for all $i < j$.
This symmetry model, however, often does not hold when applied to real data. When the symmetry model fits real data poorly, other symmetry (e.g., quasi symmetry [2] and partial symmetry [3]) models, or asymmetry (e.g., conditional symmetry [4], linear diagonals-parameter symmetry [5], and conditional difference asymmetry [6]) models are applied to these real data.
In the analysis of two-way contingency tables, the degree of departure from independence is assessed by using measures of association between the row and column variables.
Similarly, in the analysis of square contingency tables with the same row and column classifications, we are interested in measuring the degree of departure from the symmetry model. Over the past few years, many studies have proposed indexes to represent this degree of departure. Refs. [13,14] proposed various types of indexes based on power divergence (or the diversity index). Refs. [15,16] proposed two-dimensional indexes to represent the degree of departure from symmetry. A two-dimensional index allows us to compare visually the degrees of departure from symmetry in multiple data sets by using confidence regions, and it makes the results of data analysis easy to interpret.
This study proposes a two-dimensional index to measure the degree of departure from the symmetry model in terms of the log odds of each symmetric cell with respect to the main diagonal of the table. Because the departure is expressed in terms of these log odds, the analysis results are easier to interpret than those obtained with existing indexes. This paper is organized as follows. Section 2 introduces the proposed index and shows its properties. Section 3 derives the confidence region of the proposed index. Section 4 shows the usefulness of the proposed index by applying it to real data. Section 5 discusses properties of the proposed index by using several asymmetry models. Section 6 gives concluding remarks.

Two-Dimensional Index and Its Properties
This section proposes a two-dimensional index to measure the degree of departure from the symmetry model in terms of the log odds of each symmetric cell with respect to the main diagonal of the table. By using the weighted geometric mean indexes of the diversity index as the elements of the proposed two-dimensional index, the proposed two-dimensional index has more useful properties than the index proposed by [13], which measures the degree of departure from the symmetry model. Section 2.1 describes two univariate indexes of weighted geometric mean type that are elements of the proposed two-dimensional index and their characteristics. Section 2.2 shows the relationship between the elements of the proposed two-dimensional index and describes the properties of the proposed two-dimensional index.

Univariate Index of Weighted Geometric Mean Type
For an $r \times r$ square contingency table with nominal categories, ref. [3] proposed an index $\tau^{(\lambda)}$, a weighted geometric mean based on the diversity index $H_{ij}^{(\lambda)}$. It is defined for $\lambda > -1$ under the assumption that $\pi_{ij} + \pi_{ji} > 0$ for all $i < j$, and the values at $\lambda = 0$ are taken to be the continuous limit as $\lambda \to 0$. Note that $H_{ij}^{(\lambda)}$ is the diversity index of degree $\lambda$ of [17], which includes the Shannon entropy ($\lambda = 0$), and the real number $\lambda$ is chosen by the user. The index $\tau^{(\lambda)}$ has the following characteristics: (i) $0 \le \tau^{(\lambda)} \le 1$; (ii) $\tau^{(\lambda)} = 0$ if and only if $\pi_{ij} = \pi_{ji}$ for at least one $i < j$; and (iii) $\tau^{(\lambda)} = 1$ if and only if the degree of asymmetry is maximum in the sense that $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$.
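The displayed definition of $\tau^{(\lambda)}$ is not available in this text. One form consistent with characteristics (i) to (iii) is sketched below. It assumes that the diversity index of [17] is applied to the conditional probabilities $\pi_{ij}^{c} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$ and that the weights are the pair probabilities divided by the off-diagonal total $\delta$; the symbols $\pi_{ij}^{c}$ and $\delta$ are notation assumed for this sketch.

```latex
\tau^{(\lambda)}
  = \prod_{i<j} \left[ 1 - \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1}\, H_{ij}^{(\lambda)} \right]^{(\pi_{ij} + \pi_{ji})/\delta},
\qquad
H_{ij}^{(\lambda)}
  = \frac{1}{\lambda} \left[ 1 - \bigl(\pi_{ij}^{c}\bigr)^{\lambda+1} - \bigl(\pi_{ji}^{c}\bigr)^{\lambda+1} \right],
\qquad
\delta = \sum_{i \neq j} \pi_{ij}.
```

Under this form, the bracketed factor is 0 when $\pi_{ij} = \pi_{ji}$, because $H_{ij}^{(\lambda)}$ then attains its maximum $(2^{\lambda} - 1)/(\lambda 2^{\lambda})$, and it is 1 when one of the two cells is zero. A single symmetric pair therefore makes the product vanish, whereas the product equals 1 only if every pair is completely one-sided, which matches (ii) and (iii).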
For an $r \times r$ square contingency table with nominal categories, we define another univariate index $\Phi^{(\lambda)}$, which is also a weighted geometric mean based on the diversity index but has a different formula from the index $\tau^{(\lambda)}$. It is defined for $\lambda > -1$ under the assumption that $\pi_{ij} + \pi_{ji} > 0$ for all $i < j$.
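The displayed definition of $\Phi^{(\lambda)}$ is not available in this text. One form consistent with the characteristics stated for $\Phi^{(\lambda)}$ in this section is sketched below, under the assumption that the weighted geometric mean is taken over the rescaled diversity index itself. Here $\pi_{ij}^{c} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$ and $\delta = \sum_{i \neq j} \pi_{ij}$ are notation assumed for this sketch.

```latex
\Phi^{(\lambda)}
  = 1 - \prod_{i<j} \left[ \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1}\, H_{ij}^{(\lambda)} \right]^{(\pi_{ij} + \pi_{ji})/\delta},
\qquad
H_{ij}^{(\lambda)}
  = \frac{1}{\lambda} \left[ 1 - \bigl(\pi_{ij}^{c}\bigr)^{\lambda+1} - \bigl(\pi_{ji}^{c}\bigr)^{\lambda+1} \right].
```

Under this form, the product equals 1 only when every pair is symmetric, so $\Phi^{(\lambda)} = 0$ exactly under the symmetry model, and a single pair with a zero cell makes the product vanish, so $\Phi^{(\lambda)} = 1$.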
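The values at $\lambda = 0$ are taken to be the continuous limit as $\lambda \to 0$. Here, $H_{ij}^{(\lambda)}$ is the diversity index of degree $\lambda$ of [17], which includes the Shannon entropy ($\lambda = 0$), and the real number $\lambda$ is chosen by the user. The index $\Phi^{(\lambda)}$ has the following characteristics: (i) $0 \le \Phi^{(\lambda)} \le 1$; (ii) $\Phi^{(\lambda)} = 0$ if and only if the symmetry model holds; and (iii) $\Phi^{(\lambda)} = 1$ if and only if $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for at least one $i < j$.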
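As a numerical illustration, the following Python sketch evaluates both univariate indexes for a table of counts or probabilities. It assumes the forms $\tau^{(\lambda)} = \prod_{i<j} (1 - a_{ij})^{w_{ij}}$ and $\Phi^{(\lambda)} = 1 - \prod_{i<j} a_{ij}^{w_{ij}}$, where $a_{ij}$ is the diversity index of [17] for the pair $(i, j)$ rescaled to $[0, 1]$ and $w_{ij}$ is the share of the pair in the off-diagonal total. These forms, the function names, and the three small tables are assumptions made for this sketch and are not the authors' code or data.

```python
import numpy as np

def scaled_diversity(c, lam):
    """Diversity of (c, 1 - c) of degree lam, rescaled so that its maximum is 1."""
    c = np.asarray(c, dtype=float)
    if lam == 0:  # Shannon entropy limit, with 0 * log(0) read as 0
        with np.errstate(divide="ignore", invalid="ignore"):
            h = -(np.where(c > 0, c * np.log(c), 0.0)
                  + np.where(c < 1, (1 - c) * np.log(1 - c), 0.0))
        return h / np.log(2)
    scale = 2.0 ** lam / (2.0 ** lam - 1.0)
    return scale * (1 - c ** (lam + 1) - (1 - c) ** (lam + 1))

def phi_tau(table, lam=1.0):
    """Return (Phi, tau) under the assumed geometric-mean forms."""
    p = np.asarray(table, dtype=float)
    iu = np.triu_indices(p.shape[0], k=1)
    upper, lower = p[iu], p.T[iu]      # pi_ij and pi_ji for i < j
    pair = upper + lower               # assumed to be > 0 for every pair
    w = pair / pair.sum()              # weights (pi_ij + pi_ji) / delta
    a = np.clip(scaled_diversity(upper / pair, lam), 0.0, 1.0)
    phi = 1.0 - np.prod(a ** w)
    tau = np.prod((1.0 - a) ** w)
    return phi, tau

# Hypothetical 3 x 3 tables of counts (illustration only).
symmetric = np.array([[5, 2, 3], [2, 4, 1], [3, 1, 6]])
one_sided = np.array([[5, 2, 3], [0, 4, 1], [0, 0, 6]])
mixed = np.array([[5, 6, 3], [2, 4, 1], [3, 4, 6]])   # one symmetric pair

for name, t in [("symmetric", symmetric), ("one-sided", one_sided), ("mixed", mixed)]:
    print(name, phi_tau(t))
```

The third table has exactly one symmetric pair, so under the assumed forms $\tau^{(\lambda)}$ is zero while $\Phi^{(\lambda)}$ is strictly between 0 and 1, illustrating the "at least one" versus "all" distinction in the characteristics above.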

Two-Dimensional Index of Symmetry
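Assuming that $\pi_{ij} + \pi_{ji} > 0$ for all $i < j$, we propose a two-dimensional index defined by $\Lambda^{(\lambda)} = (\Phi^{(\lambda)}, \tau^{(\lambda)})'$, where $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$ are described in Section 2.1 and $a'$ is the transpose of $a$. The values at $\lambda = 0$ are taken to be the continuous limit as $\lambda \to 0$. As before, $H_{ij}^{(\lambda)}$ is the diversity index of degree $\lambda$ in [17], including the Shannon entropy ($\lambda = 0$), where $\lambda$ is a real number chosen by the user.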
Since the indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$ are both expressed using the weighted geometric mean of the diversity index, the following theorem on the relationship between $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$ holds.

Theorem 1. The inequality $\tau^{(\lambda)} \le \Phi^{(\lambda)}$ holds, with equality if and only if the conditional difference asymmetry model of [6] holds, that is, $\log(\pi_{ij}/\pi_{ji}) = \Delta_{ij}$ for all $i < j$, where $|\Delta_{ij}| = \Delta$ and $\{\Delta_{ij}\}$ are unspecified real-valued parameters.
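A short argument for the inequality can be sketched with the weighted inequality of arithmetic and geometric means. The sketch assumes that the two indexes have the forms $\tau^{(\lambda)} = \prod_{i<j} (1 - a_{ij})^{w_{ij}}$ and $\Phi^{(\lambda)} = 1 - \prod_{i<j} a_{ij}^{w_{ij}}$, where $a_{ij} \in [0, 1]$ denotes the rescaled diversity index of the pair $(i, j)$ and the weights $w_{ij} > 0$ sum to one. The shorthand $a_{ij}$ and $w_{ij}$ is introduced here for illustration, and this is not the proof given in Appendix A.

```latex
\tau^{(\lambda)} + \bigl( 1 - \Phi^{(\lambda)} \bigr)
  = \prod_{i<j} (1 - a_{ij})^{w_{ij}} + \prod_{i<j} a_{ij}^{w_{ij}}
  \le \sum_{i<j} w_{ij} (1 - a_{ij}) + \sum_{i<j} w_{ij}\, a_{ij}
  = 1 .
```

Equality requires all $a_{ij}$ to be equal. Because the diversity index of $(c, 1 - c)$ is symmetric in $c$ and strictly increasing on $[0, 1/2]$, this happens exactly when $|\log(\pi_{ij}/\pi_{ji})|$ is the same for every pair, which is the condition of the conditional difference asymmetry model.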
The proof of this theorem is given in Appendix A. Based on the above properties of the elements $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, the proposed two-dimensional index $\Lambda^{(\lambda)}$ has the following characteristics:
(1) The value of $\Lambda^{(\lambda)}$ lies on the sides of or inside the triangle with vertices (0, 0), (1, 0), and (1, 1);
(2) $\Lambda^{(\lambda)} = (0, 0)$ if and only if the symmetry model holds;
(3) $\Lambda^{(\lambda)} = (1, 1)$ if and only if the degree of asymmetry is maximum in the sense that $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$;
(4) $\Lambda^{(\lambda)} = (t, t)$, where $t$ is a constant with $0 \le t \le 1$, if and only if the conditional difference asymmetry model holds.
The conditional difference asymmetry model holds if and only if the absolute value of the log odds of each symmetric cell with respect to the main diagonal of the table equals the constant $\Delta$. Namely, the proposed two-dimensional index can represent the degree of departure from the symmetry model in terms of the log odds $\log(\pi_{ij}/\pi_{ji})$.

Remark 1. Similar to the index proposed by [13] (see Appendix B), the proposed two-dimensional index represents the degree of departure from the symmetry model, and the degree of asymmetry is maximum when $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$. However, the proposed two-dimensional index represents this departure in terms of the log odds of each symmetric cell with respect to the main diagonal of the table, which makes the analysis results easier to interpret than those obtained with the index of [13]. This interpretability cannot be obtained by using the indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$ separately, and it may be one of the advantages of the proposed two-dimensional index. Section 4 shows the usefulness of the proposed two-dimensional index by using an example.
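The role of the log odds can be illustrated with a small Python sketch. It assumes $\lambda = 1$ and the forms $\Phi^{(1)} = 1 - \prod a_{ij}^{w_{ij}}$ and $\tau^{(1)} = \prod (1 - a_{ij})^{w_{ij}}$ with $a_{ij} = 4 c_{ij} (1 - c_{ij})$ and $c_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$. These forms, the helper names, and both tables are hypothetical and serve only as an illustration. In the first table every log odds equals $+\log 3$ or $-\log 3$; in the second, one pair has log odds $-\log 2$.

```python
import numpy as np

def lambda_index(table):
    """(Phi, tau) at lambda = 1 under the assumed geometric-mean forms."""
    p = np.asarray(table, dtype=float)
    iu = np.triu_indices(p.shape[0], k=1)
    upper, lower = p[iu], p.T[iu]          # pi_ij and pi_ji for i < j
    pair = upper + lower
    w = pair / pair.sum()
    c = upper / pair
    a = np.clip(4.0 * c * (1.0 - c), 0.0, 1.0)   # rescaled diversity at lambda = 1
    return 1.0 - np.prod(a ** w), np.prod((1.0 - a) ** w)

def log_odds(table):
    """log(pi_ij / pi_ji) for every pair i < j."""
    p = np.asarray(table, dtype=float)
    iu = np.triu_indices(p.shape[0], k=1)
    return np.log(p[iu] / p.T[iu])

# Hypothetical counts: |log odds| is log 3 for every pair, with mixed signs.
constant_abs = np.array([[10, 6, 1], [2, 10, 9], [3, 3, 10]])
# Hypothetical counts: the pair (1, 3) now has log odds -log 2.
varying_abs = np.array([[10, 6, 1], [2, 10, 9], [2, 3, 10]])

for name, t in [("constant |log odds|", constant_abs), ("varying |log odds|", varying_abs)]:
    print(name, log_odds(t).round(3), lambda_index(t))
```

Under these assumptions the first table gives a point on the line through (0, 0) and (1, 1) even though the direction of asymmetry differs between pairs, whereas the second table gives a point with $\tau^{(1)} < \Phi^{(1)}$.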

Approximate Confidence Region for the Proposed Index
Let $n_{ij}$ denote the observed frequency in the $i$th row and $j$th column of the table ($i, j = 1, \ldots, r$). Assume that a multinomial distribution applies to the $r \times r$ table. The sample proportions corresponding to $\{\pi_{ij};\ i, j = 1, \ldots, r\}$ are $\{p_{ij} = n_{ij}/n\}$ with $n = \sum_{i,j} n_{ij}$. The estimator $\hat{\Lambda}^{(\lambda)}$ of the proposed index $\Lambda^{(\lambda)}$ is obtained by replacing $\{\pi_{ij}\}$ with $\{p_{ij}\}$.
Since the delta method for multinomial distributions assumes asymptotic normality for the observed frequencies of each cell, the asymptotic normality of the index obtained by the delta method may be affected when the observed frequencies are small near the corners of the contingency table (see, e.g., [12], p. 589).
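The delta-method computation can be sketched numerically as follows. The sketch is not the closed-form covariance matrix of the paper: it approximates the Jacobian of the index by central differences and combines it with the multinomial covariance of the sample proportions. It also assumes $\lambda = 1$ and the forms $\Phi^{(1)} = 1 - \prod a_{ij}^{w_{ij}}$ and $\tau^{(1)} = \prod (1 - a_{ij})^{w_{ij}}$ with $a_{ij} = 4 c_{ij}(1 - c_{ij})$. The function names and the table of counts are hypothetical.

```python
import numpy as np

def lambda_index(p_flat, r):
    """(Phi, tau) at lambda = 1 from a flattened r x r table (assumed forms)."""
    p = np.asarray(p_flat, dtype=float).reshape(r, r)
    iu = np.triu_indices(r, k=1)
    upper, lower = p[iu], p.T[iu]
    pair = upper + lower
    w = pair / pair.sum()
    c = upper / pair
    a = np.clip(4.0 * c * (1.0 - c), 0.0, 1.0)
    return np.array([1.0 - np.prod(a ** w), np.prod((1.0 - a) ** w)])

def delta_method_cov(counts, step=1e-6):
    """Estimate of the index and its approximate covariance matrix."""
    counts = np.asarray(counts, dtype=float)
    r, n = counts.shape[0], counts.sum()
    p = (counts / n).ravel()
    est = lambda_index(p, r)
    jac = np.zeros((2, p.size))
    for k in range(p.size):            # central differences, one cell at a time
        e = np.zeros(p.size)
        e[k] = step
        jac[:, k] = (lambda_index(p + e, r) - lambda_index(p - e, r)) / (2 * step)
    multinomial_cov = (np.diag(p) - np.outer(p, p)) / n
    return est, jac @ multinomial_cov @ jac.T

def in_region(point, est, cov, level=0.95):
    """True if `point` lies in the approximate confidence ellipse around `est`."""
    d = np.asarray(point, dtype=float) - est
    cutoff = -2.0 * np.log(1.0 - level)    # chi-square quantile with 2 d.f.
    return float(d @ np.linalg.solve(cov, d)) <= cutoff

counts = np.array([[50, 30, 10], [10, 40, 25], [20, 8, 60]])   # hypothetical data
est, cov = delta_method_cov(counts)
se = np.sqrt(np.diag(cov))             # approximate standard errors of (Phi, tau)
print("estimate:", est, "standard errors:", se)
print("(0, 0) inside the 95% region:", in_region((0.0, 0.0), est, cov))
```

The square roots of the diagonal elements give approximate standard errors for the two components, and `in_region` checks whether a candidate point, such as (0, 0) for the symmetry model, falls inside the approximate confidence region. The numerical derivative breaks down when off-diagonal cells are zero, which is in line with the caution above about small observed frequencies.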

Example
This section demonstrates the usefulness of the proposed two-dimensional index $\Lambda^{(\lambda)}$ in comparison with the index of [13] (denoted by $\Gamma^{(\lambda)}$; see Appendix B), which measures the degree of departure from the symmetry model, by using real data taken from [18,19].
These real data are cross-classifications of fathers' and sons' occupational status categories in Japan, examined in 1955 and 1995. The status is classified as (1) capitalist, (2) new middle, (3) working, (4) self-employed, and (5) farming. Table 1 gives the estimates of the indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, their approximate standard errors, and approximate 95% confidence intervals for the real data in [18,19]. Note that, by the delta method, the approximate confidence intervals for $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$ can be obtained by using the (1, 1) and (2, 2) components, respectively, of the estimator of the covariance matrix $\Sigma[\Lambda^{(\lambda)}]$ of $\hat{\Lambda}^{(\lambda)}$.

For $\lambda = 1$, Figure 1 shows the point estimates and approximate 95% confidence regions of the two-dimensional index $\Lambda^{(1)}$ for the real data in [18,19]. The vertical and horizontal axes in Figure 1 represent the values of the indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, respectively. Since these confidence regions do not overlap, it is inferred that the two data sets have different probability structures. From Table 1 and Figure 1, we can see that the degree of departure from the symmetry model is larger for the data in [19] than for the data in [18]. Additionally, the confidence region of $\Lambda^{(1)}$ for the data in [18] includes the line passing through the points (0, 0) and (1, 1), but the confidence region of $\Lambda^{(1)}$ for the data in [19] does not. Therefore, the probability structure of the data in [18] may have conditional difference asymmetry, and we can see that the father's occupational status in Japan had a greater influence on his son's status in 1955 than in 1995.
On the other hand, the existing index $\Gamma^{(\lambda)}$ of [13], which measures the degree of departure from the symmetry model, also indicates that the degree of departure from symmetry is larger for the data in [19] than for the data in [18], since the confidence intervals of $\Gamma^{(1)}$ for the real data in [18,19] are (0.265, 0.381) and (0.399, 0.482), respectively. However, the existing index $\Gamma^{(\lambda)}$ does not allow us to determine in which of the data sets in [18,19] the father's occupational status has more influence on his son's status.
The proposed two-dimensional index $\Lambda^{(\lambda)}$ is not only capable of representing the degree of departure from the symmetry model, but can also take into account the log odds of each symmetric cell with respect to the main diagonal of the table (i.e., the degree of departure from the conditional difference asymmetry model). It can, therefore, provide more interpretable analysis results, as above.

Table 1. Estimates of indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, approximate standard errors for $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, and approximate 95% confidence intervals for $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$ for the real data in [18,19].

Figure 1. Point estimates and approximate 95% confidence regions of the two-dimensional index $\Lambda^{(1)}$ for the real data in [18,19].

Discussion
This section discusses properties of the proposed index $\Lambda^{(\lambda)}$ for several asymmetry models. Consider the conditional symmetry model [4] and the linear diagonals-parameter symmetry model [5] as asymmetry models. The conditional symmetry model is defined by $\pi_{ij} = \gamma \pi_{ji}$ for all $i < j$.
The linear diagonals-parameter symmetry model is defined by $\pi_{ij} = \theta^{j-i} \pi_{ji}$ for all $i < j$. Note that if the conditional symmetry model holds, then the conditional difference asymmetry model holds, but the converse is not true. In contrast, if the linear diagonals-parameter symmetry model holds, the conditional difference asymmetry model does not always hold.

Figure 2 plots the values of the proposed index $\Lambda^{(1)}$ for $\gamma = 1, 2, 3, 5, 10, 100, 1000$ in the conditional symmetry model. The vertical and horizontal axes in Figure 2 represent the values of the indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, respectively. Figure 2 shows that, as the value of $\gamma$ increases, the value of $\Lambda^{(1)}$ approaches (1, 1) while remaining on the straight line passing through (0, 0) and (1, 1).

On the other hand, Figure 3 plots the values of the proposed index $\Lambda^{(1)}$ for $\theta = 1, 2, 3, 5, 10, 100, 1000$ in the linear diagonals-parameter symmetry model. The vertical and horizontal axes in Figure 3 represent the values of the indexes $\Phi^{(\lambda)}$ and $\tau^{(\lambda)}$, respectively. As can be seen in Figure 3, the value of $\Lambda^{(1)}$ approaches (1, 1) as the value of $\theta$ increases, but it does not lie on the straight line passing through (0, 0) and (1, 1) for all values of $\theta$. Similar results are observed for other values of $\lambda$, although the details are omitted. This difference arises because the proposed index $\Lambda^{(\lambda)}$ measures the degree of departure from symmetry in terms of the log odds of each symmetric cell with respect to the main diagonal of the table.
Therefore, the proposed index $\Lambda^{(\lambda)}$ is suitable for measuring the degree of departure from symmetry and can visually distinguish the asymmetry models as described above.
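The contrast between the two models can be reproduced qualitatively with a short Python sketch. It assumes $\lambda = 1$, the forms $\Phi^{(1)} = 1 - \prod a_{ij}^{w_{ij}}$ and $\tau^{(1)} = \prod (1 - a_{ij})^{w_{ij}}$ with $a_{ij} = 4 c_{ij}(1 - c_{ij})$, and the parametrization $\pi_{ij} = \theta^{j-i} \pi_{ji}$ for the linear diagonals-parameter symmetry model. The $4 \times 4$ tables with equal cells below the main diagonal are a hypothetical choice, so the numbers need not match those plotted in the figures.

```python
import numpy as np

def lambda_index(table):
    """(Phi, tau) at lambda = 1 under the assumed geometric-mean forms."""
    p = np.asarray(table, dtype=float)
    iu = np.triu_indices(p.shape[0], k=1)
    upper, lower = p[iu], p.T[iu]
    pair = upper + lower
    w = pair / pair.sum()
    c = upper / pair
    a = np.clip(4.0 * c * (1.0 - c), 0.0, 1.0)
    return 1.0 - np.prod(a ** w), np.prod((1.0 - a) ** w)

def model_table(r, ratio):
    """Unnormalized table with pi_ij = ratio(i, j) * pi_ji for i < j."""
    t = np.ones((r, r))                # cells on and below the diagonal set to 1
    for i in range(r):
        for j in range(i + 1, r):
            t[i, j] = ratio(i, j)
    return t

values = [1, 2, 3, 5, 10, 100, 1000]
# Conditional symmetry: the same ratio gamma for every pair.
cs = [lambda_index(model_table(4, lambda i, j: g)) for g in values]
# Linear diagonals-parameter symmetry: ratio theta ** (j - i).
ldps = [lambda_index(model_table(4, lambda i, j: th ** (j - i))) for th in values]

for v, (p_cs, t_cs), (p_ld, t_ld) in zip(values, cs, ldps):
    print(f"{v:>5}  CS: ({p_cs:.4f}, {t_cs:.4f})  LDPS: ({p_ld:.4f}, {t_ld:.4f})")
```

Under these assumptions the conditional symmetry points satisfy $\Phi^{(1)} = \tau^{(1)}$ for every $\gamma$, while the linear diagonals-parameter symmetry points have $\tau^{(1)} < \Phi^{(1)}$ whenever $\theta > 1$; both sequences move toward (1, 1) as the parameter grows.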

Conclusions
This paper proposed a two-dimensional index to measure the degree of departure from symmetry in terms of the log odds of each symmetric cell with respect to the main diagonal of square contingency tables. Because the departure is expressed in terms of these log odds, the proposed two-dimensional index provides more interpretable analysis results than the existing index of [13], which measures the degree of departure from the symmetry model. Additionally, the proposed two-dimensional index allows us to compare visually the degrees of departure from symmetry in multiple data sets by using confidence regions and to interpret the results of data analysis easily.
The proposed index $\Lambda^{(\lambda)}$ is invariant under any permutation applied simultaneously to the row and column categories; namely, the value of $\Lambda^{(\lambda)}$ does not depend on the order of the categories. Therefore, the proposed index can be used for data with nominal categories. Moreover, if the information about the ordering of the categories need not be used, the proposed index can also be used for data on an ordinal scale.
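The invariance can be checked numerically with a brief Python sketch. It assumes $\lambda = 1$ and the forms $\Phi^{(1)} = 1 - \prod a_{ij}^{w_{ij}}$ and $\tau^{(1)} = \prod (1 - a_{ij})^{w_{ij}}$ with $a_{ij} = 4 c_{ij}(1 - c_{ij})$. The function name, the random table, and the seed are hypothetical choices for illustration.

```python
import numpy as np

def lambda_index(table):
    """(Phi, tau) at lambda = 1 under the assumed geometric-mean forms."""
    p = np.asarray(table, dtype=float)
    iu = np.triu_indices(p.shape[0], k=1)
    upper, lower = p[iu], p.T[iu]
    pair = upper + lower
    w = pair / pair.sum()
    c = upper / pair
    a = np.clip(4.0 * c * (1.0 - c), 0.0, 1.0)
    return 1.0 - np.prod(a ** w), np.prod((1.0 - a) ** w)

rng = np.random.default_rng(0)
table = rng.integers(1, 50, size=(5, 5))      # hypothetical 5 x 5 counts, all > 0
perm = rng.permutation(5)                     # one reordering of the categories
permuted = table[np.ix_(perm, perm)]          # same permutation on rows and columns

original_value = lambda_index(table)
permuted_value = lambda_index(permuted)
print(original_value, permuted_value)
```

The two printed pairs agree up to rounding because a simultaneous permutation only reorders the pairs $(i, j)$ and possibly swaps the roles of $\pi_{ij}$ and $\pi_{ji}$, and the assumed forms are symmetric in those two cells.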

Data Availability Statement:
The data presented in this study are available in [18,19].