Abstract
In the analysis of two-way contingency tables, the degree of departure from independence is measured using measures of association between the row and column variables (e.g., Yule’s coefficients of association and of colligation, Cramér’s coefficient, and Goodman and Kruskal’s coefficient). In the analysis of square contingency tables with the same row and column classifications, on the other hand, we are interested in measuring the degree of departure from symmetry rather than from independence. Many studies have proposed indexes based on power divergence (or the diversity index) to represent the degree of departure from symmetry. This study proposes a two-dimensional index that measures the degree of departure from symmetry in terms of the log odds of each symmetric cell with respect to the main diagonal of the table. Measuring the departure in terms of these log odds makes the results easier to interpret than with existing indexes. Numerical experiments show the utility of the proposed two-dimensional index, and an application to real data demonstrates its usefulness.
1. Introduction
For two-way contingency tables, an analysis is generally performed to see whether independence between the row and column classifications holds. For square contingency tables with the same row and column classifications, however, many questions concern symmetry rather than independence, because in such tables there is a strong association between the row and column classifications. Consider an $r \times r$ square contingency table. Let $\pi_{ij}$ denote the probability that an observation will fall in the ith row and jth column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). Ref. [1] proposed the symmetry model defined by
$$\pi_{ij} = \pi_{ji} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r).$$
This symmetry model, however, often does not hold when applied to real data. When the symmetry model fits real data poorly, other models are applied instead, such as other symmetry-type models (e.g., quasi-symmetry [2] and partial symmetry [3]) or asymmetry models (e.g., conditional symmetry [4], linear diagonals-parameter symmetry [5], and conditional difference asymmetry [6]).
In the analysis of two-way contingency tables, the degree of departure from independence is assessed by using measures of association between the row and column variables. Measures of association include, for example, Yule’s coefficients of association and of colligation [7,8], Cramér’s coefficient [9], and Goodman and Kruskal’s coefficient [10]. For details, see [11,12].
Likewise, in the analysis of square contingency tables with the same row and column classifications, we are interested in measuring the degree of departure from the symmetry model, and many indexes have been proposed for this purpose. Refs. [13,14] proposed various types of indexes based on power divergence (or the diversity index) to represent the degree of departure from the symmetry model. Refs. [15,16] proposed two-dimensional indexes to represent the degree of departure from symmetry. A two-dimensional index allows us to visually compare the degrees of departure from symmetry in multiple data sets by using confidence regions, and it makes the results of data analysis easy to interpret.
This study proposes a two-dimensional index to measure the degree of departure from the symmetry model in terms of the log odds of each symmetric cell with respect to the main diagonal of the table. Measuring the departure in terms of these log odds makes the results easier to interpret than those obtained with existing indexes. This paper is organized as follows. Section 2 introduces the proposed index and its properties. Section 3 derives an approximate confidence region for the proposed index. Section 4 shows the usefulness of the proposed index by applying it to real data. Section 5 discusses properties of the proposed index under several asymmetry models. Section 6 gives concluding remarks.
2. Two-Dimensional Index and Its Properties
This section proposes a two-dimensional index to measure the degree of departure from the symmetry model in terms of the log odds of each symmetric cell with respect to the main diagonal of the table. Because its elements are weighted geometric mean indexes of the diversity index, the proposed two-dimensional index has more useful properties than the index of [13], which measures the degree of departure from the symmetry model. Section 2.1 describes the two univariate indexes of weighted geometric mean type that form the elements of the proposed two-dimensional index, together with their characteristics. Section 2.2 shows the relationship between these elements and describes the properties of the proposed two-dimensional index.
2.1. Univariate Index of Weighted Geometric Mean Type
For an $r \times r$ square contingency table with nominal categories, ref. [3] proposed the following weighted geometric mean of the diversity index. Assuming that $\pi_{ij} + \pi_{ji} > 0$ for all $i \neq j$, for $\lambda > -1$,
where
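The value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Note that the index uses Patil and Taillie’s [17] diversity index of degree $\lambda$, which includes the Shannon entropy ($\lambda = 0$), and the real number $\lambda$ is chosen by the user. The index has the following characteristics: (i) it takes values between 0 and 1; (ii) it equals 0 if and only if $\pi_{ij} = \pi_{ji}$ for at least one pair $i < j$; and (iii) it equals 1 if and only if the degree of asymmetry is maximum in the sense that $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$.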
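The diversity index that enters these weighted geometric means can be computed directly. The following minimal sketch uses helper names of our own (hypothetical): `patil_taillie_diversity` evaluates the diversity index of degree $\lambda$, $(1 - \sum p^{\lambda+1})/\lambda$, and falls back to the Shannon entropy at $\lambda = 0$; `normalized_pair_diversity` rescales the value for a symmetric pair of cells so that it equals 1 when $\pi_{ij} = \pi_{ji}$ and 0 when one of the two cells is empty. Dividing by the maximum $(1 - 2^{-\lambda})/\lambda$ is the same as multiplying by $\lambda 2^{\lambda}/(2^{\lambda} - 1)$. Using this rescaled quantity as the building block of the index above is an assumption, not a transcription of its defining formula.

```python
import math
from typing import Sequence


def patil_taillie_diversity(probs: Sequence[float], lam: float) -> float:
    """Patil-Taillie diversity index of degree lam (lam > -1).

    For lam != 0 this is (1 - sum p**(lam + 1)) / lam; the value at lam = 0
    is the continuous limit, i.e. the Shannon entropy with natural logs.
    """
    if lam <= -1:
        raise ValueError("degree lam must exceed -1")
    positive = [p for p in probs if p > 0]  # empty categories contribute nothing
    if lam == 0:
        return sum(-p * math.log(p) for p in positive)
    return (1.0 - sum(p ** (lam + 1) for p in positive)) / lam


def normalized_pair_diversity(pi_ij: float, pi_ji: float, lam: float) -> float:
    """Diversity of the conditional split of a symmetric pair, scaled to [0, 1].

    Hypothetical helper: divides by the maximum, attained at (1/2, 1/2),
    which equals (1 - 2**-lam) / lam (or log 2 when lam = 0).
    """
    total = pi_ij + pi_ji
    if total <= 0:
        raise ValueError("requires pi_ij + pi_ji > 0")
    h = patil_taillie_diversity([pi_ij / total, pi_ji / total], lam)
    h_max = patil_taillie_diversity([0.5, 0.5], lam)
    return h / h_max


if __name__ == "__main__":
    print(patil_taillie_diversity([0.5, 0.5], 1.0))   # 0.5
    print(normalized_pair_diversity(0.3, 0.3, 0.0))   # 1.0 (symmetric pair)
    print(normalized_pair_diversity(0.4, 0.0, 1.0))   # 0.0 (maximal asymmetry)
```

Because the pairwise quantity depends only on the ratio $\pi_{ij}/\pi_{ji}$, observed counts can be passed in place of probabilities.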
For an $r \times r$ square contingency table with nominal categories, we define another univariate index of weighted geometric mean type based on the diversity index, whose formula differs from that of the index above, as follows. Assuming that $\pi_{ij} + \pi_{ji} > 0$ for all $i \neq j$, for $\lambda > -1$,
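The value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. As above, the index uses Patil and Taillie’s [17] diversity index of degree $\lambda$, which includes the Shannon entropy ($\lambda = 0$), and the real number $\lambda$ is chosen by the user. This index has the following characteristics: (i) it takes values between 0 and 1; (ii) it equals 0 if and only if the symmetry model holds; and (iii) it equals 1 if and only if $\pi_{ij} = 0$ for at least one $i \neq j$.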
2.2. Two-Dimensional Index of Symmetry
Assuming that $\pi_{ij} + \pi_{ji} > 0$ for all $i \neq j$, we propose a two-dimensional index defined by
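whose two elements are the univariate indexes described in Section 2.1, and where the superscript $\top$ denotes transposition. The value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. As in Section 2.1, the diversity index of degree $\lambda$ is that of [17], which includes the Shannon entropy ($\lambda = 0$), and $\lambda$ is a real number chosen by the user.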
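To make the construction concrete, here is a minimal sketch that computes a pair of weighted geometric mean indexes from a table of probabilities or counts. It assumes (our assumption, chosen because it reproduces the characteristics listed in Section 2.1, and to be checked against the defining formulas) that, with pair weights $w_{ij} = (\pi_{ij} + \pi_{ji}) / \sum_{k \neq l} \pi_{kl}$ and the pairwise diversity $h_{ij}$ rescaled to $[0, 1]$, the first element is $\prod_{i<j} (1 - h_{ij})^{w_{ij}}$ and the second is $1 - \prod_{i<j} h_{ij}^{w_{ij}}$. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]


def diversity(probs: Sequence[float], lam: float) -> float:
    """Patil-Taillie diversity of degree lam > -1 (Shannon entropy at lam = 0)."""
    positive = [p for p in probs if p > 0]
    if lam == 0:
        return sum(-p * math.log(p) for p in positive)
    return (1.0 - sum(p ** (lam + 1) for p in positive)) / lam


def pair_terms(table: Table, lam: float) -> List[Tuple[float, float]]:
    """(w_ij, h_ij) for i < j: share of off-diagonal mass, diversity rescaled to [0, 1]."""
    r = len(table)
    pairs = [(i, j) for i in range(r) for j in range(i + 1, r)]
    off_diag = sum(table[i][j] + table[j][i] for i, j in pairs)
    h_max = diversity([0.5, 0.5], lam)  # attained when pi_ij = pi_ji
    terms = []
    for i, j in pairs:
        s = table[i][j] + table[j][i]
        if s <= 0:
            raise ValueError("requires pi_ij + pi_ji > 0 for all i < j")
        h = diversity([table[i][j] / s, table[j][i] / s], lam) / h_max
        terms.append((s / off_diag, h))
    return terms


def two_dimensional_index(table: Table, lam: float = 1.0) -> Tuple[float, float]:
    """Hypothetical sketch: (prod (1 - h)^w, 1 - prod h^w) over pairs i < j."""
    if lam <= -1:
        raise ValueError("lam must exceed -1")
    log_first = log_second = 0.0
    first_is_zero = second_is_one = False
    for w, h in pair_terms(table, lam):
        if 1.0 - h <= 1e-15:   # a symmetric pair makes the first product vanish
            first_is_zero = True
        else:
            log_first += w * math.log(1.0 - h)
        if h <= 1e-15:         # a maximally asymmetric pair makes the second product vanish
            second_is_one = True
        else:
            log_second += w * math.log(h)
    first = 0.0 if first_is_zero else math.exp(log_first)
    second = 1.0 if second_is_one else 1.0 - math.exp(log_second)
    return first, second
```

For instance, `two_dimensional_index([[1, 6, 9], [2, 1, 12], [3, 4, 1]])` evaluates a table in which every upper cell is three times its mirror cell; under the assumed form both elements come out at approximately 0.25 for $\lambda = 1$, which is the equal-elements case discussed below.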
Because both elements are expressed as weighted geometric means based on the diversity index, the following theorem on the relationship between them holds.
Theorem 1.
The element based on the index of [3] is less than or equal to the other element, and equality holds if, and only if, the conditional difference asymmetry model defined by [6], which has unspecified real-valued parameters, holds.
The proof of this theorem is given in Appendix A.
Based on the above properties of its two elements, the proposed two-dimensional index has the following characteristics:
- (1)
- The value of lies on the sides and inside the triangle at vertices , , and ;
- (2)
- The index equals $(0, 0)^{\top}$ if, and only if, the symmetry model holds;
- (3)
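- The index equals $(1, 1)^{\top}$ if, and only if, the degree of asymmetry is maximum in the sense that $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$;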
- (4)
- The index equals $(t, t)^{\top}$, where $t$ is a constant with $0 \le t \le 1$, if, and only if, the conditional difference asymmetry model holds.
The conditional difference asymmetry model holds if, and only if, the absolute value of the log odds of each symmetric cell with respect to the main diagonal of the table, $|\log(\pi_{ij}/\pi_{ji})|$, takes the same constant value for all $i \neq j$. Namely, the proposed two-dimensional index can represent the degree of departure from the symmetry model in terms of the log odds $\log(\pi_{ij}/\pi_{ji})$.
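This characterization can be checked directly on a probability table: compute $|\log(\pi_{ij}/\pi_{ji})|$ for every pair $i < j$ and test whether the values coincide. A minimal sketch with hypothetical function names, assuming every off-diagonal cell is positive (with a zero cell the log odds are infinite):

```python
import math
from typing import Dict, Sequence, Tuple

Table = Sequence[Sequence[float]]


def abs_log_odds(table: Table) -> Dict[Tuple[int, int], float]:
    """|log(pi_ij / pi_ji)| for each pair i < j (0-based indices, cells assumed > 0)."""
    r = len(table)
    return {
        (i, j): abs(math.log(table[i][j] / table[j][i]))
        for i in range(r)
        for j in range(i + 1, r)
    }


def conditional_difference_asymmetry_holds(table: Table, tol: float = 1e-9) -> bool:
    """True when the absolute log odds are (numerically) equal for every pair."""
    values = list(abs_log_odds(table).values())
    return max(values) - min(values) <= tol


if __name__ == "__main__":
    same_direction = [[1, 6, 9], [2, 1, 12], [3, 4, 1]]  # every upper cell = 3 x mirror
    mixed_direction = [[1, 6, 2], [2, 1, 12], [6, 4, 1]]  # one pair reversed
    growing_odds = [[1, 2, 4], [1, 1, 2], [1, 1, 1]]      # odds 2, 2, 4
    for t in (same_direction, mixed_direction, growing_odds):
        print(conditional_difference_asymmetry_holds(t))
```

The second table shows that the direction of the asymmetry may differ between pairs (the cell in row 1, column 3 is smaller than its mirror cell, while the other two pairs lean the other way) and the model still holds, because only the absolute log odds must be constant.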
Remark 1.
Like the index of [13] (see Appendix B), the proposed two-dimensional index represents the degree of departure from the symmetry model, and it attains its maximum when the degree of asymmetry is maximum, that is, when $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$. However, the proposed two-dimensional index represents this departure in terms of the log odds of each symmetric cell with respect to the main diagonal of the table, which makes the results easier to interpret than those of the index of [13]. Section 4 illustrates this with an example. This interpretation may be one of the advantages of the proposed two-dimensional index, because it cannot be obtained from either of its elements alone.
3. Approximate Confidence Region for the Proposed Index
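Let $n_{ij}$ denote the observed frequency in the ith row and jth column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). Assume that a multinomial distribution applies to the $r \times r$ table. The sample proportions corresponding to $\{\pi_{ij}\}$ are $\{p_{ij}\}$ with $p_{ij} = n_{ij}/n$, where $n = \sum_{i}\sum_{j} n_{ij}$. The estimator of the proposed index is obtained by replacing $\{\pi_{ij}\}$ with $\{p_{ij}\}$.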
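Let $\boldsymbol{p}$ and $\boldsymbol{\pi}$ be the $r^2 \times 1$ vectors
$$\boldsymbol{p} = (p_{11}, \ldots, p_{1r}, p_{21}, \ldots, p_{rr})^{\top}, \qquad \boldsymbol{\pi} = (\pi_{11}, \ldots, \pi_{1r}, \pi_{21}, \ldots, \pi_{rr})^{\top},$$
respectively. Then $\sqrt{n}(\boldsymbol{p} - \boldsymbol{\pi})$ asymptotically (as $n \to \infty$) has a normal distribution with zero mean vector and covariance matrix $\mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}^{\top}$, where $\mathrm{diag}(\boldsymbol{\pi})$ is a diagonal matrix with the elements of $\boldsymbol{\pi}$ on the main diagonal. Therefore, by the delta method (see, e.g., [12]), $\sqrt{n}$ times the difference between the estimated and the true two-dimensional index asymptotically (as $n \to \infty$) has a bivariate normal distribution with zero mean vector and the covariance matrix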
where
with
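Let the estimated covariance matrix be the covariance matrix above with $\{\pi_{ij}\}$ replaced by $\{p_{ij}\}$. The approximate $100(1-\alpha)\%$ confidence region of the two-dimensional index is given as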
where is the quantile of the chi-square distribution with two degrees of freedom. Note that the confidence region is computable when and .
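A minimal sketch of this computation, assuming the index is supplied as a Python function of the flattened cell proportions that returns its two elements (all names below are hypothetical). It substitutes central-difference numerical derivatives for the analytic derivatives in the covariance matrix, forms $D[\mathrm{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^{\top}]D^{\top}$, and checks whether a candidate point lies in the region by comparing the quadratic form with the upper $\alpha$ point of the chi-square distribution with two degrees of freedom, which equals $-2\ln\alpha$ because that distribution has distribution function $1 - e^{-x/2}$. Since $[\mathrm{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^{\top}]\mathbf{1} = \mathbf{0}$ when the proportions sum to 1, perturbing one cell at a time without renormalizing does not change the result.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
IndexFn = Callable[[Sequence[float]], Sequence[float]]


def chi2_2df_upper_quantile(alpha: float) -> float:
    """Upper-alpha point of chi-square(2); its CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(alpha)


def numerical_jacobian(g: IndexFn, p: Sequence[float], eps: float = 1e-6) -> Matrix:
    """Central-difference derivatives of each output of g w.r.t. each cell proportion."""
    rows = len(g(p))
    jac = [[0.0] * len(p) for _ in range(rows)]
    for k in range(len(p)):
        up, down = list(p), list(p)
        up[k] += eps
        down[k] -= eps
        g_up, g_down = g(up), g(down)
        for m in range(rows):
            jac[m][k] = (g_up[m] - g_down[m]) / (2.0 * eps)
    return jac


def delta_method_cov(g: IndexFn, p: Sequence[float]) -> Matrix:
    """Asymptotic covariance D (diag(p) - p p^T) D^T of sqrt(n) * (g(p_hat) - g(p))."""
    D = numerical_jacobian(g, p)
    k = len(p)
    V = [[(p[a] if a == b else 0.0) - p[a] * p[b] for b in range(k)] for a in range(k)]
    DV = [[sum(row[a] * V[a][b] for a in range(k)) for b in range(k)] for row in D]
    return [[sum(dv[b] * row[b] for b in range(k)) for row in D] for dv in DV]


def in_confidence_region(g_hat, g0, cov: Matrix, n: int, alpha: float = 0.05) -> bool:
    """Check n (g_hat - g0)^T cov^{-1} (g_hat - g0) <= chi2_2(alpha) for a 2 x 2 cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("estimated covariance matrix is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [g_hat[0] - g0[0], g_hat[1] - g0[1]]
    stat = n * sum(diff[m] * inv[m][q] * diff[q] for m in range(2) for q in range(2))
    return stat <= chi2_2df_upper_quantile(alpha)
```

To use it with a table, flatten the sample proportions $p_{ij} = n_{ij}/n$ row by row and pass a function that rebuilds the $r \times r$ table and evaluates the index; the region is then the set of points `g0` for which `in_confidence_region` returns `True`.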
Because the delta method for multinomial distributions relies on the asymptotic normality of the cell proportions, the normal approximation for the estimated index obtained by the delta method may be poor when observed frequencies are small, for example in cells near the corners of the contingency table (see, e.g., [12], p. 589).
4. Example
This section demonstrates the usefulness of the proposed two-dimensional index, in comparison with the index of [13] (see Appendix B), which measures the degree of departure from the symmetry model, by using the real data cited from [18,19].
These data are cross-classifications of fathers’ and sons’ occupational status categories in Japan, surveyed in 1955 and 1995. Status is classified into five categories: (1) capitalist, (2) new middle, (3) working, (4) self-employed, and (5) farming.
Table 1 gives the estimates of the two elements of the proposed index, their approximate standard errors, and their approximate confidence intervals for the real data in [18,19]. Note that, by the delta method, the approximate confidence intervals for the two elements can be obtained from the $(1, 1)$ and $(2, 2)$ components of the estimated covariance matrix of the estimated index, respectively. For the chosen value of $\lambda$, the estimates of the two-dimensional index are
respectively, and the estimates of the index of [13] are
respectively. Figure 1 shows the point estimates and approximate confidence regions of the two-dimensional index for the real data in [18,19]; its axes correspond to the two elements of the index. Since these confidence regions do not overlap, the two data sets are inferred to have different probability structures. From Table 1 and Figure 1, the degree of departure from the symmetry model is larger for the data in [19] than for the data in [18]. Additionally, the confidence region for the data in [18] includes the line passing through the points $(0, 0)$ and $(1, 1)$, but the confidence region for the data in [19] does not. Therefore, the data in [18] may follow the conditional difference asymmetry model, and we can see that the father’s occupational status in Japan in 1955 has a greater influence on his son’s status than in 1995.
Table 1.
Estimates of the two elements of the proposed index, their approximate standard errors, and their approximate confidence intervals for the real data in [18,19].
Figure 1.
Point estimates and approximate confidence regions of the two-dimensional index for the real data in [18,19].
On the other hand, the confidence intervals of the existing index of [13], which measures the degree of departure from the symmetry model, also show that the degree of departure from symmetry is larger for the data in [19] than for the data in [18]. However, the existing index does not reveal in which of the two data sets the father’s occupational status has the greater influence on his son’s status.
The proposed two-dimensional index not only represents the degree of departure from the symmetry model but also takes into account the log odds of each symmetric cell with respect to the main diagonal of the table (i.e., the degree of departure from the conditional difference asymmetry model). It can therefore provide more interpretable results, as illustrated above.
5. Discussion
This section discusses properties of the proposed index under several asymmetry models, namely the conditional symmetry model [4] and the linear diagonals-parameter symmetry model [5]. The conditional symmetry model is defined by
$$\pi_{ij} = \Delta\,\psi_{ij} \quad (i < j), \qquad \pi_{ij} = \psi_{ij} \quad (i \ge j),$$
where $\psi_{ij} = \psi_{ji}$.
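The linear diagonals-parameter symmetry model is defined by
$$\pi_{ij} = \delta^{\,j-i}\,\psi_{ij} \quad (i < j), \qquad \pi_{ij} = \psi_{ij} \quad (i \ge j),$$
where $\psi_{ij} = \psi_{ji}$.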
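To experiment with these two models, one can generate probability tables directly from their definitions. A minimal sketch with hypothetical helper names, assuming a user-supplied symmetric matrix $\psi$; each table is rescaled to sum to 1, which leaves the ratios $\pi_{ij}/\pi_{ji}$ unchanged.

```python
from typing import List, Sequence

Table = List[List[float]]


def _check_symmetric(psi: Sequence[Sequence[float]]) -> None:
    """Both models require psi_ij = psi_ji."""
    r = len(psi)
    if any(psi[i][j] != psi[j][i] for i in range(r) for j in range(r)):
        raise ValueError("psi must satisfy psi_ij = psi_ji")


def _normalize(t: Table) -> Table:
    """Rescale the cells so that they sum to 1."""
    total = sum(map(sum, t))
    return [[x / total for x in row] for row in t]


def conditional_symmetry_table(psi: Sequence[Sequence[float]], big_delta: float) -> Table:
    """pi_ij = Delta * psi_ij (i < j) and pi_ij = psi_ij (i >= j), rescaled."""
    _check_symmetric(psi)
    r = len(psi)
    return _normalize(
        [[psi[i][j] * (big_delta if i < j else 1.0) for j in range(r)] for i in range(r)]
    )


def linear_diagonals_parameter_table(psi: Sequence[Sequence[float]], small_delta: float) -> Table:
    """pi_ij = delta**(j - i) * psi_ij (i < j) and pi_ij = psi_ij (i >= j), rescaled."""
    _check_symmetric(psi)
    r = len(psi)
    return _normalize(
        [[psi[i][j] * (small_delta ** (j - i) if i < j else 1.0) for j in range(r)] for i in range(r)]
    )
```

In the first table every ratio $\pi_{ij}/\pi_{ji}$ ($i < j$) equals $\Delta$; in the second it equals $\delta^{\,j-i}$, so it grows with the distance from the main diagonal.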
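Note that if the conditional symmetry model holds, then the conditional difference asymmetry model holds, because the log odds $\log(\pi_{ij}/\pi_{ji})$ then take the same value for all $i < j$; the converse is not true. In contrast, under the linear diagonals-parameter symmetry model the log odds $\log(\pi_{ij}/\pi_{ji})$ are proportional to $j - i$, so they vary with the distance from the main diagonal and the conditional difference asymmetry model does not hold in general.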
Figure 2 plots the values of the proposed index for several values of the parameter $\Delta$ of the conditional symmetry model, with the two axes corresponding to the two elements of the index. Figure 2 shows that, as $\Delta$ increases, the value of the index approaches $(1, 1)^{\top}$ while lying on the straight line passing through $(0, 0)$ and $(1, 1)$. On the other hand, Figure 3 plots the values of the proposed index for several values of the parameter $\delta$ of the linear diagonals-parameter symmetry model, with the same axes. As Figure 3 shows, the value of the index also approaches $(1, 1)^{\top}$ as $\delta$ increases, but it does not stay on the straight line passing through $(0, 0)$ and $(1, 1)$. Similar results are observed for other values of $\lambda$, although the details are omitted. This difference arises because the proposed index measures the degree of departure from symmetry in terms of the log odds of each symmetric cell with respect to the main diagonal of the table.
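The qualitative pattern described for Figures 2 and 3 can be reproduced with a small sketch. It assumes (our assumption, not the defining formula) that the two elements are $\prod_{i<j}(1 - h_{ij})^{w_{ij}}$ and $1 - \prod_{i<j} h_{ij}^{w_{ij}}$, where $h_{ij}$ is the diversity of degree $\lambda$ of the split $(\pi_{ij}, \pi_{ji})/(\pi_{ij} + \pi_{ji})$ rescaled to $[0, 1]$ and $w_{ij}$ is the pair’s share of the off-diagonal probability. It also takes $\psi_{ij} = 1$ for every cell and skips normalization, since both elements depend only on ratios of cells. Function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Table = List[List[float]]


def diversity(probs: Sequence[float], lam: float) -> float:
    """Patil-Taillie diversity of degree lam > -1 (Shannon entropy at lam = 0)."""
    positive = [p for p in probs if p > 0]
    if lam == 0:
        return sum(-p * math.log(p) for p in positive)
    return (1.0 - sum(p ** (lam + 1) for p in positive)) / lam


def index_pair(table: Table, lam: float = 1.0) -> Tuple[float, float]:
    """Hypothetical sketch of the two elements: (prod (1 - h)^w, 1 - prod h^w)."""
    r = len(table)
    pairs = [(i, j) for i in range(r) for j in range(i + 1, r)]
    off_diag = sum(table[i][j] + table[j][i] for i, j in pairs)
    h_max = diversity([0.5, 0.5], lam)
    log_first = log_second = 0.0
    first_is_zero = second_is_one = False
    for i, j in pairs:
        s = table[i][j] + table[j][i]
        w = s / off_diag
        h = diversity([table[i][j] / s, table[j][i] / s], lam) / h_max
        if 1.0 - h <= 1e-15:
            first_is_zero = True
        else:
            log_first += w * math.log(1.0 - h)
        if h <= 1e-15:
            second_is_one = True
        else:
            log_second += w * math.log(h)
    first = 0.0 if first_is_zero else math.exp(log_first)
    second = 1.0 if second_is_one else 1.0 - math.exp(log_second)
    return first, second


def model_table(r: int, odds: Callable[[int, int], float]) -> Table:
    """psi_ij = 1 everywhere; pi_ij = odds(i, j) * pi_ji for i < j."""
    return [[odds(i, j) if i < j else 1.0 for j in range(r)] for i in range(r)]


if __name__ == "__main__":
    for big_delta in (1.5, 3.0, 10.0, 100.0):
        cs = model_table(4, lambda i, j: big_delta)
        print("CS   Delta=%6.1f -> (%.3f, %.3f)" % ((big_delta,) + index_pair(cs)))
    for small_delta in (1.5, 3.0, 10.0, 100.0):
        ldps = model_table(4, lambda i, j: small_delta ** (j - i))
        print("LDPS delta=%6.1f -> (%.3f, %.3f)" % ((small_delta,) + index_pair(ldps)))
```

Under these assumptions the two printed elements coincide for every $\Delta$ and approach 1, whereas for the linear diagonals-parameter tables the second element is strictly larger than the first (noticeably so for moderate $\delta$).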
Figure 2.
The values of the proposed index under the conditional symmetry model for a fixed value of the parameter $\lambda$.
Figure 3.
The values of the proposed index under the linear diagonals-parameter symmetry model for a fixed value of the parameter $\lambda$.
Therefore, the proposed index is suitable for measuring the degree of departure from symmetry and can visually distinguish the asymmetry models as described above.
6. Conclusions
This paper proposed a two-dimensional index to measure the degree of departure from symmetry in square contingency tables in terms of the log odds of each symmetric cell with respect to the main diagonal. Because it measures the departure in terms of these log odds, the proposed two-dimensional index provides more interpretable results than the existing index of [13], which measures the degree of departure from the symmetry model. Additionally, the proposed two-dimensional index allows us to visually compare the degrees of departure from symmetry in multiple data sets by using confidence regions, which makes the results of data analysis easy to interpret.
The proposed index is invariant under any permutation applied simultaneously to the row and column categories; namely, its value does not depend on the order of the categories. Therefore, the proposed index can be used for data with nominal categories. Moreover, it can also be used for data on an ordinal scale when the information about the ordering of the categories is not used.
Author Contributions
Conceptualization, T.M., T.N., A.I., Y.S. and S.T.; methodology, T.M., T.N., A.I., Y.S. and S.T.; software, T.M., T.N., A.I., Y.S. and S.T.; validation, T.M., T.N., A.I., Y.S. and S.T.; formal analysis, T.M., T.N., A.I., Y.S. and S.T.; investigation, T.M., T.N., A.I., Y.S. and S.T.; resources, T.M., T.N., A.I., Y.S. and S.T.; data curation, T.M., T.N., A.I., Y.S. and S.T.; writing—original draft preparation, T.M., T.N., A.I., Y.S. and S.T.; writing—review and editing, T.M., T.N., A.I., Y.S. and S.T.; visualization, T.M., T.N., A.I., Y.S. and S.T.; supervision, T.M., T.N., A.I., Y.S. and S.T.; project administration, T.M., T.N., A.I., Y.S. and S.T.; funding acquisition, T.N. and S.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by JSPS Grant-in-Aid for Scientific Research (C) Grant Number JP20K03756 and JSPS Grant-in-Aid for Scientific Research (C) Grant Number JP18K03425.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available in [18,19].
Acknowledgments
The authors would like to thank three anonymous reviewers for their careful reading and their many insightful comments and suggestions.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of Theorem 1
Proof of Theorem 1.
Since
holds from Jensen’s inequality, holds. This equality holds if, and only if, are constant. Moreover, are constant if, and only if, are constant, which is equivalent to the fact that the conditional difference asymmetry model holds. □
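The Jensen step can be written out as follows. This is a sketch under an assumption: that the first element is a weighted geometric mean $\prod_{i<j} x_{ij}^{w_{ij}}$ and the second is $1 - \prod_{i<j}(1 - x_{ij})^{w_{ij}}$, with positive pair weights $w_{ij}$ summing to 1 and pairwise quantities $0 \le x_{ij} \le 1$. Applying Jensen’s inequality to the concave logarithm gives the weighted arithmetic-geometric mean inequality for both products:

```latex
\begin{aligned}
\prod_{i<j} x_{ij}^{w_{ij}} &\le \sum_{i<j} w_{ij}\, x_{ij},
\qquad
\prod_{i<j} (1 - x_{ij})^{w_{ij}} \le \sum_{i<j} w_{ij}\,(1 - x_{ij}), \\
\prod_{i<j} x_{ij}^{w_{ij}} + \prod_{i<j} (1 - x_{ij})^{w_{ij}}
  &\le \sum_{i<j} w_{ij} = 1
\quad\Longrightarrow\quad
\prod_{i<j} x_{ij}^{w_{ij}} \le 1 - \prod_{i<j} (1 - x_{ij})^{w_{ij}}.
\end{aligned}
```

Both inequalities become equalities if, and only if, the $x_{ij}$ are equal for all $i < j$, so equality in the final line holds exactly when the $x_{ij}$ are constant; if $x_{ij}$ is a one-to-one function of $|\log(\pi_{ij}/\pi_{ji})|$, this is the conditional difference asymmetry condition.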
Appendix B. The Index of Symmetry
For an $r \times r$ square contingency table with nominal categories, ref. [13] proposed the following index to represent the degree of departure from the symmetry model. Assuming that $\pi_{ij} + \pi_{ji} > 0$ for all $i \neq j$, for $\lambda > -1$,
where
The is defined as
where
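Note that the index uses Patil and Taillie’s [17] diversity index of degree $\lambda$, which includes the Shannon entropy ($\lambda = 0$), and the real number $\lambda$ is chosen by the user. The index has the following characteristics: (i) it takes values between 0 and 1; (ii) it equals 0 if, and only if, the symmetry model holds; and (iii) it equals 1 if, and only if, the degree of asymmetry is maximum in the sense that $\pi_{ij} = 0$ (then $\pi_{ji} > 0$) or $\pi_{ji} = 0$ (then $\pi_{ij} > 0$) for all $i < j$.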
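For comparison with the two-dimensional index, the index of [13] can be sketched as one minus a weighted arithmetic mean of the rescaled pairwise diversities, $1 - \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1} \sum_{i<j} w_{ij} H_{ij}^{(\lambda)}$ with $w_{ij} = (\pi_{ij} + \pi_{ji}) / \sum_{k \neq l} \pi_{kl}$ and $H_{ij}^{(\lambda)}$ the diversity of degree $\lambda$ of the split $(\pi_{ij}, \pi_{ji})/(\pi_{ij} + \pi_{ji})$. This form is our reading of [13] and should be checked against it; the function names are hypothetical.

```python
import math
from typing import Sequence

Table = Sequence[Sequence[float]]


def diversity(probs: Sequence[float], lam: float) -> float:
    """Patil-Taillie diversity of degree lam > -1 (Shannon entropy at lam = 0)."""
    positive = [p for p in probs if p > 0]
    if lam == 0:
        return sum(-p * math.log(p) for p in positive)
    return (1.0 - sum(p ** (lam + 1) for p in positive)) / lam


def power_divergence_type_index(table: Table, lam: float = 1.0) -> float:
    """Sketch: 1 - sum_{i<j} w_ij * h_ij, with h_ij the diversity rescaled to [0, 1]."""
    if lam <= -1:
        raise ValueError("lam must exceed -1")
    r = len(table)
    pairs = [(i, j) for i in range(r) for j in range(i + 1, r)]
    off_diag = sum(table[i][j] + table[j][i] for i, j in pairs)
    h_max = diversity([0.5, 0.5], lam)  # (1 - 2**-lam) / lam, or log 2 at lam = 0
    total = 0.0
    for i, j in pairs:
        s = table[i][j] + table[j][i]
        if s <= 0:
            raise ValueError("requires pi_ij + pi_ji > 0 for all i < j")
        h = diversity([table[i][j] / s, table[j][i] / s], lam) / h_max
        total += (s / off_diag) * h
    return 1.0 - total
```

For example, with $\lambda = 1$ the table `[[1, 6, 3], [2, 1, 4], [3, 4, 1]]` has one asymmetric pair (6 versus 2, rescaled diversity 0.75, weight 8/22) and two symmetric pairs, so the sketch returns $1 - 20/22 = 1/11$.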
References
- Bowker, A.H. A test for symmetry in contingency tables. J. Am. Stat. Assoc. 1948, 43, 572–574.
- Caussinus, H. Contribution à l’analyse statistique des tableaux de corrélation. Annales de la Faculté des Sciences de Toulouse 1965, 29, 77–183.
- Saigusa, Y.; Tahata, K.; Tomizawa, S. Measure of departure from partial symmetry for square contingency tables. J. Math. Stat. 2016, 12, 152–156.
- McCullagh, P. A class of parametric models for the analysis of square contingency tables with ordered categories. Biometrika 1978, 65, 413–418.
- Agresti, A. A simple diagonals-parameter symmetry and quasi-symmetry model. Stat. Probab. Lett. 1983, 1, 313–316.
- Tomizawa, S.; Miyamoto, N.; Funato, R. Conditional difference asymmetry model for square contingency tables with nominal categories. J. Appl. Stat. 2004, 31, 271–277.
- Yule, G.U. On the association of attributes in statistics. Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character 1900, 194, 257–319.
- Yule, G.U. On the methods of measuring association between two attributes. J. R. Stat. Soc. 1912, 75, 579–652.
- Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946.
- Goodman, L.A.; Kruskal, W.H. Measures of association for cross classifications. J. Am. Stat. Assoc. 1954, 49, 732–764.
- Bishop, Y.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; Springer Science & Business Media: New York, NY, USA, 2007.
- Agresti, A. Categorical Data Analysis, 3rd ed.; John Wiley and Sons: Hoboken, NJ, USA, 2013.
- Tomizawa, S.; Seo, T.; Yamamoto, H. Power-divergence-type measure of departure from symmetry for square contingency tables that have nominal categories. J. Appl. Stat. 1998, 25, 387–398.
- Tomizawa, S.; Miyamoto, N.; Hatanaka, Y. Measure of asymmetry for square contingency tables having ordered categories. Aust. N. Z. J. Stat. 2001, 43, 335–349.
- Ando, S.; Tahata, K.; Tomizawa, S. A bivariate index vector for measuring departure from double symmetry in square contingency tables. Adv. Data Anal. Classif. 2019, 13, 519–529.
- Ando, S. Directional index for measuring global symmetry in square tables. J. Stat. Theory Pract. 2020, 14, 1–8.
- Patil, G.; Taillie, C. Diversity as a concept and its measurement. J. Am. Stat. Assoc. 1982, 77, 548–561.
- Hashimoto, K. Gendai Nihon no Kaikyuu Kouzou (Class Structure in Modern Japan: Theory, Method and Quantitative Analysis); Toshindo Press: Tokyo, Japan, 1999. (In Japanese)
- Hashimoto, K. Class Structure in Contemporary Japan; Trans Pacific Press: Melbourne, Australia, 2003.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).