Abstract
The double symmetry model satisfies both the symmetry and point symmetry models simultaneously. To measure the degree of deviation from the double symmetry model, a two-dimensional index that can concurrently measure the degree of deviation from symmetry and point symmetry is considered. This two-dimensional index is constructed by combining two existing indexes. Although the existing indexes are constructed using power divergence, the existing two-dimensional index that can concurrently measure both symmetries is constructed using only Kullback-Leibler information, which is a special case of power divergence. Previous studies note the importance of using several indexes of divergence to compare the degrees of deviation from a model for several square contingency tables. This study, therefore, proposes a two-dimensional index based on power divergence in order to measure deviation from double symmetry for square contingency tables. Numerical examples show the utility of the proposed two-dimensional index using two datasets.
1. Introduction
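Consider an $r \times r$ square contingency table that has the same row and column classifications with nominal categories. Let $p_{ij}$ denote the probability that an observation will fall in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$).
The symmetry (S) model proposed by Bowker [1] is defined by
$$p_{ij} = p_{ji} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r).$$
The S model is the most commonly used model for analyzing square contingency tables [2,3,4].
The point symmetry (PS) model proposed by Wall and Lienert [5] is defined by
$$p_{ij} = p_{i^{*}j^{*}} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r),$$
where $i^{*} = r + 1 - i$ and $j^{*} = r + 1 - j$. The PS model describes symmetry of the cell probabilities about the center point of the square contingency table.
The double symmetry (DS) model proposed by Tomizawa [6] is defined by
$$p_{ij} = p_{ji} = p_{i^{*}j^{*}} = p_{j^{*}i^{*}} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r).$$
The DS model indicates that both the S and PS models hold.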
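The three model definitions can be checked numerically on a table of cell probabilities. The following is a minimal sketch, not part of the original study: the function names, the tolerance `tol`, and both example tables are hypothetical, and indices are 0-based, so the mirror of row `i` is `r - 1 - i`.

```python
def is_symmetric(p, tol=1e-12):
    """S model: p[i][j] equals p[j][i] for every cell."""
    r = len(p)
    return all(abs(p[i][j] - p[j][i]) <= tol
               for i in range(r) for j in range(r))


def is_point_symmetric(p, tol=1e-12):
    """PS model: p[i][j] equals the cell mirrored through the table center."""
    r = len(p)
    return all(abs(p[i][j] - p[r - 1 - i][r - 1 - j]) <= tol
               for i in range(r) for j in range(r))


def is_double_symmetric(p, tol=1e-12):
    """DS model: the S and PS models hold at the same time."""
    return is_symmetric(p, tol) and is_point_symmetric(p, tol)


# Hypothetical 3 x 3 probability tables (each sums to 1).
ds_table = [[0.10, 0.05, 0.20],
            [0.05, 0.20, 0.05],
            [0.20, 0.05, 0.10]]
s_only_table = [[0.10, 0.05, 0.20],
                [0.05, 0.20, 0.10],
                [0.20, 0.10, 0.00]]

print(is_double_symmetric(ds_table))
print(is_symmetric(s_only_table), is_point_symmetric(s_only_table))
```

The second table satisfies S but not PS, which illustrates why DS is the stricter of the three models.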
When a model does not hold, we may be interested in measuring the degree of deviation from the model. For square contingency tables with nominal categories, Tomizawa et al. [7] proposed an index that represents the degree of deviation from the S model, Tomizawa et al. [8] proposed an index that represents the degree of deviation from the PS model, and Yamamoto et al. [9] proposed an index that represents the degree of deviation from the DS model.
This study focuses on the index that represents the degree of deviation from the DS model. Although the DS model satisfies both the S and PS models simultaneously, the index of Yamamoto et al. [9] cannot concurrently measure the degrees of deviation from S and PS. To address this gap, Ando et al. [10] proposed a two-dimensional index that can measure both concurrently. This two-dimensional index was constructed by combining the existing indexes for S [7] and PS [8]. Ando et al. [10] point out that the index must be two-dimensional rather than univariate because the two existing indexes are not independent. Ando et al. [10] considered three datasets: (1) the degree of deviation from the S model is large but the degree of deviation from the PS model is small, (2) the degree of deviation from the S model is small but the degree of deviation from the PS model is large, and (3) the degrees of deviation from both the S model and the PS model are large. Using these datasets, which have different structures with respect to the deviation from the DS model, Ando et al. [10] showed that the values of the univariate DS index [9] are the same for all three datasets, whereas the values of the two-dimensional index all differ. Thus, the two-dimensional index gives more detailed results than the univariate DS index.
On the other hand, the existing indexes for S, PS, and DS [7,8,9] are constructed using power divergence, while the two-dimensional index of Ando et al. [10] is constructed using only Kullback-Leibler information, which is a special case of power divergence. The power divergence, indexed by a real parameter $\lambda$, includes several divergences: for example, the power divergence with $\lambda = -1/2$ is equivalent to the Freeman-Tukey type divergence, and the power divergence with $\lambda = 1$ is equivalent to the Pearson chi-squared type divergence. For details on power divergence, see Cressie and Read [11] and Read and Cressie [12]. Previous studies (e.g., [7,8]) pointed out that it is important to use several indexes of divergence to accurately measure the degree of deviation from a model. This study proposes a two-dimensional index that is constructed by combining the existing power-divergence-based indexes for S and PS.
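To make the special cases concrete, the sketch below evaluates the Cressie-Read power divergence $I^{(\lambda)}(p; q) = \frac{1}{\lambda(\lambda+1)} \sum_i p_i \left[ (p_i/q_i)^{\lambda} - 1 \right]$ for two discrete distributions. It is only an illustration: the function name and the example distributions are hypothetical, and the sketch restricts itself to $\lambda > -1$, taking the value at $\lambda = 0$ as the Kullback-Leibler limit.

```python
import math


def power_divergence(p, q, lam):
    """Cressie-Read power divergence between distributions p and q (lam > -1)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if lam <= -1:
        raise ValueError("this sketch only covers lam > -1")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero cell contributes nothing when lam > -1
        if qi <= 0:
            raise ValueError("q must be positive wherever p is positive")
        if lam == 0:
            total += pi * math.log(pi / qi)          # Kullback-Leibler limit
        else:
            total += pi * ((pi / qi) ** lam - 1.0)
    return total if lam == 0 else total / (lam * (lam + 1.0))


# Hypothetical distributions used only for illustration.
p = [0.5, 0.5]
q = [0.25, 0.75]

half_pearson = 0.5 * sum((a - b) ** 2 / b for a, b in zip(p, q))
freeman_tukey = 2.0 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

print(power_divergence(p, q, 1.0), half_pearson)     # Pearson chi-squared type
print(power_divergence(p, q, -0.5), freeman_tukey)   # Freeman-Tukey type
print(power_divergence(p, q, 0.0))                   # Kullback-Leibler information
```

With this scaling, $\lambda = 1$ gives half of the Pearson chi-squared form and $\lambda = -1/2$ gives twice the squared Hellinger-type distance, so each printed pair should agree up to rounding.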
The rest of this paper is organized as follows. In Section 2, we propose a generalized two-dimensional index for measuring the degree of deviation from DS. In Section 3, we develop an approximate confidence region for the proposed two-dimensional index. We then use numerical examples to show the utility of the proposed two-dimensional index in Section 4. We also present results obtained by applying the proposed two-dimensional index to real data. We close with concluding remarks in Section 5.
2. Two-Dimensional Index to Measure Deviation from DS
We propose a generalized two-dimensional index, based on power divergence, that concurrently measures the degrees of deviation from S and PS and thereby measures deviation from DS in square contingency tables.
Assume that for all , and for all , where
In order to measure the degree of deviation from DS, we consider the following two-dimensional index:
where the two component indexes are those considered by Tomizawa et al. [7] and Tomizawa et al. [8], respectively (see Appendix A and Appendix B for the details of these indexes). Note that the parameter $\lambda$ is a real value chosen by the user. We recommend choosing values of $\lambda$ (e.g., ) that correspond to well-known divergences. When $\lambda = 0$, the proposed two-dimensional index is equivalent to the index by Ando et al. [10]; thus, the proposed index is a generalization of the index by Ando et al. [10]. The two-dimensional index has the following characteristics: (i) if and only if the DS model holds; (ii) if and only if the degree of deviation from DS is maximum, in the sense that (then and ) or (then and ) for all , and either or for (when r is even) or (when r is odd); (iii) if and only if the degree of deviation from S is maximum and the degree of deviation from PS is not maximum, in the sense that (then ) for all ; and (iv) if and only if the degree of deviation from PS is maximum and the degree of deviation from S is not maximum, in the sense that (then ) for all .
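The behavior described in characteristics (i) and (iii) can be illustrated with a small sketch. It assumes that each component is a power divergence between every cell and its mirror cell, normalized by $(2^{\lambda} - 1)$ (or by $\log 2$ at $\lambda = 0$) so that it lies between 0 and 1, with $\lambda > -1$. This functional form, the exclusion of the center cell for odd $r$, the function names, and the example tables are assumptions of the sketch and should be checked against Appendix A, Appendix B, and [7,8] before any real use.

```python
import math


def _paired_index(pairs, lam):
    """Normalized power divergence over (cell, mirror cell) pairs; lam > -1."""
    if lam <= -1:
        raise ValueError("this sketch only covers lam > -1")
    total = sum(a for a, _ in pairs)
    if total <= 0:
        raise ValueError("the paired cells must carry positive probability")
    acc = 0.0
    for a, b in pairs:
        if a == 0:
            continue  # a zero cell contributes nothing when lam > -1
        ratio = 2.0 * a / (a + b)  # cell relative to the average of the pair
        acc += (a / total) * (math.log(ratio) if lam == 0 else ratio ** lam - 1.0)
    return acc / (math.log(2.0) if lam == 0 else 2.0 ** lam - 1.0)


def two_dimensional_index(p, lam):
    """Return (S component, PS component) for a square probability table p."""
    r = len(p)
    # S pairs each off-diagonal cell (i, j) with (j, i).
    s_pairs = [(p[i][j], p[j][i]) for i in range(r) for j in range(r) if i != j]
    # PS pairs each cell with its mirror through the center; the center cell
    # of an odd-sized table is its own mirror and is left out (assumption).
    ps_pairs = [(p[i][j], p[r - 1 - i][r - 1 - j])
                for i in range(r) for j in range(r)
                if (i, j) != (r - 1 - i, r - 1 - j)]
    return _paired_index(s_pairs, lam), _paired_index(ps_pairs, lam)


# Hypothetical tables: one satisfying DS, one upper triangular.
ds_table = [[0.10, 0.05, 0.20],
            [0.05, 0.20, 0.05],
            [0.20, 0.05, 0.10]]
upper_table = [[0.2, 0.1, 0.3],
               [0.0, 0.1, 0.2],
               [0.0, 0.0, 0.1]]

for lam in (-0.5, 0.0, 1.0):
    print(lam, two_dimensional_index(ds_table, lam),
          two_dimensional_index(upper_table, lam))
```

Under these assumptions the DS table yields (0, 0) for every $\lambda$, while the upper triangular table yields an S component of 1 and a PS component strictly below 1, matching the pattern in characteristic (iii).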
3. Approximate Confidence Region for the Proposed Two-Dimensional Index
Let
Assume that has a multinomial distribution with sample size N and probability vector . The has an asymptotically Gaussian distribution with mean zero and covariance matrix , where = and is a diagonal matrix with the elements of on the main diagonal (see, e.g., Agresti [13]). We estimate by , where and are given by and with replaced by , respectively. Using the delta method (see Agresti [13]), has an asymptotically bivariate Gaussian distribution with mean zero and covariance matrix
with = . Let
The elements , , and are expressed as follows:
where for
with
Note that the asymptotic variances of the estimators of the S and PS indexes have been given by Tomizawa et al. [7] and Tomizawa et al. [8], respectively; the asymptotic covariance of the two estimators, however, is derived for the first time in this study. An approximate bivariate confidence region for the two-dimensional index is given by
where is the upper percentile of the central chi-square distribution with two degrees of freedom and is given by with replaced by .
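As an illustration of how such a region can be used, the sketch below tests whether a candidate pair of index values lies inside an approximate confidence region. It assumes the usual delta-method ellipse, that is, the set of candidate values whose quadratic form $N (\hat{\Phi} - \Phi)^{\top} \hat{\Sigma}^{-1} (\hat{\Phi} - \Phi)$ does not exceed the upper chi-square percentile with two degrees of freedom. The function names and all numerical inputs are hypothetical.

```python
import math


def chi2_upper_quantile_2df(alpha):
    """Upper alpha quantile of chi-square with 2 df (survival is exp(-x / 2))."""
    return -2.0 * math.log(alpha)


def in_confidence_region(candidate, estimate, cov, n, alpha=0.05):
    """True if `candidate` lies in the approximate 100(1 - alpha)% region.

    `cov` is the estimated 2 x 2 covariance matrix of sqrt(n) * (estimate - truth).
    """
    d0 = candidate[0] - estimate[0]
    d1 = candidate[1] - estimate[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form with the explicit inverse of a 2 x 2 matrix.
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return n * quad <= chi2_upper_quantile_2df(alpha)


# Hypothetical estimate, covariance matrix, and sample size.
estimate = (0.30, 0.20)
cov = [[0.5, 0.1], [0.1, 0.4]]
n = 1000

print(chi2_upper_quantile_2df(0.05))
print(in_confidence_region((0.31, 0.20), estimate, cov, n))
print(in_confidence_region((0.50, 0.20), estimate, cov, n))
```

The closed-form quantile is specific to two degrees of freedom; a general chi-square quantile would need a statistics library.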
4. Examples
4.1. Utility of the Proposed Two-Dimensional Index
In this section, we demonstrate the usefulness of employing several divergences to compare the degrees of deviation from DS in several datasets. We consider the two artificial datasets in Table 1. We compare the degrees of deviation from DS for Table 1a,b using the confidence region for the proposed two-dimensional index. Table 2 gives the estimated indexes and the estimated covariance matrix for Table 1a,b.
Table 1.
Two artificial datasets.
Table 2.
Estimated indexes for S and PS and the estimated covariance matrix of the estimated two-dimensional index, applied to the data in Table 1a,b.
From Figure 1, we see that the confidence regions for the two-dimensional index do not overlap for the data in Table 1a,b. We can conclude that Tables 1a and 1b have different structures in the degree of deviation from DS; that is, they have different structures with regard to the degree of deviation from S or PS. Figure 1 also shows that, for some values of $\lambda$, we can conclude that the degree of deviation from DS for Table 1a is greater than that for Table 1b, whereas for other values we cannot. We should, therefore, examine the value of the two-dimensional index using several values of $\lambda$ to compare the degrees of deviation from DS for several datasets.
Figure 1.
Approximate 95% confidence regions for the two-dimensional index applied to the data in Table 1a,b.
4.2. Example with Real Data
Consider the data in Table 3, which are taken from Andersen [14].
Table 3.
The two tables below show the three-year production and price forecasts, given by experts in July 1956, and the actual production and price figures from May 1959 for a sample of about 4000 Danish factories; from Andersen [14].
We are interested in the DS model for these data. We define, for example, the probability that the forecast and actual figures are “No change” and “Higher”, respectively, as , and the probability that they are “Lower” and “No change”, respectively, as . For Table 3, we are interested in whether the forecast accuracy changes depending on the category. When the forecast accuracy does not depend on the category, the following holds: (1) the probabilities that the forecast and actual categories are the same are equal to one another ($p_{11} = p_{33}$); (2) the probabilities that the forecast and actual categories differ by one are also equal ($p_{12} = p_{21} = p_{23} = p_{32}$); and (3) the probabilities that the forecast and actual categories differ by two are also equal ($p_{13} = p_{31}$). This probability structure is the DS model. Moreover, we are interested in whether the degree to which the forecast accuracy depends on the categories is greater for prices than for production, or vice versa. We shall compare the degrees of deviation from DS for Table 3a,b using the confidence region for the two-dimensional index. The estimates applied to the data in Table 3a,b are shown in Table 4.
Table 4.
Estimated indexes for S and PS and the estimated covariance matrix of the estimated two-dimensional index, applied to Table 3a,b.
Figure 2 shows the confidence regions of the two-dimensional index applied to the data in Table 3a,b. We see that the confidence regions do not overlap for several values of $\lambda$. Therefore, it may be concluded that Tables 3a and 3b have different structures with regard to the degree of deviation from DS, in the sense that they have different structures with regard to the degree of deviation from S. However, we cannot conclude whether the degree of deviation from DS is greater for Table 3a than for Table 3b. This is because we can conclude that the degree of deviation from DS is greater for Table 3a than for Table 3b only when the degrees of deviation from both S and PS are greater for Table 3a than for Table 3b.
Figure 2.
Approximate 95% confidence regions for the two-dimensional index, applied to the data in Table 3a,b, where .
Next, consider the data in Table 5, which are taken from Tomizawa et al. [15].
Table 5.
The two tables below show the decayed teeth of 349 female 18–39-year-old patients of a dental clinic in Sapporo City, for the period 2001–2005; from Tomizawa et al. [15].
We shall compare the degrees of deviation from DS for Table 5a,b using the confidence region for the two-dimensional index. The estimates applied to the data in Table 5a,b are shown in Table 6.
Table 6.
Estimated indexes for S and PS and the estimated covariance matrix of the estimated two-dimensional index, applied to Table 5a,b.
Figure 3 shows the confidence regions of the two-dimensional index applied to the data in Table 5a,b. We see that, for several values of $\lambda$, the confidence regions overlap along neither the horizontal nor the vertical axis. Therefore, we can conclude that the degree of deviation from DS is greater for Table 5b than for Table 5a.
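The axis-wise comparison described above can be sketched as follows. The sketch assumes that each confidence region is the delta-method ellipse with estimated covariance matrix $\hat{\Sigma}$ for $\sqrt{N}$ times the estimation error, so that its extent along axis $k$ is the estimate plus or minus $\sqrt{\chi^2 \hat{\Sigma}_{kk} / N}$, where $\chi^2$ is the upper percentile with two degrees of freedom. The function names and all numerical inputs are hypothetical and are not taken from Table 6.

```python
import math


def axis_intervals(estimate, cov, n, alpha=0.05):
    """Extent of the approximate confidence ellipse along each of the two axes."""
    quantile = -2.0 * math.log(alpha)  # upper alpha quantile of chi-square, 2 df
    intervals = []
    for k in range(2):
        half_width = math.sqrt(quantile * cov[k][k] / n)
        intervals.append((estimate[k] - half_width, estimate[k] + half_width))
    return intervals


def separated_on_both_axes(region_a, region_b):
    """True if the two regions overlap along neither axis."""
    return all(a_hi < b_lo or b_hi < a_lo
               for (a_lo, a_hi), (b_lo, b_hi) in zip(region_a, region_b))


# Hypothetical estimates and covariance matrices for two tables.
region_a = axis_intervals((0.10, 0.05), [[0.2, 0.05], [0.05, 0.1]], n=349)
region_b = axis_intervals((0.40, 0.30), [[0.3, 0.05], [0.05, 0.2]], n=349)

print(region_a)
print(region_b)
print(separated_on_both_axes(region_a, region_b))
```

When the regions are separated on both axes and one table has the larger values on each, the degrees of deviation from both S and PS are ordered the same way, which is the situation in which an ordering of the deviation from DS can be stated.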
Figure 3.
Approximate 95% confidence regions for the two-dimensional index, applied to the data in Table 5a,b, where .
5. Concluding Remarks
This study proposed a generalized two-dimensional index that concurrently measures the degrees of deviation from S and PS. Since the two indexes used to measure the degrees of deviation from S and PS are not independent, it is necessary to measure the two degrees of deviation concurrently when we measure the degree of deviation from DS. To compare degrees of deviation from DS in several datasets using the proposed two-dimensional index, we should use several values of $\lambda$ rather than a single specified value. Therefore, we recommend choosing several values of $\lambda$ (e.g., ) that correspond to well-known divergences.
The estimator of the proposed two-dimensional index is approximately unbiased when the sample size is large. When the sample size is small, however, the estimator may be biased. Through a simulation study, Tomizawa et al. [16] investigated the performance of the estimator of an existing measure of asymmetry. They showed that (1) when the sample size was less than 300, the estimator had a bias, (2) when the sample size was above 300, it had a slight bias, and (3) when the sample size was above 1000, it had almost no bias. We expect the estimator of the proposed two-dimensional index to behave similarly, although this needs to be verified by a simulation study. This concern will be investigated in future research.
Author Contributions
Conceptualization, S.A. and S.T.; methodology, S.A. and H.H.; software, S.A. and H.H.; validation, A.I.; formal analysis, S.A. and H.H.; writing—original draft preparation, S.A. and H.H.; writing—review and editing, A.I. and S.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank anonymous reviewers and the editor for their comments and suggestions to improve this paper.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Existing Index
Assuming that for all , the index , which represents the degree of deviation from S, is expressed as follows:
where
with
Appendix B. Existing Index
Assuming that for all , the index , which represents the degree of deviation from PS, is expressed as follows:
where
with
Note that and are the power divergences between the two conditional distributions, and the value at $\lambda = 0$ is taken to be the limit as $\lambda \to 0$.
References
1. Bowker, A.H. A test for symmetry in contingency tables. J. Am. Stat. Assoc. 1948, 43, 572–574.
2. Bishop, Y.M.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; The MIT Press: Cambridge, MA, USA, 1975.
3. Tan, T.K. Doubly Classified Model with R; Springer: Singapore, 2017.
4. Agresti, A. An Introduction to Categorical Data Analysis, 3rd ed.; Wiley: Hoboken, NJ, USA, 2018.
5. Wall, K.D.; Lienert, G.A. A test for point-symmetry in J-dimensional contingency-cubes. Biom. J. 1976, 18, 259–264.
6. Tomizawa, S. Double symmetry model and its decomposition in a square contingency table. J. Jpn. Stat. Soc. 1985, 15, 17–23.
7. Tomizawa, S.; Seo, T.; Yamamoto, H. Power-divergence-type measure of departure from symmetry for square contingency tables that have nominal categories. J. Appl. Stat. 1998, 25, 387–398.
8. Tomizawa, S.; Yamamoto, K.; Tahata, K. An entropy measure of departure from point-symmetry for two-way contingency tables. Symmetry Cult. Sci. 2007, 18, 279–297.
9. Yamamoto, K.; Komatsu, M.; Tomizawa, S. Measure of departure from double-symmetry for square contingency tables. J. Stat. Appl. 2010, 5, 105–118.
10. Ando, S.; Tahata, K.; Tomizawa, S. A bivariate index vector for measuring departure from double symmetry in square contingency tables. Adv. Data Anal. Classif. 2019, 13, 519–529.
11. Cressie, N.A.C.; Read, T.R.C. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B 1984, 46, 440–464.
12. Read, T.R.C.; Cressie, N.A.C. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988.
13. Agresti, A. Categorical Data Analysis, 3rd ed.; Wiley: Hoboken, NJ, USA, 2013.
14. Andersen, E.B. Introduction to the Statistical Analysis of Categorical Data; Springer: Berlin, Germany, 1997.
15. Tomizawa, S.; Miyamoto, N.; Iwamoto, M. Linear column-parameter symmetry model for square contingency tables: Application to decayed teeth data. Biom. Lett. 2006, 43, 91–98.
16. Tomizawa, S.; Miyamoto, N.; Ohba, N. Improved approximate unbiased estimators of measures of asymmetry for square contingency tables. Adv. Appl. Stat. 2007, 7, 47–63.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).