Abstract
The diagonals–parameter symmetry (DPS) model is a proposed method for analyzing square contingency tables with ordinal categories. Previously, it was stated that the generalized DPS (DPS[f]) model was equivalent to the DPS model for any function f, but the proof was not provided. This paper presents the derivation of the DPS[f] model and the proof of the relationship between the two models. The findings offer various interpretations of the DPS model. Additionally, a new model is considered, and it is shown that the proposed model and the DPS[f] model are separable.
1. Introduction
A contingency table with identical categories for rows and columns can be produced when a categorical variable is repeatedly measured. Observations in this type of table tend to concentrate on the cells along the main diagonal. Our research focuses on applying symmetry instead of assuming independence between row and column categories. Several studies have addressed symmetry issues, such as [1,2,3,4,5,6,7,8,9].
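Let X and Y represent the row and column variables for an $r \times r$ contingency table with ordinal categories. Additionally, let $\pi_{ij}$ represent the probability of an observation falling into the $(i,j)$th cell, where $i = 1, \ldots, r$ and $j = 1, \ldots, r$. The diagonals–parameter symmetry (DPS) model proposed by Goodman [10] is defined as follows.
$$\pi_{ij} = \begin{cases} \delta_{j-i}\,\psi_{ij} & (i < j), \\ \psi_{ij} & (i \geq j), \end{cases} \tag{1}$$
where $\psi_{ij} = \psi_{ji}$ and $\delta_{k} > 0$. The parameter $\delta_{k}$ in the DPS model represents the odds of an observation falling into a cell $(i,j)$ with $j - i = k$ and $i < j$, rather than the cell $(j,i)$, for $k = 1, \ldots, r-1$. Moreover, the ratio between $\pi_{ij}$ and $\pi_{ji}$ can be expressed as the constant $\delta_{k}$ for $i < j$ and $j - i = k$. This ratio depends solely on the distance from the main diagonal cells.
When $\delta_{1} = \cdots = \delta_{r-1} = 1$ in Equation (1), the DPS model reduces to the symmetry (S) model proposed by Bowker [1]. When $\delta_{j-i}$ is independent of i and j in Equation (1), that is, $\delta_{1} = \cdots = \delta_{r-1}$, the DPS model reduces to the conditional symmetry (CS) model proposed by McCullagh [11].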
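As a minimal sketch of the definition above, assume the DPS model takes the form $\pi_{ij} = \delta_{j-i}\psi_{ij}$ for $i < j$ and $\pi_{ij} = \psi_{ij}$ for $i \geq j$, with $\psi_{ij} = \psi_{ji}$. The Python snippet below builds such a table and computes the ratios $\pi_{ij}/\pi_{ji}$, which should depend only on the distance $j - i$. The function name `dps_table`, its arguments, and all numerical values are illustrative assumptions and do not come from the paper.

```python
def dps_table(psi, delta):
    """Cell probabilities under the DPS model (illustrative helper).

    psi   -- r x r symmetric matrix of positive weights (psi[i][j] == psi[j][i])
    delta -- list of r - 1 positive odds; delta[k - 1] applies at distance k = j - i
    """
    r = len(psi)
    raw = [
        [(delta[j - i - 1] if i < j else 1.0) * psi[i][j] for j in range(r)]
        for i in range(r)
    ]
    total = sum(sum(row) for row in raw)
    # Dividing every cell by the same constant only rescales psi,
    # so the normalised table still satisfies the model.
    return [[cell / total for cell in row] for row in raw]


# Hypothetical 3 x 3 example (values chosen only for illustration).
psi = [[4.0, 2.0, 1.0],
       [2.0, 5.0, 3.0],
       [1.0, 3.0, 6.0]]
pi = dps_table(psi, delta=[2.0, 0.5])

# Ratios pi_ij / pi_ji for i < j; cells at the same distance share one value.
ratios = {(i, j): pi[i][j] / pi[j][i] for i in range(3) for j in range(i + 1, 3)}

# With every delta equal to 1 the table is symmetric (the S model).
pi_s = dps_table(psi, delta=[1.0, 1.0])
```

Setting all entries of `delta` to a common value other than 1 would give a table satisfying the CS model in the same way.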
Using the f-divergence, Kateri and Papaioannou [2] proposed the generalized DPS (DPS[f]) model, defined as
$$\pi_{ij} = \begin{cases} \pi^{S}_{ij}\,F^{-1}(\Delta_{j-i} + \gamma_{ij}) & (i < j), \\ \pi^{S}_{ij}\,F^{-1}(\gamma_{ij}) & (i > j), \end{cases} \tag{2}$$
where $\pi^{S}_{ij} = (\pi_{ij} + \pi_{ji})/2$, $\gamma_{ij} = \gamma_{ji}$, and $F(x) = f'(x)$. It should be noted that the function f is twice-differentiable and strictly convex. Additionally, $f(1) = 0$, $f(0) = \lim_{x \to 0} f(x)$, $0 \cdot f(0/0) = 0$, and $0 \cdot f(y/0) = y \lim_{x \to \infty} f(x)/x$. The model derivation is not included in their paper. They did mention that the DPS model is the closest to symmetry in terms of the Kullback–Leibler distance under some conditions and that the DPS[f] model is equivalent to the DPS model. In this study, we derive the DPS[f] model and prove the relation between the two models. The result yields various interpretations of the DPS model. We also discuss a necessary and sufficient condition for the S model and a partition property of the goodness-of-fit test statistics.
The paper is organized as follows: Section 2 derives Equation (2) and interprets the model from an information theory viewpoint. The proof is given that the DPS[f] model is equivalent to the DPS model regardless of the function f. Section 3 considers a new model and proves that the proposed model and the DPS[f] model are separable. A numerical example is provided in Section 4. Finally, Section 5 summarizes the paper.
2. Properties of the DPS[f] Model
Kateri and Papaioannou [2] noted that the DPS[f] model is the closest model to the S model in terms of the f-divergence under the conditions where $\sum_{i=1}^{r-k} \pi_{i,i+k}$ (and $\sum_{i=1}^{r-k} \pi_{i+k,i}$) for $k = 1, \ldots, r-1$ and the sums $\pi_{ij} + \pi_{ji}$ for $i < j$ are given. Similar research has been conducted in, for example, Ireland et al. [12], Kateri and Agresti [3], and Tahata [5]. This section derives the DPS[f] model and describes its properties.
We obtain the following theorem; its proof is given in Appendix A.1.
Theorem 1.
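In the class of models with given $\sum_{i=1}^{r-k} \pi_{i,i+k}$, $\sum_{i=1}^{r-k} \pi_{i+k,i}$ $(k = 1, \ldots, r-1)$, and $\pi_{ij} + \pi_{ji}$ $(i < j)$, the model
$$\pi_{ij} = \begin{cases} \pi^{S}_{ij}\,F^{-1}(\Delta_{j-i} + \gamma_{ij}) & (i < j), \\ \pi^{S}_{ij}\,F^{-1}(\gamma_{ij}) & (i > j), \end{cases} \tag{3}$$
with $F = f'$ and $\gamma_{ij} = \gamma_{ji}$, is the model closest to the complete symmetry model in terms of the f-divergence.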
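Since $\pi_{ij}/\pi^{S}_{ij} = 2\pi_{ij}/(\pi_{ij} + \pi_{ji})$, the DPS[f] model can be expressed as
$$F(2\pi^{c}_{ij}) - F(2\pi^{c}_{ji}) = \Delta_{j-i} \quad (i < j), \tag{4}$$
where $\pi^{c}_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$ and $F = f'$. It should be noted that $\pi^{c}_{ij}$ represents the conditional probability of an observation falling in the $(i,j)$ cell, given that it falls in either the $(i,j)$ cell or the $(j,i)$ cell. Namely, the DPS[f] model indicates that the difference $F(2\pi^{c}_{ij}) - F(2\pi^{c}_{ji})$ depends only on the distance $j - i$ from the main diagonal.
When $\Delta_{1} = \cdots = \Delta_{r-1} = 0$, the DPS[f] model is reduced to the S model.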
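If $f(x) = x \log x$, $x > 0$, then the f-divergence is reduced to the KL divergence. When we set $f(x) = x \log x$, so that $F(x) = \log x + 1$, Equation (3) is reduced to
$$\pi_{ij} = \begin{cases} \pi^{S}_{ij} \exp(\Delta_{j-i} + \gamma_{ij} - 1) & (i < j), \\ \pi^{S}_{ij} \exp(\gamma_{ij} - 1) & (i > j), \end{cases}$$
where $\gamma_{ij} = \gamma_{ji}$. We shall refer to this model as the DPSKL model. Under the DPSKL model, the ratios of $\pi_{ij}$ and $\pi_{ji}$ for $i < j$ are expressed as
$$\frac{\pi_{ij}}{\pi_{ji}} = \exp(\Delta_{j-i}) \quad (i < j). \tag{5}$$
Since Equation (5) indicates that the ratio of $\pi_{ij}$ and $\pi_{ji}$ depends only on the distance $j - i$, the DPSKL model is equivalent to the DPS model proposed by Goodman [10]. Namely, the DPS model is the closest model to the S model in terms of the KL divergence under the conditions where $\sum_{i=1}^{r-k} \pi_{i,i+k}$, $\sum_{i=1}^{r-k} \pi_{i+k,i}$ $(k = 1, \ldots, r-1)$, and the sums $\pi_{ij} + \pi_{ji}$ for $i < j$ are given. This is a special case of Theorem 1.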
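If $f(x) = -\log x$, $x > 0$, then the f-divergence is reduced to the reverse KL divergence. Then, with $F(x) = -1/x$, the DPS[f] model is reduced to
$$\pi_{ij} = \begin{cases} -\dfrac{\pi^{S}_{ij}}{\Delta_{j-i} + \gamma_{ij}} & (i < j), \\ -\dfrac{\pi^{S}_{ij}}{\gamma_{ij}} & (i > j), \end{cases}$$
where $\gamma_{ij} = \gamma_{ji}$. We shall refer to this model as the DPSRKL model. This model is the closest to the S model when the divergence is measured by the reverse KL divergence and can be expressed as
$$\frac{1}{\pi^{c}_{ji}} - \frac{1}{\pi^{c}_{ij}} = 2\Delta_{j-i} \quad (i < j).$$
This model indicates that the difference between the inverse probabilities $1/\pi^{c}_{ij}$ and $1/\pi^{c}_{ji}$ depends only on the distance $j - i$.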
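If $f(x) = (x - 1)^{2}$, then the f-divergence is reduced to the $\chi^{2}$-divergence (Pearsonian distance). Then, with $F(x) = 2(x - 1)$, the DPS[f] model is reduced to
$$\pi_{ij} = \begin{cases} \pi^{S}_{ij}\left(1 + \dfrac{\Delta_{j-i} + \gamma_{ij}}{2}\right) & (i < j), \\ \pi^{S}_{ij}\left(1 + \dfrac{\gamma_{ij}}{2}\right) & (i > j), \end{cases}$$
where $\gamma_{ij} = \gamma_{ji}$. We shall refer to this model as the DPSP model. This model is the closest to the S model when the divergence is measured by the $\chi^{2}$-divergence and can be expressed as
$$\pi^{c}_{ij} - \pi^{c}_{ji} = \frac{\Delta_{j-i}}{4} \quad (i < j).$$
This model indicates that the difference between $\pi^{c}_{ij}$ and $\pi^{c}_{ji}$ depends only on the distance $j - i$.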
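Moreover, if $f(x) = (x^{\lambda+1} - x)/(\lambda(\lambda+1))$, $x > 0$, where $\lambda$ is a real-valued parameter, then the f-divergence is reduced to the power-divergence [13]. Then, with $F(x) = x^{\lambda}/\lambda - 1/(\lambda(\lambda+1))$, the DPS[f] model is reduced to
$$\pi_{ij} = \begin{cases} \pi^{S}_{ij}\left(\lambda(\Delta_{j-i} + \gamma_{ij}) + \dfrac{1}{\lambda+1}\right)^{1/\lambda} & (i < j), \\ \pi^{S}_{ij}\left(\lambda\gamma_{ij} + \dfrac{1}{\lambda+1}\right)^{1/\lambda} & (i > j), \end{cases}$$
where $\gamma_{ij} = \gamma_{ji}$. We shall refer to this model as the DPSPD(λ) model. This model is the closest to the S model when the divergence is measured by the power-divergence and can be expressed as
$$(2\pi^{c}_{ij})^{\lambda} - (2\pi^{c}_{ji})^{\lambda} = \lambda\Delta_{j-i} \quad (i < j).$$
This model indicates that the difference between the symmetric conditional probabilities to the power of $\lambda$ depends only on the distance $j - i$. When we apply the DPSPD(λ) model, the value of $\lambda$ must be fixed in advance.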
Kateri and Papaioannou [2] reported that the DPS[f] model is equivalent to the DPS model regardless of f. That is, all the models described above (i.e., DPSKL, DPSRKL, DPSP, and DPSPD(λ)) are, perhaps surprisingly, equivalent to the DPS model. However, the proof was not given. We prove the following theorem.
Theorem 2.
The DPS[f] model is equivalent to the DPS model regardless of f.
The proof is given in Appendix A.2. Theorem 2 states that the DPS model holds if and only if the DPS[f] model holds. Hence, if the DPS model fits a given dataset, the data can be interpreted through any of the equivalent DPS[f] formulations.
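The equivalence can be illustrated numerically. The sketch below assumes that the DPS[f] condition can be written as "$F(2\pi^{c}_{ij}) - F(2\pi^{c}_{ji})$ is constant over all cells with the same distance $j - i$", where $F = f'$ and $\pi^{c}_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$. It evaluates this difference for a hypothetical table that satisfies the DPS model, using four choices of f. The helper names and all numbers are assumptions made for illustration only.

```python
import math

# Derivatives F = f' for four choices of f (each strictly convex on x > 0).
def F_kl(x):            # f(x) = x log x
    return math.log(x) + 1.0

def F_rkl(x):           # f(x) = -log x
    return -1.0 / x

def F_pearson(x):       # f(x) = (x - 1)^2
    return 2.0 * (x - 1.0)

def F_power(lam):       # f(x) = (x^(lam + 1) - x) / (lam (lam + 1))
    return lambda x: x ** lam / lam - 1.0 / (lam * (lam + 1.0))


def spread_by_distance(pi, F):
    """Max minus min of F(2 pc_ij) - F(2 pc_ji) within each distance k = j - i."""
    r = len(pi)
    spreads = []
    for k in range(1, r):
        values = []
        for i in range(r - k):
            j = i + k
            pc_ij = pi[i][j] / (pi[i][j] + pi[j][i])
            values.append(F(2.0 * pc_ij) - F(2.0 * (1.0 - pc_ij)))
        spreads.append(max(values) - min(values))
    return spreads


# Hypothetical 4 x 4 table satisfying the DPS model with delta = (2.0, 0.5, 3.0).
# It is left unnormalised because only conditional probabilities are used.
psi = [[5.0, 2.0, 1.0, 0.5],
       [2.0, 6.0, 3.0, 1.5],
       [1.0, 3.0, 7.0, 2.5],
       [0.5, 1.5, 2.5, 8.0]]
delta = [2.0, 0.5, 3.0]
pi = [[(delta[j - i - 1] if i < j else 1.0) * psi[i][j] for j in range(4)]
      for i in range(4)]

choices = [("KL", F_kl), ("reverse KL", F_rkl),
           ("Pearson", F_pearson), ("power, lambda = 3", F_power(3.0))]
results = {name: spread_by_distance(pi, F) for name, F in choices}

# Contrast: inflating a single cell breaks the DPS structure at distance 1.
pi_bad = [row[:] for row in pi]
pi_bad[0][1] *= 1.5
bad_spread = spread_by_distance(pi_bad, F_kl)
```

For the DPS table every spread in `results` is zero up to rounding, whichever f is used, whereas `bad_spread` is clearly positive at distance 1.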
When $\Delta_{1} = \cdots = \Delta_{r-1}$, the DPS[f] model is reduced to the conditional symmetry model based on the f-divergence (CS[f]). The CS[f] model was described previously by Kateri and Papaioannou [2]. Additionally, Fujisawa and Tahata [14] proposed a generalization of the CS[f] model. Similarly, when $\delta_{1} = \cdots = \delta_{r-1}$, the DPS model is reduced to the CS model proposed by McCullagh [11]. Kateri and Papaioannou [2] stated that the CS[f] model is equivalent to the CS model regardless of f. Theorem 2 yields this statement as the following result.
Corollary 1.
The CS[f] model is equivalent to the CS model regardless of f.
3. Equivalence Conditions for Symmetry
Here, the equivalence conditions of the S model are discussed. If the S model holds, then the DPS[f] model with $\Delta_{1} = \cdots = \Delta_{r-1} = 0$ holds. Conversely, if the DPS[f] model holds, the S model does not necessarily hold. Therefore, we are interested in an additional condition under which the DPS[f] model implies the S model. Other studies have discussed such conditions; see Read [15] and Tahata et al. [16].
We consider the distance global symmetry (DGS) model defined as
$$\sum_{i=1}^{r-k} \pi_{i,i+k} = \sum_{i=1}^{r-k} \pi_{i+k,i} \quad (k = 1, \ldots, r-1).$$
For each $k$, this model indicates that the sum of the probabilities of the cells at distance $k$ above the main diagonal is equal to the sum of the probabilities of the cells at distance $k$ below the main diagonal. We obtain the following theorem. (The proof is given in Appendix A.3.)
Theorem 3.
The S model holds if and only if both the DPS[f] and DGS models hold.
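The reason the DGS model supplies exactly the missing condition can be seen in a small numerical sketch. Assuming a table that satisfies the DPS model in the form $\pi_{ij} = \delta_{j-i}\pi_{ji}$ for $i < j$, the ratio of the diagonal sums above and below the main diagonal at distance $k$ equals $\delta_{k}$, so equal sums force $\delta_{k} = 1$. The function name `diagonal_sums` and the numbers below are hypothetical.

```python
def diagonal_sums(pi):
    """Sums of the cells at distance k above and below the main diagonal."""
    r = len(pi)
    upper = [sum(pi[i][i + k] for i in range(r - k)) for k in range(1, r)]
    lower = [sum(pi[i + k][i] for i in range(r - k)) for k in range(1, r)]
    return upper, lower


# Hypothetical unnormalised 4 x 4 table satisfying the DPS model.
psi = [[5.0, 2.0, 1.0, 0.5],
       [2.0, 6.0, 3.0, 1.5],
       [1.0, 3.0, 7.0, 2.5],
       [0.5, 1.5, 2.5, 8.0]]
delta = [2.0, 0.5, 3.0]
pi = [[(delta[j - i - 1] if i < j else 1.0) * psi[i][j] for j in range(4)]
      for i in range(4)]

upper, lower = diagonal_sums(pi)
# Under the DPS model, upper[k - 1] / lower[k - 1] recovers delta_k.
odds = [u / l for u, l in zip(upper, lower)]

# The DGS model requires upper == lower for every k, which here fails
# because the chosen delta values differ from 1.
dgs_holds = all(abs(u - l) < 1e-12 for u, l in zip(upper, lower))
```

In this example `odds` reproduces `delta`, and `dgs_holds` is false; a DPS table can satisfy the DGS model only when every $\delta_{k}$ equals 1, that is, when it is symmetric.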
Next, we consider the global symmetry (GS) model, which is defined as
$$\sum\sum_{i<j} \pi_{ij} = \sum\sum_{i>j} \pi_{ij}.$$
It should be noted that the DGS model implies the GS model. Read [15] noted that the S model holds if and only if both the CS and GS models hold. Fujisawa and Tahata [14] proved that the S model holds if and only if the CS[f] and GS models hold. By Corollary 1, these two statements are equivalent. In addition, a refined estimator for measures associated with the S, CS, and GS models was introduced by Tahata et al. [17]. That result is closely connected to the decomposition of the S model and the separation of the goodness-of-fit test statistic for the S model. According to Corollary 1, the refined estimator for the measure of CS can also be used to gauge the extent of deviation from the CS[f] model.
This section proves the separation of the test statistics for the S model into those for the DPS[f] model and the DGS model. Consider a square contingency table of size $r \times r$, where $n_{ij}$ denotes the observed frequency in the $(i,j)$ cell. Assume that the table follows a multinomial distribution. Let $m_{ij}$ represent the expected frequency in the $(i,j)$ cell, and let $\hat{m}_{ij}$ be its maximum likelihood estimate under a specified model. To test each model's goodness of fit, we can employ the likelihood ratio chi-square statistic, denoted by $G^{2}$. This statistic is computed using the following formula:
$$G^{2} = 2 \sum_{i=1}^{r} \sum_{j=1}^{r} n_{ij} \log\left(\frac{n_{ij}}{\hat{m}_{ij}}\right).$$
Under the specified model, this statistic asymptotically follows a chi-square distribution with the corresponding degrees of freedom (df).
Suppose that model M3 holds if and only if both models M1 and M2 hold. In this case, if the analyst finds hypothesis M3 unacceptable, attention moves to examining the components M1 and M2. For these three models, Aitchison [18] discussed the properties of the Wald test statistics, and Darroch and Silvey [19] described the properties of the likelihood ratio chi-square statistics. Assume that the following equivalence holds:
$$T(\mathrm{M3}) = T(\mathrm{M1}) + T(\mathrm{M2}), \tag{6}$$
where T is the goodness-of-fit test statistic and the number of df for M3 is equal to the sum of the numbers of df for M1 and M2. If both M1 and M2 are accepted with a high probability (at the $\alpha$ significance level), then M3 is accepted. However, when (6) does not hold, an incompatible situation may arise in which both M1 and M2 are accepted with a high probability but M3 is rejected. In fact, Darroch and Silvey [19] gave such an example. The partitions of chi-squared test statistics are also discussed in, for example, [20,21].
From Theorem 3, the S model holds if and only if the DPS[f] model and the DGS model hold. In addition, the number of df for the DPS[f] model is $(r-1)(r-2)/2$ and that for the DGS model is $r-1$. Their sum, $r(r-1)/2$, equals the number of df for the S model. Thus, we consider partitioning the test statistics.
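Theorem 2 confirms that the DPS[f] model is equivalent to the DPS model. Therefore, the maximum likelihood estimates (MLEs) under the DPS[f] model are given by
$$\hat{m}_{ij} = \begin{cases} \dfrac{(n_{ij} + n_{ji})\, n^{U}_{j-i}}{n^{U}_{j-i} + n^{L}_{j-i}} & (i < j), \\ n_{ii} & (i = j), \\ \dfrac{(n_{ij} + n_{ji})\, n^{L}_{i-j}}{n^{U}_{i-j} + n^{L}_{i-j}} & (i > j), \end{cases} \tag{7}$$
where $n^{U}_{k} = \sum_{i=1}^{r-k} n_{i,i+k}$, $n^{L}_{k} = \sum_{i=1}^{r-k} n_{i+k,i}$, and $k = 1, \ldots, r-1$ (Goodman [10]).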
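Next, we consider the MLEs under the DGS model using the Lagrange function. Since the kernel of the log likelihood is $\sum_{i}\sum_{j} n_{ij} \log \pi_{ij}$, the Lagrange function L is written as
$$L = \sum_{i=1}^{r}\sum_{j=1}^{r} n_{ij} \log \pi_{ij} - \mu\left(\sum_{i=1}^{r}\sum_{j=1}^{r} \pi_{ij} - 1\right) - \sum_{k=1}^{r-1} \lambda_{k}\left(\sum_{i=1}^{r-k} \pi_{i,i+k} - \sum_{i=1}^{r-k} \pi_{i+k,i}\right).$$
Equating the derivatives of L with respect to $\pi_{ij}$, $\mu$, and $\lambda_{k}$ to 0 gives
$$\hat{m}_{ij} = \begin{cases} \dfrac{n_{ij}\,(n^{U}_{j-i} + n^{L}_{j-i})}{2 n^{U}_{j-i}} & (i < j), \\ n_{ii} & (i = j), \\ \dfrac{n_{ij}\,(n^{U}_{i-j} + n^{L}_{i-j})}{2 n^{L}_{i-j}} & (i > j), \end{cases} \tag{8}$$
where $n^{U}_{k} = \sum_{i=1}^{r-k} n_{i,i+k}$ and $n^{L}_{k} = \sum_{i=1}^{r-k} n_{i+k,i}$. It is important to note that the DPS and DGS models are not invariant under permutations of the row and column categories. Therefore, these models should be used only for data with ordinal categories.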
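We obtain the following equivalence from Equations (7) and (8):
$$G^{2}(\mathrm{S}) = G^{2}(\mathrm{DPS}[f]) + G^{2}(\mathrm{DGS}),$$
because the MLEs under the S model are $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$. Therefore, the DPS[f] model and the DGS model are separable and exhibit independence.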
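The partition can be checked numerically. The following sketch assumes closed-form fitted values of the following kind: under the S model each symmetric pair is averaged; under the DPS model each pair total $n_{ij} + n_{ji}$ is split in proportion to the sums of the observed counts above and below the main diagonal at the same distance; under the DGS model each count is rescaled so that those two sums become equal. The counts are invented for illustration and are not the data of Table 1; all function names are hypothetical.

```python
import math

def diag_sums(n):
    """Observed totals at distance k above (upper) and below (lower) the diagonal."""
    r = len(n)
    upper = [sum(n[i][i + k] for i in range(r - k)) for k in range(1, r)]
    lower = [sum(n[i + k][i] for i in range(r - k)) for k in range(1, r)]
    return upper, lower

def fit_s(n):
    """Fitted values under symmetry: average of each symmetric pair."""
    r = len(n)
    return [[(n[i][j] + n[j][i]) / 2.0 for j in range(r)] for i in range(r)]

def fit_dps(n):
    """Fitted values under the DPS model (pair totals split by diagonal odds)."""
    r = len(n)
    upper, lower = diag_sums(n)
    m = [[float(n[i][j]) for j in range(r)] for i in range(r)]
    for i in range(r):
        for j in range(i + 1, r):
            k = j - i
            pair = n[i][j] + n[j][i]
            share = upper[k - 1] / (upper[k - 1] + lower[k - 1])
            m[i][j] = pair * share
            m[j][i] = pair * (1.0 - share)
    return m

def fit_dgs(n):
    """Fitted values under the DGS model (diagonal totals equalised)."""
    r = len(n)
    upper, lower = diag_sums(n)
    m = [[float(n[i][j]) for j in range(r)] for i in range(r)]
    for i in range(r):
        for j in range(i + 1, r):
            k = j - i
            half = (upper[k - 1] + lower[k - 1]) / 2.0
            m[i][j] = n[i][j] * half / upper[k - 1]
            m[j][i] = n[j][i] * half / lower[k - 1]
    return m

def g2(n, m):
    """Likelihood ratio chi-square statistic; empty cells contribute zero."""
    r = len(n)
    return 2.0 * sum(n[i][j] * math.log(n[i][j] / m[i][j])
                     for i in range(r) for j in range(r) if n[i][j] > 0)

# Hypothetical 4 x 4 counts (not the data analysed in the paper).
counts = [[20, 15, 6, 2],
          [30, 40, 18, 5],
          [12, 33, 50, 14],
          [4, 10, 28, 35]]

g2_s = g2(counts, fit_s(counts))
g2_dps = g2(counts, fit_dps(counts))
g2_dgs = g2(counts, fit_dgs(counts))

r = len(counts)
df_s, df_dps, df_dgs = r * (r - 1) // 2, (r - 1) * (r - 2) // 2, r - 1
```

For these counts `g2_s` agrees with `g2_dps + g2_dgs` up to rounding, and `df_s` equals `df_dps + df_dgs`.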
Let $W(\mathrm{M})$ denote the Wald statistic for model M. We obtain the following theorem and prove it in Appendix A.4.
Theorem 4.
$W(\mathrm{S})$ is equal to the sum of $W(\mathrm{DPS}[f])$ and $W(\mathrm{DGS})$.
4. Numerical Example
Table 1, taken from Smith et al. [22], cross-classifies the responses of 871 people on how much influence religious leaders and medical leaders should have in government funding decisions on stem cell research. The influence levels are divided into four categories: (1) Great influence, (2) Some influence, (3) A little influence, and (4) No influence.
Table 1.
How much influence should religious leaders and medical leaders have in government funding for decisions on stem cell research? [22].
The values of the likelihood ratio chi-square statistics and the corresponding p values for the models applied to these data are shown in Table 2. Table 2 indicates that the sum of the test statistics for the DPS (i.e., DPS[f]) model and the DGS model is equal to that for the S model. The S model fits the data very poorly. We can infer that the marginal distribution for religious leaders is not equal to that for medical leaders. On the other hand, the DPS model fits the data very well. The likelihood-ratio test for the null hypothesis $H_{0}$: $\delta_{1} = \delta_{2} = \delta_{3} = 1$ uses a test statistic which is the difference between $G^{2}$ for the S model and $G^{2}$ for the DPS model. The resulting test statistic has three degrees of freedom and provides strong evidence that at least one $\delta_{k}$ differs from 1. Additionally, the DGS model fits the data poorly. From Theorem 3, the poor fit of the S model is attributable to the poor fit of the DGS model rather than the DPS model.
Table 2.
Likelihood ratio chi-square values for the models applied to Table 1.
The values of MLEs of in Equation (1) are . It should be noted that is equal to in the DPSKL model. Let denote the pair that the amount of influence religious leaders is ith level and that of medical leaders is jth level. When (), a pair is times as likely as a pair on condition that a pair is or . From (), the probability distribution for religious leaders is stochastically higher than the probability distribution of medical leaders. That is, the medical leaders rather than the religious leaders should have influence in government funding for decisions on stem cell research.
Moreover, from Theorem 2, we can obtain various interpretations. Since the DPS model holds, the DPSRKL, DPSP, and DPSPD(λ) models also hold. For example, we obtain
and for ,
When (), we can infer that (i) the difference between the reciprocal of conditional probability that a pair is and the reciprocal of conditional probability that a pair is is on condition that the pair is or from the DPSRKL model, (ii) the difference between the conditional probability that a pair is and the conditional probability that a pair is is under the same condition from the DPSP model, and (iii) the difference between the conditional probability that a pair is to the third power and the conditional probability that a pair is to the third power is under the same condition from the DPSPD(3) model.
5. Concluding Remarks
This paper proves that the DPS[f] model is equivalent to the DPS model proposed by Goodman [10]. This result provides various interpretations of the DPS model. The separation of the test statistic for the S model is considered. The DPS[f] and DGS models are separable and exhibit independence. Kateri and Papaioannou [2], Kateri and Agresti [3], Tahata [5] and Fujisawa and Tahata [14] considered models based on the f-divergence for the analysis of square contingency tables with ordinal categories. In the future, it should be studied whether the model based on the f-divergence is equivalent to the conventional model.
Author Contributions
Conceptualization, K.T.; methodology, K.T.; software, K.M.; validation, K.T. and K.M.; formal analysis, K.T.; investigation, K.M.; resources, K.T.; data curation, K.M.; writing – original draft preparation, K.M.; writing – review and editing, K.T.; visualization, K.M.; supervision, K.T.; project administration, K.T.; funding acquisition, K.T. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by JSPS KAKENHI (Grant Number 20K03756).
Data Availability Statement
“The General Social Survey” at https://gss.norc.org/ (accessed on 1 June 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
This section provides the proofs of theorems.
Appendix A.1
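In a similar manner to Tahata [5], we prove Theorem 1. Let $I^{C}(\boldsymbol{\pi} : \boldsymbol{\pi}^{S})$ denote the f-divergence between $\boldsymbol{\pi}$ ($= (\pi_{ij})$) and $\boldsymbol{\pi}^{S}$ ($= (\pi^{S}_{ij})$). That is,
$$I^{C}(\boldsymbol{\pi} : \boldsymbol{\pi}^{S}) = \sum_{i=1}^{r}\sum_{j=1}^{r} \pi^{S}_{ij}\, f\!\left(\frac{\pi_{ij}}{\pi^{S}_{ij}}\right), \tag{A1}$$
where f satisfies the conditions described in Section 1. Now minimize (A1) under the conditions where the restraints
$$\sum_{i=1}^{r-k} \pi_{i,i+k} = u_{k} \quad (k = 1, \ldots, r-1) \tag{A2}$$
and
$$\pi_{ij} + \pi_{ji} = t_{ij} \quad (i < j) \tag{A3}$$
are given. Under (A3), $\pi^{S}_{ij} = t_{ij}/2$ is fixed, and the sums below the main diagonal are determined by (A2) and (A3). The Lagrange function is written as
$$L = I^{C}(\boldsymbol{\pi} : \boldsymbol{\pi}^{S}) + \sum_{k=1}^{r-1} \lambda_{k}\left(\sum_{i=1}^{r-k} \pi_{i,i+k} - u_{k}\right) + \sum\sum_{i<j} \phi_{ij}\left(\pi_{ij} + \pi_{ji} - t_{ij}\right).$$
By taking the partial derivative of L with respect to $\pi_{ij}$ $(i \neq j)$ and setting it to zero, we obtain the following equation:
$$f'\!\left(\frac{\pi_{ij}}{\pi^{S}_{ij}}\right) + \lambda_{j-i} + \phi_{ij} = 0 \quad (i < j), \qquad f'\!\left(\frac{\pi_{ij}}{\pi^{S}_{ij}}\right) + \phi_{ji} = 0 \quad (i > j). \tag{A4}$$
Let $f'$ denote F, and let $\pi^{*}_{ij}$ denote the solution satisfying (A2), (A3), and (A4). Given that f is a strictly convex function, it follows that $f''(x) > 0$ for all x. Thus, F is strictly monotonic, ensuring the existence of $F^{-1}$. We represent $-\lambda_{k}$ as $\Delta_{k}$ and $-\phi_{ij}$ as $\gamma_{ij}$. From Equation (A4), we obtain
$$\pi^{*}_{ij} = \begin{cases} \pi^{S}_{ij}\,F^{-1}(\Delta_{j-i} + \gamma_{ij}) & (i < j), \\ \pi^{S}_{ij}\,F^{-1}(\gamma_{ij}) & (i > j), \end{cases}$$
where $\gamma_{ij} = \gamma_{ji}$. The minimum value of $I^{C}(\boldsymbol{\pi} : \boldsymbol{\pi}^{S})$ is obtained for $\pi^{*}_{ij}$, where $\Delta_{k}$ and $\gamma_{ij}$ are selected to ensure that $\pi^{*}_{ij}$ complies with the constraints (A2) and (A3). Thus, the DPS[f] model represents the optimal approximation to the S model in terms of the f-divergence under these specified conditions.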
Appendix A.2
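Let the function G be defined as
$$G(x) = F\!\left(\frac{2x}{x+1}\right) - F\!\left(\frac{2}{x+1}\right),$$
where $x > 0$ and $F = f'$. Then, the derivative of G is
$$G'(x) = \frac{2}{(x+1)^{2}}\left\{ f''\!\left(\frac{2x}{x+1}\right) + f''\!\left(\frac{2}{x+1}\right) \right\}.$$
Since the function f is twice-differentiable and strictly convex, $G'(x) > 0$ for $x > 0$. Hence, G is a strictly increasing function, and $G^{-1}$ exists.
If the DPS model holds, $\pi_{ij}/\pi_{ji} = \delta_{j-i}$ holds for $i < j$ from Equation (1), so that $2\pi^{c}_{ij} = 2\delta_{j-i}/(\delta_{j-i} + 1)$ and $2\pi^{c}_{ji} = 2/(\delta_{j-i} + 1)$, where $\pi^{c}_{ij} = \pi_{ij}/(\pi_{ij} + \pi_{ji})$. Then we can see that for $i < j$,
$$F(2\pi^{c}_{ij}) - F(2\pi^{c}_{ji}) = G(\delta_{j-i}).$$
This is equivalent to Equation (4) with $\Delta_{j-i} = G(\delta_{j-i})$. Namely, the DPS[f] model holds.
Conversely, if the DPS[f] model holds, Equation (4) gives $G(\pi_{ij}/\pi_{ji}) = \Delta_{j-i}$ for $i < j$. Since $G^{-1}$ exists, we obtain
$$\frac{\pi_{ij}}{\pi_{ji}} = G^{-1}(\Delta_{j-i}) \quad (i < j).$$
Namely, the DPS model holds. The proof is complete.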
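The inversion step can be illustrated with a short numerical sketch. It assumes that G has the form $G(x) = F(2x/(x+1)) - F(2/(x+1))$ with $F = f'$, and it recovers the odds $\delta$ from $\Delta = G(\delta)$ by bisection, which requires only that G be strictly increasing. The function names, the search interval, and the value $\delta = 2$ are illustrative assumptions.

```python
import math

def make_G(F):
    """G(x) = F(2x / (x + 1)) - F(2 / (x + 1)) for x > 0, with F = f'."""
    return lambda x: F(2.0 * x / (x + 1.0)) - F(2.0 / (x + 1.0))

def invert_increasing(G, target, lo=1e-9, hi=1e9, iterations=200):
    """Solve G(x) = target on (lo, hi) by bisection on a logarithmic scale.

    Uses only the fact that G is strictly increasing; the target is
    assumed to lie between G(lo) and G(hi).
    """
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if G(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Derivatives F = f' for three choices of f.
derivatives = {
    "KL": lambda x: math.log(x) + 1.0,       # f(x) = x log x
    "reverse KL": lambda x: -1.0 / x,        # f(x) = -log x
    "Pearson": lambda x: 2.0 * (x - 1.0),    # f(x) = (x - 1)^2
}

delta = 2.0
recovered = {}
for name, F in derivatives.items():
    G = make_G(F)
    big_delta = G(delta)                     # Delta implied by the odds delta
    recovered[name] = invert_increasing(G, big_delta)

# For the KL case G reduces to the logarithm, so Delta = log(delta).
kl_delta = make_G(derivatives["KL"])(delta)
```

Each entry of `recovered` returns the original odds, so the map between $\delta$ and $\Delta$ is one-to-one for each of these choices of f; in the KL case the inverse is simply the exponential function.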
Appendix A.3
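It is obvious that if the S model holds, the DPS[f] model and the DGS model simultaneously hold. Assuming that both the DPS[f] and the DGS models hold, we show that the S model holds. From Theorem 2, the DPS[f] model is equivalent to $\pi_{ij} = \delta_{j-i}\,\pi_{ji}$ for $i < j$ with $\delta_{k} > 0$. Since the DGS model holds, we obtain
$$\sum_{i=1}^{r-k} \pi_{i+k,i} = \sum_{i=1}^{r-k} \pi_{i,i+k} = \delta_{k} \sum_{i=1}^{r-k} \pi_{i+k,i} \quad (k = 1, \ldots, r-1).$$
Since $\sum_{i=1}^{r-k} \pi_{i+k,i} > 0$, we get $\delta_{k} = 1$ ($k = 1, \ldots, r-1$). Namely, the S model holds.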
Appendix A.4
Theorem 2 shows that the DPS[f] model is equivalent to the DPS model. Let
where . Then, from Equation (1), the DPS model is expressed as
where is a vector . Here, ( vector) is 1 for the hth element and 0 otherwise. For example, when ,
Additionally, () is the vector shouldering . Note that the matrix is a full column rank where .
We define the linear space spanned by the columns of the matrix as , which has dimension K. This space, , is a subspace of . Consider an matrix with full column rank, such that the linear space , spanned by the columns of , serves as the orthogonal complement of . Note that is calculated as . Given that , where denotes the zero matrix, the DPS model can be expressed as , with representing the zero vector.
Additionally, the DGS model can be expressed as where
and . Here, . Note that belongs to the space . That is, .
Let denote with replaced by , where with . From Theorem 3, the S model is equivalent to , where and . In an analogous manner to Tahata [5], we obtain that has an asymptotically normal distribution with mean and covariance matrix
where and . Here, denotes a diagonal matrix with the ith component of as the ith diagonal component. Therefore, holds, where
The Wald statistic for the DPS[f] model (i.e., ) is , that for the DGS model (i.e., ) is , and that for the S model (i.e., ) is . The proof is complete.
References
1. Bowker, A.H. A test for symmetry in contingency tables. J. Am. Stat. Assoc. 1948, 43, 572–574.
2. Kateri, M.; Papaioannou, T. Asymmetry models for contingency tables. J. Am. Stat. Assoc. 1997, 92, 1124–1131.
3. Kateri, M.; Agresti, A. A class of ordinal quasi-symmetry models for square contingency tables. Stat. Probab. Lett. 2007, 77, 598–603.
4. Tahata, K.; Tomizawa, S. Generalized linear asymmetry model and decomposition of symmetry for multiway contingency tables. J. Biom. Biostat. 2011, 2, 1–6.
5. Tahata, K. Separation of symmetry for square tables with ordinal categorical data. Jpn. J. Stat. Data Sci. 2020, 3, 469–484.
6. Tahata, K. Advances in Quasi-Symmetry for Square Contingency Tables. Symmetry 2022, 14, 1051.
7. Beh, E.J.; Lombardo, R. Visualising Departures from Symmetry and Bowker’s X2 Statistic. Symmetry 2022, 14, 1103.
8. Altun, G.; Saraçbaşı, T. Determination of model fitting with power-divergence-type measure of departure from symmetry for sparse and non-sparse square contingency tables. Commun. Stat.-Simul. Comput. 2022, 51, 4087–4111.
9. Ando, S. Generalized Sum-Asymmetry Model and Orthogonality of Test Statistic for Square Contingency Tables. Austrian J. Stat. 2024, 53, 99–108.
10. Goodman, L.A. Multiplicative Models for Square Contingency Tables with Ordered Categories. Biometrika 1979, 66, 413–418.
11. McCullagh, P. A class of parametric models for the analysis of square contingency tables with ordered categories. Biometrika 1978, 65, 413–418.
12. Ireland, C.T.; Ku, H.H.; Kullback, S. Symmetry and Marginal Homogeneity of an r × r Contingency Table. J. Am. Stat. Assoc. 1969, 64, 1323–1341.
13. Read, C.B.; Cressie, N. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988.
14. Fujisawa, K.; Tahata, K. Asymmetry model based on f-divergence and orthogonal decomposition of symmetry for square contingency tables with ordinal categories. SUT J. Math. 2020, 56, 39–53.
15. Read, C.B. Partitioning chi-square in contingency tables: A teaching approach. Commun. Stat.-Theory Methods 1977, 6, 553–562.
16. Tahata, K.; Naganawa, M.; Tomizawa, S. Extended linear asymmetry model and separation of symmetry for square contingency tables. J. Jpn. Stat. Soc. 2016, 46, 189–202.
17. Tahata, K.; Auchi, R.; Ando, S.; Tomizawa, S. Separation of the refined estimator of the measure for symmetry in square contingency tables. Commun. Stat.-Simul. Comput. 2023, 1–17.
18. Aitchison, J. Large-sample restricted parametric tests. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1962, 24, 234–250.
19. Darroch, J.N.; Silvey, S.D. On testing more than one hypothesis. Ann. Math. Stat. 1963, 34, 555–567.
20. Lang, J.B.; Agresti, A. Simultaneously modeling joint and marginal distributions of multivariate categorical responses. J. Am. Stat. Assoc. 1994, 89, 625–632.
21. Lang, J.B. On the partitioning of goodness-of-fit statistics for multivariate categorical response models. J. Am. Stat. Assoc. 1996, 91, 1017–1023.
22. Smith, T.W.; Marsden, P.; Hout, M.; Kim, J. General Social Surveys, 1972–2014 [Machine-Readable Data File]; NORC at the University of Chicago: Chicago, IL, USA, 2006.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).