Families of Generalized Quasisymmetry Models: A φ-Divergence Approach

Abstract: The quasisymmetry (QS) model for square contingency tables is revisited, highlighting properties and features on the basis of its alternative definitions. More parsimonious QS-type models, such as the ordinal QS model for ordinal classification variables and models based on association models (AMs) with homogeneous row and column scores, are discussed. All these models are linked to the local odds ratios (LOR). QS-type models and AMs were extended in the literature for generalized odds ratios other than LOR. Furthermore, in an information-theoretic context, they are expressed as distance models from a parsimonious reference model (the complete symmetry for QS and the independence for AMs), while they satisfy closeness properties with respect to Kullback–Leibler (KL) divergence. Replacing the KL by φ-divergence, flexible classes of QS-type models for LOR, AMs for LOR, and AMs for generalized odds ratios were generated. However, neither the special QS-type models that are based on homogeneous AMs for LOR nor the QS-type models for generalized odds ratios have been extended to φ-divergence-based classes so far. In this work, we develop these missing extensions and discuss QS-type models and their generalizations in depth. These flexible families enrich the modeling options, leading to models of better fit and sound interpretation, as illustrated by representative examples.


Introduction
A special type of square contingency table that occurs often in studies of correlated or repeated categorical measurements, e.g., in panels or social mobility studies, is a table with row and column classification variables measured on the same scale, which can be nominal or ordinal. Caussinus, in his pioneering work [1], introduced quasisymmetry (QS) models for such square contingency tables, focusing on their mathematical properties and their connection to the models of complete symmetry (S) and marginal homogeneity (MH). The interpretation aspects were developed later by [2][3][4], among others. The QS model applies to contingency tables that have a common nominal classification scale. An ordinal QS (OQS) model was introduced by [5].
In this work, we focus on the QS model, examining its features by considering the alternative equivalent definitions of QS in terms of cell probabilities, local odds ratios (LOR), and as a model measuring departure from the S model. Furthermore, starting from the fact that, under QS, the table of LOR is symmetric, we discuss more special parsimonious QS models that are based on the family of association models (AMs) with homogeneous row and column scores ([3,6]). In an information-theoretic setup, QS and OQS models were generalized through φ-divergence to the corresponding families of models QS(φ) and OQS(φ) ([7,8]). Associated orthogonal decomposition properties of the S model were proved for the QS(φ) and OQS(φ) models, respectively, by [7,9]. Further variants of φ-divergence QS-type models and S decomposition properties were discussed in [10][11][12]. A detailed literature review on QS models and links to other models of symmetry and asymmetry can be found in [13]. Here, we introduce a φ-divergence family of special QS models that is rooted in the φ-divergence family of AMs ([14]).
Moving from LOR to generalized odds ratios, e.g., global odds ratios (GOR), QS models were also considered for modeling the symmetry of generalized odds ratios ([15]). AMs for generalized odds ratios ([16]) were extended to a broader family through the φ-divergence ([17]). Combining these two families of models discussed in [15,17], we introduce here new flexible classes of QS models for generalized odds ratios that are based on φ-divergence.
Overall, we revisit the QS model:
i. Aiming at an in-depth discussion of its nature and properties, as consolidated by the alternative possible definitions of QS, and consideration of special QS-type models, with an emphasis on QS-type models that are based on homogeneous AMs.
ii. Reviewing extensions of QS-type models in two directions: (a) in an information-theoretic setup, by replacing KL divergence with φ-divergence, and (b) considering them for generalized odds ratios other than LOR.
iii. Proposing a new φ-divergence family of QS-type models by expanding the relation of QS to homogeneous AMs for the φ-divergence AMs.
iv. Introducing a flexible family of models by extending the models of ii(b) above in terms of the φ-divergence.
The paper is structured as follows. Section 2 reviews QS and OQS models, focusing on their structural properties, and further discusses AMs with homogeneous scores that are of the QS-type. Section 3 presents QS and OQS models for generalized odds ratios. Generalized families of QS and OQS models for LOR based on φ-divergence are briefly reviewed in Section 4. In Section 5, the QS-type models of Section 2.2 and the QS models for generalized odds ratios of Section 3 are expanded to corresponding φ-divergence-based classes of models by modeling φ-scaled generalized odds ratios. A selection of the discussed models is illustrated on two examples in Section 6. Section 7 discusses further possible models that can be investigated. Section 8 summarizes our results.

Quasisymmetry Models for Square Contingency Tables
Consider an $I \times I$ contingency table cross-classifying two categorical variables $X$ and $Y$, measured on the same scale and corresponding to the rows and columns of the table. Let $\pi = (\pi_{ij})$ be the associated probability table with cell probabilities $\pi_{ij} = P(X = i, Y = j)$, for $i, j = 1, \ldots, I$, where $\pi_{ij} \in [0, 1]$ and $\sum_{i,j} \pi_{ij} = 1$. Then, the QS model, initially introduced by Caussinus [1], is expressed in log-linear form as

$$\log \pi_{ij} = u + u_i + u_j + \alpha_j + u_{ij}, \quad i, j = 1, \ldots, I, \qquad (1)$$

with symmetric interactions, i.e., with

$$u_{ij} = u_{ji}, \quad i, j = 1, \ldots, I. \qquad (2)$$

Parameters in (1) satisfy the identifiability constraints set in (3). The degrees of freedom of (1) equal $df(QS) = (I - 1)(I - 2)/2$. The $\alpha_j$ parameters measure the departure from the model of complete symmetry S, under which $\pi_{ij} = \pi_{ji} = \pi^{S}_{ij}$, for all $i < j$. Indeed, QS reduces to S if $\alpha_j = 0$, for all $j$. Model (1) fits the diagonal entries exactly.
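The role of the $\alpha_j$ parameters can be checked numerically. The following is a minimal sketch, assuming that the log-linear predictor of (1) is a symmetric part plus $\alpha_j$, so that $\pi_{ij}$ is proportional to $s_{ij} e^{\alpha_j}$ with $s_{ij} = s_{ji}$. The weights `sym`, the values in `alpha`, and the function names are hypothetical and serve only as an illustration.

```python
import math

def qs_table(sym, alpha):
    """Return probabilities pi_ij proportional to sym[i][j] * exp(alpha[j]).

    sym is a symmetric matrix of positive weights that absorbs every
    symmetric term of the log-linear predictor (assumed form of model (1)).
    """
    n = len(alpha)
    raw = [[sym[i][j] * math.exp(alpha[j]) for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def df_qs(size):
    """Degrees of freedom of QS for a size x size table: (I - 1)(I - 2) / 2."""
    return (size - 1) * (size - 2) // 2

# Hypothetical symmetric weights and asymmetry parameters.
sym = [[5.0, 2.0, 1.0],
       [2.0, 6.0, 3.0],
       [1.0, 3.0, 4.0]]
alpha = [0.0, 0.3, -0.2]

pi_qs = qs_table(sym, alpha)            # quasisymmetric table
pi_s = qs_table(sym, [0.0, 0.0, 0.0])   # alpha_j = 0 for all j: complete symmetry

# Under the assumed form, pi_ij / pi_ji = exp(alpha_j - alpha_i).
ratio_12 = pi_qs[0][1] / pi_qs[1][0]
print(ratio_12, math.exp(alpha[1] - alpha[0]))
print(df_qs(4))
```

In this sketch, setting all entries of `alpha` to zero returns a symmetric table, in line with the reduction of QS to S.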
In terms of cell probabilities, (1) is equivalently expressed as (4), with parameters $c_i$ providing insight into sources of marginal inhomogeneity, as discussed in [7,18].
Alternatively, the QS model can be defined in terms of the LOR

$$\theta^{L}_{ij} = \frac{\pi_{ij}\, \pi_{i+1,j+1}}{\pi_{i,j+1}\, \pi_{i+1,j}}, \quad i, j = 1, \ldots, I - 1, \qquad (5)$$

i.e., the odds ratios of all $2 \times 2$ subtables formed by pairs of successive rows and successive columns. These form an $(I - 1) \times (I - 1)$ table, and QS is the model that has symmetric LOR,

$$\theta^{L}_{ij} = \theta^{L}_{ji}, \quad i \neq j, \ i, j = 1, \ldots, I - 1, \qquad (6)$$

and fits the diagonal entries of the probability table exactly, as indicated in [19]. This definition through the LOR highlights a basic structural property of QS, facilitating its physical interpretation and enabling generalizations in new directions by considering alternative types of odds ratios, as we show in Section 3.
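The LOR characterization lends itself to a direct numerical check. The sketch below computes the table of local odds ratios for a square table and tests its symmetry. The two tables are hypothetical: the first is built as symmetric weights times a column factor (a QS structure), and the second is an arbitrary table without that structure.

```python
import math

def local_odds_ratios(p):
    """(I-1) x (I-1) table of odds ratios of adjacent 2 x 2 subtables."""
    n = len(p)
    return [[p[i][j] * p[i + 1][j + 1] / (p[i][j + 1] * p[i + 1][j])
             for j in range(n - 1)] for i in range(n - 1)]

def is_symmetric(m, rel_tol=1e-9):
    """True if the square matrix m equals its transpose up to rounding."""
    n = len(m)
    return all(math.isclose(m[i][j], m[j][i], rel_tol=rel_tol)
               for i in range(n) for j in range(n))

# Hypothetical QS-type table: symmetric weights times a column factor.
sym = [[9.0, 3.0, 2.0, 1.0],
       [3.0, 8.0, 4.0, 2.0],
       [2.0, 4.0, 7.0, 3.0],
       [1.0, 2.0, 3.0, 6.0]]
col_factor = [1.0, 1.4, 0.8, 1.1]
qs_like = [[sym[i][j] * col_factor[j] for j in range(4)] for i in range(4)]

# Hypothetical table with no quasisymmetric structure.
other = [[10.0, 2.0, 3.0],
         [4.0, 10.0, 1.0],
         [1.0, 6.0, 10.0]]

print(is_symmetric(local_odds_ratios(qs_like)))  # symmetric LOR table
print(is_symmetric(local_odds_ratios(other)))    # asymmetric LOR table
```

Odds ratios are invariant to rescaling of the table, so the unnormalized weights can be used directly.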

QS Model for Ordinal Classification Variables
Usually in applications of the QS model, the classification scale is ordinal. However, QS is also applicable to tables with nominal classification variables, since it fulfils the permutation invariance property ([20]). In the case of an ordinal scale, alternative QS-type models are possible that are more parsimonious and provide insightful interpretation. Agresti [5] introduced the ordinal QS (OQS) model

$$\log \pi_{ij} = u + u_i + u_j + \beta\, j + u_{ij}, \quad i, j = 1, \ldots, I, \qquad (7)$$

with interaction parameters satisfying (2) and under (3). In other words, OQS is a special, more parsimonious QS model, derived from (1) when $\alpha_j = \beta \cdot j$. It has just one parameter more than the S model; hence, $df(OQS) = df(S) - 1 = I(I - 1)/2 - 1$. Equivalently, (7) can be expressed as

$$\pi_{ij} = \pi_{ji}\, \delta^{\,i-j}, \quad i \le j, \ i, j = 1, \ldots, I, \qquad (8)$$

with $\beta = -\log(\delta)$, or as the departure from symmetry model (9). Under the OQS model, scores are assigned to the classification categories that equal the category indices ($\mu_j = j$, $j = 1, \ldots, I$). It can be easily verified that these scores in (7) and (8) can equivalently be replaced by any equally spaced scores, i.e., any linear transformation of the $\mu_j$'s. More generally, one could consider an OQS model for known but unequally spaced scores, that is, for any set of known scores $\mu_1 \le \mu_2 \le \ldots \le \mu_I$ (with $\mu_1 < \mu_I$). This model, however, will no longer be equivalent to (7). Analogously to the QS model, OQS reduces to the S model when $\beta = 0$ (or $\delta = 1$).
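The invariance under equally spaced scores can be illustrated with a short sketch. It assumes only the relation $\pi_{ij}/\pi_{ji} = \exp\{\beta(\mu_j - \mu_i)\}$ implied by $\alpha_j = \beta \mu_j$. The value of `beta`, the score sets, and the helper name `oqs_ratio` are hypothetical.

```python
import math

def oqs_ratio(beta, scores, i, j):
    """pi_ij / pi_ji when alpha_j = beta * scores[j] (0-based indices)."""
    return math.exp(beta * (scores[j] - scores[i]))

size = 4
beta = 0.2                                             # hypothetical value
index_scores = [1, 2, 3, 4]                            # mu_j = j
shifted_scores = [10 + 2.5 * s for s in index_scores]  # still equally spaced
beta_rescaled = beta / 2.5                             # absorbs the change of scale

pairs = [(i, j) for i in range(size) for j in range(i + 1, size)]

# Equally spaced scores give the same ratios after rescaling beta.
same_model = all(
    math.isclose(oqs_ratio(beta, index_scores, i, j),
                 oqs_ratio(beta_rescaled, shifted_scores, i, j))
    for i, j in pairs)

# The ratios agree with the delta form, delta = exp(-beta).
delta = math.exp(-beta)
matches_delta_form = all(
    math.isclose(oqs_ratio(beta, index_scores, i, j), delta ** (i - j))
    for i, j in pairs)

# With unequally spaced scores, no single rescaled beta reproduces the ratios.
unequal_scores = [1, 2, 4, 7]
implied_betas = {
    round(beta * (index_scores[j] - index_scores[i])
          / (unequal_scores[j] - unequal_scores[i]), 10)
    for i, j in pairs}

print(same_model, matches_delta_form, len(implied_betas) > 1)
```

The last check mirrors the remark that an OQS model with unequally spaced scores is a different model from (7).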

Association Models with Homogeneous Scores
Log-linear models with interactions for two-way contingency tables are saturated. In the case of a square table, the saturated log-linear model is given by

$$\log \pi_{ij} = u + u^{X}_{i} + u^{Y}_{j} + u^{XY}_{ij}, \quad i, j = 1, \ldots, I. \qquad (10)$$

Association models (AMs) impose special structures on the interactions, thus leading to nonsaturated dependence models of sound interpretation. They are also known as Goodman's AMs (see [6] and references therein). For a detailed discussion of AMs and the associated literature, we refer to [21] (Chapter 6); a short presentation is provided in [22]. The simplest association model is that of uniform association (U), which is applied to tables with ordinal classification variables and models the interactions through just one parameter of intrinsic association on the basis of equidistant scores assigned to the categories of the classification variables. The U model for square contingency tables (with classification variables of common scale) can be expressed as

$$\log \pi_{ij} = u + u^{X}_{i} + u^{Y}_{j} + \zeta \mu_i \mu_j, \quad i, j = 1, \ldots, I, \qquad (11)$$

where $\zeta$ is the intrinsic association parameter and $\mu_1 < \mu_2 < \ldots < \mu_I$ are known scores assigned to the classification categories, which are homogeneous for rows and columns, and equidistant for successive categories ($\mu_{i+1} - \mu_i = \delta > 0$). Under (11), the interaction parameters $u^{XY}_{ij} = \zeta \mu_i \mu_j$ are obviously symmetric. Under the U model, all LORs are equal since

$$\log \theta^{L}_{ij} = \zeta (\mu_{i+1} - \mu_i)(\mu_{j+1} - \mu_j) = \zeta \delta^{2}, \quad i, j = 1, \ldots, I - 1, \qquad (12)$$

and thus (6) is trivially fulfilled. However, (11) is not of the QS-type since it does not fit the diagonal exactly. Its extension (13), which adds diagonal parameters through the indicator function $I(\cdot)$, is the homogeneous uniform association model with exactly fitted diagonal entries, denoted by $U_{hd}$, introduced by [3] as the uniform with main diagonal deleted model. It is a quasisymmetric model for ordinal classifications, more parsimonious than QS. It is an alternative to OQS and more parsimonious than OQS for $I > 4$.
Analogously to the OQS model, Model (13) can be considered for arbitrary known scores $\mu_1 \le \mu_2 \le \ldots \le \mu_I$ (with $\mu_1 < \mu_I$). Furthermore, considering expression (13) with unknown parametric scores, not necessarily ordered, the homogeneous RC model with exactly fitted diagonals ($RC^{d}_{h}$) is derived, which is another QS-type model, less parsimonious than $U_{hd}$, that can also be applied to nominal classification variables. For a discussion on $U_{hd}$, $RC^{d}_{h}$ and further homogeneous AMs of higher order (i.e., homogeneous RC(K) models) and their links to QS, we refer, e.g., to [21] (Section 9.4).
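The constant-LOR property of the U model, and the effect of adding diagonal parameters, can be verified with a small sketch. It assumes that the U model has separate row and column main effects plus the interaction $\zeta \mu_i \mu_j$, and that the diagonal extension adds one free parameter per diagonal cell. All numerical values and function names are hypothetical.

```python
import math

def local_odds_ratios(p):
    """(I-1) x (I-1) table of odds ratios of adjacent 2 x 2 subtables."""
    n = len(p)
    return [[p[i][j] * p[i + 1][j + 1] / (p[i][j + 1] * p[i + 1][j])
             for j in range(n - 1)] for i in range(n - 1)]

def association_table(row_eff, col_eff, scores, zeta, diag=None):
    """Probabilities with log pi_ij = row_i + col_j + zeta*mu_i*mu_j (+ diag_i if i == j)."""
    n = len(scores)
    raw = []
    for i in range(n):
        row = []
        for j in range(n):
            log_p = row_eff[i] + col_eff[j] + zeta * scores[i] * scores[j]
            if diag is not None and i == j:
                log_p += diag[i]          # extra parameter fitting the diagonal cell
            row.append(math.exp(log_p))
        raw.append(row)
    total = sum(sum(r) for r in raw)
    return [[v / total for v in r] for r in raw]

scores = [1, 2, 3, 4]                     # equidistant, delta = 1
zeta = 0.4
row_eff = [0.0, 0.2, -0.1, 0.3]
col_eff = [0.1, -0.3, 0.0, 0.2]

lor_u = local_odds_ratios(association_table(row_eff, col_eff, scores, zeta))
lor_uhd = local_odds_ratios(
    association_table(row_eff, col_eff, scores, zeta, diag=[0.5, 0.2, 0.8, 0.1]))

# U model: every local odds ratio equals exp(zeta * delta**2).
print(all(math.isclose(t, math.exp(zeta)) for r in lor_u for t in r))
# Diagonal extension: LORs are no longer all equal, but the LOR table stays symmetric.
print(all(math.isclose(lor_uhd[i][j], lor_uhd[j][i])
          for i in range(3) for j in range(3)))
```

The symmetry of the second LOR table is what places the diagonal-extended model among the QS-type models.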

Generalized QS Models
In case one or both of the classification variables are ordinal, there exist other types of odds ratios that are alternatives to LOR. In our framework of square tables with classification variables measured on the same scale, of interest are, beyond LOR, odds ratios for tables with ordinal classification scale. The most popular type for ordinal classification variables is the global odds ratios (GOR), which for an $I \times I$ contingency table are defined as

$$\theta^{G}_{ij} = \frac{P(X \le i, Y \le j)\, P(X > i, Y > j)}{P(X \le i, Y > j)\, P(X > i, Y \le j)}, \quad i, j = 1, \ldots, I - 1. \qquad (14)$$

They are termed global because every $\theta^{G}_{ij}$ is based on the whole contingency table: it dichotomizes $X$ and $Y$ at levels $i$ and $j$, respectively, and merges the cell probabilities accordingly. GOR treat both classification variables in a symmetric manner. When merging is considered only for one classification variable, for example, $Y$, while the other is treated locally, then the cumulative odds ratios (COR) are derived:

$$\theta^{C_Y}_{ij} = \frac{P(X = i, Y \le j)\, P(X = i + 1, Y > j)}{P(X = i, Y > j)\, P(X = i + 1, Y \le j)}, \quad i, j = 1, \ldots, I - 1. \qquad (15)$$

COR can be used in problems of modeling the effect of an explanatory variable on a response. In particular, $\theta^{C_Y}_{ij}$ could be considered if $Y$ is the response. Obviously, $\theta^{C_X}_{ij}$ can be analogously defined. For further types of generalized odds ratios and their detailed study, their inter-relations, and associated positive dependence properties, we refer to [23].
Motivated by the definition of QS through the symmetric LOR, the authors in [15] introduced generalized QS-type models for generalized odds ratios other than the LOR. In this context, the classical QS model that applies to the LOR is denoted by $QS_L$, while, analogously to (6), the QS property for the GOR is defined by

$$\theta^{G}_{ij} = \theta^{G}_{ji}, \quad i \neq j, \ i, j = 1, \ldots, I - 1, \qquad (16)$$

and denoted by $QS_G$. On the other hand, the definition of the $QS_C$ model for the COR also requires changing the role of the response variable, as explained in [15].
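The merging of cells behind the GOR and COR can be made concrete with a short sketch. It uses the standard definitions of these odds ratios (dichotomizing both variables for the GOR, and only the column variable for the COR with response Y). The probability tables and function names are hypothetical.

```python
def global_odds_ratios(p):
    """GOR table: X and Y are both dichotomized (X <= i vs X > i, Y <= j vs Y > j)."""
    n = len(p)

    def block(rows, cols):
        return sum(p[r][c] for r in rows for c in cols)

    out = []
    for i in range(1, n):                 # cut after the first i rows
        row = []
        for j in range(1, n):             # cut after the first j columns
            a = block(range(i), range(j))
            b = block(range(i), range(j, n))
            c = block(range(i, n), range(j))
            d = block(range(i, n), range(j, n))
            row.append(a * d / (b * c))
        out.append(row)
    return out

def cumulative_odds_ratios_y(p):
    """COR table: successive rows i, i+1 (local in X), cumulative split in Y."""
    n = len(p)
    out = []
    for i in range(n - 1):
        row = []
        for j in range(1, n):
            a, b = sum(p[i][:j]), sum(p[i][j:])
            c, d = sum(p[i + 1][:j]), sum(p[i + 1][j:])
            row.append(a * d / (b * c))
        out.append(row)
    return out

# Hypothetical symmetric 3 x 3 probability table.
p3 = [[0.20, 0.05, 0.05],
      [0.05, 0.20, 0.10],
      [0.05, 0.10, 0.20]]
gor = global_odds_ratios(p3)
cor = cumulative_odds_ratios_y(p3)
print(gor[0][1], gor[1][0])   # equal here, because the table itself is symmetric

# For a 2 x 2 table the global, cumulative and local odds ratios coincide.
p2 = [[0.4, 0.1],
      [0.2, 0.3]]
print(global_odds_ratios(p2)[0][0], cumulative_odds_ratios_y(p2)[0][0])
```

A symmetric table trivially has symmetric GOR; the $QS_G$ model requires this symmetry of the GOR without requiring symmetry of the table itself.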

φ-Divergence-Based Families of Generalized QS Models
The QS and OQS models can be defined as departure from complete symmetry models (see (4) and (9)). From a statistical information point of view, they share a common property. Both, under certain conditions (different for each model), are the closest models to complete symmetry when the distance is measured in terms of the KL divergence, as proved by [7,8] for QS and OQS, respectively. Furthermore, the authors in [7,8] introduced and studied general classes of QS and OQS models, derived by replacing the KL divergence by a family of divergences, the φ-divergence, which includes the KL as a special case.
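To make the notion of distance from complete symmetry concrete, the following sketch evaluates the KL divergence and the Cressie–Read (CR) power divergence between a table and its symmetrized version. It assumes the standard CR parameterization, $\frac{1}{\lambda(\lambda+1)} \sum p\,[(p/q)^{\lambda} - 1]$, which is taken here to correspond to the parameter λ used in this paper. The table and function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum p * log(p / q) for two probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cressie_read(p, q, lam):
    """Power divergence (standard Cressie-Read form), for lam not in {0, -1}."""
    return sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q)) / (lam * (lam + 1.0))

# Hypothetical probability table and its symmetrized counterpart.
table = [[0.20, 0.10, 0.02],
         [0.04, 0.25, 0.09],
         [0.06, 0.03, 0.21]]
n = len(table)
p = [table[i][j] for i in range(n) for j in range(n)]
q = [(table[i][j] + table[j][i]) / 2 for i in range(n) for j in range(n)]

kl = kl_divergence(p, q)
cr_near_zero = cressie_read(p, q, 1e-6)   # lam -> 0 recovers the KL divergence
cr_one = cressie_read(p, q, 1.0)          # lam = 1 gives half of Pearson's distance
pearson = sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

print(kl, cr_near_zero)
print(cr_one, 0.5 * pearson)
```

The two limiting cases shown are the ones used in the examples of this paper, λ = 0 (KL) and λ = 1 (Pearson-type).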

New Families of φ-Divergence Generalized QS Models
The QS-type models of Section 2.2 that are linked to AMs with homogeneous row and column scores can be extended to φ-divergence-based families through the φ-divergence AMs of [14]. A brief presentation of the φ-divergence AMs and the underlying concept can be found in [22]. Here, we focus just on the families corresponding to the U model with homogeneous row and column scores. In the associated φ-divergence-based family of models, $\pi_{i+}$ and $\pi_{+j}$ denote the $i$-th row and $j$-th column marginals, respectively, i.e., $\pi_{i+} = \sum_j \pi_{ij}$ and $\pi_{+j} = \sum_i \pi_{ij}$. For the KL divergence, this family leads to model (11), while for the CR divergence it takes a form that depends on the parameter $\lambda$. Hence, the special QS-type model $U_{hd}$, defined in (13), can be extended to a family of models $U^{(\phi)}_{hd}$. The model expression corresponding to the CR divergence is denoted by $U^{(\lambda)}_{hd}$. Furthermore, the QS models for generalized odds ratios, introduced in [15] and presented in Section 3, can be extended to a flexible φ-divergence family, based on the φ-divergence generalized AMs of [17], which are briefly presented below, adjusted to our set-up.
Analogously to expression (12) for the $U^{L}_{h}$ model, the φ-divergence-based $U_h$ model can alternatively be expressed in terms of quantities that are defined for $i, j = 1, \ldots, I - 1$ and are measures of local dependence, scaled through the φ-divergence and denoted by LOR$^{(\phi)}$. For $\phi(x) = x \log x$, LOR$^{(\phi)}$ is the log(LOR) modeled in (12); the corresponding $\theta$ notation of (30) is used in the sequel. Forcina and Kateri [17] provided expressions for φ-scaled generalized odds ratios and introduced and studied families of φ-divergence AMs for generalized odds ratios. Thus, for example, GOR and COR extend to GOR$^{(\phi)}$ and COR$^{(\phi)}$, given by (31) and (32), respectively, for $i, j = 1, \ldots, I - 1$. The merged probabilities in (31) and (32) are the same as defined in (14) and (15). For the KL divergence, (31) and (32) reduce to (14) and (15), while for the CR divergence, setting $F(x) = \frac{1}{\lambda} x^{\lambda}$, the corresponding expressions GOR$^{(\lambda)}$ and COR$^{(\lambda)}$ are derived. Through these φ-scaled generalized odds ratios, generalized QS models, such as $QS_G$ and $QS_C$, are extended to φ-divergence-based families of models by replacing in the definitions (16) and (17) the GOR and COR by GOR$^{(\phi)}$ and COR$^{(\phi)}$. QS-type models for other types of generalized odds ratios, introduced in [15], can be analogously extended to φ-divergence-based families.
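The idea of a φ-scaled local measure can be sketched numerically. The snippet below is an illustration under an explicit assumption, not a transcription of the paper's formulas: it takes the scaled measure to be $F(r_{ij}) + F(r_{i+1,j+1}) - F(r_{i,j+1}) - F(r_{i+1,j})$ with $r_{ij} = \pi_{ij}/(\pi_{i+}\pi_{+j})$, which is the form used in the φ-divergence AM literature as far as we can tell. With $F = \log$ it returns the log(LOR), and with $F(x) = x^{\lambda}/\lambda$ it gives a CR-type scaling. The table and all function names are hypothetical.

```python
import math

def scaled_local_measures(p, F):
    """Assumed phi-scaled local measure: F-contrast of the ratios pi_ij / (pi_i+ * pi_+j)."""
    n = len(p)
    row = [sum(r) for r in p]
    col = [sum(p[i][j] for i in range(n)) for j in range(n)]
    r = [[p[i][j] / (row[i] * col[j]) for j in range(n)] for i in range(n)]
    return [[F(r[i][j]) + F(r[i + 1][j + 1]) - F(r[i][j + 1]) - F(r[i + 1][j])
             for j in range(n - 1)] for i in range(n - 1)]

def log_lor(p):
    """Logarithms of the ordinary local odds ratios."""
    n = len(p)
    return [[math.log(p[i][j] * p[i + 1][j + 1] / (p[i][j + 1] * p[i + 1][j]))
             for j in range(n - 1)] for i in range(n - 1)]

def cr_scale(lam):
    """F(x) = x**lam / lam, the CR choice quoted in the text."""
    return lambda x: x ** lam / lam

# Hypothetical probability table.
table = [[0.20, 0.10, 0.02],
         [0.04, 0.25, 0.09],
         [0.06, 0.03, 0.21]]

kl_scaled = scaled_local_measures(table, math.log)        # marginals cancel: log(LOR)
near_kl = scaled_local_measures(table, cr_scale(1e-6))    # lam -> 0 approaches log(LOR)
pearson_scaled = scaled_local_measures(table, cr_scale(1.0))

print(kl_scaled[0][0], log_lor(table)[0][0], near_kl[0][0])
print(pearson_scaled[0][0])
```

Under this assumption, a QS-type condition for a given φ amounts to requiring the resulting $(I-1) \times (I-1)$ table of scaled measures to be symmetric.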

Examples
We first illustrate QS-type models on one of the most classical datasets of square tables, namely, the women vision data provided in Table 1. Apart from the standard QS model, which has often been fitted in the literature, the OQS model was fitted to this dataset in [5], the $QS^{L(\lambda)}$ model in [7], and the $OQS^{(\lambda)}$ model in [8], for $\lambda = 1$. Furthermore, the authors in [15] fitted QS models for generalized odds ratios other than the LOR. Our second example, provided in Table 2, cross-classifies male respondents of the 2008 General Social Survey (GSS) in the USA on the basis of their degree of pride with regard to America's economic vs. scientific and tech achievements. We applied to this dataset the same models as to the women vision data. In Table 3, we provide the likelihood ratio goodness-of-fit (GOF) test statistic values for QS-type models fitted on the LOR for both examples. Table 4 shows the GOF test statistics for the generalized QS model fitted on the GOR.

Table 2. Cross-classification of male respondents of the GSS 2008 survey by degree of pride (VP: very proud, SWP: somewhat proud, NVP: not very proud, NP: not proud at all) with regard to America's economic achievements (rows) vs. scientific and tech achievements (columns). In parentheses are the MLEs of the expected frequencies under the $QS^{G(0)}$ model.

Table 3. Goodness-of-fit test statistics (with corresponding p-values in parentheses) for the $QS^{L(\lambda)}$ and $OQS^{(\lambda)}$ models for $\lambda \in \{0, 1\}$, along with the $U_{hd}$ model, fitted on the data of Tables 1 and 2.

[Table 3 columns: Model, df, Table 1, Table 2; the entries are not recoverable from the source.]

The $OQS^{(1)}$ model, i.e., (24) with $\lambda = 1$ and $\mu_i = i$, $i = 1, \ldots, I$, takes the final form given in [8]. The MLE for $\alpha$ is $\hat{\alpha} = -0.0534 < 0$, and hence the probabilities in the lower triangle of Table 1 are estimated to be smaller than those in the upper triangle. Hence, the vision is worse for the left eye. Under this model, the odds of an observation falling in a certain subdiagonal under the main diagonal of the table (instead of the corresponding superdiagonal) are estimated as $\hat{\pi}_{ij}/\hat{\pi}_{ji} = \frac{1 - 0.0534(i - j)}{1 - 0.0534(j - i)}$, $i > j$. Notice that in [8] the corresponding $\hat{\alpha}$ value is different (= 0.119). This is due to rescaling, since a different set of $u_i$ scores is used there (the $\hat{\pi}_{ij}$'s are the same).
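The estimated odds quoted above depend only on the distance $i - j$ from the main diagonal, and can be evaluated directly. The sketch below uses the reported estimate $\hat{\alpha} = -0.0534$ and assumes $I = 4$ categories, as for the tables of this section; the function name is hypothetical.

```python
alpha_hat = -0.0534   # MLE reported in the text

def fitted_ratio(i, j, alpha=alpha_hat):
    """Estimated pi_ij / pi_ji = (1 + alpha*(i - j)) / (1 + alpha*(j - i))."""
    return (1 + alpha * (i - j)) / (1 + alpha * (j - i))

size = 4              # assumed number of categories
for d in range(1, size):
    below = fitted_ratio(1 + d, 1)   # cell d steps below the diagonal vs its mirror cell
    above = fitted_ratio(1, 1 + d)   # the reciprocal comparison
    print(d, round(below, 4), round(above, 4))
```

With a negative $\hat{\alpha}$, the ratio for cells below the diagonal is smaller than 1 and moves further from 1 as the distance from the diagonal grows.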
The situation is different for the second example, where it is clear that the KL divergence should be used for modeling the LOR (see Table 3). Furthermore, we see that for this dataset the $U_{hd}$ model, which imposes a special parsimonious structure on the interaction terms and not on the main effects (as the OQS models do), fits better. However, an impressive fit is provided by the $QS^{G}$ models (see Table 4). The best fit is for $\lambda = 0$, and thus the MLEs of the expected cell frequencies under $QS^{G(0)}$ are shown in Table 2. Hence, for this dataset, the QS property is considerably more strongly supported for the global than for the local dependencies. In our examples, we considered specific choices for the parameter $\lambda$. The analysis and interpretation of results follow analogously for other choices of $\lambda$.

Table 4. Goodness-of-fit test statistics for the $QS^{G(\lambda)}$ models for the GOR, fitted on the data of Tables 1 and 2 for $\lambda \in \{-1/2, 0, 1/3, 2/3, 1\}$. All models have $df = 3$.
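For readers who wish to reproduce statistics of the kind reported in Tables 3 and 4, the likelihood ratio statistic has the generic form $G^2 = 2 \sum n_{ij} \log(n_{ij}/\hat{m}_{ij})$. The sketch below evaluates it for the simplest reference model, complete symmetry, whose fitted values have the closed form $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$. The counts are hypothetical and are not the data of Tables 1 or 2; fitting QS-type models themselves requires iterative estimation, which is not shown.

```python
import math

def g2_symmetry(counts):
    """Likelihood ratio statistic and df for the complete symmetry model.

    Fitted values are m_ij = (n_ij + n_ji) / 2; diagonal cells fit exactly
    and therefore contribute nothing to the statistic.
    """
    n = len(counts)
    g2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and counts[i][j] > 0:
                fitted = (counts[i][j] + counts[j][i]) / 2
                g2 += 2 * counts[i][j] * math.log(counts[i][j] / fitted)
    return g2, n * (n - 1) // 2

# Hypothetical 3 x 3 table of counts.
counts = [[50, 12, 5],
          [8, 60, 14],
          [3, 10, 40]]
g2, df = g2_symmetry(counts)
print(round(g2, 3), df)
```

The p-value would follow from the chi-square distribution with `df` degrees of freedom; that step is omitted here to keep the sketch free of external libraries.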

Discussion
In future research, it would be interesting to consider more parsimonious QS-type models for generalized odds ratios, analogous to $U_{hd}$ for LOR, for example, the model of uniform GOR with homogeneous (equidistant) row and column scores, i.e., satisfying $\log(\theta^{G}_{ij}) = \zeta \delta^{2}$, $i, j = 1, \ldots, I - 1$, that additionally fits the probabilities on the main diagonal cells exactly. AMs that model interactions other than local, though they are naturally defined by the corresponding type of odds ratios, do not provide closed-form expressions for the individual cell probabilities. Hence, a definition of the associated QS-type models by expressions analogous to (13) is not possible. For the same reason, the OQS model cannot be extended to other types of odds ratios by the approach adopted here. Since it imposes a special structure on the main effects, this cannot be captured when defining models in terms of odds ratios; expressions in terms of cell probabilities are required. Recently, the authors in [17] derived expressions for such generalized AMs in terms of suitable associated marginal probabilities. These expressions include parameters for the main effects (see Forms (9) and (10) in [17]). One could generalize $U_{hd}$ and OQS for other types of odds ratios, adopting the framework of [17].

Conclusions
In this work, we revisited the QS model for square contingency tables with commensurable classification variables and discussed its possible equivalent formulations. QS is mostly expressed in terms of cell probabilities, while it can alternatively be expressed in terms of local odds ratios. Its definition as a departure model from the more parsimonious model of complete symmetry provides additional interpretation features. Furthermore, we considered the OQS model, a more parsimonious QS-type model, applicable if the classification scale is ordinal, which imposes a special structure on the main effects of the model. On the other hand, further QS-type models can be derived by considering a special structure for the interaction terms. This is possible through AMs with homogeneous row and column scores. In particular, by adding parameters to homogeneous AMs that ensure the exact fit on the diagonal entries of the contingency table, models are derived that model the off-diagonal cells and have symmetric interaction terms. Thus, they are of the QS-type, but more parsimonious than the standard QS model. All these models are related to LOR and model local dependencies of the table. Next, we presented how these models can be defined for other types of generalized odds ratios, reviewing the work of [15].
In a statistical information-theoretic setup, QS-type models and AMs satisfy properties of closeness to a specific reference model, when their divergence from the reference model is measured in terms of KL divergence. The reference model is that of symmetry (for QS and OQS models) or independence (for AMs). Replacing KL divergence with φ-divergence, generalized families of QS models, OQS models, and AMs for LOR were considered by [7,8,14], respectively. The QS model was linked to GOR in [16], while [15] introduced and implemented QS models for GOR and other types of generalized odds ratios, without, however, considering the link to divergence measures and associated properties. The possible extension of these models in terms of φ-divergence was left as a topic for further research in [15]. Here, we extended these models to φ-scaled generalized odds ratios and linked them to corresponding AMs on the basis of the results and models discussed in [17]. We demonstrated the modeling flexibility of the classes of models discussed here by implementing and discussing some of these models on two representative examples.

The data of Table 2 can be found at https://gssdataexplorer.norc.org/ (accessed on 10 October 2021).