
Regularized Latent Class Analysis for Polytomous Item Responses: An Application to SPM-LS Data

by Alexander Robitzsch 1,2

1 IPN—Leibniz Institute for Science and Mathematics Education, D-24098 Kiel, Germany
2 Centre for International Student Assessment (ZIB), D-24098 Kiel, Germany
Submission received: 9 July 2020 / Revised: 26 July 2020 / Accepted: 10 August 2020 / Published: 14 August 2020
(This article belongs to the Special Issue Analysis of an Intelligence Dataset)

Abstract

The last series of Raven's standard progressive matrices (SPM-LS) test was studied with respect to its psychometric properties in a series of recent papers. In this paper, the SPM-LS dataset is analyzed with regularized latent class models (RLCMs). For dichotomous item response data, an alternative estimation approach based on fused regularization for RLCMs is proposed. For polytomous item responses, different alternative fused regularization penalties are presented. The usefulness of the proposed methods is demonstrated in a simulated data illustration and for the SPM-LS dataset. For the SPM-LS dataset, it turned out that the regularized latent class model resulted in five partially ordered latent classes. In total, three out of five latent classes are ordered for all items. For the remaining two classes, violations for two and three items were found, respectively, which can be interpreted as a kind of latent differential item functioning.

1. Introduction

There has been recent interest in assessing the usefulness of short versions of the Raven’s Progressive Matrices. Myszkowski and Storme (2018) composed the last 12 matrices of the Standard Progressive Matrices (SPM-LS) and argued that it could be regarded as a valid indicator of general intelligence g. As part of this special issue, the SPM-LS dataset that was analyzed in Myszkowski and Storme (2018) was reanalyzed in a series of papers applying a wide range of psychometric approaches.
Previous reanalyses of the SPM-LS dataset have in common that quantitative latent variable models were utilized. In this paper, discrete latent variable models (i.e., latent class models) are applied for analyzing the SPM-LS dataset. With discrete latent variable models, the analysis of types instead of traits is the primary focus (see von Davier et al. (2012) and Borsboom et al. (2016)). A disadvantage of discrete latent variable models is that they often have a large number of parameters to estimate. For example, latent class models result in item response probabilities that are allowed to vary across classes. Even with only a few classes, the number of estimated parameters is typically larger than in parametric models with quantitative latent variables. Hence, model selection based on parsimony principles often favors quantitative latent variable models over discrete latent variable models. So-called regularization approaches automatically reduce the number of parameters to estimate (see Huang et al. (2017) or Jacobucci et al. (2016) for the use of regularization in structural equation modeling and Tutz and Schauberger (2015) or Battauz (2019) in item response modeling). In this paper, these regularization approaches are applied in discrete latent variable models, and some extensions for polytomous data are proposed.
The paper is structured as follows. In Section 2, we give a brief overview of latent class analysis. In Section 3, regularized latent class analysis for dichotomous and polytomous data is introduced. In Section 4, we apply the models proposed in Section 3 in a simulated data illustration. In Section 5, we apply regularized latent class analysis to the SPM-LS dataset. Finally, in Section 6, we conclude with a discussion.

2. Latent Class Analysis

Latent variable models represent discrete items by a number of latent variables (see Agresti and Kateri 2014 for an overview). These latent variables can be categorical or quantitative or a mixture of both. Quantitative latent variables are considered in factor analysis, structural equation models, or item response models. In this article, we focus on categorical latent variables. In this case, latent variables are labeled as latent classes and are extensively studied in the literature of latent class analysis (LCA; Collins and Lanza 2009; Langeheine and Rost 1988; Lazarsfeld and Henry 1968).
A latent class model (LCM) represents the multivariate distribution of $I$ categorical items $\mathbf{X} = (X_1, \ldots, X_I)$ by a fixed number of $C$ latent classes. Let $U$ denote the latent class variable that takes one of the values $1, 2, \ldots, C$. It is assumed that the items $X_i$ are conditionally independent given the latent class variable $U$, that is,
$$P(\mathbf{X} = \mathbf{x} \mid U = c) = \prod_{i=1}^{I} P(X_i = x_i \mid U = c) \quad \text{for } \mathbf{x} = (x_1, \ldots, x_I). \tag{1}$$
The multivariate probability distribution is then given as a mixture distribution
$$P(\mathbf{X} = \mathbf{x}) = \sum_{c=1}^{C} P(U = c) \prod_{i=1}^{I} P(X_i = x_i \mid U = c). \tag{2}$$
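To make Equations (1) and (2) concrete, the following Python sketch evaluates the probability of a dichotomous response pattern under an LCM. The function name `pattern_probability` and the list-based data layout are illustrative assumptions, not part of any software described in this article.

```python
from math import prod


def pattern_probability(x, class_probs, item_probs):
    """P(X = x) for dichotomous items under an LCM, Eqs. (1)-(2).

    x           : list of 0/1 responses, length I
    class_probs : list of class probabilities P(U = c), length C
    item_probs  : item_probs[i][c] = P(X_i = 1 | U = c)
    """
    total = 0.0
    for c, pc in enumerate(class_probs):
        # Local independence: multiply the item-wise probabilities within class c.
        cond = prod(item_probs[i][c] if xi == 1 else 1.0 - item_probs[i][c]
                    for i, xi in enumerate(x))
        # Mixture: weight the class-conditional probability by P(U = c).
        total += pc * cond
    return total
```

For two items and two equally likely classes with success probabilities 0.2 and 0.8, the pattern (1, 1) has probability $0.5 \cdot 0.2^2 + 0.5 \cdot 0.8^2 = 0.34$, and the probabilities of all four patterns sum to one.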
Applications of LCMs to intelligence tests can be found in Formann (1982) or Janssen and Geiser (2010).

2.1. Exploratory Latent Class Analysis for Dichotomous Item Responses

In this subsection, we describe the LCM for dichotomous items. Let $p_{ic} = P(X_i = 1 \mid U = c)$ denote the probability of correctly solving item $i$ for a person located in class $c$. In the estimation, these bounded parameters ($p_{ic} \in [0, 1]$) are transformed onto the real line by using the logistic transformation (see also Formann 1982)
$$P(X_i = x \mid U = c) = \frac{\exp(x \gamma_{ic})}{1 + \exp(\gamma_{ic})} \quad (x = 0, 1). \tag{3}$$
Note that $p_{ic}$ is a one-to-one function of $\gamma_{ic}$. For estimation purposes, it is sometimes more convenient to estimate models with unbounded parameters than models with bounded parameters. For $I$ items and $C$ classes, $I \cdot C$ item parameters have to be estimated in the case of dichotomous items. Compared to item response models (1PL model: one parameter per item, 2PL: two parameters per item, etc.), this results in many more parameters to be estimated. However, LCMs do not impose the assumption that classes are ordered, nor do they impose monotonicity of item response functions.
Moreover, let $p_c = P(U = c)$ denote the probability that a person is in class $c$. As for the item parameters, a logistic transformation is used to represent the class probabilities $p_c$ by parameters $\delta_c$. More formally, we set
$$p_c = \frac{\exp(\delta_c)}{1 + \sum_{j=2}^{C} \exp(\delta_j)} \quad (c = 1, \ldots, C), \tag{4}$$
where $\delta_1 = 0$. Because the probabilities sum to one, only $C - 1$ distribution parameters have to be estimated. In total, the saturated distribution of $I$ dichotomous items has $2^I - 1$ free parameters, which is represented by $I \cdot C + C - 1$ parameters in the LCM with $C$ classes.
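The transformations in Equations (3) and (4) and the parameter counts above can be sketched as follows. The function names are hypothetical, and `class_probs` expects the identification constraint $\delta_1 = 0$ in the first position.

```python
import math


def item_prob(x, gamma_ic):
    """Eq. (3): P(X_i = x | U = c) for x in {0, 1} from the logit gamma_ic."""
    return math.exp(x * gamma_ic) / (1.0 + math.exp(gamma_ic))


def class_probs(delta):
    """Eq. (4): class probabilities from delta, with delta[0] fixed at 0."""
    if delta[0] != 0:
        raise ValueError("delta_1 must be 0 for identification")
    weights = [math.exp(d) for d in delta]
    s = sum(weights)  # equals 1 + sum_{j>=2} exp(delta_j)
    return [w / s for w in weights]


def n_params_lcm(I, C):
    """Free parameters of the dichotomous LCM: I*C item plus C - 1 class parameters."""
    return I * C + C - 1


def n_params_saturated(I):
    """Free parameters of the saturated distribution of I dichotomous items."""
    return 2 ** I - 1
```

With $I = 12$ and $C = 4$, `n_params_lcm` gives 51 parameters, whereas the saturated distribution has $2^{12} - 1 = 4095$ free parameters.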
LCMs can be interpreted as pure exploratory models because no structure of item response probabilities among classes is posed. Confirmatory LCMs assume additional equality constraints on item response probabilities (Finch and Bronk 2011; Nussbeck and Eid 2015; Oberski et al. 2015; Schmiege et al. 2018). Like in confirmatory factor analysis, it could be assumed that some items load only on some classes, which translates into equal item response probabilities. Cognitive diagnostic models can be seen as particular confirmatory LCMs (von Davier and Lee 2019).
It should be emphasized that restricted LCMs form the basis of almost all latent variable models for discrete item responses that are popular nowadays. Formann (1982) suggested representing the vector $\boldsymbol{\gamma} = (\gamma_{11}, \ldots, \gamma_{1C}, \ldots, \gamma_{I1}, \ldots, \gamma_{IC})$ of item parameters as linear combinations $\gamma_{ic} = \mathbf{q}_{ic}^\top \boldsymbol{\alpha}$ ($i = 1, \ldots, I$; $c = 1, \ldots, C$) using a parameter vector $\boldsymbol{\alpha}$ and known weight vectors $\mathbf{q}_{ic}$. In addition, the distribution parameter $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_C)$ is represented by $\delta_c = \mathbf{w}_c^\top \boldsymbol{\beta}$ using a parameter vector $\boldsymbol{\beta}$ and known weight vectors $\mathbf{w}_c$. The resulting so-called structured LCM (Formann and Kohlmann 1998) includes unidimensional and multidimensional 1PL and 2PL models as special cases, as well as mixture item response models (Formann 2007). To accomplish this, continuous latent variables are approximated by a finite number of discrete latent classes. For example, a normally distributed latent variable is approximated by discrete latent classes (e.g., $C = 21$ classes) whose probabilities are represented by only two components in $\boldsymbol{\beta}$ (i.e., the mean and the standard deviation as the first two moments). The usage of discrete latent classes can be interpreted as performing numerical integration with a fixed integration grid and applying the rectangle rule. Similar generalizations of restricted LCMs were proposed by von Davier (2008, 2010). In the rest of this article, we focus on the simple exploratory LCM, although the proposed extension also applies to the more general structured latent class model.
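The approximation of a normal latent variable by $C = 21$ classes with $\delta_c = \mathbf{w}_c^\top \boldsymbol{\beta}$ can be illustrated as follows. In this sketch, we assume $\mathbf{w}_c = (\theta_c, \theta_c^2)$ for fixed grid points $\theta_c$ equally spaced on $[-4, 4]$; because the log normal density is linear in $\theta$ and $\theta^2$, two components of $\boldsymbol{\beta}$ determine the mean and the standard deviation. The grid range and the function names are our assumptions and are not taken from Formann's work.

```python
import math


def structured_class_probs(beta, grid):
    """Class probabilities with delta_c = w_c' beta and w_c = (theta_c, theta_c^2).

    Subtracting the value for the first class enforces delta_1 = 0 as in Eq. (4).
    The probabilities are proportional to exp(beta1*theta + beta2*theta^2),
    i.e., a discretized normal density when beta2 < 0 (rectangle rule).
    """
    b1, b2 = beta
    delta = [b1 * t + b2 * t * t for t in grid]
    delta = [d - delta[0] for d in delta]
    weights = [math.exp(d) for d in delta]
    total = sum(weights)
    return [w / total for w in weights]


def normal_beta(mean, sd):
    """beta that yields a discretized N(mean, sd^2) on the fixed grid."""
    return (mean / sd ** 2, -1.0 / (2.0 * sd ** 2))


# C = 21 fixed grid points on [-4, 4] (assumed range).
grid = [-4.0 + 0.4 * j for j in range(21)]
```

The resulting 21 class probabilities reproduce the intended mean and standard deviation closely, which reflects the interpretation of the discrete classes as a fixed integration grid.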
In many applications, the allocation of persons to classes should be predicted by person variables $Z$ (Collins and Lanza 2009). In more detail, class probabilities $p_c = P(U = c)$ are replaced by subject-specific conditional probabilities $P(U = c \mid Z = z)$ (so-called latent class regression). These models further ease the interpretation of latent classes.

2.2. Exploratory Latent Class Analysis for Polytomous Item Responses

Now assume that there are $I$ polytomous items and that each item $i$ has $K_i + 1$ nominal categories $0, 1, \ldots, K_i$. Item response probabilities are then given as $p_{ikc} = P(X_i = k \mid U = c)$ and are again transformed into unbounded parameters $\gamma_{ikc}$ by a logistic transformation. In more detail, it is assumed that
$$P(X_i = x \mid U = c) = \frac{\exp(\gamma_{ixc})}{1 + \sum_{k=1}^{K_i} \exp(\gamma_{ikc})} \quad (x = 0, 1, \ldots, K_i), \tag{5}$$
where $\gamma_{i0c} = 0$ for all items $i$ and all latent classes $c$. Instead of estimating $K_i + 1$ probabilities for item $i$ and class $c$, only $K_i$ free parameters $\gamma_{ikc}$ have to be estimated. If all polytomous items have $K + 1$ categories, the multidimensional contingency table of observations has $(K + 1)^I - 1$ free parameters, while $I \cdot K \cdot C + C - 1$ parameters are estimated in the LCM. It should be emphasized that the LCM for polytomous items has more free parameters than LCMs for dichotomous items and than unidimensional and multidimensional item response models for polytomous data.
As for dichotomous data, restricted LCMs were formulated that represent the vector of all item parameters by $\gamma_{ikc} = \mathbf{q}_{ikc}^\top \boldsymbol{\alpha}$ ($i = 1, \ldots, I$; $k = 1, \ldots, K_i$; $c = 1, \ldots, C$) using a parameter vector $\boldsymbol{\alpha}$ and known weight vectors $\mathbf{q}_{ikc}$ (Formann 1992). It is advisable to impose some structure on item response functions, especially for polytomous data, because otherwise many parameters have to be estimated without any structural assumptions.
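Equation (5) is a softmax with category 0 as the reference category. The following sketch (hypothetical function names) computes the category probabilities and the parameter counts given in the text.

```python
import math


def category_probs(gamma_ic):
    """Eq. (5): probabilities of categories 0..K_i for item i in class c.

    gamma_ic holds the K_i free parameters (gamma_i1c, ..., gamma_iKc);
    the reference category 0 has gamma_i0c = 0.
    """
    logits = [0.0] + list(gamma_ic)
    m = max(logits)  # subtract the maximum for numerical stability
    w = [math.exp(g - m) for g in logits]
    s = sum(w)
    return [v / s for v in w]


def n_params_poly_lcm(I, K, C):
    """LCM parameters when every item has K + 1 categories."""
    return I * K * C + C - 1


def n_params_poly_saturated(I, K):
    """Free cells of the (K + 1)^I contingency table."""
    return (K + 1) ** I - 1
```

For example, with 12 items, eight response options per item ($K = 7$ distractors), and $C = 5$ classes, the LCM has $12 \cdot 7 \cdot 5 + 4 = 424$ parameters, while the contingency table has $8^{12} - 1$ free cells.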

3. Regularized Latent Class Analysis

As LCMs are exploratory models, the interpretation of results can sometimes be challenging. Moreover, in samples that are not very large, parameter estimation becomes unstable, and findings are sometimes not generalizable across different samples. Alternatively, confirmatory latent class models could be estimated to obtain more stable and more interpretable parameter estimates. However, such confirmatory approaches need assumptions that have to be known in advance of the data analysis. Hence, alternative approaches are sought.
Regularized latent class models (RLCMs; Chen et al. 2017) estimate item response probabilities under the presupposition that similar item response probabilities are grouped and receive the same value. The main idea of applying the regularization technique (see Hastie et al. 2015 for an overview) to LCMs is that, by subtracting an appropriate penalty term from the log-likelihood function, a simpler structure is imposed on the item response probabilities. Different penalty terms typically result in different estimated parameter structures. In a recent Psychometrika paper, Chen et al. (2017) proposed the RLCM for dichotomous item responses. Related work for dichotomous data can be found in Wu (2013) and Yamamoto and Hayashi (2015).
The regularization technique has also been applied for factor models with continuous items (Huang et al. 2017; Jacobucci et al. 2016) and discrete items (Chen et al. 2018; Sun et al. 2016) in order to fit exploratory factor models with the goal of estimating as many zero loadings as possible. In this respect, regularization is a viable alternative to factor rotation methods (Scharf and Nestler 2019).
The regularization technique has also been applied to Gaussian mixture models in which cluster means are estimated to be equal for some variables among clusters (Bhattacharya and McNicholas 2014; Ruan et al. 2011). Regularized latent class analysis (RLCA) is also referred to as penalized latent class analysis (see DeSantis et al. 2008). Under this label, however, the term typically refers to LCMs that apply regularization to the estimation of the regression coefficients of the latent class regression model (DeSantis et al. 2008, 2012; Houseman et al. 2006; Leoutsakos et al. 2011; Sun et al. 2019; Wu et al. 2018). Fop and Murphy (2018) provide a recent review of applications of the regularization technique in mixture models.
In the following, we describe the RLCM at first for dichotomous items. Afterward, we consider the more complex case of polytomous items in which more possibilities for setting equality constraints among item response probabilities are present.

3.1. Regularized Latent Class Analysis for Dichotomous Item Responses

At first, we consider the case of dichotomous items $X_i$ ($i = 1, \ldots, I$). In an RLCM, not all item response probabilities $p_{ic}$ ($c = 1, \ldots, C$) are assumed to be unique. Chen et al. (2017) subtracted a penalty term from the log-likelihood function that penalizes differences in ordered item response probabilities. In more detail, denote by $p_{i,(c)}$ ($c = 1, \ldots, C$) the ordered item response probabilities obtained from the original probabilities $p_{ic}$ such that $p_{i,(1)} \le p_{i,(2)} \le \cdots \le p_{i,(C)}$, and collect all parameters of item $i$ in $\mathbf{p}_i$. Chen and colleagues then used the following penalty function for item $i$:
$$\mathrm{Pen}(\mathbf{p}_i; \lambda) = \sum_{c=2}^{C} H_{\mathrm{SCAD}}\bigl(p_{i,(c)} - p_{i,(c-1)}; \lambda\bigr), \tag{6}$$
where $H_{\mathrm{SCAD}}$ denotes the smoothly clipped absolute deviation penalty (SCAD; Fan and Li 2001). The SCAD penalty takes a value of zero if $p_{i,(c)} - p_{i,(c-1)} = 0$ and is positive otherwise (see Figure 1 for the functional form of the SCAD penalty). The parameter $\lambda$ is a regularization parameter that governs the strength of the penalty function. With small values of $\lambda$, differences are barely penalized, but with large values of $\lambda$, differences are heavily penalized, and the item response probabilities of an item approach a common value across all classes.
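A minimal Python sketch of the SCAD penalty and the ordered-difference penalty of Equation (6) follows. We use the standard SCAD form of Fan and Li (2001) with the commonly used shape constant $a = 3.7$; this value is an assumption because it is not stated here, and the function names are hypothetical (this is not the CDM implementation).

```python
def scad(x, lam, a=3.7):
    """SCAD penalty (Fan and Li 2001); a = 3.7 is an assumed default."""
    t = abs(x)
    if t <= lam:
        return lam * t                                      # LASSO-like part
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic transition
    return lam * lam * (a + 1) / 2                          # constant for large |x|


def chen_penalty(p_i, lam, a=3.7):
    """Eq. (6): SCAD on successive differences of the sorted probabilities of item i."""
    s = sorted(p_i)
    return sum(scad(s[c] - s[c - 1], lam, a) for c in range(1, len(s)))
```

For `p_i = [0.8, 0.2, 0.2]` and $\lambda = 0.1$, the sorted differences are 0 and 0.6; only the second contributes, and because $0.6 > a\lambda$ it contributes the constant $\lambda^2 (a + 1)/2 = 0.0235$.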
If $\mathbf{X}$ denotes the matrix of observed data, $\mathbf{p}$ the vector of all ordered item response probabilities, and $\boldsymbol{\delta}$ the vector that represents the latent class probabilities, the following function is maximized in Chen et al. (2017):
$$l(\mathbf{p}, \boldsymbol{\delta}; \mathbf{X}) - N \sum_{i=1}^{I} \mathrm{Pen}(\mathbf{p}_i; \lambda), \tag{7}$$
where $l$ denotes the log-likelihood function of the data and $N$ the sample size. By employing a penalty function $\mathrm{Pen}$ in the estimation, some item response probabilities are merged, which, in turn, eases the interpretation of the resulting latent classes. It should be noted that the regularization parameter $\lambda$ has to be fixed when estimating the model parameters. In practice, the regularization parameter $\lambda$ also has to be chosen from the data. Hence, the maximization is performed on a grid of $\lambda$ values (say, $\lambda = 0.01, 0.02, \ldots, 0.30$), and the model that is optimal with respect to some criterion is selected. Typical criteria are the cross-validated log-likelihood or information criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), or others (Hastie et al. 2015).
The maximization of (7) is conducted using an expectation-maximization (EM) algorithm (see Section 3.3 for a general description). The estimation approach of Chen et al. (2017) is implemented in the R package CDM (George et al. 2016; Robitzsch and George 2019).
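Selecting $\lambda$ on a grid by an information criterion can be sketched as below. We assume that, for each $\lambda$, a fitting routine has already returned the unpenalized log-likelihood and the number of free parameters; the `fits` dictionary used in the check is invented solely to exercise the function and does not contain results from this article.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC = -2 logL + 2k and BIC = -2 logL + k log N."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


def select_lambda(fits, n_obs, criterion="bic"):
    """Return the lambda with the smallest AIC or BIC.

    fits maps lambda -> (loglik, n_params), where loglik is the unpenalized
    log-likelihood of the model fitted with that lambda (hypothetical input
    produced by whatever routine fits the RLCM).
    """
    idx = 0 if criterion == "aic" else 1
    return min(fits, key=lambda lam: information_criteria(*fits[lam], n_obs)[idx])
```

With toy input such as `{0.01: (-6400.0, 51), 0.10: (-6410.0, 35), 0.30: (-6450.0, 20)}` and $N = 1000$, the BIC selects 0.30 and the AIC selects 0.10, because the BIC charges $\log N \approx 6.91$ per parameter instead of 2.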

3.1.1. Fused Regularization among Latent Classes

Though the estimation approach of Chen et al. (2017) is successful, it is not clear how it could be generalized to polytomous data because it is not evident how item response probabilities of several categories should be ordered. Hence, we propose a different estimation approach. We apply the technique of fused regularization (Tibshirani et al. 2005; Tutz and Gertheiss 2016), which penalizes all pairwise differences of item response probabilities. In more detail, for a vector $\mathbf{p}_i$ of item response probabilities, we replace the penalty of Chen et al. (2017) used in Equation (6) by
$$\mathrm{Pen}(\mathbf{p}_i; \lambda) = \sum_{c < d} H_{\mathrm{MCP}}(p_{ic} - p_{id}; \lambda), \tag{8}$$
where $p_{ic} = P(X_i = 1 \mid U = c)$ are class-specific item response probabilities, and $H_{\mathrm{MCP}}$ denotes the minimax concave penalty (MCP; Zhang 2010). We do not expect dramatic differences from the SCAD penalty, but we would expect less biased estimators than with the often employed least absolute shrinkage and selection operator (LASSO) penalty $H_{\mathrm{LASSO}}(x; \lambda) = \lambda |x|$ (see Hastie et al. 2015). By using pairwise differences in Equation (8), item response probabilities are merged into item-specific clusters of values that are equal within each cluster. Hence, the same goal as in Chen et al. (2017) is achieved. As explained in Section 2.1, our estimation approach uses the transformed item parameters $\gamma_{ic}$. Therefore, in the estimation, we replace Equation (8) by
$$\mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda) = \sum_{c < d} H_{\mathrm{MCP}}(\gamma_{ic} - \gamma_{id}; \lambda). \tag{9}$$
Note that by placing the penalty on $\boldsymbol{\gamma}_i$ in Equation (9) instead of on $\mathbf{p}_i$ in Equation (8), a different metric for quantifying differences in item parameters is introduced. On the logit scale of $\boldsymbol{\gamma}_i$, differences between extreme probabilities (i.e., probabilities near 0 or 1) appear larger than on the untransformed probability scale used in (8). For example, the probabilities 0.95 and 0.99 differ by only 0.04, but their logits differ by about 1.65, whereas 0.50 and 0.54 differ by about 0.16 on the logit scale.
In Figure 1, the LASSO, MCP, and SCAD penalty functions are depicted. It can be seen that for $x$ values near 0, the MCP and SCAD penalties behave like the LASSO penalty (i.e., $f(x) = \lambda |x|$). For sufficiently large $x$ values, the MCP and SCAD penalties reach a constant upper bound, which is not the case for the LASSO penalty. Hence, for the MCP and SCAD penalties, the penalty does not increase further for large values of $x$. This property explains why the MCP and SCAD penalties typically result in less biased estimates. It should be noted that the application of regularization presupposes some sparse structure in the data for obtaining unbiased estimates. In other words, the true data-generating mechanism should consist of a sufficiently large number of equal item parameters. If all item response probabilities were different in the data-generating model, employing a penalty that forces many item parameters to be equal to each other would conflict with the data-generating model.
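The MCP and the fused penalty of Equation (9) can be written in a few lines. The sketch assumes the standard MCP form of Zhang (2010) with shape constant $a = 3$, a common default that is an assumption here; it also defines the LASSO penalty to contrast the behavior for large differences. Function names are hypothetical.

```python
from itertools import combinations


def mcp(x, lam, a=3.0):
    """Minimax concave penalty (Zhang 2010); a = 3 is an assumed default."""
    t = abs(x)
    if t <= a * lam:
        return lam * t - t * t / (2 * a)  # close to lam*|x| for small |x|
    return a * lam * lam / 2              # constant for |x| > a*lam


def lasso(x, lam):
    """LASSO penalty lam * |x|, growing without bound."""
    return lam * abs(x)


def fused_penalty(gamma_i, lam, a=3.0):
    """Eq. (9): MCP on all pairwise differences gamma_ic - gamma_id with c < d."""
    return sum(mcp(g_c - g_d, lam, a) for g_c, g_d in combinations(gamma_i, 2))
```

Evaluating `mcp` and `lasso` at increasing $|x|$ shows the property discussed above: the MCP stays at $a\lambda^2/2$ once $|x| > a\lambda$, whereas the LASSO penalty keeps growing linearly. Classes with identical parameters add nothing to `fused_penalty`.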

3.1.2. Hierarchies in Latent Class Models

The RLCM can be used to derive a hierarchy among latent classes. The main idea is depicted in Figure 2. In an RLCM with $C = 4$ classes, a partial order of latent classes is defined. Class 1 is smaller than Classes 2, 3, and 4. Classes 2 and 3 cannot be ordered. Finally, Classes 2 and 3 are smaller than Class 4. More formally, in an RLCM, we define class $c$ to be smaller than class $d$ (or: class $d$ is larger than class $c$) if all item response probabilities in class $c$ are at most as large as in class $d$, i.e., $p_{ic} \le p_{id}$ for all items $i = 1, \ldots, I$. We use the notation $c \preceq d$ to indicate that $c$ is smaller than $d$. In a test with many items, fulfilling these inequalities for all items might be too strong a requirement. Hence, we weaken the concept of partial ordering slightly. Given a tolerance of at most $\iota$ items, we say that class $c$ is approximately smaller than class $d$ if $p_{ic} \le p_{id}$ is fulfilled for at least $I - \iota$ items.
The partial ordering of latent classes substantially eases the interpretation of the results in RLCMs. Chen et al. (2017) used the RLCM to derive partially ordered latent classes in cognitive diagnostic modeling. Wang and Lu (2020) also applied the RLCM for estimating hierarchies among latent classes (see also Robitzsch and George 2019). Using the RLCM with an analysis of hierarchies may be considered a preliminary step preceding confirmatory approaches to latent class modeling.
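The (approximate) partial order can be checked directly from a matrix of estimated item response probabilities. The sketch below uses 0-based class indices and a hypothetical argument `tol_items` for the tolerance $\iota$; the example probabilities in the note are made up to reproduce the pattern of Figure 2.

```python
def approx_smaller(p, c, d, tol_items=0):
    """True if class c is (approximately) smaller than class d.

    p[i][c] is P(X_i = 1 | U = c). Class c is smaller than d if p[i][c] <= p[i][d]
    holds for at least I - tol_items items (tol_items plays the role of iota).
    """
    n_ok = sum(1 for row in p if row[c] <= row[d])
    return n_ok >= len(p) - tol_items


def partial_order(p, tol_items=0):
    """All ordered pairs (c, d), c != d, such that c is approximately smaller than d."""
    C = len(p[0])
    return [(c, d) for c in range(C) for d in range(C)
            if c != d and approx_smaller(p, c, d, tol_items)]
```

For `p = [[0.1, 0.5, 0.3, 0.9], [0.1, 0.3, 0.5, 0.9]]`, `partial_order(p)` returns `[(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]`: the first class lies below all others, the two middle classes are not ordered, and both lie below the last class. With `tol_items=1`, the two middle classes become approximately ordered in both directions.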

3.2. Regularized Latent Class Analysis for Polytomous Item Responses

In the following, we propose an extension of the RLCM for polytomous item responses. It has been shown that using information from item distractors (Myszkowski and Storme 2018; Storme et al. 2019) could increase the reliability of person ability estimates compared to using only dichotomous item responses, which distinguish only between correct and incorrect responses. Moreover, it could be beneficial to learn about the differential behavior of item distractors by analyzing the data based on the correct response and all incorrect responses.
Assume that 0 denotes the category that refers to a correct response and $1, \ldots, K_i$ refer to the categories of the distractors. In our parameterization of the LCM for polytomous data (see Section 2.2), only the parameters $\gamma_{ikc}$ of the distractors $k$ of item $i$ in class $c$ are freely estimated. Given the relatively small sample size of the SPM-LS application data (i.e., $N = 499$), the number of estimated parameters in an unrestricted LCM turns out to be quite large because there are seven distractors per item. Moreover, it could be supposed that the distractors of an item behave similarly. Hence, it would make sense to estimate some item parameters to be equal to each other.
We now outline alternatives for structural assumptions on item response probabilities. Let us fix item $i$. For $K_i + 1$ categories and $C$ classes, $K_i \cdot C$ item parameters are modeled (omitting category 0). Hence, we can distinguish between different strategies for setting equalities of item parameters. First, for a fixed category $k$, one can merge some item response probabilities among classes. This means that some of the differences $\gamma_{ikc} - \gamma_{ikd}$ ($c \ne d$) are zero. Hence, a penalty on the differences $\gamma_{ikc} - \gamma_{ikd}$ has to be imposed. This is the same penalty as for dichotomous items (see Equation (8)), but the regularization is applied to $K_i$ categories instead of one category. Second, for a fixed class $c$, some item response probabilities among categories could be merged. In this case, one would impose a penalty on the differences $\gamma_{ikc} - \gamma_{ihc}$ ($k \ne h$). Third, penalization among classes and among categories can be applied simultaneously. In the remainder, we outline the different strategies in more detail.

3.2.1. Fused Regularization among Latent Classes

Let $\boldsymbol{\gamma}_{ik} = (\gamma_{ik1}, \ldots, \gamma_{ikC})$ denote the vector of item parameters for item $i$ in category $k$. Again, let $\boldsymbol{\gamma}_i$ denote the vector of all item parameters of item $i$. For a regularization parameter $\lambda_1$ and item $i$, we define the penalty
$$\mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda_1) = \sum_{k=1}^{K_i} \mathrm{Pen}(\boldsymbol{\gamma}_{ik}; \lambda_1) = \sum_{k=1}^{K_i} \sum_{c < d} H_{\mathrm{MCP}}(\gamma_{ikc} - \gamma_{ikd}; \lambda_1). \tag{10}$$
As a result, for each category, some item response probabilities will be merged across latent classes. However, the merging of item parameters (also referred to as fusing; Tibshirani et al. 2005) is applied independently to all categories of an item. In practice, it may not be plausible that all distractors of an item function differently, in which case the item parameters should be regularized more strongly.

3.2.2. Fused Regularization among Categories

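As a second alternative, we now merge categories. Let $\boldsymbol{\gamma}_{ic} = (\gamma_{i1c}, \ldots, \gamma_{iK_ic})$ denote the vector of item parameters for item $i$ in class $c$. For a regularization parameter $\lambda_2$ and item $i$, we define the penalty
$$\mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda_2) = \sum_{c=1}^{C} \mathrm{Pen}(\boldsymbol{\gamma}_{ic}; \lambda_2) = \sum_{c=1}^{C} \sum_{k < h} H_{\mathrm{MCP}}(\gamma_{ikc} - \gamma_{ihc}; \lambda_2). \tag{11}$$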
As a result, some of the item response probabilities of categories are set equal to each other. As an outcome of applying this penalty, atypical distractors could be detected. However, by using the penalty in Equation (11), no equalities among latent classes are imposed.

3.2.3. Fused Regularization among Latent Classes and Categories

The natural next step is to combine the regularization among latent classes and among categories. To do so, the penalties in Equations (10) and (11) are added. In more detail, for regularization parameters $\lambda_1$ and $\lambda_2$, we use the penalty
$$\mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda_1, \lambda_2) = \sum_{k=1}^{K_i} \sum_{c < d} H_{\mathrm{MCP}}(\gamma_{ikc} - \gamma_{ikd}; \lambda_1) + \sum_{c=1}^{C} \sum_{k < h} H_{\mathrm{MCP}}(\gamma_{ikc} - \gamma_{ihc}; \lambda_2). \tag{12}$$
It can be seen that the penalty in Equation (12) depends on two regularization parameters. In the estimation, the one-dimensional grid of regularization parameters then has to be replaced by a two-dimensional grid. This substantially increases the computational demand.
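The penalties in Equations (10) to (12) for a single item can be sketched with a nested list `g[k][c]` holding $\gamma_{ikc}$, with rows for the distractors $k = 1, \ldots, K_i$ (row index 0 to $K_i - 1$) and columns for the classes. The MCP constant $a = 3$, the $\lambda$ grid, and all names are assumptions for illustration.

```python
from itertools import combinations


def mcp(x, lam, a=3.0):
    """Minimax concave penalty with an assumed shape constant a = 3."""
    t = abs(x)
    return lam * t - t * t / (2 * a) if t <= a * lam else a * lam * lam / 2


def pen_classes(g, lam1, a=3.0):
    """Eq. (10): within each category k, fuse parameters across classes c < d."""
    return sum(mcp(row[c] - row[d], lam1, a)
               for row in g for c, d in combinations(range(len(row)), 2))


def pen_categories(g, lam2, a=3.0):
    """Eq. (11): within each class c, fuse parameters across categories k < h."""
    n_classes = len(g[0])
    return sum(mcp(g[k][c] - g[h][c], lam2, a)
               for c in range(n_classes) for k, h in combinations(range(len(g)), 2))


def pen_both(g, lam1, lam2, a=3.0):
    """Eq. (12): sum of the two fused penalties."""
    return pen_classes(g, lam1, a) + pen_categories(g, lam2, a)


# Two regularization parameters require a two-dimensional grid of model fits.
lam_grid = [0.05 * j for j in range(1, 11)]
grid_2d = [(l1, l2) for l1 in lam_grid for l2 in lam_grid]
```

A grid of 10 values per regularization parameter already yields 100 model fits for Equation (12), compared with 10 fits for Equation (10) or (11) alone.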

3.2.4. Fused Group Regularization among Categories

We can now proceed to impose additional structural assumptions on item parameters. One could suppose that two distractors $k$ and $h$ of item $i$ show the same behavior. In the RLCM, this means that $\gamma_{ikc} - \gamma_{ihc} = 0$ holds for all classes $c = 1, \ldots, C$. The group regularization technique allows us to estimate all parameters in a subset of parameters to be zero (see Huang et al. 2012 for a review). A fused group regularization approach presupposes that either all differences $\gamma_{ikc} - \gamma_{ihc}$ equal zero or all differences are estimated to be different from zero (Cao et al. 2018; Liu et al. 2019). This property can be achieved by substituting a norm of the difference of the two vectors into the penalty. In more detail, one considers the penalty
$$\mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda_1) = \sum_{k < h} H_{\mathrm{MCP}}\bigl(\lVert \boldsymbol{\gamma}_{ik} - \boldsymbol{\gamma}_{ih} \rVert; \lambda_1\bigr), \tag{13}$$
where, for a vector $\mathbf{x} = (x_1, \ldots, x_p)$, the norm is defined as $\lVert \mathbf{x} \rVert = \sqrt{p} \, \sqrt{\sum_{k=1}^{p} x_k^2}$. In practice, using the penalty in Equation (13) could provide a more parsimonious estimation than the penalty defined in Equation (12). In principle, model comparisons can be carried out to decide which assumption is better represented in the data.

3.2.5. Fused Group Regularization among Classes

Alternatively, one could assume that two latent classes function the same for all categories of an item. In the RLCM, it would then hold that $\gamma_{ikc} - \gamma_{ikd} = 0$ for all categories $k = 1, \ldots, K_i$. A fused group regularization results in the property that either all item parameters of classes $c$ and $d$ are equal to each other or all are estimated to be different from each other. The following penalty is used in this case:
$$\mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda_2) = \sum_{c < d} H_{\mathrm{MCP}}\bigl(\lVert \boldsymbol{\gamma}_{ic} - \boldsymbol{\gamma}_{id} \rVert; \lambda_2\bigr). \tag{14}$$
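The fused group penalties in Equations (13) and (14) differ from Equations (10) and (11) only in that the MCP is applied to the scaled norm of a whole difference vector. A sketch using the same hypothetical `g[k][c]` layout (rows: distractors, columns: classes) and the assumed MCP constant $a = 3$:

```python
import math
from itertools import combinations


def mcp(x, lam, a=3.0):
    """Minimax concave penalty with an assumed shape constant a = 3."""
    t = abs(x)
    return lam * t - t * t / (2 * a) if t <= a * lam else a * lam * lam / 2


def scaled_norm(v):
    """||v|| = sqrt(p) * sqrt(sum_k v_k^2), the group-size scaled Euclidean norm."""
    return math.sqrt(len(v)) * math.sqrt(sum(x * x for x in v))


def pen_group_categories(g, lam, a=3.0):
    """Eq. (13): fuse whole category vectors gamma_ik (over classes), k < h."""
    return sum(mcp(scaled_norm([x - y for x, y in zip(g[k], g[h])]), lam, a)
               for k, h in combinations(range(len(g)), 2))


def pen_group_classes(g, lam, a=3.0):
    """Eq. (14): fuse whole class vectors gamma_ic (over categories), c < d."""
    cols = list(zip(*g))  # cols[c] = (gamma_i1c, ..., gamma_iKc)
    return sum(mcp(scaled_norm([x - y for x, y in zip(cols[c], cols[d])]), lam, a)
               for c, d in combinations(range(len(cols)), 2))
```

If two distractors have identical parameter vectors, their contribution to Equation (13) is zero, whereas Equation (14) still penalizes every pair of classes whose parameter vectors differ.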

3.3. Estimation

We now describe the estimation of the proposed RLCM for polytomous data. Let $\mathbf{X} = (x_{nik})$ denote the observed dataset, where $x_{nik}$ equals 1 if person $n$ ($n = 1, \ldots, N$) chooses category $k$ for item $i$ and 0 otherwise. Let $\boldsymbol{\gamma}_i$ denote the item parameters of item $i$ and $\boldsymbol{\gamma}$ the vector that contains the item parameters of all items. The vector $\boldsymbol{\delta}$ represents the latent class distribution. Furthermore, let $p_{ic}(x; \boldsymbol{\gamma}_i) = P(X_i = x \mid U = c)$ and $p_c(\boldsymbol{\delta}) = P(U = c)$.
Following Chen et al. (2017) and Sun et al. (2016), an EM algorithm is applied for estimating model parameters. The complete-data log-likelihood function is given by
$$l_{\mathrm{com}}(\boldsymbol{\gamma}, \boldsymbol{\delta}, \mathbf{U}) = \sum_{n=1}^{N} \sum_{i=1}^{I} \sum_{k=0}^{K_i} \sum_{c=1}^{C} x_{nik} u_{nc} \log p_{ic}(k; \boldsymbol{\gamma}_i) + \sum_{n=1}^{N} \sum_{c=1}^{C} u_{nc} \log p_c(\boldsymbol{\delta}), \tag{15}$$
where $\mathbf{u}_n = (u_{n1}, \ldots, u_{nC})$ is the vector of latent class indicators for person $n$, with $u_{nc} = 1$ if person $n$ is located in class $c$ and $u_{nc} = 0$ otherwise. Obviously, the true class membership is unknown, and, hence, Equation (15) cannot be used directly for maximization.
In the EM algorithm, $l_{\mathrm{com}}$ is replaced by the expected complete-data log-likelihood function, which is obtained by integrating over the posterior distribution. In more detail, the unobserved values $u_{nc}$ are replaced by their conditional expectations:
$$\tilde{u}_{nc} = E\bigl(u_{nc} \mid \mathbf{x}_n; \boldsymbol{\gamma}^{(t)}, \boldsymbol{\delta}^{(t)}\bigr) = \frac{p_c(\boldsymbol{\delta}^{(t)}) \prod_{i=1}^{I} \prod_{k=0}^{K_i} p_{ic}(k; \boldsymbol{\gamma}_i^{(t)})^{x_{nik}}}{\sum_{d=1}^{C} p_d(\boldsymbol{\delta}^{(t)}) \prod_{i=1}^{I} \prod_{k=0}^{K_i} p_{id}(k; \boldsymbol{\gamma}_i^{(t)})^{x_{nik}}} \quad (c = 1, \ldots, C), \tag{16}$$
where $\boldsymbol{\gamma}^{(t)}$ and $\boldsymbol{\delta}^{(t)}$ are the parameter estimates from a previous iteration $t$. The EM algorithm alternates between the E-step and the M-step. By replacing the unobserved values $u_{nc}$ by their expected values $\tilde{u}_{nc}$, the following Q-function is obtained, which is used for maximization in the M-step:
$$Q(\boldsymbol{\gamma}, \boldsymbol{\delta} \mid \boldsymbol{\gamma}^{(t)}, \boldsymbol{\delta}^{(t)}) = \sum_{n=1}^{N} \sum_{i=1}^{I} \sum_{k=0}^{K_i} \sum_{c=1}^{C} x_{nik} \tilde{u}_{nc} \log p_{ic}(k; \boldsymbol{\gamma}_i) + \sum_{n=1}^{N} \sum_{c=1}^{C} \tilde{u}_{nc} \log p_c(\boldsymbol{\delta}). \tag{17}$$
The penalty function is subtracted from this Q-function such that the following function is maximized for a given regularization parameter $\lambda$ in the M-step:
$$Q(\boldsymbol{\gamma}, \boldsymbol{\delta} \mid \boldsymbol{\gamma}^{(t)}, \boldsymbol{\delta}^{(t)}) - N \sum_{i=1}^{I} \mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda). \tag{18}$$
It can be seen that the item parameters $\boldsymbol{\gamma}_i$ can be obtained separately for each item $i$ in the M-step because the penalties are defined independently for each item. Hence, for each item $i$, one maximizes
$$\sum_{n=1}^{N} \sum_{k=0}^{K_i} \sum_{c=1}^{C} x_{nik} \tilde{u}_{nc} \log p_{ic}(k; \boldsymbol{\gamma}_i) - N \, \mathrm{Pen}(\boldsymbol{\gamma}_i; \lambda). \tag{19}$$
The latent class probability parameters $\boldsymbol{\delta}$ are also obtained independently of the item parameters in the M-step.
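The E-step of Equation (16) and the Q-function of Equation (17) vectorize naturally. The following NumPy sketch assumes, for simplicity, that all items have the same number of categories $K + 1$ (including the correct category 0), so that the responses fit into one one-hot array; the array layout and the function names are our assumptions and not those of the regpolca() function in sirt.

```python
import numpy as np


def e_step(x, log_p_item, log_pc):
    """Posterior class probabilities u_tilde of Eq. (16).

    x          : (N, I, K+1) one-hot responses, x[n, i, k] = 1 if person n chose k
    log_p_item : (I, K+1, C) log P(X_i = k | U = c) at the current estimates
    log_pc     : (C,) log P(U = c) at the current estimates
    """
    # Log numerator of Eq. (16): log p_c + sum_i sum_k x_nik log p_ic(k).
    log_joint = log_pc + np.einsum("nik,ikc->nc", x, log_p_item)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilize before exp
    u = np.exp(log_joint)
    return u / u.sum(axis=1, keepdims=True)  # divide by the denominator over d


def q_function(x, u, log_p_item, log_pc):
    """Expected complete-data log-likelihood of Eq. (17) for fixed u_tilde."""
    item_part = np.einsum("nc,nik,ikc->", u, x, log_p_item)
    class_part = np.sum(u * log_pc)
    return item_part + class_part
```

Working on the log scale and subtracting the row maximum before exponentiating avoids underflow when many item probabilities are multiplied in the numerator of Equation (16).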
The penalty function $\mathrm{Pen}$ turns out to be non-differentiable. Here, we use a differentiable approximation of the penalty function (Oelker and Tutz 2017; see also Battauz 2019). Because the log-likelihood function of LCMs is well known to be prone to multiple local maxima, using multiple starting values in the estimation is advised.
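The M-step can be sketched in two parts: a closed-form update of the class probabilities (the weighted maximum likelihood estimate $p_c = N^{-1} \sum_n \tilde{u}_{nc}$, because $\boldsymbol{\delta}$ is not penalized) and the penalized item objective of Equation (19) with a smoothed penalty. As the differentiable surrogate, we assume $|x| \approx \sqrt{x^2 + \varepsilon}$ with $\varepsilon = 10^{-4}$; this is one common choice and not necessarily the exact approximation of Oelker and Tutz (2017). All function names are hypothetical.

```python
import numpy as np


def class_prob_update(u):
    """Closed-form M-step for the class distribution: p_c = mean_n u_tilde[n, c].

    Returns the probabilities and delta with delta_1 = 0 as in Eq. (4).
    """
    p = u.mean(axis=0)
    return p, np.log(p / p[0])


def smooth_mcp(x, lam, a=3.0, eps=1e-4):
    """MCP evaluated at sqrt(x^2 + eps), a differentiable surrogate for |x|."""
    t = np.sqrt(np.asarray(x, dtype=float) ** 2 + eps)
    return np.where(t <= a * lam, lam * t - t * t / (2 * a), a * lam * lam / 2)


def smooth_pen_classes(gamma_i, lam):
    """Smoothed version of Eq. (10) for a (K, C) parameter array."""
    C = gamma_i.shape[1]
    return sum(float(smooth_mcp(gamma_i[:, c] - gamma_i[:, d], lam).sum())
               for c in range(C) for d in range(c + 1, C))


def item_objective(gamma_i, x_i, u, lam, pen=smooth_pen_classes):
    """Penalized M-step objective of Eq. (19) for one item.

    gamma_i : (K, C) parameters of categories 1..K; category 0 is fixed at 0
    x_i     : (N, K+1) one-hot responses to item i (column 0 = correct response)
    u       : (N, C) posterior class probabilities u_tilde from the E-step
    """
    logits = np.vstack([np.zeros((1, gamma_i.shape[1])), gamma_i])  # (K+1, C)
    m = logits.max(axis=0, keepdims=True)
    log_p = logits - (m + np.log(np.exp(logits - m).sum(axis=0, keepdims=True)))
    loglik = np.einsum("nk,nc,kc->", x_i, u, log_p)
    return loglik - x_i.shape[0] * pen(gamma_i, lam)
```

In an actual M-step, `item_objective` would be maximized over $\boldsymbol{\gamma}_i$ with a gradient-based optimizer (for example, by minimizing its negative with `scipy.optimize.minimize`), and the whole EM algorithm would be rerun from several starting values.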
The described EM algorithm is included in an experimental version of the function regpolca() in the R package sirt (Robitzsch 2020). The function is under current development for improving computational efficiency.

4. Simulated Data Illustration

Before we illustrate the application of the method to the SPM-LS dataset, we demonstrate the technique using a simulated dataset. This helps to better understand the proposed method of regularized latent class modeling under ideal conditions.

4.1. Dichotomous Item Responses

4.1.1. Data Generation

First, we consider the case of dichotomous items. To mimic the situation in the SPM-LS dataset, we also chose $I = 12$ items for simulating a dataset. Moreover, to reduce sampling uncertainty somewhat, a sample size of $N = 1000$ subjects was chosen. There were $C = 4$ latent classes with true class probabilities 0.30, 0.20, 0.10, and 0.40. In Table 1, we present the item response probabilities within each class. We only specified parameters for six items and duplicated these parameters for the remaining six items in the test. It can be seen in Table 1 that many item response probabilities were set equal to each other. Indeed, for the first four items, there are only two instead of four unique probabilities. Moreover, it is evident from Table 1 that the four classes are partially ordered. The first class has the lowest probabilities for all items and is, therefore, the smallest class in the partial order; it consists of the least proficient subjects. The fourth class has the highest probabilities, constitutes the largest class in the partial order, and contains the most proficient subjects.
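Data from a dichotomous LCM of this kind can be simulated in two steps: draw each person's class, then draw independent Bernoulli responses given the class. The true class probabilities (0.30, 0.20, 0.10, 0.40), $N = 1000$, and the design with six item profiles duplicated to twelve items follow the text, but the item response probabilities below are hypothetical placeholders and not the values of Table 1.

```python
import numpy as np


def simulate_lcm(class_probs, item_probs, n, seed=0):
    """Draw n response vectors from a dichotomous LCM.

    item_probs : (I, C) array with P(X_i = 1 | U = c)
    Returns (responses of shape (n, I) with 0/1 entries, true class labels (n,)).
    """
    rng = np.random.default_rng(seed)
    item_probs = np.asarray(item_probs)
    classes = rng.choice(len(class_probs), size=n, p=class_probs)
    probs = item_probs[:, classes].T  # (n, I): success probabilities per person
    responses = (rng.random(probs.shape) < probs).astype(int)
    return responses, classes


# Hypothetical profiles for six items (NOT the values of Table 1); the first four
# items have only two unique probabilities, and classes 2 and 3 are not ordered.
six_items = np.array([
    [0.20, 0.20, 0.20, 0.85],
    [0.15, 0.15, 0.70, 0.70],
    [0.10, 0.60, 0.10, 0.60],
    [0.25, 0.75, 0.75, 0.75],
    [0.10, 0.40, 0.55, 0.90],
    [0.05, 0.50, 0.30, 0.80],
])
item_probs = np.vstack([six_items, six_items])  # duplicated to I = 12 items
```

Keeping the true class labels makes it possible to check label switching later, because estimated classes can be matched to the generating classes.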
Model selection is carried out using the information criteria AIC and BIC. For regularized models, the number of parameters required in the computation of information criteria is determined by the number of estimated unique parameters. For example, if four item response probabilities were estimated to be equal in a model, only one parameter would be counted.
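Counting unique estimated parameters for the information criteria can be sketched as follows. Because estimates obtained with a smoothed penalty are merged only up to numerical precision, the sketch treats values within an assumed tolerance `tol` as equal; the function names are hypothetical.

```python
def count_unique(values, tol=1e-6):
    """Number of distinct values, treating values within tol as equal."""
    uniq = []
    for v in sorted(values):
        if not uniq or v - uniq[-1] > tol:
            uniq.append(v)
    return len(uniq)


def n_free_params(item_params, n_classes, tol=1e-6):
    """Unique item parameters per item plus C - 1 class-distribution parameters."""
    return sum(count_unique(row, tol) for row in item_params) + n_classes - 1
```

Four equal estimates of an item count as one parameter, matching the rule described above.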

4.1.2. Results

In the first step, we estimated exploratory latent class models with $C = 2, 3, 4, 5$, and 6 classes. The model comparison is presented in Table 2. While the decision based on the AIC was ambiguous and selected an incorrect number of classes, the BIC correctly selected the model with $C = 4$ latent classes. This observation is consistent with the literature arguing that model selection in LCMs should be based on the BIC instead of the AIC (Collins and Lanza 2009; Keribin 2000).
In the solution with four classes, estimated class probabilities were 0.290, 0.204, 0.120, and 0.386, respectively, which closely resembled the data generating values. In Table 3, estimated item response probabilities are shown. The estimates were very similar to the data generating parameters that are presented in Table 1. It can be seen that some deviations from the simulated equality constraints are obtained. It is important to emphasize that latent class solutions are not invariant with respect to their class labels (so-called label switching). Class labels in the estimated model have to be permuted in order to match the class label in the simulated data.
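One simple way to perform this relabeling is to try all C! permutations of the estimated classes and keep the one whose item response probabilities are closest to the data-generating ones. The sketch below assumes a squared-error criterion; the function name `align_classes` and the toy numbers are hypothetical. For larger C, the same matching can be solved with the Hungarian algorithm (e.g., SciPy's `linear_sum_assignment`) instead of brute force.

```python
from itertools import permutations
import numpy as np

def align_classes(est_probs, true_probs):
    """Find the column permutation of est_probs (I x C) that best matches
    true_probs (I x C) in terms of the sum of squared differences.
    Returns perm such that est_probs[:, perm] is aligned with true_probs."""
    est = np.asarray(est_probs, dtype=float)
    true = np.asarray(true_probs, dtype=float)
    best_perm, best_loss = None, np.inf
    for perm in permutations(range(true.shape[1])):  # C! candidate labelings
        loss = np.sum((est[:, list(perm)] - true) ** 2)
        if loss < best_loss:
            best_perm, best_loss = list(perm), loss
    return best_perm

# Toy example (hypothetical numbers): the estimated classes come out in a
# different order and with a small estimation error.
true_p = np.array([[0.10, 0.50, 0.50, 0.90],
                   [0.20, 0.20, 0.60, 0.80]])
est_p = true_p[:, [3, 0, 2, 1]] + 0.01
perm = align_classes(est_p, true_p)  # [1, 3, 2, 0]
```

The same permutation must also be applied to the vector of estimated class probabilities.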
Finally, we estimated the RLCM with C = 4 classes for regularization parameters λ from 0.01 to 1.00 in steps of 0.01 in order to obtain the best-fitting solution. The regularization parameter λ = 0.21 provided the best-fitting model in terms of the BIC (BIC = 13,104, AIC = 12,957). Notably, this model fitted substantially better than the exploratory LCM with four classes because of the more parsimonious estimation of item parameters. In the model with λ = 0.21, a total of 21 item parameters were regularized (i.e., they were set equal to item parameters in other classes for the respective item), resulting in 30 freely estimated model parameters. With respect to the AIC, the best-fitting model was obtained for λ = 0.15 (BIC = 13,110, AIC = 12,953). This model resulted in 19 regularized item parameters and 32 freely estimated model parameters. The model selected by the minimal BIC (λ = 0.21) resulted in estimated class probabilities of 0.29, 0.21, 0.11, and 0.39. The estimated item response probabilities (shown in Table 4) demonstrate that the equality constraints imposed in the data-generating model were correctly identified.
Lastly, we want to illustrate the behavior of regularization. For the sequence of regularization parameters λ used in the estimation, the estimated item response probabilities $p_{ic}(\lambda)$ can be plotted. Such a plot is referred to as a regularization path (Hastie et al. 2015). For very small λ values, no classes were merged, and all item parameters were estimated to differ from each other. With increasing values of λ, item parameters were successively merged. For Item 1 and Classes 2, 3, and 4, the regularization path is shown in Figure 3. First, the item parameters of Classes 2 and 4 are merged at λ = 0.04. Afterwards, all three item response probabilities are merged at λ = 0.09.
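The grid search over λ amounts to fitting one RLCM per grid value and keeping the fit with the smallest information criterion. The fitting routine itself is not shown; the sketch assumes each fit has been summarized as a tuple (λ, log-likelihood, number of free parameters), and `select_by_ic` is a hypothetical helper. The two example fits use log-likelihoods back-calculated from the reported AIC values (−2LL ≈ 12,897 with 30 parameters and ≈ 12,889 with 32 parameters), so they are approximations of the reported models.

```python
import math
import numpy as np

# Grid used in the text: 0.01, 0.02, ..., 1.00
lambdas = np.round(np.arange(1, 101) * 0.01, 2)

def select_by_ic(fits, n_obs):
    """fits: list of (lam, loglik, n_params). Returns (lam with min BIC, lam with min AIC)."""
    def aic(fit):
        return -2.0 * fit[1] + 2.0 * fit[2]
    def bic(fit):
        return -2.0 * fit[1] + math.log(n_obs) * fit[2]
    return min(fits, key=bic)[0], min(fits, key=aic)[0]

# Approximate summaries of the two models reported in the text (N = 1000).
fits = [(0.21, -12897 / 2, 30), (0.15, -12889 / 2, 32)]
lam_bic, lam_aic = select_by_ic(fits, n_obs=1000)  # (0.21, 0.15)
```

Because the BIC penalizes each parameter by log(1000) ≈ 6.9 instead of 2, it prefers the sparser λ = 0.21 solution, whereas the AIC prefers the slightly better-fitting λ = 0.15 solution.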
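Merge points on a regularization path can also be read off numerically: two parameters are fused from the smallest λ onward at which their estimates coincide for all larger grid values. The helper `merge_lambda` and the three paths below are invented for illustration only; they mimic the qualitative pattern described for Item 1 (Classes 2 and 4 merge at λ = 0.04, all three at λ = 0.09) and are not the estimates behind Figure 3.

```python
import numpy as np

def merge_lambda(lambdas, path_a, path_b, tol=1e-6):
    """Smallest lambda from which two estimated parameters coincide (up to tol)
    for all larger lambdas on the grid; None if they never merge."""
    equal = np.abs(np.asarray(path_a) - np.asarray(path_b)) <= tol
    idx = None
    # Walk backwards from the largest lambda while the parameters stay fused.
    for j in range(len(lambdas) - 1, -1, -1):
        if not equal[j]:
            break
        idx = j
    return None if idx is None else float(lambdas[idx])

# Invented paths for Classes 2, 3, and 4 on the grid 0.01, ..., 0.12.
lams = np.round(np.arange(1, 13) * 0.01, 2)
p2 = [0.70, 0.72, 0.74, 0.76, 0.76, 0.76, 0.76, 0.76, 0.80, 0.80, 0.80, 0.80]
p4 = [0.82, 0.80, 0.78, 0.76, 0.76, 0.76, 0.76, 0.76, 0.80, 0.80, 0.80, 0.80]
p3 = [0.90, 0.89, 0.88, 0.87, 0.86, 0.85, 0.84, 0.83, 0.80, 0.80, 0.80, 0.80]
m24 = merge_lambda(lams, p2, p4)  # 0.04
m23 = merge_lambda(lams, p2, p3)  # 0.09
```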

4.2. Polytomous Item Responses

4.2.1. Data Generation

Next, we simulate illustrative data with polytomous item responses for 12 items, each item possessing four categories (i.e., $K_i = 3$). The first category (i.e., Category 0) refers to the correct response, while Categories 1, 2, and 3 refer to the distractors of the item. As in the case of dichotomous data, the latent class probabilities were 0.30, 0.20, 0.10, and 0.40, and N = 1000 subjects were simulated. Again, we specified item parameters for the first six items and replicated them for the remaining six items. The true item response probabilities used for generating the dataset are shown in Table 5. It is evident that the item response probabilities are strongly structured. All distractors of Items 1 and 5 function in precisely the same way. For Item 2, Categories 1 and 2 show the same behavior, and Category 3 behaves differently only in Classes 2 and 4. At the other extreme, all item response probabilities of Item 6 differ among classes and categories. It can be expected that an RLCM will result in a substantial model improvement compared to an exploratory LCM without equality constraints.
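For polytomous items, the simulation draws each response from a categorical distribution given the latent class. The sketch below is an illustration under stated assumptions: `simulate_lcm_poly` and `equal_distractors` are hypothetical helpers, and the two item types are invented (Table 5 is not reproduced here). The first item type mimics Items 1 and 5, whose distractors all function the same; the second has class-specific distractor behavior.

```python
import numpy as np

def simulate_lcm_poly(class_probs, cat_probs, n_subjects, seed=0):
    """cat_probs: (I, K+1, C) array with cat_probs[i, k, c] = P(X_i = k | class c).
    Returns responses in {0, ..., K} (0 = correct category) and class labels."""
    rng = np.random.default_rng(seed)
    cat_probs = np.asarray(cat_probs, dtype=float)
    n_items, n_cats, _ = cat_probs.shape
    classes = rng.choice(len(class_probs), size=n_subjects, p=class_probs)
    X = np.empty((n_subjects, n_items), dtype=int)
    for i in range(n_items):
        # Cumulative category probabilities for each subject's class: shape (N, K+1).
        cum = np.cumsum(cat_probs[i][:, classes].T, axis=1)
        u = rng.random((n_subjects, 1))
        # Inverse-CDF draw; the minimum guards against rounding in cum[:, -1].
        X[:, i] = np.minimum((u > cum).sum(axis=1), n_cats - 1)
    return X, classes

def equal_distractors(p_correct, n_distractors=3):
    """Item whose distractors share the remaining probability mass equally."""
    p = np.asarray(p_correct, dtype=float)
    return np.vstack([p, np.tile((1 - p) / n_distractors, (n_distractors, 1))])

# Hypothetical item types (rows = categories 0..3, columns = classes 1..4).
item_a = equal_distractors([0.10, 0.70, 0.70, 0.90])
item_b = np.array([[0.10, 0.20, 0.60, 0.90],
                   [0.30, 0.50, 0.10, 0.04],
                   [0.30, 0.10, 0.10, 0.03],
                   [0.30, 0.20, 0.20, 0.03]])
cat_probs = np.stack([item_a, item_b] * 6)  # 12 items, shape (12, 4, 4)
X, z = simulate_lcm_poly([0.30, 0.20, 0.10, 0.40], cat_probs, n_subjects=1000, seed=2)
```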

4.2.2. Results

First, we fitted exploratory LCMs with 2, 3, 4, 5, and 6 classes. Based on the information criteria presented in Table 6, the correct model with C = 4 latent classes was selected. However, the improvement from moving from 3 to 4 classes (a BIC difference of 3) would be considered negligible in practice. The estimated latent class probabilities in the model with four latent classes were 0.29, 0.21, 0.12, and 0.38. Estimated item response probabilities are shown in Table A1 in Appendix A.
In the next step, different RLCMs for polytomous data were specified. As explained in Section 3.2, one can regularize differences in item parameters among classes (using a regularization parameter $\lambda_1$), among categories (using a regularization parameter $\lambda_2$), or both (using both regularization parameters or applying fused group regularization). We fitted the five approaches (Approaches R1 to R5) introduced in Section 3.2 to the simulated data using one-dimensional and two-dimensional grids of regularization parameters. For each regularization approach, we selected the model with minimal BIC.
Table 7 shows that the model with the fused penalty on item categories fitted the data best (Approach R2: BIC = 24,689). In this model, 79 item parameters are regularized. The decrease in BIC compared to the exploratory LCM is substantial. These findings indicate that, for this dataset, it is important to fuse item parameters among categories rather than among classes. The best-fitting model with a simultaneous penalty for classes and categories (Approach R3: BIC = 24,836) outperformed the model in which parameters were fused only among classes (Approach R1: BIC = 24,932). However, it was inferior to the model with fusing among categories. Notably, the largest number of regularized parameters (103) was obtained for Approach R3. The fused group regularization approaches (Approaches R4 and R5) also improved fit compared to the unrestricted exploratory LCM but were inferior to R2. The reason might be that group regularization results in the extreme decision that either all item parameters of a group are equal or all are different. In contrast, the fused regularization Approaches R1, R2, and R3 allow for situations in which only some of the item parameters are estimated to be equal to each other.
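To make the distinction between fusing among classes and fusing among categories concrete, the following sketch evaluates a fused penalty on distractor parameters $\gamma_{ikc}$ ($k = 1, \ldots, K$; the correct category is the reference and is not penalized). Several details are assumptions rather than the exact Section 3.2 definitions: the absolute value (LASSO-type) is used as the penalty function (Figure 1 shows that other penalty functions are possible), all pairs of classes or categories are fused, and `fused_penalty` is a hypothetical name. In estimation, this quantity would be subtracted from the log-likelihood.

```python
import numpy as np
from itertools import combinations

def fused_penalty(gamma, lam_class=0.0, lam_cat=0.0):
    """Fused LASSO-type penalty on gamma[i, k, c] (items x distractor categories x classes):
      lam_class * sum_{i,k} sum_{c<c'} |gamma[i,k,c] - gamma[i,k,c']|   (fusing classes)
    + lam_cat   * sum_{i,c} sum_{k<k'} |gamma[i,k,c] - gamma[i,k',c]|   (fusing categories)
    """
    g = np.asarray(gamma, dtype=float)
    _, n_cats, n_classes = g.shape
    pen_class = sum(np.abs(g[:, :, c] - g[:, :, d]).sum()
                    for c, d in combinations(range(n_classes), 2))
    pen_cat = sum(np.abs(g[:, k, :] - g[:, h, :]).sum()
                  for k, h in combinations(range(n_cats), 2))
    return lam_class * pen_class + lam_cat * pen_cat

# One item, two distractor categories (rows), two classes (columns); hypothetical values.
g = np.array([[[0.0, 1.0],
               [0.5, 1.0]]])
r1 = fused_penalty(g, lam_class=0.2)               # classes only:    0.2 * 1.5
r2 = fused_penalty(g, lam_cat=0.4)                 # categories only: 0.4 * 0.5
r3 = fused_penalty(g, lam_class=0.2, lam_cat=0.4)  # both
```

In the toy example, the two distractors already coincide in the second class, so fusing categories costs little there, while fusing classes would require moving both parameters.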
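The all-or-nothing behavior of group regularization can be seen from the form of a group penalty. In one common form (assumed here; the exact definitions of R4 and R5 are given in Section 3.2), the Euclidean norm of the whole vector of category differences between two classes is penalized. Because this norm is non-differentiable only when the entire difference vector is zero, the penalty can fuse all categories of two classes at once but has no mechanism to fuse just one of them. The function name `group_fused_class_penalty` is hypothetical.

```python
import numpy as np
from itertools import combinations

def group_fused_class_penalty(gamma, lam):
    """Group-fused penalty over classes: for each item i and class pair (c, c'),
    penalize the Euclidean norm of gamma[i, :, c] - gamma[i, :, c']."""
    g = np.asarray(gamma, dtype=float)
    n_classes = g.shape[2]
    total = 0.0
    for c, d in combinations(range(n_classes), 2):
        # One norm per item, taken over all distractor categories jointly.
        total += np.linalg.norm(g[:, :, c] - g[:, :, d], axis=1).sum()
    return lam * total

# Hypothetical parameters: one item, two distractor categories (rows), two classes (columns).
g = np.array([[[0.0, 1.0],
               [0.5, 1.0]]])
g_equal = np.array([[[0.3, 0.3],
                     [0.7, 0.7]]])
pen = group_fused_class_penalty(g, lam=1.0)              # sqrt(1.0**2 + 0.5**2)
pen_equal = group_fused_class_penalty(g_equal, lam=1.0)  # 0.0: the whole group is fused
```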
For the best-fitting model of Approach R2 (i.e., fusing among categories), estimated class probabilities were 0.29, 0.22, 0.10, and 0.39, respectively. Estimated item response probabilities from this model are shown in Table 8. It can be seen that model estimation was quite successful in identifying parameters that were equal in the data generating model.

5. Application to the SPM-LS Data

In this section, we apply the RLCM to the SPM-LS dataset.

5.1. Method

In line with the topic of this special issue, the publicly available dataset from the Myszkowski and Storme (2018) study was reanalyzed. The original study compared various parametric item response models (i.e., 1PL, 2PL, 3PL, 4PL, and nested logit models) fitted to a dataset comprising N = 499 students (214 males and 285 females) aged between 19 and 24. The analyzed data consist of responses to the 12 most difficult SPM items and are freely available at https://data.mendeley.com/datasets/h3yhs5gy3w/1. For details regarding the data collection procedure, we refer to Myszkowski and Storme (2018).
Each of the I = 12 items had one correct category and $K_i = 7$ distractors. To be consistent with the notation introduced in this paper and to ease the interpretation of the results, we recoded the original dataset for the analysis of polytomous item responses. First, we scored the correct response as Category 0. Second, we ordered the distractors according to their frequency. In more detail, Category 1 in our rescored dataset was the most attractive distractor (i.e., the most frequent distractor), while Category 7 was the least attractive distractor. The relative frequencies and the corresponding categories of the original dataset are shown in Table 9. One might suppose that each item has some especially attractive distractors. However, many item category frequencies turned out to be relatively similar, so that these categories might also function homogeneously among latent classes. We also analyzed the SPM-LS dataset in its dichotomous version, in which correct responses were scored as 1 and responses to any of the distractors were scored as 0.
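The recoding of a single item can be sketched as below. The helper names are hypothetical, and two details are assumptions not stated in the text: ties in distractor frequency are broken by the original option label, and options that were never chosen still receive a category (placed at the end). The toy responses are invented.

```python
from collections import Counter

def recode_item(responses, correct, options):
    """Recode raw option labels: 0 = correct, 1 = most frequent distractor, ...,
    K = least frequent distractor (ties broken by the original option label)."""
    counts = Counter(responses)  # options never chosen have count 0
    distractors = [o for o in options if o != correct]
    order = sorted(distractors, key=lambda o: (-counts[o], o))
    mapping = {correct: 0}
    mapping.update({o: rank + 1 for rank, o in enumerate(order)})
    return [mapping[r] for r in responses], mapping

def score_dichotomous(responses, correct):
    """1 = correct response, 0 = any distractor."""
    return [int(r == correct) for r in responses]

# Invented responses to one item with eight options, option 7 being correct.
raw = [7, 3, 3, 5, 7, 3, 2, 7]
recoded, mapping = recode_item(raw, correct=7, options=range(1, 9))
scores = score_dichotomous(raw, correct=7)
```

Here option 3 becomes Category 1, the two options chosen once become Categories 2 and 3, and the unchosen options fill Categories 4 to 7.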
For the dataset with dichotomous items, the exploratory LCM and the RLCM were fitted for two to six latent classes. Model selection was conducted based on the BIC. For the dataset with rescored polytomous items, we used the same number of classes for estimating the exploratory LCM. For the RLCM, we applied the fused regularization approach with respect to classes (Section 3.2.1), categories (Section 3.2.2), and to classes and categories in a simultaneous manner (Section 3.2.3).

5.2. Results

5.2.1. Results for Dichotomous Item Responses

For the SPM-LS dataset with dichotomous items, a series of exploratory LCMs and RLCMs with two to six classes was fitted. According to the BIC presented in Table 10, an exploratory LCM with four classes would be selected. When RLCMs were fitted, a model with five classes would be selected that had 19 regularized item parameters.
In Table 11, item response probabilities and skill class probabilities for the RLCM with C = 5 classes are shown. Based on the average item response probability per skill class, $\bar{p}_c = \frac{1}{I}\sum_{i=1}^{I} p_{ic}$, Class C1 (12% frequency) was the lowest-performing and Class C5 (37% frequency) the best-performing class. Class C3 (40% frequency) can be seen as an intermediate class. Classes C2 and C4 were relatively rare. Compared to the intermediate Class C3, students in Class C2 performed particularly poorly on Items 3, 6, and 11 but outperformed Class C3 on Items 7, 8, and 12. Students in Class C4 showed perfect performance on Items 8 and 9 but notably worse performance on Items 10 and 11. Interestingly, a partial order on the classes can be defined if at most two violations of the inequality conditions are allowed. This partial order is depicted in Figure 4. The arrow from Class C1 to Class C3 means that C1 was smaller than C3, that is, C1 had lower item response probabilities than C3. Labeled arrows indicate violations of the partial order. For example, C1 was approximately smaller than C2, with Items 3 and 11 violating the ordering property. To summarize, three of the five classes fulfilled the ordering property for all items. The remaining two classes showed violations for two or three items and can be interpreted as subpopulations of subjects exhibiting latent differential item functioning.
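The class averages $\bar{p}_c$ and the item-wise order violations can be computed directly from the estimated item response probabilities. The sketch below encodes one reading of the rule behind Figure 4 (an assumption): an arrow from class a to class b is drawn when a has the lower average and at most two items violate $p_{ia} \le p_{ib}$. It lists all such pairs rather than the reduced diagram. Function names and the 4-item, 3-class matrix are hypothetical, not the Table 11 estimates.

```python
import numpy as np

def class_means(item_probs):
    """Average item response probability per class: p_bar_c = (1/I) * sum_i p_ic."""
    return np.asarray(item_probs, dtype=float).mean(axis=0)

def order_violations(item_probs, lower, upper, tol=0.0):
    """0-based items violating 'class lower <= class upper', i.e. p_i,lower > p_i,upper + tol."""
    p = np.asarray(item_probs, dtype=float)
    return [int(i) for i in np.flatnonzero(p[:, lower] > p[:, upper] + tol)]

def approximate_order(item_probs, max_violations=2):
    """Map (a, b) -> violating items for pairs with mean(a) < mean(b) and few violations."""
    means = class_means(item_probs)
    arrows = {}
    for a in range(len(means)):
        for b in range(len(means)):
            if a != b and means[a] < means[b]:
                v = order_violations(item_probs, a, b)
                if len(v) <= max_violations:
                    arrows[(a, b)] = v
    return arrows

# Hypothetical 4 items x 3 classes.
P = np.array([[0.20, 0.50, 0.90],
              [0.30, 0.10, 0.80],
              [0.10, 0.40, 0.70],
              [0.40, 0.60, 0.95]])
means = class_means(P)        # [0.25, 0.40, 0.8375]
arrows = approximate_order(P)  # the arrow (0, 1) carries one violating item
```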

5.2.2. Results for Polytomous Item Responses

We now only briefly discuss the findings of the analysis of the SPM-LS dataset based on polytomous item responses. For the exploratory latent class models, the model with just two latent classes would be selected according to the BIC, whereas the model with six latent classes would be selected according to the AIC. Given the large number of estimated item parameters, applying the RLCM seems to be required for obtaining a parsimonious model. The best-fitting model was obtained with C = 3 classes by fusing categories with a regularization parameter of $\lambda_2 = 0.24$. Classes C1 (28% frequency) and C2 (5% frequency) had low performance, while Class C3 was the high-performing class (67% frequency). As an illustration, Table 12 provides the estimated item response probabilities for the last three items. It can be seen that some of the categories were fused such that they had equal item response probabilities within a latent class. All item parameters are shown in Table A2 in Appendix B.
At the time of writing, the results for the polytomous SPM-LS data do not seem to be very consistent with those for the dichotomous data. It could be that the large number of parameters to be estimated (several hundred, depending on the number of classes) is critical for the relatively small sample size of N = 499. Other research has also shown that regularization methods for LCMs need sample sizes of at least 1000 or even more to perform satisfactorily (Chen et al. 2015).
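To give a sense of the model size, the following count assumes one free parameter per distractor category and class (the correct category being the reference) plus C − 1 class probabilities, for the unrestricted polytomous LCM with I = 12 items and $K_i = 7$:

```latex
\#\mathrm{par}(C) = C\sum_{i=1}^{I} K_i + (C - 1) = 84\,C + C - 1 = 85\,C - 1,
\qquad
\#\mathrm{par}(2) = 169,\quad \#\mathrm{par}(3) = 254,\quad \#\mathrm{par}(6) = 509.
```

Already for C = 3, this is roughly one parameter for every two of the N = 499 subjects.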

6. Discussion

In this article, we proposed an extension of regularized latent class analysis to polytomous item responses. Using the simulated data illustration and the SPM-LS dataset, we have shown that fusing among classes or categories can be beneficial in terms of model parsimony and interpretation. Conceptualizing substantive questions in terms of latent classes often makes it easier for researchers to think in terms of types of persons. This interpretation is not as directly available in models with continuous latent variables.
In our regularization approach to polytomous data, we based the regularization penalties on the distractors of the items. Hence, the correct item response serves as a reference category. In LCA applications in which no reference category can be defined, the regularization approach certainly has to be modified. Note that for $K_i + 1$ categories, only $K_i$ item parameters per class can be independently estimated. Alternatively, a sum constraint $\sum_{k=0}^{K_i} \gamma_{ikc} = 0$ could be imposed, where $\gamma_{ikc}$ ($k = 0, 1, \ldots, K_i$) denotes the item parameter of item $i$ for category $k$ in class $c$. Such constraints can be replaced by adding a ridge-type penalty of the form $\lambda_3 \sum_{k=0}^{K_i} \gamma_{ikc}^2$ to the fused regularization penalty, where $\lambda_3$ is another regularization parameter. Because this penalty is quadratic in the item parameters, they are shrunk toward zero in the estimation.
By treating the correct item response as the reference category, regularization only operates on the categories for incorrect responses. As pointed out by an anonymous reviewer, it could be more appropriate to fuse classes for the correct item response and for the incorrect item response categories separately. This would lead to an overparameterized (i.e., not identified) model because the item parameters of all categories would appear in the model. However, if a ridge-type penalty were again employed, the identification issue would disappear.
As the application of the regularization technique to an LCM results in a particular restricted LCM, it has to be shown that the model parameters can be identified. Necessary and sufficient conditions for identification in restricted LCMs have recently been investigated (Gu and Xu 2018; Xu 2017). Because the inclusion of the penalty function, accompanied by a regularization parameter, introduces additional information into the estimation, it is unclear whether identifiability should be studied only for the likelihood part of the optimization function (see San Martín 2018 for a related discussion in Bayesian estimation).
It should be noted that similar regularization approaches have been studied for cognitive diagnostic models (Chen et al. 2015; Gu and Xu 2019; Liu and Kang 2019; Xu and Shang 2018). These models specify measurement models for D dichotomous latent variables, which constitute $2^D$ latent classes. In addition, the modeling of violations of the local independence assumption in LCA has been of interest in this model class (Kang et al. 2017; Tamhane et al. 2010).
Previous articles on the SPM-LS dataset also used distractor information by employing the nested logit model (NLM; Myszkowski and Storme 2018). The NLM is also very data-hungry, which is a concern given the small sample size of the dataset. It has been argued that reliability can be increased by using distractor information (Myszkowski and Storme 2018; Storme et al. 2019). It should be noted that this is only true to the extent that the item parameters can be reliably estimated. For N = 499 in the SPM-LS dataset, this is probably not the case. Regularized estimation approaches could circumvent such estimation issues (see Battauz 2019 for a similar approach in the nominal response model).
Finally, we would like to emphasize that the regularization approaches can be interpreted as empirical Bayesian approaches that employ hierarchical prior distributions on item parameters (van Erp et al. 2019). It can be expected that Bayesian variants of RLCMs are competitive with EM-based estimation, especially for small(er) samples.
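Why only $K_i$ parameters per class are identified, and why a ridge-type penalty can replace the sum constraint, can be checked numerically. The sketch assumes, consistent with the sum constraint above, that the $\gamma_{ikc}$ enter through a multinomial (baseline-category) logit; the function names and parameter values are hypothetical. Adding the same constant to all categories leaves the probabilities unchanged, and among all such shifted versions the one with the smallest ridge penalty is the sum-to-zero version, since $\sum_k (\gamma_k + a)^2$ is minimized at $a = -\bar{\gamma}$.

```python
import numpy as np

def category_probs(gamma):
    """Multinomial-logit category probabilities for one item in one class;
    gamma[k] is the parameter of category k = 0, ..., K."""
    eta = np.asarray(gamma, dtype=float)
    w = np.exp(eta - eta.max())  # subtracting the max only improves numerical stability
    return w / w.sum()

def ridge_penalty(gamma, lam3):
    """Ridge-type penalty lam3 * sum_k gamma_k**2 over all categories."""
    return lam3 * float(np.sum(np.asarray(gamma, dtype=float) ** 2))

# Reference-category parametrization: correct category 0 has gamma_0 = 0.
gamma_ref = np.array([0.0, -1.0, 0.5, -0.2])
gamma_shifted = gamma_ref + 3.0                 # same probabilities, larger penalty
gamma_centered = gamma_ref - gamma_ref.mean()   # sum-to-zero version, smallest penalty
```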
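For the simplest case, this correspondence can be written out. The sketch assumes an absolute-value (LASSO-type) fused penalty on class differences and treats it as an independent Laplace prior on each pairwise difference. This prior is improper, because the penalty does not change when the same constant is added to $\gamma_{ikc}$ in all classes. The log posterior then equals the penalized log-likelihood $\ell$ up to a constant, so the penalized estimate is the posterior mode. Choosing λ from the data (e.g., by the BIC) makes the approach empirical Bayes. Nonconvex penalty functions would correspond to other priors.

```latex
p\bigl(\gamma_{ikc} - \gamma_{ikc'} \mid \lambda\bigr)
  = \frac{\lambda}{2}\exp\bigl(-\lambda\,\lvert \gamma_{ikc} - \gamma_{ikc'} \rvert\bigr)
\quad\Longrightarrow\quad
\log p(\boldsymbol{\gamma} \mid \mathbf{X})
  = \ell(\boldsymbol{\gamma}; \mathbf{X})
  - \lambda \sum_{i}\sum_{k}\sum_{c < c'} \lvert \gamma_{ikc} - \gamma_{ikc'} \rvert
  + \text{const.}
```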

Funding

This research received no external funding.

Acknowledgments

I am incredibly grateful to Nils Myszkowski for his long-standing patience and for the invitation to contribute to the special issue ‘Analysis of an Intelligence Dataset’ in the Journal of Intelligence. I thank the three anonymous reviewers whose comments helped improve and clarify this manuscript.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC: Akaike information criterion
BIC: Bayesian information criterion
EM: expectation maximization
LASSO: least absolute shrinkage and selection operator
LCA: latent class analysis
LCM: latent class model
NLM: nested logit model
RLCA: regularized latent class analysis
RLCM: regularized latent class model
SPM-LS: last series of Raven’s standard progressive matrices

Appendix A. Additional Results for Simulated Data Illustration with Polytomous Item Responses

In Table A1, estimated item response probabilities for the exploratory LCM with four latent classes are shown.
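Table A1. Data illustration polytomous data: estimated item response probabilities in the exploratory LCM with C = 4 classes.
| Item | Cat | Class 1 | Class 2 | Class 3 | Class 4 | Item | Cat | Class 1 | Class 2 | Class 3 | Class 4 | Item | Cat | Class 1 | Class 2 | Class 3 | Class 4 |
| 1 | 0 | 0.08 | 0.78 | 0.80 | 0.82 | 5 | 0 | 0.10 | 0.11 | 0.45 | 0.92 | 9 | 0 | 0.14 | 0.82 | 0.14 | 0.81 |
| 1 | 1 | 0.33 | 0.08 | 0.07 | 0.07 | 5 | 1 | 0.34 | 0.27 | 0.23 | 0.02 | 9 | 1 | 0.27 | 0.08 | 0.26 | 0.08 |
| 1 | 2 | 0.31 | 0.05 | 0.05 | 0.06 | 5 | 2 | 0.32 | 0.31 | 0.17 | 0.02 | 9 | 2 | 0.32 | 0.05 | 0.33 | 0.05 |
| 1 | 3 | 0.29 | 0.08 | 0.08 | 0.06 | 5 | 3 | 0.24 | 0.31 | 0.15 | 0.04 | 9 | 3 | 0.27 | 0.05 | 0.27 | 0.05 |
| 2 | 0 | 0.20 | 0.84 | 0.89 | 0.91 | 6 | 0 | 0.22 | 0.24 | 0.31 | 0.77 | 10 | 0 | 0.19 | 0.89 | 0.40 | 0.83 |
| 2 | 1 | 0.30 | 0.06 | 0.00 | 0.06 | 6 | 1 | 0.23 | 0.17 | 0.16 | 0.07 | 10 | 1 | 0.42 | 0.05 | 0.28 | 0.05 |
| 2 | 2 | 0.28 | 0.06 | 0.01 | 0.03 | 6 | 2 | 0.23 | 0.21 | 0.06 | 0.03 | 10 | 2 | 0.19 | 0.01 | 0.12 | 0.03 |
| 2 | 3 | 0.23 | 0.03 | 0.10 | 0.00 | 6 | 3 | 0.31 | 0.39 | 0.47 | 0.13 | 10 | 3 | 0.20 | 0.05 | 0.20 | 0.10 |
| 3 | 0 | 0.15 | 0.76 | 0.20 | 0.80 | 7 | 0 | 0.07 | 0.79 | 0.83 | 0.84 | 11 | 0 | 0.10 | 0.13 | 0.55 | 0.90 |
| 3 | 1 | 0.25 | 0.15 | 0.34 | 0.10 | 7 | 1 | 0.34 | 0.10 | 0.06 | 0.05 | 11 | 1 | 0.32 | 0.28 | 0.10 | 0.03 |
| 3 | 2 | 0.36 | 0.06 | 0.18 | 0.05 | 7 | 2 | 0.31 | 0.07 | 0.06 | 0.06 | 11 | 2 | 0.26 | 0.34 | 0.17 | 0.03 |
| 3 | 3 | 0.24 | 0.03 | 0.28 | 0.06 | 7 | 3 | 0.28 | 0.03 | 0.06 | 0.05 | 11 | 3 | 0.32 | 0.25 | 0.18 | 0.04 |
| 4 | 0 | 0.27 | 0.89 | 0.30 | 0.86 | 8 | 0 | 0.24 | 0.86 | 0.91 | 0.88 | 12 | 0 | 0.25 | 0.19 | 0.21 | 0.77 |
| 4 | 1 | 0.35 | 0.04 | 0.28 | 0.04 | 8 | 1 | 0.23 | 0.06 | 0.05 | 0.06 | 12 | 1 | 0.23 | 0.24 | 0.27 | 0.08 |
| 4 | 2 | 0.19 | 0.02 | 0.21 | 0.01 | 8 | 2 | 0.24 | 0.06 | 0.02 | 0.06 | 12 | 2 | 0.19 | 0.20 | 0.10 | 0.04 |
| 4 | 3 | 0.19 | 0.05 | 0.22 | 0.09 | 8 | 3 | 0.28 | 0.02 | 0.03 | 0.00 | 12 | 3 | 0.34 | 0.38 | 0.42 | 0.12 |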
Note: Cat = category.

Appendix B. Additional Results for SPM-LS Dataset with Polytomous Item Responses

Table A2 shows all estimated item response probabilities for the SPM-LS dataset with polytomous items for the best fitting RLCM with C = 3 classes.
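Table A2. SPM-LS polytomous data: estimated item response probabilities and latent class probabilities for best-fitting RLCM with C = 3 latent classes.
| Item | Cat | C1 | C2 | C3 | Item | Cat | C1 | C2 | C3 | Item | Cat | C1 | C2 | C3 |
| SPM1 | 0 | 0.73 | 0.15 | 0.82 | SPM5 | 0 | 0.72 | 0.04 | 0.98 | SPM9 | 0 | 0.25 | 0.22 | 0.73 |
| SPM1 | 1 | 0.11 | 0.48 | 0.12 | SPM5 | 1 | 0.05 | 0.48 | 0.00 | SPM9 | 1 | 0.14 | 0.03 | 0.12 |
| SPM1 | 2 | 0.06 | 0.01 | 0.02 | SPM5 | 2 | 0.03 | 0.32 | 0.01 | SPM9 | 2 | 0.17 | 0.00 | 0.07 |
| SPM1 | 3 | 0.07 | 0.04 | 0.01 | SPM5 | 3 | 0.08 | 0.00 | 0.00 | SPM9 | 3 | 0.12 | 0.32 | 0.03 |
| SPM1 | 4 | 0.00 | 0.24 | 0.01 | SPM5 | 4 | 0.06 | 0.00 | 0.00 | SPM9 | 4 | 0.19 | 0.15 | 0.01 |
| SPM1 | 5 | 0.01 | 0.08 | 0.02 | SPM5 | 5 | 0.02 | 0.08 | 0.01 | SPM9 | 5 | 0.06 | 0.00 | 0.03 |
| SPM1 | 6 | 0.02 | 0.00 | 0.00 | SPM5 | 6 | 0.02 | 0.08 | 0.00 | SPM9 | 6 | 0.06 | 0.16 | 0.01 |
| SPM1 | 7 | 0.00 | 0.00 | 0.00 | SPM5 | 7 | 0.02 | 0.00 | 0.00 | SPM9 | 7 | 0.01 | 0.12 | 0.00 |
| SPM2 | 0 | 0.87 | 0.32 | 0.97 | SPM6 | 0 | 0.51 | 0.08 | 0.92 | SPM10 | 0 | 0.08 | 0.00 | 0.56 |
| SPM2 | 1 | 0.02 | 0.12 | 0.03 | SPM6 | 1 | 0.10 | 0.36 | 0.04 | SPM10 | 1 | 0.14 | 0.12 | 0.19 |
| SPM2 | 2 | 0.02 | 0.36 | 0.00 | SPM6 | 2 | 0.11 | 0.00 | 0.03 | SPM10 | 2 | 0.26 | 0.40 | 0.03 |
| SPM2 | 3 | 0.07 | 0.00 | 0.00 | SPM6 | 3 | 0.09 | 0.00 | 0.01 | SPM10 | 3 | 0.10 | 0.08 | 0.07 |
| SPM2 | 4 | 0.01 | 0.12 | 0.00 | SPM6 | 4 | 0.06 | 0.24 | 0.00 | SPM10 | 4 | 0.13 | 0.00 | 0.06 |
| SPM2 | 5 | 0.00 | 0.08 | 0.00 | SPM6 | 5 | 0.06 | 0.20 | 0.00 | SPM10 | 5 | 0.10 | 0.08 | 0.06 |
| SPM2 | 6 | 0.01 | 0.00 | 0.00 | SPM6 | 6 | 0.05 | 0.08 | 0.00 | SPM10 | 6 | 0.10 | 0.32 | 0.03 |
| SPM2 | 7 | 0.00 | 0.00 | 0.00 | SPM6 | 7 | 0.02 | 0.04 | 0.00 | SPM10 | 7 | 0.09 | 0.00 | 0.00 |
| SPM3 | 0 | 0.67 | 0.04 | 0.92 | SPM7 | 0 | 0.38 | 0.20 | 0.88 | SPM11 | 0 | 0.11 | 0.26 | 0.48 |
| SPM3 | 1 | 0.17 | 0.00 | 0.05 | SPM7 | 1 | 0.06 | 0.12 | 0.06 | SPM11 | 1 | 0.18 | 0.43 | 0.10 |
| SPM3 | 2 | 0.08 | 0.40 | 0.00 | SPM7 | 2 | 0.13 | 0.28 | 0.01 | SPM11 | 2 | 0.21 | 0.00 | 0.12 |
| SPM3 | 3 | 0.01 | 0.32 | 0.00 | SPM7 | 3 | 0.12 | 0.28 | 0.01 | SPM11 | 3 | 0.15 | 0.12 | 0.07 |
| SPM3 | 4 | 0.01 | 0.04 | 0.02 | SPM7 | 4 | 0.11 | 0.00 | 0.02 | SPM11 | 4 | 0.14 | 0.00 | 0.08 |
| SPM3 | 5 | 0.00 | 0.12 | 0.01 | SPM7 | 5 | 0.07 | 0.00 | 0.02 | SPM11 | 5 | 0.08 | 0.19 | 0.07 |
| SPM3 | 6 | 0.04 | 0.04 | 0.00 | SPM7 | 6 | 0.09 | 0.00 | 0.00 | SPM11 | 6 | 0.07 | 0.00 | 0.07 |
| SPM3 | 7 | 0.02 | 0.04 | 0.00 | SPM7 | 7 | 0.04 | 0.12 | 0.00 | SPM11 | 7 | 0.06 | 0.00 | 0.01 |
| SPM4 | 0 | 0.62 | 0.00 | 0.98 | SPM8 | 0 | 0.18 | 0.00 | 0.81 | SPM12 | 0 | 0.03 | 0.24 | 0.45 |
| SPM4 | 1 | 0.11 | 0.40 | 0.01 | SPM8 | 1 | 0.24 | 0.04 | 0.01 | SPM12 | 1 | 0.14 | 0.09 | 0.17 |
| SPM4 | 2 | 0.04 | 0.36 | 0.00 | SPM8 | 2 | 0.08 | 0.00 | 0.07 | SPM12 | 2 | 0.19 | 0.44 | 0.10 |
| SPM4 | 3 | 0.07 | 0.08 | 0.00 | SPM8 | 3 | 0.14 | 0.08 | 0.03 | SPM12 | 3 | 0.23 | 0.03 | 0.06 |
| SPM4 | 4 | 0.05 | 0.00 | 0.01 | SPM8 | 4 | 0.11 | 0.20 | 0.03 | SPM12 | 4 | 0.12 | 0.00 | 0.07 |
| SPM4 | 5 | 0.04 | 0.12 | 0.00 | SPM8 | 5 | 0.07 | 0.28 | 0.04 | SPM12 | 5 | 0.14 | 0.00 | 0.06 |
| SPM4 | 6 | 0.04 | 0.00 | 0.00 | SPM8 | 6 | 0.11 | 0.40 | 0.01 | SPM12 | 6 | 0.07 | 0.20 | 0.07 |
| SPM4 | 7 | 0.03 | 0.04 | 0.00 | SPM8 | 7 | 0.07 | 0.00 | 0.00 | SPM12 | 7 | 0.08 | 0.00 | 0.02 |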
Note: Cat = category.

References

  1. Agresti, Alan, and Maria Kateri. 2014. Some remarks on latent variable models in categorical data analysis. Communications in Statistics Theory and Methods 43: 801–14.
  2. Battauz, Michela. 2019. Regularized estimation of the nominal response model. Multivariate Behavioral Research.
  3. Bhattacharya, Sakyajit, and Paul D. McNicholas. 2014. A LASSO-penalized BIC for mixture model selection. Advances in Data Analysis and Classification 8: 45–61.
  4. Borsboom, Denny, Mijke Rhemtulla, Angelique O. J. Cramer, Han L. J. van der Maas, Marten Scheffer, and Conor V. Dolan. 2016. Kinds versus continua: A review of psychometric approaches to uncover the structure of psychiatric constructs. Psychological Medicine 46: 1567–79.
  5. Cao, Peng, Xiaoli Liu, Hezi Liu, Jinzhu Yang, Dazhe Zhao, Min Huang, and Osmar Zaiane. 2018. Generalized fused group lasso regularized multi-task feature learning for predicting cognitive outcomes in Alzheimers disease. Computer Methods and Programs in Biomedicine 162: 19–45.
  6. Chen, Yunxiao, Jingchen Liu, Gongjun Xu, and Zhiliang Ying. 2015. Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association 110: 850–66.
  7. Chen, Yunxiao, Xiaoou Li, Jingchen Liu, and Zhiliang Ying. 2017. Regularized latent class analysis with application in cognitive diagnosis. Psychometrika 82: 660–92.
  8. Chen, Yunxiao, Xiaoou Li, Jingchen Liu, and Zhiliang Ying. 2018. Robust measurement via a fused latent and graphical item response theory model. Psychometrika 83: 538–62.
  9. Collins, Linda M., and Stephanie T. Lanza. 2009. Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. New York: Wiley.
  10. DeSantis, Stacia M., E. Andrés Houseman, Brent A. Coull, Catherine L. Nutt, and Rebecca A. Betensky. 2012. Supervised Bayesian latent class models for high-dimensional data. Statistics in Medicine 31: 1342–60.
  11. DeSantis, Stacia M., E. Andrés Houseman, Brent A. Coull, Anat Stemmer-Rachamimov, and Rebecca A. Betensky. 2008. A penalized latent class model for ordinal data. Biostatistics 9: 249–62.
  12. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60.
  13. Finch, W. Holmes, and Kendall C. Bronk. 2011. Conducting confirmatory latent class analysis using Mplus. Structural Equation Modeling 18: 132–51.
  14. Fop, Michael, and Thomas B. Murphy. 2018. Variable selection methods for model-based clustering. Statistics Surveys 12: 18–65.
  15. Formann, Anton K. 1982. Linear logistic latent class analysis. Biometrical Journal 24: 171–90.
  16. Formann, Anton K. 1992. Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association 87: 476–86.
  17. Formann, Anton K. 2007. (Almost) equivalence between conditional and mixture maximum likelihood estimates for some models of the Rasch type. In Multivariate and Mixture Distribution Rasch Models. Edited by Matthias von Davier and Claus H. Carstensen. New York: Springer, pp. 177–89.
  18. Formann, Anton K., and Thomas Kohlmann. 1998. Structural latent class models. Sociological Methods & Research 26: 530–65.
  19. George, Ann C., Alexander Robitzsch, Thomas Kiefer, Jürgen Groß, and Ali Ünlü. 2016. The R package CDM for cognitive diagnosis models. Journal of Statistical Software 74: 1–24.
  20. Gu, Yuqi, and Gongjun Xu. 2018. Partial identifiability of restricted latent class models. arXiv arXiv:1803.04353.
  21. Gu, Yuqi, and Gongjun Xu. 2019. Learning attribute patterns in high-dimensional structured latent attribute models. Journal of Machine Learning Research 20: 115.
  22. Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Boca Raton: CRC Press.
  23. Houseman, E. Andrés, Brent A. Coull, and Rebecca A. Betensky. 2006. Feature-specific penalized latent class analysis for genomic data. Biometrics 62: 1062–70.
  24. Huang, Jian, Patrick Breheny, and Shuangge Ma. 2012. A selective review of group selection in high-dimensional models. Statistical Science 27: 481–99.
  25. Huang, Po-Hsien, Hung Chen, and Li-Jen Weng. 2017. A penalized likelihood method for structural equation modeling. Psychometrika 82: 329–54.
  26. Jacobucci, Ross, Kevin J. Grimm, and John J. McArdle. 2016. Regularized structural equation modeling. Structural Equation Modeling 23: 555–66.
  27. Janssen, Anne B., and Christian Geiser. 2010. On the relationship between solution strategies in two mental rotation tasks. Learning and Individual Differences 20: 473–78.
  28. Kang, Hyeon-Ah, Jingchen Liu, and Zhiliang Ying. 2017. A graphical diagnostic classification model. arXiv arXiv:1707.06318.
  29. Keribin, Christine. 2000. Consistent estimation of the order of mixture models. Sankhyā: The Indian Journal of Statistics, Series A 62: 49–66.
  30. Langeheine, Rolf, and Jürgen Rost. 1988. Latent Trait and Latent Class Models. New York: Plenum Press.
  31. Lazarsfeld, Paul F., and Neil W. Henry. 1968. Latent Structure Analysis. Boston: Houghton Mifflin.
  32. Leoutsakos, Jeannie-Marie S., Karen Bandeen-Roche, Elizabeth Garrett-Mayer, and Peter P. Zandi. 2011. Incorporating scientific knowledge into phenotype development: Penalized latent class regression. Statistics in Medicine 30: 784–98.
  33. Liu, Jingchen, and Hyeon-Ah Kang. 2019. Q-matrix learning via latent variable selection and identifiability. In Handbook of Diagnostic Classification Models. Edited by Matthias von Davier and Young-Sun Lee. Cham: Springer, pp. 247–63. [Google Scholar] [CrossRef]
  34. Liu, Xiaoli, Peng Cao, Jianzhong Wang, Jun Kong, and Dazhe Zhao. 2019. Fused group lasso regularized multi-task feature learning and its application to the cognitive performance prediction of Alzheimer’s disease. Neuroinformatics 17: 271–94. [Google Scholar] [CrossRef]
  35. Myszkowski, Nils, and Martin Storme. 2018. A snapshot of g. Binary and polytomous item-response theory investigations of the last series of the standard progressive matrices (SPM-LS). Intelligence 68: 109–16. [Google Scholar] [CrossRef]
  36. Nussbeck, Fritjof W., and Michael Eid. 2015. Multimethod latent class analysis. Frontiers in Psychology 6: 1332. [Google Scholar] [CrossRef] [Green Version]
  37. Oberski, Daniel L., Jacques A. P. Hagenaars, and Willem E. Saris. 2015. The latent class multitrait-multimethod model. Psychological Methods 20: 422–43. [Google Scholar] [CrossRef] [PubMed]
  38. Oelker, Margret-Ruth, and Gerhard Tutz. 2017. A uniform framework for the combination of penalties in generalized structured models. Advances in Data Analysis and Classification 11: 97–120. [Google Scholar] [CrossRef]
  39. Robitzsch, Alexander. 2020. sirt: Supplementary Item Response Theory Models. R Package Version 3.9-4. Available online: https://CRAN.R-project.org/package=sirt (accessed on 17 February 2020).
  40. Robitzsch, Alexander, and Ann C. George. 2019. The R package CDM for diagnostic modeling. In Handbook of Diagnostic Classification Models. Edited by Matthias von Davier and Young-Sun Lee. Cham: Springer, pp. 549–72. [Google Scholar] [CrossRef]
  41. Ruan, Lingyan, Ming Yuan, and Hui Zou. 2011. Regularized parameter estimation in high-dimensional Gaussian mixture models. Neural Computation 23: 1605–22. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  42. San Martín, Ernesto. 2018. Identifiability of structural characteristics: How relevant is it for the Bayesian approach? Brazilian Journal of Probability and Statistics 32: 346–73. [Google Scholar] [CrossRef]
  43. Scharf, Florian, and Steffen Nestler. 2019. Should regularization replace simple structure rotation in exploratory factor analysis? Structural Equation Modeling 26: 576–90. [Google Scholar] [CrossRef]
  44. Schmiege, Sarah J., Katherine E. Masyn, and Angela D. Bryan. 2018. Confirmatory latent class analysis: Illustrations of empirically driven and theoretically driven model constraints. Organizational Research Methods 21: 983–1001. [Google Scholar] [CrossRef]
  45. Storme, Martin, Nils Myszkowski, Simon Baron, and David Bernard. 2019. Same test, better scores: Boosting the reliability of short online intelligence recruitment tests with nested logit item response theory models. Journal of Intelligence 7: 17.
  46. Sun, Jianan, Yunxiao Chen, Jingchen Liu, Zhiliang Ying, and Tao Xin. 2016. Latent variable selection for multidimensional item response theory models via L1 regularization. Psychometrika 81: 921–39.
  47. Sun, Jiehuan, Jose D. Herazo-Maya, Philip L. Molyneaux, Toby M. Maher, Naftali Kaminski, and Hongyu Zhao. 2019. Regularized latent class model for joint analysis of high-dimensional longitudinal biomarkers and a time-to-event outcome. Biometrics 75: 69–77.
  48. Tamhane, Ajit C., Dingxi Qiu, and Bruce E. Ankenman. 2010. A parametric mixture model for clustering multivariate binary data. Statistical Analysis and Data Mining 3: 3–19.
  49. Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. 2005. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society. Series B: Statistical Methodology 67: 91–108.
  50. Tutz, Gerhard, and Jan Gertheiss. 2016. Regularized regression for categorical data. Statistical Modelling 16: 161–200.
  51. Tutz, Gerhard, and Gunther Schauberger. 2015. A penalty approach to differential item functioning in Rasch models. Psychometrika 80: 21–43.
  52. van Erp, Sara, Daniel L. Oberski, and Joris Mulder. 2019. Shrinkage priors for Bayesian penalized regression. Journal of Mathematical Psychology 89: 31–50.
  53. von Davier, Matthias. 2008. A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology 61: 287–307.
  54. von Davier, Matthias. 2010. Hierarchical mixtures of diagnostic models. Psychological Test and Assessment Modeling 52: 8–28.
  55. von Davier, Matthias, and Young-Sun Lee, eds. 2019. Handbook of Diagnostic Classification Models. Cham: Springer.
  56. von Davier, Matthias, Bobby Naemi, and Richard D. Roberts. 2012. Factorial versus typological models: A comparison of methods for personality data. Measurement: Interdisciplinary Research and Perspectives 10: 185–208.
  57. Wang, Chun, and Jing Lu. 2020. Learning attribute hierarchies from data: Two exploratory approaches. Journal of Educational and Behavioral Statistics.
  58. Wu, Baolin. 2013. Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data. Journal of Applied Statistics 40: 358–67.
  59. Wu, Zhenke, Livia Casciola-Rosen, Antony Rosen, and Scott L. Zeger. 2018. A Bayesian approach to restricted latent class models for scientifically-structured clustering of multivariate binary outcomes. arXiv:1808.08326.
  60. Xu, Gongjun. 2017. Identifiability of restricted latent class models with binary responses. Annals of Statistics 45: 675–707.
  61. Xu, Gongjun, and Zhuoran Shang. 2018. Identifying latent structures in restricted latent class models. Journal of the American Statistical Association 113: 1284–95.
  62. Yamamoto, Michio, and Kenichi Hayashi. 2015. Clustering of multivariate binary data with dimension reduction via L1-regularized likelihood maximization. Pattern Recognition 48: 3959–68.
  63. Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38: 894–942.
Figure 1. Different penalty functions used in regularization with regularization parameter λ = 0.25 (left panel) and λ = 0.125 (right panel).
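Figure 1 compares penalty functions at two values of λ, but this excerpt does not name them. As a minimal sketch, assuming the panels show the lasso, SCAD (smoothly clipped absolute deviation), and the minimax concave penalty (MCP) of Zhang (2010), the functions below evaluate each penalty on a grid. The shape constants a = 3.7 (SCAD) and γ = 3 (MCP) are conventional defaults chosen here, not values taken from the article.

```python
import numpy as np

# Assumption: Figure 1 is taken to show the lasso, SCAD and MCP penalties;
# a = 3.7 and gamma = 3 are conventional defaults, not values from the article.

def lasso(x, lam):
    """L1 penalty: grows linearly in |x| without bound."""
    return lam * np.abs(x)

def scad(x, lam, a=3.7):
    """SCAD penalty (Fan and Li): linear near zero, quadratic blend, then constant."""
    ax = np.abs(x)
    blend = (2 * a * lam * ax - ax**2 - lam**2) / (2 * (a - 1))
    return np.where(ax <= lam, lam * ax,
                    np.where(ax <= a * lam, blend, lam**2 * (a + 1) / 2))

def mcp(x, lam, gamma=3.0):
    """Minimax concave penalty (Zhang 2010): concave up to gamma*lam, then constant."""
    ax = np.abs(x)
    return np.where(ax <= gamma * lam, lam * ax - ax**2 / (2 * gamma),
                    gamma * lam**2 / 2)

x = np.linspace(-2.0, 2.0, 401)  # parameter values along the horizontal axis
for lam in (0.25, 0.125):        # the two panels of Figure 1
    curves = {"lasso": lasso(x, lam), "SCAD": scad(x, lam), "MCP": mcp(x, lam)}
    # the maximum over the grid shows where each penalty levels off
    print(lam, {name: round(float(v.max()), 4) for name, v in curves.items()})
```

The lasso keeps growing with |x|, whereas SCAD and MCP become constant beyond aλ and γλ, so large parameter differences are penalized no further; halving λ lowers and narrows the nonconvex penalties.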
Figure 2. Illustration of a partial order with four latent classes.
Figure 3. Data illustration dichotomous data: Regularization path of the estimated item response probabilities of Item 1 in Classes 2, 3, and 4 of the four-class solution.
Figure 4. SPM-LS dichotomous data: partial order of the latent classes from the RLCM.
Table 1. Data illustration dichotomous data: true item response probabilities $p_{ic}$.
         Class
Item     1      2      3      4
1, 7     0.10   0.82   0.82   0.82
2, 8     0.22   0.88   0.88   0.88
3, 9     0.16   0.79   0.16   0.79
4, 10    0.25   0.85   0.25   0.85
5, 11    0.10   0.10   0.46   0.91
6, 12    0.22   0.22   0.22   0.79
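The true probabilities in Table 1 already encode a partial order of the kind illustrated in Figure 2. A minimal sketch, assuming that class c precedes class d when its response probability is no larger than that of class d on every item (a working definition assumed here), recovers the order; only items 1 to 6 are listed because items 7 to 12 share their parameters.

```python
from itertools import permutations

# Table 1 true item response probabilities; rows = items 1-6 (items 7-12 repeat them),
# columns = classes 1-4
P = [
    [0.10, 0.82, 0.82, 0.82],
    [0.22, 0.88, 0.88, 0.88],
    [0.16, 0.79, 0.16, 0.79],
    [0.25, 0.85, 0.25, 0.85],
    [0.10, 0.10, 0.46, 0.91],
    [0.22, 0.22, 0.22, 0.79],
]

def precedes(P, c, d):
    """Assumed definition: class c precedes class d if p_ic <= p_id on every item."""
    return all(row[c] <= row[d] for row in P)

def partial_order(P):
    """List all ordered pairs (c, d), numbered from 1, such that class c precedes class d."""
    n_classes = len(P[0])
    return [(c + 1, d + 1) for c, d in permutations(range(n_classes), 2)
            if precedes(P, c, d)]

print(partial_order(P))  # [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
```

Classes 2 and 3 appear in neither order: Class 2 has the higher probability on items 3 and 4, Class 3 on item 5. Class 1 lies below and Class 4 above both of them.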
Table 2. Data illustration dichotomous data: model comparison for exploratory latent class models (LCMs).
C    #np    AIC       BIC
2    25     13,636    13,759
3    38     13,169    13,356
4    51     12,979    13,229
5    64     12,981    13,295
6    76     12,976    13,349
Note: C = number of classes; #np = number of estimated parameters.
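The information criteria in Table 2 can be compared directly. With AIC = −2 log L + 2 #np and BIC = −2 log L + log(N) #np, the sketch below picks the minimum of each criterion and backs out the sample size implied by their difference. The back-calculation is only approximate because the table reports rounded values, and N is not stated in this part of the article.

```python
import math

# Table 2 (simulated dichotomous data): C -> (#np, AIC, BIC)
TABLE2 = {2: (25, 13636, 13759), 3: (38, 13169, 13356), 4: (51, 12979, 13229),
          5: (64, 12981, 13295), 6: (76, 12976, 13349)}

# AIC = -2 logL + 2 #np and BIC = -2 logL + log(N) #np; smaller is better for both
best_aic = min(TABLE2, key=lambda c: TABLE2[c][1])
best_bic = min(TABLE2, key=lambda c: TABLE2[c][2])
print("AIC picks C =", best_aic, "| BIC picks C =", best_bic)  # AIC picks C = 6 | BIC picks C = 4

def implied_n(n_par, aic, bic):
    """Back out N from BIC - AIC = #np * (log N - 2); approximate because AIC/BIC are rounded."""
    return math.exp(2.0 + (bic - aic) / n_par)

for c, (n_par, a, b) in TABLE2.items():
    print(c, round(implied_n(n_par, a, b)))
```

AIC prefers C = 6 by a small margin, while BIC prefers C = 4, the number of classes in the generating model of Table 1. Every row implies a sample size of roughly 1,000.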
Table 3. Data illustration dichotomous data: estimated item response probabilities in exploratory LCM with C = 4 classes.
        Class
Item    1      2      3      4
1       0.08   0.79   0.79   0.82
2       0.20   0.84   0.89   0.91
3       0.15   0.76   0.19   0.81
4       0.27   0.90   0.29   0.86
5       0.10   0.09   0.44   0.92
6       0.23   0.23   0.30   0.77
7       0.07   0.79   0.82   0.85
8       0.24   0.87   0.91   0.87
9       0.14   0.82   0.18   0.81
10      0.19   0.90   0.42   0.83
11      0.10   0.13   0.54   0.89
12      0.25   0.19   0.19   0.77
Table 4. Data illustration dichotomous data: estimated item response probabilities in the regularized latent class model (RLCM) with C = 4 classes based on the minimal Bayesian information criterion (BIC) (λ = 0.21).
        Class
Item    1      2      3      4
1       0.08   0.80   0.80   0.80
2       0.20   0.88   0.88   0.88
3       0.15   0.79   0.15   0.79
4       0.27   0.87   0.27   0.87
5       0.09   0.09   0.44   0.92
6       0.23   0.23   0.23   0.76
7       0.07   0.82   0.82   0.82
8       0.24   0.87   0.87   0.87
9       0.14   0.81   0.14   0.81
10      0.19   0.85   0.40   0.85
11      0.11   0.11   0.55   0.89
12      0.22   0.22   0.22   0.76
Table 5. Data illustration polytomous data: true item response probabilities $p_{ic}$.
              Class                                        Class
Item    Cat   1      2      3      4         Item    Cat   1      2      3      4
1, 7    0     0.10   0.82   0.82   0.82      4, 10   0     0.25   0.85   0.25   0.85
1, 7    1     0.30   0.06   0.06   0.06      4, 10   1     0.35   0.03   0.35   0.03
1, 7    2     0.30   0.06   0.06   0.06      4, 10   2     0.20   0.03   0.20   0.03
1, 7    3     0.30   0.06   0.06   0.06      4, 10   3     0.20   0.09   0.20   0.09
2, 8    0     0.22   0.88   0.88   0.88      5, 11   0     0.10   0.10   0.46   0.91
2, 8    1     0.26   0.05   0.04   0.06      5, 11   1     0.30   0.30   0.18   0.03
2, 8    2     0.26   0.05   0.04   0.06      5, 11   2     0.30   0.30   0.18   0.03
2, 8    3     0.26   0.02   0.04   0.00      5, 11   3     0.30   0.30   0.18   0.03
3, 9    0     0.16   0.79   0.16   0.79      6, 12   0     0.22   0.22   0.22   0.79
3, 9    1     0.28   0.11   0.28   0.11      6, 12   1     0.24   0.23   0.22   0.06
3, 9    2     0.33   0.05   0.33   0.05      6, 12   2     0.20   0.17   0.12   0.04
3, 9    3     0.23   0.05   0.23   0.05      6, 12   3     0.34   0.38   0.44   0.11
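Table 5 fully specifies the item side of the generating model for the polytomous illustration. The sketch below simulates responses from it; the equal class probabilities and the sample size of 1,000 are hypothetical choices made only for illustration, since neither is given in this part of the article.

```python
import numpy as np

# Table 5: for each pair of items sharing parameters, rows = categories 0..3, columns = classes 1..4
TABLE5 = {
    (1, 7):  [[0.10, 0.82, 0.82, 0.82], [0.30, 0.06, 0.06, 0.06],
              [0.30, 0.06, 0.06, 0.06], [0.30, 0.06, 0.06, 0.06]],
    (2, 8):  [[0.22, 0.88, 0.88, 0.88], [0.26, 0.05, 0.04, 0.06],
              [0.26, 0.05, 0.04, 0.06], [0.26, 0.02, 0.04, 0.00]],
    (3, 9):  [[0.16, 0.79, 0.16, 0.79], [0.28, 0.11, 0.28, 0.11],
              [0.33, 0.05, 0.33, 0.05], [0.23, 0.05, 0.23, 0.05]],
    (4, 10): [[0.25, 0.85, 0.25, 0.85], [0.35, 0.03, 0.35, 0.03],
              [0.20, 0.03, 0.20, 0.03], [0.20, 0.09, 0.20, 0.09]],
    (5, 11): [[0.10, 0.10, 0.46, 0.91], [0.30, 0.30, 0.18, 0.03],
              [0.30, 0.30, 0.18, 0.03], [0.30, 0.30, 0.18, 0.03]],
    (6, 12): [[0.22, 0.22, 0.22, 0.79], [0.24, 0.23, 0.22, 0.06],
              [0.20, 0.17, 0.12, 0.04], [0.34, 0.38, 0.44, 0.11]],
}

def expand_items(table, n_items=12, n_cat=4, n_class=4):
    """Return an array probs[item, category, class] with one slice per item."""
    probs = np.zeros((n_items, n_cat, n_class))
    for items, p in table.items():
        for i in items:
            probs[i - 1] = np.asarray(p)
    return probs

def simulate(probs, class_probs, n, rng):
    """Draw a latent class for each person, then each item's category from that class's column."""
    n_items, n_cat, n_class = probs.shape
    classes = rng.choice(n_class, size=n, p=class_probs)
    data = np.empty((n, n_items), dtype=int)
    for i in range(n_items):
        for c in range(n_class):
            members = classes == c
            col = probs[i, :, c] / probs[i, :, c].sum()  # renormalize: table entries are rounded
            data[members, i] = rng.choice(n_cat, size=int(members.sum()), p=col)
    return classes, data

probs = expand_items(TABLE5)
assert np.allclose(probs.sum(axis=1), 1.0)  # categories sum to one within each item and class
rng = np.random.default_rng(2020)
# hypothetical settings: equal class probabilities and N = 1000 (not taken from the article)
classes, data = simulate(probs, [0.25, 0.25, 0.25, 0.25], n=1000, rng=rng)
print(data.shape, np.bincount(data[:, 0], minlength=4))
```

Tabulating the simulated categories within each true class is a quick check that the draws follow the columns of Table 5.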
Table 6. Data illustration polytomous data: model comparison for exploratory LCMs.
C    #np    AIC       BIC
2    72     25,082    25,440
3    107    24,616    25,151
4    143    24,431    25,148
5    179    24,439    25,337
6    215    24,444    25,524
Note: C = number of classes; #np = number of estimated parameters.
Table 7. Data illustration polytomous data: model comparison for different RLCMs with four classes.
Appr.   Fused           Eq.    C    λ1     λ2     #np    #nreg   BIC
R1      Class           (10)   4    0.31   –      84     63      24,982
R2      Cat             (11)   4    0.18   –      68     79      24,689
R3      Cat and Class   (12)   4    0.40   0.15   44     103     24,836
R4      Grouped Cat     (13)   4    0.45   –      82     65      24,777
R5      Grouped Class   (14)   4    0.65   –      79     67      24,776
Note: Appr. = approach; Cat = category; Eq. = equation for regularization penalty in Section 3.2; C = number of classes; #np = number of estimated parameters; #nreg = number of regularized item parameters.
Table 8. Data illustration polytomous data: estimated item response probabilities in the RLCM with C = 4 classes and fused regularization among classes based on the minimal BIC.
           Class                                    Class                                    Class
Item  Cat  1      2      3      4        Item  Cat  1      2      3      4        Item  Cat  1      2      3      4
1     0    0.07   0.79   0.82   0.82     5     0    0.10   0.13   0.46   0.92     9     0    0.13   0.82   0.10   0.82
1     1    0.31   0.07   0.06   0.06     5     1    0.30   0.29   0.18   0.02     9     1    0.29   0.06   0.30   0.06
1     2    0.31   0.07   0.06   0.06     5     2    0.30   0.29   0.18   0.02     9     2    0.29   0.06   0.30   0.06
1     3    0.31   0.07   0.06   0.06     5     3    0.30   0.29   0.18   0.04     9     3    0.29   0.06   0.30   0.06
2     0    0.22   0.85   0.87   0.91     6     0    0.22   0.24   0.30   0.76     10    0    0.20   0.89   0.37   0.82
2     1    0.26   0.05   0.01   0.06     6     1    0.26   0.18   0.32   0.07     10    1    0.42   0.05   0.25   0.05
2     2    0.26   0.05   0.01   0.03     6     2    0.26   0.18   0.06   0.03     10    2    0.19   0.01   0.13   0.03
2     3    0.26   0.05   0.11   0.00     6     3    0.26   0.40   0.32   0.14     10    3    0.19   0.05   0.25   0.10
3     0    0.16   0.74   0.19   0.80     7     0    0.07   0.79   0.85   0.85     11    0    0.10   0.16   0.55   0.91
3     1    0.28   0.16   0.27   0.10     7     1    0.31   0.09   0.05   0.05     11    1    0.30   0.28   0.15   0.03
3     2    0.28   0.05   0.27   0.05     7     2    0.31   0.09   0.05   0.05     11    2    0.30   0.28   0.15   0.03
3     3    0.28   0.05   0.27   0.05     7     3    0.31   0.03   0.05   0.05     11    3    0.30   0.28   0.15   0.03
4     0    0.26   0.88   0.28   0.85     8     0    0.25   0.86   0.91   0.88     12    0    0.24   0.20   0.20   0.76
4     1    0.36   0.05   0.24   0.04     8     1    0.25   0.06   0.03   0.06     12    1    0.21   0.21   0.35   0.10
4     2    0.19   0.02   0.24   0.02     8     2    0.25   0.06   0.03   0.06     12    2    0.21   0.21   0.10   0.04
4     3    0.19   0.05   0.24   0.09     8     3    0.25   0.02   0.03   0.00     12    3    0.34   0.38   0.35   0.10
Note: Cat = category.
Table 9. The last series of Raven’s standard progressive matrices (SPM-LS) polytomous data: percentage frequencies and recoding table.
Item    Cat0       Cat1       Cat2       Cat3       Cat4       Cat5       Cat6       Cat7
SPM1    76.0 (7)   13.6 (3)   3.0 (1)    2.4 (4)    2.2 (6)    2.0 (2)    0.8 (5)    –
SPM2    91.0 (6)   3.0 (3)    2.4 (4)    2.2 (1)    0.8 (5)    0.4 (7)    0.2 (2)    –
SPM3    80.4 (8)   8.0 (2)    4.2 (6)    2.0 (4)    1.8 (3)    1.6 (5)    1.2 (7)    0.8 (1)
SPM4    82.4 (2)   5.6 (3)    3.2 (5)    2.6 (1)    2.2 (8)    1.8 (6)    1.2 (7)    1.0 (4)
SPM5    85.6 (1)   3.8 (2)    3.0 (3)    2.6 (7)    1.8 (6)    1.6 (5)    1.0 (4)    0.6 (8)
SPM6    76.4 (5)   7.0 (4)    5.2 (6)    3.0 (3)    2.8 (7)    2.6 (8)    2.0 (2)    1.0 (1)
SPM7    70.1 (1)   6.6 (4)    5.8 (5)    5.4 (3)    4.4 (8)    3.4 (6)    2.4 (7)    1.8 (2)
SPM8    58.1 (6)   7.6 (1)    7.0 (3)    6.6 (8)    6.4 (2)    6.2 (5)    5.8 (7)    2.2 (4)
SPM9    57.3 (3)   12.0 (5)   9.0 (1)    7.2 (4)    6.6 (8)    4.0 (7)    3.0 (2)    0.8 (6)
SPM10   39.5 (2)   17.2 (6)   11.2 (7)   8.0 (3)    7.8 (8)    7.4 (5)    6.0 (4)    2.8 (1)
SPM11   35.7 (4)   14.0 (1)   13.8 (7)   9.8 (5)    9.4 (6)    8.0 (3)    6.6 (2)    2.6 (8)
SPM12   32.5 (5)   15.4 (2)   14.2 (3)   10.4 (1)   8.2 (4)    8.2 (7)    7.4 (6)    3.6 (8)
Note: Numbers in parentheses denote the original item category.
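Table 9 is consistent with a simple recoding rule: within each item, the original response options are ranked by decreasing frequency, so that Cat0 is the most frequent option. The sketch below implements that rule as an assumption inferred from the table, not as the article's stated procedure. Ties are broken by the original option number, which reproduces the SPM12 row, where options 4 and 7 both have 8.2%.

```python
from collections import Counter

def recode_by_frequency(freq):
    """Map original options to categories 0, 1, 2, ... by decreasing frequency.

    Assumed rule inferred from Table 9; ties are broken by the smaller original option number.
    """
    ranked = sorted(freq, key=lambda option: (-freq[option], option))
    return {option: new for new, option in enumerate(ranked)}

# Table 9, item SPM12: original option -> percentage frequency
spm12 = {1: 10.4, 2: 15.4, 3: 14.2, 4: 8.2, 5: 32.5, 6: 7.4, 7: 8.2, 8: 3.6}
print(recode_by_frequency(spm12))  # {5: 0, 2: 1, 3: 2, 1: 3, 4: 4, 7: 5, 6: 6, 8: 7}

# The same function works on raw responses (hypothetical example data)
responses = [5, 5, 2, 5, 3]
print(recode_by_frequency(Counter(responses)))  # {5: 0, 2: 1, 3: 2}
```

Applied to raw responses, the function yields the recoding table directly; options that are never chosen simply receive no category, as for Cat7 of SPM1 and SPM2.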
Table 10. SPM-LS dichotomous data: model comparison for exploratory LCMs and RLCM.
Model   C    λ      #np    #nreg   BIC
LCM     2    0      25     0       5973
        3    0      38     0       5721
        4    0      51     0       5680
        5    0      64     0       5696
        6    0      77     0       5694
RLCM    2    0.01   25     0       5973
        3    0.33   35     3       5715
        4    0.38   39     12      5643
        5    0.29   45     19      5621
        6    0.53   45     32      5620
Note: C = number of classes; λ = regularization parameter of the selected model with minimal BIC; #np = number of estimated parameters; #nreg = number of regularized parameters.
Table 11. SPM-LS dichotomous data: estimated item probabilities and latent class probabilities for best fitting RLCM with C = 5 latent classes.
             Class
Item         C1     C2     C3     C4     C5
$p_c$        0.12   0.04   0.40   0.07   0.37
SPM1         0.39   0.39   0.83   0.83   0.83
SPM2         0.57   0.57   0.99   0.86   0.99
SPM3         0.33   0.00   0.86   0.96   0.96
SPM4         0.05   1.00   0.91   0.60   1.00
SPM5         0.08   1.00   0.96   0.77   1.00
SPM6         0.07   0.07   0.85   0.85   0.97
SPM7         0.20   0.83   0.58   0.83   0.95
SPM8         0.06   0.69   0.36   1.00   0.90
SPM9         0.16   0.34   0.34   1.00   0.90
SPM10        0.00   0.23   0.23   0.00   0.79
SPM11        0.14   0.00   0.14   0.00   0.77
SPM12        0.11   0.62   0.11   0.11   0.62
$\bar{p}_c$  0.18   0.48   0.60   0.65   0.89
Note: $p_c$ = latent class probability; $\bar{p}_c$ = average of the item probabilities within class $c$.
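The parameter counts in Table 10 can be related to the estimates in Table 11. Assuming that classes sharing the same estimate for an item count as one fused parameter (a heuristic that can also merge estimates that merely coincide at a boundary such as 0.00), the sketch counts the distinct values per item and adds C − 1 free class probabilities.

```python
# Table 11: estimated item probabilities for the C = 5 RLCM (rows SPM1..SPM12, columns C1..C5)
TABLE11 = [
    [0.39, 0.39, 0.83, 0.83, 0.83],
    [0.57, 0.57, 0.99, 0.86, 0.99],
    [0.33, 0.00, 0.86, 0.96, 0.96],
    [0.05, 1.00, 0.91, 0.60, 1.00],
    [0.08, 1.00, 0.96, 0.77, 1.00],
    [0.07, 0.07, 0.85, 0.85, 0.97],
    [0.20, 0.83, 0.58, 0.83, 0.95],
    [0.06, 0.69, 0.36, 1.00, 0.90],
    [0.16, 0.34, 0.34, 1.00, 0.90],
    [0.00, 0.23, 0.23, 0.00, 0.79],
    [0.14, 0.00, 0.14, 0.00, 0.77],
    [0.11, 0.62, 0.11, 0.11, 0.62],
]

def count_parameters(item_probs):
    """Heuristic: equal estimates within an item are treated as one fused parameter."""
    n_classes = len(item_probs[0])
    n_item_free = sum(len(set(row)) for row in item_probs)  # distinct values per item
    n_item_total = len(item_probs) * n_classes               # unregularized LCM item parameters
    n_par = n_item_free + (n_classes - 1)                    # plus free class probabilities
    n_reg = n_item_total - n_item_free                       # parameters removed by fusion
    return n_par, n_reg

print(count_parameters(TABLE11))  # (45, 19)
```

The 12 items contribute 41 distinct values, so the count gives #np = 41 + 4 = 45 and #nreg = 60 − 41 = 19, matching the C = 5 row of the RLCM in Table 10.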
Table 12. SPM-LS polytomous data: estimated item response probabilities and latent class probabilities for best-fitting RLCM with C = 3 latent classes for items SPM10, SPM11 and SPM12.
Item    Cat   C1     C2     C3        Item    Cat   C1     C2     C3        Item    Cat   C1     C2     C3
SPM10   0     0.08   0.00   0.56      SPM11   0     0.11   0.26   0.48      SPM12   0     0.03   0.24   0.45
SPM10   1     0.14   0.12   0.19      SPM11   1     0.18   0.43   0.10      SPM12   1     0.14   0.09   0.17
SPM10   2     0.26   0.40   0.03      SPM11   2     0.21   0.00   0.12      SPM12   2     0.19   0.44   0.10
SPM10   3     0.10   0.08   0.07      SPM11   3     0.15   0.12   0.07      SPM12   3     0.23   0.03   0.06
SPM10   4     0.13   0.00   0.06      SPM11   4     0.14   0.00   0.08      SPM12   4     0.12   0.00   0.07
SPM10   5     0.10   0.08   0.06      SPM11   5     0.08   0.19   0.07      SPM12   5     0.14   0.00   0.06
SPM10   6     0.10   0.32   0.03      SPM11   6     0.07   0.00   0.07      SPM12   6     0.07   0.20   0.07
SPM10   7     0.09   0.00   0.00      SPM11   7     0.06   0.00   0.01      SPM12   7     0.08   0.00   0.02
Note: Cat = category.

Share and Cite

MDPI and ACS Style

Robitzsch, A. Regularized Latent Class Analysis for Polytomous Item Responses: An Application to SPM-LS Data. J. Intell. 2020, 8, 30. https://doi.org/10.3390/jintelligence8030030

AMA Style

Robitzsch A. Regularized Latent Class Analysis for Polytomous Item Responses: An Application to SPM-LS Data. Journal of Intelligence. 2020; 8(3):30. https://doi.org/10.3390/jintelligence8030030

Chicago/Turabian Style

Robitzsch, Alexander. 2020. "Regularized Latent Class Analysis for Polytomous Item Responses: An Application to SPM-LS Data" Journal of Intelligence 8, no. 3: 30. https://doi.org/10.3390/jintelligence8030030

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

