Mathematics | Article | Open Access | 2 December 2025
Robustness of Identifying Item–Trait Relationships Under Non-Normality in MIRT Models

1 Academy for Advanced Interdisciplinary Studies & Key Laboratory of Applied Statistics of MOE, Northeast Normal University, Changchun 130024, China
2 Shanghai Zhangjiang Institute of Mathematics, Shanghai 201203, China
3 College of Education, Zhejiang Normal University, Jinhua 321004, China
4 School of Psychology & Key Laboratory of Applied Statistics of MOE, Northeast Normal University, Changchun 130024, China

Abstract

Identifying item–trait relationships is a core task in multidimensional item response theory (MIRT). Common empirical approaches include exploratory item factor analysis (EIFA) with rotations, the expectation maximization-based L1 regularization (EML1) algorithm, and the expectation model selection (EMS) algorithm. While these methods typically assume multivariate normality of latent traits, empirical data often deviate from this assumption. This study evaluates the robustness of EIFA, EML1, and EMS when latent traits violate the normality assumption. Using the independent generator transform, we generate latent variables under varying levels of skewness, excess kurtosis, numbers of non-normal dimensions, and inter-factor correlations. We then assess the performance of each method in terms of the F1-score for identifying item–trait relationships and the mean squared error (MSE) of parameter estimates. The results indicate that non-normality generally leads to a reduction in F1-score and an increase in MSE. For the F1-score, EMS performs best with small samples (e.g., N = 500), whereas EIFA with rotations yields the highest F1-score in larger samples. In terms of estimation accuracy, EMS and EML1 generally yield lower MSEs than EIFA. The effects of non-normality are also demonstrated by applying these methods to a real data set from the Depression, Anxiety, and Stress Scale.

1. Introduction

In psychological and educational testing, dichotomous and polytomous items are widely used to assess individuals’ latent traits or abilities. When multiple latent dimensions are involved, a central task is to identify the underlying relationships between items and latent traits. Traditionally, such relationships are determined a priori by domain experts based on their knowledge of item content and construct definitions. However, the correct specification of item–trait associations is critical, not only for accurate model calibration but also for valid individual assessment. Misclassification of these associations can result in serious model misfit and flawed diagnostic interpretations. Therefore, an important question is to empirically estimate the underlying item–trait relationships from the data. In this paper, we consider this question within the framework of MIRT models.
More specifically, in a MIRT model each subject is characterized by a random latent trait vector $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_K)'$. The probability of a correct response to item j is modeled as $P(Y_j = 1 \mid \boldsymbol{\theta}) = F(\mathbf{a}_j'\boldsymbol{\theta} + b_j)$, where $\mathbf{a}_j = (a_{j1}, \ldots, a_{jK})'$ and $b_j$ are the discrimination and difficulty parameters, respectively. Essentially, the detection of item–trait relationships corresponds to the identification of the nonzero elements of $\mathbf{a}_j$.
There are three primary approaches for empirically estimating the underlying relationships between items and latent traits. A commonly used approach is exploratory item factor analysis (EIFA). It estimates all parameters (under location and scale constraints on the latent traits), applies an analytic rotation, and then imposes a cutoff on the rotated discrimination matrix to obtain a simpler, interpretable structure with some zero elements [1,2,3]. As an alternative, L1-regularized methods have been proposed [4,5,6]. These methods impose an L1 penalty on the log-likelihood, shrinking small discrimination parameters toward zero. The performance of L1-regularized methods depends critically on the choice of regularization parameters, which are typically selected by minimizing the Bayesian Information Criterion (BIC; [7]). To avoid the selection of regularization parameters, Xu et al. [8] develops the expectation model selection (EMS) algorithm and Shang et al. [9] proposes a generalized EMS algorithm. Both algorithms directly minimize BIC to explore item–trait relationships. Further methodological details are provided in Section 2.
It should be noted that most of the preceding methods assume that the latent traits $\boldsymbol{\theta}$ in Equation (1) follow a multivariate normal distribution. Empirical evidence, however, shows that this assumption is often unrealistic. In clinical assessments, many respondents report no symptoms, creating strong floor effects and non-normal latent traits [10]. Likewise, in psychiatric research, latent variables reflecting psychological disorders are typically positively skewed, as most individuals show low pathology and only a few show severe levels [11]. Similarly, psychological constructs such as depression, pain, and gambling often display non-normal distributions in the general population [12]. Even when the population distribution is normal, non-random sampling techniques can induce non-normality in the sampled latent traits [13]. For example, when analyzing test scores solely from honors program students, one may obtain a negatively skewed ability distribution.
Many studies investigate the impact of non-normal latent traits on parameter estimation in item response theory (IRT) models. Stone [14] reports that estimation bias increases as latent traits deviate from normality. Finch and Edwards [15] further demonstrates that severe skewness can still distort discrimination parameters even when the sample size is large. Related findings appear in MIRT models. Svetina et al. [16] shows that skewed latent traits reduce the accuracy of discrimination estimates, particularly under complex loading structures and higher inter-factor correlations. Wang et al. [17] extends this investigation to polytomous data under the multidimensional graded response model (MGRM), showing that full-information maximum likelihood reduces bias more effectively than weighted least squares. More recently, McClure and Jacobucci [18] evaluates the Metropolis–Hastings Robbins–Monro algorithm in high-dimensional MGRM models and observes increasing discrimination bias as dimensionality and factor correlations rise. Collectively, these studies underscore that violations of normality substantially reduce parameter recovery accuracy.
While confirmatory item factor analysis under non-normality is widely studied, the robustness of methods for identifying item–trait relationships remains underexplored. This study evaluates the performance of EIFA with rotations, the L1-regularized method, and the EMS algorithm in recovering item–trait structures when latent traits deviate from normality. To conduct this investigation, a method for generating latent variables with controlled deviations from normality is required. A commonly used technique is the Vale–Maurelli (VM) transformation [19], which allows specification of skewness, kurtosis, and a target correlation matrix. However, prior studies show that the VM method often produces biased or unstable estimates of skewness and kurtosis [20], and its dependence structure is closely related to the multivariate normal copula [21]. To generate stronger forms of multivariate non-normality, Foldnes and Olsson [22] proposes the independent generator (IG) transform, which creates non-normal data through linear combinations of independent generator variables. The resulting distributions have a genuinely non-normal copula, and empirical studies indicate that the IG transform produces more pronounced departures from normality than the VM method [22]. Thus, this study employs the IG transform to simulate latent variables under varying levels of skewness, excess kurtosis, numbers of non-normal latent dimensions, and inter-factor correlations, following Svetina et al. [16], Wang et al. [17], and McClure and Jacobucci [18]. We then evaluate each method's performance using the F1-score for recovering item–trait relationships and the mean squared error (MSE) for parameter estimation. These results provide practical guidance for identifying item–trait relationships under non-normal latent distributions.
The rest of the article is organized as follows. In Section 2, we first review three methods (EIFA with rotations, EML1, and EMS) within the multidimensional 2-parameter logistic (M2PL) model framework. In Section 3, we describe the IG transform and generate two types of non-normal latent trait distributions. We then conduct two simulations to compare the robustness of the methods for identifying item–trait relationships under non-normality in Section 4. In Section 5, we analyze a real data set. Finally, we summarize the key findings and outline directions for future research.

2. Methods for Identifying the Item–Trait Relationships in the M2PL Model

2.1. The M2PL Model

We focus on the M2PL model, one of the most widely used MIRT models. For subject $i = 1, \ldots, N$ and item $j = 1, \ldots, J$, let $y_{ij}$ be the binary response from subject i to item j and $\mathbf{Y} = (y_{ij})_{N \times J}$ be the response matrix. The item response function for subject i on item j is given by
$$F(\mathbf{a}_j'\boldsymbol{\theta}_i + b_j) = \frac{\exp(\mathbf{a}_j'\boldsymbol{\theta}_i + b_j)}{1 + \exp(\mathbf{a}_j'\boldsymbol{\theta}_i + b_j)}. \tag{1}$$
The latent traits $\boldsymbol{\theta}_i$, for $i = 1, \ldots, N$, are assumed to be independent and identically distributed, following a K-dimensional normal distribution $N(\mathbf{0}, \Sigma)$, where $\Sigma = (\sigma_{kk'})_{K \times K}$ is the covariance matrix with unit variances. Conditional on $\boldsymbol{\theta}_i$, item responses are locally independent. Under these assumptions, the marginal log-likelihood is
$$\ell(\mathbf{A}, \mathbf{b}, \Sigma \mid \mathbf{Y}) = \sum_{i=1}^{N} \log \int_{\mathbb{R}^K} \varphi(\boldsymbol{\theta}_i \mid \Sigma) \prod_{j=1}^{J} F(\mathbf{a}_j'\boldsymbol{\theta}_i + b_j)^{y_{ij}} \bigl[1 - F(\mathbf{a}_j'\boldsymbol{\theta}_i + b_j)\bigr]^{1 - y_{ij}} \, d\boldsymbol{\theta}_i, \tag{2}$$
where $\varphi(\boldsymbol{\theta}_i \mid \Sigma)$ is the density of $N(\mathbf{0}, \Sigma)$.
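To make the model concrete, the following Python sketch simulates binary responses from an M2PL model as defined above. The dimensions and parameter ranges mirror the simulation design used later in the paper, but the seed and specific values here are illustrative only.

```python
import numpy as np

def m2pl_prob(A, b, theta):
    """Item response probabilities F(a_j' theta_i + b_j) for the M2PL model.

    A     : (J, K) discrimination matrix
    b     : (J,)   difficulty (intercept) vector
    theta : (N, K) latent trait draws
    Returns an (N, J) matrix of success probabilities.
    """
    logits = theta @ A.T + b          # (N, J)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
N, J, K = 1000, 40, 3
A = rng.uniform(0.5, 2.0, size=(J, K))      # nonzero loadings as in Section 4.1
b = rng.standard_normal(J)
Sigma = np.full((K, K), 0.4)                # inter-factor correlation 0.4
np.fill_diagonal(Sigma, 1.0)
theta = rng.multivariate_normal(np.zeros(K), Sigma, size=N)

P = m2pl_prob(A, b, theta)
Y = rng.binomial(1, P)                      # (N, J) binary response matrix
```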

2.2. Exploratory Item Factor Analysis with Rotations

EIFA estimates the discrimination parameters A and difficulties b by maximizing the log-likelihood function (2) under the assumption that Σ is the identity matrix. A rotation matrix T is then applied to obtain a simpler structure, yielding a rotated discrimination matrix Λ = A T with some elements close to zero. There are two broad classes of rotation in EIFA: orthogonal and oblique. Orthogonal rotations assume that the latent traits remain uncorrelated, while oblique rotations allow for correlations among traits. In this study, we apply oblique rotations, including Quartimin [23], Geomin [24], and Infomax [3]. Each employs distinct criteria to achieve a simple and interpretable factor structure while accommodating inter-trait correlations. To facilitate interpretation, small discrimination parameters are often thresholded to zero.
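The final cutoff step can be illustrated as follows. This is a minimal sketch of the thresholding convention only (the rotation algorithms themselves are more involved), applied to a small hypothetical rotated loading matrix with the 0.30 cutoff used later in this paper.

```python
import numpy as np

def threshold_loadings(Lambda, cutoff=0.30):
    """Zero out rotated discrimination parameters below the cutoff in
    absolute value, yielding a sparse, interpretable structure.
    Returns the thresholded matrix and the implied incidence matrix."""
    Lambda = np.asarray(Lambda, dtype=float)
    M = (np.abs(Lambda) >= cutoff).astype(int)   # estimated incidence matrix
    return Lambda * M, M

# toy rotated loading matrix (hypothetical values)
Lambda = np.array([[1.20, 0.05, -0.10],
                   [0.02, 0.85,  0.31],
                   [0.29, 0.12,  1.45]])
Lambda_sparse, M = threshold_loadings(Lambda)
```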

2.3. Expectation Maximization-Based L1 Regularization Method

In MIRT models, identifying the item–trait relationship structure can be framed as a model selection problem with missing data. Specifically, the dependence of item responses on latent traits follows the framework of a generalized linear model, where the latent traits $\boldsymbol{\theta}_i$ serve as unobserved covariates and the discrimination parameters $\mathbf{a}_j$ act as regression coefficients. Identifying the nonzero entries in $\mathbf{a}_j$ reveals the item's associated traits. This forms a latent variable selection problem [4]. Each possible zero–nonzero configuration of the discrimination matrix $\mathbf{A}$ defines a distinct candidate MIRT model, as it specifies a unique pattern of item–trait associations across all items. Selecting the optimal item–trait structure is thus equivalent to choosing the best model among $2^{JK}$ candidate models. This is fundamentally a model selection problem with missing covariates. Exhaustively evaluating all candidate models is computationally infeasible, even for moderately sized assessments. To address this challenge, L1-regularized methods [4,5,6] have been proposed.
The L1-penalized estimator is defined as
$$(\hat{\mathbf{A}}_\eta, \hat{\mathbf{b}}_\eta, \hat{\Sigma}_\eta) = \arg\max_{\mathbf{A}, \mathbf{b}, \Sigma} \; \bigl\{ \ell(\mathbf{A}, \mathbf{b}, \Sigma \mid \mathbf{Y}) - \eta \|\mathbf{A}\|_1 \bigr\}, \tag{3}$$
where $\|\mathbf{A}\|_1 = \sum_{j=1}^{J} \sum_{k=1}^{K} |a_{jk}|$ denotes the entry-wise L1 norm of the matrix $\mathbf{A}$, and $\eta \ge 0$ is a regularization parameter controlling the sparsity of $\mathbf{A}$. When $\eta = 0$, the estimator reduces to the marginal maximum likelihood estimator, yielding a dense discrimination matrix $\hat{\mathbf{A}}_0$; when $\eta \to \infty$, all entries shrink to zero ($\hat{\mathbf{A}}_\infty = \mathbf{0}$). Thus, choosing $\eta \in (0, \infty)$ appropriately is crucial for recovering the true nonzero elements. Sun et al. [4] and Shang et al. [6] recommend selecting the regularization parameter $\eta$ by minimizing BIC. Recently, Robitzsch [25] proposes a smooth BIC approximation for regularized estimation. It is more efficient and matches or exceeds the performance of L1-regularized methods that select $\eta$ using BIC.
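The shrinkage effect of the L1 penalty can be illustrated with the soft-thresholding operator, which underlies coordinate-wise updates in many L1-penalized estimation schemes. This sketch is illustrative of how small coefficients are driven exactly to zero; it is not the exact update used by EML1.

```python
import numpy as np

def soft_threshold(x, eta):
    """Soft-thresholding operator S(x, eta) = sign(x) * max(|x| - eta, 0).
    Entries smaller than eta in absolute value collapse exactly to zero,
    which is how an L1 penalty produces sparse discrimination estimates."""
    return np.sign(x) * np.maximum(np.abs(x) - eta, 0.0)

a = np.array([1.8, 0.15, -0.4, 0.0])
a_shrunk = soft_threshold(a, 0.2)   # small entries collapse to zero
```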
It should be noted that the M2PL model exhibits rotational indeterminacy. To ensure identifiability, additional constraints must be imposed on the item parameters. In this paper, we adopt the constraint proposed by Sun et al. [4], which assumes that each of the first K items is associated with exactly one distinct latent trait, i.e., $a_{jj} \ne 0$ and $a_{jk} = 0$ for $1 \le j \ne k \le K$. In practice, the constraint should be determined according to prior knowledge of the items and the study as a whole. This identifiability constraint is adopted in subsequent studies, including Xu et al. [8], Shang et al. [6], Shang et al. [9], and Shang et al. [26].
To maximize (3), which involves an integral over the latent variable $\boldsymbol{\theta}_i$, Sun et al. [4] and Shang et al. [6] apply the expectation maximization (EM) algorithm [27]. The EM algorithm is widely used to handle latent variables by treating them as missing data. Accordingly, their approach is referred to as the EM-based L1 regularization method (EML1).

2.4. Expectation Model Selection Algorithm

The aforementioned EML1 can produce sparse estimates of the discrimination parameter matrix. However, its effectiveness depends critically on the selection of an appropriate regularization parameter, which is typically determined by minimizing BIC. To bypass this tuning step, Xu et al. [8] directly minimize BIC using the EMS algorithm [28]. The EMS algorithm is a recently proposed method for model selection with missing data [28]. It extends the EM algorithm by iteratively updating both model structure and parameters to minimize information criteria.
For M2PL models, EMS simultaneously optimizes both the model structure (represented by an incidence matrix $M(\mathbf{A}) = (I(a_{jk} \ne 0))_{J \times K}$) and its parameters to minimize the BIC for the observed data $\mathbf{Y}$:
$$\mathrm{BIC}(M, \mathbf{A}, \mathbf{b}, \Sigma \mid \mathbf{Y}) = -2\,\ell(\mathbf{A}, \mathbf{b}, \Sigma \mid \mathbf{Y}) + \|\mathbf{A}\|_0 \log N,$$
where $\|\mathbf{A}\|_0 = \sum_{j=1}^{J} \sum_{k=1}^{K} I(a_{jk} \ne 0)$ is the L0 norm, representing the number of nonzero elements in $\mathbf{A}$. Specifically, the EMS algorithm alternates between the expectation step (E-step) and the model selection step (MS-step) until convergence. The E-step computes the conditional expectation of the BIC criterion for the complete data, known as the Q-function, for each candidate model. In the MS-step, the algorithm selects the optimal model by minimizing the Q-function over all candidate models and then updates the model (i.e., $M(\mathbf{A})$) together with its parameter estimates (i.e., $\mathbf{A}$, $\mathbf{b}$, $\Sigma$). In practice, the selected optimal model is often a sparse M2PL model with many zero elements in $\mathbf{A}$, which tends to be close to the underlying true model. This adaptive model update improves the EMS algorithm's performance in identifying the item–trait relationships. To ensure parameter identification, EMS adopts the same constraints as EML1.
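The trade-off in the BIC criterion above can be sketched directly: fit is rewarded through the log-likelihood, while each nonzero entry of A adds a log N penalty. The log-likelihood values below are hypothetical placeholders; in the EMS algorithm, the corresponding quantity is the conditional expectation computed in the E-step.

```python
import numpy as np

def bic(loglik, A, N):
    """BIC = -2 * loglik + ||A||_0 * log(N), where ||A||_0 counts the
    nonzero discrimination parameters (the selected item-trait links)."""
    df = int(np.count_nonzero(A))
    return -2.0 * loglik + df * np.log(N)

A_dense  = np.array([[1.2, 0.4], [0.8, 0.9]])
A_sparse = np.array([[1.2, 0.0], [0.0, 0.9]])
# With comparable fit (same hypothetical loglik), the sparser structure
# attains the smaller BIC and would be preferred in the MS-step.
prefer_sparse = bic(-5000.0, A_sparse, N=500) < bic(-5000.0, A_dense, N=500)
```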
Recent research further enhances the computational efficiency. Shang et al. [9] proposes the generalized EMS algorithm, which only requires a decrease in the Q function value during the MS-step. Shang et al. [26] computes the Q function approximately by Gauss–Hermite quadrature in the E-step.

3. Generation of Non-Normal Latent Variables

3.1. The IG Transform

In the independent generator (IG) transform framework proposed by Foldnes and Olsson [22], the non-normal latent trait vector is constructed as $\boldsymbol{\theta} = \mathbf{A}\mathbf{X}$. Here, $\mathbf{X} = (X_1, \ldots, X_s)'$ consists of mutually independent generator variables with zero mean and unit variance, and $\mathbf{A} = (a_{kj})$ is the $K \times s$ coefficient matrix ($s \ge K$). The goal is to choose $\mathbf{A}$ and appropriate distributions for the $X_j$ such that the generated latent vector $\boldsymbol{\theta}$ attains (a) a prespecified covariance matrix $\Sigma$, and (b) prespecified marginal skewness $\alpha_{\theta_k}$ and excess kurtosis $\beta_{\theta_k}$ (i.e., kurtosis minus 3) of $\theta_k$.
To match the covariance structure, $\mathbf{A}$ is chosen such that $\mathbf{A}\mathbf{A}' = \Sigma$. In practice, the square-root matrix of $\Sigma$ may be used, or $\mathbf{A}$ may be constructed using a structural equation modeling formulation, as described in Foldnes and Olsson [22]. Because $\boldsymbol{\theta} = \mathbf{A}\mathbf{X}$, the marginal skewness and excess kurtosis of each latent trait dimension satisfy
$$\alpha_{\theta_k} = \frac{a_{k1}^3 \alpha_{X_1} + a_{k2}^3 \alpha_{X_2} + \cdots + a_{ks}^3 \alpha_{X_s}}{(a_{k1}^2 + a_{k2}^2 + \cdots + a_{ks}^2)^{3/2}}, \tag{4}$$
$$\beta_{\theta_k} = \frac{a_{k1}^4 \beta_{X_1} + a_{k2}^4 \beta_{X_2} + \cdots + a_{ks}^4 \beta_{X_s}}{(a_{k1}^2 + a_{k2}^2 + \cdots + a_{ks}^2)^{2}}. \tag{5}$$
Given target α θ k and β θ k , these systems are solved to obtain the required α X j and β X j . When s > K , the systems often have more unknowns than equations, allowing feasible solutions across a wide range of distributions.
Once feasible solutions are obtained, each generator X j can be simulated from any univariate distribution with mean 0, variance 1, and specified skewness and excess kurtosis (e.g., Fleishman polynomial or Pearson-type distributions). Then, we apply the transform θ = A X to obtain latent trait samples. This procedure can be implemented using the rIG() function in the R package covsim [29].
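Under the assumptions above, Equations (4) and (5) are straightforward to evaluate in the forward direction. The sketch below computes the marginal skewness and excess kurtosis of θ = A X implied by a given coefficient matrix and generator moments; solving for the generator moments given targets, as rIG() does, amounts to inverting these equations.

```python
import numpy as np

def ig_marginal_moments(A, alpha_X, beta_X):
    """Marginal skewness and excess kurtosis of theta = A X implied by
    Equations (4) and (5), given independent generators X_j with unit
    variance, skewness alpha_X[j], and excess kurtosis beta_X[j]."""
    A = np.asarray(A, dtype=float)
    s2 = np.sum(A**2, axis=1)                 # row sums of squared coefficients
    alpha_theta = (A**3 @ alpha_X) / s2**1.5  # Equation (4)
    beta_theta = (A**4 @ beta_X) / s2**2      # Equation (5)
    return alpha_theta, beta_theta

# With A = I, each latent dimension inherits its generator's moments exactly.
alpha, beta = ig_marginal_moments(np.eye(2),
                                  alpha_X=np.array([2.0, 0.0]),
                                  beta_X=np.array([4.0, 0.0]))
```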
As noted by Foldnes and Olsson [22], not all combinations of univariate skewness and excess kurtosis are feasible under the IG framework, even when the moment Equations (4) and (5) are algebraically solvable. Consequently, some target non-normality settings cannot be generated. As shown later, we encounter such infeasible combinations of skewness and excess kurtosis.

3.2. Two Types of Non-Normal Latent Traits

In this paper, we employ the IG transform to systematically simulate two types of latent variable distributions that vary in skewness, excess kurtosis, number of non-normal dimensions, and inter-factor correlations.

3.2.1. Type 1

We first fix the number of non-normal latent variable dimensions at K. All dimensions share the same skewness α and excess kurtosis β . To examine how varying degrees of non-normality in the latent traits affect the performance of identifying item–trait relationships, we consider combinations of skewness α { 0 , 1 , 2 } and excess kurtosis β { 0 , 1 , 4 } . These values span a realistic and interpretable range of latent trait shapes, from near-normal distributions to moderately skewed and heavy-tailed forms. Our selection is informed by prior work such as Curran et al. [30] and Wang et al. [17], who use α = 2 and β = 4 based on analyses of data from several community-based mental-health and substance-use studies. Note that the latent variables follow a normal distribution when both skewness and excess kurtosis are equal to zero. When the inter-factor correlation is 0.4, the combinations α = 2 with β = 0 or β = 1 are infeasible, as noted in the subsection above. Attempts to generate these distributions using the rIG function in the covsim package produce errors, so these cases are excluded. This results in six feasible non-normal combinations.
Figure 1 presents the kernel density curves for the first dimension of four-dimensional non-normal latent variables generated under six different combinations. The densities are based on a sample of N = 1000 subjects with an inter-factor correlation of 0.4. From visual inspection, the combinations $(\alpha, \beta) = (1, 0)$ and $(2, 4)$ represent two pronounced departures from normality. The former corresponds to a moderately right-skewed distribution with no excess kurtosis, whereas the latter reflects a strongly right-skewed distribution with heavier tails. Therefore, we select these two $(\alpha, \beta)$ combinations for our simulation studies.
Figure 1. Kernel density curves of six different degrees of non-normal latent variables.

3.2.2. Type 2

We first fix the total number of latent variables at K, which includes both normally and non-normally distributed dimensions. Following Svetina et al. [16] and Wang et al. [17], we simulate varying degrees of non-normality by altering the number of non-normal dimensions from 0 to K. For each non-normal latent variable, we fix ( α = 2 , β = 4 ) , as this combination represents the greatest deviation from normality as observed in Figure 1.
For example, when K = 3 , the number of non-normal dimensions varies as follows: zero dimensions (non0), the first dimension only (non1), the first two dimensions (non2), and all three dimensions (non3).

4. Simulation Studies

In this section, we compare the robustness of three methods: the improved EMS [26], the accelerated EML1 [6], and EIFA with three oblique rotations (Quartimin, Geomin, and Infomax). In EIFA, a sparse discrimination matrix is obtained by thresholding small discrimination parameter estimates to zero. Note that selecting a suitable threshold for EIFA is important but nontrivial. In practice, discrimination parameters below 0.30 or 0.32 are often treated as negligible [31,32]. Following this convention, and in line with the specification in Shang et al. [9], we set 0.30 as the threshold in our study. All analyses are conducted using publicly accessible R code hosted at https://github.com/xupf900/ (accessed on 24 November 2025). Simulations are run on a 64-bit Windows 10 system configured with an Intel(R) Xeon(R) Gold 5118 CPU (2.30 GHz) and 256 GB of RAM.

4.1. Simulation Design

In the simulations, we consider M2PL models with K = 3 , 5 latent dimensions, while fixing the number of items at J = 40 . The non-zero elements of the discrimination parameter matrices A are independently drawn from a uniform distribution U ( 0.5 , 2 ) . The elements of the vectors b are sampled from the standard normal distribution. The true parameter values for A and b under all scenarios are provided in Section S1 of the Supplementary File.
For the covariance matrix $\Sigma$ of the latent traits, the diagonal elements are fixed at 1, and the off-diagonal elements are set to either 0.4 or 0.6 to represent moderate and higher correlations, respectively. Four sample size levels, N = 500, 1000, 2000, 4000, are considered. To simulate violations of normality, we consider the following two settings:
Simulation 1. 
Type 1 latent variable distributions are used, with ( α , β ) = ( 1 , 0 ) and ( 2 , 4 ) for K = 3 , and ( α , β ) = ( 2 , 4 ) for K = 5 . The correlation is set to 0.4.
Simulation 2. 
Type 2 latent variable distributions with ( α , β ) = ( 2 , 4 ) are used for both K = 3 and K = 5 . The correlation is again set to 0.4.
Under each setting, we draw Z = 200 independent data sets and apply EMS, EML1, and EIFA with rotations to recover the discrimination parameter matrix. To resolve the issue of rotational indeterminacy, the same identification constraint is applied to EMS and EML1, but not to EIFA with rotations. Specifically, we fix the structure of a $K \times K$ submatrix of the discrimination parameter matrix $\mathbf{A}$ to be an identity matrix. For example, when K = 3, items 1, 10, and 19 are constrained to relate only to latent traits 1, 2, and 3, respectively. That is, the structure of $(\mathbf{a}_1, \mathbf{a}_{10}, \mathbf{a}_{19})'$ is fixed to that of the identity matrix. Similarly, for K = 5, we fix $\mathbf{a}_1, \mathbf{a}_5, \mathbf{a}_9, \mathbf{a}_{13}, \mathbf{a}_{17}$ accordingly.
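The identity-submatrix constraint can be sketched as follows: each anchor item's row in A is replaced by the corresponding row of the identity matrix. The anchor indices below are the 0-based positions of items 1, 10, and 19 for the K = 3 case; the function itself is an illustrative helper, not the paper's implementation.

```python
import numpy as np

def apply_identity_constraint(A, anchor_items):
    """Fix the rows of A indexed by anchor_items (0-based) so that anchor
    item k loads only on trait k with unit loading, resolving rotational
    indeterminacy via an identity submatrix."""
    A = np.array(A, dtype=float)
    for k, j in enumerate(anchor_items):
        row = np.zeros(A.shape[1])
        row[k] = 1.0
        A[j] = row
    return A

rng = np.random.default_rng(1)
A = rng.uniform(0.5, 2.0, size=(40, 3))                 # J = 40, K = 3
A_con = apply_identity_constraint(A, anchor_items=[0, 9, 18])  # items 1, 10, 19
```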

4.2. Evaluation Metrics

This paper evaluates the simulation results using three structural recovery metrics: F1-score, true positive rate (TPR), and false positive rate (FPR). In addition, the mean squared error (MSE) is used to assess parameter estimation.
For each replication z, let $\hat{M}^{(z)} = (I(\hat{a}_{jk}^{(z)} \ne 0))_{J \times K}$, where $\hat{a}_{jk}^{(z)}$ denotes the estimate of $a_{jk}$. We define the following quantities (excluding entries fixed by identification constraints):
$$\mathrm{TP}^{(z)} = \sum_{j,k} I(\hat{M}_{jk}^{(z)} = 1, M_{jk} = 1), \quad \mathrm{FP}^{(z)} = \sum_{j,k} I(\hat{M}_{jk}^{(z)} = 1, M_{jk} = 0),$$
$$\mathrm{FN}^{(z)} = \sum_{j,k} I(\hat{M}_{jk}^{(z)} = 0, M_{jk} = 1), \quad \mathrm{TN}^{(z)} = \sum_{j,k} I(\hat{M}_{jk}^{(z)} = 0, M_{jk} = 0).$$
The precision, TPR (i.e., recall), and FPR for replication z are defined as
$$\mathrm{Precision}^{(z)} = \frac{\mathrm{TP}^{(z)}}{\mathrm{TP}^{(z)} + \mathrm{FP}^{(z)}}, \quad \mathrm{FPR}^{(z)} = \frac{\mathrm{FP}^{(z)}}{\mathrm{FP}^{(z)} + \mathrm{TN}^{(z)}},$$
$$\mathrm{TPR}^{(z)} = \mathrm{Recall}^{(z)} = \frac{\mathrm{TP}^{(z)}}{\mathrm{TP}^{(z)} + \mathrm{FN}^{(z)}}.$$
The F1-score, which provides a balanced summary of precision and recall, is computed as
$$\mathrm{F1\text{-}score}^{(z)} = \frac{2 \cdot \mathrm{Precision}^{(z)} \cdot \mathrm{Recall}^{(z)}}{\mathrm{Precision}^{(z)} + \mathrm{Recall}^{(z)}}$$
for the zth replication.
The MSE measures the average of the squared errors and is calculated for each parameter $a_{jk}$ as
$$\mathrm{MSE}(a_{jk}) = \frac{1}{Z} \sum_{z=1}^{Z} \bigl(\hat{a}_{jk}^{(z)} - a_{jk}\bigr)^2, \tag{6}$$
where Z = 200 is the total number of replications. The MSE for each parameter $b_j$ in $\mathbf{b}$ is computed analogously.
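The structural recovery metrics above translate directly into code. The sketch below computes the F1-score, TPR, and FPR for a single replication from the estimated and true incidence matrices; the toy matrices are illustrative, and the optional mask mimics the exclusion of constraint-fixed entries.

```python
import numpy as np

def structure_metrics(M_hat, M_true, mask=None):
    """F1-score, TPR, and FPR for a recovered incidence matrix M_hat
    against the true M_true; `mask` flags entries fixed by identification
    constraints, which are excluded from the counts."""
    keep = np.ones(M_true.shape, dtype=bool) if mask is None else ~mask
    mh = M_hat[keep].astype(bool)
    mt = M_true[keep].astype(bool)
    tp = np.sum(mh & mt)
    fp = np.sum(mh & ~mt)
    fn = np.sum(~mh & mt)
    tn = np.sum(~mh & ~mt)
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)                       # recall
    fpr = fp / (fp + tn)
    f1 = 2 * precision * tpr / (precision + tpr)
    return f1, tpr, fpr

M_true = np.array([[1, 0], [1, 1], [0, 1]])   # toy 3-item, 2-trait structure
M_hat  = np.array([[1, 0], [1, 0], [1, 1]])
f1, tpr, fpr = structure_metrics(M_hat, M_true)
```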

4.3. Results

4.3.1. Results for Simulation 1

This subsection presents results for simulation 1. Table 1 provides the mean and standard deviation (in parentheses) of the F1-score for all methods. From this table, EMS achieves the highest F1-score when the sample size is small ( N = 500 ) across all normal and non-normal conditions. However, its performance does not improve as N increases; in fact, the F1-score decreases slightly for larger samples, particularly when K = 5 . This decline suggests that EMS may lack asymptotic consistency in recovering the item–trait relationships. In contrast, the performance of EML1 and EIFA with rotations improves steadily with N, indicating stronger large-sample consistency. For N = 1000 , 2000 , 4000 , EIFA with rotations consistently outperforms both EMS and EML1. The only exception occurs at ( α , β ) = ( 0 , 0 ) with K = 3 and N = 4000 , where EML1 achieves the highest F1-score. Among rotations, Quartimin, Geomin, and Infomax produce very similar results.
Table 1. Mean and standard deviation of the F1-score of M ^ under simulation setting 1, where the correlation coefficient is 0.4.
Regarding the effect of non-normality, all methods show some degradation in F1-score relative to the normal case in most settings. Larger departures from normality generally lead to greater decreases in performance. This trend is particularly evident for the combination ( α , β ) = ( 2 , 4 ) , where most methods achieve their lowest F1-scores. However, for K = 3 and larger sample sizes ( N = 2000 and N = 4000 ), EIFA with rotations shows strong robustness and, in several instances, even benefits from moderate non-normality, achieving an F1-score of 1 in multiple conditions.
Figure 2 shows the boxplots of the MSEs of $\hat{\mathbf{A}}$. Note that these boxplots are based on the $J \times K$ element-wise values of $\mathrm{MSE}(a_{jk})$ defined in Equation (6). From Figure 2, EMS and EML1 estimate $\mathbf{A}$ better than EIFA with rotations in most cases, especially when $\alpha = 2$ and $\beta = 4$. This may be because EMS and EML1 estimate $\mathbf{A}$ by directly optimizing penalized likelihoods, whereas EIFA uses marginal maximum likelihood followed by rotation and thresholding; the extra cutoff step can introduce small distortions and reduce estimation accuracy. Note that while larger sample sizes generally reduce the MSE of $\hat{\mathbf{A}}$, some non-normal conditions still yield higher MSEs than the normal cases, as shown in Figure 2. Figure 3 shows the boxplots of the MSEs of $\hat{\mathbf{b}}$. Non-normality increases the MSE, and larger sample sizes do not noticeably reduce it. For $(\alpha, \beta) = (2, 4)$, EMS gives the smallest MSE, but for $(\alpha, \beta) = (1, 0)$, EIFA with rotations performs best. Thus, no method consistently dominates under non-normality. This may be because all three methods estimate $\mathbf{b}$ directly through likelihood-based optimization.
Figure 2. Boxplots of the MSEs of A ^ under simulation 1.
Figure 3. Boxplots of the MSEs of b ^ under simulation 1.

4.3.2. Results for Simulation 2

This subsection presents results from simulation 2. Table 2 reports the mean of F1-scores (with standard deviations) for M ^ under simulation setting 2, where the inter-factor correlation is fixed at 0.4 . A clear pattern emerges as the sample size increases. When N = 500 , EMS achieves the highest F1-scores for both K = 3 and K = 5 . This is likely because its MS-step minimizes the expected BIC and tends to select a sparse discrimination matrix A that is close to the true M2PL structure. The reduced model complexity improves structural recovery in small samples. However, as the sample size increases, the performance of EMS declines. In contrast, the F1-scores of EML1 and EIFA with rotations steadily increase with N. As a result, EML1 and EIFA with rotations outperform EMS for large N. This pattern implies that EML1 and EIFA with rotations exhibit better consistency properties than EMS.
Table 2. Mean and standard deviation of the F1-score of M ^ under simulation 2.
In addition, we can see a systematic impact of non-normality from Table 2. As the number of non-normal latent dimensions increases, all methods exhibit a decline in F1-score, especially when N = 500 . However, for large sample sizes, the negative effect of non-normality on EIFA with rotations is minimal. EIFA with rotations remains comparatively robust and continues to achieve F1-scores extremely close to 1. This robustness may stem from the consistency properties that EIFA with rotations enjoys under normality, which appear to extend favorably to mildly non-normal settings.
Figure 4 shows the boxplots of the MSEs of A ^ . From this figure, the MSE increases with the number of non-normal latent variables. Overall, EMS and EML1 outperform EIFA with rotations in terms of estimation accuracy. While a larger sample size helps reduce the MSE, it does not fully eliminate the adverse effects of non-normality. This behavior is consistent with the results observed in simulation 1. Figure 5 displays the boxplots of the MSEs of b ^ . From Figure 5, all methods perform similarly under normal conditions; yet, EMS and EML1 outperform EIFA with rotations in terms of estimation accuracy for most non-normal cases. Increasing the sample size significantly reduces the MSE of b ^ under both normal and non-normal conditions.
Figure 4. Boxplots of the MSEs of A ^ under simulation 2.
Figure 5. Boxplots of the MSEs of b ^ under simulation 2.

4.4. Additional Simulation Results

As higher inter-factor correlations are common in psychological research, we conduct additional simulation studies using a stronger correlation of 0.6. As expected, this stronger correlation decreases the F1-scores and increases the MSEs. The comparative behaviors of EMS, EML1, and EIFA with rotations remain similar to those observed in simulations 1 and 2. Therefore, these results are moved to the Appendix A to save space in the main paper.
Moreover, we report additional simulation results in the Supplementary File, including the TPR and FPR for $\hat{M}$, the biases of $\hat{\mathbf{A}}$ and $\hat{\mathbf{b}}$, the MSEs and biases of $\hat{\Sigma}$, and the CPU time. From the TPR and FPR results, we observe that when the sample size is large (N = 4000), EMS attains a TPR close to 1 but also a non-negligible FPR, which leads to a poorer F1-score in recovering item–trait relationships. This finding suggests that EMS may require stronger penalization of the likelihood, for example, by employing the extended Bayesian information criterion (EBIC), which is commonly used in high-dimensional settings. Further empirical and theoretical work is needed to explore this direction.
Regarding the effect of skewness, higher skewness results in larger bias in the estimates of the difficulty parameters b ^ . In terms of computational cost, EML1 is the slowest method, likely because it must evaluate multiple regularization parameters in (3). EMS is also slower than EIFA with rotations, presumably due to its greater number of iterations. In future work, we plan to adopt the smooth approximation of BIC for regularized estimation proposed by Robitzsch [25] to improve computational efficiency.

5. Real Data Analysis

Psychological constructs such as depression, pain, and gambling are particularly likely to exhibit non-normal distributions in the general population [12]. To illustrate how such non-normal latent traits affect performance in latent variable selection, we apply EMS, EML1, and EIFA with rotations to a real data set based on the Depression, Anxiety, and Stress Scale (42-item version, DASS-42) under the M2PL model. The DASS-42 data set, which contains responses from 5000 subjects to 42 items, is publicly available at https://osf.io/ykq2a/ (accessed on 8 May 2025). The items are reordered such that items 1–14, 15–28, and 29–42 correspond to the latent traits of depression (D), anxiety (A), and stress (S), respectively [33]. The full item content and measurement structure are presented in Table A2 in Appendix B. Responses are originally collected on a 4-point scale (0 = “Did not apply to me at all”; 1 = “Applied to me to some degree, or some of the time”; 2 = “Applied to me to a considerable degree, or a good part of time”; 3 = “Applied to me very much, or most of the time”). In our analysis, we dichotomize the responses by retaining 0 as 0 (symptom absence) and recoding categories 1–3 as 1 (symptom presence).
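The dichotomization described above is a one-line recode; the following sketch uses a small made-up response array in place of the actual DASS-42 data.

```python
import numpy as np

# DASS-42 responses on the 0-3 scale: 0 stays 0 (symptom absence),
# categories 1-3 collapse to 1 (symptom presence).
responses = np.array([[0, 2, 3, 1],
                      [1, 0, 0, 3]])   # subjects x items (illustrative)
binary = (responses > 0).astype(int)
print(binary)
# [[0 1 1 1]
#  [1 0 0 1]]
```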
To verify the non-normality of the latent traits involved in this dataset, we compute factor scores for each individual by summing their responses on dimensions D, A, and S. As shown in Figure 6 and Table 3, all three latent trait distributions exhibit substantial deviations from normality, confirming the presence of non-normal latent traits in the DASS-42 data.
Figure 6. QQ plots, histograms, and density curves for depression, anxiety, and stress.
Table 3. Results of normality tests for depression, anxiety, and stress.
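A descriptive check of this kind can be sketched as below. The data here are simulated stand-ins for the dichotomized DASS-42 responses, and the specific statistics (skewness, excess kurtosis, Shapiro–Wilk) are assumptions about what Table 3 reports.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
binary = rng.integers(0, 2, size=(5000, 42))   # stand-in for the real data

# Items 1-14, 15-28, 29-42 map to depression, anxiety, stress.
blocks = {"D": slice(0, 14), "A": slice(14, 28), "S": slice(28, 42)}
for trait, cols in blocks.items():
    score = binary[:, cols].sum(axis=1)        # per-subject sum score
    skew = stats.skew(score)
    kurt = stats.kurtosis(score)               # excess kurtosis
    _, p = stats.shapiro(score[:500])          # Shapiro-Wilk on a subsample
    print(trait, round(skew, 3), round(kurt, 3), "reject normality:" , p < 0.05)
```

With the real responses in place of the simulated array, marked skewness or heavy tails in these sum scores would be the empirical signal of non-normal latent traits.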
Next, we compare the performance of EMS, EML1, and EIFA with rotations in identifying the item–trait relationships in this dataset. To ensure identifiability, we designate one item for each trait based on item content, following Xu et al. [8]. Specifically, items 1, 15, and 29 are exclusively assigned to traits D, A, and S, respectively. The loading structures estimated by EMS, EML1, and EIFA with rotations are visualized as heatmaps in Figure 7. The estimated A ^ and b ^ are reported in Table A3, Table A4, Table A5, Table A6 and Table A7 in Appendix B, and the estimates Σ ^ are
$$
\hat{\Sigma}_{\mathrm{EMS}} = \begin{pmatrix} 1.000 & 0.393 & 0.399 \\ 0.393 & 1.000 & 0.507 \\ 0.399 & 0.507 & 1.000 \end{pmatrix}, \quad
\hat{\Sigma}_{\mathrm{EML1}} = \begin{pmatrix} 1.000 & 0.457 & 0.520 \\ 0.457 & 1.000 & 0.611 \\ 0.520 & 0.611 & 1.000 \end{pmatrix},
$$
$$
\hat{\Sigma}_{\mathrm{Quartimin}} = \begin{pmatrix} 1.000 & 0.601 & 0.652 \\ 0.601 & 1.000 & 0.700 \\ 0.652 & 0.700 & 1.000 \end{pmatrix}, \quad
\hat{\Sigma}_{\mathrm{Geomin}} = \begin{pmatrix} 1.000 & 0.574 & 0.664 \\ 0.574 & 1.000 & 0.722 \\ 0.664 & 0.722 & 1.000 \end{pmatrix},
$$
$$
\hat{\Sigma}_{\mathrm{Infomax}} = \begin{pmatrix} 1.000 & 0.672 & 0.750 \\ 0.672 & 1.000 & 0.775 \\ 0.750 & 0.775 & 1.000 \end{pmatrix}.
$$
Figure 7. Heatmaps of the loading matrices estimated by EMS, EML1, and EIFA with rotations for the DASS-42 data set.
It can be seen that the estimated correlations among the latent traits are relatively high, with those obtained by EIFA with rotations clearly exceeding those produced by EMS and EML1. This suggests that the constructs are interconnected and mutually influential, even though each DASS-42 item is originally designed to measure a single psychological construct (D, A, or S). In other words, the constructs are not entirely distinct but partially overlap in their manifestations. For example, items 27 and 33 are found to be associated with both anxiety and stress.
While the designed item–trait relationships serve as the benchmark, it is important to acknowledge that multiple psychological constructs may interact when subjects respond to items. Comparing the estimated loading structures with this benchmark, the F1-scores of EMS, EML1, and EIFA with rotations (Quartimin, Geomin, and Infomax) are 0.582, 0.609, 0.724, 0.724, and 0.706, respectively. Although EIFA with rotations achieves the highest F1-scores and produces the sparsest loading structures, none of the methods attains high accuracy in identifying item–trait relationships under the stronger inter-factor correlations and non-normal latent trait distributions present in these data. These findings from the DASS-42 data analysis are consistent with our simulation results.

6. Discussion

This study evaluates the robustness of EMS, EML1, and EIFA with rotations for identifying item–trait relationships under non-normal latent distributions in MIRT models. Our simulation results suggest the following. For identifying item–trait relationships, EMS is preferable for small samples, whereas EIFA with rotations performs better for large samples. For estimation accuracy, EMS and EML1 generally outperform EIFA with rotations. EMS likely achieves higher accuracy because it directly optimizes a penalized likelihood, while EIFA relies on thresholding small loadings, which can reduce precision. A potential improvement is to apply marginal maximum likelihood estimation after thresholding to enhance estimation accuracy.
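The thresholding step attributed to EIFA above can be sketched as follows. The cutoff of 0.3 is a common convention, not necessarily the one used in this study, and the rotated loadings shown are made up.

```python
import numpy as np

def threshold_loadings(A_rotated, cutoff=0.3):
    """Zero out rotated loadings below the cutoff and return the sparse
    loading matrix together with the implied binary structure M_hat."""
    A = np.asarray(A_rotated, dtype=float).copy()
    A[np.abs(A) < cutoff] = 0.0
    return A, (A != 0.0).astype(int)

A_rot = np.array([[0.82, 0.05],
                  [0.12, 0.71],
                  [0.45, 0.33]])
A_sparse, M_hat = threshold_loadings(A_rot)
print(M_hat)
# [[1 0]
#  [0 1]
#  [1 1]]
```

Because the surviving loadings are simply the rotated estimates rather than re-estimated values, precision can suffer; refitting by marginal maximum likelihood on the thresholded structure, as suggested above, would address this.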
The simulation results align closely with previous IRT research. Consistent with Svetina et al. [16] and McClure and Jacobucci [18], we also find that higher inter-factor correlations significantly exacerbate the adverse effects of non-normality. When the correlation increases to 0.6, the impact of non-normality on parameter estimation becomes notably stronger than when the correlation is 0.4. Furthermore, consistent with Finch and Edwards [15], McClure and Jacobucci [18], and Wall et al. [34], a large sample size helps reduce, but does not eliminate, estimation errors. These errors remain larger than those under normality, especially when the number of latent traits is five.
To address violations of normality, more robust methods are needed. One direction is to allow flexible latent trait distributions, such as semi-nonparametric forms [35], multivariate skew-normal models [36], or centered skew-t distributions [37]. We plan to extend EMS and EML1 to these distributional frameworks in future work. Another direction is limited-information estimation based on tetrachoric or polychoric correlations. These methods avoid full multivariate integration and depend only on low-order summary statistics, making them faster and less sensitive to distributional assumptions. Inspired by Huang [38], who uses penalized least squares for ordinal structural equation modeling, we will explore regularized limited-information estimation for MIRT.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13233858/s1.

Author Contributions

Conceptualization, P.-F.X., X.L., L.S., Q.-Z.Z., N.S. and Y.L.; methodology, P.-F.X. and X.L.; software, P.-F.X., X.L. and L.S.; validation, P.-F.X. and X.L.; formal analysis, P.-F.X., X.L. and Y.L.; investigation, P.-F.X. and X.L.; resources, P.-F.X. and Y.L.; data curation, P.-F.X. and X.L.; writing—original draft preparation, P.-F.X., X.L. and Y.L.; writing—review and editing, P.-F.X., X.L., L.S., Q.-Z.Z., N.S. and Y.L.; visualization, X.L.; supervision, P.-F.X. and Y.L.; project administration, P.-F.X. and Y.L.; funding acquisition, P.-F.X. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Ping-Feng Xu was supported by the National Social Science Fund of China (No. 23BTJ062).

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Simulation Study for Cases with Higher Correlations

In this appendix, we conduct simulation studies with a higher correlation fixed at 0.6. The non-normal latent trait data are designed as follows:
Simulation 3. 
A Type 1 latent variable distribution with parameters ( α , β ) = ( 2 , 4 ) and K = 3 is used, and the correlation is fixed at 0.6.
In simulation 3, we do not include other ( α , β ) combinations for K = 3 or any combinations for K = 5 , as these cases are either infeasible under the IG framework or do not depart substantially from normality.
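The data-generating scheme can be sketched in the spirit of the independent generator (IG) transform: latent traits X = LV, where V has independent standardized non-normal components and L satisfies LLᵀ = Σ with off-diagonal 0.6. Standardized gamma components are used here purely for illustration; the exact IG calibration to the target skewness and excess kurtosis ( α , β ) = ( 2 , 4 ) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, rho = 3, 500, 0.6

# Target inter-factor correlation matrix and its Cholesky factor.
Sigma = np.full((K, K), rho) + (1 - rho) * np.eye(K)
L = np.linalg.cholesky(Sigma)

# Independent standardized non-normal generators (gamma, illustrative only;
# skewness of a gamma(shape) variable is 2 / sqrt(shape)).
shape = 1.0
V = rng.gamma(shape, 1.0, size=(K, N))
V = (V - shape) / np.sqrt(shape)       # mean 0, variance 1

X = (L @ V).T                          # N x K non-normal latent traits
print(np.round(np.corrcoef(X, rowvar=False), 2))
```

The sample correlations approximate 0.6 while each trait inherits skewness from the gamma generators, mimicking the non-normal, correlated latent traits of simulation 3.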
Simulation studies are conducted similarly to those in Section 4. Table A1 reports the corresponding means of F1-scores (with standard deviations) for M ^ . As expected, these F1-scores are noticeably lower than those observed in simulation 1, where the correlation is 0.4. Other patterns remain largely consistent with those in the previous subsections. For example, EMS performs best when N = 500 , whereas EML1 and EIFA with rotations achieve the highest accuracy for N = 1000 , 2000, and 4000. In most settings, non-normality continues to reduce the performance of all methods.
Table A1. Mean and standard deviation of the F1-score of M ^ under simulation 3 (K = 3).

                 N = 500            N = 1000           N = 2000           N = 4000
( α , β )      (0,0)    (2,4)     (0,0)    (2,4)     (0,0)    (2,4)     (0,0)    (2,4)
EMS            0.946    0.872     0.955    0.896     0.927    0.873     0.878    0.834
              (0.027)  (0.046)   (0.023)  (0.033)   (0.028)  (0.026)   (0.033)  (0.023)
EML1           0.923    0.844     0.963    0.888     0.986    0.914     0.996    0.924
              (0.030)  (0.071)   (0.021)  (0.035)   (0.014)  (0.022)   (0.007)  (0.018)
Quartimin      0.925    0.854     0.966    0.916     0.985    0.955     0.979    0.989
              (0.031)  (0.083)   (0.058)  (0.080)   (0.058)  (0.096)   (0.074)  (0.028)
Geomin         0.921    0.828     0.965    0.910     0.983    0.950     0.978    0.988
              (0.035)  (0.092)   (0.063)  (0.084)   (0.062)  (0.100)   (0.077)  (0.030)
Infomax        0.918    0.853     0.960    0.919     0.983    0.953     0.980    0.977
              (0.031)  (0.057)   (0.056)  (0.055)   (0.054)  (0.060)   (0.071)  (0.027)
Note: The bold formatting is used to highlight the best results.
Figure A1 presents the boxplots of the MSEs for A ^ and b ^ . The figure shows that non-normality leads to a clear increase in MSE, and this negative effect cannot be eliminated simply by increasing the sample size. Under the non-normal case ( α , β ) = ( 2 , 4 ) , EMS achieves slightly higher estimation accuracy than EIFA with rotations, particularly when N = 500 , 1000 , and 2000.
Figure A1. Boxplots of the MSEs of A ^ and b ^ under simulation 3.

Appendix B. DASS-42 Items and Estimated Parameters in Real Data Analysis

In this appendix, we present the full item content and measurement structure in Table A2, and provide the estimated discrimination parameters A ^ and difficulty parameters b ^ by EMS, EML1, and EIFA with rotations (Quartimin, Geomin, and Infomax) in Table A3, Table A4, Table A5, Table A6 and Table A7, respectively.
Table A2. DASS-42 items.
1I felt downhearted and blue.
2I felt sad and depressed.
3I could see nothing in the future to be hopeful about.
4I felt that I had nothing to look forward to.
5I felt that life was meaningless.
6I felt that life wasn’t worthwhile.
7I felt I was pretty worthless.
8I felt I wasn’t worth much as a person.
9I felt that I had lost interest in just about everything.
10I was unable to become enthusiastic about anything.
11I couldn’t seem to experience any positive feeling at all.
12I couldn’t seem to get any enjoyment out of the things I did.
13I just couldn’t seem to get going.
14I found it difficult to work up the initiative to do things.
15I was aware of the action of my heart in the absence of physical exertion
(e.g., sense of heart rate increase, heart missing a beat).
16I perspired noticeably (e.g., hands sweaty) in the absence of high temperatures or physical exertion.
17I was aware of dryness of my mouth.
18I experienced breathing difficulty
(e.g., excessively rapid breathing, breathlessness in the absence of physical exertion).
19I had difficulty in swallowing.
20I had a feeling of shakiness (e.g., legs going to give way).
21I experienced trembling (e.g., in the hands).
22I was worried about situations in which I might panic and make a fool of myself.
23I found myself in situations which made me so anxious I was most relieved when they ended.
24I feared that I would be “thrown” by some trivial but unfamiliar task.
25I felt I was close to panic.
26I felt terrified.
27I felt scared without any good reason.
28I had a feeling of faintness.
29I found it hard to wind down.
30I found it hard to calm down after something upset me.
31I found it difficult to relax.
32I felt that I was using a lot of nervous energy.
33I was in a state of nervous tension.
34I found myself getting upset rather easily.
35I found myself getting upset by quite trivial things.
36I found myself getting agitated.
37I tended to over-react to situations.
38I found that I was very irritable.
39I felt that I was rather touchy.
40I was intolerant of anything that kept me from getting on with what I was doing.
41I found myself getting impatient when I was delayed in any way
(e.g., lifts, traffic lights, being kept waiting).
42I found it difficult to tolerate interruptions to what I was doing.
Table A3. The elements of A ^ and b ^ by EMS for the DASS-42 data set.
A ^ b ^ A ^ b ^
11.941 1.310220.3301.4400.213−1.134
21.2840.2541.0802.14523 1.4840.2450.687
32.141 0.8481.781240.2811.5660.9061.050
41.837 1.4633.293250.3250.7920.9671.509
51.816 0.9112.096260.4871.1340.8591.109
61.941 1.1822.26427 1.1060.8871.949
72.502 1.0461.30228 1.9870.1450.259
81.673 1.0822.11429 2.3552.877
91.3860.2421.2132.55430 0.2231.6601.993
101.4500.1950.8721.837310.5660.8761.1702.044
111.953 1.1562.342320.565 2.2893.234
122.454 1.0091.36533 1.5121.0791.872
131.0020.1931.0082.53534 0.3281.0151.634
141.960 0.8251.50135 0.1841.1581.535
15 1.119 0.626360.5250.5761.3081.651
160.2201.8360.2950.063370.443 1.8262.442
170.2652.319 −0.038380.3280.5201.5382.488
18 1.2630.9602.31739 0.4621.4271.867
190.3561.3010.296−0.235400.2481.4970.9571.885
20 1.4030.2610.003410.1320.3391.3791.514
210.4141.1970.9031.149420.3610.5461.2931.887
Table A4. The elements of A ^ and b ^ by EML1 for the DASS-42 data set.
A ^ b ^ A ^ b ^
11.877 1.518220.2581.556 −0.945
21.2080.2710.8752.37523 1.578 0.846
32.378 2.070240.2291.6200.6971.319
41.8240.2770.9453.589250.3220.8230.7881.714
51.778 0.6532.351260.4771.1980.6401.367
61.9260.2870.6372.57827 1.1470.7272.139
72.385 0.5671.58928 1.963 0.447
81.600 0.8862.37229 2.2933.103
91.3130.3470.8892.78430 1.7602.189
101.3470.2050.6852.051310.5130.8011.0812.294
111.9270.2730.6132.637320.499 2.1083.420
122.379 0.5331.65933 1.5030.9122.107
130.9650.2010.8462.74434−0.1910.3111.1201.784
141.944 0.4731.76335 1.2581.682
15 1.098 0.727360.4560.5321.2741.913
160.1542.004 0.287370.388 1.7492.661
17 2.291 0.186380.3090.4541.4932.744
18 1.2900.7912.51939 0.3551.4462.067
190.3161.4060.075−0.046400.1941.4950.8082.136
20 1.548 0.16341 0.2771.4671.721
210.3741.2450.6981.397420.2810.4991.2772.119
Table A5. The elements of A ^ and b ^ by EIFA with Quartimin for the DASS-42 data set.
A ^ b ^ A ^ b ^
11.916 0.4821.68722 1.480 −0.849
21.498 0.5792.47223 1.554 0.915
32.979 2.31424 1.6930.4321.438
42.219 0.5453.739250.3950.8770.5131.797
52.279 2.509260.5171.1980.3821.445
62.5840.389 2.78027 1.1820.4962.224
73.609 −0.5291.97828 2.093 0.536
81.958 0.6482.502290.473 1.8303.256
91.637 0.5992.90530 1.5742.270
101.690 0.4412.173310.5270.7070.9962.405
112.6030.365 2.843320.647 2.1883.731
123.632 −0.4872.05433 1.4920.7162.210
131.138 0.6672.85134 1.2241.895
142.599 1.93835 1.0841.751
15 0.880 0.778360.4810.4711.1992.044
16 1.957 0.378370.425 1.6862.817
17 2.323 0.286380.3260.3911.4652.924
18 1.4420.6372.67439 1.5482.232
19 1.349 0.03440 1.5100.6492.252
20 1.535 0.23841 1.5111.852
210.4141.2970.4571.51042 0.4561.2132.250
Table A6. The elements of A ^ and b ^ by EIFA with Geomin for the DASS-42 data set.
A ^ b ^ A ^ b ^
11.786 0.6561.68722 1.451 −0.849
21.378 0.7412.47223 1.528 0.915
32.848 2.31424 1.6380.5401.438
42.067 0.7543.739250.3290.8330.6141.797
52.154 0.3692.509260.4531.1550.4901.445
62.4670.390 2.78027 1.1320.5822.224
73.476 −0.3161.97828 2.067 0.536
81.813 0.8382.502290.303 2.0293.256
91.508 0.7732.90530 1.7202.270
101.573 0.6012.173310.4160.6401.1452.405
112.4880.370 2.843320.442−0.3412.4233.731
123.495 2.05433 1.4240.8282.210
131.029 0.8102.85134−0.354 1.3231.895
142.490 1.93835 1.1901.751
15 0.856 0.778360.3580.3971.3552.044
16 1.925 0.37837 1.8662.817
17 2.295 0.28638 0.3041.6322.924
18 1.3800.7302.67439 1.6932.232
19 1.323 0.03440 1.4460.7642.252
20 1.508 0.23841 1.6561.852
210.3471.2480.5691.51042 0.3811.3572.250
Table A7. The elements of A ^ and b ^ by EIFA with Infomax for the DASS-42 data set.
A ^ b ^ A ^ b ^
11.986 0.4621.68722 1.558 −0.849
21.518 0.5842.47223 1.653 0.915
33.189 −0.3142.31424 1.7470.4171.438
42.289 0.5093.739250.3230.8690.5341.797
52.401 2.509260.4531.2150.3651.445
62.7450.317 2.78027 1.2080.5202.224
73.887 −0.7951.97828 2.241−0.3580.536
82.013−0.3270.6542.50229 2.0743.256
91.662 0.5992.90530 1.8022.270
101.743 0.4212.173310.4140.6431.0882.405
112.770 −0.3362.843320.443−0.4502.4873.731
123.912 −0.7442.05433 1.5190.7632.210
131.127 0.7042.85134−0.436 1.4081.895
142.783 −0.3751.93835 1.2361.751
15 0.918 0.778360.3520.3781.3332.044
16 2.076 0.37837 1.9142.817
17 2.483−0.4220.28638 1.6492.924
18 1.4790.6802.67439 1.7682.232
19 1.415 0.03440 1.5380.6792.252
20 1.631 0.23841 1.7261.852
210.3301.3180.4531.51042 0.3681.3592.250

References

1. Bock, R.D.; Gibbons, R.; Muraki, E. Full-information item factor analysis. Appl. Psychol. Meas. 1988, 12, 261–280.
2. Cai, L. High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika 2010, 75, 33–57.
3. Browne, M.W. An overview of analytic rotation in exploratory factor analysis. Multivar. Behav. Res. 2001, 36, 111–150.
4. Sun, J.; Chen, Y.; Liu, J.; Ying, Z.; Xin, T. Latent variable selection for multidimensional item response theory models via L1 regularization. Psychometrika 2016, 81, 921–939.
5. Zhang, S.; Chen, Y. Computation for latent variable model estimation: A unified stochastic proximal framework. Psychometrika 2022, 87, 1473–1502.
6. Shang, L.; Xu, P.F.; Shan, N.; Tang, M.L.; Ho, G.T.S. Accelerating L1-penalized expectation maximization algorithm for latent variable selection in multidimensional two-parameter logistic models. PLoS ONE 2023, 18, e0279918.
7. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
8. Xu, P.F.; Shang, L.; Zheng, Q.Z.; Shan, N.; Tang, M.L. Latent variable selection in multidimensional item response theory models using the expectation model selection algorithm. Br. J. Math. Stat. Psychol. 2022, 75, 363–394.
9. Shang, L.; Zheng, Q.Z.; Xu, P.F.; Tang, M.L. A generalized expectation model selection algorithm for latent variable selection in multidimensional item response theory models. Stat. Comput. 2024, 34, 49.
10. Wall, M.M.; Park, J.Y.; Moustaki, I. IRT modeling in the presence of zero-inflation with application to psychiatric disorder severity. Appl. Psychol. Meas. 2015, 39, 583–597.
11. Woods, C.M.; Thissen, D. Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika 2006, 71, 281–301.
12. Preston, K.S.J.; Reise, S.P. Estimating the nominal response model under nonnormal conditions. Educ. Psychol. Meas. 2014, 74, 377–399.
13. Sass, D.A.; Schmitt, T.A.; Walker, C.M. Estimating non-normal latent trait distributions within item response theory using true and estimated item parameters. Appl. Meas. Educ. 2008, 21, 65–88.
14. Stone, C.A. Recovery of marginal maximum likelihood estimation in the two-parameter logistic model: An evaluation of MULTILOG. Appl. Psychol. Meas. 1992, 16, 1–16.
15. Finch, H.; Edwards, J.M. Rasch model parameter estimation in the presence of a nonnormal latent trait using a nonparametric Bayesian approach. Educ. Psychol. Meas. 2016, 76, 662–684.
16. Svetina, D.; Valdivia, A.; Underhill, S.; Dai, S.; Wang, X. Parameter recovery in multidimensional item response theory models under complexity and nonnormality. Appl. Psychol. Meas. 2017, 41, 530–544.
17. Wang, C.; Su, S.; Weiss, D.J. Robustness of parameter estimation to assumptions of normality in the multidimensional graded response model. Multivar. Behav. Res. 2018, 53, 403–418.
18. McClure, K.; Jacobucci, R. Item parameter calibration in the multidimensional graded response model with high dimensional tests. PsyArXiv 2023.
19. Vale, C.D.; Maurelli, V.A. Simulating multivariate nonnormal distributions. Psychometrika 1983, 48, 465–471.
20. Astivia, O.L.O.; Zumbo, B.D. A cautionary note on the use of the Vale and Maurelli method to generate multivariate, nonnormal data for simulation purposes. Educ. Psychol. Meas. 2015, 75, 541–567.
21. Foldnes, N.; Grønneberg, S. How general is the Vale–Maurelli simulation approach? Psychometrika 2015, 80, 1066–1083.
22. Foldnes, N.; Olsson, U.H. A simple simulation technique for nonnormal data with prespecified skewness, kurtosis, and covariance matrix. Multivar. Behav. Res. 2016, 51, 207–219.
23. Carroll, J.B. An analytical solution for approximating simple structure in factor analysis. Psychometrika 1953, 18, 23–38.
24. Clarkson, D.B.; Jennrich, R.I. Quartic rotation criteria and algorithms. Psychometrika 1988, 53, 251–259.
25. Robitzsch, A. Smooth information criterion for regularized estimation of item response models. Algorithms 2024, 17, 153.
26. Shang, L.; Xu, P.F.; Shan, N.; Tang, M.L.; Zheng, Q.Z. The improved EMS algorithm for latent variable selection in M3PL model. Appl. Psychol. Meas. 2025, 49, 50–70.
27. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
28. Jiang, J.; Nguyen, T.; Rao, J.S. The E-MS algorithm: Model selection with incomplete data. J. Am. Stat. Assoc. 2015, 110, 1136–1147.
29. Grønneberg, S.; Foldnes, N.; Marcoulides, K.M. covsim: An R package for simulating non-normal data for structural equation models using copulas. J. Stat. Softw. 2022, 102, 1–45.
30. Curran, P.J.; West, S.G.; Finch, J.F. The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychol. Methods 1996, 1, 16–29.
31. Hair, J.; Black, W.C.; Anderson, R.E. Multivariate Data Analysis, 7th ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2010.
32. Tabachnick, B.G.; Fidell, L.S. Using Multivariate Statistics; Allyn and Bacon: Boston, MA, USA, 2001.
33. Lovibond, P.F.; Lovibond, S.H. The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behav. Res. Ther. 1995, 33, 335–343.
34. Wall, M.M.; Guo, J.; Amemiya, Y. Mixture factor analysis for approximating a nonnormally distributed continuous latent factor with continuous and dichotomous observed variables. Multivar. Behav. Res. 2012, 47, 276–313.
35. Monroe, S.L. Multidimensional Item Factor Analysis with Semi-Nonparametric Latent Densities. Ph.D. Thesis, University of California, Berkeley, CA, USA, 2014.
36. Padilla, J.L.; Azevedo, C.L.; Lachos, V.H. Multidimensional multiple group IRT models with skew normal latent trait distributions. J. Multivar. Anal. 2018, 167, 250–268.
37. Gómez, J.L.P. Bayesian Inference for Multidimensional Item Response Models Under Heavy Tail Skewed Latent Trait Distributions and Link Functions. Ph.D. Thesis, Universidade Estadual de Campinas, Campinas, Brazil, 2018.
38. Huang, P.H. Penalized least squares for structural equation modeling with ordinal responses. Multivar. Behav. Res. 2022, 57, 279–297.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
