Measuring State and Trait Anxiety: An Application of Multidimensional Item Response Theory

The State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA) is a widely used measure of state and trait anxiety. Within the Classical Test Theory framework, consistent findings support its multidimensional factor structure; its discriminant, convergent, and nomological validity; and its age and gender invariance across healthy and clinical samples. Nevertheless, some issues regarding STICSA dimensionality and item-scale composition remain unresolved (e.g., both bifactor and two-factor models were found to fit the data equally well). The goal of this study was to investigate the STICSA's dimensionality within Item Response Theory and to assess the tenability of the bifactor model as a plausible alternative to the multidimensional model. The sample consisted of 3338 Italian participants (58.21% females; 41.79% males) with an average age of 35.65 years (range: 18–99; SD = 20.25). Applying multidimensional Item Response Theory (mIRT), both the bifactor and the two-correlated-dimensions structures of the STICSA scales were confirmed to fit the data. While the bifactor model showed better fit indices, the multidimensional model was more accurate and precise (0.86–0.88) in estimating latent state and trait anxiety. A further comparison of multidimensional item parameters revealed that the multidimensional and bifactor models were equivalent. Findings showed that the STICSA is an accurate and precise instrument for measuring the somatic and cognitive symptomatology dimensions of state and trait anxiety. The use of the state/trait total score requires special attention from clinicians and researchers to avoid bias in psychodiagnostic assessment.


Introduction
Anxiety is hypothesized to have a multidimensional nature and multiple symptomatic manifestations [1,2]. It is multidimensional in that it can be divided into different categories, including trait and state anxiety, each with cognitive and somatic components (e.g., Martens, Vealey [3] proposed the multidimensional anxiety theory). This poses a challenge to researchers and clinicians interested in the assessment of this complex phenomenon [4]. The main concern is the dimensionality of anxiety [1,2]. Similar to other psychological states, anxiety is considered to be composed of state and trait dimensions. It is possible to differentiate between a transient emotion that varies in duration and is characterized by observable symptoms, such as worry, tension, nervousness, and arousal of the autonomic nervous system (i.e., state anxiety), and a more enduring, unobservable tendency to respond with high levels of anxious apprehension to perceived threats (i.e., trait anxiety) [5,6].
State anxiety has been conceptualized as an emotional response experienced in a limited period, depending on the anticipation of a (real or perceived) threatening stimulus, and variable in duration [7]. On the other hand, trait anxiety has been viewed as an individual difference in emotional responding [8]. It is relatively stable over time and corresponds to neuroticism [9] and negative or low emotionality [10], is a risk factor for the development of anxiety disorders [11], and is comparable to anxiety sensitivity [12]. Individuals with high state anxiety may experience increased levels of apprehension, fear, and physiological arousal [13]. Conversely, individuals high in trait anxiety become more state anxious (in terms of the duration, frequency, and intensity of episodes of state anxiety) in critical contexts than those low in trait anxiety [7].
Within the assessment of self-reported anxiety, the state-trait distinction has been captured by many authors through distinct instructions for the state and trait scales (e.g., "right now" vs "generally") [14,15]. Both state and trait anxiety have somatic and cognitive elements, creating four distinct components of anxiety: somatic state anxiety, somatic trait anxiety, cognitive state anxiety, and cognitive trait anxiety [13]. Clinicians and researchers have devoted much attention to the differentiation between somatic and cognitive anxiety [16]. From the clinical point of view, the symptoms of anxiety involve a broad range of cognitive, physical, and emotional aspects (e.g., "numbness," "unsteadiness," and "feeling hot") that may include worry, intrusive thoughts, and lack of concentration [17]. This somatic/cognitive distinction might allow for a more fine-grained assessment of anxiety and facilitate tailoring clinical treatments to the specific and predominant manifestation of anxiety symptomatology (e.g., physiologically oriented vs cognitively oriented relaxation) [16].
A controversial issue in assessing anxiety concerns its overlap with depression [18][19][20]. In line with the tripartite model of anxiety and depression [21], negative affect (e.g., fear, anger, guilt) is associated with both anxiety and depression; low positive affect (e.g., feeling tired) is related to depression, whereas physiological hyperarousal (e.g., trembling, dizziness, shaking) is associated with anxiety [22]. Thus, the overlap between anxiety and depression may lead to misdiagnosis, a fairly frequent problem in clinical settings [23]. To this end, the State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA) [17] represents a multidimensional measure of state/trait and somatic/cognitive anxiety, which holds strong discriminant power with respect to depression [24][25][26].
Previous research has supported its multidimensional factor structure (of both the state and trait STICSA scales, each including somatic and cognitive dimensions) [27]. Additionally, it has displayed gender and age invariance, as well as good convergent validity with concurrent anxiety measures and sound internal discriminant validity, as determined in large non-clinical samples [27][28][29][30][31][32]. Whilst the STICSA state and trait scales have recently been reaffirmed as independent measures, each composed of somatic and cognitive dimensions [27], the STICSA's dimensionality is still an open issue [33,34]. In detail, fitting both confirmatory factor analysis (CFA) and exploratory structural equation models (ESEM) to STICSA scores, Styck and colleagues [33] showed that the oblique two-factor and bifactor models fit their data equally well for the separate state and trait forms of the STICSA. They also concluded that the cognitive and somatic factors were not equally robust and that STICSA items appear to measure a mixture of both latent somatic and cognitive anxiety. Styck, Rodriguez [33] highlighted that "at least three items appear to meaningfully load onto both somatic-cognitive factors", which complicates STICSA score interpretation (p. 21). The items "feel agonized over problems" (#3), "trouble remembering things" (#11), and "feel trembly and shaky" (#8) were found to cross-load onto both the somatic and cognitive dimensions. However, this finding is of limited practical relevance, since the magnitude of these cross-loadings was below the recommended level of 0.30 [35] on the non-target dimension [33]. In addition, the differences between the primary and alternative factor loadings were found to be above the recommended cut-off of 0.20 [36], except for item #11.
Taken together, these results are not surprising, since under certain statistical conditions the correlated-factors and bifactor models are covariance-matrix nested [37] and can be statistically indistinguishable [38]. In this case, the implied covariance matrix of the STICSA state/trait cognitive-somatic correlated-factors model can be perfectly reproduced by the bifactor model. In addition, it is well established that the bifactor model has a high fitting propensity (the "probifactor bias" [39]). Equally, cross-loadings might arise from the similar content of indicators measuring closely related but not identical latent variables [40]. Furthermore, the magnitude of cross-loadings is strictly related to the sample size and the type of rotation applied in factor analysis [41].
Additionally, Styck, Rodriguez and Yi [33] affirmed that "STICSA state-trait cognitive and somatic anxiety composite scores do not purely measure latent somatic and cognitive anxiety" both in past and recent studies [17,27,34,42], and "that researchers must make a choice between competing alternatives that is appropriate for their particular study aims" (p. 22). As a result, the authors did not provide clear guidance about the dimensionality of the STICSA, suggesting instead that data complexity be modeled directly through a more flexible and exploratory approach, such as ESEM. Indeed, the core of their study shifted from comparing the competing models (bifactor vs two-factor model) to the impossibility of distinguishing the somatic from the cognitive item domain in each state and trait scale. Despite the complexity of the data and the use of multiple techniques within the classical test theory (CTT) framework (i.e., the use in tandem of CFA and ESEM [33]), some issues regarding STICSA dimensionality and item-scale composition remain unresolved. Firstly, if STICSA items measure a nonnegligible mixture of both latent somatic and cognitive anxiety, each state and trait scale should be best represented by a single latent dimension or by the global dimension that arises from the bifactor model. Secondly, the computation of a total state/trait scale score would then yield an accurate and precise estimation of anxiety. In addition, if fit indices cannot be used to select among competing models [33], how can we assess the tenability of a specific somatic or cognitive dimension and its items in one model compared to others? A different psychometric approach to the assessment of STICSA dimensionality could therefore yield valuable insights.
Hence, a step forward in the analysis of the STICSA state and trait scales could be the application of modern psychometric approaches, i.e., Item Response Theory (IRT). One of the most important differences between CTT and IRT is that in CTT, one uses a common estimate of measurement precision that is assumed to be equal for all individuals, whereas in IRT, measurement precision depends on the latent-attribute value (θ) [43]. IRT has also been found to be superior to CTT in detecting individual change (especially when tests include at least 20 items) [43]. Moreover, assessing dimensionality, addressing items' local independence issues, and estimating item parameters and information (e.g., difficulty and discrimination) under IRT assumptions can bring important benefits to health/clinical assessments [44,45]. The IRT approach is particularly useful for the development and refinement of clinical measures, since it allows the researcher to place both respondents and items along a single continuum and measure them with a common metric. For example, IRT can capture meaningful nuances in clinical test accuracy, since clinical measures are not equally reliable across all intervals of the latent-attribute distribution. IRT also allows clinical psychologists to replace total scores with more accurate IRT-based latent-attribute estimates (test scoring) [45].
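The contrast between CTT's constant standard error and IRT's θ-dependent precision can be made concrete with the Fisher information of a single two-parameter logistic (2PL) item, I(θ) = a²P(θ)(1 − P(θ)). The following is a minimal Python sketch with purely illustrative parameter values (a and b here are not STICSA estimates):

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Information (and hence precision, SE = 1/sqrt(I)) peaks at theta = b and
# falls off toward the extremes -- unlike CTT's single constant SEM.
for theta in (-2.0, 0.0, 2.0):
    info = info_2pl(theta, a=2.0, b=0.0)
    print(theta, round(info, 3), round(1.0 / math.sqrt(info), 3))
```

Summing such item information curves over a test yields the test information function discussed in the Results.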
However, no research has been conducted to assess STICSA dimensionality and properties using IRT models. IRT models have traditionally been conceived and estimated as unidimensional models (e.g., the Rasch model) because of their attractive measurement properties. However, unidimensional models are often too simple and fail to fit the data when items measure multiple latent traits simultaneously [46]. In addition, clinical instruments have been found to yield unusual item-parameter results in clinical applications of IRT models, resulting in violations of the unidimensionality and invariance IRT assumptions (for an analogy, see the debate on intelligence using Jensen's method; i.e., [47]). In such cases, the estimation of a multidimensional IRT (mIRT) model is more appropriate [48]. Beyond unidimensional IRT models, the estimation of multidimensional IRT models (e.g., bifactor mIRT; two- or three-dimensional mIRT models) may help to address the complex-items issue, i.e., items that involve multiple latent traits [48].
For example, estimating the STICSA bifactor mIRT model could allow for the measurement of a global anxiety dimension (technically labeled a latent trait or latent attribute [49]), along with the specific dimensions (e.g., somatic and cognitive), each representing a distinct underlying concept. Compared to the bifactor model estimated within a CTT framework, the bifactor mIRT model provides full-information item analysis when estimating item parameters. Thus, mIRT models may represent a feasible methodological approach for evaluating complex psychological constructs that can be understood as a combination of nested sub-scale components (compensatory factors) or a more general construct (non-compensatory factors) [48]. On the other hand, using the general-dimension slope parameter to interpret the relationship between an item and its dimension may be confusing and misleading, since the probability of response in one dimension is conditioned on the other dimensions [50].
Based on the above, our main goal was to assess the tenability of the bifactor model as a plausible model for each STICSA state/trait scale over the two-factor model [33]. Next, a comparison at the item-parameter level was made by estimating the discrimination and difficulty indices. We examined the dimensionality of the STICSA via IRT, since recent advancements such as multidimensional IRT may aid researchers in disentangling the debate on its structure and item functioning, as well as provide suggestions for improving the psychometric qualities of this measure.

Participants
The sample was recruited through advertisements in many Italian cities; the majority of the sample was taken from a published database [27], to which new datasets from unpublished studies were added (N = 256). The Italian version of the STICSA was administered by licensed psychologists. Socio-demographic variables (such as age, gender, and education) were also collected. Inclusion criteria were age from 18 to 99 years and the ability to complete self-administered questionnaires. Signed informed consent was obtained before administration. The ethics committee of the Department of Psychological Sciences, Health and Territory, University of Chieti, Italy, approved the study.

Measures
The State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA; [17]) (for the Italian version, see [27,31,32]) is a 21-item measure designed to assess cognitive and somatic symptoms of anxiety on both state and trait scales. On the state anxiety scale, participants rate how they feel at the moment of assessment (from 1 "not at all" to 4 "very much"), whereas on the trait anxiety scale, participants rate how often a statement is true of them in general (from 1 "almost never" to 4 "almost always").

Statistical Analysis
Preliminarily, the IRT assumption of unidimensionality of the STICSA state and trait scales was assessed using non-parametric IRT Mokken analysis [51,52]. All analyses were conducted in R using the mirt package [53]. The Automated Item Selection Procedure (AISP) algorithm implemented in the Mokken package of R, with the recommended range of lower-bound values c = 0.3-0.5, was used to partition the set of items (or a set of unscalable items) into Mokken scales [54,55]. A Mokken scale is defined by a set of dichotomously (e.g., yes/no) or polytomously (e.g., 1-4 Likert-type response scale) scored items for which all inter-item covariances are positive; scalability coefficient values (Hi for a single item / H for the total scale) < 0.4 identify weak scalability, values between 0.4 and 0.5 moderate scalability, and values > 0.5 strong scalability [54]. Next, local independence (LI) was assessed with the standardized LD-χ2 statistic [56], which tests for residual relationships among item responses that are not accounted for by the unidimensional model. Large standardized LD residuals (|10| or greater) reflect LI issues. LD-χ2 statistics were calculated using the residuals function in the mirt package of R. A violation of both the unidimensionality and local independence assumptions would indicate that a single-factor model is not appropriate and that a hierarchical model (and thus a multidimensional approach) is needed.
Given the STICSA polytomous response format, the graded response model (GRM) and its multidimensional logistic extension (mGRM) [48,57] were applied to evaluate the monotonicity of the item response functions (e.g., estimating difficulty and discrimination item parameters) in the selected model(s). To test the adequacy of the models, we computed the C2* fit statistic [58], available through the M2 function implemented in the mirt package. The C2* is a limited-information goodness-of-fit test statistic for ordinal IRT models. Following the debate in the literature on the STICSA, a series of competing models was also addressed (described below). The goodness of fit of the GRM models was also evaluated with several fit statistics, such as the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR), which are closely related to those used in confirmatory factor analysis [59]. The −2*log-likelihood (−2LL; which is distributed as chi-square with its degrees of freedom), together with the Akaike Information Criterion (AIC) [60] and the Bayesian Information Criterion (BIC) [61], was used to select the most parsimonious model among the competing ones. The model with the lower −2LL and AIC/BIC scores is expected to strike a superior balance between fitting the data and avoiding over-fitting.
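For concreteness, the GRM expresses the probability of each ordinal category as the difference between adjacent cumulative 2PL curves, P(X ≥ k | θ). The following Python sketch implements this response function with hypothetical parameter values for a 4-category item (the slope and thresholds shown are not STICSA estimates):

```python
import math

def grm_category_probs(theta, a, b):
    """Graded response model: probability of each ordinal category.

    theta: latent-attribute value
    a: discrimination (slope)
    b: ordered category thresholds (len = n_categories - 1)
    Returns one probability per category.
    """
    # P(X >= k) follows a 2PL curve for each threshold; bracket with 1 and 0
    p_star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    # Category probability = difference of adjacent cumulative curves
    return [p_star[k] - p_star[k + 1] for k in range(len(b) + 1)]

# Illustrative 4-category item (hypothetical parameters)
probs = grm_category_probs(theta=0.0, a=1.5, b=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])  # four probabilities summing to 1
```

Ordered thresholds guarantee non-negative category probabilities, which is the monotonicity property evaluated in the model diagnostics above.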
Item diagnostics were assessed using the S-χ2 item-fit statistic [62]. A significance test (p < 0.05) under the null hypothesis was used to flag misfitting items. Given that large samples and large families of tests tend to yield significant χ2 values, the Benjamini and Hochberg [63] procedure was applied to adjust the obtained p-values for the number of tests in the family and to control the experimentwise error rate.
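The Benjamini-Hochberg step-up adjustment can be sketched in a few lines of Python (in the actual analysis this is typically done via R's p.adjust with method = "BH"; the p-values below are made up):

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    prev = 1.0
    # Walk down from the largest p-value, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        adj = min(prev, pvals[i] * m / rank)
        adjusted[i] = adj
        prev = adj
    return adjusted

# Illustrative family of four item-fit p-values
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.8]))
```

Items whose adjusted p-value remains below 0.05 are then flagged as misfitting.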
For the unidimensional model, we estimated a single slope (a; discrimination) parameter and three intercept parameters for each 4-category STICSA item. Discrimination parameter values theoretically range from −∞ to +∞, with 0.5 to 3.0 a reasonable range; difficulty (b) values from −3 to +3 are typical [64]. For the hierarchical and multidimensional IRT models, the Multidimensional Item Discrimination (MDISC) and Multidimensional Difficulty Index (MDIFF) were also computed by transforming the estimated slope (item discrimination) and intercept (category threshold) parameters. MDISC values > 0.65 indicate discriminative items, and high positive MDIFF values (> 0.5) indicate difficult items [49,65]. The test information function (TIF) and the related standard errors of measurement (SE), indicating the precision of the whole test, were also estimated to determine at what levels of the latent trait the STICSA state and trait scales provide the most information.
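Under Reckase's standard definitions, MDISC is the Euclidean norm of an item's slope vector, and MDIFF is minus the intercept divided by MDISC (the signed distance from the origin in the item's direction of best measurement). A minimal Python sketch, with hypothetical two-dimensional item parameters rather than STICSA estimates:

```python
import math

def mdisc(slopes):
    """Multidimensional discrimination: Euclidean norm of the slope vector."""
    return math.sqrt(sum(a * a for a in slopes))

def mdiff(intercept, slopes):
    """Multidimensional difficulty: signed distance from the origin to the
    point of steepest slope along the item's direction of best measurement."""
    return -intercept / mdisc(slopes)

# Hypothetical item loading on two dimensions, with three category intercepts
slopes = [1.2, 0.8]
intercepts = [2.0, 0.5, -1.0]
print(round(mdisc(slopes), 3))
print([round(mdiff(d, slopes), 3) for d in intercepts])
```

Large positive intercepts yield negative (easy) MDIFF values, and vice versa, mirroring the sign convention used for the b parameters.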

Models Tested
First, a unidimensional structure (Model 1, uni-GR) was examined, with all 21 items defining one latent dimension for the separate trait and state forms of the STICSA. The single-factor model was evaluated to assess whether a unified concept underlies the data, as well as to evaluate whether a multidimensional IRT approach was warranted.
Next, Model 2 (multi-GR) tested the original conceptualization of the STICSA (i.e., oblique somatic-cognitive dimensions). This model is in line with recent studies suggesting that each STICSA form is well represented by two correlated factors, cognitive and somatic anxiety [17,26,27,33,42,66].
Model 3 tested a bifactor mIRT model (bifac-GR) for the separate state and trait forms of the STICSA. In the bifactor model, each STICSA state/trait item was specified to load on both a global anxiety factor (i.e., global state anxiety; global trait anxiety) and a domain-specific factor (somatic or cognitive) [33,67]. In the bifactor mIRT model, the somatic and cognitive domains of each state and trait scale were treated as orthogonal (uncorrelated), and item discrimination was estimated only on the dimensions each item belonged to, with the discrimination parameters linking items to the remaining dimensions constrained to zero.
In addition, a restricted bifactor IRT measurement model was specified to address IRT assumption violations (i.e., local independence), which can result in biased estimation of item parameters and/or an overestimation of test reliability [68]. A trace-line transformation (yielding so-called "marginal slopes") [50,69] was applied to the bifactor mIRT model response function, weighting by the normal distribution and integrating out the specific dimension and the other general dimensions.
Next, we applied a pairwise comparison strategy of item fit and parameter estimates, and of their total information/precision, across (a) the uni-GR vs bifac-GR models and (b) the multi-GR vs bifac-GR models. The first comparison was made to assess the usefulness of a general bifactor model, as well as the presence of LD issues reflecting a multidimensional structure of the STICSA state/trait scales.
Next, given the high degree of overlap among nested models, we compared multidimensional item parameters of multi-GR and bifac-GR models in order to assess the presence of meaningful differences among the models' multidimensional parameters and the compensatory/non-compensatory nature of the STICSA state and trait scales.

Item Response Theory Assumptions
Preliminarily, we examined the frequency distribution of all item responses to identify items with high percentages at the extreme ends of the response scale. Such items could affect the subsequent stages of the analysis, especially those conducted within the IRT framework [70]. The results showed that no item had a mean <0.5 or >3.5. From the total sample, we removed N = 13 cases with missing data on all the STICSA state and trait items; cases with values missing completely at random (MCAR) on some items were retained in the IRT analyses. The final database was composed of 3325 observations.
The STICSA state and trait scale items were submitted to Mokken analysis to test the unidimensionality assumption. The AISP revealed that the STICSA state and trait items loaded on multiple latent dimensions. The inter-item covariances were positive, and the item scalability coefficients (Hi) ranged from 0.311 to 0.428 (weak) for the state scale and from 0.295 to 0.404 (weak) for the trait scale; hence, the second criterion of the Mokken scale was not satisfied. The scalability coefficients for the entire state and trait scales, equal to 0.370 and 0.343, respectively, showed weak scalability. As expected, the assumption of unidimensionality was not met for the 21 items of the state and trait scales.
Next, we assessed the tenability of the local independence assumption for each item pair of the state and trait scales, taking into account the three models tested (see Appendix A). Large positive (+) and negative (−) standardized LD values for the models tested (uni-GR, multi-GR, and bifac-GR) are summarized in Table 1. A careful examination of the LD values exhibited a pattern of 8/5 large positive LD values among item pairs within the somatic area and 8/4 large positive LD values among item pairs within the cognitive area for the state and trait scales, respectively. Unlike the positive values, which can inflate the item slope parameter, the negative LD values can be ignored, since they tend to underestimate the slope parameter [71]. Negative LD values are usually detected for self-rating scales that do not measure best performance. The unmodeled covariation, particularly within the cognitive and somatic domains of state anxiety, suggested that a single latent variable cannot explain the item response patterns or properly account for all covariations. This suggested that the state and trait scale items may be better modelled by "multidimensional" models, such as a two-correlated-factors model or a bifactor model. Therefore, the item response data were fitted with both the multi-GR and bifac-GR models. A detailed examination revealed unmodeled local dependency in the LD values for the multidimensional model as well. However, it was isolated to three and four items in the somatic content area, and five and two items in the cognitive content area, for the state and trait scales, respectively. Thus, the adoption of a multi-GR model appeared to better satisfy the local independence assumption.
When the bifac-GR model was fitted to the item response data to better account for the unique variance attributed to each content area, the unmodeled LD was substantially reduced. The only remaining LD values involved items #1 and #16 for both the state and trait scales, with three other items (#2, #12, and #10) for the state scale and item #9 for the trait scale. The bifac-GR model thus enhanced the modelling of the item response covariations, allowing unique variability to be modelled for each content area after accounting for a general latent trait. Table 1 summarizes the results of the model comparison and displays fit indices for the several IRT models compared. All indices for determining the best-fitting STICSA state-trait model converged on the bifac-GR model. However, the goodness of fit for the multi-GR model indicated an acceptable-to-good fit for the sample, very close to the bifactor model (see CFI and TLI indices above the 0.95 threshold and RMSEA ≤0.05 for excellent fit). Nevertheless, the deviance statistics (−2LL) and the absolute differences in AIC and BIC (>|9|) [72] supported the superiority of the bifactor solution over the multi-GR model. The goodness of fit of the uni-GR model confirmed a poorer fit for the STICSA state-trait scales (see LD values).

Comparison of Item Fit and Parameters
The uni-GR model and bifac-GR model parameter estimates for the STICSA state and trait items are summarized in Table 2. We examined fit at the item level for the uni- and bifac-GR models, controlling for Type I error using a Benjamini-Hochberg adjustment. Concerning the uni-GR and bifac-GR models, none of the state scale items was identified as misfitting. On the contrary, item #14 (p = 0.004) for the trait scale was identified as misfitting in both models, using the same criteria. Note (Table 2): a = uni-GR slope; ag = conditional slope for the STICSA state or trait general trait; aS1 and aS2 = conditional slopes for the specific somatic/cognitive traits; a*g = marginal slope for the STICSA state or trait general trait; a*S1 and a*S2 = marginal slopes for the specific somatic/cognitive traits.
Comparing the uni-GR with the bifac-GR model solutions revealed absolute differences in the item intercept parameters ranging from 0 to 0.99, with a mean absolute difference of 0.21 (SD = 0.29), for the trait scale, and from 0 to 3.04, with a mean absolute difference of 0.39 (SD = 0.53), for the state scale, as Appendix B shows. In particular, the absolute value of the mean difference was larger for the third, most extreme intercept: 0.36 (SD = 0.39) for the trait scale and 0.62 (SD = 0.70) for the state scale. The mean absolute difference was lower for the first intercepts: 0.14 (SD = 0.21) and 0.06 (SD = 0.05) for the trait and state scales, respectively. On the other hand, the mean absolute difference was elevated for the second intercept in the state scale. In the bifac-GR model, the conditional relationships among the specific and general latent traits inflate the conditional slopes of the general latent trait [50]. Therefore, since a direct comparison between specific and general slopes is not recommended, marginal slopes [69] were computed using the Stucky and Edelen [50,69] equations, and slopes were compared between the two model solutions (uni vs. bifac model).
Concerning the state scale, a comparison of the general state conditional slopes (see Table 2) from the bifac-GR model solution shows that 10 of the 21 item conditional slopes were larger than the uni-GR model slopes. The average absolute difference was 0.33 (Min = 0.09, Max = 1.67). For the general trait conditional slopes, the bifac-GR model solution shows that 11 of the 21 item conditional slopes were larger than the uni-GR model slopes (see Table 2). The average absolute difference was 0.28 (Min = 0.06, Max = 0.96). An inspection of the marginal slopes in Table 2 shows that the five items which discriminated most across respondents with respect to the general state dimension were, in rank order, items #13 (a13*g = 3.932), #17 (a17*g = 2.909), #9 (a9*g = 2.882), #19 (a19*g = 2.597), and #3 (a3*g = 2.207). Two items discriminated most across respondents with respect to the general trait dimension: items #8 (a8*g = 2.349) and #14 (a14*g = 2.109). A comparison across marginal and conditional slopes of the bifac-GR models showed a slight difference in magnitude only for those items with slopes close to 0 on the somatic dimension of the state scale, and a negligible difference only for somatic items of the trait scale.

Comparison of TIFs and Marginal Reliability
Next, the total information functions (TIFs) were compared across the uni-GR and bifac-GR models for the STICSA state and trait scales. The TIF indicates how well the construct is measured at any level of the trait continuum. A straightforward comparison between the TIF from the unidimensional (uni-GR) model and the conditional bifac-GR (CTIF) model showed that the two functions largely overlapped along all latent STICSA-state values, with the conditional bifac-GR model being somewhat more informative than the uni-GR (see Figure 1). Concerning the trait scale, the uni-GR was found to be more informative and encompassed the conditional bifac-GR information, which was slightly narrowed along the STICSA-trait values (see Figure 1). Based on the MTIF in Figure 1, the measurement precision is roughly constant for values on the state/trait continuum closest to 0 and between −0.5 and 0.5, with total information for the 21-item state/trait scales of about 15~14 (TIF with a prior variance of 1, i.e., posterior total [test] information; 41~42 without a prior variance of 1). The standard errors (SEEs; Figure 2) of the bifac-GR state/trait scores for this interval were close to about 0.25~0.26 (1/√Information). This level of error around the scores converts to an IRT reliability value of about 1 − SEE², which was 0.93−0.94 in this interval for the state and trait scales, respectively.
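The conversion chain used above (information → SEE = 1/√Information → reliability = 1 − SEE²) is simple arithmetic and can be sketched directly; with a total information of 15 it reproduces the reported SEE ≈ 0.26 and reliability ≈ 0.93:

```python
import math

def irt_reliability_from_information(total_information):
    """Convert test information into a standard error of estimate (SEE)
    and the corresponding IRT reliability, 1 - SEE^2 (theta variance = 1)."""
    see = 1.0 / math.sqrt(total_information)
    return see, 1.0 - see ** 2

# Total information ~15, as reported for the 21-item scales near theta = 0
see, reliability = irt_reliability_from_information(15.0)
print(round(see, 3), round(reliability, 3))
```

Note that 1 − SEE² equals reliability only when the latent trait is scaled to unit variance, which is the convention assumed here.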
Additionally, when the marginal reliability (also named empirically, since we used an EAP estimator) of the unidimensional model (0.89 for state, 0.90 for trait) and the bifactor model (0.87 for state, 0.86 for trait) was contrasted, there was an inflation in score precision when LD was overlooked (see Table 1). This inflated effect was more pronounced in the trait scale. On the other hand, the estimation of the conditional bifac-GR model tended to overestimate state/trait reliability/precision scores along the right segments of the continuum (high state/trait anxiety). Next, the marginal reliability for the two-correlated model (multi-GR, 0.86-0.88 for state and trait, for somatic and cognitive factors), respectively, was found to be higher compared to the bifactor model ones (Bifac-GR, 0.62-0.42 for state, 0.40-0.65 for trait). tinuum closest to 0, and between −0.5 and 0.5 with total information for the 21-item state/trait of about 15~14 (TIF with prior variance of 1-posterior total [test] information; 41~42 without prior variance of 1). The standard errors (SEEs; Figure 2) of bifac-GR state/trait scores for this interval were close at about 0.25~0. 26 (1/√Information). This level of error around the scores converts to an IRT reliability value of about 1−SEE 2 , which was 0.93−0.94 in this interval, respectively, for the state and trait scales. Additionally, when the marginal reliability (also named empirically, since we used an EAP estimator) of the unidimensional model (0.89 for state, 0.90 for trait) and the bifactor model (0.87 for state, 0.86 for trait) was contrasted, there was an inflation in score precision when LD was overlooked (see Table 1). This inflated effect was more pronounced in the trait scale. On the other hand, the estimation of the conditional bifac-GR model tended to overestimate state/trait reliability/precision scores along the right segments of the continuum (high state/trait anxiety). 
Next, the marginal reliability for the two-correlated model (multi-GR, 0.86-0.88 for state and trait, for somatic and cognitive factors), respectively, was found to be higher compared to the bifactor model ones (Bifac-GR, 0.62-0.42 for state, 0.40-0.65 for trait).
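The conversions used above are the standard IRT identities relating information, standard error, and reliability. A minimal sketch, with the latent variance fixed at 1 as in the models reported here (the numeric values are illustrative of the reported magnitudes, not the study's estimates):

```python
import math

def see_from_information(info: float) -> float:
    """Standard error of estimation at a trait level: SEE = 1 / sqrt(I(theta))."""
    return 1.0 / math.sqrt(info)

def irt_reliability(see: float) -> float:
    """Conditional IRT reliability: rho = 1 - SEE^2 (latent variance fixed at 1)."""
    return 1.0 - see ** 2

def marginal_reliability(sees: list[float]) -> float:
    """Approximate marginal (empirical) reliability from EAP score SEs:
    1 - mean(SEE^2), again assuming a latent variance of 1."""
    return 1.0 - sum(s ** 2 for s in sees) / len(sees)

# Total information of about 15 near theta = 0 for a 21-item scale
see = see_from_information(15.0)   # about 0.258
rho = irt_reliability(see)         # about 0.933
print(f"SEE = {see:.3f}, conditional reliability = {rho:.3f}")
```

This reproduces the arithmetic in the text: information of roughly 14~15 yields SEEs of roughly 0.25~0.26 and reliabilities of roughly 0.93~0.94.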

Multidimensional Item Diagnostic
The mIRT analyses were conducted separately on the trait and state scales. Table 3 presents the multidimensional parameter estimates of the items, as well as the item diagnostic estimates for both the two-correlated-factors (multi-GR) and the bifactor models. As found in the previous pairwise comparison, none of the state scale items were identified as misfitting at p < 0.05. On the contrary, item #14 (p = 0.004) of the trait scale was confirmed to misfit both multidimensional models. Concerning state anxiety, all the multidimensional parameter values for the multi- and bifac-GR models overlapped, with a minimal difference in the MDISC average values of 0.179 (SD = 0.255) and negligible differences in the three intercepts (MDIFF 1 M ± SD = −0.011 ± 0.003; MDIFF 2 M ± SD = −0.027 ± 0.013; MDIFF 3 M ± SD = 0.044 ± 0.033). Five items (items #20, #21, #1, #12, #15) for the state-somatic factor and two (items #5, #11) for the state-cognitive factor demonstrated "moderate" to "high" multidimensional discrimination, according to Baker [49] and Hasmy [65], ranging from 1.069 (item #21) to 1.696 (item #5). The remaining five items showed "very high" discrimination parameters (MDISC > 1.7).
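MDISC and MDIFF are standard Reckase-style transformations of a multidimensional item's slope vector and category intercepts. A minimal sketch with hypothetical item parameters (not the study's estimates):

```python
import math

def mdisc(a: list[float]) -> float:
    """Multidimensional discrimination: Euclidean norm of the slope vector."""
    return math.sqrt(sum(ai ** 2 for ai in a))

def mdiff(a: list[float], d: float) -> float:
    """Multidimensional difficulty for one intercept: MDIFF = -d / MDISC."""
    return -d / mdisc(a)

# Hypothetical graded item loading on two dimensions, with three intercepts
a = [1.4, 0.8]           # slopes
d = [2.1, 0.3, -1.6]     # ordered category intercepts
print(f"MDISC = {mdisc(a):.3f}")
for k, dk in enumerate(d, start=1):
    print(f"MDIFF_{k} = {mdiff(a, dk):.3f}")
```

Under Baker's conventions referenced above, an MDISC of about 1.61 for this hypothetical item would fall in the "high" discrimination range, and the three MDIFF values give the item's ordered locations on the latent continuum.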

Discussion
The STICSA is now a widely accepted measure of state and trait anxiety, although its dimensionality remains a matter of interest among researchers [33,34]. Its attractiveness lies in its ability to measure both state and trait anxiety with a single instrument, as well as their cognitive and somatic features. In contrast to the criticized Spielberger state-trait anxiety model [24,73,74], the STICSA displays good psychometric properties, including excellent divergent validity. This feature is an advantage for clinicians, enabling them to better discriminate between specific and common anxiety factors that share underlying processes with depression [75,76]. Addressing dimensionality issues therefore represents a crucial step in scoring and interpreting STICSA state and trait item values. Studies have relied on different conceptualizations of STICSA scores, but a large part of them have supported two correlated somatic-cognitive factors for each of the state and trait scales [27,32,34,42].
Unlike past studies, strictly oriented to the classical psychometric approach, the present study aimed to disentangle the STICSA dimensionality issue by applying the multidimensional IRT approach. Multidimensional item response theory (mIRT) extends the classical IRT approach by including an additional vector of personal attributes, so that multiple underlying traits can be measured simultaneously [48,77]. Based on Styck, Rodriguez, and Yi [33]'s findings, a bifactor model was tested as a viable and reliable alternative to multidimensional models for assessing the dimensionality of the STICSA state and trait scales. In detail, information was supplied about fitting the unidimensional, multi-GR, and bifac-GR models, testing pertinent IRT assumptions, and examining model-data fit statistics. Indeed, it is commonly observed that a model may fit the data well overall, relative to other models, yet fit individual items poorly [78]. Likewise, to interpret the bifac-GR model results, item parameters and TIFs were compared across the several models (following [50]). Parameter estimates and their multidimensional versions were computed to make meaningful comparisons across model parameter estimates (e.g., unidimensional vs. bifactor model; multidimensional vs. bifactor model) and to select a preferred model.
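The graded response models compared in this study define cumulative category probabilities from a slope vector and ordered intercepts; the multidimensional case simply replaces a single slope with a slope-by-dimension dot product. A minimal sketch, with hypothetical parameters rather than the study's estimates:

```python
import math

def grm_category_probs(theta: list[float], a: list[float], d: list[float]) -> list[float]:
    """Category probabilities under a multidimensional graded response model.
    Cumulative curves: P*(X >= k) = logistic(a . theta + d_k); adjacent
    differences between cumulative curves give the category probabilities."""
    z = sum(ai * ti for ai, ti in zip(a, theta))
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(z + dk))) for dk in d] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 4-category item loading on two correlated dimensions
probs = grm_category_probs(theta=[0.5, -0.2], a=[1.2, 0.9], d=[1.5, 0.0, -1.5])
print([round(p, 3) for p in probs])
```

The category probabilities always sum to 1, and the ordered intercepts guarantee properly ordered cumulative curves, which is what allows the item thresholds to be read as ordered locations on the latent continuum.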
In line with Styck, Rodriguez, and Yi [33]'s model fit results, the bifac-GR model fit our data best when compared to the multi- and uni-GR models. However, when we compared the estimated IRT conditional/marginal and multidimensional parameters and the information functions, we observed that the STICSA bifactor model did not offer as many advantages as expected. Yet, in the present study, the restricted bifactor model offered a way to deal with violations of LD (without removing them completely) and to improve the model's fit. It has been widely attested that a bifactor model is more robust than correlated-factor models to minor model misspecifications due to unmodeled complexity, and that it has greater parametric complexity [79]. Likewise, larger models tend to fit data sets better than more parsimonious models, though there is a trade-off with generalizability when models overfit a data set [39,80]. In addition, pro-bifactor bias occurs due to the functional form of the parameters estimated in the bifactor model (e.g., [80]; i.e., the "difficulty" of the estimated parameters). Models that estimate the same number of parameters can differ in functional form, and two models with different functional forms, but equivalent free parameters, will exhibit different fitting propensities.
In the present study, the importance of estimating marginal parameters and the TIF over the conditional bifactor model was demonstrated. A discrepancy between conditional and marginal slopes emerged because the specific latent attributes did not affect the probability of response to the items (which are principally reflective of the general trait and only secondarily of a specific trait) [50,69]. The conditional bifactor model seemed to account for the general state/trait dimension common to all STICSA items, but it failed to address the additional specific somatic/cognitive dimensions; in particular, it failed to address common variance in the cognitive group item contents of both the state and trait scales. However, this result was already known in the STICSA psychometrics literature [33]. Small factor loadings and zero or negative group-factor variances are also commonly identified in studies that aim to address a 'general psychopathology' factor reflecting commonality among all forms of psychopathology, along with several narrower psychopathology group factors (most commonly internalizing; see [81]). However, these findings can have limitations from a practical/clinical point of view. For example, the observed total/subscale scores may reflect a mixture of general and group factor variance. Observed correlations might be inflated or attenuated, since they might reflect the influence of the general factor, the group factor, or both. As a result, clinicians might be led to incorrect assessment and treatment planning in their practice, with harmful consequences [82,83].
Therefore, it becomes mandatory to discern between broad and narrow factors to assess whether factors reflect substantive constructs or artefacts. In line with this recommendation, the usefulness of the STICSA bifactor model was evaluated via a pairwise comparison approach. Utilizing the estimated marginal bifactor model and parameters (also restricted), the general dimension was of most concern, and items were allowed to load on specific factors when necessary (to account for an excess of LD). This model was found to be more precise and informative than the unidimensional and conditional bifactor models. Not surprisingly, unidimensional IRT models show a lack of measurement precision when scales are composed of correlated latent traits [84]. The computation of the marginal trace line transformation also emphasized the usefulness of testing IRT model assumptions within the context of classical bifactor modeling.
Next, the suitability of a multidimensional structure for each STICSA state/trait scale was secured via IRT, and two correlated factors were found to represent a plausible alternative to the bifactor model, as the item fit and multidimensional parameters of both the multi-GR and bifac-GR models were compared. We tested the possibility that a change in the dimensionality of the STICSA model would result in a different parameterization and meaning of the scale. The results of this second step showed that item #14 ("My arms and legs feel stiff") of the trait scale displayed a poor fit in both compared models. At the parameter level, no significant differences were observed between state and trait items in either model. Compared to the cognitive items, the somatic items were found to be less discriminative in both the trait and state scales. Opposing views exist on the role of either somatic or cognitive patterns of anxiety. For instance, some physiological symptom items in anxiety measures (e.g., pain-scale items) have been criticized due to possible overlap with depression symptomatology and have also raised doubts about the uniformity of the anxiety construct [85]. Likewise, somatic and cognitive items were equally endorsed by psychiatric inpatients, and psychological distress was limited to somatic signs of anxiety [86]. As highlighted by different authors, the somatic-cognitive pattern of anxiety has gained resonance among researchers working to ascertain the experiential components of anxiety [74,87]. Concerning the multidimensional item thresholds, in the present study all the STICSA items encompassed a discrete range of the latent trait continuum and showed a clear ordering based on each item's quality and estimated level of multidimensional difficulty.

Conclusions
Summing up, our findings supported the equivalence of the bifactor and two-correlated-factor models in terms of somatic-cognitive item multidimensional parameters. When only model fit indices were taken into account, the classical bifactor model performed best on our data. Despite this, the bifactor global domain (state or trait) seems not to be as accurate and precise as expected. To overcome this and other methodological issues (e.g., the LD issue within IRT), other computations might be needed, as in the present investigation (i.e., the marginal trace line transformation). For now, single global STICSA state or trait scores should be used with caution.
Our results showed the viability of measuring trait and state anxiety with the STICSA in a community sample with a wide age range, thus demonstrating its versatility. While different confirmatory factor models have been reported in the literature [33,34], in this study a two-factor model for both the trait and state scales was confirmed and preferred over the inflated classical bifactor model. Specifically, IRT yielded factors that were consistent with the original domains and showed good fit on separate trait and state datasets [33,34]. The moderately high intercorrelations between the somatic and cognitive factors provided support for the compensatory nature of the latent traits [88]. In this way, the somatic and cognitive symptoms measured by the STICSA items jointly improved the measurement precision of anxiety.
The results of the mIRT analysis of the STICSA trait and state scales suggested a more accurate method for evaluating the measure and, ultimately, participants' anxiety. When the discrimination parameter was taken into consideration, items on both the trait and state scales exhibited high multidimensional discrimination values, in line with past results reported for existing clinical instruments [89,90,91].
Further research is needed to verify this hypothesis in clinical samples. The results of this investigation suggest that somatic and cognitive symptoms are equally informative in pinpointing a stable tendency to respond to threats. The greatest advantage of applying multidimensional item response models to clinical assessment is that symptoms may be treated as ordered indicators of risk, and that specific symptoms and specific factors can be scaled onto a common trait. In clinical assessment, unobservable psychopathological constructs are multidimensional in nature (e.g., mental health or well-being). This is also confirmed by the recent success of transdiagnostic treatments for anxiety and mood disorders with high comorbidity [92].
Other limitations of this study concern sample size and characteristics. Although the sample was quite large and heterogeneous, no data were collected regarding diagnosis. As a consequence, the nature of the response process, as well as differences in trait manifestation, could differ between patients with anxiety as a primary diagnosis and depressed patients with anxiety in comorbidity. Future research should address the feasibility of mIRT models in clinical samples. Likewise, from a methodological point of view, there is no consensus among researchers on whether the accuracy of parameter estimates in mIRT is biased or inflated by sample size [93,94].

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, approved by the Institutional Ethics Committee of the Department of Psychological Sciences, Health and Territory.
Informed Consent Statement: Informed consent was obtained from all participants involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding authors. The data are not publicly available due to privacy reasons.

Acknowledgments:
We sincerely thank Antonio Paone for his help in formatting the manuscript, and Francesco Sulla for his help in improving language.