3.2. Confirmatory Factor Analysis
Consistent with prior validations of the FOS (e.g., Linderbaum & Levy, 2010; Lilford et al., 2014), we compared alternative models to evaluate the latent structure. A one-factor model showed poor fit, whereas a correlated four-factor model substantially improved it. Among the tested solutions, the second-order model, in which the four first-order factors load on a global feedback orientation construct and a correlated residual (described below) is included, not only provided the best overall fit but was also the most consistent with the theoretical conceptualization of feedback orientation as a multidimensional yet integrative construct (Table 3).
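To make this comparison concrete, the sketch below shows how the competing specifications could be expressed in Python with the semopy package, which uses lavaan-style model syntax. The item names (it1–it20), factor labels, and five-items-per-dimension layout are placeholders rather than our actual variable names, and the default maximum-likelihood estimator stands in for the robust (Yuan–Bentler) estimation reported in the text.

```python
# Sketch of the CFA model comparison (placeholder item names it1..it20).
import semopy

FIRST_ORDER = """
Utility         =~ it1 + it2 + it3 + it4 + it5
Accountability  =~ it6 + it7 + it8 + it9 + it10
SocialAwareness =~ it11 + it12 + it13 + it14 + it15
SelfEfficacy    =~ it16 + it17 + it18 + it19 + it20
"""

MODELS = {
    # single general factor for all items
    "one factor": "FO =~ " + " + ".join(f"it{i}" for i in range(1, 21)),
    # four first-order factors; factor covariances assumed free by default
    "four correlated factors": FIRST_ORDER,
    # higher-order factor plus the Items 1-2 correlated residual
    "second order + residual": (
        FIRST_ORDER
        + "FO =~ Utility + Accountability + SocialAwareness + SelfEfficacy\n"
        + "it1 ~~ it2\n"
    ),
}

def fit_indices(desc, data):
    """Fit one specification and return the headline fit indices."""
    model = semopy.Model(desc)
    model.fit(data)                    # default ML estimation
    stats = semopy.calc_stats(model)   # one-row DataFrame of fit statistics
    return {k: float(stats[k].iloc[0]) for k in ("CFI", "TLI", "RMSEA")}

# for name, desc in MODELS.items():          # data: respondents x items
#     print(name, fit_indices(desc, data))   # pandas DataFrame
```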
In line with recent validation efforts (e.g., Fuentes-Cimma et al., 2025, Figure 2, p. 41), we introduced one correlated residual in the CFA model, between Items 1 and 2, to improve overall model fit, as suggested by modification indices exceeding 50. This decision was grounded not only in statistical criteria but also in semantic and structural considerations. The two items appear consecutively in the questionnaire and load on the same latent factor, Utility, which may have contributed additional residual covariance beyond the target construct. Moreover, while not redundant, the items share partially overlapping content: Item 1 (“I find that feedback is critical for reaching my goals”) and Item 2 (“Feedback is critical for improving performance”) both highlight the instrumental function of feedback in achieving professional objectives (see Table 1). To clarify whether the correlated residual between Items 1 and 2 reflected true redundancy or merely shared substantive content, we estimated two additional CFA models in which either Item 1 or Item 2 was removed from the Utility factor. This step allowed us to compare the effectiveness of item deletion against modeling the covariance directly. Full comparison results are reported in Table 4.
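Under the same placeholder naming as above, the three treatments of the Items 1 and 2 overlap differ only in the Utility block; each variant can be concatenated with the remaining factor definitions and the second-order structure and fitted with the helper shown earlier.

```python
# Alternative treatments of the Items 1-2 overlap (placeholder names):
UTILITY_VARIANTS = {
    "correlated residual": "Utility =~ it1 + it2 + it3 + it4 + it5\nit1 ~~ it2",
    "drop Item 1":         "Utility =~ it2 + it3 + it4 + it5",
    "drop Item 2":         "Utility =~ it1 + it3 + it4 + it5",
}
# Fit each variant and compare global fit, alpha, and AVE (cf. Table 4).
```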
As expected, both reduced models yielded small improvements in global fit indices relative to the second-order specification without correlated residuals. However, these changes were marginal and did not translate into meaningful gains in internal consistency (α) or convergent validity (AVE). Across all specifications, the Utility factor remained psychometrically stable, and neither item demonstrated a uniquely problematic pattern of loadings or reliability. Thus, item removal did not offer substantive advantages in terms of scale performance.
Importantly, Items 1 and 2, while partially overlapping, capture complementary aspects of the instrumental value of feedback (progress toward goals and performance improvement), two facets that are both theoretically relevant. Eliminating either item would therefore narrow the conceptual coverage of the Utility dimension without improving the psychometric quality of the scale. Furthermore, in their cross-cultural adaptation, Fuentes-Cimma et al. (2025) also modeled residual covariance between these same items, arguing that their highly similar semantic content and adjacent positioning in the questionnaire may generate shared variance not fully captured by the latent factor.
For these reasons, we retained the full item set and modeled the residual covariance between Items 1 and 2. This approach provided a modest but consistent improvement in model fit while preserving the conceptual integrity and breadth of the construct.
Analyses of convergent and discriminant validity indicated that the Accountability dimension was the least robust component of the scale. Its AVE was 0.41, below the recommended 0.50 threshold (Fornell & Larcker, 1981; Kline, 2011; Brown, 2015), and its internal consistency was only moderate (α = 0.732). The HTMT ratio between Utility and Accountability reached 0.89, exceeding the conservative 0.85 cutoff and approaching the 0.90 boundary proposed in recent SEM literature (Henseler et al., 2015; Hair et al., 2022), suggesting substantial conceptual overlap between the two factors. This strong association is theoretically plausible: perceiving feedback as important for achieving goals (Utility) naturally overlaps with perceiving a responsibility to act on feedback (Accountability). Although conceptually related, the two constructs remain distinguishable, with Utility reflecting the instrumental value of feedback and Accountability capturing its normative, duty-oriented component. All other HTMT values were well below the recommended thresholds, indicating adequate discriminant validity for the remaining factor pairs.

To address the concern that Utility and Accountability might not represent empirically distinct constructs, we estimated an alternative second-order model in which all items from these two dimensions were combined into a single first-order factor. This three-factor model (Utility/Accountability, Social Awareness, Feedback Self-Efficacy) showed clearly poorer global fit than the four-factor solution (e.g., robust CFI ≈ 0.89, robust TLI ≈ 0.87, robust RMSEA ≈ 0.10, SRMR = 0.06), indicating that collapsing the two dimensions does not provide an adequate representation of the data. Overall, the results indicate that although perceiving feedback as useful may incline individuals to feel responsible for acting on it (a theoretically coherent association), the two dimensions remain empirically separable and functionally distinct within the measurement model.
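Both statistics used in this analysis can be reproduced outside any SEM package. The sketch below implements AVE from standardized loadings and the HTMT ratio of Henseler et al. (2015) from an item correlation matrix; the numbers in the usage comment are illustrative, not our estimates.

```python
import numpy as np
import pandas as pd

def ave(std_loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    lam = np.asarray(std_loadings, dtype=float)
    return float(np.mean(lam ** 2))

def htmt(corr: pd.DataFrame, items_a, items_b):
    """Heterotrait-monotrait ratio of correlations (Henseler et al., 2015):
    mean between-factor item correlation over the geometric mean of the
    within-factor item correlations."""
    hetero = np.abs(corr.loc[items_a, items_b].to_numpy()).mean()

    def mono(items):
        r = np.abs(corr.loc[items, items].to_numpy())
        iu = np.triu_indices_from(r, k=1)   # off-diagonal entries only
        return r[iu].mean()

    return float(hetero / np.sqrt(mono(items_a) * mono(items_b)))

# Illustrative: ave([0.80, 0.75, 0.78, 0.76]) is about 0.60;
# htmt(item_corr, utility_items, accountability_items) near 0.85-0.90
# would flag the overlap discussed above.
```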
Given the comparatively weaker convergent validity, reduced internal coherence, and the highest inter-factor overlap of the Accountability dimension, we proceeded to examine this factor through item-level diagnostics to identify potential sources of misfit.
Item-level diagnostics confirmed this pattern: Item 8 displayed the lowest corrected item–total correlation (r.drop = 0.45) and the lowest communality (h² = 0.24), markedly weaker than the values observed for the remaining items (r.drop = 0.55–0.60; h² = 0.51–0.69). To evaluate its impact on the model, we estimated a CFA excluding Item 8. The revised model showed a clear improvement in global fit (robust CFI increasing from 0.921 to 0.930; robust TLI from 0.903 to 0.913) and enhanced convergent validity, with the AVE for Accountability rising from 0.41 to 0.50. Discriminant validity also benefited from the removal: the HTMT between Utility and Accountability decreased from 0.89 to 0.863, falling below the 0.90 critical boundary. Importantly, modification indices no longer suggested local misfit within the Accountability factor once Item 8 was removed; the only residual correlation that remained necessary and theoretically justified concerned Items 1 and 2. Overall, the convergence of statistical and conceptual evidence indicated that Item 8 contributed disproportionately to the reduced coherence of the Accountability dimension. Its exclusion resulted in a more stable, internally consistent, and discriminant factor structure, and the item was therefore removed from the final validated version of the scale.
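The corrected item–total correlation reported here (the r.drop statistic produced by R's psych package) is simple to recompute; a minimal sketch, assuming responses sit in a pandas DataFrame with one placeholder column per item:

```python
import pandas as pd

def r_drop(responses: pd.DataFrame, item: str) -> float:
    """Corrected item-total correlation: the correlation between one item
    and the sum of all remaining items (total score with the item dropped)."""
    rest_total = responses.drop(columns=[item]).sum(axis=1)
    return float(responses[item].corr(rest_total))

# Usage: r_drop(data, "it8")  # placeholder column name; low values flag
# items weakly related to the rest of the scale, as observed for Item 8.
```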
Importantly, the removal of Item 8 did not negatively affect the other dimensions: their reliability and convergent validity indices remained essentially unchanged, with differences never exceeding 0.001 across models. Utility retained an AVE of 0.597, Social Awareness an AVE of 0.526, and Feedback Self-Efficacy an AVE of 0.501, mirroring the values obtained in the initial model that included all items. This stability indicates that the exclusion of Item 8 strengthened the Accountability factor without altering the psychometric performance of the remaining dimensions. Overall, eliminating Item 8 strengthened the coherence of the Accountability factor and reduced its redundancy with Utility, enhancing both convergent and discriminant validity without compromising the structure of the scale.

Following the removal of Item 8, the Self-Efficacy factor remained the dimension with the weakest convergent validity, with an AVE of 0.495, acceptable but still indicative of limited shared variance among its indicators. Item-level diagnostics suggested that Item 16 was the primary source of this weakness. Modification indices revealed a large residual correlation between Items 15 and 16 (MI = 77.54), implying substantial redundancy beyond what is explained by the latent factor. This redundancy was theoretically plausible: the English wording of the two items confirms their semantic proximity, with Item 15 (“I feel confident when responding to both positive and negative feedback”) and Item 16 (“I feel self-assured when dealing with feedback”) both capturing a broad, overlapping sense of confidence in handling evaluative information rather than distinct aspects of feedback-related self-efficacy. Statistical evidence also supported the removal of Item 16: it showed the highest residual variance among all indicators of the factor (residual = 0.396; R² = 0.403), suggesting a limited contribution to the latent construct. We systematically compared three alternative specifications: retaining both items with a correlated residual, removing Item 15 instead of Item 16, and removing Item 16 alone. Introducing the residual correlation between Items 15 and 16 failed to strengthen the factor, lowering the AVE to 0.466, while removing Item 15 produced negligible improvements. In contrast, removing Item 16 yielded a clearer and more coherent Self-Efficacy factor, raising its AVE to 0.514 without negatively affecting reliability or the performance of the remaining dimensions. Taken together, these conceptual and statistical considerations indicated that Item 16 added redundancy rather than substantive information to the Self-Efficacy construct; its removal therefore produced a more parsimonious and psychometrically robust factor structure.
After the iterative refinement of the scale, including the removal of Items 8 and 16, the final model displayed a clear and well-defined hierarchical structure, with the four first-order dimensions loading onto a second-order Feedback Orientation factor (FOSV). The specification also retained a theoretically justified correlated residual between Items 1 and 2, reflecting their shared wording and closely aligned content. The overall model showed solid global fit (robust CFI = 0.942, robust TLI = 0.926, RMSEA = 0.057, SRMR = 0.052), indicating that the second-order structure captured the latent construct with good parsimony and minimal local misfit. Convergent validity was satisfactory, with AVE values of 0.597 for Utility, 0.500 for Accountability, 0.526 for Social Awareness, and 0.514 for Self-Efficacy, all at or above the recommended benchmark. Composite reliability coefficients were likewise adequate (0.821, 0.742, 0.816, and 0.752, respectively), confirming good internal consistency across dimensions. Discriminant validity was supported by the pattern of HTMT correlations, all below conventional cutoffs: the highest value, 0.838 between Utility and Accountability, remained within acceptable limits, while all other associations were considerably lower. Overall, once the residual dependence between Items 1 and 2 was accounted for and the item-level refinements were applied, the second-order model demonstrated a coherent factorial structure, balanced reliability, and solid convergent and discriminant validity, supporting the adequacy of the finalized measurement model.
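The composite reliability coefficients reported above follow the standard congeneric formula; a minimal sketch, assuming standardized loadings and uncorrelated residuals (the retained Items 1–2 residual covariance would contribute an additional term to the denominator):

```python
import numpy as np

def composite_reliability(std_loadings):
    """Congeneric composite reliability:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances),
    where each residual variance is 1 - lambda^2 under standardization."""
    lam = np.asarray(std_loadings, dtype=float)
    explained = lam.sum() ** 2
    residual = np.sum(1.0 - lam ** 2)
    return float(explained / (explained + residual))

# Illustrative: composite_reliability([0.78, 0.74, 0.77, 0.79]) is about 0.85.
```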
Criterion-related (predictive) validity was supported: feedback orientation showed a positive association with job satisfaction, consistent with meta-analytic evidence (rc ≈ 0.33; Katz et al., 2023); in our data, r = 0.25, p < 0.001. Construct validity was further corroborated by the positive correlation between feedback orientation and the feedback environment, consistent with meta-analytic evidence on the feedback environment (rc ≈ 0.42; Katz et al., 2021) and discussed in the feedback orientation meta-analysis (Katz et al., 2023); in our data, r = 0.41, p < 0.001. This approach mirrors Linderbaum and Levy’s (2010) original validation strategy, reinforcing the argument that the scale captures a dispositional tendency embedded within established feedback contexts.
After establishing the final second-order CFA structure, we tested its measurement invariance across key demographic and occupational subgroups. Invariance testing followed established recommendations for multi-group CFA (Cheung & Rensvold, 2002; Chen, 2007; Putnick & Bornstein, 2016). We sequentially evaluated configural, metric, scalar, and strict invariance across four grouping variables: gender (male and female; participants identifying with other gender categories were excluded from this analysis due to the very small subgroup size, n = 8), age (≤40 vs. >40 years), educational level (lower vs. higher education), and job role (managerial vs. non-managerial positions).
Model comparisons were evaluated using changes in CFI, SRMR, and RMSEA. In line with established recommendations (Cheung & Rensvold, 2002; Chen, 2007), measurement invariance was considered tenable when ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 for metric invariance or ≤ 0.010 for scalar and strict invariance. Robust (Yuan–Bentler) fit indices were used for all evaluations.
Table 5 reports the complete fit indices and Δ values for each step.
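The decision rule applied at each step can be stated compactly in code; the sketch below encodes the Cheung and Rensvold (2002) and Chen (2007) thresholds as used here, and the values in the closing comment are illustrative rather than taken from Table 5.

```python
def invariance_tenable(fit_prev: dict, fit_curr: dict, step: str) -> bool:
    """Evaluate one invariance step (metric, scalar, or strict) against the
    change-in-fit criteria: CFI may drop by at most 0.010, RMSEA may rise by
    at most 0.015, and SRMR may rise by at most 0.030 at the metric step or
    0.010 at the scalar and strict steps."""
    d_cfi = fit_curr["cfi"] - fit_prev["cfi"]
    d_rmsea = fit_curr["rmsea"] - fit_prev["rmsea"]
    d_srmr = fit_curr["srmr"] - fit_prev["srmr"]
    srmr_limit = 0.030 if step == "metric" else 0.010
    return d_cfi >= -0.010 and d_rmsea <= 0.015 and d_srmr <= srmr_limit

# Illustrative: a CFI drop of 0.001 with an SRMR rise of 0.004 at the metric
# step is tenable; a CFI drop of 0.013 at the strict step is not.
```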
We acknowledge that the RMSEA of the configural model lies in the 0.08–0.09 range. Following the interpretative guidelines proposed by MacCallum et al. (1996), RMSEA values between 0.08 and 0.10 reflect a mediocre level of fit, which does not necessarily indicate poor model performance. This interpretation is consistent with the observation that RMSEA tends to be more severe in complex, multifactor item-level models. As noted by Marsh et al. (2004), such models frequently yield higher RMSEA values due to their structural complexity, even when the underlying factor solution is reasonable. Chen (2007), in particular, demonstrated that RMSEA can fluctuate with model complexity and degrees of freedom, sometimes producing indications of non-invariance even when constraints are tenable. At the same time, we recognize that the mediocre RMSEA of the configural model calls for cautious interpretation. As this is the first study to examine measurement invariance for the Italian adaptation of the FOS, we consider the present findings supportive but not definitive. Further research is needed to determine whether the observed RMSEA values reflect methodological characteristics of the index in complex models or subtle group differences that warrant additional investigation.
Gender: The configural model showed acceptable CFI (0.914) and SRMR (0.057), together with an RMSEA of 0.089, which falls within the mediocre but acceptable range for complex multifactor models (MacCallum et al., 1996). This suggests that men and women share a broadly similar factor structure, although some misfit is present and should be interpreted cautiously.
Metric invariance was supported (ΔCFI = −0.001; ΔSRMR = 0.004), indicating that factor loadings are comparable across genders. Scalar invariance was also broadly supported (ΔCFI = −0.004; ΔSRMR = 0.001), suggesting similar item intercepts, although the RMSEA remained modestly elevated. Strict invariance showed a more notable drop in CFI (ΔCFI = −0.013), indicating only partial support for equality of residual variances. The factor structure and loadings therefore appear stable across gender, and latent mean comparisons are tentatively possible, but comparisons of raw scores are not recommended, and all interpretations should remain cautious given the mediocre RMSEA and the decrease in CFI at the strict level.
Age: The configural model displayed CFI = 0.918 and SRMR = 0.056, with an RMSEA of 0.088, again indicating mediocre but acceptable fit. This suggests that the basic factor structure is reasonably consistent across age groups.
Metric and scalar invariance showed stable ΔCFI values (−0.005 for both) and acceptable ΔSRMR values (0.009 and 0.002), supporting comparable loadings and intercepts across younger and older workers. At the strict level, ΔCFI reached −0.026, indicating that error variances differ more substantially. Across age groups, structure, loadings, and intercepts therefore appear broadly comparable, but strict invariance is not supported; latent means may be compared with caution, and raw-score comparisons should be avoided.
Education: The configural model (CFI = 0.913; SRMR = 0.053; RMSEA = 0.089) again showed mediocre but acceptable fit, supporting a similar factor structure across education levels. Metric and scalar invariance were supported (|ΔCFI| ≤ 0.001; ΔSRMR ≤ 0.004), indicating stability of loadings and intercepts. Strict invariance also showed acceptable ΔCFI (−0.010) and ΔSRMR (0.002). The measurement model is therefore largely invariant across education groups, enabling cautious comparisons of latent means. As with the other groupings, the mediocre RMSEA of the configural model suggests interpreting results with restraint.
Job role: The configural model showed acceptable CFI (0.912) and SRMR (0.056), with an RMSEA of 0.091. This aligns with the “mediocre but acceptable” range typical for complex item-level models.
Metric invariance was supported (ΔCFI = −0.004; ΔSRMR = 0.006), and scalar invariance also met the recommended criteria (ΔCFI = −0.003; ΔSRMR = 0.001). Strict invariance yielded ΔCFI = −0.003 and a negligible change in SRMR, indicating a stable model across increasingly restrictive constraints. This is the strongest invariance pattern among the four groupings, within the limits of the mediocre RMSEA observed at the configural level. The factor structure, loadings, and intercepts are comparable across managerial and non-managerial employees, and error variances do not show substantial differences. Latent means can therefore be compared, though the mediocre RMSEA of the configural model still suggests cautious interpretation.