1. Introduction
Clinical simulation has become a cornerstone of healthcare education over the past two decades, providing immersive, controlled environments in which learners can practise clinical tasks and decision-making without risk to patients. Simulation-based education (SBE) has been shown to enhance technical proficiency, reduce medical errors, and contribute to improved patient safety across healthcare disciplines (Muñoz Gualán & Sierra, 2025). Evidence from systematic reviews confirms that SBE significantly improves both technical competencies and a range of non-technical skills including communication, teamwork, leadership, and clinical reasoning (Muñoz Gualán & Sierra, 2025; Patten et al., 2026; Pucer et al., 2025).
Despite this strong evidence base, a persistent asymmetry exists in the simulation literature: the majority of research and curricular investment has focused on technical skills (TS) training, while non-technical skills (NTS), encompassing interpersonal, cognitive, and behavioural competencies such as situational awareness, decision-making, and crisis resource management, remain comparatively understudied (Patten et al., 2026; Pucer et al., 2025). This imbalance is particularly concerning given that adverse clinical events are frequently attributed to failures in NTS rather than technical errors (Gawronski et al., 2022). Some authors argue that NTS are best developed through interprofessional simulation, yet others highlight the lack of validated measurement instruments and the methodological heterogeneity across studies as barriers to drawing firm conclusions (Patten et al., 2026; Gawronski et al., 2022). It is worth noting that several behavioural assessment tools for NTS have been developed and validated in specific clinical contexts, including Non-Technical Skills for Crew Resource Management (NOTECHS) (Flin et al., 2003), Anaesthetists’ Non-Technical Skills (ANTS) (Fletcher et al., 2003), and the Ottawa Crisis Resource Management Global Rating Scale (Ottawa GRS) (Kim et al., 2006; Kim et al., 2009). These instruments provide structured, observer-rated evaluations of NTS competencies during simulation, in contrast to the learner self-report perception measures used in the present study. Their existence contextualises the current methodological choice: while validated behavioural rating tools capture objective performance, perception-based measures such as those employed here capture a distinct and complementary construct, the learner’s subjective experience of training quality, which is the appropriate operationalisation of Kirkpatrick Level 1 evaluation.
A key but underexplored question is whether learners perceive TS and NTS simulation training as equally effective. Understanding learner satisfaction is directly relevant to educational practice because, within Kirkpatrick’s four-level evaluation framework (Kirkpatrick & Kirkpatrick, 2016), Level 1 reactions are considered foundational prerequisites for deeper learning and skill transfer. Positive learner perceptions have been linked to increased engagement and more effective application of skills in clinical practice (Bdiri Gabbouj et al., 2024; McGaghie et al., 2010). However, whether participants in TS and NTS courses differ systematically in their Level 1 reactions, and whether these differences have practical educational significance beyond statistical significance, has not been rigorously examined.
Furthermore, demographic and professional characteristics may moderate how learners engage with and perceive different types of simulation training. Age, professional category, and institutional setting have been shown to shape educational preferences and participation patterns in simulation programmes (Zhang, 2023; Pimenta et al., 2025; Prete et al., 2024; Ylönen et al., 2025), yet comprehensive analyses of these moderating effects across TS and NTS course types remain scarce, especially in multi-institutional, accredited simulation settings. Clarifying these associations has direct implications for equitable programme design and targeted learner recruitment.
Addressing these gaps, the present study aimed to: (1) analyse healthcare professionals’ perceptions of simulation training classified by primary learning objective (TS versus NTS) using Kirkpatrick’s Level 1 framework; (2) examine the educational significance of perception differences by calculating Cohen’s d effect sizes at both domain and item level; and (3) explore associations between participant characteristics and course type at an SSH-accredited clinical simulation centre serving multiple healthcare institutions in Spain.
2. Materials and Methods
2.1. Study Design, Setting, and Ethics
This retrospective cross-sectional study included all simulation-based training activities conducted at the Centre d’Innovació i Simulació Territorial (CISTE) during 2024 and 2025. CISTE is a high-performance clinical simulation centre affiliated with the Chair of Innovation and Simulation in Health of les Terres de l’Ebre (Universitat Rovira i Virgili, Spain), accredited by the Society for Simulation in Healthcare (SSH) and operating in accordance with SSH and INACSL Standards of Best Practice in all training activities. The retrospective design was chosen to maximise the available sample from routinely collected institutional data without introducing selection effects that a prospective design might create.
Ethical approval was granted by the Research Ethics Committee on Medicinal Products (Comitè d’Ètica d’Investigació amb Productes Sanitaris i de la Salut d’Atenció Primària, CEIPSA) under project code 2023 TDO 0112. All participants provided written informed consent prior to inclusion. The study was conducted in accordance with the Declaration of Helsinki (World Medical Association, 2013) and applicable Spanish data protection regulations (Organic Law 3/2018).
2.2. Course Classification and Participants
All healthcare professionals and students enrolled in simulation training at CISTE during the study period were eligible for inclusion. Courses were classified according to their primary learning objective as either technical skills (TS) or non-technical skills (NTS) training. TS courses focused on procedural competencies and hands-on clinical technique acquisition using high-fidelity mannequins and task trainers (e.g., advanced life support, surgical skills, ultrasound-guided procedures). NTS courses targeted cognitive, behavioural, and interpersonal competencies, including communication, teamwork, leadership, and crisis resource management.
For the 2025 dataset (n = 941), course classification was assigned prospectively by the CISTE expert team and recorded in the institutional database at the time of programme design. For the 2024 dataset (n = 865), classification was applied retrospectively by the same multidisciplinary team (comprising two simulation educators and one clinical specialist) using course names, syllabi, and stated learning objectives. Consensus was reached for all 45 courses; no course required arbitration or was excluded due to ambiguity. The classification criterion was identical across both years: a course was assigned to TS if its primary learning objective was procedural competency acquisition, and to NTS if its primary objective was the development or application of cognitive, interpersonal, or behavioural competencies. No course was classified as both.
Participant demographic and professional data collected at enrolment included age (date of birth), biological sex, professional category (nurse, physician, TCAI, student, or other), and institution type (hospital, intermediate care, mental health, or university/community settings). Age was computed from date of birth and date of the session; values outside the plausible range (17–66 years) were excluded from age-related analyses (n = 1547 valid, 85.7%).
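The age derivation described above can be sketched as follows. This is a minimal illustration that assumes age is counted in completed years at the session date; the function names and the example records are hypothetical, and only the 17–66 plausibility range is taken from the text.

```python
from datetime import date

# Plausibility bounds quoted in the text; ages outside them are
# excluded from age-related analyses only.
MIN_AGE, MAX_AGE = 17, 66


def age_at_session(date_of_birth: date, session_date: date) -> int:
    """Completed years between date of birth and session date (assumed rule)."""
    years = session_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the session year.
    if (session_date.month, session_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def is_plausible(age: int) -> bool:
    """True when the age falls inside the plausible range."""
    return MIN_AGE <= age <= MAX_AGE


# Hypothetical (date of birth, session date) pairs, not study data.
records = [
    (date(1990, 6, 15), date(2025, 3, 10)),
    (date(2024, 1, 1), date(2025, 3, 10)),   # likely data-entry error
    (date(1950, 2, 1), date(2024, 11, 5)),   # above the upper bound
]
ages = [age_at_session(dob, session) for dob, session in records]
valid_ages = [a for a in ages if is_plausible(a)]
```

Filtering by range rather than deleting the participant keeps the remaining variables for those records available to the non-age analyses, which matches the exclusion being limited to age-related analyses.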
2.3. Data Collection Instrument
Participant perceptions were assessed at the end of each training session using a 16-item ad hoc questionnaire administered electronically via Microsoft Forms (Microsoft Corporation, Redmond, WA, USA). The instrument was structured into four domains: (1) Simulation/Methodology (8 items; e.g., “The simulation session met my learning expectations”; “The simulation improved my awareness of clinical errors”); (2) Teachers/Instructors (3 items; e.g., “Instructors created a safe learning environment”); (3) Technological Materials (2 items; e.g., “The simulated environment was realistic”); and (4) Facilities/Staff (3 items; e.g., “The facilities met my expectations”). Each item was rated on a 10-point Likert-type scale (1 = strongly disagree; 10 = strongly agree). Completion was voluntary and anonymous.
Because the questionnaire was developed specifically for this study, the following structural validation analyses were conducted to establish its psychometric suitability as the primary measurement instrument of the study: internal consistency, exploratory factor analysis, and convergent and discriminant validity assessment. Internal consistency was assessed using Cronbach’s alpha (α) for the total scale and each domain separately. The total scale demonstrated excellent reliability (α = 0.958), exceeding the α > 0.90 threshold recommended for research instruments (Taber, 2018). Domain-level alpha values were: Simulation/Methodology α = 0.930; Teachers/Instructors α = 0.898; Technological Materials α = 0.748; Facilities/Staff α = 0.912. The questionnaire instrument, item-level data, and full EFA results are available in Supplementary Table S1.
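For readers who wish to reproduce an internal consistency estimate on their own data, the standard Cronbach’s alpha formula can be sketched as below. The study’s analyses were run in SPSS; this Python version is only an illustrative equivalent, and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete-case data.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 respondents x 3 items on a 1-10 scale.
toy = [
    [9, 10, 9],
    [8, 8, 7],
    [10, 10, 10],
    [7, 8, 8],
    [9, 9, 10],
]
alpha = cronbach_alpha(toy)
```

The same function applies to a single domain by passing only that domain’s item columns.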
2.3.1. Structural Validity: Exploratory Factor Analysis
An EFA was conducted on the complete-case item correlation matrix (N = 1814 participants with all 16 items present). Suitability was confirmed by KMO = 0.957 (“meritorious”; Kaiser, 1974) and Bartlett’s test of sphericity (χ²(120) = 26,131.5, p < 0.001). The number of factors was determined by parallel analysis (200 random datasets; 95th percentile threshold): only F1 (λ = 10.143) exceeded its random counterpart (λ = 1.196), while F2 (λ = 1.127) fell below the parallel analysis threshold (λ = 1.155). A one-factor solution was therefore retained. Extraction was performed using Maximum Likelihood (ML). The single factor explained 63.4% of total variance. All 16 items showed substantial loadings on F1 (range: 0.493–0.849; all ≥ 0.40), with mean communality = 0.475. Full factor loadings and communalities are reported in Supplementary Table S1.
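The retention decision reported above can be illustrated with a short sketch. It covers only the comparison step, using the eigenvalues and thresholds quoted in the text; it does not regenerate the 200 random datasets, and the function name is hypothetical.

```python
def factors_to_retain(observed, thresholds):
    """Count leading factors whose observed eigenvalue exceeds the
    corresponding random-data threshold; stop at the first failure."""
    count = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        count += 1
    return count


observed = [10.143, 1.127]    # F1 and F2 eigenvalues reported in the text
thresholds = [1.196, 1.155]   # matching 95th-percentile random eigenvalues
n_items = 16

n_factors = factors_to_retain(observed, thresholds)

# For a correlation matrix the eigenvalues sum to the number of items,
# so the first factor's share of total variance is eigenvalue / n_items.
explained = observed[0] / n_items
```

With the reported values, the rule retains one factor and the first eigenvalue corresponds to roughly 63.4% of total variance, in line with the figures given in the text.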
These results indicate that the 16-item questionnaire is essentially unidimensional: all items measure a single general satisfaction factor. The overall perception score (mean of all 16 items) is therefore the psychometrically justified primary outcome variable. Domain-level groupings (Simulation/Methodology, Teachers/Instructors, Technological Materials, Facilities/Staff) function as analytical facets of this single construct rather than as independent latent dimensions, and domain-level comparisons should be interpreted accordingly.
2.3.2. Convergent and Discriminant Validity
Convergent validity was assessed using the average variance extracted (AVE) per domain (Fornell & Larcker, 1981). Only the Simulation/Methodology domain met the AVE ≥ 0.50 threshold (AVE = 0.564); the remaining domains showed AVE values below 0.50 (Teachers/Instructors: 0.339; Technological Materials: 0.459; Facilities/Staff: 0.386), reflecting the strong cross-domain correlations (r = 0.71–0.80) characteristic of a unidimensional instrument. Composite reliability (CR) values ranged from 0.602 to 0.911. The Fornell–Larcker discriminant validity criterion was not met for any domain (√AVE = 0.582–0.751; all below maximum inter-domain r = 0.803), confirming that the four subscales do not constitute empirically distinct constructs. These findings are consistent with the unidimensional EFA solution and further support using the overall score as the primary outcome. Detailed indices are provided in Supplementary Table S1.
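The indices above follow standard formulas, sketched below under the assumption of standardised loadings. The AVE values and the maximum inter-domain correlation are those reported in the text; the example loadings are hypothetical, and the Fornell–Larcker check is simplified to a comparison against the single maximum correlation, as in the reported summary.

```python
from math import sqrt


def ave(loadings):
    """Average variance extracted: mean of squared standardised loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return s * s / (s * s + error)


def fornell_larcker_met(ave_value, max_r):
    """Criterion holds when sqrt(AVE) exceeds the largest inter-domain correlation."""
    return sqrt(ave_value) > max_r


# AVE values and maximum inter-domain correlation as reported in the text.
reported_ave = {
    "Simulation/Methodology": 0.564,
    "Teachers/Instructors": 0.339,
    "Technological Materials": 0.459,
    "Facilities/Staff": 0.386,
}
max_inter_domain_r = 0.803

criterion = {name: fornell_larcker_met(v, max_inter_domain_r)
             for name, v in reported_ave.items()}

# Hypothetical loadings, only to show how AVE and CR are computed.
example_loadings = [0.6, 0.7, 0.8]
example_ave = ave(example_loadings)
example_cr = composite_reliability(example_loadings)
```

Applied to the reported AVE values, the criterion evaluates to false for every domain, which is the pattern described in the text.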
2.4. Statistical Analysis
All analyses were performed using IBM SPSS Statistics version 27.0 (IBM Corporation, Armonk, NY, USA). Analyses were conducted at the individual participant level (n = 1806). Descriptive statistics were calculated for all variables; continuous variables are reported as mean ± standard deviation (SD) and median; categorical variables as frequencies and percentages.
Associations between course type (TS vs. NTS) and participant characteristics (age, gender, professional category, and institution type) were examined using Spearman’s rank correlation coefficients (ρ) and chi-square (χ²) tests. Where chi-square indicated a significant association but the Spearman correlation did not, the association was interpreted as non-ordinal. Group differences in domain perception scores were assessed using independent-samples t-tests with Welch’s correction for unequal variances; a two-sided p < 0.05 was considered statistically significant.
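As an illustration of the chi-square test of independence mentioned above, the statistic and its degrees of freedom can be computed from a contingency table as sketched below. The counts are hypothetical and are not study data; obtaining a p-value additionally requires the chi-square distribution, which is left to a statistics package.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical 2 x 2 table: rows = course type (TS, NTS),
# columns = two participant categories.
table = [
    [30, 10],
    [20, 40],
]
stat, dof = chi_square(table)
```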
To quantify the practical educational significance of observed differences, Cohen’s d effect sizes were calculated for all TS vs. NTS comparisons at both domain and item level, using pooled standard deviations. Effect sizes were interpreted as trivial (d < 0.20), small (d = 0.20–0.49), medium (d = 0.50–0.79), or large (d ≥ 0.80) (Cohen, 1988). This step was considered essential given the large sample size, which confers sufficient statistical power to detect educationally trivial differences as statistically significant.
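The group comparison and effect size calculation described above can be sketched as follows. The analyses in this study were performed in SPSS; this is an illustrative Python equivalent using the standard formulas for Welch’s t statistic, the Welch–Satterthwaite degrees of freedom, and Cohen’s d with a pooled standard deviation. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of both groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


def interpret(d):
    """Label an effect size using the thresholds stated in the text."""
    size = abs(d)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Hypothetical domain scores on the 1-10 scale, not study data.
ts_scores = [10, 9, 10, 9, 10, 8]
nts_scores = [9, 9, 10, 8, 9, 8, 10, 9]

d = cohens_d(ts_scores, nts_scores)
t, dof = welch_t(ts_scores, nts_scores)
label = interpret(d)
```

Reporting d alongside the test statistic keeps the magnitude of a difference visible even when a large sample makes very small differences statistically significant.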
A post hoc sample size adequacy check was performed using G*Power 3.1 (Faul et al., 2007). For the t-test analyses, the minimum required sample was estimated based on the smallest observed effect size (d = 0.21, Facilities/Staff domain), an allocation ratio of 1:2.1 (TS:NTS), α = 0.05 (two-tailed), and 80% power, yielding a minimum of 796 participants. The enrolled sample (n = 1806) exceeded this by a factor of 2.3, with observed power ranging from 98.8% to >99.9% across domains and a minimum detectable effect of d = 0.14. For Spearman correlations, the Fisher z-transformation method indicated a minimum requirement of n = 490 for the weakest significant association (ρ = 0.126); the enrolled sample provided adequate power for all significant correlations (ρ = 0.059–0.107).
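The logic of the two-group adequacy check can be approximated with the normal-distribution formulas sketched below. This is not the G*Power computation, which relies on the noncentral t distribution, so the required sample size it returns should be expected to differ somewhat from the reported figure. The function names are hypothetical; the inputs (d = 0.21, allocation ratio 2.1, n = 1806, α = 0.05, 80% power) are taken from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Sum of the two-sided critical z value and the z value for the target power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def required_total_n(d, ratio, alpha=0.05, power=0.80):
    """Approximate total n for a two-group comparison with n2 = ratio * n1."""
    n1 = (1 + 1 / ratio) * (z_sum(alpha, power) / d) ** 2
    return ceil(n1) + ceil(ratio * n1)


def minimum_detectable_d(total_n, ratio, alpha=0.05, power=0.80):
    """Approximate smallest d detectable with the given total n and allocation."""
    n1 = total_n / (1 + ratio)
    n2 = total_n - n1
    return z_sum(alpha, power) * sqrt(1 / n1 + 1 / n2)


# Inputs quoted in the text.
approx_required_n = required_total_n(d=0.21, ratio=2.1)
approx_mde = minimum_detectable_d(total_n=1806, ratio=2.1)
```

Under this approximation the minimum detectable effect for the enrolled sample rounds to d = 0.14, in agreement with the value reported above.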
4. Discussion
This study analyzed healthcare professionals’ perceptions of simulation-based training classified by primary learning objective, technical skills (TS) versus non-technical skills (NTS), using Kirkpatrick’s Level 1 evaluation framework, and examined associations with participant characteristics at an SSH-accredited simulation center. Overall, participants reported excellent perceptions across all domains (all means > 9.0 on a 10-point scale), consistent with prior studies reporting high learner satisfaction in high-fidelity simulation contexts (Bdiri Gabbouj et al., 2024; McGaghie et al., 2010).
A key finding of this study is that, although differences were identified between TS and NTS courses across all perception domains, these differences were generally limited in magnitude. This suggests that, despite some variation in how participants evaluated the two types of courses, perceptions were broadly comparable across training contexts, highlighting more similarities than substantial disparities in the overall learning experience. Consequently, the presence of statistical significance alone does not imply that these differences are educationally meaningful. The EFA establishes that the questionnaire measures a single general satisfaction construct, so domain-level effect sizes represent facets of this construct rather than independent dimensions. Taken together, the data indicate that high-fidelity simulation training is perceived as highly effective regardless of course type. The item-level pattern reveals that the TS advantage concentrates specifically in items measuring concrete and observable learning gains (P5, P6, P7, P8), while structural items, instructors, facilities, and materials show minimal differences, a theoretically coherent and practically actionable finding.
The slightly higher satisfaction scores for TS courses likely reflect characteristics intrinsic to procedural training: TS sessions offer hands-on skill practice with immediate and tangible performance feedback, clear progression cues, and measurable outcomes (e.g., successful completion of a procedure). In contrast, NTS training, while addressing critical competencies such as communication, leadership, and crisis resource management, involves more abstract learning objectives and less immediate feedback, which may contribute to somewhat lower satisfaction scores even when educational quality is high (Cloonan et al., 2026; Cheng et al., 2014). This interpretation is consistent with findings from Peddle (2019) and others who noted that participants in NTS-focused simulation may perceive their learning as less concrete.
Gender was not significantly associated with course type or perception scores in this study, consistent with evidence from trauma team simulation research showing minimal impact of gender on NTS learning (Ylönen et al., 2025). This null finding is nonetheless informative, as it suggests that gender equity in access to simulation training, at least within this center’s programming, is being maintained across both course types.
An interesting finding was that the “Other” professional category did not show clear differences in the evaluation of TS and NTS courses. This group includes professionals such as administrative staff, educators, security personnel, psychologists, and social workers, whose daily work is primarily centred on communication, coordination, and interpersonal interaction rather than procedural clinical tasks. In this context, non-technical competencies may be perceived as directly connected to their routine professional practice, which could contribute to a more similar evaluation of both training formats. In contrast, clinically trained professionals appeared to value TS courses more positively, suggesting that perceptions of simulation training may be influenced by the extent to which the competencies addressed align with the core activities of each professional role.
From a theoretical perspective, the Kirkpatrick Level 1 data presented here are foundational but insufficient for a full evaluation of simulation program effectiveness. High learner satisfaction is a necessary but not sufficient condition for learning transfer (Kirkpatrick & Kirkpatrick, 2016; Bdiri Gabbouj et al., 2024). Future research at this center should examine Level 2 outcomes (knowledge and skill acquisition), Level 3 outcomes (behavior change in clinical practice), and ideally Level 4 outcomes (patient safety and clinical performance indicators). The consistently high Level 1 scores provide a favorable starting point for these investigations.
The findings carry several practical implications for simulation program design, policy, and educational practice, which are detailed in Section 4.1.
A further reflection emerging from these results concerns the conceptual framing of non-technical skills themselves. The prevailing label, “non-technical” or “soft” skills, implies a secondary or supplementary nature relative to procedural competencies. This framing is, however, epistemologically misleading: NTS encompasses the most structurally complex and cognitively demanding competencies required in clinical practice, including situation awareness, leadership, crisis resource management, and interprofessional communication. Their difficulty lies precisely in the fact that their acquisition is not linear, their progress is less observable in the short term, and their measurement requires validated instruments that the literature itself acknowledges as still under development (Gawronski et al., 2022).
Abildgren et al. (2022) demonstrated in a systematic review published in Advances in Simulation that human factor skills, the broader category that encompasses NTS, are trainable to a degree comparable to technical skills, but crucially, evidence on their retention and transfer to clinical practice remains insufficient. This gap between the recognized importance of NTS and the methodological tools available to capture their impact is a structural problem in health professions education, not a reflection of their intrinsic value.
This structural complexity has direct implications for interpreting learner satisfaction data. Research consistently links NTS failures to a disproportionate share of adverse clinical events: Alexandrino et al. (2023) noted in a narrative review in Frontiers in Medicine that root cause analyses of surgical adverse events identify failures in coordination, planning, and communication, all NTS domains, as the dominant causes of medical error, ahead of purely technical failures. Likewise, Tam et al. (2024), in a systematic review in Surgical Endoscopy, found that acute stress degrades NTS even more markedly than technical skills, underscoring their central role in high-stakes clinical performance. Taken together, this evidence supports the interpretation that the perceptual gap observed in the present study does not signal lower educational quality of NTS courses, but rather reflects the inherent difficulty of making abstract, behavioural learning gains immediately visible to learners. This interpretation is consistent with prior work by Redjem et al. (2025), who showed in a systematic review in Nurse Education Today that NTS training evaluations rarely exceed Kirkpatrick Level 2, not because NTS training is less effective, but because its higher-level impacts are harder to measure and communicate. Strengthening NTS debriefing with explicit behavioural metrics and anchored performance feedback, rather than general reflective discussion, is therefore a priority for simulation educators seeking to close this perceptual gap, as also supported by recent Spanish-context evidence on nursing simulation (García-Salido et al., 2024). The item-level data provide empirical support for this recommendation. Item P8 (“Error awareness”) showed one of the largest differences in favour of TS courses (d = 0.43), a pattern that is consistent with the nature of procedural training, where errors tend to be discrete, observable, and immediately identifiable (e.g., an incorrect catheter placement or a medication dosing error). By contrast, in NTS scenarios, errors are often more diffuse, emerging at the level of communication patterns or team interaction, and may be interpreted differently depending on the participant’s role within the scenario. In the absence of structured debriefing approaches that make these aspects explicit, such as anchored behavioural rating scales or guided team reflection, these types of errors are less likely to be consistently recognised. This finding therefore points to error awareness as a key area for strengthening the instructional design of NTS training, particularly through more structured and standardised debriefing strategies.
This study makes several contributions to the existing literature on simulation-based health professions education. First, to our knowledge, it is one of the few studies to systematically compare Kirkpatrick Level 1 outcomes between TS and NTS courses in an SSH-accredited multi-institutional centre, using a large, real-world sample (n = 1806) and effect size analysis to move beyond statistical significance. Most prior studies evaluating NTS simulation have focused on specific scenarios or specialties (e.g., operating room, emergency department), typically with smaller samples and without direct TS versus NTS comparison at the programme level. Second, the inclusion of participants across multiple professional categories, nurses, physicians, TCAI, students, and other healthcare roles, provides a more representative picture of real-world simulation populations than studies restricted to a single profession. Third, the use of Cohen’s d across all domains and items, combined with the explicit distinction between statistical and educational significance, is methodologically rigorous and directly applicable to curriculum planning decisions. Fourth, the study was conducted in an accredited centre operating under SSH and INACSL standards, enhancing the generalizability of findings to similarly resourced simulation environments. Together, these contributions position the present study as a meaningful empirical reference for simulation programme evaluation in the European health professions education context.
4.1. Implications for Practice
First, the small effect sizes suggest that NTS training should not be perceived as educationally inferior to TS training; its slightly lower satisfaction scores likely reflect inherent characteristics of the learning content rather than deficiencies in program quality. Simulation center managers and curriculum designers should communicate this distinction clearly to stakeholders and prioritize the continued development of NTS offerings. Investment in deliberate instructional strategies specific to NTS, such as structured debriefing with explicit performance metrics, scenario complexity calibration matched to career stage, and structured peer feedback mechanisms, may help close the perceptual gap and maximize the educational impact of these courses (Cloonan et al., 2026; Cheng et al., 2014).
Second, the career-stage distribution of participants, with younger students and trainees engaging primarily in TS courses and more experienced professionals gravitating toward NTS, has direct implications for program scheduling and targeted recruitment. Simulation centers should consider developing career-stage-specific simulation pathways that systematically introduce NTS training earlier in professional development, rather than treating it as a later-career activity. In this respect, the student subgroup warrants particular attention. Although students mainly participated in TS courses, they exhibited a relatively small difference in satisfaction between TS and NTS formats, especially when compared with physicians and nurses, who showed a much stronger preference for TS. While this pattern has been interpreted in relation to students’ early stage of professional development, it may also point to a broader shift in educational expectations. Current health sciences curricula place increasing emphasis on interprofessional learning, teamwork, and communication from the outset of training, which may foster a more balanced appreciation of both technical and non-technical competencies. If so, students may enter simulation-based education with greater openness to NTS approaches than more experienced clinicians. This possibility highlights an important area for future research on how educational exposure shapes perceptions of different simulation modalities over time.
Third, the consistently high instructor ratings across both course types (mean scores ≥ 9.51) highlight instructor expertise as a key driver of learner satisfaction regardless of course content (Bdiri Gabbouj et al., 2024; McGaghie et al., 2010). This finding supports sustained investment in instructor training and development programs, particularly in preparation for facilitating the more complex interpersonal dynamics that characterize NTS-focused simulation. Centers that cannot yet demonstrate high instructor ratings for NTS courses may consider a phased implementation model, beginning with intensive instructor preparation before expanding the NTS portfolio.
4.2. Limitations
This study has several limitations. First, data were collected through a self-reported questionnaire, which may be subject to response and social desirability biases. Second, the cross-sectional retrospective design limits causal inference. Third, the study was conducted at a single accredited simulation center, which may limit generalizability to settings with different resources, institutional cultures, or accreditation standards. Fourth, the exploratory factor analysis conducted on the complete-case item data (N = 1814) supported a unidimensional structure of the instrument (KMO = 0.957; eigenvalue = 10.143; 63.4% of variance explained; item loadings 0.49–0.85). Consistent with this result, the overall perception score should be considered the primary psychometrically supported outcome, while domain scores are better understood as exploratory groupings of items within a single underlying construct of satisfaction. In line with this interpretation, the Fornell–Larcker criterion was not satisfied for any domain, indicating substantial overlap between subscales and reinforcing the absence of empirically distinct dimensions. Although this limits the extent to which domain-level differences can be interpreted as independent constructs, future work should include confirmatory factor analyses in independent samples and comparisons with externally validated instruments to further strengthen the measurement model. Fifth, this study examined only Kirkpatrick Level 1 (learner reaction) outcomes; learning acquisition, behavior transfer, and patient safety impacts were not assessed.