Appraisal and Evaluation of the Learning Environment Instruments of the Student Nurse: A Systematic Review Using COSMIN Methodology

Background: Nursing education consists of theory and practice, and student nurses’ perception of the learning environment, both educational and clinical, is one of the elements that determines the success or failure of their university study path. This study aimed to identify the currently available tools for measuring the clinical and educational learning environments of student nurses and to evaluate their measurement properties in order to provide solid evidence for researchers, educators, and clinical tutors to use in the selection of tools. Methods: We conducted a systematic review to evaluate the psychometric properties of self-reported learning environment tools in accordance with the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) Guidelines of 2018. The searches were conducted in the following databases: PubMed, CINAHL, APA PsycInfo, and ERIC. Results: In the literature, 14 instruments were found that evaluate both the traditional and simulated clinical learning environments and the educational learning environments of student nurses. These tools can be broadly divided into first-generation tools, developed from different learning theories, and second-generation tools, developed by mixing, reviewing, and integrating different already-validated tools. Conclusion: Not all the relevant psychometric properties of the instruments were evaluated, and the methodological approaches used were often doubtful or inadequate, thus threatening the instruments’ external validity. Further research is needed to complete the validation processes undertaken for both new and already developed instruments, using higher-quality methods and evaluating all psychometric properties.


Introduction
For decades, the literature has studied the correlation between student satisfaction and the learning environment, because students' opinions are one of the elements to take into account to identify situations that promote or hinder learning and determine the success or failure of a course of study [1]. The learning environment is considered to be the social and organizational atmosphere in which interactions and communications between members of a learning group take place [2].

Methodological Quality, Overall Rating, and GRADE Quality of Evidence
In the evaluation of the quality of the evidence, 9 instruments were rated Moderate (CALD, CLECS, CLEI, CLEI-19, CLES, CLES-T, DREEM, ESECS, and SECEE), 3 Low (CEF, CLEDI, and CLEQEI), and 2 Very Low (CLE and EAPAP). This was determined by the quality and quantity of the validation and development studies reviewed. However, as indicated by the COSMIN guideline, studies that scored Low or Very Low were not excluded from further evaluation. In addition, in the determination of relevance, comprehensiveness, and comprehensibility and, consequently, content validity, some biases in the study design resulted in low scores (most often rated doubtful). The most frequent sources of bias were in the instrument development procedures (qualitative methodology for identifying relevant items; doubtful presence of a trained moderator or interviewer; no interview guidelines included in the article; a doubtful process of recording and transcribing participants' responses; doubtful independence of the data coding process; doubtful reaching of data saturation) and in the pilot tests (not at the requisite level of relevance, comprehensiveness, or comprehensibility of items to respondents; insufficient number of people enrolled in the pilot test or expert panel). See Table 2.

Psychometric Properties, Overall Rating, and GRADE Quality of the Evidence
The next stage of the evaluation focused on the psychometric properties of the instruments tested in the articles included in the review. Five instruments were rated High quality (CEF, CLEI-19, CLEQEI, EAPAP, and SECEE), 2 Moderate (CLE and CLEDI), 4 Low (CALD, CLECS, CLES, and CLES-T), and 3 Very Low (CLEI, DREEM, and ESECS). These ratings were determined by the procedures used to test the psychometric properties and were affected by some biases; for example, low scores were given for structural validity when the sample size of the analysis was inadequate. Based on the psychometric properties investigated in the studies and reported in Table 1, we assessed whether they met the criteria for good measurement properties reported in the COSMIN guidelines. Finally, based on the quality of the studies and the psychometric properties of the instruments, we assigned recommendations according to the modified GRADE method indicated by the COSMIN guidelines.

Learning Environment Instruments
All the instruments included in the review were developed and validated to measure the nature of the learning environment, whether clinical or educational. We present here a brief narrative overview of the instruments. For a complete overview of the instruments and the procedures adopted in their development and validation, see Table 1.
The first tool developed to assess the clinical learning environment was the Clinical Learning Environment (CLE) tool. This instrument was developed based on the theories of Orton (1981) [66], who conducted a survey of the learning environment in hospital wards and generated a scale consisting of 124 items. Dunn and Burnett, with a panel of 12 experienced clinical educators, considered only 55 items valid and then, through factor analysis, confirmed an instrument consisting of 23 items and 5 subscales: staff-student relationships, nurse-manager commitment, patient relationships, interpersonal relationships, and student satisfaction. Only one instrument development study that met the inclusion criteria was identified by the review, and it was rated "inadequate" for methodological quality because of the doubtful description of the expert panel's assessment procedures and the absence of a pilot test on nursing students [24]. The GRADE recommendation was C because of inconsistent content validity, very low methodological quality of the studies, and insufficient internal consistency (Cronbach's alpha was below 0.70 for some factors of the PCA and CFA).
The Dundee Ready Education Environment Measure (DREEM) was developed by Roff in 1997 to assess the educational environment of health professional trainees [67]. It originates from a grounded theory study and a subsequent panel of nearly 100 health educators from around the world, with validation by over 1000 students in countries as diverse as Scotland, Argentina, Bangladesh, and Ethiopia, to measure and diagnose educational environments in the health professions. It has been used internationally in different contexts, mainly with medical students but also with other health professionals. The instrument consists of 50 items and 5 subscales: perception of learning, perception of teachers, social self-perception, perception of atmosphere, and academic self-perception.
Three validation studies were included in the review, all of which reported sufficient content validity with moderate quality of evidence (+/M) and sufficient internal consistency with low quality of evidence (+/L), achieving a level A recommendation [58][59][60].
The Student Evaluation of Clinical Education Environment (SECEE) evaluates the clinical learning environment and was developed and validated by Sand-Jecklin in 1998 [64]. This instrument is based on the theoretical framework of cognitive apprenticeship, which states that students apply conceptual knowledge tools in a real-world environment while being guided by experienced professionals. Versions of the SECEE have evolved over time. Currently, the latest version is SECEE version 3, consisting of 32 items and 3 subscales: instructor facilitation, preceptor facilitation, and learning opportunities. Two validation studies were included in the review [65,68], and based on these, a grade of recommendation A was given for high quality of evidence, high internal consistency of the instrument, and sufficient content validity of moderate quality.
The Clinical Learning Environment Inventory (CLEI), which assesses the clinical learning environment, was developed and validated by Chan in 2001 [32][33][34]. It has been evaluated in four published journal articles, including three development articles and one validation article [32][33][34][35]. The instrument was developed based on a literature review and by modifying the College and University Classroom Environment Inventory (CUCEI) by Fraser and colleagues [69] (Assessment of Classroom Psychological Environment; Perth, Australia: Curtin University of Technology). Nearly 10 years later, Newton and colleagues (2010) modified 10 items from the "Actual" CLEI version, replacing the words "clinical teacher" with "preceptors," and conducted a PCA for the first time [33]. The instrument contains 35 items and 5 subscales (each containing 7 items): individualization, innovation, involvement, personalization, and task orientation. The instrument has two formats: the "Actual" form, which measures the current clinical environment, and the "Preferred" form, which measures the preferred clinical environment. The instrument is not recommended for use (GRADE level C) because the quality of evidence from the studies was moderate, the content validity is inconsistent (±/M), and the internal consistency is insufficient, with very low quality of evidence for the psychometric properties assessed (-/VL).
In 2002, Saarikoski and Leino-Kilpi developed the Clinical Learning Environment and Supervision Instrument (CLES) [37]. The instrument originates from the theories of Quinn (1995), Wilson-Barnett et al. (1995), and Moss and Rowles (1997). From a review of literature focused on clinical learning environments and the supervisory relationship [31,32], the authors categorized and summarized those items that could reflect the construct, and these were then tested in a pilot study. Subsequently, the number and type of items were changed and revised by a group of experienced clinical teachers [37]. The final version of the CLES scale consists of 27 items and 5 subscales: ward atmosphere, leadership style of the ward manager, premises of nursing care on the ward, premises of learning on the ward, and supervisory relationship. The CLES instrument has been translated and validated in several countries: Belgium [39], Cyprus [47], and Italy [13,38], and used in international comparative validation studies (Finland and the United Kingdom) [39]. Four articles were included in the review: one development study [37] and three validation studies [13,38,39]. The recommendation grade of the instrument is B: it requires further study due to sufficient internal consistency with low quality of evidence (+/L) and inconsistent content validity of moderate quality (±/M).
In 2006, Hosoda [29] developed the Clinical Learning Environment Diagnostic Inventory (CLEDI) based on Kolb's 1984 theory of experiential learning, which emphasizes that the learning process occurs only after the student is able to integrate concrete emotional experiences with cognitive processes [70]. The CLEDI is an instrument that contains 35 items and has 5 subscales: affective CLE, perceptual CLE, symbolic CLE, behavioral CLE, and reflective CLE. Only Hosoda's instrument development study was included in the review, but due to the lack of a pilot study assessing students' face validity, comprehensiveness, and comprehensibility, it scored low and had inconsistent content validity, earning a grade C recommendation.
In 2008, Saarikoski and colleagues modified the original CLES by adding a new subscale on the role of the nurse teacher to emphasize and define the importance of the nurse teacher in the clinical setting. The new scale, titled the Clinical Learning Environment, Supervision, and Nurse Teacher (CLES-T) Scale, was validated in the same year [40]. A total of 19 studies were included: 1 development study [40] and 18 validation studies [39,[44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59]. The CLES-T also received a grade B recommendation, needing further study. This is due to some older studies with methodological and measurement-property biases, which contributed to sufficient internal consistency with low quality of evidence (+/L) but inconsistent content validity of moderate quality (±/M).
In 2011, Salamonson and colleagues modified the CLEI, reducing the items from 35 to 19. The CLEI-19 is used to assess two generic domains common to clinical learning environments: clinical facilitator support of learning and satisfaction with clinical placement. In this review, we included two studies: one development study [34] and one validation study [35]. The instrument received a grade B recommendation, given sufficient internal consistency with high quality of evidence (+/H) but inconsistent content validity of moderate quality (±/M), due to the absence of pilot testing and of content and face validity assessment by an expert panel.
In 2011, Porter and colleagues [23] developed an instrument to assess the support received by students during clinical internships, with the overall goal of improving the quality of the students' clinical experience. The Clinical Evaluation Form (CEF) consists of 21 items and 5 subscales: orientation, clinical educator/teacher, ward staff/preceptor and ward environment, final assessment/clinical hurdles, and university. Only the internal consistency of this instrument was assessed, and it was rated sufficient with high quality of evidence. However, other important psychometric properties were not evaluated. In addition, the item validation stage (e.g., whether it was undertaken by two researchers independently) and whether the items had been evaluated for relevance, comprehensiveness, and comprehensibility by nursing students were not clearly described. Therefore, the instrument was given a level B recommendation, requiring further study.
In 2014, Baptista and colleagues [62] developed an instrument to assess nursing students' perceptions and satisfaction during simulated clinical experiences. The Escala de Satisfação com as Experiências Clínicas Simuladas (ESECS) was developed based on the results of a literature review and a phenomenological study describing students' experiences in high-fidelity simulated practice using manikins. These studies resulted in a list of 17 items and 3 subscales: practical dimension, realism dimension, and cognitive dimension. Two studies were included in the review: one on development [62] and the other on validation [63]. The studies demonstrate sufficient content validity of moderate quality (+/M) but insufficient internal consistency with low quality of evidence, and therefore the instrument achieved a level B recommendation, needing further psychometric studies.
The Clinical Learning Environment Comparison Survey (CLECS) was developed by Leighton in 2015 [25] through a literature review, the results of which were evaluated and used by a panel of 12 academics with experience in simulation with manikins and clinical environments to generate the items and subscales. This instrument was used in two pilot studies to assess clarity. The final instrument consists of 27 items and 6 subscales: communication, nursing process, holism, critical thinking, self-efficacy, and teaching-learning dyad. Four studies were included in this review: one development study [63] and three validation studies [66][67][68]. The content validity of the instrument was inconsistent and of moderate quality (±/M); this was due to the unclear description of the procedures for students' assessments of the comprehensiveness and comprehensibility of the instrument. However, the internal consistency of the instrument was rated sufficient, while the quality of the evidence was rated low, and therefore the recommendation level of the instrument was B.
One of the studies on the CLES-T documented the development of a new instrument, the Cultural and Linguistic Diversity (CALD) scale, which assesses the clinical learning environment. The theoretical framework for the development of the CALD originates from two systematic reviews conducted by Mikkonen and colleagues [22]. From the synthesis of data from the two reviews, following Thomas and Harden's 3-step analysis process, 101 descriptive themes emerged that were compared with each item of the original CLES-T scale. Those that did not have corresponding items in the CLES-T scale were operationalized into measurable items to be used in the development of the CALD. The final scale includes 21 items and 4 subscales: orientation into clinical placement, role of student, cultural diversity in the clinical learning environment, and linguistic diversity in the clinical learning environment. On the basis of methodological quality and results of psychometric properties, Mikkonen's study was one of the best studies conducted, and therefore, even though only one instrument development study that met the inclusion criteria was included in the review, a level A recommendation was given.
The Clinical Learning Environment Quality Evaluation Index (CLEQEI) is an instrument developed in Italy by a group of researchers at the University of Udine in order to assess students' perceived quality of clinical learning [36]. It is composed of 22 items investigating the quality of tutoring strategies, learning opportunity, safety and quality of care, self-learning, and the quality of the learning environment. It is the subject of one of the studies included in this review, which investigated several psychometric properties of the CLEQEI with good results, although the methodology for developing the instrument for assessing relevance, comprehensiveness, and comprehensibility was described unclearly and overly briefly. Only this one developmental study was included in the review, and the recommendation achieved was level B.
The Spanish-language Escala de Apoyo Académico en el Prácticum (EAPAP) was developed by Arribas-Marín in 2017 to assess students' perceptions of academic support during internship [61]. The EAPAP consists of 23 items and 4 subscales: peer support, academic institution support, preceptor support, and clinical facilitator support. This study demonstrated inconsistent content validity with very low quality of evidence (±/VL) but sufficient internal consistency with high methodological quality; therefore, although there is only one instrument development study, the EAPAP can be recommended at level B but needs further psychometric validation studies to be strongly recommended.
As highlighted in the results, these instruments are not all comparable with each other because, although they all assess the learning environment of nursing students, they focus on measuring specific aspects such as the traditional clinical learning environment (9 instruments: CLE, SECEE, CLES, CLES-T, CALD, CLEQEI, CLEI, CLEI-19, and CLEDI), the clinical traditional and simulated environment (2 instruments: ESECS and CLECS), the clinical placement environment (1 instrument: CEF), and the educational learning environment (2 instruments: EAPAP and DREEM).
To make the results of this review even more comprehensive, we conducted a qualitative analysis of the items belonging to all identified instruments to identify common and uncommon categories investigated by each instrument (see Table 3). Twenty-three categories were identified. Among the most common categories, "Quality of tutoring strategies" was explored by 11 instruments, followed by "Learning opportunities", which was explored by 9 instruments including DREEM. "Quality of relationship with tutors", "Quality of clinical learning environment", and "Safety and quality of care" were each explored by 8 instruments. The most notable differences are found in the categories exploring "Self-efficacy in theoretical learning," "Quality of relationship with tutors," and "Quality of teaching strategies," which are each explored by only two instruments: the DREEM and the EAPAP.

Discussion
In our systematic review, a total of 45 studies emerged that estimated the reliability and validity of 14 instruments in 22 different countries across 5 continents. Most were conducted in Europe (24 studies). The first instrument to undergo validation was the CLE scale, and the most recent was the CLEQEI in 2017 [36]. This indicates that this field of research spans more than 30 years, during which a tremendous amount of change has occurred in nursing programs, internship environments, and student profiles [71]. We can broadly divide the instruments, based on their development, into first- and second-generation instruments, in agreement with Mansutti and colleagues [15]. First-generation instruments such as the CLE scale, CLEDI, CLES, CLES-T, DREEM, and the SECEE originated from major theories of learning established mainly in the 1980s and 1990s, while second-generation instruments started from instruments previously established in clinical settings (such as the CALD and CLEI-19) or from validation by expert panels of findings that emerged from literature reviews (see the CLECS). Development and validation studies of second-generation instruments also appear to describe the procedures adopted more thoroughly, thus offering a better evaluation of evidence on methodological quality. In addition, in recent years, a trend has emerged to evaluate the validity and reliability of established instruments in different countries (e.g., the CLES-T), gather evidence on instrument validity, and compare data. The instruments comprised from two (CLEI-19) to six (CLECS) factors or subscales and from 19 (CLEI-19) to 50 (DREEM) items.
Comparing results between different studies that used the same instruments was not always easy, for several reasons. First, the methodological quality of the studies was heterogeneous. Second, the validation studies were conducted at different times, and some analyses may not have been known at the time or may have become obsolete since. Another common problem was that few studies estimated reliability.
Although test-retest procedures should be easy to perform in an academic setting given the availability of students, it should be considered that the duration and frequency of clinical rotations might have made a second assessment of the same person impossible. Internal consistency and structural validity were estimated for most of the instruments, but with methodological approaches of varying quality, which also compromised the quality of the results. Finally, convergent and criterion validity were assessed on only a few occasions, especially for the first-generation instruments, due to the lack of available field knowledge and of instruments that could serve as a gold standard for comparison.
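For readers less familiar with the internal consistency statistic referred to throughout this review, the following is a purely illustrative sketch of how Cronbach's alpha is computed and judged against the conventional 0.70 sufficiency threshold used by the COSMIN criteria; the data are invented, not drawn from any study included here.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a set of respondents' item scores.

    responses: one row per respondent, each a list of k item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to per-item columns
    item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Toy data: 4 students rating a 3-item subscale on a 1-5 Likert scale.
scores = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
]
alpha = cronbach_alpha(scores)  # 0.975 for this toy sample: above 0.70
```

Alpha rises when items covary strongly (the total-score variance dominates the summed item variances), which is why subscales whose items tap different constructs, as in some of the first-generation instruments above, can fall below 0.70.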

Limitations
One limitation of this review is that it included only peer-reviewed studies in English and Italian; this may have introduced a selection bias, because other instruments may have been developed and disseminated as gray literature or in other languages. The evaluation of the studies was based on the 2018 COSMIN guidelines, and some criteria required for a "very good" or "adequate" rating may not have been considered by the authors of older studies, which may have influenced the final evaluation of the instruments. Finally, it was not possible to assess the responsiveness of the instruments, that is, the ability of an instrument to detect change in the measured construct over time (as required by the COSMIN procedure), due to the absence of longitudinal studies among those included.

Conclusions
Fourteen tools that assess the quality of learning environments, both clinical and educational, have gone through a validation process so far. First-generation instruments have been developed from different learning theories, while second-generation instruments have been developed from the first generation by mixing, revising, and integrating several already-validated instruments. Not all relevant psychometric properties have been evaluated for the instruments, and often the methodological approaches used are doubtful or inadequate. In addition, a lack of homogeneity in the procedures for both assessing instrument relevance, comprehensiveness, and comprehensibility and for assessing psychometric properties emerged, thus threatening the external validity of the instruments. Future research must complete the validation processes undertaken for newly developed instruments and those already developed, but using higher-quality methods and estimating all psychometric properties.

Conflicts of Interest:
The authors declare no conflict of interest.

Multimedia Appendix 1: Searching filter of PubMed
• Construct: ("clinical practice*" OR "clinical internship" OR "clinical nursing education" OR "clinical education" OR "education-nursing" OR "practice education" OR "practicum education" OR "hospital learning environment" OR "nurse education" OR "clinical learning environment" OR "learning environment" OR "clinical placement" OR "clinical teaching" OR "mentoring" OR "tutoring")
• Population: ("nurse student*" OR "baccalaureate student*" OR "student nurse*")