Development and Validation of a Critical Thinking Assessment-Scale Short Form

This study presents and validates the psychometric characteristics of a short form of the Critical Thinking Self-assessment Scale (CTSAS). The original CTSAS was composed of six subscales representing the six components of Facione's conceptualisation of critical thinking. The CTSAS short form kept the same structure and reduced the number of items from 115 in the original version to 60. The CTSAS short form was tested with a sample of 531 higher education students from five countries (Germany, Greece, Lithuania, Romania, and Portugal) enrolled in different disciplinary fields (Business Informatics, Teacher Education, English as a Foreign Language, Business and Economics, and Veterinary Medicine). Confirmatory factor analysis was used to test the new instrument's reliability, internal consistency, and construct validity. Both the model that hypothesized the six factors to be correlated and the model that hypothesized them to tap into a second-order factor representing the complex concept of critical thinking had acceptable fit to the data. The instrument showed strong internal consistency (α = 0.969) and strong positive correlations between skills and between the skills and the overall scale (p < 0.05). Despite the unbalanced sex distribution in the sample (close to 75% females), the instrument retained its factorial structure invariance across sexes. Therefore, the new instrument shows adequate goodness of fit, retains stability and reliability, and is proposed as a valid and reliable means to evaluate and monitor critical thinking in university students.


Introduction
Improving critical thinking (CrT) skills remains a growing concern for today's Higher Education Institutions (HEI). CrT is a crucial non-technical, soft skill, highly prized by stakeholders in every profession, which has led to a market-driven educational culture. CrT has been identified as one of the top soft skills sought in the twenty-first century [1–4]. HEIs have taken up the purpose of nurturing their students in critical thought and informed decision-making, to provide the market with a skilled workforce and thereby improve their employment rates [5].
CrT involves a complex combination of higher-order reasoning processes. More than the sum of individual skills, CrT is perceived as an interweaving of various multidimensional and multi-levelled sub-skills. For instance, within the Think4Jobs ERASMUS+ project, a working definition was conceptualised under the Facione framework [6] as the "purposeful mental process driven by conscious, dynamic, self-directed, self-monitored, self-corrective thinking, sustained by disciplinary and procedural knowledge as well as metacognition" [7].
CrT conceptualization has diverged through time, in accordance with three large branches: philosophical, psychological and educational [8]. For the philosophical approach, focused on the mental process of thought, a critical thinker is someone who logically evaluates and questions the assumptions of others and their own, while for the psychological approach, focused on the processes driving an action, the critical thinker holds a combination of skills that allow individuals to assess a situation to decide on the best action to take. The educational approach places itself closer to the psychological approach, and relies on the use of frameworks and learning activities designed to enhance students' CrT skills, and consequently to test these skills [8,9].
CrT requires a complex set of qualities that may be regarded as "generic" or as "domain-specific" [10,11]. The usefulness of generic CrT skills transcends academic and professional settings, and applies to all aspects of one's life; it is considered particularly relevant for judging challenging moral and ethical situations that are often framed by particular interests [3,12]. Domain-specific CrT skills are often framed by a standard intervention or a code of professional conduct as expected from professionals, and support decision-making within a particular context. Furthermore, most concepts recognize that CrT is embedded in abilities or skills supported by a set of sub-skills, as well as in attitudes or dispositions [13–15]. The dispositions comprise different dimensions, and they determine whether a person is willing to use critical thinking in everyday life.
For HEI, a challenge exists regarding CrT development in their students: (1) how can CrT skills be efficiently and effectively taught along with the programme curricula, to mitigate putative gaps regarding the expectations of stakeholders; and (2) how can they be assessed, both to validate the strategies' effectiveness and to demonstrate the students' acquisition of CrT skills? Educators in HEI have been confronted with the need to adopt appropriate teaching strategies to enhance students' CrT skills and assessment tools to show evidence of students' achievements in this regard [16]. Another challenge faced by HEI and educators is respecting the complexity of CrT's nature, which should be made explicit to students, while avoiding the pressure that may be associated with the reported limitations of the "teach-to-test" approach [17], thereby improving the odds of developing and transferring CrT skills to everyday life or the labour market.
Regarding the evaluation of CrT skills, two different approaches have been used [18]. One approach uses resources such as different types of measuring instruments, either the formal or standardized CrT tests (such as the Cornell Critical Thinking Test, CCTT; the California Critical Thinking Dispositions Inventory, CCTDI; or the Halpern Critical Thinking Assessment test, HCTA, among others) [8], or the self-reported students' or stakeholders' perceptions [8]. The other approach uses "objective measurements" or "performance assessment", which are based on the transferability of skills to new work-based, professional-driven situations (e.g., the Performance-Based Development System Test, PBDS, for nursing [19]; the Objective Structured Clinical Examination, OSCE, for clinical subjects [20]; or the Performance Assessment of Learning, iPAL, for engineering [11]). The performance assessment combines different dimensions of technical and soft-competencies evaluation. In general, performance-based critical-thinking tests rely on simulated real-life decision-making situations, demanding the students present the rationale for their decisions, using the available evidence [21].
Whether standardized or self-reported, CrT tests share a common pitfall: they tend to assume that critical thinking can be fragmented into a sum of detached sets of measurable sub-skills, such as analysis, evaluation, inference, deduction and induction [10]. According to those defending the performance assessment, there is little support for the idea that CrT sub-skills, or even the CrT skill for that matter, are independently mobilized in everyday life or work contexts. Therefore, performance assessment allows for a holistic evaluation of a set of CrT skills or sub-skills, combined differently to succeed in the task at hand.
According to Simper et al. [22], "Critical thinking, problem solving and communication are fundamental elements of undergraduate education, but methods for assessing these skills across an institution are susceptible to logistical, motivational and financial issues". Standardized tests are based on well-established CrT-skills taxonomies, such as Ennis' and Facione's [8], and have been used for a long time to measure the CrT skills in students or HEI candidates worldwide. Even though some of these tools were validated at the time, they have been recently questioned regarding the transferability of the construct validity across disciplines or regions [23,24], which calls their face validity into question. Moreover, they are not easily available; some demand expert evaluation and scoring, which requires trained raters [3], and they are usually expensive to apply routinely [25]. In addition, for some of them, the situations around the questionnaire are far from the students' reality or the test takes between 50 and 80 min to complete [23], contributing to the poor motivation of respondents to fill in the questionnaires [23,26]. Other concerns include the use of forced-choice questions, which may restrict the respondent's answers by limiting the possible hypotheses and relying on recognition memory [27,28]; the fact that the questions are often constructed from inauthentic situations [8], designed to trigger a response from the respondent; and the possibly limited relevance of the skills tested, compared with the proposed instruction outcomes [29]. Finally, at least for some particular tests, it remains unclear how the respondent's reasoning will allow evidence of more discrete dispositions, such as open-mindedness or inquisitiveness [8], or how they could avoid the use of specific reasoning skills in students positioned in the more advanced years of their academic path. Therefore, their use in an academic context remains controversial. In particular fields, such as Business and Health Science, discipline-specific formal tests have been developed to provide CrT-skills instruments with a less generalist scope, but their use also involves costs, and they need to be further validated in other regional or cultural contexts.
Consequently, the usefulness of such instruments has been questioned regarding their regular application in academic contexts, particularly in assisting students' improvement of CrT, as a whole or as specific skills, and as evidence of student progression across the curricula [16,30]. Moreover, a mismatch may arise from differences between the tasks students must carry out during learning and what is assessed by the standardized tests, leading to a situation where the assessment does not match the subject outcomes.
In the past decades, dissatisfaction with standardized tests led to the development of self-report instruments that have been validated in various disciplines. Nevertheless, there are some differences between these tools in the conceptualization of the CrT skills underlying the construct, so they are not entirely equivalent, and may even be scoring different sets of competences. Consequently, it is challenging to establish a comparison between them. Self-report tests for critical thinking seem more frequently used to ascertain respondents' dispositions, rather than skills.
However, the use of students' self-report data is not consensual. Cole and Gonyea's work showed that the scores obtained by the same students in standard and self-reported questionnaires often present low correlations, as students tend to overscore their performance in the self-report questionnaires [31]. Such a bias may be associated with the willingness to conform to social standards or to meet the teacher's expectations, or influenced by the need to evoke past events to answer a question [32]. Nonetheless, if there is awareness of their nature and recognition of the underlying reasons for them to occur, these kinds of biases may be prevented when designing a self-report instrument [32]. Self-reporting methods have been widely used to assess CrT skills gained after changes in instructional strategies [21,33]. However, both the complexity of the construct of critical thinking and the small population enrolled in the studies contribute to the poor reliability of the constructs, thereby reducing the validity of some tests. A similar concern applies to the performance tests. Nonetheless, self-report instruments may be of particular interest in assessing non-cognitive characteristics [34] when there is no direct reflection on the students' grades or opportunities, namely in educational settings. In this context, self-report questionnaires may be used to monitor and enhance performance and to identify individual training needs [34].
The European Project "Think4Jobs" (2020-1-EL01-KA203-078797), currently ongoing, aims at expanding the collaboration between Labor Market Organizations (LMOs) and HEI to design, implement and assess the efficacy of CrT-blended-apprenticeships curricula developed for five disciplines (i.e., Veterinary Medicine, Teacher Education, Business and Economics, Business Informatics, English as a Foreign Language). These curricula were designed to provide students with the opportunity to systematically train CrT skills and to stimulate their transfer into new situations arising from the labour market. This collaboration is foreseen as a flexible interface sustaining HEI and LMO collaboration, to provide a work-based context for developing graduates' CrT (https://think4jobs.uowm.gr/, accessed on 9 November 2022). The changes in the CrT skills were tested in students participating in piloting courses using new CrT-embedding instructional strategies in different disciplines, such as Teacher Education (Greece), Business Informatics (Germany), English as a Foreign Language (Lithuania), Business and Economics (Romania) and Veterinary Medicine (Portugal). Based on previous experience and available literature, it was decided among partners to abandon classical, standardized CrT-skills tests, and instead select a test that met some primary criteria: a closed-end test; easy to administer online; matching the proposed outcomes of the activities that would be implemented with students to reinforce CrT skills, and covering the CrT skills as conceptualized under the Facione framework [35]; practical for students to take; and not demanding, in terms of the level of technical expertise required to answer and to retrieve information. In addition, an expected time limit for completion of the questionnaire was set (preferably less than 30 min), as it was intended to be used paired with a different questionnaire tackling CrT dispositions.
Among the questionnaires addressing the core CrT skills as conceptualized by Facione [13,35], namely interpretation, analysis, evaluation, inference, explanation, and self-regulation, each one encompassing subskills, the consortium selected the questionnaire developed by Nair during her doctoral thesis [36], the Critical Thinking Self-Assessment Scale (CTSAS), to be applied pre- and post-test to the students enrolled in the activities. The instrument was one of the first validated scales for self-evaluation of CrT skills in higher-education students, and was designed to be applied across different disciplines. The original final version was composed of 115 items scored according to a seven-point rating scale (ranging from 0 = never to 6 = always). The questionnaire has been tested in different geographic and cultural contexts, and scored well in the reliability and internal consistency tests, as well as in the confirmatory factor analysis for all the skills composing the questionnaire [37].
However, even though the expected time to complete Nair's questionnaire was around 50 min, according to the author [36], filling in the questionnaire took longer than that when it was tested with a small group of students, and was longer than desired. Consequently, it was decided to shorten the original scale, to reach a response time of less than 30 min.
The purpose of this study is to present and validate the psychometric characteristics of a new, short form of Nair's CTSAS, intended to assess the CrT skills of students engaged in activities designed to support the enhancement of CrT skills, to diagnose the skills needing intervention and to monitor the progress or the results of interventions.

Shortening of the CTSAS
To shorten the original Nair scale composed of 115 items, a stepwise approach was used, involving two Portuguese experts. The following criteria were outlined for the possible rejection of items: (1) elimination of items with low loading weights (items with loading weights below 0.500 were eliminated, with 84 items remaining); (2) elimination of redundant items and items whose specific focus was not set on the use of cognitive skills (since the partnership considered that 84 items was still a high number, the items considered as redundant or not focusing on cognitive skills were marked for elimination, and items were reduced to 58); (3) review by two experts (after marking the items for elimination, the proposal was analysed by two independent experts who confirmed or reverted the rejection proposal, based on the Facione-based conceptualization of CrT skills and subskills). As recommended by the experts, the final version also incorporated items 16 and 19 from the original scale, due to their theoretical relevance. Modification of the items of the original CTSAS was avoided. Table 1 summarizes the changes introduced in the original questionnaire. The CTSAS short form retained a total of 60 peer-reviewed items. The number of items assessing each dimension ranged between 7 and 13. For subdimensions (or subskills), this number varied from 3 to 7 items, except for 5 subdimensions (decoding significance, detecting arguments, assessing claims, stating results, and justifying procedures), which comprised only two items. The short-form scale retained the original scale's framework, where students start with the question «What do you do when presented with a problem?» and are requested to answer the items using a seven-point Likert-scale structure with the following options: 0 = Never; 1 = Rarely; 2 = Occasionally; 3 = Usually; 4 = Often; 5 = Frequently; 6 = Always.
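The first rejection criterion (dropping items whose loading weight falls below 0.500) can be illustrated with a minimal Python sketch. The item identifiers and loading values below are hypothetical placeholders, not values from the original CTSAS; only the 0.500 cut-off comes from the text.

```python
# Cut-off stated in the text: items with loadings below 0.500 are eliminated.
LOADING_CUTOFF = 0.500

def split_by_loading(loadings, cutoff=LOADING_CUTOFF):
    """Return (kept, dropped) item ids; an item is dropped if its loading < cutoff."""
    kept = [item for item, weight in loadings.items() if weight >= cutoff]
    dropped = [item for item, weight in loadings.items() if weight < cutoff]
    return kept, dropped

# Hypothetical loadings, invented purely for illustration.
example_loadings = {
    "item_01": 0.62,
    "item_02": 0.48,
    "item_03": 0.50,
    "item_04": 0.31,
}

kept, dropped = split_by_loading(example_loadings)
print("kept:", kept)
print("dropped:", dropped)
```

Only this first criterion is mechanical; the redundancy screening and the expert review described above are judgement-based and are not captured by such a filter.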

Participants
Five hundred and thirty-one university students (389 women, 142 men) participated in this study, aged from 19 to 58 years (mean = 23.47; SD = 7.184). The distribution of participants by country was as follows: 33.3% were from Greece, 29.4% from Portugal, 21.1% from Romania, 9.8% from Lithuania and 6.4% from Germany. Students studied within the following disciplines: Business Informatics, Business and Economics, Teacher Education, English as a Foreign Language, and Veterinary Medicine.
Ethical clearance for the study was obtained from the University of Évora Ethical Committee (registered with the internal code GD/39435/2020); moreover, students signed an informed consent associated with the questionnaire, and were allowed to withdraw from the study at any time without penalty or loss of benefits.

Translation of the CTSAS Short Form into Different Languages
The adopted short version of the CTSAS, in English, was translated into Portuguese, Romanian, Greek and German. The translation into these languages followed the recommended procedures (translation, revision and refinement), to ensure that the meaning, connotation and conceptualization respected the original instrument [38,39]. In each country using a non-English version of the questionnaire, two bilingual translators converted the adopted CTSAS short form into their mother language; different sets of operators then analysed this translation to screen the differences between the two versions of the questionnaire and ensure the precision of the translation and its compliance with the original [40]. The consensual translated versions were reviewed by a group of experts from each national team in the project, who judged the content equivalence of the instrument. Agreement among the experts was taken as evidence of the equivalence of the translated questionnaire.

Data Collection
The collection of data through the CTSAS short form was performed from October 2021 to January 2022, in accordance with the scheduled term in the different piloting courses designed in the Think4Jobs project. This study used a non-randomised, non-probability convenience sample resulting from the voluntary responses from students enrolled on the Think4Jobs-designed curricula. The participants were students from Greece (enrolled on the courses Teaching of Science Education, Teaching of the Study of the Environment, and Teaching of Biological Concepts), from Germany (enrolled on the courses Design Patterns, Innovation Management, Economic Aspects of Industrial Digitalization, and Scientific Seminar), from Lithuania (enrolled on the English for Academic Purposes course), from Portugal (enrolled on the courses Deontology, Gynaecology, Andrology and Obstetrics, Imaging, and on Curricular Traineeship), and from Romania (enrolled on the courses Pedagogy and Didactics of Financial Accounting, Business Communication, and Virtual Learning Environments in Economics), who responded to questionnaires in Greek, German, English, Portuguese and Romanian, respectively.
The questionnaire was made available to students online, on the Google Forms platform. The invitation to participate was sent to the students at the beginning of the semester, through the course page on Moodle. The process was supervised by the teachers involved in the pilot courses.
The responses were collected into an individual Excel file for each country. After data anonymization (by replacing the names with an alphanumeric code composed of the code for the country (GR, LT, RO, GE and PT for Greece, Lithuania, Romania, Germany and Portugal, respectively) plus a sequential number, from 1 to n), the removal of all other identifying information retrieved from the platform, and screening for inconsistent data, the files were merged into the database used for statistical analysis.
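The anonymization scheme described above (country code plus a sequential number) could be sketched as follows. This is a hypothetical illustration only: the function name, the exact code format (no separator, no zero padding) and the sample names are assumptions, since the text specifies only the country codes and the 1-to-n numbering.

```python
# Country codes as listed in the text.
COUNTRY_CODES = {
    "Greece": "GR",
    "Lithuania": "LT",
    "Romania": "RO",
    "Germany": "GE",
    "Portugal": "PT",
}

def anonymize(names, country):
    """Map each respondent name to an alphanumeric code such as 'PT1', 'PT2', ...

    The code format (country code immediately followed by the sequence number)
    is an assumption made for this sketch.
    """
    code = COUNTRY_CODES[country]
    return {name: f"{code}{index}" for index, name in enumerate(names, start=1)}

# Invented respondent names, for illustration only.
mapping = anonymize(["Respondent A", "Respondent B", "Respondent C"], "Portugal")
print(mapping)
```

In practice the mapping itself would be discarded or stored separately from the responses, so that the merged database holds only the codes.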

Statistical Analysis
The descriptive measures for the items included the mean, standard deviation, skewness and kurtosis, together with the Kolmogorov-Smirnov test for equality of distributions and the Mann-Whitney U test for mean rank differences.
To assess whether the CTSAS short form fits the original factor model, a confirmatory factor analysis (CFA) was performed, with weighted least squares with mean and variance adjustment (WLSMV) as the estimation method, due to the ordinal nature of the data [41]. The model fit indices computed include the χ² test for exact fit, the comparative fit index (CFI), the Tucker-Lewis index (TLI) and the root-mean-square error of approximation (RMSEA). Following Hu and Bentler [42], we considered CFI and TLI values ≥ 0.90 and RMSEA ≤ 0.06 (90% CI) as acceptable fit values. Data were specified as ordinal in the model.
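The acceptance rule stated above can be written as a small check. The sketch below is illustrative only: the function name and the example index values are hypothetical, and only the thresholds (CFI and TLI ≥ 0.90, RMSEA ≤ 0.06) are taken from the text.

```python
def acceptable_fit(cfi, tli, rmsea, min_cfi_tli=0.90, max_rmsea=0.06):
    """Return True when CFI and TLI are >= 0.90 and RMSEA is <= 0.06."""
    return cfi >= min_cfi_tli and tli >= min_cfi_tli and rmsea <= max_rmsea

# Hypothetical fit indices, not values from Table 2.
print(acceptable_fit(cfi=0.93, tli=0.92, rmsea=0.05))
print(acceptable_fit(cfi=0.85, tli=0.84, rmsea=0.09))
```

The indices themselves come out of the CFA software; this check only encodes the decision rule applied to them.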
To evaluate the reliability and internal consistency of the scale and subscales, Cronbach's alpha was computed. In accordance with Hair et al. [43], we considered alphas above 0.70 as good reliability indices.
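For readers unfamiliar with the statistic, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below computes it with the Python standard library; it is not the SPSS procedure used in the study, and the response matrix is invented for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items matrix (list of equal-length lists).

    Uses sample variances (n - 1 denominator) for both items and total scores.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers of five respondents to three items on the 0-6 scale.
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [6, 6, 5],
    [3, 3, 2],
    [5, 4, 5],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}, above 0.70: {alpha > 0.70}")
```

The same function can be applied to the full scale or to the columns of a single subscale, which is how a per-skill alpha would be obtained.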
The multigroup invariance was assessed for female and male students. Differences in RMSEA and CFI values lower than 0.015 and 0.01, respectively, were used as criteria for invariance [44,45].
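The invariance criterion above amounts to comparing two sets of fit indices. A minimal sketch follows, assuming each model's indices are held in a dictionary with hypothetical keys "rmsea" and "cfi"; the example values are invented and do not come from Table 4.

```python
def is_invariant(fit_a, fit_b, max_delta_rmsea=0.015, max_delta_cfi=0.010):
    """Return True when |ΔRMSEA| < 0.015 and |ΔCFI| < 0.010 between two models."""
    delta_rmsea = abs(fit_a["rmsea"] - fit_b["rmsea"])
    delta_cfi = abs(fit_a["cfi"] - fit_b["cfi"])
    return delta_rmsea < max_delta_rmsea and delta_cfi < max_delta_cfi

# Hypothetical indices for two nested invariance models.
configural = {"rmsea": 0.050, "cfi": 0.950}
metric = {"rmsea": 0.048, "cfi": 0.947}
print(is_invariant(configural, metric))
```

The comparison would be repeated for each successive pair of models (configural versus metric, metric versus scalar).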
Univariate descriptive statistics and internal consistency were calculated using IBM SPSS Statistics 26. CFA and multigroup invariance analysis were performed using Mplus 7.4 [46].

Results
The results are divided into three sections. The first section presents the descriptive statistics of the items. The second section shows the results from the confirmatory factor analysis. The third section shows the multigroup invariance analysis.

Descriptive Analysis of Items
The item means for the 60 items ranged from 3.13 («I write essays with adequate arguments supported with reasons for a given policy or situation») to 5.04 («I try to figure out the content of the problem»). The standard deviations ranged from 0.958 («I try to figure out the content of the problem») to 1.734 («I write essays with adequate arguments supported with reasons for a given policy or situation»). The K-S test showed that data were equally distributed for female and male students (p > 0.05), except for the item «I can logically present results to address a given problem» (Z = 1.533; p = 0.018) and the item «I respond to reasonable criticisms one might raise against one's viewpoints» (Z = 1.772; p = 0.004). The item descriptions are displayed in Appendix A, Table A1.
The Mann-Whitney U test showed no statistically significant differences between female and male students (p > 0.05), except for the items «I observe the facial expression people use in a given situation» (Std U = −2.230; p = 0.026), «I can logically present results to address a given problem» (Std U = 2.382; p = 0.017), «I respond to reasonable criticisms one might raise against one's viewpoints» (Std U = 3.957; p < 0.001) and «I provide reasons for rejecting another's claim» (Std U = 2.588; p = 0.010). Detailed item descriptions can be consulted in Appendix A, Table A1.

Confirmatory Factor Analysis (CFA) and Reliability
The aim of the CFA was to confirm whether the CTSAS short form (60 items) fitted the original second-order factor model proposed by Nair [36]. Five successive models of increasing complexity were tested to achieve a comprehensive analysis of the structure and relations of the sixty items, six latent skills and a general construct: 1. Model 1: One-factor model. This model tests the existence of one global factor on critical thinking skills, which explains the variances of the 60 variables.
Table 2 shows model fit indices for each model. Goodness-of-fit indices are satisfactory for models 3 and 4, but not for models 1, 2 and 5. As model 3 and model 4 are not nested, we base our interpretation on the differences in fit indices. The differences in the RMSEA and CFI indices between model 3 (which shows the best goodness-of-fit indices) and model 4 (which represents the original model proposed by Nair [36]) are lower than 0.015 and 0.010, respectively (ΔRMSEA = 0.002; ΔCFI = 0.003), suggesting that both models may be used to validate the internal structure of the questionnaire. As model 4 represents the original model, it is accepted as a fitted factor structure and considered for the following analysis.
The factor loadings (see Appendix A, Table A2) are significant (p < 0.001) and vary from 0.386 («I observe the facial expression people use in a given situation») to 0.786 («I continually revise and rethink strategies to improve my thinking»). All factor loadings are above 0.50, except for the items «I observe the facial expression people use in a given situation» (0.386), «I clarify my thoughts by explaining to someone else» (0.422) and «I confidently reject an alternative solution when it lacks evidence» (0.470).
The internal consistency of the CTSAS short form is excellent (Cronbach's α = 0.969). As shown in Appendix A, Table A3, Cronbach's alphas for each scale are above 0.70, showing good factorial reliability. Correlations between factors and between factors and the general critical-thinking-skills construct are strong (from 0.750 to 0.965) (Table 3). All correlations are significant (p < 0.0001).

Multigroup Invariance
A multigroup invariance analysis was conducted to verify the factorial-structure invariance across sexes. Multigroup invariance was tested using WLSMV as the estimation method, due to the ordinal nature of the data. As an initial step, the baseline was created for both groups (female and male students) using independent CFAs for each group. After the baseline was created, a CFA was applied to both groups simultaneously, to test for invariance. We tested the three invariance models, from the least restrictive (the configural model) to the most restrictive (the scalar invariance). The results are shown in Table 4. Based on the goodness-of-fit values of the different invariance models tested (configural, metric and scalar), the stability of the factor structure in both sexes is confirmed. The differences (Δ) in RMSEA and CFI values between the models are less than 0.015 and 0.010, respectively, revealing the invariance of the factorial structure, the invariance of factor loadings and the invariance of the item intercepts when comparing female and male students.
Once the measurement invariance was confirmed, we tested the structural invariance related to the populational heterogeneity, as well as the latent-mean invariance. Structural invariance tests whether the covariance level between factors is the same for both groups. Latent-mean invariance assesses whether the latent means are equal in both groups.
Table 5 displays the results from the structural invariance in both groups. The Wald test shows a significant difference between factor correlations of the female and male models (Wald = 6.507; df = 1; p = 0.011). As seen in Table 5, factor covariances are significantly higher for the male model than for the female model, showing some population heterogeneity [47]. Within the means invariance analysis, female students are the baseline group, with a latent mean equal to zero. The mean comparison is presented in Table 6. The differences in factor means between females and males are not significant.

Discussion
This study attempted to validate a short form of the CTSAS questionnaire originally developed by Nair [36]. The number of items was reduced from 115 to 60, to reduce the time for completion of the questionnaire, with a greater focus on cognitive processes. Even though Nair reports that participants took between 35 and 45 min to complete the questionnaire [36], and recommends a time of 40 to 50 min [36], in a previous test with randomly chosen researchers it took more than 60 min to fill in the original CTSAS form. The adaptation of the short form eliminated 55 items, reducing the time for completion of the questionnaire to a maximum of 30 min, while maintaining the original skills and subskills dimensions. Thus, it was possible to keep the six core-skills structure (Interpretation, Analysis, Evaluation, Inference, Explanation and Self-regulation).
The shortened form was tested with 531 students from five HEI disciplines, in five European countries. Data were collected during the first and second terms of the academic year 2021/2022. Country representativeness was skewed, as the Lithuanian and German groups had a smaller number of participants. Nonetheless, the total number of respondents was adequate for performing a robust confirmatory factor analysis [48].
On average, the age of participants was close to 23.5 years, ranging from 19 to 58 years. Close to 87% of the participants were aged below 31 years. The age distribution reflects the reality of the HEIs in the five countries represented in this study, where most students enter HEIs at around 18 to 19 years of age. Older students are less commonly found, and often represent non-traditional students who work while enrolled in college or who seek graduation programmes to adjust their careers or to acquire new competencies supporting economic improvement.
The sex of the respondents was unevenly distributed, with females reaching close to 75% of the total participants. In Europe, females represent the majority among tertiary students, particularly in longer programmes and in master's and doctoral cycles, even though differences in this trend are recorded in some countries. In general, females predominate in Business and Law and Administration, as well as in Health Sciences, Arts and Humanities, Social Sciences and Education. In contrast, in Engineering and Information and Technologies, males predominate [49]. Among the population enrolled in the current study, a small number of respondents belong to the Information and Technologies disciplines (German students, who represented 6.4% of the sampled students). Their numbers were insufficient to balance the female predominance in the other disciplines.
The CTSAS short form was validated through confirmatory factor analysis, the evaluation of internal consistency or reliability, and by testing the multigroup invariance for male and female students.
In the confirmatory factor analysis used to test the questionnaire dimensionality and accuracy, two models showed equivalent satisfactory goodness-of-fit indices, namely the correlated six-factor model (Model 3) and the second-order factor model (Model 4). The chi-square/df ratio in the second-order factor model and the correlated six-factor model (2.33 and 2.28, respectively) confirmed the overall fit of both models, while the RMSEA value, together with the TLI and CFI indices, supported the very good fit of both models [43,50]. Therefore, both models may be used as adequate models for depicting the structure of the CTSAS short-form questionnaire.
The confirmatory factor analysis established the validity and reliability of the correlated six-factor empirical model for the CTSAS short form: Interpretation (nine items), Analysis (eleven items), Evaluation (seven items), Inference (thirteen items), Explanation (ten items) and Self-regulation (ten items). The Cronbach's alphas of the overall instrument and of the six scales were high (α = 0.969 for the overall scale and between 0.750 and 0.965 for the six factors), supporting the independent use of each of the six skill scales [27] whenever different dimensional aspects of CrT skills need to be evaluated separately. Nonetheless, the assumption that CrT may be decomposed into a set of discrete, measurable skills has recently been questioned [8]. Several authors argue that CrT is usually practised as an integrated competence, and that it is incongruent and potentially detrimental to reduce CrT to a series of skills [8]. Considering that CrT results from complex, multifactorial, interwoven and multileveled processes of thought [15], the second-order factorial model might better reflect the multidimensionality of CrT. Note that neither the model testing the hypothesis that all 60 items are explained by one factor (Model 1) nor the bi-factor model (Model 5) had an adequate fit to the data. That is, we cannot refer to critical thinking without referring to the six skills that constitute the higher-order concept of critical thinking. This view also acknowledges that the exercise of CrT is shaped by values and one's background, which adds complexity to the development of CrT competences. Consequently, the integrated score provided by the CTSAS short form adequately recognizes the complex and dynamic interplay of the six skills measured by the instrument, and supports a holistic assessment of the students' CrT skills.
The second-order factorial model, which was used to compare the results with the original CTSAS questionnaire by Nair [36], also showed that only four items had a factor loading below 0.500 (items 4, 6, 8, and 39), suggesting that all other items presented convergent validity [43]. Despite this, it was decided to keep the four items, considering that their substantive content was relevant for the intended purposes.
A limitation of this study might be that a self-report instrument is proposed to test students' CrT skills, with all the inherent biases that such questionnaires might entail [31,51]. However, this limitation may be overcome by reporting data for individual CrT skills at the aggregate level or by using the global CrT score.
The overall CTSAS short form showed strong internal consistency, with a Cronbach's alpha of 0.969, suggesting the scale retained stability and reliability despite the reduction in the number of items in the instrument. In addition, the individual dimensions of the skills assessed with the CTSAS short form presented acceptable-to-good Cronbach's alpha values [51][52][53][54] of between 0.772 (Interpretation) and 0.903 (Inference). These coefficients suggest that the constructs measure the intended dimensions. In addition, the correlations between the total score of the CTSAS short form and the individual dimensions tested confirm that skills may be measured through the items retained in the new, shortened CTSAS version.
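As an illustration of how such a reliability coefficient is obtained, the following is a minimal sketch of Cronbach's alpha, computed from the item variances and the variance of the summed score. It assumes complete responses arranged as one row per respondent and one column per item. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    using sample variances (n - 1 in the denominator).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same k items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical toy responses (4 respondents, 3 items), for illustration only.
    toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
    print(cronbach_alpha(toy))
```

The same function could be applied to all 60 items for an overall coefficient or to the items of a single skill for a subscale coefficient, since alpha depends on the number of items as well as on their intercorrelations.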
Strong positive significant correlations were found between skills, and between the skills and the overall CTSAS short form.This finding supports the existence of a good item-related validity that strengthens the good internal-consistency-reliability that was found.
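The skill-to-scale correlations described above can be sketched as follows. The example assumes Pearson's product-moment coefficient and subscale scores already summed per respondent; the study's exact coefficient, the function names and the toy scores are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def skill_total_correlations(skill_scores):
    """Correlate each skill's score with the overall score (sum of all skills).

    skill_scores maps a skill name to a list of per-respondent subscale scores.
    """
    names = list(skill_scores)
    n = len(skill_scores[names[0]])
    total = [sum(skill_scores[s][i] for s in names) for i in range(n)]
    return {s: pearson_r(skill_scores[s], total) for s in names}


if __name__ == "__main__":
    # Hypothetical subscale scores for four respondents, for illustration only.
    toy = {"Interpretation": [1, 2, 3, 4], "Analysis": [2, 4, 6, 8]}
    print(skill_total_correlations(toy))
```

Because each skill's score is itself part of the overall score, such part-whole correlations tend to be inflated; correlating each skill with the total of the remaining skills is a common, more conservative alternative.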
Sex did not affect the distribution of responses for most items, the exceptions being four items (4, 42, 47 and 50). Moreover, the CTSAS short form maintained its factorial structure invariance across sexes, supporting its reliability for both sexes.
In summary, the current study presents and validates a short-form CTSAS questionnaire to assess CrT skills and some subskills, to be applied in academic contexts with learning activities designed under the Facione framework. The short-form questionnaire presents good construct validity with a good model-data fit, and very good overall reliability and validity when applied to a multinational population enrolled on five very different higher education programmes. The strengths of the correlations between the skills, and between each skill and the whole scale, confirm the good reliability of the instrument.
Consequently, the short form of the CTSAS is a comprehensive CrT-assessment tool which has the potential to be used by HEIs to assess CrT skills.
Funding: This work has been supported by the "Critical Thinking for Successful Jobs-Think4Jobs" Project, with the reference number 2020-1-EL01-KA203078797, funded by the European Commission/EACEA, through the ERASMUS+ Programme. The European Commission support for the production of this publication does not constitute an endorsement of the contents, which reflect the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the University of Évora (GD/39435/2020, approved on 05-01.2021).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

2. Model 2: Six-factor (non-correlated) model. This model tests the existence of six non-correlated factors that explain the variance of the set of items.
3. Model 3: Six-factor (correlated) model. This model tests the existence of six correlated latent factors, each one explaining the variance of a set of items.
4. Model 4: Second-order factor model. This model represents the original model proposed by Nair [36], in which a global critical-thinking-skills construct explains the variance of the six latent skills, each of which, in turn, explains a set of items.
5. Model 5: Bi-factor model. This model tests the possibility that the variances of the 60 scale items are explained by a global critical-thinking-skills construct and by the six latent skills, independently.

Table 1. Comparison between Nair's original CTSAS questionnaire and its short form.

Table 3. Cronbach's alpha reliability index and correlations between factors and between the factors and the general critical-thinking-skills construct.

Table 4. The goodness-of-fit indices for multigroup invariance, by sex.

Table 5. Factor covariances by sex.

Table 6. Latent-means differences between females and males.

Table A3. Cronbach's alpha for the CTSAS short form skills and sub-skills.