Patterns of Scientific Reasoning Skills among Pre-Service Science Teachers: A Latent Class Analysis

We investigated the scientific reasoning competencies of pre-service science teachers (PSTs) using a multiple-choice assessment. This assessment targeted seven reasoning skills commonly associated with scientific investigation and scientific modeling. The sample consisted of 56 PSTs enrolled in a secondary teacher education program, whose pre- and post-assessment responses (112 in total) were analyzed together. A latent class (LC) analysis was conducted to evaluate whether there are subgroups with distinct patterns of reasoning skills. The analysis revealed two subgroups, where LC1 (73% of the PSTs) had a statistically higher probability of solving reasoning tasks than LC2. Specific patterns of reasoning emerged within each subgroup. Within LC1, tasks involving analyzing data and drawing conclusions were answered correctly more often than tasks involving formulating research questions and generating hypotheses. Related to modeling, tasks on testing models were solved more often than those requiring judgment on the purpose of models. This study illustrates the benefits of applying person-centered statistical analyses, such as LC analysis, to identify subgroups with distinct patterns of scientific reasoning skills in a larger sample. The findings also suggest that highlighting specific skills in teacher education, such as formulating research questions, generating hypotheses, and judging the purposes of models, would better enhance the full complement of PSTs' scientific reasoning competencies.


Introduction
Scientific reasoning has long been a subject of study in the field of science education [1]. Assessing this reasoning, however, remains a challenge for science educators in the 21st century [2]. The present study examines the scientific reasoning of future science teachers themselves. We assessed reasoning in this group because these teachers will need to teach and demonstrate reasoning to their future science students, and because science teacher education offers the opportunity to design activities that enhance this competency.
Scientific reasoning is a competency that encompasses the abilities needed for scientific problem-solving, as well as the capacity to reflect on that problem-solving [3,4]. In the sciences, reasoning has been distinguished from related constructs such as problem-solving, critical thinking, and scientific thinking, although descriptions of thinking, problem-solving, and reasoning are often conflated. For example, scientific reasoning has been characterized as a kind of problem-solving; however, it has also been argued that reasoning can be distinguished from problem-solving in that direct retrieval of a solution from memory is not possible with reasoning [5]. Ford [6] further emphasizes that reasoning does not mean following a series of rules either, but rather encompasses ongoing evaluation and critique, as suggested by the reflective component of the definition above. Reasoning in the sciences requires cognitive processes that can contribute to, or allow for, inquiring into and answering questions about the world and the nature of phenomena. These cognitive processes include formulating and evaluating hypotheses, two of several processes regularly invoked in scientific domains [7,8].
The multiple cognitive processes investigated in research on reasoning in science and science education have been variously described as formal logic, non-formal reasoning, creativity, model-based reasoning, abductive reasoning, analogical reasoning, and probabilistic reasoning [9–12]. These processes may or may not be used in the wider category of critical thinking [13]. Some scholars have provided evidence that the ability to use these processes for reasoning is transferable across domains [14], while others, such as Kind and Osborne [15], suggest that reasoning varies considerably with the content and the procedural and epistemic knowledge of the reasoner. Scholars have also shown that the ability to reason in science does not necessarily improve with age [16] but that it can be taught and enhanced both in the early years and at university level [17–19].
Our focus in the present study is on the reasoning competencies of pre-service science teachers (PSTs) enrolled in a university teacher education program. Most studies on pre-service science teachers' scientific reasoning competencies adopt variable-centered approaches and report, for example, average scores for sample groups or populations. One study [20] reported that a group of 66 Australian pre-service science teachers performed significantly better on tasks requiring the skill of 'planning investigations' than on tasks related to the skills of 'formulating research questions' and 'generating hypotheses'. Such insights are valuable but can be too rough-grained, depending on the research questions, because different subgroups with distinct patterns of scientific reasoning skills may exist within a sample. In order to identify such subgroups, person-centered analyses are necessary that, statistically speaking, aim to "[R]educe the 'noise' in the data by splitting the total variability into 'between-group' variability and 'within-group' variability" [21] (p. 2). Hence, person-centered analyses, like latent class analysis (LCA), are finer-grained in the sense that they are case-based and identify individuals with similar patterns of scientific reasoning skills (e.g., [22]). Person-centered analyses are also referred to as 'typological' approaches [23]. Such approaches can be especially valuable for educators because they move beyond the 'average' and follow, methodologically, "[M]odern developmental theory, in which individuals are regarded as the organising unit of human development" [23] (p. 502). In the present study, we seek to establish whether subgroups of reasoners can be ascertained among PSTs using an LCA. The seven reasoning skills examined are: formulating research questions, generating hypotheses, planning investigations, analyzing data and drawing conclusions, judging the purpose of models, testing models, and changing models. While historical examination of scientific work has revealed that practices such as thought experiments, analogies, and imagistic simulation are important to scientists' development of new concepts [24], the seven skills under investigation were identified as key empirical areas of inquiry in science education [25–29] and are likely to have been taught in undergraduate science programs [3].

Sample
A full cohort of 56 PSTs from a university in North America participated in this study. Their mean age was 27 years (SD = 6.34; mode = 23). Data were collected in their secondary science methods course within a Bachelor of Education after-degree program. To enroll in the secondary program, all students had completed at least one prior degree (usually four years of Science or more). The instrument described below (Section 2.2) was administered to the PSTs in their methods course at the beginning and at the end of the semester (pre-post assessment). For the purpose of identifying groups with distinct patterns of scientific reasoning, the pre- and post-assessment data of the 56 PSTs were analyzed together; the total response sample for each item was thus n = n_pre + n_post = 112. Only cases without missing responses were included, resulting in a sample of n = 101 for the statistical analysis. The numbers of PSTs by primary major were: Biology (n = 30), Chemistry (n = 11), Physics (n = 8), Biomedicine (n = 1), Earth Sciences (n = 1), Mathematics (n = 1), n/a (n = 4). Most of the PSTs' prior degrees were in Biology (n = 60; e.g., general Biology, Applied Biology, or Evolutionary Biology), followed by Chemistry (n = 25) and Physics (n = 6).

Data Collection
An established multiple-choice instrument was administered to assess the PSTs' scientific reasoning competencies. The instrument was originally developed in German [27] and was later adapted into English, with thorough evaluations [30]. It includes 21 multiple-choice items developed to assess the seven reasoning skills of formulating research questions, generating hypotheses, planning investigations, analyzing data and drawing conclusions, judging the purpose of models, testing models, and changing models. Authentic scientific contexts were included in the items, mostly related to general science and Biology. As suggested in the organizing device used for test development (see Table 1), these seven skills relate to two sub-competencies: conducting scientific investigations and using scientific models [31]. To correctly solve the multiple-choice items, PSTs have to apply their procedural and epistemic knowledge related to the respective skills [32–34]. Table 1 lists the two sub-competencies, their associated skills, and the specific knowledge necessary to correctly answer the items.
Table 1. Sub-competencies of scientific reasoning and associated skills with necessary procedural and epistemic knowledge, as described by Mathesius et al. [34].

Data Analysis: Latent Class Analysis
A latent class analysis (LCA) was utilized to identify patterns of scientific reasoning skills among PSTs. The R package poLCA was employed [35]. All further (classical) statistical analyses, such as t-tests and descriptive analyses, were carried out with IBM SPSS Statistics, version 26. In an LCA, PSTs' responses are analyzed on the latent level, all variables are assumed to be (at least) on a nominal level, and there are no restrictions on the kind of relation between the (manifest) variables [33,36,37]. LCA was selected for data analysis because it permits the identification and computation of different groups (i.e., latent classes) of PSTs, with each group consisting of individuals whose response patterns are as homogeneous as possible (low within-group variability) but different from the response patterns of the other groups (high between-group variability). LCA thus belongs to the person-centered approaches of data analysis [21,23].
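As an illustration of this step, the following minimal R sketch fits such a model with poLCA. The data frame name (responses), the item names (item01, item02, ...), and their coding (1 = incorrect, 2 = correct; poLCA requires positive-integer codes) are hypothetical stand-ins, not the study's actual variable names.

```r
# Minimal sketch: fitting a 2-class latent class model with poLCA.
# Assumes a data frame 'responses' with one row per case and the 21
# scored items as columns, coded 1 = incorrect, 2 = correct.
library(poLCA)

# poLCA takes the manifest items on the left-hand side of a formula;
# '~ 1' specifies a model without covariates.
f <- cbind(item01, item02, item03) ~ 1  # shortened: list all 21 items in practice

set.seed(647)  # reproducible random restarts
lca2 <- poLCA(f, data = responses, nclass = 2,
              maxiter = 3000, nrep = 10)  # repeated starts guard against local maxima
```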
A core question of LCA is how to decide on the appropriate number of latent classes [36]. To compare different LCA models, indices such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the sample size adjusted Bayesian information criterion (ssaBIC) are typically employed. These indices factor in the parsimony, the sample size, and the likelihood of the LCA models, each index in a different manner [38]. When comparing different LCA models with these indices, the smallest value of each index points to the comparatively best model; however, the BIC and the ssaBIC have been identified as superior indicators compared to the AIC [39] (p. 557), which is why these two indicators are used in the present study. On the other hand, the BIC and the ssaBIC often do not identify the same LCA model as optimal [38]. Therefore, one has to combine different insights to decide how many latent classes represent the data set best [38].
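Continuing the hypothetical sketch above, such a comparison of candidate models could look as follows. poLCA reports the AIC and BIC directly; the ssaBIC is computed here from the log-likelihood using the common adjustment that replaces the sample size N with (N + 2)/24 in the penalty term, which is an assumption about the exact variant used in the study.

```r
# Sketch: comparing 1- to 4-class solutions (continues the setup above).
fits <- lapply(1:4, function(k)
  poLCA(f, data = responses, nclass = k,
        maxiter = 3000, nrep = 10, verbose = FALSE))

indices <- t(sapply(fits, function(m) {
  # sample size adjusted BIC: penalty uses log((N + 2) / 24) instead of log(N)
  ssaBIC <- -2 * m$llik + m$npar * log((m$Nobs + 2) / 24)
  c(AIC = m$aic, BIC = m$bic, ssaBIC = ssaBIC)
}))
rownames(indices) <- paste0(1:4, " classes")
indices  # the smallest value in each column flags the comparatively best model
```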
It is an important characteristic of LCA that subjects are not assigned to the latent classes in a deterministic manner but in a probabilistic one. For diagnostic purposes, it is common to classify each subject into the latent class with the highest probability of assignment. Therefore, an "Additional indicator [of model-goodness] is the average membership probability within each [latent] class" [40] (p. 52); the higher this probability, the better the LCA model. Furthermore, one should analyze the item parameters for extreme values that indicate an estimated probability of 0% or 100% of solving a task; the fewer extreme values, the better the LCA model [40]. Table 2 provides the fit indices for the LCA models compared in this study. Because the BIC (2 latent classes) and the ssaBIC (4 latent classes) suggest selecting different LCA models, the number of extreme values and the probability of assignment were used as additional indicators. Based on these indicators, it can be assumed that the response pattern of the PSTs is best represented by two latent classes. These two latent classes comprise about 73% (n = 74; latent class 1) and 27% (n = 27; latent class 2) of the sample, respectively.
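The two additional diagnostics described in this section, the average membership probability under modal assignment and the scan for extreme item parameters, can be read off a fitted poLCA object, as in the following sketch; object and variable names continue the hypothetical setup above.

```r
# Sketch: diagnostics for the chosen 2-class model (continues the setup above).
m <- fits[[2]]

# Class sizes under modal (highest-probability) assignment.
table(m$predclass)

# Average membership probability within each latent class; higher is better.
tapply(apply(m$posterior, 1, max), m$predclass, mean)

# Flag items whose estimated response probabilities approach 0% or 100%;
# fewer such extreme values indicate a better-behaved model.
sapply(m$probs, function(p) any(p < 0.01 | p > 0.99))
```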

Results
Figure 1 illustrates the response profiles of the two latent classes across the seven skills of scientific reasoning covered in the multiple-choice instrument. Generally, PSTs in latent class 1 show a higher mean probability of correct answers for all seven skills. Comparing the mean probabilities of correct answers between the two latent classes with independent t-tests resulted in significant differences for the skills planning investigations (p = 0.04; d = 0.48, small to medium effect size), analyzing data and drawing conclusions (p < 0.001; d = 1.25, large effect size), as well as judging the purpose of models, testing models, and changing models.
For latent class 1 and considering skills related to conducting scientific investigations (Table 1), response probabilities are quite similar within two groups of skills: formulating research questions and generating hypotheses on the one hand, and planning investigations and analyzing data and drawing conclusions on the other; between these two groups of skills, significant differences with large effect size measures were found. For the skills related to using scientific models (Table 1), correct responses were found significantly more often for the skill testing models than for judging the purpose of models (p = 0.02; d = 0.36, small effect size measure).
For latent class 2 and considering skills related to conducting scientific investigations (Table 1), items related to the skill planning investigations were answered correctly significantly more often than tasks related to the other three skills (p < 0.001; d > 1.00, large effect size measures). For using scientific models (Table 1), no significant differences between the skills could be found.
In order to better understand the characteristics of the PSTs assigned to latent class 1 and latent class 2, we compared their age, primary majors, and the number of previous degrees. Independent t-tests (Table 3) revealed that there are significantly more PSTs with the primary major of Biology in latent class 1 (about 65%) than in latent class 2 (about 33%). For the primary major of Chemistry, it is quite the reverse (about 15% in latent class 1 and about 33% in latent class 2); also, the number of PSTs with more than one previous degree is significantly higher in latent class 1 (n = 11) than in latent class 2 (n = 1). These findings illustrate that studying Biology as a primary major and holding a higher number of previous degrees made it more likely to belong to the more proficient latent class 1, whereas studying Chemistry as a primary major made it more likely to belong to latent class 2.
Table 3. Comparison of the PSTs assigned to latent class (LC) 1 and LC 2 along the variables age, primary major of Biology, Chemistry, or Physics, and the sum of previous degrees (the latter as a dichotomized variable with 1 = one previous degree and 2 = more than one previous degree). * Adjusted t-statistic and df because of violated assumption of variance homogeneity.
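As a minimal sketch of such a comparison: R's t.test() applies the Welch adjustment of the t-statistic and degrees of freedom by default (var.equal = FALSE), corresponding to the adjustment noted under Table 3; the data frame and variable names below are hypothetical.

```r
# Hypothetical sketch: comparing PSTs' age between the two assigned classes.
# t.test() defaults to the Welch correction (var.equal = FALSE), i.e., the
# adjusted t-statistic and df used when variance homogeneity is violated.
t.test(age ~ latent_class, data = background)
```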

Discussion
Using LCA, we found that two groups of reasoners emerged among the PSTs. One subgroup (latent class 1) had a statistically higher probability of solving scientific reasoning tasks than the other subgroup (latent class 2). Overall, the groups differed significantly on five of the seven skills investigated: planning investigations, analyzing data and drawing conclusions, judging the purpose of models, testing models, and changing models. They did not differ significantly on formulating research questions and generating hypotheses.
Within the latent class 1 subgroup, responses differed significantly between the skills planning investigations and analyzing data and drawing conclusions on the one hand and the skills formulating research questions and generating hypotheses on the other. Within this subgroup, tasks about testing models were also solved more often than those requiring judging the purpose of models. Within the latent class 2 subgroup, responses on planning investigations differed significantly from those on the other investigation skills. For using scientific models, no significant differences could be found within this subgroup among the skills related to modeling (judging the purpose of models, testing models, and changing models).
The two subgroups also differed in several other key characteristics. In latent class 1, significantly more PSTs had a major in Biology than in latent class 2, whereas there were far fewer Chemistry majors in latent class 1. Moreover, there were significantly more PSTs with more than one previous degree in latent class 1 than in latent class 2. This finding is noteworthy for science teacher education because it suggests that Biology majors were significantly better at planning investigations, analyzing data and drawing conclusions, judging the purpose of models, testing models, and changing models than Chemistry majors. These findings might have been caused by the dominance of Biology-related items in the instrument; however, as the items require PSTs to apply procedural and epistemic knowledge as shown in Table 1 (and less so content knowledge), the findings lead us towards a renewed emphasis on reasoning tasks in Chemistry teacher education. Nevertheless, future studies could investigate the importance of science content knowledge from specific subjects (such as Biology) for solving the items, for instance, by applying think-aloud studies [25] or by statistically investigating difficulty-generating task characteristics [41].
As a 'person-centered' statistical approach, the LCA was particularly powerful in ascertaining subgroups within a science teacher education cohort. This statistical approach is a departure from traditional variable-centered approaches in education that tend to report average scores for sample groups [21,23]. The LCA permits statistical cases to emerge from within samples or classrooms and is a recommended approach for generating case studies for further inquiry in science teacher education research.
In combination with relevant epistemic, procedural, and content knowledge, greater attention to formulating research questions and generating hypotheses would be helpful within science teacher education. Furthermore, reasoning tasks involving judging the purpose of models and changing models could be a high priority for modeling investigations in pre-service science teacher education. Possible science teacher education activities to support such tasks include the three-phased generating, evaluating, and modifying (GEM) models approach [10]. This approach emphasizes generating hypotheses in the first phase and testing and changing models in the second and third phases [42]. In general, in science teacher education courses, Biology majors or those with additional degrees could be purposefully placed within heterogeneous groups for cooperative learning tasks. It was interesting to the authors that Biology majors outperformed other majors in this study, although this might be caused by the dominance of Biology-related items in the instrument; insights into the differences in performance among majors would be a helpful avenue for the design of science teacher education courses and group work in the ways suggested above. By participating in reasoning tasks with such recommendations in mind, future teachers might be able to better support their own students in developing competencies in these areas.
The significance of this study is that, using person-centered statistics, it identifies two groups of PSTs with different propensities to reason in science. Normally, a cohort would be treated as a single group; with this statistical approach, however, the researchers were able to show that subgroups of PSTs emerged as competent at very different reasoning tasks. One subgroup is significantly more competent at planning investigations, analyzing data and drawing conclusions, judging the purpose of models, testing models, and changing models than the other. The subgroups had approximately equivalent competencies at formulating research questions and generating hypotheses, showing for the first time that different subgroups with specific patterns of scientific reasoning skills exist among PSTs. This finding can have an impact on the science students of these future teachers, who presumably will draw upon their own competencies to demonstrate how to reason in the classroom. Future research could target investigation- and model-based reasoning competencies among PSTs and their relationships to student reasoning. Judging the purpose of models, formulating research questions, and generating hypotheses were areas in which PSTs were less competent; researching interventions related to these aspects of modeling and investigation would be worthwhile.