Can Pre-Service Biology Teachers’ Professional Knowledge and Diagnostic Activities Be Fostered by Self-Directed Knowledge Acquisition via Texts?

In a diagnostic context of reasoning about instructional quality, scientific reasoning skills can be described as diagnostic activities, which require professional knowledge. Different approaches to enhance pre-service teachers’ professional knowledge (PCK, CK, PK), as well as diagnostic activities, exist. However, results about their effectiveness are still inconsistent. We systematically investigated the effectiveness of self-directed knowledge acquisition via texts on PCK, CK, PK, and diagnostic activities of 81 pre-service biology teachers following an experimental design. Paper-pencil tests, measuring PCK, CK, and PK, and the video-based assessment tool DiKoBi Assess, measuring diagnostic activities in the context of diagnosing instructional quality, were used before and after an intervention. The intervention included four treatments on self-directed knowledge acquisition via texts on (1) PCK, (2) CK, (3) PK, and (4) a combination of PCK/CK/PK. Treatment (5) served as control. Mixed ANOVAs showed large time effects for PCK and CK, but no interaction effect concerning knowledge facets between time and treatment for any of the treatments. Time effects might be due to pre-service teachers’ scientific reasoning on biology instruction that activated knowledge. An ANCOVA showed no significant effect of treatment on diagnostic activities either. We conclude that scientific reasoning about instructional quality is more effective for knowledge acquisition than text-work.


Introduction
Scientific reasoning, as a component of scientific inquiry, encompasses reasoning and problem-solving processes that count as crucial for coping with science-related issues of everyday life [1,2]. Therefore, Krell et al. [3] underlined the importance of developing scientific reasoning competencies during teacher education. Krell et al. defined scientific reasoning competencies as a complex construct comprising three knowledge types for problem-solving (knowing that, knowing how, knowing why, cf. [4]) that are applied in cognitive processes (e.g., encoding, strategy development, cf. [2]). For this application, scientific reasoning skills, such as formulating questions, testing hypotheses, planning and performing investigations, analyzing information systematically, and drawing reasonable conclusions from specific observations are required [2,5,6]. With regard to a rather broad understanding of scientific reasoning competencies, they can also be considered important for teachers to monitor and improve instructional quality in the science classroom. In such a context of instructional diagnosis, a teacher explicitly and systematically compares different characteristics of instruction in a data-based manner in order to be able to make appropriate instructional decisions [7]. Scientific reasoning competencies in such a context of diagnosis can be considered similar to conceptualizations of diagnostic competences [8,9]. For diagnostic competences, different components such as professional knowledge and diagnostic activities have been distinguished. Knowledge has been conceptualized in terms of the content-related facets pedagogical content knowledge (PCK), content knowledge (CK), and psychological-pedagogical knowledge (PK), as well as with regard to types of knowledge (knowing that, knowing how, knowing when and why) [10].
Since knowledge is applied in diagnostic contexts, and since within those contexts, the diagnostic focus can vary, all content-related knowledge facets may be of importance. Therefore, knowing that, knowing how, knowing when and why is not restricted to CK, but can also be distinguished for PCK and PK, which can be seen as an extension of the definition given in the scientific reasoning competencies approach. The conceptualization of diagnostic activities in which knowledge is applied in order to solve specific problems can be seen as equivalent to scientific reasoning skills [8].
Whereas many studies in recent years have examined the structure of professional knowledge and developed instruments to measure different knowledge facets (e.g., COACTIV: Cognitive Activation in the Mathematics Classroom and Professional Competence of Teachers [11]; ProwiN: Professional Knowledge of Teachers in Science [12]), there is still a lack of evidence on how to effectively promote the content-related knowledge facets PCK, CK, and PK, and on whether there are differences regarding the effectiveness of procedures such as text work, lectures, or practical training. Additionally, the effects of fostering professional knowledge on the execution of scientific reasoning skills are not well studied either. Within a biology context, the present study addresses this issue by investigating whether pre-service biology teachers' professional knowledge can be supported by self-directed knowledge acquisition via texts, which represents one common working method in higher education programs and, thus, has high practical relevance for pre-service teachers. The study further investigates whether this text-based support affects the pre-service teachers' scientific reasoning skills about subject-specific instruction expressed as diagnostic activities.
The following theoretical section starts from the professional competence of teachers, from which corresponding conceptualizations of knowledge and skills are derived that are relevant for scientific reasoning in the context of diagnosis.

Conceptualizing Teachers' Professional Competence in Terms of Diagnosing
Teachers' professional competence has been studied from a cognitive perspective focusing dominantly on teachers' knowledge facets (e.g., [11,13,14]) and from a situated perspective including the context in which instructional decisions have to be made (e.g., [15–17]). The competence as a continuum model combines both perspectives and defines professional competence as a continuum with different components spanning from a teacher's dispositions (e.g., professional knowledge, beliefs) that underlie situation-specific skills, which in turn inform the teacher's actual instruction [18]. Dispositions are defined as "underlying characteristic of a person" [19] (p. 97) that can be regarded as cognitive in terms of professional knowledge and as affective-motivational in terms of teachers' beliefs, interest, or motivation [18,20]. Conceptualizations of situation-specific skills refer to teachers' adequate coping with teaching situations and allow an action-oriented assessment of variables that take situated learning approaches into account [20,21]. In the context of reasoning, such skills can be operationalized as scientific reasoning skills [22].
Furthermore, teachers' instruction in the classroom is considered to be decisive for teaching effectiveness. Effective teaching can be described by generic and subject-specific dimensions of instructional quality, which influence students' outcomes [23–25]. As part of their professional competence, teachers should know about these dimensions and the instructional quality features they include in order to reason about them, to diagnose instructional processes, make appropriate decisions, and adapt teaching (cf., [15,26]). Generic instructional quality features can be described by three basic dimensions of instructional quality: classroom management, supportive climate, and cognitive activation. The dimensions classroom management (includes strategies and activities such as rules, routines, or monitoring to organize the classroom and ensure an effective use of time) and supportive climate (includes, amongst others, a teacher's sensitivity to learners, patient teacher actions, establishment of a positive learning climate, and appropriate feedback) are understood as generally applicable across domains. In contrast, the content-dependent basic dimension cognitive activation (includes instructional practices which stimulate students to higher cognitive engagement to foster conceptual understanding) is to be differentiated more subject-specifically and, thus, is considered closer to subject-specific dimensions [27,28]. Relevant aspects of cognitive activation that have been described are conceptual instruction and an appropriate level of students' cognitive activities [24]. However, teachers of a specific subject, such as biology, do not only have to reason about the basic dimensions of instructional quality, but above all they must be able to describe and implement instructional key features of the specific subject, since it is those subject-specific instructional quality features that are considered necessary for high-quality biology instruction (cf. [27,29]).
Diagnosing biology-specific instructional quality features (e.g., teachers' formative handling of specific student ideas, the thoughtful use of content-specific technical language, an elaborate use of models in order to solve scientific questions, or the application of scientific inquiry strategies when planning and conducting experiments) is therefore of great importance [23,25,30].
Within the situation-specific processes of a subject such as biology, teachers evaluate data (e.g., from monitoring scientific inquiry steps, the elicitation of student thinking, the use of three-dimensional models) to inform their pedagogical reasoning and decision-making [31]. Since such processes are part of the systematic and continuous generation and evaluation of knowledge about students and (subject-specific) instructional dimensions, they can be summarized as diagnosing, which counts as an important component of teachers' professional competence [32,33]. Taking the data-based process into account, diagnosing can be considered as a type of scientific reasoning including several epistemic activities teachers can make use of [34]. In the context of diagnosis, these activities have also been described as diagnostic activities [8]. Therefore, teacher education should not only foster teachers' professional knowledge regarding instructional quality features but also enable them to "apply their knowledge in diagnostic activities according to professional standards to collect and interpret data in order to make decisions of high quality" [8] (p. 9). Thus, diagnostic activities represent scientific reasoning skills used to specify situation-specific skills in the context of diagnosis.
Depending on the context in which knowledge has to be applied, different content-related knowledge facets and types of knowledge may be critical [10]. In addition, different diagnostic activities have been described, but not all of them are considered relevant for diagnosing specific situations [8], since in some situations it might be sufficient to generate and evaluate evidence, whereas in other situations the generation of hypotheses or the creation or redesign of artifacts may be more important for knowledge generation. In the following, different approaches to conceptualize teacher professional knowledge and skills are described.

Conceptualizing Teachers' Professional Knowledge
Effective biology teaching requires different knowledge types. Förtsch et al. [10] distinguished knowledge related to facts, terms, and principles (mostly referred to as declarative knowledge or knowing that), and action-related knowledge (knowing how, knowing when and why). Knowing that means, for example, that a teacher can correctly list the advantages and disadvantages of a specific model. Knowing how refers to knowledge about actions, procedures, or manipulations, and is applied, for example, when a teacher deals with students' ideas. Knowing when and why relates to knowledge about when and why to apply particular procedures to achieve particular goals, for example, knowing when and why students' errors within a certain topic are dealt with [10,35].
In addition to the division into different types of knowledge, content-related knowledge facets can be classified. Based on Shulman's classification [36,37], most models focused on PCK, CK, and PK that build the core of the construct [11]. Knowledge can be described for all three knowledge facets in terms of declarative and action-related knowledge [10]. The subject-independent facet PK contains knowledge that counts as necessary for classroom management, classroom assessment, and organization to facilitate an effective learning atmosphere in which pedagogical strategies can be applied [26,38,39]. Broader conceptualizations, as used in the BilWiss project, include the PK-dimensions instruction (which is further divided into the sub-dimensions generic instructional quality features and teaching strategies and methods), learning and development, diagnostics and evaluation, educational theory, school as an educational institution, and teaching as a profession [40]. In general, the conditions a science teacher establishes in the classroom are assumed to provide the basis on which PCK and CK can be used [39]. Since PCK and CK count as subject-specific, they are most important for science education. CK describes the knowledge of subject matter, discipline-specific methods for generating knowledge, and the conceptual understanding of specific topics, which researchers emphasized as a necessary but insufficient precondition for the development of PCK [11,36,41]. With regard to a specific subject matter, PCK includes subject-specific knowledge about corresponding (mis)conceptions of particular students, knowledge about subject-specific structures of instruction, and corresponding teaching strategies, and was shown to be highly predictive for instructional quality and students' achievement [15,21,42–44].
Accordingly, PCK is related to the implementation of subject-specific instructional quality features and is thus considered particularly relevant for subject-specific instruction [42,44]. However, researchers emphasized that it is not only this stage of knowledge that forms PCK but also the knowledge that is closely related to the actual practice and, thus, is more dynamic [43,45,46]. Science teaching includes taking students' prior learning into account, facilitating linkages between concepts, or choosing and utilizing instructional strategies that best suit particular teaching moments. Such tendencies underpin teachers' pedagogical reasoning, which is the heart of teaching, and are regarded as components of PCK rather than equivalent to PCK (see [47,48]).
For measuring professional knowledge, researchers mostly used paper-pencil tests with validated test items [11,49]. However, to measure context-dependent, practice-oriented knowledge, other approaches than paper-pencil assessments are needed that take the situated character of knowledge application into account [50]. The use of video analyses represents such an approach [51].

Conceptualizing Situation-Specific Skills for Reasoning about Instruction
Researchers assume that teachers' professional knowledge underlies their situation-specific skills that teachers use to systematically solve specific situations in the classroom or to inform subject-specific instruction [18]. When solving (problematic) situations, teachers engage in reasoning processes in order to make decisions. From a scientific stance, "scientific reasoning encompasses the reasoning and problem-solving skills involved in generating, testing and revising hypotheses or theories, and in the case of fully developed skills, reflecting on the process of knowledge acquisition and knowledge change that results from such inquiry activities" [2] (p. 61). In this definition, it becomes evident that scientific reasoning comprises specific processes that aim at generating knowledge [2,34]. The mentioned processes are in line with Nowak et al. [5], who identified three main processes that are central to scientific reasoning: (1) asking questions and formulating hypotheses, (2) planning and performing an investigation, (3) analyzing data and reflecting on the investigation. For effective engagement in any of these reasoning processes, scientific reasoning skills including reasoning from evidence are required [22,52]. Several skills have been described either with regard to the three main processes (cf. [5]) or with regard to epistemic activities that have been found relevant for generating knowledge in different domains [34]. Overall, eight epistemic activities have been described: (1) problem identification, (2) questioning, (3) hypothesis generation, (4) construction and redesign of artefacts, (5) evidence generation, (6) evidence evaluation, (7) drawing conclusions, and (8) communicating and scrutinizing. Such scientific reasoning skills are considered vital not only for teachers' classroom instruction [53] but for every human's understanding of the world and for the development of responsible citizenship [2,54].
Accordingly, it is important that science teachers master such skills: firstly, to promote them among their students and, secondly, to apply them in the context of reasoning processes about science or biology instruction.
Within the broader understanding of scientific reasoning, applying scientific reasoning on instruction can be seen as an evidence-based process of systematically collecting data, generating and evaluating evidence, and drawing inferences in order to produce a diagnosis and make instructional decisions (cf. [8,31,55]). Therein, evidence is not only the product of an experimental investigation but can also consist of statements describing observations (cf. [52]). Overall, scientific evidence-based reasoning in the context of diagnosing enables teachers to diagnose their students and also instructional features that are important in terms of instructional quality, and thus, for student learning (cf. [11,42]). In teacher education, pre-service teachers should, therefore, also develop knowledge and skills to enact scientific reasoning in different contexts, such as diagnosing, for which specific tools are needed [3].
In the context of diagnosing, scientific reasoning skills have been operationalized as diagnostic activities. Diagnostic activities describe those activities that teachers execute for data-evaluation within situation-specific diagnostic contexts and that are more clearly observable than solely cognitive processes that underlie diagnosing [8,56]. Diagnostic activities have been mentioned in several studies (e.g., [57,58]), but an explicit definition of different activities is mostly missing. A more differentiated approach was recently made by Heitzmann et al. [8], who translated the eight epistemic activities introduced by Fischer et al. [34] into eight diagnostic activities relevant for the goal-oriented process of diagnosing. In the following, these diagnostic activities are illustrated with examples from teaching:
• identifying problems (e.g., a teacher recognizes a noteworthy incident in classroom instruction driven by prior knowledge, cf. [59]);
• questioning (e.g., a teacher asks for reasons of the identified problematic incident);
• generating hypothesis (e.g., a teacher makes an assumption about the underlying problem of the teaching situation);
• constructing artifacts (e.g., a teacher generates tests/tasks to be used for (further) data collection);
• generating evidence (e.g., a teacher or observer uses the test or task or systematically observes and describes the situation, for example, with regard to relevant student or teacher behavior);
• evaluating evidence (e.g., a teacher interprets data and evaluates the extent to which it supports a demanded standard);
• drawing conclusions (e.g., a teacher derives (behavioral) consequences from the evaluation of multiple data sources);
• communicating the process and results (e.g., a teacher shares findings and feedback can be given; afterward, further measures can be taken or alternative instructional strategies can be implemented).
The eight diagnostic activities can be understood as a reservoir of activities teachers can use for diagnosis. Which diagnostic activities are appropriate may differ with regard to the discipline or the diagnostic focus [8,60]. In studies that investigated diagnostic activities in the context of teacher education, the diagnostic activities generating hypothesis, generating evidence, evaluating evidence, and drawing conclusions were considered to be particularly relevant [55,60,61]. Furthermore, it is noteworthy that some diagnostic activities show similarities with conceptualizations of situation-specific cognitive skills, such as perceiving, interpreting, and decision-making (PID model, [51]) or professional vision [62,63]. Both conceptualizations have been used in the context of video analysis and the diagnosis of classroom instruction. From the perspective of the PID-model, teachers' abilities to perceive particular events in instructional settings, to interpret the events, and to make decisions either as anticipating answers to student ideas or proposing alternative teaching strategies have been identified as crucial in terms of professional competence. From the perspective of professional vision, the skills of noticing (paying attention to noteworthy events) and reasoning (describing noteworthy events, explaining by linking pedagogical concepts and principles to observed events, and predicting possible consequences as specification of teachers' decision making) are highlighted [63]. Therefore, researchers using diagnostic activities in the context of video analysis have operationalized generating evidence, evaluating evidence, and drawing conclusions as reasonable diagnostic activities for assessment designs [61].
Even though scientific reasoning skills, and more explicitly the epistemic purpose underlying a diagnostic activity, are transferable across disciplines to a certain extent [64], the application of scientific reasoning skills is discipline- and context-specific, as well as knowledge-dependent (cf. [65,66]). Therefore, the adequate execution of diagnostic activities in the context of diagnosing biology instruction may rely on subject-specific facets of professional knowledge such as teachers' PCK (cf. [9,29]).

Fostering Professional Knowledge and (Scientific) Reasoning Skills
Due to the complex interaction and interdependence of professional knowledge and scientific reasoning skills, such as diagnostic activities, pre-service teachers need varying opportunities to develop knowledge and apply diagnostic activities during their teacher education. The common division into three content-related knowledge facets is also reflected in the university organization, in which the knowledge facets are taught in separate courses and lectures [67], while different knowledge types are addressed more or less explicitly across all courses. Standard working methods in higher education include text-based procedures requiring pre-service teachers to acquire knowledge in a self-directed way, lecture-based procedures in which specific information is presented by a lecturer, mixed forms of text- and lecture-based instruction, and situated approaches to learning that represent scenarios from real-world demands, for example, in video vignettes (cf. [68,69]). Depending on the context, on learners' prerequisites, on structure and content of materials, and specific learning goals, the effects of instructional support on knowledge acquisition may vary.
Barth et al. [70] compared the effects of self-directed knowledge acquisition and direct instruction of knowledge about classroom disruptions that represent an aspect of PK on three cognitive outcomes: (declarative) knowledge on classroom management, noticing critical incidents in the classroom, and knowledge-based reasoning. Results showed that direct instruction that was conducted by a university teacher and included a systematic introduction to the relevant content led to higher gains in knowledge on classroom management (PK) and improved the "ability to apply this knowledge in a simulated teaching situation (the video) through knowledge-based reasoning" [70] (p. 8). However, noticing (that corresponds to the diagnostic activity problem identification) was not affected. In addition, the self-directed acquisition of knowledge did not result in any significant effects. Kleickmann et al. [71] investigated conditions that are necessary for developing PCK in mathematics education. Experimentally manipulated treatments received instructions by an experienced lecturer on different combinations and sequencing of declarative and action-related PCK, CK, or PK over two days. Additionally, important content was repeated, and the participating pre-service teachers received handouts and had to carry out different tasks, such as answering short questions or writing assignments to recapitulate the major contents of the instruction. Regarding direct instruction, their results showed that "explicitly addressing the knowledge of students, learning and teaching in concrete content domains, whether with or without antecedent CK instruction, appeared to be the most effective pathway" [71] (p. 126). A reanalysis of the data also underlined that instruction on PCK has small effects on CK or PK development as well [67]. The authors of the study attributed this to comparisons and reflections stimulated by the test questions that might have prompted particular aspects of CK and PK. 
Furthermore, Smit et al. [72] investigated the relationships between PCK, CK, and scientific inquiry attitudes. They found gains for both declarative PCK and CK measured with test items within a training program on scientific inquiry. To ensure all participants had a similar level of knowledge, a teacher educator gave a theoretical PCK input on scientific inquiry. Furthermore, inquiry-related videos were discussed. Results showed a major relationship between PCK and scientific inquiry attitudes. In addition, the input proved to be effective for PCK and scientific inquiry attitudes. The ensuing intervention consisted of peer coaching on lesson planning focusing on scientific inquiry skills. However, lesson planning was not found to affect professional knowledge. Possible effects on scientific inquiry skills have not been investigated.
Besides studies focusing on the three knowledge facets PCK, CK, or PK, training interventions that aim to improve situation-specific skills exist. Positive effects were shown for university courses in which noticing and knowledge-based reasoning were fostered by using videos [63,73,74]. In most of these courses, pre-service teachers discussed and reflected on teaching performances shown in video clips, which potentially addressed the diagnostic activities problem identification, generating and evaluating evidence, and drawing conclusions. To our knowledge, effects of instructional support via texts explicitly on activities relevant within diagnostic contexts have not been investigated yet.
However, a first approach to differentiate between measurements of cognitive dispositions in terms of knowledge facets and situated skills was recently made by Gess-Newsome et al. [45]. As part of a three-year professional development training, participants studied curriculum materials, discussed issues of effective pedagogy, and deepened CK to promote different facets of knowledge and skills. Investigated facets of teachers' knowledge and skills were declarative CK, general PK, two separate PCK-constructs (PCK-PK and PCK-CK), as well as inquiry-oriented teacher practice. Even though general PK was conceptualized as a cognitive knowledge facet in their initial conception, the authors finally recognized that by using an observation protocol to assess PK in video-recorded classroom sessions, they actually measured a skill instead of declarative PK. Furthermore, the assessment of the PCK-constructs was situated and can be considered as an approach to elicit skills as enacted form of PCK (cf. [48]). These PCK-related skills reflected the application of reasoning skills in terms of diagnostic activities on a meta-level. Skills referred to the abilities to describe a lesson (i.e., generating evidence), to explain rationales for instruction (i.e., evaluating evidence), and to make instructional decisions (i.e., drawing conclusions). In addition, Gess-Newsome et al. [45] investigated the development of teachers' inquiry-based instruction and, thus, included a second measure related to reasoning skills directly used in action. As the result of the three-year professional development training, the authors found an increase in all investigated facets of teachers' knowledge and skills, indicating the effectiveness of integrating multiple pathways of teachers' professional learning.
Despite the number of studies carried out in the field of professional competence training, only a few studies can be found that explicitly and systematically investigated a certain type of knowledge acquisition (self-directed via texts, lecture or instruction, video-club) and its impact on the knowledge facets PCK, CK, and PK, as well as on teachers' skills such as diagnostic activities. Previous studies have rarely focused on the investigation and support of all knowledge facets equally, nor have skills such as diagnostic activities been investigated with regard to subject-specific instructional quality features. Moreover, in some studies, the conceptualization of investigated variables lacks precision or has not been clearly considered.
Assessment situations should include the explicit measurement of all knowledge facets and types or at least situate measures to specific facets and types for a more fine-grained differentiation and analysis of professional knowledge in order to examine knowledge development in relation to methodological approaches and training measures (cf. [10,75]). This is important for understanding the nature of the individual knowledge facets and skills, and for clarifying whether and what type of intervention is effective. It also provides important information for practical implementation.

Motivation of the Study and Research Questions
Teacher education at universities continues to rely to a great extent on knowledge acquisition through text-based instruction, although it is unclear to what extent this is effective. The question arises whether this kind of learning setting is best for pre-service teachers with little classroom experience. Therefore, our first goal was to investigate how self-directed knowledge acquisition via texts affects pre-service teachers' professional knowledge facets PCK, CK, and PK. Our second goal was to analyze if knowledge acquisition also affects the application of scientific reasoning skills within the context of diagnosing subject-specific instructional quality. Since it is assumed that the use of scientific reasoning skills, such as diagnostic activities, relies to some degree on an individual's knowledge (cf. [65]), there might not only be an effect of self-directed knowledge acquisition via texts on the professional knowledge facets but on diagnostic activities as well (cf. [15,48]). Thus, the present study addresses the following research questions (RQ):

• RQ1a: Is the self-directed knowledge acquisition via texts effective to foster pre-service biology teachers' PCK, CK, and PK?
• RQ1b: Are there different effects of the intervention on pre-service biology teachers' PCK, CK, and PK?
• RQ2: Is the self-directed knowledge acquisition via texts effective to foster the execution of diagnostic activities?

Design and Sample
The present study was embedded in pre-service biology teachers' university studies within a regular seminar held once a week. It spanned a total of three seminar sessions in autumn 2018. The seminar deals with basic theories and concepts for teaching biology and is attended by pre-service biology teachers at the beginning of their teacher education. Using the video-based tool DiKoBi Assess (German acronym for diagnostic competences of biology teachers in biology classrooms) was compulsory for all seminar attendees. However, consent to the use of data for analysis was voluntary. All participants signed informed consent documents stating an anonymous and voluntary participation.
The experimental design of the study contained a pre-test (day 1), a post-test (day 3) and featured five different treatments (intervention on day 2). Pre- and post-data were collected in two steps each (see Figure 1). First, pre-service biology teachers completed three paper-pencil tests to measure their PCK, CK, and PK, which were the same in pre- and post-test. Second, we used the video-based assessment tool DiKoBi Assess to measure pre-service teachers' diagnostic activities pre (DiKoBi I Assess) and post (DiKoBi II Assess). The assessment tool provides videotaped classroom situations that have to be diagnosed with regard to different subject-specific dimensions (see Section 2.3.2 Video-Based Assessment Tool DiKoBi Assess). The diagnostic tasks were the same for DiKoBi I and II; the two versions differed only in the content of the classroom situations shown. Pre- and post-measurements took 120 min each. The intervention lasted 90 min and consisted of five different treatments, in which information on either (1) PCK, (2) CK, (3) PK, (4) a combination of these three knowledge facets, or (5) none of these knowledge facets (control group) was acquired in a self-directed way. The intervention covered declarative and action-related knowledge relevant for teaching the topic "skin". The same topic was also addressed in the videotaped classroom situations of the assessment tool [76].
The sample consisted of 81 pre-service biology teachers (75.3% female; average study semester: M = 3.9, SD = 1.3; age in years: M = 23.6, SD = 3.9). Overall, 48.1% of the pre-service teachers attended the academic track of teacher education, qualifying them for future teaching at German secondary schools ("Gymnasium"); 51.9% attended programs for the non-academic track that prepares students for a vocational career (for an overview of the German school system see [77]).
Pre-service teachers were randomly assigned to five treatments (see Table 1). There was no statistically significant difference in age (F(4, 76) = 0.77, p = 0.55), study semester

Description of the Treatments
After reviewing the literature on generic and subject-specific features of instructional quality, and including theoretical aspects that were relevant for pre-service teachers' diagnoses of subject-specific challenges in the classroom situations presented in the assessment tool DiKoBi Assess, content-specific aspects regarding PCK, CK, and PK were identified and summarized in texts (for an overview of the content included in the texts, see Appendix A). The texts either contained information on only one knowledge facet (treatments 1-3, see Figure 1) or represented a combined form of all three knowledge facets (treatment 4). A control group (treatment 5) did not receive any information on the three knowledge facets. After a five-minute introduction to day 2 of the study, the participants of each treatment worked individually for 85 min on the associated texts. In treatments 1-4, participants were guided by identical tasks that were adapted with regard to the specific content of the texts (for an example, see Appendix B). The participants were asked to highlight important information in the texts (task 1), to complement an already outlined concept map on the basis of the highlighted information (task 2), and to apply this information by evaluating a statement or situation and providing alternatives (task 3). The tasks were constructed to cover both declarative and action-related knowledge, which were also part of the professional knowledge tests used [49]. However, the major share of the tasks related to declarative knowledge.

Professional Knowledge Tests
Three paper-pencil tests for measuring PCK, CK, and PK were utilized.
The tests included open-ended items (required a written response in a text field), single best answer (SBA) items (required the selection of one correct answer from a set of possible responses consisting of multiple distractors and one correct answer), and multiple true or false items (all of the possible responses had to be assessed for their validity) [78].
The PCK and CK tests covered declarative and action-related knowledge about the topic "skin" (in accordance with the topic covered in the assessment tool DiKoBi Assess and the intervention). For example, declarative knowledge (knowing that) for PCK was addressed by asking for advantages and disadvantages of a specific model that shows the structure of the human skin. Action-related knowledge in terms of knowing when and why was measured by asking for possible reasons why students develop specific misconceptions on a specific biology topic after the learning process (cf. [49]). Both the PCK and CK tests were adapted versions of the professional knowledge tests used in ProwiN [49,79]. The PCK test included eight open-ended items and five SBA items. We assumed that these items would elicit pre-service biology teachers' PCK, since responding required the pre-service teachers to draw on their individual specialized knowledge. The PCK test covered two important components of biology teachers' PCK: knowledge of instructional strategies (model use and use of experiments) and knowledge of students' errors [12]. The CK test included 13 open-ended items and 15 SBA items. Criteria for item scoring of both the PCK and CK tests were provided in two separate coding manuals. Precise descriptions of the scoring process can be found in Kramer et al. [9]. To ensure objective and reliable coding, ten percent of both the PCK and CK tests were coded by two independent raters utilizing the coding manuals. Two-way random intra-class correlations for absolute agreement (ICC_absolute) showed a high agreement between the two raters: PCK: ICC_absolute(310, 310) = 0.84, p < 0.001; CK: ICC_absolute(341, 341) = 0.97, p < 0.001 [80].
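The agreement statistic reported above can be illustrated with a minimal sketch. It assumes the single-measures form of the two-way random, absolute-agreement ICC (often written ICC(2,1)); the text does not state whether single or average measures were reported, the function name is hypothetical, and the ratings are made up for illustration.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measures ICC for absolute agreement (two-way random effects).

    ratings: list of rows, one row per coded answer, one column per rater.
    """
    n = len(ratings)       # number of coded answers
    k = len(ratings[0])    # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement penalizes systematic differences between raters
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores from two raters for three answers
identical = [[0, 0], [1, 1], [2, 2]]
shifted = [[0, 1], [1, 2], [2, 3]]   # rater 2 is always one point higher
icc_identical = icc_two_way_random_absolute(identical)
icc_shifted = icc_two_way_random_absolute(shifted)
```

In the second example the raters rank the answers identically, yet the coefficient drops below 1 because absolute agreement also counts the constant one-point offset as disagreement.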
For assessing PK, we used a short, adapted version of a paper-pencil test utilized in the BilWiss project covering the dimension instruction [40,81]. This dimension of the PK test referred to the basic dimensions of instructional quality containing declarative items about generic features such as classroom management, supportive climate, and general aspects of cognitive activation [11,24,28], as well as items on general pedagogical issues of teaching such as teaching methods. Since the differentiation between generic and subject-specific features of instructional quality was an important element of the video-based assessment tool DiKoBi Assess, the dimension instruction was best suited to our construct as it referred to knowledge about generic instructional quality features, whereas the PCK test covered knowledge about subject-specific features. For the PK measurement, participants had to answer five SBA items and ten multiple true/false items. Item scoring followed the instructions from BilWiss [40,81]. Precise descriptions of the scoring process can be found in Kramer et al. [9].
Afterward, each knowledge test was evaluated using the Rasch partial credit model (PCM), which resulted in PCK, CK, and PK Rasch person measures for each respondent for each test instrument [82,83]. For evaluating data fit, we utilized item Outfit-MNSQ (mean-square) values, item reliability and person reliability for each test. A productive measurement is shown by item Outfit-MNSQ values below 1.5 [84]. If item reliability is high, both the range of item difficulty and the sample size can be considered as appropriate to measure the variables precisely. The person reliability is a measure of internal consistency. Person reliability is impacted by the length of the test and the range of abilities of respondents [85]. Item fit statistics of the PCK, CK, and PK test showed good fit values (see Table 2). To compare data from the identical pre- and post-tests, we anchored items from the pre-test with appropriate items from the post-test. After analyzing pre- and post-test of each knowledge facet utilizing Differential Item Functioning [82], we included 10 anchor items for the PCK test, 23 anchor items for the CK test, and 11 anchor items for the PK test. Items that produced a measurement bias between pre- and post-test were excluded from anchoring.
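For readers unfamiliar with the partial credit model, the following sketch shows how category probabilities and an item's Outfit-MNSQ follow from person measures and item thresholds. It is an illustration under stated assumptions, not the Winsteps estimation procedure: the function names and all numeric values are hypothetical, and person and item parameters are taken as given rather than estimated.

```python
import math


def pcm_category_probabilities(theta, thresholds):
    """Probabilities of scoring 0..m on one item under the partial credit model.

    theta: person measure in logits.
    thresholds: step difficulties delta_1..delta_m of the item in logits.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def outfit_mnsq(observed, thetas, thresholds):
    """Unweighted mean of squared standardized residuals for one item."""
    squared_residuals = []
    for score, theta in zip(observed, thetas):
        probs = pcm_category_probabilities(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        squared_residuals.append((score - expected) ** 2 / variance)
    return sum(squared_residuals) / len(squared_residuals)


# Hypothetical item with three score categories (0, 1, 2)
probs = pcm_category_probabilities(theta=0.0, thresholds=[0.0, 0.0])
# Hypothetical dichotomous item answered by two persons of equal ability
fit = outfit_mnsq(observed=[1, 0], thetas=[0.0, 0.0], thresholds=[0.0])
```

An Outfit-MNSQ near 1 indicates that responses vary about as much as the model predicts; the criterion cited in the text treats values below 1.5 as productive for measurement.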

Video-Based Assessment Tool DiKoBi Assess
To measure the three diagnostic activities (DA) generating evidence, evaluating evidence, and drawing conclusions, which are applicable for diagnosing instructional quality [61], we used the video-based assessment tool DiKoBi Assess that is embedded in an online survey platform [86]. DiKoBi Assess contains short staged video clips showing challenging biology classroom situations on the topic "skin". DiKoBi Assess consists of six videotaped classroom situations, which together represent one whole biology lesson and each refer to a different subject-specific dimension of instructional quality that was found to be empirically effective for student achievement in science instruction [30,87]: (1) level of students' cognitive activities and creation of situational interest, (2) dealing with (specific) student ideas and errors, (3) use of technical language, (4) use of experiments, (5) use of models, (6) conceptual instruction. The evaluation of these six subject-specific dimensions and the identification of subject-specific instructional quality features are applicable to any biology lesson regardless of the specific content to be taught [76]. For this study, we used two versions of the assessment tool DiKoBi Assess, which differed in the specific sub-theme of the embedded videos on the topic "skin":

• DiKoBi I Assess (sub-theme: "skin as a sensory organ") to assess pre-service teachers' diagnostic activities before the intervention (diagnostic activities pre);
• DiKoBi II Assess (sub-theme: "protective function of the skin") to assess pre-service teachers' diagnostic activities after the intervention (diagnostic activities post).
What is essential, however, is the division into the six classroom situations, which address corresponding subject-specific dimensions of instructional quality and are the same for both versions.
For each of the six classroom situations, pre-service teachers had to identify challenging aspects of the shown situation of biology instruction and reason about them by describing the identified challenging aspects (DA = generating evidence), by explaining (including theoretical references) why there is room for instructional improvement (DA = evaluating evidence), and by proposing alternative teaching strategies (DA = drawing conclusions) (cf. [51,60,63]). Pre-service teachers' diagnostic activities were measured in an open-ended format with short-answer items [88]. For scoring, written answers were compared with predefined sample solutions of content-related coding variables that have been compiled with regard to the literature and research results on the subject-specific dimensions of instructional quality. The content-related coding variables referred to subject-specific instructional quality features that represent challenging aspects of biology instruction (see Table 3). Results of several qualitative validation steps showed that practicing in-service biology teachers (qualified for teaching Grade 5 to 12 in German secondary schools) with an average age of 40.4 years (SD = 9.2) and an average teaching experience of 9.4 years (SD = 6.9) perceived the staged classroom situations as authentic and that they could identify the challenging instructional aspects sufficiently. Moreover, it was shown that the created tasks can validly measure the assumed diagnostic activities. Further information on the validation process can be found in Kramer et al. [61,76].
Table 3. Overview of the content-related coding variables of the six classroom situations, shown for the three diagnostic activities.

Drawing Conclusion (12 Coding Variables)
(1) Level of students' cognitive activities and creation of situational interest

Depending on the quality of the executed diagnostic activity (observable in inaccurate or vague answers, or in more detailed, elaborated answers), 0, 1, or 2 points were used for scoring answers corresponding to the diagnostic activity generating evidence; 0, 1, 2, or 3 points for scoring answers corresponding to evaluating evidence; and 0, 1, or 2 points for answers corresponding to drawing conclusions (cf. [9]). For a high-quality scoring of answers corresponding to the diagnostic activity generating evidence, it was important that the provided answer contained a systematic description of an observed challenging instructional aspect. A high-scored answer of the diagnostic activity evaluating evidence contained references to scientific concepts that were used to justify the claim that was made. High quality in drawing conclusions became visible in specifically described alternative strategies that were derived based on the preceding steps of scientific reasoning.
In the following, the scoring is exemplified for the answers of the pre-service teacher Anne to the classroom situation (4) use of experiments. When using experiments, biology teachers should implement the steps of scientific inquiry to foster students' scientific thinking [3,6]. However, the video shows a teaching situation (experiment on cold protection) in which the teacher disregards individual steps of scientific inquiry, such as formulating a question or generating hypotheses. Instead, the students carry out a recipe-like work instruction without having to think much. With regard to the coding variable characteristics of scientific inquiry, pre-service teachers' answers should address one of the aforementioned aspects. After watching the video clip, Anne gave the following description (DA = generating evidence): "given instruction (recipe)". Her description is very brief and does not contain many observed details. The answer is, therefore, scored 1 point. Anne justified her observation as follows (DA = evaluating evidence): "Students hardly learn the way of scientific inquiry". Her explanation refers to the scientific concept of scientific inquiry and is, therefore, scored 2 points. Finally, Anne gave an alternative teaching strategy (DA = drawing conclusions): "Important: have students generate hypotheses, different opinions stimulate discussion, promote interest, generate excitement; experiment serves as a test of the hypothesis; discuss results; relate to practice, do fatter people then freeze less?" Anne's answer is comprehensive and refers to different aspects that are important with regard to scientific inquiry. Moreover, she proposes a transfer question which, in the sense of conceptual instruction, also represents a link to the students' everyday life. Consequently, her answer is scored 2 points. The same procedure was applied to all answers of the pre-service teachers regarding all coding variables of the three diagnostic activities.
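The scoring ranges and Anne's example can be summarized in a short sketch. The dictionary layout and the function name are illustrative assumptions and not part of the coding manual; only the point ranges and Anne's three scores are taken from the text.

```python
# Maximum points per diagnostic activity, as described in the text
MAX_POINTS = {
    "generating evidence": 2,
    "evaluating evidence": 3,
    "drawing conclusions": 2,
}


def check_and_sum(scores):
    """Validate the scores for one coding variable and return their sum."""
    for activity, points in scores.items():
        if activity not in MAX_POINTS:
            raise KeyError(f"unknown diagnostic activity: {activity}")
        if not 0 <= points <= MAX_POINTS[activity]:
            raise ValueError(f"{activity}: {points} is outside the scoring range")
    return sum(scores.values())


# Anne's scores for the coding variable "characteristics of scientific inquiry"
anne = {
    "generating evidence": 1,   # brief description with few observed details
    "evaluating evidence": 2,   # refers to the concept of scientific inquiry
    "drawing conclusions": 2,   # comprehensive alternative strategy
}
anne_total = check_and_sum(anne)
max_total = sum(MAX_POINTS.values())
```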
After data collection, the scores were used to calculate person measures that represent pre-service teachers' abilities to generate evidence, evaluate evidence, or to draw conclusions with regard to the subject-specific classroom situations in the video-based assessment tool. Person measures were calculated by utilizing Rasch PCM. This was done separately for pre- and post-test, since the corresponding versions of the assessment tool DiKoBi Assess contained different sub-themes that may have had an impact on the execution of diagnostic activities. Fit statistics showed good fit values for pre- and post-test including a one-dimensional construct of diagnostic activities (diagnostic activities pre/post: 31/31 items, all item Outfit-MNSQ < 1.32/1.38, person reliability = 0.77/0.73, item reliability = 0.90/0.85; note: for diagnostic activities pre, one item in DiKoBi I Assess produced inestimably high values). The one-dimensional construct was used due to weak reliability values when calculating person measures separately for the three diagnostic activities generating evidence, evaluating evidence, and drawing conclusions.
Our video-based approach follows Kersting et al. [15], who "use video clips of authentic classroom events as prompts to elicit teachers' analyses, which are in turn assumed to draw on teachers' knowledge" (p. 571). Therefore, we assume that the videotaped classroom situations elicit pre-service biology teachers' diagnostic activities for reasoning about the biology-specific challenges and features in both versions of the assessment tool, which are in turn assumed to rely on pre-service teachers' declarative and action-related PCK (cf. [9]).

Data Analysis
First, measures of all variables (pre/post: PCK, CK, PK, diagnostic activities) were separately analyzed using the Rasch PCM [83] with the software Winsteps 3.81 [85]. Second, Pearson's correlations and descriptive results were calculated utilizing IBM SPSS Statistics (version 26) and Microsoft Excel (2010) to describe the development and intercorrelation between all variables relevant for this study. The main analysis was done in two steps. (I) To answer RQ1a and RQ1b, we ran mixed ANOVAs for PCK, CK, and PK to analyze the main and interaction effects between time and treatments. (II) To answer RQ2, we chose an analysis of covariance (ANCOVA) to analyze the effects on diagnostic activities in the post-test while controlling for diagnostic activities in the pre-test. There was homogeneity of the error variances of all variables, as assessed by Levene's test (p > 0.05), as well as homogeneity of covariance, as assessed by Box's test (PCK: p = 0.53; CK: p = 0.57; PK: p = 0.53) and homogeneity of regression slopes (diagnostic activities: p = 0.20).

Results
A descriptive overview about means and standard deviations, as well as intercorrelations between all variables, is given in Table 4. The mean values represent the average person ability from the PCM of the corresponding variables. Intercorrelations emphasized the importance of PCK since there was a low to moderate correlation between PCK and most of the pre or post-measured variables [89]. For example, PCK pre was significantly correlated with PCK post (r = 0.57, p < 0.001), CK post (r = 0.30, p = 0.006), PK pre (r = 0.22, p = 0.048), and diagnostic activities pre (r = 0.33, p = 0.002). Furthermore, the descriptive results showed that pre-service teachers' PCK, CK, and PK increased from pre to post (see Table 4). Note that the average person measures of diagnostic activities pre and post are not directly comparable because they have not been anchored, as we used two separate measurement instruments in pre and post-test (pre: DiKoBi I Assess and post: DiKoBi II Assess). However, a descriptive comparison between diagnostic activities pre and diagnostic activities post can be made based on the quality of the diagnostic activities generating evidence, evaluating evidence, and drawing conclusions. Table 5 shows how often each diagnostic activity was scored with 0, 1, 2, or 3 points. It is noteworthy that the total scores of generating evidence and drawing conclusions decreased from pre to post, while the total score of evaluating evidence increased. Furthermore, the dispersion increased from pre to post for the diagnostic activities generating and evaluating evidence. More often 0 points (low quality of diagnostic activity) but also 2 or 3 points (improved quality of diagnostic activity) were used for scoring. In contrast, fewer answers were scored with 1 point in DiKoBi II Assess. For the diagnostic activity drawing conclusions, the frequency of 0 points increased whereas the frequency of 1 and 2 points decreased, indicating an overall decrease in quality.
Table 5. Absolute frequency of points that were used for scoring the quality of the three diagnostic activities. N refers to the total number of activities scored. This number is calculated from the number of participants (81) and the number of content-related coding variables for the respective diagnostic activity and is therefore the same for pre and post. Note: The maximum score for generating evidence and drawing conclusions was 2 points, for evaluating evidence it was 3 points.

Next, we ran mixed ANOVAs for PCK, CK, and PK to analyze time effect, treatment effect, and interaction effect between time and treatment (RQ1a, RQ1b). Main effects of time were found for both PCK and CK. They confirmed the positive descriptive trend of knowledge acquisition since a statistically significant increase in mean person abilities of PCK and CK was measured from pre to post. However, there was no statistically significant increase from pre- to post-measurement for PK. Furthermore, there was no statistically significant main effect of treatment or interaction between time and treatment for any knowledge facet when each treatment was considered individually (see Table 6). However, since the maximum number of participants per treatment did not exceed 18, we merged treatments that included CK in a second step. This step was taken because the p-value of 0.085 of the interaction effect between time and treatment for the CK group was considerably lower than for PCK or PK. This p-value might indicate a possible underlying effect of the self-directed knowledge acquisition via texts in terms of CK acquisition that might not have been detectable due to the small number of participants per treatment. Since it was not possible to increase the number of participants in the overall sample for the time of the study, a fallback solution was applied: treatments containing CK and treatments not containing CK were merged (group 1: CK treatment and combination treatment; group 2: PCK treatment, PK treatment, control group). Since our PCK treatment did not contain CK content, we assigned the PCK treatment to the group not containing CK (group 2). Accordingly, the PCK test was constructed in such a way that PCK could be measured as independently as possible from CK (cf. [42,79]).
Effects of treatments on diagnostic activities were examined using an ANCOVA (RQ2). Results showed that the covariate diagnostic activities pre was significantly related to diagnostic activities post (F(1,75) = 11.77, p = 0.001, partial η² = 0.14). There was no significant effect of treatment on diagnostic activities after controlling for the effects of the covariate (F(4,75) = 0.61, p = 0.656, partial η² = 0.03), meaning that person abilities after the intervention did not differ significantly between the treatments.
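As a plausibility check, partial η² can be recovered from an F value and its degrees of freedom via partial η² = F · df1 / (F · df1 + df2). The sketch below assumes 1 numerator degree of freedom for the covariate, 4 for the five-group treatment factor, and 75 error degrees of freedom; the function name is illustrative and not part of the original analysis, which was run in SPSS.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Covariate: diagnostic activities pre (assumed df = 1, 75)
eta_covariate = partial_eta_squared(11.77, 1, 75)
# Treatment factor with five groups (assumed df = 4, 75)
eta_treatment = partial_eta_squared(0.61, 4, 75)
```

Under these assumptions, the values round to the reported effect sizes of 0.14 and 0.03.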

Discussion
This study aimed to investigate one particular way of knowledge acquisition and its effects on pre-service biology teachers' cognitive dispositions and skills. Thus, we investigated effects of self-directed knowledge acquisition via texts on the pre-service teachers' professional knowledge facets PCK, CK, and PK, as well as on their diagnostic activities as conceptualization of scientific reasoning skills in diagnostic settings. By using a video-based assessment tool to measure diagnostic activities, we contributed to situated measures of pre-service teachers' knowledge and skills.
In summary, neither PCK, CK, PK nor diagnostic activities were significantly affected by any of the study treatments. Accordingly, the research questions can be answered as follows: Self-directed knowledge acquisition via texts did not increase pre-service biology teachers' knowledge facets PCK, CK, or PK (RQ1a). Thus, no differences in the effectiveness of the intervention with respect to the three knowledge facets could be found (RQ1b). Additionally, there were no significant effects of the text-based intervention on the execution of the diagnostic activities (RQ2). In detail, we can state: The knowledge facets PCK and CK increased significantly from pre to post; however, this increase could not be explained by the self-directed knowledge acquisition. This finding is in line with other research showing, for example, that PK could not effectively be fostered through self-directed knowledge acquisition compared to direct instruction [70]. Therefore, the methodological approach to knowledge acquisition via texts as it was implemented in the present study has not proven to be effective. However, that does not mean that self-directed knowledge acquisition via texts, which represents a common learning practice at German universities, is generally ineffective. Effectiveness may depend on the specific actions that are initiated by the instructional approaches. For example, Kyriakides et al. [90] found that both direct instruction and self-directed constructivist approaches can benefit student outcomes, depending on what exactly the teacher and the students do during instruction. Therefore, even though we made an effort to increase pre-service teachers' engagement with the learning material, the utilized tasks may not have been activating enough or appropriate to promote in-depth learning. Moreover, possible small effects of the self-directed knowledge acquisition via texts might have been overlaid by other effects, possibly resulting from the video-based work.
However, descriptive comparisons regarding the quality of the diagnostic activities pre and post showed an increase in terms of quality that was slightly noticeable for the diagnostic activity generating evidence and particularly noteworthy for evaluating evidence. Whereas pre-service teachers' evidence evaluations in the assessment tool DiKoBi I Assess were often superficial and vague, their quality slightly increased in DiKoBi II Assess in terms of more frequent concept references and explicit linking of observations and theoretical references. This primarily indicates a potential impact of the self-directed knowledge acquisition on the diagnostic activity evaluating evidence. Similar findings regarding the relationship between knowledge and interpretive processes have also been described in other studies (cf. [91]). The findings on the decrease in the quality of the diagnostic activity drawing conclusions from pre to post might indicate that other approaches of instructional support are necessary for the promotion of this activity, which cannot be provided via text-based instruction (that included a large amount of declarative knowledge). In addition, affective-motivational aspects have to be taken into account, because, in order to set up an alternative strategy, longer answers were required, for which the study participants might not have been motivated enough (cf. [92]). However, further research is needed to make more reliable conclusions and differentiate pathways of knowledge and skill development.
Hence, two other important questions remain whose answers contribute to the debate on measuring knowledge and skills of science teachers: First, how can the increase in knowledge in our study be explained, if not by the treatments? Second, which ways of knowledge acquisition might be more effective for university teaching? Regarding the first question, we want to refer to "the use of classroom video as a tool for bringing the central activities of teaching into the PD (professional development) setting" [93] (p. 1099). The greatest effects reported in this article are time effects of PCK-measures and, therefore, were independent of treatment. Researchers within science education already underlined that the use and prompted analysis of classroom performances challenges pre-service teachers' thinking and thus can activate their knowledge and make it accessible [15,45,74,94]. Our hypothesis is that the increase in PCK is due to the work with the video-based assessment tool DiKoBi Assess. The observation and diagnosis of biology-specific classroom situations may have elicited existing subject-specific knowledge. Working on the tasks in the assessment tool DiKoBi Assess and engaging in scientific evidence-based reasoning on biology instruction required pre-service teachers to apply diagnostic activities. This application of diagnostic activities to a specific situation of biology instruction may have contributed to the promotion of PCK. By observing and describing challenging aspects of biology instruction (i.e., generating evidence that helps encoding, cf. [2]), pre-service teachers directed their focus to very specific aspects. For the evaluation of this evidence (i.e., the observed and described challenging aspects) a linkage with broader principles they represent and thus the elicitation of professional knowledge had to take place [63,73]. 
The scientific reasoning skill evaluating evidence, in particular, is considered important in order to interpret classroom interactions and inform appropriate follow-up decisions [95]. Thus, implementing opportunities in which diagnostic activities can be applied or are even fostered may in turn have an impact on pre-service teachers' PCK. This assumption can be seen as an indication of the bidirectional relationship between knowledge and skills [96]. Consequently, the results of the study suggest that scientific reasoning about subject-specific instructional quality can potentially promote PCK and that the application of skills, such as diagnostic activities, to video-based settings thus seems to be more suitable for knowledge development than instructional support via texts. Still, a well-planned use of videos in specific teaching and learning situations is required to ensure the effectiveness of such learning opportunities (e.g., [97,98]).
Building on these thoughts, we want to emphasize the relevance of using and reflecting on practical examples for the development of pre-service teachers' professional knowledge (cf. [94]). Although a profound declarative knowledge base is considered important (especially in terms of CK) and may still be provided via specialized texts, PCK and PK are much more action-oriented and, therefore, require other forms of knowledge acquisition (cf. [99]). Our suggestion based on the present findings is to provide learning opportunities in which pre-service teachers engage in scientific reasoning about instructional quality, for example, via video-based tools. Other researchers have already made similar suggestions. König et al. [91], for example, used video vignettes to underline the importance of practical insights into teaching to improve teachers' general pedagogical knowledge. By setting the focus on practical scenarios, they also addressed reasoning skills as necessary components for the acquisition and transformation of knowledge. The use of video-based tools can be considered an appropriate approach to provide opportunities for the assessment and development of PCK and PK in teacher education that count as an important part of diagnostic competences [8,29,99]. In addition, such tools could be adapted to promote other facets of knowledge. For example, videos that focus on the use of digital media in the science classroom could be utilized to promote pre-service teachers' Technological Pedagogical Content Knowledge (TPACK) that represents the knowledge necessary to effectively use technology in the classroom [100]. Moreover, a further development of digital learning environments is the use of emerging technologies such as virtual or augmented reality, which can be used to examine specific competencies of prospective teachers systematically and realistically through virtual training scenarios (cf. [101]).
Besides the use of video-based or other digital tools, other ways of supporting pre-service teachers' knowledge development have been discussed with regard to direct instruction. Effective approaches to knowledge acquisition often included an experienced lecturer in addition to text-based work in science teacher education. In contrast to static texts, a lecturer can "explicitly (address) the knowledge of students, learning and teaching in concrete content domains" [71] (p. 126), including practical examples as well, which has proven to be an effective method. Barth et al. [70] showed a positive effect of a systematic introduction to the relevant knowledge base on both the development of declarative professional knowledge and knowledge-based reasoning skills. Small effects of direct instruction on PCK were also reported by Tröbst et al. [67]. With regard to the development of PCK, the researchers found the combined instruction of the knowledge facets within professional development programs, which considered transformation processes of CK and PK during PCK construction, to be more effective than polyvalent traditional teacher education [45,67,71]. This combined view on the acquisition of knowledge attributes a high relevance to educational training in higher education. In such trainings, practical classroom scenarios can also be videotaped and used for eliciting reasoning processes about these classroom scenarios, for which positive effects in terms of professional development could be recorded [45,93]. It is, therefore, important for pre-service teachers to take advantage of corresponding offers of universities.

Limitations
First of all, it should be noted that the overall sample size of the reported study was rather small, resulting in even smaller samples for each of the five treatments. This reduces the statistical power of our calculations. However, since the study was embedded in regular courses at a German university, the number of participants was limited and could not easily be increased at the time of the study. In addition, extensive effects of the treatments could hardly be expected, since the intervention was embedded in a seminar held once a week and lasted only 85 min. This time might not have been sufficient to produce sustainable, measurable effects. Other approaches involved considerably longer interventions, with up to 250 h of professional development experiences over a two-year program [45]. At the same time, such programs comprise a variety of support activities that foster teachers' knowledge and skills at different levels. The advantage of the approach chosen in the present study is that, by focusing on one specific activity, it is easier to investigate and monitor its effectiveness. Nevertheless, a longer intervention period might have been beneficial.
Further limitations concern the pre-service teachers' test performances. The increase in professional knowledge in terms of PCK and CK from pre to post might be a result of the pre- and post-test research design, since other experiences made during the study period or other forms of input from everyday life or other seminars may have contributed to teachers' learning. Although descriptive PK measures improved from pre to post as well, the time effect in the mixed ANOVA was not statistically significant. One reason might be the subject-specific focus of the videos, which mainly addressed subject-specific dimensions of biology instruction. In contrast, studies using videos that focus on general pedagogical aspects showed corresponding effects on PK [26,94]. Another reason for the lack of significant improvement in PK may lie in the PK test used.
Test results for the pre-test were already comparatively high. Thus, the PK test did not discriminate sufficiently among the participants in the sample. Test reliabilities might be increased by utilizing a longer version of the test, or by a higher number of participants to increase variance in ability [85]. In contrast to the increase in test scores on the three knowledge facets, participants' total test scores indicating the quality of the diagnostic activities generating evidence and drawing conclusions decreased in the post-test. This could indicate that the videos used in the two versions of the video-based assessment tool DiKoBi Assess varied in difficulty. To investigate how strongly participants' situation-specific performance depends on the video version used, further analyses must be conducted that address the situation-specific difficulty of the video situations and thus the situation-specificity of the participants' performances (cf. [26,102]).
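The expectation that a longer test version would raise reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric relation that was not part of the analyses reported in this study. The following minimal sketch assumes that added items are parallel to the existing ones; the function names and the example values (a reliability of 0.60 and a target of 0.80) are purely hypothetical and are not taken from the PK test.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length by `length_factor`.

    Assumes the added items are parallel to the existing ones
    (Spearman-Brown prophecy formula).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def required_length_factor(reliability: float, target: float) -> float:
    """Return the length factor needed to reach `target` reliability."""
    if not 0.0 < reliability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return (target * (1 - reliability)) / (reliability * (1 - target))


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results of this study.
    current = 0.60
    print(round(spearman_brown(current, 2.0), 2))
    print(round(required_length_factor(current, 0.80), 2))
```

Under these assumptions, doubling the test length raises the predicted reliability, and the second function shows how much longer the test would have to be to reach a chosen target. The formula says nothing about the restricted variance caused by high pre-test scores, which would have to be addressed separately.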
Additionally, the fact that motivational variables were not considered in the main analyses might be seen as another potentially limiting factor. For a productive measurement, it is important that participants' desire and willingness to solve the tasks are considered and, if necessary, scaffolded [92]. However, a supplementary analysis considering teachers' situational interest showed that controlling for situational interest did not change the results. Still, an impact on diagnostic activities, particularly on drawing conclusions, may be possible.
A final limitation is conceptual in nature. In our study, we measured content-related knowledge facets (CK, PCK, PK). Theoretically, the knowledge types knowing that, knowing how, and knowing when and why can be distinguished for each of them [10]. However, the PK test used contained only items covering knowing that. Additionally, the texts used for the intervention mainly covered declarative knowledge. Even though the tasks to be completed in the treatments were intended to prompt action-related knowledge in terms of knowing how and knowing when and why, their proportion may not have been high enough to significantly influence pre-service teachers' execution of diagnostic activities. In addition, analyzing the videos and applying action-related knowledge to them must be considered highly complex, which can lead to a high cognitive load, especially for learners with little prior knowledge [103]. The participants involved in this study were still at the beginning of their university education. To reduce cognitive load, short video clips were used. Furthermore, the video analysis process was pre-structured by prompting the application of the three diagnostic activities through three individual tasks (describing, explaining, proposing alternative strategies). Moreover, the videos focused on only one dimension of subject-specific instructional quality. Nevertheless, the participants' working memory capacity may not have been sufficient to process the different information and to alternately access the corresponding knowledge types [104]. Therefore, explicitly addressing knowing how and knowing when and why is of high importance for future intervention approaches. However, this may be better accomplished through types of instructional support other than texts, e.g., through experienced lecturers.

Conclusions and Further Research
Finally, we want to derive implications for practice and further research. The present study provides further evidence that using video-based tools is beneficial in teacher education since these tools extend the number of practical approaches, which are provided, for example, via videos, classroom observations, or field experiences. Since the use of the video-based tool as an assessment instrument already had positive effects on pre-service teachers' professional knowledge in our study, possibly elicited by reasoning about instructional quality, it is reasonable to use the tool as a learning environment to promote pre-service biology teachers' PCK and their application of diagnostic activities even more effectively. Therefore, scientific reasoning skills should not only be investigated in terms of their relation to content knowledge of a specific discipline (cf. [4]), but also with respect to teachers' PCK. In this context, it might also be useful to use the individual activities generating evidence, evaluating evidence, and drawing conclusions separately for analyses instead of the one-dimensional diagnostic activities construct.
Future research could also investigate additional support in terms of scaffolding within video-based tools or simulations that might further promote the development of knowledge and diagnostic activities in order to facilitate teachers' diagnostic competences [8]. Moreover, scaffolding the different types of knowledge relevant for scientific reasoning (that is, procedural and epistemic knowledge for problem solving, cf. [4]) may improve scientific reasoning skills such as diagnostic activities and, thus, the development of content-related knowledge as well.
Following the demand for integrated coursework or a combined instruction of the knowledge facets, knowledge acquisition could also be addressed through direct instruction, as it is done, for example, in lectures at universities [70,71]. Therefore, further research could investigate the possible effects of an integrated presentation of the knowledge facets in lectures.

Data Availability Statement:
Information and queries on the data used can be obtained from the authors of this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A

Table A1. Overview of the content that was covered in the texts. (Table body only partly recoverable. Surviving entries: sources [25,105,106,111,113–115,117,119–123,125]; treatment 5: no information (control group), reflection on the organization of the university teacher education and ideas for improvement.)

Appendix B

Figure A1. Example of the given tasks utilized in treatment (1) PCK (translated from German).

1 The materials used, including figures and tables, can be made available on request by the authors of this study.

Figure A2. Outlined concept map, which is to be completed in Task 2 (translated from German).