Investigating Pre-Service Biology Teachers’ Diagnostic Competences: Relationships between Professional Knowledge, Diagnostic Activities, and Diagnostic Accuracy

Teachers’ diagnostic competences are essential with respect to student achievement, classroom assessment, and instructional quality. Important components of diagnostic competences are teachers’ professional knowledge including content knowledge (CK), pedagogical knowledge (PK), and pedagogical content knowledge (PCK), their diagnostic activities as a specification of situation-specific skills, and diagnostic accuracy. Accuracy is determined by comparing a teacher’s observation of classroom incidents with subject-specific challenges to be identified from scripted instructional situations. To approximate real-life diagnostic situations, the assessment of science teachers’ diagnostic competences requires a situated context, which was provided in this study through videotaped classroom situations. We investigated the relationship between professional knowledge (PCK, CK, PK) of 186 pre-service biology teachers, their diagnostic activities, and diagnostic accuracy measured with the video-based assessment tool DiKoBi Assess. Results of path analyses utilizing Rasch measures showed that both PCK and PK were statistically significantly related to pre-service teachers’ diagnostic activities. Additionally, biology teachers’ PCK was positively related to diagnostic accuracy. Given the larger effect sizes for PCK than for PK, the results corroborate previous findings on the importance of PCK and extend them to the context of subject-specific diagnosis.


Introduction
Teachers' abilities to identify and interpret relevant situations and events that influence student learning, and thus to make efficient decisions during classroom instruction, have been described as an important part of teachers' professional competence (e.g., [1,2]). Similar to the processes that a medical professional performs (identifying and interpreting a patient's symptoms to decide how best to treat the patient), the ability to assess classroom situations and events (e.g., how best to implement an experiment in science instruction, or how to deal with student misconceptions) in order to adapt instruction can be considered in the context of diagnostic competences [3,4]. Therefore, the consideration of diagnostic competences is a critical element in teacher education, one that is important with respect to student achievement, classroom assessment, and instructional quality [5][6][7]. Its importance is also stated in German standards of teacher education [8]. At this point, it is important to remark that in German research, the term diagnostic competences is used largely synonymously with assessment competences [4,9]. For reasons of consistency, we primarily use the term diagnostic competences instead of assessment competences throughout this article. The plural is used to indicate that there is no global construct of diagnostic competence but that its conceptualization depends on the study's specific focus [10,11].
Nonetheless, courses and practicums that promote diagnostic competences vary between universities [12,13]. Likewise, approaches to support diagnostic competences vary based on different diagnostic contexts (e.g., assessing learning outcomes, diagnosis of instructional tasks, monitoring the teaching and learning process) [14]. Additionally, results on the efficiency of particular programs are scarce [12]. In order to effectively foster diagnostic competences and adapt university programs, we need to understand the components that constitute diagnostic competences and how these components are interrelated.

Diagnostic Competences as a Specification of Professional Competence
Until a few years ago, diagnostic competences were often studied in terms of the accuracy of teachers' judgment, with diagnostic accuracy referring to the difference between a teacher's judgment or critical observation and more objective assessments of performance [15,16]. However, supporters of a broader understanding of competence argued that diagnostic competences cannot be limited solely to measures of accuracy (e.g., [15,17]). According to Schrader [14], a broader understanding of diagnostic competence covers the entire process of diagnosing, including the ability to use appropriate strategies and methods of data collection and processing, as well as to interpret the obtained data properly. This understanding of the diagnostic process is also necessary to effectively help prospective teachers develop diagnostic competence [15]. Therefore, judgment accuracy can only be regarded as one component of diagnostic competences within the broader understanding of competence that takes situated contexts into account.
The refinement of additional components of diagnostic competences can be guided by research on the expert paradigm [18]. The shift within teacher professionalism research from a merely cognitive perspective to a situated perspective on professional competence is considered decisive in order to examine teacher competences as closely as possible to the demands of the real world [9,15,19,20,21]. In an attempt to integrate both perspectives, Blömeke et al. [1] modeled professional competence on a continuum including personal dispositions such as cognition or affect-motivation that underlie situation-specific skills, which in turn mediate between teachers' dispositions and their performance. A particular strength is the consideration of those situation-specific skills that teachers require to succeed in specific situations such as diagnosing. This approach offers the advantage of considering diagnostic competences more broadly, rather than limiting investigations to diagnostic abilities [5] or operationalizing diagnostic competences as accuracy of teachers' judgments only [15,16]. In accordance with the competence as a continuum model, diagnostic competences can thus be understood as a latent trait including "those dispositions, situation-specific skills and performance that teachers need for diagnosis in the context of teaching and learning" [6] (p. 43).
Following this understanding, several components must be considered when examining diagnostic competences bringing together components studied in research on teachers' professional competence [1] and teachers' judgment accuracy [15,16]. First approaches to define diagnostic competences with regard to the competence as a continuum model mainly referred to teachers' professional knowledge and diagnostic skills such as diagnostic activities [4,6]. In that vein, Heitzmann et al. [4] defined diagnostic competences as "individual dispositions enabling people to apply their knowledge in diagnostic activities according to professional standards to collect and interpret data in order to make decisions of high quality." Therefore, three components of diagnostic competences are highlighted: (1) teachers' cognitive disposition and, in particular, their professional knowledge, (2) the application of diagnostic activities as a specification of situation-specific skills in the context of diagnosing, and (3) the need for a measure of diagnostic accuracy to check the agreement with professional standards (see Figure 1).
In the following sections, we describe the three components of diagnostic competences in more detail before referring to empirical findings concerning the relationship between these components.

Professional Knowledge
According to Heitzmann et al. [4], knowledge counts as a crucial prerequisite that enables teachers to execute diagnostic activities effectively. Teachers need to apply their knowledge in different diagnostic situations and recall what they know about effective teaching, diagnosing students' (mis)conceptions, and how to support students' learning progress [5,22]. Different facets describe teachers' professional knowledge based on Shulman's division into PK, CK, and PCK [23,24]. The three facets cover knowledge that teachers need for effective teaching. This comprises general pedagogical-psychological knowledge (PK), which is knowledge about classroom management and generic strategies and methods of teaching, learning, and assessment [24][25][26]; content knowledge (CK), which is knowledge about subject-specific facts, concepts, and methods [27]; and pedagogical content knowledge (PCK), which is knowledge about how to make a particular content accessible for a particular group of students, taking into account content-dependent (mis)conceptions of students and instructional strategies of the subject [20,23,28,29]. PK is assumed to be content-independent [30], and therefore, PK seems more relevant for diagnosing general characteristics such as classroom management (cf. [6]). Both CK and PCK are mainly considered subject- and content-specific, and thus, they are applied in subject-specific instructional situations [31,32]. As specific instructional quality features characterize effective science teaching, science teachers' subject-specific PCK counts as the knowledge facet with a high influence on instructional quality and student achievement [33][34][35]. PCK can therefore be considered a pivotal knowledge facet of teachers' diagnostic competence when the diagnostic focus is on subject-specific instructional aspects.
However, regarding the relationship between the three knowledge facets, researchers assume CK to be necessary but not sufficient for the development of PCK, while PK counts as an important precondition to applying CK and PCK in subject-specific instruction [30,31,36]. In order to measure teachers' knowledge facets in a standardized way, different methods such as paper-pencil assessments with multiple-choice or open-ended items, semi-structured interviews, or concept mapping have been used [29,37,38,39].

Diagnostic Activities
When teachers assess specific instructional situations, they engage in situation-specific diagnostic processes. These processes require the execution of situation-specific skills that have been termed assessment skills [40], professional vision [41,42] or noticing [43] in the context of classroom assessments, and diagnostic skills [5,6], or diagnostic activities [4,17,44] considering the specific context of diagnosis. Diagnostic activities can be described as those activities teachers execute to evaluate data on, for example, learning conditions and prerequisites of learners in order to optimize the overall instructional process [4,44]. With regard to a specific diagnostic situation, different diagnostic activities may be relevant. Diagnostic activities may also vary regarding the weight attributed to each activity and the way these activities are performed [4,45]. This variability of possible activities makes adaptation to specific diagnostic contexts possible. Overall, eight diagnostic activities have been differentiated following scientific reasoning processes: problem identification, questioning, generating hypothesis, constructing artefacts, generating evidence, evaluating evidence, drawing conclusions, and communicating the process/results [4,45]. Descriptions of the eight diagnostic activities can be found in Table 1.

Table 1. Taxonomy of the diagnostic activities according to Heitzmann et al. [4] and Fischer et al. [45]. Note that not each diagnostic activity is appropriate for a given situation, and thus, the number and type of the executed diagnostic activities may vary.

Diagnostic activity: Description
Identifying problems: A noteworthy event that may influence student learning is noticed by the teacher.
Questioning: The teacher asks questions to find out more about the identified problematic incident or its cause.
Generating hypothesis: The teacher generates a hypothesis about possible sources of the identified problem.
Constructing or redesigning artefacts: The teacher creates content-specific tasks suitable for identifying underlying instructional problems or detecting students' misconceptions.
Generating evidence: Evidence is generated either by the use of a constructed test or a created task or through systematic observation and description of the problematic incident.
Evaluating evidence: The teacher assesses the generated evidence regarding its support for a claim or theory. He/she interprets the data, thus making sense of the generated evidence with regard to his/her belief, knowledge, and expertise (cf. [46]).
Drawing conclusions: As a result of evaluating evidence, the teacher predicts consequences regarding student learning or makes suggestions for alternative instructional strategies.
Communicating the process/results: The teacher communicates and scrutinizes diagnostic results with colleagues, students, or parents.

A more analytical operationalization of situation-specific skills for assessing classroom situations is described in the concepts of "professional vision" and "teacher noticing" (e.g., [41,43]) (see also Section 3.2.2). Overall, situated approaches for the measurement of situation-specific skills such as diagnostic activities that science teachers apply in practices close to the assessment of classroom instruction include, for example, classroom observations, reflection on lesson plans, responding to students' ideas, or video-based analyses [47][48][49]. All these approaches are based on evidence collected in the specific context in which the skills are applied.

Diagnostic Accuracy
Diagnostic competences are also reflected by the quality of the diagnosis, which can be operationalized by means of accuracy measures [4,17]. Accuracy has often been investigated in terms of judgment accuracy, which has been assessed at different levels, for example, at the student-related level by focusing on teachers' judgments about student achievement [15], or at the classroom level by assessing instructional features such as task demands [5]. At the student level, researchers used as measures the correlation between teachers' judgments of student characteristics and students' outcomes in a standardized test, or the accuracy of the rank order of student performance according to competence levels [17,40,50]. At the classroom level, judgments can be compared to specific standards of the domain (e.g., features of instructional quality) in order to consider the quality of the information basis and to obtain a measure of accuracy (cf. [15,51]). Other measures of accuracy, however, referred to the ability to apply situation-specific skills accurately. For example, researchers focused on perception accuracy, which describes "the precise observation of a professional situation" [2] (p. 373). Carter et al. [52] investigated perception accuracy in terms of the immediate perception of science classroom environments, using visual classroom stimuli presented rapidly on slides. By limiting the time available for observing the situations, differences in perception accuracy between experts and novices, and thus in the participants' perceptual skill, were revealed. Therefore, accuracy measures should be considered as one component of diagnostic competences as well.
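The two student-level accuracy measures mentioned above (correlation between judgments and test outcomes, and accuracy of the rank order) can be illustrated with a minimal sketch. The function names and the example data are hypothetical and serve illustration only; the cited studies may have computed their measures differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_order_accuracy(judged, observed):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(judged), average_ranks(observed))


# Hypothetical data: one teacher's judged rank of five students and
# the same students' scores in a standardized test.
teacher_judgment = [3, 1, 4, 2, 5]
test_scores = [30, 10, 45, 25, 40]

print(round(rank_order_accuracy(teacher_judgment, test_scores), 2))
```

A value close to 1 would indicate that the teacher orders the students almost as the test does. A rank-based measure ignores the absolute level of the judgments, so it does not show whether a teacher systematically over- or underestimates performance.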

Empirical Evidence on the Relationships between the Components of Diagnostic Competences
Depending on the diagnostic focus (rather generic: e.g., classroom disruption [53]; or rather subject-specific: e.g., diagnosing biology instruction [54]), either PK or subject-specific facets such as CK or PCK may be relevant to the application of diagnostic activities and diagnostic accuracy (cf. [55]). Tolsdorf and Markic [12] studied different knowledge types (conditional, technological, knowledge of change, competence knowledge) relevant in the context of diagnosing in chemistry. However, they only studied to what extent pre-service teachers at different stages of their education differed in, for example, beliefs, attitudes, and knowledge about the importance of diagnosis in science or knowledge about how to change learning material according to the needs of the learners. Among other results, they found the most positive attitudes toward diagnosis and the clearest ideas for changing learning materials among more experienced pre-service teachers, indicating the role of university teacher education. However, they did not explicitly investigate other components of diagnostic competence such as diagnostic activities. Furthermore, Tolsdorf and Markic [12] assumed the practical application of what the students had previously studied, along with their experiences, to be crucial for the change of diagnostic competence (defined here rather cognitively) in the course of pre-service teachers' studies. This points to the need to investigate diagnostic competences in situated approaches [21].
Situation-specificity of diagnostic competences was emphasized by Hoth et al. [6]. To investigate teachers' situation-based diagnostic competences, they investigated mathematics teachers' situation-specific skills in order to identify different perspectives of situation-based diagnoses. Furthermore, they examined interrelations between these diagnostic perspectives and teachers' knowledge. Results showed that a content-related perspective in the given classroom situation was related to high average mathematical CK. In addition, teachers using both a didactical and a content-related mathematical perspective had the highest average mathematics PCK scores. Furthermore, there were instances with a higher general PK score associated with a more pedagogical focus on classroom situations, with teachers focusing on aspects of classroom management, organizational aspects, and other pedagogical aspects. Overall, their qualitative analyses suggested that "teachers with greater knowledge not only interpret classroom events more adequately but also knowingly focus their attention to the relevant aspects" [6] (p. 52). Similar results have been found by König et al. [25], who showed that interpreting general pedagogical classroom situations correlated with general pedagogical knowledge. Furthermore, higher values of mathematics teachers' PCK were positively connected with noticing relevant teaching and learning incidents, with "relevant" referring to the accurate diagnosing of aspects that matter in terms of students' learning in math [21]. Moreover, teachers with below-average PCK focused on rather superficial characteristics that were irrelevant to the diagnostic problem and thus not accurate. In addition, Blömeke et al. [56] emphasized that pre-service teachers who were prepared to teach both lower- and upper-secondary school (mathematics) had stronger prerequisites and more mathematics-related learning opportunities that resulted in a stronger cognitive base. Furthermore, their cognitive base (including CK and PCK) was better connected to situation-specific skills. Comparable results on the relationship between knowledge and skills can also be found in research on professional vision (e.g., [38,57]).
Within research focusing on skills relevant during the diagnostic process, Wildgans-Lang et al. [17] showed that the quality of executed diagnostic activities was more important for diagnostic accuracy, and thus, for the quality of the overall diagnosis than the frequency of the diagnostic activities. Furthermore, they differentiated between diagnosing competence levels and diagnosing students' misconceptions and assessed the diagnostic accuracy with regard to both aspects. Results indicated the accurate diagnosis of misconceptions to be more challenging than the accurate diagnosis of competence levels. However, they did not analyze or link knowledge facets in their study but exposed the relation between diagnostic activities and professional knowledge as an important issue for further research.
In educational research, the correlates of diagnostic accuracy such as personal cognitive traits or professional knowledge facets are still not well enough studied [14,58,59]. Accuracy has mostly been considered in terms of comparisons of pre-service teachers' answers/observations with expert ratings/answers that served as a measure of correctness (e.g., [40,55,58]) but without explicitly investigating relationships to other components. In clinical research, content-specific knowledge counts as the basis for diagnosing clinical cases accurately, and accuracy is assumed to depend on skills relevant for correct interpretations [60,61]. Transferred to teacher education, this may imply that subject-specific knowledge facets (CK, PCK) are more relevant for diagnosing subject-specific cases accurately. Accuracy may rely on skills such as the elaborate execution of diagnostic activities that are assumed to be quite poor in pre-service teachers [52,62]. However, so far, only a weak correlation could be found for the relationship between the accurate rank order of tasks and student performance, and teachers' PCK (on text-image integration in biology, geography, and German) [50]. Studies focusing on teachers' adaptive teaching skills in terms of adequate planning and carrying out instruction found moderate correlations with teachers' accuracy of the rank order of student performance [63,64]. While some studies thus focused on student-related accuracy, there is, to our knowledge, a lack of studies on accuracy measures regarding the diagnosis of instructional features in science classrooms that improve the quality of science instruction, and thus, student achievement. Karst et al. [10] traced this back to the greater effort when measuring features of instructional quality that require, for example, the use of videography.
Overall, depending on the diagnostic focus, a correlation between teachers' situation-specific skills, their diagnostic accuracy, and corresponding knowledge facets can be assumed but has not been systematically investigated yet. The use of videotaped classroom situations may be one approach to examine the different components collectively and in a situated way.

Video-Based Assessment of Diagnostic Competences
For pre-service science teachers (PST), it might be challenging to succeed in diagnostic situations within the complex environment of a science classroom, as PST are less experienced and less skilled with regard to their situation-specific skills [41,62]. Video-based programs and instruments have been developed in which videos were used in different ways: to reflect on teachers' own or other teachers' practice; to show best practice, in which teaching strategies can be observed and adapted for one's own teaching; or to promote situation-specific skills for the interpretation of important features of classroom interactions [42,65,66]. Researchers assume that, supported by the situated context, teachers' professional knowledge can be activated, and necessary situation-specific skills can be applied [2,6,57]. The advantage of videos is that they approximate practice, reduce the complexity of the diagnostic situation, and thus, can promote PSTs' competence development [67]. However, video-based instruments are also considered promising for assessing teachers' diagnostic competences within a situated context, and thus, for measuring their diagnostic activities and diagnostic accuracy [2,6]. Investigation and training of diagnostic competences within learning environments have often been student- or interaction-based [68]. Diagnostic tasks within video-based instruments referred mostly to the diagnosis of teacher-student interaction on mathematical content (e.g., [6,40,57]), or students' thinking (e.g., [37]), but diagnostic contexts regarding the assessment of the instructional behavior of the teacher with regard to features indicating instructional quality that impact learning have rarely been applied. Researchers, however, emphasized the relation between teacher knowledge and instruction. Knowing about effective instructional strategies and being able to diagnose instructional situations and offer effective instructional alternatives is considered a crucial skill that was found to be a significant predictor of student learning [57,69]. A first approach can be found in Meschede et al. [38], who assessed pre-service science teachers' professional vision as a situation-specific skill with regard to two instructional aspects: cognitive activation and structuring learning situations. However, the authors claimed that future research needs to conceptualize and investigate assessment skills with respect to other aspects of instructional quality or in other content-specific domains. The study of diagnostic competences and their components with respect to further subject-specific features of instructional quality is still pending.

Summary
Overall, three observations can be made: First, investigating diagnostic competences should go beyond judgment accuracy and should also include components such as professional knowledge and diagnostic activities to fully grasp the construct. Second, previous analyses mostly referred to individual components of diagnostic competences, for example, teachers' situation-specific skills for making situation-based diagnoses [6]. Third, considering all components and their relationships is helpful to plan further studies in the future by building on existing relations. This aspect can be considered particularly worthwhile in relation to teacher education, in which diverse offers and courses for the development of competence are embedded without systematically addressing (aspects of) diagnostic competences [13]. Knowing about components that influence teachers' diagnostic competences and which can be modified in teacher education is therefore of great importance [70].

Aims and Hypotheses
As described in the previous sections, effective diagnoses, and thus, diagnostic competences, are considered to depend on the activation of appropriate knowledge facets that underlie the execution of diagnostic activities (cf. [6,38]). Empirical evidence is still rare regarding diagnostic activities, the inclusion of accuracy measures in situated approaches, and interrelations between the professional knowledge base, diagnostic activities, and diagnostic accuracy as components of diagnostic competences. We therefore address these issues within a biological context in higher education, since the development of competence needs to start within university teacher education [12,13]. Thus, the present study makes an effort to measure the three components of pre-service biology teachers' diagnostic competences in order to investigate the relationship between them as a starting point for a programmatic approach to the investigation of pre-service biology teachers' diagnostic competences (cf. [71]). Therefore, we addressed the following research question: How do the different knowledge facets PCK, CK, and PK relate to diagnostic activities and diagnostic accuracy?
Considering the demand for practical approaches within the investigation of PST diagnostic competences [6,40], we investigated this question within a situated context using videotaped classroom situations showing whole-class biology instruction. A whole-class diagnostic focus is considered more complicated than diagnosing a particular student or a group of students [12] but reflects teachers' everyday-life practice. Therefore, it is necessary to understand how components of diagnostic competences relate to each other in order to develop or adapt university programs. The following hypotheses were derived from previous research:

Hypothesis 1.
With regard to the assessment of science instruction (cf. [38]) and previous findings on the importance of content-related knowledge facets for subject-specific instructional quality [33,34], PCK can be considered as a pivotal knowledge facet of teachers' diagnostic competences when the diagnostic focus is on subject-specific instructional aspects. Therefore, we assume PCK to be strongly related to the application of diagnostic activities and diagnostic accuracy [6,21,35,38,55,57].

Hypothesis 2.
CK was found to be a necessary condition for PCK development, but research about its relation to diagnostic activities or accuracy is scarce. Therefore, we assume CK to be correlated with PCK, while a connection between CK and diagnostic activities or diagnostic accuracy is not assumed (cf. [30,33,34,36]).

Hypothesis 3.
Finally, a relationship between PK, diagnostic activities, and diagnostic accuracy should not exist, since the focus of the diagnosis lies on subject-specific instructional quality, and thus, not on general aspects of teaching and learning (cf. [6,41]).
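
The three hypotheses can be summarized as expectations about the coefficients of a simple path model. The following is only an illustrative sketch: the symbols (DA for diagnostic activities, ACC for diagnostic accuracy, and the coefficient names) are assumptions introduced here, and the model actually estimated in the study may contain further paths, for example from diagnostic activities to diagnostic accuracy.

```latex
% Illustrative path model for person i (notation assumed, not taken from the study)
\begin{align}
  \mathrm{DA}_i  &= \beta_{1}\,\mathrm{PCK}_i + \beta_{2}\,\mathrm{CK}_i + \beta_{3}\,\mathrm{PK}_i + \varepsilon_i \\
  \mathrm{ACC}_i &= \gamma_{1}\,\mathrm{PCK}_i + \gamma_{2}\,\mathrm{CK}_i + \gamma_{3}\,\mathrm{PK}_i + \zeta_i
\end{align}
% Expectations under the hypotheses:
%   H1: \beta_{1} > 0 and \gamma_{1} > 0          (PCK related to activities and accuracy)
%   H2: \mathrm{Cov}(\mathrm{CK}, \mathrm{PCK}) \neq 0, with \beta_{2} = \gamma_{2} = 0
%   H3: \beta_{3} = \gamma_{3} = 0                (no relationship for PK)
```

Read this way, each hypothesis corresponds to a testable pattern of coefficients rather than to a single correlation.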

Design and Sample
Data collection was embedded in a mandatory seminar attended by pre-service biology teachers at the beginning of their teacher education. Using the video-based assessment tool DiKoBi Assess (German acronym for diagnostic competences of biology teachers in biology classrooms) was compulsory for all PST. Still, participation in the study, and thus, releasing their data for analysis was voluntary. All participants signed informed consent documents stating an anonymous and voluntary participation.
The present study had a cross-sectional design with two points of data collection. According to Spector [71], cross-sectional designs are most useful to provide initial evidence of the extent to which variables "are related without introducing the complexities of temporal flows that might distort relationships" (p. 130). Since we were interested in understanding the relationships between the three components of diagnostic competences which we defined, the cross-sectional design provided a useful starting point within our research that will become more complex in subsequent studies.
First, we asked PST of the subject biology to complete three professional knowledge tests to measure PCK, CK, and PK. Second, we used the video-based assessment tool DiKoBi Assess to measure PSTs' diagnostic activities and diagnostic accuracy. This means the data set allowed the computation of five different PST measures (a PCK measure, a CK measure, a PK measure, a diagnostic activities measure, a diagnostic accuracy measure).
In DiKoBi Assess, PST had to use diagnostic activities to diagnose a biology teacher's subject-specific instruction within a real-life teaching situation to capture diagnostic competences as ecologically valid as possible (cf. [72]). The sample consisted of 186 PST of the subject biology (72.0% female; average study semester: M = 3.3, SD = 1.3; age in years: M = 23.0, SD = 3.8). A percentage of 36.6% of the PST strove for a certification qualifying them for the academic track of German secondary education (German Gymnasium), and 63.4% attended programs for the non-academic track that qualifies students for a vocational career. For an overview of the German school system, see Cortina and Thames [73].

Professional Knowledge Tests
Professional knowledge was assessed with paper-pencil tests that included open-ended items (responses were written in text fields), single best answer (SBA) items (one correct answer must be selected from a set of possible responses consisting of multiple distractors and one correct answer), or multiple true/false items (all of the possible responses must be assessed for their validity) [74].
The PCK- and CK-tests considered the biology topic skin as this was the same topic covered in the video-based assessment tool DiKoBi Assess. The tests were adapted versions of the professional knowledge tests, which have been utilized in the ProwiN project (Professional Knowledge of Teachers in Science) [75,76]. In the tests, aspects of PSTs' declarative and action-related knowledge were assessed. Declarative knowledge (knowing that) includes knowledge about terms, facts, and principles (e.g., listing the advantages and disadvantages of a specific model); action-related knowledge (knowing how, knowing when and why) is needed for successful instruction in different situations [77]. Knowing how is knowledge about an individual science teacher's (instructional) practices and processes (e.g., knowing how to deal with student ideas). Knowing when and why refers to "knowledge about conditions under which decisions and practices are appropriate and knowledge about reasons for performing specific practices" [77] (p. 6).
The PCK-test covered knowledge of instructional strategies and knowledge of student (mis)conceptions. Both issues count as important components of teachers' PCK [22,31]. Utilizing the model of Tepner et al. [78], eight open-ended items and five SBA items concerning three PCK dimensions were included in the test (see Table 2). The CK-test included 13 open-ended items and 15 SBA items. Topics that were covered with the items are shown in Table 3. Criteria for item scoring of both the PCK- and CK-test were provided in two separate coding manuals. Two independent raters used the coding manuals to code ten percent of both the PCK- and CK-tests. Results of two-way random intra-class correlations (ICC_absolute) showed a high agreement between the two raters (PCK-test: ICC_absolute(310,310) = 0.84, p < 0.001; CK-test: ICC_absolute(341,341) = 0.97, p < 0.001) [79].
The knowledge facet PK was assessed by utilizing a paper-pencil test that was adapted from the BilWiss project [80,81]. The adapted version contained one out of six different dimensions originally used in BilWiss. For this study, the short scale of the dimension instruction was used, because it contained items about generic features of instructional quality such as classroom management, supportive climate, and cognitive activation, which are referred to as basic dimensions of instructional quality [33,82,83]. The instrument also contained items concerning general pedagogical issues of teaching such as teaching methods (see Table 4). Therefore, the selected dimension of the BilWiss test was the most important one with regard to the differentiation between generic and subject-specific features of instructional quality, which was important for accurate diagnosis of the videotaped classroom situations. The PK-test contained five SBA items and ten multiple true/false items. The item scoring followed the instructions from BilWiss [80].
Data sets from the BilWiss project, in which the PK test from this study was developed and used, are publicly available on the IQB website [81]. The three tests were evaluated utilizing Rasch theory and Rasch analysis techniques [84,85]. The Rasch Partial Credit Model (PCM) was used utilizing the Winsteps program [86]. The use of Rasch allowed "person measures" to be computed for each respondent for each instrument. Therefore, the data collected from the PCK-, CK-, and PK-tests allowed PCK, CK, and PK Rasch measures to be computed for each respondent. Reasons for utilizing Rasch are that raw scores (be it from a multiple-choice test, a partial credit test, or a rating scale) are non-linear. Rasch allows one to take that non-linear data and compute Rasch person measures which are expressed on a linear scale [85,87]. It is those linear measures that are needed for parametric statistics. Additional reasons why Rasch analysis techniques should be used when test and survey data is evaluated include that Rasch methods (1) express items and respondents on the same measurement scale, (2) provide wide-ranging Rasch indices to evaluate the functioning of items, (3) allow the computation of the measurement error of each item and each respondent, (4) allow respondent measures to be computed even if data is missing, (5) correct for the non-linearity of raw test scores, and (6) enable alternate forms of an instrument to be developed (through linking items), such alternate forms enable respondent performance to be expressed on the same scale regardless of test form completed [85,87].
One important component of an analysis utilizing Rasch methods is an assessment of the "fit" of items. To evaluate data fit, item Outfit-MNSQs (mean-squares) were utilized. Additionally, Rasch person reliability and Rasch item reliability were computed and evaluated. It has been argued that for a productive measurement, item Outfit-MNSQ values should not exceed 1.5 [88]. High values of item reliabilities demonstrate that both the range of item difficulty and the sample size are appropriate to measure the variables precisely. Person reliability is impacted by the length of the test and the range of abilities of respondents [86]. Item fit statistics of the knowledge tests showed good fit values in which all items exhibited an Outfit-MNSQ below 1.5 (PCK: 13 item outfit-MNSQ < 1.18; item reliability = 0.96; person reliability = 0.55; CK: 28 item outfit-MNSQ < 1.35; item reliability = 0.97; person reliability = 0.67; PK: 15 item outfit-MNSQ < 1.34; item reliability = 0.98; person reliability = 0.50).
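As an illustration of the fit statistic referred to above, the following minimal Python sketch computes an item Outfit-MNSQ as the mean of the squared standardized residuals. It assumes the dichotomous Rasch model for simplicity, whereas the study used the Partial Credit Model in Winsteps; the function names and the example values are hypothetical and are not taken from the study data.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person measure in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_outfit_mnsq(responses, person_measures, item_difficulty):
    """Outfit mean-square of one item: mean squared standardized residual."""
    z_squared = []
    for x, theta in zip(responses, person_measures):
        p = rasch_probability(theta, item_difficulty)
        # Standardized residual squared: (observed - expected)^2 / variance
        z_squared.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z_squared) / len(z_squared)


# Hypothetical responses (1 = correct, 0 = incorrect) and person measures
responses = [1, 0, 1, 1, 0]
person_measures = [0.8, -0.5, 0.2, 1.1, -1.0]
outfit = item_outfit_mnsq(responses, person_measures, item_difficulty=0.0)
print(f"Outfit-MNSQ = {outfit:.2f}; below the 1.5 cutoff: {outfit < 1.5}")
```

Values close to 1 indicate that the observed responses vary about as much as the model expects; the 1.5 threshold in the last line mirrors the cutoff cited in the text.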

Video-Based Assessment Tool DiKoBi Assess Measuring Diagnostic Activities
To measure PSTs' diagnostic activities, we used the video-based assessment tool DiKoBi Assess that was embedded in an online-survey platform [89]. DiKoBi was developed to provide diagnostic situations of real-world demands for biology teachers, in which subject-specific knowledge and skills can be applied to assess biology teaching [54]. Six staged videos were embedded in DiKoBi to address six different challenging situations that biology teachers have to confront when teaching biology (e.g., elaborate use of three-dimensional models). All of these six challenging situations addressed the biological topic "skin," each challenge concerning a different subject-specific dimension of instructional quality. The embedded dimensions and the features they include have been found to be important factors that impact student achievement [90]. These dimensions are (1) level of students' cognitive activities and creation of situational interest, (2) dealing with (specific) student ideas and errors, (3) use of technical language, (4) use of experiments, (5) use of models, (6) conceptual instruction.
In DiKoBi, every challenging situation to be diagnosed started with the video of the classroom situation. Afterward, PSTs had to complete three tasks. Each task required the execution of a diagnostic activity. Since not all diagnostic activities were found to be useful for diagnosing in DiKoBi, a selection of three activities that can be considered crucial in the context of video analyses was made in comparison with conceptualizations reported in research previously [91]. Besides conceptualizing situation-specific skills as perception, interpretation, and decision-making, originated from the competence as a continuum model [1], other conceptualizations exist. These refer to reflective skills for video viewing or classroom observation and are discussed, for example, under the term professional vision, which includes several critical activities [42,92]. Within a teaching context, professional vision means the ability to notice (that is paying attention to relevant events in the classroom) and to reason about relevant features of classroom interaction [41]. The reasoning process is knowledge-based and can be differentiated into three further activities: describing the situation without making judgments, explaining the situation by linking the observation to professional terms and concepts, and predicting possible consequences from the observed situation [41,43]. Considering the skills displayed in the competence as a continuum model and the aforementioned activities described within the concept of professional vision, four diagnostic activities (DA) can be considered crucial within diagnostic processes. First, science teachers have to identify noteworthy events in the science classroom (DA = problem identification), and systematically observe and describe the noteworthy events (DA = evidence generation). 
Second, the teachers explain the situation by actively drawing on their declarative and action-related knowledge (e.g., by using professional terms and theories of teaching and learning that support the relevance of their observation) (DA = evidence evaluation). Third, they make decisions on how to continue with instruction or respond to students' activities, or they even have to propose alternative teaching strategies (DA = drawing conclusions) (cf. [2,93]). Since problem identification occurs rather invisibly in the participant's mind, we assumed problem identification to be indirectly included in the events described without explicitly measuring it. This assumption is based on the fact that in order to describe an incident, a teacher must show awareness of exactly this incident or problem that occurred [92].
Three tasks prompted the diagnostic process in DiKoBi. Each task focused on a specific diagnostic activity. For Task Describe (DA = evidence generation), the PST had to identify and describe challenging aspects of each classroom situation; for Task Explain (DA = evidence evaluation), the PST had to reason about their described challenges by linking their description to scientific theories and concepts; and for Task Alternative Strategy (DA = drawing conclusions), the PST had to propose an alternative teaching strategy and give reasons why this would improve instruction. For a more detailed description of the development and the design of DiKoBi, see Kramer et al. [54].
A coding manual was utilized to analyze the PSTs' answers to each task of the six challenging situations. The coding manual is based on subject-specific instructional quality features and corresponding indicators described in the science literature (e.g., using challenging tasks to foster conceptual understanding [83]). In empirical studies, a positive impact of these features on student learning has been found. Examples from the manual and corresponding references can be found in Kramer et al. [54]. For each task, the answers were coded according to the content-related knowledge facets (PCK, CK, PK), and the quality of the appropriate diagnostic activity was assessed. Zero (0) points were used to indicate very low (not accurate) answers. For correct answers of improved quality, 1 or 2 or 3 points could be utilized for a rating. Appendix A provides an overview of the procedure utilized for coding and the codes of the quality levels that have been used for coding the PSTs' answers.
Three independent raters used the coding manual to code statements of Task Describe, Task Explain, and Task Alternative Strategy to ensure objective coding of the answers. A total of 337 statements from 10 PST were coded by all three raters. Results of a two-way random intra-class correlation (ICC_absolute) analysis of these ratings suggested a high agreement between the three raters (ICC_absolute = 0.90, F(1520, 3040) = 10.26, p < 0.001, N = 1521) [79]. The small number of coding discrepancies that were observed were discussed by all three raters prior to the rating of the remaining data by a single coder. Complex cases continued to be discussed together during the ongoing coding process.
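To make the agreement statistic more concrete, the following Python sketch computes a two-way random, absolute-agreement intra-class correlation from a statements-by-raters table. It assumes the single-measure form of the coefficient, often written ICC(2,1); the text does not state whether single or average measures were reported, and the ratings shown are hypothetical.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measure ICC for absolute agreement, two-way random effects.

    ratings: list of rows, one row per coded statement,
             one column per rater (all rows have the same length).
    """
    n = len(ratings)       # number of coded statements
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical quality codes (0 to 3) from three raters for five statements
example = [
    [0, 0, 1],
    [2, 2, 2],
    [3, 3, 2],
    [1, 1, 1],
    [2, 3, 3],
]
print(round(icc_two_way_random_absolute(example), 2))
```

Because the coefficient measures absolute agreement, systematic differences between raters (the column term) lower the value, not only random disagreement.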
After coding the different tasks and situations, the research team calculated Rasch person ability measures for each respondent. Rasch techniques similar to those described previously for the PCK-, CK-, and PK-tests were utilized. The Rasch person measure that we computed expressed the level of each PST's ability to execute diagnostic activities accurately. Thus, the Rasch person measure provides an assessment of the PSTs' diagnostic level utilizing the data collected for evidence generation, evidence evaluation, and drawing conclusions for each of the six classroom situations. Fit statistics of the Rasch model showed productive measures (diagnostic activities: 29 item outfit-MNSQ < 1.43; item reliability = 0.95; person reliability = 0.76).

Calculating Diagnostic Accuracy
On the one hand, accuracy was already taken into account in the scoring of the diagnostic activities. Thus, it was included as a quality criterion in the measurement of the diagnostic activities, since zero points were assigned if the description of the pre-service teachers referred to a not-accurate observation. However, for investigating the relationship between components of diagnostic competences, we established another, more explicit measure of accuracy, which is operationalized as perception accuracy, referring to the precise perception and description of biology instruction (cf. [52,94]). Therefore, diagnostic accuracy was assessed by comparing a teacher's individual observations with the scripted challenges embedded in the videos. These scripted challenges and corresponding indicators of instructional quality features served as an objective criterion for comparison since they were derived based on empirically proven features for effective teaching. In other words, we examined whether PSTs' descriptions of the challenging situations referred to the PCK-challenges that addressed missing features of subject-specific instructional quality. Descriptions that referred to superficial or general pedagogical observations not relevant for teaching and learning in the specific situation were not counted. To better understand the accuracy calculation, we will illustrate the procedure with an example: Given the classroom situation use of models, the two challenging instructional aspects elaborate model use and critical reflection were included as subject-specific features of instructional quality. Indicators addressing the lack of an elaborated model use were statements such as "The model is used for illustrative purposes only" or "The model is described incompletely." Indicators addressing the lack of critical reflection were statements such as "Teacher does not initiate critical reflection of the model" or "Teacher does not discuss the model with students."
PST could describe up to ten observations per classroom situation. Each described observation was then compared with the indicators of subject-specific instructional quality listed in the manual. If the observation matched one of these indicators, the response was scored with one point; otherwise, zero points were given. Subsequently, all points (i.e., correct observations) for a challenging classroom situation such as use of models were added up and divided by the overall number of observations made in this classroom situation. For example, if a participant described four observations in the classroom situation use of models, but only two of them were indicators of biology-specific instructional quality and thus considered accurate, the calculated diagnostic accuracy for this classroom situation was 0.5. In the end, we calculated the average of the accuracy measures of the six classroom situations for the final measure of diagnostic accuracy.
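The calculation described above can be sketched in a few lines of Python. The function names and the situation labels are hypothetical, and the handling of a situation without any described observation (skipped here) is an assumption, since the text does not specify this case.

```python
from statistics import mean


def situation_accuracy(scores):
    """Share of accurate observations in one classroom situation.

    scores: list of 0/1 codes, one per described observation
            (1 = matches an indicator in the manual, 0 = no match).
    Returns None if no observation was described (assumed handling).
    """
    if not scores:
        return None
    return sum(scores) / len(scores)


def diagnostic_accuracy(scores_by_situation):
    """Average of the per-situation accuracy values of one participant."""
    values = [situation_accuracy(s) for s in scores_by_situation.values()]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


# Example from the text: four observations, two of them accurate
print(situation_accuracy([1, 0, 1, 0]))

# Hypothetical participant with codes for two of the six situations
participant = {
    "use of models": [1, 0, 1, 0],
    "use of experiments": [1, 1],
}
print(diagnostic_accuracy(participant))
```

Dividing by the number of observations made, instead of by a fixed maximum, means that a participant is not rewarded for listing many observations when only a few of them are accurate.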

Data Analysis
For the knowledge variables PCK, CK, and PK, as well as for the variable diagnostic activities, we conducted Rasch analyses [84] using the software Winsteps 3.81 [86]. As mentioned, we computed Rasch person measures, which we utilized for our subsequent statistical analysis. To test our hypotheses, we used path analyses in AMOS 26 [95] with the equal interval person abilities resulting from Rasch analysis of PCK, CK, and PK as predictor variables and the person abilities of diagnostic activities as well as the calculation of diagnostic accuracy as outcome variables. The model was estimated with maximum likelihood. For model fit, we used the comparative fit index (CFI), the root-mean-square error of approximation (RMSEA), and the standardized root-mean-square residual (SRMR). Model fit was evaluated against the guidelines of Hu and Bentler [96]: CFI > 0.90, RMSEA < 0.05, SRMR < 0.08. Results of the path analyses are shown as standardized values.

Results
An overview of all variables of the path model, including means, standard deviations, and correlations, is shown in Table 5. For the variables PCK, CK, PK, and diagnostic activities, we used person abilities from the PCM that take item difficulties into account [85]. Negative mean values for PCK, CK, and diagnostic activities indicate that the tests used were rather difficult for the PST in our sample. There was a moderate correlation between PCK and almost all other variables, indicating the great importance of PCK for diagnostic competences. Additionally, there was a strong correlation between diagnostic activities and diagnostic accuracy. This observation can partly be explained by the fact that accuracy was also considered when scoring the executed diagnostic activities (see Appendix A). For example, if a PST described an incident that has been found as non-problematic according to the coding manual, the PST's description was assessed as "not accurate." Additionally, there were small correlations of CK and PK with diagnostic activities and diagnostic accuracy. To investigate the relationship between the professional knowledge facets, diagnostic activities, and diagnostic accuracy, we calculated a path model. Figure 1 shows the model with standardized parameter estimates and levels of significance. The model demonstrated that PCK (β = 0.29, SE = 0.07, p < 0.001) and PK (β = 0.15, SE = 0.09, p = 0.031) were significantly related to PSTs' diagnostic activities; both PCK and PK together explained 17% of the variance of diagnostic activities (R² = 0.17). Furthermore, 12% of the variance of diagnostic accuracy was attributable to PCK (β = 0.23, SE = 0.02, p = 0.002). The model had no degrees of freedom, and its fit values were CFI = 1.000, RMSEA = 0.315, SRMR = 0.000. The rejection of the model by RMSEA could be due to the sample size smaller than N = 250 [96].
However, whereas results confirmed hypotheses 1 and 2, hypothesis 3 must partly be rejected since the predictor variable PK was positively related to diagnostic activities as well (see Figure 2).

Discussion
This study aimed to contribute to teacher education by investigating the relationship between three components of diagnostic competences that have been defined as professional knowledge (PCK, CK, PK), diagnostic activities, and diagnostic accuracy [4]. Following the understanding of competence as described in the competence as a continuum model, we assumed professional knowledge as part of teachers' cognitive dispositions related to teachers' application of diagnostic activities and their diagnostic accuracy [1,6]. However, since we chose a whole-class diagnostic focus on biology instruction in our study, not all knowledge facets were assumed to be equally related to diagnostic activities and diagnostic accuracy. Therefore, PCK, CK, and PK of pre-service biology teachers (PST) were measured with adapted versions of objective, reliable, and valid paper-pencil tests [76,80]. Diagnostic activities as an operationalization of situation-specific skills were measured with DiKoBi Assess. This video-based assessment tool provided subject-specific challenges that a biology teacher has to deal with in real-life instruction. The challenges focused on empirically proven features of instructional quality in the science classroom. As part of their diagnostic competences, pre-service teachers need to know about effective instructional strategies, but they also need to be able to diagnose instructional situations and offer effective instructional alternatives. Being able to apply skills such as diagnostic activities was found to be a significant predictor of student learning [57,69].
With regard to the aspects of professional vision (description, explanation, prediction), and with regard to the situation-specific skills perception, interpretation, and decision-making that are depicted in the competence as a continuum model, tasks in the video-based assessment tool prompted the use of the diagnostic activities evidence generation, evidence evaluation, and drawing conclusions as relevant situation-specific skills in the context of video analysis and diagnosis of classroom instruction [2,42]. Following this, we want to discuss the results of our path analysis.
Results showed that PSTs' PCK was positively related to the application of diagnostic activities (hypothesis 1), thus further supporting the tendency in existing results on the relation between PCK and situation-specific skills in domain-specific situations [22,38,57]. Furthermore, our results also highlight the importance of PCK for diagnostic accuracy (defined as precise perception and description of incidents relevant in biology instruction). Similar findings have been reported by Hoth et al. [21], who already indicated a connection between high subject-specific knowledge of teachers and their ability to reason about student errors more accurately, "while teachers with low knowledge focus on aspects that are not directly connected to the student's learning" [21] (p. 1). However, it must be noted that both PCK and CK were included in Hoth et al.'s knowledge conception. Therefore, our results explicitly indicate the relationship between PCK and diagnostic accuracy, not only for student errors but also for other dimensions of subject-specific instructional quality. Since many incidents occur simultaneously or in very short succession in biology classrooms, being able to focus on relevant incidents is crucial in terms of a biology teacher's effective instructional behavior [13]. Pre-service teachers' diagnostic accuracy might therefore also be considered important for implementing instructional quality in real-life teaching, which should be studied in future research.
The results also showed that for the application of diagnostic activities and in terms of diagnostic accuracy, CK is not critical, but CK is moderately connected to PCK (Hypothesis 2). This result also confirms previous research findings highlighting the role of CK in defining the scope of the development of PCK [30,31,34].
Contrary to our hypothesis 3, the knowledge facet PK was positively related to the application of diagnostic activities. To understand this relation, we want to take a closer look at the utilized items (see Table 2; Table 4). Items of the PCK test referred to three PCK dimensions, which were use of models, use of experiments, and student errors [78]. The three PCK dimensions were covered in the video-based assessment tool as well. However, the video-based assessment tool also covered additional dimensions of PCK, which were level of students' cognitive activities and creation of situational interest, use of technical language, and conceptual instruction. An extension of the PCK paper-pencil test would be useful to cover the same PCK dimensions in both measurement instruments. Subsequently, it would have to be verified whether the present relationships remain the same. On the other hand, two items of the PK test showed similarities to PCK-dimensions utilized in the video-based assessment tool. For example, item PK-11 referred to the role of activating students' prior knowledge for instruction. Activating students' prior knowledge can be considered important in terms of cognitive activation, which is one of the three basic dimensions of instructional quality that have been described for effective instruction relevant in different domains [33,83]. At the same time, the implementation of cognitive activation has to be concretized from a subject- and content-related perspective [11,90]. Such content-related concretization was done in the corresponding classroom situations of the video-based assessment tool that referred to the level of students' cognitive activities and conceptual instruction. That is why those two dimensions were assigned to PCK. In the same vein, there is an overlap between PK-8 (constructive handling of errors/analysis of student errors) and the PCK dimension dealing with (specific) student ideas and errors. Again, this is a content-related concretization with regard to very specific student ideas and errors on the topic skin. Thus, whereas PK-8 focused on the general principle of dealing with errors, the items PCK-1a, PCK-1b, and PCK-1c addressed the application of this principle in a subject-specific context. Therefore, there is some overlap in the operationalization of PK and PCK at this point that may be the reason for the unexpected relationship between PK and diagnostic activities.
However, the results might also be interpreted in the sense that both PK and PCK are positively related to the application of diagnostic activities. Considering the situated diagnostic context that we used to assess the diagnostic activities, the multidimensional nature of the teachers' classroom performance might have activated both scientific and pedagogical concepts as suggested by Depaepe et al. [20]. We believe PK is mainly correlated with evidence generation, whereas for evidence evaluation PCK should be decisive (cf. [25]). Therefore, coding the diagnostic activities as a total person ability measure, as done in this study, may have influenced the result as well. A next interesting step would be to investigate the relation between teachers' knowledge facets and the different diagnostic activities as it can be assumed that subject-specific knowledge is more important for evidence evaluation than for the other diagnostic activities (cf. [25]).
A well-developed PK including knowledge of general principles of teaching and learning may thus be seen as a precondition for being able to apply knowledge of content-related basic dimensions to the diagnosis of subject-specific situations (cf. [11,30]). Here, further research can follow that examines the use of knowledge facets in relation to different dimensions of instructional quality in which diagnostic activities are applied. Content-related dimensions such as cognitive activation could be related to both PCK and PK, while subject-specific dimensions such as use of models or use of experiments could correlate more strongly with PCK (cf. [11]). Since the video-based assessment tool DiKoBi Assess provided diagnostic situations in the biology classroom, the higher effect sizes for PCK (see Figure 2) and the moderate correlations between PCK and the other components of diagnostic competences indicated the importance of subject-specific knowledge for effective diagnosing in the science classroom.
As is the case in all studies, there are of course some limitations to our study. First, DiKoBi Assess was used to measure the diagnostic competences of a sample of PSTs within a situated context. Since the videos were scripted and presented specific challenges in a very condensed way, there might be a gap between the classroom situations in the videos and real-life teaching, even when teachers perceived the classroom situations as authentic [54]. By using short videos of classroom situations, the complexity of diagnostic situations can be reduced. In our case, this was done by focusing on specific challenges of instruction within one video, so that the complexity of each diagnostic situation was reduced by "breaking down practice into its constituent parts for the purposes of teaching and learning" [67] (p. 2058). This procedure may be beneficial for inexperienced PSTs' learning [40], but it must be kept in mind that instruction within the real classroom may involve solving more complex challenges.
Furthermore, the described relationships apply to the selected diagnostic activities that were relevant for diagnosing the classroom situations embedded in the video-based tool. Prompting other activities might change results. In future studies, we also want to include affective-motivational variables such as teachers' beliefs, motivation, and self-related cognitions that may impact teachers' diagnostic activities as well [7,38,94].
Another potential limiting factor to our study could be the person reliabilities of our knowledge tests. Many factors can impact Rasch person reliability. Often such reliability values are impacted by the targeting of a test. If a test is too difficult, or too easy for a sample, the person reliability can be impacted. Such off-targeting of a test to a sample's ability level is not uncommon in studies. A common rule of thumb is that there should be less than a 1 logit difference between the average person measure on a test and the average item measure [97]. For our instruments, only the value of the PCK-test violated this rule of thumb. The difference between the average person measure and the average item measure was −1.22, meaning that the test was too hard for the respondents. Since the PSTs of this study were at the beginning of their studies, higher reliabilities might be observed if the test instruments were administered later in the curriculum of the PSTs. Linacre [86] has also suggested that an instrument's reliability might be increased if longer versions of a test are utilized, or if a sample with greater variance in ability took an instrument. Since our sample's ability range was rather narrow and PSTs with extremely high or low abilities were not part of the sample, the measure of person reliability decreased.
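The targeting rule of thumb mentioned above amounts to a simple comparison, sketched here in Python. The function names are hypothetical; the only value taken from the text is the reported difference of -1.22 logits for the PCK-test, and the 1-logit threshold follows the cited rule of thumb.

```python
from statistics import mean


def targeting_offset(person_measures, item_measures):
    """Mean person measure minus mean item measure, in logits."""
    return mean(person_measures) - mean(item_measures)


def is_well_targeted(offset, threshold=1.0):
    """Rule of thumb: the absolute offset should be below one logit."""
    return abs(offset) < threshold


# Difference reported in the text for the PCK-test
print(is_well_targeted(-1.22))  # False: the test was too hard for the sample
```

A negative offset means that the average respondent lies below the average item difficulty, which is the situation described for the PCK-test.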
Furthermore, we did not investigate what pre-service teachers are able to implement in real-life performance. Thus, being able to diagnose instructional features of biology instruction does not necessarily mean being able to implement those features in real-life instruction. The linking of diagnostic competences and practical implementation as well as the investigation of the effects of diagnostic accuracy on instructional quality can follow such considerations (cf. [14]).

Implications and Further Research
The results serve as a basis for further, more complex investigations of science teachers' diagnostic competences within the COSIMA project (Facilitating diagnostic competences in simulation-based learning environments in the university context) and can be used to design ways of fostering pre-service teachers' diagnostic competences systematically in the future. Following considerations by Meschede et al. [38], we have investigated diagnosis with regard to further features of instructional quality in the field of biology instruction and thus linked dimensions of subject-specific instructional quality with diagnostic competences. It is now important to investigate in more detail which diagnostic activities are related to which knowledge facet and what differences might exist with respect to different dimensions of instructional quality.
The results of the present study support the importance of PCK for biology teacher education, particularly for pre-service biology teachers' diagnostic activities and diagnostic accuracy. In this regard, video-based tools provide the opportunity to apply knowledge and diagnostic activities to classroom situations that are particularly relevant for biology instruction. Thus, these tools can be used not only to measure skills (cf. [6,11,38]) but also as an effective way of supporting knowledge acquisition and of providing varied opportunities in which diagnostic activities can be applied and trained (cf. [42]). Therefore, besides utilizing DiKoBi as an assessment tool, its potential as a learning tool (DiKoBi Learn) will be investigated in future studies. Taking the results of the present study into account, DiKoBi Learn will then have to provide additional instructional information concerning the subject-specific dimensions of instructional quality that are addressed in the videos used.
At the same time, the present results also point to the relevance of a well-founded pedagogical knowledge base. The findings could serve as an incentive to focus more on general dimensions of instructional quality and activities such as the general description of instruction in the pedagogical training of teachers, while subject-specific features are then given priority in subject-specific courses on the basis of practical examples (such as those provided in the video-based tool DiKoBi). Accordingly, in subject-specific courses, the interpretation and evaluation of subject-specific instruction should gain more weight.

Data Availability Statement:
Information and queries on the data used can be obtained from the authors of this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Coding Procedure for Diagnostic Activities
The coding procedure for the individual tasks was as follows: Task Describe, DA = evidence generation: (1) For each observed and described challenge, we assessed whether it addressed subject-specific pedagogical content (PCK) or merely pedagogical content (PK). Aspects referring to the subject matter (CK) were not mentioned by the PSTs. For the assignment to PCK or PK, the descriptions were compared with the itemization (indicators) in the coding manual. Consequently, the descriptions could be assigned either to the scripted, evidence-based PCK challenges which we had recorded and embedded in the video clips [53], or to further PCK-aspects or PK-aspects. (2) To assess the descriptions' quality, we evaluated how well each PST executed the diagnostic activity evidence generation. The quality of the statements was assessed on three levels (see Table A1).
Task Explain, DA = evidence evaluation: (1) Depending on the knowledge facet to which the description was assigned, we evaluated (2) whether the statement was per se an accurate explanation and how well the given explanation related to subject-specific pedagogical theories. The quality of the statements was assessed on four levels (see Table A1).
Task Alternative Strategy, DA = drawing conclusions: (1) For each described strategy, affiliation to PCK or PK was assessed first by comparing the statements with the itemization in the coding manual. Strategies addressing PCK were further assessed on whether they covered aspects of the scripted PCK-challenges. (2) For quality assessment, we evaluated how well the alternative teaching strategy was set up. The quality of the statements was assessed on three levels (see Table A1).
The code "0" was assigned to inaccurate statements. This was the case, for example, when incorrect observations were made that were not visible in the videos (Task Describe), when false or incomprehensible explanations were given (Task Explain), or when the described alternative teaching strategy simply did not represent an (appropriate) alternative strategy (Task Alternative Strategy).
The students should briefly repeat what was discussed last week, but only superficial, general terms are discussed. Presence of the teacher is minimized - less activating and motivating - room use is almost not given.
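The coding procedure described above can be sketched as a small data structure. This is an illustration, not the coding manual itself: the class and field names are hypothetical, and the sketch assumes that the stated number of quality levels per task includes the code "0", so that levels run from 0 to 2 for Task Describe and Task Alternative Strategy and from 0 to 3 for Task Explain.

```python
from dataclasses import dataclass

# Number of quality levels per task as stated in the text.
# ASSUMPTION: levels are numbered from 0 (inaccurate statement) upward.
QUALITY_LEVELS = {
    "describe": 3,              # DA = evidence generation
    "explain": 4,               # DA = evidence evaluation
    "alternative_strategy": 3,  # DA = drawing conclusions
}

# Knowledge facets a statement can be assigned to (CK was not mentioned).
FACETS = {"PCK", "PK"}


@dataclass(frozen=True)
class CodedStatement:
    """One coded PST statement (hypothetical record layout)."""

    task: str
    facet: str
    quality: int
    scripted_challenge: bool = False  # True if a scripted PCK challenge is addressed

    def __post_init__(self):
        # Reject codes that the procedure as described does not allow.
        if self.task not in QUALITY_LEVELS:
            raise ValueError(f"unknown task: {self.task}")
        if self.facet not in FACETS:
            raise ValueError(f"unknown knowledge facet: {self.facet}")
        if not 0 <= self.quality < QUALITY_LEVELS[self.task]:
            raise ValueError(f"quality {self.quality} not valid for {self.task}")
        if self.scripted_challenge and self.facet != "PCK":
            raise ValueError("scripted challenges are PCK challenges")
```

Keeping the facet, the quality level, and the link to a scripted challenge as separate fields mirrors the two-step procedure: step (1) assigns the facet, step (2) rates the quality.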

1 Empty phrase: The statement is more of an everyday phrase than an explanation, partly meaningless. Example: "It doesn't really make sense to say we repeat the last lesson and then stop after two aspects."
2 Simple reference to concepts/theories: Appropriate to or based on the corresponding description, the subject-specific pedagogical theory is named as a keyword or embedded as a phrase in a sentence. Example: "The teachers' questions do not allow for cognitive activation."
3 Comprehensive explanation: Observation and theory are related to each other. Example: "Calling one student is not enough. The teacher neither asked for explanations nor did she engage the students to recognize or call on conceptual connections."

Further example statements:
The activation of prior knowledge could be extended to activate the students more deeply.
Promoting the motivation of the students by using different skin types (elephant skin, crocodile skin), activating prior knowledge (e.g., describing similarities/differences of the different skin types), asking for students' prior experiences (e.g., experiments on the sense of feeling/touch).