1. Introduction
Teacher education in Germany has frequently been at the center of educational-policy and scholarly debate, especially concerning how best to prepare novice teachers for the complexities of real-world classroom practice (SWK, 2023). In the state of North Rhine-Westphalia (NRW), this debate culminated in the introduction of the Praxissemester in 2009 (effective from 2013), a five-month block placement intended to immerse pre-service teachers in authentic school contexts while they concurrently engage with theory in university seminars (MSW, 2009; Mertens & Gräsel, 2018). The core aim of the Praxissemester is not merely to acquaint prospective teachers with school routines but, more importantly, to foster their ability to relate educational-science theory and subject-didactic constructs to live classroom situations (MSW, 2010). These aims resonate with the Standards of the Standing Conference of the Ministers of Education and Cultural Affairs (KMK, 2004), which likewise underscore that pre-service teachers develop professional competencies through observation, teaching, and structured reflection on real classroom events.
Independently of the Praxissemester model in NRW, there has also long been an international discussion about optimizing the preparation of students for the teaching profession through practical phases (Allen & Wright, 2014) and about the development and effects of longer practical phases (Spooner et al., 2008; Zeichner, 2010; Donche et al., 2015; Kervick et al., 2020). These international formats pursue similar goals, in particular relating theory to practice and increasing reflective ability and professional perception (Malinen et al., 2012; Lino et al., 2019). There are also examples of practical phases in which—similar to the Praxissemester—preparatory and follow-up courses as well as structured reflection opportunities are used to pursue these goals (Parsons & Stephenson, 2005; Stenberg et al., 2016).
1.1. Teacher Professionalization and the Role of Reflection
Professional competence in teaching is widely understood as a multidimensional construct encompassing (a) cognitive dispositions—such as subject knowledge, pedagogical-content knowledge, and general pedagogical knowledge; (b) motivational-affective dispositions—including self-efficacy and teaching beliefs; and (c) situational skills—such as classroom management and the ability to notice and interpret critical events (Blömeke et al., 2015; Stender et al., 2021). Thus, developing this competence requires more than acquiring declarative knowledge and technical skills. Especially in the early stages of training, teachers must learn to relate educational-science knowledge to context-specific situational challenges in order to generate alternative courses of action for future situations (von Aufschnaiter et al., 2019; Häcker, 2017).
Schön (1983) famously characterized this as the primary task of the “reflective practitioner,” who continually engages in reflection-in-action, making on-the-spot adjustments during teaching, and reflection-on-action, analyzing past episodes to inform future practice.
In the context of teacher education at the university level, Herzig and Grafe (2005) refer to the process of “reflective learning” as a form of reflexive relationing in which pre-service teachers systematically confront theoretical concepts with real or recorded classroom episodes. As illustrated in Figure 1, the individual—understood as a bearer of subjective theories—acts as a mediating instance between theory and practice. This mediation involves both engaging with educational theory and analyzing teaching practice. These processes mutually influence one another: theoretical engagement affects how practice is perceived, while the analysis of practice (e.g., through video-based, annotated, or paper-based materials) has repercussions on how theory is interpreted. Through such reflexivity, learners develop a situated integration of professional knowledge that becomes the basis for informed, adaptive practice (Neuweg, 2005, 2017, 2021; Herzig & Grafe, 2005; Häcker, 2017).
Despite these aspirations, empirical evidence indicates that many pre-service teachers struggle to recognize theoretically grounded constructs during their practicums and to apply them when analyzing classroom events (Blomberg et al., 2013; Herzig, 2003; Herzig & Grafe, 2005; Seidel & Prenzel, 2007). For instance, Mertens and Gräsel (2018) demonstrate that pre-service teachers often fail to move beyond surface descriptions of classroom events, rarely synthesizing their observations with educational theories unless they are explicitly trained to do so. Likewise, in their systematic review of the effects of long-term practicum models such as the Praxissemester, Ulrich et al. (2020) concluded that the quality of the learning opportunities accompanying student teaching is far more critical for genuine professionalization than the mere duration of field experiences.
Thus, to transform practicum experiences into authentic learning opportunities, teacher-education programs must embed structured reflective guidance: scaffolds such as reflective heuristics, conceptual categorizations, and iterative feedback processes that foster deeper, theory-informed reflection. These observations have led teacher educators to explore new didactic tools that can scaffold reflection on practice. Among these, video-based methods, especially the analysis of self-recorded lesson segments, have emerged as an ever more widely adopted approach to create authentic, theory-driven reflection prompts that bridge the “theory–practice gap” (Weng et al., 2023; Hamel & Viau-Guay, 2019; Gaudin & Chaliès, 2015).
1.2. Video-Supported Reflection
Over the last two decades, video-based learning (VBL) has moved from an innovative niche into a mainstream component of teacher-education programs worldwide, because video affords multi-perspective, repeatable, and contextually rich analyses of teaching practice (Gaudin & Chaliès, 2015; Vogelsang, 2019). By engaging with video-based case studies of their own or others’ classroom interactions, pre-service teachers can practice noticing subtleties of classroom dynamics, categorizing events under didactically meaningful labels, and then formulating explanatory or intervention hypotheses that draw directly on theory. In this way, video-based reflection evolves into a cognitive tool for developing a theory-guided perspective on teaching and learning (Herzig & Grafe, 2004). A recent meta-analysis covering 30 experimental and quasi-experimental studies with 2161 pre-service teachers concluded that VBL produces medium-sized positive effects on both content and pedagogical knowledge as well as on motivational dispositions when compared with non-video interventions (Hedges’s g = 0.38–0.53; Weng et al., 2023). Likewise, several systematic reviews concluded that video-based reflective formats have positive effects on the professionalization of prospective teachers (Hamel & Viau-Guay, 2019; Tripp & Rich, 2012).
1.2.1. Video-Based Learning as an Established Route to Reflective Competence
In the context of reflective learning, empirical research demonstrates that video prompts can deepen students’ reflective engagement. In a four-group field experiment, Weber et al. (2023) showed that students who analyzed their own lesson videos (“video + memory”) reported significantly more enjoyment and immersion and produced more knowledge-based reasoning than peers who reflected only from memory or on text cases. Complementary evidence indicates that such benefits generalize beyond single studies: across 63 investigations synthesized by Tripp and Rich (2012), activities like coding, editing, or annotating teaching videos consistently fostered deeper noticing and theory application.
However, the design of video-supported reflection tasks significantly influences their effectiveness. Collaborative modes, moderate clip lengths, and structured scaffolds systematically moderate positive effects (Weng et al., 2023). For pre-service novices, unstructured viewing can be overwhelming; indeed, longitudinal interviews document initial nervousness and uncertainty with self-recorded lessons, with perceived usefulness emerging only after repeated, scaffolded engagement (Pollmeier et al., 2021). At the same time, evidence from in-service teachers points to a complementary mechanism: in an individual-reflection setting, analyzing other teachers’ lessons—rather than one’s own—elicited deeper analysis of problematic events and higher emotional–motivational involvement (Kleinknecht & Schneider, 2013). While inferences from experienced to novice contexts must be drawn cautiously, taken together these findings suggest that self-video tends to impose additional cognitive–affective load and therefore benefits from pre-arranged structure and prompting. Consequently, researchers advocate guided tasks that channel attention toward theoretically salient events—e.g., three-step analysis heuristics or heuristic-guided viewing protocols—to focus pre-service teachers’ attention on theory-relevant aspects of practice (Kleinknecht & Gröschner, 2016; Prilop et al., 2019; Schaper & Vogelsang, 2022).
The three-step heuristic “describe → evaluate → propose alternatives”, for instance, operationalizes the progression through analytical levels (cf. Sherin & van Es, 2009; Seidel & Stürmer, 2014): “describe” requires objective documentation of observed classroom events without interpretation; “evaluate” involves connecting observations to theoretical constructs from educational science and estimating the degree to which they are realized (e.g., identifying cognitive activation strategies); and “propose alternatives” demands theory-based assessment and the generation of pedagogically grounded alternatives. This progression aligns with established models of professional vision development (Blömeke et al., 2015), moving from perception through knowledge-based reasoning to situated decision making.
1.2.2. Digital Video Annotation as a Catalyst of Reflection
While simply watching video already supports reflection, digital video annotation (DVA) adds a further layer of interactivity and allows reflective documentation to be embedded directly within the video under analysis. A DVA tool allows learners (and supervisors) to attach time-stamped comments, tags, or hyperlinks directly to specific frames of a clip, thereby externalizing reflective thought in situ (Pérez-Torregrosa et al., 2017). A systematic review of 18 studies concluded that DVA typically fulfils three partially overlapping functions (documentation, reflection, communication/feedback) and that documentation and reflection dominate teacher-education uses (von Wachter & Lewalter, 2023). Since annotations are anchored to the very moment they describe, they preserve contextual cues that would otherwise be lost in retrospective narratives and foster evolving lines of reasoning (Blomberg et al., 2013). Empirical studies corroborate these affordances. For instance, Okumu et al. (2024) demonstrated that teacher candidates perceive time-stamped annotation feedback as more actionable and credible than conventional narrative feedback, although they still attribute higher authority to instructors than to peers. It can be argued that DVA operates as a bridging technology between video and reflection:
it conserves fleeting classroom phenomena by pinning comments to exact frames;
it anchors theoretical language (e.g., “cognitive activation”) to empirical evidence;
it facilitates dialogic feedback cycles without requiring face-to-face co-presence; and
it generates process data (number, timing, and content of annotations) that can serve both feedback and research purposes.
However, technical affordances alone do not guarantee learning gains; clear prompts, modeling, and alignment with assessment criteria remain critical design features (von Wachter & Lewalter, 2023).
1.2.3. The Analysis-Competence Model: Three Hierarchical Levels of Reflective Engagement
To operationalize the quality of reflection elicited by video analysis and annotation, Plöger et al. (2015) developed the Analysis-Competence Model. Drawing on expertise research and the professional-vision literature, the authors conceptualize analysis competence as a two-dimensional construct. The first dimension encompasses content domains, while the second dimension represents cognitive operations ranging from simple perception to complex evaluation. This matrix structure yields a hierarchical taxonomy in which higher levels require both broader content integration and more sophisticated cognitive processing (Plöger et al., 2015). As shown in Figure 2, the empirically validated model arranges reflective performance along three hierarchical main stages with five subordinate levels.
Empirical Rasch analyses with 800 participants across all phases of teacher education validated this hierarchical ordering and supported the model’s ability to discriminate novices from experts (Plöger & Scholl, 2014). Importantly, the model aligns tightly with video-supported reflection: the stimulus used for validation was a 45-min physics lesson video, and participants’ written analyses were scored against the five levels.
Within the present research project, the model serves two purposes. First, it provides analytic language for describing the depth of reflection that DVA aims to elicit—moving students from noticing (Levels 1–2) towards explanation and justified alternatives (Levels 3–5). Second, its graded rubrics allow the evaluation of intervention effects in a psychometrically sound manner. For example, our own scaffolded annotation design hypothesizes that explicit prompts targeting cause–effect relations will particularly foster transitions from Level 2 to Level 3.
Following von Aufschnaiter (2023), we construe reflection as an analytic, solution-oriented thinking process that may be internally oriented (towards one’s own beliefs and habits) or externally oriented (towards the analysis and improvement of teaching, materials, or processes). In this study, reflection on video vignettes of one’s own lessons is treated as externally oriented reflection—i.e., analysis—and is therefore operationalized via the Analysis-Competence Model introduced above. For readability, we use the umbrella term “reflection” throughout, while retaining the ACM’s original labels when reporting stage-specific outcomes.
1.2.4. Summary and Implications for Design
The literature reviewed here substantiates three claims. (1) Video-based learning is an evidence-backed pathway for cultivating reflective competence; its impact exceeds that of text-based or memory-based reflection when tasks are deliberately structured. (2) Digital video annotation amplifies these benefits by externalizing cognition, synchronizing feedback, and permitting fine-grained analysis, provided that learners receive clear heuristic guidance. (3) The Analysis-Competence Model offers a validated yardstick for gauging how deeply student teachers engage with classroom video, distinguishing superficial description from theory-grounded synthesis and holistic appraisal.
Therefore, designing effective video-supported reflection sequences entails aligning instructional prompts, DVA tool functions, and outcome measures with the hierarchical logic of the Analysis-Competence Model. Practically, this means the following:
Starting with focused noticing prompts linked to Levels 1–2;
Progressively introducing interpretive lenses (e.g., cognitive-activation theory) to scaffold Levels 3–4;
Culminating in global lesson analyses and alternative scenario planning emblematic of Level 5.
The following sections detail how the present study operationalizes these principles within a semester-long practicum and how the resulting gains in analysis competence are assessed.
1.3. The Present Study
Building on the theoretical strands outlined above, and guided by the overarching goal of developing and evaluating a scaffolded reflection sequence anchored in self-recorded lesson videos during the Praxissemester, this paper addresses the following research questions (RQs):
RQ1: To what extent does media-mediated reflection via self-recorded videos influence pre-service teachers’ global analysis competence in the context of the Praxissemester?
RQ2: How does media condition shape the development of students’ dimensional analysis competence?
RQ2a: What effect does the analysis of self-recorded videos with a structured digital annotation tool have on the growth of pre-service teachers’ analysis competence?
RQ2b: What effect does the analysis of self-recorded videos without an annotation tool have on the growth of pre-service teachers’ analysis competence?
RQ3: In what ways do the findings inform design principles for cultivating high-quality reflection—operationalized as “analysis competence”—within long-term practicums such as the Praxissemester?
These questions allow us to isolate the incremental value of a structured annotation scaffold (RQ2a vs. RQ2b) and to connect empirical outcomes to actionable design recommendations for teacher-education programs.
In line with the research questions and the theoretical framework, which operationalizes externally focused reflection as “analysis competence”, the following was hypothesized:
H1: Video-based reflection on one’s own teaching sequences will enhance pre-service teachers’ analysis competence.
H2: Video-based reflection will lead to greater improvements in analysis competence than memory- and text-based reflection.
H3: Video-based reflection with a digital annotation tool will improve pre-service teachers’ analysis competence to a greater extent than video-based reflection alone.
2. Materials and Method
2.1. Design and Setting
The study employed a three-group, pre-/post-test quasi-experimental design with random assignment of intact groups, which fit seamlessly into the organizational logic of the Praxissemester. Before enrolment opened, the university’s central timetabling office randomly assigned pre-service teachers to one of three seminar groups accompanying the school placement (Begleitseminar). This administrative allocation occurred independently of researcher influence, eliminating potential selection bias. The three intact seminar groups were then randomly assigned to the experimental conditions (VG1: Video + Annotation Tool; VG2: Video Only) or the control condition (CG: Text Only) prior to roster publication, thereby preventing self-selection.
All participants had previously completed an identical preparatory seminar (Vorbereitungsseminar) that introduced the theoretical frameworks (Behaviorism, Cognitivism, Constructivism, Motivation, Classroom Management, Cognitive Activation, Constructive Support) and the three-step heuristic used in the study. This shared foundational course ensured that all participants, regardless of their subsequent group assignment, possessed comparable theoretical vocabulary and analytical tools at baseline, though the groups remained naturally heterogeneous in demographic characteristics (age, gender, semester). Using whole seminars as the unit of assignment avoided contamination between conditions that would have occurred had students within one cohort used different tools.
All three cohorts followed the same Praxissemester schedule (Section 2.4) and differed only in the medium and structure of the reflection tasks (annotation vs. unstructured video vs. text). Online preparation and asynchronous materials were hosted in PANDA, the university’s Moodle-based learning management system, which had already proved reliable for blended and flipped arrangements in previous pilot iterations.
Following an exploratory comparison of five commercial and open-source tools against the criteria of usability, annotation affordances, data control, collaboration, and cost, the commercial tool “edubreak® Campus” was chosen to realize the structured annotation scaffold central to VG1: it was the only tool that combined timeline-anchored comments, peer-threaded dialogue, fine-grained rights management, and full compliance with the General Data Protection Regulation (GDPR) of the European Union. Participants could upload 30- to 45-min self-recorded lessons, attach time-stamped annotations directly in the video, and insert symbols such as arrows and colored circles. In doing so, participants completed the three-step heuristic (“describe → evaluate → propose alternatives”) directly in the video player (see Section 2.3 for item mapping).
In contrast, VG2 did not have access to a digital annotation tool and used only the self-recorded video to reflect on their own teaching. Finally, the CG did not self-record lessons but instead used remembered vignettes as the basis for written reflections.
All feedback (peer + instructor) occurred in a standard peer-group discussion forum moderated by the instructor, so that the presence or absence of videos and/or annotations was the only systematic design difference.
Participants filmed with their own devices (BYOD), supplemented on demand by loan tablets and clip-on microphones supplied by the chair. This solution balanced image quality, ease of use, and cost while meeting ethical and GDPR requirements for pupil privacy. Written consent was obtained from all persons visible in the videos or their legal custodians; raw files were stored on the participants’ devices and deleted after the end of the seminar, in accordance with state law and GDPR requirements.
2.2. Participants
Sixty-six pre-service teachers were enrolled across the three cohorts of the Begleitseminar (the accompanying university seminar running in parallel to the school placement; CG: 23; VG1: 21; VG2: 22). After excluding eleven students who either (a) had not completed the compulsory preparatory seminar in the preceding term or (b) missed one of the two measurement points, the analytic sample comprised 55 participants (overall attrition = 11/66 = 16.7%). Attrition by condition was of similar magnitude (CG: 3/23 = 13.0%; VG1: 4/21 = 19.0%; VG2: 4/22 = 18.2%).
The demographic breakdown of the analytic sample is shown in Table 1.
As Table 1 indicates, 74.5% of participants were female, reflecting typical gender distributions in secondary-teacher education programs. Experimental Group 2 exhibited the highest female proportion (88.8%), whereas the Control Group was more balanced (60% female). The overall mean age was 26.45 years (SD = 4.67), with Experimental Group 1 showing a slightly higher age variance (M = 27.00, SD = 6.52). The average duration of study stood at 10.00 semesters (SD = 2.97), again indicating considerable heterogeneity, which is characteristic of students who enter specific teaching tracks after initial vocational training. Baseline equivalence was supported by the rmANOVA: the between-subjects group effect on the global score was non-significant, F(2, 51) = 1.20, p = 0.310, η2p = 0.04; likewise, stage-specific rmANOVAs showed non-significant group effects (all p > 0.05).
Thus, any post-intervention differences can be interpreted against a statistically comparable starting point.
All 55 analyzed students provided written informed consent. Participation in data collection was voluntary and anonymous. No financial incentives were offered; course credit was identical across conditions.
2.3. Measures
Analysis competence was assessed with a video-based analysis-competence test developed and validated for this study. The instrument couples a full-length authentic classroom video with a multiform questionnaire that operationalizes the five ordinal levels of the Analysis-Competence Model (ACM; Plöger et al., 2015).
As a stimulus, all participants viewed the same 40-min recording of a lower-secondary history lesson. A whole-lesson stimulus was chosen because higher-order reflection (Levels 3–5) presupposes insight into temporal coherence and contingency patterns across an entire instructional episode.
Twenty-one items are arranged in blocks that mirror the ACM’s hierarchy:
Stage I—Analytic Competence (Levels 1 and 2). Seven binary items probe descriptive noticing (“trifft zu/trifft nicht zu” [“applies/does not apply”]), and seven six-point frequency ratings (1 = sehr häufig [“very often”]…6 = gar nicht [“not at all”]) gauge the prevalence of surface-level features.
Stage II—Synthetic Competence (Levels 3 and 4). Seven multiple-choice items invite predictions of likely instructional effects, while seven open prompts ask respondents to propose theory-grounded alternatives for specific lesson segments.
Stage III—Process Competence (Level 5). Three linked tasks—global lesson rating, written justification, and redesign proposal—capture holistic, theory-informed appraisal.
Items are evenly distributed across the seven theoretical strands taught in the preparatory seminar (Behaviorism, Cognitivism, Constructivism, Motivation, Classroom Management, Cognitive Activation, Constructive Support).
Responses are first coded as beherrscht (“mastered” = 2), teilweise beherrscht (“partially mastered” = 1), or nicht beherrscht (“not mastered” = 0) according to item-specific scoring criteria that differentiate binary, Likert, and open formats. For each theoretical framework (e.g., Motivation), a weighted score is then calculated at each stage tier—(Level 1 × 1 + Level 2 × 2)/3 for Stage I; (Level 3 × 1 + Level 4 × 2)/3 for Stage II; Level 5 alone for Stage III—so that higher-level reasoning contributes proportionally more. Aggregating across the seven theoretical frameworks yields three stage scores (I, II, and III) and, finally, a global score across all three stages.
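To make the aggregation transparent, the following R sketch computes the stage and global scores for one participant. It is a minimal illustration of the scoring rules described above; the object names and the randomly generated level codes are ours and not part of the study’s materials.

```r
# Minimal sketch of the ACM scoring, assuming level scores per framework are
# coded 0 (not mastered), 1 (partially mastered), or 2 (mastered).
# All names and the random data are illustrative, not from the study.
stage_scores <- function(l1, l2, l3, l4, l5) {
  c(stage1 = (l1 * 1 + l2 * 2) / 3,  # Levels 1-2: higher level weighted double
    stage2 = (l3 * 1 + l4 * 2) / 3,  # Levels 3-4
    stage3 = l5)                     # Level 5 enters on its own
}

# One row per theoretical framework (7), one column per ACM level (5).
framework_levels <- matrix(sample(0:2, 7 * 5, replace = TRUE), nrow = 7,
                           dimnames = list(NULL, paste0("level", 1:5)))

per_framework <- t(apply(framework_levels, 1,
                         function(x) stage_scores(x[1], x[2], x[3], x[4], x[5])))
stage_totals <- colSums(per_framework)  # Stage I, II, and III scores
global_score <- sum(stage_totals)       # global analysis-competence score
```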
Prior to data collection, content validity was examined through an expert review with eight panelists drawn from school pedagogy/educational science and school practice. The panel comprised five certified teachers with completed preparatory service (Second State Examination) and several years of classroom experience, one teacher with a First State Examination, and two (school) pedagogues/educational scientists. Six panelists were active in teacher education at the time of the review and two in active school service. Each expert independently evaluated the item pool for representativeness, clarity, and alignment with the targeted constructs and lesson situations. Reviews were conducted as individual, semi-structured expert interviews; sessions were audio-recorded, transcribed, and analyzed using structured content analysis. Feedback informed minor refinements to wording and rubric anchors prior to piloting.
Inter-rater reliability for the open items—based on 686 double-coded answers (≈13% of the corpus)—produced a median Cohen’s κ = 0.699 across the seven theoretical frameworks, exceeding the 0.60 benchmark for “good” agreement. Retest stability, estimated in the untreated control group over a 14-week interval, averaged r = 0.85, indicating marked temporal consistency despite the instrument’s complexity.
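Such a reliability check can be reproduced with a few lines of R. The sketch below assumes a hypothetical data frame codes of double-coded answers with columns framework, coder1, and coder2; it implements standard unweighted Cohen’s κ and is not the study’s actual analysis script.

```r
# Unweighted Cohen's kappa for two coders rating 0/1/2 (hedged sketch).
cohens_kappa <- function(r1, r2) {
  tab <- table(factor(r1, levels = 0:2), factor(r2, levels = 0:2))
  p_obs    <- sum(diag(tab)) / sum(tab)                      # observed agreement
  p_chance <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
  (p_obs - p_chance) / (1 - p_chance)
}

# Kappa per theoretical framework, then the median across the seven.
kappas <- tapply(seq_len(nrow(codes)), codes$framework, function(rows) {
  cohens_kappa(codes$coder1[rows], codes$coder2[rows])
})
median(kappas)
```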
To parallel the intervention setting, the test was delivered in a 60-min supervised session: each student received a tablet pre-loaded with the stimulus video and a paper questionnaire. Participants could pause, rewind, and revisit scenes at will; thus, media competence differences were minimized and ecological validity preserved. Identical procedures and instructions at pre- and post-test ensured objectivity and controlled for testing artefacts.
2.4. Procedure and Ethical Considerations
The intervention spanned two consecutive semesters. During the summer term, all students completed the Vorbereitungsseminar (preparatory seminar). Delivered in an inverted-classroom format, this seminar revisited four theoretical modules through self-study tasks in the university’s Moodle instance (PANDA), complemented by classroom simulations that were video-recorded with a bring-your-own-device (BYOD) approach. These activities familiarized every participant with the three-step heuristic and with basic video-production logistics, thereby standardizing conceptual and technical prerequisites before group differentiation occurred.
At the start of the following winter term, the three cohorts of the Begleitseminar were randomly assigned to the experimental or control conditions (see Section 2.1). The seminar comprised twelve weekly meetings: three sessions of theoretical and organizational induction, seven sessions devoted to the analysis of students’ own teaching, and one concluding session. In the opening meeting, participants received a BYOD recording guide, signed an information sheet confirming that participation was voluntary and anonymous, and took the pre-test. Since all data were anonymized and every participant was of legal age, the project proceeded without a formal ethics-committee review, in line with university policy. Students in VG1 and VG2 were required to film a full lesson in their practicum school. The organizational complexity of securing parental and head-teacher consent meant that most recordings took place only after the autumn break; the first clips therefore became available in November, i.e., by the fourth seminar session. Members of the control group did not record video but compiled narrative vignettes from their lesson plans and post-lesson notes.
From session 4 to session 11, the seminar sessions were devoted to theory-guided reflection on participants’ own teaching. VG1 uploaded its recordings to edubreak® Campus and analyzed them with time-stamped annotations and visual markers while following the “describe → evaluate → propose alternatives” heuristic; VG2 viewed their footage on their own devices and produced unguided written analyses; the CG conducted text-based reflections on remembered episodes. To foster peer learning, the two video groups re-organized themselves into self-selected triads that exchanged feedback on each other’s clips. Each participant conducted one analysis of their own recorded lesson and two additional analyses of peer videos, resulting in a total of three structured reflections per person. A mandatory 90-min expert-feedback session with the lecturer could be scheduled at any point during this phase.
In week 12, all participants completed the post-test under conditions identical to the pre-test. The final session was used to clarify administrative deadlines, to discuss preliminary group-level findings, and to assist students in deleting residual video files from private devices in accordance with GDPR guidance issued at the outset.
2.5. Data Analysis
All participants completed the video-based analysis-competence test once in the first Begleitseminar session (pre-test) and again in the closing session 14 weeks later (post-test). Changes in performance, therefore, served as the principal indicator of learning. To assess (a) overall growth in analysis competence and (b) differential growth across media conditions, we combined a conventional repeated-measures analysis of variance (rmANOVA) with linear mixed modeling (LMM).
The rmANOVA treated Time (pre vs. post) as a within-subjects factor and Condition (VG1, VG2, CG) as a between-subjects factor. Separate models were estimated for (a) the global score and (b) the three stage scores that correspond to the Analysis-Competence hierarchy (Stages I–III). Prior to analysis, the normality of residuals was examined with Shapiro–Wilk tests. Effect sizes are reported as partial eta-squared (η2p), following Cohen’s (1988) conventions (η2p = 0.01 small; 0.06 medium; 0.14 large).
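For illustration, such an rmANOVA can be specified in base R as follows; the long-format data frame d_long and its column names are assumptions made for this sketch rather than the study’s actual code.

```r
# Hedged sketch of the reported rmANOVA, assuming long-format data with one
# row per participant x measurement point; all object names are illustrative.
d_long$participant <- factor(d_long$participant)
d_long$time        <- factor(d_long$time, levels = c("pre", "post"))
d_long$condition   <- factor(d_long$condition, levels = c("CG", "VG1", "VG2"))

fit <- aov(global_score ~ condition * time + Error(participant / time),
           data = d_long)
summary(fit)  # between-subjects Condition; within-subjects Time, Time x Condition
# Partial eta-squared per effect: SS_effect / (SS_effect + SS_error).
```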
Since intact seminar groups were used as administrative units and each condition was delivered in a distinct seminar, seminar membership is confounded with Condition. Consequently, between-condition effects (and Group × Time interactions) may partly reflect seminar-specific influences and should be interpreted with caution. Each LMM specified Time, Condition, and their interaction (plus the covariates sex, age, and semester) as fixed effects, with random intercepts for participants to account for the repeated-measures structure. Degrees of freedom for fixed-effect tests were obtained via the Satterthwaite method (lmerTest). In the rmANOVA, cases with missing data were deleted list-wise. The LMM relied on maximum-likelihood estimation, which provides unbiased parameter estimates under the assumption of missing at random. Alpha was set at 0.05 (two-tailed) for all tests.
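A minimal sketch of this mixed-model specification, using the lmerTest package named above (data frame and variable names again illustrative, not the study’s code), could look like this:

```r
# Random-intercept LMM with Time x Condition and covariates, fitted by ML.
library(lmerTest)  # lme4 models with Satterthwaite df for fixed-effect tests

m <- lmer(global_score ~ time * condition + sex + age + semester +
            (1 | participant),          # random intercepts for participants
          data = d_long, REML = FALSE)  # maximum-likelihood estimation

summary(m)  # fixed-effect estimates with Satterthwaite degrees of freedom
```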
This dual analytic strategy—rmANOVA for ease of interpretation and LMM for statistical precision—allows for a rigorous examination of whether the integrated annotation scaffold (VG 1) yielded gains in reflective competence that exceed those observed in the video-only (VG 2) and text-only (CG) conditions, and whether such gains are concentrated at the higher ACM stages where theory-based evaluation and redesign are expected to emerge.
3. Results
3.1. Baseline Equivalence
Prior to any intervention, the three cohorts (Experimental Group 1, VG1: Video + Annotation Tool, n = 17; Experimental Group 2, VG2: Video Only, n = 18; Control Group, CG: Text Only, n = 20) were compared on their pre-test Analysis-Competence scores to verify equivalence. Baseline equivalence was supported by the rmANOVA: the between-subjects Group effect on the global score was non-significant, F(2, 51) = 1.20, p = 0.310, η2p = 0.04 (cf. Table 2). Consequently, all post-intervention contrasts can be attributed to the instructional treatments rather than to pre-existing disparities.
3.2. Global Competence Gains
To evaluate overall growth in reflective competence, we first conducted a repeated-measures ANOVA (rmANOVA) on the global Analysis-Competence score, with Time (pre-test vs. post-test) as the within-subjects factor and Condition (VG1, VG2, CG) as the between-subjects factor. Results indicate a significant main effect of Time, F(1, 51) = 35.11, p < 0.001, η2p = 0.41, confirming that participants improved from pre-test to post-test overall. The critical Time × Condition interaction, however, was not significant, F(2, 51) = 1.50, p = 0.232, η2p = 0.06, indicating that the three instructional formats did not differ reliably in the size of their gains. Nonetheless, descriptive contrasts favored the video-plus-annotation cohort. Within-group analyses showed that this group’s mean score climbed from 25.78 (SD = 4.83) at pre-test to 30.78 (SD = 4.50) at post-test, t(16) = 5.49, p < 0.001, d = 1.07. The video-only group improved from 25.56 (SD = 3.89) to 28.98 (SD = 4.14), t(17) = 3.36, p = 0.004, d = 0.85, while the text-based control also registered a significant, though smaller, gain from 25.17 (SD = 4.36) to 27.67 (SD = 4.35), t(19) = 2.30, p = 0.033, d = 0.57. Hence, although the interaction term fell short of significance, the pattern of effect sizes suggests that structured video annotation may confer an incremental advantage over both video viewing without annotation and traditional text-only reflection.
3.3. Stage-Specific Gains
Next, we disaggregated effects by competence levels to determine whether annotation especially facilitated deeper reflection (Stages II–III). Separate rmANOVAs for each stage revealed the following:
Stage I (Analytic Competence): Neither the main effect of Time nor the Time × Condition interaction reached significance, F(1, 51) = 2.39, p = 0.128, η2p = 0.04, and F(2, 51) = 1.05, p = 0.359, η2p = 0.04, respectively. Scores were already close to the maximum at pre-test, leaving little room for improvement; consequently, none of the three cohorts changed reliably at this surface-noticing stage.
Stage II (Synthetic Competence): The test yielded a strong main effect of Time, F(1, 51) = 24.86, p < 0.001, η2p = 0.33, but the interaction with Condition was non-significant, F(2, 51) = 1.77, p = 0.180, η2p = 0.07. Nevertheless, within-group contrasts revealed a differentiated picture. The video-plus-annotation group (VG1) increased its mean by 1.69 points, t(16) = 3.62, p = 0.002, d = 0.79, while the text-based control (CG) also improved, by 1.90 points, t(19) = 3.57, p = 0.002, d = 1.17. In contrast, the video-only cohort (VG2) showed a smaller, statistically non-significant gain of 0.83 points, t(17) = 1.71, p = 0.106, d = 0.53. We discuss a possible explanation for the control group’s Stage-II gains in Section 4.1.
Stage III (Process Competence): A large main effect of Time emerged again, F(1, 51) = 21.87, p < 0.001, η2p = 0.30, accompanied by a trend-level Time × Condition interaction, F(2, 51) = 2.63, p = 0.082, η2p = 0.09. Within-group tests showed that both video cohorts achieved substantial advances: VG1 rose by 2.30 points, t(16) = 4.24, p < 0.001, d = 0.92, and VG2 by 2.28 points, t(17) = 3.47, p = 0.003, d = 0.86. The control group, however, registered only a trivial, non-significant change of 0.36 points, t(19) = 0.99, p = 0.333, d = 0.26. Although both VG1 and VG2 achieved significant within-group gains at this level, between-condition differences were not statistically significant; descriptively, VG1 showed the largest improvement, indicating a potential incremental benefit of structured annotation that remains at the trend level.
Taken together, the stage-specific analyses show a differentiated pattern: basic descriptive noticing (Stage I) was already well established and remained stable across all groups; synthesis skills (Stage II) improved significantly in the annotation group (VG1) and the text-based control (CG) but not in the video-only group (VG2); and process-oriented reasoning (Stage III) grew markedly in both video conditions but not in the control group, with the annotation group showing descriptively the largest gains.
Table 2 provides an overview of the repeated-measures ANOVA results.
3.4. Linear Mixed Models
Since the three seminar cohorts formed intact clusters, the repeated-measures analyses were replicated with linear mixed models. The global Analysis-Competence score served as the dependent variable; fixed effects comprised Time (pre vs. post), the two dummy-coded experimental conditions, and their interactions, while age, sex, and number of semesters were entered as covariates. Random intercepts for participants accounted for the within-person correlation induced by the repeated-measures design, and degrees of freedom for fixed effects were obtained via the Satterthwaite method.
For the full model, the fixed-effect tests yielded three key results (see Table 3):
Time was a robust positive predictor, Estimate = 2.50, SE = 0.96, p = 0.012, confirming that—after accounting for clustering and covariates—participants gained on average 2.5 points on the global scale between pre- and post-test.
The Time × VG1 interaction was positive but narrowly missed conventional significance, Estimate = 2.50, SE = 1.41, p = 0.083. This coefficient indicates that the video-plus-annotation group improved by roughly an additional 2.5 points beyond the control group’s slope, mirroring the trend observed in the rmANOVA.
The Time × VG2 interaction was small and non-significant, Estimate = 0.85, SE = 1.41, p = 0.549, suggesting that watching self-recorded video without structured annotation did not accelerate growth relative to text-based reflection.
Main effect contrasts for group membership (VG1 vs. CG: Estimate = −1.62, p = 0.505; VG2 vs. CG: Estimate = 0.19, p = 0.940) were likewise non-significant, underscoring that initial competence levels were comparable across conditions. Covariates (age, sex, semesters) contributed no reliable variance (all p > 0.090).
Overall model quality was acceptable (marginal R2 = 0.23; conditional R2 = 0.58; AIC = 620.49; BIC = 649.99; log-likelihood = −299.24), indicating that the fixed effects accounted for about one quarter of the explainable variance, while the addition of random effects captured a further third. Supplementary analyses of individual competence stages confirmed the pattern observed in the global model, with both video conditions showing advantages at higher competence levels.
Taken together, the LMMs corroborate the rmANOVA by showing (a) a robust overall improvement and (b) an incremental—but not yet statistically definitive—advantage for the annotation scaffold. With larger samples, the observed trend of steeper competence growth in VG1 might reach significance, which would lend further support to the pedagogical value of structured video annotation.
4. Discussion
This study investigated the impact of different media-supported reflection formats on the development of pre-service teachers’ reflection competence during a semester-long practicum. By comparing a group using a structured digital video annotation (DVA) tool (VG1), a group using video only (VG2), and a text-based control group (CG), we sought to determine the incremental value of specific instructional scaffolds. The findings offer valuable insights into how to bridge the theory–practice gap in teacher education, particularly within practicum settings like the German Praxissemester. This discussion will interpret the principal findings in relation to our research questions, outline their practical implications for teacher education, and acknowledge the study’s limitations while proposing avenues for future research.
4.1. Principal Findings and Interpretation
Our first research question (RQ1) asked whether media-mediated reflection enhances pre-service teachers’ global analysis competence. The results provide a clear affirmative answer: participants across all conditions demonstrated significant growth in their ability to analyze classroom situations over the course of the semester, as evidenced by the strong main effect of Time. This general finding aligns with our first hypothesis (H1), which specifically predicted that video-based reflection would enhance analysis competence. Indeed, both video groups improved significantly on the global score and on process-oriented, whole-lesson reasoning, with the largest within-group gains observed in these conditions—although between-condition differences did not reach statistical significance. In the situation-focused domain (Stage II), evidence was mixed: the annotation group improved significantly, while the video-only group did not. This pattern suggests that H1 is supported, most strongly for whole-lesson reasoning, with synthesis-level gains contingent on scaffolding. These findings align with a robust body of literature confirming that video-based learning (VBL) offers a more potent medium for reflection than text-based or memory-based tasks, largely because it provides an authentic, repeatable, and contextually rich record of practice (Gaudin & Chaliès, 2015; Weng et al., 2023).
The central contribution of this study, however, lies in the answer to our second research question (RQ2), which probed how different media conditions shaped competence development. The findings reveal that how video is used is a critical moderator of its effectiveness. While both video groups exhibited larger within-group gains and, at whole-lesson reasoning, gains that descriptively exceeded those of the text-only control, the group using the DVA tool with a structured heuristic (VG1) showed the most consistent and pronounced development, with between-condition differences not reaching conventional significance.
Specifically, the annotation scaffold (RQ2a) appeared beneficial for fostering synthetic competence (Stage II), where learners must connect observed events to theoretical constructs. The DVA group (VG1) showed a clear, significant improvement at this level, and the text-only control (CG) likewise improved significantly; by contrast, the video-only group (VG2) did not reach significance. This suggests that the heuristic prompts (“describe → evaluate → propose alternatives”) embedded within the annotation tool effectively guided participants to move beyond simple noticing and engage in the crucial work of theoretical sense-making, while also indicating that structured written reflection can support synthesis in the absence of video scaffolds. One possible account for the control group’s gains is that the act of writing itself operates as a scaffold: by requiring participants to make classroom episodes intelligible for a reader, it pushes them to (a) produce precise, audience-oriented descriptions that extend beyond an immediate first-person teaching perspective (the “describe” step) and (b) narratively organize and compress classroom complexity, which can prompt evaluative links to theoretical constructs (the “evaluate” step). Without this explicit scaffold (RQ2b), the video-only group (VG2) did not show reliable growth at this intermediate stage, consistent with evidence that unstructured viewing can leave novices struggling to synthesize observations.
At the highest level of process competence (Stage III), which involves holistic, theory-driven evaluation, both video groups showed significant within-group gains that descriptively exceeded those of the control group; however, between-condition differences were not statistically significant. This pattern is consistent with the view that engaging with one’s own teaching on video supports the development of advanced reflective skills (Tripp & Rich, 2012). Nevertheless, a clear pattern emerged: the DVA group (VG1) demonstrated the largest gains, with the interaction effect approaching statistical significance in both the rmANOVA and LMM analyses. This trend suggests that while any form of video analysis is beneficial, the structured DVA format provides an incremental advantage by externalizing thought and anchoring it to specific moments in practice, thereby facilitating more sophisticated, theory-grounded appraisals (Blomberg et al., 2013; von Wachter & Lewalter, 2023). Hypotheses H2 and H3 received descriptive support in the pattern of within-group gains but not confirmatory support in between-condition tests (non-significant interactions).
In summary (RQ3), the findings indicate that to cultivate high-quality reflection, it is not enough simply to provide pre-service teachers with video recordings; the design of the reflective task is paramount. The structured video annotation scaffold acted as a cognitive tool (Herzig & Grafe, 2004), fostering deeper, theory-informed reflection in practicum settings.
4.2. Implications for Teacher Education
The results of this study have direct and actionable implications for the design of practicum seminars and teacher education curricula:
Prioritize Scaffolding Over Unstructured Viewing: The primary implication is that the pedagogical design surrounding video use is more critical than the mere presence of the technology. Teacher education programs should move beyond simply requiring students to record and watch their lessons and instead implement structured reflective frameworks. The “describe → evaluate → propose alternatives” heuristic used here provides a simple yet effective model that can be adapted for various theoretical lenses.
Consider Digital Video Annotation Tools: The promising trends observed in the DVA condition suggest potential value in tools that allow for time-stamped, in situ commenting. Such platforms make the link between theory and practice tangible by anchoring abstract concepts to concrete classroom moments. They also create a permanent, dialogic record of reflection that is more precise and actionable than separate written reports. Programs should consider integrating GDPR-compliant DVA tools into their learning management systems.
Align Reflective Tasks with Course Content: The observed patterns suggest that the intervention benefited from its alignment with the theoretical modules taught in the preparatory seminar. By asking students to apply specific theoretical constructs in their annotations, the design ensured that reflection was not an isolated activity but an integrated part of the learning process. This model of tight coupling between theory input and video-based application should be a core principle of practicum design.
In sum, our findings underscore that video reflection alone is insufficient to guarantee high-quality reflection; to realize the full potential of video-supported reflection, university programs should embed structured prompts coherent with both theoretical modules and recognized reflection levels within practicum seminars. Annotation tools can offer an elegant solution to this approach.
4.3. Limitations and Future Research
While this study offers valuable insights, its limitations must be acknowledged, each pointing toward avenues for future research.
Sample Size and Statistical Power: With a final sample of 55 students, the study was underpowered to reliably detect small-to-medium effects. This is the most likely explanation for why several key interaction effects (e.g., the advantage of VG1 at Stage III) only reached trend-level significance. Moreover, the seminar-level clustering could not be modeled separately (one seminar per condition; three clusters), precluding random-effects or cluster-robust adjustments at that level. Future research should replicate this design with larger, multi-institutional samples to provide more robust estimates of the effects and confirm the trends observed here.
Potential for Self-Selection Bias: Although seminar cohorts were randomized, individual participation in the data collection was voluntary. It is possible that students with higher motivation or pre-existing technical skills were more likely to consent, potentially influencing the results. Future studies should attempt to gather baseline data on all enrolled students, including non-participants, to assess the extent of any self-selection bias.
Limited Generalizability: The study was conducted at a single German university within the specific context of the NRW Praxissemester. The findings’ generalizability to other teacher education systems, subject domains, or cultural contexts remains to be established. Multi-site and cross-cultural replication studies are needed to test the robustness of these design principles.
Possible Ceiling Effects: The measure for Analytic Competence (Stage I) may have been too easy for participants, who scored highly at pre-test, leaving little room for measurable growth. This may explain the lack of significant effects at this stage. Future research could develop or employ more challenging instruments to capture finer-grained developments in professional noticing. Alternatively, the absence of measurable gains at Stage I might reflect the nature of this initial level itself, which may represent a foundational skill that pre-service teachers tend to acquire early in their studies. This would suggest that the lack of growth is not necessarily a failure of the intervention but rather a ceiling effect inherent in the model, indicating that Stage I captures a relatively low-threshold competence.
Despite its limitations, this study makes a clear contribution by isolating the specific value of a structured DVA scaffold in fostering higher-order reflective competence. In doing so, it informs the ongoing scholarly and practical debate on how video use in teacher education can be designed for optimal pedagogical impact.
5. Conclusions
The present study suggests that embedding a structured annotation scaffold within video-based reflection tasks yields the largest within-group improvements in deep-structure analysis competence—particularly at the synthesis (Stage II) and process-oriented reasoning (Stage III) levels; between-condition differences did not reach conventional significance. While simple video viewing (VG2) produced significant gains in higher-order reflective skills (Stage III), it did not yield reliable improvements at the synthesis level (Stage II). Only the Video-plus-Annotation condition (VG1) yielded significant improvements at both Stages II and III of the ACM. Notably, the text-only control also improved at Stage II, which may be explained by the idea that structured writing itself functions as a scaffold: it requires precise, audience-oriented description and narrative compression, thereby inviting evaluative links to theoretical constructs.
From a practical standpoint, our quasi-experimental results underscore the value of designing video-based reflection tasks that integrate (a) clearly articulated theoretical modules, (b) a time-stamped, three-step annotation heuristic, and (c) feedback mechanisms. Such an architecture not only orients pre-service teachers toward relevant pedagogical constructs but also guides them through incremental cognitive steps, thereby enabling more authentic, theory-informed reflections. Consequently, teacher-education programs should consider adopting or adapting annotation tools that align seamlessly with curricular competencies (KMK, 2004) and that afford direct, contextually embedded feedback from peers and expert facilitators.
Despite these encouraging outcomes, given this study’s single-institution context, cross-institutional replications are imperative to establish external validity. Conducting similar quasi-experimental interventions at diverse universities—possibly with variations in mentor support structures and school partnerships—will clarify whether the annotation-driven model generalizes across differing program cultures and resource availabilities (Ulrich et al., 2020).
In conclusion, by demonstrating that a Video plus Annotation design shows promise for advancing deep-structure Analysis-Competence, this study offers a replicable blueprint for embedding high-quality reflection in long-term practicums. Future investigations are essential to optimize scaffold design, assess longitudinal impacts on teaching practice, and ensure that these promising results can be realized in diverse teacher-education contexts.