1.1. Research Background: TPACK in the Generative AI Era
The Technological Pedagogical Content Knowledge (TPACK) framework, introduced by Mishra and Koehler [1], builds on Shulman’s foundational work on Pedagogical Content Knowledge (PCK) [2,3] and conceptualizes effective technology integration as the dynamic intersection of Content Knowledge (CK), Pedagogical Knowledge (PK), and Technological Knowledge (TK). Over the past two decades, TPACK has provided a shared analytical language for examining how teachers develop integrated knowledge structures for technology-enhanced instruction across diverse educational contexts [4,5].
The rapid emergence of generative artificial intelligence (GenAI), particularly after the public release of ChatGPT in November 2022, challenges key assumptions underlying classical TPACK. Whereas earlier educational technologies were largely stable and predictable tools whose functions could be mastered through procedural proficiency [6], GenAI systems participate in meaning-making and representation by producing novel outputs through opaque and context-sensitive processes, often with nontrivial unpredictability [6,7]. This shift changes the cognitive task of teaching from operating tools to orchestrating tasks under generative uncertainty.
Recognizing this design reorientation, recent scholarship has proposed multiple extensions to the TPACK framework. For example, Intelligent-TPACK, initially conceptualized by Celik [8] and further developed by Chiu and colleagues [9,10], treats ethical knowledge as a core component and considers human–AI collaboration a distinct competency area. Using a Delphi study with teachers from diverse subject domains, Chiu et al. operationalized Intelligent-TPACK as an interrelated set of knowledge domains that includes AI-related technological, pedagogical, and content knowledge, human–AI collaboration knowledge, and ethical knowledge [10]. This line of work highlights a broader shift from “tool use” toward the design and governance of AI-augmented instructional activity.
The impetus for extending TPACK is further reinforced by contributions from the framework’s original co-creators. Mishra, Warr, and Islam identified five defining characteristics of GenAI (protean, opaque, unstable, generative, and social) that distinguish it from prior educational technologies [6]. These characteristics imply that effective integration is shaped not only by what teachers know, but also by how they monitor, evaluate, and recalibrate task structures during ongoing human–AI interaction, especially when outputs are uncertain or difficult to interpret.
Related frameworks have further expanded the conceptual boundaries of TPACK by foregrounding ethical judgment and treating AI not only as a tool but also as an active participant in instructional processes [7,11]. Together, these models converge on a common premise: GenAI integration requires more than incremental adjustments to traditional technology frameworks. The present study addresses a complementary gap: process-level accounts that explain how existing TPACK is dynamically activated and evaluated during iterative human–AI collaboration.
Empirical research on AI-TPACK has expanded rapidly. Large-scale survey studies document persistent gaps in teachers’ preparedness for AI-assisted teaching. For example, Wang et al. reported that pre-service teachers’ intentions to design AI-supported instruction were shaped by GenAI anxiety, social influence, and performance expectancy, with TPACK and self-efficacy serving as significant predictors [12]. Intervention studies likewise suggest that AI-related competencies can be developed through structured professional learning. Sun et al. observed gains in AI knowledge and teaching self-efficacy following intensive training [13], while Tan et al. reported both positive learning gains and “metacognitive recalibration” effects over an extended quasi-experimental program [14]. These findings motivate a framework that explains how such recalibration is enacted through task-level evaluation and iterative redesign.
Qualitative investigations have empirically identified iterative processes of inquiry, evaluation, and modification in pre-service teachers’ design work with GenAI [15]. However, a comprehensive theoretical framework is still needed to consolidate these empirical observations into a generalized cognitive model that explains how static TPACK resources are mobilized and transformed within these generative loops. Domain-specific studies have revealed both affordances and challenges across disciplines, including mathematics [16,17], visual arts [18], and science education [19]. Collectively, these findings suggest that effective AI integration depends not only on technical skill but also on reflective awareness of when and how AI should be deployed.
Despite this growing body of work, critical gaps remain. Existing AI-TPACK extensions primarily emphasize identifying requisite knowledge domains [9,10], while offering limited theoretical accounts of the cognitive processes through which such knowledge is activated during iterative human–AI collaboration [20,21]. Current frameworks insufficiently explain how educators navigate generative uncertainty, evaluate AI outputs against disciplinary criteria, and refine task structures to maintain epistemic responsibility.
Building on Wong’s concepts of Meta-Task Awareness (MTA) and the Chain of Learning Design and Evaluation (CoLDE) [22], the present study proposes the MTA–TPACK Dynamic Collaboration Spiral as a process-oriented framework. The model explains how static TPACK resources are mobilized through iterative generate–evaluate–refine trajectories, culminating in Visible Pedagogical Thinking as an inspectable record of mechanism-guided reasoning. In this framing, implications for teacher professional development are positioned as a theoretically grounded direction that requires subsequent empirical validation.
1.2. Generative AI for Scientific Visualization: The Case of Typhoon–Terrain Interactions
Scientific visualization represents a domain in which the limitations of prompt-based AI interaction become particularly salient. Translating complex scientific phenomena into accurate and pedagogically meaningful visual representations requires not only domain expertise but also continuous evaluative judgment. The visualization of tropical cyclone interactions with complex terrain provides an especially demanding test case for examining how human–AI collaboration unfolds under conditions of high epistemic responsibility.
Scientific illustration bridges abstract concepts and visual communication. Its evolution traces from prehistoric art [23] and ancient drawings [24] to the systematic rigor of classical scholars [25]. The Renaissance established foundational anatomical precision [26], while the printing press expanded knowledge dissemination [27]. Later centuries standardized botanical details [28], leading to modern transformations through photography, digital imaging, and virtual reality [29].
Recent advances in generative AI, including large language models and text-to-image systems such as Midjourney, DALL·E, and Stable Diffusion, have further transformed this landscape by enabling rapid production of detailed scientific imagery from natural language prompts [30,31,32]. These tools have expanded access to visualization while simultaneously introducing new risks related to scientific fidelity.
Within meteorology, prior work has demonstrated the use of generative AI for visualizing atmospheric processes, including tornado dynamics and tropical cyclone structures [33,34]. Similar applications have emerged across scientific disciplines, such as medicine, biology, and physics, highlighting the broad potential of AI-assisted visualization for science communication [35]. However, superficially plausible outputs do not guarantee alignment with underlying physical mechanisms, underscoring the indispensability of expert evaluation.
Tropical cyclone–terrain interactions, particularly over regions such as Taiwan’s Central Mountain Range, involve multiple well-documented physical mechanisms that any scientifically accurate visualization must capture [36,37]. These include orographic blocking [38,39,40,41], vortex splitting [39,41,42,43], asymmetric convection [41,42,44,45], and terrain-induced track deflection [36,38,40,45,46,47,48,49], as documented in extensive observational and modeling studies. Additional factors such as terrain-induced vorticity generation [39,50], moisture blocking [38,41], and looping or stalling behavior further complicate storm evolution [47,51]. Despite advances in numerical modeling, accurately representing these processes remains challenging, motivating continued investigation into physical mechanisms and communication strategies [37,41,50].
To ensure that the evaluation of AI-generated artifacts remains rigorously grounded in domain knowledge, we established a clear mapping between abstract physical mechanisms and their concrete representational manifestations. Table 1 summarizes the four key mechanisms governing tropical cyclone–terrain interactions and, crucially, defines the Observable Visual Indicators for each. These indicators serve as the specific criteria for the subsequent evaluation, ensuring that judgments of ‘Scientific Accuracy’ are based on observable evidence (e.g., specific streamline curvature) rather than subjective impression. This operationalization represents a critical function of Meta-Task Awareness: transforming static Content Knowledge (CK) into an active epistemic filter for visual assessment.
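To make this operationalization concrete, the following minimal sketch (in Python; all identifiers are hypothetical, and the indicator wordings paraphrase the mechanisms discussed above rather than quoting Table 1) illustrates how a mechanism-to-indicator mapping can act as an explicit evaluation checklist:

```python
# Hypothetical sketch of the mechanism-to-indicator mapping as an explicit
# evaluation checklist. Indicator wordings paraphrase the observable visual
# evidence discussed in this section; the authoritative mapping is Table 1.
MECHANISM_INDICATORS = {
    "orographic blocking": "upstream streamline deceleration and deflection around the mountain range",
    "vortex splitting": "secondary gyres visible on the lee side of the terrain",
    "asymmetric convection": "rainfall and cloud structure concentrated on the windward flank",
    "track deflection": "visible terrain-induced curvature of the storm track",
}

def missing_indicators(observed: dict[str, bool]) -> list[str]:
    """List the mechanisms whose visual indicators are absent from an image,
    yielding the concrete targets for the next prompt refinement."""
    return [
        f"{mechanism}: expected {evidence}"
        for mechanism, evidence in MECHANISM_INDICATORS.items()
        if not observed.get(mechanism, False)
    ]

# Example: an image showing blocking and track deflection, but no vortex
# splitting or windward rainfall asymmetry, yields two refinement targets.
gaps = missing_indicators({"orographic blocking": True, "track deflection": True})
```

The point of the sketch is that each accuracy judgment reduces to the presence or absence of named visual evidence, which is precisely the epistemic-filter function attributed to Meta-Task Awareness here.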
This body of meteorological literature constitutes the Content Knowledge foundation that must be actively engaged when using generative AI for scientific visualization. As emphasized in prior analyses of GenAI characteristics, providing disciplinary terminology alone does not ensure scientifically meaningful output [6]. AI systems may generate visually appealing but physically misleading representations, requiring experts to identify and correct hallucinations through iterative evaluation. Accordingly, typhoon–terrain visualization offers a rigorous empirical context for examining how Meta-Task Awareness operates as a dynamic filter that guides human–AI collaboration toward epistemically responsible outcomes.
1.3. Research Design and Methodological Positioning
This study is positioned as a conceptual framework paper illustrated through an empirical case. Its primary contribution is theoretical: the development of the MTA–TPACK Dynamic Collaboration Spiral as an explanatory framework for human–AI collaboration. The empirical component serves an illustrative, process-tracing function rather than a hypothesis-testing role, documenting how the framework’s constructs unfold across iterative visualization cycles.
The research design adopts a two-phase structure. In the Midjourney phase, an atmospheric science domain expert conducted four iterative Generate–Evaluate–Refine cycles to produce mechanism-faithful visualizations of typhoon–terrain interactions. In the GPT-4o phase, the task was re-executed using a closed-loop collaboration structure in which the model generated candidate images and produced structured self-evaluation outputs under human-defined evaluative criteria. Both phases employed the same evaluation criteria: four mechanism-linked indicators derived from the meteorological literature (blocking, vortex splitting, rainfall asymmetry, and streamline structure; see Table 1 for the mechanism-to-indicator mapping), against which each generated image was assessed. During the GPT-4o phase, the model’s cycle-by-cycle self-evaluation texts were recorded as an audit trail; the resulting trace is reported in Appendix D (Table A4) to support methodological transparency and rubric-based verification, linking each diagnosis to the corresponding prompt revision and observable changes in output. Technical configurations for both AI platforms are documented in Appendix E.
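The closed-loop structure described above can be summarized schematically. In the sketch below, generate_image, self_evaluate, and revise_prompt are hypothetical placeholder callables standing in for the model interactions (the actual platform configurations are documented in Appendix E); the sketch shows one way the Generate–Evaluate–Refine cycle and its audit trail fit together:

```python
# Schematic of a closed-loop Generate-Evaluate-Refine protocol with an audit
# trail, mirroring the structure described above. The three callables are
# hypothetical stand-ins for model interactions; the recorded trail
# corresponds conceptually to the trace reported in Appendix D (Table A4).
def collaboration_spiral(initial_prompt, criteria, generate_image,
                         self_evaluate, revise_prompt, max_cycles=4):
    prompt = initial_prompt
    audit_trail = []
    image = None
    for cycle in range(1, max_cycles + 1):
        image = generate_image(prompt)                 # Generate
        diagnosis = self_evaluate(image, criteria)     # Evaluate against rubric
        audit_trail.append(                            # inspectable record
            {"cycle": cycle, "prompt": prompt, "diagnosis": diagnosis}
        )
        if not diagnosis["missing"]:                   # all criteria satisfied
            break
        prompt = revise_prompt(prompt, diagnosis)      # Refine
    return image, audit_trail
```

Note that the human-defined evaluative criteria enter as an explicit argument, so the loop can terminate only when the rubric, rather than the model, is satisfied.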
Scientific accuracy was rated on a five-point scale by the first author, whose domain expertise includes published work on typhoon–terrain interactions [33,34,46]. Visual clarity was assessed in terms of perceptual salience, compositional coherence, and communicative accessibility, following the anchored descriptors provided in Appendix C. These ratings serve as structured expert judgments in an illustrative case rather than psychometric measurements. The study does not claim statistical generalizability; instead, it demonstrates the explanatory utility of the proposed framework through detailed process documentation of a demanding scientific visualization task.
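A minimal sketch of how such anchored ratings might be represented follows; the anchor wordings below are illustrative stand-ins, not the authoritative descriptors in Appendix C:

```python
# Illustrative 1-5 anchors for the two rating dimensions. The wordings are
# hypothetical paraphrases; the authoritative anchored descriptors appear
# in Appendix C (Tables A2 and A3).
ACCURACY_ANCHORS = {
    1: "contradicts the documented physical mechanisms",
    2: "depicts storm and terrain but omits most mechanism indicators",
    3: "captures some indicators, with notable physical errors",
    4: "captures most indicators, with minor inaccuracies",
    5: "all four mechanism indicators observably present and correct",
}
CLARITY_ANCHORS = {
    1: "key features not perceptually distinguishable",
    2: "features present but composition obscures them",
    3: "main structures legible, composition cluttered",
    4: "salient and coherent, with minor accessibility issues",
    5: "salient, coherent, and accessible to non-expert viewers",
}

def anchored_rating(anchors: dict[int, str], score: int) -> str:
    """Attach anchor text to a score so the judgment remains inspectable."""
    if score not in anchors:
        raise ValueError("ratings must be on the 1-5 scale")
    return f"{score}/5: {anchors[score]}"
```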
1.4. Research Objectives and Study Overview
Building on the theoretical developments reviewed in Section 1.1 and the empirical context established in Section 1.2, this study proposes the MTA–TPACK Dynamic Collaboration Spiral as a framework for understanding human–AI collaboration in educational settings. While existing AI-TPACK extensions have clarified the knowledge structures required for AI integration, this study focuses on the cognitive processes that activate and coordinate these structures during iterative interaction with generative AI systems.
The study pursues three interrelated objectives:
Theoretical Articulation: To articulate the MTA–TPACK Dynamic Collaboration Spiral as a theoretically grounded model linking static TPACK resources with dynamic Meta-Task Awareness. This objective involves defining how MTA functions as a navigation engine, transforming passive knowledge into active design strategies.
Empirical Demonstration with Structured Evaluation: To empirically demonstrate this framework through AI-assisted scientific visualization, tracing how the Generate–Evaluate–Refine cycles operate across multiple iterations. To ensure methodological rigor and mitigate the subjectivity inherent in single-expert case studies, this demonstration employs a Structured Evaluation Rubric (provided in Appendix C). Specifically, the iterative evaluation of AI outputs is governed by:
Mechanism-Specific Indicators (Table A1 in Appendix C), which map abstract physical principles to observable visual evidence (e.g., assessing ‘Vortex Splitting’ via the presence of secondary gyres).
Holistic Scoring Anchors (Table A2 and Table A3 in Appendix C), which provide explicit criteria for rating Scientific Accuracy and Visual Clarity on a 1–5 scale.
By adhering to these standardized descriptors, the study aims to show how principled human–AI collaboration can be systematically assessed and replicated.
Outcome Definition: To examine how this principled collaboration produces Visible Pedagogical Thinking—externalized, structured reasoning patterns that render expert cognition inspectable and reusable as pedagogical resources.
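As a sketch of what such an inspectable record might look like (field names are hypothetical; the actual cycle-by-cycle trace format appears in Appendix D, Table A4), each cycle of reasoning can be externalized as a structured entry:

```python
from dataclasses import dataclass

# Hypothetical structure for one externalized reasoning step. A sequence of
# such entries is what the framework treats as Visible Pedagogical Thinking:
# the diagnosis, the revision it motivated, and the resulting ratings are
# all inspectable and reusable by other educators.
@dataclass
class CycleRecord:
    cycle: int
    prompt: str
    diagnosis: str            # which mechanism indicators were missing, and why
    revision_rationale: str   # how the diagnosis motivated the prompt change
    scientific_accuracy: int  # 1-5, per the Appendix C anchors
    visual_clarity: int       # 1-5, per the Appendix C anchors
```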
The study is organized as follows. Section 2 presents the theoretical framework, detailing its foundational components, dynamic mechanisms, and outcomes. Section 3 and Section 4 provide empirical demonstrations through a Midjourney-based human-orchestrated phase and a GPT-4o-based AI-augmented phase, respectively, applying the evaluation criteria defined in Appendix C. Section 5 synthesizes the findings, discusses implications for sustainable human–AI collaboration in the post-prompting era, and articulates how Visible Pedagogical Thinking emerges as a structured reasoning pattern and the culminating outcome of the framework.