1.1. Research Background: TPACK in the Generative AI Era
The Technological Pedagogical Content Knowledge (TPACK) framework, introduced by Mishra and Koehler [1], builds on Shulman’s foundational work on Pedagogical Content Knowledge (PCK) [2,3] and conceptualizes effective technology integration as the dynamic intersection of Content Knowledge (CK), Pedagogical Knowledge (PK), and Technological Knowledge (TK). Over the past two decades, TPACK has provided a shared analytical language for examining how teachers develop integrated knowledge structures for technology-enhanced instruction across diverse educational contexts [4,5].
The rapid emergence of generative artificial intelligence (GenAI), particularly after the public release of ChatGPT in November 2022, challenges key assumptions underlying classical TPACK. Whereas earlier educational technologies were largely stable and predictable tools whose functions could be mastered through procedural proficiency [6], GenAI systems participate in meaning-making and representation by producing novel outputs through opaque and context-sensitive processes, often with nontrivial unpredictability [6,7]. This shift changes the cognitive task of teaching from operating tools to orchestrating tasks under generative uncertainty.
Recognizing this design reorientation, recent scholarship has proposed multiple extensions to the TPACK framework. For example, Intelligent-TPACK, initially conceptualized by Celik [8] and further developed by Chiu and colleagues [9,10], treats ethical knowledge as a core component and considers human–AI collaboration a distinct competency area. Using a Delphi study with teachers from diverse subject domains, Chiu et al. operationalized Intelligent-TPACK as an interrelated set of knowledge domains that includes AI-related technological, pedagogical, and content knowledge, human–AI collaboration knowledge, and ethical knowledge [10]. This line of work highlights a broader shift from “tool use” toward the design and governance of AI-augmented instructional activity.
The impetus for extending TPACK is further reinforced by contributions from the framework’s original co-creators. Mishra, Warr, and Islam identified five defining characteristics of GenAI (protean, opaque, unstable, generative, and social) that distinguish it from prior educational technologies [6]. These characteristics imply that effective integration is shaped not only by what teachers know, but also by how they monitor, evaluate, and recalibrate task structures during ongoing human–AI interaction, especially when outputs are uncertain or difficult to interpret.
Related frameworks have further expanded the conceptual boundaries of TPACK by foregrounding ethical judgment and treating AI not only as a tool but also as an active participant in instructional processes [7,11]. Together, these models converge on a common premise: GenAI integration requires more than incremental adjustments to traditional technology frameworks. The present study addresses a complementary gap: process-level accounts that explain how existing TPACK is dynamically activated and evaluated during iterative human–AI collaboration.
Empirical research on AI-TPACK has expanded rapidly. Large-scale survey studies document persistent gaps in teachers’ preparedness for AI-assisted teaching. For example, Wang et al. reported that pre-service teachers’ intentions to design AI-supported instruction were shaped by GenAI anxiety, social influence, and performance expectancy, with TPACK and self-efficacy serving as significant predictors [12]. Intervention studies likewise suggest that AI-related competencies can be developed through structured professional learning. Sun et al. observed gains in AI knowledge and teaching self-efficacy following intensive training [13], while Tan et al. reported both positive learning gains and “metacognitive recalibration” effects over an extended quasi-experimental program [14]. These findings motivate a framework that explains how such recalibration is enacted through task-level evaluation and iterative redesign.
Qualitative investigations have empirically identified iterative processes of inquiry, evaluation, and modification in pre-service teachers’ design work with GenAI [15]. However, a comprehensive theoretical framework is still needed to consolidate these empirical observations into a generalized cognitive model that explains how static TPACK resources are mobilized and transformed within these generative loops. Domain-specific studies have revealed both affordances and challenges across disciplines, including mathematics [16,17], visual arts [18], and science education [19]. Collectively, these findings suggest that effective AI integration depends not only on technical skill but also on reflective awareness of when and how AI should be deployed.
Despite this growing body of work, critical gaps remain. Existing AI-TPACK extensions primarily emphasize identifying requisite knowledge domains [9,10], while offering limited theoretical accounts of the cognitive processes through which such knowledge is activated during iterative human–AI collaboration [20,21]. Current frameworks insufficiently explain how educators navigate generative uncertainty, evaluate AI outputs against disciplinary criteria, and refine task structures to maintain epistemic responsibility.
Building on Wong’s concepts of Meta-Task Awareness (MTA) and the Chain of Learning Design and Evaluation (CoLDE) [22], the present study proposes the MTA–TPACK Dynamic Collaboration Spiral as a process-oriented framework. The model explains how static TPACK resources are mobilized through iterative generate–evaluate–refine trajectories, culminating in Visible Pedagogical Thinking as an inspectable record of mechanism-guided reasoning. In this framing, implications for teacher professional development are positioned as a theoretically grounded direction that requires subsequent empirical validation.
1.2. Generative AI for Scientific Visualization: The Case of Typhoon–Terrain Interactions
Scientific visualization represents a domain in which the limitations of prompt-based AI interaction become particularly salient. Translating complex scientific phenomena into accurate and pedagogically meaningful visual representations requires not only domain expertise but also continuous evaluative judgment. The visualization of tropical cyclone interactions with complex terrain provides an especially demanding test case for examining how human–AI collaboration unfolds under conditions of high epistemic responsibility.
Scientific illustration bridges abstract concepts and visual communication. Its evolution traces from prehistoric art [23] and ancient drawings [24] to the systematic rigor of classical scholars [25]. The Renaissance established foundational anatomical precision [26], while the printing press expanded knowledge dissemination [27]. Later centuries standardized botanical details [28], leading to modern transformations through photography, digital imaging, and virtual reality [29].
Recent advances in generative AI, including large language models and text-to-image systems such as Midjourney, DALL·E, and Stable Diffusion, have further transformed this landscape by enabling rapid production of detailed scientific imagery from natural language prompts [30,31,32]. These tools have expanded access to visualization while simultaneously introducing new risks related to scientific fidelity.
Within meteorology, prior work has demonstrated the use of generative AI for visualizing atmospheric processes, including tornado dynamics and tropical cyclone structures [33,34]. Similar applications have emerged across scientific disciplines, such as medicine, biology, and physics, highlighting the broad potential of AI-assisted visualization for science communication [35]. However, superficially plausible outputs do not guarantee alignment with underlying physical mechanisms, underscoring the indispensability of expert evaluation.
Tropical cyclone–terrain interactions, particularly over regions such as Taiwan’s Central Mountain Range, involve multiple well-documented physical mechanisms that any scientifically accurate visualization must capture [36,37]. These include orographic blocking [38,39,40,41], vortex splitting [39,41,42,43], asymmetric convection [41,42,44,45], and terrain-induced track deflection [36,38,40,45,46,47,48,49], as documented in extensive observational and modeling studies. Additional factors such as terrain-induced vorticity generation [39,50], moisture blocking [38,41], and looping or stalling behavior further complicate storm evolution [47,51]. Despite advances in numerical modeling, accurately representing these processes remains challenging, motivating continued investigation into physical mechanisms and communication strategies [37,41,50].
To ensure that the evaluation of AI-generated artifacts remains rigorously grounded in domain knowledge, we established a clear mapping between abstract physical mechanisms and their concrete representational manifestations. Table 1 summarizes the four key mechanisms governing tropical cyclone–terrain interactions and, crucially, defines the Observable Visual Indicators for each. These indicators serve as the specific criteria for the subsequent evaluation, ensuring that judgments of ‘Scientific Accuracy’ are based on observable evidence (e.g., specific streamline curvature) rather than subjective impression. This operationalization represents a critical function of Meta-Task Awareness: transforming static Content Knowledge (CK) into an active epistemic filter for visual assessment.
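To make this operationalization concrete, the following minimal sketch (in Python; all identifiers are hypothetical, and the indicator wordings paraphrase the mechanisms discussed above rather than quoting Table 1) illustrates how a mechanism-to-indicator mapping can act as an explicit evaluation checklist:

```python
# Hypothetical sketch of the mechanism-to-indicator mapping as an explicit
# evaluation checklist. Indicator wordings paraphrase the observable visual
# evidence discussed in this section; the authoritative mapping is Table 1.
MECHANISM_INDICATORS = {
    "orographic blocking": "upstream streamline deceleration and deflection around the mountain range",
    "vortex splitting": "secondary gyres visible on the lee side of the terrain",
    "asymmetric convection": "rainfall and cloud structure concentrated on the windward flank",
    "track deflection": "visible terrain-induced curvature of the storm track",
}

def missing_indicators(observed: dict[str, bool]) -> list[str]:
    """List the mechanisms whose visual indicators are absent from an image,
    yielding the concrete targets for the next prompt refinement."""
    return [
        f"{mechanism}: expected {evidence}"
        for mechanism, evidence in MECHANISM_INDICATORS.items()
        if not observed.get(mechanism, False)
    ]

# Example: an image showing blocking and track deflection, but no vortex
# splitting or windward rainfall asymmetry, yields two refinement targets.
gaps = missing_indicators({"orographic blocking": True, "track deflection": True})
```

The point of the sketch is that each accuracy judgment reduces to the presence or absence of named visual evidence, which is precisely the epistemic-filter function attributed to Meta-Task Awareness here.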
This body of meteorological literature constitutes the Content Knowledge foundation that must be actively engaged when using generative AI for scientific visualization. As emphasized in prior analyses of GenAI characteristics, providing disciplinary terminology alone does not ensure scientifically meaningful output [6]. AI systems may generate visually appealing but physically misleading representations, requiring experts to identify and correct hallucinations through iterative evaluation. Accordingly, typhoon–terrain visualization offers a rigorous empirical context for examining how Meta-Task Awareness operates as a dynamic filter that guides human–AI collaboration toward epistemically responsible outcomes.
1.3. Research Design and Methodological Positioning
This study is positioned as a conceptual framework paper illustrated through an empirical case. Its primary contribution is theoretical: the development of the MTA–TPACK Dynamic Collaboration Spiral as an explanatory framework for human–AI collaboration. The empirical component serves an illustrative, process-tracing function rather than a hypothesis-testing role, documenting how the framework’s constructs unfold across iterative visualization cycles.
The research design adopts a two-phase structure. In the Midjourney phase, an atmospheric science domain expert conducted four iterative Generate–Evaluate–Refine cycles to produce mechanism-faithful visualizations of typhoon–terrain interactions. In the GPT-4o phase, the task was re-executed using a closed-loop collaboration structure in which the model generated candidate images and produced structured self-evaluation outputs under human-defined evaluative criteria. Both phases employed the same evaluation criteria: four mechanism-linked indicators derived from the meteorological literature (blocking, vortex splitting, rainfall asymmetry, and streamline structure; see Table 1 for the mechanism-to-indicator mapping), against which each generated image was assessed. During the GPT-4o phase, the model’s cycle-by-cycle self-evaluation texts were recorded as an audit trail; the resulting trace is reported in Appendix D (Table A4) to support methodological transparency and rubric-based verification, linking each diagnosis to the corresponding prompt revision and observable changes in output. Technical configurations for both AI platforms are documented in Appendix E.
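The closed-loop structure described above can be summarized schematically. In the sketch below, generate_image, self_evaluate, and revise_prompt are hypothetical placeholder callables standing in for the model interactions (the actual platform configurations are documented in Appendix E); the sketch shows one way the Generate–Evaluate–Refine cycle and its audit trail fit together:

```python
# Schematic of a closed-loop Generate-Evaluate-Refine protocol with an audit
# trail, mirroring the structure described above. The three callables are
# hypothetical stand-ins for model interactions; the recorded trail
# corresponds conceptually to the trace reported in Appendix D (Table A4).
def collaboration_spiral(initial_prompt, criteria, generate_image,
                         self_evaluate, revise_prompt, max_cycles=4):
    prompt = initial_prompt
    audit_trail = []
    image = None
    for cycle in range(1, max_cycles + 1):
        image = generate_image(prompt)                 # Generate
        diagnosis = self_evaluate(image, criteria)     # Evaluate against rubric
        audit_trail.append(                            # inspectable record
            {"cycle": cycle, "prompt": prompt, "diagnosis": diagnosis}
        )
        if not diagnosis["missing"]:                   # all criteria satisfied
            break
        prompt = revise_prompt(prompt, diagnosis)      # Refine
    return image, audit_trail
```

Note that the human-defined evaluative criteria enter as an explicit argument, so the loop can terminate only when the rubric, rather than the model, is satisfied.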
Scientific accuracy was rated on a five-point scale by the first author, whose domain expertise includes published work on typhoon–terrain interactions [33,34,46]. Visual clarity was assessed in terms of perceptual salience, compositional coherence, and communicative accessibility, following the anchored descriptors provided in Appendix C. These ratings serve as structured expert judgments in an illustrative case rather than psychometric measurements. The study does not claim statistical generalizability; instead, it demonstrates the explanatory utility of the proposed framework through detailed process documentation of a demanding scientific visualization task.
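A minimal sketch of how such anchored ratings might be represented follows; the anchor wordings below are illustrative stand-ins, not the authoritative descriptors in Appendix C:

```python
# Illustrative 1-5 anchors for the two rating dimensions. The wordings are
# hypothetical paraphrases; the authoritative anchored descriptors appear
# in Appendix C (Tables A2 and A3).
ACCURACY_ANCHORS = {
    1: "contradicts the documented physical mechanisms",
    2: "depicts storm and terrain but omits most mechanism indicators",
    3: "captures some indicators, with notable physical errors",
    4: "captures most indicators, with minor inaccuracies",
    5: "all four mechanism indicators observably present and correct",
}
CLARITY_ANCHORS = {
    1: "key features not perceptually distinguishable",
    2: "features present but composition obscures them",
    3: "main structures legible, composition cluttered",
    4: "salient and coherent, with minor accessibility issues",
    5: "salient, coherent, and accessible to non-expert viewers",
}

def anchored_rating(anchors: dict[int, str], score: int) -> str:
    """Attach anchor text to a score so the judgment remains inspectable."""
    if score not in anchors:
        raise ValueError("ratings must be on the 1-5 scale")
    return f"{score}/5: {anchors[score]}"
```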
1.4. Research Objectives and Study Overview
Building on the theoretical developments reviewed in Section 1.1 and the empirical context established in Section 1.2, this study proposes the MTA–TPACK Dynamic Collaboration Spiral as a framework for understanding human–AI collaboration in educational settings. While existing AI-TPACK extensions have clarified the knowledge structures required for AI integration, this study focuses on the cognitive processes that activate and coordinate these structures during iterative interaction with generative AI systems.
The study pursues three interrelated objectives:
Theoretical Articulation: To articulate the MTA–TPACK Dynamic Collaboration Spiral as a theoretically grounded model linking static TPACK resources with dynamic Meta-Task Awareness. This objective involves defining how MTA functions as a navigation engine, transforming passive knowledge into active design strategies.
Empirical Demonstration with Structured Evaluation: To empirically demonstrate this framework through AI-assisted scientific visualization, tracing how the Generate–Evaluate–Refine cycles operate across multiple iterations. To ensure methodological rigor and mitigate the subjectivity inherent in single-expert case studies, this demonstration employs a Structured Evaluation Rubric (provided in Appendix C). Specifically, the iterative evaluation of AI outputs is governed by:
Mechanism-Specific Indicators (Table A1 in Appendix C), which map abstract physical principles to observable visual evidence (e.g., assessing ‘Vortex Splitting’ via the presence of secondary gyres).
Holistic Scoring Anchors (Table A2 and Table A3 in Appendix C), which provide explicit criteria for rating Scientific Accuracy and Visual Clarity on a 1–5 scale.
By adhering to these standardized descriptors, the study aims to show how principled human–AI collaboration can be systematically assessed and replicated.
Outcome Definition: To examine how this principled collaboration produces Visible Pedagogical Thinking—externalized, structured reasoning patterns that render expert cognition inspectable and reusable as pedagogical resources.
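As a sketch of what such an inspectable record might look like (field names are hypothetical; the actual cycle-by-cycle trace format appears in Appendix D, Table A4), each cycle of reasoning can be externalized as a structured entry:

```python
from dataclasses import dataclass

# Hypothetical structure for one externalized reasoning step. A sequence of
# such entries is what the framework treats as Visible Pedagogical Thinking:
# the diagnosis, the revision it motivated, and the resulting ratings are
# all inspectable and reusable by other educators.
@dataclass
class CycleRecord:
    cycle: int
    prompt: str
    diagnosis: str            # which mechanism indicators were missing, and why
    revision_rationale: str   # how the diagnosis motivated the prompt change
    scientific_accuracy: int  # 1-5, per the Appendix C anchors
    visual_clarity: int       # 1-5, per the Appendix C anchors
```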
The study is organized as follows. Section 2 presents the theoretical framework, detailing its foundational components, dynamic mechanisms, and outcomes. Section 3 and Section 4 provide empirical demonstrations through a Midjourney-based human-orchestrated phase and a GPT-4o-based AI-augmented phase, respectively, applying the evaluation criteria defined in Appendix C. Section 5 synthesizes the findings, discusses implications for sustainable human–AI collaboration in the post-prompting era, and articulates how Visible Pedagogical Thinking emerges as a structured reasoning pattern and the culminating outcome of the framework.