From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education

Pellas, Nikolaos

doi:10.3390/mti10050057


School of Philosophy and Education, Department of Education, Aristotle University of Thessaloniki (AUTH), 54124 Thessaloniki, Greece

Multimodal Technol. Interact. 2026, 10(5), 57; https://doi.org/10.3390/mti10050057

Submission received: 26 April 2026 / Revised: 15 May 2026 / Accepted: 19 May 2026 / Published: 21 May 2026

(This article belongs to the Special Issue Technology-Enhanced Game-Based Approaches in Education: Learning, Emotions, and Motivation)


Abstract

Computational thinking (CT) is increasingly recognized as essential in education, yet teacher preparation programs struggle to develop both computational proficiency and pedagogical readiness in pre-service teachers (PSTs). This study examines an AI-mediated, game-making course grounded in the emerging “vibe coding” paradigm, in which 24 novice PSTs iteratively constructed programs through natural language prompting. Adopting a mixed-methods design, the study drew on pre- and post-course attitude questionnaires, reflective accounts of prompting strategies, and open-ended responses. Results indicate that participants substantively engaged with core CT practices, particularly debugging, iterative refinement, and problem decomposition. Nonetheless, self-reported coding and teaching confidence declined; this downward recalibration is interpreted as a productive adjustment rather than a failure. Conversely, attitudes toward game making improved, with a statistically significant medium effect for perceived instructional value (d = 0.51), the largest practical effect observed across dimensions. Most participants intended to integrate CT into future teaching. These findings suggest that prompt-driven learning environments support meaningful engagement with computational processes when carefully scaffolded, but do not inherently ensure pedagogical readiness, particularly for higher-order CT practices such as abstraction and pattern recognition. Unlike prior research that has examined AI-assisted programming, game-making processes, or PST attitudes toward CT in isolation, this study empirically integrates all three within a single scaffolded instructional design using vibe coding. This integration enables a process-level account of how CT is enacted, and how it develops, when code generation is partially delegated to AI systems.
Beyond documenting attitude shifts, the study introduces an analytical rubric for identifying CT engagement in AI-mediated prompting and derives evidence-based design principles that specify the pedagogical conditions under which vibe coding supports, rather than bypasses, computational reasoning.

Keywords:

artificial intelligence; computational thinking; game making; vibe coding; pre-service teacher education

1. Introduction

Computational thinking (CT) is widely recognized as a fundamental competence for navigating and shaping digital systems [1], involving practices such as decomposition, debugging, algorithmic thinking, abstraction, and pattern recognition [2]. These competencies are not only relevant for learners in K–12 settings but are increasingly considered essential for teachers themselves, who are expected to model, facilitate, and assess computational practices in their classrooms [3]. Pre-service teachers (PSTs), in particular, occupy a critical position in this field. As future educators, their own experiences with CT during teacher preparation programs will shape whether and how these practices reach the next generation of students.

Game making has consistently been positioned as an effective context for cultivating CT, as it requires learners to design rule-based systems, model interactions, and debug emergent behaviors [4]. For PSTs, game-making environments such as Scratch, Alice, and App Inventor have proven especially generative, as they provide low-barrier entry points into coding while simultaneously prompting reflection on pedagogy, creativity, and learning design [5,6]. Research in this area suggests that when PSTs engage in coding through game-making, they not only develop technical confidence but also begin to form richer understandings of how to support student reasoning in digitally mediated environments [7]. A constructionist theoretical framework helps explain why this approach is effective. It emphasizes learning through making meaningful artifacts, and in the context of game making, PSTs are not just learners, but also designers of learning experiences [8]. When creating a game for others to play, they must both understand the computational logic behind the system and anticipate how a user will experience and interpret it. This dual perspective—thinking as both a developer and a learner—is central to effective teaching, as it mirrors the kind of pedagogical thinking required in real classrooms [4,5,9]. The constructionist logic motivating this design is developed further in Section 2.2.

PSTs entering teacher preparation programs, however, rarely encounter CT from a neutral baseline. Research [3,5,7] consistently documents that this population brings a specific psychological profile to coding courses: many report significant technology anxiety, worrying not about using smartphones or social media but about being asked to produce rather than consume digital content, alongside a pervasive belief that programming is a domain reserved for mathematically inclined or STEM-trained individuals. This belief persists even among PSTs who describe themselves as generally comfortable with technology, because everyday digital fluency (scrolling, typing, searching) does not prepare them for the cognitive demands of decomposing a problem, tracing an algorithm, or debugging a broken loop. The result is a baseline state characterized by high self-efficacy in general digital tool use paired with a fundamental deficit in computational construction. This misalignment represents a classic case of unconscious incompetence: a metacognitive state in which learners are not only unskilled but also lack the awareness needed to recognize their own limitations in the programming domain [5,6]. Any intervention that genuinely develops CT must therefore be expected to produce a temporary drop in self-reported confidence, because making the hidden difficulty of the domain visible is what moves learners from unconscious to conscious incompetence. This is a necessary and productive developmental transition rather than a pedagogical failure.

More recently, the rise of AI-assisted development, often informally described as “vibe coding,” in which creators iteratively generate and refine code through natural language prompts to AI tools, has introduced a new paradigm in how games are conceptualized and built [10]. Using general-purpose chatbots such as Claude, Gemini, or ChatGPT, learners shift from writing code line by line to orchestrating system behavior through prompting, evaluation, and adjustment of generated outputs [11,12]. For PSTs, this shift carries particular significance: if their primary encounter with coding occurs through AI-mediated prompting rather than direct construction, the nature of the CT they develop, and their readiness to teach it, may differ substantially from what prior research describes. In this context, vibe coding may lower the entry barrier while still demanding genuine logical engagement, which makes a temporary decline in self-reported confidence interpretable as a shift toward a more realistic, metacognitive understanding of computational work [10,11].

Although this approach lowers technical barriers and accelerates prototyping, it raises critical questions about the nature of computational engagement. Foundational research on CT assumes direct interaction with code as a medium for expressing logic and building mental models [1,13]. Research also shows that PSTs’ attitudes toward programming are shaped by their sense of competence and perceived barriers (time, training, resources), all of which influence how deeply they engage with coding tools in teacher preparation contexts [14,15]. Recent studies on AI-assisted programming tools indicate that these systems can both scaffold and obscure computational reasoning, depending on how they are integrated [16,17,18]. When code generation is partially delegated to AI systems, the abstraction introduced may reduce cognitive load at the syntax level but also redirect attention away from the underlying algorithmic structures upon which CT development depends. For teacher education specifically, this creates a “black box” risk. If PSTs cannot articulate the underlying logic of the applications they build, they will struggle to scaffold computational reasoning for their future students. Without empirical clarity on whether PSTs are truly developing CT or simply delegating logic to AI, teacher education risks producing educators who can use AI tools but cannot teach the logic behind them. Therefore, two research questions guide this study:

RQ1: To what extent do PSTs exhibit changes in their attitudes toward AI-mediated (vibe coding) game making, as reflected in their perceptions of digital technology use, game-making contexts, and self-efficacy?
RQ2: How do PSTs engage in core CT practices during game making, as reflected in their accounts of natural language interactions with AI chatbots and in their open-ended reflective responses?

The purpose of the present study is to examine the enactment of CT within an AI-mediated game-making context for PSTs. Rather than focusing solely on the final game product, it investigates the process: how PSTs formulate problems, interpret chatbot-generated outputs, and iteratively refine solutions through vibe coding. By situating this analysis within a structured learning environment, the study seeks to clarify how the core CT practices introduced above are enacted when programming is mediated through natural language interaction with AI chatbots.

This study adds to the emerging intersection of CT literacy and AI-mediated game making in educational contexts. It provides an empirically grounded account of how vibe coding reshapes the traditional CT landscape. Practically, it offers design implications for teacher education, showing that while AI can facilitate creative production, instructors must implement specific scaffolds to ensure PSTs maintain conceptual ownership over the logic generated by AI. It moves the conversation from whether AI should be used to how it can be used to deepen pedagogical fluency.

2. Background

2.1. Vibe Coding

The concept of “vibe coding” was coined by Andrej Karpathy in early 2025 to describe a mode of software creation in which developers articulate intentions through natural language prompts rather than writing explicit code, with a large language model (LLM) generating and refining the corresponding implementation [10,19]. This paradigm represents a fundamental shift from syntax-driven to intent-driven development, repositioning the programmer as a conceptual guide who specifies goals, interprets outputs, and steers system behavior rather than constructing algorithms line by line. Vibe coding differs from prompt engineering in its fluid, exploratory, and trial-and-error character, emphasizing creative momentum and personal expression over formal specification [10]. Vibe coding spans a continuum from fully autonomous implementation to scaffolded, instructor-mediated use; the present study is positioned at the structured end of that continuum rather than as an unconstrained autonomous form [19].

Empirical research reveals both the promise and the limits of this approach. Fortes-Ferreira et al. [19] showed that a user with no programming background could build a functional 3D driving simulator through iterative prompting alone, demonstrating vibe coding’s potential to democratize software development. Yet the same study also found that effective prompting required an emerging form of “prompt literacy”, defined as the ability to analyze outputs, detect misalignments, and iteratively refine instructions to debug, a process that requires at least foundational programming knowledge. Thorgeirsson et al. [20] supported this in a preregistered study with 100 tertiary students, finding that both computer science achievement and written communication skills were significant independent predictors of vibe coding performance, even after controlling for general cognitive ability. These findings show that vibe coding recontextualizes rather than replaces computational competence: prior conceptual knowledge serves as an essential scaffold that enables developers to navigate complex AI-mediated workflows.
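The prompt-literacy cycle described above (analyze the output, detect a misalignment, refine the instruction) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not a method from the cited studies: `generate` is a hypothetical stand-in for any chatbot (no real LLM API is called), and `meets_expectation` and `revise` model the learner's own judgment rather than anything automated.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    """One prompt-output cycle as a learner might log it."""
    prompt: str
    output: str
    passed: bool


def refine_until_aligned(
    initial_prompt: str,
    generate: Callable[[str], str],            # hypothetical stand-in for a chatbot call
    meets_expectation: Callable[[str], bool],  # learner checks output against intended behavior
    revise: Callable[[str, str], str],         # learner rewrites the prompt after a misalignment
    max_rounds: int = 5,
) -> list[Attempt]:
    """Repeat prompt -> output -> evaluation until the output matches intent or rounds run out."""
    history: list[Attempt] = []
    prompt = initial_prompt
    for _ in range(max_rounds):
        output = generate(prompt)
        passed = meets_expectation(output)
        history.append(Attempt(prompt, output, passed))
        if passed:
            break
        # Misalignment detected: refine the instruction instead of editing code directly.
        prompt = revise(prompt, output)
    return history


# Toy stubs (hypothetical): this "model" only adds edge handling when the prompt asks for it.
def toy_generate(prompt: str) -> str:
    return "move(10); bounce_on_edge()" if "edge" in prompt else "move(10)"


def expects_bounce(output: str) -> bool:
    return "bounce_on_edge" in output


def add_edge_constraint(prompt: str, output: str) -> str:
    return prompt + " and make it bounce when it touches the edge"


history = refine_until_aligned("Make the cat move right", toy_generate, expects_bounce, add_edge_constraint)
for attempt in history:
    print(attempt.passed, "|", attempt.prompt)
```

With these toy stubs the loop records one failed attempt followed by a successful one. The point of the sketch is that the evaluation and revision steps sit with the learner, which is where the cited studies locate the prior programming knowledge that vibe coding still demands.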

2.2. Constructionism as a Theoretical Framework in Game Making

Constructionism, first theorized by Papert [21] and subsequently developed in the context of digital learning environments, offers a compelling theoretical rationale for why programming through game creation represents a particularly powerful form of learning in teacher education [22]. At its core, constructionism holds that knowledge is not passively received but actively constructed—and that this construction is most generative when it produces a tangible, shareable artifact that exists beyond the learner’s own mind and invites response, critique, and reuse by others. Game creation fulfils this criterion with unusual completeness: a game is a rule-based interactive system that must be designed, tested, revised, and ultimately played by someone other than its maker, embedding the constructionist cycle of build–evaluate–refine into the very logic of the activity [4,8].

For PSTs specifically, constructionism carries dual pedagogical significance. As learners, they benefit from the same conditions that constructionism identifies as generative for any individual: a meaningful and personally motivating design challenge, creative latitude in how that challenge is approached, and access to peers whose different solutions expand the space of possible thinking [23]. As future teachers, they acquire, by constructing computational artifacts from the ground up, experiential pedagogical knowledge that observation alone cannot offer: an embodied understanding of problem decomposition, debugging, and persistence in the face of system failure [5,6]. This experiential grounding is precisely what teacher education programs have historically struggled to provide in the domain of CT, and it is what a constructionist game-making approach is uniquely positioned to offer [3,14].

Block-based programming environments, such as Scratch, were designed explicitly within this constructionist tradition and represent its most pedagogically accessible realization in the context of game creation [23]. By replacing text-based syntax with visual, interlocking command structures, Scratch eliminates the syntactic barriers that most reliably deter novice programmers—a population that includes the majority of PSTs entering teacher preparation programs with no prior coding experience and considerable anxiety about their computational competence [7,15]. Yet, this reduction in technical complexity does not diminish the depth of computational engagement that the environment affords. Learners building games in block-based programming environments encounter the full architecture of CT practices. They decompose narrative and gameplay logic into programmable sequences, construct algorithms governing character movement and interaction, identify and systematically correct errors through debugging, and abstract repeated patterns into reusable loop and conditional structures [1,13]. Game making is therefore not a simplified or diluted form of programming, but a contextually rich instantiation of it, one in which the design demands of the game continuously generate new CT problems for the learner to solve [8,24]. This alignment between game-making logic and CT practice is what makes block-based environments particularly well suited to teacher education. PSTs are simultaneously developing computational competencies and experiencing firsthand how those competencies emerge from the process of designing something for a specific audience and purpose. This is an insight that directly informs how they will later design learning experiences for their own students [9,25].
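To make the mapping between game mechanics and CT practices concrete, the following sketch renders a typical Scratch sprite script as plain Python: decomposition into small functions, an algorithm for movement, a conditional for edge collision, and a bounded loop standing in for the `forever` block. It is an illustrative translation under simplifying assumptions (a one-dimensional stage and the hypothetical names `Sprite`, `move`, `bounce_if_on_edge`, and `run`), not material from the course.

```python
from __future__ import annotations

from dataclasses import dataclass

# Scratch's stage spans x = -240 to 240; a one-dimensional stage keeps the sketch small.
STAGE_MIN_X, STAGE_MAX_X = -240, 240


@dataclass
class Sprite:
    x: int = 0
    direction: int = 1  # +1 = moving right, -1 = moving left


def move(sprite: Sprite, steps: int) -> None:
    """Algorithm: one movement step, analogous to a 'move (10) steps' block."""
    sprite.x += steps * sprite.direction


def bounce_if_on_edge(sprite: Sprite) -> None:
    """Conditional: reverse direction at the stage edge, analogous to 'if on edge, bounce'."""
    if sprite.x >= STAGE_MAX_X or sprite.x <= STAGE_MIN_X:
        sprite.direction *= -1
        sprite.x = max(STAGE_MIN_X, min(STAGE_MAX_X, sprite.x))  # keep the sprite on stage


def run(sprite: Sprite, frames: int, steps: int = 10) -> list[int]:
    """Loop: a bounded stand-in for Scratch's 'forever' block; returns x after each frame."""
    trace = []
    for _ in range(frames):
        move(sprite, steps)
        bounce_if_on_edge(sprite)
        trace.append(sprite.x)
    return trace


cat = Sprite(x=220)
trace = run(cat, 4)
print(trace)  # → [230, 240, 230, 220]
```

In a block-based editor each of these functions corresponds to a visible stack of blocks, which is the transparency the paragraph above attributes to block-based environments; a learner tracing `trace` frame by frame is doing the same debugging work as a learner watching the sprite on stage.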

Block-based environments make computational logic visible through explicit command structures, whereas vibe coding shifts that logic into prompt formulation, evaluation, and revision. The five-stage sequence in this study was designed to preserve that visibility while gradually moving learners from manual decomposition and debugging toward AI-mediated orchestration, thereby preventing the process from becoming a “black box” [3].

2.3. CT Learning Through AI-Assisted Programming

The integration of CT into educational curricula worldwide reflects a broad consensus that algorithmic problem solving is now a fundamental competence for all learners [1,26,27]. Traditionally, the acquisition of these competencies has been contingent upon direct, manual engagement with code, characterized by the rigorous processes of debugging, iterative refinement, and the ‘productive struggle’ of error resolution.

However, the rise of ‘vibe coding’, characterized by AI-driven automation of code generation, challenges this pedagogical paradigm. By abstracting away the lower-level mechanics of development, these tools risk decoupling learners from the cognitive scaffolding essential for consolidating CT. This transition can induce a state of epistemic opacity [28], in which learners generate functional artifacts without a commensurate understanding of the underlying computational logic, and it therefore calls for a critical re-evaluation of how programming is learned.

Among CT components, decomposition emerges as especially pivotal in the AI-assisted context. Lee et al. [25] introduced Spark, a multimodal programming interface designed to make decomposition explicit and inspectable for novice adult learners. The interface operationalizes three forms of decomposition—substantive, relational, and functional—through a combination of natural language interaction, visual representation, and tangible robot execution. In a comparative user study, learners who encountered Spark first achieved significantly higher objective decomposition scores and task success rates than those beginning with Scratch’s block-based environment, with no increase in cognitive load. Crucially, the same authors drew a direct theoretical connection to vibe coding. They argued that early exposure to embodied, conversational programming, in which learners can articulate intent, diagnose misalignment, and reason about system behavior, cultivates the competencies that make meaningful AI-assisted development possible. Si et al. [12], studying 41 children (aged 6–11) using an AI-supported environment, further demonstrated this dependency: children frequently produced vague, logically incoherent prompts, struggled to decompose objectives into actionable subgoals, and closed error dialogs rather than attempting to debug—revealing that the iterative refinement workflow assumed by vibe coding presupposes metacognitive and CT capacities that many young learners have not yet developed.

The literature reviewed above collectively implies a specific design challenge for teacher education: how to integrate AI-mediated prompting into game making in a way that preserves, rather than bypasses, the computational reasoning still required for CT development. Previous studies [11,12,13] have additionally argued that block-based environments represent one part of the answer, because their visual, manipulable structure keeps control flow and algorithmic logic transparent in a way that text-generated code does not. However, block-based environments alone do not develop the intent-articulation competencies that vibe coding demands. Combining the two modalities (block-based construction for computational visibility and natural language prompting for intent specification) offers a principled pedagogical response, one in which learners cannot simply delegate logic to AI because the block-based layer requires them to evaluate, translate, and implement whatever the AI proposes. This integration provides the theoretical basis for the five-stage scaffolding design described in Section 3. It predicts that the most productive vibe coding environments are not those in which AI replaces construction, but those in which AI challenges learners to articulate and defend computational decisions at the semantic rather than the syntactic level. What this body of work does not address is how CT is enacted in practice when AI-mediated prompting and game making converge within a single scaffolded teacher education course, which is the specific intersection investigated in this study.

2.4. Game Making as a Context for Vibe Coding and CT Development

Game making has long been recognized as a powerful constructionist context for CT development, precisely because building an interactive game requires learners to decompose complex systems, abstract behaviors into rules, and iteratively debug mechanics to achieve intended outcomes [4]. Traditional game-making environments make computational processes visible and consequential, requiring active construction rather than passive consumption. Bringing vibe coding into this context creates a productive tension: AI assistance can expand creative ambition and lower implementation barriers, but it also risks turning game development into a more descriptive and less computationally grounded activity.

Royal [10] captured this duality through a Vibe and Flow Model grounded in Csikszentmihalyi’s flow theory, applied to a project-based mobile application development course. Students using AI tools to build interactive apps reported deep engagement, intrinsic motivation, and increased coding self-efficacy. All participants agreed that AI enabled higher-quality projects and made the experience more enjoyable by allowing them to focus on creative and user experience dimensions, as the AI efficiently handled routine implementation tasks. Si et al. [12] observed analogous dynamics in children’s Scratch game creation: vibe coding expanded creative ideation, with 31.70% of participants spontaneously proposing features beyond assigned tasks, and 68.29% successfully implementing structures they had never previously learned. Yet, the same study found that rapid prototyping could produce a superficial sense of accomplishment while bypassing the foundational understanding that programming education is designed to build. Together, these findings reveal that the relationship between game making and CT can no longer be assumed to be inherently productive when AI mediates the process. Rather, the pedagogical value of vibe coding in game contexts depends on deliberate design, specifically on scaffolding that foregrounds decomposition, goal articulation, and critical reflection on AI-generated outputs, ensuring that creative engagement translates into genuine computational learning.

As a result, current research provides a compelling conceptual foundation but offers limited insight into how CT manifests in practice when AI-assisted coding and game design converge. It remains unclear whether learners actively engage in core CT processes, such as decomposition, abstraction, and algorithmic reasoning, or whether these processes are partially externalized to AI systems during game development. This absence of situated empirical evidence represents a significant gap in the literature. Addressing this gap is essential for understanding whether vibe-coded game-making environments function as effective scaffolds for CT or risk diminishing meaningful computational engagement. The present study is designed to directly investigate this intersection, providing empirical insight into how CT is enacted within AI-mediated game-making activities. Unlike prior studies that examine AI-assisted programming or game making for CT learning in isolation, this study integrates these domains within a single instructional design, allowing for a more precise analysis of how CT is demonstrated when code generation is partially delegated to AI platforms.

2.5. PSTs’ Attitudes Toward Coding and CT

Taken together, the literature reviewed in Sections 2.1–2.4 reveals three converging gaps. First, while vibe coding has been examined in isolated contexts (simulator development [19], mobile application design [10], and children’s block-based environments [12]), no study has empirically integrated these strands within a single teacher education course. Second, existing research on PSTs’ attitudes toward CT [5,6] does not account for the additional cognitive layer introduced when code generation is partially delegated to AI. Third, the analytical tools available for identifying CT engagement in AI-mediated contexts remain underdeveloped; prior studies rely primarily on artifact analysis or task performance rather than on the process-level reasoning evident in natural language prompting. The present study is designed to address all three gaps simultaneously.

Research on how PSTs engage with programming and CT is still developing [3]. The studies that do exist paint a nuanced picture: PSTs tend to recognize the value of coding in education and express broadly favorable views about its role in modern teaching [15]. At the same time, many practical concerns, including insufficient preparation time, limited access to resources, and a lack of professional training, stand in the way of confident implementation. These concerns often mirror broader anxieties PSTs hold about incorporating technology into their teaching practice, as well as uncertainties about their own digital competence [7,14].

Several intervention studies have explored how engaging PSTs directly with coding tools affects their confidence and motivation. Gleasman and Kim [6], for instance, studied a program in which PSTs used Scratch to teach elementary mathematics conceptually rather than procedurally. Their focus was on whether PSTs could draw meaningful connections between CT and deeper mathematical understanding, not just technical execution. Pre- and post-survey data revealed a modest overall improvement in attitudes toward integrating CT into mathematics instruction, alongside a notably strong gain in programming self-confidence.

In a related line of inquiry, Butler and Leahy [5] examined how PSTs developed their understanding of CT through hands-on work in Scratch within a constructionist learning framework. Participants designed interactive challenges for children as part of their practicum experience, and in doing so, began articulating concrete pedagogical strategies—such as using questioning techniques to guide students toward deeper algorithmic reasoning, particularly during debugging activities. PSTs also reported that working directly with programmable objects sharpened their own problem-solving abilities and extended their capacity for abstract, everyday reasoning.

While existing literature establishes a strong theoretical connection between vibe coding and CT within AI-mediated game-making contexts, it remains largely fragmented across these domains. Prior studies have examined vibe coding in isolated contexts such as simulator development [19], decomposition in embodied or robotic programming environments [25], learner engagement and flow in mobile application design [10], and children’s interactions with AI-assisted tools in block-based environments [12]. However, these strands of research have not been empirically integrated. In particular, there is a lack of studies investigating how learners enact and develop CT practices while engaging in vibe coding within game-making contexts.

In the context of this study, where vibe coding introduces AI-assisted interaction as an additional layer, block-based programming serves a further critical function. It maintains the visibility of computational logic, ensuring that PSTs remain active in the game creation process rather than becoming passive recipients of AI-generated outputs. This approach preserves the constructionist principle that understanding emerges from making rather than merely from having made [10,17,29]. In the current study, therefore, vibe coding is operationalized through three observable dimensions: (a) intent articulation through natural language prompts, (b) iterative refinement via prompt–output cycles, and (c) evaluation of AI-generated logic against expected system behavior.
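One way to make the three observable dimensions operational is to attach them as codes to excerpts from participants' prompt logs and reflections, then tally how often, and by how many participants, each dimension appears. The sketch below assumes that human coders assign the codes; the class names, the example excerpts, and the participant IDs are hypothetical placeholders, not data from this study and not the study's own rubric.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    INTENT_ARTICULATION = "a"   # (a) intent articulated through a natural language prompt
    ITERATIVE_REFINEMENT = "b"  # (b) prompt revised in response to a previous output
    OUTPUT_EVALUATION = "c"     # (c) AI-generated logic checked against expected behavior


@dataclass(frozen=True)
class CodedExcerpt:
    participant: str
    text: str
    codes: frozenset[Dimension]  # assigned by a human coder, not inferred automatically


def dimension_profile(excerpts: list[CodedExcerpt]) -> dict[Dimension, tuple[int, int]]:
    """Per dimension: (number of coded excerpts, number of distinct participants)."""
    excerpt_counts: Counter[Dimension] = Counter()
    participants = {d: set() for d in Dimension}
    for ex in excerpts:
        for d in ex.codes:
            excerpt_counts[d] += 1
            participants[d].add(ex.participant)
    return {d: (excerpt_counts[d], len(participants[d])) for d in Dimension}


# Invented placeholder excerpts for illustration only (not study data).
excerpts = [
    CodedExcerpt("P01", "Make the player lose a life when it touches the enemy",
                 frozenset({Dimension.INTENT_ARTICULATION})),
    CodedExcerpt("P01", "The score resets too early; only reset it when the game restarts",
                 frozenset({Dimension.ITERATIVE_REFINEMENT, Dimension.OUTPUT_EVALUATION})),
    CodedExcerpt("P02", "Add a timer that counts down from 30",
                 frozenset({Dimension.INTENT_ARTICULATION})),
]
profile = dimension_profile(excerpts)
for dim, (n_excerpts, n_participants) in profile.items():
    print(f"{dim.name}: {n_excerpts} excerpt(s), {n_participants} participant(s)")
```

Allowing one excerpt to carry several codes reflects that a single revised prompt can both refine the instruction and evaluate the previous output; counting distinct participants alongside raw excerpts keeps a few very talkative participants from dominating the profile.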

This study extends current understanding by providing empirical evidence that CT engagement can persist even when core programming tasks are partially delegated to AI systems, provided that learning environments are carefully scaffolded.

3. Research Method

3.1. The “Vibe Coding in Game Making” Course

3.1.1. Rationale and Course Design

The design of this course was not incidental; it was a direct response to the patterns identified in the literature. Research consistently shows that PSTs approach coding with a complicated mixture of intellectual curiosity and practical anxiety: they recognize the growing importance of computational competencies in education, yet frequently cite insufficient training, limited resources, and low perceived self-efficacy as barriers to engagement [14,15]. Evidence from constructionist learning environments suggests a way forward, showing that PSTs deepen their content knowledge and pedagogical skills, and gain confidence, when they are given structured opportunities to build, program, and reflect instead of simply observing or consuming [5]. The course was therefore designed around a core pedagogical conviction: the most effective way to prepare PSTs to teach CT is to immerse them in the experience of learning it themselves, under conditions that are simultaneously challenging, supported, and meaningful.

Three interconnected objectives shaped the overall design. The first was to lower the psychological and practical barriers that prevent many PSTs from engaging meaningfully with coding by ensuring that technical entry points were accessible and that early experiences were structured to produce visible, tangible outcomes. The second was to cultivate genuine CT competencies, including decomposition, abstraction, pattern recognition, and debugging, not as isolated technical procedures but as habits of mind embedded in creative, purposeful activity. The third was to develop PSTs’ capacity for pedagogical reflection: specifically, their ability to observe their own learning process, articulate the strategies they used to navigate difficulty, and translate those insights into implications for their future teaching practice. By the end of the course, participants were expected to design and program an original Scratch game meaningfully connected to their subject matter specialization. This artifact served as evidence not only of their technical growth but also of their creative and conceptual development.

The course was delivered face-to-face and co-facilitated by a college lecturer and an external moderator from an organization specializing in technology-enhanced learning environments. Each session was structured to integrate two complementary dimensions: a technical strand in which participants engaged directly with coding through hands-on construction of simulations and games, and a reflective strand in which participants examined their own creative and problem-solving processes through guided discussion and written documentation. This structure reflects the view that coding without reflection becomes procedural imitation, while reflection without genuine technical engagement lacks the concrete experiential foundation from which meaningful learning emerges [9].

To strengthen the interpretive rigor of the findings, several validity considerations were addressed. First, construct validity was supported through the use of multiple data sources (closed-ended questionnaires and open-ended responses), enabling triangulation of participants’ reported experiences. Second, internal validity is limited by the absence of a control group and the exploratory nature of the design; therefore, causal claims cannot be made. Third, external validity is constrained by the small, self-selected sample drawn from a single institution. Finally, interpretive validity was enhanced through systematic thematic analysis and the alignment of qualitative findings with quantitative trends, reducing the risk of overinterpretation. This study does not claim validated gains in CT competence; rather, it identifies behavioral indicators of CT enactment within AI-mediated workflows.

AI delegation risk was mitigated by requiring participants to articulate their intended logic before any AI generation. To mitigate the potential ‘novelty effect’ (in which participant engagement is driven primarily by the initial excitement of new AI tools), this study employed a longitudinal design spanning three distinct instructional modules over 12 weeks. The staged curriculum provided a ‘cooling-off’ period: while Module 1 focused on the initial discovery of ‘vibe coding,’ Modules 2 and 3 introduced substantially higher computational complexity and debugging requirements. This transition was intended to shift participants’ focus from the novelty of the AI interface to the rigorous, iterative logic required for game mechanics. By the final module, the PSTs appeared to have reached a state of ‘technological habituation,’ which reduces, though it cannot eliminate, the likelihood that the reported changes in self-efficacy and CT practices reflect transient enthusiasm rather than sustained pedagogical growth. The present study cannot isolate causal effects of AI mediation; however, differences in prompting-based reasoning suggest a shift in how CT is externalized and negotiated.

The absence of a control group is an acknowledged structural feature of DBR design, not an omission. In contexts where the intervention is novel, iterative, and embedded within an authentic institutional setting, the introduction of a parallel comparison condition would require either artificial standardization of the learning environment or a delay of one full academic year. Moreover, the theoretical questions driving this study—how CT is enacted in an AI-mediated game-making context, and under what conditions—are not well served by between-group comparison, which can confirm that an effect exists without illuminating the mechanisms responsible for it [30,31,32,33]. The present study prioritizes mechanism over magnitude, consistent with its DBR framing.

3.1.2. Course Modules and Learning Sequence

The course unfolded across three sequential modules, each comprising four 90-min sessions, and was structured around a five-stage cognitive scaffold designed to progressively deepen participants’ engagement with CT through maze-based game-making tasks. These two organizational layers, the thematic modules and the staged task sequence, worked in tandem to move participants from conceptual familiarization to independent creative production. In this study, vibe coding is defined as a prompt-driven programming approach in which users iteratively generate, evaluate, and refine code through natural language interaction with AI systems. As implemented in this course, however, it did not denote unconstrained, prompt-only development in the manner described by Fortes-Ferreira [19]. Instead, the curriculum featured a structured instructional process in which AI-assisted prompting (e.g., ChatGPT 5.5, Claude Sonnet 4.6, and Gemini Flash) was deliberately combined with block-based programming tasks and explicit CT scaffolds. This operationalization preserves the intent-driven, iterative character of vibe coding while maintaining visibility into underlying computational structures, a design decision justified by the pedagogical limitations of fully unconstrained AI delegation documented in Section 2.3. Accordingly, the findings reflect AI-mediated, scaffolded computational engagement rather than fully autonomous prompt-based development.

The first module established the theoretical and experiential foundations of the course. Participants were introduced to core concepts in educational game making, examined examples of game-making learning, and engaged critically with the pedagogical opportunities and challenges of integrating digital games into school contexts. The aim at this stage was to develop conceptual understanding rather than technical skills. The instructional design sought to prevent passive delegation to AI by requiring learners to explain intended logic before generation and to diagnose AI-generated errors after execution.

The second module shifted toward practical engagement, focusing on core programming constructs such as motion, conditionals, and loops within a vibe coding context. The instructor modeled each operation, after which participants replicated and adapted these examples in line with their own design ideas. At the end of each session, PSTs shared their work in a communal space, fostering peer learning through observation, feedback, and idea exchange. These interactions were structured around four dimensions of creativity (fluency, flexibility, originality, and elaboration) [30], ensuring that technical work remained embedded within a broader creative and pedagogical framework.

Underlying these modules was a five-stage learning sequence designed to scaffold CT development through progressively increasing cognitive demand. In the first stage, participants analyzed a maze problem mentally, engaging in decomposition and initial problem representation. In the second stage, they constructed solutions using constrained directional commands, translating their plans into structured sequences and thereby operationalizing decomposition and abstraction. The third stage introduced execution with immediate visual feedback, enabling iterative debugging and reinforcing the deterministic nature of algorithmic processes. In the fourth stage, participants optimized their solutions using loop constructs, compressing repeated actions into reusable patterns and making principles of efficiency and pattern recognition explicit. In the fifth stage, participants were presented with a pre-constructed program containing a deliberate error and were required to trace execution, identify the fault, and correct it—foregrounding systematic debugging and algorithmic reasoning.

The third module culminated in independent creative production. Working in pairs, PSTs designed, developed, and presented original game-making projects aligned with their subject specializations and informed by their practicum experiences. The presentation format (Figure 1) was intentionally designed to position participants as both creators and critical audience members, reinforcing the social and communicative dimensions of computational work.

The use of maze-based tasks within a block-based programming environment was a deliberate pedagogical choice aimed at preserving the visibility of computational logic within an AI-mediated context. While natural language prompting lowers syntactic barriers, it also risks obscuring underlying processes within AI-generated outputs. The maze task provided a constrained, rule-based environment in which learners were required to articulate stepwise logic, trace execution, and iteratively debug solutions—practices central to CT, including decomposition, algorithmic reasoning, and error correction [1,13]. Block-based environments further support this goal by making control flow, conditionals, and event structures visually explicit, enabling learners to inspect and manipulate underlying logic rather than simply accept generated code [23]. Directional primitives (e.g., “move forward,” “turn left,” and “turn right”) make algorithmic structure explicit by requiring learners to express solutions as ordered sequences of rule-based actions, thereby supporting early algorithmic reasoning and decomposition [1,13].

Within a game-making process, AI-assisted prompting did not replace computational reasoning but repositioned it. Learners were required to formulate precise instructions, evaluate system responses, and iteratively refine their prompts. Such environments can support CT development when scaffolds maintain conceptual transparency and require structured problem solving and debugging [12,25]. The maze-based approach, in particular, embeds CT practices within a bounded and feedback-rich context, enabling learners to internalize computational structures before applying them in more open-ended game-making scenarios. In this way, the combination of block-based programming, maze tasks, and AI-mediated interaction functions as a scaffolded pathway that sustains meaningful computational engagement while leveraging the creative affordances of vibe coding [10,17].

Throughout all three modules, participants maintained entries in a shared Moodle space at the close of each session, recording their prompts, experiences, challenges, and strategies. These entries served simultaneously as a metacognitive tool for the learners and as a rich qualitative data source for the research. An organizing principle running across the entire course was the gradual and deliberate transition from consumption to production: participants shifted from engaging with existing games as users in Module 1 to building, debugging, and designing in Modules 2 and 3. This progression was intended to make experientially tangible the constructionist claim that the deepest learning occurs not when knowledge is received but when it is actively constructed through the making of something shareable, testable, and real [8]. An example of an escape room structured around a CT curriculum is illustrated in Figure 2.

The five-stage maze task described above was embedded within Module 2 as a structured scaffold. Module 3 then required participants to transfer these CT competencies to an independent, open-ended game-making project supported by AI-assisted (Claude) prompting for code generation and debugging. Figure 2 illustrates this workflow, showing the interplay between natural language prompting in Claude and block-based implementation similar to Scratch. While the maze-based task (Stages 1–5) focused on foundational block-based logic, it served as a prerequisite for the vibe coding module. In this later stage, students used AI prompting to translate these manual logic patterns into more complex, automated game mechanics.

Three course features were specifically designed to guard against the ‘black box’ risk—the possibility that participants generate functional programs without achieving any understanding of the underlying logic. First, the pre-generation articulation requirement (design principle 2, Appendix C) required participants to write out their intended game behavior in their reflective logs before submitting any prompt to an AI chatbot. This requirement made the learner’s reasoning prior to AI delegation visible and forced them to take a position on what they expected the system to produce. Second, the Stage 5 error-tracing task presented participants with a pre-built Scratch program containing a deliberate error and required them to identify and correct it without AI assistance. This task specifically assessed whether participants had developed any functional understanding of block-based execution logic or whether their apparent engagement during AI-assisted sessions had been superficial. Third, the project presentation format in Module 3 required participants to explain the computational logic of their games to peers, creating an accountability mechanism for conceptual ownership: a participant who had simply accepted AI outputs without understanding them would be unable to answer peer questions about how their game worked.
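The first two safeguards share a simple logic: a learner commits to an expected behavior before seeing any output, and the executed program is then checked against that commitment step by step. The Python sketch below illustrates this under stated assumptions: an open, unbounded grid, a hypothetical three-command vocabulary, and an expectation recorded as the cell occupied after each command. None of these names come from the course materials.

```python
# Hypothetical compass headings as (row, col) deltas on an open, unbounded grid.
HEADINGS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}
RIGHT = {"N": "E", "E": "S", "S": "W", "W": "N"}
LEFT = {v: k for k, v in RIGHT.items()}


def trace(program, start=(0, 0), heading="E"):
    """Return the cell occupied after every command (turns leave the cell unchanged)."""
    pos, cells = start, []
    for cmd in program:
        if cmd == "turn_right":
            heading = RIGHT[heading]
        elif cmd == "turn_left":
            heading = LEFT[heading]
        elif cmd == "move_forward":
            dr, dc = HEADINGS[heading]
            pos = (pos[0] + dr, pos[1] + dc)
        else:
            raise ValueError(f"unknown command {cmd!r}")
        cells.append(pos)
    return cells


def first_divergence(expected, actual):
    """Index of the first step where the trace departs from the expectation, or None."""
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


INTENDED = ["move_forward", "move_forward", "turn_right", "move_forward"]
BUGGY = ["move_forward", "move_forward", "turn_left", "move_forward"]

expected = trace(INTENDED)  # stands in for the learner's written expectation
actual = trace(BUGGY)       # what the generated (faulty) program actually does
print(first_divergence(expected, actual))  # 3
```

In this example the fault is the `turn_left` at index 2, but the first visible divergence appears at index 3, because a turn does not change the occupied cell. This gap between where an error is introduced and where it becomes observable is one reason a Stage 5-style task asks learners to trace execution back from the symptom rather than inspect only the step where the output first looks wrong.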

CT components were operationalized analytically by mapping participants’ reported actions and reasoning strategies onto established CT categories (e.g., decomposition, abstraction, and debugging). For example, instances in which participants described breaking down tasks into smaller steps were coded as decomposition, while descriptions of identifying and correcting errors were coded as debugging. This operational mapping enabled a structured interpretation of qualitative data in relation to CT constructs.
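Because the mapping was carried out by human coders, the judgment itself cannot be reduced to an algorithm; the bookkeeping around it, however, can be sketched. The hypothetical Python structure below records each coded excerpt with an anonymized participant ID and its assigned CT category, rejects labels outside a fixed codebook, and reports both how often each category was coded and how many distinct participants contributed to it. The category list is an assumption drawn from the practices named in this section, not a reproduction of the rubric in Appendix A.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative codebook (assumed): practices named in the text, not the full Appendix A rubric.
CT_CATEGORIES = {"decomposition", "abstraction", "pattern_recognition", "debugging"}


@dataclass(frozen=True)
class CodedSegment:
    participant: str  # anonymized participant ID (hypothetical format)
    excerpt: str      # reflective-log or open-ended response text
    category: str     # CT category assigned by a human coder


def tally(segments):
    """Count coded segments per CT category, rejecting labels outside the codebook."""
    counts = Counter()
    for seg in segments:
        if seg.category not in CT_CATEGORIES:
            raise ValueError(f"unknown category {seg.category!r}")
        counts[seg.category] += 1
    return counts


def participants_per_category(segments):
    """Number of distinct participants with at least one segment in each category."""
    seen = {}
    for seg in segments:
        seen.setdefault(seg.category, set()).add(seg.participant)
    return {cat: len(ids) for cat, ids in seen.items()}
```

Reporting both counts guards against a single verbose participant inflating a category’s apparent prevalence.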

To examine the complexity of PSTs’ learning experiences within an authentic educational setting, this study employed a mixed-methods research design that integrated both quantitative and qualitative data strands [31]. This integration was considered essential given that neither numbers alone nor narratives alone would be sufficient to capture the multidimensional nature of attitude change, skill development, and pedagogical growth among participants. This study was grounded in design-based research (DBR), an inquiry approach that originated in the learning sciences as a response to the limitations of purely experimental methodologies [32]. Rather than seeking to control variables in order to establish causal chains under artificial conditions, DBR situates the researcher within the learning environment itself, treating the educational design as both the object of study and an instrument of inquiry. This methodology is particularly well suited to contexts where the intervention is novel and iterative, the boundaries between implementation and investigation are deliberately fluid, and understanding how and why something works is valued alongside whether it works [33].

3.2. Participants

The participant group consisted of twenty-four (n = 24) PSTs enrolled in a Primary Education teacher preparation program, all preparing to teach at the elementary school level. The cohort was composed of 13 males and 11 females. Participants’ average age was 24 years (SD = 1.4), with ages ranging from 21 to 26. While institutionally homogeneous as Primary Education majors, the cohort was academically diverse in terms of subject matter specialization, with participants drawn from Science, Mathematics, Literature, and English tracks. This disciplinary breadth proved generative in practice, contributing to the thematic and conceptual variety of the game design projects produced across the course.

A defining characteristic of the sample was the complete absence of prior programming experience. First, every participant was approaching coding for the first time, with no background in block-based or text-based programming environments. Second, none of the participants reported prior experience with educational game making, though several indicated some familiarity with digital games as leisure activities, which provided at least an experiential reference point for the game-making tasks introduced in the course. Third, exposure to AI tools (e.g., ChatGPT or Claude) was limited, with most participants indicating only casual or exploratory use prior to the course. Participants did report routine engagement with general-purpose digital technologies (e.g., smartphones, social media platforms, and the institution’s Moodle learning management system), suggesting a baseline of general digital literacy; this experience, however, was predominantly consumption-oriented rather than focused on content creation or computational problem-solving. All participants had prior exposure to foundational pedagogical coursework and limited classroom observation through early practicum experiences, but none had independently designed or delivered technology-integrated lessons.

At the time of enrollment, participants had completed the first year of their teacher preparation program, which included foundational coursework in general pedagogy, child development, and curriculum theory. None had yet completed modules specifically addressing technology integration or digital pedagogy, meaning that the course under investigation represented their first sustained institutional engagement with CT and programming as educational practices. This baseline is relevant for interpreting the post-course attitude data, as any shift observed cannot be attributed to prior formal exposure to CT pedagogy.

Participation in this study course was entirely self-selected and voluntary, indicating that participants entered the experience with at least some degree of openness or curiosity toward the subject matter, even if this did not necessarily translate into confidence or prior competence [34]. This self-selection introduces a motivational bias that should be acknowledged as a boundary condition on the generalizability of the findings: the participant group may not be representative of the broader PST population, which frequently includes individuals with considerably higher levels of technology anxiety and lower initial orientation toward digital learning [35].

The course was delivered in Greek, the language of instruction at the participating institution. Participants interacted with AI-assisted vibe coding tools exclusively through Greek-language prompts; the methodological rationale and implications of this choice are discussed in Section 3.3.

While the sample size (n = 24) is modest, it is consistent with the epistemological commitments of design-based research, which prioritizes analytical depth within authentic educational contexts over statistical power and population-level inference [33]. The goal of this study is therefore analytical generalization, providing transferable insight into the processes, mechanisms, and conditions through which CT develops within vibe coding using chatbots, rather than producing statistically representative findings.

3.3. Sampling Procedure and Ethical Considerations

Full cohort participation reflects the structural characteristics of the elective rather than any pressure to enroll. The course was introduced to the program’s catalog for the first time in this academic year and was promoted through the institution’s course registration system as a digital pedagogy elective combining game design and AI-assisted programming. All 24 eligible students (those who had completed their first year of foundational coursework) chose to register voluntarily. No student was assigned, strongly encouraged by a supervisor, or incentivized with academic credit beyond the elective’s standard allocation. The complete uptake is therefore better understood as a reflection of the cohort’s ceiling (24 was the total eligible population, not a subset of a larger group who opted in) than as evidence of unusual enthusiasm. That said, students who actively chose an elective explicitly described as involving AI tools and digital game creation were almost certainly more favorably predisposed toward digital learning than the broader PST population. This constitutes a meaningful self-selection bias that limits the generalizability of the findings to less favorably disposed PSTs.

Two distinct generalizability constraints follow from this sampling frame. The first is institutional: the findings are directly applicable to comparable teacher preparation programs within the same national system but should not be assumed to transfer without modification to systems operating under different pedagogical traditions, resource environments, or technology infrastructure. The second is disciplinary: the cohort was composed entirely of Primary Education majors training to teach at the elementary level. While participants came from different subject matter specializations (Science, Mathematics, Literature, and English), all were novice programmers with no STEM background. PSTs training for secondary-level STEM teaching would likely enter an analogous course with substantially different prior knowledge, anxiety profiles, and CT intuitions, and future research should explicitly sample across these populations.

Because participants were adult learners enrolled in a teacher education program, they were informed of their role in the research in a transparent and accessible manner, with particular attention to the use of AI-mediated tools and the collection of learning artifacts such as prompts and open-ended responses. This ensured that participants understood both the nature of the “vibe coding” learning activities and how their data would be used for research purposes.

All prompting was conducted in Greek, the participants’ native language and the institutional medium of instruction. This was a deliberate pedagogical choice grounded in ecological validity: Greek-speaking PSTs who will teach in Greek-language primary classrooms need to develop prompt literacy in their own language, and requiring English-language interaction would have introduced an additional cognitive burden while reducing the authenticity of the learning experience. Nonetheless, this choice introduces a recognized methodological limitation. LLMs are trained predominantly on English-language corpora and tend to exhibit lower syntactic and semantic consistency on morphologically rich languages such as Greek. As a result, prompts that were unambiguous in Greek occasionally produced divergent outputs, generating what might be described as linguistically induced debugging demands. The present study cannot fully disentangle whether the observed debugging and iterative refinement reflect genuine CT engagement, a strategic response to AI misinterpretation of Greek input, or a combination of both. This is acknowledged as a boundary condition on the generalizability of the findings and is identified in Section 8 as a priority for future cross-linguistic replication.

Strict measures were implemented to protect confidentiality and anonymity. All collected data (including questionnaire responses, generated prompts, and qualitative feedback) were anonymized prior to analysis, and participants were informed of their right to withdraw from the study at any stage without penalty. These procedures ensured that ethical standards were maintained while examining PSTs’ engagement with CT practices in an AI-mediated, game-making learning environment. Although the sample size limits statistical power and prevents broad generalization, it is consistent with the methodological tradition of design-based research, which prioritizes explanatory understanding of learning processes in authentic educational settings; findings should therefore be interpreted as analytically transferable rather than statistically representative. Luo et al. (2026) have emphasized that the quality of such research is determined by analytical depth and theoretical coherence rather than by the size of the participant sample. The current cohort of 24 participants aligns with, or in several instances exceeds, the samples used in foundational DBR-focused CT research in teacher education: Butler and Leahy [5] worked with 20 PSTs, and Gleasman and Kim [6] similarly drew on a small voluntary cohort. The concern about statistical power is addressed directly through the reporting strategy adopted here: every quantitative finding is reported with both a p-value and Cohen’s d, precisely because significance tests are underpowered at this sample size. Throughout, non-significant results are treated as inconclusive rather than as evidence of null effects, and effect size is the primary unit of practical inference.
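The claim that significance tests are underpowered at n = 24 can be checked with a standard back-of-the-envelope calculation. The Python sketch below uses the normal approximation to the minimum detectable effect for a two-sided paired test, d_z ≈ (z_{1−α/2} + z_{1−β}) / √n. The conventional choices of α = 0.05 and 80% power are assumptions, since the text does not state target values, and the exact noncentral-t calculation yields a slightly larger threshold for a sample this small.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_dz(n, alpha=0.05, power=0.80):
    """Approximate smallest paired effect size d_z detectable with the given power.

    Normal approximation (z_{1-alpha/2} + z_{power}) / sqrt(n) for a two-sided
    test; the exact noncentral-t result is slightly larger for small n.
    """
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


print(round(min_detectable_dz(24), 2))  # 0.57
```

Under these assumptions, effects smaller than roughly d_z = 0.57 have less than an 80% chance of reaching significance with 24 participants, which is consistent with treating non-significant results as inconclusive rather than as evidence of no effect.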

Ethical approval for this study was obtained from the relevant institutional authority overseeing the teacher preparation program, ensuring that all procedures were aligned with the principles of the Declaration of Helsinki and applicable national guidelines for research involving human participants. Participation was entirely voluntary and based on informed consent. All participants were also provided with clear and comprehensive information about the study’s objectives, procedures, and any potential risks before agreeing to take part.

3.4. Research Instruments

A preliminary clarification is necessary before describing the instruments employed. To strengthen analytical consistency in identifying CT engagement, a structured coding rubric was developed (Appendix A). The rubric distinguishes between levels of computational reasoning, from surface interaction to generative reasoning, across five core CT practices. This allowed for systematic classification of participants’ reflective accounts and reconstructed prompts, ensuring that claims regarding CT engagement were grounded in observable and consistently coded behaviors. Any reference to CT development throughout this paper should therefore be understood as denoting the emergence of computational practices and orientations rather than verified or measurable gains in computational competence.

Participant attitudes were assessed using a purpose-constructed pre- and post-course questionnaire developed through the adaptation and synthesis of existing items drawn from relevant instruments held within the Collegiate Research Authority’s database. The questionnaire was delivered digitally via Google Forms, accessed through the course Moodle platform. The pre-course version was completed independently at home before the first session began, and the post-course version was administered at the end of the program. Both administrations were conducted anonymously to minimize social desirability effects and encourage more candid self-reporting [36]. While none of the three subscales measures CT attitudes as an isolated construct, together they capture the attitudinal landscape most directly relevant to CT engagement in teacher education: confidence with digital tools, openness to game-making learning, and general problem-solving self-efficacy. These dimensions are treated as proxies for CT orientation, consistent with how the construct has been operationalized in comparable PST studies [3,15]. Because the subscales employ different response ranges adapted from distinct validated instruments, within-scale pre-to-post comparisons are the primary unit of analysis, and Cohen’s d rather than raw mean differences should be used when comparing across subscales.

All items were presented in Greek, the participants’ native language. To safeguard linguistic precision and cultural appropriateness, a back-translation procedure was implemented in accordance with the methodological guidelines established by Brislin [37]. The questionnaire was structured around three conceptually distinct dimensions, each targeting a different aspect of PSTs’ attitudes and self-perceptions relevant to the course objectives.

The first dimension examined orientations toward the use of digital games as pedagogical tools, comprising seven items that probed both the perceived instructional value of game-making approaches and potential professional identity conflicts associated with their adoption (e.g., Incorporating digital games into my teaching has the potential to meaningfully strengthen what I offer my students; Using digital games in school feels at odds with my sense of what it means to be a professional educator). Because game-making contexts represented a largely unfamiliar domain for most participants prior to the course, a six-point Likert scale was selected—anchored at one (strongly disagree) and six (strongly agree)—to provide sufficient sensitivity to capture variation across a population with limited prior exposure to this domain [38].

The second dimension addressed general self-efficacy and problem-solving dispositions—cognitive and motivational competencies closely aligned with the adaptive reasoning that CT demands—through six items rated on a four-point scale from one (strongly disagree) to four (strongly agree) (e.g., I feel confident in my capacity to handle situations I have not encountered before; My instinct when facing a problem is to consider a range of possible approaches rather than committing to the first one that comes to mind). The theoretical grounding for this section drew on Bandura’s conceptualization of self-efficacy [39] and the item structure of Schwarzer’s [40] General Self-Efficacy Scale, both of which informed the selection and wording of individual items. This subscale was included to assess whether domain-general self-efficacy—distinct from coding-specific confidence—was affected by this study course, providing a comparative baseline for interpreting domain-specific attitude shifts.

The third dimension assessed PSTs’ broader orientations toward digital technology alongside their self-reported confidence in programming and technology-integrated pedagogy, using eight items rated on a five-point scale from one (strongly disagree) to five (strongly agree) (e.g., Being asked to teach coding to primary-age learners is something I find daunting; I generally adapt without difficulty when introduced to a new technological tool; Teaching programming in primary schools should be the exclusive domain of STEM-trained educators). These items targeted three barriers consistently identified in the literature: beliefs that CT belongs only to STEM specialists, anxiety about programming, and uncertainty about one’s own digital skills—all recognized as predictors of whether PSTs integrate technology in practice [35].

Given that this was the first delivery of the course, the post-course questionnaire incorporated two additional open-ended items designed to elicit structured, forward-looking reflection. The first invited participants to identify three aspects of the course they would recommend preserving and three they would suggest revising or removing. The second asked them to specify which pedagogical strategies or course elements they intended to carry into their future classrooms. These items were included both to generate actionable data for iterative course refinement and to illuminate the transferable insights participants derived from the experience [9].

The qualitative data strand drew on two sources. The first was the written responses participants produced for the open-ended questionnaire items described above. The second comprised reflective entries written by each participant at the close of every session during the third course module, yielding four entries per participant. A structured set of prompts guided these entries, asking participants to describe the emotional quality of their experience during the session, the specific obstacles they encountered, the cognitive strategies they deployed to move past those obstacles, and the forms of support—whether from peers, the instructor, or the digital environment itself—that they drew upon in the process [41].

A methodological transparency note is warranted regarding scale comparability. The three subscales employ different Likert response ranges—five points (Table 1), six points (Table 3), and four points (Table 4)—reflecting the distinct validated instruments from which items were adapted. A mean of 3.5 on a four-point scale represents a near-ceiling response, while the same value on a six-point scale indicates only moderate endorsement; absolute means are therefore not directly comparable across subscales. Accordingly, within-scale pre-to-post comparisons serve as the primary unit of analysis throughout the study. Furthermore, Cohen’s d—a scale-invariant metric—is employed for all cross-subscale comparisons of practical magnitude.
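The comparability point can be made concrete with a percent-of-maximum-possible (POMP) rescaling, a standard linear transformation that maps any Likert mean onto a 0 to 100 range. POMP was not used in this study; the sketch below only illustrates why the same raw mean carries different meaning on the four-, five-, and six-point scales described above.

```python
def pomp(mean: float, scale_min: int, scale_max: int) -> float:
    """Percent of maximum possible: position of a mean within its scale range (0-100)."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * (mean - scale_min) / (scale_max - scale_min)


# The same raw mean of 3.5 on the three response ranges used in the questionnaire.
for label, lo, hi in [("4-point", 1, 4), ("5-point", 1, 5), ("6-point", 1, 6)]:
    print(f"{label}: {pomp(3.5, lo, hi):.1f}% of the scale range")
```

A mean of 3.5 sits at about 83% of the four-point range but exactly halfway along the six-point range, which is why within-scale change and Cohen's d, rather than raw means, are used for comparison.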

Table 1. Pre-service teachers’ self-reported competence with technology.

Statement	Pre-M (SD)	Post-M (SD)	t(23)	p	d	Interpretation
1. I tend to adapt quickly to new software or hardware tools.	3.4 (1.22)	3.0 (1.31)	0.82	0.420	0.17	Negligible
2. I feel adequately prepared to manage computer-based academic tasks.	3.0 (1.24)	3.4 (1.33)	0.52	0.610	0.11	Negligible
3. The prospect of teaching coding to primary students makes me feel uneasy. †	2.1 (1.26)	2.3 (1.32)	0.32	0.750	0.07	Negligible
4. I am open to adopting innovative digital tools to enrich my instructional practice.	4.0 (1.23)	3.9 (1.31)	0.15	0.880	0.03	Negligible
5. Engaging with computer interfaces often feels inefficient and overly tedious. †	2.2 (1.21)	3.1 (1.33)	1.00	0.330	0.20	Small
6. I am confident in my capacity to acquire unfamiliar computer applications.	3.9 (1.22)	3.9 (1.31)	0.24	0.810	0.05	Negligible
7. I generally have an aversion to computer-based technologies. †	1.5 (1.27)	1.9 (1.32)	0.76	0.450	0.16	Negligible
8. Facilitating programming lessons should be reserved exclusively for STEM specialists.	1.3 (1.22)	1.4 (1.32)	0.36	0.720	0.07	Negligible

Note. † Items 3, 5, and 7 are negatively worded; higher scores indicate less favorable attitudes. t-values were derived from paired-samples t-tests with df = 23. Cohen’s d = t/√n and is reported as a measure of practical significance independent of sample size. Effect sizes are interpreted as negligible (|d| < 0.20), small (0.20–0.49), medium (0.50–0.79), and large (≥0.80) [42]. Scale: 1 (strongly disagree) to 5 (strongly agree). No items reached statistical significance (p < 0.05).
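Because a paired t statistic equals the mean difference divided by its standard error (SD of differences divided by √n), dividing t by √n recovers d_z, the mean difference expressed in units of the SD of the differences. The short check below recomputes the d column of Table 1 from its t column with n = 24. It is an illustrative sketch; since the tabled t-values are themselves rounded, agreement is expected only to two decimals.

```python
import math

N = 24  # paired observations per item (df = N - 1 = 23)

# t-values and reported d-values from Table 1, in item order.
table1_t = [0.82, 0.52, 0.32, 0.15, 1.00, 0.24, 0.76, 0.36]
table1_d = [0.17, 0.11, 0.07, 0.03, 0.20, 0.05, 0.16, 0.07]


def cohens_dz(t: float, n: int) -> float:
    """Within-subject effect size: d_z = t / sqrt(n) = mean(diff) / sd(diff)."""
    return t / math.sqrt(n)


recomputed = [round(cohens_dz(t, N), 2) for t in table1_t]
print(recomputed)
```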

Table 2. Distribution of themes from open-ended prompts (post-course).

Theme	Percentage (%)
(a) Learning how to use vibe coding	9%
(b) Nurturing divergent and creative ideation	14%
(c) Promoting autonomous, student-directed inquiry	20%
(d) Implementing digital play-based instructional design	24%
(e) Integrating visual programming (block-based operations)	33%

Note. Percentages represent the relative frequency of thematic occurrences within the total corpus of qualitative responses (n = 41 codes), normalized to 100% to reflect the thematic weight of each category.

Table 3. Pre-service teachers’ perspectives on game-making learning integration.

Statement	Pre-M (SD)	Post-M (SD)	t(23)	p	d	Interpretation
1. Integrating computer games can meaningfully elevate my teaching practice.	4.8 (1.22)	5.6 (1.24)	2.51	0.020 *	0.51	Medium
2. Using games in the classroom may create emotional distance between the instructor and the learner. †	2.4 (1.24)	2.1 (1.23)	0.43	0.670	0.09	Negligible
3. Digital game-making tasks tend to increase peer-to-peer cooperation.	5.1 (1.24)	5.5 (1.22)	1.34	0.190	0.27	Small
4. The deployment of digital games conflicts with my professional identity as an educator. †	2.5 (1.25)	2.1 (1.22)	0.63	0.530	0.13	Negligible
5. Incorporating games fundamentally transforms the traditional instructional role of the teacher.	3.5 (1.26)	4.0 (1.21)	0.78	0.440	0.16	Negligible
6. It is pedagogically unsound to embed digital games within my subject area. †	1.7 (1.24)	1.4 (1.21)	0.89	0.380	0.18	Negligible
7. Digital games are predominantly recreational tools with limited academic merit. †	2.3 (1.23)	1.5 (1.22)	1.45	0.160	0.30	Small

Note. † Items 2, 4, 6, and 7 are negatively worded; lower post-course scores indicate more favorable attitudes toward game-making contexts. t-values were derived from paired-samples t-tests with df = 23. Cohen’s d = t/√n. Effect sizes: negligible (|d| < 0.20), small (0.20–0.49), medium (0.50–0.79), large (≥0.80) [42]. Scale: 1 (strongly disagree) to 6 (strongly agree). * p < 0.05.

Table 4. Pre-service teachers’ sense of general competence and resilience.

Statement	Pre-M (SD)	Post-M (SD)	t(23)	p	d	Interpretation
1. I believe I can resolve challenging problems through sustained effort.	3.6 (1.22)	3.8 (1.23)	1.26	0.220	0.26	Small
2. I trust my ability to navigate unforeseen or ambiguous situations.	3.4 (1.22)	3.6 (1.22)	0.40	0.690	0.08	Negligible
3. I find it relatively uncomplicated to maintain focus and realize my objectives.	3.2 (1.23)	3.6 (1.22)	1.64	0.110	0.33	Small
4. I can stay composed under pressure because I have confidence in my coping strategies.	3.1 (1.22)	3.3 (1.24)	0.84	0.410	0.17	Negligible
5. When confronting a hurdle, I typically generate several alternative approaches.	3.4 (1.22)	3.7 (1.24)	0.89	0.380	0.18	Negligible
6. Given adequate investment in energy, I am capable of devising a solution for most issues.	3.4 (1.21)	3.6 (1.22)	0.63	0.530	0.13	Negligible

Note. t-values were derived from paired-samples t-tests with df = 23. Cohen’s d = t/√n. Effect sizes: negligible (|d| < 0.20), small (0.20–0.49), medium (0.50–0.79), large (≥0.80) [42]. Scale: 1 (strongly disagree) to 4 (strongly agree). No items reached statistical significance (p < 0.05); non-significant results should be interpreted as inconclusive rather than as evidence of null effects given the modest sample size (n = 24).

3.5. Data Analysis

Quantitative data were analyzed using descriptive statistics to characterize response distributions at both time points and paired-samples t-tests to determine whether pre-to-post shifts reached statistical significance [42]. Given the modest sample size, the study is statistically underpowered to detect small-to-medium effects; non-significant p-values are therefore treated as inconclusive rather than as confirmation of null effects, and Cohen’s d effect sizes are reported throughout to convey practical significance independently of sample size constraints. Where non-significant items nonetheless shift in a consistent direction, this pattern is interpreted cautiously as possible evidence of systematic recalibration rather than random variation.
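The power limitation can be illustrated by simulation. The sketch below is an assumption-laden illustration, not the study's procedure: it draws difference scores from a normal distribution with mean d_z and SD 1 (so the true effect equals d_z), applies a two-tailed paired t-test at α = 0.05 using the tabled critical value for df = 23 (about 2.069), and reports the share of significant results. For d_z = 0.50 at n = 24 the estimate falls below the conventional 0.80 benchmark.

```python
import math
import random

N = 24         # participants, matching the study's sample size
T_CRIT = 2.069 # two-tailed alpha = 0.05 critical t for df = N - 1 = 23 (from t tables)


def paired_t(diffs):
    """Paired t statistic computed on difference scores: mean / (sd / sqrt(n))."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


def simulated_power(d_z, reps=10_000, seed=1):
    """Share of simulated samples of size N whose |t| exceeds T_CRIT."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Difference scores drawn from N(d_z, 1): the population d_z equals d_z.
        diffs = [rng.gauss(d_z, 1.0) for _ in range(N)]
        if abs(paired_t(diffs)) > T_CRIT:
            hits += 1
    return hits / reps


for d in (0.2, 0.5, 0.8):
    print(f"d_z = {d}: estimated power = {simulated_power(d):.2f}")
```

The critical value is hard-coded for n = 24; a different sample size would require the matching t quantile.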

Qualitative data were analyzed thematically following the six-phase procedure described by Braun and Clarke [43], with categories derived inductively from the data and subsequently organized into higher-order interpretive themes. The qualitative findings served a dual analytical function: constructing a process-level account of how participants experienced and navigated the vibe coding environment, and providing a validity check on the quantitative patterns by examining whether the two data streams converged with or diverged from one another [44]. Integration of the two strands occurred at two distinct points in the analytical process: during analysis, through triangulation of quantitative attitude shifts with qualitative themes; and during interpretation, through expansion, where qualitative accounts were used to explain quantitative patterns that were ambiguous or unexpected.

To capture the developmental trajectory of CT practices across the course, each coded instance was additionally tagged with the module and instructional phase in which it emerged, a designation referred to as “Course Stage” (Figure 3). Stages 1–5 refer to the scaffolded maze task sequence within Module 2, whereas Module 3 refers to the independent game-making project. This temporal tagging was applied by cross-referencing each participant’s reflective entry with the session it documented, allowing CT practices to be mapped not only by type, but also by their point of emergence across the instructional sequence. The resulting Course Stage classification is presented alongside the exemplar prompts in Appendix B, together with a CT tier designation reflecting the developmental hierarchy proposed by Grover and Pea [13]. This approach is grounded in an expectation embedded within the course design: earlier modules were expected to elicit foundational practices such as decomposition and debugging, while later modules, particularly the independent project phase of Module 3, were expected to generate evidence of higher-order CT practices such as abstraction and pattern recognition.
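One way to represent Course Stage tagging is a small record per coded instance, cross-tabulated by stage and CT practice. The sketch below is hypothetical: the `CodedInstance` fields, the string stage labels, and the sample records are illustrative assumptions, not the study's coding files.

```python
from collections import Counter
from dataclasses import dataclass

# Stage labels as described in the text: five Module 2 stages, then Module 3.
STAGE_ORDER = ["Stage 1", "Stage 2", "Stage 3", "Stage 4", "Stage 5", "Module 3"]


@dataclass(frozen=True)
class CodedInstance:
    participant: str   # anonymized participant ID (hypothetical format)
    practice: str      # one of the five CT practices
    level: int         # rubric level 1-3
    course_stage: str  # "Stage 1".."Stage 5" (Module 2) or "Module 3"


def stage_by_practice(instances):
    """Cross-tabulate coded instances by (course stage, CT practice)."""
    for inst in instances:
        if inst.course_stage not in STAGE_ORDER:
            raise ValueError(f"unknown course stage: {inst.course_stage}")
    return Counter((inst.course_stage, inst.practice) for inst in instances)


# Hypothetical records for illustration only; these are not study data.
sample = [
    CodedInstance("P01", "debugging", 1, "Stage 2"),
    CodedInstance("P01", "debugging", 3, "Module 3"),
    CodedInstance("P02", "decomposition", 2, "Stage 1"),
    CodedInstance("P02", "abstraction", 3, "Module 3"),
]
table = stage_by_practice(sample)
for stage in STAGE_ORDER:
    row = {p: n for (s, p), n in table.items() if s == stage}
    if row:
        print(stage, row)
```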

It is important to clarify the nature of the qualitative data sources used in this study. Participants interacted with three AI chatbots (ChatGPT, Gemini, and Claude) through their standard web interfaces (chat.openai.com, gemini.google.com, and claude.ai, respectively), none of which provided session-level export or logging functionality accessible to the researcher at the time of data collection. Participants did not retain or submit their chat histories as part of the course requirements. As a result, verbatim transcripts of learner–AI interactions were not available for analysis. The exemplar prompts presented in Appendix B are therefore analytically reconstructed approximations derived from participants’ written reflective logs, in which they described, often in considerable detail, the problems they had posed to the AI, the outputs they received, and the reasoning they had applied in refining their requests. These reconstructions serve to make participants’ reasoning patterns visible for analysis; they are not to be interpreted as verbatim interaction data, and no claim is made that they are.

3.6. Validity and Reliability

Several measures were taken to strengthen the validity and reliability of both the quantitative instruments and the qualitative analytical framework employed in this study. These are addressed in sequence below according to the type of validity each procedure was designed to establish.

Internal consistency reliability for each questionnaire subscale was evaluated using Cronbach’s alpha prior to the main analysis, with α ≥ 0.70 adopted as the criterion for acceptable reliability in accordance with established psychometric conventions [42]. The technology attitudes subscale returned α = 0.73, the game-making learning subscale α = 0.76, and the general competence subscale α = 0.72. These coefficients fall within the acceptable range for all three dimensions, indicating adequate internal consistency among the items within each subscale. It is acknowledged, however, that alpha estimates derived from small samples (n = 24) carry greater sampling instability than those obtained from larger datasets, and these values should therefore be interpreted as indicative rather than definitive [42]. They are reported as a transparency measure rather than as strong psychometric confirmation. The current sample provides sufficient depth for process-level, design-oriented analysis consistent with DBR methodology [33], but is underpowered for medium-effect detection (for a paired-samples t-test at n = 24 with two-tailed α = 0.05, power to detect d = 0.50 is approximately 0.65, below the conventional 0.80 benchmark); therefore, the non-significant results reported throughout should be treated as inconclusive rather than as evidence of an absence of effect.
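The alpha coefficients above follow the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The stdlib sketch below assumes a complete respondents × items matrix with no missing values and with negatively worded items already reverse-scored; the example matrix is hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # One column of scores per item, and one total score per respondent.
    items = [[float(row[j]) for row in responses] for j in range(k)]
    totals = [float(sum(row)) for row in responses]
    item_var_sum = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))


# Hypothetical 3-respondent, 2-item example (not study data).
print(round(cronbach_alpha([[1, 2], [2, 3], [3, 3]]), 3))  # 0.857 (= 6/7)
```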

Content validity was established through a structured expert review process conducted prior to data collection. Three academic reviewers—two with specializations in educational technology and one with expertise in pre-service teacher preparation—independently evaluated each item for conceptual alignment with its intended construct and linguistic appropriateness for the target population. A conservative retention criterion was applied: items were retained only when all three reviewers concurred on their relevance and clarity. Where disagreement arose, items were returned to collective discussion and either substantively revised or replaced with new items, which then underwent a second round of independent evaluation before being accepted. This iterative process ensured that the final instrument was grounded in both theoretical coherence and practical relevance to the specific population and context under investigation [36].

Construct validity was supported through two complementary mechanisms. First, item development was anchored in established theoretical frameworks: Bandura’s [39] self-efficacy theory provided the conceptual foundation for the general competence subscale, ensuring that item wording and scale structure were consistent with how self-efficacy has been operationalized in prior educational research, while the broader CT attitudes literature informed item selection across the remaining two dimensions [3,15]. This theoretical anchoring provides a principled basis for the claim that each subscale measures what it is intended to measure, distinguishing the instrument from an atheoretical item aggregation. Second, the mixed-methods design itself functioned as an additional construct validity mechanism: qualitative data drawn from participants’ open-ended responses and reflective logs were used systematically to triangulate the quantitative attitude findings, contextualizing numerical patterns within participants’ own accounts of their experiences [31,44]. Where the two data streams converged, this convergence strengthened confidence in the quantitative results; where they diverged, the qualitative strand provided interpretive leverage for explaining patterns that the questionnaire data alone could not account for. By drawing on multiple independent sources of evidence, this triangulation strategy reduced the risk of systematic bias inherent in any single instrument and enabled a more analytically defensible account of how PSTs engaged with CT within the vibe coding environment.

Inter-rater reliability for the qualitative coding rubric was established through a structured second coder procedure. A randomly selected subset comprising 20% of all coded prompt instances (n = 13 instances, drawn proportionally across all five CT practice categories) was independently classified by a second researcher with doctoral-level expertise in computational thinking education. Prior to independent coding, both researchers reviewed the rubric definitions and anchor examples provided in Appendix A to ensure shared understanding of the level boundaries, particularly the distinction between Level 2 (structured reasoning) and Level 3 (generative reasoning), which represents the most interpretively demanding judgment in the classification process. Inter-rater agreement was subsequently calculated using Cohen’s kappa (κ = 0.81), indicating strong agreement that exceeds the threshold of κ ≥ 0.75 conventionally adopted as the criterion for acceptable reliability in qualitative educational research [42]. The three instances in which initial disagreement arose all concerned boundary cases between Level 2 and Level 3 within the debugging and algorithmic thinking categories. These were resolved through structured discussion in which both coders returned to the rubric anchor examples and the specific prompt language until consensus was reached. The remaining 80% of instances were subsequently classified by the primary researcher, who applied the consensus criteria established during calibration systematically throughout. This procedure provides evidence that the rubric-level designations reported in Table 5 and Appendix B rest on a calibrated, replicable classification framework rather than on a single analyst’s interpretive judgment alone, thereby strengthening the construct validity of the qualitative strand of this study.
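Cohen’s kappa compares observed agreement p_o with the agreement expected by chance from each coder’s marginal label frequencies, p_e, as κ = (p_o − p_e)/(1 − p_e). The function below is a minimal sketch for two coders assigning nominal labels to the same items; the rubric-level labels in the example are hypothetical and are not the study’s calibration subset.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: for each label, product of the two raters' marginal counts.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical rubric levels assigned by two coders to four prompt instances.
coder_1 = ["L1", "L2", "L2", "L3"]
coder_2 = ["L1", "L2", "L3", "L3"]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.636 (= 7/11)
```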

Taken together, these four procedures (internal consistency evaluation, expert content review, theoretical anchoring combined with mixed-methods triangulation, and second coder inter-rater reliability) constitute a layered validity framework that addresses the quantitative, qualitative, and integrative dimensions of the study’s evidence base. No single procedure is treated as sufficient in isolation; rather, their convergence across complementary validity types provides the strongest available basis for confidence in the conclusions drawn.

4. Results

This section first outlines the descriptive statistical results (Section 4.1, comprising Sections 4.1.1–4.1.3) derived from the questionnaire instruments, followed by a synthesis of qualitative evidence (Section 4.2) gathered from open-ended survey prompts and student-authored vibe coding reflections.

As noted in Section 3.4, the three subscales use different Likert ranges; all cross-subscale comparisons below rely on Cohen’s d and the direction of pre-to-post change rather than on raw mean values, which are not directly commensurable across scales. The results presented below address CT engagement, defined as observable reasoning practices evidenced in participants’ reflective accounts and reconstructed prompts. This is distinct from CT competence, which would require performance-based or standardized assessment instruments that were not employed in this study. This distinction is also addressed in Section 6 (Conclusions).

4.1. Pre-Service Teachers’ Orientations Pre- and Post-Intervention

4.1.1. General Disposition Toward Digital Technology

Given the modest sample size (n = 24) and limited statistical power, Cohen’s d effect sizes are reported alongside significance tests to convey the practical magnitude of pre-to-post shifts independently of sample size constraints. Effect sizes were interpreted using conventional thresholds: negligible (|d| < 0.20), small (0.20–0.49), medium (0.50–0.79), and large (≥0.80) [42].

To gauge the baseline disposition of PSTs regarding digital tools and computational literacy, participants were asked to reflect on their prior engagement with software, coding, and general computer utilization. Table 1 summarizes the mean scores for these items on a five-point Likert scale.

Broadly, the data in Table 1 suggest that PSTs maintain a favorable outlook regarding the acquisition of new digital competencies (see Items 2, 4, and 6). While paired-samples t-tests revealed no statistically significant shifts, likely reflecting the modest cohort size and limited statistical power, a notable trend emerged in Items 1, 3, 5, and 7, each of which moved in a less favorable direction after the course. This directional change may tentatively indicate a slight erosion of self-assurance following the intervention. One plausible interpretation is the cognitive demand associated with mastering CT concepts; the perceived difficulty of the subject matter may temporarily undermine participants’ confidence in their technical proficiency. Corroborating this inference, qualitative exit data revealed that 10 of 24 participants (approximately 41.7%) expressed reservations about their readiness to independently facilitate coding instruction using block-based environments.

By contrast, analysis of the open-ended query, “Which pedagogical approaches will I incorporate into my future instruction?” (summarized in Table 2), underscores a notable emphasis on game-centric learning modalities and visual programming platforms (cited by 42% of respondents). This apparent discrepancy highlights a divergence between the PSTs’ immediate self-efficacy struggles and their longer-term pedagogical aspirations regarding digital integration.

Examination of practical effect sizes offers a more nuanced account than significance testing alone. Across the eight items in Table 1, seven produced negligible effects (|d| < 0.20), indicating that the course produced little or no meaningful change in these general technology dispositions. The sole exception was Item 5 (“Engaging with computer interfaces often feels inefficient and overly tedious”), which yielded a small but non-trivial effect (d = 0.20) in an unfavorable direction; that is, participants reported greater perceived tedium with computer-based work after the course than before. Although this shift did not reach statistical significance (t(23) = 1.00, p = 0.330), its practical magnitude stands apart from every other item in this subscale and warrants specific interpretive attention. Item 8 (“Facilitating programming lessons should be reserved exclusively for STEM specialists”) likewise recorded a negligible effect (d = 0.07), with means remaining close to the scale floor (1.3 pre, 1.4 post). PSTs therefore entered and exited the course with a similarly strong rejection of STEM-exclusivity beliefs, suggesting that this disposition was already well formed before the intervention and was largely unaffected by the vibe coding experience. Taken together, the effect size profile of Table 1 indicates that the course produced no broad shift in general technology orientations, but may have introduced a specific friction around the perceived time cost and repetitiveness of AI-assisted workflows. This finding is interpretively distinct from global technology anxiety and is discussed further in Section 5.1.

4.1.2. Perspectives on Game-Making Contexts

Table 3 compares pre- and post-course attitudes toward the instructional integration of digital games. Overall, the findings reveal a consistently positive orientation toward ludic learning environments. A statistically significant improvement was observed for Item 1 (“Integrating computer games can meaningfully elevate my teaching practice”; p < 0.05). Notably, participants held a favorable view of game integration both before and after the course. The decrease in mean scores for all four reverse-coded items (2, 4, 6, and 7) further suggests a consolidation of positive attitudes toward the pedagogical utility of games over the course of the study.

Beyond the single statistically significant item, the effect size distribution across Table 3 reveals a secondary pattern of practical change. Item 3 (“Digital game-making tasks tend to increase peer-to-peer cooperation”) produced a small effect (d = 0.27, t(23) = 1.34, p = 0.190), and Item 7 (“Digital games are predominantly recreational tools with limited academic merit”) similarly yielded a small effect (d = 0.30, t(23) = 1.45, p = 0.160). Although neither reached conventional significance thresholds, which may reflect the modest sample size rather than an absence of real change, both represent practically meaningful shifts toward more positive game-making attitudes. In contrast, Items 2, 4, and 6, all negatively worded items addressing perceived conflicts between games and the teacher’s role, professional identity, or subject area, produced negligible effects (d = 0.09, 0.13, and 0.18, respectively), suggesting that participants did not experience the course as threatening their professional self-concept as educators, despite its departure from conventional instructional formats. The one statistically significant item, Item 1 (“Integrating computer games can meaningfully elevate my teaching practice”), produced a medium effect (d = 0.51, t(23) = 2.51, p = 0.020), making it the largest practical shift across all three subscales and the only change in this study to meet both the significance and medium-effect thresholds simultaneously. This convergence of statistical and practical significance lends particular weight to the conclusion that PSTs’ beliefs about the instructional value of game-making approaches shifted genuinely and meaningfully over the course.

4.1.3. Shifts in General Perceived Self-Competence

An examination of PSTs’ broader sense of self-competence revealed a modest but encouraging shift across the semester (see Table 4). Baseline measures indicated a relatively robust sense of general self-efficacy prior to instruction, contrary to expectations formed during the design phase, and post-course means were higher than pre-course means on all six items. Given the small number of participants, the absence of statistical significance may reflect insufficient power to detect true effects rather than an absence of change, so this consistent positive direction is noted without being treated as confirmed. The qualitative data derived from participant reflections provide context for this modest enhancement in perceived resilience and problem-solving capacity.

The effect size profile of Table 4 is more differentiated than the non-significant p-values alone would suggest. Two items produced small effects that, while not statistically significant, carry interpretive relevance given the study’s limited statistical power. Item 3 (“I find it relatively uncomplicated to maintain focus and realize my objectives”) yielded the largest practical shift in this subscale (d = 0.33, t(23) = 1.64, p = 0.110), and Item 1 (“I believe I can resolve challenging problems through sustained effort”) produced a small effect of comparable size (d = 0.26, t(23) = 1.26, p = 0.220). Both shifts were positive, consistent with modest growth in task persistence and goal-directed focus, competencies closely aligned with the iterative debugging and prompt refinement demands of the vibe coding environment. The remaining four items produced negligible effects (d ranging from 0.08 to 0.18), indicating that broader dimensions of self-efficacy, such as composure under pressure and the ability to generate alternative solutions, were largely stable across the semester. This stability should not be read as stagnation; it may partly reflect the already-high baseline scores on these items (Pre-M ranging from 3.1 to 3.4 on a four-point scale), which left limited room for further positive change (a possible ceiling effect). The pattern suggests that the course most noticeably influenced the specific self-regulatory competencies most directly exercised during game-making, namely sustained attention and persistence through difficulty, while leaving more global self-efficacy perceptions intact.

Across all three subscales, the effect size distribution reveals a coherent pattern: negligible effects cluster in the technology attitudes subscale (all d < 0.20 except Item 5, d = 0.20), the game-making subscale spans negligible to medium effects (d ranging from 0.09 to 0.51) and contains the study’s only medium effect, and the general competence subscale shows selective small effects for the two items most directly exercised during vibe coding. This d-value distribution is the appropriate cross-subscale comparator, as absolute mean values are not commensurable across scales of different ranges. The pattern suggests that a single 12-week intervention delivered to a favorably predisposed sample was insufficient to produce broad, generalized attitude change across multiple dimensions. However, five items produced small but practically non-trivial effects: Item 5 from Table 1 (d = 0.20), Items 3 and 7 from Table 3 (d = 0.27 and 0.30), and Items 1 and 3 from Table 4 (d = 0.26 and 0.33). Item 1 from Table 3 produced the only medium effect in the dataset (d = 0.51), alongside statistical significance. Notably, the direction of change was not uniformly favorable: the small effect for Item 5 in Table 1 indicated increased perceived tedium, whereas all effects in Tables 3 and 4 were in a favorable direction. This reduced comfort with the mechanics of AI-assisted work, alongside increased belief in its pedagogical value, constitutes a central tension in the data and is the primary focus of the discussion that follows. It is also consistent with the interpretation that participants developed a more calibrated and realistic appraisal of computational work rather than a generalized increase or decrease in confidence.
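The cross-subscale tally can be reproduced directly from the d values reported in Tables 1, 3, and 4, using the interpretation bands given in the table notes. This is a simple illustrative check; the dictionary layout is a convenience chosen here.

```python
from collections import Counter

# Reported d values, in item order, from Tables 1, 3, and 4.
D_VALUES = {
    "Table 1": [0.17, 0.11, 0.07, 0.03, 0.20, 0.05, 0.16, 0.07],
    "Table 3": [0.51, 0.09, 0.27, 0.13, 0.16, 0.18, 0.30],
    "Table 4": [0.26, 0.08, 0.33, 0.17, 0.18, 0.13],
}


def band(d: float) -> str:
    """Interpretation thresholds from the table notes, applied to |d|."""
    d = abs(d)
    if d < 0.20:
        return "negligible"
    if d < 0.50:
        return "small"
    if d < 0.80:
        return "medium"
    return "large"


per_table = {name: Counter(band(d) for d in ds) for name, ds in D_VALUES.items()}
overall = Counter(band(d) for ds in D_VALUES.values() for d in ds)
for name, counts in per_table.items():
    print(name, dict(counts))
print("All items", dict(overall))
```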

4.2. Qualitative Insights

The qualitative findings provide a process-level account of the quantitative patterns. While the questionnaires captured a recalibration of self-assessed competence (that is, a shift from unconscious to conscious incompetence), the reflective narratives reveal that this recalibration coexisted with, and in some cases enabled, a growing sense of creative agency and computational engagement. The themes that follow therefore help explain how PSTs experienced and navigated the cognitive demands that the quantitative instruments registered as reduced surface-level confidence.

Applying the three-tier rubric (Appendix A) to all coded instances reveals how the quality of CT engagement varied across practices and across the instructional sequence. Decomposition instances clustered predominantly at Level 2 (structured reasoning), with participants identifying discrete functional components of their game but rarely generalizing those structures beyond the immediate task. Debugging showed the widest spread across levels: instances from Module 2 were predominantly coded at Level 1, reflecting trial-and-error manipulation of blocks and prompts, while by Module 3 the majority of debugging instances reached Level 3, characterized by explicit causal hypotheses about execution order and system state. Algorithmic thinking instances were predominantly Level 2, reflecting the construction of rule-governed sequences without yet producing transferable logical structures. Abstraction and pattern recognition, where they appeared at all, were coded at Level 3 in the majority of instances, consistent with their emergence exclusively in Module 3 after participants had accumulated sufficient game-making experience to perceive reusable patterns across system components. Table 5 summarizes the rubric-level distribution across all five CT practices.

Thematic analysis of PST reflections revealed a strong sense of empowerment derived from the creative agency afforded by the visual programming environment. Participants characterized the experience as simultaneously invigorating, intellectually demanding, and distinct from traditional coursework. The following paraphrased excerpts illustrate a trajectory of growth in computational self-concept, highlighting emergent dispositions toward cognitive flexibility and iterative refinement. Where indicated, the salient CT component referenced in the learning process is given in brackets:

Participant D: “The course taught me the value of perseverance in the face of ambiguity. When confronting an unfamiliar coding structure that seemed insurmountable, I learned to deconstruct the problem, seek out supplementary resources, and rely on peer consultation rather than immediate capitulation.” [Abstraction and Pattern Recognition]

Participant L: “I discovered that productive struggle can be a catalyst for deeper engagement. The cycle of debugging—despite the temporary frustration of malfunctioning scripts—was immensely satisfying. That friction forced a level of analytical scrutiny that I wouldn’t have applied otherwise.”

Participant R: “The construction process was iterative and often painstaking; I spent extended periods isolating logic errors. However, the tangible outcome of a fully functional, interactive artifact rendered the prior frustration inconsequential. It redefined my perception of my own technical ceiling.”

The culminating projects frequently exceeded the students’ initial estimations of their own capabilities, which participants largely attributed to the medium’s capacity for personalized, creative expression. A representative comment exemplifies this:

Participant J: “The platform provided a scaffold, yet the potential for personalized design was immense. Once I realized the project was a canvas for my own creative sensibilities rather than a rigid exercise, the development process shifted from drudgery to discovery.”

Several narratives explicitly acknowledged the role of legitimate failure and heuristic exploration within the design process:

Participant N: “The ‘guess and check’ method—manipulating parameters and observing the real-time output from prompts—became an efficient diagnostic tool, particularly when navigating complex, layered scripts late in the design phase.”

Peer-to-peer discourse around coding challenges manifested in two distinct phases: (a) immediate problem-solving during coding roadblocks; and (b) shared conceptualization and execution of the summative projects. The following anonymized accounts capture these respective contexts:

Participant A: “When my sequence logic failed, I immediately sought out a peer to act as a second pair of eyes. Verbalizing my intended steps and comparing notes often revealed the missing link in my algorithm.”

Participant G: “My final project was a product of genuine co-construction. While minor disagreements emerged regarding the narrative and question formulation, consensus was reached efficiently. The transition from the storyboard phase to the actual coding was streamlined because we shared a unified, clearly articulated vision from the outset.”

The value of this peer interaction is further underscored by the qualitative data in Table 6, which summarizes the themes of participants’ open-ended recommendations regarding course elements they wished to retain. Moreover, the thematic analysis confirmed a strong resonance with the pedagogical “Four P’s” framework (Projects, Passion, Peers, and Play) proposed by Resnick [23]. That nearly half the cohort expressed a desire to maintain the course’s substantive content, while over half sought deeper and more extended engagement, indicates that the PSTs recognized significant intrinsic and professional value in the CT experience despite the cognitive challenges they encountered.

The qualitative analysis of participant logs revealed that PSTs engaged in iterative computational refinement throughout the course. Rather than submitting single, undifferentiated requests to AI systems, participants progressively developed more structured prompting strategies, moving from broad problem descriptions toward targeted, logic-specific instructions that reflected emerging CT reasoning. Two reconstructed examples drawn from participant reflective logs illustrate this transition prior to the systematic analysis presented in Appendix B.

The first example shows decomposition combined with constraint setting. A participant working on a maze game wrote: “My sprite keeps moving even when I let go of the arrow key, and the walking animation keeps going too. Can you help me write a script that checks when the sprite has stopped moving and switches to the standing costume?” This prompt demonstrates decomposition as the separation of compound behavior into discrete functional components, namely movement control and animation state. The participant reframes a single behavioral issue into two interdependent system states (movement and costume appearance) and specifies a conditional rule linking sprite state to visual output. Rather than reporting a general malfunction, the prompt defines a rule-based condition under which the system should transition between states, reflecting foundational CT practice consistent with Wing’s conceptualization of decomposition as partitioning complex problems into manageable components [1].
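The decomposition in this prompt can be made concrete with a short sketch. It renders the two system states the participant separated (movement control and animation state), and the conditional rule linking them, as plain Python rather than Scratch blocks. It is a hypothetical illustration, not the participant's script or the chatbot's reply; the class `MazeSprite`, the costume names, and the step size are assumptions introduced here.

```python
# Hypothetical Python rendering of the decomposed sprite behaviour described
# in the prompt; it is not the participant's Scratch script or the AI output.
from dataclasses import dataclass

STEP = 10  # assumed distance moved per frame while an arrow key is held


@dataclass
class MazeSprite:
    x: int = 0
    moving: bool = False       # component 1: movement-control state
    costume: str = "standing"  # component 2: animation state

    def update_movement(self, right_held: bool, left_held: bool) -> None:
        """Move only while exactly one arrow key is held; otherwise stop."""
        if right_held and not left_held:
            self.x += STEP
            self.moving = True
        elif left_held and not right_held:
            self.x -= STEP
            self.moving = True
        else:
            self.moving = False

    def update_animation(self) -> None:
        """The conditional rule linking movement state to visual output."""
        self.costume = "walking" if self.moving else "standing"

    def step(self, right_held: bool = False, left_held: bool = False) -> None:
        # Each frame: update movement first, then derive the costume from it.
        self.update_movement(right_held, left_held)
        self.update_animation()


if __name__ == "__main__":
    sprite = MazeSprite()
    sprite.step(right_held=True)  # key held: the sprite walks
    print(sprite.x, sprite.costume)
    sprite.step()                 # key released: the sprite stops and stands
    print(sprite.x, sprite.costume)
```

Keeping the two updates separate mirrors the participant's reframing: the costume is never fixed directly, because it is derived from the movement state on every frame.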

The second example illustrates debugging through hypothesis testing. A participant working on the same maze game wrote: “Every time my sprite reaches the edge of the maze and reappears at the starting position, it just stops responding to the arrow keys. I think the problem is that the ‘go to start position’ block runs before the sprite properly resets its movement state. Can you check if that is what is happening and suggest how to fix the timing?” This prompt demonstrates debugging as hypothesis-driven reasoning about system behavior, in which the participant identifies a potential causal relationship between block execution order and the loss of input responsiveness after reset. Rather than describing the issue in vague terms, the participant formulates a testable explanation of system behavior and requests verification of a specific execution sequence. This reflects a foundational debugging practice as defined by Grover and Pea [13], moving from surface-level symptom description toward structural reasoning about program execution.
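The participant's hypothesis can be restated as a claim that could be checked by running the reset steps in both orders. The sketch below is a hypothetical model of that check, not a description of how Scratch schedules blocks: the method names and, in particular, the rule that `go_to_start` restores input only when the sprite arrives at rest are invented here so that the model exhibits the failure mode the participant suspected.

```python
# Hypothetical model of the participant's timing hypothesis. The arrival rule
# in go_to_start() is an invented mechanism, not Scratch's actual behaviour.
from dataclasses import dataclass
from typing import List, Tuple

START: Tuple[int, int] = (0, 0)


@dataclass
class ResettingSprite:
    pos: Tuple[int, int] = (240, 0)  # assumed: the sprite has just hit the maze edge
    velocity: int = 10               # movement left over from before the reset
    input_enabled: bool = False      # arrow keys are ignored while resetting

    def reset_movement_state(self) -> None:
        self.velocity = 0

    def go_to_start(self) -> None:
        self.pos = START
        # Assumed rule: input is restored only if the sprite arrives at rest.
        self.input_enabled = self.velocity == 0


def responds_after_reset(order: List[str]) -> bool:
    """Run the reset steps in the given order; report whether keys work afterwards."""
    sprite = ResettingSprite()
    for step_name in order:
        getattr(sprite, step_name)()
    return sprite.input_enabled


if __name__ == "__main__":
    suspected = ["go_to_start", "reset_movement_state"]  # order the participant suspects
    fixed = ["reset_movement_state", "go_to_start"]
    print("suspected order responsive:", responds_after_reset(suspected))
    print("fixed order responsive:", responds_after_reset(fixed))
```

Comparing the two runs is the structural move the prompt makes: it names a specific execution sequence as the suspected cause and asks for that sequence to be verified, rather than reporting only the symptom.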

Both examples are reconstructed from participant reflective accounts rather than extracted verbatim from live AI interaction transcripts; their function is to render observable CT reasoning visible prior to the systematic analysis that follows.

To systematically address RQ2, participants’ natural language interactions with chatbots were examined through their reflective logs and session-based documentation. Because direct transcripts of AI exchanges were not retained, the analysis focuses on analytically reconstructed prompts that capture recurrent reasoning patterns evident across the dataset. These reconstructions are grounded in participants’ descriptions of their problem-solving processes and are used to illustrate how core CT practices were enacted within the AI-mediated game-making environment. Accordingly, the reconstructed prompts represent interpretive approximations rather than verbatim interaction data.

Representative examples of these prompts, organized by CT practice, course stage of emergence, and developmental tier, alongside their analytical rationale and frequency of occurrence, are presented in Appendix B. The appendix is not intended to provide an exhaustive catalog of all interactions, but rather a structured synthesis that makes visible how PSTs progressively engaged in decomposition, debugging, algorithmic thinking, abstraction, and pattern recognition through natural language prompting. By linking these practices to their point of emergence within the instructional sequence and their distribution across participants, Appendix B provides an empirically grounded account of how CT was enacted—and how it developed—within the context of AI-mediated (“vibe coding”) game making. Frequency counts reflect the number of distinct prompt instances coded under each CT category across all participant logs (n = 24) and are expressed both as raw counts and as proportions of the full cohort to enable assessment of the relative prevalence and reach of each practice.

The prompt sequences illustrated in Appendix B, considered alongside their “Course Stage” and frequency distribution, provide empirical evidence that natural language interaction in a vibe coding environment does not bypass computational logic; rather, it relocates it. In this context, the cognitive work of CT shifts from the syntactic level, where a learner must correctly construct code from scratch, to the semantic and evaluative level, where a learner must correctly specify intent, interpret AI-generated output, and diagnose misalignments between what was requested and what was produced [11,29]. For PSTs with no prior programming experience, this relocation appears to have been productive: rather than encountering CT as an opaque technical barrier, participants encountered it as a communicative and analytical challenge—one that the reflective logs suggest they were able to engage with meaningfully, even in the absence of formal coding knowledge.

The frequency distribution across Appendix B reveals a developmental gradient that is consistent with established accounts of CT progression. Debugging was the most pervasive practice, observed in 79% of participants, followed by decomposition at 58%. Both emerged as characteristic prompt patterns from Module 2 onward, consistent with their role as the most immediate and feedback-responsive entry points into computational reasoning [1,13]. Algorithmic thinking appeared in 38% of participants across Modules 2 and 3, reflecting the intermediate conceptual demands of constructing rule-governed, variable-based game logic. Abstraction and pattern recognition, the two higher-order practices identified by Grover and Pea [13] as developmentally dependent on prior consolidation of foundational skills, appeared only in Module 3 and in the smallest proportions of participants (25% and 33%, respectively). This gradient is unlikely to be incidental. Within a natural language prompting paradigm, it mirrors the developmental sequence that prior CT research has documented in block-based and text-based programming environments, suggesting that the hierarchical structure of CT development may hold across different modes of computational engagement.
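With a cohort of 24, each reported percentage corresponds to a whole number of participants. The sketch below recovers the implied counts, assuming the percentages are shares of the full cohort rounded to the nearest whole percent; the counts are derived arithmetic for orientation, not figures reported in the study.

```python
# Recover the whole-participant counts implied by the reported percentages,
# assuming each is a share of the full cohort rounded to the nearest percent.
from typing import Dict, List

COHORT_SIZE = 24

REPORTED_PERCENT: Dict[str, int] = {
    "debugging": 79,
    "decomposition": 58,
    "algorithmic thinking": 38,
    "pattern recognition": 33,
    "abstraction": 25,
}


def implied_counts(percent: int, n: int = COHORT_SIZE) -> List[int]:
    """All participant counts k whose share 100*k/n rounds to `percent`."""
    return [k for k in range(n + 1) if abs(100 * k / n - percent) <= 0.5]


if __name__ == "__main__":
    for practice, percent in REPORTED_PERCENT.items():
        print(f"{practice}: {percent}% -> {implied_counts(percent)} of {COHORT_SIZE}")
```

Under that assumption, each percentage maps to exactly one count (19, 14, 9, 8, and 6 participants in the order listed), so the gradient can also be read as the number of PSTs reached by each practice.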

It is equally important, however, to attend to what the frequency data reveal about the limits of what the vibe coding environment produced. The fact that abstraction appeared in only one quarter of participants—despite being the CT competency most predictive of transferable CT understanding [13,25]—indicates that this practice was not a natural or spontaneous outcome of AI-assisted game-making for most learners. Higher-order CT reasoning required prior experience with game mechanics sufficient to perceive reusable patterns across system components, a threshold that not all participants reached within the 12-week course. This finding has direct implications for instructional design: without explicit scaffolding that prompts learners to look for structural generalities before requesting AI assistance, abstraction risks remaining the province of the most advanced participants rather than becoming a shared feature of the vibe coding experience. These design implications are developed in Section 7, and operationalized as design principles in Appendix C.

Taken together, the qualitative findings provide a process-level answer to RQ2. PSTs did navigate computational challenges (debugging, decomposition, algorithmic thinking, and, for a proportion of participants, abstraction and pattern recognition) in ways that reflect genuine CT engagement rather than passive delegation to AI systems. These practices were evidenced in participants’ natural language interactions through the specificity of their problem formulations, the presence of causal hypotheses, and the articulation of system-level constraints. The AI chatbot functioned as a cognitive scaffold [17] that redirected participants’ attention from syntactic correctness to logical architecture, enabling them to engage with the structure of their games in ways that would otherwise have been inaccessible to learners without prior coding experience. Nevertheless, this engagement was uneven both across participants and across CT competency levels, and the course alone was insufficient to ensure that higher-order practices reached all learners. This tension between the access that vibe coding provides and the depth it does not automatically guarantee is the central finding of this section and the central challenge for future course design.

Across both data strands, a consistent pattern emerges. PSTs engaged substantively with CT processes, particularly debugging and decomposition, while showing limited spontaneous progress toward higher-order practices such as abstraction and pattern recognition. The quantitative data capture this as a selective pattern of attitude shift—strongest for perceived instructional value of game-making (d = 0.51), negligible for general technology orientations—while the qualitative data explain the mechanism: the iterative prompt–evaluate–refine cycle generated genuine CT reasoning but did not, for most participants, scaffold unprompted generalization beyond the immediate task.

5. Discussion

The current study investigated how pre-service teachers engage with CT when it is embedded within an AI-assisted, game-making course built around vibe coding principles. The findings from the quantitative and qualitative strands offer a layered view that sheds light not only on what PSTs gained from the experience, but also on the specific tensions inherent in learning CT and learning to teach it at the same time. Drawing on empirical findings and observed patterns of learner engagement, this study identifies a set of design principles for supporting CT development in AI-mediated environments (see Appendix C). These principles extend beyond tool use, specifying the pedagogical conditions under which AI assistance contributes to, rather than obscures, computational reasoning.

5.1. Addressing RQ1: Changes in Attitudes Toward Digital Technology Use, Game-Making Contexts, and Self-Efficacy

5.1.1. Attitudes Toward Digital Technology Use

Contrary to a straightforward confidence-building narrative, several attitude indicators showed a slight downward shift after sustained engagement with vibe coding, particularly those related to perceived speed of technology acquisition, anxiety about teaching programming, and the time demands of computer-based work. Rather than signifying failure, this recalibration reflects the transition from unconscious to conscious incompetence that the metacognitive literature identifies as necessary for genuine expertise development [45]. PSTs who entered with high, generalized confidence adjusted their self-perceptions once they encountered the specific, rigorous complexities of CT. While this study mirrors Gleasman and Kim’s observation of modest overall attitude gains [6], it diverges markedly in its findings on confidence, a divergence that likely reflects the distinctive cognitive demands of the AI-mediated vibe coding context. The pattern also echoes Butler and Leahy [5], whose participants articulated pedagogical strategies for CT instruction only after sustained hands-on engagement, suggesting that the gap between experiencing CT and feeling ready to teach it is a consistent feature of the PST developmental trajectory rather than a failure specific to this course. What distinguishes the present study is that it situates this developmental gap within the additional complexity of a vibe coding context, where the locus of computational agency is partially redistributed between the learner and the AI system. This raises new questions about what confidence in CT actually means when code can be generated rather than written.

The effect size analysis adds precision to this interpretation. Across the eight items of the technology attitudes subscale (Table 1), seven produced negligible practical effects (d < 0.20), indicating that the course did not broadly reshape PSTs’ general relationship with digital technology. The negligible effects on Items 2, 4, and 6 (d = 0.09, 0.13, and 0.18) indicate no meaningful change in an area of professional identity where attitudes were already well formed prior to enrollment. However, the single item producing a small effect (d = 0.20) is analytically revealing because of its direction: Item 5 (“Engaging with computer interfaces often feels inefficient and overly tedious”) shifted negatively, meaning participants found computer-based work more tedious after the course. This is not a generalized technology anxiety response, since Items 6 and 7, capturing broader aversion and confidence, remained essentially unchanged. Rather, it appears to be a workflow-specific response to the particular demands of iterative AI prompting: generating output, evaluating it, refining the prompt, waiting for a revised response, and debugging what is produced. This cycle is cognitively demanding and time-consuming in ways that differ from general computer use, and the small but non-trivial effect on Item 5 suggests that participants experienced this difference acutely.

Reframing this workflow friction as pedagogically productive rather than as a usability problem is essential for designing effective vibe coding courses. Kapur’s [46] productive failure framework is directly applicable: the iterative loop of prompt generation, output evaluation, and prompt refinement structurally instantiates the kind of exploratory struggle that, in that framework, produces more robust understanding than guided instruction alone. When a generated Scratch script does not behave as intended, the ensuing debugging cycle (formulating a hypothesis about what went wrong, adjusting the prompt specification, and evaluating the revised output against the original intent) is not wasted time but the moment at which computational reasoning occurs in its most explicit form. Future course designs should make this reframing pedagogically visible. An early session activity asking PSTs to deliberately generate a ‘bad’ prompt, analyze why the output diverged from their intent, and write out the diagnostic reasoning involved would give learners a metacognitive vocabulary for understanding why the iterative workflow is valuable before they experience it as frustrating. Establishing the iterative cycle as the expected and desired mode of working, rather than as an obstacle to be overcome, may reduce the tedium effect observed in Item 5 without reducing the analytical demands of the workflow.

5.1.2. Game-Making Contexts

Yet, this downward recalibration coexisted with a strikingly forward-looking set of intentions. When asked what they anticipated applying in future classrooms, game-making approaches and vibe coding were cited most frequently, suggesting that even participants who felt less technically confident still recognized the value of what they had encountered and intended to act on it. The gap between feeling not yet ready to teach something and valuing it enough to intend to teach it is itself a meaningful finding, pointing to a course that generated genuine intellectual and pedagogical engagement even if it did not yet produce the technical consolidation needed for confident instruction.

This tension has direct implications for course design. The concurrent demands placed on PSTs to acquire a new technical skill while also developing a pedagogical framework for teaching may have imposed a higher cognitive load than anticipated. Future iterations could benefit from a more explicitly sequenced structure, in which foundational technical fluency with chatbots is established prior to introducing pedagogical and problem-solving applications. Such staging would help mitigate cognitive competition between learning the tool itself and using it as a cognitive instrument, allowing each dimension to develop with greater depth and stability [14].

Crucially, this friction with workflow mechanics coexisted with the study’s strongest finding. Item 1 of Table 3 (“Integrating computer games can meaningfully elevate my teaching practice”) produced a medium effect (d = 0.51, t(23) = 2.51, p = 0.020), the only item in the entire dataset to satisfy both practical and statistical significance criteria. Its magnitude, the largest observed across all subscales, lends analytical weight to the conclusion that the course produced a genuine shift in how PSTs perceive the instructional value of game making. Two additional items produced small but practically non-trivial effects in the same positive direction: Item 3 (d = 0.27) and Item 7 (d = 0.30), both addressing beliefs about the academic and social value of digital games. Together, these three items form a coherent cluster suggesting that sustained, hands-on engagement with game making, even when mediated through AI and accompanied by confidence recalibration, can strengthen PSTs’ endorsement of games as legitimate pedagogical tools. This aligns with Gleasman and Kim’s [6] observation of attitude gains following work in block-based environments and extends it to the AI-mediated context, suggesting that the game-making frame retains its attitudinal power even when production shifts from manual coding to prompt-driven development.
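The reported effect size can be cross-checked against the test statistic. The source does not state which variant of d was computed; assuming it is the paired-samples d_z, the two are linked by d_z = t / √n, where n is the number of participants with both pre- and post-course scores. The reported t(23) implies n − 1 = 23, that is, n = 24.

```python
import math


def dz_from_paired_t(t: float, n: int) -> float:
    """Cohen's d_z for a paired (pre/post) design, recovered from the paired t statistic."""
    return t / math.sqrt(n)


if __name__ == "__main__":
    t_stat = 2.51     # reported for Item 1 of Table 3
    n_pairs = 23 + 1  # degrees of freedom reported as t(23), so n = df + 1
    print(f"d_z = {dz_from_paired_t(t_stat, n_pairs):.2f}")  # d_z = 0.51
```

The result agrees with the reported d = 0.51, which is consistent with, though it does not prove, the d_z assumption.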

5.1.3. Self-Efficacy

The convergence of decreased self-efficacy with increased evidence of structured debugging suggests a transition from naïve confidence to informed awareness. This recalibration of programming confidence is further illuminated by contrasting it with participants’ general self-efficacy, which showed a different pattern. While confidence in programming and technology integration showed a downward recalibration (Table 1), broader problem-solving competence and resilience exhibited a consistent, albeit non-significant, upward trend across all six items (Table 4). This divergence suggests that PSTs did not experience a global erosion of self-confidence; rather, they developed a more calibrated and realistic appraisal of the specific demands of computational work while simultaneously strengthening general adaptive dispositions. Distinguishing these constructs matters for teacher education research: a decline in coding-specific confidence may paradoxically signal productive metacognitive growth rather than disengagement.

The effect size profile of the general competence subscale supports this interpretation. Two items produced small positive effects that, while not statistically significant, carry interpretive relevance given the study’s limited power: Item 3 (“I find it relatively uncomplicated to maintain focus and realize my objectives,” d = 0.33) and Item 1 (“I believe I can resolve challenging problems through sustained effort,” d = 0.26). Both represent the self-regulatory competencies—task persistence and goal-directed focus—most directly exercised during the iterative debugging and prompt refinement cycles of vibe coding. Broader dimensions, such as composure under pressure and the ability to generate alternative solutions, remained stable, likely due to ceiling effects given the already high pre-course baseline scores (Pre-M ranging from 3.1 to 3.6 on a four-point scale). The practical implication is that the course most noticeably influenced the specific self-regulatory resources that the game-making process required, while leaving more global self-efficacy perceptions intact.

When the three subscales are considered together, the effect size distribution reveals an asymmetric pattern that is itself theoretically informative. Negligible effects clustered in the technology attitudes subscale (Table 1), particularly around perceptions of efficiency, with Item 5 as the sole exception, shifting negatively in a workflow-specific rather than globally anxious direction. Small-to-medium positive effects clustered in the game-making subscale (Table 3). The general competence subscale (Table 4) showed selective strengthening of task persistence and goal-directed focus. This three-part pattern—friction with process, growth in pedagogical belief, selective strengthening of relevant self-regulatory capacities—is internally coherent and consistent with the constructionist account of learning through making. PSTs did not emerge feeling uniformly more confident or capable. They emerged with a more precise and calibrated set of beliefs: greater conviction in the educational power of what they had built, greater awareness of the cognitive demands involved in building it, and modestly stronger self-regulatory resources for sustaining effort when those demands intensified [8,46].
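Because this three-part pattern rests on a small number of items, tallying the reported effect sizes explicitly can help. The sketch below classifies each d value reported in Sections 5.1.1 to 5.1.3 by magnitude, using Cohen's conventional benchmarks, which match the thresholds applied in the text (below 0.20 negligible, from 0.20 small, from 0.50 medium). It assumes a total of 21 items across the three subscales and that items without a reported d are negligible, as the subscale descriptions indicate; the (table, item) keys are shorthand introduced here.

```python
# Tally the effect sizes reported in Sections 5.1.1-5.1.3 by magnitude.
from collections import Counter
from typing import Dict, Tuple

TOTAL_ITEMS = 21  # attitude items across the three subscales (Tables 1, 3, and 4)

# Reported d values keyed by (table, item); Table 1 Item 5 shifted negatively.
REPORTED_D: Dict[Tuple[int, int], float] = {
    (1, 2): 0.09, (1, 4): 0.13, (1, 6): 0.18, (1, 5): -0.20,
    (3, 1): 0.51, (3, 3): 0.27, (3, 7): 0.30,
    (4, 1): 0.26, (4, 3): 0.33,
}


def cohen_label(d: float) -> str:
    """Classify |d| with Cohen's benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


def tally(reported: Dict[Tuple[int, int], float], total: int) -> Counter:
    counts = Counter(cohen_label(d) for d in reported.values())
    # Assumption: items whose d is not reported in the text are negligible.
    counts["negligible"] += total - len(reported)
    return counts


if __name__ == "__main__":
    print(dict(tally(REPORTED_D, TOTAL_ITEMS)))
    # {'negligible': 15, 'small': 5, 'medium': 1}
```

Classifying by |d| counts Table 1 Item 5 as small even though its shift was negative, which is why direction has to be read separately, as the text does.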

5.2. Addressing RQ2: Navigating Computational Challenges in Vibe Coding Environments

5.2.1. Navigating Difficulties in AI-Mediated Game-Making Contexts

The qualitative data offer a considerably richer account of how PSTs actually moved through the computational challenges the course presented. The most consistent pattern was one of gradual reorientation toward difficulty. PSTs who initially experienced obstacles as signs of inadequacy progressively came to treat them as inherent and even generative features of the computational environment. This transition, from avoidance to engagement and from frustration to persistence, suggests meaningful cognitive development: it reflects the internalization of a core CT disposition wherein learners recognize that debugging, revision, and iterative refinement are not signs of failure but the very mechanisms of computational problem solving [1,24].

To make that cognitive shift analytically visible, the present discussion interprets participant accounts using the CT engagement rubric in Appendix A, which distinguishes surface interaction (Level 1), structured reasoning (Level 2), and generative reasoning (Level 3). This allows the analysis to move beyond whether CT appeared at all and instead to specify how deeply participants reasoned about computational problems. The reorientation toward difficulty also resonates with Kapur’s [46] concept of productive failure, in which learners who struggle with complex problems before receiving instruction develop more robust understanding than those guided to solutions from the outset. The vibe coding process, by generating imperfect outputs that learners must evaluate and correct, structurally instantiates this principle each time a program is generated. Each generated program thus becomes a starting point for reasoning rather than a finished product.

The creative and expressive dimensions of the environment also played a significant role in sustaining engagement through difficulty. The ability to make personal choices about game content, visual design, and thematic connection to subject matter specialization transformed what might otherwise have been a purely technical exercise into something with individual meaning and stakes. This reinforces constructionist arguments that learning through making is most powerful when the artifact is personally meaningful to the maker [8,22].

The evidence for genuine rather than superficial engagement is uneven across the cohort, and this unevenness is reported here rather than set aside. The rubric-level distribution in Table 5 shows that while the majority of debugging and decomposition instances reached Level 2 (structured reasoning) or Level 3 (generative reasoning), a proportion remained at Level 1 (surface interaction). Participants coded at Level 1 produced problem descriptions that lacked logical structure, vague prompt requests (“make the game better”), and reflective accounts that did not engage with execution logic. The course design mitigated but did not eliminate this pattern, which is why the design principles in Appendix C recommend more explicit scaffolding for conceptual ownership in future iterations.

5.2.2. The Developmental Gradient of CT Practices in Vibe Coding

The CT competencies that most visibly emerged from PSTs’ reflective accounts were implementation and solution testing, data representation and model construction, and the evaluation of alternative solutions through systematic debugging. As shown in Table 5, instances clustered primarily at Level 2 for decomposition and algorithmic thinking, while debugging showed the widest developmental spread, progressing from Level 1 in Module 2 to Level 3 in Module 3. This distribution matters because it shows that CT engagement was not uniform: some participants remained at exploratory trial-and-error, while others progressed toward generative reasoning. That variability is consistent with the broader CT development literature, which recognizes exploratory testing as a developmentally appropriate strategy in game-making environments, particularly in early engagement [24]. The important finding is not that all participants reached the same level, but that meaningful CT engagement occurred across the range—and that the vibe coding environment accommodated that range without penalizing those who needed more time to move toward systematic reasoning.

The frequency data in Appendix B corroborate this developmental account. Debugging emerged as the most pervasive CT practice, recorded in 22 instances across 79% of participants, followed by decomposition, which appeared in 18 instances across 58% of participants. Both practices emerged from Module 2 onward and were most often coded at Level 2 or Level 3, reflecting structured reasoning that frequently matured into hypothesis-driven debugging. By contrast, abstraction was the least frequently observed practice (seven instances, 25% of participants), and pattern recognition appeared in 33% of participants. These higher-order practices emerged only in Module 3, after participants had accumulated enough game-making experience to perceive reusable structures across their projects. Importantly, the vibe coding paradigm did not flatten this sequence: even when AI generated code on demand, the cognitive work of recognizing when abstraction was possible, and of expressing it as a prompt, remained the learner’s own.

The scarcity of abstraction and pattern recognition is a design-relevant finding. These are precisely the CT practices most associated with transferable understanding [13], yet they are also the most vulnerable to being bypassed in a prompt-driven environment, because learners can obtain reusable code from AI without themselves articulating the abstraction that motivates it. The frequency data confirm that some PSTs did identify such patterns, but not yet consistently across the cohort. This finding therefore supports the design principles articulated in Appendix C: future iterations should require learners to identify repeated structures, predict reusable logic, and formulate the abstraction before requesting AI assistance, so that higher-order CT practices become an expected part of the workflow rather than an optional by-product.
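A hypothetical example shows what formulating an abstraction before prompting could look like in game logic. Suppose a maze game has two enemies that patrol back and forth. Written case by case, the rules repeat; the abstraction consists of noticing that only the bounds and speed differ and naming one parameterized rule that a learner could then ask the AI to implement. The functions and values below are invented for illustration and are not drawn from participants' projects.

```python
# Hypothetical patrol logic for two enemies in a maze game (illustrative only).
from typing import Tuple

State = Tuple[int, int]  # (x position, direction: +1 right or -1 left)


# Before abstraction: one near-identical rule per enemy.
def patrol_bat(x: int, direction: int) -> State:
    x += 4 * direction
    if x <= 0 or x >= 100:
        direction = -direction
    return x, direction


def patrol_ghost(x: int, direction: int) -> State:
    x += 2 * direction
    if x <= 50 or x >= 200:
        direction = -direction
    return x, direction


# After abstraction: the shared pattern, with the differences as parameters.
def patrol(x: int, direction: int, left: int, right: int, speed: int) -> State:
    """Move one step and reverse direction on reaching either bound."""
    x += speed * direction
    if x <= left or x >= right:
        direction = -direction
    return x, direction


if __name__ == "__main__":
    bat = generic_bat = (10, 1)
    for _ in range(60):
        bat = patrol_bat(*bat)
        generic_bat = patrol(*generic_bat, left=0, right=100, speed=4)
    print(bat == generic_bat)  # True: the general rule reproduces the specific one
```

The instructionally important step is the second one: a learner who can state the parameterized rule has done the abstraction, whereas one who only asks the AI for "another enemy like the bat" may receive reusable code without articulating why it generalizes.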

5.2.3. AI-Assisted Programming

While this study’s findings align with certain patterns in prior AI-assisted programming research, they also reveal important distinctions. Kazemitabaar et al. showed that AI code generation can simultaneously support and obscure computational reasoning in novice programmers, a duality also visible here: PSTs who critically engaged with AI-generated outputs developed meaningful debugging and evaluation skills, whereas those who accepted outputs uncritically showed less evidence of deep CT. Jansen et al. similarly found that programmers shift between exploratory and targeted uses of code-generating models depending on task demands, a flexibility these participants were only beginning to develop. Luo et al. [18] documented the mismatch between novice expectations and actual AI outputs in authentic, complex collaborative design tasks, closely paralleling the confidence recalibration seen in the post-course data.

More specifically related to vibe coding, Si et al. showed that moving from syntax-based coding to prompt-based interaction introduces new cognitive demands around interpreting system behavior and evaluating outputs, demands that closely match those observed among the PSTs. Thorgeirsson et al. found that both prior programming knowledge and written communication skills predict vibe coding proficiency; because participants’ coding backgrounds were similarly limited, differences in communication skills may help explain the variation in CT depth observed here. Hsu argued that the shift from programming to prompting requires reconceptualizing CT in AI-mediated contexts, a transition participants were beginning to negotiate at different rates. Fortes-Ferreira et al. found that personal motivation enabled non-experts to create more complex systems, consistent with the stronger engagement and iterative refinement shown by PSTs who linked their games to their own teaching subjects.

The above studies suggest that the challenges encountered were not idiosyncratic but reflect broader patterns in novice engagement with AI-assisted coding. AI-assisted programming does not eliminate computational thinking; rather, it redistributes it across new forms of reasoning, particularly interpretation, evaluation, and iterative refinement. In this study, that redistribution was most productive when learners retained responsibility for decomposition, debugging, and prompt refinement instead of delegating those processes entirely to the system. This reinforces the rationale for the design principles in Appendix C, which preserve the learner’s role as an analyst of computational logic rather than a passive recipient of AI-generated output.

6. Conclusions

This study examined how PSTs engaged with CT practices in an AI-mediated (“vibe coding”) game-making environment, focusing on both their evolving self-efficacy and the nature of their computational reasoning. The findings indicate that AI-mediated environments can support meaningful engagement with CT, but not in a uniform or automatic way. Using the analytical rubric (Appendix A), this study showed that PSTs’ engagement extended beyond surface interaction, with many instances reflecting structured (Level 2) and generative (Level 3) computational reasoning. In particular, debugging emerged as a central mechanism through which deeper reasoning developed, while higher-order practices such as abstraction and pattern recognition appeared less frequently and primarily in later stages of the instructional sequence. These results suggest that AI tools do not eliminate computational thinking but rather redistribute its cognitive demands, shifting emphasis from code production to problem formulation, evaluation, and iterative refinement. Nonetheless, this redistribution introduces a critical dependency on instructional design. Without appropriate scaffolding, learners risk relying on AI-generated outputs without fully engaging with the underlying computational logic. Of the 21 attitude items across the three subscales, one (Item 1 of Table 3) produced an effect size meeting both the medium threshold (d = 0.51) and statistical significance (p = 0.020). Five further items produced small but non-trivial effects (d ranging from 0.20 to 0.33). The remaining 15 items produced negligible effects. Conclusions are calibrated accordingly: strong claims are reserved for the single medium-effect finding; directional patterns supported by small effects are described as suggestive rather than confirmatory; and negligible-effect items are noted but not used as the basis for design recommendations.

To address this, the current study proposes a set of design principles (Appendix C) that specify the conditions under which AI-mediated environments can support CT development. These include maintaining computational visibility, requiring learners to articulate intended logic prior to AI interaction, structuring debugging as a core activity, and explicitly supporting the development of abstraction and pattern recognition. Together, these principles emphasize that the effectiveness of vibe coding as a pedagogical approach depends not on the capabilities of AI systems alone, but on how they are integrated into learning designs that sustain active computational reasoning.

This study contributes to the growing literature on AI in education by offering both an analytical framework for interpreting CT engagement and a set of design-oriented implications for practice. At the same time, its findings should be interpreted within the limits of its methodology. The analysis is based on reconstructed prompts and reflective accounts rather than direct interaction logs and therefore captures observed reasoning practices rather than validated learning gains. Perhaps the most consequential finding of this study is one that the quantitative data capture indirectly but the qualitative data make explicit: the experience of doing CT through AI-mediated game-making did not automatically produce a sense of readiness to teach CT. Participants who engaged most deeply with debugging, decomposition, and iterative refinement, demonstrating Level 2 and Level 3 reasoning on the rubric, were often the same participants who reported the sharpest post-course decline in coding confidence and teaching self-efficacy. This pattern is not a contradiction; it is a developmental signal. Learning to do something and learning to teach it are distinct competencies that require different kinds of experience and different timeframes to develop. A single semester of CT engagement, however well scaffolded, can move PSTs from unconscious to conscious incompetence, which is a genuine and important gain. But it cannot bridge the full distance to pedagogical readiness, which requires not only technical fluency but also the ability to anticipate learner difficulties, design appropriate scaffolding, and make computational reasoning visible to others. Recognizing this distinction is the study’s central contribution to the teacher education literature: it reframes single-course CT interventions not as sufficient preparation for teaching but as necessary first steps on a longer developmental trajectory that teacher preparation programs must support over time.

7. Implications for Design and Practice

The findings of this study carry several concrete implications for how vibe coding experiences might be designed and integrated within teacher preparation programs. These are as follows:

Sequencing technical and pedagogical demands: A key implication is the need to decouple, at least initially, the technical and pedagogical dimensions of CT learning. When PSTs are simultaneously required to acquire programming fluency and develop a pedagogical framework for teaching that fluency, the cognitive load generated by these parallel demands can undermine progress in both. A staged design, one that establishes a functional baseline of technical competence before introducing higher-order pedagogical reflection, would allow each strand to develop with greater depth and without competing for the same limited cognitive resources. This recommendation is consistent with Laurillard’s assertion that effective technology-integrated pedagogy requires educators to first experience the learning process as students before designing their own instruction [9].
Preserving productive difficulty: This implication should not be misread as an argument for simplifying the CT experience. The data are unambiguous on this point: what PSTs most valued, and most wished to extend, was precisely the challenge. A course redesign that removes difficulty in order to protect confidence would misread both the findings and the literature. Kapur’s [46] productive failure framework is directly applicable here: the cognitive struggle encountered in vibe coding environments is not a design flaw to be corrected but a generative mechanism to be preserved and scaffolded. The design challenge is not to reduce difficulty but to ensure that learners have sufficient support (instructional, social, and technical) to sustain productive engagement with it rather than retreat from it. Practically, this means future courses should include an explicit ‘workflow literacy’ component: a dedicated early activity in which learners are introduced to the ‘prompt–evaluate–refine’ cycle as a named and valued process. Giving the cycle a name (for instance, ‘the vibe coding loop’), explaining its relationship to CT practices, and setting a deliberate expectation that multiple refinement cycles are the norm rather than a sign of failure would help PSTs develop a constructive relationship with the friction before it accumulates into frustration.
Leveraging AI-assisted environments as reasoning scaffolds: The vibe coding paradigm introduces a specific pedagogical opportunity that traditional programming courses lack, because it positions a chatbot as a reasoning partner whose outputs must be interrogated, evaluated, and revised rather than simply executed. This shifts the focus of CT from code production to code interpretation, that is, from writing algorithms to reading, testing, and debugging them. For PSTs, who will ultimately need to facilitate similar interpretive processes in their students, this is pedagogically significant. Instructional designers should make this interpretive dimension explicit by building in structured activities that require learners to articulate why a generated program does or does not behave as expected and to trace the relationship between their prompts, the system’s outputs, and the computational logic underlying both [11,47]. For instance, micro-presentations within the Creative Production module could ask the AI to generate a purposeful error and then require the pre-service teacher to explain that error to a peer as if teaching a student, write a student-friendly hint, or present a two-minute mini-lesson tracing how a prompt became a working game mechanic. These activities would connect computational reasoning to pedagogical explanation and help bridge the gap between knowing CT and teaching CT.
Addressing the teaching-readiness gap through graduated exposure: The finding that experiencing CT did not directly translate into perceived readiness to teach is consistent with prior research and points to a structural challenge in teacher education: single-course interventions, however well designed, are insufficient to bridge the gap between personal competence and professional confidence. Future programs should consider embedding CT experiences across multiple courses and field placements, creating opportunities for PSTs to encounter CT first as learners, then as observers of others teaching it, and finally as practitioners themselves. This graduated trajectory would more closely mirror the kind of extended, contextually varied exposure that the literature identifies as necessary for the development of durable pedagogical competence [3,7]. A more proximal bridge between technical engagement and pedagogical readiness can be created within the vibe coding course itself by integrating structured micro-teaching activities into the Creative Production module (Module 3). One practical design is an AI-generated error explanation task, in which participants use a chatbot to deliberately produce a broken Scratch script (for example, prompting Claude or ChatGPT to introduce a specific logical error such as an off-by-one loop boundary, an incorrect conditional, or a missing reset event) and then practice explaining to a peer what the error is, why it occurs, and how a primary school student might be guided toward diagnosing it. This task asks PSTs to inhabit two cognitive positions simultaneously: the programmer who understands execution logic and the teacher who can make that logic accessible. A second micro-teaching activity could have participants design a ‘deliberate challenge’, an intentionally difficult game section, and then write a brief pedagogical script outlining how they would guide a student through it. Both activities move PSTs from doing CT to teaching CT within the same course structure, without requiring an additional field placement, and could be evaluated using the same rubric (Appendix A) extended to include a pedagogical communication dimension.
Connecting CT to subject matter identity: The vibe coding approach adopted in this course, which required PSTs to design games connected to their own disciplinary specializations, proved particularly effective in sustaining motivation and deepening engagement. This finding suggests a broader design principle: CT integration in teacher education is most effective when it is not positioned as a generic digital literacy requirement but as a domain-specific pedagogical tool with direct relevance to what PSTs care about and intend to teach. The digital competence frameworks articulated in policy contexts [26,27] increasingly recognize this domain-specific dimension, but teacher preparation programs have been slower to operationalize it in their course designs. Vibe coding, with its capacity to rapidly generate domain-relevant interactive content through natural language prompting, offers a practical pathway for doing so.
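The off-by-one loop boundary named in the error-explanation task above can be shown in text form. The Python sketch below is an analogue of a Scratch “repeat” loop, not Scratch itself, and the function and parameter names are hypothetical. It reproduces the symptom described in the debugging exemplar in Appendix B, where the character stops one step before the goal.

```python
def final_position(path_length: int, buggy: bool) -> int:
    """Walk a straight path whose goal lies path_length steps from the start (position 0)."""
    position = 0
    # The buggy version loops over range(1, path_length), which runs
    # path_length - 1 times, so the character stops one step short.
    # The fixed version runs exactly path_length times.
    repeats = range(1, path_length) if buggy else range(path_length)
    for _ in repeats:
        position += 1  # analogue of a single "move forward" block
    return position


goal = 5
print(final_position(goal, buggy=True))   # 4: stops one step before the goal
print(final_position(goal, buggy=False))  # 5: reaches the goal
```

A PST preparing the peer explanation might ask learners to count how many times each loop body runs before revealing the two outputs.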

8. Limitations and Directions for Future Research

Several limitations of the present study warrant acknowledgment. The key limitations and suggestions for future research are as follows:

The findings are institutionally situated and should not be interpreted as globally representative of PST populations. Replication across universities, national contexts, and teacher-specialization pathways is required before broader claims can be made. Generalizability is further limited because voluntary participation likely produced a participant group more favorably disposed toward technology than the broader PST population. Consequently, future studies should deliberately recruit across a wider range of technology orientations, including technology-averse participants. This would allow researchers to examine whether AI-mediated ‘vibe coding’ environments can effectively scaffold CT engagement for learners who approach programming with greater initial resistance. Future research should also recruit samples of at least 80–100 participants to achieve adequate statistical power (>0.80) for detecting medium effects (d = 0.50) using paired t-tests, enabling confirmation or disconfirmation of the directional patterns observed here.
The absence of a comparison group means that the attitude shifts and CT engagement patterns observed cannot be causally attributed to the intervention. Plausible alternative explanations include general maturation over a 12-week semester, the novelty effect of AI tools, and the motivational priming created by voluntary enrolment in an unusual elective. A quasi-experimental design incorporating a comparison group (either a parallel cohort receiving a conventional block-based programming course without AI mediation, or a waitlist control receiving the course in a subsequent semester) would substantially strengthen causal inference and is identified as the highest priority for follow-up research. Additionally, a comparative replication using English prompts may reveal whether some debugging behaviors observed here derive from computational reasoning itself or from language-specific ambiguities in prompt interpretation. Such a cross-linguistic comparative study represents one of the most tractable and theoretically informative extensions of this work. It would deploy the same course design, the same scaffolding sequence, and the same instruments with two groups: one prompting in Greek and one prompting in English. Differences in debugging frequency, prompt refinement cycles, and CT tier distributions between the groups would provide direct empirical evidence of how much the linguistic mediation variable contributes to the patterns observed in the present study and would help determine whether the findings are specific to Greek-language implementations or generalizable to AI-mediated vibe coding contexts more broadly.
An additional limitation concerns the potential over-reliance on AI-generated solutions. While AI assistance lowers barriers to entry, it may also reduce opportunities for independent algorithmic construction and deeper engagement with underlying computational logic. Future work should investigate how different levels of AI support influence the balance between efficiency and conceptual understanding.
This study is also limited by its reliance on self-reported and reflective data as proxies for CT engagement. Although these measures provide valuable insight into participants’ experiences and reasoning processes, they do not constitute direct evidence of computational proficiency. The absence of performance-based assessment means that this study’s claims about CT engagement should be understood as evidence of observable reasoning practices in discourse rather than as validated gains in computational competence. The analytical rubric (Appendix A) was developed to make this distinction explicit, but it cannot substitute for the direct measurement that artifact analysis or standardized task performance would provide. Future research should incorporate performance-based assessments, artifact analysis, or trace data (e.g., prompt iterations and debugging sequences) to provide a more robust and objective account of CT practices.
The findings of this research are primarily grounded in reflective and qualitative indicators, as no formal objective instrument was utilized to measure gains in programming competence. To increase the internal validity of future investigations, it is essential to triangulate these reflective insights with empirical performance metrics. Integrating objective assessments, such as rubric-based artifact analysis or controlled debugging simulations, would offer a more precise evaluation of how iterative cycles of AI-assisted learning translate into tangible CT skills.
An additional methodological consideration concerns the nature of the qualitative data sources. While the study was situated within an AI-mediated vibe coding environment, the qualitative analysis drew primarily on participants’ reflective accounts and open-ended survey responses rather than on direct analysis of the natural language prompts they submitted to AI chatbots or the iterative sequences of AI-generated outputs. Consequently, the exemplar prompts presented in Appendix B, while grounded in participant accounts, should be understood as analytically reconstructed illustrations rather than verbatim transcriptions; their function is to render observable CT reasoning patterns visible, not to serve as raw data in themselves. Future research should capture live prompt–output sequences and incorporate log-based analysis of learner–AI interactions, providing a more granular, process-level account of how decomposition, debugging, and iterative refinement are enacted within the prompting workflow itself.
The use of Greek as the primary language for instruction and prompting introduces a critical layer of linguistic mediation between user intent and AI output. This linguistic variable may have influenced the structural quality of the generated code and dictated participants’ specific interaction patterns. Future studies should investigate whether the nuances of non-English prompt formulation impact CT engagement differently from English-dominant contexts, where LLM training data are more robust. To mitigate the limitations of reconstructed accounts, future research must implement systematic interaction logging from the study’s inception. This could be operationalized through three distinct mechanisms: (a) requiring participants to embed prompt–output sequences within their reflective logs, (b) utilizing interfaces with native conversation export features (e.g., Claude), or (c) deploying a bespoke intermediary logging platform for server-side recording. Such granular data would facilitate rigorous analysis of decomposition specificity, debugging cycle latency, and prompt refinement trajectories, providing the empirical weight necessary to substantiate claims regarding CT enactment that self-reported data alone cannot provide.
Finally, it is important to recognize that computational fluency—particularly the level required to teach CT with confidence and pedagogical clarity—cannot be developed within a single semester. This study represents a deliberate but modest first step. While it demonstrates that integrating AI-mediated game making into teacher education is both feasible and pedagogically valuable, it also highlights that one course is insufficient to bridge the gap between exposure and instructional readiness. Longitudinal research tracking PSTs across multiple CT-integrated experiences, and into their early years of professional practice, would provide a far richer account of how initial engagement develops into durable teaching capacity. The development of computationally confident and pedagogically reflective educators is not a short-term outcome but a sustained trajectory that teacher education programs must support over time.
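The sample-size recommendation in the limitations above depends on how the effect size is defined and on how strongly pre- and post-course scores correlate. As a rough planning aid, the sketch below uses the normal approximation for a two-sided paired t-test at α = 0.05. It assumes that d is standardized on the pooled pre/post SD and treats the pre–post correlation r as an input the researcher must estimate. The study does not state either assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def paired_sample_size(d: float, r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-sided paired t-test (normal approximation).

    d: pre-post mean difference divided by the pooled SD (assumed definition).
    r: assumed correlation between pre- and post-course scores (must be < 1).
    An exact t-based calculation typically adds one or two participants.
    """
    if not -1 <= r < 1:
        raise ValueError("r must lie in [-1, 1)")
    d_z = d / math.sqrt(2 * (1 - r))  # effect size on the difference scores
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha / 2) + z(power)) / d_z) ** 2
    return math.ceil(n)


print(paired_sample_size(0.50, r=0.5))  # 32
print(paired_sample_size(0.50, r=0.0))  # 63
print(paired_sample_size(0.30, r=0.5))  # 88
```

With r = 0.5, the approximation gives 32 participants for d = 0.50 and 88 for d = 0.30. With r = 0, the d = 0.50 figure rises to 63. Because these values are sensitive to r and to attrition, a range such as 80–100 can serve as a conservative planning target that also leaves room to detect effects somewhat smaller than d = 0.50.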
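Mechanisms (a) and (c) in the limitations above both presuppose a record format for prompt–output sequences. The sketch below shows one possible minimal schema, together with two of the measures named there: prompt refinement cycles and debugging cycle latency. Several choices are assumptions made for illustration, not features of the study: the field names, the task-level grouping, counting a refinement cycle as any prompt after the first on the same task, and using the time between consecutive prompts as a latency proxy.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Exchange:
    """One prompt-output pair in a hypothetical interaction log."""
    participant_id: str
    task_id: str          # assumed grouping, e.g. one maze task or game feature
    timestamp: datetime
    prompt: str
    output: str
    prompt_language: str = "el"  # ISO 639-1 code: "el" is Greek, "en" English


def refinement_cycles(log: list[Exchange]) -> dict[tuple[str, str], int]:
    """Count prompts after the first for each (participant, task) pair."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for ex in log:
        counts[(ex.participant_id, ex.task_id)] += 1
    return {key: n - 1 for key, n in counts.items()}


def mean_prompt_interval(log: list[Exchange]) -> dict[tuple[str, str], float]:
    """Mean seconds between consecutive prompts on the same task (latency proxy).

    Pairs with a single prompt have no interval and are omitted.
    """
    times: dict[tuple[str, str], list[datetime]] = defaultdict(list)
    for ex in log:
        times[(ex.participant_id, ex.task_id)].append(ex.timestamp)
    result: dict[tuple[str, str], float] = {}
    for key, stamps in times.items():
        stamps.sort()
        gaps = [(later - earlier).total_seconds()
                for earlier, later in zip(stamps, stamps[1:])]
        if gaps:
            result[key] = sum(gaps) / len(gaps)
    return result


log = [
    Exchange("P01", "maze-1", datetime(2026, 3, 2, 10, 0), "prompt v1", "script v1"),
    Exchange("P01", "maze-1", datetime(2026, 3, 2, 10, 1), "prompt v2", "script v2"),
    Exchange("P01", "maze-1", datetime(2026, 3, 2, 10, 3), "prompt v3", "script v3"),
    Exchange("P02", "maze-1", datetime(2026, 3, 2, 10, 0), "prompt v1", "script v1"),
]
print(refinement_cycles(log))     # {('P01', 'maze-1'): 2, ('P02', 'maze-1'): 0}
print(mean_prompt_interval(log))  # {('P01', 'maze-1'): 90.0}
```

Storing the prompt language on every exchange would allow the same functions to be reused for the Greek/English comparison proposed above.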

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used are publicly available and their citations are provided in the manuscript.

Acknowledgments

I extend my deepest gratitude to all the individuals who voluntarily participated in this research.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
CT	Computational Thinking
PST	Pre-Service Teacher
DBR	Design-Based Research
LLM	Large Language Model
STEM	Science, Technology, Engineering and Mathematics
RQ	Research Question
SD	Standard Deviation
CSTA	Computer Science Teachers Association

Appendix A. Analytical Rubric for Identifying CT Engagement in AI-Mediated Prompting

Appendix A.1. Purpose of Rubric

This rubric was developed to systematically identify and classify observable indicators of CT engagement in participants’ reflective accounts and reconstructed prompts. It does not measure CT proficiency, but rather the depth and structure of computational reasoning demonstrated during AI-mediated interaction.

Appendix A.2. Structure of the Rubric

Each instance of participant reasoning was coded across two dimensions:

(1) CT Practice Type:
Decomposition;
Debugging;
Algorithmic thinking;
Abstraction;
Pattern recognition.

(2) CT Engagement Level (0–3 scale)

Appendix A.3. CT Engagement Levels

Table A1. CT Engagement Levels.

Level	Label	Description	Indicators in Data
0	No CT Evidence	No evidence of computational reasoning	Vague descriptions; no logic reference
1	Surface Interaction	Problem described but not analytically structured	“It doesn’t work”; general complaints with no causal reasoning
2	Structured Reasoning	Clear logical structuring of the problem	Conditions, sequences, or variables explicitly mentioned
3	Generative/Transfer Reasoning	Generalization, reuse, or cross-context transfer	Reusable logic, abstraction, or transferable structures proposed

Note. The rubric distinguishes four levels of computational reasoning. Level 0 indicates no observable CT engagement. Levels 1–3 represent increasing depth of reasoning from surface description to generative transfer. All coded instances were rated on this scale by the primary researcher and verified by a second coder (κ = 0.81).
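The agreement statistic reported above (κ = 0.81) can be illustrated with a short computation. The note does not specify the variant, and a weighted kappa would also be plausible for an ordinal 0–3 scale. This sketch therefore assumes unweighted Cohen’s kappa over the level ratings. The ratings in the example are invented and are not study data.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(coder_a: list[int], coder_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two coders rating the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same, non-empty set of units")
    n = len(coder_a)
    # Observed agreement: proportion of units both coders placed at the same level.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the two coders' marginal proportions per level.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[level] * freq_b[level] for level in freq_a) / n**2
    if p_e == 1:
        # Both coders used one identical level throughout; kappa is formally
        # undefined, and this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Invented ratings for eight meaning units (not study data).
primary = [0, 1, 2, 2, 3, 3, 1, 2]
second = [0, 1, 2, 3, 3, 3, 1, 2]
print(round(cohens_kappa(primary, second), 2))  # 0.83
```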

Appendix A.4. Operational Indicators by CT Practice

Table A2. Operational Indicators by CT Practice.

CT Practice	Level 1	Level 2	Level 3
Decomposition	Identifies problem vaguely	Breaks problem into discrete components	Defines relationships between components
Debugging	Reports error without explanation	Identifies probable cause	Formulates testable hypothesis about system behavior
Algorithmic Thinking	Describes desired outcome only	Specifies ordered steps or rules	Defines constraints, variables, or conditional logic
Abstraction	Notices repetition	Suggests simplification (e.g., a loop)	Constructs generalized, reusable logic
Pattern Recognition	Recognizes similarity	Reuses a known structure	Transfers logic across different contexts

Note. L1 = surface interaction; L2 = structured reasoning; L3 = generative/transfer reasoning. Operational indicators provide observable anchors for each engagement level within each CT practice category.

Appendix A.5. Coding Procedure

Reflective entries and reconstructed prompts were segmented into meaning units.
Each unit was:
○ assigned a CT practice category, and
○ rated on the 0–3 engagement scale.
Ambiguous cases were resolved through iterative comparison across the dataset.
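Under this procedure, each meaning unit yields one record holding exactly one practice category and one level. The sketch below shows such a record, with validation against the 0–3 scale. The class and field names are hypothetical. The practice categories come from Appendix A.2 and the level labels from Table A1.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CTPractice(Enum):
    DECOMPOSITION = "Decomposition"
    DEBUGGING = "Debugging"
    ALGORITHMIC_THINKING = "Algorithmic thinking"
    ABSTRACTION = "Abstraction"
    PATTERN_RECOGNITION = "Pattern recognition"


# Level labels as given in Table A1.
LEVEL_LABELS = {
    0: "No CT Evidence",
    1: "Surface Interaction",
    2: "Structured Reasoning",
    3: "Generative/Transfer Reasoning",
}


@dataclass(frozen=True)
class CodedUnit:
    """One meaning unit from a reflective entry or reconstructed prompt."""
    participant_id: str
    text: str
    practice: CTPractice
    level: int

    def __post_init__(self) -> None:
        # Reject ratings outside the 0-3 engagement scale.
        if self.level not in LEVEL_LABELS:
            raise ValueError(f"level must be 0-3, got {self.level}")

    @property
    def label(self) -> str:
        return LEVEL_LABELS[self.level]


unit = CodedUnit("P01", "speed starts at 2, increases every 30 s, but stops at 10",
                 CTPractice.ALGORITHMIC_THINKING, 3)
print(unit.label)  # Generative/Transfer Reasoning
```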

Appendix A.6. Interpretive Use

The current rubric supports:

Identification of CT engagement patterns;
Comparison across course stages;
Differentiation between:
○ superficial AI use and
○ structured computational reasoning.
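The first and third uses listed above can be combined in a small summary function. For each course stage, it reports the share of coded units at Level 2 or 3, treating Levels 0–1 as superficial AI use and Levels 2–3 as structured reasoning. That cut-off is an assumption based on the Table A1 labels rather than a rule stated in the rubric. The function name is hypothetical and the sample codes are invented, not study data.

```python
from __future__ import annotations

from collections import defaultdict


def structured_share_by_stage(units: list[tuple[str, int]]) -> dict[str, float]:
    """Share of coded units at Level 2 or 3 within each course stage.

    units: (course_stage, engagement_level) pairs on the 0-3 scale.
    Levels 0-1 count as superficial, 2-3 as structured (assumed cut-off).
    """
    totals: dict[str, int] = defaultdict(int)
    structured: dict[str, int] = defaultdict(int)
    for stage, level in units:
        if not 0 <= level <= 3:
            raise ValueError(f"level out of range: {level}")
        totals[stage] += 1
        if level >= 2:
            structured[stage] += 1
    return {stage: structured[stage] / totals[stage] for stage in totals}


sample = [("Module 2", 1), ("Module 2", 2), ("Module 2", 3), ("Module 2", 1),
          ("Module 3", 2), ("Module 3", 3), ("Module 3", 3)]
print(structured_share_by_stage(sample))  # {'Module 2': 0.5, 'Module 3': 1.0}
```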

Appendix B. CT Practices and Levels of Engagement in AI-Mediated Prompting

Table A3. CT Practices and Levels of Engagement in AI-Mediated Prompting.

CT Practice	Course Stage	CT Tier	Exemplar Prompt	CT Engagement Level	Analytical Rationale	Instances (n)	Participants (n, %)
Decomposition	Module 2 (Stage 1)	Foundational	“Break the maze solution into smaller steps and specify what the character should do at each point.”	Level 2 (Structured Reasoning)	Participant explicitly partitions the task into discrete components and defines stepwise logic. Reflects Level 2 decomposition; does not yet model relationships at a system level.	18	14 (58%)
Debugging	Module 2 (Stage 3)	Foundational	“Check why the character stops before reaching the goal and correct the movement sequence so it completes the path.”	Level 3 (Generative Reasoning)	Participant hypothesizes system behavior and directs targeted correction, moving beyond simple error identification. Aligns with Level 3 hypothesis-driven debugging.	22	19 (79%)
Algorithmic Thinking	Module 2 (Stage 3)	Intermediate	“I want the game to become harder… speed starts at 2, increases every 30 s, but stops at 10…”	Level 3 (Generative Reasoning)	Participant defines variables, temporal conditions, and constraints prior to implementation. Reflects generative algorithmic structuring (Level 3).	11	9 (38%)
Abstraction	Module 3 (Stage 4)	Advanced	“Identify repeated sequences… use a loop instead of repeating commands.”	Level 3 (Generative Reasoning)	Participant recognizes redundancy and proposes a generalized solution (loop). Reflects abstraction as reusable logic construction (Level 3).	7	6 (25%)
Pattern Recognition	Module 3 (Stage 4)	Advanced	“The player-solving maze logic could be reused for the opponent…”	Level 3 (Generative Reasoning)	Participant identifies structural similarity across contexts and proposes logic transfer. Aligns with Level 3 pattern recognition (cross-context transfer).	9	8 (33%)

Note 1: Prompts are analytically reconstructed from participants’ reflective logs and session observations and are presented as representative illustrations of observed reasoning patterns rather than verbatim transcripts of AI interactions. “Course Stage” indicates the instructional module in which each CT practice most frequently emerged. “CT Tier” follows the developmental hierarchy proposed by Grover and Pea [13], distinguishing foundational (decomposition, debugging), intermediate (algorithmic thinking), and advanced or higher-order (abstraction, pattern recognition) practices; this classification is used as an analytical heuristic rather than a validated developmental sequence. “Instances (n)” indicates the total number of coded occurrences across the dataset, while “Participants (n, %)” indicates the number and proportion of participants (n = 24) whose reflections contained at least one instance of the corresponding CT practice. Percentages are not additive, as individual participants may demonstrate multiple CT practices across sessions. All examples are situated within an AI-mediated game-making context supported by AI tools (e.g., Claude Sonnet 4.6, ChatGPT 5.5, and Gemini Flash-Lite).

Note 2: The distribution of Level 3 instances across multiple CT practices suggests that AI-mediated environments, when scaffolded, can support not only engagement with CT but also the emergence of higher-order reasoning processes such as abstraction and transferable logic construction.

Note 3: All exemplar prompts in this appendix are analytically reconstructed illustrations derived from participants’ reflective log entries. They represent the researcher’s best-faith interpretation of the computational reasoning patterns evident in participants’ written accounts of their AI interactions. They are not verbatim transcriptions of actual chatbot exchanges and should not be treated as primary data. Their function is to render observable CT reasoning patterns visible and to ground the rubric-level classifications in concrete linguistic form.
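Note 1 distinguishes instance counts from participant counts. The sketch below derives both from a list of coded (participant, practice) pairs and shows why participant percentages need not sum to 100%. The function name is hypothetical and the sample pairs are invented. Only the default denominator of 24 comes from the study.

```python
from collections import Counter, defaultdict


def practice_frequencies(codes, n_participants=24):
    """Map each CT practice to (instances, participants, percent of participants).

    codes: (participant_id, practice) pairs, one pair per coded instance.
    A participant is counted once per practice, however many instances they produced.
    """
    codes = list(codes)  # accept any iterable, since it is traversed twice
    instances = Counter(practice for _, practice in codes)
    participants = defaultdict(set)
    for participant_id, practice in codes:
        participants[practice].add(participant_id)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return {
        practice: (
            count,
            len(participants[practice]),
            round(100 * len(participants[practice]) / n_participants),
        )
        for practice, count in instances.items()
    }


# Invented example with three participants: both percentages round to 67,
# so the column sums to 134%, illustrating why percentages are not additive.
sample = [("P01", "Debugging"), ("P01", "Debugging"), ("P02", "Debugging"),
          ("P01", "Abstraction"), ("P03", "Abstraction")]
print(practice_frequencies(sample, n_participants=3))
# {'Debugging': (3, 2, 67), 'Abstraction': (2, 2, 67)}
```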
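The algorithmic-thinking exemplar in Table A3 specifies a starting value, a time interval, and a cap. These variables, temporal conditions, and constraints are what place it at Level 3. A minimal sketch of that rule follows. The exemplar elides the size of each increase, so the hypothetical `step` parameter (default 1) is an assumption, as is the choice to count only completed 30-second intervals.

```python
def game_speed(elapsed_s: float, start: float = 2, interval_s: float = 30,
               step: float = 1, cap: float = 10) -> float:
    """Speed after elapsed_s seconds: starts at 2, rises every 30 s, stops at 10.

    step is an assumed increment; the reconstructed prompt does not state it.
    """
    if elapsed_s < 0:
        raise ValueError("elapsed time cannot be negative")
    increases = int(elapsed_s // interval_s)  # completed intervals only
    return min(start + increases * step, cap)


print([game_speed(t) for t in (0, 29, 30, 95, 240, 600)])  # [2, 2, 3, 5, 10, 10]
```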

Appendix C. Design Principles for Supporting CT in AI-Mediated Game-Making Environments

Table A4. Design Principles for Supporting CT in AI-Mediated Game-Making Environments.

DP	Title and Description	Implementation	Rationale
DP1	Preserve Computational Visibility: AI assistance must not obscure underlying logic; learners should remain able to inspect and manipulate computational structures.	Block-based programming (Scratch) used alongside AI prompting; maze tasks require explicit step-by-step logic before any AI generation.	Prevents black-box interaction; maintains the scaffolding essential for debugging and decomposition.
DP2	Require Pre-Generation Articulation of Logic: Learners should define intended behavior in writing before requesting AI-generated solutions.	Prompting tasks required explicit written problem descriptions; reflective logs captured intended logic prior to each AI interaction.	Ensures CT occurs before AI delegation, preserving conceptual ownership over generated code.
DP3	Scaffold Debugging as a Core Activity: Debugging should be explicitly structured as a core course activity, not treated as incidental.	Stage 5 maze activity required identification and correction of a pre-built error; prompt-based debugging was embedded in reflective logs.	Debugging was the most consistently observed CT practice (79% of participants), confirming its centrality to vibe coding engagement.
DP4	Use Constrained Problem Spaces Early: Early tasks should limit complexity to foreground core CT processes before open-ended design begins.	Maze tasks with directional primitives (move forward, turn left, turn right) constrained the problem space in Modules 1–2.	Supports decomposition and algorithmic thinking before learners encounter the full ambiguity of open-ended game design.
DP5	Gradually Transition to Open-Ended Creation: Learners should move from structured tasks to independent design as CT competencies consolidate.	Module 2 provided scaffolded maze tasks; Module 3 required fully independent, subject-aligned game projects.	Enables transfer from guided to generative CT practices, mirroring the constructionist progression from structured to open-ended making.
DP6	Integrate Reflection as a Computational Activity: Reflection should explicitly target reasoning processes, not merely experiential satisfaction.	Structured reflective prompts asked participants to describe obstacles, strategies, and debugging processes at the close of each session.	Supports metacognitive awareness of CT practices and provides the qualitative data stream necessary for process-level analysis.

Note. Design principles were derived from the convergence of empirical findings, observed learner difficulties, and qualitative–quantitative triangulation. They specify the conditions under which vibe coding supports CT engagement rather than passive AI delegation. DP = design principle.
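DP4 relies on a small instruction set of directional primitives. The sketch below shows why such a constrained space keeps the logic inspectable: a program is a short list of three commands whose effect can be traced cell by cell. The grid coordinates, the northward starting heading, and the exact command strings are assumptions for illustration, not a description of the course’s maze tasks.

```python
from __future__ import annotations

# Headings in clockwise order (north, east, south, west) as (dx, dy); y grows northward.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def run_maze(program: list[str], start: tuple[int, int] = (0, 0),
             heading: int = 0) -> tuple[int, int]:
    """Execute 'move forward', 'turn left', and 'turn right'; return the final cell."""
    x, y = start
    for command in program:
        if command == "move forward":
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
        elif command == "turn right":
            heading = (heading + 1) % 4
        elif command == "turn left":
            heading = (heading - 1) % 4
        else:
            raise ValueError(f"unknown command: {command}")
    return (x, y)


program = ["move forward", "move forward", "turn right", "move forward"]
print(run_maze(program))  # (1, 2)
```

Because every command either changes the position by one cell or changes the heading by a quarter turn, a learner can predict the final cell by hand before running the program. This is the kind of explicit step-by-step logic that DP1 and DP4 ask for before any AI generation.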

References

Wing, J.M. Computational thinking. Commun. ACM 2006, 49, 33–35. [Google Scholar] [CrossRef]
Brennan, K.; Resnick, M. New frameworks for studying and assessing the development of computational thinking. In Proceedings of the 2012 Annual Meeting of the American Educational Research Association, Vancouver, BC, Canada, 13–17 April 2012. [Google Scholar]
Yadav, A.; Gretter, S.; Good, J.; McLean, T. Computational thinking in teacher education. In Emerging Research, Practice, and Policy on Computational Thinking; Rich, P.J., Hodges, C.B., Eds.; Springer: Cham, Switzerland, 2017; pp. 205–220. [Google Scholar]
Kafai, Y.B.; Burke, Q. Connected Gaming: What Making Video Games Can Teach Us About Learning and Literacy; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
Butler, D.; Leahy, M. Developing pre-service teachers’ understanding of computational thinking through constructionist game design. Br. J. Educ. Technol. 2021, 52, 1060–1077. [Google Scholar] [CrossRef]
Gleasman, C.; Kim, C. Pre-service teacher’s use of block-based programming and computational thinking to teach elementary mathematics. Digit. Exp. Math. Educ. 2020, 6, 52–90. [Google Scholar] [CrossRef]
Gudmundsdóttir, G.B.; Gassó, H.H.; Rubio, J.C.C.; Hatlevik, O.E. Student teachers’ responsible use of ICT: Examining two samples in Norway and Spain. Comput. Educ. 2020, 152, 103877. [Google Scholar] [CrossRef]
Kafai, Y.B.; Burke, Q. Constructionist gaming: Understanding the benefits of making games for learning. Educ. Psychol. 2015, 50, 313–334. [Google Scholar] [CrossRef] [PubMed]
Laurillard, D. Teaching as a Design Science: Building Pedagogical Patterns for Learning and Technology; Routledge: New York, NY, USA, 2012. [Google Scholar]
Royal, C. Integrating vibe coding and flow theory: A student-centered model for AI-augmented coding education. J. Mass Commun. Educ. 2026. advance online publication. [Google Scholar]
Lyu, W.; Wang, Y.; Sun, Y.; Zhang, Y. Will Your Next Pair Programming Partner Be Human? An Empirical Evaluation of Generative AI as a Collaborative Teammate in a Semester-Long Classroom Setting. In Proceedings of the ACM Conference on Learning @ Scale (L@S ‘25); ACM: New York, NY, USA, 2025. [Google Scholar] [CrossRef]
Si, J.J.; Li, D.; Wang, Q.; Liu, A.W.; Lin, X.; Zhu, Y.; Zhou, X.; Wang, A.Y.; Wang, K.Y. Exploring the impacts and challenges of vibe coding paradigm to children’s programming learning and practices. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems; ACM: New York, NY, USA, 2026; pp. 1–24. [Google Scholar]
Grover, S.; Pea, R. Computational thinking in K–12: A review of the state of the field. Educ. Res. 2013, 42, 38–43. [Google Scholar] [CrossRef]
Lachner, A.; Fabian, A.; Franke, U.; Preiß, J.; Jacob, L.; Führer, C.; Küchler, U.; Paravicini, W.; Randler, C.; Thomas, P. Fostering pre-service teachers’ technological pedagogical content knowledge (TPACK): A quasi-experimental field study. Comput. Educ. 2021, 174, 104304. [Google Scholar] [CrossRef]
Sáez-López, J.M.; Sevillano-García, M.L.; Vazquez-Cano, E. The effect of programming on primary school students’ mathematical and scientific understanding: Educational use of mBot. Educ. Technol. Res. Dev. 2019, 67, 1405–1425.
Jansen, T.; Horbach, A.; Meyer, J. Feedback from Generative AI: Correlates of Student Engagement in Text Revision from 655 Classes from Primary and Secondary School. In Proceedings of the 15th International Learning Analytics and Knowledge Conference (LAK ’25), Dublin, Ireland, 3–7 March 2025; ACM: New York, NY, USA, 2025; pp. 831–836.
Kazemitabaar, M.; Chow, J.; Ma, C.K.T.; Ericson, B.J.; Weintrop, D.; Grossman, T. Studying the effect of AI code generators on supporting novice learners in introductory programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023. Article 455.
Luo, J.; Qu, L.; Wang, M.; Li, H. Cultivating Pre-Service Teachers’ Design Thinking and Generative Artificial Intelligence Literacy through an LLM-Based Educational Website Design Task. Interact. Learn. Environ. 2026, 1–20.
Fortes-Ferreira, M.; Alam, M.S.; Bazilinskyy, P. Vibe coding in practice: Building a driving simulator without expert programming skills. In Proceedings of the 17th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI Adjunct ’25); ACM: New York, NY, USA, 2025; pp. 60–66.
Thorgeirsson, S.; Weidmann, T.B.; Su, Z. Computer science achievement and writing skills predict vibe coding proficiency. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26); ACM: New York, NY, USA, 2026.
Papert, S. Mindstorms: Children, Computers and Powerful Ideas; Basic Books: New York, NY, USA, 1980.
Harel, I.; Papert, S. (Eds.) Constructionism; Ablex Publishing: Norwood, NJ, USA, 1991.
Resnick, M. Lifelong Kindergarten: Cultivating Creativity Through Projects, Passion, Peers, and Play; MIT Press: Cambridge, MA, USA, 2017.
Shute, V.J.; Sun, C.; Asbell-Clarke, J. Demystifying computational thinking. Educ. Res. Rev. 2017, 22, 142–158.
Lee, C.; Zheng, Q.; Xiong, J. From visual to multimodal programming: Designing an interface to externalize decomposition thinking for novice learners. In Proceedings of the 31st International Conference on Intelligent User Interfaces (IUI ’26); ACM: New York, NY, USA, 2026; pp. 1202–1217.
Computer Science Teachers Association (CSTA). CSTA K–12 Computer Science Standards; CSTA: Chicago, IL, USA, 2022.
European Commission. DigComp 2.2: The Digital Competence Framework for Citizens; Publications Office of the European Union: Luxembourg, 2023.
Kusper, G.; Szabó, C. Vibe Coding in Education. In Proceedings of the 2025 International Conference on Emerging eLearning Technologies and Applications (ICETA); IEEE: New York, NY, USA, 2025; pp. 506–511.
Hsu, H.-P. From programming to prompting: Developing computational thinking through large language model-based generative artificial intelligence. TechTrends 2025, 69, 485–506.
Runco, M.A.; Acar, S. Divergent thinking as an indicator of creative potential. Creat. Res. J. 2012, 24, 66–75.
Tashakkori, A.; Teddlie, C. SAGE Handbook of Mixed Methods in Social and Behavioral Research, 2nd ed.; SAGE Publications: Thousand Oaks, CA, USA, 2010.
Anderson, T.; Shattuck, J. Design-based research: A decade of progress in education research? Educ. Res. 2012, 41, 16–25.
Sandoval, W. Conjecture mapping: An approach to systematic educational design research. J. Learn. Sci. 2014, 23, 18–36.
Pintrich, P.R. A motivational science perspective on the role of student motivation in learning and teaching contexts. J. Educ. Psychol. 2003, 95, 667–686.
Tondeur, J.; van Braak, J.; Ertmer, P.A.; Ottenbreit-Leftwich, A. Understanding the relationship between teachers’ pedagogical beliefs and technology use in education. Educ. Technol. Res. Dev. 2017, 65, 555–575.
Dörnyei, Z.; Taguchi, T. Questionnaires in Second Language Research: Construction, Administration, and Processing, 2nd ed.; Routledge: New York, NY, USA, 2009.
Brislin, R.W. Back-translation for cross-cultural research. J. Cross-Cult. Psychol. 1970, 1, 185–216.
Joshi, A.; Kale, S.; Chandel, S.; Pal, D.K. Likert scale: Explored and explained. Br. J. Appl. Sci. Technol. 2015, 7, 396–403.
Bandura, A. Self-Efficacy: The Exercise of Control; W.H. Freeman: New York, NY, USA, 1997.
Schwarzer, R. Self-Efficacy: Thought Control of Action; Hemisphere Publishing Corp: Washington, DC, USA, 1992.
Zeichner, K.; Liston, D. Reflective Teaching: An Introduction, 2nd ed.; Routledge: New York, NY, USA, 2014.
Field, A. Discovering Statistics Using IBM SPSS Statistics, 5th ed.; SAGE Publications: London, UK, 2018.
Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006, 3, 77–101.
Creswell, J.W.; Plano Clark, V.L. Designing and Conducting Mixed Methods Research, 3rd ed.; SAGE Publications: Thousand Oaks, CA, USA, 2017.
Kruger, J.; Dunning, D. Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J. Pers. Soc. Psychol. 1999, 77, 1121–1134.
Kapur, M. Examining productive failure, productive success, unproductive failure, and unproductive success in learning. Educ. Psychol. 2016, 51, 289–299.
Lau, S.; Guo, P.J. Adapting to AI code generation tools. In Proceedings of the ACM Conference on International Computing Education Research (ICER); ACM: New York, NY, USA, 2023; pp. 106–120.

Figure 1. Overview of course modules and scaffolded learning sequence in vibe coding.

Figure 2. Example of iterative game development and debugging using AI-assisted prompting.

Figure 3. Mapping CT practices to “Course Stages”.

Table 5. Rubric-level distribution of coded CT practice instances across all participant logs.

CT Practice	Level 1 n (%)	Level 2 n (%)	Level 3 n (%)	Total Instances
Debugging	6 (27%)	9 (41%)	7 (32%)	22
Decomposition	3 (17%)	12 (67%)	3 (17%)	18
Algorithmic thinking	1 (9%)	8 (73%)	2 (18%)	11
Pattern recognition	0 (0%)	4 (44%)	5 (56%)	9
Abstraction	0 (0%)	2 (29%)	5 (71%)	7

Note. Level 1 = surface interaction (trial-and-error manipulation with no stated rationale); Level 2 = structured reasoning (identification of discrete components or conditions); Level 3 = generative reasoning (hypothesis-driven causal explanation or cross-context transfer). Percentages reflect the proportion of total instances within each CT practice. Total instance counts correspond to those reported in Appendix B.
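The within-practice percentages in Table 5 can be recomputed directly from the level counts. The Python sketch below is illustrative only and is not part of the study's analysis; it assumes each percentage is a level's count divided by that practice's total instances, rounded to the nearest whole percent. The names TABLE_5_COUNTS and row_percentages are hypothetical.

```python
# Level counts transcribed from Table 5: (Level 1, Level 2, Level 3).
# TABLE_5_COUNTS is a hypothetical name used only for this illustration.
TABLE_5_COUNTS = {
    "Debugging": (6, 9, 7),
    "Decomposition": (3, 12, 3),
    "Algorithmic thinking": (1, 8, 2),
    "Pattern recognition": (0, 4, 5),
    "Abstraction": (0, 2, 5),
}


def row_percentages(counts):
    """Return (rounded percentages, row total) for one CT practice.

    Assumption: percentages are taken within the practice (row), not across
    the whole table, and rounded to the nearest integer. Python's round()
    applies banker's rounding on exact .5 ties; none occur with these counts.
    """
    total = sum(counts)
    return [round(100 * c / total) for c in counts], total


if __name__ == "__main__":
    for practice, counts in TABLE_5_COUNTS.items():
        pcts, total = row_percentages(counts)
        cells = "  ".join(f"{c} ({p}%)" for c, p in zip(counts, pcts))
        print(f"{practice:<22} {cells}  total={total}")
```

Because each row is rounded independently, a row need not sum to exactly 100%: for decomposition, 3/18, 12/18, and 3/18 round to 17%, 67%, and 17%, for a total of 101%.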

Table 6. Distribution of recommendations for course element retention (post-course).

Theme	Percentage (%)
Content Knowledge: Retain current CT and vibe coding content	9%
Extension: Increase in depth and duration of CT exploration	14%
Pedagogy of Play: Preserve focus on game making and active engagement	20%
Creative Freedom: Maintain open-ended, project-based final tasks	24%
Vibe Coding Structures: Retain debugging and planning sessions	33%

Note. Percentages reflect the proportion of participants identifying each theme in their open-ended post-course recommendations. Categories are not mutually exclusive.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pellas, N. From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education. Multimodal Technol. Interact. 2026, 10, 57. https://doi.org/10.3390/mti10050057

AMA Style

Pellas N. From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education. Multimodal Technologies and Interaction. 2026; 10(5):57. https://doi.org/10.3390/mti10050057

Chicago/Turabian Style

Pellas, Nikolaos. 2026. "From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education" Multimodal Technologies and Interaction 10, no. 5: 57. https://doi.org/10.3390/mti10050057

APA Style

Pellas, N. (2026). From Prompt to Play: Examining Computational Thinking Through Vibe Coding in Game Making for Pre-Service Teacher Education. Multimodal Technologies and Interaction, 10(5), 57. https://doi.org/10.3390/mti10050057

