Scaffolding Generative AI as a Tutor: A Quasi-Experimental Study of Learning Outcomes and Motivational, Cognitive and Metacognitive Processes

Melanou, Chrysanthi; Beege, Maik

doi:10.3390/educsci16040651

by Chrysanthi Melanou ¹,* and Maik Beege ²,³

¹ Baden-Wuerttemberg Cooperative State University, 78054 Villingen-Schwenningen, Germany

² Department of Psychology, Digital Media in Education, University of Education Freiburg, 79117 Freiburg, Germany

³ Center for Interdisciplinary Research on Digital Education (CIRDE), University of Education Freiburg, 79117 Freiburg, Germany

* Author to whom correspondence should be addressed.

Educ. Sci. 2026, 16(4), 651; https://doi.org/10.3390/educsci16040651

Submission received: 21 February 2026 / Revised: 12 April 2026 / Accepted: 16 April 2026 / Published: 20 April 2026

(This article belongs to the Topic Generative Artificial Intelligence in Higher Education)

Abstract

Generative artificial intelligence (AI) is increasingly used in higher education as an interactive tutoring partner rather than a passive information tool. While AI offers opportunities to support learning, concerns remain regarding cognitive offloading, reduced engagement, and unreflective use. Although instructional scaffolding is a well-established design principle for supporting complex learning, its role in shaping cognitive and metacognitive processes in AI-supported settings remains underexplored. This quasi-experimental pre–post study examined how varying levels of scaffolding influence learning outcomes and motivational, cognitive and metacognitive processes during AI-tutored learning. A total of 175 first-semester students from two faculties and diverse academic backgrounds completed the same academic task within a four-hour university session under one of three conditions: (1) full scaffolding, including a structured prompting template based on the Goal–Context–Constraints (GCC) strategy, iterative refinement, and reflective guidance; (2) light scaffolding, including the GCC prompting template; or (3) no scaffolding template as the control condition. Measures included knowledge gain, motivation, cognitive load, critical thinking, and reflective use. Data were analysed using ANOVAs, ANCOVAs, regression models, and PROCESS moderation and mediation analyses. Across the conditions, students showed significant gains in knowledge, critical thinking, and reflective use, while motivation remained stable and intrinsic and extraneous cognitive load decreased; no significant differences between scaffolding conditions were observed. The scaffolding conditions did not produce significant interaction effects, although descriptive trends suggested higher gains in higher-order knowledge under scaffolded conditions. 
Overall, the findings suggest that short-term learning gains in AI-supported settings may not depend on scaffolding intensity alone, but rather on how learners engage with AI during the learning process.

Keywords:

generative AI; scaffolding; higher education; learning outcomes; cognitive load; reflective use; motivation; critical thinking

1. Introduction

The rapid rise in generative artificial intelligence (AI) has initiated a fundamental shift in how knowledge is accessed, processed, and validated in higher education. With generative AI tools widely accessible, students can instantly generate explanations, summaries, and arguments that previously required their own reasoning and academic skills (Gruenhagen et al., 2024). Rather than functioning only as a passive tool, generative AI increasingly acts as an interactive tutoring partner that can guide learners through academic reasoning processes (Kestin et al., 2025).

This shift from tool to tutor raises important questions about instructional framing and learner support (Pireci Sejdiu & Sejdiu, 2025). Beyond its technological aspects, generative AI challenges established assumptions about how academic learning, reasoning, and responsibility are distributed between learners and external resources. Recent studies highlight the potential of AI to offer personalised support to foster self-directed learning (Roe & Perkins, 2025). However, educators express concerns that excessive reliance on AI is likely to undermine students’ cognitive engagement and critical thinking (Larson et al., 2024). As generative AI becomes part of everyday academic practice, it is essential to examine how instructional conditions can promote active, reflective, and meaningful learning.

1.1. Scaffolding in AI-Supported Learning

Instructional scaffolding has long been recognised as a key design principle in complex learning environments (Belland, 2017). Grounded in Vygotsky and Cole’s (1978) concept of the Zone of Proximal Development, scaffolding refers to temporary, adaptive support that helps learners engage in tasks they could not accomplish independently. Scaffolding can be implemented through worked examples, structured prompts, feedback, or guiding questions (Belland, 2017). In the context of generative AI, it is expected to help students use the technology more productively by guiding them in how to formulate effective prompts, evaluate AI-generated responses, and integrate them meaningfully into their learning process (D. Lee & Palmer, 2025). Without such support, students are expected to rely on AI passively or focus on surface-level results (Zhai et al., 2024). While scaffolding is well-established in educational theory, its role in shaping student–AI interactions remains underexplored.

Scaffolding design can vary widely in type, intensity, and pedagogical framing, ranging from highly structured guidance to more open, student-guided exploration (Dhillon et al., 2024). In AI-supported learning contexts, where learners must interpret, evaluate, and apply machine-generated content, the level and quality of scaffolding may strongly influence whether students engage in deep, reflective processing or rely on surface-level strategies (Li et al., 2025). Importantly, different forms and intensities of scaffolding may involve distinct benefits and risks, for example, by supporting conceptual understanding while potentially increasing the risk of over-reliance on AI-generated output. In AI-supported learning environments, scaffolding not only guides how learners interact with generative AI but also structures how cognitive resources are allocated during the learning process. From a Cognitive Load Theory perspective (see Section 1.4), scaffolding may reduce extraneous load by providing orientation and task structure, while simultaneously fostering germane load through guided elaboration and reflection. At the same time, the use of generative AI as a tutor introduces additional demands for evaluating and integrating machine-generated content. This makes the interaction between scaffolding, AI use, and cognitive load relevant for understanding how learners engage in higher-order thinking and regulate their learning processes.

Understanding the effects of varying scaffolding levels in AI-supported settings remains an open question with high relevance. In this study, scaffolding is conceptualised as structured, temporary support that accompanies the use of generative AI as a tutor. It is operationalised at two levels of scaffolding intensity: a full-scaffolding condition combining a structured prompting strategy with iterative refinement, source evaluation, and guided reflection, and a light-scaffolding condition limited to the structured prompting strategy. This distinction enables a direct comparison of how different levels of scaffolding intensity shape learners’ interaction with generative AI. By comparing different levels of scaffolding intensity, the study examines how instructional framing shapes students’ interaction with AI and influences key learning-related processes, including knowledge gain, motivation, cognitive load, reflective use, and critical thinking. In the context of this study, AI-supported scaffolding refers to structured instructional support embedded in the interaction with generative AI, guiding how students formulate prompts, evaluate AI-generated responses, and engage in reflective processing. This conceptualisation builds on established perspectives on scaffolding as temporary, adaptive support that structures learners’ cognitive and metacognitive processes and promotes self-regulated learning (Munshi et al., 2023).

This distinguishes AI-supported scaffolding from general AI use, which may lack explicit structure or intentional guidance for cognitive or metacognitive engagement. At the same time, generative AI may be understood as providing forms of support that resemble scaffolding processes, even in the absence of explicit instructional design. Through structured responses, iterative interaction, and adaptive feedback, AI systems may support learners’ reasoning and problem-solving. In this sense, such interactions may function as a form of implicit scaffolding, shaping cognitive and metacognitive processes without being intentionally designed as instructional support. This perspective refines the distinction between scaffolded and less structured AI use, suggesting that differences between instructional conditions may reflect variations in the degree and explicitness of scaffolding rather than its presence or absence. In line with recent work on AI-supported learning environments (Bauer et al., 2025), the study assumes that the design of the interaction with AI plays a critical role in shaping learning processes.

The study differentiates between three conditions that represent different levels of scaffolding intensity. The full-scaffolding condition included a template for guided prompt formulation, iterative refinement, source evaluation, and reflection. The light-scaffolding condition included only the template for guided prompt formulation, without additional guidance for reflection or evaluation. The control condition involved AI use without any structured scaffolding support. This differentiation enables a direct comparison of how varying levels of scaffolding influence students’ cognitive and metacognitive engagement when interacting with generative AI. Accordingly, instructional scaffolding can be understood as a central mechanism that shapes how learners engage with generative AI, influencing cognitive load, motivation, critical thinking, and reflective use.

1.2. Knowledge Gain and the Role of Prior Knowledge

Knowledge gain remains a central outcome in higher education and is increasingly examined in the context of generative AI. According to Bloom’s revised taxonomy (Anderson & Krathwohl, 2001), lower-order thinking (LOT) refers to remembering and understanding, while higher-order thinking (HOT) involves analysing and evaluating. Generative AI can support LOT through quick access to information (Akgun & Toker, 2025), but engaging in HOT often requires deeper processing and reflection, processes that may benefit from instructional scaffolding (Nathaniel et al., 2025).

Students’ ability to benefit from AI tools may also depend on prior knowledge and experience. Learners with stronger academic backgrounds are often better positioned to prompt and evaluate AI outputs effectively (Shoufan, 2023; Knoth et al., 2024), reflecting the Matthew effect (Stanovich, 1986). Without support, performance gaps may widen. Scaffolding, however, may help close these gaps by guiding less-experienced learners through cognitive and metacognitive steps. The extent and type of scaffolding may therefore determine whether AI fosters superficial recall or deeper conceptual understanding. Accordingly, higher levels of instructional scaffolding are expected to enhance deeper knowledge acquisition by supporting the evaluation and refinement of AI-generated content. LOT processes may improve across all conditions due to the availability of AI-generated information, with less dependence on instructional scaffolding. In contrast, HOT is expected to benefit more strongly from structured scaffolding, leading to greater gains in conditions with higher levels of scaffolding intensity.

1.3. Motivation

Motivation is essential for active learning. According to Self-Determination Theory (Deci & Ryan, 2000), autonomy and competence are key drivers of motivation. AI may enhance these needs by offering instant feedback and flexibility, but can also reduce intrinsic motivation if learners rely too heavily on automated responses (Fan et al., 2025). Studies show mixed effects, with both increases in motivation, for example, through autonomy (Boguslawski et al., 2025), and declines once the novelty fades or outputs dominate thinking (Fryer et al., 2017; H.-P. Lee et al., 2025). Instructional scaffolding may balance the risks of motivational decline by encouraging students to reflect, engage more actively, and use AI purposefully (Tsakeni et al., 2025). Even small shifts in motivation may influence how learners interact with AI and whether they translate this interaction into meaningful knowledge gains (Bai & Wang, 2025). These findings suggest that while motivational effects may be limited, structured scaffolding may have the potential to sustain motivation and contribute to small but meaningful increases by guiding learners’ interaction with AI. In line with Self-Determination Theory, higher levels of motivation are associated with more active and sustained engagement, which may in turn support knowledge acquisition. Motivation may influence how learners allocate cognitive resources and engage with AI-generated content, thus shaping cognitive load dynamics, the depth of reflective use, and the activation of HOT processes, which are critical for meaningful knowledge acquisition.

1.4. Cognitive Load

Cognitive Load Theory proposes that learning is limited by the capacity of working memory (Sweller et al., 1998). It distinguishes three types of load: intrinsic cognitive load (ICL), which arises from task complexity; extraneous cognitive load (ECL), which arises from unnecessary processing demands; and germane cognitive load (GCL), which arises from productive effort (Sweller et al., 2019). Together, these determine the effectiveness of learning processes (Sweller et al., 2011). Furthermore, GCL is associated with effective learning processes, whereas ECL may hinder knowledge acquisition (Sweller et al., 2011).

Generative AI may affect all three types of cognitive load. It can reduce ECL by simplifying tasks and clarifying content, but may also introduce additional demands when students must evaluate the reliability of outputs (Jose et al., 2025). Instructional scaffolding is assumed to foster GCL by prompting elaboration and explanation (Gkintoni et al., 2025). As comprehension improves, ICL may gradually decrease, particularly when complex tasks are well supported (Leppink et al., 2014).

Although Cognitive Load Theory is well established, little is known about how cognitive load evolves during short AI-supported learning activities, especially across varying scaffolding conditions. As learners become more familiar with the content and the task’s structure during the learning activity, ICL may decrease slightly as comprehension improves. Instructional scaffolding is expected to reduce ECL and support GCL by structuring learners’ interaction with AI.

1.5. Critical Thinking

Critical thinking, defined as the ability to analyse, evaluate, and justify ideas, is a central goal in higher education (Facione, 1990). While generative AI can stimulate reasoning, it may also lead to uncritical acceptance of outputs, especially in the absence of instructional support (Hou et al., 2025). As this competence develops gradually (Halpern, 2013), even short interventions may foster small but measurable gains. Learners with higher initial thinking skills are more likely to engage in reflective and purposeful use of generative AI, rather than relying on it uncritically, which may support deeper cognitive engagement during learning tasks (Hou et al., 2025). From a Cognitive Load Theory perspective, critical thinking may optimise cognitive processing by reducing ECL and increasing GCL (Sweller et al., 2011). These effects may mediate how scaffolding influences cognitive load and how HOT fosters deeper evaluative reasoning. Instructional scaffolding may foster critical thinking by encouraging deeper evaluation, questioning, and refinement of AI-generated responses. In addition, engagement in HOT processes may further enhance critical thinking by promoting analysis, evaluation, and reflection.

1.6. Reflective Use

Reflective use refers to the metacognitive regulation of AI outputs, including source evaluation, plausibility checks, and purposeful integration (Flavell, 1979). Such metacognitive regulation is increasingly seen as a core component of AI literacy, essential for navigating technical, ethical, and cognitive dimensions of AI-supported learning (Pinski & Benlian, 2024; Chiu et al., 2024). However, many students still struggle with this skill (McGrew et al., 2018).

Scaffolding may promote reflective use by encouraging students to engage more consciously with AI, leading to increased regulation over time and greater learning gains. Reflective use is therefore expected to support knowledge acquisition by promoting deeper processing and more effective regulation of learning activities. Reflective use may also help optimise cognitive processing by reducing ECL, enhancing GCL, and supporting HOT (Sweller et al., 2011). Structured scaffolding is expected to promote reflective use by encouraging deliberate monitoring, evaluation, and regulation of AI-supported learning processes.

Accordingly, reflective use may represent a metacognitive mechanism that regulates how learners engage with AI-generated content, while simultaneously shaping cognitive load and higher-order processing.

1.7. Research Gaps and Aim

Although the use of generative AI in higher education is expanding rapidly, empirical research on its instructional integration remains limited. Existing studies focus on general perceptions, self-reported use, or isolated learning outcomes (Zhang et al., 2024). Many studies rely on artificial tasks or short survey-based designs rather than authentic academic activities (N. C. Wang, 2025). As a result, there is limited understanding of how students actually engage with AI tools during real academic work, or how different instructional approaches shape this engagement (Liu & Zhong, 2025). Importantly, existing studies rarely examine multiple cognitive and metacognitive process variables collectively, often focusing on isolated outcomes rather than the interaction between them. This limits the ability to develop a comprehensive understanding of how instructional framing influences learning processes in AI-supported environments.

Scaffolding is a well-established instructional design principle for supporting complex learning (Vygotsky & Cole, 1978) and is frequently discussed as a promising strategy for AI-supported learning (Tsakeni et al., 2025). However, its specific role in guiding students’ interaction with generative AI has not been examined in depth. There is little empirical evidence on how different levels of scaffolding influence students’ cognitive and metacognitive engagement with AI tools (Ma & Chen, 2025). The lack of integrated and comparative evidence limits our understanding of how instructional design shapes learning processes in AI-supported environments. Therefore, this study aims to provide a systematic and theory-informed examination of how different levels of AI-supported scaffolding influence knowledge gain, motivation, cognitive load, critical thinking, and reflective use within a controlled instructional setting. The variation in instructional design raises a key pedagogical question about what level and type of scaffolding enables students to use generative AI in ways that foster deep, reflective learning rather than superficial engagement.

In light of these gaps, the present study compares three instructional settings in which students complete an academic task using generative AI as a tutor. All participants use the same tool but with different levels of guidance: full scaffolding, light scaffolding, or no scaffolding. The study examines how these conditions affect five key areas: knowledge gain, motivation, cognitive load, critical thinking, and reflective use. It takes place in a real university course with students from different faculties, allowing for a broader range of academic experience, prior knowledge, and AI familiarity. By combining authentic teaching contexts with theory-based instructional variation, the study aims to provide empirical insights into how generative AI can be used to support meaningful learning in applied university contexts. Building on the theoretical framework outlined above, this study examines how varying levels of scaffolding in AI-supported learning influence key cognitive and metacognitive processes. Prior research indicates that scaffolding may affect knowledge acquisition, motivational processes, and cognitive load, while also supporting higher-order thinking and metacognitive regulation. Accordingly, the present study focuses on five outcome dimensions, which are grounded in established theoretical frameworks, including Bloom’s taxonomy, Self-Determination Theory, and Cognitive Load Theory. These constructs are operationalised within the empirical design using both performance-based and self-report measures. Knowledge gain is assessed through a performance-based test differentiating lower- and higher-order thinking, while motivation, cognitive load, critical thinking, and reflective use are measured using theory-informed self-report instruments capturing the effect of the respective constructs on AI-supported learning. Accordingly, this study adopts a comparative design to examine how varying levels of instructional scaffolding shape these processes. 
Taken together, the constructs of the study are expected to interact in shaping learning processes in AI-supported environments, forming an integrated framework that underpins the study’s hypotheses.

The findings suggest that scaffolding intensity did not consistently enhance learning outcomes, whereas reflective use may play a critical role in regulating cognitive load dynamics. The evidence also points to the possibility that generative AI may already function as a form of implicit scaffolding, potentially limiting the additional impact of explicit instructional support.

1.8. Research Questions and Hypotheses

Based on the theoretical framework, this study formulates five research questions that reflect the multidimensional nature of AI-supported learning. The focus lies on key cognitive and metacognitive learning processes, including knowledge gain, motivation, cognitive load, critical thinking, and reflective use. These dimensions are analysed in relation to different scaffolding conditions in which students work with generative AI as a tutoring partner. An overview of the research questions and associated hypotheses is provided in Table 1.

2. Method

2.1. Research Design

The study employed a quasi-experimental mixed design with pre- and post-assessments, conducted during a single four-academic-hour university session. It was embedded in a real academic setting across two faculties and examined how different levels of scaffolding influence students’ knowledge acquisition, motivation, cognitive load, and reflective AI use. The intervention was integrated in a mandatory, first-semester session involving students from diverse academic backgrounds to ensure variation in prior knowledge and AI familiarity.

All groups received the same lecture-based introduction and task framing. The conditions differed exclusively with regard to the presence and intensity of the scaffolding template accompanying AI use. Participants interacted with the AI tutor under one of three scaffolding conditions that differed in the level of scaffolding provided: full scaffolding, light scaffolding, or no scaffolding (control). The full-scaffolding group received a structured template combining the Goal–Context–Constraints (GCC) strategy to ensure goal-oriented prompt formulation, contextual adaptation, and output constraints (D. Lee & Palmer, 2025). This approach was designed to support students’ metacognitive regulation and deeper conceptual engagement, which are considered essential for active knowledge construction and critical thinking in AI-supported learning (Bauer et al., 2025). The template also included additional steps for iterative refinement, source verification, and guided reflection to promote active processing and metacognitive awareness, which are key components of effective self-regulated learning and critical reasoning (McGrew et al., 2018). The light-scaffolding group received only the initial GCC prompt framework, offering minimal structure to guide the initial interaction with the AI tutor. No further prompts or reflective elements were included, allowing for more self-directed engagement and learner autonomy, as discussed in minimally guided instructional approaches (Kirschner et al., 2006). The control group had access to the AI tutor but received no scaffolding materials or prompting templates. AI use was guided exclusively by the task instructions, which aligned with the lecture content and did not provide structured scaffolding.
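To make the three-part structure of a GCC prompt concrete, the following is a minimal sketch of how such a prompt could be assembled. It is an illustration only: the class name `GCCPrompt`, the field labels, and the example wording are assumptions and do not reproduce the template used in the study, which is provided in Appendix A.

```python
from dataclasses import dataclass


@dataclass
class GCCPrompt:
    """Hypothetical container for the three GCC components."""
    goal: str         # what the learner wants to achieve
    context: str      # who is asking and on what material
    constraints: str  # limits on the form or scope of the output

    def render(self) -> str:
        """Return the prompt as three labelled lines; reject empty components."""
        parts = [
            ("Goal", self.goal),
            ("Context", self.context),
            ("Constraints", self.constraints),
        ]
        missing = [label for label, text in parts if not text.strip()]
        if missing:
            raise ValueError("Missing GCC component(s): " + ", ".join(missing))
        return "\n".join(f"{label}: {text.strip()}" for label, text in parts)


# Example wording is invented for illustration.
prompt_text = GCCPrompt(
    goal="Explain the difference between supervised and unsupervised learning.",
    context="I am a first-semester student working with the uploaded lecture slides.",
    constraints="Use at most 150 words and flag anything not covered by the slides.",
).render()
print(prompt_text)
```

Requiring all three components before a prompt is rendered mirrors the idea that the template structures the initial interaction; the further steps of the full-scaffolding condition (refinement, source verification, reflection) are not modelled here.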

By embedding the study in a real university setting and systematically varying instructional support, the design enhances ecological validity and allows meaningful comparison across learning conditions. This aligns with recent research emphasising that the educational impact of AI depends less on the technology itself than on its didactic integration and ability to activate cognitive and metacognitive resources (Bauer et al., 2025). The study therefore contributes to understanding not only whether, but how and for whom AI can support learning in higher education, addressing gaps highlighted by recent empirical reviews.

2.2. Participants

A total of 175 first-semester students from two faculties participated in the study. Participants were assigned to the study conditions based on their existing course structures, as the study was embedded in scheduled teaching sessions across both faculties.

In the Faculty of Business, 82 students (12 female and 70 male) from the Business Informatics degree programme participated within their established parallel class groups. They were distributed across the three conditions based on their regular class membership as follows: 29 in the full-scaffolding group, 28 in the light-scaffolding group, and 25 in the control group.

From the Faculty of Social Work, 93 students (67 female and 26 male) participated. These students were enrolled in three different degree programmes: Social Work, Education and Profession, and People with Disabilities. As the course is organised jointly across these degree programmes in the early semesters, students were already part of mixed classes, which were maintained for the study. Specialisation into programme-specific tracks occurs in later semesters. This resulted in the following distribution across conditions: 34 in the full-scaffolding group, 31 in the light-scaffolding group, and 28 in the control group. Accordingly, the study follows a quasi-experimental design based on existing organisational structures, and group comparability is therefore limited compared to randomised designs. All groups, however, participated in the same instructional setting, received identical materials, and completed the same task. Relevant baseline characteristics, including prior knowledge and prior AI experience, were assessed and incorporated into the analyses. Participants’ ages ranged from 18 to 28 years, based on enrolment records provided by the university administration. As only aggregated age information was provided, mean and standard deviation could not be computed.
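As a consistency check, the group sizes reported for the two faculties can be tallied per faculty and per condition. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Group sizes as reported in the text (faculty -> condition -> n).
reported = {
    "Business": {"full": 29, "light": 28, "control": 25},
    "Social Work": {"full": 34, "light": 31, "control": 28},
}

# Participants per faculty.
faculty_totals = {
    faculty: sum(groups.values()) for faculty, groups in reported.items()
}

# Participants per scaffolding condition, pooled across faculties.
condition_totals = {}
for groups in reported.values():
    for condition, n in groups.items():
        condition_totals[condition] = condition_totals.get(condition, 0) + n

total = sum(faculty_totals.values())

print(faculty_totals)    # {'Business': 82, 'Social Work': 93}
print(condition_totals)  # {'full': 63, 'light': 59, 'control': 53}
print(total)             # 175
```

The per-faculty sums match the reported 82 and 93 students, and the overall sum matches the reported total of 175.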

Descriptive statistics are based on the maximum available cases per construct and are reported in Table 1. For knowledge gain, complete data were available for 136 participants, whereas for motivation, cognitive load, critical thinking, and reflective use, complete data were available for 142 participants. Inferential analyses were conducted using complete cases for all variables included in each respective model to ensure consistent and comparable data within each analysis. As a result, sample sizes varied depending on the variables in the respective models. Exact sample sizes for each model and hypothesis are reported in Table 2.
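The following sketch illustrates why sample sizes differ between models under a complete-case approach: each model keeps only participants with valid values on all of its own variables. The records and variable names are hypothetical toy data, not study data.

```python
def complete_cases(records, variables):
    """Keep only records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical toy records; None marks a missing value.
records = [
    {"id": "a", "knowledge_pre": 5, "knowledge_post": 8,
     "motivation_pre": 3.2, "motivation_post": 3.4},
    {"id": "b", "knowledge_pre": 4, "knowledge_post": None,
     "motivation_pre": 2.9, "motivation_post": 3.0},
    {"id": "c", "knowledge_pre": 6, "knowledge_post": 7,
     "motivation_pre": None, "motivation_post": 3.8},
]

knowledge_vars = ["knowledge_pre", "knowledge_post"]
motivation_vars = ["motivation_pre", "motivation_post"]

knowledge_model = complete_cases(records, knowledge_vars)
motivation_model = complete_cases(records, motivation_vars)
combined_model = complete_cases(records, knowledge_vars + motivation_vars)

print(len(knowledge_model), len(motivation_model), len(combined_model))  # 2 2 1
```

A model that combines several constructs can therefore have a smaller sample than any of the single-construct analyses.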

2.3. Materials and Learning Context

The study was conducted within the mandatory first-semester lecture Introduction into the Fundamentals of Artificial Intelligence. To ensure that all students acquire foundational AI competencies, the lecture was embedded in a compulsory lecture series. For the Faculty of Business (degree programme of Business Informatics), it was part of the IT-Fundamentals I lecture series, while for the Faculty of Social Work, it was integrated into Academic Writing. The lecture aimed to develop foundational AI competencies across four domains: technical principles, ethical considerations, legal awareness, and practical application skills. Materials and assignments were provided via Moodle, with the task instructions specifying the required AI use per condition. The complete lecture content, learning task instructions, and scaffolding templates used in the study are provided in Appendix A.

2.4. Procedure

Data were collected during a single four-hour session. At the beginning of the session, all participants completed a pre-assessment, consisting of two separate online questionnaires administered via LimeSurvey (Community Edition, Version 6.14.1+250527). This included a domain-specific knowledge test and a questionnaire assessing the remaining study variables. This was followed by a lecture covering fundamental principles of AI, delivered by the same instructor using identical materials across faculties.

After the lecture, students reviewed the materials and completed the core learning task using generative AI as a tutor. Participants worked individually under one of three instructional conditions that differed in the level of scaffolding provided for interacting with generative AI as a tutor. All participants received the same instructional materials introducing the task and providing the foundations of using generative AI as a tutor (see Appendix A). The strength of the manipulation was reflected in the qualitative differences between conditions. While the full-scaffolding condition included structured prompting based on the GCC strategy, iterative refinement, and reflective guidance, the light-scaffolding condition was limited to the initial prompting structure without further guidance. The control condition worked only with the instructional materials without additional scaffolding templates. The scaffolding templates, which served as the central working material in the experimental conditions, were directly embedded in the task instructions, ensuring that differences between conditions were systematically integrated into the learning process (see Appendix A). The learning activity was conducted in a classroom setting under instructor supervision, allowing for ongoing monitoring of students’ engagement with the assigned condition and supporting adherence to the respective scaffolding instructions. In the full-scaffolding condition, students received a structured prompting template based on the Goal–Context–Constraints (GCC) strategy, combined with additional guidance for iterative refinement, source verification, and guided reflection on AI-generated outputs. In the light-scaffolding condition, students were provided only with the initial GCC-based prompting framework without further guidance on refinement, evaluation, or reflection. The control condition did not receive any structured scaffolding or prompting templates beyond the general task instructions. 
All groups worked on the same task using the same AI system under otherwise identical conditions. The task required students to create three exam-style questions with corresponding model answers, an approach shown to foster deeper processing and conceptual understanding (Yu & Chen, 2021). Task responses were uploaded to Moodle but were not analysed further.

Generative AI was accessed via an institution-provided, data-protection-compliant interface for large language models (GPT-4o, GPT-4 mini, GPT-5, and GPT-5.1). The interface allowed students to upload and query course materials in multiple file formats (e.g., PDF, Word, and PowerPoint). This supported the AI-as-tutor scenario and enabled personalised content interaction. The intervention was based on the instructional principle of scaffolding, defined as temporary, adaptive support that helps learners manage complex tasks that they may not approach as effectively without guidance (Van De Pol et al., 2010). In the full- and light-scaffolding conditions, scaffolding was used to guide students in formulating prompts, evaluating AI responses, and integrating them meaningfully into their learning process.

All participants received the same instructional input, and were introduced to the AI tool with a shared demonstration on how to use it as a tutor effectively. Following task completion, the session concluded with a post-assessment, consisting of the same two questionnaires administered via LimeSurvey. Participation was mandatory and anonymous. To enable response matching, participants generated pseudonyms used across measurement points. Responses that could not be matched due to invalid or missing pseudonyms were excluded from the analyses.
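
The matching step described above can be illustrated with a minimal sketch. The normalisation rule (trimming whitespace and lower-casing) and the record layout are assumptions made for illustration only; the study does not report how pseudonyms were cleaned or validated.

```python
def match_by_pseudonym(pre, post):
    """Pair pre- and post-responses that share a valid pseudonym.

    `pre` and `post` map raw pseudonym strings to response records.
    Trimming and lower-casing are illustrative assumptions, not the
    authors' documented procedure.
    """
    def normalise(pseudonym):
        return pseudonym.strip().lower() if isinstance(pseudonym, str) else ""

    # Drop empty (missing or invalid) pseudonyms before matching.
    pre_clean = {normalise(k): v for k, v in pre.items() if normalise(k)}
    post_clean = {normalise(k): v for k, v in post.items() if normalise(k)}

    # Keep only pseudonyms present at both measurement points.
    shared = sorted(pre_clean.keys() & post_clean.keys())
    return {p: (pre_clean[p], post_clean[p]) for p in shared}


# Hypothetical responses: one matchable case, one missing pseudonym,
# and two cases that appear at only one measurement point.
pre = {"Blue42 ": {"score": 8}, "owl7": {"score": 10}, "": {"score": 5}}
post = {"blue42": {"score": 11}, "fox3": {"score": 9}}
matched = match_by_pseudonym(pre, post)
```

In this sketch, only the case that appears under the same pseudonym at both time points is retained, which mirrors the complete-case logic used for the pre–post comparisons.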

2.5. Measures

2.5.1. Final School Grade

Students’ final secondary school grades were collected at the pre stage as an indicator of prior academic proficiency to distinguish between higher- and lower-performing students. Grades were reported using the national grading scale (1 = excellent, 6 = insufficient) and were reverse-coded so that higher values reflected higher achievement.
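
As an illustration of the reverse-coding step, the following sketch mirrors grades around the midpoint of the 1 to 6 scale. The exact transformation is not reported in the study; the formula below (lowest plus highest scale value minus the grade) is the conventional choice and is assumed here.

```python
def reverse_code(grade, low=1.0, high=6.0):
    """Reverse-code a grade so that higher values mean higher achievement.

    Assumes the conventional mirror transformation low + high - grade;
    the study does not state the exact formula used.
    """
    if not low <= grade <= high:
        raise ValueError(f"grade {grade} lies outside the {low}-{high} scale")
    return low + high - grade


best = reverse_code(1.0)      # 6.0: an excellent grade maps to the top value
worst = reverse_code(6.0)     # 1.0: an insufficient grade maps to the bottom
between = reverse_code(2.5)   # 4.5
```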

2.5.2. Experience

Prior experience with generative AI was assessed at pre-test using a self-developed single-item measure, a common approach for background variables in higher education research (Ifenthaler & Egloffstein, 2020). Students rated their experience on a five-point ordinal scale from no experience to very high experience. Each category was accompanied by a short descriptive label to support consistent interpretation. Previous research has shown that single-item measures can demonstrate adequate validity for clearly defined and concrete constructs (Song et al., 2023). As the construct was measured with a single item, no internal consistency reliability was calculated.

2.5.3. Knowledge and Knowledge Gain

Based on Bloom’s revised taxonomy (Anderson & Krathwohl, 2001), knowledge is conceptualised as the acquisition of both lower- and higher-order thinking skills. In the present study, knowledge gain is operationalised as the change in performance between pre- and post-scores. Knowledge was assessed at pre and post using an identical, instructor-developed single-choice test, which differentiates between lower-order thinking (LOT) and higher-order thinking (HOT) and therefore aligns with the theoretical definition of the construct. Well-constructed single-choice items are widely recognised as an efficient and valid format for assessing domain-specific knowledge in higher education, as they allow objective scoring while capturing meaningful differences in students’ understanding (Haladyna & Rodriguez, 2013). When carefully designed, such items can capture not only factual recall but also higher-order cognitive processes by requiring students to apply, analyse, or evaluate information and by including distractors that reflect plausible misconceptions (Crowe et al., 2008).

The test included 16 items, each with four response options and one correct answer. The items were not limited to factual recall but required understanding and application of knowledge developed through both the lecture content and the learning task. Given the limited class time, the interaction with the AI tutor was intentionally designed to foster deeper engagement and support constructive alignment between learning activities and assessment (Biggs, 1996). Items were developed based on Bloom’s revised taxonomy to ensure representation across cognitive levels (Anderson & Krathwohl, 2001). Eight items targeted lower-order thinking (remembering and understanding), and eight assessed higher-order thinking skills (applying, analysing, and evaluating). This approach provides a theoretically grounded and comprehensive measure of both foundational and integrative cognitive skills.

Each item was scored dichotomously, resulting in a maximum total score of 16, which was converted to a percentage for analysis. In addition to the total knowledge score, separate subscores were calculated for LOT and HOT items, following the same procedure. The same test was administered at pre- and post-stages, ensuring identical measurement conditions and comparability of the scores across time. As the knowledge test was conceptualised as a formative measure covering distinct content domains, internal consistency coefficients such as Cronbach’s alpha were not considered appropriate. In formative assessments, items are not assumed to reflect a single theoretical construct, and internal consistency estimates may therefore not meaningfully indicate test quality (Stadler et al., 2021).
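
The scoring procedure can be sketched as follows. The assignment of items to the LOT and HOT subscores (first eight versus last eight positions), the answer key, and the response patterns are hypothetical; the study reports only that eight items targeted each level.

```python
# Hypothetical item positions: the study does not report item order.
LOT_ITEMS = range(0, 8)
HOT_ITEMS = range(8, 16)


def percent_correct(responses, key, items):
    """Dichotomous scoring: 1 point per correct item, expressed in percent."""
    correct = sum(responses[i] == key[i] for i in items)
    return 100.0 * correct / len(items)


def score_test(responses, key):
    """Return total, LOT, and HOT percentage scores for one participant."""
    return {
        "total": percent_correct(responses, key, range(16)),
        "lot": percent_correct(responses, key, LOT_ITEMS),
        "hot": percent_correct(responses, key, HOT_ITEMS),
    }


# Illustrative answer key and one participant's answers (not study data).
key = list("ABCDABCD" + "ABCDABCD")
pre = score_test(list("ABCDABCA" + "AAAAAAAA"), key)
post = score_test(list("ABCDABCD" + "ABCDAAAA"), key)

# Knowledge gain is the post score minus the pre score, per (sub)score.
gain = {name: post[name] - pre[name] for name in pre}
```

For this invented participant, the total score rises from 56.25% to 81.25%, giving a knowledge gain of 25 percentage points.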

A pre–post test was employed to capture knowledge changes within a single learning session. As the intervention was embedded in a four-hour authentic university lecture covering a clearly defined and limited set of content, this approach enabled a sensitive assessment of short-term learning gains. By measuring performance before and after the intervention within the same individuals, the design enables the assessment of change over time while controlling for individual differences in prior knowledge (Creswell, 2014).

2.5.4. Motivation

According to Self-Determination Theory (Deci & Ryan, 1985), motivation is conceptualised as learners’ engagement driven by autonomy and competence in the learning process. Motivation was measured with a self-developed four-item scale that was theoretically derived from established perspectives on educational motivation, especially Self-Determination Theory (Deci & Ryan, 1985), and informed by current discussions on human–AI interactions in learning contexts (X. Wang et al., 2023). The present operationalisation focuses on motivation in the context of working with AI-generated content and therefore extends beyond what is typically captured by general motivation scales. The self-developed scale captures these dimensions by assessing persistence and motivational experiences in the context of AI-supported learning, aligning closely with the theoretical framework of the construct.

Two items captured persistence-related aspects of motivation (e.g., “I remain motivated even when a task is challenging”), aligning with Self-Determination Theory’s emphasis on autonomy and competence in sustaining engagement (Deci & Ryan, 1985). Two further items measured motivational experiences in the context of AI-supported learning, capturing whether motivation can be maintained or enhanced when AI tools are used purposefully (e.g., “Using AI tools purposefully can enhance my motivation”). This reflects research indicating that AI-supported learning environments can positively relate to students’ motivational experiences depending on how such tools are integrated into the learning process (Mohamed et al., 2025). The same items were used at pre- and post-stages to ensure comparability. At pre-questionnaire, items referred to students’ anticipated motivation toward AI-assisted learning based on the course introduction, as prior attitudes and expectations are known to shape later motivational responses (Eccles & Wigfield, 2002). All items used a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree). Internal consistency was assessed using Cronbach’s α, with values indicating low to moderate reliability (α_pre = 0.60; α_post = 0.72). As coefficient alpha depends on test length and dimensionality (Cortina, 1993), these values should be interpreted in light of the scale’s shortness and conceptual breadth.
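
For readers unfamiliar with the coefficient, the sketch below computes Cronbach’s α from its standard definition, α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). The ratings are invented for illustration and are not study data; the analyses in this study were run in JASP, not with this code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item ratings.

    `rows` holds one list of k item scores per respondent. Sample
    variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented 5-point ratings from five hypothetical respondents on four items.
ratings = [
    [4, 4, 3, 5],
    [2, 3, 2, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 4],
    [4, 5, 4, 4],
]
alpha = cronbach_alpha(ratings)
```

Because the formula scales with the number of items k, a four-item scale with moderately correlated items will tend to yield lower values than a longer scale of comparable item quality, which is the point made with reference to Cortina (1993).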

2.5.5. Cognitive Load

Based on Cognitive Load Theory (Sweller et al., 2011), cognitive load is conceptualised as the distribution of cognitive resources across intrinsic, extraneous, and germane loads. The adapted questionnaire captures these dimensions by separately assessing content complexity (ICL), unnecessary processing demand (ECL), and active cognitive processing (GCL). Cognitive load was measured using an adapted version of the cognitive load questionnaire developed by Krieglstein et al. (2023). The questionnaire differentiates between intrinsic cognitive load (ICL; 5 items), extraneous cognitive load (ECL; 5 items), and germane cognitive load (GCL; 7 items). Responses were given on a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree), and subscale means were computed, with higher scores reflecting higher cognitive load within the respective dimension.

Items assessing ICL addressed the inherent difficulty and complexity of the content (e.g., “The learning contents were complex”). Items for ECL focused on the unnecessary cognitive effort caused by the instructional design (e.g., “It was difficult to stay focused on the content while working on the task”). GCL items reflected active mental processing and the integration of new information (e.g., “I achieved a comprehensive understanding of the learning content”).

The questionnaire was administered in the same format across all groups at pre- and post-stages. All responses were merged for analysis. Pre-test measures reflected students’ expectations about cognitive effort prior to the intervention experience. Internal consistencies were acceptable to good across all dimensions and at both measurement points: ICL (α_pre = 0.81; α_post = 0.82), ECL (α_pre = 0.71; α_post = 0.79), and GCL (α_pre = 0.80; α_post = 0.80).

2.5.6. Critical Thinking

Based on established frameworks of critical reflection and critical thinking in education (Ennis & Philosophy Documentation Center, 2011), critical thinking is conceptualised as the ability to analyse, evaluate, and justify information. Critical thinking in the context of AI-supported learning was assessed using self-developed items informed by theoretical foundations. Because the construct specifically targets the evaluation of AI-generated outputs, a self-developed scale was required.

Critical thinking was assessed at pre- and post-test using a nine-item self-developed scale designed to capture critical thinking processes in interactions with AI tools. The items covered three dimensions of critical thinking. The first addressed the evaluation of AI outputs (e.g., “I recognize when AI-generated text sounds plausible but is weak in content”). The second reflected awareness of personal biases and reflective thinking (e.g., “I know that my own assumptions influence how I evaluate AI responses”), consistent with Kember et al.’s (2000) conceptualization of reflective thinking as a critical component of higher-order learning. The third dimension captured inquiry and reasoning (e.g., “I use AI responses as a starting point to develop my own arguments”), aligning with Kuhn et al.’s (2000) developmental perspective on epistemological understanding and argument construction.

All items were rated on a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree). The items were identical across all groups at both measurement points, and responses were merged for analysis. Pre-test ratings referred to students’ anticipated critical thinking processes before experiencing the learning task. Internal consistency was acceptable to good (α_pre = 0.73; α_post = 0.80).

2.5.7. Reflective Use

Reflective use was measured using four self-developed items, informed by foundational metacognitive theory (Flavell, 1979). Reflective use is conceptualised as monitoring, evaluation, and regulation of AI-generated outputs. The scale captures these processes by assessing verification, plausibility checking, and purposeful integration of AI outputs.

The scale was administered at pre- and post-test to capture students’ metacognitive monitoring and regulation during task engagement with AI-generated outputs. One item captured whether outputs were adopted only after reflection (“I use AI outputs reflectively and verify them before adopting them”). The remaining items addressed source-checking, plausibility assessment, and active regulation of AI use.

All items were rated on a 5-point Likert scale (1 = strongly disagree; 5 = strongly agree). Mean scores were computed for each time point, with higher values indicating more reflective use. At pre-test, items referred to students’ anticipated reflective engagement with AI tools. The reliability was good (α_pre = 0.80; α_post = 0.81).

2.6. Data Analysis

All statistical analyses were conducted using JASP (Version 0.95; JASP Team, 2025). Descriptive statistics were computed for all variables. To examine the five research questions, a combination of mixed-design ANOVAs, repeated-measures ANCOVA, linear regressions, moderation and mediation analyses was used. Prior AI experience and academic background (final school grade) were included as covariates in the repeated-measures ANCOVA examining knowledge gain to account for individual differences in prior experience and academic performance.

Repeated-measures ANOVAs tested changes in key outcomes across time points (pre–post) and between instructional conditions (full scaffolding, light scaffolding, and control). Effect sizes for ANOVAs are reported as η²_p.
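
Partial eta squared can be recovered from an F statistic and its degrees of freedom via η²_p = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to two F-tests reported for the knowledge outcome in Section 3.1; the function name is an illustrative choice.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of time on knowledge: F(1, 133) = 28.02
time_effect = round(partial_eta_squared(28.02, 1, 133), 3)   # 0.174

# Time x condition interaction on knowledge: F(2, 133) = 2.00
interaction = round(partial_eta_squared(2.00, 2, 133), 3)    # 0.029
```

Both values agree with the effect sizes reported alongside these tests, which offers a quick consistency check for readers.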

To test group differences and predictors of learning gains, further regression analyses were conducted using standardised coefficients (β) and explained variance (R²). Significance was set at α = 0.05.

To explore moderation and mediation effects, PROCESS models (Model 1 and Model 4; Hayes & Little, 2022) were applied. Moderation analyses examined whether instructional condition influenced the strength of association between predictors and learning outcomes. Mediation analyses tested whether cognitive variables explained indirect effects of instructional condition on learning outcomes. All models used bootstrapped 95% confidence intervals. The results are reported as standardised coefficients (β), p-values, and 95% CIs. Assumptions underlying the parametric analyses were examined as part of the analytical process across the applied models. Homogeneity of variance was assessed using Levene’s test, which was not significant across the main ANOVA models. Normality of residuals was evaluated through visual inspection of Q-Q plots. The observed distributions showed no substantial violations of normality, with only minor deviations at the distribution tails, which are considered acceptable given the robustness of parametric tests.
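
To make the logic of a bootstrapped indirect effect concrete, the following sketch estimates a simple mediation model (condition → mediator → outcome) and a percentile bootstrap interval for the product a × b. It is a generic illustration written in plain Python, not the PROCESS implementation used in this study; the simulated data, effect sizes, number of resamples, and function names are all assumptions.

```python
import random


def _cross(u, v):
    """Sum of centred cross-products of two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """Unstandardised indirect effect a * b in a simple mediation model.

    a: slope of M regressed on X; b: slope of M in Y regressed on X and M.
    """
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (e.g., one group only)
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return lower, upper


# Simulated data (not study data): x is a 0/1 dummy for condition.
rng = random.Random(1)
x = [i % 2 for i in range(120)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.8 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
ci_low, ci_high = bootstrap_ci(x, m, y, n_boot=1000)
```

An indirect effect is typically regarded as significant when the bootstrap interval excludes zero. Because the interval is built from resampled estimates rather than from a normal-theory standard error, it can occasionally disagree with a p-value derived analytically for the same effect.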

Missing data primarily resulted from responses that could not be matched between pre- and post-assessments because of missing or invalid pseudonyms or incomplete submissions. For knowledge measures, including HOT and LOT, data were available for N = 146 at pre-test and N = 138 at post-test, indicating a loss of 8 cases due to unmatched or incomplete responses. For motivation, ICL, ECL, GCL, critical thinking, and reflective use, complete matched data were available for N = 142 across both time points, indicating no dropout for these variables.

As matching responses across measurement points was required for the pre–post comparisons, only complete cases could be included. The use of complete-case analysis may introduce bias if the missing data are not random.

2.7. Analysis Methods

The selection of statistical methods was systematically aligned with the structure of the research questions and the nature of the variables involved. Changes in the central outcome variables (knowledge, motivation, cognitive load, critical thinking, and reflective use) across time (pre–post) and between instructional conditions were examined using mixed-design ANOVAs. This approach allowed the simultaneous investigation of within-subject changes and between-group differences.

Where hypotheses addressed the influence of covariates or prior characteristics (e.g., prior achievement and prior AI experience), repeated-measures ANCOVA models were applied to account for these variables.

Associations between variables were analysed using linear regression models.

To examine underlying mechanisms and indirect effects, mediation analyses were conducted using PROCESS (Model 4; Hayes & Little, 2022), particularly in relation to the role of cognitive load and critical thinking in explaining learning outcomes.

Finally, moderation analyses (Model 1; Hayes & Little, 2022) were conducted to examine whether relationships between variables (e.g., between HOT, reflective use, and cognitive load) differed across instructional conditions.
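
For orientation, a single-moderator model with a dummy-coded condition can be written as below. This is the standard specification for such models and is assumed here rather than taken from the study; X denotes the focal predictor (e.g., HOT gain), D the dummy variable (0 = control, 1 = scaffolding condition), and Y the outcome.

```latex
\begin{align}
Y &= b_0 + b_1 X + b_2 D + b_3 (X \cdot D) + \varepsilon \\
\text{slope of } X \text{ in the control condition } (D = 0) &: \; b_1 \\
\text{slope of } X \text{ in the scaffolding condition } (D = 1) &: \; b_1 + b_3
\end{align}
```

Under this specification, b_3 captures the difference in slopes between the two conditions, so a nonsignificant b_3 indicates no reliable moderation. The additive relation between the slopes holds for unstandardised coefficients; standardised coefficients (β) generally do not combine in this way.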

This structured approach ensured that each hypothesis was tested using a method that directly corresponded to its analytical requirements.

2.8. Ethical Considerations

Ethical considerations were taken into account throughout the study. Participants were informed about the purpose of the study, the voluntary nature of their participation, and the anonymous handling of their data prior to data collection. Students were free to withdraw at any time without any negative consequences. Generative AI was accessed via an institution-provided, data-protection-compliant interface. The system was hosted on servers located within the European Union and operated in accordance with the General Data Protection Regulation (GDPR). User accounts were anonymised, and the research team did not access, store or further analyse individual prompts entered into the AI system. As the study involved adult university students and anonymous data collection within a low-risk educational setting, formal ethical approval was not required according to institutional guidelines. The study was conducted in accordance with the principles of the Declaration of Helsinki.

3. Results

The results are presented according to the five research questions of this study. Descriptive statistics for all outcome variables, including means, standard deviations, and sample sizes by instructional condition and measurement point, are presented in Table 1.

This section focuses on the corresponding inferential analyses. These include mixed-design ANOVAs and ANCOVAs examining changes over time and between-group differences, as well as regression, moderation analyses (PROCESS Model 1; Hayes & Little, 2022) and mediation analyses (PROCESS Model 4; Hayes & Little, 2022) to test predictive, main and interaction effects. p-values are based on the delta method, whereas confidence intervals are percentile bootstrapped, which may result in minor discrepancies between significance indicators.

All analyses were conducted on complete cases for each construct. As a result, sample sizes vary across outcomes and analytical models. An overview of the research questions, hypotheses, analysis-specific sample sizes, and hypothesis support is provided in Table 2.

3.1. RQ1: Knowledge

The first research question examined whether students’ knowledge increased from pre- to post-test and whether these gains differed between the conditions.

For H1a, a mixed-design ANOVA with knowledge as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor was conducted. Across all groups, students’ knowledge scores improved significantly from pre to post, F(1, 133) = 28.02, p < 0.001, η²_p = 0.174. The time × condition interaction did not reach significance, F(2, 133) = 2.00, p = 0.139, η²_p = 0.029. The between-subjects effect of condition was not significant, F(2, 133) = 0.255, p = 0.775, η²_p = 0.004. H1a was partially supported.

For H1b, a repeated-measures ANCOVA with knowledge as the dependent variable, time (pre, post) as the within-subject factor and final secondary school grade and prior AI experience as covariates was conducted. The main effect of time was not significant, F(1, 129) = 1.68, p = 0.198, η²_p = 0.013. Neither the interaction between time and final school grade, F(1, 129) = 0.37, p = 0.542, η²_p = 0.003, nor the interaction between time and prior AI experience, F(1, 129) = 2.04, p = 0.156, η²_p = 0.016, reached significance. To further examine whether the initial level of knowledge (pre) influenced learning gains, an additional exploratory mixed-design ANOVA was conducted using high/low initial knowledge (median split) as a between-subjects factor. This analysis revealed a significant time × initial knowledge interaction, F(1, 134) = 5.08, p = 0.026, η²_p = 0.037. H1b was not supported.

For H1c, a mixed-design ANOVA with LOT as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor showed a significant effect of time, F(1, 133) = 16.57, p < 0.001, η²_p = 0.111. The time × condition interaction was not significant, F(2, 133) = 1.53, p = 0.219, η²_p = 0.023. The between-subjects effect of condition was not significant, F(2, 133) = 0.373, p = 0.689, η²_p = 0.006. A Bayesian repeated-measures ANOVA was conducted to assess the evidence for models including or excluding condition effects. The model containing only the time factor showed no preference over the null hypothesis (BF₁₀ = 1.00), while there was evidence supporting the null hypothesis for both the main effect of condition (BF₁₀ = 0.094) and the time × condition interaction (BF₁₀ = 0.027). H1c was supported.

For H1d, a mixed-design ANOVA with HOT as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor showed a significant effect of time, F(1, 133) = 18.58, p < 0.001, η²_p = 0.123. The time × condition interaction was not significant, F(2, 133) = 1.00, p = 0.371, η²_p = 0.015. The between-subjects effect of condition was not significant, F(2, 133) = 0.107, p = 0.898, η²_p = 0.002. H1d was not supported.

3.2. RQ2: Motivation

Research Question 2 examined whether students’ motivation changed from pre to post during the AI-supported learning activity, and whether motivation was associated with knowledge gains.

For H2a, a mixed-design ANOVA with motivation as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor was conducted. Students’ motivation remained stable during the session, and the main effect of time was not significant, F(1, 139) = 1.05, p = 0.308, η²_p = 0.007. The time × condition interaction also did not reach significance, F(2, 139) = 0.46, p = 0.636, η²_p = 0.006. The between-subjects effect of condition was significant, F(2, 139) = 5.900, p = 0.003, η²_p = 0.078. H2a was not supported.

For H2b, a multiple linear regression analysis was conducted to examine whether motivation (pre) predicted knowledge (post), while controlling for knowledge (pre). The overall model was significant, F(2, 129) = 20.68, p < 0.001, R² = 0.243. Motivation (pre) did not significantly predict knowledge (post) beyond prior knowledge (β = 0.138, p = 0.080, 95% CI [−0.473, 8.286]). Knowledge (pre) significantly predicted knowledge (post) (β = 0.449, p < 0.001, 95% CI [0.357, 0.731]). In additional exploratory analyses including the didactic scenario and its interaction with motivation, no significant effects emerged (ps > 0.649). H2b was not supported.

3.3. RQ3: Cognitive Load

Research Question 3 examined how cognitive load dimensions (intrinsic, extraneous, and germane) evolved during the AI-supported learning activity, and how they differed across the didactic scenarios.

For H3a, a mixed-design ANOVA with ICL as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor revealed a significant effect of time, F(1, 139) = 11.74, p < 0.001, η²_p = 0.078. The time × condition interaction was not significant, F(2, 139) = 1.03, p = 0.359, η²_p = 0.015. The between-subjects effect of condition was also not significant, F(2, 139) = 0.707, p = 0.495, η²_p = 0.010. H3a was supported.

For H3b, a mixed-design ANOVA with ECL as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor was conducted. Across all groups, ECL decreased significantly from pre to post, F(1, 139) = 12.32, p < 0.001, η²_p = 0.081. The time × condition interaction was not significant, F(2, 139) = 0.98, p = 0.376, η²_p = 0.014. The between-subjects effect of condition was significant, F(2, 139) = 4.624, p = 0.011, η²_p = 0.062. H3b was partially supported.

For H3c, a mixed-design ANOVA with GCL as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor was conducted. The main effect of time did not reach significance, F(1, 139) = 3.47, p = 0.065, η²_p = 0.024. The time × condition interaction was also not significant, F(2, 139) = 0.33, p = 0.718, η²_p = 0.005. The between-subjects effect of condition was not significant, F(2, 139) = 1.016, p = 0.365, η²_p = 0.014. H3c was not supported.

For H3d, two mediation PROCESS analyses (see Figure 1) were conducted with GCL (post) as the mediator, knowledge gain as the dependent variable, and condition as the independent variable. Two dummy-coded comparisons were tested: full scaffolding vs. control and light scaffolding vs. control. In the full-scaffolding model, condition was not significantly associated with GCL (β = −0.097, p = 0.343, 95% CI [−0.279, 0.099]). GCL was significantly positively associated with knowledge gain (β = 0.216, p = 0.030, 95% CI [0.983, 12.173]). The direct effect of condition on knowledge gain was not significant (β = 0.190, p = 0.057, 95% CI [−0.249, 11.400]). The indirect effect of condition on knowledge gain via GCL was nonsignificant (β = −0.021, p = 0.385, 95% CI [−2.317, 0.629]). The model explained 7.5% of the variance. In the light-scaffolding model, condition was not significantly associated with GCL (β = −0.028, p = 0.794, 95% CI [−0.243, 0.182]). GCL was not significantly associated with knowledge gain (β = 0.127, p = 0.224, 95% CI [−1.600, 10.327]). The direct effect of condition on knowledge gain was not significant (β = 0.177, p = 0.090, 95% CI [−1.278, 12.811]). The indirect effect of condition on knowledge gain via GCL was nonsignificant (β = −0.004, p = 0.798, 95% CI [−1.454, 0.970]). The model explained 4.6% of the variance. H3d was not supported.

For H3e, two mediation PROCESS analyses (see Figure 2) were conducted with ECL (post) as the mediator, knowledge gain as the dependent variable, and condition as the independent variable. Two dummy-coded comparisons were tested: full scaffolding vs. control and light scaffolding vs. control. In the full-scaffolding model, condition was significantly associated with ECL (β = 0.252, p = 0.012, 95% CI [0.070, 0.610]). ECL was significantly negatively associated with knowledge gain (β = −0.203, p = 0.048), although the bootstrapped confidence interval included zero (95% CI [−8.755, 0.001]). The direct effect of condition on knowledge gain remained significant (β = 0.220, p = 0.032, 95% CI [0.222, 12.743]). The indirect effect of condition on knowledge gain via ECL was nonsignificant (β = −0.051, p = 0.120, 95% CI [−3.701, 0.039]). The model explained 6.7% of the variance. In the light-scaffolding model, condition was not significantly associated with ECL (β = 0.009, p = 0.929, 95% CI [−0.234, 0.267]). ECL showed a negative but nonsignificant association with knowledge gain (β = −0.190, p = 0.065, 95% CI [−12.954, 2.122]). The direct effect of condition on knowledge gain was nonsignificant (β = 0.175, p = 0.090, 95% CI [−0.957, 12.653]). The indirect effect of condition on knowledge gain via ECL was nonsignificant (β = −0.002, p = 0.929, 95% CI [−2.138, 1.642]). The model explained 6.6% of the variance. H3e was not supported.

3.4. RQ4: Critical Thinking

Research Question 4 examined whether students’ critical thinking increased from pre- to post-test during the AI-supported learning activity, and whether changes differed across the conditions.

For H4a, a mixed-design ANOVA with critical thinking as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor showed a significant main effect of time, F(1, 139) = 26.23, p < 0.001, η²_p = 0.159, and no significant time × condition interaction, F(2, 139) = 2.86, p = 0.060, η²_p = 0.040. The between-subjects effect of condition was also not significant, F(2, 139) = 0.397, p = 0.673, η²_p = 0.006. H4a was supported.

For H4b, a linear regression was conducted with critical thinking (pre) as the predictor and knowledge gain as the dependent variable. The model was not significant, F(1, 130) = 2.19, p = 0.141. Initial critical thinking showed a positive but nonsignificant association with knowledge gain (β = 0.129, p = 0.141, 95% CI [−1.244, 8.37]). The model explained 1.7% of the variance. H4b was not supported.

For H4c, four mediation PROCESS analyses (see Figure 3) were conducted to examine whether the effect of condition on GCL (post) and ECL (post) was mediated by critical thinking (post). Two dummy-coded comparisons were tested: full scaffolding vs. control and light scaffolding vs. control. For GCL, in the full-scaffolding model, the effect of condition on critical thinking was nonsignificant (β = −0.092, p = 0.288, 95% CI [−0.267, 0.073]). Critical thinking was significantly associated with GCL (β = 0.480, p < 0.001, 95% CI [0.256, 0.633]). The indirect effect was not significant (β = −0.050, p = 0.297, 95% CI [−0.156, 0.038]). The direct effect of condition on GCL was not significant (β = −0.092, p = 0.288, 95% CI [−0.267, 0.073]). The model explained 24.8% of the variance in GCL and 1.1% of the variance in critical thinking. In the light-scaffolding model, the effect of condition on critical thinking was not significant (β = 0.050, p = 0.628, 95% CI [−0.148, 0.256]). Critical thinking was significantly associated with GCL (β = 0.602, p < 0.001, 95% CI [0.468, 0.807]). The indirect effect was not significant (β = 0.030, p = 0.629, 95% CI [−0.101, 0.163]), and the direct effect of condition on GCL was nonsignificant (β = −0.077, p = 0.354, 95% CI [−0.258, 0.089]). The model explained 36.4% of the variance in GCL and 0.3% of the variance in critical thinking.

For ECL, in the full-scaffolding model, the effect of condition on critical thinking was not significant (β = −0.105, p = 0.288, 95% CI [−0.332, 0.092]). Critical thinking was significantly negatively related to ECL (β = −0.373, p < 0.001, 95% CI [−0.741, −0.196]). The indirect effect was nonsignificant (β = 0.039, p = 0.303, 95% CI [−0.046, 0.164]), while the direct effect of condition on ECL was significant (β = 0.213, p = 0.017, 95% CI [0.047, 0.507]). The model explained 20.1% of the variance in ECL and 1.1% of the variance in critical thinking. In the light-scaffolding model, the effect of condition on critical thinking was not significant (β = 0.050, p = 0.628, 95% CI [−0.15, 0.25]). Critical thinking was significantly negatively related to ECL (β = −0.525, p < 0.001, 95% CI [−0.812, −0.413]). The indirect effect was not significant (β = −0.026, p = 0.630, 95% CI [−0.174, 0.087]), and no direct effect of condition on ECL emerged (β = 0.035, p = 0.694, 95% CI [−0.155, 0.249]). The model explained 27.6% of the variance in ECL and 0.3% of the variance in critical thinking. H4c was not supported.

For H4d, two moderation PROCESS analyses (see Figure 4) were conducted to examine whether the effect of HOT Gain on Critical Thinking Gain was moderated by the condition. The moderator was dummy-coded into two comparisons: full scaffolding vs. control and light scaffolding vs. control. In the full-scaffolding model, HOT Gain was not significantly associated with Critical Thinking Gain (β = −0.138, p = 0.328, 95% CI [−0.017, 0.005]). Condition was significantly associated with Critical Thinking Gain (β = −0.343, p < 0.001, 95% CI [−0.869, −0.197]). The interaction between HOT Gain and condition approached but did not reach statistical significance (β = 0.187, p = 0.057, 95% CI [0.0001, 0.032]). The association between HOT Gain and Critical Thinking Gain was not significant in the control condition (β = −0.138, p = 0.328, 95% CI [−0.017, 0.005]). In the full-scaffolding condition, the slope was positive but did not reach significance (β = 0.237, p = 0.084, 95% CI [−0.001, 0.021]). The model explained 11.1% of the variance in Critical Thinking Gain. In the light-scaffolding model, HOT Gain was not significantly associated with Critical Thinking Gain (β = −0.125, p = 0.436, 95% CI [−0.018, 0.006]). Condition was not significantly associated with Critical Thinking Gain (β = −0.017, p = 0.881, 95% CI [−0.475, 0.431]). The interaction between HOT Gain and condition was not significant (β = 0.033, p = 0.758, 95% CI [−0.022, 0.025]). The association between HOT Gain and Critical Thinking Gain was not significant in the control condition (β = −0.125, p = 0.436, 95% CI [−0.018, 0.006]). In the light-scaffolding condition, the slope was negative but did not reach significance (β = −0.059, p = 0.682, 95% CI [−0.025, 0.016]). The model explained 0.9% of the variance in Critical Thinking Gain. H4d was not supported.
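The structure of such a dummy-moderated model can be sketched as an ordinary least squares regression with a product term. This is a minimal illustration rather than the PROCESS implementation: the helper names and the noise-free toy data are hypothetical, and estimating each comparison on only the two groups involved is an assumption.

```python
import numpy as np

def two_group_dummy(condition, treated_label, control_label="control"):
    """Keep only the two groups being compared; return (mask, 0/1 dummy)."""
    condition = np.asarray(condition)
    mask = (condition == treated_label) | (condition == control_label)
    dummy = (condition[mask] == treated_label).astype(float)
    return mask, dummy

def fit_moderation(x, d, y):
    """OLS for y = b0 + b1*x + b2*d + b3*(x*d); returns [b0, b1, b2, b3]."""
    design = np.column_stack([np.ones(len(y)), x, d, x * d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Noise-free toy data (hypothetical, NOT the study data)
condition = np.array(["control"] * 4 + ["full"] * 4 + ["light"] * 4)
hot_gain = np.array([0.0, 1.0, 2.0, 3.0] * 3)

mask, d = two_group_dummy(condition, "full")   # full scaffolding vs. control
x = hot_gain[mask]
y = 1.0 + 0.5 * x - 2.0 * d + 1.5 * x * d      # known coefficients
b0, b1, b2, b3 = fit_moderation(x, d, y)
print(b1, b3)  # slope in the control group and the change in slope
```

In this parameterisation, b1 is the slope of the predictor in the control group (dummy = 0) and b3 is the difference in slopes between the scaffolding group and the control group, which is the quantity tested by the interaction term.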

3.5. RQ5: Reflective Use

Research Question 5 examined how students’ reflective use of AI developed from pre to post and how it was associated with knowledge gain, cognitive load, and critical thinking.

For H5a, a mixed-design ANOVA with reflective use as the dependent variable, time (pre, post) as the within-subject factor and condition (full scaffolding, light scaffolding, and control) as the between-subjects factor showed a significant main effect of time, F(1, 139) = 30.94, p < 0.001, η²_p = 0.182, and no significant time × condition interaction, F(2, 139) = 0.53, p = 0.593, η²_p = 0.008. The between-subjects effect of condition was also not significant, F(2, 139) = 1.36, p = 0.260, η²_p = 0.019. H5a was supported.
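The reported effect sizes can be related to the F statistics and their degrees of freedom. The following minimal sketch assumes the standard identity η²_p = (F · df_effect) / (F · df_effect + df_error); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values as reported for H5a
time_effect = partial_eta_squared(30.94, 1, 139)
interaction = partial_eta_squared(0.53, 2, 139)
condition_effect = partial_eta_squared(1.36, 2, 139)

print(round(time_effect, 3))       # 0.182
print(round(interaction, 3))       # 0.008
print(round(condition_effect, 3))  # 0.019
```

The recomputed values agree with the effect sizes reported above to three decimal places.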

For H5b, a linear regression was conducted with reflective use (pre) as the predictor and knowledge gain as the dependent variable. The model was not significant, F(1, 130) = 0.01, p = 0.758, R² = 0.001. The regression coefficient was positive but very small and not significant (β = 0.027, p = 0.758, 95% CI [−2.726, 3.735]). H5b was not supported.

For H5c, two linear regressions with dummy-coded conditions as predictors (full scaffolding vs. control and light scaffolding vs. control) were conducted with reflective use (post) as the dependent variable. The model including full scaffolding was not significant, F(1, 99) = 0.51, p = 0.476, R² = 0.005, and the regression coefficient was negative and small in magnitude (β = −0.072, p = 0.476, 95% CI [−0.366, 0.172]). The model including light scaffolding was also not significant, F(1, 91) = 2.43, p = 0.123, R² = 0.026, although the coefficient was positive (β = 0.161, p = 0.123, 95% CI [−0.055, 0.453]). H5c was not supported.
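In a regression with a single predictor, the standardised coefficient, R², and F are linked: R² = β² and F = R² · df_error / (1 − R²). The sketch below applies these identities to the H5c values as reported. Function names are illustrative, and small deviations are expected because the inputs are rounded to three decimals.

```python
def r_squared_from_beta(beta):
    """R squared of a one-predictor regression from its standardised beta."""
    return beta ** 2

def f_from_r_squared(r_squared, df_error, n_predictors=1):
    """F statistic implied by R squared and the degrees of freedom."""
    return (r_squared / n_predictors) / ((1 - r_squared) / df_error)

# Full scaffolding vs. control: beta = -0.072, F(1, 99)
r2_full = r_squared_from_beta(-0.072)
f_full = f_from_r_squared(r2_full, df_error=99)

# Light scaffolding vs. control: beta = 0.161, F(1, 91)
r2_light = r_squared_from_beta(0.161)
f_light = f_from_r_squared(r2_light, df_error=91)

print(round(r2_full, 3), round(r2_light, 3))  # 0.005 0.026
print(f"{f_full:.2f} {f_light:.2f}")          # close to the reported 0.51 and 2.43
```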

For H5d, four moderation PROCESS analyses (see Figure 5) were conducted with reflective use (post) as the predictor: two with GCL (post) and two with ECL (post) as the dependent variable. In the GCL analyses, condition as the categorical moderator was dummy-coded into two comparisons: full scaffolding vs. control and light scaffolding vs. control. In the full-scaffolding model, reflective use was significantly positively associated with GCL (β = 0.549, p < 0.001, 95% CI [0.184, 0.669]), while neither the main effect of condition (β = −0.255, p = 0.210, 95% CI [−0.293, 0.068]) nor the interaction term reached significance (β = −0.270, p = 0.137, 95% CI [−0.543, 0.107]). The model explained 19.1% of the variance in GCL. In the light-scaffolding model, reflective use was significantly positively associated with GCL (β = 0.480, p < 0.001, 95% CI [−0.185, 0.675]). Neither the main effect of condition (β = −0.263, p = 0.144, 95% CI [−0.321, 0.057]) nor the interaction term was significant (β = 0.089, p = 0.622, 95% CI [−0.307, 0.384]). The model explained 26.7% of the variance in GCL.

The remaining two moderation PROCESS analyses (Model 1; Hayes & Little, 2022) were conducted with reflective use (post) as the predictor and ECL (post) as the dependent variable. Condition as the categorical moderator was again dummy-coded into two comparisons: full scaffolding vs. control and light scaffolding vs. control. In the full-scaffolding model, reflective use was not significantly associated with ECL, although the bootstrapped confidence interval narrowly excluded zero (β = −0.223, p = 0.124, 95% CI [−0.449, −0.023]). The main effect of condition was significant (β = 0.472, p = 0.012, 95% CI [0.075, 0.556]), whereas the interaction between reflective use and condition was nonsignificant (β = −0.011, p = 0.952, 95% CI [−0.410, 0.376]). The model explained 11.6% of the variance in ECL. In the light-scaffolding model, reflective use was not significantly associated with ECL (β = −0.237, p = 0.058, 95% CI [−0.455, −0.020]). The main effect of condition was not significant (β = 0.162, p = 0.384, 95% CI [−0.122, 0.306]). The interaction between reflective use and light scaffolding was significant (β = −0.413, p = 0.028, 95% CI [−0.709, −0.065]). Conditional effects indicated that the negative association between reflective use and ECL was stronger in the light-scaffolding condition (β = −0.650, p < 0.001, 95% CI [−0.879, −0.389]) than in the control group (β = −0.237, p = 0.058, 95% CI [−0.455, −0.020]). The model explained 21.3% of the variance in ECL. H5d was partially supported.
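The conditional effects in the light-scaffolding ECL model follow from the coefficients of the moderation model: with a 0/1 moderator, the slope of the predictor in a given group equals the predictor coefficient plus the interaction coefficient multiplied by the dummy value. A minimal sketch using the coefficients as reported above (the function name is illustrative):

```python
def conditional_slope(b_predictor, b_interaction, dummy):
    """Slope of the predictor at a given value (0 or 1) of the dummy moderator."""
    return b_predictor + b_interaction * dummy

b_reflective_use = -0.237  # reflective use -> ECL in the control group
b_interaction = -0.413     # reflective use x light scaffolding

slope_control = conditional_slope(b_reflective_use, b_interaction, dummy=0)
slope_light = conditional_slope(b_reflective_use, b_interaction, dummy=1)

print(round(slope_control, 3))  # -0.237
print(round(slope_light, 3))    # -0.65
```

The sum reproduces the reported conditional effect in the light-scaffolding condition (β = −0.650), which shows how the significant interaction translates into a steeper negative slope in that group.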

For H5e, two moderation PROCESS analyses (see Figure 6) were conducted with HOT (post) as the predictor and reflective use (post) as the dependent variable. Condition was included as a categorical moderator using two dummy-coded comparisons (full scaffolding vs. control and light scaffolding vs. control). In the full-scaffolding model, HOT (post) was not significantly associated with reflective use (post) (β = 0.151, p = 0.312, 95% CI [−0.003, 0.015]), and the model explained 4.6% of the variance. The main effect of condition was not significant (β = −0.141, p = 0.480, 95% CI [−0.372, 0.166]), and the interaction between HOT and condition did not reach significance (β = 0.088, p = 0.661, 95% CI [−0.010, 0.017]). In the light-scaffolding model, HOT (post) was not significantly associated with reflective use (post) (β = 0.174, p = 0.261, 95% CI [−0.003, 0.014]), and the model explained 7.1% of the variance. The main effect of condition was not significant (β = 0.480, p = 0.320, 95% CI [−0.057, 0.442]). The interaction between HOT and condition did not reach significance (β = 0.073, p = 0.722, 95% CI [−0.009, 0.015]). H5e was not supported.

3.6. Summary of Main Findings

Across all five research questions, a consistent pattern emerged. Students showed significant improvements in knowledge, critical thinking, and reflective use across conditions, while motivation remained largely stable. Cognitive load dynamics indicated significant decreases in intrinsic and extraneous load, whereas germane load did not show significant changes over time. Contrary to expectations, variations in scaffolding intensity did not result in consistent or systematic differences between conditions across the examined outcomes. Instead, the findings suggest that key learning processes, particularly reflective use and its association with cognitive load, play a more central role in explaining variation in learning outcomes than externally imposed scaffolding intensity.

From a practical perspective, these results indicate that increasing scaffolding intensity alone may not be sufficient to enhance learning outcomes in AI-supported environments. Rather, fostering reflective and active engagement with AI-generated content appears to be critical for supporting effective learning processes.

4. Discussion

This study examined how generative AI shapes learning dynamics in higher education. The findings highlight that the educational impact of AI depends less on scaffolding intensity and more on how learners engage with and regulate AI-supported interactions. In particular, the absence of consistent differences between scaffolding conditions suggests that generative AI itself may already provide elements of support that resemble scaffolding processes, although this was not directly examined in the present study.

The findings partly align with theoretical expectations. Decreases in ICL and ECL reflect increasing task familiarity, while gains in critical thinking and reflective use indicate that generative AI as a tutor can activate evaluative and metacognitive processes even in short-term settings. The absence of differential scaffolding effects, together with stable motivation, suggests that the educational impact of AI may not be determined only by support intensity but may also relate to its pedagogical integration. Of the twenty hypotheses tested, four were supported and three were partially supported. The findings underscore that the educational impact of generative AI depends less on technological access and more on the interaction between instructional design, cognitive regulation, and learner engagement. A central finding of this study is the absence of consistent differences between scaffolding conditions. Rather than confirming traditional assumptions, this may suggest that generative AI functions as a form of implicit scaffolding. However, this mechanism was not directly examined in the present study. This interpretation should therefore be considered preliminary and requires further empirical investigation. By providing structured explanations and guidance, AI may partially assume the role traditionally attributed to external instructional support. This interpretation is consistent with scaffolding theory and may suggest that, in AI-supported learning environments, the distinction between instructional support and learning tool becomes less clearly defined. As a result, variations in externally imposed scaffolding may have limited additional effects in short-term learning contexts.

The pattern of results suggests that different learning outcomes may be affected through distinct mechanisms in AI-supported environments. Outcomes such as knowledge gain and reflective use improved across conditions, suggesting that interactions with generative AI may support information access, organisation, and evaluation that can be activated within a single session.

In contrast, outcomes such as motivation or GCL may depend on more sustained engagement and iterative learning processes, which are less likely to emerge in short-term instructional settings. This differentiation indicates that AI-supported learning may primarily influence immediate cognitive and metacognitive processes, while deeper motivational and higher-order cognitive changes require long-term interaction. Importantly, the findings do not point to a uniform effect of generative AI across learning outcomes. While knowledge gain and reflective use showed clear improvements, motivation and GCL remained unaffected. This variability suggests that AI-supported learning may not operate as a general enhancement mechanism but rather may influence specific processes in a differentiated way.

4.1. Knowledge Gain

In line with H1a, students across all three conditions improved their knowledge from pre to post, indicating that the AI integration as a tutor was effective overall. However, the time × condition interaction was not significant: although descriptively larger gains occurred in the scaffolding groups, these differences did not reach statistical significance within a single session. Given that scaffolding effects often unfold progressively (Vygotsky & Cole, 1978), longer exposure may lead to stronger effects. This pattern suggests that even a minimal AI tutor introduction may already be sufficient to produce short-term knowledge gains, while additional scaffolding may require longer exposure to yield measurable advantages. H1a was therefore partially supported.

Contrary to H1b, neither prior achievement nor prior AI experience moderated knowledge gains, indicating that initial differences did not translate into Matthew-effect patterns (Stanovich, 1986). This suggests that initial advantages did not amplify within the AI tutor setting. However, exploratory analyses using initial knowledge suggested a reverse Matthew effect, with lower-performing students benefiting more strongly. As this analysis was exploratory and based on a median split, this finding should be interpreted with caution. Consistent with the Zone of Proximal Development (Vygotsky & Cole, 1978), structured AI guidance may reduce extraneous cognitive load (Sweller et al., 2019) and disproportionately support learners with lower prior knowledge. Rather than widening achievement gaps, scaffolded AI use may therefore have an equalising effect. H1b was not supported.

LOT improved similarly across conditions without group differences. This aligns with Bloom’s revised taxonomy (Anderson & Krathwohl, 2001) and research suggesting that generative AI efficiently supports remembering and understanding through structured information access (Akgun & Toker, 2025). H1c was supported.

H1d was not statistically supported. Although HOT gains were descriptively higher in the scaffolding conditions, differences were nonsignificant. As higher-order processes require deeper integration and evaluation, scaffolding may require sustained exposure to produce measurable effects (Nathaniel et al., 2025). Because the HOT items focused on the content of a single lecture, their complexity may have been limited. Over a full semester with progressively more complex higher-order tasks, scaffolding-related differences might become more evident.

Overall, scaffolded AI tutor learning reliably enhanced knowledge acquisition within a single session, particularly at the LOT level. Differential HOT gains and potential Matthew-effect patterns may require longitudinal exposure and more complex tasks.

4.2. Motivation

Contrary to H2a, motivation remained stable from pre to post and did not differ in its development across instructional conditions. Although overall motivational levels differed between conditions, these differences were independent of time and therefore do not indicate differential motivational change due to scaffolding. Because intrinsic motivation depends on perceived autonomy and competence (Deci & Ryan, 2000), one interpretation is that the variation in scaffolding intensity did not meaningfully change these perceptions. Rather than enhancing or undermining engagement, the scaffolded AI intervention maintained students’ existing motivational state. While prior research has shown that AI-supported instruction can enhance motivation (Boguslawski et al., 2025), such effects did not emerge in the present short-term intervention. Similarly, no evidence was found that generative AI reduced intrinsic motivation by limiting perceived learner control, a concern raised in studies of highly automated systems (Fryer et al., 2017). Together, these findings indicate that motivational effects of scaffolded AI interventions do not emerge automatically and may require sustained engagement to unfold. H2a was not supported.

In contrast to H2b, initial motivation did not significantly predict knowledge gains beyond prior knowledge. Although prior work suggests that motivational differences can shape how learners engage with AI-supported tasks (Bai & Wang, 2025), this influence did not translate into measurable short-term knowledge gains. Motivational processes are often associated with sustained engagement over a longer period of time (Polyportis, 2024). The findings suggest that, within brief and highly structured AI-supported settings, knowledge acquisition may be driven more directly by instructional design and scaffolding quality than by motivational variation. H2b was not supported.

4.3. Cognitive Load

Cognitive load patterns revealed theoretically coherent but structurally stable dynamics. In line with H3a, ICL decreased significantly across all conditions, indicating improved task comprehension through increasing situational understanding (Leppink et al., 2014). This effect occurred independently of scaffolding intensity, suggesting that improvements were driven by increasing familiarity with the task rather than by the level of scaffolding.

ECL also declined significantly over time across all conditions, consistent with the assumption that learners gradually reduce unnecessary processing demands as they gain clarity about task requirements (Sweller et al., 2019). However, contrary to expectations, no significant time × condition interaction emerged, indicating that the reduction did not differ across scaffolding conditions. Although stable between-group differences were observed, scaffolding intensity did not differentially accelerate the reduction in extraneous load. This aligns with arguments that highly structured prompts may introduce additional procedural demands, which can temporarily offset reductions in task-irrelevant load (Jose et al., 2025). H3b was partially supported, as ECL declined overall but not more strongly in the full-scaffolding condition.

GCL did not increase significantly over time or across conditions. This is expected in short-term learning settings, as schema construction typically requires extended engagement before measurable gains emerge (Sweller et al., 2019). Moreover, when generative AI functions as a tutor, the AI may handle parts of the integrative processing, potentially affecting how much germane effort students invest. This may reflect cognitive offloading, where students rely on AI instead of using their own strategies (Risko & Gilbert, 2016). Consequently, H3c was not supported.

Mediation analyses further clarified the functional role of load dimensions. GCL at post-test was positively associated with knowledge gain in the full scaffolding vs. control comparison, consistent with theoretical assumptions that germane processing supports learning (Sweller et al., 2019). However, GCL did not mediate the relationship between scaffolding condition and knowledge gain, and no significant associations were observed in the light scaffolding vs. control comparison. Therefore, H3d was not supported. Similarly, ECL was negatively associated with knowledge gain in the full scaffolding vs. control comparison, consistent with established cognitive load mechanisms (Sweller et al., 2019), whereas in the light scaffolding vs. control comparison only a nonsignificant negative tendency was observed. In neither comparison did ECL significantly mediate the relationship between condition and knowledge gain. Therefore, H3e was not supported.

These findings suggest that while cognitive load mechanisms operate in theoretically expected directions, their association with learning outcomes was more evident in the full-scaffolding condition. However, even in the full-scaffolding condition, the absence of significant mediation effects suggests that short-term variations in scaffolding intensity may not be sufficient to reshape the cognitive load patterns. These patterns suggest that stronger scaffolding effects may unfold over longer or iterative interactions with generative AI, when learners have more opportunities to refine strategies for coordinating their own processes with AI-generated support (Holmes et al., 2019). From a Cognitive Load Theory perspective, these findings are consistent with the assumption that cognitive load may change during AI-supported learning activities. While scaffolding is typically expected to reduce ECL and foster GCL, generative AI may simultaneously simplify task demands and partially externalise cognitive processes. This dual role may help to interpret why increases in GCL did not emerge as expected. Learners may rely on AI-generated structuring instead of investing their own cognitive resources. In this sense, generative AI does not only support learning but may reshape how cognitive effort is allocated during the learning process.

4.4. Critical Thinking

In line with H4a, across conditions, students showed significant gains in critical thinking, indicating that engagement with generative AI as a tutor can stimulate reasoning even in a single session. One explanation for this pattern is that interaction with generative AI may require evaluative engagement. Learners are confronted with outputs that appear plausible but require validation, which may activate critical thinking processes independent of explicit instructional scaffolding. This suggests that critical thinking in AI-supported environments may be associated not only with instructional design but also with characteristics of AI-supported interactions. This aligns with prior work suggesting that generative AI can prompt learners to examine, refine, and justify their reasoning when engaging with content (Hou et al., 2025). The nonsignificant time × condition interaction suggests that these gains emerged broadly across all scaffolding settings. Although gains were descriptively smaller in the full-scaffolding condition, these differences were not statistically reliable. One possible interpretation is that highly structured scaffolding does not necessarily engage autonomous reasoning within short-term learning contexts, whereas lighter guidance may provide orientation without substantially changing self-directed evaluation (Halpern, 2013). Future research should examine how the duration and design of scaffolding shape these effects over time. H4a was supported.

Contrary to H4b, students’ initial critical thinking did not significantly predict knowledge gains. While the direction of the effect was positive, the relationship was weak and nonsignificant. This suggests that the ability to think critically may not translate into measurable improvements in knowledge within a single session. These findings challenge the assumption that critical thinking universally drives learning (Facione, 1990) and instead support emerging views that its impact depends on task duration, domain, and design (Entwistle, 1991). H4b was not supported.

Although the associations between critical thinking and cognitive load were consistent with Cognitive Load Theory (Sweller et al., 2011), they did not mediate the effects of instructional condition on either GCL or ECL. This indicates that while students with higher critical thinking invested more germane effort and experienced less extraneous load, scaffolding did not operate through critical thinking as an explanatory mechanism. These findings suggest that critical thinking functions as a cognitive resource influencing how learners process tasks, but not as a mediator through which instructional scaffolding shapes cognitive load. The absence of mediational effects may reflect the short intervention, as such processes often require repeated, domain-specific engagement (Azevedo, 2020). H4c was not supported.

Contrary to H4d, gains in HOT were not significantly associated with gains in critical thinking, and this relationship was not significantly moderated by instructional condition. Although a positive tendency emerged in the full-scaffolding comparison, the interaction did not reach statistical significance. Improvements in condition-specific higher-order processing did not reliably translate into broader gains in critical thinking, suggesting that these represent related but conceptually separate processes. While HOT reflects domain-specific analytical engagement, critical thinking involves more evaluative skills (Facione, 1990). The absence of a significant moderating effect indicates that scaffolding does not automatically promote the transfer from task-specific analysis to broader reasoning within a single session. Sustained and iterative metacognitive activation may be required for such transfer effects to emerge (Halpern, 2013). H4d was not supported.

Overall, the results indicate that interaction with AI as a tutor can enhance critical thinking, but variations in scaffolding intensity do not fundamentally change its development within short-term learning contexts.

4.5. Reflective Use

In line with H5a, reflective use increased significantly from pre-test to post-test across all conditions, indicating that engagement with generative AI as a tutor can activate metacognitive regulation even within a single session. This finding aligns with Flavell’s (1979) conceptualisation of monitoring and regulation as processes triggered through active task engagement. The absence of a time × condition interaction suggests that reflective regulation may emerge not only through structured scaffolding but also through direct interaction with AI systems. Given the epistemic lack of transparency and variability in AI outputs, students may be prompted to evaluate plausibility and integrate responses more systematically, supporting its role as a core dimension of AI literacy (Pinski & Benlian, 2024; Chiu et al., 2024). H5a was supported.

Contrary to H5b and H5c, initial reflective use did not predict knowledge gains, and scaffolding intensity did not significantly enhance reflective use compared to the control condition. These findings indicate that reflective regulation does not automatically translate into short-term performance improvements and does not rely only on external scaffolding guidance. Rather, reflective use appears to function as a self-initiated regulatory process that can be activated through engagement with AI itself, consistent with metacognitive theory emphasising internal monitoring and control (Flavell, 1979). Reflective engagement may emerge more from the demands of interacting with generative AI than from scaffolding intensity. H5b and H5c were not supported.

In line with H5d, reflective use was positively associated with GCL and negatively associated with ECL in the light-scaffolding condition, and this association was significantly stronger than in the control condition. Consistent with Cognitive Load Theory (Sweller et al., 2011), this pattern indicates that metacognitive regulation helps focus cognitive effort on relevant processing while reducing unnecessary load. Interestingly, the stronger negative association with ECL in the light-scaffolding condition suggests that reflective use may function as an internal support mechanism when external structure is limited. This indicates that reflective AI use does not simply accompany learning but is linked to how cognitive load is managed depending on the instructional context. H5d was partially supported.

Contrary to H5e, higher-order thinking was not significantly associated with reflective use. Although both constructs involve analytical engagement, they reflect conceptually separate processes. While higher-order thinking captures domain-specific reasoning and evaluative processing, reflective use concerns the metacognitive monitoring and regulation of AI-generated outputs (Flavell, 1979). The absence of a significant relationship supports theoretical distinctions between metacognitive monitoring and cognitive reasoning (Kuhn, 2000), suggesting that engaging in complex task-related reasoning does not automatically involve conscious regulation of AI responses. Reflective use and higher-order thinking appear to represent complementary but independent dimensions of higher-order learning within AI-tutored learning contexts. H5e was not supported.

The findings indicate that reflective use is associated with differences in cognitive processing rather than directly enhancing short-term knowledge outcomes. By shaping cognitive load while remaining distinct from HOT, reflective use represents one dimension of effective AI-tutored learning, supporting the view of AI literacy as a multidimensional construct (Ng et al., 2021). From a metacognitive perspective, interacting with generative AI appears to increase the need for monitoring and regulation, as learners must actively evaluate and integrate AI-generated outputs. Reflective use may therefore represent a key regulatory mechanism for maintaining cognitive control in AI-supported learning environments.

4.6. Limitations and Future Research

Several limitations should be considered when interpreting the findings, as they may affect the generalisability of the results. First, the study employed a quasi-experimental design based on pre-existing classroom groups. As participants were assigned to conditions according to existing course structures, pre-existing differences between groups cannot be fully ruled out. This limits the ability to determine whether the different scaffolding conditions caused the observed effects and requires that group differences be interpreted with caution. At the same time, this design reflects authentic classroom settings and therefore enhances ecological validity. Future research should employ fully randomised designs to more systematically isolate causal effects.

Second, the intervention was conducted within a single four-hour session. While significant gains in knowledge, critical thinking, and reflective use were observed, the short intervention restricts conclusions about sustained cognitive and metacognitive development. This also limits the generalisability of the findings, as longer-term learning processes and sustained effects of AI-supported scaffolding cannot be captured within a single-session design. Scaffolding intensity effects may unfold gradually and require repeated interaction to produce stable differences between scaffolding conditions. The absence of consistent interaction effects between scaffolding conditions should therefore not be interpreted as evidence of ineffectiveness, but rather as an indication that the short-term design may have limited the ability to detect differential effects between scaffolding levels, which may only emerge over longer periods of engagement.

Third, while conducting the intervention within a single session allowed for controlled comparison in an authentic university context without significant dropouts, it may have limited the emergence of differential effects, particularly for HOT and motivational processes that typically develop over extended periods of engagement. Future research should examine the scaffolded implementation of AI as a tutor across multiple sessions or semesters to investigate cumulative and longitudinal effects.

Fourth, a further limitation concerns the assessment of adherence to the scaffolding conditions. Although the intervention was conducted in a supervised classroom setting, allowing for general monitoring of students’ engagement with the assigned condition, no process-level data (e.g., prompts or interaction traces) were collected to directly assess individual compliance. Future research should incorporate process-based measures to more precisely capture how learners engage with scaffolding in AI-supported learning environments.

Fifth, another limitation concerns the use of identical pre–post knowledge tests and the analytical treatment of the data structure. While identical test items were used to ensure comparability of knowledge scores across measurement points, this approach may introduce testing effects (e.g., familiarity with items), potentially leading to an overestimation of learning gains. Furthermore, as the study was conducted within existing classroom structures, the data were organised within classroom groups. As the analyses were conducted at the individual level, potential intra-class dependencies could not be fully accounted for. This may affect the precision of the estimated effects. Future research should therefore consider multilevel modelling approaches to more explicitly account for group-level influences in AI-supported learning environments.
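The consequence of ignoring classroom clustering can be illustrated with the Kish design effect, DEFF = 1 + (m − 1) · ICC, where m is the average cluster size and ICC is the intra-class correlation. The sketch below is a back-of-the-envelope illustration only: the sample size, cluster size, and ICC are hypothetical values, since no intra-class correlation is reported for this study, and the formula assumes equally sized clusters.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equally sized clusters."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n / design_effect(cluster_size, icc)

# Hypothetical values, NOT taken from the study
n_students = 150
students_per_class = 25
assumed_icc = 0.05

deff = design_effect(students_per_class, assumed_icc)
n_eff = effective_sample_size(n_students, students_per_class, assumed_icc)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.1f}")
```

Even a modest intra-class correlation can reduce the effective sample size noticeably, which is why multilevel models are a natural next step for classroom-based designs.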

Sixth, the mediation and moderation analyses relied primarily on post-intervention measures. Although the assumed relationships were theoretically justified, the design does not allow clear conclusions about causal mechanisms. The observed indirect and interaction effects should therefore be interpreted as associations rather than as evidence of causal mechanisms. Future longitudinal studies with additional measurement points during the intervention could examine more precisely how scaffolding intensity, cognitive load, and metacognitive regulation unfold over time.

Seventh, for motivation, cognitive load, critical thinking, and reflective use, pre-measures captured students’ anticipated rather than experienced task-related states. Accordingly, pre–post differences should be interpreted as shifts in initial expectations rather than as pure developmental changes. Future research could complement this approach by including baseline assessments in comparable learning contexts or additional measurement points within a longitudinal research design.

Eighth, some constructs, including motivation, critical thinking, and reflective use, relied on self-report measures. While these measures were designed to capture task-specific processes of AI-supported learning, the use of self-developed instruments may limit measurement precision. In particular, the motivation scale showed low internal consistency (α_pre = 0.60), which may reflect the limited number of items and the multidimensional nature of the self-developed instrument. These findings should therefore be interpreted with caution. Future studies could employ more extensive validated scales to further strengthen measurement precision. In addition, the study was conducted within a specific institutional context involving first-semester students from two faculties at a practice-oriented university. While this enhances ecological validity, the findings may not fully extend to other academic disciplines, educational levels, or institutional settings. Replication across diverse contexts, including research-intensive universities and advanced cohorts, would strengthen external validity and clarify the robustness of the observed effects.
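The internal-consistency concern raised for the motivation scale can be made concrete with a short sketch of how Cronbach's alpha is computed and how scale length relates to it. The item responses below are hypothetical and are not study data; the only value taken from the study is α_pre = 0.60. The Spearman-Brown projection additionally assumes that any added items would be parallel to the existing ones, which may not hold for a multidimensional instrument.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][p] is the score of participant p on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per participant across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


def spearman_brown(alpha: float, length_factor: float) -> float:
    """Projected reliability if the scale were length_factor times as long."""
    return (length_factor * alpha) / (1.0 + (length_factor - 1.0) * alpha)


# Hypothetical responses of five participants to three items (not study data).
demo_items = [
    [4, 3, 5, 2, 4],
    [3, 3, 4, 2, 5],
    [5, 2, 4, 3, 3],
]
print(f"alpha (demo data) = {cronbach_alpha(demo_items):.2f}")

# Reported alpha_pre = 0.60; doubling the scale length is an assumption.
print(f"projected alpha = {spearman_brown(0.60, 2.0):.2f}")
```

Under the parallel-items assumption, doubling the number of items would raise a reliability of 0.60 to about 0.75, which is consistent with the interpretation that the low alpha partly reflects the brevity of the scale.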

Ninth, another limitation concerns the statistical power of the study. While the initial sample comprised N = 175 participants, the effective sample size varied across analyses due to the use of complete case approaches. Although these sample sizes were sufficient to detect small-to-moderate main effects, more complex effects, such as interactions and mediation effects, typically require larger sample sizes to be detected reliably. Previous research has shown that mediation effects, particularly small indirect effects, often require large sample sizes to achieve adequate statistical power (Fritz & MacKinnon, 2007). Therefore, the absence of significant interaction and mediation effects should be interpreted with caution, as some effects may have remained undetected due to limited statistical power. Future research should include larger samples to more robustly examine these effects.

Tenth, analyses were conducted using a complete-case approach, resulting in varying sample sizes across models. While this approach ensured that only valid responses were included, it may introduce bias if missing data were not completely random. The reduction in sample size may also have affected statistical power and the stability of certain estimates. Future research should consider using more advanced methods for handling missing data to improve robustness.
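How a complete-case approach leads to different sample sizes across models can be illustrated with a minimal sketch. The records, variable names, and model definitions below are hypothetical and do not reproduce the study data or the models actually estimated.

```python
from typing import Optional

Record = dict[str, Optional[float]]


def complete_cases(records: list[Record], variables: list[str]) -> list[Record]:
    """Keep only records with a non-missing value on every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical mini data set; None marks a missing response.
records: list[Record] = [
    {"knowledge_gain": 2.0, "motivation": 3.5, "reflective_use": 4.0},
    {"knowledge_gain": 1.0, "motivation": None, "reflective_use": 3.0},
    {"knowledge_gain": None, "motivation": 4.0, "reflective_use": 2.5},
    {"knowledge_gain": 3.0, "motivation": 2.5, "reflective_use": None},
    {"knowledge_gain": 0.0, "motivation": 3.0, "reflective_use": 3.5},
]

# Each hypothetical model draws on a different set of variables.
models = {
    "knowledge only": ["knowledge_gain"],
    "knowledge + motivation": ["knowledge_gain", "motivation"],
    "all three": ["knowledge_gain", "motivation", "reflective_use"],
}
for name, variables in models.items():
    n = len(complete_cases(records, variables))
    print(f"{name}: n = {n} of {len(records)}")
```

The more variables a model includes, the more records are dropped, so models with several predictors are estimated on fewer cases than simple ones. Approaches such as multiple imputation avoid this shrinkage but rest on their own assumptions about the missingness mechanism.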

5. Conclusions

This study provides empirical insight into how generative AI interacts with core cognitive and metacognitive processes of learning in higher education. Within a single authentic classroom session, students demonstrated gains in knowledge, critical thinking, and reflective use, while motivational levels remained stable and cognitive load evolved in theoretically coherent ways.

Importantly, the three levels of scaffolding (full scaffolding, light scaffolding, and the control condition) did not consistently produce differential effects on learning outcomes. This finding suggests that in short-term AI-supported learning settings, generative AI may itself provide elements functionally similar to scaffolding, such as structure, guidance, and explanations during interaction, which may reduce the added value of externally imposed scaffolding intensity. Instead, the effectiveness of AI-supported learning may be more closely related to how learners engage with and regulate AI-generated support than to the intensity of instructional scaffolding alone. This study contributes to existing research on AI-supported learning by providing initial empirical evidence that generative AI may support processes typically associated with scaffolding.

The findings further indicate that the educational impact of implementing generative AI as a tutor may not be explained by technological access or by variations in scaffolding intensity alone. It may also depend on how learners regulate their interaction with AI. Reflective use was associated with cognitive load, particularly under light-scaffolding conditions, suggesting that metacognitive engagement may serve as an internal regulatory buffer when external structure is limited.

These results contribute to the growing literature on AI in education by moving beyond binary assumptions of enhancement or decline. Generative AI does not inherently improve or undermine learning. Its educational value appears to be related to how it is embedded within theory-informed instructional design and how learners regulate their interaction with AI-generated outputs. By analytically distinguishing motivational, cognitive, and metacognitive dimensions within a real university context, the study advances a multidimensional perspective on AI-tutored learning. For higher education, the key challenge is therefore not whether AI should be integrated, but how instructional environments can foster sustained knowledge acquisition, higher-order reasoning, and responsible metacognitive regulation in AI-supported learning contexts.

For instructional design, these findings suggest that increasing the intensity of external scaffolding alone may not be sufficient to enhance learning outcomes in short-term AI-supported settings. Instead, greater emphasis should be placed on fostering students’ ability to actively regulate their interaction with generative AI.

This can be achieved by designing learning tasks that explicitly require students to engage in evaluation, revision, and reflection on AI-generated outputs. For example, in a university course, students can be asked to use generative AI to generate an explanation or solution related to the course content and then critically assess this output in relation to the lecture materials. Rather than submitting the AI-generated response itself, students are required to revise the output, justify their changes, and reflect on inaccuracies, limitations, or misconceptions in the AI response.

In addition, instructional designs may incorporate structured comparison tasks, in which students contrast AI-generated responses with their own reasoning or peer-generated solutions, as well as iterative refinement tasks that require multiple rounds of prompting and revision.

Such approaches shift the role of generative AI from a tool for answer generation to a facilitator of active processing and metacognitive regulation. By embedding these practices into authentic learning activities, educators may support deeper engagement and more effective use of generative AI as a learning resource.

Author Contributions

Conceptualization, C.M. and M.B.; methodology, C.M. and M.B.; software, C.M.; validation, C.M. and M.B.; formal analysis, C.M.; investigation, C.M.; resources, C.M.; data curation, C.M.; writing—original draft preparation, C.M.; writing—review and editing, C.M.; visualisation, C.M.; supervision, C.M. and M.B.; project administration, C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the study involving voluntary participation of adult students within a regular educational setting. No identifying or sensitive data were collected, and participation was entirely anonymous. Prior to data collection, students were orally informed about the purpose of the study, the anonymous handling of their data, and their right to withdraw at any time without disadvantage. As the research constituted minimal-risk, non-interventional educational research and did not involve sensitive or identifiable data, written informed consent was not obtained. Informed consent was implied through voluntary participation. The study was conducted in accordance with the Declaration of Helsinki. As the research involved anonymous data collection from adult participants within a regular educational setting and posed no foreseeable risk, formal ethical approval was not required.

Informed Consent Statement

Informed consent was obtained from all participants through voluntary participation. Prior to data collection, students were orally informed about the purpose of the study, the anonymous handling of their data, and their right to withdraw at any time without disadvantage.

Data Availability Statement

The dataset is available from the corresponding author upon request. The data will be made publicly available in an open repository upon acceptance of the manuscript.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT and GPT 5.2 (Plus Version) for the purposes of language refinement and grammar checking. The authors have written, reviewed and edited the manuscript and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
LOT	Lower-order thinking
HOT	Higher-order thinking
GCL	Germane cognitive load
ECL	Extraneous cognitive load
ICL	Intrinsic cognitive load
RQ	Research question
H	Hypothesis

Appendix A. Instructional Materials for the Scaffolding Conditions

This appendix provides the complete lecture content, learning task instructions, and scaffolding templates used in the study. The English translations are presented first, followed by the original German versions.

Appendix A.1. Lecture Content: AI Competencies and the AI-as-Tutor Approach (All Conditions)

As part of the course, all students received a standardised introduction to core AI competencies before working with the AI tool. This introduction was identical across all experimental conditions and covered the following aspects:

Technical foundations of generative AI, including the functioning of large language models and their typical use cases in higher education.
Ethical foundations, particularly the opportunities and risks associated with AI use and the importance of critically evaluating AI-generated content.
Legal aspects and data protection, with focus on institution-specific regulations, data-protection-compliant use, and permitted AI application scenarios.

In addition, students were informed about which AI tool was available at the university, how this tool could be used technically, and which forms of AI use were permitted in their studies. This was followed by an introduction to effective prompting. Students were shown how to structure AI prompts in a purposeful manner, focusing in particular on the following elements:

The role or purpose assigned to the AI use (e.g., tutor or explanation partner);
The task or specific question;
The desired output format (e.g., structured explanation, examples, and bullet points).

In the final step, the AI-as-tutor scenario was introduced. It was explicitly emphasised that the AI should be used as a learning partner to support thinking and learning processes, rather than as a substitute for students’ own cognitive effort. To illustrate this approach, three instructional slides were presented.

Appendix A.2. English Translation of Materials

Appendix A.2.1. Course Lecture Slides: Explanation of the AI-as-Tutor Scenario (All Conditions)

Slide 1: AI as Tutor, Not as a Replacement!
What does this mean in practice?
AI explains complex content in clear and accessible language.
AI supports understanding of relationships and connections.
AI provides feedback, asks questions, or offers examples.
AI does not replace students’ own cognitive effort.
Why is this form of use permitted?
It supports goal-oriented learning.
It promotes critical thinking rather than dependence.
It is transparent, traceable, and ethically acceptable.
AI may support your thinking…but it must not think for you!
Slide 2: Effective Use Requires Structure!

AI as a replacement for students’ own work (not permitted)	AI as a learning partner (tutor) (permitted)
Students have tasks or texts generated entirely by AI	Students use AI to understand, reflect on, and practice content
Focus: Producing an outcome	Focus: Learning process and understanding
Passive reliance on AI output	Active questioning, checking, and revising
No meaningful learning	Development of AI competence and critical thinking

Slide 3: AI Use is Permitted When It Supports Learning—Not When It Replaces Students’ Own Work!
Replacement of own work (not permitted example):
“Explain the differences between machine learning and deep learning and write a text for my assignment.”
Learning partner (permitted tutor scenario)
“Explain the differences between machine learning and deep learning in your own words. Give an example and help me check whether my explanation is correct.”

Appendix A.2.2. Learning Task

Scaffolding Conditions (Full and Light Scaffolding)

Please use the Ask Alma chatbot as a tutor to independently deepen your understanding of today’s course content. Upload the learning materials and use the AI with the support of the provided scaffolding template (see Appendix A.2.3).
Formulate three possible exam-style questions based on the course materials and provide a correct model answer for each question. The questions should demonstrate that you have understood the content and are able to reflect on it critically, rather than only reproducing information.
Submission:
Save your results in a Word file (.docx) and upload the document via Moodle.

Control Condition

Please use the Ask Alma chatbot as a tutor to independently deepen your understanding of today’s unit, “Introduction to AI Competencies”. Upload the learning materials (slides) to Ask Alma and use the AI to review, explain, and further explore the core content.
Formulate three possible exam-style questions based on the learning materials and provide a correct model answer for each question. Your questions should demonstrate that you have understood and reflected on the content, rather than only reproducing it.
Note:
Use the AI as a learning aid, not for automated text generation.
You decide which questions to ask the AI in order to understand or deepen the topics.
Use the AI to find examples, explain concepts, or clarify relationships.
Submission:
Save your results in a Word file (.docx) and upload the document via Moodle.

Appendix A.2.3. Scaffolding Templates

Full-Scaffolding Template

This template is designed to help you use AI purposefully as a learning aid in order to better understand the course content of “Introduction to AI Competencies”. You may use the AI to review, clarify, and reflect on the technical, ethical and legal aspects of the lecture. This template will help you prepare effectively for the task.
GCC Strategy (Goal–Context–Constraints)
Formulate the learning goal (Goal)
What would you like to understand better with the help of the AI?
Refer to a topic from the lecture or the slides (e.g., machine learning, weak vs. strong AI, the EU AI Act, copyright, and fairness).
Example Prompts:
“I would like to understand the difference between weak and strong AI.”
“I would like to understand the difference between supervised and unsupervised learning.”
“I would like to be able to explain what the EU AI Act regulates.”
“I would like to understand why algorithmic fairness is important.”
Specify the context (Context)
For whom should the explanation be understandable? (e.g., first-semester student, a person without prior AI knowledge, or in an academic or professional context)
Define clear constraints (Constraints)
What requirements should the AI’s response meet? (e.g., a maximum of 100 words, simple language, including an example from everyday life or university studies)
Refine Loop—Improving the answer step by step
Do not use the first response directly.
Ask at least two follow-up questions to improve, deepen, simplify, or critically reflect on the explanation.
Example Prompts:
“Rephrase the explanation in simpler terms and include an everyday example.”
“What are the advantages and risks?”
“How can this concept be observed in a real-world application?”
“How could this be applied in a university context?”
“What are common misconceptions about this topic?”
Goal: To truly understand the topic—not only to reproduce the information.
Source-Check and Reflection
Ask explicitly for sources or supporting evidence.
Check whether these sources are plausible and aligned with the course materials.
Example Prompts:
“Name a source or example that supports this statement.”
“Which legal basis or ethical guideline applies here?”
“How does this answer align with what we discussed in the lecture?”
Output Checklist
Critically review the AI output in light of your own understanding and insights, and summarise the key points you have genuinely understood.
Example Guiding Questions:
How would I explain this topic to someone else in my own words?
What is the key insight from this dialogue?
Which statement in the AI output was initially unclear or misleading to me, and how do I understand it now?
Is there any statement in the AI output that I would question or revise based on my own understanding?
What typical exam question could relate to this topic and how would I answer it?

Light-Scaffolding Condition

The light-scaffolding condition followed the same template as the full-scaffolding condition but included only the Goal–Context–Constraints (GCC) framework. In contrast to the full-scaffolding condition, the light-scaffolding condition did not include any instructions for iterative refinement, source verification, or reflective evaluation of AI-generated outputs.
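For readers who wish to operationalise the GCC structure, the three elements can be represented as a simple data structure that is rendered into a single prompt. This is a minimal sketch and was not part of the study materials: the class name `GCCPrompt`, the `Goal:`, `Context:`, and `Constraints:` labels, and the joined output format are assumptions made for illustration, as the template itself did not prescribe a fixed output format. The example values are adapted from the template wording above.

```python
from dataclasses import dataclass, field


@dataclass
class GCCPrompt:
    """Hypothetical container for the Goal-Context-Constraints elements."""
    goal: str
    context: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the three GCC elements into one prompt string."""
        for name, value in (("goal", self.goal), ("context", self.context)):
            if not value.strip():
                raise ValueError(f"{name} must not be empty")
        parts = [f"Goal: {self.goal.strip()}", f"Context: {self.context.strip()}"]
        # Constraints are optional; they are listed on one line when present.
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints))
        return "\n".join(parts)


prompt = GCCPrompt(
    goal="I would like to understand the difference between weak and strong AI.",
    context="The explanation should be understandable for a first-semester student.",
    constraints=["a maximum of 100 words", "simple language", "one everyday example"],
)
print(prompt.render())
```

Keeping the three elements separate makes it straightforward to vary one of them, for example the constraints, while holding the learning goal constant.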

Control Condition

Students in the control condition did not receive any scaffolding template beyond the written guidance included in the instructions of the learning task (see “Control Condition” in Appendix A.2.2). This guidance reflected the content introduced during the lecture and did not involve scaffolding.

Appendix A.3. Original Materials

Appendix A.3.1. Original Course Lecture Slides: Explanation of the AI-as-Tutor Scenario

Folie 1: KI als Tutor nicht als Ersatz!
Was bedeutet es konkret?
KI erklärt komplexe Inhalte in einfachen Worten
KI hilft beim Verstehen von Zusammenhängen
KI gibt Feedback, stellt Fragen oder bietet Beispiele
KI ersetzt nicht die eigene Denkleistung
Warum diese Form der Nutzung erlaubt ist?
Unterstützt das zielgerichtete Lernen
Fördert kritisches Denken statt Abhängigkeit
Ist transparent, nachvollziehbar und ethisch unbedenklich
KI darf dich beim Denken unterstützen…aber sie darf nicht für dich denken!
Folie 2: Damit das gelingt braucht es Struktur!

KI als Ersatz der eigenen Leistung (nicht erlaubt)	KI als Lernpartner (Tutor)
Studierende lassen sich Aufgaben oder Texte komplett erzeugen	Studierende nutzen KI, um Inhalte zu verstehen, zu reflektieren und zu üben
Fokus: schnelles Ergebnis	Fokus: Lernprozess & Verständnis
Passives Konsumieren	Aktives Nachfragen, Überprüfen, Verbessern
Kein Lerneffekt	Aufbau von KI-Kompetenz & kritischem Denken

Folie 3: Effizient und erlaubt ist der Einsatz von KI dann, wenn sie das Lernen unterstützt, nicht wenn sie die Arbeit übernimmt!
Abkürzung (nicht erlaubtes Beispiel):
“Erkläre mir die Unterschiede zwischen Machine Learning und Deep Learning—und schreibe Gleich einen Text für mein Assignment.”
Lernpartner (erlaubtes Tutor-Szenario)
“Erkläre mir die Unterschiede zwischen Machine Learning und Deep Learning in eigenen Worte. Nenne ein Beispiel, und hilf mir zu überprüfen, ob meine Erklärung korrekt ist.”

Appendix A.3.2. Learning Task

Scaffolding Conditions (Full and Light Scaffolding)

Verwenden Sie den Ask Alma Chatbot als Tutor, um die heutigen Inhalte selbst zu vertiefen.
Laden Sie die Lernmaterialien hoch und nutzen Sie die KI mit Hilfe der Scaffolding-Vorlage (siehe Anhang A.3.3.).
Formulieren Sie drei mögliche Prüfungsfragen zum Thema der Lehrmaterialien und geben Sie eine richtige Musterantwort. Die Frage soll zeigen, dass Sie die Inhalte verstanden haben und kritisch reflektiert können.
Abgabe:
Speichern Sie Ihre Ergebnisse in einer Word-Datei (.docx) und laden Sie diese als Abgabe hier in Moodle hoch.

Control Condition

Verwenden Sie den Ask Alma Chatbot als Tutor, um die Inhalte der heutigen Einheit “Einführung in die KI-Kompetenzen” selbständig zu vertiefen. Laden Sie die Lernmaterialien (Folien) in Ask Alma hoch und nutzen Sie die KI, um zentrale Inhalte zu wiederholen, zu erklären und zu vertiefen.
Formulieren Sie anschließend drei mögliche Prüfungsfragen zu den Lernmaterialien und geben Sie jeweils eine richtige Musterantwort. Ihre Fragen sollen zeigen, dass Sie die Inhalte verstanden und reflektiert haben, nicht nur wiedergeben.
Hinweis:
Verwenden Sie die KI als Lernhilfe, nicht zur automatischen Texterstellung.
Sie entscheiden selbst, welche Fragen Sie der KI stellen, um die Themen zu verstehen oder zu vertiefen.
Nutzen Sie die KI, um Beispiele zu finden, Begriffe zu erklären oder Zusammenhänge zu klären.
Abgabe:
Speichern Sie Ihre Ergebnisse in einer Word-Datei (.docx) und laden Sie diese als Abgabe hier in Moodle hoch.

Appendix A.3.3. Scaffolding Templates

Full-Scaffolding Condition

Diese Vorlage unterstützt Sie dabei, die KI gezielt als Lernhilfe zu nutzen, um die Inhalte der Vorlesung „Einführung in die KI-Kompetenzen” besser zu verstehen.
Sie können die KI verwenden, um technische, ethische und rechtliche Aspekte zu wiederholen, zu klären und zu reflektieren. So bereiten Sie sich optimal auf die Aufgabe vor.
GCC-Strategie (Goal—Context—Constraints)
Formulieren Sie Ihr Lernziel (Goal)
Was möchten Sie mithilfe der KI besser verstehen?
Beziehen Sie sich auf ein Thema aus der Vorlesung oder den Folien (z. B. maschinelles Lernen, schwache/starke KI, EU-KI-Verordnung, Urheberrecht, Fairness).
Prompt-Beispiele:
„Ich möchte den Unterschied zwischen schwacher und starker KI verstehen.”
„Ich möchte den Unterschied zwischen überwachten und unüberwachten Lernverfahren nachvollziehen.”
„Ich möchte besser erklären können, was die EU-KI-Verordnung regelt.”
„Ich möchte nachvollziehen, warum algorithmische Fairness wichtig ist.”
Geben Sie den Kontext an (Context)
Für wen soll die Erklärung verständlich sein?
(z. B. für Studierende im 1. Semester, für eine Person ohne KI-Vorkenntnisse, im studentischen und beruflichen Alltag)
Setzen Sie klare Rahmenbedingungen (Constraints)
Welche Anforderungen soll die KI-Antwort erfüllen?
(z. B. maximal 100 Wörter, einfache Sprache, mit Beispiel aus dem Alltag oder Studium)
Refine-Loop—Antwort in Schritten verbessern
Nutzen Sie die erste Antwort nicht direkt.
Stellen Sie mindestens zwei Folgefragen, um die Erklärung zu verbessern, zu vertiefen, zu vereinfachen oder kritisch zu vertiefen.
Prompt-Beispiele:
„Formuliere die Erklärung einfacher und mit einem Beispiel aus dem Alltag.”
„Welche Vorteile und Risiken gibt es?”
„Wie kann man das Konzept in einem realen Anwendungsfall sehen?”
„Wie könnte man das in der Hochschule anwenden?”
„Welche typischen Missverständnisse gibt es bei diesem Thema?”
Ziel: Das Thema wirklich verstehen—nicht nur wiedergeben.
Source-Check and Reflection
Fragen Sie gezielt nach Quellen oder Belegen.
Überprüfen Sie, ob diese plausibel und lernmaterialiennah sind.
Prompt-Beispiele:
„Nenne eine Quelle oder ein Beispiel, das diese Aussage stützt.”
„Welche rechtliche Grundlage oder ethische Leitlinie gilt hier?”
„Wie stimmt diese Antwort mit dem überein, was wir in der Vorlesung besprochen haben?”
Output-Checkliste
Überprüfen Sie die KI-Ausgabe kritisch auf Grundlage Ihres eigenen Verständnisses und Ihrer eigenen Erkenntnisse und fassen Sie die zentralen Punkte zusammen, die Sie tatsächlich verstanden haben.
Leitfragen:
Wie würde ich das Thema jemandem in eigenen Worten erklären?
Was ist die wichtigste Erkenntnis aus diesem Dialog?
Welche Aussage der KI war für mich unklar oder missverständlich und wie verstehe ich sie jetzt?
Gibt es eine Aussage im KI-Output, die ich hinterfragen oder korrigieren würde?
Welche typische Prüfungsfrage könnte dazu passen und wie würde ich sie beantworten?

References

Akgun, M., & Toker, S. (2025). Short-term gains, long-term gaps: The impact of GenAI and search technologies on retention (version 1). arXiv, arXiv:2507.07357.
Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. Longman.
Azevedo, R. (2020). Reflections on the field of metacognition: Issues, challenges, and opportunities. Metacognition and Learning, 15(2), 91–98.
Bai, Y., & Wang, S. (2025). Impact of generative AI interaction and output quality on university students’ learning outcomes: A technology-mediated and motivation-driven approach. Scientific Reports, 15(1), 24054.
Bauer, E., Greiff, S., Graesser, A. C., Scheiter, K., & Sailer, M. (2025). Looking beyond the hype: Understanding the effects of AI on learning. Educational Psychology Review, 37(2), 45.
Belland, B. R. (2017). Instructional scaffolding in STEM education. Springer International Publishing.
Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347–364.
Boguslawski, S., Deer, R., & Dawson, M. G. (2025). Programming education and learner motivation in the age of generative AI: Student and educator perspectives. Information and Learning Sciences, 126(1/2), 91–109.
Chiu, T. K. F., Ahmad, Z., Ismailov, M., & Sanusi, I. T. (2024). What are artificial intelligence literacy and competency? A comprehensive framework to support them. Computers and Education Open, 6, 100171.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104.
Creswell, J. W. (2014). Research design: Qualitative, quantitative, and mixed methods approaches (4th ed.). SAGE.
Crowe, A., Dirks, C., & Wenderoth, M. P. (2008). Biology in Bloom: Implementing Bloom’s taxonomy to enhance student learning in biology. CBE—Life Sciences Education, 7(4), 368–381.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. Springer.
Deci, E. L., & Ryan, R. M. (2000). The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227–268.
Dhillon, P. S., Molaei, S., Li, J., Golub, M., Zheng, S., & Robert, L. P. (2024). Shaping human-AI collaboration: Varied scaffolding levels in co-writing with language models. In Proceedings of the CHI conference on human factors in computing systems (pp. 1–18). ACM.
Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual Review of Psychology, 53(1), 109–132.
Ennis, R., & Philosophy Documentation Center. (2011). Critical thinking: Reflection and perspective part I. Inquiry: Critical Thinking Across the Disciplines, 26(1), 4–18.
Entwistle, N. J. (1991). Approaches to learning and perceptions of the learning environment: Introduction to the special issue. Higher Education, 22(3), 201–204.
Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. The California Academic Press.
Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American Psychologist, 34(10), 906–911.
Fritz, M. S., & MacKinnon, D. P. (2007). Required sample size to detect the mediated effect. Psychological Science, 18(3), 233–239.
Fryer, L. K., Ainley, M., Thompson, A., Gibson, A., & Sherlock, Z. (2017). Stimulating and sustaining interest in a language course: An experimental comparison of chatbot and human task partners. Computers in Human Behavior, 75, 461–468.
Gkintoni, E., Antonopoulou, H., Sortwell, A., & Halkiopoulos, C. (2025). Challenging cognitive load theory: The role of educational neuroscience and artificial intelligence in redefining learning efficacy. Brain Sciences, 15(2), 203.
Gruenhagen, J. H., Sinclair, P. M., Carroll, J.-A., Baker, P. R. A., Wilson, A., & Demant, D. (2024). The rapid rise of generative AI and its implications for academic integrity: Students’ perceptions and use of chatbots for assistance with assessments. Computers and Education: Artificial Intelligence, 7, 100273.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items (1st ed.). Routledge.
Halpern, D. F. (2013). Thought and knowledge (5th ed.). Psychology Press.
Hayes, A. F., & Little, T. D. (2022). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (3rd ed.). The Guilford Press.
Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial intelligence in education: Promises and implications for teaching and learning. The Center for Curriculum Redesign.
Hou, C., Zhu, G., & Sudarshan, V. (2025). The role of critical thinking on undergraduates’ reliance behaviours on generative AI in problem-solving. British Journal of Educational Technology, 56(5), 1919–1941.
Ifenthaler, D., & Egloffstein, M. (2020). Development and implementation of a maturity model of digital transformation. TechTrends, 64(2), 302–309.
JASP Team. (2025). JASP (Version 0.95.3) [Computer software]. University of Amsterdam. Available online: https://jasp-stats.org/ (accessed on 19 November 2025).
Jose, B., Cherian, J., Verghis, A. M., Varghise, S. M., S, M., & Joseph, S. (2025). The cognitive paradox of AI in education: Between enhancement and erosion. Frontiers in Psychology, 16, 1550621.
Kember, D., Leung, D. Y. P., Jones, A., Loke, A. Y., McKay, J., Sinclair, K., Tse, H., Webb, C., Yuet Wong, F. K., Wong, M., & Yeung, E. (2000). Development of a questionnaire to measure the level of reflective thinking. Assessment & Evaluation in Higher Education, 25(4), 381–395.
Kestin, G., Miller, K., Klales, A., Milbourne, T., & Ponti, G. (2025). AI tutoring outperforms in-class active learning: An RCT introducing a novel research-based design in an authentic educational setting. Scientific Reports, 15(1), 17458.
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
Knoth, N., Tolzin, A., Janson, A., & Leimeister, J. M. (2024). AI literacy and its implications for prompt engineering strategies. Computers and Education: Artificial Intelligence, 6, 100225.
Krieglstein, F., Beege, M., Rey, G. D., Sanchez-Stockhammer, C., & Schneider, S. (2023). Development and validation of a theory-based questionnaire to measure different types of cognitive load. Educational Psychology Review, 35(1), 9.
Kuhn, D. (2000). Metacognitive development. Current Directions in Psychological Science, 9(5), 178–181.
Kuhn, D., Cheney, R., & Weinstock, M. (2000). The development of epistemological understanding. Cognitive Development, 15(3), 309–328.
Larson, B. Z., Moser, C., Caza, A., Muehlfeld, K., & Colombo, L. A. (2024). Critical thinking in the age of generative AI. Academy of Management Learning & Education, 23(3), 373–378.
Lee, D., & Palmer, E. (2025). Prompt engineering in higher education: A systematic review to help inform curricula. International Journal of Educational Technology in Higher Education, 22(1), 7.
Lee, H.-P., Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., & Wilson, N. (2025). The impact of generative AI on critical thinking: Self-reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. In Proceedings of the 2025 CHI conference on human factors in computing systems (pp. 1–22). ACM.
Leppink, J., Paas, F., Van Gog, T., Van Der Vleuten, C. P. M., & Van Merriënboer, J. J. G. (2014). Effects of pairs of problems and examples on task performance and different types of cognitive load. Learning and Instruction, 30, 32–42. [Google Scholar] [CrossRef]
Li, J., Zhang, J., Chai, C. S., Lee, V. W. Y., Zhai, X., Wang, X., & King, R. B. (2025). Analyzing the network structure of students’ motivation to learn AI: A self-determination theory perspective. npj Science of Learning, 10(1), 48. [Google Scholar] [CrossRef]
Liu, X., & Zhong, B. (2025). Integrating generative artificial intelligence into student learning: A systematic review from a TPACK perspective. Educational Research Review, 49, 100741. [Google Scholar] [CrossRef]
Ma, Y., & Chen, M. (2025). The human touch in AI: Optimizing language learning through self-determination theory and teacher scaffolding. Frontiers in Psychology, 16, 1568239. [Google Scholar] [CrossRef] [PubMed]
McGrew, S., Breakstone, J., Ortega, T., Smith, M., & Wineburg, S. (2018). Can students evaluate online sources? Learning from assessments of civic online reasoning. Theory & Research in Social Education, 46(2), 165–193. [Google Scholar] [CrossRef]
Mohamed, A. M., Shaaban, T. S., Bakry, S. H., Guillén-Gámez, F. D., & Strzelecki, A. (2025). Empowering the faculty of education students: Applying AI’s potential for motivating and enhancing learning. Innovative Higher Education, 50(2), 587–609. [Google Scholar] [CrossRef]
Munshi, A., Biswas, G., Baker, R., Ocumpaugh, J., Hutt, S., & Paquette, L. (2023). Analysing adaptive scaffolds that help students develop self-regulated learning behaviours. Journal of Computer Assisted Learning, 39(2), 351–368. [Google Scholar] [CrossRef]
Nathaniel, J., Oyelere, S. S., Suhonen, J., & Tedre, M. (2025). Investigating the impact of generative AI integration on the sustenance of higher-order thinking skills and understanding of programming logic. Computers and Education: Artificial Intelligence, 9, 100460. [Google Scholar] [CrossRef]
Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2, 100041. [Google Scholar] [CrossRef]
Pinski, M., & Benlian, A. (2024). AI literacy for users—A comprehensive review and future research directions of learning methods, components, and effects. Computers in Human Behavior: Artificial Humans, 2(1), 100062. [Google Scholar] [CrossRef]
Pireci Sejdiu, N., & Sejdiu, S. (2025). The quiet transformation of higher education in the AI era. Open Research Europe, 5, 249. [Google Scholar] [CrossRef] [PubMed]
Polyportis, A. (2024). A longitudinal study on artificial intelligence adoption: Understanding the drivers of ChatGPT usage behavior change in higher education. Frontiers in Artificial Intelligence, 6, 1324398. [Google Scholar] [CrossRef]
Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9), 676–688. [Google Scholar] [CrossRef] [PubMed]
Roe, J., & Perkins, M. (2025). Generative AI in self-directed learning: A thematic scoping review. Interactive Learning Environments, 1–12. [Google Scholar] [CrossRef]
Shoufan, A. (2023). Can students without prior knowledge use ChatGPT to answer test questions? An empirical study. ACM Transactions on Computing Education, 23(4), 45. [Google Scholar] [CrossRef]
Song, J., Howe, E., Oltmanns, J. R., & Fisher, A. J. (2023). Examining the concurrent and predictive validity of single items in ecological momentary assessments. Assessment, 30(5), 1662–1671. [Google Scholar] [CrossRef]
Stadler, M., Sailer, M., & Fischer, F. (2021). Knowledge as a formative construct: A good alpha is not always better. New Ideas in Psychology, 60, 100832. [Google Scholar] [CrossRef]
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21(4), 360–407. [Google Scholar] [CrossRef]
Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer. [Google Scholar] [CrossRef]
Sweller, J., Van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261–292. [Google Scholar] [CrossRef]
Sweller, J., Van Merrienboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251–296. [Google Scholar] [CrossRef]
Tsakeni, M., Nwafor, S. C., Mosia, M., & Egara, F. O. (2025). Mapping the scaffolding of metacognition and learning by AI tools in STEM classrooms: A bibliometric–systematic review approach (2005–2025). Journal of Intelligence, 13(11), 148. [Google Scholar] [CrossRef]
Van De Pol, J., Volman, M., & Beishuizen, J. (2010). Scaffolding in teacher–student interaction: A decade of research. Educational Psychology Review, 22(3), 271–296. [Google Scholar] [CrossRef]
Vygotsky, L. S., & Cole, M. (1978). Mind in society: The development of higher psychological processes. Harvard University Press. [Google Scholar]
Wang, N. C. (2025). Scaffolding creativity: Integrating generative AI tools and real-world experiences in business education. In Proceedings of the extended abstracts of the CHI conference on human factors in computing systems (pp. 1–9). ACM. [Google Scholar] [CrossRef]
Wang, X., Liu, Q., Pang, H., Tan, S. C., Lei, J., Wallace, M. P., & Li, L. (2023). What matters in AI-supported learning: A study of human-AI interactions in language learning using cluster analysis and epistemic network analysis. Computers & Education, 194, 104703. [Google Scholar] [CrossRef]
Yu, F. -Y., & Chen, C. -Y. (2021). Student- versus teacher-generated explanations for answers to online multiple-choice questions: What are the differences? Computers & Education, 173, 104273. [Google Scholar] [CrossRef]
Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students’ cognitive abilities: A systematic review. Smart Learning Environments, 11(1), 28. [Google Scholar] [CrossRef]
Zhang, X., Zhang, P., Shen, Y., Liu, M., Wang, Q., Gašević, D., & Fan, Y. (2024). A systematic literature review of empirical research on applying generative artificial intelligence in education. Frontiers of Digital Education, 1(3), 223–245. [Google Scholar] [CrossRef]

Figure 1. Visualisation of PROCESS Mediation Models of H3d, testing the role of GCL (post) as a mediator between instructional condition and knowledge gain. Results highlighted in yellow indicate statistically significant effects (p < 0.05).

Figure 2. Visualisation of PROCESS Mediation Models of H3e, testing the role of ECL (post) as a mediator between instructional condition and knowledge gain. Results highlighted in yellow indicate statistically significant effects (p < 0.05).

Figure 3. Visualisation of PROCESS Mediation Models of H4c, testing the role of critical thinking (post) as a mediator between instructional condition and cognitive load (GCL and ECL). Results highlighted in yellow indicate statistically significant effects (p < 0.05).

Figure 4. Visualisation of PROCESS Moderation Models of H4d, testing whether the relationship between HOT Gain and Critical Thinking Gain is moderated by instructional condition. Results highlighted in yellow indicate statistically significant effects (p < 0.05).

Figure 5. Visualisation of PROCESS Moderation Models of H5d, testing whether the relationship between reflective use (post) and cognitive load (GCL and ECL) is moderated by instructional condition. Results highlighted in yellow indicate statistically significant effects (p < 0.05).

Figure 6. Visualisation of PROCESS Moderation Models of H5e, testing whether the relationship between HOT (post) and reflective use (post) is moderated by instructional condition.
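
The mediation models in Figures 1 to 3 rest on the standard regression logic of a simple mediation analysis: path a (condition to mediator), path b (mediator to outcome, controlling for condition), the direct effect c', and the indirect effect a·b, which is commonly tested with a percentile bootstrap confidence interval. The following is a minimal Python sketch of that logic, assuming a dummy-coded condition (0 = control, 1 = scaffolding). It is not the authors' PROCESS analysis: the function names, the synthetic data, and the effect sizes used to generate them are illustrative assumptions only.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope of y regressed on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def ols_two_predictors(x, m, y):
    """Slopes (b_x, b_m) of y regressed on x and m (model includes an intercept)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


def mediation(x, m, y):
    """Point estimates for a simple mediation model x -> m -> y."""
    a = ols_slope(x, m)                      # path a: condition -> mediator
    direct, b = ols_two_predictors(x, m, y)  # c' (direct effect) and path b
    total = ols_slope(x, y)                  # path c: total effect
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


def bootstrap_ci(x, m, y, n_boot=1000, seed=0, level=0.95):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        xs, ms, ys = ([v[i] for i in idx] for v in (x, m, y))
        estimates.append(mediation(xs, ms, ys)["indirect"])
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Synthetic data only (hypothetical values, not the study's data).
rng = random.Random(42)
x = [0] * 47 + [1] * 47                                  # 0 = control, 1 = scaffolding
m = [3.8 + 0.1 * xi + rng.gauss(0, 0.5) for xi in x]     # mediator, e.g. GCL (post)
y = [5.0 + 1.0 * xi + 2.0 * mi + rng.gauss(0, 10) for xi, mi in zip(x, m)]

result = mediation(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print(result)
print("95% bootstrap CI for the indirect effect:", (ci_low, ci_high))
```

In this framing, mediation would be inferred when the bootstrap interval for the indirect effect excludes zero. In ordinary least squares the total effect equals the direct effect plus the indirect effect, which serves as a quick check on the estimates.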

Table 1. Descriptive statistics (M, SD) for all study variables across conditions and measurement points.

Variable	Full Scaffolding M	Full Scaffolding SD	Light Scaffolding M	Light Scaffolding SD	Control M	Control SD	N (Full/Light/Control)
Knowledge (pre)	64.27	12.70	63.40	12.31	68.10	14.25	46/42/48
Knowledge (post)	72.42	16.06	72.50	15.86	71.40	16.30	46/42/48
LOT (pre)	72.28	14.20	71.43	14.42	77.10	18.29	46/42/48
LOT (post)	80.43	18.40	81.43	17.07	80.00	19.42	46/42/48
HOT (pre)	56.30	16.20	55.40	15.65	60.00	14.98	46/42/48
HOT (post)	64.04	17.70	63.70	20.62	62.80	17.00	46/42/48
Motivation (pre)	3.74	0.50	3.94	0.61	4.03	0.55	53/43/46
Motivation (post)	3.73	0.64	4.03	0.61	4.10	0.50	53/43/46
ICL (pre)	2.91	0.70	2.93	0.56	2.95	0.60	53/43/46
ICL (post)	2.75	0.80	2.53	0.79	2.77	0.70	53/43/46
ECL (pre)	2.71	0.51	2.50	0.65	2.54	0.50	53/43/46
ECL (post)	2.62	0.80	2.26	0.65	2.28	0.50	53/43/46
GCL (pre)	3.70	0.50	3.75	0.48	3.78	0.53	53/43/46
GCL (post)	3.73	0.52	3.85	0.55	3.89	0.50	53/43/46
Critical thinking (pre)	3.77	0.45	3.77	0.54	3.65	0.61	53/43/46
Critical thinking (post)	3.86	0.61	4.02	0.55	3.98	0.50	53/43/46
Reflective use (pre)	3.52	0.88	3.66	0.75	3.56	0.80	53/43/46
Reflective use (post)	3.80	0.72	4.10	0.60	3.90	0.65	53/43/46

Values are means (M) and standard deviations (SD). Knowledge outcomes are reported as percentages. Self-reported measures were assessed on 5-point Likert scales (1 = strongly disagree; 5 = strongly agree). Ns reflect complete cases per construct.
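
Because the Ns are complete cases and identical at both measurement points, the descriptive pre-post change per group can be read directly from Table 1 as the difference of the two means. The short Python sketch below does this arithmetic for the knowledge scores and adds an N-weighted mean across groups. The weighted mean is an illustrative summary derived here from the tabled values, not a statistic reported by the authors, and none of these differences says anything about statistical significance.

```python
# Group means (in %) and complete-case Ns transcribed from Table 1, knowledge rows.
table1_knowledge = {
    "full":    {"n": 46, "pre": 64.27, "post": 72.42},
    "light":   {"n": 42, "pre": 63.40, "post": 72.50},
    "control": {"n": 48, "pre": 68.10, "post": 71.40},
}


def weighted_mean(rows, key):
    """Mean of group means for `key`, weighted by each group's N."""
    total_n = sum(r["n"] for r in rows.values())
    return sum(r["n"] * r[key] for r in rows.values()) / total_n


# Descriptive pre-post difference per condition, in percentage points.
gains = {g: round(r["post"] - r["pre"], 2) for g, r in table1_knowledge.items()}

overall_pre = weighted_mean(table1_knowledge, "pre")
overall_post = weighted_mean(table1_knowledge, "post")

for group, gain in gains.items():
    print(f"{group:8s} gain: {gain:.2f} percentage points")
print(f"overall  pre: {overall_pre:.2f}  post: {overall_post:.2f}")
```

The per-group differences come out at 8.15 (full), 9.10 (light), and 3.30 (control) percentage points, which is the descriptive pattern referred to for H1a in Table 2.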

Table 2. Overview of research questions, hypotheses, sample sizes and hypothesis support.

Research Questions & Hypotheses	N	N Didactic Scenario	Supported
RQ1: Does scaffolding affect students’ knowledge gains within a single session?
H1a: Students’ knowledge will increase significantly from pre to post, with the full-scaffolding group showing the largest gains, followed by light-scaffolding and control.	136	Full = 46 Light = 42 Control = 48	Partially, yes. This conclusion reflects that the overall increase in knowledge was significant, whereas the expected group differences emerged only descriptively.
H1b: Prior achievement (final school grade) and prior AI experience will moderate knowledge gains, with higher-performing students showing greater benefits, consistent with a Matthew effect.	132–136	High initial knowledge = 72 Low initial knowledge = 64	No
H1c: LOT will increase significantly from pre to post across all conditions, without differences between groups.	136	Full = 46 Light = 42 Control = 48	Yes
H1d: HOT gains will differ between conditions, with full scaffolding showing the largest increase, followed by light scaffolding and control.	136	Full = 46 Light = 42 Control = 48	No
RQ2: How does students’ motivation change during the AI-tutored learning activity, and is motivation associated with knowledge gains?
H2a: Students’ motivation will show a small increase from pre to post, with the scaffolding conditions showing greater motivational increase than the control condition.	142	Full = 53 Light = 43 Control = 46	No
H2b: Higher initial motivation will predict greater knowledge gains.	132		No
RQ3: How do different dimensions of cognitive load (intrinsic, extraneous, and germane) change during the AI-tutored learning activity, and do they differ across didactic scenarios?
H3a: ICL will show a small decrease from pre to post across conditions, as comprehension increases during the learning activity.	142	Full = 53 Light = 43 Control = 46	Yes
H3b: ECL will decrease significantly, with the largest reduction expected in the full-scaffolding group.	142	Full = 53 Light = 43 Control = 46	Partially, yes. This conclusion reflects that ECL decreased significantly across all groups, although the expected group differences did not emerge.
H3c: GCL will increase significantly, with the largest increase expected in the full-scaffolding group.	142	Full = 53 Light = 43 Control = 46	No
H3d: GCL will positively predict knowledge gain and mediate the relationship between condition and knowledge gain.	88–94	Full vs. Control = 94 Light vs. Control = 88	No
H3e: ECL will negatively predict knowledge gain and mediate the relationship between condition and knowledge gain.	88–94	Full vs. Control = 94 Light vs. Control = 88	No
RQ4: Does students’ critical thinking change during the AI-tutored learning activity and how is it related to knowledge gains and cognitive load?
H4a: Critical thinking will increase significantly from pre to post across all conditions.	142	Full = 53 Light = 43 Control = 46	Yes
H4b: Higher initial critical thinking will positively predict knowledge gains.	132		No
H4c: The effect of condition (full scaffolding vs. control; light scaffolding vs. control) on GCL (post) and ECL (post) will be mediated by students’ critical thinking (post).	93–101	Full vs. Control = 101 Light vs. Control = 93	No
H4d: Students’ gains in HOT will positively predict gains in critical thinking, particularly in the full-scaffolding condition, compared to the light-scaffolding and control conditions.	84–92	Full vs. Control = 92 Light vs. Control = 84	No
RQ5: How does students’ reflective use of AI develop during the AI-tutored learning activity, and how is it associated with learning outcomes and higher-order thinking?
H5a: Reflective use will increase significantly from pre to post across all conditions.	142	Full = 53 Light = 43 Control = 46	Yes
H5b: Higher initial reflective use will positively predict knowledge gains.	132		No
H5c: Students in the scaffolding conditions will report higher reflective use (post) compared to the control group.	93–101	Full vs. Control = 101 Light vs. Control = 93	No
H5d: Higher reflective use (post) will be associated with higher GCL and lower ECL.	93–101	Full vs. Control = 101 Light vs. Control = 93	Partially, yes. This conclusion reflects that reflective use was positively associated with GCL across conditions, indicating deeper cognitive engagement. In addition, reflective use was associated with lower ECL in the light-scaffolding condition, whereas this association did not consistently emerge across all conditions.
H5e: Higher HOT (post) will be positively associated with reflective use (post).	89–95	Full vs. Control = 95 Light vs. Control = 89	No
