1. Introduction
Generative artificial intelligence (GenAI), particularly large language models (LLMs) such as ChatGPT, has rapidly entered higher education and is reshaping how students search for information, plan assignments, and produce written work. This shift offers clear instructional opportunities (e.g., timely support, idea generation, language assistance) while also raising sustainability-relevant concerns about learning quality, student agency, and responsible technology use in education systems. Framed by SDG 4’s emphasis on inclusive and equitable quality education and lifelong learning [
1], GenAI integration should be evaluated not only by short-term performance gains, but also by whether it sustains learners’ long-term competencies and well-being.
In this study, sustainability in education is examined at the psychological level, with a focus on whether students can maintain engagement, confidence, and adaptive functioning in technology-rich learning environments over time. From this perspective, learning anxiety and academic self-efficacy are not merely short-term states; they are conditions that shape students’ sustained participation in AI-assisted learning. High anxiety may lead to disengagement, whereas self-efficacy supports persistence and self-regulated learning. Therefore, both constructs can be understood as sustainability-relevant outcomes that influence the long-term viability of AI-integrated education.
Prior research highlights that GenAI’s educational value is accompanied by non-trivial risks. Syntheses of early education-focused work on ChatGPT emphasize persistent concerns such as hallucinations, bias, opacity in reasoning, privacy, and the possibility of shallow learning when students outsource thinking without verification and reflection [
2,
3]. These risks matter for sustainable higher education because they can translate into a psychological sustainability problem: AI-enabled convenience may co-exist with rising learning anxiety and weakening academic self-efficacy, which is widely treated as a foundational motivational resource supporting persistence and effective learning behaviors [
4].
Learning anxiety is theoretically central because it is strongly tied to students’ appraisals of control and value. Control–Value Theory proposes that when learners highly value achievement outcomes yet perceive reduced control, anxiety becomes more likely and can undermine engagement and performance [
5]. In GenAI-supported learning, control appraisals can become unstable: although AI may increase immediate task completion capability, it can also introduce uncertainty about authorship, legitimacy, and personal mastery—factors that plausibly intensify anxiety and erode confidence. Consistent with this logic, broader technology research conceptualizes “AI anxiety” as multidimensional, reflecting perceived threats and uncertainty about competence and consequences [
6].
In response, universities increasingly call for “responsible GenAI use,” yet many institutional approaches remain predominantly rule-based (e.g., prohibitions, disclosure policies) rather than capability-building. For sustainability-oriented scholarship, the key need is educational technology design that strengthens students’ digital literacy and self-regulation: students must be able to set goals, evaluate outputs, verify claims, and maintain authorship responsibility while leveraging GenAI. This requirement is especially salient because GenAI systems impose substantial metacognitive demands on users—planning what to ask, judging output reliability, deciding when to rely on suggestions, and integrating outputs into one’s own reasoning [
7].
Metacognitive prompting is a practical and scalable instructional design aligned with this capability-building agenda. Meta-analytic evidence indicates that metacognitive prompts can enhance self-regulated learning activities and learning outcomes in computer-based learning environments [
8], suggesting a transferable pathway for supporting students’ regulation in GenAI-assisted tasks. Applied to GenAI, prompts that require students to articulate goals and constraints before prompting, to verify and triangulate AI-provided claims, and to reflect on what they understand independently may stabilize perceived control, reduce learning anxiety, and sustain academic self-efficacy [
4,
5].
Despite this promise, causal evidence remains limited regarding whether metacognitive prompting can reliably improve students’ psychological outcomes during GenAI-assisted learning, particularly for the paired outcomes of learning anxiety and academic self-efficacy that are central to sustainable learning capacity. Addressing this gap, the present study examines metacognitive prompting for responsible GenAI use and its effects on learning anxiety and academic self-efficacy using an experimental approach. By foregrounding student well-being and long-term capability—rather than only immediate productivity—this work aims to inform sustainability-oriented guidance for designing AI-integrated pedagogy consistent with responsible innovation in higher education.
2. Literature Review
Generative AI (GenAI) is rapidly reshaping learning and assessment practices, but its educational value depends on whether students can engage with it in ways that preserve human agency and academic integrity. Recent syntheses emphasize that large language models (LLMs) can support explanation, feedback, and ideation, while simultaneously introducing risks such as hallucinated content, opacity, overreliance, and inequitable outcomes—issues that demand learner competencies beyond “tool use” [
2]. From a sustainable education perspective, this aligns with calls to treat AI as a means to advance inclusive and equitable quality education (SDG 4) through governance, literacy, and human-centered design, rather than as a shortcut that displaces learning processes [
1].
2.1. Responsible GenAI Use as a Digital-Literacy and Sustainability Problem
Responsible GenAI use in academic contexts entails transparency, appropriate task delegation, and continuous evaluation of output quality against learning goals. UNESCO’s global guidance highlights the need for policies and capacity-building that protect privacy, promote equity, and sustain core educational values under GenAI adoption [
9]. In parallel, AI literacy research increasingly frames “responsible use” as a competence bundle that includes understanding AI capabilities/limits, ethical boundaries, and reflective decision-making about when and how to rely on AI [
10]. Together, these streams imply that the key challenge is not merely access to GenAI but the development of reflective, metacognitively informed use patterns that support durable learning and integrity.
2.2. Metacognition and Self-Regulated Learning as Theoretical Foundations
Self-regulated learning (SRL) models position learners as active agents who plan, monitor, and adapt strategies to meet goals [
11,
12]. Metacognitive scaffolds—especially prompts that elicit planning, monitoring, and reflection—have long been used to improve learning in technology-rich environments by making these SRL processes more explicit and controllable [
13]. In LLM-mediated learning, metacognition becomes even more central because students must (a) formulate effective prompts, (b) evaluate output credibility and relevance, and (c) calibrate reliance versus independent work [
7]. Therefore, “metacognitive prompting” is theoretically well-positioned as a lightweight intervention to steer GenAI use toward learning-oriented behaviors rather than answer harvesting.
2.3. Learning Anxiety Under GenAI Ambiguity and Evaluation Pressure
Learning anxiety is especially likely when learners face uncertain standards, ambiguous rules, or low perceived control—conditions common in early-stage GenAI adoption where course policies, detection tools, and acceptable-use norms may vary across classes. Control–Value Theory (CVT) explains achievement emotions (including anxiety) as arising from learners’ appraisals of control over learning activities and the value attached to outcomes [
5]. When students perceive low control (e.g., uncertainty about what is “allowed,” fear of being misjudged, inability to verify LLM output), anxiety should increase, which can impair cognitive resources and self-regulation. This connects to broader technology-related anxiety research: AI anxiety has been conceptualized as a multi-dimensional construct reflecting apprehension toward AI’s operation, consequences, and one’s own competence in AI-relevant contexts [
6]. In learning settings, these frameworks jointly predict that reducing ambiguity and increasing reflective control processes should be an effective pathway to lowering anxiety during GenAI-supported tasks.
2.4. Academic Self-Efficacy as a Sustainability-Relevant Learning Outcome
Academic self-efficacy—students’ belief that they can successfully perform academic tasks—predicts persistence, strategic learning, and achievement, particularly under challenge [
4,
14]. GenAI can either support or erode self-efficacy depending on how it is used: when it functions as guided support that helps learners understand, revise, and improve, it may strengthen mastery experiences; when it encourages passive copying or creates dependence, it may undermine perceived competence and long-term skill development [
2]. Because sustainable education emphasizes durable capabilities (not just short-term performance), maintaining self-efficacy while integrating GenAI is a central criterion for “responsible” implementation, especially in higher education where independent learning and academic integrity are core expectations.
2.5. Research Gap and Experimental Motivation
Although policy guidance and conceptual analyses converge on the importance of human-centered, responsible GenAI use [
9] and HCI work clarifies that GenAI interactions impose metacognitive demands [
7], empirical evidence remains comparatively thin on causal mechanisms: specifically, there is little evidence on whether embedding metacognitive prompting into GenAI use can reduce learning anxiety and sustain academic self-efficacy during authentic learning tasks. Existing SRL and metacognitive scaffolding research supports the plausibility of prompting-based interventions [
13], but GenAI introduces distinctive challenges (verification, reliance calibration, integrity ambiguity) that may alter the anxiety–efficacy dynamic described by CVT [
5]. This gap motivates the present experiment, which examines whether metacognitive prompting during GenAI-supported learning can reduce learning anxiety and support academic self-efficacy. In doing so, the study provides empirical evidence relevant to both educational technology and sustainable digital literacy in higher education.
3. Research Hypotheses
This study conceptualizes metacognitive prompting for responsible GenAI use as an instructional scaffold that strengthens students’ self-regulatory processing while interacting with LLM outputs. In self-regulated learning (SRL) frameworks, learners’ performance and adaptation depend on planning, monitoring, and self-evaluative reflection [
11,
15]. Metacognitive prompts are designed to activate these processes by asking learners to (a) clarify task goals and criteria, (b) monitor progress and comprehension, and (c) evaluate strategy fit and output quality. Prior evidence indicates that metacognitive prompting can improve learning-related processes and outcomes in technology-mediated environments [
8,
13]. In GenAI contexts, such prompting is particularly relevant because effective use requires users to make metacognitive judgments about what to delegate, how to verify outputs, and how to integrate AI suggestions with their own reasoning [
7]. Therefore, metacognitive prompting should shift GenAI use from passive acceptance toward reflective engagement, which has direct implications for learners’ emotions and competence beliefs.
3.1. Metacognitive Prompting and Learning Anxiety
Achievement emotions research suggests that anxiety is closely tied to students’ appraisals of control and value. Control–Value Theory posits that when learners highly value outcomes but perceive limited control over achieving them, anxiety is more likely and may impair engagement and performance [
5]. In GenAI-assisted learning, control appraisals can be disrupted by uncertainty regarding authorship, the reliability of AI-generated claims, and ambiguity about acceptable use. These conditions align with broader conceptualizations of AI-related anxiety that emphasize uncertainty, perceived threats, and concerns about competence in AI-involved situations [
6]. Metacognitive prompting is expected to reduce such uncertainty by clarifying goals, structuring verification, and reinforcing ownership of decisions during AI use—thus increasing perceived control and lowering anxiety.
H1. Metacognitive prompting for responsible GenAI use is negatively associated with students’ learning anxiety.
3.2. Metacognitive Prompting and Academic Self-Efficacy
Academic self-efficacy refers to students’ beliefs about their capability to organize and execute actions required to attain academic goals [
16]. In higher education, self-efficacy is a central motivational resource that supports persistence, strategic learning, and resilient responses to difficulty [
4]. Although GenAI can provide immediate assistance, unstructured reliance may weaken students’ perceptions of competence and authorship, particularly when AI outputs appear superior or when students cannot explain or defend the submitted work. Reviews of LLMs in education caution that such patterns can lead to shallow learning and overreliance if students do not actively evaluate and integrate AI outputs [
2]. By contrast, metacognitive prompting should strengthen self-efficacy by promoting mastery-oriented behaviors: students explicitly articulate what they understand, verify claims, and iteratively improve outputs based on criteria—processes consistent with SRL models [
11,
12]. As a result, prompting is expected to sustain (or enhance) academic self-efficacy during GenAI-assisted tasks.
H2. Metacognitive prompting for responsible GenAI use is positively associated with students’ academic self-efficacy.
3.3. The Anxiety–Self-Efficacy Linkage and a Mediated Pathway
Learning anxiety and self-efficacy are theoretically intertwined. In control–value terms, heightened anxiety reflects reduced perceived control in valued learning contexts [
5], and such reduced control should also undermine confidence in one’s ability to succeed. From a social cognitive perspective, self-efficacy is shaped by interpreted mastery experiences and perceived capability; anxiety can interfere with attention, strategy use, and perceived mastery, thereby lowering efficacy judgments [
17]. Therefore, learning anxiety is expected to be negatively related to academic self-efficacy in GenAI-assisted learning tasks.
H3. Students’ learning anxiety is negatively associated with their academic self-efficacy.
Integrating these arguments, metacognitive prompting should influence academic self-efficacy not only directly (by supporting mastery-oriented regulation) but also indirectly by reducing learning anxiety through strengthened control appraisals and clearer ownership of GenAI-assisted work [
5,
7]. This logic implies a mediated mechanism in which prompting reduces anxiety, which in turn supports self-efficacy.
H4. Learning anxiety mediates the relationship between metacognitive prompting and academic self-efficacy, such that metacognitive prompting reduces learning anxiety, thereby sustaining academic self-efficacy.
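The mediated pathway in H4 can be made concrete with a brief computational sketch. The code below simulates data consistent with the hypothesized sign pattern (it is not the study’s data; all variable names, coefficients, and sample values are illustrative assumptions) and estimates the indirect effect a × b with a percentile bootstrap:

```python
import numpy as np

rng = np.random.default_rng(42)

def ols(y, *xs):
    """OLS coefficients [intercept, slopes...] of y on the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data consistent with H1-H4 (illustrative only): prompting
# lowers anxiety (a-path), and anxiety lowers self-efficacy (b-path).
n = 148
prompting = np.repeat([0.0, 1.0], n // 2)
anxiety = 4.0 - 0.8 * prompting + rng.normal(0, 1, n)
efficacy = 5.0 - 0.5 * anxiety + 0.2 * prompting + rng.normal(0, 1, n)

a = ols(anxiety, prompting)[1]            # H1: expected negative
b = ols(efficacy, prompting, anxiety)[2]  # H3: expected negative
indirect = a * b                          # H4: expected positive

# Percentile bootstrap confidence interval for the indirect effect
boot = np.empty(2000)
for i in range(2000):
    idx = rng.integers(0, n, n)
    boot[i] = (ols(anxiety[idx], prompting[idx])[1]
               * ols(efficacy[idx], prompting[idx], anxiety[idx])[2])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect = {indirect:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because the a-path and b-path are both negative under H1 and H3, their product—the indirect effect of prompting on self-efficacy through reduced anxiety—is positive, which is the directional pattern H4 asserts.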
4. Research Methodology
4.1. Research Design
To test the causal effect of metacognitive prompting on learning anxiety and academic self-efficacy in GenAI-assisted learning, this study adopts a between-subjects randomized controlled experimental design. Random assignment is central for supporting causal inference by minimizing systematic pre-treatment differences between conditions and strengthening internal validity [
18].
The experiment implements a two-condition structure: (1) a metacognitive prompting condition that embeds brief, structured prompts guiding learners to plan, monitor, and evaluate their GenAI use, and (2) a control condition in which learners complete the same task with access to GenAI but without metacognitive scaffolding (or with neutral, non-metacognitive instructions). The manipulation is designed to operationalize metacognitive regulation in line with established SRL principles (e.g., planning and self-evaluation) while keeping task content, time-on-task, and tool access constant across conditions, thereby isolating the effect of prompting from general exposure to GenAI [
18,
19]. The study is conducted as a standardized, task-based experiment that mirrors realistic academic use of GenAI. The academic task required participants to read a short source text (approximately 350–400 words) on the topic of the impact of social media on academic performance and produce a structured written response of approximately 250–300 words. Participants were asked to summarize the main arguments presented in the text, evaluate the credibility of the evidence, and articulate their own position supported by the provided material. The task was completed within a fixed time limit of 40 min. All participants received identical task instructions and materials to ensure comparability across conditions. Such a task allows the prompting intervention to act on critical decision points of GenAI use—what to ask, how to judge output credibility, and how to integrate AI-generated suggestions into one’s own reasoning—while ensuring comparability of learning demands across participants. Because computer-based experimentation introduces potential risks (e.g., distraction, uncontrolled settings, or inattentive responding), the study followed established recommendations for experimental quality control, including clear instructions, attention checks, and procedural controls to enhance data integrity [
19].
A total of 156 students were present in the classroom session and invited to participate. After applying the pre-specified exclusion criteria (e.g., incomplete post-task questionnaire, failed attention check, or missing key outcome items), 148 valid cases were retained for analysis, with 74 students assigned to the metacognitive prompting condition and 74 to the control condition. Baseline equivalence between conditions was assessed using pre-task variables (prior GenAI use frequency, baseline task confidence, and basic demographics). Consistent with expectations under random assignment, no statistically significant differences were observed between the two conditions (all
ps > 0.05), supporting internal validity for condition comparisons [
18].
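A baseline-equivalence check of this kind can be sketched as follows; the covariate values and cell counts below are simulated placeholders, not the study’s data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical pre-task covariate (prior GenAI use frequency, 1-7 scale)
# for the two conditions (n = 74 each); values are simulated.
use_prompting = rng.normal(3.5, 1.0, 74)
use_control = rng.normal(3.5, 1.0, 74)

# Continuous covariate: Welch's t-test (no equal-variance assumption)
t, p_cont = stats.ttest_ind(use_prompting, use_control, equal_var=False)

# Categorical covariate (e.g., gender by condition): chi-square on a 2x2 table
table = np.array([[40, 34],   # prompting condition (illustrative counts)
                  [38, 36]])  # control condition
chi2, p_cat, dof, _ = stats.chi2_contingency(table)

print(f"t = {t:.2f}, p = {p_cont:.3f}")
print(f"chi2({dof}) = {chi2:.2f}, p = {p_cat:.3f}")
```

Under successful randomization, such tests should yield non-significant differences across all pre-task covariates, as reported above.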
Table 1 presents participant characteristics and confirms baseline equivalence between conditions.
To enhance transparency and replicability, the research design follows widely used reporting conventions for randomized trials, including documenting participant flow (eligibility screening, randomization, exclusions, and analysis sample) and reporting key implementation details [
20]. Although CONSORT was originally developed for clinical trials, its reporting elements (e.g., flow diagrams and explicit allocation procedures) have been broadly adopted as best practice for improving clarity and reducing ambiguity in experimental reporting.
4.2. Experimental Materials
The experimental materials were developed to simulate a realistic, academically relevant GenAI-assisted learning episode while keeping the focal manipulation—metacognitive prompting for responsible GenAI use—clearly separable from task content and tool access. The design of the prompting materials was grounded in self-regulated learning and metacognitive scaffolding research, which emphasizes planning, monitoring, and evaluation as core regulatory activities that can be activated through brief, well-timed prompts [
8,
11,
13]. In parallel, the materials explicitly addressed the distinctive demands of GenAI interaction—prompt formulation, output evaluation, reliance calibration, and verification—which have been identified as metacognitively intensive user activities in generative AI settings [
7].
The learning task materials consisted of a standardized academic prompt, a short source text (or evidence packet) for content grounding, and an output template that required participants to produce an evidence-based response under time constraints. This task structure was selected to reflect a common university assignment pattern (reading-to-write or evidence-supported short essay) and to ensure that GenAI’s role would plausibly include summarization, argument structuring, and language refinement, while still requiring participants to exercise judgment about what to accept, revise, or reject. The task also enabled a meaningful “responsible use” requirement because LLMs can generate fluent but potentially inaccurate or unsupported claims, making verification and attribution central to quality and integrity [
2].
To align the intervention with sustainability-oriented educational technology discourse, the materials framed GenAI use as a capability-building activity rather than a shortcut. Specifically, the instructions emphasized learner agency, transparency of assistance, and verification of claims—principles consistent with UNESCO’s guidance on generative AI in education, which stresses human-centered use, attention to risk, and capacity development to support responsible adoption [
21]. This framing was implemented uniformly across conditions to avoid confounding the manipulation with differences in ethical messaging; what differed across conditions was the presence versus absence of structured metacognitive prompts that operationalized how to enact responsible use during the task.
In the metacognitive prompting condition, participants received three brief prompt blocks embedded at fixed points of the task flow. A pre-use prompt asked participants to articulate their task goal, specify what they intended to use GenAI for (and not for), and identify evaluation criteria they would apply to AI outputs (planning). A mid-task prompt required participants to check whether the GenAI output was supported by the provided source text, flag uncertain statements, and decide whether to request clarification or alternative outputs (monitoring). A post-use prompt asked participants to summarize what they understood independently, document the changes they made to AI-assisted text, and reflect on remaining uncertainties or verification steps (evaluation). The content and sequencing follow established scaffolding principles suggesting that prompts are most effective when they are aligned with SRL phases and direct learners toward actionable monitoring and evaluation behaviors [
8,
12,
13].
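To make the intervention structure concrete, the three prompt blocks can be represented as a simple data structure. The wording below paraphrases the design described above; it is not the study’s verbatim material (the full prompts appear in Appendix A):

```python
# Illustrative encoding of the three prompt blocks, keyed by SRL phase.
# The question wording is paraphrased, not the study's actual prompts.
PROMPT_BLOCKS = {
    "pre_use": [  # planning
        "What is your goal for this task?",
        "What will you use GenAI for, and what will you not use it for?",
        "Which criteria will you apply when judging AI outputs?",
    ],
    "mid_task": [  # monitoring
        "Is each AI-generated claim supported by the provided source text?",
        "Which statements are you uncertain about, and why?",
        "Do you need a clarification or an alternative output?",
    ],
    "post_use": [  # evaluation
        "What do you now understand independently of the AI?",
        "Which changes did you make to AI-assisted text?",
        "Which uncertainties or verification steps remain?",
    ],
}

for phase, prompts in PROMPT_BLOCKS.items():
    print(phase, "->", len(prompts), "prompts")
```

This phase-keyed structure mirrors the planning–monitoring–evaluation sequencing that the scaffolding literature identifies as most effective.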
In the control condition, participants completed the same learning task with the same GenAI access and the same time allotment but did not receive metacognitive prompts. To control for the possibility that additional text alone could influence outcomes, the control instructions were kept comparable in length and clarity but remained procedurally neutral (e.g., describing task steps without requiring goal articulation, verification, or reflective justification). This approach is consistent with intervention logic in prompting research, which distinguishes metacognitive prompting effects from generic instruction or exposure effects [
8].
Finally, the GenAI-use environment was standardized across participants through a fixed interface and consistent tool guidance. Participants were instructed to use GenAI as an optional aid during the task, and the study interface provided an identical access pathway in both conditions. Given the known risks of overtrust and uncritical acceptance of fluent outputs, the materials reminded participants that AI-generated content may contain errors and requires verification—without introducing condition-specific warnings that could bias emotional responses [
2]. In combination, these materials created a controlled yet ecologically plausible setting for testing whether metacognitive prompting can shift GenAI use toward responsible, reflective engagement and thereby influence learning anxiety and academic self-efficacy outcomes [
5,
7].
The GenAI tool used in this study was ChatGPT (OpenAI), accessed via a web-based interface under standardized conditions. To ensure consistency across participants, all students were provided with the same access instructions and used the same model version (ChatGPT, GPT-4) during the task. Where necessary, access was arranged in advance to ensure that all participants could use the system without technical barriers. Participants interacted with the tool through their own devices (e.g., laptops or tablets), and no alternative AI tools were permitted during the task. The use of GenAI was optional but encouraged, allowing the task to reflect realistic learning behavior while ensuring that participants had equal opportunity to engage with the tool.
The full wording of the metacognitive prompts, control instructions, task materials, and administration procedure are provided in
Appendix A to ensure reproducibility and transparency.
4.3. Measures
All questionnaire items were administered in English (or in Chinese using a translation–back translation procedure when applicable) and rated on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree), unless otherwise noted. Primary outcomes were measured immediately after the experimental task to capture state-relevant responses to the GenAI-assisted learning episode, whereas background variables (e.g., prior GenAI experience) were collected before the task to avoid post-treatment contamination.
4.3.1. Learning Anxiety
Learning anxiety was operationalized as a task-evoked achievement emotion during the GenAI-assisted learning activity. We adapted a short set of items from the learning-related anxiety component of the Achievement Emotions Questionnaire (AEQ), a control–value theory-grounded instrument developed for assessing achievement emotions in learning contexts [
22]. To reduce respondent burden and enhance suitability for experimental administration, we followed the short-form logic of AEQ-S, which provides brief, psychometrically supported emotion scales [
23]. Items were contextualized to the focal task by anchoring them to the learning episode (e.g., “During this task…”) while preserving the construct meaning of anxiety as experienced in learning situations. The scale demonstrated good internal consistency in the present study (Cronbach’s α = 0.88).
4.3.2. Academic Self-Efficacy
Academic self-efficacy was measured as students’ perceived capability to successfully handle academic demands similar to the experimental task. We used the academic self-efficacy measure employed in Chemers, Hu, and Garcia’s longitudinal study on first-year college student performance and adjustment [
24], with minor contextual adaptation to the course/task setting (e.g., “in this course”/“for tasks like this”). This approach aligns with social cognitive theory, which conceptualizes self-efficacy as domain-specific confidence in one’s ability to perform goal-directed tasks. The scale included 6 items and showed high reliability (Cronbach’s α = 0.91).
4.3.3. Manipulation Check: Metacognitive Engagement During GenAI Use
To verify that the metacognitive prompting manipulation increased reflective regulation during GenAI use (rather than simply adding instructional text), we included a process-focused manipulation check assessing participants’ metacognitive engagement while completing the task. Specifically, we adapted a short subset of items from the Metacognitive Self-Regulation component of the Motivated Strategies for Learning Questionnaire (MSLQ), which targets planning, monitoring, and strategy regulation during learning [
25]. These items were framed as state reports tied to the task episode (e.g., monitoring understanding, checking progress against criteria, adjusting approach when needed). Higher scores indicate stronger metacognitive engagement during the task, which should be higher in the metacognitive prompting condition. The scale consisted of 6 items and showed acceptable internal consistency (Cronbach’s α = 0.87).
As an additional fidelity check, participants answered a brief recall/recognition item indicating whether they received structured prompts that required planning, verification, and reflection during GenAI use (yes/no), consistent with the practice of documenting treatment receipt in randomized experiments [
18].
4.3.4. Control Variables
To reduce omitted-variable bias and enhance interpretability, analyses included standard background covariates that could plausibly influence anxiety and efficacy in GenAI-assisted learning: gender, age, year of study, self-reported GPA (or prior semester average), and prior GenAI experience (frequency of use for academic tasks). Given the conceptual relevance of digital/AI literacy to responsible GenAI use, we additionally included a brief AI literacy measure when survey length allowed, drawing on validated AI literacy measurement work [
26].
4.3.5. Data Quality and Attention Checks
Because online or computer-based experiments can be affected by satisficing and inattentive responding, the survey embedded an instructional manipulation check (IMC) item to identify participants who did not follow basic instructions [
27]. Responses failing the IMC (and other pre-registered exclusion rules such as extremely short completion times) were excluded prior to hypothesis testing.
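Operationally, these exclusion rules amount to a simple filtering step over the raw response file. The sketch below is illustrative: the records, column names, and the 300-second completion-time floor are assumptions for demonstration, not the study’s actual pre-registered cutoffs:

```python
import pandas as pd

# Hypothetical raw responses; column names and values are illustrative.
raw = pd.DataFrame({
    "participant": [1, 2, 3, 4, 5, 6],
    "imc_passed":  [True, True, False, True, True, True],
    "duration_s":  [1450, 1600, 1500, 110, 1700, 1580],
    "condition":   ["prompt", "control", "prompt", "control",
                    "prompt", "control"],
})

MIN_DURATION_S = 300  # assumed floor for a plausible completion time

# Retain only cases that passed the IMC and met the duration floor
valid = raw[raw["imc_passed"] & (raw["duration_s"] >= MIN_DURATION_S)]
print(f"retained {len(valid)} of {len(raw)} cases")
```

Applying both rules before hypothesis testing, as described above, prevents inattentive responding from diluting condition contrasts.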
Operationally, scale reliability was evaluated using Cronbach’s α and composite reliability, and construct validity was assessed via confirmatory factor analysis, consistent with common practice in education technology and psychology measurement when using multi-item latent constructs [
28,
29].
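As an illustration of the reliability computation, Cronbach’s α can be computed directly from an item-score matrix. The data below are simulated from a single-factor model; the resulting α is not the study’s estimate, and composite reliability and CFA would be computed separately with a latent-variable package:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
# Simulate 148 respondents x 6 items driven by one latent factor
latent = rng.normal(0, 1, (148, 1))
items = latent + rng.normal(0, 0.7, (148, 6))

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

The formula is the standard k/(k−1) × (1 − Σ item variances / total-score variance), which yields high α when items covary strongly, as expected for unidimensional scales like those reported above.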
4.4. Participants and Sampling
The study was implemented as a field experiment in a face-to-face classroom setting within a university course that regularly involves writing- and synthesis-based academic tasks. Participants were enrolled students who were present on the experimental day and agreed to participate voluntarily. Participation was not tied to course grades; when incentives were used, they took the form of small participation credits or gifts, equivalent across conditions to avoid differential motivation. Consistent with guidance on strengthening causal inference in field settings, the design prioritized random assignment and standardized delivery to support internal validity while maintaining ecological realism [
30].
Eligibility criteria required that students (a) were currently enrolled in the course, (b) had no prior participation in pilot sessions, and (c) had basic familiarity with using GenAI tools for academic purposes (self-reported), to ensure that the task reflected authentic GenAI-assisted learning rather than first-time tool onboarding. Because classroom experiments can be affected by non-independence (e.g., classmates influencing each other), the sampling plan also recorded section membership and seating arrangement (when available) to enable sensitivity analyses for clustering [
31].
A priori sample size planning followed conventional power analysis principles for experimental designs [
32]. Where feasible, the minimum target sample was determined using G*Power (version 3.1.9.7) under assumptions of small-to-medium effects commonly observed in instructional interventions [
33]. To preserve statistical power under realistic classroom attrition (e.g., absences, incomplete surveys), the recruitment target included a buffer above the minimum required N [
32].
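For transparency, the a priori calculation can be sketched as follows. The effect size (d = 0.5), power target (0.80), and 15% attrition buffer are illustrative assumptions rather than the study's registered inputs; the computation mirrors G*Power's independent-samples t-test module.

```python
# Illustrative a priori power calculation (mirrors G*Power's two-group
# t-test module); the effect size, power target, and attrition buffer
# below are assumptions for demonstration, not registered study values.
from math import ceil

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Minimum n per group to detect a small-to-medium effect (d = 0.5)
# with 80% power at alpha = .05, two-tailed.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05,
                                   power=0.80, ratio=1.0,
                                   alternative='two-sided')

# Add a ~15% buffer for classroom attrition (absences, incomplete surveys).
recruit_target = ceil(n_per_group * 1.15) * 2
print(ceil(n_per_group), recruit_target)  # 64 148
```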
All procedures followed standard human-subjects protections for educational research: participants received an information sheet describing the purpose, procedures, risks, confidentiality, and the voluntary nature of participation, and provided informed consent prior to data collection. Data were anonymized using unique codes, stored securely, and analyzed in aggregate form.
4.5. Experimental Procedure
The experiment was administered during a regular class meeting and followed a standardized protocol to reduce instructor effects and minimize condition contamination. At the start of the session, the instructor (or research assistant) introduced the activity using a scripted statement that avoided revealing hypotheses. Participants then completed a brief pre-task questionnaire capturing demographics, prior GenAI use frequency, and baseline task confidence. Collecting these measures before the manipulation reduces post-treatment bias and supports covariate adjustment if needed [
30].
Participants were randomly assigned to one of two conditions: metacognitive prompting versus control. Randomization was conducted at the individual level within the classroom using an alternating assignment procedure based on seating order, such that students in odd-numbered positions were assigned to the prompting condition and those in even-numbered positions to the control condition. This approach ensured approximately balanced group sizes while remaining feasible in a classroom setting [
30]. To limit diffusion across conditions, students were instructed not to discuss materials during the activity; when feasible, seating gaps were used, and condition materials were presented in visually identical formats to avoid drawing attention. The instructor circulated during the task to enforce quiet work and to answer procedural questions without providing substantive assistance.
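The alternating seating-order assignment can be expressed as a minimal sketch (illustrative; the function and condition labels are ours):

```python
# Minimal sketch of the seating-order alternating assignment:
# odd-numbered positions -> metacognitive prompting, even -> control.
def assign_conditions(n_students: int) -> list[str]:
    """Return a condition label for each seating position (1-indexed)."""
    return ["prompting" if pos % 2 == 1 else "control"
            for pos in range(1, n_students + 1)]

groups = assign_conditions(7)
# Group sizes differ by at most one, so they remain approximately balanced.
print(groups.count("prompting"), groups.count("control"))  # 4 3
```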
The learning task (described in
Section 4.2) was completed within a fixed time window under equivalent GenAI access conditions. During the task, participants could interact freely with the GenAI tool by entering prompts, requesting clarifications, or revising outputs as needed. No predefined prompts were imposed in the control condition, allowing naturalistic variation in AI use behavior. In the metacognitive prompting condition, prompts were embedded at pre-specified time points (pre-use, mid-task, post-use) to activate planning, monitoring, and evaluation processes. In the control condition, participants received neutral task instructions of comparable length without metacognitive requirements.
At the end of the session, participants were debriefed with a short explanation of the study purpose and contact information for follow-up questions. Reporting of participant flow (enrolled, consented, excluded, analyzed) followed CONSORT-inspired transparency practices adapted to behavioral and educational experiments [
34].
4.6. Data Analysis Strategy
Hypotheses were tested using an analysis plan aligned with randomized experiments and the measurement structure of multi-item constructs. First, descriptive statistics and internal consistency reliability (Cronbach’s α) were computed for each scale, followed by confirmatory factor analysis (CFA) to verify discriminant validity between learning anxiety and academic self-efficacy before hypothesis testing, consistent with established measurement practice in psychology and education research [
35].
To test H1 and H2, we estimated the effect of condition (metacognitive prompting vs. control) on learning anxiety and academic self-efficacy using ordinary least squares regression (or ANCOVA), reporting unstandardized coefficients, standardized effects, and confidence intervals. Covariates such as prior GenAI use frequency and baseline task confidence were included as controls to improve precision without compromising randomization, as commonly recommended in experimental analysis [
18]. Effect sizes were reported using Cohen's d (or partial η² for ANCOVA) to support interpretation and comparison across studies [
32].
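An illustrative sketch of this estimation strategy is given below; it uses simulated data and hypothetical column names (condition, anxiety, prior_use, baseline_conf) and is not the study's analysis code.

```python
# Sketch of the H1/H2 analysis: condition effect on a post-task outcome
# with precision covariates, plus Cohen's d from group summaries.
# Data are simulated for demonstration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 148
df = pd.DataFrame({
    "condition": np.repeat([1, 0], n // 2),   # 1 = prompting, 0 = control
    "prior_use": rng.integers(1, 6, n),       # prior GenAI use frequency
    "baseline_conf": rng.normal(4.5, 1.0, n),
})
# Simulate anxiety lowered by the prompting condition.
df["anxiety"] = 3.8 - 0.7 * df["condition"] + rng.normal(0, 1.1, n)

# ANCOVA-equivalent OLS: treatment effect adjusted for covariates.
model = smf.ols("anxiety ~ condition + prior_use + baseline_conf",
                data=df).fit()
b_condition = model.params["condition"]

# Cohen's d from group means and a pooled SD (equal group sizes).
g1 = df.loc[df.condition == 1, "anxiety"]
g0 = df.loc[df.condition == 0, "anxiety"]
pooled_sd = np.sqrt((g1.var(ddof=1) + g0.var(ddof=1)) / 2)
d = (g1.mean() - g0.mean()) / pooled_sd
print(round(b_condition, 2), round(d, 2))
```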
To test H3, we examined the association between learning anxiety and academic self-efficacy in the post-task data, controlling for condition and relevant pre-task covariates. Although randomization supports causal inference for the treatment effect, the anxiety–efficacy relationship is correlational within the post-treatment state; therefore, interpretation was framed accordingly.
To test the mediation hypothesis (H4), we estimated the indirect effect of condition on academic self-efficacy through learning anxiety using bias-corrected bootstrap confidence intervals, which are widely recommended for mediation testing due to fewer distributional assumptions for indirect effects [
36]. Where classroom structure implied potential clustering (e.g., multiple class sections), we conducted sensitivity checks using multilevel modeling with random intercepts for class/section to account for non-independence [
31]. Results were considered robust if conclusions remained consistent across single-level and multilevel specifications.
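The bias-corrected bootstrap logic for the indirect effect can be sketched on simulated data as follows; this illustrates the PROCESS Model 4 approach rather than reproducing the study's code.

```python
# Minimal sketch of the bias-corrected (BC) bootstrap for the indirect
# effect a*b (condition -> anxiety -> self-efficacy). Simulated data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 148
x = np.repeat([1.0, 0.0], n // 2)              # condition
m = -0.6 * x + rng.normal(0, 1, n)             # anxiety (mediator)
y = 0.15 * x - 0.4 * m + rng.normal(0, 1, n)   # self-efficacy

def indirect(x, m, y):
    # a path: slope of m on x;  b path: coefficient of m in y ~ m + x.
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return a * b

ab_hat = indirect(x, m, y)
boot = np.empty(5000)
for i in range(5000):
    idx = rng.integers(0, n, n)                # resample cases with replacement
    boot[i] = indirect(x[idx], m[idx], y[idx])

# Bias correction: shift the percentile endpoints by z0.
z0 = norm.ppf((boot < ab_hat).mean())
lo = norm.cdf(2 * z0 + norm.ppf(0.025)) * 100
hi = norm.cdf(2 * z0 + norm.ppf(0.975)) * 100
ci = np.percentile(boot, [lo, hi])
print(round(ab_hat, 3), np.round(ci, 3))
```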
Missing data were minimal (<5%) and were handled using listwise deletion, as the pattern of missingness did not indicate systematic bias. All analyses used two-tailed tests with α = 0.05, and robustness checks included re-estimation after excluding inattentive respondents based on the attention check criteria [
27].
To enhance analytical transparency, additional details of the statistical procedures are provided. Regression analyses were conducted using ordinary least squares (OLS) models, with condition (metacognitive prompting vs. control) as the primary independent variable and post-task outcomes as dependent variables. Where appropriate, ANCOVA specifications included relevant pre-task covariates (e.g., prior GenAI use and baseline task confidence) to improve estimation precision.
For the mediation analysis, we employed the PROCESS macro (Model 4) with 5000 bias-corrected bootstrap resamples to estimate indirect effects and corresponding 95% confidence intervals. Assumptions of linear regression, including normality of residuals, homoscedasticity, and absence of multicollinearity, were examined and found to be within acceptable ranges.
Given the limited number of primary hypotheses, no formal correction for multiple comparisons was applied; however, effect sizes and confidence intervals are reported to support interpretation of the results.
All variables included in the mediation model were measured at the post-task stage. Therefore, the model should be interpreted as a cross-sectional mediation model, and the directionality is theoretically inferred rather than empirically established through temporal separation. Accordingly, the mediation results should be understood as suggestive and consistent with an indirect-effect pattern rather than providing strong causal evidence.
To further assess the potential influence of common method bias, Harman’s single-factor test was conducted. All measurement items were entered into an exploratory factor analysis using unrotated principal component extraction. This test evaluates whether a single factor accounts for the majority of covariance among the variables, which would indicate a potential common method bias concern [
37].
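A minimal sketch of Harman's single-factor test, using simulated item responses, is:

```python
# Sketch of Harman's single-factor test: unrotated principal-component
# extraction over all items; a concern is flagged if the first component
# explains a majority (> 50%) of the variance. Item data are simulated.
import numpy as np

rng = np.random.default_rng(7)
n, k = 148, 10     # respondents x items (two 5-item scales)
# Two uncorrelated latent factors, five items each.
f1, f2 = rng.normal(size=n), rng.normal(size=n)
items = np.column_stack(
    [0.8 * f1 + 0.6 * rng.normal(size=n) for _ in range(5)] +
    [0.8 * f2 + 0.6 * rng.normal(size=n) for _ in range(5)]
)

# Unrotated PCA via eigendecomposition of the correlation matrix.
corr = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
first_share = eigvals[0] / eigvals.sum()

print(round(first_share, 3), first_share > 0.50)
```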
As an additional robustness check, the mediation model was also examined using a structural equation modeling (SEM) approach with latent variables. This approach allows simultaneous estimation of measurement and structural relationships and provides a complementary assessment of the mediation pattern.
Although initial evidence supports comparability across conditions, full measurement invariance (e.g., metric or scalar invariance) was not formally tested and should be examined in future research.
4.7. Reliability and Validity (Measurement Quality)
Because learning anxiety and academic self-efficacy were modeled as multi-item latent constructs, we evaluated measurement quality prior to hypothesis testing using a standard reliability–validity sequence. Internal consistency reliability was first assessed using Cronbach’s alpha, which remains a conventional indicator for scale reliability in behavioral research [
38]. Given known limitations of alpha under certain congeneric measurement conditions, we also examined composite reliability (CR) estimates derived from the confirmatory factor model, which are commonly recommended for latent-variable research [
39].
Convergent validity was assessed by inspecting standardized factor loadings and the average variance extracted (AVE). AVE values were interpreted as evidence that the latent construct explains a substantive proportion of variance in its indicators [
40]. Discriminant validity between learning anxiety and academic self-efficacy was evaluated using both the Fornell–Larcker criterion and the heterotrait–monotrait (HTMT) ratio. HTMT has been recommended as a more sensitive diagnostic for discriminant validity problems, particularly when constructs are conceptually related [
41]. The measurement model was estimated via confirmatory factor analysis (CFA) and evaluated using commonly reported fit indices (e.g., CFI, TLI, RMSEA, SRMR), following established CFA/SEM reporting conventions [
35]. When interpreting global fit, we referenced widely cited practical guidelines for approximate fit indices [
42], while prioritizing theoretical coherence and parameter plausibility over rigid thresholding [
35].
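The reliability and validity indices described in this subsection can be computed directly from standardized loadings and item correlations; the following sketch uses illustrative values rather than the study's estimates.

```python
# Sketch of the CR, AVE, and HTMT computations; loading and correlation
# values below are illustrative, not the study's estimates.
import numpy as np

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))."""
    lam = np.asarray(loadings)
    num = lam.sum() ** 2
    return num / (num + (1 - lam**2).sum())

def ave(loadings):
    """Average variance extracted: mean of squared loadings."""
    lam = np.asarray(loadings)
    return (lam**2).mean()

def htmt(corr, idx_a, idx_b):
    """HTMT: mean heterotrait correlation divided by the geometric mean
    of the two constructs' mean monotrait (within-scale) correlations."""
    R = np.abs(np.asarray(corr))
    hetero = R[np.ix_(idx_a, idx_b)].mean()
    tri = lambda block: block[np.triu_indices_from(block, k=1)].mean()
    mono_a = tri(R[np.ix_(idx_a, idx_a)])
    mono_b = tri(R[np.ix_(idx_b, idx_b)])
    return hetero / np.sqrt(mono_a * mono_b)

anxiety_loadings = [0.70, 0.75, 0.80, 0.78, 0.72]   # illustrative
print(round(composite_reliability(anxiety_loadings), 2),
      round(ave(anxiety_loadings), 2))
```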
Given the experimental context, we also examined whether the measurement model behaved comparably across the two conditions (metacognitive prompting vs. control). At minimum, we checked whether the factor structure was stable across groups (configural equivalence) and whether key item loadings were broadly consistent, because meaningful comparisons of condition means require that constructs are measured similarly across groups [
43]. Where sample size permitted, we conducted multi-group CFA sensitivity checks to ensure that substantive conclusions were not driven by measurement artifacts [
35].
Before hypothesis testing, we evaluated the measurement model for learning anxiety and academic self-efficacy using confirmatory factor analysis (CFA). The two-factor model demonstrated acceptable-to-good fit (χ²(34) = 48.60, CFI = 0.97, TLI = 0.96, RMSEA = 0.05, SRMR = 0.04), consistent with commonly cited practical guidelines [
42], while prioritizing theoretical interpretability [35].
Table 2 summarizes the CFA model fit, standardized factor loadings, and reliability/validity evidence for the study constructs.
Standardized factor loadings were all significant and exceeded 0.65, supporting convergent validity [
40]. Internal consistency reliability was satisfactory, with Cronbach’s α = 0.88 for learning anxiety and α = 0.90 for academic self-efficacy [
38]. Composite reliability values were CR = 0.89 (anxiety) and CR = 0.91 (self-efficacy), supporting reliability under a latent-variable perspective [
39].
Discriminant validity between the two constructs was supported by the Fornell–Larcker criterion [
40] and the HTMT ratio (HTMT = 0.55), both of which fell below the recommended cutoffs [
41]. As a sensitivity check, the factor structure was examined across conditions; the results supported comparable factor structure across the prompting and control groups at the configural level, suggesting that the constructs were measured similarly across conditions and providing initial support for the comparability of group-level analyses [
43].
4.8. Ethical Considerations
This study was reviewed and approved by the Human Research Ethics Committee of Fujian University of Technology (Approval Code: FJUT-HREC092/2025). The classroom experiment was designed as minimal-risk educational research and was guided by established human-subject and educational-research ethics principles. Specifically, the study followed the principles of respect for persons, beneficence, and justice articulated in the Belmont Report, which are commonly operationalized through voluntary participation, informed consent, risk minimization, and fair treatment of participants. In addition, the study was conducted in a manner consistent with the Declaration of Helsinki and with contemporary guidance for educational research emphasizing transparency, the right to withdraw, confidentiality, and researchers’ duty of care. Participation was voluntary, and students were explicitly informed that declining or withdrawing would not affect course standing. Prior to data collection, participants received a written information sheet describing the study purpose, procedures, potential risks (e.g., mild discomfort during a timed academic task), confidentiality protections, and researcher contact information, and informed consent was obtained from all participants.
Given the study’s focus on GenAI use, additional safeguards were incorporated to address privacy, transparency, and responsible technology use. Participants were instructed not to enter personally identifiable information into any GenAI interface during the task, and the research materials emphasized that AI-generated outputs required verification rather than uncritical acceptance. Data were collected using anonymous participant codes, stored on password-protected devices, and reported only in aggregate form. These safeguards were implemented to reduce privacy risk and support responsible participation in an AI-mediated learning environment, consistent with UNESCO’s guidance that generative AI use in education and research should be human-centred and should prioritize privacy protection, transparency, and capacity-building [
9].
At debriefing, participants were informed of the purpose of the study at a level that preserved transparency without compromising the integrity of ongoing data collection across class sections. Any student questions were addressed, and contact details were provided for follow-up support. This procedure was intended to balance transparency with the methodological need to minimize contamination across conditions in a classroom-based experiment [
44].
4.9. Manipulation Check Results (Treatment Fidelity)
To verify that the intervention functioned as intended, we assessed whether students in the metacognitive prompting condition reported higher metacognitive engagement during GenAI use than students in the control condition. This manipulation check focused on process activation—planning, monitoring, and evaluation—rather than mere exposure to additional text, reflecting the theoretical definition of metacognitive prompting as an SRL scaffold [
11,
12]. The manipulation check scale was administered immediately after the task to capture participants’ regulation during the learning episode and to minimize retrospection bias.
Manipulation-check differences between conditions were examined using independent-samples
t-tests and/or ANCOVA (with baseline task confidence and prior GenAI experience as precision covariates), consistent with standard practice in randomized experiments [
18]. We reported effect sizes (Cohen’s d) alongside confidence intervals to quantify the magnitude of the manipulation effect [
32]. Evidence of successful manipulation was defined as a statistically meaningful and substantively non-trivial increase in metacognitive engagement in the metacognitive prompting condition, which would be consistent with prior findings that metacognitive prompts can elevate SRL-related processes in technology-mediated learning environments [
8,
13]. In addition, a brief recognition item confirmed whether participants noticed receiving structured prompts requiring goal setting, verification, and reflection; this provided a complementary treatment-receipt check for classroom implementation fidelity [
18].
If the manipulation check did not show the expected condition difference, we treated this as a fidelity concern and interpreted hypothesis tests with caution, consistent with recommendations to distinguish theory failure from implementation failure in experimental evaluations [
18]. Where appropriate, we also conducted sensitivity analyses excluding participants who failed attention checks, because inattentive responding can attenuate both manipulation checks and outcome effects [
27].
5. Results
This section presents the results in relation to each hypothesis, with detailed statistical estimates reported in the corresponding tables.
5.1. Descriptive Statistics
Descriptive statistics, internal consistency reliability, and bivariate correlations among the study variables are presented in
Table 3.
5.2. Manipulation Check
To assess treatment fidelity, we tested whether participants in the metacognitive prompting condition reported stronger metacognitive engagement during the GenAI-assisted task than those in the control condition.
Table 4 reports the manipulation check results.
Students in the metacognitive prompting condition reported significantly higher metacognitive engagement (M = 5.62, SD = 0.74) than students in the control condition (M = 4.65, SD = 0.83), t(146) = 7.50,
p < 0.001, with a large effect size (d = 1.23) [
32]. This pattern remained substantively unchanged when adjusting for baseline task confidence and prior GenAI experience, indicating that the prompting manipulation successfully increased reflective regulation during GenAI-assisted learning as intended.
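As a transparency check, the reported statistics can be reproduced from the summary values alone, assuming equal group sizes (74 per condition, consistent with df = 146):

```python
# Recovering the reported t and Cohen's d from summary statistics,
# assuming equal group sizes (74 per condition, implied by df = 146).
import numpy as np
from scipy.stats import ttest_ind_from_stats

m1, s1, n1 = 5.62, 0.74, 74    # metacognitive prompting condition
m0, s0, n0 = 4.65, 0.83, 74    # control condition

t, p = ttest_ind_from_stats(m1, s1, n1, m0, s0, n0, equal_var=True)
pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n0 - 1) * s0**2) / (n1 + n0 - 2))
d = (m1 - m0) / pooled_sd

print(round(t, 2), round(d, 2))  # 7.5 1.23
```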
5.3. Effects on Learning Anxiety and Academic Self-Efficacy (H1–H2)
Hypothesis tests were conducted using OLS regression (ANCOVA-equivalent specifications) with condition as the focal predictor. Supporting H1, metacognitive prompting significantly reduced learning anxiety relative to the control condition. Specifically, the prompting group reported lower anxiety (M = 3.10, SD = 1.05) than the control group (M = 3.78, SD = 1.12), corresponding to b = −0.68, SE = 0.18, t = −3.81,
p < 0.001, 95% CI [−1.03, −0.33]. The magnitude of the between-group difference was moderate (d = −0.63; [
32]). Results were substantively unchanged when including baseline task confidence and prior GenAI use as precision covariates.
Supporting H2, metacognitive prompting significantly increased (i.e., sustained at a higher level) academic self-efficacy relative to the control condition. The prompting group reported higher self-efficacy (M = 5.20, SD = 0.88) than the control group (M = 4.80, SD = 0.92), corresponding to b = 0.40, SE = 0.15, t = 2.70,
p = 0.008, 95% CI [0.11, 0.69]. The effect size was small-to-moderate (d = 0.44; [
32]). These findings indicate that metacognitive prompting improved students’ psychological experience of GenAI-assisted learning by lowering anxiety and supporting confidence in academic capability.
5.4. Association Between Anxiety and Self-Efficacy (H3)
To test H3, we examined the association between learning anxiety and academic self-efficacy in post-task data while controlling for condition and relevant pre-task covariates. Learning anxiety was negatively associated with academic self-efficacy (b = −0.42, SE = 0.06, t = −7.00,
p < 0.001), consistent with Control–Value Theory’s view that anxiety emerges under reduced perceived control and can undermine adaptive learning-related beliefs [
5].
5.5. Mediation Analysis (H4)
To test H4, we estimated the indirect effect of metacognitive prompting on academic self-efficacy through learning anxiety using bias-corrected bootstrap confidence intervals, as recommended for mediation inference [
36]. The indirect effect was positive and statistically significant (ab = 0.26), with a 95% bootstrap CI [0.12, 0.43], indicating a significant indirect association between metacognitive prompting and academic self-efficacy via learning anxiety.
The direct effect of condition on academic self-efficacy remained significant but smaller after including anxiety (c′ = 0.14, SE = 0.06, p = 0.018), which is consistent with a partial mediation pattern.
However, as both learning anxiety and academic self-efficacy were measured concurrently at the post-task stage, these findings should be interpreted with caution. The observed relationships are consistent with a mediation mechanism, but do not provide definitive evidence of a causal mediating process.
These findings are consistent with theoretical perspectives suggesting that metacognitive prompting may be associated with improved perceived control and evaluative regulation, which in turn relate to lower anxiety and higher self-efficacy [
5,
7].
The SEM-based robustness check produced a pattern of results consistent with the regression-based mediation analysis, with the indirect effect remaining significant and in the same direction.
To further evaluate sample adequacy, a post hoc power analysis was conducted based on the observed effect sizes using G*Power [
33]. The achieved statistical power was 0.95 for learning anxiety (d = 0.63) and 0.83 for academic self-efficacy (d = 0.44), both exceeding the conventional 0.80 threshold and indicating that the sample size was adequate to detect the observed effects.
A summary of the hypothesis testing results is provided in
Table 5.
5.6. Robustness and Sensitivity Checks
Several robustness checks were conducted. First, the main effects (H1–H2) were re-estimated after excluding participants who failed the attention check; conclusions were unchanged, suggesting that inattentive responding did not drive results [
27]. Second, because classroom data may involve partial clustering (e.g., by class section), we estimated sensitivity models using random intercepts for section where applicable; the direction and statistical conclusions of the main effects remained stable [
31]. Third, alternative specifications (standardized outcomes; models with and without covariates) yielded consistent patterns, increasing confidence in the robustness of the reported effects [
18].
Table 6 summarizes the sensitivity analyses.
Taken together, the robustness checks are consistent with the main findings, indicating that the observed effects are stable across alternative specifications and samples.
As an exploratory analysis, we examined whether the effects of metacognitive prompting varied by prior GenAI use and baseline task confidence. Interaction terms between condition and these variables were tested; however, no consistent or statistically significant moderation effects were observed. This pattern suggests that the main effects of the intervention were relatively stable across different levels of prior experience and initial confidence.
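The exploratory moderation test can be sketched as condition × covariate interaction terms in an OLS model (simulated data; illustrative column names):

```python
# Sketch of the exploratory moderation analysis: condition x covariate
# interaction terms in OLS. Data simulated; column names illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 148
df = pd.DataFrame({
    "condition": np.repeat([1, 0], n // 2),
    "prior_use": rng.integers(1, 6, n).astype(float),
    "baseline_conf": rng.normal(4.5, 1.0, n),
})
# Main effect only -- no built-in moderation, mirroring the null pattern.
df["self_efficacy"] = 4.8 + 0.4 * df["condition"] + rng.normal(0, 0.9, n)

# '*' expands to both main effects plus the interaction term.
fit = smf.ols("self_efficacy ~ condition * prior_use"
              " + condition * baseline_conf", data=df).fit()
interaction_p = fit.pvalues[["condition:prior_use",
                             "condition:baseline_conf"]]
print(interaction_p.round(3))
```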
6. Discussion
This classroom experiment examined whether metacognitive prompting for responsible GenAI use was associated with more favorable psychological responses during GenAI-assisted learning. Students in the prompting condition reported higher metacognitive engagement, lower learning anxiety, and higher academic self-efficacy than those in the control condition. The mediation analysis further suggested an indirect pathway through learning anxiety. However, because anxiety and self-efficacy were measured at the same post-task time point, this pathway should be interpreted cautiously as theoretically consistent rather than as definitive evidence of causal mediation.
From a theoretical perspective, these findings align with Control–Value Theory, suggesting that metacognitive prompts may be associated with enhanced perceived control by clarifying goals, verification steps, and ownership, which in turn relate to lower anxiety under GenAI-related uncertainty [
5,
6]. Similarly, the observed association with academic self-efficacy is consistent with social cognitive theory and self-regulated learning perspectives, where prompting may encourage mastery-oriented regulation, including planning, monitoring, and self-evaluation—allowing GenAI to function as guided support rather than a substitute for competence [
4,
11,
12].
More specifically, the role of metacognitive prompting in this context can be understood by considering the unique cognitive and affective challenges introduced by GenAI-assisted learning. Unlike traditional learning tools, GenAI systems often produce fluent and seemingly authoritative outputs, which may increase the risk of over-reliance or uncritical acceptance of information, particularly when outputs contain subtle inaccuracies or “hallucinations”. In such situations, learners may experience uncertainty regarding the reliability of AI-generated content, which can reduce perceived control and elevate anxiety.
Metacognitive prompting may help mitigate these challenges by encouraging learners to actively evaluate the credibility of AI-generated outputs, reflect on their reasoning processes, and maintain a sense of authorship over their work. By structuring interaction with GenAI through planning, monitoring, and verification steps, prompts can support perceived control and reduce uncertainty, thereby contributing to lower anxiety and more stable self-efficacy.
Metacognitive prompts appear to intervene at this critical point by structuring learners’ interaction with GenAI. For example, prompts that require planning (e.g., clarifying task goals and intended use of AI), monitoring (e.g., checking consistency and credibility of AI outputs), and evaluation (e.g., reflecting on how AI-generated content aligns with one’s own reasoning) may shift learners from passive consumption to active regulation. This shift is theoretically consistent with Control–Value Theory, as clearer task framing and verification processes can enhance perceived control, thereby reducing anxiety.
At the same time, these prompts align with self-regulated learning (SRL) frameworks by supporting mastery-oriented processes. Rather than replacing cognitive effort, GenAI is positioned as a tool that requires regulation—encouraging learners to engage in verification, integration, and self-evaluation. In this sense, metacognitive prompting may reduce the tendency to over-rely on fluent AI outputs and instead promote more deliberate and reflective use, which helps sustain self-efficacy in completing the task.
The pattern of results is also compatible with the possibility of multiple pathways, including an affective route (lower anxiety) and a competence-related route (enhanced regulation). However, these pathways should be interpreted as theoretically plausible rather than empirically established, given the cross-sectional nature of the mediation analysis and the reliance on post-task measures [
7,
8,
13].
From a sustainability-oriented educational technology perspective, the findings highlight the importance of psychological sustainability as a core dimension of sustainable higher education in AI-rich learning environments. Specifically, reduced learning anxiety and enhanced academic self-efficacy may support sustained learner engagement, resilience, and equitable participation over time.
However, it is important to note that the present study does not directly assess behavioral indicators of responsible GenAI use (e.g., prompt quality, verification behavior, output accuracy, or actual dependence on AI). Therefore, the findings should be interpreted as evidence of psychological conditions that may support responsible use, rather than as demonstrating direct improvements in responsible GenAI competence.
In this sense, responsible GenAI use can be understood not only as a matter of compliance or performance optimization, but also as dependent on underlying psychological conditions that support learners’ engagement, confidence, and self-regulated learning within structured GenAI-assisted tasks [
9]. At the same time, the effectiveness of such prompting strategies may vary depending on task type, stakes, and student readiness, highlighting the importance of tailoring prompt content and timing rather than applying a uniform approach [
8,
13].
7. Implications
Given the short-term and task-specific nature of the present study, the following implications should be interpreted as exploratory and context-bound rather than broadly generalizable. The findings suggest that metacognitive prompting may serve as a useful pedagogical strategy for supporting students’ psychological experiences during GenAI-assisted learning, particularly in terms of reducing anxiety and sustaining self-efficacy within structured tasks.
More broadly, responsible GenAI use can be treated as a capability-building agenda in sustainable higher education rather than only a rule-compliance issue, consistent with UNESCO's guidance on human-centered adoption and risk-aware capacity development [
9]. Below, we highlight concise theoretical, practical, and policy implications.
Theoretically, the results of this study extend self-regulated learning (SRL) to GenAI-assisted learning by showing that metacognitive prompting can activate planning, monitoring, and evaluation processes that regulate not only one’s cognition but also reliance on AI outputs [
11,
12]. The anxiety reduction effect is also consistent with Control–Value Theory, suggesting that prompts may stabilize perceived control by clarifying criteria and verification steps during AI use [
5]. Finally, the partial mediation indicates that responsible GenAI interventions should be evaluated through emotion–motivation pathways: lowering anxiety can help sustain self-efficacy, a core motivational resource in academic learning [
4].
Practically, instructors can embed short prompts at three points—before (goal/boundary setting), during (verification and monitoring), and after (reflection and authorship clarification)—to reduce ambiguity and support learner ownership. This approach aligns with evidence that well-timed, specific metacognitive prompts strengthen regulation in technology-mediated learning [
8,
13]. In assessment, adding lightweight process evidence (e.g., brief verification notes or an AI-use reflection) can reinforce responsible practice without banning GenAI, and may reduce anxiety by clarifying expectations [
5].
At the institutional level, the results of this study support combining policies with training and templates that operationalize responsible use (e.g., prompt banks, verification checklists, common disclosure language). Such capacity-building measures reduce cross-course ambiguity and promote equitable, sustainable GenAI adoption aligned with quality education goals [
9].
Importantly, framing these outcomes through the lens of psychological sustainability shifts the focus from short-term performance gains to long-term learner adaptability, resilience, and inclusion in AI-mediated education.
8. Limitations and Future Research
Several limitations should be considered when interpreting the findings of this study. First, the study relied on post-task self-report measures collected at a single time point, which raises concerns about common method bias and demand effects. The absence of pre-test measures for the key outcomes (learning anxiety and academic self-efficacy) limits the ability to assess change attributable to the intervention, and because the mediator and outcome variables were measured concurrently, the temporal ordering implied by the mediation model cannot be established empirically; the mediation results should therefore be read as suggestive rather than as strong causal evidence. Although the experimental manipulation supports internal validity, future research should adopt multi-wave or pre–post designs and incorporate multiple data sources to strengthen causal inference and reduce method-related biases. Greater transparency regarding regression specifications, assumption checks, and the handling of multiple related tests would further strengthen confidence in the findings. In addition, although configural equivalence provided initial evidence of comparability across conditions, full measurement invariance (e.g., metric or scalar invariance) was not formally tested and should be examined in future research.
Second, the experiment was conducted as a single-site, face-to-face classroom study, which strengthens ecological validity but constrains generalizability. Student characteristics, institutional norms, and instructor expectations may differ across universities, disciplines, and cultural contexts, potentially influencing both baseline anxiety and how metacognitive prompts are received. Future research should replicate the design across multiple institutions and course types to test external validity and identify boundary conditions [
18].
Third, the study focused on a single task episode and assessed outcomes immediately after completion. Although this design is appropriate for estimating short-term causal effects, it cannot determine whether benefits persist over time or translate into durable changes in learning strategies, AI reliance calibration, or academic performance. Longitudinal or multi-wave designs (e.g., repeated prompting over a semester with follow-up measures) would allow researchers to test sustainability claims more directly, including whether self-efficacy gains are maintained and whether anxiety reductions generalize across tasks and assessments [
5].
Fourth, the measures relied primarily on self-report scales, which are suitable for capturing subjective anxiety and efficacy but may be vulnerable to common method bias and social desirability—especially in contexts where AI use is normatively sensitive. Future work should combine self-reports with behavioral indicators such as prompt logs, verification actions, time-on-task, rubric-based quality ratings, and instructor evaluations. Such triangulation would also clarify whether metacognitive prompting improves learning outcomes through better verification and reasoning, as implied by SRL theory [
11,
28].
Fifth, although the prompting manipulation increased metacognitive engagement, the study did not isolate which prompt components were most influential. The intervention bundled planning, monitoring, and evaluation prompts to reflect SRL cycles, but different components may have different effects on anxiety versus self-efficacy. Future research can use factorial or component analysis designs to compare, for example, verification-focused prompts versus reflection-focused prompts, and to test whether prompt timing (pre-use vs. mid-task vs. post-use) moderates outcomes [
8,
13].
Sixth, GenAI effects likely vary by task difficulty, assessment stakes, and discipline. The present task was designed to be academically realistic, yet GenAI-assisted learning in quantitative problem solving, programming, or high-stakes exams may elicit different anxiety–efficacy dynamics. Future studies should test the model under varied task types and stakes, and examine whether policy clarity (e.g., explicit versus ambiguous acceptable-use rules) amplifies or attenuates the prompting effect, consistent with the control component emphasized by Control–Value Theory [
5].
Finally, while the study framed responsible GenAI use as a sustainability-relevant capability, broader sustainability outcomes—such as equity impacts across student subgroups, differential benefits for low-confidence learners, and the relationship between prompting and digital/AI literacy development—were not directly tested. Future research should examine heterogeneity (e.g., prior GenAI experience, baseline anxiety, academic preparedness) and incorporate AI literacy measures to assess whether prompting benefits are stronger for students who most need scaffolding [
9].
In sum, the present study provides experimental evidence that metacognitive prompting during GenAI-assisted learning yields short-term psychological benefits in a classroom context, but further research is needed to establish durability, generalizability, behavioral mechanisms, and equity implications. Addressing these limitations through multi-site replication, longitudinal designs, behavioral measurement, and intervention component testing will strengthen the evidence base for sustainable, responsible GenAI integration in higher education.
Author Contributions
Conceptualization, T.J. and Y.X.; Methodology, T.J.; Software, T.J. and Y.X.; Validation, T.J.; Formal Analysis, T.J. and Y.X.; Investigation, T.J. and Y.X.; Resources, T.J. and Y.X.; Data Curation, T.J. and Y.X.; Writing—Original Draft, T.J. and Y.X.; Writing—Review & Editing, T.J. and Y.X.; Visualization, T.J. and Y.X.; Supervision, T.J. and Y.X.; Project Administration, T.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Review Committee of Fujian University of Technology (protocol code FJUT-HREC092/2025 and date of approval 6 July 2025).
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Experimental Materials and Procedure
This appendix provides the full materials used in the experiment, including the metacognitive prompting intervention, control condition instructions, academic task, and administration procedure, to ensure transparency and reproducibility.
In the metacognitive prompting condition, participants received structured prompts at three stages of the task to activate planning, monitoring, and evaluation processes during GenAI-assisted learning. Before using the GenAI tool, participants were asked to reflect on their approach by responding to the following instructions: “Before using GenAI, briefly consider the following: What is your main goal for this task? What aspects of the task will you use GenAI for, and what will you complete independently? What criteria will you use to evaluate whether the AI-generated output is reliable and relevant?” These prompts were designed to encourage goal setting, boundary specification, and the establishment of evaluation standards.
During the task, participants encountered a mid-task prompt aimed at supporting ongoing monitoring and critical evaluation of AI outputs. The instructions read: “While using GenAI, pause and reflect: Does the AI-generated content accurately reflect the source text provided? Are there any statements that seem unclear, unsupported, or questionable? Do you need to revise your prompt or request clarification to improve the output?” This stage was intended to prompt learners to actively assess and regulate their interaction with the AI system rather than passively accept generated content.
After completing the task, participants were presented with a post-use prompt to facilitate evaluation and reflection. The instructions were: “After using GenAI, reflect on your work: What parts of the final response are based on your own understanding? What changes did you make to the AI-generated content, and why? Are there any remaining uncertainties or claims that require further verification?” These prompts aimed to reinforce authorship awareness, critical reflection, and verification behavior.
In the control condition, participants completed the same academic task under identical time constraints and GenAI access conditions but did not receive metacognitive prompts. Instead, they were provided with neutral procedural instructions of comparable length, which read: “Please complete the following academic task within the allocated time. You may use the GenAI tool if you find it helpful. Follow the task instructions carefully and ensure that your response is complete and well-structured.” These instructions were designed to avoid activating metacognitive regulation while maintaining comparable instructional exposure.
The academic task required participants to read a short source text (approximately 350–400 words) on the topic of the impact of social media on academic performance and produce a structured written response of approximately 250–300 words. Participants were instructed to summarize the main arguments presented in the text, evaluate the credibility of the evidence, and present their own position supported by the provided material. The task was completed within a fixed time limit of 40 min. All participants received identical task instructions and materials to ensure comparability across conditions.
The experimental procedure followed a standardized sequence. Participants first completed a brief pre-task questionnaire assessing demographic information, prior GenAI use, and baseline task confidence. They were then randomly assigned to either the metacognitive prompting condition or the control condition. Participants completed the task individually during a regular class session, with instructions to avoid discussion. In the prompting condition, prompts were embedded at predefined stages (before, during, and after GenAI use). In the control condition, no such prompts were provided. Immediately after completing the task, participants filled out a post-task questionnaire measuring learning anxiety, academic self-efficacy, and metacognitive engagement. Finally, participants were debriefed regarding the purpose of the study.
References
- United Nations. Transforming our world: The 2030 Agenda for Sustainable Development. In United Nations General Assembly; United Nations: New York, NY, USA, 2015.
- Kasneci, E.; Seßler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023, 103, 102274.
- Memarian, B.; Doleck, T. ChatGPT in education: Methods, potentials, and limitations. Comput. Hum. Behav. Artif. Hum. 2023, 1, 100022.
- Artino, A.R., Jr. Academic self-efficacy: From educational theory to instructional practice. Perspect. Med. Educ. 2012, 1, 76–85.
- Pekrun, R. The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice. Educ. Psychol. Rev. 2006, 18, 315–341.
- Li, J.; Huang, J.-S. Dimensions of artificial intelligence anxiety based on the integrated fear acquisition theory. Technol. Soc. 2020, 63, 101410.
- Tankelevitch, L.; Kewenig, V.; Simkute, A.; Scott, A.E.; Sarkar, A.; Sellen, A.; Rintel, S. The metacognitive demands and opportunities of generative AI. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024.
- Guo, L. Using metacognitive prompts to enhance self-regulated learning and learning outcomes: A meta-analysis of experimental studies in computer-based learning environments. J. Comput. Assist. Learn. 2022, 38, 811–832.
- UNESCO. Guidance for Generative AI in Education and Research; UNESCO: Paris, France, 2023.
- Kong, S.-C.; Cheung, M.-Y.W.; Tsang, O. Developing an artificial intelligence literacy framework: Evaluation of a literacy course for senior secondary students using a project-based learning approach. Comput. Educ. Artif. Intell. 2024, 6, 100214.
- Zimmerman, B.J. Attaining self-regulation: A social cognitive perspective. In Handbook of Self-Regulation; Elsevier: Amsterdam, The Netherlands, 2000; pp. 13–39.
- Panadero, E. A review of self-regulated learning: Six models and four directions for research. Front. Psychol. 2017, 8, 422.
- Azevedo, R.; Hadwin, A.F. Scaffolding self-regulated learning and metacognition–Implications for the design of computer-based scaffolds. Instr. Sci. 2005, 33, 367–379.
- Primawati, R.I.; Wiji, W.; Mulyani, S. The effect of self-efficacy on academic achievement and learning engagement: A systematic literature review. J. Penelit. Pendidik. IPA 2026, 12, 1–11.
- Saltos, K.; Núñez, C.; Veloz, V.; Pilozo, L. Relación entre la autorregulación del aprendizaje y el rendimiento académico en estudiantes de educación superior [Relationship between self-regulated learning and academic performance in higher education students]. Rev. Multidiscip. Estud. Gen. 2026, 5, 1229–1245.
- Martino, M.L.; Passeggia, R.; Di Natale, M.R.; Freda, M.F. The promotion of self-efficacy in the Sinapsi Academic Self-Management Training Group: The predictive role of academic engagement. Curr. Psychol. 2026, 45, 315.
- Bantasan, M.; Ligtinen, M.F. Examining the predictive role of teacher support typologies on students’ research self-efficacy: An integrated theoretical perspective. J. Interdiscip. Perspect. 2026, 4, 169–177.
- Sheehan, P.; Kwon, M.; Steiner, P.M. Quasi-experimental designs for causal inference about intervention effects: Addressing threats to validity from a graphical models perspective. In Handbook of Research Methods in Developmental Science; Wiley: Hoboken, NJ, USA, 2026; pp. 79–109.
- Miccoli, M.R.; Miller, M.; Reips, U.-D. Mental accounting of time: Attendance likelihood for rescheduled events. Sage Open 2026, 16, 21582440261421844.
- Mitchell, S.; Yousif, Y.F.; Zien, M.; Memisoglu, Y.O.; Al-Bassam, N.; Al Saidi, Y.H.; Ming, H.Y.M.; Yousuff, M.; Mansour, H.R.K.; Daneshi, K. CONSORT compliance of randomized controlled trials in rhinoplasty: A systematic review. Aesthetic Plast. Surg. 2026, 1–11.
- Arbabi, M.A.; Monib, W.K.; Abubakari, M.S.; Shafik, W. Generative artificial intelligence in academic research: Transformations, challenges, and futures. In AI-Powered Transformations for Ethical Education; IGI Global Scientific Publishing: Palmdale, PA, USA, 2026; pp. 55–84.
- Zhang, T.; Han, H. The path of promoting adolescents’ physical exercise: Based on the control-value theory of achievement emotions. Br. J. Educ. Psychol. 2026.
- Sadoughi, M.; Hejazi, S.Y.; Alrabai, F. Learning environment and foreign language classroom boredom: Unravelling the emotional and motivational mechanisms. Percept. Mot. Skills 2026, 00315125251414175.
- Setiawan, R.F.W.; Khoirunnisa, R.N. Self-efficacy and self-adjustment of first-year students living away from home: Implications for educational management. J. Educ. Manag. Res. 2026, 5, 356–368.
- Cook, D.A.; Skrupky, L.P. Validation of the Motivated Strategies for Learning Questionnaire and Instructional Materials Motivation Survey. Med. Teach. 2025, 47, 635–645.
- Carolus, A.; Koch, M.J.; Straka, S.; Latoschik, M.E.; Wienrich, C. MAILS-Meta AI Literacy Scale: Development and testing of an AI literacy questionnaire based on well-founded competency models and psychological change- and meta-competencies. Comput. Hum. Behav. Artif. Hum. 2023, 1, 100014.
- Oppenheimer, D.M.; Meyvis, T.; Davidenko, N. Instructional manipulation checks: Detecting satisficing to increase statistical power. J. Exp. Soc. Psychol. 2009, 45, 867–872.
- Pekrun, R.; Goetz, T.; Frenzel, A.C.; Barchfeld, P.; Perry, R.P. Measuring emotions in students’ learning and performance: The Achievement Emotions Questionnaire (AEQ). Contemp. Educ. Psychol. 2011, 36, 36–48.
- Pintrich, P.R. A Manual for the Use of the Motivated Strategies for Learning Questionnaire (MSLQ); 1991. Available online: https://www.researchgate.net/publication/271429287_A_Manual_for_the_Use_of_the_Motivated_Strategies_for_Learning_Questionnaire_MSLQ (accessed on 1 March 2026).
- Reips, U.-D. Standards for Internet-based experimenting. Exp. Psychol. 2002, 49, 243.
- Raudenbush, S.W.; Bryk, A.S. Hierarchical Linear Models: Applications and Data Analysis Methods; Advanced Quantitative Techniques in the Social Sciences Series; SAGE: London, UK, 2002.
- Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Routledge: London, UK, 2013.
- Faul, F.; Erdfelder, E.; Lang, A.-G.; Buchner, A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 2007, 39, 175–191.
- Schulz, K.F.; Altman, D.G.; Moher, D.; for the CONSORT Group. CONSORT 2010 statement: Updated guidelines for reporting parallel group randomized trials. Ann. Intern. Med. 2010, 152, 726–732.
- Kline, R.B. Principles and Practice of Structural Equation Modeling; Guilford Publications: New York, NY, USA, 2023.
- Preacher, K.J.; Hayes, A.F. Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behav. Res. Methods 2008, 40, 879–891.
- Podsakoff, P.M.; MacKenzie, S.B.; Lee, J.-Y.; Podsakoff, N.P. Common method biases in behavioral research: A critical review of the literature and recommended remedies. J. Appl. Psychol. 2003, 88, 879–903.
- Cronbach, L.J. Coefficient alpha and the internal structure of tests. Psychometrika 1951, 16, 297–334.
- Raykov, T. Estimation of composite reliability for congeneric measures. Appl. Psychol. Meas. 1997, 21, 173–184.
- Fornell, C.; Larcker, D.F. Evaluating structural equation models with unobservable variables and measurement error. J. Mark. Res. 1981, 18, 39–50.
- Henseler, J.; Ringle, C.M.; Sarstedt, M. A new criterion for assessing discriminant validity in variance-based structural equation modeling. J. Acad. Mark. Sci. 2015, 43, 115–135.
- Hu, L.-T.; Bentler, P.M. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct. Equ. Model. A Multidiscip. J. 1999, 6, 1–55.
- Vandenberg, R.J.; Lance, C.E. A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organ. Res. Methods 2000, 3, 4–70.
- American Psychological Association. Publication Manual of the American Psychological Association, 7th ed.; American Psychological Association: Washington, DC, USA, 2020.
Table 1.
Sample characteristics and baseline equivalence by condition (N = 148).
| Variable | Metacognitive Prompting (n = 74) | Control (n = 74) | Test Statistic | p |
|---|---|---|---|---|
| Age (years), M (SD) | 19.9 (1.1) | 20.0 (1.2) | t = −0.52 | 0.604 |
| Female, n (%) | 45 (60.8%) | 43 (58.1%) | χ2 = 0.11 | 0.742 |
| Male, n (%) | 29 (39.2%) | 31 (41.9%) | — | — |
| Year of study, n (%) | | | χ2 = 0.36 | 0.949 |
| Year 1 | 18 (24.3%) | 17 (23.0%) | | |
| Year 2 | 22 (29.7%) | 23 (31.1%) | | |
| Year 3 | 20 (27.0%) | 19 (25.7%) | | |
| Year 4 | 14 (18.9%) | 15 (20.3%) | | |
| Major (Business/Management), n (%) | 52 (70.3%) | 50 (67.6%) | χ2 = 0.13 | 0.718 |
| Prior GenAI use for study (1–7), M (SD) | 4.62 (1.31) | 4.55 (1.29) | t = 0.33 | 0.740 |
| Baseline task confidence (1–7), M (SD) | 4.81 (1.02) | 4.74 (1.06) | t = 0.41 | 0.683 |
| GPA (0–4.0), M (SD) | 3.18 (0.41) | 3.16 (0.43) | t = 0.29 | 0.770 |
| Prior AI/GenAI training (yes), n (%) | 21 (28.4%) | 19 (25.7%) | χ2 = 0.14 | 0.705 |
Table 2.
Measurement model (CFA) results: factor loadings, reliability, and validity (N = 148).
(A) Confirmatory factor analysis model fit.

| Model | χ2 (df) | CFI | TLI | RMSEA | SRMR |
|---|---|---|---|---|---|
| Two-factor CFA (Anxiety, Self-efficacy) | 48.60 (34) | 0.97 | 0.96 | 0.05 | 0.04 |

(B) Standardized factor loadings (λ) by construct.

| Construct/Item | Standardized Loading (λ) |
|---|---|
| Learning anxiety | |
| LA1 | 0.74 |
| LA2 | 0.82 |
| LA3 | 0.85 |
| LA4 | 0.71 |
| Academic self-efficacy | |
| ASE1 | 0.78 |
| ASE2 | 0.84 |
| ASE3 | 0.80 |
| ASE4 | 0.76 |

(C) Reliability and convergent validity.

| Construct | Cronbach’s α | CR | AVE |
|---|---|---|---|
| Learning anxiety | 0.88 | 0.89 | 0.64 |
| Academic self-efficacy | 0.90 | 0.91 | 0.65 |

(D) Discriminant validity (HTMT).

| Pair of Constructs | HTMT |
|---|---|
| Anxiety ↔ Self-efficacy | 0.55 |
Table 3.
Descriptive statistics, reliability, and correlations (N = 148).
| Variable | M | SD | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|
| 1. Metacognitive engagement (1–7) | 5.14 | 0.93 | (0.86) | | | | |
| 2. Learning anxiety (1–7) | 3.44 | 1.10 | −0.35 *** | (0.88) | | | |
| 3. Academic self-efficacy (1–7) | 5.00 | 0.91 | 0.40 *** | −0.55 *** | (0.90) | | |
| 4. Prior GenAI use for study (1–7) | 4.59 | 1.30 | 0.25 ** | −0.15 | 0.20 * | — | |
| 5. Baseline task confidence (1–7) | 4.78 | 1.04 | 0.28 ** | −0.30 *** | 0.45 *** | 0.22 ** | — |
Table 4.
Manipulation check results (metacognitive engagement) (N = 148).
| Variable | Metacognitive Prompting (n = 74) | Control (n = 74) | Mean Difference | t(df) | p | Cohen’s d |
|---|---|---|---|---|---|---|
| Metacognitive engagement (1–7), M (SD) | 5.62 (0.74) | 4.65 (0.83) | 0.97 | 7.50 (146) | <0.001 | 1.23 |

Treatment receipt check:

| Item | Prompting (n = 74) | Control (n = 74) | χ2 (df) | p |
|---|---|---|---|---|
| Noticed structured prompts requiring planning/verification/reflection (Yes), n (%) | 63 (85.1%) | 17 (23.0%) | 56.0 (1) | <0.001 |
Table 5.
Summary of Hypothesis Testing Results.
| Hypothesis | Relationship Tested | b | SE | p | Result |
|---|---|---|---|---|---|
| H1 | Prompting → Learning Anxiety | −0.68 | 0.18 | <0.001 | Supported |
| H2 | Prompting → Self-Efficacy | 0.40 | 0.15 | 0.008 | Supported |
| H3 | Anxiety → Self-Efficacy | −0.42 | 0.06 | <0.001 | Supported |
| H4 | Indirect (Prompting → Anxiety → SE) | 0.26 | — | — | Supported * |
Table 6.
Robustness and sensitivity analyses (N = 148).
Table 6.
Robustness and sensitivity analyses (N = 148).
(A) Main effects under alternative samples/specifications.

| Analysis Specification | DV | Prompting Effect (b) | SE | t | p | 95% CI |
|---|---|---|---|---|---|---|
| Primary model (covariates included) | Learning anxiety | −0.68 | 0.18 | −3.81 | <0.001 | [−1.03, −0.33] |
| No covariates (condition only) | Learning anxiety | −0.66 | 0.17 | −3.86 | <0.001 | [−1.00, −0.32] |
| Excluding attention-check failures (N = 140) | Learning anxiety | −0.70 | 0.19 | −3.68 | <0.001 | [−1.07, −0.33] |
| Primary model (covariates included) | Academic self-efficacy | 0.40 | 0.15 | 2.70 | 0.008 | [0.11, 0.69] |
| No covariates (condition only) | Academic self-efficacy | 0.38 | 0.14 | 2.71 | 0.007 | [0.10, 0.66] |
| Excluding attention-check failures (N = 140) | Academic self-efficacy | 0.41 | 0.16 | 2.56 | 0.012 | [0.09, 0.73] |

(B) Sensitivity to classroom clustering (if multiple sections).

| Model | DV | Prompting Effect (b) | SE | p | Interpretation |
|---|---|---|---|---|---|
| Random intercept for class/section (multilevel) | Learning anxiety | −0.65 | 0.20 | 0.002 | Direction and significance unchanged |
| Random intercept for class/section (multilevel) | Academic self-efficacy | 0.37 | 0.16 | 0.021 | Direction and significance unchanged |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.