Article

Positioning Generative AI in EFL Peer Feedback: Training Feedback Literacy and Enabling Uptake in Speaking Classes

1 Faculty of Arts and Letters, Kyoritsu Women’s University, 2 Chome-2-1 Hitotsubashi, Chiyoda, Tokyo 101-8437, Japan
2 Graduate School of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa 359-1192, Japan
3 Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa 359-1192, Japan
* Author to whom correspondence should be addressed.
Educ. Sci. 2026, 16(4), 544; https://doi.org/10.3390/educsci16040544
Submission received: 2 February 2026 / Revised: 19 March 2026 / Accepted: 27 March 2026 / Published: 1 April 2026

Abstract

Peer feedback is widely used in English as a foreign language (EFL) higher education, yet its benefits are often limited by uneven feedback quality and learners’ difficulty in interpreting and using comments. This theoretical paper synthesizes research on peer feedback, student feedback literacy, and recent developments in generative artificial intelligence (GenAI) to propose a theory-informed design framework that positions GenAI as Trainer and Synthesizer in L2 speaking peer feedback. Building on feedback literacy as a set of capacities (appreciating feedback, making judgments, managing affect, and taking action), the paper argues that speaking tasks create distinct constraints, including time pressure, fleeting performance, and heightened affect, which make real-time peer feedback promising but pedagogically challenging. To address these challenges, the paper introduces two complementary roles for GenAI in peer feedback workflows: a Trainer that supports feedback quality through calibration with exemplars, rubric-guided practice, and feedback-on-feedback; and a Synthesizer that aggregates peer input into concise, actionable guidance linked to criteria and learning goals. The conceptual proposal specifies key design principles (e.g., transparency, learner agency, teacher-in-the-loop oversight, and privacy-conscious data practices) and outlines researchable propositions for evaluating learning, engagement, and equity outcomes. The paper concludes with implications for task design, training sequences, and responsible classroom implementation.

1. Introduction

This article proposes two complementary roles for generative artificial intelligence (GenAI), implemented in a constrained, pedagogically governed manner, to support peer feedback on speaking in English as a foreign or second language (EFL/ESL). The proposal addresses two persistent challenges documented in peer assessment research: the variable quality of comments that students provide (Cheng & Warren, 2005; De Grez et al., 2012) and the difficulty speakers face in processing and acting on multiple, sometimes contradictory peer messages (Jonsson, 2013; Winstone et al., 2017). The first role for GenAI is as Trainer, which is intended to support students to give more specific, criterion-referenced peer feedback. The second role is as Synthesizer, which is intended to support recipients in interpreting multiple peer comments and identifying priorities for action. The mechanisms through which these two roles operate are developed in Section 3. Together, they target the twin bottlenecks of peer feedback production and uptake.
Our proposal builds on research showing that feedback promotes learning when it is clear, purposeful, and actionable (Hattie & Timperley, 2007; Shute, 2008; Wisniewski et al., 2020). Among formative feedback approaches, peer feedback is a well-established and effective method that can promote evaluative judgment, engagement, and achievement (K. Topping, 1998). In EFL/ESL higher education, peer feedback is integral to formative assessment and learner agency (Double et al., 2020; Yeh et al., 2019). However, when the target performance is oral, such as in performative speaking tasks, peer feedback often exhibits problems on both the production and uptake sides that weaken its formative promise. On the production side, comments can be generic, thinly justified, or misaligned with task criteria (Cheng & Warren, 2005; De Grez et al., 2012). On the uptake side, speakers may face cognitive overload when translating multiple, sometimes contradictory, peer messages into concrete revisions for their next performance (Jonsson, 2013; Winstone et al., 2017). This conceptual proposal is intended to help students benefit more fully from peer review in speaking classes by providing scaffolded support for both the production and uptake of feedback.
The positioning of GenAI as Trainer and Synthesizer is grounded in two strands of scholarship: student feedback literacy (Carless & Boud, 2018) and GenAI-enabled student feedback engagement as a cyclical, self-regulated process (Zhan et al., 2025). Feedback literacy research conceptualizes the capacities learners require to benefit from feedback (appreciating feedback, making judgments, managing affect, and taking action) and argues for explicit curricular design to cultivate them (Carless & Boud, 2018). Building on a broader cyclical model of self-regulation proposed by Zimmerman (2000), Zhan et al. (2025) adapt this theory to GenAI-enabled feedback engagement and conceptualize learners’ work with feedback in three phases: forethought (pre-task planning and expectations), control (in-task regulation and immediate response), and retrospect (post-task reflection and transfer). Mapping GenAI functions to these constructs clarifies their pedagogical purpose: the Trainer primarily supports appreciation, calibrated judgment, and affect management during the forethought phase and at the outset of the control phase (in-task regulation), while the Synthesizer primarily supports taking action and reflection during the later control phase (immediate response) and the retrospect phase.
The present paper is a conceptual proposal focused on EFL/ESL speaking tasks involving performative assessment. The Trainer and Synthesizer are treated as theory-informed design patterns rather than validated systems or off-the-shelf tools. Any references to possible delivery environments should therefore be read as illustrative examples of how such roles might be instantiated within teacher-guided institutional ecosystems, not as claims of demonstrated feasibility. The design patterns’ scope is deliberately limited: it excludes automated grading, unsupervised private tutoring, and high-stakes decisions. Instead, teachers remain in the loop to approve prompts, audit outputs, and integrate GenAI-generated, teacher-curated artifacts into instruction; privacy, provenance, and bias checks are explicit design constraints.
The aims of the current paper are threefold. First, to articulate a clear, theory-aligned positioning for GenAI in EFL/ESL peer feedback on speaking. Second, to derive design principles and ethical guardrails suitable for higher education English language instructional contexts. Third, to formulate theoretical expectations and a research agenda that can be pursued in subsequent empirical studies. The remainder of the paper is organized as follows: Section 2 consolidates the theoretical foundations. Section 3 details the Trainer and Synthesizer roles and their mechanisms. Section 4 presents the design principles. Section 5 explores ethics and governance issues. Section 6 advances the paper’s theoretical expectations. Section 7 outlines the future research agenda. Section 8 discusses boundary conditions and design implications.

2. Theoretical Foundations

2.1. Peer Feedback in EFL Speaking

Peer feedback refers to information about performance quality and guidance for improvement that learners of similar status provide to one another (K. Topping, 1998). In EFL/ESL speaking, the benefits of peer feedback are strongest when activities are structured with clear criteria, exemplars, and brief training (Hung et al., 2016; van Zundert et al., 2010). Studies report gains in presentation performance and reductions in speaking anxiety when learners exchange feedback online and can revisit comments asynchronously, which allows time to notice issues and plan revisions (Yeh et al., 2019; Tseng & Yeh, 2019). It has also been shown that technology can improve logistics by removing time and place constraints, widening participation to quieter students, and allowing teachers to monitor exchanges (J. G. Wu & Miller, 2020). With trained raters and design choices such as using more than one peer rater, peer assessment of EFL speaking can align closely with teacher judgments, addressing reliability concerns (Li et al., 2022). Peer review also develops evaluative judgment and self-regulation in both givers and receivers of feedback, which supports broader learner autonomy goals (K. Topping, 1998; Carless & Boud, 2018).
Despite these benefits, several challenges persist. Without explicit guidance, learners may produce generic or thinly justified comments, and friendship dynamics or preferences for teacher validation can skew peer ratings and reduce trust (Li et al., 2022; Azarnoosh, 2013; Mok, 2011; Zhao, 2018). Lower proficiency can limit students’ ability to notice problems in speech or to articulate specific, criterion-referenced suggestions (Cheng & Warren, 2005; K. Topping, 1998). In addition, variation in tone and accuracy can lead to unconstructive or discouraging comments if teachers do not scaffold feedback (Chien et al., 2020). These issues point to the need to support both peer feedback production and uptake. A feedback-literacy perspective emphasizes the understandings, capacities, and dispositions that enable students to interpret information and act on it, which is directly applicable to spoken performance and follow-up practice (Carless & Boud, 2018).
A theoretical account of technology’s role in peer feedback should therefore foreground instructor-controlled design choices that improve feedback quality and uptake. Useful design choices include criteria and feedback language training, task timing that enables reflection on recorded speech (Dai & Wu, 2022; Tseng & Yeh, 2019), multiple peer raters to stabilize judgments (Li et al., 2022), and visibility settings that balance accountability with psychological safety (K. J. Topping et al., 2025). Research with young learners suggests that peer feedback training should include explicit discussion of criteria and guidance on giving constructive suggestions (Hung et al., 2016). Further, video or audio workflows can support targeted work on features like pronunciation and intonation while giving teachers quality assurance oversight (Dai & Wu, 2022; Tseng & Yeh, 2019). Framing peer feedback within feedback literacy further clarifies how technology can support not only the production of better comments but also learner sense-making and the planning of next steps. More research should investigate both how design choices interact in speaking tasks across age groups and contexts, and how they influence immediate revisions and longer-term oral communication skill development (Carless & Boud, 2018; Li et al., 2022).

2.2. Student Feedback Literacy

Student feedback literacy refers to the understandings, capacities, and dispositions needed to make sense of information from various sources and use it to enhance work or learning strategies, comprising four interdependent elements: (a) appreciating feedback; (b) making judgments; (c) managing affect; and (d) taking action (Carless & Boud, 2018).
Feedback literacy is highly relevant to L2 speaking because learners need to make sense of feedback to use it to guide subsequent performances (Carless & Boud, 2018). Existing scholarship shows that stronger feedback literacy is associated with deeper engagement with feedback processes, better use of teacher and peer comments, and greater evaluative judgment and self-regulation capacity (Han & Xu, 2021; Winstone et al., 2017; T. Zhang & Mao, 2023). Related speaking research further suggests that when learners can revisit feedback and plan revisions, they may improve their fluency, confidence, and course performance (Aguilera-Fuentes & Ortiz-Navarrete, 2025). In peer activities, giving feedback trains attention to criteria and common pitfalls, which feeds forward into the giver’s own speaking practice and contributes to a classroom culture where feedback is sought and used productively.
Developing feedback literacy presents challenges. Learners may enter with passive orientations, low self-efficacy, or information overload, and without explicit training, they may not know how to use comments for revision (Taguba & Plata, 2025). Practical constraints also matter, as time and workload for rich feedback, large classes, and uneven technology access reduce opportunities to revisit and act on comments (Aguilera-Fuentes & Ortiz-Navarrete, 2025). Cultural and assessment contexts can dampen dialogue about feedback and focus attention on scores rather than formative use, which undermines agentic engagement (Winstone et al., 2017). These constraints argue for planned instruction in how to use feedback, safe climates that balance critique with encouragement, staged designs that prevent overload, and structured reflection that helps learners take ownership of next steps.
Recent work on peer feedback literacy in EFL writing validates four complementary dimensions: appreciation of peer feedback, feedback-related knowledge and abilities, negotiation agency, and revision efficacy (F. Zhang et al., 2025). Applied to speaking, the Trainer can strengthen appreciation and knowledge/skills, as well as prompt negotiation agency by shaping tone and justification in dialogic exchanges, while the Synthesizer supports revision efficacy by organizing peer input into feasible next steps (Carless & Boud, 2018; F. Zhang et al., 2025).

2.3. Cyclical Engagement with Feedback and the Role of GenAI

Cyclical accounts of learning and self-regulation have a well-established theoretical history, most notably in models that distinguish phases of forethought, performance/control, and self-reflection (Zimmerman, 2000, 2002). Building on this broader tradition, Zhan et al. (2025) adapt cyclical logic to student feedback engagement in GenAI-mediated contexts, distinguishing three broad phases: forethought (before a task, when learners prepare and set expectations), control (during and immediately after the task, when they respond to feedback and regulate their performance), and retrospect (after the task, when they reflect and plan what to change next time). Conditions such as task design, learner agency, and available technological support shape how effectively students move through these phases and how well this engagement aligns with the feedback literacy elements of appreciating feedback, making judgments, managing affect, and taking action articulated by Carless and Boud (2018).
Within this cycle, we position GenAI as a tool that supports learner action while keeping human judgment and teacher guidance at the center through two complementary roles: GenAI as Trainer and GenAI as Synthesizer, which were introduced in Section 1 and are elaborated in detail in Section 3. In brief, the Trainer supports students as feedback givers, and the Synthesizer supports them as feedback recipients. This positioning is compatible with K. J. Topping et al. (2025), who see promising roles for GenAI in enhancing individual reviews and analyzing student feedback, without replacing human evaluators.
In the forethought phase, GenAI as Trainer introduces criteria and exemplars, helps students compare their own judgments with expert ratings, prompts them to justify their views with evidence, rehearses feedback language, and encourages them to set specific goals for the upcoming task. These supports build the feedback literacy elements of appreciating feedback and making judgments (Carless & Boud, 2018) by helping learners prepare for review, calibrate expectations, and approach the task with clearer criteria in mind.
In the control phase, two forms of support operate. First, as students write comments, the Trainer can flag vague wording, misalignment with the rubric, or unhelpful tone and ask for revisions. This nudges comments toward clearer, more specific, and more constructive phrasing. Second, as peer feedback is received, GenAI as Synthesizer groups comments by criterion, highlights points of agreement and disagreement, and suggests a short list of priorities. As K. J. Topping et al. (2025) note, learners can feel overwhelmed by the amount and variability of peer feedback, so a GenAI Synthesizer that condenses and personalizes input can reduce overload and support timely uptake. Evidence from AI-supported peer feedback in L2 writing shows that such scaffolds can improve feedback quality and later performance (Guo et al., 2024), which provides a plausible mechanism for speaking tasks that require targeted revision planning. These functions map to the control phase by reducing cognitive load and supporting in-the-moment self-regulation (Zhan et al., 2025).
In the retrospect phase, the Synthesizer produces a concise uptake report. This report lists prioritized themes with short rationales and representative excerpts, points out disagreements that may need clarification, proposes concrete goals for the next performance, and links to brief, level-appropriate resources. Each resource is clearly sourced so that teachers can see how the report was generated. This supports the taking action and reflection elements of feedback literacy and responds to calls to make feedback more visible, easier to revisit, and more clearly connected to next steps (Carless & Boud, 2018). Across all three stages, GenAI is used to support learners during high cognitive and affective load while teachers retain evaluative authority and pedagogical control. The system prompts judgment, organizes information so that it is easier to use, and supports both planning and enacting change. It does not grade students, override minority peer views, or remove student agency.
Taken together, the feedback literacy framework (Carless & Boud, 2018) clarifies what capacities instruction should develop, with the cyclical model (Zhan et al., 2025) clarifying when and how supports should be deployed. Integrating the two suggests a design logic for EFL speaking courses: (a) provide structured, criterion-referenced practice with exemplars before live peer review to build judgment skills; (b) offer mediated scaffolds that cue constructive language and evidence-based justification while students compose feedback; (c) after review, present recipients with synthesized, prioritized themes drawn from multiple peer comments, plus short resources and simple templates that help them plan concrete next steps; and (d) support feedback uptake by prompting learners to set and revisit specific goals for their next performance. This integration establishes the conceptual foundation for the Trainer and Synthesizer roles developed in Section 3 and for the theoretical expectations and future research directions outlined later.

3. Positioning GenAI: Two Roles

3.1. GenAI as Trainer

Whereas K. J. Topping et al. (2025) envision using GenAI as a real-time coach that enhances individual reviews as students formulate comments, we conceptualize the Trainer as guided practice that builds students’ capacity to give feedback using exemplars and analytical rubrics. In the proposed design, learners would apply criteria, calibrate judgments with reference standards, and respond to open prompts, while the Trainer would provide scaffolded guidance intended to help them revise comments toward greater specificity, criterion alignment, evidential support, and constructive phrasing. In feedback literacy terms, the Trainer most directly builds appreciation and feedback knowledge/skills and can cue negotiation agency when reviewers justify or refine comments (Carless & Boud, 2018; F. Zhang et al., 2025).
One plausible training sequence would be short and repeatable. Students might first apply an analytic rubric to a 60–90 s speaking exemplar (e.g., an excerpt from a presentation) and then compare their judgments with expert ratings at the criterion level. In such a design, the system could highlight points of convergence and divergence and prompt students to justify their ratings with reference to evidence. After this calibration step, students could submit open-ended comments in response to prompts, and the Trainer could provide feedback-on-feedback intended to draw attention to vagueness, missing justification, misalignment with criteria, or counterproductive tone. Students could then revise their comments toward more criterion-linked feedback with a concrete suggestion. Whether such scaffolds can function reliably across diverse learners, tasks, and contexts would need to be established through empirical testing.
The following transformations are intended as illustrative examples of how the Trainer could support comment revision. Beginning from short, global reactions (for example, “Nice voice,” “Slides were confusing,” or “Speak louder”), the proposed design would prompt students to connect their comments to specific rubric dimensions, time-coded evidence, and concrete next steps. For instance, a global comment like “Nice voice, but can’t hear past.” might be revised to “Intelligibility: increase consonant endings in past tense at 0:42 and 1:15. Practice with minimal pair drills for final consonants.” Similarly, “Slides were confusing” might be revised to “Organization: add a signpost before the second example at 1:05 and use a slide title that states the claim in one sentence.” Over repeated training cycles, feedback givers may become better able to name the relevant feature (such as intelligibility, organization, or delivery), point to where it occurs in the recording, and suggest a feasible strategy or practice activity. In this way, the Trainer is conceptualized as support for moving giver feedback from vague praise or criticism toward criterion-referenced diagnoses and actionable, behavior-focused suggestions.
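At its simplest, the feedback-on-feedback step described above can be thought of as a set of checks on a draft comment. The sketch below is a minimal, rule-based illustration; the criterion list, function name, and heuristics are all assumptions for the purpose of this example, and a deployed Trainer would use an LLM prompt rather than regular expressions to generate such revision prompts.

```python
import re

# Hypothetical rubric dimensions for a speaking task (illustrative only).
CRITERIA = {"intelligibility", "organization", "delivery", "content"}

def review_comment(comment: str) -> list[str]:
    """Return revision prompts for a draft peer comment.

    Checks whether the comment names a rubric criterion, points to
    time-coded evidence, and offers a concrete suggestion: the three
    qualities the Trainer is meant to cue.
    """
    prompts = []
    lowered = comment.lower()
    if not any(criterion in lowered for criterion in CRITERIA):
        prompts.append("Which rubric dimension does this concern?")
    if not re.search(r"\b\d{1,2}:\d{2}\b", comment):  # e.g., 0:42
        prompts.append("Add a time code showing where this occurs.")
    if not re.search(r"\b(try|practice|add|use|insert|increase)\b", lowered):
        prompts.append("Suggest one concrete next step.")
    return prompts

# A vague comment triggers all three prompts; a criterion-linked,
# time-coded, actionable comment triggers none.
vague = review_comment("Nice voice")
specific = review_comment(
    "Intelligibility: increase consonant endings at 0:42; "
    "practice minimal pair drills.")
```

The point of the sketch is only that each check maps onto one quality of effective feedback (criterion alignment, evidence, actionability), not that these heuristics would be adequate in practice.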
In a possible implementation, students could retain a small portfolio of calibrated scores, revised comments, and short reflections that they revisit before later peer-review tasks. Teachers could also view a simple dashboard showing which criteria students systematically over- or underrate and which weaknesses recur in their comments. Such information could support brief follow-up instruction, for example, a mini lesson on signposting if organizational features appear to be consistently under-noticed. Framed in this way, the Trainer supports teacher-guided feedback development rather than autonomously evaluating student feedback. It does not assign grades, and any suggested revisions remain open to teacher and student judgment. More broadly, the proposed design is consistent with work suggesting that GenAI may enhance peer assessment by improving the specificity and justification of individual reviewers’ comments (K. J. Topping et al., 2025).

3.2. GenAI as Synthesizer

This role is intended to support feedback receivers during uptake by reducing cognitive overload, strengthening recipience processes, and increasing the likelihood of prioritized action while preserving divergent perspectives (Carless & Boud, 2018; Guo et al., 2024; Jonsson, 2013; K. J. Topping et al., 2025; Winstone et al., 2017). In feedback literacy terms, it is designed to support revision efficacy by helping learners move from comments to concrete actions (F. Zhang et al., 2025).
In the proposed design, the Synthesizer would receive rubric scores and open comments from multiple peers for a single performance and generate a draft uptake report soon after feedback collection. It could normalize formats, group comments by criterion, cluster similar points, and identify areas of agreement and disagreement. Minority views could also be preserved and labeled as alternative perspectives. The intended output would be a concise thematic summary, while the complete set of anonymized peer comments would remain available to the receiver.
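The grouping step described above can be illustrated with a minimal sketch. The comments and criterion tags below are hypothetical and arrive pre-tagged for simplicity; a deployed Synthesizer would need to infer the criterion for each comment (for example, via an LLM classification prompt) before grouping.

```python
from collections import defaultdict

# Hypothetical peer comments for one performance, each tagged with a
# rubric criterion (illustrative data only).
peer_comments = [
    ("delivery", "Spoke too fast in the second section."),
    ("delivery", "Pauses were missing before new points."),
    ("organization", "Clear structure and good signposting."),
    ("delivery", "The pace was fine for me."),  # minority view, preserved
]

def group_by_criterion(comments):
    """Group comments by rubric criterion, keeping every comment so
    that minority views stay visible alongside majority themes."""
    grouped = defaultdict(list)
    for criterion, text in comments:
        grouped[criterion].append(text)
    return dict(grouped)

grouped = group_by_criterion(peer_comments)
```

Note that the dissenting delivery comment is retained in its group rather than discarded, reflecting the design requirement that minority views be preserved and labeled as alternative perspectives.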
One possible report template is outlined here to illustrate the intended function of the Synthesizer:
  • Prioritized themes with a one-sentence rationale and one representative excerpt per theme.
  • Contradictions or uncertainties that merit clarification before revision.
  • Two or three candidate next-step goals framed as specific actions for the next performance.
  • Links to short, level-appropriate learning resources for each theme, with each resource listing its source.
The following excerpt is intended as an illustrative example rather than a demonstrated system output:
  • Theme: Pacing and pausing.
    • Rationale: Three of four reviewers noted fast delivery and missing pauses before new points. Quote: “Hard to catch the new idea for the second topic.”
    • Goal: Insert a two-second pause before each section heading and plan a summary sentence at the end of each section.
    • Resource: Link to a 3-min video on chunking and pauses.
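One way to make the report template concrete is as a small data model. The sketch below is illustrative only; all class and field names are assumptions rather than parts of a specified system, and a real implementation would align them with the teacher’s rubric and resource library.

```python
from dataclasses import dataclass, field

@dataclass
class Theme:
    criterion: str   # rubric dimension, e.g., "Pacing and pausing"
    rationale: str   # one-sentence summary of peer agreement
    excerpt: str     # one representative anonymized quote
    goal: str        # candidate next-step goal for the next performance
    resource: str    # short, level-appropriate resource with its source

@dataclass
class UptakeReport:
    themes: list[Theme] = field(default_factory=list)
    contradictions: list[str] = field(default_factory=list)   # to clarify
    minority_views: list[str] = field(default_factory=list)   # preserved

# The illustrative excerpt above, expressed in this data model.
report = UptakeReport(themes=[Theme(
    criterion="Pacing and pausing",
    rationale="Three of four reviewers noted fast delivery and missing "
              "pauses before new points.",
    excerpt="Hard to catch the new idea for the second topic.",
    goal="Insert a two-second pause before each section heading.",
    resource="3-min video on chunking and pauses",
)])
```

Structuring the report this way keeps each theme’s rationale, excerpt, goal, and resource bundled together, which is what would allow teachers to audit any item in the report against the original peer comments.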
In a possible implementation, teachers could review the synthesized report, adjust resource mappings, and add brief guidance, thereby maintaining instructor oversight (K. J. Topping et al., 2025). The proposed design also assumes that the original peer comments would remain accessible and that the Synthesizer would not assign grades. Rather, it is conceptualized as support for action planning and reflection while preserving student agency as givers and receivers of peer feedback. By organizing and summarizing peer feedback and linking themes to candidate learning resources, the Synthesizer is intended to reduce overload and support feedback receiver uptake in ways that are broadly consistent with recommendations for GenAI-supported peer assessment (K. J. Topping et al., 2025), although the educational value of such support would need empirical testing.

3.3. Workflow Integration

This subsection outlines one possible way the Trainer and Synthesizer could connect within a single cycle in EFL/ESL speaking classes that use performative tasks (e.g., presentations, pitches, demonstrations). Before a speaking task, students could complete a short Trainer activity aligned with the teacher-designed rubric so that they can rehearse applying criteria, comparing their judgments with benchmarked examples, and revising comments toward greater specificity and usefulness. During or immediately after the task, feedback givers could submit rubric scores and open comments, and the teacher could also contribute ratings or moderation where appropriate. The Synthesizer could then organize the resulting feedback into a concise uptake report that identifies common themes, highlights possible areas of agreement or disagreement, and helps feedback receivers select one or two priorities for their next performance. Receivers could use this report alongside original giver comments to plan specific revisions or follow-up practice, while teachers retain oversight of criteria, prompts, and the degree of AI support built into the cycle.

4. Design Principles for EFL Speaking

These principles apply to EFL/ESL speaking classes that use performative tasks. They align with feedback literacy (Carless & Boud, 2018) and the cyclical model of engagement (Zhan et al., 2025). Principles of good feedback practice also emphasize clarifying standards, supporting self-regulation, and encouraging dialogue (Nicol & Macfarlane-Dick, 2006).
Principle 1: Feedback timing and sequencing are critical. Pre-task training helps set performance expectations and primes attention to task criteria (Hung et al., 2016). Research suggests that peer feedback immediately after speaking tasks, while the performance is still fresh, promotes deeper engagement and reflection (J. G. Wu & Miller, 2020). Spacing out opportunities for feedback, rather than condensing them into a single round, supports sustained improvement and more reliable peer ratings (Li et al., 2022).
Principle 2: Trainer units should be short, repeated, and task aligned. Keeping training activities brief and spaced helps manage cognitive load, especially in complex speaking tasks (van Merriënboer & Sweller, 2005). Studies on video and rubric-based training show that using annotated exemplars and expert-aligned feedback boosts peer reviewers’ specificity and accuracy (Chien et al., 2020; J. G. Wu & Miller, 2020). Further, feedback-on-feedback guidance that prompts justification and polite tone has been shown to improve feedback giver comments in writing and speaking (Guo et al., 2024; K. J. Topping et al., 2025).
Principle 3: The Synthesizer should preserve the givers’ voice while simplifying uptake. Automated clustering by criterion and identifying shared patterns helps feedback receivers prioritize action steps without losing the benefit of divergent perspectives (Guo et al., 2024; Winstone et al., 2017). Learners need the full feedback giver record for transparency and trust, but curated summaries and associated learning resources support goal setting and uptake (Zhan et al., 2025).
Principle 4: Guardrails remain essential. Research cautions against delegating grading or judgment to GenAI, as this can erode learner trust and undermine the formative purpose of feedback (K. J. Topping et al., 2025). Instead, teachers should retain oversight and intervene when synthesized reports are inaccurate, biased, or incomplete (Carless & Boud, 2018; K. J. Topping et al., 2025). Prompt design and resource selection should match students’ CEFR level and task purpose to avoid misalignment. Offline workflows such as paper-based rubrics followed by delayed synthesis provide inclusivity and resilience for varied instructional contexts.

5. Ethics and Governance

The applications of GenAI to education generally, and to language education specifically, remain in their formative stages; thus, principles are likely to be refined and revised for the foreseeable future. Nevertheless, some tentative principles have begun to emerge, and these have been used to develop the theory-informed design framework outlined here. First, it is necessary to point out that the ethical issues surrounding GenAI extend far beyond the purview of the English language classroom to issues of unequal resource distribution (Warschauer et al., 2023) and the appropriation of copyrighted works to train models (Karamolegkou et al., 2023). Borrowing from discussions of ethical research practice in Kubanyiova (2008), Talandis and Muller (2025) offer a useful distinction between macroethical and microethical perspectives, where macroethical concerns are system-wide and microethical concerns relate to specific use cases or implementations (Kubanyiova, 2008). Without minimizing the importance of the broader, macroethical concerns with GenAI, here we focus on microethical issues that arise when introducing GenAI to the speaking classroom as previously outlined.
Proponents of ethical GenAI classroom use, writing largely about its implementation in the teaching of writing, include Warschauer et al. (2023), Chapelle et al. (2024), Pratschke (2024), and Ohashi and Hubbard (2025). Several of their recommendations relevant to the teaching of speaking skills are taken up here. First, effectiveness and ethics are prioritized over efficiency: the Trainer and Synthesizer act in supporting roles, helping students incorporate training material (the Trainer) and summarize peer feedback (the Synthesizer). Grading remains the purview of the teacher, and the underlying training materials and original peer feedback remain accessible to students to check for potential GenAI discrepancies or hallucinations. Further, students’ oral production is not submitted to GenAI agents, thereby maintaining privacy. Instead, students’ anonymized feedback is sent to GenAI agents, meeting the requirement to protect privacy and data outlined in Ohashi and Hubbard (2025).
The student-teacher-GenAI interactional model described here reflects what Pratschke (2024) refers to as “generativism,” or “the symbiotic approach to designing and delivering learning in collaboration with GAI” (p. 64). As outlined, the design framework calls for givers to be trained to produce more effective feedback through the Trainer, which is then delivered to receivers through an indirect recipience process in which the Synthesizer curates and consolidates what could otherwise be an overwhelming amount of input. This Synthesizer output remains connected to the original giver-produced feedback so that, when needed, receivers can corroborate it against the original, anonymous giver-generated comments. Further, consistent with more conventional peer feedback best practices (Nicol & Macfarlane-Dick, 2006), the feedback is formative rather than summative, as it is intended to help students improve their future performances. The teacher remains responsible for grading and can supervise every stage of the process between Trainer and Synthesizer, intervening to adjust outputs where necessary to facilitate comprehension and learning.
Because the conceptual proposal incorporates pre-programmed prompts and specialized chatbot agents that handle student feedback comments, rather than having students interact directly with GenAI tools, many of the concerns raised regarding the ethical use of GenAI prompts are largely circumvented. As such, the GenAI implementation described here introduces what Warschauer et al. (2023) refer to as “partial functions” (p. 5) of GenAI, with students interacting with GenAI outputs through, for example, dynamically interlinked Google Docs. However, the requirement that students understand the basic affordances and limitations of GenAI tools (Chapelle et al., 2024; Ohashi & Hubbard, 2025), particularly regarding potential hallucinations and bias (Warschauer et al., 2023), remains salient. Even when prompts, rubrics, and source materials are teacher-designed, LLM outputs may still introduce biases not present in those materials, because bias can also arise from model training data and prompt interpretation. This risk could perhaps be addressed by discussing representative examples of successful and unsuccessful Trainer and Synthesizer outputs with students.

6. Theoretical Expectations

This section articulates theoretical expectations about the educational value of the Trainer and Synthesizer in EFL/ESL speaking contexts. These expectations turn the ideas discussed previously into testable claims about feedback quality, feedback literacy, and speaking development. They are not hypotheses tied to a single study design, but broad statements that can be examined through multiple complementary methods.

6.1. Expectations About the Trainer

The Trainer is designed to improve peer feedback quality by combining rubric-guided practice, expert-referenced calibration, and feedback-on-feedback. From the literature on peer assessment, feedback literacy, and AI-supported review, several expectations follow.
Expectation 1: The Trainer will improve attitudes toward peer feedback. When students complete structured Trainer activities that explain the purposes of peer feedback, model high-quality examples, and provide low-stakes practice in using criteria and feedback language, their attitudes toward the value and efficacy of peer feedback will become more positive than those of students who receive only a rubric and brief instructions. A literature review of 26 empirical studies on peer feedback across diverse academic and professional tasks found that training can strengthen students’ perceptions of fairness and usefulness, as well as their confidence in the process (van Zundert et al., 2010). Given that this review included research on oral communication skills, the Trainer is expected to achieve similar results in EFL speaking. More positive attitudes should, in turn, be associated with greater willingness of feedback givers to invest effort and of receivers to engage with the comments they receive (Iwashita & Dao, 2021), providing an affective foundation for the expectations about quality, uptake, and speaking development that follow.
Expectation 2: The Trainer will enhance feedback quality. When students complete short, task-aligned Trainer units before giving feedback, their comments will be more numerous, specific, criterion-referenced, and evidence-based than those of peers who receive only a rubric and generic guidance (Irwin, 2019; Sato & Ballinger, 2016; Sluijsmans et al., 2004). Studies of peer feedback in L2 writing have shown that training and structured guidance can shift comments from vague praise toward more specific, problem-focused, and revision-oriented feedback that is more likely to support substantive revision (Min, 2016; Y. Wu & Schunn, 2021; F. Zhang et al., 2017). Researchers can judge comment quality by checking how specific comments are, whether they mention task criteria, whether they give clear suggestions, and whether the tone is polite and helpful.
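The comment-quality dimensions just listed can be operationalized for analysis. The Python sketch below is a deliberately crude, keyword-based illustration; the criterion terms, cue-word lists, and `code_comment` function are our own assumptions, and empirical studies would instead rely on trained human coders or validated coding schemes such as that of Y. Wu and Schunn (2021).

```python
# Keyword lists below are illustrative assumptions, not a validated scheme.
CRITERIA = {"organization", "delivery", "eye contact", "intonation"}
SUGGESTION_CUES = {"could", "should", "try", "consider"}
HEDGES = {"maybe", "perhaps", "might", "i think"}

def code_comment(comment):
    """Code one peer comment along four quality dimensions: criterion
    reference, actionable suggestion, hedged/polite tone, and (crudely)
    specificity."""
    lower = comment.lower()
    return {
        "mentions_criterion": any(c in lower for c in CRITERIA),
        "offers_suggestion": any(s in lower for s in SUGGESTION_CUES),
        "hedged_tone": any(h in lower for h in HEDGES),
        # Word count as a rough proxy for specificity; real coding needs humans.
        "specific": len(comment.split()) >= 8,
    }

print(code_comment("Maybe you could slow down your delivery in the second section."))
# {'mentions_criterion': True, 'offers_suggestion': True,
#  'hedged_tone': True, 'specific': True}
```

Even this toy version shows how binary codes can be aggregated across a class to compare trained and untrained feedback givers.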
Expectation 3: The Trainer will align judgments and increase feedback-giver engagement. Repeated exposure to exemplar-based calibration is expected to reduce systematic gaps between student ratings and expert ratings at the criterion level (Li et al., 2022). Over time, trained students’ rubric scores are expected to show stronger agreement with teacher or expert judgments than those of untrained peers, especially on analytic criteria such as organization and delivery, given evidence that rater training can reduce inconsistent rating patterns and support more reliable criterion-level ratings in peer assessment of oral presentations (De Grez et al., 2012; Saito, 2008). Understanding what constitutes a good example of giver feedback is also expected to enhance engagement in feedback provision (Dao et al., 2021).
Expectation 4: The Trainer will develop feedback literacy. Participation in Trainer activities will foster feedback literacy by strengthening appreciation of feedback, knowledge of quality criteria, and confidence in giving feedback (Carless & Boud, 2018). Students who use the Trainer are expected to perceive peer feedback processes as more useful and to engage with them more actively and agentively, consistent with research linking higher feedback literacy to deeper feedback engagement (Han & Xu, 2021; Winstone et al., 2017; T. Zhang & Mao, 2023).
These expectations focus on low-stakes, iterative training in which the Trainer operates as a preparation and rehearsal space. They are grounded in existing evidence that training can shape attitudes, giver behavior, and feedback engagement.

6.2. Expectations About the Synthesizer and Its Integrated Use

The Synthesizer targets the recipience side of feedback by organizing multiple peer comments into prioritized themes and curating short learning resources. Its contributions are expected to be most visible in how receivers interpret and act on peer feedback.
Expectation 5: The Synthesizer will support uptake and action planning. Students who receive Synthesizer reports will be more likely to translate peer comments into concrete, criterion-linked goals for their next performance than students who receive unsynthesized feedback alone. Research on feedback recipience and uptake highlights the importance of structured opportunities to turn feedback information into action plans (Jonsson, 2013; Winstone et al., 2017), and studies of GenAI-supported peer feedback in L2 writing suggest that tools that organize or comment on peer feedback can facilitate receivers’ use of comments for revision (Guo et al., 2024). Evidence for this expectation can be sought in the number and quality of action plans, the degree of alignment between plans and giver comments, and the extent to which receivers attempt the planned changes in subsequent tasks.
Expectation 6: The Synthesizer will reduce overload and support affect. By condensing giver input and highlighting convergences and disagreements, the Synthesizer will reduce perceived overload and confusion compared with conditions where students must navigate long lists of unstructured comments. This expectation builds on work showing that learners can experience overload when faced with large amounts of unstructured feedback and that structuring information can support more agentic engagement with feedback (Jonsson, 2013; Winstone et al., 2017; Zhan et al., 2025). Students who receive Synthesizer reports are expected to report lower levels of frustration and higher clarity about what to work on, while maintaining or improving trust in peer feedback processes.
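One way to picture the condensing function described above is as a pre-processing step that groups anonymized comments by rubric criterion and flags where givers converge or disagree, before any LLM summarization. The sketch below is purely illustrative: the `synthesize` function, the valence tags, and the criterion labels are our own assumptions, and in practice the tagging itself might be performed by the LLM or checked by the teacher.

```python
from collections import defaultdict

def synthesize(tagged_comments):
    """Group peer comments by rubric criterion and flag where givers
    converge or disagree, producing a compact structure that a prompt
    (or a teacher) can turn into a short, prioritized report."""
    by_criterion = defaultdict(list)
    for c in tagged_comments:
        by_criterion[c["criterion"]].append(c["valence"])
    report = {}
    for criterion, valences in by_criterion.items():
        positives = valences.count("+")
        report[criterion] = {
            "n_comments": len(valences),
            # Unanimous praise or unanimous criticism counts as convergent.
            "consensus": "convergent" if positives in (0, len(valences)) else "mixed",
        }
    return report

tagged = [
    {"criterion": "organization", "valence": "+"},
    {"criterion": "organization", "valence": "+"},
    {"criterion": "delivery", "valence": "+"},
    {"criterion": "delivery", "valence": "-"},
]
print(synthesize(tagged))
# {'organization': {'n_comments': 2, 'consensus': 'convergent'},
#  'delivery': {'n_comments': 2, 'consensus': 'mixed'}}
```

Surfacing "mixed" criteria explicitly supports the transparency principle: receivers can see where givers disagreed rather than being handed a falsely unified summary.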
Expectation 7: The combined use of the Trainer and Synthesizer may support speaking development over time. When the Trainer and Synthesizer are used together across a sequence of tasks, students are expected to show stronger gains in targeted aspects of speaking (e.g., organization, intelligibility, delivery) than comparison groups who engage in peer assessment without GenAI support. This expectation is based on evidence that technology-mediated peer feedback, together with opportunities to revisit comments and performances, can improve learners’ speaking performance and confidence over time (Aguilera-Fuentes & Ortiz-Navarrete, 2025; Ding & Zhu, 2025; Tseng & Yeh, 2019; Yeh et al., 2019).
Expectation 8: The combined design may support more equitable participation in peer-feedback processes. By clarifying criteria, scaffolding feedback production, and organizing feedback for uptake, the Trainer and Synthesizer may be particularly helpful for learners who are less confident, less experienced with peer assessment, or more easily overwhelmed by large amounts of unstructured feedback. This expectation is informed by research showing that peer-feedback participation can be shaped by anxiety, hesitation, unequal engagement, and access to supportive structures, and by evidence that technology-mediated or AI-mediated feedback environments may help reduce some of these barriers (Iwashita & Dao, 2021; K. J. Topping et al., 2025; J. G. Wu & Miller, 2020; Zhan et al., 2025). However, whether such support produces more equitable outcomes in practice would require careful empirical investigation.
Taken together, these expectations frame the Trainer and Synthesizer as parts of a recurring feedback cycle and as a design that may influence not only speaking development, but also feedback uptake, learner affect, and patterns of participation. Students calibrate and rehearse feedback before review, receive structured synthesis and planning support after review, and then enact changes in subsequent tasks. Longitudinal and design-based research can examine how these cycles operate in real courses, how teachers adapt them, and how students’ feedback literacy and speaking performance evolve. Framing the system as an iterative cycle aligns with broader models of feedback engagement and technology-mediated feedback that emphasize repeated loops between information, interpretation, and action (Carless & Boud, 2018; Zhan et al., 2025).

7. Future Research Agenda

7.1. Feasibility, Usability, and Classroom Fit

Initial studies should establish whether the Trainer and Synthesizer can be integrated into existing speaking courses without excessive burdens for teachers and students. Key questions include how long units take in practice, which interfaces (LMS plug-in, webform, or mobile view) are most practical, and what forms of teacher dashboard are most useful. Mixed-methods case studies that combine usage logs (e.g., completion times, access frequency), short post-activity surveys, and semi-structured interviews with teachers and students can illuminate practical constraints and affordances. Researchers might follow one or two courses over a semester, collecting field notes and teacher reflection logs to document how the tools are introduced, adapted, or sidelined in everyday practice. Recent work on AI-enhanced peer assessment suggests that questions of classroom fit and teacher oversight are central to successful adoption (K. J. Topping et al., 2025).

7.2. Effects on Feedback Quality and Feedback Literacy

Building on feasibility work, quasi-experimental or randomized designs can compare classes or sections that use the Trainer with those that follow business-as-usual peer review. Researchers can use established coding schemes for comments (e.g., Y. Wu & Schunn, 2021), rubric–expert agreement analyses, and validated or adapted feedback literacy scales. Designs might assign intact classes to conditions, collect baseline measures of feedback literacy and speaking performance, then track changes across several tasks. Qualitative work, such as stimulated recall based on Trainer interactions, can show how students reason with criteria and how their language for feedback evolves, for example, by asking learners to explain why they accepted or rejected particular GenAI suggestions when revising their comments. These methods are appropriate because they can capture both changes in comment quality and shifts in feedback literacy over time. This line of inquiry would also connect closely with recent work on peer feedback literacy development (F. Zhang et al., 2025).
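Rubric-expert agreement analyses of the kind mentioned above can be illustrated with a small sketch. The `agreement_stats` function and the sample scores below are hypothetical; published studies would typically also report chance-corrected indices such as weighted kappa or intraclass correlations rather than raw agreement alone.

```python
def agreement_stats(student, expert):
    """Exact agreement rate and mean absolute difference between paired
    student and expert rubric scores (e.g., on a 1-5 analytic scale),
    computed for one criterion at a time."""
    assert len(student) == len(expert), "scores must be paired"
    n = len(student)
    exact = sum(s == e for s, e in zip(student, expert)) / n
    mad = sum(abs(s - e) for s, e in zip(student, expert)) / n
    return {"exact_agreement": exact, "mean_abs_diff": mad}

# Hypothetical criterion-level scores (organization) for five performances.
students = [4, 3, 5, 2, 4]
experts = [4, 4, 5, 3, 4]
print(agreement_stats(students, experts))
# {'exact_agreement': 0.6, 'mean_abs_diff': 0.4}
```

Tracking these statistics per criterion across Trainer sessions would show whether calibration narrows the student-expert gap on some criteria faster than others.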

7.3. Effects on Uptake, Affect, and Speaking Performance

To test propositions about the Synthesizer, studies can examine how students process feedback with and without synthesized reports. Possible indicators include the content of written action plans, observed changes in recorded performances, and self-report measures of clarity, overload, and confidence. Longer sequences of tasks will enable analysis of whether repeated exposure to Synthesizer reports leads to cumulative improvements in targeted speaking behaviors. Experimental or quasi-experimental designs could, for example, compare sections where only some groups receive Synthesizer reports, with blinded raters using analytic rubrics to score pre- and post-intervention performances. Studies should also examine whether these improvements are evenly distributed across different aspects of speaking (e.g., organization, intelligibility, delivery), or whether some dimensions are more sensitive to Trainer–Synthesizer scaffolding than others, using disaggregated analyses to detect uneven gains. These methods are appropriate because they link uptake support not only to learner perceptions but also to subsequent performance and revision behavior (Zhan et al., 2025).
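The disaggregated analysis suggested above can be sketched simply: compute mean pre-to-post gains separately for each rubric dimension and compare their magnitudes. The `dimension_gains` function and the scores below are illustrative assumptions only; real analyses would use blinded ratings and appropriate inferential statistics.

```python
def dimension_gains(pre, post):
    """Mean pre-to-post gain per speaking dimension, to detect whether
    improvement is evenly distributed or concentrated in some dimensions.
    `pre` and `post` map dimension names to per-student rubric scores."""
    return {
        dim: round(sum(b - a for a, b in zip(pre[dim], post[dim])) / len(pre[dim]), 2)
        for dim in pre
    }

# Hypothetical rubric scores for three students, before and after the cycle.
pre = {"organization": [3, 3, 2], "intelligibility": [3, 4, 3], "delivery": [2, 3, 3]}
post = {"organization": [4, 4, 3], "intelligibility": [3, 4, 4], "delivery": [4, 4, 4]}
print(dimension_gains(pre, post))
# {'organization': 1.0, 'intelligibility': 0.33, 'delivery': 1.33}
```

Uneven gains of this kind would suggest that the Trainer-Synthesizer scaffolding is more sensitive to some speaking dimensions than others.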

7.4. Student and Teacher Perspectives on AI-Supported Feedback

Because the Trainer and Synthesizer redistribute some feedback work to GenAI, it is important to understand how stakeholders perceive this redistribution. Research questions might include how students negotiate the authority of GenAI suggestions relative to peer and teacher comments, and under what conditions teachers feel that GenAI support enhances, duplicates, or conflicts with their pedagogical intentions. Interviews, focus groups, and classroom observations can uncover tensions and design opportunities that are not visible in performance metrics alone. Studies might sample teachers and students from multiple courses, use artifact-based interviews that draw on actual Trainer exchanges and Synthesizer reports, and trace how perceptions change over time as familiarity with the tools increases. This area of focus is important because recent research suggests that the success of AI-supported peer assessment depends not only on measurable gains but also on whether teachers and students regard such support as trustworthy and pedagogically coherent (K. J. Topping et al., 2025).

7.5. Contextual Variation and Equity

Finally, studies should examine how the proposed design patterns function across different institutional types, proficiency levels, and linguistic and cultural contexts. For example, a Trainer configured for B1-level university students in Japan may require adaptation for vocational colleges, secondary schools, or adult community programs in other countries. Researchers should also pay attention to whether the system benefits some learners more than others, for instance, by proficiency band, confidence level, or prior experience with peer assessment, and to how design adjustments can promote more equitable outcomes. Comparative case studies and multi-site designs can contrast implementations across institutions, while quantitative analyses can disaggregate outcomes by learner subgroups to identify emerging equity gaps and test targeted adaptations intended to close them. Best practices for GenAI prompt guardrails can likewise be examined for their portability across contexts. This area of inquiry is especially important because feedback engagement is shaped by both contextual and personal factors and may therefore be supported unevenly across learner groups (Zhan et al., 2025).
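Such subgroup disaggregation can be sketched as follows. The `subgroup_gaps` function, the gain scores, and the proficiency bands are hypothetical illustrations; real analyses would test observed gaps inferentially rather than simply reporting descriptive differences.

```python
from statistics import mean

def subgroup_gaps(records, group_key="proficiency_band"):
    """Disaggregate gain scores by a learner subgroup variable to surface
    potential equity gaps (e.g., by proficiency band)."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r["gain"])
    means = {g: round(mean(v), 2) for g, v in groups.items()}
    # Descriptive gap between the highest- and lowest-gaining subgroups.
    means["gap"] = round(max(means.values()) - min(means.values()), 2)
    return means

records = [
    {"proficiency_band": "A2", "gain": 0.5},
    {"proficiency_band": "A2", "gain": 0.7},
    {"proficiency_band": "B1", "gain": 1.1},
    {"proficiency_band": "B1", "gain": 0.9},
]
print(subgroup_gaps(records))
# {'A2': 0.6, 'B1': 1.0, 'gap': 0.4}
```

The same function could disaggregate by confidence level or prior peer-assessment experience simply by changing `group_key`.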
Overall, these strands form a research agenda that is both ambitious and pragmatic. Early work can focus on small-scale classroom trials that refine the Trainer and Synthesizer and validate measures of feedback quality, feedback literacy, and uptake. Subsequent studies can adopt more rigorous comparative designs and explore transfer to related domains such as L2 writing. Throughout, close collaboration between researchers, teachers, and students will be essential to ensure that AI-supported feedback remains grounded in pedagogical priorities and responsive to EFL/ESL classroom realities.

8. Implications

8.1. Implications for Feedback Theory and GenAI

Conceptualizing GenAI as Trainer and Synthesizer extends feedback literacy and feedback engagement frameworks into concrete roles for GenAI. Feedback literacy research has called for designs that help learners appreciate feedback, make judgments, manage affect, and take action (Carless & Boud, 2018), while models of feedback recipience and engagement emphasize the processes by which students interpret and use information (Jonsson, 2013; Winstone et al., 2017). Situating GenAI as a scaffold rather than an evaluator responds to these calls by specifying how it can support, rather than replace, human feedback processes.
The cyclical framing of forethought, control, and retrospect (Zhan et al., 2025) is also sharpened by the Trainer–Synthesizer distinction. The Trainer is located primarily in forethought and early control, where it helps build evaluative judgment and feedback literacy through exemplar-based practice and feedback-on-feedback. The Synthesizer operates mainly at control and retrospect, where it structures giver input to support planning and reflection. This division clarifies when and how GenAI can add value and encourages researchers to examine GenAI-supported feedback not as a single tool but as a set of role-specific interventions embedded in feedback cycles.
More broadly, the framework contributes to ongoing debates about whether GenAI in feedback and assessment should augment or automate human judgment, echoing recent work in academic writing feedback and language assessment that advocates positioning LLM-based tools as complements to, rather than replacements for, human expert decisions (Chuang & Yan, 2025; Jovic et al., 2025). The design principles discussed in Section 4 and the governance considerations described in Section 5 align with arguments that GenAI should remain transparent, contestable, and under pedagogical control, with teachers and students retaining evaluative authority (Carless & Boud, 2018; K. J. Topping et al., 2025; Zhan et al., 2025). The Trainer and Synthesizer thus offer a testable middle path between uncritical automation and blanket prohibition of GenAI in feedback processes, whereby GenAI’s affordances of efficiently managing large amounts of data are leveraged to enhance EFL/ESL classroom learning.

8.2. Pedagogical and Institutional Implications

For EFL/ESL speaking teachers, the main implication is that peer feedback on oral performance can be redesigned as a sequence of structured, technology-supported learning opportunities rather than a one-off activity attached to a single presentation. The Trainer suggests ways to build feedback giver literacy and evaluative judgment through short, repeated tasks that use exemplars, criteria, and feedback language, while the Synthesizer highlights how teachers might support feedback receivers, who often struggle with overload and uncertainty about which comments to act on (Jonsson, 2013; Winstone et al., 2017). This aligns with evidence that rater training, annotated exemplars, and guided practice can improve comment quality and rating behavior in oral assessment (De Grez et al., 2012; Hung et al., 2016; Li et al., 2022; Saito, 2008), and with studies showing that technology-mediated speaking practice is most helpful when learners can control pace, revisit feedback, and work within psychologically safe environments (Aguilera-Fuentes & Ortiz-Navarrete, 2025; Ding & Zhu, 2025; Tseng & Yeh, 2019; Yeh et al., 2019).
At the institutional level, treating the Trainer and Synthesizer as patterns rather than fixed products encourages institutions to prototype low-cost implementations using existing platforms while retaining pedagogical and governance oversight. Institutions interested in this approach will need to consider technical support, teacher development, and governance structures that oversee prompt design, exemplar selection, data handling, and ongoing review of GenAI outputs (Carless & Boud, 2018; Ohashi & Hubbard, 2025; K. J. Topping et al., 2025; Zhan et al., 2025). Bias and fairness concerns require systematic monitoring of whether GenAI-supported feedback privileges particular varieties of English, participation styles, or student groups (Warschauer et al., 2023).

8.3. Boundary Conditions and Design Limitations

Several boundary conditions and design limitations qualify the claims made here. First, the account is conceptual and design-oriented; it extrapolates from existing research on peer assessment, feedback literacy, and GenAI-supported feedback in related domains, but it does not present new empirical data. The theoretical expectations in Section 6, therefore, remain provisional and require empirical testing, replication, and refinement across multiple courses and institutions. In addition, one of the paper’s central framing tools, Zhan et al.’s (2025) cyclical feedback engagement framework, is very recent and has not yet been empirically tested in EFL/ESL speaking contexts. It is therefore used here as a theoretically useful interpretive framework rather than as an established model for speaking-specific feedback design. Design-based research and classroom trials will be needed to determine how well the Trainer and Synthesizer function in practice and how teachers and students adapt or resist them.
Second, the focus is on EFL/ESL speaking in higher education, especially performative assessments such as presentations, speeches, and project demonstrations. While some mechanisms, such as feedback giver exemplar-based training and structured feedback receiver uptake support, are likely to transfer to other skills and contexts, generalization to younger learners, different proficiency bands, or high-stakes assessment settings should be made cautiously. Prior work on peer assessment suggests that age, proficiency, and assessment culture can all shape how students experience and enact peer feedback (Azarnoosh, 2013; Double et al., 2020; Li et al., 2022; Mok, 2011).
Third, the proposed design assumes access to sufficient digital infrastructure, including devices, connectivity, and institutional platforms capable of integrating GenAI securely. In contexts with limited technology access or strict data-protection constraints, only partial implementations may be feasible, such as offline Trainer activities using printed exemplars or delayed, teacher-mediated synthesis of peer comments. Research on technology-mediated learning has repeatedly shown that infrastructural and workload constraints can limit what teachers can realistically adopt, regardless of conceptual merit (Dai & Wu, 2022; Hung et al., 2016).
A further limitation concerns LLM output reliability. Because the proposed framework depends on GenAI-generated feedback-on-feedback and synthesized uptake support, errors such as hallucinated suggestions, misleading summaries, or inappropriate resource links could weaken the quality and trustworthiness of the support provided. For this reason, the conceptual framework assumes teacher oversight, access to underlying peer comments, and the possibility of auditing the AI-supported outputs rather than treating them as authoritative.
Finally, there is a risk that successful implementations could unintentionally encourage overreliance on GenAI or crowd out valuable human dialogue about feedback. If receivers come to see Synthesizer reports as authoritative, they may pay less attention to individual peer comments or opportunities for direct discussion. Likewise, if teachers defer too much to Trainer or Synthesizer outputs, they may miss chances to model disciplinary judgment or to address misconceptions. These concerns echo broader cautions about the need to keep GenAI in a genuinely augmentative role, supporting but not replacing human sense-making in education (Carless & Boud, 2018; Ohashi & Hubbard, 2025; Pratschke, 2024; Winstone et al., 2017; Zhan et al., 2025).
Recognizing these boundary conditions and design limitations is not a reason to reject GenAI-supported feedback but a reminder that such systems are best understood as contextual, revisable designs. Their value will depend on careful implementation, ongoing evaluation, and sustained collaboration between researchers, teachers, students, and institutional stakeholders.

Author Contributions

Conceptualization, B.I. and T.M.; Methodology, B.I.; Writing—original draft, B.I. and T.M.; Writing—review and editing, B.I. and T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
B1: Common European Framework of Reference (CEFR) Level B1
CEFR: Common European Framework of Reference for Languages
EFL: English as a foreign language
ESL: English as a second language
GenAI: Generative artificial intelligence
L2: Second language
LLM: Large language model
LMS: Learning management system
LTI: Learning Tools Interoperability

References

1. Aguilera-Fuentes, Y., & Ortiz-Navarrete, M. (2025). Enhancing English speaking skills through screencast feedback: A technique explored with Spanish-speaking undergraduates. Cogent Education, 12(1), 2545330.
2. Azarnoosh, M. (2013). Peer assessment in an EFL context: Attitudes and friendship bias. Language Testing in Asia, 3, 11.
3. Carless, D., & Boud, D. (2018). The development of student feedback literacy: Enabling uptake of feedback. Assessment & Evaluation in Higher Education, 43(8), 1315–1325.
4. Chapelle, C. A., Beckett, G. H., & Ranalli, J. (2024). Paths for exploring AI in applied linguistics. In C. A. Chapelle, G. H. Beckett, & J. Ranalli (Eds.), Exploring artificial intelligence in applied linguistics (pp. 1–8). Iowa State University Digital Press.
5. Cheng, W., & Warren, M. (2005). Peer assessment of language proficiency. Language Testing, 22(1), 93–121.
6. Chien, S.-Y., Hwang, G.-J., & Jong, M. S.-Y. (2020). Effects of peer assessment within the context of spherical video-based virtual reality on EFL students’ English-speaking performance and learning perceptions. Computers & Education, 146, 103751.
7. Chuang, P.-L., & Yan, X. (2025). Language assessment in the era of generative artificial intelligence: Opportunities, challenges, and future directions. System, 134, 103846.
8. Dai, Y., & Wu, Z. (2022). Mobile-assisted peer feedback on EFL pronunciation: Outcome effects, interactional processes, and shaping factors. System, 111, 102953.
9. Dao, P., Duong, P. T., & Nguyen, M. (2021). Effects of SCMC mode and learner familiarity on peer feedback in L2 interaction. Computer Assisted Language Learning, 36, 1206–1235.
10. De Grez, L., Valcke, M., & Roozen, I. (2012). How effective are self- and peer assessment of oral presentation skills compared with teachers’ assessments? Active Learning in Higher Education, 13(2), 129–142.
11. Ding, Y., & Zhu, J. (2025). Mobile-based peer feedback in EFL speaking: Learners’ motivation, behavioral engagement in feedback provision, and achievement. Asian-Pacific Journal of Second and Foreign Language Education, 10, 10.
12. Double, K. S., McGrane, J. A., & Hopfenbeck, T. N. (2020). The impact of peer assessment on academic performance: A meta-analysis of controlled group studies. Educational Psychology Review, 32, 481–509.
13. Guo, K., Pan, M., Li, Y., & Lai, C. (2024). Effects of an AI-supported approach to peer feedback on university EFL students’ feedback quality and writing ability. The Internet and Higher Education, 63, 100962.
14. Han, Y., & Xu, Y. (2021). Student feedback literacy and engagement with feedback: A case study of Chinese undergraduate students. Teaching in Higher Education, 26(2), 181–196.
15. Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
16. Hung, Y.-J., Samuelson, B. L., & Chen, S.-C. (2016). Relationships between peer- and self-assessment and teacher assessment of young EFL learners’ oral presentations. In M. Nikolov (Ed.), Assessing young learners of English: Global and local perspectives (pp. 317–338). Springer.
17. Irwin, B. (2019). Enhancing peer feedback practices through screencasts in blended academic writing courses. The JALT CALL Journal, 15(1), 43–59.
18. Iwashita, N., & Dao, P. (2021). Peer feedback in L2 oral interaction. In H. Nassaji, & E. Kartchava (Eds.), The Cambridge handbook of corrective feedback in second language learning and teaching (pp. 275–300). Cambridge University Press.
19. Jonsson, A. (2013). Facilitating productive use of feedback in higher education: A literature review. Active Learning in Higher Education, 14(1), 63–76.
20. Jovic, M., Papakonstantinidis, S., & Kirkpatrick, R. (2025). From red ink to algorithms: Investigating the use of large language models in academic writing feedback. Language Testing in Asia, 15, 59.
21. Karamolegkou, A., Li, J., Zhou, L., & Søgaard, A. (2023). Copyright violations and large language models. In Proceedings of the 2023 conference on empirical methods in natural language processing (pp. 7403–7412). Association for Computational Linguistics.
22. Kubanyiova, M. (2008). Rethinking research ethics in contemporary applied linguistics: The tension between macroethical and microethical perspectives in situated research. Modern Language Journal, 92(4), 503–518.
23. Li, J., Huang, J., & Cheng, S. (2022). The reliability, effectiveness, and benefits of peer assessment in college EFL speaking classrooms: Student and teacher perspectives. Studies in Educational Evaluation, 72(4), 101120.
24. Min, H. T. (2016). Effect of teacher modeling and feedback on EFL students’ peer review skills in peer review training. Journal of Second Language Writing, 31, 43–57.
25. Mok, J. (2011). A case study of students’ perceptions of peer assessment in Hong Kong. ELT Journal, 65(3), 230–239.
26. Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199–218.
27. Ohashi, L., & Hubbard, P. (2025). Generative AI ethics: Emerging principles for language teachers. In L. Ohashi, M. Hillis, & R. Dykes (Eds.), Artificial intelligence in our language learning classrooms (pp. 100–121). Candlin & Mynard ePublishing.
28. Pratschke, B. M. (2024). Generative AI and education: Digital pedagogies, teaching innovation and learning design. Springer.
29. Saito, H. (2008). EFL classroom peer assessment: Training effects on rating and commenting. Language Testing, 25(4), 553–581.
30. Sato, M., & Ballinger, S. (2016). Understanding peer interaction: Research synthesis and directions. In M. Sato, & S. Ballinger (Eds.), Peer interaction and second language learning: Pedagogical potential and research agenda (pp. 1–30). John Benjamins.
31. Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189.
32. Sluijsmans, D. M. A., Brand-Gruwel, S., van Merriënboer, J. J. G., & Martens, R. L. (2004). Training teachers in peer-assessment skills: Effects on performance and perceptions. Innovations in Education and Teaching International, 41(1), 59–78.
33. Taguba, H., & Plata, S. (2025). Engagement strategies and reasons for disengagement with teacher feedback: Insights from L2 senior high school students in academic writing. Language Testing in Asia, 15(1), 28.
  34. Talandis, J., Jr., & Muller, T. (2025). Integrating generative AI into academic writing classrooms: Practical pedagogical issues. In B. Lacy, M. Swanson, & P. Lege (Eds.), Moving JALT into the future: Opportunity, diversity, and excellence (pp. 23–31). JALT. [Google Scholar] [CrossRef]
  35. Topping, K. (1998). Peer assessment between students in colleges and universities. Review of Educational Research, 68(3), 249–276. [Google Scholar] [CrossRef]
  36. Topping, K. J., Gehringer, E., Khosravi, H., Gudipati, S., Jadhav, K., & Susarla, S. (2025). Enhancing peer assessment with artificial intelligence. International Journal of Educational Technology in Higher Education, 22(1), 3. [Google Scholar] [CrossRef]
  37. Tseng, S.-S., & Yeh, H.-C. (2019). The impact of video and written feedback on student preferences of English speaking practice. Language Learning & Technology, 23(2), 145–158. [Google Scholar] [CrossRef]
  38. van Merriënboer, J. J. G., & Sweller, J. (2005). Cognitive load theory and complex learning: Recent developments and future directions. Educational Psychology Review, 17(2), 147–177. [Google Scholar] [CrossRef]
  39. van Zundert, M., Sluijsmans, D., & van Merriënboer, J. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20(4), 270–279. [Google Scholar] [CrossRef]
  40. Warschauer, M., Tseng, W., Yim, S., Webster, T., Jacob, S., Du, Q., & Tate, T. (2023). The affordances and contradictions of AI-generated text for writers of English as a second or foreign language. Journal of Second Language Writing, 62, 101071. [Google Scholar] [CrossRef]
  41. Winstone, N. E., Nash, R. A., Parker, M., & Rowntree, J. (2017). Supporting learners’ agentic engagement with feedback: A systematic review and a taxonomy of recipience processes. Educational Psychologist, 52(1), 17–37. [Google Scholar] [CrossRef]
  42. Wisniewski, B., Zierer, K., & Hattie, J. (2020). The power of feedback revisited: A meta-analysis of educational feedback research. Frontiers in Psychology, 10, 3087. [Google Scholar] [CrossRef] [PubMed]
  43. Wu, J. G., & Miller, L. (2020). Improving English learners’ speaking through mobile-assisted peer feedback. RELC Journal, 51(1), 168–178. [Google Scholar] [CrossRef]
  44. Wu, Y., & Schunn, C. D. (2021). From plans to actions: A process model for why feedback features influence feedback implementation. Instructional Science, 49(3), 337–363. [Google Scholar] [CrossRef]
  45. Yeh, H. C., Tseng, S. S., & Chen, Y. S. (2019). Using online peer feedback through blogs to promote speaking performance. Educational Technology & Society, 22(1), 1–14. [Google Scholar]
  46. Zhan, Y., Boud, D., Dawson, P., & Yan, Z. (2025). Generative artificial intelligence as an enabler of student feedback engagement: A framework. Higher Education Research & Development, 44(5), 1289–1304. [Google Scholar] [CrossRef]
  47. Zhang, F., Li, D., Zhao, Y. E., & Zhao, Y. (2025). A longitudinal exploration of EFL learners’ peer feedback literacy development. Studies in Educational Evaluation, 87, 101522. [Google Scholar] [CrossRef]
  48. Zhang, F., Schunn, C. D., & Baikadi, A. (2017). Charting the routes to revision: An interplay of writing goals, peer comments, and self-reflections from peer reviews. Instructional Science, 45(5), 679–707. [Google Scholar] [CrossRef]
  49. Zhang, T., & Mao, Z. (2023). Exploring the development of student feedback literacy in the second language writing classroom. Assessing Writing, 55, 100697. [Google Scholar] [CrossRef]
  50. Zhao, H. (2018). Exploring tertiary English as a foreign language writing tutors’ perceptions of the appropriateness of peer assessment for writing. Assessment & Evaluation in Higher Education, 43(7), 1133–1145. [Google Scholar] [CrossRef]
  51. Zimmerman, B. J. (2000). Attaining self-regulation: A social cognitive perspective. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 13–39). Academic Press. [Google Scholar] [CrossRef]
  52. Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory into Practice, 41(2), 64–70. [Google Scholar] [CrossRef]

Share and Cite

Irwin, B.; Muller, T. Positioning Generative AI in EFL Peer Feedback: Training Feedback Literacy and Enabling Uptake in Speaking Classes. Educ. Sci. 2026, 16, 544. https://doi.org/10.3390/educsci16040544
