1. Introduction
Over the past several decades, the evaluation of second-language (L2) writing has increasingly incorporated technological support, evolving from early computer-assisted language learning (CALL) applications to more sophisticated automated writing evaluation (AWE) systems (Ranalli & Yamashita, 2022). Early forms of technology-assisted writing assessment relied primarily on rule-based error detection, focusing on surface-level features such as grammar, spelling, and mechanics (Barrot et al., 2021). With advances in natural language processing (NLP), AWE systems gradually expanded their evaluative scope to include multiple dimensions of writing, such as lexical choice, syntactic complexity, and organizational features, offering more systematic and consistent feedback to learners (Nunes et al., 2022).
Consequently, AWE systems have been widely adopted to address persistent challenges in writing instruction, particularly those associated with large class sizes, heavy teacher workloads, and delayed formative feedback (McNamara & Kendeou, 2022). These challenges are especially prominent in Chinese tertiary EFL classrooms, where writing instruction has traditionally been teacher-centered and product-oriented, with a strong emphasis on linguistic accuracy and examination performance (Z. V. Zhang & Hyland, 2022; Ng & Cheung, 2017). Such structural constraints limit opportunities for individualized feedback and iterative revision, often resulting in students’ mechanical engagement with writing tasks and heightened levels of writing anxiety (Barrot et al., 2021; Patty, 2024).
Empirical research conducted prior to the emergence of generative artificial intelligence (GenAI) has generally highlighted the pedagogical benefits of AWE systems. Studies suggest that automated feedback can support writing accuracy, encourage learner autonomy, and promote more frequent revision cycles (Ranalli & Yamashita, 2022; Geng & Razali, 2020; Zhai & Ma, 2022). In addition, AWE-mediated feedback has been found to reduce the affective pressure associated with direct teacher evaluation, potentially contributing to a more supportive writing environment (Barrot et al., 2021). However, earlier generations of AWE systems were largely constrained by fixed feedback templates and statistically driven algorithms, limiting their ability to engage meaningfully with content development, discourse organization, and learner-specific writing trajectories (Türkoğlu, 2025).
In recent years, the rapid development of GenAI technologies has fundamentally altered the landscape of writing instruction and evaluation (Bewersdorff et al., 2023). Unlike traditional AWE systems, GenAI-powered tools are capable of generating context-sensitive feedback, simulating dialogic interaction, and supporting writing as a dynamic, process-oriented activity rather than a purely product-based outcome (Yan, 2024). These developments have stimulated ongoing scholarly discussions regarding the pedagogical positioning of automated systems, particularly in relation to assessment practices and instructional roles (Loncar et al., 2023). Thus, the integration of automated writing systems now requires not only technical refinement but also careful pedagogical repositioning within instructional contexts.
Despite their growing prominence, most widely used AWE and AI-assisted writing tools—such as WriteToLearn®, MY Access!®, Criterion®, and Project Essay Grader—have been developed primarily for English-as-a-first-language contexts or generalized international markets (Wilson & Roscoe, 2020). As a result, these systems often fail to fully accommodate the linguistic characteristics, curricular demands, and learning cultures of Chinese EFL learners (Z. V. Zhang & Hyland, 2022; Su, 2020). Challenges such as first-language interference, exam-oriented writing requirements, and context-specific rhetorical conventions remain insufficiently addressed, underscoring the need for localized and pedagogy-aligned AWE solutions (Y. Zhang, 2020).
In response to this contextual demand, iWrite has been developed as a locally grounded AWE system designed specifically for Chinese tertiary EFL instruction (J. Wang & Wang, 2021). Rather than functioning solely as a grammar-focused correction tool, iWrite adopts an analytical scoring approach that evaluates student writing across multiple dimensions, including language use, content development, and organizational structure (Qin & Liu, 2025). By aligning its assessment framework with national curriculum standards and mainstream instructional practices in China, iWrite represents an evolution of AWE that emphasizes pedagogical compatibility alongside technological functionality (X. Chen, 2025).
Beyond linguistic outcomes, affective factors have emerged as critical considerations in technology-mediated writing instruction. Writing anxiety, in particular, remains prevalent among Chinese university students and has been shown to negatively affect writing fluency, organizational coherence, and sustained engagement (Abdel Latif, 2019; Yan, 2024). From the perspective of Control-Value Theory, learners’ emotional responses to writing tasks are closely linked to their perceived control over the writing process and evaluative pressure (Pekrun, 2006; Deane, 2018). Automated feedback systems that provide structured, manageable, and supportive feedback may therefore contribute to anxiety reduction by enhancing learners’ sense of control and facilitating repeated engagement with writing tasks (Barrot et al., 2021).
In evaluating L2 writing development, complexity, accuracy, and fluency (CAF) have been widely recognized as core dimensions in second-language writing research (McCallum & Curry, 2023). Although CAF has been extensively employed in second-language acquisition studies, comprehensive investigations examining all three dimensions simultaneously in the Chinese EFL writing context remain limited (Lu & Ai, 2015; Y. Zhang, 2020; Cheng & Zhang, 2021; Qin & Liu, 2025). Existing research has often focused on isolated aspects of writing performance, leaving a gap in understanding how localized AWE systems influence overall writing development across CAF dimensions (Toufaha, 2024).
Taken together, while iWrite is theoretically positioned to address both instructional and affective challenges in Chinese EFL writing, robust empirical evidence supporting its effectiveness remains scarce. Specifically, few studies have systematically examined its impact on writing performance across CAF dimensions or its influence on writing anxiety in authentic university classroom settings (Ranalli & Yamashita, 2022; Toufaha, 2024). To address these gaps, this mixed-methods study aims to investigate the effects of iWrite on Chinese university students’ English writing performance and writing anxiety. It seeks to answer the following research questions:
RQ1. What is the effect of the iWrite system, compared to traditional instruction, on Chinese university students’ English writing performance as measured by CAF?
RQ2. What is the effect of the iWrite system, compared to traditional instruction, on Chinese university students’ English writing anxiety?
RQ3. What are students’ perceptions of the influence of the iWrite system on their English writing performance and writing anxiety?
2. Materials and Methods
2.1. Research Design
This study adopted an explanatory sequential mixed-methods design (Creswell & Clark, 2017), a two-phase approach that strategically combines quantitative and qualitative methods. The rationale for selecting this design was threefold. First, it aligns directly with the nature of our research questions: initial quantitative questions (RQ1 & RQ2) assess the effects and extent of changes in writing performance and anxiety, while the subsequent qualitative question (RQ3) seeks to explain how and why students experienced these changes, providing depth to the numerical results. Second, this design capitalizes on the complementary strengths of both paradigms. The quantitative, quasi-experimental pretest–posttest phase offers generalizable and statistically testable evidence of causal relationships, whereas the subsequent qualitative phase, utilizing semi-structured interviews, provides rich, contextualized insights into participants’ perceptions and experiences (Mrabti & Alaoui, 2024). This integration allows for a more comprehensive and nuanced understanding than either approach alone. Third, the sequential structure is pragmatically suitable for investigating educational interventions, as it enables the qualitative data to build upon and explain the initial quantitative findings, a logic well-established in applied linguistics and educational technology research.
Therefore, the research was operationalized in two distinct phases. In the first, quantitative phase, a quasi-experimental pretest–posttest design was employed to collect numerical data on writing performance (CAF metrics) and writing anxiety (SLWAI scores) from both experimental and control groups. In the second, qualitative phase, semi-structured interviews were conducted with a purposive sample of participants from the experimental group to explore their subjective perceptions and experiences in greater depth, thereby explaining and elaborating on the quantitative outcomes. The findings from both phases were integrated during the interpretation stage to provide a consolidated conclusion regarding the impact of the iWrite system.
2.2. Participants and Context
The participants in this study consisted of 60 first-year non-English majors enrolled in teacher education programs, including Primary Education and Preschool Education. Detailed demographic characteristics, including gender, age, and years of English learning, were comparable between the experimental group (n = 30) and the control group (n = 30), as shown in Table 1.
Due to constraints in the actual teaching arrangements, a convenience sampling method was adopted by selecting two intact classes. These classes were subsequently assigned to the experimental and control conditions following a quasi-experimental design. The sample size was determined with reference to similar empirical studies and was further examined through an a priori power analysis conducted using G*Power 3.1.9.7, which indicated that the sample size was sufficient to detect medium-to-large effect sizes (Cheng & Zhang, 2021).
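For readers without access to G*Power, the same kind of a priori calculation can be approximated in a few lines of Python. The sketch below is illustrative only: it uses the normal approximation to the two-tailed independent-samples t-test, so it slightly understates the exact t-distribution-based sample size that G*Power reports.

```python
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-tailed independent-samples
    t-test, using the normal approximation (G*Power's exact result
    is typically one participant per group larger)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A large effect (d = 0.8) needs roughly 25 per group under this
# approximation; a medium effect (d = 0.5) needs roughly 63.
print(n_per_group(0.8), n_per_group(0.5))
```

With 30 participants per intact class, such a calculation supports detecting medium-to-large effects, consistent with the power analysis described above.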
Although individual random assignment was not feasible, pre-test measures and independent-samples t-tests were conducted to examine baseline equivalence between the two groups. The results indicated no statistically significant differences in writing performance or writing anxiety prior to the intervention, thereby supporting the validity of subsequent between-group comparisons (Creswell & Clark, 2017).
2.3. Instructional Procedures
The instructional intervention in this study was grounded in the theoretical frameworks of the process writing approach and formative assessment (Black & Wiliam, 1998; Flower & Hayes, 1981). Process writing conceptualizes writing as a recursive, multi-draft process. Accordingly, a 12-week cycle comprising five writing tasks was designed to facilitate a complete “planning-drafting-feedback-revising” cycle. The integration of the iWrite system aimed to transform the typically delayed, teacher-centered feedback into an immediate and multi-source feedback mechanism, aligning with formative assessment principles. Both groups received regular College English curriculum-aligned instruction, with the key difference residing not in task design, but in the feedback mechanism and the revision process it engendered.
Both groups were required to complete the same five writing assignments. All tasks were short argumentative essays (120–180 words) taken directly from the CET-4 test, ensuring genre consistency, and were administered under comparable classroom conditions. The use of these standardized prompts guaranteed consistent task difficulty and demands across groups.
The experimental group completed the assignments using the iWrite system, while the control group followed the traditional method by submitting their essays directly to the teacher.
A structured, process-oriented cycle was implemented for each task. Both groups were afforded an identical three-week revision window following the initial submission and were permitted to revise their work during this period. The fundamental distinction lay in the nature of the feedback that scaffolded revision: students in the experimental group, utilizing the iWrite system, received immediate, iterative, and diagnostic automated feedback after each draft submission, which enabled a dynamic, feedback-driven revision process where multiple submissions were naturally facilitated by the tool. In contrast, students in the control group received delayed, holistic written feedback from the instructor at the beginning of the revision window; consequently, their revision was guided by a single, static set of comments, without a structured mechanism for obtaining incremental feedback.
2.4. Research Instruments
2.4.1. iWrite System
The iWrite English Writing Teaching and Evaluation System employed in this study is a localized Automated Writing Evaluation (AWE) system jointly developed in 2015 by the research team led by Professor Liang Maocheng at Beijing Foreign Studies University and the Foreign Language Teaching and Research Press. Its core functionality lies in providing automated multi-dimensional scoring of student essays (covering language, content, text structure, and technical conventions) and delivering immediate, diagnostic feedback, with particular strength in identifying and correcting language errors specific to Chinese EFL learners.
The system is specifically designed to address the characteristics of Chinese English learners. Its evaluation model is based on the aforementioned four core dimensions established by experts in the fields of second-language writing, language assessment, and corpus linguistics (Li, 2021). To enable accurate assessment, iWrite has built a dedicated corpus containing hundreds of millions of tokens, integrating native speaker corpora, international learner corpora, and—most importantly—a continuously updated Chinese Learner English Corpus (e.g., the iWrite Corpus), which focuses on diagnosing common errors among Chinese learners (Wu et al., 2024). Currently, its intelligent scoring engine utilizes advanced deep learning technologies and demonstrates strong performance in automated scoring and grammatical error correction.
In this study, the core procedure for students using iWrite was as follows: after submitting an essay, the system generated an overall score and provided diagnostic feedback across the four dimensions (including specific grammatical corrections and lexical suggestions). Students in the experimental group then revised their essays based on this feedback.
2.4.2. Writing Tests
All participants completed a total of five writing tasks over the course of the study, consisting of one pretest, three instructional writing tasks during the intervention phase, and one posttest. To assess writing performance, participants completed an argumentative essay as a pretest at the beginning of the semester and a parallel argumentative essay as a posttest at the end of the intervention. Tasks were assigned at regular three-week intervals to ensure sufficient time for drafting, feedback, and revision. The pretest and posttest prompts were parallel in terms of genre, topic familiarity, length requirement, and difficulty level, ensuring task equivalence for measuring writing development over time.
This study utilized writing prompts from the authentic National College English Test Band 4 (CET-4) as the testing materials. The selection of the CET-4 was based on the following three key considerations:
First, as a national standardized test, the CET-4 has had its validity and reliability extensively confirmed over time, ensuring its measurement quality and solid academic credibility (S. Chen, 2022).
Second, the test content aligns closely with the objectives of this study. The CET-4 writing section directly assesses students’ ability to express themselves in writing on familiar topics. Its argumentative essay genre represents a core requirement in Chinese university English teaching and assessment, thereby ensuring good ecological validity for the research.
Finally, CET-4 scores carry significant social weight and certification value within the context of Chinese higher education. Consequently, research findings based on this test hold greater referential significance for teaching practices and related decision-making (Jiang, 2020).
Writing performance was assessed using the three-dimensional CAF (Complexity, Accuracy, and Fluency) framework. This framework has been widely recognized as a core, operationalizable index system for evaluating second-language writing development and output, capable of comprehensively reflecting different facets of learners’ language ability (Lee et al., 2023). Complexity was measured through syntactic complexity (clauses per T-unit) and lexical complexity (corrected type–token ratio). Accuracy was measured by the number of errors per 100 words. Fluency was measured by words per clause. Automated linguistic analysis tools, including L2SCA and LCA, were used to ensure objectivity and reliability (Saricaoglu & Atak, 2022).
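As a concrete illustration of the four indices above, the following Python sketch computes them from raw counts. The function and the sample counts are hypothetical; in the study, the underlying counts would come from L2SCA/LCA output and rater-identified errors.

```python
import math

def caf_indices(words: int, clauses: int, t_units: int,
                types: int, tokens: int, errors: int) -> dict:
    """CAF measures as operationalized in this study.
    All input counts here are illustrative placeholders."""
    return {
        "syntactic_complexity": clauses / t_units,            # clauses per T-unit (C/T)
        "lexical_complexity": types / math.sqrt(2 * tokens),  # corrected type-token ratio
        "accuracy": errors / words * 100,                     # errors per 100 words
        "fluency": words / clauses,                           # words per clause (W/C)
    }

# Hypothetical essay: 150 words, 18 clauses, 12 T-units,
# 95 word types among 150 tokens, 6 errors.
print(caf_indices(150, 18, 12, 95, 150, 6))
```

Note that the accuracy index decreases as writing improves (fewer errors per 100 words), whereas the other three indices increase.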
To ensure the objectivity and consistency of scoring for writing performance (CAF), two trained raters independently assessed all essays. The raters were blinded to the group assignment (experimental/control) and testing occasion (pre-/post-test) of the samples. Inter-rater reliability was quantitatively assessed using a two-way random-effects model for the intraclass correlation coefficient (ICC). The analysis yielded an ICC of 0.92, which is conventionally interpreted as indicating excellent agreement (Cicchetti, 1994).
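An ICC of this kind can be reproduced directly from the raters’ score matrix. The sketch below implements single-measure ICC(2,1) under the two-way random-effects, absolute-agreement model via the standard Shrout–Fleiss mean-square formulation; the five essays’ scores shown are invented for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement,
    single measure. `ratings` is one row per essay, one column
    per rater (Shrout & Fleiss mean-square formulation)."""
    n = len(ratings)      # number of essays
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_r = ss_rows / (n - 1)                                # between-essay MS
    ms_c = ss_cols / (k - 1)                                # between-rater MS
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual MS
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Two raters in close agreement on five illustrative essays:
scores = [[12, 13], [15, 15], [9, 10], [14, 14], [11, 11]]
print(round(icc_2_1(scores), 2))
```

Perfectly identical ratings yield an ICC of exactly 1; systematic disagreement between raters lowers the absolute-agreement coefficient even when rankings match.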
2.4.3. Writing Anxiety Questionnaire
This study adopted the Second Language Writing Anxiety Inventory (SLWAI), revised by Guo and Qin in 2010, as the research instrument. The scale is based on the Second Language Writing Anxiety Inventory developed by Cheng in 2004. The questionnaire assesses four dimensions: classroom anxiety, conceiving anxiety, avoidance behavior, and lack of confidence. Compared with Cheng’s SLWAI (2004), which was designed for English majors in Taiwan, the version adapted by Guo and Qin (2010) is better suited to Chinese EFL university students because of its specificity, comprehensiveness, and localization, including adaptation to learners’ language habits and cultural background. Such a localized scale can more accurately reflect the writing anxiety of Chinese EFL college students, and it has been shown to have high reliability and validity (Guo & Qin, 2010). It was therefore used in this study to measure participants’ writing anxiety.
2.4.4. Semi-Structured Interviews
To address RQ3 and gain a deeper understanding of the mechanisms behind the quantitative trends, semi-structured interviews were conducted. The interviews served a dual purpose: (1) to triangulate the quantitative findings on writing performance (CAF) and anxiety by seeking students’ explanatory perspectives; and (2) to explore in-depth the perceived advantages, challenges, and overall lived experience of using the iWrite system from the learners’ standpoint (Creswell & Creswell, 2017).
An interview protocol was developed based on the research objectives, ensuring alignment with the study’s theoretical framework (Coker & Akande, 2025). The protocol contained open-ended questions organized around key domains: students’ general experience with the writing tasks; their detailed perceptions of the feedback received; their described revision behaviors; and their feelings of confidence or anxiety throughout the process.
Semi-structured interviews were conducted with six participants from the experimental group following the completion of the intervention. A purposive sampling strategy was adopted to ensure variation in writing performance improvement and anxiety reduction levels. To ensure comfort and expressiveness, interviews were conducted face-to-face, one-on-one, in Mandarin Chinese (the participants’ first language). Each interview lasted approximately 30 min. Prior to the interview, informed consent was obtained, explicitly covering audio recording for research purposes. All interviews were recorded digitally and subsequently transcribed verbatim to prepare the textual data for analysis. Participant anonymity was maintained through the use of pseudonyms in all transcripts and reports.
2.5. Data Analysis
To address RQ1 and RQ2, quantitative data were analyzed in a two-stage procedure. Following descriptive statistics, inferential analyses were conducted. First, paired-sample t-tests assessed within-group changes from pre-test to post-test for each group separately.
Then, to directly compare the intervention effect between groups while controlling for pre-existing differences, a series of one-way analyses of covariance (ANCOVA) was performed with post-test scores as the dependent variable, group as the fixed factor, and the corresponding pre-test scores as the covariate. This method was selected over independent t-tests because it provides a more precise estimate of the treatment effect by accounting for baseline variability. Prior to these tests, the statistical assumptions were verified. The normality of distribution for all continuous variables was assessed using the Shapiro–Wilk test, and the homogeneity of variance was checked using Levene’s test. The data met the assumptions for parametric tests.
Effect sizes were calculated for statistically significant results, with Cohen’s d reported for t-tests and partial η² for ANCOVA. The magnitude of effects was interpreted in accordance with the L2-specific benchmarks proposed by Plonsky and Oswald (2014). All quantitative analyses were performed using SPSS 27.0.
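The first-stage within-group analysis can be sketched in plain Python. The scores below are hypothetical; the paired-samples t statistic computed here is the same quantity SPSS reports, and Cohen’s d is calculated as the mean difference divided by the standard deviation of the differences (one common convention for paired designs, among several).

```python
import math
from statistics import mean, stdev

def paired_t_and_d(pre, post):
    """Paired-samples t statistic and Cohen's d for pre-post change.
    d = mean(difference) / SD(difference); t = d * sqrt(n)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    d = mean(diffs) / stdev(diffs)   # stdev() uses the n-1 denominator
    t = d * math.sqrt(n)
    return t, d

# Hypothetical writing scores for six students, pre vs. post:
pre = [10, 12, 9, 11, 13, 10]
post = [12, 14, 10, 13, 14, 12]
t, d = paired_t_and_d(pre, post)
print(round(t, 3), round(d, 3))
```

The between-group ANCOVA stage has no compact stdlib equivalent and is better left to a statistics package, as was done here with SPSS.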
The qualitative data were analyzed using thematic analysis following the procedures outlined by Braun and Clarke (2006). The analysis involved six steps: (1) familiarization with the data through repeated reading of transcripts; (2) initial open coding to identify meaningful units; (3) grouping related codes into broader categories; (4) generating candidate themes; (5) reviewing and refining themes to ensure internal coherence and external distinction; and (6) defining and naming the final themes.
To enhance credibility and reliability, a second researcher independently reviewed a subset (30%) of the transcripts and coding results. Discrepancies were discussed until consensus was reached. Member checking was also conducted by returning summarized interpretations to participants for confirmation.
The qualitative findings were used to triangulate and enrich the quantitative results, providing deeper insight into the mechanisms through which the iWrite system influenced writing performance and anxiety.
4. Discussion
4.1. Summary of Key Findings
This study investigated the effects of integrating the iWrite Automated Writing Evaluation (AWE) system into university EFL writing instruction. The quantitative results revealed a differentiated impact of the iWrite system on the writing performance of Chinese EFL learners: significant improvements with medium-to-large effect sizes were observed in accuracy, fluency, and lexical complexity, whereas no significant change was found in syntactic complexity. This pattern delineates the specific affordances and current boundaries of technology-mediated feedback.
Specifically, the marked gain in accuracy can be directly attributed to iWrite’s immediate, form-focused formative feedback, which created an efficient cycle of “error identification-autonomous correction.” This corroborates the effectiveness of Formative Assessment Theory and Autonomous Learning Theory in technology-enhanced environments (Black & Wiliam, 1998; Holec, 1981). Interview data confirmed that students actively utilized the feedback for multiple revisions, achieving autonomous optimization of linguistic accuracy.
The enhancement in fluency likely stems from a reallocation of cognitive load. By offloading part of the language-monitoring function onto the system, students could direct more attentional resources toward idea generation and content development, thereby producing more words per clause (W/C). This supports the view that technological tools can promote production fluency by reducing the cognitive burden of the task.
The increase in lexical complexity indicates that iWrite can effectively serve as a scaffold for vocabulary development. The system’s feedback on word repetition and inappropriate collocations enhanced students’ metacognitive lexical awareness, prompting them to actively experiment with and incorporate more diverse vocabulary. This demonstrates the potential of technological feedback to facilitate constructivist learning.
However, the lack of significant improvement in syntactic complexity (C/T) can be interpreted from two perspectives. On one hand, it may reflect a “trade-off” in attentional resources, where students prioritized meeting the system’s explicit demands for accuracy and lexical diversity within a limited time (Skehan, 1998). On the other hand, it suggests that iWrite’s current feedback mechanism focuses more on sentence-level corrections and offers limited guidance for generating more complex syntactic structures like clause embedding.
Concurrently, on the affective dimension, the iWrite intervention led to a statistically significant reduction in writing anxiety across all four measured subscales, with particularly large effect sizes for “Avoidance Behavior” and “Lack of Confidence.” This provides robust empirical evidence for the system’s efficacy in alleviating the emotional barriers to writing among Chinese EFL learners. Qualitative data further revealed that this anxiety reduction was closely linked to the low-threat practice environment and the perceivable pathway for progress created by the system, laying the groundwork for a deeper analysis based on Control-Value Theory (Pekrun, 2006).
In summary, the findings paint a nuanced picture: the iWrite system is effective in enhancing language control, promoting production fluency, and enriching vocabulary, while also significantly reducing writing anxiety. However, its role in fostering the development of deeper linguistic competence, such as syntactic complexity, remains limited. This provides crucial evidence for understanding the scope of pedagogical empowerment offered by localized AWE tools.
4.2. Dialogue with Existing Theory and Literature
The aforementioned findings engage in a constructive dialogue with core theoretical frameworks in second-language writing and educational technology. First, through its immediate and multi-round technological feedback, iWrite successfully institutionalized the “drafting-feedback-revision” cycle advocated by the process writing approach into a sustainable classroom practice (Y. Zhang, 2020), demonstrating the potential of technology to ground pedagogical principles. Second, the significant reduction in writing anxiety can be powerfully explained by Control-Value Theory (Pekrun, 2006). The system’s feedback, by providing a clear and actionable path for improvement, enhanced learners’ perceived control over the writing task, while its automated, non-judgmental nature reduced the perceived threat often associated with authoritative evaluation (Waer, 2023). Third, the results address the ongoing debate about whether AWE can promote deep language learning. The facilitation of lexical complexity supports the view that technology can act as a scaffold for constructivist learning. However, the stagnation in syntactic complexity echoes previous observations that current AWE systems are more adept at optimizing local language forms, while offering limited support for generating complex syntactic structures (Warschauer & Ware, 2006). This highlights the necessity of combining such technology with social interactions that emphasize meaning negotiation, such as teacher guidance.
4.3. Implications for Pedagogical Practice and Teacher Professional Development
The findings of this study hold direct significance for EFL writing instruction, particularly regarding the evolution of teachers’ roles and their professional growth. AWE systems like iWrite are best positioned as collaborative feedback providers. By efficiently handling surface-level linguistic features, they can liberate teachers from the heavy burden of mechanical grading, allowing them to focus on providing higher-order guidance on aspects such as ideation, logical organization, and rhetorical strategies (Bai, 2021). This enables the realization of human–computer collaborative differentiated instruction.
Critically, technology integration profoundly drives teacher professional development. The introduction of iWrite necessitates and fosters a threefold evolution in the teacher’s role: (1) From assessor to learning designer: Teachers need to redesign curricular processes to embed AWE organically within learning cycles, constructing a multi-source feedback ecosystem. (2) From judgment-by-experience to data-informed instructional decision-maker: Teachers need to develop data literacy to utilize the learning analytics provided by the system for targeted intervention (Lee et al., 2023). (3) From knowledge transmitter to metacognitive coach: Teachers need to guide students on how to critically utilize technological feedback, cultivating self-regulated learning strategies. This shift aligns with the Technological Pedagogical Content Knowledge (TPACK) framework, emphasizing the need for teachers to integrate technology, pedagogy, and content knowledge (DeLeon et al., 2023).
Therefore, successful AWE integration must be accompanied by systematic teacher professional development support. The focus of training should shift from software operation to technology-enhanced pedagogy, helping teachers master the skills required for these new roles and reflect on the place of technology within their teaching philosophy. This echoes the global discussion on the need for teachers to develop TPACK (Tseng et al., 2019) and provides an empirical case for its specific application in the Chinese EFL context.
4.4. Limitations and Directions for Future Research
This study has several limitations. The sample was drawn from a specific major at a single university, which limits the generalizability of the findings. While the 12-week intervention period allowed for the observation of short-term changes, the long-term effects remain unclear. Furthermore, the study treated iWrite as a holistic intervention, failing to disentangle the specific contributions of its different feedback types. Future research could: (1) conduct comparative studies across different regions and learner profiles; (2) implement longer-term longitudinal tracking; (3) employ micro-analytic methods to investigate learners’ attention to and internalization of different types of AWE feedback, thereby informing more refined system design and pedagogical integration.
5. Conclusions
This study investigated the pedagogical impact of integrating the iWrite Automated Writing Evaluation system into university EFL writing instruction, with a particular focus on writing performance and writing anxiety. Drawing on a quasi-experimental mixed-methods design, the findings demonstrate that sustained use of iWrite leads to significant improvements in writing accuracy, fluency, and lexical complexity, as well as marked reductions in learners’ writing anxiety, when compared with traditional teacher feedback. These results provide empirical support for the effectiveness of localized AWE systems in technology-enhanced EFL writing contexts.
From a theoretical perspective, the findings extend existing research on AWE-assisted writing by offering fine-grained evidence based on the CAF framework and by foregrounding the affective dimension of writing development. The differential effects observed across CAF dimensions suggest that automated feedback is particularly well suited to facilitating form-focused learning, while more complex aspects of syntactic development remain less responsive to short-term automated intervention. In addition, the significant reduction in writing anxiety lends support to Control–Value Theory by illustrating how enhanced perceived control and supportive feedback environments can positively shape learners’ emotional experiences in second-language writing.
Pedagogically, the study highlights the value of adopting a human–AI collaborative approach to EFL writing instruction. When employed as a formative tool, iWrite can effectively complement teacher feedback by handling routine linguistic evaluation and enabling repeated, low-stakes revision. Such an instructional configuration allows teachers to allocate more time and attention to higher-order concerns, including idea development, discourse organization, and rhetorical effectiveness. At the same time, the use of AWE systems may contribute to the creation of a more supportive and low-anxiety learning environment, thereby fostering sustained learner engagement in writing.
Despite these contributions, several limitations should be acknowledged. The relatively short duration of the intervention and the use of intact classes may limit the generalizability of the findings. Future research could employ longitudinal designs, involve participants from diverse institutional contexts, and explore the combined effects of automated and teacher feedback on higher-order writing development. Further investigation into learners’ long-term writing trajectories and self-regulatory behaviors would also enrich understanding of the pedagogical potential of AWE systems.
In conclusion, the present study underscores the promise of localized AWE systems such as iWrite in enhancing both the cognitive and affective dimensions of EFL writing. By integrating automated feedback within a carefully designed instructional framework, higher education institutions may better support learners’ writing development and well-being in increasingly technology-mediated learning environments.