Boundary Conditions for LLM-Generated Feedback in Primary Writing: An Educator-Aligned Evaluation and Design Considerations

Zhang, Dan; Hoang, Thuong; Zhu, Ye; Wang, Rui; Crouch, Paula; Wang, Yi

doi:10.3390/computers15060393

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Dan Zhang ¹,*, Thuong Hoang ¹,*, Ye Zhu ¹, Rui Wang ², Paula Crouch ³ and Yi Wang ¹

¹ Faculty of Science, Engineering and Built Environment, School of Information Technology, Deakin University, Burwood, VIC 3125, Australia

² Commonwealth Scientific and Industrial Research Organisation (CSIRO), Research Way, Clayton, VIC 3168, Australia

³ Kinetic Education, 506 Nepean Hwy, Frankston, VIC 3199, Australia

* Authors to whom correspondence should be addressed.

Computers 2026, 15(6), 393; https://doi.org/10.3390/computers15060393 (registering DOI)

Submission received: 18 May 2026 / Revised: 10 June 2026 / Accepted: 13 June 2026 / Published: 18 June 2026

(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling (2nd Edition))


Abstract

Generative large language models (LLMs) are increasingly used to support writing feedback. However, the pedagogical safety and usefulness of LLM feedback for primary students remain under-evaluated. This study reports an educator-centered evaluation of GPT-4 Turbo for Year 5 narrative and persuasive writing in the context of an established online tutoring program. Using authentic student drafts paired with tutor feedback, we generated parallel LLM feedback via rubric-aligned prompting and compared the two feedback sources in a blinded, within-script design. Four experienced English specialists co-designed a six-dimensional rubric (clarity, specificity, helpfulness, feasibility, relevance, and overall effectiveness) and rated tutor versus LLM feedback for each script; their written reflections were analyzed thematically to surface boundary conditions and risk perceptions. Across dimensions, tutor feedback received slightly higher mean ratings, with the clearest descriptive advantage in perceived helpfulness; however, none of the differences remained statistically significant after Holm-Bonferroni correction. LLM feedback was often rated similarly for clarity and feasibility but was frequently characterized as generic, surface-focused, and occasionally misaligned with the student draft, which increased verification effort and posed a risk of misleading learners if used without mediation. Synthesizing ratings and educator reflections, we identify conditions under which LLM feedback is most appropriate as rapid first-pass support for routine structure and surface revision, and least appropriate for developmental judgment and context-sensitive guidance. We translate these findings into design requirements for teacher-in-the-loop primary writing feedback systems, including alignment to explicit pedagogical constructs, editable workflows, and safeguards that reduce unsupported feedback before release to students.

Keywords: large language models (LLMs); automated writing feedback; primary education; human-AI collaboration; educator-centered evaluation
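The abstract reports that no tutor-versus-LLM difference remained significant after Holm-Bonferroni correction across the six rubric dimensions. The following is a minimal sketch of how such a step-down correction could be applied to six per-dimension p-values. The p-values below are hypothetical placeholders chosen for illustration, not results from the study, and the function name `holm_bonferroni` is an assumption rather than the authors' code.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    p_values: dict mapping a label to its raw p-value.
    Returns a dict mapping each label to (adjusted_p, reject).
    """
    m = len(p_values)
    # Sort hypotheses from smallest to largest raw p-value.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    results = {}
    running_max = 0.0
    for rank, (label, p) in enumerate(ordered):
        # The k-th smallest p-value (0-based rank) is scaled by (m - rank).
        adjusted = min(1.0, (m - rank) * p)
        # Enforce monotonicity so adjusted p-values never decrease.
        running_max = max(running_max, adjusted)
        results[label] = (running_max, running_max <= alpha)
    return results


# Hypothetical raw p-values for the six rubric dimensions (NOT study data).
raw_p = {
    "clarity": 0.40,
    "specificity": 0.12,
    "helpfulness": 0.02,
    "feasibility": 0.55,
    "relevance": 0.20,
    "overall effectiveness": 0.09,
}

corrected = holm_bonferroni(raw_p)
for dimension, (adjusted_p, reject) in corrected.items():
    print(f"{dimension}: adjusted p = {adjusted_p:.2f}, reject = {reject}")
```

With these placeholder values, the smallest raw p-value (0.02) falls below 0.05 on its own but is multiplied by six under the correction, so no dimension is flagged. This mirrors the pattern the abstract describes, in which a descriptive advantage does not survive adjustment for multiple comparisons.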
