Prompting Better Feedback: A Study of Custom GPT for Formative Assessment in Undergraduate Physics
Abstract
1. Introduction
- How effectively can GenAI evaluate and provide feedback on longer-form physics content?
- What are the strengths and limitations of GenAI as a feedback provider, particularly in comparison to current assessor feedback?
- What are students’ perceptions of GenAI’s role in assessment, and how do these influence their evaluation of the tool?
1.1. Generative Artificial Intelligence (GenAI)
1.1.1. Why ChatGPT?
1.1.2. Benefits and Limitations
1.1.3. Past Works
1.2. Assessment and Feedback
- Feedback about the task (FT)—related to correctness and specific knowledge;
- Feedback about the processing of the task (FP)—related to learning strategies;
- Feedback about self-regulation (FR)—such as prompting self-evaluation;
- Feedback about the self (FS)—relating to personal praise (see the code sketch after this list).
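To make the typology concrete, here is a minimal Python sketch (our illustration, not material from the paper) that models the four feedback levels, with hypothetical example comments for a physics lab report:

```python
from enum import Enum

class FeedbackLevel(Enum):
    """The four feedback levels of the Hattie and Timperley model."""
    TASK = "FT"             # correctness and specific knowledge
    PROCESS = "FP"          # strategies used to process the task
    SELF_REGULATION = "FR"  # prompts for self-evaluation
    SELF = "FS"             # personal praise

# Hypothetical comments illustrating each level.
EXAMPLES = {
    FeedbackLevel.TASK: "The gradient is quoted without units or uncertainty.",
    FeedbackLevel.PROCESS: "Consider propagating errors before averaging runs.",
    FeedbackLevel.SELF_REGULATION: "Which section do you judge weakest, and why?",
    FeedbackLevel.SELF: "Great effort on this report.",
}

for level, comment in EXAMPLES.items():
    print(f"{level.value}: {comment}")
```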
2. Phase 1: Preliminary Survey—Results and Findings
2.1. Method
2.2. Results
2.2.1. Opinions on Lab Report Writing
2.2.2. Opinions on Current Feedback
2.2.3. Student Familiarity with GenAI
2.2.4. Student Trust in GenAI
2.2.5. GenAI in Relation to Assessors
2.2.6. Student Concerns Regarding GenAI Assessment
2.3. Key Survey Insights
3. Phase 2: GenAI Tool Development and Evaluation
3.1. Tool Development
3.1.1. Instruction Box
- Analyse the report structure using AnalysisInstructions.pdf:
  - Identify and label sections using heading and formatting cues.
  - Flag any deviations from the expected order.
  - Recognise written style as a distinct section.
- Evaluate the report content using MarkingCriteria.pdf.
- Generate feedback based on the formatting conventions in FeedbackPresentation.pdf.
- Identify the weakest section and ask the user whether a rewrite is desired; if yes, proceed using RewritingGuidelines.pdf.
- End with an open invitation for student follow-up or clarification (a consolidated prompt sketch follows this list).
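Below is a minimal sketch of how these steps could be consolidated into the Custom GPT's instruction box. The wording is an illustrative paraphrase, not the authors' verbatim prompt; the PDF file names are those referenced in the step list and described in Section 3.1.2:

```python
# Illustrative paraphrase of the instruction-box steps above,
# not the authors' verbatim prompt.
INSTRUCTIONS = """\
You are a formative-assessment assistant for undergraduate physics lab reports.

1. Analyse the report structure using AnalysisInstructions.pdf:
   identify and label sections from heading and formatting cues,
   flag deviations from the expected order, and treat written style
   as a distinct section.
2. Evaluate the report content against MarkingCriteria.pdf.
3. Present feedback following the conventions in FeedbackPresentation.pdf.
4. Identify the weakest section and ask the user whether a rewrite is
   desired; if yes, follow RewritingGuidelines.pdf.
5. End with an open invitation for follow-up or clarification.
"""

print(INSTRUCTIONS)
```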
3.1.2. Knowledge Base
3.1.3. Output Format
3.2. Evaluation Methodology
3.3. Evaluation Results
3.3.1. Pre-Feedback Survey: Student Attitudes Toward GenAI
3.3.2. Post-Feedback Survey: Student Evaluation of GenAI Tool
Usefulness
Accuracy
Clarity
Actionability
Comparison of Categories
Comparison to Human Assessor
Rewrite
3.3.3. Evaluation Results by Category: The Influence of Prior Attitudes on Feedback Evaluation
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Technique | Primary Function | Description | Purpose and Benefits in Educational Context |
|---|---|---|---|
| Chain of Thought (CoT) | Reasoning | Encourages step-by-step reasoning before answering. | Helps the model align with logical structures typical of lab report assessment. |
| Rephrase and Respond (RaR) | Tone Control | Requires the model to rephrase student input before evaluating. | Improves alignment with student writing and promotes clearer feedback phrasing. |
| System 2 Attention (S2A) | Deep Reflection | Simulates slow, deliberate cognitive processing. | Reduces superficial or overly generic responses by encouraging depth. |
| Few-shot Learning | Output Structuring | Incorporates worked examples into the prompt. | Establishes feedback style and tone consistency by modelling ideal responses. |
| Instruction-based Prompting | Behaviour Control | Uses explicit natural language directives for structure and constraints. | Ensures feedback format and scope are consistently followed. |
| Contextual Grounding | Content Anchoring | Conditions the model using domain-specific files (e.g., marking rubrics). | Anchors output in institutional expectations and scientific content norms. |
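To illustrate how several of these techniques co-occur in a single request, the sketch below combines instruction-based prompting (the system message), a Chain-of-Thought cue ("reason step by step"), and few-shot learning (one worked example pair) using the OpenAI Python client (openai >= 1.0). The function name, model choice, and example text are our assumptions rather than details from the paper:

```python
from openai import OpenAI

def build_messages(report_text: str) -> list[dict]:
    return [
        # Instruction-based prompting: explicit directives and constraints,
        # with a Chain-of-Thought cue ("reason step by step").
        {"role": "system",
         "content": ("You assess undergraduate physics lab reports. "
                     "Reason step by step through structure, content, "
                     "and style before writing feedback.")},
        # Few-shot learning: one worked example fixes style and tone.
        {"role": "user",
         "content": "Report excerpt: 'The pendulum period was 2s.'"},
        {"role": "assistant",
         "content": ("Feedback: quote the period with units and an "
                     "uncertainty, e.g. (2.00 +/- 0.02) s, and state how "
                     "it was measured.")},
        # The submission actually under review.
        {"role": "user", "content": f"Report excerpt: {report_text}"},
    ]

client = OpenAI()  # reads the OPENAI_API_KEY environment variable
response = client.chat.completions.create(
    model="gpt-4o",  # model choice is an assumption, not from the paper
    messages=build_messages("We measured g = 9.6 m/s^2 using a pendulum."),
)
print(response.choices[0].message.content)
```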