Comparing ChatGPT Feedback and Peer Feedback in Shaping Students’ Evaluative Judgement of Statistical Analysis: A Case Study
Abstract
1. Introduction
2. Literature Review
2.1. Defining Evaluative Judgement
2.2. Developing Evaluative Judgement of Statistical Analysis
2.3. Intersections Between Evaluative Judgement, Peer Feedback, and ChatGPT Feedback
3. Methods
3.1. Research Context and Participants
3.2. Data Collection
3.3. Data Analysis
4. Findings
4.1. Hard Evaluative Judgement
4.2. Soft Evaluative Judgement
4.3. Dynamic Evaluative Judgement
5. Discussion
5.1. Hard Evaluative Judgement: Navigating Accuracy and Accountability
5.2. Soft Evaluative Judgement: Cultivating Reflective and Interpretive Depth
5.3. Dynamic Evaluative Judgement: Reflecting on Process, Not Just Product
6. Implications
7. Limitations, Future Studies, and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ajjawi, R., & Bearman, M. (2018). Problematising standards: Representation or performance? In D. Boud, R. Ajjawi, P. Dawson, & J. Tai (Eds.), Developing evaluative judgement in higher education: Assessment for knowing and producing quality work (pp. 41–50). Routledge.
- Ajjawi, R., Tai, J., Dawson, P., & Boud, D. (2018). Conceptualising evaluative judgement for sustainable assessment in higher education. In D. Boud, R. Ajjawi, P. Dawson, & J. Tai (Eds.), Developing evaluative judgement in higher education: Assessment for knowing and producing quality work (pp. 7–17). Routledge.
- Albers, M. J. (2017). Quantitative data analysis—In the graduate curriculum. Journal of Technical Writing and Communication, 47(2), 215–233.
- Baglin, J., Hart, C., & Stow, S. (2017). The statistical knowledge gap in higher degree by research students: The supervisors’ perspective. Higher Education Research & Development, 36(5), 875–889.
- Bearman, M., & Ajjawi, R. (2023). Learning to work with the black box: Pedagogy for a world with artificial intelligence. British Journal of Educational Technology, 54(5), 1160–1173.
- Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 49(6), 893–905.
- Boud, D. (2000). Sustainable assessment: Rethinking assessment for the learning society. Studies in Continuing Education, 22(2), 151–167.
- Boud, D., & Soler, R. (2016). Sustainable assessment revisited. Assessment & Evaluation in Higher Education, 41(3), 400–413.
- Brown, G. T. (2014). What supervisors expect of education masters students before they engage in supervised research: A Delphi study. International Journal of Quantitative Research in Education, 2(1), 69–88.
- Brown, G. T. (2024). Teaching advanced statistical methods to postgraduate novices: A case example. Frontiers in Education, 8, 1302326.
- Chen, L., Howitt, S., Higgins, D., & Murray, S. (2022). Students’ use of evaluative judgement in an online peer learning community. Assessment & Evaluation in Higher Education, 47(4), 493–506.
- Chong, S. W. (2021). University students’ perceptions towards using exemplars dialogically to develop evaluative judgement: The case of a high-stakes language test. Asian-Pacific Journal of Second and Foreign Language Education, 6(1), 12.
- Dall’Alba, G. (2018). Evaluative judgement for learning to be in a digital world. In D. Boud, R. Ajjawi, P. Dawson, & J. Tai (Eds.), Developing evaluative judgement in higher education: Assessment for knowing and producing quality work (pp. 18–27). Routledge.
- Darvishi, A., Khosravi, H., Sadiq, S., & Gašević, D. (2022). Incorporating AI and learning analytics to build trustworthy peer assessment systems. British Journal of Educational Technology, 53(4), 844–875.
- Ellis, A. R., & Slade, E. (2023). A new era of learning: Considerations for ChatGPT as a tool to enhance statistics and data science education. Journal of Statistics and Data Science Education, 31(2), 128–133.
- Essien, A., Bukoye, O. T., O’Dea, X., & Kremantzis, M. (2024). The influence of AI text generators on critical thinking skills in UK business schools. Studies in Higher Education, 49(5), 865–882.
- Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530.
- Fereday, J., & Muir-Cochrane, E. (2006). Demonstrating rigor using thematic analysis: A hybrid approach of inductive and deductive coding and theme development. International Journal of Qualitative Methods, 5(1), 80–92.
- Goodyear, P., & Markauskaite, L. (2018). Epistemic resourcefulness and the development of evaluative judgement. In D. Boud, R. Ajjawi, P. Dawson, & J. Tai (Eds.), Developing evaluative judgement in higher education: Assessment for knowing and producing quality work (pp. 28–38). Routledge.
- Huang, A. Y., Lu, O. H., & Yang, S. J. (2023). Effects of artificial intelligence–enabled personalized recommendations on learners’ learning engagement, motivation, and outcomes in a flipped classroom. Computers & Education, 194, 104684.
- Jonsson, A. (2013). Facilitating productive use of feedback in higher education. Active Learning in Higher Education, 14(1), 63–76.
- Joughin, G. (2018). Limits to evaluative judgement. In D. Boud, R. Ajjawi, P. Dawson, & J. Tai (Eds.), Developing evaluative judgement in higher education: Assessment for knowing and producing quality work (pp. 60–69). Routledge.
- Luo, J., & Chan, C. K. Y. (2022a). Conceptualising evaluative judgement in the context of holistic competency development: Results of a Delphi study. Assessment & Evaluation in Higher Education, 48(4), 513–528.
- Luo, J., & Chan, C. K. Y. (2022b). Exploring the process of evaluative judgement: The case of engineering students judging intercultural competence. Assessment & Evaluation in Higher Education, 48(7), 951–965.
- Markauskaite, L., Marrone, R., Poquet, O., Knight, S., Martinez-Maldonado, R., Howard, S., & Siemens, G. (2022). Rethinking the entwinement between artificial intelligence and human learning: What capabilities do learners need for a world with AI? Computers and Education: Artificial Intelligence, 3, 100056.
- Mejía-Ramos, J. P., & Weber, K. (2020). Using task-based interviews to generate hypotheses about mathematical practice: Mathematics education research on mathematicians’ use of examples in proof-related activities. ZDM, 52(6), 1099–1112.
- Nelson, R. (2018). Barriers to the cultivation of evaluative judgement: A critical and historical perspective. In D. Boud, R. Ajjawi, P. Dawson, & J. Tai (Eds.), Developing evaluative judgement in higher education: Assessment for knowing and producing quality work (pp. 51–59). Routledge.
- Nicol, D. (2021). The power of internal feedback: Exploiting natural comparison processes. Assessment & Evaluation in Higher Education, 46(5), 756–778.
- Rawas, S. (2024). ChatGPT: Empowering lifelong learning in the digital age of higher education. Education and Information Technologies, 29(6), 6895–6908.
- Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching, 6(1), 342–363.
- Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
- Schwarz, J. (2025). The use of generative AI in statistical data analysis and its impact on teaching statistics at universities of applied sciences. Teaching Statistics, 47(2), 118–128.
- Su, Y., Lin, Y., & Lai, C. (2023). Collaborating with ChatGPT in argumentative writing classrooms. Assessing Writing, 57, 100752.
- Tai, J., Ajjawi, R., Boud, D., Dawson, P., & Panadero, E. (2018). Developing evaluative judgement: Enabling students to make decisions about the quality of work. Higher Education, 76, 467–481.
- Tai, J., Canny, B. J., Haines, T. P., & Molloy, E. K. (2016). The role of peer-assisted learning in building evaluative judgement: Opportunities in clinical medical education. Advances in Health Sciences Education, 21, 659–676.
- Topping, K., & Ehly, S. (1998). Introduction to peer-assisted learning. In K. Topping, & S. Ehly (Eds.), Peer-assisted learning (pp. 1–23). Lawrence Erlbaum Associates.
- Wichmann, A., Funk, A., & Rummel, N. (2018). Leveraging the potential of peer feedback in an academic writing activity through sense-making support. European Journal of Psychology of Education, 33, 165–184.
- Xie, X., Nimehchisalem, V., Yong, M. F., & Yap, N. T. (2024a). Malaysian students’ perceptions towards using peer feedback to cultivate evaluative judgement of argumentative writing. Arab World English Journal, 15(1), 298–313.
- Xie, X., Yong, M. F., Yap, N. T., & Nimehchisalem, V. (2024b). Students’ perceptions of evaluative judgement in technology-mediated dialogic peer feedback. Pertanika Journal of Social Sciences & Humanities, 32(4), 1643–1660.
- Xing, Y. (2024). Exploring the use of ChatGPT in learning and instructing statistics and data analytics. Teaching Statistics, 46(2), 95–104.
- Zhan, Y., Boud, D., Dawson, P., & Yan, Z. (2025). Generative artificial intelligence as an enabler of student feedback engagement: A framework. Higher Education Research & Development, 44(5), 1289–1304.
- Zhan, Y., & Yan, Z. (2025). Students’ engagement with ChatGPT feedback: Implications for student feedback literacy in the context of generative artificial intelligence. Assessment & Evaluation in Higher Education, 1–14.
| Framework/Author(s) | Theoretical Lens | Key Dimensions of Evaluative Judgement | Unique Contribution |
|---|---|---|---|
| Ajjawi and Bearman (2018) | Sociomaterial | Emerges through interplay of learners, tools, and contexts | Emphasises how judgement is situated and mediated through socio-material interactions |
| Dall’Alba (2018) | Epistemological and ontological | Involves both what students know/do and who they are becoming | Highlights integration of knowledge, practice, and identity in digital higher education |
| Joughin (2018) | Dual-process theory | Differentiates between intuitive (System 1) and analytical (System 2) modes of judgement | Introduces cognitive mechanisms underlying evaluative processes |
| Goodyear and Markauskaite (2018) | Epistemic capability | Judgement as capability for knowledgeable action in complex, evolving contexts | Links evaluative judgement with adaptive expertise and situated action |
| Luo and Chan (2022a, 2022b) | Developmental and integrative | Cognitive, affective, behavioural, and identity-related dimensions; evolves through disciplinary practice | Emphasises the holistic, non-linear development of evaluative judgement over time |
| Nelson (2018) | Tripartite historical/typological | Hard (accuracy), soft (value/significance), dynamic (procedural reflection) | Offers nuanced typology based on disciplinary evolution |
| Feedback Source | Category/(Axial) Code | Example |
|---|---|---|
| ChatGPT feedback | High statistical accuracy for straightforward tasks | I feel the main reason it worked well for me this time is because the tasks I gave it weren’t too complex or intricate (Tina_post-implementation survey). |
| | Necessity of verifying ChatGPT’s statistical outputs | If my results were different from what ChatGPT suggests, I didn’t just blindly trust it—I asked why, read its analysis carefully (John_post-implementation survey). |
| | Dependence on prompt quality | What if I gave it the wrong prompt? Then ChatGPT would still follow my instruction and produce an answer, but it might be based on a misunderstanding from the very beginning (Tina_interview). |
| Peer feedback | Identifying errors overlooked in self-checking | Peer feedback alerted me that other people’s answers were different from mine, and it was difficult for me to find such mistakes on my own (Stella_post-implementation survey). |
| | Questioning the authority of peer feedback | Sometimes we weren’t entirely sure whether the answer we agreed on was actually correct, and we needed an authoritative expert or teacher to provide us with a more credible explanation (Viki_interview). |
| Feedback Source | Category/(Axial) Code | Example |
|---|---|---|
| ChatGPT feedback | Immediate and user-friendly assistance | ChatGPT can answer my questions anytime I need—it’s so convenient. There’s no time limit (Mary_post-implementation survey). |
| | Broadening knowledge scope through exploration of alternatives | It sometimes gave me new insights or different ways of looking at the data—ideas I wouldn’t have thought of myself (John_interview). |
| | Hallucination hindering deep learning and undermining trust | The answers given by ChatGPT are often made up and not based on anything solid, so I don’t fully trust it to handle complex data analysis (Tina_interview). |
| | Limited contribution to rigorous statistical analysis | Because I don’t fully understand ChatGPT’s algorithms, training data, or data privacy practices, I prefer to use traditional tools like SPSS for future work (Tina_post-implementation survey). |
| Peer feedback | Encouraging critical reflection and in-depth understanding | Peers can suggest alternative explanations, highlight areas of ambiguity, and recommend improvements in the presentation of findings (Olivia_post-implementation survey). |
| | Stimulating learning motivation and engagement | This kind of interaction increased the sense of participation, built up confidence, and made my learning feel more active and engaging (Stella_interview). |
| | Fostering co-regulated learning and knowledge sharing | When someone made a good point, we’d recognise it right away. If there was any confusion or disagreement, we just talked it through in a friendly way (Olivia_interview). |
| | Limited contribution to rigorous statistical analysis | Most of the peers I know are also starting from scratch—we’re all beginners in statistics, so I can’t fully trust peer feedback when it comes to final application (Viki_interview). |
| Feedback Source | Category/(Axial) Code | Example |
|---|---|---|
| ChatGPT feedback | Recommending appropriate statistical methods | It could help me clarify my research question making sure it’s well-defined and recommend the suitable statistical analysis approaches (Mary_post-implementation survey). |
| | Step-by-step guidance for statistical software use | For correlation analysis, it told me to go to Analyse > Correlate > Bivariate in SPSS—super clear (John_post-implementation survey). |
| | Detailed instructions for result explanation | ChatGPT explained the meaning of each table and figure, making it much easier for me to understand the results (Mary_interview). |
| | Support for standardised statistical reporting | ChatGPT also guided me in properly reporting the results, including highlighting the significant findings to be reported in line with academic standards (John_interview). |
| | Limitations in error detection and procedure verification | ChatGPT can’t check for data entry mistakes, missing values, or confirm whether appropriate statistical procedures have been correctly followed (Tina_interview). |
| Peer feedback | Fostering critical reflection on statistical methods | Peer feedback prompted us to carefully examine data analysis for any potential flaws or limitations, and we discussed the reasons for choosing Spearman over Pearson (Stella_interview). |
| | Supporting careful review of analytical processes | My peers immediately asked me to re-operate the analysis, and when I opened the data file, they pointed out to me that I had opened the wrong data file (Olivia_interview). |
| | Promoting deeper understanding of statistical outputs | The discussion made me realise that my interpretation of the output was not as clear or accurate as I had initially thought (Olivia_post-implementation survey). |
| | Providing feedback supported by multiple references | The process of providing feedback required us to consult various sources to continually justify our viewpoints, helping us develop a deeper understanding of knowledge (Stella_interview). |
| | Peer consensus sometimes results from practical necessity | Since others in the group thought the same way, I felt like I had to compromise temporarily. After all, this was a team assignment, it wasn’t just about my own answer anymore (Viki_interview). |