Article

Participatory Co-Design and Evaluation of a Novel Approach to Generative AI-Integrated Coursework Assessment in Higher Education

by Alex F. Martin 1,*, Svitlana Tubaltseva 2, Anja Harrison 1 and G. James Rubin 1
1 Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London WC2R 2LS, UK
2 School of Liberal Arts, Richmond American University London, London W4 5AN, UK
* Author to whom correspondence should be addressed.
Behav. Sci. 2025, 15(6), 808; https://doi.org/10.3390/bs15060808
Submission received: 30 April 2025 / Revised: 5 June 2025 / Accepted: 10 June 2025 / Published: 12 June 2025

Abstract

Generative AI tools offer opportunities for enhancing learning and assessment, but raise concerns about equity, academic integrity, and the ability to critically engage with AI-generated content. This study explores these issues within a psychology-oriented postgraduate programme at a UK university. We co-designed and evaluated a novel AI-integrated assessment aimed at improving critical AI literacy among students and teaching staff (pre-registration: osf.io/jqpce). Students were randomly allocated to two groups: the ‘compliant’ group used AI tools to assist with writing a blog and critically reflected on the outputs, while the ‘unrestricted’ group had free rein to use AI to produce the assessment. Teaching staff, blinded to group allocation, marked the blogs using an adapted rubric. Focus groups, interviews, and workshops were conducted to assess the feasibility, acceptability, and perceived integrity of the approach. Findings suggest that, when carefully scaffolded, integrating AI into assessments can promote both technical fluency and ethical reflection. A key contribution of this study is its participatory co-design and evaluation method, which was effective and transferable, and is presented as a practical toolkit for educators. This approach supports growing calls for authentic assessment that mirrors real-world tasks, while highlighting the ongoing need to balance academic integrity with skill development.

1. Introduction

Recent advances in generative artificial intelligence (AI), powered by large language models, present opportunities and challenges for assessment in higher education. AI is now widely used across sectors including health, industry, and research (McKinsey, 2024; Sun et al., 2023), and is permanently reshaping the nature of academic tasks. In educational settings, AI has already shown potential to support learning by providing personalised feedback, scaffolding writing processes, and automating routine tasks (Kasneci et al., 2023; Strielkowski et al., 2025). Interest in the role of AI in education has accelerated rapidly in recent years (Strielkowski et al., 2025), with growing attention being paid to its implications for assessment and feedback practices (e.g., Henderson et al., 2025; Usher, 2025). In this study, we extend this literature by evaluating a novel assessment design that contrasts different modalities of AI use, providing new insight into how AI can be critically and ethically integrated into higher education assessment. Our participatory methodology is transferable to other educational contexts, and we provide practical resources to support educators in adapting this approach.
Initial studies suggest that, while students may benefit from AI-enhanced feedback, overreliance on these tools may undermine opportunities for deep learning and critical engagement (Kasneci et al., 2023; Suriano et al., 2025; Zawacki-Richter et al., 2019; Zhai et al., 2024). The integration of generative AI in education also presents challenges. Equity concerns persist, including unequal access to reliable AI tools and the digital skills needed to use them meaningfully (UNESCO, 2024). Academic integrity is also at risk, as AI can be used to ‘cheat’ in ways that evade detection (Yusuf et al., 2024). Moreover, the use of AI complicates traditional concepts of authorship and scholarship, raising questions about what constitutes independent academic work (Kulkarni et al., 2024; Luo, 2024). There are also concerns that critical thinking, a key goal of higher education, could be weakened if students accept AI outputs without careful evaluation (Bittle & El-Gayar, 2025).
In response, there is growing recognition of the need to build critical AI literacy among students and staff. This means not just knowing how to use AI tools, but understanding how they work, the wider impacts they have, and how to assess AI-generated content carefully and ethically (Abdelghani et al., 2023). Developing critical AI literacy is needed to prepare students to be thoughtful, responsible users of AI, and it should be built into teaching and assessment strategies.
The overarching aim of this study is to improve the critical AI literacy of postgraduate students and teaching staff through the co-design and evaluation of an AI-integrated written coursework assessment that contrasts different AI modalities. In this assessment, students used generative AI tools to draft a blog critically summarising an empirical research article and produced a reflective, critical commentary on the AI-generated content. Specifically, we asked two research questions:
  • Is the AI-integrated assessment acceptable and feasible for students and teaching staff?
  • Can teaching staff distinguish between assessments completed in accordance with the brief and those generated entirely by AI?
Our findings were used to develop practical guidance and a toolkit for educators, support the implementation and iterative improvement of AI-integrated assessments, and contribute to the wider pedagogical literature on assessment in higher education.

2. Materials and Methods

This study uses a participatory evaluation approach. Participatory evaluation involves contributors not just as participants, but as co-designers and co-evaluators (Fetterman et al., 2017), and has been used previously to explore AI-related resources and curriculum development (Cousin, 2006; Teodorowski et al., 2023). A strength of this approach is its emphasis on different forms of expertise, including lived experience, disciplinary knowledge, and teaching practice, which contribute to the development of assessments that are both grounded and relevant.
The protocol is available at OSF Registries: doi.org/10.17605/OSF.IO/JQPCE. We used the Guidance for Reporting Involvement of Patients and the Public short form (GRIPP2-SF) checklist to report involvement in the study (reported in Table S1 in the Supplementary Materials; Staniszewska et al., 2017). This study was approved by the Research Ethics Panel of King’s College London (LRS/DP-23/24-42387; 27 June 2024).
This study involved twelve participants from the 2023–24 cohort of a postgraduate course at the Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, a Russell Group university in the United Kingdom. The student cohort comprised approximately 30 individuals. Most were in their early twenties and had entered the MSc programme directly after completing their undergraduate studies, with around one in six being mature students returning to education after spending time in the workforce. A very small number were men. Approximately one-third were UK home students, while two-thirds were international, the majority of whom were from East Asia.
Eight students and four members of the teaching team took part in the study. The teaching staff included a Teaching Fellow, a Lecturer, and two Research Associates. All participants had recently completed or marked a summative assessment within the course. We considered the sample size to be adequate given the small cohort, the participatory nature of the research, and the principle of information power, which suggests that the more relevant information the sample holds, the smaller the sample size needed (Malterud et al., 2016). In this study, participants were well positioned to inform the evaluation, having first-hand experience with the assessment and its development. They brought a range of expertise and experience with AI, from high digital literacy to limited prior use, as well as strengths in academic writing and assessment design. This ensured that the participatory methods supported shared ownership, practical relevance, and opportunities for innovation.

2.1. Stage 1: AI-Integrated Assessment

The research team collaborated with other members of the course’s teaching team to adapt an existing summative assessment already embedded in the curriculum. This assessment required students to write a blog post summarising and critically appraising an empirical research article on mental health. Framed as an authentic assessment, the task included the potential for selected blogs to be published on science communication platforms.
We used the Transforming Assessment in Higher Education framework developed by AdvanceHE to guide our approach to integrating generative AI tools into this assessment (Healey & Healey, 2019). The framework highlights the need for assessments that are authentic, inclusive, and aligned with learning outcomes, emphasising the importance of involving students in the assessment development process. This emphasis aligned with our approach of integrating AI tools to reflect real-world practices and to develop critical AI literacy.
Under the revised assessment approach, students were asked to use two AI tools to assist with drafting a blog based on an empirical article. The written assessment consisted of three components:
  • Two AI-generated blog drafts using two AI tools.
  • A final blog that combined the strongest elements of the AI outputs with the student’s own revisions and original contributions, assessed for the accurate and critical appraisal of the empirical article.
  • A commentary critically reflecting on the AI-generated content and explaining the rationale for revisions made, assessed for the depth of critical and ethical reflection.
The marking matrix was revised to retain the use of a standard critical appraisal checklist for assessing students’ understanding of the empirical article, alongside the programme-wide marking framework (stage 2). New criteria were introduced to evaluate students’ critical engagement with AI-generated content (stage 3). The adapted format built on the existing learning outcome of critically appraising empirical research, extending it to assess students’ ability to reflect on the role of AI in academic work, apply subject knowledge to evaluate AI outputs, and make informed editorial decisions.

2.2. Stage 2: Assessment Trial

All participants were invited to take part in a trial of the adapted assessment. They first attended a workshop designed to support students in their AI-assisted assessment. Microsoft Copilot, in both balanced and precise modes, was the mandated generative AI tool, selected for its free availability for the participants (ensuring equitable access) and to allow for direct comparisons between model outputs. While Copilot was used in this instance, the assessment was designed to be transferable to other AI tools.
The workshop was delivered in four stages. The first introduced Copilot’s core functions, including its strengths, limitations, and examples of effective prompt writing. In the second stage, students practised drafting prompts and used the AI models to generate and revise a mock blog post. The final two stages drew on Gibbs’ Reflective Cycle to guide structured learning (Gibbs, 1988). In stage three, students critically appraised an AI-generated blog and compared the outputs produced by the two Copilot modes. This exercise supported a deeper understanding and analysis of AI-generated content. In the final stage, students reflected on their use of AI and developed an action plan for how they would apply AI tools in future academic work. This reflection aimed to consolidate learning and promote ethical, informed use of generative AI tools.
Feedback on the workshop was collected through qualitative discussions at the end of the session and a short survey. The survey included a Likert-scale question assessing whether the workshop would help students complete the assessment (responses: yes, somewhat, no) and two free-text questions: “What did you learn from the workshop?” and “What was missing from the workshop that would help you feel more prepared for the pilot assessment?”
Student participants were then randomly allocated to one of two groups. Those in the ‘compliant’ group were instructed to follow the coursework brief precisely, using the designated AI tools as directed. Students in the ‘unrestricted’ group were given freedom to complete the assessment by any means, including generating the entire submission using AI tools. They were encouraged to be creative and to push the boundaries of the process. Teaching staff participants were asked to mark the submitted assessments and provide written feedback to students using the adapted marking matrix. They also indicated whether they believed each student had completed the assessment as instructed (i.e., was in the ‘compliant’ group) or had been in the unrestricted group.
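To illustrate the allocation step, the following minimal R sketch (R being the software used for the study’s statistical analysis) shows one way to randomly assign a small cohort to two groups. The student identifiers, the balanced 1:1 split, and the fixed seed are hypothetical illustrations, not details taken from the study.

# Hypothetical illustration of random allocation to the two assessment groups.
# IDs, group balance, and seed are placeholders, not the study's actual procedure.
set.seed(42)                                   # reproducible illustrative allocation
students <- paste0("S", 1:8)                   # eight student participants, as in the pilot
groups <- sample(rep(c("compliant", "unrestricted"), each = 4))  # shuffle a balanced vector
allocation <- data.frame(student = students, group = groups)
print(allocation)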

2.3. Stage 3: Evaluation

Participants took part in an iterative process of reviewing and refining the assessment materials, including the workshop content, coursework brief, and marking matrix.
To explore the feasibility, acceptability, and perceived integrity of the AI-integrated assessment approach, we conducted a series of semi-structured focus groups with students and individual interviews with teaching staff. This format was chosen to accommodate participant preferences and availability, while also helping to reduce power imbalances by providing students with a peer-supported setting in which to reflect on an assessment co-designed with researchers who were also their course instructors.
The discussion guides are reported in Supplement SB in the Supplementary Materials and at the Open Science Framework project: osf.io/ctewk/. They were developed to address the study’s research aims and to capture experiences across both groups regarding their engagement with generative AI in the context of assessments. Both focus groups and interviews lasted from approximately 45 to 60 min and were structured in two parts: the first explored participants’ existing knowledge of generative AI and their experiences of completing or marking the assessment; the second addressed their reflections on the assessment design and its potential for future implementation. In addition, we explored perceptions of ‘cheating’ in the assessment, including whether students in the compliant and unrestricted groups felt they had met the intended learning outcomes and whether staff felt able to distinguish between the two groups. Particular attention was paid to whether the approach supported intended learning outcomes and provided a fair measure of student performance.
We also asked questions about the initial training workshop as part of the interviews. This feedback was reviewed alongside data from the survey questions completed by participants after the workshop and was used to revise and improve the training content.
Focus groups and interviews were conducted via Microsoft Teams. Thematic analysis was led by one researcher (AFM), following Braun and Clarke’s (2006) approach, including familiarisation with the data, initial coding, theme identification, and iterative theme refinement. Analyses were performed separately for students and teaching staff. Emerging themes were reviewed and refined through discussion within the research team and with participants who attended the subsequent workshops.
In addition to qualitative comparisons, we conducted a statistical analysis to compare how successful markers were at identifying assessments written by students in either the compliant or unrestricted groups. Given the small sample size and expected cell counts below five, we used Fisher’s exact test rather than the chi-square approximation (Howell, 2011), calculated using base R (R Core Team, 2025).
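As a minimal sketch of this analytic choice, the snippet below builds a 2 × 2 contingency table of marker judgements, checks the expected cell counts that make the chi-square approximation unreliable, and indicates where fisher.test() from base R would be applied. The counts shown are hypothetical placeholders, not the study data.

# Hypothetical 2x2 table of marker judgements (rows: actual group; columns: marker's call).
judgements <- matrix(c(5, 7,
                       2, 4),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(group = c("compliant", "unrestricted"),
                                     marker_call = c("correct", "incorrect")))

# Expected counts under independence; values below 5 argue for Fisher's exact test.
expected <- outer(rowSums(judgements), colSums(judgements)) / sum(judgements)
print(expected)
any(expected < 5)        # TRUE here, so fisher.test(judgements) is the safer choice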
We held co-design workshops with students and teaching staff to further refine the assessment brief and marking matrix, respectively. The think aloud technique was used (Charters, 2003; Someren et al., 1994), whereby each section of the assessment materials was reviewed in turn. Participants took part in a facilitated group discussion, voicing their thoughts, suggestions, and reactions in real-time as they engaged with the materials. Data saturation was considered to have occurred when no further substantial changes were proposed by the participants. Two workshops were held with students and one with teaching staff, which likely reflects the fact that more extensive feedback had already been gathered from teaching staff during earlier individual interviews and incorporated into the materials prior to the workshops.
Feedback gathered during these sessions was used to inform revisions to the assessment materials. We documented this process using a Table of Changes (ToC) from the Person-Based Approach, prioritised with the MoSCoW method, a framework used to collaboratively decide which features, changes, or recommendations should be implemented (must, should, could, would like) (Bradbury et al., 2018). We also used a Custom GPT built on GPT-4-turbo, which allows for the creation of a personalised version of ChatGPT-4 tailored to specific tasks or knowledge domains, to review the final materials for accessibility and readability.

3. Results

3.1. Assessment Materials and Learning Outcomes

The modifications made to the assessment materials are summarised in the ToC (Table 1, Table 2, Table 3 and Table 4).
Feedback on the co-designed assessment materials produced using the Custom GPT indicated that, while both the assessment brief and marking matrix were generally well-structured and aligned with learning outcomes, several refinements could improve readability and accessibility. These included ensuring consistency of language and tone, using bullet points and clearer formatting to support navigation, and clarifying instructions around AI tool use and submission structure. Minor revisions were recommended to the learning outcomes and the reflection criteria to enhance alignment with marking expectations.
The final versions of the assessment materials (workshop proformas, assessment brief, and marking matrix) and the amendments recommended by ChatGPT are included in Supplement SC in the Supplementary Materials and at the Open Science Framework project: osf.io/ctewk/.
Feedback on the learning outcomes was generally positive or neutral, with no negative responses offered. Students and teaching staff appreciated the inclusion of learning outcomes and found them helpful for understanding the purpose of the assessment. Some suggested making the link between the learning outcomes and the specific assessment tasks more explicit to improve alignment and clarify expectations. The major change that emerged from all feedback sources was the need to communicate that critical appraisal of the original empirical article is as important as the appraisal of the ability of AI to generate seemingly useful content. One student noted that engaging with the AI output highlighted inaccuracies, such as fabricated participant details, which prompted them to critically verify the content against the original source. This process, while demanding, was seen as intellectually valuable: “It forces you to actually figure out whether you’re critically appraising the critical appraisal.”(S5) Another contributor reflected on the need to distinguish between assessing AI literacy and assessing critical thinking (S2), suggesting that the learning objectives should clearly indicate which of these skills is being prioritised. This feedback informed revisions to the assessment brief and to the learning outcomes.
Our revised learning outcomes became the following:
  • Critical appraisal: Students will demonstrate the ability to critically appraise academic content by the following:
    • Evaluating an empirical research article using an established critical appraisal checklist.
    • Assessing the accuracy, relevance, and limitations of AI-generated content in relation to the original empirical article.
    • Comparing outputs from different AI tools, identifying their strengths and weaknesses in academic content generation.
  • Generative AI literacy: Students will develop foundational AI literacy by using generative tools to support scientific blog writing. They will demonstrate an understanding of AI’s capabilities and limitations, including the ability to identify common errors such as fabrication or hallucination.
  • Editorial and reflective judgement: Students will apply editorial judgement to revise AI-generated content, integrating critical analysis and original insight. They will reflect on their use of AI tools and articulate the rationale for content modifications in alignment with accuracy, academic standards, and ethical considerations.

3.2. Feasibility, Acceptability, and Integrity

Table 5 and Table 6 present summaries of the key findings and illustrative quotes from the thematic analysis of the focus groups and interviews.
Student feedback highlighted that, while AI tools could streamline aspects of the writing process, they did not reduce workload due to the effort required to refine outputs. Perceptions of feasibility, acceptability, and integrity varied, with students valuing the opportunity to build critical thinking skills, but also expressing concerns about fairness, skill development, and, notably, ownership of their work. Some viewed equitable access and thoughtful integration of AI as particularly important for maintaining academic standards. Teaching staff found the assessment structure clear, although marking was initially time-intensive because of the dual task of evaluating both AI and student contributions. Efficiency improved with familiarity, and staff recognised the assessment’s potential to support critical engagement. While challenges remained in distinguishing AI-generated from student-authored content, most staff endorsed transparent and pedagogically grounded use of AI in academic settings.
Students in the unrestricted group found that using AI to complete the entire assessment was challenging, with outputs, particularly the reflective commentary, requiring substantial oversight and correction. Some spent a similar amount of time on the task as those in the compliant group, while others felt they used somewhat less. Most felt they had achieved the intended learning outcomes due to the time spent checking, appraising, and reflecting on the AI-generated content.
Assessment marks ranged from 35 (fail) to 78 (distinction). For most assessments, marks from different markers fell within a ten-point range, but for one assessment, scores ranged more widely (from 58 to 78). Across markers, 6 of 14 judgements of compliant-group assessments correctly identified the group (42.9%), compared with 3 of 6 judgements for the unrestricted group (50.0%). Fisher’s exact test produced an odds ratio of 0.75, p = 1.00, indicating that marker accuracy did not differ meaningfully between the groups. Markers’ views on identifying students in the unrestricted condition were polarised: some reported having no clear sense, while others felt very confident that they could recognise AI-generated submissions. However, these subjective impressions were not reflected in their actual ability to accurately distinguish between the groups.
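The reported values can be approximately reproduced from these counts. The 2 × 2 layout below is our reconstruction of how correct and incorrect identifications were tabulated (an assumption); fisher.test() in R reports a conditional maximum-likelihood odds ratio close to the simple cross-product ratio of 0.75, with a two-sided p-value of 1.

# Reconstructed 2x2 table: 6/14 correct identifications for the compliant group and
# 3/6 for the unrestricted group (the layout is our assumption, not taken from the paper).
identifications <- matrix(c(6, 8,    # compliant:    6 correct, 8 incorrect
                            3, 3),   # unrestricted: 3 correct, 3 incorrect
                          nrow = 2, byrow = TRUE,
                          dimnames = list(group = c("compliant", "unrestricted"),
                                          marker_call = c("correct", "incorrect")))
fisher.test(identifications)   # two-sided p = 1, consistent with the reported result
(6 / 8) / (3 / 3)              # simple cross-product odds ratio = 0.75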

4. Discussion

The findings from this study add to the nascent body of literature that highlights the dual role of AI-integrated assessments as tools for digital literacy and as mechanisms for reflective, critical pedagogy. The blog format provided a unique opportunity for students to practise public-facing, accessible academic writing, aligning with real-world expectations in science communication. The pilot findings show that students found the approach to be feasible and helpful for developing critical skills, although engaging with AI outputs was perceived to increase workload. Teaching staff initially found marking more demanding and had limited success distinguishing unrestricted AI-generated content, but valued the assessment’s potential to promote ethical and critical AI use.
A key success of the project was the development of students’ critical AI literacy, with findings suggesting that the blog assessment promoted active engagement with AI outputs. Students were required to critique AI-generated content, identify inaccuracies, and justify their editorial decisions. This process appeared to encourage deeper critical engagement and helped students to view AI as a tool requiring human oversight rather than as a source of ready-made answers. However, some students may have used AI to support parts of the evaluative process itself, for example, by prompting AI to critique its own outputs, blurring the boundary between human and AI intervention. This challenge is prompting the development of pedagogical tools to enhance deeper engagement with AI content, including a revised version of Bloom’s Taxonomy (Gonsalves, 2024). In our study, students in the unrestricted group reported limited success when attempting to outsource critical reflection and revision entirely to AI, finding that human oversight remained essential to complete the task successfully. This supports Gonsalves’ (2024) observation of AI as a co-creator, where students collaboratively refined, challenged, and integrated output. Nonetheless, the timing and degree of human input will vary between students, highlighting the need for structured scaffolding to support meaningful engagement with AI whilst safeguarding academic skill development.
The requirement to compare outputs across different AI models also supported the development of critical evaluation skills, as students reflected on the variability and limitations of AI-generated content. Importantly, these findings address concerns raised during the qualitative evaluation and reflect issues highlighted in previous research, such as the concern that overreliance on AI could undermine opportunities for deep learning and reflective practice (Kasneci et al., 2023; Larson et al., 2024; Zawacki-Richter et al., 2019). These findings align with recommendations that AI in education must go beyond functional skills to include AI literacy, as well as active learning skills and metacognition (Abdelghani et al., 2023).
Beyond promoting critical engagement with AI outputs, this study also highlights strategies for maintaining assessment integrity and supporting academic skill development. Teaching staff expressed concerns that AI use could make it harder to distinguish original work from AI-generated content. This echoes broader challenges in the literature, where AI use may complicate traditional definitions of scholarship and independent academic work (Kulkarni et al., 2024; Luo, 2024; Yusuf et al., 2024). Although markers were sometimes confident, their accuracy in distinguishing AI-reliant submissions from those that were compliant with the assessment instructions was poor. This is likely because students in the unrestricted group generally described a similar editorial process to those in the compliant group. Nevertheless, the assessment’s structure, which required critical appraisal of the empirical article, critique of AI outputs, evidence of revision, and transparency, may have helped to mitigate these risks, although this needs further testing.
By embedding critical evaluation and editorial judgement, the assessment addressed concerns that AI could weaken core academic skills such as critical thinking and reflective analysis (Bittle & El-Gayar, 2025). One key challenge identified by participants was that the focus on evaluating AI-generated content risked overshadowing the critical appraisal of the empirical article itself. In response, the final co-produced brief more clearly separated and emphasised both components and better balanced the dual aims of the task. Maintaining this balance will be essential in future implementations to ensure that the assessment remains both authentic and educationally robust. Students also recognised that genuine engagement, not uncritical acceptance of AI outputs, was needed to meet the learning outcomes. However, the extent to which students internalised critical evaluation versus simply complying with task requirements remains unclear. Future studies could explore students’ metacognitive strategies and critical reasoning during AI use through longitudinal or think-aloud methodologies (Charters, 2003; Someren et al., 1994). Overall, the findings suggest that carefully designed AI-integrated assessments can uphold academic integrity while supporting the development of essential academic competencies.
Involving students and teaching staff in the co-design and evaluation process was central to developing an assessment that was authentic, feasible, and acceptable. The participatory approach drew on academic, pedagogical, and lived experience to shape the teaching workshop and assessment materials, helping us spot practical challenges early and promote shared ownership of the development of the assessment (Fetterman et al., 2017; Teodorowski et al., 2023). This aligns with broader calls for more inclusive, responsive, and transparent innovation in educational assessment (Bovill et al., 2016; Healey & Healey, 2019). However, participatory approaches also carry limitations, including potential power imbalances between participants and researchers, risks of tokenism, and the possibility of over-relying on stakeholder input to the detriment of expert judgement. Future research should continue to embed participatory evaluation while remaining mindful of these challenges to ensure AI-integrated assessment remains student-centred and pedagogically sound.
Several limitations of this study should be acknowledged. First, students were not involved in the initial design phase of the assessment, falling short of authentic co-production (Cook-Sather et al., 2014). Although this was partly mitigated through later participatory evaluation, involving students earlier could have strengthened the creativity, relevance, and ethical responsiveness of the assessment. Second, qualitative feedback was collected following the initial pilot rather than after a full module-wide rollout. As such, findings may reflect early impressions rather than longer-term engagement. However, this timing allowed for immediate adjustments and iterative revisions of the assessment materials. Third, this study was conducted within a single institutional setting with a small cohort and an ensuing small sample size, which limits the generalisability of the evaluation findings to other universities or international contexts with different AI access, policies, and pedagogical cultures. However, this study did not aim for statistical generalisability, but rather aimed to explore the feasibility and acceptability in context, using participatory methods grounded in information power (Malterud et al., 2016). Our broader goal was to model a co-design and evaluation approach that is transferable and could be adapted to different educational settings. The resulting assessment toolkit supports wider applications, helping educators adapt AI-integrated assessments to their own institutional and disciplinary contexts.

5. Conclusions

This study explores the participatory development and evaluation of a generative AI-integrated assessment in postgraduate education. The participatory methods used were effective in shaping an assessment that was both feasible and meaningful. A practical toolkit was produced to enable educators to apply similar co-design and evaluation processes within their own teaching contexts. Findings from this pilot evaluation suggest that integrating AI into assessments can promote both technical fluency and ethical reflection when scaffolded appropriately. Students engaged critically with AI outputs, while teaching staff recognised the potential for supporting critical thinking and maintaining academic integrity. Our approach supports growing calls for authentic assessments that mirror real-world tasks, particularly in professions where AI is becoming more common. However, there remains a tension between preserving academic integrity and using AI to support skill development. Future iterations must continue to navigate this balance carefully, ensuring that critical engagement and ethical practice are at the core of AI-integrated learning.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bs15060808/s1, Table S1: GRIPP2 short form.

Author Contributions

Conceptualization, A.F.M. and G.J.R.; Methodology, A.F.M., S.T., A.H. and G.J.R.; Validation, A.F.M. and G.J.R.; Formal analysis, A.F.M.; Investigation, A.F.M.; Resources, S.T.; Data curation, A.F.M.; Writing—original draft, A.F.M.; Writing—review and editing, S.T., A.H. and G.J.R.; Visualisation, A.F.M.; Supervision, G.J.R.; Project administration, A.F.M.; Funding acquisition, A.F.M. and G.J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research specifically was funded by a grant from the Institute of Psychiatry, Psychology and Neuroscience at King’s College London’s Strategic College Teaching Fund 2023–24 to A.F.M. and G.J.R. (no grant reference number). A.F.M. and G.J.R. are also funded by the National Institute for Health and Care Research Health Protection Research Unit (NIHR HPRU) in Emergency Preparedness and Response, and the National Institute for Health and Care Research Health Protection Research Focus Award (NIHR HPRFA) in Outbreak Related Behaviours, a partnership between the UK Health Security Agency, King’s College London and the University of East Anglia (award numbers: NIHR200890 and NIHR207394). The views expressed are those of the authors and not necessarily those of the NIHR, UKHSA or the Department of Health and Social Care. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the PNM Research Ethics Panel at King’s College London, LRS/DP-23/24-42387, 27 June 2024.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Given the participatory nature of this research and the potential for participants to be identifiable in full transcripts, the data cannot be shared with third parties.

Acknowledgments

We are very grateful to Louise E Smith for her valuable support and insights in developing the original integration of generative AI into the blog assessment.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Generative Artificial Intelligence

References

  1. Abdelghani, R., Sauzéon, H., & Oudeyer, P.-Y. (2023). Generative AI in the classroom: Can students remain active learners? arXiv.
  2. Bittle, K., & El-Gayar, O. (2025). Generative AI and academic integrity in higher education: A systematic review and research agenda. Information, 16(4), 296.
  3. Bovill, C., Cook-Sather, A., Felten, P., Millard, L., & Moore-Cherry, N. (2016). Addressing potential challenges in co-creating learning and teaching: Overcoming resistance, navigating institutional norms and ensuring inclusivity in student–staff partnerships. Higher Education, 71, 195–208.
  4. Bradbury, K., Morton, K., Band, R., van Woezik, A., Grist, R., McManus, R. J., Little, P., & Yardley, L. (2018). Using the person-based approach to optimise a digital intervention for the management of hypertension. PLoS ONE, 13(5), e0196868.
  5. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
  6. Charters, E. (2003). The use of think-aloud methods in qualitative research: An introduction to think-aloud methods. Brock Education Journal, 12(2), 68–82.
  7. Cook-Sather, A., Bovill, C., & Felten, P. (2014). Engaging students as partners in learning and teaching: A guide for faculty. John Wiley & Sons.
  8. Cousin, G. (2006). An introduction to threshold concepts. Planet, 17, 4–5.
  9. Fetterman, D. M., Rodríguez-Campos, L., & Zukoski, A. P. (2017). Collaborative, participatory, and empowerment evaluation: Stakeholder involvement approaches. Guilford Publications.
  10. Gibbs, G. (1988). Learning by doing: A guide to teaching and learning methods. Further Education Unit.
  11. Gonsalves, C. (2024). Generative AI’s impact on critical thinking: Revisiting Bloom’s taxonomy. Journal of Marketing Education, 1–16.
  12. Healey, M., & Healey, R. L. (2019). Student engagement through partnership: A guide and update to the AdvanceHE framework. Advance HE, 12, 1–15.
  13. Henderson, M., Bearman, M., Chung, J., Fawns, T., Buckingham Shum, S., Matthews, K. E., & de Mello Heredia, J. (2025). Comparing generative AI and teacher feedback: Student perceptions of usefulness and trustworthiness. Assessment & Evaluation in Higher Education, 1–16.
  14. Howell, D. C. (2011). Chi-square test: Analysis of contingency tables. International Encyclopedia of Statistical Science, 35(3), 250–252.
  15. Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., & Hüllermeier, E. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.
  16. Kulkarni, M., Mantere, S., Vaara, E., van den Broek, E., Pachidi, S., Glaser, V. L., Gehman, J., Petriglieri, G., Lindebaum, D., & Cameron, L. D. (2024). The future of research in an artificial intelligence-driven world. Journal of Management Inquiry, 33(3), 207–229.
  17. Larson, B. Z., Moser, C., Caza, A., Muehlfeld, K., & Colombo, L. A. (2024). Critical thinking in the age of generative AI. Academy of Management Learning & Education, 23(3), 373–378.
  18. Luo, J. (2024). A critical review of GenAI policies in higher education assessment: A call to reconsider the “originality” of students’ work. Assessment & Evaluation in Higher Education, 49(5), 651–664.
  19. Malterud, K., Siersma, V. D., & Guassora, A. D. (2016). Sample size in qualitative interview studies: Guided by information power. Qualitative Health Research, 26(13), 1753–1760.
  20. McKinsey. (2024). The state of AI in early 2024: Gen AI adoption spikes and starts to generate value. Quantum Black. Available online: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-2024 (accessed on 29 April 2025).
  21. R Core Team. (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 1 April 2025).
  22. Someren, M. W. v., Barnard, Y. F., & Sandberg, J. (1994). The think aloud method: A practical guide to modelling cognitive processes. Academic Press.
  23. Staniszewska, S., Brett, J., Simera, I., Seers, K., Mockford, C., Goodlad, S., Altman, D., Moher, D., Barber, R., & Denegri, S. (2017). GRIPP2 reporting checklists: Tools to improve reporting of patient and public involvement in research. BMJ, 358, j3453.
  24. Strielkowski, W., Grebennikova, V., Lisovskiy, A., Rakhimova, G., & Vasileva, T. (2025). AI-driven adaptive learning for sustainable educational transformation. Sustainable Development, 33(2), 1921–1947.
  25. Sun, L., Yin, C., Xu, Q., & Zhao, W. (2023). Artificial intelligence for healthcare and medical education: A systematic review. American Journal of Translational Research, 15(7), 4820.
  26. Suriano, R., Plebe, A., Acciai, A., & Fabio, R. A. (2025). Student interaction with ChatGPT can promote complex critical thinking skills. Learning and Instruction, 95, 102011.
  27. Teodorowski, P., Gleason, K., Gregory, J. J., Martin, M., Punjabi, R., Steer, S., Savasir, S., Vema, P., Murray, K., & Ward, H. (2023). Participatory evaluation of the process of co-producing resources for the public on data science and artificial intelligence. Research Involvement and Engagement, 9(1), 67.
  28. UNESCO. (2024). AI literacy and the new digital divide: A global call for action. Available online: https://www.unesco.org/en/articles/ai-literacy-and-new-digital-divide-global-call-action (accessed on 20 January 2025).
  29. Usher, M. (2025). Generative AI vs. instructor vs. peer assessments: A comparison of grading and feedback in higher education. Assessment & Evaluation in Higher Education, 1–16.
  30. Yusuf, A., Pervin, N., & Román-González, M. (2024). Generative AI and the future of higher education: A threat to academic integrity or reformation? Evidence from multicultural perspectives. International Journal of Educational Technology in Higher Education, 21(1), 21.
  31. Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education—Where are the educators? International Journal of Educational Technology in Higher Education, 16(1), 1–27.
  32. Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students’ cognitive abilities: A systematic review. Smart Learning Environments, 11(1), 28.
Table 1. Table of changes: Summary of ‘must’ changes to assessment materials with feedback sources.
Component | Source | Feedback | Proposed Change | Rationale | Agreed Change
Assessment Brief | Staff workshop | Instructions lacked clarity in appraising both the original article and the AI’s interpretation | Have a separate section that states expectations for appraising the empirical article | Ensures students understand and address both components of the assessment | (1) Clarification added under Instructions section; (2) critical appraisal learning outcome split into three exclusive sections; (3) additional content added at top of Tips section describing the need to critically understand the empirical article
Assessment Brief | Student workshop | Instructions for the reflection part of the assessment were somewhat unclear | Add more detailed guidance and reflective models/frameworks | Helps students structure their critique and understand expectations | (4) Reflective models are recommended under Instructions section
Assessment Brief | Student workshop | Referencing format unclear | Specify referencing style in blog guidance | Addresses AI limitations in citation generation | (5) Clarity of referencing style added to Blog Guidance section
Assessment Brief | Student workshop | Students appreciated learning outcomes, but overlooked critical appraisal | Emphasise critical appraisal more clearly throughout and link to learning outcomes | Reinforces key educational focus and clarifies expectations | (6) Critical appraisal learning outcome split into three exclusive sections
Table 2. Table of changes: Summary of ‘should’ changes to assessment materials with feedback sources.
Component | Source | Feedback | Proposed Change | Rationale | Agreed Change
Assessment Brief | Student workshop | Word count guidance inconsistent | State word count for each section and total limit clearly | Reduces confusion and supports appropriate planning | (7) Word counts are provided for the total and for each part of the report under the Instructions section
Assessment Brief | Student workshop | Value of tips section highlighted | Retain and expand guidance on recognising AI hallucinations and faults | Reinforces critical AI literacy and practical assessment skills | (8) Added to the learning outcomes; (9) added to ‘risks’ in the Background section; (10) expanded the Tips section
Assessment Brief | Staff workshop | Students may not fully demonstrate critical engagement with AI-generated content | Require students to highlight changes made to AI-generated content and include in an appendix | Encourages transparency and supports assessment of student input | (11) Added to the Instructions for coursework section
Marking Matrix | Staff workshop | Lack of marker guidance for grade boundaries | Provide examples of high- and low-quality work | Improves marker confidence and consistency | (12) Included examples from high and low grade bounds
Marking Matrix | Staff interviews | Time consuming to check against academic articles | Provide checklist for articles of key methods, results, and conclusions | Improves marker confidence and consistency | (13) Included checklist for empirical articles
Training Workshop | Student workshop | Ethical use of AI is not well understood | Include declaration forms, videos, and exemplar prompts in the training materials | Supports transparency and encourages ethical practice | (14) Added an interactive brainstorming activity on the ethical use of AI, tailored to the specific context of the assessment; (15) incorporated group breakout discussions to support peer learning and reflective engagement
Training Workshop | Student workshop | Students requested more support with AI prompting | Include dedicated time in the workshop focused on writing effective AI prompts | Builds critical AI literacy and confidence in tool use | (16) Integrated into revised two-part workshop design, with optional homework to support independent learning
Table 3. Table of changes: Summary of ‘could’ changes to assessment materials with feedback sources.
Component | Source | Feedback | Proposed Change | Rationale | Agreed Change
Assessment Brief | Student workshop | Background section was appreciated, but some found it overwhelming | Use bullet points instead of paragraphs | Improves accessibility and reduces cognitive load | (17) The background and relevance section has been streamlined and uses bullet points to break down heavier text
Assessment Brief | Student workshop | Understanding the purpose improved engagement | Highlight the brief rationale or real-world relevance in the assessment introduction | Enhances motivation and situates learning in context | (18) Extended the ‘why blogs’ and ‘why generative AI’ sections of the Background section
Assessment Brief | Staff interviews | Complexity of articles may limit scope for critical appraisal | Use more complex articles with intentional limitations or errors that are not highlighted by the study authors | Allows students to demonstrate deeper critical thinking and analytical skills | (19) Added to the Instructions for coursework section
Training Workshop | Student workshop | Students need more practical support in using AI | Offer two workshops at different levels across the programme | Caters to varying levels of familiarity and ensures accessible skill development | (20) Revised workshop structure to offer two scaffolded sessions: Foundations 1 covers ethics and core AI concepts, with optional homework to reinforce learning; Foundations 2 focuses on practical assessment-based activities, including a review of the King’s AI coversheet
Training Workshop | Training session survey | Limited prior understanding of how to use AI; prompting guidance was especially valuable | Expand workshop content on prompt engineering, including examples and guided practice sessions | Builds foundational skills for effective AI use and supports equitable engagement with the assessment format | (21) Included in the scaffolded workshop structure and aligned with core learning outcomes and feedback priorities
Table 4. Table of changes: Summary of ‘would like’ changes to assessment materials with feedback sources.
Component | Source | Feedback | Proposed Change | Rationale | Agreed Change
Marking Matrix | Staff workshop | Difficulty in marking AI vs. student input | Integrate AI evaluation criteria into the main marking matrix | Reduces marker cognitive load and reflects integrated learning outcomes | (22) Out of the scope of the pilot as it addresses the programme-wide marking matrix
Training Workshop | Staff workshop | Markers need guidance on evaluating AI-assisted work | Provide training sessions specifically for staff assessing AI-integrated submissions | Supports consistency and confidence in marking across staff teams and supports new format adoption | (23) Marked as a priority for inclusion before the next phase of the trial’s implementation
Table 5. Key findings and illustrative quotes from students.
Theme | Key Finding | Contributor Quotes
Feasibility: efficiency vs. effort | Many students found that generative AI tools streamlined the writing process, but did not necessarily change the overall workload, particularly when refining AI outputs. | “[the assessment] took about the same amount of time that it would normally … that that was a big shock to me!”(S4) “It forces you to figure out whether you’re critically appraising the critical appraisal, but you think it’s given. So it is a bit, it is more work in that sense.”(S3)
Feasibility: user variability | Students highlighted varying levels of success when using AI, depending on their AI literacy and academic skills. | “I couldn’t get it to be any longer, no matter how many times I prompt it … so that’s probably user error.”(S4) “I started off using AI, but then I found that it was giving very predictable answers … so I then added a lot to that and brought in my own experience into it.”(S6)
Acceptability: learning enhancement | Some students saw the integration of AI as a valuable tool for building critical thinking and learning how to appraise content more deeply. | “It would improve students’ critical thinking abilities if they don’t have to focus as much on things like grammar … they can focus more on the topics at hand, gaps in research.”(S4) “It can help them do the basic task, learn from it, but at the same time still require them to give some critical viewpoints … I only see positivity.”(S1)
Acceptability: skill development | Others raised concerns that reliance on AI might compromise the development of core academic skills. | “It would be sad to lose [the skill of writing] an essay with good grammar … losing some of the core skills of sort of being a student at university”(S3) “Are we learning anything, or are we just asking [AI]?”
Integrity and fairness: ethics and ownership | Some students expressed discomfort about using AI-generated content, especially when it felt detached from their intellectual effort. | “It kind of alienated me from the work … I had to keep going back to be like, did I say this?”(S4) “I wasn’t exactly tired after I’d finished it because it did a lot of the work, but like, it was like, I don’t read it and feel like any kind of pride or any kind of like ownership of it in any way.”(S6)
Integrity and fairness: equity and academic standards | Some responses reflected concerns around integrity and fairness of using AI in assessment generally. | “It’s going to start feeling like we’re being assessed on how well we can use AI.”(S5) “It doesn’t make me any more tempted to use it in university assignments, because I feel like I’d come out of it with a degree that I didn’t really feel like was mine.”(S6)
Integrity and fairness: equity and academic standards | Some were more optimistic about AI in the mock assessment. | “The problem happens when some people in the classroom are relying on AI, some people are not. So then it creates a kind of disparity. But if everyone is using it then … I would feel that would be the most fair updater of ability because then everyone has the same resource.”(S1)
Table 6. Key findings and illustrative quotes from teaching staff.
Theme | Key Finding | Contributor Quotes
Feasibility: complexity of marking | Most staff reported that while the structure was clear, marking took longer due to the added components and the unfamiliar task of evaluating AI use. | “The first one took me two hours … I had to read the paper so that I wasn’t taking into account how long it takes.”(T2) “You have a blog, and a commentary, it doubles the work.”(T1)
Feasibility: familiarity improved efficiency | Once familiar with the marking expectations, staff found the process more manageable. | “Afterwards, it took about an hour…. once I figured out what I was doing.”(T2) “I take more time in the first in the first or second, but then generally the blog part it was quite easy to mark.”(T3)
Acceptability: critical thinking | Staff saw potential in the assessment for encouraging deeper student engagement and critical evaluation. | “Incorporating AI into assessments is interesting, because … you still need to think and you still need to produce something.”(T1) “I think that’s good for some students. The work I read [was] really interesting. Some students did read well and add their critics. I’m really impressed about … how they learn from that.”(T3)
Acceptability: undermining learning | There were worries that integration of AI might limit the development of core academic skills. | “You ask ChatGPT, it removes the painful experience [of writing well], but I think … we need that struggle.”(T4) “If we just rely on chat GPT writing things for us and then thinking for us, we will lose the ability to be creative ourselves, because I feel like you learn by doing.”(T2)
Integrity and fairness: evaluation | Staff expressed uncertainty about distinguishing between student-authored and AI-generated content in more formulaic sections, but many were confident that they could identify inappropriate use of AI in the assessment. | “AI has done a pretty good job describing the methodology.”(T1) “You can tell when you read, some sections of the blog that can also be impersonal, like the methods and results … The personal perspective that’s easier to judge because you can kind of sense the touch human touch.”(T4) “There’s no critical thinking because obviously the blog has been written by AI, so there’s automatically no critical thinking there, and they didn’t particularly try to add more on their own behalf.”(T1)
Integrity and fairness: integration | Despite concerns, staff generally recognised the inevitability of AI in education and advocated for proactive adaptation. | “It helps people understand and think critically about [AI] use.”(T2) “We might as well embrace it and … teach people how to [use it properly].”(T4)