1. Introduction
Large language models (LLMs) present both a challenge and an opportunity for academics working in the social and hard sciences alike. On the one hand, LLMs such as ChatGPT, Claude, and Copilot offer students the opportunity to complete coursework without completing the requisite reading or writing expected for a third-level degree [1].
Likewise, many concerns exist about false information drawn from LLMs suffering from hallucinations [2]. Indeed, Rudolph et al.’s [2] work forcefully and clearly highlights the educational and philosophical limitations that LLMs have for students and the wider world. Such dangers have engendered a great deal of criticism and discontent from those working in more library-based fields, such as the author’s own twin fields of history and sociology [3]. Such fields, which centre predominantly on written assignments, seem particularly vulnerable to students using LLMs to complete coursework on their behalf. On the other hand, LLMs can provide real-time and engaged feedback on student essay plans, queries, and copy-editing [4]. They are now clearly being used en masse by many students, and it is unlikely that universities will be able to truly combat them without resorting to older and less diverse teaching assignments. Where, and how, should LLMs like ChatGPT be used in educational settings?
This paper takes a measured approach to LLMs as a teaching tool. It does not deny the temptation some students may feel to use the technology to avoid critical thinking, but it nevertheless argues that, used correctly, LLMs can greatly enhance the student experience by providing varied and engaging forms of assignment. What follows is a case study from a year-two undergraduate sociology assignment based in a School of Sport during the 2023/2024 academic year. Here, ChatGPT (version 3.5) was directly incorporated into assignments and served as a proxy for individuals from the world of sport. Students interviewed these proxies and critiqued their responses with reference to peer-reviewed academic research. Based on both subjective and objective evaluations, this approach, whose applications outside sport are also discussed within the paper, clearly resonated with students and challenged them to connect academic research to “real-life” events within their worlds [5].
2. Why Use LLMs in Classrooms?
At the time of writing, universities in the United Kingdom, including Northern Ireland, have yet to reach a consensus on the use of LLMs and artificial intelligence (AI) more broadly within academic settings [6]. In the author’s own institution, Ulster University, a decision was taken prior to the 2023/2024 academic year to allow students to use ChatGPT and other LLMs provided they were acknowledged or referenced in their work [7]. Additionally, students were encouraged to use LLMs as a study aid to help them generate ideas for their assignments and/or to copy edit their work. This approach was unusual compared with that of many institutions and gave instructors the opportunity to trial new assignments within their core modules. The author works within the School of Sport and Exercise Science, teaching two undergraduate modules on the sociology and politics of sport. Concerning the latter, the year-two Politics of Sport module was specifically concerned with using “real-life” examples to help students connect academic research with core issues in sport. In previous years, students drafted mock policy reports for politicians and government officials on pressing sporting issues, ranging from racism and homophobia in sport to the need to update sporting infrastructure for specific groups [8]. In class sizes ranging from fifty to eighty students, such policy reports only partially achieved the instructor’s desire for students to exhibit critical thinking (roughly defined here as the ability to identify a problem or issue, research it using robust academic literature, and propose potential solutions), and students often expressed discontent with the assignment’s learning outcomes. As part of ongoing continuous professional development, the instructor undertook a series of teaching and pedagogy workshops during the 2022/2023 academic year, where they were introduced to the concept of the proxy [9].
Trained as a historian, the author’s own classroom education was centred on post-structuralist understandings of power [10]. Rather than viewing “objective facts”, the author was taught to critique knowledge, exploring the underlying relationships, dynamics, and motivations behind the recording and conveying of knowledge. Put simply, the author was taught that knowledge and education are not “true” things but rather the result of human societies which have constructed stories, and structures, about the past. While this terrified and excited the author in equal measure as an undergraduate, it meant that, as an educator, they followed in the line of Cook-Sather and Matthews [11], who conceptualize the teacher not as an all-knowing oracle but rather as a co-creator of knowledge with their students.
Understanding a proxy as a “cultural stand-in” or representation of something else, academics have long used proxies as teaching devices [12]. Perhaps the best example of this is the simulated patient in medicine, wherein actors present with a litany of symptoms to aspiring medical students who attempt, based upon the actors’ descriptions, to diagnose them correctly [13]. The benefit of such an approach is simple: it allows students to test and practice their powers of deduction and patient care in a low-stakes environment. It does not matter, outside of their grading, whether their diagnosis is incorrect as, in essence, the simulated patient is an act of intense play [14]. In the social sciences, instructors have used proxies to re-enact legal cases and court trials, to relive historical debates, and to test philosophical arguments [15]. It is a well-tested and highly effective means of challenging students through engaging and interactive assignments (be they formal or informal tests). Previous research has found that such an approach can help students apply their learning in a more engaged and interactive way, which has a positive influence on their broader learning within a course [16]. LLMs thus presented an opportunity to apply the proxy framework to assessment in an innovative manner. Somewhat cynically, it is clear that students, despite university dictates, are already using LLMs for their assignments. Rather than resist students’ use of LLMs, an alternative approach is to harness that use for academic purposes, such as the proxy approach detailed below. For those working in the social sciences, LLMs as interview proxies also allow students to test, on a small scale, what fieldwork and data analysis entail without the need to undergo ethical approval or to interview members of the public [17]. It provides, then, a training ground similar to the simulated patient in medicine.
Interested in using LLMs as a sort of proxy, and given the novelty of this assignment, the researcher trialled the entire process and provided the results to students as templates of what the researcher believed to be best practice. Additionally, one one-hour seminar class was devoted to helping students choose research topics and refamiliarizing them with the process of finding research through the University Library, JStor, Google Scholar, etc. A later week in the semester was dedicated to helping students generate interview questions and a transcript with ChatGPT. Lectures in these weeks focused on misinformation and the media in sport, which ensured that the seminars aligned with the teaching content. The researcher found that linking the first assignment to the second and “feeding forward” with the grading [18], combined with assignment exemplars, helped students who might otherwise have struggled to achieve passing grades. From a teaching perspective, students engaged with this assignment in creative ways. Some decided to interview disgraced players about doping, others sporting officials about concussion and head trauma or gender disparities. These ideas were not prompted by the researcher’s own suggestions but rather by the students’ consideration of a pressing issue within sport (say, doping) and the best person to interview about that topic (a player who failed a drug test). Free to choose their own topics, students engaged with sporting issues and the literature in ways the researcher had not considered. In short, the assignment challenged students’ creativity and critical thinking, and they responded well to it.
While a number of different prompts were used, they followed the same basic template, shown below. This was provided to students, and an in-class workshop was also given to walk them through the process of using LLMs in this way.
Pretend you are a high-ranking FIFA administrator.
Answer the following ten questions in as much detail as possible, in prose form and in a Q&A format.
Cite real-world examples if possible.
Interview question one…
Interview question two…
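To make the workflow concrete, the sketch below shows how the same template could be assembled and submitted programmatically. This is purely illustrative: students used the free ChatGPT web interface rather than the API, and the persona, questions, and model name here are placeholder assumptions, written against the OpenAI Python SDK.

# Illustrative sketch only: students used the ChatGPT web interface, not the API.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

persona = "a high-ranking FIFA administrator"  # placeholder persona
questions = [
    "What is FIFA currently doing to combat racism in football?",  # placeholder
    "How effective have these measures been to date?",              # placeholder
    # ...eight further student-written questions...
]

# Assemble the prompt following the template above
prompt = (
    f"Pretend you are {persona}.\n"
    f"Answer the following {len(questions)} questions in as much detail as "
    "possible, in prose form and in a Q&A format.\n"
    "Cite real-world examples if possible.\n\n"
    + "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
)

# gpt-3.5-turbo corresponds to the freely available ChatGPT 3.5 used in the module
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # the transcript students go on to critique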
3. Using LLMs as Proxy
Prior to the inclusion of LLMs, in this case ChatGPT 3.5, assignments for students enrolled in the “Politics of Sport” module followed a somewhat traditional sequence of essays and annotated bibliographies, each feeding forward into the next.
Based on best teaching practice, assignments one and two, the annotated bibliography and literature review, “fed forward” into the final assignment (this is demonstrated in Table 1). Phrased differently, feedback on earlier assignments was provided with the opportunity to amend the work before the final submission. Feedback was given both formally and informally during this period. For the updated assignment, students were required to complete the following:
A 500-word annotated bibliography: similar to the one used in the module’s previous format.
Ten interview questions: Based upon the annotated bibliography, students drafted ten interview questions for their AI proxy. They were also required to write a short, ungraded reflection on which research informed their interview questions.
A 2500-word critical review: Students finally critiqued the AI’s responses with reference to three core areas (accuracy, complexity, and what areas the proxy ignored). Critiques had to be firmly grounded in academic research, and students were to include excerpts from their interviews as well as the entire transcript in an appendix.
These directives were split between two assignments: an annotated bibliography/interview question list and a critical review. There are two key points to consider at this juncture. The first is that students were allowed to use ChatGPT to generate an interview transcript but not to critically analyze it. The second, stemming from this, is that the required use of peer-reviewed academic literature, combined with the two-stage assignment process, largely precluded students from avoiding independent study. Aware that this was the first time students had been exposed to this kind of assignment, class time and pre-recorded videos uploaded to the University’s learning management system (Blackboard, version 9.1) were dedicated to the assignment, covering how to find academic sources, how to write a critical review, and so on. Additionally, sample templates were uploaded to give an indication of the academic standard expected in writing and referencing. For the first year, such templates were written by the instructor; following a successful trial, subsequent years will use past student papers.
4. Results, Applications, and Conclusions
From the sixty-two student responses to the mid- and end-of-semester surveys, the majority (fifty-eight) expressed satisfaction or high satisfaction with the assignment. Illustrative feedback included comments that the assignment “was different and interesting,” “linked nicely from the first to the second assignment,” and was “engaging and fun.” Many students also appreciated the flexibility to “choose what to write about.” Unlike the previous policy report assignment, which failed to adequately challenge students to think critically, the interactive nature of the current assignment (through the proxy interview) prompted them to thoughtfully consider the questions they wanted to ask about sporting phenomena. A common theme among students was that they revised their interview questions when initial responses did not address the key issues raised in their research [9]. In terms of quantitative responses, the following findings emerged:
Ninety percent of students believed that using AI in this way was a worthwhile task;
Eighty-five percent of students recommended retaining the assignment in its current form for the following year;
Ninety-three percent of students felt they engaged with and/or enjoyed this assignment more than traditional formats;
Ninety-five percent of students expressed satisfaction or high satisfaction with the assignment.
One of the few criticisms from students was the desire to see more than one example of a completed assignment. While the researcher did upload a completed assignment, students wanted examples across all grade bands. A more substantive issue was which LLM to use. When this assignment was initially created, many people viewed LLMs and ChatGPT as interchangeable. The reality is more complex, given the multitude of rival AIs available, such as Google Gemini and Microsoft Copilot. The researcher’s decision to use ChatGPT (at the time, version 3.5 was freely available) was driven by two considerations: familiarity and cost. Students were instructed to use version 3.5, the free service, and had to provide evidence, in the form of screenshots and integrity statements, that they did not use a paid subscription to a more advanced version. In future iterations, the researcher plans to upload multiple completed assignments across different grade bands, giving students a clearer understanding of expectations and quality standards, and to trial other freely available LLMs, such as Google Gemini, Microsoft Copilot, and other emerging AIs, to evaluate which best supports the assignment’s objectives [19].
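As a rough illustration of what such a model comparison might look like, the sketch below sends the same interview prompt to two freely accessible models and prints the transcripts side by side. It is a sketch only, assuming the OpenAI and Google Generative AI Python SDKs; the model names are placeholders that may change, and Microsoft Copilot is omitted here as it is accessed through its own interface rather than a comparable SDK.

# Hypothetical sketch of a future multi-LLM trial; not part of the assignment itself.
# Assumes the openai and google-generativeai SDKs, with API keys in the environment.
import os

import google.generativeai as genai
from openai import OpenAI

prompt = "Pretend you are a high-ranking FIFA administrator. ..."  # full template as above

# ChatGPT via the OpenAI SDK (assumes OPENAI_API_KEY is set)
gpt_reply = OpenAI().chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder free-tier model
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Gemini via Google's SDK (assumes GOOGLE_API_KEY is set)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini_reply = genai.GenerativeModel("gemini-pro").generate_content(prompt).text

# Print transcripts side by side so their accuracy and depth can be compared
for name, reply in [("ChatGPT", gpt_reply), ("Gemini", gemini_reply)]:
    print(f"--- {name} ---\n{reply}\n")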
From a teaching perspective, the trial was a success. Students displayed a great deal of creativity and engagement in their learning, undertook independent research to answer their questions, and, crucially, challenged themselves to expand their learning. While grade averages were largely consistent with previous years, the quality on display was substantially higher in each grade bracket. Further trials and experiments will need to be undertaken before more conclusive statements can be made, but, following its first year, the use of LLMs proved a clear success for staff and students alike.
The most challenging aspect of this assignment was undoubtedly the assessment criteria. The researcher did not want this assignment simply to demonstrate that “AI is bad” but rather to have students engage with the interview responses and contextualize them with reference to research [20]. The three headings chosen (accuracy, complexity, and what was missed) were meant to serve this goal. Initially, students struggled to understand what “complexity” meant as a point of critique within their assignments. Colloquially, phrases like “what would this mean in real life?”, “would this work in real life?”, or “have they simplified what is already happening?” helped to clarify the point. Students who initially struggled with the concept of complexity reported no confusion at the time of submission, but alternative terms such as “real-world application,” “applicability,” “effectiveness,” etc., are currently being considered. To support this understanding, the researcher will include a glossary or detailed explanation of key terms. To further support student engagement, interactive workshops where students can practice formulating and revising interview questions will be offered. Additionally, incorporating discussions of the ethical implications of using AI in research and assignments will help students understand the broader impact of their work.
The integration of artificial intelligence into education presents a complex landscape of opportunities and challenges. The full ramifications of this technological shift remain uncertain, with concerns regarding academic integrity, intellectual property rights, and the preservation of critical analytical skills at the forefront of scholarly discourse [8]. Notwithstanding these apprehensions, the spread of AI in educational contexts appears inexorable. Given this reality, it is imperative to eschew both uncritical adoption and wholesale rejection of AI technologies in favour of a nuanced approach that leverages AI’s capabilities to enhance established pedagogical objectives. The proxy methodology proposed in this paper shows particular promise in disciplines such as the humanities and healthcare sciences, facilitating the simulation of diverse personae, ranging from standardized patients to historical figures, and from contentious public personalities to literary constructs. Central to effective pedagogy is the facilitation of active engagement and meaningful interaction, and the onus falls upon educators to implement AI technologies in service of these educational imperatives. The proxy interview technique detailed in this study represents a potentially successful strategy in this regard [9].
5. Example of a Critical Review Response Given to Guide Students
As a final point, the example below was provided to students as an exemplar of how to answer the question.
While the answers provided by the official appeared accurate on a superficial level, further research uncovered several issues. These revolve around the depth of the problem, the initiatives being used to address it, and the hoped-for outcomes. Here, focus will be given to the educational programmes, welfare measures, and fan measures.
This is especially the case when it comes to educational programmes. In the interview, the FIFA official asserts that:
FIFA is implementing comprehensive education programs, strict penalties for racist behavior, and promoting diversity at all levels of football.
However, this claim warrants scrutiny when contrasted with the academic literature on the subject. For instance, a study by Mark Doidge [21] challenges the effectiveness of such educational programmes, suggesting that they often fail to address the underlying social and cultural roots of racism in football. The paper argues that while FIFA’s initiatives are a step in the right direction, they largely operate on a superficial level, focusing more on immediate responses to racist incidents than on fostering deeper, systemic change. This discrepancy highlights a gap between FIFA’s stated strategies and the more nuanced, holistic approach recommended by scholars. Doidge’s [21] research suggests that without addressing the broader societal attitudes and beliefs that fuel racism, efforts like those described by the FIFA official may have limited long-term impact. This contrast raises questions about the accuracy and sufficiency of FIFA’s response, as it appears to overlook the deeper, ingrained aspects of racism within the sport.