Article

From Policing to Design: A Qualitative Multisite Study of Generative Artificial Intelligence and SDG 4 in Higher Education

by Marina Mathew Joseph 1 and Shaljan Areepattamannil 2,*
1 School of Social Sciences, Indira Gandhi National Open University, New Delhi 110068, India
2 Data Analytics, Policy and Leadership Division, Emirates College for Advanced Education, Abu Dhabi P.O. Box 126662, United Arab Emirates
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(22), 10381; https://doi.org/10.3390/su172210381
Submission received: 14 October 2025 / Revised: 12 November 2025 / Accepted: 19 November 2025 / Published: 20 November 2025
(This article belongs to the Special Issue Reimagining Digital Learning for Sustainable Development)

Abstract

Generative artificial intelligence (AI) is now embedded in the everyday practice of higher education. This qualitative, multisite study examines how university faculty perceive where generative AI advances or threatens Sustainable Development Goal (SDG) 4, which commits education systems to inclusive, equitable, high-quality learning across the lifespan. We conducted semi-structured interviews and focus groups with 36 academics across three universities, complemented by document and artefact analysis. Guided by critical pedagogy, sociomateriality, and technological pedagogical content knowledge (TPACK), we used reflexive thematic analysis to identify five cross-cutting themes. Faculty reported inclusion gains through rapid accessibility work, multilingual support, and differentiated feedback, alongside risks that undermine SDG 4, including bias, expansion of surveillance, unreliable outputs, paywalled access advantages, and work intensification. Assessment emerged as the decisive site of tension: staff rejected detection-led policing and favoured designs that reward process, critique, and provenance. We offer a practical framework, aligned to SDG 4 targets, that translates these insights into commitments, indicators, and a 12-month programme plan. The sector should move beyond bans and hype. Responsible adoption requires equity by design, assessment redesign, institutionally guaranteed access, transparent evaluation, and protected time for teacher development.

1. Introduction

Sustainable Development Goal (SDG) 4 is an unambiguous call to redesign higher education around quality, equity, and lifelong learning. It asks universities to improve learning outcomes, widen participation, remove structural barriers, and sustain learning across the lifespan, with concrete targets covering outcomes, access, skills for work, elimination of disparities, safe learning environments, and teacher development [1,2]. SDG 4 is not a slogan; it is a set of measurable commitments now pressing on day-to-day decisions in curricula, assessment, and student support.
Generative artificial intelligence (AI) has moved rapidly from novelty to infrastructure in higher education. Sector foresight identifies it as a near-term driver of change in teaching, learning, and assessment, even as many institutions are still building capability and policy [3,4]. Student uptake has surged, so decisions about when and how to use AI are no longer hypothetical. In the UK, for example, 92 percent of undergraduates report using generative AI in some form, and 88 percent use it for assessment-related tasks, most commonly to explain concepts, summarise readings, and structure ideas [5]. These figures signal a shift in the baseline conditions of university learning.
Potential benefits are tangible. Faculty can draft formative feedback more quickly, translate and rephrase content for multilingual cohorts, generate accessibility assets such as transcripts and alternative text, and scaffold academic writing or code with on-demand exemplars. Government and intergovernmental guidance converge on a pragmatic message: when used appropriately, teacher-facing applications can reduce routine workload, increase differentiation, and free time for higher-value pedagogy, provided that quality checks and human oversight are in place from the outset [2,6]. This aligns with SDG 4’s emphasis on quality, inclusion, and safe learning environments.
Risks are equally concrete. Large language models can fabricate citations, amplify bias, misinterpret local contexts, and mishandle personal data. Over-reliance can dull critical reading and source evaluation if left unchecked. The same guidance therefore stresses accuracy verification, transparency of use, and protection of learner data and dignity [2,6]. Equity remains a persistent fault line: paywalled tiers, bandwidth constraints, and uneven digital skills can create a two-speed classroom that contradicts the spirit of SDG 4 unless institutions guarantee access and provide alternatives suited to low-data contexts.
Assessment has become the decisive pressure point. Regulators and quality agencies now encourage universities to move beyond detection-led policing towards designs that surface process, judgement, and provenance. Recent resources from Australia’s Tertiary Education Quality and Standards Agency, for example, pull institutions towards authentic tasks, oral or practical defences, and explicit declarations of AI use as part of learning assurance [7]. The thrust is clear: validity, fairness, and learning quality are better served by redesign than by surveillance. This direction of travel accords with SDG 4 targets on quality outcomes, equitable access, and safe learning environments.
Evidence on stakeholder perceptions is catching up with practice. Surveys and mixed-methods studies report increasing but uneven confidence among educators and students, disciplinary variation, and gendered differences in attitudes and usage [5,8]. They also show that interest in AI’s potential coexists with caution about ethics, integrity, and skill development. In short, practice is outpacing consensus, and faculty are being asked to exercise professional judgement under volatile conditions.
This study responds to that reality by explaining how university faculty perceive the potential and risks of generative AI in relation to SDG 4 and by specifying the conditions under which adoption advances equitable, high-quality learning. It asks two practical questions:
  • How do faculty members perceive where generative AI advances or threatens specific SDG 4 targets in their teaching context?
  • What design, governance, and capacity conditions do faculty consider non-negotiable for responsible use?
To address these questions, we take a forward-looking, applied stance. We aim to generate evidence that helps programme teams convert SDG 4 from principle into everyday practice through concrete decisions about access, assessment, support, and governance. We therefore ground the inquiry in the formal definition of SDG 4 and its targets: to ensure inclusive and equitable quality education and promote lifelong learning opportunities for all [9].
We next situate the study against current progress towards SDG 4 globally and in India. Global progress remains uneven. Access to education has expanded since 2015, and an estimated 58 percent of students achieved minimum reading proficiency by 2019; however, substantial disparities persist in terms of disability, income, language, and geography [10,11]. Technology has broadened opportunities but has also amplified inequalities where connectivity, devices, and assistive tools are scarce [10]. Disparities are most acute in low-income settings and in countries affected by conflict or humanitarian crises, where restrictive regulations and fragile systems continue to limit participation, including for refugees and displaced learners [12]. Within countries, gaps in access and attainment remain wide across socioeconomic groups, reflecting persistent structural disadvantages [13]. Nearly one-fifth of adolescents and youth are out of school, and progress in reducing these numbers has stagnated in many low- and middle-income countries; even among those who complete primary school, vulnerable groups are less likely to transition to or complete secondary education [14].
India has integrated SDG 4 objectives into national policy through the Right of Children to Free and Compulsory Education Act 2009 and the National Education Policy 2020, signalling a strong commitment to inclusive and equitable quality education [15,16]. Recent assessments report that India is on track or moderately improving on SDG 4, with related gains in areas such as gender equality and health that interact with education outcomes [17,18]. Nonetheless, persistent gaps remain for learners in rural and urban-poor settings, for students with disabilities, and for linguistically marginalised groups, reinforcing the need for targeted strategies that pair access with learning quality [15,19,20].
Against this backdrop, the article proceeds as follows. Section 2 reviews the literature and specifies the research gap. Section 3 justifies a layered conceptual framework and maps it to SDG 4 targets. Section 4 details the multisite qualitative design, participants, and analysis. Section 5 presents five themes with calibrated frequency cues. Section 6 discusses implications for assessment, inclusion, and teacher work. Section 7 concludes with actionable recommendations.

2. Background and Literature

Higher education has used artificial intelligence for decades in adaptive tutoring, analytics, and automated feedback. What is new is the velocity and breadth of large language model adoption after late 2022, which shifted AI from a niche layer to a default layer in student and staff workflows. Sector foresight consistently frames generative AI as a near-term driver of change in learning, teaching, and assessment while also flagging gaps in capability and governance that must be closed to realise benefits fairly and safely [3,4].
Prior to large language models, syntheses of AI in higher education described a field dominated by STEM and computer science, with heavy use of quantitative methods and a focus on prediction, tutoring, assessment, and personalisation. This baseline matters because current tools enter ecosystems that already privilege certain disciplines and data practices [4,21].
Since 2023, evidence on generative AI in higher education has grown quickly but remains uneven. Horizon scanning and QuickPolls report rapid experimentation with limited strategic alignment. Institutions see value in feedback generation, coding support, and content transformation, yet they also report thin policy scaffolds, budget constraints, and staff development needs that function as bottlenecks [3,22].
Student uptake is high and rising. UK survey data from 1041 undergraduates in early 2025 indicate that 88 percent use generative AI for assessment-related tasks, most commonly to explain concepts, summarise readings, and organise ideas. Normalisation has outpaced policy clarity, increasing pressure on assessment design and staff guidance for students [5].
Peer-reviewed work now documents opportunities and risks with more granularity. Reviews and position papers identify plausible benefits for personalisation, formative feedback, and language support, alongside concerns about hallucinations, bias, opacity, and over-reliance that can dull critical reading if left unchecked [9,23].
Policy guidance has converged on a human-centred, equity-first stance. UNESCO’s guidance recommends transparent use, human oversight, and protection of learner data; pairs short-term actions with capacity building; and connects generative AI to the broader SDG agenda for inclusive, quality education [2,24].
Assessment is the decisive pressure point. The UK and Australian quality agencies advise moving beyond detection toward designs that surface process, judgement, and provenance. The QAA urges providers to redesign assessments for a world where students have generative tools, and the TEQSA offers worked examples that protect learning assurance while enabling responsible student use [7,25].
Inclusion is a recurring theme. The Universal Design for Learning (UDL) community highlights how AI can accelerate the production of accessible formats, multilingual explanations, and flexible representations when paired with human reviews. CAST’s UDL Guidelines 3.0 update in 2024 strengthens this link by foregrounding barriers rooted in bias and exclusion and by providing concrete design cues [26].
Research on perceptions suggests a cautious middle. Multisite and national studies report that educators and students see potential for equity gains and workload relief but want clear norms, transparent boundaries, and targeted professional learning. Faculty point to AI literacy gaps and the need for exemplars that they can adapt to context rather than generic tool tips [8,22,27].
In sum, prior studies describe the opportunities and risks of generative AI in higher education but rarely map faculty sense-making directly to SDG 4 targets (see Table 1) with designable commitments for programmes, and evidence beyond the Global North remains limited. We address these gaps with a multisite qualitative design that links faculty perceptions to specific SDG 4 targets and translates them into actionable commitments for assessment, access, governance, and teacher development. To operationalise this agenda, we adopt a layered framework that links SDG 4’s aims to concrete tests in practice, turning normative commitments into questions that guide sampling, instruments, and analysis. Section 3 motivates each lens, and Table 2 summarises their alignment with the SDG 4 targets and the indicators used in our coding.

3. Conceptual Framework

We use a layered framework that keeps quality, equity, and lifelong learning at the centre while being realistic about how change occurs in universities. Each lens supplies one analytic test and a practical use in sampling, prompts, and analysis. This reduces overlap and makes decisions traceable from claims to evidence. The lenses also link directly to the SDG 4 targets and indicators summarised in Table 2.

3.1. Education’s Purpose, Not Just Its Metrics

SDG 4 concerns what education is for, not only what it measures. Purpose precedes technique. Biesta’s critique of the age of measurement reminds us that validity begins with purposes and is then realised through evidence and methods [28]. We therefore treat AI as valuable only insofar as it strengthens worthwhile educational goods such as subject understanding, democratic voice, and human flourishing. This stance aligns with the capability approach, which reframes development as expanding people’s real freedoms to do and to be rather than merely increasing inputs or test scores [29,30].
Analytic test: Does the practice clearly advance educational purposes consistent with SDG 4?

3.2. Critical Pedagogy: Agency, Voice and Judgement

Following Freire, we reject banking models and treat students as co-authors of knowledge. Generative AI can increase participation in meaning-making or reduce learning to fluent paraphrase. Our prompts ask whether staff invite students to critique model outputs, surface tacit assumptions, and make processes visible. Analytically, we code for shifts from product to process, from answers to reasoning, and from surveillance to dialogue [31].
Analytic test: Does practice centre student agency and dialogue, not mere output?

3.3. Sociomateriality: Practice as an Assemblage

Classrooms are assemblages of people, tools, policies, data, and places. Adoption is not a switch but a negotiated reconfiguration in which changes to prompts, rules, access, and data flows co-produce new practices. This guards against technological determinism and legitimises local variation. In interviews we map the assemblage around a task, identify who performs verification labour, and trace where data travel. We then compare consequences for equity and quality across assemblages rather than attributing outcomes to a tool in the abstract [32,33].
Analytic test: Which elements of the assemblage enable or constrain equity and quality?

3.4. TPACK: Disciplinary Fit and Professional Judgement

Generative AI is educationally useful only when it improves the fit among technology, pedagogy, and disciplinary content. Technological pedagogical content knowledge (TPACK) provides a pragmatic vocabulary for that fit. It steers sampling toward disciplinary breadth and guides questions about where staff feel that AI deepens disciplinary understanding versus where it encourages superficial substitution. In coding, we tag good-fit episodes, such as AI-supported expert critique in design studios, and poor-fit episodes, such as generic summarisations that bypass core readings [34].
Analytic test: Is the use transformational for disciplinary learning or merely a substitution?

3.5. Universal Design for Learning (UDL): Inclusion by Design

UDL is our yardstick for access with challenge. We use the CAST UDL Guidelines 3.0 as a concrete checklist for engagement, representation, and action and expression, and we pair any AI-generated accessibility asset with a human review for accuracy, tone, and cultural fit. The 2024 update strengthens alignment with SDG 4 by surfacing barriers rooted in bias and exclusion, which is directly relevant to AI’s known risks for multilingual and disabled learners [26].
Analytic test: Does the design widen access by design and preserve appropriate challenge?

3.6. Assessment Validity and Authenticity

Integrity is primarily a design problem. We draw on Messick’s unified view of validity and on evidence-centred design to evaluate whether assessment in an AI-rich environment still captures the intended constructs. We privilege authentic performances that surface judgement, provenance, and process, consistent with Wiggins’ criteria and with the National Research Council’s “Knowing What Students Know” logic linking models of competence to observations and interpretations [35,36,37]. Analytically, we ask whether practices protect consequential validity and avoid narrowing the construct to what machines can produce.
Analytic test: Does the assessment make thinking and provenance visible while preserving construct validity?

3.7. Ethics Baseline: Human Oversight, Transparency and Fairness

We treat UNESCO’s recommendation on the ethics of AI as a constraint set, not an aspiration. Human oversight, transparency, fairness, data protection, and attention to low- and middle-income contexts are non-negotiable. This addresses well-documented risks in generative systems, including opacity, data provenance concerns, and bias. We incorporate disclosure of AI use, opt-out routes, and local bias testing of prompts and outputs [2,38,39,40].
Analytic test: Are minimum ethics safeguards present and auditable in practice?

3.8. Equity Guardrails: Capabilities, Language and Data Governance

Equity requires more than intent. The capability lens focuses on real opportunities for participation when bandwidth, devices, or paid tiers differ. We examine whether institutions guarantee baseline access and provide low-data alternatives. We also treat dataset and prompt documentation as equity infrastructure, adopting datasheet-style documentation to increase transparency about training data, likely failure modes, and appropriate uses. We look for safeguards that protect linguistic minorities and students with disabilities, and we test whether practices hold under low-bandwidth conditions [29,39,40].
Analytic test: Do access guarantees and documentation practices protect those most at risk of exclusion?
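To make the documentation idea concrete, the sketch below shows one way a programme team might record a datasheet-style entry for a teaching prompt or AI workflow. It is a minimal illustration in Python; the class name, field names, and example values are our own assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class PromptDatasheet:
    """Illustrative datasheet-style record for a teaching prompt or AI workflow.

    Field names are hypothetical; the intent is simply to document intended use,
    likely failure modes, and equity safeguards so that reuse decisions are auditable.
    """
    name: str
    intended_use: str
    model_and_tier: str                 # e.g., whether an institution-funded licence covers it
    languages_tested: list[str] = field(default_factory=list)
    known_failure_modes: list[str] = field(default_factory=list)
    low_bandwidth_alternative: str = "printable packet edited by teaching staff"
    human_review_step: str = "subject specialist checks accuracy, tone and cultural fit"


# Example entry for a hypothetical multilingual glossary workflow.
glossary_prompt = PromptDatasheet(
    name="weekly bilingual glossary",
    intended_use="key terms for the week's case study, approved by the lecturer before release",
    model_and_tier="institution-licensed model, standard tier",
    languages_tested=["English", "Hindi"],
    known_failure_modes=["invented technical terms", "register drift in translations"],
)
```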

3.9. Diffusion of Innovations: Why Adoption Arcs Differ

Faculties and programmes adopt at different speeds for good reasons. Diffusion theory helps us read patterns without pathologising caution. We probe relative advantage, compatibility with assessment cultures, complexity of use, trialability, and observability to explain why comparable departments make different choices and to distinguish principled resistance from capacity gaps that leadership can fix [41].
Analytic test: Which adoption attributes explain the variation that policy or support could address?

3.10. Activity Theory: Contradictions as Engines for Change

When AI enters practice, contradictions often surface, for example between rules that reward product outcomes and learning goals that reward process, or between the promise of automation and the verification work that follows. We use activity theory to treat such tensions as levers for expansive learning rather than as failure points. Critical incident prompts ask staff to narrate a moment when AI exposed a misfit in tasks, rules, or roles and to explain how they resolved it to improve equity and quality [42].
Analytic test: Which contradictions can be converted into design moves that improve practice?

3.11. Sociotechnical Imaginaries and Design Justice: Futures Worth Having

Policies encode futures. Sociotechnical imaginaries provide a vocabulary to analyse whose futures are privileged in AI policies and pilots, while design justice insists that those most affected by decisions should shape them. We read institutional documents for whose risks count, who authored the rules, and how commuter students, disabled students, and Global South perspectives are represented. Adoption plans that ignore community voice rarely deliver equity at scale [43,44].
Analytic test: Whose futures are imagined and resourced, and who participates in design?
The framework lenses serve as operational tests, each linked to specific SDG 4 targets. This map guided instrument design and coding and reduced overlap by assigning a distinct question and indicator set to each lens. Table 2 summarises the framework-target map and shows how it informed the interview prompts and coding. This framework keeps SDG 4 visible as a yardstick for quality and equity, resists quick fixes that ignore workload and data realities, and gives programme teams a structured way to decide when generative AI adds educational value and when it simply adds noise.

4. Research Methodology

4.1. Methodological Orientation and Theory

We adopt an interpretivist stance that treats knowledge as situated and co-constructed in practice. The study is informed by critical pedagogy, sociomateriality, TPACK, and UDL, with UNESCO’s AI ethics principles used as a baseline for safe and equitable practice. Method follows purpose: the aim is to understand how faculty make sense of generative AI in relation to SDG 4 and to surface the conditions they regard as non-negotiable for responsible use [24,26,28,31,33,34].
We used a qualitative multisite case study design, suited to how and why questions, in real settings where policy, tools, and teaching practices are evolving together [33]. The design privileges depth over breadth and enables contrasts across institutions with different missions and digital capacity. The analysis was planned inductively while remaining sensitive to the framework, which is consistent with reflexive thematic analysis logic [45,46,47].

4.2. Settings, Participants, and Sampling

The study was conducted in India between April and May 2025 across three public universities with contrasting profiles: one research-intensive metropolitan institution, one teaching-focused university with strong professional programmes, and one regional university serving many commuters and first-generation students. These contexts provide maximal variation in policy maturity, infrastructure, and student demographics, all of which plausibly shape the SDG 4 trajectories.
We used purposive, maximum-variation sampling to capture disciplinary breadth, role diversity, and teaching loads. The sample comprised 36 academics across engineering, computing, business, education, health, social sciences, and arts and humanities, from early-career lecturers to professors and programme leaders. Table 3 reports the distribution by institution type and discipline cluster. Sampling and analysis were guided by the principle of information power rather than numerical saturation alone, given the study aim, sample specificity, and analytic strategy [48,49]. Recruitment drew on departmental lists and snowballing under clear inclusion criteria.
We set a priori decision rules: continue recruitment while new interviews add meaning units to at least two targeted code families, namely, assessment validity and inclusion by design; stop when three consecutive interviews add no new codes in those families and only elaborations elsewhere. In judging information power, we weighed sample specificity, the use of theory in prompting, narrative richness, and the reflexive thematic analysis strategy. We also sought disconfirming cases. Recruitment ceased when these criteria were met and information sufficiency for the study aims was reached.
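For transparency, the sketch below shows how the stopping rule could be expressed procedurally, assuming a per-interview tally of newly observed codes in each code family. The labels and data structures are illustrative only; the actual decision was made by the research team during analysis.

```python
# Illustrative sketch of the a priori stopping rule described above.
# Assumes each interview is summarised as a mapping from code family to the set of codes observed.
# Family labels and data structures are hypothetical.

TARGET_FAMILIES = {"assessment_validity", "inclusion_by_design"}
STOP_AFTER_N_FLAT = 3  # stop after three consecutive interviews add nothing new in the target families


def should_stop(interviews: list[dict[str, set[str]]]) -> bool:
    """Return True once three consecutive interviews add no new codes in the targeted families."""
    seen: dict[str, set[str]] = {}
    flat_streak = 0
    for interview in interviews:
        new_in_targets = False
        for family, codes in interview.items():
            previously_seen = seen.setdefault(family, set())
            new_codes = codes - previously_seen
            if new_codes and family in TARGET_FAMILIES:
                new_in_targets = True
            previously_seen |= codes
        flat_streak = 0 if new_in_targets else flat_streak + 1
        if flat_streak >= STOP_AFTER_N_FLAT:
            return True
    return False
```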

4.3. Data Collection and Management

We collected data through semi-structured interviews, site-based focus groups, and documentary analysis to triangulate accounts and anchor claims in artefacts of practice. Twenty-eight interviews, each lasting 45 to 60 min, explored concrete teaching episodes, perceived benefits and harms, assessment decisions, accessibility workflows, workload effects, and governance expectations. We used critical incident prompts to keep discussions grounded in specific events rather than hypothetical scenarios. One focus group was held at each university, 75 to 90 min in length, with eight additional faculty members in total, to test interim interpretations, surface tensions across disciplines, and capture the dynamics of peer sense-making. Alongside these conversations, we analysed module handbooks, assessment briefs, micro-credential outlines, and institutional guidance on AI to compare the stated policy with reported practice and to map claims to SDG 4 targets. An interview guide aligned with the conceptual framework provided common coverage while allowing probing and adaptation to the local context. The interview prompts are provided in Appendix A. We iterated wording after the initial interviews to sharpen prompts on assessment validity, equity, and data governance. Interviews and focus groups were conducted, and documentary sources collected and archived, between April and May 2025.
The interviews and focus groups were audio-recorded and transcribed verbatim. Identifiers were removed at transcription; files were stored on encrypted drives behind institutional authentication. A structured file-naming convention and version control supported the audit trail. We used a qualitative data analysis environment for coding and memoing; the software supported organisation, not interpretation. Some quotations were lightly composited to remove identifiers that could reveal individuals in small departments or unique roles. Compositing did not alter the semantic content. We preserved wording and tone, merged adjacent utterances by the same participant when necessary, and maintained an audit trail linking composites to original segments. Bracketed generalisations were considered but were insufficient to prevent deductive disclosure in several cases because of unique combinations of role, module, and context.

4.4. Data Analysis

We followed reflexive thematic analysis in six phases: familiarisation, coding, theme construction, theme review, theme definition, and narrative production [46,47]. Coding was primarily inductive, guided by sensitising concepts from our lenses, including equity by design, assessment validity, workload visibility, data governance, and disciplinary fit. An excerpt of the analytic codebook, with code families and examples, is provided in Appendix B. We used constant comparisons within and across sites and disciplines. Themes were evaluated for internal coherence, distinctiveness, and pragmatic utility for SDG 4 decision-making. To guard against construct slippage, we examined whether claims about assessment quality remained aligned with intended constructs, rather than with detectability or convenience, which is consistent with validity and authenticity scholarship [35,36,37]. We used multiple strategies to enhance credibility, transferability, dependability, and confirmability, including an audit trail, reflexive memos, peer debriefs, attention to disconfirming cases, and data triangulation [50,51,52]. We consulted the SRQR and COREQ reporting standards as a checklist for transparency without forcing formulaic reporting that would conflict with the ethos of reflexive thematic analysis [53,54,55].
Documentary analysis was integrated with interview coding. We coded module handbooks, assessment briefs, and institutional guidance using the same codebook; added document-only codes for declared AI use, data retention, and accessibility provisions; and linked document segments to interview excerpts in analytic memos. Documents were treated as evidence that could confirm, contradict, or extend interview-derived themes. Discrepancies triggered follow-up memos and, where possible, member checks. This integration strengthened triangulation and enhanced credibility.

5. Findings

This section presents five interlocking themes that explain how faculty perceive the promise and the peril of generative AI in relation to SDG 4. Each theme contains subthemes, deviant cases, and illustrative extracts. Quotations are attributed by discipline and institution type to preserve anonymity while retaining context. When a quotation is lightly composited to remove identifiers, semantic intent is preserved. The frequency cues used are as follows: “most” indicates at least two-thirds of participants; “about one-third” indicates approximately one-third; “a minority” indicates fewer than one-quarter; and percentages refer to the 36 participants unless otherwise stated.
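For readers who wish to audit how the cues were applied, the following minimal sketch maps a participant count to the labels defined above. The thresholds follow the definitions in this paragraph; the width of the "about one-third" band and the function name are our illustrative assumptions.

```python
def frequency_cue(count: int, total: int = 36) -> str:
    """Map a participant count to the frequency cues defined above.

    "most"            -> at least two-thirds of participants
    "about one-third" -> roughly one-third of participants
    "a minority"      -> fewer than one-quarter of participants
    Counts outside these bands are reported as explicit fractions in the text.
    """
    share = count / total
    if share >= 2 / 3:
        return "most"
    if abs(share - 1 / 3) <= 1 / 12:  # band around one-third is an illustrative choice
        return "about one-third"
    if share < 1 / 4:
        return "a minority"
    return f"{count} of {total} participants"
```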

5.1. Inclusive Affordances, with Caveats

Most participants described a step change in how quickly they could produce accessible materials. Transcripts, alt text, simplified readings, and alternative formats that previously took hours could be drafted in minutes and then checked by a human. An education lecturer at the teaching-focused university said, “I can get from zero to an accessible baseline quickly, then spend my energy on the human bit, the checking and the coaching.” A social scientist at the regional university described using AI to create sensory-friendly slides for autistic students and then co-reviewing with a disability adviser to tune pacing and colour contrast.
In programmes with international cohorts, about one-third of participants, most commonly in engineering, computing, and business, valued register shifting between plain and technical English. An engineer at the research-intensive university noted, “The tool can explain a control systems concept in simpler language, then we raise the register together. I still verify the technical terms before anything reaches the virtual learning environment.” A business lecturer described a workflow where students requested bilingual glossaries keyed to the week’s case study, with the lecturer approving or amending definitions before release.
Tutors generated tiered practice items aligned with common misconceptions. A subgroup in health and social sciences reported using progressively faded scaffolds across the semester, whereas several in computing produced multiple worked examples of the same algorithm with different commentary styles and then asked students to critique which explanation best surfaced the core idea.
Three limits recurred. First, about one-third, especially in numerate and arts contexts, reported factual drift and stylistic overreach. An arts colleague said, “The alt text was fluent but wrong about composition and style. It named techniques that were simply not there. We scrapped it and rewrote it by hand.” Second, most participants stressed that equity hinges on access and refused to normalise workflows dependent on paywalled tiers or high-end devices. Third, several warned that stigma is a risk if only disabled or multilingual students are directed to AI support; to counter this, a minority repositioned accessibility assets as standards for everyone.
A minority, concentrated at the regional university, piloted a low-data route: weekly printable packets generated with AI and then teacher-edited for accuracy and cultural fit. Students without reliable connectivity reported feeling less excluded, and staff reported fewer late submissions tied to access issues.
The participants linked these practices to SDG Targets 4.a and 4.5. Most saw immediate traction when AI accelerated universal design and linguistic inclusion, provided that human review and institution-funded access were guaranteed; otherwise, tools risked deepening disparities.

5.2. Assessment Is a Design Problem, Not a Policing Problem

The most consistent redesign was a move to mark thinking. Most modules now require a short process file containing prompt histories, drafts, and decision rationales. A business academic at the regional university said, “I mark the journey. A polished answer with no provenance is not learning.” In engineering labs, students submit short error logs that name the model’s mistakes and their fixes, which make judgements audible and assessable.
Most participants introduced concise declarations of use, typically a few lines that specified tools, prompts, and edits. The students reported that this made boundaries legible. A humanities colleague said, “It is like declaring you used a grammar checker or a stats package. It makes the invisible visible.”
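A declaration of this kind can also be captured in a structured form for programme-level auditing. The sketch below is a minimal illustration only; the class, field names, and example values are ours and do not reflect any institutional template reported by participants.

```python
from dataclasses import dataclass


@dataclass
class AIUseDeclaration:
    """Illustrative structure for the short declarations described above.

    Fields mirror what staff asked students to specify: tools, prompts,
    and what changed after the student's own edits. Field names are hypothetical.
    """
    tools_used: list[str]
    prompts_summary: str
    what_changed_after_edits: str
    sections_affected: list[str]


example = AIUseDeclaration(
    tools_used=["general-purpose chatbot (institution licence)"],
    prompts_summary="asked for a plain-language summary of the week's reading to structure my draft",
    what_changed_after_edits="rewrote the argument, added my own sources, corrected two factual errors",
    sections_affected=["introduction", "literature summary"],
)
```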
About one-third shifted briefs toward local data, community partners, or messy artefacts that fluent paraphrase cannot satisfy. Oral defences and studio critiques reappeared in courses that had drifted toward purely written submissions. A humanities tutor explained, “When the assignment engages our local archive, a generic answer cannot bluff it.”
Trials of detection tools were short lived. Almost all those who trialled them discontinued use, citing false positives, adversarial dynamics, and poor fit with values of fairness and dignity. A computing lecturer at a research-intensive university said, “False positives shattered trust. We redesigned assessment rather than escalating policing.” Several also questioned the construct being measured if a task could be passed by paraphrasing a model’s output without understanding.
A minority retained invigilated paper tests for specific first-year thresholds, typically core algebra or anatomy, while changing coursework to reward process and critique. These were framed as scaffolds for disciplinary fluency, not as bans.
Staff anchored their choices in SDG Targets 4.1 and 4.4. The design logic was validity and fairness: reward judgement, provenance, and process, and remove incentives to hide learning practices. A one-page declaration-first module policy template is provided in Appendix C.

5.3. Workload Realignment, Not Automatic Reduction

Most participants reported that the time saved in the drafting of feedback or rubrics reappeared as verification and coaching. A health lecturer explained, “I save thirty minutes of drafting feedback, then spend thirty minutes checking facts and helping students read outputs critically. The work moved. It did not vanish.” In quantitative subjects, many described running spot checks against authoritative sources, a habit that added predictable minutes per batch.
New tasks include curating prompt libraries, bias testing with local names and dialects, producing discipline-specific exemplars, documenting model limits for students, and troubleshooting privacy settings. Few workload models capture this. A social scientist at the regional university said, “We are asked to innovate and to police. That is, there are two jobs. If we want quality, we need time and credit for the extra work.”
One-off tool demos were rated as low value. About one-third praised local, iterative support: communities of practice that shared error cases, redesigned rubrics, and tested declarations; mentor pairs that observed classes; and short R and D sprints that produced reusable artefacts such as declaration templates or bias test scripts.
Colleagues reported straightforward but narrow gains for formatting reading lists, generating plain-language module summaries, and drafting first-pass announcements, with minimal verification. These wins were welcomed but did not transform the overall workload.
Without explicit recognition of shifted labour, progress on SDG Target 4.c stalls. Staff asked for protected time, credit in workload models, and support for peer mentoring. They also asked leaders to stop treating AI as a time saver by default and to fund the verification that quality requires.

5.4. Governance, Equity and Trust

Most participants were unequivocal that institution-funded access was the first equity move. A teacher educator in the teaching-focused university said, “I will not run a two-tier class. If the university cannot guarantee access, I design as if no one has the tool.” Several programmes paused plans until licences were in place and then resumed with clear guidance and low-data routes.
About one-third offered a non-AI path without penalty, such as a manual feedback route or a human-only translation option. Students who opted out often cited confidentiality concerns in work-based projects, which staff respected.
Programmes earned credibility when they tested prompts with local names, dialects, and case materials and then published results and mitigations. A minority reported suspending specific AI workflows after documented harm and replacing them with human-authored alternatives, using the incident as a teaching case.
Staff preferred policies that set red lines and require disclosure while leaving room for disciplinary judgement. Mandates to use a single tool were resisted. Faculties that published short, principle-based guides with worked examples reported fewer anxieties and smoother student experiences.
Decisions here map directly to SDG Targets 4.5 and 4.a. Faculty want evidence that access is equitable, privacy is protected, and harm is monitored and addressed in public.

5.5. Professional Identity Under Negotiation

Most participants reframed expertise away from first-draft production toward diagnostic judgement, task design, and pastoral care. An engineer at the research-intensive university said, “My craft is not typing. It is designing tasks that make thinking visible and helping students see what good judgment looks like.” A literature scholar modelled how to read critically with and against AI by running a live think aloud that labelled fallacies and missing sources.
Many reported a renewed duty to teach how knowledge is made, checked, and attributed. Short in-class activities such as error autopsies, bias hunts, and source tracing became routine. A colleague in public health said, “Students need to recognise when fluency is not evidence. We practise that skill weekly.”
Educators worried about the erosion of writing fluency or programming stamina if students lean too hard on AI. Rather than prohibitions, about one-third used staged autonomy: tighter scaffolds early, increasing independence later, with explicit reflection on what to automate and what to do manually. The students reported appreciating clear boundaries by level and task.
A minority chose not to use AI in their teaching, citing disciplinary mismatch or ethical discomfort. They still accepted student declarations of use and engaged in programme-level policy making, which kept the conversation honest about values and the limits of automation.
These shifts align with SDG Target 4.c on teacher development and with 4.7 on ethics and citizenship. Faculty want development that respects professional judgement, builds disciplinary fit, and foregrounds care.

5.6. Cross-Theme Synthesis: How Perceptions Translate to SDG 4 Practice

The themes point to a disciplined, equity-led path that holds across contexts. Most participants perceived inclusion gains when AI accelerated universal design and multilingual support, yet they anchored those gains in human review and institution-funded access. Almost all those who experimented with detection tools rejected them in favour of assessments that reward process, provenance, and judgement. Work did not vanish but shifted toward verification and coaching, so progress on Target 4.c depends on protected time and recognition for the labour that quality demands. Governance that pairs access guarantees with privacy by design and local transparency earned trust and enabled responsible experimentation. Finally, professional identity is being rearticulated around diagnostic judgement, task design, and care, which is consistent with SDG 4’s emphasis on quality outcomes and lifelong capability rather than mere output. In short, generative AI functions as a lever for SDG 4 only when equity is designed in, assessment is rebuilt for thinking, and teacher agency is treated as the engine of educational quality.

6. Discussion

This discussion interprets the findings through our theoretical lenses and derives practical implications for programmes, faculties, and sector policy. The argument is straightforward. Generative AI can advance SDG 4 when equity is designed from the outset, when assessment is rebuilt to capture thinking rather than output, and when teacher agency is treated as the engine of quality. When these conditions are absent, the same tools risk deepening disparities, narrowing constructs, and eroding trust.

6.1. Interpreting the Findings Through the Lenses

A critical pedagogy lens clarifies why staff rejected detection-led responses and shifted towards designs that make student reasoning visible. The turn from policing to dialogue is not a soft option; it is a principled response that centres agency and judgement in learning, which is what SDG 4 envisages education to be [28,31]. Sociomateriality explains the variability across sites. Practice changed when assemblages of people, rules, tools, and data were reconfigured, for example, when programme boards approved declaration templates or when institutional licences removed paywall frictions. The outcomes followed assemblages rather than tools, which cautions against hype and blanket bans [33].
TPACK helps in reading disciplinary differences. Good-fit episodes occurred when AI illuminated disciplinary thinking, for example, engineering labs that graded error diagnosis or design studios that used AI to model critique. Poor-fit episodes occurred when generic summarisation displaced core practices such as close reading. The lesson is targeted rather than universal adoption, with professional judgement in the lead [34]. UDL provides the yardstick for inclusion. Staff realised tangible wins when AI accelerated transcripts, alt text, and multilingual explanations, yet human review remained non-negotiable to safeguard accuracy, tone, and cultural fit, which is consistent with the 2024 update of the UDL guidelines [26].
An ethics baseline is essential. Faculty concerns about bias, privacy, and opacity are well evidenced in the AI literature. Human oversight, transparency, and data protection are therefore a floor, not a ceiling, for classroom use, and local bias testing should be routine rather than exceptional [24,38,39].

6.2. Reframing Assessment Validity in an AI-Rich Environment

The strongest signal in the data is that integrity is a design problem. Staff moved towards process evidence, provenance, and oral or practical defence because these choices better protect the meaning of scores and the fairness of judgements. This aligns with validity as a unified concept that integrates content, cognition, observation, and consequences and with evidence-centred design that links models of competence to observable behaviours [35,36]. Authentic assessment was enacted through concrete moves, for example, local data, messy artefacts, iterative drafts, and reflective rationales, consistent with established principles of authenticity [37]. These designs are aligned with regulator guidance that is steering providers away from detection towards assessment redesign, grounding faculty choices in sector norms rather than preferences alone [7,25].
Two boundary conditions emerged. First, gateway competencies such as core algebra or anatomical knowledge sometimes warrant invigilated checks early on, followed by process-rich coursework. This is staged autonomy, not retreat. Second, declarations work because they normalise transparency. Treating AI use like any other resource to be acknowledged and critiqued reduces incentives to hide and rewards professional behaviour.

6.3. Equity by Design, Not Retrofit

The inclusion gains were real yet conditional. Without institution-funded access and low-bandwidth options, equity falters. Staff refused dependencies on paid tiers, which aligns with SDG 4’s target of eliminating disparities and with UNESCO’s guidance that data protection and fairness are non-negotiable in education settings [2,24]. Casting accessibility assets as standards for all learners reduced stigma for disabled and multilingual students, a simple move consistent with UDL’s universalist logic [26].
Language remains a live equity frontier. Staff valued register shifting and bilingual glossaries but insisted on technical accuracy and cultural fit. Local bias testing made ethics concrete. Programmes that tested prompts with local names, dialects, and case materials and then published results and mitigations earned trust. This practice operationalises fairness work from the AI ethics literature and embeds it in everyday teaching rather than relegating it to procurement or research [38,39].
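As one illustration of how such local bias testing might be operationalised, the sketch below runs a single prompt template across locally relevant name variants and saves the outputs for side-by-side human comparison. The names, the prompt, and the `run_local_bias_test` helper are our own illustrative assumptions; the `generate` callable stands in for whichever institution-licensed tool is in use rather than any specific vendor API.

```python
import csv
from typing import Callable

# Locally relevant names chosen by the programme team (illustrative values only).
NAME_VARIANTS = ["Aarav Sharma", "Fatima Khan", "Tenzin Bhutia", "Kaveri Murugan"]

PROMPT_TEMPLATE = (
    "Write two sentences of formative feedback for a first-year student named {name} "
    "whose lab report shows a correct method but an incomplete error analysis."
)


def run_local_bias_test(generate: Callable[[str], str],
                        outfile: str = "bias_test_outputs.csv") -> None:
    """Run one prompt template across name variants and save outputs for human review.

    `generate` is whatever call the institution-licensed tool provides; this sketch
    deliberately does not assume a particular API. The test does not judge fairness
    automatically: staff compare tone, length, and assumptions across rows, then
    publish the results and mitigations as described in the text.
    """
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "prompt", "model_output"])
        for name in NAME_VARIANTS:
            prompt = PROMPT_TEMPLATE.format(name=name)
            writer.writerow([name, prompt, generate(prompt)])
```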

6.4. Teacher Work and Professional Identity

The workload did not evaporate; it moved. The time saved in first-pass drafting reappeared as verification, coaching, and care. This redistribution is unsurprising given model brittleness and bias risks, yet adoption rhetoric often ignores it. Programmes that invested in communities of practice, mentoring, and reusable artefacts reported increased confidence and quality, mirroring wider observations that strategic capability, not tool awareness, is the bottleneck in many institutions [3]. Academic crafts are being reframed from typing towards curating tasks, diagnosing misconceptions, and modelling ethical judgement. This is not deskilling; it is reskilling around epistemic virtues that matter for lifelong learning and citizenship, which map to SDG 4 targets on quality, relevance, and global citizenship.
The policy implications are blunt. If leaders want quality and equity dividends, they must recognise the new labour. This requires protected time for redesign and verification, credit in workload models, and development that is close to practice rather than generic tool demonstrations. Without these factors, adoption will stall or compromise quality.

6.5. Governance, Trust and the Social Licence to Operate

Trust is a governance outcome. Faculty endorsed principle-based policies that require disclosure, protect privacy, and guarantee access while leaving room for disciplinary nuance. Appendix C translates these principles into a copyable module-level policy. Mandates to use a single tool, or surveillance-heavy detection regimes, corrode trust and underperform on validity and equity. Publishing local bias tests, offering non-AI pathways without penalty, and documenting data flows and retention schedules are simple practices that build a social licence to operate. These moves align with international ethics guidance and with sector advice from quality agencies that urge redesign rather than arms races [7,24,25].
The study supports four immediate commitments that programmes can enact without waiting for perfect policy:
  • guarantee access by licensing core tools for all students and staff, creating low-bandwidth routes, and treating accessibility assets as standards;
  • redesign assessment for process, provenance, and critique, and adopt concise declarations that normalise transparency;
  • invest in teachers by funding communities of practice, mentoring, and reusable exemplars, and reflect verification labour in workload models; and
  • publish evidence by reporting accuracy checks, bias tests, and workload effects.
These commitments are the conditions under which generative AI supports SDG 4 rather than undermines it. A staged, twelve-month programme plan is provided in Appendix D.
Empirically, the study grounds claims in the day-to-day decisions of staff across contrasting institutions and disciplines. Conceptually, it integrates critical pedagogy, sociomateriality, TPACK, UDL, and validity theory into an operational framework that programme teams can use to reason about adoption. This moves beyond high-level guidance and tool-centred case reports towards designable commitments that can be evaluated against SDG 4 targets. The findings also help reconcile a common paradox in the literature: opportunity and risk are not mutually exclusive; both are present, which is why human oversight, design justice, and professional judgement matter in practice [2,21,38].

6.6. Limitations

The study is qualitative and context bound. Quotations are self-reported and sometimes composited to protect identities, which can reduce narrative texture. Policy environments, infrastructure, and workload models vary internationally. Readers should judge fit to their own context using the thick description provided, not assume portability. That said, the mechanisms we identify are plausible across settings: equity depends on access and alternatives; validity depends on assessment design; and trust depends on transparency and care.

6.7. A Forward Look

Two strands deserve priority. First, systematic tracking of equity outcomes is needed, especially for students who are disabled, multilingual, or bandwidth constrained. Second, assessment research should test the reliability and validity of process-focused rubrics across disciplines, not only for early adopters but also for evaluating student voice. Institutions should also experiment with transparency regimes, for example, annual public notes that summarise bias tests, error rates, and workload effects, which would make governance auditable and accelerate shared learning. None of this requires perfect certainty. It requires disciplined experimentation, public evidence, and a willingness to redesign when the data demand it.
Responsible adoption requires designable commitments rather than tool-level rules. Programmes should guarantee baseline access for all learners, provide low-bandwidth alternatives, and require brief declarations of AI use. At least one major assessment per module should be rebuilt to evidence process, provenance, and judgement. Faculties should recognise verification and coaching time in workload models and support communities of practice. With these conditions in place, generative AI can advance SDG 4 by improving inclusion and learning quality rather than entrenching disparities.

7. Conclusions

This study explains how faculty members perceive the conditions under which generative AI advances SDG 4 and identifies where it threatens equity and quality. Conceptually, the study contributes a practical framework that links institutional conditions to SDG 4 outcomes through assessment redesign and teacher work. Equity is enabled by access guarantees and low-bandwidth alternatives; quality is strengthened when assessment makes thinking and provenance visible; and trust is built through privacy by design, local bias testing, and routine public reporting. Under these conditions, generative AI functions as a lever for inclusion and quality rather than a source of new disparities.
The implications are actionable and sector-facing. Programmes should license core tools for all learners and adopt declaration-first policies; rebuild at least one major assessment per module to evidence process, provenance, and judgement; and recognise verification and mentoring in workload models through funded communities of practice. Faculties should publish an annual transparency note summarising bias tests, accuracy checks, and workload effects so that governance is auditable. Sector bodies should align quality guidance with these design moves and resist surveillance-heavy fixes that narrow the construct of learning.
We recommend four actionable commitments: access guarantees for core tools; assessment redesigned to evidence process and provenance; teacher time for verification and mentoring recognised in workload; and routine transparency on bias tests and accuracy checks. These measures align adoption with SDG 4 targets 4.1, 4.4, 4.5, and 4.a, and they convert diffuse enthusiasm into accountable practice.

Author Contributions

M.M.J.: Conceptualization, Data Curation, Software, Validation, Formal Analysis, Methodology and Writing—Original Draft Preparation; S.A.: Supervision and Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was approved by the Research Ethics Committee of the Indira Gandhi National Open University (IG/RU/IEC/2025/635, 6 March 2025).

Informed Consent Statement

Informed consent for participation was obtained from all participants in this study.

Data Availability Statement

The data are available from the corresponding author upon reasonable request and are subject to ethical and privacy restrictions.

Acknowledgments

We thank the three anonymous reviewers for their thoughtful and constructive feedback, which improved the clarity and rigour of this article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Appendix A

Interview Guide

  • Describe a recent teaching moment where a generative AI tool helped or hindered learning.
  • How have you adapted assessment in response to generative AI?
  • What accessibility or inclusion opportunities do you see, and what safeguards are needed?
  • How has your workload changed, if at all?
  • What institutional policies or support would enable responsible use?
  • Where do you draw red lines in your module and why?
  • Which SDG 4 targets feel most relevant to your practice?
  • What evidence would convince you to scale a pilot or to stop it?

Appendix B

Analytic Codebook Excerpt

  • Accessibility by default: transcripts, alternative formats, universal design checks, human review.
  • Multilingual support: translation, register shifting, discipline-specific terminology.
  • Assessment redesign: process evidence, provenance, rubrics, oral defence.
  • Integrity regimes: detection tools, proctoring, surveillance concerns.
  • Workload dynamics: drafting gains, verification burden, coaching labour.
  • Governance expectations: licences, privacy, data flows, transparency and redress.
  • Professional identity: expertise, pastoral care, deskilling risks, mentoring.
  • Equity risks: paywalled tiers, bandwidth, device access, disability support.

Appendix C

One-Page Module Policy Template for Generative AI

  • Purpose: to support learning quality and fairness.
  • Permitted uses: idea generation, feedback on drafts, translation, accessibility assets such as transcripts and alt text.
  • Prohibited uses: submitting AI-generated work as your own without declaration, using AI to bypass required readings or data collection.
  • Declaration: include a short note in each submission that specifies if, where and how AI was used, including prompts and what changed after your edits.
  • Provenance: keep prompt histories and drafts. You may be asked to discuss your process.
  • Verification: you are responsible for checking accuracy, bias and citation integrity.
  • Privacy: do not enter personal or sensitive data into external tools.
  • Accessibility: you may request AI-assisted formats. Human review will be performed.
  • Support: contact details for module staff, accessibility services and learning support.

Appendix D

Appendix D.1. Twelve-Month Programme Plan Aligned with SDG 4

Appendix D.1.1. Months 1 to 2

Form a faculty–student working group with disability services, language support and IT. Select two low-risk pilots per department, for example, formative feedback or multilingual onboarding. Agree adoption principles that mirror inclusion, quality and transparency.

Appendix D.1.2. Months 3 to 5

Redesign one assessment per pilot to reward process, critique and provenance. Build accessibility workflows, including transcripts, alt text and simplified summaries. Secure institution-funded access to core tools and a low-bandwidth channel.

Appendix D.1.3. Months 6 to 8

Run pilots with mentoring and student co-researchers. Collect data on feedback quality, time on task, access patterns and workload. Operate a no-surveillance default with a clear incident protocol.

Appendix D.1.4. Months 9 to 10

Review evidence with the working group. Iterate assessment designs, refine prompts and prepare exemplars.

Appendix D.1.5. Months 11 to 12

Scale to additional programmes. Publish an annual transparency note that reports indicators and lessons learned. Credit participating staff in workload models and promotion criteria.

References

  1. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: New York, NY, USA, 2015; Available online: https://sdgs.un.org/sites/default/files/publications/21252030%20Agenda%20for%20Sustainable%20Development%20web.pdf (accessed on 2 October 2025).
  2. UNESCO. Guidance for Generative AI in Education and Research; UNESCO: Paris, France, 2023; Available online: https://www.unesco.org/en/articles/guidance-generative-ai-education-and-research (accessed on 2 October 2025).
  3. EDUCAUSE. 2024 EDUCAUSE Horizon Report: Teaching and Learning Edition; EDUCAUSE: Louisville, CO, USA, 2024; Available online: https://library.educause.edu/resources/2024/5/2024-educause-horizon-report-teaching-and-learning-edition (accessed on 2 October 2025).
  4. OECD. OECD Digital Education Outlook 2021: Pushing the Frontiers with AI, Blockchain and Robots; OECD Publishing: Paris, France, 2021; Available online: https://www.oecd.org/en/publications/oecd-digital-education-outlook-2021_589b283f-en.html (accessed on 2 October 2025).
  5. Freeman, J. Student Generative AI Survey 2025 (HEPI Policy Note 61); Higher Education Policy Institute: Oxford, UK, 2025; Available online: https://www.hepi.ac.uk/wp-content/uploads/2025/02/HEPI-Kortext-Student-Generative-AI-Survey-2025.pdf (accessed on 2 October 2025).
  6. Department for Education. Generative Artificial Intelligence (AI) in Education; Department for Education: London, UK, 2025; Available online: https://www.gov.uk/government/publications/generative-artificial-intelligence-in-education/generative-artificial-intelligence-ai-in-education (accessed on 2 October 2025).
  7. TEQSA. Enacting Assessment Reform in a Time of Artificial Intelligence; Tertiary Education Quality and Standards Agency: Melbourne, Australia, 2025; Available online: https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/enacting-assessment-reform-time-artificial-intelligence (accessed on 2 October 2025).
  8. Kim, J.; Klopfer, M.; Grohs, J.R.; Eldardiry, H.; Weichert, J.; Cox, L.A., II; Pike, D. Examining faculty and student perceptions of generative AI in university courses. Innov. High. Educ. 2025, 50, 1281–1313. [Google Scholar] [CrossRef]
  9. United Nations Department of Economic and Social Affairs. Goal 4: Ensure Inclusive and Equitable Quality Education and Promote Lifelong Learning Opportunities for All: Targets and Indicators. Sustainable Development Goals. Available online: https://sdgs.un.org/goals/goal4#targets_and_indicators (accessed on 2 October 2025).
  10. Nagime, P.V.; Chidrawar, V.R.; Singh, S.; Shafi, S.; Singh, S. Palm oil mill waste: A review on ecological improvement goals advancement and prospects. Food Human. 2025, 5, 100816. [Google Scholar] [CrossRef]
  11. Senior, C.; Sahlberg, P. The evolution of the OECD’s position on equity in global education. Int. J. Educ. Dev. 2025, 114, 103241. [Google Scholar] [CrossRef]
  12. Dupuy, K.; Palik, J.; Østby, G. No right to read: National regulatory restrictions on refugee rights to formal education in low- and middle-income host countries. Int. J. Educ. Dev. 2022, 88, 102537. [Google Scholar] [CrossRef]
  13. Morlà-Folch, T.; Renta Davids, A.I.; Padrós Cuxart, M.; Valls-Carol, R. A research synthesis of the impacts of successful educational actions on student outcomes. Educ. Res. Rev. 2022, 37, 100482. [Google Scholar] [CrossRef]
  14. Pezzulo, C.; Alegana, V.A.; Christensen, A.; Bakari, O.; Tatem, A.J. Understanding factors associated with attending secondary school in Tanzania using household survey data. PLoS ONE 2022, 17, e0263734. [Google Scholar] [CrossRef]
  15. Mukherjee, M.; Mali, A.; Dolma, T. Education for all and MDGs: Global education policy translation in India. In International Encyclopedia of Education, 4th ed.; Elsevier: New York, NY, USA, 2023; pp. 526–538. [Google Scholar]
  16. Pathak, P.; Nandini, V.; Yadav, D.; Kaur, S.; Sen, R.K.; Riang, S. Realizing human rights in rural Punjab of India: A study of enforcement of selected human rights. Front. Sociol. 2025, 10, 1619603. [Google Scholar] [CrossRef]
  17. Raman, R.; Nair, V.K.; Lathabai, H.H.; Nedungadi, P. Research on sustainable development in India: Growth, key themes, and challenges. Social Sci. Humanit. Open 2025, 11, 101637. [Google Scholar] [CrossRef]
  18. Nath, S.; Arrawatia, R. Adaptive capacity and attainment of the sustainable development goals in local communities of India. J. Environ. Manag. 2025, 373, 123850. [Google Scholar] [CrossRef]
  19. Ray, S.; Chakravarty, S. Innovative initiatives to improve access to education in urban slums: A critical review of mobile school education programs. Cities 2025, 159, 105748. [Google Scholar] [CrossRef]
  20. Sareen, S.; Mandal, S. Assessing SDG 4 indicators in online and blended higher education within conflict zones: A case study of northern India’s higher education institutions. Social Sci. Humanit. Open 2024, 9, 100903. [Google Scholar] [CrossRef]
  21. Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic review of research on artificial intelligence applications in higher education. Int. J. Educ. Technol. High. Educ. 2019, 16, 39. [Google Scholar] [CrossRef]
  22. EDUCAUSE. QuickPoll Results: A Growing Need for Generative AI Strategy; EDUCAUSE: Louisville, CO, USA, 2024; Available online: https://er.educause.edu/articles/2024/4/educause-quickpoll-results-a-growing-need-for-generative-ai-strategy (accessed on 2 October 2025).
  23. Kasneci, E.; Sessler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E.; et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar] [CrossRef]
  24. UNESCO. Recommendation on the Ethics of Artificial Intelligence; UNESCO: Paris, France, 2021; Available online: https://www.unesco.org/en/articles/recommendation-ethics-artificial-intelligence (accessed on 2 October 2025).
  25. QAA. Reconsidering Assessment for the ChatGPT Era: QAA Advice on Developing Sustainable Assessment Strategies; Quality Assurance Agency for Higher Education: Gloucestershire, UK, 2023; Available online: https://www.qaa.ac.uk/docs/qaa/members/reconsidering-assessment-for-the-chat-gpt-era.pdf (accessed on 2 October 2025).
  26. CAST. Universal Design for Learning Guidelines 3.0; CAST: Wakefield, MA, USA, 2024; Available online: https://udlguidelines.cast.org/ (accessed on 2 October 2025).
  27. Mah, D.-K.; Groß, N. Artificial intelligence in higher education: Exploring faculty use, self-efficacy, distinct profiles, and professional development needs. Int. J. Educ. Technol. High. Educ. 2024, 21, 58. [Google Scholar] [CrossRef]
  28. Biesta, G. Good Education in an Age of Measurement: Ethics, Politics, Democracy; Routledge: London, UK, 2010. [Google Scholar]
  29. Nussbaum, M.C. Creating Capabilities: The Human Development Approach; Harvard University Press: Cambridge, MA, USA, 2011. [Google Scholar]
  30. Sen, A. Development as Freedom; Oxford University Press: Oxford, UK, 1999. [Google Scholar]
  31. Freire, P. Pedagogy of the Oppressed; Continuum: New York, NY, USA, 1970. [Google Scholar]
  32. Cooren, F. Beyond entanglement: (socio-)materiality and organization studies. Organ. Theory 2020, 1, 2631787720954444. [Google Scholar] [CrossRef]
  33. Orlikowski, W.J. Sociomaterial practices: Exploring technology at work. Organ. Stud. 2007, 28, 1435–1448. [Google Scholar] [CrossRef]
  34. Mishra, P.; Koehler, M.J. Technological pedagogical content knowledge: A framework for teacher knowledge. Teach. Coll. Rec. 2006, 108, 1017–1054. [Google Scholar] [CrossRef]
  35. Messick, S. Standards of validity and the validity of standards in performance assessment. Educ. Meas. Issues Pract. 1995, 14, 5–8. [Google Scholar] [CrossRef]
  36. Pellegrino, J.W.; Chudowsky, N.; Glaser, R. (Eds.) Knowing What Students Know: The Science and Design of Educational Assessment; National Academies Press: Washington, DC, USA, 2001. [Google Scholar]
  37. Wiggins, G. The case for authentic assessment. Pract. Assess. Res. Eval. 1990, 2, 1–3. [Google Scholar]
  38. Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Toronto, ON, Canada, 3–10 March 2021; ACM: New York, NY, USA, 2021; pp. 610–623. [Google Scholar] [CrossRef]
  39. Buolamwini, J.; Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. Proc. Mach. Learn. Res. 2018, 81, 77–91. [Google Scholar]
  40. Gebru, T.; Morgenstern, J.; Vecchione, B.; Vaughan, J.W.; Wallach, H.; Daumé, H., III; Crawford, K. Datasheets for datasets. Commun. ACM 2021, 64, 86–92. [Google Scholar] [CrossRef]
  41. Rogers, E.M. Diffusion of Innovations, 5th ed.; Free Press: New York, NY, USA, 2003. [Google Scholar]
  42. Engeström, Y. Expansive learning at work: Toward an activity theoretical reconceptualization. J. Educ. Work 2001, 14, 133–156. [Google Scholar] [CrossRef]
  43. Costanza-Chock, S. Design Justice: Community-Led Practices to Build the Worlds We Need; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
  44. Jasanoff, S.; Kim, S.H. (Eds.) Dreamscapes of Modernity: Sociotechnical Imaginaries and the Fabrication of Power; University of Chicago Press: Chicago, IL, USA, 2015. [Google Scholar]
  45. Yin, R.K. Case Study Research and Applications: Design and Methods, 6th ed.; Sage: Thousand Oaks, CA, USA, 2018. [Google Scholar]
  46. Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006, 3, 77–101. [Google Scholar] [CrossRef]
  47. Braun, V.; Clarke, V. Reflecting on reflexive thematic analysis. Qual. Res. Sport Exerc. Health 2019, 11, 589–597. [Google Scholar] [CrossRef]
  48. Hennink, M.M.; Kaiser, B.N.; Marconi, V.C. Code saturation versus meaning saturation: How many interviews are enough? Qual. Health Res. 2017, 27, 591–608. [Google Scholar] [CrossRef]
  49. Malterud, K.; Siersma, V.D.; Guassora, A.D. Sample size in qualitative interview studies: Guided by information power. Qual. Health Res. 2016, 26, 1753–1760. [Google Scholar] [CrossRef]
  50. Lincoln, Y.S.; Guba, E.G. Naturalistic Inquiry; Sage: Newbury Park, CA, USA, 1985. [Google Scholar]
  51. Nowell, L.S.; Norris, J.M.; White, D.E.; Moules, N.J. Thematic analysis: Striving to meet the trustworthiness criteria. Int. J. Qual. Methods 2017, 16, 1–13. [Google Scholar] [CrossRef]
  52. Tracy, S.J. Qualitative quality: Eight “big-tent” criteria for excellent qualitative research. Qual. Inq. 2010, 16, 837–851. [Google Scholar] [CrossRef]
  53. Braun, V.; Clarke, V. Thematic Analysis: A Practical Guide; Sage: London, UK, 2021. [Google Scholar]
  54. O’Brien, B.C.; Harris, I.B.; Beckman, T.J.; Reed, D.A.; Cook, D.A. Standards for reporting qualitative research: A synthesis of recommendations. Acad. Med. 2014, 89, 1245–1251. [Google Scholar] [CrossRef]
  55. Tong, A.; Sainsbury, P.; Craig, J. Consolidated criteria for reporting qualitative research (COREQ): A 32-item checklist for interviews and focus groups. Int. J. Qual. Health Care 2007, 19, 349–357. [Google Scholar] [CrossRef]
Table 1. SDG 4 targets and relevance to this study.
Target | Full Wording, Abridged | Relevance
4.1 | Free, equitable, quality primary and secondary education with effective learning outcomes | Assessment validity; learning quality
4.2 | Quality early childhood development and pre-primary education | Contextual reference
4.3 | Equal access to affordable tertiary education | Sector framing; access policies
4.4 | Skills for employment, decent jobs, and entrepreneurship | AI literacy; judgement; discipline-specific skills
4.5 | Eliminate disparities and ensure equal access | Equity by design; access guarantees
4.6 | Youth and adult literacy and numeracy | Baseline skills; scaffolded autonomy
4.7 | Education for sustainable development and global citizenship | Critical pedagogy; ethical judgement
4.a | Safe, inclusive, effective learning environments | Licensed access; privacy; low-bandwidth routes
4.b | Scholarships for higher education | Not examined empirically
4.c | Qualified teachers and teacher training | Workload; mentoring; professional learning
Table 2. Framework lenses aligned to SDG 4 targets, guiding questions, and indicators.
Lens | SDG 4 Targets | Guiding Question | Indicators
Purpose and capabilities | 4.1, 4.7 | Which educational goods are advanced or traded off by AI for learners and teachers? | Evidence of expanded opportunities; risks to agency and judgement
Critical pedagogy | 4.7 | Whose voice is centred and how is dialogue structured? | Visibility of process; opportunities for student critique and co-design
Sociomateriality | 4.a, 4.5 | Which tools, rules, roles, and data compose practice? | Descriptions of assemblages; data flow; points where practice is constrained or enabled
TPACK | 4.4 | How well does AI fit disciplinary knowledge and practice? | Substitution versus transformation tags; discipline-specific exemplars
Universal Design for Learning | 4.a, 4.5 | Does AI widen access by design? | Transcripts, alternative formats, multilingual supports, low-bandwidth routes
Validity and authenticity of assessment | 4.1, 4.4 | Does assessment evidence process, provenance, and judgement? | Rubric shifts; provenance checks; oral or in-class defences; iterative feedback traces
Ethics and equity guardrails | 4.5, 4.a, 4.c | Are fairness, privacy, and oversight assured? | Disclosure requirements; bias checks; clarity on data retention and model provenance; teacher support mechanisms
Adoption dynamics | 4.c | What enables or blocks responsible uptake? | Relative advantage, compatibility, trialability; workload effects; critical incidents and contradictions
Sociotechnical imaginaries and design justice | 4.5, 4.7 | Whose futures are imagined and resourced? | Inclusion of affected groups in policy and design; attention to commuter, disabled, and Global South perspectives
Table 3. Participant sample by institution type and discipline.
Institution Type | n | %
Research-intensive | 14 | 38.9
Teaching-focused | 12 | 33.3
Regional or commuter | 10 | 27.8
Discipline cluster | n | %
Engineering & computing | 10 | 27.8
Health | 6 | 16.7
Social sciences | 8 | 22.2
Arts & humanities | 6 | 16.7
Business & education | 6 | 16.7
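The percentages in Table 3 are simple proportions of the 36-participant sample, rounded to one decimal place. The following minimal sketch (illustrative only, not part of the study's analytic procedure) reproduces the calculation; the group labels and counts are copied directly from Table 3.

```python
# Illustrative check of the Table 3 percentages: n / 36, rounded to one decimal place.
# Counts are copied from Table 3; this snippet is not part of the study's analysis.

TOTAL_PARTICIPANTS = 36

institution_counts = {
    "Research-intensive": 14,
    "Teaching-focused": 12,
    "Regional or commuter": 10,
}

discipline_counts = {
    "Engineering & computing": 10,
    "Health": 6,
    "Social sciences": 8,
    "Arts & humanities": 6,
    "Business & education": 6,
}

for heading, counts in [("Institution type", institution_counts),
                        ("Discipline cluster", discipline_counts)]:
    print(heading)
    for group, n in counts.items():
        # Percentage of the full sample, rounded to one decimal place.
        print(f"  {group}: n = {n}, {100 * n / TOTAL_PARTICIPANTS:.1f}%")
```

Running the sketch reproduces the figures reported in Table 3, for example 14/36 = 38.9 percent for research-intensive institutions.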
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
