Article

Theoretical Foundations for Governing AI-Based Learning Outcome Assessment in High-Risk Educational Contexts

by Flavio Manganello 1,*, Alberto Nico 2 and Giannangelo Boccuzzi 1,3

1 Institute for Educational Technology, National Research Council, 16149 Genoa, Italy
2 Department of Law, University “Aldo Moro”, 70122 Bari, Italy
3 Department of Education Studies, University of Bologna “Alma Mater Studiorum”, 4012 Bologna, Italy
* Author to whom correspondence should be addressed.
Information 2025, 16(9), 814; https://doi.org/10.3390/info16090814
Submission received: 30 July 2025 / Revised: 27 August 2025 / Accepted: 17 September 2025 / Published: 19 September 2025
(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence, 2nd Edition)

Abstract

The governance of artificial intelligence (AI) in education requires theoretical grounding that extends beyond system compliance toward outcome-focused accountability. The EU AI Act classifies AI-based learning outcome assessment (AIB-LOA) as a high-risk application (Annex III, point 3b), underscoring the stakes of algorithmic decision-making in student evaluation. Current regulatory and governance instruments such as the GDPR and ALTAI focus primarily on ex-ante, system-focused approaches. ALTAI applications in education concentrate on compliance and vulnerability analysis while often failing to integrate governance principles with established educational evaluation practices. While explainable AI research demonstrates methodological sophistication (e.g., LIME, SHAP), it often fails to deliver pedagogically meaningful transparency. This study develops the XAI-ED Consequential Assessment Framework (XAI-ED CAF) as a sector-specific, outcome-focused governance model for AIB-LOA. The framework reinterprets ALTAI’s seven requirements (human agency, robustness, privacy, transparency, fairness, societal well-being, and accountability) through three evaluation theories: Messick’s consequential validity, Kirkpatrick’s four-level model, and Stufflebeam’s CIPP framework. Through this theoretical integration, the study identifies indicators and potential evidence types for institutional self-assessment. The analysis indicates that trustworthy AI in education extends beyond technical transparency or legal compliance. Governance must address student autonomy, pedagogical validity, interpretability, fairness, institutional culture, and accountability. The XAI-ED CAF reconfigures ALTAI as a pedagogically grounded accountability model, establishing structured evaluative criteria that align with both regulatory and educational standards. The framework contributes to AI governance in education by connecting regulatory obligations with pedagogical evaluation theory. It supports policymakers, institutions, and researchers in developing outcome-focused self-assessment practices. Future research should test and refine the framework through Delphi studies and institutional applications across various contexts.

1. Introduction

1.1. Rationale

The integration of artificial intelligence (AI) systems into educational assessment practices creates ethical, legal, and operational challenges that require systematic governance responses [1,2]. Initial research examined both the potential benefits and risks of AI-driven assessment: Luckin et al. [3] described AI as a catalyst for alternative formative and competency-based assessment approaches, while Holmes et al. [4] identified implications for curriculum design and the need to embed validity considerations. Within this context, educational institutions increasingly assign critical evaluative functions, particularly learning outcome assessment, to algorithmic systems, creating concerns about transparency, accountability, and student rights [5]. These concerns assume greater importance within the regulatory environment: the European Union’s AI Act explicitly designates AI systems used to evaluate learning outcomes in education and vocational training as “high-risk” (Annex III, point 3b), mandating rigorous compliance [6]. Simultaneously, the General Data Protection Regulation (GDPR) establishes specific rights to meaningful explanations in automated decision-making [7].
Delegating evaluative authority to AI creates governance tensions that extend beyond technical transparency. Ethical analyses document risks of power redistribution, limited institutional accountability, and the vulnerability of student populations [5]. These tensions appear as conflicts between efficiency and autonomy, accuracy and explainability, personalization and surveillance, and compliance and innovation. Bibliometric mapping shows that research on AI transparency in assessment has expanded substantially since 2019 but remains terminologically fragmented, with limited methodological convergence and insufficient operational standards for transparency auditing [8]. This aligns with systematic reviews of AIED (Artificial Intelligence in Education) ethics, which document the lack of integration between governance principles and evaluation models [9], and with recommendations for a shared ethical framework [1].
In this context, governance instruments such as the Assessment List for Trustworthy AI (ALTAI) have gained prominence as structured self-assessment tools [10]. However, their limitations are well-documented. Radclyffe et al. [11] observe that ALTAI fails to differentiate risks, lacks peer-comparison mechanisms, and omits integration with Fundamental Rights Impact Assessments. Peterson and Broersen [12] support this critique, demonstrating that automated systems cannot provide normative justifications (only propositional explanations of outputs) thereby making “explainable ethical AI” a problematic goal. Reidenberg’s [13] concept of Lex Informatica demonstrates that technical architectures themselves function as regulatory instruments, embedding normative choices in system design. Recent research identifies possible adaptations: Fedele et al. [14] applied ALTAI to a student performance prediction system, identifying vulnerabilities but not exploring its adaptation to AIB-LOA, specifically designated as high-risk under the AI Act. Comparative legal analyses recommend differentiated transparency standards and procedural fairness to harmonize human and algorithmic judgment [15], while theoretical work drawing from judicial science explores hybrid two-stage models in which AI generates draft evaluations that remain subject to educator review, thereby preserving human oversight [16].
International frameworks provide additional guidance but remain abstract. UNESCO emphasizes contextualized explainability, stating that explanations should be pedagogically meaningful for educators, learners, and parents [17,18]. However, operationalization remains limited, and ALTAI’s reliance on self-assessment differs from established principles of independent testing and audit [11]. Furthermore, connections between ALTAI’s requirements and established traditions of educational evaluation are rarely established. Existing applications identify vulnerabilities but do not engage with evaluation theory [14]. Reviews confirm this gap, indicating the absence of standardized approaches that could bridge AI governance with educational assessment frameworks [19].
This study addresses these gaps by adapting ALTAI for AI-based learning outcome assessment and reinterpreting its seven requirements through three established frameworks of educational evaluation: Messick’s [20] consequential validity, Kirkpatrick’s [21] four-level evaluation model, and Stufflebeam’s [22] CIPP framework. This alignment seeks to bridge regulatory principles with pedagogical theory, establishing a theoretically grounded foundation for subsequent empirical inquiry.

1.2. Objectives

The study pursues three related objectives. First, it seeks to conceptualize governance for AI-based learning outcome assessment consistent with the European Union’s AI Act designation of such systems as high-risk (Annex III, point 3b) and with the principles of trustworthy AI articulated in ALTAI. This requires translating regulatory expectations into criteria that reflect pedagogical validity, protect student rights, and acknowledge the contextual conditions of assessment practices.
Second, it aims to examine how ALTAI’s seven requirements (human agency, technical robustness, privacy and data governance, transparency, fairness, societal well-being, and accountability) can be reinterpreted through established traditions of educational evaluation. The outcome is a set of theoretically derived indicators and evidence types for AIB-LOA, proposed as design-level constructs to demonstrate potential pathways of operationalization rather than validated tools for immediate application.
Third, the study establishes a theoretical foundation for subsequent empirical inquiry, offering a coherent framework that can later be refined through expert consultation and institutional case studies.
These objectives address documented fragmentation in the literature by connecting legal and regulatory requirements with pedagogical principles and technical capabilities.
The study is guided by two research questions:
RQ1. 
How can ALTAI’s seven requirements for trustworthy AI be reinterpreted through established educational evaluation theories to support AI-based learning outcome assessment?
RQ2. 
Which theoretically derived indicators and evidence types can demonstrate how transparency, explainability, and accountability may be operationalized in AI-based learning outcome assessment, ensuring consistency with both regulatory demands and pedagogical traditions?
The outcome of this inquiry is the proposed XAI-ED CAF, a theoretically grounded framework that contributes to AI governance research by integrating ALTAI with educational evaluation theory in the specific high-risk domain of AIB-LOA.

2. Background and Related Work

2.1. Regulatory Frameworks and the Governance of Learning Outcomes

The regulatory environment for AI in education has become more structured but remains incomplete regarding outcome-focused governance. The EU Artificial Intelligence Act (Regulation 2024/1689) explicitly designates “AI systems intended to be used for the evaluation of students in educational and vocational training institutions” as high-risk applications (Annex III, point 3b). This classification recognizes that algorithmic mediation in educational evaluation affects rights, equity, and access to future opportunities [14]. Student outcomes extend beyond administrative results; they determine academic progression and social mobility, making the integrity of these processes essential to institutional legitimacy. In this study, assessment refers specifically to the AI-based assessment of student learning outcomes, rather than to the technical evaluation of AI systems themselves. This distinction ensures that governance is framed around the educational validity and fairness of algorithmic decisions, consistent with Annex III, point 3b, of the EU AI Act.
Additional regulation is provided by the General Data Protection Regulation (GDPR). Article 22 establishes the right not to be subject exclusively to automated decisions and entitles individuals to “meaningful information” about the logic of algorithmic outputs [7]. In Reidenberg’s [13] terms, such provisions demonstrate Lex Informatica: technical architectures operate as normative instruments embedding rules and constraints. However, GDPR’s focus is primarily procedural (preventing automated profiling) whereas the AI Act extends to functional risks, robustness, and oversight mechanisms [14].
Both instruments remain primarily ex-ante in orientation. Institutions must demonstrate conformity at deployment, but there is little guidance on the post-deployment governance of outcomes, that is, on how to evaluate whether AIB-LOA continues to generate fair, valid, and pedagogically meaningful results over time. This gap widens in hybrid assessment systems, where algorithmic scoring interacts with educator judgment. Legal analyses document the tension between algorithmic efficiency and procedural fairness in such contexts, identifying the lack of concrete models to balance both [15]. Annex III, point 3b, of the EU AI Act emphasizes the stakes of this governance challenge but does not resolve it, creating a need for sector-specific frameworks that operationalize regulatory requirements in outcome-focused terms. Reviews confirm this fragmentation by showing that definitions of “assessment” vary across technical, pedagogical, and legal perspectives, undermining conceptual clarity and limiting governance implementation [23].

2.2. ALTAI and Its Educational Reinterpretation

The Assessment List for Trustworthy Artificial Intelligence (ALTAI) was developed by the European Commission’s High-Level Expert Group on AI as a self-assessment checklist covering seven requirements: human agency and oversight, technical robustness and safety, privacy and data governance, transparency, diversity and fairness, societal well-being, and accountability [11]. Its contribution lies in bridging ethical principles with organizational practices, providing a structured tool to examine trustworthiness.
Early applications in education demonstrate both promise and limitations. Fedele et al. [14] applied ALTAI to student performance prediction systems, demonstrating its capacity to identify vulnerabilities and promote stakeholder dialogue. However, ALTAI’s scope remains limited: it emphasizes system robustness and compliance processes rather than learning outcome validity and fairness. Additionally, as a self-assessment tool, ALTAI lacks external auditability, creating concerns about its adequacy for high-risk domains such as AIB-LOA [11].
For Annex III, point 3b, of the EU AI Act these limitations are significant. ALTAI does not differentiate obligations across educational contexts, institutional capacities, or assessment risks, and it does not provide indicators for post-deployment evaluation of outcomes. Reinterpreting ALTAI through educational evaluation theories is therefore necessary. Messick’s [20] consequential validity addresses how outcomes affect equity and opportunity; Kirkpatrick’s [21] four-level model structures evaluation from immediate reactions to long-term results; Stufflebeam’s [22] CIPP framework links context, inputs, processes, and products into systematic accountability. Combined, these theories provide the conceptual foundation to transform ALTAI into an institutional self-assessment framework for outcome governance. This reinterpretation moves ALTAI from compliance-oriented assessment to a pedagogically grounded model of accountability.
This reinterpretation builds on developments in the XAI literature. Early contributions to explainable AI [24] identified the need for trustworthy systems that could communicate decision processes in human-understandable terms. Subsequently, the XAI-ED framework proposed by Khosravi et al. [25] developed explainability principles adapted to educational contexts, but it has remained weakly connected to established theories of educational evaluation. These contributions support an educational reinterpretation of ALTAI that connects trustworthy-AI principles with established evaluation theory, thereby shifting ALTAI from a compliance-oriented checklist to a pedagogically grounded model of accountability tailored to Annex III, point 3b, of the EU AI Act.

2.3. Explainability and Its Limits for Outcome Accountability

Explainability is commonly promoted as a remedy for algorithmic opacity. Techniques such as Local Interpretable Model-Agnostic Explanations (LIME) [26] and SHapley Additive exPlanations (SHAP) are designed to reveal which features influenced predictions, offering localized transparency into model behavior. Such approaches are appealing in education, where AI-based grading or feedback must be justified to stakeholders.
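To make the mechanics concrete, the following sketch illustrates how SHAP-style feature attributions could be computed for a hypothetical tabular grading model; the features, data, and model are invented for illustration and do not correspond to any system discussed in the cited studies.

```python
# Illustrative sketch: local feature attributions for a hypothetical AI grading model.
# Features, data, and model are invented; this is not a system from the cited literature.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "rubric_criterion_1": rng.uniform(0, 1, 200),
    "rubric_criterion_2": rng.uniform(0, 1, 200),
    "essay_coherence": rng.uniform(0, 1, 200),
    "time_on_task_min": rng.uniform(10, 120, 200),
})
# Hypothetical "true" grades driven mainly by the rubric criteria.
y = 0.5 * X["rubric_criterion_1"] + 0.4 * X["rubric_criterion_2"] + rng.normal(0, 0.05, 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# SHAP values: per-student contributions of each feature to the predicted grade.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Inspect which features drive one student's predicted grade.
student = 0
attributions = pd.Series(shap_values[student], index=X.columns)
print(attributions.sort_values(key=abs, ascending=False))
```

Attributions of this kind show which inputs moved a prediction, but, as discussed below, they do not by themselves establish whether those inputs are pedagogically valid evidence of learning.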
However, empirical evidence reveals their limited educational utility. In automated essay scoring, for example, explanations fail to enhance trust when they are not aligned with how students and teachers conceptualize learning, and can reduce credibility when misinterpreted [27]. Explanations also risk fostering excessive confidence: visually compelling outputs may lead educators or administrators to over-trust algorithmic assessments they do not fully understand [1].
Beyond technical constraints, explainability cannot resolve normative questions about fairness or validity. Peterson and Broersen [12] contend that AI cannot autonomously provide ethical justifications, since human values are plural and context-sensitive. Applied to AIB-LOA, this means no single explanation can determine whether an algorithmic grade is educationally valid; such judgments require hybrid human–AI oversight [16].
Explainability should therefore be reconceptualized as a component of outcome accountability. It gains value when integrated into institutional processes that evaluate whether algorithmic outputs can legitimately serve as evidence of learning. This reconceptualization underscores the need for indicators and evidence types that translate technical transparency into pedagogical validity and due process.

2.4. Governance Frameworks and Educational Accountability

Several governance frameworks have emerged in response to ethical challenges in AI. UNESCO’s Recommendation on the Ethics of AI [17] and Guidance for Generative AI in Education and Research [18] emphasize transparency and pedagogical integrity, advocating for contextualized and learner-centered explanations. Chan’s [28] AI Ecological Policy Framework organizes responsibilities across pedagogical, governance, and operational domains, while Umoke et al. [19] propose the Ethical AI Governance Framework for Adaptive Learning (EAGFAL), integrating bias mitigation, explainability, and data governance into a policy model.
These frameworks remain largely principle-oriented or policy-level. They do not specify measurable criteria for evaluating outcomes under Annex III, point 3b, of the EU AI Act. Systematic evidence confirms this assessment: Yan et al. [9] reviewed 34 empirical and conceptual studies and found limited multi-stakeholder integration, insufficient attention to geographic disparities, and minimal linkage between ethical principles and outcome evaluation. Chen et al. [29] and Zawacki-Richter et al. [2] similarly identified fragmentation, observing that AI in education research has concentrated on technical innovations or administrative efficiency while neglecting pedagogical accountability.
This asymmetry is particularly significant for AIB-LOA. Institutions must demonstrate not only technical compliance but also that learning outcomes assessed by AI remain valid and fair. No governance framework currently provides the tools for such self-assessment. This supports the need for a reinterpretation of ALTAI that connects ethical principles with evaluation theory, offering operationalizable indicators for outcome-focused accountability.

2.5. Toward Outcome-Focused Self-Assessment

Across regulatory, technical, and governance domains, literature shows progress but also persistent fragmentation. The AI Act establishes obligations but does not specify how institutions should evaluate outcomes of AIB-LOA post-deployment. GDPR protects rights to information but does not address pedagogical validity. ALTAI provides a checklist but lacks outcome focus. Explainability methods enhance transparency but not validity. Governance frameworks articulate principles but do not develop operational indicators.
Table 1 synthesizes these findings, mapping references to gaps and contributions toward an outcome-focused framework. The evidence indicates that no model currently enables institutions to systematically self-assess whether AI systems under Annex III, point 3b, of the EU AI Act produce valid, fair, and educationally accountable outcomes. As documented in reviews of ethical-legal principles for AI in educational assessment [23], existing literature identifies concerns but does not provide operational criteria. The XAI-ED CAF builds on these findings by translating conceptual discussions into theoretically grounded indicators for institutional self-assessment.
The selection presented in Table 1 identifies a core set of references that are directly relevant for developing the XAI-ED CAF. Rather than providing an exhaustive survey, this table focuses on contributions that directly engage with the governance of AI-based learning outcome assessment under Annex III, point 3b, of the EU AI Act. The eight studies selected demonstrate complementary dimensions of the problem: regulatory incompleteness [11,14], ethical and operational challenges [1,12], technical limits of explainability [26,27], and fragmented governance approaches [9,19].
This focused synthesis confirms that while progress has been made, no existing framework equips institutions to conduct outcome-focused self-assessment of AIB-LOA. The gaps identified across these selected contributions provide the rationale for the development of the XAI-ED CAF, which reinterprets ALTAI requirements through educational evaluation theory (RQ1) and derives indicators and evidence types for institutional accountability (RQ2).

3. Theoretical Framework Development: XAI-ED Consequential Assessment Framework (XAI-ED CAF)

3.1. Pedagogical Foundations for Outcome-Focused Governance of AIB-LOA

The development of a theoretical framework for the governance of AI-based learning outcome assessment (AIB-LOA) requires systematic integration of established evaluation paradigms that focus on the consequences and long-term impacts of educational interventions. In this study, learning outcome assessment (LOA) refers specifically to institutional practices for evaluating student achievement, progression, and competence acquisition: processes that determine grades, certification, and access to further educational or professional opportunities. This definition distinguishes LOA from the broader evaluation of AI systems themselves, emphasizing how algorithmically mediated judgments in assessment affect learners’ rights, equity, and pedagogical validity.
In accordance with Annex III, point 3b, of the EU AI Act, which designates AI systems used to evaluate students as high-risk applications, the present study aims to conceptualize a framework that enables educational institutions to conduct outcome-focused self-assessment of AIB-LOA. The focus extends beyond verifying technical robustness or regulatory conformity to evaluating whether algorithmically supported assessments produce valid, fair, and educationally accountable outcomes. The framework directly addresses the gaps identified in Section 1 and Section 2, where existing regulatory instruments remain primarily ex-ante and abstract, and governance approaches fail to operationalize accountability in terms of pedagogical validity.
The XAI-ED CAF responds to these gaps by synthesizing three foundational evaluation theories (Messick’s consequential validity, Kirkpatrick’s four-level evaluation model, and Stufflebeam’s CIPP framework) into an integrated approach for assessing the effectiveness of AI governance in educational settings.
The first is Messick’s [20] theory of validity, which transformed educational assessment by emphasizing that validity extends beyond technical accuracy toward systematic evaluation of the consequences of assessment decisions on individuals and institutions. Messick’s consequential validity framework is relevant to AIB-LOA governance because it addresses the social and ethical implications of assessment, requiring systematic investigation of whether practices support or undermine educational objectives and social values. This perspective directly addresses the gap identified in current literature, where XAI techniques often fail to provide pedagogically relevant explanations and may confuse educational stakeholders about decision logic [1,27].
The second component is Kirkpatrick’s [21] four-level evaluation model, which provides structured approaches for assessing intervention impacts across multiple temporal and organizational dimensions. Kirkpatrick’s framework enables systematic evaluation of AI system effects at stakeholder reaction levels, learning impact assessment, behavioral change measurement, and institutional results analysis. This multi-level approach addresses the need for evaluation that extends beyond system accuracy metrics toward the measurement of educational effectiveness and long-term institutional outcomes [9].
Third, Stufflebeam’s [22] Context-Input-Process-Product (CIPP) evaluation framework provides the theoretical integration by establishing systematic procedures for evaluating educational programs and interventions. The CIPP model ensures that outcome-focused governance addresses contextual factors influencing AI system effectiveness, resource allocation and implementation strategies, operational processes, and product measurement. This component directly addresses the implementation challenges identified in governance literature, where abstract ethical principles lack operational guidance for systematic assessment [9,19].
The integration of these three evaluation traditions creates a theoretically grounded approach that builds upon existing ALTAI-based analyses of educational AI [11,14]. In contrast to prior applications focused primarily on compliance or system vulnerabilities, the XAI-ED CAF reframes ALTAI through consequential validity, multi-level evaluation, and contextual assessment. This shift emphasizes long-term educational consequences, institutional effectiveness, and stakeholder well-being, thereby aligning the framework with the requirements of Annex III, point 3b, of the EU AI Act.

3.2. ALTAI Dimensions Reinterpreted for Outcome-Focused Governance (RQ1)

Addressing RQ1, the adaptation of ALTAI’s seven trustworthy AI requirements through the XAI-ED CAF requires systematic reinterpretation that extends beyond technical compliance toward assessment of educational effectiveness and stakeholder well-being. This reinterpretation process employs Messick’s consequential validity framework as the primary analytical lens, ensuring that each ALTAI dimension is evaluated in terms of its impacts on educational processes and outcomes rather than its adherence to abstract principles. To operationalize this reinterpretation, the seven ALTAI dimensions are aligned with the selected evaluation theories and reframed for outcome-focused governance of AIB-LOA. The resulting mapping establishes the pedagogical foundation and assessment focus for each dimension. This synthesis is presented in Table 2.
Table 2. Mapping of ALTAI’s seven trustworthy AI dimensions to pedagogical evaluation foundations for outcome-focused governance of AIB-LOA.

ALTAI Dimension | Pedagogical Foundation Link | Assessment Focus
Human agency & oversight | Messick (consequential validity); Kirkpatrick (Levels 1–4) | Preserving student autonomy, educator authority, and meaningful human intervention.
Technical robustness & safety | Messick (construct & predictive validity) | Ensuring alignment between algorithmic outputs and authentic learning outcomes.
Privacy & data governance | Stufflebeam (CIPP—context, input, process, product) | Evaluating institutional data stewardship, consent, and compliance effectiveness.
Transparency | Messick (consequential validity); Kirkpatrick (Level 2 learning) | Assessing interpretability, stakeholder understanding, and usefulness of explanations.
Diversity, fairness & non-discrimination | Messick (consequential validity) | Monitoring equity of access and differential impacts across demographic groups.
Societal & environmental well-being | Kirkpatrick (institutional results); Stufflebeam (product) | Measuring contribution to learning communities, institutional culture, and sustainability.
Accountability | Stufflebeam (CIPP—governance processes) | Evaluating governance structures, policy responsiveness, and institutional learning capacity.
The following subsections illustrate how each ALTAI dimension, when reinterpreted through pedagogical evaluation theories, operationalizes RQ1 by yielding a structured set of evaluative criteria and guiding questions for outcome-focused governance of AIB-LOA under Annex III, point 3b, of the EU AI Act. The emphasis is on conceptual translation rather than applied protocols, consistent with the study’s theoretical scope.

3.2.1. Human Agency and Oversight: Preserving Educational Autonomy

Addressing RQ1, the reinterpretation of human agency and oversight within the XAI-ED CAF emphasizes the preservation of autonomy for both students and educators. In the context of AIB-LOA, oversight cannot be reduced to a standard “human-in-the-loop” safeguard but must ensure that learners and teachers retain meaningful authority in evaluative processes. Holmes et al. [1] document the risk that excessive delegation to automated systems may erode educators’ agency and reduce students’ opportunities for self-regulated learning. Similarly, Boccuzzi et al. [16] contend that hybrid human–AI models are needed to ensure contestability and due process in evaluation practices.
Through Messick’s consequential validity, human agency is evaluated in terms of its social and ethical outcomes: Do AI-supported assessments strengthen learners’ self-regulation and decision-making, or do they weaken them? Kirkpatrick’s model further specifies observable criteria: reactions (Level 1) and behavioral change (Level 3) provide evidence of whether students and educators experience increased or reduced agency. This translation reframes oversight as a protection of educational autonomy, aligning with Annex III, point 3b, of the EU AI Act requirements while laying the foundation for RQ2 by identifying indicators such as contestation mechanisms, appeals, and documented educator discretion.
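As a minimal sketch of how the contestation and appeal indicators mentioned above might be evidenced, the following hypothetical record structure (field names, statuses, and the metric are assumptions, not prescribed by ALTAI or the framework) shows one way an institution could log appeals and educator overrides.

```python
# Hypothetical appeals/override log for human oversight of AIB-LOA.
# Field names, statuses, and the metric are illustrative assumptions only.
from dataclasses import dataclass
from datetime import date
from typing import Optional, List

@dataclass
class AssessmentAppeal:
    appeal_id: str
    assessment_id: str            # the contested algorithmic assessment
    filed_on: date
    grounds: str                  # student's stated reason for contesting
    reviewed_by: str              # educator or committee exercising oversight
    ai_score: float
    human_score: Optional[float] = None  # set when an educator overrides the AI score
    outcome: str = "pending"      # e.g., "upheld", "revised", "pending"

def override_rate(appeals: List[AssessmentAppeal]) -> float:
    """Share of reviewed appeals in which a human reviewer revised the AI score."""
    reviewed = [a for a in appeals if a.outcome != "pending"]
    return sum(a.outcome == "revised" for a in reviewed) / len(reviewed) if reviewed else 0.0
```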

3.2.2. Technical Robustness and Safety: Educational Construct Validity

For AIB-LOA, technical robustness must be understood as a question of educational construct validity. Messick’s framework emphasizes that validity is not solely technical accuracy but also fidelity to the constructs of learning being assessed.
Recent empirical work demonstrates this challenge. Albaladejo-González et al. [27] show that common explainability techniques, such as LIME and SHAP, often generate explanations that are pedagogically irrelevant or even misleading. Hooshyar and Yang [30] confirm these limitations, demonstrating through comparative studies that SHAP and LIME can yield unstable explanations, rely on problematic feature independence assumptions, and fail to provide actionable insights for educators. This evidence demonstrates the need for robustness criteria that extend beyond algorithmic stress tests and instead verify consistent alignment with authentic learning outcomes.
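One concrete way the instability reported in these studies could be probed is sketched below: the same student record is explained several times with LIME, and the top-ranked features are compared across runs. The model, data, and procedure are illustrative assumptions, not the method used in the cited work.

```python
# Illustrative sketch: checking whether LIME explanations for one student are stable.
# Model, data, and procedure are hypothetical; not the protocol of the cited studies.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
feature_names = ["rubric_1", "rubric_2", "essay_coherence", "time_on_task"]
X = rng.uniform(0, 1, size=(300, len(feature_names)))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 300)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")
student = X[0]

# LIME perturbs the instance stochastically, so repeated runs can rank features differently.
top_features = []
for _ in range(5):
    exp = explainer.explain_instance(student, model.predict, num_features=2)
    top_features.append({name for name, _ in exp.as_list()})

print("Top-ranked features per run:", top_features)
print("Stable across runs:", all(s == top_features[0] for s in top_features))
```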
Fedele et al. [14] also show that while ALTAI prompts attention to system vulnerabilities, it does not ensure that AI models avoid proxy optimization (e.g., predicting effort by time-on-task rather than competence). Addressing RQ1, the XAI-ED CAF reframes robustness as the requirement that algorithmic outputs consistently and reliably measure intended learning constructs across diverse populations. This, in turn, supports RQ2 by identifying indicators such as validity studies, fairness audits, and expert reviews of pedagogical alignment.

3.2.3. Privacy and Data Governance: Educational Data Stewardship

In AIB-LOA contexts, privacy and data governance must be reframed as educational data stewardship. As Zawacki-Richter et al. [2] observe, the extensive collection of student data risks undermining trust and reshaping the learning environment in ways that are opaque to stakeholders. Reidenberg’s [13] concept of Lex Informatica demonstrates how technical architectures embed regulatory choices, making data governance a normative rather than neutral process.
Regarding governance, Umoke et al. [19] identify the absence of standardized practices for ensuring fairness and accountability in adaptive learning systems, while Yan et al. [9] document the limited operationalization of privacy principles in educational AI ethics. Through Stufflebeam’s CIPP framework, privacy becomes part of the “input” dimension: the adequacy of data stewardship directly influences the fairness and validity of outcomes. Addressing RQ1, this reinterpretation emphasizes that compliance with GDPR must be complemented by educationally meaningful practices of consent, transparency, and learner rights. For RQ2, indicators may include institutional data governance policies, consent procedures, and evidence of student awareness of their rights.

3.2.4. Transparency: Educational Interpretability Impact Assessment

Transparency in AIB-LOA requires moving beyond legal disclosure toward educational interpretability. UNESCO [17,18] contends that explanations must be pedagogically contextualized and understandable to educators, learners, and parents. However, empirical studies reveal a gap: Albaladejo-González et al. [27] demonstrate that even when technical explanations are provided, they often fail to improve trust or comprehension among non-technical users. Holmes et al. [1] caution that such opacity undermines procedural fairness and accountability.
Messick’s consequential validity reframes transparency as the degree to which interpretability enhances trust and learning rather than as an abstract requirement. Kirkpatrick’s framework provides evaluation levels: comprehension (Level 2) can be used to measure whether stakeholders understand explanations. Addressing RQ1, transparency is thus reconceptualized as an educational condition: explanations must be functional for learning and decision-making. For RQ2, this identifies indicators such as comprehension assessments, stakeholder surveys, and documentation of how explanations influence educational choices.

3.2.5. Diversity, Fairness, and Educational Equity: Opportunity Access Assessment

Diversity and fairness in AIB-LOA must be addressed not as abstract ideals but as conditions of equal educational opportunity. Peterson and Broersen [12] argue that AI systems cannot provide a single normative justification for fairness, underscoring the need for pluralistic, context-dependent governance. Chan [28] similarly warns that algorithmic systems may exacerbate inequities by encoding biases in training data, thereby reinforcing existing achievement gaps.
Messick’s consequential validity positions fairness as integral to the legitimacy of assessment, requiring systematic examination of differential impacts on student groups. Addressing RQ1, the XAI-ED CAF reframes fairness as a governance obligation: do algorithmic evaluations create barriers or open pathways? For RQ2, this implies indicators such as disaggregated outcome data, bias audits, and equity impact reports. Such evidence operationalizes fairness not only as compliance with non-discrimination norms but as a demonstrable commitment to inclusive education.
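A minimal sketch of what such disaggregated outcome data could look like is given below; the records, grouping variable, and pass threshold are hypothetical and serve only to show how group-level gaps might be surfaced for an equity report.

```python
# Illustrative sketch: disaggregating AI-assigned scores by demographic group.
# Records, group labels, and the pass threshold are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "student_id": range(8),
    "group": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "ai_score": [72, 65, 80, 58, 61, 75, 54, 69],
})
PASS_THRESHOLD = 60

audit = records.groupby("group")["ai_score"].agg(
    n="size",
    mean_score="mean",
    pass_rate=lambda s: (s >= PASS_THRESHOLD).mean(),
)
# Gap relative to the best-performing group; persistent gaps would trigger a bias audit.
audit["pass_rate_gap"] = audit["pass_rate"] - audit["pass_rate"].max()
print(audit)
```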

3.2.6. Environmental and Societal Well-Being: Educational Community Impact Assessment

AIB-LOA systems influence not only individual assessments but also the broader culture of learning and institutional sustainability. UNESCO [17] emphasizes that AI adoption should strengthen human relationships and sustainable practices, yet empirical reviews indicate that these dimensions remain underexplored in current governance models [9]. Boncillo [31] shows how ethical and infrastructural risks emerge when AI systems are implemented without attention to community impacts and environmental costs.
Kirkpatrick’s framework situates this dimension at Level 4, institutional results, while Stufflebeam’s CIPP “product” dimension emphasizes the long-term outcomes of interventions. Addressing RQ1, societal and environmental well-being is reframed as a governance criterion: do AI systems contribute to collaborative learning environments, trust, and sustainability? For RQ2, this identifies indicators such as institutional climate surveys, student–educator engagement reports, and documentation of environmental footprints.

3.2.7. Accountability: Educational Governance Effectiveness Assessment

Accountability is the cornerstone that ensures institutional learning and adaptability in the governance of AIB-LOA. Radclyffe et al. [11] critique ALTAI for its reliance on self-assessment, noting that without independent validation, institutions may underreport vulnerabilities. Fedele et al. [14] similarly observe that ALTAI applications remain outcome-blind, failing to connect governance processes with the validity of learning assessments.
Through Stufflebeam’s CIPP framework, accountability is positioned as both process and product: institutions must document governance procedures, stakeholder engagement, and policy revisions based on feedback. Addressing RQ1, accountability is reframed as the demonstration of responsiveness, transparency, and institutional learning capacity. For RQ2, relevant indicators include governance documents, stakeholder consultation reports, and evidence of policy changes based on system performance. This positions accountability not as a compliance checklist but as a responsive mechanism linking technical performance, student rights, and institutional responsibility [19].

3.3. Operational Indicators and Evidence Framework (RQ2)

Addressing RQ2, the operationalization of the XAI-ED CAF requires identifying indicators and evidence types that can guide the outcome-focused evaluation of AIB-LOA. These elements represent conceptual foundations that demonstrate how ALTAI’s requirements may be translated into educationally meaningful assessment practices, consistent with the regulatory obligations of Annex III, point 3b, of the EU AI Act, rather than prescriptive protocols.
Table 3 provides a schema that connects each ALTAI dimension to general categories of indicators and possible forms of evidence. The intention is to demonstrate feasibility while not developing applied instrument design, which will be the focus of future empirical research.
By presenting Table 3 at a conceptual level, the framework demonstrates its potential to guide systematic outcome-focused governance of AIB-LOA without prematurely advancing into tool design. This approach maintains coherence with the article’s theoretical focus while providing a pathway toward future empirical validation through Delphi studies and institutional case applications.
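To indicate how the schema in Table 3 might be represented if an institution chose to encode it, the following sketch uses a simple record structure; the format is an assumption for illustration, not part of the framework, and the entries merely paraphrase two rows of Table 3.

```python
# Illustrative sketch: encoding part of the Table 3 schema as self-assessment records.
# The record format is an assumption; the XAI-ED CAF prescribes no particular tooling.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DimensionAssessment:
    altai_dimension: str
    indicators: List[str]          # what to consider
    evidence_types: List[str]      # how to show
    evidence_collected: List[str] = field(default_factory=list)  # links to institutional documents

self_assessment = [
    DimensionAssessment(
        altai_dimension="Human agency & oversight",
        indicators=["human override mechanisms", "capacity to contest decisions"],
        evidence_types=["records of appeals and overrides", "student autonomy surveys"],
    ),
    DimensionAssessment(
        altai_dimension="Transparency",
        indicators=["comprehensibility of explanations", "traceability of outcomes"],
        evidence_types=["comprehension assessments", "explanation usage data"],
    ),
]

# A dimension counts as documented only once evidence has been attached to it.
undocumented = [d.altai_dimension for d in self_assessment if not d.evidence_collected]
print("Dimensions still lacking evidence:", undocumented)
```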

4. Discussion

4.1. Theoretical Contributions to Educational AI Governance

This study makes three theoretical contributions to the governance of AI-based learning outcome assessment (AIB-LOA). Unlike prior work that has positioned ALTAI within education primarily as a compliance checklist [11,14], the XAI-ED CAF advances the field by applying educational evaluation theories as structured evaluative criteria for the reinterpretation of ALTAI requirements. This approach grounds AI governance not only in abstract ethical principles but in established traditions of educational assessment research.
First, the framework extends existing governance debates by reframing ALTAI’s seven requirements through pedagogical evaluation theories [20,21,22]. While most AI governance tools focus on technical compliance, the XAI-ED CAF emphasizes consequential validity and the social-educational consequences of algorithmic decision-making. Second, it bridges a conceptual gap by showing how hybrid human–AI oversight can be operationalized in education through systematic evaluation of learning outcomes, thereby aligning governance mechanisms with legal principles of explainability and due process [12,16]. Third, it positions educational AI explicitly as a high-risk domain under the EU AI Act’s Annex III, point 3b, demonstrating that governance models must be not only legally robust but also pedagogically meaningful. Together, these contributions advance the theoretical foundations of AI governance for high-stakes learning contexts. While earlier work has mapped definitional debates and ethical-legal principles in AI-based assessment [23], the present study advances this discussion by proposing a design-level framework that systematically integrates evaluation theory with regulatory requirements.

4.2. Methodological Contributions to Assessment Framework Design

Methodologically, the XAI-ED CAF provides a replicable model for translating abstract ethical principles into operational indicators and evidence types. In contrast to prior checklists that remain descriptive, the framework systematically derives indicators from established theories of validity and evaluation, ensuring methodological rigor. By integrating consequential validity, multi-level evaluation, and contextual assessment into the reinterpretation of ALTAI, the framework demonstrates how established educational theories can serve as analytical criteria for AI governance [1,9]. This methodological contribution establishes a bridge between regulatory obligations and educational practices, enabling institutions to self-assess outcome validity and fairness.

4.3. Implications for Educational Institutional Practice

At the institutional level, the framework provides actionable guidance for universities, schools, and other education providers seeking to comply with emerging AI regulations while protecting student rights. First, the translation of ALTAI into outcome-focused practices equips institutions with tools for monitoring algorithmic assessment systems after deployment, rather than relying exclusively on ex-ante conformity checks. Second, by embedding stakeholder-centered evaluation, including students, teachers, administrators, and parents, the framework encourages participatory governance [17,18] and trust-building within educational communities [31]. Third, its emphasis on learning outcomes and educational equity aligns algorithmic governance with pedagogical missions, ensuring that AI systems support rather than undermine institutional commitments to fairness, inclusion, and student development.

4.4. Policy and Regulatory Compliance Implications

For regulators and policymakers, this study demonstrates the importance of sector-specific adaptations of general AI governance tools. By aligning ALTAI with GDPR Article 22 safeguards and the Annex III, point 3b, of the EU AI Act high-risk classification for student evaluation systems, the XAI-ED CAF clarifies how compliance requirements can be operationalized in practice. It shows that transparency and accountability must be understood not only as legal obligations but also as pedagogical requirements central to educational legitimacy.
The framework identifies the need for independent validation mechanisms beyond institutional self-assessment, reflecting critiques of ALTAI’s self-reporting orientation [11]. By demonstrating how educational theory can inform governance design, this study contributes to ongoing discussions at the European Commission, UNESCO, and national authorities considering how to embed trustworthy AI principles within education policy frameworks [9,19].

4.5. Limitations and Directions for Future Research

This research remains at a theoretical and design stage. While grounded in established evaluation theories and aligned with regulatory requirements, the indicators and evidence types require empirical validation. The validity of the framework will depend on expert validation processes, such as Delphi studies, where limitations in panel composition could affect generalizability. Additionally, while the framework is designed for European educational governance, cross-jurisdictional testing will be necessary to ensure relevance in contexts with different legal and cultural assumptions.
A further limitation concerns the selection of the three evaluation frameworks as the basis for reinterpretation. Although widely recognized and compatible with operationalization, the choice is not definitive: other traditions, such as sociocultural or critical assessment theories, could also be applied or further developed for AI-driven evaluation. Similarly, reliance on ALTAI as the baseline framework introduces constraints, given that it was developed in 2020 and may require future revisions to align with the evolving AI Act and changing educational practices. However, its continued use in recent educational research [14] justifies its adoption as a reference point for this study.
Future research should advance in three directions. First, validating the proposed indicators and evidence types through structured expert consensus. Second, testing the framework across diverse institutional and cultural contexts to evaluate adaptability and cross-jurisdictional validity. Third, exploring whether alternative or additional theories of educational evaluation might better capture the particular challenges of AI-driven assessment. Another priority is investigating mechanisms for scalability and integration with independent audits, addressing concerns raised about ALTAI’s reliance on self-assessment [11].

5. Conclusions

This study has developed the XAI-ED CAF to address the pressing need for sector-specific governance of artificial intelligence in student evaluation, a specifically designated high-risk domain under Annex III, point 3b, of the EU AI Act. Building on established theories of educational evaluation (Messick’s consequential validity, Kirkpatrick’s multi-level model, and Stufflebeam’s CIPP framework), the CAF reinterprets the seven ALTAI dimensions through a pedagogical foundation, translating them into indicators and evidence types for outcome-focused governance of AI-based learning outcome assessment (AIB-LOA).
The primary contribution lies in demonstrating that trustworthy AI in education extends beyond technical transparency or regulatory compliance. Governance of AIB-LOA requires systematic attention to the educational consequences of algorithmic decision-making: preserving student autonomy, ensuring pedagogical validity, protecting privacy, enhancing interpretability, safeguarding fairness, sustaining institutional culture, and guaranteeing accountability. By integrating these dimensions, the XAI-ED CAF advances theoretical debates in AI governance while offering a methodological pathway for operationalization.
A further contribution is the integration of educational evaluation theory into the reinterpretation of ALTAI. While prior applications of ALTAI in education have largely remained at the level of compliance or vulnerability analysis [11,14], this study introduces a pedagogically grounded framework that connects regulatory requirements with established traditions of assessment research. Through this approach, it directly addresses RQ1 by reinterpreting ALTAI’s requirements through educational evaluation theory and RQ2 by proposing theoretically derived indicators and evidence types for institutional self-assessment.
For policy, the CAF provides a structured response to the AI Act’s high-risk classification of AIB-LOA and complements GDPR safeguards by operationalizing ALTAI requirements in ways that are meaningful for students, educators, and institutions. It contributes to bridging the gap between regulatory expectations and educational practice, thereby supporting both legal compliance and pedagogical accountability.
This work establishes a theoretical foundation for outcome-focused governance of AIB-LOA. It offers a design-level framework that future research should validate and refine through expert consensus-building, cross-contextual testing, and empirical institutional studies, rather than presenting a completed instrument.

Author Contributions

Conceptualization, F.M. and G.B.; Methodology, F.M.; Software, F.M.; Validation, F.M., A.N. and G.B.; Formal analysis, F.M. and G.B.; Investigation, F.M.; Resources, F.M.; Data curation, F.M.; Writing—original draft, F.M.; Writing—review & editing, F.M., A.N. and G.B.; Visualization, F.M.; Supervision, F.M.; Project administration, F.M.; Funding acquisition, F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was co-funded by the Ministry of Enterprises and Made in Italy (MIMIT) within the project ITAVER5O—Intelligent 5G-Technology-Assisted Virtual Experiences for Robust Student Orientation, grant number CUP B53C23004710004. The APC was also covered by the same project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Holmes, W.; Porayska-Pomsta, K.; Holstein, K.; Sutherland, E.; Baker, T.; Shum, S.B.; Santos, O.C.; Rodrigo, M.T.; Cukurova, M.; Bittencourt, I.I.; et al. Ethics of AI in education: Towards a community-wide framework. Int. J. Artif. Intell. Educ. 2022, 32, 504–526. [Google Scholar] [CrossRef]
  2. Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic review of research on artificial intelligence applications in higher education–where are the educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 39. [Google Scholar] [CrossRef]
  3. Luckin, R.; Holmes, W.; Griffiths, M.; Forcier, L.B. Intelligence Unleashed: An Argument for AI in Education; Pearson: London, UK, 2016; 18p. [Google Scholar]
  4. Holmes, W.; Bialik, M.; Fadel, C. Artificial Intelligence in Education: Promises and Implications for Teaching and Learning; Center for Curriculum Redesign: Boston, MA, USA, 2019; Available online: https://discovery.ucl.ac.uk/id/eprint/10139722 (accessed on 26 August 2025).
  5. Boccuzzi, G.; Nico, A.; Manganello, F. Delegated authority and algorithmic power: A rapid review of ethical issues in AI-based educational assessment. In Proceedings of the ACM 5th International Conference on Information Technology for Social Good (ACM GoodIT 2025), Antwerp, Belgium, 3–5 September 2025. [Google Scholar]
  6. European Parliament. Council of the European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act). Off. J. Eur. Union. 2024. L 1689. Available online: http://data.europa.eu/eli/reg/2024/1689/oj (accessed on 26 August 2025).
  7. Edwards, L.; Veale, M. Slave to the algorithm? Why a ‘right to an explanation’ is probably not the remedy you are looking for. Duke Law Technol. Rev. 2017, 16, 18. [Google Scholar]
  8. Manganello, F.; Nico, A.; Boccuzzi, G. Mapping the research landscape of transparent AI in university assessment: A bibliometric investigation. In Proceedings of the 5th International Conference on AI Research (ICAIR), Genoa, Italy, 11–12 December 2025. [Google Scholar]
  9. Yan, Y.; Liu, H.; Chau, T. A systematic review of AI ethics in education: Challenges, policy gaps, and future directions. J. Glob. Inf. Manag. 2025, 33, 1–50. [Google Scholar] [CrossRef]
  10. European Commission; Directorate-General for Communications Networks, Content and Technology. The Assessment List for Trustworthy Artificial Intelligence (ALTAI) for Self-Assessment; Publications Office of the European Union: Luxembourg, 2020; Available online: https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment (accessed on 26 August 2025).
  11. Radclyffe, C.; Ribeiro, M.; Wortham, R. The assessment list for trustworthy artificial intelligence: A review and recommendations. Front. Artif. Intell. 2023, 6, 1020592. [Google Scholar] [CrossRef] [PubMed]
  12. Peterson, C.; Broersen, J. Understanding the limits of explainable ethical AI. Int. J. Artif. Intell. Tools 2024, 33, 2460001. [Google Scholar] [CrossRef]
  13. Reidenberg, J.R. Lex informatica: The formulation of information policy rules through technology. Tex. Law Rev. 1997, 76, 553. [Google Scholar]
  14. Fedele, A.; Punzi, C.; Tramacere, S. The ALTAI checklist as a tool to assess ethical and legal implications for a trustworthy AI development in education. Comput. Law Secur. Rev. 2024, 53, 105986. [Google Scholar] [CrossRef]
  15. Boccuzzi, G.; Nico, A.; Manganello, F. Harmonizing human and algorithmic assessment: Legal reflections on the right to explainability in education. In Proceedings of the 17th International Conference on Education and New Learning Technologies (EDULEARN25), Palma, Spain, 30 June–2 July 2025. [Google Scholar] [CrossRef]
  16. Boccuzzi, G.; Nico, A.; Manganello, F. Hybridizing human and AI judgment: Legal theories as a framework for educational assessment. In Proceedings of the 2nd Workshop on Law, Society and Artificial Intelligence (LSAI 2025), Held at HHAI 2025: The 4th International Conference on Hybrid Human-Artificial Intelligence, Pisa, Italy, 10 June 2025. [Google Scholar]
  17. UNESCO. Recommendation on the Ethics of Artificial Intelligence; Adopted on 23 November 2021; UNESCO: Paris, France, 2022; 43p, Available online: http://digitallibrary.un.org/record/4062376 (accessed on 26 August 2025).
  18. Holmes, W.; Miao, F. Guidance for Generative AI in Education and Research; UNESCO Publishing: Paris, France, 2023. [Google Scholar]
  19. Umoke, C.C.; Nwangbo, S.O.; Onwe, O.A. The governance of AI in education: Developing ethical policy frameworks for adaptive learning technologies. Int. J. Appl. Sci. Math. Theory 2025, 11, 71–88. [Google Scholar] [CrossRef]
  20. Messick, S. Standards of validity and the validity of standards in performance assessment. Educ. Meas. Issues Pract. 1995, 14, 5–8. [Google Scholar] [CrossRef]
  21. Kirkpatrick, D.; Kirkpatrick, J. Evaluating Training Programs: The Four Levels, 3rd ed.; Berrett-Koehler Publishers: San Francisco, CA, USA, 2006. [Google Scholar]
  22. Stufflebeam, D.L. The CIPP model for program evaluation. In Evaluation Models: Viewpoints on Educational and Human Services Evaluation; Madaus, G.F., Scriven, M., Stufflebeam, D.L., Eds.; Springer: Dordrecht, The Netherlands, 1983; pp. 117–141. [Google Scholar] [CrossRef]
  23. Boccuzzi, G.; Nico, A.; Manganello, F. Educational assessment in the age of AI: A narrative review on definitions and ethical-legal principles for trustworthy automated systems. In Proceedings of the 19th International Conference on e-Learning and Digital Learning (ELDL 2025) + STE 2025, Lisbon, Portugal, 23–25 July 2025. [Google Scholar]
  24. Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120. [Google Scholar] [CrossRef] [PubMed]
  25. Khosravi, H.; Shum, S.B.; Chen, G.; Conati, C.; Tsai, Y.S.; Kay, J.; Gašević, D. Explainable artificial intelligence in education. Comput. Educ. Artif. Intell. 2022, 3, 100074. [Google Scholar] [CrossRef]
  26. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar] [CrossRef]
  27. Albaladejo-González, M.; Ruipérez-Valiente, J.A.; Gómez Mármol, F. Artificial intelligence to support the training and assessment of professionals: A systematic literature review. ACM Comput. Surv. 2024, 57, 1–29. [Google Scholar] [CrossRef]
  28. Chan, C.K.Y.; Hu, W. A comprehensive AI policy education framework for university teaching and learning. Int. J. Educ. Technol. High. Educ. 2023, 20, 38. [Google Scholar] [CrossRef]
  29. Chen, L.; Chen, P.; Lin, Z. Artificial intelligence in education: A review. IEEE Access 2020, 8, 75264–75278. [Google Scholar] [CrossRef]
  30. Hooshyar, D.; Yang, Y. Problems with SHAP and LIME in interpretable AI for education: A comparative study of post-hoc explanations and neural-symbolic rule extraction. IEEE Access 2024, 12, 137472–137490. [Google Scholar] [CrossRef]
  31. Boncillo, J. AI in education: A systematic review of its applications, benefits, and ethical challenges. Int. J. Multidiscip. Educ. Res. Innov. 2025, 3, 436–447. [Google Scholar]
Table 1. Key references, identified gaps, and contributions to the development of the XAI-ED CAF.

Source (A–Z) | Identified Gap | Contribution to the XAI-ED CAF | Relevance to RQs
Albaladejo-González et al. [27] | XAI techniques immature; explanations lack pedagogical and assessment relevance. | Reinforces critique of SHAP/LIME for failing to provide educationally meaningful explanations. | [RQ2] Highlights need for outcome-focused indicators beyond technical transparency.
Fedele et al. [14] | ALTAI applied to educational AI but limited to vulnerabilities and compliance; no focus on outcome validity. | Demonstrates ALTAI’s relevance while exposing need for outcome-focused governance. | [RQ1] Motivates reinterpretation of ALTAI through evaluation theory.
Holmes et al. [1] | Ethical and operational challenges of AI-based educational assessment remain under-theorized. | Justifies urgency of systematic governance approaches in AIB-LOA. | [RQ1] Frames the governance challenge requiring theoretical integration.
Peterson & Broersen [12] | Autonomous ethical AI impossible; no single normative explanation. | Strengthens case for hybrid human–AI evaluation and pluralistic accountability. | [RQ1] Underlines role of evaluation theory in normative interpretation.
Radclyffe et al. [11] | ALTAI remains self-assessment oriented; weak capacity for independent auditing. | Motivates translation of ALTAI into institutional self-assessment tailored to AIB-LOA. | [RQ1] Highlights regulatory limits and the need for educational reinterpretation.
Ribeiro et al. [26] | Early XAI methods (LIME) created for technical interpretability, not pedagogical accountability. | Demonstrates need to adapt technical explainability to educational meaning-making. | [RQ2] Shows why indicators must link transparency to validity.
Umoke et al. [19] | Fragmented governance; lack of standardized policies for educational AI ethics. | Validates urgency of sector-specific governance frameworks. | [RQ1] Confirms absence of frameworks tailored to Annex III, point 3b, EU AI Act.
Yan et al. [9] | Fragmented AIED ethics literature; weak integration between principles and practice. | Confirms need for coherent frameworks linking ethics with systematic educational evaluation. | [RQ1 & RQ2] Highlights absence of empirical validation and outcome indicators.
Table 3. Illustrative indicators and evidence types for assessing ALTAI dimensions in educational ex-post evaluation.

ALTAI Dimension | Illustrative Indicators (What to Consider) | Possible Evidence Types (How to Show)
Human agency & oversight | Presence of mechanisms for human override; stakeholder capacity to contest decisions; preservation of educator judgment and student autonomy | Records of appeals and overrides; surveys on student autonomy; educator feedback
Technical robustness & safety | Validity and reliability of assessment outputs; mechanisms to detect and mitigate bias; resilience to failure | Validity studies; fairness audits; expert reviews of alignment with pedagogical goals
Privacy & data governance | Compliance with data minimization; clarity of consent; effectiveness of data correction/deletion procedures | Institutional data governance policies; audit reports; stakeholder awareness surveys
Transparency | Comprehensibility of explanations; traceability of outcomes; usefulness for decision-making | Comprehension assessments; explanation usage data; transparency policy documents
Diversity, fairness & non-discrimination | Equity of access; absence of systematic disadvantage across demographic groups; bias mitigation effectiveness | Disaggregated outcome data; equity reports; fairness audit documentation
Societal & environmental well-being | Contribution to institutional culture; effects on student–educator relationships; sustainability of infrastructure | Climate surveys; engagement reports; documentation of environmental impact
Accountability | Clarity of roles and responsibilities; mechanisms for oversight and redress; evidence of institutional learning from feedback | Governance documents; records of complaints and resolutions; reports on policy revisions
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
