1. Introduction
Artificial Intelligence (AI) is swiftly transforming the educational landscape, heralding significant changes in student learning, educator instruction, and institutional data-driven decision-making. AI technologies, including intelligent tutoring systems (
Graesser et al., 2012;
Nwana, 1990), automated assessment tools (
Owan et al., 2023;
Estrada et al., 2025), real-time risk prediction, and content adaptation, are now essential in creating scalable, personalized (
Bhutoria, 2022;
Maghsudi et al., 2021), and responsive learning environments in both formal and informal learning settings. Central to this transformation are large language models (LLMs), whose capabilities in natural language generation, dialogic interaction, and real-time explanation are redefining digital pedagogy. These technologies facilitate conversational agents, dynamic learning platforms, and multilingual support systems, thereby enhancing the engagement and accessibility of education. Nevertheless, the proliferation of LLM-based educational tools has introduced a new array of ethical, pedagogical, and technical challenges. A primary concern is the persistent opacity of AI decision-making processes and the uncertain alignment of generated output with learners' educational and cognitive processes. In high-stakes educational contexts, such as student assessment, intervention planning, or curriculum alignment, black-box models can obscure reasoning, mislead stakeholders, and perpetuate systemic inequities. Most AI outputs remain challenging for learners to interpret, for educators to justify, and for administrators to audit, owing to lack of transparency (
Bing & Leong, 2025), trust (
Martínez-Requejo et al., 2025), representational bias (
Bing & Leong, 2025) and more (
Alali & Wardat, 2024). While the field of Explainable AI (XAI) is advancing to address these barriers to acceptance, the absence of clear, stakeholder-relevant explanations tailored to educational settings undermines trust, limits accountability, and ultimately diminishes the instructional and institutional value of AI integration. Explanations in educational AI systems are not explanations of the model in the classical XAI sense; they are human-centered explanations that bridge system output and user understanding in a way that is developmentally appropriate, context-aware, and instructionally relevant, rather than offering output-to-input traceability alone.
While Explainable AI (XAI) has been extensively investigated in fields such as healthcare, finance, and criminal justice, its application within the educational sector remains in its infancy and often lacks alignment with educational objectives. Our work draws from XAI traditions but reconceptualizes explanation through a human-centered lens. Rather than focusing solely on the inner workings of the model, such as attention maps or feature importance scores, this study prioritizes stakeholder-centered justifications that support understanding, trust, and action. This approach aligns with the goals of human-centric XAI, where the aim is not transparency for its own sake, but rather meaningful comprehension for diverse users, including students, educators, and administrators. In educational contexts, generative AI outputs, especially in natural language, should be inherently interpretable and should explain the decision, going beyond mechanistic traceability to be pedagogically useful. Learners benefit more from actionable, curriculum-aligned feedback, illustrative examples, and motivational prompts than from technical artifacts like SHapley Additive exPlanations (SHAP) values. Current XAI frameworks frequently emphasize algorithmic transparency at the expense of human-centered relevance, neglecting the specific requirements of students, educators, educational institutions, and pedagogy more broadly. Furthermore, educational AI presents unique challenges that necessitate considerations beyond mere technical interpretability. Systems must align feedback with curricular objectives, adapt to diverse learner profiles, support equitable decision-making, and be comprehensible to non-technical users. Additionally, effective AI explanations in education must transcend mere clarification; they must engage, motivate, and empower learners. To address these gaps, we introduce the PEARL Framework, a human-centered, ethically grounded model for designing and evaluating explainable AI in education. PEARL stands for:
Pedagogical Personalization: AI should adapt feedback and content to the learner’s developmental level and curriculum, combining instructional alignment with individualized support.
Explainability & Engagement: Explanations should be clear and motivating, tailored to various user groups and designed to sustain attention and understanding.
Attribution & Accountability: AI decisions should be traceable and auditable, allowing users to inspect the reasoning behind predictions and attribute outcomes to specific learning inputs.
Representation & Reflection: AI should mitigate bias across learner demographics and encourage metacognitive growth by prompting self-reflection on progress and misconceptions.
Localized Learner Agency: AI should respect cultural and linguistic contexts while empowering learners to control interaction styles such as adjusting explanation depth, pacing, or modality.
Figure 1 presents the PEARL framework as a cyclical model composed of five interdependent dimensions: Pedagogical Personalization, Explainability & Engagement, Attribution & Accountability, Representation & Reflection, and Localized Learner Agency, emphasizing that effective educational AI emerges from their continuous interaction rather than isolated implementation.
Each dimension of the PEARL framework addresses a significant deficiency in existing LLM-based educational tools. For instance, current systems frequently offer feedback that is fluent yet lacks pedagogical foundation; they may label students as “at risk” without adequate justification; or they generate content that does not adequately reflect the cultural backgrounds of learners. PEARL addresses these issues by incorporating ethical principles, learner-centered design, and educational relevance into a modular, adaptable framework for explainable AI. This framework mitigates these risks while supporting stakeholder-specific explainability across diverse educational contexts. Additionally, we introduce the PEARL Composite Score, a quantitative, multi-dimensional rubric that enables developers, educators, and policymakers to assess AI systems along their ethical and pedagogical dimensions. By translating qualitative principles of explainability and fairness into measurable dimensions, it allows stakeholders to monitor an AI system’s performance not only in terms of functionality but also in terms of ethical and pedagogical integrity. The score provides transparency regarding system strengths and weaknesses across the five core dimensions of PEARL, thereby guiding targeted improvements and informing procurement or deployment decisions. For researchers, it facilitates reproducible evaluation across systems and versions, while for institutions, it offers a structured audit mechanism aligned with global AI ethics standards. Over time, the longitudinal application of the composite score can support continuous quality assurance, ensuring that smart learning systems remain trustworthy, inclusive, and educationally meaningful as they evolve. In this paper, we operationalize PEARL through the design and simulation of a smart AI tutor powered by LLMs (
Mokhtari et al., 2025). Although user data were not collected from real-world classrooms as part of this study, structured simulations and scenario-driven evaluations serve to illustrate PEARL’s applicability and diagnostic value. The main contributions of this research center on the development and application of the PEARL Framework, which is designed to support ethical, transparent, and student-centered AI in education.
We propose the PEARL Framework, a principled, multi-dimensional approach grounded in pedagogy, ethics, and human-centered design for AI in education.
We demonstrate the framework’s application through simulated case studies involving a developed LLM-based smart tutor, illustrating how each PEARL component promotes transparency, fairness, and user engagement.
We introduce the PEARL Composite Score, a stakeholder-aware evaluation rubric that guides auditing, comparison, and refinement of educational AI systems.
We provide a thorough literature survey on advancements in human-centric XAI for AI systems in education.
Taken together, these contributions bring educational principles, ethical considerations, and human needs into a single framework for leveraging AI in learning environments. By combining clear explanation, a foundation in education, and strong ethical awareness, PEARL helps close the gap between advanced technology and everyday classroom needs. It offers a pathway for developing intelligent educational systems that are not only effective, but also transparent, fair, and grounded in real-life educational contexts.
In the following sections, we review existing research on explainable AI in education and describe how the PEARL Framework centers the ways students, teachers, and administrators experience and interpret AI decisions. We then illustrate its application through simulated examples using an AI-based tutoring system, demonstrating how PEARL enhances feedback clarity, supports diverse stakeholder needs, reduces bias, and promotes culturally relevant learning. Finally, we introduce the PEARL Composite Score as a practical tool for evaluating alignment with ethical, pedagogical, and human-centered principles, and we discuss study limitations and future validation efforts to refine the Composite Score and test its reliability across educational contexts.
2. Background and Review of Related Work
Researchers in educational artificial intelligence increasingly agree that AI systems in education must be understandable, fair, and designed around human needs, not just technically accurate. The core dimensions of explainability, fairness, and human-centered design form the foundation for recent work in educational AI and directly motivate the PEARL framework.
2.1. Educational AI Systems and Human-Centered Design
Several studies emphasize that AI explanations should help users trust systems and support learning rather than simply process inputs and deliver outputs.
Maity and Deroy (
2024) propose Human-Centric Explainable AI in Education, arguing that explanations must work for diverse users, including students and teachers with different backgrounds, goals, and levels of technical knowledge. While they highlight the potential of large language models (LLMs) to provide personalized feedback, they also stress the importance of transparency and cultural sensitivity. PEARL builds on this work by embedding structured explanations that are explicitly tied to instructional goals and classroom learning.
Fairness is another concern in educational AI.
Chinta et al. (
2024), through the FairAIED framework, examine bias and equity in AI-supported educational decisions. They combine technical fairness methods with stakeholder-informed interventions, showing how difficult it can be to balance system performance with equitable outcomes. PEARL extends this work by translating fairness principles into education-specific criteria that reflect classroom realities rather than abstract statistical measures.
Broader reviews of generative AI also highlight the importance of explainability.
Schneider (
2024) identifies key features such as interactivity and verifiability, which allow users to question and explore AI outputs. These ideas align closely with PEARL’s emphasis on interactive, dialog-based explanations. However, Schneider notes a lack of empirical evidence showing how such features function in real educational settings—an issue PEARL addresses through learner- and educator-facing evaluation.
Gupta and Kaul (
2024) explore the promise and risks of generative AI in education, pointing out challenges related to curriculum alignment and cultural relevance. They caution that AI-generated content may not always fit educational goals or learner contexts and call for education-specific explainability frameworks. PEARL directly responds to this need by benchmarking explanations against pedagogical objectives and structuring reasoning in ways that support learning.
Policy guidance echoes these research concerns. The
U.S. Department of Education (
2023) emphasizes that educational AI should remain human-centered, adaptable to local contexts, and subject to continuous evaluation. PEARL aligns with these priorities by embedding educational theory and ethical considerations into both system design and evaluation.
Ethical and participatory perspectives are central to current work.
Memarian and Doleck (
2023) argue that AI systems in higher education should be developed through participatory design, meaning that educators and learners should help shape how systems behave and explain themselves. PEARL reflects this approach by incorporating stakeholder perspectives into its explanation strategies and evaluation criteria.
Finally,
Wang et al. (
2024) introduce LLM-based Symbolic Programs (LSPs), which combine language models with symbolic logic to make AI reasoning more transparent. While still early in educational use, this approach aligns with PEARL’s goal of producing explanations that are rule-based, traceable, and accessible to learners.
2.2. Explainable AI and Learning
Researchers agree that AI systems in education must go beyond accurate predictions to help users understand why specific decisions are made.
Rosé et al. (
2019) propose explanatory learner models that provide actionable insights instead of just predicting grades or risk levels. By supporting a better grasp of the learning process, these models encourage essential collaboration between developers, educators, and scientists. Most importantly, developers should build explanations into these systems from the start, evaluating them for both their educational impact and technical accuracy.
Context also plays a crucial role in how explanations are understood. Clancey’s work on situated cognition argues that explanation is not a static output, but something shaped by social, cultural, and instructional context (
Clancey, 1997;
Clancey & Hoffman, 2021). From this perspective, effective explanations should feel conversational and adaptable, responding to who the learner is, what they are doing, and where learning is taking place. This challenges one-size-fits-all explanation strategies and supports PEARL’s focus on dialog-based, learner-controlled explanations (
Maity & Deroy, 2024).
Research in the broader XAI field reinforces these ideas. Studies by
Doshi-Velez and Kim (
2017) and
Escalante et al. (
2018) identify key qualities of effective explanations, including user trust, manageable cognitive load, and relevance to the task at hand. They also emphasize that explanations should support ethical responsibility by helping users recognize bias, question AI decisions, and remain in control. In educational settings, this means explanations should promote reflection, fairness, and informed human judgment rather than blind reliance on AI.
2.3. Limitations of Current Approaches
Despite the growing emphasis on explainable AI (XAI), many current educational systems rely on technical explanation methods that remain inaccessible to non-experts. For instance, visualizations such as feature importance charts may assist system developers, but they frequently offer minimal utility for students or teachers. Furthermore, a significant portion of existing research is limited to short-term or laboratory-based studies, lacking evidence from sustained use in authentic classroom environments. Moreover, there is a lack of standardized metrics to evaluate whether these explanations effectively promote learning, ensure fairness, or bolster learner agency (
Maity & Deroy, 2024;
Altukhi & Pradhan, 2024).
Overall, current research reveals a clear gap between standard explainable AI and the practical needs of schools. Addressing this gap requires a shift toward frameworks that link explanations to pedagogy, promote social equity, and provide students with meaningful control over their AI interactions. To meet these needs, this paper introduces the PEARL Framework. By integrating instructional theory with fairness criteria and interactive explanations, PEARL offers a unified, learner-focused approach to AI in education.
3. PEARL: A Human-Centered Model for Explainable AI in Education
The swift incorporation of artificial intelligence in both formal and informal educational settings is transforming educational experiences, thereby necessitating that these systems be comprehensible, equitable, and aligned with the values of educators, learners, and administrators. Numerous contemporary tutoring systems based on large language models produce fluent explanations that may seem sophisticated but are frequently pedagogically ambiguous, culturally misaligned, or cognitively misdirected. While these systems may perform adequately in aggregate, they often provide inequitable, disengaging, or perplexing experiences for learners from diverse backgrounds. The current lack of transparency, adaptability, and pedagogical foundation in educational AI poses a risk to its potential as a reliable learning partner. The PEARL Framework, comprising Pedagogical Personalization, Explainability & Engagement, Attribution & Accountability, Representation & Reflection, and Localized Learner Agency, was developed to address this challenge. It offers a design and evaluation blueprint for AI systems in education, particularly those powered by large language models. PEARL emphasizes not only interpretability and fairness but also learner motivation, cultural relevance, and stakeholder empowerment, rendering it uniquely suited for explainable AI in high-impact learning environments.
Pedagogical Personalization (P): This component ensures AI-generated content, scaffolding, and feedback are educationally appropriate and aligned with curriculum standards and the learner’s developmental stage. Distinct from generic personalization methods, pedagogical personalization integrates adaptive learning with instructional theories such as Bloom’s Taxonomy and the Zone of Proximal Development (ZPD). For example, it prevents inappropriate recommendations, such as suggesting abstract algebra to a student who is struggling with arithmetic. By aligning AI responses with both the learner’s cognitive readiness and formal learning objectives, this component transforms AI from a reactive tool into a proactive instructional partner. It maintains curriculum fidelity while supporting individualized progression.
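As a concrete, minimal sketch of how this gating logic might operate, the following Python fragment checks prerequisite mastery and a ZPD-style difficulty band before a topic is recommended. The Topic schema, mastery threshold, and difficulty scale are hypothetical choices made for illustration, not components of a deployed system.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    difficulty: int                      # 1 (introductory) .. 5 (advanced); assumed scale
    prerequisites: list[str] = field(default_factory=list)

def in_zpd(topic: Topic, mastery: dict[str, float], current_level: int,
           band: int = 1, threshold: float = 0.7) -> bool:
    """Gate a recommendation: all prerequisites mastered, and the topic's
    difficulty falls within the learner's zone of proximal development."""
    prereqs_met = all(mastery.get(p, 0.0) >= threshold for p in topic.prerequisites)
    within_band = current_level <= topic.difficulty <= current_level + band
    return prereqs_met and within_band

# A learner still consolidating arithmetic is not offered abstract algebra.
mastery = {"arithmetic": 0.55, "fractions": 0.80}
algebra = Topic("abstract algebra", difficulty=5, prerequisites=["arithmetic"])
print(in_zpd(algebra, mastery, current_level=2))  # False: prerequisite unmet
```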
Table 1 summarizes the five components of the PEARL framework, outlining each dimension and its corresponding role in supporting human-centered, ethically grounded educational AI.
Explainability & Engagement (E): AI systems must not only be interpretable but also capable of conveying explanations, across all natural language outputs, in a manner that is both cognitively accessible and emotionally engaging. This aspect underscores the importance of stakeholder-specific explainability. For instance, students benefit from simplified analogies and motivational scaffolds, whereas teachers may require insights into error trends and conceptual mappings. Administrators, on the other hand, necessitate dashboards and equity indicators. Engagement is crucial: explanations should stimulate curiosity, invite inquiry, and support self-regulated learning. By integrating clarity with motivation, this component enhances both interpretability and pedagogy, rendering explanations actionable and meaningful across diverse user groups.
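One lightweight way to realize stakeholder-specific explainability is to render a single underlying diagnosis differently for each audience. The sketch below assumes a hypothetical diagnosis dictionary and three audience roles; the field names and wording are illustrative.

```python
def render_explanation(diagnosis: dict, audience: str) -> str:
    """Render one underlying diagnosis differently per stakeholder group."""
    if audience == "student":
        return (f"You missed {diagnosis['errors']} questions on {diagnosis['skill']}. "
                f"Want to try a worked example together?")
    if audience == "teacher":
        return (f"Error trend: {diagnosis['errors']} errors on '{diagnosis['skill']}' "
                f"this week; likely misconception: {diagnosis['misconception']}.")
    if audience == "administrator":
        return (f"Cohort flag: '{diagnosis['skill']}' error rate is "
                f"{diagnosis['cohort_rate']:.0%}; see the equity dashboard.")
    raise ValueError(f"Unknown audience: {audience}")

diagnosis = {"errors": 3, "skill": "factoring quadratics",
             "misconception": "sign errors when splitting the middle term",
             "cohort_rate": 0.42}
print(render_explanation(diagnosis, "teacher"))
```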
Attribution & Accountability (A): In educational contexts, decisions made by artificial intelligence (AI) can significantly impact grades, interventions, placements, and disciplinary measures. The principle of Attribution and Accountability is crucial in ensuring that these decisions are both traceable and justifiable. This involves maintaining comprehensive audit trails of AI outputs, documenting the rationale behind decisions, linking outcomes to learner actions or inputs, and enabling version control or rollback in the event of errors. For example, if a student is identified as “at risk,” educators must be able to trace the model’s reasoning and verify its data inputs. This component is essential for legal compliance, fostering stakeholder trust, and enhancing system debuggability, thereby transforming AI from an opaque black box into an accountable and auditable entity within the educational process.
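A minimal sketch of the audit record this component implies is given below: each decision is logged with its inputs, rationale, and model version, plus a content hash for tamper evidence. The record fields are assumptions for illustration, not a prescribed schema.

```python
import hashlib
import json
import time

def log_decision(audit_log: list, model_version: str, inputs: dict,
                 output: str, rationale: str) -> dict:
    """Append an auditable record linking an AI decision to its inputs,
    rationale, and model version, so it can be traced or rolled back later."""
    record = {
        "timestamp": time.time(),
        "model_version": model_version,   # enables version control and rollback
        "inputs": inputs,                 # learner actions/data behind the decision
        "output": output,
        "rationale": rationale,
        # Hash ties the record to its exact content for tamper-evident auditing.
        "checksum": hashlib.sha256(
            json.dumps([model_version, inputs, output, rationale],
                       sort_keys=True).encode()).hexdigest(),
    }
    audit_log.append(record)
    return record

audit_log: list = []
log_decision(audit_log, "vta-1.3", {"missed_assignments": 3, "quiz_mean": 0.58},
             "at_risk", "Missed work and low quiz scores raised the risk estimate.")
```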
Representation & Reflection (R): Artificial intelligence systems tend to replicate and even exacerbate biases present in historical data. This component emphasizes fairness across dimensions such as race, gender, language, disability, and learning profiles, while also encouraging reflective feedback for learners. It incorporates demographic disaggregation, subgroup accuracy testing, and reflective explanation strategies that encourage students to contemplate their learning strategies. For instance, instead of merely correcting an error, a reflective AI might inquire: “What strategy did you use here? Would you like to try a different one?” This approach fosters metacognition while ensuring that no group is systematically disadvantaged by the system. Bias may also originate from the data used to train the underlying language models. Monitoring datasets for skew and applying debiasing techniques can help ensure fairer outputs.
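The subgroup accuracy testing described here can be approximated in a few lines of analysis code. The sketch below disaggregates accuracy by demographic group and flags gaps for human review; the record format and the disparity tolerance are illustrative assumptions.

```python
from collections import defaultdict

def subgroup_accuracy(records: list[dict]) -> dict[str, float]:
    """Disaggregate accuracy by group; each record needs 'group',
    'prediction', and 'label' keys."""
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        correct[r["group"]] += int(r["prediction"] == r["label"])
    return {g: correct[g] / total[g] for g in total}

def flag_disparity(acc_by_group: dict[str, float], tolerance: float = 0.05) -> bool:
    """Flag the system for review if subgroup accuracies diverge
    beyond the chosen tolerance."""
    return max(acc_by_group.values()) - min(acc_by_group.values()) > tolerance

records = [{"group": "ELL", "prediction": 1, "label": 0},
           {"group": "ELL", "prediction": 0, "label": 0},
           {"group": "non-ELL", "prediction": 1, "label": 1},
           {"group": "non-ELL", "prediction": 0, "label": 0}]
acc = subgroup_accuracy(records)
print(acc, flag_disparity(acc))  # {'ELL': 0.5, 'non-ELL': 1.0} True
```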
Localized Learner Agency (L): AI systems often fail to accommodate cultural diversity and learner autonomy. The concept of Localized Learner Agency (L) embodies a dual commitment to cultural relevance and user autonomy within AI-driven educational systems. While cultural and linguistic adaptations ensure content inclusivity, the agency component empowers learners to make informed decisions regarding their engagement with AI-generated explanations. Operationally, this component can be instantiated in several ways, as the configuration sketch following this list illustrates:
Modality Control: Learners should be able to choose their preferred format for feedback, e.g., visual explanations (diagrams), verbal/narrative summaries, or step-by-step walkthroughs. For instance, a student struggling with fractions might opt for a pictorial number line rather than a verbal explanation.
Depth Adjustment: PEARL-based systems offer learners a toggle or slider to control the granularity of explanations. A novice might select a highly scaffolded response (“Explain like I’m in 5th grade”), while an advanced learner can request a more concise or abstract summary (“Just show me the error source”).
Cultural Context Personalization: The system should present regionally adapted analogies or examples. A math problem might refer to local currency, sports, or festivals, ensuring contextual resonance.
Language and Tone Preferences: Learners may choose explanations in their first language or in a tone suited to their learning style, e.g., formal academic vs. casual motivational.
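A minimal configuration sketch for these learner-controlled options follows. The preference fields, defaults, and prompt wording are hypothetical and would need adaptation to a specific tutor.

```python
from dataclasses import dataclass

@dataclass
class LearnerPreferences:
    """Learner-controlled settings governing how explanations are delivered."""
    modality: str = "verbal"            # "visual" | "verbal" | "step_by_step"
    depth: int = 3                      # 1 = heavily scaffolded .. 5 = terse/abstract
    locale: str = "en-US"               # drives language and regional analogies
    tone: str = "casual_motivational"   # or "formal_academic"

def build_prompt(question: str, prefs: LearnerPreferences) -> str:
    """Fold learner preferences into the instruction sent to the LLM tutor."""
    return (f"Explain the answer to: {question}\n"
            f"Format: {prefs.modality}; scaffolding level {prefs.depth}/5 "
            f"(1 = most scaffolded); language/locale: {prefs.locale}; "
            f"tone: {prefs.tone}; prefer culturally local examples where natural.")

prefs = LearnerPreferences(modality="visual", depth=1, locale="es-MX")
print(build_prompt("Why is 1/2 larger than 1/3?", prefs))
```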
The configurable options extend beyond mere interface selections; they are instrumental in fostering metacognition, learner autonomy, and sustained engagement. By incorporating these controls, PEARL conceptualizes explainability as a dialogic and learner-centered process, rather than a uniform output, thereby aligning AI behavior with both ethical sensitivity and pedagogical theory. Collectively, these five pillars establish a modular yet interdependent framework for developing AI systems that are not only functional but also educationally meaningful, ethically responsible, and human-centered. PEARL provides a pragmatic blueprint for researchers, developers, and educators to systematically integrate explainability, fairness, and pedagogical sensitivity into the design of next-generation smart learning tools.
3.1. The PEARL Composite Score
To translate the principles of the PEARL framework into actionable evaluation criteria, we introduce the PEARL Composite Score, a multi-dimensional scoring system that enables systematic assessment of educational AI systems across the five components of PEARL. The objective is to bridge the gap between abstract values, such as fairness and interpretability, and concrete, comparable metrics that can inform development, auditing, and deployment processes in an educational context.
While most artificial intelligence (AI) systems, including those utilized in educational settings, are evaluated using traditional performance metrics such as accuracy and F1 score, these metrics alone provide an incomplete assessment of system quality. An AI tutoring system that achieves high predictive accuracy but lacks pedagogical alignment or presents concepts in culturally inaccessible ways may ultimately detract from learning outcomes or diminish trust. Furthermore, ethical and educational objectives such as transparency, inclusivity, and learner empowerment are often challenging to quantify without a structured, context-aware evaluation tool. The PEARL Composite Score addresses this gap by providing a comprehensive, stakeholder-centered rubric for assessing an AI system’s ethical, pedagogical, and contextual suitability. It enables researchers, developers, educators, and policymakers to evaluate how effectively a system adheres to the five core pillars of PEARL, not only in terms of functionality but also in its human-centered contributions to the learning process. The PEARL Composite Score is structured as a modular rubric, with each of the five PEARL components (Pedagogical Personalization, Explainability & Engagement, Attribution & Accountability, Representation & Reflection, and Localized Learner Agency) scored independently across multiple sub-criteria. Each dimension is assessed using stakeholder-relevant indicators, allowing the score to be adaptable to various educational contexts (e.g., K–12, higher education, informal learning). Each component includes:
Design indicators (e.g., was the system built to account for curriculum alignment or diverse learner profiles?)
Output quality indicators (e.g., is generated content appropriate, reflective, or motivating?)
System behavior indicators (e.g., does the system log decisions, provide version traceability, or adapt to local cultural references?)
Each of these indicators is rated on a Likert-style scale (1–5), and qualitative descriptors are provided to reduce subjectivity. The framework encourages evaluation by interdisciplinary teams including educators, learning scientists, ethicists, and technologists to capture the full spectrum of values embedded in educational AI systems. The PEARL Composite Score serves multiple purposes throughout the AI lifecycle:
Development: Supports iterative design and debugging by highlighting specific weaknesses (e.g., lack of reflection prompts or traceability).
Benchmarking: Enables comparison across different systems or model versions not solely by accuracy but by alignment with educational and ethical principles.
Deployment Readiness: Assists educators and institutions in deciding whether a system is sufficiently trustworthy, inclusive, and pedagogically sound for real-world use.
Regulatory and Institutional Auditing: Aligns with emerging AI policy frameworks (e.g., UNESCO AI in Education guidelines, EU AI Act risk classifications) to support responsible innovation and compliance.
Longitudinal Monitoring: Facilitates ongoing quality assurance and system evolution, ensuring continued relevance and safety over time.
By integrating the principles of transparency, fairness, inclusivity, accountability, and learner empowerment into a concrete scoring framework, the PEARL Composite Score actualizes the essence of human-centered explainable AI in the educational domain. It provides a reproducible and scalable approach to ensure that AI-driven learning systems not only operate effectively but also actively promote human development. Future iterations of this rubric will include inter-rater reliability testing and empirical validation in live classroom settings to ensure consistency and practical relevance.
3.2. Alignment with Global AI Ethics and Fairness Guidelines
To ensure its ethical integrity and international applicability, the PEARL framework has been intentionally aligned with globally recognized guidelines on AI ethics and fairness (
Martínez-Requejo et al., 2025). These guidelines include frameworks from prominent organizations such as UNESCO, the OECD, the
European Commission (
2023), NIST (
U.S. National Institute of Standards and Technology, 2023), and
IBM (
n.d.). This alignment guarantees that PEARL is not only pedagogically and technically robust but also compliant with policy, globally interoperable, and prepared for implementation in educational systems operating under diverse regulatory conditions. Each component of PEARL incorporates priorities derived from these international guidelines, thereby creating a framework that integrates theoretical, technical, and ethical policy considerations:
Pedagogical Personalization (P) resonates strongly with the EU’s human-centricity and the OECD’s emphasis on inclusive and effective learning systems. By ensuring that AI output aligns with developmental stages and curriculum goals, this component supports trust and relevance in AI-driven instruction, particularly in regulated or standards-based education systems.
Explainability and Engagement (E) embodies a central principle found across all five frameworks: the need for transparent, accessible AI explanations. It operationalizes explainability as a means for engagement and trust-building among non-technical stakeholders, including students, teachers, and administrators, addressing calls from UNESCO, NIST, IBM, and the EU for inclusive interpretability.
Attribution and Accountability (A) directly supports the traceability and auditability requirements outlined in the NIST AI RMF and the EU’s Trustworthy AI guidelines (
U.S. National Institute of Standards and Technology, 2023). By enabling logging, decision tracing, and rollback, this component facilitates system oversight, dispute resolution, and compliance with data protection and AI audit mandates (e.g., GDPR, FERPA).
Representation and Reflection (R) reflects commitments to equity, anti-discrimination, and bias mitigation found in the IBM, EU, and UNESCO frameworks. It ensures that AI systems proactively audit and correct disparities in learning experiences across socio-demographic groups, thereby reinforcing education’s role in advancing social justice.
Localized Learner Agency (L) uniquely incorporates the context-awareness and cultural-linguistic inclusion principles championed by UNESCO and the OECD. It affirms that responsible educational AI must adapt to the learner’s sociocultural reality, not the other way around, offering user control over explanation depth, pacing, and interaction style.
Table 2 summarizes the alignment between the PEARL framework and major international AI policy and governance frameworks, highlighting areas of convergence across ethical, regulatory, and human-centered principles.
Together, these mappings demonstrate that PEARL is more than a pedagogical or technical model; it is an ethically robust, globally aligned framework for designing and evaluating explainable AI in education. Its modularity allows for adaptation across jurisdictions while its grounding in international guidelines supports use in publicly funded, policy-sensitive, or globally scaled educational AI initiatives. As AI regulation tightens worldwide, PEARL provides a future-facing roadmap for building learning technologies that are not only innovative, but also trustworthy, inclusive, and institutionally responsible.
4. Simulation Case Studies Using LLM-Based Virtual Teaching Assistant
To evaluate the applicability and diagnostic value of the PEARL framework, we present a set of structured simulation case studies involving a large language model (LLM) based Virtual Teaching Assistant (
Mokhtari et al., 2025) as illustrated in
Figure 2. These simulations do not involve human subjects or real user data; instead, they are constructed as scenario-driven analyses designed to reflect plausible, pedagogically relevant interactions that highlight key strengths and vulnerabilities of AI in educational settings. The goal of these simulations is twofold:
To show how PEARL functions as an evaluative scaffold for surfacing ethical, pedagogical, and human-centered issues, especially in situations where AI systems may produce misleading, biased, or opaque responses.
To show how the PEARL framework works in practice through a series of simulated examples using an AI-powered Virtual Teaching Assistant (VTA,
Virtual Teaching Assistant Project Website, n.d. Available online:
https://www.sagnikdakshit.com/vta, accessed on 8 November 2025) based on a large language model. These examples are carefully designed scenarios that reflect realistic situations teachers, students, and school leaders might encounter when using AI tools.
Simulation-based studies are commonly used in early-stage educational AI research because they allow researchers to examine potential benefits and risks before deploying systems in real classrooms (
Rosé et al., 2019;
Clancey & Hoffman, 2021). Together, these scenarios illustrate how PEARL can guide the design of AI-based learning tools, surface problems such as unclear explanations or unfair outcomes before they cause harm, and embed explainability, fairness, and learner control into everyday educational AI interactions.
4.1. Practical Examples for Different Educational Roles
Educational AI systems serve many people, not just students. Teachers and administrators rely on AI-generated information to make decisions. The following examples show how PEARL supports explainability for three key groups: students, teachers, and administrators.
4.1.1. Student Example: Making Feedback More Helpful
The Problem: After completing an algebra quiz, a student receives the vague prompt: “You should revisit your understanding of algebraic transformations.” While technically accurate, this feedback lacks the diagnostic clarity needed for improvement. As noted in educational research, such ambiguity can lead to learner frustration and a decline in motivation (
Hattie & Timperley, 2007).
PEARL Implementation:
Pedagogical Personalization (P): Calibrates feedback to the student’s specific curriculum and current mastery level.
Explainability & Engagement (E): Translates technical errors into supportive, intelligible explanations that encourage further study.
Localized Learner Agency (L): Presents information in a relatable format, offering the student choice in their next steps rather than simply assigning a grade.
The Outcome: The student receives actionable guidance: “You missed 3 out of 5 questions on factoring quadratic expressions. Would you like to see a worked example similar to Question 3?” This transformation allows the AI to function as a supportive tutor, promoting learner agency and personalized growth.
4.1.2. Teacher Example: Understanding “At-Risk” Alerts
The Problem: A teacher receives an automated notification labeling a student as “at risk” without any accompanying justification. This lack of transparency undermines professional trust and prevents effective intervention, reflecting a broader critique that predictive accuracy alone is insufficient for educational support (
Maity & Deroy, 2024;
Rosé et al., 2019).
PEARL Implementation:
Attribution & Accountability (A): The system provides a clear diagnostic trail, revealing the specific data points that triggered the alert.
Explainability & Engagement (E): Technical data is translated into an accessible format that non-expert users can easily interpret and act upon.
Representation & Reflection (R): The framework allows the teacher to verify that the alert is based on individual performance rather than systemic bias across student subgroups.
The Outcome: Instead of a generic score, the teacher receives a specific summary: “This student missed three assignments in the past two weeks and scored below average on recent geometry quizzes, increasing their risk score from 0.32 to 0.74.” Armed with these insights, the teacher can design a targeted intervention rather than blindly following an algorithmic prompt.
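As a minimal illustration of how such an alert can carry its own justification, the sketch below assembles a teacher-facing summary from weighted evidence signals; the signal descriptions and weights are hypothetical, not outputs of the actual VTA.

```python
def explain_risk_alert(previous: float, current: float, evidence: list[dict]) -> str:
    """Turn a bare risk-score change into a teacher-facing justification
    listing the concrete inputs that moved the score."""
    lines = [f"Risk score changed from {previous:.2f} to {current:.2f}. "
             "Contributing factors:"]
    for e in sorted(evidence, key=lambda x: x["weight"], reverse=True):
        lines.append(f"  - {e['signal']} (contribution: +{e['weight']:.2f})")
    return "\n".join(lines)

evidence = [
    {"signal": "3 assignments missed in the past two weeks", "weight": 0.25},
    {"signal": "below-average scores on recent geometry quizzes", "weight": 0.17},
]
print(explain_risk_alert(0.32, 0.74, evidence))
```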
4.1.3. Administrator Example: Auditing for Fairness
The Problem: A school administrator observes that English Language Learners (ELLs) are consistently assigned lower-complexity tasks by the AI system, raising concerns regarding educational equity.
PEARL Implementation:
Representation & Reflection (R): Facilitates demographic analysis to detect and visualize patterns of disparate treatment across student groups.
Attribution & Accountability (A): Enables administrators to trace automated recommendations back to specific algorithmic rules, identifying the root cause of the bias.
The Outcome: Rather than relying on technical accuracy alone, the administrator receives a report that flags the discrepancy for English Learners and points to the specific logic causing the bias. This transforms the AI from a “black box” into an auditable tool, allowing the school to mitigate algorithmic bias before it results in systemic inequity. By providing these diagnostic capabilities, PEARL directly fulfills global policy mandates for fairness audits and ensures that AI supports equity and inclusion (
UNESCO, 2023;
OECD, 2019).
4.2. Risk Scenarios
Despite their increasing potential, AI-powered educational systems are frequently associated with complex failure modes. These issues often arise not solely from performance deficiencies but also from a lack of transparency, pedagogical alignment, and stakeholder control. In this context, explainability is not merely an advantageous feature; it serves as a crucial protective measure that underpins trust, accountability, and inclusivity. This section presents a series of simulated risk scenarios, derived from academic literature, pilot deployments, and practitioner feedback, which elucidate the intricate challenges that emerge when educational AI systems function without adequate human-centered safeguards.
Each scenario, summarized in
Table 3, illustrates how the PEARL framework can diagnose and mitigate these risks. By operationalizing the five interrelated dimensions, PEARL provides developers and institutions with actionable strategies to ensure that learning technologies are not only effective but also comprehensible, equitable, and contextually relevant. These scenarios underscore the necessity for explainability to be dynamically aligned with educational objectives, cultural contexts, and stakeholder requirements to prevent harm and promote ethical system behavior.
R1: Misinterpreted Feedback occurs when learners receive generic or overly technical explanations that do little to guide improvement or build understanding. PEARL’s Explainability & Engagement (E) ensures that system outputs are readable, motivating, and matched to cognitive level, while Pedagogical Personalization (P) aligns them with curricular intent.
R2: Biased Recommendations arise when personalization systems serve easier or repetitive content disproportionately to certain demographic groups, unintentionally reinforcing achievement gaps. Representation & Reflection (R) surfaces these disparities through disaggregated audits, and Attribution & Accountability (A) provides traceability to underlying model behavior.
R3: Blind Risk Prediction reflects situations where educators receive alerts (e.g., “at-risk” flags) without interpretability. This opacity reduces trust and limits intervention. PEARL’s Explainability & Engagement (E) and Attribution & Accountability (A) components provide justification pathways that empower human decision-makers to act meaningfully on system outputs.
R4: Cultural Mismatch in Content affects multilingual or multicultural classrooms, where examples or analogies drawn from unfamiliar contexts can alienate or confuse learners. Localized Learner Agency (L) ensures that content is adapted to socio-linguistic norms, supporting inclusion and engagement.
R5: Over-personalization Leading to Isolation refers to recommendation loops that narrow learners’ exposure to diverse topics, ideas, or challenges. PEARL counters this through Pedagogical Personalization (P), which balances adaptivity with learning goals, and Representation (R), which ensures that personalization does not erode equity of opportunity.
R6: Conflicting Explanations Across Updates emerge when AI model changes produce inconsistent explanations for similar outputs, confusing both learners and educators. Attribution & Accountability (A) enables consistent versioning and change logs to maintain explanation integrity over time.
R7: Feedback Overload is an emerging risk in generative systems where students receive unnecessarily long or dense explanations for simple errors. Explainability & Engagement (E) adjusts the scope and granularity of explanations based on learner profile and task complexity.
5. Evaluation of AI Education Systems Using PEARL Composite Score
To translate the human-centered principles of the PEARL framework into a practical, reproducible evaluation methodology, we introduce the PEARL Composite Score, a multi-dimensional scoring rubric designed to assess the pedagogical soundness, ethical robustness, and explainability maturity of AI-powered educational systems. This score is derived from the five core components of PEARL: Pedagogical Personalization (P), Explainability & Engagement (E), Attribution & Accountability (A), Representation & Reflection (R), and Localized Learner Agency (L). Each dimension is evaluated using context-specific criteria and scored on a normalized scale from 0 to 20, for a maximum possible score of 100 points. Unlike standard technical benchmarks, the PEARL Composite Score is explicitly designed to balance educational utility with ethical and stakeholder-centered values. It enables developers, evaluators, and institutional decision-makers to audit AI systems not just for performance, but for their alignment with human learning needs, transparency expectations, and contextual appropriateness. By doing so, the score functions as both a diagnostic lens and a developmental roadmap.
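To make the aggregation concrete, the sketch below assumes one plausible mapping from the Likert-style indicator ratings (1–5; Section 3.1) to the 0–20 dimension scale, namely a linear rescaling of the mean rating. The rubric does not prescribe a specific mapping, so this is an illustrative choice rather than a normative formula, and the sample ratings are invented.

```python
def dimension_score(likert_ratings: list[int]) -> float:
    """Map a dimension's Likert indicator ratings (1-5) onto the 0-20 scale
    via linear rescaling of the mean (1 -> 0, 5 -> 20); assumed mapping."""
    mean = sum(likert_ratings) / len(likert_ratings)
    return (mean - 1) / 4 * 20

def pearl_composite(ratings_by_dimension: dict[str, list[int]]) -> dict:
    """Sum the five normalized dimension scores into the 0-100 composite."""
    scores = {d: round(dimension_score(r), 1)
              for d, r in ratings_by_dimension.items()}
    return {"dimensions": scores, "composite": round(sum(scores.values()), 1)}

# Illustrative ratings for the five PEARL dimensions:
ratings = {"P": [4, 5, 4], "E": [5, 4, 5], "A": [3, 4, 3],
           "R": [4, 4, 3], "L": [3, 3, 4]}
print(pearl_composite(ratings))
```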
Table 4 presents a simulated evaluation of our prototype large language model-based VTA using this composite scoring method. These results are based on simulated use cases and should be interpreted as illustrative rather than definitive classroom findings. The score is derived from structured simulations across practical use cases and risk scenarios (
Section 4), rather than real-world classroom deployment. While illustrative in nature, this simulation demonstrates how the PEARL Composite Score can be applied in early-stage prototyping or as part of model selection and refinement workflows.
The results reveal several insights about the system’s current strengths and areas for growth. The highest-performing dimension, Explainability & Engagement (E), reflects the system’s effectiveness in delivering stakeholder-specific justifications that are clear, cognitively appropriate, and positively received. This indicates early success in fostering transparency and engagement across diverse user roles. Pedagogical Personalization (P) follows closely, demonstrating that the system is reasonably well-aligned with curricular goals and developmental learning stages. Representation & Reflection (R) yields a solid mid-range score, suggesting that fairness is being addressed through audit tools, although deeper demographic equity may require continued refinement. Attribution & Accountability (A) receives a moderate score, indicating that basic traceability is in place but that improvements, particularly in longitudinal explanation consistency and rollback transparency, are needed. The lowest score, Localized Learner Agency (L), highlights persistent challenges in adapting content to local sociocultural and linguistic contexts. While some efforts are evident, broader adaptation mechanisms and user-level control over interactions remain underdeveloped. A composite score of 77 out of 100 reflects a system that demonstrates ethical and pedagogical maturity across several dimensions, while still requiring iterative enhancement in accountability, fairness, and contextualization. Rather than serving as a one-time assessment, the PEARL Composite Score is designed for continuous use across development cycles, enabling teams to track improvement, compare systems, and align AI behavior with evolving institutional and regulatory expectations. Over time, longitudinal tracking of this score can serve as a quality assurance signal, helping ensure that educational AI systems remain human-centered, pedagogically effective, and socially responsible. Future evaluations will also consider practical aspects such as computational efficiency, scalability, and resource requirements to ensure feasibility for real-world classroom deployment.
6. Mixed-Methods Exploratory Study
To evaluate the PEARL framework, we conducted a small exploratory mixed-methods user study (N = 17) in which participants reviewed example interactions with an AI-based tutor. The study focused on perceived clarity, engagement, fairness, and learner control; no live classroom deployment or intervention was conducted. The survey combined Likert-scale questions, categorical judgments, and short-response items to capture both breadth and depth of perceptions; the aggregated results are presented in
Figure 3 and
Figure 4. Given the small sample size (N = 17), these results should be considered early-stage findings and not generalizable to classroom deployment. The results should be interpreted with caution, discussed in relation to previous studies and the working hypotheses, and their interpretations situated within the broadest possible contexts, including future research directions. The instruments used for this study, representative examples of the actual survey questions used in the qualitative and quantitative analysis are provided in
Appendix A, along with the inductively derived PEARL evaluation rubric in
Appendix B.
Quantitative results across the five PEARL dimensions, in which participants reported generally positive perceptions of the tutor’s explanations and feedback, are illustrated in
Figure 5. Mean ratings for helpfulness in identifying mistakes (M ≈ 4.0) and alignment with learning level (M ≈ 4.1) indicate that explanations were considered both useful and appropriately tailored. Similarly, clarity of feedback and alert usefulness scored above 4.0, while control over learning path averaged slightly lower (M ≈ 3.5), suggesting opportunities to strengthen learner agency. Categorical items further supported these findings, with a strong majority (>80%) confirming that the tutor kept learners engaged, provided clear improvement feedback, and helped them gain deeper insight into strengths and weaknesses. However, fewer respondents felt the tutor consistently offered multiple perspectives on the content. Qualitative insights from open-ended responses provided further nuance. Several participants emphasized the need for simpler language in explanations, highlighting accessibility concerns. Others stressed the importance of cultural and linguistic sensitivity, linking directly to the “personalization” and “equity” aspects of PEARL. Preferences for flexible explanation styles, ranging from concise summaries to more detailed reasoning, also emerged, underscoring the value of learner agency and choice.
Taken together, the mixed-methods evidence suggests that the PEARL framework supports more personalized, engaging, and reflective AI tutor feedback. While quantitative results confirm strong performance on clarity and engagement, qualitative insights highlight areas for refinement, particularly in simplifying language and expanding cultural adaptability. These results provide early empirical support for PEARL’s human-centered explainability approach, while also identifying actionable directions for improvement.
7. Comparative Positioning of PEARL in Existing XAI Frameworks in Education
While explainability in AI has been a growing research priority, few frameworks have been specifically tailored for the educational domain, and even fewer adopt a human-centered, pedagogically grounded approach. In this section, we briefly compare PEARL with existing frameworks to clarify its unique contributions.
Table 5 compares PEARL with representative explainable AI frameworks in education, highlighting differences in focus areas, strengths, and the specific limitations addressed by the proposed framework.
The PEARL framework builds upon foundational principles articulated in
Clancey and Hoffman’s (
2021) work on explainable AI in intelligent tutoring systems (ITS). Their paper emphasizes that effective explanations in educational AI systems must be evaluated in terms of user-centered utility, contextual relevance, and interactional design rather than algorithmic transparency alone. In particular, the authors argue that explanation research must shift from “what the model did” to “what the learner needs to understand”, aligning closely with PEARL’s emphasis on stakeholder-centered and pedagogically grounded feedback. Clancey and Hoffman advocate for explanations that are dialogic, tailored to learning goals, and embedded within broader instructional interactions, core principles operationalized in PEARL through components like Pedagogical Personalization, Explainability & Engagement, and Localized Learner Agency. Furthermore, their call for standardized evaluation methods and scenario-based studies parallels PEARL’s use of the PEARL Composite Score and simulation-based diagnostics to assess educational and ethical dimensions of AI systems. Unlike Clancey and Hoffman’s primarily conceptual and reflective approach, PEARL extends this foundation by providing a structured, multi-dimensional framework with actionable design components and quantifiable evaluation mechanisms. In this sense, PEARL complements and advances the ITS-driven vision of explainability by translating it into a modular scaffold applicable to contemporary LLM-based smart tutors.
Rosé et al. (
2019) argue persuasively that predictive accuracy alone is insufficient for educational machine learning (ML) systems. They propose the development of explanatory learner models, which, unlike opaque black-box algorithms, provide interpretable, actionable insights derived from interdisciplinary learning-engineering practices, enabling teachers and learners to understand not only what the system predicts but why. This ethos of educational impact is shared by PEARL. PEARL builds on these principles by integrating explanatory learner model design as one of its core pillars, while extending the scope to include five interlocking dimensions: Pedagogical Personalization (P), Explainability & Engagement (E), Attribution & Accountability (A), Representation & Reflection (R), and Localized Learner Agency (L). While Rosé et al. focus primarily on how to craft learner models that produce actionable feedback, PEARL incorporates those insights into a broader holistic architecture that also addresses fairness, version traceability, stakeholder-tailored explanation, and cultural adaptation. In essence, explanatory learner models emphasize the technical and design-driven process required to make ML output meaningful and actionable in classroom settings. PEARL operationalizes that value proposition through a structured, component-based framework with evaluation scaffolds such as the PEARL Composite Score and use cases demonstrating how explanatory learner models function within an ecosystem that values educational equity, learner agency, and stakeholder comprehension.
The recent survey by
Liu et al. (
2024) reviews XAI methods applied in educational settings, emphasizing visual explanations and learning analytics dashboards designed for institutional deployment. While this survey highlights technical and design challenges, it does not propose a unified, operationalized framework for evaluating explainability, nor does it incorporate mechanisms for learner control or adaptivity. PEARL fills this gap by offering a comprehensive scoring rubric and a multi-actor design that supports learners, educators, and administrators alike, fostering more interactive and transparent educational AI systems. PEARL distinguishes itself from existing educational XAI frameworks in four key ways:
Stakeholder-Inclusive Scope: Unlike models that focus narrowly on teacher trust or learner modeling transparency, PEARL’s dimensions are designed to support a broad range of stakeholders including students, educators, and administrators.
Dual Role as Design and Audit Framework: Whereas many existing frameworks rely solely on heuristics, PEARL operationalizes explainability, fairness, and cultural adaptation through a Composite Score, facilitating both design guidance and systematic evaluation.
Learner-Centric Adaptivity: PEARL introduces components such as Localized Learner Agency, empowering users with real-time control over explanation formats and content, a feature absent in prior models.
Ethical Fairness Integration: Fairness and demographic bias checks are integral configuration elements within PEARL, rather than optional add-ons, ensuring ethical considerations are embedded throughout the framework.
By synthesizing pedagogical theory, transparency principles, ethical safeguards, and user agency, PEARL advances beyond current educational XAI models. It offers a truly human-centered, interdisciplinary framework designed for impactful real-world educational applications.
8. Limitations and Future Validation
While the PEARL Framework is theoretically robust and supported by initial simulations, several hurdles remain before it can be considered a turnkey solution for schools. Acknowledging these limitations is essential for ensuring that the transition from a “black-box” model to an “explainable” one is both safe and effective.
First, the primary limitation of the current study is that PEARL has not yet been stress-tested in live, authentic classrooms. These findings should be considered exploratory. While a mixed-methods user study (N = 17) was conducted using example AI tutor interactions, the system was not deployed in live classroom settings, and no longitudinal learning outcomes were measured. While simulations provide a controlled environment to verify algorithmic fairness (Representation & Reflection), they cannot account for the “messiness” of a real school day, such as fluctuating student engagement, teacher “alert fatigue,” or the technical constraints of school hardware. A future goal is to establish longitudinal classroom deployments to observe how the framework performs over an entire academic year rather than a single session.
Second, the current validation of the PEARL framework has relied on curated datasets. To truly fulfill the promise of Equity and Inclusion, the framework must be tested against larger and more geographically diverse participant samples. A future goal is to partner with diverse school districts (urban, rural, and international) to ensure that the Pedagogical Personalization (P) pillar adapts effectively to different cultural and linguistic contexts.
Third, it is important to note that while technical accuracy is a mathematical metric, Trust and Agency are human psychological states. We have yet to establish standardized ways to measure whether an AI explanation results in a “better” pedagogical decision by a teacher or increased motivation in a student. A future goal is to utilize Mixed-Methods Research, combining quantitative learning data with qualitative interviews to measure how PEARL impacts the relationship between the user and the technology.
Fourth, the current iteration of the PEARL framework is researcher-driven; however, the bridge between “technical explanation” and “actionable insight” requires the expert intuition of practitioners. In other words, a framework designed for users should be designed with users. A future goal is to initiate Co-design Workshops with teachers and students to refine the Explainability & Engagement (E) pillar, ensuring that the interface is intuitive and reduces, rather than adds to, the cognitive load of the user.
Finally, the “Composite Score” used to evaluate the balance of fairness and accuracy requires further validation across different subjects. What constitutes “fairness” in a high-stakes standardized math test may differ from a creative writing exercise. A future goal is to conduct Sensitivity Analyses to test the reliability of the PEARL scoring system across varied educational domains to ensure it remains a stable and trustworthy auditing tool.
9. Conclusions
The integration of artificial intelligence into education is no longer speculative; it is an enduring shift. To ensure that this transition benefits all stakeholders, educational AI must move beyond narrow measures of accuracy toward systems that are fair, transparent, and grounded in the science of learning. The PEARL framework offers a critical bridge between technical AI development and the practical realities of classroom use.
As demonstrated in our comparative analysis, PEARL distinguishes itself from traditional explainable AI approaches by shifting the focus from what the model did to what the learner needs. Whereas many existing systems remain opaque or provide purely technical explanations, PEARL advances a transparent, auditable, and human-centered alternative. By integrating pedagogy, explainability, and learner agency, the framework ensures that AI augments, rather than displaces, the human dimensions of education.
Evidence from simulations, diagnostic scoring tools, and early stakeholder feedback further illustrates that educational AI can be both technically sophisticated and pedagogically meaningful. This work underscores that computational efficiency alone is insufficient; educational AI systems must also be equitable, interpretable, and instructionally grounded.
Although important limitations remain, including the need for longitudinal classroom evaluation and participatory co-design with educators and learners, the PEARL framework represents a foundational roadmap for future research and development. It supports the creation of the next generation of educational AI systems: not merely intelligent technologies, but trustworthy partners in advancing student learning and success.