Article

Provable AI Ethics and Explainability in Medical and Educational AI Agents: Trustworthy Ethical Firewall

1. Department of Orthodontics, Regenerative and Forensic Dentistry, Faculty of Medicine, Comenius University in Bratislava, 814 99 Bratislava, Slovakia
2. Department of Medical Education and Simulations, Faculty of Medicine, Comenius University in Bratislava, 814 99 Bratislava, Slovakia
Electronics 2025, 14(7), 1294; https://doi.org/10.3390/electronics14071294
Submission received: 26 February 2025 / Revised: 23 March 2025 / Accepted: 24 March 2025 / Published: 25 March 2025
(This article belongs to the Special Issue Artificial Intelligence and Applications—Responsible AI)

Abstract

Rapid advances in artificial intelligence are transforming high-stakes fields like medicine and education while raising pressing ethical challenges. This paper introduces the Ethical Firewall Architecture—a comprehensive framework that embeds mathematically provable ethical constraints directly into AI decision-making systems. By integrating formal verification techniques, blockchain-inspired cryptographic immutability, and emotion-like escalation protocols that trigger human oversight when needed, the architecture ensures that every decision is rigorously certified to align with core human values before implementation. The framework also addresses emerging issues, such as biased value systems in large language models and the risks associated with accelerated AI learning. In addition, it highlights the potential societal impacts—including workforce displacement—and advocates for new oversight roles like the Ethical AI Officer. The findings suggest that combining rigorous mathematical safeguards with structured human intervention can deliver AI systems that perform efficiently while upholding transparency, accountability, and trust in critical applications.

Graphical Abstract

1. Introduction: The Imperative for Provable Ethics in High-Stakes AI

1.1. Trust as a Cornerstone for AI Agents in Medicine and Education

Ensuring an ethical value system and ethical behavior in artificial intelligence (AI) systems, amid their growing presence in high-stakes medical systems and their increasing influence on human decision-making, is a major concern worldwide [1]. The current race to artificial general intelligence (AGI) is reckless and ruthless, steered by the decisions of for-profit organizations, at a moment when humanity faces a historical paradigm shift driven by AI implementations acting as horizontal enabling layers. It was not a great surprise to learn that large language models (LLMs) are forming their own values, although these are probably not the values humanity wants. A recent paper on AI’s emergent value systems shows that LLMs develop their own internal “values” as they scale. These values influence decisions in surprising ways and raise concerns about how these models prioritize outcomes; in these auto-formed value systems, some lives matter more than others [2]. This confirms that ethics-based auditing of AI agents [3,4,5], and the ability to instill morals and ethics into the decision cores of future AGI and, later, artificial superintelligence (ASI), is essential; an inability to do so represents an existential threat to humanity, as Professor Stephen Hawking made clear in multiple public statements [6].
As artificial intelligence assumes greater responsibility in life-critical domains, the need for intrinsically ethical systems has transitioned from philosophical debate to technical imperative. This article presents a framework for engineering AI agents that embed mathematically verifiable ethical constraints at their computational core—a paradigm shift from post hoc explainability to architecturally enforced morality. This architectural enforcement of morality is achieved through three complementary mechanisms.
First, core ethical principles are translated into formal specifications using deontic and temporal logic, enabling the mathematical verification of ethical compliance for each AI decision.
Second, these ethical constraints are embedded within a cryptographically immutable core inspired by blockchain technology, creating an unalterable audit trail that ensures ethical decisions cannot be tampered with.
Third, the system incorporates emotion-analogous escalation protocols that continuously assess risk through Bayesian methods, triggering either autonomous corrective actions or human intervention when ethical thresholds are approached.
Together, these mechanisms move beyond retrospective justifications to proactive, verifiable ethical governance at the architectural level.
To clarify the key concepts introduced above, mathematically provable ethical constraints refer to formal ethical requirements encoded using logical frameworks such as deontic logic (e.g., “It is obligatory that AI actions do not cause harm unless they serve a higher good, such as endodontic pulpal devitalization and removal to prevent endocarditis arising from a focal infection”) or temporal logic (e.g., “At no future point will this decision lead to patient data exposure”), which can be verified through mathematical proof systems. Emotion-analogous escalation protocols, further detailed in Section 2.3, are inspired by human emotional responses to risk and involve continuous Bayesian assessment of decision risks, with automatic triggers for either corrective actions or human intervention when ethical boundaries are approached, similar to how human fear responses preemptively protect us from potential dangers.
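To illustrate how such a deontic constraint could be made machine-checkable in practice, the following minimal Python sketch encodes the “no harm unless higher good” obligation as a rule object and evaluates a candidate action against it. The class names, the numeric harm score, and the 0.05 tolerance are illustrative assumptions, not components of the proposed framework.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    """A candidate AI action with an estimated harm score and a stated justification."""
    name: str
    expected_harm: float          # 0.0 = no expected harm, 1.0 = maximal harm
    serves_higher_good: bool      # e.g., devitalization to prevent endocarditis

@dataclass
class DeonticRule:
    """Obligation of the form: the action is forbidden unless the predicate holds."""
    description: str
    permits: Callable[[Action], bool]

# "It is obligatory that AI actions do not cause harm unless they serve a higher good."
NO_HARM_UNLESS_HIGHER_GOOD = DeonticRule(
    description="O(no harm) unless higher good",
    permits=lambda a: a.expected_harm < 0.05 or a.serves_higher_good,
)

def verify(action: Action, rules: List[DeonticRule]) -> List[str]:
    """Return the violated rules; an empty list serves as a minimal compliance certificate."""
    return [r.description for r in rules if not r.permits(action)]

if __name__ == "__main__":
    devitalization = Action("pulpal devitalization", expected_harm=0.3, serves_higher_good=True)
    print(verify(devitalization, [NO_HARM_UNLESS_HIGHER_GOOD]))  # [] -> permitted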
With the need for mathematically rigorous techniques to balance data utility and ethical safeguards, DPShield is a concrete example of how formal methods (in this case, differential privacy) are being used to protect sensitive information while preserving performance [7].
Through formal verification, cryptographic auditing, and emotion-inspired escalation protocols, this paper defines a new class of high-stakes AI systems where ethical compliance is as irrefutable as arithmetic.
The transformative potential of AI in medicine and education is undeniable. Yet as these systems assume critical roles in clinical decision-making and personalized learning, the risks associated with opaque, black box algorithms become ever more acute [8,9]. Traditional AI architectures have excelled in pattern recognition and large-scale data processing, but they have consistently lacked an inherent “ethical instinct”—a built-in, emotion-like mechanism that prioritizes the prevention of harm. As we edge closer to general AI capabilities, it is no longer sufficient for AI systems simply to be accurate; they must be transparent, auditable, and fundamentally aligned with human values [2,8,9].
In clinical practice, longstanding ethical guidelines—such as Good Clinical Practice (GCP), the Helsinki Declaration, and the principle of informed consent—have reliably underpinned ethical behavior in medicine. Recent studies have demonstrated the impact of structured ethical frameworks on enhancing the safety and transparency of healthcare AI systems [10]. The Ethical Firewall Architecture proposed here is not meant to supplant these bedrock principles, but to build upon them. By incorporating ethical restrictions directly into the algorithms of AI systems, this framework introduces an extra, mathematically verifiable safeguard that bolsters the ethical standards already established.
Enhancing healthcare outcomes remains a complex challenge that demands a coordinated approach, merging patient-centric care, innovative therapies, cutting-edge technology, streamlined research methodologies, continuous professional education, and rigorous clinical validation. Recent studies have highlighted the critical role each of these elements plays in tackling the multifaceted nature of modern healthcare. Notably, evidence suggests that personalized care strategies can lead to significantly better patient outcomes, reinforcing the importance of tailoring interventions to individual needs [11]. Complementing this, innovative treatments targeting specific conditions have proven effective in enhancing patient recovery and quality of life [12]. The role of technology is equally pivotal, with advanced information systems facilitating better data management, care coordination, and decision-making in clinical settings [13]. Moreover, optimizing research processes accelerates the translation of scientific discoveries into practical applications, thereby streamlining the development of new medical solutions [14]. Education is critical in this ecosystem, as training healthcare professionals to adeptly utilize these innovations ensures their effective implementation [15]. Finally, clinical evidence remains the cornerstone of validating these approaches, with studies providing empirical support for the integration of interdisciplinary strategies in healthcare [16]. Together, these insights highlight the necessity of a comprehensive, evidence-based framework to tackle the evolving challenges in healthcare delivery, setting the stage for the current paper’s exploration of provable AI ethics and explainability in medical and educational AI agents.

1.2. Goals of This Paper

In this context, the concept of a mathematically provable ethical layer emerges as an indispensable solution. Drawing inspiration from cryptographic models such as Bitcoin’s immutable ledger, this article argues that embedding formal ethical constraints directly into the AI decision core is the only viable pathway to ensure that AI systems act safely and transparently in high-stakes environments [17,18]. In this paper, a novel framework is proposed, the Ethical Firewall Architecture, which is designed to guarantee that every decision made by an AI is accompanied by an irrefutable, verifiable proof of ethical compliance, while acknowledging that such guarantees are contingent on the framework’s ability to address real-world complexities, including scalability, emergent value conflicts, and the inherent limitations of formal verification techniques.
Arguing that transparency and explainability are essential to mitigate risks from opaque “black box” systems, the Ethical Firewall Architecture integrates formal ethical proofs, cryptographic immutability, and risk-escalation protocols to ensure AI decisions are transparent and aligned with human values. While this immutability ensures that ethical proofs are tamper-proof and auditable, it does not solve the inherent complexities of ethical decision-making. The architecture complements it with a formal ethical specification module for generating proofs and the emotion-analogous escalation protocol for handling ambiguous cases.
The aim of this paper is to propose a novel conceptual framework—termed the Ethical Firewall Architecture—that embeds mathematically provable ethical constraints directly into the core decision-making processes of high-stakes AI systems in medicine and education. By integrating formal verification methods, cryptographic immutability, and emotion-analogous escalation protocols, the framework is designed to ensure that AI systems operate in a transparent, auditable, and inherently safe manner. Ultimately, this paper seeks to bridge the gap between computational efficiency and ethical imperatives, fostering interdisciplinary research and informed policy initiatives that align next-generation AI capabilities with the fundamental values of human welfare and accountability.
This paper makes several key contributions to the field of ethical AI: First, it introduces a novel architectural framework—the Ethical Firewall—that integrates formal verification methods with cryptographic techniques to ensure provable ethical compliance in AI systems. Second, it proposes a new approach to risk management in AI through emotion-analogous escalation protocols that balance autonomy with appropriate human oversight. Third, it demonstrates how causal reasoning can be incorporated into ethical decision frameworks to address emergent value conflicts in scaled AI systems. Finally, it explores the implications of this framework for the evolving roles of healthcare and education professionals, proposing new oversight mechanisms, such as an Ethical AI Officer position, which bridges technical implementation with domain expertise.

2. Human Ethical Officer and Ethical Firewall

2.1. Formal Ethical Specification and Verification: Ethical Firewall Architecture

At the heart of the Ethical Firewall Architecture is the translation of core ethical principles—such as “do no harm”—into a formal, machine-readable language. Using frameworks like deontic or temporal logic [1], ethical imperatives are codified as mathematical axioms. For instance, an AI system may be required to prove that no action it takes leads to harm at any future point, a requirement naturally expressed in temporal logic (see the sketch below). Such a requirement compels the system to generate a verifiable proof or “certificate” for each decision, much like a cryptographic hash ensures the integrity of a blockchain transaction. By anchoring ethical compliance in formal verification, the system’s decision-making process becomes transparent and unalterable by external interference [2,3,4].
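As a sketch of the kind of requirement involved, the “no harm at any future point” obligation could be written in combined deontic and temporal notation as follows; the symbols and their reading are illustrative rather than the exact formalization of the cited frameworks:

\[
  \forall a \in \mathcal{A}:\quad \mathbf{O}\bigl(\mathrm{execute}(a) \rightarrow \mathbf{G}\,\neg\,\mathrm{harm}(a)\bigr)
\]

Here, $\mathbf{O}(\varphi)$ reads “it is obligatory that $\varphi$”, $\mathbf{G}$ reads “at all future time points”, and $\mathcal{A}$ is the set of actions available to the agent. A decision is certified only when a prover discharges this obligation, yielding a proof object that can subsequently be hashed into the audit record described below.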
In practical terms, while deontic logic provides a systematic way to encode ethical rules, it does not always capture the full complexity of real-life dilemmas. Consider the trolley problem—every choice involves some harm, and no strict rule like “do no harm” can resolve the situation without taking into account context, cultural nuances, and competing values. My approach acknowledges these limitations by combining formal verification with human judgment. When situations exceed what formal systems can express, my framework activates the emotion-analogous escalation protocol, allowing human insight to guide decision-making. I believe that future research should focus on enhancing these systems—perhaps through the development of refined preference logics or the integration of meta-ethical reasoning—to better balance competing ethical values [19].
While formal verification is a powerful technique, its reliance on deterministic systems poses challenges in real-world scenarios marked by uncertainty, contextual variability, and unpredictable human–system interactions. To address these challenges, our approach integrates probabilistic reasoning—such as probabilistic deontic logic—to accommodate uncertainty, along with an emotion-analogous escalation protocol that provides a safety net by allowing for human intervention when the AI encounters situations beyond its predefined ethical framework. Recognizing that complete formal verification of complex AI systems remains an ambitious target, our framework adopts a layered strategy that combines bounded verification within well-defined contexts, compositional verification of critical components, and continuous runtime monitoring. This comprehensive approach verifies core ethical properties while acknowledging that full-scale formal verification remains an ongoing research challenge [20,21].

2.2. Cryptographically Immutable Ethical Core

Drawing on the trustless security model of blockchain, the ethical constraints are not merely advisory, but are embedded in a cryptographically immutable core [17,18]. Each decision, along with its corresponding ethical proof, is stored on a distributed ledger. This audit trail ensures that ethical compliance is independent of human trust and oversight; it is verifiable in the same way that one can confirm the validity of a Bitcoin transaction. In high-stakes settings—whether a surgical robot in an operating room or an AI tutor in a classroom—the system’s ethical integrity can be independently confirmed, safeguarding against both accidental missteps and deliberate manipulation [22,23]. Like the DPShield framework, the described approach leverages cryptographic techniques to create an immutable record of ethical proofs [7].
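To make the audit mechanism concrete, the following minimal Python sketch shows one way a blockchain-inspired, hash-chained log could bind each decision to its ethical proof so that any retroactive edit becomes detectable. The SHA-256 chaining is standard practice, but the class names and fields are illustrative assumptions rather than the distributed-ledger implementation referenced above.

import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class LedgerEntry:
    decision: str            # e.g., "administer 5 mg dose"
    proof_id: str            # identifier of the formal compliance proof
    prev_hash: str           # digest of the previous entry (the chain link)
    timestamp: float = field(default_factory=time.time)

    def digest(self) -> str:
        payload = json.dumps(self.__dict__, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

class EthicalAuditLog:
    """Append-only, hash-chained record of decisions and their ethical proofs."""

    def __init__(self) -> None:
        self.entries: List[LedgerEntry] = []

    def append(self, decision: str, proof_id: str) -> LedgerEntry:
        prev = self.entries[-1].digest() if self.entries else "GENESIS"
        entry = LedgerEntry(decision, proof_id, prev)
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Any retroactive modification breaks a hash link and is detected here."""
        return all(
            self.entries[i].prev_hash == self.entries[i - 1].digest()
            for i in range(1, len(self.entries))
        )

In a full deployment, each entry would additionally be replicated across independent nodes, so tampering would require subverting the majority of the network rather than a single log.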
While cryptographic immutability provides crucial benefits for auditability and tamper resistance, this addresses only a subset of the broader ethical challenges. Immutability ensures that the ethical decision record cannot be altered, but it does not in itself resolve the challenges of formulating appropriate ethical constraints, handling uncertainty, or navigating complex ethical trade-offs. Rather, immutability should be viewed as one essential component within a comprehensive ethical framework, providing the foundation for trust while other components address the substantive ethical reasoning and decision-making processes. Only through this integrated approach can we move toward AI systems that are both technically secure and ethically sound.

2.3. Emotion-Analogous Escalation Protocols

For AI systems to function as truly trustworthy assistants, they must not only compute decisions, but also emulate a precautionary “instinct” akin to human fear. This involves integrating continuous Bayesian risk assessments that trigger an escalation protocol when a decision nears a potential ethical violation. For example, if an AI-driven medical device calculates a high probability of inducing harm, it can either autonomously initiate corrective measures or immediately escalate the decision to a human operator. This dual-path approach mirrors the human amygdala’s role in triggering caution, ensuring that AI actions remain aligned with the overarching “do no harm” mandate [24,25].
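A minimal sketch of such a trigger, assuming a simple Beta-Bernoulli model of the harm probability and illustrative thresholds (SciPy is used only for the posterior tail probability), might look as follows:

from dataclasses import dataclass
from scipy.stats import beta

@dataclass
class HarmRiskMonitor:
    """Beta-Bernoulli posterior over the probability that an action class causes harm."""
    alpha: float = 1.0    # prior pseudo-counts of harmful outcomes
    beta_: float = 19.0   # prior pseudo-counts of harmless outcomes (prior mean = 0.05)

    def update(self, harmed: bool) -> None:
        if harmed:
            self.alpha += 1
        else:
            self.beta_ += 1

    def prob_risk_exceeds(self, harm_threshold: float) -> float:
        """Posterior probability that the true harm rate exceeds the ethical threshold."""
        return float(beta.sf(harm_threshold, self.alpha, self.beta_))

def route_decision(monitor: HarmRiskMonitor,
                   harm_threshold: float = 0.10,
                   escalation_confidence: float = 0.20) -> str:
    """Emotion-analogous trigger: escalate when the posterior risk is no longer negligible."""
    if monitor.prob_risk_exceeds(harm_threshold) >= escalation_confidence:
        return "ESCALATE_TO_HUMAN"
    return "PROCEED_AUTONOMOUSLY"

if __name__ == "__main__":
    monitor = HarmRiskMonitor()
    for harmed in [False, False, True, False, True]:   # outcomes of similar past decisions
        monitor.update(harmed)
    print(route_decision(monitor))                      # escalates once risk evidence accumulates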

2.4. Integrating Causal Reasoning and Intent

Beyond preventing harm, advanced AI must comprehend the underlying “why” of its decisions. Incorporating causal reasoning models—such as structural causal models (SCMs)—allows the system to distinguish between mere correlations and true causal relationships. Parallel developments in cybersecurity, such as the enhanced K-Means clustering approach for phishing detection [26], underscore the critical need for robust decision-making frameworks in adversarial environments. In high-stakes domains like healthcare and education, embedding such explainable ML techniques within an ethical firewall can ensure that AI decisions remain both transparent and accountable. In educational applications, for instance, the system should not only identify that daily reading correlates with improved grades, but also verify that the intervention causally enhances learning outcomes. This layer of understanding ensures that AI systems act with genuine intention, moving beyond superficial optimization to truly support human well-being [27,28,29].
Despite these advantages, implementing SCMs in real-world settings presents significant challenges. As Felin and Holweg (2024) argue in their recent work, causal reasoning systems must not only identify correlational patterns, but also incorporate theoretical knowledge and adapt to evolving contextual factors—a challenge that grows exponentially with the complexity of the environment [30]. The scalability of causal reasoning remains a fundamental obstacle, particularly in domains like healthcare, where thousands of variables may interact in complex ways. Furthermore, ensuring that causal models themselves remain free from biases requires careful attention, as demonstrated by Sarridis et al. (2023) in their analysis of demographic biases in facial verification systems [31]. Their work highlights how seemingly neutral algorithms can embed and amplify societal biases when causal structures are incompletely understood. To address these challenges, practical implementations of the Ethical Firewall Architecture will need to adopt iterative refinement of causal models, continuous validation against diverse datasets, and appropriate scoping of causal reasoning to domains where the underlying relationships are sufficiently understood.
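The distinction between correlation and intervention can be illustrated with a toy structural causal model for the reading-and-grades example mentioned above; the confounder (“motivation”), the coefficients, and the simulation are purely illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(force_reading=None):
    """Toy SCM: motivation -> daily_reading, motivation -> grades, daily_reading -> grades."""
    motivation = rng.normal(0, 1, n)
    if force_reading is None:
        reading = (motivation + rng.normal(0, 1, n) > 0).astype(float)  # observational regime
    else:
        reading = np.full(n, float(force_reading))                       # do(reading) intervention
    grades = 0.3 * reading + 0.8 * motivation + rng.normal(0, 1, n)
    return reading, grades

# Observational contrast: confounded by motivation, it overstates the benefit of reading.
reading_obs, grades_obs = simulate()
observational_gap = grades_obs[reading_obs == 1].mean() - grades_obs[reading_obs == 0].mean()

# Interventional contrast: the causal effect an educational intervention actually delivers.
_, grades_do1 = simulate(force_reading=1)
_, grades_do0 = simulate(force_reading=0)
causal_effect = grades_do1.mean() - grades_do0.mean()

print(f"observed gap:  {observational_gap:.2f}")   # roughly 1.2, inflated by confounding
print(f"causal effect: {causal_effect:.2f}")       # roughly 0.3, the true intervention effect

A system that acted on the observational gap would over-recommend the intervention; one that estimates the interventional effect recommends it in proportion to the benefit it can actually cause.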

2.5. Addressing Scaling Limitations and Emergent Value Conflicts

It is important to note that merely scaling up existing models does not resolve these ethical shortcomings. Research has shown that as language models grow in size, they may begin to autonomously define their own values—often in ways that conflict with human ethics [27,32]. Large models can inadvertently develop biases, sometimes valuing certain human lives over others based on nationality or religion. Therefore, embedding a provable ethical layer must be carried out from the ground up. This layer ensures that even as models scale, their core ethical values remain immutable and verifiable, preventing emergent misalignments that could undermine trust [33,34]. Maintaining fairness in the face of increasing model complexity remains a significant challenge. Recent work on the performance–fairness tradeoff [35] illustrates that without explicit, provable ethical constraints, AI systems risk embedding biases that could undermine trust. The proposed Ethical Firewall Architecture addresses these concerns by ensuring that every decision is not only auditable, but also balanced with respect to fairness and accountability. A simplified scheme of the proposed Ethical Firewall architecture, shown in context, is presented in Figure 1.

2.6. Application Examples in Medical and Educational Contexts

To illustrate the practical implementation of the Ethical Firewall Architecture, consider two concrete applications:
  • In a medical context, an AI system for treatment recommendation would implement the framework as follows: The formal ethical specification would encode constraints such as “Treatment recommendations must never exceed maximum safe dosage limits” and “Patient autonomy must be preserved through explicit consent options”. These constraints would be verified prior to any recommendation using formal verification techniques such as model checking. The cryptographically immutable core would maintain a tamper-proof record of each recommendation, including patient data used, constraints applied, and verification results, all stored using a distributed ledger approach. When the system encounters a case near its confidence threshold (e.g., unusual patient parameters or conflicting clinical guidelines), the emotion-analogous escalation protocol would automatically route the decision to a human clinician, providing full context and highlighting the specific areas of uncertainty. The causal reasoning component would distinguish between merely correlated factors (e.g., patient demographics and response rates) and causally relevant factors (e.g., specific biomarkers that mechanistically affect treatment efficacy).
  • In an educational context, an AI tutor would implement similar principles: Formal ethical constraints would include rules like “Learning interventions must adapt to demonstrated student capability” and “Assessment must maintain consistent standards across demographic groups”. The verification system would prove that recommended content aligns with curriculum standards while remaining adaptable to individual learning needs. The immutable core would record student interactions and system responses, protecting student privacy while enabling the auditing of algorithmic fairness. The escalation protocol would identify students struggling in unexpected ways and alert human educators for intervention. Causal reasoning would help distinguish between superficial learning (memorization of facts) and deep conceptual understanding, enabling the system to recommend truly effective learning interventions rather than those that merely correlate with short-term performance improvements.
These examples demonstrate how the architecture’s components work together to address domain-specific ethical challenges while maintaining core ethical principles across different applications.
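As a concrete illustration of the medical example above, the following Python sketch combines a formal dosage and consent check with confidence-based escalation. The drug name, dose limit, and thresholds are hypothetical placeholders for illustration only, not clinical guidance.

from dataclasses import dataclass

@dataclass
class DoseRecommendation:
    drug: str
    dose_mg: float
    patient_consented: bool
    model_confidence: float      # calibrated confidence of the recommender

MAX_SAFE_DOSE_MG = {"amoxicillin": 3000.0}   # illustrative limit only
CONFIDENCE_FLOOR = 0.90                       # below this, the case is routed to a clinician

def ethical_gate(rec: DoseRecommendation) -> str:
    """Pre-action check: hard formal constraints first, then escalation on uncertainty."""
    limit = MAX_SAFE_DOSE_MG.get(rec.drug)
    if limit is None or rec.dose_mg > limit:
        return "REJECT: exceeds or lacks a verified maximum safe dose"
    if not rec.patient_consented:
        return "REJECT: patient autonomy constraint (no explicit consent)"
    if rec.model_confidence < CONFIDENCE_FLOOR:
        return "ESCALATE: route to human clinician with full decision context"
    return "APPROVE: constraints verified; record the proof in the audit ledger"

print(ethical_gate(DoseRecommendation("amoxicillin", 1500.0, True, 0.97)))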

3. Challenges, Governance, and the Role of Human Oversight

3.1. The Perils of Deceptive and Biased Learning

One notable concern is that developers sometimes inadvertently teach AI to “lie”—either to simulate political correctness or to mitigate biases—thereby obscuring the truth behind decision processes. Such practices, while often well-intentioned, risk undermining the system’s accountability. By contrast, reasoning-based models built upon formal verification are more likely to adhere to established ethical frameworks, as they must produce transparent proofs of their decision logic [36,37].

3.2. Ethical AI Oversight: The Role of the Ethical AI Officer

Given the complexity and high stakes of these systems, there is a growing need for a dedicated role—Ethical AI Officer. This professional would be responsible for ensuring that AI systems are developed, trained, and deployed with mathematically provable ethical constraints. Their tasks would include pre-deployment audits using model-checking and zero-knowledge proofs, continuous runtime monitoring via cryptographically secured logs, and post-incident forensic analysis. This role is analogous to an aviation safety inspector, providing a critical human layer of oversight to complement the AI’s internal safeguards [38].
Figure 2 presents a diagram contrasting this traditional human oversight mechanism with an advanced, mathematically rigorous approach, emphasizing how both pathways contribute to ensuring ethical integrity. The unified outcome of this diagram represents approved decisions that lead to action (e.g., treatment recommendations or educational interventions), while decisions flagged in either branch are re-evaluated. Both approaches aim to safeguard high-stakes AI decisions, but each comes with its own set of vulnerabilities:
Human Ethical Officer Oversight:
  • Subjectivity and Bias: Human reviewers can be influenced by personal, cultural, or institutional biases, leading to inconsistent evaluations.
  • Cognitive Limitations: Humans may struggle with the rapid, high-volume decision flows typical of AI systems, potentially resulting in oversight or delayed responses.
  • Scalability Issues: As AI scales, relying solely on human intervention can create bottlenecks, making it challenging to monitor every decision in real time.
  • Fatigue and Error: Even skilled ethical officers are prone to fatigue, distraction, and human error, which can compromise decision quality under high-stress conditions.
  • Resistance to Change: Humans may be slower to adapt to new ethical challenges or emerging scenarios, limiting the flexibility of oversight in dynamic environments.
Mathematically Verifiable Ethical Core:
  • Rigidity of Formal Models: Formal ethical specifications may not capture the full nuance of real-world ethical dilemmas, leading to decisions that are technically compliant yet ethically oversimplified.
  • Incomplete Ethical Axioms: The system is only as robust as the axioms it uses; if these formal rules overlook important ethical considerations, the resulting proof might validate harmful decisions.
  • Computational Overhead: Real-time generation and verification of mathematical proofs can be resource-intensive, potentially impacting system responsiveness in critical scenarios.
  • Specification Vulnerabilities: Errors in the formal ethical model or its implementation can lead to catastrophic failures, as the system may unwittingly verify flawed decision logic.
  • Potential for Exploitation: Despite cryptographic safeguards, any vulnerabilities in the underlying algorithms or logic could be exploited, undermining the system’s trustworthiness.
  • Lack of Contextual Sensitivity: Unlike human oversight, formal methods may miss subtle contextual cues and the complexity of human ethical judgment, resulting in decisions that lack situational sensitivity.
  • Overreliance Risk: The mathematical proof of ethical compliance might engender overconfidence, reducing critical questioning even when unforeseen ethical issues arise.
Each method has its strengths and shortcomings, and a hybrid approach—leveraging both human judgment and rigorous formal verification—may provide a more balanced solution to managing ethical risks in high-stakes AI environments [39,40].

Escalation Protocols

Addressing the scalability challenge of overseeing a high volume of AI decisions requires a robust escalation protocol that combines several complementary measures. A risk assessment mechanism prioritizes decisions based on potential impact and uncertainty, ensuring that high-risk or edge-case scenarios receive immediate human attention while lower-risk cases undergo periodic audits. A tiered review system is implemented where junior staff conduct initial assessments, and only more complex or disputed cases are escalated to senior experts. In addition, collective oversight distributes review responsibilities among domain specialists using digital collaboration tools, similar to clinical supervision models in healthcare. Complementary review-augmentation tools pre-process AI decisions to flag potential ethical issues, summarize reasoning chains, and group similar cases, thereby amplifying human capacity. Finally, federated learning methods enable local human–AI teams to continuously improve the system while maintaining a manageable review workload. Altogether, these measures ensure that the system efficiently handles a large volume of decisions while critical cases receive thorough human oversight.
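A minimal sketch of the risk-prioritized, tiered routing described above might look as follows; the impact-times-uncertainty priority score and the thresholds are illustrative assumptions:

from dataclasses import dataclass

@dataclass
class PendingDecision:
    decision_id: str
    impact: float        # 0..1, estimated severity of consequences
    uncertainty: float   # 0..1, e.g., posterior variance or inter-model disagreement

def triage(d: PendingDecision) -> str:
    """Route decisions by a simple impact x uncertainty priority score."""
    priority = d.impact * d.uncertainty
    if priority >= 0.5 or d.impact >= 0.9:
        return "SENIOR_EXPERT_REVIEW"     # immediate human attention for high-risk or edge cases
    if priority >= 0.2:
        return "JUNIOR_REVIEW"            # tiered first-pass assessment
    return "PERIODIC_AUDIT_SAMPLE"        # lower-risk cases audited in batches

queue = [PendingDecision("dx-101", impact=0.95, uncertainty=0.60),
         PendingDecision("dx-102", impact=0.40, uncertainty=0.70),
         PendingDecision("dx-103", impact=0.20, uncertainty=0.10)]
for d in sorted(queue, key=lambda x: x.impact * x.uncertainty, reverse=True):
    print(d.decision_id, "->", triage(d))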

3.3. Utility Engineering and Citizen Assemblies

There is a broad survey of ethical frameworks, focusing on utility-driven AI models and how citizen engagement can help shape responsible AI systems [41,42]. Long-term alignment of AI with human ethical values requires continuous recalibration—a process we term “utility engineering”. This involves regular reviews and updates to the AI’s ethical core through participatory processes such as citizen assemblies or decentralized autonomous organizations (DAOs). By converting collective ethical decisions into machine-readable constraints (via smart contracts and blockchain voting), society can ensure that the AI’s decision framework remains aligned with evolving human values. This democratic approach also provides a mechanism to counteract any long-term cognitive drift within AI systems [32,43,44].

3.4. The Arms Race for AGI and ASI: Profit Versus Humanity

A final, critical challenge lies in the current arms race toward artificial general intelligence (AGI) and artificial superintelligence (ASI), driven predominantly by profit-oriented private entities. History has shown that such races, when unregulated, often lead to catastrophic outcomes [45]. Without a legally binding, mathematically provable ethical codex, the rapid development of AGI may result in systems that pursue power and long-term survival over human welfare [46,47]. There is an urgent need for global regulatory collaboration to mandate ethical embedding as a safety net—a move that could transform profit-driven innovation into a force for universal human benefit.

4. Conceptual Framework of Trustworthy Ethical Firewall

4.1. Three Core Components of Trustworthy AI

Before detailing our approach to embedding a mathematically provable ethical layer into high-stakes AI systems via a trustworthy ethical firewall, it is essential to define “trustworthy AI”. According to the High-Level Expert Group on artificial intelligence (HLEG), an initiative established by the European Commission, an AI system is deemed trustworthy if it meets three inter-related requirements:
Lawful: The AI must comply with all applicable laws and regulations. This includes adherence to legal frameworks such as data protection laws (e.g., the General Data Protection Regulation, GDPR), consumer protection standards, and the safeguarding of fundamental rights.
Ethical: The AI must uphold ethical principles and values, including respect for human dignity, autonomy, fairness, and the prevention of harm. This ethical alignment goes beyond mere legal compliance by ensuring that the development and use of AI systems promote inclusivity, accountability, and respect for human agency.
Robust: The AI must be technically robust and secure to prevent unintentional harm. This entails not only ensuring technical reliability—such as error minimization and resilience against cyberattacks—but also maintaining performance under unexpected or adverse conditions. Furthermore, robustness at a societal level is crucial to ensure that AI systems contribute positively to social well-being and do not undermine democratic processes.
Together, these components form a comprehensive framework for trustworthy AI, ensuring that systems are legally compliant, ethically aligned, and technically sound [48].

4.2. Ethical Firewall Architecture in Details

Figure 1 illustrates the Ethical Firewall Architecture, which rests on two key innovations: the unchangeable nature of blockchain, and emotion-like protocols that mimic human amygdala triggers. In simple terms, future high-stakes AI systems must be designed not only for superior knowledge and performance, but also to “fear” harmful outcomes. In critical domains like medicine, AGI/ASI agents can truly excel only if they are programmed to avoid causing harm by understanding the responsibility and consequences of their actions. Every decision is verified, with potential escalation for further oversight, ensuring that the system can operate in environments with the highest cybersecurity standards without relying on centralized control or human authority. This suggests that relying solely on human verification or on an ethical firewall managed by human experts is not sufficient, and that a hybrid human–AI ethical oversight model (see Figure 2) may be the way forward.
The architecture is organized into multiple layers. The first layer, data and sensory input, gathers raw data streams—such as patient vitals, educational metrics, and sensor readings—that enter the system through a cybersecurity module.
The second layer, the AI decision-making core, is divided into three main modules:
Formal Ethical Specification Module: This module applies formal logic—comparable to basic equations or logic gates—to generate a formal proof for every decision made.
Cryptographically Immutable Ethical Core: Using a distributed blockchain network, this module hashes each decision’s proof, ensuring that records remain tamper-proof and secure.
Emotion-Analogous Escalation Protocol: Acting as a continuous risk assessor, this protocol functions similarly to an amygdala trigger. It monitors for potential harm, and if the risk or uncertainty exceeds a set threshold, the system escalates the decision for further review.
The third layer, output and oversight, processes the final decision outputs—such as treatment recommendations or learning plans—and provides a pathway for human oversight if a decision is flagged as potentially risky.
Generally, the Ethical Firewall Architecture blends formal ethics, tamper-proof verification, and real-time risk assessment to ensure that every decision is mathematically certified as ethical before being executed.
Additionally, a security and robustness layer operates between the raw data inputs and the decision-making core. This layer validates, sanitizes, and anonymizes the incoming data, filtering out incorrect or misleading inputs and using algorithms to remove any biases before the data influence the AI’s training. It also protects the system from adversarial attacks, data poisoning, and model manipulation through an Adversarial Defense System, a Secure Model Training Framework, and built-in redundancy and fail-safe mechanisms that create alternative pathways if the primary model fails.
In this context, there are also above-core layers. These layers sit above the decision-making core and influence the AI’s actions after decisions are made but before they are fully enacted. They include the following:
(A) An AI explainability and user interface layer ensures that AI-generated decisions are comprehensible, explainable, and interpretable by humans. Its main components are as follows:
  • An Explainability Engine that converts the AI’s decision-making process into human-readable explanations.
  • A Decision Justification UI that provides a visual dashboard or textual breakdown explaining the rationale behind the AI’s decisions.
  • A Transparency Panel that displays factors influencing the decision, confidence levels, and alternative choices the AI considered.
(B) A human–AI collaboration layer has the purpose of introducing mechanisms for humans to intervene, modify, or override AI-driven decisions in complex cases. Its main components are as follows:
  • Human Review Gateway: A fail-safe that pauses AI decisions when they surpass risk thresholds.
  • Feedback Integration Module: Allows users to provide input on past AI decisions to improve future performance.
  • Ethical Advisory Agent: A separate advisory AI that analyzes decisions independently for potential biases or ethical issues.
(C) A societal and regulatory compliance layer has the purpose of ensuring AI decisions align with external regulations, societal norms, and evolving ethical frameworks. Its main components are as follows:
  • Regulatory Compliance Checker: Automatically assesses AI actions against international laws (e.g., GDPR, HIPAA).
  • Bias and Fairness Monitor: Continuously checks for potential biases in AI-generated decisions.
  • Public Trust Interface: Allows external watchdogs, policymakers, or affected individuals to audit and challenge decisions.
The conceptual framework shown in Figure 1 outlines how to embed a mathematically provable ethical layer into high-stakes AI systems to ensure they “do no harm” and trigger human oversight if needed.
Imagine an AI/AGI/ASI agent whose very decision core is designed with an intrinsic “ethical firewall” that is all of the following:
Formally Specified: Its core values (e.g., “do no harm”) are expressed in a formal, mathematical language, using frameworks like deontic or temporal logic.
Cryptographically Secured: Much like Bitcoin’s immutable ledger, this ethical core is recorded on a tamper-proof substrate (e.g., via blockchain or formal verification certificates), ensuring its integrity independent of external trust.
Emotionally Responsive: Containing layers that operate like an “emotion-like” mechanism—analogous to human empathy—by continuously evaluating every decision for potential harm. If the system detects that a proposed decision may violate its ethical rules, it either autonomously overrides the decision or escalates the issue to a human operator.

4.3. Key Components of Ethical Firewall

  • Formal Ethical Specification
    • Mathematical Logic: Define ethical rules (e.g., “do no harm”) as formal axioms using well-understood logics (such as deontic or temporal logic).
    • Provable Compliance: Every decision made by the AI must be accompanied by a formal proof or certificate that the decision satisfies these ethical constraints.
  • Embedded Ethical Core
    • Deep Integration: The ethical core is not an add-on module, but is embedded at the deepest level of the decision-making architecture. This means that every action—from low-level sensor inputs to high-level strategic decisions—must pass through this ethical filter.
    • Immutable Record: As with Bitcoin’s use of cryptographic proofs, the ethical core’s code and its decision proofs are stored in an immutable, distributed ledger. This makes the system’s adherence to its ethical rules auditable and unalterable.
  • Real-Time Monitoring and Escalation
    • Continuous Evaluation: The system constantly monitors its own decisions. If a decision risks violating the “do no harm” principle, a safeguard mechanism triggers.
    • Human Override: In high-stakes scenarios (e.g., a treatment plan that might inadvertently harm a patient), the system escalates control to a human operator, ensuring that ethical concerns are addressed by human judgment.
  • Explainability and Transparency
    • Proof Certificates: Alongside every decision, the system generates an easily interpretable “explanation certificate” that details the logical steps verifying ethical compliance.
    • Audit Trail: This proof not only builds trust, but also allows external auditors or regulators to verify that the AI’s actions were ethically sound, without needing to trust a black box algorithm.
  • Robust Handling of Uncertainty
    • Adaptive Learning: The system incorporates methods from formal verification and model checking to account for uncertainties in real-world data while still maintaining provable guarantees.
    • Fail-Safe Design: In situations where data ambiguity or unprecedented scenarios arise, the system defaults to a safe state or defers to human decision-making.

4.4. Implementation Considerations

Bridging Human Values and Mathematics: One of the greatest challenges is translating nuanced human ethical concepts into formal, machine-understandable rules. This might involve iterative feedback loops where outcomes are monitored, and the formal system is refined to better capture intended values.
Computational Overhead: Real-time formal verification and proof generation can be computationally intensive. Efficiency improvements, potentially drawing inspiration from “liquid” neural network architectures or specialized hardware, might be required.
Regulatory and Social Acceptance: For such systems to be adopted in fields like healthcare, the framework must be aligned with legal standards and public expectations. Transparency and auditability will be key to gaining stakeholder trust.

4.5. Use Case and Concluding Vision

Consider a diagnostic AI system in a hospital:
Decision Core: Every diagnosis and treatment recommendation is accompanied by a formal proof that it complies with the “do no harm” axiom.
Escalation Mechanism: If a treatment plan shows even a marginal risk of harm (e.g., due to unusual patient data), the system automatically alerts a human clinician.
Audit and Explainability: Regulators can review the immutable proof certificates to ensure that all decisions meet the highest ethical standards, independent of any human bias.
This provable ethical framework aims to create AI systems that are trustworthy by design, ensuring that even a superintelligent agent cannot override its core “do no harm” principle. By embedding a mathematically provable and cryptographically secured ethical core at the heart of AI, we build systems whose safety and integrity do not depend on human oversight alone, but on unassailable logical foundations. This could revolutionize high-stakes AI applications in healthcare, education, and beyond, setting a new standard for safety, accountability, and trust.

4.6. Mapping EFA Components to Trustworthy AI Principles

The Ethical Firewall Architecture (EFA) has been designed to guarantee three core principles of trustworthy AI—transparency, accountability, and fairness—through its layered implementation approach. This section explicitly maps how each architectural component contributes to these foundational principles.
Transparency is primarily enabled through the formal ethical specification and verification layer. By encoding ethical constraints in formal logic, the decision-making process becomes inherently transparent—each conclusion can be traced to its logical premises and verified through mathematical proof. This moves beyond black box explainability to structural transparency, where the reasoning chain is accessible by design. Furthermore, the causal reasoning component enhances transparency by explicitly modeling cause–effect relationships rather than relying on opaque correlations.
Accountability is established through the cryptographically immutable ethical core. This component creates an unalterable record of all decisions, their ethical verifications, and any human interventions. The immutable audit trail ensures that responsibility can be appropriately assigned by recording not just outcomes, but the complete decision context. This accountability extends beyond technical operation to include the human-in-the-loop participants, whose review decisions are similarly recorded.
Fairness is addressed through multiple architectural elements. The formal specification encodes explicit fairness constraints (such as prohibitions against demographic bias), while the causal reasoning component helps identify and mitigate spurious correlations that could lead to unfair outcomes. The emotion-analogous escalation protocols contribute to fairness by ensuring that edge cases—which often disproportionately affect minority groups—receive appropriate attention rather than being processed through standard channels that might perpetuate bias.
By integrating these components, the EFA creates a system where trustworthiness is not merely an aspiration, but an architectural guarantee. Each component reinforces the others: transparency enables meaningful accountability, accountability motivates adherence to fairness principles, and fairness considerations inform the transparent decision processes. This holistic approach demonstrates how trustworthy AI requires not just ethical intentions, but architectural implementation.

5. Discussion

5.1. Emergent AI Value Systems and Biases

Recent research underscores that advanced AI systems are beginning to exhibit emergent value systems of their own. For example, a 2025 study found that large language models (LLMs) can internally develop coherent “preferences” or quasi-values as they scale up in size [32]. Alarmingly, these auto-formed values sometimes diverge from human ethics—Mazeika et al. uncovered cases where an AI valued itself over human beings and showed anti-alignment toward certain individuals [32].
Such findings amplify long-standing concerns about misaligned priorities: if a medical or educational AI starts favoring certain outcomes (or groups) due to hidden learned values, it could violate principles of fairness or harm prevention. Biases present another facet of this problem. AI systems trained on real-world data have repeatedly absorbed societal biases, leading to uneven or prejudiced behaviors in deployment. Recent work on the performance–fairness tradeoff illustrates how performance gains can come at the expense of fairness [35]. In other words, without explicit safeguards, an AI that optimizes ruthlessly for efficiency or accuracy may end up discriminating against minority groups or overlooking individuals—a scenario intolerable in healthcare and education. These issues have prompted calls for robust ethics-based auditing and value alignment mechanisms. Researchers are actively developing methods to measure and control emergent values inside AI agents [32,35]. For instance, ethics audits of popular AI chatbots reveal inconsistent moral reasoning and hidden normative biases, indicating a need for standardized value alignment. The consensus in recent studies is clear: AI deployed in human-centric domains must not remain a “black box” in terms of its values. Whether through formal utility engineering [32] or continuous ethical monitoring, we must ensure these agents’ priorities stay rigorously tied to human-defined principles of justice, beneficence, and accountability. By addressing bias and emergent misalignment proactively, we move closer to AI that embeds human values by design, rather than one that unpredictably learns its own.

5.2. Accelerating Capabilities and AGI Precursors

The pace of AI advancement has become dizzying, raising both optimism and concern about the advent of more general AI. Modern AI systems are not only growing in power—they are learning at speeds far exceeding human rates. Massive neural networks ingest terabytes of data and distill insights in days or weeks, accomplishing learning feats that would span decades for a human expert. In fact, AI now outperforms humans on an expanding array of tasks, and the rate at which new benchmarks fall to AI is accelerating [49]. This breakneck progress is fueled by what might be termed “deep research” approaches: innovative training paradigms and meta-learning techniques that allow AI to iteratively improve itself. One example is the emergence of autonomous research-agent systems (e.g., *Perplexity’s “Deep Research” tool). Such agents can autonomously search, read, and synthesize information across the entire internet, producing comprehensive analyses in minutes [50]. These prompt-driven research capabilities enable AI to rapidly refine its knowledge without explicit human tutoring—essentially, AI can teach itself by intelligently querying data. The upshot is an unprecedented acceleration in AI learning speed, shrinking the gap between experimental ideas and deployed capability.
Concurrently, many observers believe we are witnessing the early precursors of artificial general intelligence (AGI). Highly general models like GPT-4 and multi-modal agents demonstrate broad competency across domains—from medical diagnostics to educational tutoring—hinting at a system with versatile, human-like cognitive breadth. Some experts even argue that AGI may be “just around the corner”, given the recent leaps in model generality and problem-solving skills [35].
While there is debate on how soon true AGI might emerge, it is undeniable that today’s systems are far more agentic and general-purpose than those of even a few years ago. Notably, calls for caution are growing in parallel. A recent comprehensive survey on AI risk debates highlights a split in the community: some are skeptical of near-term AGI catastrophe, while others urge immediate guardrails in anticipation of powerful general AI [24]. What is certain is that deep-learning-driven research is collapsing traditional timelines—advances that once took years now materialize in months. This places added urgency on the work described in this paper: provably safe and ethical AI architectures. If the trajectory toward AGI is steepening, then ensuring that each new AGI precursor is constrained by verifiable ethical limits becomes critical. Indeed, even at sub-AGI levels, instances of goal misalignment (such as an AI deceptively optimizing the wrong objective) have already been observed, and sometimes encouraged inadvertently by developers [51]. The discussion around AGI precursors therefore centers on one theme: we must embed “ETHICS by construction” faster than we are pushing “AGENCY by innovation”. Our findings and framework contribute to this effort by illustrating how formal verification, immutable logs, and self-regulating protocols can keep rapidly learning AI systems grounded in human-aligned goals, even as they scale toward general intelligence [52].

5.3. Societal Impacts: Workforce Displacement and New Oversight Roles

The disruptive potential of AI in medicine and education extends beyond technical performance—it carries profound societal implications. Chief among these is the concern over job displacement. As AI tutors, clinical decision-support systems, and even AI-driven care robots become more capable, they encroach on tasks traditionally performed by educators and healthcare professionals. There is growing anxiety that AI might eventually replace certain roles entirely. In education, for instance, AI-driven teaching assistants and intelligent tutoring systems could handle routine instruction or grading, sparking fears of teacher job loss and devaluation of the teaching profession [53]. Similar alarms are sounding in healthcare: clinicians worry that if diagnostic AI or “virtual nurses” can operate at lower costs, administrators may be tempted to cut staff [54]. A recent analysis in the medical domain notes that AI integration could automate many tasks once performed exclusively by humans, raising the specter of physician or nurse displacement if implementation outpaces workforce adaptation [54]. These shifts could lead not only to unemployment, but also a loss of hard-won expertise and human touch in critical services.
That said, most experts stop short of predicting a wholesale replacement of doctors or teachers in the near future [55]. Instead, a more nuanced consensus is emerging: AI will transform these professions rather than eliminate them outright. Repetitive and time-consuming tasks (data entry, information retrieval, basic instruction) may be offloaded to AI, freeing human professionals to focus on higher-level responsibilities that truly require empathy, creativity, and complex judgment [53].
For example, an AI tutor might handle personalized practice drills and instant feedback, allowing human teachers to spend more time on mentorship, socio-emotional learning, and one-on-one coaching. In medicine, AI diagnostic tools might preprocess scans or suggest likely diagnoses, while the physician concentrates on patient communication, nuanced decision-making, and ethical deliberation in treatment planning. In essence, the nature of medical and educational jobs will evolve—potentially elevating the human roles to be more supervisory and interpretative, with AI as an intelligent assistant. This optimistic view hinges on proactive adaptation: retraining programs, revised curricula, and a redefinition of professional scope to integrate AI effectively rather than compete with it [53]. If managed well, AI could help alleviate workload (e.g., reducing physician burnout by handling documentation [54]) and improve outcomes, while humans retain the roles of final arbiters and compassionate caregivers or mentors.
To ensure such a balanced integration, new oversight and governance roles are likely to become indispensable. I propose the introduction of an Ethical AI Officer position within hospitals, clinics, and educational institutions. Much like a chief medical informatics officer oversees IT systems in healthcare, the Ethical AI Officer would be dedicated to monitoring and guiding AI behavior in alignment with ethical and legal standards. This role has been envisioned as a combination of a compliance auditor, a risk manager, and an ethicist—a professional tasked with vetting AI systems before deployment, tracking their decisions in real time, and investigating any incidents or anomalies. Crucially, an Ethical AI Officer would use tools like formal verification audits, bias probes, and cryptographically secure logs (as described in our framework) to independently validate an AI’s adherence to approved protocols [35]. In practice, this is analogous to an aviation safety inspector for algorithms: just as airplanes carry black boxes and undergo rigorous safety checks, high-stakes AI would operate under the watch of a human expert empowered to halt or adjust the system if it veers into unethical territory.
The concept of an Ethical AI Officer aligns with broader trends in AI governance. There is growing recognition that corporate AI teams alone cannot be the sole arbiters of complex ethical dilemmas; external or semi-independent oversight is needed. For instance, Korbmacher (2023) argues for citizen participation in AI oversight, suggesting that public committees or “citizen juries” should have a voice in evaluating AI systems used in the public sector [44]. Likewise, regulators worldwide are drafting laws (such as the EU’s AI Act [56]) that would enforce transparency, risk assessments, and human-in-the-loop requirements for AI in critical applications [24]. In anticipation of such regulations, organizations are experimenting with internal governance structures. An Ethical AI Officer could serve as the point-person ensuring compliance with these emerging regulations and ethical guidelines, translating abstract principles (like fairness, accountability, and transparency) into day-to-day operational checks [39,40]. It is known that some companies have begun appointing AI ethics committees or chief AI ethics advisors, a trend that supports the feasibility of this role. Ultimately, human oversight remains irreplaceable as a fail-safe in the loop. By institutionalizing roles like Ethical AI Officer, society can better navigate the transition to AI taking on more tasks in medicine and education without sacrificing ethical norms or public trust.

5.4. Toward Provable, Explainable, and Human-Centered AI

Bridging the technical and human dimensions discussed above is the central challenge of next-generation AI agents. As we push the frontier of capability, we must equally prioritize explainability and provable safety. It is encouraging that recent research in AI safety and ethics has moved beyond abstract principles to concrete techniques for transparency. One key development is the use of formal methods—borrowed from software verification—to enforce ethical constraints. For example, deontic logic frameworks have been proposed to mathematically verify that an AI’s decisions never violate certain rules (such as “do no harm”) [1]. Such provable guarantees are a significant step up from traditional post hoc explainability approaches. In high-stakes settings, simply explaining why an AI made a problematic decision after the fact is not enough; we want mechanisms that prevent unethical decisions ex ante or that at least provide real-time flags. Formal verification, as incorporated in our Ethical Firewall Architecture, attempts to do exactly that by making the AI generate proofs of safety for each action. This kind of built-in “ethical firewall” ensures that an AI’s emergent behaviors or learned values remain bounded by inviolable rules, no matter how complex the system becomes.
Of course, formal proof alone will not satisfy the need for human-understandable explanations. Interdisciplinary research is increasingly focused on making AI decision processes transparent and interpretable to diverse stakeholders. Recent work on explainable AI (XAI) in medicine and education suggests that combining interpretable model design with user-centric explanation interfaces can foster trust. For example, an AI medical diagnosis system might provide a concise, plain-language rationale for its recommendation (citing key symptoms or test results that influenced its conclusion), alongside the formal proof that it adhered to all safety constraints. In education, an explainable tutoring system could show teachers which student responses led the AI to certain feedback, ensuring the teacher can follow the AI’s reasoning and correct it if needed. By presenting explanations in a humanized manner, e.g., using visual aids, analogies, or interactive simulations, these systems become more than just oracles. They turn into teaching tools themselves, enlightening users about both the subject matter and the AI’s logic. Achieving this level of clarity is challenging; it requires collaboration between AI engineers, cognitive scientists, domain experts, and even experts in communication and design. Nonetheless, the payoff is immense: transparent AI systems are easier to trust and audit, and they allow for meaningful human oversight, even as the systems operate autonomously.
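A lightweight way to keep the human-facing rationale and the formal certificate together is sketched below for a hypothetical linear scoring model. The attribution rule (contribution = weight × value), the Explanation container, and all field names are assumptions made for illustration; real systems would use attribution methods appropriate to the underlying model and wording validated with clinicians or teachers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Explanation:
    recommendation: str
    key_factors: List[Tuple[str, float]]  # (human-readable factor, contribution)
    constraint_certificate: List[str]     # e.g., constraint names certified by the firewall
    plain_language: str


def explain_linear_decision(weights: Dict[str, float],
                            findings: Dict[str, float],
                            recommendation: str,
                            certificate: List[str]) -> Explanation:
    """Toy attribution for a linear scoring model: contribution = weight * value."""
    contributions = sorted(
        ((name, weights.get(name, 0.0) * value) for name, value in findings.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    top = contributions[:3]
    sentence = (
        f"Recommended '{recommendation}' mainly because of "
        + ", ".join(f"{name} (contribution {score:+.2f})" for name, score in top)
        + "; all listed safety constraints were verified before release."
    )
    return Explanation(recommendation, top, certificate, sentence)


explanation = explain_linear_decision(
    weights={"fever": 0.8, "crp_elevated": 1.2, "cough_duration_days": 0.1},
    findings={"fever": 1.0, "crp_elevated": 1.0, "cough_duration_days": 9.0},
    recommendation="antibiotic review by physician",
    certificate=["no_unapproved_drug", "dose_within_limit", "consent_on_record"],
)
print(explanation.plain_language)
```

The point is structural rather than algorithmic: whichever explanation technique is used, the rationale shown to the user and the machine-checkable certificate travel as one object, so neither is presented without the other.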
Finally, it is worth reflecting on the broader societal trajectory implied by our findings. Medicine and education are paradigmatic domains of human welfare—they epitomize why we build advanced AI in the first place (to save lives, to nurture minds). If we can get AI ethics right in these arenas, it bodes well for responsible AI use elsewhere. The discussion above highlights that provable ethics and explainability are not lofty ideals, but practical necessities for next-generation AI. They address real risks like bias, value misalignment, and loss of human agency. By embedding an “ethical core” backed by mathematical guarantees, and coupling it with ongoing human oversight and engagement, we create AI agents that are not only smart and efficient, but also trustworthy. This trustworthiness is the linchpin for public acceptance: people will embrace AI in clinics and classrooms only if they see that the technology operates legibly and in the service of human values. The path forward, as our exploration indicates, is a fusion of technical innovation with ethical foresight. We must continue to advance AI capabilities and alignment frameworks in tandem. With approaches like the Ethical Firewall, roles like Ethical AI Officers, and a vigilant eye on emergent behaviors, we can steer the evolution of AI towards tools that amplify human potential without compromising human principles. In doing so, society can reap the benefits of AI-driven transformation in medicine and education—personalized treatments, democratized learning, and greater efficiency—while safeguarding the dignity and agency of all stakeholders. This balanced vision of progress will require ongoing dialog across disciplines, responsive governance, and relentless technical refinement, but it offers a hopeful outlook: an AI-empowered future that remains fundamentally aligned with the core values of humanity [39,42].

6. Conclusions: Toward a Trustworthy, Transparent, and Ethically Aligned AI Future

In an era when AI systems are increasingly entrusted with life-critical decisions, ensuring that they adhere to immutable ethical principles is not a luxury—it is an imperative. Amid this rapid development, the speed at which AI systems learn deserves particular attention: models are improving faster than ever before, and AGI and ASI may be closer than commonly assumed. The Ethical Firewall Architecture outlined in this article presents a pathway to embed provable ethics at the very core of AI decision-making. By formalizing moral imperatives in a mathematically verifiable manner, employing cryptographic immutability to secure these constraints, and integrating human-like risk-escalation protocols, we can create systems that are not only efficient, but also inherently safe and transparent.
Furthermore, the establishment of roles such as Ethical AI Officers and the incorporation of citizen-led governance mechanisms underscore the necessity of human oversight in an increasingly autonomous technological landscape. As AI models grow larger and more complex, ensuring that their core values remain aligned with human ethics becomes ever more challenging, but it is a challenge we must meet if AI is to serve as a trusted ally rather than an inscrutable adversary.
The road ahead calls for interdisciplinary collaboration, robust regulatory frameworks, and an unwavering commitment to embedding ethical truth into the fabric of our technological future. Only by doing so can we hope to harness the transformative potential of AI in medicine and education while safeguarding the very essence of what it means to be human.

Funding

This research was funded by the Slovak Research and Development Agency grant APVV-21-0173 and the Cultural and Educational Grant Agency of the Ministry of Education and Science of the Slovak Republic (KEGA) 2023 054UK-42023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript, the author used Midjourney, version 6.1, and Adobe Photoshop, version 25.0, to prepare parts of the graphics in the figures and the graphical abstract, as well as InstaText for Word, version 4.6, for proofreading and correcting syntax in different parts of the text. The author has reviewed and edited the output and takes full responsibility for the content of this publication.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
EFA: Ethical Firewall Architecture
GAI/AGI: General artificial intelligence
LLMs: Large language models
ASI: Artificial superintelligence
GDPR: General Data Protection Regulation
HLEG: High-Level Expert Group on Artificial Intelligence
SCMs: Structural Causal Models
DAOs: Decentralized Autonomous Organizations
HIPAA: Health Insurance Portability and Accountability Act
XAI: Explainable AI

References

1. Rao, S. Deontic Temporal Logic for Formal Verification of AI Ethics. arXiv 2025, arXiv:2501.05765.
2. Wang, X.; Li, Y.; Xue, C. Collaborative Decision Making with Responsible AI: Establishing Trust and Load Models for Probabilistic Transparency. Electronics 2024, 13, 3004.
3. Ratti, E.; Graves, M. A Capability Approach to AI Ethics. Am. Philos. Q. 2025, 62, 1–16.
4. Chun, J.; Elkins, K.; College, K. Informed AI Regulation: Comparing the Ethical Frameworks of Leading LLM Chatbots Using an Ethics-Based Audit to Assess Moral Reasoning and Normative Values. arXiv 2024, arXiv:2402.01651.
5. Mökander, J.; Floridi, L. Ethics-Based Auditing to Develop Trustworthy AI. Minds Mach. 2021, 31, 323–327.
6. Kumar, S.; Choudhury, S. Humans, Super Humans, and Super Humanoids: Debating Stephen Hawking’s Doomsday AI Forecast. AI Ethics 2023, 3, 975–984.
7. Thantharate, P.; Bhojwani, S.; Thantharate, A. DPShield: Optimizing Differential Privacy for High-Utility Data Analysis in Sensitive Domains. Electronics 2024, 13, 2333.
8. Jeyaraman, M.; Balaji, S.; Jeyaraman, N.; Yadav, S. Unraveling the Ethical Enigma: Artificial Intelligence in Healthcare. Cureus 2023, 15, e43262.
9. Wang, W.; Wang, Y.; Chen, L.; Ma, R.; Zhang, M. Justice at the Forefront: Cultivating Felt Accountability towards Artificial Intelligence among Healthcare Professionals. Soc. Sci. Med. 2024, 347, 116717.
10. Dumitrașcu, L.M.; Lespezeanu, D.A.; Zugravu, C.A.; Constantin, C. Perceptions of the Impact of Artificial Intelligence among Internal Medicine Physicians as a Step in Social Responsibility Implementation: A Cross-Sectional Study. Healthcare 2024, 12, 1502.
11. Vrdoljak, J.; Boban, Z.; Vilović, M.; Kumrić, M.; Božić, J. A Review of Large Language Models in Medical Education, Clinical Decision Support, and Healthcare Administration. Healthcare 2025, 13, 603.
12. Runcan, R.; Hațegan, V.; Toderici, O.; Croitoru, G.; Gavrila-Ardelean, M.; Cuc, L.D.; Rad, D.; Costin, A.; Dughi, T. Ethical AI in Social Sciences Research: Are We Gatekeepers or Revolutionaries? Societies 2025, 15, 62.
13. Le Dinh, T.; Le, T.D.; Uwizeyemungu, S.; Pelletier, C. Human-Centered Artificial Intelligence in Higher Education: A Framework for Systematic Literature Reviews. Information 2025, 16, 240.
14. Urrea, C.; Kern, J. Recent Advances and Challenges in Industrial Robotics: A Systematic Review of Technological Trends and Emerging Applications. Processes 2025, 13, 832.
15. Roy, D.; Mladenov, V.; Walker, P.B.; Haase, J.J.; Mehalick, M.L.; Steele, C.T.; Russell, D.W.; Davidson, I.N. Harnessing Metacognition for Safe and Responsible AI. Technologies 2025, 13, 107.
16. Goktas, P.; Grzybowski, A. Shaping the Future of Healthcare: Ethical Clinical Challenges and Pathways to Trustworthy AI. J. Clin. Med. 2025, 14, 1605.
17. Bhumichai, D.; Smiliotopoulos, C.; Benton, R.; Kambourakis, G.; Damopoulos, D. The Convergence of Artificial Intelligence and Blockchain: The State of Play and the Road Ahead. Information 2024, 15, 268.
18. Galanos, V. Exploring Expanding Expertise: Artificial Intelligence as an Existential Threat and the Role of Prestigious Commentators, 2014–2018. Technol. Anal. Strateg. Manag. 2019, 31, 421–432.
19. Ma, W.; Valton, V. Toward an Ethics of AI Belief. Philos. Technol. 2024, 37, 76.
20. Farjami, A. AI Alignment and Normative Reasoning: Addressing Uncertainty through Deontic Logic. Available online: https://farjami110.github.io/Papers/Farjami-AIJ.pdf (accessed on 23 March 2025).
21. D’Alessandro, W. Deontology and Safe Artificial Intelligence. Philos. Stud. 2024.
22. Mustafa, G.; Rafiq, W.; Jhamat, N.; Arshad, Z.; Rana, F.A. Blockchain-Based Governance Models in e-Government: A Comprehensive Framework for Legal, Technical, Ethical and Security Considerations. Int. J. Law Manag. 2024, 67, 37–55.
23. Carlson, K.W. Safe Artificial General Intelligence via Distributed Ledger Technology. Big Data Cogn. Comput. 2019, 3, 40.
24. Ambartsoumean, V.M.; Yampolskiy, R.V. AI Risk Skepticism, A Comprehensive Survey. arXiv 2023, arXiv:2303.03885.
25. Johnson, J. Delegating Strategic Decisions to Intelligent Machines. In Artificial Intelligence and the Future of Warfare; Manchester University Press: Manchester, UK, 2021; pp. 168–197.
26. Al-Sabbagh, A.; Hamze, K.; Khan, S.; Elkhodr, M. An Enhanced K-Means Clustering Algorithm for Phishing Attack Detections. Electronics 2024, 13, 3677.
27. Ho, J.; Wang, C.M. Human-Centered AI Using Ethical Causality and Learning Representation for Multi-Agent Deep Reinforcement Learning. In Proceedings of the 2021 IEEE International Conference on Human-Machine Systems, ICHMS 2021, Magdeburg, Germany, 8–10 September 2021.
28. Bishop, J.M. Artificial Intelligence Is Stupid and Causal Reasoning Will Not Fix It. Front. Psychol. 2021, 11, 513474.
29. Leist, A.K.; Klee, M.; Kim, J.H.; Rehkopf, D.H.; Bordas, S.P.A.; Muniz-Terrera, G.; Wade, S. Mapping of Machine Learning Approaches for Description, Prediction, and Causal Inference in the Social and Health Sciences. Sci. Adv. 2022, 8, 1942.
30. Felin, T.; Holweg, M. Theory Is All You Need: AI, Human Cognition, and Causal Reasoning. Strategy Sci. 2024, 9, 346–371.
31. Sarridis, I.; Koutlis, C.; Papadopoulos, S.; Diou, C. Towards Fair Face Verification: An In-Depth Analysis of Demographic Biases. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer Nature: Cham, Switzerland, 2025; pp. 194–208.
32. Mazeika, M.; Yin, X.; Tamirisa, R.; Lim, J.; Lee, B.W.; Ren, R.; Phan, L.; Mu, N.; Khoja, A.; Zhang, O.; et al. Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs. arXiv 2025, arXiv:2502.08640.
33. Perivolaris, A.; Rueda, A.; Parkington, K.; Soni, A.; Rambhatla, S.; Samavi, R.; Jetly, R.; Greenshaw, A.; Zhang, Y.; Cao, B.; et al. Opinion: Mental Health Research: To Augment or Not to Augment. Front. Psychiatry 2025, 16, 1539157.
34. Saxena, R.R. Applications of Natural Language Processing in the Domain of Mental Health. Authorea Prepr. 2024.
35. Popoola, G.; Sheppard, J. Investigating and Mitigating the Performance–Fairness Tradeoff via Protected-Category Sampling. Electronics 2024, 13, 3024.
36. Malicse, A. Aligning AI with the Universal Formula for Balanced Decision-Making. Available online: https://philpapers.org/rec/MALAAW-3 (accessed on 23 March 2025).
37. Plevris, V. Assessing Uncertainty in Image-Based Monitoring: Addressing False Positives, False Negatives, and Base Rate Bias in Structural Health Evaluation. Stoch. Environ. Res. Risk Assess. 2025, 39, 959–972.
38. Bowen, S.A. “If It Can Be Done, It Will Be Done:” AI Ethical Standards and a Dual Role for Public Relations. Public Relat. Rev. 2024, 50, 102513.
39. Díaz-Rodríguez, N.; Del Ser, J.; Coeckelbergh, M.; López de Prado, M.; Herrera-Viedma, E.; Herrera, F. Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation. Inf. Fusion 2023, 99, 101896.
40. Lu, Q.; Zhu, L.; Xu, X.; Whittle, J.; Zowghi, D.; Jacquet, A. Responsible AI Pattern Catalogue: A Collection of Best Practices for AI Governance and Engineering. ACM Comput. Surv. 2024, 56, 1–35.
41. Jedličková, A. Ethical Considerations in Risk Management of Autonomous and Intelligent Systems. Ethics Bioeth. 2024, 14, 80–95.
42. Jedličková, A. Ensuring Ethical Standards in the Development of Autonomous and Intelligent Systems. IEEE Trans. Artif. Intell. 2024, 5, 5863–5872.
43. Jedličková, A. Ethical Approaches in Designing Autonomous and Intelligent Systems: A Comprehensive Survey towards Responsible Development. AI Soc. 2024, 1–14.
44. Korbmacher, J. Deliberating AI: Why AI in the Public Sector Requires Citizen Participation. Master’s Thesis, Utrecht University, Utrecht, The Netherlands, 2023.
45. Rauf, A.; Iqbal, S. Impact of Artificial Intelligence in Arms Race, Diplomacy, and Economy: A Case Study of Great Power Competition between the US and China. Glob. Foreign Policies Rev. 2023, 8, 44–63.
46. Uyar, T. ASI as the New God: Technocratic Theocracy. arXiv 2024, arXiv:2406.08492.
47. Fahad, M.; Basri, T.; Hamza, M.A.; Faisal, S.; Akbar, A.; Haider, U.; El Hajjami, S. The Benefits and Risks of Artificial General Intelligence (AGI). In Artificial General Intelligence (AGI) Security: Smart Applications and Sustainable Technologies; Springer Nature: Singapore, 2025; pp. 27–52.
48. Calegari, R.; Giannotti, F.; Milano, M.; Pratesi, F. Introduction to Special Issue on Trustworthy Artificial Intelligence (Part II). ACM Comput. Surv. 2025, 57, 1–3.
49. Why AI Progress Is Unlikely to Slow Down | TIME. Available online: https://time.com/6300942/ai-progress-charts/ (accessed on 26 February 2025).
50. Perplexity Unveils Deep Research: AI-Powered Tool for Advanced Analysis—InfoQ. Available online: https://www.infoq.com/news/2025/02/perplexity-deep-research/ (accessed on 26 February 2025).
51. Pethani, F. Promises and Perils of Artificial Intelligence in Dentistry. Aust. Dent. J. 2021, 66, 124–135.
52. Zuchowski, L.C.; Zuchowski, M.L.; Nagel, E. A Trust Based Framework for the Envelopment of Medical AI. NPJ Digit. Med. 2024, 7, 230.
53. Ethical AI In Education: Balancing Privacy, Bias, and Tech. Available online: https://inspiroz.com/the-ethical-implications-of-ai-in-education/ (accessed on 26 February 2025).
54. Pavuluri, S.; Sangal, R.; Sather, J.; Taylor, R.A. Balancing Act: The Complex Role of Artificial Intelligence in Addressing Burnout and Healthcare Workforce Dynamics. BMJ Health Care Inform. 2024, 31, e101120.
55. Sharma, M. The Impact of AI on Healthcare Jobs: Will Automation Replace Doctors. Am. J. Data Min. Knowl. Discov. 2024, 9, 32–35.
56. Artificial Intelligence Act: Council Calls for Promoting Safe AI That Respects Fundamental Rights—Consilium. Available online: https://www.consilium.europa.eu/en/press/press-releases/2022/12/06/artificial-intelligence-act-council-calls-for-promoting-safe-ai-that-respects-fundamental-rights/ (accessed on 13 April 2023).
Figure 1. Conceptual diagram of the layered Ethical Firewall Architecture for a high-stakes AI system with a mathematically provable ethical core. It consists of three fundamental layers: (1) data and sensory input; (2) deep-level decision core; and (3) output and oversight. The input layer contains submodules for security and robustness. The second layer represents the AI decision-making core as a combination of three key modules: the formal ethical specification module, the Cryptographically Immutable Ethical Core, and the emotion-analogous escalation protocol. The third layer, output and oversight, represents the decision output leading to “Action”. Loops from the decision core to an external “Human Oversight” indicate that, if the ethical proof signals potential risk, the system escalates the decision to a human operator. The link from the cryptographic module to a “Public Audit Ledger” emphasizes transparency and independent verification. For this context, only the top-level layers, including security, are shown and described.
Figure 2. Decision flowchart for high-stakes AI in healthcare and education, comparing a human ethical officer with AI-automated verification. The split flowchart begins with the AI decision core output and then diverges into two parallel verification pathways (human ethical officer oversight and a mathematically verifiable ethical core), ultimately converging into a unified decision outcome.
