XAI2Brain: A Perspective on Mechanistic Interpretability for Brain–AI Alignment

Jiang, Richard; Zhou, Yongchen; Wang, Boyuan; Angelov, Plamen; Ni, Qiang

doi:10.3390/make8060167

Open Access Review

by Richard Jiang ¹,²,*, Yongchen Zhou ¹, Boyuan Wang ¹, Plamen Angelov ¹ and Qiang Ni ¹

¹ LIRA Centre, Lancaster University, Lancaster LA1 4YW, UK

² NAII Institute, Shanghai Jiao Tong University, Ningbo 315012, China

* Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2026, 8(6), 167; https://doi.org/10.3390/make8060167

Submission received: 20 May 2026 / Revised: 15 June 2026 / Accepted: 16 June 2026 / Published: 18 June 2026

(This article belongs to the Section Learning)

Abstract

The convergence of artificial intelligence (AI), explainable AI (XAI), and neuroscience is fostering new opportunities for understanding both machine and biological intelligence through interpretable and human-centered learning paradigms. In this Perspective, we introduce XAI2Brain as a conceptual framework for brain–AI alignment, positioning mechanistic interpretability as an intermediate layer connecting neural network representations, human understanding, and neuroscience-inspired AI design. Rather than viewing XAI solely as a post hoc transparency tool, we emphasize its emerging role in enabling mechanistic analysis of internal model representations, concept-level reasoning, and interactive human–AI alignment. We define XAI2Brain as a multi-level conceptual framework rather than a deployable system, explicitly aimed at structuring brain–AI alignment across representation-level, mechanism-level, and interaction-level perspectives. We survey the evolution of XAI methodologies—from feature attribution and concept-based explanations to mechanistic and human-centric interpretability approaches—and discuss how these methods may support bidirectional knowledge transfer between AI systems and cognitive neuroscience. Importantly, we adopt a cautious stance on brain–AI analogy, explicitly recognizing that artificial neural representations are not equivalent to biological neural representations, and instead focusing on functional and informational correspondences rather than structural equivalence. Unlike conventional human-in-the-loop or reinforcement learning from human feedback paradigms that primarily optimize behavioral outputs, XAI2Brain focuses on cognitively interpretable and mechanistically grounded alignment between AI systems and human reasoning processes. 
This alignment promotes interactive human-in-the-loop intelligence, empowering humans to comprehend, guide, and refine AI systems, while enabling AI systems to better interpret human instructions, intentions, and contextual reasoning. We further discuss the challenges of scaling explainability to large generative and multimodal models, including issues of interpretability robustness, cognitive compatibility, evaluation, and ethical accountability. We also highlight key limitations of current mechanistic interpretability methods, including explanation instability, representation superposition, and lack of causal guarantees, underscoring that these challenges remain open research problems. Rather than proposing a complete artificial brain architecture, this Perspective outlines a research roadmap toward more interpretable, adaptive, and neuroscience-inspired AI systems capable of supporting future brain–AI integration and collaborative intelligence. We additionally clarify that this work follows a narrative perspective review methodology with structured thematic synthesis of the literature. By framing explainability as a bridge between mechanistic AI understanding, cognitive science, and human-centered interaction, XAI2Brain highlights the importance of interpretable alignment for the next generation of brain-inspired AI systems.

Keywords: explainable AI (XAI); neuroscience–AI alignment; human-in-the-loop; artificial brain; XAI2Brain; mechanistic interpretability

1. Introduction

The rapid advancement of Artificial Intelligence (AI) has heightened the need for transparency and interpretability, especially in systems with significant societal impact [1,2,3]. The opaque nature of AI decision-making, often termed a “black box,” highlights the essential role of Explainable AI (XAI) [4,5,6]. While XAI has traditionally focused on improving transparency and trustworthiness, recent developments in mechanistic interpretability and concept-based reasoning suggest broader opportunities for understanding the internal representations and decision processes of neural networks. These advances position XAI not only as a post hoc explanation tool, but also as a potential bridge connecting artificial neural representations, human cognition, and neuroscience-inspired AI design [7,8,9].

The iterative learning in neural networks parallels aspects of human cognitive processing, implying opportunities for bidirectional knowledge exchange between AI and neuroscience. Figure 1 depicts this interactive cycle, where brain-inspired neural networks yield interpretable models that enhance our understanding of cerebral mechanisms, which in turn guide the development of more adaptive and human-aligned AI systems [10,11]. In this Perspective, we refer to this process as brain–AI alignment, defined as a multi-level conceptual framework spanning representation-level, mechanism-level, and interaction-level correspondences between AI systems and cognitive principles, where explainability serves as an intermediate layer linking machine representations, human interpretation, and neuroscience-inspired abstractions.

Current discussions on XAI advocate for reflective analyses of explainability, interpretability, and human-centered AI design in ongoing research [3,12]. This interdisciplinary field, spanning philosophy, cognitive science, neuroscience, and computer science, requires comprehensive approaches to address persistent misconceptions surrounding deep learning systems [13,14]. Anthropomorphizing AI systems also introduces ethical concerns, potentially shifting accountability from system designers to the systems themselves [15]. Accordingly, there is an increasing need for explainability frameworks that remain grounded in mechanistic understanding and human interpretability rather than speculative claims regarding autonomous cognition or artificial consciousness. We further explicitly distinguish between biological and artificial neural representations, emphasizing that similarities between AI and brain systems are functional and informational [7,8] rather than structural or biological equivalences, and we therefore adopt a cautious stance on neuroscience analogies throughout this work.

The integration of XAI with cognitive science, human–computer interaction, and educational frameworks promises more interactive and interpretable AI systems [4,16]. This synergy fosters human-in-the-loop intelligence, empowering humans to interpret, guide, and refine AI systems while enabling AI systems to better understand human instructions, intentions, and contextual reasoning. These developments hold transformative potential in areas such as healthcare, scientific discovery, education, and legal decision-making [6,16,17,18,19]. Unlike conventional reinforcement learning from human feedback (RLHF) or human-in-the-loop paradigms that primarily optimize behavioral outputs, XAI2Brain emphasizes cognitively interpretable and mechanistically grounded alignment between AI systems and human reasoning processes.

In this Perspective, we introduce XAI2Brain as a conceptual framework for mechanistic and human-centered brain–AI alignment. We define XAI2Brain as a perspective-level research framework rather than a deployable system, aimed at structuring explainability-driven alignment across multiple levels of abstraction, and clarifying its role as a roadmap rather than an implementation methodology.

We survey XAI methodologies and their theoretical underpinnings, focusing on feature-level, concept-level, mechanistic, and human-centric interpretability approaches. Rather than proposing a complete artificial brain architecture, we outline a research roadmap describing how explainability may support bidirectional knowledge transfer between neuroscience and AI, enabling more interpretable, adaptive, and cognitively aligned intelligent systems. We also incorporate a structured discussion of methodological scope, clarifying that this work follows a narrative perspective review with thematic synthesis of prior literature, rather than an empirical or systematic experimental study.

By positioning explainability as a bridge between mechanistic AI understanding, cognitive science, and interactive human guidance, this Perspective highlights the growing role of interpretable alignment in the development of next-generation brain-inspired AI systems.

2. Contemporary XAI Paradigms and Brain–AI Alignment

This section surveys contemporary XAI methodologies through the lens of mechanistic and human-centered brain–AI alignment. Rather than treating explainability solely as a transparency mechanism, we organize existing XAI paradigms according to the level at which they interpret neural representations, ranging from feature-level attribution to concept-level reasoning and human-centric cognitive alignment. This progression highlights how explainability methods may contribute not only to understanding AI systems, but also to bridging machine representations with human cognition and neuroscience-inspired learning principles.

Table 1 provides an illustrative comparison of predictive performance and relative interpretability characteristics on the Caltech-101 benchmark. While interpretable models can achieve competitive performance on smaller datasets, these results should not be interpreted as definitive evidence of scalability to large-scale or foundation-model settings such as ImageNet or multimodal generative architectures. We further clarify that this comparison is purely illustrative and not intended as a standardized benchmark across heterogeneous model classes, and interpretability ratings reflect qualitative categorization rather than quantitative evaluation.

2.1. Feature-Oriented Methods

Feature-based interpretability techniques, such as SHapley Additive exPlanations (SHAP) [29], Class Activation Maps (CAMs) [30], Grad-CAM [31], Grad-CAM++ [32], Global Attribution Mappings (GAMs) [33], and Gradient-based Saliency Maps [34], provide localized insights into machine learning model decision-making.

SHAP employs a game-theoretic framework to quantify the contribution of each feature to a model’s prediction, providing both local and global interpretability [29]. CAMs and their extensions, including Grad-CAM and Grad-CAM++, generate heatmaps highlighting influential regions in convolutional neural network (CNN) predictions [30,31,32]. GAMs extend this perspective by clustering similar local explanations to reveal broader attribution patterns across subpopulations [33]. Gradient-based Saliency Maps visualize influential input features through gradient magnitudes, identifying salient regions contributing to classification decisions [34].
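To make the game-theoretic idea behind SHAP concrete, the following minimal sketch computes exact Shapley values by enumerating every feature coalition. It is an illustration only, not the SHAP library or its approximation algorithms: the function names, the toy model, and the choice of replacing "missing" features with a fixed baseline are all assumptions introduced here, and the enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature subset.

    Assumption: a feature that is "absent" from a coalition is replaced
    by its baseline value. Other conventions exist.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Evaluate the model with only the coalition's features switched on.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    # Hypothetical model with one interaction term between features 0 and 1.
    return 2.0 * v[0] + 1.0 * v[1] + 0.5 * v[0] * v[1] - 3.0 * v[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features that share it.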

Recent studies also demonstrate the value of foundation models as feature extractors for interpretable downstream learning. Tomczyk et al. [35] show that lightweight latent-space methods can retain competitive accuracy with substantially reduced computational requirements, while IDEAL [36] illustrates how prototype-based approaches may support interpretable transfer learning without fine-tuning.

These methods collectively improve understanding of where decisions are made within the input space, but they often provide limited insight into how or why decisions emerge from internal neural representations, particularly in highly non-linear models. From a brain–AI alignment perspective, feature-oriented explanations resemble localized attentional mechanisms in biological perception, yet remain limited in their ability to capture hierarchical reasoning, causal structure, and higher-level cognitive processes. We explicitly note that such brain analogies are functional and informational rather than biologically or structurally equivalent, and should be interpreted as conceptual correspondences rather than direct neural mappings.

2.2. Layerwise Mechanistic Methods

Layerwise mechanistic interpretability techniques aim to uncover the internal operations of deep learning systems by analyzing representations and contributions at the level of neurons, layers, and latent activations. Representative approaches include Layer-Wise Relevance Propagation (LRP) [37], DeconvNet [38], and deep belief networks [39]. Recent works on energy landscape-aware vision transformers [11] and sparsity-aware pruning with dying neuron reactivation [10] further advance mechanistic understanding by revealing task-specific sensitivities and adaptive neuron behaviors.

LRP [37] propagates relevance scores backward through neural layers to generate heatmaps indicating the contribution of individual features to model outputs. DeconvNet [38] utilizes learned deconvolutional structures to map latent activations back into the input domain, revealing influential patterns contributing to classification. Deep belief networks [39] capture hierarchical latent representations across multiple layers, enabling analysis of progressively abstract feature learning. Meanwhile, energy landscape-aware models [11] and sparsity-aware pruning strategies [10] provide additional neuron-level insights into adaptive dynamics and model sensitivity.
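The backward redistribution of relevance described for LRP can be sketched on a very small network. The example below assumes the epsilon rule, a bias-free two-layer ReLU network, and hand-picked weights; all names and values are hypothetical, and the code is a didactic sketch rather than the implementation used in [37].

```python
def dense(x, w):
    # Linear layer without bias: w[i][j] connects input i to output j.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def lrp_layer(activations, weights, relevance_out, eps=1e-9):
    """Redistribute relevance from a layer's outputs to its inputs (epsilon rule)."""
    n_in, n_out = len(activations), len(relevance_out)
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        z = sum(activations[i] * weights[i][j] for i in range(n_in))
        z += eps if z >= 0 else -eps  # stabiliser avoids division by zero
        for i in range(n_in):
            # Each input receives a share proportional to its contribution to z.
            relevance_in[i] += activations[i] * weights[i][j] / z * relevance_out[j]
    return relevance_in


# Hypothetical toy network: 3 inputs -> 2 ReLU hidden units -> 1 output.
x = [1.0, 2.0, -1.0]
w1 = [[1.0, -1.0], [0.5, 1.0], [-1.0, 0.5]]
w2 = [[2.0], [-1.0]]

hidden = [max(0.0, z) for z in dense(x, w1)]
output = dense(hidden, w2)

# Start from the output score and propagate relevance back to the inputs.
r_hidden = lrp_layer(hidden, w2, output)
r_input = lrp_layer(x, w1, r_hidden)
print(output, r_input)
```

Because the network has no biases, the input relevances sum (up to the small stabiliser) to the output score, which is the conservation property that makes the resulting heatmaps readable as a decomposition of the prediction.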

These methods are particularly relevant to XAI2Brain because they move beyond surface-level attribution toward mechanistic analysis of internal neural representations. Such approaches may offer opportunities to compare computational structures in AI systems with principles observed in biological neural processing and cognitive organization. Nevertheless, scalability to large foundation and multimodal models remains a major challenge, motivating further research into efficient mechanistic interpretability frameworks. We further emphasize that such comparisons should be interpreted as structural abstractions rather than claims of neurobiological equivalence, and current evidence supports only limited correspondence at the level of functional representation.

2.3. Concept-Level Models

Concept-level interpretability techniques represent an important transition from low-level attribution toward cognitively meaningful representations. Methods such as Concept Relevance Propagation (CRP) and Concept Activation Vectors (CAVs) [40,41] attempt to associate latent neural representations with human-understandable concepts.

CRP extends Layer-Wise Relevance Propagation by tracing how semantically meaningful concepts contribute to model decisions [37,40]. While current CRP applications are predominantly demonstrated in vision models, extending them to large language and multimodal architectures remains an open research direction. Unlike feature-level heatmaps, concept-based approaches seek to provide insight into the semantic structure underlying AI reasoning processes.

CAVs provide a complementary perspective by associating high-level latent representations with user-defined concepts [41]. These methods quantify the sensitivity of model predictions to interpretable semantic directions in latent space, potentially revealing biases or unintended concept associations. Automatic concept-based explanation techniques further reduce reliance on manually curated concepts, although their effectiveness depends heavily on concept quality, separability, and contextual relevance [42].
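As a rough illustration of a semantic direction in latent space, the sketch below builds a concept vector and measures how sensitive a prediction is to movement along it. Two simplifying assumptions are made and should be read as such: the concept direction is taken as the normalised difference between mean activations of concept and random examples (the method in [41] instead fits a linear classifier), and the sensitivity is estimated by a finite difference on a hypothetical linear read-out head. All activations are made-up numbers.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def concept_vector(concept_acts, random_acts):
    """Unit vector pointing from random activations toward concept activations."""
    mc, mr = mean_vector(concept_acts), mean_vector(random_acts)
    diff = [a - b for a, b in zip(mc, mr)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def concept_sensitivity(head, activation, cav, step=1e-4):
    """Finite-difference directional derivative of the head along the concept."""
    moved = [a + step * c for a, c in zip(activation, cav)]
    return (head(moved) - head(activation)) / step


# Hypothetical 2-D latent activations for a user-defined concept and for random inputs.
concept_acts = [[2.0, 0.1], [1.8, -0.1], [2.2, 0.0]]
random_acts = [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]


def head(h):
    # Hypothetical class logit computed from the latent activation.
    return 3.0 * h[0] - 1.0 * h[1]


cav = concept_vector(concept_acts, random_acts)
sensitivity = concept_sensitivity(head, [0.5, 0.5], cav)
print(cav, sensitivity)
```

A positive sensitivity indicates that moving the latent activation toward the concept raises the class score; aggregating the sign of this quantity over many inputs is the kind of statistic used to flag unintended concept associations.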

From the perspective of brain–AI alignment, concept-level interpretability offers a potential interface between machine representations and human semantic reasoning. By connecting latent activations with cognitively meaningful abstractions, these approaches may support future efforts toward interpretable and neuroscience-inspired representation learning.

2.4. Surrogate Models

Model-agnostic explanation techniques provide a general framework for approximating the behavior of otherwise opaque AI systems. Among these, Sparse Linear Subset Explanations (SLISE) [43] and Local Interpretable Model-Agnostic Explanations (LIME) [44] are widely used for localized interpretability.

SLISE generates interpretable explanations without relying on synthetic perturbations, improving stability and applicability across different machine learning scenarios [43]. LIME approximates local decision boundaries using simplified surrogate models trained on perturbed input samples [44]. In image classification tasks, this process often involves segmenting images into superpixels and evaluating their influence on predictions.
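The perturb-and-fit idea behind LIME can be sketched for tabular inputs as follows. This is a simplified stand-in, not the LIME package: it assumes Gaussian perturbations, an exponential proximity kernel, and an unregularised weighted least-squares fit, and it omits feature selection and the interpretable binary representation (such as superpixels) used in practice. The function names, the black-box model, and all parameter values are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_surrogate(model, x, n_samples=500, scale=0.1, width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x; returns (intercept, slopes)."""
    rng = random.Random(seed)
    d = len(x)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]      # perturbed sample
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-dist2 / width ** 2)                  # closer samples count more
        row = [1.0] + [zi - xi for zi, xi in zip(z, x)]    # features centred on x
        y = model(z)
        # Accumulate the weighted normal equations.
        for i in range(d + 1):
            xty[i] += w * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += w * row[i] * row[j]
    coef = solve(xtx, xty)
    return coef[0], coef[1:]


def black_box(v):
    # Hypothetical non-linear model standing in for an opaque predictor.
    return v[0] ** 2 + 3.0 * v[1]


intercept, slopes = local_surrogate(black_box, [1.0, 2.0])
print(intercept, slopes)
```

Near the point `[1.0, 2.0]` the fitted slopes should approximate the local gradient of the black box. Changing `scale`, `width`, or `seed` changes the fit, which is the sensitivity to perturbation and sampling choices that affects surrogate explanations.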

Although these approaches improve accessibility and flexibility, their explanations can be sensitive to perturbation strategies, sampling choices, and local approximation assumptions. From a human-centered perspective, surrogate models provide simplified representations of otherwise inaccessible decision processes, but they may not fully capture the internal reasoning structure of complex neural systems. Improving robustness, faithfulness, and cognitive compatibility therefore remains an important challenge.

2.5. Human-Centric Methods

Despite significant progress in XAI, many existing methods still provide explanations that remain difficult for humans to meaningfully interpret and utilize in practice. Traditional XAI approaches frequently emphasize post hoc feature importance or spatial attribution maps [4], while providing limited support for reasoning processes involving abstraction, analogy, contextual understanding, and semantic association.

Human-centric explainability introduces an alternative perspective that prioritizes cognitively compatible representations and interactive understanding. Rather than decomposing information into isolated features or pixels, these approaches emphasize prototype-based reasoning, semantic comparison, and holistic interpretation of complex entities such as images, text, and multimedia content [21,45]. Such strategies more closely resemble how humans naturally organize and interpret information [46,47].

Recent developments further emphasize the importance of evaluating explainability in the context of human–AI interaction and collaborative decision-making [47,48]. In this setting, explainability is assessed not only by technical correctness, but also by its accessibility, usability, and ability to support human understanding in real-world workflows.

Within the XAI2Brain perspective, human-centric methods play a critical role in bridging mechanistic interpretability with human cognitive processes. These approaches shift explainability beyond static visualization toward interactive and cognitively aligned human–AI collaboration. However, empirical validation across diverse domains and user populations remains limited, motivating future research into standardized human-centered evaluation protocols.

2.6. Usability of Counterfactual Explanations

Counterfactual explanations are increasingly recognized for their ability to clarify AI decision-making by illustrating how modifications to inputs can alter model outputs. Frameworks such as Alien Zoo [49] evaluate the interpretability and usability of these explanations from a human-centered perspective. By presenting alternative scenarios, counterfactual explanations can improve user comprehension and facilitate more intuitive interaction with AI systems [50].
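For intuition, the simplest counterfactual setting is a linear classifier, where the smallest Euclidean change that flips the decision has a closed form: move the input along the weight vector until it lands just across the decision boundary. The sketch below assumes that setting. The weights, feature meanings, and `margin` parameter are hypothetical, and realistic counterfactual generators must additionally handle plausibility and actionability constraints, which this example ignores.

```python
def score(w, b, x):
    # Decision function of a linear classifier: positive means "accept".
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def nearest_counterfactual(w, b, x, margin=1e-3):
    """Smallest L2 change to x that moves the score just across the boundary."""
    s = score(w, b, x)
    target = -margin if s > 0 else margin      # land slightly on the other side
    step = (target - s) / sum(wi * wi for wi in w)
    return [xi + step * wi for xi, wi in zip(x, w)]


# Hypothetical two-feature example (e.g., income and debt, in arbitrary units).
w, b = [0.8, -0.5], -1.0
x = [1.0, 1.0]

x_cf = nearest_counterfactual(w, b, x)
print(score(w, b, x), x_cf, score(w, b, x_cf))
```

The difference between `x_cf` and `x` reads as a "what would need to change" statement, which is the form of explanation whose usability human-centered frameworks set out to evaluate.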

Despite these advantages, generating realistic, diverse, and computationally efficient counterfactuals remains challenging, particularly in high-dimensional and multimodal settings. From the perspective of brain–AI alignment, counterfactual reasoning is especially important because it resembles aspects of human causal reasoning and hypothetical thinking, both of which are central to cognitive decision-making processes. Developing scalable and cognitively meaningful counterfactual frameworks therefore remains an important direction for future interpretable AI research.

Collectively, these XAI paradigms reveal a progression from feature-level attribution toward concept-level and human-centered interpretability. From the perspective of XAI2Brain, this evolution reflects a broader transition from explaining isolated model outputs toward understanding internal representations, cognitive compatibility, and mechanistic alignment between AI systems and human reasoning processes. Although current approaches remain limited in scalability, causal reasoning, and neuroscientific grounding, they provide important foundations for future research in interpretable and brain-inspired AI systems.

3. Current Challenges in XAI

As machine learning models become prevalent in diverse sectors, the need for XAI becomes crucial. XAI aims to make these models transparent, accountable, and understandable to a broad audience, not just technical experts. This shift towards explainability seeks to improve model performance and accessibility, promoting a more democratic approach to AI. However, achieving this goal faces challenges, including technical obstacles and ethical considerations [51,52].

From the perspective of XAI2Brain, these challenges can be reframed as limitations in achieving mechanistically grounded and cognitively compatible explanations, where interpretability is expected not only to describe outputs but also to partially reflect internal representational and functional structure in a way that can be meaningfully related (but not equated) to human cognitive and neuroscientific principles.

3.1. Explainability of Generative Models

The exploration of generative models in AI, as depicted in Figure 2, underscores a significant leap forward in data synthesis and creative output generation. However, the complexity and opacity of these models present profound challenges in understanding and explaining their internal decision-making processes. Key limitations include generalization issues, computational inefficiencies, and trade-offs between interpretability and model performance, as highlighted in recent reviews. In addition, from a brain–AI alignment perspective, a fundamental limitation of current generative models is the lack of explicitly structured intermediate representations that are consistently aligned with interpretable cognitive or functional abstractions, limiting their role as mechanistic models of intelligence rather than purely behavioral generators.

Generative adversarial network. Generative Adversarial Networks (GANs) have revolutionized data generation with their dual-network architecture of a generator and a discriminator. Yet GANs are difficult to interpret: their non-linear, high-dimensional structure, combined with dynamic adversarial training, obscures how any particular output is produced. In the context of brain–AI alignment, GANs remain largely behaviorally interpretable but mechanistically opaque, as their latent spaces do not naturally correspond to cognitively meaningful or neuroscientifically grounded representations. Recent studies address these challenges by integrating XAI techniques, such as explainable evaluation frameworks and hybrid approaches with feedback-driven personalization, to enhance transparency without sacrificing performance [53,54].

Neural radiance field. Neural Radiance Fields (NeRFs) mark a significant breakthrough in 3D modeling from 2D images. Understanding how NeRF models work is difficult because they process high-dimensional data and reconstruct spatial information by opaque means. From a mechanistic interpretability standpoint, NeRF also lacks explicit disentangled representations of geometry and semantics that can be directly mapped to human spatial reasoning or perceptual cognition. This lack of intuitive interpretability is most limiting when identifying complex geometrical features and analyzing views with object occlusions. Advances include uncertainty visualization techniques such as NeRVis, which help quantify model confidence, and integrations with diffusion models for improved regularization [55,56,57].

Diffusion model. Diffusion models, at the forefront of generative AI for image and audio synthesis, encounter substantial explainability challenges. Their iterative generation process, akin to a multi-step chemical reaction, is difficult to follow from start to finish. In particular, iterative denoising, while effective for generation, does not naturally expose semantically stable intermediate representations that could be aligned with cognitive stages of perception or decision-making. Recent advancements aim to unravel these complexities by incorporating XAI methods, such as activation maps and counterfactual generation, improving both interpretability and output quality [58,59].
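One way to see why intermediate states are hard to interpret is to look at how much of the original signal survives at each step of the noising process that the denoiser learns to reverse. The sketch below assumes a DDPM-style forward process with a linear variance schedule, in which the state at step t is sqrt(alpha_bar_t) times the clean sample plus sqrt(1 - alpha_bar_t) times Gaussian noise. The schedule endpoints and step count are commonly used illustrative values, not parameters taken from the works cited here.

```python
import math


def signal_and_noise_scales(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return (sqrt(alpha_bar_t), sqrt(1 - alpha_bar_t)) for each step t."""
    scales = []
    alpha_bar = 1.0
    for t in range(num_steps):
        # Linear variance schedule (assumed); other schedules are also used.
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        alpha_bar *= 1.0 - beta
        scales.append((math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)))
    return scales


scales = signal_and_noise_scales()
for t in (0, 250, 500, 999):
    signal, noise = scales[t]
    print(f"step {t:4d}: signal scale {signal:.4f}, noise scale {noise:.4f}")
```

Under this schedule the signal scale falls monotonically toward zero, so most intermediate states are dominated by noise rather than by recognisable content.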

Differential privacy. The integration of differential privacy in generative models, while ensuring data privacy, also introduces explainability challenges. From a brain–AI alignment perspective, privacy-preserving mechanisms further obscure internal representations, making it more difficult to recover interpretable structures that could support cognitive or mechanistic analysis. Balancing transparency with privacy protection is a complex endeavor, often leading to obscured understandings of the models’ inner workings and degraded explanation quality. Ongoing research explores trade-offs, such as using federated learning to preserve privacy while maintaining interpretability, though challenges like data heterogeneity persist [60,61,62].

Large language model. Large language models (LLMs) such as GPT-4 demonstrate remarkable abilities in text generation and comprehension. However, their internal representations remain largely unstructured with respect to explicit cognitive or neuroscientific interpretability frameworks, making mechanistic interpretability an open and active research challenge. Their decision-making processes involve billions of parameters, so the rationale behind a specific output, or behind apparent language comprehension, remains difficult to recover given their scale and dynamic behavior. Recent efforts focus on local explainability, mechanistic interpretability, and frameworks to generate trustworthy explanations, addressing gaps in transparency and ethical alignment [63,64,65].

3.2. Trust and Reliance in AI Systems

Research on how human-centric explanations affect user trust in AI systems is gaining prominence. Studies of explanation types such as feature importance and counterfactuals reveal intricate dynamics between AI transparency and user reliance. These studies contribute to a nuanced understanding of how explanations influence user perceptions of, and reliance on, AI recommendations, which is crucial in contexts where AI aids critical decision-making [66].

Within the XAI2Brain framework, trust is interpreted as a downstream consequence of both functional transparency and the degree to which explanations reflect (at least partially) the internal structure of model representations, thereby linking psychological trust formation with mechanistic interpretability.

3.3. Responsible AI

Embedding complex human values and ethics in AI systems is a profound challenge, acknowledging the subjective and culturally dependent nature of these concepts. Within XAI2Brain, responsible AI is closely associated with the availability of interpretable and mechanistically grounded representations that enable human oversight beyond output-level auditing, extending toward intermediate representational and decision-process transparency, while explicitly acknowledging that such interpretability remains partial and approximate in current systems.

Developing ethically aligned AI requires a deep understanding of diverse cultural and moral frameworks, supported by emerging governance structures like the EU AI Act and NIST guidelines [67,68,69,70,71]. Responsible AI necessitates transparent systems that can clearly articulate their reasoning to build user trust in ethical decision-making [51].

The scrutiny of fairness in AI aims to detect and neutralize biases to prevent the perpetuation of social inequalities [72]. Accountability is crucial, requiring structures to hold AI systems and developers responsible for outcomes. Strengthening the ethical foundation of AI demands interdisciplinary collaboration across technology, humanities, and social sciences.

3.4. Ethical Implications of Explanations

The ethical landscape of XAI is complex, emphasizing the importance of addressing moral and societal impacts as AI systems advance. Within XAI2Brain, ethical risks are also shaped by the depth and fidelity of interpretability, since limited mechanistic transparency can constrain the ability to reliably audit alignment between model behavior, internal representations, and human intent, thereby increasing reliance on post hoc rather than structural explanations.

Key ethical considerations include the need for unbiased AI operations to ensure fair and just decisions across all demographics [73,74]. Transparency in AI mechanisms is essential for building user trust [72,75]. Ongoing discussions highlight the requirement for robust ethical frameworks as AI becomes deeply integrated into society [76]. The discourse on moral and ethical implications underscores the need for continuous exploration and dialogue [68,77,78].

4. From XAI to Artificial Brain

4.1. Understanding and Mimicking Brain Functionality

Challenges in Brain Complexity. Replicating the intricate neural processes of the human brain remains a significant hurdle for AI systems, as evident from recent studies highlighting the difficulty in simulating dynamic neural interplay [79,80,81]. Recent advancements emphasize that even minor architectural changes can make AI more brain-like, but scaling these to match biological complexity poses ongoing challenges, including energy efficiency and adaptability [67]. Critically, while progress has been made, the gap persists due to the brain’s ability to process sparse, noisy data in real-time, which current AI struggles to emulate without vast computational resources. This motivates a shift from purely performance-driven architectures toward brain-inspired, explainability-aware AI systems. From the XAI2Brain perspective, this gap is not only architectural but also representational: current AI systems lack stable, interpretable intermediate representations that can be systematically aligned with cognitive or neuroscientific abstractions in a mechanistically meaningful way.

Limitations of Neural Networks. Despite progress, current neural networks fall short in emulating the depth of learning and adaptability seen in the human brain, posing a hurdle in achieving true intelligence in AI systems [82,83,84]. For instance, AI models often require massive datasets for training, unlike the brain’s efficient learning from limited experiences. Recent research suggests redesigning AI architectures inspired by neuroscience could address this, but empirical evidence shows limitations in handling novel scenarios without overfitting [85]. These limitations further reinforce the need for hybrid neuro-symbolic and explainability-centric learning paradigms.

Explicability in Deep Learning. Deep learning models, often seen as black boxes, present a significant challenge in explicability. Balancing advanced capabilities with the need for understanding their decision-making process is a focal point in Explainable AI (XAI) research, particularly important for applications demanding trust and transparency [51,86]. However, achieving this balance often incurs trade-offs in performance, and XAI methods may not fully capture the non-linear interactions in deep models. Future directions should integrate neuroscientific validation to ensure explanations align with cognitive processes. This alignment is essential for the transition from XAI systems to cognitively grounded artificial brain models. We further note that such alignment should be interpreted as functional and representational correspondence rather than direct equivalence between artificial and biological neural mechanisms.

4.2. AI Consciousness and Cognition

Theories of AI consciousness. The exploration of consciousness through frameworks such as Integrated Information Theory (IIT) and Attention Schema Theory (AST) offers profound insights into the essence of awareness. IIT posits that consciousness emerges from the capacity of a system to process integrated information [87], whereas AST considers consciousness as a byproduct of the brain’s model of attention [88]. These theories provide a foundational understanding that could extend to the realm of artificial intelligence, suggesting pathways to interpret AI behaviors within the context of XAI. However, recent debates argue that current AI lacks true consciousness, viewing claims otherwise as illusions rooted in anthropomorphism [5,81].

AI sentience and cognition. The ongoing debate surrounding AI’s potential for sentience, especially in the context of LLMs, enriches the discourse on consciousness. This conversation spans the ethical, philosophical, and practical dimensions of endowing AI with ‘sentience,’ including the rights and duties this might imply [89]. As LLMs display increasingly complex behaviors, the line between programmed responses and genuine cognitive processes becomes blurred, presenting a unique challenge for XAI in providing clear explanations for AI actions that mimic conscious decisions [64,90]. Empirical studies, however, indicate that AI behaviors emerge from training data rather than from intrinsic awareness [28]. This reinforces the importance of XAI as a diagnostic framework rather than a consciousness attribution mechanism.

Theory of mind in AI. The progression of LLMs towards understanding scenarios indicative of a Theory of Mind (ToM) underscores the sophistication of AI cognitive models. This evolution provokes pivotal inquiries about AI’s level of ‘understanding’ and its implications for developing XAI frameworks that can explain AI decisions in human-centric terms [91,92]. Limitations include AI’s inability to generalize ToM beyond trained contexts, highlighting the need for diverse datasets and evaluation metrics.

AI vs. human cognition. The comparison between the cognitive processes of LLMs and humans remains a fertile area of inquiry. Investigating how AI systems and humans differ in processing information and interpreting emotions highlights the complexities involved in making AI’s decision-making processes transparent and understandable through XAI [93,94]. Recent analyses reveal AI excels in pattern recognition but falters in causal reasoning, suggesting XAI should incorporate cognitive science to bridge these gaps. Bridging this gap is central to the proposed XAI2Brain paradigm, where explanation systems are co-designed with cognitive principles.

Evaluating AI consciousness. Crafting metrics to evaluate AI consciousness and its understanding of ToM is an emerging challenge. This endeavor necessitates interdisciplinary collaboration, drawing on neuroscience, psychology, and computer science to forge evaluation tools that not only assess AI’s cognitive capabilities but also its ability to make decisions in a manner that is explainable and interpretable to humans [95]. The quest for such metrics is integral to advancing XAI, ensuring that as AI systems grow more complex and ostensibly ‘conscious,’ they remain accountable and comprehensible to the people who use them. Challenges include the subjectivity of consciousness, with calls for agnostic approaches until empirical evidence solidifies [5]. Future work should prioritise falsifiable, behavior-grounded evaluation protocols rather than subjective consciousness claims.

4.3. Emotional AI Evolution

Emotional Intelligence in AI. Efforts to integrate emotional intelligence into AI, particularly large language models, mark a significant shift towards recognizing the role of emotions in human cognition. This development aims to bridge the gap between humans and machines, promising a more relatable and intuitive user experience. However, ethical considerations arise regarding the authenticity of AI’s emotional understanding and potential misperceptions about its empathetic capabilities [76,96]. Recent advancements in 2025 show AI achieving emotional responsiveness, but risks include dependency and emotional manipulation [97,98,99].

Algorithmic Emotional Complexity. As AI evolves to simulate human emotions, algorithms must increasingly consider cultural and situational nuances. Recognizing and responding appropriately to the variability and subjectivity of human emotions requires advanced programming approaches. XAI plays a crucial role in making emotional AI’s underlying mechanisms transparent and understandable to users, addressing the need for a sophisticated framework [100,101]. Limitations involve biases in emotion datasets, which XAI can help mitigate through attribution analysis.

Simulated Empathy Limitations. AI’s attempts to simulate empathy highlight the fundamental absence of genuine consciousness. Recognizing this distinction is crucial for setting realistic expectations and ensuring ethically responsible deployment of emotional AI. XAI contributes by clarifying the extent and limitations of AI’s emotional intelligence, providing insights into how AI interprets and reacts to emotional cues [96]. Empirical case studies reveal that over-reliance on simulated empathy can erode human relationships, necessitating guidelines.

Emotional AI in Healthcare. The use of emotional AI in healthcare offers personalized emotional support, enhancing patient care. However, challenges arise from depersonalized interactions and the risk of oversimplifying human emotions, impacting patient trust. XAI becomes essential in elucidating the decision-making processes of emotional AI systems, aligning technology more closely with patient needs and expectations in healthcare settings [102]. Recent trials indicate benefits in mental health apps, but ethical oversight is critical to prevent harm.

4.4. The Personality of AI

Conceptualizing AI Personality. Exploring AI’s embodiment of human-like traits raises interdisciplinary questions in psychology, AI ethics, and human–computer interaction [98,103]. As AI integrates into social and professional realms, the concept of AI personalities influences user experience, prompting considerations about transparency and predictability. XAI emerges as a crucial framework for understanding how personality traits impact AI decision-making. Ethical implications include the risk of users forming unhealthy attachments [104,105].

Technical Aspects of AI Personality Simulation. Work on the algorithms and methodologies central to AI’s interactive capabilities [96,106] aims to refine AI personalities for enhanced relatability and engagement. From an XAI perspective, elucidating these technical mechanisms is vital to demystifying how personality traits are modeled and manifested in AI interactions, ensuring not only relatability but also comprehensibility of AI responses and decisions. Recent simulations of 1,052 personalities demonstrate accuracy but highlight privacy concerns [107].

Practical Implications and Case Studies. Real-world examples and case studies shed light on the practical implications of infusing personality into AI systems [108,109]. Analyzing these cases through an XAI lens provides insights into how personalities impact interpretability and user experience. Understanding these outcomes helps craft AI personalities that enhance transparency and rationale, aligning with XAI’s goal of making AI decisions more interpretable and justifiable to users. Limitations include cultural biases in personality models, suggesting diverse training data as a future direction.

4.5. Creating Biologically Plausible AI Models

Bridging the Biological–Computational Model Gap. Closing the disparity between biological neural networks and computational AI models is a formidable challenge for advancing artificial intelligence. Significant differences in information processing highlight the complexity of replicating biological functions in AI systems [81,84,110]. Studying these differences also offers insights for improving AI interpretability and transparency, both crucial aspects of XAI.

Complexity in Biological Integration. Developing AI models mirroring biological neural networks is intricate. These networks balance electrical and chemical signals, adapt through learning, and dynamically reconfigure—processes challenging to emulate in computational algorithms. Achieving biologically plausible models demands a multidisciplinary approach, combining insights from neuroscience, cognitive science, and computer engineering to closely mirror the nuanced functionality of the human brain [81,84,111]. Such biologically grounded modelling is increasingly viewed as a prerequisite for next-generation explainable AI systems.

4.6. Human–AI Interaction and Cognitive Alignment

Bridging the gap between AI processing and human cognitive and communicative styles is a central challenge for realizing interactive, human-in-the-loop AI systems [112,113]. For AI to effectively assist humans, it must not only perform tasks accurately but also communicate its reasoning in a way that aligns with human comprehension and expectations.

AI Communication. A key aspect of interactive AI is the ability to translate complex model reasoning into forms that humans can easily understand and act upon [112,113]. This involves adapting explanations to human cognitive norms and preferences, enabling users to form mental models of AI behavior and trust its recommendations. Limitations include cultural differences in interpretation, requiring personalized XAI.

Human-Centric XAI. Human-centric design lies at the heart of explainable AI, emphasizing interfaces and explanations that are intuitive, cognitively aligned, and accessible to both experts and non-experts [73,114,115]. By aligning AI explanations with human reasoning, these approaches facilitate a collaborative learning loop in which humans can guide AI systems while AI systems provide insights that enrich human understanding.

Learning from Human Cognition. Effective human-in-the-loop XAI requires learning from human cognitive strategies, including pattern recognition, analogy-making, and concept abstraction [116,117,118,119,120]. Integrating these strategies into AI explanation mechanisms enables bidirectional interaction: humans better understand AI reasoning, while AI adapts to human instructions and true intentions, enhancing agency and collaborative decision-making.

Precision-Interpretability Balance. Achieving high accuracy while maintaining interpretability is critical, particularly in high-stakes domains such as healthcare and law [48,121,122]. Human-in-the-loop XAI leverages this balance to enable real-time interaction and iterative learning, where AI provides actionable insights and humans provide corrective guidance, fostering trust, accountability, and cognitive alignment.

Towards Interactive Cognitive Alignment. Collectively, these human–AI interaction strategies underscore the importance of designing AI systems that are not only explainable but also interactive and cognitively aligned with human users. This alignment forms the foundation for the XAI2Brain vision: AI systems that evolve through bidirectional learning, offering insights into both neural mechanisms and human cognition, ultimately supporting the development of artificial brains and human-like intelligence. This establishes a unified transition from XAI methodologies to brain-inspired artificial cognition systems.

4.7. Learning from the Brain to Enhance AI

Neuroscientific principles in XAI. As the demand for transparency in AI systems grows, so does the field of XAI [123]. Techniques like Local Interpretable Model-Agnostic Explanations (LIME) [66] and SHapley Additive exPlanations (SHAP) [29] have been instrumental in demystifying the decision-making processes of complex models. Additionally, methods such as Layer-wise Relevance Propagation (LRP) [37] and Integrated Gradients (IG) [124] offer deeper insights into the influence of input features in specific model architectures. These methods can be seen as parallel to how neuroscientists attempt to decipher neural activity in the brain. Recent synergies integrate AI with neuroscience for brain research, enhancing XAI’s biological plausibility [5,9,125]. From the XAI2Brain perspective, these techniques can be interpreted as partial and task-specific probes of internal representations rather than full mechanistic descriptions of model cognition, and thus provide only limited but useful correspondences to neuroscientific measurement tools.
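
As a minimal illustration of how one of these attribution methods operates, the following Python sketch approximates Integrated Gradients for a single logistic unit. The model, its weights, the all-zero baseline, and the function names are hypothetical choices made for this example and are not taken from the cited works; the path integral is approximated with a midpoint Riemann sum.

```python
import math

# Hypothetical toy model: one logistic unit with made-up weights.
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = 0.1


def model(x):
    """Return the logistic output for an input vector x."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """Analytic gradient of the logistic unit: f(x) * (1 - f(x)) * w."""
    y = model(x)
    return [y * (1.0 - y) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate Integrated Gradients along the straight path
    from baseline to x using a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference input
attributions = integrated_gradients(x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

Under these assumptions, the attributions sum (up to numerical error) to the difference between the model output at the input and at the baseline. The result depends on the chosen baseline, which is one practical sense in which such attributions act as task-specific probes.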

Incorporating neuroscientific insights. Integrating findings from neuroscience into AI development is challenging due to the complex nature of neural mechanisms, as highlighted in recent interdisciplinary studies [81,94]. Work on extended minds with generative AI proposes hybrid systems, though translation barriers persist.

Bridging research fields. The challenge of bridging methodologies and terminologies between AI and neuroscience remains significant, as reported in recent collaborative efforts [126,127]. Future directions include shared benchmarks to facilitate knowledge transfer.

Neurodiversity and XAI adaptability. XAI’s adaptability is tested across various domains, each with unique requirements for interpretability. In healthcare, for instance, explainable models can aid clinicians in understanding AI-assisted diagnoses [128]. In finance, they can help in clarifying credit scoring models for both providers and consumers [129]. Additionally, XAI’s role is becoming increasingly important in fields like autonomous driving and environmental modeling, where decisions have significant implications for safety and sustainability. Customizing XAI tools to meet these diverse needs is paramount for their effective integration into different sectors. Limitations involve domain-specific biases, requiring inclusive design.
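
To make the credit scoring case concrete, the sketch below computes exact Shapley values, the game-theoretic quantity underlying SHAP, for a three-feature scoring rule. The rule, its feature names, and its point values are invented purely for illustration, and exact enumeration over feature orderings is feasible only for a handful of features.

```python
from itertools import permutations

FEATURES = ["income", "debt", "history"]


def credit_score(present):
    """Hypothetical scoring rule, invented for illustration.

    `present` is the set of features revealed to the model; absent
    features contribute nothing beyond the base score.
    """
    score = 500.0
    if "income" in present:
        score += 80.0
    if "debt" in present:
        score -= 60.0
    if "history" in present:
        score += 40.0
    if "income" in present and "debt" in present:
        score += 20.0  # interaction: income softens the debt penalty
    return score


def shapley_values(value_fn, features):
    """Average each feature's marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        seen = set()
        for f in order:
            before = value_fn(seen)
            seen.add(f)
            totals[f] += value_fn(seen) - before
    return {f: t / len(orderings) for f, t in totals.items()}


phi = shapley_values(credit_score, FEATURES)
# phi == {"income": 90.0, "debt": -50.0, "history": 40.0}
```

In this toy rule, the 20-point interaction is split equally between income and debt, and the three values sum to the 80-point gap between the full score and the base score. An explanation of this kind could be shown to both providers and consumers, although real scoring models would require approximate rather than exact computation.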

Real-world XAI applicability. XAI methodologies must prove their worth in real-world scenarios [73]. Their applicability and scalability are crucial, especially when dealing with large and complex AI systems [130]. Scalable XAI solutions are essential for widespread adoption, ensuring that interpretability does not come at the cost of reduced performance or increased resource demands.

AI visualization and brain imaging. Visualization techniques such as saliency maps and Gradient × Input are crucial in enhancing the interpretability of AI models [34,131]. Similar approaches are used in brain imaging (e.g., fMRI and PET scans) to identify which areas of the brain are activated during specific tasks, offering a window into the brain’s decision-making processes.
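
The difference between a plain gradient saliency score and a gradient-times-input score can be sketched in a few lines of Python. The scalar model below and its coefficients are hypothetical, and gradients are estimated by central finite differences so that the example does not depend on any deep learning library.

```python
import math


def toy_model(x):
    """Hypothetical scalar model: one tanh hidden unit plus a linear term."""
    hidden = math.tanh(2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2])
    return 3.0 * hidden + 0.2 * x[2]


def numerical_gradient(f, x, eps=1e-5):
    """Estimate the gradient of f at x by central finite differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def saliency(f, x):
    """Saliency score: absolute value of each partial derivative."""
    return [abs(g) for g in numerical_gradient(f, x)]


def gradient_times_input(f, x):
    """Gradient-times-input score: partial derivative scaled by the input."""
    return [g * xi for g, xi in zip(numerical_gradient(f, x), x)]


x = [0.3, -0.2, 0.0]
sal = saliency(toy_model, x)
gxi = gradient_times_input(toy_model, x)
```

In this example the third feature has a nonzero saliency score but a gradient-times-input score of zero because its input value is zero, which shows that the two visualizations answer slightly different questions about the same model.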

Cognitive neuroscience in XAI evaluation. The evaluation of XAI methodologies often involves sensitivity analysis to assess their robustness [132]. This analysis helps determine the reliability of different XAI methods and identify their strengths and limitations [134,135,136]. Moreover, user studies are becoming essential in evaluating the effectiveness of XAI, especially in terms of how different stakeholders perceive and interact with AI explanations [133].
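
One simple form that such a sensitivity analysis can take is to measure how much an explanation changes when the input is perturbed slightly. The following sketch is an assumption-laden illustration rather than a procedure from the cited works: the model, the gradient-times-input explainer, the perturbation radius, and the function names are all hypothetical.

```python
import math
import random


def toy_model(x):
    """Hypothetical two-input model used only for illustration."""
    return math.tanh(1.2 * x[0] - 0.7 * x[1])


def explain(x, eps=1e-5):
    """Gradient-times-input attribution from central finite differences."""
    scores = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad = (toy_model(up) - toy_model(down)) / (2.0 * eps)
        scores.append(grad * x[i])
    return scores


def max_sensitivity(explain_fn, x, radius, trials=100, seed=0):
    """Largest Euclidean change in the explanation over random
    perturbations drawn uniformly from [-radius, radius] per feature."""
    rng = random.Random(seed)
    reference = explain_fn(x)
    worst = 0.0
    for _ in range(trials):
        noisy = [xi + rng.uniform(-radius, radius) for xi in x]
        shifted = explain_fn(noisy)
        distance = math.sqrt(
            sum((a - b) ** 2 for a, b in zip(reference, shifted))
        )
        worst = max(worst, distance)
    return worst


x = [0.5, -0.3]
stability = max_sensitivity(explain, x, radius=0.05)
```

A smaller value indicates an explanation that is more stable around the chosen input. The score says nothing about whether the explanation is faithful or useful to people, which is why it complements rather than replaces user studies.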

5. Further Discussion

5.1. Artificial General Intelligence

The pursuit of Artificial General Intelligence (AGI) aims to create systems with broad, adaptive intelligence akin to human cognition that extends beyond narrow tasks, as illustrated in Figure 3. This involves enhancing machine learning models with advanced neural networks, particularly hybrid models that blend deep learning and symbolic reasoning to foster generalization and abstract reasoning [137]. Recent advancements, such as large language models developed by OpenAI and Anthropic, have demonstrated significant progress in multi-task capabilities, though their behavior remains task-dependent and does not constitute general intelligence in the human cognitive sense [138]. From the XAI2Brain perspective, current AGI research should be interpreted as progress toward increasingly generalizable pattern learning systems rather than evidence of emerging human-like cognition or mechanistically grounded intelligence comparable to biological systems.

Integrating emotional intelligence into AI systems enhances human–AI interaction through affective computing, enabling systems to interpret and respond to human emotions more effectively [139]. However, challenges persist in accurately modeling nuanced emotional states, and current systems often rely on pattern recognition rather than genuine understanding. Ethical considerations are crucial, requiring comprehensive guidelines and governance frameworks to address potential risks, including AI misuse, societal disruption, and unintended consequences such as job displacement [140]. Importantly, debates around AI consciousness and rights remain speculative and are not grounded in the current mechanistic understanding of AI systems. Recent discussions emphasize the need for interdisciplinary collaboration to mitigate biases and ensure equitable deployment [2,104,105].

Scalability and energy efficiency remain key research areas for AGI systems. The focus is on developing intelligent, adaptable, scalable, and energy-efficient architectures using innovative hardware technologies, such as neuromorphic computing, and optimized computational techniques [141]. Despite promising developments, limitations including high computational costs and data requirements hinder progress. Current progress should therefore be interpreted as incremental advances toward more capable AI systems rather than convergence toward fully realized AGI. Addressing these interconnected challenges—technical, ethical, and practical—is essential for advancing AI responsibly, with ongoing efforts expected to deliver increasingly capable systems in the coming decades, benefiting applications in healthcare, education, and scientific discovery.

5.2. Neuro-AI Interface

The field of neuro-AI interfaces brings together neuroscience and artificial intelligence to decode and translate neural signals, bridging the gap between biology and digital systems. This interdisciplinary junction holds promise for advancements in healthcare, technology, and cognitive science, significantly improving our understanding of cerebral mechanisms [142]. In neuroprosthetics, these interfaces are helping to restore and augment motor, sensory, and cognitive functions, granting individuals with disabilities new interaction capabilities, as evidenced by recent clinical demonstrations of implantable devices [138]. Moreover, they play a crucial role in analyzing neural activity patterns, with AI enabling improved interpretation of complex brain datasets and contributing to progress in research on neurological disorders such as Parkinson’s disease and epilepsy [143,144]. From the XAI2Brain perspective, neuro-AI interfaces can also be viewed as extreme cases of brain–AI alignment, where the correspondence between neural signals and computational representations becomes explicit, although current systems still rely on partial, noisy, and task-specific mappings rather than full bidirectional interpretability.

As neuro-AI systems advance toward more seamless human–device interaction, including thought-based communication paradigms, they raise important questions regarding privacy, autonomy, and data governance [145]. However, many of these capabilities remain experimental or limited to controlled clinical settings rather than general-purpose deployment. Future directions include developing non-invasive alternatives to improve accessibility and safety, while addressing limitations such as signal noise, robustness, and long-term biocompatibility. Regulatory frameworks must evolve to balance innovation with safety, ensuring equitable access and mitigating potential societal disparities [146].

5.3. Deciphering the Brain Mysteries with AI

The convergence of AI and neuroscience is increasingly enabling new tools for analyzing brain data, including advances in neural decoding and brain mapping technologies [81]. Leveraging AI’s capacity to model complex patterns, researchers are investigating neural mechanisms underlying cognition, perception, and behavior, extending beyond prediction toward scientific interpretation.

Central to this interdisciplinary collaboration are several key areas of exploration. For instance, AI-assisted studies of memory formation may improve understanding of cognitive decline and disorders such as Alzheimer’s disease, although clinical translation remains an ongoing research challenge [81]. Moreover, computational models of learning provide new perspectives on how information is acquired and processed, informing educational technologies and cognitive modeling approaches, while still falling short of fully capturing human intuition [93].

Importantly, such models should be interpreted as approximations of cognitive functions rather than faithful reproductions of biological neural processes, and current evidence supports only partial correspondence at the level of functional behavior.
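
One widely used way to quantify this kind of partial correspondence is representational similarity analysis (RSA), which compares the pattern of pairwise dissimilarities between stimuli in a model with the corresponding pattern in neural recordings. The sketch below is a minimal pure-Python illustration; the response values are synthetic placeholders rather than measured data, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rdm_upper(responses):
    """Upper triangle of the dissimilarity matrix (1 - Pearson r)
    between the response vectors of every pair of stimuli."""
    n = len(responses)
    return [
        1.0 - pearson(responses[i], responses[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def rsa_score(model_responses, brain_responses):
    """Spearman correlation between the two dissimilarity patterns."""
    return pearson(
        ranks(rdm_upper(model_responses)),
        ranks(rdm_upper(brain_responses)),
    )


# Synthetic placeholders: 4 stimuli, 3 model units, 4 recording channels.
model_responses = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3],
                   [0.0, 1.0, 0.5], [0.1, 0.8, 0.9]]
brain_responses = [[0.8, 0.1, 0.1, 0.3], [0.7, 0.2, 0.2, 0.3],
                   [0.1, 0.9, 0.6, 0.2], [0.2, 0.7, 0.8, 0.4]]
score = rsa_score(model_responses, brain_responses)
```

Because the comparison operates on stimulus-by-stimulus dissimilarities, the two systems may have different numbers of units or channels. A high score indicates similar representational geometry for the tested stimuli only; it does not imply that the underlying mechanisms are equivalent.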

This convergence also supports ongoing scientific inquiry into consciousness, although whether subjective experience can be fully captured by computational models remains an open philosophical and scientific question. Additionally, AI-driven simulation of neural systems is contributing to advances in neurotechnology, including brain-computer interfaces and neurostimulation research, with the goal of improving neurological healthcare outcomes [142]. Overall, the integration of AI and neuroscience should be viewed as a bidirectional research program aimed at improving models of brain function rather than a direct pathway to replicating human-like cognition.

5.4. Human-like Intelligence

Human-Like Intelligence (HLI) in AI research explores the extent to which machines can exhibit human-inspired cognitive and affective behaviors, including perception, reasoning, and emotion modeling [146]. The study of consciousness in AI involves designing systems with advanced representational capabilities and introspective modeling, although such systems do not possess self-awareness in the human sense and remain fundamentally computational [87,90]. From the XAI2Brain perspective, Human-Like Intelligence should be understood as a spectrum of functional mimicry rather than equivalence to human cognition, with emphasis on interpretability, alignment, and controllable behavior rather than claims of genuine cognitive equivalence.

HLI also emphasizes emotionally aware systems, integrating natural language processing, emotion recognition, and context-aware computation [146]. This has potential applications in assistive technologies, customer interaction systems, and mental health support, enabling more adaptive human–AI interaction. However, current systems simulate affective responses without experiencing emotions, and their behavior is strongly dependent on training data distributions. Accordingly, HLI should be interpreted as a design aspiration for more human-compatible AI systems rather than a literal reproduction of human cognition.

The broader development of HLI is inherently multidisciplinary, involving AI, neuroscience, psychology, and ethics. This convergence aims to improve human–AI interaction quality and interpretability, rather than to replicate human consciousness or subjective experience. Future research is expected to focus on hybrid cognitive systems and improved interpretability mechanisms that enhance collaboration between humans and AI systems while maintaining clear conceptual separation between simulation and biological cognition.

6. Conclusions

In conclusion, this Perspective highlights the critical role of Explainable AI (XAI) in demystifying AI decision-making, improving transparency in complex models, and supporting responsible AI development in high-impact domains. Rather than focusing solely on explanation as a post hoc tool, we emphasize its emerging role in enabling mechanistic understanding, human-centered interaction, and cognitive alignment between artificial and biological intelligence. From the XAI2Brain perspective, explainability should be understood as a multi-level alignment process linking internal model representations, human interpretability, and neuroscience-inspired abstractions, rather than a purely visualization- or attribution-based mechanism.

While XAI has advanced significantly in recent years, it continues to face fundamental limitations in scalability, causal reasoning, and evaluation, particularly in large-scale generative and multimodal models. These challenges highlight the need for more robust, standardized, and cognitively grounded evaluation frameworks. Instead of positioning Artificial General Intelligence (AGI) or Human-Like Intelligence (HLI) as immediate or necessary targets, we frame the development of explainable and interpretable AI systems as a gradual progression toward more adaptive and cognitively aligned intelligence. This Perspective does not assume the emergence of machine consciousness, but instead focuses on improving the interpretability of internal representations and their alignment with human reasoning processes.

The integration of AI with neuroscience continues to yield promising interdisciplinary insights, including parallels between neural computation and artificial learning systems, advances in brain-inspired architectures, and emerging directions in neuro-AI interfaces. These developments support the broader vision of using AI not to replicate human cognition, but to better understand, model, and interact with it through interpretable computational systems. Importantly, current neuroscience-inspired AI approaches should be interpreted as functional analogies rather than structural equivalences, as biological neural systems and artificial networks differ substantially in learning dynamics, energy efficiency, and representational mechanisms. Multidisciplinary collaborations across neuroscience, cognitive science, machine learning, and ethics remain essential for advancing this field. In particular, future research should prioritize mechanistically grounded XAI methods, cognitively meaningful evaluation frameworks, and human-centered design principles that ensure AI systems remain interpretable, controllable, and socially aligned.

In summary, XAI2Brain provides a conceptual roadmap toward brain–AI alignment through mechanistic interpretability and human-in-the-loop intelligence, positioning explainability as a foundational component for next-generation AI systems rather than a supplementary tool.

Author Contributions

Conceptualization, Y.Z., B.W., P.A., Q.N. and R.J.; methodology, Y.Z. and B.W.; formal analysis, Y.Z. and B.W.; investigation, Y.Z. and B.W.; resources, P.A., Q.N. and R.J.; writing—original draft preparation, Y.Z. and B.W.; writing—review and editing, Y.Z., B.W., P.A., Q.N. and R.J.; visualization, Y.Z.; supervision, P.A., Q.N. and R.J.; project administration, R.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the UK EPSRC under Grant EP/P009727/2, and the Leverhulme Trust under Grant RF-2019-492.

Data Availability Statement

No new data were created or analyzed in this Perspective article. The study is a theoretical survey and discussion based on previously published literature. Performance data presented in Table 1 are reproduced from the cited sources for comparative purposes.

Acknowledgments

During the preparation of this manuscript, the authors used Grok 4 (developed by xAI) for the purposes of refining and improving the structure, language, and clarity of the text, as well as generating suggestions for section enhancements. The authors have reviewed and edited the output produced by the tool and take full responsibility for the content of this publication. The authors thank the LIRA Centre at Lancaster University for providing a supportive research environment.

Conflicts of Interest

The authors declare no conflicts of interest. No funders played any role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Ehsan, U.; Liao, Q.V.; Muller, M.; Riedl, M.O.; Weisz, J.D. Expanding explainability: Towards social transparency in ai systems. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama Japan, 8–13 May 2021; ACM: New York, NY, USA, 2021; pp. 1–19. [Google Scholar]
Angelov, P.P.; Soares, E.A.; Jiang, R.; Arnold, N.I.; Atkinson, P.M. Explainable artificial intelligence: An analytical review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2021, 11, e1424. [Google Scholar] [CrossRef]
Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Del Ser, J.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inf. Fusion 2024, 106, 102301. [Google Scholar] [CrossRef]
Saeed, W.; Omlin, C. Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 2023, 263, 110273. [Google Scholar] [CrossRef]
Bartle, A.S.; Jiang, Z.; Jiang, R.; Bouridane, A.; Almaadeed, S. A critical appraisal on deep neural networks: Bridge the gap between deep learning and neuroscience via XAI. In Handbook on Computer Learning and Intelligence: Volume 2: Deep Learning, Intelligent Control and Evolutionary Computation; World Scientific Publishing: Singapore, 2022; pp. 619–634. [Google Scholar]
Zhang, Z.; Aggarwal, V.; Angelov, P.; Jiang, R. Modeling Brain Aging with Explainable Triamese ViT: Towards Deeper Insights into Autism Disorder. IEEE J. Biomed. Health Inform. 2025, 29, 8409–8422. [Google Scholar] [CrossRef] [PubMed]
Ronca, V.; Castagneto Gissey, L.; Bellini, M.I.; Iodice, A.; Aricò, P.; Di Flumeri, G.; Giorgi, A.; Vozzi, A.; Capotorto, R.; Bonelli, S.; et al. Mutual information-based teamwork evaluation in real-world environments: An exploratory investigation with professional surgeons. Front. Netw. Physiol. 2025, 5, 1608824. [Google Scholar] [CrossRef] [PubMed]
Vozzi, A.; Ronca, V.; Malerba, P.; Ghiselli, S.; Murri, A.; Pizzol, E.; Babiloni, F.; Cuda, D. An innovative method for trans-impedance matrix interpretation in hearing pathologies discrimination. Med. Eng. Phys. 2022, 102, 103771. [Google Scholar] [CrossRef] [PubMed]
Jiang, Z.; Wang, Y.; Li, C.T.; Angelov, P.; Jiang, R. Delve into neural activations: Toward understanding dying neurons. IEEE Trans. Artif. Intell. 2022, 4, 959–971. [Google Scholar] [CrossRef]
Wang, B.; Jiang, R. DNR-Pruning: Sparsity-Aware Pruning via Dying Neuron Reactivation in Convolutional Neural Networks. Trans. Mach. Learn. Res. 2025. [Google Scholar]
Xia, R.; Jiang, R. Energy Landscape-Aware Vision Transformers: Layerwise Dynamics and Adaptive Task-Specific Training via Hopfield States. In Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego, CA, USA, 2–7 December 2025. [Google Scholar]
Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. [Google Scholar] [CrossRef]
Schmid, U.; Wrede, B. What is Missing in XAI So Far? An Interdisciplinary Perspective. KI-Künstliche Intell. 2022, 36, 303–315. [Google Scholar] [CrossRef]
Bengio, Y.; Lecun, Y.; Hinton, G. Deep learning for AI. Commun. ACM 2021, 64, 58–65. [Google Scholar] [CrossRef]
Hindennach, S.; Shi, L.; Miletić, F.; Bulling, A. Mindful Explanations: Prevalence and Impact of Mind Attribution in XAI Research. arXiv 2023, arXiv:2312.12119.
Suffian, M.; Kuhl, U.; Alonso-Moral, J.M.; Bogliolo, A. Toward enriched Cognitive Learning with XAI. arXiv 2023, arXiv:2312.12290.
Mehboob, F.; Rauf, A.; Jiang, R.; Saudagar, A.K.J.; Malik, K.M.; Khan, M.B.; Hasnat, M.H.A.; AlTameem, A.; AlKhathami, M. Towards robust diagnosis of COVID-19 using vision self-attention transformer. Sci. Rep. 2022, 12, 8922.
Dinakaran, R.; Zhang, L.; Li, C.T.; Bouridane, A.; Jiang, R. Robust and fair undersea target detection with automated underwater vehicles for biodiversity data collection. Remote Sens. 2022, 14, 3680.
Shen, A.; Zhu, Y.; Angelov, P.; Jiang, R. Marine debris detection in satellite surveillance using attention mechanisms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4320–4330.
Kabir, H.D.; Abdar, M.; Khosravi, A.; Jalali, S.M.J.; Atiya, A.F.; Nahavandi, S.; Srinivasan, D. Spinalnet: Deep neural network with gradual input. IEEE Trans. Artif. Intell. 2022, 4, 1165–1177.
Angelov, P.; Soares, E. Towards explainable deep neural networks (xDNN). Neural Netw. 2020, 130, 185–194.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 770–778.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
Peterson, L.E. K-nearest neighbor. Scholarpedia 2009, 4, 1883.
Quinlan, J.R. Bagging, boosting, and C4.5. In Proceedings of the AAAI-96 Proceedings, Portland, OR, USA, 4–8 August 1996; AAAI: Washington, DC, USA, 1996; Volume 1, pp. 725–730.
Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–6 August 2001; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2001; Volume 3, pp. 41–46.
Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 2921–2929.
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 618–626.
Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; IEEE: New York, NY, USA, 2018; pp. 839–847.
Ibrahim, M.; Louie, M.; Modarres, C.; Paisley, J. Global explanations of neural networks: Mapping the landscape of predictions. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, Honolulu, HI, USA, 27–28 January 2019; ACM: New York, NY, USA, 2019; pp. 279–287.
Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034.
Tomczyk, B.; Angelov, P.; Kangin, D. Machine learning within latent spaces formed by foundation models. In Proceedings of the 2024 IEEE 12th International Conference on Intelligent Systems (IS), Varna, Bulgaria, 29–31 August 2024; IEEE: New York, NY, USA, 2024; pp. 1–10.
Angelov, P.; Kangin, D.; Zhang, Z. IDEAL: Interpretable-by-Design ALgorithms for learning from foundation feature spaces. Neurocomputing 2025, 626, 129464.
Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 2015, 10, e0130140.
Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; IEEE: New York, NY, USA, 2015; pp. 1520–1528.
Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
Achtibat, R.; Dreyer, M.; Eisenbraun, I.; Bosse, S.; Wiegand, T.; Samek, W.; Lapuschkin, S. From attribution maps to human-understandable explanations through concept relevance propagation. Nat. Mach. Intell. 2023, 5, 1006–1019.
Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.; Viegas, F.; Sayres, R. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; pp. 2668–2677.
Ghorbani, A.; Wexler, J.; Zou, J.Y.; Kim, B. Towards automatic concept-based explanations. Adv. Neural Inf. Process. Syst. 2019, 32, 9277–9286.
Björklund, A.; Mäkelä, J.; Puolamäki, K. SLISEMAP: Supervised dimensionality reduction through local explanations. Mach. Learn. 2023, 112, 1–43.
Dieber, J.; Kirrane, S. Why model why? Assessing the strengths and limitations of LIME. arXiv 2020, arXiv:2012.00093.
Bien, J.; Tibshirani, R. Prototype selection for interpretable classification. Ann. Appl. Stat. 2011, 5, 2403–2424.
Bishop, C. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 2, pp. 531–537.
Cartocci, G.; Veyrié, A.; Cavagnetto, N.; Hurter, C.; Degas, A.; Ferreira, A.; Ahmed, M.U.; Begum, S.; Barua, S.; Inguscio, B.M.S.; et al. Explainable artificial intelligence in air traffic control: Effects of expertise on workload, acceptance, and usage intentions. Brain Inform. 2026, 13, 6.
Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608.
Kuhl, U.; Artelt, A.; Hammer, B. Let’s go to the Alien Zoo: Introducing an experimental framework to study usability of counterfactual explanations for machine learning. Front. Comput. Sci. 2023, 5, 20.
Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. J. Law Technol. 2017, 31, 841.
Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–3 October 2018; IEEE: New York, NY, USA, 2018; pp. 80–89.
Wang, Z.; She, Q.; Ward, T.E. Generative adversarial networks: A survey and taxonomy. arXiv 2019, arXiv:1906.01529.
Bau, D.; Zhu, J.Y.; Strobelt, H.; Lapedriza, A.; Zhou, B.; Torralba, A. Understanding the role of individual units in a deep neural network. Proc. Natl. Acad. Sci. USA 2020, 117, 30071–30078.
Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
Tewari, A.; Thies, J.; Mildenhall, B.; Srinivasan, P.; Tretschk, E.; Yifan, W.; Lassner, C.; Sitzmann, V.; Martin-Brualla, R.; Lombardi, S.; et al. Advances in neural rendering. Comput. Graph. Forum 2022, 41, 703–735.
Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 7210–7219.
Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
Dhariwal, P.; Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794.
Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; ACM: New York, NY, USA, 2016; pp. 308–318.
Jayaraman, B.; Evans, D. Evaluating differentially private machine learning in practice. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; USENIX Association: Berkeley, CA, USA, 2019; pp. 1895–1912.
Li, J.; Khodak, M.; Caldas, S.; Talwalkar, A. Differentially private meta-learning. arXiv 2019, arXiv:1909.05830.
Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual, 3–10 March 2021; ACM: New York, NY, USA, 2021; pp. 610–623.
Linzen, T. How can we accelerate progress towards human-like linguistic generalization? arXiv 2020, arXiv:2005.00955.
Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; AAAI Press: Washington, DC, USA, 2016; pp. 1135–1144.
Mittelstadt, B. Principles alone cannot guarantee ethical AI. Nat. Mach. Intell. 2019, 1, 501–507.
Russell, S. Human Compatible: Artificial Intelligence and the Problem of Control; Penguin: New York, NY, USA, 2019.
European Parliament and Council of the European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Off. J. Eur. Union 2024.
NIST AI 100-1; Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2023.
National Institute of Standards and Technology (NIST). AI RMF Playbook; Supporting Guidance to the AI Risk Management Framework; NIST: Gaithersburg, MD, USA, 2024.
Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. (CSUR) 2021, 54, 115.
Baniecki, H.; Kretowicz, W.; Piątyszek, P.; Wiśniewski, J. Dalex: Responsible machine learning with interactive explainability and fairness in python. J. Mach. Learn. Res. 2021, 22, 1–7.
Bellamy, R.K.; Dey, K.; Hind, M.; Hoffman, S.C.; Houde, S.; Kannan, K.; Lohia, P.; Martino, J.; Mehta, S.; Mojsilovic, A.; et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv 2018, arXiv:1810.01943.
Slack, D.; Hilgard, A.; Singh, S.; Lakkaraju, H. Reliable post hoc explanations: Modeling uncertainty in explainability. Adv. Neural Inf. Process. Syst. 2021, 34, 9391–9404.
Rahwan, I.; Cebrian, M.; Obradovich, N.; Bongard, J.; Bonnefon, J.F.; Breazeal, C.; Crandall, J.W.; Christakis, N.A.; Couzin, I.D.; Jackson, M.O.; et al. Machine behaviour. Nature 2019, 568, 477–486.
Bostrom, N. Superintelligence: Paths, Dangers, Strategies; OUP: Oxford, UK, 2014.
Wallach, W.; Allen, C. Moral Machines: Teaching Robots Right from Wrong; Oxford University Press: Oxford, UK, 2008.
Prieto, A.; Prieto, B.; Ortigosa, E.M.; Ros, E.; Pelayo, F.; Ortega, J.; Rojas, I. Neural networks: An overview of early research, current frameworks and new challenges. Neurocomputing 2016, 214, 242–268.
Friston, K.J. Models of brain function in neuroimaging. Annu. Rev. Psychol. 2005, 56, 57–87.
Hassabis, D.; Kumaran, D.; Summerfield, C.; Botvinick, M. Neuroscience-inspired artificial intelligence. Neuron 2017, 95, 245–258.
Saleem, M.A.; Senan, N.; Wahid, F.; Aamir, M.; Samad, A.; Khan, M. Comparative analysis of recent architecture of Convolutional Neural Network. Math. Probl. Eng. 2022, 2022, 7313612.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Marblestone, A.H.; Wayne, G.; Kording, K.P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 2016, 10, 94.
Goertzel, B. Artificial general intelligence: Concept, state of the art, and future prospects. J. Artif. Gen. Intell. 2014, 5, 1–46.
Samek, W.; Montavon, G.; Vedaldi, A.; Hansen, L.K.; Müller, K.R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Springer Nature: Berlin/Heidelberg, Germany, 2019; Volume 11700.
Tononi, G.; Boly, M.; Massimini, M.; Koch, C. Integrated information theory: From consciousness to its physical substrate. Nat. Rev. Neurosci. 2016, 17, 450–461.
Graziano, M.S. The attention schema theory: A foundation for engineering artificial consciousness. Front. Robot. AI 2017, 4, 60.
Chamola, V.; Hassija, V.; Sulthana, A.R.; Ghosh, D.; Dhingra, D.; Sikdar, B. A review of trustworthy and explainable artificial intelligence (xai). IEEE Access 2023, 11, 78994–79015.
Dehaene, S. Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts; Penguin: New York, NY, USA, 2014.
Premack, D.; Woodruff, G. Does the chimpanzee have a theory of mind? Behav. Brain Sci. 1978, 1, 515–526.
Tsoukalas, I. Theory of mind: Towards an evolutionary theory. Evol. Psychol. Sci. 2018, 4, 38–66.
Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building machines that learn and think like people. Behav. Brain Sci. 2017, 40, e253.
Marcus, G. Deep learning: A critical appraisal. arXiv 2018, arXiv:1801.00631.
Gamez, D. Measuring intelligence in natural and artificial systems. J. Artif. Intell. Conscious. 2021, 8, 285–302.
McDuff, D.; Czerwinski, M. Designing emotionally sentient agents. Commun. ACM 2018, 61, 74–83.
Breazeal, C. Emotion and sociable humanoid robots. Int. J. Hum.-Comput. Stud. 2003, 59, 119–155.
Goleman, D. Emotional Intelligence; Bloomsbury Publishing: London, UK, 2020.
Moerland, T.M.; Broekens, J.; Jonker, C.M. Emotion in reinforcement learning agents and robots: A survey. Mach. Learn. 2018, 107, 443–480.
Zhan, F.; Yu, Y.; Wu, R.; Zhang, J.; Lu, S.; Liu, L.; Kortylewski, A.; Theobalt, C.; Xing, E. Multimodal image synthesis and editing: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 4, 1–20.
McStay, A. Emotional AI: The rise of empathic media. In Emotional AI; SAGE Publications Ltd.: London, UK, 2018; pp. 1–248.
Riek, L.D. Healthcare robotics. Commun. ACM 2017, 60, 68–78.
Gerrish, S. How Smart Machines Think; MIT Press: Cambridge, MA, USA, 2018.
Nguyen, T.; Smith, J. Ethical AI: Challenges in Embedding Human Values. Ethics Inf. Technol. 2023, 25, 45–56.
Floridi, L.; Cowls, J.; Beltrametti, M.; Chatila, R.; Chazerand, P.; Dignum, V.; Luetge, C.; Madelin, R.; Pagallo, U.; Rossi, F.; et al. An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. In Ethics, Governance, and Policies in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2021; pp. 19–39.
Segal, M.T.; Demos, V. Gender and the Media: Women’s Places; Emerald Publishing Limited: Leeds, UK, 2018.
McSherry, F.D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; ACM: New York, NY, USA, 2009; pp. 19–30.
Luxton, D.D. Artificial intelligence in psychological practice: Current and future applications and implications. Prof. Psychol. Res. Pract. 2014, 45, 332–339.
Wellman, H.M. Making Minds: How Theory of Mind Develops; Oxford University Press: Oxford, UK, 2014.
Gherman, I.M.; Abdallah, Z.S.; Pang, W.; Gorochowski, T.E.; Grierson, C.S.; Marucci, L. Bridging the gap between mechanistic biological models and machine learning surrogates. PLoS Comput. Biol. 2023, 19, e1010988.
Dean, J. 1.1 the deep learning revolution and its implications for computer architecture and chip design. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 16–20 February 2020; IEEE: New York, NY, USA, 2020; pp. 8–14.
Garcia-Magarino, I.; Muttukrishnan, R.; Lloret, J. Human-centric AI for trustworthy IoT systems with explainable multilayer perceptrons. IEEE Access 2019, 7, 125562–125574.
Clark, L.; Pantidi, N.; Cooney, O.; Doyle, P.; Garaialde, D.; Edwards, J.; Spillane, B.; Gilmartin, E.; Murad, C.; Munteanu, C.; et al. What makes a good conversation? Challenges in designing truly conversational agents. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; ACM: New York, NY, USA, 2019; pp. 1–12.
Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38.
Shneiderman, B. Human-Centered AI; Oxford University Press: Oxford, UK, 2022.
Wang, Z.; Liu, M.; Luo, Y.; Xu, Z.; Xie, Y.; Wang, L.; Cai, L.; Qi, Q.; Yuan, Z.; Yang, T.; et al. Advanced graph and sequence neural networks for molecular property prediction and drug discovery. Bioinformatics 2022, 38, 2579–2586.
Jacovi, A.; Goldberg, Y. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? arXiv 2020, arXiv:2004.03685.
Wiegreffe, S.; Pinter, Y. Attention is not not explanation. arXiv 2019, arXiv:1908.04626.
Pope, P.E.; Kolouri, S.; Rostami, M.; Martin, C.E.; Hoffmann, H. Explainability methods for graph convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; IEEE: New York, NY, USA, 2019; pp. 10772–10781.
Yuan, H.; Yu, H.; Gui, S.; Ji, S. Explainability in graph neural networks: A taxonomic survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5782–5799.
Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120.
Fiorucci, M.; Khoroshiltseva, M.; Pontil, M.; Traviglia, A.; Del Bue, A.; James, S. Machine learning for cultural heritage: A survey. Pattern Recognit. Lett. 2020, 133, 102–108.
Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; PMLR: Cambridge, MA, USA, 2017; pp. 3319–3328.
Jiang, R.; Crookes, D. Shallow unorganized neural networks using smart neuron model for visual perception. IEEE Access 2019, 7, 152701–152714.
Yamins, D.L.; DiCarlo, J.J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 2016, 19, 356–365.
Kriegeskorte, N. Deep neural networks: A new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 2015, 1, 417–446.
Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? arXiv 2017, arXiv:1712.09923.
Weerts, H.; Dudík, M.; Edgar, R.; Jalali, A.; Lutz, R.; Madaio, M. Fairlearn: Assessing and Improving Fairness of AI Systems. arXiv 2023, arXiv:2303.16626.
Hedström, A.; Weber, L.; Krakowczyk, D.; Bareeva, D.; Motzkus, F.; Samek, W.; Lapuschkin, S.; Höhne, M.M.C. Quantus: An explainable ai toolkit for responsible evaluation of neural network explanations and beyond. J. Mach. Learn. Res. 2023, 24, 1–11.
Shrikumar, A.; Greenside, P.; Kundaje, A. Learning important features through propagating activation differences. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; PMLR: Cambridge, MA, USA, 2017; pp. 3145–3153.
Baehrens, D.; Schroeter, T.; Harmeling, S.; Kawanabe, M.; Hansen, K.; Müller, K.R. How to explain individual classification decisions. J. Mach. Learn. Res. 2010, 11, 1803–1831.
Abdul, A.; von der Weth, C.; Kankanhalli, M.; Lim, B.Y. COGAM: Measuring and moderating cognitive load in machine learning model explanations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 August 2020; ACM: New York, NY, USA, 2020; pp. 1–14.
Alvarez-Melis, D.; Jaakkola, T.S. On the robustness of interpretability methods. arXiv 2018, arXiv:1806.08049.
Adebayo, J.; Gilmer, J.; Goodfellow, I.; Kim, B. Local explanation methods for deep neural networks lack sensitivity to parameter values. arXiv 2018, arXiv:1810.03307.
Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity checks for saliency maps. Adv. Neural Inf. Process. Syst. 2018, 31, 9525–9536.
Zhang, B.; Zhu, J.; Su, H. Toward the third generation artificial intelligence. Sci. China Inf. Sci. 2023, 66, 121101.
Donoghue, J.P. Connecting cortex to machines: Recent advances in brain interfaces. Nat. Neurosci. 2002, 5, 1085–1088.
Zall, R.; Kangavari, M.R. Comparative analytical survey on cognitive agents with emotional intelligence. Cogn. Comput. 2022, 14, 1223–1246.
McLean, S.; Read, G.J.; Thompson, J.; Baber, C.; Stanton, N.A.; Salmon, P.M. The risks associated with Artificial General Intelligence: A systematic review. J. Exp. Theor. Artif. Intell. 2023, 35, 649–663.
Moura, R.F.d.; Carro, L. Scalable and Energy-Efficient NN Acceleration with GPU-ReRAM Architecture. In Proceedings of the International Symposium on Applied Reconfigurable Computing, Cottbus, Germany, 27–29 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 230–244.
Wang, Z.; She, Q.; Smeaton, A.F.; Ward, T.E.; Healy, G. Synthetic-Neuroscore: Using a neuro-AI interface for evaluating generative adversarial networks. Neurocomputing 2020, 405, 26–36.
Fetz, E.E. Volitional control of neural activity: Implications for brain–computer interfaces. J. Physiol. 2007, 579, 571–579.
Moxon, K.A.; Foffani, G. Brain-machine interfaces beyond neuroprosthetics. Neuron 2015, 86, 55–67.
Ritchie, J.B.; Kaplan, D.M.; Klein, C. Decoding the brain: Neural representation and the limits of multivariate pattern analysis in cognitive neuroscience. Br. J. Philos. Sci. 2019, 70, 581–607.
Assran, M.; Duval, Q.; Misra, I.; Bojanowski, P.; Vincent, P.; Rabbat, M.; LeCun, Y.; Ballas, N. Self-supervised learning from images with a joint-embedding predictive architecture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; IEEE: New York, NY, USA, 2023; pp. 15619–15629.

Figure 1. XAI2Brain—The Cycle of Interactive Learning Between Brain and AI. Neural networks inspired by brain mechanisms evolve by capturing computational principles of neural processing. In turn, these models provide interpretable insights into cerebral function, which facilitate human-in-the-loop interaction with AI systems. This creates a continuous, bidirectional cycle of learning and adaptation.

Figure 2. The inscrutability of generative models. Generative models, similar to black boxes, conceal the intricate processes behind their creative outputs, making their internal workings enigmatic and challenging for humans to decipher.

Figure 3. The evolution of artificial intelligence. The arrow’s ascent reflects the dynamic growth and expanding capabilities of AI, marking key developments from basic algorithms to advanced sentient systems.

Table 1. Illustrative comparison of predictive performance and relative interpretability characteristics on the Caltech-101 dataset. Results should not be interpreted as definitive evidence of scalability to large-scale or foundation-model settings.

Method	Accuracy	# Parameters	Interpretability
SpinalNet [20]	97.32%	132,600,000	Medium (qualitative)
xDNN [21]	94.31%	4 per class	High
VGG-16 [22]	90.32%	138,000,000	Very low
ResNet-50 [23]	90.39%	23,000,000	Very low
Random forest [24]	87.12%	∼20,000	Medium
SVM [25]	86.64%	∼15,000	Low
kNN [26]	85.65%	∼300 and all data	Low
Decision tree [27]	86.42%	∼5 rules per class	High
Naive Bayes [28]	54.84%	409,700	Medium

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jiang, R.; Zhou, Y.; Wang, B.; Angelov, P.; Ni, Q. XAI2Brain: A Perspective on Mechanistic Interpretability for Brain–AI Alignment. Mach. Learn. Knowl. Extr. 2026, 8, 167. https://doi.org/10.3390/make8060167

