Systematic Review

Comparative Experimental Studies on Superior Cognitive Domains: AI Versus Humans

by Raquel Ayala-Carabajo * and Joe Llerena-Izquierdo
Universidad Politécnica Salesiana, Guayaquil 090901, Ecuador
* Author to whom correspondence should be addressed.
Technologies 2026, 14(1), 55; https://doi.org/10.3390/technologies14010055
Submission received: 14 November 2025 / Revised: 30 December 2025 / Accepted: 6 January 2026 / Published: 10 January 2026
(This article belongs to the Section Information and Communication Technologies)

Abstract

This study analyzes the performance of artificial intelligence in processes known as “cognitive” (according to the scientific literature) in comparison with the performance of human cognitive processes, drawing on experimental and/or empirical studies. The PRISMA process and bibliometric analysis were used to identify and analyze relevant research. A total of 291 studies were analyzed and grouped into five categories corresponding to the identified cognitive processes. The results show that only 10.3% of the studies report accuracy rates between 90% and 100%. The evidence suggests that AI can perform comparably to humans, but not with absolute efficiency. The experimental studies focus mainly on the “decision-making” process (56%), followed, in descending order of frequency, by “analysis and evaluation” (25%), “judgment and reasoning” (8.6%), “comprehension and learning” (5.5%), and other “specific processes” (4.8%). The most significant contribution of this study is the comparative relational structure between human cognitive processes and AI processes.

Graphical Abstract

1. Introduction

This study explores the intersection between Artificial Intelligence (AI) and higher cognitive processes. The widespread use of these intelligent algorithms is beginning to undermine the ordinary understanding of what it means to be “intelligent.” Although there is still widespread belief that humans must maintain oversight of AI to ensure its reliability and trustworthiness in a wide range of situations, empirical evidence shows that AI introduces a new range of capabilities, whether identical to those of a human being or not, that force us to reconsider the nature of processes called “intelligent” and/or “cognitive” [1].
With the unstoppable advancement of system technologies, the range and scope of AI capabilities are also constantly expanding [2]. However, we question how “intelligent” AI really is compared to human intelligence, and whether it is truly “intelligent” when its performance is compared to that of humans on specific and/or complex tasks. This radical question cuts across different disciplinary and research fields (philosophical, computational, psychological, etc.), is being addressed by many researchers worldwide, and will remain difficult to answer, especially on the basis of scientific evidence [3]. More specifically, experiments are being conducted to determine whether AI can emulate human cognitive processes.
However, the interest of this study lies in exploring how the concept of intelligence is being redefined, based on experimental and/or empirical studies showing that AI can perform cognitive and pedagogical tasks with efficiency comparable to that of humans, according to numerous authors [2,3]. Even though these models exhibit biases comparable to those of human cognition [4], their performance has been rated as exceptional in recognizing various data spectra [5]. It should be noted that the selected works belong to specialized fields (mainly medicine, radiology, and health). Specifically, a set of experimental and comparative studies designed to compare the performance of AI systems, especially large language models, with the performance of human experts has been selected.
Thus, the normalized use of AI has transcended research in computer labs to reach even the most ordinary activities in communication, work, and social life, for example. At the same time, the social perception of AI is being studied, as it is now a central topic of discussion in a transformed world [6].
In effect, the potential of AI is being explored for use in processing large data sets, surpassing experts in tasks that require demanding and in-depth analysis in less time, as well as achieving consistency in results [7]. Furthermore, it is already considered that AI systems achieve reliable processing similar to that of humans when the quality of the data is higher [8]. These results are mainly obtained in areas such as medicine, health, education, and engineering; specifically, in relation to applications focused on assistance for evaluation and diagnosis [9].
Currently, AI has proven to outperform humans in specific tasks involving data analysis, processing large data sets, and predictive diagnostics [6,10]. Although its knowledge base and configuration mimic the specialized intelligence of an expert, based on the processing of large volumes of information, new challenges are emerging, such as the ethical and human approach to interpreting complex images [11].
The works [1,12] present a perception of augmented intelligence, understood as a synergy between the consistency of AI results and the creativity and ethical judgment of human beings, among other cognitive processes, establishing a new way of solving more complex problems. For example, in the field of medical research, AI has demonstrated a high capacity to detect recurring data and subtle patterns that experts in this field may overlook [3,7].
At the same time, AI has achieved greater effectiveness in clinical decision-making, improving accuracy in the management of pancreatic cysts and reducing unnecessary surgeries [10]. Thus, when performing a professional diagnosis, studies show that AI optimizes processes and provides professional clinical support [13]. In another study, AI models demonstrated 90% accuracy in detecting diseases such as gastritis and pseudopapilledema, as well as in diagnosing acute heart attacks based on electrocardiograms [9,10].
In another area, the teaching of programming algorithms, the assistance of generative AI agents results in increased student motivation and interest, especially in the early years, as students receive supervised educational support. These classroom experiences reveal that AI is a collaborative tool that changes the efficiency and quality of human cognitive processes [14]. In this regard, several studies show that AI can function as a pedagogical tool to support programming language teaching, not only increasing student interest and satisfaction but also exceeding expectations in terms of educational support [14].

1.1. Higher Cognitive Processes Based on Psychology and Cognitive Science

In the fields of cognitive psychology and neuroscience, higher cognitive processes are defined as complex mental mechanisms that enable us to interpret information, generate inferences, evaluate alternatives, and act under conditions of uncertainty [15,16]. These processes go beyond mere perception and automatic response, integrating advanced cognitive control mechanisms that involve prior knowledge and metacognitive regulation [17,18]. In this sense, reasoning has been defined as the ability to derive conclusions from premises by applying explicit or implicit procedures, articulating mental representations and executive control to evaluate the validity of the inferences generated [16,18].
Thus, human reasoning involves the conscious manipulation of symbolic representations, the normative evaluation of inferences, and the ability to revise or inhibit intuitive responses when they are inappropriate. Therefore, it is not reduced to obtaining a correct answer or optimizing a result [19,20]. It should be noted that, in cognitive neuroscience studies, higher cognitive processes have been associated with distributed neural networks (particularly in the prefrontal cortex) responsible for controlling behavior and regulating cognition based on goals (at the hierarchical level) and context (at the flexibility level) [21].
Likewise, reasoning is closely linked to judgment, expressed as an evaluative process through which humans integrate verified evidence, prior experience, and contextual characteristics to make assessments or estimates, especially in uncertain or confusing situations, as well as with incomplete information [22]. It has been shown that human judgment is deeply influenced by general rules, cognitive resource constraints, and metacognitive mechanisms, introducing systematic versatility into human evaluation and differentiating it from purely algorithmic machine processes [23,24]. In this sense, judgment is not a simple algorithmic decision rule, but rather a rational adaptation to the computational limitations of the human cognitive system [25].
For its part, decision-making has been extensively studied as a higher-order cognitive process that integrates perception, memory, risk assessment, learning, and executive control [7,26]. In a broader sense, cognitive neuroscience has shown that human decision-making involves the interaction of deliberative and automatic systems, supported by distributed networks with activity in the prefrontal regions, integrating assessment systems and adaptive learning mechanisms [27,28]. It is precisely this functional architecture that explains both the flexibility and limitations observed in human cognitive activity in complex and dynamic environments [29,30].
From this same scientific perspective, analysis and evaluation are conceived as processes based on the explicit definition of criteria, the weighing of evidence, and the justification of conclusions within a specific conceptual and normative framework [28]. These findings have directly influenced the development of current computational models, such as reinforcement learning algorithms, sequential decision models, and certain architectures inspired by principles of brain function that aim to emulate aspects of human learning and adaptation [31].
It should be noted that human cognitive processes are not limited to the successful execution of a task, the decomposition of information, or the efficient comparison of alternatives [30]. Rather, these processes involve semantic understanding, contextual flexibility, and, in many cases, forms of reflective awareness and metacognition that do not automatically follow from observable performance [32]. Numerous recent studies determine that artificial intelligence systems “reason,” “decide,” or “judge,” but strictly speaking, they describe systems that optimize target functions or identify complex statistical patterns in large volumes of data [33]. While these capabilities represent significant technical advances, equating them directly with human cognitive processes leads to conceptual ambiguities that must be explicitly addressed to avoid theoretically inaccurate interpretations and generalized conclusions [34,35,36].

1.2. Cognitive Processes, AI Performance, and Technological Trust

The distinction between operational performance and cognitive competence is key in the contemporary debate on trust in the performance of artificial intelligence (AI) [12,37]. Operational performance refers to a system’s ability to execute a specific task with high levels of accuracy, efficiency, or speed, usually evaluated using objective and quantitative metrics. In contrast, cognitive competence involves intrinsic mechanisms comparable to human mental processes, such as semantic comprehension, context-based reasoning, evaluation based on norms, and the ability to justify decisions in a reflective manner [24,38]. This difference is important in order to avoid superficial and erroneously generalized comparisons between observable performance and genuine cognition.
Numerous studies show that recently developed AI systems can match or even surpass humans in highly defined tasks, especially in domains such as medical image recognition, assisted diagnosis, surgical tool suggestions, big data analysis, and complex statistical prediction [39]. These results are largely explained by the ability of deep learning models to identify high-dimensional patterns in a consistent and scalable manner, overcoming human limitations in terms of memory and processing speed [40].
However, recent literature also points to substantive limitations when these systems encounter situations outside their training domain, require explanations that are understandable to humans, or must integrate ethical, social, and contextual considerations. From a cognitive science perspective, these limitations are interpreted as evidence that high task performance does not equate to possession of the underlying cognitive process. In particular, the absence of explicit internal models, metacognition, and deep semantic understanding restricts AI’s ability to generalize, justify its decisions, or adapt flexibly to novel contexts [41,42].
Various approaches in cognitive psychology and social sciences agree that human trust in an agent—whether human or artificial—is not built solely on accuracy or efficiency, but on the perception of understanding, internal consistency, predictability, and the ability to justify actions taken [43]. In this sense, trust is configured as a relational and cognitive phenomenon, rather than a simple response to performance metrics.
Studies on human–AI interaction have consistently shown that users tend to distrust highly accurate but opaque systems, especially in high-risk contexts such as medicine, education, or legal decision-making. The lack of explainability and alignment with human reasoning criteria increases the perception of risk, even when the system demonstrates superior performance to humans in controlled evaluations [44]. Consequently, there is a need to clearly distinguish between technical effectiveness and cognitive credibility, as well as to develop AI models that not only optimize results but also integrate cognitive principles geared toward human understanding and trust [45].
In the context of what has been indicated so far, the objective of this article is to analyze the performance of AI in the field of cognitive processes. Through a review of the relevant literature, mainly experimental studies, the results comparing the capabilities of AI and human intelligence are evaluated, as well as the implications that this comparison has for the understanding of cognition.

2. State of the Art

When studying human intelligence, we generally consider a set of skills such as creativity, logical and ethical reasoning, critical judgment, and the ability to respond appropriately to unpredictable and complex contexts [4]. But the advent of AI has forced a radical rethinking of what it means to “be intelligent” and calls into question whether these artificial functions and processes should be called intelligence when compared with what we conceive as such.
In this sense, the term “artificial intelligence” is used to describe systems capable of performing complex tasks based on logical reasoning and focused on problem-solving [46], raising the question of whether AI is a form of functional intelligence with a nature and manifestations different from those of humans [13]. Some positions defend applying the term “intelligence” to these systems, even claiming that they surpass human experts in specific and complex cognitive tasks such as diagnosing diseases from medical images [2,7,11], and that they are even more effective in clinical decision-making [10,47].
Likewise, there has been an increase in the capacity of intelligent language systems and models to evaluate, for example, complex medical examinations, which are improving over time [48] and already have experimental evidence of this learning [49].
On the contrary, critics of the use of the term “intelligence” argue that systems lacking the subjectivity, ethical judgment, and consciousness characteristic of human beings cannot be called intelligent. The study in [4] shows that, despite its achievements, AI has critical flaws in unpredictable scenarios and that, in high-risk decisions, it lacks a moral framework and accountability to guide it [1]. This is why researchers insist that, given the potential of AI, its operation must be supervised by expert human professionals [5]. Viewed in this way, artificial intelligence cannot be conceived as an imitation of human intelligence, but rather as a high-capacity processing tool that lacks intention and the dimension of meaning, see Figure 1.
In short, advances in AI in areas of deep learning (DL) have transformed the debate about the capabilities of these systems in comparison with human intelligence [50]. Both empirical and experimental studies show that AI-based technology can complement human performance with effective results, matching or even surpassing human performance in specific tasks that require expertise [51].
In the field of medicine, AI is proving to be a powerful tool. For example, in the detection of dental caries, DL algorithms have demonstrated high accuracy, often outperforming human experts by extracting and analyzing imperceptible features in radiographic images [50]. In the field of pharmacology, there are studies that use machine learning with artificial intelligence algorithms that model pharmacokinetics to emulate the process of absorption, distribution, metabolism, and elimination of drugs in the body. Their use has made it possible to simulate real processes that allow for estimates based on real parameters, making the AI model useful in helping experts understand complex interactions between chemical or biological substances in a living organism [52].
Likewise, an ultrasound imaging system with AI elements for automatic segmentation of ovarian masses achieved a Sørensen–Dice similarity coefficient of 91%, gaining strong acceptance among experts as a system for delineating images that support professional diagnosis [53].
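For reference, the Sørensen–Dice coefficient cited above measures the overlap between a predicted and a reference segmentation mask. The following minimal sketch uses toy one-dimensional masks for illustration and is not the cited system's implementation:

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen–Dice similarity: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    size_a = sum(mask_a)
    size_b = sum(mask_b)
    if size_a + size_b == 0:
        return 1.0  # convention: two empty masks are treated as identical
    return 2 * intersection / (size_a + size_b)

# Toy 1-D "masks" standing in for flattened segmentation images
predicted = [1, 1, 1, 0, 0, 1]
reference = [1, 1, 0, 0, 1, 1]
print(round(dice_coefficient(predicted, reference), 2))  # → 0.75
```

A coefficient of 91%, as reported in [53], thus indicates near-complete overlap between the automatic and the expert segmentation.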
The potential of AI extends beyond diagnostic imaging. In cardiology, for example, an AI model that performs cardiac chamber volumetry significantly outperformed known clinical markers and the Agatston score in predicting heart failure over a fifteen-year period [54]. This demonstrates that AI can identify complex patterns that are difficult to detect using traditional methods. In the field of surgery, an AI model was able to evaluate doctors’ surgical skills with 87.2% accuracy by recognizing instruments in robotic gastrectomy videos [55]. This opens up the possibility for an impartial and more accessible way of measuring surgical performance.
In addition, AI is being used in medical decision-making. An automated version of AI-assisted navigation software proved to be as accurate as the version controlled by a human technician in placing implants in the safety zone and achieving the correct leg length in hip arthroplasty [56]. However, AI performance can vary depending on the complexity of the task. In the field of language models, it has been shown that the most advanced versions (such as GPT-4) outperform previous ones in medical examinations [57]. However, the study points out that, although improvements are evident for text-based questions, there is still room for improvement in image interpretation. These findings raise questions about visual reasoning abilities, recognition of subtle patterns, and visual perception compared to human performance.
Despite these results, AI still does not achieve full effectiveness in certain specific tasks. For example, although some studies indicate that DL models can obtain accurate results even with weakly labeled data [51], the cost and time involved in data labeling remain a barrier to training such models.

2.1. Cognitive Architectures Inspired by Neuroscience and Cognitive Psychology

Cognitive architectures are a fundamental pillar in the integration of principles contributed by cognitive psychology and neuroscience into artificial intelligence, with the aim of computationally modeling processes such as reasoning, memory, learning, and human decision-making [58]. These architectures seek to explicitly represent internal cognitive mechanisms, relying on empirically validated theories about the functioning of the human mind [17,59].
Among the models addressed is SOAR (State, Operator, And Result), which implements a decision cycle based on the proposal, selection, and application of operators [60,61]. Based on this approach, rapid and automated functioning is possible when sufficient knowledge exists, analogous to intuitive processes such as the first system, while activating deliberative mechanisms when faced with situations of uncertainty or incomplete knowledge, approximating the processes of a second system with which it interacts [62]. In this way, SOAR articulates a hierarchical dynamic of cognitive control consistent with dual models of human reasoning [63,64].
For its part, ACT-R (Adaptive Control of Thought—Rational) is explicitly based on contributions from experimental cognitive psychology and empirically identified neurobiological correlations [65]. Its architecture integrates declarative memory, procedural memory, and basic motor skills, incorporating mathematical formulations to model processes such as activation, partial matching, and the probability of information retrieval [66]. This correspondence between behavioral data, cognitive structures, and computational mechanisms has made ACT-R a benchmark for modeling human cognition [67].
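The mathematical formulations mentioned above can be illustrated with a minimal sketch of ACT-R's standard base-level learning and retrieval-probability equations; the parameter values used here (decay d, threshold tau, noise s) are common illustrative defaults, not fitted values:

```python
import math

def base_level_activation(lags, d=0.5):
    """ACT-R base-level learning: B_i = ln(sum over presentations of t_j^-d),
    where t_j is the time since the j-th presentation of the chunk."""
    return math.log(sum(t ** -d for t in lags))

def retrieval_probability(activation, tau=0.0, s=0.4):
    """Logistic probability of retrieval around threshold tau with noise s."""
    return 1.0 / (1.0 + math.exp(-(activation - tau) / s))

# A chunk practiced recently and often is more active, hence easier to retrieve
recent = base_level_activation([1.0, 2.0, 5.0])   # lags in seconds
stale = base_level_activation([50.0, 100.0])
print(retrieval_probability(recent) > retrieval_probability(stale))  # → True
```

The sketch captures the qualitative behavior that made ACT-R a benchmark: recency and frequency of use jointly determine how reliably a memory chunk can be retrieved.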
Additionally, derived from the principles of ACT-R, Instance-Based Learning Theory (IBLT) proposes that reasoning and decision-making arise primarily from the retrieval and comparison of experiences stored in memory, rather than from the application of abstract rules [68,69]. This approach more accurately reflects the way humans make decisions in dynamic and uncertain contexts, where prior knowledge and experience in a situation are indispensable [70,71].
In a complementary vein, the CER (Conceptualization–Experimentation–Reflection) model conceives of decision-making as a cyclical process of updating the mental model [72]. Conceptualization allows us to construct a representation of the situation, experimentation tests decisions in the real environment, and reflection integrates the feedback obtained to adjust future actions. This model emphasizes the adaptive and metacognitive nature of human cognition [73].
In neuroscience, brain-inspired approaches have reinforced these proposals by highlighting the role of synaptic plasticity, Hebbian learning, and the hierarchical organization of the brain as fundamental principles for the development of artificial systems capable of multitasking and flexible reasoning [74,75]. These contributions have influenced both the design of deep neural networks and hybrid architectures that seek to overcome the limitations of models based solely on the phenomenon of connection [76,77].
Indeed, several interdisciplinary studies have proposed hybrid neuro-symbolic frameworks that integrate deep learning with explicit logical reasoning, with the aim of improving robustness, interpretability, and generalization in complex cognitive tasks such as planning, causal inference, and decision-making. This convergence between classical cognitive models, neuroscience, and contemporary machine learning has paved the way for broader paradigms, including brain-inspired computing (BIC) [33], which seeks not only to mimic behavioral outcomes, but also to approximate the underlying computational principles of the human brain.

2.2. Neuro-Symbolic Integration in AI Reasoning and Decision-Making

When it comes to artificial intelligence, decision-making has been formalized mainly through mathematical frameworks aimed at optimizing reward functions in well-defined environments [78]. While these models have demonstrated outstanding performance in controlled domains, such as games or simulations, recent literature points to significant limitations when decisions require broad contextual understanding, moral reasoning, or social interpretation [79]. From a cognitive science perspective, these limitations are explained by the absence of metacognitive mechanisms, experiential integration, and reflective evaluation comparable to those of human cognition [80].
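As a minimal illustration of the reward-optimization framing described above, the following sketch runs value iteration on a hypothetical two-state Markov decision process; all states, actions, rewards, and probabilities are invented for illustration:

```python
# Hypothetical 2-state, 2-action MDP: transitions[s][a] = [(prob, next_state, reward), ...]
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0}
for _ in range(200):  # value iteration: V(s) = max_a sum_s' p * (r + gamma * V(s'))
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in acts.values()
        )
        for s, acts in transitions.items()
    }

# The greedy policy moves toward the rewarding state and stays there
policy = {
    s: max(acts, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in acts[a]))
    for s, acts in transitions.items()
}
print(policy)  # → {0: 'go', 1: 'stay'}
```

This is exactly the kind of well-defined environment in which such models excel, and it makes the contrast concrete: nothing in the computation involves contextual understanding, moral reasoning, or social interpretation, only the maximization of expected discounted reward.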
Complementarily, interdisciplinary studies have emphasized that automated analysis performed by AI systems, although highly accurate in specific tasks, presents difficulties in reinterpreting information outside the training domain [17,42]. Comparative neurocognitive research suggests that this rigidity contrasts with the human ability to transfer knowledge between contexts, dynamically adjust evaluation criteria, and reflect on the decision-making process itself [81]. Consequently, decision-making in AI should be understood as a functionally efficient but cognitively limited process, whose evaluation requires criteria different from those used to characterize human cognitive processes [82].
Thus, neuro-symbolic integration emerges as a key paradigm for bridging the gap between operational performance and cognitive competence by merging neural learning with explicit symbolic reasoning mechanisms inspired by cognitive science [81]. Recent research has classified neuro-symbolic architectures according to their degree of coupling, from modular systems to deeply integrated approaches, showing improvements in causal reasoning, inferential traceability, and decision-making in complex contexts [33]. From cognitive psychology, these models seek to functionally approximate dual human processes—intuitive and deliberative—through explicit representations of mental states, preferences, and goals, without asserting a strong cognitive equivalence [58,83].
From neuroscience, agentic approaches inspired by hierarchical brain mechanisms and the functional organization of the prefrontal cortex have contributed to the development of artificial agents with greater capacity for interactive reasoning, contextual adaptation, and generalization in dynamic environments [84]. This convergence between neuroscience, cognitive science, and artificial intelligence does not guarantee artificial cognition comparable to human cognition but it does represent a significant advance toward systems that are more interpretable, robust, and conceptually aligned with the cognitive processes they seek to model [85].
This study is focused on answering the following research questions, see Table 1.

3. Materials and Methods

This study follows an empirical and analytical research design with a quantitative approach, combining a systematic review based on the PRISMA flow with both bibliometric and descriptive statistical analyses [49]. The work is carried out in two phases. First, in line with the purpose of this study, recent scientific literature is analyzed to determine how artificial intelligence is conceptualized, applied, and evaluated in relation to higher-order human cognitive processes, using a bibliometric-statistical approach. This approach is useful for identifying dominant cognitive domains, methodological trends, and patterns of empirical evidence in a large corpus of studies.
In the first phase, the PRISMA flow is adopted to ensure transparency and replicability in the stages of identification, screening, and inclusion of studies, followed by conceptual analysis and categorization [50]. In the second phase, bibliometric and statistical analysis is carried out to extract structured evidence related to cognitive processes, artificial intelligence architectures, reported accuracy levels, and the degree of epistemic trust in the technology [48], see Figure 2.
The PRISMA framework comprises the stages of identification, selection, and eligibility [5], see Figure 3.
In the first stage, the identification stage, the Web of Science database is selected as the primary source of information due to its extensive coverage of peer-reviewed journals and its suitability for bibliometric analysis. The search strategy combines the text strings “artificial intelligence,” “AI with human intelligence,” and “studies comparing” in all indexable fields, generating an initial set of 2879 records (Available at: https://bit.ly/webofscience-start, accessed on 26 July 2025). After removing early access articles, enriched cited references, and editor-invited reviews, 1416 records remain.
In the second stage, the selection stage, the exclusion and inclusion criteria are applied. All records that are not open access are excluded (n = 733). Works in English are included, while those in other languages are excluded (n = 7). Although the exclusive selection of open-access articles in English may introduce a bias into the set of selected works, this decision was made intentionally to prioritize methodological transparency and the replicability of the work. Given that the aim is to identify conceptual patterns and trends based on empirical evidence in the literature (rather than to evaluate causal effects or performance, which is predominantly clinical in scope), this bias is not considered to compromise the validity of the results. However, future work may incorporate restricted-access literature to test the stability of the findings and broaden the analysis.
In the third stage, the eligibility stage, filtering is applied by document type, retaining original research articles and literature reviews (n = 67). In addition, the time frame is restricted to studies published in 2024 and 2025, in order to capture the most recent developments in research on generative and cognitive artificial intelligence (n = 361). Subsequently, the abstracts are reviewed to exclude studies that do not explicitly address comparisons or interactions between artificial intelligence systems and human cognitive processes (n = 4).
Finally, a total of n = 291 records were included in the study.
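The identification- and screening-stage counts reported above can be tallied directly; the figures below are taken from the text as reported, and the later eligibility-stage counts are reproduced rather than derived:

```python
# Screening tally for the record counts reported in the text
initial_records = 2879
after_record_cleaning = 1416      # early access, enriched references, invited reviews removed
not_open_access = 733             # excluded: not open access
non_english = 7                   # excluded: not in English

after_screening = after_record_cleaning - not_open_access - non_english
print(after_screening)            # records entering the eligibility stage → 676

included = 291                    # final corpus reported after eligibility filtering
print(f"{included / initial_records:.1%} of the initial records were retained")
```

Such a tally is a cheap consistency check on a PRISMA flow: each exclusion count should reconcile with the number of records passed to the next stage.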
Next, conceptual evaluation and cognitive categorization are performed: the selected studies are subjected to an analytical evaluation using a structured categorization framework and analyzed in terms of their functional and conceptual contribution to specific cognitive processes. Each study is examined to identify the main cognitive process addressed by the artificial intelligence system analyzed. Cognitive processes are operationally defined based on the cognitive psychology literature, addressing decision-making, analysis and evaluation, judgment and reasoning, comprehension and learning, as well as a residual category that groups advanced or emerging cognitive processes. Studies are assigned to a primary category based on the dominant cognitive function described in the objectives, methods, and results.
The categorization process follows an iterative procedure based on the rules defined below. Initially, conceptual definitions are established for each cognitive process. Subsequently, the studies are read in their entirety or, when necessary, from extended abstracts, and coded according to the main cognitive process addressed. In cases where a study involves multiple cognitive dimensions, classification is based on the main result or function on which the artificial intelligence system focuses.
Ambiguous cases are resolved through iterative re-evaluation of the reported objectives and results to ensure conceptual consistency. As a result, four dominant cognitive processes are identified, along with a fifth category that groups together advanced or less represented processes, such as ethical reasoning, abstraction and modeling of complex systems, and human–AI cognitive interaction.
In the second phase of the study, a bibliometric analysis is performed using VOSviewer software (version 1.6.20) to generate collaboration networks, keyword co-occurrence maps, and density visualizations. These analyses allow for the identification of dominant research clusters, geographic collaboration patterns, and thematic concentrations (Available at: https://bit.ly/webofscience-end, accessed on 26 July 2025) [86]. In addition, descriptive statistical analyses are performed using scripts developed in Python 3 and executed on the Google Colab platform. These scripts are used to calculate frequencies, percentages, and distributions related to cognitive processes, accuracy levels reported by artificial intelligence systems, study design types, and conclusions regarding trust in AI technologies. Bar charts and summary statistics are also generated to support the quantitative interpretation of the results. In order to strengthen reproducibility, all analytical stages follow a predefined workflow that includes metadata extraction, preprocessing, categorical coding, statistical aggregation, and visualization. Although the specific scripts are not included in the manuscript, the analytical logic and tools used are described in detail, allowing for replication by other researchers.
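Since the scripts themselves are not published, the aggregation step can only be reconstructed in outline. A minimal sketch of the frequency-and-percentage computation, using the category counts reported in the Results (variable names are our own), would be:

```python
# Illustrative reconstruction of the descriptive aggregation step
# (the actual Colab scripts are not published). Category counts are
# those reported in the article's Results section.

counts = {
    "Decision-making": 163,
    "Analysis and evaluation": 73,
    "Judgment and reasoning": 25,
    "Comprehension and learning": 16,
    "Other cognitive processes": 14,
}

total = sum(counts.values())  # 291 included studies
for category, n in counts.items():
    pct = 100 * n / total
    print(f"{category:28s} {n:3d}  {pct:5.2f}%")
```

Running this reproduces the distribution reported below (56.01%, 25.09%, 8.59%, 5.50%, and 4.81%, respectively); a bar chart of the same dictionary would correspond to the summary figures described.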

4. Results

In the first phase of the research, 291 studies were analyzed and grouped into five categories: four corresponding to the dominant cognitive processes identified, plus a fifth grouping other advanced processes, see Table 2.
The table above shows that 56.01% of the research papers belong to the “Decision-making” group, 25.09% to “Analysis and evaluation”, 8.59% to “Judgment and reasoning”, 5.50% to “Comprehension and learning”, and 4.81% to “Other cognitive processes”.
Within the “Other cognitive processes” group, 14 research papers were identified that focused on “Advanced cognitive processes in interaction with AI”, which center on abstraction and modeling of complex systems, visual perception, selective attention, ethical reasoning, and AI-assisted creative problem solving, among other topics, see Table 3.
In the second phase of the research, the metadata from the 291 works included in the study were used for processing in VOSviewer, enabling the generation of a map of collaborative relationships in academia or research, heat maps, and a map of frequently used keywords.
On the heat map of countries that collaborate closely on research studies comparing AI with human intelligence, it can be seen that countries such as the USA, England, the People’s Republic of China, Germany, Australia, Italy, South Korea, Canada, Spain, and India are leading the way in studies, see Figure 4a.
In addition, the map shows groups of countries addressing the same issue in different working groups. On the other hand, the map showing the relationships between countries working collaboratively highlights those that have been leading several joint studies since 2023. Similarly, the yellow coloring highlights the countries that joined new working groups with existing collaborators in 2024. These countries include the Philippines, Mexico, Turkey, Israel, Estonia, Belgium, Poland, and Nigeria, see Figure 4b.
In the density map for relevant terms, it is clear to see that the terms “artificial intelligence”, “deep learning”, and “ChatGPT” are the most prevalent. Furthermore, the frequency of terms from the medical and computational fields shows that there is a very high level of interest in obtaining results in these areas, see Figure 4c.
Finally, the heat map of identified and related keywords shows that scientific studies focus on topics related to large language models, learning assistance, and the interpretation of medical results and medical images, among others, forming a ring that takes shape according to the scientific studies and findings that open up comparative relationships, see Figure 4d.
A statistical analysis is also performed on those studies that present the accuracy of the identified artificial intelligence models in their results. The evidence in these studies is grouped into four categories. They are presented in percentages, with 55.7% of studies not presenting any accuracy index data, 19.9% of studies with accuracy indices below 80%, 14.1% of studies indicating accuracy rates between 80% and 89%, and only 10.3% of studies indicating accuracy rates between 90% and 100%, see Figure 5.
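The four-way grouping above can be sketched as a simple binning function. This is a minimal illustration under our own assumptions (the sample values and function name are hypothetical; the article reports only the resulting shares of 55.7%, 19.9%, 14.1%, and 10.3%):

```python
# Minimal sketch of binning reported accuracies into the four
# categories used in Figure 5. Sample values are hypothetical.
from collections import Counter

def accuracy_bin(reported_accuracy):
    """Map a study's reported accuracy (percent, or None when no
    accuracy index is given) to the category labels of the analysis."""
    if reported_accuracy is None:
        return "No accuracy reported"
    if reported_accuracy < 80:
        return "Below 80%"
    if reported_accuracy < 90:
        return "80-89%"
    return "90-100%"

sample = [None, 74.2, 85.0, 93.5, None, 91.0]  # hypothetical studies
print(Counter(accuracy_bin(a) for a in sample))
```

Dividing each bin's count by the number of included studies then yields the percentages reported above.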
The fact that only 10.3% of studies report accuracies above 90% reflects several factors: the complexity of higher cognitive processes; the misalignment between quantitative metrics and actual cognitive performance; the diversity of tasks evaluated; and contextual and ethical constraints. Most studies evaluate highly complex cognitive domains rather than superficial tasks. In real-world contexts, judgment and reasoning, as well as decision-making, involve ambiguity, doubt, contextual changes, and a strong reliance on implicit knowledge [2,11].
These characteristics mean that even the performance of a human expert is imperfect and that, therefore, achieving accuracy levels above 90% represents a demanding metric and—some authors consider—should not be interpreted as evidence of the limitation of artificial intelligence compared to human performance [13].
For example, in the field of dentistry, although AI applications can exceed the recommended margins of error in most cephalometric points, even small inaccuracies are considered clinically unacceptable due to the risk associated with incorrect therapeutic decisions [5]. In these cases, an accuracy close to but below 90% does not imply poor performance from a technical standpoint, but rather an incompatibility between clinical safety standards and traditional statistical metrics.
Likewise, studies that report favorable results tend to do so in well-defined subtasks, while those that examine comprehensive cognitive processes tend to report more moderate accuracies. Thus, in specific tasks such as pattern detection in gastroscopy, digital pathology, or radiology images, these systems may fail when required to integrate multiple sources of information, explain decisions, or adapt to atypical cases [2]. This gap between high performance on narrow tasks and overall performance explains why maximum accuracy percentages remain relatively low when broader studies are considered [55].
Next, the studies are classified by research design as “Experimental studies” or “Non-experimental studies”, finding that, of the studies included, 78% are experimental and 22% are non-experimental, see Figure 6.
The predominance of experimental studies (78%) over non-experimental studies (22%) indicates that much of the research evidence is generated under controlled designs, which limits the extrapolation of results to real-world contexts. This gap between experimental conditions and real-world applications means that researchers should exercise caution when evaluating the reliability of the technology, even when the experimental results are favorable [7,12].
Furthermore, the components of each study—ethical, social, contextual, cultural, etc.—that introduce additional restrictions on the deployment and evaluation of AI must also be considered [56]. Variability in public and cultural perceptions of the ethics of AI use means that, in many scenarios, systems are not implemented or evaluated solely on the basis of maximum accuracy, but also on criteria of reliability, transparency, and acceptability in a social context [57]. As a result, study designs are rarely aimed solely at maximizing performance metrics, which reduces the number of studies reporting extremely high accuracy [9,86,87].
Therefore, the main challenge lies not only in improving the technical performance of models, but also in developing more transparent AI systems, framed in a context and in line with the cognitive, ethical, and social expectations of (human) users, in order to achieve generalization [8].
In addition, 83.2% of the included studies conclude that the technologies cannot be fully relied upon, while 16.8% reflect confidence in the technology, see Figure 7.
These findings show mistrust as a multidimensional phenomenon, bringing together factors such as accuracy, explainability, context of application, and ethical, legal, and social considerations [53,56,88].
It should be noted that studies show that, although AI can achieve high performance in certain specific parameters such as partial or asymmetric accuracy, it fails in other equally critical areas [47,53]. The feasibility study on real-time feedback in CPR sequences is illustrative: the AI tool assessed compression frequency with high concordance but did not achieve adequate accuracy in compression depth [4].
This inconsistency creates a scenario in which the technology is only reliable in a fragmented way, which is insufficient in medical domains where even occasional errors can have serious consequences [24,53]. Thus, mistrust does not stem from the total inability of AI, but from its inability to guarantee consistent performance across all relevant dimensions of the task [89].
For example, the lack of transparency in AI decision-making processes when explainability is required contributes significantly to the perception of mistrust [90]. When models achieve results comparable to or superior to humans in various studies, although difficulties in interpreting how and why certain predictions are made are pointed out, there is not necessarily complete transparency about the model applied [91]. This is evident in research that uses AI models to evaluate surgical skills, such as in robotic gastrectomy, where instrument recognition allows differentiation between experienced and inexperienced surgeons [55]. Although the results are promising, discrepancies between AI-predicted instrument use and human evaluation raise questions about the cognitive basis of such predictions and the model applied, limiting their acceptance as an autonomous assessment tool [92].
As indicated, mistrust stems not only from experimental results but also from uncertainty about the applicable ethical, legal, and social criteria (ELSI), which transcend technical performance [87]. Comparative studies between Japan, the USA, Germany, and Korea show that concerns about AI vary significantly between countries and that greater knowledge about the technology is often associated with a higher level of concern. This shows that mistrust is the result of a more conscious understanding of the implications of using technology, its risks, and limitations [49]. A lack of awareness about the ethics of AI use reinforces this perception, especially in scenarios where automated decisions can affect rights, power relations, or professional responsibilities [87].
Furthermore, heterogeneity in the evidence-based evaluation process introduces additional pressure when comparing predictions from an AI system and human experts in high-risk professional areas such as healthcare [54]. Although several studies report discrepancies between AI decisions and those of specialists, it is necessary to question the implicit assumption that human opinion is homogeneous and consistent [93]. Indeed, in certain fields, disagreements among human experts are frequent, which raises the question of whether part of the mistrust toward AI stems from comparisons based on a standard proposed by humans that is not necessarily stable or uniform [94]. In this sense, mistrust reflects both the limitations of AI and the inherent ambiguities attributed to human evaluation [95].
The AI systems found in the studies included in this work were also identified, standardized, and classified according to the field of study.
The resulting table is arranged by classification, term, frequency, and percentage, see Table 4; the AI systems studied in each human cognitive process, with their categorization, number, and description, are presented in Table 5.
The data show that large language models (LLMs)—particularly the GPT family, along with BERT, Claude, Gemini, and Llama—are concentrated in almost all identified cognitive processes. However, their dominance is not homogeneous, but rather responds to the cognitive properties that each architecture is capable of emulating, reproducing, or amplifying, as determined by their developers.
In the decision-making process, which is the most represented in the literature, there is a clear hegemony of hybrid architectures that combine language models, deep neural networks, and classic supervised learning algorithms such as Random Forest, XGBoost, or logistic regression.
This predominance can be explained by the fact that decision-making in complex contexts often involves the integration of large volumes of heterogeneous information, the detection of hidden patterns, and the generation of low-certainty recommendations.
LLMs provide semantic contextualization and approximate reasoning, while traditional models offer statistical stability and tracking in high-risk domains such as medicine or engineering. The high frequency of medical and diagnostic applications in this domain reinforces the idea that their value lies not only in their accuracy, but also in their ability to reduce human cognitive load in scenarios where errors have critical consequences.
For its part, the domain of analysis and evaluation is frequently identified with convolutional architectures and computer vision models, such as ResNet, UNet, EfficientNet, VGG, and variants of Vision Transformers, along with generative image models such as DALL-E, Stable Diffusion, or Midjourney. Thus, analysis and evaluation, as they appear in the literature, are mostly associated with tasks of visual pattern recognition, segmentation, classification, and quality assessment, where human cognition relies on highly specialized perceptual skills but is limited by fatigue (a complex human condition) and subjectivity. In these contexts, the comparative advantage of AI is clear, as these architectures consistently outperform humans in speed, stability, and scalability, even when absolute accuracy does not reach the 90% threshold in all studies. Their value therefore lies in the standardization of evaluative judgment, rather than the complete replacement of human experts.
Likewise, the application of judgment and reasoning, although less represented quantitatively, shows a significant pattern: it is dominated almost exclusively by language models trained on large textual corpora, such as GPT, BERT, Bard, and Llama, along with specific architectures such as RAG or argumentative classifiers. This is explained by the fact that judgment and reasoning, unlike perceptual analysis, depend on symbolic manipulation, discursive coherence, and contextual inference, areas where LLMs have demonstrated remarkable emerging capabilities. Despite this, the low number of studies suggests that, as promising architectures, their merit is more instrumental and complementary than substitutive of expert human reasoning, especially in ethical, legal, or clinical contexts.
In the case of comprehension and learning processes, language models predominate, but with a different orientation: the emphasis is on assimilating information, generating explanations, and facilitating educational or training processes, rather than deciding or judging. Models such as ChatGPT, MedAlpaca, and ORCA-mini appear repeatedly because their architecture favors pedagogical interaction, adaptation to the user’s level, and the generation of explanations in natural language. Their contribution from a cognitive perspective is very high when their use is evaluated as a cognitive structure, that is, they are conceived as a support for human learning and—to a lesser extent—as an autonomous agent of knowledge.
It is worth noting the scarce representation of architectures oriented towards ethical reasoning, despite the growing social and academic concern reflected in the studies. The limited presence of models such as GPT-4, Claude, or Bard in this domain shows that ethics continues to be treated more as an external normative level than as a cognitive capacity integrated into AI systems. This supports the idea that the mistrust detected in the studies reviewed is not explained solely by technical limitations, but by the misalignment between the current cognitive capabilities of AI and human expectations regarding responsibility, dilemmas, reasoning, values, and moral judgment.
Likewise, it is evident that architectures predominate in those cognitive domains that are linked to their fundamental design. Convolutional networks excel in visual perception and analysis; language models excel in symbolic, discursive, and decision-making domains; and traditional algorithms maintain their prominence when interpretability and statistical robustness are priorities. The true contribution of these architectures lies in achieving levels of accuracy comparable to or superior to humans, and fundamentally in their ability to reconfigure human cognitive processes, transferring the load between automation, supervision, and expert judgment.
In addition, 31 studies were identified whose results show that the use of artificial intelligence achieves an accuracy rate of between 90% and 100%. These results were obtained in research in the following categories: “Medicine and health”, “Health and technology education”, “Technological Innovation for Sustainability and Health”, “Technology and Humanism in Health”, and “Technology and Human Interaction”, see Table 6.

5. Discussion

The question of how intelligent AI is compared with humans, based on the empirical and experimental evidence analyzed here, is difficult to answer yet will remain necessary. Given that the very meaning and definition of intelligence is contested across philosophy, psychology, neuroscience, and other fields, no adequate and comprehensive answer is possible. Although artificial intelligence systems have achieved success in solving complex problems, professionals still do not fully trust them as a sole measure.
In answering RQ1, the findings show that the term “artificial intelligence” is no longer a foreign scientific concept but has become an operationalization developed in applications of human cognition [13].
The flexible, adaptable, and critical judgment capabilities of human intelligence remain the initial basis for study in any experiment. However, the criteria of human experts are unique [12], because they operate in a complex ethical and subjective context. In this sense, the need for critical application is evident [5]: AI still requires professional supervision by an expert, demonstrating that human knowledge is indispensable. Recent studies are therefore focusing their efforts on rethinking whether artificial intelligence is a phenomenon that should remain on the margins of collaboration [5,10]. The term “artificial intelligence”, as described for systems capable of performing tasks, requires human involvement [4,12].
In answering RQ2, the findings show a lack of ethical judgment, creativity, and adaptability in different situations. Studies show that AI can fail critically in tasks that require handling unexpected or high-risk variables. Measuring the depth of cardiopulmonary resuscitation compressions, for example, requires an emergency procedure that is used by trained professionals when a person is not breathing or has no heartbeat [4]. Thus, the consensus is that, despite its analytical power, AI cannot assume ultimate responsibility for its actions, an intrinsic quality that corresponds to intelligence and human beings [6,11]. Mistrust and/or resistance toward the application of AI is not simply resistance to technological change, but rather a rational response to the discrepancy between algorithmic performance and human cognitive standards constructed from psychology and neuroscience [25]. From a cognitive scientific perspective, the mistrust identified in the literature is interpreted as conceptual discrepancy, in addition to the technical limitations experienced. If research results report high levels of performance, accuracy, or efficiency in tasks, they are considered evidence of operational success rather than the achievement of essential cognitive mechanisms such as semantic comprehension, contextual reasoning, or reflective judgment. As highlighted in psychology and neuroscience, human confidence is based on the reliability of results, perceived intentionality, consistency, and the ability to justify decisions [17,18,21]. Consequently, when AI systems demonstrate high performance without transparent reasoning or contextual awareness, trust remains weak despite favorable quantitative metrics [24].
However, advocates of AI as intelligence demonstrate that its performance in data analysis tasks is superior and provides compelling evidence [2], and have proven to be more accurate in certain fields, especially in health [3,6]. All in all, these findings show that AI, understood as a knowledge base, mimics human intelligence in terms of high data processing but is in fact a reflection of it in specific domains.
In answering RQ3, empirical findings reveal “erroneous academic perceptions”, or the generation of incorrect but plausible information [9], as evidence that AI lacks a real understanding of complex scenarios, since it does not have a subjective frame of reference that allows it to validate truth in situations requiring creativity, emotion, empathy, or critical judgment.
The contributions analyzed show a kind of hierarchy of practical application complexity, beyond the theorization or definition of cognitive processes in this research (a non-predominant aspect). The basis of this hierarchy, although it may seem contradictory, lies in “understanding and learning” (NLP), which, in turn, would enable “analysis and evaluation” [8]. From there, the complexity increases for the development of “judgment and reasoning” (such as diagnostic inferences) [12] and, at its highest level, “decision-making” [50].
At the same time, there is a significant dominance (56 percent of the analyzed corpus) of the study of AI “decision-making”, which shows that the application of AI actively seeks to “increase” (and replace?) the most decisive human cognitive process in professional fields.
It should be noted that this study highlights the need to distinguish between the attribution of cognitive processes and task-based functional equivalence. Although the present study classifies research according to cognitive processes (such as decision-making, analysis, or reasoning), the results indicate that most studies operationalize these processes through task execution, architecture, performance, and standardized metrics, using explicit, customized, and contextualized modeling of human cognitive mechanisms. In this sense, describing an AI system as capable of “reasoning” or “judging” often reflects its ability to optimize decisions or detect patterns, rather than the presence of human-like inferential or evaluative cognition. This distinction is necessary to avoid anthropomorphic interpretations and ensure conceptual rigor when comparing artificial and human intelligence [36,99].
Today, there is a public debate about the implications of artificial intelligence in social, legal, and ethical contexts; it is no longer limited to analysis by experts, academics, researchers, or scientists [87]. Indeed, its increasingly widespread and intensive presence and use is giving rise to public concern about accountability, transparency, security, and trust, especially in situations where the results of algorithms are subject to human judgment and ethical evaluation. The benefits of AI are being advocated as a possible extension and improvement of human capabilities towards a model of responsible, effective, and collaborative interaction between humans and intelligent systems [47]. However, this study suggests that such collaboration can only be sustainable if clear epistemic boundaries are maintained, recognizing the limitations of artificial systems and reaffirming the indispensable role of human cognition in contextual understanding, ethical reasoning, and final decision-making.

6. Conclusions

Scientific evidence indicates that AI shows performance comparable, and at times superior, to humans across different domains and processes of an “apparently cognitive” artificial nature, but not in high-risk ones. Indeed, the findings show the tension between different outcomes when establishing “judgment reliability”, as AI appears to be superior in evaluation but deficient in judgment related to decision-making.
The corpus of papers analyzed is structured almost entirely around higher (or complex) cognitive processes. As a conclusion derived from the selection of studies (using the PRISMA method), it was found that none of the papers reviewed had the purpose or content of defining or theorizing about cognitive processes in relation to specific AI applications. Notions such as “reasoning”, “judgment”, or “decision-making” are tacit knowledge.
Specifically, we have determined that the dominant process is “Decision-making” (163 papers, 56% of the corpus), focused on AI’s ability to select a course of action, diagnose, or choose; often in scenarios with direct consequences. This is followed by “analysis and evaluation” (73 papers, 25%), which focuses on AI’s ability to break down complex information, compare the quality of responses (AI vs. human), and evaluate accuracy. Next, in order of importance, we find that “judgment and reasoning” (25 papers, 8.6%) studies inference, logical deduction, and the application of rules in clinical or legal contexts. For its part, “comprehension and learning” (16 papers, 5.5%) examines AI’s ability to understand natural language (NLP) and generate coherent knowledge or summaries. Finally, other specific processes (14 papers, 4.8%) deal with more granular processes such as “high-level visual perception”, “selective attention”, and “ethical reasoning”.
However, rather than measuring algorithmic accuracy, these studies investigate the reliability, biases, and quality of collaboration between human agents and intelligent systems, thereby delineating the thresholds of trust and the ethical and practical implications of integrating these tools into high-responsibility workflows.
The collected works analyze various higher cognitive processes. The research does not stop at the automation of simple tasks but tests the ability of algorithms to execute and potentially surpass human judgment, situational analysis, inferential reasoning, and decision-making in environments characterized by ambiguity, high pressure, or the need to interpret complex data (visual or textual). Across the board, these studies explore human–AI interaction and cognitive augmentation. The most significant theoretical contribution is not found in a single article defining cognition but in the implicit classification that this body of research uses to structure human–AI interaction.
Methodologically speaking, the studies propose scenarios of high fidelity and domain specificity, moving away from a theoretical treatment of the phenomenon under study and focusing, instead, on its direct observation and evaluation in comparison with expert cognition, in complex tasks typical of professional practice.
It can be concluded that AI is becoming a potentially powerful tool with high data processing capabilities that is closer to being described as “artificial” than “intelligent”. The cooperation between the two, a form of augmented intelligence, is seen as the clearest strategy for tackling complex problems, combining the speed and accuracy of AI with the experience and ethics of humans. Competition between machines and humans, which ends with an eventual winner, remains the subject of science fiction, popular imagination, and scientific research [1].
AI, as a technological tool with a high-capacity knowledge base or “intelligence”, will not replace humans, but rather enhance their unique cognitive abilities. This is the principle of “human-centered AI” [1] and is conceived as a model where the cooperation, speed, and consistency of AI are combined with human creativity and ethical judgment to create a form of augmented intelligence [47].
In other words, the redefinition proposed by the studies found is that the most effective intelligence is neither artificial nor human alone, but rather their synergy. Thus, the synergy between human and AI capabilities should be interpreted as functional complementarity, rather than cognitive equivalence. Based on the findings of the numerous studies reviewed here, we maintain our position in suggesting the use of AI systems to support human analysis and decision-making processes, as well as the variety of cognitive processes addressed thus far, reserving judgment (in its entirety) and responsibility for action to the individual.

Author Contributions

Conceptualization, R.A.-C. and J.L.-I.; methodology, R.A.-C. and J.L.-I.; software, R.A.-C. and J.L.-I.; validation, R.A.-C. and J.L.-I.; formal analysis, R.A.-C. and J.L.-I.; investigation, R.A.-C. and J.L.-I.; resources, R.A.-C. and J.L.-I.; data curation, R.A.-C. and J.L.-I.; writing—original draft preparation, R.A.-C. and J.L.-I.; writing—review and editing, R.A.-C. and J.L.-I.; visualization, R.A.-C. and J.L.-I.; supervision, R.A.-C. and J.L.-I.; project administration, R.A.-C. and J.L.-I.; funding acquisition, R.A.-C. and J.L.-I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Esposito, A.; Desolda, G.; Lanzilotti, R. The fine line between automation and augmentation in website usability evaluation. Sci. Rep. 2024, 14, 10129. [Google Scholar] [CrossRef]
  2. Turtoi, D.C.; Brata, V.D.; Incze, V.; Ismaiel, A.; Dumitrascu, D.I.; Militaru, V.; Munteanu, M.A.; Botan, A.; Toc, D.A.; Duse, T.A.; et al. Artificial Intelligence for the Automatic Diagnosis of Gastritis: A Systematic Review. J. Clin. Med. 2024, 13, 4818. [Google Scholar] [CrossRef]
  3. Díaz-Herrera, B.A.; Roman-Rangel, E.; Castro-García, C.A.; Sierra-Lara Martinez, D.; Gopar-Nieto, R.; Velez-Talavera, K.G.; Espinosa-Martínez, M.P.; March-Mifsut, S.; Latapi-Ruiz-Esparza, X.; Preciado-Gutiérrez, Ó.U.; et al. Derivación de un modelo electrocardiográfico basado en inteligencia artificial para la detección de infarto agudo del miocardio por oclusión trombótica. Arch. Cardiol. México 2025, 95, 178–187. [Google Scholar] [CrossRef]
  4. Ecker, H.; Adams, N.B.; Schmitz, M.; Wetsch, W.A. Feasibility of real-time compression frequency and compression depth assessment in CPR using a “machine-learning” artificial intelligence tool. Resusc. Plus 2024, 20, 100825. [Google Scholar] [CrossRef] [PubMed]
  5. Polizzi, A.; Leonardi, R. Automatic cephalometric landmark identification with artificial intelligence: An umbrella review of systematic reviews. J. Dent. 2024, 146, 105056. [Google Scholar] [CrossRef]
  6. Chen, J.; Huang, S.; Zhang, Y.; Chang, Q.; Zhang, Y.; Li, D.; Qiu, J.; Hu, L.; Peng, X.; Du, Y.; et al. Congenital heart disease detection by pediatric electrocardiogram based deep learning integrated with human concepts. Nat. Commun. 2024, 15, 976. [Google Scholar] [CrossRef] [PubMed]
  7. Chang, M.Y.; Heidary, G.; Beres, S.; Pineles, S.L.; Gaier, E.D.; Gise, R.; Reid, M.; Avramidis, K.; Rostami, M.; Narayanan, S. Artificial Intelligence to Differentiate Pediatric Pseudopapilledema and True Papilledema on Fundus Photographs. Ophthalmol. Sci. 2024, 4, 100496. [Google Scholar] [CrossRef] [PubMed]
  8. Zhang, J.; Wu, P.; London, J.; Tenney, D. Benchmarking and Evaluating Large Language Models in Phishing Detection for Small and Midsize Enterprises: A Comprehensive Analysis. IEEE Access 2025, 13, 28335–28352. [Google Scholar] [CrossRef]
  9. Chen, X.; Zhao, Z.; Zhang, W.; Xu, P.; Wu, Y.; Xu, M.; Gao, L.; Li, Y.; Shang, X.; Shi, D.; et al. EyeGPT for Patient Inquiries and Medical Education: Development and Validation of an Ophthalmology Large Language Model. J. Med. Internet Res. 2024, 26, e60063. [Google Scholar] [CrossRef]
  10. Lavista Ferres, J.M.; Oviedo, F.; Robinson, C.; Chu, L.; Kawamoto, S.; Afghani, E.; He, J.; Klein, A.P.; Goggins, M.; Wolfgang, C.L.; et al. Performance of explainable artificial intelligence in guiding the management of patients with a pancreatic cyst. Pancreatology 2024, 24, 1182–1191. [Google Scholar] [CrossRef]
  11. Gumilar, K.E.; Wardhana, M.P.; Akbar, M.I.A.; Putra, A.S.; Banjarnahor, D.P.P.; Mulyana, R.S.; Fatati, I.; Yu, Z.Y.; Hsu, Y.C.; Dachlan, E.G.; et al. Artificial intelligence-large language models (AI-LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice. Comput. Struct. Biotechnol. J. 2025, 27, 1140–1147. [Google Scholar] [CrossRef]
  12. Bowness, J.S.; Morse, R.; Lewis, O.; Lloyd, J.; Burckett-St Laurent, D.; Bellew, B.; Macfarlane, A.J.R.; Pawa, A.; Taylor, A.; Noble, J.A.; et al. Variability between human experts and artificial intelligence in identification of anatomical structures by ultrasound in regional anaesthesia: A framework for evaluation of assistive artificial intelligence. Br. J. Anaesth. 2024, 132, 1063–1072. [Google Scholar] [CrossRef] [PubMed]
  13. Boyd, C.J.; Rivera, L.R.P.; Hemal, K.; Sorenson, T.J.; Amro, C.; Choi, M.; Karp, N.S. Analyzing the precision and readability of a healthcare focused artificial intelligence platform on common questions regarding breast augmentation. Artif. Intell. Surg. 2024, 4, 316–323. [Google Scholar] [CrossRef]
  14. Llerena-Izquierdo, J.; Mendez-Reyes, J.; Ayala-Carabajo, R.; Andrade-Martinez, C. Innovations in Introductory Programming Education: The Role of AI with Google Colab and Gemini. Educ. Sci. 2024, 14, 1330. [Google Scholar] [CrossRef]
  15. Kammer, J.E.; Hautz, W.E.; Krummrey, G.; Sauter, T.C.; Penders, D.; Birrenbach, T.; Bienefeld, N. Effects of interacting with a large language model compared with a human coach on the clinical diagnostic process and outcomes among fourth-year medical students: Study protocol for a prospective, randomised experiment using patient vignettes. BMJ Open 2024, 14, e087469. [Google Scholar] [CrossRef] [PubMed]
  16. Stanovich, K.E.; Toplak, M.E.; West, R.F. Intelligence and Rationality. In The Cambridge Handbook of Intelligence; Cambridge Handbooks in Psychology; Cambridge University Press: Cambridge, UK, 2020; pp. 1106–1139. [Google Scholar]
  17. Goldenholz, D.M.; Goldenholz, S.R.; Habib, S.; Westover, M.B. Inductive reasoning with large language models: A simulated randomized controlled trial for epilepsy. Epilepsy Res. 2025, 211, 107532. [Google Scholar] [CrossRef]
  18. Evans, J.S. Dual-process theories of reasoning: Contemporary issues and developmental applications. Dev. Rev. 2011, 31, 86–102. [Google Scholar] [CrossRef]
  19. Abdullahi, T.; Singh, R.; Eickhoff, C. Learning to Make Rare and Complex Diagnoses with Generative AI Assistance: Qualitative Study of Popular Large Language Models. JMIR Med. Educ. 2024, 10, e51391. [Google Scholar] [CrossRef]
  20. Jaworski, A.; Jasiński, D.; Sławińska, B.; Błecha, Z.; Jaworski, W.; Kruplewicz, M.; Jasińska, N.; Sysło, O.; Latkowska, A.; Jung, M. GPT-4o vs. Human Candidates: Performance Analysis in the Polish Final Dentistry Examination. Cureus 2024, 16, e68813. [Google Scholar] [CrossRef]
  21. Badre, D.; Nee, D.E. Frontal Cortex and the Hierarchical Control of Behavior. Trends Cogn. Sci. 2018, 22, 170–188. [Google Scholar] [CrossRef]
  22. Habicht, J.; Dina, L.M.; McFadyen, J.; Stylianou, M.; Harper, R.; Hauser, T.U.; Rollwage, M. Generative AI–Enabled Therapy Support Tool for Improved Clinical Outcomes and Patient Engagement in Group Therapy: Real-World Observational Study. J. Med. Internet Res. 2025, 27, e60435. [Google Scholar] [CrossRef] [PubMed]
  23. Lieder, F.; Griffiths, T.L. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behav. Brain Sci. 2020, 43, e1. [Google Scholar] [CrossRef]
  24. Kobayashi, K.; Takamizawa, Y.; Miyake, M.; Ito, S.; Gu, L.; Nakatsuka, T.; Akagi, Y.; Harada, T.; Kanemitsu, Y.; Hamamoto, R. Can physician judgment enhance model trustworthiness? A case study on predicting pathological lymph nodes in rectal cancer. Artif. Intell. Med. 2024, 154, 102929. [Google Scholar] [CrossRef] [PubMed]
  25. Seth, A.K. Conscious artificial intelligence and biological naturalism. Behav. Brain Sci. 2025, 1–42. [Google Scholar] [CrossRef]
  26. Solomou, S.; Sengupta, U. Simulating Complex Urban Behaviours with AI: Incorporating Improved Intelligent Agents in Urban Simulation Models. Urban Plan. 2025, 10, 8561. [Google Scholar] [CrossRef]
  27. Xu, P.; Estrada, S.; Etteldorf, R.; Liu, D.; Shahid, M.; Zeng, W.; Früh, D.; Reuter, M.; Breteler, M.M.B.; Aziz, N.A. Hypothalamic volume is associated with age, sex and cognitive function across lifespan: A comparative analysis of two large population-based cohort studies. eBioMedicine 2025, 111, 105513. [Google Scholar] [CrossRef]
  28. Ringgold, V.; Abel, L.; Eskofier, B.M.; Rohleder, N. Validation of the Virtual Reality Stroop Room: Effects of inhibiting interfering information under time-pressure and task-switching demands. Comput. Hum. Behav. Rep. 2024, 16, 100497. [Google Scholar] [CrossRef]
  29. Viscasillas Vázquez, C.; Solano, E.; Ulla, A.; Ambrosch, M.; Álvarez, M.A.; Manteiga, M.; Magrini, L.; Santoveña-Gómez, R.; Dafonte, C.; Pérez-Fernández, E.; et al. Advanced classification of hot subdwarf binaries using artificial intelligence techniques and Gaia DR3 data. Astron. Astrophys. 2024, 691, A223. [Google Scholar] [CrossRef]
  30. de Lellis Rossi, L.; Rohmer, E.; Dornhofer Paro Costa, P.; Colombini, E.; da Silva Simões, A.; Gudwin, R. A Procedural Constructive Learning Mechanism with Deep Reinforcement Learning for Cognitive Agents. J. Intell. Robot. Syst. 2024, 110, 38. [Google Scholar] [CrossRef]
  31. Roshan, M.P.; Al-Shaikhli, S.A.; Linfante, I.; Antony, T.T.; Clarke, J.E.; Noman, R.; Lamy, C.; Britton, S.; Belnap, S.C.; Abrams, K.; et al. Revolutionizing Intracranial Hemorrhage Diagnosis: A Retrospective Analytical Study of Viz.ai ICH for Enhanced Diagnostic Accuracy. Cureus 2024, 16, e66449. [Google Scholar] [CrossRef]
  32. Putra, R.V.W.; Marchisio, A.; Shafique, M. SNN4Agents: A framework for developing energy-efficient embodied spiking neural networks for autonomous agents. Front. Robot. AI 2024, 11, 1401677. [Google Scholar] [CrossRef]
  33. Zhou, J.; Duan, Y.; Chang, Y.C.; Wang, Y.K.; Lin, C.T. BELT: Bootstrapped EEG-to-Language Training by Natural Language Supervision. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 3278–3288. [Google Scholar] [CrossRef] [PubMed]
  34. Frank, B. Consumer preferences for artificial intelligence-enhanced products: Differences across consumer segments, product types, and countries. Technol. Forecast. Soc. Change 2024, 209, 123774. [Google Scholar] [CrossRef]
  35. Hadar-Shoval, D.; Asraf, K.; Mizrachi, Y.; Haber, Y.; Elyoseph, Z. Assessing the Alignment of Large Language Models with Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values. JMIR Ment. Health 2024, 11, e55988. [Google Scholar] [CrossRef]
  36. Kim, H.; Jeon, B.; Kim, M.C.; Choi, Y. Feasibility study on dose conversion using a deep learning algorithm for retrospective dosimetry. Radiat. Meas. 2025, 181, 107382. [Google Scholar] [CrossRef]
  37. Reinecke, M.G.; Wilks, M.; Bloom, P. Developmental changes in the perceived moral standing of robots. Cognition 2025, 254, 105983. [Google Scholar] [CrossRef] [PubMed]
  38. Myers, S.; Everett, J.A. People expect artificial moral advisors to be more utilitarian and distrust utilitarian moral advisors. Cognition 2025, 256, 106028. [Google Scholar] [CrossRef]
  39. Han, D.; Shanbhag, A.; Miller, R.J.; Kwok, N.; Waechter, P.; Builoff, V.; Newby, D.E.; Dey, D.; Berman, D.S.; Slomka, P. AI-Derived Left Ventricular Mass From Noncontrast Cardiac CT. JACC Adv. 2024, 3, 101249. [Google Scholar] [CrossRef]
  40. Mehrotra, S.; Jorge, C.C.; Jonker, C.M.; Tielman, M.L. Integrity-based Explanations for Fostering Appropriate Trust in AI Agents. ACM Trans. Interact. Intell. Syst. 2024, 14, 1–36. [Google Scholar] [CrossRef]
  41. Deeb, B.M.; Savchenko, A.V.; Makarov, I. Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features. IEEE Access 2025, 13, 56283–56295. [Google Scholar] [CrossRef]
  42. Bulla, L.; De Giorgis, S.; Mongiovì, M.; Gangemi, A. Large Language Models meet moral values: A comprehensive assessment of moral abilities. Comput. Hum. Behav. Rep. 2025, 17, 100609. [Google Scholar] [CrossRef]
  43. Chu, R.; Chik, L.; Song, Y.; Chan, J.; Li, X. An effective approach for early fuel leakage detection with enhanced explainability. Intell. Syst. Appl. 2025, 26, 200504. [Google Scholar] [CrossRef]
  44. Shimada, K.; Inokuchi, R.; Ohigashi, T.; Iwagami, M.; Tanaka, M.; Gosho, M.; Tamiya, N. Artificial intelligence-assisted interventions for perioperative anesthetic management: A systematic review and meta-analysis. BMC Anesthesiol. 2024, 24, 306. [Google Scholar] [CrossRef]
  45. Farid, Y.; Fernando Botero Gutierrez, L.; Ortiz, S.; Gallego, S.; Zambrano, J.C.; Morrelli, H.U.; Patron, A. Artificial Intelligence in Plastic Surgery: Insights from Plastic Surgeons, Education Integration, ChatGPT’s Survey Predictions, and the Path Forward. Plast. Reconstr. Surg. Glob. Open 2024, 12, e5515. [Google Scholar] [CrossRef]
  46. Wang, H.; Wu, Y.; Guo, S.; Wang, L. PDPP: Projected Diffusion for Procedure Planning in Instructional Videos. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 2107–2124. [Google Scholar] [CrossRef]
  47. Bixby, C.J.; Miller, B. Real-world use of an artificial intelligence-powered clinical decision support tool for ovarian stimulation. F&S Rep. 2025, 6, 140–146. [Google Scholar] [CrossRef]
  48. Rojas, M.; Rojas, M.; Burgess, V.; Toro-Pérez, J.; Salehi, S. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 with Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Med. Educ. 2024, 10, e55048. [Google Scholar] [CrossRef] [PubMed]
  49. Prazeres, F. ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini. JMIR Med. Educ. 2025, 11, e65108. [Google Scholar] [CrossRef] [PubMed]
  50. Dashti, M.; Londono, J.; Ghasemi, S.; Zare, N.; Samman, M.; Ashi, H.; Amirzade-Iranaq, M.H.; Khosraviani, F.; Sabeti, M.; Khurshid, Z. Comparative analysis of deep learning algorithms for dental caries detection and prediction from radiographic images: A comprehensive umbrella review. PeerJ Comput. Sci. 2024, 10, e2371. [Google Scholar] [CrossRef]
  51. Domalpally, A.; Slater, R.; Linderman, R.E.; Balaji, R.; Bogost, J.; Voland, R.; Pak, J.; Blodi, B.A.; Channa, R.; Fong, D.; et al. Strong versus Weak Data Labeling for Artificial Intelligence Algorithms in the Measurement of Geographic Atrophy. Ophthalmol. Sci. 2024, 4, 100477. [Google Scholar] [CrossRef]
  52. Ozbek, O.; Genc, D.E.; O. Ulgen, K. Advances in Physiologically Based Pharmacokinetic (PBPK) Modeling of Nanomaterials. ACS Pharmacol. Transl. Sci. 2024, 7, 2251–2279. [Google Scholar] [CrossRef]
  53. Whitney, H.M.; Yoeli-Bik, R.; Abramowicz, J.S.; Lan, L.; Li, H.; Longman, R.E.; Lengyel, E.; Giger, M.L. AI-based automated segmentation for ovarian/adnexal masses and their internal components on ultrasound imaging. J. Med. Imaging 2024, 11, 044505. [Google Scholar] [CrossRef]
  54. Naghavi, M.; Reeves, A.; Budoff, M.; Li, D.; Atlas, K.; Zhang, C.; Atlas, T.; Roy, S.K.; Henschke, C.I.; Wong, N.D.; et al. AI-enabled cardiac chambers volumetry in coronary artery calcium scans (AI-CACTM) predicts heart failure and outperforms NT-proBNP: The multi-ethnic study of Atherosclerosis. J. Cardiovasc. Comput. Tomogr. 2024, 18, 392–400. [Google Scholar] [CrossRef] [PubMed]
  55. Laperdrix, C.; Duhieu, S.; Haftek, M. Chondroitin/dermatan sulphate proteoglycan, desmosealin, showing affinity to desmosomes. Int. J. Cosmet. Sci. 2024, 46, 494–505. [Google Scholar] [CrossRef] [PubMed]
  56. Cardenas, J.M.; Gordon, D.; Waddell, B.S.; Kitziger, K.J.; Peters, P.C., Jr.; Gladnick, B.P. Does Artificial Intelligence Outperform Humans Using Fluoroscopic-Assisted Computer Navigation for Total Hip Arthroplasty? Arthroplast. Today 2024, 27, 101410. [Google Scholar] [CrossRef]
  57. Builoff, V.; Shanbhag, A.; Miller, R.J.; Dey, D.; Liang, J.X.; Flood, K.; Bourque, J.M.; Chareonthaitawee, P.; Phillips, L.M.; Slomka, P.J. Evaluating AI proficiency in nuclear cardiology: Large language models take on the board preparation exam. J. Nucl. Cardiol. 2025, 45, 102089. [Google Scholar] [CrossRef]
  58. Huang, R.S.; Benour, A.; Kemppainen, J.; Leung, F.H. The future of AI clinicians: Assessing the modern standard of chatbots and their approach to diagnostic uncertainty. BMC Med. Educ. 2024, 24, 1133. [Google Scholar] [CrossRef]
  59. Innerebner, K.; Kowald, D.; Schedl, M.; Lex, E. Hybrid Personalization Using Declarative and Procedural Memory Modules of the Cognitive Architecture ACT-R. In Proceedings of the UMAP Adjunct ’25: Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization, New York, NY, USA, 16–19 June 2025; pp. 349–352. [Google Scholar] [CrossRef]
  60. Dang, C.V.; Jun, M.; Shin, Y.B.; Choi, J.W.; Kim, J.W. Application of modified Asimov’s laws to the agent of home service robot using state, operator, and result (Soar). Int. J. Adv. Robot. Syst. 2018, 2018, 1–9. [Google Scholar] [CrossRef]
  61. Boggs, J. Towards visual-symbolic integration in the Soar cognitive architecture. Cogn. Syst. Res. 2025, 91, 101353. [Google Scholar] [CrossRef]
  62. Ronneberg, C.R.; Lv, N.; Ajilore, O.A.; Kannampallil, T.; Smyth, J.; Kumar, V.; Barve, A.; Garcia, C.; Dosala, S.; Wittels, N.; et al. Study of a PST-trained voice-enabled artificial intelligence counselor for adults with emotional distress (SPEAC-2): Design and methods. Contemp. Clin. Trials 2024, 142, 107574. [Google Scholar] [CrossRef]
  63. Zuo, G.; Pan, T.; Zhang, T.; Yang, Y. SOAR Improved Artificial Neural Network for Multistep Decision-making Tasks. Cogn. Comput. 2021, 13, 612–625. [Google Scholar] [CrossRef]
  64. Verdes, A.; Bhattachan, S.; Kolevzon, A.; King, B.H.; McDougle, C.J.; Sanders, K.B.; Kim, S.J.; Spanos, M.; Chandrasekhar, T.; Rockhill, C.; et al. Predictors of Placebo Response in the Study of Oxytocin in Autism to Improve Reciprocal Social Behaviors. J. Child Adolesc. Psychopharmacol. 2025, 35, 202–210. [Google Scholar] [CrossRef]
  65. Alkam, T.; Tarshizi, E.; Van Benschoten, A.H. Reinforcement learning at the interface of artificial intelligence and cognitive science. Neuroscience 2025, 585, 289–312. [Google Scholar] [CrossRef]
  66. Sievers, T.; Russwinkel, N. Using Memory Contents of a Cognitive Model for Prompt Augmentation of a Large Language Model. In Proceedings of the 2025 IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA), Duisburg, Germany, 2–5 June 2025; pp. 172–176. [Google Scholar] [CrossRef]
  67. Thomson, R.; Lebiere, C. Cognitive models of influence dynamics in a conformity simulation. Comput. Math. Organ. Theory 2025, 31, 323–343. [Google Scholar] [CrossRef]
  68. Huang, H.; Liu, J.; Zhang, B.; Zhao, S.; Li, B.; Wang, J. LEAD: Learning-Enhanced Adaptive Decision-Making for Autonomous Driving in Dynamic Environments. IEEE Trans. Intell. Transp. Syst. 2025, 26, 6142–6156. [Google Scholar] [CrossRef]
  69. Hölken, A.; Kugele, S.; Newen, A.; Franklin, S. Modeling interactions between the embodied and the narrative self: Dynamics of the self-pattern within LIDA. Cogn. Syst. Res. 2023, 81, 25–36. [Google Scholar] [CrossRef]
  70. Zhang, Z.; Zou, G.; Chen, C.; Qi, Z.; Yu, X.; Qi, J.; Yao, Y.; Li, X.; Xie, Y.; Tan, X. A Task-Aware Parameter Decoupling Framework for Continual Anomaly Detection. IEEE Trans. Ind. Inform. 2025, 1–11. [Google Scholar] [CrossRef]
  71. Kronsted, C.; Kugele, S.; Neemeh, Z.A.; Ryan, K.J.; Franklin, S. Embodied Intelligence: Smooth Coping in the Learning Intelligent Decision Agent Cognitive Architecture. Front. Psychol. 2022, 13, 846931. [Google Scholar] [CrossRef]
  72. Rayavaram, P.; Ukaegbu, O.; Abbasalizadeh, M.; Vellamchetty, K.; Narain, S. CryptoEL: A Novel Experiential Learning Tool for Enhancing K-12 Cryptography Education. In Proceedings of the SIGCSETS 2025: Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1, Pittsburgh, PA, USA, 26 February–1 March 2025; pp. 980–986. [Google Scholar] [CrossRef]
  73. Kim, K.A.; Hong, S.; Yoo, S.; Kang, Y.; Shim, H.S. Enhancing Structured Pathology Report Generation with Foundation Model and Modular Design. IEEE Access 2025, 13, 121290–121299. [Google Scholar] [CrossRef]
  74. Hussein, S.Y.S.H.; Ho, P.W.C. Towards brain-inspired edge AI: A review of memristor-based neuromorphic computing and learning algorithms. Eng. Res. Express 2025, 7, 032201. [Google Scholar] [CrossRef]
  75. Mao, J.; Zheng, H.; Yin, H.; Fan, H.; Mei, L.; Guo, H.; Li, Y.; Wu, J.; Pei, J.; Deng, L. Adaptive dendritic plasticity in brain-inspired dynamic neural networks for enhanced multi-timescale feature extraction. Neural Netw. 2026, 194, 108191. [Google Scholar] [CrossRef]
  76. Wu, C.M.; Meder, B.; Schulz, E. Unifying Principles of Generalization: Past, Present, and Future. Annu. Rev. Psychol. 2025, 76, 275–302. [Google Scholar] [CrossRef] [PubMed]
  77. Hewavitharana, J.; Anand, A.; Giese, P.; Moretti Ierardi, C.; Steinhofel, K. Brain Inspired Learning for Neural Networks. In Proceedings of the Engineering Applications of Neural Networks; Iliadis, L., Maglogiannis, I., Kyriacou, E., Jayne, C., Eds.; Springer: Cham, Switzerland, 2025; pp. 59–72. [Google Scholar]
  78. García-Torres, D.; Vicente Ripoll, M.A.; Fernández Peris, C.; Mira Solves, J.J. Enhancing Clinical Reasoning with Virtual Patients: A Hybrid Systematic Review Combining Human Reviewers and ChatGPT. Healthcare 2024, 12, 2241. [Google Scholar] [CrossRef]
  79. Levin, C.; Suliman, M.; Naimi, E.; Saban, M. Augmenting intensive care unit nursing practice with generative AI: A formative study of diagnostic synergies using simulation-based clinical cases. J. Clin. Nurs. 2025, 34, 2898–2907. [Google Scholar] [CrossRef]
  80. Mundinger, A. Artificial Intelligence in Senology—Where Do We Stand and What Are the Future Horizons? Eur. J. Breast Health 2024, 20, 73–80. [Google Scholar] [CrossRef] [PubMed]
  81. Fedorova, A.; Jovišić, N.; Vallverdù, J.; Battistoni, S.; Jovičić, M.; Medojević, M.; Toschev, A.; Alshanskaia, E.; Talanov, M.; Erokhin, V. Advancing Neural Networks: Innovations and Impacts on Energy Consumption. Adv. Electron. Mater. 2024, 10, 2400258. [Google Scholar] [CrossRef]
  82. Wilhelm, C.; Steckelberg, A.; Rebitschek, F.G. Benefits and harms associated with the use of AI-related algorithmic decision-making systems by healthcare professionals: A systematic review. Lancet Reg. Health Eur. 2025, 48, 101145. [Google Scholar] [CrossRef]
  83. Heisinger, S.; Salzmann, S.N.; Senker, W.; Aspalter, S.; Oberndorfer, J.; Matzner, M.P.; Stienen, M.N.; Motov, S.; Huber, D.; Grohs, J.G. ChatGPT’s Performance in Spinal Metastasis Cases—Can We Discuss Our Complex Cases with ChatGPT? J. Clin. Med. 2024, 13, 7864. [Google Scholar] [CrossRef]
  84. Rojek, I.; Mikołajewski, D.; Dostatni, E.; Piszcz, A.; Galas, K. ML-Based Maintenance and Control Process Analysis, Simulation, and Automation—A Review. Appl. Sci. 2024, 14, 8774. [Google Scholar] [CrossRef]
  85. Carrillo-de-la Peña, M.T.; Fernandes, C.; Castro, C.; Rubal, L.; Samartin-Veiga, N.; Yarnitzsky, D.; Arendt-Nielsen, L.; Dahl, C.; Medeiros, R.; Consortium, P. Validity of central pain processing biomarkers for predicting the occurrence of oncological chronic pain: A study protocol. BMC Cancer 2024, 24, 705. [Google Scholar] [CrossRef]
  86. Giansanti, D.; Lastrucci, A.; Iannone, A.; Pirrera, A. A Narrative Review of Systematic Reviews on the Applications of Social and Assistive Support Robots in the Health Domain. Appl. Sci. 2025, 15, 32. [Google Scholar] [CrossRef]
  87. Ikkatai, Y.; Itatsu, Y.; Hartwig, T.; Noh, J.; Takanashi, N.; Yaguchi, Y.; Hayashi, K.; Yokoyama, H.M. The relationship between the attitudes of the use of AI and diversity awareness: Comparisons between Japan, the US, Germany, and South Korea. AI Soc. 2025, 40, 2369–2383. [Google Scholar] [CrossRef]
  88. Combs, K.; Moyer, A.; Bihl, T.J. Uncertainty in Visual Generative AI. Algorithms 2024, 17, 136. [Google Scholar] [CrossRef]
  89. Suresh, S.; Qi, H.; Wu, T.; Fan, T.; Pineda, L.; Lambeta, M.; Malik, J.; Kalakrishnan, M.; Calandra, R.; Kaess, M.; et al. NeuralFeels with neural fields: Visuotactile perception for in-hand manipulation. Sci. Robot. 2024, 9, eadl0628. [Google Scholar] [CrossRef]
  90. Lundin, J.; Suutala, A.; Holmström, O.; Henriksson, S.; Valkamo, S.; Kaingu, H.; Kinyua, F.; Muinde, M.; Lundin, M.; Diwan, V.; et al. Diagnosis of soil-transmitted helminth infections with digital mobile microscopy and artificial intelligence in a resource-limited setting. PLoS Neglected Trop. Dis. 2024, 18, e0012041. [Google Scholar] [CrossRef] [PubMed]
  91. Tseng, L.W.; Lu, Y.C.; Tseng, L.C.; Chen, Y.C.; Chen, H.Y. Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study. JMIR Med. Educ. 2025, 11, e58897. [Google Scholar] [CrossRef]
  92. Djoumessi, K.; Huang, Z.; Kühlewein, L.; Rickmann, A.; Simon, N.; Koch, L.M.; Berens, P. An inherently interpretable AI model improves screening speed and accuracy for early diabetic retinopathy. PLoS Digit. Health 2025, 4, e0000831. [Google Scholar] [CrossRef]
  93. Zhang, X.; Tsang, C.C.S.; Ford, D.D.; Wang, J. Student Pharmacists’ Perceptions of Artificial Intelligence and Machine Learning in Pharmacy Practice and Pharmacy Education. Am. J. Pharm. Educ. 2024, 88, 101309. [Google Scholar] [CrossRef] [PubMed]
  94. Jho, H.; Ha, M. Towards Effective Argumentation: Design and Implementation of a Generative Ai-Based Evaluation and Feedback System. J. Balt. Sci. Educ. 2024, 23, 280–291. [Google Scholar] [CrossRef]
  95. Tognetti, L.; Miracapillo, C.; Leonardelli, S.; Luschi, A.; Iadanza, E.; Cevenini, G.; Rubegni, P.; Cartocci, A. Deep Learning Techniques for the Dermoscopic Differential Diagnosis of Benign/Malignant Melanocytic Skin Lesions: From the Past to the Present. Bioengineering 2024, 11, 758. [Google Scholar] [CrossRef] [PubMed]
  96. Hajam, M.A.; Arif, T.; Khanday, A.M.U.D.; Wani, M.A.; Asim, M. AI-Driven Pattern Recognition in Medicinal Plants: A Comprehensive Review and Comparative Analysis. Comput. Mater. Contin. 2024, 81, 2077–2131. [Google Scholar] [CrossRef]
  97. Funk, N.; Helmut, E.; Chalvatzaki, G.; Calandra, R.; Peters, J. Evetac: An Event-Based Optical Tactile Sensor for Robotic Manipulation. IEEE Trans. Robot. 2024, 40, 3812–3832. [Google Scholar] [CrossRef]
  98. Qiao, T.; Xiao, C.; Feng, Z.; Ye, J. Habitat Distributions and Abundance of Four Wild Herbivores on the Qinghai–Tibetan Plateau: A Review. Land 2025, 14, 23. [Google Scholar] [CrossRef]
  99. Zhang, Z.; Ding, X.; Liang, X.; Zhou, Y.; Qin, B.; Liu, T. Brain and Cognitive Science Inspired Deep Learning: A Comprehensive Survey. IEEE Trans. Knowl. Data Eng. 2025, 37, 1650–1671. [Google Scholar] [CrossRef]
Figure 1. Key issues guiding the development of the study.
Figure 2. Integrative methodological diagram of the research process.
Figure 3. Process for including relevant studies in the first phase using the PRISMA flow method.
Figure 4. Visual elements derived from the metadata of the works included in the study: (a) Heat map of the countries that collaborate closely in the field of study. (b) Relationship map of countries that work collaboratively. (c) Density map of relevant terms. (d) Heat map of the identified and related keywords, generated with the VOSviewer software.
Figure 5. Studies with or without accuracy percentages.
Figure 6. Differentiated studies with experimental and non-experimental characteristics.
Figure 7. Percentage of trust and distrust in AI systems based on the results.
Table 1. Research questions developed for justification purposes.
Research Questions
RQ1: How do human intelligence and artificial intelligence behave when solving cognitive problems?
RQ2: What experimental and/or empirical results exist on the performance of artificial intelligence compared to human cognitive processes?
RQ3: To what extent does empirical evidence, both for and against, challenge the concept of intelligence?
Table 2. Cognitive processes identified quantitatively in the literature review.
Cognitive Process | Quantity | Percentage
Decision-making | 163 | 56.01%
Analysis and evaluation | 73 | 25.09%
Judgment and reasoning | 25 | 8.59%
Comprehension and learning | 16 | 5.55%
Other cognitive processes 1 | 14 | 4.76%
1 Advanced cognitive processes in interaction with AI.
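The shares in Table 2 follow directly from the row counts over the 291 included studies. A minimal sketch of that arithmetic is below; small discrepancies against the published percentages for the last two rows are presumably rounding artifacts:

```python
# Recompute the Table 2 distribution from the raw study counts.
counts = {
    "Decision-making": 163,
    "Analysis and evaluation": 73,
    "Judgment and reasoning": 25,
    "Comprehension and learning": 16,
    "Other cognitive processes": 14,
}

total = sum(counts.values())  # 291 studies in all
for process, n in counts.items():
    # e.g., 163 / 291 * 100 rounds to 56.01 for decision-making
    print(f"{process}: {n} ({n / total * 100:.2f}%)")
```

Running this reproduces the dominant categories reported in the abstract: decision-making accounts for just over half of the corpus, and analysis and evaluation for about a quarter.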
Table 3. Advanced cognitive processes in interaction with AI.
Cognitive Process: Other Cognitive Processes | Quantity | Percentage
Abstraction and modelling of complex systems | 1 | 0.34%
High-level visual perception | 1 | 0.34%
Visual pattern recognition and high-level visual perception | 1 | 0.34%
Selective attention and high-level visual perception | 1 | 0.34%
Knowledge production | 1 | 0.34%
Evaluation of Human–AI Interaction | 1 | 0.34%
Evaluation and Ethics | 1 | 0.34%
Ethical reasoning in AI | 1 | 0.34%
Mind perception and ethical judgment | 1 | 0.34%
AI-assisted validation judgment | 1 | 0.34%
AI-assisted creative problem-solving | 1 | 0.34%
Human–AI Cognitive Interaction | 1 | 0.34%
Trust in automated technology | 1 | 0.34%
Inference of Personality Traits | 1 | 0.34%
Table 4. Distribution of terms found on AI, presented by term, frequency, and percentage.
Classification: Term | Frequency | %

Language models and Chatbots:
ChatGPT | 24 | 14.81%
GPT-4 | 22 | 13.58%
GPT-3.5 | 16 | 9.88%
BERT | 5 | 3.09%
ChatGPT-3.5 | 4 | 2.47%
ChatGPT-4 | 4 | 2.47%
Llama2 | 3 | 1.85%
Bard | 2 | 1.23%
Claude 2 | 2 | 1.23%
GPT | 2 | 1.23%
GPT-4o | 2 | 1.23%
MedAlpaca | 2 | 1.23%
Appearing once each (0.62%): AraBERT, BLOOM, ChatGPT-3, ChatGPT-3.5 Turbo, ChatGPT-4o, ChatGPT-4o (mini), ChatGPT-4V, Claude 3, Clinical-T5-Large, Copilot, Ernie Bot, FLAN-T5-xl, FLAN-UL2, Gemini, GigaBERT, GPT-4 Turbo, GPT-3, GPT-2, Llama3, Llama-7B, Mistral, Mixtral-8x7B, XLM-RoBERTa

Vision and Content creation models:
DALL-E | 2 | 1.23%
YOLO | 2 | 1.23%
Appearing once each (0.62%): BLIP, BLIP-2, DALL-E 3, Firefly 2, Imagen 2, Imagine, InstructBLIP, Midjourney v6, Stable Diffusion XL, YOLOv5

Architectures and Neural networks:
ResNet | 3 | 1.85%
UNet | 3 | 1.85%
DenseNet | 2 | 1.23%
Appearing once each (0.62%): VGG, EfficientNet, FPN, PSPNet

Algorithms and Techniques:
Random Forest | 2 | 1.23%
XGBoost | 2 | 1.23%
Appearing once each (0.62%): Logistic regression, SVM, Word2Vec, FastText, RAG, Fuzzy c-means, REINFORCE, SOM, GloVe

Platforms and Tools:
Appearing once each (0.62%): Google Earth, SmartPLS 3, NVivo

Specific or Application Terms:
Appearing once each (0.62%): AI-CACTM, AICF, AI-ECG, AIM-MASH, AI-QCT, Avicenna CINA, AVIEW-LCS, Brainomix, C51-DDQN, CAAI-FDS, ChainingAI, CHDdECG, CoSP, D-Conformer, DECA, DeepDream, Deeplab, DenseNet201, Diagnocat, Dora, DPA-2, EchoCLR, EMOCA, ExpNet, EyeArt v2.1, HypVINN, Foggy Drone, GI Genius, GIT, GPT-agent, GPTZero, InceptionTime, Lumen, LumineticsCore, MediaPipe Pose, Mirai, Vision Transformer, MobiFit, Analysis Tagger, Narrativa, NeuralFeels, PathChat, PDPP, PERCEPT-R, Phoenix, QCPR, Rapid, Realistic Vision, RingNet, RosettaDock, UNNT, US2ai, Viz LVO, Viz.ai ICH, vPatho
Table 5. Categorization of Artificial Intelligence by Cognitive Processes.
Cognitive ProcessesQuantityDescriptionArtificial Intelligence Systems Detected
Decision
making
69Intelligent systems created for decision-making or making predictions based on analysed data or established rulesAI-CACTM, AICF, AIM-MASH, AI-QCT, AraBERT, GigaBERT, XLM-RoBERTa, AVIEW-LCS, Bard, Brainomix, Avicenna CINA, CAAI-FDS, ChatGPT (all versions), Ernie Bot, Gemini, Copilot, Google BARD, GPTZero, Claude (includes versions 2.0 and 3), Clinical-T5-Large, Llama2 (includes Llama2-13B and LLaMA2-7b), FLAN-UL2, GPT-3.5, GPT-4 (includes versions Turbo, 4o, 4V, and GPT-agent), CNN-BiLSTM, CoSP, Deeplab, DenseNet (includes DenseNet201), Diagnocat, Dora, ExpNet, 3DDFA-V2, RingNet, DECA, EMOCA, EyeArt (includes EyeArt Automated DR Detection System and v2.1), Foggy Drone Teacher (FDT), GANs (Generative Adversarial Networks), Logic Learning Machine (LLM), GIT, BLIP (includes BLIP-2), InstructBLIP, DALL-E, Word2Vec, GloVe, FastText, InceptionTime, Flan-T5-xl, T0-3b-T, T0pp(8bit)-T, logistic regression, XGBoost, Random Forest, LumineticsCore, MediaPipe Pose Landmark Detection, QCPR, Mirai, MobiFit, PathChat, PDPP, Rapid, Viz LVO, ResNet-34, SmartPLS 3, SVM, SOM, CNN, UNet, Fuzzy c-means (FCM), US2ai, Viz.ai ICH, UNNT, YOLO
Analysis
and
evaluation
34Tools and models created for data interpretation, pattern recognition, and information evaluationChatGPT (all versions), BERT, BLOOM, Phoenix, Multidimensional Analysis Tagger (MAT), NVivo, C51-DDQN, DFP, Imagine, REINFORCE, DALL-E (incl. DALL-E 3), Firefly 2, Midjourney v6, D-Conformer, DeepDream, FastSurfer-HypVINN, Imagen 2, FPN, UNet, PSPNet, EfficientNet, ResNet, Stable Diffusion XL Turbo, VGG, Mix Vision Transformer (mViT), GPT-4, GPT, GPT-3, Realistic Vision, Google AI (BERT-based), Smile Random Forest, Smile CART, Google Earth Engine (GEE), vPatho, YOLOv5
Judgement
and
reasoning
11Intelligent systems that apply logic and evaluate argumentative texts to generate critical judgementsBERT, ChatGPT/GPT Family (includes versions 3.5, 4, 4o), ChainingAI, PERCEPT-R Classifier, Bard, DenseNet, GI Genius, RAG, Llama-3 (8b-instruct)
Comprehension and learning | 12 | Intelligent systems focused on understanding information and acquiring it from human language | ChatGPT (includes versions 3.5 and 3.5-Turbo), CHDdECG, Doximity GPT, DPA-2, EchoCLR, GPT-2, GPT-3.5, GPT-4, Llama2 (includes LLaMA 2), MedAlpaca, ORCA-mini, Lumen
Ethical reasoning in AI | 4 | Intelligent systems for handling ethical dilemmas or establishing moral guidelines | GPT-4, GPT-3.5, Claude 2, Bard
Knowledge production | 3 | Intelligent systems that generate new information, content, or insights | ChatGPT, GPT-4o, BERT
Abstraction and modelling of complex systems | 1 | Intelligent systems that simplify or represent complex systems | RosettaDock
Table 6. Cognitive processes identified in the literature review.
Cognitive Process | Stage | Reference
Decision making | Medicine and health | [3,4,7,10,11,46,47,48,50,53,54,55,56,57,87,96,97]
Analysis and evaluation | Health and technology education | [2,5,9,49,51,52,86]
Judgement and reasoning | Technological innovation for sustainability and health | [8,12,98]
Comprehension and learning | Technology and humanism in health | [6,13]
Other cognitive processes 1 | Technology and human interaction | [1]
1 Advanced cognitive processes in interaction with AI.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ayala-Carabajo, R.; Llerena-Izquierdo, J. Comparative Experimental Studies on Superior Cognitive Domains: AI Versus Humans. Technologies 2026, 14, 55. https://doi.org/10.3390/technologies14010055

