Article

Towards Human-like Artificial Intelligence: A Review of Anthropomorphic Computing in AI and Future Trends

1 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2 Ningbo Institute of Technology, Zhejiang University, Ningbo 315104, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(13), 2087; https://doi.org/10.3390/math13132087
Submission received: 13 May 2025 / Revised: 12 June 2025 / Accepted: 23 June 2025 / Published: 25 June 2025
(This article belongs to the Special Issue Machine Learning: Mathematical Foundations and Applications)

Abstract

Artificial intelligence has brought tremendous convenience to many aspects of human life. However, in practice there are still cases where AI fails to comprehend certain problems or cannot achieve flawless execution, necessitating more cautious and thoughtful usage. With advances in electroencephalogram (EEG) signal processing technology, its integration with AI has become increasingly close. This effort to interpret EEG signals illustrates researchers’ desire to explore the deeper relationship between AI and human thought, making human-like thinking a new direction for AI development. Currently, AI faces several core challenges: it struggles to adapt effectively when interacting with an uncertain and unpredictable world, and the trend of increasing model parameters to enhance accuracy has reached its limits and cannot continue indefinitely. Therefore, this paper revisits the history of AI development from the perspective of “anthropomorphic computing”, primarily analyzing existing AI technologies that incorporate structures or concepts resembling human brain thinking. Furthermore, regarding the future of AI, we examine its emerging trends and introduce the concept of “Cyber Brain Intelligence”, a human-like AI system that simulates human thought processes and generates virtual EEG signals.

1. Introduction

Artificial intelligence (AI), as one of the most cutting-edge technologies of today, has undergone a journey of innovation and breakthroughs. From initial theoretical research to widespread application, AI has profoundly transformed the way we live across domains such as healthcare [1], education [2], industrial production [3], and daily life [4], bringing great convenience to all of us. Although AI has achieved significant success to a certain extent, challenges still remain. For instance, when using ChatGPT-4o or other AI products, we often need to make multiple attempts before obtaining a relatively satisfactory answer. At such times, we may feel that AI is still not as intelligent as it seems.
In recent years, with the rapid development and groundbreaking progress in brain–computer interface (BCI) technology [5], the integration of this cutting-edge technology with artificial intelligence has become increasingly prominent, especially in the field of EEG signal processing [6]. This development in EEG signal processing reflects a profound ambition among researchers to delve deeper into the intricate relationship between artificial intelligence and human thinking. By analyzing brainwave activity, scientists aim to bridge the gap between human and machine intelligence, fostering the development of AI systems that can emulate human-like thinking processes. This deep integration also prompts profound reflections. When artificial intelligence systems acquire information-processing capabilities closer to human thought patterns or even develop human-like cognition and decision-making mechanisms, this may lead to a qualitative leap in AI technology, becoming a critical pathway to overcoming current technical bottlenecks.
Anthropomorphic computing is an intelligent computing paradigm constructed by simulating human cognitive models, behavioral characteristics, and interaction logic [7]. Its core objective is to endow computer systems with human-like capabilities in perception, reasoning, contextual adaptation, and natural interaction [8]. This field is built upon technologies such as multimodal perception fusion, cognitive architecture modeling, and dynamic knowledge representation, integrating methods from machine learning, affective computing, and neuromorphic engineering to enable systems to proactively understand human intentions, semantically parse complex environments, and autonomously evolve interaction behaviors. There have already been some attempts in this field: neuromorphic chips such as IBM’s TrueNorth [9] reduce power consumption to milliwatt levels through asynchronous event-driven architectures, mimicking the sparse coding characteristics of biological neurons, while spiking neural networks (SNNs) [10] adopt simplified forms of the Hodgkin–Huxley equations, using differential equations to simulate the dynamics of membrane potentials. However, these efforts are still limited by the constraints of “selective biomimicry”: current models focus narrowly on the mathematical abstraction of electrical signal transmission while neglecting the multiscale coupling between neuromodulator diffusion and metabolic activity.
Prior to this work, virtually no academic literature has systematically analyzed the overall development of artificial intelligence from this specific perspective. Therefore, this paper adopts the lens of anthropomorphic computing to review key technologies in the history of AI, analyze future development directions, and draw inspiration to propose a novel conceptual system called “Cyber Brain Intelligence”. Section 2 outlines the research methodology and literature selection strategy employed in this study. Section 3 categorizes existing AI technologies into two groups for retrospective analysis: early AI ideologies and system architectures, and modern AI algorithms represented by machine learning and deep learning. Section 4 illustrates the current challenges in AI and explores future trends in AI development; we then introduce the “Cyber Brain Intelligence” system and envision its potential applications. Section 5 discusses the limitations of this research. Finally, Section 6 concludes and outlines future work. The conceptual map of this paper is shown in Figure 1.
The major contributions of this article can be summarized as follows:
  • This paper examines the manifestation of various AI technologies and algorithms in mimicking human cognition throughout the history of artificial intelligence, using anthropomorphic computing as an analytical framework, and provides technical explanations from architectural or algorithmic perspectives.
  • We propose a “Cyber Brain Intelligence” framework for future AI development. This is an AI system designed to generate and analyze virtual EEG signals in order to address various issues encountered in everyday life. Emerging from the perspective of anthropomorphic computing, this conceptual model attempts to address current limitations in artificial intelligence.
  • This paper analyzes the development trends of future artificial intelligence and provides market forecasts for the potential applications of “Cyber Brain Intelligence”.

2. Research Methodology

Given the extensive historical and methodological span of artificial intelligence covered in this paper, it is impractical to review all relevant works within a single article. We therefore followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [11] to select papers. Specifically, we first entered keywords into a well-known multidisciplinary database, Web of Science, and then manually discarded search results irrelevant to this paper.
Figure 2 shows the diagram of our PRISMA-based article selection in detail. Through this PRISMA process, 148 related works were chosen. Because the topic also touches on medicine and other fields, we manually added several works not captured by the PRISMA-based selection, resulting in a final set of 155 related works for review.
Since this paper explores the manifestation of anthropomorphic computing in the history of artificial intelligence, the cited references include 27 papers published before 2004, which will not be elaborated on here. Figure 3 presents the number of relevant literature collections by year since 2004, from which it can be observed that the majority of the papers were concentrated after 2014, a period that coincides with the explosive development of artificial intelligence technology worldwide.
Figure 4 is a pie chart that illustrates the proportion of selected literature categorized by different AI technologies. Based on the data from the chart, research related to deep learning accounts for the largest proportion at 33.1%, primarily composed of architectures such as CNN, LSTM, and Transformer. Machine learning papers make up 26.9%, including explorations of traditional algorithms and theoretical studies. Papers in the early artificial intelligence domain represent 23.1%, mainly focusing on early AI methods, theories, and some related inventions or innovations. Papers on large language models account for 9.2%; although smaller in proportion, this field has developed rapidly in recent years, involving architectures, training methods, and applications of models such as the GPT series and BERT. Lastly, papers categorized as “Others” constitute 7.7%, mainly focusing on articles related to medicine and engineering.

3. Anthropomorphic Computing in Different Periods of Artificial Intelligence

This chapter focuses on the central role and evolution of anthropomorphic computing in the development of artificial intelligence. Anthropomorphic computing aims to enable computers to exhibit human-like intelligence by simulating human thought and behavior. This concept has underpinned both theoretical research and technological practice in AI. From the foundational designs of early mechanical systems and computational theories to the emergence of electronic computers and the modernization of AI, anthropomorphic computing has driven technological breakthroughs and profoundly influenced the design philosophy of intelligent systems. We will analyze how this concept has been manifested and impacted in different historical stages, starting from the early theories and practices of AI. A significant timeline for the development of computers and artificial intelligence is shown in Figure 5.
The history of computers and artificial intelligence spans over a century, with the concept of anthropomorphic computing playing a key role throughout this development. The invention of the Jacquard loom is widely regarded as the starting point for computational thinking, as it introduced the idea of using encoded instructions to control machines [12]. Based on this, Charles Babbage designed the Analytical Engine, often considered the first “thinking machine”. Although the design of the Analytical Engine resembled modern computers, capable of executing sequences of instructions and processing data, its core purpose aligned with the concept of anthropomorphic computing—enabling machines to simulate human thinking and behavior. However, due to the technological limitations of the time, the Analytical Engine was never realized. With the emergence of figures like Alan Turing, a wave of new ideas and technologies began to take shape, marking the first surge in the development of computers and artificial intelligence.

3.1. Early Artificial Intelligence and Anthropomorphic Computing

3.1.1. Turing Machine

The Turing Machine, introduced by Alan Turing in 1936, is a foundational concept in computer science and artificial intelligence. This abstract mathematical model simulates human problem-solving processes through symbolic manipulation. The Turing Machine operates on an infinitely long tape divided into discrete cells, equipped with a read/write head that moves and performs operations based on predefined rules according to its current state and the symbol it reads. The most remarkable feature of the Turing Machine lies in its demonstration of the universality of computation, proving that any computable task can theoretically be performed by such a machine. This universality not only forms the theoretical basis for modern digital computers but also aligns with the principles of anthropomorphic computing by conceptualizing how machines can replicate human cognitive functions. Turing later turned to far more practical problems and developed a code-breaking machine called the Bombe for the British government, with the purpose of deciphering the Enigma code used by the German army in the Second World War. The Bombe, which was about 6 feet tall and weighed about a ton, is generally considered the first working electro-mechanical computer. The power with which the Bombe broke the Enigma code, a task previously impossible even for the best human mathematicians, made Turing and other scientists wonder about the intelligence of such machines [13].
In 1950, he published his seminal article “Computing Machinery and Intelligence” [14], where he described how to create intelligent machines and, in particular, how to test their intelligence. The core idea of the test is to determine whether a machine can engage in a conversation with humans without being identified as a machine. During the test, an evaluator interacts with both a human participant and a machine through textual communication. If the evaluator cannot reliably distinguish between the human and the machine, the machine is considered to have passed the Turing Test.
The importance of the Turing Test lies not only in its straightforward evaluation criteria but also in its guidance for artificial intelligence research. First, it defines intelligence based on observable behavior rather than internal mechanisms. This behavior-centric evaluation broadens the scope of AI research, encouraging the development of systems capable of simulating human behaviors. Second, the Turing Test emphasizes the critical role of natural language processing. Human language is complex and multidimensional, involving grammar, semantics, context, and cultural knowledge. Therefore, designing machines capable of passing the Turing Test requires addressing core challenges in language understanding, reasoning, and learning. For instance, in the late 20th century, the principles of the Turing Test inspired numerous anthropomorphic computing research projects, such as the development of chatbots and virtual assistants. These systems use semantic analysis and contextual reasoning to simulate human conversational abilities, serving as practical implementations of anthropomorphic computing.
By proposing the Turing Test, Turing not only defined the objectives of AI research but also provided a clear direction for the future of anthropomorphic computing. Centered around dialogue, this test exemplifies the capacity of machines to mimic human cognitive processes and has significantly influenced the exploration of AI in practical applications.

3.1.2. The McCulloch–Pitts Neuron

The McCulloch–Pitts neuron is a simplified mathematical model proposed in 1943 by neuroscientist Warren McCulloch and mathematician Walter Pitts, aiming to simulate the basic working principles of biological neurons [15]. This model abstracts the neuron as a logical unit with binary inputs and outputs: each input signal is assigned a fixed weight (typically +1 or −1), and when the weighted sum of input signals exceeds a preset threshold, the neuron is activated and outputs 1; otherwise, it outputs 0. Its design was inspired by the “all-or-none” characteristic of biological neurons, meaning neurons are either fully activated or completely inactive. Although this model lacks learning capability (weights and thresholds need to be preset), it demonstrated for the first time through mathematical formalization how neurons process information through logical operations (such as AND, OR, or NOT), thereby providing a theoretical foundation for constructing complex computational systems.
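As a minimal illustration of this mechanism, the short Python sketch below realizes a McCulloch–Pitts unit and the AND, OR, and NOT gates it can implement; the particular weights and thresholds are illustrative choices rather than values from the original paper.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Binary threshold unit: fires (returns 1) when the weighted sum of
    binary inputs reaches the preset threshold; otherwise returns 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Logic gates realized purely by fixed weights and thresholds (no learning):
AND = lambda a, b: mcculloch_pitts([a, b], weights=[1, 1], threshold=2)
OR  = lambda a, b: mcculloch_pitts([a, b], weights=[1, 1], threshold=1)
NOT = lambda a:    mcculloch_pitts([a],    weights=[-1],   threshold=0)

print(AND(1, 1), OR(0, 1), NOT(1))  # -> 1 1 0
```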
The McCulloch–Pitts neuron has had a particularly profound impact on anthropomorphic computing. Simulating the threshold activation mechanism of biological neurons reveals how simple units can achieve complex functions through combination. This idea directly inspired the development of artificial neural networks. For example, multiple McCulloch–Pitts neurons can form logical circuits through hierarchical connections to complete tasks such as classification or decision-making, similar to the division of labor and cooperation among neuronal networks in the human brain. Additionally, although its assumption of temporal synchronization (all input signals arriving simultaneously) is simplified, it provides a starting point for subsequent research on temporal information processing, such as recurrent neural networks. Although modern neural networks have introduced learnable parameters and nonlinear activation functions to enhance their expressive ability, the core idea of the McCulloch–Pitts model—simulating intelligence through distributed simple units—remains one of the central ideas in anthropomorphic computing. It not only drove the transition of computer science from symbolic logic to bionic computing but also provided a mathematical model for understanding the underlying mechanisms of human cognition, making the vision of “machines thinking like humans” gradually possible.

3.1.3. The ENIAC

ENIAC (Electronic Numerical Integrator and Computer), completed in 1946, was the world’s first general-purpose electronic computer, marking a significant leap in computational power [16]. Built using over 17,000 vacuum tubes, ENIAC could perform complex arithmetic operations, such as addition, subtraction, multiplication, and division, at speeds thousands of times faster than manual methods. This breakthrough went beyond mere efficiency gains—it represented the earliest systematic attempt to mechanize human cognitive labor. While originally designed for numerical computations, the architecture of ENIAC was modular, allowing it to be reconfigured for various types of calculations, though it required manual rewiring to handle different tasks. Its system consisted of components such as the accumulator for storing intermediate results and various units dedicated to arithmetic operations. The machine’s operation involved an iterative process, where results were calculated step by step. One of the notable applications of ENIAC was its use in solving ballistic trajectory calculations during World War II. Using numerical integration, ENIAC solved the differential equations that modeled projectile motion, a task that had been manually executed by human “computers” following intricate procedural logic—a logic ENIAC now encoded into electronic operations.
The birth of ENIAC aligned with the paradigm shift from mechanical metaphor to electronic concretization in anthropomorphic computing thought. Its design philosophy did not merely replicate arithmetic actions; it deconstructed and metaphorically reconstructed the cognitive workflow of human mathematicians. The electronic tube array of this colossal machine did not merely simulate the physical actions of human calculators performing arithmetic operations; more importantly, it reconstructed the procedural logic of mathematical thinking. When human computers manually solved ballistic trajectories using desktop calculating machines, they had to decompose complex differential equations into atomic operations of addition, subtraction, multiplication, and division and coordinate the distribution and integration of intermediate results through paper tables. This anthropocentric process became a blueprint for the engineering design of ENIAC. The accumulator and function table units directly mirrored these steps: each accumulator acted like a digitized calculator’s cerebral cortex, completing decimal carry propagation through electronic pulses rather than neural impulses; the program control unit played the role of a human dispatcher, using a cable matrix instead of paper instruction sheets to encode the computational process into a reusable current path. Crucially, this marked a conceptual leap: ENIAC transformed human cognitive procedures into machine-executable formalisms, establishing a foundational model for later AI systems seeking to emulate higher-order intelligence. This pioneering act of abstracting biological cognition into programmable electronic circuits laid the material foundation for later synaptic simulations in neural networks and iterative optimizations in machine learning, enabling the philosophical inquiry of “how machines think” to begin receiving an engineering-grounded answer.
Although ENIAC was not a stored-program computer and required manual reconfiguration for different tasks, its flexible architecture demonstrated the potential for machines to assist in a wide range of applications. Its success in automating complex scientific and military calculations set the stage for the next generation of computers and AI systems, which would continue to evolve with even more sophisticated capabilities. From the perspective of anthropomorphic computing, the legacy of ENIAC lies in its role as a transitional artifact: it shifted the paradigm from “machines as tools” to “machines as cognitive proxies”, proving that human intellectual processes could be systematically externalized into electronic systems. The legacy of ENIAC in anthropomorphic computing remains foundational, showing how machines could replicate and enhance human intellectual functions through the lens of procedural mimicry—a precursor to the adaptive, learning-driven approaches of modern AI.

3.1.4. Fuzzy Logic

Fuzzy logic is a mathematical framework that extends classical logic, aiming to more flexibly express the vagueness and uncertainty found in natural language. This concept was first introduced by Professor Lotfi A. Zadeh of the University of California, Berkeley, in 1965, with the goal of better simulating human decision-making processes when faced with incomplete or ambiguous information [17]. Later developments led to Interval Type-2 Fuzzy Logic (IT2FL), which extends the original Type-1 framework by introducing graded uncertainties—where membership degrees themselves become intervals rather than precise values [18]. Unlike traditional binary logic models, fuzzy logic allows a variable to take any value between 0 and 1, providing a gradual assessment approach. For example, when determining whether a person is tall, classical logic can only offer a simple “yes” or “no” response. However, fuzzy logic introduces the concept of membership, making this question more flexible. It allows an individual’s height to have different degrees of membership in the “tall” fuzzy set, such as 0.6 or 0.8. In IT2FL, this flexibility is further enhanced by allowing “tallness” to be represented as a bandwidth (e.g., [0.5, 0.7]) rather than a single value, better capturing inter-expert disagreement in linguistic definitions. These methods indicate the individual is not clearly at the boundary of being “tall” but rather in a fuzzy range.
Fuzzy sets are at the core of fuzzy logic theory. Each fuzzy set is equipped with a membership function that assigns a membership degree between 0 and 1 to each element, representing the extent to which the element belongs to the fuzzy set. For example, when describing a “moderate temperature” fuzzy set, different temperatures can correspond to different membership degrees, more accurately reflecting people’s subjective perception of temperature. Fuzzy logic also defines a series of operations such as fuzzy conjunction (AND), fuzzy disjunction (OR), and fuzzy complement (NOT). These operations are applied to the membership values through specific rules to achieve fuzzy reasoning.
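To make these operations concrete, the sketch below uses the common min/max/complement operators on membership degrees; the triangular membership function and the temperature values are hypothetical assumptions chosen only for illustration.

```python
def moderate_temp(t_celsius):
    """Hypothetical triangular membership function for 'moderate temperature',
    peaking at 22 degrees C and vanishing outside the range [15, 29] degrees C."""
    if 15 <= t_celsius <= 22:
        return (t_celsius - 15) / 7
    if 22 < t_celsius <= 29:
        return (29 - t_celsius) / 7
    return 0.0

def fuzzy_and(a, b):   # fuzzy conjunction via the minimum operator
    return min(a, b)

def fuzzy_or(a, b):    # fuzzy disjunction via the maximum operator
    return max(a, b)

def fuzzy_not(a):      # fuzzy complement
    return 1.0 - a

warm = moderate_temp(25)   # about 0.57 membership in "moderate temperature"
print(fuzzy_and(warm, 0.8), fuzzy_or(warm, 0.2), fuzzy_not(warm))
```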
The concept of anthropomorphic computing in fuzzy logic is dedicated to mimicking human decision-making processes under uncertain or ambiguous information. In daily life, humans frequently encounter vagueness; for example, perceptions of temperature and judgments of crowding cannot be expressed simply as “true” or “false”. Instead, these judgments rely on experience and feelings, processed to varying degrees of vagueness. This way of thinking reflects an adaptability to vague natural language processing in many aspects.
In the application of fuzzy logic, anthropomorphic computing is realized by establishing a set of fuzzy rules. These rules are akin to those humans use when making decisions and can express complex conditions and conclusions. For instance, in a driving scenario, a fuzzy rule system might include rules like, “If the speed is high and the road is slippery, then drive cautiously.” Such rules allow the system to autonomously determine appropriate actions in more complex situations without needing explicit, quantified input. This rule-based system is widely applied in fuzzy logic controllers, such as in household appliance automation and automotive cruise control [19]. Humans possess the ability to adjust decisions based on changes in their environment, and fuzzy logic, through anthropomorphic computing, offers a similar adjustment mechanism. By defining and optimizing fuzzy rules, a system can dynamically adjust its behavior in different contexts. This approach is particularly effective in dealing with hierarchical and multidimensional information, enabling it to handle complex situations that traditional logic models struggle to accommodate.
In summary, anthropomorphic computing in fuzzy logic formalizes human fuzzy thinking processes through fuzzy rules and linguistic descriptions, assisting systems in making reasonable decisions under uncertain conditions. The emergence of IT2FL has further bridged the gap between machine reasoning and human cognition, particularly in domains requiring multi-granular uncertainty articulation. This not only enhances the flexibility of the systems but also improves user interpretability, making it an important aspect of fuzzy logic applications.

3.1.5. Expert Systems

From the 1960s to the 1970s, expert systems emerged as a significant application of the anthropomorphic computing concept, aiming to simulate the decision-making and reasoning processes of human experts in specific domains. Expert systems primarily rely on knowledge representation and logical reasoning, using knowledge bases to store expert knowledge in the form of rules and applying these rules to infer solutions to complex problems typically requiring expert knowledge, as shown in Figure 6.
The core of expert systems is knowledge representation, where the expertise of human professionals is translated into formal rules. These rules are typically expressed in an “If…Then” structure, enabling the system to perform reasoning by applying known facts to generate conclusions. For example, in the MYCIN system, a medical rule might be, “If the patient has fever and cough, the diagnosis could be influenza”.
The system uses a reasoning engine to apply the rules to the facts in the knowledge base. The reasoning engine generally employs two major types of reasoning:
  • Forward reasoning: This data-driven approach begins with known facts and applies rules to derive new facts or conclusions. For instance, if the system knows the patient has a fever and cough, it applies the corresponding rules to infer a diagnosis.
  • Backward reasoning: This goal-driven method starts with a hypothesis or goal (such as the diagnosis of influenza) and works backward to deduce the facts necessary to support that conclusion.
Both approaches involve the structured application of rules to simulate the expert’s reasoning process.
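The toy rule base below sketches both reasoning directions in plain Python; the rules and facts are hypothetical examples in the spirit of the influenza rule above, not an actual expert system.

```python
# Each rule: (set of condition facts, conclusion fact)
RULES = [
    ({"fever", "cough"}, "influenza"),
    ({"influenza"}, "recommend_rest"),
]

def forward_chain(facts):
    """Data-driven: repeatedly fire rules whose conditions hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts):
    """Goal-driven: a goal holds if it is a known fact, or if some rule that
    concludes it has all of its conditions provable in turn."""
    if goal in facts:
        return True
    return any(all(backward_chain(c, facts) for c in conditions)
               for conditions, conclusion in RULES if conclusion == goal)

print(forward_chain({"fever", "cough"}))                 # derives influenza, recommend_rest
print(backward_chain("influenza", {"fever", "cough"}))   # True
```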
In more advanced expert systems, Bayesian networks are often used to handle uncertainty in decision-making. These networks help the system calculate the conditional probabilities between events, providing a more refined approach to reasoning under uncertainty, similar to how human experts consider multiple possibilities. Bayesian networks use probability theory to make inferences about the likelihood of certain events occurring. The core formula in Bayesian statistics is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$
where $P(A \mid B)$ is the probability of A given B, $P(B \mid A)$ is the probability of B given A, and $P(A)$ and $P(B)$ are the prior probabilities of A and B.
By integrating Bayesian methods, expert systems can offer more detailed probabilistic inferences, especially when faced with uncertain data, just as human experts would weigh different possibilities in their decision-making.
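As a worked illustration with purely hypothetical numbers: if the knowledge base stores a prior $P(\text{influenza}) = 0.1$, a likelihood $P(\text{fever} \mid \text{influenza}) = 0.9$, and a marginal $P(\text{fever}) = 0.25$, the rule above yields $P(\text{influenza} \mid \text{fever}) = \frac{0.9 \times 0.1}{0.25} = 0.36$; observing fever raises the system’s belief in influenza from 10% to 36%.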
One of the earliest and most notable expert systems was MYCIN, developed in the 1970s for diagnosing infectious diseases. MYCIN employed a set of medical rules to diagnose diseases and recommend treatments. The system interacted with the user by asking questions about symptoms and applying the rules from its knowledge base to make a diagnosis and propose treatment plans. The MYCIN reasoning engine was based on simple if... then... rules, but its true value lay in its ability to simulate a doctor’s thought process in diagnosing diseases. For example, one rule might state, “If the patient has fever and cough, the diagnosis could be influenza”. The system then used this diagnosis to recommend a course of treatment. Although MYCIN could not replace physicians, it served as a decision-support tool, demonstrating the potential of expert systems to simulate expert reasoning in the medical field.
Despite their success, expert systems faced several inherent limitations. One major challenge was their dependence on knowledge bases. The system’s capabilities were heavily reliant on the quality of the knowledge base, which required continuous updates and expert involvement. Furthermore, expert systems were typically domain-specific and difficult to adapt to new fields or interdisciplinary knowledge.
To address these limitations, some expert systems integrated fuzzy logic and decision trees. Fuzzy logic enables systems to handle uncertain or imprecise information, which is particularly useful in domains requiring human judgment. For instance, in medical diagnosis, fuzzy logic could handle symptoms like “mild fever” or “slight cough” [21]. In addition, decision trees were used in expert systems to provide a visual representation of decision-making, breaking down complex problems into sequential decisions. Decision trees helped simplify the decision-making process by organizing it hierarchically.
The emergence of expert systems marked a significant development in anthropomorphic computing. By simulating the expert’s knowledge and decision-making processes, expert systems demonstrated artificial intelligence’s potential to solve real-world problems. They emphasized the power of rule-based reasoning in AI and highlighted the enormous value of anthropomorphic computing in practical applications. Although expert systems faced scalability and adaptability issues, their contributions laid the foundation for more advanced artificial intelligence technologies and demonstrated that computers could not only mimic expert knowledge but also enhance human ability to solve complex, knowledge-intensive problems.

3.1.6. Evolutionary Algorithm

The concept of evolutionary algorithms originated in the 1960s and 1970s and developed gradually under the inspiration of Darwin’s theory of natural selection and biological evolution. In the early 1960s, the German scholars Ingo Rechenberg and Hans-Paul Schwefel proposed Evolution Strategies (ESs) while studying engineering optimization problems [22]. This is one of the earliest forms of evolutionary algorithms, mainly used in the field of continuous variable optimization.
At the same time, John Holland, a professor at the University of Michigan in the United States, began to explore search techniques based on the principles of genetics and eventually formed the classical genetic algorithms (GAs) framework. In 1975, he published the book Adaptation in Natural and Artificial Systems, which systematically explained his research results and marked the birth of modern genetic algorithms [23]. With the improvement of computer performance and technological progress, a variety of new variants continued to emerge by the late 1980s and early 1990s, such as differential evolution [24] and particle swarm optimization [25]; these and other emerging swarm intelligence algorithms borrowed the essence of traditional evolutionary algorithms, pushing the field into a stage of prosperous development.
Evolutionary algorithms start with an initial population consisting of candidate solutions, which can be thought of as random sampling points within the search space. They then enter the evaluation stage, where each individual is scored by a specific objective function to obtain a fitness value; this value reflects how well the individual solves the current problem. The subsequent selection operation mirrors the survival-of-the-fittest principle in nature, giving individuals with higher fitness a greater chance of passing their characteristics to the next generation. Two key operators, crossover and mutation, are introduced to increase diversity. Crossover lets two strong individuals exchange some of their information to produce new offspring; mutation allows a small number of random changes at certain gene locations to avoid falling into the trap of local optima. After several iterations, the population can approach the global optimum or, in theory, find a satisfactory result [26].
In the field of evolutionary algorithms, the idea of anthropomorphic computing focuses on using the laws of biological reproduction to guide the optimization process. Here, each iteration can be imagined as a virtual generational alternation: a group of “candidates” competes to represent all possible combinations of answers, each carrying a unique piece of information known as its chromosomal code. When faced with external pressure such as resource shortage (corresponding to the objective function and its constraints), only the members with the most suitable characteristics remain and continue to contribute to the future population; those who do not meet the standards gradually disappear until they withdraw from the stage of history entirely. In addition, building such an intelligent simulation system incorporates many delicate design elements that reflect human characteristics. For example, in establishing kinship, each parent contributes half of the genetic material that forms the child. There is also a probabilistic trigger mechanism that guarantees the occasional surprise: just as in real life, a talented offspring will occasionally break from the conventional path and stand out. This arrangement makes the whole search path more varied and full of unknown challenges, and it also improves the chance of finding the ideal outcome.
The Anthropomorphic Evolutionary Algorithm framework in Algorithm 1 is adapted from a basic evolutionary algorithm by enhancing the concept of anthropomorphic computing. It operationalizes human-like cognitive adaptation through three-phase dynamics. Initially, $N$ diverse cognitive schemas $x_i \in \mathbb{R}^d$ are generated, each encoding heuristic strategies akin to human decision-making patterns. Each schema undergoes ecological evaluation via a fitness function $f(x_i)$, which incorporates both solution optimality and behavioral plausibility constraints (e.g., resource usage $\le \tau_{\text{human}}$). The iterative refinement phase begins with socialized selection, where parent schemas are chosen probabilistically according to an imitation probability $p_{\text{imitate}} \propto f(x_i)^{\alpha}$, balancing social learning ($\alpha \to 1$) and individual exploration ($\alpha \to 0$). Selected parents produce offspring through cognitive crossover $o_j = \beta x_p + (1-\beta) x_q + \epsilon$, where the innovation weight $\beta \in (0,1)$ controls knowledge inheritance from dominant strategies, while Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ injects creativity.
Subsequent anchored mutation applies neuro-plausible perturbations bounded by $\Delta_{\text{synaptic}}$, ensuring biological realism in schema evolution. The population renewal mechanism enforces cultural replacement, preserving the top-$k$ elites as accumulated cultural capital while admitting novel strategies. This computational paradigm bridges evolutionary optimization with human cognitive principles, where the parameters $\alpha$ (social imitation), $\beta$ (knowledge integration), and $\Delta_{\text{synaptic}}$ (neural constraints) collectively transform abstract biological metaphors into actionable algorithmic mechanisms.
Therefore, by deeply mining the connotations of anthropomorphic computing and skillfully combining them with the essential characteristics of evolution, otherwise cold numerical computation becomes lively, interesting, and full of vitality, and can then cope effectively with changing demands in complex and difficult situations.
Algorithm 1: Anthropomorphic Evolutionary Algorithm (AEA)
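As a rough, runnable sketch of the three-phase dynamics described above (not a reproduction of the published Algorithm 1; the toy fitness function, population size, and the values of $\alpha$, $\beta$, $\sigma$, and $\Delta_{\text{synaptic}}$ are illustrative assumptions), the procedure can be outlined in Python as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    """Toy objective standing in for 'ecological evaluation' (to be maximized)."""
    return -np.sum(x ** 2)

def aea(dim=5, pop_size=30, generations=100,
        alpha=1.5, beta=0.6, sigma=0.1, delta_synaptic=0.05, elites=2):
    # Phase 1: initialize N diverse "cognitive schemas"
    pop = rng.normal(0.0, 1.0, size=(pop_size, dim))
    for _ in range(generations):
        fit = np.array([fitness(x) for x in pop])
        # Socialized selection: imitation probability grows with fitness^alpha
        shifted = fit - fit.min() + 1e-9
        p_imitate = shifted ** alpha
        p_imitate /= p_imitate.sum()
        # Cultural replacement: preserve the top-k elites
        elite_idx = np.argsort(fit)[-elites:]
        children = [pop[i].copy() for i in elite_idx]
        while len(children) < pop_size:
            p_idx, q_idx = rng.choice(pop_size, size=2, p=p_imitate)
            # Cognitive crossover: o = beta*x_p + (1-beta)*x_q + Gaussian noise
            child = beta * pop[p_idx] + (1 - beta) * pop[q_idx] \
                    + rng.normal(0.0, sigma, size=dim)
            # Anchored mutation: perturbation clipped to +/- delta_synaptic
            child += np.clip(rng.normal(0.0, sigma, size=dim),
                             -delta_synaptic, delta_synaptic)
            children.append(child)
        pop = np.array(children)
    best = max(pop, key=fitness)
    return best, fitness(best)

print(aea())
```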

3.2. Modern Approaches to AI and Anthropomorphic Computing

Since the 1990s, significant progress in artificial intelligence has been driven by machine learning. The development of machine learning, particularly neural networks and deep learning, has shifted AI from rule-based systems (e.g., expert systems) to data-driven learning approaches. Algorithms such as supervised, unsupervised, and reinforcement learning have revolutionized fields like image recognition, natural language processing, and autonomous driving, laying the foundation for the widespread application of AI. In recent years, large-scale pretrained models (GPT, BERT) have further advanced AI. By leveraging pretraining on massive datasets, these models have significantly enhanced language understanding and generation capabilities while enabling cross-domain task unification in multimodal learning, making them central to generative and understanding-oriented AI.
The rise of machine learning and deep learning has also propelled the development of anthropomorphic computing, particularly in mimicking human cognition. For instance, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) simulate how the human brain processes information, while large models demonstrate human-like adaptability through few-shot and zero-shot learning. These advancements bring AI closer to achieving human-like abilities and solving complex problems.

3.2.1. The Rise of Machine Learning and Neural Networks (the 1990s)

The 1990s marked a transformative shift in the field of artificial intelligence, driven by the rise of machine learning and neural networks. Unlike traditional AI approaches, which rely on predefined rules and logic-based reasoning, machine learning emphasizes data-driven learning methods. This paradigm shift was a significant step toward anthropomorphic computing, enabling systems to mimic human problem-solving processes. The following will introduce several classic machine learning algorithms and attempt to identify the human-like computational ideas within them.
  • A. Decision tree. A decision tree is a machine learning algorithm based on hierarchical decision rules, widely used in classification and regression tasks. Its history can be traced back to preliminary research in the 1960s, and it gained significant development in the 1980s with the introduction of algorithms such as ID3 [27]. The basic idea of a decision tree is to decompose a complex problem step by step through a series of conditional judgments, thereby forming an easily interpretable decision model. Its logical structure bears a significant similarity to the hierarchical reasoning methods humans use when solving problems.
The goal of a decision tree is to extract decision rules from data and construct a tree through recursive splitting. Each node represents a conditional judgment on a feature, each branch corresponds to a possible value of the judgment result, and each leaf node represents the predicted outcome. The core issue in constructing a decision tree is how to select the splitting feature at each node, i.e., choosing the optimal split point at each step so that the resulting child nodes are as “pure” as possible (i.e., containing samples of the same category). Commonly used criteria for feature selection include the following (a short computational sketch of both criteria follows this list):
  • Information gain: Information gain measures the reduction in uncertainty about the classification due to the feature. Given a feature A and target variable Y, its formula is
    $$IG(A) = H(Y) - H(Y \mid A),$$
    where $H(Y)$ is the entropy of the classification, defined as
    $$H(Y) = -\sum_{i=1}^{k} p_i \log_2 p_i,$$
    and $H(Y \mid A)$ represents the conditional entropy of $Y$ given feature $A$.
  • Gini index: The Gini index [28] measures the impurity of the data, defined as
    $$Gini = 1 - \sum_{i=1}^{k} p_i^2,$$
    where $p_i$ is the probability of the $i$-th class sample. The lower the Gini index, the “purer” the node.
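The following numpy sketch computes both criteria for a hypothetical binary split; the toy labels and the split itself are assumptions made only for illustration.

```python
import numpy as np

def entropy(labels):
    """H(Y) = -sum p_i * log2(p_i) over the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini = 1 - sum p_i^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, groups):
    """IG(A) = H(Y) - H(Y|A), where `groups` are the label subsets after splitting on A."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - conditional

# Hypothetical example: 10 samples split by a binary feature into two groups.
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
split = [y[:6], y[6:]]   # e.g., feature A = "yes" vs. "no"
print(information_gain(y, split), [gini(g) for g in split])
```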
The structure and algorithm of a decision tree essentially mimic human logical reasoning processes. First, there is hierarchical reasoning: the splitting process of a decision tree is similar to the step-by-step analysis humans use when solving complex problems. For example, a doctor diagnosing a disease might first inquire about primary symptoms (such as fever or cough) and then progressively rule out possible conditions based on specific symptoms. Additionally, decision trees assess feature importance, selecting splitting features using criteria like information gain. This mirrors how humans prioritize the most important factors when making decisions, such as considering price first when choosing a home, followed by location or amenities. Another human-like characteristic is the interpretability and transparency of decision trees. Each node in the tree can be understood as a specific decision rule that aligns with human cognition. For instance, “If the temperature is greater than 37.5 °C, it’s considered a fever” is a simple rule that is easy to comprehend. Lastly, decision trees also exhibit human-like heuristics in their limitations. For instance, the issue of overfitting reflects how humans might over-interpret special cases when faced with limited data. The pruning mechanism in decision trees corresponds to the way humans exclude unnecessary assumptions based on experience when making decisions. As shown in Figure 7, this diagram illustrates the similarity between a decision tree and the human thought process. A decision tree guides users to make choices step by step through a structured sequence of questions. Humans arrive at answers by inquiring about conditions step by step. This process begins with asking questions about color, followed by further refinement questions such as size, shape, or taste, until a specific result is obtained, such as identifying which type of fruit it is.
  • B. Reinforcement learning. In the field of artificial intelligence research, reinforcement learning (RL) serves as a machine learning paradigm for sequential decision-making through a trial-and-error mechanism [29]. Its core lies in the agent gradually optimizing its strategy to maximize cumulative rewards through continuous interaction with the environment. This process mimics the learning mechanism of organisms adjusting their behavioral patterns based on reward–punishment feedback.
The mathematical framework of reinforcement learning can be formalized as a Markov Decision Process (MDP) [30], consisting of a state space $S$, an action space $A$, state transition probabilities $P(s' \mid s, a)$, and a reward function $R(s, a)$. At time $t$, the agent observes a state $s_t \in S$, executes an action $a_t \in A$, receives an immediate reward $r_t = R(s_t, a_t)$, and transitions to a new state $s_{t+1} \sim P(\cdot \mid s_t, a_t)$. The optimization objective is to find an optimal policy $\pi^{*}: S \to A$ that maximizes the expected discounted return $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]$, where $\gamma \in [0, 1)$ is the discount factor. This framework essentially simulates the cognitive process of organisms making adaptive adjustments through behavioral feedback in uncertain environments.
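A minimal tabular Q-learning loop illustrates this formalism on a toy chain environment; the environment, the terminal reward, and the hyperparameters are illustrative assumptions rather than any specific published experiment.

```python
import numpy as np

# Toy MDP: a 1-D chain of 5 states; action 0 moves left, action 1 moves right.
# Reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.2

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Temporal-difference update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))  # the learned values favor moving right toward the reward
```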
From the perspective of anthropomorphic computing, innovations in reinforcement learning algorithms often stem from abstract modeling of human cognitive mechanisms. Taking the Deep Q-Network (DQN) [31] as an example, its experience replay mechanism establishes a finite-capacity experience buffer $\mathcal{B} = \{(s_t, a_t, r_t, s_{t+1})\}$ and uniformly samples small batches of experiences $B \subset \mathcal{B}$ during training, mathematically equivalent to stabilizing a non-stationary experience distribution. This design directly corresponds to the theory of hippocampal memory consolidation in neuroscience—the human brain reinforces important experiences through memory replay during sleep [32,33]. Experiments show that introducing experience replay can improve the algorithm’s sample efficiency by approximately 3.8 times, validating the inspirational role of biological memory mechanisms in machine learning.
Furthermore, FeUdal Networks simulate the hierarchical cognitive strategies humans use when tackling complex tasks by constructing a two-tiered architecture with separated temporal scales [34]. The high-level controller generates a low-dimensional goal vector $g_t \in \mathbb{R}^d$ every $c$ steps, while the low-level policy network translates this into specific actions $a_t = \pi_{\text{low}}(s_t, g_t)$. This spatio-temporal abstraction mechanism exhibited a 47% higher task completion rate than traditional methods in Atari game tests. The performance advantage stems from the computational reconstruction of humans’ “divide and rule” cognitive strategy. Notably, the dimension reduction of the high-level goal vectors ($d \ll \dim(S)$) maps onto the chunking theory in psychology, revealing the modeling depth of anthropomorphic computing at the level of feature abstraction.
Monte Carlo Tree Search (MCTS) [35], as a core component of the AlphaGo series of algorithms, embodies the computational implementation of forward-looking mental simulation. The algorithm simulates multiple trajectories in parallel by constructing a search tree. In complex decision-making problems such as Go, whose complexity exceeds $10^{170}$, its iterative mechanism of selection, expansion, evaluation, and backpropagation shares cognitive homology with the “reading seconds” deduction process of human professional Go players. Empirical research by Silver et al. [36] shows that the move distribution $P(a \mid s)$ produced by the policy network enhanced by MCTS has a KL divergence of only 0.32 from the decision-making patterns of human players. This quantitative indicator confirms the degree to which the algorithm’s decision-making process approximates human cognitive patterns.
These cases demonstrate that the infiltration of anthropomorphic computing ideas in reinforcement learning has surpassed superficial functional imitation, delving into the computational essence of cognitive architectures. Current research trends are shifting from simulating single mechanisms to integrating multiple cognitive dimensions, such as novel architectures that combine memory consolidation, hierarchical abstraction, and mental simulation (see the SAC algorithm proposed by Haarnoja et al. [37]). Future breakthroughs may hinge on deep cross-validation between neuroscience and computational models, thereby establishing brain-inspired reinforcement learning paradigms with enhanced interpretability and generalization capabilities.
  • C. Transfer learning. Transfer learning, an advanced machine learning approach, was proposed to address two major challenges in real-world applications [38,39]. The first challenge is the limited number of labeled training samples, especially when dealing with classification problems in specific domains, where there is often a lack of sufficient training data. The second challenge is the change in data distribution, which can render previously collected data outdated, necessitating the recollection of data and retraining of the model. To address these issues, transfer learning emerged. It allows the transfer of knowledge from related domains to the target domain, thereby improving classification performance in the target domain and reducing the need for a large amount of labeled data. Figure 8 is an intuitive example of transfer learning from the human and computer perspectives.
Transfer learning assumes that the feature representations or model parameters learned in one task (the source task) can be useful for another task (the target task). Therefore, the knowledge acquired from the source task can be transferred to the target task. This knowledge can manifest as model weights, feature representations, or even certain structures of the algorithm. Through transfer, the target task can achieve better performance with less data and training time. The core of transfer learning lies in identifying and reasonably leveraging the similarities between the source and target domains. These similarities can be understood as invariants, which remain constant amidst changes.
In transfer learning, there are typically three different operations: domain adaptation, feature selection/extraction, and instance weighting [40]. Domain adaptation aims to bring the source distribution closer to the target distribution as much as possible to enhance the model’s performance in the target domain. In this process, common techniques such as generative adversarial networks (GANs) [41] can learn the mapping from the source domain to the target domain, effectively narrowing the gap between domains. Meanwhile, feature selection/extraction plays a crucial role in this process by identifying and extracting shared features that are effective for different but related tasks, thereby enhancing the model’s generalization ability. This often involves various methods, such as statistical testing and deep learning models, aiming to screen out the most representative feature dimensions [42]. Furthermore, the instance weighting strategy assigns different weights based on the importance of samples, emphasizing representative samples while weakening the influence of noise, further optimizing model performance. This weighting mechanism can be implemented through distance metrics, similarity assessments, and other means, especially showing significant effects in cases of imbalanced datasets or high noise levels [43]. In summary, domain adaptation, feature selection/extraction, and instance-weighting complement each other and collectively constitute the core strategies for improving model adaptability and generalization ability in transfer learning.
The various techniques mentioned above actually carry rich psychological insights and reflect a profound analogy to human cognitive processes. Just as humans continuously learn from past experiences and apply those lessons to similar situations in the future during their growth process—a process that involves not just the accumulation of knowledge but also the refinement of judgment, intuition, and decision-making skills—transfer learning also aims to endow machines with the same capability. The essence of transfer learning lies in its ambition to bridge the gap between static, task-specific algorithms and dynamic, adaptive intelligence, much like how humans evolve from novices to experts by leveraging prior learning across contexts. Transfer learning seeks to replicate this by enabling machines to generalize knowledge from one domain (e.g., image classification) to another (e.g., medical diagnosis) or to fine-tune pretrained models on new, unseen data with minimal retraining. Moreover, the process of transfer learning embodies a “meta-learning” aspect, where models learn how to learn—a skill humans hone through metacognition (thinking about thinking). Similarly, transfer learning models can be designed to dynamically allocate resources, prioritize features, or even self-correct based on feedback, mimicking the iterative, self-improving nature of human problem-solving (see Algorithm 2).
Algorithm 2: Transfer Learning [39]
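A common practical form of this idea is to reuse a pretrained backbone and retrain only a small task-specific head. The PyTorch-style sketch below illustrates that fine-tuning pattern; it is not the published Algorithm 2, and the backbone choice, layer names, and number of target classes are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Source knowledge: a backbone pretrained on a large generic dataset (e.g., ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the transferred parameters so that only the new head is learned.
for param in backbone.parameters():
    param.requires_grad = False

# Target task: replace the final classifier with a new head for, say, 5 classes.
num_target_classes = 5
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (stand-in for target-domain data).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_target_classes, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```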
  • D. Principal component analysis. Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. In doing so, the dimensionality of the data is reduced, making computations faster and easier. PCA explains the variance–covariance structure of a set of variables through linear combinations [44] and is often used as a dimensionality-reduction technique. The entire process can be understood in three steps: first, translate the data to the origin to eliminate positional bias; second, calculate the correlations between different dimensions (the covariance matrix); and finally, use eigendecomposition to find the directions in which the data distribution is most “stretched out” (i.e., the directions with the largest variance), and use these directions as new coordinate axes to redescribe the data.
The anthropomorphic characteristic of this method lies in its simulation of human cognition’s ability to abstract. For instance, when we observe a cluster of scattered point clouds, we subconsciously seek out its main directions of extension while ignoring secondary fluctuations; PCA achieves this by retaining the principal components with the largest variance. It does not require preset rules by humans but, like humans learning from experience, directly summarizes the most important patterns from the data distribution, automatically filtering out redundant and irrelevant noise. Additionally, PCA compresses high-dimensional data into a two- or three-dimensional visual space, akin to humans’ instinct to convert abstract concepts into intuitive images, aiding us in quickly perceiving the overall structure from complex data. However, this “anthropomorphism” is merely a superficial similarity between mathematical mechanisms and cognitive outcomes. Essentially, PCA remains a linear transformation that relies on rigorous covariance calculations, whereas human cognition excels at processing nonlinear relationships and can flexibly adjust focuses based on external knowledge. This difference also reminds us that algorithms’ definition of “essential features” is purely mathematical, whereas humans often understand data with purpose and semantic relevance (see Algorithm 3).
Algorithm 3: PCA [44]
Input: Data matrix $X \in \mathbb{R}^{n \times d}$
Output: Reduced data $Y \in \mathbb{R}^{n \times k}$
1: Center the data by subtracting the mean of each feature
2: Compute the covariance matrix $\Sigma = \frac{1}{n-1} X^{T} X$
3: Compute the eigenvalues and eigenvectors of $\Sigma$
4: Sort the eigenvectors by decreasing eigenvalues
5: Select the top $k$ eigenvectors to form the matrix $W \in \mathbb{R}^{d \times k}$
6: Transform the data matrix $X$ using $W$ to obtain $Y = XW$
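A direct numpy transcription of these steps, applied to a randomly generated toy data matrix (an assumed input used only for illustration), looks like this:

```python
import numpy as np

def pca(X, k):
    """Project the n x d data matrix X onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)                       # step 1: center each feature
    cov = (X_centered.T @ X_centered) / (X.shape[0] - 1)  # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                # step 3: eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]                     # step 4: sort by decreasing eigenvalue
    W = eigvecs[:, order[:k]]                             # step 5: top-k eigenvectors (d x k)
    return X_centered @ W                                 # step 6: Y = XW (on the centered data)

X = np.random.default_rng(0).normal(size=(100, 6))        # toy data: 100 samples, 6 features
Y = pca(X, k=2)
print(Y.shape)  # (100, 2)
```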
  • E. Bayesian networks. Bayesian networks, also known as belief networks, are probabilistic graphical models that represent a set of variables and their interdependencies. These models were first introduced by Judea Pearl in 1985 and have since become a cornerstone in the fields of artificial intelligence, machine learning, and probabilistic reasoning [45]. In a Bayesian network, nodes represent random variables, which can be discrete or continuous. Directed edges between nodes indicate conditional dependencies, with the direction of the edge indicating the direction of influence. Each node is associated with a conditional probability distribution (CPD) that specifies the probability of the node’s state given the states of its parent nodes. Bayesian networks are particularly useful for modeling uncertain domains and performing probabilistic inference. They allow for the visualization of dependencies, efficient modeling, flexible probabilistic reasoning, and the accommodation of incomplete data.
The structure of a Bayesian network is a directed acyclic graph (DAG), where nodes represent random variables, and edges represent conditional dependencies. The joint probability distribution of all variables in the network can be factorized into a product of conditional probabilities, one for each node given its parents [46]. The core functionality of Bayesian networks is probabilistic inference, which involves computing the probability of unknown variables given known evidence. This can be carried out using various inference algorithms, such as variable elimination, belief propagation, or sampling-based methods.
Bayesian networks embody anthropomorphic computing principles in several ways; first, the simulation of human reasoning processes. By decomposing high-dimensional problems into smaller conditional probability problems, Bayesian networks simulate the step-by-step reasoning processes that humans use when facing complex problems. This is akin to how humans gradually analyze and synthesize information to form judgments. Second, the utilization of prior knowledge; Bayesian networks allow for the integration of prior knowledge from domain experts with data, enhancing the accuracy and interpretability of the model. This is similar to how humans use existing knowledge and experience to reason and make decisions when facing new problems. Third, handling uncertainty; Bayesian networks are well-suited for dealing with uncertainty, making them ideal for domains such as medical diagnosis and fault detection that require probabilistic decision-making. This aligns with how humans reason and make decisions in the presence of uncertainty by relying on known information and probability distributions to infer unknown information.
Furthermore, research has explored the integration of Bayesian principles with deep learning, such as by representing neural network weights as probability distributions rather than fixed values [47]. This approach, known as Bayesian deep learning, aims to improve the robustness and performance of neural networks by incorporating uncertainty into the model (see Algorithm 4).
Algorithm 4: Bayesian Network [45]
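Because the pseudocode of Algorithm 4 is rendered as an image in the original publication, we add only a minimal illustrative sketch of exact inference in a two-node network (Rain → WetGrass); the network structure and probability values here are invented for illustration and are not taken from [45].

```python
# A minimal sketch of exact inference in a two-node Bayesian network
# (Rain -> WetGrass), illustrating the factorization P(R, W) = P(R) * P(W | R).
# The network and the probabilities are illustrative placeholders.

P_rain = {True: 0.2, False: 0.8}                        # prior P(R)
P_wet_given_rain = {True: {True: 0.9, False: 0.1},       # CPD P(W | R)
                    False: {True: 0.2, False: 0.8}}

def posterior_rain_given_wet() -> float:
    """Compute P(R = True | W = True) by enumeration and normalization."""
    joint = {r: P_rain[r] * P_wet_given_rain[r][True] for r in (True, False)}
    return joint[True] / sum(joint.values())

print(f"P(rain | wet grass) = {posterior_rain_given_wet():.3f}")  # ~0.529
```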
The advancements in machine learning during the 1990s not only redefined the trajectory of artificial intelligence but also embodied the principles of anthropomorphic computing. By automating tasks such as pattern recognition and decision-making, machine learning systems showcased an increasing ability to mimic human cognitive functions. These developments highlighted the potential of AI to extend and enhance human capabilities, aligning with the overarching vision of creating human-like intelligent systems.

3.2.2. The Rise of Deep Learning (2005)

The core breakthrough of deep learning lies in constructing a computational paradigm that echoes the human cognitive process through multilayered nonlinear representations. Its human-like capabilities are reflected in four progressive levels: first, the perceptual layer mimics the primary processing of raw signals performed by the human sensory system; second, temporal and contextual understanding enables dynamic information memory and reasoning; third, cross-modal integration combines multimodal semantic spaces; finally, the attention mechanism facilitates the dynamic allocation of cognitive focus. This hierarchical and progressive architecture not only overcomes the limitations of traditional algorithms’ hand-crafted feature engineering but also allows AI systems to exhibit human-like cognitive characteristics in complex tasks such as image understanding and language interaction. The following sections will analyze its human-like implementation mechanisms layer by layer, in conjunction with typical neural network structures. Table 1 briefly analyzes the advantages and disadvantages of these four categories and explains the anthropomorphic computing ideas embodied in them.
  • A. Perceptual layer abstraction: Biologically inspired feature extraction. The perceptual systems of biological organisms (such as vision and hearing) are essentially multi-level, adaptive processes of information abstraction. Taking human vision as an example, after the retina receives raw light signals, the information is processed layer by layer through the lateral geniculate nucleus (LGN) and the visual cortex: the primary visual cortex (V1) extracts low-level local features such as edges and orientations, while higher-level visual areas (V4, IT cortex) progressively integrate information to form representations of shapes, textures, and even semantic content [48,49]. This process reflects the biological logic of “hierarchical abstraction”, which involves a progressive transformation from low-level physical features to high-level semantic representations.
In deep learning, the architectural design of convolutional neural networks (CNNs) [50,51,52] directly draws inspiration from this principle, as shown in Figure 9. CNNs utilize local receptive fields, where the local connections of convolutional kernels mimic the spatially limited receptive fields of retinal neurons. Through a sliding window mechanism, CNNs capture local patterns in images, such as edges and corner points. At the same time, hierarchical feature composition is employed. Shallow convolutional layers extract primitive textures (similar to the functionality of the V1 area), while deeper layers in the network, using nonlinear activations and pooling operations, progressively integrate local features to form global semantic representations (such as object contours or categories). This process approximates the information integration seen in the higher visual areas of the visual cortex. Table 2 presents a comparison of parameters and performance between CNN and several of its variants. The key performance metrics of four major architectures are compared. These metrics include parameter size, Top-1 accuracy on the ImageNet dataset, model size, and computational complexity. From the table, it can be seen that ResNet and Capsule Network have more parameters, Classic CNN is moderate, and Shallow CNN has the fewest parameters, indicating its simpler structure. ResNet and Capsule Network perform best on ImageNet classification tasks, but their models are larger and more complex, with higher computational complexity compared to Shallow CNN and others. Table 3 presents a tabular overview of CNN networks, briefly highlighting their advantages and disadvantages.
CNNs start with a convolutional layer that applies a small kernel (filter) over the input image, which allows the model to detect localized patterns such as edges or textures. As the network deepens, additional layers combine these local features into more complex representations. For instance, lower layers might detect edges, while deeper layers could identify parts of objects, and even deeper layers might recognize complete objects.
Mathematically, a convolutional layer operates by applying a kernel K across an input image I, producing an output feature map:
$$S(i,j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} K(m,n) \cdot I(i+m, j+n),$$
where $K(m,n)$ represents the kernel weights at position $(m,n)$ (shown here for a 3 × 3 kernel), and $S(i,j)$ is the output at position $(i,j)$. Through this process, the network mimics the localized and overlapping receptive fields in the human visual system, allowing it to focus on specific areas of an image while preserving spatial relationships.
Pooling layers further reduce the spatial resolution of feature maps, ensuring robustness to variations such as translations and distortions in the input. This characteristic resembles the way humans can recognize objects even when they appear in different positions or orientations, highlighting an implicit anthropomorphic alignment.
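As a concrete illustration of the convolution and pooling operations described above, the following NumPy sketch applies a single 3 × 3 kernel and a 2 × 2 max-pooling step to a toy grayscale image; the Sobel-like kernel and the image size are arbitrary choices made for illustration.

```python
# A small NumPy sketch of the convolution defined above, applying a 3x3 kernel K
# over a grayscale image I (valid region only), followed by 2x2 max pooling.
# Illustrative only: no channels, stride options, or padding handling.
import numpy as np

def conv2d(I: np.ndarray, K: np.ndarray) -> np.ndarray:
    kh, kw = K.shape
    out_h, out_w = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            S[i, j] = np.sum(K * I[i:i + kh, j:j + kw])   # local receptive field
    return S

def max_pool(S: np.ndarray, size: int = 2) -> np.ndarray:
    h, w = S.shape[0] // size, S.shape[1] // size
    return S[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

edge_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # Sobel-like edge detector
feature_map = max_pool(conv2d(np.random.rand(28, 28), edge_kernel))
```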
From the photoreceptor cells in the retina to the hierarchical information processing in the cortex, the biological visual system provides a natural paradigmatic template for feature extraction. Convolutional neural networks (CNNs) reconstruct this process through biomimicry, replacing biological neurons with technicalized convolutional kernels and simulating the feature compression of visual pathways via pooling layers. Ultimately, CNNs replicate in the digital domain the cognitive abstraction process of “from simple to complex, from local to global”. This cross-disciplinary knowledge transfer not only confirms the computability of biological mechanisms but also catalyzes a revolutionary breakthrough in paradigms for visual task processing. While current models have approached or even surpassed human performance in basic feature extraction, the bio-inspired modeling of complex mechanisms such as dynamic attention allocation and multimodal collaborative processing in biological systems remains in an exploratory phase.
However, CNNs are not perfect, which is why Geoffrey Hinton and others proposed capsule networks [55,56]. The core idea of capsule networks is to address the shortcomings of traditional convolutional neural networks in spatial relationship modeling and viewpoint robustness by simulating the hierarchical information processing mechanism of the human visual system. Traditional CNNs extract features through local receptive fields and pooling operations, but pooling can lose the spatial location and pose information (such as rotation and scaling) of features, making it difficult for the model to understand the overall structure of objects. Capsule networks introduce “capsules” as the basic units, where each capsule consists of a group of neurons. These capsules can not only detect the presence of specific features but also encode the pose parameters of features (such as position, orientation, scale, etc.) through high-dimensional vectors, thereby explicitly modeling the spatial relationships between features.
Table 3. Brief overview of CNN architectures.

| Model | Main Finding | Limitations | Dataset | Year |
|---|---|---|---|---|
| AlexNet [57] | Utilizes Dropout and ReLU | Sensitive to input image size; many parameters in the fully connected layers | ImageNet | 2012 |
| ZFNet [58] | Visualization idea of middle layers | Redundant parameters in the fully connected layers | ImageNet | 2014 |
| VGG | Increased depth, small filter size | Large number of parameters; not flexible and modular enough | ImageNet | 2014 |
| GoogLeNet [59] | Increased depth, block concept, different filter sizes, concatenation concept | High computational complexity in branching operations | ImageNet | 2015 |
| ResNet [60] | Robust against overfitting due to symmetry mapping-based skip links | Lack of feature locality; deep structures require higher computational resources | ImageNet | 2016 |
| CapsuleNet [61] | Pays attention to spatial relationships between features | Architectural complexity and high memory requirements | MNIST | 2018 |
| MobileNet-v2 [62] | Inverted residual structure | Limited feature representation capability | ImageNet | 2018 |
| HRNetV2 [63] | High-resolution representations | The multi-branch design and high-resolution module impose higher demands on implementation and debugging | ImageNet | 2020 |
| SCNet [64] | Emphasizes structured pruning techniques | The flexibility of the pruned model is limited | ImageNet | 2022 |
The anthropomorphic computing idea of capsule networks is embodied in their imitation of the biological visual system. For example, when humans recognize objects, they infer the whole (such as a human face) based on the combination of local features (such as eyes and nose) and their spatial relationships (such as eyes being above the nose). Capsule networks achieve a similar process through the “Dynamic Routing” mechanism: lower-level capsules (such as those for edge detection) transmit their predictions to higher-level capsules (such as those for object parts), and higher-level capsules dynamically adjust connection weights based on prediction consistency, ultimately selecting the most credible parent capsule. This mechanism is similar to how the human brain integrates local information through attention mechanisms to form a holistic perception. Additionally, capsule networks use the magnitude of vector norms to represent the probability of feature existence and encode the posed information in vector directions. This separate design aligns more closely with the hypotheses of “sparse representation” and “explicit attribute encoding” in neuroscience.
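The following simplified sketch illustrates the capsule “squash” nonlinearity and a single routing-by-agreement step; the tensor shapes, the single iteration, and the omission of the learned transformation matrices are deliberate simplifications of the full CapsNet procedure [61].

```python
# A highly simplified sketch of the capsule "squash" nonlinearity and one
# dynamic-routing step. Shapes and the single iteration are illustrative; a real
# CapsNet also learns a transformation matrix per (lower, upper) capsule pair.
import numpy as np

def squash(v: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each capsule vector so its norm lies in (0, 1), preserving direction."""
    norm_sq = np.sum(v ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * v / np.sqrt(norm_sq + eps)

def route_once(u_hat: np.ndarray) -> np.ndarray:
    """u_hat: predictions from lower capsules, shape (n_lower, n_upper, dim)."""
    b = np.zeros(u_hat.shape[:2])                             # routing logits
    c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)      # coupling coefficients
    s = np.einsum("ij,ijd->jd", c, u_hat)                     # weighted sum per upper capsule
    v = squash(s)                                             # upper-capsule outputs
    b += np.einsum("ijd,jd->ij", u_hat, v)                    # agreement updates the logits
    return v

v_upper = route_once(np.random.randn(6, 3, 8))                # 6 lower, 3 upper, 8-dim capsules
```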
  • B. Temporal and contextual understanding: Engineering the implementation of memory mechanisms. Human language comprehension is akin to a relay race of thought; when you hear the sentence “Last week, the kitten I raised got sick”, your brain naturally retains the key event “kitten got sick”, and upon hearing the follow-up “now it has finally recovered”, it automatically links “it” back to the previously stored “kitten”. This ability to dynamically weave memories is precisely the cognitive trait that deep learning systems aim to simulate through engineered memory frameworks. Table 4 shows an overview of RNN and LSTM architectures. The following will introduce several representative networks, and their overall information is summarized in Table 5.
Table 4. Brief overview of RNN and LSTM architectures.

| Model | Main Finding | Limitations | Year |
|---|---|---|---|
| Vanilla RNN [65] | Introduced recurrent connections for sequential data processing | Prone to vanishing/exploding gradients; difficulty in learning long-term dependencies | 1986 |
| LSTM [66] | Introduced memory cells and gating mechanisms to address long-term dependency issues | Computationally expensive due to additional parameters and operations | 1997 |
| GRU [67] | Simplified gating mechanism compared to LSTM; faster training | Limited flexibility compared to LSTM; may underperform on complex tasks | 2014 |
| Transformer-based RNN [68] | Combines RNN with Transformer architectures for enhanced sequence modeling | Higher memory requirements and complexity compared to pure RNNs | 2017 |
| ConvLSTM [69] | Combines convolutional layers with LSTM for spatio-temporal data modeling | Computationally intensive; requires large datasets | 2015 |
| Hierarchical Attention RNN [70] | Combines hierarchical structures with attention mechanisms for document classification | Increased model complexity; harder to tune hyperparameters | 2016 |
| BiLSTM + Attention [71] | Combines BiLSTM with attention for improved sequence modeling and feature extraction | Higher computational cost due to bidirectional and attention mechanisms | 2021 |
| Attention-based ConvLSTM [72] | Enhances ConvLSTM with attention for spatio-temporal feature modeling | Computationally intensive; requires large datasets | 2022 |
Early recurrent neural networks (RNNs) [73] introduced a memory chain into AI: the hidden state at each timestep acted like the coordinate axis of a thought trajectory, encoding sequential relationships such as “kitten” → “got sick” → “now” into continuous vectors. However, these models often suffered from vanishing gradients, turning them into “amnesiacs”. This limitation was addressed with the advent of LSTM [66,74] and GRU [67], which introduced gated memory mechanisms. The forget gate operates like a cognitive sieve, filtering out irrelevant information (e.g., ignoring “last week’s weather”), while the input gate writes crucial events (like “kitten got sick”) into long-term memory. The output gate then retrieves these stored fragments as needed, perfectly emulating the selective memory reinforcement process humans use to process lengthy texts.
Recurrent neural networks extend the capabilities of traditional feedforward networks by introducing feedback connections, allowing them to process sequential data. This architecture is particularly effective for tasks requiring temporal understanding, such as speech recognition or language modeling. The ability of RNNs to retain a hidden state over time mirrors human memory, where past experiences influence present decisions.
The hidden state $h_t$ at time step $t$ is updated based on the current input $x_t$ and the previous hidden state $h_{t-1}$:
$$h_t = \tanh(W_h h_{t-1} + W_x x_t + b),$$
where $W_h$ and $W_x$ are weight matrices, and $b$ is a bias term. This recursive structure enables the network to process variable-length sequences and maintain context, much like how humans derive meaning from sentences by integrating information over time.
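A minimal sketch of this recurrence, unrolled over a toy sequence with randomly initialized weights, is given below; the hidden and input dimensions are arbitrary.

```python
# A minimal NumPy sketch of the recurrence h_t = tanh(W_h h_{t-1} + W_x x_t + b),
# run over a toy sequence. Dimensions and random weights are illustrative.
import numpy as np

hidden, inp = 16, 8
W_h = np.random.randn(hidden, hidden) * 0.1
W_x = np.random.randn(hidden, inp) * 0.1
b = np.zeros(hidden)

def rnn_forward(sequence: np.ndarray) -> np.ndarray:
    h = np.zeros(hidden)                         # initial hidden state
    for x_t in sequence:                         # iterate over time steps
        h = np.tanh(W_h @ h + W_x @ x_t + b)     # the recurrence above
    return h                                     # final context vector

context = rnn_forward(np.random.randn(20, inp))  # a 20-step sequence
```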
Table 5. Detailed comparison of recurrent neural network architectures: Features, gating mechanisms, temporal dependencies, and computational characteristics.

| Characteristic | Vanilla RNN | LSTM | GRU | Phased-LSTM |
|---|---|---|---|---|
| Gating Mechanism | None | Input, Forget, and Output gates | Update and Reset gates | LSTM gates + time gate $k_t$ |
| Temporal Dependency | Short term | Long term | Mid-range | Event-triggered |
| Parameter Count | $3n^2 + 2n$ | $4(n^2 + n)$ | $3(n^2 + n)$ | $4(n^2 + n) + 3$ |
| *Bio-Inspired Features* | | | | |
| Memory Mechanism | Short-term potentiation | Explicit memory cells | Adaptive forgetting | Circadian-like rhythm |
| Update Pattern | Continuous | Conditional update | Balanced flow | Sawtooth-wave sampling |
| Compute Efficiency | $O(n^2)$ | $4\times$ RNN | $3\times$ RNN | $0.1\times$ LSTM |

Note: $n$ denotes the hidden layer size. Time gate operation: $k_t = \mathbb{I}\left[\phi \le (\Delta t \bmod \tau)/\tau < \phi + r\right]$.
RNNs, however, face challenges with long-term dependencies due to issues like vanishing gradients. To address this, Long Short-Term Memory (LSTM) networks were developed. LSTMs introduce gates that regulate the flow of information, allowing the network to retain important information for longer periods while forgetting irrelevant details. For example, the forget gate in LSTMs is defined as
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right),$$
which determines how much of the previous memory $C_{t-1}$ should be retained. This mechanism mimics human cognitive processes, such as selective attention and memory retention, where some information is kept, and other information is discarded based on relevance.
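The following sketch shows how the forget, input, and output gates combine in a single LSTM step; fusing all gate weights into one matrix and initializing them randomly are implementation conveniences, not part of the original formulation.

```python
# A compact sketch of one LSTM step showing how the forget, input, and output
# gates combine, matching the gate equation above. Weights are random placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    z = W @ np.concatenate([h_prev, x_t]) + b        # all four gates in one matmul
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)     # forget / input / output gates
    C = f * C_prev + i * np.tanh(g)                  # keep old memory, write new content
    h = o * np.tanh(C)                               # expose the selected memory
    return h, C

hidden, inp = 8, 4
W = np.random.randn(4 * hidden, hidden + inp) * 0.1
b = np.zeros(4 * hidden)
h, C = lstm_step(np.random.randn(inp), np.zeros(hidden), np.zeros(hidden), W, b)
```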
For non-uniformly sampled time-series data, there is an improved version of LSTM called Phased LSTM [75]. Its core lies in simulating the spiking activation characteristics of biological neural systems through periodically rhythmic time gating. This design is inspired by the observation that biological neurons do not remain active continuously but instead discharge intermittently at specific frequencies (such as circadian rhythms and respiratory rhythms). For example, in electroencephalogram (EEG) monitoring for epilepsy, abnormal discharges often exhibit sudden and intermittent characteristics. Traditional LSTMs may obscure critical signals due to their continuous updating of hidden states. However, Phased LSTM introduces phased gating, which activates neurons and updates memory only within specific time windows, thereby more accurately capturing intermittent events.
The core mechanism of Phased LSTM revolves around three key parameters: the period $\tau$ (controlling the gating frequency), the phase $\phi$ (determining the initial offset of gate activation), and the duty cycle $r$ (setting the proportion of open time per cycle). The temporal gating state $k_t$ is dynamically adjusted by the interval $\Delta t = t - t_{\text{prev}}$ between the current timestamp $t$ and the previous event time $t_{\text{prev}}$ through a sawtooth wave function:
$$k_t = \begin{cases} 1, & \text{if } \phi \le \dfrac{\Delta t \bmod \tau}{\tau} < \phi + r, \\ 0, & \text{otherwise}. \end{cases}$$
When $\Delta t$ falls within the gate activation interval (i.e., $k_t = 1$), the model executes standard LSTM updates; during deactivation intervals (i.e., $k_t = 0$), it freezes the cell state $C_t$ and hidden state $h_t$, maintaining their previous values. This event-driven update strategy significantly reduces redundant computations.
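A sketch of the time gate alone is shown below; the period, phase, and duty-cycle values, as well as the synthetic event times, are illustrative.

```python
# A sketch of the Phased LSTM time gate k_t from the equation above: the cell
# only updates when the phase of the interval falls inside the open portion of
# the cycle. Parameter values are illustrative.
import numpy as np

def time_gate(delta_t: float, tau: float = 50.0, phi: float = 0.1, r: float = 0.05) -> int:
    phase = (delta_t % tau) / tau                 # position within the current cycle
    return int(phi <= phase < phi + r)            # 1 = open (update), 0 = closed (freeze)

# Irregular event times (ms); only intervals whose phase falls in the open window trigger an update.
events = np.cumsum(np.random.exponential(scale=8.0, size=20))
gates = [time_gate(t_now - t_prev) for t_prev, t_now in zip(events[:-1], events[1:])]
```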
  • C. Attention mechanism: Dynamic modeling of cognitive focus. Human attention is the result of the coordination of multiple brain systems. The prefrontal cortex serves as the core control center, responsible for goal setting and distraction inhibition. For instance, when focusing on a conversation in a noisy environment, it suppresses neural signals associated with background noise. The parietal cortex acts like a “spatial map”, helping to locate visual targets, such as quickly finding keys on a desk. Meanwhile, the thalamus, through its reticular nucleus, functions as an “information gate”, filtering out irrelevant signals to ensure important information reaches the cortex. This collaboration among brain regions is regulated by neurotransmitters. Dopamine strengthens neural circuits related to goal-directed tasks (e.g., dopamine dysregulation in ADHD leads to impaired attention), norepinephrine maintains alertness, and acetylcholine enhances sensory signal processing. Table 6 shows the overview of Transformer-based architectures.
The attention mechanism in computers simulates the human process of information filtering through mathematical modeling. Taking the Transformer model as an example, its self-attention module dynamically allocates weights by calculating the correlation between words. For instance, in the sentence “The cat chases the mouse”, the word “chases” is associated with both “cat” and “mouse”. The model quantifies the strength of these associations through matrix computations. Multi-head attention further mimics the brain’s multi-channel processing, allowing the model to focus simultaneously on semantics (analogous to the prefrontal cortex) and structure (similar to the parietal cortex) of a text. Unlike traditional convolutional neural networks (CNNs) with fixed receptive fields, the attention mechanism allows for dynamic focusing on key areas, akin to how the human visual system quickly locks onto targets in complex scenes. Next, we will explore how the attention mechanism is applied in neural networks through the Transformer model.
The Transformer architecture represents a departure from traditional sequential models, replacing recurrence with a mechanism known as self-attention [68]. This allows the network to process all input elements in parallel while dynamically focusing on relevant parts of the data. This innovation has enabled breakthroughs in tasks such as machine translation, text generation, and image understanding.
Table 6. Brief overview of Transformer-based architectures.

| Model | Main Finding | Limitations | Year |
|---|---|---|---|
| Transformer [68] | Introduced self-attention mechanism; significantly improved sequence modeling for NLP tasks | High computational and memory cost; quadratic complexity in sequence length | 2017 |
| Longformer [76] | Designed for long sequences; uses sliding window attention to reduce computational complexity | Limited to sparse attention patterns; may not capture global dependencies well | 2020 |
| Swin Transformer [77] | Hierarchical architecture for vision tasks; introduces shifted windows for efficient computation | Requires careful tuning of window sizes; less effective for NLP tasks | 2021 |
| Conformer [78] | Combines convolution and Transformer for improved speech recognition performance | Computationally intensive for large datasets; requires specialized hardware | 2020 |
| ViT (Vision Transformer) [79] | Applies Transformer directly to image patches for vision tasks; achieves state-of-the-art results | Requires large-scale pretraining; sensitive to input patch size | 2020 |
| BERT [80] | Pretrained bidirectional encoder; revolutionized NLP by enabling transfer learning | Computationally expensive fine-tuning; large memory requirements | 2018 |
| GPT (Generative Pretrained Transformer) [81] | Focused on autoregressive generation tasks; excels in few-shot learning scenarios | Requires vast amounts of data and computational resources for pretraining | 2020 |
| DeiT (Data-efficient Image Transformer) [82] | Optimized ViT for data-efficient training with distillation tokens | Limited scalability for very large datasets; dependent on distillation techniques | 2021 |
The self-attention mechanism computes relationships between input tokens by comparing their query (Q), key (K), and value (V) representations:
$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V,$$
where $d_k$ is the dimensionality of the key vectors. This global context modeling resembles the way humans selectively focus on certain aspects of a conversation or visual scene while ignoring irrelevant details.
What sets Transformers apart is their ability to model long-range dependencies without relying on sequential processing. The multi-head attention mechanism enhances this ability by learning different representations from multiple attention heads:
$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)\, W^O.$$
Each head processes the input independently, allowing the model to consider various aspects of the data simultaneously. This mirrors how humans often integrate multiple perspectives or pieces of information at once.
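The two formulas above can be condensed into the following NumPy sketch of scaled dot-product attention with a naive multi-head wrapper; the random projection matrices stand in for learned parameters, and the token count and model width are arbitrary.

```python
# A NumPy sketch of scaled dot-product attention and a naive multi-head wrapper,
# matching the two formulas above. Projection matrices are random placeholders.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise token relevance
    return softmax(scores) @ V                   # weighted mixture of values

def multi_head(X, heads, d_model):
    d_head = d_model // heads
    outputs = []
    for _ in range(heads):                       # each head gets its own projections
        Wq, Wk, Wv = (np.random.randn(d_model, d_head) * 0.1 for _ in range(3))
        outputs.append(attention(X @ Wq, X @ Wk, X @ Wv))
    W_o = np.random.randn(heads * d_head, d_model) * 0.1
    return np.concatenate(outputs, axis=-1) @ W_o    # Concat(head_1 ... head_h) W^O

tokens = np.random.randn(5, 32)                  # 5 tokens, d_model = 32
out = multi_head(tokens, heads=4, d_model=32)
```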
The evolutionary trajectory of Transformer variants further exemplifies this bio-inspired engineering paradigm. Longformer [76] introduces dilated sliding window attention to handle extended contexts, mimicking the human brain’s hierarchical chunking strategy when parsing complex narratives—a process neurobiologically implemented via hippocampal theta-gamma phase coupling. Swin Transformer [77] adopts shifted window partitioning to establish local-to-global receptive fields, replicating the ventral visual stream’s progressive feature integration from V1 simple cells to IT cortex holistic representation. This spatial hierarchy aligns with the discovery of cortical magnification factors in fMRI studies [83], where the primary visual cortex dedicates disproportionate resources to foveal inputs, similar to Swin’s patch embedding allocation. Table 7 illustrates the parameter differences among these several distinct variants/types of Transformer models, as well as their performance on several publicly available datasets.
More sophisticated adaptations deepen the neuromorphic correspondence. Perceiver IO [84] introduces latent space cross-attention to process multimodal inputs through shared neural substrates, mirroring the superior colliculus’ role in coordinating audio-visual integration. Its fixed-size latent bottleneck enforces dynamic resource allocation akin to working memory capacity constraints governed by prefrontal dopamine levels. Similarly, BigBird’s [85] sparse attention mechanism with global tokens replicates the default mode network’s function in maintaining contextual priors during rest states, while local attention windows simulate focal cortical activation patterns during task engagement.
These architectural innovations reveal an implicit adherence to neurobiological principles: (1) dynamic resource constraints (inspired by limited synaptic transmission bandwidth), (2) multiscale temporal processing (mimicking thalamocortical resonance frequencies), and (3) energy-efficient sparsification (paralleling neural population coding sparsity). For example, Visual Transformer [79] achieves human-level image categorization accuracy by hierarchically routing patches through self-attention layers, a process mathematically analogous to the ventral stream’s progressively invariant feature extraction [86]. Meanwhile, the introduction of memory-augmented Transformers [87] through external differentiable memory banks directly emulates the hippocampal–neocortical dialog during memory consolidation.
Recent breakthroughs like FlashAttention [88] further bridge the gap between biological plausibility and computational efficiency. By optimizing memory access patterns during attention computation, it achieves 3.5× speedup while maintaining mathematical equivalence—a feat reminiscent of the brain’s white matter tract optimization through myelination. This synergy between biological insight and engineering pragmatism underscores the transformative potential of neuromorphic attention modeling, not merely replicating cognitive functions but uncovering fundamental laws of intelligent information processing that transcend substrate limitations.
  • D. Multimodal association: Unified representation of cross-sensory cognition. Multimodal learning is a direction in deep learning that focuses on the collaborative modeling of various information modalities, such as images, text, speech, and video, as in Figure 10. Its core is to simulate how humans understand the world through the integration of multiple sensory inputs. For example, when observing a “burning campfire”, a person relies not only on the visual flickering of the flames and the auditory crackling of the wood but also on the tactile memory of warmth and even the linguistic description of “warmth” to form a comprehensive understanding of the scene. This complementarity and alignment across modalities are key to the efficiency and robustness of human cognition. Anthropomorphic computing emphasizes replicating the underlying logic of human cognition in technological design, and multimodal association is a central expression of this concept. By enabling machines to integrate semantics, context, and even emotions from different modalities, similar to humans, it seeks to overcome the limitations of single modalities and achieve an understanding and reasoning capacity closer to human intuition.
The core design of CLIP [89] revolves around aligning the semantics of images and text. Its architecture consists of two independent encoder branches: an image encoder, typically based on a variant of the Vision Transformer (ViT) or ResNet-50 [60], and a text encoder. Taking ViT-L/14 as an example, the input image is divided into a sequence of 14 × 14-pixel patches (a total of 256 patches). Each patch undergoes a linear projection into a 768-dimensional vector, with learnable positional encodings added to preserve spatial relationships. A 24-layer Transformer encoder, with each layer containing 12 attention heads, then extracts features. Finally, global average pooling produces a 1024-dimensional feature vector for the image. The text encoder is based on a 12-layer Transformer model. The input text is first tokenized using byte pair encoding (BPE) into a sequence with a maximum length of 77 tokens. The embedding layer maps the tokens into 768-dimensional vectors and adds positional encodings to maintain order. This sequence is processed via a self-attention mechanism, and the final output corresponding to the “End of Sequence” token serves as the global text representation. The output features from both encoders are projected into a shared 128-dimensional embedding space using a learnable projection matrix implemented as a two-layer MLP. This shared embedding space’s geometric structure is optimized using a contrastive loss function. During training, the model computes a cosine similarity matrix for all image–text pairs within a batch and employs a symmetric InfoNCE loss to optimize the matching probabilities in two directions: from image to text and from text to image. The temperature coefficient τ , dynamically adjusted via gradient descent, controls the sharpness of the similarity distribution. The innovation of CLIP lies in introducing natural language supervision to visual representation learning [89]. For instance, in zero-shot classification tasks, users simply extend class labels into textual prompts. The model then classifies images by computing the similarity between image features and all textual prompts without requiring any parameter fine-tuning.
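The zero-shot classification procedure described above can be sketched as follows; the encoder functions are placeholders for the actual CLIP image and text branches, the 128-dimensional embedding follows the description in the text, and the temperature value is illustrative.

```python
# A conceptual sketch of CLIP-style zero-shot classification: embed the image and
# a set of textual prompts, L2-normalize, and pick the prompt with the highest
# cosine similarity. The encoders are stand-in functions, not the real CLIP models.
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def encode_image(image) -> np.ndarray:           # placeholder for the ViT/ResNet branch
    return np.random.randn(128)

def encode_text(prompt: str) -> np.ndarray:      # placeholder for the text Transformer
    return np.random.randn(128)

def zero_shot_classify(image, class_names, temperature=0.07):
    img = l2_normalize(encode_image(image))
    txt = l2_normalize(np.stack([encode_text(f"a photo of a {c}") for c in class_names]))
    logits = txt @ img / temperature             # cosine similarities, sharpened by the temperature
    probs = np.exp(logits) / np.exp(logits).sum()
    return class_names[int(np.argmax(probs))], probs

label, probs = zero_shot_classify(None, ["cat", "dog", "campfire"])
```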
The Flamingo model, in contrast, focuses on multimodal sequence generation tasks, deeply integrating visual information with a pretrained language model (such as Chinchilla-70B) [90]. The vision encoder is based on NFNet-F6 [91], a ResNet variant that eliminates batch normalization. Input images are resized to a resolution of 320 × 320, and the encoder outputs 256 visual tokens, with each token corresponding to a 64-dimensional feature representing a local region of the image. These visual tokens interact with the text sequence via a gated cross-attention layer. After each Transformer layer in the language model, a cross-attention module is inserted. This module first performs self-attention on the text’s hidden states and then incorporates the visual tokens as key–value pairs. By using a multi-head attention mechanism (typically with 32 attention heads), it dynamically integrates visual information into the text sequence. A gating mechanism controlled by learnable scalar weights initialized to zero regulates the strength of visual signal injection, preventing visual noise from disrupting the language model’s pretrained capabilities during the early training phase. Flamingo’s training consists of two stages: The first stage freezes the visual encoder’s parameters and trains only the cross-attention layers and language model using large-scale multimodal datasets. The second stage fine-tunes the entire model end-to-end for downstream tasks such as visual question answering and image caption generation. Flamingo’s design enables it not only to describe image content but also to perform complex reasoning across multi-turn dialogues.
The pinnacle of neuromorphic multimodal integration emerges in architectures like PaLI [92], which cascades modality-specific encoders with fusion Transformers. This design replicates the primate ventral-dorsal stream dichotomy: visual processing splits into “what” (ventral) and “where” (dorsal) pathways before converging in prefrontal executive networks. The mixture-of-experts routing of PaLI dynamically allocates computation based on input complexity—a process governed by attention-based gating that mathematically models the locus coeruleus-norepinephrine system’s adaptive gain control [93]. During scene understanding tasks, its routing patterns show a striking similarity to fMRI-measured blood oxygenation level-dependent (BOLD) signals in the temporoparietal junction during multisensory conflict resolution.

3.2.3. The Rise of Large Language Models (LLMs) (2020)

Human beings possess the remarkable ability to express and communicate through language, which starts developing during early childhood and continues to evolve throughout their lifetime [94,95]. However, machines lack the inherent capacity to comprehend and communicate in human language unless they are equipped with powerful AI algorithms. The goal of achieving human-like reading, writing, and communication skills in machines has been a long-standing research challenge and desire [96].
The rise of large models in artificial intelligence, particularly the breakthrough advancements in natural language processing (NLP) and computer vision (CV), has made all of this possible. These models, which typically contain billions or even trillions of parameters, are trained on massive datasets to capture multilayered features and deep semantic relationships. The core idea behind these large models is to simulate and learn cognitive abilities closer to human intelligence by leveraging large numbers of parameters and deeper architectures. Table 8 shows the overview of large language models.
Large models are built upon the Transformer architecture, which has become a standard in deep learning for tasks involving sequential data. The Transformer’s self-attention mechanism allows the model to process and understand relationships between elements in the input, regardless of their position in the sequence. This architecture enables highly parallelizable and scalable processing, making it ideal for handling vast amounts of data. One of the key innovations in large models is their ability to pre-train on large datasets in an unsupervised manner and then fine-tune on specific tasks with smaller, supervised datasets. For instance, models like BERT [80] and GPT, which were trained on massive textual data, have demonstrated a remarkable ability to understand and generate human-like language. Table 9 presents the parameter scales and performance metrics of several currently renowned large-language models. Accuracy on the SQuAD 2.0 dataset (percentage) measures the model’s ability to answer questions in a question-answering task. The higher the score, the better the model’s performance in language understanding and question-answering tasks. The energy efficiency metric, expressed as the number of operations processed per watt (TOPS/W), represents the energy utilization efficiency of the model. The higher the value, the more energy-efficient the model is.
The rise of large models can also be attributed to advancements in computational power. With improvements in hardware, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), training deep learning models has become more efficient, allowing the scaling up of models and data. Additionally, new optimization algorithms like Adam [102] and LAMB [103] have made training on large datasets more stable and effective. Alongside computational power, the availability of large, diverse datasets, particularly from the internet, has provided the raw material needed for training these models.
Table 9. Comprehensive performance comparison of global large-scale AI models.

| Model | Parameters (Billion) | Training Data (TB) | Hardware Cost (GPU-Days) | Accuracy (SQuAD 2.0) | Inference Speed (Tokens/s) | Energy Efficiency (TOPS/W) |
|---|---|---|---|---|---|---|
| GPT-4 | 175 | 45 | 12,500 | 89.2 | 1200 | 3.5 |
| PaLM-2 [104] | 540 | 780 | 35,000 | 88.5 | 950 | 2.8 |
| LLaMA-2-70B [105] | 70 | 2 | 3200 | 86.7 | 2500 | 6.2 |
| Claude 3 | 150 | 130 | 18,000 | 90.1 | 1800 | 4.1 |
| Falcon-180B | 180 | 3.5 | 7800 | 87.3 | 3000 | 5.9 |
In large models, the core manifestation of anthropomorphic computing lies in simulating key human cognitive characteristics through structural design and training paradigms—mapping multilayered abilities from intuitive reasoning and common-sense accumulation to social interaction. The chain-of-thought technique vividly illustrates this direction [106]; when GPT-4 is tasked with solving “If A was born 3 years before B and B is 5 years younger than C, what is the age difference between A and C?”, the model generates intermediate steps of gradual deduction (such as calculating the relationship between B and C first, then considering the difference between A and B) rather than directly outputting the result. This staged, explicit reasoning process is highly similar to the “working memory buffer” humans rely on when solving problems—breaking down complex problems into subtasks and temporarily storing intermediate states to avoid cognitive overload. Furthermore, Program-Aided Reasoning combines natural language reasoning with code execution, such as allowing the model to generate a function to solve a geometry problem, essentially simulating the mechanism by which the human brain utilizes mathematical symbolic systems and spatial imagination cooperatively.
At the level of emotion and intent understanding, large models approach the complexity of human interaction through context-sensitive social cognition modeling. For example, in Llama 2-chat, when a user inputs “I just got laid off from work”, the model not only recognizes “laid off” as a negative event in its literal sense, but also assesses the emotional tone of the response based on the conversation’s history (such as avoiding celebratory emojis), and actively offers supportive suggestions (such as steps for claiming unemployment benefits) [105]. This ability stems from the implicit learning of “behavior-feedback” patterns in massive social texts, similar to how children accumulate knowledge of social norms by observing others’ interactions. More refined value alignment techniques further equip models with anthropomorphic ethical judgment abilities, e.g., using reinforcement learning from human feedback (RLHF) [107] to learn strategies for avoiding sensitive topics (such as questions involving racial discrimination), mirroring the brain’s prefrontal cortex’s control over impulsive behavior.
The construction of knowledge systems reflects the bionic characteristics of human memory mechanisms. Traditional neural networks’ parameterized memory is prone to catastrophic forgetting, whereas large models achieve dynamic knowledge access through Mixture of Experts (MoEs) [108] systems, where each input activates specific subnetworks (e.g., a “quantum physics” question triggers a science expert module), while other modules remain dormant to save computational resources. This is consistent with the “pattern activation” feature of human long-term memory; the word “Mozart” preferentially activates neuron clusters related to music rather than unrelated cooking knowledge. Additionally, Retrieval-Augmented Generation (RAG) [109] technology combines internal parameter memory with external knowledge bases, such as calling search engines to obtain the latest historical materials before organizing language output when answering historical event questions, akin to scholars cross-referencing literature and their own knowledge reserves when writing papers.
The evolution of Embodied AI further extends anthropomorphic computing to multimodal physical interactions. For instance, when interpreting the instruction “place the blue block to the left of the red box” through visual-language joint encoding, Google’s PaLM-E model [110] not only analyzes the objects’ colors and spatial relationships but also generates robot joint motion trajectory planning, replicating the multi-level regulatory mechanism by which humans convert abstract language into action intent and then into the motor cortex. In the medical field, the combination of large models and wearable devices showcases anthropomorphic health monitoring potential; by analyzing the correlation between continuous glucose monitoring data and patient diet diaries, the model can infer changes in insulin sensitivity like an experienced doctor and even anticipate hypoglycemia risks and provide early warnings.
However, there remain intrinsic differences between current large models and human cognition; their “understanding” is fundamentally statistical association rather than true emergent consciousness, and their handling of counterfactual reasoning (such as “How would human civilization develop if the dinosaurs hadn’t gone extinct”) often relies on data biases rather than logical inference. Language models have the potential to unintentionally demonstrate bias when the training data used in their development is biased. According to Schramowski et al. [111], large pretrained models designed to mimic natural languages can inadvertently perpetuate unfairness and prejudices. Consequently, this can lead to discriminatory or inaccurate analyses and recommendations, resulting in public criticism across various domains, including politics, society, and law.
Meanwhile, GPT-4 may generate information that is not based on its training data, leading to outputs that are factually incorrect or purely fictional. Hallucinations in large language models (LLMs) are often the result of the model attempting to fill in gaps in its knowledge or context, making assumptions based on patterns it learned during training. This can result in incorrect or misleading outputs, which is particularly problematic in sensitive applications [112].

4. The Possible Future Trends of Artificial Intelligence

4.1. The Major Limitations of AI

Current artificial intelligence systems still face fundamental bottlenecks in four dimensions on the path to approaching human intelligence. These limitations precisely constitute the breakthrough directions for Cyber Brain Intelligence (CBI).
(1)
Data Addiction and Cognitive Rigidity
Traditional AI relies on “greenhouse training” with closed datasets, essentially using loss functions to forcibly compress the statistical features of a high-dimensional input space. Take object detection as an example; YOLOv7 [113] achieves 63% mAP on the COCO dataset [114], but this comes at the cost of 110,000 manually annotated images and tens of thousands of gradient descent iterations. However, when faced with unexpected forms of ice crevasses during Antarctic glacier expeditions, the model fails completely due to the lack of similar training samples. This underscores the fundamental difference between supervised learning and biological learning: the human visual cortex constructs flexible conceptual boundaries for object recognition through sparse spiking codes and hierarchical predictive coding, requiring relatively few samples [115]. In contrast, contemporary artificial intelligence systems rely on explicit memory of data distributions and lack internalized representations of the fundamental principles of the physical world.
(2)
Energy Efficiency Gap
AlphaGo Zero’s training consumed 2.9 megawatt-hours of electricity (equivalent to the daily electricity usage of 1200 households), sharply contrasting with the extreme energy efficiency of the human brain, which operates at 20 watts [116]. The root of this issue lies in the compute–storage separation architecture of traditional AI; processing a single image with ResNet-152 requires transferring 95 MB of data between DDR memory and the GPU, whereas biological neurons transmit pulse signals and update synaptic weights in situ in physical space. Even more problematic is the $O(n^2)$ growth of attention complexity with sequence length in Transformer models, leading to rapidly increasing energy consumption when processing long text dialogues [117]. This hinders the practical application of AI in edge computing scenarios, such as mobile devices and satellites.
(3)
Brittleness in Dynamic Environments
The vulnerability of existing systems in open environments stems from their dualistic “train-deploy” paradigm. For example, Boston Dynamics robots can perform complex backflips in laboratory settings but still fall on slippery moss-covered ground in the wild [118,119]. This is because they cannot, like the human cerebellum, dynamically adjust motor commands through proprioceptive feedback within milliseconds. Deep reinforcement learning (DRL) in such scenarios requires millions of trial-and-error iterations to update its strategy [31]. In contrast, the biological brain, with the help of burst-like dopamine signals from the basal ganglia, can immediately reconstruct its movement pattern after a single fall (as seen in a skier’s quick posture adjustment after a first fall). This gap in real-time adaptability fundamentally arises from AI’s lack of mathematical models of neural plasticity mechanisms.
(4)
Causal Reasoning Barrier
The “hallucination” issues seen in large language models during text generation reveal the limitations of statistical learning in understanding causal relationships [120]. When asked how raising the minimum wage affects unemployment rates, GPT-4 [121] often stitches together excerpts from economics textbooks but fails to construct a dynamic supply–demand model to perform counterfactual reasoning. This is because the Transformer architecture essentially performs a random walk on a co-occurrence graph of words rather than simulating causal interactions between variables through neural population dynamics like the human prefrontal cortex. Even more concerning is the inability of current AI systems to distinguish confounding variables in data: a medical diagnostic model might incorrectly associate the brand of a hospital’s monitoring equipment with patient survival rates, whereas a doctor would recognize and disentangle such spurious causal relationships through double-blind trials.
In response to these challenges, we propose CBI (Cyber Brain Intelligence), which outlines a promising vision and will address the following issues. While traditional AI engineers remain preoccupied with annotating more data, a different approach is emerging in future laboratories, where a set of chips simulating biological neuron pulses can learn through limited variations in rhythm. Current robotic systems face a deeper issue; they often operate based on fixed instructions. In contrast, CBI aims to develop an intelligent system equipped with a “digital endocrine system”. We have integrated a virtual neurotransmitter network into the system; for instance, when the autonomous driving module detects abnormal road conditions in severe weather, digital norepinephrine reallocates computational power to signal analysis; when prediction errors occur, changes in dopamine levels trigger synaptic reorganization. This dynamic regulation endows AI with a “physiological rhythm”, enabling it to adapt its state to different scenarios. Additionally, current AI systems often lack advanced analytical capabilities, whereas CBI builds a self-evolving causal system. When addressing complex economic issues, the system can simulate the chain effects among variables and conduct reverse reasoning. This capability stems from the integration of neuroscience and symbolic logic, similar to how humans rely on logical reasoning for analysis and validation. Next, we will provide a detailed analysis of the three components of the CBI framework.

4.2. The Structure of CBI

The CBI three-dimensional architecture collaborative paradigm is established on a theoretical framework that integrates bionic computing and causal science, as shown in Figure 11. CBI first receives external signals and converts them into EEG signals through a specific pulse spatio-temporal encoder; then, the signals are passed on to three specialized layers for processing. Its core lies in achieving a closed-loop emergence of intelligence in perception, regulation, and decision-making through the nested coupling of multilayered bio-inspired mechanisms. This paradigm consists of three interoperable tiers: the spiking neural substrate layer, the neuromodulation control layer, and the causal inference decision layer.

4.2.1. The Spiking Neural Substrate Layer

The spiking neural substrate layer serves as the information–physical interface, using spatio-temporal spike coding mechanisms to reconstruct the perception process. Its core operating logic is based on the synergy of Spike-Timing Dependent Plasticity (STDP) and event-driven computation.
For raw sensor inputs (visual, auditory, tactile, etc.), the layer implements three-stage conditioning:
1.
Dynamic normalization: Continuous signals s ( t ) are compressed via [122]
$$\tilde{s}(t) = \frac{s(t) - \mu(t)}{k\,\sigma(t) + \epsilon},$$
where $\mu(t)$ and $\sigma(t)$ are the moving mean and standard deviation computed over biologically plausible windows ($\tau$ = 50–200 ms).
2.
Temporal sharpening: Leaky differentiator [123]
$$y(t) = 0.95\, y(t-1) + 0.05\,\left(\tilde{s}(t) - \tilde{s}(t-1)\right)$$
extracts transient features (a code sketch of these first two conditioning stages follows this list).
3.
Cross-modal alignment: Phase-locked loops compensate inter-sensor delays (e.g., 40 ms audio-visual latency).
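The sketch below implements stages 1 and 2; the leaky-differentiator coefficients follow the equation above, while the window length, the scaling constant $k$, and the synthetic input signal are illustrative choices.

```python
# A sketch of the first two conditioning stages described above: running
# normalization of a raw sensor stream followed by a leaky differentiator.
# Window length, k, and the synthetic signal are illustrative placeholders.
import numpy as np

def condition_signal(s: np.ndarray, window: int = 100, k: float = 3.0, eps: float = 1e-6):
    s_tilde = np.zeros_like(s)
    for t in range(len(s)):                           # stage 1: dynamic normalization
        w = s[max(0, t - window):t + 1]
        s_tilde[t] = (s[t] - w.mean()) / (k * w.std() + eps)
    y = np.zeros_like(s_tilde)
    for t in range(1, len(s_tilde)):                  # stage 2: leaky differentiator
        y[t] = 0.95 * y[t - 1] + 0.05 * (s_tilde[t] - s_tilde[t - 1])
    return s_tilde, y

raw = np.sin(np.linspace(0, 20, 1000)) + 0.1 * np.random.randn(1000)
s_tilde, transients = condition_signal(raw)
```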
STDP (Spike-Timing-Dependent Plasticity) is a biological learning rule in neuroscience that describes synaptic plasticity regulation [124]. Its mathematical essence reveals the quantitative relationship between the timing of signal transmission between neurons and changes in synaptic weights. This process operates on conditioned spike trains generated through adaptive threshold crossing:
$$T = \left\{ t_i \;\middle|\; \int_{t_i - \delta t_i}^{t_i} \tilde{s}(\tau)\, d\tau \ge \theta_0 + \gamma \sum_{t_j < t_i} e^{-(t_i - t_j)/\tau_{\text{adapt}}} \right\},$$
where the dynamic threshold $\theta(t)$ self-regulates spike density to 15–25 spikes/s per neuron [124].
When the spike timing of presynaptic neuron A, $t_{\text{pre}}$, precedes the spike timing of postsynaptic neuron B, $t_{\text{post}}$ ($\Delta t = t_{\text{post}} - t_{\text{pre}} > 0$), the synaptic weight $w$ undergoes long-term potentiation (LTP), with the potentiation magnitude decaying exponentially with $\Delta t$ [125]:
$$\Delta w = A_+ \cdot \exp(-\Delta t / \tau_+),$$
where $A_+$ is the potentiation coefficient, and $\tau_+$ is the time constant. Conversely, when $\Delta t < 0$, it triggers long-term depression (LTD):
$$\Delta w = -A_- \cdot \exp(\Delta t / \tau_-).$$
This mechanism breaks through the global synchronous update paradigm of error backpropagation in traditional artificial neural networks (ANNs) [126], enabling distributed learning based on local spatio-temporal information. Its differential equation form can be expressed as
$$\frac{dw}{dt} = \eta \cdot \left[ f_+(w)\,\Theta(\Delta t) - f_-(w)\,\Theta(-\Delta t) \right],$$
where $\eta$ is the learning rate, $\Theta$ is the Heaviside step function, and $f_+$ and $f_-$ are the potentiation and depression functions, respectively.
This biologically plausible encoding method transcends the static weight update paradigm of traditional artificial neural networks (ANNs), enabling the system to autonomously discover temporal-dependent patterns in input streams. In this process, the event-driven computation architecture performs selective attention functions by filtering spike clusters with information entropy below a preset threshold through an adaptive threshold mechanism, thereby achieving nonlinear compression of computational energy consumption while ensuring the completeness of feature extraction.
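The pairwise STDP rule can be sketched as follows; the coefficients, time constants, and weight bounds are illustrative, and practical simulators typically use trace-based approximations rather than explicit pair loops.

```python
# A sketch of the pairwise STDP rule above: each (pre, post) spike pair nudges
# the synaptic weight up (LTP) or down (LTD) depending on the sign of dt.
# Constants are illustrative placeholders.
import numpy as np

def stdp_update(w, t_pre, t_post, A_plus=0.01, A_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    dt = t_post - t_pre
    if dt > 0:                                    # pre before post: potentiation (LTP)
        w += A_plus * np.exp(-dt / tau_plus)
    elif dt < 0:                                  # post before pre: depression (LTD)
        w -= A_minus * np.exp(dt / tau_minus)
    return np.clip(w, 0.0, 1.0)                   # keep the weight bounded

w = 0.5
for t_pre, t_post in [(10.0, 15.0), (40.0, 32.0), (60.0, 61.0)]:   # spike-time pairs (ms)
    w = stdp_update(w, t_pre, t_post)
```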

4.2.2. The Neuromodulation Control Layer

The neuromodulation control layer implements the system-level dynamic optimization of resources through a virtual neurotransmitter network. The core inspiration for the virtual neurotransmitter network comes from the neuromodulatory system in biology; although this concept has not yet been formally defined, its implementation within a computational framework integrates cutting-edge theories from multiple disciplines. In biological brains, neurotransmitters such as dopamine, serotonin, and norepinephrine operate through volume transmission, a non-synaptic signaling method that affects broad brain regions. These neurotransmitters dynamically regulate the excitability of neural populations, synaptic plasticity, and network oscillatory patterns. Dopamine encodes reward prediction errors through its concentration gradients, influencing reinforcement learning processes by modulating the synaptic weight update rate in the basal ganglia [127]:
$$\Delta DA(t) = \alpha \left[ r(t) + \gamma V(s_{t+1}) - V(s_t) \right] + (1 - \alpha)\, DA(t - \Delta t),$$
where $\alpha \in [0.08, 0.12]$ is the temporal difference learning rate, and $\gamma$ is the discount factor.
Serotonin, through 5-HT receptors, regulates the cortico–limbic system circuits, balancing risk-taking and cautious tendencies in decision-making [128]:
$$\sigma_{5\text{-}HT}(x) = \frac{1}{1 + e^{-k(5\text{-}HT)\cdot x}}, \qquad k(5\text{-}HT) = k_0 + \beta \ln\left(1 + 5\text{-}HT/\xi\right),$$
where $k_0$ is the baseline curvature, and $\xi$ is the half-saturation constant.
Cortisol, as a stress hormone, alters the connectivity strength between the prefrontal cortex and the amygdala via glucocorticoid receptors, facilitating “fight-or-flight” mode switches [129]:
$$R_{\text{alloc}} = R_{\text{total}} \cdot \left(1 - e^{-\lambda \cdot CORT}\right) + R_{\text{basal}},$$
with stress response coefficient $\lambda \in [0.03, 0.15]$.
This layer constructs a multidimensional model of chemical signal transmission, abstracting the regulatory roles of biological neurotransmitters, such as dopamine, serotonin, and cortisol, into control vectors within a dynamic parameter space. Here, digital dopamine concentration represents system reward prediction error, driving long-term potentiation or inhibition of synaptic connection strengths; serotonin levels maintain network stability by modulating the curvature of activation functions to prevent oscillatory instabilities caused by resonance effects; cortisol acts as a stress response signal, triggering reconfiguration strategies for distributed computing resources. This regulatory mechanism, akin to an endocrine system, achieves Pareto improvements in computational resource allocation; when task complexity changes abruptly, the control system directs computational power to critical modules via neurotransmitter concentration gradients while performing power gating on idle resources according to von Neumann entropy criteria [130].
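The three regulatory signals can be sketched as simple control functions that follow the equations above; all constants are illustrative placeholders rather than calibrated values.

```python
# A toy sketch of the three modulatory signals above acting as control variables:
# a TD-style dopamine update, a serotonin-dependent activation curvature, and a
# cortisol-driven resource split. All constants are illustrative placeholders.
import numpy as np

def dopamine_update(DA, r, V_next, V_now, alpha=0.1, gamma=0.95):
    return alpha * (r + gamma * V_next - V_now) + (1 - alpha) * DA

def serotonin_activation(x, level, k0=1.0, beta=0.5, xi=1.0):
    k = k0 + beta * np.log(1 + level / xi)        # curvature grows with the 5-HT level
    return 1.0 / (1.0 + np.exp(-k * x))

def cortisol_allocation(R_total, cort, lam=0.1, R_basal=0.05):
    return R_total * (1 - np.exp(-lam * cort)) + R_basal

DA = dopamine_update(DA=0.2, r=1.0, V_next=0.8, V_now=0.5)
gain = serotonin_activation(x=0.3, level=2.0)
power = cortisol_allocation(R_total=1.0, cort=4.0)
```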

4.2.3. The Causal Inference Decision Layer

The causal inference decision layer constructs an explainable decision space based on a neuro-symbolic fusion architecture. This layer introduces Structural Causal Models (SCMs) [131] as formal tools, mapping abstract features transmitted from the previous layers into nodes of a causal graph and establishing directed acyclic dependencies between variables using do-calculus. During decision-making, the counterfactual reasoning engine conducts virtual intervention experiments: under the condition of preserving the observational data distribution, the system calculates (in parallel) the perturbation effects of multiple intervention actions on the posterior distribution of the causal graph. To overcome the computational complexity bottleneck of traditional Bayesian networks, this layer designs an approximate inference algorithm based on the characteristics of spiking networks, using the Poisson process properties of spike firing rates to simulate the probability propagation of intervention operations. Final decisions are optimized using the Shapley Value for multi-objective optimization [132], ensuring Nash equilibrium is reached under the constraint of causal effect traceability [133].
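As a toy illustration of intervention-based reasoning in a structural causal model, the following sketch estimates an average causal effect by severing a variable’s structural dependence under do(X = x); the structural equations, coefficients, and noise scales are invented for illustration and are not part of the CBI design.

```python
# A toy structural causal model sketch: Z -> X -> Y with a confounding path Z -> Y.
# An intervention do(X = x0) is implemented by severing X's dependence on Z.
import numpy as np

rng = np.random.default_rng(0)

def sample_Y(n, do_x=None):
    Z = rng.normal(size=n)
    X = 0.8 * Z + 0.3 * rng.normal(size=n) if do_x is None else np.full(n, do_x)
    Y = 1.5 * X - 0.7 * Z + 0.3 * rng.normal(size=n)
    return Y

# Average causal effect of setting X to 1 versus 0, estimated by virtual intervention.
ace = sample_Y(100_000, do_x=1.0).mean() - sample_Y(100_000, do_x=0.0).mean()   # ~1.5
```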
The collaborative mechanism of the three-layer architecture is embodied in the bidirectional recursive coupling of information flow. The spatio-temporal feature vectors output by the substrate layer serve both as the observation basis for neurotransmitter concentration updates in the control layer and provide raw evidence for causal graph construction in the decision layer; the resource allocation strategies issued by the control layer modulate both the temporal precision of pulse encoding in the substrate layer and the computational depth of the reasoning engine in the decision layer; and the assessment of causal importance feedback from the decision layer guides the adaptive adjustment of the STDP learning rate in the substrate layer. This recursive interaction forms a dynamic system with global Lyapunov stability, enabling CBI to sustain the convergence of intelligent processes in non-stationary environments.
This paradigm theoretically overcomes the fundamental limitations of existing AI systems: the pulse coding mechanism of the substrate layer addresses the excessive reliance of ANNs on dense data, the neuromodulation model of the control layer transcends the singular reward-driven paradigm of reinforcement learning, and the causal formalism tool of the decision layer bridges the explanatory gap between deep learning and symbolic logic. Their synergistic effect lays the theoretical foundation for constructing biologically inspired intelligent systems with autonomous evolution capabilities.

4.3. The Future Trends of AI

Under the framework of anthropomorphic computing, AI is undergoing a paradigm shift from “functional simulation” to “cognitive symbiosis”. The next decade may witness evolutionary trajectories along these axes:
  • A. Neurobiologically inspired dynamic cognitive architecture. The asynchronous, event-driven mechanism of spiking neural networks (SNNs) will reshape the spatio-temporal tuning capabilities of intelligent systems. Drawing inspiration from the multi-timescale learning characteristics of Purkinje cells in the cerebellum [134], next-generation algorithms can achieve multitask coordination in dynamic environments through hierarchical plasticity rules, such as metaplasticity regulation and dendritic domain computation. The Liquid Time-Constant Network (LTCN) recently developed at MIT [135] has demonstrated that differential equations mimicking the dynamics of neuronal ion channels enable autonomous driving systems to exhibit human-like, dopamine-regulated switching of emergency decision priorities in sudden traffic situations. Such architectures will advance AI from static knowledge reasoning toward contextualized cognition sensitive to biological rhythms, triggering breakthrough applications in brain–machine interfaces and embodied robotics.
In 2020, Humaidi et al. demonstrated the implementation of a spiking neural network (SNN) based on the Izhikevich model on an FPGA platform [136]. This system was successfully applied to English character recognition, showcasing the advantages of SNNs in hardware acceleration and real-time processing. A related study compared spiking and traditional neural networks for character recognition on the same FPGA platform [137], achieving an English letter recognition system with 58% lower energy consumption and 3.7 times higher recognition speed than a traditional convolutional neural network (CNN), further validating the potential of SNNs for low-power, high-efficiency intelligent systems. The breakthrough of this work lies at the intersection of two future trends: (1) the event-driven nature successfully decoupled synaptic plasticity from gate propagation delays (on the order of 20 ns), enabling the FPGA platform to directly map the spike-timing-dependent plasticity (STDP) mechanism observed in the cerebellar granular layer; and (2) the programmability of biophysical parameters provides micro-level interfaces for dynamic cognitive architectures, for example, by adjusting the quantization precision of the recovery variable in the Izhikevich model within the FPGA lookup table.
In the context of the future development trends of AI, spiking neural networks—advanced models simulating the spiking mechanisms of biological neurons—will not only drive the evolution of AI from static data processing to dynamic event-driven processing but also lay the foundation for more efficient, flexible, and adaptive intelligent systems. SNNs demonstrate significant potential in cutting-edge domains such as edge computing, brain–computer interfaces, and embodied intelligence.
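For readers who want a concrete reference point for the Izhikevich dynamics discussed above, the following minimal Python sketch simulates a single Izhikevich neuron with the standard two-variable equations and regular-spiking parameters. The parameter values, time step, and input current are the common textbook defaults, not the quantized FPGA configuration used in the cited work.

```python
import numpy as np

def izhikevich(I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5, t_max=200.0):
    """Simulate one Izhikevich neuron (regular-spiking defaults).
    v: membrane potential (mV), u: recovery variable, I: input current."""
    steps = int(t_max / dt)
    v, u = c, b * c
    spikes, trace = [], np.empty(steps)
    for k in range(steps):
        # Euler integration of the standard Izhikevich equations
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                 # spike: reset membrane, bump recovery variable
            spikes.append(k * dt)
            v, u = c, u + d
        trace[k] = v
    return trace, spikes

if __name__ == "__main__":
    _, spike_times = izhikevich()
    print(f"{len(spike_times)} spikes in 200 ms; first few: {spike_times[:5]}")
```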
  • B. Perception and interaction: Multimodal integration technology drives the future evolution of precise intelligence. Looking ahead, the perception and interaction aspects of artificial intelligence will shift from single-modal processing to deep integration and contextual awareness, eventually achieving comprehensive intelligence with human-like sensory capabilities. The evolution in this domain will deepen not only within traditional sensory systems such as vision and hearing but also extend to non-verbal interaction modes like touch and smell. By emulating the synergistic mechanisms of biological sensory systems, such integration will yield perception and interaction architectures with situational adaptability. Multimodal integration perception technology will significantly enhance intelligent systems’ ability to interpret complex environments and interact socially, paving the way for a new era of cross-sensory perception.
Multimodal perception has created a new computational paradigm that enables machines to simultaneously process multi-source data such as vision, hearing, and touch, thereby achieving more advanced semantic understanding and interaction capabilities [138]. Future intelligent systems will adopt integration patterns similar to those between human brain regions (e.g., the feedback mechanisms between the visual and auditory cortices), employing cross-modal attention models to dynamically allocate perception weights; a minimal sketch of such weighting is given at the end of this item. For instance, in driving scenarios, machines can not only recognize red lights through vision but also monitor siren directions via environmental audio and detect driver fatigue using in-vehicle tactile sensors, thereby making optimal decisions in complex situations.
The extension of speech recognition to emotional interaction is a major breakthrough in perception and interaction. Based on multimodal plugins, future language models will utilize emotion perception technology to deeply interpret speech features such as tone, rhythm, and frequency fluctuations. For example, the emotion-corrected speech interaction system proposed by Google Brain Lab demonstrates that by incorporating sentiment computing models, machines can adjust interaction strategies when users express sadness, offering customized positive responses [98]. In overcoming language barriers, AI will integrate real-time contextual inference technology, enabling systems to generate contextually adaptive interactions through speech analysis and environmental data, such as user facial expressions and the objects users point to.
In the field of tactile perception, technological breakthroughs are paving the way for new applications in high-precision physical operations, particularly in medical surgery assistance and industrial precision tasks. The development of force feedback systems has led to the application of flexible electronic sensors in surgical robots in recent years. For example, Gu et al. developed soft tactile sensors that simulate the sensory characteristics of human skin, providing precise pressure feedback during minimally invasive surgeries and significantly reducing the risk of damage to sensitive patient tissues [139]. This advancement not only enhances the safety and efficiency of robotic operations but also provides a basis for real-time decision-making in complex physiological environments. In the industrial sector, innovations in tactile algorithms are similarly driving precision in robotic operations. Nadon et al. demonstrated how tactile sensors enhanced by neural networks address challenges in processing soft materials, such as optimizing real-time force sensing and feedback to successfully solve issues like miscuts during textile sorting [140]. These technologies allow robots to achieve greater accuracy and adaptability when handling non-fixed-form materials. Looking ahead, the integration of tactile technology with biologically inspired mechanisms promises revolutionary changes [141]. Through the interconnected findings of these studies, we can observe the continuous evolution of tactile perception in artificial intelligence, which transcends traditional sensory boundaries and integrates multidisciplinary technologies into biologically inspired future systems. This development not only enhances the interactive capabilities of robots but also opens new possibilities for achieving truly intelligent human–machine collaboration.
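The cross-modal attention weighting mentioned earlier in this item can be illustrated with a small sketch: modality embeddings are scored against a shared context vector, the scores are converted to attention weights, and the weighted sum forms the fused representation. The embedding dimensions, the scaled dot-product scoring, and the random feature vectors are illustrative assumptions rather than any specific published fusion model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_modal_fusion(modal_features, context):
    """Toy cross-modal attention: score each modality embedding against a context
    vector, convert scores to weights, and return the weighted fusion."""
    names = list(modal_features)
    feats = np.stack([modal_features[n] for n in names])   # shape (M, D)
    scores = feats @ context / np.sqrt(len(context))        # scaled dot-product scores
    weights = softmax(scores)                                # per-modality attention weights
    fused = weights @ feats                                  # fused representation, shape (D,)
    return dict(zip(names, weights)), fused

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    modalities = {"vision": rng.normal(size=8),
                  "audio": rng.normal(size=8),
                  "touch": rng.normal(size=8)}
    context = rng.normal(size=8)        # e.g., an encoding of the current driving context
    weights, fused = cross_modal_fusion(modalities, context)
    print(weights)                       # dynamically allocated perception weights
```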
  • C. The future development and evolution of large language models. When discussing the future development of artificial intelligence, it is hard to overlook the evolution of the currently popular large language models. Large language models (LLMs) have attracted substantial interest across both academic and industrial domains [142]. As demonstrated by existing work [143], the strong performance of LLMs has raised hopes that they could serve as a form of artificial general intelligence (AGI) in this era. LLMs can tackle diverse tasks, in contrast with prior models confined to specific tasks. Owing to their strong performance across applications ranging from general natural language tasks to domain-specific ones, LLMs are increasingly used by individuals with critical information needs, such as students or patients [120].
However, large language models still face issues in several respects. First, LLMs show limited proficiency in discerning semantic similarity between events [144] and perform poorly on basic phrase-level semantic benchmarks [145]. Second, LLMs have limited abstract reasoning abilities and are prone to confusion or errors in complex contexts [146]. Third, LLMs may exhibit social biases and toxicity [147,148] during generation, resulting in biased outputs.
Addressing these issues will inject new momentum into the development of large language models. Overcoming these limitations will not only significantly enhance the models’ performance in language understanding, reasoning, and handling complex tasks but also broaden their application scope. Additionally, mitigating social bias and improving content quality will make model outputs more aligned with ethical standards and practical demands. These advancements will further strengthen the pivotal role of LLMs in the field of artificial intelligence, laying a solid foundation for continued progress and innovation.

4.4. The Potential Application of CBI

Cyber Brain Intelligence (CBI), through the triple coupling of spiking neural coding, neuromodulatory regulation, and causal formal reasoning, constructs an intelligent architecture with biological-level dynamic adaptability and cognitive interpretability. Although the engineering implementation of CBI still faces frontier challenges such as neuromorphic hardware, cross-scale regulatory algorithms, and causal computability, its theoretical framework has already provided a new mathematical language for understanding the emergence of intelligence in complex systems. Below are some potential application scenarios of CBI.
  • A. Small-scale digital twin brain. The digital twin brain refers to the creation of a virtual model corresponding to the real brain through advanced digital technologies and algorithms in order to simulate and analyze the brain’s structure and functions [149,150]. This concept originates from digital twin technology, which was initially applied in the industrial and manufacturing fields and has gradually expanded to multiple fields, such as medical science and neuroscience. Through the digital twin brain, researchers can gain a deeper understanding of the complex mechanisms of the brain, explore neural activities in both normal and abnormal states, and thereby contribute to early disease diagnosis and the formulation of treatment plans. The digital twin brain typically integrates various technologies such as artificial intelligence, big data analysis, computational models, and bioinformatics, enabling the real-time processing and analysis of vast amounts of biological data from the human brain. This not only aids researchers in theoretically exploring brain functions but also provides clinicians with personalized medical solutions to improve patient treatment outcomes. Additionally, the digital twin brain may play a significant role in fields such as education, cognitive science, and human–computer interaction, driving the development of intelligent systems.
Cyber Brain Intelligence (CBI) possesses brain-like dynamic modeling capabilities, enabling it to serve as a micro digital twin brain embedded within intelligent systems by constructing a bidirectional, closed-loop cognitive mapping between physical and digital spaces. Through the synergistic optimization of neurotransmitter diffusion models and spike-timing coding, CBI can synchronously reproduce the multiscale state transitions of biological neural systems with millisecond-level latency: at the molecular scale, virtual dopamine concentration gradients drive synaptic plasticity adjustments (consistent with Fick’s law of diffusion); at the functional-module scale, θ–γ oscillation phase coupling achieves dynamic buffering of working memory (based on the Kuramoto oscillator equations [151]); and at the behavioral decision-making level, a causal intervention engine simulates cognitive trajectory deviations in real time under environmental disturbances (using do-calculus counterfactual analysis). This cross-level bio-machine state mirroring enables the CBI twin not only to predict brain dysfunction (such as simulating the chaotic phase synchronization preceding epileptic seizures) but also to regulate the behavior of physical entities in reverse (such as correcting path-planning deviations in robots by adjusting the virtual affinity of NMDA receptors).
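Since the working-memory buffering above is attributed to Kuramoto-style phase coupling [151], a minimal simulation of the classical Kuramoto model may help fix ideas: mean-field coupled phase oscillators whose order parameter rises toward 1 as they synchronize. The oscillator count, natural-frequency distribution, and coupling strength are arbitrary illustrative choices, and the model is a generic abstraction rather than a θ–γ coupling model of the hippocampus or of CBI.

```python
import numpy as np

def kuramoto(n=16, coupling=1.5, dt=0.01, steps=3000, seed=0):
    """Simulate n Kuramoto phase oscillators and track the order parameter r(t),
    a standard measure of the degree of phase synchronization."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(1.0, 0.2, n)            # natural frequencies
    theta = rng.uniform(0, 2 * np.pi, n)        # initial phases
    r_hist = np.empty(steps)
    for t in range(steps):
        # mean-field coupling: each oscillator is pulled toward the others' phases
        diff = theta[None, :] - theta[:, None]
        theta += dt * (omega + (coupling / n) * np.sin(diff).sum(axis=1))
        r_hist[t] = np.abs(np.exp(1j * theta).mean())
    return r_hist

if __name__ == "__main__":
    r = kuramoto()
    print(f"order parameter: start={r[0]:.2f}, end={r[-1]:.2f}")  # rises toward 1 when coupled
```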
  • B. Super-adaptive smart city. In generic terms, a smart city is an urban environment that utilizes ICT and other related technologies to enhance the efficiency of regular city operations and the quality of services (QoS) provided to urban citizens. In formal terms, experts have defined smart cities from various aspects and perspectives [152]. A popular definition states that a smart city connects physical, social, business, and ICT infrastructure to uplift the intelligence of the city [153]. The primary goal of early smart cities was to enhance citizens’ quality of life (QoL) by narrowing the gap between demand and supply across various city functions. To accommodate these QoL demands, modern smart cities focus in particular on sustainable and efficient solutions for energy management, transportation, healthcare, governance, and more, in order to meet the pressing requirements of urbanization.
With CBI, the smart city system might be more precise and efficient. The pulse infrastructure layer of the CBI transforms the urban sensor network into a “digital peripheral nervous system”, capturing spatial and temporal fluctuations in traffic flow, energy consumption, and air quality with millisecond precision. The neuromodulatory control layer balances computational resources in real time through virtual neurotransmitter concentration gradients: during morning rush hours, it releases “digital dopamine” to enhance the computational power allocated to traffic signal optimization modules; when pollution exceeds safe levels, it triggers a “cortisol stress response” to activate emergency emission reduction strategies. The causal engine in the decision-making layer constructs dynamic causal graphs of complex urban systems; in the event of sudden power grid failures, the system simulates the impact of different repair routes on critical facilities such as hospitals and subways in counterfactual spaces, generating Pareto-optimal resilient recovery plans. This capability elevates urban management from “passive response” to “predictive self-healing”.
  • C. Autonomous evolution for Industry 4.0. The term Industry 4.0 collectively refers to a wide range of current concepts, including smart factories, cyber-physical systems, self-organization, adaptation to human needs, etc. [154]. The core of Industry 4.0 lies in the integration of the Internet of Things, cloud computing, big data analytics, artificial intelligence, and automation technologies to facilitate the transformation of traditional manufacturing into smart manufacturing. Within this framework, production systems can achieve autonomous coordination, monitoring, and optimization of the production process through embedded sensors, network connectivity, and data processing capabilities.
In smart factories, the spike-timing-dependent plasticity (STDP) mechanism of CBI equips production lines with biological-level learning capabilities. When robotic arms grasp new, irregularly shaped parts, spike-timing encoding automatically strengthens the associations between visual-recognition and motor-control neurons, enabling adaptation to shape variations within certain tolerances without retraining. The virtual neurotransmitter network trades off production safety against efficiency through a “5-hydroxytryptamine–adrenaline” balance: when the equipment wear rate exceeds a threshold, changes in neurotransmitter concentrations trigger a “conservative production mode”, dynamically reducing speed to avoid the risk of failure. The causal decision-making layer further breaks through the limitations of traditional digital twins by using intervention operators to simulate the cascading effects of black-swan events, such as raw-material price hikes and geopolitical conflicts, on the supply chain, providing robust optimization solutions for globally distributed manufacturing networks.
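To make the STDP mechanism invoked here concrete, the sketch below applies the standard pair-based STDP rule to the input weights of a single postsynaptic neuron: presynaptic spikes that precede postsynaptic spikes potentiate the synapse, while the reverse ordering depresses it. The amplitudes, time constants, and spike trains are conventional illustrative values, not parameters of a CBI production system.

```python
import numpy as np

def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP: dt = t_post - t_pre in ms. Pre-before-post potentiates,
    post-before-pre depresses; both windows decay exponentially."""
    return a_plus * np.exp(-dt / tau_plus) if dt > 0 else -a_minus * np.exp(dt / tau_minus)

def apply_stdp(weights, pre_spike_times, post_spike_times, w_min=0.0, w_max=1.0):
    """Update one postsynaptic neuron's input weights over all pre/post spike pairs."""
    w = np.array(weights, dtype=float)
    for i, pre_times in enumerate(pre_spike_times):
        for t_pre in pre_times:
            for t_post in post_spike_times:
                w[i] += stdp_delta(t_post - t_pre)
    return np.clip(w, w_min, w_max)

if __name__ == "__main__":
    weights = [0.5, 0.5]
    pre = [[10.0, 30.0],     # synapse 0: fires just before the postsynaptic spikes
           [25.0, 45.0]]     # synapse 1: fires just after them
    post = [12.0, 32.0]
    print(apply_stdp(weights, pre, post))   # synapse 0 strengthens, synapse 1 weakens
```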

5. Discussion

5.1. Limitations of the Anthropomorphic Computing Review

When reviewing the development of artificial intelligence technologies from the perspective of anthropomorphic computing, it becomes evident that the limitations at each stage have hindered further breakthroughs in AI simulating human intelligence. Early AI technologies, centered on symbolic reasoning, excelled at rule-based inference but struggled to adapt to dynamic and complex environments, lacking the flexibility and adaptability inherent in human intelligence. Machine learning mitigated some of these limitations through data-driven pattern discovery, yet its predictions were based on correlations rather than causal reasoning, and its heavy reliance on large-scale labeled datasets restricted its application scope. Deep learning achieved significant advancements in representation capabilities through multilayer neural networks, but its “black-box” nature led to poor interpretability, high energy consumption, and challenges in knowledge transfer, remaining far from the capabilities of biological intelligence. Large models (such as the GPT series) further advanced language understanding and generation abilities but were highly resource-dependent, lacking true causal reasoning and multimodal cognitive integration capabilities. Overall, the limitations of these technologies highlight the difficulty of current AI in emulating the core features of biological intelligence, such as dynamic learning, causal reasoning, and resource efficiency, posing significant challenges to the development of new anthropomorphic computing paradigms.
Despite the significant progress in artificial intelligence research in recent years, the literature and studies related to “Anthropomorphic Computing” remain relatively scarce. This has led to the lack of a unified framework and authoritative definition for anthropomorphic computing as a research direction in both academia and industry. Existing research tends to focus on the implementation of specific technologies (such as machine learning, deep learning, and large models), while systematic discussions on integrating these technologies with biological intelligence methodologies are relatively limited. This limitation means the concept of anthropomorphic computing is still in the exploratory stage, with varying interpretations of its scope and meaning among researchers and a lack of systematized research under a mature theoretical framework. This not only restricts public understanding of anthropomorphic computing but also creates challenges in its practical development and application, thus hindering the theoretical advancement and widespread adoption of the field.

5.2. Limitations of the CBI Structure

In analyzing existing AI technologies from the perspective of anthropomorphic computing and proposing Cyber Brain Intelligence (CBI), this study, despite its innovative exploration in the theoretical framework, still has certain shortcomings and limitations that need further consideration and resolution.
First, as a new computing paradigm, the complete implementation of CBI depends on immature neuromorphic hardware and joint chemical–electrical signal encoding technology. Currently available hardware supporting spiking neural networks, such as Loihi [155], can simulate certain neuron behaviors but faces technical bottlenecks in dynamic neurotransmitter concentration regulation, making it difficult to support the complex chemical regulation model of CBI. This hardware dependency limits the broad application and validation of CBI, and further hardware innovation or improvement may be required to fully realize the theoretical framework.
Second, although CBI theoretically possesses advantages in causal reasoning, accurately constructing large-scale causal graphs in practice remains a challenge. The lack of standardized causal knowledge bases makes the quality and credibility of causal reasoning dependent on the quality and structure of underlying data, adding uncertainty during the knowledge acquisition and integration phase. Additionally, the dynamic updating and maintenance costs of causal graphs are high, requiring further research on how to maintain the immediacy and accuracy of causal models in rapidly changing fields.
Furthermore, although CBI provides a computational approach closer to biological intelligence through spiking networks and neurotransmitter models, its performance on complex cognitive tasks still requires long-term empirical validation. In particular, whether CBI can surpass existing deep learning models in multimodal data processing, multitask learning, and efficient resource allocation and decision-making still requires more experimental evidence. This also concerns the design of evaluation metrics: the academic community currently lacks evaluation standards for causal reasoning ability and dynamic adaptability, so the framework proposed in this study has yet to be assessed comprehensively and objectively.

6. Conclusions

This paper reviews the historical development of various algorithms and technologies in artificial intelligence from the perspective of anthropomorphic computing, highlighting their efforts to emulate human thinking. Throughout different historical stages, various AI technologies have reflected humanity’s vision of enhancing machine intelligence to approach human intelligence. From early AI systems to deep learning and large language models, continuous method iterations have progressively increased accuracy, bringing us closer to this ideal. However, as discussed in Section 5, AI technologies still have limitations, and the concept of anthropomorphic computing requires a more authoritative definition. This paper proposes the Cyber Brain Intelligence (CBI) model as an attempt to address these limitations, with its core components being spiking neural networks, neurotransmitter networks, and causal reasoning models. While current technologies can simulate certain neuron behaviors, technical bottlenecks in dynamic neurotransmitter concentration regulation make it difficult to support the complex chemical regulation model of CBI. Additionally, the high costs of dynamically updating and maintaining causal graphs remain a major obstacle that CBI has yet to overcome. Future research can focus on enhancing the dynamic neurotransmitter regulation capabilities of the CBI model, developing lightweight causal reasoning architectures to reduce maintenance costs, and integrating neural spiking dynamics with multimodal bio-inspired mechanisms to build more comprehensive bionic intelligence systems. Finally, there is a need to further integrate foundational biological theories with computational models to promote the systematic theoretical construction and standardization of the “anthropomorphic computing” concept.

Author Contributions

Conceptualization, J.Z. and H.Z.; methodology, J.Z.; formal analysis, J.Z.; investigation, J.Z. and H.Z.; supervision, H.Z.; funding acquisition, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Humanity and Social Science Foundation of the Ministry of Education of China (21A13022003).

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Al Kuwaiti, A.; Nazer, K.; Al-Reedy, A.; Al-Shehri, S.; Al-Muhanna, A.; Subbarayalu, A.V.; Al Muhanna, D.; Al-Muhanna, F.A. A review of the role of artificial intelligence in healthcare. J. Pers. Med. 2023, 13, 951. [Google Scholar] [CrossRef]
  2. Chen, L.; Chen, P.; Lin, Z. Artificial intelligence in education: A review. IEEE Access 2020, 8, 75264–75278. [Google Scholar] [CrossRef]
  3. Peres, R.S.; Jia, X.; Lee, J.; Sun, K.; Colombo, A.W.; Barata, J. Industrial artificial intelligence in industry 4.0-systematic review, challenges and outlook. IEEE Access 2020, 8, 220121–220139. [Google Scholar] [CrossRef]
  4. Aslitdinova, M. How artificial intelligence helps us in our daily life. Int. J. Artif. Intell. 2025, 1, 538–542. [Google Scholar]
  5. Zhang, X.; Ma, Z.; Zheng, H.; Li, T.; Chen, K.; Wang, X.; Liu, C.; Xu, L.; Wu, X.; Lin, D.; et al. The combination of brain-computer interfaces and artificial intelligence: Applications and challenges. Ann. Transl. Med. 2020, 8, 712. [Google Scholar] [CrossRef] [PubMed]
  6. Cao, Z. A review of artificial intelligence for EEG-based brain- computer interfaces and applications. Brain Sci. Adv. 2020, 6, 162–170. [Google Scholar] [CrossRef]
  7. Mohanty, H. Trust: Anthropomorphic Algorithmic. In Proceedings of the International Conference on Distributed Computing and Internet Technology, Bhubaneswar, India, 10–13 January 2019; pp. 50–72. [Google Scholar]
  8. Watson, D. The rhetoric and reality of anthropomorphism in artificial intelligence. Minds Mach. 2019, 29, 417–440. [Google Scholar] [CrossRef]
  9. Merolla, P.A.; Arthur, J.V.; Alvarez-Icaza, R.; Cassidy, A.S.; Sawada, J.; Akopyan, F.; Jackson, B.L.; Imam, N.; Guo, C.; Nakamura, Y.; et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 2014, 345, 668–673. [Google Scholar] [CrossRef]
  10. Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Netw. 1997, 10, 1659–1671. [Google Scholar] [CrossRef]
  11. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Int. J. Surg. 2010, 8, 336–341. [Google Scholar] [CrossRef]
  12. Grzybowski, A.; Pawlikowska-Łagód, K.; Lambert, W.C. A history of artificial intelligence. Clin. Dermatol. 2024, 42, 221–229. [Google Scholar] [CrossRef] [PubMed]
  13. Haenlein, M.; Kaplan, A. A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. Calif. Manag. Rev. 2019, 61, 5–14. [Google Scholar] [CrossRef]
  14. Turing, A.M. Computing Machinery and Intelligence; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  15. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  16. Weik, M.H. The ENIAC story. Ordnance 1961, 45, 571–575. [Google Scholar]
  17. Zadeh, L.A. Fuzzy logic. In Granular, Fuzzy, and Soft Computing; Springer: Berlin/Heidelberg, Germany, 2023; pp. 19–49. [Google Scholar]
  18. Mittal, K.; Jain, A.; Vaisla, K.S.; Castillo, O.; Kacprzyk, J. A comprehensive review on type 2 fuzzy logic applications: Past, present and future. Eng. Appl. Artif. Intell. 2020, 95, 103916. [Google Scholar] [CrossRef]
  19. Dumitrescu, C.; Ciotirnae, P.; Vizitiu, C. Fuzzy logic for intelligent control system using soft computing applications. Sensors 2021, 21, 2617. [Google Scholar] [CrossRef]
  20. Expert System. 2021. Available online: https://ikcest-drr.data.ac.cn/tutorial/k2033 (accessed on 20 June 2025).
  21. Zadeh, L.A. Fuzzy logic. Computer 1988, 21, 83–93. [Google Scholar] [CrossRef]
  22. Back, T.; Hammel, U.; Schwefel, H.P. Evolutionary computation: Comments on the history and current state. IEEE Trans. Evol. Comput. 1997, 1, 3–17. [Google Scholar] [CrossRef]
  23. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992. [Google Scholar]
  24. Storn, R.; Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  25. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  26. Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  27. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  28. Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23. [Google Scholar] [CrossRef]
  29. Wiering, M.A.; Van Otterlo, M. Reinforcement learning. Adapt. Learn. Optim. 2012, 12, 729. [Google Scholar]
  30. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1. [Google Scholar]
  31. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  32. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  33. Shu, T.; Pan, Z.; Ding, Z.; Zu, Z. Resource scheduling optimization for industrial operating system using deep reinforcement learning and WOA algorithm. Expert Syst. Appl. 2024, 255, 124765. [Google Scholar] [CrossRef]
  34. Vezhnevets, A.S.; Osindero, S.; Schaul, T.; Heess, N.; Jaderberg, M.; Silver, D.; Kavukcuoglu, K. Feudal networks for hierarchical reinforcement learning. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3540–3549. [Google Scholar]
  35. Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In Proceedings of the International Conference on Computers and Games, Turin, Italy, 29–31 May 2006; pp. 72–83. [Google Scholar]
  36. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef]
  37. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870. [Google Scholar]
  38. Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75. [Google Scholar] [CrossRef]
  39. Panigrahi, S.; Nanda, A.; Swarnkar, T. A survey on transfer learning. In Proceedings of the Intelligent and Cloud Computing: Proceedings of ICICC 2019; Springer: Singapore, 2021; Volume 1, pp. 781–789. [Google Scholar]
  40. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  41. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
  42. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 1–40. [Google Scholar] [CrossRef]
  43. Huang, J.; Gretton, A.; Borgwardt, K.; Schölkopf, B.; Smola, A. Correcting sample selection bias by unlabeled data. In Proceedings of the Advances in Neural Information Processing Systems; Bradford Books: Denver, CO, USA, 2006; Volume 19. [Google Scholar]
  44. Mahesh, B. Machine learning algorithms—A review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386. [Google Scholar] [CrossRef]
  45. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Elsevier: Amsterdam, The Netherlands, 2014. [Google Scholar]
  46. Matzkevich, I.; Abramson, B. The topological fusion of Bayes nets. In Proceedings of the Uncertainty in Artificial Intelligence; Elsevier: Amsterdam, The Netherlands, 1992; pp. 191–198. [Google Scholar]
  47. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural network. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1613–1622. [Google Scholar]
  48. Wandell, B.A.; Dumoulin, S.O.; Brewer, A.A. Visual field maps in human cortex. Neuron 2007, 56, 366–383. [Google Scholar] [CrossRef]
  49. Riesenhuber, M.; Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 1999, 2, 1019–1025. [Google Scholar] [CrossRef] [PubMed]
  50. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  51. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
  52. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  53. Elngar, A.A.; Arafa, M.; Fathy, A.; Moustafa, B.; Mahmoud, O.; Shaban, M.; Fawzy, N. Image classification based on CNN: A survey. J. Cybersecur. Inf. Manag. 2021, 6, 18–50. [Google Scholar] [CrossRef]
  54. Chen, Y. Convolutional Neural Network for Sentence Classification. Master’s Thesis, University of Waterloo, Waterloo, ON, Canada, 2015. [Google Scholar]
  55. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  56. Hinton, G.E.; Sabour, S.; Frosst, N. Matrix capsules with EM routing. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  57. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25, pp. 1097–1105. [Google Scholar]
  58. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part I 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833. [Google Scholar]
  59. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  60. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  61. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  62. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  63. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef]
  64. Wang, Y.; Rao, Y.; Huang, C.; Yang, Y.; Huang, Y.; He, Q. Using the improved mask R-CNN and softer-NMS for target segmentation of remote sensing image. In Proceedings of the 2021 4th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Yibin, China, 20–22 August 2021; pp. 91–95. [Google Scholar]
  65. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  66. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  67. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  68. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  69. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.c. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  70. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]
  71. Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338. [Google Scholar] [CrossRef]
  72. Tayeh, T.; Aburakhia, S.; Myers, R.; Shami, A. An attention-based ConvLSTM autoencoder with dynamic thresholding for unsupervised anomaly detection in multivariate time series. Mach. Learn. Knowl. Extr. 2022, 4, 350–370. [Google Scholar] [CrossRef]
  73. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  74. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef]
  75. Neil, D.; Pfeiffer, M.; Liu, S.C. Phased lstm: Accelerating recurrent network training for long or event-based sequences. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
  76. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150. [Google Scholar]
  77. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  78. Gulati, A.; Qin, J.; Chiu, C.C.; Parmar, N.; Zhang, Y.; Yu, J.; Han, W.; Wang, S.; Zhang, Z.; Wu, Y.; et al. Conformer: Convolution-augmented transformer for speech recognition. arXiv 2020, arXiv:2005.08100. [Google Scholar]
  79. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  80. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186. [Google Scholar]
  81. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  82. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
  83. Kay, K.N.; Naselaris, T.; Prenger, R.J.; Gallant, J.L. Identifying natural images from human brain activity. Nature 2008, 452, 352–355. [Google Scholar] [CrossRef]
  84. Jaegle, A.; Borgeaud, S.; Alayrac, J.B.; Doersch, C.; Ionescu, C.; Ding, D.; Koppula, S.; Zoran, D.; Brock, A.; Shelhamer, E.; et al. Perceiver io: A general architecture for structured inputs & outputs. arXiv 2021, arXiv:2107.14795. [Google Scholar]
  85. Zaheer, M.; Guruganesh, G.; Dubey, K.A.; Ainslie, J.; Alberti, C.; Ontanon, S.; Pham, P.; Ravula, A.; Wang, Q.; Yang, L.; et al. Big bird: Transformers for longer sequences. Adv. Neural Inf. Process. Syst. 2020, 33, 17283–17297. [Google Scholar]
  86. Khaligh-Razavi, S.M.; Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 2014, 10, e1003915. [Google Scholar] [CrossRef] [PubMed]
  87. Rae, J.W.; Potapenko, A.; Jayakumar, S.M.; Lillicrap, T.P. Compressive transformers for long-range sequence modelling. arXiv 2019, arXiv:1911.05507. [Google Scholar]
  88. Dao, T.; Fu, D.; Ermon, S.; Rudra, A.; Ré, C. Flashattention: Fast and memory-efficient exact attention with io-awareness. Adv. Neural Inf. Process. Syst. 2022, 35, 16344–16359. [Google Scholar]
  89. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
  90. Alayrac, J.B.; Donahue, J.; Luc, P.; Miech, A.; Barr, I.; Hasson, Y.; Lenc, K.; Mensch, A.; Millican, K.; Reynolds, M.; et al. Flamingo: A visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 2022, 35, 23716–23736. [Google Scholar]
  91. Brock, A.; De, S.; Smith, S.L.; Simonyan, K. High-performance large-scale image recognition without normalization. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 1059–1071. [Google Scholar]
  92. Chen, X.; Wang, X.; Changpinyo, S.; Piergiovanni, A.; Padlewski, P.; Salz, D.; Goodman, S.; Grycner, A.; Mustafa, B.; Beyer, L.; et al. Pali: A jointly-scaled multilingual language-image model. arXiv 2022, arXiv:2209.06794. [Google Scholar]
  93. Aston-Jones, G.; Cohen, J.D. An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annu. Rev. Neurosci. 2005, 28, 403–450. [Google Scholar] [CrossRef]
  94. Pinker, S. The Language Instinct: How the Mind Creates Language; Penguin: London, UK, 2003. [Google Scholar]
  95. Hauser, M.D.; Chomsky, N.; Fitch, W.T. The faculty of language: What is it, who has it, and how did it evolve? Science 2002, 298, 1569–1579. [Google Scholar] [CrossRef]
  96. Dwivedi, Y.K.; Kshetri, N.; Hughes, L.; Slade, E.L.; Jeyaraj, A.; Kar, A.K.; Baabdullah, A.M.; Koohang, A.; Raghavan, V.; Ahuja, M.; et al. Opinion Paper:“So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inf. Manag. 2023, 71, 102642. [Google Scholar] [CrossRef]
  97. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  98. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  99. Zhang, S.; Roller, S.; Goyal, N.; Artetxe, M.; Chen, M.; Chen, S.; Dewan, C.; Diab, M.; Li, X.; Lin, X.V.; et al. Opt: Open pre-trained transformer language models. arXiv 2022, arXiv:2205.01068. [Google Scholar]
  100. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. Palm: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113. [Google Scholar]
  101. Thoppilan, R.; De Freitas, D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.T.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al. Lamda: Language models for dialog applications. arXiv 2022, arXiv:2201.08239. [Google Scholar]
  102. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  103. You, Y.; Gitman, I.; Ginsburg, B. Large batch training of convolutional networks. arXiv 2017, arXiv:1708.03888. [Google Scholar]
  104. Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. Palm 2 technical report. arXiv 2023, arXiv:2305.10403. [Google Scholar]
  105. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar]
  106. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
  107. Christiano, P.F.; Leike, J.; Brown, T.; Martic, M.; Legg, S.; Amodei, D. Deep reinforcement learning from human preferences. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  108. Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv 2017, arXiv:1701.06538. [Google Scholar]
  109. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
  110. Driess, D.; Xia, F.; Sajjadi, M.S.; Lynch, C.; Chowdhery, A.; Wahid, A.; Tompson, J.; Vuong, Q.; Yu, T.; Huang, W.; et al. Palm-e: An embodied multimodal language model. arXiv 2023, arXiv:2303.03378. [Google Scholar]
  111. Schramowski, P.; Turan, C.; Andersen, N.; Rothkopf, C.A.; Kersting, K. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nat. Mach. Intell. 2022, 4, 258–268. [Google Scholar] [CrossRef]
  112. Hadi, M.U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M.B.; Akhtar, N.; Wu, J.; Mirjalili, S. A survey on large language models: Applications, challenges, limitations, and practical usage. Authorea Prepr. 2023, 3. [Google Scholar]
  113. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  114. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
  115. Zheng, Y.; Li, S.; Yan, R.; Tang, H.; Tan, K.C. Sparse temporal encoding of visual features for robust object recognition by spiking neurons. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5823–5833. [Google Scholar] [CrossRef]
  116. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
  117. Tay, Y.; Dehghani, M.; Bahri, D.; Metzler, D. Efficient transformers: A survey. ACM Comput. Surv. 2022, 55, 1–28. [Google Scholar] [CrossRef]
  118. Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30. [Google Scholar]
  119. Guizzo, E. By leaps and bounds: An exclusive look at how boston dynamics is redefining robot agility. IEEE Spectr. 2019, 56, 34–39. [Google Scholar] [CrossRef]
  120. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45. [Google Scholar] [CrossRef]
  121. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  122. Carandini, M.; Heeger, D.J. Normalization as a canonical neural computation. Nat. Rev. Neurosci. 2012, 13, 51–62. [Google Scholar] [CrossRef] [PubMed]
  123. Adelson, E.H.; Bergen, J.R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 1985, 2, 284–299. [Google Scholar] [CrossRef]
  124. Dan, Y.; Poo, M.m. Spike timing-dependent plasticity of neural circuits. Neuron 2004, 44, 23–30. [Google Scholar] [CrossRef]
  125. Gerstner, W.; Kistler, W.M. Spiking Neuron Models: Single Neurons, Populations, Plasticity; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
  126. Wu, Y.C.; Feng, J.W. Development and application of artificial neural network. Wirel. Pers. Commun. 2018, 102, 1645–1656. [Google Scholar] [CrossRef]
  127. Schultz, W. Dopamine neurons and their role in reward mechanisms. Curr. Opin. Neurobiol. 1997, 7, 191–197. [Google Scholar] [CrossRef]
  128. Cools, R.; Roberts, A.C.; Robbins, T.W. Serotoninergic regulation of emotional and behavioural control processes. Trends Cogn. Sci. 2008, 12, 31–40. [Google Scholar] [CrossRef] [PubMed]
  129. McEwen, B.S. Physiology and neurobiology of stress and adaptation: Central role of the brain. Physiol. Rev. 2007, 87, 873–904. [Google Scholar] [CrossRef]
  130. Von Neumann, J. Mathematische Grundlagen der Quantenmechanik; Springer: Berlin/Heidelberg, Germany, 2013; Volume 38. [Google Scholar]
  131. Pearl, J. Causal inference. Causal. Object. Assess. 2010, 39–58. [Google Scholar]
  132. Shapley, L.S. A value for n-person games. In Contributions to the Theory of Games II; Kuhn, H.W., Tucker, A.W., Eds.; Princeton University Press Princeton: Princeton, NJ, USA, 1953; pp. 307–317. [Google Scholar]
  133. Nash Jr, J.F. Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 1950, 36, 48–49. [Google Scholar] [CrossRef]
  134. Tolu, S.; Capolei, M.C.; Vannucci, L.; Laschi, C.; Falotico, E.; Hernández, M.V. A cerebellum-inspired learning approach for adaptive and anticipatory control. Int. J. Neural Syst. 2020, 30, 1950028. [Google Scholar] [CrossRef] [PubMed]
  135. Hasani, R.; Lechner, M.; Amini, A.; Rus, D.; Grosu, R. Liquid time-constant networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 7657–7666. [Google Scholar]
  136. Humaidi, A.J.; Kadhim, T.M.; Hasan, S.; Ibraheem, I.K.; Azar, A.T. A generic izhikevich-modelled FPGA-realized architecture: A case study of printed english letter recognition. In Proceedings of the 2020 24th International Conference on System Theory, Control and Computing (ICSTCC), Sinaia, Romania, 8–10 October 2020; pp. 825–830. [Google Scholar]
  137. Humaidi, A.J.; Kadhim, T.M. Spiking versus traditional neural networks for character recognition on FPGA platform. J. Telecommun. Electron. Comput. Eng. (JTEC) 2018, 10, 109–115. [Google Scholar]
  138. Dritsas, E.; Trigka, M.; Troussas, C.; Mylonas, P. Multimodal Interaction, Interfaces, and Communication: A Survey. Multimodal Technol. Interact. 2025, 9, 6. [Google Scholar] [CrossRef]
  139. Gu, G.; Zhang, N.; Xu, H.; Lin, S.; Yu, Y.; Chai, G.; Ge, L.; Yang, H.; Shao, Q.; Sheng, X.; et al. A soft neuroprosthetic hand providing simultaneous myoelectric control and tactile feedback. Nat. Biomed. Eng. 2023, 7, 589–598. [Google Scholar] [CrossRef]
  140. Nadon, F.; Valencia, A.J.; Payeur, P. Multi-modal sensing and robotic manipulation of non-rigid objects: A survey. Robotics 2018, 7, 74. [Google Scholar] [CrossRef]
  141. Yuanyang, W.; Mahyuddin, M.N. Grasping Deformable Objects in Industry Application: A Comprehensive Review of Robotic Manipulation. IEEE Access 2025, 13, 33403–33423. [Google Scholar] [CrossRef]
  142. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258. [Google Scholar]
  143. Bubeck, S.; Chadrasekaran, V.; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y.T.; Li, Y.; Lundberg, S.; et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv 2023, arXiv:2303.12712. [Google Scholar]
  144. Tao, Z.; Jin, Z.; Bai, X.; Zhao, H.; Feng, Y.; Li, J.; Hu, W. Eveval: A comprehensive evaluation of event semantics for large language models. arXiv 2023, arXiv:2305.15268. [Google Scholar]
  145. Riccardi, N.; Yang, X.; Desai, R.H. The Two Word Test as a semantic benchmark for large language models. Sci. Rep. 2024, 14, 21593. [Google Scholar] [CrossRef]
  146. Ott, S.; Hebenstreit, K.; Liévin, V.; Hother, C.E.; Moradi, M.; Mayrhauser, M.; Praas, R.; Winther, O.; Samwald, M. ThoughtSource: A central hub for large language model reasoning data. Sci. Data 2023, 10, 528. [Google Scholar] [CrossRef]
  147. Gehman, S.; Gururangan, S.; Sap, M.; Choi, Y.; Smith, N.A. Realtoxicityprompts: Evaluating neural toxic degeneration in language models. arXiv 2020, arXiv:2009.11462. [Google Scholar]
  148. Dhamala, J.; Sun, T.; Kumar, V.; Krishna, S.; Pruksachatkun, Y.; Chang, K.W.; Gupta, R. Bold: Dataset and metrics for measuring biases in open-ended language generation. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Online, 3–10 March 2021; pp. 862–872. [Google Scholar]
  149. Xiong, H.; Chu, C.; Fan, L.; Song, M.; Zhang, J.; Ma, Y.; Zheng, R.; Zhang, J.; Yang, Z.; Jiang, T. The Digital Twin Brain: A Bridge between Biological and Artificial Intelligence. Intell. Comput. 2023, 2, 0055. [Google Scholar] [CrossRef]
  150. Liu, M.; Fang, S.; Dong, H.; Xu, C. Review of digital twin about concepts, technologies, and industrial applications. J. Manuf. Syst. 2021, 58, 346–361. [Google Scholar] [CrossRef]
  151. Kuramoto, Y. Chemical Turbulence; Springer: Berlin/Heidelberg, Germany, 1984. [Google Scholar]
  152. Silva, B.N.; Khan, M.; Han, K. Towards sustainable smart cities: A review of trends, architectures, components, and open challenges in smart cities. Sustain. Cities Soc. 2018, 38, 697–713. [Google Scholar] [CrossRef]
  153. Harrison, C.; Eckman, B.; Hamilton, R.; Hartswick, P.; Kalagnanam, J.; Paraszczak, J.; Williams, P. Foundations for smarter cities. IBM J. Res. Dev. 2010, 54, 1–16. [Google Scholar] [CrossRef]
  154. Lasi, H.; Fettke, P.; Kemper, H.G.; Feld, T.; Hoffmann, M. Industry 4.0. Bus. Inf. Syst. Eng. 2014, 6, 239–242. [Google Scholar] [CrossRef]
  155. Davies, M.; Wild, A.; Orchard, G.; Sandamirskaya, Y.; Guerra, G.A.F.; Joshi, P.; Plank, P.; Risbud, S.R. Advancing neuromorphic computing with loihi: A survey of results and outlook. Proc. IEEE 2021, 109, 911–934. [Google Scholar] [CrossRef]
Figure 1. The structure of this paper.
Figure 2. The PRISMA-based article selection diagram.
Figure 3. Number of relevant publications collected per year from 2004 to 2025 (the horizontal axis shows the publication year of the cited literature; the numbers above the bars indicate the citation count for each year).
Figure 4. Proportion of the selected literature categorized by AI technology.
Figure 5. The timeline of artificial intelligence history related to anthropomorphic computing.
Figure 6. An example of an expert system, including a complete reasoning process (image adapted from [20]).
Figure 7. The contrast between a decision tree and human thinking when guessing a fruit.
Figure 8. The comparison and similarity between humans and computers in transfer learning. (a) Intuitive examples of transfer learning from the human perspective. (b) Transfer learning from a computer science perspective when carrying out two similar tasks.
Figure 9. The contrast between a CNN and the way humans receive visual information. The primary visual cortex (V1) extracts low-level local features, such as edges and orientations, while higher-level visual areas (V4) progressively integrate information to form representations of shapes, textures, and even semantic content.
Figure 10. An example diagram of multimodal style generation using CLIP. Both text and image modalities are used as inputs.
Figure 11. Detailed overview of the Cyber Brain Intelligence system architecture: a layered design based on a spiking neural substrate, a neuromodulation control layer, and a causal inference decision layer.
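To make the joint text-image input shown in Figure 10 concrete, the snippet below is a minimal sketch of CLIP's shared embedding space: it scores how well one image matches candidate style prompts, which is the building block that CLIP-guided style generation relies on, not the generation pipeline itself. It assumes the Hugging Face transformers and Pillow packages are installed; the checkpoint name and image path are illustrative placeholders.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP checkpoint (the checkpoint name is an illustrative choice).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# "example.jpg" is a placeholder path; any RGB image works.
image = Image.open("example.jpg").convert("RGB")
prompts = ["a painting in impressionist style", "a painting in cyberpunk style"]

# Encode both modalities into the shared embedding space and compare them.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into preferences.
probs = outputs.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```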
Table 1. Anthropomorphic computing perspectives in deep learning architectures.

Categories | Strengths | Limitations | Anthropomorphic Computing Insights
Perceptual Layer Abstraction (e.g., CNN, ResNet) | Automatic feature hierarchy; spatial invariance; parameter efficiency | Limited temporal awareness; context-agnostic; static processing | Retinotopic mapping; mimics ventral visual pathway; orientation-selective kernels
Temporal and Contextual Understanding (e.g., LSTM, RNN) | Sequential dependency; long-range correlation; adaptive forgetting | Vanishing gradients; quadratic complexity; catastrophic interference | Hippocampal replay; episodic memory consolidation; theta–gamma coupling simulation
Attention Mechanism (e.g., Transformer) | Dynamic saliency; interpretable focus; context-aware weighting | Over-smoothing; high memory cost; positional bias | Foveal-peripheral vision; top-down modulation; global workspace theory
Multimodal Association (e.g., CLIP, ViLBERT) | Cross-modal alignment; unified representation; knowledge transfer | Modality imbalance; fusion ambiguity; alignment cost | Synesthetic binding; superior colliculus inspiration; cross-modal plasticity
Table 2. Comparative analysis of several CNN architectures: Characteristics and performance metrics.

Characteristic | Classic CNN [53] | Capsule Network | Shallow CNN [54] | ResNet
Hierarchical Structure | Conv-Pooling stacks | Capsule layers | ≤3 conv layers | Residual blocks
Core Innovation | Local receptive fields | Dynamic routing | Depth reduction | Skip connections
Parameter Efficiency | $O(k^2 \cdot c_{\mathrm{in}} \cdot c_{\mathrm{out}})$ | $2.5\times$ CNN | $0.2\times$ CNN | $1.2\times$ CNN
Spatial Awareness | Translation invariance | Viewpoint | Position-sensitive | Deep context
Key Advantage | Feature extraction | Part–whole relationships | Fast inference | Gradient flow
Performance Metrics:
Parameters | $1.2 \times 10^7$ | $3.0 \times 10^7$ | $5.0 \times 10^5$ | $2.5 \times 10^7$
ImageNet Top-1 | 72.1% | 74.3% | 65.8% | 76.5%
Model Size | 48 MB | 115 MB | 2.1 MB | 98 MB
FLOPs (224 px) | $1.5 \times 10^9$ | $3.8 \times 10^9$ | $0.3 \times 10^9$ | $4.1 \times 10^9$
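As a back-of-the-envelope check of the $O(k^2 \cdot c_{\mathrm{in}} \cdot c_{\mathrm{out}})$ parameter scaling and the FLOP magnitudes in Table 2, the sketch below counts the parameters and multiply-accumulate operations of a single convolutional layer. The layer sizes are illustrative assumptions, not the exact configurations of the cited models.

```python
# Back-of-the-envelope check of the k^2 * c_in * c_out scaling behind Table 2.
# Layer sizes below are hypothetical; they only illustrate the orders of magnitude.

def conv_params(k, c_in, c_out, bias=True):
    """Parameters of one k x k convolution: k^2 * c_in * c_out (+ c_out biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulate count for one conv layer applied to an h_out x w_out output map."""
    return k * k * c_in * c_out * h_out * w_out

if __name__ == "__main__":
    # Example: a 3x3 convolution mapping 256 -> 256 channels on a 28 x 28 feature map.
    p = conv_params(3, 256, 256)
    m = conv_macs(3, 256, 256, 28, 28)
    print(f"parameters: {p:,}")  # ~5.9e5 for this single layer
    print(f"MACs:       {m:,}")  # ~4.6e8; a stack of such layers reaches the 1e9-FLOP scale in Table 2
```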
Table 7. Comparison of Transformer, Longformer, Swin Transformer, and Conformer: Parameters and common task performance.

Parameter/Feature | Transformer | Longformer | Swin Transformer | Conformer
Year Proposed | 2017 | 2020 | 2021 | 2020
Core Idea | Self-Attention | Sparse Attention | Hierarchical Window Attention | Convolution-Enhanced Attention
Primary Tasks | General (e.g., NLP) | Long Documents | Vision (e.g., CV) | Speech/ASR
Attention Scope | Global | Local + Global | Hierarchical Windows | Local + Global
Computational Complexity | $O(n^2)$ | $O(n\sqrt{n})$ | $O(n)$ (Approx.) | $O(n)$ (Optimized)
Common Task Performance Examples:
GLUE Benchmark (Avg.) | 88.5 | 89.2 | 87.8 | 88.0
Long Document Summarization (BLEU) | 35.1 | 38.7 | 36.5 | 37.0
ImageNet Classification | 76.2 | - | 87.3 | -
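To ground the complexity column of Table 7, the toy sketch below contrasts full self-attention, whose $n \times n$ score matrix is the source of the $O(n^2)$ cost, with a Longformer-style local window pattern that computes only about $n(2w+1)$ scores for window size $w$. Shapes and the window size are assumptions chosen for illustration, not the published configurations.

```python
# Toy comparison of full vs. windowed (local) attention costs, matching Table 7's complexity column.
import numpy as np

def full_attention(q, k, v):
    """Standard scaled dot-product attention: the (n, n) score matrix is the O(n^2) cost."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n, n): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def local_attention(q, k, v, window=4):
    """Each token attends only to a +/- window neighbourhood: roughly n * (2*window + 1) scores."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)        # at most (2*window + 1) scores per token
        w = np.exp(scores - scores.max())
        out[i] = (w / w.sum()) @ v[lo:hi]
    return out

n, d = 64, 16
q, k, v = (np.random.randn(n, d) for _ in range(3))
print(full_attention(q, k, v).shape, local_attention(q, k, v).shape)  # (64, 16) (64, 16)
```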
Table 8. Brief overview of large language models (LLMs).

Model | Main Finding | Limitations | Year
BERT [80] | Pretrained bidirectional encoder; revolutionized NLP by enabling fine-tuning for diverse tasks | Computationally expensive fine-tuning, large memory requirements | 2018
GPT-2 [97] | Introduced autoregressive pretraining for text generation; demonstrated strong zero-shot learning ability | Prone to generating repetitive or nonsensical outputs for long texts | 2019
GPT-3 [81] | Scaled up to 175B parameters; excels in few-shot and zero-shot learning tasks | Requires vast computational resources for training, environmental concerns | 2020
T5 (Text-to-Text Transfer Transformer) [98] | Unified NLP tasks as text-to-text problems, enabling versatile applications across tasks | Sensitive to hyperparameter tuning, large-scale pretraining required | 2020
OPT [99] | Open-sourced LLM aimed at democratizing access to large-scale models | Less optimized compared with commercial LLMs, limited multilingual support | 2022
PaLM [100] | Achieves impressive performance on reasoning and complex NLP tasks; incorporates chain-of-thought prompting | Requires massive computational resources, limited accessibility | 2022
LaMDA [101] | Optimized for dialogue systems; focuses on conversational AI applications | Struggles with factual accuracy and consistency in long conversations | 2022
Claude | Safety-focused LLM with improved alignment techniques, designed to minimize harmful outputs | Relatively new, limited benchmarks compared with GPT models | 2023
GPT-4 | Multimodal capabilities; improved reasoning and contextual understanding compared with GPT-3 | High computational demands, limited public access and transparency | 2023