Mathematics
  • Article
  • Open Access

23 May 2025

AI Reasoning in Deep Learning Era: From Symbolic AI to Neural–Symbolic AI

1 School of Computer Science and Engineering, Beihang University, 37 Xueyuan Road, Haidian District, Beijing 100191, China
2 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing 100191, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advanced Applications of Deep Learning Methods: Interdisciplinary Perspectives

Abstract

The pursuit of Artificial General Intelligence (AGI) demands AI systems that not only perceive but also reason in a human-like manner. While symbolic systems pioneered early breakthroughs in logic-based reasoning, such as MYCIN and DENDRAL, they suffered from brittleness and poor scalability. Conversely, modern deep learning architectures have achieved remarkable success in perception tasks, yet continue to fall short in interpretable and structured reasoning. This dichotomy has motivated growing interest in Neural–Symbolic AI, a paradigm that integrates symbolic logic with neural computation to unify reasoning and learning. This survey provides a comprehensive and technically grounded overview of AI reasoning in the deep learning era, with a particular focus on Neural–Symbolic AI. Beyond a historical narrative, we introduce a formal definition of AI reasoning and propose a novel three-dimensional taxonomy that organizes reasoning paradigms by representation form, task structure, and application context. We then systematically review recent advances—including Differentiable Logic Programming, abductive learning, program induction, logic-aware Transformers, and LLM-based symbolic planning—highlighting their technical mechanisms, capabilities, and limitations. In contrast to prior surveys, this work bridges symbolic logic, neural computation, and emergent generative reasoning, offering a unified framework to understand and compare diverse approaches. We conclude by identifying key open challenges such as symbolic–continuous alignment, dynamic rule learning, and unified architectures, and we aim to provide a conceptual foundation for future developments in general-purpose reasoning systems.
MSC:
68T01; 68T37

1. Introduction

Can machines think? Since Alan Turing raised this profound question [1], endowing machines with human-like reasoning abilities has remained one of the central challenges of artificial intelligence (AI) [2]. As AI research deepens and applications continue to expand across domains, a key question emerges: how can we enhance AI’s generalization and adaptability to match or even surpass human cognitive capabilities? This question lies at the heart of the quest for Artificial General Intelligence (AGI) [3].
Unlike narrow AI systems—designed for specific tasks such as image classification or speech recognition [4]—AGI envisions AI systems capable of abstract reasoning, cross-domain generalization, autonomous learning, and the ability to adapt to novel environments [2]. While the path toward AGI remains unclear [5], it is widely agreed that achieving it will require breakthroughs in key areas such as commonsense reasoning [6], causal modeling [7], and learning from limited data [8]. Among these, AI reasoning—the ability to infer, explain, and make decisions based on knowledge—has re-emerged as a foundational component toward building more general, intelligent systems.
AI reasoning, broadly defined, refers to the computational ability to perform logical inference, knowledge-based deduction, and structured problem solving [9]. Early efforts in this domain were grounded in symbolic AI, particularly following the formulation of the Physical Symbol System Hypothesis by Newell and Simon [10] in the 1970s. Symbolic AI dominated from the 1950s to the 1980s, leveraging formal logic, handcrafted rules, and expert systems (e.g., MYCIN [11]) to perform reasoning tasks; symbolic search later powered landmark systems such as Deep Blue [12]. However, symbolic methods suffered from brittleness—high reliance on manually encoded knowledge made them ill-suited for dynamic or open-ended environments with uncertainty and unstructured data. Since the 2010s, the deep learning revolution has brought tremendous success in perception-driven tasks [4]. End-to-end neural architectures like CNNs [13] and Transformers [14] have enabled machines to recognize images, generate text, and play games at superhuman levels. Yet, their capacity for robust, interpretable reasoning remains limited [5]. For example, while AlphaGo [15] defeated human champions, its success hinged on a hybrid system that combined deep neural networks with symbolic search techniques like Monte Carlo Tree Search. This highlighted a critical limitation: deep learning alone still struggles with structured reasoning, causal inference, and factual consistency [7]—capabilities essential for AGI.
This dichotomy—symbolic systems excel at reasoning but lack perception, while neural networks excel at perception but struggle with reasoning [16]—has led to growing interest in their integration. Leading researchers such as Yoshua Bengio [17] and Bo Zhang [18] have advocated for combining System 1 (intuitive, fast) and System 2 (deliberative, logical) reasoning within AI architectures. Real-world tasks—such as autonomous driving, where causal inference is needed in real time; medical diagnosis, where interpretability is crucial; and legal reasoning, where logical consistency is mandatory—further highlight the urgent need for AI systems that can reason as well as they perceive.
In response, the emerging paradigm of Neural–Symbolic AI has gained significant traction. This paradigm aims to unify symbolic logic and neural computation through techniques such as differentiable logic programming, knowledge-infused learning, and symbolic–neural interaction modules. Recent advances such as ∂ILP (Differentiable Inductive Logic Programming) [19,20], TransE-style knowledge graph embeddings [21], and NS-CL (Neural–Symbolic Concept Learner) [22] represent promising steps in this direction. Nonetheless, key challenges remain, including (1) the compatibility of discrete symbolic structures with continuous vector representations, (2) dynamic adaptation of logical rules, (3) balancing system complexity and efficiency, and (4) the absence of unified architectural frameworks.
This survey presents a comprehensive overview of AI reasoning in the deep learning era, with a focus on Neural–Symbolic AI, from historical, technical, and application perspectives. Beyond recounting historical developments, we contribute a unified and formalized view of AI reasoning by (1) introducing a formal definition and typology of reasoning functions across symbolic, statistical, and neural paradigms; (2) proposing a three-dimensional taxonomy that categorizes reasoning systems by representation type, inferential logic, and domain assumptions; and (3) offering a technically detailed, up-to-date synthesis of modern reasoning architectures, especially neuro-symbolic frameworks that integrate logic with deep learning. Our goal is not only to revisit reasoning history, but also to equip researchers with a structured conceptual map for navigating current challenges and future innovations in AI reasoning.
While several surveys have been conducted on reasoning in AI, some are temporally constrained or narrowly scoped—often limited to pre-deep learning symbolic systems or high-level descriptions of Neural–Symbolic AI [16,23,24]. Others focus solely on recent advancements in large language models (LLMs) [25] without fully examining their implications for structured reasoning. In contrast, this work seeks to offer a comprehensive, up-to-date, and technically grounded analysis of the field. We hope this survey provides a valuable resource for researchers and practitioners, facilitating deeper understanding and further innovation in AI reasoning.
Terminological Note. In this paper, the term symbolic refers to symbolic AI and logic-based reasoning, i.e., systems that perform inference via formal rules, symbolic structures, or discrete logic programs. This usage differs from that in symbolic data analysis (SDA) as studied in statistical learning, where “symbolic” refers to complex data types such as intervals, distributions, or multi-valued variables [26]. While both paradigms use symbolic representations, our focus is on symbolic reasoning, rather than symbolic data summarization.

2. Historical View: From Symbolic Systems to Hybrid Intelligence

2.1. A Brief History of AI Reasoning

Artificial intelligence reasoning has undergone a series of transformative paradigm shifts since the field’s inception. Each era of development—symbolic, statistical, and neural–symbolic—has introduced new methods, capabilities, and limitations. In this section, we revisit the history of AI reasoning with a focus on identifying major methodological lineages within each era, their emergence timeline, core motivations, challenges, and representative systems. This lineage-based analysis serves as the conceptual foundation for constructing a reasoning paradigm evolution diagram in subsequent sections.

2.1.1. Symbolic Reasoning Era (1950s–1980s)

The symbolic era of AI, spanning roughly from the 1950s through the 1980s, was characterized by the dominance of logic-based and knowledge-driven methods for reasoning and decision-making. At its core was the belief that intelligent behavior could be achieved through explicit symbol manipulation, formalized primarily in logic systems and rule-based frameworks. Several distinct methodological lineages emerged during this era, each contributing uniquely to the field.
Logicism (Formal Logic-Based Reasoning). Pioneered in the 1950s, logicism posited that reasoning could be modeled using formal logic systems, primarily first-order logic. This tradition was rooted in cognitive science and mathematical logic and led to the development of early theorem provers and AI planning formalisms. Its key limitation was the inability to handle uncertainty, noise, and incomplete knowledge. This lineage also laid the foundation for resolution-based theorem proving [28], logic programming paradigms—most notably Prolog [27], which became a cornerstone of symbolic AI in the 1970s and 1980s—and early symbolic rule learning systems such as Michalski’s Variable-Valued Logic (VL1) framework [29]. VL1 extended classical logic by allowing predicates to take on flexible, multi-valued conditions, which enabled symbolic generalization from structured data and informed later developments in inductive learning.
Expert Systems. In the 1970s and 1980s, expert systems emerged as a practical application of symbolic reasoning. These systems used handcrafted rules encoded in knowledge bases to replicate human decision-making in specific domains. Despite early successes like MYCIN and DENDRAL [11,30], expert systems suffered from brittleness and the knowledge acquisition bottleneck. Notably, this lineage also encompassed production systems and rule-based engines such as CLIPS and OPS5 [31,32], as well as early constraint satisfaction solvers widely used in configuration and scheduling.
Non-monotonic and Default Logic. Traditional logic systems assumed monotonicity: once something is inferred, it remains true. However, real-world reasoning often requires retracting conclusions when new information arrives. This gave rise to non-monotonic reasoning formalisms such as default logic [33], circumscription [34], and later, answer set programming [35]. These methods better captured commonsense reasoning but introduced high computational complexity. Related developments such as truth maintenance systems (TMSs) [36] and belief revision frameworks were also introduced to support the dynamic consistency of knowledge bases under new information.
Planning Systems. Classical planning frameworks, such as STRIPS [37], emerged in the 1970s, framing reasoning as a sequence of state transitions governed by logical operators. These methods were effective in controlled environments but faced scalability issues in large or dynamic domains. A notable early planning robot was Shakey [38], the world’s first mobile intelligent robot.
Argumentation-Based Reasoning. Emerging in the late 1980s, computational argumentation sought to model reasoning as a dialectical process of supporting and attacking claims. Dung’s abstract argumentation framework [39] formalized argument acceptability, laying the foundation for structured approaches like ASPIC+ [40].
Semantic Networks and Description Logic. Efforts to formalize conceptual hierarchies led to semantic networks and later, description logic (DL). DL offered a decidable fragment of first-order logic with well-defined semantics, which eventually became the basis for ontologies and the Semantic Web [41,42,43].
Modal and Temporal Logic. Developed during the 1960s and 1970s, modal logic introduced operators to express necessity, possibility, belief, and knowledge—enabling reasoning about epistemic states and agent capabilities [44]. Temporal logic, including linear and computation tree logics, provided tools for modeling the evolution of system states over time [45,46]. These formalisms laid the foundation for AI planning under uncertainty, program verification, and multi-agent reasoning, though their adoption was often limited by expressiveness and tractability trade-offs.
These streams collectively shaped the symbolic foundations of AI reasoning (Table 1). Despite their strengths in interpretability and formal rigor, symbolic approaches struggled with scalability, brittleness, and perceptual grounding—limitations that motivated the shift toward data-driven methods in the following decades.
Table 1. Key methodological lineages in the symbolic AI reasoning era.

2.1.2. Statistical and Data-Driven Era (1990s–2010s)

The second major era of AI reasoning emerged in response to the limitations of purely symbolic systems, particularly their brittleness, poor scalability, and inability to handle uncertainty. From the 1990s onward, the field increasingly embraced data-driven and statistical approaches, fueled by the rise of machine learning and the growing availability of real-world data. This era introduced reasoning frameworks capable of modeling uncertainty, learning from data, and making probabilistic inferences.
Probabilistic Graphical Models (PGMs). Probabilistic graphical models, including Bayesian Networks [47] and Markov Networks [48], provided a powerful formalism to represent and reason under uncertainty. They captured dependencies among random variables using graph structures, enabling efficient probabilistic inference and decision-making. PGMs were widely used in diagnosis, prediction, and robotics, but often required expert-designed structures and struggled with high dimensionality.
Markov Logic Networks (MLNs). As a bridge between logic and probability, MLNs extended first-order logic by associating weights with formulas [49], thereby relaxing strict logical inference into probabilistic reasoning. This hybrid approach addressed the rigidity of symbolic logic while retaining structure. However, inference in MLNs remained computationally expensive, and scalability to large domains remained an open challenge.
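Concretely, an MLN defines a log-linear distribution over possible worlds $x$: each first-order formula $F_i$ carries a weight $w_i$, and $n_i(x)$ counts its true groundings, giving (the standard formulation, shown here for completeness)

$$P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big), \qquad Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big),$$

so a world that violates a heavily weighted formula becomes exponentially less probable rather than impossible.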
Probabilistic Logic Programming. This lineage integrated probabilistic reasoning directly into logic programming paradigms. Languages such as ProbLog [50], PRISM [51], and LPAD [52] extended traditional logic programming with probabilistic facts and rules. These systems supported structured and interpretable inference under uncertainty, often leveraging symbolic reasoning engines as their computational backbone.
Statistical Relational Learning (SRL). SRL frameworks generalized machine learning methods to relational domains with complex structures. Approaches such as Relational Bayesian Networks [53] and Relational Markov Networks [54] enabled learning from relational data with shared statistical patterns. Historical precursors to SRL include Inductive Logic Programming (ILP), which aimed to generalize logic rules from structured examples [55]. However, classical ILP systems lacked robustness to noise and uncertainty, which limited their scalability and integration with probabilistic inference.
In parallel with the rise of Inductive Logic Programming and statistical relational learning, researchers also explored conceptual clustering as a form of symbolic structure discovery [56,57,58]. One notable example is COBWEB [58], an incremental conceptual clustering algorithm that constructs hierarchical taxonomies from symbolic attributes using category utility as a guiding metric. Unlike classical numerical clustering, COBWEB produces interpretable concept trees that reflect human-like abstraction processes. These systems demonstrated that symbolic representations and pattern discovery could be integrated in an unsupervised manner—an idea that remains relevant in modern efforts to learn structured latent spaces for reasoning.
Causal Inference and Structural Causal Models. Inspired by the philosophy of science and epidemiology, this stream focused on modeling causal relationships rather than mere correlations. Structural causal models (SCMs), popularized by Judea Pearl, introduced formal tools such as do-calculus for inferring interventions, counterfactuals, and causal effects [7,59]. These methods enriched the reasoning landscape with mechanisms for understanding and manipulating cause–effect relationships.
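As a concrete illustration, when a covariate set $Z$ satisfies the backdoor criterion relative to a cause $X$ and an effect $Y$, do-calculus reduces the interventional query to an ordinary adjustment over observational quantities (a textbook identity, included for illustration):

$$P(Y \mid do(X = x)) = \sum_{z} P(Y \mid X = x, Z = z) \, P(Z = z).$$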
Kernel Methods and Shallow Statistical Learning. Before deep learning became dominant, statistical reasoning relied heavily on feature-based methods such as SVMs [60], decision trees [61], and ensemble models [62]. Though not symbolic, these methods provided interpretable decision rules and were effective in many structured reasoning tasks, particularly in classification, regression, and ranking.
Deep Neural Networks for Sub-symbolic Reasoning. With the rise of deep learning, neural networks began to exhibit emergent capabilities in representation-based reasoning. Architectures such as convolutional neural networks (CNNs) [63], recurrent neural networks (RNNs) [64], graph neural networks (GNNs) [65], and especially Transformers [14] enabled models to learn hierarchical abstractions, capture compositional patterns, and generalize to novel input combinations. Though these systems lack explicit symbolic representations, they demonstrated powerful inductive capabilities in vision, language, and decision-making tasks. For example, AlphaGo integrated deep neural policy networks with symbolic search (Monte Carlo Tree Search) [15], foreshadowing later neural–symbolic systems. However, purely neural models struggle with interpretability, logical consistency, and systematic generalization in abstract reasoning tasks.
A notable frontier within sub-symbolic reasoning is the emergent reasoning phenomenon observed in large foundation models. Although these models lack explicit symbolic structures or logical inference mechanisms, they exhibit surprising capabilities in multi-step problem solving, analogical inference, and even structured planning—solely by virtue of model scale and in-context learning. These emergent behaviors are best interpreted as an implicit extension of sub-symbolic reasoning, and are distinct from neural–symbolic systems that explicitly incorporate logical form.
Overall, the statistical era emphasized generalization from data, uncertainty modeling, and structural regularities, laying the foundation for modern machine learning (Table 2). However, many of these approaches sacrificed interpretability, relied on strong assumptions, and lacked the rich semantic expressivity of symbolic systems—limitations that would later motivate the integration of symbolic and neural paradigms.
Table 2. Historical lineage of statistical and sub-symbolic reasoning paradigms.

2.1.3. Neural–Symbolic Integration Era (2016–Present)

In recent years, the AI community has witnessed a renewed interest in combining the strengths of symbolic reasoning with the representational power of neural networks. This has given rise to the neural–symbolic integration paradigm, which aims to bridge the long-standing divide between logic-based inference and gradient-based learning. The rise of deep learning, the emergence of large-scale pretrained language and vision models, and the increasing demand for explainable and structured reasoning have all contributed to the rapid development of this hybrid field.
Differentiable Logic Programming. One of the foundational directions in neural–symbolic reasoning is the development of differentiable logic systems. Methods such as ∂ILP [19], Neural Logic Machines [68], and Logical Tensor Networks [69] introduce gradient-based mechanisms to induce and evaluate logical rules. These systems offer differentiability and symbolic interpretability, but often face scalability limitations and require careful design to avoid degeneracy in optimization.
Abductive Learning. Proposed by Zhou and colleagues [70], abductive learning is a neuro-symbolic reasoning paradigm that integrates neural perception with symbolic abductive inference. The central idea is to generate plausible symbolic explanations for observed data using background knowledge and abductive logic programming, and to use these explanations to supervise the learning of neural perception modules. Unlike purely deductive or inductive approaches, abductive learning enables reasoning with incomplete observations and missing logical components. It has been applied to handwritten equation understanding, visual question answering, and semantic parsing, demonstrating strong generalization and interpretability. However, it faces challenges in grounding symbols, scaling to large search spaces, and integrating with gradient-based models.
Neuro-symbolic Concept Learners and Program Induction. Another stream focuses on neural systems that learn structured programs, symbolic rules, or modular logic graphs from data. Approaches such as NS-CL [22], Neural Module Networks [71], and CLEVR-CoGenT [72] construct interpretable logic chains over visual or textual inputs. These systems are especially effective in visual question answering, relational reasoning, and grounded language understanding.
LLM-guided Neural–Symbolic Reasoning. With the rise of LLMs such as GPT-4 [73] and Claude [74], researchers have explored their capacity to perform reasoning through chain-of-thought prompting, tree-structured generation, and API/tool augmentation. Frameworks like ReAct [75], Toolformer [76], and DSPy [77] use LLMs as symbolic planners, orchestrating external tools for structured decision-making. These approaches offer flexibility and generalization, though they often struggle with consistency, faithfulness, and verifiability. These models also demonstrate what some researchers describe as emergent reasoning—a form of symbolic-like behavior not grounded in explicit logical structure but arising from large-scale pattern learning and in-context processing.
Logic-aware Transformer Architectures. Recent work has extended standard Transformer architectures with symbolic inductive bias. Models such as Logical Transformers [78] and LogicBench [79] integrate logical constraints, graph structures, or discrete operators into the attention mechanism or decoder path. These systems aim to combine end-to-end learning with structural regularization and symbolic supervision.
Neural Theorem Provers and Knowledge Injection. Other approaches focus on integrating symbolic knowledge into neural architectures via knowledge graphs, ontology embeddings, or reasoning constraints. Systems like DeepProbLog [80], NeurASP [81], and K-BERT [20] use neural–symbolic hybrids to perform logical inference grounded in structured knowledge. They demonstrate strong performance in knowledge-based QA, commonsense inference, and scientific reasoning tasks.
Multimodal Neuro-symbolic Reasoning. Extending beyond text, recent work applies neuro-symbolic methods to multimodal domains including visual reasoning, video event understanding, and robotic planning. Models like NS-CL [22], CLEVR-CoGenT [72], VideoCoT [82], and ViperGPT [83] demonstrate that grounding symbolic structures in perception significantly improves generalization and reasoning robustness.
Overall, the neural–symbolic era represents a convergence of paradigms: the statistical generalization of deep learning, the formal rigor of symbolic logic, and the flexibility of large-scale pretrained models (Table 3). Despite ongoing challenges—including training stability, scalability, and semantic consistency—this paradigm is considered a promising path toward interpretable, robust, and generalizable AI reasoning.
Table 3. Historical lineage of neural–symbolic reasoning paradigms.

2.2. Definition and Formalization of AI Reasoning

AI reasoning refers to the capability of an artificial system to derive conclusions, explanations, or decisions based on a set of premises, background knowledge, or observed information—often through a structured, generalizable, and semantically meaningful process [7,9,86]. Reasoning is distinct from perception or raw classification in that it involves inferential relationships, abstract manipulation, and goal-oriented decision chains that may go beyond direct pattern recognition. Historically, AI reasoning has been grounded in formal logic, such as propositional and first-order logic [87], and it evolved through multiple computational paradigms. Depending on the paradigm, the underlying mechanism and representation of reasoning vary significantly. We review these core characterizations below.

2.2.1. Formal Characterizations Across Paradigms

We now formalize AI reasoning across the major paradigms discussed in Section 2.1. While their underlying assumptions differ, these paradigms share a common goal: to map structured knowledge and observations into logically or semantically grounded conclusions. Each paradigm instantiates the abstract reasoning function:
$$\mathcal{R} : (\text{Knowledge}, \text{Observation}) \to \text{Inferred Conclusion}.$$
We analyze these three major paradigms and their representative instantiations below.
Symbolic Reasoning. Symbolic reasoning is grounded in classical logic, where reasoning involves deriving conclusions from explicit knowledge bases and input premises using syntactic rules, as follows:
$$\mathcal{R}_{\mathrm{sym}} : (K, \Gamma) \to \Delta, \quad \text{where } K \cup \Gamma \models \Delta,$$
where
  • K is background knowledge (e.g., axioms, ontologies);
  • Γ are current inputs or observations;
  • Δ are conclusions derived by logical entailment;
  • ⊧ is syntactic or semantic entailment (e.g., modus ponens, resolution).
This paradigm underpins classical expert systems and theorem provers [31,88], with strong interpretability but limited generalization in open domains.
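To make $\mathcal{R}_{\mathrm{sym}}$ concrete, below is a minimal sketch (our illustration, not drawn from any cited system) of propositional forward chaining: rules fire by modus ponens until the entailment closure $\Delta$ of $K \cup \Gamma$ is reached.

```python
# Minimal sketch of R_sym: forward chaining over propositional Horn rules.
# All facts and rules here are invented for illustration.

def forward_chain(knowledge, rules, observations):
    """Compute the closure Delta of K (facts) and Gamma (observations)
    under `rules`, a list of (premises, conclusion) pairs."""
    derived = set(knowledge) | set(observations)
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)          # modus ponens step
                changed = True
    return derived

rules = [({"bird"}, "has_wings"), ({"has_wings"}, "can_fly")]
print(forward_chain({"bird"}, rules, set()))
# -> {'bird', 'has_wings', 'can_fly'}
```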
Statistical and Sub-symbolic Reasoning. Statistical reasoning models uncertainty and correlation between observed and unobserved variables using probabilistic structures. A general reasoning model follows:
$$\mathcal{R}_{\mathrm{stat}} : P(Y \mid X, K, \theta) \to \hat{Y} = \arg\max_{Y} \, \mathbb{E}_{\theta}[\,Y \mid X, K\,],$$
where
  • X are observed data (e.g., features, facts);
  • Y is the target variable to be inferred;
  • K is prior knowledge encoded in a probabilistic graphical model;
  • θ are model parameters (e.g., conditional probabilities);
  • $\hat{Y}$ is the most probable or expected outcome.
This paradigm includes Bayesian Networks [47], MLNs [49], probabilistic logic programming (e.g., ProbLog [50]), and probabilistic soft logic [89]. It supports reasoning under uncertainty but often requires handcrafted structures or supervision.
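As a worked example of $\mathcal{R}_{\mathrm{stat}}$ (ours, with made-up probabilities), the sketch below performs exact inference by enumeration in a two-variable network Rain → WetGrass and returns the most probable value $\hat{Y}$ given an observation.

```python
# Minimal sketch of R_stat: inference by enumeration in a tiny
# Bayesian network Rain -> WetGrass (all numbers invented).

P_RAIN = {True: 0.2, False: 0.8}            # prior P(Rain)
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}  # P(Wet=True | Rain)

def posterior_rain(wet_observed=True):
    """Return P(Rain | Wet = wet_observed) by enumerating the joint."""
    joint = {}
    for rain in (True, False):
        p_wet = P_WET_GIVEN_RAIN[rain]
        likelihood = p_wet if wet_observed else 1.0 - p_wet
        joint[rain] = P_RAIN[rain] * likelihood
    z = sum(joint.values())                 # normalization constant
    return {rain: p / z for rain, p in joint.items()}

posterior = posterior_rain(wet_observed=True)
y_hat = max(posterior, key=posterior.get)   # arg max_Y P(Y | X)
print(posterior, y_hat)                     # {True: ~0.69, False: ~0.31} True
```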
As a key modern subclass of statistical reasoning, sub-symbolic reasoning refers to inference performed via continuous representations learned from data, where inference is implicit within neural function approximators, as follows:
$$\mathcal{R}_{\mathrm{sub}} : f_{\theta}(X) \to Z \to \hat{Y}, \quad \text{where } Z \in \mathbb{R}^{d},$$
where
  • X is the high-dimensional input (e.g., image pixels, word tokens, graph embeddings);
  • $f_{\theta}$ is the neural model parameterized by $\theta$ (e.g., CNNs [63], GNNs [65], Transformers [14]);
  • $\hat{Y}$ is the task-dependent output (e.g., answer, class label, entity).
Though not based on formal logic, such models demonstrate compositional generalization and relational abstraction [90] in tasks like visual question answering, multi-hop QA, and analogical reasoning.
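The sketch below (ours, with random untrained weights standing in for learned ones) makes this operational: in $\mathcal{R}_{\mathrm{sub}}$, "inference" is simply the composition of continuous transformations yielding a latent $Z$ and an output $\hat{Y}$, with no explicit rules anywhere.

```python
# Minimal sketch of R_sub: implicit inference inside a neural function
# approximator f_theta (weights are random here, standing in for learned ones).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # theta: layer-1 parameters
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # theta: layer-2 parameters

def f_theta(x):
    z = np.tanh(x @ W1 + b1)         # latent representation Z in R^8
    logits = z @ W2 + b2
    return z, int(logits.argmax())   # Y_hat: task-dependent discrete output

x = rng.normal(size=4)               # X: high-dimensional input
z, y_hat = f_theta(x)
print(z.shape, y_hat)                # (8,) and a class index in {0, 1, 2}
```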
Recent foundation models (e.g., GPT-4 [73], Claude [74], LLaMA [91]) demonstrate reasoning-like behaviors in tasks such as multi-step question answering, numerical inference, and analogical problem solving. These capabilities emerge without explicitly defined symbolic rules or logic modules, but instead arise from the scale and pattern capacity of Transformer architectures trained on large corpora [92,93]. We interpret such phenomena as an extension of sub-symbolic reasoning, where inference is not computed through logic chains, but statistically approximated through prompt-conditioned generation.
$$\mathcal{R}_{\mathrm{emergent}} : T(X, P) \to \hat{Y},$$
where
  • X is the user input or natural context (e.g., question, image caption);
  • P is the prompt scaffold or few-shot template (e.g., chain-of-thought);
  • T is a large pretrained Transformer model;
  • $\hat{Y}$ is the model-generated output (e.g., answer, plan, proof explanation).
While these behaviors are not grounded in verifiable logic, they exhibit surprising reasoning fluency across many domains. It is important to distinguish this prompt-only emergent reasoning from LLM-augmented neural-symbolic systems—discussed separately in Section 2.1.3—that explicitly integrate external symbolic modules or structured APIs into the reasoning loop.
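A hedged sketch of $\mathcal{R}_{\mathrm{emergent}}$ follows: the entire "reasoning mechanism" lives in the prompt scaffold $P$. Here `llm_generate` is a hypothetical stand-in for any LLM completion call, not a specific vendor API.

```python
# Sketch of R_emergent: prompt-conditioned generation T(X, P) -> Y_hat.
# `llm_generate` is a hypothetical callable; the few-shot chain-of-thought
# scaffold P is what elicits multi-step reasoning behavior.

FEW_SHOT_SCAFFOLD = """\
Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?
A: Let's think step by step. 2 cans x 3 balls = 6. 5 + 6 = 11. Answer: 11.
"""

def reason(question: str, llm_generate) -> str:
    """Compose prompt P around input X and return the generated Y_hat."""
    prompt = FEW_SHOT_SCAFFOLD + f"Q: {question}\nA: Let's think step by step."
    return llm_generate(prompt)      # T: a large pretrained Transformer
```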
Neural–Symbolic Reasoning. Neural–symbolic systems aim to combine the expressivity of symbolic structures with the flexibility and scalability of neural networks. These systems typically integrate logic-based priors or rules K into a neural model f θ , resulting in hybrid architectures capable of both pattern recognition and structured reasoning. We formalize the reasoning process as follows:
$$\mathcal{R}_{\mathrm{ns}} : f_{\theta}(X) + K \to \Delta,$$
where
  • X are input observations (e.g., a question, an image, a scene graph);
  • $f_{\theta}$ is a neural encoder or predictor parameterized by $\theta$;
  • K is symbolic knowledge, such as rules, ontologies, or graphs;
  • Δ are structured outputs inferred jointly from symbolic and neural components.
This reasoning paradigm supports end-to-end learning while allowing the incorporation of explicit reasoning structures. Depending on the implementation, the symbolic component may appear in different forms:
  • Differentiable logic layers: logic rules are approximated using tensors or neural operators (e.g., ∂ILP [19], Logical Tensor Networks [69]);
  • Neuro-symbolic concept learning: symbolic program execution is conditioned on visual or textual concept modules (e.g., NS-CL [22], Neural Module Networks [71]);
  • Knowledge-guided Transformers: structured external knowledge is injected into pretrained models (e.g., K-BERT [20], NeurASP [81]).
Neural–symbolic reasoning offers a middle ground between interpretability and adaptability. It allows AI systems to learn from data while reasoning over known structures, making it particularly useful in domains such as scientific QA, medical diagnosis, and law, where both statistical inference and logical guarantees are essential.
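As a flavor of the first form above, the sketch below (ours, not a specific published system) relaxes Boolean connectives to product t-norm operators over $[0, 1]$ truth values, so the satisfaction of a rule such as smoke(x) → cancer(x) becomes a differentiable quantity usable as a loss term.

```python
# Sketch of a differentiable logic layer: soft truth values in [0, 1]
# with product t-norm connectives, so rule satisfaction is differentiable.

def soft_and(a, b):     return a * b                # product t-norm
def soft_or(a, b):      return a + b - a * b        # probabilistic sum
def soft_implies(a, b): return soft_or(1.0 - a, b)  # a -> b == (not a) or b

# Rule: smoke(x) -> cancer(x); in a full system the soft truths would come
# from a perception module f_theta (values invented here).
smoke, cancer = 0.9, 0.4
satisfaction = soft_implies(smoke, cancer)   # 0.46
loss = 1.0 - satisfaction                    # penalize rule violation
print(satisfaction, loss)
```

In a full system, this loss would be backpropagated into the perception module, nudging neural predictions toward consistency with the symbolic rule.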

2.2.2. Categorization of AI Reasoning Across Dimensions

Beyond historical paradigms and formal mechanisms, AI reasoning can be further categorized along multiple orthogonal axes. These dimensions help situate diverse reasoning systems in a unified framework and facilitate comparative analysis. We propose a three-dimensional taxonomy based on representation type, task structure, and application context.
By Representation Type. This axis characterizes the internal form of knowledge and reasoning operations, ranging from explicit logical formulas to implicit statistical embeddings, including the following:
  • Symbolic Reasoning: Relies on discrete, human-interpretable representations such as logic rules, graphs, and ontologies [94,95]. Inference is typically performed using deductive or rule-based systems, enabling traceability and formal verification [88]. Such systems dominate early expert systems and theorem provers.
  • Statistical Reasoning: Models uncertainty using probability distributions over structured data. Reasoning tasks involve belief updating, probabilistic inference, and marginalization, often leveraging tools like Bayesian Networks [47], HMMs [96], or MLNs [49]. Logic-based probabilistic systems such as ProbLog [50] and PSL [89] also fall under this category.
  • Neural Reasoning: Employs continuous, learned representations within neural networks. Inference emerges from multi-layer transformations and pattern abstraction, without explicit rule structures. Despite its black-box nature, neural reasoning has demonstrated success in perception-rich and language-heavy tasks [14,90].
  • Hybrid (Neural–Symbolic) Reasoning: Attempts to unify the interpretability of symbolic models with the flexibility of neural networks. This includes architectures that inject symbolic priors into differentiable computation (e.g., ∂ILP [19], Logical Tensor Networks [69]), or use neural controllers to invoke symbolic tools (e.g., NS-CL [22], NeurASP [81], K-BERT [20]).
By Task Structure. This axis focuses on the inferential logic underlying the reasoning process, reflecting different modes of human-like thinking, including the following:
  • Deductive Reasoning: Draws logically valid conclusions from known premises or rules. It operates under certainty and preserves truth, forming the foundation of theorem provers [88], symbolic solvers [97], and classical logic programming [94].
  • Inductive Reasoning: Generalizes from specific instances to broader rules or models. Typical in scientific discovery and machine learning, this paradigm underlies systems like Inductive Logic Programming (e.g., FOIL [98], Meta-ILP [99]), and concept generalization.
  • Abductive Reasoning: Seeks the most plausible explanation for an observation. It is used extensively in diagnosis, plan recognition, and commonsense reasoning, where causes must be inferred from effects [70,84].
  • Analogical Reasoning: Solves unfamiliar problems by mapping structures from previously encountered scenarios. This approach underlies analogical question answering, metaphor understanding, and visual analogy [100,101].
By Application Context. This axis describes the environment in which reasoning is applied, emphasizing the domain’s structural assumptions and complexity, including the following:
  • Closed-domain Reasoning: Operates in well-defined, highly structured environments where rules and ontologies are fixed and comprehensive. Common in robotic control [102], rule-based planning [37], and legal document validation [103], these systems prioritize correctness and determinism.
  • Open-domain Reasoning: Engages with ambiguous, dynamic, and incomplete knowledge sources. It encompasses multi-hop question answering [104], visual reasoning [72,82], dialogue systems [105], and scientific exploration [106], where models must cope with noise, novelty, and partially observed states.
This multidimensional taxonomy enables a more fine-grained understanding of the diverse methodologies in AI reasoning and provides a scaffold for comparing their assumptions, strengths, and limitations. It also clarifies the trade-offs between generalization and precision, expressiveness and tractability, and learning and interpretability (Table 4).
Table 4. Three-dimensional categorization of AI reasoning across representation, task structure, and application context. Each cell lists representative methods under Closed (C) and Open (O) domains.

4. Application View: Benchmarks, Datasets, and Reasoning-Oriented AI Systems

4.1. Reasoning-Centric Tasks and Benchmarks

Reasoning-oriented AI systems have demonstrated significant impact across a range of domains, from natural language understanding and commonsense inference to planning in embodied agents and multimodal cognition. In this section, we categorize representative reasoning tasks and summarize the associated methodologies and real-world applications, with a focus on how symbolic or neural–symbolic approaches enhance interpretability, generalization, and causal understanding.
(1) Question Answering. QA is a long-standing benchmark for evaluating machine reasoning, spanning deductive, commonsense, abductive, causal, and explanatory inference. These tasks require systems to move beyond surface-level text matching and instead construct, apply, and verify reasoning chains from explicit premises or implicit world knowledge.
  • Deductive QA. Tasks such as ProofWriter [189] and FOLIO [190] involve formal logic reasoning, requiring models to infer conclusions from natural language premises using entailment, conjunction, and implication rules. These benchmarks emphasize the need for systematic generalization over formal logical forms.
  • Commonsense QA. Benchmarks like CommonsenseQA [191], CosmosQA [192], and OpenBookQA [193] test models’ ability to integrate background knowledge with contextual understanding. They target reasoning over latent knowledge, including naïve physics, social conventions, and intentionality.
  • Abductive and Causal QA. Datasets such as AbductiveNLI [194], ART [195], CausalQA [196], and SituatedQA [197] evaluate models on inferring plausible causes or explanations behind observed scenarios. Such tasks reflect the importance of abductive reasoning and counterfactual analysis in explainable AI.
  • Explanatory QA. Tasks like e-SNLI [198] and EntailmentBank [199] require not only answer prediction but also structured generation of reasoning chains, either as entailment trees or natural language justifications. These tasks are crucial for interpretability and educational applications.
(2) Planning, Tool Use, and Decision-Making. Symbolic and logical reasoning form the foundation of traditional AI planning. In contemporary systems, these paradigms are integrated with neural policies and tool-augmented agents to support long-horizon reasoning and explainable decision-making.
  • Symbolic Planning. Symbolic planning focuses on solving goal-directed tasks by generating action sequences under symbolic representations of states, transitions, and constraints. In contrast to purely reactive control policies, symbolic planning requires agents to explicitly reason about future states, preconditions, and causality. This often entails the construction of intermediate symbolic structures such as logic programs, action graphs, or scene representations, enabling long-horizon planning, interpretability, and task compositionality (a minimal STRIPS-style sketch follows this list). PUZZLES [200] presents structured environments composed of discrete logic-based tasks (e.g., grid games, puzzles, combinatorial path planning), designed to test whether agents can generalize across symbolic domains and solve algorithmic reasoning problems under limited feedback. RSBench [201] introduces a suite of neuro-symbolic reasoning environments targeting concept-level evaluation. SCERL [202] adapts textual reinforcement learning environments for safe and interpretable planning, covering sub-domains like side-effect minimization and reward uncertainty. In the domain of physical and robotic environments, RLBench [203] serves as a high-dimensional benchmark featuring over 100 task variants ranging from simple object manipulation to multi-step tool use.
  • Tool-Augmented Agents. Systems such as ReAct [75], AutoGPT [162], and DSPy [77] combine LLMs with external tools and APIs to perform chain-of-thought reasoning and actionable planning. These agents dynamically invoke tools, reason over intermediate outputs, and adaptively revise plans.
  • Multi-Agent and Interactive Planning. Frameworks like CAMEL [171] leverage symbolic role assignments and structured dialogue policies to coordinate among collaborative agents. They enable decentralized planning, intention modeling, and joint task execution in social or multi-agent settings.
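To ground the symbolic planning bullet above, here is a minimal sketch (our illustration; the action names and facts are invented) of STRIPS-style planning as breadth-first search over sets of ground facts, with each action specified as (preconditions, add list, delete list).

```python
# Minimal sketch of STRIPS-style symbolic planning via breadth-first search.
from collections import deque

ACTIONS = {
    "pick(key)":  ({"at(door)", "key_on_floor"}, {"holding(key)"}, {"key_on_floor"}),
    "open(door)": ({"at(door)", "holding(key)"}, {"door_open"}, set()),
    "go(inside)": ({"door_open"}, {"at(inside)"}, {"at(door)"}),
}

def plan(init, goal):
    """Return a sequence of actions reaching `goal`, or None if unreachable."""
    frontier = deque([(frozenset(init), [])])
    seen = {frozenset(init)}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                          # goal facts all hold
            return path
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:                       # preconditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None

print(plan({"at(door)", "key_on_floor"}, {"at(inside)"}))
# -> ['pick(key)', 'open(door)', 'go(inside)']
```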
(3) Multimodal Reasoning and Perception. Perception-driven reasoning tasks integrate visual or sensorimotor input with symbolic abstraction, supporting compositional, causal, and temporally grounded inference.
  • VQA. Datasets such as CLEVR [72], GQA [204], and VQA-X [205] are designed to probe structured reasoning over visual scenes, testing capabilities like relational comparison, quantification, and spatial inference under varying degrees of language grounding and visual complexity.
  • Video and Event Reasoning. Benchmarks including CLEVRER [206], NExT-QA [207], and VideoCoT [82] evaluate temporal and causal reasoning in video contexts, such as predicting future states, identifying event chains, and explaining dynamic processes. Symbolic modules or causal priors are often critical in modeling these temporal dependencies.
  • Embodied and Situated Reasoning. In robotics and embodied environments, agents must perform goal-oriented reasoning from partial observations. Systems increasingly utilize symbolic representations (e.g., scene graphs or task logic) derived from perceptual inputs to support grounding, action abstraction, and generalizable planning. Recent approaches such as Embodied Chain-of-Thought Reasoning (ECoT) [208] and Inner Monologue [209] demonstrate how integrating structured reasoning steps and leveraging language model feedback can enhance robotic control and planning capabilities in complex environments.
(4) Program Induction and Semantic Parsing. Programmatic representations—ranging from SQL queries to domain-specific languages—serve as explicit reasoning artifacts, allowing verification, interpretability, and execution within structured environments.
  • Semantic Parsing. Benchmarks such as Spider [210], ATIS [211], and ScienceBenchmark [212] evaluate the mapping of natural language to executable queries. These tasks often require understanding compositional semantics, coreference resolution, and domain schemas.
  • Program Synthesis. Tasks like CODET [213], NL2Bash [214], and MathQA [215] involve generating symbolic code from language descriptions or problems. Success in these tasks depends on precise logical translation, error handling, and explanation capabilities, particularly in mathematical or command-line environments.
Beyond academic evaluation, these reasoning tasks are increasingly deployed in high-stakes domains such as healthcare, education, and human–computer interaction. Causal and abductive QA enables hypothesis testing in scientific discovery; tool-augmented reasoning powers assistive agents; multimodal reasoning enables situational understanding in autonomous systems; and programmatic reasoning supports transparent decision pipelines. Together, they form the backbone of AI systems aspiring toward generality, interpretability, and real-world reliability.
We summarize the benchmarks and datasets by domain and reasoning type in Table 5. These benchmarks differ across three critical dimensions: (i) the nature of the input modality (symbolic, visual, or hybrid), (ii) the type of reasoning targeted (deductive, abductive, causal, or procedural), and (iii) the structural complexity of the task formulation (single-step vs. multi-hop, static vs. dynamic environments). As the field matures, recent datasets increasingly emphasize alignment with real-world conditions, such as noisy perception (e.g., VideoCoT [82]), open-domain tool use (e.g., WebArena [216]), and multi-agent coordination (e.g., AgentBench [166]).
Table 5. Representative benchmarks and datasets for reasoning-centric AI tasks.
The diversity of benchmarks in symbolic and neural–symbolic reasoning reflects the richness of reasoning paradigms and their application scopes. The field has evolved from synthetic logic tasks (e.g., CLEVR [72], ProofWriter [189]) toward complex, real-world, multi-agent or tool-augmented scenarios (e.g., ToolBench [218], AgentBench [166]), presenting new opportunities for scalable and interpretable reasoning in AI systems.

4.2. Reasoning Frameworks and Toolkits

Alongside benchmarks and datasets, a growing ecosystem of toolkits, frameworks, and reasoning engines has been developed to support symbolic and Neural–Symbolic AI. These systems differ by their reasoning paradigms, supported modalities, and integration with learning-based models. Table 6 summarizes representative systems.
Table 6. Representative toolkits and frameworks for symbolic and neural–symbolic reasoning.
These frameworks offer varying levels of abstraction—from declarative logic interfaces (e.g., ASP [81]) to Python-integrated neural–symbolic environments (e.g., DeepProbLog [80]). Some (e.g., DSPy [77], LangChain [169]) enable LLM-centric tool chaining, while others emphasize differentiable semantics, probabilistic reasoning, or symbolic learning. The selection of toolkits depends on the desired balance between interpretability, learning capacity, and domain constraints.
While reasoning-oriented AI systems have demonstrated strong theoretical advances, practical deployment in real-world applications remains a critical benchmark of their utility. We highlight representative deployment domains where symbolic or neural–symbolic reasoning frameworks have proven effective:
  • Question Answering and Explainable Search. Neuro-symbolic systems such as DeepProbLog [80] and DSPy [77] have been integrated into QA pipelines to provide interpretable reasoning traces alongside factual answers. Logic-enhanced retrieval and reasoning has also shown promise in scientific QA and legal document analysis.
  • Automated Planning and Robotics. ASP solvers like clingo [229] and symbolic planners such as DLV [226] are widely used in robotic task planning, where action constraints, resource dependencies, and failure recovery can be naturally expressed using logical rules.
  • Scene and Event Understanding. Hybrid models like NEUMANN [133] and AlphaILP [140] have enabled compositional visual reasoning in scene graphs and multi-object tracking tasks. Their integration with visual detectors improves the accuracy and interpretability of symbolic queries over visual domains.
  • Tool-Augmented Agent Systems. Toolchain frameworks such as LangChain [169], DSPy [77], and AgentBench [166] allow LLMs to invoke APIs, retrieve documents, and invoke solvers in complex reasoning workflows. These systems have been deployed in domains like software engineering, autonomous planning, and complex report generation.
  • Decision Support and Diagnosis. Probabilistic logic systems like ProbLog [50] and PSL [89] have been applied in healthcare, recommender systems, and risk assessment settings, where uncertainty and rule-based policies must be jointly modeled.
These deployments highlight the increasing maturity and integration capability of modern AI reasoning systems. As reasoning modules become more modular and explainable, their adoption in domain-critical applications—such as law, finance, and scientific discovery—is likely to expand.

5. Closing Remarks and Future Directions

As this survey has illustrated, reasoning remains a cornerstone capability in the pursuit of Artificial General Intelligence. From symbolic logic systems rooted in the Physical Symbol System Hypothesis to neural architectures excelling at perceptual tasks, the historical development of AI has reflected a persistent tension between structure and flexibility, interpretability and adaptability.
Neural–Symbolic AI emerges as a promising paradigm to bridge this divide, aiming to unify the structured inference of symbolic reasoning with the representation power and scalability of deep learning. Across this survey, we have categorized seven methodological directions—ranging from Differentiable Logic Programming and abductive learning to LLM-guided reasoning and multimodal neuro-symbolic integration—and analyzed how each contributes to reconciling classical reasoning principles with modern AI capabilities.
On the application side, we have witnessed a growing adoption of reasoning systems across diverse domains: question answering, robot planning, visual scene understanding, and agentic tool use. Benchmarks and toolkits have also matured, offering common grounds for evaluating inference consistency, generalization ability, and modular reasoning workflows.
Despite these advances, several open challenges remain at the heart of AI reasoning research:
  • Unified Architectures. Existing systems are often task-specific or handcrafted. Achieving general-purpose, reusable reasoning modules remains an unsolved problem.
  • Symbol-Vector Bridging. Seamlessly combining discrete symbolic structures with continuous neural representations requires more principled modeling and training strategies.
  • Reasoning under Uncertainty. While probabilistic and fuzzy logic frameworks exist, efficient integration with deep perception remains limited in practice.
  • Explainability and Trust. As reasoning systems are increasingly deployed in sensitive applications such as healthcare and law, their transparency, robustness, and ethical alignment become essential.
Looking forward, the distinction between symbolic and non-symbolic reasoning paradigms is likely to blur further. While symbolic logic provides interpretability, verifiability, and structured abstraction, emerging generative paradigms—particularly those enabled by large language models (LLMs)—demonstrate remarkable reasoning-like behavior through prompt-conditioned generation. Whether reasoning necessarily requires symbolic logic, or whether statistical generation alone can yield robust and generalizable reasoning, remains an open question. Rather than taking an exclusive stance, we advocate for a pluralistic view: symbolic, sub-symbolic, and generative paradigms offer complementary strengths, and their integration may be the most promising path toward general-purpose reasoning systems.
We anticipate that the next generation of AI reasoning will involve hybrid designs that combine large language models, symbolic planners, differentiable theorem provers, and structured memory components. Progress in this direction will not only push the boundaries of interpretable and generalizable AI, but also lay the groundwork for cognitively inspired, human-aligned intelligent systems. Ultimately, advancing AI reasoning is not merely a technical pursuit—it is a conceptual imperative. It calls for rethinking the foundations of how machines perceive, abstract, and decide, and for constructing systems that reason not only with data, but also with knowledge, causality, and intent.

Author Contributions

Conceptualization, B.L.; Methodology, B.L.; Data curation, Y.W.; Writing—original draft, B.L.; Writing—review & editing, C.T. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the National Natural Science Foundation of China (62176016, 72274127), National Key R&D Program of China (No. 2021YFB2104800), Guizhou Province Science and Technology Project: Research on Q&A Interactive Virtual Digital People for Intelligent Medical Treatment in Information Innovation Environment (supported by Qiankehe[2024] General 058), Capital Health Development Research Project (2022-2-2013), Haidian innovation and translation program from Peking University Third Hospital (HDCXZHKC2023203).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Turing, A.M. Computing machinery and intelligence. Mind 1950, 59, 433–460. [Google Scholar] [CrossRef]
  2. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 4th ed.; Pearson: London, UK, 2021. [Google Scholar]
  3. Goertzel, B.; Pennachin, C. Artificial General Intelligence; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  4. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  5. Marcus, G. Deep learning: A critical appraisal. arXiv 2018, arXiv:1801.00631. [Google Scholar]
  6. Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building Machines That Learn and Think Like People. Behav. Brain Sci. 2017, 40, e253. [Google Scholar] [CrossRef] [PubMed]
  7. Pearl, J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  8. Wang, X.; Zhang, S.; Cen, J.; Gao, C.; Zhang, Y.; Zhao, D.; Sang, N. CLIP-guided Prototype Modulating for Few-shot Action Recognition. Int. J. Comput. Vis. 2023, 132, 1899–1912. [Google Scholar] [CrossRef]
  9. Brachman, R.J.; Levesque, H.J. Knowledge Representation and Reasoning; The Morgan Kaufmann Series in Artificial Intelligence; Morgan Kaufmann: San Francisco, CA, USA, 2004. [Google Scholar]
  10. Newell, A.; Simon, H.A. Computer Science as Empirical Inquiry: Symbols and Search. Commun. ACM 1976, 19, 113–126. [Google Scholar] [CrossRef]
  11. Shortliffe, E.H.; Buchanan, B.G. A model of inexact reasoning in medicine. Math. Biosci. 1975, 23, 351–379. [Google Scholar] [CrossRef]
  12. Campbell, M.; Hoane, A.J.; Hsu, F.-h. Deep Blue. Artif. Intell. 2002, 134, 57–83. [Google Scholar] [CrossRef]
  13. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 4–9 December 2017; NIPS’17. pp. 6000–6010. [Google Scholar]
  15. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
  16. Garcez, A.d.; Lamb, L.C. Neurosymbolic AI: The 3rd Wave. Artif. Intell. Rev. 2022, 55, 2245–2274. [Google Scholar] [CrossRef]
  17. Bengio, Y. From System 1 Deep Learning to System 2 Deep Learning. Available online: https://neurips.cc/virtual/2019/invited-talk/15488 (accessed on 11 December 2019).
  18. Zhang, B.; Zhu, J.; Hang, S. Toward the third generation of artificial intelligence. Sci. Sin. (Informationis) 2020, 9, 1281–1302. [Google Scholar] [CrossRef]
19. Evans, R.; Grefenstette, E. Learning explanatory rules from noisy data. J. Artif. Intell. Res. 2018, 61, 1–64.
20. Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; Wang, P. K-BERT: Enabling Language Representation with Knowledge Graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 2901–2908.
21. Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’13), Red Hook, NY, USA, 5–10 December 2013; Volume 2, pp. 2787–2795.
22. Mao, J.; Gan, C.; Kohli, P.; Tenenbaum, J.B.; Wu, J. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. arXiv 2019, arXiv:1904.12584.
23. Besold, T.R.; d’Avila Garcez, A.; Bader, S.; Bowman, H.; Domingos, P.; Hitzler, P.; Kuehnberger, K.U.; Lamb, L.C.; Lowd, D.; Lima, P.M.V.; et al. Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. In Neuro-Symbolic Artificial Intelligence: The State of the Art; IOS Press: Amsterdam, The Netherlands, 2017.
24. Wan, Z.; Liu, C.K.; Yang, H.; Li, C.; You, H.; Fu, Y.; Wan, C.; Krishna, T.; Lin, Y.; Raychowdhury, A. Towards Cognitive AI Systems: A Survey and Prospective on Neuro-Symbolic AI. arXiv 2024, arXiv:2401.01040.
25. Li, Z.Z.; Zhang, D.; Zhang, M.L.; Zhang, J.; Liu, Z.; Yao, Y.; Xu, H.; Zheng, J.; Wang, P.J.; Chen, X.; et al. From System 1 to System 2: A Survey of Reasoning Large Language Models. arXiv 2025, arXiv:2502.17419.
26. Bock, H.H.; Diday, E. Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data; Springer: Berlin/Heidelberg, Germany, 2000.
27. Colmerauer, A.; Roussel, P. The birth of Prolog. In History of Programming Languages—II; Association for Computing Machinery: New York, NY, USA, 1996; pp. 331–367.
28. Newell, A.; Simon, H.A. GPS, a program that simulates human thought. In Computation & Intelligence: Collected Readings; American Association for Artificial Intelligence: Washington, DC, USA, 1995; pp. 415–428.
29. Michalski, R.S. Variable-valued logic and its applications to pattern recognition and machine learning. In Computer Science and Multiple-Valued Logic; Rine, D.C., Ed.; Elsevier: Amsterdam, The Netherlands, 1977; pp. 506–534.
30. Buchanan, B.G.; Feigenbaum, E.A. Dendral and meta-dendral: Their applications dimension. Artif. Intell. 1978, 11, 5–24.
31. Giarratano, J.C.; Riley, G. Expert Systems: Principles and Programming; Brooks/Cole Publishing Co.: Pacific Grove, CA, USA, 1989.
32. Forgy, C.L. Rete: A fast algorithm for the many pattern/many object pattern match problem. In Readings in Artificial Intelligence and Databases; Elsevier: Amsterdam, The Netherlands, 1989; pp. 547–559.
33. Reiter, R. A logic for default reasoning. Artif. Intell. 1980, 13, 81–132.
34. McCarthy, J. Circumscription—A form of non-monotonic reasoning. Artif. Intell. 1980, 13, 27–39.
35. Gelfond, M.; Lifschitz, V. Classical negation in logic programs and disjunctive databases. New Gener. Comput. 1991, 9, 365–385.
36. Doyle, J. A truth maintenance system. Artif. Intell. 1979, 12, 231–272.
37. Fikes, R.E.; Nilsson, N.J. STRIPS: A new approach to the application of theorem proving to problem solving. Artif. Intell. 1971, 2, 189–208.
38. Nilsson, N.J. Shakey the Robot; SRI International AI Center Technical Note; SRI International: Menlo Park, CA, USA, 1984.
39. Dung, P.M. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artif. Intell. 1995, 77, 321–357.
40. Modgil, S.; Prakken, H. The ASPIC+ framework for structured argumentation: A tutorial. Argum. Comput. 2014, 5, 31–62.
41. Brachman, R.J.; Schmolze, J.G. An overview of the KL-ONE knowledge representation system. Cogn. Sci. 1985, 9, 171–216.
42. Baader, F. The Description Logic Handbook: Theory, Implementation and Applications; Cambridge University Press: Cambridge, UK, 2003.
43. McGuinness, D.L.; Van Harmelen, F. OWL Web Ontology Language Overview; W3C Recommendation, 10 February 2004.
44. Hintikka, J. Knowledge and Belief: An Introduction to the Logic of the Two Notions; Texts in Philosophy; King’s College Publications: London, UK, 2005.
45. Pnueli, A. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science (SFCS 1977), Providence, RI, USA, 31 October–2 November 1977; pp. 46–57.
46. Emerson, E.A. Temporal and modal logic. In Formal Models and Semantics; Elsevier: Amsterdam, The Netherlands, 1990; pp. 995–1072.
47. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1988.
48. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.
49. Richardson, M.; Domingos, P. Markov logic networks. Mach. Learn. 2006, 62, 107–136.
50. De Raedt, L.; Kimmig, A.; Toivonen, H. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, 6–12 January 2007; pp. 2462–2467.
51. Sato, T.; Kameya, Y. PRISM: A language for symbolic-statistical modeling. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, Japan, 23–29 August 1997; Volume 97, pp. 1330–1339.
52. Riguzzi, F. Extended semantics and inference for the Independent Choice Logic. Log. J. IGPL 2009, 17, 589–629.
53. Jaeger, M. Relational Bayesian networks. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI), Providence, RI, USA, 1–3 August 1997; pp. 266–273.
54. Getoor, L.; Taskar, B. Introduction to Statistical Relational Learning; MIT Press: Cambridge, MA, USA, 2007.
55. De Raedt, L.; Frasconi, P.; Kersting, K.; Muggleton, S. (Eds.) Probabilistic Inductive Logic Programming: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2008.
56. Michalski, R.S. Knowledge Acquisition Through Conceptual Clustering: A Theoretical Framework and an Algorithm for Partitioning Data into Conjunctive Concepts. J. Policy Anal. Inf. Syst. 1980, 4, 219–244.
57. Michalski, R.S.; Stepp, R.E. Automated construction of classifications: Conceptual clustering versus numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 1983, PAMI-5, 396–410.
58. Fisher, D.H. Knowledge Acquisition via Incremental Conceptual Clustering. Mach. Learn. 1987, 2, 139–172.
59. Pearl, J.; Mackenzie, D. The Book of Why: The New Science of Cause and Effect; Basic Books: New York, NY, USA, 2018.
60. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
61. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
62. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
63. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
64. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
65. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. arXiv 2018, arXiv:1812.08434.
66. Kok, S.; Sumner, M.; Richardson, M.; Singla, P.; Poon, H.; Lowd, D.; Domingos, P. The Alchemy System for Statistical Relational AI; Technical Report; Department of Computer Science and Engineering, University of Washington: Seattle, WA, USA, 2007. Available online: http://alchemy.cs.washington.edu (accessed on 20 April 2025).
67. Niu, F.; Zhang, C.; Ré, C.; Shavlik, J. Tuffy: Scaling up statistical inference in Markov Logic Networks using an RDBMS. Proc. VLDB Endow. 2011, 4, 373–384.
68. Dong, H.; Mao, J.; Lin, T.; Wang, C.; Li, L.; Zhou, D. Neural Logic Machines. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
69. Serafini, L.; d’Avila Garcez, A. Learning and Reasoning with Logic Tensor Networks. In Proceedings of the Conference of the Italian Association for Artificial Intelligence, Genova, Italy, 29 November 2016.
70. Zhou, Z.H. Abductive learning: Towards bridging machine learning and logical reasoning. Sci. China Inf. Sci. 2019, 62, 76101.
71. Andreas, J.; Rohrbach, M.; Darrell, T.; Klein, D. Neural Module Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 39–48.
72. Johnson, J.; Hariharan, B.; van der Maaten, L.; Hoffman, J.; Fei-Fei, L.; Zitnick, C.L.; Girshick, R. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2901–2910.
73. OpenAI. GPT-4 Technical Report. 2023. Available online: https://openai.com/research/gpt-4 (accessed on 24 April 2025).
74. Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. 2024. Available online: https://www.anthropic.com/news/claude-3-family (accessed on 17 April 2025).
75. Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
76. Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Hambro, E.; Zettlemoyer, L.; Cancedda, N.; Scialom, T. Toolformer: Language models can teach themselves to use tools. Adv. Neural Inf. Process. Syst. 2023, 36, 68539–68551.
77. Khattab, O.; Singhvi, A.; Maheshwari, P.; Zhang, Z.; Santhanam, K.; Vardhamanan, S.; Haq, S.; Sharma, A.; Joshi, T.T.; Moazam, H.; et al. DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv 2023, arXiv:2310.03714.
78. Wang, B.R.; Huang, Q.Y.; Deb, B.; Halfaker, A.; Shao, L.Q.; McDuff, D.; Awadallah, A.H.; Radev, D.; Gao, J.F. Logical Transformers: Infusing Logical Structures into Pre-Trained Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023.
79. Parmar, M.; Patel, N.; Varshney, N.; Nakamura, M.; Luo, M.; Mashetty, S.; Mitra, A.; Baral, C. LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 13679–13707.
80. Manhaeve, R.; Dumancic, S.; Kimmig, A.; Demeester, T.; De Raedt, L. DeepProbLog: Neural Probabilistic Logic Programming. In Advances in Neural Information Processing Systems; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
81. Yang, Z.; Ishay, A.; Lee, J. NeurASP: Embracing Neural Networks into Answer Set Programming. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 7–15 January 2020; Bessiere, C., Ed.; pp. 1755–1762.
82. Wang, Y.; Zeng, Y.; Zheng, J.; Xing, X.; Xu, J.; Xu, X. VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool. In Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), Bangkok, Thailand, 16 August 2024; Gu, J., Fu, T.J.R., Hudson, D., Celikyilmaz, A., Wang, W., Eds.; pp. 92–101.
83. Surís, D.; Menon, S.; Vondrick, C. ViperGPT: Visual Inference via Python Execution for Reasoning. arXiv 2023, arXiv:2303.08128.
84. Huang, Y.X.; Sun, Z.Q.; Li, G.Y.; Tian, X.B.; Dai, W.Z.; Hu, W.; Jiang, Y.; Zhou, Z.H. Enabling Abductive Learning to Exploit Knowledge Graph. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), Macao SAR, China, 19–25 August 2023; pp. 2730–2736.
85. Saparov, A.; He, H. Language modeling via stochastic processes and data augmentation. arXiv 2022, arXiv:2205.09310.
86. Marcus, G.; Davis, E. Rebooting AI: Building Artificial Intelligence We Can Trust; Pantheon Books: New York, NY, USA, 2019.
87. Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall: Hoboken, NJ, USA, 2010.
88. Robinson, J.A. A machine-oriented logic based on the resolution principle. J. ACM 1965, 12, 23–41.
89. Bach, S.H.; Broecheler, M.; Huang, B.; Getoor, L. Hinge-loss Markov random fields and probabilistic soft logic. J. Mach. Learn. Res. 2017, 18, 1–67.
90. Battaglia, P.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational inductive biases, deep learning, and graph networks. arXiv 2018, arXiv:1806.01261.
91. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
92. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D.; Ichter, B. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
93. Ganguli, D.; Hernandez, D.; Lovitt, L.; DasSarma, N.; Henighan, T.; Jones, A.; Joseph, N.; Kernion, J.; Mann, B.; Askell, A.; et al. Predictability and surprise in large generative models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, 21–24 June 2022; pp. 1747–1764.
94. Abiteboul, S.; Hull, R.; Vianu, V. Foundations of Databases; Addison-Wesley: Boston, MA, USA, 1995.
95. Baader, F.; Horrocks, I.; Sattler, U. The Description Logic Handbook; Cambridge University Press: Cambridge, UK, 2010.
96. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286.
97. De Moura, L.; Bjørner, N. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS); Springer: Berlin/Heidelberg, Germany, 2008; pp. 337–340.
98. Muggleton, S. Inverse entailment and Progol. New Gener. Comput. 1995, 13, 245–286.
99. Cropper, A.; Muggleton, S. Learning efficient logic programs. In Proceedings of the International Conference on Inductive Logic Programming (ILP), London, UK, 4–6 September 2016.
100. Falkenhainer, B.; Forbus, K.; Gentner, D. The structure-mapping engine: Algorithm and examples. Artif. Intell. 1989, 41, 1–63.
101. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; pp. 8748–8763.
102. Kaelbling, L.P.; Littman, M.L.; Cassandra, A.R. Planning and acting in partially observable stochastic domains. Artif. Intell. 1998, 101, 99–134.
103. Palmirani, M.; Governatori, G.; Rotolo, A.; Sartor, G.; Tabet, S.; Boley, H. LegalRuleML: XML-based rules and norms. In Proceedings of the Rule-Based Modeling and Computing on the Semantic Web (RuleML), Ft. Lauderdale, FL, USA, 3–5 November 2011; pp. 298–312.
104. Yang, Z.; Qi, P.; Zhang, S.; Bengio, Y.; Cohen, W.W.; Salakhutdinov, R.; Manning, C.D. HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv 2018, arXiv:1809.09600.
105. Reddy, S.; Chen, D.; Manning, C.D. CoQA: A conversational question answering challenge. Trans. Assoc. Comput. Linguist. 2019, 7, 249–266.
106. Taylor, R.; Kardas, M.; Cucurull, G.; Scialom, T.; Hartshorn, A.; Saravia, E.; Poulton, A.; Kerkez, V.; Stojnic, R. Galactica: A large language model for science. arXiv 2022, arXiv:2211.09085.
107. Goodwin, T.; Demner-Fushman, D. Enhancing Question Answering by Injecting Ontological Knowledge through Regularization. In Proceedings of the Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, Online, 19–20 November 2020; pp. 56–63.
108. Valmeekam, K.; Ellis, K.; Solar-Lezama, A.; Tenenbaum, J.B. DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning. In Proceedings of the NeurIPS, New Orleans, LA, USA, 28 November–9 December 2022.
109. Morel, R.; Cropper, A.; Ong, C.-H.L. Typed meta-interpretive learning of logic programs. In Proceedings of the Logics in Artificial Intelligence (JELIA), Rende, Italy, 7–11 May 2019; pp. 198–213.
110. Saad, F.A.; Cusumano-Towner, M.F.; Schaechtle, U.; Rinard, M.C.; Mansinghka, V.K. Bayesian synthesis of probabilistic programs for automatic data modeling. Proc. ACM Program. Lang. 2019, 3, 1–32.
111. Tai, K.S.; Socher, R.; Manning, C.D. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), Beijing, China, 26–31 July 2015; pp. 1556–1566.
112. Barua, A.; Widmer, C.; Hitzler, P. Concept Induction Using LLMs: A User Experiment for Assessment. In Proceedings of the International Conference on Neural-Symbolic Learning and Reasoning (NeSy), Barcelona, Spain, 9–12 September 2024.
113. Yi, K.; Wu, J.; Gan, C.; Torralba, A.; Kohli, P.; Tenenbaum, J. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding. In Proceedings of the NeurIPS, Montréal, QC, Canada, 3–8 December 2018.
114. Reiter, R. A theory of diagnosis from first principles. Artif. Intell. 1987, 32, 57–95.
115. Ramírez, M.; Geffner, H. Plan Recognition as Planning. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, CA, USA, 11–17 July 2009; pp. 1778–1783.
116. Glória-Silva, D.; Ferreira, R.; Tavares, D.; Semedo, D.; Magalhaes, J. Plan-Grounded Large Language Models for Dual Goal Conversational Settings. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Valletta, Malta, 21–22 March 2024; pp. 1271–1292.
117. Liang, B.; Su, Q.; Zhu, S.; Liang, Y.; Tong, C. VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos. Proc. AAAI Conf. Artif. Intell. 2025, 39, 5128–5136.
118. Shutova, E.; Sun, L.; Korhonen, A. Metaphor identification using verb and noun clustering. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China, 23–27 August 2010; pp. 1002–1010.
119. Borgwardt, K.; Kriegel, H.P. Shortest-path kernels on graphs. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM), Houston, TX, USA, 27–30 November 2005; pp. 74–81.
120. Du, X.; Ji, H. Retrieval-augmented generative question answering for event argument extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 4567–4578.
121. Webb, T.; Holyoak, K.J.; Lu, H. Emergent analogical reasoning in large language models. Nat. Hum. Behav. 2023, 7, 1526–1541.
122. Bengio, Y. The consciousness prior. arXiv 2017, arXiv:1709.08568.
123. LeCun, Y. A path towards autonomous machine intelligence. arXiv 2022, arXiv:2206.06927.
124. Garcez, A.d.; Gori, M.; Lamb, L.C.; Serafini, L.; Spranger, M.; Tran, S.N. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv 2019, arXiv:1905.06088.
125. Yang, F.; Yang, Z.; Cohen, W.W. Differentiable learning of logical rules for knowledge base reasoning. Adv. Neural Inf. Process. Syst. 2017, 30, 2319–2328.
126. Purgał, S.J.; Cerna, D.M.; Kaliszyk, C. Differentiable inductive logic programming in high-dimensional space. arXiv 2022, arXiv:2208.06652.
127. Gao, K.; Inoue, K.; Cao, Y.; Wang, H. Learning first-order rules with differentiable logic program semantics. arXiv 2022, arXiv:2204.13570.
128. Shindo, H.; Nishino, M.; Yamamoto, A. Differentiable inductive logic programming for structured examples. Proc. AAAI Conf. Artif. Intell. 2021, 35, 5034–5041.
129. Gao, K.; Inoue, K.; Cao, Y.; Wang, H. A Differentiable First-Order Rule Learner for Inductive Logic Programming. Artif. Intell. 2024, 331, 104108.
130. Rocktäschel, T.; Riedel, S. End-to-end differentiable proving. Adv. Neural Inf. Process. Syst. 2017, 30, 3788–3800.
131. Cohen, W.W. TensorLog: A differentiable deductive database. arXiv 2016, arXiv:1605.06523.
132. Zimmer, M.; Feng, X.; Glanois, C.; Jiang, Z.; Zhang, J.; Weng, P.; Li, D.; Hao, J.; Liu, W. Differentiable logic machines. arXiv 2021, arXiv:2102.11529.
133. Shindo, H.; Pfanschilling, V.; Dhami, D.S.; Kersting, K. Learning differentiable logic programs for abstract visual reasoning. Mach. Learn. 2024, 113, 8533–8584.
134. Takemura, A.; Inoue, K. Differentiable Logic Programming for Distant Supervision. In Proceedings of the European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 19–24 October 2024.
135. Serafini, L.; Garcez, A.d. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. arXiv 2016, arXiv:1606.04422.
136. Riegel, R.; Gray, A.; Luus, F.; Khan, N.; Makondo, N.; Akhalwaya, I.Y.; Qian, H.; Fagin, R.; Barahona, F.; Sharma, U.; et al. Logical Neural Networks. arXiv 2020, arXiv:2006.13155.
137. Šourek, G.; Železný, F.; Kuželka, O. Beyond graph neural networks with lifted relational neural networks. Mach. Learn. 2021, 110, 1695–1738.
138. Geh, R.L.; Gonçalves, J.; Silveira, I.C.; Mauá, D.D.; Cozman, F.G. dPASP: A probabilistic logic programming environment for neurosymbolic learning and reasoning. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, Hanoi, Vietnam, 2–8 November 2024; Volume 21, pp. 731–742.
139. Li, Z.; Huang, J.; Naik, M. Scallop: A Language for Neurosymbolic Programming. Proc. ACM Program. Lang. 2023, 7, 1463–1487.
140. Shindo, H.; Pfanschilling, V.; Dhami, D.S.; Kersting, K. αILP: Thinking visual scenes as differentiable logic programs. Mach. Learn. 2023, 112, 1465–1497.
141. Peirce, C.S. Collected Papers of Charles Sanders Peirce; Harvard University Press: Cambridge, MA, USA, 1935; Volume 5.
142. Poole, D. A logical framework for default reasoning. Artif. Intell. 1988, 36, 27–47.
143. Dai, W.Z.; Muggleton, S.H. Abductive Knowledge Induction from Raw Data. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 19–26 August 2021; pp. 2730–2736.
144. Shao, J.J.; Hao, H.R.; Yang, X.W.; Li, Y.F. Abductive Learning for Neuro-Symbolic Grounded Imitation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD ’25), Toronto, ON, Canada, 3–7 August 2025; pp. 1221–1232.
145. Hu, Y.Y.; Yu, Y. Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library. arXiv 2023, arXiv:2203.14487.
146. Hu, W.C.; Dai, W.Z.; Jiang, Y.; Zhou, Z.H. Efficient rectification of neuro-symbolic reasoning inconsistencies by abductive reflection. Proc. AAAI Conf. Artif. Intell. 2025, 39, 17333–17341.
147. Sun, Z.H.; Zhang, R.Y.; Zhen, Z.; Wang, D.H.; Li, Y.J.; Wan, X.; You, H. Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture. arXiv 2025, arXiv:2501.11896.
148. Camposampiero, G.; Hersche, M.; Terzić, A.; Wattenhofer, R.; Sebastian, A.; Rahimi, A. Towards Learning Abductive Reasoning Using VSA Distributed Representations. In Neural-Symbolic Learning and Reasoning; Besold, T.R., d’Avila Garcez, A., Jimenez-Ruiz, E., Confalonieri, R., Madhyastha, P., Wagner, B., Eds.; Springer Nature: Cham, Switzerland, 2024; pp. 370–385.
149. Hu, S.; Ma, Y.; Liu, X.; Wei, Y.; Bai, S. Stratified Rule-Aware Network for Abstract Visual Reasoning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 1561–1569.
150. Jin, Y.; Liu, J.; Luo, Z.; Peng, Y.; Qin, Z.; Dai, W.Z.; Ding, Y.X.; Zhou, K. Pre-Training Meta-Rule Selection Policy for Visual Generative Abductive Learning. arXiv 2025, arXiv:2503.06427.
151. Yang, X.W.; Wei, W.D.; Shao, J.J.; Li, Y.F.; Zhou, Z.H. Analysis for Abductive Learning and Neural-Symbolic Reasoning Shortcuts. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; Volume 235, pp. 56524–56541.
152. Gupta, T.; Kembhavi, A. Visual Programming: Compositional visual reasoning without training. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023.
153. Kamali, D.; Barezi, E.J.; Kordjamshidi, P. NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization. Proc. AAAI Conf. Artif. Intell. 2025, 39, 4184–4193.
154. Liang, J.; Huang, W.; Xia, F.; Xu, P.; Hausman, K.; Ichter, B.; Florence, P.; Zeng, A. Code as Policies: Language Model Programs for Embodied Control. In Proceedings of the 6th Conference on Robot Learning (CoRL), Auckland, New Zealand, 14–18 December 2022.
155. Shin, R.; Kant, N.; Gupta, K.; Bender, C.; Trabucco, B.; Singh, R.; Song, D. Synthetic Datasets for Neural Program Synthesis. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
156. Ellis, K.; Wong, L.; Nye, M.; Sable-Meyer, M.; Cary, L.; Anaya Pozo, L.; Hewitt, L.; Solar-Lezama, A.; Tenenbaum, J.B. DreamCoder: Growing generalizable, interpretable knowledge with wake–sleep Bayesian program learning. Philos. Trans. R. Soc. A 2023, 381, 20220050.
157. Khan, R.M.; Gulwani, S.; Le, V.; Radhakrishna, A.; Tiwari, A.; Verbruggen, G. LLM-Guided Compositional Program Synthesis. arXiv 2025, arXiv:2503.15540.
158. Liang, C.; Berant, J.; Le, Q.; Forbus, K.D.; Lao, N. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. arXiv 2016, arXiv:1611.00020.
159. Duan, X.; Wang, X.; Zhao, P.; Shen, G.; Zhu, W. DeepLogic: Joint Learning of Neural Perception and Logical Reasoning. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4321–4334.
160. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113.
161. Patil, S.G.; Zhang, T.; Wang, X.; Gonzalez, J.E. Gorilla: Large Language Model Connected with Massive APIs. arXiv 2023, arXiv:2305.15334.
162. Gravitas, S. AutoGPT: An Autonomous GPT-4 Experiment. GitHub Repository. 2023. Available online: https://github.com/Torantulino/Auto-GPT (accessed on 20 May 2025).
163. Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Chowdhery, A.; Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv 2022, arXiv:2203.11171.
164. Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q.; et al. Least-to-most prompting enables complex reasoning in large language models. arXiv 2022, arXiv:2205.10625.
165. Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 2023, 36, 11809–11822.
166. Liu, X.; Yu, H.; Zhang, H.; Xu, Y.; Lei, X.; Lai, H.; Gu, Y.; Ding, H.; Men, K.; Yang, K.; et al. AgentBench: Evaluating LLMs as agents. arXiv 2023, arXiv:2308.03688.
167. Lewkowycz, A.; Andreassen, A.; Dohan, D.; Dyer, E.; Michalewski, H.; Ramasesh, V.; Slone, A.; Anil, C.; Schlag, I.; Gutman-Solo, T.; et al. Solving quantitative reasoning problems with language models. Adv. Neural Inf. Process. Syst. 2022, 35, 3843–3857.
168. AtlasUnified. PyCoT: A Pythonic Chain-of-Thought Dataset Series; Comprehensive CoT Expansions Across Multiple Domains. Hugging Face. 2025. Available online: https://huggingface.co/datasets/AtlasUnified/PyCoT-Collection_Main (accessed on 1 May 2025).
169. Chase, H. LangChain: Building Applications with LLMs Through Composability. 2023. Available online: https://www.langchain.com/ (accessed on 1 May 2025).
170. Hong, S.; Zheng, X.; Chen, J.; Cheng, Y.; Wang, J.; Zhang, C.; Wang, Z.; Yau, S.K.S.; Lin, Z.; Zhou, L.; et al. MetaGPT: Meta programming for multi-agent collaborative framework. arXiv 2023, arXiv:2308.00352.
171. Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; Ghanem, B. CAMEL: Communicative agents for “mind” exploration of large language model society. Adv. Neural Inf. Process. Syst. 2023, 36, 51991–52008.
172. Miranda, B.; Shinnar, A.; Pestun, V.; Trager, B. Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code. arXiv 2023, arXiv:2304.10500.
173. Zhou, W.; Le Bras, R.; Choi, Y. Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; pp. 10452–10465.
174. Herzig, J.; Berant, J. Span-based semantic parsing for compositional generalization. arXiv 2020, arXiv:2009.06040.
175. Poulis, A.; Tsalapati, E.; Koubarakis, M. Transformers in the Service of Description Logic-Based Contexts. In Knowledge Engineering and Knowledge Management; Springer Nature: Cham, Switzerland, 2025; pp. 328–345.
176. Brinkmann, B.J.; Smith, J.D.; Lee, C.M. How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis. In Proceedings of the Workshop on Mechanistic Interpretability of Neural Networks at ICLR, Vienna, Austria, 27 July 2024; workshop paper.
177. Wang, X.; Gao, T.; Zhu, Z.; Zhang, Z.; Liu, Z.; Li, J.; Tang, J. KEPLER: A Unified Model for Knowledge Embedding and Pre-Trained Language Representation. Trans. Assoc. Comput. Linguist. 2021, 9, 176–194.
178. Winters, T.; Marra, G.; Manhaeve, R.; De Raedt, L. DeepStochLog: Neural Stochastic Logic Programming. Proc. AAAI Conf. Artif. Intell. 2022, 36, 10090–10100.
179. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
180. Luo, L.; Zhao, Z.; Gong, C.; Haffari, G.; Pan, S. Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models. arXiv 2024, arXiv:2410.13080.
181. Mondal, D.; Modi, S.; Panda, S.; Singh, R.; Rao, G.S. KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning. Proc. AAAI Conf. Artif. Intell. 2024, 38, 18798–18806.
182. Khan, M.J.; Breslin, J.G.; Curry, E. NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment. Semant. Web 2023, 15, 1389–1413.
183. Verma, A.; Murali, V.; Singh, R.; Kohli, P.; Chaudhuri, S. Programmatically Interpretable Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5045–5054.
184. Kimura, D.; Chaudhury, S.; Ono, M.; Tatsubori, M.; Agravante, D.J.; Munawar, A.; Wachi, A.; Kohita, R.; Gray, A. LOA: Logical Optimal Actions for Text-based Interaction Games. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, Online, 1–6 August 2021; pp. 227–231.
185. Delfosse, Q.; Shindo, H.; Dhami, D.S.; Kersting, K. Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 10–16 December 2023.
186. Yang, M.; Liu, F.; Chen, Z.; Shen, X.; Hao, J.; Wang, J. CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 9593–9602.
187. Wang, L.; He, Z.; Dang, R.; Shen, M.; Liu, C.; Chen, Q. Vision-and-Language Navigation via Causal Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
188. Liang, W.; Kekić, A.; von Kügelgen, J.; Buchholz, S.; Besserve, M.; Gresele, L.; Schölkopf, B. Causal Component Analysis. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
189. Tafjord, O.; Dalvi, B.; Clark, P. ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; pp. 3621–3634.
190. Han, S.; Schoelkopf, H.; Zhao, Y.; Qi, Z.; Riddell, M.; Zhou, W.; Coady, J.; Peng, D.; Qiao, Y.; Benson, L.; et al. FOLIO: Natural Language Reasoning with First-Order Logic. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; pp. 22017–22031.
191. Talmor, A.; Herzig, J.; Lourie, N.; Berant, J. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; pp. 4149–4158.
192. Huang, L.; Le Bras, R.; Bhagavatula, C.; Choi, Y. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; pp. 2391–2401.
193. Mihaylov, T.; Clark, P.; Khot, T.; Sabharwal, A. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J., Eds.; pp. 2381–2391.
194. Bhagavatula, C.; Bras, R.L.; Malaviya, C.; Sakaguchi, K.; Holtzman, A.; Rashkin, H.; Downey, D.; Yih, S.W.t.; Choi, Y. Abductive commonsense reasoning. arXiv 2019, arXiv:1908.05739.
195. Du, L.; Ding, X.; Liu, T.; Qin, B. Learning Event Graph Knowledge for Abductive Reasoning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021.
196. Bondarenko, A.; Wolska, M.; Heindorf, S.; Blübaum, L.; Ngonga Ngomo, A.C.; Stein, B.; Braslavski, P.; Hagen, M.; Potthast, M. CausalQA: A Benchmark for Causal Question Answering. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; Calzolari, N., Huang, C.R., Kim, H., Pustejovsky, J., Wanner, L., Choi, K.S., Ryu, P.M., Chen, H.H., Donatelli, L., Ji, H., et al., Eds.; pp. 3296–3308.
197. Zhang, M.; Choi, E. SituatedQA: Incorporating Extra-Linguistic Contexts into QA. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.t., Eds.; pp. 7371–7387.
198. Camburu, O.M.; Rocktäschel, T.; Lukasiewicz, T.; Blunsom, P. e-SNLI: Natural Language Inference with Natural Language Explanations. In Advances in Neural Information Processing Systems; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
199. Dalvi, B.; Jansen, P.; Tafjord, O.; Xie, Z.; Smith, H.; Pipatanangkura, L.; Clark, P. Explaining Answers with Entailment Trees. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.t., Eds.; pp. 7358–7370.
200. Estermann, B.; Lanzendörfer, L.A.; Niedermayr, Y.; Wattenhofer, R. PUZZLES: A Benchmark for Neural Algorithmic Reasoning. In Advances in Neural Information Processing Systems; Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2024; Volume 37, pp. 127059–127098.
201. Bortolotti, S.; Marconato, E.; Carraro, T.; Morettin, P.; van Krieken, E.; Vergari, A.; Teso, S.; Passerini, A. A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 9–15 December 2024; pp. 1–44.
202. Mirchandani, R.; Sundar, K.; Mohamed, S.; Krueger, D. SCERL: A Text-based Safety Benchmark for Reinforcement Learning Problems. In Proceedings of the NeurIPS Datasets and Benchmarks Track, Virtual, 28 November 2022.
203. James, S.; Ma, Z.; Rovick Arrojo, D.; Davison, A.J. RLBench: The Robot Learning Benchmark & Learning Environment. IEEE Robot. Autom. Lett. 2020, 5, 3019–3026.
204. Hudson, D.A.; Manning, C.D. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6700–6709.
205. Park, D.H.; Hendricks, L.A.; Akata, Z.; Rohrbach, A.; Schiele, B.; Darrell, T.; Rohrbach, M. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
206. Yi, K.; Gan, C.; Li, Y.; Kohli, P.; Wu, J.; Torralba, A.; Tenenbaum, J.B. CLEVRER: Collision events for video representation and reasoning. arXiv 2019, arXiv:1910.01442.
207. Xiao, J.; Shang, X.; Yao, A.; Chua, T.S. NExT-QA: Next phase of question-answering to explaining temporal actions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9777–9786.
208. Zawalski, M.; Chen, W.; Pertsch, K.; Mees, O.; Finn, C.; Levine, S. Robotic Control via Embodied Chain-of-Thought Reasoning. arXiv 2024, arXiv:2407.08693.
209. Huang, W.; Xia, F.; Xiao, T.; Chan, H.; Liang, J.; Florence, P.; Zeng, A.; Tompson, J.J.R.; Mordatch, I.; Chebotar, Y.; et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In Proceedings of the Conference on Robot Learning (CoRL), Auckland, New Zealand, 14–18 December 2022.
210. Yu, T.; Zhang, R.; Yang, K.; Yasunaga, M.; Wang, D.; Li, Z.; Ma, J.; Li, I.; Yao, Q.; Roman, S.; et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. arXiv 2018, arXiv:1809.08887.
211. Price, P.; Fisher, W.M.; Bernstein, J.; Pallett, D.S. The DARPA 1000-word resource management database for continuous speech recognition. In Proceedings of the ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing, New York, NY, USA, 11–14 April 1988; pp. 651–652.
212. Zhang, Y.; Deriu, J.; Katsogiannis-Meimarakis, G.; Kosten, C.; Koutrika, G.; Stockinger, K. ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems. In Proceedings of the VLDB Endowment, Vancouver, BC, Canada, 28 August–1 September 2023.
213. Chen, B.; Zhang, F.; Nguyen, A.; Zan, D.; Lin, Z.; Lou, J.G.; Chen, W. CodeT: Code Generation with Generated Tests. In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
214. Lin, X.V.; Wang, C.; Zettlemoyer, L.; Ernst, M.D. NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018; Calzolari, N., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., et al., Eds.
215. Amini, A.; Gabriel, S.; Lin, P.; Chaturvedi, S.; Farhadi, A.; Hajishirzi, H. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN, USA, 2–7 June 2019.
216. Zhou, S.; Xu, F.F.; Zhu, H.; Zhou, X.; Lo, R.; Sridhar, A.; Cheng, X.; Ou, T.; Bisk, Y.; Fried, D.; et al. WebArena: A realistic web environment for building autonomous agents. arXiv 2023, arXiv:2307.13854.
217. Oh, J.H.; Kadowaki, K.; Kloetzer, J.; Iida, R.; Torisawa, K. Open-Domain Why-Question Answering with Adversarial Learning to Encode Answer Texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Korhonen, A., Traum, D., Màrquez, L., Eds.; pp. 4227–4237.
218. Xu, Q.; Hong, F.; Li, B.; Hu, C.; Chen, Z.; Zhang, J. On the Tool Manipulation Capability of Open-source Large Language Models. arXiv 2023, arXiv:2305.16504.
219. Nakano, R.; Hilton, J.; Balaji, S.; Wu, J.; Ouyang, L.; Kim, C.; Hesse, C.; Jain, S.; Kosaraju, V.; Saunders, W.; et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv 2021, arXiv:2112.09332.
220. Suhr, A.; Zhou, S.; Zhang, A.; Zhang, I.; Bai, H.; Artzi, Y. A corpus for reasoning about natural language grounded in photographs. arXiv 2018, arXiv:1811.00491.
221. Thrush, T.; Jiang, R.; Bartolo, M.; Singh, A.; Williams, A.; Kiela, D.; Ross, C. Winoground: Probing vision and language models for visio-linguistic compositionality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5238–5248.
222. Zhong, V.; Xiong, C.; Socher, R. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv 2017, arXiv:1709.00103.
223. Yu, T.; Zhang, R.; Er, H.Y.; Li, S.; Xue, E.; Pang, B.; Lin, X.V.; Tan, Y.C.; Shi, T.; Li, Z.; et al. CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. arXiv 2019, arXiv:1909.05378.
224. Dries, A.; Kimmig, A.; Meert, W.; Renkens, J.; Van den Broeck, G.; Vlasselaer, J.; De Raedt, L. ProbLog2: Probabilistic Logic Programming. In Machine Learning and Knowledge Discovery in Databases; Bifet, A., May, M., Zadrozny, B., Gavalda, R., Pedreschi, D., Bonchi, F., Cardoso, J., Spiliopoulou, M., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 312–315.
225. Gebser, M.; Kaminski, R.; Kaufmann, B.; Schaub, T. clingo = ASP + Control: Preliminary Report. arXiv 2014, arXiv:1405.3694.
226. Leone, N.; Pfeifer, G.; Faber, W.; Eiter, T.; Gottlob, G.; Perri, S.; Scarcello, F. The DLV system for knowledge representation and reasoning. ACM Trans. Comput. Log. 2006, 7, 499–562.
227. Byrd, W.E.; Holk, E.; Friedman, D.P. miniKanren, live and untagged: Quine generation via relational interpreters (programming pearl). In Proceedings of the 2012 Annual Workshop on Scheme and Functional Programming, Copenhagen, Denmark, 9–15 September 2012; pp. 8–29.
228. Manandhar, S.; Džeroski, S.; Erjavec, T. Learning multilingual morphology with CLOG. In International Conference on Inductive Logic Programming; Springer: Berlin/Heidelberg, Germany, 1998; pp. 135–144.
229. Gebser, M.; Kaminski, R.; Kaufmann, B.; Schaub, T. Answer Set Solving in Practice; Springer Nature: Cham, Switzerland, 2022.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
