Article

Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies

1 Department of Science and Engineering, The George Washington University, Washington, DC 20052, USA
2 Connected and Autonomous Vehicles Lab, Department of Mechanical Engineering, University of Surrey, Guildford GU2 7XH, UK
3 Mechanical Engineering Department, Istanbul University-Cerrahpasa, Istanbul 34320, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(23), 12464; https://doi.org/10.3390/app152312464
Submission received: 10 October 2025 / Revised: 14 November 2025 / Accepted: 19 November 2025 / Published: 24 November 2025
(This article belongs to the Special Issue Intelligent Vehicle Collaboration and Positioning)

Abstract

Current imitation learning approaches, predominantly based on deep neural networks (DNNs), offer efficient mechanisms for learning driving policies from real-world datasets. However, they suffer from inherent limitations in interpretability and generalizability—issues of critical importance in safety-critical domains such as autonomous driving. In this paper, we introduce Symbolic Imitation Learning (SIL), a novel framework that leverages Inductive Logic Programming (ILP) to derive explainable and generalizable driving policies from synthetic datasets. We evaluate SIL on real-world HighD and NGSim datasets, comparing its performance with state-of-the-art neural imitation learning methods using metrics such as collision rate, lane change efficiency, and average speed. The results indicate that SIL significantly enhances policy transparency while maintaining strong performance across varied driving conditions. These findings highlight the potential of integrating ILP into imitation learning to promote safer and more reliable autonomous systems.

1. Introduction

The development of autonomous driving technologies has elevated transportation systems to a new level, promising safer and more efficient roadways. Among the various techniques used to enable autonomous vehicles (AVs), imitation learning has emerged as a promising approach due to its ability to learn complex driving behaviors from expert demonstrations [1]. By leveraging large-scale driving datasets and deep neural networks (DNNs), imitation learning has demonstrated remarkable success in training autonomous agents to emulate human driving behaviors [2,3,4,5,6].
Despite its impressive performance, DNN-based imitation learning (DNNIL), often implemented using Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) [7], inherits critical limitations from DNNs that hinder its widespread adoption in real-world autonomous systems. A prominent drawback is the lack of interpretability in the learned driving policies due to the black-box nature of DNNs [8,9], making the decision-making process of autonomous driving difficult to verify and validate. This lack of interpretability not only limits human ability to diagnose failures or errors but also hampers the trust and acceptance of autonomous systems by society and regulatory bodies [10].
Moreover, the generalizability of DNNIL driving policies remains a concern. Although these models can learn to imitate expert drivers in specific scenarios, adapting the learned policies to unseen situations can be problematic, as DNNIL is limited to behaviors observed during training [11]. Consequently, when there is a mismatch between test and training data distributions [12], the models often exhibit limited knowledge of novel situations. The rigid nature of these policies frequently leads to suboptimal or unsafe behavior when encountering unfamiliar traffic conditions. Finally, although more sample-efficient than reinforcement learning, DNNIL methods still suffer from data inefficiency, requiring millions of state-action pairs for effective learning [13,14]. These limitations call for further research aimed at addressing the transparency, generalizability, and data efficiency challenges of imitation learning.
To improve the transparency and interpretability of imitation learning, explainable AI (XAI) methods [15,16,17] often employ one of the following approaches: white-box (symbolic) models, explainable neural networks, or neurosymbolic frameworks. In white-box models, for example, a learning framework combining imitation learning and logical automata was proposed by Leech [18] to represent problems as compact finite-state machines with human-interpretable logic states. Additionally, Bewley et al. [19] employed decision trees to interpret emulated policies using only inputs and outputs, while [20] leveraged a hierarchical approach to ensure interpretability. On the other hand, pixel-wise CNN-based methods, which are often used in computer vision applications, capture high-level features using heatmaps and their implications [21].
Furthermore, neurosymbolic learning methods—considered among the cutting-edge—aim to combine the learning capabilities of DNNs with symbolic reasoning [22,23]. Most neurosymbolic methods employ symbolic, logic-based reasoning. They extract domain-specific logical rules using various rule-generation techniques [24,25]. As such, they are recognized as sample-efficient approaches that exhibit strong generalizability [26].
Each of the three aforementioned approaches has its own advantages and limitations. Neurosymbolic frameworks such as the Differentiable Logic Machine (DLM) [25,27] integrate differentiable reasoning layers into deep neural architectures, enabling end-to-end training. However, they often require large-scale labeled data and may sacrifice full symbolic transparency due to the latent nature of their learned predicates. Pixel-wise CNN-based techniques are commonly applied in visual domains and also need large-scale datasets. In contrast, white-box models can provide explicit logical expressions behind learned behaviors using limited data. Yet, models based on finite-state machines and decision trees often struggle with scalability and problem complexity in challenging tasks.
As a white-box symbolic approach, search-based heuristic methods such as Inductive Logic Programming (ILP) [28,29] have demonstrated the ability to efficiently extract abstract rules from a small number of examples when background knowledge is provided. Unlike finite-state machines, ILP-based approaches can scale better and handle more complex tasks by effectively searching for rules that satisfy the given examples. ILP can be employed to imitate human behavior using a limited number of examples; however, it has not yet been applied in the context of autonomous driving.
In this paper, we propose a novel rule-based imitation learning technique, called SIL, which is the first purely ILP-based imitation learning method for autonomous driving. This method aims to generate explainable driving policies instead of black-box, non-interpretable ones by extracting symbolic first-order rules from human-labeled driving scenarios, using basic background knowledge provided by humans. It addresses the transparency and generalizability challenges associated with current neural network-based imitation learning methods. By extracting abstract logical relationships between states and actions in autonomous highway driving, SIL aims to deliver transparent, interpretable, and adaptive driving policies that enhance the safety and reliability of autonomous driving systems.
The main contributions of this paper are:
  • We propose SIL, a logic-based imitation learning method that extracts the underlying logical rules governing human drivers’ actions in various scenarios. This approach enhances the transparency of learned driving policies by inducing human-readable and interpretable rules that capture essential aspects of safety, legality, and smoothness in driving behavior. Furthermore, SIL improves the generalizability of these policies, enabling AVs to handle diverse and challenging driving conditions.
  • We compare SIL with state-of-the-art neural-network-based imitation learning methods using real-world HighD and NGSim datasets, demonstrating how a symbolic approach can outperform neural methods—even when trained on a small number of synthetic scenarios labeled as examples for each action.
The remainder of the paper is organized as follows. Section 2 introduces the prerequisites of the method and Section 3 describes the proposed approach in general. Section 4 presents the simulation environment and experimental results, while Section 5 discusses and evaluates the outcomes. Finally, Section 6 concludes the paper.

2. Background

This section presents the theoretical and methodological background relevant to the proposed approach.

2.1. First-Order Logic

First-order logic is a formalism that uses facts and rules to represent knowledge and perform logical inferences [14]. In this framework, a rule consists of a head and a body, written as head :- body, where :- denotes the entailment operation [29]. The head represents an output predicate expressing a relationship between concepts, while the body specifies the conditions under which the head predicate holds.
Predicates are composed of a functor and a list of n arguments, written as functor/n, where each argument is either a variable or a constant. First-order logic rules typically follow the Horn clause structure, consisting of a single head literal and zero or more body literals:
$$H \;\text{:-}\; B_1, B_2, \ldots, B_n,$$
where $B_i$ ($i = 1, \ldots, n$) denotes the predicates in the body, and commas (,) indicate conjunctions (∧), consistent with Prolog syntax. The predicate $H$ serves as the conclusion of the rule. This structure implies that if all the $B_i$ predicates hold true conjunctively, then $H$ is true; otherwise, it is false.
First-order logic rules offer a robust mechanism for expressing complex relationships and performing logical reasoning based on structured knowledge.
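To make the notation concrete, the following is a small illustrative Prolog program; the head predicate safe_to_merge/1 and the scenario constant s1 are hypothetical and are not among the rules induced later in this paper.

:- dynamic right_isBusy/1.

% Facts describing one scenario: the right lane exists and is currently empty.
right_isValid(s1).

% Illustrative Horn clause: "merging right is allowed in scenario S if the
% right sector is free and the right lane is valid."
safe_to_merge(S) :-
    not(right_isBusy(S)),
    right_isValid(S).

% ?- safe_to_merge(s1).   % succeeds, since right_isBusy(s1) is not a known fact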

2.2. Inductive Logic Programming

Inductive Logic Programming (ILP) is a machine learning paradigm that combines first-order logic with inductive reasoning to learn symbolic rules from examples, including a set of positive examples and, optionally, negative examples [29]. These examples are typically represented as tuples of ground literals, where each ground literal is an atom with all variables instantiated to constants.
An ILP setup involves three key components: the language bias set B, the background knowledge set BK, and a set of examples E. The set B defines the hypothesis space—that is, the rules the ILP system considers during learning. It includes a desired head predicate along with multiple candidate body predicates and guides the search for rules that could explain the given examples. The set BK, which is domain-specific, provides additional contextual information, such as known relationships, constraints, or regularities, that can assist in inferring new rules. The set of examples E includes both positive examples E+ and negative examples E−, representing the observed instances from which the ILP system learns.
The primary goal of an ILP system is to discover hypotheses H that explain E+ while avoiding E−. The system begins by scanning E+ and attempting to match each example with every b ∈ B. If a candidate b covers a positive example according to BK, it is added to the list of learned hypotheses. In the next step, any hypothesis that also covers a negative example from E− is removed. The algorithm then refines the remaining hypotheses using E+ and BK. For each hypothesis that covers a positive example e+ ∈ E+, a more specific rule is generated by adding new literals to refine its coverage. Algorithm 1 presents the pseudo-code of the ILP process.
ILP leverages the strengths of first-order logic and inductive reasoning to learn interpretable and generalizable symbolic rules from a small set of examples E—even from a single example [29]. By iteratively refining candidate rules under the guidance of E+ and BK, ILP provides a principled framework for learning explainable policies in complex domains such as autonomous driving.
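As a minimal, textbook-style illustration of this setting (a classic family-relations example, not taken from the driving domain of this paper), an ILP system given the facts and labelled examples below could induce the final clause:

% Background knowledge (BK): ground facts.
parent(ann, bob).
parent(bob, carl).
parent(ann, beth).
parent(beth, dana).

% Examples (E) for the target predicate:
% pos(grandparent(ann, carl)).   pos(grandparent(ann, dana)).
% neg(grandparent(bob, ann)).

% A hypothesis an ILP system could induce from E and BK:
grandparent(X, Y) :- parent(X, Z), parent(Z, Y).

The induced clause covers both positive examples under the background facts and covers no negative example, which is exactly the acceptance criterion applied in Algorithm 1.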
Algorithm 1 Inductive Logic Programming (ILP)
Require: B (bias set), BK (background knowledge), E+ (positive examples), E− (negative examples)
Ensure: H (final set of learned hypotheses)
1:  Initialize H_old ← [ ]    ⊳ Start with an empty hypothesis set
2:  for each e+ ∈ E+ do
3:      for each b ∈ B do
4:          if b covers e+ under BK then
5:              Add b to H_old    ⊳ Retain b as a candidate hypothesis
6:          end if
7:      end for
8:  end for
9:  for each e− ∈ E− do
10:     for each h ∈ H_old do
11:         if h covers e− under BK then
12:             Drop h from H_old    ⊳ Eliminate hypotheses that incorrectly cover negative examples
13:         end if
14:     end for
15: end for
16: Initialize H_new ← [ ]    ⊳ Prepare to refine remaining hypotheses
17: for each e+ ∈ E+ do
18:     for each h ∈ H_old do
19:         if h covers e+ under BK then
20:             h_r ← refinement of h using e+    ⊳ Add new literals to make h more specific
21:             Add h_r to H_new
22:         end if
23:     end for
24: end for
25: H ← H_new    ⊳ Final set of refined, valid hypotheses

2.3. Imitation Learning

Imitation learning is a widely used technique in machine learning and robotics that enables an agent to acquire complex behaviors by imitating expert demonstrations. In deep neural network-based imitation learning (DNNIL), the agent uses deep neural networks (DNNs) to learn from a dataset $\mathcal{D}$ consisting of expert demonstrations, represented as state–action pairs $\{(s_1, a_1), (s_2, a_2), \ldots, (s_T, a_T)\}$. Here, $s_t$ denotes the state of the environment at time step $t$, and $a_t$ is the action taken by the expert in that state. The objective of DNNIL is to learn a policy $\pi_\theta(a|s)$, parameterized by $\theta$, that maps states to actions in a manner consistent with expert behavior.
A common approach to training the imitation policy is through supervised learning. The learning process involves minimizing the discrepancy between the actions predicted by the learned policy and the actions taken by the expert. This discrepancy is typically measured using a mean squared error (MSE) loss function, as shown in Equation (2). The aim is to identify the optimal parameters θ * that minimize this loss over the dataset D :
$$\theta^* = \arg\min_{\theta} \sum_{(s,a) \in \mathcal{D}} \left( \pi_\theta(a \mid s) - a \right)^2 \quad (2)$$
DNNIL has demonstrated significant success in various domains, including autonomous driving. By learning directly from expert drivers’ demonstrations, DNNIL agents are capable of handling complex traffic scenarios. However, one of the major challenges associated with DNNIL is its lack of interpretability. Since policies are encoded within deep networks, understanding the decision-making process becomes difficult. Moreover, DNNIL tends to perform poorly outside the distribution of the training data, resulting in limited generalization to unseen situations.

3. Methodology: Symbolic Imitation Learning

The Symbolic Imitation Learning (SIL) framework introduces a novel approach that leverages ILP to extract symbolic policies from human-generated background knowledge. The core objective of this framework is to replicate human behaviors by uncovering explicit rules that govern complex actions demonstrated by humans. As illustrated in Figure 1, SIL comprises three main components: knowledge acquisition, rule induction, and rule aggregation.
In the knowledge acquisition phase, essential inputs are provided based on prior knowledge about the environment. These include the language bias set B , background knowledge BK , and the set of examples E (refer to Algorithm 1). These components enable the ILP system to induce a single rule during the rule induction phase. To construct a complete policy, this process is repeated iteratively for all required rules, progressively assembling them into the hypothesis set H . The final rule aggregation component then utilizes and refines these induced rules to infer the desired actions. Since these logical rules are derived directly from human demonstrations, the resulting actions are inherently interpretable and human-like. By capitalizing on ILP’s ability to infer symbolic rules from structured examples, SIL effectively captures nuanced human behavior—an area where conventional DNNIL methods often face limitations.
One of the key advantages of SIL is its sample efficiency. It can generate a coherent and interpretable set of rules from a relatively small number of expert demonstrations. These rules not only support human-understandable decision-making but also enhance the system’s generalizability. The explicit, symbolic nature of the learned policies enables SIL-based agents to better adapt to previously unseen scenarios, outperforming black-box policies that typically lack both transparency and adaptability.
Symbolic Imitation Learning in Autonomous Driving: As a use-case scenario, SIL aims to derive unknown rules for autonomous highway driving. The primary objective of employing SIL in this context is to extract driving rules from human-derived background knowledge and use them to emulate human-like behavior. These behaviors include lane changes and adjustments to the AV's longitudinal velocity, all of which are essential for achieving safe, efficient, and smooth driving. During lane-change decisions, the AV can execute one of three discrete actions: lane keeping (LK), right lane change (RLC), or left lane change (LLC).
Background: Human drivers frequently adjust their lane position and velocity based on the positions and speeds of nearby vehicles, also referred to as target vehicles (TVs). Motivated by this principle, the proposed approach incorporates the relative positions and velocities of TVs surrounding the AV to support informed decision-making regarding lane changes and speed adjustments. To formalize this, the area surrounding the AV is partitioned into eight distinct sectors: front, frontRight, right, backRight, back, backLeft, left, and frontLeft, as shown in Figure 2. Each sector may either be occupied by a vehicle—indicated by the predicate sector_isBusy set to true—or unoccupied, in which case the predicate is set to false. For example, right_isBusy is true if there is a vehicle in the right sector; otherwise, it is set to false. Table A1 in the Appendix A.1 shows all the necessary predicates used in this research with their definitions. Additional details on predicate definitions are provided in [30].
To capture the relative longitudinal velocities of TVs, we compute the difference between each TV’s velocity and the AV’s velocity, then assign predicates accordingly. Each sector is associated with three predicates reflecting this velocity difference. If the relative velocity is greater (or smaller) than a predefined threshold η v , the predicate bigger (or lower) is used. Threshold η v ensures that velocity differences are significant rather than negligible, and is set to 5 km/h in this context. When the absolute value of the relative velocity is within the threshold range, it is considered equal. For example, the predicates frontVel_isBigger, frontVel_isEqual, and frontVel_isLower describe the relative velocity of the TV in the front sector with respect to the AV.
In addition, AVs—like human drivers—must remain within valid road sections (i.e., highway lanes) and avoid entering off-road or restricted areas. To represent this contextual constraint, each sector is also labeled as either valid (sector_isValid is set to true) or invalid (sector_isValid is set to false).
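For illustration, the symbolic description of one state could be grounded as the Prolog facts below. The scenario constant s1, the numeric helper rel_vel/3, and the velocity-predicate names for sectors other than front are assumptions of this sketch; the learned rules only ever see the derived predicates listed in Table A1.

% Ground facts for a hypothetical scenario s1: a slower vehicle ahead, a faster
% vehicle approaching in the back-left sector, and no valid right lane
% (e.g., the AV is already in the rightmost lane).
front_isBusy(s1).
backLeft_isBusy(s1).
left_isValid(s1).

% Assumed numeric relative velocities (TV velocity minus AV velocity, in km/h).
rel_vel(s1, front, -12).
rel_vel(s1, backLeft, 9).

% Mapping the numeric differences to the symbolic velocity predicates,
% using the threshold eta_v = 5 km/h described above.
frontVel_isBigger(S)    :- rel_vel(S, front, V), V > 5.
frontVel_isEqual(S)     :- rel_vel(S, front, V), V >= -5, V =< 5.
frontVel_isLower(S)     :- rel_vel(S, front, V), V < -5.
backLeftVel_isBigger(S) :- rel_vel(S, backLeft, V), V > 5.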

3.1. Knowledge Acquisition

Autonomous driving systems require a variety of rules to perform effectively under diverse conditions, and each rule must be learned under a consistent setting with appropriate examples. Therefore, to extract distinct rules, it is essential to define unique configurations and provide corresponding examples for each one. We begin by specifying the scope of each desired rule through the definition of its head predicate and a set of candidate body predicates, all encapsulated within a specific bias set B . The selection of candidate body predicates depends on their potential impact on the accuracy of the head predicate. Consequently, body predicates that have no effect on the head are discarded, reducing the dimensionality of B and avoiding excessive computation during the learning process.
To identify the rules, we first categorize the actions into four sets: fatal, risky, and efficient lane-change actions, plus smooth longitudinal velocity control. Fatal actions lead to serious accidents or crashes, while risky actions are mildly dangerous and may result in traffic-law violations. Efficient actions produce not only smooth lane changes but also a higher level of safety. These categories can be interdependent.
For each action category, we provide human-labeled datasets comprising scenarios featuring an ego vehicle surrounded by varying numbers of intruders. In each scenario, taking a certain action is labeled as either true or false. For example, in the fatal lane change dataset, if there is an intruder on the left, taking the left lane-change action is considered fatal. These true/false labels define the positive and negative examples required by ILP systems. Table 1 lists the number of positive and negative examples in each action category, together with the number of body predicates related to the corresponding action. In general, the goal is to extract rules associated with the aforementioned categories to ensure safe and efficient lateral lane changes and longitudinal velocity control.
In each category, the corresponding rule has a unique head predicate defined in B using the head_pred/1 declaration, and a corresponding set of candidate body predicates. These candidate body predicates include all literals that may influence the head predicate. Then, the ILP system searches for optimal body predicates that satisfy all positive examples while simultaneously rejecting negative examples. To enable this, we define multiple possible scenarios using background knowledge and assign a unique identifier to each scenario. Based on expert understanding of the intended action in each scenario, scenarios are labeled as either positive examples using the pos/1 predicate or negative examples using the neg/1 predicate. This process is repeated for each rule to generate sufficient knowledge-driven data for inducing previously unknown logical rules. As such, each rule induction task requires a specific configuration of B , background knowledge BK , and training examples E to support effective learning.
The knowledge acquisition process generates a variety of real-world scenarios, each labeled by a real human driver. Each state consists of an AV surrounded by eight spatial sections, each of which is categorized as either occupied by a TV or vacant. Accordingly, we include eight sector_isBusy literals in the candidate body predicates—one for each sector surrounding the AV.
Additionally, for each state, we incorporate the relative velocity of every TV with respect to the AV. This results in 24 relative velocity literals—three per sector—being added to the candidate body predicates. To ensure legal compliance in driving decisions, we also consider the validity of the right and left sections, adding two corresponding literals to represent whether these areas are drivable. Table 1 summarizes the total number of candidate body predicates associated with each head predicate.
Once the states are defined, we assign positive or negative labels to the head predicates of the target rules based on expert driving knowledge and the behavioral outcome expected in each scenario. For example, to guide the ILP system in deriving fatal RLC rules, we label states in which taking the RLC action is considered fatal as positive examples; otherwise, such states are labeled as negative examples.
The number of predicates in each bias set B, along with the counts of positive and negative examples, are denoted by $N_{BP}$, $N_{E^+}$, and $N_{E^-}$, respectively. These parameters significantly influence the accuracy and robustness of the induced rules. While including negative examples ($E^-$) is optional, their presence improves the generalizability and resilience of the rules across a wider range of environmental states. As indicated in Table 1, distinct values of $N_{E^+}$ and $N_{E^-}$ are defined for each rule. (All knowledge-based datasets are available at https://github.com/CAV-Research-Lab/Symbolic-Imitation-Learning/tree/main/data (accessed on 10 November 2025)).
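As a minimal sketch (following Popper's typical input conventions; the authors' actual configurations are available in the linked repository), the setup for the fatal RLC rule might look as follows, with only a few of the candidate body predicates shown:

% --- bias (B): target head predicate and candidate body predicates ---
head_pred(fatal_rlc, 1).
body_pred(right_isBusy, 1).
body_pred(backRight_isBusy, 1).
body_pred(frontRight_isBusy, 1).
body_pred(right_isValid, 1).

% --- examples (E): human-labelled scenarios ---
pos(fatal_rlc(s1)).      % a vehicle occupies the right sector
pos(fatal_rlc(s2)).      % the right lane is off-road / invalid
neg(fatal_rlc(s3)).      % right sector free and right lane valid

% --- background knowledge (BK): ground facts describing each scenario ---
right_isBusy(s1).   right_isValid(s1).
right_isValid(s3).
% s2 has neither fact, so its right lane is treated as invalid.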

3.2. Rule Induction

After obtaining a sufficient amount of knowledge-driven data for each target rule, we proceed to the rule induction stage of the SIL framework. This stage focuses on learning interpretable rules tailored to autonomous highway driving. The rule extraction process is carried out using Popper, a state-of-the-art ILP system that combines Answer Set Programming (ASP) [31,32] with Prolog to enhance learning efficiency and accuracy. ASP is a declarative programming paradigm well suited to combinatorial and knowledge-intensive problems, which it solves by encoding them as logic-based rules and constraints.
One of Popper’s main advantages is its ability to learn from failures (LFF). This is achieved through a three-stage process: generating candidate rules by exploring the hypothesis space via ASP (generate stage), evaluating those candidates against the positive examples E + and background knowledge BK using Prolog (test stage), and pruning the hypothesis space based on failed hypotheses (constrain stage) [33].
Given the dynamic nature of decision-making in autonomous driving, a wide range of rules can be induced for different tasks. However, in this work, we focus on extracting only the essential general rules required for highway driving, specifically in the categories of safety, efficiency, and smoothness.

3.2.1. Safe Lane Changing

This section aims to identify unsafe lane-changing actions from human-labeled scenarios, thereby eliminating them from the action space and ensuring that only safe actions remain. To this end, we consider two datasets containing the following types of unsafe actions: fatal and risky lane-changing actions. The RLC and LLC actions may fall into either category depending on the surrounding traffic conditions. In contrast, LK is generally assumed to be safe, except in specific situations where it may pose potential danger.
Fatal Lane Changing: The objective here is to uncover previously unknown rules that characterize situations in which executing RLC or LLC would be fatal and could lead to collisions. For instance, four such scenarios are illustrated in Figure 3—the top-left and top-middle subfigures correspond to fatal RLC examples, while the bottom-left and bottom-middle represent fatal LLC examples. Using ILP, we derive two rules: one for fatal RLC (h1) and another for fatal LLC (h2). For example, rule h1 specifies when taking the RLC action is considered fatal:
(Rule h1 listing, shown as an image in the original article.)
Here, the comma (,) denotes conjunction, the semicolon (;) represents disjunction, and not(.) indicates negation, following first-order logic notation. Similarly, rule h2 identifies conditions under which executing LLC is fatal:
(Rule h2 listing, shown as an image in the original article.)
In summary, if a TV is present in the right (or left) section, or if that section is marked as invalid, performing an RLC (or LLC) action is deemed unsafe due to the high risk of collision. Therefore, the AV should avoid these actions in such states.
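Since the published listings of h1 and h2 appear as images, a plausible Prolog rendering consistent with the description above is the following sketch:

:- dynamic right_isBusy/1, right_isValid/1, left_isBusy/1, left_isValid/1.

% Sketch of h1/h2: a right (left) lane change is fatal if the adjacent sector
% is occupied or the adjacent lane is not a valid driving area.
fatal_rlc(S) :- right_isBusy(S); not(right_isValid(S)).
fatal_llc(S) :- left_isBusy(S);  not(left_isValid(S)).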
Risky Lane Changing: These actions refer to situations in which the AV selects maneuvers that, while not fatal, may still pose a risk to passenger safety. For instance, as illustrated in the top-right image of Figure 3, consider a scenario where a vehicle occupies the backRight (or backLeft) section of the AV, and its velocity is higher than that of the AV. Although executing an RLC (or LLC) maneuver may be legally or physically feasible, doing so could lead to a collision with the approaching vehicle in the adjacent section. In such cases, the action is not fatal but is considered hazardous and should ideally be avoided.
Using the Popper ILP system, we derive three rules that characterize risky actions. The first rule (h3) specifies that if there is a vehicle in the backRight section with a higher velocity than the AV, or if the frontRight section is occupied by a slower-moving vehicle, then performing an RLC maneuver is deemed risky:
(Rule h3 listing, shown as an image in the original article.)
Similarly, rule h4 captures the conditions under which an LLC maneuver is risky:
(Rule h4 listing, shown as an image in the original article.)
Finally, rule h5 addresses cases where the LK action may pose danger. It states that if a TV is present in the back section, the distance to the AV is unsafe, and the TV is approaching at a higher velocity, then remaining in the current lane becomes hazardous:
(Rule h5 listing, shown as an image in the original article.)
This rule highlights that even lane keeping—typically considered the safest option—can be endangered when the AV is rapidly approached from behind without adequate spacing, thereby increasing the risk of rear-end collisions.
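Again, the published listings appear as images; a sketch consistent with the descriptions of h3–h5 is given below. The sector-specific velocity predicates and back_isSafe/1 are assumed names that follow the conventions of Table A1.

:- dynamic backRight_isBusy/1, backRightVel_isBigger/1,
           frontRight_isBusy/1, frontRightVel_isLower/1,
           backLeft_isBusy/1, backLeftVel_isBigger/1,
           frontLeft_isBusy/1, frontLeftVel_isLower/1,
           back_isBusy/1, back_isSafe/1, backVel_isBigger/1.

% Sketch of h3: an RLC is risky if a faster vehicle sits in the back-right
% sector or a slower vehicle occupies the front-right sector.
risky_rlc(S) :- backRight_isBusy(S),  backRightVel_isBigger(S).
risky_rlc(S) :- frontRight_isBusy(S), frontRightVel_isLower(S).

% Sketch of h4: the mirrored conditions for a left lane change.
risky_llc(S) :- backLeft_isBusy(S),  backLeftVel_isBigger(S).
risky_llc(S) :- frontLeft_isBusy(S), frontLeftVel_isLower(S).

% Sketch of h5: lane keeping is risky when a faster vehicle approaches from
% behind at an unsafe distance (back_isSafe/1 is an assumed distance predicate).
risky_lk(S) :- back_isBusy(S), not(back_isSafe(S)), backVel_isBigger(S).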

3.2.2. Efficient Lane Changing

While the safety rules identify lane-change actions that are fatal or risky under specific traffic conditions, they do not provide guidance on which action is more efficient when none of the options poses a safety risk. To address this, we introduce a prioritization scheme that helps the AV make time-efficient decisions.
In general, minimizing driving time requires the AV to change lanes when appropriate. However, unnecessary or abrupt lane changes can compromise passenger comfort and health. Therefore, the default priority is to maintain the current lane unless a lane change is deemed necessary. When the AV needs to overtake a vehicle in the front section, we introduce a secondary prioritization: if both RLC and LLC are viable, the AV should prefer LLC, as the left lane typically supports higher traffic speeds. RLC should be considered only when LLC is not feasible. This rule-based preference allows the AV to make more efficient decisions while preserving safety and comfort.
Based on this prioritization strategy, we label scenarios from the knowledge-driven dataset to reflect the more favorable action and use ILP to learn the corresponding rules. Rule h6 identifies a situation where LLC is preferable to RLC:
(Rule h6 listing, shown as an image in the original article.)
This rule specifies that when the front section is occupied and both the left and frontLeft sections are clear, the AV should prefer LLC over RLC. Conversely, rule h7 describes situations in which RLC becomes the preferred action:
(Rule h7 listing, shown as an image in the original article.)
This rule applies when the front, left, and frontLeft sections are all occupied, while the right and frontRight sections are available. In such cases, RLC is the more efficient option.
Importantly, the predicates rlc_isBetter and llc_isBetter are mutually exclusive by design. If one holds true, the other cannot, thereby avoiding conflicting recommendations during decision-making.
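A plausible Prolog rendering of h6 and h7, consistent with the descriptions above (the published listings appear as images), is:

:- dynamic front_isBusy/1, left_isBusy/1, frontLeft_isBusy/1,
           right_isBusy/1, frontRight_isBusy/1.

% Sketch of h6: prefer a left lane change when the front sector is blocked
% and both the left and front-left sectors are clear.
llc_isBetter(S) :- front_isBusy(S), not(left_isBusy(S)), not(frontLeft_isBusy(S)).

% Sketch of h7: prefer a right lane change when the front, left, and front-left
% sectors are all occupied but the right side is clear.
rlc_isBetter(S) :- front_isBusy(S), left_isBusy(S), frontLeft_isBusy(S),
                   not(right_isBusy(S)), not(frontRight_isBusy(S)).

In this reading, the mutual exclusivity follows directly from the bodies: h7 requires left_isBusy while h6 requires its negation, so the two heads can never hold for the same state.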

3.2.3. Smooth Longitudinal Velocity Control

To ensure passenger comfort and overall safety, an AV—like a human driver—must adjust its velocity smoothly. Human drivers typically rely on a limited set of intuitive actions to manage vehicle speed in a continuous manner. These include gradually accelerating to reach a desired cruising speed, adjusting speed to match that of a slower vehicle ahead (referred to as the front TV), and decelerating in response to unsafe following distances. In critical situations, such as when the AV is too close to the front TV while moving faster, the driver—or the AV—must apply the brake to prevent a collision. Based on these observed driving behaviors, the goal is to identify three fundamental rules that govern safe and smooth velocity adjustments.
In the previous work [30], we introduced a rule-based method for longitudinal velocity control in autonomous vehicles. The results demonstrated that the AV could avoid collisions with the front TV while ensuring smooth acceleration and deceleration transitions. This was achieved by integrating a low-level controller that eliminated discontinuities in the velocity commands. As shown in Equation (3), the proposed approach models three distinct acceleration phases, inspired by typical lane-following behavior observed in human drivers, and aligns with methods proposed in [34].
As illustrated in Figure 4, the catch-up phase applies when the front sector is unoccupied. In such cases, the AV has the flexibility to accelerate, decelerate, or maintain speed. To compute the appropriate acceleration, we define a desired longitudinal velocity term ($V_x^d$), which may differ across drivers or situations, and use it to determine the required acceleration to reach this target speed. For this task, $V_x^d$ is 110 km/h.
When a vehicle is present in the front section, human drivers typically adjust their speed to match that of the front TV, thereby maintaining a safe distance and avoiding collisions. This behavior corresponds to the follow-up phase, which applies when a TV occupies the front sector and the longitudinal distance between the AV and the TV is considered safe. As shown in Equation (3), the follow-up phase computes the required acceleration that allows the AV to synchronize its velocity ($v_{x,AV}$) with that of the front TV ($v_{x,TV}$), while gradually reducing the separation distance $D$.
Finally, the brake phase applies in situations where the AV finds itself at an unsafe distance—specifically, when D falls below a critical threshold C—and is traveling at a higher speed than the front TV. In these emergency scenarios, braking is necessary to prevent a potential collision. The AV must decelerate promptly until a safe following distance is re-established. This phase is illustrated in Figure 4 and formalized as three cases in the following equation:
$$a_x = \begin{cases} \dfrac{V_x^d - v_{x,AV}}{\Delta t} & \text{catch-up}, \\[6pt] \dfrac{v_{x,TV}^2 - v_{x,AV}^2}{2(D - C)} & \text{follow-up}, \\[6pt] -\dfrac{v_{x,AV}^2}{2D} & \text{brake}, \end{cases} \quad (3)$$
where $D$ is the distance between the AV and the front TV, and $C$ is the critical braking distance, set to 15 m in this use case. The time step $\Delta t$ is 0.04 s.
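For illustration, with assumed values $v_{x,TV} = 25$ m/s, $v_{x,AV} = 30$ m/s, $D = 70$ m, and $C = 15$ m, the follow-up case of Equation (3) yields
$$a_x = \frac{25^2 - 30^2}{2\,(70 - 15)} = \frac{-275}{110} \approx -2.5~\mathrm{m/s^2},$$
a mild deceleration that lets the AV converge to the front TV's speed as the gap closes toward the critical distance.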
As previously stated, the objective is to define logical rules that correspond to the acceleration conditions described in Equation (3). However, existing ILP systems are not designed to reason directly over continuous numerical values (except for binary values such as 0 and 1), making it impractical to extract the complete equations symbolically. Instead, we identify which of the three control phases—catch-up, follow-up, or brake—is applicable in a given state. Once the appropriate phase is determined, the corresponding acceleration is computed analytically.
To induce rules for each control phase, we construct training examples by evaluating the relative distance between the AV and the front TV. Based on this evaluation, each state is labeled as a positive or negative example for the corresponding rule, as detailed in Table 1. This approach enables Popper to learn one symbolic rule per phase. For instance, the following rule specifies that when the front section is unoccupied, the AV should enter the catch-up phase to reach its desired speed:
(Rule h8 listing, shown as an image in the original article.)
The second rule addresses the follow-up phase. It states that when the front section is occupied and the distance between the AV and the front TV is safe, the AV should adjust its acceleration to match the speed of the vehicle ahead:
(Rule h9 listing, shown as an image in the original article.)
Finally, the third rule pertains to the brake phase. It specifies that when a vehicle is present in the front section, the relative distance is unsafe, and the AV is moving faster than the front TV, the AV should initiate braking to prevent a collision:
(Rule h10 listing, shown as an image in the original article.)
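The three phase rules are likewise published as images; the following is a plausible Prolog sketch consistent with the descriptions of h8–h10 above, together with the analytic mapping from the selected phase to Equation (3). The predicate front_isSafe/1 and the state accessors v_av/2, v_tv/2, and dist/2 are assumptions of this sketch (SI units).

:- dynamic front_isBusy/1, front_isSafe/1, frontVel_isLower/1,
           v_av/2, v_tv/2, dist/2.

% Sketch of h8-h10: selecting the longitudinal control phase symbolically.
catch_up(S)  :- not(front_isBusy(S)).
follow_up(S) :- front_isBusy(S), front_isSafe(S).
brake(S)     :- front_isBusy(S), not(front_isSafe(S)), frontVel_isLower(S).

% Once a phase is selected, the acceleration of Equation (3) is computed
% analytically (110 km/h is about 30.6 m/s; C = 15 m; dt = 0.04 s).
accel(S, A) :- catch_up(S),  v_av(S, V),
               A is (30.6 - V) / 0.04.
accel(S, A) :- follow_up(S), v_av(S, V), v_tv(S, Vt), dist(S, D),
               A is (Vt*Vt - V*V) / (2*(D - 15)).
accel(S, A) :- brake(S),     v_av(S, V), dist(S, D),
               A is -(V*V) / (2*D).

In this reading, the symbolic layer only decides which case of Equation (3) applies, while the acceleration itself is computed numerically.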
As summarized in Table 1, we evaluate the quality of each learned rule using an accuracy metric, defined as the average of precision and recall over the corresponding positive ($E^+$) and negative ($E^-$) examples. According to the results, all induced rules achieve perfect accuracy (1.00), indicating that the extracted rules cover all considered scenarios and correctly predict the positive and negative examples within their respective domains. Since the datasets contain no mislabeled data, it is unsurprising that an ILP-based system can find rules satisfying all positive and negative examples.
In addition to accuracy, we also report the rule induction time ( T i ) for each rule in Table 1. The values show that T i is consistently low across all rules, demonstrating that the SIL framework enables fast and efficient rule induction. This efficiency can be attributed to Popper’s structured approach, which combines symbolic reasoning with effective pruning strategies to learn concise and accurate rules.
To evaluate the robustness of this method, we intentionally flip the data labels randomly to observe how Popper manages the noise. Varying percentages of noise from 2% to 20% are considered for all the datasets. As shown in Table A2 (see Appendix A.2), increasing the noise percentage leads to a significant decrease in both precision and recall. Moreover, the induction time T i also increases.

3.3. Rule Aggregation

Having extracted the rules h1 to h10 using Popper, these interpretable rules are further processed to generate coherent, human-like decisions for lane changes and velocity adjustments. This post-processing is performed by the rule aggregation component, which integrates the induced rules into a unified decision-making framework.
The primary goal of this component is to identify the best lane-change and velocity actions for each state by combining the symbolic rules with background knowledge. This component prioritizes rules based on their criticality: (1) fatal lane-changing rules (h1–h2) identify deadly actions, which are eliminated from the action space; (2) risky lane-changing rules (h3–h5) detect risky actions, which are removed from the action space but retained as backup options if no other actions remain; (3) finally, within the refined action space, the efficiency rules (h6–h7) select the action leading to the most efficient path for the AV. Additionally, the smoothness rules (h8–h10) adjust the longitudinal velocity to produce smooth driving.
Supplementary rules are also introduced to ensure consistent reasoning across the hypotheses, enabling seamless interpretation and coordination of the high-level symbolic outputs. These outputs are then used to control the AV via a low-level controller.
To determine the best action for a given state $S_t$, the framework first applies rules h1 through h5 to eliminate actions deemed fatal or risky, thus narrowing the action space to only safe candidates. From the remaining options, the most efficient action is selected using rules h6 and h7, which assess the relative quality and priority of the remaining lane-change options. When multiple candidate rules are triggered simultaneously, the system applies a deterministic priority order based on rule specificity. In practice, this means that rules supported by larger numbers of positive examples take precedence. For instance, left-lane change rules are prioritized over right-lane change rules. The final decision yields the optimal action $a_t$, which can be LK, LLC, or RLC. As illustrated in Figure 1, once $a_t$ is chosen, the AV's lane ID is updated via a switch box mechanism that maps the decision to a corresponding lateral position $y_d \in N_L$. A low-level controller is then used to generate the lateral acceleration command $a_y$, where a proportional-integral-derivative (PID) controller is employed as in [30]. The controller gains $K_p^y$, $K_I^y$, and $K_D^y$ are 4, 1, and 0, respectively.
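A minimal sketch of how this aggregation might be written on top of the rule sketches given earlier (assuming those predicates are loaded) is shown below; unsafe/2, safe/2, and best_action/2 are supplementary helper predicates of this sketch, not induced rules, and the fallback to risky actions when no safe action remains is omitted for brevity.

% Actions ruled out by the fatal (h1-h2) and risky (h3-h5) rules.
unsafe(S, rlc) :- fatal_rlc(S).
unsafe(S, rlc) :- risky_rlc(S).
unsafe(S, llc) :- fatal_llc(S).
unsafe(S, llc) :- risky_llc(S).
unsafe(S, lk)  :- risky_lk(S).

% An action is a safe candidate if no fatal or risky rule fires for it.
safe(S, Action) :- member(Action, [lk, llc, rlc]), not(unsafe(S, Action)).

% Efficiency rules (h6-h7) pick among the safe candidates; LLC is preferred
% over RLC, and lane keeping is the default when no lane change is selected.
best_action(S, llc) :- safe(S, llc), llc_isBetter(S).
best_action(S, rlc) :- safe(S, rlc), rlc_isBetter(S), not(llc_isBetter(S)).
best_action(S, lk)  :- safe(S, lk),
                       not((safe(S, llc), llc_isBetter(S))),
                       not((safe(S, rlc), rlc_isBetter(S))).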
For longitudinal velocity control, the current state $S_t$ is evaluated against rules h8 to h10 to determine which of the three acceleration phases—catch-up, follow-up, or brake—is applicable. Based on the selected rule and the formulation in Equation (3), the required longitudinal acceleration $a_x$ is computed. The desired velocity is then calculated as $v_x^d = v_x + a_x \Delta t$ and passed as input to a PID controller responsible for regulating thrust. For this task, the controller gains $K_p^x$, $K_I^x$, and $K_D^x$ are 3, 1, and 0, respectively. The PID controller ensures smooth and continuous velocity control in accordance with the selected symbolic rule.
In summary, the SIL framework integrates symbolic reasoning and control by mapping learned high-level hypotheses into executable actions. These actions are aligned with the safety, efficiency, and smoothness principles established throughout the rule induction process.

4. Results

This section presents the performance evaluation of the proposed method in comparison with several imitation learning baselines. The evaluation focused exclusively on lane-change decisions rather than longitudinal control: the SIL model controls acceleration based on Equation (3), while the DNN-based baselines use a rule-based longitudinal controller. This design aligns with the core objective of comparing symbolic policy learning with black-box neural imitation learning in the context of high-level decision-making. The experiments are conducted using the HighD [35] and NGSim [36] datasets, which offer realistic driving scenarios to evaluate both learning and generalization capabilities.

4.1. Baselines

4.1.1. DNNIL: Deep-Neural-Network-Based Imitation Learning

We implemented a fully connected deep neural network (DNN) as the baseline for DNNIL. The network architecture consisted of three layers: the first two layers contained 128 nodes each, while the final layer had three output nodes representing the action space. A softmax activation function was applied in the output layer. The network was implemented using the PyTorch v2.0 library, and the ADAM optimization algorithm was used with a learning rate of $\alpha = 1 \times 10^{-4}$ and a batch size of 256.
To train the network, we collected over 160,000 state–action pairs from the HighD dataset by tracking all vehicles from the beginning to the end of each sequence. Each state was represented by the normalized relative positions of eight surrounding target vehicles (TVs) and the normalized velocity of the AV, with normalization performed using the predefined maximum values in the HighD dataset (118.8 km/h for velocity and 250 m for distance). Lane-change actions were detected by comparing the initial and final lane indices of each vehicle within each frame. Based on this detection, we encoded the action as a binary list of length three, indicating the corresponding maneuver for lane changes. Velocity control was handled separately using rule-based methods rather than being learned by the network.
In addition to DNNIL, we implemented two other well-established imitation learning methods on the HighD dataset to further validate the reliability and effectiveness of the proposed approach. These baselines are briefly described below.

4.1.2. BCMDN: Behavioral Cloning with Mixture Density Networks

This method combines Behavior Cloning (BC) [5]—a widely used approach in which an agent learns to mimic expert behavior—with Mixture Density Networks (MDNs) [37], a neural architecture designed to model probability distributions. Unlike traditional BC, which predicts a single deterministic action, BCMDN predicts a probability distribution over multiple possible actions using MDNs. This modeling of uncertainty allows the agent to represent multiple plausible behaviors in ambiguous or highly variable situations, thereby improving robustness and flexibility in complex driving scenarios. The pseudocode is provided as Algorithm A1 in the Appendix A.3.

4.1.3. GAIL: Generative Adversarial Imitation Learning

Generative Adversarial Imitation Learning (GAIL) [6] is an advanced imitation learning framework built on Generative Adversarial Networks (GANs) [7]. In this approach, a discriminator network is trained to distinguish between expert actions and those produced by a learning agent. Simultaneously, a generator (policy) network attempts to generate actions that are indistinguishable from expert actions according to the discriminator. GAIL optimizes the policy such that it closely mimics expert behavior, all without requiring explicitly defined reward signals [6]. The pseudocode is provided as Algorithm A2 in the Appendix A.3.
For further details regarding the implementation of BCMDN and GAIL in the context of driving behavior modeling on the HighD dataset, we refer interested readers to [38,39].

4.2. Symbolic Imitation Learning Implementation

In contrast to the DNNIL method, the SIL framework relies on a relatively small number of positive and negative examples to learn unknown rules, as summarized in Table 1. Once constructed, the SIL framework is capable of making interpretable decisions in each state by leveraging symbolic first-order logic. These decisions are also generalizable, as the learned rules are not constrained to specific training scenarios.
As shown in Table 1, another advantage of the SIL method is its computational efficiency—each rule can be induced in a fraction of a second (see T i column). For implementation, we used the HighD dataset to simulate a virtual AV agent initialized with arbitrary lane and velocity values. The agent was then tasked with driving safely, efficiently, and smoothly on the highway using the learned symbolic policies.
Similar to DNNIL, the SIL training process is conducted offline and completed prior to deployment. However, unlike neural networks that often require large datasets and extended training time, the ILP-based SIL framework is highly sample-efficient and can often learn meaningful rules from only a few well-labeled examples. Induction times for the rules are given in Table 1. In contrast, the baselines required substantially longer training times: approximately 13 h for DNNIL, 14 h for BCMDN, and 20 h for GAIL on a workstation equipped with an RTX-series GPU, 32 GB RAM, and an Intel i7 processor.

5. Discussion

All methods were evaluated under comparable conditions to facilitate a fair comparison of their overall performance. Each agent was initialized with similar positions, velocities, and driving directions. To assess the effectiveness of each approach, we defined performance metrics in three key categories: safety, efficiency, and smoothness. Safety was evaluated using two success rates: the collision-based success rate $SR_c$ and the distance-based success rate $SR_d$, computed as
$$SR_c = \left(1 - \frac{N_C}{N}\right) \times 100, \qquad SR_d = \frac{\bar{D}}{L} \times 100,$$
where $N_C$ is the number of collisions, $N$ is the number of evaluation episodes, $\bar{D}$ is the average traveled distance, and $L$ is the total length of the driving scenario. $SR_c$ indicates the percentage of episodes completed without a collision, while $SR_d$ indicates the percentage of the traveled distance relative to the total distance. Higher values of both metrics indicate a higher degree of safety.
Furthermore, the efficiency and smoothness were evaluated through the number of lane changes ($N_{LC}$). Additionally, the average agent speed ($\bar{V}$) was used as a composite measure of efficiency, calculated as $\bar{V} = \bar{D} / \bar{T}$, where $\bar{T}$ denotes the mission time per episode.
Using the HighD dataset, we tested all methods over $N = 100$ episodes with at least five different seeds, each episode consisting of an $L = 2100$ m driving track. For each experiment, the AV operated in either the left-to-right (L2R) or right-to-left (R2L) direction. All agents were trained exclusively in the L2R direction, and their generalization was assessed by evaluating performance in the R2L direction and in the NGSim environment. The average comparative results over seeds are summarized in Table 2.
Safety Analysis: As shown in Table 2, the SIL agent completed all evaluation scenarios with a 100% collision-based success rate in the L2R scenario, demonstrating strong safety performance. In contrast, the corresponding success rates for DNNIL, BCMDN, and GAIL are significantly lower. Owing to the explicit safety rules in the proposed framework, the SIL agent avoids the fatal and risky lane changes that lead to collisions. Moreover, in the R2L and NGSim scenarios, it consistently maintains superiority over the baselines, with success rates of 98% and 96%, respectively.
From the distance-based success rate perspective, the SIL agent similarly outperforms the baselines by completing 100% and 99.3% of the path on average in L2R and R2L directions, respectively. These superior results stem not only from explicit safety rules but also from the longitudinal velocity rules, which ensure safe distances from the front and back intruder vehicles.
Efficiency Analysis: To assess the efficiency of the agents, we examined their average speed per episode, denoted by $\bar{V}$. This metric serves as a proxy for effective lane-change behavior, as faster travel generally correlates with successful overtaking. We computed $\bar{V}$ across both directions for all agents and used the average value for comparison. The results show that DNNIL, BCMDN, GAIL, and SIL achieved $\bar{V}$ values of 109.7, 99.89, 115.07, and 116.06 km/h, respectively. Notably, the SIL agent achieved a higher average speed than the other models, suggesting that its rule-based lane changes enabled it to find free lanes efficiently, maintain higher velocities, and ultimately reduce overall travel time.
Smoothness Analysis: To analyze smoothness, we consider the number of lane changes ($N_{LC}$), which was close to zero for the DNNIL and BCMDN agents, indicating their limited ability to learn and execute lane-change maneuvers effectively. While these agents exhibited smooth driving behavior, they often remained in a single lane throughout the episode. This tendency not only reduces responsiveness but may also contribute to traffic congestion due to inefficient lane utilization. Despite being trained on a large volume of data, the inability of these models to generalize lane-change behaviors highlights their inherent sample inefficiency.
The GAIL agent, in contrast, performed a significantly higher number of lane changes. However, this came at the cost of a high collision rate, indicating that its behavior, while active, was not reliably safe. On the other hand, the SIL agent changed lanes approximately once per episode, guided by explicitly learned efficient lane-changing rules. These rules enabled the AV to make strategic, context-aware lane changes, which contributed to shorter travel times without compromising safety.
Generalizability Analysis: To assess the generalizability of the SIL framework, we tested it on a different dataset—NGSim [36]—which captures diverse urban highway scenarios in the United States. Among the available subsets, US-101 was chosen because its highway characteristics are broadly comparable to those of the HighD dataset, while differing in vehicle density, lane structure, and speed distribution. Given that the NGSim environment is highly congested, we reduced the number of vehicles to ensure feasible driving space for the AV while preserving comparability. To this end, the state-space representation was kept consistent across both datasets by maintaining the same eight-sector structure as in HighD. As shown in Table 2, the SIL agent continued to perform safely, maintaining a low number of collisions despite the changes in environment and traffic dynamics.
Notably, some collisions observed in the NGSim experiments were unavoidable. This is primarily due to the fact that surrounding vehicles in the dataset are not aware of the SIL-controlled AV and thus do not respond to its presence. As a result, collisions often occurred from the rear, where other vehicles failed to maintain a safe following distance. Additionally, the AV’s average velocity in the NGSim scenarios was lower compared to the HighD due to the overall slower traffic flow and higher congestion levels in the NGSim dataset. Nevertheless, the ILP-generated rules enabled the AV to adapt effectively, demonstrating that the SIL framework is robust to variations in lane configurations and speed distributions.
Another advantage of the SIL framework lies in its computational efficiency. As shown in Table 1, each rule can be induced in a relatively short time, whereas training a DNNIL model on large datasets demands substantial time and computational resources. Moreover, in many real-world applications, large volumes of labeled data may not be readily available. In contrast, the SIL framework can learn effectively from a small set of human-labeled examples, making it more practical and accessible. Furthermore, the explicit, interpretable nature of the SIL rule base aligns well with established automotive functional safety standards, such as ISO 26262 [40], by addressing both functional safety and AI system assurance requirements.
Sensitivity to Label Noise: We evaluated SIL under controlled label-noise injections to assess its robustness. Model performance degrades as noise increases: with 2% noise, precision drops to ≈98%; at 5%, to ≈95%; at 10%, to ≈91%; and at 20%, to ≈80%. This indicates a strong dependence on clean supervision. In contrast, deep imitation learning baselines such as DNNIL are typically more tolerant of moderate annotation noise because they optimize over large datasets and can partially smooth out inconsistent labels during training. SIL, by design, induces discrete symbolic rules that must exactly satisfy the positive and negative examples, which makes it more brittle to mislabeled samples.
As summarized in Table A2 (Appendix A.2), even modest noise levels (2–10%) reduce precision and recall and increase the induction time $T_i$. This has practical implications: high-quality labeling becomes critical, as noisy rule examples can directly degrade driving decisions.
In addition to sensitivity to label noise, the proposed method has several other limitations. First and most importantly, as discussed in Section 3.1, each unknown rule must be learned under a carefully defined setting, including a tailored bias set and example set, which adds considerable system design complexity and constitutes a substantial manual bottleneck. Second, all datasets should be carefully labeled to avoid noise issues associated with classical ILP systems; the presence of incorrectly labeled positive or negative examples can significantly hinder the rule induction process. Third, the extracted rules (h1h10) were not validated by independent experts beyond the original annotator(s); future work should include external validation by independent drivers or traffic safety experts to assess rule correctness and completeness. Finally, another challenge arises when attempting to extract rules from real-world human driving data: human drivers often take different actions in similar situations due to personal preferences or unobservable knowledge, making it difficult to infer consistent rules.
Besides these limitations, there is a minor difference between SIL and the DNN-based imitation learning baselines regarding longitudinal acceleration control: SIL includes rule-based longitudinal speed control (rules h8–h10), whereas the DNNIL baseline controls only lane changes, with velocity handled separately by rules rather than learned. This may give SIL an advantage; we therefore highlight this difference and leave fully learned longitudinal control for DNNIL as future work.
Future work should aim to address these challenges by improving the noise tolerance of ILP systems and exploring semi-automated ways to configure rule learning environments. Advancing in these directions would help scale the application of symbolic imitation learning to more diverse and unstructured real-world driving scenarios.

6. Conclusions

This paper introduced a novel Symbolic Imitation Learning (SIL) framework that utilizes human driving background knowledge and example-based reasoning to extract interpretable decision-making rules for autonomous vehicles via Inductive Logic Programming (ILP). The proposed method is structured around three core components: knowledge acquisition, rule induction, and rule aggregation. These components work in tandem to enable the derivation and integration of symbolic rules that guide autonomous driving behavior. Through extensive experiments on the HighD and NGSim datasets, we demonstrated that SIL outperforms prominent deep neural network-based imitation learning approaches in key performance dimensions, including safety, efficiency, and smoothness. Importantly, the method offers full interpretability by operating within a first-order logic framework and exhibits strong generalizability in previously unseen environments. Although the proposed method has drawbacks, such as the need for careful labeling and the difficulty of rule extraction, SIL is highly sample-efficient, requiring substantially fewer state–action pairs to learn effective policies compared to black-box learning methods.
Future work should validate the application of the SIL framework in more complex and dynamic driving contexts, such as highway merging, bidirectional traffic, and scenarios involving heterogeneous driving behaviors and intentions. While the extracted rules have proven effective in highway-like environments, extending the rule base to accommodate broader traffic situations remains a valuable direction for research and development.

Author Contributions

Conceptualization, I.S. and M.Y.; methodology, I.S.; software, I.S.; validation, I.S. and M.Y.; data curation, I.S. and M.Y.; writing—original draft preparation, I.S.; writing—review and editing, I.S. and M.Y.; visualization, M.Y. and I.S.; supervision, S.F.; project administration, S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The implementation of the proposed methods is available at: https://github.com/CAV-Research-Lab/Symbolic-Imitation-Learning (accessed on 10 November 2025). The HighD dataset can be accessed at: https://www.highd-dataset.com (accessed on 10 November 2025), and the NGSim dataset is available at: https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm (accessed on 10 November 2025).

Acknowledgments

During the preparation of this manuscript, the authors partially used GPT-4o for language editing and proofreading. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AD      Autonomous Driving
AV      Autonomous Vehicle
DNNIL   Deep-Neural-Network-based Imitation Learning
IL      Imitation Learning
ILP     Inductive Logic Programming
FOL     First-Order Logic
HighD   HighD Traffic Dataset
L2R     Left to Right
LK      Lane Keeping
LLC     Left Lane Change
NGSIM   Next Generation Simulation Dataset
PID     Proportional-Integral-Derivative
R2L     Right to Left
RLC     Right Lane Change
SIL     Symbolic Imitation Learning
TV      Target Vehicle
XAI     Explainable Artificial Intelligence

Appendix A

Appendix A.1. Symbolic Predicate Definitions

Table A1. List of symbolic predicates used in the SIL framework and their definitions.
| Predicate | Definition |
|---|---|
| front_isBusy | Indicates whether the front sector is occupied by a target vehicle. |
| frontRight_isBusy | Indicates whether the front-right sector is occupied. |
| right_isBusy | Indicates whether the right sector is occupied. |
| backRight_isBusy | Indicates whether the back-right sector is occupied. |
| back_isBusy | Indicates whether the back sector is occupied. |
| backLeft_isBusy | Indicates whether the back-left sector is occupied. |
| left_isBusy | Indicates whether the left sector is occupied. |
| frontLeft_isBusy | Indicates whether the front-left sector is occupied. |
| right_isValid | Indicates whether the right lane is valid (within road boundaries). |
| left_isValid | Indicates whether the left lane is valid. |
| frontVel_isBigger | The front vehicle is moving faster than the AV: true if abs(v_x,TV - v_x,AV) > η_v and v_x,TV > v_x,AV; otherwise false. |
| frontVel_isEqual | The front vehicle is moving at a similar speed to the AV: true if abs(v_x,TV - v_x,AV) < η_v; otherwise false. |
| frontVel_isLower | The front vehicle is moving slower than the AV: true if abs(v_x,TV - v_x,AV) > η_v and v_x,TV < v_x,AV; otherwise false. |
| frontRightVel_isBigger | The vehicle in the front-right is faster than the AV. |
| frontRightVel_isLower | The vehicle in the front-right is slower than the AV. |
| backRightVel_isBigger | The vehicle in the back-right is faster than the AV. |
| backLeftVel_isBigger | The vehicle in the back-left is faster than the AV. |
| frontLeftVel_isLower | The vehicle in the front-left is slower than the AV. |
| backVel_isBigger | The vehicle in the back sector is faster than the AV. |
| backDist_isSafe | The distance between the AV and the vehicle behind is safe: true if the distance is more than C; otherwise false. |
| frontDist_isSafe | The distance to the front vehicle is within a safe threshold: true if the distance is more than C; otherwise false. |
| rlc_isFatal | Indicates the right lane change (RLC) is fatal. |
| llc_isFatal | Indicates the left lane change (LLC) is fatal. |
| lk_isRisky | Indicates lane keeping (LK) is risky in the current state. |
| rlc_isRisky | Indicates RLC is risky in the current state. |
| llc_isRisky | Indicates LLC is risky in the current state. |
| llc_isBetter | Indicates LLC is more efficient than RLC in the given state. |
| rlc_isBetter | Indicates RLC is more efficient than LLC. |
| reachDesiredSpeed | Indicates the AV should accelerate to reach its desired speed. |
| reachFrontSpeed | Indicates the AV should adjust its speed to match the front vehicle. |
| brake | Indicates that the AV should decelerate due to an unsafe distance ahead. |
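For illustration, the relational speed and distance predicates above can be grounded from numeric vehicle states as in the following sketch. The threshold names η_v and C follow the definitions in Table A1, while the default values, function names, and argument names are assumptions introduced only for this example.

```python
def ground_speed_predicates(v_av_x, v_tv_x, eta_v=1.0):
    """Ground the frontVel_* predicates from longitudinal speeds (m/s)."""
    diff = v_tv_x - v_av_x
    return {
        "frontVel_isBigger": abs(diff) > eta_v and diff > 0,
        "frontVel_isEqual": abs(diff) < eta_v,
        "frontVel_isLower": abs(diff) > eta_v and diff < 0,
    }

def ground_distance_predicates(front_gap, back_gap, C=20.0):
    """Ground the *_isSafe predicates from longitudinal gaps (m)."""
    return {
        "frontDist_isSafe": front_gap > C,
        "backDist_isSafe": back_gap > C,
    }
```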

Appendix A.2. Experimental Results with Noisy Datasets

Table A2. Experimental results with noisy datasets. All noise injections were applied with random seed 42. In Popper, the parameter b_max denotes the maximum number of body predicates allowed per rule, which directly affects the size of the rule space.
| Dataset Category | Action | Noise (%) | b_max | Precision | Recall | T_i (s) |
|---|---|---|---|---|---|---|
| Fatal Lane Changing | RLC | 2 | 2 | 0.98 | 0.99 | 0.24 |
| | | 5 | 2 | 0.95 | 0.98 | 0.25 |
| | | 10 | 2 | 0.90 | 0.97 | 0.27 |
| | | 20 | 2 | 0.79 | 0.94 | 0.36 |
| | LLC | 2 | 2 | 0.98 | 0.99 | 0.24 |
| | | 5 | 2 | 0.95 | 0.98 | 0.26 |
| | | 10 | 2 | 0.91 | 0.95 | 0.28 |
| | | 20 | 2 | 0.82 | 0.91 | 0.36 |
| Risky Lane Changing | RLC | 2 | 3 | 0.99 | 0.82 | 1.52 |
| | | 5 | 3 | 0.97 | 0.64 | 1.83 |
| | | 10 | 3 | 0.90 | 0.46 | 1.90 |
| | | 20 | 3 | 0.78 | 0.27 | 2.20 |
| | LLC | 2 | 3 | 0.98 | 0.96 | 1.88 |
| | | 5 | 3 | 0.95 | 0.90 | 2.31 |
| | | 10 | 3 | 0.92 | 0.80 | 2.92 |
| | | 20 | 3 | 0.82 | 0.64 | 0.49 |
| | LLC | 2 | 3 | 0.98 | 0.95 | 1.82 |
| | | 5 | 3 | 0.94 | 0.90 | 2.82 |
| | | 10 | 3 | 0.91 | 0.79 | 2.71 |
| | | 20 | 2 | 0.82 | 0.63 | 0.53 |
| Efficient Lane Changing | RLC | 2 | 5 | 0.97 | 0.62 | 0.82 |
| | | 5 | 5 | 0.97 | 0.38 | 1.41 |
| | | 10 | 5 | 0.91 | 0.23 | 2.80 |
| | | 20 | 5 | 0.88 | 0.12 | 3.80 |
| | LLC | 2 | 5 | 0.98 | 0.88 | 1.91 |
| | | 5 | 5 | 0.97 | 0.73 | 2.67 |
| | | 10 | 5 | 0.88 | 0.56 | 3.58 |
| | | 20 | 5 | 0.80 | 0.36 | 4.72 |
| Smooth Longitudinal Velocity | Catch-up | 2 | 3 | 0.98 | 0.98 | 2.80 |
| | | 5 | 2 | 0.94 | 0.96 | 0.48 |
| | | 10 | 2 | 0.91 | 0.89 | 1.62 |
| | | 20 | 1 | 0.79 | 0.80 | 0.26 |
| | Follow-up | 2 | 4 | 0.98 | 0.94 | 18.84 |
| | | 5 | 4 | 0.95 | 0.86 | 21.87 |
| | | 10 | 4 | 0.92 | 0.74 | 40.08 |
| | | 20 | 3 | 0.83 | 0.57 | 79.55 |
| | Brake | 2 | 3 | 0.98 | 0.95 | 2.07 |
| | | 5 | 3 | 0.95 | 0.86 | 2.72 |
| | | 10 | 3 | 0.91 | 0.75 | 38.54 |
| | | 20 | 2 | 0.84 | 0.56 | 0.92 |

Appendix A.3. BCMDN and GAIL Pseudocode

Algorithm A1 Behavior Cloning with a Mixture Density Network (BCMDN)
Require: Dataset D = {(s_i, a_i)}, i = 1..N, of expert state–action pairs
Require: Number of Gaussian components K, learning rate α, batch size B
Ensure: Policy π_Θ(a|s) parameterized by MDN Θ = {φ, ψ}
1:  Model: feature encoder h_φ(s) (e.g., MLP/CNN); MDN head g_ψ(·) outputs, per state s:
2:      {π(s), μ(s), σ(s)} = g_ψ(h_φ(s))
3:      Constraints: π_k = softmax(z_k^π / τ), σ_{k,d} = softplus(z_{k,d}^σ) + ε
4:  Initialize parameters Θ ← Θ_0
5:  Normalize/standardize states and actions (fit on D)
6:  for epoch = 1, 2, … do
7:      Shuffle D and split into mini-batches {B_j}
8:      for each mini-batch B = {(s, a)}, b = 1..|B|, do
9:          Compute MDN outputs {π_k, μ_k, σ_k}, k = 1..K, for each (s, a) ∈ B
10:         Negative log-likelihood (per sample):
11:             L_NLL(s, a) = −log Σ_{k=1..K} π_k(s) Π_{d=1..D} N(a_d | μ_{k,d}(s), σ_{k,d}²(s))
12:         Regularization (optional):
13:             L_reg = β ‖Θ‖₂² − η H(π(s))
14:         Batch loss: L = (1/|B|) Σ_{(s,a)∈B} L_NLL(s, a) + L_reg
15:         Update Θ ← Θ − α ∇_Θ L
16:     end for
17: end for
18: Inference (deterministic): a(s) = Σ_{k=1..K} π_k(s) μ_k(s)
19: Inference (stochastic): sample k ~ Cat(π(s)), then a ~ N(μ_k(s), diag(σ_k²(s)))
20: return MDN policy π_Θ(a|s)
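A compact PyTorch realization of the MDN head and negative log-likelihood in Algorithm A1 is sketched below under the assumption of continuous actions of dimension D; the layer sizes, component count, and variable names are illustrative rather than the exact baseline configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNPolicy(nn.Module):
    """Behavior-cloning policy with a mixture-density head (cf. Algorithm A1)."""

    def __init__(self, state_dim, action_dim, n_components=5, hidden=128):
        super().__init__()
        self.K, self.D = n_components, action_dim
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        # Head outputs K mixture logits, K*D means, and K*D raw scales.
        self.head = nn.Linear(hidden, n_components * (1 + 2 * action_dim))

    def forward(self, s):
        z = self.head(self.encoder(s))
        logits, mu, raw_sigma = torch.split(
            z, [self.K, self.K * self.D, self.K * self.D], dim=-1)
        pi = F.softmax(logits, dim=-1)                       # mixture weights
        mu = mu.view(-1, self.K, self.D)                     # component means
        sigma = F.softplus(raw_sigma).view(-1, self.K, self.D) + 1e-4
        return pi, mu, sigma

def mdn_nll(pi, mu, sigma, a):
    """Negative log-likelihood of actions under the mixture (cf. line 11)."""
    a = a.unsqueeze(1)                                       # (B, 1, D)
    comp = torch.distributions.Normal(mu, sigma)
    log_prob = comp.log_prob(a).sum(-1)                      # (B, K)
    return -torch.logsumexp(torch.log(pi + 1e-12) + log_prob, dim=-1).mean()
```

Training would minimize mdn_nll over mini-batches with a standard optimizer, and deterministic inference would use the mixture mean Σ_k π_k(s) μ_k(s), matching line 18 of the listing.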
Algorithm A2 Generative Adversarial Imitation Learning (GAIL) with PPO
Require: Expert trajectories τ_e ~ π_expert with τ_e = (s_1, a_1, s_2, a_2, …)
Require: Learning rates α_w, α_θ, entropy weight λ, PPO clip ε
Ensure: Policy π_θ that imitates π_expert
1:  Initialize policy parameters θ and discriminator parameters w
2:  for iter = 1, 2, … do
3:      Collect on-policy data: roll out π_θ to obtain trajectories τ_θ
4:      Update discriminator d_w(s, a):
5:          δ_w ← Σ_{(s,a)∈τ_e} ∇_w log d_w(s, a) + Σ_{(s,a)∈τ_θ} ∇_w log(1 − d_w(s, a))
6:          w ← w + α_w δ_w                ⊳ Binary cross-entropy step
7:      Form imitation reward from the discriminator:
8:          r_D(s, a) ← −log(1 − d_w(s, a))                ⊳ GAIL reward
9:      Estimate returns Ĝ_t and advantages Â_t from r_D (e.g., GAE)
10:     Update policy with PPO:
11:        For minibatches M ⊂ τ_θ:
12:            L_CLIP(θ) = E_{(s,a)∈M}[ min(ρ_θ Â, clip(ρ_θ, 1−ε, 1+ε) Â) ] − λ E_{s∈M}[ H(π_θ(·|s)) ]
13:            where ρ_θ = π_θ(a|s) / π_{θ_old}(a|s)
14:        θ ← θ + α_θ ∇_θ L_CLIP(θ)
15: end for
16: return π_θ
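The discriminator update and reward shaping in Algorithm A2 (lines 4–8) can be sketched in PyTorch as follows; the PPO policy update itself is omitted, and the network sizes and helper names are illustrative assumptions rather than the exact baseline implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """d_w(s, a): probability that a state-action pair comes from the expert."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, s, a):
        return torch.sigmoid(self.net(torch.cat([s, a], dim=-1)))

def discriminator_step(disc, optimizer, expert_batch, policy_batch):
    """One binary cross-entropy update (cf. Algorithm A2, lines 4-6)."""
    s_e, a_e = expert_batch
    s_p, a_p = policy_batch
    d_e, d_p = disc(s_e, a_e), disc(s_p, a_p)
    # Expert pairs labeled 1, on-policy pairs labeled 0.
    loss = F.binary_cross_entropy(d_e, torch.ones_like(d_e)) + \
           F.binary_cross_entropy(d_p, torch.zeros_like(d_p))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def gail_reward(disc, s, a, eps=1e-8):
    """Imitation reward r_D(s, a) = -log(1 - d_w(s, a)) (cf. line 8)."""
    with torch.no_grad():
        return -torch.log(1.0 - disc(s, a) + eps)
```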

References

  1. Codevilla, F.; Santana, E.; López, A.M.; Gaidon, A. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 9329–9338. [Google Scholar]
  2. Pan, Y.; Cheng, C.A.; Saigol, K.; Lee, K.; Yan, X.; Theodorou, E.; Boots, B. Agile autonomous driving using end-to-end deep imitation learning. arXiv 2017, arXiv:1709.07174. [Google Scholar]
  3. Pan, Y.; Cheng, C.A.; Saigol, K.; Lee, K.; Yan, X.; Theodorou, E.A.; Boots, B. Imitation learning for agile autonomous driving. Int. J. Robot. Res. 2020, 39, 286–302. [Google Scholar] [CrossRef]
  4. Zhang, J.; Cho, K. Query-efficient imitation learning for end-to-end autonomous driving. arXiv 2016, arXiv:1605.06450. [Google Scholar]
  5. Morga-Bonilla, S.I.; Rivas-Cambero, I.; Torres-Jiménez, J.; Téllez-Cuevas, P.; Núñez-Cruz, R.S.; Perez-Arista, O.V. Behavioral cloning strategies in steering angle prediction: Applications in mobile robotics and autonomous driving. World Electr. Veh. J. 2024, 15, 486. [Google Scholar] [CrossRef]
  6. Ho, J.; Ermon, S. Generative adversarial imitation learning. Adv. Neural Inf. Process. Syst. 2016, 29, 4572–4580. [Google Scholar]
  7. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  8. Zheng, B.; Zhou, J.; Liu, C.; Li, Y.; Chen, F. Explaining Imitation Learning Through Frames. IEEE Intell. Syst. 2024, 39, 18–27. [Google Scholar] [CrossRef]
  9. Zhang, Y.; Tiňo, P.; Leonardis, A.; Tang, K. A survey on neural network interpretability. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 5, 726–742. [Google Scholar] [CrossRef]
  10. Kim, J.; Canny, J. Interpretable learning for self-driving cars by visualizing causal attention. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2942–2950. [Google Scholar]
  11. Ghasemipour, S.K.S.; Zemel, R.; Gu, S. A divergence minimization perspective on imitation learning methods. In Proceedings of the Conference on Robot Learning, Virtual, 16–18 November 2020; pp. 1259–1277. [Google Scholar]
  12. Zhu, Z.; Lin, K.; Dai, B.; Zhou, J. Off-policy imitation learning from observations. Adv. Neural Inf. Process. Syst. 2020, 33, 12402–12413. [Google Scholar]
  13. Fang, B.; Jia, S.; Guo, D.; Xu, M.; Wen, S.; Sun, F. Survey of imitation learning for robotic manipulation. Int. J. Intell. Robot. Appl. 2019, 3, 362–369. [Google Scholar] [CrossRef]
  14. Yu, C.; Zheng, X.; Zhuo, H.H.; Wan, H.; Luo, W. Reinforcement learning with knowledge representation and reasoning: A brief survey. arXiv 2023, arXiv:2304.12090. [Google Scholar] [CrossRef]
  15. Saarela, M.; Podgorelec, V. Recent applications of explainable AI (XAI): A systematic literature review. Appl. Sci. 2024, 14, 8884. [Google Scholar] [CrossRef]
  16. Došilović, F.K.; Brčić, M.; Hlupić, N. Explainable artificial intelligence: A survey. In Proceedings of the 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 0210–0215. [Google Scholar]
  17. Nazat, S.; Arreche, O.; Abdallah, M. On evaluating black-box explainable AI methods for enhancing anomaly detection in autonomous driving systems. Sensors 2024, 24, 3515. [Google Scholar] [CrossRef]
  18. Leech, T. Explainable Machine Learning for Task Planning in Robotics. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2019. [Google Scholar]
  19. Bewley, T.; Lawry, J.; Richards, A. Modelling agent policies with interpretable imitation learning. In Proceedings of the International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning, Virtual, 21–25 September 2020; Springer: Cham, Switzerland, 2020; pp. 180–186. [Google Scholar]
  20. Zhang, D.; Li, Q.; Zheng, Y.; Wei, L.; Zhang, D.; Zhang, Z. Explainable hierarchical imitation learning for robotic drink pouring. IEEE Trans. Autom. Sci. Eng. 2021, 19, 3871–3887. [Google Scholar] [CrossRef]
  21. Pan, M.; Huang, W.; Li, Y.; Zhou, X.; Luo, J. XGAIL: Explainable generative adversarial imitation learning for explainable human decision analysis. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 23–27 August 2020; pp. 1334–1343. [Google Scholar]
  22. Sarker, M.K.; Zhou, L.; Eberhart, A.; Hitzler, P. Neuro-symbolic artificial intelligence. AI Commun. 2021, 34, 197–209. [Google Scholar] [CrossRef]
  23. Hitzler, P.; Sarker, M.K. Neuro-Symbolic Artificial Intelligence: The State of the Art; IOS Press: Amsterdam, The Netherlands, 2022. [Google Scholar]
  24. Kimura, D.; Ono, M.; Chaudhury, S.; Kohita, R.; Wachi, A.; Agravante, D.J.; Tatsubori, M.; Munawar, A.; Gray, A. Neuro-symbolic reinforcement learning with first-order logic. arXiv 2021, arXiv:2110.10963. [Google Scholar]
  25. Zimmer, M.; Feng, X.; Glanois, C.; Jiang, Z.; Zhang, J.; Weng, P.; Li, D.; Hao, J.; Liu, W. Differentiable logic machines. arXiv 2021, arXiv:2102.11529. [Google Scholar]
  26. Keller, L.; Tanneberg, D.; Peters, J. Neuro-symbolic imitation learning: Discovering symbolic abstractions for skill learning. arXiv 2025, arXiv:2503.21406. [Google Scholar] [CrossRef]
  27. Song, Z.; Jiang, Y.; Zhang, J.; Weng, P.; Li, D.; Liu, W.; Hao, J. An interpretable deep reinforcement learning approach to autonomous driving. In Proceedings of the IJCAI Workshop on Artificial Intelligence for Autonomous Driving, Vienna, Austria, 23 July 2022. [Google Scholar]
  28. Muggleton, S.; De Raedt, L.; Poole, D.; Bratko, I.; Flach, P.; Inoue, K.; Srinivasan, A. ILP turns 20: Biography and future challenges. Mach. Learn. 2012, 86, 3–23. [Google Scholar] [CrossRef]
  29. Cropper, A.; Dumančić, S. Inductive logic programming at 30: A new introduction. J. Artif. Intell. Res. 2022, 74, 765–850. [Google Scholar] [CrossRef]
  30. Sharifi, I.; Yildirim, M.; Fallah, S. Toward Safe Autonomous Highway Driving Policies Using A Neuro-Symbolic Deep Reinforcement Learning Approach. Transp. Res. Rec. 2025. [Google Scholar] [CrossRef]
  31. Lifschitz, V. Answer Set Programming; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
  32. Corapi, D.; Russo, A.; Lupu, E. Inductive logic programming in answer set programming. In Proceedings of the International Conference on Inductive Logic Programming, Windsor Great Park, UK, 31 July–3 August 2011; pp. 91–97. [Google Scholar]
  33. Cropper, A.; Morel, R. Learning programs by learning from failures. Mach. Learn. 2021, 110, 801–856. [Google Scholar] [CrossRef]
  34. Sun, J.; Sun, H.; Han, T.; Zhou, B. Neuro-symbolic program search for autonomous driving decision module design. In Proceedings of the Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 21–30. [Google Scholar]
  35. Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highd dataset: A drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2118–2125. [Google Scholar]
  36. Yeo, H.; Skabardonis, A.; Halkias, J.; Colyar, J.; Alexiadis, V. Oversaturated freeway flow algorithm for use in next generation simulation. Transp. Res. Rec. 2008, 2088, 68–79. [Google Scholar] [CrossRef]
  37. Bishop, C.M. Mixture Density Networks; Technical Report NCRG/94/004; Aston University: Birmingham, UK, 1994; Available online: https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf (accessed on 10 November 2025).
  38. Kuutti, S.; Fallah, S.; Bowden, R. Adversarial mixture density networks: Learning to drive safely from collision data. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 705–711. [Google Scholar]
  39. Yildirim, M.; Fallah, S.; Tamaddoni-Nezhad, A. Human-like autonomous driving on dense traffic. arXiv 2023, arXiv:2310.02477. [Google Scholar]
  40. Palin, R.; Ward, D.; Habli, I.; Rivett, R. ISO 26262 safety cases: Compliance and assurance. In Proceedings of the 6th IET International Conference on System Safety 2011, Birmingham, UK, 20–22 September 2011; p. B12. [Google Scholar]
Figure 1. Symbolic imitation learning (SIL) framework for autonomous driving decision-making. In the SIL module, B denotes background facts from the environment, BK is background knowledge, and E represents training examples. Rule induction produces a set of rules H, from which h1–h5 filter out fatal actions and h6–h7 rank the safe candidates to select the optimal action among lane keeping (LK), left lane change (LLC), and right lane change (RLC). The chosen action a_t is mapped to a next-lane (NL) desired lateral position y_d^NL via a switch, and a PID controller generates the lateral acceleration a_y, which is combined with the longitudinal acceleration a_x to control the AV.
Figure 2. Sector-based representation of the autonomous vehicle (AV) on a three-lane highway. The green vehicle denotes the AV, and the surrounding eight sectors (Front, Front Left, Front Right, Left, Right, Back, Back Left, Back Right) define its state space. Each sector encodes whether it is occupied or not.
Figure 3. Samples of positive and negative examples for fatal and risky lane change rules. For the top-left (a) and top-middle (b) scenarios, pos(rlc_isFatal) and neg(llc_isFatal), and also for the bottom-left (d) and bottom-middle (e) scenarios, neg(rlc_isFatal) and pos(llc_isFatal) hold true. For the top-right (c) scenario, pos(rlc_isRisky) and neg(llc_isRisky), and also for the bottom-right (f) scenario, neg(rlc_isRisky) and pos(llc_isRisky) hold true.
Figure 4. Acceleration phases for the autonomous vehicle (AV) as a green vehicle and relative to the target vehicle (TV) as a black vehicle. The catch-up phase is used to reach the driver’s desired speed, the follow-up phase adjusts the AV’s velocity to match the front TV’s speed while maintaining a safe distance, and the brake phase is activated when the distance becomes unsafe, requiring deceleration. Safe and unsafe regions are indicated along the distance axis.
Table 1. Summary of the datasets and the rules extracted during the knowledge acquisition and rule induction phases of the SIL framework. The table reports the rule head, the number of predicates in the bias set (N_BP), the counts of positive (N_E+) and negative (N_E−) examples used for induction, the hypothesis label, the achieved accuracy, and the induction time (T_i). Abbreviations: N_BP, number of predicates in bias set B; N_E+, number of positive examples; N_E−, number of negative examples; T_i, induction time.
| Dataset Category | Action | Head Predicate | N_BP | N_E+ | N_E− | Hypothesis | Accuracy | T_i (s) |
|---|---|---|---|---|---|---|---|---|
| Fatal Lane Changing | RLC | rlc_isFatal | 38 | 768 | 256 | h1 | 1.00 | 0.365 |
| | LLC | llc_isFatal | 38 | 768 | 256 | h2 | 1.00 | 0.345 |
| Risky Lane Changing | LK | lk_isRisky | 43 | 87 | 937 | h3 | 1.00 | 0.762 |
| | RLC | rlc_isRisky | 38 | 320 | 704 | h4 | 1.00 | 0.416 |
| | LLC | llc_isRisky | 38 | 308 | 717 | h5 | 1.00 | 1.007 |
| Efficient Lane Changing | RLC | rlc_isBetter | 25 | 32 | 992 | h6 | 1.00 | 4.108 |
| | LLC | llc_isBetter | 25 | 128 | 896 | h7 | 1.00 | 0.892 |
| Smooth Longitudinal Velocity | Catch-up | reachDesiredSpeed | 43 | 512 | 512 | h8 | 1.00 | 0.336 |
| | Follow-up | reachFrontSpeed | 43 | 253 | 771 | h9 | 1.00 | 0.703 |
| | Brake | brake | 43 | 253 | 771 | h10 | 1.00 | 0.326 |
Table 2. Comparison of SIL and baseline methods on evaluation episodes from the HighD and NGSim scenarios with different seeds.
| Metric | Left-to-Right (HighD): DNNIL / BCMDN / GAIL / SIL | Right-to-Left (HighD): DNNIL / BCMDN / GAIL / SIL | NGSim: SIL |
|---|---|---|---|
| N_LC | 323140 | 452338 | 34 |
| T̄ (s) | 65.28 / 69.67 / 56.78 / 64.84 | 65.13 / 76.5 / 50.76 / 64.93 | 76.38 |
| V̄ (km/h) | 110.18 / 104.29 / 117.65 / 116.59 | 109.22 / 95.5 / 112.5 / 115.72 | 98.00 |
| SR_d (%) | 95.1 / 96 / 88.3 / 100 | 94 / 96.6 / 75.6 / 99.3 | 99 |
| SR_c (%) | 88 / 88 / 46 / 100 | 84 / 86 / 36 / 98 | 96 |

N_LC: number of lane changes; SR_c: collision-based success rate; SR_d: distance-based success rate; T̄: average mission time; V̄: average agent speed.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
