PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles

Khalil, Aws; Kwon, Jaerock

doi:10.3390/s26061798

Open AccessArticle

PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles

by

Aws Khalil

and

Jaerock Kwon

^*

Department of Electrical and Computer Engineering, University of Michigan-Dearborn, Dearborn, MI 48128, USA

^*

Author to whom correspondence should be addressed.

Sensors 2026, 26(6), 1798; https://doi.org/10.3390/s26061798

Submission received: 30 January 2026 / Revised: 5 March 2026 / Accepted: 9 March 2026 / Published: 12 March 2026

(This article belongs to the Special Issue Intelligent Control Systems for Autonomous Vehicles)

Download

Browse Figures

Versions Notes

Abstract

This study introduces the Perception Latency Mitigation Network (PLM-Net), a modular deep learning framework designed to mitigate perception latency in vision-based imitation-learning lane-keeping systems. Perception latency, defined as the delay between visual sensing and steering actuation, can degrade lateral tracking performance and steering stability. While delay compensation has been extensively studied in classical predictive control systems, its treatment within vision-based imitation-learning architectures under constant and time-varying perception latency remains limited. Rather than reducing latency itself, PLM-Net mitigates its effect on control performance through a plug-in architecture that preserves the original control pipeline. The framework consists of a frozen Base Model (BM), representing an existing lane-keeping controller, and a Timed Action Prediction Model (TAPM), which predicts future steering actions corresponding to discrete latency conditions. Real-time mitigation is achieved by interpolating between model outputs according to the measured latency value, enabling adaptation to both constant and time-varying latency. The framework is evaluated in a closed-loop deterministic simulation environment under fixed-speed conditions to isolate the impact of perception latency. Results demonstrate significant reductions in steering error under multiple latency settings, achieving up to

62 %

and

78 %

reductions in Mean Absolute Error (MAE) for constant and time-varying latency cases, respectively. These findings demonstrate the architectural feasibility of modular latency mitigation for vision-based lateral control under controlled simulation settings. The project page including video demonstrations, code, and dataset is publicly released.

Keywords:

latency mitigation; autonomous vehicle navigation; deep learning in robotics and automation; model learning for control; learning from demonstration

1. Introduction

1.1. Motivation

Vision-based Autonomous Vehicle (AV) control follows a perception–planning–control (sense–think–act) cycle, where visual observations are processed to generate control actions. In this cycle, there is a latency between sensing the environment and applying a corresponding action, which makes the human reaction time always higher than zero [1]. Similarly, it is challenging to completely eliminate this latency in AV control [2], and reducing it through powerful GPUs and FPGAs is impractical in automotive platforms. If not properly mitigated, this latency can degrade lateral tracking performance and steering stability, potentially affecting ride comfort and control reliability. Human drivers implicitly anticipate future vehicle states when reacting to visual stimuli [3]. Inspired by this predictive behavior, we propose a deep neural network architecture that forecasts future steering actions to mitigate perception latency.

In this paper, we refer to this latency as the perception latency

δ

, as shown in Figure 1. When vehicle state is

x_{t}

and we have observation

o_{t}

, the corresponding action

a_{t}

is applied at time

t + δ

rather than at time t (

o_{t} \to a_{t + δ}

). By the time this action is applied, the vehicle state has changed and we have a new observation. The perception latency has two components: the algorithmic latency, which is the time required for the algorithm to infer an action from an observation, and the actuator latency, which is the time required to apply the inferred action. The actuator latency, known as steering lag in lateral control [4], can be considered constant [4]. However, as mentioned in [5], the high processing cost of visual algorithms leads to uneven time delays based on the driving scenario, which leads to an overall perception latency that is time-varying.

To address both constant and time-varying perception latency encountered during lane keeping, we propose a deep-learning-based approach, focusing primarily on vision-based AV lateral control. The contribution of this paper will be discussed after explaining the effect of the perception latency on AV lateral control.

1.2. Latency Effect on AV Lateral Control

Vision-based AV lateral control for lane keeping can be achieved using various methods. The traditional approach involves incorporating a computer vision module for lane-marking detection alongside a classical control module for path planning and control. Alternatively, a deep-learning-based approach, such as imitation learning [6], directly maps visual input to control actions, like steering angle. This paper adopts the latter method.

The effect of perception latency on AV lateral control during lane keeping is highly dependent on vehicle speed. In this study, vehicle speed is held constant during evaluation in order to isolate the effect of perception latency independently of speed-induced dynamic variations. At low speeds, the scene does not change significantly between the time the vehicle receives the observation at time t and the time it applies the action at time

t + δ

, because the traveled distance during the latency period is minimal (

d = δ v

). In this scenario, it is reasonable to assume that

a_{t} \approx a_{t + δ}

, making the effect of

δ

negligible. Thus, at low speeds, we can consider that at time t, the action

a_{t}

corresponding to observation

o_{t}

is applied immediately (

o_{t} \to a_{t}

).

The effect of latency on AV control during lane keeping is illustrated in Figure 2. Two vehicles drive at a constant low speed: the green vehicle uses the normal real-time observation, assuming zero latency, while the red vehicle uses the delayed observation, mimicking the perception latency effect. Both vehicles start from position A and move towards position D. If we timestamp each position, then

t_{A} < t_{B} < t_{C} < t_{D}

. For the green vehicle at position B, the available observation is

o_{t_{B}}

and the corresponding action is

a_{t_{B}}

. For the red vehicle at the same position, the available observation is

o_{t_{B} - δ}

and the corresponding action is

a_{t_{B} - δ}

. This pattern continues for the other positions. Both vehicles remain in their lanes at positions A and B, before encountering any curves, because they started within the lane and continued straight, where the steering angle is zero. Thus, at positions A and B, the actions

a_{t}

and

a_{t - δ}

are effectively the same. At position C, the first curve is encountered. The green vehicle successfully turns left to stay in the lane, but the red vehicle continues straight. This deviation occurs because the red vehicle’s input observation at position C is

o_{t_{C} - δ}

, not

o_{t_{C}}

, leading it to apply the action

a_{t_{C} - δ}

at position C. After this first curve, the red vehicle’s zigzag trajectory becomes difficult to correct, even on a straight road, due to the incorrect action taken at position C, which causes subsequent incorrect observations.

In this approach, to mitigate the perception latency and avoid this unstable driving behavior, the main objective would be to predict the correct action from the delayed observation input.

1.3. Contribution

We introduce the Perception Latency Mitigation Network (PLM-Net), outlined in Figure 3. This novel deep learning approach is intended to work easily and without requiring any changes to the original vision-based LKA system of the AV. As depicted in Figure 3, PLM-Net leverages a Timed Action Prediction Model (TAPM) alongside the Base Model (BM), where the latter represents the preexisting vision-based LKA system. The design of the TAPM is inspired by our prior work, ANEC [7], and the Branched Conditional Imitation Learning model (BCIL) proposed by Codevilla et al. [8]. It combines the concept of a predictive model capable of forecasting future action from current visual observation, akin to ANEC [7] (originally inspired by the human driver capability of dealing with the perception latency [9]), with the notion of employing multiple sub-models within the BCIL framework [8] to provide different predictive action values corresponding to different latency levels. Additionally, similar to the ‘command’ input used in BCIL to select a sub-model, the real-time perception latency

δ_{t}

is used to determine the final action value. The final action value

{\tilde{a}}_{t}^{PLM}

is determined through the function

f ({\tilde{a}}_{t}, δ_{t})

where it performs linear interpolation based on the real-time latency value

δ_{t}

given all the predictive action values provided by the TAPM (

{\tilde{a}}_{t}^{TAPM}

) and the current action value provided by the BM (

{\tilde{a}}_{t}^{BM}

). A comprehensive explanation of our proposed method is provided in Section 3. The main contributions of this paper are summarized below.

Main contributions:
- We formulate perception latency in vision-based imitation-learning lane keeping as a time-offset control problem, and analyze how delayed observations degrade steering stability and lateral tracking performance;
- We propose PLM-Net, a modular plug-in latency-mitigation framework that augments an existing imitation-learning lane-keeping controller without modifying or retraining the base policy, thereby preserving its deployment characteristics;
- We introduce the Timed Action Prediction Model (TAPM), a latency-conditioned multi-head predictive module that produces discrete future steering actions indexed by delay values, enabling mitigation of both constant and time-varying latency through runtime interpolation based on measured latency;
- We validate the proposed framework in a deterministic closed-loop simulation environment under fixed-speed conditions to isolate latency effects, demonstrating substantial improvements in steering similarity and trajectory stability across multiple latency settings.

The structure of this paper proceeds as follows: Section 2 presents an overview of related research, while Section 3 outlines the proposed methodology. Following this, Section 4 and Section 5 present the experimental setup and the experimental findings within the simulation environment, accompanied by a comprehensive analysis. Finally, Section 6 summarizes the core findings of the study and provides valuable insights into potential avenues for future research.

2. Related Work

In the domain of vision-based control, the explicit modeling and mitigation of perception latency have received comparatively limited attention relative to other aspects of autonomous driving control. In classical control methods, discussions surrounding latency in autonomous driving have largely revolved around computational delays associated with hardware deployment [10,11,12,13] and communication delays related to network performance [14,15,16,17,18,19]. While several classical control studies have addressed perception or input delay in driving systems, these efforts are relatively sparse compared to the broader literature on delay-aware predictive control. Xu et al. [4] modeled steering lag as a fixed 200 ms delay, while Liu et al. [5] proposed a hierarchical MPC framework to compensate for time-variant input delays on the order of several hundred milliseconds. More recently, Kalaria et al. [20,21] developed robust control strategies to handle fixed perception delays (e.g., 200 ms), demonstrating safety improvements for both lane-keeping and racing tasks. In the context of networked and teleoperated autonomous vehicles, Kamtam et al. [22] reviewed delay patterns between 100 and 350 ms across vision and communication channels. Recent system-level studies have also begun to address the impact of latency variability on autonomous driving pipelines. Han and Kim [23] propose probabilistic scheduling techniques to minimize end-to-end perception–planning–control latency in real-time AV systems. While valuable, these approaches primarily aim to reduce latency, while our work focuses on learning to compensate for its effect on control behavior. Unlike classical or system-level approaches, PLM-Net introduces a learning-based method that adaptively mitigates both fixed and time-varying latency without relying on handcrafted dynamics or delay-tuned controllers. Classical delay-aware control methods, including preview control and delay-compensated MPC, explicitly incorporate system dynamics and delay models into the control law and typically require accurate vehicle modeling, as well as controller redesign. In contrast, PLM-Net does not modify the original control structure nor require explicit vehicle dynamics modeling. Instead, it augments an already trained vision-based imitation-learning policy with a latency-conditioned predictive layer that operates as a plug-in module. Therefore, rather than replacing model-based delay compensation strategies, the proposed framework provides a complementary learning-based solution tailored to data-driven lateral control systems where analytical models or controller redesign may not be feasible.

Within vision-based neural network control models, research efforts have primarily focused on model architecture and performance optimization, with comparatively limited attention given to explicit modeling of the perception latency issue [6,24,25,26,27,28,29,30,31,32,33,34].

Only few studies have discussed perception latency. Li et al. [2] underscored the significance of addressing latency in online vision-based perception systems. To tackle this issue, they introduced a methodology for assessing the real-time performance of perception systems, effectively balancing accuracy and latency. However, it is important to note that the paper primarily focuses on proposing a metric and benchmark for evaluating the real-time performance of perception systems, rather than offering a direct solution for vehicle control. While their approach provides valuable insights into quantifying the trade-off between accuracy and latency in perception systems, it does not directly address the challenges associated with mitigating latency in autonomous driving scenarios. Khalil et al. [7] addressed the perception latency issue by proposing the Adaptive Neural Ensemble Controller (ANEC). However, ANEC assumed perception latency to be constant and did not address time-varying latency. Weighted sum was used to combine the output from the two driving models, but the weight function parameters had to be carefully chosen and adjusted by hand. This introduces environment-specific parameter tuning, which may limit straightforward deployment across different scenarios. Furthermore, ANEC does not explicitly model latency as a runtime-conditioned variable nor provide discrete latency-indexed predictive heads. Instead, it blends multiple policies through adaptive weighting. In contrast, PLM-Net formulates latency mitigation as an explicit mapping between measurable delay values and predicted future steering actions. This design separates the base driving policy from the latency-compensation layer, enabling deterministic interpolation across latency conditions while preserving the original controller parameters. Mao et al. [35] explored latency in the context of video object detection by analyzing the detection latency of various video object detectors. While they introduced a metric to quantify latency, their study focused on measurement rather than proposing solutions to mitigate latency. Kocic et al. [36] tried to decrease the latency in driving by altering the neural network architecture. Although this approach could potentially diminish latency, preserving the original accuracy poses a significant challenge. Wu et al. [37] emphasized that control-based driving models, which convert images into control signals, inherently exhibit perception latency and are susceptible to failure due to their focus on the current time step. In response, they developed Trajectory-guided Control Prediction (TCP), a multi-task learning system integrating a control prediction model with a trajectory planning model. However, this approach necessitates the extraction of precise trajectories, presenting a notable challenge. Popov et al. [38] introduced a latent space generative world model that, while not explicitly addressing latency, exhibited inherent robustness to it during deployment. This suggests that certain architectures may inadvertently compensate for latency, though without targeted mechanisms. In contrast, our method proactively models latency during both training and inference, allowing it to adaptively adjust actions based on real-time delay conditions. Tampuu et al. [39] examined how delays and vehicle speed affect the test-time performance of end-to-end driving models, highlighting that even modest delay values can significantly impact behavior unless label alignment is addressed. However, their method focuses on mitigating symptoms of latency degradation, not latency modeling itself.

In our approach, we acknowledge the inevitability of latency and explicitly model its effect during both training and inference. By integrating latency-indexed predicted future actions with the current action based on the real-time latency value, the proposed method mitigates the impact of latency on steering behavior in vision-based imitation-learning lane keeping.

3. Method

This section describes the proposed Perception Latency Mitigation Network (PLM-Net) and its evaluation methodology. We first introduce the conceptual framework and explain how latency mitigation is achieved through the interaction between the Base Model (BM) and the Timed Action Prediction Model (TAPM). We then detail the architectural design of both models, followed by the training procedure and the performance metrics used to evaluate latency mitigation in vision-based lane keeping.

3.1. PLM-Net Framework

We begin by describing the overall framework of PLM-Net and the interaction between its components. As shown in Figure 3, the PLM-Net has two major components, the Base Model (BM) and the Timed Action Prediction Model (TAPM). This novel deep learning approach smoothly integrates the TAPM with the original vision-based LKA system of the AV, represented by the BM. Figure 4 shows how these two models mitigate the perception latency, where

π_{ϕ}^{BM}

is the BM policy and

π_{θ}^{TAPM}

is the TAPM policy. As explained in (1), given the vehicle state

s_{t}

at time t,

π_{ϕ}^{BM}

takes the input

i_{t} = {o_{t}, v_{t}}

, where

o_{t}

is the visual observation and

v_{t}

is the vehicle speed at time t, and provides the action

{\tilde{a}}_{t}^{BM}

.

{\tilde{a}}_{t}^{BM} = π_{ϕ}^{BM} (i_{t}) = π_{ϕ}^{BM} (o_{t}, v_{t}) .

(1)

The TAPM is a predictive action model, meaning that the policy

π_{θ}^{TAPM}

generates a set of predictive action values

{\tilde{a}}_{t}^{TAPM}

, where each action corresponds to a future operating state associated with a specific latency value in

δ^{TAPM}

.

The number of predictive actions in the vector

{\tilde{a}}_{t}^{TAPM}

depends on the number of sub-models in TAPM. If there are N sub-models, then

δ^{TAPM} = [δ_{1}, δ_{2}, \dots, δ_{N}]

. For example, if

δ_{1} = 0.15

s, then the action

{\tilde{a}}_{t + 0.15}

represents the action that the BM would take if the vehicle was in the state

s_{t + 0.15}

. The process of obtaining the vector

{\tilde{a}}_{t}^{TAPM}

is described by (2), where the inputs to

π_{θ}^{TAPM}

are the output of the

π_{ϕ}^{BM}

(i.e., the action

{\tilde{a}}_{t}^{BM}

) along with two feature vectors, the image feature vector

z_{t}^{o}

and the vehicle velocity vector

z_{t}^{v}

, both derived from

π_{ϕ}^{BM}

.

\begin{matrix} {\tilde{a}}_{t}^{TAPM} & = [{\tilde{a}}_{t + δ_{1}}, {\tilde{a}}_{t + δ_{2}}, \dots, {\tilde{a}}_{t + δ_{N}}] \\ = π_{θ}^{TAPM} ({\tilde{a}}_{t}, z_{t}^{o}, z_{t}^{v}) . \end{matrix}

(2)

The final action of the PLM-Net, denoted by

{\tilde{a}}_{t}^{PLM}

, is obtained through linear interpolation, as detailed in Algorithm 1. This interpolation combines the outputs of both

π_{ϕ}^{BM}

and

π_{θ}^{TAPM}

according to the real-time perception latency value

δ

. In this work, the latency

δ

is assumed to be measurable or monitored at inference time (e.g., through system-level latency tracking mechanisms) and is explicitly injected in the controlled simulation environment used for evaluation.

Algorithm 1 Linear Interpolation for Latency

Require:

δ, δ_{ref}

// Target latency value and list of known latency.
Require:

a_{δ_{ref}}

// Corresponding action values for known latency.
Ensure:

a_{δ}

// Interpolated action value for target latency
1: for

j = 1

to N do
2: if

δ_{ref} [j] < δ < δ_{ref} [j + 1]

then
3:

a_{δ} \leftarrow a_{δ_{ref}} [j] + \frac{δ - δ_{ref} [j]}{δ_{ref} [j + 1] - δ_{ref} [j]} \times (a_{δ_{ref}} [j + 1] - a_{δ_{ref}} [j])

  4:    end if
  5: end for
  6: return

a_{δ}

Since

{\tilde{a}}_{t}^{BM}

represents the steering action corresponding to the zero-latency case; it is equivalent to

{\tilde{a}}_{t + 0.0}

. Therefore, the value

0.0

is prepended to

δ^{TAPM}

and the action

{\tilde{a}}_{t}^{BM}

is appended to

{\tilde{a}}_{t}^{TAPM}

to construct the reference latency vector

δ_{r e f}

and its associated steering vector

{\tilde{a}}_{δ_{r e f}}

, as shown in (3).

\begin{matrix} δ_{r e f} & = [0.0, δ^{TAPM}] = [0.0, δ_{1}, δ_{2}, \dots, δ_{N}], \\ {\tilde{a}}_{δ_{ref}} & = [{\tilde{a}}_{t}^{BM}, {\tilde{a}}_{t}^{TAPM}] \\ = [{\tilde{a}}_{t + 0.0}, {\tilde{a}}_{t + δ_{1}}, {\tilde{a}}_{t + δ_{2}}, \dots, {\tilde{a}}_{t + δ_{N}}] . \end{matrix}

(3)

Given a measured latency value

δ

, the algorithm identifies the two adjacent latency entries in

δ_{r e f}

that bound

δ

and performs linear interpolation between their corresponding steering predictions. If

δ

exactly matches one of the predefined latency values, the corresponding steering action is selected directly without interpolation.

At inference time, the Base Model first computes the nominal steering action, the TAPM generates the latency-indexed predictive actions, and the measured latency

δ

determines the interpolated final output. This procedure enables smooth mitigation of both constant and time-varying perception latency within the predefined latency range.

3.2. PLM-Net Models Architecture

We now detail the internal architecture of the Base Model (BM) and the Timed Action Prediction Model (TAPM).

The architecture of the PLM-Net models is illustrated in Figure 5. The network design of the BM is presented in Figure 5a, and the network design of the TAPM is presented in Figure 5b.

The BM network design is inspired by the NVIDIA PilotNet structure [40], with modifications tailored to our requirements. Specifically, we adapt the network to accept visual observations (

o_{t}

) and vehicle speed (

v_{t}

) as inputs and to predict steering angle (action

{\tilde{a}}_{t}^{BM}

) as output, and we add dropout layers to improve generalization and avoid overfitting. The BM features two primary inputs: the visual observation

o_{t}

and the vehicle speed

v_{t}

. The visual observation undergoes processing through five convolutional layers, then a flatten layer to obtain the image feature vector

z_{t}^{o}

. Simultaneously, the vehicle speed input is directed through a fully connected layer that has 144 neurons, resulting in the formation of the vehicle speed vector

z_{t}^{v}

. Subsequently, the image feature vector

z_{t}^{o}

and the vehicle speed vector

z_{t}^{v}

are concatenated and forwarded to a multi-layer perceptron (MLP) network. This MLP configuration consists of four fully connected layers interspersed with three dropout layers. The fully connected layers contain 512, 100, 50, and 10 neurons, respectively. The dropout layers maintain a dropout rate of

0.3

. The final output of the BM

{\tilde{a}}_{t}^{BM}

, the image feature vector

z_{t}^{o}

, and the vehicle speed vector

z_{t}^{v}

, are forwarded to the TAPM network.

The TAPM network design is the result of fusing two key ideas. Firstly, it incorporates a predictive model, similar to ANEC [7], which was inspired by human drivers’ ability to mitigate perception latency. Secondly, it utilizes the BCIL framework [8], employing multiple sub-models to provide a range of predictive action values that align with different latency levels, and adding a ‘command’ input, representing the perception latency

δ

, to influence the final action value. The TAPM network inputs are

{\tilde{a}}_{t}^{BM}

,

z_{t}^{o}

, and

z_{t}^{v}

, forwarded from the BM. These three inputs go to 100-neuron, 500-neuron, and 100-neuron fully connected layers, respectively. The outputs of these three layers are then concatenated to be forwarded to all sub-models. Each sub-model consists of three fully connected layers, with 200, 100, and 50 neurons, interspersed with two dropout layers with a dropout rate of

0.3

. Sub-models outputs will result in

{\tilde{a}}_{t}^{TAPM}

, shown in (2). Although a single neural network could, in principle, learn a continuous mapping between latency values and steering actions, prior work [8,31] has shown that branched architectures improve stability and performance when handling condition-dependent outputs. This motivated our adoption of a BCIL-inspired design for TAPM, where separate sub-models specialize in distinct latency conditions.

3.3. PLM-Net Models Training

This subsection explains the supervised training procedures for both the BM and the TAPM. The functionality of a vision-based LKA system can be achieved through imitation learning, where we directly map the steering angle (i.e., action

a_{t}

) to the input

i_{t} = {o_{t}, v_{t}}

, where

o_{t}

is the visual observation and

v_{t}

is the vehicle speed at time t. The training dataset collected by an expert driver can be defined as

D = {i_{t}, a_{t}}_{t = 1}^{M}

, where M is the total number of time-steps. Figure 6 explains the training process of the PLM-Net models. Learning the BM policy

π_{ϕ}^{BM}

is a supervised learning problem. The parameters

ϕ

of the policy are optimized by minimizing the prediction error of

a_{t}

given the input

i_{t}

, as shown in (4), where we use the Mean Squared Error (MSE) to calculate the loss per sample. Once optimized, the BM can predict the action

{\tilde{a}}_{t}^{BM}

given the input

i_{t} = {o_{t}, v_{t}}

at time t, as shown in (1).

\underset{ϕ}{arg min} \frac{1}{M} \sum_{t = 1}^{M} {({\tilde{a}}_{t}^{BM} - a_{t})}^{2}

(4)

To learn the TAPM policy

π_{θ}^{TAPM}

, we generate a new dataset

D^{TAPM}

from the original dataset

D

, mapping the input

i_{t} = {o_{t}, v_{t}}

to N distinct future actions

a_{t}^{TAPM}

corresponding to N distinct latency values

δ^{TAPM}

(2), where

D^{TAPM} = {i_{t}, a_{t}^{TAPM}}_{t = 1}^{M} = {i_{t}, [a_{t + δ_{1}}, \dots, a_{t + δ_{N}}]}_{t = 1}^{M}

The optimization of the parameters

θ

of the TAPM policy

π_{θ}^{TAPM}

is explained in (5). We minimize the prediction error of

a_{t}^{TAPM}

given the input

i_{t}

, where

a_{t}^{TAPM}

indicates the ground truth values of

{\tilde{a}}_{t}^{TAPM}

. We use MSE to calculate the loss per sample.

\underset{θ}{arg min} \frac{1}{M} \sum_{t = 1}^{M} \sum_{j = 1}^{N} {({\tilde{a}}_{t + δ_{j}} - a_{t + δ_{j}})}^{2}

(5)

To integrate the TAPM with the existing LKA system without modifying it, we set the BM to be non-trainable during the training of the TAPM. This modular separation allows PLM-Net to serve as a plug-in latency mitigation layer, enabling deployment without retraining the base controller. This design ensures that latency mitigation is achieved without altering the original control policy, preserving the integrity and deployment characteristics of the base lane-keeping controller.

3.4. Performance Metrics

Finally, we describe the quantitative metrics used to evaluate latency mitigation performance. In the context of lateral control for AVs, perception latency can lead to delayed or incorrect actions, particularly affecting the steering angle, which in turn impacts the overall trajectory of the vehicle, as explained in Section 1.2. Therefore, to assess PLM-Net’s ability to mitigate this latency, two primary evaluation criteria can be employed: steering angle similarity and trajectory similarity. Accurate steering is essential for maintaining lane position and following the desired path, ensuring precise vehicle control even under latency conditions. Meanwhile, trajectory similarity evaluates how closely the vehicle’s path adheres to the intended trajectory, providing insight into the broader impact of perception latency on vehicle navigation and the efficacy of PLM-Net in mitigating it.

4. Simulation-Based Validation Setup

This section outlines the experimental setup for validating the Perception Latency Mitigation Network (PLM-Net). We begin by describing the simulator used for creating a controlled testing environment. Next, we detail the dataset for training and evaluation. We then discuss the methods for measuring driving performance, followed by the parameter tuning process. After that, an ablation study is presented to highlight key development decisions. Finally, we specify the computational environment and machine learning framework, including hardware and software configurations.

4.1. Simulator

To simulate and evaluate the effect of perception latency on lateral control, we used the OSCAR simulator [41], which provides a simulated Ford Fusion vehicle and multiple test tracks. OSCAR is tightly integrated with an ROS (Robotic Operating System) [42] and a Gazebo multi-robot 3D simulator [43], enabling real-time control testing and closed-loop behavior evaluation. These capabilities made it well-suited for our goal of studying perception latency under controlled and repeatable conditions.

While other simulators such as CARLA [44] or NVIDIA Drive SIM [45] offer photorealistic rendering and advanced sensor simulation, the focus of our work does not require such features. Our primary goal is to explore the learning and integration of latency-aware control models, and OSCAR provided a practical and efficient platform for this investigation. In our experiments, perception latency was explicitly injected within the control loop to emulate delay, enabling precise and repeatable evaluation under both constant and time-varying latency profiles.

The simulated environment consisted of a single ego vehicle without surrounding traffic, and the human driver followed the predefined lane centerline as the reference trajectory during data collection.

4.2. Dataset

4.2.1. Training and Test Tracks

The training track used for PLM-Net is identical to the one employed in our prior work [7], ensuring consistency in base controller learning. To evaluate generalization and latency mitigation performance, a separate three-lane test track was designed containing a combination of straight segments and both left and right turns, as shown in Figure 7. This track was not used during training and was selected to expose the controller to multiple curvature conditions so that the interaction between perception latency and different road geometries could be examined during evaluation.

4.2.2. Data Collection

Our training dataset, denoted by

D

, was collected by a human driver navigating the training track using an OSCAR simulator. This simulator captures visual observations via a mounted camera on the vehicle, alongside critical control values such as steering angle, throttle position, braking pressure, time, velocity, acceleration, and position. High-quality data was collected using the Logitech G920 dual-motor feedback driving force racing wheel with pedals and a gear shifter, resulting in approximately 115,000 clean training samples. Table 1 provides detailed statistics on the steering and velocity data.

The steering angle for the vehicle is represented on a scale from

- 1

to 1, where 0 denotes the center position. According to the right-hand rule, positive steering angles (0 to 1) correspond to rotations to the left, and negative steering angles (0 to

- 1

) correspond to rotations to the right. The steering wheel has a maximum rotation angle of

\pm 450^{\circ}

. This mapping implies that a steering angle of 1 corresponds to a

450^{\circ}

rotation to the left, while a steering angle of

- 1

corresponds to a

450^{\circ}

rotation to the right. Thus, the steering angle range is scaled such that 0 to 1 maps to

0^{\circ}

to

450^{\circ}

and 0 to

- 1

maps to

0^{\circ}

to

- 450^{\circ}

.

During both training and evaluation, the vehicle operated at a constant predefined speed to isolate the effect of perception latency on lateral control. The relationship between speed and latency impact was previously analyzed in [7]; in this work, speed was held constant to focus specifically on latency mitigation behavior.

4.2.3. Data Balancing

As shown in Table 1, approximately

50 %

of the steering values in our dataset are close to

0.0

, indicating the vehicle traveling on a straight road segment. Training a driving model solely on this dataset would introduce bias. To address this, we conducted histogram-based data balancing to reduce the skew towards zero steering values. Figure 8 illustrates the steering angle histogram before (left) and after (right) the balancing process. Post-balancing, our dataset comprised approximately 67,000 data samples.

4.2.4. Data Augmentation

To improve the model’s generalization capabilities and augment dataset diversity during training, we implemented data augmentation techniques. Each image presented to the network undergoes a random subset of transformations, including horizontal flipping, where the steering value is negated, and random changes in brightness, while preserving the original steering angle. These augmentation techniques have proven effective in enhancing the robustness of our model during training.

4.3. Driving Performance Evaluation

While Section 3.4 explained the rationale behind choosing the performance metrics and their importance, this section discusses the adopted methods to measure them in our experiments.

4.3.1. Steering Angle Similarity

We analyzed the steering angle values under different conditions: BM driving without latency, BM driving with latency but without TAPM, and BM driving with latency and TAPM (utilizing PLM-Net). This analysis was conducted both qualitatively, through visual inspection, and quantitatively, by calculating metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). These metrics quantify pointwise deviation between steering signals, where lower values indicate improved temporal alignment with the latency-free baseline controller.

4.3.2. Trajectory Similarity

The performance metrics used to compare the driving trajectories were adopted from [46,47]. We measure the similarity between driving trajectories based on lane center positioning. We use partial curve mapping, Frechet distance, area between curves, curve length, and dynamic time warping from [46], and the Driving Trajectory Stability Index (DTSI) from [47]. These trajectory metrics capture complementary aspects of spatial deviation, including geometric similarity, accumulated lateral error, temporal alignment of motion profiles, and overall path stability relative to the latency-free baseline.

All evaluations were conducted in a deterministic closed-loop simulation environment with fixed speed and predefined latency injection profiles. Under identical latency settings, repeated trials yield identical steering outputs and vehicle trajectories. Therefore, a single representative evaluation per latency condition is sufficient to characterize system behavior.

4.4. Parameter Tuning

For the TAPM, we used

N = 5

sub-models, meaning the TAPM predicts five future actions for five different latency values in

δ^{TAPM}

. Specifically, the latency values start with

δ_{1} = 0.15

s, with increments of

0.05

seconds, resulting in

δ^{TAPM} = [0.15, 0.20, 0.25, 0.30, 0.35]

seconds. Consequently, the predictive action vector

{\tilde{a}}_{t}^{TAPM}

in (2) becomes

{\tilde{a}}_{t}^{TAPM} = [{\tilde{a}}_{t + 0.15}, {\tilde{a}}_{t + 0.20}, {\tilde{a}}_{t + 0.25}, {\tilde{a}}_{t + 0.30}, {\tilde{a}}_{t + 0.35}]

The selected latency range was designed to cover reported steering actuation delays (0.2 s) [4], as well as visual and network-induced delays, in connected and teleoperated autonomous vehicle systems, which frequently range from 100 to 350 ms and exhibit noticeable degradation around 300 ms [5,22].

As detailed in (3), the reference latency vector

δ_{r e f}

and the corresponding action vector

{\tilde{a}}_{δ_{r e f}}

are formed by including the BM action for zero latency. Thus,

δ_{r e f}

and

{\tilde{a}}_{δ_{r e f}}

become

δ_{r e f} = [0.0, 0.15, 0.20, 0.25, 0.30, 0.35] .

{\tilde{a}}_{δ_{r e f}} = [{\tilde{a}}_{t + 0.0}, {\tilde{a}}_{t + 0.15}, {\tilde{a}}_{t + 0.20}, {\tilde{a}}_{t + 0.25}, {\tilde{a}}_{t + 0.30}, {\tilde{a}}_{t + 0.35}]

These configurations enable the PLM-Net to handle both constant and time-varying perception latency within the range [0–0.35] s. Latency values outside this predefined range were not evaluated because, beyond approximately

0.30

–

0.35

s, the baseline model (BM) departed the lane and subsequently left the track, preventing meaningful trajectory-based evaluation. Extending the latency range would require additional latency heads and potentially a redesigned baseline controller.

Both models, the BM and the TAPM, were trained using the Adam optimizer [48] with a batch size of 32 and a learning rate of

0.001

. Table 2, shows the number of trainable and non-trainable parameters for the BM and the TAPM. The BM has 6,006,191 parameters, in which all of them are trainable. The TAPM has 12,256,396 total parameters, where 6,250,205 are trainable and 6,006,191 are non-trainable since the BM layers are set to be not trainable when training the TAPM. For the performance metric DTSI, we used the default parameters recommended in [47].

4.5. Latency Knowledge and Modeling Assumptions

In this study, perception latency is injected within the closed-loop simulation environment, allowing direct access to the delay value at runtime. The framework therefore assumes availability of a measurable scalar latency estimate rather than knowledge of its physical origin. In practical autonomous driving systems, such estimates can be obtained through timestamp synchronization across sensing, processing, communication, and actuation modules.

The injected delay represents aggregate perception-to-actuation latency rather than separating algorithmic, communication, and actuator components. This abstraction allows the mitigation mechanism to remain agnostic to the physical source of delay and focus on compensating its behavioral impact on steering control.

Linear interpolation between discrete latency-conditioned predictions is adopted because steering evolution remains locally smooth under moderate delay variation within the trained range. The discrete latency heads are ordered with respect to delay magnitude, enabling interpolation to approximate intermediate latency values without modifying the base controller.

4.6. Ablation Study

In our exploration of different model architectures for the TAPM network, we experimented with various configurations to optimize performance. Initially, we introduced an additional fully connected layer with 500 neurons after the concatenation layer preceding the five sub-models. However, this adjustment overly complicated the learning process for the predictive action values. Since this layer was shared among all sub-models, it hindered their individual learning capacities, resulting in subpar outcomes. Further experimentation involved modifying the number of layers within the sub-models. Reducing the layers to two, with 100 and 50 neurons respectively, led to the model’s inability to effectively learn predictive actions. Similarly, adding an extra fully connected layer to each sub-model with 300 neurons yielded comparable (if not slightly inferior) results to the existing architecture, thus introducing unnecessary complexity without significant improvement. Additionally, we explored the integration of Long Short-Term Memory (LSTM) [49] layers into the sub-models to capture temporal information. However, this approach encountered substantial challenges. Firstly, the model’s complexity increased significantly, impeding computational efficiency. Secondly, the nature of capturing temporal information hindered conventional data balancing techniques and random data sampling from the dataset to enhance batch diversity.

During model training, we found that a batch size of 32 yielded optimal results compared to smaller sizes such as 16 and 8. Furthermore, we experimented with adjusting the learning rate from

0.001

to

0.0001

. However, the model failed to converge under the lower learning rate, suggesting that the original rate was more conducive to effective training.

4.7. Computational Environment and Machine Learning Framework

Our machine learning framework is built upon TensorFlow and Keras libraries. Specifically, we utilized Keras version

2.2 . 5

in conjunction with TensorFlow-GPU version

1.12 . 0

, leveraging CUDA 9 and cuDNN

7.1 . 2

for GPU acceleration. All computational experiments were conducted on a hardware setup featuring an Intel i7-10700K CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti with 11 GB of GPU memory. The operating system employed for these experiments was Ubuntu

18.04 . 6

.

4.8. Computational Cost Analysis

To assess the computational overhead introduced by PLM-Net at deployment, we report inference latency and GPU memory usage relative to the BM. Parameter counts and the trainable/non-trainable breakdown are provided in Table 2. During inference, PLM-Net executes a forward pass through the BM, followed by a forward pass through the TAPM and a lightweight linear interpolation step (Algorithm 1).

Inference-time measurements were performed with batch size 1 after discarding the first 50 samples to mitigate warm-up effects. The BM achieved an average inference time of 2.352 ms per frame (p95: 4.532 ms; N = 670) and occupied approximately 650 MB of GPU memory. In comparison, PLM-Net required 5.909 ms per frame on average (p95: 9.880 ms; N = 686) and occupied approximately 778 MB of GPU memory. This corresponds to an additional computational overhead of 3.557 ms per frame and an additional 128 MB of GPU memory (19.7% increase) relative to the BM.

Given a 30 Hz control frequency (33.3 ms per cycle), the observed inference times confirm that PLM-Net maintains real-time feasibility with a substantial timing margin. The additional computational overhead is justified by the improved trajectory tracking performance under latency conditions, as demonstrated in Section 5.

5. Results

Our experimental design aimed to investigate the impact of perception latency on driving and assess the efficacy of our proposed solution, PLM-Net, in mitigating this effect (as detailed in Section 1.2). To simulate latency, we introduced delays in the input data and applied closed-loop velocity control to maintain a constant vehicle speed (approximately 60 km/h), reflecting realistic latency conditions observed in perception–control pipelines where perception and decision-making processes are delayed. For instance, with a

0.2

s latency, the available visual observation at time t becomes

o_{t - 0.2}

instead of

o_{t}

, causing the baseline model (BM) to compute actions based on outdated perception. PLM-Net aims to mitigate this mismatch by predicting an action that better approximates the desired action at time t, despite the delayed observation.

We assess the impact of perception latency through a comparison of driving behaviors: BM without latency, BM with latency, and PLM-Net with latency. Successful mitigation by PLM-Net is indicated when its driving performance closely resembles that of the latency-free BM. Our evaluation involves analyzing steering angle similarity and driving trajectory similarity, as detailed in Section 4.3, for both constant and time-variant perception latency.

To examine how perception latency interacts with road geometry, trajectory similarity metrics are reported not only for the full test track but also for individual track segments, including straight sections, left turns, and right turns. This segment-level evaluation enables analysis of latency-induced deviations under different curvature conditions and provides additional insight into how the mitigation mechanism behaves across distinct driving scenarios.

5.1. Constant Perception Latency Mitigation

In evaluating PLM-Net under constant perception latency, we focus on a latency of

0.2

s, with similar trends observed for other constant latency values, as illustrated in Appendix A. Figure 9 provides a qualitative comparison of steering angles over time between the BM with and without latency and PLM-Net with the same latency. In this figure, the blue line represents the BM driving without latency, the green line represents the BM driving with

0.2

s latency, and the red line represents PLM-Net driving with

0.2

s latency. Additionally, Figure 10 provides a visual representation of vehicle trajectories on the test track, colored based on steering angle, to further elucidate the comparative performance. Table 3 quantifies steering angle errors, demonstrating PLM-Net’s superior performance in reducing errors compared to the BM under identical latency conditions. Under a constant perception latency of 0.2 s, the performance of the BM degraded substantially.

Furthermore, Figure 11 presents trajectory comparisons qualitatively, while Table 4 presents trajectory comparisons quantitatively, showing PLM-Net’s ability to maintain accurate driving trajectories despite latency-induced challenges. Each color-coded trajectory corresponds to a different driving condition: blue for the BM driving without latency, green for the BM driving with

0.2

s latency, and red for PLM-Net driving with

0.2

s latency. Examining the deviation from the lane center on the full track, the Partial Curve Mapping metric for the BM with

0.2

s latency increased by

238.7 %

, while PLM-Net maintained a much smaller increase of only

11.1 %

. Similarly, improvements in Frechet distance, area between curves, curve length, and DTSI demonstrate that PLM-Net significantly reduces the trajectory deviation caused by latency. The additional segments of Figure 11 and Table 4, specifically parts (b), (c), and (d), which depict the trajectories on a straight road, right turn, and left turn, respectively, also demonstrate that PLM-Net effectively mitigates latency, similar to the results observed on the full track.

The Mean Absolute Error (MAE) between the BM without latency and the BM with latency is

0.1915

, indicating a substantial degradation in steering angle accuracy. However, when using PLM-Net, the performance decline was mitigated, with the MAE reduced to

0.0726

. This corresponds to a

62.1 %

reduction in MAE relative to the BM under the same latency condition. Additionally, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were similarly improved with PLM-Net, showing reductions of

82.5 %

and

58.2 %

, respectively.

5.2. Time-Variant Perception Latency Mitigation

For time-variant perception latency, we evaluate PLM-Net against varying latency levels ([0.0–0.35] s). Similar to the constant latency scenario, the upper image in Figure 12 illustrates qualitative steering angle comparisons over time, with the blue line representing the BM driving without latency, the green line representing the BM driving with time-variant latency, and the red line representing PLM-Net driving with time-variant latency. The lower image illustrates the varying latency values experienced by both models over time. Additionally, Figure 13 provides a visual representation of vehicle trajectories on the test track, colored based on steering angle values. Table 5 quantifies steering angle errors, demonstrating PLM-Net’s effectiveness in reducing errors compared to the BM under time-variant latency conditions. Under a time-variant perception latency of [0.0–0.35] s, the performance of the BM degraded substantially. The Mean Absolute Error (MAE) between the BM without latency and the BM with latency is

0.3336

, indicating a substantial degradation in steering angle similarity. However, when using PLM-Net, the performance decline was mitigated, with the MAE reduced to

0.0710

. This represents a

78.7 %

improvement in MAE compared to the BM under the same latency condition. Additionally, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were similarly improved with PLM-Net, showing reductions of

94.2 %

and

76.0 %

, respectively.

Figure 14 presents trajectory comparisons qualitatively, while Table 6 presents trajectory comparisons quantitatively, confirming PLM-Net’s successful mitigation of time-variant perception latency. Each color-coded trajectory corresponds to a different driving condition: blue for the BM driving without latency, green for the BM driving with time-variant latency, and red for PLM-Net driving with time-variant latency. The Partial Curve Mapping metric revealed a

396.6 %

increase in deviation from the lane center for the BM, with a [0.0–0.35] s time-variant latency on the full track, while PLM-Net showed a more modest increase of

114.0 %

. Similarly, the Frechet distance for the BM increased by

254.9 %

, compared to a

66.2 %

increase for PLM-Net. This pattern of reduced deviation is also reflected in the improvements in area between curves, curve length, and DTSI, indicating that PLM-Net significantly mitigates the trajectory deviations caused by latency. Parts (b), (c), and (d) of Figure 14 and Table 6, which illustrate the trajectories during a straight segment, a right turn, and a left turn, respectively, exhibit results consistent with the full track, confirming that PLM-Net successfully mitigates latency.

Overall, the experimental findings underscore PLM-Net’s robustness in mitigating both constant and time-variant perception latency, indicating its potential for improving robustness against perception latency in vision-based autonomous driving systems.

6. Conclusions

This paper introduced PLM-Net, a learning-based framework designed to mitigate the effect of perception latency in vision-based imitation-learning lateral control systems. By integrating a Timed Action Prediction Model (TAPM) with an existing Base Model (BM), PLM-Net anticipates latency-induced mismatch between perception and control without modifying the original controller architecture.

The TAPM predicts a discrete set of future steering actions corresponding to predefined latency values, and linear interpolation is used to generate the final control output according to the real-time latency. This design enables mitigation of both constant and time-varying perception latency within the modeled range.

Experimental results in a deterministic closed-loop simulation environment demonstrated that PLM-Net substantially reduces latency-induced steering and trajectory errors. Under a constant

0.2

s latency, PLM-Net achieved a

62.1 %

reduction in MAE compared to the baseline model. Under time-variant latency within [0.0–0.35] s, the MAE reduction reached

78.7 %

. Trajectory-based metrics further confirmed improved lane-following performance across straight and turning segments.

Limitations: This study was conducted in a deterministic closed-loop simulation environment with a fixed vehicle speed to isolate the effect of perception latency on lateral control behavior. While this controlled setting enables systematic evaluation of latency mitigation behavior, broader validation across multiple drivers, stochastic latency realizations, and more diverse road environments remains an important direction for future work. The training dataset was collected from a single driver in a controlled simulator setting, consistent with common imitation-learning frameworks, and was evaluated on a separate test track to assess generalization under the studied conditions. The modeled latency range was bounded within [0.0–0.35] s, beyond which the baseline controller departed the track boundaries, preventing meaningful trajectory-based comparison. Additionally, the proposed framework was evaluated for lateral control only under aggregate perception-to-actuation delay modeling. These constraints define the scope of validation for the present study and motivate future investigation under broader operational, dynamic, and multi-sensor conditions. The present work focuses on architectural latency mitigation under bounded delay assumptions and does not claim replacement or superiority over model-based delay compensation methods; rather, it provides a complementary learning-based solution designed for vision-based imitation-learning control pipelines where explicit vehicle modeling or controller redesign may not be feasible.

Future work will focus on extending the modeled latency range, evaluating the framework under varying vehicle speeds and more complex traffic scenarios, integrating the approach into higher-fidelity simulation and real-world platforms, and exploring hybrid strategies that combine learning-based latency mitigation with classical delay-aware control methods to enhance robustness and theoretical guarantees.

Author Contributions

Conceptualization, J.K. and A.K.; methodology, A.K.; software, A.K.; validation, A.K.; formal analysis, A.K.; resources, J.K.; data curation, A.K.; writing—original draft preparation, A.K.; writing—review and editing, J.K.; visualization, A.K.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This material is based upon work supported in part by the National Science Foundation under Grant No. 2500638, and in part by institutional support from the University of Michigan–Dearborn.

Data Availability Statement

The project page includes the video, code, and dataset: https://awskhalil.github.io/plm-net/ (accessed on 8 March 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Constant Perception Latency

The following are the results from the different constant perception latency values that we tested to validate our model PLM-Net. The interpretation of these results is similar to that in Section 5, so we directly show the relevant figures and tables for each latency value.

Appendix A.1. Perception Latency = 0.15 s

This subsection illustrates the results when the perception latency was

0.15

s. Figure A1 and Table A1 show the steering comparison against time. Figure A2 compares driving trajectories colored based on steering angle values. Figure A3 and Table A2 show the trajectory similarity.

Table A1. Steering Angle Similarity. Quantitative comparison of steering angle between the BM driving without latency, the BM driving with latency of

0.15

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

Table A1. Steering Angle Similarity. Quantitative comparison of steering angle between the BM driving without latency, the BM driving with latency of

0.15

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

BM, No Latency vs.	MAE ↓	MSE ↓	RMSE ↓
BM, Latency = 0.15 s	0.171355	0.0570811	0.238916
PLM-Net, Latency = 0.15 s	0.053122	0.00691402	0.0831506

Figure A1. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.15

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure A1. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.15

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure A2. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.15

s latency. (c) The trajectory of the PLM-Net driving with

0.15

s latency.

Figure A2. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.15

s latency. (c) The trajectory of the PLM-Net driving with

0.15

s latency.

Figure A3. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with

0.15

s latency (green line), and the PLM-Net driving with

0.15

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Figure A3. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with

0.15

s latency (green line), and the PLM-Net driving with

0.15

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Table A2. Quantitative comparison of driving trajectory similarity between the BM driving without latency, the BM driving with latency of

0.15

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Table A2. Quantitative comparison of driving trajectory similarity between the BM driving without latency, the BM driving with latency of

0.15

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Figure A3 Ref.	Driving Model	Partial	Frechet	Area	Curve	Dynamic	$\bar{DTSI}$ ↓
		Curve ↓	Distance ↓	Between ↓	Length ↓	Time ↓
		Mapping		Curves		Warping
(a) Full Track	BM, No Latency (ref)	1.19	2.176	4658.15	0.349	1253.21	0.034
(a) Full Track	BM, Latency = $0.15$ s	1.931	4.087	15,946.7	0.432	2343.73	0.266
(a) Full Track	PLM-Net, Latency = $0.15$ s	1.579	4.543	13,035	0.383	2255.83	0.041
(b) Straight	BM, No Latency (ref)	24.475	5.095	13.556	0.077	76.376	0.012
(b) Straight	BM, Latency = $0.15$ s	229.411	5.918	123.118	0.109	213.709	0.016
(b) Straight	PLM-Net, Latency = $0.15$ s	63.767	5.566	34.741	0.086	97.41	0.012
(c) Left turn	BM, No Latency (ref)	0.895	3.966	41.402	0.178	67.892	0.025
(c) Left turn	BM, Latency = $0.15$ s	0.932	4.115	98.63	0.179	113.299	0.07
(c) Left turn	PLM-Net, Latency = $0.15$ s	0.531	2.126	81.367	0.141	92.124	0.058
(d) Right turn	BM, No Latency (ref)	0.293	1.546	90.228	0.048	79.036	0.037
(d) Right turn	BM, Latency = $0.15$ s	0.746	3.971	285.641	0.08	169.875	0.113
(d) Right turn	PLM-Net, Latency = $0.15$ s	0.313	1.882	67.192	0.05	75.371	0.098

Appendix A.2. Perception Latency = 0.25 s

This subsection illustrates the results when the perception latency was

0.25

s. Figure A4 and Table A3 show the steering comparison against time. Figure A5 compares driving trajectories colored based on steering angle values. Figure A6 and Table A4 show the trajectory similarity.

Figure A4. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.25

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure A4. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.25

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure A5. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.25

s latency. (c) The trajectory of the PLM-Net driving with

0.25

s latency.

Figure A5. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.25

s latency. (c) The trajectory of the PLM-Net driving with

0.25

s latency.

Table A3. Steering angle similarity. Quantitative comparison of steering angle between the BM driving without latency, the BM driving with latency of

0.25

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

Table A3. Steering angle similarity. Quantitative comparison of steering angle between the BM driving without latency, the BM driving with latency of

0.25

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

BM, No Latency vs.	MAE ↓	MSE ↓	RMSE ↓
BM, Latency = 0.25 s	0.334727	0.18581	0.431057
PLM-Net, Latency = 0.25 s	0.0921437	0.0192333	0.138684

Figure A6. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with

0.25

s latency (green line), and the PLM-Net driving with

0.25

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Figure A6. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with

0.25

s latency (green line), and the PLM-Net driving with

0.25

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Table A4. Quantitative comparison of driving trajectory similarity between the BM driving without latency, the BM driving with latency of

0.25

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Table A4. Quantitative comparison of driving trajectory similarity between the BM driving without latency, the BM driving with latency of

0.25

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Figure A6 ref.	Driving Model	Partial	Frechet	Area	Curve	Dynamic	$\bar{DTSI}$ ↓
		Curve ↓	Distance ↓	Between ↓	Length ↓	Time ↓
		Mapping		Curves		Warping
(a) Full Track	BM, No Latency (ref)	1.19	2.176	4658.15	0.349	1253.21	0.034
(a) Full Track	BM, Latency = $0.25$ s	8.077	18.917	25,958.2	2.09	7459.15	0.36
(a) Full Track	PLM-Net, Latency = $0.25$ s	3.47	5.357	91,195.1	0.814	3637.4	0.046
(b) Straight	BM, No Latency (ref)	24.475	5.095	13.556	0.077	76.376	0.012
(b) Straight	BM, Latency = $0.25$ s	680.624	4.756	359.311	0.122	550.559	0.077
(b) Straight	PLM-Net, Latency = $0.25$ s	150.696	3.053	78.985	0.072	122.401	0.046
(c) Left turn	BM, No Latency (ref)	0.895	3.966	41.402	0.178	67.892	0.025
(c) Left turn	BM, Latency = $0.25$ s	2.999	5.542	349.446	0.156	326.643	0.13
(c) Left turn	PLM-Net, Latency = $0.25$ s	0.973	4.765	61.525	0.218	89.88	0.093
(d) Right turn	BM, No Latency (ref)	0.293	1.546	90.228	0.048	79.036	0.037
(d) Right turn	BM, Latency = $0.25$ s	1.79	9.468	810.979	0.222	438.145	0.11
(d) Right turn	PLM-Net, Latency = $0.25$ s	0.199	1.971	108.645	0.032	103.168	0.065

Appendix A.3. Perception Latency = 0.30 s

This subsection illustrates the results when the perception latency was

0.30

s. Figure A7 and Table A5 show the steering comparison against time. Figure A8 compares driving trajectories colored based on steering angle values. Figure A9 and Table A6 show the trajectory similarity.

Figure A7. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.3

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure A7. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.3

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure A8. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.3

s latency. (c) The trajectory of the PLM-Net driving with

0.3

s latency.

Figure A8. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.3

s latency. (c) The trajectory of the PLM-Net driving with

0.3

s latency.

Table A5. Steering angle similarity. Quantitative comparison of steering angle error between the BM driving without latency, the BM driving with latency of

0.3

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

Table A5. Steering angle similarity. Quantitative comparison of steering angle error between the BM driving without latency, the BM driving with latency of

0.3

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

BM, No Latency vs.	MAE ↓	MSE ↓	RMSE ↓
BM, Latency = 0.3 s	0.226148	0.0968093	0.311142
PLM-Net, Latency = 0.3 s	0.0710674	0.0107641	0.10375

Figure A9. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with

0.3

s latency (green line), and the PLM-Net driving with

0.3

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Figure A9. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with

0.3

s latency (green line), and the PLM-Net driving with

0.3

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Table A6. Quantitative comparison of driving trajectory similarity between the BM driving without latency, the BM driving with latency of

0.3

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Table A6. Quantitative comparison of driving trajectory similarity between the BM driving without latency, the BM driving with latency of

0.3

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Figure A9 Ref.	Driving Model	Partial	Frechet	Area	Curve	Dynamic	$\bar{DTSI}$ ↓
		Curve ↓	Distance ↓	Between ↓	Length ↓	Time ↓
		Mapping		Curves		Warping
(a) Full Track	BM, No Latency (ref)	1.19	2.176	4658.15	0.349	1253.21	0.034
(a) Full Track	BM, Latency = $0.3$ s	16.097	28.429	93,980.4	4.494	15,235.3	0.391
(a) Full Track	PLM-Net, Latency = $0.3$ s	2.189	7.033	80,617.1	0.606	3524.91	0.02
(b) Straight	BM, No Latency (ref)	24.475	5.095	13.556	0.077	76.376	0.012
(b) Straight	BM, Latency = $0.3$ s	1046.2	7.968	631.198	0.203	911.435	0.192
(b) Straight	PLM-Net, Latency = $0.3$ s	97.525	6.002	59.361	0.108	117.782	0.014
(c) Left turn	BM, No Latency (ref)	0.895	3.966	41.402	0.178	67.892	0.025
(c) Left turn	BM, Latency = $0.3$ s	3.088	8.825	422.69	0.556	398.425	0.161
(c) Left turn	PLM-Net, Latency = $0.3$ s	0.514	1.047	60.492	0.019	69.811	0.034
(d) Right turn	BM, No Latency (ref)	0.293	1.546	90.228	0.048	79.036	0.037
(d) Right turn	BM, Latency = $0.3$ s	1.235	9.1	1321.13	0.2	545.015	0.482
(d) Right turn	PLM-Net, Latency = $0.3$ s	0.21	2.529	162.868	0.039	120.484	0.148

Appendix B. Training Results

Figure A10 shows the training results for the PLM-Net models. Each sub-figure shows the predicted vs ground truth steering angle. Figure A10a show the BM training results where

δ_{r e f} = 0.0

. Figure A10b–f show the TAPM training results for all other values of

δ_{r e f}

, as the TAPM predicts five future steering angles.

Figure A10. PLM-Net models training results. Each sub-figure shows the predicted vs ground truth steering angle. (a) BM training results where

δ_{r e f} = 0.0

. (b–f) TAPM training results for all other values of

δ_{r e f}

, as the TAPM predicts five future steering angles.

Figure A10. PLM-Net models training results. Each sub-figure shows the predicted vs ground truth steering angle. (a) BM training results where

δ_{r e f} = 0.0

. (b–f) TAPM training results for all other values of

δ_{r e f}

, as the TAPM predicts five future steering angles.

References

Kwon, J.; Choe, Y. Enhanced facilitatory neuronal dynamics for delay compensation. In Proceedings of the 2007 International Joint Conference on Neural Networks; IEEE: Piscataway, NJ, USA, 2007; pp. 2040–2045. [Google Scholar]
Li, M.; Wang, Y.X.; Ramanan, D. Towards streaming perception. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 473–488. [Google Scholar]
Ho, M.K.; Griffiths, T.L. Cognitive science as a source of forward and inverse models of human decisions for robotics and control. Annu. Rev. Control. Robot. Auton. Syst. 2022, 5, 33–53. [Google Scholar] [CrossRef]
Xu, S.; Peng, H.; Tang, Y. Preview path tracking control with delay compensation for autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2979–2989. [Google Scholar] [CrossRef]
Liu, Q.; Liu, Y.; Liu, C.; Chen, B.; Zhang, W.; Li, L.; Ji, X. Hierarchical lateral control scheme for autonomous vehicle with uneven time delays induced by vision sensors. Sensors 2018, 18, 2544. [Google Scholar] [CrossRef]
Chen, Z.; Huang, X. End-to-end learning for lane keeping of self-driving cars. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV); IEEE: Piscataway, NJ, USA, 2017; pp. 1856–1860. [Google Scholar] [CrossRef]
Khalil, A.; Kwon, J. ANEC: Adaptive Neural Ensemble Controller for Mitigating Latency Problems in Vision-Based Autonomous Driving. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2023; pp. 9340–9346. [Google Scholar]
Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-End Driving Via Conditional Imitation Learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2018; pp. 4693–4700. [Google Scholar] [CrossRef]
Salvucci, D.D. Modeling driver behavior in a cognitive architecture. Hum. Factors 2006, 48, 362–380. [Google Scholar] [CrossRef]
Berntorp, K.; Hoang, T.; Quirynen, R.; Di Cairano, S. Control architecture design for autonomous vehicles. In Proceedings of the 2018 IEEE Conference on Control Technology and Applications (CCTA); IEEE: Piscataway, NJ, USA, 2018; pp. 404–411. [Google Scholar]
Lee, H.; Choi, Y.; Han, T.; Kim, K. Probabilistically Guaranteeing End-to-end Latencies in Autonomous Vehicle Computing Systems. IEEE Trans. Comput. 2022, 71, 3361–3374. [Google Scholar] [CrossRef]
Strobel, K.; Zhu, S.; Chang, R.; Koppula, S. Accurate, low-latency visual perception for autonomous racing: Challenges, mechanisms, and practical solutions. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2020; pp. 1969–1975. [Google Scholar]
Vedder, B.; Vinter, J.; Jonsson, M. A low-cost model vehicle testbed with accurate positioning for autonomous driving. J. Robot. 2018, 2018, 4907536. [Google Scholar] [CrossRef]
Ge, X. Ultra-reliable low-latency communications in autonomous vehicular networks. IEEE Trans. Veh. Technol. 2019, 68, 5005–5016. [Google Scholar] [CrossRef]
El Marai, O.; Taleb, T. Smooth and low latency video streaming for autonomous cars during handover. IEEE Netw. 2020, 34, 302–309. [Google Scholar] [CrossRef]
Pokhrel, S.R.; Kumar, N.; Walid, A. Towards ultra reliable low latency multipath TCP For connected autonomous vehicles. IEEE Trans. Veh. Technol. 2021, 70, 8175–8185. [Google Scholar] [CrossRef]
Zhang, Q.; Zhong, H.; Cui, J.; Ren, L.; Shi, W. AC4AV: A flexible and dynamic access control framework for connected and autonomous vehicles. IEEE Internet Things J. 2020, 8, 1946–1958. [Google Scholar] [CrossRef]
Gorsich, D.J.; Jayakumar, P.; Cole, M.P.; Crean, C.M.; Jain, A.; Ersal, T. Evaluating mobility vs. latency in unmanned ground vehicles. J. Terramech. 2018, 80, 11–19. [Google Scholar] [CrossRef]
Yaqoob, I.; Khan, L.U.; Kazmi, S.A.; Imran, M.; Guizani, N.; Hong, C.S. Autonomous driving cars in smart cities: Recent advances, requirements, and challenges. IEEE Netw. 2019, 34, 174–181. [Google Scholar] [CrossRef]
Kalaria, D.; Lin, Q.; Dolan, J.M. Delay-aware robust control for safe autonomous driving. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV); IEEE: Piscataway, NJ, USA, 2022; pp. 1565–1571. [Google Scholar]
Kalaria, D.; Lin, Q.; Dolan, J.M. Delay-aware robust control for safe autonomous driving and racing. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7140–7150. [Google Scholar] [CrossRef]
Kamtam, S.B.; Lu, Q.; Bouali, F.; Haas, O.C.; Birrell, S. Network latency in teleoperation of connected and autonomous vehicles: A review of trends, challenges, and mitigation strategies. Sensors 2024, 24, 3957. [Google Scholar] [CrossRef]
Han, T.; Kim, K. Minimizing probabilistic end-to-end latencies of autonomous driving systems. In Proceedings of the 2023 IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS); IEEE: Piscataway, NJ, USA, 2023; pp. 27–39. [Google Scholar]
Pomerleau, D. ALVINN: An Autonomous Land Vehicle In a Neural Network. In Proceedings of the Advances in Neural Information Processing Systems; Morgan Kaufmann: San Mateo, CA, USA, 1989; pp. 305–313. [Google Scholar]
Lecun, Y.; Cosatto, E.; Ben, J.; Muller, U.; Flepp, B. Autonomous Off-Road Vehicle Control Using End-to-End Learning. Darpa-Ipto Final. Rep. 2004, 36, 1. [Google Scholar]
Bojarski, M.; Yeres, P.; Choromanska, A.; Choromanski, K.; Firner, B.; Jackel, L.; Muller, U. Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car. arXiv 2017, arXiv:1704.07911. [Google Scholar]
Wang, Q.; Chen, L.; Tian, B.; Tian, W.; Li, L.; Cao, D. End-to-End Autonomous Driving: An Angle Branched Network Approach. IEEE Trans. Veh. Technol. 2019, 68, 11599–11610. [Google Scholar] [CrossRef]
Wu, T.; Luo, A.; Huang, R.; Cheng, H.; Zhao, Y. End-to-End Driving Model for Steering Control of Autonomous Vehicles with Future Spatiotemporal Features. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2019; pp. 950–955. [Google Scholar] [CrossRef]
Kwon, J.; Khalil, A.; Kim, D.; Nam, H. Incremental end-to-end learning for lateral control in autonomous driving. IEEE Access 2022, 10, 33771–33786. [Google Scholar] [CrossRef]
Weiss, T.; Behl, M. Deepracing: Parameterized trajectories for autonomous racing. arXiv 2020, arXiv:2005.05178. [Google Scholar] [CrossRef]
Cultrera, L.; Becattini, F.; Seidenari, L.; Pala, P.; Del Bimbo, A. Addressing limitations of state-aware imitation learning for autonomous driving. IEEE Trans. Intell. Veh. 2023, 9, 2946–2955. [Google Scholar] [CrossRef]
Jamgochian, A.; Buehrle, E.; Fischer, J.; Kochenderfer, M.J. Shail: Safety-aware hierarchical adversarial imitation learning for autonomous driving in urban environments. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2023; pp. 1530–1536. [Google Scholar]
Cheng, J.; Chen, Y.; Chen, Q. Pluto: Pushing the limit of imitation learning-based planning for autonomous driving. arXiv 2024, arXiv:2404.14327. [Google Scholar] [CrossRef]
Delavari, E.; Khalil, A.; Kwon, J. CARIL: Confidence-Aware Regression in Imitation Learning for Autonomous Driving. arXiv 2025, arXiv:2503.00783. [Google Scholar] [CrossRef]
Mao, H.; Yang, X.; Dally, W.J. A delay metric for video object detection: What average precision fails to tell. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2019; pp. 573–582. [Google Scholar]
Kocić, J.; Jovičić, N.; Drndarević, V. An End-to-End Deep Neural Network for Autonomous Driving Designed for Embedded Automotive Platforms. Sensors 2019, 19, 2064. [Google Scholar] [CrossRef]
Wu, P.; Jia, X.; Chen, L.; Yan, J.; Li, H.; Qiao, Y. Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline. arXiv 2022, arXiv:2206.08129. [Google Scholar]
Popov, A.; Degirmenci, A.; Wehr, D.; Hegde, S.; Oldja, R.; Kamenev, A.; Douillard, B.; Nistér, D.; Muller, U.; Bhargava, R.; et al. Mitigating covariate shift in imitation learning for autonomous vehicles using latent space generative world models. arXiv 2024, arXiv:2409.16663. [Google Scholar] [CrossRef]
Tampuu, A.; Roosild, K.; Uduste, I. The Effects of Speed and Delays on Test-Time Performance of End-to-End Self-Driving. Sensors 2024, 24, 1963. [Google Scholar] [CrossRef] [PubMed]
Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316. [Google Scholar] [CrossRef]
Kwon, J. OSCAR: Open-Source Robotic Car Architecture for Research and Education. Available online: https://github.com/jrkwon/oscar (accessed on 8 March 2026).
Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An open-source Robot Operating System. ICRA Workshop Open Source Softw. 2009, 3, 5. [Google Scholar]
Koenig, N.; Howard, A. Design and use paradigms for gazebo, an open-source multi-robot simulator. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2004; Volume 3, pp. 2149–2154. [Google Scholar]
Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning (CoRL); PMLR: New York, NY, USA, 2017; pp. 1–16. [Google Scholar]
NVIDIA Corporation. NVIDIA DRIVE Sim. 2024. Available online: https://developer.nvidia.com/drive/simulation (accessed on 9 May 2024).
Jekel, C.F.; Venter, G.; Venter, M.P.; Stander, N.; Haftka, R.T. Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. Int. J. Mater. Form. 2019, 12, 355–378. [Google Scholar] [CrossRef]
Kim, D.; Khalil, A.; Nam, H.; Kwon, J. OPEMI: Online Performance Evaluation Metrics Index for Deep Learning-Based Autonomous Vehicles. IEEE Access 2023, 11, 16951–16963. [Google Scholar] [CrossRef]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Graves, A.; Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45. [Google Scholar]

Figure 1. Perception latency definition. When vehicle state is

x_{t}

and we have observation

o_{t}

, the corresponding action

a_{t}

is applied at time

t + δ

not t (

o_{t} \to a_{t + δ}

). By the time this action is applied, the vehicle state has changed, and we have a new observation. The perception latency has two components: the algorithmic latency and the actuator latency.

Figure 1. Perception latency definition. When vehicle state is

x_{t}

and we have observation

o_{t}

, the corresponding action

a_{t}

is applied at time

t + δ

not t (

o_{t} \to a_{t + δ}

). By the time this action is applied, the vehicle state has changed, and we have a new observation. The perception latency has two components: the algorithmic latency and the actuator latency.

Figure 2. Illustration of perception latency effect on lateral control during lane keeping. The green vehicle exhibits driving behavior with real-time observations, while the red vehicle demonstrates driving behavior with a delayed observation input. Both vehicles maintain a constant low speed. S denotes the start point, and the arrow indicates the direction of vehicle motion. Motion starts at position A. Positions A and B depict both vehicles driving straight, staying within their lanes, with minimal steering adjustments (

a_{t_{A}} \approx a_{t_{B}} \approx 0.0

). At position C, where the first curve is encountered, the green vehicle successfully adjusts its trajectory by turning left to stay in the lane, while the red vehicle continues straight due to the delayed observation input. Subsequently, from position D onward, the red vehicle struggles to recover from its zigzag-shaped trajectory, highlighting the impact of incorrect actions on subsequent observations.

Figure 2. Illustration of perception latency effect on lateral control during lane keeping. The green vehicle exhibits driving behavior with real-time observations, while the red vehicle demonstrates driving behavior with a delayed observation input. Both vehicles maintain a constant low speed. S denotes the start point, and the arrow indicates the direction of vehicle motion. Motion starts at position A. Positions A and B depict both vehicles driving straight, staying within their lanes, with minimal steering adjustments (

a_{t_{A}} \approx a_{t_{B}} \approx 0.0

). At position C, where the first curve is encountered, the green vehicle successfully adjusts its trajectory by turning left to stay in the lane, while the red vehicle continues straight due to the delayed observation input. Subsequently, from position D onward, the red vehicle struggles to recover from its zigzag-shaped trajectory, highlighting the impact of incorrect actions on subsequent observations.

Figure 3. Overview of the perception latency mitigation network (PLM-Net). By leveraging a Timed Action Prediction Model (TAPM) alongside the Base Model (BM), PLM-Net enhances the system’s ability to mitigate perception latency. The TAPM incorporates predictive modeling to anticipate future actions based on current visual observations. Additionally, the integration of multiple sub-models within the framework enables adaptation to varying latency levels, as the final action value

{\tilde{a}}_{t}^{PLM}

is determined through the function

f ({\tilde{a}}_{t}, δ_{t})

where it performs linear interpolation based on the real-time latency value

δ_{t}

given all the predictive action values provided by the TAPM (

{\tilde{a}}_{t}^{TAPM}

) and the current action value provided by the BM (

{\tilde{a}}_{t}^{BM}

). See Section 3 for detailed explanation.

Figure 3. Overview of the perception latency mitigation network (PLM-Net). By leveraging a Timed Action Prediction Model (TAPM) alongside the Base Model (BM), PLM-Net enhances the system’s ability to mitigate perception latency. The TAPM incorporates predictive modeling to anticipate future actions based on current visual observations. Additionally, the integration of multiple sub-models within the framework enables adaptation to varying latency levels, as the final action value

{\tilde{a}}_{t}^{PLM}

is determined through the function

f ({\tilde{a}}_{t}, δ_{t})

where it performs linear interpolation based on the real-time latency value

δ_{t}

given all the predictive action values provided by the TAPM (

{\tilde{a}}_{t}^{TAPM}

) and the current action value provided by the BM (

{\tilde{a}}_{t}^{BM}

). See Section 3 for detailed explanation.

Figure 4. PLM-Net diagram illustrating the interaction between the Base Model (BM) and the Timed Action Prediction Model (TAPM) in mitigating perception latency. The BM, governed by policy

π_{ϕ}^{BM}

, processes visual observation

o_{t}

and vehicle speed

v_{t}

to generate the nominal steering action

{\tilde{a}}_{t}^{BM}

, as described in Equation (1). The TAPM, governed by policy

π_{θ}^{TAPM}

, generates a set of latency-indexed predictive steering actions corresponding to predefined latency offsets in

δ^{TAPM}

(e.g.,

{\tilde{a}}_{t + δ_{1}}

associated with the delayed operating state

s_{t + δ_{1}}

), as detailed in Equation (2). The inputs to

π_{θ}^{TAPM}

include

{\tilde{a}}_{t}^{BM}

, along with feature vectors

z_{t}^{o}

and

z_{t}^{v}

extracted from the BM. The zero-latency action

{\tilde{a}}_{t}^{BM}

is appended to the latency grid to form the reference set used for interpolation. Finally, linear interpolation (Algorithm 1) combines the latency-indexed action grid according to the measured perception latency

δ

to produce the final mitigated action

{\tilde{a}}_{t}^{PLM}

. Green nodes represent vehicle states, blue nodes represent observations, red blocks indicate the Base Model processing stage, yellow blocks denote TAPM processing and predictive heads, and the purple node represents the final action produced by PLM-Net. Arrows illustrate the flow of information between modules.

Figure 4. PLM-Net diagram illustrating the interaction between the Base Model (BM) and the Timed Action Prediction Model (TAPM) in mitigating perception latency. The BM, governed by policy

π_{ϕ}^{BM}

, processes visual observation

o_{t}

and vehicle speed

v_{t}

to generate the nominal steering action

{\tilde{a}}_{t}^{BM}

, as described in Equation (1). The TAPM, governed by policy

π_{θ}^{TAPM}

, generates a set of latency-indexed predictive steering actions corresponding to predefined latency offsets in

δ^{TAPM}

(e.g.,

{\tilde{a}}_{t + δ_{1}}

associated with the delayed operating state

s_{t + δ_{1}}

), as detailed in Equation (2). The inputs to

π_{θ}^{TAPM}

include

{\tilde{a}}_{t}^{BM}

, along with feature vectors

z_{t}^{o}

and

z_{t}^{v}

extracted from the BM. The zero-latency action

{\tilde{a}}_{t}^{BM}

is appended to the latency grid to form the reference set used for interpolation. Finally, linear interpolation (Algorithm 1) combines the latency-indexed action grid according to the measured perception latency

δ

to produce the final mitigated action

{\tilde{a}}_{t}^{PLM}

. Green nodes represent vehicle states, blue nodes represent observations, red blocks indicate the Base Model processing stage, yellow blocks denote TAPM processing and predictive heads, and the purple node represents the final action produced by PLM-Net. Arrows illustrate the flow of information between modules.

Figure 5. PLM-Net architecture. (a) The BM network design, inspired by the NVIDIA PilotNet structure [40], processes visual observation

o_{t}

and vehicle speed

v_{t}

to predict steering angle

{\tilde{a}}_{t}^{BM}

. It utilizes five convolutional layers and a multi-layer perceptron network with dropout layers, including fully connected layers with neuron counts of 512, 100, 50, and 10. (b) The TAPM network design, inspired by ANEC [7] and BCIL [8], processes inputs forwarded by the BM (

{\tilde{a}}_{t}

,

z_{t}^{o}

,

z_{t}^{v}

) through fully connected layers with 100, 500, and 100 neurons, respectively. These layers are then concatenated and forwarded to sub-models, including fully connected layers with neuron counts of 200, 100, and 50, and dropout layers with a rate of 0.3. Each sub-model will provide a distinct future action for a certain latency value. The outputs of the sub-models form the TAPM output

{\tilde{a}}_{t}^{TAPM}

, which is a set of predictive action values corresponding to distinct latency values. In the diagram, arrows indicate the direction of data flow. Blue fully connected layers represent the image-feature pathway (

z_{t}^{o}

), green layers correspond to the velocity pathway (

z_{t}^{v}

), and pink layers denote the action-prediction MLP blocks. The different colors used for convolutional layers serve only for visual distinction between stages of the network.

Figure 5. PLM-Net architecture. (a) The BM network design, inspired by the NVIDIA PilotNet structure [40], processes visual observation

o_{t}

and vehicle speed

v_{t}

to predict steering angle

{\tilde{a}}_{t}^{BM}

. It utilizes five convolutional layers and a multi-layer perceptron network with dropout layers, including fully connected layers with neuron counts of 512, 100, 50, and 10. (b) The TAPM network design, inspired by ANEC [7] and BCIL [8], processes inputs forwarded by the BM (

{\tilde{a}}_{t}

,

z_{t}^{o}

,

z_{t}^{v}

) through fully connected layers with 100, 500, and 100 neurons, respectively. These layers are then concatenated and forwarded to sub-models, including fully connected layers with neuron counts of 200, 100, and 50, and dropout layers with a rate of 0.3. Each sub-model will provide a distinct future action for a certain latency value. The outputs of the sub-models form the TAPM output

{\tilde{a}}_{t}^{TAPM}

, which is a set of predictive action values corresponding to distinct latency values. In the diagram, arrows indicate the direction of data flow. Blue fully connected layers represent the image-feature pathway (

z_{t}^{o}

), green layers correspond to the velocity pathway (

z_{t}^{v}

), and pink layers denote the action-prediction MLP blocks. The different colors used for convolutional layers serve only for visual distinction between stages of the network.

Figure 6. The training process for PLM-Net models. Initially, the BM policy

π_{ϕ}^{BM}

is trained using supervised learning on dataset

D

, optimizing parameters

ϕ

to minimize prediction error (Equation (4)). Subsequently, a new dataset

D^{TAPM}

is generated from

D

to train the TAPM policy

π_{θ}^{TAPM}

, optimizing parameters

θ

to minimize error between predicted and ground truth future actions (Equation (5)). During TAPM training, the BM remains non-trainable to ensure TAPM’s reliance on its output. This modular design supports integration into existing control pipelines. Input data pass through the BM, generating output action

{\tilde{a}}_{t}^{BM}

and input vectors

z_{t}^{o}

and

z_{t}^{v}

, which feed into the TAPM. The TAPM utilizes these inputs to generate future action values (

{\tilde{a}}_{t}^{TAPM}

). In the diagram, the blue network represents the Base Model (BM), while the yellow network represents the TAPM. Circles denote loss computation nodes used during training. Black arrows indicate the flow of data and supervision signals through the training pipeline, and the shaded panel on the right illustrates the set of predicted future actions corresponding to different latency offsets.

Figure 6. The training process for PLM-Net models. Initially, the BM policy

π_{ϕ}^{BM}

is trained using supervised learning on dataset

D

, optimizing parameters

ϕ

to minimize prediction error (Equation (4)). Subsequently, a new dataset

D^{TAPM}

is generated from

D

to train the TAPM policy

π_{θ}^{TAPM}

, optimizing parameters

θ

to minimize error between predicted and ground truth future actions (Equation (5)). During TAPM training, the BM remains non-trainable to ensure TAPM’s reliance on its output. This modular design supports integration into existing control pipelines. Input data pass through the BM, generating output action

{\tilde{a}}_{t}^{BM}

and input vectors

z_{t}^{o}

and

z_{t}^{v}

, which feed into the TAPM. The TAPM utilizes these inputs to generate future action values (

{\tilde{a}}_{t}^{TAPM}

). In the diagram, the blue network represents the Base Model (BM), while the yellow network represents the TAPM. Circles denote loss computation nodes used during training. Black arrows indicate the flow of data and supervision signals through the training pipeline, and the shaded panel on the right illustrates the set of predicted future actions corresponding to different latency offsets.

Figure 7. Test track used for evaluating the PLM-Net performance. The track features a combination of straight sections and curves to simulate real-world driving conditions. The starting point is marked as S and the arrow indicates the driving direction. This track was chosen to test the vehicle’s ability to maintain lane keeping and handle perception latency during both straight and curved segments.

Figure 8. Histogram-based data balancing based on steering angle values. The left image shows the steering histogram before data balancing and the right image shows the histogram after balancing.

Figure 9. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.2

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure 9. Qualitative comparison of steering angle over time. The blue line represents the BM driving on the test track with no latency. The green line represents the BM driving on the test track with

0.2

s latency. The red line represents the PLM-Net driving on the test track with the same latency value.

Figure 10. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.2

s latency. (c) The trajectory of the PLM-Net driving with

0.2

s latency.

Figure 10. Vehicle trajectories colored by steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving with

0.2

s latency. (c) The trajectory of the PLM-Net driving with

0.2

s latency.

Figure 11. Qualitative comparison of trajectories on test track. Comparison of driving trajectories on the test track, highlighting differences between the BM driving with no latency (blue line), the BM driving with

0.2

s latency (green line), and the PLM-Net driving with

0.2

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Figure 11. Qualitative comparison of trajectories on test track. Comparison of driving trajectories on the test track, highlighting differences between the BM driving with no latency (blue line), the BM driving with

0.2

s latency (green line), and the PLM-Net driving with

0.2

s latency (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Figure 12. Qualitative comparison of steering angle over time is depicted in the upper image. The blue line represents the BM driving without latency on the test track. Meanwhile, the green line illustrates the BM’s performance under time-variant latency, ranging from

0.0

to

0.35

s. The red line showcases the PLM-Net’s driving behavior under the same time-variant latency conditions. In the lower image, latency values against time are presented. The green line corresponds to the latency experienced by the BM, while the red line represents the latency encountered by the PLM-Net.

Figure 12. Qualitative comparison of steering angle over time is depicted in the upper image. The blue line represents the BM driving without latency on the test track. Meanwhile, the green line illustrates the BM’s performance under time-variant latency, ranging from

0.0

to

0.35

s. The red line showcases the PLM-Net’s driving behavior under the same time-variant latency conditions. In the lower image, latency values against time are presented. The green line corresponds to the latency experienced by the BM, while the red line represents the latency encountered by the PLM-Net.

Figure 13. Colored trajectories based on steering angle. (a) The trajectory of the BM driving with no latency. (b) The trajectory of the BM driving under time-variant latency [0.0–0.35] s. (c) The trajectory of the PLM-Net driving under time-variant latency [0.0–0.35] s.

Figure 14. Trajectory comparison between the BM driving with no latency (blue line), the BM driving with time-variant latency [0.0–0.35] s (green line), and the PLM-Net driving with time-variant latency [0.0–0.35] s (red line). (a) The trajectories on the full test track. (b) The trajectories on a straight road segment. (c) The trajectories on a left turn. (d) The trajectories on a right turn.

Table 1. Steering and velocity statistics.

	Steering	Velocity
Mean	−0.003923	20.310327
Std	0.176994	2.924156
Min	−1.000000	0.018407
25%	−0.069025	18.301480
50%	0.000000	19.883244
75%	0.058849	22.329700
Max	0.691892	29.936879

Table 2. Trainable and non-trainable parameter counts for the BM and the TAPM, where the BM weights remained frozen during TAPM training.

	Total	Trainable	Non-Trainable
	Parameters	Parameters	Parameters
BM	5,874,566	5,874,566	0
TAPM	12,256,396	6,250,205	6,006,191

Table 3. Steering angle similarity. Quantitative comparison of steering angle between the BM driving without latency, the BM driving with latency of

0.2

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

Table 3. Steering angle similarity. Quantitative comparison of steering angle between the BM driving without latency, the BM driving with latency of

0.2

s, and PLM-Net driving with the same latency. Downward arrows indicate that lower values correspond to better performance.

BM, No Latency vs.	MAE↓	MSE↓	RMSE↓
BM, Latency = $0.2$ s	0.1915	0.0716	0.2676
PLM-Net, Latency = $0.2$ s	0.0726	0.0125	0.1119

Table 4. Quantitative comparison of driving trajectory similarity, between the BM driving without latency, the BM driving with latency of

0.2

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Table 4. Quantitative comparison of driving trajectory similarity, between the BM driving without latency, the BM driving with latency of

0.2

s, and PLM-Net driving with the same latency. All driving models’ trajectories are compared with the lane center on full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Figure 11 Ref.	Driving Model	Partial	Frechet	Area	Curve	Dynamic	$\bar{DTSI}$ ↓
		Curve ↓	Distance ↓	Between ↓	Length ↓	Time ↓
		Mapping		Curves		Warping
(a) Full Track	BM, No Latency (ref)	1.19	2.176	4658.15	0.349	1253.21	0.034
(a) Full Track	BM, Latency = $0.2$ s	4.03	5.469	23,045.1	0.767	2594.39	0.072
(a) Full Track	PLM-Net, Latency = $0.2$ s	1.322	3.418	39,523.2	0.374	2299.84	0.036
(b) Straight	BM, No Latency (ref)	24.475	5.095	13.556	0.077	76.376	0.012
(b) Straight	BM, Latency = $0.2$ s	351.807	9.048	227.119	0.209	338.861	0.095
(b) Straight	PLM-Net, Latency = $0.2$ s	137.353	6.117	66.545	0.093	136.861	0.016
(c) Left turn	BM, No Latency (ref)	0.895	3.966	41.402	0.178	67.892	0.025
(c) Left turn	BM, Latency = $0.2$ s	1.3	5.405	141.041	0.234	156.02	0.048
(c) Left turn	PLM-Net, Latency = $0.2$ s	0.674	1.5	71.816	0.05	81.885	0.01
(d) Right turn	BM, No Latency (ref)	0.293	1.546	90.228	0.048	79.036	0.037
(d) Right turn	BM, Latency = $0.2$ s	0.927	5.357	384.746	0.124	232.834	0.055
(d) Right turn	PLM-Net, Latency = $0.2$ s	0.745	2.863	45.779	0.063	69.018	0.078

Table 5. Steering angle similarity. Quantitative comparison of steering angle between BM driving with no latency and with time-variant latency [0.0–0.35] s, and PLM-Net driving with the same time-variant latency. Downward arrows indicate that lower values correspond to better performance.

BM, No Latency vs.	MAE↓	MSE↓	RMSE↓
BM, Latency = [0.0–0.35] s	0.3336	0.1871	0.4326
PLM-Net, Latency = [0.0–0.35] s	0.0710	0.0108	0.1037

Table 6. Comparative analysis of driving trajectory similarity. The BM driving without latency, the BM driving under time-variant latency ranging from

0.0

to

0.35

s, and PLM-Net driving within the same time-variant latency range. Trajectories of all driving models are evaluated against the lane center across various scenarios, including full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Table 6. Comparative analysis of driving trajectory similarity. The BM driving without latency, the BM driving under time-variant latency ranging from

0.0

to

0.35

s, and PLM-Net driving within the same time-variant latency range. Trajectories of all driving models are evaluated against the lane center across various scenarios, including full track, straight road, left turn, and right turn. Downward arrows indicate that lower values correspond to better performance.

Figure 14 ref.	Driving Model	Partial	Frechet	Area	Curve	Dynamic	$\bar{DTSI}$ ↓
		Curve ↓	Distance ↓	Between ↓	Length ↓	Time ↓
		Mapping		Curves		Warping
(a) Full Track	BM, No Latency (ref)	1.19	2.176	4658.15	0.349	1253.21	0.034
(a) Full Track	BM, Latency = 0.0–0.35 s	5.91	7.723	26,498.2	1.245	4268.24	0.238
(a) Full Track	PLM-Net, Latency = 0.0–0.35 s	2.547	3.616	29,126.6	0.694	2653.26	0.028
(b) Straight	BM, No Latency (ref)	24.475	5.095	13.556	0.077	76.376	0.012
(b) Straight	BM, Latency = 0.0–0.35 s	465.004	4.026	293.106	0.08	409.007	0.109
(b) Straight	PLM-Net, Latency = 0.0–0.35 s	21.428	1.319	12.097	0.032	49.84	0.014
(c) Left turn	BM, No Latency (ref)	0.895	3.966	41.402	0.178	67.892	0.025
(c) Left turn	BM, Latency = 0.0–0.35 s	1.697	4.449	209.537	0.284	198.209	0.107
(c) Left turn	PLM-Net, Latency = 0.0–0.35 s	0.4	1.527	45.844	0.087	59.073	0.017
(d) Right turn	BM, No Latency (ref)	0.293	1.546	90.228	0.048	79.036	0.037
(d) Right turn	BM, Latency = 0.0–0.35 s	0.744	4.572	420.415	0.11	228.825	0.097
(d) Right turn	PLM-Net, Latency = 0.0–0.35 s	0.327	1.692	89.126	0.046	92.033	0.056

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Khalil, A.; Kwon, J. PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles. Sensors 2026, 26, 1798. https://doi.org/10.3390/s26061798

AMA Style

Khalil A, Kwon J. PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles. Sensors. 2026; 26(6):1798. https://doi.org/10.3390/s26061798

Chicago/Turabian Style

Khalil, Aws, and Jaerock Kwon. 2026. "PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles" Sensors 26, no. 6: 1798. https://doi.org/10.3390/s26061798

APA Style

Khalil, A., & Kwon, J. (2026). PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles. Sensors, 26(6), 1798. https://doi.org/10.3390/s26061798

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles

Abstract

1. Introduction

1.1. Motivation

1.2. Latency Effect on AV Lateral Control

1.3. Contribution

2. Related Work

3. Method

3.1. PLM-Net Framework

3.2. PLM-Net Models Architecture

3.3. PLM-Net Models Training

3.4. Performance Metrics

4. Simulation-Based Validation Setup

4.1. Simulator

4.2. Dataset

4.2.1. Training and Test Tracks

4.2.2. Data Collection

4.2.3. Data Balancing

4.2.4. Data Augmentation

4.3. Driving Performance Evaluation

4.3.1. Steering Angle Similarity

4.3.2. Trajectory Similarity

4.4. Parameter Tuning

4.5. Latency Knowledge and Modeling Assumptions

4.6. Ablation Study

4.7. Computational Environment and Machine Learning Framework

4.8. Computational Cost Analysis

5. Results

5.1. Constant Perception Latency Mitigation

5.2. Time-Variant Perception Latency Mitigation

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Constant Perception Latency

Appendix A.1. Perception Latency = 0.15 s

Appendix A.2. Perception Latency = 0.25 s

Appendix A.3. Perception Latency = 0.30 s

Appendix B. Training Results

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI