Systematic Review

A Comprehensive Literature Review on Modular Approaches to Autonomous Driving: Deep Learning for Road and Racing Scenarios

by Kamal Hussain 1, Catarina Moreira 2, João Pereira 1, Sandra Jardim 3,* and Joaquim Jorge 1

1 Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento, Instituto Superior Técnico, University of Lisbon, 1000-029 Lisbon, Portugal
2 Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
3 Smart Cities Research Center, Polytechnic Institute of Tomar, 2300-313 Tomar, Portugal
* Author to whom correspondence should be addressed.
Smart Cities 2025, 8(3), 79; https://doi.org/10.3390/smartcities8030079
Submission received: 28 March 2025 / Revised: 27 April 2025 / Accepted: 28 April 2025 / Published: 6 May 2025
(This article belongs to the Section Smart Transportation)

Highlights

What are the main accomplishments of this study?
  • A comprehensive analysis of deep learning techniques in both on-road and autonomous racing cars, highlighting distinct challenges and requirements for each context.
  • The identification of critical challenges for future research, to ensure safety and performance in autonomous systems.
What are the implications of the main findings?
  • The detailed evaluation of planning methods and performance metrics points to opportunities to refine existing methodologies and identify emerging research areas that can guide the development of more efficient, robust, and scalable autonomous driving technologies.
  • The challenges identified in sensor fusion, environmental robustness, and computational efficiency imply that addressing these issues is critical to progress in autonomous systems.

Abstract

Autonomous driving technology is advancing rapidly, driven by the integration of advanced intelligent systems. Autonomous vehicles typically follow a modular structure, organized into perception, planning, and control components. Unlike previous surveys, which often focus on specific modular system components or single driving environments, our review uniquely compares both settings, highlighting how deep learning and reinforcement learning methods address the challenges specific to each. We present an in-depth analysis of local and global planning methods, including the integration of benchmarks, simulations, and real-time platforms. Additionally, we compare various evaluation metrics and performance outcomes for current methodologies. Finally, we offer insights into emerging research directions based on the latest advancements, providing a roadmap for future innovation in autonomous driving.

1. Introduction

In the past decade, there has been a significant increase in traffic accidents worldwide. According to the latest report by the World Health Organization [1], traffic accidents have become the eighth-leading cause of death worldwide. Each year, approximately 1.3 million people lose their lives in road accidents, while an additional 20 to 50 million suffer non-fatal injuries, many of which result in permanent disabilities. Many sectors, including the transportation and urban planning sectors, as well as road authorities and lawmakers, play a vital role in addressing this challenge. To reduce accidents and improve road safety, autonomous driving technologies have become essential.
In line with these objectives, the automotive industry and the scientific community have focused on developing innovative technologies, particularly those aimed at enhancing the safety and performance of road vehicles [2,3]. These advances contribute to increased safety and road efficiency and a significant reduction in the risk of traffic accidents, lowering the number of fatalities on the road. In recent decades, autonomous systems have significantly advanced, offering promising improvements in driver safety and vehicle quality.
Existing studies have investigated modular systems utilizing various techniques, including classical methods, mathematical modeling, optimization, machine learning, and deep learning. In particular, deep learning has recently shown remarkable effectiveness in autonomous driving scenarios, as demonstrated by Zhao et al. [4]. The success of deep learning has led to its widespread adoption in autonomous driving. These techniques are used individually in modular components and end-to-end approaches [5,6,7].
End-to-end deep learning approaches bypass the traditional modular hierarchy, handling complex tasks autonomously through a single architecture. This approach streamlines workflows by eliminating the need for separate modules and has proven effective in certain autonomous driving scenarios.
This systematic review uniquely analyzes deep learning approaches for autonomous vehicles in road and racing environments. In the on-road context, autonomous driving components must navigate challenges such as traffic congestion, obstacles, parking, pedestrian detection, and managing vehicle speed in areas with dense traffic signals. Autonomous racing presents its own unique challenges, including speed control, optimized acceleration and braking, maintaining track position, identifying the best trajectories for shorter lap times, and monitoring high-speed opponents to ensure safety and competitiveness.
This paper explains how deep learning addresses these challenges, focusing on the integration of the components of the modular system from existing studies. The main contributions can be summarized as follows:
  • This review is the first to present a comparison of state-of-the-art deep learning approaches across on-road and racetrack autonomous driving scenarios.
  • We highlight existing deep learning approaches for modular systems, including perception, planning, control, end-to-end approaches, and safety.
  • We also describe both scenarios’ benchmarks, simulations, and real-time platforms.
  • In addition, we discuss evaluation metrics, state-of-the-art performance, and their comparisons.
  • Finally, we outline the potential challenges and research directions from the state-of-the-art.
The rest of this paper is organized as follows: Section 2 describes the background overview of autonomous driving, autonomous racing cars, and modular systems. Section 3 elaborates on recent surveys conducted for autonomous driving, incorporating various deep learning techniques. Section 4 presents the research questions addressed and describes the methodology used to prepare this systematic review. Section 5 presents the characteristics of the selected studies and a synthesis of the results achieved. Focusing on each of the defined research questions, Section 6 presents an in-depth analysis of the articles included in our study, providing a comparative analysis of existing approaches as well as proposing future research directions. Section 7 presents the limitations of this study. Section 8 discusses the performance, real-time challenges, and future directions of existing modular autonomous driving systems in on-road and racing scenarios. Section 9 presents a case study focused on the integration and combination of state-of-the-art methods in the creation of autonomous driving systems. Finally, the conclusions of this systematic review are presented in Section 10.

2. Background

This section provides an overview of autonomous driving, one of the fastest-growing technologies enabling vehicles to navigate and operate autonomously. The Society of Automotive Engineers (SAE) has established a taxonomy for autonomous vehicles, standardized under the “SAE J3016 Levels of Driving Automation”. This framework defines six levels of driving automation, ranging from Level 0 (fully manual) to Level 5 (fully autonomous) [8]. Figure 1 provides a visual summary of these levels of automation, offering a clear view of the progression from manual to fully autonomous vehicle control.
The scientific and industrial communities have recently focused on investigating Level 5 automation, the highest level of SAE taxonomy. This level represents fully autonomous vehicles operating without human intervention in all driving conditions. Achieving this level of automation is considered a critical milestone in the advancement of autonomous driving technology.

2.1. On-Road Autonomous Vehicles

The literature shows that research has focused on the quality and maturity of self-driving vehicles in both road and racing scenarios. The concept of an autonomous vehicle first appeared in 1925, when Francis Houdina presented a remotely controlled prototype that traveled at 19 km/h between Broadway and Fifth Avenue. Between the 1980s and 1995, Dickmanns and Zapp [9] integrated computer systems into vehicles, enabling speed control of up to 63 km/h on structured roads. Over time, autonomous vehicles have progressively advanced, incorporating a variety of approaches.
The autonomous vehicle challenges were initially organized as the Grand and Urban Challenges in 2004 and 2007, respectively. The Defense Advanced Research Projects Agency (DARPA) launched the Grand Challenge to spur innovation in unmanned ground vehicle navigation. DARPA successfully organized several such competitions, awarding millions of dollars in prizes; these events demonstrated industrial interest and improved the quality of autonomous vehicles [10].

2.2. Autonomous Racing Cars

There are three autonomous racing car platforms—F1TENTH, Formula Student, and Formula 1; a detailed discussion is presented in Section 6.3. The first autonomous racing car competition took place in 2006. It was the first competition in which fully autonomous vehicles raced on large-scale tracks modeled on Formula 1. The teams in this autonomous racing challenge were the world’s leading university research groups, and the winner completed the race with a top speed of 124 km/h. Roborace has organized several series of autonomous races; the most recent series, held from 2020 to 2022, included the Beta season and DevBot 2.0. This championship provides a platform for technological advancement capable of accelerating the commercialization of fully autonomous vehicles and implementing advanced driving assistance systems (ADASs) to increase safety and performance. In the 2020 competition, the TUM Autonomous Motorsport team from the Technical University of Munich won the challenge, securing a prize of USD 1 million and achieving an average track speed of 218 km/h [11].

2.3. Modular System

Although modular structures have been extensively studied, our contribution focuses on bridging these established architectures across distinct domains—public roads and high-speed racing environments—as well as critically evaluating the transferability of techniques such as perception and planning across contexts.
The modular system is the leading autonomous driving architecture, comprising three main modules.
Perception involves gathering information and extracting relevant information from the vehicle’s environment. Perception can be divided into two categories: (1) environmental perception, which involves understanding the context of the environment, such as identifying obstacles, detecting traffic signs, and performing semantic data classification, and (2) localization, which pertains to the vehicle’s ability to determine its position relative to the environment as well as mapping and tracking the path for maneuvers. Perception provides essential input for planning modules.
Planning involves making deliberate decisions to achieve higher-order objectives, guiding the vehicle from a start location to a target location while avoiding obstacles and optimizing based on specific heuristics. Path planning is often divided into global planning (route planning) and local planning, which encompasses trajectory and behavioral planning.
Control refers to the vehicle’s ability to carry out the actions defined by higher-level processes. This includes lateral and longitudinal control, such as managing acceleration, braking, drifting, and steering.
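The perception–planning–control loop described above can be sketched as a minimal one-dimensional pipeline. The class names, the toy obstacle logic, and the proportional control law below are illustrative assumptions for exposition, not an implementation drawn from the reviewed studies.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    obstacle_ahead: bool   # environmental perception
    position: float        # localization (1-D for illustration)

class Perception:
    def sense(self, world) -> Observation:
        # In practice: object detection, segmentation, localization from sensors.
        return Observation(obstacle_ahead=world["obstacle"], position=world["x"])

class Planner:
    def plan(self, obs: Observation, goal: float) -> float:
        # Local planning: hold position for obstacles, otherwise head to the goal.
        return obs.position if obs.obstacle_ahead else goal

class Controller:
    def act(self, obs: Observation, target: float) -> float:
        # Longitudinal control: proportional law on position error.
        k_p = 0.5
        return k_p * (target - obs.position)   # commanded acceleration

# One tick of the modular loop
world = {"obstacle": False, "x": 0.0}
obs = Perception().sense(world)
cmd = Controller().act(obs, Planner().plan(obs, goal=10.0))
print(cmd)  # 5.0
```

The key property the sketch illustrates is the one-way data flow: perception output feeds planning, and planning output feeds control, so each module can be developed and validated in isolation.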
Figure 2 illustrates an autonomous driving pipeline, comprising hardware such as sensors, computing devices, servers, buses, and connectivity, as well as software components such as operating systems, a modular system, safety, and simulations.

3. Related Work

Scopus data (TITLE-ABS-KEY((“autonomous driving” OR “autonomous vehicle” OR “autonomous racing”) AND (“perception” OR “planning” OR “control” OR “safety”)) AND (“machine learning” OR “deep learning”) AND PUBYEAR > 2019) indicate substantial growth in publications on autonomous driving since 2019, with an exponential rise in recent years (Figure 3), reflecting the increasing focus on autonomous driving within transportation systems. This surge signifies growing attention from academic and industrial sectors toward the research and development of autonomous vehicle technologies as they become central to future mobility.
Several studies have examined the integration of deep learning techniques in autonomous driving, focusing on key tasks such as perception, planning, control, and safety. These reviews consistently emphasize the central role that deep learning plays in improving the performance and reliability of autonomous systems.
To provide a structured and insightful overview of existing surveys, the literature review is organized thematically across core components of autonomous driving systems.

3.1. Perception-Focused Surveys

Perception remains a fundamental module, often addressed through object detection, segmentation, localization, and sensor fusion. Morooka et al. [13] presented a comprehensive review of deep learning applications in various components of the modular system in autonomous driving. The study focused on advances in perception (such as object detection, localization, and segmentation), planning (including trajectory generation and prediction), and control (covering steering and speed optimization). It highlighted the significant role deep learning plays in improving the accuracy and robustness of each component, demonstrating its growing impact on improving the overall performance of autonomous driving systems. Golroudbari and Sabour [14] focused on improving autonomous driving systems by analyzing deep learning approaches applied to specific components of the modular system. These included obstacle detection, sensory perception, behavior modeling, lane detection, navigation, and path planning, all aimed at enhancing the overall performance of autonomous driving systems. Specific reviews focused on perception [15,16,17,18], emphasizing deep learning techniques for tasks such as object detection and semantic segmentation. LiDAR-based perception surveys were found in [18,19,20,21], while SLAM-based perception reviews were discussed in [22,23].

3.2. Planning and Trajectory Prediction

Planning has been extensively surveyed, integrating both machine learning and deep learning techniques. Reviews such as [24,25,26,27] evaluated planning systems based on trajectory generation, prediction, and behavioral reasoning under dynamic road conditions. Bachute and Subhedar [28] presented a study with a broader scope, with an in-depth exploration of self-driving technologies, using a combination of deep learning and machine learning techniques. They compared the performance of existing methods by analyzing key metrics such as mean intersection over union (mIoU) and average precision, among others. Their review focused on specific components of the modular system, including motion planning, lane detection, localization and mapping, pedestrian detection, traffic sign detection, auto-parking, security, and fault diagnosis.
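Mean intersection over union (mIoU), one of the key metrics used in the comparisons by Bachute and Subhedar [28], averages the per-class overlap between predicted and ground-truth label maps. A minimal NumPy sketch (the label maps are toy data, not from the cited studies):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for semantic segmentation label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                      # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0, 1],
                   [1, 1, 2],
                   [2, 2, 2]])
target = np.array([[0, 0, 1],
                   [1, 2, 2],
                   [2, 2, 2]])
print(round(mean_iou(pred, target, num_classes=3), 3))  # 0.822
```

Averaging over classes rather than pixels is what makes mIoU sensitive to small classes (e.g., pedestrians) that pixel accuracy would wash out.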

3.3. Control Systems

Control modules, including lateral and longitudinal vehicle control, have been extensively studied in surveys like [29,30], which incorporate both classical and learning-based control strategies.

3.4. End-to-End Learning

Several reviews focus on end-to-end autonomous driving architectures where deep neural networks learn the entire pipeline from perception to control. These were examined in [31,32,33], covering both imitation learning and reinforcement learning frameworks.

3.5. Safety in Autonomous Driving

Safety-related survey articles analyze risk mitigation, redundancy, and robustness. Ren et al. [34] reviewed deep learning techniques applied to various aspects of autonomous vehicles, particularly fault diagnosis, data security, and vehicle communication systems. The authors highlighted how deep learning has become a cornerstone for optimizing vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication. Broader safety-oriented discussions were found in [35,36,37,38].

3.6. Large Language Models and Vision-Language Models

The emerging literature highlights the role of large language models (LLMs) and vision-language models (VLMs) in autonomous systems. Li et al. [39] reviewed how LLMs support human-like reasoning in self-driving. They investigated both modular system pipelines and end-to-end systems, concluding that LLMs enhance decision-making, perception, and human–machine interaction in autonomous vehicles. The LLM4Drive survey [40] explored planning, perception, and question-answering modules enhanced with LLMs. Multimodal large language models (MLLMs) were covered in [41], focusing on integration with motion planning and perception. Zhou et al. [42] analyzed VLMs in perception, decision-making, and scene understanding. These studies marked a shift toward knowledge-driven and context-aware driving systems. However, despite their potential, LLMs and vision-language models present significant limitations.
Although LLMs and vision-language models have shown promise in improving semantic understanding, decision-making, and contextual awareness in autonomous systems (Cui et al. [43]; Zheng et al. [44]), several limitations require critical consideration. These models often inherit biases from their training data, which may reflect geographic, cultural, or environmental biases. In the context of driving, this can result in inadequate generalization to road conditions, signaling styles, or underrepresented human behaviors.
In addition, the integration of LLMs raises ethical and operational issues, including how decisions made by the models can be interpreted, verified, or certified for safety-critical applications. Moreover, issues such as data privacy, accountability in failure scenarios, and explainability of language-based decisions still pose major challenges.
Furthermore, LLMs and VLMs are computationally intensive, which limits their real-time applicability without significant model compression or dedicated hardware optimization.

3.7. Simulation Modalities in Autonomous Driving Research

Simulation environments constitute an important basis for the development and validation of autonomous driving systems. The existing literature distinguishes between several simulation modalities, each of which offers different degrees of realism and interaction.
Fully simulated environments include computer games or virtual worlds where both the vehicle and its surroundings are entirely synthetic. Several reviews have examined the use of these environments for training and benchmarking AI agents, especially in early reinforcement learning experiments. For example, Kaur et al. [45] provided a comprehensive survey on simulators for testing self-driving cars, discussing different simulation platforms and their applications in autonomous vehicle development. More recently, Ref. [46] examined several mainstream simulators, detailing their features, including accessibility and underlying engines, and presented a structured overview of various open-source datasets. They also investigated notable virtual competitions, highlighting their reliance on simulators and datasets.
In a semi-simulated framework scenario, real-world driving footage is used as input, and an AI agent attempts to make decisions based on the video stream. This setup has been discussed in research exploring imitation learning and video prediction for decision-making. Shen et al. [47] presented Sim-on-Wheels, a framework that integrates virtual events into real-world driving scenarios, enabling safe and realistic testing of autonomous vehicles.
Semi-physical environments involve scaled-down physical environments where small vehicles equipped with real sensors drive on miniature tracks. Reviews in this area typically focus on simulation-to-real-world transfer, sensor calibration, and domain adaptation. Caleffi et al. [48] provided an overview of small-scale self-driving cars, summarizing current autonomous platforms and focusing on software and hardware developments in this field.
In real-world physical systems, full-scale autonomous vehicles operating in real-world road conditions represent the highest level of fidelity. Several studies have focused on real-world deployment challenges, including sensor fusion, system safety, and real-time performance. Fremont et al. [49] explored formal scenario-based testing approaches that span both simulation and real-world environments, highlighting methods to bridge the gap between the two.
Each modality plays a unique role in the development pipeline. A comprehensive understanding of their trade-offs helps researchers select the appropriate test environment for specific phases of algorithm design and evaluation.

3.8. Comparative Context and Scope

While many existing surveys concentrate on isolated components, our review contributes by connecting these components across two domains—on-road and autonomous racing—highlighting their adaptation to specific context-specific requirements. We aim to bridge traditional modular reviews with a comparative perspective that accounts for scenario-specific constraints.
We provide an overview of the modular system, including perception, planning, control, safety, benchmarks, simulation, and real-time autonomous driving. Figure 4 illustrates the hierarchical structure of modular autonomous driving systems, which offers a conceptual overview of how perception, planning, control, and evaluation modules interact, serving as a foundation for understanding the various methods.
This paper provides a detailed analysis and comparison of the unique challenges and requirements for both road and race autonomous vehicles. This analysis demonstrates how artificial intelligence (AI) technologies can be adapted and optimized for varying automotive environments. Our comparison highlights the adaptability and efficiency of AI algorithms in managing the fast-paced decision-making required in racing scenarios, in contrast to the complex and dynamic conditions faced by on-road vehicles.

4. Materials and Methods

The current systematic review of the literature was conducted based on the preferred reporting items for systematic reviews and meta-analyses (PRISMA) [50].

4.1. Research Motivations

Autonomous driving technology is advancing rapidly, driven by the integration of advanced intelligent systems. Among the various techniques used, deep learning has proven to be highly effective in autonomous driving scenarios. The success of deep learning techniques has led to their widespread use in autonomous driving, which makes it necessary to explore in-depth analyses by examining previous studies. Our study aims to systematically review the proposed deep learning approaches for autonomous vehicles, covering both on-road and racing environments. It seeks to highlight existing deep learning approaches for modular systems, describe benchmarks of simulation and real-time platforms, discuss evaluation metrics, state-of-the-art performances and their comparisons, and outline potential challenges and future research directions. Additionally, we aim to fill what we perceive as a gap in similar studies that often focus on specific modular system components within either on-road or racing car environments.

4.2. Research Questions

This study aims to answer the following research questions:
  • RQ1: What are the deep learning approaches used in modular autonomous driving systems on the road and in racing scenarios?
  • RQ2: What safety and robustness techniques from machine learning and deep learning are used in autonomous driving on the road and in racing scenarios?
  • RQ3: What are the existing datasets used for machine learning/deep learning techniques in autonomous driving?
  • RQ4: What performance evaluation metrics are used to evaluate the modular system in autonomous driving on the road and in race scenarios?

4.3. Information Sources and Databases

We implemented a systematic approach to gather the most recent studies, covering the period from 2020 to 2025, from a variety of journals and conferences across multiple database sources, including SCOPUS, Science Direct, IEEE Xplore, Web of Science, ACM, Taylor & Francis, and MDPI. We developed a search strategy to identify relevant literature, following the PRISMA guidelines at all stages [50]. Figure 5 presents the paper screening and selection processes.

4.4. Search Strategy and Key Terms

Our search strategy was customized for each database using the following search terms: “autonomous driving” OR “autonomous vehicles” OR “autonomous racing” AND “perception” OR “planning” OR “control” OR “safety” AND “machine learning” OR “deep learning” OR “DL” OR “ML”. All searches covered the period from 2020 to 2025 and included journal and conference papers, early access papers, and book chapters.

4.5. Eligibility Criteria and Quality Assessment

The selection of articles to be considered in our study was made according to a set of criteria that prioritized high-quality publications from leading journals and conferences in the relevant areas. The search focused on mapping the existing literature on machine learning/deep learning approaches used in modular autonomous driving systems. The search covered the years 2020 to 2025, excluding all papers with a publication date outside this period. The search was not restricted to specific countries or regions and was global in scope. Only papers written in English were considered. Furthermore, only papers published in the research areas of computer science, engineering (excluding non-aligned subfields such as biomedical and civil engineering), transportation, automation control systems, robotics, or related areas were included in this study. Short articles, posters, editorial materials, letters, and articles not aligned with the scope of this study were excluded. To ensure the relevance of each selected paper, we critically reviewed each abstract, focusing only on those directly related to our topic. Table 1 summarizes the inclusion and exclusion criteria used in the selection of research articles.
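Screening rules of this kind are straightforward to express as a filter. The sketch below is purely illustrative: the field names and the exact category sets are assumptions, not the checklist actually used in this review.

```python
def passes_screening(paper):
    """Apply illustrative inclusion/exclusion criteria to one record."""
    allowed_types = {"journal", "conference", "early access", "book chapter"}
    allowed_areas = {"computer science", "engineering", "transportation",
                     "automation control systems", "robotics"}
    return (2020 <= paper["year"] <= 2025          # publication window
            and paper["language"] == "English"     # language criterion
            and paper["type"] in allowed_types     # document type
            and paper["area"] in allowed_areas)    # research area

papers = [
    {"year": 2021, "language": "English", "type": "journal", "area": "robotics"},
    {"year": 2018, "language": "English", "type": "journal", "area": "robotics"},
    {"year": 2022, "language": "German",  "type": "journal", "area": "robotics"},
]
print([passes_screening(p) for p in papers])  # [True, False, False]
```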

4.6. Data Extraction

Our search yielded a substantial number of articles from which we extracted key themes, main contributions, discussions, future directions, and conclusions to form the foundation of this systematic review. To achieve this, two of the authors independently and extensively analyzed the 155 articles included in the study.

4.7. Risk of Bias Assessment

To assess potential bias in the included studies, we developed a custom checklist, taking into account best practices for systematic reviews. This checklist considered the clarity of the study objectives, data quality, methodological transparency and consistency, completeness of reporting, and potential conflicts of interest. Each of the 155 studies was independently assessed by two of the authors (K.H. and S.J.), and points of disagreement were resolved through discussion between the authors. The overall risk of bias was classified as low, moderate, or high for each study, according to these criteria.

4.8. Effect Measures

Although we did not conduct a meta-analysis, we extracted available effect measures from the included studies. These included precision, recall, F1 score, mean absolute error, root mean square error, average displacement error, final displacement error, and accuracy. Where applicable, we present these values in Section 5 for comparative analysis.
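Among the extracted effect measures, average displacement error (ADE) and final displacement error (FDE) are the standard trajectory-prediction metrics: the mean and final Euclidean distance between predicted and ground-truth trajectories. A minimal NumPy sketch with toy trajectories:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and final displacement error for trajectories of shape (T, 2)."""
    dists = np.linalg.norm(pred - gt, axis=1)  # per-step Euclidean error
    return float(dists.mean()), float(dists[-1])

pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
gt   = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # approx. 0.667 and 1.0
```

ADE rewards accuracy over the whole horizon, while FDE isolates long-horizon drift, which is why the two are usually reported together.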

4.9. Reporting Bias Assessment

In this systematic review, no statistical tools were used to assess reporting bias. However, we qualitatively considered potential bias by examining the trends in reporting in similar studies and identifying any patterns of selective reporting of results.

4.10. Registration and Protocol

This systematic review was not registered in PROSPERO or any other protocol registry. No formal review protocol was published prior to conducting the review.

5. Results

5.1. Characteristics of the Selected Studies

From the research carried out in the previously identified databases, 155 articles were identified, focused on deep learning techniques in modular autonomous driving systems. Of these, 6 were published in 2020, 57 in 2021, 47 in 2022, 17 in 2023, 25 in 2024, and 3 in 2025. Regarding scope, of the 155 articles included in the study, 31 focus on the perception module (20%), 48 on the planning module (31%), 31 on the control module (20%), 35 on end-to-end approaches (22.5%), and 10 (6.5%) on safety and robustness issues.

5.2. Synthesis of Results

The results of this research are presented in Section 6, which is divided according to the scope of the articles that make up this study: (a) articles focused on deep learning approaches for the perception module of modular autonomous driving systems; (b) articles focused on deep learning approaches for the planning module of modular autonomous driving systems; (c) articles focused on deep learning approaches for the control module of modular autonomous driving systems; (d) articles focused on deep learning end-to-end approaches for modular autonomous driving systems; and (e) articles focused on deep learning approaches for the safety of modular autonomous driving systems.

6. Major Findings

In this section, we present an in-depth analysis of the papers included in our study. For each research question, we not only present the findings reached through the systematic review carried out, but also provide a comparative analysis of existing approaches and propose future research directions.

6.1. RQ1: What Are the Deep Learning Approaches Used in Modular Autonomous Driving Systems on the Road and in Racing Scenarios?

For better organization, and given that each of the analyzed articles focuses mainly on one of the modules of the modular autonomous driving system, the answer to research question one (RQ1) is presented in three parts, each of which relates to one of the modules of the modular system, perception, planning, and control. Thus, Table 2 provides a detailed comparison of perception, planning, and control components for road versus race vehicles, highlighting how each domain presents distinct challenges and system demands.

6.1.1. Perception

Perception is the initial module of the modular system and is based on sensory data. Sensors include cameras, radar, LiDAR, and the global positioning system (GPS). Table 2 illustrates the perception components for both types of vehicles, including object detection, lane or road recognition, boundary prediction, and obstacle or opponent detection, all captured by the sensors. Here, we focus on the existing methodologies of the perception module, including 2D and 3D sensor data. These sensory data are crucial in autonomous driving, since they are used to detect, recognize, and segment obstacles surrounding vehicles. For example, CNNs are widely used for object detection and segmentation across 2D and 3D inputs in both on-road and racing scenarios [8,51]. Autonomous driving has been deployed in various urban, rural, and highway environments, and multiple studies have addressed the situations arising in these environments, including traffic congestion, road blockages, pedestrians, speed management, traffic signals, and other obstacles. In contrast, the perception module for autonomous racing cars follows specific tracks to detect and segment objects, racetrack lanes, and opponents at high speeds.
  • On-Road Perception Techniques
This section elaborates on state-of-the-art perception methods in autonomous on-road vehicles, including object detection, segmentation, depth estimation, localization, and mapping using deep learning models. Table 2 lists the main perception components.
Object detection methods are typically categorized into one-stage (e.g., SSD, FCOS, YOLO) and two-stage (e.g., Faster R-CNN) CNN-based approaches. One-stage detectors such as the YOLO family are widely used for real-time applications due to their speed and efficiency [52,53]. Although small objects remain difficult to detect, newer models such as YOLOv5 [54] and YOLOv8 [55] show significant accuracy improvements. Jia et al. [56] proposed a structural re-parameterization technique in YOLOv5 to enhance the detection of small traffic elements.
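To make the one-stage detection pipeline concrete, the sketch below implements the greedy non-maximum suppression (NMS) step that YOLO-style detectors apply to their raw box predictions. This is our own minimal NumPy illustration, not code from any cited work; the box format and threshold are assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring boxes and discard
    lower-scoring boxes that overlap them above `iou_thresh`."""
    order = np.argsort(scores)[::-1]          # indices by descending score
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = np.array([j for j in rest if iou(boxes[i], boxes[j]) < iou_thresh])
    return keep
```

With two heavily overlapping candidate boxes and one distant box, only the higher-scoring duplicate and the distant box survive, mirroring how a one-stage detector reduces thousands of anchors to a handful of detections.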
To address low-light scenarios, Ref. [57] utilized LLE-UNET to improve image clarity and object detection, enhancing decision-making for road elements. Other advances integrate large language models (LLMs) to convert sensor data into natural language prompts, improving contextual reasoning and decision-making [58,59]. Vision-language models (VLMs) such as CLIP [60] and VLM-Auto [61] further support scene interpretation and advanced driver-assistance systems (ADAS).
Three-dimensional (3D) object detection has gained traction using data from LiDAR and stereo cameras. BEVDetNet [62] employs a bird’s-eye view and semantic segmentation for low-latency performance. Multistage approaches such as Faster R-CNN integrated with feature pyramids improve the detection of occluded and small-scale objects [63]. Other notable models include Stereo-CenterNet [64], RCBi-CenterNet [65], and Pseudo-Stereo [66], which use stereo geometry and virtual views.
Sensor fusion approaches, such as CameraRadarFusion-Net [67], combine radar and camera data to enhance detection across wide fields of view. The combination of CNN and DRL [68,69] has been employed to enhance environment sensing, tracking, and navigation.
For segmentation, dense feature pyramid networks [70] and multi-scale FPN [71] have been used to delineate roads and detect clusters of pedestrians. Semantic models such as GCNet [72] and A3T-GCN [73] are used for understanding urban scenes and predicting traffic flow.
In SLAM, deep learning enhancements include Siamese-YOLOv5 [74] for loop closure and 4Seasons [75] for robust localization under varying weather. Hybrid-dense SLAM approaches [76] merge sparse and dense mapping techniques.
  • Racing Car Perception Techniques
In this section, we focus on various approaches to the perception of autonomous racing cars, comprising deep learning algorithms used to detect racetracks, track boundaries or lanes, and identify opponents, as specified in Table 2.
Formula One-style racing cars have recently attracted growing research attention. Zhang [77] explored autonomous racing, focusing on perception and reinforcement learning in two phases. The first part describes a perception stack using RADAR and LiDAR-camera systems to detect and track race cars, achieving high accuracy and speed. The second part investigates improving a reinforcement learning agent for simulated autonomous racing, incorporating long short-term memory (LSTM) networks and soft actor-critic algorithms to enhance the agent's driving capabilities.
Balakrishnan et al. [78] detected and recognized objects through global features, contours, shape matching, and color analysis with matching algorithms and convolutional layers; their approach was demonstrated in an F1-based simulator, avoiding obstacles and opponents. Betz et al. [11] conducted a perception study that integrated sensor fusion techniques across camera, LiDAR, and radar. They first performed preprocessing using conditional removal, Voxel downsampling, and ground filtering (employing the ray ground filter based on Autoware). Teeti et al. [79] emphasized the impact of adverse weather conditions, using cycle generative adversarial networks (CycleGANs) alongside various deep learning methods, including tiny-YOLOv3, Scaled-YOLOv4, and EfficientDet, to support object detection.
In Formula Student-based autonomous racing, by contrast, real cars are driven on racetracks delineated by strategically placed cones.
TraCon [80] introduced a cone-detection methodology that uses YOLOv5 combined with CSPDarkNet for feature extraction and PANet for feature integration.
Strobel et al. [81] investigated depth prediction and clustering by incorporating YOLOv3, ResNet, and ReKTNet with the Perspective-n-Point method for sensor fusion. Large et al. [22] analyzed and compared EKF-SLAM and GraphSLAM, with GraphSLAM outperforming EKF-SLAM. Peng et al. [82] provided a novel open-source, sensor-fusion-based method for SLAM and navigation, using YOLOv3-tiny for cone detection, vector regular registration for positioning, and factor-graph optimization.
Ref. [83] introduced a multimodal sensor fusion and object tracking method designed for autonomous racing, focusing on scenarios that require high speed and low latency.
  • Comparative Analysis and Challenges
Perception takes data from the different sensors, either individually or fused; the surveyed studies investigate the perception stage using various approaches and assess performance with multiple evaluation metrics. Each approach reveals its advantages and disadvantages under investigation. In this section, we illustrate the limitations of existing studies in both scenarios.
The perception of on-road vehicles includes various components that have been progressively investigated with deep learning techniques, although some studies still need improvement. Ref. [84] compared one-stage and two-stage models with various feature extractors; the two-stage model performed best at detecting tiny or occluded objects. However, two-stage detectors are slow in real time and struggle with dense traffic, adverse weather, and poor illumination. Real-time detection of tiny or lightweight objects performed well in static environments but faced challenges in dark or adverse weather conditions [54,55,56]. Fast and accurate object detection is another extensively investigated challenge in autonomous driving scenarios, but it faces difficulties in reducing training-set size and model parameters, tuning parameters, and handling adverse weather conditions and computational complexity [52]. Mohapatra et al. [62] improved object detection in dynamic environments with diverse weather and lighting by using LiDAR point clouds, but faced challenges in detecting small or distant objects. Stereo CenterNet [64] performed better at detecting 3D objects by estimating depth but struggled to detect distant and occluded objects in dynamic environments. The dense feature pyramid model excelled at marking road indicators with different colors but struggled to segment faded markings and required significant computational resources [70]. The multi-scale FPN approach detected and segmented pedestrians in dense crowds, demonstrating robust performance; however, it requires significant computational resources, faces challenges in real-time detection, and struggles with varying pedestrian postures and extreme occlusions [71].
Autonomous racing perception incorporates various components that have been progressively investigated using deep learning techniques, although some areas still require further improvements.
Teeti et al. [79] studied the robustness of various object detection methods, improving performance in diverse weather conditions but incurring computational overhead from the augmentation process. Sensor-fusion-based factor-graph optimization achieves highly accurate and reliable vehicle odometry, enhancing localization precision and robustness in dynamic environments; however, it still struggles with sensor synchronization and failure and requires fine-tuning [82].

6.1.2. Planning

Planning techniques are responsible for planning a path from source to destination and are categorized into global, local, and behavior planning, as described in Table 2. These categories have been evaluated with various approaches, including classical, sample-based, graph-based, interpolating curve-based, optimization-based, and deep learning-based methods. Deep and reinforcement learning approaches, particularly RNNs and DRL, have been adapted for trajectory prediction and decision-making in both urban and high-speed racing contexts.
As stated, this paper focuses on state-of-the-art deep learning approaches for trajectory prediction, trajectory optimization, lap time, minimum curvature, decision-making, overtaking maneuvers, etc. The distinction between the planning modules of autonomous racing cars and on-road vehicles is that racing cars must plan at high speeds, while on-road vehicles must navigate varied environments, such as urban, rural, and highway settings.
  • On-Road Path Planning Techniques
In this section, we examine the planning stage for autonomous vehicles, which presents significant challenges, covering local and global planning methods with a focus on deep learning and reinforcement learning techniques. Ref. [85] investigated complex traffic scenarios, integrating driver-style inference with deep reinforcement learning to improve safety and efficiency: a variational autoencoder (VAE) based on gated recurrent units (GRUs) extracts driver-style features from historical trajectory data, which are then used to train a motion-planning strategy within reinforcement learning. To improve the efficiency of interaction-aware trajectory planning, Ref. [86] introduced a knowledge distillation approach that trains a smaller network to mimic the predictive capabilities of a larger one, reducing computational costs without sacrificing accuracy and significantly accelerating trajectory optimization. Hui et al. [87] used an encoder–decoder for trajectory prediction and a DNN for trajectory correction. ResNet-34 has been used to estimate trajectories while mitigating the vanishing-gradient problem [88]. TridentNet [89] is a conditional variational autoencoder for trajectory generation. VTGNet [90] is a vision-based trajectory generator with two phases: feature extraction for trajectory planning using MobileNet with fully connected layers, and an LSTM for trajectory prediction and generation. LSTM-based models have been employed for trajectory prediction and generation [90,91]. Zhang et al. [92] predicted trajectories sequentially over specific time intervals, with each interval building on previous trajectories to improve robustness against adversarial attacks. F-Net [93] combines an RNN and a CNN to estimate trajectories from fused sensor data.
Trajectory prediction has utilized a monocular-image DNN with YOLOv3, combined with an extended Kalman filter, to determine angular and linear velocity [94]. An LSTM with spatial–temporal attention has been used to estimate trajectories from historical trajectories within a specific interval, ranking the nearest vehicles [95]. Greer et al. [96] employed an auxiliary loss function in ResNet-50 to predict multiple trajectories and paths. Because deep learning trajectory predictions exhibit various uncertainties, the PRIME approach [97] predicts feasible and reliable trajectories by employing a model-based method as a generator and learning-based models as evaluators. TrajVAE [98] incorporated a generative adversarial network (GAN), a variational autoencoder, and LSTM models to generate trajectories. GANs with temporal-logic-based syntax features have been utilized to predict trajectories [99]. Attention-based interaction-aware trajectory prediction (AI-TP) [100] combined a graph attention network (GAT) with ConvGRU to predict trajectories.
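As a point of reference for the learned predictors above, trajectory-prediction papers commonly benchmark against a constant-velocity baseline that simply extrapolates the last observed displacement. The sketch below is our own illustrative NumPy version (the array shapes are assumptions), not code from any cited work.

```python
import numpy as np

def constant_velocity_predict(history, horizon):
    """Constant-velocity baseline: extrapolate the last observed
    per-step displacement of a (T, 2) position history over `horizon`
    future steps, returning a (horizon, 2) predicted trajectory."""
    vel = history[-1] - history[-2]                      # last displacement
    return np.array([history[-1] + (t + 1) * vel for t in range(horizon)])
```

Learned models such as the LSTM and GAN variants above are expected to beat this baseline precisely in the interactive, maneuver-rich situations where constant velocity fails.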
Graph-based spatial–temporal convolutional networks [101] have been utilized to predict and estimate trajectories within sequential temporal contexts, incorporating a graph convolutional network (GCN). SCOUT [102] is a graph-based trajectory prediction method built on an attention-based graph neural network (GNN); it enforces social consistency during trajectory prediction while giving special consideration to vulnerable road users. Hierarchical-GNN [103] estimates vehicle trajectories by analyzing the interactions and relationships between multiple maneuvers. Li et al. [104] explored improving decision-making in self-driving cars using deep reinforcement learning, addressing complex scenarios and efficient data use; they implemented the DRL-EPKG model, combining human driving knowledge with a soft actor-critic framework that determines both longitudinal and lateral vehicle behaviors. A novel behavior-aware trajectory prediction model (BAT) [105] considers the behavior, interactions, priorities, and positions of surrounding vehicles, informed by traffic psychology and human behavior, providing a robust and efficient solution for predicting vehicle trajectories and improving safety and reliability. Ref. [106] utilized the lightweight SqueezeNet model for human-like trajectory planning, imitating human driving behavior to plan efficient and safe trajectories. Ref. [107] introduced a human-like planning method for navigating lane changes and intersections that mimics various human driving styles to emulate realistic and safe driving behaviors. Vision-language planning (VLP) [108] enhances autonomous driving systems (ADS) by integrating the reasoning capabilities of language models, incorporating agent-centric and self-driving-car-centric learning paradigms to improve scene understanding and decision-making, and demonstrating superiority in open-loop planning, perception, prediction, and generalization.
DriveLLM [43] integrates LLMs to improve decision-making with common sense reasoning by addressing challenges in traditional rule-based systems. DriveLLM incorporates cyber-physical feedback for continuous learning and improved performance in complex scenarios, enabling interaction with human input while protecting against adversarial attacks. “Drive as You Speak” [109] explores the integration of LLMs to create more human-like interactions, allowing natural language communication, contextual understanding, and continuous learning in autonomous vehicles. It investigates the potential of LLMs for improved reasoning, personalization, and transparency in driving, demonstrating enhanced decision-making and motion-planning capabilities.
Here, we illustrate various state-of-the-art approaches for planning that incorporate reinforcement learning. The Markov decision process (MDP) and DRL have been investigated for navigation and path planning, respectively [110]. Xu et al. [111] used an actor-critic approach for trajectory generation and decision-making in trajectory planning to achieve optimal policies.
Longitudinal trajectory planning was implemented using the Adam optimizer and deep Q-learning, formulated as an MDP [112]. Luis et al. [113] introduced multi-agent trajectory planning using centralized convolutional deep Q-learning (DQL) with a modified reward function for rewards and penalties. Hierarchy-based reinforcement learning has been used for trajectory planning, with an LSTM for historical monitoring and a hybrid reward to improve performance [114]. Reinforcement learning and multi-agent reinforcement learning (MARL) methods have been used for trajectory optimization [115] and trajectory planning [116], respectively.
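The MDP formulation behind these Q-learning planners can be illustrated at toy scale. The sketch below runs tabular Q-learning on a one-dimensional "road" in which the agent must learn to drive right toward a goal cell; it is a didactic stand-in for the deep Q-learning planners cited above, with the states, reward, and hyperparameters all assumed for illustration.

```python
import random

def q_learning_path(grid_w=4, goal=3, episodes=400,
                    alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy 1-D road: states 0..grid_w-1,
    actions 0 = move left, 1 = move right; reaching `goal` yields +1."""
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(grid_w)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda act: q[s][act])
            s2 = max(0, min(grid_w - 1, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == goal else 0.0
            # one-step temporal-difference (Q-learning) update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    # return the greedy action per state
    return [max((0, 1), key=lambda act: q[s][act]) for s in range(grid_w)]
```

After training, the greedy policy moves right from every non-goal state. Deep Q-learning planners replace the table `q` with a neural network so the same update rule scales to continuous sensor-derived states.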
  • Racing Car Planning Techniques
Here, we discuss the state of the art in planning for autonomous race cars, incorporating deep learning techniques.
Jain and Morari [117] predicted trajectories to solve for waypoints and optimal lap time while maintaining limit handling in high-speed racing vehicles, incorporating Bayesian optimization. For trajectory planning during interactive overtaking, where conventional methods rely on predicting the behavior of other vehicles, Ref. [118] proposed a novel RL-based approach to generate trajectories, with a safety layer that intervenes if infeasible trajectories are generated, selecting a safe but suboptimal alternative. An unscented Kalman filter combined with a Bayesian estimator was used to address uncertainty in predicting future trajectories in LUCIDGames [119].
Mix-Net [120] is a structural deep learning method based on the LSTM encoder–decoder to estimate the trajectory, position, and motion of the opponent cars.
Reinforcement learning has been used to generate trajectories from unseen data at manageable computational cost, outperforming traditional methods under mismatched model parameters [121]. Weaver et al. [122] presented a novel method for real-time trajectory generation in autonomous racing, employing dynamic movement primitives (DMPs); this approach allows racing cars to correct deviations and handle accelerating targets, which is crucial for precise control at handling limits. Evans et al. [123] introduced a modified planner that identifies obstacle-avoiding paths, using deep reinforcement learning to adjust waypoints when blockages occur. The deep deterministic policy gradient (DDPG) has also been employed to capture short-term trajectories with MARL, and a linear-quadratic Nash game approximation has been used for low-level trajectories or reference points [124]. RaceMOP [125] is a novel mapless online path planner for multi-agent autonomous racing that uniquely combines an artificial potential field (APF) planner with residual policy learning (RPL) to enable long-horizon planning and robust collision avoidance during overtaking.
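To illustrate the artificial potential field component that planners such as RaceMOP build on, the following sketch implements one step of a classic attractive/repulsive APF planner. This is the generic textbook formulation in NumPy, not RaceMOP's learned residual policy; all gains, radii, and step sizes are our own assumptions.

```python
import numpy as np

def apf_step(pos, goal, obstacles,
             k_att=1.0, k_rep=100.0, influence=5.0, step=0.1):
    """One normalized gradient step on an attractive/repulsive potential
    field. `pos`, `goal`, and each obstacle are 2-D NumPy points."""
    force = k_att * (goal - pos)                  # attractive pull toward goal
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < influence:                      # repulsion inside influence radius
            force += k_rep * (1.0 / d - 1.0 / influence) / d**2 * (diff / d)
    # move a fixed step along the (normalized) net force direction
    return pos + step * force / (np.linalg.norm(force) + 1e-9)
```

With a clear road the vehicle advances toward the goal, while a nearby obstacle reverses the net force; residual policy learning then corrects the well-known local-minimum and oscillation weaknesses of this base planner.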
Feed-forward artificial neural networks (ANNs) have been used to identify optimal trajectories and lap times, outperforming other traditional methods [126]. Traditional methods struggle with uncertainty and other constraints; the TOAST method therefore performs trajectory optimization and tracking by deploying a feed-forward network [127].
Sim2Real [128] uses reinforcement learning to obtain optimal trajectories and lap times. Reinforcement learning with a proxy reward for progress likewise predicts optimal trajectories and lap time [129]. ResRace [130], based on a modified artificial potential field (MAPF), produces an optimal DRL-predicted navigation policy.
Weiss et al. [131] predicted the future trajectories of opponent vehicles by incorporating LSTM, which maintains the interaction area during racing. Q-learning has been applied to behavior inspection and overtaking maneuvers with opponent vehicles on both straight and curved tracks. Gaussian processes and stochastic MPC have been used in interaction-based methods for overtaking, planning and maintaining optimal trajectories [132,133]. Tian et al. [134] proposed a balanced reward-inspired proximal policy optimization (BRPPO) algorithm to improve the training quality and decision-making of autonomous racing vehicles navigating complex tracks with reinforcement learning. BRPPO uses a balanced reward function that considers both historical and current rewards to minimize collisions during sharp bends.
  • Comparative Analysis and Future Research Directions
In order to present an overview of the performance of previous approaches that incorporate deep learning in autonomous driving in road vehicles and racing cars, we include a performance analysis in Table 3, which highlights how diverse planning approaches cater to the specific demands of autonomous driving in both on-road and racing scenarios. For on-road vehicles, models such as VAE + GRU and SCOUT show strong accuracy metrics (e.g., RMSE, ADE/FDE), which suggest that these approaches are effective in structured, lower-speed environments with frequent decision-making points. However, the reliance on high-dimensional data (e.g., graph-based spatiotemporal models) poses challenges in scalability and interpretability. In contrast, racing-focused methods like BRPPO and ANN emphasize fast decision-making and robustness under high-speed conditions, with metrics reflecting lower collision rates and faster planning cycles. These results underscore a practical challenge. While on-road systems prioritize reliability and accuracy, racing systems demand agility and low-latency response. This contrast points to an opportunity for hybrid planners that can adaptively balance speed and precision based on environmental context and task complexity.
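For reference, the ADE/FDE accuracy metrics reported for on-road planners in Table 3 reduce to a few lines of NumPy. The sketch below is our own illustration, assuming predicted and ground-truth trajectories given as (T, 2) coordinate arrays in metres.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average Displacement Error (mean per-timestep Euclidean error)
    and Final Displacement Error (error at the last timestep) between
    a predicted and a ground-truth trajectory, each shaped (T, 2)."""
    d = np.linalg.norm(pred - gt, axis=1)   # per-timestep Euclidean error
    return d.mean(), d[-1]
```

ADE rewards accuracy over the whole horizon, whereas FDE penalizes only the endpoint, which is why papers usually report both.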
The planning components of autonomous on-road driving have been extensively investigated using deep learning techniques. The encoder–decoder deep neural network accurately predicts trajectories in dynamic environments, but it still faces challenges, such as the need for significant computational resources and difficulty with highly unpredictable driving scenarios [87]. Graph-based temporal CNN trajectory prediction enhances accuracy in diverse driving scenarios, but it still struggles with complex interactions in highly congested scenarios and needs extended training to ensure robust performance across diverse behaviors and patterns [101]. Trajectory prediction efficiently incorporates lane-crossing behavior and destination points but faces challenges with computational resources and adverse driving conditions [91]. TridentNet effectively generates trajectories dynamically using a conditional generative model but still struggles with rare, highly unpredictable events, and its training is constrained by limited computational resources [89]. TrajVAE [98] generates trajectories for dynamic environments but is computationally demanding and faces problems in real-time scenarios.
The planning components of autonomous racing cars have likewise been extensively investigated using deep learning techniques. TC-Driver [121] used reinforcement learning to base driving decisions on predicted trajectories, optimizing performance through robust learning, but faces challenges on unseen track layouts. Real-time optimal trajectory planning using machine learning enables efficient lap-time simulation and improves performance but struggles when opponents and sudden events are involved [126]. Simultaneous trajectory optimization and tracking using shared neural network dynamics effectively balances accuracy and computational efficiency, enhancing real-time performance, but this approach faces challenges with noisy data and limited computational capacity and adaptability [127]. DeepRacing AI [135] employed advanced algorithms for agile trajectory planning, allowing it to navigate racing circuits efficiently, but it faces problems in real-time decision-making.

6.1.3. Control

This section discusses the final module: control systems based on lateral and longitudinal control. Lateral control manages path tracking by regulating steering angles and lateral control inputs, based on reference lines and track boundaries determined during the planning phase. Longitudinal control manages vehicle acceleration, braking, and throttle settings to maintain a specific velocity, with inputs based on speed, position, or direction. Autonomous racing cars handle high-speed environments on a specific racetrack by incorporating various methods, including classical control, model predictive control (MPC), learning-based control, drifting control, and optimization-based control. Lateral and longitudinal control are therefore investigated individually or simultaneously to manage steering angles, speed, velocity, acceleration/deceleration, position, side slip, and drifting, incorporating deep learning techniques. On-road autonomous vehicles must control and monitor all situations in the surrounding traffic environment at limited speed to avoid obstacles. Given these differing environments, control approaches for racing and on-road vehicles are examined separately.
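The split between lateral (steering) and longitudinal (acceleration/braking) inputs described above is commonly captured by the kinematic bicycle model. The sketch below is a minimal illustrative implementation with Euler integration; the wheelbase and time step are assumptions, and it is not code from any cited controller.

```python
import math

def bicycle_step(x, y, yaw, v, steer, accel, wheelbase=2.5, dt=0.1):
    """One Euler step of the kinematic bicycle model. `steer` is the
    lateral input (front-wheel angle, rad); `accel` is the longitudinal
    input (m/s^2). State is position (x, y), heading `yaw`, and speed `v`."""
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt   # steering bends the path
    v += accel * dt                               # throttle/brake changes speed
    return x, y, yaw, v
```

Because steering only enters the heading equation and acceleration only the speed equation, the two control channels can be designed separately, which is exactly the decomposition the surveyed controllers exploit.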
Table 4 provides a detailed comparison of control strategies across on-road and racing scenarios, showcasing how different deep learning models handle lateral and longitudinal control under different conditions. In road environments, methods such as robust adaptive learning control (RALC) and human-like neural networks emphasize control precision and smooth trajectory tracking, often reporting low lateral deviation and improved convergence. However, these methods may fail under rapidly changing conditions, where fast adaptability is required. Conversely, racing-focused models like model-free DRL (e.g., Dreamer) and ResRace prioritize lap-time minimization and rapid control response. Although effective for high-speed maneuvers, these approaches often struggle with stability and interpretability. The differences in evaluation metrics, from infraction scores to variance in tracking errors, highlight a broader trade-off—optimizing for safety and interpretability in on-road scenarios versus maximizing performance and agility in racing scenarios. Future research could benefit from hybrid control architectures that combine the reliability of on-road controllers with the adaptability of racing strategies, especially for urban high-speed applications and adversarial settings.
  • Lateral and Longitudinal Control in On-Road Vehicles
This section reviews the latest advances in the autonomous control of road vehicles, focusing on longitudinal and lateral control systems supported by deep learning techniques. We explore approaches applicable to various driving environments, including urban, rural, and highway settings, while taking into account maneuvers, obstacles, and complex traffic conditions. Deep learning methods are increasingly being used to optimize vehicle control, enabling systems to manage limit handling and meet the challenges of various traffic scenarios.
Several deep learning techniques have been combined with MPC, showing promising results in autonomous driving; we summarize some of them here. DDPG combined with MPC has been used to control vehicle speed under poor and inefficient road conditions in dynamic traffic, maintaining crowd-sourced information about road conditions and ensuring adequate vehicle performance [143]. The deep Koopman neural operator is a data-driven MPC approach that handles unpredictable, nonlinear vehicle dynamics by lifting them toward an (infinite-dimensional) linear representation [144]. Linear parameter-varying (LPV) models, obtained via regression-based learning, have been used within MPC to handle nonlinear path-tracking and lateral-control problems [145].
Using camera-based data, Yin [139] estimated steering angles with a CNN built on a pre-trained network to avoid overfitting. Vision-based neural networks [146] were introduced to control tracking under uncertainty in groups of vehicles, where member vehicles follow a leader vehicle. Salunkhe et al. [147] controlled the CAN bus, optimized data, and managed energy consumption using MLP and CNN methods. Ref. [136] introduced a novel deep learning approach to make control more human-like, specifically addressing longitudinal motion; its structure mirrors the driver's control mechanism, making the model more interpretable and applicable to vehicle dynamics while maintaining and improving performance consistency and convergence rate. The robust adaptive learning control (RALC) approach, based on Lyapunov-like theory, enhances tracking control performance in unpredictable environments and further improves it by drawing on previous tracking control experience [140]. CarLLaVA [137] is a novel vision-language model designed for autonomous driving using only camera input, leveraging the LLaVA vision encoder in closed-loop driving scenarios. It uses a semi-disentangled output representation, combining path predictions and waypoints to enhance lateral control; experiments demonstrate CarLLaVA's superiority over existing methods, particularly in lateral control. DriveMLM [148] is a framework for autonomous driving that uses LLMs, bridging the divide between language-based decisions and vehicle control through standardized decision states. DriveMLM employs a multimodal LLM that uses data from various sensors and inputs to make driving decisions and provide explanations.
Ref. [138] explored DRL to improve autonomous vehicle control, addressing the complexities of diverse and varying weather conditions, using deep Q-networks (DQNs) to train AVs for challenging conditions such as heavy traffic and adverse weather. Prathiba et al. [149] used deep reinforcement learning and genetic algorithms to manage complex road structures, traffic congestion, safety and comfort, and vehicle energy maintenance. DRL has been used to optimize traffic flow and to generate or predict new routes to a destination in emergencies [150]. DDPG and PPO have been adapted for continuous control in complex navigation tasks, enabling vehicles to manage unstructured intersections or high-speed overtaking in racing environments [151]. Dong et al. [152] used DRL to manage connected autonomous vehicles, maintain safety and movement, and suggest lane changes. Deep reinforcement learning has also been used to manage unsignalized intersections for connected vehicles, utilizing proximal policy optimization and generalized advantage estimation to ensure reliable and accurate traffic flow [153]. Energy consumption covers fuel use, battery reserves from source to destination, estimated range, driving cycles, and so on. Reinforcement learning has been used to address collisions in autonomous vehicles by framing collision prevention as a Markov decision problem, utilizing an actor-critic policy to make decisions that ensure the safety of the cars [154]. Traffic-aware autonomous driving (TrAAD) [155] is a novel method that improves the performance of autonomous vehicles by integrating traffic simulation into imitation learning and optimizing speed control to improve traffic flow and energy consumption; it involves a two-phase training process in which the autonomous vehicle learns optimal acceleration through reinforcement learning and the learned behavior is then integrated into a comprehensive autonomous driving framework.
  • High-Speed Control Strategies in Racing Cars
First, we discuss autonomous racing car control, which deals with high-speed vehicles requiring high accuracy and outstanding performance. The primary purpose of this part of the survey is to address the question "How can the lateral and longitudinal error rate be controlled and reduced during racing on the track?". We review control approaches that incorporate deep learning techniques, examining lateral and longitudinal control with a focus on steering angles, acceleration, braking, velocity, and other throttle adjustments.
MPC is among the most promising strategies for addressing autonomous racing constraints and is expected to perform increasingly well in autonomous racing, particularly with highly dynamic models. Ref. [141] introduced a novel method for autonomous racing that enhances learning model predictive control (LMPC) by focusing on learning the error dynamics: instead of directly modeling vehicle dynamics, it learns the difference between a nominal physics-based model and real-world data.
One MPC variant employs Model Predictive Path Integral (MPPI) control, supported by a multilayer neural network, to navigate dynamic environments; however, it may underperform in real time, so model-free reinforcement learning is used to optimize the cost function [121]. The deep Koopman operator [156] is a data-driven, model-free learning method for MPC problems; it manages the growing dimensionality and produces optimal racing lines without prior knowledge of the vehicle. A Gaussian process-based stochastic MPC has been used for optimal trajectory planning and overtaking during racing [133]. BAYESRACE [157] was introduced to address lateral side slip by incorporating a Gaussian process while retaining a kinematics-based MPC.
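The core MPPI idea, sampling control perturbations, rolling them out through a dynamics model, and averaging them with softmax weights on cost, can be sketched as follows. The kinematic bicycle model, fixed speed, horizon, and temperature here are illustrative assumptions, not the configuration used in [121].

```python
import numpy as np

# Minimal MPPI sketch: sample noisy steering sequences, roll each out
# through a kinematic bicycle model, and softmax-average by cost.
# Model parameters, horizon, and temperature are illustrative assumptions.
DT, WHEELBASE, SPEED = 0.1, 2.5, 10.0   # s, m, m/s (speed fixed for brevity)
HORIZON, N_SAMPLES, TEMPERATURE = 20, 64, 1.0

def rollout_cost(steer_seq, state, target_y):
    """Cost of one steering sequence: squared lateral offset from target_y."""
    x, y, heading = state
    cost = 0.0
    for steer in steer_seq:
        x += SPEED * np.cos(heading) * DT
        y += SPEED * np.sin(heading) * DT
        heading += SPEED / WHEELBASE * np.tan(steer) * DT
        cost += (y - target_y) ** 2
    return cost

def mppi_step(nominal, state, target_y, rng):
    """One MPPI update of the nominal steering sequence."""
    noise = rng.normal(0.0, 0.1, size=(N_SAMPLES, HORIZON))
    costs = np.array([rollout_cost(nominal + eps, state, target_y)
                      for eps in noise])
    weights = np.exp(-(costs - costs.min()) / TEMPERATURE)
    weights /= weights.sum()
    return nominal + weights @ noise        # cost-weighted perturbation average

rng = np.random.default_rng(0)
nominal = np.zeros(HORIZON)
state = (0.0, 0.0, 0.0)                     # x, y, heading; target lane at y = 1
for _ in range(30):                         # refine the plan from a fixed state
    nominal = mppi_step(nominal, state, target_y=1.0, rng=rng)
```

In a receding-horizon deployment, only the first control of the refined sequence is applied before re-planning from the new state.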
Deep reinforcement learning has been used for velocity control to keep high-speed autonomous racing cars within safe limits under uncertain dynamics and to search for minimum-time trajectories and lap times [129,142]. Evans et al. [158] deployed a reinforcement learning controller over vehicle position, velocity, and rewards to achieve optimal trajectories, reduce curvature, and optimize lap time.
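Reward design in such lap-time-oriented RL work typically trades off track progress against curvature and elapsed time. A toy reward with made-up weights (not the specific designs of [129,142,158]) could look like:

```python
def racing_reward(progress, curvature, lap_done, lap_time,
                  w_progress=1.0, w_curvature=0.5, lap_bonus=100.0,
                  time_penalty=0.1):
    """Toy racing reward: progress along the centerline minus a curvature
    penalty, plus a lap-completion bonus discounted by lap time.
    All weights are illustrative assumptions, not taken from the cited work."""
    reward = w_progress * progress - w_curvature * abs(curvature)
    if lap_done:
        reward += lap_bonus - time_penalty * lap_time
    return reward
```

As noted for [158], such shaped rewards must be balanced carefully: an overweighted progress term can induce corner-cutting, while an overweighted curvature penalty can produce overly conservative lines.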
  • Comparative Analysis and Challenges
We now discuss the performance and limitations of control methods in both scenarios and summarize the performance reported in recent studies in Table 4.
The components and comparison of both driving scenarios are included in Table 2. First, we list the most recent studies on the pros and cons of on-road autonomous driving. One reinforcement learning model effectively avoided chain collisions by learning safe and adaptive maneuvers, but it faces challenges in real-time decision-making under extreme conditions [154]. A deep learning-based control algorithm provides improved decision-making, efficient path planning, and adaptive control, allowing vehicles to navigate complex environments with improved precision and responsiveness [139].
Finally, we list the most recent existing studies on the performance and challenges of autonomous racing in control modules.
Salvaji et al. [159] investigated deep reinforcement learning (DRL) control of an autonomous Formula SAE car, training reinforcement learning algorithms in simulation and exploring the challenges of applying RL to autonomous navigation, especially the gap between simulated and real-world performance; two algorithms, DQN and TD3, were benchmarked. A Gaussian process model has been used effectively to predict opponent movements and provide probabilistic estimates that improve decision-making, but it struggles in dense racing fields [132]. Another method designed a tailored reward to promote faster lap times with precise maneuvers, but such rewards can lead to unintended behaviors if not carefully balanced [158]. Residual policy learning has led to faster convergence and significantly refined control in high-speed environments, but faces real-time limitations [130]. A hierarchical control system improves coordination among cooperative teams in competitive autonomous racing by structuring decision-making into layers, optimizing individual vehicle performance and team strategy, and enhancing race efficiency; however, real-time coordination in highly dynamic and competitive environments still needs improvement [124].
  • End-to-End Approaches
End-to-end approaches have emerged as alternatives to traditional modular autonomous driving pipelines. These methods employ deep learning models where sensor inputs, such as images, LiDAR, point clouds, etc., are directly mapped to control actions, potentially simplifying the system architecture and improving its latency. This section presents a structured overview of fully end-to-end and partially end-to-end systems, highlighting their applications in road and racing environments. Figure 6 illustrates the architectural differences between fully and partially end-to-end pipelines.
End-to-end approaches sometimes overlap between on-road and racing autonomous vehicles; however, the methodologies differ owing to the different environments and objectives. A summary of selected approaches across domains is provided in Table 5, comparing architectures, performance metrics, and control strategies. On-road systems increasingly incorporate vision-language models (VLMs) and large language models (LLMs), enabling semantic understanding and long-tail reasoning, which are essential for urban driving, where interpretability and robustness are priorities. However, models such as ShuffleNet V2 and DSUNet, while accurate in static tasks like lane detection, may underperform in dynamic or edge-case scenarios. In contrast, racing-focused models such as extreme learning machines (ELMs) and reinforcement learning-based CNN + LSTM architectures are optimized for speed and responsiveness, excelling in lap time and trajectory adherence, but often at the cost of explainability and safety. These trade-offs highlight an ongoing challenge: aligning architectural simplicity with robustness and transferability. The field may benefit from partially modular approaches that blend interpretability with performance, particularly in sim-to-real transitions and multi-task scenarios.
In the following subsections, we analyze in more detail the end-to-end approaches proposed for both scenarios.
  • Fully End-to-End Systems: On-Road Scenarios
Fully end-to-end systems are predominantly used in structured environments such as urban and highway driving. These systems often use convolutional neural networks (CNNs), recurrent layers such as GRUs [168] or LSTMs [169], and reinforcement learning (RL) techniques [170,171] to learn control policies from raw sensory data. For example, Anzalone et al. [7] proposed an end-to-end reinforcement learning framework using proximal policy optimization (PPO) and curriculum learning to improve driving policies in complex scenarios. Similarly, Agarwal et al. [172] integrated monocular vision with deep reinforcement learning for urban trajectory planning and control, enabling real-time collaboration and collision avoidance. Other approaches include Coopernaut [173], which emphasizes vehicle-to-vehicle perception, and integrated vision-language systems such as SimpleLLM4AD [44], in which driving decisions are reformulated as a series of visual question-and-answer tasks; these systems use graph visual question answering (GVQA) with vision transformers to analyze the scene and plan coherent actions. Imitation learning has also been widely applied, for example, in CNN-based behavioral cloning [174], where end-to-end lateral controllers are trained directly from human demonstrations. The authors of [161] advanced this area by incorporating large language models (LLMs) to mimic human-like reasoning and memory in long-tail driving scenarios.
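Behavioral cloning of this kind reduces to supervised regression from camera frames to steering commands. A minimal sketch on synthetic data, where a linear policy stands in for the CNN of [174] and all dimensions are illustrative assumptions:

```python
import numpy as np

# Minimal behavioral-cloning sketch: regress steering angles from
# (flattened) camera frames recorded during demonstrations. A linear
# policy stands in for a CNN; frames and labels are synthetic.
rng = np.random.default_rng(0)
N_FRAMES, N_PIXELS = 256, 64          # tiny illustrative "images"

true_w = rng.normal(size=N_PIXELS)    # hidden mapping generating the demos
frames = rng.normal(size=(N_FRAMES, N_PIXELS))
steering = frames @ true_w            # expert steering labels

w = np.zeros(N_PIXELS)                # the policy to be cloned
LR = 0.01
for _ in range(2000):                 # full-batch gradient descent on MSE
    pred = frames @ w
    grad = frames.T @ (pred - steering) / N_FRAMES
    w -= LR * grad

mse = float(np.mean((frames @ w - steering) ** 2))
```

The well-known weakness of pure cloning, compounding error once the vehicle drifts away from the demonstrated state distribution, is what motivates the DAgger-style corrections and RL fine-tuning used in later work.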
  • Fully End-to-End Systems: Racing Scenarios
In racing environments, fully end-to-end systems aim to maximize control performance at high speeds with minimal latency. Wadekar et al. [5] proposed a deep neural network (DNN) to estimate steering, throttle, and braking based on input from vision sensors, simplifying modular planning. Reinforcement learning has proven to be particularly effective in competitive environments. Deep latent competition (DLC) [175] models policy learning within a latent visual space to interact with opponents in racing simulations. On the other hand, Conditioning for action policy smoothness (CAPS) [176] applies soft actor-critic algorithms for smooth policy transitions using visual data, while other approaches use imitation learning for high-speed control with formal safety constraints through control barrier functions (CBFs) [167]. The transition from a simulation environment to the real environment is a relevant topic in the racing context. Methods such as DeepRacer [177], which integrates point cloud and stereo vision with reinforcement learning, aim to bridge the performance gap between simulation and physical environments.
  • Partially End-to-End Approaches and Hybrid Methods
Partial end-to-end systems offer a balance between the adaptability of learning-based models and the structure of modular pipelines. These methods typically learn specific segments of the pipeline, such as perception-to-planning or planning-to-control, while retaining rule-based or classical approaches for other components. DeepSTEP [178] introduced a novel end-to-end perception architecture that comprises a self-attention mechanism to leverage temporal information, improving overall efficiency and performance by integrating detection and localization into a single unified pipeline. Hu et al. [164] addressed the challenge of autonomous vehicles merging onto highways from on-ramps and proposed a novel approach that fuses RL with optimization-based methods, leveraging the strengths of both techniques, and achieving a balance between smoothness, computational efficiency, explainability, and robustness.
EMMA [163] is a new end-to-end multimodal model for autonomous driving developed by Waymo. EMMA leverages a multimodal large language model foundation to directly transform raw camera sensor data into driving outputs like trajectories and object detection. The model represents inputs and outputs as natural language, enabling it to perform various driving tasks within a unified language space using task-specific prompts.
SuperDriverAI [179] is an end-to-end autonomous driving system that improves robustness and interpretability by using deep neural networks, processing image data to control steering, throttle, and braking.
DriveVLM and DriveVLM-Dual [180] are novel autonomous driving approaches that harness the power of VLMs for scene understanding and planning. The core innovation, DriveVLM, employs a unique reasoning pipeline of scene description, scene analysis, and hierarchical planning. Recognizing VLMs' limitations, DriveVLM-Dual emerged as a hybrid that integrates VLM capabilities with traditional autonomous driving pipelines for better spatial awareness and computational efficiency. VLM-AD [181] introduced an approach that uses VLMs to supervise and improve AD training: by distilling VLM driving knowledge into end-to-end AD pipelines through high-quality datasets of behavioral text annotations, AD models learn richer feature representations, leading to improved planning accuracy and reduced collision rates on the nuScenes dataset.
Lee and Liu [162] introduced the depthwise separable convolution UNet (DSUNet) to detect road lanes and predict the path, combining CNN and UNet components; UNet is a convolutional encoder–decoder network for semantic segmentation.
Three-dimensional (3D) LiDAR point clouds and 2D camera images have been used together for trajectory planning, with the point clouds converted into grayscale images using a gradient transformation; both image modalities, brought to similar scales, feed an attention-based autoregressive GRU to predict trajectories [182]. A conditional DQN with fuzzy logic has been used as an end-to-end approach to predict directional trajectory plans while supporting global route planning; it also issues planning commands such as steering angles and throttle [183]. TransFuser [184] is an end-to-end imitation learning model for multi-sensor fusion, with inputs converted to a common dimension, designed to perform in highly dynamic and complicated environments such as multi-directional traffic congestion.
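The point-cloud-to-image conversion used in such pipelines can be illustrated by a simple bird's-eye-view rasterization. The grid size, range, and max-height encoding below are illustrative assumptions, not the gradient transformation described in [182]:

```python
import numpy as np

# Illustrative bird's-eye-view rasterization: project LiDAR points onto
# a 2D grid and store each cell's maximum height as a grayscale value.
# Grid resolution and spatial range are assumptions for the sketch.
GRID, X_RANGE, Y_RANGE = 64, (0.0, 32.0), (-16.0, 16.0)

def points_to_bev(points):
    """points: (N, 3) array of x, y, z LiDAR returns -> (GRID, GRID) image."""
    image = np.zeros((GRID, GRID), dtype=np.float32)
    xs, ys, zs = points[:, 0], points[:, 1], points[:, 2]
    keep = ((xs >= X_RANGE[0]) & (xs < X_RANGE[1]) &
            (ys >= Y_RANGE[0]) & (ys < Y_RANGE[1]))
    xs, ys, zs = xs[keep], ys[keep], zs[keep]
    col = ((xs - X_RANGE[0]) / (X_RANGE[1] - X_RANGE[0]) * GRID).astype(int)
    row = ((ys - Y_RANGE[0]) / (Y_RANGE[1] - Y_RANGE[0]) * GRID).astype(int)
    for r, c, z in zip(row, col, zs):        # max-height pooling per cell
        image[r, c] = max(image[r, c], z)
    return image
```

Once rasterized, such pseudo-images can be concatenated with camera frames and processed by standard 2D architectures, which is what makes the fusion with camera branches tractable.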
Kalaria et al. [165] proposed a novel framework that uses an extreme learning machine (ELM) to learn and adapt the vehicle tire model online, allowing vehicles to operate at their handling limits and achieve optimal lap times. DeepRacing [185] was introduced to capture trajectory predictions from a vision-based perspective; it is a partially end-to-end approach spanning sensing to planning. Mammadov [166] focused on an end-to-end approach using raw LiDAR data and velocity information with RL; the RL agent navigates without prior map information, in contrast to traditional methods that rely on precise mapping and path planning. Gradient-free optimizers have been used across various domains of high-speed autonomous racing, such as planning and control, and have been extensively applied to trajectory and behavior planning [186]. Soft actor-critic-based optimization of trajectories and overtaking maneuvers from vision-based input is also a partially end-to-end method [187].
  • Comparative Analysis and Challenges
End-to-end and hybrid systems present contrasting trade-offs. Fully end-to-end approaches offer optimized integration and potential performance gains, especially in structured environments; however, they often lack interpretability and require extensive training data. In contrast, partially end-to-end and hybrid models provide greater modular clarity while selectively leveraging learning-based adaptability, making them particularly well-suited for scenarios that require explainability, certification, or domain adaptation. Despite significant progress, challenges remain: fully end-to-end systems often struggle with rare driving events, while hybrid methods can suffer from interface incompatibilities between learned and rule-based components. Continued research on simulation-to-real-world transfer, uncertainty estimation, and hierarchical control frameworks is therefore needed to address these limitations and build robust, generalizable autonomous systems.

6.2. What Are the Safety and Robustness Machine Learning/Deep Learning Techniques Used in Autonomous Driving on the Road and in Racing Scenarios?

Safety is the most critical issue in autonomous driving. If researchers do not adequately address safety and comfort when dealing with the limited handling capabilities of vehicles, the technology can become dangerous. Safety must therefore be considered when presenting novel studies, especially those based on deep learning techniques: because such models learn from historical data, they are subject to constraints including inadequate interpretability, difficult verification, and overfitting or underfitting. We therefore review some state-of-the-art studies that have focused on safety.
  • Safety in Road Scenarios
In this section, we focus on the safety of autonomous vehicles on the road by examining several factors, such as interactions with other vehicles, route and lane choice, signals, turning points, pedestrians, and vehicle speed, together with other technical challenges. Taking safety constraints into account, SAE developed a standard of safety levels that was adopted by the National Highway Traffic Safety Administration (NHTSA) of the U.S. Department of Transportation.
We therefore present some state-of-the-art approaches to safety verification that incorporate deep learning. Safety improvements in autonomous driving have been driven by fundamental advances in perception and control algorithms supported by deep learning techniques.
Abrecht et al. [188] addressed the critical safety concerns associated with the use of deep learning, developing a structured approach to "safety concerns" that systematically analyzes and mitigates potential risks arising from the unique characteristics of deep neural networks. Wang et al. [189] proposed using LLMs for intelligent decision-making in behavior planning, supplemented by a safety verifier; two case studies demonstrated that LLMs can improve driving performance and safety within simulated environments. Chen et al. [190] proposed a novel hierarchical DRL framework that enables autonomous vehicles to avoid collisions, particularly with pedestrians. The system combines a traditional PID controller for path-following with a DRL-based collision avoidance agent activated upon vulnerable road user (VRU) detection.
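The PID path-following layer in such hierarchical designs is a classical component. A minimal lateral PID on cross-track error, with illustrative gains and saturation limit (assumptions, not the tuning used in [190]), might look like:

```python
class LateralPID:
    """Minimal PID controller on cross-track error, of the kind used as
    the path-following layer in hierarchical designs such as [190].
    Gains, time step, and the saturation limit are illustrative."""
    def __init__(self, kp=0.8, ki=0.05, kd=0.3, dt=0.05, max_steer=0.5):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.max_steer = max_steer
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, cross_track_error):
        """One control cycle: returns a saturated steering command [rad]."""
        self.integral += cross_track_error * self.dt
        derivative = (cross_track_error - self.prev_error) / self.dt
        self.prev_error = cross_track_error
        steer = (self.kp * cross_track_error
                 + self.ki * self.integral
                 + self.kd * derivative)
        return max(-self.max_steer, min(self.max_steer, steer))
```

In the hierarchical scheme, this controller tracks the planned path in nominal conditions, and the DRL agent overrides it only when a VRU is detected.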
Safety solutions for autonomous and manually driven vehicles have been enabled in 5G intelligent transportation systems using LSTM together with probability-matrix decision-making layers [191]. For multi-source, multi-outcome vehicle scenarios, geometry-based coordinate limitations are estimated by formulating a sparse block recovery problem with SBLNet to predict the direction of arrival and transmission constraints, ensuring the safety of autonomous vehicles [192].
Deep neural networks have been used to compute the safety level of autonomous vehicles, analyze instances from the on-board unit, and identify breakdowns originating from the on-board unit [193].
Zhu et al. [194] proposed a CNN-based vehicle-pedestrian detection method that extracts large-scale features to model vehicle–pedestrian interactions; SqueezeNet has been used to extract additional traffic features, promoting safe interaction. Xing et al. [195] investigated safety within connected automated vehicles (CAVs) using deep RNNs and LSTMs to predict trajectories, maintain longitudinal and lateral behavior, and characterize energy consumption.
  • Safety Concerns in Racing Scenarios
We highlight some state-of-the-art safety measures in high-speed vehicles under challenges, incorporating deep learning approaches.
Chen et al. [196] combined camera vision and vehicle speed inputs with reinforcement learning to identify unsafe behaviors, using the Hamilton–Jacobi (HJ) approach to maintain safety within constrained MDPs.
End-to-end deep reinforcement learning (RL) and imitation learning methods have been combined in what is known as the deep imitative reinforcement learning (DIRL) approach, which learns control policies from visual data reliably and accurately, with safety validated in both simulation and real-world environments [197].
For autonomous racing vehicles, high-speed control and safety with respect to changing track boundaries have been ensured by a predictive safety filter supported by imitation learning, which learns from human expertise and handles highly dynamic non-linear systems [198].
  • Safety in Adversarial and Edge-Case Scenarios
While significant attention has been given to perception and planning modules, safety remains a critical yet underexplored component of autonomous driving systems, particularly with respect to robustness under adverse conditions and rare driving events. Vision-based systems, especially those based on CNNs, have demonstrated susceptibility to adversarial attacks, in which imperceptible changes to the input can cause drastic changes in the output. Recent studies have proposed input sanitization, adversarial training, and model distillation as mitigation strategies.
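The adversarial vulnerability described here can be illustrated with the fast gradient sign method (FGSM) on a toy logistic-regression "perception" model; the linear model, data, and epsilon are assumptions for the sketch, since the cited systems use CNNs:

```python
import numpy as np

# FGSM sketch on a toy logistic-regression "perception" model.
# The linear model, data, and epsilon are illustrative assumptions;
# real attacks in the literature target CNN-based perception stacks.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, label, epsilon=0.1):
    """One FGSM step: nudge x by epsilon along the sign of the loss
    gradient. For logistic loss, dL/dx = (sigmoid(w @ x + b) - label) * w."""
    grad_x = (sigmoid(w @ x + b) - label) * w
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0       # a fixed "trained" classifier
x, label = rng.normal(size=16), 1.0   # an input of the positive class
x_adv = fgsm_perturb(x, w, b, label)  # small, sign-based perturbation
```

Adversarial training then amounts to adding such perturbed examples, with their correct labels, back into the training set so the model's decision boundary becomes locally flatter around the data.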
In addition, handling edge cases is a concern that must be addressed urgently. Autonomous vehicles must be prepared to respond to scenarios that, although rare, pose a high risk, such as pedestrians crossing the street or sudden lane closures. To address this, systems increasingly incorporate out-of-distribution detection, uncertainty estimation, and scenario-based testing. LLM-enhanced architectures, such as DriveLLM [43] and SimpleLLM4AD [44], offer promising directions by introducing contextual reasoning and generalization capabilities for long-tail events.
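Uncertainty estimation via ensemble disagreement is one common out-of-distribution signal used in such systems; a minimal sketch with synthetic data and bootstrap-fitted polynomial regressors (an illustrative stand-in for a deep ensemble) is:

```python
import numpy as np

# Ensemble-disagreement sketch for out-of-distribution detection:
# several regressors trained on in-distribution data disagree more on
# inputs far from the training range. Data and models are synthetic.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(200, 1))
y_train = np.sin(3.0 * x_train[:, 0]) + 0.05 * rng.normal(size=200)

def fit_member(seed):
    """Fit a cubic on a bootstrap resample -- one ensemble member."""
    idx = np.random.default_rng(seed).integers(0, len(x_train), len(x_train))
    return np.polyfit(x_train[idx, 0], y_train[idx], deg=3)

ensemble = [fit_member(s) for s in range(10)]

def uncertainty(x):
    """Std. dev. of ensemble predictions -- usable as an OOD score."""
    preds = np.array([np.polyval(coeffs, x) for coeffs in ensemble])
    return float(preds.std())
```

Inputs inside the training range yield low disagreement, while far-out inputs produce large predictive spread, which a planner can use to trigger a fallback behavior.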
Simulation environments and stress test datasets are also being developed to assess safety under unpredictable conditions. However, industry-applicable standards are not yet available. Bridging the gap between model performance and certifiable safety is essential for the real-world deployment of autonomous driving systems.

6.3. What Are the Existing Datasets Used for Machine Learning/Deep Learning Techniques in Autonomous Driving?

Reflecting our focus on autonomous vehicles, we surveyed the existing datasets used with machine learning/deep learning techniques. These datasets come from simulation and real-world environments and are categorized as full-simulator, semi-simulator, semi-real, and real-world. Since deep learning learns objectives and environments from historical data, we investigate existing dataset benchmarks; these datasets have been explored in previous work, and some were developed with specific objectives in mind.
This section summarizes the datasets for autonomous driving for racing and on-road applications separately in Table 6 and Table 7, respectively.
  • Real-Time and Simulated On-Road Vehicles
Several real-time simulators and simulations have been developed by industry to ensure that vehicles perform well and dynamically in the real world. These vehicles operate in diverse environments, handling varying weather conditions, day and night, and urban, rural, and highway settings. Because evaluating novel concepts on real cars can be very expensive, researchers often use simulators as test beds for their studies.
Consequently, we describe some real-time autonomous vehicles and simulation platforms from hardware and software perspectives. Several sensory input sources are used to capture the environment for further processing, including cameras, LiDAR, radar, GPS, IMU, and other sensor fusion setups, while GPUs, especially NVIDIA platforms, are used for computing. Table 6 summarizes and compares the real-time and simulator-based datasets.
First, real-world datasets are widely used as benchmarks, such as KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), Oxford RobotCar, nuScenes, Drive360, Udacity, DAVIS, Cityscapes, HDD (Honda Research Institute Driving Dataset), BDDV (Berkeley DeepDrive Video), Comma.ai, PandaSet, and so on.
Secondly, several simulators are used as dataset sources, including CARLA, Udacity, CarSim, Comma.ai, a 9-DoF driving simulator, GTA-V, rFpro, NGSIM (Next Generation Simulation), etc.
  • Real-Time and Simulated Racing Cars
In 2017, Roborace, a UK-based company, launched an autonomous racing series. It developed race cars—DevBot (versions 1.0 and 2.0) and Beta Serious—with a maximum speed of 180 km/h. The NVIDIA PX2 and a Speedgoat Mobile Target Machine control the race car engines, using several input sources to gather and process environmental information; the sensory devices include front and rear cameras, LiDARs, radars, and GPS/IMU, integrated into a fully electric, rear-wheel-drive Le Mans Prototype (LMP) chassis. The series has featured several competitions, including the ALPHA (2018–2019) and BETA (2020–2021) seasons, with participation from various universities and researchers. Simulations targeting Roborace have also been developed, including the Roborace simulator and the Learn-to-Race (L2R) simulator (which handles RL problems). Researchers and companies have also investigated modular-system components for use in Roborace environments.
Currently, the most prominent competition for full-scale autonomous racing cars is the Indy Autonomous Challenge (IAC), launched in 2021. Inspired by the DARPA challenges, it features competitions at speeds of up to 290 km/h [199]. The cars use a central computing platform to control the engine and are built on a rear-wheel-drive Indy Lights chassis. In 2021 and 2022, the Indianapolis Motor Speedway and Las Vegas Motor Speedway hosted competitions against a stationary car and an opponent vehicle, respectively [11]; several university student teams participated in the rounds.
Simulations have also been developed to give researchers easy access to the IAC environment, such as the Ansys and LGSVL simulators, and recent studies have used simulation to validate their findings. Other simulators for autonomous racing cars are likewise used as testbeds, including TORCS, the CARLA simulator, Gran Turismo Sport, and a hardware-in-the-loop (HIL) simulator. These datasets belong to autonomous racing platforms that use modular-system components, listed and summarized in Table 7. According to our investigation, most existing papers annotated their datasets on real or simulated tracks during their experiments.

6.4. What Performance Evaluation Metrics Are Used to Evaluate the Modular System in Autonomous Driving on the Road and in Racing Scenarios?

We now describe the evaluation metrics used in state-of-the-art studies to assess the modular system in both scenarios, including mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), accuracy, F-measure, precision, recall, area under the curve (AUC), mean Jaccard index, and intersection over union (IoU).
Planning has occasionally been evaluated using MAE, MSE, RMSE, lap time, trajectory fit ratio (TFR), mean time between boundary failures (TBFs), distance between boundary failures (DBFs), and tracking errors. The lap time is the duration a car takes to complete a circuit from start to finish, represented as follows:
T_o = T_a + T_c + T_s + T_b
where T_o represents the optimal lap time during racing, T_a is the time accelerating from the start line to the first corner, T_c is the time taken to navigate the corners around the track, T_s is the time spent on the straight portions of the racetrack where maximum speed is achieved, and T_b is the time spent braking before entering corners.
TFR has been used to evaluate the precision of compared trajectory prediction models; it measures the degree of fit between the predicted trajectory and the real trajectory [87] and is mathematically represented as follows:
TFR = (1 − (|ae_1| + |ae_2| + … + |ae_n|) / L) × 100%
where ae_i is the difference between the i-th output coordinate and its actual value, and L denotes the average length of the trajectories.
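Computing TFR from its definition is straightforward; a minimal sketch, keeping the average trajectory length L as an explicit parameter since the source does not specify how it is computed:

```python
def trajectory_fit_ratio(predicted, actual, avg_length):
    """Trajectory fit ratio (TFR) per [87]: 1 minus the summed absolute
    coordinate errors over the average trajectory length, as a percentage.
    `predicted` and `actual` are sequences of scalar coordinates."""
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return (1.0 - total_error / avg_length) * 100.0
```

A perfect prediction yields 100%, and larger coordinate errors relative to the trajectory length drive the ratio down.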
Control performance has been examined using several metrics, including MAE, MSE, RMSE, lateral and longitudinal error, heading error, average displacement error (ADE), final displacement error (FDE), learned model velocity optimization (LMVO), and LMVO plus failure prediction and intervention module (FIM). ADE measures the average deviation from the actual positions of an autonomous vehicle over a specific period, represented as follows:
ADE = (1/n) Σ_{i=1}^{n} d_i
where n denotes the total number of positions and d_i is the displacement error at the i-th position, computed as the Euclidean distance between the actual and desired positions. FDE measures the error between the final position of a completed trajectory and its corresponding desired position, also calculated using the Euclidean distance [102].
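ADE and FDE follow directly from their definitions; a minimal sketch using Euclidean distances over 2D positions:

```python
import math

def ade(predicted, actual):
    """Average displacement error: mean Euclidean distance between
    predicted and actual positions over the whole trajectory."""
    dists = [math.dist(p, a) for p, a in zip(predicted, actual)]
    return sum(dists) / len(dists)

def fde(predicted, actual):
    """Final displacement error: Euclidean distance at the last position."""
    return math.dist(predicted[-1], actual[-1])
```

ADE summarizes tracking quality along the whole path, while FDE isolates how far the endpoint misses the goal, which is why the two are usually reported together.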

7. Limitations of the Study

The systematic review presented in this article comprehensively summarizes the proposed approaches for modular autonomous driving systems. It primarily relies on research articles collected via a search strategy prioritizing databases such as SCOPUS, ScienceDirect, IEEE Xplore, Web of Science, ACM, MDPI, and Taylor & Francis. Consequently, related articles may have been excluded because they were not indexed in these databases. Furthermore, the search was restricted to publications from the last five years (2020–2025) and to publications written in English, which also limited the number of articles analyzed. Another limitation is that screening was mainly manual, without automation or AI tools, which may have introduced human error.

8. Real-Time Challenges and Future Directions

This section evaluates the performance, real-time challenges, and future directions of existing modular autonomous driving systems in on-road and racing scenarios.

8.1. Real-Time Challenges for On-Road and Racing Scenarios

In this section, we present the real-time challenges in autonomous driving and provide an overview of the issues faced in both environments.

8.1.1. On-Road Autonomous Vehicles

Early studies focused on pose detection and classification faults when detecting tiny objects and handling adverse weather conditions in real time [62]. Adapting to unpredictable changes in the environment (e.g., sudden obstacles, traffic changes) is a real-time problem [14], as is making instant, reliable driving decisions while ensuring safety and compliance [140,149]. Vehicle-to-vehicle and vehicle-to-everything communication—coordinating with other vehicles and with infrastructure to improve situational awareness—poses further challenges [140,152], as does handling unexpected situations and sensor failures [151]. In addition, efficiently calculating the safest and most efficient routes in real time and managing computational resources remain significant challenges [143,151].

8.1.2. Autonomous Racing Cars

First, maintaining safety and control while navigating at high speed poses real-time challenges for racing cars [117]. Deciding between following the dynamic racing line and overtaking opponents in real time is also difficult [132,133]. Rapid sensor data processing for immediate adjustments is a real-time problem [185], and calculating and adjusting trajectories to obtain optimal racing lines in real time is challenging [132,133]. Another critical aspect is ensuring safety by avoiding collisions with opponents and adhering to track boundaries at high speed [143,158]. Lastly, it is essential to manage energy consumption and computational resources during races to sustain performance [117,143].

8.2. Research Directions for On-Road and Racing Scenarios

This section outlines future research directions in deep learning for autonomous driving, focusing on both on-road and racing applications. We highlight the challenges of integrating deep learning techniques across the components of the modular system, emphasizing the need for innovation in perception, planning, and control.

8.2.1. Perception

Recent advancements in perception for autonomous driving open up promising research directions. The IS-YOLOv5+VNP model [54] integrates YOLOv5 with versatile network pruning (VNP) to improve the detection of lightweight objects. Future work could extend this approach for real-time applications, especially for detecting smaller objects such as traffic signs and signals. In addition, CycleGAN combined with object detection methods [79] has shown promise in improving performance under adverse weather conditions. Training models with augmented datasets that simulate such conditions could significantly enhance the reliability of detection in challenging environments.
Additionally, GCNet [72] and two-stage object detection models [84] could be further developed by integrating sensor fusion techniques (e.g., camera and LiDAR data). Extending these methods to incorporate temporal data frames may also improve the detection accuracy, particularly for moving objects.

8.2.2. Planning

Planning is critical for autonomous driving, with trajectory prediction and lane detection forming the core of decision-making processes. TC-Driver [121] uses RL for trajectory planning, and combining RL with recurrent neural networks while optimizing regularization through Bernoulli policies could improve real-time performance.
ANN-based trajectory optimization [126] is another area for exploration, particularly for non-stationary autonomous racing, where detecting opponents and optimizing overtaking maneuvers could greatly improve performance. TOAST [127] employs model predictive control (MPC) for optimal trajectory tracking, which could benefit from integrating RL to learn optimal policies through adaptive environmental interactions.
In addition, MixNet [120] introduces interaction-based motion planning, which could be expanded by incorporating graph neural networks (GNNs) to handle complex interactions with surrounding objects. The deep encoder–decoder model [87] could be improved by handling lateral and longitudinal boundaries, and transforming path values into graph representations could facilitate the use of GNNs. AI-TP [100] offers an attention-based approach to multimodal trajectory prediction, which could be refined to handle more complex interactions.
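The MPC-style trajectory tracking mentioned above can be sketched as a receding-horizon search over a discrete set of candidate accelerations. This toy 1D double-integrator example is an illustrative assumption, not the TOAST formulation:

```python
# Hedged sketch: receding-horizon (MPC-style) control on a 1D double integrator.
import itertools

def rollout(x, v, accels, dt=0.1):
    """Simulate positions reached under a candidate acceleration sequence."""
    traj = []
    for a in accels:
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj

def mpc_step(x, v, ref, horizon=3, dt=0.1, candidates=(-2.0, 0.0, 2.0)):
    """Return the first acceleration of the lowest-cost candidate sequence."""
    best_cost, best_a0 = float("inf"), 0.0
    for seq in itertools.product(candidates, repeat=horizon):
        traj = rollout(x, v, seq, dt)
        # Cost: squared tracking error over the horizon plus a small effort term.
        cost = sum((p - ref) ** 2 for p in traj) + 0.01 * sum(a * a for a in seq)
        if cost < best_cost:
            best_cost, best_a0 = cost, seq[0]
    return best_a0

# Closed-loop usage: only the first action is applied, then the plan is redone.
x, v = 0.0, 0.0
for _ in range(50):
    a = mpc_step(x, v, ref=5.0)
    v += a * 0.1
    x += v * 0.1
```

Integrating RL here, as suggested for TOAST, would amount to replacing the brute-force search with a learned policy that proposes the action directly.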

8.2.3. Control

In control systems, deep learning continues to show potential in improving vehicle handling. More advanced deep learning models could improve the efficiency and performance of DRL-based lane changes during racing [152].
As demonstrated by RALC [154], adaptive learning-based tracking control could be further developed using a hardware-in-the-loop (HIL) system for real-time testing. Improving training data quality by focusing on realistic driving patterns could enhance model performance and generalization.
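Adaptive learning-based tracking control of the kind RALC exemplifies can be illustrated with a toy 1D plant whose unknown disturbance is estimated online and cancelled by the feedback law. The gains, dynamics, and adaptive law below are illustrative assumptions, not the published controller:

```python
# Hedged sketch: proportional feedback plus an online disturbance estimate.
def simulate(steps=400, dt=0.02, ref=10.0, d_true=-3.0, kp=2.0, gamma=1.5):
    """1D velocity tracking: v' = u + d_true, with d_true unknown to the
    controller. The estimate d_hat is adapted from the tracking error."""
    v, d_hat = 0.0, 0.0
    for _ in range(steps):
        e = ref - v
        u = kp * e - d_hat          # feedback plus learned disturbance rejection
        v += (u + d_true) * dt      # plant with unknown constant disturbance
        d_hat += -gamma * e * dt    # adaptive law: d_hat drifts until e -> 0
    return v, d_hat
```

At equilibrium the estimate converges to the true disturbance (here, d_hat ≈ −3), so the tracking error vanishes despite the model mismatch; an HIL setup would replace `simulate` with the physical plant.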

8.2.4. End-to-End Approach

End-to-end learning, which integrates perception, planning, and control, is a growing area in autonomous driving research. Hierarchy-based control [124] uses MARL for high-level planning and low-level control. Future research could focus on integrating both levels into a unified framework using multi-agent approaches.
End-to-end trajectory planning using sensor fusion data in the GRU model [182] offers another avenue to improve generalization. Similarly, DQN with fuzzy logic [183] for motion planning could be extended to include obstacle detection and traffic signal interaction, offering a more comprehensive solution for real-world driving challenges. Aoki et al. [179] addressed unexpected road scenarios and cooperation with human drivers through imitation learning; future work could extend this approach to more practical situations and to safe interaction between autonomous and human-driven vehicles.
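The value-learning idea behind DQN-style motion planning can be sketched, in miniature, as tabular Q-learning on a toy lane-keeping task. The deep network is replaced by a lookup table, and the states, actions, and rewards are illustrative assumptions:

```python
# Hedged sketch: tabular Q-learning for a 5-cell lane-keeping toy task.
import random

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    n_pos, center = 5, 2
    q = [[0.0] * 3 for _ in range(n_pos)]   # actions: 0=left, 1=stay, 2=right
    for _ in range(episodes):
        s = rng.randrange(n_pos)
        for _ in range(20):
            # Epsilon-greedy exploration over the three steering actions.
            if rng.random() < eps:
                a = rng.randrange(3)
            else:
                a = max(range(3), key=lambda k: q[s][k])
            s2 = min(n_pos - 1, max(0, s + a - 1))
            r = -abs(s2 - center)            # reward: stay near the lane centre
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy: steer toward the centre cell from any lateral offset.
policy = [max(range(3), key=lambda k: q[s][k]) for s in range(5)]
```

A DQN replaces the table with a neural network over continuous sensor inputs, and the fuzzy-logic extension of [183] would shape the action selection or reward rather than this bare error signal.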

8.3. Certainty of Evidence

In assessing the overall certainty of the evidence on the topics studied, we judged the findings related to the perception and planning modules to carry moderate certainty, given the consistency of the methodologies and the reproducibility of the metrics. In contrast, we classified the evidence on safety and robustness as low certainty, owing to the smaller number of studies and their methodological variability.

9. Integration of State-of-the-Art Modules: A Case Study

To more clearly demonstrate the state-of-the-art methods of different modules and their combinations for creating autonomous driving systems, we built a conceptual modular autonomous driving system based on the main methods identified and analyzed in this study.
  • Perception module: The perception stage uses PointPillars [15], a 3D object detection framework capable of efficiently processing LiDAR point clouds for real-time object localization. Its voxelization of point cloud data allows low-latency obstacle detection, critical for downstream trajectory planning. For visual perception, YOLOv5 [94] provides robust, lightweight object detection from RGB cameras, adding redundancy to multisensory fusion pipelines.
  • Planning module: For the planning subsystem, we propose the use of SCOUT [96], a spatiotemporal graph-based model capable of predicting trajectories with interaction awareness. SCOUT predicts the future movements of surrounding agents and proposes safe and dynamically feasible paths. Its graph-based structure allows explicit modeling of interactions between multiple agents, making it suitable for complex urban environments.
  • Control module: The control layer incorporates robust adaptive learning control (RALC) [134], which ensures stability in the face of modeling inaccuracies and external disturbances. By integrating learning-based disturbance estimation with classical feedback control, RALC is suitable for dynamically uncertain conditions.
  • Deployment environment: To test the system, we propose the use of the CARLA simulator [45], which provides several urban, suburban, and highway scenarios. The possibility of simulating sensors (LiDAR, cameras, GPS, IMU) enables realistic sensor fusion, as well as scenario-based system evaluation.
While each module individually demonstrates state-of-the-art performance, integrating them presents challenges. When perception degrades, as under adverse weather conditions or occlusions, planning accuracy can decrease significantly. Using a graph-based method for the planning module may introduce delays in real-time execution, given the computational overhead such methods entail. Controllers such as RALC, although robust, depend on accurate and frequent trajectory updates, which can be compromised by network communication and computational issues. Thus, future research could focus on tightly coupled learning frameworks that jointly optimize perception, planning, and control, as well as on quantifying uncertainty at each stage to mitigate cascading failures.
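The module boundaries of the conceptual system above can be sketched as follows. The class and method names are illustrative placeholders, not the APIs of PointPillars, SCOUT, or RALC, and the planner shows one simple way to keep low-confidence perception outputs from cascading downstream:

```python
# Hedged sketch of modular interfaces; all names and numbers are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    position: Tuple[float, float]
    confidence: float               # propagated so later stages can react to it

class Perception:
    def detect(self, frame) -> List[Detection]:
        # Stand-in for a LiDAR/camera detector; here `frame` is already a
        # list of (x, y, confidence) tuples.
        return [Detection((x, y), c) for x, y, c in frame]

class Planner:
    def plan(self, detections: List[Detection], goal: Tuple[float, float]):
        # Degrade gracefully: drop low-confidence obstacles so perception
        # noise does not cascade into the trajectory.
        obstacles = [d for d in detections if d.confidence >= 0.5]
        # Trivial straight-line plan of 5 waypoints toward the goal.
        waypoints = [(goal[0] * t / 4, goal[1] * t / 4) for t in range(5)]
        return waypoints, obstacles

class Controller:
    def command(self, waypoints):
        # Steer toward the next waypoint (proportional heading command).
        x, y = waypoints[1]
        return {"throttle": 0.5, "steer": y / (abs(x) + 1e-6)}

frame = [(3.0, 1.0, 0.9), (2.0, -1.0, 0.2)]   # one confident, one noisy detection
dets = Perception().detect(frame)
waypoints, obstacles = Planner().plan(dets, goal=(10.0, 0.0))
cmd = Controller().command(waypoints)
```

Confidence is passed explicitly across module boundaries here; a tightly coupled framework of the kind proposed above would instead optimize all three stages jointly.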
The modular structure of autonomous systems varies significantly between the road and racing domains, as illustrated in Figure 7. While road-oriented architectures prioritize the perception of traffic elements and rule-compliant planning, with safety as the primary objective, racing systems emphasize dynamic track perception and aggressive trajectory optimization, targeting performance metrics such as lap time. These differences fundamentally shape the design, evaluation, and integration challenges faced by each application domain.

10. Conclusions

This survey paper offers valuable insights by comprehensively analyzing deep learning techniques in both on-road and autonomous racing cars, highlighting their distinct challenges and requirements. On-road vehicles emphasize safety and adaptability in complex, dynamic environments, while racing scenarios demand precision, high-speed control, and real-time decision-making.
In perception, deep learning has significantly advanced object detection, semantic segmentation, localization, and sensor fusion, improving navigation in complex environments. State-of-the-art CNN and VLM models are frequently used, but challenges remain, particularly in dealing with adverse weather and unpredictable conditions.
Deep learning techniques such as trajectory prediction, generation, optimization, and decision-making have been proven to be effective in planning. Notably, RNNs, generative models, LLMs, VLMs, and reinforcement learning have made crucial contributions. However, real-time decision-making in dense traffic and high-speed racing remains a significant hurdle.
Deep learning has greatly benefited control systems, including steering, throttle, stability, traction, and safety. LLMs, reinforcement learning, and model predictive control have shown substantial promise in both on-road and racing applications. However, the complexity of control in dynamic environments demands further refinement.
End-to-end approaches, which integrate perception, planning, and control into unified deep learning models, have laid the groundwork for fully autonomous systems. However, these approaches often struggle with complex edge cases and maintaining robust safety in real-time scenarios.
This paper underscores critical challenges for future research, including improving the robustness of models in diverse environmental conditions, improving real-time computational efficiency, and strengthening sensor fusion techniques. Addressing these challenges will be pivotal in ensuring the safety and performance of autonomous systems.
Ultimately, this survey encourages ongoing collaboration between academia and industry to advance autonomous driving technologies. By extending the capabilities of deep learning, we can foster new innovations in both on-road vehicles and autonomous racing, enhancing performance and transforming the future of mobility.

Author Contributions

Conceptualization: K.H. and S.J.; methodology: K.H., J.P., S.J. and J.J.; investigation: K.H.; writing—original draft preparation: K.H.; writing—review and editing: K.H., C.M., J.P., S.J. and J.J.; supervision: J.P., S.J. and J.J.; funding acquisition: S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by BREUCA—“Development of a high-precision Virtual Reality simulator designed to be used in professional simulation and gaming environments associated with real events”, with code POCI-01-0247-FEDER-048257, co-financed by Portugal 2020 and supported under the auspices of the UNESCO Chair on AI & VR and through Fundação para a Ciência e Tecnologia with references 2022.09212.PTDC and DOI:10.54499/UIDB/50021/2020, DOI:10.54499/DL57/2016/CP1368/CT0002.

Data Availability Statement

The list of included studies, data extraction tables, and related material used for the synthesis is available from the corresponding author upon reasonable request.

Acknowledgments

We thank the reviewers for their helpful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AD: Autonomous Driving
ADAS: Advanced Driver-Assistance Systems
ADE: Average Displacement Error
AI: Artificial Intelligence
ANN: Artificial Neural Network
APF: Artificial Potential Field
AUC: Area Under the Curve
BAT: Behavior-Aware Trajectory prediction model
BRPPO: Balanced Reward-inspired Proximal Policy Optimization
CLIP: Contrastive Language–Image Pre-training
CNN: Convolutional Neural Network
DARPA: Defense Advanced Research Projects Agency
DBF: Distance Between Boundary Failures
DDPG: Deep Deterministic Policy Gradient
DIRL: Deep Imitative Reinforcement Learning
DLC: Deep Latent Competition
DMP: Dynamic Movement Primitives
DRL: Deep Reinforcement Learning
FCOS: Fully Convolutional One-Stage Object Detection
FDE: Final Displacement Error
FIM: Failure Prediction and Intervention Module
GAN: Generative Adversarial Network
GCN: Graph Convolution Network
GNN: Graph Neural Network
GPS: Global Positioning System
GVQA: Graph Visual Question Answering
HIL: Hardware-in-the-Loop
HJ: Hamilton–Jacobi
HRL: Hierarchical Reinforcement Learning
IOU: Intersection Over Union
IRL: Inverse Reinforcement Learning
L2R: Learn-to-Race
LiDAR: Light Detection and Ranging
LLM: Large Language Model
LMPC: Learning Model Predictive Control
LMVO: Learned Model Velocity Optimization
LPV: Linear Parameter Varying
LQNG: Linear-Quadratic Nash Game
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
MAPF: Modified Artificial Potential Field
MDP: Markov Decision Process
mIoU: Mean Intersection Over Union
MLLM: Multimodal Large Language Model
MPC: Model Predictive Control
MPPI: Model Predictive Path Integral
MSE: Mean Squared Error
NHTSA: National Highway Traffic Safety Administration
RALC: Robust Adaptive Learning Control
R-CNN: Region-based Convolutional Neural Network
RL: Reinforcement Learning
RMSE: Root Mean Square Error
RPL: Residual Policy Learning
SAE: Society of Automotive Engineers
SLAM: Simultaneous Localization and Mapping
SSD: Single Shot Detection
TBF: Mean Time Between Boundary Failures
TFR: Trajectory Fit Ratio
TrAAD: Traffic-Aware Autonomous Driving
V2V: Vehicle-to-Vehicle Communication
V2X: Vehicle-to-Everything Communication
VGG: Visual Geometry Group
VLM: Vision Language Model
VLP: Vision-Language-Planning
VNP: Versatile Network Pruning
YOLO: You Only Look Once

References

  1. World Health Organization. Global Health Estimates 2019: Deaths by Cause, Age, Sex, by Country and by Region. 2020. Available online: https://injuryfacts.nsc.org/international/international-overview/ (accessed on 10 September 2024).
  2. Yasin, Y.; Grivna, M.; Abu-Zidan, F. Motorized 2–3 wheelers death rates over a decade: A global study. World J. Emerg. Surg. 2022, 17, 7. [Google Scholar] [CrossRef]
  3. Francis, J.; Chen, B.; Ganju, S.; Kathpal, S.; Poonganam, J.; Shivani, A.; Vyas, V.; Genc, S.; Zhukov, I.; Kumskoy, M.; et al. Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing. arXiv 2022, arXiv:2205.02953. [Google Scholar]
  4. Zhao, J.; Zhao, W.; Deng, B.; Wang, Z.; Zhang, F.; Zheng, W.; Cao, W.; Nan, J.; Lian, Y.; Burke, A.F. Autonomous driving system: A comprehensive survey. Expert Syst. Appl. 2024, 242, 122836. [Google Scholar] [CrossRef]
  5. Wadekar, S.N.; Schwartz, B.; Kannan, S.S.; Mar, M.; Manna, R.K.; Chellapandi, V.; Gonzalez, D.J.; Gamal, A.E. Towards End-to-End Deep Learning for Autonomous Racing: On Data Collection and a Unified Architecture for Steering and Throttle Prediction. arXiv 2021, arXiv:2105.01799. [Google Scholar]
  6. Bosello, M.; Tse, R.; Pau, G. Train in Austria, Race in Montecarlo: Generalized RL for Cross-Track F1tenth LIDAR-Based Races. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; Volume 19, pp. 290–298. [Google Scholar] [CrossRef]
  7. Anzalone, L.; Barra, P.; Barra, S.; Castiglione, A.; Nappi, M. An End-to-End Curriculum Learning Approach for Autonomous Driving Scenarios. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19817–19826. [Google Scholar] [CrossRef]
  8. Fernandes, D.; Silva, A.; Névoa, R.; Simões, C.; Gonzalez, D.; Guevara, M.; Novais, P.; Monteiro, J.; Melo-Pinto, P. Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy. Inf. Fusion 2021, 68, 161–191. [Google Scholar] [CrossRef]
  9. Dickmanns, E.; Zapp, A. Autonomous High Speed Road Vehicle Guidance by Computer Vision1. IFAC Proc. Vol. 1987, 20, 221–226. [Google Scholar] [CrossRef]
  10. Thrun, S.; Montemerlo, M.; Dahlkamp, H.; Stavens, D.; Aron, A.; Diebel, J.; Fong, P.; Gale, J.; Halpenny, M.; Hoffmann, G.; et al. Stanley: The robot that won the DARPA Grand Challenge. J. Field Robot. 2006, 23, 661–692. [Google Scholar] [CrossRef]
  11. Betz, J.; Betz, T.; Fent, F.; Geisslinger, M.; Heilmeier, A.; Hermansdorfer, L.; Herrmann, T.; Huch, S.; Karle, P.; Lienkamp, M.; et al. TUM Autonomous Motorsport: An Autonomous Racing Software for the Indy Autonomous Challenge. arXiv 2022, arXiv:2205.15979. [Google Scholar] [CrossRef]
  12. Huang, Y.; Chen, Y. Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies. arXiv 2020, arXiv:2006.06091. [Google Scholar]
  13. Morooka, F.E.; Junior, A.M.; Sigahi, T.F.A.C.; Pinto, J.d.S.; Rampasso, I.S.; Anholon, R. Deep Learning and Autonomous Vehicles: Strategic Themes, Applications, and Research Agenda Using SciMAT and Content-Centric Analysis, a Systematic Review. Mach. Learn. Knowl. Extr. 2023, 5, 763–781. [Google Scholar] [CrossRef]
  14. Golroudbari, A.A.; Sabour, M.H. Recent Advancements in Deep Learning Applications and Methods for Autonomous Navigation: A Comprehensive Review. arXiv 2023, arXiv:2302.11089. [Google Scholar]
  15. Jebamikyous, H.H.; Kashef, R. Autonomous Vehicles Perception (AVP) Using Deep Learning: Modeling, Assessment, and Challenges. IEEE Access 2022, 10, 10523–10535. [Google Scholar] [CrossRef]
  16. Jiang, Y.; Hsiao, T. Deep Learning in Perception of Autonomous Vehicles. In Proceedings of the 2021 International Conference on Public Art and Human Development (ICPAHD 2021), Kunming, China, 24–26 December 2021; Atlantis Press: Dordrecht, The Netherlands, 2022; pp. 561–565. [Google Scholar] [CrossRef]
  17. Delecki, H.; Itkina, M.; Lange, B.; Senanayake, R.; Kochenderfer, M.J. How Do We Fail? Stress Testing Perception in Autonomous Vehicles. arXiv 2022, arXiv:2203.14155. [Google Scholar]
  18. Gupta, A.; Anpalagan, A.; Guan, L.; Khwaja, A.S. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues. Array 2021, 10, 100057. [Google Scholar] [CrossRef]
  19. Qian, R.; Lai, X.; Li, X. 3D Object Detection for Autonomous Driving: A Survey. Pattern Recognit. 2022, 130, 108796. [Google Scholar] [CrossRef]
  20. Mao, J.; Shi, S.; Wang, X.; Li, H. 3D Object Detection for Autonomous Driving: A Review and New Outlooks. arXiv 2022, arXiv:2206.09474. [Google Scholar]
  21. Ma, X.; Ouyang, W.; Simonelli, A.; Ricci, E. 3D Object Detection from Images for Autonomous Driving: A Survey. arXiv 2022, arXiv:2202.02980. [Google Scholar] [CrossRef]
  22. Large, N.L.; Bieder, F.; Lauer, M. Comparison of different SLAM approaches for a driverless race car. tm Tech. Mess. 2021, 88, 227–236. [Google Scholar] [CrossRef]
  23. Abaspur Kazerouni, I.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
  24. Teng, S.; Hu, X.; Deng, P.; Li, B.; Li, Y.; Ai, Y.; Yang, D.; Li, L.; Xuanyuan, Z.; Zhu, F.; et al. Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives. IEEE Trans. Intell. Veh. 2023, 8, 3692–3711. [Google Scholar] [CrossRef]
  25. Abdallaoui, S.; Aglzim, E.H.; Chaibet, A.; Kribèche, A. Thorough Review Analysis of Safe Control of Autonomous Vehicles: Path Planning and Navigation Techniques. Energies 2022, 15, 1358. [Google Scholar] [CrossRef]
  26. Li, S.; Shu, K.; Chen, C.; Cao, D. Planning and Decision-making for Connected Autonomous Vehicles at Road Intersections: A Review. Chin. J. Mech. Eng. 2021, 34, 133. [Google Scholar] [CrossRef]
  27. Zhou, H.; Laval, J.; Zhou, A.; Wang, Y.; Wu, W.; Qing, Z.; Peeta, S. Review of Learning-Based Longitudinal Motion Planning for Autonomous Vehicles: Research Gaps Between Self-Driving and Traffic Congestion. Transp. Res. Rec. 2022, 2676, 324–341. [Google Scholar] [CrossRef]
  28. Bachute, M.R.; Subhedar, J.M. Autonomous Driving Architectures: Insights of Machine Learning and Deep Learning Algorithms. Mach. Learn. Appl. 2021, 6, 100164. [Google Scholar] [CrossRef]
  29. Khanum, A.; Lee, C.Y.; Yang, C.S. Involvement of Deep Learning for Vision Sensor-Based Autonomous Driving Control: A Review. IEEE Sensors J. 2023, 23, 15321–15341. [Google Scholar] [CrossRef]
  30. Kalandyk, D. Reinforcement learning in car control: A brief survey. In Proceedings of the 2021 Selected Issues of Electrical Engineering and Electronics (WZEE), Rzeszow, Poland, 13–15 September 2021; pp. 1–8. [Google Scholar] [CrossRef]
  31. Tampuu, A.; Matiisen, T.; Semikin, M.; Fishman, D.; Muhammad, N. A Survey of End-to-End Driving: Architectures and Training Methods. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1364–1384. [Google Scholar] [CrossRef]
  32. Le Mero, L.; Yi, D.; Dianati, M.; Mouzakitis, A. A Survey on Imitation Learning Techniques for End-to-End Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14128–14147. [Google Scholar] [CrossRef]
  33. Coelho, D.; Oliveira, M. A Review of End-to-End Autonomous Driving in Urban Environments. IEEE Access 2022, 10, 75296–75311. [Google Scholar] [CrossRef]
  34. Huang, R.N.; Ren, J.; Gabbar, H.A. The Current Trends of Deep Learning in Autonomous Vehicles: A Review. J. Eng. Res. Sci. 2022, 1, 56–68. [Google Scholar] [CrossRef]
  35. Razi, A.; Chen, X.; Li, H.; Wang, H.; Russo, B.; Chen, Y.; Yu, H. Deep learning serves traffic safety analysis: A forward-looking review. IET Intell. Transp. Syst. 2023, 17, 22–71. [Google Scholar] [CrossRef]
  36. Muhammad, K.; Ullah, A.; Lloret, J.; Ser, J.D.; de Albuquerque, V.H.C. Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4316–4336. [Google Scholar] [CrossRef]
  37. Deng, Y.; Zhang, T.; Lou, G.; Zheng, X.; Jin, J.; Han, Q. Deep Learning-Based Autonomous Driving Systems: A Survey of Attacks and Defenses. arXiv 2021, arXiv:2104.01789v2. [Google Scholar] [CrossRef]
  38. Hou, L.; Chen, H.; Zhang, G.K.; Wang, X. Deep Learning-Based Applications for Safety Management in the AEC Industry: A Review. Appl. Sci. 2021, 11, 821. [Google Scholar] [CrossRef]
  39. Li, Y.; Katsumata, K.; Javanmardi, E.; Tsukada, M. Large Language Models for Human-like Autonomous Driving: A Survey. arXiv 2024, arXiv:2407.19280. [Google Scholar]
  40. Yang, Z.; Jia, X.; Li, H.; Yan, J. LLM4Drive: A Survey of Large Language Models for Autonomous Driving. arXiv 2024, arXiv:2311.01043. [Google Scholar]
  41. Cui, C.; Ma, Y.; Cao, X.; Ye, W.; Zhou, Y.; Liang, K.; Chen, J.; Lu, J.; Yang, Z.; Liao, K.D.; et al. A Survey on Multimodal Large Language Models for Autonomous Driving. arXiv 2023, arXiv:2311.12320. [Google Scholar]
  42. Zhou, X.; Liu, M.; Yurtsever, E.; Zagar, B.L.; Zimmer, W.; Cao, H.; Knoll, A.C. Vision Language Models in Autonomous Driving: A Survey and Outlook. arXiv 2024, arXiv:2310.14414. [Google Scholar] [CrossRef]
  43. Cui, Y.; Huang, S.; Zhong, J.; Liu, Z.; Wang, Y.; Sun, C.; Li, B.; Wang, X.; Khajepour, A. DriveLLM: Charting the Path Toward Full Autonomous Driving With Large Language Models. IEEE Trans. Intell. Veh. 2024, 9, 1450–1464. [Google Scholar] [CrossRef]
  44. Zheng, P.; Zhao, Y.; Gong, Z.; Zhu, H.; Wu, S. SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving. arXiv 2024, arXiv:2407.21293. [Google Scholar]
  45. Kaur, R.; Arora, A.; Nayyar, A.; Rani, S. A survey on simulators for testing self-driving cars. Comput. Mater. Contin. 2021, 66, 1043–1062. [Google Scholar]
  46. Zhang, T.; Liu, H.; Wang, W.; Wang, X. Virtual Tools for Testing Autonomous Driving: A Survey and Benchmark of Simulators, Datasets, and Competitions. Electronics 2024, 13, 3486. [Google Scholar] [CrossRef]
  47. Shen, Y.; Chandaka, B.; Lin, Z.; Zhai, A.; Cui, H.; Forsyth, D.; Wang, S. Sim-on-Wheels: Physical World in the Loop Simulation for Self-Driving. IEEE Robot. Autom. Lett. 2023, 8, 8192–8199. [Google Scholar] [CrossRef]
  48. Caleffi, F.; Rodrigues, L.; Stamboroski, J.; Pereira, B. Small-scale self-driving cars: A systematic literature review. J. Traffic Transp. Eng. 2024, 11, 170–188. [Google Scholar] [CrossRef]
  49. Fremont, D.; Kim, E.; Pant, Y.; Seshia, S.; Acharya, A.; Bruso, X.; Wells, P.; Lemke, S.; Lu, Q.; Mehta, S. Formal Scenario-Based Testing of Autonomous Vehicles: From Simulation to the Real World. In Proceedings of the 23rd International Conference on Intelligent Transportation, Rhodes, Greece, 20–23 September 2023; Volume 108, pp. 1211–1230. [Google Scholar]
  50. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  51. Howe, M.; Bockman, J.; Orenstein, A.; Podgorski, S.; Bahrami, S.; Reid, I. The Edge of Disaster: A Battle Between Autonomous Racing and Safety. arXiv 2022, arXiv:2206.15012. [Google Scholar]
  52. Zhou, Y.; Wen, S.; Wang, D.; Meng, J.; Mu, J.; Irampaye, R. MobileYOLO: Real-Time Object Detection Algorithm in Autonomous Driving Scenarios. Sensors 2022, 22, 3349. [Google Scholar] [CrossRef]
  53. Cai, Y.; Luan, T.; Gao, H.; Wang, H.; Chen, L.; Li, Y.; Sotelo, M.A.; Li, Z. YOLOv4-5D: An Effective and Efficient Object Detector for Autonomous Driving. IEEE Trans. Instrum. Meas. 2021, 70, 3065438. [Google Scholar] [CrossRef]
  54. Mahaur, B.; Mishra, K.; Kumar, A. An improved lightweight small object detection framework applied to real-time autonomous driving. Expert Syst. Appl. 2023, 234, 121036. [Google Scholar] [CrossRef]
  55. Yang, M.; Fan, X. YOLOv8-Lite: A Lightweight Object Detection Model for Real-time Autonomous Driving Systems. IECE Trans. Emerg. Top. Artif. Intell. 2024, 1, 1–16. [Google Scholar] [CrossRef]
  56. Jia, X.; Tong, Y.; Qiao, H.; Li, M.; Tong, J.; Liang, B. Fast and accurate object detector for autonomous driving based on improved YOLOv5. Sci. Rep. 2023, 13, 9711. [Google Scholar] [CrossRef]
  57. Ranasinghe, P.; Muthukuda, D.; Morapitiya, P.; Dissanayake, M.B.; Lakmal, H. Deep Learning Based Low Light Enhancements for Advanced Driver-Assistance Systems at Night. In Proceedings of the 2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS), Peradeniya, Sri Lanka, 25–26 August 2023; pp. 489–494. [Google Scholar] [CrossRef]
  58. Karagounis, A. Leveraging Large Language Models for Enhancing Autonomous Vehicle Perception. arXiv 2024, arXiv:2412.20230. [Google Scholar]
  59. Ananthajothi, K.; Satyaa Sudarshan, G.S.; Saran, J.U. LLM’s for Autonomous Driving: A New Way to Teach Machines to Drive. In Proceedings of the 2023 3rd International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, India, 4–5 December 2023; pp. 1–6. [Google Scholar] [CrossRef]
  60. Elhenawy, M.; Ashqar, H.I.; Rakotonirainy, A.; Alhadidi, T.I.; Jaber, A.; Tami, M.A. Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding. Electronics 2025, 14, 1282. [Google Scholar] [CrossRef]
  61. Guo, Z.; Yagudin, Z.; Lykov, A.; Konenkov, M.; Tsetserukou, D. VLM-Auto: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes. arXiv 2024, arXiv:2405.05885. [Google Scholar]
  62. Mohapatra, S.; Yogamani, S.; Gotzig, H.; Milz, S.; Mader, P. BEVDetNet: Bird’s Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 2809–2815. [Google Scholar] [CrossRef]
  63. Zhou, Y.; Wen, S.; Wang, D.; Mu, J.; Richard, I. Object Detection in Autonomous Driving Scenarios Based on an Improved Faster-RCNN. Appl. Sci. 2021, 11, 11630. [Google Scholar] [CrossRef]
  64. Shi, Y.; Guo, Y.; Mi, Z.; Li, X. Stereo CenterNet-based 3D object detection for autonomous driving. Neurocomputing 2022, 471, 219–229. [Google Scholar] [CrossRef]
  65. An, K.; Chen, Y.; Wang, S.; Xiao, Z. RCBi-CenterNet: An Absolute Pose Policy for 3D Object Detection in Autonomous Driving. Appl. Sci. 2021, 11, 5621. [Google Scholar] [CrossRef]
  66. Chen, Y.N.; Dai, H.; Ding, Y. Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 887–897. [Google Scholar]
  67. Nobis, F.; Geisslinger, M.; Weber, M.; Betz, J.; Lienkamp, M. A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection. arXiv 2020, arXiv:2005.07431. [Google Scholar]
  68. Zhang, J.; Cao, J.; Chang, J.; Li, X.; Liu, H.; Li, Z. Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology. arXiv 2024, arXiv:2406.00490. [Google Scholar]
  69. Zhao, X. Deep learning based visual perception and decision-making technology for autonomous vehicles. Appl. Comput. Eng. 2024, 33, 191–200. [Google Scholar] [CrossRef]
  70. Chen, S.; Zhang, Z.; Zhong, R.; Zhang, L.; Ma, H.; Liu, L. A Dense Feature Pyramid Network-Based Deep Learning Model for Road Marking Instance Segmentation Using MLS Point Clouds. IEEE Trans. Geosci. Remote Sens. 2021, 59, 784–800. [Google Scholar] [CrossRef]
  71. Shao, X.; Wang, Q.; Yang, W.; Chen, Y.; Xie, Y.; Shen, Y.; Wang, Z. Multi-Scale Feature Pyramid Network: A Heavily Occluded Pedestrian Detection Network Based on ResNet. Sensors 2021, 21, 1820. [Google Scholar] [CrossRef]
  72. Liu, J.; Zhou, W.; Cui, Y.; Yu, L.; Luo, T. GCNet: Grid-like context-aware network for RGB-thermal semantic segmentation. Neurocomputing 2022, 506, 60–67. [Google Scholar] [CrossRef]
  73. Bai, J.; Zhu, J.; Song, Y.; Zhao, L.; Hou, Z.; Du, R.; Li, H. A3t-gcn: Attention temporal graph convolutional network for traffic forecasting. ISPRS Int. J. Geo-Inf. 2021, 10, 485. [Google Scholar] [CrossRef]
  74. Mseddi, W.; Sedrine, M.A.; Attia, R. YOLOv5 Based Visual Localization For Autonomous Vehicles. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021. [Google Scholar] [CrossRef]
  75. Wenzel, P.; Wang, R.; Yang, N.; Cheng, Q.; Khan, Q.; von Stumberg, L.; Zeller, N.; Cremers, D. 4Seasons: A Cross-Season Dataset for Multi-Weather SLAM in Autonomous Driving. In Proceedings of the Pattern Recognition: 42nd DAGM German Conference, DAGM GCPR 2020, Tübingen, Germany, 28 September–1 October 2020; Akata, Z., Geiger, A., Sattler, T., Eds.; ACM: Cham, Switzerland, 2021; pp. 404–417. [Google Scholar]
  76. Gallagher, L.; Ravi Kumar, V.; Yogamani, S.; McDonald, J.B. A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving. In Proceedings of the 2021 European Conference on Mobile Robots (ECMR), Bonn, Germany, 31 August–3 September 2021; pp. 1–8. [Google Scholar] [CrossRef]
  77. Zhang, T. Perception Stack for Indy Autonomous Challenge and Reinforcement Learning in Simulation Autonomous Racing. Technical Report No. UCB/EECS-2023-187, 2 May 2023. Available online: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-187.pdf (accessed on 16 July 2024).
  78. Balakrishnan, A.; Ramana, K.; Dhiman, G.; Ashok, G.; Bhaskar, V.; Sharma, A.; Gaba, G.S.; Masud, M.; Al-Amri, J.F. Multimedia Concepts on Object Detection and Recognition with F1 Car Simulation Using Convolutional Layers. Wirel. Commun. Mob. Comput. 2021, 2021, 5543720. [Google Scholar] [CrossRef]
  79. Teeti, I.; Musat, V.; Khan, S.; Rast, A.; Cuzzolin, F.; Bradley, A. Vision in adverse weather: Augmentation using CycleGANs with various object detectors for robust perception in autonomous racing. arXiv 2022, arXiv:2201.03246. [Google Scholar]
  80. Katsamenis, I.; Karolou, E.E.; Davradou, A.; Protopapadakis, E.; Doulamis, A.; Doulamis, N.; Kalogeras, D. TraCon: A novel dataset for real-time traffic cones detection using deep learning. arXiv 2022, arXiv:2205.11830. [Google Scholar]
  81. Strobel, K.; Zhu, S.; Chang, R.; Koppula, S. Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions. arXiv 2020, arXiv:2007.13971. [Google Scholar]
  82. Peng, W.; Ao, Y.; He, J.; Wang, P. Vehicle Odometry with Camera-Lidar-IMU Information Fusion and Factor-Graph Optimization. J. Intell. Robotic Syst. 2021, 101, 81. [Google Scholar] [CrossRef]
  83. Karle, P.; Fent, F.; Huch, S.; Sauerbeck, F.; Lienkamp, M. Multi-Modal Sensor Fusion and Object Tracking for Autonomous Racing. IEEE Trans. Intell. Veh. 2023, 8, 3871–3883. [Google Scholar] [CrossRef]
  84. Carranza-García, M.; Torres-Mateo, J.; Lara-Benítez, P.; García-Gutiérrez, J. On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data. Remote Sens. 2021, 13, 89. [Google Scholar] [CrossRef]
  85. Liu, Y.; Diao, S. An automatic driving trajectory planning approach in complex traffic scenarios based on integrated driver style inference and deep reinforcement learning. PLoS ONE 2024, 19, e0297192. [Google Scholar] [CrossRef] [PubMed]
  86. Gupta, P.; Isele, D.; Bae, S. Towards Scalable and Efficient Interaction-Aware Planning in Autonomous Vehicles using Knowledge Distillation. arXiv 2024, arXiv:2404.01746. [Google Scholar]
  87. Hui, F.; Wei, C.; ShangGuan, W.; Ando, R.; Fang, S. Deep encoder–decoder-NN: A deep learning-based autonomous vehicle trajectory prediction and correction model. Phys. A Stat. Mech. Appl. 2022, 593, 126869. [Google Scholar] [CrossRef]
  88. Zhang, Z. ResNet-Based Model for Autonomous Vehicles Trajectory Prediction. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021; pp. 565–568. [Google Scholar] [CrossRef]
  89. Paz, D.; Zhang, H.; Christensen, H.I. TridentNet: A Conditional Generative Model for Dynamic Trajectory Generation. In Proceedings of the Intelligent Autonomous Systems 16; Ang, M.H., Jr., Asama, H., Lin, W., Foong, S., Eds.; Springer: Cham, Switzerland, 2022; pp. 403–416. [Google Scholar]
  90. Cai, P.; Sun, Y.; Wang, H.; Liu, M. VTGNet: A Vision-Based Trajectory Generation Network for Autonomous Vehicles in Urban Environments. IEEE Trans. Intell. Veh. 2021, 6, 419–429. [Google Scholar] [CrossRef]
  91. Liu, X.; Wang, Y.; Zhou, Z.; Nam, K.; Wei, C.; Yin, C. Trajectory Prediction of Preceding Target Vehicles Based on Lane Crossing and Final Points Generation Model Considering Driving Styles. IEEE Trans. Veh. Technol. 2021, 70, 8720–8730. [Google Scholar] [CrossRef]
  92. Zhang, Q.; Hu, S.; Sun, J.; Chen, Q.A.; Mao, Z.M. On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 15159–15168. [Google Scholar]
  93. Wang, J.; Wang, P.; Zhang, C.; Su, K.; Li, J. F-Net: Fusion Neural Network for Vehicle Trajectory Prediction in Autonomous Driving. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 4095–4099. [Google Scholar] [CrossRef]
  94. Qu, L.; Dailey, M.N. Vehicle Trajectory Estimation Based on Fusion of Visual Motion Features and Deep Learning. Sensors 2021, 21, 7969. [Google Scholar] [CrossRef]
  95. Lin, L.; Li, W.; Bi, H.; Qin, L. Vehicle Trajectory Prediction Using LSTMs with Spatial-Temporal Attention Mechanisms. IEEE Intell. Transp. Syst. Mag. 2022, 14, 197–208. [Google Scholar] [CrossRef]
  96. Greer, R.; Deo, N.; Trivedi, M. Trajectory Prediction in Autonomous Driving With a Lane Heading Auxiliary Loss. IEEE Robot. Autom. Lett. 2021, 6, 4907–4914. [Google Scholar] [CrossRef]
  97. Song, H.; Luan, D.; Ding, W.; Wang, M.Y.; Chen, Q. Learning to Predict Vehicle Trajectories with Model-based Planning. In Proceedings of the 5th Conference on Robot Learning, PMLR, London, UK, 8–11 November 2021; Faust, A., Hsu, D., Neumann, G., Eds.; Volume 164, pp. 1035–1045. [Google Scholar]
  98. Chen, X.; Xu, J.; Zhou, R.; Chen, W.; Fang, J.; Liu, C. TrajVAE: A Variational AutoEncoder model for trajectory generation. Neurocomputing 2021, 428, 332–339. [Google Scholar] [CrossRef]
  99. Li, X.; Rosman, G.; Gilitschenski, I.; Vasile, C.I.; DeCastro, J.A.; Karaman, S.; Rus, D. Vehicle Trajectory Prediction Using Generative Adversarial Network With Temporal Logic Syntax Tree Features. IEEE Robot. Autom. Lett. 2021, 6, 3459–3466. [Google Scholar] [CrossRef]
  100. Zhang, K.; Zhao, L.; Dong, C.; Wu, L.; Zheng, L. AI-TP: Attention-based Interaction-aware Trajectory Prediction for Autonomous Driving. IEEE Trans. Intell. Veh. 2023, 8, 73–83. [Google Scholar] [CrossRef]
  101. Sheng, Z.; Xu, Y.; Xue, S.; Li, D. Graph-Based Spatial-Temporal Convolutional Network for Vehicle Trajectory Prediction in Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17654–17665. [Google Scholar] [CrossRef]
  102. Carrasco, S.; Llorca, D.; Fern, E.; Sotelo, M.A. SCOUT: Socially-COnsistent and UndersTandable Graph Attention Network for Trajectory Prediction of Vehicles and VRUs. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 1501–1508. [Google Scholar] [CrossRef]
  103. Jo, E.; Sunwoo, M.; Lee, M. Vehicle Trajectory Prediction Using Hierarchical Graph Neural Network for Considering Interaction among Multimodal Maneuvers. Sensors 2021, 21, 5354. [Google Scholar] [CrossRef] [PubMed]
  104. Li, F.J.; Zhang, C.Y.; Chen, C.P. Robust decision-making for autonomous vehicles via deep reinforcement learning and expert guidance. Appl. Intell. 2025, 55, 412. [Google Scholar] [CrossRef]
  105. Liao, H.; Li, Z.; Shen, H.; Zeng, W.; Liao, D.; Li, G.; Li, S.E.; Xu, C. BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving. arXiv 2023, arXiv:2312.06371. [Google Scholar] [CrossRef]
  106. Zhai, F.; Xu, H.; Chen, C.; Zhang, G. Deep Learning Based Approach for Human-like Driving Trajectory Planning. In Proceedings of the 2023 3rd International Conference on Robotics, Automation and Intelligent Control (ICRAIC), Los Alamitos, CA, USA, 22–24 December 2023; pp. 393–397. [Google Scholar] [CrossRef]
  107. Cai, L.; Guan, H.; Xu, Q.H.; Jia, X.; Zhan, J. A novel behavior planning for human-like autonomous driving. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2025, 0, 09544070241310648. [Google Scholar]
  108. Pan, C.; Yaman, B.; Nesti, T.; Mallik, A.; Allievi, A.G.; Velipasalar, S.; Ren, L. VLP: Vision Language Planning for Autonomous Driving. arXiv 2024, arXiv:2401.05577. [Google Scholar]
  109. Cui, C.; Ma, Y.; Cao, X.; Ye, W.; Wang, Z. Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles. arXiv 2023, arXiv:2309.10228. [Google Scholar]
  110. He, L.; Aouf, N.; Song, B. Explainable Deep Reinforcement Learning for UAV autonomous path planning. Aerosp. Sci. Technol. 2021, 118, 107052. [Google Scholar] [CrossRef]
  111. Xu, C.; Zhao, W.; Chen, Q.; Wang, C. An actor-critic based learning method for decision-making and planning of autonomous vehicles. Sci. China E Technol. Sci. 2021, 64, 984–994. [Google Scholar] [CrossRef]
  112. Cheng, Y.; Hu, X.; Chen, K.; Yu, X.; Luo, Y. Online longitudinal trajectory planning for connected and autonomous vehicles in mixed traffic flow with deep reinforcement learning approach. J. Intell. Transp. Syst. 2023, 27, 396–410. [Google Scholar] [CrossRef]
  113. Luis, S.Y.; Reina, D.G.; Marín, S.L.T. A Multiagent Deep Reinforcement Learning Approach for Path Planning in Autonomous Surface Vehicles: The Ypacaraí Lake Patrolling Case. IEEE Access 2021, 9, 17084–17099. [Google Scholar] [CrossRef]
  114. Naveed, K.B.; Qiao, Z.; Dolan, J.M. Trajectory Planning for Autonomous Vehicles Using Hierarchical Reinforcement Learning. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 601–606. [Google Scholar] [CrossRef]
  115. Yang, J.; Wang, P.; Yuan, W.; Ju, Y.; Han, W.; Zhao, J. Automatic generation of optimal road trajectory for the rescue vehicle in case of emergency on mountain freeway using reinforcement learning approach. IET Intell. Transp. Syst. 2021, 15, 1142–1152. [Google Scholar] [CrossRef]
  116. Bayerlein, H.; Theile, M.; Caccamo, M.; Gesbert, D. Multi-UAV Path Planning for Wireless Data Harvesting With Deep Reinforcement Learning. IEEE Open J. Commun. Soc. 2021, 2, 1171–1187. [Google Scholar] [CrossRef]
  117. Jain, A.; Morari, M. Computing the racing line using Bayesian optimization. arXiv 2020, arXiv:2002.04794. [Google Scholar]
  118. Ögretmen, L.; Chen, M.; Pitschi, P.; Lohmann, B. Trajectory Planning Using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios. arXiv 2024, arXiv:2404.10658. [Google Scholar]
  119. Cleac’h, S.L.; Schwager, M.; Manchester, Z. LUCIDGames: Online Unscented Inverse Dynamic Games for Adaptive Trajectory Prediction and Planning. arXiv 2020, arXiv:2011.08152. [Google Scholar]
  120. Karle, P.; Török, F.; Geisslinger, M.; Lienkamp, M. MixNet: Structured Deep Neural Motion Prediction for Autonomous Racing. arXiv 2022, arXiv:2208.01862. [Google Scholar]
  121. Ghignone, E.; Baumann, N.; Boss, M.; Magno, M. TC-Driver: Trajectory Conditioned Driving for Robust Autonomous Racing—A Reinforcement Learning Approach. arXiv 2022, arXiv:2205.09370. [Google Scholar]
  122. Weaver, C.; Capobianco, R.; Wurman, P.R.; Stone, P.; Tomizuka, M. Real-time Trajectory Generation via Dynamic Movement Primitives for Autonomous Racing. In Proceedings of the 2024 American Control Conference (ACC), Toronto, ON, Canada, 8–9 July 2024; IEEE: New York, NY, USA, 2024; pp. 352–359. [Google Scholar]
  123. Evans, B.; Jordaan, H.W.; Engelbrecht, H.A. Autonomous Obstacle Avoidance by Learning Policies for Reference Modification. arXiv 2021, arXiv:2102.11042. [Google Scholar]
  124. Thakkar, R.S.; Samyal, A.S.; Fridovich-Keil, D.; Xu, Z.; Topcu, U. Hierarchical Control for Cooperative Teams in Competitive Autonomous Racing. arXiv 2022, arXiv:2204.13070. [Google Scholar] [CrossRef]
  125. Trumpp, R.; Javanmardi, E.; Nakazato, J.; Tsukada, M.; Caccamo, M. RaceMOP: Mapless Online Path Planning for Multi-Agent Autonomous Racing using Residual Policy Learning. arXiv 2024, arXiv:2403.07129. [Google Scholar]
  126. Garlick, S.; Bradley, A. Real-Time Optimal Trajectory Planning for Autonomous Vehicles and Lap Time Simulation Using Machine Learning. arXiv 2021, arXiv:2102.02315. [Google Scholar] [CrossRef]
  127. Kim, T.; Lee, H.; Hong, S.; Lee, W. TOAST: Trajectory Optimization and Simultaneous Tracking using Shared Neural Network Dynamics. arXiv 2022, arXiv:2201.08321. [Google Scholar] [CrossRef]
  128. Chisari, E.; Liniger, A.; Rupenyan, A.; Gool, L.V.; Lygeros, J. Learning from Simulation, Racing in Reality. arXiv 2021, arXiv:2011.13332. [Google Scholar]
  129. Fuchs, F.; Song, Y.; Kaufmann, E.; Scaramuzza, D.; Durr, P. Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning. IEEE Robot. Autom. Lett. 2021, 6, 4257–4264. [Google Scholar] [CrossRef]
  130. Zhang, R.; Hou, J.; Chen, G.; Li, Z.; Chen, J.; Knoll, A. Residual Policy Learning Facilitates Efficient Model-Free Autonomous Racing. IEEE Robot. Autom. Lett. 2022, 7, 11625–11632. [Google Scholar] [CrossRef]
  131. Weiss, T.; Chrosniak, J.; Behl, M. Towards multi-agent autonomous racing with the Deepracing framework. In Proceedings of the International Conference on Robotics and Automation, Virtual Conference, 31 May–31 August 2020. [Google Scholar]
  132. Busch, F.L.; Johnson, J.; Zhu, E.L.; Borrelli, F. A Gaussian Process Model for Opponent Prediction in Autonomous Racing. arXiv 2022, arXiv:2204.12533. [Google Scholar]
  133. Brüdigam, T.; Capone, A.; Hirche, S.; Wollherr, D.; Leibold, M. Gaussian Process-based Stochastic Model Predictive Control for Overtaking in Autonomous Racing. arXiv 2021, arXiv:2105.12236. [Google Scholar]
  134. Tian, Z.; Zhao, D.; Lin, Z.; Flynn, D.; Zhao, W.; Tian, D. Balanced reward-inspired reinforcement learning for autonomous vehicle racing. In Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR, Oxford, UK, 15–17 July 2024; Abate, A., Cannon, M., Margellos, K., Papachristodoulou, A., Eds.; Volume 242, pp. 628–640. [Google Scholar]
  135. Trent Weiss, V.S.; Behl, M. DeepRacing AI: Agile Trajectory Synthesis for Autonomous Racing. arXiv 2020, arXiv:2005.05178. [Google Scholar]
  136. Gao, Z.; Yu, T.; Gao, F.; Zhao, R.; Sun, T. Human-like mechanism deep learning model for longitudinal motion control of autonomous vehicles. Eng. Appl. Artif. Intell. 2024, 133, 108060. [Google Scholar] [CrossRef]
  137. Renz, K.; Chen, L.; Marcu, A.M.; Hünermann, J.; Hanotte, B.; Karnsund, A.; Shotton, J.; Arani, E.; Sinavski, O. CarLLaVA: Vision language models for camera-only closed-loop driving. arXiv 2024, arXiv:2406.10165. [Google Scholar]
  138. Elallid, B.B.; Bagaa, M.; Benamar, N.; Mrani, N. A reinforcement learning based autonomous vehicle control in diverse daytime and weather scenarios. J. Intell. Transp. Syst. 2024; in press. [Google Scholar] [CrossRef]
  139. Yin, Y. Design of Deep Learning Based Autonomous Driving Control Algorithm. In Proceedings of the 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 14–16 January 2022; pp. 423–426. [Google Scholar] [CrossRef]
  140. Li, X.; Liu, C.; Chen, B.; Jiang, J. Robust Adaptive Learning-Based Path Tracking Control of Autonomous Vehicles Under Uncertain Driving Environments. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20798–20809. [Google Scholar] [CrossRef]
  141. Xue, H.; Zhu, E.L.; Dolan, J.M.; Borrelli, F. Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; IEEE: New York, NY, USA, 2024; pp. 13250–13256. [Google Scholar] [CrossRef]
  142. Brunnbauer, A.; Berducci, L.; Brandstätter, A.; Lechner, M.; Hasani, R.M.; Rus, D.; Grosu, R. Model-based versus Model-free Deep Reinforcement Learning for Autonomous Racing Cars. arXiv 2021, arXiv:2103.04909. [Google Scholar]
  143. Du, Y.; Chen, J.; Zhao, C.; Liu, C.; Liao, F.; Chan, C.Y. Comfortable and energy-efficient speed control of autonomous vehicles on rough pavements using deep reinforcement learning. Transp. Res. Part C Emerg. Technol. 2022, 134, 103489. [Google Scholar] [CrossRef]
  144. Xiao, Y.; Zhang, X.; Xu, X.; Liu, X.; Liu, J. Deep Neural Networks with Koopman Operators for Modeling and Control of Autonomous Vehicles. IEEE Trans. Intell. Veh. 2022, 8, 135–146. [Google Scholar] [CrossRef]
  145. Fényes, D.; Németh, B.; Gáspár, P. A Novel Data-Driven Modeling and Control Design Method for Autonomous Vehicles. Energies 2021, 14, 517. [Google Scholar] [CrossRef]
  146. He, S.; Xu, R.; Zhao, Z.; Zou, T. Vision-based neural formation tracking control of multiple autonomous vehicles with visibility and performance constraints. Neurocomputing 2022, 492, 651–663. [Google Scholar] [CrossRef]
  147. Salunkhe, S.S.; Pal, S.; Agrawal, A.; Rai, R.; Mole, S.S.S.; Jos, B.M. Energy Optimization for CAN Bus and Media Controls in Electric Vehicles Using Deep Learning Algorithms. J. Supercomput. 2022, 78, 8493–8508. [Google Scholar] [CrossRef]
  148. Wang, W.; Xie, J.; Hu, C.; Zou, H.; Fan, J.; Tong, W.; Wen, Y.; Wu, S.; Deng, H.; Li, Z.; et al. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving. arXiv 2023, arXiv:2312.09245. [Google Scholar]
  149. Prathiba, S.B.; Raja, G.; Dev, K.; Kumar, N.; Guizani, M. A Hybrid Deep Reinforcement Learning For Autonomous Vehicles Smart-Platooning. IEEE Trans. Veh. Technol. 2021, 70, 13340–13350. [Google Scholar] [CrossRef]
  150. Mushtaq, A.; Haq, I.U.; Imtiaz, M.U.; Khan, A.; Shafiq, O. Traffic Flow Management of Autonomous Vehicles Using Deep Reinforcement Learning and Smart Rerouting. IEEE Access 2021, 9, 51005–51019. [Google Scholar] [CrossRef]
  151. Pérez-Gil, O.; Barea, R.; López-Guillén, E.; Bergasa, L.M.; Gómez-Huélamo, C.; Gutiérrez, R.; Díaz-Díaz, A. Deep Reinforcement Learning Based Control for Autonomous Vehicles in CARLA. Multimed. Tools Appl. 2022, 81, 3553–3576. [Google Scholar] [CrossRef]
  152. Dong, J.; Chen, S.; Li, Y.; Du, R.; Steinfeld, A.; Labi, S. Space-weighted information fusion using deep reinforcement learning: The context of tactical control of lane-changing autonomous vehicles and connectivity range assessment. Transp. Res. Part C Emerg. Technol. 2021, 128, 103192. [Google Scholar] [CrossRef]
  153. Peng, B.; Keskin, M.F.; Kulcsár, B.; Wymeersch, H. Connected autonomous vehicles for improving mixed traffic efficiency in unsignalized intersections with deep reinforcement learning. Commun. Transp. Res. 2021, 1, 100017. [Google Scholar] [CrossRef]
  154. Muzahid, A.J.M.; Kamarulzaman, S.F.; Rahman, M.A.; Alenezi, A.H. Deep Reinforcement Learning-Based Driving Strategy for Avoidance of Chain Collisions and Its Safety Efficiency Analysis in Autonomous Vehicles. IEEE Access 2022, 10, 43303–43319. [Google Scholar] [CrossRef]
  155. Zheng, L.; Son, S.; Lin, M.C. Traffic-Aware Autonomous Driving with Differentiable Traffic Simulation. arXiv 2023, arXiv:2210.03772. [Google Scholar]
  156. Folkestad, C.; Wei, S.X.; Burdick, J.W. Quadrotor Trajectory Tracking with Learned Dynamics: Joint Koopman-based Learning of System Models and Function Dictionaries. arXiv 2021, arXiv:2110.10341. [Google Scholar]
  157. Jain, A.; O’Kelly, M.; Chaudhari, P.; Morari, M. BayesRace: Learning to race autonomously using prior experience. arXiv 2020, arXiv:2005.04755. [Google Scholar]
  158. Evans, B.; Engelbrecht, H.A.; Jordaan, H.W. From Navigation to Racing: Reward Signal Design for Autonomous Racing. arXiv 2021, arXiv:2103.10098. [Google Scholar]
  159. Salvaji, A.; Taylor, H.; Valencia, D.; Gee, T.; Williams, H. Racing Towards Reinforcement Learning based control of an Autonomous Formula SAE Car. arXiv 2023, arXiv:2308.13088. [Google Scholar]
  160. Betz, J.; Zheng, H.; Liniger, A.; Rosolia, U.; Karle, P.; Behl, M.; Krovi, V.; Mangharam, R. Autonomous Vehicles on the Edge: A Survey on Autonomous Vehicle Racing. arXiv 2022, arXiv:2202.07008. [Google Scholar] [CrossRef]
  161. Fu, D.; Li, X.; Wen, L.; Dou, M.; Cai, P.; Shi, B.; Qiao, Y. Drive Like a Human: Rethinking Autonomous Driving with Large Language Models. arXiv 2023, arXiv:2307.07162. [Google Scholar]
  162. Lee, D.; Liu, J. End-to-End Deep Learning of Lane Detection and Path Prediction for Real-Time Autonomous Driving. arXiv 2021, arXiv:2102.04738. [Google Scholar] [CrossRef]
  163. Hwang, J.J.; Xu, R.; Lin, H.; Hung, W.C.; Ji, J.; Choi, K.; Huang, D.; He, T.; Covington, P.; Sapp, B.; et al. EMMA: End-to-End Multimodal Model for Autonomous Driving. arXiv 2024, arXiv:2410.23262. [Google Scholar]
  164. Hu, B.; Jiang, L.; Zhang, S.; Wang, Q. An Explainable and Robust Motion Planning and Control Approach for Autonomous Vehicle On-Ramping Merging Task Using Deep Reinforcement Learning. IEEE Trans. Transp. Electrif. 2024, 10, 6488–6496. [Google Scholar] [CrossRef]
  165. Kalaria, D.; Lin, Q.; Dolan, J.M. Adaptive Planning and Control with Time-Varying Tire Models for Autonomous Racing Using Extreme Learning Machine. arXiv 2023, arXiv:2303.08235. [Google Scholar]
  166. Mammadov, M. End-to-end Lidar-Driven Reinforcement Learning for Autonomous Racing. arXiv 2023, arXiv:2309.00296. [Google Scholar]
167. Cosner, R.K.; Yue, Y.; Ames, A.D. End-to-End Imitation Learning with Safety Guarantees using Control Barrier Functions. In Proceedings of the IEEE Conference on Decision and Control (CDC), Cancun, Mexico, 6–9 December 2022; IEEE: New York, NY, USA, 2022. [Google Scholar]
  168. Natan, O.; Miura, J. End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-agent. IEEE Trans. Intell. Veh. 2023, 8, 557–571. [Google Scholar] [CrossRef]
  169. Lee, H.; Choi, Y.; Han, T.; Kim, K. Probabilistically Guaranteeing End-to-end Latencies in Autonomous Vehicle Computing Systems. IEEE Trans. Comput. 2022, 71, 3361–3374. [Google Scholar] [CrossRef]
  170. Nair, U.R.; Sharma, S.; Parihar, U.S.; Menon, M.S.; Vidapanakal, S. Bridging Sim2Real Gap Using Image Gradients for the Task of End-to-End Autonomous Driving. arXiv 2022, arXiv:2205.07481. [Google Scholar]
  171. Antonio, G.P.; Maria-Dolores, C. Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow’s Intersections. IEEE Trans. Veh. Technol. 2022, 71, 7033–7043. [Google Scholar] [CrossRef]
  172. Agarwal, T.; Arora, H.; Schneider, J. Learning Urban Driving Policies Using Deep Reinforcement Learning. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 607–614. [Google Scholar] [CrossRef]
  173. Cui, J.; Qiu, H.; Chen, D.; Stone, P.; Zhu, Y. Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17252–17262. [Google Scholar]
  174. Kwon, J.; Khalil, A.; Kim, D.; Nam, H. Incremental End-to-End Learning for Lateral Control in Autonomous Driving. IEEE Access 2022, 10, 33771–33786. [Google Scholar] [CrossRef]
  175. Schwarting, W.; Seyde, T.; Gilitschenski, I.; Liebenwein, L.; Sander, R.; Karaman, S.; Rus, D. Deep Latent Competition: Learning to Race Using Visual Control Policies in Latent Space. arXiv 2021, arXiv:2102.09812. [Google Scholar]
  176. Hsu, B.J.; Cao, H.G.; Lee, I.; Kao, C.Y.; Huang, J.B.; Wu, I.C. Image-Based Conditioning for Action Policy Smoothness in Autonomous Miniature Car Racing with Reinforcement Learning. arXiv 2022, arXiv:2205.09658. [Google Scholar]
  177. Cota, J.L.; Rodríguez, J.A.T.; Alonso, B.G.; Hurtado, C.V. Roadmap for development of skills in Artificial Intelligence by means of a Reinforcement Learning model using a DeepRacer autonomous vehicle. In Proceedings of the 2022 IEEE Global Engineering Education Conference (EDUCON), Tunis, Tunisia, 28–31 March 2022; pp. 1355–1364. [Google Scholar] [CrossRef]
  178. Huch, S.; Sauerbeck, F.; Betz, J. DeepSTEP–Deep Learning-Based Spatio-Temporal End-To-End Perception for Autonomous Vehicles. arXiv 2023, arXiv:2305.06820. [Google Scholar]
  179. Aoki, S.; Yamamoto, I.; Shiotsuka, D.; Inoue, Y.; Tokuhiro, K.; Miwa, K. SuperDriverAI: Towards Design and Implementation for End-to-End Learning-based Autonomous Driving. arXiv 2023, arXiv:2305.10443. [Google Scholar]
  180. Tian, X.; Gu, J.; Li, B.; Liu, Y.; Wang, Y.; Zhao, Z.; Zhan, K.; Jia, P.; Lang, X.; Zhao, H. DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. arXiv 2024, arXiv:2402.12289. [Google Scholar]
  181. Xu, Y.; Hu, Y.; Zhang, Z.; Meyer, G.P.; Mustikovela, S.K.; Srinivasa, S.; Wolff, E.M.; Huang, X. VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision. arXiv 2024, arXiv:2412.14446. [Google Scholar]
  182. Zhang, Y. LIDAR–camera deep fusion for end-to-end trajectory planning of autonomous vehicle. J. Phys. Conf. Ser. 2022, 2284, 012006. [Google Scholar] [CrossRef]
  183. Chen, L.; Hu, X.; Tang, B.; Cheng, Y. Conditional DQN-Based Motion Planning With Fuzzy Logic for Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 2966–2977. [Google Scholar] [CrossRef]
  184. Prakash, A.; Chitta, K.; Geiger, A. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. arXiv 2021, arXiv:2104.09224. [Google Scholar]
  185. Weiss, T.; Behl, M. DeepRacing: A Framework for Autonomous Racing. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 9–13 March 2020; IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  186. Zheng, H.; Betz, J.; Mangharam, R. Gradient-free Multi-domain Optimization for Autonomous Systems. arXiv 2022, arXiv:2202.13525. [Google Scholar]
  187. Song, Y.; Lin, H.; Kaufmann, E.; Duerr, P.; Scaramuzza, D. Autonomous Overtaking in Gran Turismo Sport Using Curriculum Reinforcement Learning. arXiv 2021, arXiv:2103.14666. [Google Scholar]
  188. Abrecht, S.; Hirsch, A.; Raafatnia, S.; Woehrle, M. Deep Learning Safety Concerns in Automated Driving Perception. IEEE Trans. Intell. Veh. 2024; early access. [Google Scholar] [CrossRef]
  189. Wang, Y.; Jiao, R.; Zhan, S.S.; Lang, C.; Huang, C.; Wang, Z.; Yang, Z.; Zhu, Q. Empowering Autonomous Driving with Large Language Models: A Safety Perspective. arXiv 2024, arXiv:2312.00812. [Google Scholar]
  190. Chen, H.; Cao, X.; Guvenc, L.; Aksun-Guvenc, B. Deep-Reinforcement-Learning-Based Collision Avoidance of Autonomous Driving System for Vulnerable Road User Safety. Electronics 2024, 13, 1952. [Google Scholar] [CrossRef]
  191. Yu, K.; Lin, L.; Alazab, M.; Tan, L.; Gu, B. Deep Learning-Based Traffic Safety Solution for a Mixture of Autonomous and Manual Vehicles in a 5G-Enabled Intelligent Transportation System. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4337–4347. [Google Scholar] [CrossRef]
  192. Wan, L.; Sun, Y.; Sun, L.; Ning, Z.; Rodrigues, J.J.P.C. Deep Learning Based Autonomous Vehicle Super Resolution DOA Estimation for Safety Driving. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4301–4315. [Google Scholar] [CrossRef]
  193. Karmakar, G.; Chowdhury, A.; Das, R.; Kamruzzaman, J.; Islam, S. Assessing Trust Level of a Driverless Car Using Deep Learning. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4457–4466. [Google Scholar] [CrossRef]
  194. Zhu, Z.; Hu, Z.; Dai, W.; Chen, H.; Lv, Z. Deep learning for autonomous vehicle and pedestrian interaction safety. Saf. Sci. 2022, 145, 105479. [Google Scholar] [CrossRef]
  195. Xing, Y.; Lv, C.; Mo, X.; Hu, Z.; Huang, C.; Hang, P. Toward Safe and Smart Mobility: Energy-Aware Deep Learning for Driving Behavior Analysis and Prediction of Connected Vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4267–4280. [Google Scholar] [CrossRef]
  196. Chen, B.; Francis, J.; Herman, J.; Oh, J.; Nyberg, E.; Herbert, S.L. Safety-aware Policy Optimisation for Autonomous Racing. arXiv 2021, arXiv:2110.07699. [Google Scholar]
  197. Cai, P.; Wang, H.; Huang, H.; Liu, Y.; Liu, M. Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning. IEEE Robot. Autom. Lett. 2021, 6, 7262–7269. [Google Scholar] [CrossRef]
  198. Tearle, B.; Wabersich, K.P.; Carron, A.; Zeilinger, M.N. A Predictive Safety Filter for Learning-Based Racing Control. IEEE Robot. Autom. Lett. 2021, 6, 7635–7642. [Google Scholar] [CrossRef]
  199. Tranzatto, M.; Dharmadhikari, M.; Bernreiter, L.; Camurri, M.; Khattak, S.; Mascarich, F.; Pfreundschuh, P.; Wisth, D.; Zimmermann, S.; Kulkarni, M.; et al. Team CERBERUS Wins the DARPA Subterranean Challenge: Technical Overview and Lessons Learned. arXiv 2022, arXiv:2207.04914. [Google Scholar] [CrossRef]
Figure 1. SAE J3016 levels of driving automation (adopted by the U.S. Department of Transportation).
Figure 2. The hierarchy of a modular self-driving system (adapted from [12]).
Figure 3. Publication trends on autonomous driving (2010–2025) from Scopus.
Figure 4. The hierarchy of components and methodologies of modular autonomous driving systems.
Figure 5. The PRISMA flow diagram for the review.
Figure 6. Overview of end-to-end system structures in autonomous driving, distinguishing fully integrated models from partially modular approaches (adapted from [160]).
Figure 7. Comparison of modular perception, planning, and control architectures between road and racing scenarios.
Table 1. Inclusion and exclusion criteria.
| Criterion | Inclusion | Exclusion |
|---|---|---|
| Type of study | Original research papers, review papers, technical reports, data papers | Posters, short papers, editorials, letters |
| Language | English | Non-English |
| Countries/regions | Not restricted | - |
| Publication year | January 2020 to December 2025 | Pre-2020 or post-2025 |
| Intervention | Machine learning/deep learning approaches | Approaches other than machine learning or deep learning |
| Scope | Deep learning approaches for modular autonomous driving systems | Studies not focused on deep learning approaches for modular autonomous driving systems |
Table 2. Comparison of modular system components between autonomous on-road vehicles and racing cars.
| Module | Component | Autonomous On-Road Vehicles | Autonomous Racing Cars |
|---|---|---|---|
| Perception | Sensor fusion | Tracks pedestrians, vehicles, and road signs using cameras, LiDAR, and radar. | Detects track boundaries, other cars, and obstacles using cameras, LiDAR, radar, and GPS. |
| Perception | Object detection | Pedestrians, cyclists, road signs, other vehicles, traffic signals, and weather conditions. | Track features such as turns, barriers, and opponents. |
| Perception | Lane and road recognition | Road lanes, lane markings, road edges, and free drivable space. | Track boundaries, racing lines, and potential off-track hazards. |
| Planning | Path planning | Manages trajectory, traffic rules, obstacle avoidance, and road geometry. | Computes the optimal trajectory, avoiding collisions and overtaking at high speed. |
| Planning | Decision-making | Safe driving decisions such as stopping at signals, changing lanes, and yielding to pedestrians. | Aggressive decisions such as high-speed overtaking, defending positions, and responding to track conditions. |
| Planning | Speed control | Balances fuel efficiency and safety based on traffic, road conditions, and speed limits. | Maximizes speed while maintaining control, particularly during cornering and on straights. |
| Control | Steering control | Smooth steering for safe navigation, considering curves, traffic, and road signs. | Precise, rapid steering adjustments to hold the optimal racing line. |
| Control | Throttle and braking control | Adjusts throttle and braking for fuel efficiency and safety, including emergency braking. | Precise throttle and braking to optimize lap times, especially through sharp corners and accelerations. |
| Control | Stability and traction control | Ensures vehicle stability in varying weather and road conditions, minimizing skidding and loss of control. | Maximizes traction, especially during cornering, to maintain grip and minimize oversteer/understeer. |
| Control | Vehicle safety | Monitors vehicle health and adapts for comfort and fuel efficiency. | Monitors vehicle performance in real time for performance optimization. |
Table 3. Comparison of planning methods between autonomous racing cars and autonomous vehicles.
| Platform | Reference | Approach | Module | Performance |
|---|---|---|---|---|
| Autonomous vehicles | [85] | VAE + GRU and RL | Trajectory planning | Average route completion 100%; 6 collisions; 0 deadlocks; average running time 542.56 s |
| | [101] | Graph spatial-temporal CNN with GRU | Trajectory prediction | Average RMSE 1.52 (all vehicles) and 1.49 (one vehicle) |
| | [102] | SCOUT: attention-based GCN | Interaction-aware trajectory prediction | Average displacement error (ADE)/final displacement error (FDE) on the inD dataset: 0.46/1.03 |
| | [104] | SAC and VAE | Decision-making | Outperformed baselines in success rate |
| | [109] | LLMs | Motion planning and decision-making | - |
| Autonomous racing cars | [118] | RL | Trajectory planning for overtaking maneuvers | Success rate up to 92%; 1.5 ms per planning cycle |
| | [126] | Feed-forward neural network (ANN) | Optimal trajectory and lap time | MAE ±0.27 (±0.11 at curvature points); about 9000 times faster than traditional methods |
| | [125] | Multi-agent artificial potential field (APF) planner with residual policy learning (RPL) | Path planning | Collision ratios of 0.33% and 0.42% |
| | [134] | Balanced reward-inspired proximal policy optimization (BRPPO) | Decision-making on complex tracks | 0 collisions on all tracks |
| | [122] | Dynamic movement primitives (DMPs) | Trajectory generation | Mean lap times: acceleration-goal DMP (M = 134.75, SD = 0.85) vs. velocity-goal DMP (M = 136.87, SD = 1.34) |
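The planning metrics reported in Table 3 (ADE, FDE, RMSE) all reduce to per-step Euclidean errors between a predicted and a ground-truth trajectory. As a minimal illustrative sketch of how these metrics are typically computed (function and variable names are our own, not taken from any cited work):

```python
import numpy as np

def displacement_errors(pred, gt):
    """ADE, FDE, and RMSE between predicted and ground-truth
    trajectories, each of shape (T, 2), in metres."""
    d = np.linalg.norm(pred - gt, axis=1)   # per-step Euclidean error
    ade = float(d.mean())                   # average displacement error
    fde = float(d[-1])                      # final displacement error
    rmse = float(np.sqrt((d ** 2).mean()))  # root-mean-square error
    return ade, fde, rmse

# Toy example: the predicted path drifts laterally from the ground truth.
gt = np.stack([np.arange(5, dtype=float), np.zeros(5)], axis=1)
pred = gt + np.array([0.0, 0.1]) * np.arange(5)[:, None]
ade, fde, rmse = displacement_errors(pred, gt)  # ADE 0.2, FDE 0.4
```

Note that ADE averages errors over the whole horizon, while FDE penalizes only the endpoint, which is why the two are usually reported together.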
Table 4. Listing and comparison of the control modules within the pipeline of a modular system for both autonomous racing cars and autonomous vehicles.
PlatformReferenceApproachModularPerformance
Autonomous
Vehicles
 [136]Proposed a human-like neural networkLongitudinal motion controlControl style consistency and convergence rate.
 [137]VLMLiteral controlsDriving Score (DS) 6.87, route completion (RC) 18.08 and infraction score (IS) 0.42
 [138]DRL with DQNVehicles control in difficult environmentsMaximizes success rate with minimizes collision rate
 [139]CNN with pre-training as well as maintaining overfittingLateral control: steering angle estimationImprove training and generalization to prevent over-fitting
 [140]Introduced a novel approach—robust adaptive learning control (RALC)Predicting uncertainties and lateral tracking controlsTracking performance and errors: lateral deviation and heading errors are evaluated on an eight-shaped track where the adhesion coefficient is = 1.
Autonomous
Racing Cars
[132]Gaussian process model and model predictive control (MPC)Control safely overtaking opponentsLateral and longitudinal error are (0.02 and 0.026) means and (0.006 and 0.006) variance respectively
[141]Mathematical modelsLearning MPC20th iteration lap time (ILT-20) is 5.0
[142]Model-free DRL, Dreamer for Sim2RealControlsLap time on four tracks
[130]ResRace: MAPF and model-free-DRLControl policyLap time on five tracks
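The lateral-control modules above (steering-angle estimation, lateral tracking, overtaking control) all compute a steering command from the vehicle pose and a reference path. As a generic, hedged sketch of that geometry (a textbook pure-pursuit controller, not the method of any cited paper; the wheelbase and lookahead values are illustrative), such a module can be reduced to:

```python
import math

def pure_pursuit_steering(x, y, yaw, tx, ty, wheelbase=2.5, lookahead=5.0):
    """Steering angle (rad) that drives a vehicle at rear-axle pose
    (x, y, yaw) toward the lookahead waypoint (tx, ty) using standard
    pure-pursuit geometry. `wheelbase` and `lookahead` are in meters."""
    alpha = math.atan2(ty - y, tx - x) - yaw  # bearing error to the target
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

straight = pure_pursuit_steering(0.0, 0.0, 0.0, 5.0, 0.0)   # target dead ahead -> zero steering
left_turn = pure_pursuit_steering(0.0, 0.0, 0.0, 3.0, 4.0)  # target to the left -> positive angle
```

The learned controllers surveyed here replace this closed-form geometry with a network or an MPC optimization, but they are evaluated on the same quantities the table lists: lateral deviation, heading error, and lap time.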
Table 5. Summary of representative end-to-end approaches for autonomous driving on the road and in racing scenarios.

| Platform | Reference | Approach | Module | Performance |
| --- | --- | --- | --- | --- |
| Autonomous Vehicles | [161] | LLMs | Perception to controls | Ability to reason about and solve long-tailed cases |
| | [162] | DSUNet (depthwise separable convolutions) | Planning: lane detection and path prediction | Static/dynamic MAE, estimated curvature: 0.0046/0.0049; lateral offset: 0.18/0.11 |
| | [44] | VLMs | Perception to planning | Accuracy 66.5 |
| | [7] | ShuffleNet V2: proximal policy optimization (PPO) with curriculum learning and actor–critic | e2e: perception to controls | Best run: collision rate 63%, waypoint distance 2.98 m, speed 8.65 km/h, total reward 2025, 374 timesteps |
| | [163] | Multimodal LLMs | Perception to planning | Average L2 error 0.29 m |
| | [164] | Fuses RL with optimization-based methods | Explainable and robust motion planning and control | Online computing time: PPO-based, a few milliseconds (very fast); optimization-based, 120–150 ms (slow); proposed hybrid, <100 ms (acceptable) |
| Autonomous Racing Cars | [165] | Extreme learning machine (ELM) | Planning and control | Optimal lap time 8.46 s; mean deviation from racing line 0.0832 m; violation time 0.46 s |
| | [166] | RL | e2e: perception to trajectory planning | Success rate around 80% |
| | [5] | Direct policy learning with CNN and LSTM | e2e: perception to controls (steering angles and throttle) | Average lap times of 84.19 and 142.4 at top speeds of 70 mph and 60 mph, with 65 and 50 completed laps on tracks 1 and 2, respectively |
| | [6] | CNN, transfer learning, DQN | e2e process for Sim2Real | Lap times, AUT: 23 s, BRC: 56 s, GBR: 48 s, MCO: 42 s |
| | [167] | Imitation learning (IL): IRL | Trajectories to controllers | CBF values for safety guarantees |
Table 6. Datasets and platforms frequently utilized for on-road autonomous driving.

| Dataset | Sensors | Purpose | Development | Type |
| --- | --- | --- | --- | --- |
| CARLA | LiDAR, cameras | Perception, planning, control, sensor fusion, edge-case testing | Computer-game world with AI agents | Full simulation |
| LGSVL | LiDAR, cameras | Perception, localization, V2X interaction, control | Computer-game world built in Unity with simulated AI agents | Full simulation |
| AirSim | LiDAR, cameras | Perception, reinforcement learning, control | Game-based simulation using Unreal Engine with drone/car agents | Full simulation |
| KITTI | LiDAR, cameras | Perception (object detection, SLAM), sensor calibration | Real video data with AI agents making decisions on pre-existing video | Semi-simulation |
| Waymo Open | LiDAR, cameras | Perception, planning evaluation, tracking | Real-world recordings processed with autonomous AI modules | Semi-simulation |
| nuScenes | LiDAR, cameras | Sensor fusion, prediction, 3D object detection | Real-world dataset with annotated scenes used for training AI agents | Semi-simulation |
| Duckietown | Cameras, GPS, IMU | Lane detection, control, end-to-end learning | Scale-model car with sensors running in a physical, scaled-down town | Semi-real |
| Tesla Autopilot | LiDAR, cameras, radar | Perception, autopilot control, real-world planning | Real car with a sensor suite collecting data on actual roadways | Real-world |
| Waymo | LiDAR, cameras, radar | Perception, prediction, planning, localization, control | Autonomous vehicles operating and recording in real environments | Real-world |
| Cruise | LiDAR, cameras, radar | Mapping, decision-making, motion planning, control | Real vehicles with AI-driven systems collecting real-world data | Real-world |
Table 7. Datasets and platforms increasingly used for autonomous racing cars.

| Dataset | Sensors | Purpose | Development | Type |
| --- | --- | --- | --- | --- |
| Sim4Racing | Cameras, IMU | End-to-end driving, high-speed control, reinforcement learning | Game-engine simulation with a virtual racing environment and AI agents | Full simulation |
| TORCS | Cameras, wheel encoders | Planning, lane keeping, control, speed optimization | Classic racing simulator used for training AI agents | Full simulation |
| DeepRacer (Sim) | Cameras, LiDAR | End-to-end policy learning, control | Cloud-based virtual simulation platform for reinforcement learning | Full simulation |
| FormulaTrainee | Video footage (real track) | Decision-making from video frames, imitation learning | AI agent trained on pre-recorded video of real race tracks | Semi-simulation |
| DriverNet | Dash-cam videos | Lane following, control | Real driving video data | Semi-simulation |
| F1TENTH | Cameras, LiDAR, IMU | Planning, real-time obstacle avoidance, racing policy control | Onboard sensors operating in a real environment | Semi-real |
| DonkeyCar | Cameras, IMU | End-to-end learning, behavioral cloning | DIY scale-car platform with sensors trained on physical tracks | Semi-real |
| Audi RS5 (AutoDrive) | Cameras, radar, LiDAR | Real-time planning, high-speed control, safety-critical navigation | Full-sized car with a sensor suite tested on race tracks | Real-world |
| Indy Autonomous Challenge | Cameras, LiDAR, GPS, IMU | Full autonomy in high-speed race scenarios, perception, planning | Full-size open-wheel race cars equipped with sensors in real competitions | Real-world |
| Roborace | Cameras, LiDAR, radar, GPS, IMU | High-speed autonomous racing, multi-agent interaction, real-time planning | Real full-size electric race cars with autonomous control tested on real racing circuits | Real-world |

Share and Cite

MDPI and ACS Style

Hussain, K.; Moreira, C.; Pereira, J.; Jardim, S.; Jorge, J. A Comprehensive Literature Review on Modular Approaches to Autonomous Driving: Deep Learning for Road and Racing Scenarios. Smart Cities 2025, 8, 79. https://doi.org/10.3390/smartcities8030079
