Building the Future of Transportation: A Comprehensive Survey on AV Perception, Localization, and Mapping
Abstract
1. Introduction
2. Object Detection and Tracking in Autonomous Vehicles
2.1. Introduction to Image Processing in Autonomous Vehicles
2.2. Overview of Convolutional Neural Networks (CNNs)
- Input layer: The input layer receives the raw data, normally an image represented as a multi-dimensional array of pixels. Before being fed into a model, images must conform to the spatial and channel dimensions expected by the model or deep learning library.
- Convolution layer: A kernel defines the window that is slid over the input during the convolution operation, while a filter is the tensor formed by stacking one kernel per input channel; for example, a filter applied to an input with c channels is composed of c kernels.
- Padding: A convolution changes the spatial size of the output, which depends on both the input and the filter size, while the kernel parameters determine which characteristics, such as edges, appear in the output feature map. Padding adds a border of pixel grids around the input to extend the spatial dimensions of the output, which also addresses the common problem of information loss at image borders. Padding is therefore a critical factor in convolutional neural network architecture and performance.
- Stride: the number of pixels by which the kernel is translated at each step when convolving the image.
- Activation functions: An activation function applies a nonlinear transformation to each feature map, allowing the model to learn complex relationships in the data rather than only linear combinations of its inputs. ReLU is the most popular choice for CNNs because it is computationally cheap and mitigates the vanishing gradient problem.
- Pooling operation: Pooling layers slide a small rectangular window over the output feature map and replace each block with a single value, such as its maximum, minimum, or average. This reduces the feature map’s spatial size and the number of parameters, which also helps regularize the network. There are different sorts of pooling, such as max pooling and global average pooling; the latter pools the entire feature map at once rather than sliding a window.
- Fully Connected Layers: After the convolutional and pooling stages, the last part of the CNN structure is the Fully Connected Layers (FCLs) used for high-level classification. A minimal example combining these building blocks follows this list.
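To make these building blocks concrete, the following minimal sketch (assuming PyTorch is available; the layer sizes, channel counts, and class count are illustrative, not taken from any model discussed here) stacks a padded, strided convolution, a ReLU activation, max pooling, and a fully connected classification head.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN illustrating the layer types above (all sizes are illustrative)."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolution: 3 input channels (RGB), 16 filters, 3x3 kernel,
        # padding=1 preserves the spatial size, stride=1 moves the kernel one pixel at a time.
        self.conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()          # nonlinearity; mitigates vanishing gradients
        self.pool = nn.MaxPool2d(2)   # halves the spatial dimensions, keeps strongest responses
        self.fc = nn.Linear(16 * 16 * 16, num_classes)  # fully connected classification head

    def forward(self, x):                              # x: (batch, 3, 32, 32)
        x = self.pool(self.act(self.conv(x)))          # -> (batch, 16, 16, 16)
        return self.fc(torch.flatten(x, 1))            # -> (batch, num_classes)

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # one random 32x32 RGB image
print(logits.shape)                            # torch.Size([1, 10])
```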
2.3. Object Detection Networks
2.3.1. Two-Stage CNN Detectors
2.3.2. One-Stage CNN Detectors
2.4. Performance Metrics
2.5. Challenges in Object Detection and Tracking
2.5.1. Handling Occlusion in Tracking
2.5.2. Adverse Weather Conditions
2.5.3. Nighttime Object Detection
2.6. Datasets for Object Detection and Tracking
- KITTI Vision: one of the most comprehensive datasets for AVs, providing images and 3D point clouds for object detection and tracking.
- nuScenes: a large-scale dataset for AV research, including LiDAR, camera images, and radar data.
- Waymo Open Dataset: developed by Waymo, it provides extensive data for training object detection and tracking models.
- Cityscapes: focuses on semantic segmentation and object detection in urban environments.
| Dataset | Real | Location Accuracy | Diversity | 3D Annotation | 2D Annotation | Video | Lane |
|---|---|---|---|---|---|---|---|
| CamVid [30] | ✓ | - | Daytime | No | Pixel: 701 | ✓ | 2D/2 classes |
| KITTI [31] | ✓ | cm | Daytime | 80k 3D boxes | Box: 15k, Pixel: 400 | - | No |
| Cityscapes [32] | ✓ | - | Daytime, 50 cities | No | Pixel: 25k | - | No |
| IDD [33] | ✓ | cm | Various weather, urban and rural roads in India | No | Pixel: 10k | ✓ | No |
| Mapillary [34] | ✓ | Meter | Various weather, day and night, 6 continents | No | Pixel: 25k | - | 2D/2 classes |
| BDD100K [29] | ✓ | Meter | Various weather, 4 regions in US | No | Box: 100k, Pixel: 10k | - | 2D/2 classes |
| SYNTHIA [35] | - | - | Various weather | Box | Pixel: 213k | No | No |
| P.F.B. [36] | - | - | Various weather | Box | Pixel: 250k | - | No |
| ApolloScape [37] | ✓ | cm | Various weather, daytime, 4 regions in China | 3D semantic points, 70k 3D fitted cars | Pixel: 140k | 3D/2D video | 35 classes |
| Waymo Open Dataset [38] | ✓ | cm | Various weather, urban and suburban roads in the US | 12M 3D boxes | Box: 12M, Pixel: 200k | ✓ | 2D/3D lane markings |
2.6.1. Classical Tracking Methods
- Kalman filtering: This method is commonly used to estimate the future state of an object based on its past trajectory. The Kalman filter is beneficial for tracking objects moving in a linear path with constant velocity. However, it struggles in cases where the object’s movement is erratic or nonlinear. Example: In the context of AVs, a Kalman filter might track the position of a moving car on a highway by predicting its future location based on its current speed and direction [39].
- Hungarian algorithm: This algorithm solves the assignment problem—matching detected objects in consecutive frames. It assigns objects from the current frame to the closest objects detected in the next frame, minimizing the overall movement cost. This method can efficiently handle multiple objects but is limited by its reliance on spatial proximity, often failing in complex scenes with significant object overlap [39].
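These two classical components are often combined into a simple tracker: a Kalman filter predicts where each track should be, and the Hungarian algorithm assigns new detections to those predictions. The sketch below is a minimal illustration of that combination, assuming a constant-velocity state [x, y, vx, vy] and using SciPy’s `linear_sum_assignment` for the Hungarian step; the time step, noise covariances, and example detections are all illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

dt = 0.1
F = np.array([[1, 0, dt, 0],   # constant-velocity motion model
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],    # only position (x, y) is observed
              [0, 1, 0, 0]])

def predict(x, P, Q=np.eye(4) * 0.01):
    """Kalman prediction: propagate state and covariance one step ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, R=np.eye(2) * 0.1):
    """Kalman correction: fuse a position measurement z = [x, y]."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

def associate(predicted_positions, detections):
    """Hungarian assignment: match predicted track positions to new detections
    by minimizing the total Euclidean distance."""
    cost = np.linalg.norm(predicted_positions[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

# One track moving along the highway at 10 m/s in x, matched against two detections.
x, P = np.array([0.0, 0.0, 10.0, 0.0]), np.eye(4)
x, P = predict(x, P)
matches = associate(np.array([x[:2]]), np.array([[1.05, 0.02], [50.0, 3.0]]))
x, P = update(x, P, np.array([1.05, 0.02]))
print(matches, x[:2])
```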
2.6.2. Deep Learning-Based Tracking
- SORT (Simple Online and Realtime Tracking): SORT is an early and simple tracking algorithm that uses Kalman filtering for motion prediction and the Hungarian algorithm for object association. It tracks objects solely based on motion models without considering appearance information, which makes it susceptible to errors in crowded environments or occlusion [40]. Use case: SORT is most effective in environments with minimal occlusion or interaction between objects, such as monitoring traffic in low-density areas. Limitation: the algorithm frequently loses track of objects during occlusions due to its sole reliance on motion models, which prevents it from distinguishing between objects based on appearance.
- DeepSORT (Simple Online and Realtime Tracking with Deep Appearance Descriptors): DeepSORT improves upon SORT by incorporating deep appearance descriptors from a convolutional neural network (CNN). This addition helps the algorithm distinguish objects based on their visual characteristics, improving its ability to maintain consistent tracking during occlusion and re-identification when objects reappear after being hidden. The improvement over SORT: using appearance-based features, DeepSORT is more robust in crowded scenes or environments where objects frequently overlap. Use case: DeepSORT is ideal for dense urban environments or crowded pedestrian areas, where the visual appearance of objects is critical to their accurate tracking.
- Tracktor: Tracktor is a tracking-by-detection algorithm that leverages object detection across multiple frames, eliminating the need for a separate tracking module. Instead, it uses bounding-box regression to predict an object’s future position, making the process more straightforward but dependent on high-quality object detection [41]. Improvement over DeepSORT: Tracktor simplifies the tracking process by directly extending detection into future frames, though it relies heavily on the quality of the detection. Use case: Tracktor performs well in environments where the detection system is highly reliable, such as AVs equipped with advanced LiDAR or radar data for precise detection.
- BoT-SORT (Bytetrack Optimal Transport–SORT): BoT-SORT enhances DeepSORT by incorporating appearance and motion information while using Optimal Transport (OT) to match detected objects across frames. This leads to more accurate tracking, particularly in scenarios with rapid object movement or complex interactions between objects. The improvement over Tracktor: BoT-SORT integrates appearance information, allowing it to handle occlusion better than Tracktor, which relies solely on bounding-box predictions. Use case: BoT-SORT is especially useful in high-speed tracking scenarios, such as racing or drone footage, where objects move at varying speeds and directions.
- ByteTrack: ByteTrack improves upon tracking by utilizing both high-confidence and low-confidence detections. This allows ByteTrack to track objects even when they are partially visible or occluded, reducing missed detections and increasing overall robustness in challenging environments (a simplified two-threshold association sketch follows this list). An improvement over BoT-SORT: ByteTrack’s ability to incorporate low-confidence detections ensures continuous tracking even in scenarios with severe occlusion or partial visibility. Use case: ByteTrack is ideal for urban environments, where AVs must track multiple objects under varying conditions, such as dense city traffic or busy intersections.
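As an illustration of the two-threshold idea behind ByteTrack, the sketch below first associates high-confidence detections with existing tracks by IoU and then gives low-confidence detections (e.g., partially occluded objects) a second chance against the still-unmatched tracks. The thresholds, the structure, and the plain IoU cost are simplifying assumptions, not the reference implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match(tracks, detections, iou_thresh=0.3):
    """Hungarian matching on an IoU cost; returns matched pairs and unmatched track indices."""
    if not tracks or not detections:
        return [], list(range(len(tracks)))
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    pairs = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]
    unmatched = [r for r in range(len(tracks)) if r not in {p[0] for p in pairs}]
    return pairs, unmatched

def bytetrack_step(tracks, dets, scores, high=0.6, low=0.1):
    """Two-stage association: confident detections first, then the low-confidence
    leftovers (e.g., partially occluded objects) against still-unmatched tracks."""
    high_dets = [d for d, s in zip(dets, scores) if s >= high]
    low_dets = [d for d, s in zip(dets, scores) if low <= s < high]
    first, unmatched = match(tracks, high_dets)
    second, still_unmatched = match([tracks[i] for i in unmatched], low_dets)
    return first, second, still_unmatched

tracks = [[0, 0, 10, 10], [50, 50, 60, 60]]
dets = [[1, 1, 11, 11], [49, 50, 59, 61]]
print(bytetrack_step(tracks, dets, scores=[0.9, 0.3]))
```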
| Metric | SORT | DeepSORT | ByteTrack |
|---|---|---|---|
| MOTA | 54.7% | 61.4% | 77.3% |
| MOTP | 77.5% | 79.1% | 82.6% |
| ID switches | 831 | 781 | 558 |
| MT | 34.2% | 45.1% | 54.7% |
| ML | 24.6% | 21.3% | 14.9% |
| FP | 7876 | 5604 | 3828 |
| FN | 26,452 | 21,796 | 14,661 |
| Processing speed | 143 FPS | 61 FPS | 171 FPS |
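The MOTA and MOTP values reported in such comparisons follow the standard CLEAR-MOT definitions. The helper below shows how they are computed from aggregate counts; the numbers passed in are placeholders, not the figures from the table above.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """CLEAR-MOT accuracy: 1 minus the ratio of all tracking errors to ground-truth objects."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_ground_truth

def motp(total_overlap, num_matches):
    """CLEAR-MOT precision: average localization overlap over all matched track-object pairs."""
    return total_overlap / num_matches

# Placeholder counts (illustrative only).
print(f"MOTA = {mota(1000, 500, 50, 10000):.3f}")
print(f"MOTP = {motp(7400.0, 9000):.3f}")
```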
2.6.3. Occlusion Handling and Re-Identification
- Appearance descriptors: Algorithms like DeepSORT use visual appearance features to help re-identify objects after they have been occluded. By capturing the unique visual characteristics of objects, these trackers can re-associate objects with their original identities when they reappear (a minimal matching sketch follows this list).
- Multiple detection strategies: Algorithms such as ByteTrack maintain tracking using high-confidence and low-confidence detection. This ensures that even when an object is partially visible or occluded, its trajectory can still be maintained through lower-confidence predictions.
- Re-identification models: In algorithms like OC-SORT and Tracktor, Re-ID models predict the object’s likely future location based on its previous movements. This helps reassign the object’s identity when it reappears after occlusion, reducing errors in tracking [43].
- Motion modeling and data association: Algorithms employ motion models such as Kalman filters to predict the future trajectory of an object based on its velocity and direction. This allows for consistent tracking even when objects are temporarily occluded.
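As a minimal illustration of appearance-based re-identification, the sketch below compares a CNN appearance embedding of a re-appearing detection against stored track embeddings using cosine distance; the embedding dimension, gallery contents, and distance threshold are illustrative assumptions.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two appearance embeddings (lower = more similar)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def reidentify(track_gallery, new_embedding, max_distance=0.3):
    """Return the id of the stored track whose appearance best matches the
    re-appearing detection, or None if nothing is close enough."""
    best_id, best_dist = None, max_distance
    for track_id, stored in track_gallery.items():
        d = cosine_distance(stored, new_embedding)
        if d < best_dist:
            best_id, best_dist = track_id, d
    return best_id

rng = np.random.default_rng(0)
gallery = {7: rng.normal(size=128), 9: rng.normal(size=128)}   # per-track CNN embeddings
query = gallery[9] + 0.05 * rng.normal(size=128)               # object re-appearing after occlusion
print(reidentify(gallery, query))                              # -> 9
```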
3. Localization Strategies in Autonomous Vehicles
3.1. Vision-Based Localization
- Feature-based localization: In feature-based localization, distinctive features extracted from a sequence of images are matched across consecutive frames to estimate motion. Such methods work well when the scene contains salient, repeatable features, but they can struggle in environments with repetitive textures or low contrast.
- Visual odometry (VO): Visual odometry estimates the ego-motion of a mobile robot or vehicle from cameras mounted on its platform [45] (a minimal sketch follows this list). VO has several advantages: it can operate without GPS while maintaining precise positioning, it is more affordable than many other sensor-based systems, and, unlike traditional wheel encoders, it is not affected by wheel slippage on rough surfaces [46]. However, VO also has drawbacks, including its dependence on computational resources and its sensitivity to lighting and environmental conditions, which can degrade its effectiveness.
- Place recognition: Place recognition allows the system to recognize a previously visited location from image content alone. When combined with other localization methods, it improves positioning accuracy and the overall robustness of the system.
- Integration approaches: vision-based localization is usually combined with additional sensor systems to improve performance.
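As referenced above, a minimal monocular visual odometry step can be sketched as follows, assuming OpenCV is available and the camera intrinsics K are known from calibration: ORB features are matched between consecutive frames, an essential matrix is estimated with RANSAC, and the relative rotation and (scale-ambiguous) translation are recovered. The intrinsic values and parameters shown are placeholders.

```python
import cv2
import numpy as np

def relative_motion(frame_prev, frame_curr, K):
    """Estimate rotation R and (unit-scale) translation t of the camera between two
    consecutive grayscale frames using ORB features and the essential matrix."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(frame_prev, None)
    kp2, des2 = orb.detectAndCompute(frame_curr, None)

    # Match descriptors between frames (Hamming distance for binary ORB descriptors).
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC-based essential matrix estimation rejects outlier matches, then recover R, t.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # translation is known only up to scale for a monocular camera

# Hypothetical calibration matrix; real values come from camera calibration.
# frame_prev / frame_curr would be consecutive grayscale frames, e.g. loaded with
# cv2.imread(path, cv2.IMREAD_GRAYSCALE).
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
```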
3.2. LiDAR-Based Localization
4. Mapping Technologies for Autonomous Navigation
4.1. What Is a Map?
4.1.1. Definition
4.1.2. General Maps Versus Autonomous Driving Maps
4.2. Types of Maps Used in Autonomous Systems
4.2.1. Real-Time (Online) Maps—Simultaneous Localization and Mapping (SLAM)
4.2.2. Prebuilt Maps
- Simplified map representations: Simplified map representations reduce the computational load while retaining essential navigational features. These maps are categorized primarily into topological, metric, and geometric maps.
  - Topological maps: Topological maps emphasize connectivity and relationships between locations rather than precise geometric details. They simplify high-level route planning and navigation by representing the environment as nodes and edges, highlighting key routes and intersections [52,53]. Topological maps provide abstract representations of paths and landmarks, useful for understanding the overall layout of an area without delving into precise measurements.
  - Metric maps: Metric maps provide detailed spatial information about the environment, including distances and relationships between objects. They are essential for precise navigation and obstacle avoidance in autonomous vehicles.
    - Landmark-based maps: Landmark-based navigation relies on identifying environmental features to determine the vehicle’s position and orientation. These maps utilize distinct landmarks such as buildings or traffic signs for localization and navigation [54]. They are especially handy in GPS-denied environments [55,56].
    - Occupancy grid maps: These maps represent the environment as a grid of cells, each storing the probability that it is occupied. They are widely used to distinguish drivable from non-drivable areas and are crucial for mobile robot perception and navigation, differentiating between free space and obstacles [57] (a minimal log-odds update sketch follows this list).
      - Octree: Hierarchical 3D grid systems are an efficient means of representing three-dimensional spaces. In an octree, every node has eight children, allowing progressively finer spatial partitioning. Octrees are particularly advantageous for extensive three-dimensional regions because they enable efficient management and processing of spatial data [58,59].
      - CostMap: A cost is assigned to each cell in the map, representing the difficulty of traversing different areas. These cell costs guide autonomous systems in choosing the safest and most optimal paths. CostMaps are important in dynamic obstacle avoidance and efficient path planning in autonomous navigation systems.
    - Geometric maps: Geometric maps give accurate geometrical data concerning objects and their locations in the surroundings. Such maps usually use vector data structures for representation, which is important for activities that require high precision, such as avoiding obstacles and navigating accurately through traffic. Geometric maps are also employed widely by urban planners to map infrastructure, including buildings and roads [60,61].
- High-accuracy maps—HD maps: Autonomous driving needs high-definition (HD) maps of fine resolution that yield centimeter-level precision. They contain elaborate representations of road environments, lane configurations, traffic signs, and obstacles, and are regularly updated to reflect real-time situations. Through HD maps, autonomous vehicles can navigate precisely, plan routes, and avoid obstacles, relying on various sensors such as LiDAR, cameras, and GPS for their creation.
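As referenced above, a minimal occupancy grid can be maintained with log-odds updates, as in the sketch below; the cell resolution and inverse sensor model increments are illustrative assumptions rather than values from any cited system.

```python
import numpy as np

class OccupancyGrid:
    """Minimal 2D occupancy grid with log-odds updates (cell size and sensor
    model values are illustrative)."""
    def __init__(self, width, height, resolution=0.1):
        self.resolution = resolution                  # metres per cell
        self.log_odds = np.zeros((height, width))     # 0 log-odds = 0.5 probability
        self.l_occ, self.l_free = 0.85, -0.4          # inverse sensor model increments

    def update_cell(self, x, y, occupied):
        """Fuse one observation of world point (x, y) given in metres."""
        col, row = int(x / self.resolution), int(y / self.resolution)
        self.log_odds[row, col] += self.l_occ if occupied else self.l_free

    def probability(self):
        """Convert log-odds back to occupancy probabilities in [0, 1]."""
        return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds))

grid = OccupancyGrid(width=100, height=100)
for _ in range(3):                     # three LiDAR hits on the same obstacle cell
    grid.update_cell(2.0, 3.5, occupied=True)
print(grid.probability()[35, 20])      # cell converges towards "occupied"
```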
4.3. HD Maps
4.3.1. Key Features of HD Maps
- Painted lines: captured and represented as defined 3D features that delineate the boundaries of a particular roadway.
- Traffic signs: these are strategically placed for easy identification and reaction.
- Three-dimensional building models: implemented to give a better view of cities and their spatial organization.
- Signals and stop lines: represented with their 3D positions and orientations so that the vehicle knows where to stop and how to act at intersections.
- Semantic data: encoding information concerning the driving features like lanes, junctions, and the different road segments into the maps for advanced driving support functionalities.
4.3.2. Main Categories of Mapping Information
- Topological representation
  - What does it denote? In the context of autonomous vehicles, a topological map is used to represent the connectivity of key elements—such as roads, junctions, traffic signals, and lane markings—without requiring exact geometric details. This abstraction enables efficient path planning by focusing on how these critical nodes are interconnected rather than their precise spatial positions [52,53].
  - Why is it necessary? It enables the system to understand how different regions are connected and related, which is important for route planning and navigation. For example, a topological view of a road network allows the system to find its way even when disruptions such as traffic or road works are present, since alternative routes can be computed [71].
  - Use case: It can provide alternative routes to take when there is heavy traffic (a small route-replanning sketch follows this list).
- Geometric representation
  - What does it denote? The configuration and relative positions of road surfaces, buildings, lanes, and other infrastructure [62].
  - Why is it necessary? Autonomous systems rely on geometric detail for localization, route design, and maneuvering. Vector data structures are most commonly used, reducing real-world objects to points, lines, polygons, and similar primitives.
  - Use case: Geometric representations come in handy for precise maneuvers, for instance whenever an autonomous vehicle has to position itself in a particular lane or steer clear of curbs and other objects.
- Semantic interpretation
  - What does it denote? It adds meaning and purpose to the spatial, structural, and topological aspects of the road network by attaching labels to roads and their junctions, road signs, pedestrian crossings, speed limits, etc.
  - Why is it necessary? Semantic information enables the machine to understand and interpret the environment it navigates. This is essential when decisions have to be made based on traffic rules and the current state of street traffic [54,72,73]. Examples: the HD map layers provided by HERE [74] and TomTom’s RoadDNA [75] contain object-level semantic features for vehicle positioning and enhanced decision-making.
  - Use case: The system identifies a traffic sign and performs the related operations.
- Dynamic elements
  - What does it denote? Elements outside the system’s control, such as moving people and vehicles, objects in the area surrounding the road, and temporary changes to the road itself, such as construction work and traffic congestion.
  - Why is it necessary? Real environments contain both fixed and moving objects, so autonomous systems must be able to react to them in real time for navigation to be safe. Timely and precise updates ensure that the HD map stays current and accurately describes the surroundings.
  - Use case: The vehicle adjusts its path and maneuvers around children, adults, and cyclists to avoid collisions.
- Feature-based map layers
  - What does it denote? Several map layers rich in distinctive elements that support the processes of localization and navigation [76].
  - Why is it necessary? These layers increase the maps’ precision and trustworthiness, helping the system operate effectively in environments with many structures, such as cities where outdoor navigation is challenging due to poor satellite reception [77].
  - Use case: In an urban setting surrounded by tall structures, feature-based map layers assist the vehicle in steering accurately by using readily identifiable features such as traffic lights.
4.3.3. HD Map Creation Pipeline
- Data collection: The first stage is collecting the data needed to develop an HD map. A vehicle fitted with high-accuracy sensors and improved calibration techniques is sent out to map and gather extensive details of the surroundings. Mapping vehicles are usually fitted with Mobile Mapping Systems (MMSs) [80,81], which typically consist of several sensors. Figure 7 shows a Mobile Mapping System in use with the relevant sensors. For instance, the LiDAR, GNSS, IMU, and camera of an MMS work together to deliver a dense and accurate three-dimensional point cloud of the scanned scene. Cameras photograph the environment to produce high-resolution images capturing details such as road markings, traffic signs, and other features. Another type of sensor found in the MMS is the Global Navigation Satellite System (GNSS); GNSS receivers can connect to several satellite systems simultaneously, improving measurement accuracy. A GNSS sensor is frequently integrated with an Inertial Measurement Unit (IMU) to compute the moving platform’s trajectory, including its position, velocity, and orientation in space [82,83]. Once the data are collected, they are processed through a series of steps to generate an HD map. The data collection process must ensure the integrity and quality of the data to achieve accurate HD maps.
- Data preprocessing and alignment: Data preprocessing and alignment are extremely important in HD map generation, as they increase the accuracy, consistency, and quality of the collected data. Preprocessing manipulates the data gathered from the different sensors by cleaning and filtering them to remove unwanted signals, outliers, and inconsistencies, ensuring the information is as good as possible and fit for the subsequent processes. Figure 8 shows the use of CloudCompare v2.12.4 software [85] to preprocess point cloud data by integrating point cloud segments and applying SOR filtering to remove noise and enhance data accuracy. Alignment usually consists of sensor calibration, time synchronization, and the registration of spatial coordinates [86]. Calibration is essential to obtain precise readings and to position the sensor data properly; time synchronization allows scan data from different sensors to be combined; and registration, the last stage, transforms the data into a common coordinate frame so that the sensor data are positioned correctly.
- Feature extraction: Feature extraction is an essential part of the HD map creation pipeline, wherein relevant information is identified and extracted from the processed data. It entails spotting and drawing out key features from the aligned data, such as road edges, lane markings, road signs, and other pertinent objects. Numerous methods can be employed for feature extraction, for instance:
  - LiDAR feature extraction: a LiDAR sensor captures a very dense 3D point cloud, which is then processed to extract features such as road edges, curbs, and obstacles; these features can be identified and extracted using techniques such as clustering, segmentation, and other machine learning algorithms [87,88,89,90] (a simplified ground-plane extraction sketch follows this list).
- Map creation and updating: The last phase of the HD map building process consists of creating or editing the map using the detected features. This phase usually concerns the following:
  - Map data structure—the detected features have to be transformed into a proper representation so that they can be efficiently stored, accessed, and worked upon (for example, a graph or a tree) [97].
  - Map validation—the process of checking the quality and coherence of the HD map content against reference data, for example, collected surveys or other maps. Different methods may be utilized within the validation process, such as visual or statistical inspection, or inference-based techniques [100].
  - Map updating—modifying the HD map to include any changes in its surroundings, for example, new roads, lane reconfigurations, or the presence of temporary obstacles. This also includes real-time data updates reflecting constantly changing circumstances and the vehicle’s location within the existing local map [101,102].
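As referenced in the feature extraction step, one common LiDAR processing primitive is separating the road surface from raised features. The sketch below fits a ground plane to a synthetic point cloud with a simple RANSAC loop; the iteration count, inlier threshold, and synthetic data are illustrative assumptions, not a production pipeline.

```python
import numpy as np

def ransac_ground_plane(points, iterations=200, threshold=0.05, rng=np.random.default_rng(0)):
    """Fit a ground plane to an (N, 3) LiDAR point cloud with RANSAC and return a
    boolean mask of inlier (road-surface) points. Threshold is in metres."""
    best_mask, best_count = None, 0
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                        # degenerate (collinear) sample
            continue
        normal /= norm
        distances = np.abs((points - sample[0]) @ normal)
        mask = distances < threshold
        if mask.sum() > best_count:
            best_mask, best_count = mask, mask.sum()
    return best_mask

# Synthetic scene: a flat road at z = 0 plus a raised curb/sign cluster.
rng = np.random.default_rng(1)
road = np.column_stack([rng.uniform(-10, 10, 500), rng.uniform(-10, 10, 500), rng.normal(0, 0.01, 500)])
curb = np.column_stack([rng.uniform(-10, 10, 50), rng.uniform(9, 10, 50), rng.uniform(0.1, 0.3, 50)])
cloud = np.vstack([road, curb])
ground = ransac_ground_plane(cloud)
print(ground.sum(), "ground points;", (~ground).sum(), "raised (curb/obstacle) points")
```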
4.3.4. Advantages of HD Maps
- Better route design: HD maps provide detailed information on the structure and configuration of roads, including lanes and traffic regulations, making route planning more reliable and less risky.
4.3.5. Limitation of HD Maps
- High cost of creation and maintenance: Creating and maintaining high-definition maps is resource-heavy [107]. Accurately mapping a city or region can take years and enormous amounts of money. In addition, real-world conditions are always changing, so the maps have to be corrected frequently and at great expense [101,108]. Various methods are used to keep the map constantly updated, one being the SLAMCU algorithm, which effectively detects and updates high-definition map changes, improving autonomous driving by providing the most accurate environmental information [102].
- Standardization: The absence of common formats and protocols for HD maps limits sharing and cooperation between the different players in the autonomous driving value chain [107]. Efforts such as the Open HD Map Service Model (OHDMSM) (Figure 9 shows the OHDMSM framework) aim to create interoperable HD map data models, providing a baseline that would assist HD map development and unify data fusion and applications across platforms [109].
- Data storage: HD maps are also quite large, involving several terabytes of digital storage space whenever large city-based map coverage is needed [110]. This presents a problem to self-driving cars as storing the whole map at once may not be possible. Rather, the vehicle can connect to the internet, which allows for the maps to be continuously uploaded and downloaded on the fly. However, there are disadvantages to this approach, such as the need for considerable bandwidth, the risk of congestion, or even unavailability of the networks in areas that are poorly connected [111]. Due to the above reasons, there are heavy cost implications and technical difficulties in using HD maps on a sustained basis.
- Third-party dependency: HD maps are normally made available by a handful of specialized vendors, making reliance on those third parties for map generation, modification, and maintenance unavoidable. This dependence can further restrict the flexibility and adaptability of autonomous driving systems, since their success depends on how quickly and effectively the respective vendors update their maps [112].
- Inapplicability to off-road use cases: HD maps assist in navigating moderate traffic and urban public roads. However, they are not very useful for extreme unstructured environments, such as in mining or agriculture or even in rural areas devoid of paved roads. Most often, in such surroundings, the accuracy of the HD map is either not required or is meaningless.
4.4. The Mapless Approach: A New Way Forward?
4.5. Key Technologies for Mapless Approach
- SLAM (Simultaneous Localization and Mapping): SLAM has been implemented in various mapless navigation systems to build and update a map of the environment while simultaneously localizing the vehicle within it [118].
5. Simultaneous Localization and Mapping (SLAM) for Navigation
5.1. Visual SLAM
5.2. Filter-Based SLAM
- Initialization step: The state vector consists of the robot’s pose (position and orientation) and the environment map. Initially, the state vector is set to the starting pose $\mathbf{x}_0 = [x_0, y_0, \theta_0]^T$; if the starting coordinates are unknown, it is initialized as $\mathbf{x}_0 = [0, 0, 0]^T$. If there are any landmarks $m_1, \dots, m_n$, they can also be appended to the state vector as $\mathbf{x} = [x, y, \theta, m_1, \dots, m_n]^T$. The covariance matrix $\mathbf{P}$, which accounts for the uncertainty in the robot’s pose, is also initialized; its diagonal elements indicate how certain the initial estimates are.
- Prediction step: Odometry data are used as control inputs for predicting the next state. A motion model $f(\mathbf{x}, \mathbf{u})$ describes the robot’s movement given these inputs. The current state of the robot and the control inputs $\mathbf{u}_k$ are passed to the model to calculate the predicted state $\mathbf{x}_{k|k-1} = f(\mathbf{x}_{k-1}, \mathbf{u}_k)$. The covariance matrix is also updated to reflect the increase in uncertainty due to the robot’s motion: the Jacobian $\mathbf{F}$ of the motion model is calculated to linearize the model around the current state, and the updated covariance matrix is $\mathbf{P}_{k|k-1} = \mathbf{F}\,\mathbf{P}_{k-1}\,\mathbf{F}^T + \mathbf{Q}$, where $\mathbf{Q}$ is the process noise covariance.
- Correction step: The sensor measurements are incorporated to correct the predicted state. These measurements are obtained from sensors such as a camera, LiDAR, or IMU and provide information about the environment and the robot’s interaction with it. A measurement model $h(\mathbf{x})$ gives the expected measurement for the predicted state, and its Jacobian $\mathbf{H}$ is calculated to linearize it with respect to the state vector. This is used to compute the Kalman gain $\mathbf{K} = \mathbf{P}_{k|k-1}\mathbf{H}^T(\mathbf{H}\mathbf{P}_{k|k-1}\mathbf{H}^T + \mathbf{R})^{-1}$, where $\mathbf{R}$ is the measurement noise covariance. The gain corrects the predicted state based on the difference between the actual and expected measurements, $\mathbf{x}_k = \mathbf{x}_{k|k-1} + \mathbf{K}(\mathbf{z}_k - h(\mathbf{x}_{k|k-1}))$, and the covariance matrix is updated accordingly as $\mathbf{P}_k = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}_{k|k-1}$ (a minimal numerical sketch of these steps follows this list).
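A minimal numerical version of these predict/correct steps is sketched below for a pose-only state [x, y, θ] with a unicycle motion model and a direct (x, y) position measurement; the noise covariances, time step, and measurement values are illustrative, and a full SLAM filter would additionally keep landmarks in the state vector.

```python
import numpy as np

dt = 0.1
Q = np.diag([0.02, 0.02, 0.01])          # process (motion) noise covariance
R = np.diag([0.5, 0.5])                  # measurement noise covariance

def predict(x, P, v, w):
    """EKF prediction with a unicycle motion model driven by odometry (v, w)."""
    theta = x[2]
    x_pred = x + np.array([v * dt * np.cos(theta), v * dt * np.sin(theta), w * dt])
    F = np.array([[1, 0, -v * dt * np.sin(theta)],    # Jacobian of the motion model
                  [0, 1,  v * dt * np.cos(theta)],
                  [0, 0, 1]])
    return x_pred, F @ P @ F.T + Q

def correct(x, P, z):
    """EKF correction with a direct (x, y) position measurement."""
    H = np.array([[1, 0, 0], [0, 1, 0]])              # Jacobian of the measurement model
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                    # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(3) - K @ H) @ P
    return x, P

x, P = np.zeros(3), np.eye(3) * 0.1                   # start at the origin with small uncertainty
for _ in range(10):                                   # drive forward at 1 m/s while turning slightly
    x, P = predict(x, P, v=1.0, w=0.1)
x, P = correct(x, P, z=np.array([0.98, 0.06]))
print(x, np.diag(P))
```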
5.3. Kalman Filter and Its Variations
5.3.1. Comparison and Applications
- The EKF is simpler to implement and computationally efficient for mildly nonlinear systems.
- The UKF offers superior performance for highly nonlinear systems, potentially providing more accurate uncertainty representation.
5.3.2. Hardware Implementation Challenges
- Real-world environments introduce more noise and variability than simulated scenarios.
- Sensor noise and limited precision can affect the accuracy of state estimates.
- The computational requirements of the UKF can be higher, potentially limiting real-time performance on resource-constrained hardware.
5.4. Particle Filter and Its Variations
6. Challenges
- Dynamism and unpredictability: Real roads rarely match the idealized conditions for which autonomous vehicles are designed. Construction sites, other vehicles, and pedestrians behave unpredictably and may not interact directly with the car’s sensors, so robust autonomous navigation must cope with incomplete and constantly changing traffic information.
- Weather and lighting conditions: Autonomous systems receive the most accurate sensor data under optimal conditions, but rain, snow, overcast skies, and the time of day can all reduce sensor effectiveness. Even with GAN-based restoration models and thermal sensors under development, weather remains a critical limitation for autonomous navigation.
- Occlusion and re-identification: Maintaining object identities across frames is difficult, especially under occlusion, when one or more tracked objects are hidden behind others. Although trackers such as DeepSORT and ByteTrack are effective in environments with moderate crowding, maintaining identities in highly complex, crowded scenes remains challenging.
- High computational demands: Real-time performance across a variety of tasks, including identifying and mapping objects from three-dimensional sensors, is now technically achievable. However, it requires expensive, high-end hardware, which limits large-scale, cost-effective deployment in real-world vehicles.
- Data bias: Datasets such as KITTI, nuScenes, and the Waymo Open Dataset are used to train autonomous vehicle models. Although these datasets are quite broad, they remain limited in geographic coverage, weather conditions, and traffic culture, which can hinder model performance in regions and conditions they do not represent.
- Cybersecurity risks: Because AVs constantly communicate with each other, transmitting map updates and sensor feedback, they are more exposed to cyber-attacks. A compromised system can affect navigation, violate privacy, or even cause collisions, which is why robust security measures are required.
- Fusion of different technologies: Integrating sensing technologies such as cameras, LiDAR, radar, and GNSS is challenging. Each sensor comes with its own set of limitations, and amalgamating their outputs into a single system requires sensor fusion algorithms that are both computationally efficient and robust.
- Legal and social issues: Legislation has not kept pace with technological advances, leaving gaps in safety standards and in the regulations governing liability when a vehicle is involved in an accident. Ethical problems arise when a decision must be made in an unavoidable accident, which further complicates the integration of AVs.
- Scalability and deployment in untouched or remote areas: Existing AV technologies are built around large cities and well-mapped suburbs with amenities such as lane markings and HD maps. Adapting these systems to rural or cross-country operation remains an uphill task due to sparse data availability and degraded sensor performance on unpaved roads.
7. Discussion and Future Scope
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 1–21. [Google Scholar]
- Vijayarajan, V.; Rajeshkannan, R.; Rajkumar Dhinakaran, R. Automatic detection of moving objects using Kalman algorithm. Int. J. Pharm. Technol. IJPT 2016, 8, 18963–18970. [Google Scholar]
- Aharon, N.; Orfaig, R.; Bobrovsky, B.Z. BoT-SORT: Robust associations multi-pedestrian tracking. arXiv 2022, arXiv:2206.14651. [Google Scholar]
- Grisetti, G.; Kümmerle, R.; Stachniss, C.; Burgard, W. A tutorial on graph-based SLAM. IEEE Intell. Transp. Syst. Mag. 2010, 2, 31–43. [Google Scholar] [CrossRef]
- Liu, F.; Lu, Z.; Lin, X. Vision-based environmental perception for autonomous driving. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2025, 239, 39–69. [Google Scholar] [CrossRef]
- Liang, L.; Ma, H.; Zhao, L.; Xie, X.; Hua, C.; Zhang, M.; Zhang, Y. Vehicle Detection Algorithms for Autonomous Driving: A Review. Sensors 2024, 24, 3088. [Google Scholar] [CrossRef]
- Turay, T.; Vladimirova, T. Toward performing image classification and object detection with convolutional neural networks in autonomous driving systems: A survey. IEEE Access 2022, 10, 14076–14119. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- Girshick, R. Fast r-cnn. arXiv 2015, arXiv:1504.08083. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Wu, B.; Iandola, F.; Jin, P.H.; Keutzer, K. Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 129–137. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 16965–16974. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Hnewa, M.; Radha, H. Object detection under rainy conditions for autonomous vehicles: A review of state-of-the-art and emerging techniques. IEEE Signal Process. Mag. 2020, 38, 53–67. [Google Scholar] [CrossRef]
- Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive generative adversarial network for raindrop removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2482–2491. [Google Scholar]
- Ren, D.; Zuo, W.; Hu, Q.; Zhu, P.; Meng, D. Progressive image deraining networks: A better and simpler baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3937–3946. [Google Scholar]
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348. [Google Scholar]
- Lin, H.; Parsi, A.; Mullins, D.; Horgan, J.; Ward, E.; Eising, C.; Denny, P.; Deegan, B.; Glavin, M.; Jones, E. A Study on Data Selection for Object Detection in Various Lighting Conditions for Autonomous Vehicles. J. Imaging 2024, 10, 153. [Google Scholar] [CrossRef]
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2636–2645. [Google Scholar]
- Brostow, G.J.; Fauqueur, J.; Cipolla, R. Semantic object classes in video: A high-definition ground truth database. Pattern Recognit. Lett. 2009, 30, 88–97. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Varma, G.; Subramanian, A.; Namboodiri, A.; Chandraker, M.; Jawahar, C. IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1743–1751. [Google Scholar]
- Neuhold, G.; Ollmann, T.; Rota Bulo, S.; Kontschieder, P. The mapillary vistas dataset for semantic understanding of street scenes. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4990–4999. [Google Scholar]
- Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; Lopez, A.M. The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3234–3243. [Google Scholar]
- Richter, S.R.; Hayder, Z.; Koltun, V. Playing for benchmarks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2213–2222. [Google Scholar]
- Huang, X.; Cheng, X.; Geng, Q.; Cao, B.; Zhou, D.; Wang, P.; Lin, Y.; Yang, R. The apolloscape dataset for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 954–960. [Google Scholar]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454. [Google Scholar]
- Tithi, J.J.; Aananthakrishnan, S.; Petrini, F. Online and Real-time Object Tracking Algorithm with Extremely Small Matrices. arXiv 2020, arXiv:2003.12091. [Google Scholar]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Sridhar, V.H.; Roche, D.G.; Gingins, S. Tracktor: Image-based automated tracking of animal movement and behaviour. Methods Ecol. Evol. 2019, 10, 815–820. [Google Scholar] [CrossRef]
- Abouelyazid, M. Comparative Evaluation of SORT, DeepSORT, and ByteTrack for Multiple Object Tracking in Highway Videos. Int. J. Sustain. Infrastruct. Cities Soc. 2023, 8, 42–52. [Google Scholar]
- Li, Y.; Xiao, Z.; Yang, L.; Meng, D.; Zhou, X.; Fan, H.; Zhang, L. AttMOT: Improving multiple-object tracking by introducing auxiliary pedestrian attributes. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 5454–5468. [Google Scholar] [CrossRef] [PubMed]
- Alarcon, N. DRIVE Labs: How Localization Helps Vehicles Find Their Way | NVIDIA Technical Blog. 2022. Available online: https://developer.nvidia.com/blog/drive-labs-how-localization-helps-vehicles-find-their-way/ (accessed on 9 March 2025).
- Azzam, R.; Taha, T.; Huang, S.; Zweiri, Y. Feature-based visual simultaneous localization and mapping: A survey. SN Appl. Sci. 2020, 2, 224. [Google Scholar] [CrossRef]
- Agostinho, L.R.; Ricardo, N.M.; Pereira, M.I.; Hiolle, A.; Pinto, A.M. A practical survey on visual odometry for autonomous driving in challenging scenarios and conditions. IEEE Access 2022, 10, 72182–72205. [Google Scholar] [CrossRef]
- Abdelaziz, N.; El-Rabbany, A. INS/LIDAR/Stereo SLAM Integration for Precision Navigation in GNSS-Denied Environments. Sensors 2023, 23, 7424. [Google Scholar] [CrossRef] [PubMed]
- Yin, H.; Xu, X.; Lu, S.; Chen, X.; Xiong, R.; Shen, S.; Stachniss, C.; Wang, Y. A survey on global lidar localization: Challenges, advances and open problems. Int. J. Comput. Vis. 2024, 132, 3139–3171. [Google Scholar] [CrossRef]
- Wang, H.; Yin, Y.; Jing, Q. Comparative analysis of 3D LiDAR scan-matching methods for state estimation of autonomous surface vessel. J. Mar. Sci. Eng. 2023, 11, 840. [Google Scholar] [CrossRef]
- Golledge, R.G.; Gärling, T. Cognitive maps and urban travel. In Handbook of Transport Geography and Spatial Systems; Emerald Group Publishing Limited: Bingley, UK, 2004; pp. 501–512. [Google Scholar]
- Epstein, R.A.; Patai, E.Z.; Julian, J.B.; Spiers, H.J. The cognitive map in humans: Spatial navigation and beyond. Nat. Neurosci. 2017, 20, 1504–1513. [Google Scholar] [CrossRef]
- Qi, Y.; Wang, R.; He, B.; Lu, F.; Xu, Y. Compact and efficient topological mapping for large-scale environment with pruned Voronoi diagram. Drones 2022, 6, 183. [Google Scholar] [CrossRef]
- Rawlinson, D.; Jarvis, R. Topologically-directed navigation. Robotica 2008, 26, 189–203. [Google Scholar] [CrossRef]
- Murali, V.; Chiu, H.P.; Samarasekera, S.; Kumar, R.T. Utilizing semantic visual landmarks for precise vehicle navigation. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–8. [Google Scholar]
- Levinson, J.; Montemerlo, M.; Thrun, S. Map-based precision vehicle localization in urban environments. In Proceedings of the Robotics: Science and Systems, Atlanta, GA, USA, 27–30 June 2007; Volume 4, pp. 121–128. [Google Scholar]
- Sundar, K.; Srinivasan, S.; Misra, S.; Rathinam, S.; Sharma, R. Landmark Placement for Localization in a GPS-denied Environment. In Proceedings of the 2018 Annual American Control Conference (ACC), Milwaukee, WI, USA, 27–29 June 2018; pp. 2769–2775. [Google Scholar]
- Li, Y.; Ruichek, Y. Occupancy grid mapping in urban environments from a moving on-board stereo-vision system. Sensors 2014, 14, 10454–10478. [Google Scholar] [CrossRef]
- Hornung, A.; Wurm, K.M.; Bennewitz, M.; Stachniss, C.; Burgard, W. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Auton. Robot. 2013, 34, 189–206. [Google Scholar] [CrossRef]
- Leven, J.; Corso, J.; Cohen, J.; Kumar, S. Interactive visualization of unstructured grids using hierarchical 3D textures. In Proceedings of the Symposium on Volume Visualization and Graphics, Boston, MA, USA, 28–29 October 2002; pp. 37–44. [Google Scholar]
- Lafarge, F.; Mallet, C. Creating large-scale city models from 3D-point clouds: A robust approach with hybrid representation. Int. J. Comput. Vis. 2012, 99, 69–85. [Google Scholar] [CrossRef]
- Wolf, D.; Howard, A.; Sukhatme, G.S. Towards geometric 3D mapping of outdoor environments using mobile robots. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 1507–1512. [Google Scholar]
- Ebrahimi Soorchaei, B.; Razzaghpour, M.; Valiente, R.; Raftari, A.; Fallah, Y.P. High-definition map representation techniques for automated vehicles. Electronics 2022, 11, 3374. [Google Scholar] [CrossRef]
- Elghazaly, G.; Frank, R.; Harvey, S.; Safko, S. High-definition maps: Comprehensive survey, challenges and future perspectives. IEEE Open J. Intell. Transp. Syst. 2023, 4, 527–550. [Google Scholar] [CrossRef]
- Asrat, K.T.; Cho, H.J. A Comprehensive Survey on High-Definition Map Generation and Maintenance. ISPRS Int. J. Geo-Inf. 2024, 13, 232. [Google Scholar] [CrossRef]
- Charroud, A.; El Moutaouakil, K.; Palade, V.; Yahyaouy, A.; Onyekpe, U.; Eyo, E.U. Localization and Mapping for Self-Driving Vehicles: A Survey. Machines 2024, 12, 118. [Google Scholar] [CrossRef]
- Wong, K.; Gu, Y.; Kamijo, S. Mapping for autonomous driving: Opportunities and challenges. IEEE Intell. Transp. Syst. Mag. 2020, 13, 91–106. [Google Scholar] [CrossRef]
- Li, T.; Zhang, H.; Gao, Z.; Chen, Q.; Niu, X. High-accuracy positioning in urban environments using single-frequency multi-GNSS RTK/MEMS-IMU integration. Remote Sens. 2018, 10, 205. [Google Scholar] [CrossRef]
- Ma, H.; Zhao, Q.; Verhagen, S.; Psychas, D.; Liu, X. Assessing the performance of multi-GNSS PPP-RTK in the local area. Remote Sens. 2020, 12, 3343. [Google Scholar] [CrossRef]
- Aldibaja, M.; Suganuma, N.; Yoneda, K.; Yanase, R. Challenging environments for precise mapping using GNSS/INS-RTK systems: Reasons and analysis. Remote Sens. 2022, 14, 4058. [Google Scholar] [CrossRef]
- Gargoum, S.A.; Basyouny, K.E. A literature synthesis of LiDAR applications in transportation: Feature extraction and geometric assessments of highways. GISci. Remote Sens. 2019, 56, 864–893. [Google Scholar] [CrossRef]
- Blochliger, F.; Fehr, M.; Dymczyk, M.; Schneider, T.; Siegwart, R. Topomap: Topological mapping and navigation based on visual slam maps. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 3818–3825. [Google Scholar]
- Drouilly, R.; Rives, P.; Morisset, B. Semantic representation for navigation in large-scale environments. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1106–1111. [Google Scholar]
- Kumpakeaw, S.; Dillmann, R. Semantic road maps for autonomous vehicles. In Proceedings of the Autonome Mobile Systeme 2007: 20. Fachgespräch Kaiserslautern, Kaiserslautern, Germany, 18–19 October 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 205–211. [Google Scholar]
- Map Rendering | Mapping Technology | Platform | HERE. Available online: https://www.here.com/platform/map-rendering (accessed on 13 March 2025).
- HD Map | TomTom. Available online: https://www.tomtom.com/products/orbis-maps-for-automation/ (accessed on 13 March 2025).
- Berrio, J.S.; Ward, J.; Worrall, S.; Nebot, E. Identifying robust landmarks in feature-based maps. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 1166–1172. [Google Scholar]
- Kim, C.; Cho, S.; Sunwoo, M.; Jo, K. Crowd-sourced mapping of new feature layer for high-definition map. Sensors 2018, 18, 4172. [Google Scholar] [CrossRef]
- Scholtes, M.; Westhofen, L.; Turner, L.R.; Lotto, K.; Schuldes, M.; Weber, H.; Wagener, N.; Neurohr, C.; Bollmann, M.H.; Körtke, F.; et al. 6-Layer Model for a Structured Description and Categorization of Urban Traffic and Environment. IEEE Access 2021, 9, 59131–59147. [Google Scholar] [CrossRef]
- Stepanyants, V.; Romanov, A. An Object-Oriented Approach to a Structured Description of Machine Perception and Traffic Participant Interactions in Traffic Scenarios. In Proceedings of the 2022 IEEE 7th International Conference on Intelligent Transportation Engineering (ICITE), Beijing, China, 11–13 November 2022; pp. 197–203. [Google Scholar] [CrossRef]
- Elhashash, M.; Albanwan, H.; Qin, R. A review of mobile mapping systems: From sensors to applications. Sensors 2022, 22, 4262. [Google Scholar] [CrossRef] [PubMed]
- Chang, Y.F.; Chiang, K.W.; Tsai, M.L.; Lee, P.L.; Zeng, J.C.; El-Sheimy, N.; Darweesh, H. The implementation of semi-automated road surface markings extraction schemes utilizing mobile laser scanned point clouds for HD maps production. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 48, 93–100. [Google Scholar] [CrossRef]
- Peng, C.W.; Hsu, C.C.; Wang, W.Y. Cost effective mobile mapping system for color point cloud reconstruction. Sensors 2020, 20, 6536. [Google Scholar] [CrossRef]
- Ilci, V.; Toth, C. High definition 3D map creation using GNSS/IMU/LiDAR sensor integration to support autonomous vehicle navigation. Sensors 2020, 20, 899. [Google Scholar] [CrossRef] [PubMed]
- Ben Elallid, B.; Benamar, N.; Senhaji Hafid, A.; Rachidi, T.; Mrani, N. A Comprehensive Survey on the Application of Deep and Reinforcement Learning Approaches in Autonomous Driving. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 7366–7390. [Google Scholar] [CrossRef]
- Girardeau-Montaut, D. CloudCompare—Open Source Project—danielgm.net. Available online: https://www.danielgm.net/cc/ (accessed on 5 March 2025).
- Gholami Shahbandi, S.; Magnusson, M. 2D map alignment with region decomposition. Auton. Robot. 2019, 43, 1117–1136. [Google Scholar] [CrossRef]
- Xu, S.; Wang, R.; Zheng, H. Road curb extraction from mobile LiDAR point clouds. IEEE Trans. Geosci. Remote Sens. 2016, 55, 996–1009. [Google Scholar] [CrossRef]
- Kumar, P.; McElhinney, C.P.; Lewis, P.; McCarthy, T. An automated algorithm for extracting road edges from terrestrial mobile LiDAR data. ISPRS J. Photogramm. Remote Sens. 2013, 85, 44–55. [Google Scholar] [CrossRef]
- Kuang, H.; Wang, B.; An, J.; Zhang, M.; Zhang, Z. Voxel-FPN: Multi-scale voxel feature aggregation for 3D object detection from LIDAR point clouds. Sensors 2020, 20, 704. [Google Scholar] [CrossRef]
- Li, Y.; Olson, E.B. Extracting general-purpose features from LIDAR data. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–8 May 2010; pp. 1388–1393. [Google Scholar]
- Yin, R.; Cheng, Y.; Wu, H.; Song, Y.; Yu, B.; Niu, R. Fusionlane: Multi-sensor fusion for lane marking semantic segmentation using deep neural networks. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1543–1553. [Google Scholar] [CrossRef]
- Tian, W.; Yu, X.; Hu, H. Interactive attention learning on detection of lane and lane marking on the road by monocular camera image. Sensors 2023, 23, 6545. [Google Scholar] [CrossRef] [PubMed]
- Zhao, X.; Sun, P.; Xu, Z.; Min, H.; Yu, H. Fusion of 3D LIDAR and camera data for object detection in autonomous vehicle applications. IEEE Sens. J. 2020, 20, 4901–4913. [Google Scholar] [CrossRef]
- Zhao, L.; Zhou, H.; Zhu, X.; Song, X.; Li, H.; Tao, W. Lif-seg: Lidar and camera image fusion for 3d lidar semantic segmentation. IEEE Trans. Multimed. 2023, 26, 1158–1168. [Google Scholar] [CrossRef]
- Lagahit, M.L.R.; Matsuoka, M. Focal Combo Loss for Improved Road Marking Extraction of Sparse Mobile LiDAR Scanning Point Cloud-Derived Images Using Convolutional Neural Networks. Remote Sens. 2023, 15, 597. [Google Scholar] [CrossRef]
- Huang, A.S.; Moore, D.; Antone, M.; Olson, E.; Teller, S. Finding multiple lanes in urban road networks with vision and lidar. Auton. Robot. 2009, 26, 103–122. [Google Scholar] [CrossRef]
- Zheng, C.; Cao, X.; Tang, K.; Cao, Z.; Sizikova, E.; Zhou, T.; Li, E.; Liu, A.; Zou, S.; Yan, X.; et al. High-definition map automatic annotation system based on active learning. AI Mag. 2023, 44, 418–430. [Google Scholar] [CrossRef]
- Li, Q.; Wang, Y.; Wang, Y.; Zhao, H. Hdmapnet: An online hd map construction and evaluation framework. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 4628–4634. [Google Scholar]
- Elhousni, M.; Lyu, Y.; Zhang, Z.; Huang, X. Automatic building and labeling of hd maps with deep learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13255–13260. [Google Scholar]
- He, L.; Jiang, S.; Liang, X.; Wang, N.; Song, S. Diff-net: Image feature difference based high-definition map change detection for autonomous driving. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 2635–2641. [Google Scholar]
- Zhang, P.; Zhang, M.; Liu, J. Real-time HD map change detection for crowdsourcing update based on mid-to-high-end sensors. Sensors 2021, 21, 2477. [Google Scholar] [CrossRef]
- Jo, K.; Kim, C.; Sunwoo, M. Simultaneous localization and map change update for the high definition map-based autonomous driving car. Sensors 2018, 18, 3145. [Google Scholar] [CrossRef]
- Alonso, I.P.; Llorca, D.F.; Gavilan, M.; Pardo, S.Á.; García-Garrido, M.Á.; Vlacic, L.; Sotelo, M.Á. Accurate global localization using visual odometry and digital maps on urban environments. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1535–1545. [Google Scholar] [CrossRef]
- Kang, J.M.; Yoon, T.S.; Kim, E.; Park, J.B. Lane-level map-matching method for vehicle localization using GPS and camera on a high-definition map. Sensors 2020, 20, 2166. [Google Scholar] [CrossRef]
- Vargas, J.; Alsweiss, S.; Toker, O.; Razdan, R.; Santos, J. An overview of autonomous vehicles sensors and their vulnerability to weather conditions. Sensors 2021, 21, 5397. [Google Scholar] [CrossRef] [PubMed]
- Wang, W.; You, X.; Chen, L.; Tian, J.; Tang, F.; Zhang, L. A scalable and accurate de-snowing algorithm for LiDAR point clouds in winter. Remote Sens. 2022, 14, 1468. [Google Scholar] [CrossRef]
- Tsushima, F.; Kishimoto, N.; Okada, Y.; Che, W. Creation of high definition map for autonomous driving. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 415–420. [Google Scholar] [CrossRef]
- Zhang, H.; Venkatramani, S.; Paz, D.; Li, Q.; Xiang, H.; Christensen, H.I. Probabilistic semantic mapping for autonomous driving in urban environments. Sensors 2023, 23, 6504. [Google Scholar] [CrossRef]
- Zhang, F.; Shi, W.; Chen, M.; Huang, W.; Liu, X. Open HD map service model: An interoperable high-definition map data model for autonomous driving. Int. J. Digit. Earth 2023, 16, 2089–2110. [Google Scholar] [CrossRef]
- Ma, W.C.; Tartavull, I.; Bârsan, I.A.; Wang, S.; Bai, M.; Mattyus, G.; Homayounfar, N.; Lakshmikanth, S.K.; Pokrovsky, A.; Urtasun, R. Exploiting sparse semantic HD maps for self-driving vehicle localization. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 5304–5311. [Google Scholar]
- Barsi, A.; Poto, V.; Somogyi, A.; Lovas, T.; Tihanyi, V.; Szalay, Z. Supporting autonomous vehicles by creating HD maps. Prod. Eng. Arch. 2017, 16, 43–46. [Google Scholar] [CrossRef]
- Taeihagh, A.; Lim, H.S.M. Governing autonomous vehicles: Emerging responses for safety, liability, privacy, cybersecurity, and industry risks. Transp. Rev. 2019, 39, 103–128. [Google Scholar] [CrossRef]
- Linkov, V.; Zámečník, P.; Havlíčková, D.; Pai, C.W. Human factors in the cybersecurity of autonomous vehicles: Trends in current research. Front. Psychol. 2019, 10, 995. [Google Scholar] [CrossRef] [PubMed]
- Parkinson, S.; Ward, P.; Wilson, K.; Miller, J. Cyber threats facing autonomous and connected vehicles: Future challenges. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2898–2915. [Google Scholar] [CrossRef]
- Chattopadhyay, A.; Lam, K.Y.; Tavva, Y. Autonomous vehicle: Security by design. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7015–7029. [Google Scholar] [CrossRef]
- Lee, S.; Ryu, J.H. Autonomous Vehicle Localization Without Prior High-Definition Map. IEEE Trans. Robot. 2024, 40, 2888–2906. [Google Scholar] [CrossRef]
- Shaviv, I. Benefits of Mapless Autonomous Driving Technology; Imagry—AI Mapless Autonomous Driving Software Company: San Jose, CA, USA, 2024. [Google Scholar]
- Guzel, M.S.; Bicker, R. A behaviour-based architecture for mapless navigation using vision. Int. J. Adv. Robot. Syst. 2012, 9, 18. [Google Scholar] [CrossRef]
- Xue, H.; Hein, B.; Bakr, M.; Schildbach, G.; Abel, B.; Rueckert, E. Using deep reinforcement learning with automatic curriculum learning for mapless navigation in intralogistics. Appl. Sci. 2022, 12, 3153. [Google Scholar] [CrossRef]
- Wang, N.; Wang, Y.; Zhao, Y.; Wang, Y.; Li, Z. Sim-to-real: Mapless navigation for USVs using deep reinforcement learning. J. Mar. Sci. Eng. 2022, 10, 895. [Google Scholar] [CrossRef]
- Pavel, M.I.; Tan, S.Y.; Abdullah, A. Vision-based autonomous vehicle systems based on deep learning: A systematic literature review. Appl. Sci. 2022, 12, 6831. [Google Scholar] [CrossRef]
- Baten, S.; Lutzeler, M.; Dickmanns, E.D.; Mandelbaum, R.; Burt, P.J. Techniques for autonomous, off-road navigation. IEEE Intell. Syst. Their Appl. 1998, 13, 57–65. [Google Scholar] [CrossRef]
- Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 1–11. [Google Scholar] [CrossRef]
- Civera, J.; Davison, A.J.; Montiel, J.M. Inverse depth parametrization for monocular SLAM. IEEE Trans. Robot. 2008, 24, 932–945. [Google Scholar] [CrossRef]
- Eade, E.; Drummond, T. Scalable monocular SLAM. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 1, pp. 469–476. [Google Scholar]
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 225–234. [Google Scholar]
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Schlegel, D.; Colosi, M.; Grisetti, G. ProSLAM: Graph SLAM from a programmer’s perspective. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 3833–3840. [Google Scholar]
- Sumikura, S.; Shibuya, M.; Sakurada, K. OpenVSLAM: A versatile visual SLAM framework. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2292–2295. [Google Scholar]
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Giubilato, R.; Pertile, M.; Debei, S. A comparison of monocular and stereo visual FastSLAM implementations. In Proceedings of the 2016 IEEE Metrology for Aerospace (MetroAeroSpace), Florence, Italy, 22–23 June 2016; pp. 227–232. [Google Scholar]
- Ullah, I.; Su, X.; Zhang, X.; Choi, D. Simultaneous localization and mapping based on Kalman filter and extended Kalman filter. Wirel. Commun. Mob. Comput. 2020, 2020, 2138643. [Google Scholar] [CrossRef]
- Saman, A.B.S.H.; Lotfy, A.H. An implementation of SLAM with extended Kalman filter. In Proceedings of the 2016 6th International Conference on Intelligent and Advanced Systems (ICIAS), Kuala Lumpur, Malaysia, 15–17 August 2016; pp. 1–4. [Google Scholar]
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. Robot. Auton. Syst. 2016, 69, 59–75. [Google Scholar] [CrossRef]
- Liu, T.; Xu, C.; Qiao, Y.; Jiang, C.; Yu, J. Particle Filter SLAM for Vehicle Localization. arXiv 2024, arXiv:2402.07429. [Google Scholar] [CrossRef]
- Montemerlo, M.; Thrun, S.; Koller, D.; Wegbreit, B. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Edmonton, AB, Canada, 28 July–1 August 2002; pp. 593–598. [Google Scholar]
- Montemerlo, M.; Thrun, S.; Koller, D.; Wegbreit, B. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. Proc. Int. Jt. Conf. Artif. Intell. 2003, 3, 1151–1156. [Google Scholar]
- Song, W.; Yang, Y.; Fu, M.; Kornhauser, A.; Wang, M. Critical Rays Self-adaptive Particle Filtering SLAM. J. Intell. Robot. Syst. 2018, 92, 107–124. [Google Scholar] [CrossRef]
Architecture | # Parameters (M) | Frames per Second (FPS) | mAP (%)
---|---|---|---|
YOLOv5s | 7.2 | 156 | 37.4 |
YOLOv7n | 6.2 | 286 | 38.7 |
DETR | 41 | 28 | 43.3 |
RT-DETR | 32 | 114 | 53 |
Faster-RCNN | 166 | 16 | 39 |
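The parameter and throughput figures above are strongly hardware- and input-size-dependent. As a rough illustration only (not the benchmarking setup behind the table), the following sketch shows how a detector's parameter count and approximate FPS can be measured with PyTorch/torchvision; the model choice, the 640×640 dummy input, and the iteration counts are illustrative assumptions.

```python
# Minimal sketch: estimating "# Parameters (M)" and FPS for a torchvision detector.
# The model, input size, and iteration counts are assumptions, not the table's setup.
import time
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Parameter count in millions.
n_params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f"Parameters: {n_params_m:.1f} M")

# Rough FPS: average latency of repeated forward passes on a dummy image.
dummy = [torch.rand(3, 640, 640)]
with torch.no_grad():
    for _ in range(5):                      # warm-up iterations
        model(dummy)
    n_runs = 20
    start = time.perf_counter()
    for _ in range(n_runs):
        model(dummy)
    elapsed = time.perf_counter() - start

print(f"Approx. FPS: {n_runs / elapsed:.1f}")
```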
Mitigating Technique | V-AP (Faster R-CNN) | P-AP (Faster R-CNN) | TL-AP (Faster R-CNN) | TS-AP (Faster R-CNN) | mAP (Faster R-CNN) | V-AP (YOLOv3) | P-AP (YOLOv3) | TL-AP (YOLOv3) | TS-AP (YOLOv3) | mAP (YOLOv3)
---|---|---|---|---|---|---|---|---|---|---|
None (clear conditions *) | 72.61 | 40.99 | 26.07 | 38.12 | 44.45 | 76.57 | 37.12 | 46.22 | 50.56 | 52.62 |
None (rainy conditions **) | 67.84 | 32.58 | 20.52 | 35.04 | 39.00 | 74.15 | 32.07 | 41.07 | 50.27 | 49.39 |
Deraining: DDN | 67.00 | 28.55 | 20.02 | 35.55 | 37.78 | 73.07 | 29.89 | 40.05 | 48.74 | 47.94 |
Deraining: DeRaindrop | 64.37 | 29.27 | 18.32 | 33.33 | 36.32 | 70.77 | 30.16 | 37.70 | 48.03 | 46.66 |
Deraining: PReNet | 63.69 | 24.39 | 17.40 | 31.68 | 34.29 | 70.83 | 27.36 | 35.49 | 43.78 | 44.36 |
Image translation: UNIT | 68.47 | 32.76 | 18.85 | 36.20 | 39.07 | 74.14 | 34.19 | 41.18 | 48.41 | 49.48 |
Domain adaptation | 67.36 | 34.89 | 19.24 | 35.49 | 39.24 | – | – | – | – | – |
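The AP and mAP columns above follow the standard detection-evaluation recipe: detections are ranked by confidence, greedily matched to ground truth at an IoU threshold, and the resulting precision-recall curve is integrated. The sketch below is a simplified single-image, single-class illustration of that recipe; the box format, the 0.5 IoU threshold, and the toy inputs are assumptions, and it is not the exact evaluation code behind the table.

```python
# Simplified per-class AP: greedy IoU matching of score-ranked detections,
# then all-point interpolation of the precision-recall curve (VOC style).
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """detections: list of (score, box); gt_boxes: list of boxes (one class)."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched, tp = set(), np.zeros(len(detections))
    for i, (_, box) in enumerate(detections):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            o = iou(box, gt)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_iou >= iou_thr:        # true positive: matched an unused GT box
            tp[i] = 1
            matched.add(best_j)
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1 - tp)
    recall = cum_tp / max(len(gt_boxes), 1)
    precision = cum_tp / (cum_tp + cum_fp)
    # All-point interpolation: area under the monotone precision envelope.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Toy usage: one true positive, one false positive, one ground-truth box.
dets = [(0.9, [10, 10, 50, 50]), (0.6, [100, 100, 150, 150])]
gts = [[12, 12, 48, 52]]
print(f"AP@0.5 = {average_precision(dets, gts):.3f}")
```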
Time of Day | Label Count |
---|---|
Day | 41,986 |
Night | 31,900 |
Dusk/dawn | 5942 |
Total | 79,828 |
Type of Map | Description | Key Features/Uses |
---|---|---|
Real-time (online) maps—SLAM | Build the map online while simultaneously tracking the vehicle's location; suited to environments without preexisting maps or with frequent changes. | Use sensors (cameras, LiDAR, range finders) for perception, mapping, and localization.
1. Topological maps (prebuilt maps) [52,53] | Focus on connectivity and relationships between locations rather than geometric details. | Represent environments as nodes and edges; useful for route planning and understanding layout. |
2. Metric maps (prebuilt maps) | Provide detailed spatial information, including object distances and relationships. | Essential for precise navigation and obstacle avoidance. |
2.1 Landmark-based maps (prebuilt maps) [54,55,56] | Use distinct environmental features (e.g., buildings, traffic signs) for localization and navigation. | Effective in GPS-denied environments. |
2.2 Occupancy grid maps (prebuilt maps) [57] | Represent the environment as a grid of cells, each indicating the likelihood of being occupied. | Distinguishes free space from obstacles; crucial for perception and navigation. |
2.2.1 Octree (prebuilt map) [58,59] | Hierarchical 3D grid system with eight-node subdivisions for spatial partitioning. | Efficient representation and processing of large 3D spaces. |
2.2.2 CostMap (prebuilt map) | Assigns costs to cells based on traversal difficulty. | Guides safe and optimal paths; important for dynamic obstacle avoidance and path planning. |
3. Geometric maps (prebuilt maps) [60,61] | Provide accurate geometrical data about objects and their locations. | Use vector data for high-precision tasks like obstacle avoidance and navigating through traffic. |
High-accuracy maps—HD maps (prebuilt map) [62,63,64,65,66] | Offer fine resolution with centimeter-level precision, regularly updated for real-time accuracy. | Include detailed road environments, lane configurations, traffic signs, and obstacles; use sensors (LIDAR, GPS). |
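To make the occupancy-grid representation summarized above concrete, the hedged sketch below stores a per-cell log-odds of occupancy and updates it from observations, the usual basis for the grid and cost maps listed in the table. The grid size, resolution, and update constants are assumed illustrative values, not parameters from any specific system cited here.

```python
# Illustrative log-odds occupancy grid: each cell holds the log-odds of being
# occupied and is nudged up or down by observations. All constants are assumptions.
import numpy as np

class OccupancyGrid:
    def __init__(self, size=100, resolution=0.1, l_occ=0.85, l_free=-0.4):
        self.res = resolution                    # meters per cell
        self.log_odds = np.zeros((size, size))   # 0 log-odds = 0.5 probability
        self.l_occ, self.l_free = l_occ, l_free

    def update_cell(self, ix, iy, occupied):
        """Accumulate the measurement log-odds for one observed cell."""
        self.log_odds[iy, ix] += self.l_occ if occupied else self.l_free

    def probability(self):
        """Convert log-odds back to occupancy probabilities in [0, 1]."""
        return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds))

# Toy usage: one cell observed occupied twice, a neighbor observed free once.
grid = OccupancyGrid()
grid.update_cell(10, 10, occupied=True)
grid.update_cell(10, 10, occupied=True)
grid.update_cell(11, 10, occupied=False)
p = grid.probability()
print(f"P(occ) at (10,10): {p[10, 10]:.2f}, at (11,10): {p[10, 11]:.2f}")
```

A costmap, as described in the same table, can be derived from such a grid by mapping each cell's occupancy probability (plus inflation around obstacles) to a traversal cost used by the path planner.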