Article

The Role of AI in Smart Mobility: A Comprehensive Survey

by Marco Del-Coco 1,*, Pierluigi Carcagnì 1, Sergi Trilles Oliver 2, Ditsuhi Iskandaryan 2 and Marco Leo 1

1 National Research Council of Italy, Institute of Applied Science and Intelligent Systems, 73100 Lecce, Italy
2 Institute of New Image Technologies, Universitat Jaume I, 12071 Castellón de la Plana, Spain
* Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1801; https://doi.org/10.3390/electronics14091801
Submission received: 18 March 2025 / Revised: 22 April 2025 / Accepted: 23 April 2025 / Published: 28 April 2025
(This article belongs to the Special Issue Advancement on Smart Vehicles and Smart Travel)

Abstract: Advances in Artificial Intelligence, particularly the application of deep learning methodologies, have enabled the implementation of modern smart transportation systems, which are making the driving experience increasingly reliable and safe. Unfortunately, a literature review revealed that no survey paper provides a collective overview of all the machine learning applications involved in smart transportation systems. To fill this gap, this paper discusses the role and advancement of deep learning methodologies across all aspects of smart mobility, highlighting their mutual dependencies. To this end, three key pillar areas are considered: smart vehicles, smart planning, and vehicle networks and security. In each area, the subtasks commonly addressed by machine learning are pointed out, and state-of-the-art techniques are reviewed, with a final discussion about advancements according to recent findings in machine learning.

1. Introduction

In the last two decades, the number and variety of vehicles on the roads have increased significantly, severely affecting road safety, traffic congestion, and environmental pollution. In 2022, there were 42,500 road fatalities in the United States [1] and 20,600 in the European Union [2]. Research indicates that 90% of car accidents are due to human error, whereas only 10% can be attributed to vehicle malfunction or other factors [3]. Consequently, most accidents could be significantly reduced by employing vehicles fitted with advanced assistive technologies based on emerging innovations.
In this context, advances in Artificial Intelligence, particularly the application of deep learning (DL) methods, have been a game-changer, largely improving detection and prediction capabilities and increasing the reliability and safety provided by driver assistance systems.
These capabilities have enabled the development of vehicles with varying levels of automation. Smart vehicles with Advanced Driver Assistance Systems (ADASs) offer features such as lane keeping, adaptive cruise control, and collision avoidance. As automation increases, we are moving towards the idea of Connected and Autonomous Vehicles (CAVs), which use data collected by sensors, cameras, and connectivity features as input for Artificial Intelligence to make decisions about how to operate at different levels [4].

1.1. Related Works

Some works have already tried to summarise deep learning applications for smart transportation systems. In [5], the authors present a comprehensive survey on smart vehicles. They provide a generic overview covering hardware, software, and network aspects, focusing on technological trends. Applications of Artificial Intelligence (AI) are presented only generically, while the work mainly focuses on networking infrastructure, standards, challenges, and security issues. A vehicle-centred survey is presented in [6], where the problem is treated mainly from the point of view of the levels of autonomy. A similar approach is used in [7], where the principal clustering is performed according to the major embodiments (measurement, analysis, and execution) of autonomous driving systems. The area of publicly available datasets oriented towards autonomous driving is surveyed in [8], whereas smart mobility for traffic management is addressed in [9], where a generic traffic management architecture is presented and a set of recently proposed research prototypes are compared in depth. State-of-the-art technologies for infrastructure monitoring are presented in [10], where the authors discuss contact and noncontact measurement and evaluation techniques for pavement distress evaluation systems. A comprehensive overview of the architecture associated with intelligent vehicles, network functionalities, security issues, vulnerabilities, and possible countermeasures is provided in [11,12]. Further, from a network perspective, a detailed discussion of the necessity of dedicated sub-networks devoted to the specific needs of different agents is provided in [13]. There, the authors present a comprehensive investigation and analysis of network slicing management, both in general use cases and in its specific application to smart transportation. In smart transportation systems, many tasks are closely related but often treated separately. Indeed, most of the proposed models can only work on a single task, leading to redundant computations, which may cause efficiency problems given the limited computation power available. In [14], a wide-ranging discussion on the application of Multi-Task Learning, i.e., a technique to train a single model that can serve multiple tasks, in the smart transportation field is presented. A comprehensive overview of available datasets for smart navigation is provided in [15]. Datasets are classified according to the specific task (e.g., Perception Datasets, Mapping Datasets, and Planning Datasets), and an analysis of their impact is provided to identify the most important ones. Finally, foundation models, i.e., models trained on extremely large datasets and highly computationally demanding, are discussed from a transport system perspective in [16].

1.2. Motivation and Contribution

The discussion so far highlights how smart mobility is a complex ecosystem made up of multiple specific topics that, although focused on distinct aspects and/or research problems, are closely related to each other. Unfortunately, the literature review revealed that no survey covers all the DL applications involved in smart transportation systems. Existing surveys either cover a specific aspect (hardware, networking, applications, or datasets) or focus on a single application (autonomous driving, traffic management, or infrastructure monitoring). Additionally, they do not discuss how recent DL technologies have impacted the cross-application, lower-level tasks of smart mobility, which is fundamental to keeping abreast of open challenges and potential future directions.
To fill this gap, this paper provides an overview of the role and evolution of DL methods in smart mobility, considering three key pillar areas: smart vehicles, smart planning, and vehicle networks and security. Mutual interactions and interdependencies are emphasised to make the reader aware of how each specific research effort must be guided by the advancements and criticalities of the related fields. In each area, the subtasks commonly addressed by machine learning (ML) are discussed, and the state of the art is reviewed, with a concluding discussion on the expected advances according to the latest findings in ML.
Figure 1 schematically shows the taxonomy proposed above, and the rest of the paper is structured according to it.

1.3. Methodology

The preliminary selection of papers followed a systematic and multi-stage approach to ensure comprehensive and relevant coverage of the literature. To capture a wide range of scholarly work, searches were conducted across three major academic databases: Elsevier Scopus, chosen for its broad interdisciplinary coverage; Web of Science, selected for its focus on high-impact journals and robust citation tracking; and Google Scholar, included to account for influential preprints and grey literature. The search queries were carefully designed by combining task-specific keywords derived from Figure 1 with the terms “machine learning” OR “deep learning” and Boolean operators. For instance, an example query for one task took the following form: (“pedestrian detection” AND (“machine learning” OR “deep learning”)). This initial search yielded thousands of papers, necessitating a structured filtering process. First, duplicates were removed by using reference management tools such as Zotero or EndNote. Next, non-English publications, technical reports, and studies lacking empirical validation were excluded to maintain academic rigour. The remaining papers were then screened based on their publication venue rankings, prioritising peer-reviewed articles from top-tier journals and conferences, including those indexed in CORE and SCI/SCIE or ranked within JCR Q1-Q2. Additionally, citation counts were normalised by publication year to account for differences in exposure time—for example, a paper published in 2020 with 100 citations was weighted more heavily than a 2022 paper with 50 citations. To focus on recent advancements, the final selection was restricted to papers published after 2015. Finally, backward and forward snowballing techniques were applied to seminal works, examining their references and citations to ensure no key contributions were overlooked. This thorough and multi-criterion approach ensured a balanced yet selective representation of the most relevant and high-quality literature in the field.
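To make the citation-normalisation step concrete, the following minimal Python sketch (our illustration, not the authors' actual tooling; the function name and the 2025 reference year are assumptions) ranks papers by citations per year of exposure:

```python
# Illustrative sketch of citation counts normalised by years since publication,
# as described above; not the authors' actual screening script.
def normalised_citation_score(citations: int, pub_year: int, current_year: int = 2025) -> float:
    years_exposed = max(current_year - pub_year, 1)  # avoid division by zero
    return citations / years_exposed

papers = [("2020 paper", 100, 2020), ("2022 paper", 50, 2022)]
ranked = sorted(papers, key=lambda p: normalised_citation_score(p[1], p[2]), reverse=True)
print(ranked)  # the 2020 paper (20 citations/year) outranks the 2022 one (~16.7/year)
```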

1.4. Structure

The paper is structured as follows: Section 2 is dedicated to smart vehicles through a complete analysis of both hardware and software components, while the monitoring of the environment, traffic, and road conditions is discussed in Section 3. The role of networks in linking vehicles and infrastructure is the subject of Section 4, together with security issues. All these aspects are then brought together in the analysis of available datasets, which forms the core of Section 5. Then, Section 6 reports the most relevant implementations of AI in cities, and Section 7 discusses open challenges and new research horizons. Finally, Section 8 concludes the paper.

2. Smart Vehicles

The integration of sensors and microprocessors into vehicles to enhance braking, stability, and overall comfort began in the late 1960s. Today, modern vehicles can feature up to a hundred microprocessors, along with numerous sensors and actuators, and are commonly referred to as smart vehicles.
The automation capabilities of vehicles introduced by the above-mentioned technologies were ranked from 0 to 5 by the Society of Automotive Engineers (SAE) [7]. Level 0 refers to fully manual vehicles, whereas Levels 1 and 2 include the level of automation guaranteed by Advanced Driver Assistance Systems (ADASs) that exploit sensors to provide features such as adaptive cruise control, lane keeping, and collision avoidance. From Level 3 onwards, we can speak of Connected and Autonomous Vehicles (CAVs) that use sensor data and connectivity features to make higher-level decisions, such as how to steer, accelerate, and brake autonomously, up to Level 5, at which no human control is required.
In any case, it is worth noting that automation capabilities arise not just from the availability of various data sources (local or remote) and of computing resources to process the generated data but primarily from the set of algorithms employed to perform intelligent tasks most effectively. Accordingly, the following subsections distinguish between the hardware and software levels to provide a schematic overview of the technologies involved.

2.1. Hardware Layer

Smart vehicle hardware can be roughly divided into sensors, which are responsible for sensing the environment and the in-vehicle state; Electronic Control Units (ECUs), which are responsible for processing the data; and vehicle networks, which are responsible for connecting sensors, ECUs, and vehicle controllers, as well as for data exchange with other entities (e.g., other vehicles and remote data sources). A schematic reference for this division is reported in Figure 2.
Sensors primarily serve in-vehicle monitoring and control and include oxygen sensors, accelerometers, gyroscopes, tyre pressure sensors, fuel level sensors, and all the sensors dedicated to engine monitoring [12]. Beyond the vehicle itself, passive and active sensors such as cameras, LiDAR sensors, and radars provide information about the outside world. Cameras are passive sensors that capture images of the environment and can be used for object detection, lane following, and traffic sign recognition. This category includes monocular cameras, stereo-cameras, and infrared (IR) or thermal cameras. Monocular cameras provide comprehensive information about the scene, capturing shapes and appearance properties that can be used to determine road geometry, object class, and road signs, but they do not provide depth information. This limitation can be compensated for by stereo-cameras, which, however, require additional processing for correct calibration. Both suffer from poor image quality in low-light conditions, which can be overcome by using IR and thermal cameras, which are, however, much more expensive. In contrast, LiDAR is an active sensor that uses a laser beam to create a 3D point cloud providing a reliable measure of the distance of objects. Unfortunately, LiDAR sensors are expensive, suffer under certain weather conditions such as fog or heavy rain, and are unsuitable in some situations, such as crowded environments, where visual information is required for a correct understanding of the scene. Finally, radars use radio waves to detect objects and measure their distance, speed, and angle of arrival and are mostly dedicated to interaction with other vehicles at high speed [17]. Radar measurements can be performed properly even under adverse conditions such as low light, fog, rain, or night-time, but they are affected by signal clutter and low resolution, which prevent the precise shape of objects from being retrieved [18].
The data produced by all the above-mentioned sensors are then processed by ECUs, which retrieve and process the data and make decisions to regulate the response of the vehicle’s systems. It is worth noting that depending on the type of processing required, ECUs can range from low-power microprocessors, responsible for specific ad hoc processing, to high-performance Graphic Processing Units (GPUs), usually dedicated to the deployment of Artificial Intelligence features.
Finally, vehicle networks are responsible for in-vehicle and out-of-vehicle communication. More precisely, in-vehicle networks (IVNs) allow for communication between ECUs and sensors within the vehicle. There are several types of IVNs, each with its own characteristics; the most common include Controller Area Network (CAN), Local Interconnect Network (LIN), FlexRay, and Ethernet [19]. On the other hand, Vehicular Ad hoc Networks (VANETs) deal with external communication, enabling vehicles to exchange information online, including for remote vehicle monitoring.
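As a concrete illustration of in-vehicle traffic, the following hedged Python sketch reads raw CAN frames with the python-can library (the channel name 'can0' and a Linux SocketCAN setup are assumptions made for the example):

```python
# Minimal sketch: read one CAN frame from a SocketCAN interface with python-can.
import can

bus = can.interface.Bus(channel="can0", bustype="socketcan")  # newer versions prefer interface=
msg = bus.recv(timeout=1.0)  # blocks up to 1 s waiting for a single CAN frame
if msg is not None:
    # arbitration_id encodes sender/priority; data holds up to 8 payload bytes
    print(f"ID=0x{msg.arbitration_id:03X} data={msg.data.hex()}")
bus.shutdown()
```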

2.2. Perception and Control Layer

The data coming from vehicle sensors need to be processed to obtain higher-level information useful to perceive and understand the road environment dynamically and implement necessary countermeasures for autonomous behaviours.
The road environment can be divided into two main categories: moving and fixed entities. The former are represented by pedestrians and other vehicles; their detection is crucial to avoiding collisions. Fixed entities refer to all road infrastructure, such as lanes, traffic lights, and traffic signs, which need to be properly detected so that a vehicle can navigate according to the road rules while avoiding violations or risky manoeuvres caused by the driver’s carelessness.

2.2.1. Pedestrians and Vehicles

Pedestrians and other vehicles are active agents, with complex and interactive movements, sharing both on- and off-the-road spaces (e.g., parking areas). It is, therefore, mandatory for smart vehicles to be equipped with AI features capable of providing alerts for the driver or implementing autonomous countermeasures (steering and braking) to avoid injuring people or causing accidents. This can be realised by the cooperation of multiple AI models, each addressing specific tasks, from low-level machine vision detection to high-level behaviour prediction [20].
Detecting vehicles and people, tracking their positions, and predicting their movements over time to avoid colliding with them are all tasks included in more general fields such as object detection and classification and multi-object tracking (MOT). A summary of some relevant works among those cited in the following is given in Table 1.
The road context poses specific challenges that require dedicated investigations. In particular, pedestrians and other vehicles have different sizes and different usual dynamics, which, in some cases, introduces the need for even more specific algorithm tuning. Additionally, depending on the specific object of interest and driving conditions, each of the previously discussed sensors shows specific advantages and disadvantages, leading to the involvement of sensor fusion strategies [21]. An additional level of complexity in defining a well-structured literature review is given by the strong interdependence of the involved tasks, which has led many works to treat detection and tracking as a single objective. Figure 3 attempts to give a schematic representation of the main computational tasks involved in this type of research. Each task will be analysed in the following.
Pedestrian monitoring in the road environment means going beyond detection: it also involves tracking pedestrians' paths and predicting their dynamics and behaviours. This goal involves three main steps: detection; tracking (even of multiple entities over time [22]); and, finally, understanding and predicting the behaviour of entities [23]. However, standard algorithms for object detection and tracking may fail and do not provide enough information for behaviour understanding. Figure 4 reports some of the challenges that mainly limit reliable detection (e.g., large-scale variations, frequent partial or total occlusions, and low-light conditions).
In recent years, many surveys have been devoted to pedestrian detection, tracking, and behaviour prediction. A discussion covering the whole pipeline is the focus of a two-part review, where Part I [20] investigates image detection and tracking, while Part II [20] is dedicated to methods for understanding human behaviour. In [24], a review of the research progress in pedestrian detection is presented, focusing on the occlusion problem, while in [25], a broad discussion on the types of features used in the detection process is provided.
Recent detection methods have been largely based on DL approaches and can be divided into two groups: two-stage detectors, based on the region proposal method and sparse prediction, and single-stage detectors. Two-stage detectors work by generating proposal regions of interest (ROIs) that are subsequently classified as pedestrian or non-pedestrian and include approaches such as the Region-based Fully Convolutional Network (R-FCN) [26] and the Region-based CNN (R-CNN) [27]. A notable solution is provided in [28], where the authors propose a two-stage detection architecture that eliminates current two-stage detectors' redundancy by replacing the region proposal network and bounding box head with a novel focal detection network and fast suppression head. On the other hand, non-regional approaches, such as YOLO [29] and the Single-Shot Detector (SSD) [30], perform detection in a single pass, enabling fast, accurate, and scalable detection that classical methods struggle to achieve owing to their complex, multi-stage pipelines and limited feature representations; SSD additionally improves multi-scale detection efficiency. In this category, the authors of [31] proposed Localised Semantic Feature Mixers (LSFM), a novel, anchor-free pedestrian detection architecture capable of outperforming the state of the art on several datasets.
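For orientation, the following hedged sketch runs a pre-trained two-stage detector (torchvision's Faster R-CNN, used here as a generic representative rather than any of the cited architectures) and keeps high-confidence pedestrian boxes:

```python
# Minimal inference sketch with a pre-trained two-stage detector; in the COCO
# label space used by this model, class 1 is 'person'.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # placeholder for a normalised RGB camera frame
with torch.no_grad():
    predictions = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

keep = (predictions["labels"] == 1) & (predictions["scores"] > 0.5)
pedestrian_boxes = predictions["boxes"][keep]  # (N, 4) tensor of [x1, y1, x2, y2]
```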
Part of the literature has focused on the problems of occlusion. In [32], the authors use R-CNN to deal with occlusions of pedestrians in crowds. A notable contribution is provided in [33], where DeepParts is proposed, a powerful detector that can detect pedestrians by observing only part of a proposal.
Dealing with different lighting conditions is another key problem, which is usually handled by multi-spectral images. Most works address this issue by using a multi-spectral pedestrian dataset published in 2015 [34]. The authors of [35] process far infrared and RGB images simultaneously by using a CNN. In [36], a Single-Shot Detector (SSD) is used for multi-spectral pedestrian tracking.
In [37], the authors leveraged GANs and proposed a new architecture using a cascaded SSD to improve pedestrian detection at a distance. To avoid problems caused by illumination changes, a LiDAR sensor is employed for pedestrian identification in [38], while the combination of LiDAR and camera sensors is proposed in [39]. In the latter work, 3D LiDAR data are used to generate object region proposals that are mapped onto the image space, from which regions of interest (ROIs) are fed into a CNN for final object recognition. A strategy using camera and radar data fusion is presented in [40] to address scenarios where pedestrians are crossing behind parked vehicles.
Detected elements can be further processed with a recognition step that outputs attributes such as body pose and facial features. These features can help to understand whether the pedestrian’s head is turned towards the vehicle or away from it, while a particular emotion may be less central to the jaywalking process [41]. Body language also plays a key role: deliberate gestures, body posture, stance, and walking style can also predict pedestrian behaviour [42]. A more general understanding of pedestrian behaviour can be obtained by activity recognition [43], for example, by combining different aspects of information, e.g., poses and trajectories [44]. In [45], the authors propose a benchmark based on two public datasets [46,47] for pedestrian behaviour understanding and provide a rank of several models considering their performance concerning specific properties of the data. Recently proposed works leverage multi-modal methods that jointly exploit inputs from multiple sources (i.e., onboard cameras and vehicle ego-motion) [45,46,48,49].
Table 1. This is a summary of some relevant works in the literature. From the leftmost column, the table presents the corresponding reference number and year of publication (in brackets), the target to be investigated (P: pedestrians; V: vehicles), the application goal (D: detection; T: tracking; A: action/activity), and the key aspect that made the work noteworthy.
| Work (Y) | Target | Goal | Key Aspect |
|---|---|---|---|
| [33] (2015) | P | D | Detection by parts |
| [50] (2017) | V | D | Convolutional 3D detection |
| [51] (2017) | P | T | Recurrent neural networks |
| [52] (2018) | V | D | Multi-modality (LiDAR + RGB) |
| [39] (2020) | P | D | LiDAR–RGB fusion |
| [26] (2021) | P | D | Region proposal |
| [49] (2021) | P | A | Multi-modality (LiDAR + RGB) |
| [53] (2021) | V | D, T | YOLO and DeepSORT |
| [54] (2022) | V | D | Convolutional block attention |
| [31] (2023) | P | D | Anchor-free detection |
| [55] (2024) | P | D, T | Unifying Foundation Trackers |
Distinguishing between different types of road users is essential due to both their differences in appearance and dynamic characteristics. Pedestrians typically move at low speeds, often with irregular or unpredictable trajectories, and have small, upright silhouettes. In contrast, vehicles tend to be larger, move faster, and follow more constrained paths governed by road infrastructure. Moreover, substantial variability exists even within the broader category of vehicles. For instance, bicycles and motorcycles are smaller, more agile, and capable of navigating through tighter spaces, while trucks and vans are bulkier, with limited manoeuvrability and larger turning radii. Cars, which are the most common vehicles, are somewhere in between, and their behaviour differs based on context, driving style, and vehicle capabilities. These intra-class variations in physical dimensions, acceleration profiles, and interaction patterns with other road users introduce an additional layer of complexity for detection and subsequent tasks. It is clear that effectively modelling this heterogeneity is crucial to accurate detection, classification, and behaviour forecasting.
The solution proposed in [56] consists of projecting radar signals onto the camera plane to obtain radar range-velocity images. The authors of [53] provide a vehicle detector based on YOLO and simultaneously address the tracking problem with DeepSORT. A convolutional block attention module (CBAM) is introduced into the YOLOv5 backbone network to highlight information critical to the vehicle detection task and ignore useless information [54]. YOLO is also the core of [57], which highlights that modern DL-based techniques are more optimised and accurate. In [50], a fully convolutional network-based 3D detection technique is applied to point cloud data, improving on previous point cloud-based detection approaches on the KITTI dataset. Data fusion of LiDAR and camera sources is exploited on the YOLO backbone for improved multi-modal detection [52]. A scale-insensitive Convolutional Neural Network (SINet) is developed in [58] for the fast detection of vehicles with large scale variance.
In smart transportation systems, object detection and tracking are inherently interconnected tasks that, working in tandem, ensure complete situational awareness. This allows for monitoring and predicting the movement of vehicles, pedestrians, and other road users in real time. Detection identifies and localises objects of interest, while tracking associates these detection results over time to estimate the objects’ trajectories. Reliable tracking depends heavily on the accuracy and consistency of the initial detection step, as missed or false detections can lead to tracking errors or failures. More specifically, once the element of interest has been detected, its localised window (visual or other sensor data) is sent to the tracking stage, which generally consists of two main steps: (1) a prediction phase, which estimates the possible future state of the object, and (2) an update phase, which uses detection to refine successive estimates. Furthermore, tracking operations often involve more than one object. In this case, one speaks of multi-object tracking (MOT), whose additional task is to distinguish objects and correctly associate their identities with the corresponding tracks. This task is particularly challenging when tracks overlap or when pedestrians or vehicles are obscured by obstacles, a situation that often occurs in crowded scenarios.
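The predict/update cycle at the heart of most trackers can be illustrated with a minimal constant-velocity Kalman filter (a schematic sketch, not any specific cited tracker; all noise values and detections are arbitrary illustrations):

```python
# Schematic constant-velocity Kalman filter: state is [x, y, vx, vy],
# detections supply measurements [x, y].
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])  # motion model
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])                                # measurement model
Q, R = np.eye(4) * 1e-2, np.eye(2) * 1e-1                                 # noise covariances

x, P = np.zeros(4), np.eye(4)
for z in [np.array([1.0, 1.0]), np.array([1.1, 1.05])]:  # toy detections
    # predict: propagate the state and its uncertainty forward in time
    x, P = F @ x, F @ P @ F.T + Q
    # update: correct the prediction with the new detection
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
print(x[:2], x[2:])  # refined position and velocity estimates
```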
The works in [59,60] provide a comprehensive review of the recent literature, whereas an interesting perspective on multiple-pedestrian tracking is offered in [22]. Multiclass multi-object tracking is the focus of [61], while in [51], the authors present an end-to-end human tracking approach based on recurrent neural networks. The tracking of pedestrians in thermal images at night is proposed in [62], which implements an encoder–decoder CNN. In [63], the authors propose a novel online process for track validity that temporally distinguishes between legitimate and ghost tracks, along with a multi-stage observational gating mechanism for incoming observations. Scene understanding can also provide contextual information that improves tracking, especially in crowded scenes. In [64], a model called 'interaction feature strings', which is based on the decomposition of an image and extracts features from the observed scene, has been developed. A separate discussion can be reserved for works dealing with both detection and tracking and/or involving specific data fusion to compensate for the pros and cons of different sensors [21,65]. Another recent solution is presented in [66], where the authors propose a Stepwise Goal-Driven Network that estimates and uses goals at multiple temporal scales. More precisely, they use an encoder that captures historical information, a stepwise goal estimator that predicts successive future goals, and a decoder devoted to the prediction of the future trajectory. A query-centric paradigm for scene encoding is the key point of [67]; it enables the reuse of past computations by learning representations independent of the global spacetime coordinate system. Additionally, the authors propose an anchor-free query strategy to generate trajectory proposals recurrently, allowing different scene contexts to be used when decoding waypoints at different horizons.
Lastly, it is worth noting how the need for an even greater capacity for generalisation is shifting research towards the use of foundation models, large, pre-trained models that can be adapted to a wide range of tasks with minimal additional training. In [68], the authors propose the integration of the Segment Anything Model (SAM) [69] into a multiple-pedestrian tracking method. The authors of [55] present a general framework for unifying different tracking tasks based on a foundation tracker [70].

2.2.2. Road Infrastructure

Automated driving and vehicle safety systems require object detection that is accurate, robust to weather and environmental conditions, and able to run in real time.
Images of traffic signs taken in real-world situations exhibit various distortions because of varying light directions, fluctuating light intensity, and different weather conditions. Noise, partial or complete underexposure, partial or complete overexposure, and significant variations in the colour saturation of traffic signs (caused by light intensity) are all introduced by images taken under such conditions. Furthermore, the traffic sign recognition task is difficult for a computer vision system due to the vast range of viewing angles, viewing distances, and shape/colour deformations of traffic signs. To depict the shapes of traffic signs, researchers typically extract human-defined local features from input images before using a classifier to predict the class label. Convolutional Neural Network (CNN)-based solutions have recently gained popularity in the computer vision community due to their powerful capacity to learn features automatically through an internal process [71]. Numerous CNN-based strategies, including energy-efficient solutions [72], have been presented for this task [73]. A pivotal point, which is at the core of [74], regards generalisation capabilities under extreme weather conditions. In particular, the authors propose an efficient feature extraction module based on a Contextual Transformer and a CNN with the aim of utilising the static and dynamic features of images to provide stronger feature enhancement capabilities and improve inference speed. YOLO (You Only Look Once) is an algorithm based on Deep Neural Networks with real-time object detection capabilities. It is the state of the art in road sign detection and recognition due to its speed and precision [75]. It has also been enhanced by targeted fine-tuning [76], slightly modified architectures [77,78], and pooling strategies [79]. Despite this progress, all existing technologies still struggle under challenging conditions such as adverse weather and complex roadway environments. They also fail to detect small targets due to the inherent down-sampling operations used to obtain high-level feature maps. To overcome these obstacles, combined models, e.g., YOLO and Mamba, have recently been introduced to enhance the accuracy and robustness of traffic sign detection [80], whereas ref. [81] uses a Space-to-Depth module to compress spatial information into depth channels, expanding the receptive field and enhancing the detection capabilities for objects of varying sizes.
The semantic road detection task aims to distinguish variously sized and shaped road objects and open spaces from the scene's background. Semantic segmentation and stereo-matching are two essential components of 3D environmental perception systems for autonomous driving. Semantic segmentation concerns a monocular pixel-level understanding of the environment, while 3D stereo-matching simulates human binocular vision to acquire accurate and dense depth information. An interesting methodology is proposed in [82], where the authors decompose a road scene into different regions and represent each region as a node in a graph formed by establishing pairwise relationships between nodes based on their similarity in feature distribution. Data-driven, geometry-based, and, recently, DL strategies usually carry out the two aforementioned tasks separately. One of the most recent and relevant works is [83], in which transformer attention [84] and CNN modules are exploited to merge multi-scale features across different scales and receptive fields; it effectively fuses heterogeneous features and improves accuracy compared with previous semantic segmentation approaches. A joint learning framework developed to perform semantic segmentation and stereo-matching simultaneously is presented in [85]. This end-to-end joint learning framework yields improved performance compared with models trained separately for each task. A solution based on fish-eye image segmentation is proposed in [86], where a framework featuring three semi-supervised components, namely, pseudo-label filtering, dynamic confidence thresholding, and robust strong augmentation, is introduced. The problem of inaccurate depth estimation in regions where the depth changes significantly (depth jump) using camera-only and multi-modal 3D object detection models is treated in [87]. The authors propose an edge-aware depth fusion module to alleviate the "depth jump" problem and a fine-grained depth module to enforce refined supervision on depth. Most recently, foundation models have proven to be transformative in understanding the surrounding scene. A multi-modal and multi-task foundation model for road scene understanding is defined in [88] as a framework that inputs multi-modal data and outputs multi-task results.
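As a concrete, hedged example of pixel-level road-scene labelling, the following sketch runs torchvision's pre-trained DeepLabv3 (one representative CNN segmentation model, not any of the cited architectures; the input here is a random placeholder rather than a properly normalised frame):

```python
# Minimal semantic segmentation sketch: per-pixel class map from a camera frame.
import torch
import torchvision

model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

frame = torch.rand(1, 3, 512, 512)  # placeholder; real frames need ImageNet normalisation
with torch.no_grad():
    logits = model(frame)["out"]    # (1, num_classes, H, W)
labels = logits.argmax(dim=1)       # per-pixel class map, e.g. road vs background
```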
Lane markers are pivotal components of autonomous driving and driver assistance systems and must be detected among various other objects. Lane detection paradigms can be grouped into segmentation-based (bottom-up pixel-based estimation) [89], keypoint-based, parametric (a lane instance is represented as a set of curve parameters), row-based, and anchor-based (a lane instance is represented as a set of x-coordinates at fixed rows) [90] methods. More effective methods leverage multiple features and mathematical models [91]. Even though early neural network models using anchor-based object detection and instance segmentation have demonstrated progress, they still have trouble detecting lanes in complicated configurations and under low visibility. In [92], the authors develop a multi-stage framework exploiting PETRv2 to detect the centreline and the popular YOLOv8 to detect traffic elements. Alternatively, precise global semantics and local characteristics must be combined with sophisticated loss functions to provide accurate lane prediction in complicated settings [93].
Real-time and accurate traffic light status recognition can provide reliable data support for autonomous vehicle decision making and control systems. Currently, status recognition methods for traffic lights include machine learning- and deep learning-based methods. More effective approaches are enhanced versions of state-of-the-art end-to-end object detectors with the direct generation of object bounding boxes and category labels. For example, in [94], the proposed algorithm for traffic light status recognition is based on the DINO [95] object detector.
Many road accidents and fatalities are caused by speeding vehicles and infractions of traffic laws. Governments put speed breakers in place as a safety precaution to lessen this problem. However, many speed breakers are not properly signposted, and vehicles frequently fail to notice them, which results in accidents. Numerous DL techniques can be used for detecting speed breakers in real time: R-CNN, YOLO, and Mask R-CNN. In particular, the use of YOLO v7 with anchor boxes is one of the recent fundamental advancements in this research field [96].
For postprocessing, detectors usually need Non-Maximum Suppression (NMS), which slows down inference and introduces hyperparameters that lead to speed and accuracy instability. The next objective is therefore to increase speed without sacrificing accuracy. The academic community has recently paid close attention to end-to-end transformer-based detectors (DETRs) [97] because of their simplified architecture and removal of manually constructed parts. Their potential for detecting road infrastructure has not yet been investigated.
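To make the NMS bottleneck concrete, the following minimal sketch (a textbook greedy implementation, not tied to any cited detector) shows the sequential, hyperparameter-dependent loop that end-to-end detectors aim to remove:

```python
# Greedy Non-Maximum Suppression: keep the highest-scoring box, then drop
# all boxes whose IoU with it exceeds a threshold; repeat on the remainder.
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the top box with all remaining boxes
        xx1, yy1 = np.maximum(x1[i], x1[order[1:]]), np.maximum(y1[i], y1[order[1:]])
        xx2, yy2 = np.minimum(x2[i], x2[order[1:]]), np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # discard heavily overlapping boxes
    return keep
```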
In addition to the objects that usually populate road infrastructure, many other elements appear less frequently and represent a risk factor. A dataset for addressing this problem was introduced by the authors of [98], who proposed an approach exploiting the difference between the original image and a resynthesised image to highlight spurious objects. They started their analysis with the Lost and Found dataset [99] and then produced new annotated data to enrich the provided results. Recently, the solution proposed in [100] outperformed the most recent proposals with the introduction of a novel outlier scoring function called RbA, which defines the event of being an unexpected object as being rejected by all known classes. Another solution of interest, which uses mask classification, has been proposed in [101]. A richer, but still unexplored, dataset for the detection of anomalous objects as the main task is proposed in [102]. A collection of the most relevant works in this context is reported in Table 2.

3. Smart Planning

Smart planning represents a paradigm shift in urban and transportation management, integrating advanced technologies to enhance efficiency, sustainability, and infrastructure resilience. The application of cloud computing, Artificial Intelligence (AI), and real-time data analytics has revolutionised how cities address challenges such as traffic congestion, pollution control, and road infrastructure maintenance. By leveraging data-driven insights, smart planning enables proactive decision making, minimises resource wastage, and improves overall urban mobility.
This section explores three key aspects of smart planning: traffic prediction, pollution estimation, and road condition assessment. These interconnected domains illustrate how AI and computational models contribute to the development of intelligent transportation systems, sustainable environmental policies, and effective road maintenance strategies.

3.1. Traffic Prediction

Accurate traffic prediction is a cornerstone of modern intelligent transportation systems (ITSs), aiming to alleviate congestion, enhance navigation, and improve urban planning. By leveraging AI and advanced data collection methodologies, traffic prediction provides actionable insights for both short-term and long-term transportation planning. A summary of some of the most relevant works on traffic prediction is given in Table 3.
Different authors have defined various components of traffic prediction. Yuan and Li [103] described traffic prediction as a combination of traffic status prediction, traffic flow prediction, and travel demand prediction. Similarly, the authors of [104] highlighted key tasks, such as predicting traffic flow, speed, demand (future request predictions), travel time, and occupancy (extent of road space utilisation), while the authors of [105] proposed a Gaussian process path planning (GP3) algorithm to calculate the a priori optimal path as the reliable shortest-path solution. The role of public vehicle systems, focusing on online ride sharing, is considered in [106], where an efficient path-planning strategy based on a greedy algorithm is proposed, leveraging a limited potential search area for each vehicle and excluding the requests that violate the passenger service quality level.
Recent research has introduced sophisticated AI-based approaches to improve traffic forecasting accuracy. The proposal of [107] regards the Traffic State Anticipation Network (TSANet), a model based on graph and transformer architectures [84] to predict congestion in laneless traffic scenarios. The study introduces three traffic states—clumping, unclumping, and neutral—enabling detailed congestion tracking and improving forecasting precision.
The Graph Spatiotemporal Transformer Network (GSTTN) presented in [108] addresses the challenges posed by complex nonlinearity and dynamic spatiotemporal dependencies in traffic data. This framework integrates a multi-view Graph Convolutional Network (GCN) to capture spatial patterns and a transformer network with multi-head attention to model time-series disturbances. Similarly, ref. [109] introduces the Spatiotemporal Dual-Adaptive Graph Convolutional Network (ST-DAGCN), which dynamically learns both global and local traffic state features by using a dual-adaptive adjacency matrix while capturing temporal dependencies through Gated Recurrent Units (GRUs).
The authors of [110] investigated Traffic Speed Forecasting (TSF) by using GPS probe data from transport vehicles in Vietnam, focusing on challenges such as abnormal or missing GPS values. Their approach optimises traffic speed predictions on parallel multilane roads in Hai Phong City by integrating advanced optimisation algorithms, specifically Particle Swarm Optimisation (PSO) and the Genetic Algorithm (GA), with Long Short-Term Memory (LSTM) networks. S. Wu [111] explored a dynamic traffic flow prediction model for urban road networks (URNs) by combining Spatiotemporal Graph Convolution Networks (STGCNs) with Bi-directional Long Short-Term Memory (BiLSTM) networks [112]. In [113], the authors introduced Propagation Delay-aware Dynamic Long-Range Transformer (PDFormer), which considers long-range dependencies in traffic forecasting, whereas a Graph Multi-Attention Network (GMAN) designed to predict traffic volume and speed for multiple future time steps at different locations was proposed in [114].
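Most of the graph-based forecasters above share the same core operation: each road sensor aggregates features from its neighbours through a normalised adjacency matrix. The following self-contained sketch (illustrative only, with a toy three-sensor graph and random weights; not any of the cited architectures) shows one symmetric-normalised graph-convolution step:

```python
# One GCN layer over a toy road-sensor graph: H' = ReLU(A_norm @ H @ W),
# with A_norm the symmetrically normalised adjacency (plus self-loops).
import numpy as np

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3 sensors in a line
A_hat = A + np.eye(3)                                         # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt                      # symmetric normalisation

H = np.random.rand(3, 4)   # node features, e.g. recent speed readings per sensor
W = np.random.rand(4, 8)   # learnable weights (random here for illustration)
H_next = np.maximum(A_norm @ H @ W, 0)  # neighbour aggregation + ReLU
```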
Table 3. This is a summary of some relevant works on traffic prediction. TP: traffic prediction; ST: spatiotemporal; LR: literature review; TCP: traffic congestion prediction; TFP: traffic flow prediction; TSP: traffic speed prediction; TVP: traffic volume prediction.
| Work (Y) | Goal/Target | Method | Metrics | Data |
|---|---|---|---|---|
| [103] (2021) | Survey of ST data in TP | LR | N/A | Various public datasets |
| [104] (2021) | Survey of DL in TP | LR | N/A | METR-LA, PEMS, etc. |
| [105] (2021) | Shortest-path planning | GP3 | Path reliability and runtime | Transportation network data |
| [106] (2018) | Ride-sharing path planning | Online path planning | Travel time and efficiency | Simulated and GPS data |
| [107] (2024) | TCP | TSANet | Accuracy and F1-score | Aerial video datasets |
| [108] (2024) | TFP using ST data | GSTTN | MAE, RMSE, and MAPE | METR-LA and PEMS-BAY |
| [109] (2024) | TP | ST-DAGCN | MAE, RMSE, and MAPE | PEMS-BAY and Los-loop |
| [110] (2024) | TSP | PSO+GA+LSTM | RMSE, MAE, and MDAE | Registered vehicle probe data |
| [111] (2021) | TFP | STGCN+BiLSTM | MAE, RMSE, and MAPE | Urban sensor data |
| [113] (2023) | Delay-aware long-range TFP | PDFormer | MAE, RMSE, and MAPE | Delay-tagged public traffic data |
| [114] (2020) | TVP and TSP | GMAN | MAE, RMSE, and MAPE | Xiamen and PEMS |

3.2. Pollution Estimation

Accurate air quality prediction plays a fundamental role in pollution management and public health protection. The ability to anticipate pollutant concentrations enables policymakers and environmental agencies to implement proactive measures, reducing human exposure to harmful air contaminants. The increasing complexity of pollution patterns, influenced by meteorological conditions, industrial emissions, vehicular activity, and other anthropogenic factors, has driven the adoption of advanced computational models. Traditional statistical techniques have been largely replaced by AI-driven approaches due to their ability to model complex, nonlinear relationships in air quality data [115]. Table 4 reports a summary of the key literature on Air Pollution Estimation.
Machine learning and deep learning methods have become essential to air pollution estimation, allowing for improved accuracy and robustness in predictive models. Some of the most widely applied techniques include Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs) [116], Fuzzy Logic (FL), and Support Vector Machines (SVMs). These models facilitate the identification of intricate patterns in environmental data, enabling reliable short-term and long-term air quality forecasts.
Recent research has focused on developing models that integrate various data sources, such as meteorological observations, emission inventories, and remote sensing data, to enhance air quality predictions. The authors of [117] propose models to estimate in-vehicle pollution exposure by considering driving patterns and ventilation settings. Their study compares a mass-balance model with an ML model, with the latter demonstrating superior predictive performance, particularly for NO2 concentrations, and achieving greater accuracy when applied to unseen data.
A hybrid model is introduced in [118]. It combines LSTM networks with the Multi-Verse Optimisation (MVO) algorithm to predict NO2 and SO2 emissions from combined cycle power plants. The LSTM model forecasts pollutant levels, while MVO optimises its hyperparameters, improving predictive accuracy. Similarly, in [119], a combination of SVM and LSTM networks is employed to predict the Air Quality Index (AQI). Before conducting the predictive analysis, the study applies extensive preprocessing techniques, such as handling missing values, removing redundancies, and extracting features by using the Grey-Level Co-occurrence Matrix (GLCM) method. To further enhance predictive accuracy, a Modified Fruit Fly Optimisation Algorithm (MFOA) is used for feature selection and parameter tuning.
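A hedged sketch of the kind of LSTM forecaster these hybrid methods build on (a generic one-step-ahead model, not the cited architectures; the window length, feature count, and class name are assumptions):

```python
# Generic LSTM forecaster: map a window of past pollutant readings to the next value.
import torch
import torch.nn as nn

class PollutantLSTM(nn.Module):
    def __init__(self, n_features: int = 1, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict the next concentration

model = PollutantLSTM()
window = torch.rand(8, 24, 1)             # 8 samples, 24 hourly NO2 readings each
next_no2 = model(window)                  # (8, 1) one-step-ahead forecast
```

In the hybrid schemes above, an external optimiser (e.g., MVO or MFOA) would tune hyperparameters such as the hidden size or window length rather than the forward pass itself.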
Graph-based models have gained prominence in pollution forecasting. The authors of [120] proposed an Attention Temporal Graph Convolutional Network (A3T-GCN) for NO2 prediction in Madrid. The model integrates GCN, GRU, and an attention mechanism to better capture spatial and temporal dependencies in pollution data. By leveraging graph structures, the approach improves air quality predictions over traditional DL models.
To improve interpretability and robustness in PM2.5 predictions, the solution proposed in [121] exploits a hybrid interpretable predictive ML model that focuses on peak value prediction accuracy while ensuring model explainability. Similarly, the authors of [122] developed a novel DL framework for PM2.5 estimation, leveraging a spatiotemporal Graph Neural Network (BGGRU) with self-optimisation. This approach integrates a spatial module (GraphSAGE) and a temporal module (BGraphGRU) while utilising Bayesian optimisation for hyperparameter selection, thereby improving forecasting accuracy and generalisation performance.
Cloud-based AI and Internet of Things (IoT) integration have also contributed significantly to advancements in air pollution forecasting. In [123], a cloud-implemented AI model that employs IoT devices to monitor real-time AQI levels is proposed. The approach, known as BO-HyTS, combines Seasonal Autoregressive Integrated Moving Average (SARIMA) with LSTM networks, using Bayesian optimisation to refine predictive accuracy. This methodology effectively captures both linear and nonlinear temporal dependencies in air quality data, enhancing forecast precision.
Transformer-based architectures [84] have gained traction in air pollution modelling due to their ability to process long-range dependencies in time-series data. AirFormer, a transformer-based model for air quality prediction across China, is introduced in [124]. This framework incorporates self-attention mechanisms to capture spatiotemporal dependencies while incorporating a stochastic component to model uncertainty. Another transformer-based approach was presented by Zhang and Zhang [125], who used Sparse Attention-based Transformer Networks (STNs) to predict PM2.5 levels in Beijing and Taizhou, achieving improved performance when dealing with missing or sparse environmental data.

3.3. Road Conditions

The condition of road pavements is crucial to economic development and transportation efficiency. According to the World Bank, road density varies significantly by economic level, ranging from 40 km per million inhabitants in low-income economies to 8550 km in high-income economies [126]. As urbanisation accelerates and climate change impacts intensify, the need for frequent and precise pavement monitoring has grown. Ensuring that roads are well maintained reduces accident risks, improves driving comfort, and optimises maintenance budgets. A summary of some relevant studies on road conditions is reported in Table 5.
Traditional pavement inspection methods rely heavily on manual assessments, which are time-consuming, costly, and prone to subjectivity. The integration of AI, computer vision (CV), and ML has revolutionised road condition assessment, enhancing accuracy, efficiency, and scalability. These technologies enable both dynamic monitoring, where data are actively collected with vehicle-mounted cameras and drones, and static monitoring, where fixed sensors continuously track road surface conditions. The adoption of smart monitoring solutions supports transportation agencies in making data-driven maintenance decisions, ensuring road longevity and public safety.
Pavement assessment tasks include distress detection, classification, segmentation, and condition quantification. While traditional methods require extensive data collection and manual evaluation, DL-based approaches offer more robust and automated solutions. In [127], a hybrid system combining a YOLO-based crack classification model with a U-Net segmentation model to quantify crack density is proposed. Their method introduces a pavement condition index derived from both models’ outputs, improving crack detection even in challenging environments. Similarly, the authors of [128] present EdgeFusionViT, a transformer-based architecture designed for real-time pavement classification. The system, deployed on an edge device (Nvidia Jetson Orin Nano), processes road images at 12 frames per second, efficiently categorising surface conditions such as “dry gravel” and “dry asphalt-smooth”.
Beyond classification, DL models have improved pavement segmentation and distress quantification. A multi-modal transformer model for detecting and classifying winter road surface conditions is introduced in [129]. Their approach integrates multiple sensor inputs by using a cross-attention mechanism, enhancing classification accuracy by capturing variations in road textures and weather-induced changes. ISTD-DisNet, a transformer-based model for multi-type pavement distress segmentation, is introduced in [130]. By leveraging a Mix Transformer (MiT) and a Mixed Attention Module (MAM), the model extracts multi-scale pavement features, while a Learnable Transposed Convolution Up-sampling Module (TCUM) refines distress details.
Crack detection is a particularly critical aspect of pavement evaluation, requiring high accuracy to prevent infrastructure deterioration. The authors of [131] present a DeepLabv3+ CNN model combined with a crack quantification algorithm that analyses crack length, average width, maximum width, and affected pavement area. The authors of [132] proposed a two-stage defect detection approach, where a CNN-based segmentation network predicts pixel-wise crack locations, followed by a decision network that confirms defect presence across entire-pavement images. A DL model for crack detection across diverse pavement types and environmental conditions is introduced in [133]. The model employs context-aware analysis to adaptively adjust its predictions based on variations in lighting, road texture, and surface materials. Additionally, an iterative feedback mechanism enables the model to learn from misclassifications, significantly improving accuracy over time.
Beyond crack detection, efforts have been made to quantify overall road conditions by using sensor-based measurements. The application of Convolutional Neural Networks (CNNs) to detect pavement distress from orthoframes collected via mobile mapping systems is considered in [134]. The study finds that fine-tuning pre-trained CNNs and applying extensive preprocessing significantly enhances detection accuracy. However, challenges remain in handling visual artefacts such as shadows from tree branches, which can cause misclassification.
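A hedged sketch of the fine-tuning strategy such studies describe (generic transfer learning, not the exact pipeline of [134]; the distress class names are invented for illustration):

```python
# Transfer learning sketch: freeze a pre-trained backbone, retrain a new head.
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="DEFAULT")  # ImageNet-pre-trained backbone
for param in model.parameters():
    param.requires_grad = False                          # freeze backbone features
model.fc = nn.Linear(model.fc.in_features, 4)            # e.g. crack/pothole/patch/intact
# Only model.fc.parameters() are then optimised on the (hypothetical) pavement dataset.
```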
A semi-supervised learning framework is proposed in [135] to estimate the International Roughness Index (IRI) from in-car vibrations. By using Power Spectral Density (PSD) analysis and a Linear Time-Invariant (LTI) system, the study formulates the connection between pavement roughness and vehicle vibrations. By leveraging a self-training approach, their model effectively estimates IRI values across road networks while addressing issues related to incomplete datasets and sensor inconsistencies.
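The vibration-to-roughness link can be illustrated with a PSD estimate via Welch's method (a generic sketch under assumed sampling parameters and a random placeholder signal, not the pipeline of [135]):

```python
# Estimate the Power Spectral Density of an in-car accelerometer trace.
import numpy as np
from scipy.signal import welch

fs = 100.0                                   # accelerometer sampling rate (Hz), assumed
accel = np.random.randn(10_000)              # placeholder for vertical acceleration samples
freqs, psd = welch(accel, fs=fs, nperseg=1024)
band_energy = psd[(freqs > 0.5) & (freqs < 20)].sum()  # energy in a road-induced band
# band_energy (or the full PSD) would then feed a model mapping vibration to IRI.
```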
While DL has significantly advanced road condition assessment, several challenges persist. The reliance on high-quality labelled datasets remains a major limitation, as annotating pavement distress images is resource-intensive and time-consuming. Research efforts are exploring semi-supervised and self-supervised learning techniques to reduce dependency on large labelled datasets. Additionally, AI models often struggle to generalise across different geographical regions due to variations in road materials, climatic conditions, and construction practices. To mitigate this issue, Transfer Learning and domain adaptation strategies are being explored to improve model adaptability.
Real-time road monitoring presents another challenge, as most DL models are computationally demanding. The integration of edge computing [136], IoT sensors [137], and cloud-based AI platforms offers potential solutions by enabling real-time processing and automated data collection. Future advancements are expected to focus on optimising model efficiency, reducing false detections, and improving interpretability. By refining AI-driven pavement monitoring systems, transportation agencies can implement more effective and cost-efficient road maintenance strategies, ultimately enhancing infrastructure sustainability and public safety.

4. Vehicle Networks and Security

The term Internet of Vehicles (IoV) refers to integrating vehicles with the Internet, enabling them to exchange information online and supporting new applications and services, including remote vehicle monitoring and autonomous driving. Smart vehicles create ad hoc networks, known as Vehicular Ad hoc Networks (VANETs), using short-range wireless communication technologies like Wi-Fi. Among the specific types of communication, we can cite Vehicle-to-Vehicle (V2V), Vehicle-to-Device (V2D), Vehicle-to-Building (V2B), Vehicle-to-Grid (V2G), and Vehicle-to-Infrastructure (V2I) communication, which can be grouped in the macro-category of Vehicle-to-Everything (V2X) communication (i.e., the communication between a vehicle and any entity in the surrounding environment). Additionally, beyond these communication channels, the sensing layer also serves as a means for vehicles to interact with the external environment.
This makes it clear that while communication capabilities greatly enhance the vehicle’s contextual awareness, they also introduce vulnerabilities to malicious attacks targeting these systems, making the protection against unauthorised access a critical priority [138]. Indeed, smart vehicles provide a wide attack surface, i.e., the collection of vulnerabilities, entry points and techniques that an attacker can exploit [139].
Physical access refers to physical interfaces, such as the OBD-II port, traditionally utilised by service professionals, whereas wireless access concerns wireless or remote communication capabilities like Wi-Fi or Bluetooth, but also broadcast channels, including GNSS, Traffic Message Channel, Digital Radio, and Satellite Radio. These channels can be exploited by attackers to access ECUs and IVNs [140,141], allowing them to display false information on the instrument panel by altering frame data. Another attack vector is the sensing layer, usually targeted with the aim of altering the vehicle’s perception of the surrounding environment. Off-the-shelf hardware can be used to perform jamming and spoofing attacks, which can cause blindness and/or malfunctions in the vehicles under attack. Finally, a transversal attack surface is represented by ML capabilities, which can be corrupted in several ways, causing erroneous inference and consequently unexpected vehicle behaviour.

4.1. Type of Attack and ML-Based Security Solution

Such a complex ecosystem is susceptible to various forms of attacks, and numerous countermeasures have been proposed in the literature to address these threats. While traditional hard-coded algorithms are effective against deterministic attack scenarios, deep learning architectures can identify a broader range of attacks by leveraging experience and shared data. As a result, DL models are increasingly becoming the de facto standard for dealing with these kinds of attacks. The most relevant works are summarised in Table 6.

4.1.1. Attack Detection

A non-trivial challenge lies in detecting ongoing attacks aimed at violating the integrity of both vehicles and vehicle networks. The solutions proposed in the literature vary depending on the specific type of attack.
Platoon Attack: A platoon refers to a group of vehicles travelling in the same lane at close distances and maintaining similar speeds to enhance energy efficiency and contribute to more effective traffic management. An attack on this kind of formation could cause car accidents and serious injuries. An approach devoted to detecting this kind of attack and applying appropriate countermeasures is proposed in [142], where the authors detect and locate the attacker by exploiting velocity range and distance data provided by LiDAR and radar sensors. The data are fed to a network made up of convolutional (CNN) and fully connected (FCNN) layers, whose output peaks at the position corresponding to the attacker node.
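A minimal PyTorch sketch of this convolutional-plus-fully-connected idea follows; the class name PlatoonAttackerNet, the per-vehicle feature encoding, and the layer sizes are hypothetical placeholders rather than the exact architecture of [142].

```python
import torch
import torch.nn as nn

class PlatoonAttackerNet(nn.Module):
    """Hypothetical CNN + fully connected detector in the spirit of [142]."""
    def __init__(self, n_vehicles: int = 8, n_steps: int = 64):
        super().__init__()
        # Two input channels per vehicle: relative velocity and distance.
        self.conv = nn.Sequential(
            nn.Conv1d(2 * n_vehicles, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, n_vehicles)
        )

    def forward(self, x):                # x: (batch, 2 * n_vehicles, n_steps)
        return self.fc(self.conv(x))     # one score per platoon member

model = PlatoonAttackerNet()
readings = torch.randn(1, 16, 64)        # simulated LiDAR/radar features
suspect = model(readings).argmax(dim=1)  # index of the suspected attacker
print(f"Suspected attacker node: {suspect.item()}")
```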
Distributed Denial of Service (DDoS): This type of attack aims to make network services unavailable, for instance by exploiting Software-Defined Networking. In [143], the proposed algorithm performs attack detection on live traffic data, capturing packets with Fuzzy Logic and subsequently applying a Q-learning approach. In [155], the authors present a multi-agent system that exploits the collaborative nature of intelligent agents for anomaly detection, with each agent implementing a Graph Neural Network. In a recent work [150], the authors proposed a novel Deep Multi-modal Learning (DML) approach for detecting DDoS attacks in the IoV by integrating Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks.
Black/grey hole: This type of attack refers to a node that tries to stop messages/packets from being forwarded to a receiver. More precisely, a black hole corresponds to a complete drop of packets, whereas the grey version refers to the case where packets are partially dropped or corrupted. In [144], the authors propose an Artificial Neural Network (ANN) that analyses three types of trace files to extract features characterising traffic as normal or abnormal.
Sybil attack: A Sybil attack creates virtual nodes in the vehicle network to launch an attack. Unfortunately, the use of pseudonyms in vehicular networks to protect the privacy of user identities makes it difficult for the system to detect such virtual nodes and, consequently, Sybil attacks. In this context, an interesting solution was proposed in [145], which introduces a generic RNN-based solution for the global detection of Sybil attacks. It performs several information checks (e.g., range plausibility, position plausibility, and speed plausibility) and uses them as input for an LSTM-based RNN, which detects the specific type of Sybil attack.
Jamming attack: A jammer aims to compromise the transmission of data from a sensor by sending false alerts or creating a spoofed environment around the sensor. The solution suggested in [147] proposes an algorithm for jamming attack prevention in mobile ad hoc networks. The algorithm uses a Q-learning approach to learn from the history of past actions, which is then fed into a deep Q-network (DQN) to predict the Q-values of the current states.
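The following snippet sketches one temporal-difference update of such a deep Q-network; the state encoding, the action set, and the reward are illustrative assumptions, not the actual scheme of [147].

```python
import torch
import torch.nn as nn

# Hypothetical DQN: the state encodes recent channel observations,
# and the actions are channel/route choices to evade the jammer.
n_state, n_actions = 8, 4
q_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimiser = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95                                 # discount factor

# One temporal-difference update from a single (s, a, r, s') transition.
state = torch.randn(1, n_state)
action = torch.tensor([1])                   # e.g., hop to channel 1
reward = torch.tensor([1.0])                 # transmission succeeded
next_state = torch.randn(1, n_state)

q_sa = q_net(state).gather(1, action.view(1, 1)).squeeze()
with torch.no_grad():
    target = reward + gamma * q_net(next_state).max(dim=1).values
loss = nn.functional.mse_loss(q_sa, target.squeeze())
optimiser.zero_grad()
loss.backward()
optimiser.step()
```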
Spoofing attack: Spoofing attacks refer to an attacker pretending to be a legitimate network user to gain access to personal information. Among them, we can cite GNSS spoofing attacks, which send fake GPS signals to the target receiver, making it believe that it is in a false location. Time-series analysis can be used to detect these kinds of attacks, as proposed in [146], where the authors use an LSTM-RNN model to predict the distance travelled between two consecutive locations of a vehicle, working with a publicly available dataset, comma2k19 [156].
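The core idea, i.e., forecasting a plausible displacement and flagging reported positions that deviate from it, can be sketched as follows; the DistanceForecaster class, the threshold, and the numerical values are hypothetical.

```python
import torch
import torch.nn as nn

class DistanceForecaster(nn.Module):
    """Hypothetical LSTM predicting the next inter-fix displacement (m)."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, past_distances):        # (batch, seq_len, 1)
        out, _ = self.lstm(past_distances)
        return self.head(out[:, -1])          # predicted next displacement

model = DistanceForecaster()
history = torch.tensor([[[22.0], [23.5], [22.8], [23.1]]])  # recent hops (m)
predicted = model(history).item()

reported = 950.0   # displacement implied by the newly received GNSS fix
threshold = 50.0   # hypothetical residual tolerance in metres
if abs(reported - predicted) > threshold:
    print("Possible GNSS spoofing: reported jump inconsistent with history")
```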
Vehicle position data are also personal data and can be the target of an attacker exploiting existing V2V communication to track a vehicle’s CAM trace. In [157], the authors employ a VANET topology learning methodology that can be built on any existing Graph Learning framework.

4.1.2. Intrusion Detection and Misbehaviour

Intrusion Detection Systems (IDSs) include all the approaches devoted to identifying unknown types of attacks. Numerous studies in the literature have explored the challenges associated with detecting intrusions or misbehaviour caused by malicious nodes.
In [158,159], the authors exploit anomaly detection strategies to find malicious messages transmitted to ECUs, whereas approaches based on the analysis of sensor data are proposed in [160,161]. Recently, a technique exploiting the identification of the driver based on their real-world driving data by LSTM time-series analysis was proposed in [162], whereas a CNN-based system working on CAN messages was used in [163] to detect attacks on the IoV.
Recently, some works have exploited the Transfer Learning (TL) paradigm. In [148], the authors train a convolutional LSTM-based model on a previously known intrusion dataset; subsequently, a one-shot TL approach re-trains the model so that new types of intrusions can be detected from a single sample. The authors of [149] propose two TL-based model update schemes to detect new types of attacks. The possibility of achieving high detection accuracy with a small amount of data is one of the key advantages of this work.
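A minimal sketch of such a one-shot TL step is given below, assuming a frozen pre-trained feature extractor and a re-fitted classification head; the dimensions and the new-attack class are placeholders, not the actual models of [148,149].

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained intrusion detector: frozen features + small head.
feature_extractor = nn.Sequential(nn.Linear(20, 64), nn.ReLU())  # pre-trained
for p in feature_extractor.parameters():
    p.requires_grad = False                    # freeze learned features

# One-shot adaptation: a single labelled sample of a new attack class.
new_head = nn.Linear(64, 3)                    # benign, known attack, new attack
sample = torch.randn(1, 20)
label = torch.tensor([2])                      # the new attack class

optimiser = torch.optim.Adam(new_head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):                            # few quick re-training steps
    optimiser.zero_grad()
    logits = new_head(feature_extractor(sample))
    loss_fn(logits, label).backward()
    optimiser.step()
```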
Finally, it is worth noting that a perspective on the explainability aspect is provided by the authors of [151], who propose a novel feature ensemble framework that integrates multiple explainable AI (XAI) methods with various AI models to improve both anomaly detection and interpretability.

4.1.3. Adversarial ML Attacks

Even relying on ML and DL capabilities can represent a vulnerability. In this case, Adversarial ML attacks are among the most exploited scenarios [164]. Such an approach consists of creating or modifying input signals by adding imperceptible changes that can compromise the reliability of the ML/DL model. This kind of attack can occur either during the training or the inference phase. The first case is known as a poisoning attack and is performed by manipulating the training data to compromise the ML/DL model [165]. On the contrary, an evasion attack targets the inference phase and aims to produce a false result from the deployed model. In recent years, researchers have shown great interest in this topic: security attacks against sign recognition exploiting Adversarial ML approaches are explored in [166], whereas cyber-attacks targeting the ML policy of an autonomous vehicle in a dynamic environment are performed in [164]; finally, a more general analysis of adversarial attacks on CAVs is the topic of [12]. More recent works, such as [167], propose an adversarial attack framework allowing the attacker to easily fool LiDAR semantic segmentation by placing some simple objects (e.g., cardboard and road signs) at certain locations in the physical space. In [153], instead, the authors attack adaptive traffic control systems (ATCSs) based on deep reinforcement learning (DRL) by using a group of vehicles that cooperatively send falsified information to “cheat” the DRL-based ATCSs.
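A minimal example of the evasion mechanics discussed above is the well-known Fast Gradient Sign Method (FGSM), sketched here against a toy classifier; the model and the perturbation budget eps are placeholders.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: a standard evasion-attack example."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    # Imperceptible step in the direction that maximises the loss.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Toy stand-in for a perception model (e.g., a sign classifier).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])

adversarial = fgsm_perturb(model, image, label)
print("Prediction before:", model(image).argmax(1).item(),
      "after:", model(adversarial).argmax(1).item())
```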
Consequently, another part of the literature focuses on the defence against adversarial attacks, which can follow two different strategies: detecting adversarial inputs after the training of ML models or proactively increasing the robustness of the ML model against such attacks [12,168,169]. Some works focus on preventing unknown scenarios, usually by enhancing model robustness through training on adversarially perturbed data. For example, in [170], the authors developed a framework named Closed-loop Adversarial Training to generate adversarial scenarios for training to improve driving safety, especially in safety-critical traffic scenarios. The integration of objectness information into the adversarial training process is instead used in [154], where the authors applied this technique to enhance the robustness of YOLO detectors on datasets such as KITTI [171]. On the other hand, known hazardous situations are pivotal to advancing safety standards. Deep learning architectures (e.g., autoencoders and GANs) have been utilised to enhance recognition model performance in attack scenarios, as proposed in [172].
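Building on the FGSM sketch above, the following minimal training step illustrates the general recipe of training on adversarially perturbed data; it is a schematic example, not the Closed-loop Adversarial Training framework of [170].

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images, labels = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))

# Craft FGSM-perturbed inputs on the fly (see the previous sketch) ...
images.requires_grad_(True)
loss_fn(model(images), labels).backward()
adversarial = (images + 0.03 * images.grad.sign()).clamp(0, 1).detach()

# ... and train on a mix of clean and adversarial samples.
optimiser.zero_grad()
loss = 0.5 * loss_fn(model(images.detach()), labels) \
     + 0.5 * loss_fn(model(adversarial), labels)
loss.backward()
optimiser.step()
```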

4.2. Privacy Protection

Privacy protection deals with the management of sensitive information. In the context of vehicular networks, it is further classified into location privacy and user privacy. Several privacy schemes have been proposed in the literature with the purpose of obtaining services from the network without exposing the real identity of users. In recent years, ML-based approaches have been gaining attention as a way to overcome the drawbacks of current techniques; however, protecting user privacy from attackers while ensuring secure availability to the service-providing entity remains quite challenging. The proposal of [173] is a federated learning-based data privacy protection scheme for vehicular cyber–physical systems, consisting of data transformation and collaborative data leakage detection. An approach exploiting federated learning to detect misbehaviour while preserving user data privacy is proposed in [174]: the personal information of a vehicle is kept on the vehicle, and training is performed without sending data to the central node. In [152], an innovative approach leveraging federated learning and edge computing techniques is proposed to predict vehicles’ subsequent locations by using large-scale data, while concurrently prioritising user privacy.
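The federated averaging step underlying such schemes can be sketched as follows: each vehicle trains a local copy of the global model on its private data, and only model weights, never raw data, reach the aggregator. The helper functions and the toy model are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, lr=0.01, epochs=1):
    """Train a copy of the global model on one vehicle's private data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.MSELoss()(local(data), target).backward()
        opt.step()
    return local.state_dict()

def fed_avg(state_dicts):
    """Federated averaging: the server only ever sees model weights."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg

global_model = nn.Linear(4, 1)
# Each tuple is one vehicle's locally held (features, target) data.
vehicles = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]
updates = [local_update(global_model, x, y) for x, y in vehicles]
global_model.load_state_dict(fed_avg(updates))
```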

5. Datasets

Datasets are crucial in the smart mobility field, as they provide the real-world data needed to properly train and validate algorithms devoted to providing intelligent features. High-quality datasets enable vehicles to understand and respond to various driving scenarios, improving safety and decision making. Moreover, the availability of diverse datasets guarantees adaptability and reliability in different environments. In this section, some of the most used and complete datasets have been briefly described and listed in Table 7.
The detection and tracking of pedestrians and vehicles have attracted much attention in various contexts (e.g., security surveillance, road monitoring, etc.), resulting in an abundance of datasets including well-annotated images and video sequences. However, the increasing demand in the automotive industry has shifted the attention to the driver assistance and autonomous navigation perspective. Among these, the Caltech pedestrian dataset [175], dedicated exclusively to pedestrian detection and tracking, is one of the largest for this task and provides a valuable benchmark for occlusion handling. The KITTI dataset [171] is a larger dataset covering multiple tasks; it provides multiple sources for 3D object detection and tracking in video sequences, road/lane detection, and semantic instance segmentation. Up to 15 cars and 30 pedestrians are visible per frame, and detailed benchmarks and metrics are provided. The number of available images/sequences varies according to the specific task. The CityPersons dataset [176] consists of a large and diverse set of stereo-video sequences recorded in the streets of different cities in Germany and neighbouring countries, providing high diversity and allowing for high generalisation capabilities. In terms of tracking, the MOT Challenge [187] provides a framework for the fair evaluation of multiple people-tracking algorithms; it consists of a large collection of datasets (including some new challenging sequences), includes detections for all sequences, and offers a common evaluation tool reporting several measures, from recall and precision to running time. Also, in this case, the dataset size varies according to the specific subset/task. Going deeper into the context of action recognition, it is worth mentioning two datasets: PIE [46], a large dataset (1842 pedestrian samples) designed for estimating pedestrian intentions in traffic scenes, and JAAD [47], which considers pedestrians with behavioural annotations, distant pedestrians not interacting with the driver, and groups of pedestrians.
Given the priority of making smart vehicles capable of perceiving other road users, it is obvious that safety, as well as the ability to perform autonomous driving tasks, depends on the understanding of the surrounding environment. To this end, several datasets, exploiting different sensors, have been provided. In this context, two notable datasets are ApolloScape [177] and WoodScape [178]. The former exploits a front view and the latter a 360° view, and both benefit from additional data based on GPS and CAN-bus information. Mapillary [188] provides pixel-accurate and instance-specific human annotations for understanding street scenes, with 124 semantic object categories and worldwide coverage. A rich portfolio of sensors is the strength of nuScenes [190], which provides 360° view, LiDAR, GPS, CAN, IMU, radar, and HD-Map data. HD-Map data are also included in the Argoverse dataset [191,192], which comes with more than 300k images. OpenLane V2 [193] focuses on recognising the dynamic drivable states of lanes in the surrounding environment, exploiting multi-view images and LiDAR data.
Among the datasets for traffic sign detection, we can cite the recently released Mapillary Traffic Sign Dataset [179]. The dataset contains 100k images (52,000 fully annotated and 48,000 partially annotated) and is the largest and most diverse collection of traffic sign images globally, with detailed annotations for various traffic sign categories. A valid alternative is CCTSDB2021 [180], which contains more than 16,000 annotated images acquired under several weather conditions.
A less explored field concerns the detection of unexpected objects in the road context. Among the datasets dealing with uncommon road elements, we can cite Road Anomaly [98], which has attracted the attention of many researchers. The dataset contains images, with associated per-pixel labels, of unusual dangers that can be encountered in a road environment (e.g., traffic cones, animals, and rocks). An older but still widely exploited dataset is Lost and Found [99], which comprises 112 stereo-video sequences with 2104 annotated frames.
In terms of unpredictable situations, it is worth citing RDD-2020 [181], a large-scale heterogeneous dataset including 26,620 images of road damage collected from multiple countries.
Traffic prediction and navigation planning represent another hot topic that has led to the release of datasets such as NYC Taxi Data [182], a public dataset containing detailed records of New York City taxi trips in 2014, including fields such as pickup and drop-off dates and times, locations, and trip distances. The Transportation Network [183] dataset comes instead from ride-hailing services like Uber and Lyft. A comprehensive dataset for the development and validation of GNSS and mapping algorithms designed to work with commodity sensors is comma2k19 [156], a collection of 33 h of commute driving on California’s Highway 280, containing various features, such as speed, acceleration, steering wheel angle, and distance travelled between two consecutive points, extracted from CAN, GNSS, and additional smart vehicle sensors.
Air quality monitoring and forecasting are also pivotal applications that require continuous measurements at multiple points to be treated properly. Within this scope, Beijing Multi-Site AQ [184] provides Beijing’s PM2.5 data over 4 years at 36 monitoring sites, whereas Madrid AQ [185] reports data (e.g., PM2.5, PM10, and SO2) from 24 stations in the city of Madrid collected over 18 years.
In the context of security, a valuable dataset is VeReMi [186], a simulated dataset that consists of message logs for vehicles, containing both GPS data about the local vehicle and BSM messages received from other vehicles. Its primary purpose is to assess how misbehaviour detection mechanisms operate on a city scale.
A comprehensive overview of available datasets for smart navigation is provided in [15]. Datasets are classified depending on the specific task (e.g., Perception Datasets, Mapping Datasets, and Planning Datasets), and an analysis of their impact is provided to select the most important ones.
Table 7. Collection of the most relevant datasets serving the discussed topic. DT: detection and tracking; PD: pedestrian detection; VDT: vehicle detection and tracking; PDT: pedestrian detection and tracking; SU: scene understanding; AR: action recognition; SD: traffic sign detection; RA: road anomaly; N: navigation; MD: misbehaviour detection; FVC: front-view camera; MVI: multi-view images; HM: HD-Map; E: environment; AQS: Air Quality Sensor; LID: LiDAR; IMU: Inertial Measurement Unit.

| Dataset | Year | Sensor | Task | Size | Ref. |
|---|---|---|---|---|---|
| Caltech [175] | 2009 | FVC, LID, GPS, and IMU | VDT and PDT | ~100 k | [28] |
| KITTI [171] | 2012 | FVC | 3D DT and SU | ~500 | [63] |
| CityPersons [176] | 2017 | FVC | PD | ~5000 | [66] |
| MOT Challenge [187] | 2017 | FVC | DT | ~60 k | [31] |
| PIE [46] | 2019 | FVC | AR | ~2 k | [66] |
| JAAD [47] | 2017 | FVC | AR | ~350 | [66] |
| ApolloScape [177] | 2018 | FVC, GPS, and IMU | SU | ~2.5 h | [82] |
| WoodScape [178] | 2019 | 360° view, GPS, CAN, and IMU | SU | ~100 k | [86] |
| Mapillary [188] | 2020 | FVC | SU | 25,000 | [189] |
| nuScenes [190] | 2019 | 360° view, LID, GPS, CAN, IMU, radar, and HM | SU | 1000 | [87] |
| Argoverse [191,192] | 2019 | 360° view, HM | SU | 324 k | [67] |
| Mapillary TSD [179] | 2020 | FVC | SD | ~100 k | [80] |
| CCTSDB [180] | 2021 | FVC | SD | ~16.5 k | [74] |
| OpenLane V2 [193] | 2023 | MVI and LID | SU | ~100 k | [92] |
| Road Anomaly [98] | 2019 | FVC | RA | ~61 | [100] |
| Lost and Found [99] | 2016 | FVC | RA | ~2104 | [101] |
| RDD-2020 [181] | 2020 | FVC | RA | ~26,620 | [194] |
| OpenStreetMap [195] | 2025 | N/A | N | Worldwide | [196] |
| Transportation networks [183] | 2016 | N/A | N | N/A | [105] |
| NYC Taxi Data [182] | 2014 | N/A | N | 4 years | [106] |
| comma2k19 [156] | 2018 | GPS | N | ~33 h | [146] |
| Beijing Multi-Site AQ [184] | 2017 | AQS | E | 4 years | [122] |
| Madrid AQ [185] | 2019 | AQS | E | 18 years | [120] |
| VeReMi [186] | 2018 | Simulation | MD | 225 simulations | [150] |

6. Practical Implementations of AI in Smart Mobility

Artificial Intelligence (AI) is increasingly being integrated into urban mobility systems and vehicles, enhancing efficiency, safety, and sustainability. Below are notable examples.
Singapore’s Smart Mobility initiative leverages AI to optimise traffic flow and public transportation services. By analysing real-time data, the system adjusts traffic signals and public transit schedules, leading to reduced congestion and an improved commuter experience (https://www.smartnation.gov.sg/initiatives/transport/, accessed on 22 April 2025).
British AI startup Wayve has partnered with Nissan to integrate its autonomous driving software into Nissan’s ProPILOT system by 2027 (https://wayve.ai/press/nissan-announcement/, accessed on 22 April 2025). Wayve’s approach utilises camera data and real-world testing, enabling the AI to adapt like a human driver. This collaboration aims to enhance collision avoidance and support semi-autonomous driving functions.
In the UK, AI-driven traffic lights developed by VivaCity are being tested to prioritise cyclists over cars during peak hours (https://vivacitylabs.com/glasgow-city-council-uses-vivacity-ai-to-improve-traffic-flow-and-road-safety/, accessed on 22 April 2025). These systems detect cyclists up to 30 m away, adjusting signals to provide them with a smoother and safer journey, thereby encouraging cycling and reducing vehicular congestion.
The Los Angeles Metro employs AI for the predictive maintenance of its rail network (https://www2.wi-tronix.com/wi-tronix-and-siemens-mobility-to-transform-la-metro-with-wi-tronix-violet-platform/, accessed on 22 April 2025). By analysing sensor data from trains and tracks, the system can identify maintenance needs before they lead to service disruptions, enhancing reliability and reducing costs.
Copenhagen utilises AI algorithms to monitor bike availability and redistribute bikes to high-demand areas in its bike-sharing system, encouraging cycling as a sustainable mode of transportation and reducing reliance on cars (https://orbit.dtu.dk/en/projects/c67e0118-f8c7-410e-8d78-35a6de1c78b9, accessed on 22 April 2025).
These implementations demonstrate the transformative impact of AI on urban mobility, offering solutions that enhance transportation efficiency, safety, and sustainability.

7. Discussions, Open Challenges, and Future Directions

Smart mobility seeks to transform transportation through the integration of advanced technologies such as AI, the IoT, and autonomous systems [197]. While significant progress has been made, several critical challenges must be addressed to realise the full potential of smart cities and enable truly efficient, safe, and sustainable transportation systems. This section systematically examines these challenges and proposes concrete research directions to overcome them.

7.1. Open Challenges

The transition to smart mobility faces several persistent technical and societal hurdles that require urgent attention:
  • Computational limitations:
    • Current state-of-the-art object recognition and context segmentation systems based on deep learning (e.g., YOLO [198] and SAM [69]) achieve impressive accuracy but at the cost of substantial computational complexity. This creates a significant barrier for deployment on resource-constrained edge devices commonly used in vehicular systems, forcing compromises between performance and practicality. Moreover, the process of adapting these research solutions for industrial applications often requires unexpected additional engineering effort and cost.
  • Data and modelling challenges:
    • Traffic prediction models continue to struggle with capturing the complex spatiotemporal dependencies inherent in transportation networks [104]. While current approaches can handle regular patterns well, they often fail to account for unexpected events such as accidents or special events that dramatically alter traffic flows. Furthermore, the lack of diverse, large-scale datasets for rare but safety-critical scenarios limits our ability to develop robust systems. In air quality modelling, despite advances in AI techniques [199], significant challenges remain in integrating disparate data sources (ground sensors and satellite imagery) while maintaining model interpretability for policymakers.
  • Sensor limitations:
    • While sensor fusion techniques have greatly improved the reliability of autonomous vehicle perception systems [200], they still exhibit performance degradation under challenging conditions such as heavy rain, snow, or complex urban environments with many occlusions. This limitation stems from both physical sensor limitations in adverse weather and algorithmic shortcomings in handling conflicting sensor inputs.
  • Network management issues:
    • The highly dynamic nature of vehicular networks creates unique communication challenges [201]. During peak hours in dense urban areas, the surge in Vehicle-to-Vehicle and Vehicle-to-Infrastructure communication can lead to network congestion, which existing protocols struggle to handle effectively. This becomes particularly problematic for safety-critical messages that require guaranteed low-latency delivery.
  • Security vulnerabilities:
    • The increasing reliance on machine learning for critical vehicle functions has introduced new attack vectors [12,202]. Adversarial attacks that subtly manipulate sensor inputs can cause dangerous misperceptions, while more direct attacks on vehicle control systems could have catastrophic consequences. Current defence mechanisms remain largely reactive and specialised in specific attack types.
  • Ethical and legal concerns:
    • The “trolley problem” and similar ethical dilemmas [203] highlight fundamental questions about how autonomous vehicles should make life-and-death decisions in unavoidable accident scenarios. Beyond these philosophical questions, practical legal frameworks for determining liability in AV-related accidents remain underdeveloped. Additionally, the massive data collection required for smart mobility systems raises significant privacy concerns that current regulations may not adequately address.

7.2. Future Directions

To address these challenges, the research community should focus on the following promising directions:
  • Efficient AI development:
    • Recent advances in model compression techniques like quantisation and knowledge distillation [204] show promise for deploying sophisticated AI models on edge devices; a minimal quantisation sketch is given after this list. These can be complemented by lightweight AI network architectures [205] that maintain accuracy while reducing parameters through innovative designs like multi-scale context awareness. The success of specialised YOLO variants [81,206] demonstrates how task-specific optimisations can achieve real-time performance without compromising detection quality.
  • Advanced modelling approaches:
    • Graph Neural Networks (GNNs) present an exciting opportunity [207] to better model the complex interactions in transportation systems, particularly for applications like intersection management and urban planning that have received less attention than traffic prediction. The development of city-scale digital twins [208] offers a powerful tool for testing planning algorithms across diverse scenarios without real-world risks. Combining these with synthetic data generation [209] could dramatically accelerate development cycles while reducing costs.
  • Robust perception systems:
    • Next-generation perception systems must handle diverse environmental conditions, as demonstrated by NTS-YOLO’s [206] effective handling of nocturnal scenarios. Combining such condition-specific optimisations with the hybrid communication framework of [210] could create more resilient multi-modal systems. The TSD-YOLO approach [81] shows particular promise for small object detection in cluttered urban environments.
  • Network optimisation:
    • Novel congestion management approaches should draw lessons from heterogeneous network implementations like [211], which successfully balanced autonomous monitoring with centralised control. Their experience with dynamic resource allocation could inform QoS-aware protocols for vehicular networks.
  • Trustworthy AI systems:
    • The development of explainable AI (XAI) techniques [212] is crucial to building trust in autonomous systems and meeting regulatory requirements. Simultaneously, we need to move beyond ad hoc defences against adversarial attacks toward certifiably robust models that can guarantee safety under defined threat models.
  • Regulatory frameworks:
    • Establishing standardised datasets and testing protocols will be essential to comparing different approaches and ensuring system reliability. Blockchain technology [213] offers a potential solution for secure, transparent data management, though challenges around scalability and implementation costs must be addressed.
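As an example of the compression techniques referenced under Efficient AI development above, the following is a minimal sketch of post-training dynamic quantisation in PyTorch; the model is a toy stand-in for a real detection head.

```python
import torch
import torch.nn as nn

# Toy stand-in for a network that must run on an edge device.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))

# Post-training dynamic quantisation: weights stored as int8,
# activations quantised on the fly; no retraining required.
quantised = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print("float32 output:", model(x).shape, "int8 output:", quantised(x).shape)
```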
Addressing these challenges and pursuing these research directions will require unprecedented collaboration across academia, industry, and government. Only through such multidisciplinary efforts can we overcome the technical, ethical, and regulatory hurdles to realise the full potential of smart mobility systems that are not only technologically advanced but also socially responsible and widely trusted.

8. Conclusions

In this paper, an overview of the role and evolution of DL methods in smart mobility has been presented, considering three key pillar areas: smart vehicles, smart planning, and vehicle network and security. It emerged that smart mobility holds immense potential to revolutionise transportation through AI, the IoT, and autonomous systems, but several challenges must be addressed to achieve efficient, safe, and sustainable urban mobility. Key obstacles include the computational complexity of deep learning models for object recognition and segmentation, necessitating optimised solutions like quantisation and edge computing. Graph Neural Networks (GNNs) and digital twins offer promising paths for traffic prediction and urban planning but require further exploration in areas like autonomous vehicle operations and intersection management. Sensor fusion enhances autonomous vehicle reliability, but challenges persist under adverse conditions, requiring improved simulation and data augmentation techniques. Vehicular network management faces congestion issues, demanding adaptive communication strategies. AI-driven pollution estimation and traffic prediction struggle with data heterogeneity, model interpretability, and real-time processing, calling for advanced DL architectures and explainable AI (XAI). Security remains another critical concern, with adversarial attacks posing risks to ML-based decision systems, necessitating robust defence mechanisms. Ethical dilemmas, such as decision making in unavoidable accidents and data privacy, highlight the need for transparent AI and regulatory frameworks. Blockchain technology could enhance security and accountability but requires industry-wide standardisation. Future research should prioritise large, diverse datasets, generalisable models for global infrastructure, and interdisciplinary collaboration to overcome these barriers. By addressing these challenges, smart mobility can realise its full potential in shaping the future of transportation.

Author Contributions

Conceptualization, M.D.-C. and M.L.; methodology, M.D.-C. and M.L.; investigation, M.D.-C., M.L., P.C., S.T.O. and D.I.; data curation, M.D.-C.; writing—original draft preparation, M.D.-C., M.L., P.C., S.T.O. and D.I.; writing—review and editing, M.D.-C., M.L., P.C., S.T.O. and D.I.; supervision, M.D.-C. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research study was partially supported by the project Future Artificial Intelligence Research (FAIR) CUP B53C22003630006, grant number PE0000013.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADAS	Advanced Driver Assistance System
AI	Artificial Intelligence
ANN	Artificial Neural Network
AQI	Air Quality Index
BGGRU	Bidirectional Graph Gated Recurrent Unit
BiLSTM	Bidirectional Long Short-Term Memory
CAN	Controller Area Network
CAV	Connected and Autonomous Vehicle
CNN	Convolutional Neural Network
DL	deep learning
DNN	Deep Neural Network
ECU	Electronic Control Unit
FL	Fuzzy Logic
GAN	Generative Adversarial Network
GCN	Graph Convolutional Network
GMAN	Graph Multi-Attention Network
GNSS	Global Navigation Satellite System
GRU	Gated Recurrent Unit
IoT	Internet of Things
IoV	Internet of Vehicles
IVN	in-vehicle network
LSTM	Long Short-Term Memory
ML	machine learning
MOT	multi-object tracking
OBD	Onboard Diagnostics
PDFormer	Propagation Delay-aware Dynamic Long-Range Transformer
PDT	Pedestrian Detection and Tracking
PM	Particulate Matter
R-CNN	Region-based Convolutional Neural Network
RNN	recurrent neural network
SAM	Segment Anything Model
SVM	Support Vector Machines
ST-DAGCN	Spatiotemporal Dual-Adaptive Graph Convolutional Network
TSANet	Traffic State Anticipation Network
V2B	Vehicle-to-Building
V2D	Vehicle-to-Device
V2G	Vehicle-to-Grid
V2I	Vehicle-to-Infrastructure
V2V	Vehicle-to-Vehicle
V2X	Vehicle-to-Everything
VANET	Vehicular Ad hoc Network
XAI	explainable AI
YOLO	You Only Look Once

References

  1. NHTSA Estimates for 2022 Show Roadway Fatalities Remain Flat After Two Years of Dramatic Increases. Available online: https://www.nhtsa.gov/press-releases/traffic-crash-death-estimates-2022 (accessed on 22 April 2025).
  2. Road Safety, Fatalities Rise in 2022: +4 Percent over 2021. Available online: https://www.eunews.it/en/2024/04/12/road-safety-fatalities-rise-in-2022-4-per-cent-over-2021/# (accessed on 22 April 2025).
  3. Singh, S. Critical Reasons for Crashes Investigated in the National Motor Vehicle Crash Causation Survey; Technical Report. 2015. Available online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812506 (accessed on 22 April 2025).
  4. D’orazio, T.; Leo, M.; Distante, A. Eye detection in face images for a driver vigilance system. In Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 8 October 2004; pp. 95–98. [Google Scholar]
  5. Ahmad, U.; Han, M.; Jolfaei, A.; Jabbar, S.; Ibrar, M.; Erbad, A.; Herbert Song, H.; Alkhrijah, Y. A Comprehensive Survey and Tutorial on Smart Vehicles: Emerging Technologies, Security Issues, and Solutions Using Machine Learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 15314–15341. [Google Scholar] [CrossRef]
  6. Parekh, D.; Poddar, N.; Rajpurkar, A.; Chahal, M.; Kumar, N.; Joshi, G.P.; Cho, W. A Review on Autonomous Vehicles: Progress, Methods and Challenges. Electronics 2022, 11, 2162. [Google Scholar] [CrossRef]
  7. Muhammad, K.; Ullah, A.; Lloret, J.; Ser, J.D.; De Albuquerque, V.H.C. Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4316–4336. [Google Scholar] [CrossRef]
  8. Kang, Y.; Yin, H.; Berger, C. Test Your Self-Driving Algorithm: An Overview of Publicly Available Driving Datasets and Virtual Testing Environments. IEEE Trans. Intell. Veh. 2019, 4, 171–185. [Google Scholar] [CrossRef]
  9. Almukhalfi, H.; Noor, A.; Noor, T.H. Traffic management approaches using machine learning and deep learning techniques: A survey. Eng. Appl. Artif. Intell. 2024, 133, 108147. [Google Scholar] [CrossRef]
  10. Ranyal, E.; Sadhu, A.; Jain, K. Road Condition Monitoring Using Smart Sensing and Artificial Intelligence: A Review. Sensors 2022, 22, 3044. [Google Scholar] [CrossRef]
  11. Elkhail, A.A.; Refat, R.U.D.; Habre, R.; Hafeez, A.; Bacha, A.; Malik, H. Vehicle Security: A Survey of Security Issues and Vulnerabilities, Malware Attacks and Defenses. IEEE Access 2021, 9, 162401–162437. [Google Scholar] [CrossRef]
  12. Qayyum, A.; Usama, M.; Qadir, J.; Al-Fuqaha, A. Securing Connected & Autonomous Vehicles: Challenges Posed by Adversarial Machine Learning and the Way Forward. IEEE Commun. Surv. Tutor. 2020, 22, 998–1026. [Google Scholar] [CrossRef]
  13. Wu, Y.; Dai, H.N.; Wang, H.; Xiong, Z.; Guo, S. A survey of intelligent network slicing management for industrial IoT: Integrated approaches for smart transportation, smart energy, and smart factory. IEEE Commun. Surv. Tutor. 2022, 24, 1175–1211. [Google Scholar] [CrossRef]
  14. Alzahrani, M.; Wang, Q.; Liao, W.; Chen, X.; Yu, W. Survey on multi-task learning in smart transportation. IEEE Access 2024, 12, 17023–17044. [Google Scholar] [CrossRef]
  15. Li, H.; Li, Y.; Wang, H.; Zeng, J.; Xu, H.; Cai, P.; Chen, L.; Yan, J.; Xu, F.; Xiong, L.; et al. Open-sourced data ecosystem in autonomous driving: The present and future. arXiv 2023, arXiv:2312.03408. [Google Scholar]
  16. Han, X.; Meng, Z.; Xia, X.; Liao, X.; He, B.Y.; Zheng, Z.; Wang, Y.; Xiang, H.; Zhou, Z.; Gao, L.; et al. Foundation Intelligence for Smart Infrastructure Services in Transportation 5.0. IEEE Trans. Intell. Veh. 2024, 9, 39–47. [Google Scholar] [CrossRef]
  17. Taraba, M.; Adamec, J.; Danko, M.; Drgona, P. Utilization of modern sensors in autonomous vehicles. In Proceedings of the 2018 ELEKTRO, Mikulov, Czech Republic, 21–23 May 2018; pp. 1–5. [Google Scholar] [CrossRef]
  18. Richards, M.A. Fundamentals of Radar Signal Processing, 2nd ed.; McGraw-Hill Education: New York, NY, USA, 2014. [Google Scholar]
  19. Talbot, S.C.; Ren, S. Comparision of FieldBus Systems CAN, TTCAN, FlexRay and LIN in Passenger Vehicles. In Proceedings of the 2009 29th IEEE International Conference on Distributed Computing Systems Workshops, Montreal, QC, Canada, 22–26 June 2009; pp. 26–31. [Google Scholar] [CrossRef]
  20. Camara, F.; Bellotto, N.; Cosar, S.; Weber, F.; Nathanael, D.; Althoff, M.; Wu, J.; Ruenz, J.; Dietrich, A.; Markkula, G.; et al. Pedestrian models for autonomous driving part ii: High-level models of human behavior. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5453–5472. [Google Scholar] [CrossRef]
  21. Ravindran, R.; Santora, M.J.; Jamali, M.M. Multi-Object Detection and Tracking, Based on DNN, for Autonomous Vehicles: A Review. IEEE Sens. J. 2021, 21, 5668–5677. [Google Scholar] [CrossRef]
  22. Sun, Z.; Chen, J.; Chao, L.; Ruan, W.; Mukherjee, M. A Survey of Multiple Pedestrian Tracking Based on Tracking-by-Detection Framework. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 1819–1833. [Google Scholar] [CrossRef]
  23. Zhang, C.; Berger, C. Pedestrian Behavior Prediction Using Deep Learning Methods for Urban Scenarios: A Review. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10279–10301. [Google Scholar] [CrossRef]
  24. Ning, C.; Menglu, L.; Hao, Y.; Xueping, S.; Yunhong, L. Survey of pedestrian detection with occlusion. Complex Intell. Syst. 2021, 7, 577–587. [Google Scholar] [CrossRef]
  25. Cao, J.; Pang, Y.; Xie, J.; Khan, F.S.; Shao, L. From Handcrafted to Deep Features for Pedestrian Detection: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4913–4934. [Google Scholar] [CrossRef]
  26. Shivappriya, S.N.; Priyadarsini, M.J.P.; Stateczny, A.; Puttamadappa, C.; Parameshachari, B.D. Cascade Object Detection and Remote Sensing Object Detection Method Based on Trainable Activation Function. Remote Sens. 2021, 13, 200. [Google Scholar] [CrossRef]
  27. Indapwar, A.; Choudhary, J.; Singh, D.P. Survey of Real-Time Object Detection for Logo Detection System. In Intelligent Systems; Sheth, A., Sinhal, A., Shrivastava, A., Pandey, A.K., Eds.; Series Title: Algorithms for Intelligent Systems; Springer: Singapore, 2021; pp. 61–72. [Google Scholar] [CrossRef]
  28. Khan, A.H.; Munir, M.; van Elst, L.; Dengel, A. F2dnet: Fast focal detection network for pedestrian detection. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 4658–4664. [Google Scholar]
  29. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. Version Number: 5. arXiv 2015, arXiv:1506.02640. [Google Scholar]
  30. Xiao, X.; Wang, B.; Miao, L.; Li, L.; Zhou, Z.; Ma, J.; Dong, D. Infrared and Visible Image Object Detection via Focused Feature Enhancement and Cascaded Semantic Extension. Remote Sens. 2021, 13, 2538. [Google Scholar] [CrossRef]
  31. Khan, A.H.; Nawaz, M.S.; Dengel, A. Localized semantic feature mixers for efficient pedestrian detection in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5476–5485. [Google Scholar]
  32. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Occlusion-Aware R-CNN: Detecting Pedestrians in a Crowd. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Series Title: Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11207, pp. 657–674. [Google Scholar] [CrossRef]
  33. Tian, Y.; Luo, P.; Wang, X.; Tang, X. Deep Learning Strong Parts for Pedestrian Detection. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; pp. 1904–1912. [Google Scholar] [CrossRef]
  34. Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I.S. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1037–1045. [Google Scholar] [CrossRef]
  35. Hangil, C.; Kim, S.; Kihong, P.; Sohn, K. Multi-spectral pedestrian detection based on accumulated object proposal with fully convolutional networks. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 621–626. [Google Scholar] [CrossRef]
  36. Hou, Y.L.; Song, Y.; Hao, X.; Shen, Y.; Qian, M. Multispectral pedestrian detection based on deep convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xiamen, China, 22–25 October 2017; pp. 1–4. [Google Scholar] [CrossRef]
  37. Dinakaran, R.K.; Easom, P.; Bouridane, A.; Zhang, L.; Jiang, R.; Mehboob, F.; Rauf, A. Deep Learning Based Pedestrian Detection at Distance in Smart Cities. In Intelligent Systems and Applications; Bi, Y., Bhatia, R., Kapoor, S., Eds.; Series Title: Advances in Intelligent Systems and Computing; Springer International Publishing: Cham, Switzerland, 2020; Volume 1038, pp. 588–593. [Google Scholar] [CrossRef]
  38. Navarro, P.; Fernández, C.; Borraz, R.; Alonso, D. A Machine Learning Approach to Pedestrian Detection for Autonomous Vehicles Using High-Definition 3D Range Data. Sensors 2016, 17, 18. [Google Scholar] [CrossRef] [PubMed]
  39. Zhao, X.; Sun, P.; Xu, Z.; Min, H.; Yu, H. Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications. IEEE Sens. J. 2020, 20, 4901–4913. [Google Scholar] [CrossRef]
  40. Palffy, A.; Kooij, J.F.P.; Gavrila, D.M. Occlusion aware sensor fusion for early crossing pedestrian detection. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 1768–1774. [Google Scholar] [CrossRef]
  41. Zhu, X.; Fu, W.; Xu, X. Intent Prediction of Pedestrians via Integration of Facial Expression and Human 2D Skeleton for Autonomous Car-like Mobile Robots. In Proceedings of the 2021 IEEE 16th Conference on Industrial Electronics and Applications (ICIEA), Chengdu, China, 1–4 August 2021; pp. 1775–1780. [Google Scholar]
  42. Galvão, L.G.; Huda, M.N. Pedestrian and vehicle behaviour prediction in autonomous vehicle system—A review. Expert Syst. Appl. 2024, 238, 121983. [Google Scholar] [CrossRef]
  43. Dimiccoli, M.; Cartas, A.; Radeva, P. Activity recognition from visual lifelogs: State of the art and future challenges. In Multimodal Behavior Analysis in the Wild; Elsevier: Amsterdam, The Netherlands, 2019; pp. 121–134. [Google Scholar] [CrossRef]
  44. Huang, J.; Gautam, A.; Saripalli, S. Learning Pedestrian Actions to Ensure Safe Autonomous Driving. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–8. [Google Scholar]
  45. Kotseruba, I.; Rasouli, A.; Tsotsos, J.K. Benchmark for Evaluating Pedestrian Action Prediction. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 1257–1267. [Google Scholar] [CrossRef]
  46. Rasouli, A.; Kotseruba, I.; Kunic, T.; Tsotsos, J. PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6261–6270. [Google Scholar] [CrossRef]
  47. Rasouli, A.; Kotseruba, I.; Tsotsos, J.K. Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 206–213. [Google Scholar] [CrossRef]
  48. Lorenzo, J.; Alonso, I.P.; Izquierdo, R.; Ballardini, A.L.; Saz, H.; Llorca, D.F.; Sotelo, M. CAPformer: Pedestrian Crossing Action Prediction Using Transformer. Sensors 2021, 21, 5694. [Google Scholar] [CrossRef]
  49. Rasouli, A.; Rohani, M.; Luo, J. Bifold and Semantic Reasoning for Pedestrian Behavior Prediction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 15580–15590. [Google Scholar] [CrossRef]
  50. Li, B. 3D fully convolutional network for vehicle detection in point cloud. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1513–1518. [Google Scholar] [CrossRef]
  51. Milan, A.; Rezatofighi, S.H.; Dick, A.; Reid, I.; Schindler, K. Online Multi-Target Tracking Using Recurrent Neural Networks. Proc. AAAI Conf. Artif. Intell. 2017, 31, 4225–4232. [Google Scholar] [CrossRef]
  52. Asvadi, A.; Garrote, L.; Premebida, C.; Peixoto, P.; Nunes, U.J. Multimodal vehicle detection: Fusing 3D-LIDAR and color camera data. Pattern Recognit. Lett. 2018, 115, 20–29. [Google Scholar] [CrossRef]
  53. Bin Zuraimi, M.A.; Kamaru Zaman, F.H. Vehicle Detection and Tracking using YOLO and DeepSORT. In Proceedings of the 2021 IEEE 11th IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 3–4 April 2021; pp. 23–29. [Google Scholar] [CrossRef]
  54. Dong, X.; Yan, S.; Duan, C. A lightweight vehicles detection network model based on YOLOv5. Eng. Appl. Artif. Intell. 2022, 113, 104914. [Google Scholar] [CrossRef]
  55. Hong, L.; Yan, S.; Zhang, R.; Li, W.; Zhou, X.; Guo, P.; Jiang, K.; Chen, Y.; Li, J.; Chen, Z.; et al. OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 19079–19091. [Google Scholar] [CrossRef]
  56. Chadwick, S.; Maddern, W.; Newman, P. Distant Vehicle Detection Using Radar and Vision. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8311–8317. [Google Scholar] [CrossRef]
  57. Punyavathi, G.; Neeladri, M.; K Singh, M. Vehicle tracking and detection techniques using IoT. Mater. Today Proc. 2022, 51, 909–913. [Google Scholar] [CrossRef]
  58. Hu, X.; Xu, X.; Xiao, Y.; Chen, H.; He, S.; Qin, J.; Heng, P.A. SINet: A Scale-Insensitive Convolutional Neural Network for Fast Vehicle Detection. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1010–1019. [Google Scholar] [CrossRef]
  59. Park, Y.; Dang, L.M.; Lee, S.; Han, D.; Moon, H. Multiple Object Tracking in Deep Learning Approaches: A Survey. Electronics 2021, 10, 2406. [Google Scholar] [CrossRef]
  60. Luo, W.; Xing, J.; Milan, A.; Zhang, X.; Liu, W.; Kim, T.K. Multiple object tracking: A literature review. Artif. Intell. 2021, 293, 103448. [Google Scholar] [CrossRef]
  61. Lee, B.; Erdenee, E.; Jin, S.; Rhee, P.K. Multi-Class Multi-Object Tracking using Changing Point Detection. Version Number: 1. arXiv 2016, arXiv:1608.08434. [Google Scholar]
  62. Chen, Y.; Shin, H. Pedestrian Detection at Night in Infrared Images Using an Attention-Guided Encoder-Decoder Convolutional Neural Network. Appl. Sci. 2020, 10, 809. [Google Scholar] [CrossRef]
  63. Nagy, M.; Werghi, N.; Hassan, B.; Dias, J.; Khonji, M. RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud. arXiv 2024, arXiv:2405.11536. [Google Scholar]
  64. Leal-Taixe, L.; Fenzi, M.; Kuznetsova, A.; Rosenhahn, B.; Savarese, S. Learning an Image-Based Motion Context for Multiple People Tracking. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3542–3549. [Google Scholar] [CrossRef]
  65. Tang, X.; Zhang, Z.; Qin, Y. On-Road Object Detection and Tracking Based on Radar and Vision Fusion: A Review. IEEE Intell. Transp. Syst. Mag. 2022, 14, 103–128. [Google Scholar] [CrossRef]
  66. Wang, C.; Wang, Y.; Xu, M.; Crandall, D.J. Stepwise goal-driven networks for trajectory prediction. IEEE Robot. Autom. Lett. 2022, 7, 2716–2723. [Google Scholar] [CrossRef]
  67. Zhou, Z.; Wang, J.; Li, Y.H.; Huang, Y.K. Query-centric trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 17863–17873. [Google Scholar]
  68. Wang, S.; Liu, H.; Wang, Q.; Zhou, Y.; Yao, Y. SMP-Track: SAM in Multi-Pedestrian Tracking. In Proceedings of the 2024 IEEE 11th International Conference on Data Science and Advanced Analytics (DSAA), San Diego, CA, USA, 6–10 October 2024; pp. 1–9. [Google Scholar] [CrossRef]
  69. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. Version Number: 1. arXiv 2023, arXiv:2304.02643. [Google Scholar]
  70. Gao, S.; Zhou, C.; Zhang, J. Generalized Relation Modeling for Transformer Tracking. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 18686–18695. [Google Scholar] [CrossRef]
  71. Zhu, Y.; Yan, W.Q. Traffic sign recognition based on deep learning. Multimed. Tools Appl. 2022, 81, 17779–17791. [Google Scholar] [CrossRef]
  72. An, F.; Wang, J.; Liu, R. Road traffic sign recognition algorithm based on cascade attention-modulation fusion mechanism. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17841–17851. [Google Scholar] [CrossRef]
  73. Haque, W.A.; Arefin, S.; Shihavuddin, A.; Hasan, M.A. DeepThin: A novel lightweight CNN architecture for traffic sign recognition without GPU requirements. Expert Syst. Appl. 2021, 168, 114481. [Google Scholar] [CrossRef]
  74. Hong, H.; Zhou, Y.; Shu, X.; Hu, X. CCSPNet-joint: Efficient joint training method for traffic sign detection under extreme conditions. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8. [Google Scholar]
  75. Flores-Calero, M.; Astudillo, C.A.; Guevara, D.; Maza, J.; Lita, B.S.; Defaz, B.; Ante, J.S.; Zabala-Blanco, D.; Armingol Moreno, J.M. Traffic sign detection and recognition using YOLO object detection algorithm: A systematic review. Mathematics 2024, 12, 297. [Google Scholar] [CrossRef]
  76. Soylu, E.; Soylu, T. A performance comparison of YOLOv8 models for traffic sign detection in the Robotaxi-full scale autonomous vehicle competition. Multimed. Tools Appl. 2024, 83, 25005–25035. [Google Scholar] [CrossRef]
  77. Wang, J.; Chen, Y.; Dong, Z.; Gao, M. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2023, 35, 7853–7865. [Google Scholar] [CrossRef]
  78. Zhang, S.; Che, S.; Liu, Z.; Zhang, X. A real-time and lightweight traffic sign detection method based on ghost-YOLO. Multimed. Tools Appl. 2023, 82, 26063–26087. [Google Scholar] [CrossRef]
  79. Dewi, C.; Chen, R.C.; Jiang, X.; Yu, H. Deep convolutional neural network for enhancing traffic sign recognition developed on Yolo V4. Multimed. Tools Appl. 2022, 81, 37821–37845. [Google Scholar] [CrossRef]
  80. Zhao, R.; Tang, S.H.; Shen, J.; Supeni, E.E.B.; Rahim, S.A. Enhancing autonomous driving safety: A robust traffic sign detection and recognition model TSD-YOLO. Signal Process. 2024, 225, 109619. [Google Scholar] [CrossRef]
  81. Li, H.; Zhang, R.; Zhao, M. TSD-YOLO: Small traffic sign detection based on improved YOLO v8. Expert Syst. Appl. 2024, 238, 121824. [Google Scholar] [CrossRef]
  82. Hou, Y.; Ma, Z.; Liu, C.; Hui, T.W.; Loy, C.C. Inter-region affinity distillation for road marking segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12486–12495. [Google Scholar]
  83. Li, J.; Zhan, Y.; Yun, P.; Zhou, G.; Chen, Q.; Fan, R. RoadFormer: Duplex transformer for RGB-normal semantic road scene parsing. IEEE Trans. Intell. Veh. 2024, 9, 5163–5172. [Google Scholar] [CrossRef]
  84. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  85. Wu, Z.; Feng, Y.; Liu, C.W.; Yu, F.; Chen, Q.; Fan, R. S3M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving. IEEE Trans. Intell. Veh. 2024, 9, 3940–3951. [Google Scholar] [CrossRef]
  86. Paul, S.; Patterson, Z.; Bouguila, N. Fishsegssl: A semi-supervised semantic segmentation framework for fish-eye images. J. Imaging 2024, 10, 71. [Google Scholar] [CrossRef]
  87. Hu, H.; Wang, F.; Su, J.; Wang, Y.; Hu, L.; Fang, W.; Xu, J.; Zhang, Z. Ea-lss: Edge-aware lift-splat-shot framework for 3d bev object detection. arXiv 2023, arXiv:2303.17895. [Google Scholar]
  88. Luo, S.; Chen, W.; Tian, W.; Liu, R.; Hou, L.; Zhang, X.; Shen, H.; Wu, R.; Geng, S.; Zhou, Y.; et al. Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives. IEEE Trans. Intell. Veh. 2024, 1–25. [Google Scholar] [CrossRef]
  89. Zheng, T.; Fang, H.; Zhang, Y.; Tang, W.; Yang, Z.; Liu, H.; Cai, D. Resa: Recurrent feature-shift aggregator for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021; Volume 35, pp. 3547–3554. [Google Scholar]
  90. Honda, H.; Uchida, Y. CLRerNet: Improving confidence of lane detection with LaneIoU. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 1176–1185. [Google Scholar]
  91. Zhou, H.; Zhou, H.; Chang, J.; Lu, T.; Ma, J. 3d lane detection from front or surround-view using joint-modeling & matching. IEEE Trans. Intell. Veh. 2024, 1–14. [Google Scholar]
  92. Wu, D.; Jia, F.; Chang, J.; Li, Z.; Sun, J.; Han, C.; Li, S.; Liu, Y.; Ge, Z.; Wang, T. The 1st-place solution for cvpr 2023 openlane topology in autonomous driving challenge. arXiv 2023, arXiv:2306.09590. [Google Scholar]
  93. Xi, S.; Liu, Z.; Wang, Z.; Zhang, Q.; Ding, H.; Kang, C.C.; Chen, Z. Autonomous driving roadway feature interpretation using integrated semantic analysis and domain adaptation. IEEE Access 2024, 12, 98254–98269. [Google Scholar] [CrossRef]
  94. Yang, L.; He, Z.; Zhao, X.; Fang, S.; Yuan, J.; He, Y.; Li, S.; Liu, S. A deep learning method for traffic light status recognition. J. Intell. Connect. Veh. 2023, 6, 173–182. [Google Scholar] [CrossRef]
  95. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605. [Google Scholar]
  96. VT, M.A.; Omar, M.; Ahamad, J.; Ahmad, K.; Khan, M.A. Deep Learning-Based Speed Breaker Detection. SN Comput. Sci. 2024, 5, 571. [Google Scholar] [CrossRef]
  97. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974. [Google Scholar]
  98. Lis, K.; Nakka, K.; Fua, P.; Salzmann, M. Detecting the unexpected via image resynthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2152–2161. [Google Scholar]
  99. Pinggera, P.; Ramos, S.; Gehrig, S.; Franke, U.; Rother, C.; Mester, R. Lost and found: Detecting small road hazards for self-driving vehicles. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–16 October 2016; pp. 1099–1106. [Google Scholar]
  100. Nayal, N.; Yavuz, M.; Henriques, J.F.; Güney, F. Rba: Segmenting unknown regions rejected by all. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 711–722. [Google Scholar]
  101. Rai, S.N.; Cermelli, F.; Fontanel, D.; Masone, C.; Caputo, B. Unmasking anomalies in road-scene segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4037–4046. [Google Scholar]
  102. Laskar, Z.; Vojir, T.; Grcic, M.; Melekhov, I.; Gangisettye, S.; Kannala, J.; Matas, J.; Tolias, G.; Jawahar, C.V. A Dataset for Semantic Segmentation in the Presence of Unknowns. arXiv 2025, arXiv:2503.22309. [Google Scholar]
  103. Yuan, H.; Li, G. A survey of traffic prediction: From spatio-temporal data to intelligent transportation. Data Sci. Eng. 2021, 6, 63–85. [Google Scholar] [CrossRef]
  104. Yin, X.; Wu, G.; Wei, J.; Shen, Y.; Qi, H.; Yin, B. Deep learning on traffic prediction: Methods, analysis, and future directions. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4927–4943. [Google Scholar] [CrossRef]
  105. Guo, H.; Hou, X.; Cao, Z.; Zhang, J. GP3: Gaussian process path planning for reliable shortest path in transportation networks. IEEE Trans. Intell. Transp. Syst. 2021, 23, 11575–11590. [Google Scholar] [CrossRef]
  106. Zhu, M.; Liu, X.Y.; Wang, X. An online ride-sharing path-planning strategy for public vehicle systems. IEEE Trans. Intell. Transp. Syst. 2018, 20, 616–627. [Google Scholar] [CrossRef]
  107. Kumar, K.N.; Roy, D.; Suman, T.A.; Vishnu, C.; Mohan, C.K. TSANet: Forecasting traffic congestion patterns from aerial videos using graphs and transformers. Pattern Recognit. 2024, 155, 110721. [Google Scholar] [CrossRef]
  108. Zhao, Z.; Shen, G.; Wang, L.; Kong, X. Graph Spatial-Temporal Transformer Network for Traffic Prediction. Big Data Res. 2024, 36, 100427. [Google Scholar] [CrossRef]
  109. Liu, Y.; Feng, T.; Rasouli, S.; Wong, M. ST-DAGCN: A spatiotemporal dual adaptive graph convolutional network model for traffic prediction. Neurocomputing 2024, 601, 128175. [Google Scholar] [CrossRef]
  110. Do, V.M.; Tran, Q.H.; Le, K.G.; Vuong, X.C.; Vu, V.T. Enhanced Deep Neural Networks for Traffic Speed Forecasting Regarding Sustainable Traffic Management Using Probe Data from Registered Transport Vehicles on Multilane Roads. Sustainability 2024, 16, 2453. [Google Scholar] [CrossRef]
  111. Wu, S. Spatiotemporal dynamic forecasting and analysis of regional traffic flow in urban road networks using deep learning convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2021, 23, 1607–1615. [Google Scholar] [CrossRef]
  112. Iskandaryan, D.; Ramos, F.; Trilles, S. Bidirectional convolutional LSTM for the prediction of nitrogen dioxide in the city of Madrid. PLoS ONE 2022, 17, e0269295. [Google Scholar] [CrossRef]
113. Jiang, J.; Han, C.; Zhao, W.X.; Wang, J. Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 4365–4373. [Google Scholar]
  114. Zheng, C.; Fan, X.; Wang, C.; Qi, J. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1234–1241. [Google Scholar]
  115. Masood, A.; Ahmad, K. A review on emerging artificial intelligence (AI) techniques for air pollution forecasting: Fundamentals, application and performance. J. Clean. Prod. 2021, 322, 129072. [Google Scholar] [CrossRef]
  116. Iskandaryan, D.; Ramos, F.; Trilles, S. Application of deep learning and machine learning in air quality modeling. In Current Trends and Advances in Computer-Aided Intelligent Environmental Data Engineering; Elsevier: Amsterdam, The Netherlands, 2022; pp. 11–23. [Google Scholar]
  117. Matthaios, V.N.; Knibbs, L.D.; Kramer, L.J.; Crilley, L.R.; Bloss, W.J. Predicting real-time within-vehicle air pollution exposure with mass-balance and machine learning approaches using on-road and air quality data. Atmos. Environ. 2024, 318, 120233. [Google Scholar] [CrossRef]
  118. Heydari, A.; Majidi Nezhad, M.; Astiaso Garcia, D.; Keynia, F.; De Santoli, L. Air pollution forecasting application based on deep learning model and optimization algorithm. Clean Technol. Environ. Policy 2022, 24, 607–621. [Google Scholar] [CrossRef]
  119. Janarthanan, R.; Partheeban, P.; Somasundaram, K.; Elamparithi, P.N. A deep learning approach for prediction of air quality index in a metropolitan city. Sustain. Cities Soc. 2021, 67, 102720. [Google Scholar] [CrossRef]
  120. Iskandaryan, D.; Ramos, F.; Trilles, S. Graph neural network for air quality prediction: A case study in madrid. IEEE Access 2023, 11, 2729–2742. [Google Scholar] [CrossRef]
  121. Gu, Y.; Li, B.; Meng, Q. Hybrid interpretable predictive machine learning model for air pollution prediction. Neurocomputing 2022, 468, 123–136. [Google Scholar] [CrossRef]
  122. Jin, X.B.; Wang, Z.Y.; Kong, J.L.; Bai, Y.T.; Su, T.L.; Ma, H.J.; Chakrabarti, P. Deep spatio-temporal graph network with self-optimization for air quality prediction. Entropy 2023, 25, 247. [Google Scholar] [CrossRef]
  123. Ansari, M.; Alam, M. An intelligent IoT-cloud-based air pollution forecasting model using univariate time-series analysis. Arab. J. Sci. Eng. 2024, 49, 3135–3162. [Google Scholar] [CrossRef]
124. Liang, Y.; Xia, Y.; Ke, S.; Wang, Y.; Wen, Q.; Zhang, J.; Zheng, Y.; Zimmermann, R. Airformer: Predicting nationwide air quality in china with transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 14329–14337. [Google Scholar]
125. Zhang, Z.; Zhang, S. Modeling air quality PM2.5 forecasting using deep sparse attention-based transformer networks. Int. J. Environ. Sci. Technol. 2023, 20, 13535–13550. [Google Scholar] [CrossRef]
  126. Queiroz, C.A.; Gautam, S. Road Infrastructure and Economic Development: Some Diagnostic Indicators; World Bank Publications: Washington, DC, USA, 1992; Volume 921. [Google Scholar]
  127. Majidifard, H.; Adu-Gyamfi, Y.; Buttlar, W.G. Deep machine learning approach to develop a new asphalt pavement condition index. Constr. Build. Mater. 2020, 247, 118513. [Google Scholar] [CrossRef]
  128. Ahmed, T.; Ejaz, N.; Choudhury, S. Redefining Real-time Road Quality Analysis with Vision Transformers on Edge Devices. IEEE Trans. Artif. Intell. 2024, 5, 4972–4983. [Google Scholar] [CrossRef]
  129. Moroto, Y.; Maeda, K.; Togo, R.; Ogawa, T.; Haseyama, M. Multimodal Transformer Model Using Time-Series Data to Classify Winter Road Surface Conditions. Sensors 2024, 24, 3440. [Google Scholar] [CrossRef] [PubMed]
  130. Zhang, Z.; Song, W.; Zhuang, Y.; Zhang, B.; Wu, J. Automated Multi-Type Pavement Distress Segmentation and Quantification Using Transformer Networks for Pavement Condition Index Prediction. Appl. Sci. 2024, 14, 4709. [Google Scholar] [CrossRef]
  131. Ji, A.; Xue, X.; Wang, Y.; Luo, X.; Xue, W. An integrated approach to automatic pixel-level crack detection and quantification of asphalt pavement. Autom. Constr. 2020, 114, 103176. [Google Scholar] [CrossRef]
  132. Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776. [Google Scholar] [CrossRef]
  133. Kulambayev, B.; Astaubayeva, G.; Tleuberdiyeva, G.; Alimkulova, J.; Nussupbekova, G.; Kisseleva, O. Deep CNN Approach with Visual Features for Real-Time Pavement Crack Detection. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 319–328. [Google Scholar] [CrossRef]
  134. Riid, A.; Louk, R.; Pihlak, R.; Tepljakov, A.; Vassiljeva, K. Pavement distress detection with deep learning using the orthoframes acquired by a mobile mapping system. Appl. Sci. 2019, 9, 4829. [Google Scholar] [CrossRef]
  135. Liu, C.; Wu, D.; Li, Y.; Du, Y. Large-scale pavement roughness measurements with vehicle crowdsourced data using semi-supervised learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103048. [Google Scholar] [CrossRef]
  136. Belmonte-Fernández, Ó.; Sansano-Sansano, E.; Trilles, S.; Caballer-Miedes, A. A reactive architectural proposal for fog/edge computing in the internet of things paradigm with application in deep learning. In Artificial Intelligence, Machine Learning, and Optimization Tools for Smart Cities: Designing for Sustainability; Springer International Publishing: Cham, Switzerland, 2022; pp. 155–175. [Google Scholar]
  137. Granell, C.; Kamilaris, A.; Kotsev, A.; Ostermann, F.O.; Trilles, S. Internet of things. In Manual of Digital Earth; Springer: Singapore, 2020; pp. 387–423. [Google Scholar]
  138. Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824. [Google Scholar] [CrossRef]
  139. Philipsen, S.G.; Andersen, B.; Singh, B. Threats and Attacks to Modern Vehicles. In Proceedings of the 2021 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS), Bandung, Indonesia, 23–24 November 2021; pp. 22–27. [Google Scholar] [CrossRef]
140. Bharati, S.; Podder, P.; Mondal, M.R.H.; Robel, M.R.A. Threats and Countermeasures of Cyber Security in Direct and Remote Vehicle Communication Systems. arXiv 2020, arXiv:2006.08723. [Google Scholar]
  141. Koscher, K.; Czeskis, A.; Roesner, F.; Patel, S.; Kohno, T.; Checkoway, S.; McCoy, D.; Kantor, B.; Anderson, D.; Shacham, H.; et al. Experimental Security Analysis of a Modern Automobile. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; pp. 447–462. [Google Scholar] [CrossRef]
  142. Khanapuri, E.; Chintalapati, T.; Sharma, R.; Gerdes, R. Learning-based adversarial agent detection and identification in cyber physical systems applied to autonomous vehicular platoon. In Proceedings of the 2019 IEEE/ACM 5th International Workshop on Software Engineering for Smart Cyber-Physical Systems (SEsCPS), Montreal, QC, Canada, 28 May 2019; pp. 39–45. [Google Scholar]
  143. Sherazi, H.H.R.; Iqbal, R.; Ahmad, F.; Khan, Z.A.; Chaudary, M.H. DDoS attack detection: A key enabler for sustainable communication in internet of vehicles. Sustain. Comput. Inform. Syst. 2019, 23, 13–20. [Google Scholar] [CrossRef]
  144. Gruebler, A.; McDonald-Maier, K.D.; Alheeti, K.M.A. An intrusion detection system against black hole attacks on the communication network of self-driving cars. In Proceedings of the 2015 Sixth International Conference on Emerging Security Technologies (EST), Braunschweig, Germany, 3–5 September 2015; pp. 86–91. [Google Scholar]
145. Kamel, J.; Haidar, F.; Jemaa, I.B.; Kaiser, A.; Lonc, B.; Urien, P. A misbehavior authority system for sybil attack detection in c-its. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 10–12 October 2019; pp. 1117–1123. [Google Scholar]
146. Dasgupta, S.; Rahman, M.; Islam, M.; Chowdhury, M. Prediction-Based GNSS Spoofing Attack Detection for Autonomous Vehicles. arXiv 2020, arXiv:2010.11722. [Google Scholar]
  147. Xu, Y.; Lei, M.; Li, M.; Zhao, M.; Hu, B. A new anti-jamming strategy based on deep reinforcement learning for MANET. In Proceedings of the 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur, Malaysia, 28 April–1 May 2019; pp. 1–5. [Google Scholar]
  148. Tariq, S.; Lee, S.; Woo, S.S. CANTransfer: Transfer learning based intrusion detection on a controller area network using convolutional LSTM network. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Virtual, 30 March–3 April 2020; pp. 1048–1055. [Google Scholar]
  149. Li, X.; Hu, Z.; Xu, M.; Wang, Y.; Ma, J. Transfer learning based intrusion detection scheme for Internet of vehicles. Inf. Sci. 2021, 547, 119–135. [Google Scholar] [CrossRef]
  150. Ababsa, M.; Ribouh, S.; Malki, A.; Khoukhi, L. Deep Multimodal Learning for Real-Time DDoS Attacks Detection in Internet of Vehicles. arXiv 2025, arXiv:2501.15252. [Google Scholar]
  151. Nazat, S.; Abdallah, M. XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems. arXiv 2024, arXiv:2410.15405. [Google Scholar]
  152. Ali, W.; Din, I.U.; Almogren, A.; Rodrigues, J.J.P.C. Federated Learning-Based Privacy-Aware Location Prediction Model for Internet of Vehicular Things. IEEE Trans. Veh. Technol. 2025, 74, 1968–1978. [Google Scholar] [CrossRef]
  153. Qu, A.; Tang, Y.; Ma, W. Adversarial attacks on deep reinforcement learning-based traffic signal control systems with colluding vehicles. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–22. [Google Scholar] [CrossRef]
  154. Im Choi, J.; Tian, Q. Adversarial attack and defense of yolo detectors in autonomous driving scenarios. In Proceedings of the 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 5–9 June 2022; pp. 1011–1017. [Google Scholar]
  155. Protogerou, A.; Papadopoulos, S.; Drosou, A.; Tzovaras, D.; Refanidis, I. A graph neural network method for distributed anomaly detection in IoT. Evol. Syst. 2021, 12, 19–36. [Google Scholar] [CrossRef]
156. Schafer, H.; Santana, E.; Haden, A.; Biasini, R. A Commute in Data: The comma2k19 Dataset. arXiv 2018, arXiv:1812.05752. [Google Scholar]
  157. Sant’Ana Da Silva, E.; Pedrini, H.; Santos, A.L.D. Applying Graph Neural Networks to Support Decision Making on Collective Intelligent Transportation Systems. IEEE Trans. Netw. Serv. Manag. 2023, 20, 4085–4096. [Google Scholar] [CrossRef]
  158. Narayanan, S.N.; Mittal, S.; Joshi, A. OBD_SecureAlert: An Anomaly Detection System for Vehicles. In Proceedings of the 2016 IEEE International Conference on Smart Computing (SMARTCOMP), St Louis, MO, USA, 18–20 May 2016; pp. 1–6. [Google Scholar] [CrossRef]
  159. Alshammari, A.; Zohdy, M.A.; Debnath, D.; Corser, G. Classification Approach for Intrusion Detection in Vehicle Systems. Wirel. Eng. Technol. 2018, 09, 79–94. [Google Scholar] [CrossRef]
  160. Ahmad, U.; Song, H.; Bilal, A.; Alazab, M.; Jolfaei, A. Secure Passive Keyless Entry and Start System Using Machine Learning. In Security, Privacy, and Anonymity in Computation, Communication, and Storage; Wang, G., Chen, J., Yang, L.T., Eds.; Series Title: Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11342, pp. 304–313. [Google Scholar] [CrossRef]
  161. Ali Alheeti, K.M.; Al-Zaidi, R.; Woods, J.; McDonald-Maier, K. An intrusion detection scheme for driverless vehicles based gyroscope sensor profiling. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 8–11 January 2017; pp. 448–449. [Google Scholar] [CrossRef]
  162. Ahmad, U.; Song, H.; Bilal, A.; Alazab, M.; Jolfaei, A. Securing smart vehicles from relay attacks using machine learning. J. Supercomput. 2020, 76, 2665–2682. [Google Scholar] [CrossRef]
  163. Han, M.; Cheng, P.; Ma, S. CVNNs-IDS: Complex-Valued Neural Network Based In-Vehicle Intrusion Detection System. In Security and Privacy in Digital Economy; Yu, S., Mueller, P., Qian, J., Eds.; Series Title: Communications in Computer and Information Science; Springer: Singapore, 2020; Volume 1268, pp. 263–277. [Google Scholar] [CrossRef]
  164. Clark, G.; Doran, M.; Glisson, W. A Malicious Attack on the Machine Learning Policy of a Robotic System. In Proceedings of the 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), New York, NY, USA, 1–8 August 2018; pp. 516–521. [Google Scholar] [CrossRef]
165. Biggio, B.; Nelson, B.; Laskov, P. Poisoning Attacks against Support Vector Machines. arXiv 2012, arXiv:1206.6389. [Google Scholar]
166. Sitawarin, C.; Bhagoji, A.N.; Mosenia, A.; Chiang, M.; Mittal, P. DARTS: Deceiving Autonomous Cars with Toxic Signs. arXiv 2018, arXiv:1802.06430. [Google Scholar]
  167. Zhu, Y.; Miao, C.; Hajiaghajani, F.; Huai, M.; Su, L.; Qiao, C. Adversarial attacks against lidar semantic segmentation in autonomous driving. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, Coimbra, Portugal, 15–17 November 2021; pp. 329–342. [Google Scholar]
  168. Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597. [Google Scholar] [CrossRef]
  169. Lu, J.; Issaranon, T.; Forsyth, D. SafetyNet: Detecting and Rejecting Adversarial Examples Robustly. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 446–454. [Google Scholar] [CrossRef]
  170. Zhang, L.; Peng, Z.; Li, Q.; Zhou, B. Cat: Closed-loop adversarial training for safe end-to-end driving. In Proceedings of the Conference on Robot Learning, Atlanta, GA, USA, 6–9 November 2023; pp. 2357–2372. [Google Scholar]
  171. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar] [CrossRef]
  172. Villarini, B.; Radoglou-Grammatikis, P.; Lagkas, T.; Sarigiannidis, P.; Argyriou, V. Detection of Physical Adversarial Attacks on Traffic Signs for Autonomous Vehicles. In Proceedings of the 2023 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), Bali, Indonesia, 13–15 July 2023; pp. 31–37. [Google Scholar] [CrossRef]
  173. Lu, Y.; Huang, X.; Dai, Y.; Maharjan, S.; Zhang, Y. Federated learning for data privacy preservation in vehicular cyber-physical systems. IEEE Netw. 2020, 34, 50–56. [Google Scholar] [CrossRef]
  174. Uprety, A.; Rawat, D.B.; Li, J. Privacy preserving misbehavior detection in IoV using federated machine learning. In Proceedings of the 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 9–12 January 2021; pp. 1–6. [Google Scholar]
  175. Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: A benchmark. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 304–311. [Google Scholar] [CrossRef]
  176. Zhang, S.; Benenson, R.; Schiele, B. CityPersons: A Diverse Dataset for Pedestrian Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4457–4465. [Google Scholar] [CrossRef]
177. Huang, X.; Cheng, X.; Geng, Q.; Cao, B.; Zhou, D.; Wang, P.; Lin, Y.; Yang, R. The ApolloScape Dataset for Autonomous Driving. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1067–1076. [Google Scholar] [CrossRef]
  178. Yogamani, S.; Hughes, C.; Horgan, J.; Sistu, G.; Chennupati, S.; Uricar, M.; Milz, S.; Simon, M.; Amende, K.; Witt, C.; et al. WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9307–9317. [Google Scholar] [CrossRef]
  179. Ertler, C.; Mislej, J.; Ollmann, T.; Porzi, L.; Neuhold, G.; Kuang, Y. The mapillary traffic sign dataset for detection and classification on a global scale. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 68–84. [Google Scholar]
  180. Zhang, J.; Zou, X.; Kuang, L.D.; Wang, J.; Sherratt, R.S.; Yu, X. CCTSDB 2021: A more comprehensive traffic sign detection benchmark. Hum.-Centric Comput. Inf. Sci. 2022, 12, 1–18. [Google Scholar]
  181. Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Mraz, A.; Kashiyama, T.; Sekimoto, Y. Transfer learning-based road damage detection for multiple countries. arXiv 2020, arXiv:2008.13101. [Google Scholar]
  182. Donovan, B.; Work, D. New York City Taxi Data (2010–2013); University of Illinois Urbana-Champaign: Champaign, IL, USA, 2014. [Google Scholar] [CrossRef]
  183. Research Core Team. Transportation Networks for Research. 2023. Available online: https://github.com/bstabler/TransportationNetworks (accessed on 10 April 2025).
  184. Zhang, S.; Guo, B.; Dong, A.; He, J.; Xu, Z.; Chen, S.X. Cautionary tales on air-quality improvement in Beijing. Proc. R. Soc. A Math. Phys. Eng. Sci. 2017, 473, 20170457. [Google Scholar] [CrossRef]
  185. Madrid Air Quality. 2019. Available online: https://www.kaggle.com/datasets/decide-soluciones/air-quality-madrid (accessed on 10 April 2025).
  186. Van Der Heijden, R.W.; Lukaseder, T.; Kargl, F. Veremi: A dataset for comparable evaluation of misbehavior detection in vanets. In Proceedings of the Security and Privacy in Communication Networks: 14th International Conference, SecureComm 2018, Singapore, 8–10 August 2018; Proceedings, Part I. pp. 318–337. [Google Scholar]
187. Leal-Taixé, L.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S. Tracking the Trackers: An Analysis of the State of the Art in Multiple Object Tracking. arXiv 2017, arXiv:1704.02781. [Google Scholar]
  188. Neuhold, G.; Ollmann, T.; Rota Bulo, S.; Kontschieder, P. The mapillary vistas dataset for semantic understanding of street scenes. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4990–4999. [Google Scholar]
  189. Bi, Q.; You, S.; Gevers, T. Interactive Learning of Intrinsic and Extrinsic Properties for All-Day Semantic Segmentation. IEEE Trans. Image Process. 2023, 32, 3821–3835. [Google Scholar] [CrossRef]
  190. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11621–11631. [Google Scholar]
  191. Chang, M.F.; Lambert, J.; Sangkloy, P.; Singh, J.; Bak, S.; Hartnett, A.; Wang, D.; Carr, P.; Lucey, S.; Ramanan, D.; et al. Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8748–8757. [Google Scholar]
  192. Wilson, B.; Qi, W.; Agarwal, T.; Lambert, J.; Singh, J.; Khandelwal, S.; Pan, B.; Kumar, R.; Hartnett, A.; Pontes, J.K.; et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv 2023, arXiv:2301.00493. [Google Scholar]
  193. Wang, H.; Li, T.; Li, Y.; Chen, L.; Sima, C.; Liu, Z.; Wang, B.; Jia, P.; Wang, Y.; Jiang, S.; et al. Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping. Adv. Neural Inf. Process. Syst. 2023, 36, 18873–18884. [Google Scholar]
  194. Arya, D.; Maeda, H.; Ghosh, S.; Toshniwal, D.; Omata, H. Crowdsensing-based road damage detection challenge (crddc-2022). In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022. [Google Scholar]
  195. Haklay, M.; Weber, P. OpenStreetMap: User-Generated Street Maps. IEEE Pervasive Comput. 2008, 7, 12–18. [Google Scholar] [CrossRef]
  196. De Souza, A.M.; Yokoyama, R.S.; Maia, G.; Loureiro, A.; Villas, L. Real-time path planning to prevent traffic jam through an intelligent transportation system. In Proceedings of the 2016 IEEE Symposium on Computers and Communication (ISCC), Messina, Italy, 27–30 June 2016; pp. 726–731. [Google Scholar]
  197. Upadhyay, A.; Ayodele, J.O.; Kumar, A.; Garza-Reyes, J.A. A review of challenges and opportunities of blockchain adoption for operational excellence in the UK automotive industry. J. Glob. Oper. Strateg. Sourc. 2021, 14, 7–60. [Google Scholar] [CrossRef]
198. Jocher, G.; Qiu, J. Ultralytics YOLO11. AGPL-3.0. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 22 April 2025).
  199. Huang, L.; Duan, Q.; Liu, Y.; Wu, Y.; Li, Z.; Guo, Z.; Liu, M.; Lu, X.; Wang, P.; Liu, F.; et al. Artificial intelligence: A key fulcrum for addressing complex environmental health issues. Environ. Int. 2025, 198, 109389. [Google Scholar] [CrossRef]
  200. Nawaz, M.; Tang, J.K.T.; Bibi, K.; Xiao, S.; Ho, H.P.; Yuan, W. Robust cognitive capability in autonomous driving using sensor fusion techniques: A survey. IEEE Trans. Intell. Transp. Syst. 2023, 25, 3228–3243. [Google Scholar] [CrossRef]
  201. Ye, H.; Li, G.Y.; Juang, B.H.F. Deep Reinforcement Learning Based Resource Allocation for V2V Communications. IEEE Trans. Veh. Technol. 2019, 68, 3163–3173. [Google Scholar] [CrossRef]
  202. Ibrahum, A.D.M.; Hussain, M.; Hong, J.E. Deep learning adversarial attacks and defenses in autonomous vehicles: A systematic literature review from a safety perspective. Artif. Intell. Rev. 2025, 58, 1–53. [Google Scholar] [CrossRef]
  203. Singh, A.; Murzello, Y.; Pokhrel, S.; Samuel, S. An investigation of supervised machine learning models for predicting drivers’ ethical decisions in autonomous vehicles. Decis. Anal. J. 2025, 14, 100548. [Google Scholar] [CrossRef]
  204. Liu, D.; Zhu, Y.; Liu, Z.; Liu, Y.; Han, C.; Tian, J.; Li, R.; Yi, W. A survey of model compression techniques: Past, present, and future. Front. Robot. AI 2025, 12, 1518965. [Google Scholar] [CrossRef] [PubMed]
  205. Chen, X.; Wang, F.; Li, Y. A lightweight network for traffic sign detection via multiple scale context awareness and semantic information guidance. Eng. Appl. Artif. Intell. 2024, 128, 107532. [Google Scholar] [CrossRef]
  206. Wang, Q.; Liu, Y.; Zhou, B. NTS-YOLO: A Nocturnal Traffic Sign Detection Method Based on Improved YOLOv5. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 11245–11251. [Google Scholar] [CrossRef]
  207. Rahmani, S.; Baghbani, A.; Bouguila, N.; Patterson, Z. Graph neural networks for intelligent transportation systems: A survey. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8846–8885. [Google Scholar] [CrossRef]
  208. Nag, D.; Brandel-Tanis, F.; Pramestri, Z.A.; Pitera, K.; Frøyen, Y.K. Exploring digital twins for transport planning: A review. Eur. Transp. Res. Rev. 2025, 17, 15. [Google Scholar] [CrossRef]
  209. Alfaro-Viquez, D.; Zamora-Hernandez, M.; Fernandez-Vega, M.; Garcia-Rodriguez, J.; Azorin-Lopez, J. A Comprehensive Review of AI-Based Digital Twin Applications in Manufacturing: Integration Across Operator, Product, and Process Dimensions. Electronics 2025, 14, 646. [Google Scholar] [CrossRef]
  210. Zhang, W.; Chen, L.; Martinez, F.J. Inter-Urban Analysis of Pedestrian and Drivers through a Vehicular Network Based on Hybrid Communications Embedded in a Portable Car System and Advanced Image Processing Technologies. IEEE Trans. Intell. Transp. Syst. 2023, 24, 7892–7905. [Google Scholar]
  211. Kumar, S.; Patel, V.; Tanaka, H. Intelligent Traffic Monitoring through Heterogeneous and Autonomous Networks Dedicated to Traffic Automation. In Proceedings of the 2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring), Florence, Italy, 20–23 June 2023; pp. 1–6. [Google Scholar]
  212. Feng, Y.; Carballo, A.; Fujii, K.; Karlsson, R.; Ding, M.; Takeda, K. MulCPred: Learning Multi-Modal Concepts for Explainable Pedestrian Action Prediction. Sensors 2024, 24, 6742. [Google Scholar] [CrossRef]
  213. Njoku, J.N.; Nwakanma, C.I.; Lee, J.M.; Kim, D.S. Enhancing Security and Accountability in Autonomous Vehicles through Robust Speaker Identification and Blockchain-Based Event Recording. Electronics 2023, 12, 4998. [Google Scholar] [CrossRef]
Figure 1. Smart mobility represented as the contribution of three main perspectives: the vehicle, the environment, and security.
Figure 2. A representation of the main hardware pillars in smart vehicles: environmental and in-vehicle sensors (orange boxes), like LiDAR, camera, radar, tire pressure, temperature, and speed sensors, retrieve data that are processed by ECUs (red box). The network (green box and dotted line) guarantees the correct in-vehicle and out-of-vehicle exchange of information.
Figure 3. A compact representation of the computational tasks involved in detection and tracking in intelligent vehicles.
Figure 4. An intuitive representation of the main challenges in pedestrian detection. From left to right: large-scale variation, occlusion, and low-light conditions.
Table 2. A summary of some relevant works in the literature. From the leftmost column, the table presents the corresponding reference number and year of publication (in brackets), the target to be investigated (TSD: traffic sign detection; LD: lane detection; TLD: traffic light detection; UO: unknown object), and the key aspect that made the work noteworthy.
Work (Y) | Target | Key Aspect
[89] (2021) | LD | Recurrent feature shift
[94] (2023) | TLD | DINO
[101] (2023) | UO | Mask classification
[75] (2024) | TSD | YOLO
[83] (2024) | LD | Transformer
[91] (2024) | LD | Three-dimensional lane detection
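To make the detection entries above concrete, the following minimal sketch shows how a YOLO-family detector could be fine-tuned and run for traffic sign detection using the Ultralytics tooling cited in [198]. It is an illustrative sketch, not the pipeline of any surveyed work: the dataset configuration file signs.yaml and the image path are hypothetical placeholders, and in practice one would point them at a benchmark such as the Mapillary Traffic Sign Dataset [179] or CCTSDB 2021 [180].

```python
# Illustrative sketch only: generic Ultralytics YOLO usage, not a cited method.
from ultralytics import YOLO

# Start from generic pretrained weights; these are not traffic-sign-specific.
model = YOLO("yolo11n.pt")

# Fine-tune on a traffic sign dataset described by a YAML config
# ("signs.yaml" is a hypothetical placeholder path).
model.train(data="signs.yaml", epochs=50, imgsz=640)

# Run inference on a road-scene image and print the detected boxes.
results = model.predict("road_scene.jpg", conf=0.25)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, corner coordinates
```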
Table 4. A summary of the key literature on air pollution estimation. APP: air pollution prediction; AQ: air quality; AQP: air quality prediction; AQI: air quality index.
Work (Y) | Goal/Target | Method | Metrics | Data
[115] (2021) | Review of techniques for APP | LR | RMSE, MAE, MAPE, IFAW, IFCP, and R² | Miscellaneous
[116] (2022) | Exploration of AQP factors | LR | RMSE, MAE, and MAPE | Miscellaneous
[117] (2024) | Real-time APP | Mass-balance model combined with ML | FAC2, MB, MGE, RMSE, R, and IOA | On-road + air quality data
[118] (2022) | NO2 and SO2 prediction | LSTM + MVO | RMSE, MAE, and MAPE | AQ data
[119] (2021) | AQI prediction | SWM + LSTM, GLCM + MFOA | RMSE and R² | AQ and meteorological data
[120] (2023) | NO2 prediction | A3T-GCN | RMSE, MAE, and R | AQ, meteorological, and traffic data
[121] (2022) | PM2.5 prediction | Hybrid ML model | CC, PE, and NRMSE | UC Irvine ML Repository
[122] (2023) | PM2.5 prediction | BGGRU | RMSE, MSE, MAE, and R² | AQ data
[123] (2024) | AQI prediction | BO-HyTS | MSE, RMSE, Med AE, Max Error, and MAE | IoT sensor data
[124] (2023) | AQI prediction | AirFormer | MAE and RMSE | AQ and meteorological data
[125] (2023) | PM2.5 prediction | STN | MAE, RMSE, and R² | Beijing and Taizhou PM2.5 data
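As an illustration of the recurrent forecasting models listed above (e.g., the LSTM-based approaches of [118,119]), the sketch below trains a minimal sequence-to-one predictor on synthetic data. The window length, feature count, hidden size, and data are illustrative assumptions, not details of any cited model.

```python
# Minimal PyTorch sketch of sequence-to-one air-quality forecasting.
import torch
import torch.nn as nn

class AQForecaster(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # predict the next-hour pollutant level

    def forward(self, x):                  # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # use the last time step's hidden state

# Synthetic stand-in for hourly pollutant/meteorological windows.
x = torch.randn(32, 24, 8)                 # 32 windows, 24 h each, 8 features
y = torch.randn(32, 1)                     # next-hour target (e.g., NO2)

model = AQForecaster(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.MSELoss()(model(x), y)           # errors like this underlie the RMSE/MAE above
loss.backward()
opt.step()
```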
Table 5. A summary of some relevant studies on road conditions. PCA: pavement condition assessment; P: prediction; D: detection; C: classification; S: segmentation; Q: quantification; MMTransformer: multi-modal transformer model; RSCD: road surface condition dataset; M: macro; MIoU: mean intersection over union; KolektorSDD: Kolektor Surface-Defect Dataset; AP: average precision; FN: false negatives; FP: false positives; Acc: accuracy; Rec: recall; Prc: precision; F1s: F1-score; MCC: Matthews correlation coefficient.
Work (Y) | Goal/Target | Method | Metrics | Data
[127] (2020) | PCA | YOLO + U-Net | F1s, Prc, and Rec | Google Street View images
[128] (2024) | Real-time pavement C | EdgeFusionViT | Acc, Prc, Rec, and F1s | RSCD
[129] (2024) | DC of winter road surface conditions | MMTransformer | Acc, M-Prc, M-Rec, and M-F1 | RGB images
[130] (2024) | Multi-type pavement distress SD | ISTD-DisNet | F1s and MIoU | ISTD-PDS7 dataset
[131] (2020) | Asphalt crack DQ | DeepLabv3+ CNN | MIoU | RGB images
[132] (2020) | Defect DS | CNN | AP, FN, and FP | KolektorSDD
[133] (2024) | Real-time crack D | Deep CNN | Acc, Prc, Rec, and F1s | RGB images
[134] (2019) | Pavement distress D | CNN | Acc, Prc, Rec, and MCC | Orthoframes from a mobile mapping system
[135] (2021) | PCA | PSD + LTI | MAE | Vehicle crowdsourced data
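The pixel-level distress pipelines in the table share a common pattern: a semantic segmentation backbone produces a per-pixel crack mask, from which indices such as MIoU are computed. The sketch below is a hedged, minimal example of that pattern using torchvision's DeepLabv3, loosely in the spirit of [131]; the two-class setup (background vs. crack) and the random tensors are illustrative assumptions, not the cited pipeline.

```python
# Hedged sketch of pixel-level pavement crack segmentation.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Two output classes: background (0) and crack (1). No pretrained weights,
# so the example runs offline.
model = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=2)

images = torch.randn(2, 3, 256, 256)           # stand-in for RGB pavement patches
masks = torch.randint(0, 2, (2, 256, 256))     # stand-in for pixel-level crack labels

logits = model(images)["out"]                  # (batch, 2, H, W) per-pixel scores
loss = torch.nn.functional.cross_entropy(logits, masks)
loss.backward()                                # one illustrative training step

# Per-image crack mask; MIoU-style metrics are computed from masks like this.
pred = logits.argmax(dim=1)
```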
Table 6. A summary of some relevant works in the literature. From the leftmost column, the table presents the corresponding reference number and year of publication (in brackets), the target to be investigated (AD: attack detection; ID: intrusion detection; PP: privacy protection; AN: adversarial network attack), the application goal, and the key aspect that made the work noteworthy.
Work (Y) | Type | Goal | Key Aspect
[142] (2019) | AD | Platoon attack | CNN and FCNN
[143] (2019) | AD | DDoS | Q-learning
[144] (2015) | AD | Black hole | ANN
[145] (2019) | AD | Sybil attack | LSTM
[146] (2020) | AD | Spoofing | LSTM
[147] (2019) | AD | Jamming | Deep Q-network
[148] (2020) | ID | New attacks | Transfer learning on LSTM
[149] (2021) | ID | New attacks | Transfer learning
[150] (2025) | AD | DDoS | LSTM and GRU
[151] (2024) | ID | Anomaly detection | Explainable AI
[152] (2025) | PP | Vehicle location | Federated learning
[153] (2023) | AN | Traffic control system attack | Colluding vehicles sending falsified information
[154] (2022) | AN | Object detection | Objectness information
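Many of the detection entries above reduce to flagging telemetry that deviates from normal driving behavior. The following sketch shows that generic pattern with an Isolation Forest over synthetic speed/RPM/throttle readings, in the spirit of OBD-based anomaly detectors such as [158]; the feature set, thresholds, and data are illustrative assumptions, not any cited method.

```python
# Generic anomaly-detection sketch for in-vehicle telemetry (illustrative only).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic "normal" driving: speed (km/h), engine RPM, throttle (%).
normal = rng.normal(loc=[60.0, 2000.0, 20.0], scale=[10.0, 300.0, 5.0], size=(500, 3))
# Synthetic injected messages with implausible RPM, standing in for an attack.
spoofed = rng.normal(loc=[60.0, 6500.0, 20.0], scale=[10.0, 300.0, 5.0], size=(10, 3))

detector = IsolationForest(contamination=0.02, random_state=0).fit(normal)
flags = detector.predict(np.vstack([normal[:5], spoofed]))  # -1 marks anomalies
print(flags)
```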
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
