1. Introduction
Smart cities are rapidly reshaping urban life, driven by growing real-world deployments and expanding research interest in intelligent urban development [1]. At their core, smart cities integrate social capital with both traditional and modern Information and Communication Technology (ICT) infrastructure to foster sustainable economic growth and improve quality of life [2]. This transformation has been accelerated by the proliferation of emerging technologies such as Artificial Intelligence (AI), Big Data, and the Internet of Things (IoT), which have collectively redefined how cities manage governance, healthcare, energy, transportation, safety, infrastructure, and education [3].
Central to this transformation is smart mobility, a critical component of a smart city. Smart mobility leverages IoT technologies, wireless networks, and real-time communication to optimize urban transportation, enabling improved route planning, reduced congestion and emissions, shorter travel times, and greater overall efficiency [4]. Its foundation includes Intelligent Transportation Systems (ITSs), which integrate communication networks, Big Data, and sensing devices into transportation infrastructure to enable data-driven decision-making. It also extends to Connected and Automated Vehicles (CAVs), smart road infrastructure, cloud-based data services, and demand-driven platforms such as Mobility-as-a-Service (MaaS) [5]. Together, these form an ecosystem characterized by heterogeneity, spatio-temporality, high interconnectivity, and real-time data dependency [6].
The need for smart mobility is further underscored by projections that 85% of the world’s population will reside in urban areas by 2050 [7], which will place unprecedented demand on city infrastructure and services. Smart mobility directly addresses this challenge by integrating private vehicles, ride-sharing, public transit, electric vehicles, and on-demand services into unified digital ecosystems that optimize the movement of people and goods. Beyond reducing congestion and improving traffic flow, smart mobility supports broader sustainability objectives, most notably the United Nations Sustainable Development Goals, through measurable reductions in carbon emissions and more efficient use of urban resources [8,9].
Despite this momentum, existing research has predominantly focused on behavioural aspects of sustainable transportation and on enhancing mobility solutions [2], while cybersecurity for smart mobility remains an evolving and insufficiently addressed concern. Karopoulos et al. [10] highlight that breaches in cyberspace resulting from cyber-physical attacks can lead to harmful consequences in the physical domain. Moreover, many in-vehicle networks (IVNs), particularly the Controller Area Network (CAN) bus, were originally designed with minimal security considerations because early IVNs operated in isolated environments with little external connectivity [11]. The rapid integration of these legacy systems into broader smart mobility ecosystems has therefore introduced new and largely unaddressed attack surfaces.
Existing studies on Intrusion Detection Systems (IDSs) for smart mobility have largely focused on isolated components rather than on the ecosystem as a whole. At the network level, studies have proposed intrusion detection for vehicular ad hoc networks [12], edge-based detection for transportation IoT [13], and frameworks specific to the Internet of Vehicles (IoV), including federated learning approaches such as CVAR-FL [14] and deep-learning-based models [15]. At the in-vehicle level, contributions include GAN-based intrusion detection for CAN-FD buses [16], voltage signal analysis for CAN bus attacks [17], threat detection for autonomous vehicles [18], supervised learning for CAV security [19], federated learning for ITS misbehavior detection [20], and cloud–vehicle collaborative detection for IoV [21].
While these contributions represent meaningful progress, they share a common limitation: models are developed and evaluated within a single component and are typically validated on only one or two datasets. As a result, they struggle to generalize across the extreme heterogeneity, diverse communication protocols, and spatio-temporal characteristics of real-world smart mobility networks. This growing complexity and expanding threat landscape call for an advanced, intelligent intrusion detection framework capable of operating across the heterogeneous and dynamically interconnected components of smart mobility networks [22].
To address this gap, we propose a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs), long short-term memory (LSTM) networks, and an attention mechanism for intrusion detection in heterogeneous smart mobility networks. The novelty of this work lies in the synergistic integration of these three components into a comprehensive and resilient detection framework tailored to heterogeneous smart mobility networks. Unlike prior component-specific models, the hybrid architecture employs CNNs to extract spatial features from high-dimensional data sources, such as Roadside Unit (RSU) sensor data and vehicle telemetry, and LSTM networks to model temporal dependencies in time-series data, such as traffic streams and network communications, thereby capturing intrusion patterns as they evolve over time. The attention mechanism then dynamically weights the most discriminative features across diverse data streams, suppressing irrelevant signals and enabling precise, context-aware intrusion detection.
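To make the attention step more concrete, the following minimal, dependency-free Python sketch applies softmax attention pooling to a sequence of hidden states, such as the per-timestep outputs of an LSTM. The query vector `w` is a hypothetical stand-in for learned parameters, and the dot-product scoring is an assumption chosen for illustration. The paper's exact attention formulation is defined in its methodology and may differ.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weight each timestep by softmax(h_t . w) and return (context, weights).

    hidden_states: T vectors of dimension D (e.g., LSTM outputs per timestep).
    w: hypothetical query vector of dimension D (learned in a real model).
    """
    # One relevance score per timestep: dot product with the query vector.
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, w)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [
        sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


# Three timesteps; the last one carries an unusually strong signal.
states = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.5]]
context, weights = attention_pool(states, w=[1.0, 1.0])
print([round(a, 3) for a in weights])  # → [0.004, 0.005, 0.99]
```

In this toy input, about 99% of the weight goes to the anomalous third timestep. The context vector passed downstream is therefore dominated by that timestep, which illustrates how attention suppresses less relevant signals.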
The proposed framework targets intrusion detection across three core layers of smart mobility infrastructure. The first is the vehicle layer, which encompasses in-vehicle networks such as CAN buses, Electronic Control Units (ECUs), On-Board Units (OBUs), and CAVs. The second is the infrastructure layer, which covers RSUs supporting vehicle-to-infrastructure and vehicle-to-network (V2I/V2N) communications, 5G/6G networks, and sensor-based smart parking systems. The third is the digital services layer, which comprises cloud platforms, IoT-based traffic and parking sensors, MaaS platforms, Vehicular Ad Hoc Networks (VANETs), and the broader Internet of Vehicles (IoV) ecosystem. By operating across all three layers, the framework is designed to overcome the limitations of component-specific IDSs, offering consistent cross-dataset performance, scalability, improved detection performance, and greater resilience against new and evolving threats.
This study makes the following contributions:
1. We propose a novel hybrid deep learning framework that integrates CNNs, LSTM networks, and an attention mechanism for spatial feature extraction, temporal dependency modeling, and adaptive feature weighting, respectively, to achieve higher intrusion detection accuracy across heterogeneous smart mobility data streams.
2. We account for the heterogeneous nature of smart mobility networks by considering threats that occur in diverse components, such as in-vehicle CAN buses, vehicular ad hoc networks, the Internet of Vehicles (IoV) and 5G-connected vehicles, IoT smart parking systems, and cloud-based mobility services.
3. We conduct a rigorous and comprehensive evaluation of the proposed model on multiple datasets spanning key smart mobility domains: VeReMi Extension (vehicular misbehavior), Car Hacking (in-vehicle network attacks), 5G-NIDD (5G network intrusions), Edge-IIoTset (IoT and edge threats), UNSW-NB15 (cloud-based attacks), and CICIoV2024 (Internet of Vehicles anomalies).
The remainder of this paper is structured as follows:
Section 2 reviews the literature and highlights its gaps.
Section 3 outlines the methodology adopted in this study.
Section 4 presents and discusses the findings.
Section 5 provides concluding remarks and recommendations for future studies.
2. Related Work
The integration of connected and automated vehicles (CAVs), intelligent transportation systems (ITSs), and digital platforms in smart mobility has raised cybersecurity concerns and has spurred growing research interest in intrusion detection systems (IDSs) for these components.
Several intrusion detection approaches have been proposed for vehicular components, particularly the Controller Area Network (CAN) bus. Wang et al. [16] developed a GAN-based IDS for CAN-FD bus nodes that achieved a 99.93% detection rate and a 0.15 ms response time, surpassing baseline methods by 1.2%. Khan et al. [23] proposed an LSTM-FCN model enhanced with squeeze-and-excite layers and attention mechanisms for automotive theft detection, reporting accuracies of 99.36% and 96.36% on the HCRL and test datasets, respectively. Wei et al. [24] introduced an attention-based autoencoder for binary CAN message processing that achieved AUC scores of 0.923 and 0.915 on benchmark datasets, while Kang et al. [25] combined time-interval likelihood and signal-based analysis in the CANival framework, reporting true positive rates of up to 0.960. At the physical layer, Levy et al. [17] employed voltage-based spoofing detection and achieved 100% accuracy in intrusion localization. Yin et al. [26] proposed an LSTM autoencoder for voltage-based attack filtering, reducing attack success rates to 0.18% with 99.4% ECU identification accuracy. Further contributions include a control-system-level analysis of J1939 buses via Simulink–CANoe simulations [27] and the IDS-DEC framework [28], which combines LSTM-CNN autoencoders with entropy-based clustering to achieve 99% accuracy on Car Hacking datasets.
Beyond in-vehicle security, the existing literature also highlights cybersecurity challenges across the broader smart mobility infrastructure, including electric vehicle charging stations (EVCS), roadside units (RSUs), and ITS components. For EVCS, Almadhor et al. [29] proposed a transfer-learning-based IDS combining deep neural networks and LSTM-RNN that achieved 93% accuracy on the CICEVSE2024 dataset for cyber-physical attack detection. Risk analyses by Hamdare et al. [30] and Skarga-Bandurova et al. [31] further highlight protocol-centric vulnerabilities inherent to charging infrastructure. Within ITSs, Usha et al. [32] employed adaptive neuro-fuzzy inference systems (ANFISs) for DDoS detection, achieving 94.3% accuracy across the UNSW-NB15 and CICDDoS2019 datasets and outperforming SVM and CNN baselines in dynamic vehicular settings. Weerasinghe et al. [33] explored threshold cryptography for securing V2X communications in 5G-enabled ITSs, and Chowdhury et al. [34] applied Kalman filters and Dempster–Shafer theory for anomaly detection in urban traffic simulations. At the infrastructure level, Li et al. [35] proposed a hybrid RSU deployment strategy that uses parked vehicles as temporary units to enhance network coverage, although their focus remains on physical deployment rather than intrusion detection. Channamallu et al. [36] further identified persistent cybersecurity gaps in IoT-based smart parking systems, particularly within wireless sensor networks and VANETs.
Cybersecurity and privacy challenges also extend to data and digital platforms within smart mobility ecosystems, including cloud-based systems and Mobility-as-a-Service (MaaS) platforms. These platforms function as systems of systems, integrating backend infrastructures, third-party providers, endpoints, and data producers to deliver comprehensive mobility solutions [37]. This makes them particularly attractive and consequential targets for cyberattacks. Several studies have reviewed MaaS-specific cybersecurity risks, including insider threats and AI-driven attacks, and have proposed countermeasures such as overlay networking and blockchain-based ticketing [38,39,40,41]. For cloud-based intrusion detection, Attou et al. [42] proposed a random forest model that achieved 98.3–99.9% accuracy on the Bot-IoT and NSL-KDD datasets. Hybrid approaches, including a Bi-SC-CBALSTM model [43] and a variational autoencoder Wasserstein GAN [44], have further advanced detection performance in digital platform environments.
Recent studies have begun exploring large-scale AI models for intrusion detection across IoT, vehicular, and cloud environments. For example, the transformer-based SecurityBERT proposed by Ferrag et al. [45] achieved 98.2% accuracy across fourteen attack types, outperforming earlier hybrid models such as GAN–Transformer and CNN–LSTM architectures. Building on this trend, generative AI frameworks that leverage large language models (LLMs) have been applied to detect zero-day attacks in electric vehicle ecosystems, achieving detection accuracies of around 98% with lower false positive rates than traditional IDS methods [46]. Similarly, LLM-based neuro-symbolic agents have shown promising results for cloud anomaly detection, with F1-scores above 92% on benchmark datasets [47]. Despite these advances, challenges remain regarding computational cost, scalability, and explainability, particularly when fine-tuning LLMs for specific network environments [48]. Beyond LLM-based approaches, Isgandarov et al. [49] contributed an Isolation-Forest-based approach for interpretable anomaly detection in shared mobility systems, and Zhang et al. [50] proposed a federated learning framework that achieved a 92.10% F1-score while preserving data privacy.
Security in the Internet of Vehicles (IoV) and vehicular communication networks has been widely studied, particularly for VANETs and V2X communications, including V2V and V2I [51]. At the network level, Kong et al. [52] proposed a reinforcement quantile spatial CNN (RQSCNN) for V2V cybersecurity in 6G networks that achieved high throughput and low latency. Karim et al. [53] introduced a blockchain-based framework using elliptic curve cryptography for secure 5G IoV data exchange and validated it with the Scyther tool. For V2X threat mitigation, Sedar et al. [54] explored attack vectors and proposed AI-driven countermeasures against adversarial threats. Lightweight models, including UltraADV and a VAE-based approach [55,56], achieved F1-scores of up to 99% on the VeReMi Extension dataset. Several ML-based IDS studies have similarly targeted replay and DDoS attacks in VANETs [57,58,59,60,61] and consistently achieved over 99% accuracy on VeReMi datasets. More recently, Fu et al. [62] introduced IoV-BERT-IDS, a hybrid LLM-based model capable of detecting both in-vehicle and extra-vehicular threats, and demonstrated generalization across the CICIDS and Car Hacking datasets.
To contextualize the contributions of the proposed attention-enhanced CNN–LSTM framework, Table 1 provides a chronological comparison of cutting-edge studies on securing smart mobility components from 2023 to 2025. The table outlines their primary areas of focus and the specific limitations that our framework seeks to overcome.
As summarized in Table 1, a critical gap persists across the existing literature: proposed IDS solutions remain narrowly scoped to isolated components (vehicular networks, infrastructure, digital platforms, or IoV) and do not account for the interdependent nature of smart mobility ecosystems. Specifically, they fail to capture the interdependencies among physical infrastructure (RSUs, smart roads, EV charging stations, and 5G backbones), IoT components (sensors, telematics, and drones), data platforms (MaaS and cloud/edge computing), and vehicular communication networks (V2X and C-V2X). This fundamentally limits their ability to detect and respond to cross-domain threats.
To address this gap, we propose a novel hybrid framework that integrates CNNs, LSTM networks, and an attention mechanism to enable real-time, scalable intrusion detection across the smart mobility stack, encompassing in-vehicle networks, mobility infrastructure, digital platforms, and the IoV, thereby enhancing security in smart city ecosystems.
4. Experimental Results and Analysis
This section presents the empirical evaluation of the proposed CNN–LSTM–Attention framework on six benchmark datasets (5G-NIDD, UNSW-NB15, Edge-IIoTset, Car Hacking, CICIoV2024, and VeReMi Extension). These datasets span diverse attack scenarios and network environments representative of cloud, vehicular, IoT, and other mobility platforms.
4.1. Experimental Setup
All experiments were conducted on an NVIDIA Tesla T4 GPU with 32 GB of RAM. The CNN–LSTM–Attention model was implemented using Scikit-learn (v1.6.1), Keras (v3.10.0), Pandas (v2.3.3), and TensorFlow (v2.19). The data were preprocessed as described in Section 3.2 and then split into training, validation, and test sets in a 70:15:15 ratio. To ensure reproducibility, three independent runs were performed with random seeds 0, 1, and 42, which were set in the NumPy (v2.0.2), TensorFlow, and Python (v3.12.12) environments.
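The 70:15:15 split with a fixed seed can be sketched in plain Python as follows. This is an illustrative, unstratified index split. The text does not state whether the authors stratified by class or which library routine they used (scikit-learn's `train_test_split` is a common choice), so the function below is hypothetical.

```python
import random
from typing import List, Tuple


def train_val_test_split(
    n: int, seed: int, ratios: Tuple[float, float, float] = (0.70, 0.15, 0.15)
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 with a seeded RNG and cut them 70/15/15."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG: global random state untouched
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder absorbs any rounding
    return train, val, test


# One split per seed used in the experiments (0, 1, 42).
splits = {seed: train_val_test_split(1000, seed) for seed in (0, 1, 42)}
for seed, (tr, va, te) in splits.items():
    print(seed, len(tr), len(va), len(te))
```

Using a local `random.Random(seed)` keeps each split reproducible without depending on the order of other random calls. The same idea applies when seeding NumPy and TensorFlow.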
To improve model efficiency while limiting overfitting and wasted computation, hyperparameters were tuned with Keras Tuner using Bayesian optimization. This method builds a surrogate model, specifically a Gaussian process, to model and explore the hyperparameter space. Unlike grid or random search, Bayesian optimization concentrates evaluations on the most promising hyperparameter configurations. This makes it well suited to deep learning architectures that are expensive to train. The tuning objective was to maximize validation accuracy. Thirty trials were run, each with a maximum of 10 epochs and a batch size of 4096. To control overfitting, early stopping was applied with a patience of five epochs. To speed up convergence, the learning rate was reduced on plateau by a factor of 0.5 with a patience of three epochs.
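To illustrate how the two callbacks interact within a 10-epoch trial, the following plain-Python sketch replays their simplified logic on a hypothetical validation-loss trace. It assumes that both callbacks monitor validation loss (the Keras default for both; the monitored quantity is not stated above) and uses an illustrative initial learning rate of 1e-3. The function `simulate_callbacks` is hypothetical and is not part of Keras.

```python
from typing import List, Sequence, Tuple


def simulate_callbacks(
    val_losses: Sequence[float],
    initial_lr: float,
    es_patience: int = 5,
    lr_patience: int = 3,
    lr_factor: float = 0.5,
    max_epochs: int = 10,
) -> Tuple[int, List[float]]:
    """Simplified EarlyStopping + ReduceLROnPlateau logic on a val-loss trace.

    Returns (epochs_run, learning rate in effect for each epoch). Ignores
    min_delta, cooldown and min_lr, which the real Keras callbacks also support.
    """
    best = float("inf")
    es_wait = lr_wait = 0
    lr = initial_lr
    lrs_used: List[float] = []
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        lrs_used.append(lr)  # LR in effect during this epoch
        if loss < best:  # improvement resets both counters
            best = loss
            es_wait = lr_wait = 0
            continue
        es_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:  # plateau: scale the LR by lr_factor
            lr *= lr_factor
            lr_wait = 0
        if es_wait >= es_patience:  # no improvement for es_patience epochs
            return epoch + 1, lrs_used
    return len(lrs_used), lrs_used


trace = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.60, 0.55]
epochs_run, lrs = simulate_callbacks(trace, initial_lr=1e-3)
print(epochs_run, lrs)
```

With this trace, the learning rate is halved after three stagnant epochs and training stops after five, so the later improvements in epochs 9 and 10 are never reached.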
The search space covered a wide range of hyperparameters to balance model capacity, regularization, and computational cost:
CNN configuration: Filters in Layer 1 (32–128, step = 32), Filters in Layer 2 (16–64, step = 16); Kernel size .
LSTM units: Layer 1 (64–256, step = 64), Layer 2 (32–128, step = 32), Layer 3 (16–64, step = 16).
Dropout rates: Standard dropout (0.2–0.5, step = 0.1); Recurrent dropout (0.2–0.5, step = 0.1).
Dense layers: Units in Layer 1 (32–128, step = 32), Layer 2 (16–64, step = 16).
Regularisation and optimisation: L2 penalty ; Learning rate .
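A quick count shows why exhaustive grid search would be impractical here. The sketch below enumerates only the integer and dropout ranges listed above. The kernel-size, L2, and learning-rate ranges cannot be recovered from this text, so they are left out, which makes the count a lower bound. The dictionary keys are hypothetical labels, not the authors' Keras Tuner parameter names.

```python
import math

# Discrete choices implied by the ranges listed above (inclusive bounds).
# Kernel size, L2 penalty and learning rate are omitted because their
# ranges are not given in this section.
search_space = {
    "conv1_filters": list(range(32, 128 + 1, 32)),
    "conv2_filters": list(range(16, 64 + 1, 16)),
    "lstm1_units": list(range(64, 256 + 1, 64)),
    "lstm2_units": list(range(32, 128 + 1, 32)),
    "lstm3_units": list(range(16, 64 + 1, 16)),
    "dropout": [x / 10 for x in range(2, 6)],            # 0.2 ... 0.5
    "recurrent_dropout": [x / 10 for x in range(2, 6)],  # 0.2 ... 0.5
    "dense1_units": list(range(32, 128 + 1, 32)),
    "dense2_units": list(range(16, 64 + 1, 16)),
}

n_configs = math.prod(len(v) for v in search_space.values())
n_trials = 30
print(n_configs, f"{100 * n_trials / n_configs:.4f}% explored")
```

Even this partial space contains 4^9 = 262,144 configurations, so 30 trials sample roughly 0.01% of it. This is why a surrogate model that steers each new trial toward promising regions matters.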
The optimal hyperparameter configuration obtained through Bayesian optimization is summarized in Table 3. To ensure reproducibility, Table 3 also details additional training configurations, including batch size, number of training epochs, and regularization strategies (EarlyStopping and ReduceLROnPlateau).
4.2. Metrics for Model Evaluation
Evaluating model performance is central to this research. A confusion matrix was used to assess how accurately the model distinguishes the positive (attack) class from the negative (benign) class and to quantify classification errors, namely false positives and false negatives. Six classification metrics were employed: four threshold-based metrics (accuracy, precision, recall, and F1-score) and two threshold-independent metrics (ROC-AUC and PR-AUC). Additionally, the false negative rate (FNR) and the false positive rate (FPR) were used to quantify errors relative to the actual classes, which provides insight into the model's reliability for each true outcome. The metrics are defined as follows.
Threshold-based metrics:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{FPR} = \frac{FP}{FP + TN}, \qquad \text{FNR} = \frac{FN}{FN + TP}$$
where TP represents the number of correctly identified attacks, TN represents the number of correctly classified normal scenarios, FP represents misclassified normal traffic, and FN is the number of misclassified attacks.
Threshold-free metrics:
The ROC-AUC metric evaluates a model's performance across all classification thresholds. It is the area under the curve of the true positive rate (TPR, equal to recall) plotted against the FPR:
$$\text{ROC-AUC} = \int_{0}^{1} \text{TPR} \; d(\text{FPR})$$
The PR-AUC metric (see Equation (17)) evaluates the area under the precision–recall curve:
$$\text{PR-AUC} = \int_{0}^{1} \text{Precision} \; d(\text{Recall}) \tag{17}$$
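The metric definitions translate directly into code. The following dependency-free sketch computes the threshold-based metrics and error rates from confusion-matrix counts. It computes ROC-AUC through its rank interpretation: the probability that a randomly chosen attack receives a higher score than a randomly chosen benign sample, with ties counted as one half, which equals the area under the ROC curve. The function names are hypothetical. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` are the usual library routes; the latter is a step-wise estimate of PR-AUC, not a trapezoidal one. The example counts are taken from the Edge-IIoTset confusion matrix discussed in Section 4.3. They assume that the 60,900 samples are attacks and the 119,100 samples are benign.

```python
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def threshold_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Threshold-based metrics and error rates from confusion-matrix counts."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # also the true positive rate (TPR)
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "fpr": _safe_div(fp, fp + tn),
        "fnr": _safe_div(fn, fn + tp),
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random attack > score of a random benign sample),
    with ties counted as 1/2. O(P*N) pairs, so only for small illustrations."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Edge-IIoTset counts (assumed reading): 1 of 60,900 attacks missed,
# 47 of 119,100 benign samples flagged as attacks.
edge = threshold_metrics(tp=60_899, tn=119_053, fp=47, fn=1)
print({k: round(v, 5) for k, v in edge.items()})
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

These counts give an FPR of 47/119,100 (about 0.04%) and an FNR of 1/60,900 (about 0.002%). Both are consistent with the near-zero error rates reported for this dataset.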
4.3. Overall Performance
Table 4 and Table 5 present the metrics averaged across three random seeds, along with their standard deviations. The proposed model achieved strong classification results, with accuracy above 98% on all datasets.
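The seed-level aggregation behind these tables can be sketched as follows. The sketch assumes a sample standard deviation (n − 1 denominator), since the text does not state which convention was used. The per-seed scores are made-up placeholders, not results from the paper.

```python
import statistics
from typing import Dict, List, Tuple


def aggregate_runs(runs: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Per-metric mean and sample standard deviation (n - 1 denominator)."""
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Made-up per-seed scores (one dict per seed), for illustration only.
runs = [
    {"accuracy": 0.90, "f1": 0.88},
    {"accuracy": 0.92, "f1": 0.90},
    {"accuracy": 0.94, "f1": 0.92},
]
for metric, (mean, std) in aggregate_runs(runs).items():
    print(f"{metric}: {100 * mean:.2f}% ± {100 * std:.2f}%")
```

With only three seeds, the sample and population standard deviations differ by a factor of √(3/2) ≈ 1.22, so stating the convention matters when comparing reported spreads.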
On the 5G-NIDD dataset, which simulates lightweight network intrusion patterns in 5G-enabled vehicular communication environments, the proposed model attains a mean accuracy of (), a precision of (), a recall of (), and an F1-score of (). Similarly, AUC-ROC and PR-AUC scores reach near-perfect values of () for both, indicating outstanding discriminative power.
The model also performs strongly on UNSW-NB15, a benchmark dataset for comprehensive network-based intrusions, achieving a mean accuracy of , a precision of , a recall of , and an F1-score of . The AUC-ROC and PR-AUC scores of and , respectively, demonstrate the model's capacity to balance false positives and false negatives in attack detection.
Edge-IIoTset, which is tailored to detecting anomalies in IIoT and IoT network traffic in smart vehicle technology, yields a mean accuracy of , precision, recall, and F1-score, and a perfect AUC-ROC and PR-AUC of . On the Car Hacking dataset, which focuses on CAN bus exploits, the model records a perfect score across the board, with for accuracy, precision, recall, F1-score, AUC-ROC, and PR-AUC. This means that no attack evaded detection in these automotive hacking scenarios.
On CICIoV2024, which provides comprehensive coverage of multiple attacks targeting the Internet of Vehicles (IoV) in mobility contexts, the model achieves a mean accuracy of and all other metrics at , confirming its ability to cope with a dynamic stream of IoV threats. Lastly, on the VeReMi Extension dataset, which captures misbehavior in cooperative Intelligent Transportation Systems (C-ITSs), the model attained mean accuracy, precision, recall, F1-score, AUC-ROC, and PR-AUC.
Figure 2 presents the confusion matrices for the hybrid model. They show the numbers of true positives (attacks correctly detected), true negatives (normal traffic correctly classified), false positives (normal traffic classified as attacks), and false negatives (missed attacks). The proposed model classifies CICIoV2024 and Car Hacking perfectly, with no misclassifications. On Edge-IIoTset, the model misclassifies only one of 60,900 samples, while benign misclassifications total 47 of 119,100. This indicates a mild bias toward flagging traffic as malicious (false alarms outnumber missed attacks), which is consistent with high sensitivity to attack traffic.
On VeReMi Extension, 5G-NIDD, and UNSW-NB15, the model produces more errors but still keeps false positives and false negatives low. It misclassifies 38 benign and 651 attack instances on VeReMi Extension, 101 benign and 11 attack instances on 5G-NIDD, and 2349 benign and 1453 attack instances on UNSW-NB15.
Across all datasets, the proposed CNN–LSTM–Attention framework achieves a mean F1-score of 98.94% with a standard deviation of 0.08% (Figure 3), which indicates a well-balanced trade-off between precision and recall. These results suggest that combining CNNs for spatial feature extraction, LSTMs for temporal modeling, and attention for focusing on threat-relevant features is effective. The framework maintains its performance and correctly classifies a greater proportion of both benign and attack traffic than the compared frameworks. This is important for real-world smart mobility deployments, where false negatives (missed attacks) can cause operational and safety issues. Additionally, the standard deviations of accuracy, F1-score, ROC-AUC, and PR-AUC all remain below 0.5%. This shows that the model is stable and that random initialization and data shuffling have minimal impact on the CNN–LSTM–Attention hybrid model.
4.3.1. Analysis of Perfect-Score Results
To further investigate the perfect classification performance observed on some datasets, and to address concerns about potential overfitting or optimistic evaluation, we conducted an additional validation using stratified k-fold cross-validation (k = 5) on all datasets. This evaluation complements the previously reported train/validation/test split and provides a more statistically robust assessment of the proposed model's generalization capability. The mean and standard deviation of accuracy, precision, recall, F1-score, ROC-AUC, and PR-AUC across the five folds are reported in Table 6.
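Stratification keeps the benign/attack ratio of every test fold close to that of the full dataset. This matters for the heavily imbalanced datasets used here. The dependency-free sketch below shows the idea by dealing each class's shuffled indices round-robin into k folds. Library implementations such as scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` provide the same behavior. The helper name is hypothetical, and the text does not specify which implementation the authors used.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal each class's samples round-robin so every fold gets its share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


# 80 benign (0) and 20 attack (1) samples: each test fold should hold 16 + 4.
labels = [0] * 80 + [1] * 20
for fold, (train, test) in enumerate(stratified_kfold(labels, k=5, seed=0)):
    n_attack = sum(labels[i] for i in test)
    print(fold, len(train), len(test), n_attack)
```

Each of the five folds serves once as the held-out test set. The per-fold metrics are then averaged, as in Table 6.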
The cross-validation results confirm that model performance is stable across different data partitions. As shown in Table 6, the model maintains near-perfect performance on the Car Hacking dataset across all folds. This dataset contains CAN bus traffic with structurally distinct attack patterns, including fuzzy attacks, DoS attacks, and spoofing via RPM and gear manipulation. These attacks introduce message patterns that differ significantly from benign CAN traffic, resulting in highly separable feature distributions. Previous studies such as IDS-DEC [28] and Wang et al. [16] have similarly reported near-perfect detection results on this dataset. This suggests that the classification task is inherently separable owing to the clear distinction between normal and malicious CAN messages.
A similar trend is observed for the CICIoV2024 dataset [76], which was collected from real electronic control units (ECUs) within a controlled Internet of Vehicles (IoV) testbed. The dataset contains clearly defined CAN protocol attack scenarios with well-separated traffic patterns, which also contributes to the high separability between benign and attack classes. The consistent results across folds further confirm that the observed performance does not stem from a favorable random data split but reflects the intrinsic structure of the dataset.
Furthermore, Figure 4 visually compares the mean F1-scores across all evaluated datasets, reinforcing the statistical consistency of the results reported in Table 6. Specifically, the model achieves a mean F1-score of 100% on the Car Hacking dataset, 99.99% on CICIoV2024, 99.95% on Edge-IIoTset, 99.78% on 5G-NIDD, 97.31% on VeReMi Extension, and 95.74% on UNSW-NB15.
4.3.2. Efficiency and Practical Considerations
In addition to accuracy and F1-score, Table 5 reports the false positive rate (FPR), false negative rate (FNR), inference latency, number of parameters, and training time. As depicted in Figure 5, the model's highest error rates occur on two datasets. On UNSW-NB15, the model has an FNR of roughly 4.4% and an FPR of approximately 0.5%. On VeReMi Extension, the FNR is also relatively high at around 4.0%, while the FPR is negligible. The model achieves near-perfect error-rate performance on Edge-IIoTset, Car Hacking, and CICIoV2024, where FPR and FNR are near 0% (or exactly 0% in some cases). Performance on 5G-NIDD is also excellent, with both error rates below 0.2%.
Because low operational overhead is imperative for resource-constrained devices in mobility applications, we also measured inference latency, which averages around (
per sample), allowing real-time detection at over 200 inferences per second on conventional hardware. As illustrated in Figure 6a, inference latency is low for most datasets, although Edge-IIoTset represents a notable inference-time bottleneck. In terms of training time, the VeReMi Extension and Car Hacking datasets train very quickly. The other datasets require 50 s to 60 s per epoch on average because of the size of their training sets.
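A throughput of 200 inferences per second corresponds to a per-sample budget of 1/200 s = 5 ms. The sketch below shows one way to measure per-sample latency and throughput by timing single-sample calls after a short warm-up. The text does not describe the authors' timing procedure, so the harness, its warm-up step, and `dummy_predict` (a stand-in for a trained model) are assumptions. Batched GPU inference would give different figures.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def measure_latency(
    predict: Callable[[T], object], samples: Sequence[T], warmup: int = 10
) -> Tuple[float, float]:
    """Return (mean seconds per sample, samples per second) for one-at-a-time inference."""
    for s in samples[:warmup]:  # warm-up calls are excluded from timing
        predict(s)
    start = time.perf_counter()
    for s in samples:
        predict(s)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero reading
    per_sample = elapsed / len(samples)
    return per_sample, 1.0 / per_sample


def dummy_predict(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: scores a feature vector."""
    return sum(x) / len(x)


samples = [[float(i), float(i + 1), float(i + 2)] for i in range(1000)]
latency, throughput = measure_latency(dummy_predict, samples)
# 200 inferences/s corresponds to a per-sample budget of 1 / 200 s = 5 ms.
print(f"{latency * 1e3:.4f} ms/sample, {throughput:.0f} samples/s, "
      f"meets 200/s: {throughput >= 200}")
```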
The hybrid model also has a reasonable size of
parameters and
, as shown in Figure 6b, which supports its deployment on embedded systems such as in-vehicle ECUs and other resource-constrained platforms. The model also trains efficiently, with an average epoch duration of
. However, epoch duration varies by up to 23%, indicating that training efficiency and stability depend strongly on the dataset.
As Table 4 and Table 5 show, the CNN–LSTM–Attention framework maintains strong performance with low variance across all six smart mobility datasets investigated. This consistency and stability suggest that the model generalizes well and is an adaptable option for protecting smart mobility ecosystems against a wide variety of cyber threats.
4.4. Ablation Experiment
To assess the contribution of each component of the proposed CNN–LSTM–Attention model to intrusion detection, an ablation study was performed. Four variants of the model were studied: CNN-only, LSTM-only, Attention-only, and the complete model combining all three components.
Table 7 summarizes the results across all six smart mobility datasets. Among the single-component configurations, the CNN-only model performed strongly, achieving F1-scores above 97% on most datasets. We attribute this to its ability to model local spatial dependencies in traffic flow data. The LSTM-only model, although designed to capture temporal dependencies, achieved considerably lower F1-scores (e.g.,
on UNSW-NB15 and
on 5G-NIDD), indicating that temporal features alone are insufficient for high detection performance. The Attention-only model performed worst (e.g.,
F1-score on 5G-NIDD), suggesting that attention is effective only when it is applied to high-quality spatial and temporal features.
The proposed hybrid deep learning framework, which integrates CNNs, LSTM networks, and attention mechanisms, achieved the best overall performance, achieving an F1-score of and surpassing all other configurations. This improvement validates the synergy between CNN-based spatial feature extraction, LSTM-based temporal modeling, and attention-based feature refinement. In addition, the minimal standard deviations (<) across datasets demonstrate the consistency and reliability of its performance, underscoring the versatility of the framework.
However, the perfect scores reported in Table 7 should be interpreted with caution. The ablation study shows that simpler configurations, such as the CNN-only and LSTM-only models, also achieve near-perfect performance on the Car Hacking and CICIoV2024 datasets, which suggests that the classification task may be inherently separable for these datasets. This can be attributed to the structurally distinct attack patterns in the Car Hacking dataset and the controlled data collection environment of CICIoV2024.
4.5. Comparison with the State of the Art
Table 8 compares the mean detection accuracy and F1-score of the proposed framework on six distinct datasets with those reported in existing works. Most existing works are not aimed at smart mobility as a whole and are limited to specific facets of the smart mobility system. For example, some works address only vehicular CAN traffic using autoencoder- or GAN-based models [16,28], while others focus on detecting misbehaving nodes in VANETs using the VeReMi Extension dataset [61]. The cloud- and MaaS-oriented architecture of Zhang et al. [50] addresses other verticals of mobility security but remains siloed, without integration with the IoV and other parts of the ecosystem. Similarly, Usha et al. [32] focused on DDoS detection in smart transportation systems through an ANFIS-based IDS, which again targets a narrow area of smart mobility. Overall, existing works tend to be dataset- and domain-specific and are rarely tested under the heterogeneous conditions of real-world mobility systems. In contrast, the proposed IDS has been tested on six distinct datasets (VeReMi Extension, Car Hacking, UNSW-NB15, Edge-IIoTset, CICIoV2024, and 5G-NIDD), which allows a systematic evaluation of its detection capability across diverse network infrastructures and threat scenarios.
4.6. Discussion
The experimental results presented in Section 4 provide evidence for the efficacy of the proposed framework. The model performs strongly, consistently outperforming its competitors on six diverse datasets. This can be explained by the complementary roles of the model's components (CNNs, LSTMs, and attention). The convolutional layers capture short-term patterns within network traffic windows, such as byte-level anomalies or sudden spikes in packet traffic. The LSTM layers model longer temporal dependencies, helping the system identify sequential attack behaviors that emerge over time. The attention mechanism helps mitigate overfitting to irrelevant patterns by assigning higher weights to significant temporal features.
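To illustrate the division of labor described above, the following minimal sketch shows a convolution picking out a short-term spike in a traffic window and a softmax attention step assigning that position the highest weight. It is written in plain Python for transparency and is not the authors' implementation: the kernel is hand-set rather than learned, the dot-product scoring with a fixed query vector is an assumed attention variant, the LSTM stage is omitted for brevity, and all function names and input values are hypothetical.

```python
import math
from typing import List, Tuple


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, as in CNN layers), no padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention over time steps.

    states: one feature vector per time step (LSTM outputs in the full model).
    query:  context vector, learned in practice; fixed here for illustration.
    Returns the attention weights and the weighted sum of the states.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Hypothetical packet counts per interval in one traffic window; index 4 is a spike.
    window = [2, 2, 3, 2, 9, 2, 2]
    # Hand-set kernel that responds to an isolated peak (a CNN would learn its kernels).
    feature_map = conv1d(window, [-1, 2, -1])
    print(feature_map)  # [-1, 2, -8, 14, -7]

    # LSTM stage omitted: attend directly over the 1-dimensional feature map.
    weights, context = attention_pool([[x] for x in feature_map], query=[1.0])
    peak = max(range(len(weights)), key=weights.__getitem__)
    print(peak)  # 3, the convolution output centred on the spike
```

In the full model, the attention weights would be computed over the LSTM hidden states rather than the raw convolutional feature map, so that the weighting reflects both short-term and longer-range context.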
As shown by the performance metrics in Table 7, although the CNN-only and LSTM-only models score strongly on some evaluation metrics, neither performs consistently well across all metrics and datasets, including 5G-NIDD, VeReMi extension, and CICIoV2024. Combining the components balances their inductive biases, which helps the model perform well across smart mobility environments and traffic distributions and explains its strong generalization, as demonstrated in Table 4 and Table 5 and the confusion matrices depicted in Figure 2.
Nevertheless, the ablation results in Table 7 indicate that the contribution of each architectural component varies across datasets. On some datasets, such as 5G-NIDD and UNSW-NB15, the CNN-only configuration performs comparably to the full model. For instance, on 5G-NIDD, the CNN-only model achieves 99.83% accuracy compared to 99.90% for the full model, while on UNSW-NB15 the CNN-only model (99.03% accuracy, F1: 96.11%) slightly exceeds the full model (98.97%, F1: 95.92%). This suggests that for datasets where spatial feature patterns are already highly discriminative, the additional temporal and attention components provide marginal or no gains.
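The size of these ablation gaps can be made explicit as percentage-point (pp) differences. The following sketch only recomputes the Table 7 values quoted in the preceding paragraph; the dictionary layout and the `pp_delta` name are illustrative.

```python
# Percentage-point (pp) gap between the full CNN-LSTM-Attention model and the
# CNN-only ablation, using the Table 7 values quoted in the text.
# The dictionary layout and the pp_delta name are illustrative only.

REPORTED = {
    # (dataset, metric): (full model %, CNN-only %)
    ("5G-NIDD", "accuracy"): (99.90, 99.83),
    ("UNSW-NB15", "accuracy"): (98.97, 99.03),
    ("UNSW-NB15", "F1"): (95.92, 96.11),
}


def pp_delta(full: float, cnn_only: float) -> float:
    """Gain of the full model over the CNN-only model, in percentage points."""
    return round(full - cnn_only, 2)


if __name__ == "__main__":
    for (dataset, metric), (full, cnn_only) in REPORTED.items():
        print(f"{dataset:<10} {metric:<8} {pp_delta(full, cnn_only):+.2f} pp")
```

Running it prints gaps of +0.07, -0.06, and -0.19 pp, which makes clear that on UNSW-NB15 the extra components do not improve on the CNN-only baseline.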
However, the benefits of the hybrid architecture become more evident on datasets with stronger temporal dependencies or higher noise levels. For example, on the VeReMi extension dataset, the proposed model slightly improves accuracy (98.24% vs. 98.17%) while substantially reducing the false positive rate (0.11% vs. 0.38%), suggesting that the LSTM and attention mechanisms help suppress spurious detections and improve decision consistency. On the Edge-IIoTset dataset, the full model achieves near-perfect performance (accuracy of , F1-score of , and AUC-ROC of ), demonstrating its effectiveness in complex IoT environments. The perfect scores observed on the Car Hacking and CICIoV2024 datasets (accuracy, F1-score, and AUC-ROC of ) should, however, be interpreted with caution: as noted above, simpler models also achieve near-perfect performance on these datasets, suggesting that the classification task may be inherently separable owing to structurally distinct attack patterns and controlled data collection conditions.
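To put the VeReMi extension result in perspective, the false positive rate (FPR) can be written in its standard form and the reported reduction expressed in relative terms. The subscript "base" denotes the configuration the full model is compared against; this labeling is ours, and the calculation assumes both rates were measured on the same test split.

```latex
\mathrm{FPR} = \frac{FP}{FP + TN}
\qquad
\frac{\mathrm{FPR}_{\text{base}} - \mathrm{FPR}_{\text{full}}}{\mathrm{FPR}_{\text{base}}}
  = \frac{0.38\% - 0.11\%}{0.38\%}
  = \frac{0.27}{0.38}
  \approx 71\%
```

Under that assumption, the full model raises roughly 71% fewer false alarms on benign traffic, even though its accuracy gain is only 0.07 percentage points.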
Overall, the results show that combining CNNs, LSTMs, and attention mechanisms improves the intrusion detection framework's performance across smart mobility datasets, achieving over accuracy on all six datasets. The proposed CNN–LSTM–Attention framework demonstrates superior performance and robustness across heterogeneous infrastructures. However, UNSW-NB15 remains the most challenging dataset, with more false positives and false negatives than the others, as evident in Figure 2f. This suggests that the framework may benefit from training for more than 10 epochs, or from approaches such as domain adaptation or transfer learning, to better cope with dynamic and complex vehicular communication environments. The results also indicate that the attention mechanism improves the model's focus on important traffic features, enabling better detection of subtle, sophisticated attack behaviors while maintaining accurate classification of benign traffic.
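To illustrate how the false positives and false negatives visible in a confusion matrix translate into the reported metrics, the following sketch computes accuracy, F1, FPR, and false negative rate (FNR) from binary counts. It is an illustration under stated assumptions: the binary (benign vs. attack) setting, the class and method names, and the counts are all hypothetical and are not taken from Figure 2f.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    """Counts from a binary confusion matrix (positive class = attack)."""

    tp: int  # attacks correctly flagged
    fp: int  # benign samples flagged as attacks (false alarms)
    tn: int  # benign samples correctly passed
    fn: int  # attacks missed

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)

    def f1(self) -> float:
        # Equivalent to the harmonic mean of precision and recall when defined.
        return self._ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)

    def fpr(self) -> float:
        return self._ratio(self.fp, self.fp + self.tn)

    def fnr(self) -> float:
        return self._ratio(self.fn, self.fn + self.tp)


if __name__ == "__main__":
    # Hypothetical, imbalanced test split: 500 attacks, 9,500 benign samples.
    cm = BinaryConfusion(tp=450, fp=40, tn=9460, fn=50)
    print(f"accuracy={cm.accuracy():.3f} f1={cm.f1():.3f} "
          f"fpr={cm.fpr():.4f} fnr={cm.fnr():.3f}")
```

With these counts, accuracy is 0.991 while F1 is 0.909, showing how a modest number of errors on the minority class can pull F1 well below accuracy.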
The proposed framework is therefore an effective approach to real-time intrusion detection, providing a robust and reliable defense against cyberattacks on smart mobility systems.