1. Introduction
Connected autonomous vehicles (CAVs) are transforming intelligent transportation by enabling communication among vehicles, infrastructure, and roadside systems through vehicle-to-everything (V2X) technologies. These capabilities support cooperative driving, traffic efficiency, and improved road safety. At the vehicle level, many functions rely on electronic control units (ECUs) that exchange data through in-vehicle networks, among which the Controller Area Network (CAN) bus remains a dominant protocol for real-time communication [1, 2].
However, the CAN protocol was not designed with strong built-in security features such as authentication and encryption. This limitation makes in-vehicle networks vulnerable to message injection, spoofing, replay, and denial-of-service attacks, which may disrupt vehicle operation and threaten passenger safety. As vehicle connectivity continues to expand, securing CAN-based communication has become a critical challenge in intelligent transportation systems [3, 4].
Machine learning-based intrusion detection has attracted significant attention for CAN bus security because it can identify abnormal traffic patterns beyond predefined attack signatures. Compared with rule-based and signature-based approaches, learning-based methods are better suited to detecting previously unseen attacks. However, conventional centralized training requires collecting data from many vehicles, which raises privacy concerns and increases communication cost [5, 6].
Federated learning (FL) addresses this limitation by enabling distributed model training without sharing raw local data. In FL, each client trains locally and sends model updates to a central server, which aggregates them into a global model. This setting is well suited to connected vehicle environments, where vehicles can collaboratively learn from locally generated CAN traffic while preserving data privacy [7, 8].
Despite its privacy advantages, FL remains vulnerable to adversarial manipulation. Compromised clients can poison training by altering labels or submitting malicious model updates, thereby degrading global model quality. Two representative threats are label-flipping attacks and gradient manipulation attacks, where adversaries amplify or distort local updates to influence aggregation. Backdoor attacks also pose a serious threat because the model may maintain high clean accuracy while misclassifying samples that contain a specific trigger pattern. In safety-critical CAV applications, such attacks can significantly weaken anomaly detection reliability [9, 10].
To mitigate poisoned updates, robust aggregation methods have been proposed as alternatives to simple averaging. In particular, coordinate-wise Median and Trimmed Mean are widely studied for Byzantine-resilient distributed learning because they reduce the influence of abnormal client updates [11, 12, 13]. Other robust aggregation strategies, such as Krum, Multi-Krum, and Geometric Median, provide additional defense mechanisms by selecting updates based on distance relationships or minimizing the total distance to submitted client updates. However, the relative effectiveness of these aggregation methods for federated CAN bus anomaly detection under different poisoning and backdoor attack settings remains insufficiently explored.
In this study, we examine the robustness of federated anomaly detection for CAV networks under adversarial conditions using the CAN-HCRL-OTIDS dataset. We simulate a multi-client vehicular FL environment in which a multilayer perceptron (MLP) is trained collaboratively under non-IID client data distributions. The evaluation compares six aggregation strategies: Federated Averaging (FedAvg), coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). We consider both benign training and adversarial training under label-flipping, gradient-scaling, and feature-triggered backdoor attacks. We also evaluate malicious client ratios, gradient-scaling strengths, learning rates, Trimmed Mean beta values, multi-seed reliability, and server-side aggregation time. Model performance is assessed using accuracy, precision, recall, F1-score, ROC-AUC, false negative rate, attack success rate, and aggregation time [14, 15].
The main contributions of this work are summarized as follows:
We develop a federated learning framework for CAN bus anomaly detection in connected autonomous vehicle networks while preserving local data privacy.
We present a systematic robustness evaluation of federated aggregation strategies under adversarial conditions in connected vehicular networks.
We compare six aggregation strategies in the same vehicular federated learning setting: FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and GeoMed.
We investigate the vulnerability of federated anomaly detection under three adversarial attack models: label-flipping attacks, gradient-scaling attacks, and feature-triggered backdoor attacks.
We analyze the effect of malicious client ratio by testing adversarial participation levels of 10%, 20%, 30%, and 40%.
We conduct sensitivity analyses for gradient-scaling strength, learning rate, and the Trimmed Mean beta parameter, and we report multi-seed reliability results.
We report server-side aggregation time to examine the practical trade-off between robustness and computational overhead in CAV environments.
We show that Median and GeoMed provide strong stability against gradient-scaling attacks, while Multi-Krum provides stronger resistance to label-flipping and backdoor attacks. These results indicate that robust aggregation should be selected according to the expected attack model and deployment constraints.
2. Related Work
2.1. Anomaly Detection and Federated Learning in Vehicular Networks
Intrusion detection is a critical component of vehicular cybersecurity because connected vehicles increasingly rely on networked communication systems. Recent studies highlight the rapid growth of machine learning-based anomaly detection for intelligent transportation systems and emphasize the need for scalable security solutions in vehicular environments [1, 2]. Centralized learning models can detect known attack patterns effectively, but they require continuous transmission of in-vehicle data to a central server, which increases privacy risk and communication cost [4, 14].
Federated learning addresses this limitation by enabling collaborative model training without sharing raw data. Each participant trains locally and sends model updates to an aggregation server, where a global model is constructed [7]. In vehicular networks, federated learning has been studied for hierarchical vehicle-edge coordination, V2X-based collaborative learning, autonomous driving perception, and privacy-preserving intrusion detection [15, 16, 17, 18]. Beyond transportation, AI-enabled digital twin and smart manufacturing research also provides useful context for autonomous systems that depend on sensing, robotics, data exchange, and intelligent decision making [19]. This broader automation perspective is relevant to CAV anomaly detection because safety, privacy, and system trustworthiness are shared requirements across intelligent autonomous systems.
Despite these advances, many vehicular federated learning studies assume that participating clients behave honestly. This assumption may not hold in practical CAV environments, where compromised vehicles or edge nodes may participate in training while submitting manipulated updates.
2.2. Adversarial Attacks on Federated Learning
Federated learning is vulnerable to adversarial manipulation because the server usually cannot directly inspect private client data. Malicious participants can corrupt local training data or submit poisoned model updates, causing data poisoning or model poisoning attacks. Prior studies show that even a small number of adversarial clients can degrade federated model performance [9, 20]. In vehicular networks, adversarial behavior can directly reduce the reliability of collaborative intrusion detection and safety-critical decision support [21, 22]. Related intelligent transportation studies also show that adversarial attacks can affect learning-based traffic-signal control systems [23].
Label-flipping attacks represent data poisoning, where malicious clients intentionally change class labels during local training. Gradient-scaling attacks represent model poisoning, where malicious clients amplify submitted updates to increase their influence during aggregation. Backdoor attacks are more covert because the global model may retain strong clean performance while misclassifying samples that contain a specific trigger pattern [24]. These attacks are important for CAV anomaly detection because a model that appears reliable under normal testing may still fail under targeted adversarial conditions.
2.3. Robust Aggregation and Defense Mechanisms
Robust aggregation methods aim to reduce the influence of abnormal client updates during federated aggregation. Coordinate-wise Median and Trimmed Mean are widely studied defenses against Byzantine or poisoning behavior in distributed learning [12]. Coordinate-wise Median computes the median value of each parameter across client submissions, while Trimmed Mean removes selected extreme values before averaging. Because Trimmed Mean depends on the selected trimming fraction, sensitivity to the beta value should be evaluated rather than assuming that a fixed parameter is optimal.
Distance-based robust aggregation provides another defense direction. Krum selects the update closest to its neighboring updates, while Multi-Krum selects multiple low-score candidate updates before averaging [11]. Geometric Median aggregation seeks an aggregate update that minimizes the total distance to submitted client updates [25]. These methods can improve resistance to malicious updates, but they may introduce additional server-side computation compared with FedAvg.
Robust aggregation can improve resilience against poisoning, but it does not fully address all privacy and security risks. Privacy-preserving and cryptographic defenses such as differential privacy, secure aggregation, authentication, and blockchain-based accountability can reduce inference risks, protect update confidentiality, or improve auditability [26, 27, 28]. These mechanisms are complementary to robust aggregation, but they may introduce utility loss, communication overhead, or additional computation. Therefore, practical deployment in CAV environments requires evaluating both robustness and overhead.
2.4. Research Gap
Although previous studies have examined federated vehicular intrusion detection and adversarial robustness, several gaps remain. First, many evaluations compare only a small number of aggregation strategies and do not include stronger robust baselines such as Krum, Multi-Krum, and Geometric Median. Second, most studies focus on label-flipping or simple model poisoning, while backdoor attacks remain less explored for federated CAN bus anomaly detection. Third, malicious client ratio, learning rate, and Trimmed Mean beta sensitivity are often not fully analyzed, even though these factors can change conclusions about the strongest aggregation method. Fourth, server-side aggregation time is important in latency-sensitive CAV environments but is often omitted from robustness analysis.
To address these gaps, this study evaluates FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and GeoMed under clean training, label-flipping attacks, gradient-scaling attacks, and backdoor attacks. The evaluation also analyzes malicious client ratio, gradient-scaling strength, learning rate, Trimmed Mean beta, multi-seed reliability, and aggregation time.
3. System Model
We consider a federated learning-based anomaly detection framework deployed across a fleet of connected autonomous vehicles. The system consists of N vehicular clients and a central aggregation server that coordinates collaborative training. Each vehicle is equipped with onboard computing resources capable of collecting Controller Area Network (CAN) bus traffic and performing local model training.
The objective is to train a global intrusion detection model without requiring vehicles to share raw CAN traffic data. Instead, each client trains locally using its private dataset and transmits only model parameters or model updates to the aggregation server. This design preserves data privacy while reducing raw data exposure, consistent with recent privacy-preserving vehicular federated learning architectures [16].
Figure 1 illustrates the V2X communication environment and the high-level deployment context for federated anomaly detection in connected autonomous vehicle networks. The figure shows the interaction among connected vehicles, roadside infrastructure, edge devices, communication networks, and the central server.
3.1. Federated Learning Workflow
Training proceeds in synchronous communication rounds. At the beginning of each round, the server distributes the current global model to all clients. Each client performs local training on its CAN traffic dataset for several local epochs and then sends the updated model parameters back to the server.
The server aggregates the received updates using a predefined aggregation strategy to construct a new global model. In this study, the evaluated aggregation strategies include Federated Averaging (FedAvg), coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). The updated model is then broadcast to clients for the next training round. This iterative process continues until the specified number of communication rounds is completed.
3.2. Data Distribution
In practical vehicular environments, CAN traffic collected from different vehicles varies due to differences in driving behavior, hardware configurations, and surrounding conditions. Therefore, client datasets are assumed not to be independent and identically distributed (non-IID).
Each vehicle collects its own CAN traffic logs, and the proportion of benign and malicious samples may differ across clients. Such heterogeneous data distributions commonly arise in real-world federated learning deployments and represent realistic conditions for collaborative intrusion detection. In the experiments, this non-IID setting is represented through heterogeneous class distributions across clients, where different clients contain different benign and attack sample proportions. Specifically, client attack ratios are varied across clients to create label distribution heterogeneity.
This non-IID construction captures class distribution heterogeneity among vehicular clients. It does not fully represent all possible forms of real-world vehicular heterogeneity, such as different vehicle models, manufacturer-specific CAN semantics, or large feature distribution shifts across environments. These limitations are considered in the discussion of practical deployment and future work.
3.3. Client Composition
The federated system contains both benign and adversarial participants. Let $f$ denote the number of compromised clients such that $f < N/2$. Malicious clients follow the federated protocol but intentionally manipulate their training behavior to degrade the global model.
In our experimental setup, we simulate $N = 10$ clients, and the number of malicious clients is varied to evaluate the effect of adversarial participation. Specifically, malicious client ratios of 10%, 20%, 30%, and 40% are considered, corresponding to $f = 1$, $f = 2$, $f = 3$, and $f = 4$ malicious clients. The 40% setting is close to the robustness boundary of median-based aggregation, where the number of malicious clients must remain below half of the total client population.
3.4. Threat and Trust Assumptions
The aggregation server is assumed to be trusted and executes the selected aggregation rule correctly. The server does not know in advance which clients are malicious. Malicious clients may manipulate local labels, scale their submitted updates, or inject a backdoor trigger during local training. However, they do not control the server and do not directly modify benign client data.
The adversary is internal to the federated learning process because compromised clients are allowed to participate in training and submit model updates. This setting is relevant to CAV networks, where a compromised vehicle or edge node may appear as a legitimate participant while attempting to poison the global model.
3.5. Communication Assumptions
Communication between the aggregation server and vehicular clients is assumed to be authenticated and reliable. Although practical vehicular networks may experience latency or bandwidth constraints, this work focuses on the robustness of the learning framework under adversarial client behavior.
Model parameters exchanged during training are assumed to be protected against external tampering, consistent with secure vehicular federated learning architectures. To partially assess practical deployment cost, the experiments report aggregation time at the server for each aggregation method. A complete packet-level communication delay analysis is outside the scope of this study.
4. Methodology
4.1. System Overview
The V2X deployment context is shown in Figure 1, while the detailed federated anomaly detection workflow is shown in Figure 2. In this framework, vehicular clients train local anomaly detection models using private CAN bus data and send model updates to a central aggregation server. The server aggregates the received updates using the selected aggregation strategy and broadcasts the updated global model to clients for the next communication round. This section presents the federated optimization objective, aggregation strategies, training procedure, and adversarial threat model used in the study.
4.2. Federated Learning Formulation
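Consider a federated learning environment with $N$ vehicular clients. Each client $i$ maintains a private local dataset
$$\mathcal{D}_i = \{(x_{i,k}, y_{i,k})\}_{k=1}^{n_i},$$
where $x_{i,k} \in \mathbb{R}^{d}$ is the feature vector extracted from CAN messages and $y_{i,k} \in \{0, 1\}$ denotes benign or malicious traffic. Let $h_{\theta}$ denote the anomaly detection model parameterized by $\theta$. The local empirical loss at client $i$ is defined as
$$F_i(\theta) = \frac{1}{n_i} \sum_{k=1}^{n_i} \ell\big(h_{\theta}(x_{i,k}), y_{i,k}\big),$$
where $\ell$ is the training loss. The global federated objective is
$$\min_{\theta} F(\theta) = \sum_{i=1}^{N} \frac{n_i}{n} F_i(\theta), \qquad n = \sum_{i=1}^{N} n_i.$$
At communication round $t$, the server broadcasts the current global model $\theta^{t}$ to all clients. Each client performs local training and returns updated parameters $\theta_i^{t+1}$. The local update is represented as
$$\Delta_i^{t} = \theta_i^{t+1} - \theta^{t}.$$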
4.3. Aggregation Strategies
This study evaluates six aggregation strategies under clean and adversarial federated learning conditions.
4.3.1. Federated Averaging
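Federated Averaging (FedAvg) computes a weighted average of client model parameters:
$$\theta^{t+1} = \sum_{i=1}^{N} \frac{n_i}{n} \theta_i^{t+1},$$
where $\theta_i^{t+1}$ is the model returned by client $i$, $n_i$ is the number of local samples at client $i$, and $n = \sum_{i=1}^{N} n_i$.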
FedAvg is efficient and widely used, but it can be strongly affected when malicious clients submit poisoned updates.
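To make this sensitivity concrete, the following minimal sketch applies the weighted average to flattened parameter vectors. It is an illustration under simple assumptions, not the implementation used in the experiments: the function name `fedavg`, the list-of-lists representation, and the toy values are all hypothetical.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of flattened client parameter vectors."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    return [
        sum(n * p[j] for p, n in zip(client_params, client_sizes)) / total
        for j in range(len(client_params[0]))
    ]


# Toy example (hypothetical values): three honest clients and one client
# that submits an inflated first coordinate, all with equal sample counts.
params = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [101.0, 2.0]]
aggregated = fedavg(params, [100, 100, 100, 100])
print(aggregated)  # the single outlier moves the first coordinate to 26.0
```

Because every client contributes linearly to the result, one sufficiently large submission can move the aggregate arbitrarily far, which is the weakness the robust rules below try to limit.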
4.3.2. Coordinate-Wise Median
Coordinate-wise Median aggregation computes the median value independently for each model parameter dimension. Let $\theta_{i,j}^{t+1}$ denote the $j$-th parameter submitted by client $i$. The aggregated parameter is
$$\theta_j^{t+1} = \operatorname{median}\{\theta_{1,j}^{t+1}, \theta_{2,j}^{t+1}, \ldots, \theta_{N,j}^{t+1}\}.$$
The median operator is robust when the number of malicious clients satisfies
$$f < \frac{N}{2}.$$
This condition is important because the 40% malicious client setting with $N = 10$ and $f = 4$ is close to the robustness boundary of median-based aggregation. Therefore, lower malicious client ratios are also evaluated.
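The boundary behavior can be illustrated with a small sketch. The code below is a hypothetical example rather than the experimental implementation: `coordinate_median` and the toy parameter values are assumptions, and each model is represented as a flat list of floats.

```python
from statistics import median
from typing import List, Sequence


def coordinate_median(client_params: Sequence[Sequence[float]]) -> List[float]:
    """Median of each parameter dimension, taken across clients."""
    return [median(column) for column in zip(*client_params)]


# Six honest clients (hypothetical values) and identical malicious submissions.
honest = [[0.9, -0.2], [1.0, -0.1], [1.1, 0.0],
          [1.0, 0.1], [0.8, -0.1], [1.2, 0.0]]
attack = [50.0, 9.0]

# N = 10, f = 4: the aggregate stays inside the range of honest values.
robust = coordinate_median(honest + [attack] * 4)

# N = 10, f = 5: the condition f < N/2 is violated and the aggregate escapes.
broken = coordinate_median(honest[:5] + [attack] * 5)

print(robust, broken)
```

With four malicious clients out of ten, the median already sits at the upper edge of the honest values in this toy case, which is consistent with describing the 40% setting as close to the boundary.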
4.3.3. Trimmed Mean
Trimmed Mean aggregation removes extreme values before averaging. Let $\beta \in [0, 0.5)$ denote the trimming fraction and $k = \lfloor \beta N \rfloor$. After sorting the submitted values for parameter $j$, the aggregated value is
$$\theta_j^{t+1} = \frac{1}{N - 2k} \sum_{i=k+1}^{N-k} \theta_{(i),j}^{t+1},$$
where $\theta_{(i),j}^{t+1}$ is the $i$-th sorted value for parameter $j$. Since the performance of Trimmed Mean depends on $\beta$, the experiments include a beta sensitivity analysis.
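A short sketch helps show why the trimming fraction matters. The code below assumes that the number of values trimmed from each end is the floor of the trimming fraction times the number of clients; that rounding rule, the function name `trimmed_mean`, and the toy values are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def trimmed_mean(client_params: Sequence[Sequence[float]], beta: float) -> List[float]:
    """Drop the k smallest and k largest values per coordinate, then average.

    k = floor(beta * N) is an assumed rounding rule for this sketch.
    """
    n = len(client_params)
    if not 0.0 <= beta < 0.5:
        raise ValueError("beta must lie in [0, 0.5)")
    k = math.floor(beta * n)
    result = []
    for column in zip(*client_params):
        kept = sorted(column)[k:n - k]
        result.append(sum(kept) / len(kept))
    return result


# Hypothetical single-parameter example: 6 honest values, 4 inflated values.
submissions = [[1.0]] * 6 + [[11.0]] * 4

print(trimmed_mean(submissions, 0.1))  # k = 1: three inflated values survive
print(trimmed_mean(submissions, 0.3))  # k = 3: one inflated value survives
```

In this toy case the aggregate moves from 4.75 to 3.5 as the trimming fraction grows from 0.1 to 0.3, because fewer malicious values remain after trimming. With four malicious clients, at least four values per side would have to be trimmed to remove all of them.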
4.3.4. Krum
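Krum is a distance-based Byzantine-resilient aggregation method that selects one submitted update whose distance to neighboring updates is smallest [11]. For each client update $\Delta_i^{t}$, Krum computes the squared Euclidean distance to other updates and forms the neighbor set $\mathcal{N}_i$ containing the $N - f - 2$ closest updates. The Krum score is
$$s(i) = \sum_{j \in \mathcal{N}_i} \lVert \Delta_i^{t} - \Delta_j^{t} \rVert_2^2.$$
The selected update is
$$\Delta_{\mathrm{Krum}}^{t} = \Delta_{i^{*}}^{t}, \qquad i^{*} = \arg\min_{i} s(i).$$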
Krum can reject isolated malicious updates, but it uses only one selected update per round.
4.3.5. Multi-Krum
Multi-Krum extends Krum by selecting multiple client updates with the lowest Krum scores and averaging them [11]. If $\mathcal{S}$ is the selected set, the aggregated model is
$$\theta^{t+1} = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \theta_i^{t+1}.$$
This preserves distance-based filtering while using more benign information than single-update Krum.
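The scoring and selection steps of Krum and Multi-Krum can be sketched as follows. This is a hedged illustration, not the experimental code: the helper names, the toy vectors, and the number of selected updates `m_select` are assumptions, since the selection size used in the experiments is not restated here.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_scores(updates: Sequence[Sequence[float]], f: int) -> List[float]:
    """Sum of squared distances to the N - f - 2 closest other updates."""
    n_neighbors = len(updates) - f - 2
    if n_neighbors < 1:
        raise ValueError("Krum needs N - f - 2 >= 1")
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(_sq_dist(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:n_neighbors]))
    return scores


def krum(updates: Sequence[Sequence[float]], f: int) -> List[float]:
    """Return the single update with the lowest Krum score."""
    scores = krum_scores(updates, f)
    best = min(range(len(updates)), key=scores.__getitem__)
    return list(updates[best])


def multi_krum(updates: Sequence[Sequence[float]], f: int, m_select: int) -> List[float]:
    """Average the m_select lowest-score updates (m_select is an assumption)."""
    scores = krum_scores(updates, f)
    chosen = sorted(range(len(updates)), key=scores.__getitem__)[:m_select]
    dim = len(updates[0])
    return [sum(updates[i][j] for i in chosen) / len(chosen) for j in range(dim)]


# Hypothetical updates: six honest clients near (1, 1), four colluding outliers.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0], [1.0, 0.8]]
updates = honest + [[50.0, 50.0]] * 4

print(krum(updates, f=4))
print(multi_krum(updates, f=4, m_select=6))
```

In this toy setting both rules ignore the four distant submissions. Multi-Krum averages all six honest updates, whereas Krum returns only one of them.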
4.3.6. Geometric Median
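Geometric Median (GeoMed) aggregation computes an aggregate model that minimizes the total distance to submitted client models [25]:
$$\theta^{t+1} = \arg\min_{\theta} \sum_{i=1}^{N} \lVert \theta - \theta_i^{t+1} \rVert_2,$$
where $\theta_i^{t+1}$ is the model submitted by client $i$.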
Unlike coordinate-wise Median, GeoMed treats each submitted model as a full parameter vector and reduces the effect of updates that are far from the central group of client submissions.
4.4. Federated Training Procedure
The training process follows synchronous communication rounds. In each round, the server broadcasts the current global model $\theta^{t}$ to all clients. Each client trains locally on its private CAN traffic data and sends its updated parameters $\theta_i^{t+1}$ to the server. The server then applies one of the evaluated aggregation strategies: FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, or GeoMed. The resulting global model is used in the next round. The process continues until the specified number of communication rounds is completed.
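The round structure can be summarized in a few lines of code. The sketch below replaces real MLP training with gradient steps on a toy quadratic loss so that it stays self-contained. The function names, learning rate, local epoch count, and client targets are hypothetical, and only the broadcast, local training, and aggregation pattern is meant to carry over.

```python
from statistics import median
from typing import Callable, List, Sequence

Model = List[float]


def local_train(theta: Model, target: Sequence[float],
                lr: float = 0.1, epochs: int = 5) -> Model:
    """Toy stand-in for local training: minimize 0.5 * ||theta - target||^2."""
    theta = list(theta)
    for _ in range(epochs):
        theta = [w - lr * (w - c) for w, c in zip(theta, target)]
    return theta


def mean_aggregate(models: Sequence[Model]) -> Model:
    return [sum(col) / len(col) for col in zip(*models)]


def median_aggregate(models: Sequence[Model]) -> Model:
    return [median(col) for col in zip(*models)]


def run_rounds(client_targets: Sequence[Sequence[float]],
               aggregate: Callable[[Sequence[Model]], Model],
               rounds: int = 30) -> Model:
    """Synchronous rounds: broadcast, train locally, aggregate, repeat."""
    theta = [0.0] * len(client_targets[0])
    for _ in range(rounds):
        local_models = [local_train(theta, c) for c in client_targets]
        theta = aggregate(local_models)
    return theta


# Hypothetical per-client optima whose average is (1.0, 2.0).
targets = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
print(run_rounds(targets, mean_aggregate))
print(run_rounds(targets, median_aggregate))
```

Passing the aggregation rule as a function keeps the loop identical across strategies, which mirrors the way the six strategies are compared under the same training procedure.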
4.5. Threat Model
We consider an internal adversary that controls a subset of participating clients. The aggregation server is trusted and executes the selected aggregation rule correctly, but it does not know which clients are malicious. Let $f$ denote the number of adversarial clients, with $f < N/2$. The experiments evaluate malicious client ratios of 10%, 20%, 30%, and 40%. Three attack strategies are considered.
Label-Flipping Attack: Malicious clients alter local labels during training by, for example, switching benign labels to attack labels and attack labels to benign labels. This corrupts the local training objective and produces misleading model updates.
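Gradient-Scaling Attack: Malicious clients amplify their submitted updates. Given the local update $\Delta_i^{t} = \theta_i^{t+1} - \theta^{t}$, a malicious client submits
$$\tilde{\Delta}_i^{t} = \lambda \Delta_i^{t},$$
and the corresponding poisoned model becomes
$$\tilde{\theta}_i^{t+1} = \theta^{t} + \lambda \Delta_i^{t}.$$
The scaling factor $\lambda > 1$ controls the attack strength.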
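Backdoor Attack: In the backdoor attack, malicious clients poison a fraction of their local training samples by applying a predefined feature-level trigger and assigning the triggered samples to a target label. Let $T(\cdot)$ denote the trigger function and $y_{\mathrm{target}}$ denote the adversary-selected target label. A poisoned sample is represented as
$$(\tilde{x}, \tilde{y}) = (T(x), y_{\mathrm{target}}).$$
The backdoor attack is evaluated using attack success rate (ASR), defined as
$$\mathrm{ASR} = \frac{\left|\{x \in \mathcal{D}_{\mathrm{trig}} : h_{\theta}(T(x)) = y_{\mathrm{target}}\}\right|}{\left|\mathcal{D}_{\mathrm{trig}}\right|},$$
where $h_{\theta}$ is the global model and $\mathcal{D}_{\mathrm{trig}}$ is the set of test samples to which the trigger is applied.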
A lower ASR indicates stronger resistance to the backdoor trigger. ASR is reported together with clean classification performance to evaluate both normal detection capability and targeted attack resistance.
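The three attacks and the ASR metric can be expressed compactly in code. The following sketch is illustrative only: the trigger is modeled as a dictionary that overwrites chosen feature positions, the poisoned samples are simply the first ones in the local dataset, and every name and value is hypothetical. ASR is computed on whatever samples are passed in, which are assumed to be test samples whose true label differs from the target label.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def flip_labels(labels: Sequence[int]) -> List[int]:
    """Label flipping for binary labels: 0 becomes 1 and 1 becomes 0."""
    return [1 - y for y in labels]


def scale_update(global_params: Sequence[float], local_params: Sequence[float],
                 scale: float) -> List[float]:
    """Gradient scaling: return global + scale * (local - global)."""
    return [g + scale * (l - g) for g, l in zip(global_params, local_params)]


def apply_trigger(x: Sequence[float], trigger: Dict[int, float]) -> List[float]:
    """Overwrite selected feature positions with fixed trigger values."""
    x = list(x)
    for index, value in trigger.items():
        x[index] = value
    return x


def poison_dataset(features: Sequence[Sequence[float]], labels: Sequence[int],
                   trigger: Dict[int, float], target_label: int,
                   poison_fraction: float) -> Tuple[List[List[float]], List[int]]:
    """Trigger and relabel the first poison_fraction of the local samples."""
    n_poison = int(poison_fraction * len(features))
    new_x, new_y = [], []
    for k, (x, y) in enumerate(zip(features, labels)):
        if k < n_poison:
            new_x.append(apply_trigger(x, trigger))
            new_y.append(target_label)
        else:
            new_x.append(list(x))
            new_y.append(y)
    return new_x, new_y


def attack_success_rate(predict: Callable[[Sequence[float]], int],
                        features: Sequence[Sequence[float]],
                        trigger: Dict[int, float], target_label: int) -> float:
    """Fraction of triggered samples that the model assigns to the target label."""
    triggered = [apply_trigger(x, trigger) for x in features]
    hits = sum(1 for x in triggered if predict(x) == target_label)
    return hits / len(triggered)


# Hypothetical usage: 10 attack-class samples with 3 features each.
trigger = {2: 9.0}
xs = [[0.0, 0.0, 0.0] for _ in range(10)]
poisoned_x, poisoned_y = poison_dataset(xs, [1] * 10, trigger, 0, 0.2)
print(poisoned_y)
print(scale_update([1.0, 1.0], [1.5, 0.5], 4.0))
```

A model that has learned the trigger would map every triggered sample to the target label and reach an ASR of 1.0, while a model that ignores the trigger would keep the ASR near 0.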
5. Experimental Setup
This section describes the dataset, preprocessing, model architecture, federated configuration, attack settings, sensitivity analyses, and evaluation metrics used to assess the robustness of the federated anomaly detection framework. All experiments were conducted in a simulated federated learning environment where vehicular clients collaboratively train an anomaly detection model using CAN bus traffic data.
5.1. Dataset and Preprocessing
Experiments were performed using the CAN-HCRL-OTIDS dataset, which contains labeled CAN bus traffic collected from vehicular environments. The dataset includes normal communication and malicious traffic generated under different attack scenarios, making it suitable for evaluating CAN bus intrusion detection.
Four source files were used: dataset.csv, dataset1.csv, dataset2.csv, and dataset3.csv. These files were merged into a unified binary classification dataset, where target label 0 was mapped to the benign class and target labels 1, 2, and 3 were mapped to the attack class. The merged dataset contains 4,613,439 CAN samples, including 2,369,398 benign samples and 2,244,041 attack samples.
To preserve temporal independence, each source file was divided using a chronological 80% training and 20% testing split before merging the partitions. The final split contains 3,690,750 training samples and 922,689 testing samples. Each CAN message is represented using 12 packet-level features: TS, ID1, ID0, LEN, and payload bytes DLC0 to DLC7. Hexadecimal fields were converted into numerical values, missing values were handled during preprocessing, and all features were normalized using StandardScaler. The same preprocessing pipeline was applied consistently to the chronological training and testing partitions.
Table 1 summarizes the dataset characteristics used in the experiments.
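The preprocessing steps can be sketched with standard-library code. This is an assumption-laden illustration rather than the pipeline used in the experiments: the helper names and toy rows are hypothetical, and the sketch assumes that scaling statistics are fitted on the training partition only. The standardization follows the usual StandardScaler convention of zero mean and unit population variance, with constant features left unscaled.

```python
import math
from typing import List, Sequence, Tuple


def hex_to_int(field: str) -> int:
    """Convert a hexadecimal CAN field such as '1A' to an integer."""
    return int(field, 16)


def chronological_split(rows: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Keep the earliest rows for training and the latest rows for testing."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def fit_standardizer(train_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation from training rows."""
    n = len(train_rows)
    columns = list(zip(*train_rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mu in zip(columns, means):
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        stds.append(sd if sd > 0 else 1.0)  # constant features are left unscaled
    return means, stds


def standardize(rows: Sequence[Sequence[float]], means: Sequence[float],
                stds: Sequence[float]) -> List[List[float]]:
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows: (timestamp, hexadecimal identifier byte), already time-ordered.
raw = [(float(t), format(16 + t, "02X")) for t in range(10)]
rows = [[ts, float(hex_to_int(ident))] for ts, ident in raw]

train, test = chronological_split(rows)
means, stds = fit_standardizer(train)
train_scaled = standardize(train, means, stds)
test_scaled = standardize(test, means, stds)
print(len(train), len(test))
```

Applying the split within each source file before merging, as described above, keeps later traffic from a file out of the training partition for that file.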
5.2. Model Architecture
The anomaly detection model is implemented as a lightweight multilayer perceptron (MLP) for binary classification. The network receives a 12-dimensional CAN feature vector and uses two fully connected hidden layers with 32 and 16 neurons, respectively, followed by a two-neuron softmax output layer. ReLU activation is used in the hidden layers.
This architecture is computationally efficient and suitable for resource-constrained vehicular settings. Although CNN, LSTM, and Transformer-based models may capture richer spatial or temporal patterns, the MLP is used here to keep the focus on robustness differences among aggregation strategies under identical model capacity and adversarial conditions.
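To give a sense of the model size, the sketch below builds the 12-32-16-2 layer layout in plain Python and runs one forward pass. The initialization scheme, seed, and input values are assumptions made for illustration. The experiments would normally rely on a deep learning framework rather than hand-written loops.

```python
import math
import random
from typing import List, Sequence, Tuple

LAYER_SIZES = [12, 32, 16, 2]  # input, two hidden layers, softmax output
Layer = Tuple[List[List[float]], List[float]]


def init_params(seed: int = 42) -> List[Layer]:
    """Random weights and zero biases (initialization scheme is an assumption)."""
    rng = random.Random(seed)
    params = []
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        params.append((weights, [0.0] * fan_out))
    return params


def forward(params: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """ReLU on hidden layers, softmax on the two-neuron output layer."""
    h = list(x)
    for index, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if index < len(params) - 1:
            h = [max(0.0, v) for v in z]
        else:
            top = max(z)  # subtract the maximum for numerical stability
            exps = [math.exp(v - top) for v in z]
            h = [e / sum(exps) for e in exps]
    return h


def count_params(params: Sequence[Layer]) -> int:
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


params = init_params()
probs = forward(params, [0.0] * 12)
print(count_params(params), probs)
```

The layout has 978 trainable parameters (416 + 528 + 34), so each client update is a short vector. This is consistent with the sub-millisecond to millisecond aggregation times reported later.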
5.3. Federated Learning Configuration
To simulate a collaborative vehicular environment, the dataset was partitioned across multiple clients representing individual vehicles. Each client trains locally using private data and shares only model parameters with the aggregation server. A non-IID setting was created by assigning different benign and attack proportions across clients, with client attack ratios increasing from 0.10 to 0.90.
The experiments use 10 clients, with 20,000 local samples assigned to each client and 100,000 test samples used for evaluation. Malicious-client ratios of 10%, 20%, 30%, and 40% are evaluated, corresponding to 1, 2, 3, and 4 malicious clients. The default adversarial setting uses 40% malicious clients.
Table 2 summarizes the federated learning configuration.
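The client configuration implies simple arithmetic that can be checked directly. The sketch below assumes that the client attack ratios are evenly spaced between 0.10 and 0.90. The text states only that they increase over this range, so the spacing, the rounding, and the function names are assumptions.

```python
from typing import List, Tuple


def client_attack_ratios(num_clients: int = 10, low: float = 0.10,
                         high: float = 0.90) -> List[float]:
    """Evenly spaced attack ratios (even spacing is an assumption)."""
    step = (high - low) / (num_clients - 1)
    return [low + i * step for i in range(num_clients)]


def client_class_counts(ratios: List[float],
                        samples_per_client: int = 20000) -> List[Tuple[int, int]]:
    """(benign, attack) sample counts for each client."""
    counts = []
    for r in ratios:
        attack = round(r * samples_per_client)
        counts.append((samples_per_client - attack, attack))
    return counts


def malicious_client_count(ratio: float, num_clients: int = 10) -> int:
    return round(ratio * num_clients)


ratios = client_attack_ratios()
counts = client_class_counts(ratios)
print(counts[0], counts[-1])
print([malicious_client_count(r) for r in (0.1, 0.2, 0.3, 0.4)])
```

Under these assumptions the first client holds 18,000 benign and 2,000 attack samples and the last client holds the reverse, for 200,000 training samples across the ten clients.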
5.4. Attack and Sensitivity Settings
Six aggregation strategies are evaluated: FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). The attack evaluation includes label flipping, gradient scaling, and a feature-triggered backdoor attack.
In the label-flipping attack, malicious clients switch benign and attack labels during local training. This attack is evaluated under malicious client ratios of 10%, 20%, 30%, and 40%. In the gradient-scaling attack, malicious clients amplify their local updates before sending them to the server:
$$\tilde{\Delta}_i^{t} = \lambda \Delta_i^{t},$$
where $\Delta_i^{t}$ is the local update of client $i$ and $\lambda$ controls attack strength. Gradient-scaling sensitivity is evaluated over several values of $\lambda$, and the default setting uses 40% malicious clients with a fixed scaling factor.
In the backdoor attack, malicious clients poison a fraction of local training samples by inserting a feature-level trigger and assigning triggered samples to a target label. The backdoor setting uses 40% malicious clients, a poison fraction of 0.2, and target label 0.
Additional sensitivity analyses were conducted for learning rate values of 0.0005, 0.001, and 0.005, and for three Trimmed Mean beta values ranging from $\beta = 0.1$ to $\beta = 0.3$. To reduce dependence on a single random run, multi-seed reliability analysis was performed using seeds 42, 123, and 2026.
5.5. Evaluation Metrics
Model performance was evaluated using accuracy, precision, recall, F1-score, ROC-AUC, and false negative rate (FNR). These metrics measure both overall classification performance and the ability to detect malicious CAN traffic while minimizing missed attacks. For the backdoor attack, attack success rate (ASR) is also reported, where lower ASR indicates stronger resistance to triggered misclassification. Server-side aggregation time is measured to assess computational overhead across aggregation strategies.
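For reference, the threshold-based metrics can be computed from confusion-matrix counts as in the sketch below, which treats the attack class (label 1) as positive. The function name and toy predictions are hypothetical. ROC-AUC is omitted because it requires prediction scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and FNR with attack (1) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # missed attacks
    }


# Hypothetical predictions: one missed attack and one false alarm.
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
print(metrics)
```

FNR equals one minus recall, so it directly reports the share of malicious CAN messages that the detector misses.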
6. Results and Discussion
This section evaluates the federated anomaly detection framework under clean and adversarial settings. The analysis covers main performance, malicious client ratio sensitivity, gradient-scaling strength, backdoor resistance, hyperparameter sensitivity, multi-seed reliability, and server-side aggregation overhead.
6.1. Overall Performance Under Clean and Adversarial Settings
We first evaluate the model under clean federated training and then under label-flipping and gradient-scaling attacks with 40% malicious clients. Table 3 reports the final results after 30 communication rounds.
Under clean training, all methods achieve similar F1-scores between 0.9055 and 0.9113, showing that the selected MLP and CAN features provide a stable baseline. Under label flipping, FedAvg drops to an F1-score of 0.6932, while Multi-Krum achieves the best F1-score of 0.8865, followed by GeoMed with 0.8804. Under gradient scaling, FedAvg and Trimmed Mean degrade strongly, whereas GeoMed and Median remain stable with F1-scores of 0.9134 and 0.9108. These results show that robust aggregation is necessary, but the strongest method depends on the attack type.
6.2. Malicious Client Ratio and Gradient-Scaling Sensitivity
To examine the effect of adversarial participation, label-flipping attacks were evaluated under 10%, 20%, 30%, and 40% malicious client ratios. Table 4 and Figure 3 summarize the F1-score trend.
FedAvg declines continuously as the malicious client ratio increases, while Multi-Krum and GeoMed remain more stable. Median also degrades near the 40% setting because this ratio approaches the robustness boundary of median-based aggregation.
Gradient-scaling sensitivity was evaluated at three scaling strengths with 40% malicious clients. Table 5 and Figure 4 show that FedAvg is stable only at the weaker scaling level, while Median and GeoMed remain robust as attack strength increases.
6.3. Backdoor Attack Evaluation
Backdoor attacks were evaluated because they measure targeted failure rather than only general detection degradation. Malicious clients poisoned 20% of local training samples using a feature-level trigger and target label 0. The malicious client ratio was fixed at 40%. Figure 5 shows the attack success rate of each aggregation method under the feature-triggered backdoor attack, and Table 6 reports the corresponding clean performance and ASR values.
FedAvg, Median, and GeoMed maintain high clean F1-scores but also show high ASR values above 0.95. This indicates that clean performance alone is insufficient to establish backdoor robustness. Multi-Krum reduces ASR to 0.0411, showing the strongest resistance to the triggered attack, although its clean F1-score is slightly lower.
6.4. Hyperparameter Sensitivity and Multi-Seed Reliability
Learning rate sensitivity and Trimmed Mean beta sensitivity were evaluated to ensure that the conclusions were not caused by a single parameter choice. The detailed trends are shown in Figure 6 and Figure 7, while Table 7 reports the key F1-score values.
FedAvg remains weak under gradient scaling across all learning rates. Median and GeoMed remain substantially stronger, with the best F1-scores observed around the default learning rate. Trimmed Mean is highly sensitive to $\beta$: increasing $\beta$ from 0.1 to 0.3 improves the F1-score from 0.6636 to 0.7537 under label flipping and from 0.6547 to 0.8311 under gradient scaling.
To reduce dependence on a single run, multi-seed experiments were conducted using seeds 42, 123, and 2026. Table 8 reports F1-score mean and standard deviation values.
The multi-seed results support the main findings. FedAvg remains weak under both attacks. Under gradient scaling, Median, GeoMed, and Multi-Krum produce similar mean F1-scores, while under label flipping, Multi-Krum remains clearly strongest.
6.5. Aggregation Time and Comparative Discussion
Server-side aggregation time was measured to evaluate practical overhead. From Table 3, Median is the fastest robust method, requiring about 0.0004 to 0.0005 s. FedAvg requires about 0.0011 s, GeoMed requires about 0.0024 to 0.0026 s, and Krum or Multi-Krum requires about 0.0032 to 0.0040 s because of distance-based computations.
The backdoor experiment shows the clearest trade-off between robustness and overhead. Multi-Krum requires more aggregation time, but it reduces ASR to 0.0411. Median is faster, but its ASR remains high at 0.9589. Therefore, the practical choice of aggregation method should depend on the expected attack model. Median and GeoMed are strong choices against gradient-scaling attacks, while Multi-Krum is more suitable when resistance to label-flipping or backdoor attacks is the priority.
Overall, no single aggregation method dominates across all attack models. FedAvg performs competitively under clean training but is vulnerable under strong adversarial manipulation. Trimmed Mean can improve with a better beta value, but it remains parameter-sensitive. Median and GeoMed provide stable protection against scaled updates, while Multi-Krum provides stronger resistance to label-flipping and backdoor attacks. These results support attack-aware aggregation selection for federated CAV anomaly detection.
7. Conclusions
This study examined the robustness of federated learning for anomaly detection in connected autonomous vehicle networks using CAN bus traffic data. While federated learning enables collaborative model training without sharing raw data, its distributed nature introduces vulnerabilities when compromised clients submit poisoned model updates.
To investigate these challenges, we evaluated six aggregation strategies within a federated anomaly detection framework using the CAN-HCRL-OTIDS dataset: Federated Averaging (FedAvg), Coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). The experiments considered adversarial training scenarios including label-flipping attacks and gradient scaling attacks with different attack intensities as well as a feature-triggered backdoor attack.
The results show that FedAvg is highly vulnerable to adversarial manipulation, exhibiting substantial performance degradation under strong poisoning attacks. Trimmed Mean provides parameter-dependent robustness but remains sensitive to the selected trimming fraction. Median and GeoMed maintain stable performance under gradient-scaling attacks, with GeoMed achieving the strongest F1-score in the main gradient-scaling setting. Multi-Krum provides the strongest resistance to label-flipping and backdoor attacks, including a substantial reduction in attack success rate under the backdoor setting. These findings show that no single aggregation strategy is optimal across all evaluated attack scenarios.
These findings highlight the importance of robust aggregation mechanisms for deploying federated anomaly detection systems in safety-critical vehicular environments. For gradient-scaling attacks, Median and GeoMed provide strong robustness with relatively stable detection performance. For label-flipping and backdoor attacks, Multi-Krum offers stronger protection, although it introduces higher server-side aggregation cost. Therefore, the selection of an aggregation strategy should consider the expected attack model, detection reliability requirement, and computational overhead.
The study also provides additional evidence through malicious client ratio analysis, gradient scaling strength sensitivity, learning rate sensitivity, Trimmed Mean beta sensitivity, multi-seed reliability evaluation, and aggregation time measurement. These analyses strengthen the experimental basis of the findings and show how robustness changes under different adversarial and training conditions.
Future work will explore more sophisticated adversarial strategies, evaluate the framework under highly heterogeneous client data distributions, and investigate adaptive defense mechanisms such as trust-aware aggregation and dynamic client weighting to further strengthen federated learning security in intelligent transportation systems. Future work will also evaluate larger client populations, additional CAV datasets, temporal deep learning models, communication delay under realistic vehicular networks, and hybrid defenses that combine robust aggregation with privacy-preserving mechanisms such as differential privacy and secure aggregation.
Author Contributions
Conceptualization, A.N.; methodology, A.N. and A.Z.M.J.U.; software, A.N.; validation, A.N. and A.Z.M.J.U.; formal analysis, A.N.; investigation, A.N.; data curation, A.N.; writing—original draft preparation, A.N.; writing—review and editing, A.Z.M.J.U. and T.B.; visualization, A.N.; supervision, T.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The CAN-HCRL-OTIDS dataset used in this study is publicly available from its official repository. Processed data and experimental results generated during the experiments are available from the corresponding author upon reasonable request.
Acknowledgments
The authors thank the developers of the CAN-HCRL-OTIDS dataset for making the dataset publicly available, which enabled the experimental evaluation conducted in this work.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
| Abbreviation | Definition |
|---|---|
| CAN | Controller Area Network |
| CAV | Connected Autonomous Vehicle |
| FL | Federated Learning |
| FedAvg | Federated Averaging |
| IDS | Intrusion Detection System |
| MLP | Multilayer Perceptron |
| ROC-AUC | Area Under the Receiver Operating Characteristic Curve |
| FNR | False Negative Rate |
| ASR | Attack Success Rate |
| GeoMed | Geometric Median |
| IID | Independent and Identically Distributed |
| Non-IID | Non-Independent and Non-Identically Distributed |
| V2X | Vehicle-to-Everything |
References
- Solaas, J.R.V.; Tuptuk, N.; Mariconti, E. Systematic Literature Review: Anomaly Detection in Connected and Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2025, 26, 43–58.
- Abdallah, E.E.; Aloqaily, A.; Fayez, H. Identifying Intrusion Attempts on Connected and Autonomous Vehicles: A Survey. Procedia Comput. Sci. 2023, 220, 307–314.
- Aloraini, F.; Javed, A.; Rana, O. Adversarial Attacks on Intrusion Detection Systems in In-Vehicle Networks of Connected and Autonomous Vehicles. Sensors 2024, 24, 3848.
- Luo, F.; Wang, J.; Zhang, X.; Jiang, Y.; Li, Z.; Luo, C. In-Vehicle Network Intrusion Detection Systems: A Systematic Survey of Deep Learning-Based Approaches. PeerJ Comput. Sci. 2023, 9, e1648.
- Yang, J.; Hu, J.; Yu, T. Federated AI-Enabled In-Vehicle Network Intrusion Detection for Internet of Vehicles. Electronics 2022, 11, 3658.
- Nwakanma, C.I.; Ahakonye, L.A.C.; Njoku, J.N.; Odirichukwu, J.C.; Okolie, S.A.; Uzondu, C.; Nweke, C.C.N.; Kim, D.-S. Explainable Artificial Intelligence for Intrusion Detection and Mitigation in Intelligent Connected Vehicles: A Review. Appl. Sci. 2023, 13, 1252.
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Aguera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS); PMLR: Cambridge, MA, USA, 2017; Volume 54, pp. 1273–1282. Available online: https://proceedings.mlr.press/v54/mcmahan17a.html (accessed on 18 March 2026).
- Kumar, S.K.G.; Krishna Prakasha, K.; Muniyal, B.; Rajarajan, M. Explainable Federated Framework for Enhanced Security and Privacy in Connected Vehicles Against Advanced Persistent Threats. IEEE Open J. Veh. Technol. 2025, 6, 1438–1463.
- Zhang, J.; Li, B.; Chen, C.; Lyu, L.; Wu, S.; Ding, S.; Wu, C. Delving into the Adversarial Robustness of Federated Learning. arXiv 2023, arXiv:2302.09479.
- Demir, U.; Erpek, T.; Yalin, E.; Kompella, S.; Xue, M. Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks. arXiv 2025, arXiv:2510.15109.
- Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 119–129. Available online: https://proceedings.neurips.cc/paper/2017/hash/f4b9ec30ad9f68f89b29639786cb62ef-Abstract.html (accessed on 18 March 2026).
- Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In Proceedings of the 35th International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2018; Volume 80, pp. 5650–5659. Available online: https://proceedings.mlr.press/v80/yin18a.html (accessed on 18 March 2026).
- Zhang, C.; Yang, S.; Mao, L.; Ning, H. Anomaly Detection and Defense Techniques in Federated Learning: A Comprehensive Review. Artif. Intell. Rev. 2024, 57, 150.
- Taheri, R.; Jafari, R.; Gegov, A.; Arabikhan, F.; Ichtev, A. Explainable AI for Federated Learning-Based Intrusion Detection Systems in Connected Vehicles. Electronics 2025, 14, 4508.
- Lim, L.-H.; Ong, L.-Y.; Leow, M.-C. Federated Learning for Anomaly Detection: A Systematic Review on Scalability, Adaptability, and Benchmarking Framework. Future Internet 2025, 17, 375.
- HaghighiFard, M.S.; Coleri, S. Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection. arXiv 2024, arXiv:2405.17497.
- Alekszejenko, L.; Dobrowiecki, T.P. A V2X-Based privacy-preserving Federated Measuring and Learning System. arXiv 2024, arXiv:2401.13848.
- Xiang, Z. Federated Learning in Autonomous Driving: Progress, Challenges, and Outlook in Perception, Prediction, and Communication. Appl. Comput. Eng. 2024, 46, 72–78.
- Addula, S.R.; Tyagi, A.K. Future of Computer Vision and Industrial Robotics in Smart Manufacturing. In Artificial Intelligence-Enabled Digital Twin for Smart Manufacturing; Tyagi, A.K., Tiwari, S., Arumugam, S.K., Sharma, A.K., Eds.; Scrivener Publishing LLC: Beverly, MA, USA; Wiley: Hoboken, NJ, USA, 2024; pp. 505–539.
- Liu, L.; Wang, F.; Du, N. Attack Detection of Federated Learning Model Based on Attention Mechanism Optimization in Connected Vehicles. World Electr. Veh. J. 2025, 16, 679.
- Amara Korba, A.; Boualouache, A.; Brik, B.; Rahal, R.; Ghamri-Doudane, Y.; Senouci, S.M. Federated Learning for Zero-Day Attack Detection in 5G and Beyond V2X Networks. In Proceedings of the IEEE International Conference on Communications (ICC); IEEE: Piscataway, NJ, USA, 2023; pp. 1137–1142.
- Ercan, S.; Mendiboure, L.; Alouache, L.; Maaloul, S.; Sylla, T.; Aniss, H. An Enhanced Model for Machine Learning-Based DoS Detection in Vehicular Networks. In Proceedings of the IFIP Networking Conference; IEEE: Piscataway, NJ, USA, 2023; pp. 1–9.
- Haydari, A.; Zhang, M.; Chuah, C.-N. Adversarial Attacks and Defense in Deep Reinforcement Learning-Based Traffic Signal Controllers. IEEE Open J. Intell. Transp. Syst. 2021, 2, 402–416.
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How To Backdoor Federated Learning. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS); PMLR: Cambridge, MA, USA, 2020; Volume 108, pp. 2938–2948. Available online: https://proceedings.mlr.press/v108/bagdasaryan20a.html (accessed on 18 March 2026).
- Chen, Y.; Su, L.; Xu, J. Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent. Proc. ACM Meas. Anal. Comput. Syst. 2017, 1, 44:1–44:25.
- Dwork, C. Differential Privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP); Springer: Berlin/Heidelberg, Germany, 2006; Volume 4052, pp. 1–12.
- Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security; ACM: New York, NY, USA, 2017; pp. 1175–1191.
- Xie, N.; Zhang, C.; Yuan, Q.; Kong, J.; Di, X. IoV-BCFL: An Intrusion Detection Method for IoV Based on Blockchain and Federated Learning. Ad Hoc Netw. 2024, 163, 103590.
Figure 1.
V2X-enabled connected autonomous vehicle environment for federated anomaly detection. Vehicles, roadside units, edge infrastructure, and a central server cooperate to support model training and anomaly detection, while local CAN traffic remains within each vehicle. Solid arrows indicate V2X communication links, and dashed red arrows indicate potential adversarial paths.
Figure 2.
Federated anomaly detection workflow in V2X networks. Vehicular clients train local models using private CAN bus data and send model updates to the aggregation server. Red dashed arrows indicate poisoned updates from malicious clients, while black arrows indicate the normal training and global model update flow.
Figure 3.
Effect of malicious client ratio on F1-score under label-flipping attacks.
Figure 4.
F1-score under gradient-scaling attacks with different scaling-factor values.
Figure 5.
Attack success rate under the feature-triggered backdoor attack. Different bar colors distinguish the evaluated aggregation methods.
Figure 6.
Learning rate sensitivity under gradient-scaling attack.
Figure 7.
Trimmed Mean beta sensitivity under label-flipping and gradient-scaling attacks.
Table 1.
Dataset characteristics used in the experiments.
| Attribute | Value |
|---|---|
| Dataset | CAN-HCRL-OTIDS |
| Source files | dataset.csv, dataset1.csv, dataset2.csv, dataset3.csv |
| Total samples | 4,613,439 |
| Class distribution | 2,369,398 benign and 2,244,041 attack samples |
| Class mapping | Target 0 as benign; targets 1, 2, and 3 as attack |
| Feature count | 12 |
| Split strategy | Chronological 80% training and 20% testing split per source file |
| Final split | 3,690,750 training and 922,689 testing samples |
Table 2.
Federated learning configuration.
| Parameter | Value |
|---|---|
| Learning paradigm | Federated learning |
| Number of clients | 10 |
| Client sample size | 20,000 samples per client |
| Evaluation test size | 100,000 samples |
| Non-IID construction | Client attack ratios from 0.10 to 0.90 |
| Communication rounds | 30 |
| Local epochs | 5 |
| Batch size | 1024 |
| Learning rate | 0.001 |
| Optimizer | Adam |
| Malicious-client ratios | 10%, 20%, 30%, and 40% |
| Random seeds | 42, 123, and 2026 |
Table 3.
Main performance comparison under clean training, label-flipping attack, and gradient-scaling attack. The adversarial settings use 40% malicious clients, and gradient scaling uses a scaling factor of 5.
| Scenario | Method | Accuracy | Precision | Recall | F1-Score | ROC-AUC | FNR | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Clean | Median | 0.9200 | 0.9900 | 0.8442 | 0.9113 | 0.9544 | 0.1558 | 0.0005 |
| Clean | Multi-Krum | 0.9187 | 0.9867 | 0.8444 | 0.9100 | 0.9520 | 0.1556 | 0.0040 |
| Clean | GeoMed | 0.9172 | 0.9855 | 0.8423 | 0.9083 | 0.9526 | 0.1577 | 0.0024 |
| Clean | Krum | 0.9153 | 0.9752 | 0.8475 | 0.9069 | 0.9534 | 0.1525 | 0.0032 |
| Clean | FedAvg | 0.9149 | 0.9844 | 0.8385 | 0.9056 | 0.9512 | 0.1615 | 0.0011 |
| Clean | Trimmed Mean | 0.9148 | 0.9833 | 0.8391 | 0.9055 | 0.9531 | 0.1609 | 0.0013 |
| Label flipping | Multi-Krum | 0.8909 | 0.8979 | 0.8755 | 0.8865 | 0.9551 | 0.1245 | 0.0037 |
| Label flipping | GeoMed | 0.8846 | 0.8886 | 0.8724 | 0.8804 | 0.9507 | 0.1276 | 0.0026 |
| Label flipping | Krum | 0.8462 | 0.8035 | 0.9056 | 0.8515 | 0.9530 | 0.0944 | 0.0032 |
| Label flipping | Median | 0.8210 | 0.7561 | 0.9334 | 0.8355 | 0.9568 | 0.0666 | 0.0004 |
| Label flipping | Trimmed Mean | 0.6573 | 0.5911 | 0.9604 | 0.7318 | 0.8993 | 0.0396 | 0.0007 |
| Label flipping | FedAvg | 0.5792 | 0.5374 | 0.9762 | 0.6932 | 0.8875 | 0.0238 | 0.0011 |
| Gradient scaling | GeoMed | 0.9215 | 0.9858 | 0.8510 | 0.9134 | 0.9553 | 0.1490 | 0.0025 |
| Gradient scaling | Median | 0.9191 | 0.9823 | 0.8491 | 0.9108 | 0.9548 | 0.1509 | 0.0004 |
| Gradient scaling | Multi-Krum | 0.8942 | 0.9020 | 0.8780 | 0.8898 | 0.9553 | 0.1220 | 0.0038 |
| Gradient scaling | Krum | 0.8462 | 0.8035 | 0.9056 | 0.8515 | 0.9530 | 0.0944 | 0.0032 |
| Gradient scaling | FedAvg | 0.4893 | 0.4881 | 0.9987 | 0.6557 | 0.6984 | 0.0013 | 0.0011 |
| Gradient scaling | Trimmed Mean | 0.4869 | 0.4869 | 0.9995 | 0.6548 | 0.7256 | 0.0005 | 0.0006 |
Table 4.
F1-score under label-flipping attacks with different malicious client ratios.
| Ratio | FedAvg | Median | Multi-Krum | GeoMed |
|---|---|---|---|---|
| 10% | 0.8997 | 0.9081 | 0.9081 | 0.9081 |
| 20% | 0.8335 | 0.8902 | 0.9049 | 0.9012 |
| 30% | 0.7687 | 0.8616 | 0.8980 | 0.8973 |
| 40% | 0.6932 | 0.8355 | 0.8865 | 0.8804 |
Table 5.
F1-score under different gradient-scaling strengths.
| Scaling factor | FedAvg | Median | GeoMed |
|---|---|---|---|
| 2 | 0.9086 | 0.9074 | 0.9119 |
| 5 | 0.6557 | 0.9108 | 0.9134 |
| 10 | 0.6603 | 0.9106 | 0.9143 |
Table 6.
Backdoor attack results with 40% malicious clients. Lower ASR indicates stronger resistance.
| Method | Clean F1-Score | Clean ROC-AUC | ASR |
|---|---|---|---|
| FedAvg | 0.9063 | 0.9518 | 0.9895 |
| Median | 0.9095 | 0.9537 | 0.9589 |
| Multi-Krum | 0.8973 | 0.9564 | 0.0411 |
| GeoMed | 0.9079 | 0.9532 | 0.9519 |
Table 7.
Compact hyperparameter sensitivity summary using F1-score.
| Analysis | Setting | FedAvg | Median/GeoMed | Trimmed Mean |
|---|---|---|---|---|
| Learning rate | 0.0005 | 0.6576 | 0.9062/0.9040 | – |
| Learning rate | 0.001 | 0.6546 | 0.9116/0.9083 | – |
| Learning rate | 0.005 | 0.6547 | 0.8910/0.9018 | – |
| Beta, label flipping | 0.1 | – | – | 0.6636 |
| Beta, label flipping | 0.2 | – | – | 0.6657 |
| Beta, label flipping | 0.3 | – | – | 0.7537 |
| Beta, gradient scaling | 0.1 | – | – | 0.6547 |
| Beta, gradient scaling | 0.2 | – | – | 0.6602 |
| Beta, gradient scaling | 0.3 | – | – | 0.8311 |
Table 8.
Multi-seed reliability analysis using F1-score mean and standard deviation.
| Attack Type | FedAvg | Median | GeoMed | Multi-Krum |
|---|---|---|---|---|
| Gradient scaling | | | | |
| Label flipping | | | | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.