Robust Federated Learning for Anomaly Detection in Connected Autonomous Vehicle Networks Under Adversarial Attacks

Md Jalal Uddin, Abu Zahid; Nayeem, Atahar; Bhuiyan, Touhid

doi:10.3390/automation7030080

by Abu Zahid Md Jalal Uddin ¹,*, Atahar Nayeem ² and Touhid Bhuiyan ¹

¹ School of Information Technology (SIT), Washington University of Science and Technology (WUST), Alexandria, VA 22314, USA

² Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Savar, Dhaka 1350, Bangladesh

* Author to whom correspondence should be addressed.

Automation 2026, 7(3), 80; https://doi.org/10.3390/automation7030080 (registering DOI)

Submission received: 10 April 2026 / Revised: 10 May 2026 / Accepted: 14 May 2026 / Published: 20 May 2026

(This article belongs to the Topic Advanced Methods in Unmanned Aerial Vehicle Control, Navigation, and Safety)

Abstract

Connected and autonomous vehicles (CAVs) increasingly rely on vehicle-to-everything (V2X) communication and distributed sensing infrastructures to support cooperative driving and intelligent transportation services. While these capabilities improve traffic efficiency and safety, they also expand the attack surface of vehicular networks and expose in-vehicle communication systems such as the Controller Area Network (CAN) bus to a wide range of cyber threats. Machine learning-based anomaly detection has emerged as a promising approach for identifying malicious CAN traffic patterns; however, conventional centralized learning requires large-scale data aggregation from vehicles, which raises privacy and scalability concerns. Federated learning (FL) enables collaborative model training across distributed vehicles without requiring the exchange of raw in-vehicle data, making it attractive for privacy-preserving vehicular security applications. Nevertheless, FL systems remain vulnerable to adversarial participants that manipulate local training data or model updates to poison the global model during aggregation. In this work, we present a systematic robustness evaluation of federated anomaly detection in connected vehicular networks under adversarial conditions. The study compares six aggregation strategies, including Federated Averaging (FedAvg), coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed), within a non-IID federated CAN bus anomaly detection setting. The evaluation covers label-flipping attacks, gradient-scaling attacks, and a feature-triggered backdoor attack. In addition, the analysis examines malicious client participation, attack-strength variation, learning-rate sensitivity, Trimmed Mean beta sensitivity, multi-seed reliability, and server-side aggregation time. The results show that FedAvg is vulnerable under strong adversarial manipulation, while Trimmed Mean is sensitive to the selected trimming fraction. Median and GeoMed provide strong robustness against gradient-scaling attacks, whereas Multi-Krum achieves the strongest resistance to label-flipping and backdoor attacks. These findings demonstrate that no single aggregation strategy is optimal across all threat models. Instead, robust aggregation for federated CAV anomaly detection should be selected according to the expected attack type, reliability requirement, and computational overhead.

Keywords:

federated learning; connected autonomous vehicles; CAN bus intrusion detection; adversarial attacks; poisoning attacks; backdoor attacks; robust aggregation; Krum; Multi-Krum; geometric median

1. Introduction

Connected autonomous vehicles (CAVs) are transforming intelligent transportation by enabling communication among vehicles, infrastructure, and roadside systems through vehicle-to-everything (V2X) technologies. These capabilities support cooperative driving, traffic efficiency, and improved road safety. At the vehicle level, many functions rely on electronic control units (ECUs) that exchange data through in-vehicle networks, among which the Controller Area Network (CAN) bus remains a dominant protocol for real-time communication [1,2].

However, the CAN protocol was not designed with strong built-in security features such as authentication and encryption. This limitation makes in-vehicle networks vulnerable to message injection, spoofing, replay, and denial-of-service attacks, which may disrupt vehicle operation and threaten passenger safety. As vehicle connectivity continues to expand, securing CAN-based communication has become a critical challenge in intelligent transportation systems [3,4].

Machine learning-based intrusion detection has attracted significant attention for CAN bus security because it can identify abnormal traffic patterns beyond predefined attack signatures. Compared with rule-based and signature-based approaches, learning-based methods are better suited to detecting previously unseen attacks. However, conventional centralized training requires collecting data from many vehicles, which raises privacy concerns and increases communication cost [5,6].

Federated learning (FL) addresses this limitation by enabling distributed model training without sharing raw local data. In FL, each client trains locally and sends model updates to a central server, which aggregates them into a global model. This setting is well suited to connected vehicle environments, where vehicles can collaboratively learn from locally generated CAN traffic while preserving data privacy [7,8].

Despite its privacy advantages, FL remains vulnerable to adversarial manipulation. Compromised clients can poison training by altering labels or submitting malicious model updates, thereby degrading global model quality. Two representative threats are label-flipping attacks and gradient manipulation attacks, where adversaries amplify or distort local updates to influence aggregation. Backdoor attacks also pose a serious threat because the model may maintain high clean accuracy while misclassifying samples that contain a specific trigger pattern. In safety-critical CAV applications, such attacks can significantly weaken anomaly detection reliability [9,10].

To mitigate poisoned updates, robust aggregation methods have been proposed as alternatives to simple averaging. In particular, coordinate-wise Median and Trimmed Mean are widely studied for Byzantine-resilient distributed learning because they reduce the influence of abnormal client updates [11,12,13]. Other robust aggregation strategies, such as Krum, Multi-Krum, and Geometric Median, provide additional defense mechanisms by selecting updates based on distance relationships or minimizing the total distance to submitted client updates. However, the relative effectiveness of these aggregation methods for federated CAN bus anomaly detection under different poisoning and backdoor attack settings remains insufficiently explored.

In this study, we examine the robustness of federated anomaly detection for CAV networks under adversarial conditions using the CAN-HCRL-OTIDS dataset. We simulate a multi-client vehicular FL environment in which a multilayer perceptron (MLP) is trained collaboratively under non-IID client data distributions. The evaluation compares six aggregation strategies: Federated Averaging (FedAvg), coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). We consider both benign training and adversarial training under label-flipping, gradient-scaling, and feature-triggered backdoor attacks. We also evaluate malicious client ratios, gradient-scaling strengths, learning rates, Trimmed Mean beta values, multi-seed reliability, and server-side aggregation time. Model performance is assessed using accuracy, precision, recall, F1-score, ROC-AUC, false negative rate, attack success rate, and aggregation time [14,15].

The main contributions of this work are summarized as follows:

We develop a federated learning framework for CAN bus anomaly detection in connected autonomous vehicle networks while preserving local data privacy.
We present a systematic robustness evaluation of federated aggregation strategies under adversarial conditions in connected vehicular networks.
We compare six aggregation strategies in the same vehicular federated learning setting: FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and GeoMed.
We investigate the vulnerability of federated anomaly detection under three adversarial attack models: label-flipping attacks, gradient-scaling attacks, and feature-triggered backdoor attacks.
We analyze the effect of malicious client ratio by testing adversarial participation levels of 10%, 20%, 30%, and 40%.
We conduct sensitivity analyses for gradient-scaling strength, learning rate, and the Trimmed Mean beta parameter, and we report multi-seed reliability results.
We report server-side aggregation time to examine the practical trade-off between robustness and computational overhead in CAV environments.
We show that Median and GeoMed provide strong stability against gradient-scaling attacks, while Multi-Krum provides stronger resistance to label-flipping and backdoor attacks. These results indicate that robust aggregation should be selected according to the expected attack model and deployment constraints.

2. Related Work

2.1. Anomaly Detection and Federated Learning in Vehicular Networks

Intrusion detection is a critical component of vehicular cybersecurity because connected vehicles increasingly rely on networked communication systems. Recent studies highlight the rapid growth of machine learning-based anomaly detection for intelligent transportation systems and emphasize the need for scalable security solutions in vehicular environments [1,2]. Centralized learning models can detect known attack patterns effectively, but they require continuous transmission of in-vehicle data to a central server, which increases privacy risk and communication cost [4,14].

Federated learning addresses this limitation by enabling collaborative model training without sharing raw data. Each participant trains locally and sends model updates to an aggregation server, where a global model is constructed [7]. In vehicular networks, federated learning has been studied for hierarchical vehicle-edge coordination, V2X-based collaborative learning, autonomous driving perception, and privacy-preserving intrusion detection [15,16,17,18]. Beyond transportation, AI-enabled digital twin and smart manufacturing research also provides useful context for autonomous systems that depend on sensing, robotics, data exchange, and intelligent decision making [19]. This broader automation perspective is relevant to CAV anomaly detection because safety, privacy, and system trustworthiness are shared requirements across intelligent autonomous systems.

Despite these advances, many vehicular federated learning studies assume that participating clients behave honestly. This assumption may not hold in practical CAV environments, where compromised vehicles or edge nodes may participate in training while submitting manipulated updates.

2.2. Adversarial Attacks on Federated Learning

Federated learning is vulnerable to adversarial manipulation because the server usually cannot directly inspect private client data. Malicious participants can corrupt local training data or submit poisoned model updates, causing data poisoning or model poisoning attacks. Prior studies show that even a small number of adversarial clients can degrade federated model performance [9,20]. In vehicular networks, adversarial behavior can directly reduce the reliability of collaborative intrusion detection and safety-critical decision support [21,22]. Related intelligent transportation studies also show that adversarial attacks can affect learning-based traffic-signal control systems [23].

Label-flipping attacks represent data poisoning, where malicious clients intentionally change class labels during local training. Gradient-scaling attacks represent model poisoning, where malicious clients amplify submitted updates to increase their influence during aggregation. Backdoor attacks are more covert because the global model may retain strong clean performance while misclassifying samples that contain a specific trigger pattern [24]. These attacks are important for CAV anomaly detection because a model that appears reliable under normal testing may still fail under targeted adversarial conditions.

2.3. Robust Aggregation and Defense Mechanisms

Robust aggregation methods aim to reduce the influence of abnormal client updates during federated aggregation. Coordinate-wise Median and Trimmed Mean are widely studied defenses against Byzantine or poisoning behavior in distributed learning [12]. Coordinate-wise Median computes the median value of each parameter across client submissions, while Trimmed Mean removes selected extreme values before averaging. Because Trimmed Mean depends on the selected trimming fraction, sensitivity to the beta value should be evaluated rather than assuming that a fixed parameter is optimal.

Distance-based robust aggregation provides another defense direction. Krum selects the update closest to its neighboring updates, while Multi-Krum selects multiple low-score candidate updates before averaging [11]. Geometric Median aggregation seeks an aggregate update that minimizes the total distance to submitted client updates [25]. These methods can improve resistance to malicious updates, but they may introduce additional server-side computation compared with FedAvg.

Robust aggregation can improve resilience against poisoning, but it does not fully address all privacy and security risks. Privacy-preserving and cryptographic defenses such as differential privacy, secure aggregation, authentication, and blockchain-based accountability can reduce inference risks, protect update confidentiality, or improve auditability [26,27,28]. These mechanisms are complementary to robust aggregation, but they may introduce utility loss, communication overhead, or additional computation. Therefore, practical deployment in CAV environments requires evaluating both robustness and overhead.

2.4. Research Gap

Although previous studies have examined federated vehicular intrusion detection and adversarial robustness, several gaps remain. First, many evaluations compare only a small number of aggregation strategies and do not include stronger robust baselines such as Krum, Multi-Krum, and Geometric Median. Second, most studies focus on label-flipping or simple model poisoning, while backdoor attacks remain less explored for federated CAN bus anomaly detection. Third, malicious client ratio, learning rate, and Trimmed Mean beta sensitivity are often not fully analyzed, even though these factors can change conclusions about the strongest aggregation method. Fourth, server-side aggregation time is important in latency-sensitive CAV environments but is often omitted from robustness analysis.

To address these gaps, this study evaluates FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and GeoMed under clean training, label-flipping attacks, gradient-scaling attacks, and backdoor attacks. The evaluation also analyzes malicious client ratio, gradient-scaling strength, learning rate, Trimmed Mean beta, multi-seed reliability, and aggregation time.

3. System Model

We consider a federated learning-based anomaly detection framework deployed across a fleet of connected autonomous vehicles. The system consists of N vehicular clients and a central aggregation server that coordinates collaborative training. Each vehicle is equipped with onboard computing resources capable of collecting Controller Area Network (CAN) bus traffic and performing local model training.

The objective is to train a global intrusion detection model without requiring vehicles to share raw CAN traffic data. Instead, each client trains locally using its private dataset and transmits only model parameters or model updates to the aggregation server. This design preserves data privacy while reducing raw data exposure, consistent with recent privacy-preserving vehicular federated learning architectures [16].

Figure 1 illustrates the V2X communication environment and the high-level deployment context for federated anomaly detection in connected autonomous vehicle networks. The figure shows the interaction among connected vehicles, roadside infrastructure, edge devices, communication networks, and the central server.

3.1. Federated Learning Workflow

Training proceeds in synchronous communication rounds. At the beginning of each round, the server distributes the current global model to all clients. Each client performs local training on its CAN traffic dataset for several local epochs and then sends the updated model parameters back to the server.

The server aggregates the received updates using a predefined aggregation strategy to construct a new global model. In this study, the evaluated aggregation strategies include Federated Averaging (FedAvg), coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). The updated model is then broadcast to clients for the next training round. This iterative process continues until the specified number of communication rounds is completed.

3.2. Data Distribution

In practical vehicular environments, CAN traffic collected from different vehicles varies due to differences in driving behavior, hardware configurations, and surrounding conditions. Therefore, client datasets are assumed not to be independent and identically distributed (non-IID).

Each vehicle collects its own CAN traffic logs, and the proportion of benign and malicious samples may differ across clients. Such heterogeneous data distributions commonly arise in real-world federated learning deployments and represent realistic conditions for collaborative intrusion detection. In the experiments, this non-IID setting is represented through heterogeneous class distributions across clients, where different clients contain different benign and attack sample proportions. Specifically, client attack ratios are varied across clients to create label distribution heterogeneity.

This non-IID construction captures class distribution heterogeneity among vehicular clients. It does not fully represent all possible forms of real-world vehicular heterogeneity, such as different vehicle models, manufacturer-specific CAN semantics, or large feature distribution shifts across environments. These limitations are considered in the discussion of practical deployment and future work.

3.3. Client Composition

The federated system contains both benign and adversarial participants. Let $f$ denote the number of compromised clients, with $f < N$. Malicious clients follow the federated protocol but intentionally manipulate their training behavior to degrade the global model.

In our experimental setup, we simulate $N = 10$ clients, and the number of malicious clients is varied to evaluate the effect of adversarial participation. Specifically, malicious client ratios of 10%, 20%, 30%, and 40% are considered, corresponding to $f = 1$, $f = 2$, $f = 3$, and $f = 4$ malicious clients. The 40% setting is close to the robustness boundary of median-based aggregation, where the number of malicious clients must remain below half of the total client population.

3.4. Threat and Trust Assumptions

The aggregation server is assumed to be trusted and executes the selected aggregation rule correctly. The server does not know in advance which clients are malicious. Malicious clients may manipulate local labels, scale their submitted updates, or inject a backdoor trigger during local training. However, they do not control the server and do not directly modify benign client data.

The adversary is internal to the federated learning process because compromised clients are allowed to participate in training and submit model updates. This setting is relevant to CAV networks, where a compromised vehicle or edge node may appear as a legitimate participant while attempting to poison the global model.

3.5. Communication Assumptions

Communication between the aggregation server and vehicular clients is assumed to be authenticated and reliable. Although practical vehicular networks may experience latency or bandwidth constraints, this work focuses on the robustness of the learning framework under adversarial client behavior.

Model parameters exchanged during training are assumed to be protected against external tampering, consistent with secure vehicular federated learning architectures. To partially assess practical deployment cost, the experiments report aggregation time at the server for each aggregation method. A complete packet-level communication delay analysis is outside the scope of this study.

4. Methodology

4.1. System Overview

The V2X deployment context is shown in Figure 1, while the detailed federated anomaly detection workflow is shown in Figure 2. In this framework, vehicular clients train local anomaly detection models using private CAN bus data and send model updates to a central aggregation server. The server aggregates the received updates using the selected aggregation strategy and broadcasts the updated global model to clients for the next communication round. This section presents the federated optimization objective, aggregation strategies, training procedure, and adversarial threat model used in the study.

4.2. Federated Learning Formulation

Consider a federated learning environment with $N$ vehicular clients. Each client $i$ maintains a private local dataset

$$D_i = \{(x_{ij}, y_{ij})\}_{j=1}^{n_i},$$

where $x_{ij}$ is the feature vector extracted from CAN messages and $y_{ij} \in \{0, 1\}$ denotes benign or malicious traffic. Let $f_w(\cdot)$ denote the anomaly detection model parameterized by $w$. The local empirical loss at client $i$ is defined as

$$F_i(w) = \frac{1}{n_i} \sum_{j=1}^{n_i} \ell(f_w(x_{ij}), y_{ij}), \tag{1}$$

where $\ell(\cdot)$ is the training loss. The global federated objective is

$$\min_w F(w) = \sum_{i=1}^{N} \frac{n_i}{n} F_i(w), \quad n = \sum_{i=1}^{N} n_i. \tag{2}$$

At communication round $t$, the server broadcasts the current global model $w^t$ to all clients. Each client performs local training and returns updated parameters $w_i^{t+1}$. The local update is represented as

$$\Delta_i^t = w_i^{t+1} - w^t. \tag{3}$$

4.3. Aggregation Strategies

This study evaluates six aggregation strategies under clean and adversarial federated learning conditions.

4.3.1. Federated Averaging

Federated Averaging (FedAvg) computes a weighted average of client model parameters:

$$w^{t+1} = \sum_{i=1}^{N} \frac{n_i}{n} w_i^{t+1}. \tag{4}$$

FedAvg is efficient and widely used, but it can be strongly affected when malicious clients submit poisoned updates.
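As a rough illustration of Equation (4), the following NumPy sketch averages flattened client parameter vectors weighted by local sample counts. The function name `fedavg` and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of flattened client parameters (Eq. 4)."""
    W = np.asarray(client_weights, dtype=float)   # shape (N, d)
    n = np.asarray(client_sizes, dtype=float)     # shape (N,)
    return (n[:, None] * W).sum(axis=0) / n.sum()

# Toy example (hypothetical values): three similar clients and one outlier.
W = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [50.0, -50.0]])
sizes = [100, 100, 100, 100]
agg = fedavg(W, sizes)
print(agg)
```

In this toy case a single extreme submission drags the average far from the other three clients, which is the weakness the robust rules below try to limit.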

4.3.2. Coordinate-Wise Median

Coordinate-wise Median aggregation computes the median value independently for each model parameter dimension. Let $w_{i,j}^{t+1}$ denote the $j$-th parameter submitted by client $i$. The aggregated parameter is

$$w_j^{t+1} = \operatorname{median}\left(w_{1,j}^{t+1}, w_{2,j}^{t+1}, \dots, w_{N,j}^{t+1}\right). \tag{5}$$

The median operator is robust when the number of malicious clients satisfies

$$f < \frac{N}{2}. \tag{6}$$

This condition is important because the 40% malicious client setting with $N = 10$ and $f = 4$ is close to the robustness boundary of median-based aggregation. Therefore, lower malicious client ratios are also evaluated.
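The boundary in Equation (6) can be seen in a small NumPy sketch. The helper name `coordinate_median` and the synthetic values (honest parameters near 1, malicious parameters fixed at 100) are assumptions chosen only for illustration.

```python
import numpy as np

def coordinate_median(client_weights):
    """Median of each parameter across clients (Eq. 5)."""
    W = np.asarray(client_weights, dtype=float)   # shape (N, d)
    return np.median(W, axis=0)

rng = np.random.default_rng(0)
honest = 1.0 + 0.01 * rng.standard_normal((6, 3))   # 6 benign clients

# N = 10, f = 4: the two middle values of every coordinate are still benign.
agg_f4 = coordinate_median(np.vstack([honest, np.full((4, 3), 100.0)]))

# N = 10, f = 5: the condition f < N/2 fails and the median is pulled away.
agg_f5 = coordinate_median(np.vstack([honest[:5], np.full((5, 3), 100.0)]))
print(agg_f4, agg_f5)
```

With four malicious clients the result stays inside the range of benign values; with five it lands roughly halfway between the benign and malicious values.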

4.3.3. Trimmed Mean

Trimmed Mean aggregation removes extreme values before averaging. Let $\beta$ denote the trimming fraction and $k = \lfloor \beta N \rfloor$. After sorting the submitted values for parameter $j$, the aggregated value is

$$w_j^{t+1} = \frac{1}{N - 2k} \sum_{i=k+1}^{N-k} w_{(i),j}^{t+1}, \tag{7}$$

where $w_{(i),j}^{t+1}$ is the $i$-th sorted value for parameter $j$. Since the performance of Trimmed Mean depends on $\beta$, the experiments include a beta sensitivity analysis.
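A minimal sketch of Equation (7) helps show why the choice of the trimming fraction matters. The function name `trimmed_mean` and the toy data are assumptions; the example further assumes that all malicious values fall at the same extreme of each coordinate.

```python
import numpy as np

def trimmed_mean(client_weights, beta):
    """Drop the k = floor(beta * N) smallest and largest values per coordinate (Eq. 7)."""
    W = np.sort(np.asarray(client_weights, dtype=float), axis=0)
    N = W.shape[0]
    k = int(np.floor(beta * N + 1e-9))   # small guard against float round-off
    if 2 * k >= N:
        raise ValueError("beta trims away every client")
    return W[k:N - k].mean(axis=0)

# N = 10 with f = 4 one-sided outliers (hypothetical values).
W = np.vstack([np.ones((6, 1)), np.full((4, 1), 100.0)])
for beta in (0.1, 0.2, 0.3, 0.4):
    print(beta, trimmed_mean(W, beta))
```

Under this one-sided assumption, trimming fractions of 0.1, 0.2, and 0.3 leave three, two, and one of the four malicious values in the average, and only a fraction of 0.4 removes them all.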

4.3.4. Krum

Krum is a distance-based Byzantine-resilient aggregation method that selects one submitted update whose distance to neighboring updates is smallest [11]. For each client update $w_i^{t+1}$, Krum computes the squared Euclidean distance to other updates and forms the neighbor set $\mathcal{N}_i$ containing the $N - f - 2$ closest updates. The Krum score is

$$s_i = \sum_{j \in \mathcal{N}_i} \left\| w_i^{t+1} - w_j^{t+1} \right\|_2^2. \tag{8}$$

The selected update is

$$i^* = \arg\min_i s_i, \quad w^{t+1} = w_{i^*}^{t+1}. \tag{9}$$

Krum can reject isolated malicious updates, but it uses only one selected update per round.
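The scoring and selection steps in Equations (8) and (9) can be sketched as follows. The names `krum_scores` and `krum` and the synthetic client vectors are illustrative assumptions rather than the code used in the study.

```python
import numpy as np

def krum_scores(W, f):
    """Krum score per client: sum of squared distances to its N - f - 2 nearest peers (Eq. 8)."""
    N = W.shape[0]
    m = N - f - 2
    if m < 1:
        raise ValueError("Krum needs N - f - 2 >= 1")
    d2 = ((W[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                               # exclude self-distance
    return np.sort(d2, axis=1)[:, :m].sum(axis=1)

def krum(client_weights, f):
    """Return the single submitted vector with the lowest Krum score (Eq. 9)."""
    W = np.asarray(client_weights, dtype=float)
    return W[int(np.argmin(krum_scores(W, f)))]

rng = np.random.default_rng(0)
honest = 1.0 + 0.01 * rng.standard_normal((6, 2))
malicious = np.full((4, 2), 100.0)
selected = krum(np.vstack([honest, malicious]), f=4)
print(selected)
```

In this toy setting a benign vector is selected. Note that the original Krum analysis assumes $2f + 2 < N$, which $N = 10$ and $f = 4$ do not satisfy, so the example is illustrative rather than a guarantee.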

4.3.5. Multi-Krum

Multi-Krum extends Krum by selecting multiple client updates with the lowest Krum scores and averaging them [11]. If $\mathcal{M}$ is the selected set, the aggregated model is

$$w^{t+1} = \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} w_i^{t+1}. \tag{10}$$

This preserves distance-based filtering while using more benign information than single-update Krum.
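A possible sketch of Equation (10) is shown below. The text does not state how many updates are selected, so the default of $N - f$ used here is an assumption, as are the function names and the toy data.

```python
import numpy as np

def krum_scores(W, f):
    """Sum of squared distances from each client to its N - f - 2 nearest peers."""
    N = W.shape[0]
    m = N - f - 2
    if m < 1:
        raise ValueError("Krum needs N - f - 2 >= 1")
    d2 = ((W[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)   # exclude self-distance
    return np.sort(d2, axis=1)[:, :m].sum(axis=1)

def multi_krum(client_weights, f, n_select=None):
    """Average the n_select lowest-score updates (Eq. 10)."""
    W = np.asarray(client_weights, dtype=float)
    if n_select is None:
        n_select = W.shape[0] - f   # assumed default; not specified in the text
    chosen = np.argsort(krum_scores(W, f))[:n_select]
    return W[chosen].mean(axis=0), chosen

rng = np.random.default_rng(0)
honest = 1.0 + 0.01 * rng.standard_normal((6, 2))
malicious = np.full((4, 2), 100.0)
agg, chosen = multi_krum(np.vstack([honest, malicious]), f=4)
print(agg, sorted(chosen.tolist()))
```

Averaging several low-score updates keeps the distance-based filter while reducing the variance of relying on a single client.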

4.3.6. Geometric Median

Geometric Median (GeoMed) aggregation computes an aggregate model that minimizes the total distance to submitted client models [25]:

$$w^{t+1} = \arg\min_z \sum_{i=1}^{N} \left\| z - w_i^{t+1} \right\|_2. \tag{11}$$

Unlike coordinate-wise Median, GeoMed treats each submitted model as a full parameter vector and reduces the effect of updates that are far from the central group of client submissions.
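Equation (11) has no closed-form solution in general. One common way to approximate it is Weiszfeld's iteration; the text does not say which solver was used, so the sketch below is an assumption, including the iteration count, tolerance, and toy data.

```python
import numpy as np

def geometric_median(client_weights, n_iter=200, tol=1e-8, eps=1e-12):
    """Approximate the minimizer of Eq. 11 with Weiszfeld's fixed-point iteration."""
    W = np.asarray(client_weights, dtype=float)   # shape (N, d)
    z = W.mean(axis=0)                            # start from the plain average
    for _ in range(n_iter):
        dist = np.linalg.norm(W - z, axis=1)
        inv = 1.0 / np.maximum(dist, eps)         # eps avoids division by zero
        z_new = (inv[:, None] * W).sum(axis=0) / inv.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

rng = np.random.default_rng(0)
honest = 1.0 + 0.01 * rng.standard_normal((6, 2))
malicious = np.full((4, 2), 100.0)
agg = geometric_median(np.vstack([honest, malicious]))
print(agg)
```

Each step is a reweighted average in which far-away submissions receive small weights, which is why the result settles near the benign majority in this toy example. The repeated passes over all clients also hint at why GeoMed can cost more server time than a single average.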

4.4. Federated Training Procedure

The training process follows synchronous communication rounds. In each round, the server broadcasts $w^t$ to all clients. Each client trains locally on its private CAN traffic data and sends $w_i^{t+1}$ to the server. The server then applies one of the evaluated aggregation strategies: FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, or GeoMed. The resulting global model $w^{t+1}$ is used in the next round. The process continues until the specified number of communication rounds is completed.

4.5. Threat Model

We consider an internal adversary that controls a subset of participating clients. The aggregation server is trusted and executes the selected aggregation rule correctly, but it does not know which clients are malicious. Let $f$ denote the number of adversarial clients, with $f < N$. The experiments evaluate malicious client ratios of 10%, 20%, 30%, and 40%. Three attack strategies are considered.

Label-Flipping Attack: Malicious clients alter local labels during training by, for example, switching benign labels to attack labels and attack labels to benign labels. This corrupts the local training objective and produces misleading model updates.
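For binary labels, the flipping described above amounts to replacing each label with its complement. The following sketch assumes labels encoded as 0 (benign) and 1 (attack); the function name and the `malicious_ids` set are hypothetical.

```python
import numpy as np

def flip_labels(y):
    """Swap benign (0) and attack (1) labels for a malicious client's local data."""
    return 1 - np.asarray(y)

def local_labels(client_id, y, malicious_ids):
    """Return the labels a client actually trains on."""
    return flip_labels(y) if client_id in malicious_ids else np.asarray(y)

y = np.array([0, 1, 1, 0])
print(local_labels(0, y, malicious_ids={0}))   # flipped
print(local_labels(1, y, malicious_ids={0}))   # unchanged
```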

Gradient-Scaling Attack: Malicious clients amplify their submitted updates. Given the local update $\Delta_i^t$, a malicious client submits

$$\tilde{\Delta}_i^t = \gamma \Delta_i^t, \quad \gamma > 1, \tag{12}$$

and the corresponding poisoned model becomes

$$\tilde{w}_i^{t+1} = w^t + \tilde{\Delta}_i^t. \tag{13}$$

The scaling factor $\gamma$ controls the attack strength.
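The effect of Equations (12) and (13) on plain averaging can be worked through numerically. The sketch below assumes equal client weights and, for simplicity, that every client computes the same local update; both are simplifications for illustration only.

```python
import numpy as np

def scale_update(w_global, w_local, gamma):
    """Poisoned model: global model plus the local update scaled by gamma (Eqs. 12-13)."""
    return w_global + gamma * (w_local - w_global)

w_global = np.zeros(3)
delta = np.array([0.1, -0.2, 0.05])           # hypothetical local update
w_honest = w_global + delta
w_poisoned = scale_update(w_global, w_honest, gamma=5.0)

# N = 10 clients, f = 4 malicious, unweighted average of submitted models.
submitted = np.vstack([w_honest] * 6 + [w_poisoned] * 4)
avg_update = submitted.mean(axis=0) - w_global
print(avg_update / delta)
```

Under these assumptions the averaged step is $(6 + 4 \cdot 5)/10 = 2.6$ times the benign step, so a minority of clients dominates the aggregate.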

Backdoor Attack: In the backdoor attack, malicious clients poison a fraction of their local training samples by applying a predefined feature-level trigger and assigning the triggered samples to a target label. Let $T(\cdot)$ denote the trigger function and $y_b$ denote the adversary-selected target label. A poisoned sample is represented as

$$(x', y') = (T(x), y_b). \tag{14}$$

The backdoor attack is evaluated using attack success rate (ASR), defined as

$$\mathrm{ASR} = \frac{\text{Number of triggered samples classified as the target label}}{\text{Total number of triggered samples}}. \tag{15}$$

A lower ASR indicates stronger resistance to the backdoor trigger. ASR is reported together with clean classification performance to evaluate both normal detection capability and targeted attack resistance.
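Equations (14) and (15) can be sketched as below. The concrete trigger is not specified in this section, so overwriting one feature with a fixed value (`feature_idx=4`, `trigger_value=3.0`) is a hypothetical stand-in, as are the function names.

```python
import numpy as np

def apply_trigger(X, feature_idx=4, trigger_value=3.0):
    """Hypothetical trigger T(x): overwrite one feature with a fixed value."""
    Xt = np.array(X, dtype=float, copy=True)
    Xt[:, feature_idx] = trigger_value
    return Xt

def poison_client_data(X, y, poison_fraction, target_label, rng):
    """Trigger a random fraction of samples and relabel them to the target (Eq. 14)."""
    X = np.array(X, dtype=float, copy=True)
    y = np.array(y, copy=True)
    n_poison = int(round(poison_fraction * len(X)))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    X[idx] = apply_trigger(X[idx])
    y[idx] = target_label
    return X, y, idx

def attack_success_rate(pred_on_triggered, target_label):
    """Share of triggered samples predicted as the target label (Eq. 15)."""
    return float(np.mean(np.asarray(pred_on_triggered) == target_label))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 12))
y = np.ones(100, dtype=int)                        # all attack samples
Xp, yp, idx = poison_client_data(X, y, 0.2, 0, rng)
asr = attack_success_rate([0, 0, 1, 0], target_label=0)
print(len(idx), asr)
```

With a target label of 0, a successful backdoor makes triggered attack traffic look benign, which is why ASR is read alongside the false negative rate.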

5. Experimental Setup

This section describes the dataset, preprocessing, model architecture, federated configuration, attack settings, sensitivity analyses, and evaluation metrics used to assess the robustness of the federated anomaly detection framework. All experiments were conducted in a simulated federated learning environment where vehicular clients collaboratively train an anomaly detection model using CAN bus traffic data.

5.1. Dataset and Preprocessing

Experiments were performed using the CAN-HCRL-OTIDS dataset, which contains labeled CAN bus traffic collected from vehicular environments. The dataset includes normal communication and malicious traffic generated under different attack scenarios, making it suitable for evaluating CAN bus intrusion detection.

Four source files were used: dataset.csv, dataset1.csv, dataset2.csv, and dataset3.csv. These files were merged into a unified binary classification dataset, where target label 0 was mapped to the benign class and target labels 1, 2, and 3 were mapped to the attack class. The merged dataset contains 4,613,439 CAN samples, including 2,369,398 benign samples and 2,244,041 attack samples.

To preserve temporal independence, each source file was divided using a chronological 80% training and 20% testing split before merging the partitions. The final split contains 3,690,750 training samples and 922,689 testing samples. Each CAN message is represented using 12 packet-level features: TS, ID1, ID0, LEN, and payload bytes DLC0 to DLC7. Hexadecimal fields were converted into numerical values, missing values were handled during preprocessing, and all features were normalized using StandardScaler. The same preprocessing pipeline was applied consistently to the chronological training and testing partitions. Table 1 summarizes the dataset characteristics used in the experiments.
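The preprocessing steps above can be sketched with NumPy alone. The helper names are hypothetical, the toy arrays stand in for the four source files, and fitting the scaling statistics on the training partition only is an assumption (the standardization mirrors what StandardScaler computes: zero mean and unit population variance per feature).

```python
import numpy as np

def hex_to_int(field):
    """Convert a hexadecimal CAN field such as '1A' to an integer."""
    return int(str(field), 16)

def to_binary(labels):
    """Map label 0 to benign (0) and labels 1, 2, 3 to attack (1)."""
    return (np.asarray(labels) > 0).astype(int)

def chronological_split(X, y, train_frac=0.8):
    """First 80% of rows (in recorded order) for training, the rest for testing."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]

def standardize(X_train, X_test):
    """Zero-mean, unit-variance scaling using training statistics (assumed)."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    std[std == 0] = 1.0   # leave constant features unscaled
    return (X_train - mean) / std, (X_test - mean) / std

# Two toy "files" with 12 features each (hypothetical data).
rng = np.random.default_rng(0)
files = [(rng.normal(size=(10, 12)), rng.integers(0, 4, size=10)),
         (rng.normal(size=(5, 12)), rng.integers(0, 4, size=5))]
parts = [chronological_split(X, to_binary(y)) for X, y in files]   # split per file
X_train = np.vstack([p[0] for p in parts])                          # then merge
y_train = np.concatenate([p[1] for p in parts])
X_test = np.vstack([p[2] for p in parts])
y_test = np.concatenate([p[3] for p in parts])
X_train_s, X_test_s = standardize(X_train, X_test)
print(X_train_s.shape, X_test_s.shape, hex_to_int("1A"))
```

Splitting each file before merging keeps later traffic from a capture out of the training partition for that same capture.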

5.2. Model Architecture

The anomaly detection model is implemented as a lightweight multilayer perceptron (MLP) for binary classification. The network receives a 12-dimensional CAN feature vector and uses two fully connected hidden layers with 32 and 16 neurons, respectively, followed by a two-neuron softmax output layer. ReLU activation is used in the hidden layers.

This architecture is computationally efficient and suitable for resource-constrained vehicular settings. Although CNN, LSTM, and Transformer-based models may capture richer spatial or temporal patterns, the MLP is used here to keep the focus on robustness differences among aggregation strategies under identical model capacity and adversarial conditions.
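To give a sense of the model size, the sketch below builds the 12-32-16-2 network as a NumPy forward pass and counts its parameters. The initialization scheme and function names are assumptions; the study's training framework is not stated in this section.

```python
import numpy as np

LAYER_SIZES = [12, 32, 16, 2]   # input, two hidden layers, softmax output

def init_params(rng):
    """He-style random weights and zero biases (initialization is assumed)."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def forward(params, X):
    """ReLU hidden layers followed by a two-class softmax."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def flatten(params):
    """Concatenate all weights and biases into one vector for aggregation."""
    return np.concatenate([a.ravel() for W, b in params for a in (W, b)])

rng = np.random.default_rng(0)
params = init_params(rng)
probs = forward(params, rng.standard_normal((5, 12)))
n_params = flatten(params).size
print(n_params, probs.shape)
```

The parameter count is $(12 \cdot 32 + 32) + (32 \cdot 16 + 16) + (16 \cdot 2 + 2) = 978$, so each client submission is a vector of fewer than a thousand values.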

5.3. Federated Learning Configuration

To simulate a collaborative vehicular environment, the dataset was partitioned across multiple clients representing individual vehicles. Each client trains locally using private data and shares only model parameters with the aggregation server. A non-IID setting was created by assigning different benign and attack proportions across clients, with client attack ratios increasing from 0.10 to 0.90.

The experiments use 10 clients, with 20,000 local samples assigned to each client and 100,000 test samples used for evaluation. Malicious-client ratios of 10%, 20%, 30%, and 40% are evaluated, corresponding to 1, 2, 3, and 4 malicious clients. The default adversarial setting uses 40% malicious clients. Table 2 summarizes the federated learning configuration.
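One way to build such a label-skewed partition is sketched below. Evenly spaced attack ratios between 0.10 and 0.90 are an assumption (the text only says the ratios increase over that range), and the example uses 100 samples per client instead of 20,000 to stay small.

```python
import numpy as np

def partition_non_iid(y, samples_per_client, attack_ratios, rng):
    """Give each client a disjoint sample set with its own attack proportion."""
    y = np.asarray(y)
    benign = rng.permutation(np.flatnonzero(y == 0))
    attack = rng.permutation(np.flatnonzero(y == 1))
    b_pos = a_pos = 0
    clients = []
    for r in attack_ratios:
        n_att = int(round(r * samples_per_client))
        n_ben = samples_per_client - n_att
        if a_pos + n_att > len(attack) or b_pos + n_ben > len(benign):
            raise ValueError("not enough samples for the requested partition")
        idx = np.concatenate([benign[b_pos:b_pos + n_ben],
                              attack[a_pos:a_pos + n_att]])
        b_pos, a_pos = b_pos + n_ben, a_pos + n_att
        clients.append(rng.permutation(idx))
    return clients

rng = np.random.default_rng(42)
y = np.concatenate([np.zeros(1000, dtype=int), np.ones(1000, dtype=int)])
ratios = np.linspace(0.10, 0.90, 10)          # assumed even spacing
clients = partition_non_iid(y, 100, ratios, rng)
fractions = [float(y[c].mean()) for c in clients]
print(np.round(fractions, 2))
```

With ten clients and ratios averaging 0.5, the partition draws equal numbers of benign and attack samples overall while each client sees a different class balance.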

5.4. Attack and Sensitivity Settings

Six aggregation strategies are evaluated: FedAvg, coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). The attack evaluation includes label flipping, gradient scaling, and a feature-triggered backdoor attack.

In the label-flipping attack, malicious clients switch benign and attack labels during local training. This attack is evaluated under malicious client ratios of 10%, 20%, 30%, and 40%. In the gradient-scaling attack, malicious clients amplify their local updates before sending them to the server:

$$\tilde{w}_{i} = w^{t} + \gamma \left(w_{i} - w^{t}\right), \tag{16}$$

where $\gamma$ controls attack strength. Gradient-scaling sensitivity is evaluated using $\gamma = 2, 5, 10$, and the default setting uses 40% malicious clients with $\gamma = 5$.
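
Equation (16) can be sketched directly on flattened weight vectors. The example below is illustrative only: the vectors are made-up numbers and `scale_update` is a hypothetical helper name.

```python
def scale_update(global_weights, local_weights, gamma):
    """Eq. (16): w_tilde = w_t + gamma * (w_i - w_t), applied per coordinate."""
    return [w_t + gamma * (w_i - w_t) for w_t, w_i in zip(global_weights, local_weights)]

global_w = [0.0, 1.0, -2.0]  # hypothetical global model w^t
local_w = [0.5, 1.0, -1.0]   # hypothetical honest local model w_i

for gamma in (2, 5, 10):
    print(gamma, scale_update(global_w, local_w, gamma))
```

Setting `gamma = 1` returns the unmodified local model, so the attack strength is the factor by which the local change `w_i - w^t` is stretched.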

In the backdoor attack, malicious clients poison a fraction of local training samples by inserting a feature-level trigger and assigning triggered samples to a target label. The backdoor setting uses 40% malicious clients, a poison fraction of 0.2, and target label 0.
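
The two data-level attacks can be sketched as simple transformations of a client's local dataset. The text does not specify the trigger pattern, so `trigger_index` and `trigger_value` below are hypothetical placeholders; only the poison fraction of 0.2 and target label 0 come from the text.

```python
import random

def flip_labels(labels):
    """Label flipping for binary labels: benign (0) becomes attack (1) and vice versa."""
    return [1 - y for y in labels]

def poison_backdoor(features, labels, poison_fraction=0.2, target_label=0,
                    trigger_index=0, trigger_value=1.0, seed=42):
    """Stamp a feature-level trigger on a random subset and relabel it to the target."""
    rng = random.Random(seed)
    n_poison = int(poison_fraction * len(features))
    chosen = set(rng.sample(range(len(features)), n_poison))
    new_x, new_y = [], []
    for i, (x, y) in enumerate(zip(features, labels)):
        x = list(x)  # copy so the caller's data is not modified
        if i in chosen:
            x[trigger_index] = trigger_value  # hypothetical trigger pattern
            y = target_label
        new_x.append(x)
        new_y.append(y)
    return new_x, new_y, chosen

X = [[0.0] * 12 for _ in range(10)]  # ten dummy 12-feature samples
y = [1] * 10                         # all labelled as attack
px, py, chosen = poison_backdoor(X, y)
print(sorted(chosen), py)
print(flip_labels([0, 1, 1]))
```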

Additional sensitivity analyses were conducted for learning rate values of 0.0005, 0.001, and 0.005, and Trimmed Mean beta values of $\beta = 0.1$, $\beta = 0.2$, and $\beta = 0.3$. To reduce dependence on a single random run, multi-seed reliability analysis was performed using seeds 42, 123, and 2026.

5.5. Evaluation Metrics

Model performance was evaluated using accuracy, precision, recall, F1-score, ROC-AUC, and false negative rate (FNR). These metrics measure both overall classification performance and the ability to detect malicious CAN traffic while minimizing missed attacks. For the backdoor attack, attack success rate (ASR) is also reported, where lower ASR indicates stronger resistance to triggered misclassification. Server-side aggregation time is measured to assess computational overhead across aggregation strategies.
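
For reference, the threshold-based metrics can be computed from confusion-matrix counts as sketched below, with attack traffic treated as the positive class. The counts are hypothetical, ROC-AUC is omitted because it needs predicted scores rather than counts, and the ASR helper assumes one common definition (the fraction of triggered samples classified as the target label), which may differ in detail from the definition used in the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1, and FNR with attack as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fnr": fn / (fn + tp),  # missed attacks; equals 1 - recall
    }

def attack_success_rate(predictions_on_triggered, target_label=0):
    """Share of triggered samples that the model assigns to the target label."""
    hits = sum(1 for p in predictions_on_triggered if p == target_label)
    return hits / len(predictions_on_triggered)

m = detection_metrics(tp=80, fp=10, tn=90, fn=20)  # hypothetical counts
asr = attack_success_rate([0, 0, 1, 0])            # hypothetical predictions
print(m)
print(asr)
```

Because FNR equals one minus recall, the Recall and FNR columns of Table 3 sum to 1 in every row.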

6. Results and Discussion

This section evaluates the federated anomaly detection framework under clean and adversarial settings. The analysis covers main performance, malicious client ratio sensitivity, gradient-scaling strength, backdoor resistance, hyperparameter sensitivity, multi-seed reliability, and server-side aggregation overhead.

6.1. Overall Performance Under Clean and Adversarial Settings

We first evaluate the model under clean federated training and then under label-flipping and gradient-scaling attacks with 40% malicious clients. Table 3 reports the final results after 30 communication rounds.

Under clean training, all methods achieve similar F1-scores between 0.9055 and 0.9113, showing that the selected MLP and CAN features provide a stable baseline. Under label flipping, FedAvg drops to an F1-score of 0.6932, while Multi-Krum achieves the best F1-score of 0.8865, followed by GeoMed with 0.8804. Under gradient scaling, FedAvg and Trimmed Mean degrade strongly, whereas GeoMed and Median remain stable with F1-scores of 0.9134 and 0.9108. These results show that robust aggregation is necessary, but the strongest method depends on the attack type.

6.2. Malicious Client Ratio and Gradient-Scaling Sensitivity

To examine the effect of adversarial participation, label-flipping attacks were evaluated under 10%, 20%, 30%, and 40% malicious client ratios. Table 4 and Figure 3 summarize the F1-score trend.

FedAvg declines continuously as the malicious client ratio increases, while Multi-Krum and GeoMed remain more stable. Median also degrades near the 40% setting because this ratio approaches the robustness boundary of median-based aggregation.
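
The robustness boundary mentioned above can be illustrated on a single model coordinate. The numbers below are hypothetical and are not taken from the experiments: ten clients report one value each, and a growing number of them are replaced by an extreme poisoned value.

```python
from statistics import median

# Hypothetical honest values for a single model coordinate across 10 clients.
honest_values = [round(0.80 + 0.05 * i, 2) for i in range(10)]  # 0.80 ... 1.25
POISONED_VALUE = -5.0  # hypothetical value submitted by every malicious client

medians = {}
for num_malicious in range(0, 6):
    values = honest_values[: 10 - num_malicious] + [POISONED_VALUE] * num_malicious
    medians[num_malicious] = median(values)
    print(f"{num_malicious} malicious -> median {medians[num_malicious]:.3f}")
```

In this toy setting the median stays inside the honest range up to four malicious clients, but it is already pulled to the lowest honest values at 40%, and it leaves the honest range once half of the clients are malicious. Label flipping in the experiments poisons training data rather than submitting extreme values, so this is only an intuition for why 40% is close to the limit.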

Gradient-scaling sensitivity was evaluated using $\gamma = 2$, $\gamma = 5$, and $\gamma = 10$ with 40% malicious clients. Table 5 and Figure 4 show that FedAvg is stable only at the weaker scaling level, while Median and GeoMed remain robust as attack strength increases.
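
A short derivation suggests why FedAvg tolerates only weak scaling. It assumes that FedAvg weights all clients equally, which holds for both plain and sample-weighted averaging here because every client has 20,000 samples. Write $\Delta_{i} = w_{i} - w^{t}$, let $H$ and $M$ be the sets of honest and malicious clients, and let $n = |H| + |M|$. Substituting Equation (16) for the malicious clients gives

$$w^{t+1} - w^{t} = \frac{1}{n} \sum_{i \in H} \Delta_{i} + \frac{\gamma}{n} \sum_{i \in M} \Delta_{i}.$$

The total weight placed on malicious updates is therefore $\gamma |M| / n$, compared with $|H| / n$ for honest updates. With $n = 10$ and $|M| = 4$, the honest weight is $0.6$, while the malicious weight is $0.8$ for $\gamma = 2$, $2.0$ for $\gamma = 5$, and $4.0$ for $\gamma = 10$. At $\gamma = 5$ the scaled updates carry more than three times the combined weight of the honest updates, which is broadly in line with the drop in FedAvg F1-score between $\gamma = 2$ and $\gamma = 5$ in Table 5.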

6.3. Backdoor Attack Evaluation

Backdoor attacks were evaluated because they measure targeted failure rather than only general detection degradation. Malicious clients poisoned 20% of local training samples using a feature-level trigger and target label 0. The malicious client ratio was fixed at 40%. Figure 5 shows the attack success rate of each aggregation method under the feature-triggered backdoor attack, and Table 6 reports the corresponding clean performance and ASR values.

FedAvg, Median, and GeoMed maintain high clean F1-scores but also show high ASR values above 0.95. This indicates that clean performance alone is insufficient to establish backdoor robustness. Multi-Krum reduces ASR to 0.0411, showing the strongest resistance to the triggered attack, although its clean F1-score is slightly lower.

6.4. Hyperparameter Sensitivity and Multi-Seed Reliability

Learning rate sensitivity and Trimmed Mean beta sensitivity were evaluated to ensure that the conclusions were not caused by a single parameter choice. The detailed trends are shown in Figure 6 and Figure 7, while Table 7 reports the key F1-score values.

FedAvg remains weak under gradient scaling across all learning rates. Median and GeoMed remain substantially stronger, with the best F1-scores observed around the default learning rate. Trimmed Mean is highly sensitive to $\beta$: increasing $\beta$ from 0.1 to 0.3 improves the F1-score from 0.6636 to 0.7537 under label flipping and from 0.6547 to 0.8311 under gradient scaling.
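
One plausible reading of this sensitivity can be shown on a single coordinate. The sketch assumes the common definition in which $\beta$ is the fraction of values trimmed from each end; the definition in Section 4.3.3 takes precedence if it differs. The values are hypothetical, and $\beta = 0.4$ is included only for illustration, since it was not part of the evaluated settings.

```python
def trimmed_mean(values, beta):
    """Drop the floor(beta * n) smallest and largest values, then average the rest."""
    n = len(values)
    k = int(beta * n + 1e-9)  # small epsilon guards against float rounding
    if 2 * k >= n:
        raise ValueError("beta trims away every value")
    kept = sorted(values)[k : n - k]
    return sum(kept) / len(kept)

honest = [0.90, 0.95, 1.00, 1.05, 1.10, 1.15]  # hypothetical honest values
malicious = [6.0] * 4                          # hypothetical scaled values (40% of clients)
updates = honest + malicious

results = {}
for beta in (0.1, 0.2, 0.3, 0.4):
    k = int(beta * len(updates) + 1e-9)
    surviving = max(0, len(malicious) - k)  # valid here: all outliers sit at the top end
    results[beta] = (surviving, trimmed_mean(updates, beta))
    print(beta, results[beta])
```

With ten clients and four one-sided outliers, trimming one, two, or three values per side still leaves at least one outlier in the average, so the aggregate improves with larger $\beta$ but is not fully cleaned in this toy setting.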

To reduce dependence on a single run, multi-seed experiments were conducted using seeds 42, 123, and 2026. Table 8 reports F1-score mean and standard deviation values.

The multi-seed results support the main findings. FedAvg remains weak under both attacks. Under gradient scaling, Median, GeoMed, and Multi-Krum produce similar mean F1-scores, while under label flipping, Multi-Krum remains clearly strongest.

6.5. Aggregation Time and Comparative Discussion

Server-side aggregation time was measured to evaluate practical overhead. According to Table 3, Median is the fastest method, requiring about 0.0004 to 0.0005 s. Trimmed Mean requires about 0.0006 to 0.0013 s, FedAvg about 0.0011 s, GeoMed about 0.0024 to 0.0026 s, and Krum and Multi-Krum about 0.0032 to 0.0040 s because of their distance-based computations.
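
The distance-based cost can be seen in a minimal sketch of Krum and Multi-Krum. It assumes the standard scoring rule, in which each update is scored by the sum of squared distances to its n - f - 2 nearest neighbours; the definitions in Sections 4.3.4 and 4.3.5 take precedence if they differ. The example updates and the `num_selected` value are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flattened updates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def krum_scores(updates, num_malicious):
    """Score each update by its distances to the n - f - 2 closest other updates."""
    n = len(updates)
    num_neighbors = n - num_malicious - 2
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(squared_distance(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:num_neighbors]))
    return scores

def krum(updates, num_malicious):
    """Return the single update with the lowest Krum score."""
    scores = krum_scores(updates, num_malicious)
    return updates[min(range(len(updates)), key=scores.__getitem__)]

def multi_krum(updates, num_malicious, num_selected):
    """Average the num_selected updates with the lowest Krum scores."""
    scores = krum_scores(updates, num_malicious)
    best = sorted(range(len(updates)), key=scores.__getitem__)[:num_selected]
    dim = len(updates[0])
    return [sum(updates[i][d] for i in best) / num_selected for d in range(dim)]

honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0], [0.95, 1.05]]
malicious = [[10.0, 10.0]] * 4  # hypothetical colluding updates
updates = honest + malicious

selected = krum(updates, num_malicious=4)
averaged = multi_krum(updates, num_malicious=4, num_selected=6)
print(selected, averaged)
```

Scoring requires all pairwise distances, so the work grows quadratically with the number of clients, whereas averaging and coordinate-wise statistics grow linearly.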

The backdoor experiment shows the clearest trade-off between robustness and overhead. Multi-Krum requires more aggregation time, but it reduces ASR to 0.0411. Median is faster, but its ASR remains high at 0.9589. The practical choice of aggregation method should therefore depend on the expected attack model. Median and GeoMed are strong choices against gradient-scaling attacks, while Multi-Krum is more suitable when resistance to label-flipping or backdoor attacks is the priority.

Overall, no single aggregation method dominates across all attack models. FedAvg performs competitively under clean training but is vulnerable under strong adversarial manipulation. Trimmed Mean can improve with a better beta value, but it remains parameter-sensitive. Median and GeoMed provide stable protection against scaled updates, while Multi-Krum provides stronger resistance to label-flipping and backdoor attacks. These results support attack-aware aggregation selection for federated CAV anomaly detection.

7. Conclusions

This study examined the robustness of federated learning for anomaly detection in connected autonomous vehicle networks using CAN bus traffic data. While federated learning enables collaborative model training without sharing raw data, its distributed nature introduces vulnerabilities when compromised clients submit poisoned model updates.

To investigate these challenges, we evaluated six aggregation strategies within a federated anomaly detection framework using the CAN-HCRL-OTIDS dataset: Federated Averaging (FedAvg), coordinate-wise Median, Trimmed Mean, Krum, Multi-Krum, and Geometric Median (GeoMed). The experiments considered three adversarial training scenarios: label-flipping attacks, gradient-scaling attacks with different attack intensities, and a feature-triggered backdoor attack.

The results show that FedAvg is highly vulnerable to adversarial manipulation, exhibiting substantial performance degradation under strong poisoning attacks. Trimmed Mean provides parameter-dependent robustness but remains sensitive to the selected trimming fraction. Median and GeoMed maintain stable performance under gradient-scaling attacks, with GeoMed achieving the strongest F1-score in the main gradient-scaling setting. Multi-Krum provides the strongest resistance to label-flipping and backdoor attacks, including a substantial reduction in attack success rate under the backdoor setting. These findings show that no single aggregation strategy is optimal across all evaluated attack scenarios.

These findings highlight the importance of robust aggregation mechanisms for deploying federated anomaly detection systems in safety-critical vehicular environments. For gradient-scaling attacks, Median and GeoMed provide strong robustness with relatively stable detection performance. For label-flipping and backdoor attacks, Multi-Krum offers stronger protection, although it introduces higher server-side aggregation cost. Therefore, the selection of an aggregation strategy should consider the expected attack model, detection reliability requirement, and computational overhead.

The study also provides additional evidence through malicious client ratio analysis, gradient scaling strength sensitivity, learning rate sensitivity, Trimmed Mean beta sensitivity, multi-seed reliability evaluation, and aggregation time measurement. These analyses strengthen the experimental basis of the findings and show how robustness changes under different adversarial and training conditions.

Future work will explore more sophisticated adversarial strategies, evaluate the framework under highly heterogeneous client data distributions, and investigate adaptive defense mechanisms such as trust-aware aggregation and dynamic client weighting to further strengthen federated learning security in intelligent transportation systems. Future work will also evaluate larger client populations, additional CAV datasets, temporal deep learning models, communication delay under realistic vehicular networks, and hybrid defenses that combine robust aggregation with privacy-preserving mechanisms such as differential privacy and secure aggregation.

Author Contributions

Conceptualization, A.N.; methodology, A.N. and A.Z.M.J.U.; software, A.N.; validation, A.N. and A.Z.M.J.U.; formal analysis, A.N.; investigation, A.N.; data curation, A.N.; writing—original draft preparation, A.N.; writing—review and editing, A.Z.M.J.U. and T.B.; visualization, A.N.; supervision, T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The CAN-HCRL-OTIDS dataset used in this study is publicly available from its official repository. Processed data and experimental results generated during the experiments are available from the corresponding author upon reasonable request.

Acknowledgments

The authors thank the developers of the CAN-HCRL-OTIDS dataset for making the dataset publicly available, which enabled the experimental evaluation conducted in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

CAN	Controller Area Network
CAV	Connected Autonomous Vehicle
FL	Federated Learning
FedAvg	Federated Averaging
IDS	Intrusion Detection System
MLP	Multilayer Perceptron
ROC-AUC	Area Under the Receiver Operating Characteristic Curve
FNR	False Negative Rate
ASR	Attack Success Rate
GeoMed	Geometric Median
IID	Independent and Identically Distributed
Non-IID	Non-Independent and Non-Identically Distributed
V2X	Vehicle-to-Everything


Figure 1. V2X-enabled connected autonomous vehicle environment for federated anomaly detection. Vehicles, roadside units, edge infrastructure, and a central server cooperate to support model training and anomaly detection, while local CAN traffic remains within each vehicle. Solid arrows indicate V2X communication links, and dashed red arrows indicate potential adversarial paths.

Figure 2. Federated anomaly detection workflow in V2X networks. Vehicular clients train local models using private CAN bus data and send model updates to the aggregation server. Red dashed arrows indicate poisoned updates from malicious clients, while black arrows indicate the normal training and global model update flow.

Figure 3. Effect of malicious client ratio on F1-score under label-flipping attacks.

Figure 4. F1-score under gradient-scaling attacks with different $\gamma$ values.

Figure 5. Attack success rate under the feature-triggered backdoor attack. Different bar colors distinguish the evaluated aggregation methods.

Figure 6. Learning rate sensitivity under gradient-scaling attack.

Figure 7. Trimmed Mean beta sensitivity under label-flipping and gradient-scaling attacks.

Table 1. Dataset characteristics used in the experiments.

Attribute	Value
Dataset	CAN-HCRL-OTIDS
Source files	`dataset.csv`, `dataset1.csv`, `dataset2.csv`, `dataset3.csv`
Total samples	4,613,439
Class distribution	2,369,398 benign and 2,244,041 attack samples
Class mapping	Target 0 as benign; targets 1, 2, and 3 as attack
Feature count	12
Split strategy	Chronological 80% training and 20% testing split per source file
Final split	3,690,750 training and 922,689 testing samples

Table 2. Federated learning configuration.

Parameter	Value
Learning paradigm	Federated learning
Number of clients	10
Client sample size	20,000 samples per client
Evaluation test size	100,000 samples
Non-IID construction	Client attack ratios from 0.10 to 0.90
Communication rounds	30
Local epochs	5
Batch size	1024
Learning rate	0.001
Optimizer	Adam
Malicious-client ratios	10%, 20%, 30%, and 40%
Random seeds	42, 123, and 2026

Table 3. Main performance comparison under clean training, label-flipping attack, and gradient-scaling attack. The adversarial settings use 40% malicious clients, and gradient scaling uses $\gamma = 5$.

Scenario	Method	Accuracy	Precision	Recall	F1-Score	ROC-AUC	FNR	Time (s)
Clean	Median	0.9200	0.9900	0.8442	0.9113	0.9544	0.1558	0.0005
Clean	Multi-Krum	0.9187	0.9867	0.8444	0.9100	0.9520	0.1556	0.0040
Clean	GeoMed	0.9172	0.9855	0.8423	0.9083	0.9526	0.1577	0.0024
Clean	Krum	0.9153	0.9752	0.8475	0.9069	0.9534	0.1525	0.0032
Clean	FedAvg	0.9149	0.9844	0.8385	0.9056	0.9512	0.1615	0.0011
Clean	Trimmed Mean	0.9148	0.9833	0.8391	0.9055	0.9531	0.1609	0.0013
Label flipping	Multi-Krum	0.8909	0.8979	0.8755	0.8865	0.9551	0.1245	0.0037
Label flipping	GeoMed	0.8846	0.8886	0.8724	0.8804	0.9507	0.1276	0.0026
Label flipping	Krum	0.8462	0.8035	0.9056	0.8515	0.9530	0.0944	0.0032
Label flipping	Median	0.8210	0.7561	0.9334	0.8355	0.9568	0.0666	0.0004
Label flipping	Trimmed Mean	0.6573	0.5911	0.9604	0.7318	0.8993	0.0396	0.0007
Label flipping	FedAvg	0.5792	0.5374	0.9762	0.6932	0.8875	0.0238	0.0011
Gradient scaling	GeoMed	0.9215	0.9858	0.8510	0.9134	0.9553	0.1490	0.0025
Gradient scaling	Median	0.9191	0.9823	0.8491	0.9108	0.9548	0.1509	0.0004
Gradient scaling	Multi-Krum	0.8942	0.9020	0.8780	0.8898	0.9553	0.1220	0.0038
Gradient scaling	Krum	0.8462	0.8035	0.9056	0.8515	0.9530	0.0944	0.0032
Gradient scaling	FedAvg	0.4893	0.4881	0.9987	0.6557	0.6984	0.0013	0.0011
Gradient scaling	Trimmed Mean	0.4869	0.4869	0.9995	0.6548	0.7256	0.0005	0.0006

Note: Bold values indicate the best-performing result within each scenario and metric. For FNR and aggregation time, lower values are better; for all other metrics, higher values are better.

Table 4. F1-score under label-flipping attacks with different malicious client ratios.

Ratio	FedAvg	Median	Multi-Krum	GeoMed
10%	0.8997	0.9081	0.9081	0.9081
20%	0.8335	0.8902	0.9049	0.9012
30%	0.7687	0.8616	0.8980	0.8973
40%	0.6932	0.8355	0.8865	0.8804

Note: Bold values indicate the highest F1-score for each malicious client ratio.

Table 5. F1-score under different gradient-scaling strengths.

$γ$	FedAvg	Median	GeoMed
2	0.9086	0.9074	0.9119
5	0.6557	0.9108	0.9134
10	0.6603	0.9106	0.9143

Note: Bold values indicate the highest F1-score for each gradient-scaling strength.

Table 6. Backdoor attack results with 40% malicious clients. Lower ASR indicates stronger resistance.

Method	Clean F1-Score	Clean ROC-AUC	ASR
FedAvg	0.9063	0.9518	0.9895
Median	0.9095	0.9537	0.9589
Multi-Krum	0.8973	0.9564	0.0411
GeoMed	0.9079	0.9532	0.9519

Note: Bold values indicate the best result for each metric. For ASR, lower values are better; for Clean F1-Score and Clean ROC-AUC, higher values are better.

Table 7. Compact hyperparameter sensitivity summary using F1-score.

Analysis	Setting	FedAvg	Median/GeoMed	Trimmed Mean
Learning rate	0.0005	0.6576	0.9062/0.9040	–
Learning rate	0.001	0.6546	0.9116/0.9083	–
Learning rate	0.005	0.6547	0.8910/0.9018	–
Beta, label flipping	0.1	–	–	0.6636
Beta, label flipping	0.2	–	–	0.6657
Beta, label flipping	0.3	–	–	0.7537
Beta, gradient scaling	0.1	–	–	0.6547
Beta, gradient scaling	0.2	–	–	0.6602
Beta, gradient scaling	0.3	–	–	0.8311

Table 8. Multi-seed reliability analysis using F1-score mean and standard deviation.

Attack Type	FedAvg	Median	GeoMed	Multi-Krum
Gradient scaling	$0.6547 \pm 0.0000$	$0.8568 \pm 0.0055$	$0.8609 \pm 0.0030$	$0.8610 \pm 0.0209$
Label flipping	$0.6616 \pm 0.0059$	$0.7280 \pm 0.0126$	$0.7407 \pm 0.0049$	$0.8525 \pm 0.0130$

Note: Bold values indicate the highest mean F1-score for each attack type.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Md Jalal Uddin, A.Z.; Nayeem, A.; Bhuiyan, T. Robust Federated Learning for Anomaly Detection in Connected Autonomous Vehicle Networks Under Adversarial Attacks. Automation 2026, 7, 80. https://doi.org/10.3390/automation7030080

