Article

Reputation-Aware Multi-Agent Cooperative Offloading Mechanism for Vehicular Network Attack Scenarios

1 School of Information Engineering, Chang’an University, Xi’an 710064, China
2 School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
* Authors to whom correspondence should be addressed.
Vehicles 2025, 7(4), 150; https://doi.org/10.3390/vehicles7040150
Submission received: 30 October 2025 / Revised: 29 November 2025 / Accepted: 2 December 2025 / Published: 4 December 2025
(This article belongs to the Special Issue V2X Communication)

Abstract

The air–ground integrated Internet of Vehicles (IoV), which incorporates unmanned aerial vehicles (UAVs), is a key component of a three-dimensional intelligent transportation system. Task offloading is crucial to improving the overall efficiency of the IoV. However, blackhole attacks and false-feedback attacks pose significant challenges to achieving secure and efficient offloading for heavily loaded roadside units (RSUs). To address this issue, this paper proposes a reputation-aware, multi-objective task offloading method. First, we define a set of multi-dimensional Quality of Service (QoS) metrics and combine K-means clustering with a lightweight Proximal Policy Optimization variant (Light-PPO) to realize fine-grained classification of offloading data packets. Second, we develop reputation assessment models for heterogeneous entities—RSUs, vehicles, and UAVs—to quantify node trustworthiness; at the same time, we formulate the RSU task offloading problem as a multi-objective optimization problem and employ the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to find optimal offloading strategies. Simulation results show that, under blackhole and false-feedback attack scenarios, the proposed method effectively improves task completion rate and substantially reduces task latency and energy consumption, achieving secure and efficient task offloading.

1. Introduction

The IoV is the core of future smart transportation, playing a significant role in applications such as autonomous driving, real-time traffic monitoring, and vehicle cooperative driving. However, with the increase in the number of vehicles and the surge in data traffic, the computational capabilities and communication load of RSUs are facing severe challenges. Task offloading is a critical technology to alleviate the load on RSUs and thereby reduce the computational pressure on IoV tasks [1,2,3,4,5].
In the field of IoV task offloading, researchers have primarily modeled and optimized from the following perspectives: First, semi-Markov decision processes are used to precisely characterize the state transitions and temporal characteristics of offloading decisions [6,7]. Second, game theory methods provide solutions for distributed resource competition by analyzing strategic interactions among offloading participants such as vehicles and RSUs [8,9]. Simultaneously, reinforcement learning techniques have been widely adopted, ranging from classic multi-armed bandits and Q-learning to emerging deep reinforcement learning, which enable the autonomous learning of optimal offloading strategies in unknown environments and adaptation to dynamic changes [10,11,12,13]. Finally, to address large-scale real-time demands, scholars have applied heuristic and metaheuristic algorithms such as genetic algorithms, particle swarm optimization, and simulated annealing to task scheduling and resource allocation, achieving near-optimal solutions through efficient search [14,15,16].
However, most existing offloading methods ignore the pervasive risk of network attacks in IoV, particularly blackhole attacks and false-feedback attacks. Under these attacks, malicious nodes may fail to execute tasks after receiving them, deliberately delay responses, or forge execution results, severely disrupting task offloading.
To address the aforementioned issues, this paper proposes a reputation-evaluation-based multi-agent collaborative task offloading mechanism for the IoV, termed the Task Offloading Mechanism with Reputation Evaluation (TOM-RE). The main contributions of this paper are summarized as follows:
  • A two-stage task-classification mechanism is proposed, leveraging multi-dimensional QoS descriptors and combining K-means clustering with a Light-PPO module to refine data packets within the top-level categories of emergency safety, safety assist, and non-real-time entertainment, thereby generating adaptive, service-aware subcategories.
  • Differentiated reputation evaluation mechanisms are designed for heterogeneous agents, including RSUs, vehicles, and UAVs. RSU reputations are computed using an LSTM-based model capturing temporal dependencies among service availability, data accuracy, and resource utilization, whereas vehicle and UAV reputations are derived from multi-metric indicators using deep reinforcement learning for adaptive weighting and exponential smoothing, yielding robust local and global trust assessments.
  • RSU task offloading is formulated as a multi-objective optimization problem that considers latency, energy consumption, and load balancing. NSGA-II is employed to approximate the Pareto-optimal front, enabling efficient and interpretable task scheduling in heterogeneous IoV environments.
The rest of this paper is organized as follows. Section 2 reviews the progress and related work on task offloading in vehicular networks. Section 3 describes the network model. Section 4 presents the proposed reputation-based task offloading framework. Section 5 details the experimental setup and results analysis. Finally, Section 6 concludes the paper and outlines future work.

2. Related Work

This section reviews related work in three areas of current task offloading research: (1) task offloading methods focused on reducing delay and energy consumption; (2) cooperative optimization offloading methods for balancing multiple objectives; and (3) task offloading methods integrating reputation modeling and security assurance mechanisms.

2.1. Task Offloading Methods Aiming to Minimize Delay and Energy Use

To sustain task offloading under high loads, Khabbaz et al. [17] proposed an R2V task offloading scheduling scheme with deadline constraints and built a stochastic modeling framework to characterize RSU behavior and evaluate runtime performance. Li et al. [18] formulated task offloading in vehicular networks as a multi-armed bandit problem and proposed a partial task offloading algorithm that exploits RSU-assisted learning. Fan et al. [19] presented a joint task offloading and resource allocation scheme that accounts for heterogeneous task types, vehicle categories, and flexible task processing; the algorithm, based on generalized Benders decomposition and reformulation-linearization methods, solves the optimization problem effectively. To meet the delay requirements of different tasks, Cao et al. [20] ranked heterogeneous tasks by priority, formulated an optimization problem that minimizes the weighted average system delay, and solved it with relay hopping and a differentiated task priority algorithm. However, most of these studies focus on reducing task processing delay and resource usage, and do not jointly optimize multiple objectives such as delay, load balancing, energy consumption, and communication cost.

2.2. Cooperative Optimization Offloading Methods for Balancing Multiple Goals

To ensure that offloaded tasks run efficiently and that computing resources are allocated properly in vehicular networks, researchers have studied the cooperative optimization of task offloading and resource management in recent years. Fan et al. [21] proposed a resource management scheme and designed a twin delayed deep deterministic policy gradient algorithm based on deep reinforcement learning (DRL), including an optimization subroutine that lowers the algorithm's training complexity. Fan et al. [22] proposed an edge–edge collaborative VEC joint task offloading and resource allocation method that minimizes the total task processing delay for all vehicles while ensuring that tasks finish within their delay limits and that each vehicle remains connected long enough. Liao et al. [23] devised a cooperative offloading strategy that progressively optimizes offloading decisions and computing resource allocation together. Xu et al. [24] designed a task offloading scheme based on the Takagi–Sugeno fuzzy neural network and game theory, called fuzzy task offloading and resource allocation. Wang et al. [25] built a decision-based vehicle offloading method, proved that it always reaches a Nash equilibrium by exploiting finite-improvement properties, and proposed a task migration algorithm and, on that basis, an offloading algorithm that minimizes computing cost. Mao et al. [26] proposed a task offloading mechanism for vehicular networks based on trusted RSU services; it improves the system's resilience to attacks during task offloading but does not consider untrusted RSUs. Lakhan et al. [27] suggested a security scheme based on fully homomorphic encryption, but such encryption often incurs high computational cost, which degrades system performance. Wang et al. [28] put forward a privacy-preserving vehicular edge computing architecture that protects privacy by perturbing context information with differential privacy techniques and uses a K-neighborhood joint optimization algorithm for task offloading and resource allocation to reduce the overall task execution delay. However, these methods mainly focus on reducing task processing delay, resource usage, and secure offloading, and pay insufficient attention to the reliability and efficiency of computing resource allocation.

2.3. Task Offloading Methods Combining Reputation Modeling and Security Guarantee Mechanisms

Parvini et al. [29] proposed two new multi-agent reinforcement learning frameworks based on the multi-agent deep deterministic policy gradient, but these frameworks still face coordination problems when decomposing tasks and allocating resources among multiple agents. Xu et al. [30] proposed a blockchain-based secure computation offloading scheduling scheme that uses triangular subjective logic and deep reinforcement learning for trust management; however, in dynamic environments it remains difficult to quickly and accurately detect malicious behavior and update trust information in real time. Cao et al. [31] suggested an evaluation mechanism based on security trust incentives that computes the security trust level through interactions between vehicle users and RSU base stations and uses a stable matching algorithm to find the best base station for each vehicle user; however, keeping the matching stable and efficient remains challenging when vehicles are mobile and task demands vary.
In existing research on task offloading in vehicular networks, on the one hand, most offloading schemes ignore potential threats from network attacks and malicious nodes: they do not adequately assess the trustworthiness of the nodes participating in offloading, which makes it difficult for the system to quickly identify and isolate unreliable nodes under blackhole or false-feedback attacks. On the other hand, under extremely heavy traffic, most methods do not jointly schedule across multiple resource constraints and node differences, and offloading efficiency in dynamic environments still needs further improvement.
To address the aforementioned issues, this paper proposes a multi-agent collaborative task offloading mechanism for the IoV based on reputation evaluation, namely the TOM-RE. This mechanism works as follows: First, multi-dimensional QoS parameters are designed. The K-means method combined with the Light-PPO method is adopted to further refine the classification of IoV data packets on the basis of three major categories [32], namely the emergency safety category, the safety assistance category, and the non-real-time entertainment category. Second, the trustworthiness of different agents, such as RSUs, vehicle nodes, and UAVs, is evaluated, respectively. For RSUs, a long short-term memory (LSTM) network is used for modeling to automatically capture the dynamic correlations among indicators such as service availability, data accuracy, and resource utilization rate, so as to evaluate the real-time reputation value of the RSU. Combined with its historical reputation value, the comprehensive reputation value is calculated. For vehicles and UAVs, multiple indicators such as task completion rate, processing delay, error rate, and feedback score are introduced. Deep reinforcement learning is used to train and adjust the weight of each indicator to calculate the real-time reputation value. Then, the local reputation is obtained by fusing the historical reputation with exponential smoothing. Combined with the weighted recommended reputation value, the global reputation value is finally obtained. Meanwhile, the task offloading problem of RSUs is transformed into a multi-objective optimization problem. The minimization of delay, the minimization of energy consumption, and load balancing are comprehensively considered. Under the constraints of task processing delay, agent reputation, and computing resources, the NSGA-II algorithm is used to solve the optimal offloading strategy, realizing efficient task offloading.

3. Network Attack Scenarios

The low-altitude integrated vehicular network consists of vehicles, RSUs, UAVs, and cloud platforms interconnected through heterogeneous wireless links, enabling various communication modes such as Vehicle-to-Vehicle, Vehicle-to-Infrastructure, Vehicle-to-Everything, and Vehicle-to-UAV. This multidimensional collaborative communication architecture significantly enhances the perception accuracy, information coverage, and task-handling flexibility of vehicular networks, thereby providing fundamental support for intelligent driving, road safety warning, environmental perception, and traffic optimization scheduling. However, with the rapid increase in the number of vehicular nodes and the explosive growth of perception, data-sharing, and computation tasks, the computation and communication resources of RSUs are becoming increasingly constrained. Consequently, the overall network load is growing exponentially, leading to rising task-processing latency and energy consumption [33].
Against this backdrop, task offloading has become a key approach to alleviating the computational burden on RSUs and improving overall network performance [34,35]. Through task offloading mechanisms, vehicular nodes can dynamically assign tasks to neighboring idle vehicles, RSUs, or UAVs, taking into account real-time network topology, link quality, task priority, and available resources, thus achieving distributed load balancing [36]. Among these nodes, UAVs act as aerial mobile edge nodes with the advantages of flexible deployment, high mobility, and relatively strong computational capacity. UAVs can play an essential role in regions with limited ground communication coverage, assisting vehicles and RSUs in task distribution, relay transmission, and computation offloading. This cooperation effectively enhances task timeliness and reliability.
Nevertheless, in UAV-assisted vehicular task offloading scenarios, security threats and trust challenges become significantly more severe. Due to the complexity of the multi-layer air–ground collaborative communication architecture, various potential attack surfaces and uncertainties exist within the network. On the one hand, UAVs, functioning as aerial relays or computational units, are more vulnerable to signal interference, spoofing, and falsified information injection. On the other hand, both UAVs and vehicles are characterized by high mobility and anonymity, which makes it easier for malicious entities to disguise themselves as trustworthy participants and conduct attacks such as blackhole attacks and false-feedback attacks. Moreover, due to privacy protection and data minimization principles, intermediate computational results and state information are often hidden or inaccessible during task execution, resulting in reduced transparency and auditability. This further complicates the detection, trust evaluation, and traceability of malicious behaviors.
Blackhole attacks and false-feedback attacks are two typical types of malicious behaviors in UAV-assisted vehicular network task offloading scenarios. Figure 1 illustrates how a blackhole node intercepts and drops task flows, while a false-feedback node contaminates computation results within this context.
The core of a blackhole attack [37,38] lies in malicious nodes falsifying routing information or falsely claiming superior computing and forwarding capabilities to mislead other nodes into offloading tasks to them. Once the tasks are received, these nodes neither forward nor execute them but directly discard the data packets or computation requests. In task offloading scenarios, blackhole attacks lead directly to task loss, reduced task completion rates, and increased end-to-end latency. More critically, in multi-hop cooperative offloading mechanisms, packet loss at a single point can trigger rollback, retransmission, or rescheduling, resulting in cascading bottlenecks that degrade service availability and real-time performance guarantees. In latency- and reliability-sensitive applications such as autonomous driving assistance and emergency situation coordination, such cascading effects may even cause functional interruptions or significantly increase safety risks.
The false-feedback attack [39,40] is equally destructive during the task offloading process. In this type of attack, an adversarial node, after receiving a task—whether or not it actually executes it—submits falsified high-quality results or exaggerated performance metrics to raise its reputation. In task offloading scenarios, such nodes can quickly accumulate high reputation scores in the short term, thereby gaining more high-value task allocations. However, in the long run, since these nodes do not truly perform the computations, the network suffers from numerous task failures, incorrect outputs, or hidden data pollution, leading to instability in the offloading process and unreliability of results. Moreover, false feedback distorts scheduling and resource allocation strategies, causing valuable computing resources that should have been assigned to honest vehicles, RSUs, or UAVs to be misused or wasted, thereby undermining the overall efficiency and trust foundation of the integrated air-ground vehicular network.
The main objective of this paper is to construct a secure and efficient task offloading mechanism in vehicular networks in the presence of blackhole attacks and false feedback attacks.

4. Task Offloading Model for Multi-Objective Optimization Based on Reputation Evaluation

To enable efficient and secure task offloading in vehicular network scenarios subject to blackhole and false-feedback attacks, this paper proposes a reputation-aware, multi-objective task offloading model named TOM-RE. The TOM-RE framework is illustrated in Figure 2 and comprises three main components: a two-level task classification module, a reputation assessment module, and a task offloading module. The two-level classification module refines tasks—initially grouped as emergency-safety, safety-assist, and non-real-time entertainment—into finer-grained clusters; the reputation assessment module applies differentiated reputation models to participating RSUs, vehicles, and UAVs; and the task offloading module integrates reputation scores to make multi-objective optimized offloading decisions.
The two-level task classification module takes multi-dimensional QoS features (e.g., task delay sensitivity, computational density, memory footprint, and data entropy value) as input. Tasks are first assigned to the three coarse-grained classes and then further partitioned within each class using second-stage K-means clustering to improve intra-cluster consistency. Light-PPO is then employed to derive a compact priority decision table from the clustering boundaries, supporting efficient online priority mapping (see Section 4.1). The reputation assessment module adopts differentiated modeling for heterogeneous nodes: for RSUs, an LSTM captures temporal dynamics of service availability, data accuracy, and resource utilization to produce instantaneous reputation values that are fused with historical reputation to obtain composite scores; for vehicles and UAVs, multiple indicators (task completion rate, processing delay, error rate, feedback score, etc.) form the state representation, and a deep reinforcement learning agent adaptively adjusts indicator weights—combined with exponential smoothing and weighted recommendation fusion—to yield global reputation values that identify and de-prioritize suspicious or malicious nodes (see Section 4.2). The task offloading module formulates offloading as a constrained multi-objective optimization problem—minimizing latency and energy while promoting load balance—and employs NSGA-II to search the Pareto front under constraints on task delay, node reputation, and computational capacity, thereby producing feasible offloading strategies (see Section 4.3).
The sub-cluster labels produced by the two-level classification and the priority table generated by Light-PPO serve as task descriptors and priority inputs to the offloading module, constraining and guiding NSGA-II’s search space. Concurrently, the reputation module supplies real-time availability and trust constraints that enable the offloading module to proactively avoid untrustworthy nodes under blackhole and false-feedback attack scenarios. Execution outcomes and callback feedback from the offloading module are fed back to update the reputation models and to trigger periodic re-clustering and Light-PPO policy adjustments, forming a closed-loop workflow that jointly optimizes task scheduling while enhancing robustness against malicious behavior.

4.1. Task Offloading Classification Model

The specific encoding scheme is shown in Table 1. For task data packets in vehicular networks, a new field entitled “Task Category” is first added to label each task. Tasks are further categorized into three primary types: Emergency Safety, Safety Assistance, and Non-Real-Time Entertainment.
To further subdivide offloading tasks, this paper defines multi-dimensional QoS parameters, uses the K-means clustering algorithm to perform secondary classification based on the multi-dimensional QoS parameters, and then adopts Light-PPO to generate a lightweight priority decision table, thereby realizing the priority division of final task offloading.
The multi-dimensional QoS parameters are defined as the following four indicators:
  • Task delay sensitivity (L): The sensitivity score of a task to delay, usually normalized to the range [0, 1]. A larger value indicates that the task has a lower tolerance for delay.
  • Computational density (C): Expressed in GOPS/MB, it represents the computing requirement per unit of data volume and measures the computing power consumption for task processing.
  • Memory footprint (M): Represents the size of the cache or working set (in MB) required during task execution, measuring the memory usage requirement when the task runs.
  • Data entropy value (E): The information compression ratio of the task input data, characterizing the complexity of the data and the processing overhead. A larger value indicates that the data is more difficult to compress and the preprocessing overhead is higher.
Each task i is represented by a four-dimensional QoS feature vector x_i = (L_i, C_i, M_i, E_i), which is used as the input for K-means clustering.
The specific processing flow of the task offloading classification model is as follows:

4.1.1. Secondary Classification

Using the K-means clustering method, all tasks within each primary category are subjected to feature extraction of their multi-dimensional QoS vectors to achieve secondary clustering. Each four-dimensional QoS feature vector is standardized as x̃_{i,j} = (x_{i,j} − μ_j) / σ_j, where x_{i,j} is the value of task i in the current category on indicator j, and μ_j and σ_j denote the mean and standard deviation of metric j across the current category of tasks, respectively. The number of clusters K is predefined, and clustering is performed by minimizing the within-cluster sum of squared errors. The objective function of K-means is
min_{{C_k}} Σ_{k=1}^{K} Σ_{x_i ∈ C_k} ‖x̃_i − μ_k‖²
where the k-th cluster C_k contains all tasks assigned to it, and μ_k is the centroid of cluster C_k.
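As an illustration, the standardization and secondary clustering step can be sketched in a few lines of plain NumPy; the sample QoS values, the choice of K = 3, and the simple Lloyd's-iteration implementation are assumptions for demonstration only, not the paper's implementation.

```python
import numpy as np

def secondary_classify(qos, k=3, iters=20, seed=0):
    """Standardize QoS vectors and run a plain Lloyd's K-means.

    qos: (n, 4) array of [L, C, M, E] per task (values below illustrative).
    Returns (labels, centroids) in the standardized feature space.
    """
    mu, sigma = qos.mean(axis=0), qos.std(axis=0)
    sigma[sigma == 0] = 1.0                       # guard constant features
    z = (qos - mu) / sigma                        # x_tilde = (x - mu) / sigma
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(iters):
        # distance of every task to every centroid, then nearest assignment
        dists = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):               # leave empty clusters fixed
                centers[j] = z[labels == j].mean(axis=0)
    return labels, centers

# Six illustrative tasks: (delay sensitivity, GOPS/MB, MB, entropy)
tasks = np.array([[0.90, 12.0,  64.0, 0.80],
                  [0.85, 11.5,  60.0, 0.75],
                  [0.30,  2.0, 512.0, 0.40],
                  [0.25,  1.8, 500.0, 0.35],
                  [0.60,  6.0, 128.0, 0.60],
                  [0.55,  5.5, 120.0, 0.55]])
labels, centers = secondary_classify(tasks, k=3)
```

The resulting labels index each task into a sub-cluster C_k, and the centroids (in standardized space) later serve as cluster-center states for the Light-PPO stage.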

4.1.2. Light-PPO Lightweight Priority Decision Making

For each second-level subclass C_k, the Light-PPO model is used to learn a policy network during the offloading process. The policy network is parameterized by θ and gives the probability of action a under state s, denoted π_θ(a|s). The optimal resource allocation strategy is then learned over the subtask categories. After training, Light-PPO generates a lightweight priority decision table. The specific steps are as follows:
(1)
State and Action Spaces
State: The first-level category and second-level subclass number of the current task, as well as the corresponding cluster centroid feature μ_k. The state vector provides decision context for the policy network, enabling it to learn the resource allocation strategy based on the statistical structure of different task categories.
Action: Discrete priority allocation table entries represent the resource quota levels granted to subclasses. The action space maps complex continuous resource allocation to discrete priority levels.
(2)
Instantaneous Reward Function
The instantaneous reward r_i is the comprehensive benefit after the execution of subclass tasks, combining the task completion rate increment, the delay reduction, and the energy consumption reduction through weight coefficients. By aggregating the multiple objectives into a single scalar signal, it drives policy learning toward the desired trade-off in resource allocation, formalized as r_i = λ₁·ΔTCR − λ₂·ΔDelay − λ₃·ΔEnergy, where ΔTCR is the increment of the task completion rate, ΔDelay and ΔEnergy are the changes in delay and energy consumption relative to the baseline (so reductions increase the reward), and λ₁, λ₂, λ₃ are the weight coefficients.
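The scalarized reward can be sketched directly; the default weight values below are illustrative placeholders, not values from the paper.

```python
def instantaneous_reward(d_tcr, d_delay, d_energy, lambdas=(1.0, 0.5, 0.3)):
    """r_i = l1*dTCR - l2*dDelay - l3*dEnergy.

    d_delay and d_energy are changes relative to the baseline, so
    reductions (negative changes) increase the reward. The default
    weights are illustrative assumptions.
    """
    l1, l2, l3 = lambdas
    return l1 * d_tcr - l2 * d_delay - l3 * d_energy

# A completion-rate gain together with delay/energy reductions yields r_i > 0.
r = instantaneous_reward(0.1, -0.2, -0.1)
```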
(3)
Key Steps
The Light-PPO network consists of a policy network π_θ(a|s) and a value network V_φ(s). The value function, parameterized by φ, estimates the expected discounted return for a given state s. The policy network maps the state to a probability distribution over discrete priority actions: its input state dimension is d + 2, it has two fully connected hidden layers, and it outputs a Softmax discrete action probability distribution. The value network, acting as a Critic, estimates the value of each state, which is used to compute the advantage function and reduce the variance of the policy gradient. The value network has the same structure as the policy network but outputs a single scalar state value.
The clipped objective of PPO controls the policy update magnitude while improving sampling efficiency, avoiding severe policy fluctuations. Its specific form is
L^CLIP(θ) = E_{s,a∼π_θold}[ min( r(θ)·Â, clip(r(θ), 1 − ε, 1 + ε)·Â ) ]
where E_{s,a∼π_θold} denotes the expectation taken over states s and actions a sampled from the old policy π_θold. The importance ratio r(θ) = π_θ(a|s) / π_θold(a|s) measures how far the new policy deviates from the old one, and clip(·) restricts r(θ) to the interval [1 − ε, 1 + ε] to prevent excessively large policy updates. Â is the advantage estimate, and the clipping hyperparameter ε limits the policy deviation in a single update. In each iteration, M mini-batch segments are sampled from the latest execution trajectory to balance computational cost and gradient estimation accuracy. L^CLIP, the value loss, and the entropy regularization then form the total loss function L(θ, φ) of PPO for the joint optimization of the policy and value networks:
L(θ, φ) = −L^CLIP(θ) + c₁·(V_φ(s) − R)² + c₂·E_{a∼π_θ}[ log π_θ(a|s) ]
where R is the actual discounted return, and c₁ and c₂ are loss weights; the last term is the negative policy entropy, so minimizing it encourages exploration. After each update, θ_old ← θ is synchronized to improve training stability.
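For concreteness, the clipped surrogate can be sketched with NumPy over a mini-batch of log-probabilities and advantages; the networks, sampling loop, and value and entropy terms are omitted, so this is a minimal illustration rather than the full Light-PPO loss.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """Mini-batch estimate of the clipped surrogate L_CLIP (to be maximized).

    logp_new, logp_old: log pi(a|s) under the new and old policies.
    adv: advantage estimates A_hat.
    """
    ratio = np.exp(logp_new - logp_old)                   # r(theta)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv  # clip to [1-eps, 1+eps]
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the new and old policies coincide, the ratio is 1 and the objective reduces to the mean advantage; once the ratio leaves [1 − ε, 1 + ε], the clipped branch caps that sample's contribution, which is what prevents destructive policy jumps.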
After PPO converges, the policy network performs forward inference on each sub-cluster centroid. For the cluster-center state s of each subclass C_k, the optimal action a*(s) = argmax_a π_θ(a|s) output by the policy network is derived, forming the final lightweight priority decision table. The decision table is indexed by the clustering results to enable fast offloading.

4.1.3. Task Offloading Delay and Energy Consumption

When task i is processed at the RSU node, only the computing delay and energy consumption of task i at the current RSU need to be considered, as follows:
T_i^local = N_i^calculate / V^local
E_i^local = T_i^local × P^local
where T_i^local is the delay for task i to be computed locally; N_i^calculate is the number of CPU cycles required to compute the i-th task; V^local is the execution rate of the RSU; E_i^local is the energy consumption for task i to be computed locally; and P^local represents the local computing power of the RSU.
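The two local-cost equations translate directly into code; the numeric parameters in the example (a 2 GHz execution rate and 10 W computing power) are illustrative assumptions.

```python
def local_cost(n_cycles, v_local, p_local):
    """T_i^local = N_i^calculate / V^local and E_i^local = T_i^local * P^local."""
    t_local = n_cycles / v_local   # computing delay in seconds
    e_local = t_local * p_local    # energy in joules
    return t_local, e_local

# 1e9 CPU cycles on a 2 GHz RSU drawing 10 W -> 0.5 s, 5.0 J
t_local, e_local = local_cost(1e9, 2e9, 10.0)
```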
When task i is offloaded to an idle node for processing, the offloading delay is
T_i^trans = N_i^datatran / V_i^trans
T_i^compute = N_i^calculate / V^free
V_i^trans = ω · log₂(1 + P^trans · d^(−δ) / N₀)
T_i^free = N_i^datatran / (ω · log₂(1 + P^trans · d^(−δ) / N₀)) + N_i^calculate / V^free
Among them, N_i^datatran is the total size of the transmitted data, V_i^trans is the data transmission rate, ω is the channel bandwidth for task upload, P^trans is the transmission power of the node, d^(−δ) is the channel gain, d is the distance between the task node and the idle node, δ is the path loss factor, and N₀ is the Gaussian white noise power.
Regarding the energy consumption E_i^free during the offloading process, for the offloading node only the energy consumed to upload the task from the offloading node to the idle node needs to be considered:
E_i^free = T_i^trans × P^free = N_i^datatran / (ω · log₂(1 + P^trans · d^(−δ) / N₀)) × P^free
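The transmission-rate, delay, and upload-energy equations above can be sketched together; every numeric parameter in the example (bandwidth, transmit power, distance, path loss factor, noise power) is an illustrative assumption, not a value from the paper.

```python
import math

def offload_cost(n_data, n_cycles, v_free, bandwidth, p_trans, d, delta, n0, p_free):
    """Offloading cost following the equations above.

    n_data: transmitted data size in bits; n_cycles: required CPU cycles;
    the channel gain is modeled as d**(-delta).
    """
    v_trans = bandwidth * math.log2(1.0 + p_trans * d ** (-delta) / n0)  # bits/s
    t_trans = n_data / v_trans             # upload delay T_i^trans
    t_free = t_trans + n_cycles / v_free   # total offload delay T_i^free
    e_free = t_trans * p_free              # upload energy E_i^free
    return v_trans, t_free, e_free

# Illustrative parameters: 1 MHz bandwidth, 0.1 W transmit power, 100 m link,
# path loss exponent 2, -90 dBm-scale noise power, 0.5 W upload power draw.
v, t, e = offload_cost(1e6, 1e9, 2e9, 1e6, 0.1, 100.0, 2.0, 1e-9, 0.5)
```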

4.2. Multi-Agent Reputation Evaluation Method Based on a Two-Layer Blockchain

This paper develops a differentiated reputation evaluation method based on a two-layer blockchain architecture, aimed at RSUs, vehicles, and UAVs. The first blockchain layer is deployed on the RSU to store the RSU’s overall reputation; the second layer is deployed on vehicles and UAVs to store their global reputation for the RSU to reference when making decisions.
(1)
Multi-indicator Based RSU Reputation Evaluation Model
This paper constructs a multi-indicator reputation evaluation model focused on the service performance of the RSU. To comprehensively evaluate the RSU's business continuity, safety and reliability, and carrying capacity, the three indicators of service availability, data accuracy, and resource utilization rate are defined as follows:
  • Service Availability (AV): Defined as the ratio of the number of successfully responded requests $N_{success}$ to the total number of requests $N_{total}$ during the most recent time interval $(t_0 - t_{last})$, reflecting the RSU's ability to provide normal service within the specified time. Here, $\lambda_\alpha$ denotes the time decay factor.
    $$AV = \frac{N_{success}}{N_{total}} \cdot e^{-\lambda_\alpha (t_0 - t_{last})}$$
  • Data Accuracy (DA): Consistency of the traffic sensing data (e.g., vehicle position and speed) provided by the RSU with that from other nearby idle nodes. Multi-source cross-verification is performed on the accuracy of traffic perception data, and an improved Jaccard similarity coefficient is used to check against adjacent RSUs: $S_{RSU} = \frac{|D_{target} \cap D_{neighbor}|}{|D_{target} \cup D_{neighbor}|} \cdot \cos(\theta_{pos})$, where $\theta_{pos}$ is the angle between position data vectors, $D_{target}$ is the traffic perception data provided by the RSU, and $D_{neighbor}$ is the traffic perception data provided by neighboring RSUs. A Hampel filter is used for anomaly detection against on-board sensors: $S_{sensor} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left( |D_{RSU} - D_{sensor_i}| < 3\hat{\sigma}_i \right)$, where $\mathbb{1}(\cdot)$ is the indicator function. The two scores are fused with a dynamic credibility weight:
    $$DA = \eta S_{RSU} + (1 - \eta) S_{sensor}, \quad \eta = \frac{N_{RSU}}{N_{RSU} + N_{sensor}}$$
    where $N_{RSU}$ is the number of neighboring RSUs, and $N_{sensor}$ is the number of on-board sensors.
  • Resource Utilization Rate (RHI): The weighted geometric mean of the current RSU's CPU usage, memory usage, and bandwidth utilization. A piecewise function is used to normalize the sub-indicators to distinguish the non-linear impact of normal/overload states:
    $$f(x) = \begin{cases} 1, & x \le \theta_{normal} \\ e^{-\gamma (x - \theta_{normal})}, & x > \theta_{normal} \end{cases}$$
    where $\theta_{normal}$ is the normal threshold of each indicator, and $\gamma$ controls the overload penalty rate. Weights are dynamically allocated through the entropy weight method: $w_i = \frac{1 - H_i}{\sum_{j=1}^{n} (1 - H_j)}$, $H_i = -\sum_{k} p_{ik} \ln p_{ik}$, where $H_i$ is the information entropy of the i-th indicator, and $p_{ik}$ is the normalized distribution probability of the k-th sample of indicator i, reflecting the difference in resource importance across scenarios. The resource utilization rate $RHI$ is calculated as a weighted geometric mean to enhance the sensitivity to low-score indicators, where $\varepsilon$ is a smoothing factor to prevent zero values, and $w_i$ denotes the weight of the i-th indicator:
    $$RHI = \prod_{i=1}^{n} \left( f(x_i) + \varepsilon \right)^{w_i} \times 100$$
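For illustration, the piecewise normalization, entropy weighting, and geometric-mean fusion can be sketched as follows; the threshold 0.7, the per-indicator history format, and the normalization of entropy by ln(m) are our own assumptions, not values from the paper:

```python
import math

def overload_penalty(x, theta=0.7, gamma=4.0):
    """Piecewise f(x): 1 below the normal threshold, exponential decay above it."""
    return 1.0 if x <= theta else math.exp(-gamma * (x - theta))

def entropy_weights(history):
    """Entropy-weight method: w_i = (1 - H_i) / sum_j (1 - H_j).

    `history` holds one list of positive samples per indicator. Entropy is
    normalized by ln(m) so H_i lies in [0, 1] (a common convention, assumed here).
    """
    raw = []
    for series in history:
        total = sum(series)
        probs = [v / total for v in series]
        h = -sum(p * math.log(p) for p in probs if p > 0)
        raw.append(1.0 - h / math.log(len(series)))
    s = sum(raw)
    return [w / s for w in raw]

def rhi(utilizations, weights, eps=1e-6):
    """RHI: weighted geometric mean of the normalized sub-indicators, scaled to 100."""
    score = 1.0
    for x, w in zip(utilizations, weights):
        score *= (overload_penalty(x) + eps) ** w
    return score * 100.0
```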
The steps for evaluating multi-indicator reputation degree are as follows:
  • Step 1: Calculate the multi-dimensional indicator values of the current RSU
After calculating the AV indicator, a first-order difference method is used to capture the rate of change of service availability over time, thereby reflecting the stability of the service. Assume that the service availability of the RSU at time t is $AV(t)$. Through monitoring records within a period of time, the continuous time series $\{AV(t_1), AV(t_2), \ldots, AV(t_n)\}$ is obtained, and the differences between consecutive moments are calculated: $\Delta AV(t_i) = AV(t_i) - AV(t_{i-1})$, $i = 2, 3, \ldots, n$. The stability of the service is measured by the mean of the absolute first-order differences: $S_{AV} = \frac{1}{n-1} \sum_{i=2}^{n} |\Delta AV(t_i)|$. Usually, the smaller the absolute difference, the more stable the service availability.
Then, calculate DA and use the sliding window method to compute the Pearson correlation coefficient, which reflects the correlation between the RSU data within a certain time window and the reference data provided by adjacent nodes. Set a time window W, and collect the data sequence $X = \{X(t_{i-W+1}), \ldots, X(t_i)\}$ output by the RSU and the reference data sequence $Y = \{Y(t_{i-W+1}), \ldots, Y(t_i)\}$, respectively. The Pearson correlation coefficient within each window is
$$r(t_i) = \frac{\sum_{j=i-W+1}^{i} (X(j) - \bar{X})(Y(j) - \bar{Y})}{\sqrt{\sum_{j=i-W+1}^{i} (X(j) - \bar{X})^2} \sqrt{\sum_{j=i-W+1}^{i} (Y(j) - \bar{Y})^2}}$$
where $\bar{X}$ and $\bar{Y}$ are the mean values of X and Y within the window, respectively. Since the Pearson correlation coefficient r lies in $[-1, 1]$, it can be linearly mapped to $[0, 1]$: $DA_{norm}(t_i) = \frac{r(t_i) + 1}{2}$. Averaging over multiple windows gives the overall score of the RSU data accuracy: $DA = \frac{1}{n - W + 1} \sum_{i=W}^{n} DA_{norm}(t_i)$.
At the same time, calculate the mean, standard deviation, and mean absolute rate of change of RHI to reflect the overall load level of the RSU within a certain period, and to quantify its fluctuation intensity and dynamic change rate. Within a given time window W, the moving average $MA(t)$ of RHI is $MA(t) = \frac{1}{W} \sum_{j=t-W+1}^{t} RHI(j)$.
In addition, the standard deviation within the sliding window reflects the load fluctuation: $SD(t) = \sqrt{\frac{1}{W} \sum_{j=t-W+1}^{t} (RHI(j) - MA(t))^2}$. The rate of change of resource utilization at consecutive moments is $\Delta RHI(t_i) = RHI(t_i) - RHI(t_{i-1})$, and the mean absolute rate of change is $M_{\Delta U} = \frac{1}{n-1} \sum_{i=2}^{n} |\Delta RHI(t_i)|$.
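The Step 1 statistics can be sketched as follows, assuming plain Python lists as the monitored time series (function names are illustrative):

```python
def stability(series):
    """S_AV (or M_dU): mean absolute first-order difference of a time series."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def windowed_da(x, y, w):
    """Sliding-window Pearson correlation, mapped from [-1, 1] to [0, 1] and averaged."""
    scores = [(pearson(x[i - w:i], y[i - w:i]) + 1) / 2 for i in range(w, len(x) + 1)]
    return sum(scores) / len(scores)

def load_stats(rhi_series, w):
    """Moving average and standard deviation over the last w samples of RHI,
    plus the mean absolute rate of change over the whole series."""
    window = rhi_series[-w:]
    ma = sum(window) / w
    sd = (sum((v - ma) ** 2 for v in window) / w) ** 0.5
    return ma, sd, stability(rhi_series)
```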
  • Step 2: Obtain Weights through LSTM Modeling
Obtain the weights of the three indicators in the comprehensive reputation evaluation through LSTM network modeling. Take $|\Delta AV|$, $DA$, $MA$, $SD$, and $M_{\Delta U}$, calculated above, as the input of the LSTM. At the same time, it is necessary to define a target reputation value $RS_{target}^{RSU}(t)$ based on historical performance within each time window W. For the window ending at time t, the input sequence is
$$X(t) = \begin{bmatrix} |\Delta AV_{t-W+1}| & DA_{t-W+1} & MA_{t-W+1} & SD_{t-W+1} & M_{\Delta U, t-W+1} \\ |\Delta AV_{t-W+2}| & DA_{t-W+2} & MA_{t-W+2} & SD_{t-W+2} & M_{\Delta U, t-W+2} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ |\Delta AV_{t}| & DA_{t} & MA_{t} & SD_{t} & M_{\Delta U, t} \end{bmatrix}$$
The target output is $y(t) = RS_{target}^{RSU}(t)$. In the LSTM network, the input dimension is $W \times 5$, where W is the length of the time window and 5 is the number of input features. The output of the LSTM passes through a fully connected layer, and the Softmax activation function converts the output into three weights $\omega_1$, $\omega_2$, $\omega_3$, satisfying $\omega_1 + \omega_2 + \omega_3 = 1$, where $\omega_1$ is the weight of service availability, $\omega_2$ is the weight of data accuracy, and $\omega_3$ is the weight of resource utilization. To make the predicted reputation value $RS_{pred}^{RSU}$ as close as possible to the target reputation value $RS_{target}^{RSU}$ during training, the loss function is defined as the mean squared error $Loss = \frac{1}{N} \sum_{t=1}^{N} \left( RS_{pred}^{RSU}(t) - RS_{target}^{RSU}(t) \right)^2$, where N is the number of training samples. The network parameters are continuously updated through backpropagation to realize adaptive learning of the weights.
According to the weights output by the LSTM network, the instantaneous reputation value of the RSU at time t is calculated:
$$RS_{instant}^{RSU} = \omega_1 \cdot AV(t) + \omega_2 \cdot DA(t) + \omega_3 \cdot RHI(t)$$
To ensure long-term stability, an exponential smoothing method is used to fuse the instantaneous reputation value $RS_{instant}^{RSU}$ with the historical reputation value $RS_{history}^{RSU}$ to obtain the comprehensive reputation value $RS_{composite}^{RSU} = \delta \cdot RS_{instant}^{RSU} + (1 - \delta) \cdot RS_{history}^{RSU}$, where $\delta$ is the smoothing factor. Finally, the comprehensive reputation value $RS_{composite}^{RSU}$ and the timestamp t are stored on the first-layer blockchain.
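The weight conversion and smoothing step can be sketched as follows; the softmax over raw LSTM head outputs and the value 0.6 for the smoothing factor are illustrative assumptions:

```python
import math

def softmax(logits):
    """Convert the fully connected head's raw outputs into weights summing to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def composite_reputation(logits, av, da, rhi, rs_history, delta=0.6):
    """Instantaneous reputation as a weighted sum of AV, DA, RHI, then
    exponential smoothing against the stored historical value."""
    w1, w2, w3 = softmax(logits)
    rs_instant = w1 * av + w2 * da + w3 * rhi
    return delta * rs_instant + (1 - delta) * rs_history
```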
(2)
Vehicle and UAV Reputation Evaluation Model
This paper constructs a reputation evaluation model for vehicles and UAVs, tailored to the high mobility and limited resources of mobile nodes. To comprehensively measure the reliability, timeliness, and collaboration quality of mobile nodes during task execution, task completion rate (TCR), processing delay (PD), error rate (ER), and feedback score (FS) are selected as the reputation evaluation indicators for vehicles and UAVs, defined as follows:
  • Task Completion Rate (TCR): The ratio of the number of successfully completed tasks $N_{success}$ to the total number of tasks $N_{total}$, measuring the reliability of a node when executing tasks.
    $$TCR = \frac{N_{success}}{N_{total}}$$
  • Processing Delay (PD): The average processing time of tasks. The lower the delay, the higher the reputation of the node.
    $$PD = \frac{\sum_{i=1}^{n} T_i}{n}$$
    where T i is the processing delay of the i-th task and n is the total number of tasks.
  • Error Rate (ER): The ratio of the number of tasks that failed or had errors N e r r o r to the total number of tasks N t o t a l represents the frequency of task failures or errors. Nodes with a higher error rate have a lower reputation.
    $$ER = \frac{N_{error}}{N_{total}}$$
  • Feedback Score (FS): The evaluation of its task performance by other nodes, usually a score between −1 and 1.
    $$FS = \frac{\sum_{j=1}^{m} F_j}{m}$$
    where F j is the feedback score of the j-th node for the task, and m is the number of scores.
Each indicator is then normalized. The task completion rate $TCR$ and feedback score $FS$ are processed with min-max normalization. The task processing delay $PD$ uses logarithmic normalization: the larger the delay, the smaller its normalized value, and the logarithmic transformation reduces the impact of extreme values and makes the data closer to a normal distribution. The error rate $ER$ uses reverse normalization.
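The three normalization schemes can be sketched as follows; since the paper does not give the exact logarithmic form, a `1 - log(1+PD)/log(1+PD_max)` variant is assumed here for illustration:

```python
import math

def minmax(v, lo, hi):
    """Min-max normalization for TCR and FS."""
    return (v - lo) / (hi - lo)

def log_norm(pd, pd_max):
    """Logarithmic reverse normalization for processing delay:
    larger delay -> smaller normalized score (assumed form)."""
    return 1.0 - math.log(1.0 + pd) / math.log(1.0 + pd_max)

def reverse_norm(er):
    """Reverse normalization for an error rate in [0, 1]."""
    return 1.0 - er
```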
This paper uses a deep Q-network (DQN) to obtain the weights of the above indicators. The state S of this model is composed of the normalized indicators: $S = (TCR_{norm}, PD_{norm}, ER_{norm}, FS_{norm})$. The actions in the action space are operations for adjusting the weights of each indicator, expressed as
$$a = (\Delta\omega_{TCR}, \Delta\omega_{PD}, \Delta\omega_{ER}, \Delta\omega_{FS}), \quad \Delta\omega \in \{-\Delta, 0, \Delta\}$$
where $(\Delta\omega_{TCR}, \Delta\omega_{PD}, \Delta\omega_{ER}, \Delta\omega_{FS})$ are the adjustment operations for the weights of each indicator. The set $\{-\Delta, 0, \Delta\}$ defines three basic actions: decrease by $\Delta$, remain unchanged, and increase by $\Delta$. When the agent selects an action, it updates the weights of each indicator, yielding the updated weights $(\omega_1, \omega_2, \omega_3, \omega_4)$.
In the DQN model, the absolute error is used to measure the gap between the predicted reputation value R S pred mobile and the target reputation value R S target mobile evaluated based on historical data. Define the error part reward as:
$$R_{err} = -v \cdot \left| RS_{pred}^{mobile} - RS_{target}^{mobile} \right|$$
where v > 0 is the adjustment coefficient. To evaluate the performance of task offloading after updating the weights, we define the performance improvement reward:
$$R_{perf} = \mu \Delta U = \mu \left[ t_1 \frac{T_{old} - T_{new}}{T_{old}} + t_2 \frac{E_{old} - E_{new}}{E_{old}} + t_3 \frac{S_{new} - S_{old}}{1 - S_{old}} \right]$$
where $\Delta U$ represents the performance improvement amplitude of the current system; $T_{old}$, $E_{old}$, and $S_{old}$ are the average delay, energy consumption, and completion rate before adjustment, respectively; $T_{new}$, $E_{new}$, and $S_{new}$ are the corresponding values after adjustment; $t_1$, $t_2$, and $t_3$ are the weights of the three indicators in the comprehensive improvement; and $\mu > 0$ is the performance improvement coefficient. To prevent instability of the reputation evaluation caused by excessive weight adjustment, a penalty term is introduced: $R_{pen} = -\iota \cdot \|\Delta\omega\|$, where $\|\Delta\omega\|$ is the L2 norm of the weight adjustment, and $\iota > 0$ is the penalty coefficient. In summary, the final reward function R is obtained:
$$R = R_{err} + R_{perf} + R_{pen} = -v \cdot \left| RS_{pred}^{mobile} - RS_{target}^{mobile} \right| + \mu \cdot \Delta U - \iota \cdot \|\Delta\omega\|$$
When the updated weights bring the predicted reputation closer to the target reputation, $\left| RS_{pred}^{mobile} - RS_{target}^{mobile} \right|$ is smaller, so the reward R is higher; if system performance, such as task completion rate and delay, improves significantly under the current weights, then $\Delta U$ is positive and the reward R increases; if the weight adjustment amplitude is too large, then $\|\Delta\omega\|$ is larger and the penalty term reduces the overall reward. The algorithm can thus pursue the performance improvement target while adjusting the weights smoothly.
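The reward above can be computed directly; all coefficient values below are illustrative defaults, not the paper's settings:

```python
def reward(rs_pred, rs_target, t_old, t_new, e_old, e_new, s_old, s_new,
           dw, v=1.0, mu=1.0, iota=0.1, t1=1.0 / 3, t2=1.0 / 3, t3=1.0 / 3):
    """Composite reward R = R_err + R_perf + R_pen for one weight-adjustment action."""
    r_err = -v * abs(rs_pred - rs_target)                 # prediction-error penalty
    delta_u = (t1 * (t_old - t_new) / t_old               # latency improvement
               + t2 * (e_old - e_new) / e_old             # energy improvement
               + t3 * (s_new - s_old) / (1.0 - s_old))    # completion-rate improvement
    r_perf = mu * delta_u
    r_pen = -iota * sum(d * d for d in dw) ** 0.5         # L2 norm of weight change
    return r_err + r_perf + r_pen
```

With a perfect prediction, unchanged performance, and zero weight adjustment the reward is zero; any genuine improvement pushes it positive.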
The global reputation evaluation of vehicle and UAV agents mainly includes local reputation and recommended reputation. This paper uses the weighted geometric mean method to calculate the instantaneous reputation value:
$$RS_{current}^{mobile} = \left[ (TCR_{norm})^{\omega_1} \times (PD_{norm})^{\omega_2} \times (ER_{norm})^{\omega_3} \times (FS_{norm})^{\omega_4} \right]^{\frac{1}{\omega_1 + \omega_2 + \omega_3 + \omega_4}}$$
After exponential smoothing, the historical reputation and the current instantaneous reputation value are fused to calculate the local reputation, yielding a relatively stable long-term evaluation: $RS_{new}^{mobile} = \lambda \cdot RS_{old}^{mobile} + (1 - \lambda) \cdot RS_{current}^{mobile}$, where $RS_{old}^{mobile}$ is the local reputation value from the previous update, and $\lambda \in (0, 1)$ is the smoothing factor controlling the weight ratio between the historical and current reputation values.
The recommended reputation reflects the evaluation of the target node’s reputation by other nodes. For the target node i, collect the reputation scores of each neighbor for it from its neighbor set N i , denoted as { R S j : j N i } . For the score set of the neighbor set N i , calculate the mean μ and standard deviation σ :
$$\mu = \frac{1}{|N_i|} \sum_{j \in N_i} RS_j, \quad \sigma = \sqrt{\frac{1}{|N_i|} \sum_{j \in N_i} (RS_j - \mu)^2}$$
For each score $RS_j$, calculate the Z-score $Z_j = \frac{RS_j - \mu}{\sigma}$ and set a threshold $Z_{th}$. All scores satisfying $|Z_j| > Z_{th}$ are regarded as outliers or malicious recommendations and removed from the set, giving a new score set $N_i'$. To give higher weights to neighbor scores close to the target node's local reputation value $RS_i$, this paper uses an exponential decay weight based on the absolute difference: $\phi_j = \exp(-\gamma \cdot |RS_j - RS_i|)$, where $\gamma$ is a positive adjustment parameter controlling the influence of the reputation gap on the weight. The closer a neighbor's score is to $RS_i$, the larger its $\phi_j$. A weighted average over the filtered score set $N_i'$ then gives the recommended reputation value: $RS_{rec} = \frac{\sum_{j \in N_i'} \phi_j \cdot RS_j}{\sum_{j \in N_i'} \phi_j}$.
The global reputation value $RS_{global}^{mobile}$ combines the local and recommended reputation values, $RS_{global}^{mobile} = \tau \cdot RS_{new}^{mobile} + (1 - \tau) \cdot RS_{rec}^{mobile}$, where $\tau$ controls the proportion of the local reputation value versus neighbor evaluations. Finally, $RS_{global}^{mobile}$ is stored on the second-layer blockchain.
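The Z-score filtering and fusion steps can be sketched as follows; the threshold and coefficient values are illustrative:

```python
import math

def recommended_reputation(scores, rs_local, z_th=2.0, gamma=1.0):
    """Drop neighbor scores whose |Z-score| exceeds z_th as suspected malicious
    recommendations, then average the rest with exponential-decay weights
    phi_j = exp(-gamma * |RS_j - RS_i|)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = (sum((s - mu) ** 2 for s in scores) / n) ** 0.5
    kept = [s for s in scores if sigma == 0 or abs(s - mu) / sigma <= z_th]
    weights = [math.exp(-gamma * abs(s - rs_local)) for s in kept]
    return sum(w * s for w, s in zip(weights, kept)) / sum(weights)

def global_reputation(rs_new, rs_rec, tau=0.7):
    """RS_global = tau * RS_new + (1 - tau) * RS_rec."""
    return tau * rs_new + (1 - tau) * rs_rec
```

A single false-feedback score far from the consensus (for example, a 0.0 among many 0.8s) is filtered out before the weighted average, which is the behavior the Z-score step is designed to provide.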

4.3. Transformation and Modeling of Multi-Objective Optimization Task Offloading Problem

In task offloading of the Internet of Vehicles, this paper comprehensively considers three constraints: delay, reputation, and computing resources.
  • Delay Constraint: The total delay in task execution is $T_{total} = T_j^{trans} + T_j^{compute}$, which mainly includes the communication delay $T_j^{trans}$ (that is, the time to transmit the task from the offloading node j to an idle node at the edge or in the cloud) and the computing delay $T_j^{compute}$ (that is, the processing time of the task on the target node), where $T_j^{compute} = \frac{C_i}{f_j}$, $C_i$ is the task computing complexity (CPU cycles), and $f_j$ is the computing capability (GHz) of the target node.
    The constraint condition is as follows:
    $$T_{total} \le T_{max}$$
    where T max is the delay threshold.
  • Reputation Constraint: The reputation constraint condition is as follows:
    $$RS_i \ge RS_{min}$$
    where R S min is the reputation threshold.
  • Computing Resource Constraint: The computing resource constraint of node j:
    $$C_i \le C_{available}(j)$$
    where C i is the task computing requirement and C available ( j ) is the currently available computing resource of node j.
Based on the above three constraints, this paper constructs a multi-objective optimization model. Its objectives not only include the minimization of delay and energy consumption but also involve load balancing optimization, aiming to balance the conflicts between different objectives. Assume that there are N tasks in the IoV, and any task i can choose two execution modes: local execution or offloading to a high-reputation idle node.
$$x_i = \begin{cases} 0, & \text{Task } i \text{ is executed locally} \\ 1, & \text{Task } i \text{ is offloaded for execution} \end{cases}$$
For any task i, the optimization objectives during task offloading are as follows:
  • Minimization of Delay: For task i, the total delay objective function can be expressed as
    $$f_1(x) = \sum_{i=1}^{N} \left[ P_i \left( (1 - x_i) T_i^{local} + x_i T_i^{free} \right) + Penalty_i^T \right]$$
    where $P_i$ represents the task priority, $T_i^{local}$ is the local computing delay, $T_i^{free}$ is the offloading delay, and $Penalty_i^T$ is the delay penalty term.
  • Minimization of Energy Consumption: The total energy consumption objective function can be expressed as
    $$f_2(x) = \sum_{i=1}^{N} \left[ P_i \left( (1 - x_i) E_i^{local} + x_i E_i^{free} \right) + Penalty_i^E \right]$$
    where $P_i$ represents the task priority, $E_i^{local}$ is the local energy consumption, $E_i^{free}$ is the offloading energy consumption, and $Penalty_i^E$ is the energy consumption penalty term.
  • Load Balance: For each available node j, let N be the set of tasks to be offloaded, $x_{i,j} \in \{0, 1\}$ indicate whether task i is allocated to node j, $O_i$ be the computing load of task i, and $C_j$ be the maximum computing capability of node j. Then, the absolute load allocated to node j is $L_j = \sum_{i \in N} x_{i,j} O_i$, and its relative load rate is $U_j = \frac{L_j}{C_j} = \frac{1}{C_j} \sum_{i \in N} x_{i,j} O_i$. The load balance objective is to minimize the variance of the node loads, where $\bar{U} = \frac{1}{|J|} \sum_{j \in J} U_j$:
    $$f_3(x) = \frac{1}{|J|} \sum_{j \in J} (U_j - \bar{U})^2$$
Therefore, the multi-objective optimization problem of task offloading can be expressed as
$$\begin{aligned} \text{minimize} \quad & F(x) = (f_1(x), f_2(x), f_3(x)) \\ \text{s.t.} \quad & T_{total} \le T_{max}, \quad RS_i \ge RS_{min}, \quad C_i \le C_{available}(j) \end{aligned}$$
As illustrated in Figure 3, the proposed approach employs the NSGA-II to solve the multi-objective optimization problem, leveraging its ability to simultaneously optimize multiple conflicting objectives and to generate a diverse set of Pareto-optimal solutions for decision makers. In the context of the offloading of the vehicular edge computing tasks, NSGA-II provides a frontier of solutions that balance latency, energy consumption, and load balancing, allowing planners to flexibly select the most appropriate offloading strategy under dynamic conditions.
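A minimal sketch of this decision layer, assuming candidate offloading strategies have already been scored on the three objectives: checking constraint feasibility and extracting the first non-dominated front that NSGA-II would hand to the decision maker (the genetic search itself, with crossover and mutation, is omitted):

```python
def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective and strictly better
    in at least one (all three objectives f1, f2, f3 are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the non-dominated (f1, f2, f3) vectors, i.e., the first front of
    NSGA-II's non-dominated sorting."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

def feasible(t_total, rs, c_req, t_max, rs_min, c_avail):
    """Delay, reputation, and computing-resource constraints of the model."""
    return t_total <= t_max and rs >= rs_min and c_req <= c_avail
```

From the resulting front, a planner can then pick the strategy matching the current priority, for example, the lowest-latency member under heavy load.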

5. Simulation Experiments and Performance Analysis

5.1. Simulation Experiment Setup

To effectively evaluate the performance of the proposed method, a joint simulation platform based on Simulation of Urban MObility (SUMO) [41] and Network Simulator 3 (NS-3) [42] was constructed. First, urban traffic scenarios were created in SUMO to generate vehicle mobility traces, which were then imported into NS-3 to build a multi-agent vehicular network simulation comprising vehicles, RSUs, and UAVs. The detailed experimental procedure is shown in Figure 4. Table 2 summarizes the core fields of the simulated dataset. The dataset was preprocessed for feature extraction and engineering, generating LSTM time-series inputs and DQN state vectors and providing training samples for the initial model training.
Prior to model training, the raw dataset undergoes missing value imputation, normalization, and time-window segmentation. Cross-validation is employed to ensure robustness during both training and evaluation. The entire experimental workflow is implemented using C++17 and Python 3.9: C++17 is responsible for the joint simulation interface scripts and real-time metric collection modules, while Python 3.9 handles data processing and model implementation. Both the LSTM and DQN models are trained on an NVIDIA GeForce RTX 4070 GPU. Hyperparameters are determined via random search on the validation set, and the final configuration is selected based on the lowest validation loss. To ensure stability, each hyperparameter setting is trained five times using different random seeds, with the mean and standard deviation of the results reported. The core configuration parameters of the SUMO and NS-3 simulations are listed in Table 3, while the key hyperparameter settings of the LSTM and DQN models are presented in Table 4 [43,44,45].

5.2. Experimental Results and Performance Analysis

This paper uses the regression evaluation indicators mean square error (MSE), mean absolute error (MAE), and coefficient of determination ($R^2$) to analyze the results of the reputation evaluation module of the proposed framework. As shown in Figure 5, 100–300 rounds of experimental data are selected to compare the real and predicted values of RSU reputation. The MAE of the model on the test set is only 0.023, the MSE is 0.0012, and $R^2$ reaches 0.94, indicating that the model accurately captures the dynamic changes of RSU reputation values. In high-fluctuation intervals, the model tracks sharp rises or falls in the real reputation in a timely manner. Overall, the reputation evaluation model not only maintains a prediction error of less than 0.05 at most moments but also shows good adaptability and stability under extreme patterns.
As shown in Table 5, the MSE of the dedicated reputation evaluation model for RSUs remains low, and the corresponding MAE is likewise small, indicating that the model's prediction error for RSU reputation values is small and stable. At the same time, the $R^2$ of each RSU remains above 0.9, showing that the model explains more than 90% of the real reputation fluctuations with an excellent fit. These results indicate that the proposed dedicated RSU reputation evaluation model has strong generalization ability and stability in diversified RSU environments.
As shown in Figure 6, for the mobile nodes in the Internet of Vehicles other than RSUs, comparing the historical reputation values of 60 sample groups with the reputation values predicted by the DQN model shows that the deviation between predicted and true values is extremely small and stable. In the actual-versus-predicted scatter plot, the data points cluster densely around the 45° diagonal, a typical sign of a well-fitted regression model. Overall, the low MSE/MAE and high $R^2$ results demonstrate that the proposed DQN-driven multi-indicator weight adaptation scheme can stably and accurately perform real-time reputation evaluation in the vehicular edge node environment, laying a solid data foundation for reputation-based task offloading.
The proposed TOM-RE framework, which integrates a comprehensive task classification module and a reputation assessment module for task offloading, is compared with the following four baseline methods.
  • RALPTO [18]: A partial offloading algorithm executed in a distributed fashion that leverages RSU-assisted learning and is built on multi-armed bandit theory.
  • OJTR [19]: A heuristic task offloading optimization method combining reinforcement learning and a greedy-based decomposition approach.
  • CODA [23]: A distance-driven computation resource allocation scheme designed to achieve load balancing.
  • DRAODA [46]: A resource allocation and offloading decision algorithm based on the deep deterministic policy gradient, developed to improve the handling of dynamic and complex problems by DRL methods.
As shown in Figure 7, TOM-RE maintains a high and stable task completion rate across different vehicle densities and significantly outperforms the baseline methods. In the 200-vehicle high-density scenario, TOM-RE improves completion rate by approximately 8.5 percentage points over CODA and by about 18.5 percentage points over OJTR. This advantage primarily stems from the synergy between multi-dimensional dynamic reputation assessment and fine-grained task classification. First, by applying sequential modeling to RSUs to capture service availability and resource utilization over time, and by using multi-metric weighted strategies for vehicles and UAVs, the system can promptly identify and down-weight malicious or unstable nodes, thereby substantially reducing task loss caused by blackhole and false-feedback attacks. Second, the two-level clustering combined with a lightweight policy network produces a priority decision table that enables rapid assignment of high-priority and high-reliability tasks, avoiding failures caused by resource misallocation. By contrast, CODA and OJTR rely on static or heuristic allocation schemes that cannot adapt quickly to rapid changes in topology and load, resulting in pronounced drops in completion rate. Although DRAODA and RALPTO exhibit some adaptability, their reputation filtering is less effective than that of TOM-RE, so their overall success rates fall behind under high-density and adversarial conditions. These results indicate that TOM-RE offers clear practical value in ensuring reliable and attack-resilient offloading.
As shown in Figure 8, when vehicle density increases from 50 to 200, the proposed method demonstrates a marked advantage in latency control, with notably lower latency than CODA, DRAODA and other schemes in the 200-vehicle scenario. This advantage stems primarily from two factors. First, the priority rules generated by Light-PPO minimize online computation overhead, thereby shortening scheduling response time. Second, NSGA-II treats latency as an explicit optimization objective in the offloading decision, which helps to prioritize latency-sensitive tasks under a high load. By contrast, CODA’s distance-driven allocation and OJTR’s heuristic rules respond slowly to rapid changes in network topology, while DRAODA and RALPTO lack TOM-RE’s combination of strategy stability and reputation-based constraints, resulting in inferior latency performance. These findings indicate that TOM-RE is better suited to vehicular applications with stringent response-time requirements.
As shown in Figure 9, although CODA occasionally exhibits lower transient energy consumption at 50 and 100 vehicles, the energy growth of TOM-RE is more moderate as density increases, and TOM-RE consumes less energy than CODA, OJTR, DRAODA and RALPTO with 150 and 200 vehicles. This advantage is primarily due to three factors. First, the reputation mechanism reduces task assignment to inefficient or malicious nodes, thereby lowering the energy waste caused by failed retransmissions and wasted computations. Second, NSGA-II’s multi-objective balancing avoids extreme solutions that trade high energy consumption for lower latency, achieving a better energy–latency compromise. Third, Light-PPO’s task-prioritization scheme reduces online computation and transmission overhead. Compared with CODA, which can show low energy use at light load but cannot sustain efficiency at high density, TOM-RE demonstrates superior scalability and congestion resilience. OJTR, DRAODA and RALPTO tend to incur higher energy consumption under heavy load, partly due to retransmissions, erroneous task processing, or misallocation to unreliable nodes. In summary, TOM-RE achieves improved energy control while maintaining high task completion and responsiveness, which demonstrates its greater practical value and cost-effectiveness for large-scale, dynamic, and adversarial deployment scenarios.

6. Conclusions

In response to the problem of efficient and secure task offloading for high-load RSUs in complex environments with network attack risks, this paper proposes a multi-objective optimization offloading method based on reputation evaluation, combined with the categories of vehicular networking tasks. First, task classification is refined using K-means and Light-PPO. Second, a differentiated reputation evaluation mechanism is designed for RSUs, vehicles, and UAVs. Finally, the offloading problem is transformed into a multi-objective optimization problem and solved using the NSGA-II algorithm. Simulation experiments show that this method significantly improves task completion rate, reduces average latency and energy consumption compared with existing baseline schemes, and effectively supports task offloading requirements in complex ground-to-air environments.
From a practical application perspective, this indicates that the proposed framework can operate robustly in high-density, highly dynamic vehicular networks even in the presence of malicious behaviors. It effectively reduces task failures and retransmissions, significantly lowers system energy consumption, and satisfies the stringent latency requirements of real-time services. Therefore, TOM-RE demonstrates strong engineering deployability and application potential, offering a viable solution for trustworthy, secure, and efficient task offloading in large-scale heterogeneous vehicular–infrastructure collaborative systems.
Future work will focus on adapting the proposed method to UAV-assisted air-ground integrated vehicular network scenarios, optimizing the reputation update process to meet ultra-low latency requirements; further extending the algorithm to address more complex multi-attack scenarios and enhancing the defense capability against blackhole attacks, false-feedback attacks, and other types of malicious attacks; and conducting hardware-in-the-loop testing in real vehicular environments to fully verify the practical performance and deployability of the proposed scheme.

Author Contributions

Conceptualization, L.Y., J.Z. and N.F.; methodology, W.F. and L.Y.; software, L.Y. and J.Z.; validation, L.Y., J.Z., Y.S. (Yu Shi) and Y.S. (Yexiong Shang); formal analysis, N.F.; investigation, L.Y., Y.S. (Yu Shi) and Y.S. (Yexiong Shang); resources, N.F.; data curation, L.Y.; writing—original draft preparation, L.Y.; writing—review and editing, N.F. and W.F.; visualization, L.Y.; supervision, N.F. and W.F.; funding acquisition, N.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program Project (2024YFB4303100), the National Natural Science Foundation of China (No. 62472049), and the Scientific Research Project of Shaanxi Provincial Department of Transportation (25-40X).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors would like to thank the editor and reviewers for providing valuable review comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bréhon–Grataloup, L.; Kacimi, R.; Beylot, A.L. Mobile edge computing for V2X architectures and applications: A survey. Comput. Netw. 2022, 206, 108797. [Google Scholar] [CrossRef]
  2. Wang, S.; Song, X.; Xu, H.; Song, T.; Zhang, G.; Yang, Y. Joint offloading decision and resource allocation in vehicular edge computing networks. Digit. Commun. Netw. 2023. [Google Scholar] [CrossRef]
  3. Oza, P.; Hudson, N.; Chantem, T.; Khamfroush, H. Deadline-aware task offloading for vehicular edge computing networks using traffic light data. ACM Trans. Embed. Comput. Syst. 2024, 23, 1–25. [Google Scholar] [CrossRef]
  4. Gu, X.; Wu, Q.; Fan, P.; Cheng, N.; Chen, W.; Letaief, K. DRL-based federated self-supervised learning for task offloading and resource allocation in ISAC-enabled vehicle edge computing. Digit. Commun. Netw. 2024, 11, 1614–1627. [Google Scholar] [CrossRef]
  5. He, X.; Cen, Y.; Liao, Y.; Chen, X.; Yang, C. Optimal Task Offloading Strategy for Vehicular Networks in Mixed Coverage Scenarios. Appl. Sci. 2024, 14, 10787. [Google Scholar] [CrossRef]
  6. Liang, H.; Zhang, X.; Hong, X.; Zhang, Z.; Li, M.; Hu, G.; Hou, F. Reinforcement learning enabled dynamic resource allocation in the internet of vehicles. IEEE Trans. Ind. Inform. 2020, 17, 4957–4967. [Google Scholar] [CrossRef]
  7. Liang, H.; Zhou, S.; Liu, X.; Zheng, F.; Hong, X.; Zhou, X.; Zhao, L. A dynamic resource allocation model based on SMDP and DRL algorithm for truck platoon in vehicle network. IEEE Internet Things J. 2021, 9, 10295–10305. [Google Scholar] [CrossRef]
  8. Ribeiro, A., Jr.; da Costa, J.B.D.; Rocha Filho, G.P.; Villas, L.A.; Guidoni, D.L.; Sampaio, S.; Meneguette, R.I. HARMONIC: Shapley values in market games for resource allocation in vehicular clouds. Ad Hoc Netw. 2023, 149, 103224. [Google Scholar] [CrossRef]
  9. Yu, R.; Huang, X.; Kang, J.; Ding, J.; Maharjan, S.; Gjessing, S.; Zhang, Y. Cooperative resource management in cloud-enabled vehicular networks. IEEE Trans. Ind. Electron. 2015, 62, 7938–7951. [Google Scholar] [CrossRef]
  10. Kim, J.W.; Kim, J.W.; Lee, J. Intelligent Resource Allocation Scheme Using Reinforcement Learning for Efficient Data Transmission in VANET. Sensors 2024, 24, 2753. [Google Scholar] [CrossRef]
  11. Xiao, H.; Cai, L.; Feng, J.; Pei, Q.; Shi, W. Resource optimization of MAB-based reputation management for data trading in vehicular edge computing. IEEE Trans. Wirel. Commun. 2023, 22, 5278–5290. [Google Scholar] [CrossRef]
  12. Cheng, P.; Chen, Y.; Ding, M.; Chen, Z.; Liu, S.; Chen, Y.P. Deep reinforcement learning for online resource allocation in IoT networks: Technology, development, and future challenges. IEEE Commun. Mag. 2023, 61, 111–117. [Google Scholar] [CrossRef]
  13. Zheng, X.; Li, M.; Chen, Y.; Guo, J.; Alam, M.; Hu, W. Blockchain-based secure computation offloading in vehicular networks. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4073–4087. [Google Scholar] [CrossRef]
  14. Wang, J.; Wu, W.; Liao, Z.; Sherratt, S.R.; Kim, G.J.; Alfarraj, O.; Alzubi, A.; Tolba, A. A probability preferred priori offloading mechanism in mobile edge computing. IEEE Access 2020, 8, 39758–39767. [Google Scholar] [CrossRef]
  15. Bozorgchenani, A.; Mashhadi, F.; Tarchi, D.; Salinas Monroy, S.A. Multi-objective computation sharing in energy and delay constrained mobile edge computing environments. IEEE Trans. Mob. Comput. 2020, 20, 2992–3005. [Google Scholar] [CrossRef]
  16. Hou, X.; Ren, Z.; Wang, J.; Cheng, W.; Ren, Y.; Chen, K.C.; Zhang, H. Reliable computation offloading for edge-computing-enabled software-defined IoV. IEEE Internet Things J. 2020, 7, 7097–7111. [Google Scholar] [CrossRef]
  17. Khabbaz, M. Deadline-Constrained RSU-to-Vehicle Task Offloading Scheme for Vehicular Fog Networks. IEEE Trans. Veh. Technol. 2023, 72, 14955–14961. [Google Scholar] [CrossRef]
  18. Li, S.; Sun, W.; Ni, Q.; Sun, Y. Road Side Unit-Assisted Learning-Based Partial Task Offloading for Vehicular Edge Computing System. IEEE Trans. Veh. Technol. 2024, 73, 5546–5555. [Google Scholar] [CrossRef]
  19. Fan, W.; Su, Y.; Liu, J.; Li, S.; Huang, W.; Wu, F.; Liu, Y. Joint Task Offloading and Resource Allocation for Vehicular Edge Computing Based on V2I and V2V Modes. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4277–4292. [Google Scholar] [CrossRef]
  20. Cao, D.; Wu, M.; Gu, N.; Sherratt, R.S.; Ghosh, U.; Sharma, P.K. Joint Optimization of Computation Offloading and Resource Allocation Considering Task Prioritization in ISAC-Assisted Vehicular Network. IEEE Internet Things J. 2024, 11, 29523–29532. [Google Scholar] [CrossRef]
  21. Fan, W.; Zhang, Y.; Zhou, G.; Liu, Y. Deep Reinforcement Learning-Based Task Offloading for Vehicular Edge Computing with Flexible RSU-RSU Cooperation. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7712–7725. [Google Scholar] [CrossRef]
  22. Fan, W.; Hua, M.; Zhang, Y.; Su, Y.; Li, X.; Tang, B.; Wu, F.; Liu, Y. Game-Based Task Offloading and Resource Allocation for Vehicular Edge Computing with Edge-Edge Cooperation. IEEE Trans. Veh. Technol. 2023, 72, 7857–7870. [Google Scholar] [CrossRef]
  23. Liao, Z.; Xu, S.; Huang, J.; Wang, J. Task Migration and Resource Allocation Scheme in IoV with Roadside Unit. IEEE Trans. Netw. Serv. Manag. 2023, 20, 4528–4541. [Google Scholar] [CrossRef]
  24. Xu, X.; Jiang, Q.; Zhang, P.; Cao, X.; Khosravi, M.R.; Alex, L.T.; Qi, L.; Dou, W. Game Theory for Distributed IoV Task Offloading with Fuzzy Neural Network in Edge Computing. IEEE Trans. Fuzzy Syst. 2022, 30, 4593–4604. [Google Scholar] [CrossRef]
  25. Wang, H.; Lv, T.; Lin, Z.; Zeng, J. Energy-Delay Minimization of Task Migration Based on Game Theory in MEC-Assisted Vehicular Networks. IEEE Trans. Veh. Technol. 2022, 71, 8175–8188. [Google Scholar] [CrossRef]
  26. Mao, M.; Hu, T.; Zhao, W. Reliable task offloading mechanism based on trusted roadside unit service for internet of vehicles. Ad Hoc Netw. 2023, 139, 103045. [Google Scholar] [CrossRef]
  27. Lakhan, A.; Mohammed, M.A.; Garcia-Zapirain, B.; Nedoma, J.; Martinek, R.; Tiwari, P.; Kumar, N. Fully Homomorphic Enabled Secure Task Offloading and Scheduling System for Transport Applications. IEEE Trans. Veh. Technol. 2022, 71, 12140–12153. [Google Scholar] [CrossRef]
  28. Wang, S.; Li, J.; Wu, G.; Chen, H.; Sun, S. Joint optimization of task offloading and resource allocation based on differential privacy in vehicular edge computing. IEEE Trans. Comput. Soc. Syst. 2021, 9, 109–119. [Google Scholar] [CrossRef]
  29. Parvini, M.; Javan, M.R.; Mokari, N.; Abbasi, B.; Jorswieck, E.A. AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning. IEEE Trans. Veh. Technol. 2023, 72, 9880–9896. [Google Scholar] [CrossRef]
  30. Xu, S.; Guo, C.; Hu, R.Q.; Qian, Y. Blockchain-Inspired Secure Computation Offloading in a Vehicular Cloud Network. IEEE Internet Things J. 2022, 9, 14723–14740. [Google Scholar] [CrossRef]
  31. Cao, T.; Yi, J.; Wang, X.; Xiao, H.; Xu, C. Interaction Trust-Driven Data Distribution for Vehicle Social Networks: A Matching Theory Approach. IEEE Trans. Comput. Soc. Syst. 2024, 11, 4071–4086. [Google Scholar] [CrossRef]
  32. Zhang, R.; Wu, L.; Cao, S.; Hu, X.; Xue, S.; Wu, D.; Li, Q. Task offloading with task classification and offloading nodes selection for MEC-enabled IoV. ACM Trans. Internet Technol. 2021, 22, 1–24. [Google Scholar] [CrossRef]
  33. Ahmed, M.; Raza, S.; Mirza, M.A.; Aziz, A.; Khan, M.A.; Khan, W.U.; Li, J.; Han, Z. A survey on vehicular task offloading: Classification, issues, and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 4135–4162. [Google Scholar] [CrossRef]
  34. Shen, Q.; Hu, B.J.; Xia, E. Dependency-aware task offloading and service caching in vehicular edge computing. IEEE Trans. Veh. Technol. 2022, 71, 13182–13197. [Google Scholar] [CrossRef]
  35. Li, J.; Zhang, S.; Geng, J.; Liu, J.; Wu, Z.; Zhu, H. A Differential Privacy Based Task Offloading Algorithm for Vehicular Edge Computing. IEEE Internet Things J. 2025, 12, 30921–30932. [Google Scholar] [CrossRef]
  36. Chen, L.; Du, J.; Zhu, X. Mobility-Aware Task Offloading and Resource Allocation in UAV-Assisted Vehicular Edge Computing Networks. Drones 2024, 8, 696. [Google Scholar] [CrossRef]
  37. Liu, Y.; Dong, M.; Ota, K.; Liu, A. ActiveTrust: Secure and trustable routing in wireless sensor networks. IEEE Trans. Inf. Forensics Secur. 2016, 11, 2013–2027. [Google Scholar] [CrossRef]
  38. Ahmed, N.; Mohammadani, K.; Bashir, A.K.; Omar, M.; Jones, A.; Hassan, F. Secure and reliable routing in the Internet of Vehicles network: AODV-RL with BHA attack defense. CMES-Comput. Model. Eng. Sci. 2024, 139, 633–659. [Google Scholar] [CrossRef]
  39. Jin, Y.; Gu, Z.; Ban, Z. Restraining false feedbacks in peer-to-peer reputation systems. In Proceedings of the International Conference on Semantic Computing (ICSC 2007), Irvine, CA, USA, 17–19 September 2007; pp. 304–312. [Google Scholar] [CrossRef]
  40. Ke, C.; Xiao, F.; Cao, Y.; Huang, Z. A group-vehicles oriented reputation assessment scheme for edge VANETs. IEEE Trans. Cloud Comput. 2024, 12, 859–875. [Google Scholar] [CrossRef]
  41. Lopez, P.A.; Behrisch, M.; Bieker-Walz, L.; Erdmann, J.; Flötteröd, Y.P.; Hilbrich, R.; Lücken, L.; Rummel, J.; Wagner, P.; Wiessner, E. Microscopic traffic simulation using SUMO. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2575–2582. [Google Scholar] [CrossRef]
  42. Campanile, L.; Gribaudo, M.; Iacono, M.; Marulli, F.; Mastroianni, M. Computer network simulation with ns-3: A systematic literature review. Electronics 2020, 9, 272. [Google Scholar] [CrossRef]
  43. Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 5064–5078. [Google Scholar] [CrossRef] [PubMed]
  44. Lindemann, B.; Müller, T.; Vietz, H.; Jazdi, N.; Weyrich, M. A survey on long short-term memory networks for time series prediction. Procedia CIRP 2021, 99, 650–655. [Google Scholar] [CrossRef]
  45. Tang, M.; Wong, V.W.S. Deep reinforcement learning for task offloading in mobile edge computing systems. IEEE Trans. Mob. Comput. 2020, 21, 1985–1997. [Google Scholar] [CrossRef]
  46. Luo, Q.; Zhang, J.; Hu, S.; Luan, T.H.; Fan, P. Joint Task Migration and Resource Allocation in Vehicular Edge Computing: A Deep Reinforcement Learning-Based Approach. IEEE Trans. Veh. Technol. 2025, 74, 9476–9490. [Google Scholar] [CrossRef]
Figure 1. Internet of Vehicles under attack scenarios.
Figure 2. Overall architecture of TOM-RE framework.
Figure 3. The NSGA-II Multi-objective Optimization Model.
Figure 4. The flow of simulation.
Figure 5. Real Values vs. Predicted Values of RSU Reputation.
Figure 6. Real Values vs. Predicted Values of Mobile Node Reputation.
Figure 7. Comparison of Task Completion Rates.
Figure 8. Comparison of Average Task Delay.
Figure 9. Comparison of Total Energy Consumption.
Table 1. Coding Scheme for Task Category Field.

Coding Scheme | Task Category | Typical Scenarios (Examples)
0x00 | Urgent Safety | Collision Warning, Automatic Braking, Emergency Call
0x01 | Safety Assistance | Lane Keeping, Blind Spot Monitoring, Tire Pressure Monitoring
0x02 | Non-real-time Entertainment | OTA Updates, Streaming Media, Navigation Data Download
0x03–0xFF | Reserved | Legacy Devices or Unclassified Tasks
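As a minimal illustration of the coding scheme in Table 1, the sketch below maps the task-category byte of an offloading packet to a category name and a priority rank. The function name and the numeric priority values are assumptions made for this sketch, not part of the paper's implementation.

```python
# Illustrative decoder for the task-category field of Table 1.
# The priority ranks (0 = most urgent) are an assumption of this sketch.
TASK_CATEGORIES = {
    0x00: ("Urgent Safety", 0),                # e.g., collision warning
    0x01: ("Safety Assistance", 1),            # e.g., lane keeping
    0x02: ("Non-real-time Entertainment", 2),  # e.g., OTA updates
}

def classify_task(category_byte):
    """Return (category name, priority) for a task-category byte.

    Bytes 0x03-0xFF are reserved and treated here as lowest priority.
    """
    if not 0x00 <= category_byte <= 0xFF:
        raise ValueError("task-category field must be a single byte")
    return TASK_CATEGORIES.get(category_byte, ("Reserved", 3))

print(classify_task(0x00))  # ('Urgent Safety', 0)
print(classify_task(0x7A))  # ('Reserved', 3)
```

A scheduler could sort packets by the returned priority so that urgent-safety tasks are offloaded first.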
Table 2. Core Field Description of the Simulated Dataset.

Field Name | Description
Packet ID | Unique identifier of each data packet, ensuring traceability in transmission logs (e.g., Packet_000123).
Source Node ID | Identifier of the node that generated the task (Vehicle, RSU, or UAV) (e.g., RSU_07).
Destination Node ID | Identifier of the node receiving the offloaded task (e.g., UAV_02).
Task Type Label | Encoded category of the computational task, supporting classification-based offloading (e.g., 0x00 = emergency-safety, 0x01 = safety-assist, 0x02 = non-real-time entertainment).
Transmission Delay | End-to-end communication delay in milliseconds (ms), indicating network latency (e.g., 25.6).
Throughput | Data transmission rate during task offloading (Mbps) (e.g., 8.2).
Packet Loss Rate | Ratio of lost packets during transmission, reflecting network reliability (e.g., 0.005).
Energy Consumption | Energy consumed for task processing or transmission (J) (e.g., 0.82).
Node Resource State | Current resource utilization of the node, including CPU and memory usage (e.g., CPU = 0.78, Memory = 0.62).
Attack Flag | Indicator of malicious behavior in the current record (e.g., 0 = normal, 1 = blackhole, 2 = false feedback).
Vehicle State Information | Real-time motion state of vehicles, including position coordinates and speed (e.g., Position = (128.52, 64.38), Speed = 14.6 m/s).
Timestamp | Simulation time of data generation or task completion (ms) (e.g., 152.38).
Table 3. SUMO/NS3 Experimental Parameters.

Parameter | Value
Road area/m² | 2000 × 2000
Number of vehicles | 50/100/150/200
Number of UAVs | 3
WLAN protocol | 802.11a
Node mobility model | trace-based mobility
Channel type | YansWifiChannel
Transmission power/dBm | [15, 20]
Transmission rate/Mbps | 54
Simulation duration/min | [10, 30]
Proportion of malicious vehicles | 10%
Vehicle speed/(m·s⁻¹) | [5, 15]
RSU coverage radius/m | 500
Reputation update period/min | 10
Message frequency/(counts·(10 min)⁻¹) | [10, 30]
Table 4. Key Hyperparameters of LSTM and DQN Models.

(a) LSTM

Parameter | Value
window_size | 60
hidden_size | 512
dropout | 0.2
batch_size | 256
lr | 3 × 10⁻⁵
grad_clip | 1.0
patience | 50
weight_decay | 1 × 10⁻⁶

(b) DQN

Parameter | Value
epsilon | 1.0
epsilon_min | 0.01
epsilon_decay | 0.995
learning_rate | 0.001
gamma | 0.9
update_target_frequency | 10
batch_size | 64
hidden_layers | [256, 128]
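The DQN exploration settings in Table 4b (epsilon = 1.0, epsilon_min = 0.01, epsilon_decay = 0.995) suggest an epsilon-greedy schedule. The sketch below assumes a multiplicative decay applied once per update with a floor at epsilon_min; the exact decay trigger (per step vs. per episode) is not specified in the table and is an assumption here.

```python
# Sketch of the epsilon-greedy exploration schedule implied by Table 4b.
# Multiplicative decay per update is an assumption of this sketch.
def epsilon_schedule(step, epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
    """Exploration rate after `step` decay updates, floored at epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay ** step)

# With decay 0.995, epsilon hits the 0.01 floor after roughly
# log(0.01)/log(0.995) ≈ 919 updates.
print(epsilon_schedule(0))     # 1.0
print(epsilon_schedule(2000))  # 0.01 (at the floor)
```

During training, the agent would take a random action with probability `epsilon_schedule(t)` and otherwise act greedily with respect to the Q-network.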
Table 5. Analysis of RSU Regression Indicators.

RSU_ID | MSE | MAE | R²
0 | 0.01055 | 0.08199 | 0.91192
4 | 0.00967 | 0.07901 | 0.92180
8 | 0.00911 | 0.07541 | 0.92501
12 | 0.00940 | 0.07619 | 0.92580
16 | 0.01034 | 0.08124 | 0.91503
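Table 5 reports MSE, MAE, and R² for the RSU reputation regression. As a reminder of how these three indicators are defined, the sketch below computes them from scratch; the sample reputation values are illustrative only and are not the paper's data.

```python
# Definitions of the regression indicators in Table 5.
# The sample arrays below are hypothetical, not the paper's dataset.
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for paired true/predicted values."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)   # total variance
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2

y_true = [0.82, 0.75, 0.91, 0.68, 0.88]  # hypothetical reputation values
y_pred = [0.80, 0.77, 0.89, 0.70, 0.85]
mse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.5f}  MAE={mae:.5f}  R2={r2:.5f}")
```

Lower MSE/MAE and an R² closer to 1 indicate that the predicted reputation tracks the real values more closely, as in Figures 5 and 6.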
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ye, L.; Fan, N.; Zhang, J.; Shang, Y.; Shi, Y.; Fan, W. Reputation-Aware Multi-Agent Cooperative Offloading Mechanism for Vehicular Network Attack Scenarios. Vehicles 2025, 7, 150. https://doi.org/10.3390/vehicles7040150
